📖 Core Workflows

Applies to: FinLang v0.7+ · Status: Stable · Last verified: v0.7.9

✅ Daily Run

The Daily Run applies your personal rules plus optional starter packs to new transaction data.

First Time? Quick Setup

# 1) Install FinLang (with fast IO extras)
pip install "finlang[fastio]"

# 2) Create an empty rules file
echo "# My FinLang Rules" > my_rules.fin

# 3) Run your first categorization
finlang --input transactions.csv --output categorized.csv --rules my_rules.fin

Example (Full Production Command)

finlang --input transactions.csv --output categorized.csv \
  --rules my_rules.fin --include-pack retail,sanity \
  --fastio --audit audit.json --audit-mode lite

What’s Happening

  • transactions.csv → raw bank export (FinLang normalizes headers automatically).
  • my_rules.fin → your personal ruleset (highest precedence).
  • --include-pack retail,sanity → adds baseline coverage & sanity checks.
  • --audit audit.json --audit-mode lite → writes an audit log for traceability; lite mode records only changed cells.
  • --fastio → speeds up CSV IO with PyArrow.

🌍 International users: If your CSV uses European formats (e.g., 1.234,56 or DD/MM/YYYY), add the i18n flags:

finlang --input transactions.csv --output categorized.csv \
  --rules my_rules.fin --include-pack retail,sanity \
  --decimal "," --thousands "." --dayfirst --encoding auto --strict-parse

See i18n_examples.md for regional recipes.
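To show what the flags above ask for, here is a rough illustration of the interpretation they request. This is not FinLang's parser. The helper names to_canonical_amount and dayfirst_to_iso are hypothetical, and the sketch assumes well-formed input with no currency symbols.

```bash
#!/usr/bin/env bash
# Illustration only (hypothetical helpers, not FinLang's parser):
# the interpretation requested by --decimal "," --thousands "." and --dayfirst.

# "1.234,56" -> "1234.56": drop the thousands separators, then turn the
# decimal comma into a decimal point.
to_canonical_amount() {
  printf '%s\n' "$1" | sed -e 's/\.//g' -e 's/,/./'
}

# "03/04/2024" read day-first (3 April 2024) -> ISO "2024-04-03".
dayfirst_to_iso() {
  local d m y
  IFS=/ read -r d m y <<< "$1"
  printf '%s-%s-%s\n' "$y" "$m" "$d"
}

to_canonical_amount "1.234,56"   # prints 1234.56
dayfirst_to_iso "03/04/2024"     # prints 2024-04-03
```

Without --dayfirst, the same date string would normally be read month-first (4 March), which is why the flag matters for European exports.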

When to Use

  • Daily or weekly transaction categorization.
  • Producing audit trails for compliance or bookkeeping.
  • Fast, reliable updates with minimal overhead.

🔁 Growth Loop (Feedback Workflow)

FinLang's Growth Loop converts uncategorized data into new rules using three tools:

  • finlang → Process transactions
  • finlang-discover → Find frequent uncategorized patterns
  • finlang-suggest → Generate conservative draft rules

Step 1 — Initial Processing

When to use: Start of every growth loop cycle, or when processing new transaction data.

Run FinLang as per the Daily Run example above. This produces categorized.csv.

Step 2 — Discover Candidates

When to use: After processing, to identify recurring uncategorized counterparties.

Identify frequently occurring uncategorized counterparties, and export the full discovery stats.

Exclude-aware (v0.7.4): Rows marked exclude=True are skipped by default — they are intentionally out of scope, not categorisation gaps. To include them (e.g., for audit or review), add --include-excluded.

finlang-discover --input categorized.csv \
  --candidates candidates.csv \
  --all-candidates all_candidates.csv \
  --min-count 3 --strict-parse --encoding auto
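Conceptually, discovery groups uncategorized rows by counterparty and keeps those seen at least --min-count times. The sketch below illustrates that idea. It is not finlang-discover's implementation. It assumes (hypothetically) a header with columns named counterparty and category, LF or CRLF line endings, and no embedded commas or quoted commas in fields; use finlang-discover for real exports.

```bash
#!/usr/bin/env bash
# Conceptual sketch of discovery (hypothetical helper, not finlang-discover).
# Assumes header columns "counterparty" and "category" and simple CSV fields.

uncategorized_counts() {
  local file="$1" min_count="${2:-3}"
  awk -F',' -v min="$min_count" '
    { sub(/\r$/, "") }                       # tolerate CRLF line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "counterparty") cp = i
        if ($i == "category")     cat_col = i
      }
      if (!cp || !cat_col) { print "missing counterparty/category header" > "/dev/stderr"; exit 1 }
      next
    }
    # An empty or "" category counts as uncategorized.
    $cat_col == "" || $cat_col == "\"\"" { n[$cp]++ }
    END { for (k in n) if (n[k] >= min) printf "%s,%d\n", k, n[k] }
  ' "$file" | sort -t',' -k2,2nr -k1,1     # most frequent first, then by name
}

# Usage: uncategorized_counts categorized.csv 3
```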

Step 3 — Suggest Draft Rules

When to use: When you have candidates worth converting to rules (typically 5+ occurrences).

Generate draft .fin rules from the candidates. For production-grade precision, prefer exact matching.

finlang-suggest --input candidates.csv --output draft_rules.fin \
  --rules my_rules.fin \
  --emit-match exact \
  --category "Review"

⚠️ Important: Always review draft_rules.fin before merging. The "Review" category is a deliberate placeholder: verify each rule's logic, then replace it with the correct category.

Step 4 — Review & Merge

When to use: After reviewing suggested rules for accuracy. Never merge blindly.

# Linux/macOS
cat draft_rules.fin >> my_rules.fin

# Windows (PowerShell)
Get-Content draft_rules.fin | Add-Content my_rules.fin

# Windows (CMD)
type draft_rules.fin >> my_rules.fin
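A plain `>>` append has one sharp edge: if my_rules.fin does not end with a newline, the last existing rule and the first draft rule end up on the same line. The sketch below is a hypothetical helper (merge_rules is not part of FinLang) that keeps a backup copy and inserts a separating newline only when one is missing.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of FinLang): append reviewed draft rules to
# the main ruleset safely.

merge_rules() {
  local draft="$1" main="$2"
  cp "$main" "$main.bak"                      # rollback copy
  # $(tail -c 1 ...) is empty when the file already ends with a newline.
  if [ -s "$main" ] && [ "$(tail -c 1 "$main")" != "" ]; then
    printf '\n' >> "$main"
  fi
  cat "$draft" >> "$main"
}

# Usage: merge_rules draft_rules.fin my_rules.fin
```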

Step 5 — Re-run with Full Audit

When to use: After merging new rules, to validate coverage improvement.

finlang --input transactions.csv --output categorized.csv \
  --rules my_rules.fin --include-pack retail,sanity \
  --audit audit_full.json --audit-mode full --fastio

📈 Expected Outcomes

| Iteration | Uncategorized ↓ | Time/Loop | Rules Added |
|---|---|---|---|
| First loop | 60% → 40% | ~45 min | 15–30 |
| 3–5 loops | 40% → 15% | ~20 min | 5–10 |
| Steady state | <5% | ~10 min/mo | Maintenance only |

Results vary by dataset complexity and team discipline. Most users see the uncategorized share fall by 5–10 percentage points per loop.

Track Coverage

Purpose: Monitor your categorization progress over iterations. The goal is to reduce uncategorized transactions to <5%.

# Linux/macOS
finlang-discover --input categorized.csv --candidates temp.csv
grep -c '""' categorized.csv  # Count lines containing "" (proxy for empty categories)

# Windows PowerShell
finlang-discover --input categorized.csv --candidates temp.csv
(Get-Content categorized.csv | Select-String '""').Count
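The grep count matches `""` anywhere on a line, so it can over- or under-count. A minimal sketch that reports the uncategorized share directly as a percentage is below. The helper name uncategorized_pct is hypothetical, and it assumes the output has a header column named category and no embedded commas in fields.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percentage of data rows whose "category" is empty.
# Assumes a header column named "category" and simple (unquoted-comma) CSV.

uncategorized_pct() {
  awk -F',' '
    { sub(/\r$/, "") }                       # tolerate CRLF line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "category") c = i
      if (!c) { print "no \"category\" column in header" > "/dev/stderr"; bad = 1; exit 1 }
      next
    }
    { total++; if ($c == "" || $c == "\"\"") empty++ }
    END {
      if (bad) exit 1
      if (total) printf "%.1f\n", 100 * empty / total; else print "0.0"
    }
  ' "$1"
}

# Usage: uncategorized_pct categorized.csv
```

Tracking this one number per loop gives a direct read on progress toward the <5% goal.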

🧪 Benchmarking

When to benchmark

  • Validate that FinLang handles your data volume
  • Compare ruleset strategies
  • Capacity planning prior to rollout
  • After major rule changes (regression check)

When not to benchmark

  • Routine daily ops (adds noise)
  • Before understanding your data patterns
  • Without a specific performance question

Single-Ruleset Harness (CLI)

python -m benchmarks.bench_finlang_harness \
  --mode full-cli \
  --run-fin "finlang --fastio --audit-mode none --headless --strict-parse --encoding auto" \
  --rules examples/rules.demo.fin \
  --rows 25000 50000 100000 200000 \
  --cols 5 20 35 50 \
  --runs 3 \
  --final-rows 1000000 5000000 \
  --outdir bench_out

Performance at a Glance (v0.7.7)

| Rows × Cols | Runtime | Throughput | Context | Suitable For |
|---|---|---|---|---|
| 5M × 5 | ~18 s | ~283 K rows/s | SME batch | Small business |
| 5M × 20 | ~72 s | ~70 K rows/s | Payment gateway | Mid-market |
| 5M × 50 | ~179 s | ~28 K rows/s | Enterprise ledger | Enterprise |
| 20M × 6 | ~90 s (FastIO) | ~217 K rows/s | Integrity harness | Engine throughput ceiling |

v0.7.7 update: A hot-path bug fix in _to_number (removing an unnecessary \b word boundary that was misclassifying no-space CR/DR formats) delivered a 30–50% throughput gain on the integrity harness compared with v0.7.6.

See benchmarks.md for the full version comparison, detailed data, and methodology.
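For capacity planning, a first estimate is simply runtime ≈ rows ÷ throughput. The sketch below applies that arithmetic. The 12M-row input is a hypothetical example, and the throughput figure is taken from the table above. Real throughput depends on column count, hardware, and ruleset, so re-measure with the harness before committing to an SLA.

```bash
#!/usr/bin/env bash
# Back-of-envelope estimate (hypothetical helper): seconds ≈ rows / rows_per_sec.

estimate_seconds() {
  local rows="$1" rows_per_sec="$2"
  awk -v r="$rows" -v t="$rows_per_sec" 'BEGIN { printf "%.0f\n", r / t }'
}

estimate_seconds 12000000 70000    # 12M rows at ~70K rows/s (20 cols): prints 171
```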


🏢 Enterprise Integration & Workflows

Git-Based Review Flow (Recommended)

# Create feature branch
git checkout -b add-suggested-rules

# Review and edit draft_rules.fin locally
# ... make changes ...

# Merge draft rules into main ruleset
cat draft_rules.fin >> my_rules.fin

# Commit & push
git add my_rules.fin
git commit -m "Add suggested rules for TESCO, AMAZON, UBER"
git push origin add-suggested-rules

# Open a Pull Request for review

CI/CD Validation (GitHub Actions)

Protect your main branch with automated rule testing:

name: Validate Rules
on: [pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install FinLang
        run: pip install "finlang[fastio]"
      - name: Validate Rules (strict, headless)
        run: |
          finlang --rules my_rules.fin \
                  --input test_data/sample.csv \
                  --output /dev/null \
                  --headless --strict-parse --audit-mode none

Windows note: Use NUL instead of /dev/null if running steps on Windows runners.

Integrity Verification

FinLang can verify that immutable fields (date, amount, counterparty) are unchanged between input and output using built-in SHA-256 fingerprinting:

# Fast verification (fingerprint only, console output)
finlang --input data.csv --output out.csv --rules rules.fin --verify

# Full verification (fingerprint + field comparison)
finlang --input data.csv --output out.csv --rules rules.fin --verify-full

# CI/CD pipeline: verify + save artifacts for audit trail
finlang --input data.csv --output out.csv --rules rules.fin --verify --verify-output-dir ./audit

Exit code 3 indicates a verification failure. Artifacts include verify_report.json, verify_proof.csv, and verify_mismatches.csv (on failure only).
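In a pipeline you usually want to branch on that exit code rather than let the step fail blindly. The sketch below is a hypothetical helper (handle_verify_exit is not part of FinLang). Exit code 3 and the artifact name come from this page. Treating 0 as success and passing every other non-zero code through unchanged are assumptions of the sketch.

```bash
#!/usr/bin/env bash
# Hypothetical CI helper around --verify. Assumes 0 = verified OK;
# 3 = verification failure (documented above); anything else passes through.

handle_verify_exit() {
  local code="$1" outdir="$2"
  case "$code" in
    0) echo "verify: OK" ;;
    3)
      echo "verify: FAILED (immutable fields differ between input and output)" >&2
      if [ -f "$outdir/verify_mismatches.csv" ]; then
        echo "see $outdir/verify_mismatches.csv" >&2
      fi
      return 3 ;;
    *) echo "verify: finlang exited with code $code" >&2; return "$code" ;;
  esac
}

# Usage in a pipeline step ("|| code=$?" keeps set -e shells from aborting):
# code=0
# finlang --input data.csv --output out.csv --rules rules.fin \
#   --verify --verify-output-dir ./audit || code=$?
# handle_verify_exit "$code" ./audit
```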

See verify.md for the full feature explainer (when to use it, output anatomy, limitations).


🔄 Reconciliation Workflow

Where governance expects an independent challenge to a categorisation pipeline — typically an ML model — --reconcile produces a row-by-row mismatch report with rule attribution and audit reason. This is the integration pattern.

Step 1 — Run FinLang against the same raw data the ML pipeline processed

finlang --input transactions.csv \
        --rules compliance.fin \
        --output finlang_out.csv \
        --audit audit.json --audit-mode full \
        --reconcile ml_categorised.csv \
        --reconcile-output-dir audit/ \
        --reconcile-html

--audit with --audit-mode full is required so that mismatch rows can carry the rule name and match condition. --reconcile-html is optional but recommended for compliance reports.

Step 2 — Read the report

| Artefact | Purpose | Read by |
|---|---|---|
| audit/reconcile_report.json | Machine-readable summary (status, match rate, mismatches count, audit_entries_loaded) | CI/CD assertions, dashboards |
| audit/reconcile_mismatches.csv | One row per disagreement: counterparty, ML's category, FinLang's category, rule name, audit reason | Auditors, compliance reviewers |
| audit/reconcile_report.html | Self-contained HTML view of the above, written when --reconcile-html is set (opens offline, no JS) | Compliance reports, archival, stakeholder review |

Exit code 3 indicates one or more mismatches. CI/CD should treat this as "review needed" — not "the data is broken" (that's exit code 1) and not "configuration is wrong" (exit code 2).

Step 3 — Decide what to do about the disagreements

--reconcile reports disagreements; it does not score them or judge which side is right. A human reads the mismatches CSV and decides.

The columns an auditor reads are the ones a black-box ML model does not expose: finlang_rule_matched and finlang_audit_reason. They are the load-bearing piece. Together they answer "why did FinLang reach a different conclusion?", which is what the regulator's challenger workflow needs.
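When the mismatch list is long, a practical triage order is to start with the rules behind the most disagreements. The sketch below tallies rows of reconcile_mismatches.csv by finlang_rule_matched. The helper name is hypothetical, and it assumes the header contains that column name unquoted and that no field contains an embedded comma; a CSV-aware tool is safer for free-text audit reasons.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper: count mismatches per FinLang rule, most first.
# Assumes an unquoted "finlang_rule_matched" header and no embedded commas.

mismatches_by_rule() {
  awk -F',' '
    { sub(/\r$/, "") }                       # tolerate CRLF line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "finlang_rule_matched") r = i
      next
    }
    {
      key = (r && $r != "") ? $r : "(no rule)"
      n[key]++
    }
    END { for (k in n) printf "%d,%s\n", n[k], k }
  ' "$1" | sort -t',' -k1,1nr -k2,2
}

# Usage: mismatches_by_rule audit/reconcile_mismatches.csv
```

The tally only orders the review; deciding which side is right remains a human judgement.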

CI/CD pattern

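EXIT=0
# "|| EXIT=$?" stops set -e shells (e.g. GitHub Actions' default bash) aborting on exit 3
finlang ... --reconcile ml_out.csv --reconcile-output-dir ./audit || EXIT=$?
if [ "$EXIT" -eq 3 ]; then
  echo "Reconciliation surfaced disagreements. See audit/reconcile_mismatches.csv"
  # Pipeline policy: notify reviewer, do NOT auto-block downstream
elif [ "$EXIT" -ne 0 ]; then
  exit "$EXIT"   # 1 = data problem, 2 = configuration problem
fi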

See reconciliation.md for the full feature explainer.


✅ Rollout Checklist

Phase 1: Pilot (Week 1–2)

  • Install FinLang in test environment
  • Validate with 3 months historical data
  • Train 2–3 power users
  • Create initial ruleset

Phase 2: Department (Week 3–4)

  • Deploy to finance team (10–20 users)
  • Set up Git repository for rules
  • Establish Growth Loop cadence
  • Document standard workflows

Phase 3: Enterprise (Month 2–3)

  • CI/CD pipeline integration
  • Audit log storage & retention
  • Multi-team collaboration model
  • SLA definition & monitoring

Phase 4: Scale (Month 3+)

  • Automated daily runs
  • Dashboard/metrics reporting
  • Cross-department rule sharing
  • Rule pack marketplace / internal packs

📚 Related Documentation

  • install.md — Getting started quickly
  • flags.md — All CLI flags & canonical formats
  • i18n_examples.md — Regional format recipes
  • mapping_guide.md — Align headers to the canonical schema
  • amount_synthesis.md — Debit/credit synthesis logic
  • rule_language.md — Write and test rules
  • growth_loop_best_practices.md — 3-step discovery workflow
  • cli_reference.md — Complete command reference
  • benchmarks.md — Performance data and methodology

© FinLang Ltd. All rights reserved.
