📊 FinLang Benchmarks
Applies to: FinLang v0.7.7+
Status: Reference
Last verified: v0.7.7 (11 April 2026)
This guide presents validated benchmark data for FinLang v0.7.7, tested on a real developer workstation.
⚙️ Test Environment
- CPU: Intel i7‑12700T (12th Gen)
- RAM: 48 GB
- OS: Windows 11 (64‑bit)
- Python: 3.13.7 (64‑bit)
- Backend: FastIO (PyArrow 21.0.0)
- FinLang: 0.7.7 (installed from PyPI)
Your absolute numbers may differ based on CPU, storage, and OS. Focus on shape (linear scaling) and relative performance.
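To compare your own numbers against the tables in this guide, it helps to record the same fields as the list above. A minimal sketch using only the standard library; the `describe_environment` and `installed_version` helpers are hypothetical and not part of FinLang:

```python
import platform
from importlib import metadata


def installed_version(package: str) -> str:
    """Return the installed version of a package, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def describe_environment() -> dict[str, str]:
    """Collect the fields reported in the Test Environment list."""
    return {
        "CPU": platform.processor() or platform.machine(),
        "OS": f"{platform.system()} {platform.release()}",
        "Python": platform.python_version(),
        "PyArrow": installed_version("pyarrow"),
        "FinLang": installed_version("finlang"),
    }


if __name__ == "__main__":
    for field, value in describe_environment().items():
        print(f"- {field}: {value}")
```

RAM is left out because the standard library has no portable way to query it.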
🚀 v0.7.7 Performance Highlights
v0.7.7 contains a hot-path bug fix in _to_number (the function that runs on every amount value). The CR/DR detection regex contained an unnecessary \b word boundary that was both:
- Producing wrong results on no-space CR/DR formats like 200DR (silently emitting +200 instead of -200, latent since v0.6.4)
- Costing measurable runtime by forcing per-character boundary assertions on every regex evaluation
Removing the \b aligned the detection regex with its sibling stripping regex (which never had \b and worked correctly). The fix only deletes the \b assertions from the pattern. The performance gain is a direct side effect.
Headline numbers:
- Standard mode: ~180K rows/sec (steady-state from 5M rows upwards)
- FastIO mode: ~217K rows/sec at 20M rows
- 20M rows in ~90 seconds (FastIO), full field-by-field SHA-256 verified
- +30-50% FastIO throughput improvement vs v0.7.6 on the integrity harness (+24-46% in standard mode)
🧪 Benchmark Harnesses
1) Single Ruleset Harness — scaling across Rows × Cols
Evaluates performance for a single static ruleset over a grid of dataset sizes and column widths.
python -m benchmarks.bench_finlang_harness `
--mode full-cli `
--run-fin "finlang --fastio --audit-mode none --headless" `
--rules examples/rules.demo.fin `
--include-pack retail,transport,subs `
--rows 25000 50000 100000 200000 `
--cols 5 20 35 50 `
--runs 3 `
--final-rows 1000000 5000000 `
--outdir bench_out_v077
Outputs:
- bench_surface.png: 3D runtime surface
- bench_heatmap.png: runtime heatmap
- bench_results.csv: raw timings (averaged over 3 runs)
- bench_big.csv: finals (1M and 5M rows)
2) Ruleset Comparison Harness — small vs medium vs large rules
Measures runtime impact of ruleset size/complexity.
python -m benchmarks.bench_finlang_rulesets `
--run-fin "finlang --fastio --audit-mode none --headless" `
--rules-set Small:examples/rules.demo.fin `
--rules-set Medium:src/finlang/rulepacks/01-vendors-retail.fin `
--rules-set Large:src/finlang/rulepacks/03-subscriptions.fin `
--include-pack transport `
--grid-rows 25000 50000 100000 200000 `
--grid-cols 5 20 35 50 `
--repeats 3 `
--final-rows 1000000 5000000 `
--outdir bench_out_rulesets_v077
Outputs:
- heatmap_Small.png, heatmap_Medium.png, heatmap_Large.png
- results_all.csv, finales.csv
3) Integrity Test — cryptographic verification at scale
Validates data integrity with SHA-256 fingerprinting. Proves zero data corruption or cross-row contamination.
# Default: 5K rows, fingerprint-only (daily use)
python integrity_testv2.py
# Full validation: field-by-field + fingerprint
python integrity_testv2.py --full
# Scale testing
python integrity_testv2.py --rows 20000000 --full
# Generate annotated proof CSV (demo/audit collateral)
python integrity_testv2.py --rows 500000 --full --proof
What it proves:
- Row count preserved (no dropped/duplicated rows)
- Immutable fields unchanged (date, amount, counterparty)
- SHA-256 fingerprint per row validates no cross-row contamination
- Both code paths (standard + PyArrow) produce identical results
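The checks in this list can be illustrated with a small sketch. The fingerprint scheme that integrity_testv2.py actually uses is not shown in this guide, so the field order, the `|` separator, and the helper names below are assumptions:

```python
import hashlib

IMMUTABLE_FIELDS = ("date", "amount", "counterparty")


def fingerprint(row: dict[str, str]) -> str:
    """SHA-256 over the immutable fields, joined with a separator (assumed)."""
    payload = "|".join(row[name] for name in IMMUTABLE_FIELDS)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify(rows_in: list[dict[str, str]], rows_out: list[dict[str, str]]) -> list[str]:
    """Return a list of problems; an empty list means the output passed."""
    problems = []
    if len(rows_in) != len(rows_out):
        problems.append(f"row count changed: {len(rows_in)} -> {len(rows_out)}")
    for index, (before, after) in enumerate(zip(rows_in, rows_out)):
        # The fingerprint was stored at generation time; recompute it from
        # the output row. A mismatch means a field changed or rows got mixed.
        if fingerprint(after) != before["fingerprint"]:
            problems.append(f"row {index}: fingerprint mismatch")
    return problems
```

Because the fingerprint travels with each row, a value copied from a neighbouring row fails the check even when it is a perfectly valid amount on its own.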
📈 Validated Results (v0.7.7)
Single Ruleset Performance — Grid (3-run average)
| Rows × Cols | Runtime (s) | Throughput (rows/s) |
|---|---|---|
| 25K × 5 | 0.69 | 36,200 |
| 25K × 20 | 0.95 | 26,300 |
| 25K × 35 | 1.20 | 20,800 |
| 25K × 50 | 1.51 | 16,600 |
| 50K × 5 | 0.75 | 66,800 |
| 50K × 20 | 1.29 | 38,800 |
| 50K × 35 | 1.85 | 27,100 |
| 50K × 50 | 2.37 | 21,100 |
| 100K × 5 | 0.92 | 108,500 |
| 100K × 20 | 1.98 | 50,600 |
| 100K × 35 | 3.08 | 32,500 |
| 100K × 50 | 4.18 | 23,900 |
| 200K × 5 | 1.24 | 160,800 |
| 200K × 20 | 3.38 | 59,200 |
| 200K × 35 | 5.61 | 35,700 |
| 200K × 50 | 7.96 | 25,100 |
Single Ruleset Performance — Finals (3-run average)
| Rows × Cols | Runtime (s) | Throughput (rows/s) |
|---|---|---|
| 1M × 5 | 3.84 | 260,400 |
| 1M × 20 | 15.04 | 66,500 |
| 1M × 50 | 36.28 | 27,600 |
| 5M × 5 | 17.66 | 283,100 |
| 5M × 20 | 71.55 | 69,900 |
| 5M × 50 | 179.27 | 27,900 |
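One way to read the finals is to compare runtimes at 5M and 1M rows for each column width. The sketch below uses only the figures from the table; perfectly linear scaling would give a ratio of exactly 5:

```python
# Finals table: runtime in seconds keyed by (rows, cols)
finals = {
    (1_000_000, 5): 3.84,
    (1_000_000, 20): 15.04,
    (1_000_000, 50): 36.28,
    (5_000_000, 5): 17.66,
    (5_000_000, 20): 71.55,
    (5_000_000, 50): 179.27,
}

ratios = {}
for cols in (5, 20, 50):
    # 5x the rows at the same column width
    ratios[cols] = finals[(5_000_000, cols)] / finals[(1_000_000, cols)]
    print(f"{cols:>2} cols: 5M/1M runtime ratio = {ratios[cols]:.2f}")
```

All three ratios land between about 4.6 and 4.9, slightly under 5, which is consistent with near-linear scaling plus a small fixed cost that matters less as the dataset grows.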
Ruleset Comparison (5M × 50)
| Ruleset | Runtime (s) | Throughput (rows/s) |
|---|---|---|
| Small | 173.28 | 28,900 |
| Medium | 175.43 | 28,500 |
| Large | 174.97 | 28,600 |
Key finding: ~1.2% spread between the fastest and slowest ruleset; rule complexity has negligible impact at scale.
All three rulesets land within ~2 seconds of each other at the 5M × 50 extreme. The engine's hot path is ruleset-shape-independent at scale.
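The ~1.2% and ~2 second figures follow directly from the table. A short sketch of the arithmetic:

```python
# Ruleset comparison at 5M x 50: runtime in seconds
runtimes = {"Small": 173.28, "Medium": 175.43, "Large": 174.97}

fastest = min(runtimes.values())
slowest = max(runtimes.values())

spread_s = slowest - fastest
spread_pct = spread_s / fastest * 100

print(f"spread: {spread_s:.2f} s ({spread_pct:.2f}% of the fastest run)")
```

Note that Medium, not Large, is the slowest of the three, so the ordering does not track ruleset size.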
🔐 Integrity Test Results (v0.7.7)
Cryptographic verification using SHA-256 fingerprints on every row.
Performance by Scale
| Rows | Generation | Engine (std) | Engine (fast) | Validation (full) | Total |
|---|---|---|---|---|---|
| 5M | 18.1s | 27.9s (178,903 rows/s) | 25.2s (198,448 rows/s) | ~3.1m | ~5 min |
| 10M | 37.4s | 56.0s (178,511 rows/s) | 46.7s (214,136 rows/s) | ~6.0m | ~10 min |
| 20M | 1.2m | 1.8m (181,566 rows/s) | 1.5m (217,068 rows/s) | ~11.8m | ~18 min |
20M Row Validation — Full Output
=== FinLang Data Integrity Test (Python) ===
Row count: 20,000,000
Validation mode: Full (field-by-field + fingerprint)
PyArrow available: Yes
Proof output: No
[1/6] Generating 20,000,000 test rows with fingerprints... OK (1.2m)
[2/6] Creating test rules... OK
Loading input data for validation... OK (20.9s)
[3/6] Running FinLang engine (standard)... OK (1.8m, 181,566 rows/s)
[4/6] Validating integrity (standard, full)... OK (20,000,000 categorized, 5.9m)
[5/6] Running FinLang engine (--fastio)... OK (1.5m, 217,068 rows/s)
[6/6] Validating integrity (--fastio, full)... OK (20,000,000 categorized, 5.9m)
=== Integrity Test PASSED ===
Rows tested: 20,000,000
Immutable fields verified: date, amount, counterparty
Fingerprints validated: 20,000,000 (no cross-row contamination)
Validation mode: Full (field-by-field + fingerprint)
Standard mode: PASS (181,566 rows/s)
FastIO mode: PASS (217,068 rows/s)
Why Integrity Test Shows Higher Throughput
The integrity test uses a minimal 6-column schema (date, amount, counterparty, memo, category, fingerprint) versus the benchmark harness's 50-column enterprise schema.
| Test Type | Columns | Throughput (FastIO) |
|---|---|---|
| Integrity test | 6 | 217K rows/s |
| Enterprise benchmark | 50 | ~28K rows/s |
This is expected: narrower data = less I/O, less memory pressure, faster processing. Both numbers are valid for their respective use cases. The integrity test isolates engine performance from CSV I/O overhead; the enterprise harness measures real-world end-to-end throughput.
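A back-of-envelope way to reconcile the two figures is to normalise by column count. "Cells per second" is not a metric either harness reports; it is used here only as an illustration, with the throughput values quoted in this guide:

```python
# FastIO throughput figures for v0.7.7 quoted in this guide
runs = {
    "integrity": {"rows_per_s": 217_068, "cols": 6},    # 20M-row integrity run
    "enterprise": {"rows_per_s": 27_900, "cols": 50},   # 5M x 50 finals row
}

cells = {}
for name, run in runs.items():
    cells[name] = run["rows_per_s"] * run["cols"]
    print(f"{name:<10} {cells[name] / 1e6:.2f}M cells/s")
```

Both runs work out to roughly 1.3 to 1.4 million cells per second, so most of the gap in rows/s is explained by row width alone. The two harnesses use different data and rules, so this is a sanity check and not a like-for-like comparison.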
📊 Version Comparison
Integrity Harness (standard and FastIO, full validation)
| Scale | v0.7.6 (std) | v0.7.7 (std) | Std Δ | v0.7.6 (fast) | v0.7.7 (fast) | Fast Δ |
|---|---|---|---|---|---|---|
| 5M | 128K rows/s | 179K rows/s | +39.8% | 133K rows/s | 198K rows/s | +49.2% |
| 10M | 122K rows/s | 179K rows/s | +46.3% | 156K rows/s | 214K rows/s | +37.3% |
| 20M | 146K rows/s | 182K rows/s | +24.4% | 167K rows/s | 217K rows/s | +30.0% |
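The Δ columns can be reproduced from the rounded v0.7.6 figures in this table and the exact v0.7.7 figures in the integrity results table. A short sketch:

```python
# rows/s as (v0.7.6 rounded, v0.7.7 exact)
runs = {
    "5M std": (128_000, 178_903),
    "5M fast": (133_000, 198_448),
    "10M std": (122_000, 178_511),
    "10M fast": (156_000, 214_136),
    "20M std": (146_000, 181_566),
    "20M fast": (167_000, 217_068),
}

deltas = {}
for label, (old, new) in runs.items():
    # Relative throughput change between versions, in percent
    deltas[label] = (new / old - 1) * 100
    print(f"{label:<9} {deltas[label]:+.1f}%")
```

Since the v0.7.6 values are rounded to the nearest 1K rows/s, the deltas are only meaningful to about one decimal place.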
Single Ruleset Harness (5M × 50)
| Version | Runtime (s) | Throughput |
|---|---|---|
| v0.6.4 | 208.31 | ~24K rows/s |
| v0.7.2 | 187.76 | ~26.6K rows/s |
| v0.7.4post1 | 187.90 | ~26.6K rows/s |
| v0.7.7 | 179.27 | ~27.9K rows/s |
Cumulative gain v0.6.4 → v0.7.7: -14% runtime, +16% throughput on the enterprise harness.
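The runtime and throughput percentages differ because throughput is the reciprocal of runtime for a fixed row count, so a 14% drop in runtime is not a 14% rise in throughput. A quick check with the table's figures:

```python
ROWS = 5_000_000

runtime_v064 = 208.31  # seconds at 5M x 50
runtime_v077 = 179.27

runtime_change = (runtime_v077 / runtime_v064 - 1) * 100
# Throughput = ROWS / runtime, so the ratio is inverted
throughput_change = (runtime_v064 / runtime_v077 - 1) * 100

print(f"runtime    : {runtime_change:+.1f}%")
print(f"throughput : {throughput_change:+.1f}%")
print(f"v0.7.7 rate: {ROWS / runtime_v077:,.0f} rows/s")
```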
What Changed in v0.7.7
The improvement is dominated by a single hot-path fix in _to_number:
# v0.6.4 - v0.7.6 (broken on no-space CR/DR formats)
r'\b(?:CR|CRED|CREDIT)\b\.?\s*$'
# v0.7.7 (correct)
r'(?:CR|CRED|CREDIT)\.?\s*$'
The \b word boundary was inconsistent with the sibling stripping regex (which never had \b), causing 200DR to be silently parsed as +200 instead of -200. Removing it:
- Fixed the correctness bug
- Removed per-character boundary assertions from a regex evaluated tens of millions of times per dataset
- Delivered a 30-50% FastIO throughput improvement on the integrity harness (24-46% in standard mode)
The performance gain is a direct side effect of the bug fix, not a separate optimisation.
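The correctness half of the fix is easy to reproduce with the two patterns quoted above. This guide quotes only the CR pattern, so the sketch tests 200CR; the DR case presumably fails the same way, but that pattern is not shown here. No regex flags are assumed:

```python
import re

# Patterns exactly as quoted in this section
OLD_CR = re.compile(r'\b(?:CR|CRED|CREDIT)\b\.?\s*$')
NEW_CR = re.compile(r'(?:CR|CRED|CREDIT)\.?\s*$')

samples = ["200 CR", "200 CR.", "200CR", "200"]

results = {}
for text in samples:
    old_hit = OLD_CR.search(text) is not None
    new_hit = NEW_CR.search(text) is not None
    results[text] = (old_hit, new_hit)
    print(f"{text!r:<10} old={old_hit!s:<5} new={new_hit}")
```

Only the no-space form differs: in 200CR the digit 0 and the letter C are both word characters, so there is no word boundary between them and the old pattern cannot match.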
🖼️ Visual Results
| Visualization | Description |
|---|---|
| bench_surface.png | 3D runtime surface (Rows × Cols) |
| bench_heatmap.png | Runtime heatmap (Rows × Cols) |
| heatmap_Small.png | Ruleset comparison: Small |
| heatmap_Medium.png | Ruleset comparison: Medium |
| heatmap_Large.png | Ruleset comparison: Large |
🔬 Methodology Notes
- 3 runs per point, average reported to smooth variance
- Warm runs (Chrome closed, system idle)
- --audit-mode none to measure engine speed (no audit overhead)
- PyArrow (--fastio) enabled for CSV I/O
- Deterministic data generation, reproducible CLI scripts
- All v0.7.7 runs from PyPI-installed package (pip install "finlang[fastio]")
- Run-to-run variance under 1.5% on the 5M × 50 finale (179.27s averaged across 3 runs)
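The repeat-and-average approach can be sketched in a few lines. This is not the harness code; `time_command` is a hypothetical helper, and since the guide does not say how run-to-run variance is defined, the spread below (slowest minus fastest, relative to the mean) is an assumption:

```python
import statistics
import subprocess
import sys
import time


def time_command(command: list[str], runs: int = 3) -> dict[str, float]:
    """Run a command several times and summarise wall-clock runtime."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    mean = statistics.fmean(samples)
    return {
        "mean_s": mean,
        # Spread between slowest and fastest run, relative to the mean
        "spread_pct": (max(samples) - min(samples)) / mean * 100,
    }


if __name__ == "__main__":
    # Placeholder command; substitute the FinLang invocation being measured
    print(time_command([sys.executable, "-c", "pass"]))
```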
💡 Practical Interpretation
| Scenario | Example Dataset | Expected Runtime |
|---|---|---|
| Personal finance | 25K × 20 | < 1s |
| Small business | 500K × 35 | ~14s |
| Enterprise ledger | 5M × 50 | ~3 min |
| Full year bank data | 20M × 6 | ~18 min (with integrity verification) |
Rule of thumb: FinLang scales linearly — doubling rows ≈ doubling runtime; increasing columns raises evaluation cost predictably.
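The rule of thumb translates into a simple estimate: take the nearest measured point with the same column width and scale by row count. A minimal sketch (`estimate_runtime` is a hypothetical helper, not a FinLang API):

```python
def estimate_runtime(target_rows: int, measured_rows: int, measured_s: float) -> float:
    """Scale a measured runtime linearly to a different row count."""
    return measured_s * target_rows / measured_rows


# Small-business scenario: scale the 200K x 35 grid result (5.61 s) to 500K rows
small_business_s = estimate_runtime(500_000, 200_000, 5.61)
print(f"500K x 35: ~{small_business_s:.0f} s")
```

This lands on the ~14s quoted in the table. The estimate holds only when the column width matches the measured point; for very small datasets, fixed startup cost makes pure linear scaling optimistic.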
🧩 Troubleshooting
| Issue | Symptom | Fix |
|---|---|---|
| "PyArrow missing" | ImportError: No module named pyarrow | pip install "finlang[fastio]" |
| "Encoding error" | Garbled text | Add --encoding auto |
| Slow performance | Lower than expected throughput | Ensure --fastio is present; close background apps |
| CPU power limits | Runtime > expected | Disable power saving / thermal throttling |
📚 Related Documentation
- CLI Reference — Complete command reference
- Runtime Contract — Backend selection logic
- Flags — Full CLI flags and canonical formats
- Workflows — End‑to‑end workflow guide
- Release Notes v0.7.7 — Bug fix and performance details




