🌐 FinLang HTTP API
Status: v0.1 (SOL-041 MVP)
Applies to: FinLang with the [api] extras installed (pip install finlang[api])
The FinLang API is a thin REST surface over the published CLI. Categorise transactions, discover counterparties, and generate draft rules without leaving HTTP — same engine, same audit trail, with curated HTTP parameters mapping to the most-used CLI flags. It is not a SaaS, not a hosted service, not a replacement for the CLI: it makes the same deterministic engine reachable over HTTP for buyers, integrators, and demo widgets that evaluate FinLang as a deployable service rather than a Python tool.
🎯 Quick Navigation
I want to…
- Understand when the API fits my workflow → Bidirectional When / When NOT
- See the API in action → curl POST /process end-to-end
- Look up an endpoint's exact form fields and response schema → Reference doc
- Understand auth and configuration → Env vars, X-API-Key
- Map HTTP status codes to engine exit codes → 200 / 422 / 500 table
New to FinLang? Start with install.md and the Daily Run workflow. The API runs on top of a working FinLang install — it isn't first-touch.
✅ When to Use
- Wrapping FinLang in your service stack. The API is the natural surface when FinLang sits behind a downstream service that already speaks HTTP. Drop in a microservice, point your existing client at it.
- Demoing FinLang to a buyer or integrator. A POST /process with curl on a sample CSV beats a "let me show you the CLI flags" walkthrough. A buyer evaluates a "service you can deploy" differently from a "CLI you install."
- Powering an interactive demo widget. The website widget can hit /process against pre-baked input and render the response. A prospect experiences the product before speaking to anyone.
- Multi-language client integration. Anything that can POST a multipart form can call FinLang. No Python required on the caller's side.
- Containerised deployment. Run finlang-api inside a Docker container, expose port 8000, and put nginx or Caddy in front. It ships in the same shape as any other Python service.
❌ When NOT to Use
- Single-user batch jobs on your own machine. The CLI is faster and produces the same output. The API adds HTTP overhead for no gain.
- You expect a SaaS. No multi-tenancy, no persistent storage, no async job queue, no rate limiting, no metering. Those concerns belong to a hosted-service layer above this wrapper. The wrapper itself is intentionally simple.
- Long-running streaming jobs. Each request runs synchronously to completion within the configured timeout (300s default). For multi-hour batches, run the CLI directly.
- Anything requiring per-request encryption keys, fine-grained authz, or audit-of-the-API-itself. The wrapper is single-process with one optional API key. Production deployments behind a reverse proxy can layer those concerns; the API itself does not.
🔄 The Request Flow
┌─────────────────────┐
│  Your client        │
│  (curl, browser,    │
│  downstream svc)    │
└──────────┬──────────┘
           │ POST /process
           │ multipart form
           ▼
┌─────────────────────┐         ┌─────────────────────┐
│  FastAPI app        │ ──────► │  Temp directory     │
│  (uvicorn, single   │  stage  │  input.csv          │
│  process)           │  files  │  rules.fin          │
└──────────┬──────────┘         │  audit.json         │
           │                    └─────────────────────┘
           │ subprocess.run([finlang, --input, ...])
           ▼
┌─────────────────────┐
│  finlang CLI        │
│  (fresh process,    │
│  same engine)       │
└──────────┬──────────┘
           │ exits 0 / 1 / 2 / 3
           ▼
┌─────────────────────┐
│  FastAPI app        │
│  reads outputs,     │
│  maps exit code,    │
│  returns JSON       │
└──────────┬──────────┘
           │ HTTP 200 / 422 / 500
           ▼
┌─────────────────────┐
│  Your client        │
│  receives:          │
│  output_csv,        │
│  audit, stats       │
└─────────────────────┘
The subprocess boundary is load-bearing. The API never imports finlang.engine.*. Every request runs the published CLI as a fresh child process — same binary your end users run from a terminal. Failures are isolated; engine state can't leak between requests.
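The subprocess boundary can be sketched in a few lines of Python. This illustrates the pattern only and is not FinLang's implementation: `run_staged` and its `build_command` callback are hypothetical names, and the CLI argument list is left to the caller because only `--input` appears in the diagram.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable


def run_staged(
    files: dict[str, bytes],
    build_command: Callable[[dict[str, Path]], list[str]],
    timeout: float = 300.0,
) -> tuple[int, str, str]:
    """Stage uploads in a throwaway directory, then run a fresh child process.

    Hypothetical helper: build_command receives the staged paths and returns
    the argv to execute (for FinLang, the published CLI and its flags).
    The directory and every staged file are deleted when the call returns.
    """
    with tempfile.TemporaryDirectory() as tmp:
        staged: dict[str, Path] = {}
        for name, payload in files.items():
            path = Path(tmp) / name
            path.write_bytes(payload)
            staged[name] = path
        # A new process per request: nothing in memory survives to the next call.
        # subprocess.TimeoutExpired propagates to the caller if the limit is hit.
        proc = subprocess.run(
            build_command(staged),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return proc.returncode, proc.stdout, proc.stderr
```

Because the caller only sees an exit code, stdout, and stderr, a crash in the child process cannot corrupt the state of the process serving HTTP.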
📍 Worked Example
A 5-row sample CSV, two rules, a single curl command.
# transactions.csv
date,counterparty,amount,memo
2024-01-15,TESCO STORES 1234,-45.20,GROCERIES
2024-01-16,SHELL FUEL,-65.00,FORECOURT
2024-01-17,UBER TRIP,-12.50,RIDE
2024-01-18,SALARY ACME LTD,3200.00,JAN PAY
2024-01-19,UNKNOWN VENDOR XYZ,-19.99,MEMO
# rules.fin
rule "GROCERIES: Tesco" {
  match:
    - counterparty ~ "*TESCO*"
  set:
    - category = "Groceries"
}
rule "FUEL: Shell" {
  match:
    - counterparty ~ "*SHELL*"
  set:
    - category = "Fuel"
}
Start the API and POST:
finlang-api &
curl -s -X POST http://localhost:8000/process \
-F "input_csv=@transactions.csv" \
-F "rules=@rules.fin" \
-F "audit_mode=lite"
Response:
{
  "output_csv": "date,counterparty,amount,memo,category,...\n2024-01-15,TESCO STORES 1234,-45.20,GROCERIES,Groceries,...\n2024-01-16,SHELL FUEL,-65.00,FORECOURT,Fuel,...\n...",
  "audit": [
    {"row": 0, "rule": "GROCERIES: Tesco", "changes": {"category": "Groceries"}},
    {"row": 1, "rule": "FUEL: Shell", "changes": {"category": "Fuel"}}
  ],
  "stats": {
    "rows_in": 5,
    "rows_out": 5,
    "audit_entries": 2,
    "duration_seconds": 0.0612,
    "exit_code": 0
  },
  "stderr": ""
}
Two rules fired (Tesco, Shell). Three rows were left uncategorised — FinLang doesn't assign a default; rows that match no rule keep an empty category field. The full output CSV is returned as a string in the response — pipe it to a file, render it in a UI, or pass it to the next stage of your pipeline.
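To make the "three rows left uncategorised" observation concrete, here is a minimal Python sketch that takes a /process response body and separates categorised rows from uncategorised ones. The sample payload is illustrative (the response shown above elides columns with `...`), and `split_by_category` is a hypothetical helper name; only the `output_csv` key and the `category` column come from the example.

```python
import csv
import io


def split_by_category(response: dict) -> tuple[list[dict], list[dict]]:
    """Return (categorised, uncategorised) rows from a /process response body."""
    reader = csv.DictReader(io.StringIO(response["output_csv"]))
    categorised, uncategorised = [], []
    for row in reader:
        # Rows that matched no rule keep an empty category field.
        has_category = bool((row.get("category") or "").strip())
        (categorised if has_category else uncategorised).append(row)
    return categorised, uncategorised


# Illustrative payload shaped like the response above (columns trimmed).
sample = {
    "output_csv": (
        "date,counterparty,amount,memo,category\n"
        "2024-01-15,TESCO STORES 1234,-45.20,GROCERIES,Groceries\n"
        "2024-01-16,SHELL FUEL,-65.00,FORECOURT,Fuel\n"
        "2024-01-17,UBER TRIP,-12.50,RIDE,\n"
        "2024-01-18,SALARY ACME LTD,3200.00,JAN PAY,\n"
        "2024-01-19,UNKNOWN VENDOR XYZ,-19.99,MEMO,\n"
    )
}

done, todo = split_by_category(sample)
print(len(done), len(todo))  # 2 3
print([row["counterparty"] for row in todo])
```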
🌍 Locale flags inherited: The same i18n flags that the engine honours (decimal, thousands, dayfirst, encoding, output_encoding) are form fields on /process. European-format input doesn't need a separate code path; it's a flag.
⚙️ Endpoints at a Glance
| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | / | HTML landing page → /docs | no |
| GET | /health | Liveness check + version + cli_resolved | no |
| POST | /process | Categorise transactions; optional --verify | yes |
| POST | /discover | Find uncategorised counterparties | yes |
| POST | /suggest | Generate draft .fin rules from candidates | yes |
| POST | /reconcile | Reconcile against ML output; returns JSON summary + (optional) HTML report | yes |
For the full form-field schema on each endpoint, response shapes, and curl recipes, see api_reference.md. Interactive Swagger UI is always live at http://localhost:8000/docs while finlang-api is running.
⚠️ /reconcile exit code semantics: unlike /process, where exit code 3 (verify mismatch) maps to HTTP 422, /reconcile maps exit 3 → HTTP 200 with mismatches surfaced in the response body. Finding mismatches is the expected outcome of reconciliation, not an error. The caller reads stats.mismatches_found and summary.mismatches to know what happened. Only ops errors (exit 1 → 500) and validation errors (exit 2 → 422) map to error statuses on this endpoint.
🌐 /reconcile?format=html shortcut: for human inspection of the HTML report, append ?format=html to the POST URL and the API returns the HTML directly with Content-Type: text/html, with no JSON unwrapping and no escape-character cleanup. Save the response with curl -o report.html, then open the file in a browser. Requires reconcile_html=true. The default, format=json, returns the full ReconcileResponse (existing behaviour).
🔐 Auth and Configuration
Auth
Auth is opt-in via env var. Set FINLANG_API_KEY in the process environment, and every non-/health endpoint requires the matching X-API-Key header on incoming requests.
export FINLANG_API_KEY="your-secret-string"
finlang-api &
curl -H "X-API-Key: your-secret-string" \
-X POST http://localhost:8000/process \
-F "input_csv=@transactions.csv" \
-F "rules=@rules.fin"
If FINLANG_API_KEY is unset, auth is disabled (dev mode). /health is always public regardless of auth state — useful for liveness probes behind a load balancer.
For production, set the key, rotate it on a schedule, and put TLS termination in front (nginx, Caddy, your cloud provider's LB). Per-tenant keys, OAuth, and JWT are explicitly out of scope for this wrapper — those concerns belong to a hosted service layer above.
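The opt-in auth rule reduces to a small decision function. The sketch below is one way to express it in Python, assuming a constant-time comparison via `hmac.compare_digest`; `is_authorised` and its `public_paths` parameter are hypothetical names, and this is not taken from FinLang's source.

```python
import hmac
from typing import Optional, Sequence


def is_authorised(
    path: str,
    presented_key: Optional[str],
    configured_key: Optional[str],
    public_paths: Sequence[str] = ("/health",),  # per the prose above; extend if needed
) -> bool:
    """Decide whether a request may proceed under the opt-in API key scheme."""
    if not configured_key:
        return True   # FINLANG_API_KEY unset: auth disabled (dev mode)
    if path in public_paths:
        return True   # liveness probe stays public
    if presented_key is None:
        return False  # X-API-Key header missing
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(presented_key.encode(), configured_key.encode())


print(is_authorised("/process", "your-secret-string", "your-secret-string"))  # True
print(is_authorised("/process", None, "your-secret-string"))                  # False
```

A request that fails this check corresponds to the 401 Unauthorized status listed further down.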
Configuration
| Env var | Default | Purpose |
|---|---|---|
| FINLANG_API_KEY | unset (auth disabled) | When set, all non-health endpoints require X-API-Key: <key> |
| FINLANG_API_HOST | 127.0.0.1 | Bind host for the finlang-api script |
| FINLANG_API_PORT | 8000 | Bind port |
| FINLANG_API_TIMEOUT | 300 | Subprocess timeout in seconds |
| FINLANG_API_MAX_UPLOAD | 104857600 | Max upload size in bytes (100 MiB) |
| FINLANG_API_LOG_LEVEL | info | Uvicorn log level |
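For integrators who want to mirror these settings in their own tooling, the table can be read as a small settings loader. This is a sketch under stated assumptions: `ApiSettings` and `load_settings` are hypothetical names, numeric values are parsed as integers, and an empty FINLANG_API_KEY is treated the same as an unset one.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ApiSettings:
    api_key: Optional[str]
    host: str
    port: int
    timeout_seconds: int
    max_upload_bytes: int
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> ApiSettings:
    """Read the documented FINLANG_API_* variables, applying the table's defaults."""
    return ApiSettings(
        api_key=env.get("FINLANG_API_KEY") or None,  # assumption: empty means unset
        host=env.get("FINLANG_API_HOST", "127.0.0.1"),
        port=int(env.get("FINLANG_API_PORT", "8000")),
        timeout_seconds=int(env.get("FINLANG_API_TIMEOUT", "300")),
        max_upload_bytes=int(env.get("FINLANG_API_MAX_UPLOAD", "104857600")),
        log_level=env.get("FINLANG_API_LOG_LEVEL", "info"),
    )


defaults = load_settings({})
print(defaults.max_upload_bytes == 100 * 1024 * 1024)  # True
```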
⚠️ Trap to know about: the API does not rate-limit, throttle, or cap concurrent requests. A single finlang-api process serves requests from one Uvicorn worker. For production loads, run multiple workers (uvicorn --workers N) and put a reverse proxy in front. The API is a service surface, not a SaaS; sizing it is the operator's job.
🚦 HTTP Status ↔ Exit Code Mapping
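The engine returns four exit codes; the API maps them to clean HTTP statuses. The table below shows the mapping for /process; /reconcile differs only on exit code 3, which it returns as HTTP 200.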
| Engine exit code | HTTP status | Meaning |
|---|---|---|
| 0 | 200 OK | Engine succeeded; output CSV + audit + stats returned |
| 1 | 500 Internal Server Error | Ops error: file not found, IO failure, unexpected crash |
| 2 | 422 Unprocessable Entity | Validation/parse error: malformed CSV, bad flag combination, missing required field |
| 3 | 422 Unprocessable Entity | Verify mismatch (when --verify or --verify-full is on) |
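The mapping, including the /reconcile exception for exit code 3, fits in one function. This is a minimal sketch rather than the API's actual code: `http_status_for` is a hypothetical name, and raising on an undocumented exit code is an assumption about how unexpected values might be handled.

```python
def http_status_for(exit_code: int, endpoint: str = "/process") -> int:
    """Map an engine exit code to the HTTP status described in this document."""
    if exit_code == 0:
        return 200
    if exit_code == 1:
        return 500  # ops error
    if exit_code == 2:
        return 422  # validation/parse error
    if exit_code == 3:
        # On /reconcile, mismatches are the expected result and ride along in
        # the body; on /process, exit 3 is a verify mismatch.
        return 200 if endpoint == "/reconcile" else 422
    raise ValueError(f"undocumented exit code: {exit_code}")


print(http_status_for(3))                # 422
print(http_status_for(3, "/reconcile"))  # 200
```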
Other HTTP statuses the API can return:
- 400 Bad Request: input validation failed at the API layer (e.g. neither rules nor include_pack provided to /process)
- 401 Unauthorized: auth is required (env var set) and the request is missing X-API-Key or carries the wrong value
- 413 Request Entity Too Large: upload exceeds FINLANG_API_MAX_UPLOAD
- 503 Service Unavailable: FinLang CLI not found on PATH (installation issue)
- 504 Gateway Timeout: subprocess exceeded FINLANG_API_TIMEOUT
For CI/CD pipelines: 200 = success on /process, and on /reconcile check stats.mismatches_found in the body for the review signal; 422 = engine validation/parse error or verify mismatch (data didn't flow cleanly); 500/503/504 = ops failure to investigate; 400/401/413 = caller error.
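A pipeline step could encode that guidance roughly as follows. The verdict labels and the `ci_verdict` name are hypothetical, and the sketch assumes stats.mismatches_found is falsy (0 or false) when reconciliation finds nothing to review.

```python
from typing import Optional


def ci_verdict(status: int, endpoint: str = "/process", body: Optional[dict] = None) -> str:
    """Classify an API response for a pipeline step."""
    if status == 200:
        if endpoint == "/reconcile":
            stats = (body or {}).get("stats", {})
            # Assumption: a truthy mismatches_found means there is something to review.
            return "review" if stats.get("mismatches_found") else "success"
        return "success"
    if status == 422:
        return "data-error"    # validation/parse error or verify mismatch
    if status in (500, 503, 504):
        return "ops-failure"   # investigate the deployment
    if status in (400, 401, 413):
        return "caller-error"  # fix the request
    return "unknown"


print(ci_verdict(200, "/reconcile", {"stats": {"mismatches_found": 2}}))  # review
print(ci_verdict(504))                                                    # ops-failure
```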
🚧 Limitations (v0.1 MVP)
- Single-process, synchronous. One request runs at a time per worker. No async job queue.
- No persistent storage. Uploaded files live in a temp directory for the duration of one request, then disappear.
- No streaming response. Whole output CSV is returned as a string in the JSON body. For multi-million-row outputs, prefer the CLI.
- No multi-tenancy. One API key, one set of users.
- No rate limiting. Operator's concern; layer at the reverse proxy.
- No WebSocket / Server-Sent Events. HTTP request/response only.
- Subprocess overhead per request. ~50–150ms of process-startup cost on top of engine time. Negligible for human-driven requests; consider the CLI for batch loops where startup dominates.
🛣️ Roadmap (direction, not promises)
Candidates being evaluated when buyer or customer demand surfaces:
- OpenAPI client SDK generation (TypeScript / Python): scaffold a typed client straight from /openapi.json.
- Streaming response on /process and /reconcile for very large outputs: chunked CSV instead of one JSON blob.
- Optional per-request audit log download: instead of inlining audit in the JSON response, return a presigned link or multipart attachment.
- Health check enrichment: surface engine version, rule pack inventory, last-N-request stats.
- Standalone reconcile mode: once the engine ships --reconcile-only, expose it as a separate endpoint that takes two pre-existing CSVs without re-running the engine.
- --date-format form field on /process: currently not exposed; add if a buyer asks. Most users get adequate locale handling via dayfirst + auto-parsing.
Async job queues, persistent storage, multi-tenancy, OAuth/JWT, and rate metering remain explicitly out of scope. Those are hosted-SaaS concerns.
📚 Related Documentation
- api_reference.md: full form-field tables, response schemas, curl recipes per endpoint
- cli_reference.md: the underlying CLI surface that the API dispatches to
- workflows.md: Daily Run / Growth Loop patterns the API can drive
- reconciliation.md: --reconcile engine feature (exposed via /reconcile)
- verify.md: --verify integrity primitive (already wired to /process via the verify form field)
- install.md: pip install finlang[api] and getting finlang-api on PATH
- faq.md: general FinLang FAQ
The CLI is canonical. The API makes it reachable. Same engine, same audit trail, same determinism — over HTTP.