tabularmapper 1.0.0__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Karthikeyan Duraisamy
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,455 @@
1
+ Metadata-Version: 2.4
2
+ Name: tabularmapper
3
+ Version: 1.0.0
4
+ Summary: Map any spreadsheet (.xlsx) to a schema you define — deterministic column mapping with an optional AI matcher
5
+ Author-email: Karthikeyan Duraisamy <karthikeyanduraisamy@kultivateindia.com>
6
+ License-Expression: MIT
7
+ Project-URL: Homepage, https://github.com/KarthiKeyan05046/tabularmapper
8
+ Project-URL: Repository, https://github.com/KarthiKeyan05046/tabularmapper
9
+ Project-URL: Issues, https://github.com/KarthiKeyan05046/tabularmapper/issues
10
+ Keywords: schema mapping,xlsx,spreadsheet,etl,column mapping,openpyxl
11
+ Classifier: Development Status :: 5 - Production/Stable
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Programming Language :: Python :: 3.9
15
+ Classifier: Programming Language :: Python :: 3.10
16
+ Classifier: Programming Language :: Python :: 3.11
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Classifier: Topic :: Office/Business
19
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
20
+ Requires-Python: >=3.9
21
+ Description-Content-Type: text/markdown
22
+ License-File: LICENSE
23
+ Requires-Dist: openpyxl>=3.1
24
+ Requires-Dist: rapidfuzz>=3.0
25
+ Requires-Dist: python-dateutil>=2.8
26
+ Provides-Extra: api
27
+ Requires-Dist: fastapi>=0.110; extra == "api"
28
+ Requires-Dist: python-multipart>=0.0.9; extra == "api"
29
+ Provides-Extra: redis
30
+ Requires-Dist: redis>=5.0; extra == "redis"
31
+ Provides-Extra: valkey
32
+ Requires-Dist: valkey>=6.0; extra == "valkey"
33
+ Provides-Extra: postgres
34
+ Requires-Dist: psycopg[binary]>=3.1; extra == "postgres"
35
+ Provides-Extra: dotenv
36
+ Requires-Dist: python-dotenv>=1.0; extra == "dotenv"
37
+ Dynamic: license-file
38
+
39
+ # Tabular Mapper
40
+
41
+ Map **any spreadsheet (`.xlsx`), in any layout**, to a schema *you* define — the
42
+ header row is found automatically, columns are matched to your fields, and
43
+ anything ambiguous is flagged for review instead of silently guessed.
44
+
45
+ The engine is **domain-agnostic** — invoices, product catalogs, payroll, bank
46
+ statements. "Bank statements" is just a built-in preset (`bank_preset()`). The
47
+ common path is **100% deterministic** (header detection + synonym/fuzzy matching);
48
+ an LLM is optional, off by default, and only ever sees column *headers* + column
49
+ *structure* — never your cell data.
50
+
51
+ ```python
52
+ from tabularmapper import process_file, configure, config_from_dict
53
+
54
+ configure(config=config_from_dict({
55
+ "output_schema": [{"field": "sku", "header": "SKU", "type": "text"},
56
+ {"field": "price", "header": "Unit Price", "type": "number"}],
57
+ "synonyms": {"sku": ["sku", "item code"], "price": ["unit price", "rate"]},
58
+ }))
59
+ res = process_file("catalog.xlsx")
60
+ res.records # -> [{'sku': 'A-100', 'price': 12.5}, ...] ready for JSON / a DB
61
+ res.needs_review # -> False (True if a column was uncertain)
62
+ ```
63
+
64
+ **Contents:** [Install](#install) · [Quickstart](#quickstart) · [The result](#the-result-object)
65
+ · [Configuration](#configuration-env-vars) · [Storage backends](#storage-backends)
66
+ · [FastAPI](#use-with-fastapi) · [Output formats](#output-formats) · [AI matcher](#ai-column-matcher-optional)
67
+ · [Self-learning](#self-learning) · [Custom schema](#custom-output-schema) · [API reference](#api-reference)
68
+ · [Gotchas](#gotchas--faq)
69
+
70
+ ---
71
+
72
+ ## Install
73
+
74
+ ```bash
75
+ pip install tabularmapper # core — no DB driver, no AI SDK
76
+ pip install "tabularmapper[api]" # + FastAPI router
77
+ pip install "tabularmapper[valkey]" # + Valkey (also: redis, postgres, dotenv)
78
+ ```
79
+
80
+ The core install pulls only `openpyxl`, `rapidfuzz`, `python-dateutil`. Everything
81
+ else (Redis/Valkey/Postgres drivers, FastAPI, dotenv) is an **optional extra** you
82
+ add only if you use it. Import name is `tabularmapper`.
83
+
84
+ ## Quickstart
85
+
86
+ ### 1. As a library
87
+
88
+ ```python
89
+ from tabularmapper import process_file, process_stream, configure, bank_preset
90
+
91
+ configure(config=bank_preset()) # or a config_from_dict(...) of your own
92
+ res = process_file("file.xlsx")
93
+ rows = res.records # list[dict], one per row
94
+
95
+ # from bytes (e.g. an upload) — parsed in memory, nothing written to disk
96
+ res = process_stream(open("file.xlsx", "rb").read())
97
+ ```
98
+
99
+ There is **no default schema** — call `configure(...)` with your own config or a
100
+ preset first, otherwise nothing is mapped ([Custom schema](#custom-output-schema)).
101
+
102
+ ### 2. From the command line
103
+
104
+ ```bash
105
+ tabularmapper file.xlsx --config schema.json # your schema
106
+ tabularmapper file.xlsx --preset bank # built-in bank layout
107
+ tabularmapper file.xlsx --preset bank --format json # JSON to stdout
108
+ ```
109
+
110
+ ### 3. In a FastAPI app
111
+
112
+ ```python
113
+ from fastapi import FastAPI
114
+ from tabularmapper.api import router, lifespan
115
+
116
+ app = FastAPI(lifespan=lifespan) # lifespan wires cache + config + AI for you
117
+ app.include_router(router) # -> POST /mapper/map, GET /mapper/health
118
+ ```
119
+
120
+ That's the whole integration. **Do not add your own cache manager** — the
121
+ `lifespan` already builds the cache (see [Gotchas](#gotchas--faq)).
122
+
123
+ ## How it works
124
+
125
+ ```
126
+ 1. detect header row deterministic scoring — finds the real header even under
127
+ bank logos / metadata rows (never assumes row 1)
128
+ 2. map columns exact synonym → fuzzy match → optional AI (unknown headers)
129
+ 3. extract rows deterministic date/amount parsing, debit/credit vs signed
130
+ amount reconciliation (a model never sees a data row)
131
+ 4. review gate missing/uncertain critical column -> needs_review = True
132
+ ```
133
+
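Steps 1 and 2 above can be illustrated with a minimal, standard-library sketch. It is not the package's implementation: the function names `detect_header_row` and `map_columns` are hypothetical, the scoring rule is an assumption, and `difflib.SequenceMatcher` stands in for the `rapidfuzz` matcher the package depends on.

```python
from difflib import SequenceMatcher


def detect_header_row(rows):
    """Return the index of the row that looks most like a header (assumed heuristic)."""
    def score(row):
        cells = [c for c in row if c not in (None, "")]
        if not cells:
            return 0.0
        text_cells = sum(isinstance(c, str) for c in cells)
        # Favor rows that are mostly text and that fill most of the row's width.
        return (text_cells / len(cells)) * (len(cells) / len(row))
    return max(range(len(rows)), key=lambda i: score(rows[i]))


def map_columns(headers, synonyms, threshold=80):
    """Map each raw header to (field, method): exact synonym first, then fuzzy."""
    mapping = {}
    for header in headers:
        key = header.strip().lower()
        best_field, best_score, best_method = None, 0.0, "unmapped"
        for field, phrases in synonyms.items():
            for phrase in phrases:
                if key == phrase:
                    candidate, method = 100.0, "exact"
                else:
                    candidate = 100 * SequenceMatcher(None, key, phrase).ratio()
                    method = "fuzzy"
                if candidate > best_score:
                    best_field, best_score, best_method = field, candidate, method
        if best_score >= threshold:
            mapping[header] = (best_field, best_method)
        else:
            # Below the threshold nothing is guessed; the column stays unmapped.
            mapping[header] = (None, "unmapped")
    return mapping


rows = [
    ["ACME Bank", None, None],          # logo / metadata row
    ["Account: 123", None, None],       # metadata row
    ["Date", "Detail", "Paid Out"],     # the real header
    ["2024-01-02", "Coffee", 3.5],      # first data row
]
synonyms = {
    "date": ["date"],
    "description": ["details", "narration"],
    "debit": ["withdrawal", "paid out"],
}
header_index = detect_header_row(rows)
column_map = map_columns(rows[header_index] + ["Ref"], synonyms)
```

In this sketch an unmapped column is what would feed a review gate like step 4: the caller can check for `None` fields instead of trusting a low-scoring guess.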
134
+ ## The result object
135
+
136
+ `process_file` / `process_stream` return a `ProcessResult`:
137
+
138
+ | Attribute | Type | What it is |
139
+ |---|---|---|
140
+ | `records` | `list[dict]` | the mapped rows — keys are your schema fields. **Use this for a DB.** |
141
+ | `needs_review` | `bool` | `True` if any critical column was missing or low-confidence |
142
+ | `review_reasons` | `list[str]` | human-readable reasons when `needs_review` |
143
+ | `column_maps` | `list[ColumnMap]` | per-column: `raw_header`, `field`, `confidence`, `method` |
144
+ | `header_index` | `int` | 0-based row where the header was found |
145
+ | `output` | `OutputResult` | serializers: `.records` `.json` `.bytes` `.base64` (see [formats](#output-formats)) |
146
+
147
+ ```python
148
+ res = process_file("statement.xlsx")
149
+ if res.needs_review:
150
+ print("review:", res.review_reasons) # quarantine instead of trusting
151
+ else:
152
+ db.insert_many(res.records) # each dict is one row
153
+ ```
154
+
155
+ ## Configuration (env vars)
156
+
157
+ Everything swappable is set by an environment variable — **no code changes**.
158
+ All are optional; sensible defaults apply.
159
+
160
+ | Variable | Default | Purpose |
161
+ |---|---|---|
162
+ | `TABULARMAPPER_CACHE` | `memory://` (no files) | where header→field mappings are cached ([backends](#storage-backends)) |
163
+ | `TABULARMAPPER_LEARN_STORE` | `memory://` (no files) | where self-learned header synonyms live |
164
+ | `TABULARMAPPER_CONFIG` | *(none — required)* | output template + synonyms JSON (file / `https://` / `s3://`) |
165
+ | `TABULARMAPPER_ROUTE_PREFIX` | `/mapper` | FastAPI router path prefix |
166
+ | `OPENAI_API_KEY` | *(unset → AI off)* | enables the AI column matcher |
167
+ | `OPENAI_BASE_URL` | `https://api.openai.com/v1` | any OpenAI-compatible endpoint |
168
+ | `OPENAI_MODEL` | `gpt-4o-mini` | model name |
169
+
170
+ ```bash
171
+ export TABULARMAPPER_CACHE="valkeys://default:PASSWORD@host:6379"
172
+ ```
173
+
174
+ ## Storage backends
175
+
176
+ The cache and the learn store share one URL convention (like SQLAlchemy/Celery) —
177
+ change the backend by changing the URL, nothing else:
178
+
179
+ | URL | Backend | Install |
180
+ |---|---|---|
181
+ | `memory://` | in-process, **no files (default)** | — |
182
+ | `sqlite:///cache.db` | SQLite file, concurrency-safe, persistent | — |
183
+ | `redis://` / `rediss://` | Redis | `pip install "tabularmapper[redis]"` |
184
+ | `valkey://` / `valkeys://` | Valkey (Redis fork, e.g. Aiven) | `pip install "tabularmapper[valkey]"` |
185
+ | `postgresql://` | Postgres | `pip install "tabularmapper[postgres]"` |
186
+
187
+ ```python
188
+ from tabularmapper import MappingCache, process_file
189
+
190
+ cache = MappingCache("valkeys://default:pw@host:6379") # or MappingCache() to read the env var
191
+ res = process_file("statement.xlsx", cache=cache)
192
+ ```
193
+
194
+ `MappingCache` is **synchronous** — `.get()`, `.put()`, `.close()`. There is no
195
+ async manager and no `init_cache`/`close_cache`. The backend is selected
196
+ entirely by the URL.
197
+
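To make the URL convention concrete, here is a small sketch of scheme-based dispatch using only `urllib.parse`. The `BACKENDS` table and `describe_backend` function are hypothetical names for illustration; they mirror the table above but say nothing about how `MappingCache` is implemented internally.

```python
from urllib.parse import urlsplit

# scheme -> (backend name, pip extra needed or None), copied from the table above
BACKENDS = {
    "memory": ("in-process", None),
    "sqlite": ("SQLite file", None),
    "redis": ("Redis", "redis"),
    "rediss": ("Redis", "redis"),
    "valkey": ("Valkey", "valkey"),
    "valkeys": ("Valkey", "valkey"),
    "postgresql": ("Postgres", "postgres"),
}


def describe_backend(url):
    """Return (backend name, required extra) for a store URL, or raise ValueError."""
    scheme = urlsplit(url).scheme
    if scheme not in BACKENDS:
        raise ValueError(f"unsupported store URL scheme: {scheme!r}")
    return BACKENDS[scheme]


default_backend = describe_backend("memory://")
shared_backend = describe_backend("valkeys://default:pw@host:6379")
```

A lookup like this is one way to fail early with a clear message when a URL names a backend whose extra is not installed.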
198
+ ### Persistence is opt-in (no files by default)
199
+
200
+ By default the cache and learn store are **in-memory** — the package writes
201
+ **no files**. They still cache/learn within a running process (lost on restart).
202
+ Turn on persistence only when you want it, by setting a URL:
203
+
204
+ ```bash
205
+ # default: nothing set -> in-memory, no files
206
+
207
+ # persist to a file (creates cache.db + WAL sidecars .db-wal / .db-shm):
208
+ TABULARMAPPER_CACHE=sqlite:////var/lib/tabularmapper/cache.db
209
+ TABULARMAPPER_LEARN_STORE=sqlite:////var/lib/tabularmapper/learned.db
210
+
211
+ # or a shared server (survives restarts, shared across workers):
212
+ TABULARMAPPER_CACHE=valkeys://user:pw@host:6379
213
+ TABULARMAPPER_LEARN_STORE=valkeys://user:pw@host:6379
214
+ ```
215
+
216
+ If you *do* use a SQLite URL, the `.db-wal` / `.db-shm` files that appear next to
217
+ it are normal Write-Ahead-Logging sidecars (that's what makes it concurrency-safe);
218
+ they're checkpointed away on a clean shutdown and are already gitignored.
219
+
220
+ > In a FastAPI app the `.env` file is **not** auto-loaded (only the CLI does that).
221
+ > Call `load_dotenv()` at startup, or run with `uv run --env-file .env`, or the
222
+ > env vars won't be seen and you'll get the in-memory default.
223
+
224
+ ## Use with FastAPI
225
+
226
+ The package ships a ready router. Two ways to use it.
227
+
228
+ ### Simplest — use the built-in lifespan
229
+
230
+ ```python
231
+ from fastapi import FastAPI
232
+ from tabularmapper.api import router, lifespan
233
+
234
+ app = FastAPI(lifespan=lifespan)
235
+ app.include_router(router)
236
+ ```
237
+
238
+ At startup the `lifespan` reads `TABULARMAPPER_CONFIG`, builds `MappingCache()` from
239
+ `TABULARMAPPER_CACHE`, builds the learn store, and enables the AI matcher if
240
+ `OPENAI_API_KEY` is set. **Configure it entirely with env vars.**
241
+
242
+ ### Control the cache yourself — write your own lifespan
243
+
244
+ ```python
245
+ import os
246
+ from contextlib import asynccontextmanager
247
+ from fastapi import FastAPI
248
+ import tabularmapper.engine as engine
249
+ from tabularmapper.api import router, state, build_matcher
250
+ from tabularmapper import MappingCache, LearnStore, apply_learned
251
+
252
+ @asynccontextmanager
253
+ async def lifespan(app: FastAPI):
254
+ engine.configure(os.getenv("TABULARMAPPER_CONFIG"))
255
+ state.cache = MappingCache("valkeys://default:pw@host:6379") # your URL
256
+ state.matcher = build_matcher() # None if no OPENAI_API_KEY
257
+ state.learn = LearnStore()
258
+ apply_learned(state.learn)
259
+ yield
260
+ state.cache.close() # sync, no await
261
+ state.learn.close()
262
+
263
+ app = FastAPI(lifespan=lifespan)
264
+ app.include_router(router)
265
+ ```
266
+
267
+ ### Endpoints
268
+
269
+ | Method | Path | Purpose |
270
+ |---|---|---|
271
+ | `POST` | `/mapper/map` | upload an `.xlsx`, get the mapping + rows (JSON) |
272
+ | `GET` | `/mapper/health` | `{status, ai_enabled}` |
273
+ | `GET` | `/mapper/learn/pending` | debit/credit synonyms awaiting approval |
274
+ | `POST` | `/mapper/learn/approve` | approve a pending synonym (`?phrase=&field=`) |
275
+ | `POST` | `/mapper/learn/reject` | reject a pending synonym |
276
+
277
+ `POST /mapper/map` reads the upload in memory (no temp file) and runs the
278
+ blocking work in a threadpool. Store the original file to S3 in your own endpoint
279
+ if you need it — the mapper stays out of AWS.
280
+
281
+ The `/mapper` prefix is configurable (this is a general table→schema mapper, not
282
+ just banks): set `TABULARMAPPER_ROUTE_PREFIX`, or build the router yourself:
283
+
284
+ ```python
285
+ from tabularmapper.api import make_router, lifespan
286
+ app.include_router(make_router("/catalog")) # -> POST /catalog/map, ...
287
+ ```
288
+
289
+ ## Output formats
290
+
291
+ `res.output` serializes the same records five ways, lazily (built once, cached):
292
+
293
+ | `output_format` | `res.output` accessor | Best for |
294
+ |---|---|---|
295
+ | `records` | `.records` (`list[dict]`) | DB insert, JSON APIs *(default for `process_stream`)* |
296
+ | `json` | `.json` (`str`) | HTTP responses, queues |
297
+ | `bytes` | `.bytes` (`bytes`) | `StreamingResponse`, S3 upload |
298
+ | `base64` | `.base64` (`str`) | embedding in JSON |
299
+ | `file` | writes to `out_path` | disk *(default for `process_file`)* |
300
+
301
+ ```python
302
+ res = process_stream(data, output_format="records")
303
+ db.insert_many(res.records) # to your database
304
+ s3.put_object(Bucket=b, Key=k, Body=res.output.bytes) # .xlsx to S3, one pass
305
+ ```
306
+
307
+ CSV: `from tabularmapper import records_to_csv_bytes`.
308
+
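The "built once, cached" behavior can be sketched with `functools.cached_property`. This is an illustration under assumptions, not the package's `OutputResult`: the class name `LazyOutput` and its `as_*` accessors are hypothetical, and CSV bytes stand in for the `.xlsx` bytes the real `.bytes` accessor returns.

```python
import base64
import csv
import io
import json
from functools import cached_property


class LazyOutput:
    """Hypothetical stand-in: each serialization is computed on first access only."""

    def __init__(self, records):
        self.records = records

    @cached_property
    def as_json(self):
        return json.dumps(self.records)

    @cached_property
    def as_csv_bytes(self):
        buf = io.StringIO()
        fields = list(self.records[0]) if self.records else []
        writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(self.records)
        return buf.getvalue().encode("utf-8")

    @cached_property
    def as_base64(self):
        # Reuses the cached bytes rather than serializing a second time.
        return base64.b64encode(self.as_csv_bytes).decode("ascii")


out = LazyOutput([{"sku": "A-100", "price": 12.5}])
payload = out.as_json
```

Because each property stores its result on the instance, asking for the same format twice returns the same object without redoing the work.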
309
+ ## AI column matcher (optional)
310
+
311
+ For a brand-new bank whose headers the synonyms can't place, one LLM call maps the
312
+ whole header row. It's **off unless `OPENAI_API_KEY` is set**, and the prompt
313
+ contains only column headers + structural metadata (types, fill rate, which
314
+ columns are mutually exclusive) — **never a transaction value**.
315
+
316
+ ```python
317
+ from tabularmapper.ai_matcher import OpenAICompatibleMatcher
318
+ res = process_file("new_bank.xlsx", table_matcher=OpenAICompatibleMatcher())
319
+ ```
320
+
321
+ Works with OpenAI, Azure, Together, Groq, or a local vLLM/Ollama endpoint via
322
+ `OPENAI_BASE_URL`.
323
+
324
+ ## Self-learning
325
+
326
+ When the AI resolves a new header, it's remembered so the next statement from that
327
+ bank maps deterministically (an `exact` match) with no AI call. Debit/credit are
328
+ held for a one-time human approval (a wrong direction is the costly error);
329
+ everything else auto-applies.
330
+
331
+ ```python
332
+ from tabularmapper import LearnStore, apply_learned, process_file
333
+ from tabularmapper.ai_matcher import OpenAICompatibleMatcher
334
+
335
+ store = LearnStore()                                  # reads TABULARMAPPER_LEARN_STORE; in-memory if unset
336
+ apply_learned(store) # activate at startup
337
+ res = process_file("stmt.xlsx", table_matcher=OpenAICompatibleMatcher(), learn_store=store)
338
+
339
+ store.pending() # debit/credit awaiting review
340
+ store.approve("outgoing", "debit") # now an exact match everywhere
341
+ ```
342
+
343
+ Bootstrap from an archive in one pass:
344
+ `tabularmapper --harvest ./past_statements --learn sqlite:///learned.db`.
345
+
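The approval rule described above (debit/credit held, everything else auto-applied) can be modeled in a few lines. `PendingSynonyms` is a hypothetical in-memory class written for illustration; it is not `LearnStore`, and details such as lower-casing phrases are assumptions.

```python
class PendingSynonyms:
    """Toy model: direction fields wait for approval, other fields apply at once."""

    HELD_FIELDS = {"debit", "credit"}

    def __init__(self):
        self.active = {}      # phrase -> field, used for matching
        self._pending = {}    # phrase -> field, awaiting a human decision

    def add(self, phrase, field):
        target = self._pending if field in self.HELD_FIELDS else self.active
        target[phrase.lower()] = field

    def pending(self):
        return dict(self._pending)

    def approve(self, phrase, field):
        self._pending.pop(phrase.lower(), None)
        self.active[phrase.lower()] = field

    def reject(self, phrase):
        self._pending.pop(phrase.lower(), None)


store = PendingSynonyms()
store.add("Narrative", "description")   # auto-applies
store.add("Outgoing", "debit")          # held for review
held = store.pending()
store.approve("outgoing", "debit")
```

Holding only the direction fields keeps the human step small while still guarding the one mapping where a mistake flips the sign of every amount.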
346
+ ## Custom output schema
347
+
348
+ The output columns and synonyms are data, not code. Point `TABULARMAPPER_CONFIG` at
349
+ a JSON file (or `https://` / `s3://` URL):
350
+
351
+ ```json
352
+ {
353
+ "output_schema": [
354
+ {"field": "date", "header": "Date", "type": "date"},
355
+ {"field": "description", "header": "Details", "type": "text"},
356
+ {"field": "debit", "header": "Debit", "type": "money"},
357
+ {"field": "credit", "header": "Credit", "type": "money"}
358
+ ],
359
+ "synonyms": { "debit": ["withdrawal", "paid out"] }
360
+ }
361
+ ```
362
+
363
+ `type` is `date` | `number`/`money`/`currency`/`integer`/`float` | `text`/`string`.
364
+ Rename a header, reorder, drop a column, or add a brand-new one — all config, no
365
+ code. In library code, call `configure("config.json")` (or `configure(config=config_from_dict(...))`)
366
+ before processing. **There is no default schema** — `synonyms` are exactly what
367
+ you declare (nothing is merged in).
368
+
369
+ Optional keys, all data-driven (omit them for a plain type-based mapping):
370
+
371
+ | Key | What it does |
372
+ |---|---|
373
+ | `output_schema[].description` | hint for the AI matcher (falls back to the field name) |
374
+ | `critical_fields` | fields that must be mapped, else `needs_review` |
375
+ | `require_any` | `[[a, b]]` — each group needs ≥1 mapped field, else `needs_review` |
376
+ | `reconcile` | `{"signed": s, "negative": n, "positive": p}` — split one signed column into two directional ones (e.g. debit/credit) |
377
+ | `row_keep_if_any` | a row is a record only if ≥1 of these has a value (default: any non-empty) |
378
+ | `continuation_field` | a row with only this field folds into the row above (multi-line cells) |
379
+
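As a rough illustration of what `reconcile`, `critical_fields`, and `require_any` express, the sketch below splits a signed amount into two directional fields and collects review reasons. Both functions are hypothetical; how the package treats zero amounts, whether it drops the signed column, and the wording of its review reasons are assumptions made here.

```python
def reconcile(record, signed, negative, positive):
    """Split one signed value into two directional fields (assumed semantics)."""
    out = dict(record)
    amount = out.pop(signed, None)
    if amount is None:
        return out
    if amount < 0:
        out[negative], out[positive] = -amount, None
    else:
        out[negative], out[positive] = None, amount
    return out


def review_reasons(mapped, critical_fields=(), require_any=()):
    """List why a mapping would need review; an empty list means it passes."""
    reasons = [
        f"missing critical field: {field}"
        for field in critical_fields
        if field not in mapped
    ]
    for group in require_any:
        if not any(field in mapped for field in group):
            reasons.append("need at least one of: " + ", ".join(group))
    return reasons


row = reconcile({"date": "2024-01-02", "amount": -12.5}, "amount", "debit", "credit")
reasons = review_reasons({"date"}, ["date", "description"], [["debit", "credit"]])
```

An empty `reasons` list corresponds to `needs_review` being `False` in this model.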
380
+ The ready-made **bank preset** is in `config.example.json` (also `bank_preset()`
381
+ in code) — copy it as a starting point. A minimal config needs only
382
+ `output_schema` + `synonyms`. See `tests/test_schema.py::test_generic_custom_config`.
383
+
384
+ ## API reference
385
+
386
+ Top-level (`from tabularmapper import ...`):
387
+
388
+ | Symbol | Kind | Notes |
389
+ |---|---|---|
390
+ | `process_file(path, *, output_format="file", cache=None, table_matcher=None, learn_store=None, threshold=80)` | fn | map a file → `ProcessResult` |
391
+ | `process_stream(data, *, output_format="records", cache=None, ...)` | fn | map bytes / a binary stream |
392
+ | `MappingCache("<url>")` | class | layout cache; `.get/.put/.close` (sync). No arg → `TABULARMAPPER_CACHE`, else `memory://` |
393
+ | `LearnStore("<url>")` | class | learned synonyms; `.synonyms/.pending/.approve/.reject/.add/.close` |
394
+ | `configure(source=None, config=None)` | fn | load output template + synonyms (call once at startup) |
395
+ | `apply_learned(store)` | fn | activate a LearnStore's synonyms |
396
+ | `learn_from_result(res, store)` / `harvest_folder(dir, store)` | fn | teach the store |
397
+ | `load_config` / `config_from_dict` / `Config` | — | build a config object |
398
+ | `open_store(url)` | fn | low-level backend factory |
399
+ | `ProcessResult`, `ColumnMap`, `OutputResult` | class | result types |
400
+ | `records_to_csv_bytes(records)` | fn | CSV serializer |
401
+
402
+ Submodules: `tabularmapper.ai_matcher` (`OpenAICompatibleMatcher`),
403
+ `tabularmapper.api` (`router`, `lifespan`, `app`, `state`,
404
+ `build_matcher`), `tabularmapper.llm_fallback` (`HashingEmbeddingFallback`).
405
+
406
+ ## Gotchas & FAQ
407
+
408
+ - **"No module named `bank_mapper_cache`" / `MappingCacheManager` not found.**
409
+ Those don't exist. The cache is `from tabularmapper import MappingCache`,
410
+ and it's a plain sync object. The FastAPI `lifespan` already creates it — you
411
+ don't need a manager or a startup hook.
412
+ - **The cache is synchronous.** No `await`, no `init_cache()`/`close_cache()`.
413
+ Lifecycle is `MappingCache(...)` and `.close()`.
414
+ - **Don't mix `lifespan=` with `@app.on_event(...)`.** Use the `lifespan` (the
415
+ `on_event` API is deprecated in FastAPI, and the lifespan already sets up the cache).
416
+ - **Setting an env var after `import` has no effect on config.** Set
417
+ `TABULARMAPPER_CONFIG` before startup, or call `configure(...)` explicitly. The
418
+ router/CLI do this for you.
419
+ - **I get `balance` even though my schema drops it.** Your config didn't load —
420
+ the built-in default (which has `balance`) is active. Check the key is exactly
421
+ `output_schema` and that `configure()`/`TABULARMAPPER_CONFIG` actually ran; a bad
422
+ config logs a warning and falls back to defaults.
423
+ - **`.db` files appear even though I set `memory://` in `.env`.** In a FastAPI
424
+ app the `.env` isn't auto-loaded, so your env vars aren't seen and it uses the
425
+ default. The default is now **in-memory (no files)** — but if you're on an
426
+ older build it defaulted to SQLite. Either upgrade, `load_dotenv()` at startup,
427
+ or run with `uv run --env-file .env`. See [Persistence is opt-in](#persistence-is-opt-in-no-files-by-default).
428
+ - **AI never fires.** It's off unless `OPENAI_API_KEY` is set and you pass a
429
+ `table_matcher` (or use the router, which builds one when the key is present).
430
+ - **`ModuleNotFoundError: redis`** (or valkey/psycopg). You selected that backend
431
+ but didn't install its extra: `pip install "tabularmapper[redis]"`. The
432
+ default in-memory backend and the SQLite backend need nothing.
433
+ - **Multiple workers.** SQLite is safe for one host; for several containers point
434
+ `TABULARMAPPER_CACHE`/`TABULARMAPPER_LEARN_STORE` at `redis://` / `valkey://` /
435
+ `postgresql://` so they share state.
436
+
437
+ ## Development
438
+
439
+ ```bash
440
+ git clone https://github.com/KarthiKeyan05046/tabularmapper
441
+ cd tabularmapper
442
+ pip install -e ".[api]" pytest
443
+ pytest -q # 59 tests
444
+ python make_fixtures.py # regenerate test_statements/
445
+ ```
446
+
447
+ ## Scope
448
+
449
+ `.xlsx` only. Library + CLI + FastAPI router. No transaction categorization. The
450
+ `02/06` day-vs-month ambiguity resolves per-locale (default day-first) and is
451
+ surfaced, never silently guessed.
452
+
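The day-vs-month ambiguity can be detected by trying both readings and checking whether they disagree. The sketch below uses only `datetime.strptime` on full `NN/NN/YYYY` strings; the function name `parse_ambiguous` is hypothetical, and the package itself depends on `python-dateutil`, so its actual parsing path may differ.

```python
from datetime import date, datetime


def parse_ambiguous(text, dayfirst=True):
    """Return (chosen date, ambiguous flag) for an 'NN/NN/YYYY' string."""
    readings = {}
    for label, fmt in (("dayfirst", "%d/%m/%Y"), ("monthfirst", "%m/%d/%Y")):
        try:
            readings[label] = datetime.strptime(text, fmt).date()
        except ValueError:
            pass  # this reading is not a valid calendar date
    if not readings:
        raise ValueError(f"not a date: {text!r}")
    preferred = "dayfirst" if dayfirst else "monthfirst"
    chosen = readings.get(preferred) or next(iter(readings.values()))
    # Ambiguous only when both readings are valid and give different dates.
    ambiguous = len(set(readings.values())) > 1
    return chosen, ambiguous


chosen, ambiguous = parse_ambiguous("02/06/2024")
```

Returning the flag alongside the date lets the caller surface the ambiguity for review while still applying the locale default.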
453
+ ## License
454
+
455
+ MIT © Karthikeyan Duraisamy