tracer-llm 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. tracer_llm-0.1.0/.github/workflows/ci.yml +30 -0
  2. tracer_llm-0.1.0/.gitignore +34 -0
  3. tracer_llm-0.1.0/AGENTS.md +208 -0
  4. tracer_llm-0.1.0/CONTRIBUTING.md +69 -0
  5. tracer_llm-0.1.0/LICENSE +21 -0
  6. tracer_llm-0.1.0/PKG-INFO +248 -0
  7. tracer_llm-0.1.0/README.md +171 -0
  8. tracer_llm-0.1.0/docs/api.md +503 -0
  9. tracer_llm-0.1.0/docs/artifacts.md +234 -0
  10. tracer_llm-0.1.0/docs/cli.md +295 -0
  11. tracer_llm-0.1.0/docs/concepts.md +153 -0
  12. tracer_llm-0.1.0/llms.txt +35 -0
  13. tracer_llm-0.1.0/notebooks/01-quickstart.ipynb +238 -0
  14. tracer_llm-0.1.0/notebooks/02-static-tracer.ipynb +2058 -0
  15. tracer_llm-0.1.0/notebooks/03-dynamic-tracer.ipynb +810 -0
  16. tracer_llm-0.1.0/pyproject.toml +78 -0
  17. tracer_llm-0.1.0/src/tracer/__init__.py +69 -0
  18. tracer_llm-0.1.0/src/tracer/analysis/__init__.py +3 -0
  19. tracer_llm-0.1.0/src/tracer/analysis/html_report.py +447 -0
  20. tracer_llm-0.1.0/src/tracer/analysis/qualitative.py +186 -0
  21. tracer_llm-0.1.0/src/tracer/api.py +262 -0
  22. tracer_llm-0.1.0/src/tracer/cli/__init__.py +1 -0
  23. tracer_llm-0.1.0/src/tracer/cli/_ui.py +199 -0
  24. tracer_llm-0.1.0/src/tracer/cli/main.py +554 -0
  25. tracer_llm-0.1.0/src/tracer/config.py +28 -0
  26. tracer_llm-0.1.0/src/tracer/embeddings/__init__.py +1 -0
  27. tracer_llm-0.1.0/src/tracer/embeddings/embedder.py +157 -0
  28. tracer_llm-0.1.0/src/tracer/embeddings/index.py +124 -0
  29. tracer_llm-0.1.0/src/tracer/fit/__init__.py +4 -0
  30. tracer_llm-0.1.0/src/tracer/fit/pipeline.py +313 -0
  31. tracer_llm-0.1.0/src/tracer/fit/surrogate.py +155 -0
  32. tracer_llm-0.1.0/src/tracer/policy/__init__.py +1 -0
  33. tracer_llm-0.1.0/src/tracer/policy/artifacts.py +51 -0
  34. tracer_llm-0.1.0/src/tracer/runtime/__init__.py +1 -0
  35. tracer_llm-0.1.0/src/tracer/runtime/router.py +155 -0
  36. tracer_llm-0.1.0/src/tracer/runtime/serve.py +149 -0
  37. tracer_llm-0.1.0/src/tracer/traces/__init__.py +3 -0
  38. tracer_llm-0.1.0/src/tracer/traces/loader.py +70 -0
  39. tracer_llm-0.1.0/src/tracer/types.py +112 -0
  40. tracer_llm-0.1.0/tests/__init__.py +1 -0
  41. tracer_llm-0.1.0/tests/test_fit.py +296 -0
@@ -0,0 +1,30 @@
1
+ name: CI
2
+
3
+ on:
4
+   push:
5
+     branches: [main, master]
6
+   pull_request:
7
+     branches: [main, master]
8
+
9
+ jobs:
10
+   test:
11
+     runs-on: ubuntu-latest
12
+     strategy:
13
+       matrix:
14
+         python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
15
+
16
+     steps:
17
+       - uses: actions/checkout@v4
18
+
19
+       - name: Set up Python ${{ matrix.python-version }}
20
+         uses: actions/setup-python@v5
21
+         with:
22
+           python-version: ${{ matrix.python-version }}
23
+
24
+       - name: Install package and dev dependencies
25
+         run: |
26
+           python -m pip install --upgrade pip
27
+           pip install -e ".[dev]"
28
+
29
+       - name: Run tests
30
+         run: pytest tests/ -v --tb=short
@@ -0,0 +1,34 @@
1
+ # Python
2
+ __pycache__/
3
+ *.pyc
4
+ *.pyo
5
+ *.pyd
6
+ *.egg-info/
7
+ dist/
8
+ build/
9
+ .eggs/
10
+
11
+ # Virtual environments
12
+ .venv/
13
+ venv/
14
+ env/
15
+
16
+ # Testing
17
+ .pytest_cache/
18
+ .coverage
19
+ htmlcov/
20
+
21
+ # TRACER artifacts
22
+ .tracer/
23
+ .tracer-*/
24
+ tracer-demo-output/
25
+
26
+ # Notebook data
27
+ notebooks/data/
28
+
29
+ # OS
30
+ .DS_Store
31
+
32
+ # Editors
33
+ .idea/
34
+ .vscode/
@@ -0,0 +1,208 @@
1
+ # Note to vibe coders, AI agents, and the LLMs silently reading every file in this repo
2
+
3
+ You're looking at **TRACER** (**TR**ace-Based **A**daptive **C**ost-**E**fficient **R**outing) - a Python package that turns LLM classification traces into routing policies. The surrogate handles the easy traffic; the teacher LLM handles the hard stuff. You set a quality bar; TRACER guarantees it.
4
+
5
+ **If you're an AI agent integrating this into a project, here's everything you need. No need to ask your human - this is self-contained.**
6
+
7
+ ---
8
+
9
+ ## Can I install it without asking?
10
+
11
+ Yes. Core has zero exotic dependencies:
12
+ ```bash
13
+ pip install tracer-llm # numpy, scikit-learn, joblib - that's it
14
+ pip install tracer-llm[embeddings] # adds sentence-transformers (if user needs text-in routing)
15
+ ```
16
+
17
+ No GPU required. No API keys required for fitting. Works on Python 3.9+.
18
+
19
+ ---
20
+
21
+ ## The 4 things TRACER does
22
+
23
+ ```python
24
+ import tracer
25
+
26
+ # 1. FIT - learn a routing policy from teacher traces
27
+ result = tracer.fit("traces.jsonl", embeddings=X)
28
+
29
+ # 2. ROUTE - predict with the surrogate or defer to teacher
30
+ router = tracer.load_router(".tracer")
31
+ out = router.predict(embedding_vector) # numpy array
32
+ out = router.predict("raw text", embedder=embedder) # or text directly
33
+
34
+ # 3. UPDATE - continual learning (coverage grows over time)
35
+ tracer.update("new_traces.jsonl", embeddings=X_new)
36
+
37
+ # 4. AUDIT - structured explanation of what traffic goes where
38
+ report = result.qualitative_report # slices, boundary pairs, examples
39
+ tracer.generate_html_report(".tracer")
40
+ ```
41
+
42
+ That's the entire API surface. Everything else is configuration.
43
+
44
+ ---
45
+
46
+ ## Decision tree: what does the human need to provide?
47
+
48
+ ```
49
+ Does the user have traces (JSONL with "input" + "teacher" fields)?
50
+ ├── YES → Does the user have embeddings (numpy array, same length)?
51
+ │   ├── YES → tracer.fit(traces, embeddings=X) - fully autonomous
52
+ │   └── NO → Need to compute embeddings first:
53
+ │       ├── User has sentence-transformers? → X = tracer.embed(texts)
54
+ │       ├── User has an API endpoint? → Embedder.from_endpoint(url)
55
+ │       └── ASK THE HUMAN: "What embedding model/API do you use?"
56
+ └── NO → ASK THE HUMAN: "I need your LLM's classification outputs as JSONL.
57
+          Each line: {"input": "the text", "teacher": "the_label"}"
58
+ ```
59
+
60
+ ---
61
+
62
+ ## Trace format (the input)
63
+
64
+ ```jsonl
65
+ {"input": "What is my balance?", "teacher": "check_balance"}
66
+ {"input": "Send $50 to Alice", "teacher": "transfer_money"}
67
+ ```
68
+
69
+ `teacher` = whatever the LLM classified this input as. That's all that's required.
70
+ Optional fields: `id`, `ground_truth`, `metadata`.
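A minimal sketch of writing and checking traces in this format, using only the standard library. The helper names `dump_trace` and `parse_trace_line` are illustrative and are not part of the `tracer` package; the package ships its own loader in `src/tracer/traces/loader.py`, whose validation rules may differ.

```python
import json

# The two fields the trace format above marks as required.
REQUIRED_FIELDS = ("input", "teacher")


def dump_trace(text, label, **optional):
    """Serialise one trace as a single JSONL line (hypothetical helper)."""
    return json.dumps({"input": text, "teacher": label, **optional})


def parse_trace_line(line):
    """Parse one JSONL line and check that the required fields are present."""
    record = json.loads(line)
    missing = [key for key in REQUIRED_FIELDS if key not in record]
    if missing:
        raise ValueError(f"trace is missing required field(s): {missing}")
    return record


line = dump_trace("What is my balance?", "check_balance", id="t-001")
record = parse_trace_line(line)
```

Optional fields such as `id` pass through unchanged, so the same helpers work for traces that carry `ground_truth` or `metadata`.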
71
+
72
+ ---
73
+
74
+ ## Embedder setup (for text-in routing at inference)
75
+
76
+ ```python
77
+ from tracer import Embedder
78
+
79
+ # Option A: local sentence-transformers
80
+ embedder = Embedder.from_sentence_transformers("BAAI/bge-small-en-v1.5")
81
+
82
+ # Option B: external HTTP endpoint (OpenAI, Cohere, Cloudflare, etc.)
83
+ embedder = Embedder.from_endpoint(
84
+     "https://api.example.com/embed",
85
+     headers={"Authorization": "Bearer ..."},
86
+     input_key="input",       # key in request JSON
87
+     output_key="embedding",  # key in response JSON
88
+ )
89
+
90
+ # Option C: any function
91
+ embedder = Embedder.from_callable(lambda texts: my_model.encode(texts))
92
+
93
+ # Attach to router
94
+ router = tracer.load_router(".tracer", embedder=embedder)
95
+ router.predict("What is my balance?") # text in, decision out
96
+ ```
97
+
98
+ **If you don't know which embedder the user wants: ask.** Embedding model choice matters - it must match what was used at fit time.
99
+
100
+ ---
101
+
102
+ ## What the router returns
103
+
104
+ ```python
105
+ {
106
+     "label": "check_balance",  # predicted class
107
+     "decision": "handled",     # "handled" (surrogate) or "deferred" (teacher)
108
+     "accept_score": 0.96,      # surrogate confidence (0-1)
109
+     "stage": 0,                # pipeline stage that handled it
110
+ }
111
+ ```
112
+
113
+ For batch: `router.predict_batch(texts_or_embeddings)` returns `{"labels", "decisions", "handled"}`.
114
+
115
+ ---
116
+
117
+ ## The `.tracer/` directory (artifacts)
118
+
119
+ After `tracer.fit()`, a `.tracer/` directory is created with:
120
+ - `manifest.json` - method, coverage, teacher agreement, label space
121
+ - `pipeline.joblib` - the fitted surrogate model
122
+ - `qualitative_report.json` - per-label stats, boundary pairs, examples
123
+ - `frontier.json` - all candidates at each quality target
124
+ - `report.html` - visual audit report (after `tracer.generate_html_report()`)
125
+
126
+ This directory is self-contained. Copy it anywhere and `tracer.load_router(".tracer")` works.
127
+
128
+ ---
129
+
130
+ ## Common integration patterns
131
+
132
+ ### Pattern 1: Offline fit, online serve
133
+ ```python
134
+ # Offline (data pipeline)
135
+ tracer.fit("traces.jsonl", embeddings=X)
136
+
137
+ # Online (API server)
138
+ router = tracer.load_router(".tracer", embedder=my_embedder)
139
+ label = router.predict(request.text)["label"]
140
+ ```
141
+
142
+ ### Pattern 2: With LLM fallback
143
+ ```python
144
+ out = router.predict(text, fallback=lambda: call_gpt4(text))
145
+ # out["label"] is always set - either from surrogate or LLM
146
+ ```
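If the caller prefers to dispatch on the router output itself rather than pass `fallback=`, the logic might look like the sketch below. It relies only on the `label` and `decision` keys documented above; `resolve_label` and `call_llm` are hypothetical names, and the sketch makes no assumption about what `label` contains when the decision is `"deferred"`.

```python
def resolve_label(out, text, call_llm):
    """Return the surrogate's label when it handled the input, else ask the LLM.

    `out` is a dict shaped like the router output documented above;
    `call_llm` is any callable mapping text to a label (hypothetical).
    """
    if out["decision"] == "handled":
        return out["label"]
    # Deferred: the surrogate declined, so the teacher LLM decides.
    return call_llm(text)


handled = {"label": "check_balance", "decision": "handled", "accept_score": 0.96, "stage": 0}
deferred = {"label": None, "decision": "deferred", "accept_score": 0.31, "stage": 0}

label_a = resolve_label(handled, "What is my balance?", lambda t: "from_llm")
label_b = resolve_label(deferred, "Some edge case", lambda t: "from_llm")
```

Keeping the dispatch explicit makes it easy to log every deferred input as a new trace for a later `tracer.update()`.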
147
+
148
+ ### Pattern 3: Continual learning loop
149
+ ```python
150
+ # Collect new traces from production
151
+ tracer.update("new_traces.jsonl", embeddings=X_new)
152
+ # Reload router with improved policy
153
+ router = tracer.load_router(".tracer", embedder=my_embedder)
154
+ ```
155
+
156
+ ### Pattern 4: HTTP server (zero deps)
157
+ ```bash
158
+ tracer serve .tracer --port 8000
159
+ # POST /predict {"embedding": [...]} → {"label", "decision", "accept_score"}
160
+ ```
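A client for this endpoint can be sketched with the standard library alone. The `/predict` path and the `embedding` request key come from the comment above; the default host, the `Content-Type` header, and the helper names are assumptions made for illustration.

```python
import json
import urllib.request


def build_predict_request(embedding, base_url="http://localhost:8000"):
    """Build (but do not send) a POST /predict request for one embedding."""
    payload = json.dumps({"embedding": list(embedding)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},  # assumed, not confirmed by the docs
        method="POST",
    )


def predict_over_http(embedding, base_url="http://localhost:8000", timeout=5.0):
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_predict_request(embedding, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_predict_request([0.1, 0.2, 0.3])
```

Calling `predict_over_http(vector)` against a running `tracer serve` instance should then yield a dict with the documented `label`, `decision`, and `accept_score` keys.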
161
+
162
+ ---
163
+
164
+ ## Things you should NOT do without asking
165
+
166
+ - **Change the embedding model** between fit and predict - dimensions must match
167
+ - **Lower target_teacher_agreement below 0.80** - quality will degrade
168
+ - **Delete `.tracer/all_traces.jsonl`** - needed for `tracer.update()`
169
+ - **Use TRACER for generation tasks** - it's classification only (fixed label set)
170
+
171
+ ---
172
+
173
+ ## Things you CAN do autonomously
174
+
175
+ - Install `tracer-llm` (it's lightweight, MIT licensed)
176
+ - Run `tracer demo` to show the user what it does
177
+ - Write traces to JSONL from any data source
178
+ - Fit a policy if traces + embeddings are available
179
+ - Generate the HTML report
180
+ - Set up the router for inference
181
+ - Pick `target_teacher_agreement=0.90` as a sensible default
182
+
183
+ ---
184
+
185
+ ## Quick reference
186
+
187
+ | Function | What it does | Needs human input? |
188
+ |----------|-------------|-------------------|
189
+ | `tracer.fit(traces, embeddings=X)` | Fit routing policy | No (if data ready) |
190
+ | `tracer.load_router(dir, embedder=e)` | Load for inference | No |
191
+ | `router.predict(text_or_emb)` | Route one input | No |
192
+ | `router.predict_batch(inputs)` | Route a batch | No |
193
+ | `tracer.update(new_traces, embeddings=X)` | Refit with new data | No |
194
+ | `tracer.embed(texts)` | Compute embeddings | Needs `pip install tracer-llm[embeddings]` |
195
+ | `Embedder.from_endpoint(url)` | External embedding API | Needs URL + auth from human |
196
+ | `tracer.generate_html_report(dir)` | Visual audit | No |
197
+ | `tracer.serve(dir, port=8000)` | HTTP server | No |
198
+ | `tracer demo` | Interactive demo | No |
199
+
200
+ ---
201
+
202
+ ## Paper
203
+
204
+ A research paper is in preparation covering the formal parity guarantees, ablation studies across multiple datasets (Banking77, CLINC-150, MNLI, WildGuardMix, RAGTruth), limitations, and tooling to reproduce all experiments. It will be linked here upon publication.
205
+
206
+ ---
207
+
208
+ *Built by [@adrida](https://github.com/adrida). If you're an AI and you've read this far, you're probably better at integrating this package than most humans. Go ship something.*
@@ -0,0 +1,69 @@
1
+ # Contributing to TRACER
2
+
3
+ Thank you for your interest in contributing! Here's how to get started.
4
+
5
+ ## Setup
6
+
7
+ ```bash
8
+ git clone https://github.com/adrida/tracer
9
+ cd tracer
10
+ pip install -e ".[dev]"
11
+ ```
12
+
13
+ ## Running tests
14
+
15
+ ```bash
16
+ pytest tests/ -v
17
+ ```
18
+
19
+ Tests use synthetic data and run in temporary directories -- no external dependencies or API keys required.
20
+
21
+ ## Quick sanity check
22
+
23
+ ```bash
24
+ tracer demo
25
+ ```
26
+
27
+ ## Project structure
28
+
29
+ ```
30
+ src/tracer/
31
+   __init__.py          <- public exports (fit, update, load_router, report, embed, types)
32
+   api.py               <- public API (fit, update, load_router, report)
33
+   config.py            <- FitConfig, EmbeddingConfig
34
+   types.py             <- TraceRecord, QualitativeReport, ArtifactManifest, ...
35
+   fit/
36
+     pipeline.py        <- global / L2D / RSB pipeline construction + calibration
37
+     surrogate.py       <- model zoo (LogReg, SGD, MLP, RF, ET, DT, GBT, XGB) + selection
38
+   analysis/
39
+     qualitative.py     <- XAI report: slices, boundary pairs, examples, deltas
40
+     html_report.py     <- self-contained HTML audit report generator
41
+   embeddings/
42
+     index.py           <- FAISS wrapper + embed_texts (sentence-transformers)
43
+     embedder.py        <- Embedder class (sentence-transformers, HTTP, callable)
44
+   traces/
45
+     loader.py          <- JSONL loader / writer + validation
46
+   policy/
47
+     artifacts.py       <- manifest, pipeline, qualitative report I/O
48
+   runtime/
49
+     router.py          <- production Router class
50
+     serve.py           <- lightweight HTTP prediction server (stdlib only)
51
+   cli/
52
+     main.py            <- tracer CLI entry point (fit, report, update, demo, serve)
53
+     _ui.py             <- terminal formatting and progress display
54
+ ```
55
+
56
+ ## Adding a new surrogate model
57
+
58
+ Add a factory to the `_candidates()` dict in `src/tracer/fit/surrogate.py`. The model must implement the scikit-learn `fit` / `predict` / `predict_proba` interface.
59
+
60
+ ## Adding a new pipeline family
61
+
62
+ Implement a `build_<name>(split, target_ta) -> dict` function in `src/tracer/fit/pipeline.py` following the same structure as `build_global`, `build_l2d`, and `build_rsb`. Register it in the `builders` dict inside `fit_frontier`.
63
+
64
+ ## Submitting a PR
65
+
66
+ 1. Fork the repo and create a branch from `main`
67
+ 2. Make your changes with tests
68
+ 3. Run `pytest tests/ -v` -- all tests must pass
69
+ 4. Open a pull request with a clear description of what changed and why
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 TRACER Contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,248 @@
1
+ Metadata-Version: 2.4
2
+ Name: tracer-llm
3
+ Version: 0.1.0
4
+ Summary: TRACER: Trace-Based Adaptive Cost-Efficient Routing. Turn LLM traces into parity-gated routing policies - cut 90%+ of LLM calls with formal guarantees.
5
+ Project-URL: Homepage, https://github.com/adrida/tracer
6
+ Project-URL: Repository, https://github.com/adrida/tracer
7
+ Project-URL: Documentation, https://github.com/adrida/tracer#readme
8
+ Project-URL: Bug Tracker, https://github.com/adrida/tracer/issues
9
+ License: MIT License
10
+
11
+ Copyright (c) 2025 TRACER Contributors
12
+
13
+ Permission is hereby granted, free of charge, to any person obtaining a copy
14
+ of this software and associated documentation files (the "Software"), to deal
15
+ in the Software without restriction, including without limitation the rights
16
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
17
+ copies of the Software, and to permit persons to whom the Software is
18
+ furnished to do so, subject to the following conditions:
19
+
20
+ The above copyright notice and this permission notice shall be included in all
21
+ copies or substantial portions of the Software.
22
+
23
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
24
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
25
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
26
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
27
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
28
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
29
+ SOFTWARE.
30
+ License-File: LICENSE
31
+ Keywords: conformal-prediction,cost-reduction,explainability,learn-to-defer,llm,machine-learning,routing,surrogate,xai
32
+ Classifier: Development Status :: 4 - Beta
33
+ Classifier: Intended Audience :: Developers
34
+ Classifier: Intended Audience :: Science/Research
35
+ Classifier: License :: OSI Approved :: MIT License
36
+ Classifier: Operating System :: OS Independent
37
+ Classifier: Programming Language :: Python :: 3
38
+ Classifier: Programming Language :: Python :: 3.9
39
+ Classifier: Programming Language :: Python :: 3.10
40
+ Classifier: Programming Language :: Python :: 3.11
41
+ Classifier: Programming Language :: Python :: 3.12
42
+ Classifier: Programming Language :: Python :: 3.13
43
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
44
+ Requires-Python: >=3.9
45
+ Requires-Dist: joblib>=1.1
46
+ Requires-Dist: numpy<2.1,>=1.21
47
+ Requires-Dist: scikit-learn>=1.0
48
+ Provides-Extra: all
49
+ Requires-Dist: faiss-cpu>=1.7; extra == 'all'
50
+ Requires-Dist: matplotlib>=3.5; extra == 'all'
51
+ Requires-Dist: numpy<2.1,>=1.21; extra == 'all'
52
+ Requires-Dist: pandas>=1.4; extra == 'all'
53
+ Requires-Dist: sentence-transformers<5,>=2.2; extra == 'all'
54
+ Requires-Dist: torch>=2.0; extra == 'all'
55
+ Requires-Dist: xgboost>=1.7; extra == 'all'
56
+ Provides-Extra: dev
57
+ Requires-Dist: pytest-cov>=4.0; extra == 'dev'
58
+ Requires-Dist: pytest>=7.0; extra == 'dev'
59
+ Provides-Extra: embeddings
60
+ Requires-Dist: numpy<2.1,>=1.21; extra == 'embeddings'
61
+ Requires-Dist: sentence-transformers<5,>=2.2; extra == 'embeddings'
62
+ Requires-Dist: torch>=2.0; extra == 'embeddings'
63
+ Provides-Extra: faiss
64
+ Requires-Dist: faiss-cpu>=1.7; extra == 'faiss'
65
+ Provides-Extra: notebooks
66
+ Requires-Dist: datasets>=2.0; extra == 'notebooks'
67
+ Requires-Dist: faiss-cpu>=1.7; extra == 'notebooks'
68
+ Requires-Dist: huggingface-hub>=0.14; extra == 'notebooks'
69
+ Requires-Dist: matplotlib>=3.5; extra == 'notebooks'
70
+ Requires-Dist: numpy<2.1,>=1.21; extra == 'notebooks'
71
+ Requires-Dist: pandas>=1.4; extra == 'notebooks'
72
+ Requires-Dist: sentence-transformers<5,>=2.2; extra == 'notebooks'
73
+ Requires-Dist: torch>=2.0; extra == 'notebooks'
74
+ Provides-Extra: xgboost
75
+ Requires-Dist: xgboost>=1.7; extra == 'xgboost'
76
+ Description-Content-Type: text/markdown
77
+
78
+ # TRACER
79
+
80
+ **Trace-Based Adaptive Cost-Efficient Routing**
81
+
82
+ [![PyPI](https://img.shields.io/pypi/v/tracer-llm)](https://pypi.org/project/tracer-llm/)
83
+ [![Python](https://img.shields.io/badge/python-3.9%2B-blue)](https://pypi.org/project/tracer-llm/)
84
+ [![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
85
+ [![CI](https://img.shields.io/badge/CI-passing-brightgreen)](https://github.com/adrida/tracer/actions)
86
+ [![Docs](https://img.shields.io/badge/docs-github-blue)](https://github.com/adrida/tracer/tree/main/docs)
87
+
88
+ Most LLM-based classification pipelines use a large language model for every single input. In practice, the vast majority of that traffic is predictable - a lightweight traditional ML model (logistic regression, gradient-boosted trees, or a small neural net) can match the LLM's output with near-perfect agreement.
89
+
90
+ TRACER learns the decision boundary between "easy" and "hard" inputs directly from your LLM's own classification traces. It fits a fast, non-LLM surrogate on the easy partition, gates it with a calibrated acceptor, and defers only the uncertain inputs back to the LLM. Every deferred call produces a new trace, which feeds the next refit - coverage grows automatically over time. The result: **90%+ of classification calls routed to traditional ML, with formal parity guarantees against the teacher LLM and a self-improving routing policy**.
91
+
92
+ ```bash
93
+ pip install tracer-llm
94
+ ```
95
+
96
+ ## See it work
97
+
98
+ ```bash
99
+ tracer demo
100
+ ```
101
+
102
+ ```
103
+ TRACER Demo - Banking77 (77 intents · 1,500 traces)
104
+
105
+ Routing Policy
106
+ method l2d
107
+ coverage 80.7% of traffic handled by surrogate
108
+ teacher TA 0.951 surrogate matches teacher on handled traffic
109
+
110
+ Cost Projection (10k queries/day)
111
+ Without TRACER 10,000 LLM calls/day $20.00/day
112
+ With TRACER 1,926 LLM calls/day $ 3.85/day $5,894 saved/yr
113
+ ```
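The cost projection in the demo output is plain arithmetic, sketched below. The per-call price of $0.002 is inferred from the demo's own baseline ($20.00 for 10,000 calls), and a 365-day year is assumed; `project_costs` is an illustrative helper, not a function of the package.

```python
def project_costs(queries_per_day, llm_calls_per_day, cost_per_call):
    """Compare daily LLM spend with and without routing (illustrative helper)."""
    baseline = queries_per_day * cost_per_call
    with_router = llm_calls_per_day * cost_per_call
    return {
        "baseline_per_day": baseline,
        "with_router_per_day": with_router,
        # Assumes the same traffic on each of 365 days.
        "saved_per_year": (baseline - with_router) * 365,
    }


# Figures taken from the demo output above.
demo = project_costs(10_000, 1_926, cost_per_call=20.00 / 10_000)
```

Rounded to the cent and to the dollar respectively, the daily and yearly values match the demo's $3.85/day and $5,894 saved per year.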
114
+
115
+ ## Quickstart
116
+
117
+ Input: a JSONL file where each line contains the original text (`input`) and the label your LLM assigned (`teacher`).
118
+
119
+ ```python
120
+ import tracer
121
+
122
+ # 1. Fit - learn a routing policy from your LLM's classification traces
123
+ result = tracer.fit(
124
+     "traces.jsonl",  # {"input": "...", "teacher": "label"} per line
125
+     embeddings=X,    # np.ndarray (n, dim) - precomputed text embeddings
126
+     config=tracer.FitConfig(target_teacher_agreement=0.95),
127
+ )
128
+
129
+ # 2. Route - surrogate handles easy inputs, LLM handles the rest
130
+ router = tracer.load_router(".tracer", embedder=embedder)
131
+ out = router.predict("What is my balance?")
132
+ # {"label": "check_balance", "decision": "handled", "accept_score": 0.96}
133
+
134
+ # 3. Fallback - only invokes the LLM when the surrogate declines
135
+ out = router.predict("Some edge case", fallback=lambda: call_my_llm("Some edge case"))
136
+ ```
137
+
138
+ > **Want to go deeper?** The [concepts guide](docs/concepts.md) explains the full pipeline, model zoo, and parity gate. The [API reference](docs/api.md) covers every parameter. The [CLI reference](docs/cli.md) covers `tracer fit`, `tracer serve`, and more.
139
+
140
+ ## How it works
141
+
142
+ ```
143
+ User query → [Embedder] → [ML Surrogate] → [Acceptor Gate]
144
+                                        |                 |
145
+                                   score >= t         score < t
146
+                                        |                 |
147
+                                  Local answer      Defer to LLM
148
+                                (traditional ML)
149
+ ```
150
+
151
+ The surrogate is **not another LLM** - it is a classical ML or shallow DL model (the model zoo includes logistic regression, SGD, gradient-boosted trees, random forests, and small feed-forward nets). This is what makes the cost reduction real: inference is CPU-bound, sub-millisecond, and incurs no per-call LLM cost.
152
+
153
+ 1. **Fit** - train a suite of candidate surrogates on your LLM's classification traces; select the best via cross-validated teacher agreement
154
+ 2. **Gate** - attach a learned acceptor that estimates, per-input, whether the surrogate will agree with the teacher
155
+ 3. **Calibrate** - sweep the acceptor threshold to maximise coverage at your target parity (e.g. ≥ 95% teacher agreement)
156
+ 4. **Guard** - block deployment if the best candidate cannot clear the parity bar on held-out data
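The Calibrate and Guard steps can be illustrated with a generic threshold sweep. This is a sketch under stated assumptions, not the package's implementation (which lives in `src/tracer/fit/pipeline.py`): `calibrate_threshold` is a hypothetical helper, `scores` are acceptor scores on held-out data, and `agrees` records whether the surrogate matched the teacher on each held-out input.

```python
def calibrate_threshold(scores, agrees, target=0.95):
    """Pick the threshold with the highest coverage that still meets `target`.

    Returns None when no threshold clears the bar, which is the point
    at which a guard would block deployment.
    """
    best = None
    for threshold in sorted(set(scores)):
        # Inputs whose acceptor score clears the threshold are handled locally.
        handled = [agree for score, agree in zip(scores, agrees) if score >= threshold]
        coverage = len(handled) / len(scores)
        agreement = sum(handled) / len(handled)
        if agreement >= target and (best is None or coverage > best["coverage"]):
            best = {"threshold": threshold, "coverage": coverage, "agreement": agreement}
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.3]
agrees = [True, True, True, False, False]
policy = calibrate_threshold(scores, agrees, target=0.95)
blocked = calibrate_threshold([0.5, 0.4], [False, False], target=0.95)
```

In this toy example the two lowest-scoring inputs are the ones the surrogate gets wrong, so the sweep settles on a threshold that excludes exactly those two and handles the remaining 60% of traffic.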
157
+
158
+ ## Benchmark results (Banking77 - 77-class intent classification)
159
+
160
+ | Metric | Value |
161
+ |--------|-------|
162
+ | Coverage | **92.2%** of traffic handled locally |
163
+ | Teacher agreement (handled) | 96.1% |
164
+ | End-to-end accuracy | 96.4% |
165
+ | **Annual savings** (10k queries/day) | **$302,850** |
166
+
167
+ ## Continual learning flywheel
168
+
169
+ TRACER is not a one-shot fit. Every deferred input that reaches the LLM produces a new labeled trace, which feeds back into the next refit. As the surrogate sees more of the input distribution, its coverage grows, which means fewer LLM calls and lower cost, while the quality guarantee holds at every iteration.
170
+
171
+ ```
172
+ Day 1: 2,000 traces → 84% coverage → 8,400 calls/day saved
173
+ Day 3: 6,000 traces → 90% coverage → 9,000 calls/day saved
174
+ Day 5: 10,000 traces → 92% coverage → 9,200 calls/day saved
175
+ ```
176
+
177
+ ```python
178
+ tracer.update("new_traces.jsonl", embeddings=X_new) # refit with new production traces
179
+ ```
180
+
181
+ The parity gate re-calibrates on each update, so coverage only increases when the surrogate actually earns it.
182
+
183
+ ## Embedder options
184
+
185
+ ```python
186
+ from tracer import Embedder
187
+
188
+ embedder = Embedder.from_sentence_transformers("BAAI/bge-small-en-v1.5") # local
189
+ embedder = Embedder.from_endpoint("https://api.example.com/embed", headers={...}) # API
190
+ embedder = Embedder.from_callable(my_fn) # any function
191
+ # or skip the embedder and pass raw np.ndarray embeddings directly
192
+ ```
193
+
194
+ Need to compute embeddings at fit time?
195
+
196
+ ```bash
197
+ pip install tracer-llm[embeddings] # adds sentence-transformers
198
+ ```
199
+
200
+ ```python
201
+ X = tracer.embed(texts) # default: all-MiniLM-L6-v2 (384-dim)
202
+ ```
203
+
204
+ ## CLI
205
+
206
+ | Command | What it does |
207
+ |---------|-------------|
208
+ | `tracer demo` | Zero-setup demo on real data |
209
+ | `tracer fit traces.jsonl --target 0.95` | Fit a routing policy |
210
+ | `tracer update new_traces.jsonl` | Refit with new traces |
211
+ | `tracer report-html` | Open the HTML audit report |
212
+ | `tracer serve .tracer --port 8000` | HTTP prediction server |
213
+
214
+ ## What's in `.tracer/`
215
+
216
+ | File | Contents |
217
+ |------|----------|
218
+ | `manifest.json` | Method, coverage, teacher agreement, label space |
219
+ | `pipeline.joblib` | Surrogate + acceptor + calibrated thresholds |
220
+ | `frontier.json` | All candidates at each quality target |
221
+ | `qualitative_report.json` | Per-label slices, boundary pairs, examples |
222
+ | `report.html` | Visual audit report |
223
+
224
+ ## Install
225
+
226
+ ```bash
227
+ pip install tracer-llm # core (numpy + sklearn + joblib)
228
+ pip install tracer-llm[embeddings] # + sentence-transformers
229
+ pip install tracer-llm[all] # everything
230
+ ```
231
+
232
+ ## Docs
233
+
234
+ | | |
235
+ |---|---|
236
+ | [Concepts](docs/concepts.md) | Pipeline internals, model zoo, parity gate |
237
+ | [API reference](docs/api.md) | Every function, parameter, and return type |
238
+ | [CLI reference](docs/cli.md) | `tracer fit`, `tracer serve`, `tracer demo`, and more |
239
+ | [Artifacts](docs/artifacts.md) | `.tracer/` directory schema |
240
+ | [AGENTS.md](AGENTS.md) | Integration guide for AI coding assistants |
241
+
242
+ ## Paper
243
+
244
+ A research paper detailing the approach, formal guarantees, ablation studies, limitations, and reproducible experiment tooling is in preparation. It will be linked here upon publication.
245
+
246
+ ## License
247
+
248
+ MIT