schematico 0.1.0__tar.gz

Files changed (34)
  1. schematico-0.1.0/.env.example +31 -0
  2. schematico-0.1.0/.github/ISSUE_TEMPLATE/bug_report.md +35 -0
  3. schematico-0.1.0/.github/ISSUE_TEMPLATE/feature_request.md +17 -0
  4. schematico-0.1.0/.github/PULL_REQUEST_TEMPLATE.md +14 -0
  5. schematico-0.1.0/.github/workflows/publish.yml +63 -0
  6. schematico-0.1.0/.github/workflows/test.yml +32 -0
  7. schematico-0.1.0/.gitignore +41 -0
  8. schematico-0.1.0/CONTRIBUTING.md +67 -0
  9. schematico-0.1.0/LICENSE +21 -0
  10. schematico-0.1.0/PKG-INFO +289 -0
  11. schematico-0.1.0/README.md +256 -0
  12. schematico-0.1.0/assets/logo.svg +11 -0
  13. schematico-0.1.0/cookbook/run_discovery.py +51 -0
  14. schematico-0.1.0/cookbook/run_generation.py +51 -0
  15. schematico-0.1.0/pyproject.toml +61 -0
  16. schematico-0.1.0/schematico/__init__.py +27 -0
  17. schematico-0.1.0/schematico/cli/__init__.py +0 -0
  18. schematico-0.1.0/schematico/cli/main.py +375 -0
  19. schematico-0.1.0/schematico/cli/progress.py +31 -0
  20. schematico-0.1.0/schematico/cli/projects.py +173 -0
  21. schematico-0.1.0/schematico/cli/runner.py +125 -0
  22. schematico-0.1.0/schematico/cli/wizard.py +177 -0
  23. schematico-0.1.0/schematico/discovery.py +102 -0
  24. schematico-0.1.0/schematico/generator.py +97 -0
  25. schematico-0.1.0/schematico/helpers.py +38 -0
  26. schematico-0.1.0/schematico/logging.py +34 -0
  27. schematico-0.1.0/schematico/models.py +140 -0
  28. schematico-0.1.0/schematico/providers.py +95 -0
  29. schematico-0.1.0/schematico/tools/tavily_tools.py +79 -0
  30. schematico-0.1.0/tests/test_generator.py +190 -0
  31. schematico-0.1.0/tests/test_logging.py +22 -0
  32. schematico-0.1.0/tests/test_projects.py +131 -0
  33. schematico-0.1.0/tests/test_providers.py +50 -0
  34. schematico-0.1.0/uv.lock +3609 -0
@@ -0,0 +1,31 @@
+ # Copy this file to `.env` and fill in the values you need.
+ # Schematico auto-loads `.env` from the current directory.
+
+ # ── LLM credentials ───────────────────────────────────────────────────────────
+ # Schematico's default model is gateway/anthropic:claude-sonnet-4-6, routed
+ # through the Pydantic AI Gateway. Set this to use it.
+ PYDANTIC_AI_GATEWAY_API_KEY=
+
+ # Optional: override the default model. Any pydantic-ai model string works, e.g.
+ # anthropic:claude-sonnet-4-6
+ # gateway/openai:gpt-4.1
+ # ollama:llama3.2 (local, keyless)
+ PAI_MODEL=
+
+ # If you'd rather hit a provider directly instead of the gateway, set its native
+ # key and point PAI_MODEL at that provider (e.g. anthropic:claude-sonnet-4-6).
+ # ANTHROPIC_API_KEY=
+ # OPENAI_API_KEY=
+
+ # ── Discover mode (live web search) ───────────────────────────────────────────
+ # Required for `schematico discover`. Free tier at https://tavily.com.
+ TAVILY_API_KEY=
+
+ # ── Observability (optional) ──────────────────────────────────────────────────
+ # A Logfire write token. If set, traces, tool calls, and token usage are sent to
+ # https://pydantic.dev/logfire. Leave blank to log to stderr only.
+ LOGFIRE_TOKEN=
+
+ # ── Logging ───────────────────────────────────────────────────────────────────
+ # WARNING (default), INFO, or DEBUG. Controls stderr verbosity.
+ LOG_LEVEL=WARNING
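
Note (not part of the package): the `.env.example` hunk above documents five variables and two defaults. The sketch below shows one way such settings could be resolved once they are in the process environment. `resolve_settings` and the dictionary keys are hypothetical names chosen for illustration; how Schematico itself reads these variables is not visible in this hunk.

```python
from __future__ import annotations

import os
from typing import Mapping

# Defaults quoted in the .env.example comments.
DEFAULT_MODEL = "gateway/anthropic:claude-sonnet-4-6"
DEFAULT_LOG_LEVEL = "WARNING"


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict[str, str | None]:
    """Hypothetical helper: treat blank values (e.g. `PAI_MODEL=`) as unset."""

    def get(name: str) -> str | None:
        value = env.get(name, "").strip()
        return value or None

    return {
        "model": get("PAI_MODEL") or DEFAULT_MODEL,
        "gateway_key": get("PYDANTIC_AI_GATEWAY_API_KEY"),
        "tavily_key": get("TAVILY_API_KEY"),
        "logfire_token": get("LOGFIRE_TOKEN"),
        "log_level": (get("LOG_LEVEL") or DEFAULT_LOG_LEVEL).upper(),
    }
```

Treating an empty string as unset matters here because the template ships `PAI_MODEL=` and `LOGFIRE_TOKEN=` with no value.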
@@ -0,0 +1,35 @@
+ ---
+ name: Bug report
+ about: Something isn't working the way you expected
+ title: "[Bug] "
+ labels: bug
+ ---
+
+ **What happened?**
+ A clear description of the bug.
+
+ **What did you expect to happen?**
+
+ **Steps to reproduce**
+ 1. ...
+ 2. ...
+
+ If it helps, include the schema you used and the exact command:
+
+ ```json
+ { "table": "...", "fields": [ ... ] }
+ ```
+
+ ```bash
+ schematico discover ...
+ ```
+
+ **Environment**
+ - schematico version: <!-- pip show schematico -->
+ - Python version:
+ - OS:
+ - Mode: generate / discover
+ - Model: <!-- e.g. gateway/anthropic:claude-sonnet-4-6 -->
+
+ **Logs**
+ Re-run with `LOG_LEVEL=DEBUG` and paste any relevant output (redact API keys!).
@@ -0,0 +1,17 @@
+ ---
+ name: Feature request
+ about: Suggest an idea or improvement
+ title: "[Feature] "
+ labels: enhancement
+ ---
+
+ **What problem are you trying to solve?**
+ Describe the use case. The *why* helps us design the right thing.
+
+ **Proposed solution**
+ What you'd like to see. Rough is fine.
+
+ **Alternatives you've considered**
+
+ **Anything else?**
+ Mockups, example schemas, links to similar tools, etc.
@@ -0,0 +1,14 @@
+ ## What does this PR do?
+
+ <!-- A short summary of the change and the problem it solves. -->
+
+ ## Why?
+
+ <!-- Context / motivation. Link any related issue: Closes #123 -->
+
+ ## Checklist
+
+ - [ ] Tests pass locally (`uv run pytest`)
+ - [ ] Code is formatted (`uv run black .`)
+ - [ ] Added/updated tests for the change
+ - [ ] Updated docs/README if behavior changed
@@ -0,0 +1,63 @@
+ name: Publish to PyPI
+
+ # Publishes to PyPI and creates a GitHub Release when a version tag (e.g. v0.1.0)
+ # is pushed. Uses PyPI Trusted Publishing (OIDC) — no API token or secret needed.
+ #
+ # Typical release flow:
+ # 1. Bump `version` in pyproject.toml and merge to main.
+ # 2. git tag v0.1.0 && git push origin v0.1.0
+ # The tag must match the version in pyproject.toml, or the run fails.
+
+ on:
+   push:
+     tags:
+       - "v*"
+   workflow_dispatch:
+
+ jobs:
+   publish:
+     name: Build and publish schematico
+     runs-on: ubuntu-latest
+     environment:
+       name: pypi
+       url: https://pypi.org/p/schematico
+     permissions:
+       id-token: write # Trusted Publishing to PyPI (OIDC)
+       contents: write # create the GitHub Release
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Install uv
+         uses: astral-sh/setup-uv@v5
+         with:
+           python-version: "3.12"
+
+       - name: Install dependencies
+         run: uv sync --dev
+
+       - name: Run tests
+         run: uv run pytest -q
+
+       - name: Verify tag matches package version
+         if: startsWith(github.ref, 'refs/tags/v')
+         run: |
+           tag_version="${GITHUB_REF_NAME#v}"
+           pkg_version="$(uv run python -c 'import schematico; print(schematico.__version__)')"
+           echo "tag=$tag_version package=$pkg_version"
+           if [ "$tag_version" != "$pkg_version" ]; then
+             echo "::error::Tag v$tag_version does not match pyproject version $pkg_version. Bump the version in pyproject.toml before tagging."
+             exit 1
+           fi
+
+       - name: Build distributions
+         run: uv build
+
+       - name: Publish to PyPI
+         uses: pypa/gh-action-pypi-publish@release/v1
+
+       - name: Create GitHub Release
+         if: startsWith(github.ref, 'refs/tags/v')
+         uses: softprops/action-gh-release@v2
+         with:
+           generate_release_notes: true
+           files: dist/*
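
Note (not part of the package): the "Verify tag matches package version" step strips one leading `v` from the tag name and compares the rest with `schematico.__version__`. A minimal Python equivalent of that comparison, for readers less familiar with the `${VAR#pattern}` shell expansion, might look like the sketch below. `tag_matches_version` is a hypothetical name, not something the workflow defines.

```python
def tag_matches_version(ref_name: str, package_version: str) -> bool:
    """Compare a Git tag such as 'v0.1.0' with a package version such as '0.1.0'."""
    # Mirrors the shell expansion ${GITHUB_REF_NAME#v}: drop one leading "v" if present.
    return ref_name.removeprefix("v") == package_version
```

The step reads the version from the imported package, while its error message refers to `pyproject.toml`. Whether `__version__` is derived from `pyproject.toml` is not visible in this hunk.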
@@ -0,0 +1,32 @@
+ name: Tests
+
+ on:
+   push:
+     branches: [main]
+   pull_request:
+     branches: [main]
+
+ jobs:
+   test:
+     name: pytest (Python ${{ matrix.python-version }})
+     runs-on: ubuntu-latest
+     strategy:
+       fail-fast: false
+       matrix:
+         python-version: ["3.11", "3.12", "3.13"]
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Install uv
+         uses: astral-sh/setup-uv@v5
+         with:
+           python-version: ${{ matrix.python-version }}
+
+       - name: Install dependencies
+         run: uv sync --all-extras --dev
+
+       - name: Run tests
+         run: uv run pytest -q
+
+       - name: Check formatting
+         run: uv run black --check .
@@ -0,0 +1,41 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ .venv/
+ *.egg-info/
+ .pytest_cache/
+
+ # Logs
+ logs
+ *.log
+ npm-debug.log*
+ yarn-debug.log*
+ yarn-error.log*
+ pnpm-debug.log*
+ lerna-debug.log*
+
+ node_modules
+ dist
+ dist-ssr
+ *.local
+
+ # Environment variables
+ .env
+ .env.local
+ .env.*.local
+
+ # Editor directories and files
+ .vscode/*
+ !.vscode/extensions.json
+ .idea
+ .DS_Store
+ *.suo
+ *.ntvs*
+ *.njsproj
+ *.sln
+ *.sw?
+
+ output/
+ output_files/
+ .schematico
+ output.json
@@ -0,0 +1,67 @@
+ # Contributing to Schematico
+
+ First off — thank you. Schematico is young and open, and every issue, idea, and
+ pull request genuinely helps. This guide will get you from clone to PR in a few
+ minutes.
+
+ ## Ways to contribute
+
+ - **Report a bug** — open an issue with steps to reproduce.
+ - **Request a feature** — open an issue describing the problem you're trying to
+   solve (the *why* matters more than the *how*).
+ - **Improve the docs** — typos, clearer examples, and better explanations are
+   always welcome.
+ - **Write code** — pick up an open issue, or propose a change. For anything
+   large, open an issue first so we can agree on direction before you invest time.
+
+ ## Development setup
+
+ Schematico uses [uv](https://docs.astral.sh/uv/) for dependency management.
+
+ ```bash
+ git clone https://github.com/Sententia-Lab/schematico.git
+ cd schematico
+ uv sync                # installs the package + dev tools into .venv
+
+ cp .env.example .env   # add any keys you need (most tests need none)
+ ```
+
+ ## Running the tests
+
+ ```bash
+ uv run pytest          # the full suite — runs in ~2s, no API keys needed
+ ```
+
+ The test suite mocks the LLM, so you don't need credentials or network access to
+ run it. Please add tests for any new behavior.
+
+ ## Code style
+
+ ```bash
+ uv run black .         # format before committing
+ ```
+
+ - Keep functions small and typed — the codebase uses modern type hints throughout.
+ - Match the style of the surrounding code.
+ - New public functions get a short docstring.
+
+ ## Submitting a pull request
+
+ 1. Fork the repo and create a branch off `main`
+    (`git checkout -b feature/my-change`).
+ 2. Make your change, with tests, and run `uv run pytest` + `uv run black .`.
+ 3. Write a clear PR description: what changed, and why.
+ 4. Open the PR against `main`. CI runs the test suite on every PR.
+
+ We aim to review PRs promptly. Small, focused PRs get merged fastest.
+
+ ## Reporting security issues
+
+ Please **do not** open a public issue for security vulnerabilities. Email
+ [narenchaudhry@gmail.com](mailto:narenchaudhry@gmail.com) instead, and we'll
+ work with you on a fix and disclosure timeline.
+
+ ## License
+
+ By contributing, you agree that your contributions will be licensed under the
+ [MIT License](LICENSE) that covers the project.
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Naren Chaudhry
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,289 @@
+ Metadata-Version: 2.4
+ Name: schematico
+ Version: 0.1.0
+ Summary: Generate realistic synthetic data from a simple JSON schema — library + CLI
+ Project-URL: Homepage, https://github.com/Sententia-Lab/schematico
+ Project-URL: Repository, https://github.com/Sententia-Lab/schematico
+ Project-URL: Issues, https://github.com/Sententia-Lab/schematico/issues
+ Author-email: Naren Chaudhry <narenchaudhry@gmail.com>
+ License: MIT
+ License-File: LICENSE
+ Keywords: cli,data-discovery,data-generation,pydantic-ai,schema,synthetic-data,test-data,web-search
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Environment :: Console
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: Topic :: Software Development :: Testing
+ Classifier: Topic :: Utilities
+ Requires-Python: >=3.11
+ Requires-Dist: httpx>=0.27.0
+ Requires-Dist: logfire>=3.0.0
+ Requires-Dist: pydantic-ai>=0.4.0
+ Requires-Dist: python-dotenv>=1.0.0
+ Requires-Dist: tavily-python>=0.7.26
+ Requires-Dist: tomli-w>=1.0.0
+ Requires-Dist: typer>=0.12.0
+ Description-Content-Type: text/markdown
+
+ <p align="center">
+   <img src="assets/logo.svg" alt="Schematico" width="110" />
+ </p>
+
+ <h1 align="center">Schematico</h1>
+
+ <p align="center">
+   <strong>Describe the data you want. Get it back as clean JSON.</strong><br/>
+   Find real public records on the live web, or synthesize realistic ones — from one tiny schema.
+ </p>
+
+ <p align="center">
+   <a href="https://pypi.org/project/schematico/"><img src="https://img.shields.io/pypi/v/schematico.svg" alt="PyPI"></a>
+   <a href="https://pypi.org/project/schematico/"><img src="https://img.shields.io/pypi/pyversions/schematico.svg" alt="Python versions"></a>
+   <a href="https://github.com/Sententia-Lab/schematico/actions/workflows/test.yml"><img src="https://github.com/Sententia-Lab/schematico/actions/workflows/test.yml/badge.svg" alt="Tests"></a>
+   <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License: MIT"></a>
+ </p>
+
+ ---
+
+ ## Why Schematico exists
+
+ Schematico started with a frustration: the public data I needed was *out there*,
+ but never in a form I could use. Congressional race results, public filings,
+ sports stats, niche reference tables — scattered across a dozen sites, trapped in
+ HTML, or sitting behind a paywall. Getting a clean table meant hours of copy-paste,
+ fragile scrapers, and manual cleanup.
+
+ So I built a tool where you describe the shape of the data you want — once, in a
+ few lines — and let an AI agent go **find it on the live web** and hand it back as
+ structured JSON. And when no real data exists yet (you're seeding a dev database,
+ writing tests, building a demo), the same schema can **synthesize** realistic
+ records instead.
+
+ One schema. Two ways to fill it.
+
+ | Mode | What it does | Needs |
+ |---|---|---|
+ | **`discover`** | An AI agent searches the live web (via [Tavily](https://tavily.com)) and returns **real** records matching your schema. | LLM key + `TAVILY_API_KEY` |
+ | **`generate`** | An LLM **synthesizes** realistic, coherent records from your schema — no web access, fully made-up data. | LLM key |
+
+ > Both modes are LLM-backed. Output is always validated against your schema and
+ > de-duplicated before you get it.
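
Note (not part of the packaged README): the README says output is de-duplicated but does not say how. One common approach for a list of JSON-like dicts is to key each record on a canonical JSON serialization, as in the sketch below. `dedupe_records` is a hypothetical name and this is an illustration of the idea, not Schematico's implementation, which may compare records differently.

```python
import json
from typing import Any


def dedupe_records(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop exact duplicates, keeping the first occurrence and the original order."""
    seen: set[str] = set()
    unique: list[dict[str, Any]] = []
    for record in records:
        # sort_keys makes the key independent of field order in the dict.
        key = json.dumps(record, sort_keys=True, default=str)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

This only catches exact matches. Near-duplicates (different casing, trailing whitespace) would need normalization first.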
+
+ ---
+
+ ## Install
+
+ ```bash
+ pipx install schematico  # as a CLI tool
+ uv add schematico        # as a library
+ pip install schematico   # the classic way
+ ```
+
+ ---
+
+ ## Quick start (CLI)
+
+ ```bash
+ # 1. Point Schematico at a model. The default routes Claude through the
+ #    Pydantic AI Gateway — set its key (or see "Bring your own models" below).
+ export PYDANTIC_AI_GATEWAY_API_KEY=...
+
+ # 2. For discover mode, add a Tavily key (free tier at https://tavily.com).
+ export TAVILY_API_KEY=...
+
+ # 3. Create a project interactively. You'll be prompted for mode, schema,
+ #    output dir, count, and model. State lives in ./.schematico/.
+ schematico new
+
+ # 4. Run it.
+ schematico discover   # find real records on the web
+ # or
+ schematico generate   # synthesize records
+ ```
+
+ Output is written to `./.schematico/output/<project>_<timestamp>.json` by default.
+ Override per run with `--output FILE_OR_DIR` and `--count N`.
+
+ ### Command reference
+
+ ```
+ schematico new                      # interactive project wizard
+ schematico list                     # all saved project configs
+ schematico generate [--config N]    # synthesize records (uses default project)
+ schematico discover [--config N]    # find real records on the web
+ schematico delete NAME              # delete a config (-m to disambiguate mode)
+ schematico help                     # the full command tree, every flag
+ ```
+
+ Common flags on `generate` / `discover`: `--config/-c`, `--output/-o`,
+ `--count/-n`, `--model/-m`.
+
+ ---
+
+ ## Schema format
+
+ A schema is a small JSON object describing the table you want:
+
+ ```json
+ {
+   "table": "congressional_elections",
+   "rows": 50,
+   "instructions": "U.S. House races in the 2026 midterms.",
+   "fields": [
+     { "name": "district", "type": "string", "description": "state and district, e.g. 'CA-12'" },
+     { "name": "election_date", "type": "string", "description": "ISO 8601 date" },
+     { "name": "incumbent_party", "type": "enum", "values": ["D", "R", "I"] },
+     { "name": "is_open_seat", "type": "bool" }
+   ]
+ }
+ ```
+
+ | Top-level key | Required | Meaning |
+ |---|---|---|
+ | `table` | ✅ | Name of the table (also names the output model). |
+ | `fields` | ✅ | List of field definitions (see below). |
+ | `rows` | — | How many records to produce (default `25`). |
+ | `instructions` | — | Free-text guidance passed to the agent. |
+
+ ### Field types
+
+ Types are deliberately minimal — the **`description`** does the heavy lifting.
+
+ | Type | Python | Notes |
+ |---|---|---|
+ | `string` | `str` | Any text. Shape it with `description`, e.g. `"UUID v4"`, `"ISO 8601 timestamp"`, `"ISO 3166 country code"`. |
+ | `int` | `int` | Optional `min` / `max`. |
+ | `float` | `float` | Optional `min` / `max`. |
+ | `bool` | `bool` | `true` / `false`. |
+ | `enum` | one of `values` | Requires a non-empty `values` list. |
+
+ > There's no dedicated `uuid` / `email` / `timestamp` type on purpose. Use
+ > `string` and say what you want in `description` — the model fills it in
+ > accordingly, and you're never boxed in by a fixed type list.
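
Note (not part of the packaged README): to make the five field types concrete, here is a small stdlib-only sketch that checks one record against a schema in the JSON format described above. `validate_record` is a hypothetical helper written for illustration. The package itself builds on Pydantic models, so its real validation will differ. Two assumptions are made explicit in the comments: a `float` field accepts Python ints, and `min` / `max` are inclusive bounds.

```python
from typing import Any

# Python types for the scalar schema types in the "Field types" table.
# Assumption: a "float" field also accepts ints such as 3.
_PY_TYPES = {"string": str, "int": int, "float": (int, float), "bool": bool}


def validate_record(schema: dict[str, Any], record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems: list[str] = []
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if ftype == "enum":
            if value not in field["values"]:
                problems.append(f"{name}: {value!r} not in {field['values']}")
            continue
        numeric = ftype in ("int", "float")
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if not isinstance(value, _PY_TYPES[ftype]) or (numeric and isinstance(value, bool)):
            problems.append(f"{name}: expected {ftype}, got {type(value).__name__}")
            continue
        if numeric:
            # Assumption: min and max are inclusive.
            if "min" in field and value < field["min"]:
                problems.append(f"{name}: {value} is below min {field['min']}")
            if "max" in field and value > field["max"]:
                problems.append(f"{name}: {value} is above max {field['max']}")
    return problems
```

Note that `description` is never checked here. It guides the model, so a `string` described as "ISO 8601 date" would pass with any text.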
+
+ ---
+
+ ## Library usage
+
+ Define a schema as a Pydantic model and call `run_generation` or `run_discovery`:
+
+ ```python
+ from pydantic import BaseModel, Field
+ from schematico import run_generation
+
+ class User(BaseModel):
+     id: str = Field(description="UUID v4")
+     full_name: str = Field(description="realistic full name")
+     email: str = Field(description="work email matching the name")
+     role: str = Field(description="one of: admin, editor, viewer")
+
+ records = run_generation(
+     User,
+     samples=10,
+     instructions="EU-based users only. Emails must match the full_name.",
+ )
+ # -> list[dict], validated and de-duplicated
+ ```
+
+ Prefer JSON schemas? Load one and run it:
+
+ ```python
+ from schematico import model_from_json, run_generation
+
+ model, rows, instructions = model_from_json("schema.json")
+ records = run_generation(model, samples=rows, instructions=instructions)
+ ```
+
+ To find **real** data on the web instead, swap in `run_discovery` (needs
+ `TAVILY_API_KEY`):
+
+ ```python
+ from schematico import run_discovery
+ records = run_discovery(User, samples=25, instructions="...")
+ ```
+
+ Both functions accept an optional `progress_cb(found, total, event)` callback and
+ a `logfire_token` for tracing.
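
Note (not part of the packaged README): the README gives the callback shape `progress_cb(found, total, event)` but not the argument types. The sketch below assumes `found` and `total` are record counts and treats `event` as an opaque value that can be formatted as text. Both are assumptions, and `make_progress_printer` is a hypothetical name.

```python
from typing import Any, Callable


def make_progress_printer(lines: list[str]) -> Callable[[int, int, Any], None]:
    """Build a callback with the (found, total, event) shape the README describes."""

    def progress_cb(found: int, total: int, event: Any) -> None:
        # Assumption: found/total are record counts; event is treated as opaque.
        pct = 100 * found // total if total else 0
        lines.append(f"[{pct:3d}%] {found}/{total} {event}")

    return progress_cb
```

Collecting lines in a list rather than printing keeps the callback easy to test. Swapping `lines.append` for `print` gives a console reporter.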
+
+ ---
+
+ ## Bring your own models
+
+ Schematico runs on [pydantic-ai](https://ai.pydantic.dev/), so you can point it
+ at virtually any model — hosted, gateway-routed, or local — and even build a
+ **failover chain** that tries each in order.
+
+ ```python
+ from schematico import SchematicoModel, get_llm_model, run_discovery
+
+ model = get_llm_model([
+     # try the gateway first…
+     SchematicoModel(model="gateway/anthropic:claude-sonnet-4-6"),
+     # …fall back to a direct provider…
+     SchematicoModel(model="openai:gpt-4.1", api_key="sk-..."),
+     # …then a local, keyless model.
+     SchematicoModel(model="ollama:llama3.2", base_url="http://localhost:11434/v1"),
+ ])
+
+ records = run_discovery(MySchema, samples=50, model=model)
+ ```
+
+ - A bare model string (`"anthropic:claude-sonnet-4-6"`) reads credentials from the
+   provider's usual env var.
+ - A `SchematicoModel` lets you pin `api_key` and `base_url` per model.
+ - A list becomes an automatic failover chain.
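
Note (not part of the packaged README): "failover chain" here means trying each model in order until one succeeds. The generic pattern can be sketched without any model library, as below. `first_success` is a hypothetical helper for illustration only. How `get_llm_model` actually wires the chain is in `schematico/providers.py`, which this excerpt does not show.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_success(attempts: Sequence[Callable[[], T]]) -> T:
    """Call each attempt in order and return the first result that does not raise."""
    errors: list[Exception] = []
    for attempt in attempts:
        try:
            return attempt()
        except Exception as exc:  # a real chain would catch narrower error types
            errors.append(exc)
    raise RuntimeError(f"all {len(errors)} attempts failed: {errors!r}")
```

Order matters: later entries are only reached when every earlier one has failed, so the cheapest or most trusted option usually goes first.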
+
+ From the CLI, set the model per project (`schematico new`, or
+ `schematico <mode> use model <id>`) and the env var that holds its key.
+
+ ---
+
+ ## Configuration
+
+ | Env var | Purpose |
+ |---|---|
+ | `PYDANTIC_AI_GATEWAY_API_KEY` | Key for the default gateway-routed model. |
+ | `PAI_MODEL` | Override the default model id (`gateway/anthropic:claude-sonnet-4-6`). |
+ | `TAVILY_API_KEY` | Required for `discover` mode (live web search). |
+ | `LOGFIRE_TOKEN` | Optional. Send traces, tool calls, and token usage to [Logfire](https://pydantic.dev/logfire). |
+ | `LOG_LEVEL` | `WARNING` (default), `INFO`, or `DEBUG`. |
+
+ Schematico auto-loads a `.env` file from the current directory — see
+ [`.env.example`](.env.example). Project configs live in `./.schematico/` as
+ `<name>.<mode>.toml` files.
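
Note (not part of the packaged README): given the `<name>.<mode>.toml` naming convention, saved projects can be enumerated from file names alone. The sketch below assumes the only modes are `generate` and `discover`, as in the README's mode table, and does not read the TOML contents, whose keys are not shown in this excerpt. `list_project_configs` is a hypothetical name, not the CLI's own function.

```python
from pathlib import Path


def list_project_configs(root: Path) -> list[tuple[str, str]]:
    """Return (name, mode) pairs for files named <name>.<mode>.toml under root."""
    found: list[tuple[str, str]] = []
    for path in sorted(root.glob("*.toml")):
        # "users.generate.toml" -> stem "users.generate" -> ("users", "generate")
        name, sep, mode = path.stem.rpartition(".")
        if sep and mode in {"generate", "discover"}:
            found.append((name, mode))
    return found
```

Splitting from the right lets a project name contain dots itself, and it explains why `schematico delete NAME` offers `-m`: the same name can exist once per mode.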
+
+ ---
+
+ ## Coming soon
+
+ Schematico is just getting started. On the roadmap:
+
+ - 📦 **More output formats** — CSV, Excel, SQL inserts, and Parquet, not just JSON.
+ - 🔎 **Smarter discovery** — source citations per record, deeper crawling, and a
+   second-pass agent that verifies and de-duplicates findings.
+ - 🧩 **Richer schemas** — nested objects, relationships between tables, and
+   reusable field presets.
+ - 🗄️ **Direct sinks** — write straight into a database or a dataframe.
+ - ⚡ **Offline generation** — a fast, keyless synthesis mode for when you don't
+   want to call a model at all.
+
+ Have an idea? [Open an issue](https://github.com/Sententia-Lab/schematico/issues) —
+ this is the moment to shape where it goes.
+
+ ---
+
+ ## Contributing
+
+ Contributions are very welcome — issues, docs, and PRs all help. See
+ [CONTRIBUTING.md](CONTRIBUTING.md) to get from clone to PR in a couple of minutes.
+ The test suite mocks the LLM, so `uv run pytest` needs no API keys.
+
+ ## License
+
+ MIT. See [`LICENSE`](LICENSE).