schematico 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- schematico-0.1.0/.env.example +31 -0
- schematico-0.1.0/.github/ISSUE_TEMPLATE/bug_report.md +35 -0
- schematico-0.1.0/.github/ISSUE_TEMPLATE/feature_request.md +17 -0
- schematico-0.1.0/.github/PULL_REQUEST_TEMPLATE.md +14 -0
- schematico-0.1.0/.github/workflows/publish.yml +63 -0
- schematico-0.1.0/.github/workflows/test.yml +32 -0
- schematico-0.1.0/.gitignore +41 -0
- schematico-0.1.0/CONTRIBUTING.md +67 -0
- schematico-0.1.0/LICENSE +21 -0
- schematico-0.1.0/PKG-INFO +289 -0
- schematico-0.1.0/README.md +256 -0
- schematico-0.1.0/assets/logo.svg +11 -0
- schematico-0.1.0/cookbook/run_discovery.py +51 -0
- schematico-0.1.0/cookbook/run_generation.py +51 -0
- schematico-0.1.0/pyproject.toml +61 -0
- schematico-0.1.0/schematico/__init__.py +27 -0
- schematico-0.1.0/schematico/cli/__init__.py +0 -0
- schematico-0.1.0/schematico/cli/main.py +375 -0
- schematico-0.1.0/schematico/cli/progress.py +31 -0
- schematico-0.1.0/schematico/cli/projects.py +173 -0
- schematico-0.1.0/schematico/cli/runner.py +125 -0
- schematico-0.1.0/schematico/cli/wizard.py +177 -0
- schematico-0.1.0/schematico/discovery.py +102 -0
- schematico-0.1.0/schematico/generator.py +97 -0
- schematico-0.1.0/schematico/helpers.py +38 -0
- schematico-0.1.0/schematico/logging.py +34 -0
- schematico-0.1.0/schematico/models.py +140 -0
- schematico-0.1.0/schematico/providers.py +95 -0
- schematico-0.1.0/schematico/tools/tavily_tools.py +79 -0
- schematico-0.1.0/tests/test_generator.py +190 -0
- schematico-0.1.0/tests/test_logging.py +22 -0
- schematico-0.1.0/tests/test_projects.py +131 -0
- schematico-0.1.0/tests/test_providers.py +50 -0
- schematico-0.1.0/uv.lock +3609 -0
schematico-0.1.0/.env.example
ADDED
@@ -0,0 +1,31 @@
# Copy this file to `.env` and fill in the values you need.
# Schematico auto-loads `.env` from the current directory.

# ── LLM credentials ───────────────────────────────────────────────────────────
# Schematico's default model is gateway/anthropic:claude-sonnet-4-6, routed
# through the Pydantic AI Gateway. Set this to use it.
PYDANTIC_AI_GATEWAY_API_KEY=

# Optional: override the default model. Any pydantic-ai model string works, e.g.
#   anthropic:claude-sonnet-4-6
#   gateway/openai:gpt-4.1
#   ollama:llama3.2 (local, keyless)
PAI_MODEL=

# If you'd rather hit a provider directly instead of the gateway, set its native
# key and point PAI_MODEL at that provider (e.g. anthropic:claude-sonnet-4-6).
# ANTHROPIC_API_KEY=
# OPENAI_API_KEY=

# ── Discover mode (live web search) ───────────────────────────────────────────
# Required for `schematico discover`. Free tier at https://tavily.com.
TAVILY_API_KEY=

# ── Observability (optional) ──────────────────────────────────────────────────
# A Logfire write token. If set, traces, tool calls, and token usage are sent to
# https://pydantic.dev/logfire. Leave blank to log to stderr only.
LOGFIRE_TOKEN=

# ── Logging ───────────────────────────────────────────────────────────────────
# WARNING (default), INFO, or DEBUG. Controls stderr verbosity.
LOG_LEVEL=WARNING
schematico-0.1.0/.github/ISSUE_TEMPLATE/bug_report.md
ADDED
@@ -0,0 +1,35 @@
---
name: Bug report
about: Something isn't working the way you expected
title: "[Bug] "
labels: bug
---

**What happened?**
A clear description of the bug.

**What did you expect to happen?**

**Steps to reproduce**
1. ...
2. ...

If it helps, include the schema you used and the exact command:

```json
{ "table": "...", "fields": [ ... ] }
```

```bash
schematico discover ...
```

**Environment**
- schematico version: <!-- pip show schematico -->
- Python version:
- OS:
- Mode: generate / discover
- Model: <!-- e.g. gateway/anthropic:claude-sonnet-4-6 -->

**Logs**
Re-run with `LOG_LEVEL=DEBUG` and paste any relevant output (redact API keys!).
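
The **Environment** section of the bug template asks for the schematico version, Python version, and OS. The standard-library sketch below collects those three values. The helper name `environment_report` is hypothetical. `importlib.metadata.version` reads the installed distribution's version, the same number `pip show schematico` reports.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(dist_name: str = "schematico") -> dict[str, str]:
    """Collect the version/OS fields that the bug-report template asks for."""
    try:
        dist_version = version(dist_name)
    except PackageNotFoundError:
        # Not installed in this interpreter (e.g. running from a bare checkout).
        dist_version = "not installed"
    return {
        f"{dist_name} version": dist_version,
        "Python version": platform.python_version(),
        "OS": platform.platform(),
    }


if __name__ == "__main__":
    # Print in the same "- key: value" shape the template uses.
    for key, value in environment_report().items():
        print(f"- {key}: {value}")
```

The Mode and Model fields still have to be filled in by hand, since they depend on the run being reported.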
schematico-0.1.0/.github/ISSUE_TEMPLATE/feature_request.md
ADDED
@@ -0,0 +1,17 @@
---
name: Feature request
about: Suggest an idea or improvement
title: "[Feature] "
labels: enhancement
---

**What problem are you trying to solve?**
Describe the use case. The *why* helps us design the right thing.

**Proposed solution**
What you'd like to see. Rough is fine.

**Alternatives you've considered**

**Anything else?**
Mockups, example schemas, links to similar tools, etc.
schematico-0.1.0/.github/PULL_REQUEST_TEMPLATE.md
ADDED
@@ -0,0 +1,14 @@
## What does this PR do?

<!-- A short summary of the change and the problem it solves. -->

## Why?

<!-- Context / motivation. Link any related issue: Closes #123 -->

## Checklist

- [ ] Tests pass locally (`uv run pytest`)
- [ ] Code is formatted (`uv run black .`)
- [ ] Added/updated tests for the change
- [ ] Updated docs/README if behavior changed
schematico-0.1.0/.github/workflows/publish.yml
ADDED
@@ -0,0 +1,63 @@
name: Publish to PyPI

# Publishes to PyPI and creates a GitHub Release when a version tag (e.g. v0.1.0)
# is pushed. Uses PyPI Trusted Publishing (OIDC) — no API token or secret needed.
#
# Typical release flow:
#   1. Bump `version` in pyproject.toml and merge to main.
#   2. git tag v0.1.0 && git push origin v0.1.0
# The tag must match the version in pyproject.toml, or the run fails.

on:
  push:
    tags:
      - "v*"
  workflow_dispatch:

jobs:
  publish:
    name: Build and publish schematico
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/p/schematico
    permissions:
      id-token: write # Trusted Publishing to PyPI (OIDC)
      contents: write # create the GitHub Release
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: uv sync --dev

      - name: Run tests
        run: uv run pytest -q

      - name: Verify tag matches package version
        if: startsWith(github.ref, 'refs/tags/v')
        run: |
          tag_version="${GITHUB_REF_NAME#v}"
          pkg_version="$(uv run python -c 'import schematico; print(schematico.__version__)')"
          echo "tag=$tag_version package=$pkg_version"
          if [ "$tag_version" != "$pkg_version" ]; then
            echo "::error::Tag v$tag_version does not match pyproject version $pkg_version. Bump the version in pyproject.toml before tagging."
            exit 1
          fi

      - name: Build distributions
        run: uv build

      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1

      - name: Create GitHub Release
        if: startsWith(github.ref, 'refs/tags/v')
        uses: softprops/action-gh-release@v2
        with:
          generate_release_notes: true
          files: dist/*
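
The "Verify tag matches package version" step uses the shell expansion `${GITHUB_REF_NAME#v}` to strip one leading `v` from the tag name. It then compares the remainder with `schematico.__version__` as an exact string. The same rule is sketched below in Python, which is handy for checking it locally. `strip_tag_prefix`, `tag_matches_version`, and `check_release_tag` are hypothetical helpers, not part of the workflow.

```python
def strip_tag_prefix(ref_name: str) -> str:
    """Mirror ${GITHUB_REF_NAME#v}: remove a single leading 'v', if present."""
    return ref_name.removeprefix("v")


def tag_matches_version(ref_name: str, pkg_version: str) -> bool:
    """True when a tag such as 'v0.1.0' names exactly the package version '0.1.0'."""
    return strip_tag_prefix(ref_name) == pkg_version


def check_release_tag(ref_name: str, pkg_version: str) -> None:
    """Exit with the workflow's error message when tag and version disagree."""
    tag_version = strip_tag_prefix(ref_name)
    if tag_version != pkg_version:
        raise SystemExit(
            f"Tag v{tag_version} does not match pyproject version {pkg_version}. "
            "Bump the version in pyproject.toml before tagging."
        )
```

The step's `if: startsWith(github.ref, 'refs/tags/v')` guard skips this check on a manual `workflow_dispatch` run from a branch. The build and publish steps still run in that case.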
schematico-0.1.0/.github/workflows/test.yml
ADDED
@@ -0,0 +1,32 @@
name: Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    name: pytest (Python ${{ matrix.python-version }})
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.11", "3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: uv sync --all-extras --dev

      - name: Run tests
        run: uv run pytest -q

      - name: Check formatting
        run: uv run black --check .
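
The `strategy.matrix` in `test.yml` fans the job out into one run per listed Python version. `fail-fast: false` keeps the other runs going when one of them fails. GitHub Actions builds a matrix as the Cartesian product of its axes. The small sketch below performs that expansion and renders the job names. The `expand_matrix` helper is hypothetical and ignores `include`/`exclude` entries.

```python
from itertools import product


def expand_matrix(matrix: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return one dict per job: the Cartesian product of every matrix axis."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


# The single-axis matrix from test.yml yields three jobs.
jobs = expand_matrix({"python-version": ["3.11", "3.12", "3.13"]})
names = [f"pytest (Python {job['python-version']})" for job in jobs]
```

Adding a second axis, such as an OS list, would multiply the job count rather than add to it.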
schematico-0.1.0/.gitignore
ADDED
@@ -0,0 +1,41 @@
# Python
__pycache__/
*.py[cod]
.venv/
*.egg-info/
.pytest_cache/

# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*

node_modules
dist
dist-ssr
*.local

# Environment variables
.env
.env.local
.env.*.local

# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?

output/
output_files/
.schematico
output.json
schematico-0.1.0/CONTRIBUTING.md
ADDED
@@ -0,0 +1,67 @@
# Contributing to Schematico

First off — thank you. Schematico is young and open, and every issue, idea, and
pull request genuinely helps. This guide will get you from clone to PR in a few
minutes.

## Ways to contribute

- **Report a bug** — open an issue with steps to reproduce.
- **Request a feature** — open an issue describing the problem you're trying to
  solve (the *why* matters more than the *how*).
- **Improve the docs** — typos, clearer examples, and better explanations are
  always welcome.
- **Write code** — pick up an open issue, or propose a change. For anything
  large, open an issue first so we can agree on direction before you invest time.

## Development setup

Schematico uses [uv](https://docs.astral.sh/uv/) for dependency management.

```bash
git clone https://github.com/Sententia-Lab/schematico.git
cd schematico
uv sync                # installs the package + dev tools into .venv

cp .env.example .env   # add any keys you need (most tests need none)
```

## Running the tests

```bash
uv run pytest          # the full suite — runs in ~2s, no API keys needed
```

The test suite mocks the LLM, so you don't need credentials or network access to
run it. Please add tests for any new behavior.
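
Schematico's own tests aren't shown here. As a generic illustration of the mocking pattern, the sketch below injects the model call as a plain function, so a test can pass in a fake that returns canned JSON. No network access or keys are involved. `generate_rows` and `fake_llm` are hypothetical. For agent-level tests, pydantic-ai also ships `TestModel` and `FunctionModel`, which stand in for a real provider.

```python
import json
from collections.abc import Callable

# Hypothetical shape of a model call: prompt in, JSON text out.
LLMCall = Callable[[str], str]


def generate_rows(llm: LLMCall, table: str, count: int) -> list[dict]:
    """Ask the injected model for rows and parse its JSON reply."""
    prompt = f"Return a JSON array of {count} rows for the table {table!r}."
    rows = json.loads(llm(prompt))
    if not isinstance(rows, list):
        raise ValueError("model reply was not a JSON array")
    return rows[:count]


def test_generate_rows_without_network() -> None:
    seen_prompts: list[str] = []

    def fake_llm(prompt: str) -> str:
        seen_prompts.append(prompt)  # record what would have been sent
        return json.dumps([{"id": 1}, {"id": 2}, {"id": 3}])

    rows = generate_rows(fake_llm, "users", count=2)
    assert rows == [{"id": 1}, {"id": 2}]
    assert "users" in seen_prompts[0]
```

Because the dependency is a parameter, the test needs no patching and runs in milliseconds.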

## Code style

```bash
uv run black .         # format before committing
```

- Keep functions small and typed — the codebase uses modern type hints throughout.
- Match the style of the surrounding code.
- New public functions get a short docstring.

## Submitting a pull request

1. Fork the repo and create a branch off `main`
   (`git checkout -b feature/my-change`).
2. Make your change, with tests, and run `uv run pytest` + `uv run black .`.
3. Write a clear PR description: what changed, and why.
4. Open the PR against `main`. CI runs the test suite on every PR.

We aim to review PRs promptly. Small, focused PRs get merged fastest.

## Reporting security issues

Please **do not** open a public issue for security vulnerabilities. Email
[narenchaudhry@gmail.com](mailto:narenchaudhry@gmail.com) instead, and we'll
work with you on a fix and disclosure timeline.

## License

By contributing, you agree that your contributions will be licensed under the
[MIT License](LICENSE) that covers the project.
schematico-0.1.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Naren Chaudhry

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
schematico-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,289 @@
Metadata-Version: 2.4
Name: schematico
Version: 0.1.0
Summary: Generate realistic synthetic data from a simple JSON schema — library + CLI
Project-URL: Homepage, https://github.com/Sententia-Lab/schematico
Project-URL: Repository, https://github.com/Sententia-Lab/schematico
Project-URL: Issues, https://github.com/Sententia-Lab/schematico/issues
Author-email: Naren Chaudhry <narenchaudhry@gmail.com>
License: MIT
License-File: LICENSE
Keywords: cli,data-discovery,data-generation,pydantic-ai,schema,synthetic-data,test-data,web-search
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Testing
Classifier: Topic :: Utilities
Requires-Python: >=3.11
Requires-Dist: httpx>=0.27.0
Requires-Dist: logfire>=3.0.0
Requires-Dist: pydantic-ai>=0.4.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: tavily-python>=0.7.26
Requires-Dist: tomli-w>=1.0.0
Requires-Dist: typer>=0.12.0
Description-Content-Type: text/markdown

<p align="center">
  <img src="assets/logo.svg" alt="Schematico" width="110" />
</p>

<h1 align="center">Schematico</h1>

<p align="center">
  <strong>Describe the data you want. Get it back as clean JSON.</strong><br/>
  Find real public records on the live web, or synthesize realistic ones — from one tiny schema.
</p>

<p align="center">
  <a href="https://pypi.org/project/schematico/"><img src="https://img.shields.io/pypi/v/schematico.svg" alt="PyPI"></a>
  <a href="https://pypi.org/project/schematico/"><img src="https://img.shields.io/pypi/pyversions/schematico.svg" alt="Python versions"></a>
  <a href="https://github.com/Sententia-Lab/schematico/actions/workflows/test.yml"><img src="https://github.com/Sententia-Lab/schematico/actions/workflows/test.yml/badge.svg" alt="Tests"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License: MIT"></a>
</p>

---

## Why Schematico exists

Schematico started with a frustration: the public data I needed was *out there*,
but never in a form I could use. Congressional race results, public filings,
sports stats, niche reference tables — scattered across a dozen sites, trapped in
HTML, or sitting behind a paywall. Getting a clean table meant hours of copy-paste,
fragile scrapers, and manual cleanup.

So I built a tool where you describe the shape of the data you want — once, in a
few lines — and let an AI agent go **find it on the live web** and hand it back as
structured JSON. And when no real data exists yet (you're seeding a dev database,
writing tests, building a demo), the same schema can **synthesize** realistic
records instead.

One schema. Two ways to fill it.

| Mode | What it does | Needs |
|---|---|---|
| **`discover`** | An AI agent searches the live web (via [Tavily](https://tavily.com)) and returns **real** records matching your schema. | LLM key + `TAVILY_API_KEY` |
| **`generate`** | An LLM **synthesizes** realistic, coherent records from your schema — no web access, fully made-up data. | LLM key |

> Both modes are LLM-backed. Output is always validated against your schema and
> de-duplicated before you get it.
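
The README says output is de-duplicated but does not say how records are compared. The sketch below shows one plausible approach. It is an assumption, not Schematico's actual rule: two records count as duplicates when all their field values are identical. Each record is keyed by a canonical JSON encoding, and first occurrences are kept in their original order. `dedupe_records` is hypothetical.

```python
import json
from typing import Any


def dedupe_records(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop exact duplicates, keeping the first occurrence and the original order."""
    seen: set[str] = set()
    unique: list[dict[str, Any]] = []
    for record in records:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} produce the same key;
        # default=str covers values json can't encode natively (e.g. datetimes).
        key = json.dumps(record, sort_keys=True, default=str)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Exact matching leaves near-duplicates in place, such as the same name in different casing. Catching those would need a key built from a normalized subset of fields.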

---

## Install

```bash
pipx install schematico   # as a CLI tool
uv add schematico         # as a library
pip install schematico    # the classic way
```

---

## Quick start (CLI)

```bash
# 1. Point Schematico at a model. The default routes Claude through the
#    Pydantic AI Gateway — set its key (or see "Bring your own models" below).
export PYDANTIC_AI_GATEWAY_API_KEY=...

# 2. For discover mode, add a Tavily key (free tier at https://tavily.com).
export TAVILY_API_KEY=...

# 3. Create a project interactively. You'll be prompted for mode, schema,
#    output dir, count, and model. State lives in ./.schematico/.
schematico new

# 4. Run it.
schematico discover      # find real records on the web
# or
schematico generate      # synthesize records
```

Output is written to `./.schematico/output/<project>_<timestamp>.json` by default.
Override per run with `--output FILE_OR_DIR` and `--count N`.

### Command reference

```
schematico new                     # interactive project wizard
schematico list                    # all saved project configs
schematico generate [--config N]   # synthesize records (uses default project)
schematico discover [--config N]   # find real records on the web
schematico delete NAME             # delete a config (-m to disambiguate mode)
schematico help                    # the full command tree, every flag
```

Common flags on `generate` / `discover`: `--config/-c`, `--output/-o`,
`--count/-n`, `--model/-m`.

---

## Schema format

A schema is a small JSON object describing the table you want:

```json
{
  "table": "congressional_elections",
  "rows": 50,
  "instructions": "U.S. House races in the 2026 midterms.",
  "fields": [
    { "name": "district", "type": "string", "description": "state and district, e.g. 'CA-12'" },
    { "name": "election_date", "type": "string", "description": "ISO 8601 date" },
    { "name": "incumbent_party", "type": "enum", "values": ["D", "R", "I"] },
    { "name": "is_open_seat", "type": "bool" }
  ]
}
```

| Top-level key | Required | Meaning |
|---|---|---|
| `table` | ✅ | Name of the table (also names the output model). |
| `fields` | ✅ | List of field definitions (see below). |
| `rows` | — | How many records to produce (default `25`). |
| `instructions` | — | Free-text guidance passed to the agent. |

### Field types

Types are deliberately minimal — the **`description`** does the heavy lifting.

| Type | Python | Notes |
|---|---|---|
| `string` | `str` | Any text. Shape it with `description`, e.g. `"UUID v4"`, `"ISO 8601 timestamp"`, `"ISO 3166 country code"`. |
| `int` | `int` | Optional `min` / `max`. |
| `float` | `float` | Optional `min` / `max`. |
| `bool` | `bool` | `true` / `false`. |
| `enum` | one of `values` | Requires a non-empty `values` list. |

> There's no dedicated `uuid` / `email` / `timestamp` type on purpose. Use
> `string` and say what you want in `description` — the model fills it in
> accordingly, and you're never boxed in by a fixed type list.
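
The schema rules above can be checked with the standard library alone:

- `table` and `fields` are required.
- `rows` defaults to 25.
- There are five field types.
- Numbers may carry `min`/`max`.
- `enum` needs a non-empty `values` list.

The sketch below is a hypothetical illustration of those rules, not Schematico's `model_from_json`. It also assumes that a JSON integer is acceptable for a `float` field.

```python
from typing import Any

# Python types accepted per non-enum field type (assumption: int is fine for float).
FIELD_TYPES = {"string": str, "int": int, "float": (int, float), "bool": bool}
NUMERIC = {"int", "float"}


def check_schema(schema: dict[str, Any]) -> int:
    """Validate the schema's shape and return the row count (default 25)."""
    for key in ("table", "fields"):
        if key not in schema:
            raise ValueError(f"missing required key {key!r}")
    for field in schema["fields"]:
        ftype = field.get("type")
        if ftype == "enum":
            if not field.get("values"):
                raise ValueError(f"enum field {field['name']!r} needs non-empty 'values'")
        elif ftype not in FIELD_TYPES:
            raise ValueError(f"field {field['name']!r} has unknown type {ftype!r}")
    return schema.get("rows", 25)


def check_record(schema: dict[str, Any], record: dict[str, Any]) -> list[str]:
    """Return the problems found in one record; an empty list means it is valid."""
    problems: list[str] = []
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if ftype == "enum":
            if value not in field["values"]:
                problems.append(f"{name}: {value!r} not in {field['values']}")
            continue
        # bool is a subclass of int in Python, so reject it explicitly for numbers.
        wrong_bool = ftype in NUMERIC and isinstance(value, bool)
        if wrong_bool or not isinstance(value, FIELD_TYPES[ftype]):
            problems.append(f"{name}: expected {ftype}")
            continue
        if ftype in NUMERIC:
            if "min" in field and value < field["min"]:
                problems.append(f"{name}: below min {field['min']}")
            if "max" in field and value > field["max"]:
                problems.append(f"{name}: above max {field['max']}")
    return problems
```

Returning a list of problems, rather than raising on the first one, makes it easy to report everything wrong with a generated record at once.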

---

## Library usage

Define a schema as a Pydantic model and call `run_generation` or `run_discovery`:

```python
from pydantic import BaseModel, Field
from schematico import run_generation

class User(BaseModel):
    id: str = Field(description="UUID v4")
    full_name: str = Field(description="realistic full name")
    email: str = Field(description="work email matching the name")
    role: str = Field(description="one of: admin, editor, viewer")

records = run_generation(
    User,
    samples=10,
    instructions="EU-based users only. Emails must match the full_name.",
)
# -> list[dict], validated and de-duplicated
```

Prefer JSON schemas? Load one and run it:

```python
from schematico import model_from_json, run_generation

model, rows, instructions = model_from_json("schema.json")
records = run_generation(model, samples=rows, instructions=instructions)
```

To find **real** data on the web instead, swap in `run_discovery` (needs
`TAVILY_API_KEY`):

```python
from schematico import run_discovery
records = run_discovery(User, samples=25, instructions="...")
```

Both functions accept an optional `progress_cb(found, total, event)` callback and
a `logfire_token` for tracing.

---

## Bring your own models

Schematico runs on [pydantic-ai](https://ai.pydantic.dev/), so you can point it
at virtually any model — hosted, gateway-routed, or local — and even build a
**failover chain** that tries each in order.

```python
from schematico import SchematicoModel, get_llm_model, run_discovery

model = get_llm_model([
    # try the gateway first…
    SchematicoModel(model="gateway/anthropic:claude-sonnet-4-6"),
    # …fall back to a direct provider…
    SchematicoModel(model="openai:gpt-4.1", api_key="sk-..."),
    # …then a local, keyless model.
    SchematicoModel(model="ollama:llama3.2", base_url="http://localhost:11434/v1"),
])

records = run_discovery(MySchema, samples=50, model=model)  # MySchema: your own Pydantic model
```

- A bare model string (`"anthropic:claude-sonnet-4-6"`) reads credentials from the
  provider's usual env var.
- A `SchematicoModel` lets you pin `api_key` and `base_url` per model.
- A list becomes an automatic failover chain.

From the CLI, set the model and the env var that holds its key per project, either
in `schematico new` or with `schematico <mode> use model <id>`.
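
`get_llm_model` turns a list into a failover chain that tries each model in order. The README does not show how that chain is implemented. pydantic-ai itself provides a `FallbackModel` for this purpose. The control flow is simple enough to sketch generically. Here each "model" is a plain callable, and `call_with_failover` and `AllModelsFailed` are hypothetical names, not part of Schematico's API.

```python
from collections.abc import Callable, Sequence
from typing import TypeVar

T = TypeVar("T")


class AllModelsFailed(Exception):
    """Raised when every entry in the chain raised an error."""

    def __init__(self, errors: list[Exception]) -> None:
        super().__init__(f"all {len(errors)} models failed")
        self.errors = errors


def call_with_failover(chain: Sequence[Callable[[str], T]], prompt: str) -> T:
    """Try each model in order and return the first successful result."""
    errors: list[Exception] = []
    for model in chain:
        try:
            return model(prompt)
        except Exception as exc:  # a real chain would catch provider/network errors only
            errors.append(exc)
    raise AllModelsFailed(errors)
```

Keeping every error on the final exception shows why each fallback failed, not just the last one.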

---

## Configuration

| Env var | Purpose |
|---|---|
| `PYDANTIC_AI_GATEWAY_API_KEY` | Key for the default gateway-routed model. |
| `PAI_MODEL` | Override the default model id (`gateway/anthropic:claude-sonnet-4-6`). |
| `TAVILY_API_KEY` | Required for `discover` mode (live web search). |
| `LOGFIRE_TOKEN` | Optional. Send traces, tool calls, and token usage to [Logfire](https://pydantic.dev/logfire). |
| `LOG_LEVEL` | `WARNING` (default), `INFO`, or `DEBUG`. |

Schematico auto-loads a `.env` file from the current directory — see
[`.env.example`](.env.example). Project configs live in `./.schematico/` as
`<name>.<mode>.toml` files.

---

## Coming soon

Schematico is just getting started. On the roadmap:

- 📦 **More output formats** — CSV, Excel, SQL inserts, and Parquet, not just JSON.
- 🔎 **Smarter discovery** — source citations per record, deeper crawling, and a
  second-pass agent that verifies and de-duplicates findings.
- 🧩 **Richer schemas** — nested objects, relationships between tables, and
  reusable field presets.
- 🗄️ **Direct sinks** — write straight into a database or a dataframe.
- ⚡ **Offline generation** — a fast, keyless synthesis mode for when you don't
  want to call a model at all.

Have an idea? [Open an issue](https://github.com/Sententia-Lab/schematico/issues) —
this is the moment to shape where it goes.

---

## Contributing

Contributions are very welcome — issues, docs, and PRs all help. See
[CONTRIBUTING.md](CONTRIBUTING.md) to get from clone to PR in a couple of minutes.
The test suite mocks the LLM, so `uv run pytest` needs no API keys.

## License

MIT. See [`LICENSE`](LICENSE).