wren-engine 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- wren_engine-0.1.0/.claude/CLAUDE.md +59 -0
- wren_engine-0.1.0/.gitignore +26 -0
- wren_engine-0.1.0/PKG-INFO +222 -0
- wren_engine-0.1.0/README.md +131 -0
- wren_engine-0.1.0/docs/cli.md +181 -0
- wren_engine-0.1.0/docs/connections.md +91 -0
- wren_engine-0.1.0/justfile +80 -0
- wren_engine-0.1.0/pyproject.toml +87 -0
- wren_engine-0.1.0/scripts/publish.sh +113 -0
- wren_engine-0.1.0/src/wren/__init__.py +9 -0
- wren_engine-0.1.0/src/wren/cli.py +382 -0
- wren_engine-0.1.0/src/wren/connector/__init__.py +4 -0
- wren_engine-0.1.0/src/wren/connector/base.py +87 -0
- wren_engine-0.1.0/src/wren/connector/bigquery.py +57 -0
- wren_engine-0.1.0/src/wren/connector/canner.py +79 -0
- wren_engine-0.1.0/src/wren/connector/databricks.py +65 -0
- wren_engine-0.1.0/src/wren/connector/duckdb.py +125 -0
- wren_engine-0.1.0/src/wren/connector/factory.py +68 -0
- wren_engine-0.1.0/src/wren/connector/ibis.py +103 -0
- wren_engine-0.1.0/src/wren/connector/mssql.py +113 -0
- wren_engine-0.1.0/src/wren/connector/mysql.py +57 -0
- wren_engine-0.1.0/src/wren/connector/oracle.py +171 -0
- wren_engine-0.1.0/src/wren/connector/postgres.py +76 -0
- wren_engine-0.1.0/src/wren/connector/redshift.py +68 -0
- wren_engine-0.1.0/src/wren/connector/spark.py +52 -0
- wren_engine-0.1.0/src/wren/engine.py +199 -0
- wren_engine-0.1.0/src/wren/mdl/__init__.py +44 -0
- wren_engine-0.1.0/src/wren/mdl/cte_rewriter.py +234 -0
- wren_engine-0.1.0/src/wren/mdl/wren_dialect.py +19 -0
- wren_engine-0.1.0/src/wren/memory/__init__.py +101 -0
- wren_engine-0.1.0/src/wren/memory/cli.py +263 -0
- wren_engine-0.1.0/src/wren/memory/embeddings.py +26 -0
- wren_engine-0.1.0/src/wren/memory/schema_indexer.py +262 -0
- wren_engine-0.1.0/src/wren/memory/store.py +304 -0
- wren_engine-0.1.0/src/wren/model/__init__.py +296 -0
- wren_engine-0.1.0/src/wren/model/data_source.py +477 -0
- wren_engine-0.1.0/src/wren/model/error.py +86 -0
- wren_engine-0.1.0/tests/__init__.py +0 -0
- wren_engine-0.1.0/tests/conftest.py +14 -0
- wren_engine-0.1.0/tests/connectors/__init__.py +0 -0
- wren_engine-0.1.0/tests/connectors/test_duckdb.py +48 -0
- wren_engine-0.1.0/tests/connectors/test_mysql.py +92 -0
- wren_engine-0.1.0/tests/connectors/test_postgres.py +90 -0
- wren_engine-0.1.0/tests/suite/__init__.py +0 -0
- wren_engine-0.1.0/tests/suite/manifests.py +70 -0
- wren_engine-0.1.0/tests/suite/query.py +211 -0
- wren_engine-0.1.0/tests/unit/__init__.py +0 -0
- wren_engine-0.1.0/tests/unit/test_cte_rewriter.py +284 -0
- wren_engine-0.1.0/tests/unit/test_engine.py +111 -0
- wren_engine-0.1.0/tests/unit/test_memory.py +363 -0
- wren_engine-0.1.0/uv.lock +3606 -0
@@ -0,0 +1,59 @@
# CLAUDE.md — wren package

Standalone Python SDK and CLI for Wren Engine. Wraps `wren-core-py` (PyO3 bindings) + Ibis connectors into a single installable package.

## Package Structure

```
wren/
  src/wren/
    engine.py        — WrenEngine facade (transpile / query / dry_run / dry_plan)
    cli.py           — Typer CLI: wren query|dry-run|transpile|validate
    mdl/             — wren-core-py session context + manifest extraction helpers
    connector/       — Per-datasource Ibis connectors (factory.py + one file per source)
    model/
      data_source.py — DataSource enum + per-source ConnectionInfo factories
      error.py       — WrenError, ErrorCode, ErrorPhase
  tests/
```

## Build & Development

```bash
cd wren
just install                  # build wren-core-py wheel + uv sync
just install-all              # with all optional extras
just install-extra <extra>    # e.g. just install-extra postgres
just test                     # pytest tests/
just lint                     # ruff format --check + ruff check
just format                   # ruff auto-fix
just build                    # uv build (produces wheel)
```

Uses `uv` (not Poetry). `pyproject.toml` uses `hatchling` as build backend.

## Key Design Points

- **WrenEngine** is the main entry point. It accepts a base64-encoded MDL JSON string, a `DataSource`, and a connection dict.
- **Query flow**: `_plan()` → wren-core `SessionContext.transform_sql()` → `_transpile()` via sqlglot → connector `.query()`.
- **Manifest extraction**: `_plan()` tries to extract a minimal sub-manifest scoped to the query's referenced tables before calling wren-core; this reduces planning overhead. Falls back to the full manifest on error.
- **`get_session_context` is `@cache`-decorated**: the same `(manifest_str, function_path, properties, data_source)` tuple reuses the same SessionContext. Avoid mutating session state.
- **Write dialect mapping**: `canner` → `trino`; file sources (`local_file`, `s3_file`, `minio_file`, `gcs_file`) → `duckdb`. All others use `data_source.name` directly.
- **WrenEngine is a context manager** (`__enter__` returns the engine; `__exit__` calls `close()`).
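
The write-dialect rule can be written as a small lookup. This is a minimal sketch, not the package's code: `write_dialect` and `FILE_SOURCES` are hypothetical names, and the data source is passed as its lowercase name string rather than as a `DataSource` member.

```python
# Hypothetical sketch of the write-dialect rule described above.
FILE_SOURCES = frozenset({"local_file", "s3_file", "minio_file", "gcs_file"})


def write_dialect(data_source_name: str) -> str:
    """Return the sqlglot dialect name to transpile into for a data source."""
    if data_source_name == "canner":
        return "trino"  # canner is transpiled as Trino SQL
    if data_source_name in FILE_SOURCES:
        return "duckdb"  # file sources are transpiled as DuckDB SQL
    return data_source_name  # every other source uses its own name


for name in ("canner", "s3_file", "postgres"):
    print(f"{name} -> {write_dialect(name)}")
```

In the real flow the resulting name would presumably be handed to sqlglot inside `_transpile()` (for example as the `write=` argument of `sqlglot.transpile`); that call is left out here to keep the sketch dependency-free.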

## Connectors

`connector/factory.py` dispatches on `DataSource` to return the right connector. Each connector wraps an Ibis backend and exposes `.query(sql, limit)` and `.dry_run(sql)`. Base class in `connector/base.py`; Ibis-backed connectors share `connector/ibis.py`.

## Optional Extras

Optional extras: one per data source (`postgres`, `mysql`, `bigquery`, `snowflake`, `clickhouse`, `trino`, `mssql`, `databricks`, `redshift`, `spark`, `athena`, `oracle`), plus `memory` (LanceDB schema/query memory), `all`, and `dev`.

On macOS, the `mysql` extra needs `PKG_CONFIG_PATH` pointing at Homebrew's `mysql-client`:

```bash
PKG_CONFIG_PATH="$(brew --prefix mysql-client)/lib/pkgconfig" just install-extra mysql
```

## Dependency on wren-core-py

`wren-core-py` wheel is built locally from `../wren-core-py/` and installed via `--find-links`. Run `just build-core` (or `just install`) to rebuild after Rust changes.
@@ -0,0 +1,26 @@

*.iml
*.ipr
*.iws
target/
**/.idea
.run
.DS_Store
.classpath
.settings
.project
*~
*.class
.checkstyle
.mvn/timing.properties
.mvn/maven.config
wren-server/etc
mcp-server/workspace/*
!mcp-server/workspace/.gitkeep
/**/var/
__pycache__/
venv/
**/.env*
**/*.so
wren-core-py/dist/
.claude/worktrees/
@@ -0,0 +1,222 @@
Metadata-Version: 2.4
Name: wren-engine
Version: 0.1.0
Summary: Wren Engine CLI and Python SDK — semantic SQL layer for 20+ data sources
Project-URL: Homepage, https://getwren.ai
Project-URL: Repository, https://github.com/Canner/wren-engine
Project-URL: Issues, https://github.com/Canner/wren-engine/issues
Author-email: Wren AI <contact@getwren.ai>
License: Apache-2.0
Keywords: analytics,cli,data-modeling,database,datafusion,mdl,python,sdk,semantic,semantic-layer,sql,wren,wrenai
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.11
Requires-Dist: boto3>=1.26
Requires-Dist: duckdb>=1.0
Requires-Dist: ibis-framework>=10
Requires-Dist: loguru>=0.7
Requires-Dist: opendal>=0.45
Requires-Dist: pandas>=2
Requires-Dist: pyarrow-hotfix>=0.6
Requires-Dist: pyarrow>=14
Requires-Dist: pyasn1>=0.6.3
Requires-Dist: pydantic>=2
Requires-Dist: pyopenssl>=26.0.0
Requires-Dist: requests>=2.33.0
Requires-Dist: sqlglot>=27
Requires-Dist: typer>=0.12
Requires-Dist: wren-core-py>=0.1
Provides-Extra: all
Requires-Dist: databricks-sdk; extra == 'all'
Requires-Dist: databricks-sql-connector; extra == 'all'
Requires-Dist: google-auth; extra == 'all'
Requires-Dist: ibis-framework[athena]; extra == 'all'
Requires-Dist: ibis-framework[bigquery]; extra == 'all'
Requires-Dist: ibis-framework[clickhouse]; extra == 'all'
Requires-Dist: ibis-framework[mssql]; extra == 'all'
Requires-Dist: ibis-framework[mysql]; extra == 'all'
Requires-Dist: ibis-framework[postgres]; extra == 'all'
Requires-Dist: ibis-framework[snowflake]; extra == 'all'
Requires-Dist: ibis-framework[trino]; extra == 'all'
Requires-Dist: lancedb>=0.6; extra == 'all'
Requires-Dist: mysqlclient>=2.2; extra == 'all'
Requires-Dist: oracledb>=2; extra == 'all'
Requires-Dist: psycopg>=3; extra == 'all'
Requires-Dist: pyspark>=3.5; extra == 'all'
Requires-Dist: redshift-connector; extra == 'all'
Requires-Dist: sentence-transformers>=2.2; extra == 'all'
Requires-Dist: trino>=0.321; extra == 'all'
Provides-Extra: athena
Requires-Dist: ibis-framework[athena]; extra == 'athena'
Provides-Extra: bigquery
Requires-Dist: google-auth; extra == 'bigquery'
Requires-Dist: ibis-framework[bigquery]; extra == 'bigquery'
Provides-Extra: clickhouse
Requires-Dist: ibis-framework[clickhouse]; extra == 'clickhouse'
Provides-Extra: databricks
Requires-Dist: databricks-sdk; extra == 'databricks'
Requires-Dist: databricks-sql-connector; extra == 'databricks'
Provides-Extra: dev
Requires-Dist: orjson>=3; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: testcontainers[mysql,postgres]>=4; extra == 'dev'
Provides-Extra: memory
Requires-Dist: lancedb>=0.6; extra == 'memory'
Requires-Dist: sentence-transformers>=2.2; extra == 'memory'
Provides-Extra: mssql
Requires-Dist: ibis-framework[mssql]; extra == 'mssql'
Provides-Extra: mysql
Requires-Dist: ibis-framework[mysql]; extra == 'mysql'
Requires-Dist: mysqlclient>=2.2; extra == 'mysql'
Provides-Extra: oracle
Requires-Dist: oracledb>=2; extra == 'oracle'
Provides-Extra: postgres
Requires-Dist: ibis-framework[postgres]; extra == 'postgres'
Requires-Dist: psycopg>=3; extra == 'postgres'
Provides-Extra: redshift
Requires-Dist: redshift-connector; extra == 'redshift'
Provides-Extra: snowflake
Requires-Dist: ibis-framework[snowflake]; extra == 'snowflake'
Provides-Extra: spark
Requires-Dist: pyspark>=3.5; extra == 'spark'
Provides-Extra: trino
Requires-Dist: ibis-framework[trino]; extra == 'trino'
Requires-Dist: trino>=0.321; extra == 'trino'
Description-Content-Type: text/markdown

# wren-engine

Wren Engine CLI and Python SDK — semantic SQL layer for 20+ data sources.

Write SQL against the models defined in an [MDL (Modeling Definition Language)](https://docs.getwren.ai/) semantic layer; Wren Engine translates it into your data source's SQL dialect and executes it against your database. Powered by [Apache DataFusion](https://datafusion.apache.org/) and [Ibis](https://ibis-project.org/).

## Installation

```bash
pip install wren-engine                   # Core (DuckDB included)
pip install 'wren-engine[postgres]'       # PostgreSQL
pip install 'wren-engine[mysql]'          # MySQL
pip install 'wren-engine[bigquery]'       # BigQuery
pip install 'wren-engine[snowflake]'      # Snowflake
pip install 'wren-engine[clickhouse]'     # ClickHouse
pip install 'wren-engine[trino]'          # Trino
pip install 'wren-engine[mssql]'          # SQL Server
pip install 'wren-engine[databricks]'     # Databricks
pip install 'wren-engine[redshift]'       # Redshift
pip install 'wren-engine[spark]'          # Spark
pip install 'wren-engine[athena]'         # Athena
pip install 'wren-engine[oracle]'         # Oracle
pip install 'wren-engine[memory]'         # Schema & query memory (LanceDB)
pip install 'wren-engine[all]'            # All connectors + memory
```

Requires Python 3.11+.

## Quick start

**1. Create `~/.wren/mdl.json`** — your semantic model:

```json
{
  "catalog": "wren",
  "schema": "public",
  "models": [
    {
      "name": "orders",
      "tableReference": { "schema": "mydb", "table": "orders" },
      "columns": [
        { "name": "order_id", "type": "integer" },
        { "name": "customer_id", "type": "integer" },
        { "name": "total", "type": "double" },
        { "name": "status", "type": "varchar" }
      ],
      "primaryKey": "order_id"
    }
  ]
}
```

**2. Create `~/.wren/connection_info.json`** — your connection:

```json
{
  "datasource": "mysql",
  "host": "localhost",
  "port": 3306,
  "database": "mydb",
  "user": "root",
  "password": "secret"
}
```

**3. Run queries** — `wren` auto-discovers both files from `~/.wren`:

```bash
wren --sql 'SELECT order_id FROM "orders" LIMIT 10'
```

For the full CLI reference and per-datasource `connection_info.json` formats, see [`docs/cli.md`](docs/cli.md) and [`docs/connections.md`](docs/connections.md).

**4. Use schema and query memory** (optional, requires `wren-engine[memory]`):

```bash
wren memory index                                            # index MDL schema
wren memory fetch -q "customer order price"                  # fetch relevant schema context
wren memory store --nl "top customers" --sql "SELECT ..."    # store NL→SQL pair
wren memory recall -q "best customers"                       # retrieve similar past queries
```

---

## Python SDK

```python
import base64
import json
from pathlib import Path

from wren import DataSource, WrenEngine

# WrenEngine expects the MDL manifest as a base64-encoded JSON string
manifest = json.loads(Path("~/.wren/mdl.json").expanduser().read_text())
manifest_str = base64.b64encode(json.dumps(manifest).encode()).decode()

# Connection fields from the MySQL example above; the data source is passed separately
connection = {"host": "localhost", "port": 3306, "database": "mydb",
              "user": "root", "password": "secret"}

with WrenEngine(manifest_str, DataSource.mysql, connection) as engine:
    result = engine.query('SELECT * FROM "orders" LIMIT 10')
    print(result.to_pandas())
```

---

## Development

```bash
just install-dev    # Install with dev dependencies
just lint           # Ruff format check + lint
just format         # Auto-fix
```

| Command | What it runs | Docker needed |
|---------|-------------|---------------|
| `just test-unit` | Unit tests | No |
| `just test-duckdb` | DuckDB connector tests | No |
| `just test-postgres` | PostgreSQL connector tests | Yes |
| `just test-mysql` | MySQL connector tests | Yes |
| `just test` | All tests | Yes |

## Publishing

```bash
./scripts/publish.sh            # Build + publish to PyPI
./scripts/publish.sh --test     # Build + publish to TestPyPI
./scripts/publish.sh --build    # Build only
```

## License

Apache-2.0
@@ -0,0 +1,181 @@
# CLI reference

## Default command — query

Running `wren --sql '...'` executes a query and prints the result. This is the same as `wren query --sql '...'`.

```bash
wren --sql 'SELECT COUNT(*) FROM "orders"'
wren --sql 'SELECT * FROM "orders" LIMIT 5' --output csv
wren --sql 'SELECT * FROM "orders"' --limit 100 --output json
```

Output formats: `table` (default), `csv`, `json`.

## `wren query`

Execute SQL and return results.

```bash
wren query --sql 'SELECT order_id, total FROM "orders" ORDER BY total DESC LIMIT 5'
```

## `wren dry-plan`

Translate MDL SQL into the native SQL dialect of your data source. No database connection required.

```bash
wren dry-plan --sql 'SELECT order_id FROM "orders"'
```

## `wren dry-run`

Validate SQL against the live database without returning rows. Prints `OK` on success.

```bash
wren dry-run --sql 'SELECT * FROM "orders" LIMIT 1'
```

## `wren validate`

Same as `dry-run`, but prints `Valid` or `Invalid: <reason>`.

```bash
wren validate --sql 'SELECT * FROM "NonExistent"'
# Invalid: table not found ...
```

## Overriding defaults

All flags are optional when `~/.wren/mdl.json` and `~/.wren/connection_info.json` exist:

```bash
wren --sql '...' \
  --mdl /path/to/other-mdl.json \
  --connection-file /path/to/prod-connection_info.json \
  --datasource postgres
```

Or pass connection info inline:

```bash
wren --sql 'SELECT COUNT(*) FROM "orders"' \
  --connection-info '{"datasource":"mysql","host":"localhost","port":3306,"database":"mydb","user":"root","password":"secret"}'
```
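
Hand-escaping inline JSON in a shell is error-prone. When scripting the CLI from Python, one option is to serialise the connection dict with `json.dumps` and pass an argument list to `subprocess.run`, so no shell quoting is involved. A minimal sketch; the helper `wren_argv` is hypothetical and uses only the `--sql` and `--connection-info` flags documented above.

```python
import json


def wren_argv(sql: str, connection_info: dict) -> list[str]:
    """Build an argument list for the wren CLI (hypothetical helper).

    Passing a list to subprocess.run bypasses the shell, so the JSON
    needs no manual quoting or escaping.
    """
    return ["wren", "--sql", sql, "--connection-info", json.dumps(connection_info)]


info = {"datasource": "mysql", "host": "localhost", "port": 3306,
        "database": "mydb", "user": "root", "password": "secret"}
argv = wren_argv('SELECT COUNT(*) FROM "orders"', info)
print(argv)
# With the wren CLI on PATH: subprocess.run(argv, check=True)
```

Command-line arguments, password included, can be visible to other local users through the process list, so for real credentials the `--connection-file` route is usually the safer choice.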

---

## `wren memory` — Schema & Query Memory

LanceDB-backed semantic memory for MDL schema search and NL→SQL retrieval. Requires the `memory` extra:

```bash
pip install 'wren-engine[memory]'
```

All `memory` subcommands accept `--path DIR` to override the default storage location (`~/.wren/memory/`).

### Hybrid strategy: full text vs. embedding search

When providing schema context to an LLM, there is a trade-off:

- **Small schemas** — the full plain-text description fits easily in the LLM context window and gives better results because the LLM sees the complete structure (model→column relationships, join paths, primary keys) rather than isolated fragments from a vector search.
- **Large schemas** — the full text exceeds what is practical to send in a single prompt, so embedding search is needed to retrieve only the relevant fragments.

`wren memory fetch` automatically picks the strategy based on the **character length** of the generated plain-text description:

| Plain-text length (default threshold) | Strategy |
|---|---|
| Below 30,000 chars (~8K tokens) | Returns the full plain text |
| Above 30,000 chars | Returns embedding search results |

The threshold is measured in characters rather than tokens because character length is free to compute, while accurate token counting requires a tokeniser. The ~8K-token figure assumes about 4 characters per token, which holds for English. CJK text averages closer to 1.5 characters per token, so 30,000 characters of CJK-heavy text is roughly 20K tokens: a CJK-heavy schema can send noticeably more tokens in full-text mode before the switch to embedding search happens.

The default threshold (30,000 chars) can be overridden with `--threshold`.
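
The selection rule can be sketched as a small function. This is an illustration, not the CLI's code: `choose_strategy` and `estimated_tokens` are hypothetical names, the behaviour at exactly the threshold is an assumption (the table only says below/above), and the token estimate simply divides by a characters-per-token ratio.

```python
DEFAULT_THRESHOLD = 30_000  # characters, as documented for `wren memory fetch`


def choose_strategy(description: str, threshold: int = DEFAULT_THRESHOLD) -> str:
    """Return "full" when the plain-text schema is under the threshold, else "search"."""
    # Assumption: a description of exactly `threshold` chars counts as large.
    return "full" if len(description) < threshold else "search"


def estimated_tokens(char_count: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate: ~4 chars/token for English, ~1.5 for CJK text."""
    return round(char_count / chars_per_token)


print(choose_strategy("x" * 12_000))      # full
print(choose_strategy("x" * 45_000))      # search
print(estimated_tokens(30_000))           # 7500
print(estimated_tokens(30_000, 1.5))      # 20000
```

The last two calls show that the same character threshold corresponds to very different token counts depending on the script of the schema text.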

### `wren memory index`

Parse the MDL manifest and index all schema items (models, columns, relationships, views) into LanceDB with local embeddings.

```bash
wren memory index                             # uses ~/.wren/mdl.json
wren memory index --mdl /path/to/mdl.json     # explicit MDL file
```

### `wren memory describe`

Print the full schema as structured plain text. No embedding or LanceDB required — this is a pure transformation of the MDL manifest into a human/LLM-readable format.

```bash
wren memory describe                             # uses ~/.wren/mdl.json
wren memory describe --mdl /path/to/mdl.json
```
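
The exact output format of `describe` is not specified here. As a rough illustration only, a hypothetical `describe_manifest` helper could render catalog, models, columns, and primary keys from a Quick-start-style manifest like this; the layout is invented for the example.

```python
def describe_manifest(manifest: dict) -> str:
    """Render models, columns, and primary keys as indented plain text (illustrative format)."""
    lines = [f"Catalog: {manifest.get('catalog', '?')}  Schema: {manifest.get('schema', '?')}"]
    for model in manifest.get("models", []):
        lines.append(f"Model {model['name']}")
        pk = model.get("primaryKey")
        for col in model.get("columns", []):
            marker = " (primary key)" if col["name"] == pk else ""
            lines.append(f"  - {col['name']}: {col.get('type', 'unknown')}{marker}")
    return "\n".join(lines)


manifest = {
    "catalog": "wren", "schema": "public",
    "models": [{"name": "orders", "primaryKey": "order_id",
                "columns": [{"name": "order_id", "type": "integer"},
                            {"name": "status", "type": "varchar"}]}],
}
print(describe_manifest(manifest))
```

Because this is a pure function of the manifest, it needs neither embeddings nor a database, which matches the description of `describe` above.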

### `wren memory fetch`

Get schema context for an LLM. Automatically chooses the best strategy based on schema size: full plain text for small schemas, embedding search for large schemas.

When the search strategy is used, the optional `--type` and `--model` filters narrow the results.

```bash
wren memory fetch -q "customer order price"
wren memory fetch -q "revenue" --type column --model orders
wren memory fetch -q "日期" --threshold 50000 --output json    # "日期" = "date"
```

| Flag | Description |
|------|-------------|
| `-q, --query` | Search query (required) |
| `--mdl` | Path to MDL JSON file |
| `-l, --limit` | Max results for search strategy (default: 5) |
| `-t, --type` | Filter: `model`, `column`, `relationship`, `view` (search strategy only) |
| `--model` | Filter by model name (search strategy only) |
| `--threshold` | Character threshold for full vs. search (default: 30,000) |
| `-o, --output` | Output format: `table` (default), `json` |

### `wren memory store`

Store a natural-language-to-SQL pair for future few-shot retrieval.

```bash
wren memory store \
  --nl "show top customers by revenue" \
  --sql "SELECT c_name, sum(o_totalprice) FROM orders JOIN customer ON o_custkey = c_custkey GROUP BY 1 ORDER BY 2 DESC" \
  --datasource postgres
```

### `wren memory recall`

Search stored NL→SQL pairs by semantic similarity to a query.

```bash
wren memory recall -q "best customers"
wren memory recall -q "月度營收" --datasource mysql --limit 5 --output json    # "月度營收" = "monthly revenue"
```

| Flag | Description |
|------|-------------|
| `-q, --query` | Search query (required) |
| `-l, --limit` | Max results (default: 3) |
| `-d, --datasource` | Filter by data source |
| `-o, --output` | Output format: `table` (default), `json` |

### `wren memory status`

Show index statistics: storage path, table names, and row counts.

```bash
wren memory status
# Path: /Users/you/.wren/memory
# schema_items: 47 rows
# query_history: 12 rows
```

### `wren memory reset`

Drop all memory tables and start fresh.

```bash
wren memory reset            # prompts for confirmation
wren memory reset --force    # skip confirmation
```