schema-search 0.1.5 → 0.1.6 (py3-none-any.whl)
Potentially problematic release: this version of schema-search might be problematic.
- {schema_search-0.1.5.dist-info → schema_search-0.1.6.dist-info}/METADATA +8 -7
- {schema_search-0.1.5.dist-info → schema_search-0.1.6.dist-info}/RECORD +6 -6
- {schema_search-0.1.5.dist-info → schema_search-0.1.6.dist-info}/WHEEL +0 -0
- {schema_search-0.1.5.dist-info → schema_search-0.1.6.dist-info}/entry_points.txt +0 -0
- {schema_search-0.1.5.dist-info → schema_search-0.1.6.dist-info}/licenses/LICENSE +0 -0
- {schema_search-0.1.5.dist-info → schema_search-0.1.6.dist-info}/top_level.txt +0 -0
@@ -1,9 +1,9 @@
 Metadata-Version: 2.4
 Name: schema-search
-Version: 0.1.
-Summary: Natural language
-Home-page: https://
-Author:
+Version: 0.1.6
+Summary: Natural language database schema search with graph-aware semantic retrieval
+Home-page: https://adibhasan.com/blog/schema-search/
+Author: Adib Hasan
 Classifier: Development Status :: 3 - Alpha
 Classifier: Intended Audience :: Developers
 Classifier: License :: OSI Approved :: MIT License
@@ -38,6 +38,7 @@ Requires-Dist: snowflake-sqlalchemy>=1.4.0; extra == "snowflake"
 Requires-Dist: snowflake-connector-python>=3.0.0; extra == "snowflake"
 Provides-Extra: bigquery
 Requires-Dist: sqlalchemy-bigquery>=1.6.0; extra == "bigquery"
+Dynamic: author
 Dynamic: classifier
 Dynamic: description
 Dynamic: description-content-type
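The METADATA fields changed in the two hunks above use the email-header style format of Python core metadata, so they can be inspected with the standard library's `email.parser`. The following is a minimal sketch, not part of the package: `METADATA_EXCERPT` is assembled from lines visible in this diff, and `parse_metadata` is a hypothetical helper name.

```python
from email.parser import Parser

# Excerpt assembled from the 0.1.6 lines shown in this diff (not the full file).
METADATA_EXCERPT = """\
Metadata-Version: 2.4
Name: schema-search
Version: 0.1.6
Summary: Natural language database schema search with graph-aware semantic retrieval
Home-page: https://adibhasan.com/blog/schema-search/
Author: Adib Hasan
Provides-Extra: bigquery
Requires-Dist: sqlalchemy-bigquery>=1.6.0; extra == "bigquery"
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
"""


def parse_metadata(text: str) -> dict:
    """Parse core-metadata text into a small dict (hypothetical helper)."""
    msg = Parser().parsestr(text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "author": msg["Author"],
        # Multi-use fields appear once per value, so collect all occurrences.
        "requires_dist": msg.get_all("Requires-Dist", []),
        "dynamic": msg.get_all("Dynamic", []),
    }


info = parse_metadata(METADATA_EXCERPT)
print(info["version"], info["author"], info["dynamic"])
```

Run against the excerpt, this shows the new `Dynamic: author` entry alongside the existing `Dynamic` fields, which matches the single added line in the second hunk.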
@@ -209,14 +210,14 @@ We [benchmarked](/tests/test_spider_eval.py) on the Spider dataset (1,234 train
 **Memory:** The embedding model requires ~90 MB and the optional reranker adds ~155 MB. Actual process memory depends on your Python runtime.
 
 ### Without Reranker (`reranker.model: null`)
-
+
 - **Indexing:** 0.22s ± 0.08s per database (18 total).
 - **Accuracy:** Hybrid leads with Recall@1 62% / MRR 0.93; Semantic follows at Recall@1 58% / MRR 0.89.
 - **Latency:** BM25 and Fuzzy return in ~5ms; Semantic spends ~15ms; Hybrid (semantic + fuzzy) averages 52ms.
 - **Fuzzy baseline:** Recall@1 22%, highlighting the need for semantic signals on natural-language queries.
 
 ### With Reranker (`Alibaba-NLP/gte-reranker-modernbert-base`)
-
+
 - **Indexing:** 0.25s ± 0.05s per database (same 18 DBs).
 - **Accuracy:** All strategies converge around Recall@1 62% and MRR ≈ 0.92; Fuzzy jumps from 51% → 92% MRR.
 - **Latency trade-off:** Extra CrossEncoder pass lifts per-query latency to ~0.18–0.29s depending on strategy.
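The README hunk above reports Recall@1 and MRR. As a reading aid, here is a minimal sketch of the common textbook definitions of those metrics when a query can have several relevant tables. It is an assumption that `tests/test_spider_eval.py` computes them this way; the function names and the toy data are hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant tables that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant table, or 0.0 if none is returned."""
    for rank, table in enumerate(ranked, start=1):
        if table in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(results: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 1):
    """Average Recall@k and MRR over (ranked tables, relevant tables) pairs."""
    results = list(results)
    n = len(results)
    mean_recall = sum(recall_at_k(r, rel, k) for r, rel in results) / n
    mrr = sum(reciprocal_rank(r, rel) for r, rel in results) / n
    return mean_recall, mrr


# Toy data (hypothetical): two queries with their ranked results and gold tables.
toy = [
    (["orders", "customers", "items"], {"orders", "customers"}),
    (["users", "payments"], {"payments"}),
]
print(evaluate(toy, k=1))
```

Under these definitions Recall@1 can sit well below MRR, because a query with several relevant tables can place a relevant table first (reciprocal rank 1.0) while still retrieving only a fraction of its relevant tables in the top position.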
@@ -266,7 +267,7 @@ search = SchemaSearch(
 5. **Optional reranking** with CrossEncoder to refine results
 6. Return top tables with full schema and relationships
 
-Cache stored in
+Cache stored in `/tmp/.schema_search_cache/` (configurable in `config.yml`)
 
 ## License
 
@@ -26,13 +26,13 @@ schema_search/search/factory.py,sha256=wgcx-xnZ8c7uSvu6oP3Fpoabd2Gl8FyJxn7zu3zZY
 schema_search/search/fuzzy.py,sha256=Urn2GtJ5h6j0R3HsRkrMfQCLSTU8jtGaHdfYXL_Nb3A,1865
 schema_search/search/hybrid.py,sha256=T1O46SLCPgpCOnTw2bznnCWmqP9EUkUBLqu5AeQu7oQ,2864
 schema_search/search/semantic.py,sha256=brw7x2hZMCep6QK7WWMT451RnpVcSMuNIZtp51kC6Bo,1673
-schema_search-0.1.
+schema_search-0.1.6.dist-info/licenses/LICENSE,sha256=jOHFAJEjJCD7iBjS2dBe73X5IGDJdAWGosGOUxfCHTM,1067
 tests/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 tests/test_integration.py,sha256=8Iiq9NAwAxMoZcnfR19oOcBEGTyIOmt6nSafG6LWpj0,11959
 tests/test_llm_sql_generation.py,sha256=bj6iwTqXfNEvlrSXnbPxbrgEM2nscbrmYHbT-rNBJZ4,11834
 tests/test_spider_eval.py,sha256=xQwrNXpipaDxk-vIKqSy0nOIl-3Nadtof58nZpsAsZA,15333
-schema_search-0.1.
-schema_search-0.1.
-schema_search-0.1.
-schema_search-0.1.
-schema_search-0.1.
+schema_search-0.1.6.dist-info/METADATA,sha256=GxpZhajVVAx5R836vQgQcrfNmt809SSGjaGtCM63wao,9327
+schema_search-0.1.6.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+schema_search-0.1.6.dist-info/entry_points.txt,sha256=9FAtZWOuIlmRNBPX_v7bn8x_aUcfojAKWU6ruSo48GM,64
+schema_search-0.1.6.dist-info/top_level.txt,sha256=NZTdQFHoJMezNIhtZICGPOuXlCXQkQduQV925Oqf4sk,20
+schema_search-0.1.6.dist-info/RECORD,,
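Each RECORD line in the hunk above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed, and the RECORD file lists itself with empty hash and size fields. The sketch below shows how one entry could be checked; `record_hash` and `verify_entry` are hypothetical helper names, not part of schema-search or pip.

```python
import base64
import hashlib


def record_hash(data: bytes) -> str:
    """Return the RECORD-style hash field for a file's contents."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"sha256={encoded}"


def verify_entry(line: str, data: bytes) -> bool:
    """Check one RECORD line against the bytes of the file it names."""
    # Split from the right so a comma inside the path cannot shift the fields.
    _path, hash_field, size_field = line.rsplit(",", 2)
    if not hash_field and not size_field:
        # Entries such as RECORD itself carry no hash or size.
        return True
    return hash_field == record_hash(data) and int(size_field) == len(data)


# tests/__init__.py is listed with size 0, so its contents are empty bytes.
entry = "tests/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0"
print(verify_entry(entry, b""))
```

The `tests/__init__.py` line is convenient for a self-check because its size field is 0, so the expected digest is simply the SHA-256 of empty input. Checking the other entries would require the actual file bytes from the wheel.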
File without changes
File without changes
File without changes
File without changes