schema-search 0.1.4__py3-none-any.whl → 0.1.6__py3-none-any.whl
Potentially problematic release.
This version of schema-search might be problematic.
- {schema_search-0.1.4.dist-info → schema_search-0.1.6.dist-info}/METADATA +9 -10
- {schema_search-0.1.4.dist-info → schema_search-0.1.6.dist-info}/RECORD +6 -6
- {schema_search-0.1.4.dist-info → schema_search-0.1.6.dist-info}/WHEEL +0 -0
- {schema_search-0.1.4.dist-info → schema_search-0.1.6.dist-info}/entry_points.txt +0 -0
- {schema_search-0.1.4.dist-info → schema_search-0.1.6.dist-info}/licenses/LICENSE +0 -0
- {schema_search-0.1.4.dist-info → schema_search-0.1.6.dist-info}/top_level.txt +0 -0
{schema_search-0.1.4.dist-info → schema_search-0.1.6.dist-info}/METADATA
@@ -1,18 +1,16 @@
 Metadata-Version: 2.4
 Name: schema-search
-Version: 0.1.
-Summary: Natural language
-Home-page: https://
-Author:
+Version: 0.1.6
+Summary: Natural language database schema search with graph-aware semantic retrieval
+Home-page: https://adibhasan.com/blog/schema-search/
+Author: Adib Hasan
 Classifier: Development Status :: 3 - Alpha
 Classifier: Intended Audience :: Developers
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.8
-Classifier: Programming Language :: Python :: 3.9
 Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
-Requires-Python: >=3.
+Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: sqlalchemy>=1.4.0
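The hunk above drops the Python 3.8 and 3.9 classifiers and ends with `Requires-Python: >=3.10`. As a rough illustration of what that floor means for an installer, the sketch below compares versions as integer tuples rather than strings. The helper names `parse_version` and `satisfies_min` are hypothetical, and the sketch handles only a single `>=` clause; real tooling typically relies on the `packaging` library's specifier support instead.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '3.10' into a tuple of ints: (3, 10)."""
    return tuple(int(part) for part in text.split("."))


def satisfies_min(python_version: str, specifier: str) -> bool:
    """Check a version against a single '>=X.Y' specifier (hypothetical helper)."""
    if not specifier.startswith(">="):
        raise ValueError("this sketch only understands '>=' specifiers")
    return parse_version(python_version) >= parse_version(specifier[2:])


# Python 3.9 no longer meets the floor declared in 0.1.6; 3.11 does.
too_old = satisfies_min("3.9", ">=3.10")
new_enough = satisfies_min("3.11", ">=3.10")

# Comparing the raw strings would give the wrong answer, because "9" sorts after "1".
naive_string_compare = "3.9" >= "3.10"
```

The tuple comparison is the point of the example: a plain string comparison would treat 3.9 as newer than 3.10.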
@@ -40,6 +38,7 @@ Requires-Dist: snowflake-sqlalchemy>=1.4.0; extra == "snowflake"
 Requires-Dist: snowflake-connector-python>=3.0.0; extra == "snowflake"
 Provides-Extra: bigquery
 Requires-Dist: sqlalchemy-bigquery>=1.6.0; extra == "bigquery"
+Dynamic: author
 Dynamic: classifier
 Dynamic: description
 Dynamic: description-content-type
@@ -211,14 +210,14 @@ We [benchmarked](/tests/test_spider_eval.py) on the Spider dataset (1,234 train
 **Memory:** The embedding model requires ~90 MB and the optional reranker adds ~155 MB. Actual process memory depends on your Python runtime.
 
 ### Without Reranker (`reranker.model: null`)
-
+
 - **Indexing:** 0.22s ± 0.08s per database (18 total).
 - **Accuracy:** Hybrid leads with Recall@1 62% / MRR 0.93; Semantic follows at Recall@1 58% / MRR 0.89.
 - **Latency:** BM25 and Fuzzy return in ~5ms; Semantic spends ~15ms; Hybrid (semantic + fuzzy) averages 52ms.
 - **Fuzzy baseline:** Recall@1 22%, highlighting the need for semantic signals on natural-language queries.
 
 ### With Reranker (`Alibaba-NLP/gte-reranker-modernbert-base`)
-
+
 - **Indexing:** 0.25s ± 0.05s per database (same 18 DBs).
 - **Accuracy:** All strategies converge around Recall@1 62% and MRR ≈ 0.92; Fuzzy jumps from 51% → 92% MRR.
 - **Latency trade-off:** Extra CrossEncoder pass lifts per-query latency to ~0.18–0.29s depending on strategy.
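The README lines in this hunk quote Recall@1 and MRR figures. For readers unfamiliar with those metrics, here is a minimal sketch of their textbook definitions, assuming each query has a set of relevant tables and a ranked list of returned tables. The function names and the toy data are hypothetical; this is not the package's benchmark code in `tests/test_spider_eval.py`, which may define the metrics differently.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant tables that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant table, or 0.0 if none is returned."""
    for rank, table in enumerate(ranked, start=1):
        if table in relevant:
            return 1.0 / rank
    return 0.0


# Toy data (made up): (ranked results, relevant tables) per query.
queries = [
    (["users", "orders", "items"], {"users", "orders"}),
    (["items", "orders", "users"], {"orders"}),
]

mean_recall_at_1 = sum(recall_at_k(r, g, 1) for r, g in queries) / len(queries)
mrr = sum(reciprocal_rank(r, g) for r, g in queries) / len(queries)
```

Under these definitions a query with several relevant tables can score a perfect reciprocal rank while its Recall@1 stays below 100%, which is one way figures such as Recall@1 62% and MRR 0.93 can coexist.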
@@ -268,7 +267,7 @@ search = SchemaSearch(
 5. **Optional reranking** with CrossEncoder to refine results
 6. Return top tables with full schema and relationships
 
-Cache stored in
+Cache stored in `/tmp/.schema_search_cache/` (configurable in `config.yml`)
 
 ## License
 
{schema_search-0.1.4.dist-info → schema_search-0.1.6.dist-info}/RECORD
@@ -26,13 +26,13 @@ schema_search/search/factory.py,sha256=wgcx-xnZ8c7uSvu6oP3Fpoabd2Gl8FyJxn7zu3zZY
 schema_search/search/fuzzy.py,sha256=Urn2GtJ5h6j0R3HsRkrMfQCLSTU8jtGaHdfYXL_Nb3A,1865
 schema_search/search/hybrid.py,sha256=T1O46SLCPgpCOnTw2bznnCWmqP9EUkUBLqu5AeQu7oQ,2864
 schema_search/search/semantic.py,sha256=brw7x2hZMCep6QK7WWMT451RnpVcSMuNIZtp51kC6Bo,1673
-schema_search-0.1.
+schema_search-0.1.6.dist-info/licenses/LICENSE,sha256=jOHFAJEjJCD7iBjS2dBe73X5IGDJdAWGosGOUxfCHTM,1067
 tests/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 tests/test_integration.py,sha256=8Iiq9NAwAxMoZcnfR19oOcBEGTyIOmt6nSafG6LWpj0,11959
 tests/test_llm_sql_generation.py,sha256=bj6iwTqXfNEvlrSXnbPxbrgEM2nscbrmYHbT-rNBJZ4,11834
 tests/test_spider_eval.py,sha256=xQwrNXpipaDxk-vIKqSy0nOIl-3Nadtof58nZpsAsZA,15333
-schema_search-0.1.
-schema_search-0.1.
-schema_search-0.1.
-schema_search-0.1.
-schema_search-0.1.
+schema_search-0.1.6.dist-info/METADATA,sha256=GxpZhajVVAx5R836vQgQcrfNmt809SSGjaGtCM63wao,9327
+schema_search-0.1.6.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+schema_search-0.1.6.dist-info/entry_points.txt,sha256=9FAtZWOuIlmRNBPX_v7bn8x_aUcfojAKWU6ruSo48GM,64
+schema_search-0.1.6.dist-info/top_level.txt,sha256=NZTdQFHoJMezNIhtZICGPOuXlCXQkQduQV925Oqf4sk,20
+schema_search-0.1.6.dist-info/RECORD,,
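Each RECORD entry above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the SHA-256 hash with trailing `=` padding removed, as the wheel format specifies. The sketch below shows how such an entry could be parsed and recomputed. The helper names `record_hash` and `parse_record_line` are hypothetical, and the only file content checked is the empty `tests/__init__.py`, whose size is listed as 0.

```python
import base64
import csv
import hashlib


def record_hash(data: bytes) -> str:
    """Return the RECORD-style hash field for the given file contents."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=")
    return "sha256=" + encoded.decode("ascii")


def parse_record_line(line: str):
    """Split one RECORD line into (path, hash_field, size or None)."""
    path, hash_field, size = next(csv.reader([line]))
    return path, hash_field, int(size) if size else None


# The entry for the empty tests/__init__.py, copied from the hunk above.
entry = "tests/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0"
path, expected_hash, size = parse_record_line(entry)

# An empty file hashes to the value recorded in the wheel.
matches = record_hash(b"") == expected_hash

# RECORD lists itself with empty hash and size fields.
self_entry = parse_record_line("schema_search-0.1.6.dist-info/RECORD,,")
```

To verify a real file, the same `record_hash` call would be applied to the bytes read from the unpacked wheel and compared with the hash field of the matching entry.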
The remaining four files (WHEEL, entry_points.txt, licenses/LICENSE, top_level.txt) are unchanged.