schema-search 0.1.4__py3-none-any.whl → 0.1.6__py3-none-any.whl

Potentially problematic release: this version of schema-search might be problematic.

@@ -1,18 +1,16 @@
  Metadata-Version: 2.4
  Name: schema-search
- Version: 0.1.4
- Summary: Natural language search for database schemas with graph-aware semantic retrieval
- Home-page: https://github.com/neehan/schema-search
- Author:
+ Version: 0.1.6
+ Summary: Natural language database schema search with graph-aware semantic retrieval
+ Home-page: https://adibhasan.com/blog/schema-search/
+ Author: Adib Hasan
  Classifier: Development Status :: 3 - Alpha
  Classifier: Intended Audience :: Developers
  Classifier: License :: OSI Approved :: MIT License
  Classifier: Programming Language :: Python :: 3
- Classifier: Programming Language :: Python :: 3.8
- Classifier: Programming Language :: Python :: 3.9
  Classifier: Programming Language :: Python :: 3.10
  Classifier: Programming Language :: Python :: 3.11
- Requires-Python: >=3.8
+ Requires-Python: >=3.10
  Description-Content-Type: text/markdown
  License-File: LICENSE
  Requires-Dist: sqlalchemy>=1.4.0
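
The header changes in this hunk can also be listed programmatically. Wheel `METADATA` files use RFC 822-style headers, so the standard-library `email.parser` can read them. The sketch below is illustrative only: the helper name `changed_fields` and the shortened `OLD`/`NEW` samples are assumptions made for this example and are not part of schema-search.

```python
from email.parser import HeaderParser

# Shortened samples built from the header lines shown in the hunk above.
OLD = (
    "Metadata-Version: 2.4\n"
    "Name: schema-search\n"
    "Version: 0.1.4\n"
    "Requires-Python: >=3.8\n"
)
NEW = (
    "Metadata-Version: 2.4\n"
    "Name: schema-search\n"
    "Version: 0.1.6\n"
    "Requires-Python: >=3.10\n"
)


def changed_fields(old_text: str, new_text: str) -> dict:
    """Return {field: (old_values, new_values)} for every header that differs."""
    parser = HeaderParser()
    old, new = parser.parsestr(old_text), parser.parsestr(new_text)
    diff = {}
    for key in sorted(set(old.keys()) | set(new.keys())):
        # get_all() keeps repeated headers (e.g. Classifier) as a list.
        before, after = old.get_all(key, []), new.get_all(key, [])
        if before != after:
            diff[key] = (before, after)
    return diff


if __name__ == "__main__":
    for field, (before, after) in changed_fields(OLD, NEW).items():
        print(f"{field}: {before} -> {after}")
```

Comparing value lists rather than single strings means that a dropped classifier, such as the removed 3.8 and 3.9 entries, would show up as a changed `Classifier` field.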
@@ -40,6 +38,7 @@ Requires-Dist: snowflake-sqlalchemy>=1.4.0; extra == "snowflake"
  Requires-Dist: snowflake-connector-python>=3.0.0; extra == "snowflake"
  Provides-Extra: bigquery
  Requires-Dist: sqlalchemy-bigquery>=1.6.0; extra == "bigquery"
+ Dynamic: author
  Dynamic: classifier
  Dynamic: description
  Dynamic: description-content-type
@@ -211,14 +210,14 @@ We [benchmarked](/tests/test_spider_eval.py) on the Spider dataset (1,234 train
  **Memory:** The embedding model requires ~90 MB and the optional reranker adds ~155 MB. Actual process memory depends on your Python runtime.
 
  ### Without Reranker (`reranker.model: null`)
- ![Without Reranker](img/spider_benchmark_without_reranker.png)
+ ![Without Reranker](https://raw.githubusercontent.com/Neehan/schema-search/refs/heads/main/img/spider_benchmark_without_reranker.png)
  - **Indexing:** 0.22s ± 0.08s per database (18 total).
  - **Accuracy:** Hybrid leads with Recall@1 62% / MRR 0.93; Semantic follows at Recall@1 58% / MRR 0.89.
  - **Latency:** BM25 and Fuzzy return in ~5ms; Semantic spends ~15ms; Hybrid (semantic + fuzzy) averages 52ms.
  - **Fuzzy baseline:** Recall@1 22%, highlighting the need for semantic signals on natural-language queries.
 
  ### With Reranker (`Alibaba-NLP/gte-reranker-modernbert-base`)
- ![With Reranker](img/spider_benchmark_with_reranker.png)
+ ![With Reranker](https://raw.githubusercontent.com/Neehan/schema-search/refs/heads/main/img/spider_benchmark_with_reranker.png)
  - **Indexing:** 0.25s ± 0.05s per database (same 18 DBs).
  - **Accuracy:** All strategies converge around Recall@1 62% and MRR ≈ 0.92; Fuzzy jumps from 51% → 92% MRR.
  - **Latency trade-off:** Extra CrossEncoder pass lifts per-query latency to ~0.18–0.29s depending on strategy.
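
The README text in this hunk reports Recall@1 and MRR without defining them. The sketch below uses the textbook definitions, assuming each query has a set of relevant tables. It may not match what `tests/test_spider_eval.py` actually computes, and the function names and sample data are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant tables that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for table in ranked[:k] if table in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant table, or 0.0 if none is returned."""
    for rank, table in enumerate(ranked, start=1):
        if table in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: Iterable[Tuple[List[str], Set[str]]], k: int = 1) -> Tuple[float, float]:
    """Return (mean Recall@k, MRR) over (ranked_results, relevant_set) pairs."""
    runs = list(runs)
    mean_recall = sum(recall_at_k(r, rel, k) for r, rel in runs) / len(runs)
    mrr = sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
    return mean_recall, mrr


# Hypothetical queries: the first needs two tables, the second needs one.
SAMPLE_RUNS = [
    (["orders", "users"], {"orders", "users"}),
    (["products", "inventory"], {"inventory"}),
]

if __name__ == "__main__":
    print(evaluate(SAMPLE_RUNS, k=1))
```

Under these definitions a query that needs several tables cannot reach a Recall@1 of 100% even when its first hit is correct. This is one possible reason a Recall@1 figure can sit well below the corresponding MRR.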
@@ -268,7 +267,7 @@ search = SchemaSearch(
  5. **Optional reranking** with CrossEncoder to refine results
  6. Return top tables with full schema and relationships
 
- Cache stored in `.schema_search_cache/` (configurable in `config.yml`)
+ Cache stored in `/tmp/.schema_search_cache/` (configurable in `config.yml`)
 
  ## License
 
@@ -26,13 +26,13 @@ schema_search/search/factory.py,sha256=wgcx-xnZ8c7uSvu6oP3Fpoabd2Gl8FyJxn7zu3zZY
  schema_search/search/fuzzy.py,sha256=Urn2GtJ5h6j0R3HsRkrMfQCLSTU8jtGaHdfYXL_Nb3A,1865
  schema_search/search/hybrid.py,sha256=T1O46SLCPgpCOnTw2bznnCWmqP9EUkUBLqu5AeQu7oQ,2864
  schema_search/search/semantic.py,sha256=brw7x2hZMCep6QK7WWMT451RnpVcSMuNIZtp51kC6Bo,1673
- schema_search-0.1.4.dist-info/licenses/LICENSE,sha256=jOHFAJEjJCD7iBjS2dBe73X5IGDJdAWGosGOUxfCHTM,1067
+ schema_search-0.1.6.dist-info/licenses/LICENSE,sha256=jOHFAJEjJCD7iBjS2dBe73X5IGDJdAWGosGOUxfCHTM,1067
  tests/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
  tests/test_integration.py,sha256=8Iiq9NAwAxMoZcnfR19oOcBEGTyIOmt6nSafG6LWpj0,11959
  tests/test_llm_sql_generation.py,sha256=bj6iwTqXfNEvlrSXnbPxbrgEM2nscbrmYHbT-rNBJZ4,11834
  tests/test_spider_eval.py,sha256=xQwrNXpipaDxk-vIKqSy0nOIl-3Nadtof58nZpsAsZA,15333
- schema_search-0.1.4.dist-info/METADATA,sha256=xsYLKOF-QEDg4H5PO7vwkKu6hCk0OsQf6jIX3be_kJY,9256
- schema_search-0.1.4.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
- schema_search-0.1.4.dist-info/entry_points.txt,sha256=9FAtZWOuIlmRNBPX_v7bn8x_aUcfojAKWU6ruSo48GM,64
- schema_search-0.1.4.dist-info/top_level.txt,sha256=NZTdQFHoJMezNIhtZICGPOuXlCXQkQduQV925Oqf4sk,20
- schema_search-0.1.4.dist-info/RECORD,,
+ schema_search-0.1.6.dist-info/METADATA,sha256=GxpZhajVVAx5R836vQgQcrfNmt809SSGjaGtCM63wao,9327
+ schema_search-0.1.6.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+ schema_search-0.1.6.dist-info/entry_points.txt,sha256=9FAtZWOuIlmRNBPX_v7bn8x_aUcfojAKWU6ruSo48GM,64
+ schema_search-0.1.6.dist-info/top_level.txt,sha256=NZTdQFHoJMezNIhtZICGPOuXlCXQkQduQV925Oqf4sk,20
+ schema_search-0.1.6.dist-info/RECORD,,
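
Each `RECORD` line above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed. The sketch below rebuilds such a line so an entry can be checked against an unpacked wheel. The helper names `record_entry` and `matches_record` are hypothetical.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build the RECORD line that a wheel would contain for `path`."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses URL-safe base64 with the '=' padding stripped.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


def matches_record(line: str, path: str, data: bytes) -> bool:
    """True if `line` is exactly the entry expected for this file content."""
    return line.strip() == record_entry(path, data)


if __name__ == "__main__":
    # tests/__init__.py is listed with size 0, so its hash is that of b"".
    print(record_entry("tests/__init__.py", b""))
```

The `RECORD` file's own entry ends in `,,` because it cannot contain its own hash, so that line is skipped when verifying. The unchanged hashes for `WHEEL`, `entry_points.txt`, `top_level.txt`, and `LICENSE` indicate those files are byte-identical between 0.1.4 and 0.1.6. Only the `METADATA` entry differs among the lines shown in this hunk.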