schema-search 0.1.5__py3-none-any.whl → 0.1.6__py3-none-any.whl


Potentially problematic release.

@@ -1,9 +1,9 @@
  Metadata-Version: 2.4
  Name: schema-search
- Version: 0.1.5
- Summary: Natural language search for database schemas with graph-aware semantic retrieval
- Home-page: https://github.com/neehan/schema-search
- Author:
+ Version: 0.1.6
+ Summary: Natural language database schema search with graph-aware semantic retrieval
+ Home-page: https://adibhasan.com/blog/schema-search/
+ Author: Adib Hasan
  Classifier: Development Status :: 3 - Alpha
  Classifier: Intended Audience :: Developers
  Classifier: License :: OSI Approved :: MIT License
@@ -38,6 +38,7 @@ Requires-Dist: snowflake-sqlalchemy>=1.4.0; extra == "snowflake"
  Requires-Dist: snowflake-connector-python>=3.0.0; extra == "snowflake"
  Provides-Extra: bigquery
  Requires-Dist: sqlalchemy-bigquery>=1.6.0; extra == "bigquery"
+ Dynamic: author
  Dynamic: classifier
  Dynamic: description
  Dynamic: description-content-type
@@ -209,14 +210,14 @@ We [benchmarked](/tests/test_spider_eval.py) on the Spider dataset (1,234 train
  **Memory:** The embedding model requires ~90 MB and the optional reranker adds ~155 MB. Actual process memory depends on your Python runtime.

  ### Without Reranker (`reranker.model: null`)
- ![Without Reranker](img/spider_benchmark_without_reranker.png)
+ ![Without Reranker](https://raw.githubusercontent.com/Neehan/schema-search/refs/heads/main/img/spider_benchmark_without_reranker.png)
  - **Indexing:** 0.22s ± 0.08s per database (18 total).
  - **Accuracy:** Hybrid leads with Recall@1 62% / MRR 0.93; Semantic follows at Recall@1 58% / MRR 0.89.
  - **Latency:** BM25 and Fuzzy return in ~5ms; Semantic spends ~15ms; Hybrid (semantic + fuzzy) averages 52ms.
  - **Fuzzy baseline:** Recall@1 22%, highlighting the need for semantic signals on natural-language queries.

  ### With Reranker (`Alibaba-NLP/gte-reranker-modernbert-base`)
- ![With Reranker](img/spider_benchmark_with_reranker.png)
+ ![With Reranker](https://raw.githubusercontent.com/Neehan/schema-search/refs/heads/main/img/spider_benchmark_with_reranker.png)
  - **Indexing:** 0.25s ± 0.05s per database (same 18 DBs).
  - **Accuracy:** All strategies converge around Recall@1 62% and MRR ≈ 0.92; Fuzzy jumps from 51% → 92% MRR.
  - **Latency trade-off:** Extra CrossEncoder pass lifts per-query latency to ~0.18–0.29s depending on strategy.
@@ -266,7 +267,7 @@ search = SchemaSearch(
  5. **Optional reranking** with CrossEncoder to refine results
  6. Return top tables with full schema and relationships

- Cache stored in `.schema_search_cache/` (configurable in `config.yml`)
+ Cache stored in `/tmp/.schema_search_cache/` (configurable in `config.yml`)

  ## License

@@ -26,13 +26,13 @@ schema_search/search/factory.py,sha256=wgcx-xnZ8c7uSvu6oP3Fpoabd2Gl8FyJxn7zu3zZY
  schema_search/search/fuzzy.py,sha256=Urn2GtJ5h6j0R3HsRkrMfQCLSTU8jtGaHdfYXL_Nb3A,1865
  schema_search/search/hybrid.py,sha256=T1O46SLCPgpCOnTw2bznnCWmqP9EUkUBLqu5AeQu7oQ,2864
  schema_search/search/semantic.py,sha256=brw7x2hZMCep6QK7WWMT451RnpVcSMuNIZtp51kC6Bo,1673
- schema_search-0.1.5.dist-info/licenses/LICENSE,sha256=jOHFAJEjJCD7iBjS2dBe73X5IGDJdAWGosGOUxfCHTM,1067
+ schema_search-0.1.6.dist-info/licenses/LICENSE,sha256=jOHFAJEjJCD7iBjS2dBe73X5IGDJdAWGosGOUxfCHTM,1067
  tests/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
  tests/test_integration.py,sha256=8Iiq9NAwAxMoZcnfR19oOcBEGTyIOmt6nSafG6LWpj0,11959
  tests/test_llm_sql_generation.py,sha256=bj6iwTqXfNEvlrSXnbPxbrgEM2nscbrmYHbT-rNBJZ4,11834
  tests/test_spider_eval.py,sha256=xQwrNXpipaDxk-vIKqSy0nOIl-3Nadtof58nZpsAsZA,15333
- schema_search-0.1.5.dist-info/METADATA,sha256=oSfANTlqkUd-yOFntVULaP4y9hHjfqXxO8wiPoZVW4Q,9157
- schema_search-0.1.5.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
- schema_search-0.1.5.dist-info/entry_points.txt,sha256=9FAtZWOuIlmRNBPX_v7bn8x_aUcfojAKWU6ruSo48GM,64
- schema_search-0.1.5.dist-info/top_level.txt,sha256=NZTdQFHoJMezNIhtZICGPOuXlCXQkQduQV925Oqf4sk,20
- schema_search-0.1.5.dist-info/RECORD,,
+ schema_search-0.1.6.dist-info/METADATA,sha256=GxpZhajVVAx5R836vQgQcrfNmt809SSGjaGtCM63wao,9327
+ schema_search-0.1.6.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+ schema_search-0.1.6.dist-info/entry_points.txt,sha256=9FAtZWOuIlmRNBPX_v7bn8x_aUcfojAKWU6ruSo48GM,64
+ schema_search-0.1.6.dist-info/top_level.txt,sha256=NZTdQFHoJMezNIhtZICGPOuXlCXQkQduQV925Oqf4sk,20
+ schema_search-0.1.6.dist-info/RECORD,,
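
Reviewer note: the RECORD entries in the last hunk use the `path,algorithm=digest,size` layout, where the digest is a SHA-256 hash encoded as URL-safe base64 with the `=` padding removed. The sketch below shows one way such an entry could be parsed and re-checked against file contents. The helper names `record_hash` and `parse_record_line` are hypothetical, chosen for illustration, and are not part of schema-search or any packaging tool.

```python
import base64
import csv
import hashlib
from typing import Optional, Tuple


def record_hash(data: bytes) -> str:
    """Return the SHA-256 digest of `data` in the encoding RECORD files use."""
    digest = hashlib.sha256(data).digest()
    # URL-safe base64, with the trailing '=' padding stripped.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def parse_record_line(line: str) -> Tuple[str, str, str, Optional[int]]:
    """Split one RECORD row into (path, algorithm, digest, size)."""
    # RECORD is CSV, so use the csv module in case a path contains a comma.
    path, hash_field, size = next(csv.reader([line]))
    algorithm, _, digest = hash_field.partition("=")
    # The RECORD file lists itself with empty hash and size fields.
    return path, algorithm, digest, int(size) if size else None


# The empty tests/__init__.py row from the diff doubles as a sanity check:
# its digest is the SHA-256 of zero bytes.
row = "tests/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0"
path, algorithm, digest, size = parse_record_line(row)
print(path, algorithm, size, record_hash(b"") == digest)
```

Rows whose digest and size are unchanged between 0.1.5 and 0.1.6 (for example `WHEEL` and `entry_points.txt`) differ only in the `dist-info` directory name, so the only file in this hunk whose contents changed is `METADATA`.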