schema-search 0.1.6 (tar.gz) → 0.1.7 (tar.gz)

Potentially problematic release: this version of schema-search might be problematic.

Files changed (45)

{schema_search-0.1.6/schema_search.egg-info → schema_search-0.1.7}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: schema-search
-Version: 0.1.6
+Version: 0.1.7
 Summary: Natural language database schema search with graph-aware semantic retrieval
 Home-page: https://adibhasan.com/blog/schema-search/
 Author: Adib Hasan
@@ -83,6 +83,48 @@ uv pip install "schema-search[snowflake,mcp]"  # Snowflake
 uv pip install "schema-search[bigquery,mcp]"   # BigQuery
 ```
+## Configuration
+Edit [`config.yml`](https://github.com/Neehan/schema-search/blob/main/config.yml):
+```yaml
+logging:
+  level: "WARNING"
+embedding:
+  location: "memory" # Options: "memory", "vectordb" (coming soon)
+  model: "multi-qa-MiniLM-L6-cos-v1"
+  metric: "cosine" # Options: "cosine", "euclidean", "manhattan", "dot"
+  batch_size: 32
+  show_progress: false
+  cache_dir: "/tmp/.schema_search_cache"
+chunking:
+  strategy: "raw" # Options: "raw", "llm"
+  max_tokens: 256
+  overlap_tokens: 50
+  model: "gpt-4o-mini"
+search:
+  # Search strategy: "semantic" (embeddings), "bm25" (BM25 lexical), "fuzzy" (fuzzy string matching), "hybrid" (semantic + bm25)
+  strategy: "hybrid"
+  initial_top_k: 20
+  rerank_top_k: 5
+  semantic_weight: 0.67 # For hybrid search (bm25_weight = 1 - semantic_weight)
+  hops: 1 # Number of foreign key hops for graph expansion (0-2 recommended)
+reranker:
+  # CrossEncoder model for reranking. Set to null to disable reranking
+  model: null # "Alibaba-NLP/gte-reranker-modernbert-base"
+schema:
+  include_columns: true
+  include_indices: true
+  include_foreign_keys: true
+  include_constraints: true
+```
 ## MCP Server
 Integrate with Claude Desktop or any MCP client.
@@ -97,7 +139,13 @@ Add to your MCP config (e.g., `~/.cursor/mcp.json` or Claude Desktop config):
   "mcpServers": {
     "schema-search": {
       "command": "uvx",
-      "args": ["schema-search[postgres,mcp]", "postgresql://user:pass@localhost/db", "optional config.yml path", "optional llm_api_key", "optional llm_base_url"]
+      "args": [
+        "schema-search[postgres,mcp]",
+        "postgresql://user:pass@localhost/db",
+        "optional/path/to/config.yml",
+        "optional llm_api_key",
+        "optional llm_base_url"
+      ]
     }
   }
 }
@@ -108,8 +156,14 @@ Add to your MCP config (e.g., `~/.cursor/mcp.json` or Claude Desktop config):
 {
   "mcpServers": {
     "schema-search": {
-      "command": "path/to/schema-search", // conda: /Users/<username>/opt/miniconda3/envs/<your env>/bin/schema-search",
-      "args": ["postgresql://user:pass@localhost/db", "optional config.yml path", "optional llm_api_key", "optional llm_base_url"]
+      // conda: /Users/<username>/opt/miniconda3/envs/<your env>/bin/schema-search",
+      "command": "path/to/schema-search",
+      "args": [
+        "postgresql://user:pass@localhost/db",
+        "optional/path/to/config.yml",
+        "optional llm_api_key",
+        "optional llm_base_url"
+      ]
     }
   }
 }
@@ -121,7 +175,7 @@ The LLM API key and base url are only required if you use LLM-generated schema s
 ### CLI Usage
 ```bash
-schema-search "postgresql://user:pass@localhost/db"
+schema-search "postgresql://user:pass@localhost/db" "optional/path/to/config.yml"
 ```
 Optional args: `[config_path] [llm_api_key] [llm_base_url]`
@@ -152,47 +206,6 @@ results = search.search("user_table", hops=0, limit=5, search_type="semantic")
 `SchemaSearch.index()` automatically detects schema changes and refreshes cached metadata, so you rarely need to force a reindex manually.
-## Configuration
-Edit `[config.yml](config.yml)`:
-```yaml
-logging:
-  level: "WARNING"
-embedding:
-  location: "memory" # Options: "memory", "vectordb" (coming soon)
-  model: "multi-qa-MiniLM-L6-cos-v1"
-  metric: "cosine" # Options: "cosine", "euclidean", "manhattan", "dot"
-  batch_size: 32
-  show_progress: false
-  cache_dir: "/tmp/.schema_search_cache"
-chunking:
-  strategy: "raw" # Options: "raw", "llm"
-  max_tokens: 256
-  overlap_tokens: 50
-  model: "gpt-4o-mini"
-search:
-  # Search strategy: "semantic" (embeddings), "bm25" (BM25 lexical), "fuzzy" (fuzzy string matching), "hybrid" (semantic + bm25)
-  strategy: "hybrid"
-  initial_top_k: 20
-  rerank_top_k: 5
-  semantic_weight: 0.67 # For hybrid search (bm25_weight = 1 - semantic_weight)
-  hops: 1 # Number of foreign key hops for graph expansion (0-2 recommended)
-reranker:
-  # CrossEncoder model for reranking. Set to null to disable reranking
-  model: null # "Alibaba-NLP/gte-reranker-modernbert-base"
-schema:
-  include_columns: true
-  include_indices: true
-  include_foreign_keys: true
-  include_constraints: true
-```
 ## Search Strategies
 Schema Search supports four search strategies:
@@ -200,7 +213,7 @@ Schema Search supports four search strategies:
 - **semantic**: Embedding-based similarity search using sentence transformers
 - **bm25**: Lexical search using BM25 ranking algorithm
 - **fuzzy**: String matching on table/column names using fuzzy matching
-- **hybrid**: Combines semantic and bm25 scores (default: 67% semantic, 33% fuzzy)
+- **hybrid**: Combines semantic and bm25 scores (default: 67% semantic, 33% bm25)
 Each strategy performs its own initial ranking, then optionally applies CrossEncoder reranking if `reranker.model` is configured. Set `reranker.model` to `null` to disable reranking.
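
The corrected hybrid bullet and the `semantic_weight: 0.67` comment (`bm25_weight = 1 - semantic_weight`) describe a weighted blend of two scores. The sketch below illustrates that arithmetic only; it is not the package's code. The function name `hybrid_scores`, the sample scores, and the assumption that both score sets are already scaled to [0, 1] are hypothetical.

```python
def hybrid_scores(semantic, bm25, semantic_weight=0.67):
    """Blend per-table scores as w * semantic + (1 - w) * bm25.

    Assumption: both inputs are dicts of table name -> score in [0, 1].
    A table missing from one side contributes 0.0 for that side.
    """
    bm25_weight = 1.0 - semantic_weight
    tables = set(semantic) | set(bm25)
    return {
        table: semantic_weight * semantic.get(table, 0.0)
        + bm25_weight * bm25.get(table, 0.0)
        for table in tables
    }


# Made-up scores for three tables.
semantic = {"payments": 0.9, "users": 0.4}
bm25 = {"payments": 0.5, "refunds": 1.0}

combined = hybrid_scores(semantic, bm25)
ranked = sorted(combined, key=combined.get, reverse=True)
print(ranked)
```

With these inputs, `payments` ranks first because it scores on both sides, while `refunds` (lexical only) edges out `users` (semantic only) despite the lower BM25 weight.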

{schema_search-0.1.6 → schema_search-0.1.7}/README.md RENAMED

@@ -32,6 +32,48 @@ uv pip install "schema-search[snowflake,mcp]"  # Snowflake
 uv pip install "schema-search[bigquery,mcp]"   # BigQuery
 ```
+## Configuration
+Edit [`config.yml`](https://github.com/Neehan/schema-search/blob/main/config.yml):
+```yaml
+logging:
+  level: "WARNING"
+embedding:
+  location: "memory" # Options: "memory", "vectordb" (coming soon)
+  model: "multi-qa-MiniLM-L6-cos-v1"
+  metric: "cosine" # Options: "cosine", "euclidean", "manhattan", "dot"
+  batch_size: 32
+  show_progress: false
+  cache_dir: "/tmp/.schema_search_cache"
+chunking:
+  strategy: "raw" # Options: "raw", "llm"
+  max_tokens: 256
+  overlap_tokens: 50
+  model: "gpt-4o-mini"
+search:
+  # Search strategy: "semantic" (embeddings), "bm25" (BM25 lexical), "fuzzy" (fuzzy string matching), "hybrid" (semantic + bm25)
+  strategy: "hybrid"
+  initial_top_k: 20
+  rerank_top_k: 5
+  semantic_weight: 0.67 # For hybrid search (bm25_weight = 1 - semantic_weight)
+  hops: 1 # Number of foreign key hops for graph expansion (0-2 recommended)
+reranker:
+  # CrossEncoder model for reranking. Set to null to disable reranking
+  model: null # "Alibaba-NLP/gte-reranker-modernbert-base"
+schema:
+  include_columns: true
+  include_indices: true
+  include_foreign_keys: true
+  include_constraints: true
+```
 ## MCP Server
 Integrate with Claude Desktop or any MCP client.
@@ -46,7 +88,13 @@ Add to your MCP config (e.g., `~/.cursor/mcp.json` or Claude Desktop config):
   "mcpServers": {
     "schema-search": {
       "command": "uvx",
-      "args": ["schema-search[postgres,mcp]", "postgresql://user:pass@localhost/db", "optional config.yml path", "optional llm_api_key", "optional llm_base_url"]
+      "args": [
+        "schema-search[postgres,mcp]",
+        "postgresql://user:pass@localhost/db",
+        "optional/path/to/config.yml",
+        "optional llm_api_key",
+        "optional llm_base_url"
+      ]
     }
   }
 }
@@ -57,8 +105,14 @@ Add to your MCP config (e.g., `~/.cursor/mcp.json` or Claude Desktop config):
 {
   "mcpServers": {
     "schema-search": {
-      "command": "path/to/schema-search", // conda: /Users/<username>/opt/miniconda3/envs/<your env>/bin/schema-search",
-      "args": ["postgresql://user:pass@localhost/db", "optional config.yml path", "optional llm_api_key", "optional llm_base_url"]
+      // conda: /Users/<username>/opt/miniconda3/envs/<your env>/bin/schema-search",
+      "command": "path/to/schema-search",
+      "args": [
+        "postgresql://user:pass@localhost/db",
+        "optional/path/to/config.yml",
+        "optional llm_api_key",
+        "optional llm_base_url"
+      ]
     }
   }
 }
@@ -70,7 +124,7 @@ The LLM API key and base url are only required if you use LLM-generated schema s
 ### CLI Usage
 ```bash
-schema-search "postgresql://user:pass@localhost/db"
+schema-search "postgresql://user:pass@localhost/db" "optional/path/to/config.yml"
 ```
 Optional args: `[config_path] [llm_api_key] [llm_base_url]`
@@ -101,47 +155,6 @@ results = search.search("user_table", hops=0, limit=5, search_type="semantic")
 `SchemaSearch.index()` automatically detects schema changes and refreshes cached metadata, so you rarely need to force a reindex manually.
-## Configuration
-Edit `[config.yml](config.yml)`:
-```yaml
-logging:
-  level: "WARNING"
-embedding:
-  location: "memory" # Options: "memory", "vectordb" (coming soon)
-  model: "multi-qa-MiniLM-L6-cos-v1"
-  metric: "cosine" # Options: "cosine", "euclidean", "manhattan", "dot"
-  batch_size: 32
-  show_progress: false
-  cache_dir: "/tmp/.schema_search_cache"
-chunking:
-  strategy: "raw" # Options: "raw", "llm"
-  max_tokens: 256
-  overlap_tokens: 50
-  model: "gpt-4o-mini"
-search:
-  # Search strategy: "semantic" (embeddings), "bm25" (BM25 lexical), "fuzzy" (fuzzy string matching), "hybrid" (semantic + bm25)
-  strategy: "hybrid"
-  initial_top_k: 20
-  rerank_top_k: 5
-  semantic_weight: 0.67 # For hybrid search (bm25_weight = 1 - semantic_weight)
-  hops: 1 # Number of foreign key hops for graph expansion (0-2 recommended)
-reranker:
-  # CrossEncoder model for reranking. Set to null to disable reranking
-  model: null # "Alibaba-NLP/gte-reranker-modernbert-base"
-schema:
-  include_columns: true
-  include_indices: true
-  include_foreign_keys: true
-  include_constraints: true
-```
 ## Search Strategies
 Schema Search supports four search strategies:
@@ -149,7 +162,7 @@ Schema Search supports four search strategies:
 - **semantic**: Embedding-based similarity search using sentence transformers
 - **bm25**: Lexical search using BM25 ranking algorithm
 - **fuzzy**: String matching on table/column names using fuzzy matching
-- **hybrid**: Combines semantic and bm25 scores (default: 67% semantic, 33% fuzzy)
+- **hybrid**: Combines semantic and bm25 scores (default: 67% semantic, 33% bm25)
 Each strategy performs its own initial ranking, then optionally applies CrossEncoder reranking if `reranker.model` is configured. Set `reranker.model` to `null` to disable reranking.
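
The `mcpServers` entries in the README hunks above pass the connection string and the optional values as positional `args`. As a rough sketch, the same entry can be generated programmatically. The helper name `build_mcp_entry` is hypothetical, and the rule that a later optional value cannot be supplied when an earlier one is omitted is an assumption inferred from the positional `[config_path] [llm_api_key] [llm_base_url]` order, not something the README states.

```python
import json


def build_mcp_entry(db_url, config_path=None, llm_api_key=None, llm_base_url=None):
    """Build the uvx-style MCP config entry shown in the README.

    Assumption: optional values are positional, so a later one cannot be
    given once an earlier one has been omitted.
    """
    args = ["schema-search[postgres,mcp]", db_url]
    seen_gap = False
    for value in (config_path, llm_api_key, llm_base_url):
        if value is None:
            seen_gap = True
        elif seen_gap:
            raise ValueError("positional arguments cannot be skipped")
        else:
            args.append(value)
    return {"mcpServers": {"schema-search": {"command": "uvx", "args": args}}}


entry = build_mcp_entry(
    "postgresql://user:pass@localhost/db",
    config_path="path/to/config.yml",
)
print(json.dumps(entry, indent=2))
```

Emitting the file through `json.dumps` also avoids `//` comments, which strict JSON parsers reject.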

{schema_search-0.1.6 → schema_search-0.1.7}/schema_search/mcp_server.py RENAMED

@@ -16,7 +16,6 @@ mcp = FastMCP("schema-search")
 @mcp.tool()
 def schema_search(
     query: str,
-    hops: Optional[int] = None,
     limit: int = 5,
 ) -> dict:
     """Search database schema using natural language.
@@ -25,14 +24,14 @@ def schema_search(
     using semantic similarity. Expands results by traversing foreign key relationships.
     Args:
-        query: Natural language question about database schema (e.g., 'where are user refunds stored?', 'tables related to payments')
-        hops: Number of foreign key relationship hops for graph expansion. Use 0 for exact matches only, 1-2 to include related tables. If not specified, uses value from config.yml (default: 1)
-        limit: Maximum number of table schemas to return in results. Default: 5
+        query: Natural language question about database schema (e.g., 'tables related to payments')
+        limit: Maximum number of table schemas to return in results. Default: 5; Max: 10.
     Returns:
         Dictionary with 'results' (list of table schemas with columns, types, constraints, and relationships) and 'latency_sec' (query execution time)
     """
-    search_result = mcp.search_engine.search(query, hops=hops, limit=limit)  # type: ignore
+    limit = min(limit, 10)
+    search_result = mcp.search_engine.search(query, limit=limit)  # type: ignore
     return {
         "results": search_result["results"],
         "latency_sec": search_result["latency_sec"],

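The `mcp_server.py` hunk above removes the `hops` parameter from the MCP tool and caps `limit` with `min(limit, 10)`. The standalone sketch below reproduces that cap to show its effect on a few inputs. `clamp_limit` mirrors the line from the diff, while `clamp_limit_strict` is a hypothetical variant (not in the package) that also applies a lower bound, since `min` alone passes zero and negative values through unchanged.

```python
MAX_LIMIT = 10  # upper bound stated in the 0.1.7 docstring ("Max: 10")


def clamp_limit(limit: int, max_limit: int = MAX_LIMIT) -> int:
    """Same behavior as `limit = min(limit, 10)` in the diff."""
    return min(limit, max_limit)


def clamp_limit_strict(limit: int, max_limit: int = MAX_LIMIT) -> int:
    """Hypothetical variant: also forces the value to be at least 1."""
    return max(1, min(limit, max_limit))


for requested in (3, 50, -2):
    print(requested, clamp_limit(requested), clamp_limit_strict(requested))
```

Whether the search engine itself tolerates a non-positive `limit` is not visible in this diff, so the lower bound is shown only as an option.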
{schema_search-0.1.6 → schema_search-0.1.7/schema_search.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: schema-search
-Version: 0.1.6
+Version: 0.1.7
 Summary: Natural language database schema search with graph-aware semantic retrieval
 Home-page: https://adibhasan.com/blog/schema-search/
 Author: Adib Hasan
@@ -83,6 +83,48 @@ uv pip install "schema-search[snowflake,mcp]"  # Snowflake
 uv pip install "schema-search[bigquery,mcp]"   # BigQuery
 ```
+## Configuration
+Edit [`config.yml`](https://github.com/Neehan/schema-search/blob/main/config.yml):
+```yaml
+logging:
+  level: "WARNING"
+embedding:
+  location: "memory" # Options: "memory", "vectordb" (coming soon)
+  model: "multi-qa-MiniLM-L6-cos-v1"
+  metric: "cosine" # Options: "cosine", "euclidean", "manhattan", "dot"
+  batch_size: 32
+  show_progress: false
+  cache_dir: "/tmp/.schema_search_cache"
+chunking:
+  strategy: "raw" # Options: "raw", "llm"
+  max_tokens: 256
+  overlap_tokens: 50
+  model: "gpt-4o-mini"
+search:
+  # Search strategy: "semantic" (embeddings), "bm25" (BM25 lexical), "fuzzy" (fuzzy string matching), "hybrid" (semantic + bm25)
+  strategy: "hybrid"
+  initial_top_k: 20
+  rerank_top_k: 5
+  semantic_weight: 0.67 # For hybrid search (bm25_weight = 1 - semantic_weight)
+  hops: 1 # Number of foreign key hops for graph expansion (0-2 recommended)
+reranker:
+  # CrossEncoder model for reranking. Set to null to disable reranking
+  model: null # "Alibaba-NLP/gte-reranker-modernbert-base"
+schema:
+  include_columns: true
+  include_indices: true
+  include_foreign_keys: true
+  include_constraints: true
+```
 ## MCP Server
 Integrate with Claude Desktop or any MCP client.
@@ -97,7 +139,13 @@ Add to your MCP config (e.g., `~/.cursor/mcp.json` or Claude Desktop config):
   "mcpServers": {
     "schema-search": {
       "command": "uvx",
-      "args": ["schema-search[postgres,mcp]", "postgresql://user:pass@localhost/db", "optional config.yml path", "optional llm_api_key", "optional llm_base_url"]
+      "args": [
+        "schema-search[postgres,mcp]",
+        "postgresql://user:pass@localhost/db",
+        "optional/path/to/config.yml",
+        "optional llm_api_key",
+        "optional llm_base_url"
+      ]
     }
   }
 }
@@ -108,8 +156,14 @@ Add to your MCP config (e.g., `~/.cursor/mcp.json` or Claude Desktop config):
 {
   "mcpServers": {
     "schema-search": {
-      "command": "path/to/schema-search", // conda: /Users/<username>/opt/miniconda3/envs/<your env>/bin/schema-search",
-      "args": ["postgresql://user:pass@localhost/db", "optional config.yml path", "optional llm_api_key", "optional llm_base_url"]
+      // conda: /Users/<username>/opt/miniconda3/envs/<your env>/bin/schema-search",
+      "command": "path/to/schema-search",
+      "args": [
+        "postgresql://user:pass@localhost/db",
+        "optional/path/to/config.yml",
+        "optional llm_api_key",
+        "optional llm_base_url"
+      ]
     }
   }
 }
@@ -121,7 +175,7 @@ The LLM API key and base url are only required if you use LLM-generated schema s
 ### CLI Usage
 ```bash
-schema-search "postgresql://user:pass@localhost/db"
+schema-search "postgresql://user:pass@localhost/db" "optional/path/to/config.yml"
 ```
 Optional args: `[config_path] [llm_api_key] [llm_base_url]`
@@ -152,47 +206,6 @@ results = search.search("user_table", hops=0, limit=5, search_type="semantic")
 `SchemaSearch.index()` automatically detects schema changes and refreshes cached metadata, so you rarely need to force a reindex manually.
-## Configuration
-Edit `[config.yml](config.yml)`:
-```yaml
-logging:
-  level: "WARNING"
-embedding:
-  location: "memory" # Options: "memory", "vectordb" (coming soon)
-  model: "multi-qa-MiniLM-L6-cos-v1"
-  metric: "cosine" # Options: "cosine", "euclidean", "manhattan", "dot"
-  batch_size: 32
-  show_progress: false
-  cache_dir: "/tmp/.schema_search_cache"
-chunking:
-  strategy: "raw" # Options: "raw", "llm"
-  max_tokens: 256
-  overlap_tokens: 50
-  model: "gpt-4o-mini"
-search:
-  # Search strategy: "semantic" (embeddings), "bm25" (BM25 lexical), "fuzzy" (fuzzy string matching), "hybrid" (semantic + bm25)
-  strategy: "hybrid"
-  initial_top_k: 20
-  rerank_top_k: 5
-  semantic_weight: 0.67 # For hybrid search (bm25_weight = 1 - semantic_weight)
-  hops: 1 # Number of foreign key hops for graph expansion (0-2 recommended)
-reranker:
-  # CrossEncoder model for reranking. Set to null to disable reranking
-  model: null # "Alibaba-NLP/gte-reranker-modernbert-base"
-schema:
-  include_columns: true
-  include_indices: true
-  include_foreign_keys: true
-  include_constraints: true
-```
 ## Search Strategies
 Schema Search supports four search strategies:
@@ -200,7 +213,7 @@ Schema Search supports four search strategies:
 - **semantic**: Embedding-based similarity search using sentence transformers
 - **bm25**: Lexical search using BM25 ranking algorithm
 - **fuzzy**: String matching on table/column names using fuzzy matching
-- **hybrid**: Combines semantic and bm25 scores (default: 67% semantic, 33% fuzzy)
+- **hybrid**: Combines semantic and bm25 scores (default: 67% semantic, 33% bm25)
 Each strategy performs its own initial ranking, then optionally applies CrossEncoder reranking if `reranker.model` is configured. Set `reranker.model` to `null` to disable reranking.

{schema_search-0.1.6 → schema_search-0.1.7}/setup.py RENAMED

@@ -2,7 +2,7 @@ from setuptools import setup, find_packages
 setup(
     name="schema-search",
-    version="0.1.6",
+    version="0.1.7",
     description="Natural language database schema search with graph-aware semantic retrieval",
     author="Adib Hasan",
     long_description=open("README.md").read(),