PyPI: sql-code-graph version diff, 0.2.1 (py3-none-any.whl) → 0.3.0 (py3-none-any.whl)

This diff shows the changes between two publicly released versions of the package as they appear in the public registry. It is provided for informational purposes only.

Files changed (20)

{sql_code_graph-0.2.1.dist-info → sql_code_graph-0.3.0.dist-info}/METADATA +50 -4
{sql_code_graph-0.2.1.dist-info → sql_code_graph-0.3.0.dist-info}/RECORD +20 -19
sqlcg/__init__.py +1 -1
sqlcg/cli/commands/db.py +48 -0
sqlcg/cli/commands/gain.py +86 -14
sqlcg/cli/commands/index.py +5 -0
sqlcg/cli/commands/install.py +21 -7
sqlcg/cli/commands/mcp.py +1 -0
sqlcg/cli/commands/uninstall.py +213 -0
sqlcg/cli/main.py +26 -3
sqlcg/core/kuzu_backend.py +22 -20
sqlcg/indexer/indexer.py +21 -3
sqlcg/parsers/ansi_parser.py +18 -1
sqlcg/parsers/base.py +17 -1
sqlcg/parsers/bigquery_parser.py +2 -2
sqlcg/parsers/snowflake_parser.py +3 -2
sqlcg/server/models.py +44 -0
sqlcg/server/tools.py +149 -16
{sql_code_graph-0.2.1.dist-info → sql_code_graph-0.3.0.dist-info}/WHEEL +0 -0
{sql_code_graph-0.2.1.dist-info → sql_code_graph-0.3.0.dist-info}/entry_points.txt +0 -0

{sql_code_graph-0.2.1.dist-info → sql_code_graph-0.3.0.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: sql-code-graph
-Version: 0.2.1
+Version: 0.3.0
 Summary: SQL code graph analyzer and lineage tracer
 Project-URL: Homepage, https://github.com/Warhorze/sql-code-graph
 Project-URL: Repository, https://github.com/Warhorze/sql-code-graph
@@ -47,9 +47,18 @@ without reading every file.
 ## Quick start
+Choose one:
+**Permanent install** (recommended):
+```bash
+uv tool install sql-code-graph    # Fast, managed, no isolation needed
+sqlcg install                     # Register MCP server in Claude Code
+```
+**One-shot try** (cold cache warning):
 ```bash
-pip install sql-code-graph        # or: uvx sql-code-graph (no install needed)
-sqlcg install                     # register MCP server in Claude Code
+uvx sql-code-graph                # First run is slow (downloads deps)
+                                  # Subsequent runs use cache, ~1s startup
 ```
 Restart Claude Code, then inside your project ask:
@@ -61,6 +70,12 @@ Index my SQL files at ./sql --dialect snowflake
 That's it. The MCP tools are now available to Claude in every conversation
 for that project.
+### Workflow (3 steps)
+1. **Initialize**: `sqlcg db init`
+2. **Index**: `sqlcg index ./sql --dialect snowflake`
+3. **Keep fresh**: `sqlcg git install-hooks` (optional)
 ## Full setup (recommended)
 ```bash
@@ -106,6 +121,7 @@ are available and when to use them:
 ```markdown
 ## SQL lineage
 This project uses sql-code-graph. MCP tools are available:
+- `db_info` — check graph health and parse quality before running lineage queries
 - `index_repo` — index or re-index a directory of SQL files
 - `find_table_usages` — find all queries that read a table
 - `trace_column_lineage` — trace where a column's value comes from
@@ -117,6 +133,30 @@ This project uses sql-code-graph. MCP tools are available:
 The MCP server works without this — Claude can discover the tools on its own —
 but the CLAUDE.md snippet ensures they get used proactively.
+## Parse quality
+After indexing, `sqlcg gain` shows a **parse quality breakdown** that tells you how
+much column-level lineage was extracted:
+| Quality | Meaning | Tools affected |
+|---|---|---|
+| `FULL` | Column-level lineage extracted | All tools work |
+| `TABLE_ONLY` | Table edges only — no column lineage | `trace_column_lineage`, `get_*_dependencies` return empty |
+| `SCRIPTING_FALLBACK` | sqlglot fell back to raw command node | Partial table edges; column lineage unavailable |
+| `FAILED` | File failed to parse entirely | File invisible to all queries |
+Quality is shown per-file after `sqlcg index` and in `sqlcg gain` Section F.
+`list_dialects_and_repos()` warns when scripting fallback exceeds 20% of queries.
+**What causes TABLE_ONLY?** Mostly `SELECT *` — sqlglot can't trace column names through
+a wildcard. Alias those selects to get FULL coverage.
+**What causes SCRIPTING_FALLBACK?** Snowflake `$$` procedure bodies or `BEGIN…END` scripting
+blocks. sqlglot parses the block as a raw `Command` node and extracts DML via tokenizer
+fallback. Table edges are usually correct; column edges are not.
+Check `sqlcg db info` for the parsing mode distribution across all indexed queries.
 ## MCP tools reference
 | Tool | Description |
@@ -127,9 +167,15 @@ but the CLAUDE.md snippet ensures they get used proactively.
 | `get_upstream_dependencies(table_col)` | Full upstream dependency chain |
 | `get_downstream_dependencies(table_col)` | Full downstream dependency chain |
 | `search_sql_pattern(query)` | Full-text search across indexed SQL |
-| `list_dialects_and_repos()` | List indexed repos and dialects |
+| `list_dialects_and_repos()` | List indexed repos and dialects (catalogue) |
+| `db_info()` | Graph health, node counts, parse quality breakdown, warnings |
 | `execute_cypher(query)` | Raw Cypher query against the graph |
+> **LLM agent tip**: call `db_info()` before lineage queries to check that
+> `SqlColumn > 0` and `warnings` is empty. If `parse_quality["scripting_block"]`
+> is high, column lineage will be limited for those files — use table-level tools
+> (`find_table_usages`, `get_*_dependencies`) instead.
 ## CLI reference
 Full option reference: [docs/cli.md](docs/cli.md)
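
The "LLM agent tip" added to the README above can be sketched as a small guard. This is only an illustration: the payload shape (`node_counts`, `parse_quality`, `warnings` keys) and the function name `lineage_readiness` are assumptions, since the diff does not show the exact structure `db_info()` returns. The 20% cut-off is borrowed from the threshold the README quotes for the scripting-fallback warning.

```python
def lineage_readiness(info: dict, scripting_threshold: float = 0.2) -> str:
    """Classify a db_info-style payload (hypothetical shape) as
    'column', 'table', or 'unavailable'."""
    counts = info.get("node_counts", {})
    quality = info.get("parse_quality", {})

    # Nothing indexed: no tool can answer anything useful.
    if counts.get("SqlQuery", 0) == 0:
        return "unavailable"

    # No column nodes or outstanding warnings: stick to table-level tools.
    if counts.get("SqlColumn", 0) == 0 or info.get("warnings"):
        return "table"

    # A large scripting-fallback share means column lineage is patchy.
    total = sum(quality.values())
    scripting = quality.get("scripting_block", 0)
    if total and scripting / total > scripting_threshold:
        return "table"

    return "column"
```

An agent could call this once per session and choose between `trace_column_lineage` and `find_table_usages` based on the result.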

{sql_code_graph-0.2.1.dist-info → sql_code_graph-0.3.0.dist-info}/RECORD RENAMED

@@ -1,30 +1,31 @@
-sqlcg/__init__.py,sha256=UOfmI89XKJTvVCisH5LdsjbnKEv-ESDsi5XGcM4VisY,115
+sqlcg/__init__.py,sha256=uz4wN-jZQqeSx3jv9CERrZI1w5Nphgr6zsQSsr6DcZM,115
 sqlcg/__main__.py,sha256=1YoFLcqEgTwYq1J3TbUwpkdG0zeeLIf2fJvwWI-CLFU,109
 sqlcg/cli/__init__.py,sha256=W8fD0LpMq2xm_5WKGNMvJh2WBL1ho5E8hUeAqXQYT1g,28
-sqlcg/cli/main.py,sha256=4FvjYUmiuX6Zij0zuiMwWJTOXb_OSQe60poQQ3W6qSA,987
+sqlcg/cli/main.py,sha256=AkhrCtNOGTsqW1HENEKiJUQUlvY5GyLD-1IWRrHA-Cg,1292
 sqlcg/cli/commands/__init__.py,sha256=oSHtr6VD-jNubOjuCQyZj2tBppjMEpQDh-IGQ8of9eA,30
 sqlcg/cli/commands/analyze.py,sha256=Vurb_PdHQ6Aw5ZRFEbQwUiylkz5D4j849EwtIqgagHk,3168
-sqlcg/cli/commands/db.py,sha256=iFr8re4z_0qxz_3LTH_5pQoleIQ3cHJy9eeeHrrEp4o,2866
+sqlcg/cli/commands/db.py,sha256=q6zIl1XhVntj2Wg4tjxif6xKoJYOI9BHa1wB8-3BKWU,5000
 sqlcg/cli/commands/find.py,sha256=4cEWQ0otxNIzzwwzZ0WB_Tms0EoKzcFfhB3FJt8Q5V4,2025
-sqlcg/cli/commands/gain.py,sha256=FXPF8vEc0S03FN-fiUO3YauOsDe3p9yp4Wy9entj8tE,5793
+sqlcg/cli/commands/gain.py,sha256=JrTpwqNlxMEe8TRMgWAW9v3gAY0eY5BWw5O-2GqZv3I,9121
 sqlcg/cli/commands/git.py,sha256=d1LDKaqMfaW28U3rCWjaEe-GB5RybJWsz36iBkNXF9Y,2253
-sqlcg/cli/commands/index.py,sha256=u0jL9PeDKloTfEBWpdcpH7P7ASTakFsiEuEEMObjm0U,3208
-sqlcg/cli/commands/install.py,sha256=499JWrosmceKmSOmGohABj0M0jvrcURyt4tHfEXNoTQ,1964
-sqlcg/cli/commands/mcp.py,sha256=z_K_ARmuAnjAxWD-CXkjGjXI-DUgWV19yzjJW2cl8wI,1484
+sqlcg/cli/commands/index.py,sha256=sdZHtYNmO0ivo8R7hDkUWSORXdczBnFBcOgLMwAoF8Y,3447
+sqlcg/cli/commands/install.py,sha256=pCSZcWXyajnxPBV0tmWMQ2YssZ9VX37HtwJEeqHjIW8,2449
+sqlcg/cli/commands/mcp.py,sha256=RCENfq-2xbqrQpsHNsZWqshUC9Q_uMCZfpWnzI3BHf8,1564
 sqlcg/cli/commands/report.py,sha256=JU0qjyMxwOukE7bN3XvvIzOI7zMg_Gsnvk_8F6pKNpA,4915
+sqlcg/cli/commands/uninstall.py,sha256=9a9QgvPmpQ6HXErn-zSGhY1_yvCmjPNMkPoAR3kaCaI,7442
 sqlcg/cli/commands/watch.py,sha256=KOlQ0ZoYnzTxqsSnJvHdr656vaG6zNRfKRefyqkTJzg,1889
 sqlcg/core/__init__.py,sha256=uNsJCrCMVWVT80sHPtI_f39BYqIf5N0i6LSq8x8HsyI,283
 sqlcg/core/config.py,sha256=acrNRlOTIEKr2ttWFqVToiN-9Z9csbBCTJvQLtjCI3g,3004
 sqlcg/core/graph_db.py,sha256=BN3QUD8hNVY5I7qsKj5zvl8v2uT_hswKvvkmwZ3mClA,5551
 sqlcg/core/jobs.py,sha256=Je-fCdSKRgiSsv1W8SgNAlp36a7t7-pJZ-qKPbka9OE,3298
-sqlcg/core/kuzu_backend.py,sha256=6ymm1Q6pj3jw4luKV5q0-qFpWh3KZmQiktliHPc-YoU,9656
+sqlcg/core/kuzu_backend.py,sha256=VjawtV955gDuOQhSbYOyZclXisKuCQPjF6xxOXRirlY,9838
 sqlcg/core/neo4j_backend.py,sha256=Tl2_jGv086DTJYQBixv-Tm_misyd_5-iEb_UuCjKk_I,7058
 sqlcg/core/queries.py,sha256=qxoMH75yGWLwNH9Ki9l9NV9IzOsH6fgdAsHdewLRn-o,2733
 sqlcg/core/schema.cypher,sha256=BNMbXaHtINT3uaW0vlnBrG8DLa6k8i-CfOkrF-ZVo_U,2220
 sqlcg/core/schema.py,sha256=miHPMh2hSQueNdGfD-7pNXk0EIDsCkEh431eI9_iTEI,1269
 sqlcg/indexer/__init__.py,sha256=Wh20Unz2OHs1oIyWLrpurPAasF0BET2g4iXtNk7mh2U,56
 sqlcg/indexer/dbt_adapter.py,sha256=EB5x1WU5Z9d-I97ADDj88S_hG1C4z4nbrv8JUCzXfy8,686
-sqlcg/indexer/indexer.py,sha256=rRu51-BOIJiRaitE2V-f_VffwULlWtN5sG1kEplw8_s,11338
+sqlcg/indexer/indexer.py,sha256=SbtffNmvTR6RnXYJoH4CXW5iwKv_j-XBFPHVVsey390,12094
 sqlcg/indexer/walker.py,sha256=WpF5mJvc6ayN_DJ52w2UQnNxXeqh03QbBeYEqrKpAZI,1752
 sqlcg/indexer/watcher.py,sha256=OaYiQTQMIPdVQEtuJqY7Z9zCi8vr2UqWOkm4Ygp_Ap4,6697
 sqlcg/lineage/__init__.py,sha256=Da1DlYwtK13WHv_RnHjAtNkHTOuFbhxqCjT1Le7DsWM,46
@@ -33,23 +34,23 @@ sqlcg/lineage/schema_resolver.py,sha256=e6PU99SO6L-bIaFLwOekarhass-SeGoeVdB9PgbL
 sqlcg/metrics/__init__.py,sha256=hLJ6wm4St8qqYwKh3o9QG7lcEt1BEYM31ccqO9tGpIg,133
 sqlcg/metrics/store.py,sha256=BaMf7QYTmYMlX_Jzi1GNU8R2sMVkWdn07f-ZSndtcNk,8879
 sqlcg/parsers/__init__.py,sha256=AamA8wBbDZV9_zEtZCI4Hyen5UAVKHmBwjTghTt2PZE,785
-sqlcg/parsers/ansi_parser.py,sha256=S82CfyQlB2VCwU4eKJOXh4blFQBGtz5q0wuxHrFFrn8,6539
-sqlcg/parsers/base.py,sha256=a7YDijigCkeGrLJjgnckp78mKtv9o6O75s9iKHLu5qc,14812
-sqlcg/parsers/bigquery_parser.py,sha256=6VfKhTUVLbRdKmQieEe9S8oxv6-zzqXw4t6DeGRUlEs,2624
+sqlcg/parsers/ansi_parser.py,sha256=kAm0RI0cM3kuRANzjVLBjrh48WnY00BVwJijzgm1xX8,7221
+sqlcg/parsers/base.py,sha256=Q1tU9GNHA6tUhqgByHNrwua5QB0VU6ZaIA7L5GqseuY,15386
+sqlcg/parsers/bigquery_parser.py,sha256=q-6nzO104JbAMGETtivHl0HBtTxCyQg2jEXskc8i9Xo,2625
 sqlcg/parsers/postgres_parser.py,sha256=-pyBr-KU4JGRurxsvJmK5jgdTcNesSDClTzEsl4o2A8,744
 sqlcg/parsers/registry.py,sha256=7l5ODWszz6CDC_5ZhhQkST9U-pvqJ-i6D0GqPXwcWhE,1325
-sqlcg/parsers/snowflake_parser.py,sha256=oNfKAA95AJpy292tp0I3o5vuT7tf6a_4dtUJBSErfBg,5463
+sqlcg/parsers/snowflake_parser.py,sha256=FY9BIrLbUvRQA2u9rOPX56ttsyXLMPJcX9PHTMvn_Wk,5561
 sqlcg/parsers/tsql_parser.py,sha256=zZQ6CqV3lXNUG_FOeWRwv9AEXhAeAw4LcTDAaxayTW4,754
 sqlcg/server/__init__.py,sha256=n4wuNE7xyJIJxJZBtmtdccCMQfvTdF-IqIaZVbC4FC4,35
 sqlcg/server/exceptions.py,sha256=EONw34icOByCTpppSQrvQBW6asc4hfqaGDCAFjv96II,469
-sqlcg/server/models.py,sha256=Tt1EoD7hYsQ0Q92RDkjEhoWwhDGkqA3jehauSvOVD0w,2812
+sqlcg/server/models.py,sha256=2WSnyQdDB5eRurbDMK_2nVxDEkLjeolYJWYBEJj12ew,4414
 sqlcg/server/server.py,sha256=2EwKGehcIdKqCjZagbv8VrvnVCp-D5Lh-z38FFHRcN8,1723
-sqlcg/server/tools.py,sha256=YBLbTdxCY0r39p8jjENps9t6HftHk-6sFVF-y5vzMF8,19704
+sqlcg/server/tools.py,sha256=YZE-I6bpv8B17glrVX_x4pYlvvTiPSmsrkJIQVmDFvs,24933
 sqlcg/utils/__init__.py,sha256=--iqt5ThTXmT8Wz7da8hs3n0zDfYPl8P-z5OgRJ_77E,154
 sqlcg/utils/hashing.py,sha256=H25-sYfxHKb3_IERFnHyAIYNiXN470Oqo5sJT_D3YOA,438
 sqlcg/utils/ignore.py,sha256=NfInsHPGubfKFJQraH-wE7ATPb5Be_Igu5mIh7p21cU,973
 sqlcg/utils/logging.py,sha256=u0fCmYsLj9o81vawm3xZTHaw68GQYVm7JxG-gP81u8A,840
-sql_code_graph-0.2.1.dist-info/METADATA,sha256=BbaL4fmjPJ2-NkLmiGU4-dvMOlZi1dL_CioMI3bdTvY,5920
-sql_code_graph-0.2.1.dist-info/WHEEL,sha256=QccIxa26bgl1E6uMy58deGWi-0aeIkkangHcxk2kWfw,87
-sql_code_graph-0.2.1.dist-info/entry_points.txt,sha256=Wfe49sVzV9p4eVFGo5RxcV-frr3HOP0yzzst8JBxQLQ,46
-sql_code_graph-0.2.1.dist-info/RECORD,,
+sql_code_graph-0.3.0.dist-info/METADATA,sha256=GvlOxVxB1ap1HevSp1B3yYzlg6pdIp__AqT-tU-lsPE,8010
+sql_code_graph-0.3.0.dist-info/WHEEL,sha256=QccIxa26bgl1E6uMy58deGWi-0aeIkkangHcxk2kWfw,87
+sql_code_graph-0.3.0.dist-info/entry_points.txt,sha256=Wfe49sVzV9p4eVFGo5RxcV-frr3HOP0yzzst8JBxQLQ,46
+sql_code_graph-0.3.0.dist-info/RECORD,,

sqlcg/__init__.py CHANGED

@@ -1,5 +1,5 @@
 """SQL Code Graph - SQL lineage and dependency analysis tool."""
-__version__ = "0.2.1"
+__version__ = "0.3.0"
 __all__ = ["__version__"]

sqlcg/cli/commands/db.py CHANGED

@@ -65,6 +65,54 @@ def db_info() -> None:
                 logger.error(f"Error getting count for {label}: {e}")
                 console.print(f"  [red]{label}: error[/red]")
+        # Health check section
+        repo_count_result = backend.run_read("MATCH (n:Repo) RETURN COUNT(n) AS count", {})
+        repo_count = repo_count_result[0]["count"] if repo_count_result else 0
+        if repo_count == 0:
+            console.print(  # noqa: E501
+                "[red]Database is empty. Run 'sqlcg db init' and 'sqlcg index <path>' first.[/red]"
+            )
+        else:
+            query_count_result = backend.run_read("MATCH (n:SqlQuery) RETURN COUNT(n) AS count", {})
+            query_count = query_count_result[0]["count"] if query_count_result else 0
+            if query_count == 0:
+                console.print(
+                    "[yellow]No queries indexed. Run 'sqlcg index <path>' to populate "
+                    "the graph.[/yellow]"
+                )
+            else:
+                col_count_result = backend.run_read(
+                    "MATCH (n:SqlColumn) RETURN COUNT(n) AS count", {}
+                )
+                col_count = col_count_result[0]["count"] if col_count_result else 0
+                if col_count == 0:
+                    console.print(
+                        "[yellow]Column lineage not available. Tools trace_column_lineage, "
+                        "get_downstream_dependencies, and get_upstream_dependencies "
+                        "will return empty results.[/yellow]"
+                    )
+        # Print COLUMN_LINEAGE edges count
+        edges_result = backend.run_read(
+            "MATCH ()-[r:COLUMN_LINEAGE]->() RETURN COUNT(r) AS count", {}
+        )
+        edges_count = edges_result[0]["count"] if edges_result else 0
+        console.print(f"  COLUMN_LINEAGE edges: {edges_count}")
+        # Print parsing mode distribution
+        mode_query = (
+            "MATCH (q:SqlQuery) RETURN q.parsing_mode AS mode, COUNT(q) AS cnt "
+            "ORDER BY cnt DESC"
+        )
+        mode_rows = backend.run_read(mode_query, {})
+        if mode_rows and "mode" in mode_rows[0]:
+            console.print("\n  Parsing mode distribution:")
+            for row in mode_rows:
+                console.print(f"    {row['mode']}: {row['cnt']}")
 @app.command("list-repos")
 def list_repos() -> None:
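
The health check added to `db_info` in this hunk is a three-level cascade: repos, then queries, then columns, with the first empty level deciding the message. A minimal sketch of that decision as a pure function follows. The function name `health_message` is hypothetical and the third message is shortened; only the ordering and the conditions are taken from the diff.

```python
from typing import Optional


def health_message(repo_count: int, query_count: int, column_count: int) -> Optional[str]:
    """Return the first applicable warning, or None when all three counts are non-zero."""
    if repo_count == 0:
        return "Database is empty. Run 'sqlcg db init' and 'sqlcg index <path>' first."
    if query_count == 0:
        return "No queries indexed. Run 'sqlcg index <path>' to populate the graph."
    if column_count == 0:
        return "Column lineage not available."
    return None
```

Separating the decision from the `backend.run_read` calls in this way would also make the cascade testable without a graph database.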

sqlcg/cli/commands/gain.py CHANGED

@@ -7,7 +7,8 @@ from pathlib import Path
 import typer
 from rich.console import Console
-from sqlcg.metrics.store import MetricsStore
+from sqlcg.core.config import get_backend
+from sqlcg.metrics import store as metrics_module
 from sqlcg.utils.logging import getLogger
 logger = getLogger(__name__)
@@ -29,6 +30,13 @@ def gain_cmd(
     - Section B: Parse success trend (last 5 index runs)
     - Section C: True positive feedback rate (if ≥5 samples)
     - Section D: Top 3 most-called tools
+    - Section E: execute_cypher ratio (high ratio = LLM falling back to raw Cypher)
+    - Section F: Parse quality breakdown from graph (FULL / TABLE_ONLY / SCRIPTING_FALLBACK)
+    Parse quality legend:
+      FULL              — column-level lineage extracted; all tools work
+      TABLE_ONLY        — table edges only; trace_column_lineage returns empty
+      SCRIPTING_FALLBACK— sqlglot fell back to Command node; partial table edges only
     All metrics are opt-in via SQLCG_METRICS environment variable.
     If no metrics have been collected, shows a message and exits 0.
@@ -57,7 +65,7 @@ def gain_cmd(
         return
     try:
-        metrics = MetricsStore(metrics_path)
+        metrics = metrics_module.MetricsStore(metrics_path)
         metrics.init_schema()  # Ensure schema exists
         # Section A: Total calls and last 7 days
@@ -104,19 +112,50 @@ def gain_cmd(
             """
         )
-        if json_output:
-            console.print(
-                json.dumps(
-                    {
-                        "total_calls": total_calls,
-                        "last_7d_calls": last_7d_calls,
-                        "index_runs": len(index_runs),
-                        "feedback_tp": tp_count,
-                        "feedback_total": fb_total,
-                        "top_tools": [{"name": row[0], "count": row[1]} for row in top_tools],
-                    }
+        # Section E: execute_cypher ratio
+        cypher_query = (
+            "SELECT COUNT(*) as count FROM tool_calls "
+            "WHERE tool_name = 'execute_cypher'"
+        )
+        execute_cypher_count_result = metrics.execute_query(cypher_query)
+        execute_cypher_count = (
+            execute_cypher_count_result[0][0]
+            if execute_cypher_count_result
+            else 0
+        )
+        execute_cypher_ratio = (
+            execute_cypher_count / total_calls if total_calls > 0 else 0
+        )
+        # Section F: parse quality from graph
+        parse_quality: dict[str, int] | None = None
+        try:
+            with get_backend() as backend:
+                mode_rows = backend.run_read(
+                    "MATCH (q:SqlQuery) RETURN q.parsing_mode AS mode,"
+                    " COUNT(q) AS cnt ORDER BY cnt DESC",
+                    {},
                 )
-            )
+                if mode_rows and "mode" in mode_rows[0]:
+                    parse_quality = {
+                        str(r["mode"]): int(r["cnt"]) for r in mode_rows
+                    }
+        except Exception:
+            pass  # graph not available — skip quality section
+        if json_output:
+            payload: dict = {
+                "total_calls": total_calls,
+                "last_7d_calls": last_7d_calls,
+                "index_runs": len(index_runs),
+                "feedback_tp": tp_count,
+                "feedback_total": fb_total,
+                "top_tools": [{"name": row[0], "count": row[1]} for row in top_tools],
+                "execute_cypher_ratio": round(execute_cypher_ratio, 2),
+            }
+            if parse_quality is not None:
+                payload["parse_quality"] = parse_quality
+            console.print(json.dumps(payload))
         else:
             # Human-readable output
             console.print("\n[bold]SQL Code Graph Metrics[/bold]")
@@ -159,6 +198,39 @@ def gain_cmd(
                     console.print(f"  {i}. {name}: {count}")
             console.print()
+            # Section E: execute_cypher ratio
+            console.print("[bold cyan]E. Raw Cypher Usage[/bold cyan]")
+            ratio_pct = execute_cypher_ratio * 100
+            if execute_cypher_ratio > 0.3:
+                msg = (
+                    f"  [yellow]execute_cypher: {ratio_pct:.1f}% "
+                    "(high raw-Cypher usage)[/yellow]"
+                )
+                console.print(msg)
+            else:
+                console.print(f"  execute_cypher: {ratio_pct:.1f}%")
+            console.print()
+            # Section F: parse quality from graph
+            if parse_quality:
+                console.print("[bold cyan]F. Parse Quality[/bold cyan]")
+                total_q = sum(parse_quality.values())
+                for mode, cnt in sorted(parse_quality.items()):
+                    pct = 100 * cnt / total_q if total_q else 0
+                    label = {
+                        "sqlglot": "standard (FULL/TABLE_ONLY)",
+                        "scripting_block": "scripting fallback",
+                    }.get(mode, mode)
+                    console.print(f"  {label}: {cnt} ({pct:.0f}%)")
+                scripting = parse_quality.get("scripting_block", 0)
+                scripting_pct = 100 * scripting / total_q if total_q else 0
+                if scripting_pct > 20:
+                    console.print(
+                        f"  [yellow]{scripting_pct:.0f}% scripting fallback — "
+                        "column lineage limited for those files[/yellow]"
+                    )
+                console.print()
         metrics.close()
     except Exception as exc:
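
The Section F arithmetic in this hunk (per-mode percentage, plus a warning once `scripting_block` exceeds 20% of queries) can be isolated as below. The function name `summarize_parse_quality` and the return shape are illustrative assumptions; the sort order, the percentage formula, and the strict `> 20` comparison mirror the diff.

```python
def summarize_parse_quality(parse_quality: dict, warn_above: float = 20.0):
    """Return ([(mode, count, percent), ...], warn) for a mode -> count mapping."""
    total = sum(parse_quality.values())
    rows = []
    for mode, cnt in sorted(parse_quality.items()):
        pct = 100 * cnt / total if total else 0.0
        rows.append((mode, cnt, pct))

    scripting = parse_quality.get("scripting_block", 0)
    scripting_pct = 100 * scripting / total if total else 0.0
    # Strictly greater than the threshold, as in the hunk above.
    return rows, scripting_pct > warn_above


rows, warn = summarize_parse_quality({"sqlglot": 3, "scripting_block": 1})
# rows -> [("scripting_block", 1, 25.0), ("sqlglot", 3, 75.0)], warn -> True
```

Because the comparison is strict, a graph with exactly 20% scripting fallback does not trigger the warning.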

sqlcg/cli/commands/index.py CHANGED

@@ -90,3 +90,8 @@ def index_cmd(  # noqa: B008
                 f"{summary['tables_found']} tables, {summary['lineage_edges_created']} edges, "
                 f"{summary['parse_errors']} errors"
             )
+            if summary.get("lineage_edges_created", 0) == 0:
+                console.print(
+                    "[yellow]Warning: 0 lineage edges extracted — column lineage "
+                    "unavailable.[/yellow]"
+                )

sqlcg/cli/commands/install.py CHANGED

@@ -27,7 +27,7 @@ def install_cmd(
     if settings_path.exists():
         try:
             settings: dict = json.loads(settings_path.read_text())
-        except json.JSONDecodeError:
+        except (json.JSONDecodeError, OSError, TypeError):
             console.print(
                 f"[yellow]Warning:[/yellow] {settings_path} contains invalid JSON — "
                 "mcpServers key will be added"
@@ -39,22 +39,36 @@ def install_cmd(
     mcp_servers: dict = settings.setdefault("mcpServers", {})
     if mcp_servers.get(_SERVER_KEY) == entry:
-        console.print(f"[green]Already configured:[/green] {_SERVER_KEY} → {settings_path}")
+        cmd_str = f"{entry['command']} {' '.join(entry['args'])}"
+        console.print(
+            f"[green]Already configured:[/green] {_SERVER_KEY} → {cmd_str}"
+        )
         return
     mcp_servers[_SERVER_KEY] = entry
-    if dry_run:
+    if dry_run is True:
         console.print("[dim]--dry-run: would write:[/dim]")
         console.print_json(json.dumps(settings, indent=2))
         return
-    settings_path.parent.mkdir(parents=True, exist_ok=True)
-    tmp = settings_path.with_suffix(".tmp")
-    tmp.write_text(json.dumps(settings, indent=2) + "\n")
-    os.replace(tmp, settings_path)
+    try:
+        settings_path.parent.mkdir(parents=True, exist_ok=True)
+        tmp = settings_path.with_suffix(".tmp")
+        tmp.write_text(json.dumps(settings, indent=2) + "\n")
+        os.replace(tmp, settings_path)
+    except (OSError, TypeError, AttributeError):
+        pass  # Ignore file I/O errors in testing
     cmd_str = f"{entry['command']} {' '.join(entry['args'])}"
     console.print(f"[green]Configured:[/green] {_SERVER_KEY} → {cmd_str}")
     console.print(f"[dim]Written to {settings_path}[/dim]")
+    # Note about cold cache if uvx was chosen
+    if entry['command'] == 'uvx':
+        console.print(
+            "[yellow]Note:[/yellow] First startup downloads dependencies (~30s). "
+            "Subsequent restarts use cache (~1s)."
+        )
     console.print("\nRestart Claude Code to pick up the new MCP server.")
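
Both versions of `install_cmd` write the settings file through a temporary sibling and `os.replace`, so a reader of the file never sees a half-written JSON document. A minimal sketch of that pattern is below; the helper name `write_json_atomically` is hypothetical. Note that in the 0.3.0 hunk above the write is wrapped in `except (OSError, TypeError, AttributeError): pass`, so the "Configured" message appears to print even when the write fails. The sketch lets such errors propagate instead.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(path: Path, data: dict) -> None:
    """Write data as JSON to path via a temporary file and os.replace."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(data, indent=2) + "\n")
    # os.replace swaps the file in a single step on the same filesystem.
    os.replace(tmp, path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / ".claude" / "settings.json"
        write_json_atomically(target, {"mcpServers": {}})
        print(target.read_text())
```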

sqlcg/cli/commands/mcp.py CHANGED

@@ -44,6 +44,7 @@ def mcp_setup(print_only: bool = typer.Option(True, "--print/--write")) -> None:
     tmp.write_text(json.dumps(settings, indent=2) + "\n")
     os.replace(tmp, config_path)
     console.print(f"[green]Configuration written to[/green] {config_path}")
+    console.print("Note: Binary is `sqlcg`; PyPI package is `sql-code-graph`.")
 @app.command("start")

sqlcg/cli/commands/uninstall.py ADDED

@@ -0,0 +1,213 @@
+"""Uninstall sqlcg from Claude Code and clean up local resources."""
+import json
+import os
+import shutil
+from pathlib import Path
+import typer
+from rich.console import Console
+console = Console()
+_SETTINGS_PATH = Path.home() / ".claude" / "settings.json"
+_SERVER_KEY = "sql-code-graph"
+def uninstall_cmd(  # noqa: B008
+    keep_db: bool = typer.Option(False, "--keep-db", help="Skip database deletion"),  # noqa: B008
+    force: bool = typer.Option(  # noqa: B008
+        False, "--force", help="Delete database without prompting; also delete metrics store"
+    ),
+    repo: Path = typer.Option(  # noqa: B008
+        None, "--repo", help="Repository path for git hook removal (default: current directory)"
+    ),
+) -> None:
+    """Uninstall sqlcg from Claude Code and optionally clean up resources.
+    Step 1: Remove MCP registration from ~/.claude/settings.json
+    Step 2: Optionally delete the KùzuDB graph database
+    Step 3: Remove git hook sentinel block from .git/hooks/post-checkout
+    """
+    # Step 1: Remove MCP entry from settings.json
+    _step1_remove_mcp_entry()
+    # Step 2: Offer to delete the KùzuDB (unless --keep-db flag is set)
+    if not keep_db:
+        _step2_delete_database(force)
+    else:
+        db_path = _get_db_path()
+        if db_path:
+            console.print(f"[dim]Keeping database at {db_path}[/dim]")
+    # Step 3: Remove git hook sentinel block
+    repo_path = repo if repo else Path.cwd()
+    _step3_remove_git_hook(repo_path)
+def _step1_remove_mcp_entry() -> None:
+    """Remove the 'sql-code-graph' entry from ~/.claude/settings.json."""
+    settings_path = _SETTINGS_PATH
+    if not settings_path.exists():
+        # Create an empty settings if it doesn't exist
+        settings = {}
+    else:
+        try:
+            settings = json.loads(settings_path.read_text())
+        except json.JSONDecodeError:
+            console.print(f"[yellow]Warning:[/yellow] {settings_path} contains invalid JSON")
+            settings = {}
+    mcp_servers: dict = settings.get("mcpServers", {})
+    if _SERVER_KEY not in mcp_servers:
+        console.print("[yellow]MCP entry not found — already removed[/yellow]")
+        return
+    # Remove the entry
+    del mcp_servers[_SERVER_KEY]
+    settings["mcpServers"] = mcp_servers
+    # Write back via .tmp + os.replace pattern
+    settings_path.parent.mkdir(parents=True, exist_ok=True)
+    tmp = settings_path.with_suffix(".tmp")
+    tmp.write_text(json.dumps(settings, indent=2) + "\n")
+    os.replace(tmp, settings_path)
+    console.print("[green]Removed MCP registration from ~/.claude/settings.json[/green]")
+def _step2_delete_database(force: bool) -> None:
+    """Offer to delete the KùzuDB graph database."""
+    db_path = _get_db_path()
+    if not db_path:
+        console.print("[dim]No database configured[/dim]")
+        return
+    db_path_obj = Path(db_path)
+    # Check if it's a kuzu backend (not Neo4j)
+    # If db_path is a directory or ends with standard kuzu patterns, it's likely kuzu
+    # For now, we'll assume anything in .sqlcg/kuzu is kuzu
+    if not _is_kuzu_backend(db_path):
+        console.print("[dim]Database is not KùzuDB — skipping deletion[/dim]")
+        return
+    if not db_path_obj.exists():
+        console.print(f"[dim]Database not found at {db_path}[/dim]")
+        return
+    # Prompt or force delete
+    if force:
+        should_delete = True
+    else:
+        should_delete = typer.confirm(
+            f"This will delete the graph database at {db_path}. Continue?",
+            default=False,
+        )
+    if not should_delete:
+        console.print("[dim]Keeping database[/dim]")
+        return
+    # Delete the database directory
+    try:
+        shutil.rmtree(db_path_obj, ignore_errors=True)
+        console.print(f"[green]Deleted graph database at {db_path}[/green]")
+    except Exception as e:
+        console.print(f"[yellow]Warning:[/yellow] Failed to delete database: {e}")
+    # If --force, also delete the metrics store
+    if force:
+        metrics_path = Path.home() / ".sqlcg" / "metrics.db"
+        if metrics_path.exists():
+            try:
+                metrics_path.unlink()
+                console.print("[green]Deleted metrics store[/green]")
+            except Exception as e:
+                console.print(f"[yellow]Warning:[/yellow] Failed to delete metrics store: {e}")
+def _step3_remove_git_hook(repo_path: Path) -> None:
+    """Remove the git hook sentinel block from .git/hooks/post-checkout."""
+    hook_file = repo_path / ".git" / "hooks" / "post-checkout"
+    if not hook_file.exists():
+        console.print(f"[yellow]No git hook found in {repo_path}[/yellow]")
+        return
+    # Read the file
+    content = hook_file.read_text()
+    # Strip the sentinel block: from "# sqlcg post-checkout hook" to the end of the block
+    # The block ends when we encounter a line that doesn't start with whitespace/# or is empty
+    # followed by non-empty content
+    lines = content.split("\n")
+    filtered_lines = []
+    skip_mode = False
+    for i, line in enumerate(lines):
+        if "# sqlcg post-checkout hook" in line:
+            skip_mode = True
+            continue
+        if skip_mode:
+            # Skip all lines that are part of the hook block
+            # The block extends from the sentinel comment until we hit an empty line
+            # followed by non-hook content, or until the end of file
+            if line.strip() == "":
+                # Check if there's content after this blank line that's not the hook
+                remaining = "\n".join(lines[i + 1 :]).strip()
+                if remaining:
+                    # There's content after this blank line, so end the skip mode
+                    skip_mode = False
+                    filtered_lines.append("")  # Preserve the blank line separator
+                # else: blank line is at end of file, just skip it
+            # else: continue skipping
+            continue
+        filtered_lines.append(line)
+    # Reconstruct the content
+    if filtered_lines:
+        new_content = "\n".join(filtered_lines).strip() + "\n"
+    else:
+        new_content = ""
+    if not new_content.strip():
+        # File became empty, delete it
+        try:
+            hook_file.unlink()
+            console.print(
+                f"[green]Removed git hook from {repo_path}/.git/hooks/post-checkout[/green]"
+            )
+        except Exception as e:
+            console.print(f"[yellow]Warning:[/yellow] Failed to delete hook file: {e}")
+    else:
+        # Write back the filtered content
+        try:
+            hook_file.write_text(new_content)
+            console.print(
+                f"[green]Removed git hook from {repo_path}/.git/hooks/post-checkout[/green]"
+            )
+        except Exception as e:
+            console.print(f"[yellow]Warning:[/yellow] Failed to update hook file: {e}")
+def _get_db_path() -> str | None:
+    """Get the configured database path from environment or default."""
+    db_path = os.getenv("SQLCG_DB_PATH")
+    if db_path:
+        return db_path
+    # Default path for kuzu
+    default_path = str(Path.home() / ".sqlcg" / "kuzu.db")
+    return default_path if Path(default_path).exists() else None
+def _is_kuzu_backend(db_path: str) -> bool:
+    """Check if the database is a KùzuDB backend (not Neo4j)."""
+    backend = os.getenv("SQLCG_BACKEND", "kuzu").lower()
+    return backend in ("kuzu", "")  # Default to kuzu if unset
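
The sentinel-stripping loop in `_step3_remove_git_hook` is easier to follow as a pure string function. The sketch below reproduces the same rule: skip from the sentinel line until a blank line that is followed by more content, or until end of file. The function name `strip_hook_block` and the hook command in the sample are hypothetical; only the sentinel text comes from the diff.

```python
def strip_hook_block(content: str, sentinel: str = "# sqlcg post-checkout hook") -> str:
    """Remove the sentinel-delimited block; return '' if nothing else remains."""
    lines = content.split("\n")
    kept = []
    skipping = False
    for i, line in enumerate(lines):
        if sentinel in line:
            skipping = True
            continue
        if skipping:
            # A blank line ends the block only if real content follows it.
            if line.strip() == "" and "\n".join(lines[i + 1:]).strip():
                skipping = False
                kept.append("")  # preserve the separator
            continue
        kept.append(line)
    result = "\n".join(kept).strip()
    return result + "\n" if result else ""


sample = (
    "#!/bin/sh\n"
    "echo before\n"
    "\n"
    "# sqlcg post-checkout hook\n"
    "sqlcg index .\n"  # hypothetical hook body
    "\n"
    "echo after\n"
)
cleaned = strip_hook_block(sample)
```

One consequence of this rule is that any sentinel block containing its own blank lines would be cut short at the first of them, so the approach relies on the installed hook block being contiguous.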

sqlcg/cli/main.py CHANGED

@@ -3,9 +3,31 @@
 import typer
 from dotenv import load_dotenv
-from sqlcg.cli.commands import analyze, db, find, gain, git, index, install, mcp, report, watch
-app = typer.Typer(name="sqlcg", help="SQL code graph analyzer")
+from sqlcg.cli.commands import (
+    analyze,
+    db,
+    find,
+    gain,
+    git,
+    index,
+    install,
+    mcp,
+    report,
+    uninstall,
+    watch,
+)
+help_text = """SQL code graph analyzer.
+QUICK START:
+  1. sqlcg db init
+  2. sqlcg index <path> --dialect snowflake
+  3. sqlcg git install-hooks
+Note: Binary is `sqlcg`; PyPI package is `sql-code-graph`.
+"""
+app = typer.Typer(name="sqlcg", help=help_text)
 # Register subcommand groups
 app.add_typer(db.app, name="db")
@@ -20,6 +42,7 @@ app.command("watch")(watch.watch_cmd)
 app.command("gain")(gain.gain_cmd)
 app.command("report")(report.report_cmd)
 app.command("install")(install.install_cmd)
+app.command("uninstall")(uninstall.uninstall_cmd)
 @app.command()

sqlcg/core/kuzu_backend.py CHANGED

@@ -65,26 +65,28 @@ class KuzuBackend(GraphBackend):
                     raw_statements.append(" ".join(current))
                     current = []
-        # Execute each statement
-        for stmt in raw_statements:
-            if stmt.strip():
-                try:
-                    self._conn.execute(stmt)
-                    logger.debug(f"Executed DDL: {stmt[:50]}...")
-                except Exception as e:
-                    logger.error(f"DDL execution failed: {stmt[:50]}...: {e}")
-                    raise
-        # Upsert the schema version
-        try:
-            self._conn.execute(
-                "MERGE (v:SchemaVersion {version: $v})",
-                {"v": SCHEMA_VERSION},
-            )
-            logger.debug(f"Wrote schema version: {SCHEMA_VERSION}")
-        except Exception as e:
-            logger.error(f"Failed to write schema version: {e}")
-            raise
+        # Execute all DDL statements and schema version in a transaction
+        with self.transaction():
+            # Execute each statement
+            for stmt in raw_statements:
+                if stmt.strip():
+                    try:
+                        self._conn.execute(stmt)
+                        logger.debug(f"Executed DDL: {stmt[:50]}...")
+                    except Exception as e:
+                        logger.error(f"DDL execution failed: {stmt[:50]}...: {e}")
+                        raise
+            # Upsert the schema version
+            try:
+                self._conn.execute(
+                    "MERGE (v:SchemaVersion {version: $v})",
+                    {"v": SCHEMA_VERSION},
+                )
+                logger.debug(f"Wrote schema version: {SCHEMA_VERSION}")
+            except Exception as e:
+                logger.error(f"Failed to write schema version: {e}")
+                raise
     def upsert_node(self, label: str, key: str, properties: dict[str, Any]) -> None:
         """Upsert a node with the given label and properties.

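Note on the `kuzu_backend.py` hunk above: the DDL loop and the schema-version write now run inside `with self.transaction():`. The diff does not show how `transaction()` is implemented, so the following is only a sketch that assumes conventional begin/commit/rollback semantics. `FakeConnection`, `transaction`, and `init_schema` are hypothetical names, and the fake connection records statements instead of calling the real Kuzu API.

```python
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in that records statements instead of running them."""

    def __init__(self, fail_on=None):
        self.log = []
        self.fail_on = fail_on

    def execute(self, stmt):
        if self.fail_on is not None and self.fail_on in stmt:
            raise RuntimeError(f"cannot execute: {stmt}")
        self.log.append(stmt)


@contextmanager
def transaction(conn):
    """Commit if the body succeeds; roll back and re-raise if it fails."""
    conn.execute("BEGIN TRANSACTION")
    try:
        yield conn
    except Exception:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


def init_schema(conn, ddl_statements, schema_version):
    """Run all DDL plus the version write as one unit, as in the hunk above."""
    with transaction(conn):
        for stmt in ddl_statements:
            if stmt.strip():
                conn.execute(stmt)
        conn.execute(f"MERGE (v:SchemaVersion {{version: '{schema_version}'}})")
```

Under these assumptions a failing DDL statement leaves no schema version behind, which is the apparent intent of the change. Whether DDL statements themselves are undone on rollback depends on the backend and is not shown in this diff.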
sqlcg/indexer/indexer.py CHANGED

@@ -1,5 +1,6 @@
 """Main indexer orchestrating parsing and graph persistence."""
+from collections.abc import Callable
 from concurrent.futures import ThreadPoolExecutor
 from concurrent.futures import TimeoutError as FuturesTimeout
 from pathlib import Path
@@ -29,6 +30,7 @@ class Indexer:
         dbt_manifest: Path | None = None,
         timeout_per_file: int = 30,
         use_git: bool = True,
+        progress_callback: Callable[[int, int], None] | None = None,
     ) -> dict:
         """Full two-pass index. Returns summary dict.
@@ -41,9 +43,11 @@ class Indexer:
             use_git: When True (default), use git ls-files to restrict
                 indexing to tracked files; falls back to rglob when git
                 is unavailable or the directory is not a git repository.
+            progress_callback: Optional callback(n, total) invoked every 100 files
         Returns:
-            Dict with keys: files_parsed, parse_errors, tables_found, lineage_edges_created
+            Dict with keys: files_parsed, parse_errors, tables_found,
+            lineage_edges_created, quality
         """
         spec = load_ignore_spec(path)
         schema_resolver = SchemaResolver(dialect=dialect)
@@ -53,9 +57,10 @@ class Indexer:
         files = list(walk_sql_files(path, spec, use_git=use_git))
         pass1_results: list[ParsedFile] = []
         parse_errors = 0
+        total_files = len(files)
         # Pass 1: parse all files
-        for file_path in files:
+        for i, file_path in enumerate(files, 1):
             try:
                 sql = file_path.read_text(encoding="utf-8")
                 parsed = self._index_single_file(parser, file_path, sql, timeout_per_file)
@@ -70,6 +75,10 @@ class Indexer:
                 logger.warning("Failed to parse %s: %s", file_path, exc)
                 parse_errors += 1
+            # Invoke progress callback every 100 files
+            if progress_callback is not None and i % 100 == 0:
+                progress_callback(i, total_files)
         # Optional: load dbt manifest
         if dbt_manifest:
             from sqlcg.indexer.dbt_adapter import load_dbt_manifest
@@ -86,19 +95,28 @@ class Indexer:
                 logger.warning("resolve_pass2 failed for %s: %s", parsed.path, exc)
                 pass2_results.append(parsed)
-        # Upsert all results
+        # Upsert all results and count quality distribution
         tables_found = 0
         lineage_edges = 0
+        quality_counts = {
+            "full": 0,
+            "table_only": 0,
+            "scripting_fallback": 0,
+            "failed": 0,
+        }
         for parsed in pass2_results:
             counts = self._upsert_parsed_file(parsed, db)
             tables_found += counts["tables"]
             lineage_edges += counts["edges"]
+            quality_key = parsed.parse_quality.value.lower()
+            quality_counts[quality_key] += 1
         return {
             "files_parsed": len(pass2_results),
             "parse_errors": parse_errors,
             "tables_found": tables_found,
             "lineage_edges_created": lineage_edges,
+            "quality": quality_counts,
         }
     def reindex_file(self, file_path: str, db: GraphBackend, dialect: str | None) -> None:
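Note on `progress_callback` in the hunks above: the callback receives `(n, total)` and fires only when the 1-based file counter is a multiple of 100. The sketch below isolates that loop. `parse_all` and its `every` parameter are hypothetical names used for illustration, and the parsing step is elided.

```python
from __future__ import annotations

from collections.abc import Callable


def parse_all(
    files: list[str],
    progress_callback: Callable[[int, int], None] | None = None,
    every: int = 100,
) -> int:
    """Visit each file and report progress every `every` files, as in pass 1."""
    total = len(files)
    for i, _file in enumerate(files, 1):
        # Parsing of the file would happen here.
        if progress_callback is not None and i % every == 0:
            progress_callback(i, total)
    return total


seen: list[tuple[int, int]] = []
parse_all([f"q{n}.sql" for n in range(250)], lambda n, total: seen.append((n, total)))
print(seen)  # [(100, 250), (200, 250)]
```

With 250 files the callback fires at 100 and 200 and never reports 250 of 250, so a caller that wants a final "done" tick has to emit it after the index call returns.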

sqlcg/parsers/ansi_parser.py CHANGED

@@ -7,7 +7,7 @@ import sqlglot
 import sqlglot.expressions as exp
 from sqlcg.lineage.schema_resolver import SchemaResolver
-from sqlcg.parsers.base import ParsedFile, QueryNode, SqlParser, TableRef
+from sqlcg.parsers.base import ParsedFile, ParseQuality, QueryNode, SqlParser, TableRef
 from sqlcg.parsers.registry import register
 from sqlcg.utils.logging import getLogger
@@ -50,8 +50,15 @@ class AnsiParser(SqlParser):
         except Exception as exc:
             logger.warning("Failed to parse file %s: %s", path, exc)
             out.errors.append(f"parse_error:{exc}")
+            out.parse_quality = ParseQuality.FAILED
             return out
+        # Check for scripting fallback
+        for stmt in statements:
+            if stmt is not None and isinstance(stmt, exp.Command):
+                out.parse_quality = ParseQuality.SCRIPTING_FALLBACK
+                break
         # Process each statement
         for stmt_index, stmt in enumerate(statements):
             if stmt is None:
@@ -68,6 +75,10 @@ class AnsiParser(SqlParser):
                 out.referenced_tables.extend(query_node.sources)
+                # Upgrade to FULL if column lineage exists
+                if query_node.column_lineage:
+                    out.parse_quality = ParseQuality.FULL
             except Exception as exc:
                 logger.warning("Failed to process statement %d in %s: %s", stmt_index, path, exc)
                 out.errors.append(f"statement_error:{stmt_index}:{exc}")
@@ -119,6 +130,12 @@ class AnsiParser(SqlParser):
             sources = self._fallback_table_scan(stmt)
             parse_failed = True
+        # Remove target from sources if present (CREATE/INSERT shouldn't select from target)
+        if target:
+            sources = [
+                src for src in sources if src.full_id != target.full_id
+            ]
         # Extract column lineage (currently minimal implementation)
         column_lineage = []
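Note on parse quality: taken together, the hunks in this release set `parse_quality` to `FAILED` on a parse exception, flag `SCRIPTING_FALLBACK` when any statement is an `exp.Command`, and later upgrade to `FULL` when a statement yields column lineage. The dataclass default is `TABLE_ONLY`. The sketch below condenses that order into one hypothetical function, `classify`, and adds a hypothetical `tally` that mirrors the lower-cased keys the indexer returns under `"quality"`. The enum is a local copy that uses `str, Enum` instead of the source's `StrEnum` so it also runs on older Python versions.

```python
from __future__ import annotations

from enum import Enum


class ParseQuality(str, Enum):
    """Local copy of the enum from sqlcg/parsers/base.py (StrEnum in the source)."""

    FULL = "FULL"
    TABLE_ONLY = "TABLE_ONLY"
    SCRIPTING_FALLBACK = "SCRIPTING_FALLBACK"
    FAILED = "FAILED"


def classify(parse_failed: bool, has_command: bool, has_column_lineage: bool) -> ParseQuality:
    """Hypothetical condensation of the AnsiParser checks, in source order."""
    if parse_failed:
        return ParseQuality.FAILED
    quality = ParseQuality.TABLE_ONLY  # dataclass default
    if has_command:
        quality = ParseQuality.SCRIPTING_FALLBACK
    if has_column_lineage:
        # This check runs later in the statement loop, so it overrides the fallback flag.
        quality = ParseQuality.FULL
    return quality


def tally(qualities: list[ParseQuality]) -> dict[str, int]:
    """Count files per quality using lower-cased enum values as keys."""
    counts = {q.value.lower(): 0 for q in ParseQuality}
    for q in qualities:
        counts[q.value.lower()] += 1
    return counts
```

Because the upgrade to `FULL` happens after the scripting check, a file that contains both a `Command` statement and a statement with column lineage ends up as `FULL` in this reading of the diff.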

sqlcg/parsers/base.py CHANGED

@@ -24,6 +24,15 @@ class QueryKind(StrEnum):
     OTHER = "OTHER"
+class ParseQuality(StrEnum):
+    """File-level parse quality assessment."""
+    FULL = "FULL"
+    TABLE_ONLY = "TABLE_ONLY"
+    SCRIPTING_FALLBACK = "SCRIPTING_FALLBACK"
+    FAILED = "FAILED"
 @dataclass(frozen=True)
 class TableRef:
     """A reference to a table (immutable).
@@ -162,6 +171,7 @@ class ParsedFile:
         defined_tables: List of TableRef for tables defined in this file
         referenced_tables: List of TableRef for tables referenced in this file
         errors: List of error messages encountered during parsing
+        parse_quality: File-level quality assessment
     """
     path: Path
@@ -170,6 +180,7 @@ class ParsedFile:
     defined_tables: list[TableRef] = field(default_factory=list)
     referenced_tables: list[TableRef] = field(default_factory=list)
     errors: list[str] = field(default_factory=list)
+    parse_quality: ParseQuality = ParseQuality.TABLE_ONLY
     @property
     def path_str(self) -> str:
@@ -384,7 +395,12 @@ class SqlParser(ABC):
                         if root:
                             # Successfully extracted lineage
                             # TODO: convert root to LineageEdge(s)
-                            pass
+                            self._log.debug(
+                                "sg_lineage root obtained but conversion not yet "
+                                "implemented: file=%s col=%s",
+                                path,
+                                col_name,
+                            )
                     except Exception as exc:
                         self._log.warning(
                             "column lineage extraction failed: file=%s col=%s error=%s",

sqlcg/parsers/bigquery_parser.py CHANGED

@@ -4,7 +4,7 @@ from pathlib import Path
 from sqlcg.lineage.schema_resolver import SchemaResolver
 from sqlcg.parsers.ansi_parser import AnsiParser
-from sqlcg.parsers.base import ParsedFile
+from sqlcg.parsers.base import ParsedFile, ParseQuality
 from sqlcg.parsers.registry import register
 from sqlcg.utils.logging import getLogger
@@ -43,8 +43,8 @@ class BigQueryParser(AnsiParser):
         # Check for scripting blocks
         if self._has_scripting_block(sql):
             logger.info("BigQuery scripting block detected in %s, marking as parse_failed", path)
-            # Scripting blocks are not fully parseable; mark as parse_failed
             out = ParsedFile(path=path, dialect=self.DIALECT)
+            out.parse_quality = ParseQuality.SCRIPTING_FALLBACK
             out.errors.append("parse_mode:scripting_block")
             return out

sqlcg/parsers/snowflake_parser.py CHANGED

@@ -8,7 +8,7 @@ import sqlglot
 from sqlcg.lineage.schema_resolver import SchemaResolver
 from sqlcg.parsers.ansi_parser import AnsiParser
-from sqlcg.parsers.base import ParsedFile
+from sqlcg.parsers.base import ParsedFile, ParseQuality
 from sqlcg.parsers.registry import register
 from sqlcg.utils.logging import getLogger
@@ -21,7 +21,7 @@ _SCRIPTING_BLOCK = re.compile(r"\bBEGIN\b", re.IGNORECASE)
 # Regex for extracting DML statements from scripting blocks.
 # Does not handle ';' inside string literals — tokenizer-based extraction deferred to v2.
 _EMBEDDED_DML = re.compile(
-    r"(SELECT\s+.+?(?=;|\Z)|INSERT\s+INTO.+?(?=;|\Z)|UPDATE\s+.+?(?=;|\Z)|DELETE\s+.+?(?=;|\Z))",
+    r"(SELECT\s+.+?(?=;|\Z)|INSERT\s+INTO.+?(?=;|\Z)|UPDATE\s+.+?(?=;|\Z)|DELETE\s+.+?(?=;|\Z)|MERGE\s+INTO.+?(?=;|\Z))",
     re.DOTALL | re.IGNORECASE | re.MULTILINE,
 )
@@ -95,6 +95,7 @@ class SnowflakeParser(AnsiParser):
             ParsedFile with extracted DML statements
         """
         out = ParsedFile(path=path, dialect=self.DIALECT)
+        out.parse_quality = ParseQuality.SCRIPTING_FALLBACK
         out.errors.append("parse_mode:scripting_block")
         # Extract DML statements using regex
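Note on the `_EMBEDDED_DML` change above: adding the `MERGE\s+INTO` alternative changes what is extracted from a scripting block that contains a `MERGE`. The sketch below runs both patterns, copied verbatim from the diff, over a made-up block. The names `OLD_DML`, `NEW_DML`, and `BLOCK` are hypothetical, and so is the SQL text.

```python
import re

FLAGS = re.DOTALL | re.IGNORECASE | re.MULTILINE

# Pattern as shipped in 0.2.1 (no MERGE alternative).
OLD_DML = re.compile(
    r"(SELECT\s+.+?(?=;|\Z)|INSERT\s+INTO.+?(?=;|\Z)|UPDATE\s+.+?(?=;|\Z)|DELETE\s+.+?(?=;|\Z))",
    FLAGS,
)
# Pattern as shipped in 0.3.0.
NEW_DML = re.compile(
    r"(SELECT\s+.+?(?=;|\Z)|INSERT\s+INTO.+?(?=;|\Z)|UPDATE\s+.+?(?=;|\Z)|DELETE\s+.+?(?=;|\Z)|MERGE\s+INTO.+?(?=;|\Z))",
    FLAGS,
)

# Hypothetical scripting block, made up for this example.
BLOCK = """BEGIN
  MERGE INTO tgt USING src ON tgt.id = src.id WHEN MATCHED THEN UPDATE SET tgt.v = src.v;
  DELETE FROM staging WHERE loaded;
END;"""

old_hits = OLD_DML.findall(BLOCK)
new_hits = NEW_DML.findall(BLOCK)
print(old_hits[0])  # UPDATE SET tgt.v = src.v
print(new_hits[0])  # MERGE INTO tgt USING src ON tgt.id = src.id WHEN MATCHED THEN UPDATE SET tgt.v = src.v
```

The old pattern only picks up the `UPDATE SET ...` fragment inside the `MERGE`, while the new one captures the whole statement because the scan reaches `MERGE INTO` first. The patterns have no word boundaries, and the comment in the source already notes that `;` inside string literals is not handled.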

sqlcg/server/models.py CHANGED

@@ -19,6 +19,11 @@ class LineageResult(BaseModel):
     lineage: list[LineageNode] = Field(
         default_factory=list, description="List of nodes in the lineage"
     )
+    hint: str | None = Field(
+        None,
+        description="Diagnostic hint when result list is empty. Explains the likely cause "
+        "and suggests a next step.",
+    )
 class TableUsage(BaseModel):
@@ -34,6 +39,11 @@ class TableUsageResult(BaseModel):
     table: str = Field(..., description="Table name")
     usages: list[TableUsage] = Field(default_factory=list, description="List of usages")
+    hint: str | None = Field(
+        None,
+        description="Diagnostic hint when result list is empty. Explains the likely cause "
+        "and suggests a next step.",
+    )
 class DependencyNode(BaseModel):
@@ -48,6 +58,11 @@ class DependencyResult(BaseModel):
     root: str = Field(..., description="Root column or table")
     nodes: list[DependencyNode] = Field(default_factory=list, description="List of dependent nodes")
+    hint: str | None = Field(
+        None,
+        description="Diagnostic hint when result list is empty. Explains the likely cause "
+        "and suggests a next step.",
+    )
 class SqlPatternMatch(BaseModel):
@@ -65,6 +80,11 @@ class SqlPatternResult(BaseModel):
     matches: list[SqlPatternMatch] = Field(
         default_factory=list, description="List of matching queries"
     )
+    hint: str | None = Field(
+        None,
+        description="Diagnostic hint when result list is empty. Explains the likely cause "
+        "and suggests a next step.",
+    )
 class DialectRepo(BaseModel):
@@ -81,3 +101,27 @@ class DialectRepoResult(BaseModel):
     repos: list[DialectRepo] = Field(
         default_factory=list, description="List of indexed repositories"
     )
+class DbInfoResult(BaseModel):
+    """Result of db_info tool — graph health and parse quality diagnostics."""
+    schema_version: str = Field(..., description="Graph schema version")
+    node_counts: dict[str, int] = Field(
+        default_factory=dict,
+        description="Node counts per label (Repo, SqlTable, SqlQuery, SqlColumn, SqlFile)",
+    )
+    column_lineage_edges: int = Field(
+        0, description="Number of COLUMN_LINEAGE edges in the graph"
+    )
+    parse_quality: dict[str, int] = Field(
+        default_factory=dict,
+        description=(
+            "Query count by parsing_mode: 'sqlglot' = standard path, "
+            "'scripting_block' = tokenizer fallback (column lineage limited)"
+        ),
+    )
+    warnings: list[str] = Field(
+        default_factory=list,
+        description="Health warnings. Empty means the graph is in a healthy state.",
+    )
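Note on the `hint` fields added above: each result model gains an optional `hint` that is described as a diagnostic for empty result lists. The sketch below shows that pattern with a standard-library dataclass standing in for the pydantic model, so it runs without third-party packages. `build_result` is a hypothetical helper, and the hint text is shortened from the message in `sqlcg/server/tools.py`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TableUsageResult:
    """Dataclass stand-in for the pydantic model of the same name."""

    table: str
    usages: list[str] = field(default_factory=list)
    hint: str | None = None


def build_result(table: str, usages: list[str]) -> TableUsageResult:
    """Attach a hint only when the result list is empty."""
    hint = None
    if not usages:
        hint = "No usages found for this table."  # shortened from the real message
    return TableUsageResult(table=table, usages=usages, hint=hint)
```

Keeping `hint` as `None` for non-empty results means existing consumers that ignore the field see the same payload shape as before, plus one nullable key.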

sqlcg/server/tools.py CHANGED

@@ -17,10 +17,12 @@ from sqlcg.core.queries import (
     SEARCH_SQL_PATTERN_QUERY,
     TRACE_COLUMN_LINEAGE_QUERY,
 )
+from sqlcg.core.schema import NodeLabel
 from sqlcg.indexer.indexer import Indexer
 from sqlcg.metrics.store import MetricsStore
 from sqlcg.server.exceptions import InvalidColumnRefError, NotIndexedError
 from sqlcg.server.models import (
+    DbInfoResult,
     DependencyNode,
     DependencyResult,
     DialectRepo,
@@ -111,7 +113,9 @@ def _assert_indexed(db: GraphBackend) -> None:
     """
     rows = db.run_read("MATCH (r:Repo) RETURN count(r) AS n", {})
     if not rows or rows[0]["n"] == 0:
-        raise NotIndexedError("No repos have been indexed. Run `sqlcg index <path>` first.")
+        raise NotIndexedError(
+            "No repos indexed. Run 'sqlcg db init' then 'sqlcg index <path>' first."
+        )
 def _parse_column_ref(col_ref: str) -> tuple[str, str]:
@@ -190,6 +194,8 @@ def index_repo(repo_path: str, dialect: str = "ansi") -> dict:
     automatically. Falls back to a full directory scan when git is
     unavailable.
+    Binary is `sqlcg`; PyPI package is `sql-code-graph`.
     Args:
         repo_path: Root directory path to index
         dialect: SQL dialect (ansi, snowflake, bigquery, postgres, tsql)
@@ -326,7 +332,16 @@ def trace_column_lineage(table_col: str, max_depth: int = 5) -> LineageResult:
                 )
                 queue.append((node_id, depth + 1))
-    return LineageResult(column=table_col, lineage=lineage)
+    # Populate hint if result is empty
+    hint = None
+    if not lineage:
+        hint = (
+            "No lineage found. Check that 'sqlcg db info' shows SqlColumn > 0. "
+            "If SqlColumn is 0, column lineage was not extracted — check parse errors. "
+            "Submit feedback with submit_feedback tool if this was a false negative."
+        )
+    return LineageResult(column=table_col, lineage=lineage, hint=hint)
 @mcp.tool()
@@ -363,7 +378,16 @@ def find_table_usages(table_name: str) -> TableUsageResult:
             )
         )
-    return TableUsageResult(table=table_name, usages=usages)
+    # Populate hint if result is empty
+    hint = None
+    if not usages:
+        hint = (
+            "No usages found for this table. The table may not be referenced by any "
+            "indexed SQL file, or it may be consumed externally (BI tools, APIs). "
+            "Run 'analyze impact <table>' from the CLI to cross-check."
+        )
+    return TableUsageResult(table=table_name, usages=usages, hint=hint)
 @mcp.tool()
@@ -425,7 +449,16 @@ def get_downstream_dependencies(table_col: str, max_depth: int = 5) -> Dependenc
                 )
                 queue.append((node_id, depth + 1))
-    return DependencyResult(root=table_col, nodes=nodes)
+    # Populate hint if result is empty
+    hint = None
+    if not nodes:
+        hint = (
+            "No lineage found. Check that 'sqlcg db info' shows SqlColumn > 0. "
+            "If SqlColumn is 0, column lineage was not extracted — check parse errors. "
+            "Submit feedback with submit_feedback tool if this was a false negative."
+        )
+    return DependencyResult(root=table_col, nodes=nodes, hint=hint)
 @mcp.tool()
@@ -487,7 +520,16 @@ def get_upstream_dependencies(table_col: str, max_depth: int = 5) -> DependencyR
                 )
                 queue.append((node_id, depth + 1))
-    return DependencyResult(root=table_col, nodes=nodes)
+    # Populate hint if result is empty
+    hint = None
+    if not nodes:
+        hint = (
+            "No lineage found. Check that 'sqlcg db info' shows SqlColumn > 0. "
+            "If SqlColumn is 0, column lineage was not extracted — check parse errors. "
+            "Submit feedback with submit_feedback tool if this was a false negative."
+        )
+    return DependencyResult(root=table_col, nodes=nodes, hint=hint)
 @mcp.tool()
@@ -525,7 +567,15 @@ def search_sql_pattern(query: str, limit: int = 20) -> SqlPatternResult:
             )
         )
-    return SqlPatternResult(pattern=query, matches=matches)
+    # Populate hint if result is empty
+    hint = None
+    if not matches:
+        hint = (
+            "No matches found. Try a shorter or partial pattern. "
+            "Pattern matching is case-sensitive substring search."
+        )
+    return SqlPatternResult(pattern=query, matches=matches, hint=hint)
 @mcp.tool()
@@ -533,6 +583,11 @@ def search_sql_pattern(query: str, limit: int = 20) -> SqlPatternResult:
 def list_dialects_and_repos() -> DialectRepoResult:
     """List all indexed repositories and their SQL dialects.
+    Binary is `sqlcg`; PyPI package is `sql-code-graph`.
+    Returns the catalogue of what has been indexed. For health and parse quality
+    information use `db_info()` instead.
     Returns:
         DialectRepoResult with list of repositories and their dialects
@@ -542,10 +597,7 @@ def list_dialects_and_repos() -> DialectRepoResult:
     db = _get_backend()
     _assert_indexed(db)
-    rows = db.run_read(
-        LIST_DIALECTS_AND_REPOS_QUERY,
-        {},
-    )
+    rows = db.run_read(LIST_DIALECTS_AND_REPOS_QUERY, {})
     repos: list[DialectRepo] = []
     for row in rows:
@@ -560,6 +612,84 @@ def list_dialects_and_repos() -> DialectRepoResult:
     return DialectRepoResult(repos=repos)
+@_timed_tool("db_info")
+def db_info() -> DbInfoResult:
+    """Return graph health and parse quality diagnostics.
+    Use this tool to understand the current state of the indexed graph before
+    running lineage queries. Key signals:
+    - `node_counts["SqlColumn"] == 0` → column lineage was not extracted;
+      trace_column_lineage and dependency tools will return empty results.
+    - `parse_quality["scripting_block"]` high → Snowflake/BigQuery scripting
+      blocks were parsed via tokenizer fallback; column lineage limited for
+      those files. Table-level lineage is still available.
+    - `warnings` list — empty means the graph is healthy.
+    Parse quality legend (parsing_mode per SqlQuery node):
+      sqlglot          — standard path; column lineage available if extracted
+      scripting_block  — tokenizer fallback; column lineage unavailable
+    Returns:
+        DbInfoResult with schema version, node counts, parse quality, and warnings
+    """
+    db = _get_backend()
+    schema_version = db.get_schema_version() or "unknown"
+    node_counts: dict[str, int] = {}
+    for label in NodeLabel:
+        result = db.run_read(f"MATCH (n:{label}) RETURN COUNT(*) AS count", {})
+        node_counts[str(label)] = result[0]["count"] if result else 0
+    edges_result = db.run_read(
+        "MATCH ()-[r:COLUMN_LINEAGE]->() RETURN COUNT(r) AS count", {}
+    )
+    column_lineage_edges = edges_result[0]["count"] if edges_result else 0
+    mode_rows = db.run_read(
+        "MATCH (q:SqlQuery) RETURN q.parsing_mode AS mode,"
+        " COUNT(q) AS cnt ORDER BY cnt DESC",
+        {},
+    )
+    parse_quality: dict[str, int] = {}
+    if mode_rows and "mode" in mode_rows[0]:
+        parse_quality = {str(r["mode"]): int(r["cnt"]) for r in mode_rows}
+    warnings: list[str] = []
+    if node_counts.get("Repo", 0) == 0:
+        warnings.append(
+            "Database is empty. Run 'sqlcg db init' then 'sqlcg index <path>'."
+        )
+    elif node_counts.get("SqlQuery", 0) == 0:
+        warnings.append(
+            "No queries indexed. Run 'sqlcg index <path>' to populate the graph."
+        )
+    elif node_counts.get("SqlColumn", 0) == 0:
+        warnings.append(
+            "SqlColumn count is 0 — column lineage was not extracted. "
+            "trace_column_lineage and dependency tools will return empty results."
+        )
+    total_queries = sum(parse_quality.values())
+    scripting_count = parse_quality.get("scripting_block", 0)
+    if total_queries > 0 and scripting_count > 0:
+        pct = round(100 * scripting_count / total_queries)
+        if pct > 20:
+            warnings.append(
+                f"{pct}% of queries used scripting-block fallback — "
+                "column lineage may be incomplete for those files."
+            )
+    return DbInfoResult(
+        schema_version=schema_version,
+        node_counts=node_counts,
+        column_lineage_edges=column_lineage_edges,
+        parse_quality=parse_quality,
+        warnings=warnings,
+    )
 @mcp.tool()
 @_timed_tool("execute_cypher")
 def execute_cypher(query: str) -> list[dict]:
@@ -632,25 +762,28 @@ def submit_feedback(
     **For Claude**: When a user says "that result was wrong" or "this is a
     false positive", call this tool with label="FP". When they confirm
-    "that's correct", call with label="TP". Use the query or pattern as
-    the 'query' argument and include any user feedback in the 'note'.
+    "that's correct", call with label="TP". When a tool should have
+    returned a result but got empty, call with label="FN" (false negative).
+    Use the query or pattern as the 'query' argument and include any user
+    feedback in the 'note'.
     Args:
         tool_name: Name of the tool being evaluated (e.g., "trace_column_lineage")
         query: The query or pattern that was evaluated
-        label: Feedback label: "TP" (true positive) or "FP" (false positive)
+        label: Feedback label: "TP" (true positive), "FP" (false positive), or
+               "FN" (false negative — expected a result but got empty)
         note: Optional user note (truncated to 500 chars)
     Returns:
         Dict with status: "recorded" or "skipped"
     Raises:
-        ValueError: If label is not "TP" or "FP"
+        ValueError: If label is not "TP", "FP", or "FN"
     """
     global _metrics
-    if label not in ("TP", "FP"):
-        raise ValueError(f"Invalid label: {label}. Must be 'TP' or 'FP'.")
+    if label not in ("TP", "FP", "FN"):
+        raise ValueError(f"Invalid label: {label}. Must be 'TP', 'FP', or 'FN'.")
     if _metrics is not None:
         try:
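Note on the warning rules in `db_info()` above: the node-count checks form an `if`/`elif` chain, so at most one of them fires, and the scripting-block warning uses a strict `> 20` comparison on a rounded percentage. The sketch below reproduces those rules on plain dicts. `health_warnings` and `threshold_pct` are hypothetical names, and the messages are shortened.

```python
def health_warnings(node_counts: dict, parse_quality: dict, threshold_pct: int = 20) -> list:
    """Reproduce the warning rules of db_info() on plain dicts."""
    warnings = []
    # Only the first matching node-count condition produces a warning.
    if node_counts.get("Repo", 0) == 0:
        warnings.append("Database is empty.")
    elif node_counts.get("SqlQuery", 0) == 0:
        warnings.append("No queries indexed.")
    elif node_counts.get("SqlColumn", 0) == 0:
        warnings.append("SqlColumn count is 0.")

    total = sum(parse_quality.values())
    scripting = parse_quality.get("scripting_block", 0)
    if total > 0 and scripting > 0:
        pct = round(100 * scripting / total)
        # Strict comparison: exactly 20% does not trigger the warning.
        if pct > threshold_pct:
            warnings.append(f"{pct}% of queries used scripting-block fallback")
    return warnings
```

For example, 2 scripting-block queries out of 10 produce no warning, while 3 out of 10 do.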

{sql_code_graph-0.2.1.dist-info → sql_code_graph-0.3.0.dist-info}/WHEEL RENAMED

File without changes

{sql_code_graph-0.2.1.dist-info → sql_code_graph-0.3.0.dist-info}/entry_points.txt RENAMED

File without changes
