InfoTracker 0.1.0__tar.gz → 0.2.0__tar.gz

Files changed (120)
  1. infotracker-0.2.0/MANIFEST.in +2 -0
  2. infotracker-0.2.0/PKG-INFO +285 -0
  3. infotracker-0.2.0/README.md +254 -0
  4. infotracker-0.2.0/docs/advanced_use_cases.md +294 -0
  5. {infotracker-0.1.0 → infotracker-0.2.0}/docs/example_dataset.md +6 -0
  6. {infotracker-0.1.0 → infotracker-0.2.0}/docs/lineage_concepts.md +104 -0
  7. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/01_customers.json +1 -1
  8. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/02_orders.json +3 -3
  9. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/03_products.json +4 -4
  10. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/04_order_items.json +3 -2
  11. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/10_stg_orders.json +7 -7
  12. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/11_stg_order_items.json +10 -11
  13. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/12_stg_customers.json +13 -13
  14. infotracker-0.2.0/examples/warehouse/lineage/94_fn_customer_orders_tvf.json +137 -0
  15. infotracker-0.2.0/examples/warehouse/lineage/95_usp_customer_metrics_dataset.json +96 -0
  16. infotracker-0.2.0/examples/warehouse/lineage/96_demo_usage_tvf_and_proc.json +98 -0
  17. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/sql/01_customers.sql +1 -1
  18. infotracker-0.2.0/examples/warehouse/sql/02_orders.sql +6 -0
  19. infotracker-0.2.0/examples/warehouse/sql/03_products.sql +6 -0
  20. infotracker-0.2.0/examples/warehouse/sql/04_order_items.sql +8 -0
  21. infotracker-0.2.0/examples/warehouse/sql/10_stg_orders.sql +7 -0
  22. infotracker-0.2.0/examples/warehouse/sql/11_stg_order_items.sql +9 -0
  23. infotracker-0.2.0/examples/warehouse/sql/12_stg_customers.sql +10 -0
  24. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/sql/20_vw_recent_orders.sql +2 -2
  25. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/sql/30_dim_customer.sql +2 -2
  26. infotracker-0.2.0/examples/warehouse/sql/31_dim_product.sql +7 -0
  27. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/sql/40_fct_sales.sql +3 -3
  28. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/sql/41_agg_sales_by_day.sql +3 -3
  29. infotracker-0.2.0/examples/warehouse/sql/50_vw_orders_all.sql +3 -0
  30. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/sql/51_vw_orders_all_enriched.sql +2 -2
  31. infotracker-0.2.0/examples/warehouse/sql/52_vw_order_details_star.sql +9 -0
  32. infotracker-0.2.0/examples/warehouse/sql/53_vw_products_all.sql +3 -0
  33. infotracker-0.2.0/examples/warehouse/sql/54_vw_recent_orders_star_cte.sql +7 -0
  34. infotracker-0.2.0/examples/warehouse/sql/55_vw_orders_shipped_or_delivered.sql +4 -0
  35. infotracker-0.2.0/examples/warehouse/sql/56_vw_orders_union_star.sql +4 -0
  36. infotracker-0.2.0/examples/warehouse/sql/60_vw_customer_order_analysis.sql +12 -0
  37. infotracker-0.2.0/examples/warehouse/sql/60_vw_customer_order_ranking.sql +11 -0
  38. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/sql/61_vw_sales_analytics.sql +2 -2
  39. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/sql/90_usp_refresh_sales_with_temp.sql +7 -7
  40. infotracker-0.2.0/examples/warehouse/sql/91_usp_snapshot_recent_orders_star.sql +23 -0
  41. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/sql/92_usp_rebuild_recent_sales_with_vars.sql +9 -9
  42. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/sql/93_usp_top_products_since_var.sql +10 -10
  43. infotracker-0.2.0/examples/warehouse/sql/94_fn_customer_orders_tvf.sql +78 -0
  44. infotracker-0.2.0/examples/warehouse/sql/95_usp_customer_metrics_dataset.sql +71 -0
  45. infotracker-0.2.0/examples/warehouse/sql/96_demo_usage_tvf_and_proc.sql +91 -0
  46. {infotracker-0.1.0 → infotracker-0.2.0}/pyproject.toml +3 -5
  47. {infotracker-0.1.0 → infotracker-0.2.0}/src/infotracker/adapters.py +14 -7
  48. {infotracker-0.1.0 → infotracker-0.2.0}/src/infotracker/cli.py +46 -30
  49. {infotracker-0.1.0 → infotracker-0.2.0}/src/infotracker/config.py +6 -0
  50. {infotracker-0.1.0 → infotracker-0.2.0}/src/infotracker/diff.py +208 -47
  51. infotracker-0.2.0/src/infotracker/engine.py +555 -0
  52. {infotracker-0.1.0 → infotracker-0.2.0}/src/infotracker/lineage.py +6 -3
  53. {infotracker-0.1.0 → infotracker-0.2.0}/src/infotracker/models.py +106 -15
  54. infotracker-0.2.0/src/infotracker/openlineage_utils.py +165 -0
  55. infotracker-0.2.0/src/infotracker/parser.py +1579 -0
  56. {infotracker-0.1.0 → infotracker-0.2.0}/tests/test_expected_outputs.py +2 -3
  57. infotracker-0.2.0/tests/test_impact_output.py +89 -0
  58. infotracker-0.2.0/tests/test_insert_exec.py +75 -0
  59. {infotracker-0.1.0 → infotracker-0.2.0}/tests/test_integration.py +2 -2
  60. {infotracker-0.1.0 → infotracker-0.2.0}/tests/test_parser.py +5 -5
  61. infotracker-0.2.0/tests/test_preprocessing.py +157 -0
  62. infotracker-0.2.0/tests/test_wildcard.py +137 -0
  63. infotracker-0.1.0/PKG-INFO +0 -108
  64. infotracker-0.1.0/README.md +0 -79
  65. infotracker-0.1.0/examples/warehouse/sql/02_orders.sql +0 -6
  66. infotracker-0.1.0/examples/warehouse/sql/03_products.sql +0 -6
  67. infotracker-0.1.0/examples/warehouse/sql/04_order_items.sql +0 -7
  68. infotracker-0.1.0/examples/warehouse/sql/10_stg_orders.sql +0 -7
  69. infotracker-0.1.0/examples/warehouse/sql/11_stg_order_items.sql +0 -9
  70. infotracker-0.1.0/examples/warehouse/sql/12_stg_customers.sql +0 -8
  71. infotracker-0.1.0/examples/warehouse/sql/31_dim_product.sql +0 -7
  72. infotracker-0.1.0/examples/warehouse/sql/50_vw_orders_all.sql +0 -3
  73. infotracker-0.1.0/examples/warehouse/sql/52_vw_order_details_star.sql +0 -9
  74. infotracker-0.1.0/examples/warehouse/sql/53_vw_products_all.sql +0 -3
  75. infotracker-0.1.0/examples/warehouse/sql/54_vw_recent_orders_star_cte.sql +0 -7
  76. infotracker-0.1.0/examples/warehouse/sql/55_vw_orders_shipped_or_delivered.sql +0 -4
  77. infotracker-0.1.0/examples/warehouse/sql/56_vw_orders_union_star.sql +0 -4
  78. infotracker-0.1.0/examples/warehouse/sql/60_vw_customer_order_analysis.sql +0 -12
  79. infotracker-0.1.0/examples/warehouse/sql/60_vw_customer_order_ranking.sql +0 -11
  80. infotracker-0.1.0/examples/warehouse/sql/91_usp_snapshot_recent_orders_star.sql +0 -23
  81. infotracker-0.1.0/src/infotracker/engine.py +0 -340
  82. infotracker-0.1.0/src/infotracker/parser.py +0 -807
  83. {infotracker-0.1.0 → infotracker-0.2.0}/.gitignore +0 -0
  84. {infotracker-0.1.0 → infotracker-0.2.0}/ProjectDescription.md +0 -0
  85. {infotracker-0.1.0 → infotracker-0.2.0}/docs/adapters.md +0 -0
  86. {infotracker-0.1.0 → infotracker-0.2.0}/docs/agentic_workflow.md +0 -0
  87. {infotracker-0.1.0 → infotracker-0.2.0}/docs/algorithm.md +0 -0
  88. {infotracker-0.1.0 → infotracker-0.2.0}/docs/architecture.md +0 -0
  89. {infotracker-0.1.0 → infotracker-0.2.0}/docs/breaking_changes.md +0 -0
  90. {infotracker-0.1.0 → infotracker-0.2.0}/docs/cli_usage.md +0 -0
  91. {infotracker-0.1.0 → infotracker-0.2.0}/docs/configuration.md +0 -0
  92. {infotracker-0.1.0 → infotracker-0.2.0}/docs/dbt_integration.md +0 -0
  93. {infotracker-0.1.0 → infotracker-0.2.0}/docs/edge_cases.md +0 -0
  94. {infotracker-0.1.0 → infotracker-0.2.0}/docs/faq.md +0 -0
  95. {infotracker-0.1.0 → infotracker-0.2.0}/docs/openlineage_mapping.md +0 -0
  96. {infotracker-0.1.0 → infotracker-0.2.0}/docs/overview.md +0 -0
  97. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/20_vw_recent_orders.json +0 -0
  98. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/30_dim_customer.json +0 -0
  99. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/31_dim_product.json +0 -0
  100. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/40_fct_sales.json +0 -0
  101. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/41_agg_sales_by_day.json +0 -0
  102. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/50_vw_orders_all.json +0 -0
  103. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/51_vw_orders_all_enriched.json +0 -0
  104. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/52_vw_order_details_star.json +0 -0
  105. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/53_vw_products_all.json +0 -0
  106. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/54_vw_recent_orders_star_cte.json +0 -0
  107. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/55_vw_orders_shipped_or_delivered.json +0 -0
  108. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/56_vw_orders_union_star.json +0 -0
  109. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/60_vw_customer_order_analysis.json +0 -0
  110. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/90_usp_refresh_sales_with_temp.json +0 -0
  111. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/91_usp_snapshot_recent_orders_star.json +0 -0
  112. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/92_usp_rebuild_recent_sales_with_vars.json +0 -0
  113. {infotracker-0.1.0 → infotracker-0.2.0}/examples/warehouse/lineage/93_usp_top_products_since_var.json +0 -0
  114. {infotracker-0.1.0 → infotracker-0.2.0}/infotracker.yml +0 -0
  115. {infotracker-0.1.0 → infotracker-0.2.0}/requirements.txt +0 -0
  116. {infotracker-0.1.0 → infotracker-0.2.0}/src/infotracker/__init__.py +0 -0
  117. {infotracker-0.1.0 → infotracker-0.2.0}/src/infotracker/__main__.py +0 -0
  118. {infotracker-0.1.0 → infotracker-0.2.0}/tests/__init__.py +0 -0
  119. {infotracker-0.1.0 → infotracker-0.2.0}/tests/conftest.py +0 -0
  120. {infotracker-0.1.0 → infotracker-0.2.0}/tests/test_adapter.py +0 -0
@@ -0,0 +1,2 @@
1
+ exclude tests/*
2
+ exclude tests/**/*
@@ -0,0 +1,285 @@
1
+ Metadata-Version: 2.4
2
+ Name: InfoTracker
3
+ Version: 0.2.0
4
+ Summary: Column-level SQL lineage, impact analysis, and breaking-change detection (MS SQL first)
5
+ Project-URL: homepage, https://example.com/infotracker
6
+ Project-URL: documentation, https://example.com/infotracker/docs
7
+ Author: InfoTracker Authors
8
+ License: MIT
9
+ Keywords: data-lineage,impact-analysis,lineage,mssql,openlineage,sql
10
+ Classifier: Environment :: Console
11
+ Classifier: License :: OSI Approved :: MIT License
12
+ Classifier: Operating System :: OS Independent
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Programming Language :: Python :: 3.10
15
+ Classifier: Topic :: Database
16
+ Classifier: Topic :: Software Development :: Libraries
17
+ Requires-Python: >=3.10
18
+ Requires-Dist: click
19
+ Requires-Dist: networkx>=3.3
20
+ Requires-Dist: packaging>=24.0
21
+ Requires-Dist: pydantic>=2.8.2
22
+ Requires-Dist: pyyaml>=6.0.1
23
+ Requires-Dist: rich
24
+ Requires-Dist: shellingham
25
+ Requires-Dist: sqlglot>=23.0.0
26
+ Requires-Dist: typer
27
+ Provides-Extra: dev
28
+ Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
29
+ Requires-Dist: pytest>=7.4.0; extra == 'dev'
30
+ Description-Content-Type: text/markdown
31
+
32
+ # InfoTracker
33
+
34
+ Column-level SQL lineage extraction and impact analysis for MS SQL Server
35
+
36
+ ## Features
37
+
38
+ - **Column-level lineage** - Track data flow at the column level
39
+ - **Parse SQL files** and generate OpenLineage-compatible JSON
40
+ - **Impact analysis** - Find upstream and downstream column dependencies with flexible selectors
41
+ - **Wildcard matching** - Support for table wildcards (`schema.table.*`) and column wildcards (`..pattern`)
42
+ - **Direction control** - Query upstream (`+selector`), downstream (`selector+`), or both (`+selector+`)
43
+ - **Configurable depth** - Control traversal depth with `--max-depth`
44
+ - **Multiple output formats** - Text tables or JSON for scripting
45
+ - **MSSQL support** - T-SQL dialect with temp tables, variables, and stored procedures
46
+ - **Advanced SQL objects** - Support for table-valued functions (TVF) and dataset-returning procedures
47
+ - **Temp table lineage** - Track EXEC into temp tables and propagate lineage downstream
48
+
49
+ ## Requirements
50
+ - Python 3.10+
51
+ - Virtual environment (activated)
52
+ - Basic SQL knowledge
53
+ - Git and shell
54
+
55
+ ## Troubleshooting
56
+ - **Error tracebacks on help commands**: Make sure you're running in an activated virtual environment
57
+ - **Command not found**: Activate your virtual environment first
58
+ - **Import errors**: Ensure all dependencies are installed with `pip install -e .`
59
+ - **Column not found**: Use full URI format or check column_graph.json for exact names
60
+
61
+ ## Quickstart
62
+
63
+ ### Setup & Installation
64
+ ```bash
65
+ # Activate virtual environment first (REQUIRED)
66
+
67
+ # Install dependencies
68
+ pip install -e .
69
+
70
+ # Verify installation
71
+ infotracker --help
72
+ ```
73
+
74
+ ### Basic Usage
75
+ ```bash
76
+ # 1. Extract lineage from SQL files (builds column graph)
77
+ infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
78
+
79
+ # 2. Run impact analysis
80
+ infotracker impact -s "STG.dbo.Orders.OrderID" # downstream dependencies
81
+ infotracker impact -s "+STG.dbo.Orders.OrderID" # upstream sources
82
+ ```
83
+
84
+ ## Selector Syntax
85
+
86
+ InfoTracker supports flexible column selectors:
87
+
88
+ | Selector Format | Description | Example |
89
+ |-----------------|-------------|---------|
90
+ | `table.column` | Simple format (adds default `dbo` schema) | `Orders.OrderID` |
91
+ | `schema.table.column` | Schema-qualified format | `dbo.Orders.OrderID` |
92
+ | `database.schema.table.column` | Database-qualified format | `STG.dbo.Orders.OrderID` |
93
+ | `schema.table.*` | Table wildcard (all columns) | `dbo.fct_sales.*` |
94
+ | `..pattern` | Column wildcard (name contains pattern) | `..revenue` |
95
+ | `.pattern` | Alias for column wildcard | `.orderid` |
96
+ | Full URI | Complete namespace format | `mssql://localhost/InfoTrackerDW.STG.dbo.Orders.OrderID` |
97
+
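The "adds default `dbo` schema" rule in the table can be illustrated with a minimal sketch. The function name `qualify_selector` is hypothetical and this is not InfoTracker's own code; it only assumes that a two-part selector is the one form that gets expanded.

```python
def qualify_selector(selector: str, default_schema: str = "dbo") -> str:
    """Expand `table.column` to `schema.table.column` (illustrative only)."""
    parts = selector.split(".")
    # Only a plain two-part selector is expanded; wildcard forms such as
    # `.pattern` or `..pattern` contain empty parts and are left untouched.
    if len(parts) == 2 and all(parts):
        table, column = parts
        return f"{default_schema}.{table}.{column}"
    return selector


print(qualify_selector("Orders.OrderID"))
print(qualify_selector("STG.dbo.Orders.OrderID"))
```

Schema-qualified, database-qualified, and full-URI selectors pass through unchanged in this sketch.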
98
+ ### Direction Control
99
+ - `selector` - downstream dependencies (default)
100
+ - `+selector` - upstream sources
101
+ - `selector+` - downstream dependencies (explicit)
102
+ - `+selector+` - both upstream and downstream
103
+
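The four direction forms listed above can be sketched as a small parser. `Direction` and `split_direction` are hypothetical names, and the real CLI may handle edge cases differently; the sketch only encodes the rules stated in the list.

```python
from typing import NamedTuple


class Direction(NamedTuple):
    core: str         # selector with the "+" markers removed
    upstream: bool
    downstream: bool


def split_direction(selector: str) -> Direction:
    """Split a selector into its core and traversal directions (sketch)."""
    upstream = selector.startswith("+")
    trailing = len(selector) > 1 and selector.endswith("+")
    core = selector[1:] if upstream else selector
    if trailing:
        core = core[:-1]
    # A bare selector and an explicit trailing "+" both mean downstream.
    downstream = trailing or not upstream
    return Direction(core, upstream, downstream)


for s in ("dbo.Orders.OrderID", "+dbo.Orders.OrderID",
          "dbo.Orders.OrderID+", "+dbo.Orders.OrderID+"):
    print(s, "->", split_direction(s))
```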
104
+ ### Selector Cheat Sheet
105
+
106
+ **Table wildcards:**
107
+ ```bash
108
+ # All columns from a specific table
109
+ infotracker impact -s "dbo.fct_sales.*"
110
+ infotracker impact -s "STG.dbo.Orders.*"
111
+ ```
112
+
113
+ **Column name matching:**
114
+ ```bash
115
+ # Find all columns containing "revenue" (case-insensitive)
116
+ infotracker impact -s "..revenue"
117
+
118
+ # Find all columns containing "id"
119
+ infotracker impact -s "..id"
120
+
121
+ # Use wildcards for pattern matching
122
+ infotracker impact -s "..customer*"
123
+ ```
124
+
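A minimal sketch of the column-name matching described above, assuming that a plain pattern is a case-insensitive substring test and that a pattern containing glob characters must match the whole column name. The second assumption is a guess about `..customer*`; the function name `column_matches` is hypothetical.

```python
from fnmatch import fnmatchcase


def column_matches(selector: str, column_name: str) -> bool:
    """Case-insensitive column wildcard match (illustrative only)."""
    # Both `..pattern` and its alias `.pattern` reduce to the bare pattern.
    pattern = selector.lstrip(".").lower()
    name = column_name.lower()
    if any(ch in pattern for ch in "*?["):
        # Assumption: glob patterns are matched against the full name.
        return fnmatchcase(name, pattern)
    return pattern in name


print(column_matches("..revenue", "TotalRevenue"))
print(column_matches("..customer*", "CustomerID"))
```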
125
+ **Direction examples:**
126
+ ```bash
127
+ # Upstream: what feeds into this column
128
+ infotracker impact -s "+dbo.fct_sales.Revenue"
129
+
130
+ # Downstream: what uses this column
131
+ infotracker impact -s "STG.dbo.Orders.OrderID+"
132
+
133
+ # Both directions
134
+ infotracker impact -s "+dbo.dim_customer.CustomerID+"
135
+ ```
136
+
137
+ **Advanced SQL objects:**
138
+ ```bash
139
+ # Table-valued function columns (upstream)
140
+ infotracker impact -s "+dbo.fn_customer_orders_tvf.*"
141
+
142
+ # Procedure dataset columns (upstream)
143
+ infotracker impact -s "+dbo.usp_customer_metrics_dataset.*"
144
+
145
+ # Temp table lineage from EXEC
146
+ infotracker impact -s "+#temp_table.*"
147
+ ```
148
+
149
+ ## Examples
150
+
151
+ ```bash
152
+ # Extract lineage (run this first)
153
+ infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
154
+
155
+ # Basic column lineage
156
+ infotracker impact -s "+dbo.fct_sales.Revenue" # upstream sources
157
+ infotracker impact -s "STG.dbo.Orders.OrderID+" # downstream usage
158
+
159
+ # Wildcard selectors
160
+ infotracker impact -s "+..revenue+" # all revenue columns (both directions)
161
+ infotracker impact -s "dbo.fct_sales.*" # all columns from table
162
+ infotracker --format json impact -s "..customer*" # customer columns (JSON output)
163
+
164
+ # Advanced SQL objects (NEW)
165
+ infotracker impact -s "+dbo.fn_customer_orders_tvf.*" # TVF columns (upstream)
166
+ infotracker impact -s "+dbo.usp_customer_metrics_dataset.*" # procedure columns (upstream)
167
+
168
+ # Depth control
169
+ infotracker impact -s "+dbo.Orders.OrderID" --max-depth 1
170
+
171
+ # Demo the new features with the included examples
172
+ infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
173
+ infotracker impact -s "+dbo.fn_customer_orders_inline.*"
174
+ infotracker impact -s "+dbo.usp_customer_metrics_dataset.TotalRevenue"
175
+ ```
176
+
177
+ ### Copy-Paste Demo Commands
178
+
179
+ Test the new TVF and procedure lineage features:
180
+
181
+ ```bash
182
+ # 1. Extract all lineage (including new TVF/procedure support)
183
+ infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
184
+
185
+ # 2. Test TVF lineage
186
+ infotracker --format text impact -s "+dbo.fn_customer_orders_tvf.*"
187
+
188
+ # 3. Test procedure lineage
189
+ infotracker --format text impact -s "+dbo.usp_customer_metrics_dataset.*"
190
+
191
+ # 4. Test column name contains wildcard
192
+ infotracker --format text impact -s "+..revenue"
193
+
194
+ # 5. Show results in JSON format
195
+ infotracker --format json impact -s "..total*" > tvf_lineage.json
196
+ ```
197
+
198
+ ## Output Format
199
+
200
+ Impact analysis returns these columns:
201
+ - **from** - Source column (fully qualified)
202
+ - **to** - Target column (fully qualified)
203
+ - **direction** - `upstream` or `downstream`
204
+ - **transformation** - Type of transformation (`IDENTITY`, `ARITHMETIC`, `AGGREGATION`, `CASE_AGGREGATION`, `DATE_FUNCTION`, `WINDOW`, etc.)
205
+ - **description** - Human-readable transformation description
206
+
207
+ Results are automatically deduplicated. Use `--format json` for machine-readable output.
208
+
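How depth-limited traversal and deduplication could fit together is sketched below as a breadth-first walk over `(from, to)` column edges. This is an assumption-laden illustration, not the package's engine: the `impact` function, the edge list, and the `stg.Orders.Amount` and `dbo.agg_sales_by_day.TotalRevenue` column names are hypothetical, and the `transformation` and `description` fields are omitted.

```python
from collections import deque


def impact(edges, start, direction="downstream", max_depth=None):
    """Return deduplicated edge rows reachable from `start` (sketch)."""
    # Index each edge by the endpoint the walk moves away from.
    index = {}
    for src, dst in edges:
        key = src if direction == "downstream" else dst
        index.setdefault(key, []).append((src, dst))

    seen_nodes, seen_edges, rows = {start}, set(), []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the requested depth
        for src, dst in index.get(node, []):
            if (src, dst) not in seen_edges:
                seen_edges.add((src, dst))
                rows.append({"from": src, "to": dst, "direction": direction})
            nxt = dst if direction == "downstream" else src
            if nxt not in seen_nodes:
                seen_nodes.add(nxt)
                queue.append((nxt, depth + 1))
    return rows


EDGES = [  # hypothetical column edges, including one duplicate
    ("stg.Orders.Amount", "dbo.fct_sales.Revenue"),
    ("dbo.fct_sales.Revenue", "dbo.agg_sales_by_day.TotalRevenue"),
    ("dbo.fct_sales.Revenue", "dbo.agg_sales_by_day.TotalRevenue"),
]
for row in impact(EDGES, "stg.Orders.Amount", max_depth=1):
    print(row)
```

Tracking visited edges separately from visited nodes keeps every distinct edge in the output while still terminating on cyclic graphs.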
209
+ ### New Transformation Types
210
+
211
+ The enhanced transformation taxonomy includes:
212
+ - `ARITHMETIC_AGGREGATION` - Arithmetic operations combined with aggregation functions
213
+ - `COMPLEX_AGGREGATION` - Multi-step calculations involving multiple aggregations
214
+ - `DATE_FUNCTION` - Date/time calculations like DATEDIFF, DATEADD
215
+ - `DATE_FUNCTION_AGGREGATION` - Date functions applied to aggregated results
216
+ - `CASE_AGGREGATION` - CASE statements applied to aggregated results
217
+
218
+ ### Advanced Object Support
219
+
220
+ InfoTracker now supports advanced SQL Server objects:
221
+
222
+ **Table-Valued Functions (TVF):**
223
+ - Inline TVF (`RETURNS TABLE AS RETURN (SELECT ...)`) - Parsed directly from the SELECT statement
224
+ - Multi-statement TVF (`RETURNS @table TABLE (...)`) - Extracts schema from the table variable definition
225
+ - Function parameters are tracked as filter metadata (don't create columns)
226
+
227
+ **Dataset-Returning Procedures:**
228
+ - Procedures ending with SELECT statement are treated as dataset sources
229
+ - Output schema extracted from the final SELECT statement
230
+ - Parameters tracked as filter metadata affecting lineage scope
231
+
232
+ **EXEC into Temp Tables:**
233
+ - `INSERT INTO #temp EXEC procedure` patterns create edges from procedure columns to temp table columns
234
+ - Temp table lineage propagates downstream to final targets
235
+ - Supports complex workflow patterns combining functions, procedures, and temp tables
236
+
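The `INSERT INTO #temp EXEC procedure` edge creation can be sketched as a positional pairing of columns, since SQL Server maps an `INSERT ... EXEC` result set to the target columns by ordinal position. The helper name and the `#customer_metrics` table with its columns are hypothetical; how InfoTracker actually pairs the columns is not shown in this diff.

```python
def exec_into_temp_edges(procedure, proc_columns, temp_table, temp_columns):
    """Pair procedure output columns with temp table columns by position."""
    if len(proc_columns) != len(temp_columns):
        raise ValueError("result set and temp table column counts differ")
    return [
        (f"{procedure}.{p}", f"{temp_table}.{t}")
        for p, t in zip(proc_columns, temp_columns)
    ]


edges = exec_into_temp_edges(
    "dbo.usp_customer_metrics_dataset", ["CustomerID", "TotalRevenue"],
    "#customer_metrics", ["CustomerID", "Revenue"],  # hypothetical temp table
)
for edge in edges:
    print(edge)
```

Edges produced this way can be fed into a downstream traversal so that lineage continues from the temp table to its final targets.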
237
+ ## Configuration
238
+
239
+ InfoTracker follows this configuration precedence:
240
+ 1. **CLI flags** (highest priority) - override everything
241
+ 2. **infotracker.yml** config file - project defaults
242
+ 3. **Built-in defaults** (lowest priority) - fallback values
243
+
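The precedence order above amounts to a layered lookup, sketched here with `collections.ChainMap`. The `resolve_config` name and the built-in default values are assumptions made for illustration, not values taken from the package.

```python
from collections import ChainMap

# Hypothetical built-in defaults; the real values may differ.
BUILTIN_DEFAULTS = {
    "default_adapter": "mssql",
    "sql_dir": ".",
    "out_dir": "build/lineage",
}


def resolve_config(cli_flags, file_config, defaults=BUILTIN_DEFAULTS):
    """Merge settings so that CLI flags > infotracker.yml > defaults."""
    # Flags the user did not pass (None) must not shadow lower layers.
    cli = {k: v for k, v in cli_flags.items() if v is not None}
    return dict(ChainMap(cli, file_config, defaults))


config = resolve_config(
    cli_flags={"out_dir": "tmp/out", "sql_dir": None},
    file_config={"sql_dir": "examples/warehouse/sql"},
)
print(config)
```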
244
+ Create an `infotracker.yml` file in your project root:
245
+ ```yaml
246
+ default_adapter: mssql
247
+ sql_dir: examples/warehouse/sql
248
+ out_dir: build/lineage
249
+ include: ["*.sql"]
250
+ exclude: ["*_wip.sql"]
251
+ ```
252
+
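One plausible reading of the `include` and `exclude` settings is a glob filter on file names, sketched below. Whether InfoTracker matches against the bare file name or the full path is not stated here, so matching on the file name is an assumption, as are the function name and the sample file list.

```python
from fnmatch import fnmatchcase


def select_sql_files(names, include=("*.sql",), exclude=()):
    """Keep names matching any include glob and no exclude glob (sketch)."""
    return [
        name for name in names
        if any(fnmatchcase(name, pat) for pat in include)
        and not any(fnmatchcase(name, pat) for pat in exclude)
    ]


files = ["01_customers.sql", "40_fct_sales_wip.sql", "notes.md"]  # hypothetical
print(select_sql_files(files, include=["*.sql"], exclude=["*_wip.sql"]))
```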
253
+ ## Documentation
254
+
255
+ For detailed information:
256
+ - `docs/overview.md` — what it is, goals, scope
257
+ - `docs/algorithm.md` — how extraction works
258
+ - `docs/lineage_concepts.md` — core concepts with visuals
259
+ - `docs/cli_usage.md` — commands and options
260
+ - `docs/breaking_changes.md` — definition and detection
261
+ - `docs/edge_cases.md` — SELECT *, UNION, temp tables, etc.
262
+ - `docs/adapters.md` — interface and MSSQL specifics
263
+ - `docs/architecture.md` — system and sequence diagrams
264
+ - `docs/configuration.md` — configuration reference
265
+ - `docs/openlineage_mapping.md` — how outputs map to OpenLineage
266
+ - `docs/faq.md` — common questions
267
+
268
+ #### Documentation
269
+ - `docs/overview.md` — what it is, goals, scope
270
+ - `docs/algorithm.md` — how extraction works
271
+ - `docs/lineage_concepts.md` — core concepts with visuals
272
+ - `docs/cli_usage.md` — commands and options
273
+ - `docs/breaking_changes.md` — definition and detection
274
+ - `docs/edge_cases.md` — SELECT *, UNION, temp tables, etc.
275
+ - `docs/advanced_use_cases.md` — tabular functions, procedures returning datasets
276
+ - `docs/adapters.md` — interface and MSSQL specifics
277
+ - `docs/architecture.md` — system and sequence diagrams
278
+ - `docs/configuration.md` — configuration reference
279
+ - `docs/openlineage_mapping.md` — how outputs map to OpenLineage
280
+ - `docs/faq.md` — common questions
281
+ - `docs/dbt_integration.md` — how to use with dbt projects
282
+
283
+
284
+ ## License
285
+ MIT
@@ -0,0 +1,254 @@
1
+ # InfoTracker
2
+
3
+ Column-level SQL lineage extraction and impact analysis for MS SQL Server
4
+
5
+ ## Features
6
+
7
+ - **Column-level lineage** - Track data flow at the column level
8
+ - **Parse SQL files** and generate OpenLineage-compatible JSON
9
+ - **Impact analysis** - Find upstream and downstream column dependencies with flexible selectors
10
+ - **Wildcard matching** - Support for table wildcards (`schema.table.*`) and column wildcards (`..pattern`)
11
+ - **Direction control** - Query upstream (`+selector`), downstream (`selector+`), or both (`+selector+`)
12
+ - **Configurable depth** - Control traversal depth with `--max-depth`
13
+ - **Multiple output formats** - Text tables or JSON for scripting
14
+ - **MSSQL support** - T-SQL dialect with temp tables, variables, and stored procedures
15
+ - **Advanced SQL objects** - Support for table-valued functions (TVF) and dataset-returning procedures
16
+ - **Temp table lineage** - Track EXEC into temp tables and propagate lineage downstream
17
+
18
+ ## Requirements
19
+ - Python 3.10+
20
+ - Virtual environment (activated)
21
+ - Basic SQL knowledge
22
+ - Git and shell
23
+
24
+ ## Troubleshooting
25
+ - **Error tracebacks on help commands**: Make sure you're running in an activated virtual environment
26
+ - **Command not found**: Activate your virtual environment first
27
+ - **Import errors**: Ensure all dependencies are installed with `pip install -e .`
28
+ - **Column not found**: Use full URI format or check column_graph.json for exact names
29
+
30
+ ## Quickstart
31
+
32
+ ### Setup & Installation
33
+ ```bash
34
+ # Activate virtual environment first (REQUIRED)
35
+
36
+ # Install dependencies
37
+ pip install -e .
38
+
39
+ # Verify installation
40
+ infotracker --help
41
+ ```
42
+
43
+ ### Basic Usage
44
+ ```bash
45
+ # 1. Extract lineage from SQL files (builds column graph)
46
+ infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
47
+
48
+ # 2. Run impact analysis
49
+ infotracker impact -s "STG.dbo.Orders.OrderID" # downstream dependencies
50
+ infotracker impact -s "+STG.dbo.Orders.OrderID" # upstream sources
51
+ ```
52
+
53
+ ## Selector Syntax
54
+
55
+ InfoTracker supports flexible column selectors:
56
+
57
+ | Selector Format | Description | Example |
58
+ |-----------------|-------------|---------|
59
+ | `table.column` | Simple format (adds default `dbo` schema) | `Orders.OrderID` |
60
+ | `schema.table.column` | Schema-qualified format | `dbo.Orders.OrderID` |
61
+ | `database.schema.table.column` | Database-qualified format | `STG.dbo.Orders.OrderID` |
62
+ | `schema.table.*` | Table wildcard (all columns) | `dbo.fct_sales.*` |
63
+ | `..pattern` | Column wildcard (name contains pattern) | `..revenue` |
64
+ | `.pattern` | Alias for column wildcard | `.orderid` |
65
+ | Full URI | Complete namespace format | `mssql://localhost/InfoTrackerDW.STG.dbo.Orders.OrderID` |
66
+
67
+ ### Direction Control
68
+ - `selector` - downstream dependencies (default)
69
+ - `+selector` - upstream sources
70
+ - `selector+` - downstream dependencies (explicit)
71
+ - `+selector+` - both upstream and downstream
72
+
73
+ ### Selector Cheat Sheet
74
+
75
+ **Table wildcards:**
76
+ ```bash
77
+ # All columns from a specific table
78
+ infotracker impact -s "dbo.fct_sales.*"
79
+ infotracker impact -s "STG.dbo.Orders.*"
80
+ ```
81
+
82
+ **Column name matching:**
83
+ ```bash
84
+ # Find all columns containing "revenue" (case-insensitive)
85
+ infotracker impact -s "..revenue"
86
+
87
+ # Find all columns containing "id"
88
+ infotracker impact -s "..id"
89
+
90
+ # Use wildcards for pattern matching
91
+ infotracker impact -s "..customer*"
92
+ ```
93
+
94
+ **Direction examples:**
95
+ ```bash
96
+ # Upstream: what feeds into this column
97
+ infotracker impact -s "+dbo.fct_sales.Revenue"
98
+
99
+ # Downstream: what uses this column
100
+ infotracker impact -s "STG.dbo.Orders.OrderID+"
101
+
102
+ # Both directions
103
+ infotracker impact -s "+dbo.dim_customer.CustomerID+"
104
+ ```
105
+
106
+ **Advanced SQL objects:**
107
+ ```bash
108
+ # Table-valued function columns (upstream)
109
+ infotracker impact -s "+dbo.fn_customer_orders_tvf.*"
110
+
111
+ # Procedure dataset columns (upstream)
112
+ infotracker impact -s "+dbo.usp_customer_metrics_dataset.*"
113
+
114
+ # Temp table lineage from EXEC
115
+ infotracker impact -s "+#temp_table.*"
116
+ ```
117
+
118
+ ## Examples
119
+
120
+ ```bash
121
+ # Extract lineage (run this first)
122
+ infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
123
+
124
+ # Basic column lineage
125
+ infotracker impact -s "+dbo.fct_sales.Revenue" # upstream sources
126
+ infotracker impact -s "STG.dbo.Orders.OrderID+" # downstream usage
127
+
128
+ # Wildcard selectors
129
+ infotracker impact -s "+..revenue+" # all revenue columns (both directions)
130
+ infotracker impact -s "dbo.fct_sales.*" # all columns from table
131
+ infotracker --format json impact -s "..customer*" # customer columns (JSON output)
132
+
133
+ # Advanced SQL objects (NEW)
134
+ infotracker impact -s "+dbo.fn_customer_orders_tvf.*" # TVF columns (upstream)
135
+ infotracker impact -s "+dbo.usp_customer_metrics_dataset.*" # procedure columns (upstream)
136
+
137
+ # Depth control
138
+ infotracker impact -s "+dbo.Orders.OrderID" --max-depth 1
139
+
140
+ # Demo the new features with the included examples
141
+ infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
142
+ infotracker impact -s "+dbo.fn_customer_orders_inline.*"
143
+ infotracker impact -s "+dbo.usp_customer_metrics_dataset.TotalRevenue"
144
+ ```
145
+
146
+ ### Copy-Paste Demo Commands
147
+
148
+ Test the new TVF and procedure lineage features:
149
+
150
+ ```bash
151
+ # 1. Extract all lineage (including new TVF/procedure support)
152
+ infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
153
+
154
+ # 2. Test TVF lineage
155
+ infotracker --format text impact -s "+dbo.fn_customer_orders_tvf.*"
156
+
157
+ # 3. Test procedure lineage
158
+ infotracker --format text impact -s "+dbo.usp_customer_metrics_dataset.*"
159
+
160
+ # 4. Test column name contains wildcard
161
+ infotracker --format text impact -s "+..revenue"
162
+
163
+ # 5. Show results in JSON format
164
+ infotracker --format json impact -s "..total*" > tvf_lineage.json
165
+ ```
166
+
167
+ ## Output Format
168
+
169
+ Impact analysis returns these columns:
170
+ - **from** - Source column (fully qualified)
171
+ - **to** - Target column (fully qualified)
172
+ - **direction** - `upstream` or `downstream`
173
+ - **transformation** - Type of transformation (`IDENTITY`, `ARITHMETIC`, `AGGREGATION`, `CASE_AGGREGATION`, `DATE_FUNCTION`, `WINDOW`, etc.)
174
+ - **description** - Human-readable transformation description
175
+
176
+ Results are automatically deduplicated. Use `--format json` for machine-readable output.
177
+
178
+ ### New Transformation Types
179
+
180
+ The enhanced transformation taxonomy includes:
181
+ - `ARITHMETIC_AGGREGATION` - Arithmetic operations combined with aggregation functions
182
+ - `COMPLEX_AGGREGATION` - Multi-step calculations involving multiple aggregations
183
+ - `DATE_FUNCTION` - Date/time calculations like DATEDIFF, DATEADD
184
+ - `DATE_FUNCTION_AGGREGATION` - Date functions applied to aggregated results
185
+ - `CASE_AGGREGATION` - CASE statements applied to aggregated results
186
+
187
+ ### Advanced Object Support
188
+
189
+ InfoTracker now supports advanced SQL Server objects:
190
+
191
+ **Table-Valued Functions (TVF):**
192
+ - Inline TVF (`RETURNS TABLE AS RETURN (SELECT ...)`) - Parsed directly from the SELECT statement
193
+ - Multi-statement TVF (`RETURNS @table TABLE (...)`) - Extracts schema from the table variable definition
194
+ - Function parameters are tracked as filter metadata (don't create columns)
195
+
196
+ **Dataset-Returning Procedures:**
197
+ - Procedures ending with SELECT statement are treated as dataset sources
198
+ - Output schema extracted from the final SELECT statement
199
+ - Parameters tracked as filter metadata affecting lineage scope
200
+
201
+ **EXEC into Temp Tables:**
202
+ - `INSERT INTO #temp EXEC procedure` patterns create edges from procedure columns to temp table columns
203
+ - Temp table lineage propagates downstream to final targets
204
+ - Supports complex workflow patterns combining functions, procedures, and temp tables
205
+
206
+ ## Configuration
207
+
208
+ InfoTracker follows this configuration precedence:
209
+ 1. **CLI flags** (highest priority) - override everything
210
+ 2. **infotracker.yml** config file - project defaults
211
+ 3. **Built-in defaults** (lowest priority) - fallback values
212
+
213
+ Create an `infotracker.yml` file in your project root:
214
+ ```yaml
215
+ default_adapter: mssql
216
+ sql_dir: examples/warehouse/sql
217
+ out_dir: build/lineage
218
+ include: ["*.sql"]
219
+ exclude: ["*_wip.sql"]
220
+ ```
221
+
222
+ ## Documentation
223
+
224
+ For detailed information:
225
+ - `docs/overview.md` — what it is, goals, scope
226
+ - `docs/algorithm.md` — how extraction works
227
+ - `docs/lineage_concepts.md` — core concepts with visuals
228
+ - `docs/cli_usage.md` — commands and options
229
+ - `docs/breaking_changes.md` — definition and detection
230
+ - `docs/edge_cases.md` — SELECT *, UNION, temp tables, etc.
231
+ - `docs/adapters.md` — interface and MSSQL specifics
232
+ - `docs/architecture.md` — system and sequence diagrams
233
+ - `docs/configuration.md` — configuration reference
234
+ - `docs/openlineage_mapping.md` — how outputs map to OpenLineage
235
+ - `docs/faq.md` — common questions
236
+
237
+ #### Documentation
238
+ - `docs/overview.md` — what it is, goals, scope
239
+ - `docs/algorithm.md` — how extraction works
240
+ - `docs/lineage_concepts.md` — core concepts with visuals
241
+ - `docs/cli_usage.md` — commands and options
242
+ - `docs/breaking_changes.md` — definition and detection
243
+ - `docs/edge_cases.md` — SELECT *, UNION, temp tables, etc.
244
+ - `docs/advanced_use_cases.md` — tabular functions, procedures returning datasets
245
+ - `docs/adapters.md` — interface and MSSQL specifics
246
+ - `docs/architecture.md` — system and sequence diagrams
247
+ - `docs/configuration.md` — configuration reference
248
+ - `docs/openlineage_mapping.md` — how outputs map to OpenLineage
249
+ - `docs/faq.md` — common questions
250
+ - `docs/dbt_integration.md` — how to use with dbt projects
251
+
252
+
253
+ ## License
254
+ MIT