InfoTracker 0.1.0__py3-none-any.whl → 0.2.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,285 @@
1
+ Metadata-Version: 2.4
2
+ Name: InfoTracker
3
+ Version: 0.2.0
4
+ Summary: Column-level SQL lineage, impact analysis, and breaking-change detection (MS SQL first)
5
+ Project-URL: homepage, https://example.com/infotracker
6
+ Project-URL: documentation, https://example.com/infotracker/docs
7
+ Author: InfoTracker Authors
8
+ License: MIT
9
+ Keywords: data-lineage,impact-analysis,lineage,mssql,openlineage,sql
10
+ Classifier: Environment :: Console
11
+ Classifier: License :: OSI Approved :: MIT License
12
+ Classifier: Operating System :: OS Independent
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Programming Language :: Python :: 3.10
15
+ Classifier: Topic :: Database
16
+ Classifier: Topic :: Software Development :: Libraries
17
+ Requires-Python: >=3.10
18
+ Requires-Dist: click
19
+ Requires-Dist: networkx>=3.3
20
+ Requires-Dist: packaging>=24.0
21
+ Requires-Dist: pydantic>=2.8.2
22
+ Requires-Dist: pyyaml>=6.0.1
23
+ Requires-Dist: rich
24
+ Requires-Dist: shellingham
25
+ Requires-Dist: sqlglot>=23.0.0
26
+ Requires-Dist: typer
27
+ Provides-Extra: dev
28
+ Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
29
+ Requires-Dist: pytest>=7.4.0; extra == 'dev'
30
+ Description-Content-Type: text/markdown
31
+
32
+ # InfoTracker
33
+
34
+ Column-level SQL lineage extraction and impact analysis for MS SQL Server
35
+
36
+ ## Features
37
+
38
+ - **Column-level lineage** - Track data flow at the column level
39
+ - **OpenLineage output** - Parse SQL files and generate OpenLineage-compatible JSON
40
+ - **Impact analysis** - Find upstream and downstream column dependencies with flexible selectors
41
+ - **Wildcard matching** - Support for table wildcards (`schema.table.*`) and column wildcards (`..pattern`)
42
+ - **Direction control** - Query upstream (`+selector`), downstream (`selector+`), or both (`+selector+`)
43
+ - **Configurable depth** - Control traversal depth with `--max-depth`
44
+ - **Multiple output formats** - Text tables or JSON for scripting
45
+ - **MSSQL support** - T-SQL dialect with temp tables, variables, and stored procedures
46
+ - **Advanced SQL objects** - Support for table-valued functions (TVF) and dataset-returning procedures
47
+ - **Temp table lineage** - Track EXEC into temp tables and propagate lineage downstream
48
+
49
+ ## Requirements
50
+ - Python 3.10+
51
+ - Virtual environment (activated)
52
+ - Basic SQL knowledge
53
+ - Git and shell
54
+
55
+ ## Troubleshooting
56
+ - **Error tracebacks on help commands**: Make sure you're running in an activated virtual environment
57
+ - **Command not found**: Activate your virtual environment first
58
+ - **Import errors**: Ensure all dependencies are installed with `pip install -e .`
59
+ - **Column not found**: Use full URI format or check column_graph.json for exact names
60
+
61
+ ## Quickstart
62
+
63
+ ### Setup & Installation
64
+ ```bash
65
+ # Activate virtual environment first (REQUIRED)
66
+
67
+ # Install dependencies
68
+ pip install -e .
69
+
70
+ # Verify installation
71
+ infotracker --help
72
+ ```
73
+
74
+ ### Basic Usage
75
+ ```bash
76
+ # 1. Extract lineage from SQL files (builds column graph)
77
+ infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
78
+
79
+ # 2. Run impact analysis
80
+ infotracker impact -s "STG.dbo.Orders.OrderID" # downstream dependencies
81
+ infotracker impact -s "+STG.dbo.Orders.OrderID" # upstream sources
82
+ ```
83
+
84
+ ## Selector Syntax
85
+
86
+ InfoTracker supports flexible column selectors:
87
+
88
+ | Selector Format | Description | Example |
89
+ |-----------------|-------------|---------|
90
+ | `table.column` | Simple format (adds default `dbo` schema) | `Orders.OrderID` |
91
+ | `schema.table.column` | Schema-qualified format | `dbo.Orders.OrderID` |
92
+ | `database.schema.table.column` | Database-qualified format | `STG.dbo.Orders.OrderID` |
93
+ | `schema.table.*` | Table wildcard (all columns) | `dbo.fct_sales.*` |
94
+ | `..pattern` | Column wildcard (name contains pattern) | `..revenue` |
95
+ | `.pattern` | Alias for column wildcard | `.orderid` |
96
+ | Full URI | Complete namespace format | `mssql://localhost/InfoTrackerDW.STG.dbo.Orders.OrderID` |
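
The selector forms in the table can be illustrated with a small matcher. This is a minimal sketch, not InfoTracker's implementation: `select_columns` and `DEFAULT_SCHEMA` are hypothetical names, column names are assumed to be plain `schema.table.column` strings, and the treatment of `*` inside a column pattern is an assumption. Database-qualified selectors and full URIs are not handled.

```python
from fnmatch import fnmatchcase
from typing import List

DEFAULT_SCHEMA = "dbo"  # the table says two-part selectors get the default dbo schema


def select_columns(selector: str, columns: List[str]) -> List[str]:
    """Return the names in `columns` (schema.table.column) picked by `selector`.

    Hypothetical helper for illustration only.
    """
    sel = selector.lower()
    if sel.startswith("."):
        # "..pattern" and its alias ".pattern" look only at the column name.
        pattern = sel.lstrip(".")

        def hit(name: str) -> bool:
            col = name.lower().rsplit(".", 1)[-1]
            if "*" in pattern:
                # Assumption: "*" is treated as a shell-style wildcard.
                return fnmatchcase(col, pattern)
            return pattern in col  # "name contains pattern"

        return [c for c in columns if hit(c)]
    if sel.count(".") == 1:
        # table.column -> dbo.table.column
        sel = f"{DEFAULT_SCHEMA}.{sel}"
    # fnmatchcase covers both exact names and the "schema.table.*" wildcard.
    return [c for c in columns if fnmatchcase(c.lower(), sel)]


COLUMNS = [
    "dbo.Orders.OrderID",
    "dbo.fct_sales.Revenue",
    "dbo.fct_sales.CustomerKey",
    "dbo.dim_customer.CustomerID",
]
print(select_columns("dbo.fct_sales.*", COLUMNS))
print(select_columns("..customer*", COLUMNS))
```

Matching is done on lowercased names here because the cheat sheet describes column matching as case-insensitive; whether exact selectors are also case-insensitive is assumed.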
97
+
98
+ ### Direction Control
99
+ - `selector` - downstream dependencies (default)
100
+ - `+selector` - upstream sources
101
+ - `selector+` - downstream dependencies (explicit)
102
+ - `+selector+` - both upstream and downstream
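
The four direction forms above reduce to checking for a leading and a trailing `+`. The sketch below shows one way to do that; `split_direction` is a hypothetical helper, not a function from the package.

```python
from typing import Tuple


def split_direction(selector: str) -> Tuple[str, str]:
    """Split a selector into (core selector, direction).

    Hypothetical helper: a leading "+" means upstream, a trailing "+" or
    no marker means downstream, and both markers mean both directions.
    """
    core = selector
    upstream = core.startswith("+")
    if upstream:
        core = core[1:]
    downstream = core.endswith("+")
    if downstream:
        core = core[:-1]
    if not core:
        raise ValueError("selector is empty")
    if upstream and downstream:
        return core, "both"
    if upstream:
        return core, "upstream"
    return core, "downstream"  # default, with or without the trailing "+"


for raw in ("STG.dbo.Orders.OrderID", "+dbo.fct_sales.Revenue", "+..revenue+"):
    print(raw, "->", split_direction(raw))
```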
103
+
104
+ ### Selector Cheat Sheet
105
+
106
+ **Table wildcards:**
107
+ ```bash
108
+ # All columns from a specific table
109
+ infotracker impact -s "dbo.fct_sales.*"
110
+ infotracker impact -s "STG.dbo.Orders.*"
111
+ ```
112
+
113
+ **Column name matching:**
114
+ ```bash
115
+ # Find all columns containing "revenue" (case-insensitive)
116
+ infotracker impact -s "..revenue"
117
+
118
+ # Find all columns containing "id"
119
+ infotracker impact -s "..id"
120
+
121
+ # Use wildcards for pattern matching
122
+ infotracker impact -s "..customer*"
123
+ ```
124
+
125
+ **Direction examples:**
126
+ ```bash
127
+ # Upstream: what feeds into this column
128
+ infotracker impact -s "+dbo.fct_sales.Revenue"
129
+
130
+ # Downstream: what uses this column
131
+ infotracker impact -s "STG.dbo.Orders.OrderID+"
132
+
133
+ # Both directions
134
+ infotracker impact -s "+dbo.dim_customer.CustomerID+"
135
+ ```
136
+
137
+ **Advanced SQL objects:**
138
+ ```bash
139
+ # Table-valued function columns (upstream)
140
+ infotracker impact -s "+dbo.fn_customer_orders_tvf.*"
141
+
142
+ # Procedure dataset columns (upstream)
143
+ infotracker impact -s "+dbo.usp_customer_metrics_dataset.*"
144
+
145
+ # Temp table lineage from EXEC
146
+ infotracker impact -s "+#temp_table.*"
147
+ ```
148
+
149
+ ## Examples
150
+
151
+ ```bash
152
+ # Extract lineage (run this first)
153
+ infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
154
+
155
+ # Basic column lineage
156
+ infotracker impact -s "+dbo.fct_sales.Revenue" # upstream sources
157
+ infotracker impact -s "STG.dbo.Orders.OrderID+" # downstream usage
158
+
159
+ # Wildcard selectors
160
+ infotracker impact -s "+..revenue+" # all revenue columns (both directions)
161
+ infotracker impact -s "dbo.fct_sales.*" # all columns from table
162
+ infotracker --format json impact -s "..customer*" # customer columns (JSON output)
163
+
164
+ # Advanced SQL objects (NEW)
165
+ infotracker impact -s "+dbo.fn_customer_orders_tvf.*" # TVF columns (upstream)
166
+ infotracker impact -s "+dbo.usp_customer_metrics_dataset.*" # procedure columns (upstream)
167
+
168
+ # Depth control
169
+ infotracker impact -s "+dbo.Orders.OrderID" --max-depth 1
170
+
171
+ # Demo the new features with the included examples
172
+ infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
173
+ infotracker impact -s "+dbo.fn_customer_orders_inline.*"
174
+ infotracker impact -s "+dbo.usp_customer_metrics_dataset.TotalRevenue"
175
+ ```
176
+
177
+ ### Copy-Paste Demo Commands
178
+
179
+ Test the new TVF and procedure lineage features:
180
+
181
+ ```bash
182
+ # 1. Extract all lineage (including new TVF/procedure support)
183
+ infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
184
+
185
+ # 2. Test TVF lineage
186
+ infotracker --format text impact -s "+dbo.fn_customer_orders_tvf.*"
187
+
188
+ # 3. Test procedure lineage
189
+ infotracker --format text impact -s "+dbo.usp_customer_metrics_dataset.*"
190
+
191
+ # 4. Test the column-name "contains" wildcard
192
+ infotracker --format text impact -s "+..revenue"
193
+
194
+ # 5. Show results in JSON format
195
+ infotracker --format json impact -s "..total*" > tvf_lineage.json
196
+ ```
197
+
198
+ ## Output Format
199
+
200
+ Impact analysis returns these columns:
201
+ - **from** - Source column (fully qualified)
202
+ - **to** - Target column (fully qualified)
203
+ - **direction** - `upstream` or `downstream`
204
+ - **transformation** - Type of transformation (`IDENTITY`, `ARITHMETIC`, `AGGREGATION`, `CASE_AGGREGATION`, `DATE_FUNCTION`, `WINDOW`, etc.)
205
+ - **description** - Human-readable transformation description
206
+
207
+ Results are automatically deduplicated. Use `--format json` for machine-readable output.
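
For scripting against the JSON output, a short post-processing step might look like the sketch below. It assumes the payload is a JSON array of objects keyed by the five column names listed above; the README does not show the actual shape, so treat that as an assumption. The sample records and the `targets_by_transformation` helper are illustrative only.

```python
import json
from typing import Dict, List

# Illustrative records only; the real --format json payload may be shaped differently.
SAMPLE = """
[
  {"from": "dbo.Orders.OrderID", "to": "dbo.fct_sales.OrderID",
   "direction": "downstream", "transformation": "IDENTITY",
   "description": "copied as-is"},
  {"from": "dbo.Orders.Amount", "to": "dbo.fct_sales.Revenue",
   "direction": "downstream", "transformation": "AGGREGATION",
   "description": "SUM(Amount)"}
]
"""


def targets_by_transformation(payload: str) -> Dict[str, List[str]]:
    """Group target columns by transformation type (hypothetical helper)."""
    grouped: Dict[str, List[str]] = {}
    for row in json.loads(payload):
        grouped.setdefault(row["transformation"], []).append(row["to"])
    return grouped


print(targets_by_transformation(SAMPLE))
```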
208
+
209
+ ### New Transformation Types
210
+
211
+ The enhanced transformation taxonomy includes:
212
+ - `ARITHMETIC_AGGREGATION` - Arithmetic operations combined with aggregation functions
213
+ - `COMPLEX_AGGREGATION` - Multi-step calculations involving multiple aggregations
214
+ - `DATE_FUNCTION` - Date/time calculations like DATEDIFF, DATEADD
215
+ - `DATE_FUNCTION_AGGREGATION` - Date functions applied to aggregated results
216
+ - `CASE_AGGREGATION` - CASE statements applied to aggregated results
217
+
218
+ ### Advanced Object Support
219
+
220
+ InfoTracker now supports advanced SQL Server objects:
221
+
222
+ **Table-Valued Functions (TVF):**
223
+ - Inline TVF (`RETURNS TABLE AS RETURN (SELECT ...)`) - Parsed directly from the SELECT statement
224
+ - Multi-statement TVF (`RETURNS @table TABLE (...)`) - Extracts the schema from the table variable definition
225
+ - Function parameters are tracked as filter metadata (they do not create columns)
226
+
227
+ **Dataset-Returning Procedures:**
228
+ - Procedures ending with a SELECT statement are treated as dataset sources
229
+ - The output schema is extracted from the final SELECT statement
230
+ - Parameters tracked as filter metadata affecting lineage scope
231
+
232
+ **EXEC into Temp Tables:**
233
+ - `INSERT INTO #temp EXEC procedure` patterns create edges from procedure columns to temp table columns
234
+ - Temp table lineage propagates downstream to final targets
235
+ - Supports complex workflow patterns combining functions, procedures, and temp tables
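
The propagation described above can be pictured as a breadth-first walk over column-to-column edges, with the temp table as an intermediate hop. The sketch below uses a plain adjacency dict and a hypothetical `downstream` function; the edge data is made up for illustration, and the depth rule is only an assumption about how a `--max-depth`-style limit could behave.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical edges: procedure column -> temp table column -> final target.
EDGES: Dict[str, List[str]] = {
    "dbo.usp_customer_metrics_dataset.TotalRevenue": ["#temp_table.TotalRevenue"],
    "#temp_table.TotalRevenue": ["dbo.fct_sales.Revenue"],
}


def downstream(start: str, edges: Dict[str, List[str]],
               max_depth: Optional[int] = None) -> List[str]:
    """Return columns reachable from `start`, in breadth-first order.

    `max_depth` limits the number of hops followed (assumed semantics).
    """
    seen = {start}
    found: List[str] = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand past the hop limit
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found


START = "dbo.usp_customer_metrics_dataset.TotalRevenue"
print(downstream(START, EDGES))               # walks through the temp table
print(downstream(START, EDGES, max_depth=1))  # stops at the temp table
```

An upstream query would run the same walk over the reversed edge map.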
236
+
237
+ ## Configuration
238
+
239
+ InfoTracker follows this configuration precedence:
240
+ 1. **CLI flags** (highest priority) - override everything
241
+ 2. **infotracker.yml** config file - project defaults
242
+ 3. **Built-in defaults** (lowest priority) - fallback values
243
+
244
+ Create an `infotracker.yml` file in your project root:
245
+ ```yaml
246
+ default_adapter: mssql
247
+ sql_dir: examples/warehouse/sql
248
+ out_dir: build/lineage
249
+ include: ["*.sql"]
250
+ exclude: ["*_wip.sql"]
251
+ ```
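
The precedence order above amounts to a layered merge in which later layers win. A minimal sketch follows, assuming the config file has already been parsed into a dict; `resolve_config` and the placeholder defaults are hypothetical and do not reflect the package's real built-in values.

```python
from typing import Any, Dict

# Placeholder defaults for illustration; the package's real defaults may differ.
DEFAULTS: Dict[str, Any] = {
    "default_adapter": "mssql",
    "sql_dir": ".",
    "out_dir": "build/lineage",
}


def resolve_config(defaults: Dict[str, Any],
                   file_values: Dict[str, Any],
                   cli_flags: Dict[str, Any]) -> Dict[str, Any]:
    """Merge settings so that CLI flags beat the file, and the file beats defaults."""
    merged = dict(defaults)
    merged.update(file_values)
    # A flag left unset on the command line is represented as None and is skipped.
    merged.update({k: v for k, v in cli_flags.items() if v is not None})
    return merged


file_values = {"sql_dir": "examples/warehouse/sql", "include": ["*.sql"]}
cli_flags = {"out_dir": "tmp/out", "sql_dir": None}
print(resolve_config(DEFAULTS, file_values, cli_flags))
```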
252
+
253
+ ## Documentation
254
+
255
+ For detailed information:
269
+ - `docs/overview.md` — what it is, goals, scope
270
+ - `docs/algorithm.md` — how extraction works
271
+ - `docs/lineage_concepts.md` — core concepts with visuals
272
+ - `docs/cli_usage.md` — commands and options
273
+ - `docs/breaking_changes.md` — definition and detection
274
+ - `docs/edge_cases.md` — SELECT *, UNION, temp tables, etc.
275
+ - `docs/advanced_use_cases.md` — tabular functions, procedures returning datasets
276
+ - `docs/adapters.md` — interface and MSSQL specifics
277
+ - `docs/architecture.md` — system and sequence diagrams
278
+ - `docs/configuration.md` — configuration reference
279
+ - `docs/openlineage_mapping.md` — how outputs map to OpenLineage
280
+ - `docs/faq.md` — common questions
281
+ - `docs/dbt_integration.md` — how to use with dbt projects
282
+
283
+
284
+ ## License
285
+ MIT (or your team’s preferred license)
@@ -0,0 +1,15 @@
1
+ infotracker/__init__.py,sha256=XkoK2R_QULA1UDQqgaLbmKQ2bdsi-lO3mo_wi7dy9Gg,57
2
+ infotracker/__main__.py,sha256=_iCom0ddZ1myy6ly3ID1dBlLzzjf7iV7Kq9uUfkat74,121
3
+ infotracker/adapters.py,sha256=UEQeGSS3_fMOc5_Jsrw5aTtmIXlOdqqbHWL2uSgqkGM,3011
4
+ infotracker/cli.py,sha256=PQQoxqSmu8fSFTeGCdLKIKiY7WTcCzddiANYGc1qqe8,5666
5
+ infotracker/config.py,sha256=pYriKZxwHNdU_nkk3n3R6d_h45izUj-BIkpLeFGASzE,1861
6
+ infotracker/diff.py,sha256=LmIl3FL5NVxil6AFefrqQBkCCRonueg6BEXrnleVpw8,19796
7
+ infotracker/engine.py,sha256=JlsrzPoB4Xe4qnTrEZ7emYP0K-zkqTqYOGzZiEZesks,23441
8
+ infotracker/lineage.py,sha256=GcNflXSO5QhqJj9eJewlWwfL_86N4aHdEgoY3ESD6_U,4863
9
+ infotracker/models.py,sha256=aQwU_4V69CnnHdgsybd99uvE3fzoQoW-nwn5aMhxdbU,14796
10
+ infotracker/openlineage_utils.py,sha256=-g9Pkl5hOMQP2Rtu47ItHBC13z6Y0K3gEG6x9GrTJH8,5845
11
+ infotracker/parser.py,sha256=8NVtCMvyt7l_dIfAydR_VJGB7A_NBLb2T827ac8uMXc,70255
12
+ infotracker-0.2.0.dist-info/METADATA,sha256=E36Q9HtPU3eY4lNMRFhwCl3WtpePiwN8KimoEhF-VOo,10449
13
+ infotracker-0.2.0.dist-info/WHEEL,sha256=qtCwoSJWgHk21S1Kb4ihdzI2rlJ1ZKaIurTj_ngOhyQ,87
14
+ infotracker-0.2.0.dist-info/entry_points.txt,sha256=5ulAYRSvW3SohjeMwlYRX6LoWIHkEtc1qnwxWJQgN2Y,59
15
+ infotracker-0.2.0.dist-info/RECORD,,
@@ -1,108 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: InfoTracker
3
- Version: 0.1.0
4
- Summary: Column-level SQL lineage, impact analysis, and breaking-change detection (MS SQL first)
5
- Project-URL: homepage, https://example.com/infotracker
6
- Project-URL: documentation, https://example.com/infotracker/docs
7
- Author: InfoTracker Authors
8
- License: MIT
9
- Keywords: data-lineage,impact-analysis,lineage,mssql,openlineage,sql
10
- Classifier: Environment :: Console
11
- Classifier: License :: OSI Approved :: MIT License
12
- Classifier: Operating System :: OS Independent
13
- Classifier: Programming Language :: Python :: 3
14
- Classifier: Programming Language :: Python :: 3.10
15
- Classifier: Topic :: Database
16
- Classifier: Topic :: Software Development :: Libraries
17
- Requires-Python: <3.13,>=3.10
18
- Requires-Dist: click<9.0.0,>=8.1.3
19
- Requires-Dist: networkx>=3.3
20
- Requires-Dist: packaging>=24.0
21
- Requires-Dist: pydantic>=2.8.2
22
- Requires-Dist: pyyaml>=6.0.1
23
- Requires-Dist: sqlglot>=23.0.0
24
- Requires-Dist: typer[all]==0.12.3
25
- Provides-Extra: dev
26
- Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
27
- Requires-Dist: pytest>=7.4.0; extra == 'dev'
28
- Description-Content-Type: text/markdown
29
-
30
- ### InfoTracker
31
-
32
- This is a Python CLI that extracts column-level lineage from SQL, runs impact analysis, and detects breaking changes. First adapter targets MS SQL.
33
-
34
- #### For Students
35
- Start with a simple command: `infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage`. This analyzes SQL files in the directory.
36
-
37
- #### Setup & Installation
38
- ```bash
39
- # Activate virtual environment first (REQUIRED)
40
- source infotracker-env/bin/activate # or your venv path
41
-
42
- # Install dependencies
43
- pip install -e .
44
-
45
- # Verify installation
46
- infotracker --help
47
- ```
48
-
49
- #### Quickstart
50
- ```bash
51
- # IMPORTANT: Always run InfoTracker commands in the activated virtual environment
52
-
53
- # Extract lineage from all SQL files
54
- infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
55
-
56
- # Impact analysis (downstream dependencies)
57
- infotracker impact -s dbo.fct_sales.Revenue+
58
-
59
- # Impact analysis (upstream sources)
60
- infotracker impact -s +dbo.Orders.OrderID
61
-
62
- # Branch diff for breaking changes
63
- infotracker diff --base main --head feature/x --sql-dir examples/warehouse/sql
64
- ```
65
-
66
- #### Configuration
67
- InfoTracker follows this configuration precedence:
68
- 1. **CLI flags** (highest priority) - override everything
69
- 2. **infotracker.yml** config file - project defaults
70
- 3. **Built-in defaults** (lowest priority) - fallback values
71
-
72
- Create an `infotracker.yml` file in your project root:
73
- ```yaml
74
- default_adapter: mssql
75
- sql_dir: examples/warehouse/sql
76
- out_dir: build/lineage
77
- include: ["*.sql"]
78
- exclude: ["*_wip.sql"]
79
- severity_threshold: BREAKING
80
- ```
81
-
82
- #### Documentation
83
- - `docs/overview.md` — what it is, goals, scope
84
- - `docs/algorithm.md` — how extraction works
85
- - `docs/lineage_concepts.md` — core concepts with visuals
86
- - `docs/cli_usage.md` — commands and options
87
- - `docs/breaking_changes.md` — definition and detection
88
- - `docs/edge_cases.md` — SELECT *, UNION, temp tables, etc.
89
- - `docs/adapters.md` — interface and MSSQL specifics
90
- - `docs/architecture.md` — system and sequence diagrams
91
- - `docs/configuration.md` — configuration reference
92
- - `docs/openlineage_mapping.md` — how outputs map to OpenLineage
93
- - `docs/faq.md` — common questions
94
- - `docs/dbt_integration.md` — how to use with dbt projects
95
-
96
- #### Requirements
97
- - Python 3.10+
98
- - Virtual environment (activated)
99
- - Basic SQL knowledge
100
- - Git and shell
101
-
102
- #### Troubleshooting
103
- - **Error tracebacks on help commands**: Make sure you're running in an activated virtual environment
104
- - **Command not found**: Activate your virtual environment first
105
- - **Import errors**: Ensure all dependencies are installed with `pip install -e .`
106
-
107
- #### License
108
- MIT (or your team’s preferred license)
@@ -1,14 +0,0 @@
1
- infotracker/__init__.py,sha256=XkoK2R_QULA1UDQqgaLbmKQ2bdsi-lO3mo_wi7dy9Gg,57
2
- infotracker/__main__.py,sha256=_iCom0ddZ1myy6ly3ID1dBlLzzjf7iV7Kq9uUfkat74,121
3
- infotracker/adapters.py,sha256=mfP5uep8HN4Xi-78rAKA8PqUY6q17amXLd2Zf0Jl1xU,2535
4
- infotracker/cli.py,sha256=NER5jByK2AwAZKbqdxPsE713TdGkirE_tarKLs-aMjg,4841
5
- infotracker/config.py,sha256=EKlETgAvbSQJbty87TsKMTbFjwI16RMBpOPrduw3pMk,1725
6
- infotracker/diff.py,sha256=ziNFmNW6SEppdeD_iIdEsNHVYHT3AQdB2IQVPzhIRIk,12470
7
- infotracker/engine.py,sha256=xW-10Jfxm0-MvPUBe1il37GBkNdchc8IC0-EZfZzPfk,13557
8
- infotracker/lineage.py,sha256=Gb8NQj-igf8wjdyMhfkSmB1_bVS__cipMivXJCRmiJQ,4645
9
- infotracker/models.py,sha256=cBsGOGmmGFfyz26TE-endA4AZc0qxzVMQ_7nZCJlcVA,10398
10
- infotracker/parser.py,sha256=Mr4pWGCaodwEA6jKzf6HkLHjQGHNaHlqCFXv9jDjVtA,36269
11
- infotracker-0.1.0.dist-info/METADATA,sha256=y9rMRDCAQ8faIngE7fU9gUT-26e6GopjJ-c3AxOM16A,3749
12
- infotracker-0.1.0.dist-info/WHEEL,sha256=qtCwoSJWgHk21S1Kb4ihdzI2rlJ1ZKaIurTj_ngOhyQ,87
13
- infotracker-0.1.0.dist-info/entry_points.txt,sha256=5ulAYRSvW3SohjeMwlYRX6LoWIHkEtc1qnwxWJQgN2Y,59
14
- infotracker-0.1.0.dist-info/RECORD,,