InfoTracker 0.3.0__py3-none-any.whl → 0.3.1__py3-none-any.whl

infotracker/__init__.py CHANGED
@@ -2,5 +2,5 @@ __all__ = [
  "__version__",
  ]
 
- __version__ = "0.1.0"
+ __version__ = "0.3.1"
 
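The hunk above brings the module-level `__version__` in line with the distribution metadata; the 0.3.0 wheel still reported `0.1.0`. As a minimal sketch of how this kind of drift could be caught in a test, the standard-library `importlib.metadata` can be compared against the module attribute. The helper names `installed_version` and `versions_agree` are hypothetical and are not part of InfoTracker.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the version from the installed distribution's METADATA, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def versions_agree(module_version: str, dist_version: Optional[str]) -> bool:
    """True when the module-level __version__ matches the distribution metadata."""
    return dist_version is not None and module_version == dist_version


# Usage, assuming InfoTracker is installed in the current environment:
#   import infotracker
#   versions_agree(infotracker.__version__, installed_version("InfoTracker"))
```

With the values from this diff, the check would have failed for 0.3.0 (`"0.1.0"` against `"0.3.0"`) and would pass for 0.3.1.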
@@ -0,0 +1,301 @@
+ Metadata-Version: 2.4
+ Name: InfoTracker
+ Version: 0.3.1
+ Summary: Column-level SQL lineage, impact analysis, and breaking-change detection (MS SQL first)
+ Project-URL: homepage, https://example.com/infotracker
+ Project-URL: documentation, https://example.com/infotracker/docs
+ Author: InfoTracker Authors
+ License: MIT
+ Keywords: data-lineage,impact-analysis,lineage,mssql,openlineage,sql
+ Classifier: Environment :: Console
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Topic :: Database
+ Classifier: Topic :: Software Development :: Libraries
+ Requires-Python: >=3.10
+ Requires-Dist: click
+ Requires-Dist: networkx>=3.3
+ Requires-Dist: packaging>=24.0
+ Requires-Dist: pydantic>=2.8.2
+ Requires-Dist: pyyaml>=6.0.1
+ Requires-Dist: rich
+ Requires-Dist: shellingham
+ Requires-Dist: sqlglot>=23.0.0
+ Requires-Dist: typer
+ Provides-Extra: dev
+ Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
+ Requires-Dist: pytest>=7.4.0; extra == 'dev'
+ Description-Content-Type: text/markdown
+
+ # InfoTracker
+
+ **Column-level SQL lineage extraction and impact analysis for MS SQL Server**
+
+ InfoTracker is a powerful command-line tool that parses T-SQL files and generates detailed column-level lineage in OpenLineage format. It supports advanced SQL Server features including table-valued functions, stored procedures, temp tables, and EXEC patterns.
+
+ [![Python](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://python.org)
+ [![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
+ [![PyPI](https://img.shields.io/badge/PyPI-InfoTracker-blue.svg)](https://pypi.org/project/InfoTracker/)
+
+ ## 🚀 Features
+
+ - **Column-level lineage** - Track data flow at the column level with precise transformations
+ - **Advanced SQL support** - T-SQL dialect with temp tables, variables, CTEs, and window functions
+ - **Impact analysis** - Find upstream and downstream dependencies with flexible selectors
+ - **Wildcard matching** - Support for table wildcards (`schema.table.*`) and column wildcards (`..pattern`)
+ - **Breaking change detection** - Detect schema changes that could break downstream processes
+ - **Multiple output formats** - Text tables or JSON for integration with other tools
+ - **OpenLineage compatible** - Standard format for data lineage interoperability
+ - **Advanced SQL objects** - Table-valued functions (TVF) and dataset-returning procedures
+ - **Temp table tracking** - Full lineage through EXEC into temp tables
+
+ ## 📦 Installation
+
+ ### From PyPI (Recommended)
+ ```bash
+ pip install InfoTracker
+ ```
+
+ ### From GitHub
+ ```bash
+ # Latest stable release
+ pip install git+https://github.com/InfoMatePL/InfoTracker.git
+
+ # Development version
+ git clone https://github.com/InfoMatePL/InfoTracker.git
+ cd InfoTracker
+ pip install -e .
+ ```
+
+ ### Verify Installation
+ ```bash
+ infotracker --help
+ ```
+
+ ## ⚡ Quick Start
+
+ ### 1. Extract Lineage
+ ```bash
+ # Extract lineage from SQL files
+ infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
+ ```
+
+ ### 2. Run Impact Analysis
+ ```bash
+ # Find what feeds into a column (upstream)
+ infotracker impact -s "+STG.dbo.Orders.OrderID"
+
+ # Find what uses a column (downstream)
+ infotracker impact -s "STG.dbo.Orders.OrderID+"
+
+ # Both directions
+ infotracker impact -s "+dbo.fct_sales.Revenue+"
+ ```
+
+ ### 3. Detect Breaking Changes
+ ```bash
+ # Compare two versions of your schema
+ infotracker diff --base build/lineage --head build/lineage_new
+ ```
+ ## 📖 Selector Syntax
+
+ InfoTracker supports flexible column selectors for precise impact analysis:
+
+ | Selector Format | Description | Example |
+ |-----------------|-------------|---------|
+ | `table.column` | Simple format (adds default `dbo` schema) | `Orders.OrderID` |
+ | `schema.table.column` | Schema-qualified format | `dbo.Orders.OrderID` |
+ | `database.schema.table.column` | Database-qualified format | `STG.dbo.Orders.OrderID` |
+ | `schema.table.*` | Table wildcard (all columns) | `dbo.fct_sales.*` |
+ | `..pattern` | Column wildcard (name contains pattern) | `..revenue` |
+ | `..pattern*` | Column wildcard with fnmatch | `..customer*` |
+
+ ### Direction Control
+ - `selector` - downstream dependencies (default)
+ - `+selector` - upstream sources
+ - `selector+` - downstream dependencies (explicit)
+ - `+selector+` - both upstream and downstream
+
+ ## 💡 Examples
+
+ ### Basic Usage
+ ```bash
+ # Extract lineage first (always run this before impact analysis)
+ infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
+
+ # Basic column lineage
+ infotracker impact -s "+dbo.fct_sales.Revenue" # What feeds this column?
+ infotracker impact -s "STG.dbo.Orders.OrderID+" # What uses this column?
+ ```
+
+ ### Wildcard Selectors
+ ```bash
+ # All columns from a specific table
+ infotracker impact -s "dbo.fct_sales.*"
+ infotracker impact -s "STG.dbo.Orders.*"
+
+ # Find all columns containing "revenue" (case-insensitive)
+ infotracker impact -s "..revenue"
+
+ # Find all columns starting with "customer"
+ infotracker impact -s "..customer*"
+ ```
+
+ ### Advanced SQL Objects
+ ```bash
+ # Table-valued function columns (upstream)
+ infotracker impact -s "+dbo.fn_customer_orders_tvf.*"
+
+ # Procedure dataset columns (upstream)
+ infotracker impact -s "+dbo.usp_customer_metrics_dataset.*"
+
+ # Temp table lineage from EXEC
+ infotracker impact -s "+#temp_table.*"
+ ```
+
+ ### Output Formats
+ ```bash
+ # Text output (default, human-readable)
+ infotracker impact -s "+..revenue"
+
+ # JSON output (machine-readable)
+ infotracker --format json impact -s "..customer*" > customer_lineage.json
+
+ # Control traversal depth
+ infotracker impact -s "+dbo.Orders.OrderID" --max-depth 2
+ ```
+
+ ### Breaking Change Detection
+ ```bash
+ # Extract baseline
+ infotracker extract --sql-dir sql_v1 --out-dir build/baseline
+
+ # Extract new version
+ infotracker extract --sql-dir sql_v2 --out-dir build/current
+
+ # Detect breaking changes
+ infotracker diff --base build/baseline --head build/current
+
+ # Filter by severity
+ infotracker diff --base build/baseline --head build/current --threshold BREAKING
+ ```
+
+
+ ## Output Format
+
+ Impact analysis returns these columns:
+ - **from** - Source column (fully qualified)
+ - **to** - Target column (fully qualified)
+ - **direction** - `upstream` or `downstream`
+ - **transformation** - Type of transformation (`IDENTITY`, `ARITHMETIC`, `AGGREGATION`, `CASE_AGGREGATION`, `DATE_FUNCTION`, `WINDOW`, etc.)
+ - **description** - Human-readable transformation description
+
+ Results are automatically deduplicated. Use `--format json` for machine-readable output.
+
+ ### New Transformation Types
+
+ The enhanced transformation taxonomy includes:
+ - `ARITHMETIC_AGGREGATION` - Arithmetic operations combined with aggregation functions
+ - `COMPLEX_AGGREGATION` - Multi-step calculations involving multiple aggregations
+ - `DATE_FUNCTION` - Date/time calculations like DATEDIFF, DATEADD
+ - `DATE_FUNCTION_AGGREGATION` - Date functions applied to aggregated results
+ - `CASE_AGGREGATION` - CASE statements applied to aggregated results
+
+ ### Advanced Object Support
+
+ InfoTracker now supports advanced SQL Server objects:
+
+ **Table-Valued Functions (TVF):**
+ - Inline TVF (`RETURN AS SELECT`) - Parsed directly from SELECT statement
+ - Multi-statement TVF (`RETURN @table TABLE`) - Extracts schema from table variable definition
+ - Function parameters are tracked as filter metadata (don't create columns)
+
+ **Dataset-Returning Procedures:**
+ - Procedures ending with SELECT statement are treated as dataset sources
+ - Output schema extracted from the final SELECT statement
+ - Parameters tracked as filter metadata affecting lineage scope
+
+ **EXEC into Temp Tables:**
+ - `INSERT INTO #temp EXEC procedure` patterns create edges from procedure columns to temp table columns
+ - Temp table lineage propagates downstream to final targets
+ - Supports complex workflow patterns combining functions, procedures, and temp tables
+
+ ## Configuration
+
+ InfoTracker follows this configuration precedence:
+ 1. **CLI flags** (highest priority) - override everything
+ 2. **infotracker.yml** config file - project defaults
+ 3. **Built-in defaults** (lowest priority) - fallback values
+
+ ## 🔧 Configuration
+
+ Create an `infotracker.yml` file in your project root:
+
+ ```yaml
+ sql_dirs:
+ - "sql/"
+ - "models/"
+ out_dir: "build/lineage"
+ exclude_dirs:
+ - "__pycache__"
+ - ".git"
+ severity_threshold: "POTENTIALLY_BREAKING"
+ ```
+
+ ### Configuration Options
+
+ | Setting | Description | Default | Examples |
+ |---------|-------------|---------|----------|
+ | `sql_dirs` | Directories to scan for SQL files | `["."]` | `["sql/", "models/"]` |
+ | `out_dir` | Output directory for lineage files | `"lineage"` | `"build/artifacts"` |
+ | `exclude_dirs` | Directories to skip | `[]` | `["__pycache__", "node_modules"]` |
+ | `severity_threshold` | Breaking change detection level | `"NON_BREAKING"` | `"BREAKING"` |
+
+ ## 📚 Documentation
+
+ - **[Architecture](docs/architecture.md)** - Core concepts and design
+ - **[Lineage Concepts](docs/lineage_concepts.md)** - Data lineage fundamentals
+ - **[CLI Usage](docs/cli_usage.md)** - Complete command reference
+ - **[Configuration](docs/configuration.md)** - Advanced configuration options
+ - **[DBT Integration](docs/dbt_integration.md)** - Using with DBT projects
+ - **[OpenLineage Mapping](docs/openlineage_mapping.md)** - Output format specification
+ - **[Breaking Changes](docs/breaking_changes.md)** - Change detection and severity levels
+ - **[Advanced Use Cases](docs/advanced_use_cases.md)** - TVFs, stored procedures, and complex scenarios
+ - **[Edge Cases](docs/edge_cases.md)** - SELECT *, UNION, temp tables handling
+ - **[FAQ](docs/faq.md)** - Common questions and troubleshooting
+
+ ## 🧪 Testing
+
+ ```bash
+ # Run all tests
+ pytest
+
+ # Run specific test categories
+ pytest tests/test_parser.py # Parser functionality
+ pytest tests/test_wildcard.py # Wildcard selectors
+ pytest tests/test_adapter.py # SQL dialect adapters
+
+ # Run with coverage
+ pytest --cov=infotracker --cov-report=html
+ ```
+
+
+
+
+
+ ## 📄 License
+
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+ ## 🙏 Acknowledgments
+
+ - [SQLGlot](https://github.com/tobymao/sqlglot) - SQL parsing library
+ - [OpenLineage](https://openlineage.io/) - Data lineage standard
+ - [Typer](https://typer.tiangolo.com/) - CLI framework
+ - [Rich](https://rich.readthedocs.io/) - Terminal formatting
+
+ ---
+
+ **InfoTracker** - Making database schema evolution safer, one column at a time. 🎯
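The 0.3.1 README added above states a three-level configuration precedence: CLI flags, then `infotracker.yml`, then built-in defaults. The following is a minimal sketch of that layering, not InfoTracker's actual loader. The function name `resolve_config` and the convention that an unset CLI flag is `None` are assumptions made for illustration; the default values are copied from the README's "Configuration Options" table.

```python
from typing import Any, Dict, Optional

# Built-in defaults as listed in the README's "Configuration Options" table.
DEFAULTS: Dict[str, Any] = {
    "sql_dirs": ["."],
    "out_dir": "lineage",
    "exclude_dirs": [],
    "severity_threshold": "NON_BREAKING",
}


def resolve_config(
    cli_flags: Dict[str, Any],
    file_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge settings so that CLI flags beat the config file, which beats defaults."""
    resolved = dict(DEFAULTS)            # lowest priority
    resolved.update(file_config or {})   # values parsed from infotracker.yml
    # Assumption: a flag the user did not pass is represented as None and is skipped.
    resolved.update({k: v for k, v in cli_flags.items() if v is not None})
    return resolved
```

For example, passing `{"out_dir": "build/current"}` as CLI flags together with a file config that sets `out_dir: "build/lineage"` resolves to `build/current`, while settings absent from both fall back to the table's defaults.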
@@ -1,4 +1,4 @@
- infotracker/__init__.py,sha256=XkoK2R_QULA1UDQqgaLbmKQ2bdsi-lO3mo_wi7dy9Gg,57
+ infotracker/__init__.py,sha256=TU6dd-1zoswGqK5zIl_o01msZ-pQGxHJlynPUYSYwXY,57
  infotracker/__main__.py,sha256=_iCom0ddZ1myy6ly3ID1dBlLzzjf7iV7Kq9uUfkat74,121
  infotracker/adapters.py,sha256=UEQeGSS3_fMOc5_Jsrw5aTtmIXlOdqqbHWL2uSgqkGM,3011
  infotracker/cli.py,sha256=Hvid6PuMcygUj4Uxor4iBD5OLkfz_LJ249V0UZpwk8A,6181
@@ -10,7 +10,7 @@ infotracker/models.py,sha256=d7EIjOm3evI8YekQWgLE0L1cWiOcU0F34-XdqxBkcTk,18332
  infotracker/openlineage_utils.py,sha256=-g9Pkl5hOMQP2Rtu47ItHBC13z6Y0K3gEG6x9GrTJH8,5845
  infotracker/parser.py,sha256=-zz_bmc4Rkb-hT_eDIvvpWxFtdyGFMKcRun9raNX4AY,71335
  infotracker/infotracker.yml,sha256=iRrrrUkdLCvEhw4DHqPnMchDlsJWI3xIJEpwevNU9sg,998
- infotracker-0.3.0.dist-info/METADATA,sha256=1QeaLFLL2redY2HD1Xn977cvSUBRQ6izbfZh6Vwmw3w,10449
- infotracker-0.3.0.dist-info/WHEEL,sha256=qtCwoSJWgHk21S1Kb4ihdzI2rlJ1ZKaIurTj_ngOhyQ,87
- infotracker-0.3.0.dist-info/entry_points.txt,sha256=5ulAYRSvW3SohjeMwlYRX6LoWIHkEtc1qnwxWJQgN2Y,59
- infotracker-0.3.0.dist-info/RECORD,,
+ infotracker-0.3.1.dist-info/METADATA,sha256=dLhABRKb7FaHcmCW0HTwYZnJHlbIZHMHIqSD-sy7KM4,10487
+ infotracker-0.3.1.dist-info/WHEEL,sha256=qtCwoSJWgHk21S1Kb4ihdzI2rlJ1ZKaIurTj_ngOhyQ,87
+ infotracker-0.3.1.dist-info/entry_points.txt,sha256=5ulAYRSvW3SohjeMwlYRX6LoWIHkEtc1qnwxWJQgN2Y,59
+ infotracker-0.3.1.dist-info/RECORD,,
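Each RECORD row above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed. That explains why `infotracker/__init__.py` keeps its size of 57 bytes while its hash changes: the version string was swapped for another of the same length. The sketch below reproduces the row format with the standard library; the function name `record_entry` is hypothetical.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build a wheel RECORD row for a file given its archive path and raw bytes."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses URL-safe base64 without the trailing '=' padding.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"
```

Running it over the bytes of a file extracted from the wheel should reproduce the corresponding row, which gives a quick way to confirm that an unpacked archive matches its RECORD.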
@@ -1,285 +0,0 @@
- Metadata-Version: 2.4
- Name: InfoTracker
- Version: 0.3.0
- Summary: Column-level SQL lineage, impact analysis, and breaking-change detection (MS SQL first)
- Project-URL: homepage, https://example.com/infotracker
- Project-URL: documentation, https://example.com/infotracker/docs
- Author: InfoTracker Authors
- License: MIT
- Keywords: data-lineage,impact-analysis,lineage,mssql,openlineage,sql
- Classifier: Environment :: Console
- Classifier: License :: OSI Approved :: MIT License
- Classifier: Operating System :: OS Independent
- Classifier: Programming Language :: Python :: 3
- Classifier: Programming Language :: Python :: 3.10
- Classifier: Topic :: Database
- Classifier: Topic :: Software Development :: Libraries
- Requires-Python: >=3.10
- Requires-Dist: click
- Requires-Dist: networkx>=3.3
- Requires-Dist: packaging>=24.0
- Requires-Dist: pydantic>=2.8.2
- Requires-Dist: pyyaml>=6.0.1
- Requires-Dist: rich
- Requires-Dist: shellingham
- Requires-Dist: sqlglot>=23.0.0
- Requires-Dist: typer
- Provides-Extra: dev
- Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
- Requires-Dist: pytest>=7.4.0; extra == 'dev'
- Description-Content-Type: text/markdown
-
- # InfoTracker
-
- Column-level SQL lineage extraction and impact analysis for MS SQL Server
-
- ## Features
-
- - **Column-level lineage** - Track data flow at the column level
- - **Parse SQL files** and generate OpenLineage-compatible JSON
- - **Impact analysis** - Find upstream and downstream column dependencies with flexible selectors
- - **Wildcard matching** - Support for table wildcards (`schema.table.*`) and column wildcards (`..pattern`)
- - **Direction control** - Query upstream (`+selector`), downstream (`selector+`), or both (`+selector+`)
- - **Configurable depth** - Control traversal depth with `--max-depth`
- - **Multiple output formats** - Text tables or JSON for scripting
- - **MSSQL support** - T-SQL dialect with temp tables, variables, and stored procedures
- - **Advanced SQL objects** - Support for table-valued functions (TVF) and dataset-returning procedures
- - **Temp table lineage** - Track EXEC into temp tables and propagate lineage downstream
-
- ## Requirements
- - Python 3.10+
- - Virtual environment (activated)
- - Basic SQL knowledge
- - Git and shell
-
- ## Troubleshooting
- - **Error tracebacks on help commands**: Make sure you're running in an activated virtual environment
- - **Command not found**: Activate your virtual environment first
- - **Import errors**: Ensure all dependencies are installed with `pip install -e .`
- - **Column not found**: Use full URI format or check column_graph.json for exact names
-
- ## Quickstart
-
- ### Setup & Installation
- ```bash
- # Activate virtual environment first (REQUIRED)
-
- # Install dependencies
- pip install -e .
-
- # Verify installation
- infotracker --help
- ```
-
- ### Basic Usage
- ```bash
- # 1. Extract lineage from SQL files (builds column graph)
- infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
-
- # 2. Run impact analysis
- infotracker impact -s "STG.dbo.Orders.OrderID" # downstream dependencies
- infotracker impact -s "+STG.dbo.Orders.OrderID" # upstream sources
- ```
-
- ## Selector Syntax
-
- InfoTracker supports flexible column selectors:
-
- | Selector Format | Description | Example |
- |-----------------|-------------|---------|
- | `table.column` | Simple format (adds default `dbo` schema) | `Orders.OrderID` |
- | `schema.table.column` | Schema-qualified format | `dbo.Orders.OrderID` |
- | `database.schema.table.column` | Database-qualified format | `STG.dbo.Orders.OrderID` |
- | `schema.table.*` | Table wildcard (all columns) | `dbo.fct_sales.*` |
- | `..pattern` | Column wildcard (name contains pattern) | `..revenue` |
- | `.pattern` | Alias for column wildcard | `.orderid` |
- | Full URI | Complete namespace format | `mssql://localhost/InfoTrackerDW.STG.dbo.Orders.OrderID` |
-
- ### Direction Control
- - `selector` - downstream dependencies (default)
- - `+selector` - upstream sources
- - `selector+` - downstream dependencies (explicit)
- - `+selector+` - both upstream and downstream
-
- ### Selector Cheat Sheet
-
- **Table wildcards:**
- ```bash
- # All columns from a specific table
- infotracker impact -s "dbo.fct_sales.*"
- infotracker impact -s "STG.dbo.Orders.*"
- ```
-
- **Column name matching:**
- ```bash
- # Find all columns containing "revenue" (case-insensitive)
- infotracker impact -s "..revenue"
-
- # Find all columns containing "id"
- infotracker impact -s "..id"
-
- # Use wildcards for pattern matching
- infotracker impact -s "..customer*"
- ```
-
- **Direction examples:**
- ```bash
- # Upstream: what feeds into this column
- infotracker impact -s "+dbo.fct_sales.Revenue"
-
- # Downstream: what uses this column
- infotracker impact -s "STG.dbo.Orders.OrderID+"
-
- # Both directions
- infotracker impact -s "+dbo.dim_customer.CustomerID+"
- ```
-
- **Advanced SQL objects:**
- ```bash
- # Table-valued function columns (upstream)
- infotracker impact -s "+dbo.fn_customer_orders_tvf.*"
-
- # Procedure dataset columns (upstream)
- infotracker impact -s "+dbo.usp_customer_metrics_dataset.*"
-
- # Temp table lineage from EXEC
- infotracker impact -s "+#temp_table.*"
- ```
-
- ## Examples
-
- ```bash
- # Extract lineage (run this first)
- infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
-
- # Basic column lineage
- infotracker impact -s "+dbo.fct_sales.Revenue" # upstream sources
- infotracker impact -s "STG.dbo.Orders.OrderID+" # downstream usage
-
- # Wildcard selectors
- infotracker impact -s "+..revenue+" # all revenue columns (both directions)
- infotracker impact -s "dbo.fct_sales.*" # all columns from table
- infotracker --format json impact -s "..customer*" # customer columns (JSON output)
-
- # Advanced SQL objects (NEW)
- infotracker impact -s "+dbo.fn_customer_orders_tvf.*" # TVF columns (upstream)
- infotracker impact -s "+dbo.usp_customer_metrics_dataset.*" # procedure columns (upstream)
-
- # Depth control
- infotracker impact -s "+dbo.Orders.OrderID" --max-depth 1
-
- # Demo the new features with the included examples
- infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
- infotracker impact -s "+dbo.fn_customer_orders_inline.*"
- infotracker impact -s "+dbo.usp_customer_metrics_dataset.TotalRevenue"
- ```
-
- ### Copy-Paste Demo Commands
-
- Test the new TVF and procedure lineage features:
-
- ```bash
- # 1. Extract all lineage (including new TVF/procedure support)
- infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
-
- # 2. Test TVF lineage
- infotracker --format text impact -s "+dbo.fn_customer_orders_tvf.*"
-
- # 3. Test procedure lineage
- infotracker --format text impact -s "+dbo.usp_customer_metrics_dataset.*"
-
- # 4. Test column name contains wildcard
- infotracker --format text impact -s "+..revenue"
-
- # 5. Show results in JSON format
- infotracker --format json impact -s "..total*" > tvf_lineage.json
- ```
-
- ## Output Format
-
- Impact analysis returns these columns:
- - **from** - Source column (fully qualified)
- - **to** - Target column (fully qualified)
- - **direction** - `upstream` or `downstream`
- - **transformation** - Type of transformation (`IDENTITY`, `ARITHMETIC`, `AGGREGATION`, `CASE_AGGREGATION`, `DATE_FUNCTION`, `WINDOW`, etc.)
- - **description** - Human-readable transformation description
-
- Results are automatically deduplicated. Use `--format json` for machine-readable output.
-
- ### New Transformation Types
-
- The enhanced transformation taxonomy includes:
- - `ARITHMETIC_AGGREGATION` - Arithmetic operations combined with aggregation functions
- - `COMPLEX_AGGREGATION` - Multi-step calculations involving multiple aggregations
- - `DATE_FUNCTION` - Date/time calculations like DATEDIFF, DATEADD
- - `DATE_FUNCTION_AGGREGATION` - Date functions applied to aggregated results
- - `CASE_AGGREGATION` - CASE statements applied to aggregated results
-
- ### Advanced Object Support
-
- InfoTracker now supports advanced SQL Server objects:
-
- **Table-Valued Functions (TVF):**
- - Inline TVF (`RETURN AS SELECT`) - Parsed directly from SELECT statement
- - Multi-statement TVF (`RETURN @table TABLE`) - Extracts schema from table variable definition
- - Function parameters are tracked as filter metadata (don't create columns)
-
- **Dataset-Returning Procedures:**
- - Procedures ending with SELECT statement are treated as dataset sources
- - Output schema extracted from the final SELECT statement
- - Parameters tracked as filter metadata affecting lineage scope
-
- **EXEC into Temp Tables:**
- - `INSERT INTO #temp EXEC procedure` patterns create edges from procedure columns to temp table columns
- - Temp table lineage propagates downstream to final targets
- - Supports complex workflow patterns combining functions, procedures, and temp tables
-
- ## Configuration
-
- InfoTracker follows this configuration precedence:
- 1. **CLI flags** (highest priority) - override everything
- 2. **infotracker.yml** config file - project defaults
- 3. **Built-in defaults** (lowest priority) - fallback values
-
- Create an `infotracker.yml` file in your project root:
- ```yaml
- default_adapter: mssql
- sql_dir: examples/warehouse/sql
- out_dir: build/lineage
- include: ["*.sql"]
- exclude: ["*_wip.sql"]
- ```
-
- ## Documentation
-
- For detailed information:
- - `docs/overview.md` — what it is, goals, scope
- - `docs/algorithm.md` — how extraction works
- - `docs/lineage_concepts.md` — core concepts with visuals
- - `docs/cli_usage.md` — commands and options
- - `docs/breaking_changes.md` — definition and detection
- - `docs/edge_cases.md` — SELECT *, UNION, temp tables, etc.
- - `docs/adapters.md` — interface and MSSQL specifics
- - `docs/architecture.md` — system and sequence diagrams
- - `docs/configuration.md` — configuration reference
- - `docs/openlineage_mapping.md` — how outputs map to OpenLineage
- - `docs/faq.md` — common questions
-
- #### Documentation
- - `docs/overview.md` — what it is, goals, scope
- - `docs/algorithm.md` — how extraction works
- - `docs/lineage_concepts.md` — core concepts with visuals
- - `docs/cli_usage.md` — commands and options
- - `docs/breaking_changes.md` — definition and detection
- - `docs/edge_cases.md` — SELECT *, UNION, temp tables, etc.
- - `docs/advanced_use_cases.md` — tabular functions, procedures returning datasets
- - `docs/adapters.md` — interface and MSSQL specifics
- - `docs/architecture.md` — system and sequence diagrams
- - `docs/configuration.md` — configuration reference
- - `docs/openlineage_mapping.md` — how outputs map to OpenLineage
- - `docs/faq.md` — common questions
- - `docs/dbt_integration.md` — how to use with dbt projects
-
-
- ## License
- MIT (or your team’s preferred license)
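Both README versions in this diff carry the same "Direction Control" rules for the `-s` selector: a leading `+` requests upstream sources, a trailing `+` requests downstream dependencies, and a bare selector defaults to downstream. The sketch below shows one way those four cases could be decoded. It is an illustration only, not InfoTracker's parser; the function name `split_direction` and its return shape are hypothetical.

```python
from typing import Tuple


def split_direction(selector: str) -> Tuple[str, bool, bool]:
    """Return (bare_selector, upstream, downstream) for a '+'-decorated selector."""
    upstream = selector.startswith("+")
    # A lone "+" is treated as a leading marker only, never as a trailing one.
    trailing = len(selector) > 1 and selector.endswith("+")
    bare = selector[1:] if upstream else selector
    if trailing:
        bare = bare[:-1]
    # A selector with no markers defaults to downstream, as the README states.
    downstream = trailing or not upstream
    return bare, upstream, downstream
```

Under these rules `+dbo.fct_sales.Revenue+` yields the bare selector `dbo.fct_sales.Revenue` with both directions enabled, while `STG.dbo.Orders.OrderID` and `STG.dbo.Orders.OrderID+` both resolve to downstream only.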