InfoTracker 0.3.0 (py3-none-any.whl) → 0.3.1 (py3-none-any.whl)
- infotracker/__init__.py +1 -1
- infotracker-0.3.1.dist-info/METADATA +301 -0
- {infotracker-0.3.0.dist-info → infotracker-0.3.1.dist-info}/RECORD +5 -5
- infotracker-0.3.0.dist-info/METADATA +0 -285
- {infotracker-0.3.0.dist-info → infotracker-0.3.1.dist-info}/WHEEL +0 -0
- {infotracker-0.3.0.dist-info → infotracker-0.3.1.dist-info}/entry_points.txt +0 -0
infotracker/__init__.py
CHANGED
infotracker-0.3.1.dist-info/METADATA
ADDED
@@ -0,0 +1,301 @@
+Metadata-Version: 2.4
+Name: InfoTracker
+Version: 0.3.1
+Summary: Column-level SQL lineage, impact analysis, and breaking-change detection (MS SQL first)
+Project-URL: homepage, https://example.com/infotracker
+Project-URL: documentation, https://example.com/infotracker/docs
+Author: InfoTracker Authors
+License: MIT
+Keywords: data-lineage,impact-analysis,lineage,mssql,openlineage,sql
+Classifier: Environment :: Console
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Topic :: Database
+Classifier: Topic :: Software Development :: Libraries
+Requires-Python: >=3.10
+Requires-Dist: click
+Requires-Dist: networkx>=3.3
+Requires-Dist: packaging>=24.0
+Requires-Dist: pydantic>=2.8.2
+Requires-Dist: pyyaml>=6.0.1
+Requires-Dist: rich
+Requires-Dist: shellingham
+Requires-Dist: sqlglot>=23.0.0
+Requires-Dist: typer
+Provides-Extra: dev
+Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
+Requires-Dist: pytest>=7.4.0; extra == 'dev'
+Description-Content-Type: text/markdown
+
+# InfoTracker
+
+**Column-level SQL lineage extraction and impact analysis for MS SQL Server**
+
+InfoTracker is a powerful command-line tool that parses T-SQL files and generates detailed column-level lineage in OpenLineage format. It supports advanced SQL Server features including table-valued functions, stored procedures, temp tables, and EXEC patterns.
+
+[](https://python.org)
+[](LICENSE)
+[](https://pypi.org/project/InfoTracker/)
+
+## 🚀 Features
+
+- **Column-level lineage** - Track data flow at the column level with precise transformations
+- **Advanced SQL support** - T-SQL dialect with temp tables, variables, CTEs, and window functions
+- **Impact analysis** - Find upstream and downstream dependencies with flexible selectors
+- **Wildcard matching** - Support for table wildcards (`schema.table.*`) and column wildcards (`..pattern`)
+- **Breaking change detection** - Detect schema changes that could break downstream processes
+- **Multiple output formats** - Text tables or JSON for integration with other tools
+- **OpenLineage compatible** - Standard format for data lineage interoperability
+- **Advanced SQL objects** - Table-valued functions (TVF) and dataset-returning procedures
+- **Temp table tracking** - Full lineage through EXEC into temp tables
+
+## 📦 Installation
+
+### From PyPI (Recommended)
+```bash
+pip install InfoTracker
+```
+
+### From GitHub
+```bash
+# Latest stable release
+pip install git+https://github.com/InfoMatePL/InfoTracker.git
+
+# Development version
+git clone https://github.com/InfoMatePL/InfoTracker.git
+cd InfoTracker
+pip install -e .
+```
+
+### Verify Installation
+```bash
+infotracker --help
+```
+
+## ⚡ Quick Start
+
+### 1. Extract Lineage
+```bash
+# Extract lineage from SQL files
+infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
+```
+
+### 2. Run Impact Analysis
+```bash
+# Find what feeds into a column (upstream)
+infotracker impact -s "+STG.dbo.Orders.OrderID"
+
+# Find what uses a column (downstream)
+infotracker impact -s "STG.dbo.Orders.OrderID+"
+
+# Both directions
+infotracker impact -s "+dbo.fct_sales.Revenue+"
+```
+
+### 3. Detect Breaking Changes
+```bash
+# Compare two versions of your schema
+infotracker diff --base build/lineage --head build/lineage_new
+```
+## 📖 Selector Syntax
+
+InfoTracker supports flexible column selectors for precise impact analysis:
+
+| Selector Format | Description | Example |
+|-----------------|-------------|---------|
+| `table.column` | Simple format (adds default `dbo` schema) | `Orders.OrderID` |
+| `schema.table.column` | Schema-qualified format | `dbo.Orders.OrderID` |
+| `database.schema.table.column` | Database-qualified format | `STG.dbo.Orders.OrderID` |
+| `schema.table.*` | Table wildcard (all columns) | `dbo.fct_sales.*` |
+| `..pattern` | Column wildcard (name contains pattern) | `..revenue` |
+| `..pattern*` | Column wildcard with fnmatch | `..customer*` |
+
+### Direction Control
+- `selector` - downstream dependencies (default)
+- `+selector` - upstream sources
+- `selector+` - downstream dependencies (explicit)
+- `+selector+` - both upstream and downstream
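The direction markers listed above can be separated from the selector body in a few lines. This is a minimal sketch for illustration: `parse_direction` is a hypothetical helper, not InfoTracker's own parser, and it assumes that a leading or trailing `+` is always a direction marker and never part of a column name.

```python
def parse_direction(selector: str) -> tuple[str, set[str]]:
    """Split a selector into its body and the requested directions.

    Hypothetical helper for illustration; not part of InfoTracker's API.
    """
    upstream = selector.startswith("+")
    # A lone "+" is treated as an upstream marker with an empty body.
    downstream = len(selector) > 1 and selector.endswith("+")

    body = selector[1:] if upstream else selector
    if downstream:
        body = body[:-1]

    directions = set()
    if upstream:
        directions.add("upstream")
    if downstream or not upstream:
        # A bare selector defaults to downstream, as the list above states.
        directions.add("downstream")
    return body, directions


print(parse_direction("+dbo.fct_sales.Revenue+"))
```

The bare form and the explicit `selector+` form resolve to the same result here, which mirrors the "default" and "explicit" wording in the README.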
+
+## 💡 Examples
+
+### Basic Usage
+```bash
+# Extract lineage first (always run this before impact analysis)
+infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
+
+# Basic column lineage
+infotracker impact -s "+dbo.fct_sales.Revenue" # What feeds this column?
+infotracker impact -s "STG.dbo.Orders.OrderID+" # What uses this column?
+```
+
+### Wildcard Selectors
+```bash
+# All columns from a specific table
+infotracker impact -s "dbo.fct_sales.*"
+infotracker impact -s "STG.dbo.Orders.*"
+
+# Find all columns containing "revenue" (case-insensitive)
+infotracker impact -s "..revenue"
+
+# Find all columns starting with "customer"
+infotracker impact -s "..customer*"
+```
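The two column-wildcard forms shown above (substring match for `..pattern`, glob match for `..pattern*`) can be illustrated with the standard-library `fnmatch` module. The sketch below is an assumption about the semantics, not InfoTracker's implementation: `match_columns` is a hypothetical name, the sample column list is made up, and applying case-insensitivity to the glob form is a guess based on the "case-insensitive" comment in the README.

```python
from fnmatch import fnmatchcase


def match_columns(selector: str, columns: list[str]) -> list[str]:
    """Return the qualified columns whose final name part matches `..pattern`.

    Hypothetical helper for illustration; not part of InfoTracker's API.
    """
    if not selector.startswith(".."):
        raise ValueError("expected a column wildcard such as '..revenue'")
    pattern = selector[2:].lower()
    uses_glob = any(ch in pattern for ch in "*?[")

    matched = []
    for qualified in columns:
        # Compare against the column name only, lower-cased on both sides.
        name = qualified.rsplit(".", 1)[-1].lower()
        if uses_glob:
            hit = fnmatchcase(name, pattern)
        else:
            hit = pattern in name
        if hit:
            matched.append(qualified)
    return matched


SAMPLE_COLUMNS = [
    "dbo.fct_sales.Revenue",
    "dbo.fct_sales.TotalRevenue",
    "dbo.dim_customer.CustomerID",
    "dbo.dim_customer.CustomerName",
    "STG.dbo.Orders.OrderID",
]

print(match_columns("..revenue", SAMPLE_COLUMNS))
print(match_columns("..customer*", SAMPLE_COLUMNS))
```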
+
+### Advanced SQL Objects
+```bash
+# Table-valued function columns (upstream)
+infotracker impact -s "+dbo.fn_customer_orders_tvf.*"
+
+# Procedure dataset columns (upstream)
+infotracker impact -s "+dbo.usp_customer_metrics_dataset.*"
+
+# Temp table lineage from EXEC
+infotracker impact -s "+#temp_table.*"
+```
+
+### Output Formats
+```bash
+# Text output (default, human-readable)
+infotracker impact -s "+..revenue"
+
+# JSON output (machine-readable)
+infotracker --format json impact -s "..customer*" > customer_lineage.json
+
+# Control traversal depth
+infotracker impact -s "+dbo.Orders.OrderID" --max-depth 2
+```
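A depth limit such as `--max-depth 2` can be pictured as a breadth-first walk over column-to-column edges that stops expanding after a fixed number of hops. The sketch below is a standard-library illustration under stated assumptions: `reachable` and the `EDGES` sample are hypothetical, and reading "depth" as "number of hops from the selected column" is a guess, since the README does not define it.

```python
from collections import deque


def reachable(edges, start, direction="downstream", max_depth=None):
    """Breadth-first walk over (source, target) column edges.

    Returns (column, hops) pairs in visiting order. Hypothetical helper
    for illustration; not part of InfoTracker's API.
    """
    neighbors = {}
    for src, dst in edges:
        if direction == "downstream":
            neighbors.setdefault(src, []).append(dst)
        else:
            # Upstream traversal follows the edges backwards.
            neighbors.setdefault(dst, []).append(src)

    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append((nxt, depth + 1))
                queue.append((nxt, depth + 1))
    return found


# Made-up three-hop chain used only for the demonstration.
EDGES = [
    ("STG.dbo.Orders.OrderID", "dbo.stg_orders.OrderID"),
    ("dbo.stg_orders.OrderID", "dbo.fct_sales.OrderID"),
    ("dbo.fct_sales.OrderID", "dbo.rpt_sales.OrderID"),
]

print(reachable(EDGES, "dbo.rpt_sales.OrderID", direction="upstream", max_depth=2))
```

The `seen` set also prevents a column from being reported twice when several paths reach it.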
+
+### Breaking Change Detection
+```bash
+# Extract baseline
+infotracker extract --sql-dir sql_v1 --out-dir build/baseline
+
+# Extract new version
+infotracker extract --sql-dir sql_v2 --out-dir build/current
+
+# Detect breaking changes
+infotracker diff --base build/baseline --head build/current
+
+# Filter by severity
+infotracker diff --base build/baseline --head build/current --threshold BREAKING
+```
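Filtering by a severity threshold, as in the last command above, amounts to comparing each reported change against an ordered scale. The sketch below uses the three level names that appear in this README (`NON_BREAKING`, `POTENTIALLY_BREAKING`, `BREAKING`); their relative order, the `at_or_above` helper, and the shape of the change records are all assumptions made for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Ordering is assumed; the README only names the levels.
    NON_BREAKING = 0
    POTENTIALLY_BREAKING = 1
    BREAKING = 2


def at_or_above(changes, threshold):
    """Keep the changes whose severity meets the threshold name.

    Hypothetical helper; the record keys are invented for the example.
    """
    limit = Severity[threshold]
    return [c for c in changes if Severity[c["severity"]] >= limit]


CHANGES = [
    {"object": "dbo.Orders", "change": "column added", "severity": "NON_BREAKING"},
    {"object": "dbo.Orders", "change": "type changed", "severity": "POTENTIALLY_BREAKING"},
    {"object": "dbo.Orders", "change": "column dropped", "severity": "BREAKING"},
]

for change in at_or_above(CHANGES, "POTENTIALLY_BREAKING"):
    print(change["severity"], change["change"])
```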
+
+
+## Output Format
+
+Impact analysis returns these columns:
+- **from** - Source column (fully qualified)
+- **to** - Target column (fully qualified)
+- **direction** - `upstream` or `downstream`
+- **transformation** - Type of transformation (`IDENTITY`, `ARITHMETIC`, `AGGREGATION`, `CASE_AGGREGATION`, `DATE_FUNCTION`, `WINDOW`, etc.)
+- **description** - Human-readable transformation description
+
+Results are automatically deduplicated. Use `--format json` for machine-readable output.
+
+### New Transformation Types
+
+The enhanced transformation taxonomy includes:
+- `ARITHMETIC_AGGREGATION` - Arithmetic operations combined with aggregation functions
+- `COMPLEX_AGGREGATION` - Multi-step calculations involving multiple aggregations
+- `DATE_FUNCTION` - Date/time calculations like DATEDIFF, DATEADD
+- `DATE_FUNCTION_AGGREGATION` - Date functions applied to aggregated results
+- `CASE_AGGREGATION` - CASE statements applied to aggregated results
+
+### Advanced Object Support
+
+InfoTracker now supports advanced SQL Server objects:
+
+**Table-Valued Functions (TVF):**
+- Inline TVF (`RETURN AS SELECT`) - Parsed directly from SELECT statement
+- Multi-statement TVF (`RETURN @table TABLE`) - Extracts schema from table variable definition
+- Function parameters are tracked as filter metadata (don't create columns)
+
+**Dataset-Returning Procedures:**
+- Procedures ending with SELECT statement are treated as dataset sources
+- Output schema extracted from the final SELECT statement
+- Parameters tracked as filter metadata affecting lineage scope
+
+**EXEC into Temp Tables:**
+- `INSERT INTO #temp EXEC procedure` patterns create edges from procedure columns to temp table columns
+- Temp table lineage propagates downstream to final targets
+- Supports complex workflow patterns combining functions, procedures, and temp tables
+
+## Configuration
+
+InfoTracker follows this configuration precedence:
+1. **CLI flags** (highest priority) - override everything
+2. **infotracker.yml** config file - project defaults
+3. **Built-in defaults** (lowest priority) - fallback values
+
+## 🔧 Configuration
+
+Create an `infotracker.yml` file in your project root:
+
+```yaml
+sql_dirs:
+- "sql/"
+- "models/"
+out_dir: "build/lineage"
+exclude_dirs:
+- "__pycache__"
+- ".git"
+severity_threshold: "POTENTIALLY_BREAKING"
+```
+
+### Configuration Options
+
+| Setting | Description | Default | Examples |
+|---------|-------------|---------|----------|
+| `sql_dirs` | Directories to scan for SQL files | `["."]` | `["sql/", "models/"]` |
+| `out_dir` | Output directory for lineage files | `"lineage"` | `"build/artifacts"` |
+| `exclude_dirs` | Directories to skip | `[]` | `["__pycache__", "node_modules"]` |
+| `severity_threshold` | Breaking change detection level | `"NON_BREAKING"` | `"BREAKING"` |
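The three-level precedence described in this README (CLI flags over `infotracker.yml` over built-in defaults) maps naturally onto a layered lookup. The sketch below uses the defaults from the table above and `collections.ChainMap` from the standard library; `resolve_config` is a hypothetical helper, and treating a CLI value of `None` as "flag not given" is an assumption.

```python
from collections import ChainMap

# Defaults copied from the Configuration Options table.
DEFAULTS = {
    "sql_dirs": ["."],
    "out_dir": "lineage",
    "exclude_dirs": [],
    "severity_threshold": "NON_BREAKING",
}


def resolve_config(cli_flags: dict, file_config: dict) -> dict:
    """Merge settings so that CLI flags win, then the file, then defaults.

    Hypothetical helper for illustration; not part of InfoTracker's API.
    """
    # Assumption: a flag the user did not pass arrives as None.
    given = {key: value for key, value in cli_flags.items() if value is not None}
    # ChainMap returns the first match, searching left to right.
    return dict(ChainMap(given, file_config, DEFAULTS))


# Values as they would look after loading the sample infotracker.yml.
FILE_CONFIG = {
    "sql_dirs": ["sql/", "models/"],
    "out_dir": "build/lineage",
    "exclude_dirs": ["__pycache__", ".git"],
    "severity_threshold": "POTENTIALLY_BREAKING",
}

effective = resolve_config({"out_dir": "build/artifacts", "severity_threshold": None}, FILE_CONFIG)
print(effective["out_dir"], effective["severity_threshold"])
```

Only `out_dir` is overridden in this run; every other setting falls through to the file values.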
+
+## 📚 Documentation
+
+- **[Architecture](docs/architecture.md)** - Core concepts and design
+- **[Lineage Concepts](docs/lineage_concepts.md)** - Data lineage fundamentals
+- **[CLI Usage](docs/cli_usage.md)** - Complete command reference
+- **[Configuration](docs/configuration.md)** - Advanced configuration options
+- **[DBT Integration](docs/dbt_integration.md)** - Using with DBT projects
+- **[OpenLineage Mapping](docs/openlineage_mapping.md)** - Output format specification
+- **[Breaking Changes](docs/breaking_changes.md)** - Change detection and severity levels
+- **[Advanced Use Cases](docs/advanced_use_cases.md)** - TVFs, stored procedures, and complex scenarios
+- **[Edge Cases](docs/edge_cases.md)** - SELECT *, UNION, temp tables handling
+- **[FAQ](docs/faq.md)** - Common questions and troubleshooting
+
+## 🧪 Testing
+
+```bash
+# Run all tests
+pytest
+
+# Run specific test categories
+pytest tests/test_parser.py # Parser functionality
+pytest tests/test_wildcard.py # Wildcard selectors
+pytest tests/test_adapter.py # SQL dialect adapters
+
+# Run with coverage
+pytest --cov=infotracker --cov-report=html
+```
+
+
+
+
+
+## 📄 License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+## 🙏 Acknowledgments
+
+- [SQLGlot](https://github.com/tobymao/sqlglot) - SQL parsing library
+- [OpenLineage](https://openlineage.io/) - Data lineage standard
+- [Typer](https://typer.tiangolo.com/) - CLI framework
+- [Rich](https://rich.readthedocs.io/) - Terminal formatting
+
+---
+
+**InfoTracker** - Making database schema evolution safer, one column at a time. 🎯
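A METADATA file like the one added above is a block of RFC 822-style headers followed by the long description, so the standard-library `email` package can read it. The sketch below parses a shortened copy of the 0.3.1 headers; `runtime_requirements` is a hypothetical helper, and its check for `extra ==` is a deliberate simplification rather than real environment-marker evaluation.

```python
from email import message_from_string

# Shortened copy of the header block shown in the hunk above.
METADATA_SAMPLE = """\
Metadata-Version: 2.4
Name: InfoTracker
Version: 0.3.1
Requires-Python: >=3.10
Requires-Dist: click
Requires-Dist: networkx>=3.3
Requires-Dist: sqlglot>=23.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Description-Content-Type: text/markdown

# InfoTracker
"""


def runtime_requirements(metadata_text: str) -> list[str]:
    """List Requires-Dist entries that are not tied to an extra.

    Hypothetical helper; a string check stands in for marker parsing.
    """
    message = message_from_string(metadata_text)
    requirements = message.get_all("Requires-Dist") or []
    return [req for req in requirements if "extra ==" not in req]


parsed = message_from_string(METADATA_SAMPLE)
print(parsed["Name"], parsed["Version"])
print(runtime_requirements(METADATA_SAMPLE))
```

For anything beyond a quick inspection, `packaging.requirements.Requirement` (already a dependency in the list above) parses each entry and exposes its marker properly.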
{infotracker-0.3.0.dist-info → infotracker-0.3.1.dist-info}/RECORD
CHANGED
@@ -1,4 +1,4 @@
-infotracker/__init__.py,sha256=
+infotracker/__init__.py,sha256=TU6dd-1zoswGqK5zIl_o01msZ-pQGxHJlynPUYSYwXY,57
 infotracker/__main__.py,sha256=_iCom0ddZ1myy6ly3ID1dBlLzzjf7iV7Kq9uUfkat74,121
 infotracker/adapters.py,sha256=UEQeGSS3_fMOc5_Jsrw5aTtmIXlOdqqbHWL2uSgqkGM,3011
 infotracker/cli.py,sha256=Hvid6PuMcygUj4Uxor4iBD5OLkfz_LJ249V0UZpwk8A,6181
@@ -10,7 +10,7 @@ infotracker/models.py,sha256=d7EIjOm3evI8YekQWgLE0L1cWiOcU0F34-XdqxBkcTk,18332
 infotracker/openlineage_utils.py,sha256=-g9Pkl5hOMQP2Rtu47ItHBC13z6Y0K3gEG6x9GrTJH8,5845
 infotracker/parser.py,sha256=-zz_bmc4Rkb-hT_eDIvvpWxFtdyGFMKcRun9raNX4AY,71335
 infotracker/infotracker.yml,sha256=iRrrrUkdLCvEhw4DHqPnMchDlsJWI3xIJEpwevNU9sg,998
-infotracker-0.3.
-infotracker-0.3.
-infotracker-0.3.
-infotracker-0.3.
+infotracker-0.3.1.dist-info/METADATA,sha256=dLhABRKb7FaHcmCW0HTwYZnJHlbIZHMHIqSD-sy7KM4,10487
+infotracker-0.3.1.dist-info/WHEEL,sha256=qtCwoSJWgHk21S1Kb4ihdzI2rlJ1ZKaIurTj_ngOhyQ,87
+infotracker-0.3.1.dist-info/entry_points.txt,sha256=5ulAYRSvW3SohjeMwlYRX6LoWIHkEtc1qnwxWJQgN2Y,59
+infotracker-0.3.1.dist-info/RECORD,,
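Each RECORD entry above has the form `path,sha256=<digest>,<size>`, where the digest is the file's SHA-256 hash encoded as URL-safe base64 with the trailing `=` padding removed, as the wheel format specifies. The sketch below rebuilds such an entry and checks one against file contents; the function names are hypothetical, and empty input is used only because its digest is well known.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build a RECORD line for `path` from the file's raw bytes."""
    digest = hashlib.sha256(data).digest()
    # URL-safe base64 without '=' padding, as used in wheel RECORD files.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


def matches_record(line: str, data: bytes) -> bool:
    """Check file bytes against a RECORD line.

    Lines with an empty hash field, such as the RECORD file's own
    entry, carry nothing to verify and are accepted as-is.
    """
    path, hash_field, _size = line.rsplit(",", 2)
    if not hash_field:
        return True
    return record_entry(path, data) == line


empty_entry = record_entry("pkg/__init__.py", b"")
print(empty_entry)
```

Running `matches_record` over the unpacked 0.3.1 wheel would confirm entries such as the new `infotracker/__init__.py` line, but that requires the actual file bytes, which this diff does not include.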
infotracker-0.3.0.dist-info/METADATA
REMOVED
@@ -1,285 +0,0 @@
-Metadata-Version: 2.4
-Name: InfoTracker
-Version: 0.3.0
-Summary: Column-level SQL lineage, impact analysis, and breaking-change detection (MS SQL first)
-Project-URL: homepage, https://example.com/infotracker
-Project-URL: documentation, https://example.com/infotracker/docs
-Author: InfoTracker Authors
-License: MIT
-Keywords: data-lineage,impact-analysis,lineage,mssql,openlineage,sql
-Classifier: Environment :: Console
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Operating System :: OS Independent
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.10
-Classifier: Topic :: Database
-Classifier: Topic :: Software Development :: Libraries
-Requires-Python: >=3.10
-Requires-Dist: click
-Requires-Dist: networkx>=3.3
-Requires-Dist: packaging>=24.0
-Requires-Dist: pydantic>=2.8.2
-Requires-Dist: pyyaml>=6.0.1
-Requires-Dist: rich
-Requires-Dist: shellingham
-Requires-Dist: sqlglot>=23.0.0
-Requires-Dist: typer
-Provides-Extra: dev
-Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
-Requires-Dist: pytest>=7.4.0; extra == 'dev'
-Description-Content-Type: text/markdown
-
-# InfoTracker
-
-Column-level SQL lineage extraction and impact analysis for MS SQL Server
-
-## Features
-
-- **Column-level lineage** - Track data flow at the column level
-- **Parse SQL files** and generate OpenLineage-compatible JSON
-- **Impact analysis** - Find upstream and downstream column dependencies with flexible selectors
-- **Wildcard matching** - Support for table wildcards (`schema.table.*`) and column wildcards (`..pattern`)
-- **Direction control** - Query upstream (`+selector`), downstream (`selector+`), or both (`+selector+`)
-- **Configurable depth** - Control traversal depth with `--max-depth`
-- **Multiple output formats** - Text tables or JSON for scripting
-- **MSSQL support** - T-SQL dialect with temp tables, variables, and stored procedures
-- **Advanced SQL objects** - Support for table-valued functions (TVF) and dataset-returning procedures
-- **Temp table lineage** - Track EXEC into temp tables and propagate lineage downstream
-
-## Requirements
-- Python 3.10+
-- Virtual environment (activated)
-- Basic SQL knowledge
-- Git and shell
-
-## Troubleshooting
-- **Error tracebacks on help commands**: Make sure you're running in an activated virtual environment
-- **Command not found**: Activate your virtual environment first
-- **Import errors**: Ensure all dependencies are installed with `pip install -e .`
-- **Column not found**: Use full URI format or check column_graph.json for exact names
-
-## Quickstart
-
-### Setup & Installation
-```bash
-# Activate virtual environment first (REQUIRED)
-
-# Install dependencies
-pip install -e .
-
-# Verify installation
-infotracker --help
-```
-
-### Basic Usage
-```bash
-# 1. Extract lineage from SQL files (builds column graph)
-infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
-
-# 2. Run impact analysis
-infotracker impact -s "STG.dbo.Orders.OrderID" # downstream dependencies
-infotracker impact -s "+STG.dbo.Orders.OrderID" # upstream sources
-```
-
-## Selector Syntax
-
-InfoTracker supports flexible column selectors:
-
-| Selector Format | Description | Example |
-|-----------------|-------------|---------|
-| `table.column` | Simple format (adds default `dbo` schema) | `Orders.OrderID` |
-| `schema.table.column` | Schema-qualified format | `dbo.Orders.OrderID` |
-| `database.schema.table.column` | Database-qualified format | `STG.dbo.Orders.OrderID` |
-| `schema.table.*` | Table wildcard (all columns) | `dbo.fct_sales.*` |
-| `..pattern` | Column wildcard (name contains pattern) | `..revenue` |
-| `.pattern` | Alias for column wildcard | `.orderid` |
-| Full URI | Complete namespace format | `mssql://localhost/InfoTrackerDW.STG.dbo.Orders.OrderID` |
-
-### Direction Control
-- `selector` - downstream dependencies (default)
-- `+selector` - upstream sources
-- `selector+` - downstream dependencies (explicit)
-- `+selector+` - both upstream and downstream
-
-### Selector Cheat Sheet
-
-**Table wildcards:**
-```bash
-# All columns from a specific table
-infotracker impact -s "dbo.fct_sales.*"
-infotracker impact -s "STG.dbo.Orders.*"
-```
-
-**Column name matching:**
-```bash
-# Find all columns containing "revenue" (case-insensitive)
-infotracker impact -s "..revenue"
-
-# Find all columns containing "id"
-infotracker impact -s "..id"
-
-# Use wildcards for pattern matching
-infotracker impact -s "..customer*"
-```
-
-**Direction examples:**
-```bash
-# Upstream: what feeds into this column
-infotracker impact -s "+dbo.fct_sales.Revenue"
-
-# Downstream: what uses this column
-infotracker impact -s "STG.dbo.Orders.OrderID+"
-
-# Both directions
-infotracker impact -s "+dbo.dim_customer.CustomerID+"
-```
-
-**Advanced SQL objects:**
-```bash
-# Table-valued function columns (upstream)
-infotracker impact -s "+dbo.fn_customer_orders_tvf.*"
-
-# Procedure dataset columns (upstream)
-infotracker impact -s "+dbo.usp_customer_metrics_dataset.*"
-
-# Temp table lineage from EXEC
-infotracker impact -s "+#temp_table.*"
-```
-
-## Examples
-
-```bash
-# Extract lineage (run this first)
-infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
-
-# Basic column lineage
-infotracker impact -s "+dbo.fct_sales.Revenue" # upstream sources
-infotracker impact -s "STG.dbo.Orders.OrderID+" # downstream usage
-
-# Wildcard selectors
-infotracker impact -s "+..revenue+" # all revenue columns (both directions)
-infotracker impact -s "dbo.fct_sales.*" # all columns from table
-infotracker --format json impact -s "..customer*" # customer columns (JSON output)
-
-# Advanced SQL objects (NEW)
-infotracker impact -s "+dbo.fn_customer_orders_tvf.*" # TVF columns (upstream)
-infotracker impact -s "+dbo.usp_customer_metrics_dataset.*" # procedure columns (upstream)
-
-# Depth control
-infotracker impact -s "+dbo.Orders.OrderID" --max-depth 1
-
-# Demo the new features with the included examples
-infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
-infotracker impact -s "+dbo.fn_customer_orders_inline.*"
-infotracker impact -s "+dbo.usp_customer_metrics_dataset.TotalRevenue"
-```
-
-### Copy-Paste Demo Commands
-
-Test the new TVF and procedure lineage features:
-
-```bash
-# 1. Extract all lineage (including new TVF/procedure support)
-infotracker extract --sql-dir examples/warehouse/sql --out-dir build/lineage
-
-# 2. Test TVF lineage
-infotracker --format text impact -s "+dbo.fn_customer_orders_tvf.*"
-
-# 3. Test procedure lineage
-infotracker --format text impact -s "+dbo.usp_customer_metrics_dataset.*"
-
-# 4. Test column name contains wildcard
-infotracker --format text impact -s "+..revenue"
-
-# 5. Show results in JSON format
-infotracker --format json impact -s "..total*" > tvf_lineage.json
-```
-
-## Output Format
-
-Impact analysis returns these columns:
-- **from** - Source column (fully qualified)
-- **to** - Target column (fully qualified)
-- **direction** - `upstream` or `downstream`
-- **transformation** - Type of transformation (`IDENTITY`, `ARITHMETIC`, `AGGREGATION`, `CASE_AGGREGATION`, `DATE_FUNCTION`, `WINDOW`, etc.)
-- **description** - Human-readable transformation description
-
-Results are automatically deduplicated. Use `--format json` for machine-readable output.
-
-### New Transformation Types
-
-The enhanced transformation taxonomy includes:
-- `ARITHMETIC_AGGREGATION` - Arithmetic operations combined with aggregation functions
-- `COMPLEX_AGGREGATION` - Multi-step calculations involving multiple aggregations
-- `DATE_FUNCTION` - Date/time calculations like DATEDIFF, DATEADD
-- `DATE_FUNCTION_AGGREGATION` - Date functions applied to aggregated results
-- `CASE_AGGREGATION` - CASE statements applied to aggregated results
-
-### Advanced Object Support
-
-InfoTracker now supports advanced SQL Server objects:
-
-**Table-Valued Functions (TVF):**
-- Inline TVF (`RETURN AS SELECT`) - Parsed directly from SELECT statement
-- Multi-statement TVF (`RETURN @table TABLE`) - Extracts schema from table variable definition
-- Function parameters are tracked as filter metadata (don't create columns)
-
-**Dataset-Returning Procedures:**
-- Procedures ending with SELECT statement are treated as dataset sources
-- Output schema extracted from the final SELECT statement
-- Parameters tracked as filter metadata affecting lineage scope
-
-**EXEC into Temp Tables:**
-- `INSERT INTO #temp EXEC procedure` patterns create edges from procedure columns to temp table columns
-- Temp table lineage propagates downstream to final targets
-- Supports complex workflow patterns combining functions, procedures, and temp tables
-
-## Configuration
-
-InfoTracker follows this configuration precedence:
-1. **CLI flags** (highest priority) - override everything
-2. **infotracker.yml** config file - project defaults
-3. **Built-in defaults** (lowest priority) - fallback values
-
-Create an `infotracker.yml` file in your project root:
-```yaml
-default_adapter: mssql
-sql_dir: examples/warehouse/sql
-out_dir: build/lineage
-include: ["*.sql"]
-exclude: ["*_wip.sql"]
-```
-
-## Documentation
-
-For detailed information:
-- `docs/overview.md` — what it is, goals, scope
-- `docs/algorithm.md` — how extraction works
-- `docs/lineage_concepts.md` — core concepts with visuals
-- `docs/cli_usage.md` — commands and options
-- `docs/breaking_changes.md` — definition and detection
-- `docs/edge_cases.md` — SELECT *, UNION, temp tables, etc.
-- `docs/adapters.md` — interface and MSSQL specifics
-- `docs/architecture.md` — system and sequence diagrams
-- `docs/configuration.md` — configuration reference
-- `docs/openlineage_mapping.md` — how outputs map to OpenLineage
-- `docs/faq.md` — common questions
-
-#### Documentation
-- `docs/overview.md` — what it is, goals, scope
-- `docs/algorithm.md` — how extraction works
-- `docs/lineage_concepts.md` — core concepts with visuals
-- `docs/cli_usage.md` — commands and options
-- `docs/breaking_changes.md` — definition and detection
-- `docs/edge_cases.md` — SELECT *, UNION, temp tables, etc.
-- `docs/advanced_use_cases.md` — tabular functions, procedures returning datasets
-- `docs/adapters.md` — interface and MSSQL specifics
-- `docs/architecture.md` — system and sequence diagrams
-- `docs/configuration.md` — configuration reference
-- `docs/openlineage_mapping.md` — how outputs map to OpenLineage
-- `docs/faq.md` — common questions
-- `docs/dbt_integration.md` — how to use with dbt projects
-
-
-## License
-MIT (or your team’s preferred license)
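{infotracker-0.3.0.dist-info → infotracker-0.3.1.dist-info}/WHEEL
File without changes
{infotracker-0.3.0.dist-info → infotracker-0.3.1.dist-info}/entry_points.txt
File without changes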