autolineage 0.1.0__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Kishan Raj
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,415 @@
1
+ Metadata-Version: 2.4
2
+ Name: autolineage
3
+ Version: 0.1.0
4
+ Summary: Automatic ML data lineage tracking with zero manual logging
5
+ Author-email: Kishan Raj <kishanraj41@gmail.com>
6
+ License-Expression: MIT
7
+ Project-URL: Homepage, https://github.com/kishanraj41/autolineage
8
+ Project-URL: Documentation, https://github.com/kishanraj41/autolineage#readme
9
+ Project-URL: Repository, https://github.com/kishanraj41/autolineage
10
+ Project-URL: Issues, https://github.com/kishanraj41/autolineage/issues
11
+ Project-URL: Changelog, https://github.com/kishanraj41/autolineage/releases
12
+ Keywords: machine-learning,mlops,data-lineage,reproducibility,data-governance,eu-ai-act,compliance,data-provenance
13
+ Classifier: Development Status :: 4 - Beta
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: Intended Audience :: Science/Research
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Programming Language :: Python :: 3.8
18
+ Classifier: Programming Language :: Python :: 3.9
19
+ Classifier: Programming Language :: Python :: 3.10
20
+ Classifier: Programming Language :: Python :: 3.11
21
+ Classifier: Programming Language :: Python :: 3.12
22
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
23
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
24
+ Classifier: Operating System :: OS Independent
25
+ Requires-Python: >=3.8
26
+ Description-Content-Type: text/markdown
27
+ License-File: LICENSE
28
+ Requires-Dist: pandas>=1.3.0
29
+ Requires-Dist: numpy>=1.20.0
30
+ Requires-Dist: networkx>=2.6.0
31
+ Requires-Dist: matplotlib>=3.4.0
32
+ Requires-Dist: click>=8.0.0
33
+ Provides-Extra: dev
34
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
35
+ Requires-Dist: black>=22.0.0; extra == "dev"
36
+ Requires-Dist: flake8>=4.0.0; extra == "dev"
37
+ Requires-Dist: nbformat>=5.0.0; extra == "dev"
38
+ Provides-Extra: ui
39
+ Requires-Dist: streamlit>=1.20.0; extra == "ui"
40
+ Requires-Dist: plotly>=5.10.0; extra == "ui"
41
+ Provides-Extra: jupyter
42
+ Requires-Dist: ipython>=7.0.0; extra == "jupyter"
43
+ Requires-Dist: notebook>=6.0.0; extra == "jupyter"
44
+ Provides-Extra: all
45
+ Requires-Dist: streamlit>=1.20.0; extra == "all"
46
+ Requires-Dist: plotly>=5.10.0; extra == "all"
47
+ Requires-Dist: ipython>=7.0.0; extra == "all"
48
+ Requires-Dist: notebook>=6.0.0; extra == "all"
49
+ Dynamic: license-file
50
+
51
+ # AutoLineage
52
+
53
+ **Automatic ML Data Lineage Tracking**
54
+
55
+ Track data lineage automatically, from raw data to trained models, without manual logging.
56
+
57
+ [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
58
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
59
+
60
+ ## Quick Start
61
+ ```bash
62
+ pip install autolineage
63
+ ```
64
+ ```python
65
+ import autolineage.auto
66
+ import pandas as pd
67
+
68
+ # Your normal code - everything tracked automatically!
69
+ df = pd.read_csv('data.csv')
70
+ df_clean = df.dropna()
71
+ df_clean.to_csv('clean.csv')
72
+
73
+ # That's it! Lineage tracked automatically
74
+ ```
75
+
76
+ ## Features
77
+
78
+ - **Zero Manual Logging** - One import enables tracking; no explicit logging calls needed
79
+ - **Visual Graphs** - Interactive HTML and static PNG lineage visualizations
80
+ - **EU AI Act Reporting** - Generate documentation reports that support EU AI Act compliance
81
+ - **Jupyter Support** - Magic commands for notebooks
82
+ - **Multi-Environment** - Works in Jupyter notebooks, Python scripts, and from the CLI
83
+ - **Lightweight** - SQLite backend, no complex setup
84
+ - **Cryptographic Verification** - SHA-256 hashes for data integrity
85
+
86
+ ## Three Ways to Use
87
+
88
+ ### 1️⃣ Automatic (Recommended)
89
+ ```python
90
+ import autolineage.auto
91
+
92
+ # Just write normal pandas/numpy code
93
+ # Everything is tracked automatically!
94
+ ```
95
+
96
+ ### 2️⃣ CLI
97
+ ```bash
98
+ lineage track my_pipeline.py
99
+ lineage show --format html
100
+ lineage report
101
+ ```
102
+
103
+ ### 3️⃣ Jupyter Magic
104
+ ```python
105
+ %load_ext autolineage
106
+ %lineage_start
107
+
108
+ # Your code...
109
+
110
+ %lineage_show
111
+ %lineage_report
112
+ ```
113
+
114
+ ## What Gets Tracked
115
+
116
+ AutoLineage automatically hooks into:
117
+
118
+ | Library | Functions |
119
+ |---------|-----------|
120
+ | **pandas** | read_csv, to_csv, read_parquet, to_parquet, read_json, to_json, read_excel, to_excel, read_pickle, to_pickle |
121
+ | **numpy** | load, save, loadtxt, savetxt |
122
+ | **pickle** | dump, load |
123
+ | **joblib** | dump, load |
124
+
125
+ **Plus:** Lineage relationships between the files your code reads and writes are recorded automatically.
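The hooking listed in the table relies on monkey-patching. A minimal sketch of that approach, using `pickle` from the standard library so it runs without pandas or numpy: the `hook` helper and the `EVENTS` list are hypothetical names, not AutoLineage's API. Real hooks for pandas and numpy would also need to handle path arguments rather than open file objects.

```python
import functools
import pickle
import tempfile
from pathlib import Path

# Hypothetical in-memory event log; a real tracker would persist these rows.
EVENTS = []


def _target_path(target):
    """pickle.dump/load receive open file objects; use their .name when present."""
    return getattr(target, "name", str(target))


def hook(module, func_name, kind, file_pos):
    """Replace module.func_name with a wrapper that records which file it touched."""
    original = getattr(module, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        result = original(*args, **kwargs)
        # pickle.dump(obj, file) and pickle.load(file) both call the argument `file`
        target = kwargs["file"] if "file" in kwargs else args[file_pos]
        EVENTS.append((kind, _target_path(target)))
        return result

    setattr(module, func_name, wrapper)
    return original  # keep a handle so the patch can be undone


original_dump = hook(pickle, "dump", "write", 1)
original_load = hook(pickle, "load", "read", 0)

# Usage: ordinary pickle calls are now recorded without any logging code.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    with open(path, "wb") as fh:
        pickle.dump({"weights": [1, 2, 3]}, fh)
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

# Undo the monkey-patch.
pickle.dump, pickle.load = original_dump, original_load
print(EVENTS)
```

The wrapper calls the original function first and records the event only if the call succeeds. As a result, a failed read or write does not leave a spurious lineage entry.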
126
+
127
+ ## Visualizations
128
+
129
+ Generate beautiful lineage graphs:
130
+ ```bash
131
+ # Interactive HTML
132
+ lineage show --format html --output graph.html
133
+
134
+ # Static PNG
135
+ lineage show --format png --output graph.png
136
+ ```
137
+
138
+ Features:
139
+ - Color-coded by file type
140
+ - Hover for details
141
+ - Click to explore
142
+ - Export for presentations
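AutoLineage lists NetworkX and matplotlib among its dependencies. The sketch below deliberately swaps in a different, dependency-free technique (emitting Graphviz DOT text) only to show how nodes can be colour-coded by file type. The `to_dot` function and its colour table are hypothetical and do not reflect the package's own palette or output.

```python
from pathlib import Path

# Hypothetical colour per file extension; AutoLineage's own palette may differ.
COLORS = {
    ".csv": "lightblue",
    ".parquet": "lightgreen",
    ".json": "khaki",
    ".pkl": "orange",
    ".joblib": "orange",
}


def to_dot(edges):
    """Render (source, target) file edges as Graphviz DOT, one fill colour per extension."""
    nodes = sorted({name for edge in edges for name in edge})
    lines = ["digraph lineage {", "  rankdir=LR;"]
    for name in nodes:
        color = COLORS.get(Path(name).suffix, "lightgrey")
        lines.append(f'  "{name}" [style=filled, fillcolor={color}];')
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot([("data.csv", "clean.csv"), ("clean.csv", "model.pkl")])
print(dot)
```

If Graphviz is installed, saving the output as `lineage.dot` and running `dot -Tpng lineage.dot -o lineage.png` renders it as an image.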
143
+
144
+ ## EU AI Act Compliance
145
+
146
+ Generate compliance reports with one command:
147
+ ```bash
148
+ lineage report --format markdown
149
+ ```
150
+
151
+ Includes:
152
+ - Complete data inventory with SHA-256 hashes
153
+ - All transformation operations documented
154
+ - Full lineage graph with verification
155
+ - Reproducibility instructions
156
+ - Regulatory compliance statement
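A data inventory of this kind rests on content hashes. The following is a minimal sketch that assumes a hypothetical Markdown table layout; the real report format may differ. Each file is streamed through `hashlib.sha256` in fixed-size chunks, so hashing a multi-gigabyte dataset never loads it fully into memory.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory_markdown(paths):
    """Build a Markdown table listing each file's name, size, and SHA-256 digest."""
    rows = ["| File | Size (bytes) | SHA-256 |", "|------|--------------|---------|"]
    for p in map(Path, paths):
        rows.append(f"| {p.name} | {p.stat().st_size} | `{sha256_file(p)}` |")
    return "\n".join(rows)


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data.csv"
    data.write_bytes(b"a,b\n1,2\n")
    table = inventory_markdown([data])
    data_digest = sha256_file(data)

print(table)
```

Re-hashing the same file later and comparing digests is what makes the inventory verifiable. Any change to the bytes, even a single trailing newline, produces a different SHA-256 value.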
157
+
158
+ Perfect for:
159
+ - EU AI Act Article 10 requirements
160
+ - Model governance and auditing
161
+ - Research reproducibility
162
+ - Team collaboration
163
+
164
+ ## CLI Reference
165
+ ```bash
166
+ lineage track SCRIPT   # Track a Python script
167
+ lineage show           # Visualize lineage graph
168
+ lineage summary        # Show statistics
169
+ lineage report         # Generate compliance report
170
+ lineage clear          # Delete database
171
+ ```
172
+
173
+ See [docs/cli.md](docs/cli.md) for the complete reference.
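This README does not describe how `lineage track SCRIPT` is implemented. One plausible way to run an unmodified script under tracking is to install the hooks and then execute the script with `runpy.run_path`. The sketch uses `argparse` for brevity, although the package itself depends on click. `install_hooks` and `main` are hypothetical names.

```python
import argparse
import runpy
import sys
import tempfile
from pathlib import Path


def install_hooks():
    """Hypothetical placeholder: patch the pandas/numpy/pickle/joblib I/O functions here."""


def main(argv=None):
    """Hypothetical `lineage` entry point supporting only the `track` subcommand."""
    parser = argparse.ArgumentParser(prog="lineage")
    sub = parser.add_subparsers(dest="command", required=True)
    track = sub.add_parser("track", help="run a Python script with I/O tracking enabled")
    track.add_argument("script")
    args = parser.parse_args(argv)

    if args.command == "track":
        install_hooks()
        saved_argv = sys.argv
        sys.argv = [args.script]  # hide the tracker's own arguments from the script
        try:
            # run_name="__main__" makes the script's `if __name__ == "__main__":` block run
            return runpy.run_path(args.script, run_name="__main__")
        finally:
            sys.argv = saved_argv


# Usage demo: track a throwaway script without modifying it.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "pipeline.py"
    script.write_text("result = 6 * 7\n")
    namespace = main(["track", str(script)])
```

Because the hooks are installed before the script runs, the script needs no `import autolineage.auto` line of its own. In a packaged CLI, `main()` would be wired to a console-script entry point rather than called directly.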
174
+
175
+ ## Jupyter Notebook
176
+ ```python
177
+ %load_ext autolineage
178
+ %lineage_start
179
+
180
+ import pandas as pd
181
+ df = pd.read_csv('data.csv')
182
+ df.to_csv('output.csv')
183
+
184
+ %lineage_summary # Show stats
185
+ %lineage_show # Display graph
186
+ %lineage_report # Generate report
187
+ ```
188
+
189
+ See [examples/jupyter_demo.ipynb](examples/jupyter_demo.ipynb) for a complete demo.
190
+
191
+ ## Documentation
192
+
193
+ - [QuickStart Guide](docs/quickstart.md) - Get started in 5 minutes
194
+ - [CLI Reference](docs/cli.md) - Complete command-line guide
195
+ - [Compliance Guide](docs/compliance.md) - EU AI Act reporting
196
+ - [Examples](examples/) - Working code samples
197
+
198
+ ## Use Cases
199
+
200
+ ### Research Reproducibility
201
+ Track every step from raw data to published results. Never wonder "which dataset did I use?" again.
202
+
203
+ ### ML Model Governance
204
+ Automatic compliance documentation for regulated industries, with reports designed to support EU AI Act requirements.
205
+
206
+ ### Team Collaboration
207
+ Share complete data provenance with your team. Everyone knows exactly what transformations were applied.
208
+
209
+ ### Debugging
210
+ Trace model issues back to data sources instantly. Full audit trail included.
211
+
212
+ ## Architecture
213
+ ```
214
+ Raw Data → [Transformation 1] → Intermediate → [Transformation 2] → Model
215
+    ↓               ↓                 ↓                 ↓              ↓
216
+ Tracked     Logged & Hashed       Tracked       Logged & Hashed    Tracked
217
+ ```
218
+
219
+ - **SQLite Database** - Portable, zero-config storage
220
+ - **Function Hooking** - Automatic tracking via monkey-patching
221
+ - **Cryptographic Hashing** - SHA-256 for data integrity
222
+ - **Graph Generation** - NetworkX for lineage DAG
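The architecture pairs a SQLite store with a lineage graph, but this README does not give the schema. The following sketch therefore assumes a hypothetical single `edges` table. It answers the debugging question "which sources fed this model?" with a recursive SQL query, a substitute for the NetworkX traversal the package itself uses.

```python
import sqlite3

# Hypothetical schema: one row per "dst was derived from src" edge.
SCHEMA = "CREATE TABLE IF NOT EXISTS edges (src TEXT NOT NULL, dst TEXT NOT NULL, op TEXT)"


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn, src, dst, op):
    conn.execute("INSERT INTO edges (src, dst, op) VALUES (?, ?, ?)", (src, dst, op))
    conn.commit()


def upstream(conn, artifact):
    """Return every file `artifact` was derived from, directly or transitively."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(name) AS (
            SELECT src FROM edges WHERE dst = ?
            UNION  -- UNION (not UNION ALL) drops repeats, so a cycle cannot loop forever
            SELECT e.src FROM edges AS e JOIN ancestors AS a ON e.dst = a.name
        )
        SELECT name FROM ancestors ORDER BY name
        """,
        (artifact,),
    ).fetchall()
    return [name for (name,) in rows]


conn = open_store()
record(conn, "raw.csv", "clean.csv", "dropna")
record(conn, "clean.csv", "features.parquet", "feature_engineering")
record(conn, "features.parquet", "model.pkl", "fit")
print(upstream(conn, "model.pkl"))  # ['clean.csv', 'features.parquet', 'raw.csv']
```

Passing a file path instead of `":memory:"` to `open_store` gives the portable, single-file storage described above.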
223
+
224
+ ## Contributing
225
+
226
+ This is a research project being developed for a PhD in AI.
227
+
228
+ Contributions welcome! Please:
229
+ 1. Fork the repository
230
+ 2. Create a feature branch
231
+ 3. Add tests for new features
232
+ 4. Submit a pull request
233
+
234
+ ## License
235
+
236
+ MIT License - see [LICENSE](LICENSE) for details.
237
+
238
+ ## Author
239
+
240
+ Built by Kishan as part of PhD research on ML reproducibility and data governance.
241
+
242
+ - GitHub: [@kishanraj41](https://github.com/kishanraj41)
243
+
244
+ - Email: kishanraj41@gmail.com
245
+
246
+ ## Star History
247
+
248
+ If you find AutoLineage useful, please star the repository!
249
+
250
+ ## Citation
251
+
252
+ If you use AutoLineage in your research, please cite:
253
+ ```bibtex
254
+ @software{autolineage2025,
255
+ author = {Kishan Raj Vandhavasi Goutham Kumar},
256
+ title = {AutoLineage: Automatic ML Data Lineage Tracking},
257
+ year = {2025},
258
+ url = {https://github.com/kishanraj41/autolineage}
259
+ }
260
+ ```
261
+
262
+ ## Roadmap
263
+
264
+ - [x] Automatic pandas/numpy tracking
265
+ - [x] Visual lineage graphs
266
+ - [x] CLI interface
267
+ - [x] EU AI Act compliance reports
268
+ - [x] Jupyter magic commands
269
+ - [ ] MLflow integration
270
+ - [ ] Git integration
271
+ - [ ] Column-level lineage
272
+ - [ ] Data drift detection
273
+ - [ ] Team collaboration features
274
+ - [ ] Cloud storage support
275
+
276
+ ## FAQ
277
+
278
+ **Q: Does this slow down my code?**
279
+ A: Minimal. Only file I/O calls are tracked, so the typical impact is under 1%, although computing SHA-256 hashes of large files adds time proportional to their size.
280
+
281
+ **Q: Do I need to change my code?**
282
+ A: Only one line: add `import autolineage.auto` at the top of your script. Calls to the supported read/write functions are then tracked automatically.
283
+
284
+ **Q: What Python versions are supported?**
285
+ A: Python 3.8+
286
+
287
+ **Q: Can I use this in production?**
288
+ A: Yes, with care. It is lightweight and stores lineage in a local SQLite file, but 0.1.0 is a beta release, so validate it on your own pipelines first.
289
+
290
+ **Q: How is this different from MLflow?**
291
+ A: AutoLineage focuses on automatic data lineage (zero manual logging), while MLflow is a complete MLOps platform. They complement each other!
292
+
293
+ ---
294
+
295
+ **Made for the ML community**