tokpipe 0.1.0__tar.gz

tokpipe-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 aroaxinping
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
tokpipe-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,469 @@
1
+ Metadata-Version: 2.4
2
+ Name: tokpipe
3
+ Version: 0.1.0
4
+ Summary: Data pipeline for TikTok analytics. Your exports in, real metrics out.
5
+ Author: aroaxinping
6
+ License-Expression: MIT
7
+ Keywords: tiktok,analytics,data-pipeline,content-creator
8
+ Classifier: Development Status :: 3 - Alpha
9
+ Classifier: Intended Audience :: Developers
10
+ Classifier: Programming Language :: Python :: 3
11
+ Classifier: Topic :: Scientific/Engineering :: Information Analysis
12
+ Requires-Python: >=3.10
13
+ Description-Content-Type: text/markdown
14
+ License-File: LICENSE
15
+ Requires-Dist: pandas>=2.0
16
+ Requires-Dist: openpyxl>=3.1
17
+ Requires-Dist: matplotlib>=3.7
18
+ Requires-Dist: seaborn>=0.13
19
+ Requires-Dist: plotly>=5.18
20
+ Requires-Dist: pyyaml>=6.0
21
+ Dynamic: license-file
22
+
23
+ # tokpipe
24
+
25
+ [![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/downloads/)
26
+ [![License: MIT](https://img.shields.io/badge/license-MIT-green)](LICENSE)
27
+ [![Tests](https://github.com/aroaxinping/tokpipe/actions/workflows/ci.yml/badge.svg)](https://github.com/aroaxinping/tokpipe/actions/workflows/ci.yml)
28
+
29
+ Data pipeline for TikTok analytics. Import your exported data, clean it, classify content, compute real metrics, and visualize what actually works.
30
+
31
+ No APIs, no scraping, no third-party tokens. Just your TikTok export files (CSV/XLSX) and Python.
32
+
33
+ ---
34
+
35
+ ## Why tokpipe?
36
+
37
+ TikTok gives you a spreadsheet with raw numbers. That's it. No insights, no trends, no answer to "why did this video work?"
38
+
39
+ tokpipe takes that file and builds a full analytics pipeline: cleans the data, classifies your content by topic, computes real metrics (engagement rate, best posting hour, growth trends), and generates an interactive dashboard, an Excel report with formulas, and static charts. One command, all outputs.
40
+
41
+ It's built for creators who want to understand their data without depending on third-party tools that ask for your credentials.
42
+
43
+ ---
44
+
45
+ ## What you get
46
+
47
+ ```bash
48
+ tokpipe analyze TikTok_Analytics.xlsx --followers 8728
49
+ ```
50
+
51
+ | Output | What it is |
52
+ |---|---|
53
+ | `report.csv` | Your data cleaned + engagement rate, completion rate, category per video |
54
+ | `analytics.xlsx` | Excel with native formulas — open in Excel or Google Sheets |
55
+ | `dashboard.html` | Interactive Plotly dashboard — open in any browser, hover for details |
56
+ | `engagement.png` | Engagement rate distribution across all your videos |
57
+ | `best_hours.png` | Which hours get the best engagement |
58
+ | `growth.png` | 7-day rolling average of your views |
59
+
60
+ ---
61
+
62
+ ## Architecture
63
+
64
+ tokpipe follows a classic ETL pipeline structure:
65
+
66
+ ```
67
+ Export (TikTok XLSX/CSV)
68
+ |
69
+ v
70
+ +-----------+
71
+ | ingest | --> Load and validate raw export files
72
+ +-----------+
73
+ |
74
+ v
75
+ +-----------+
76
+ | clean | --> Normalize columns, fix types, handle nulls
77
+ +-----------+
78
+ |
79
+ v
80
+ +-----------+
81
+ | classify | --> Tag each video with a topic/category
82
+ +-----------+
83
+ |
84
+ v
85
+ +-----------+
86
+ | metrics | --> Compute engagement rate, retention, trends
87
+ +-----------+
88
+ |
89
+ v
90
+ +-----------+ +-----------+ +-----------+
91
+ | output | | excel | | dashboard |
92
+ | (CSV/PNG) | | (.xlsx) | | (.html) |
93
+ +-----------+ +-----------+ +-----------+
94
+ ```
95
+
96
+ ### Modules
97
+
98
+ | Module | What it does |
99
+ |---|---|
100
+ | `tokpipe.ingest` | Reads TikTok export files (XLSX, CSV). Detects format, validates columns, returns a raw DataFrame. |
101
+ | `tokpipe.clean` | Normalizes column names, converts date/number types, drops corrupted rows, fills missing values. |
102
+ | `tokpipe.classify` | Assigns a topic/category to each video. Configurable via YAML rules or custom function. |
103
+ | `tokpipe.metrics` | Computes derived metrics: engagement rate, average watch time, best posting hour, growth trends. |
104
+ | `tokpipe.output` | Exports results to CSV/JSON. Generates matplotlib/seaborn PNG charts. |
105
+ | `tokpipe.excel` | Generates Excel report with native formulas, formatting, and embedded charts. |
106
+ | `tokpipe.dashboard` | Generates interactive Plotly HTML dashboard with all visualizations. |
107
+ | `tokpipe.cli` | Command-line interface. Entry point for `tokpipe analyze`. |
108
+
109
+ ---
110
+
111
+ ## Prerequisites
112
+
113
+ Check these before installing tokpipe (only Python is required; git is optional):
114
+
115
+ ### Python 3.10+
116
+
117
+ Check your version:
118
+
119
+ ```bash
120
+ python --version
121
+ # or
122
+ python3 --version
123
+ ```
124
+
125
+ If you don't have it:
126
+
127
+ ```bash
128
+ # macOS (Homebrew)
129
+ brew install python
130
+
131
+ # Ubuntu/Debian
132
+ sudo apt install python3 python3-venv python3-pip
133
+
134
+ # Windows (winget)
135
+ winget install Python.Python.3.12
136
+ ```
137
+
138
+ Or download directly from [python.org](https://www.python.org/downloads/).
139
+
140
+ ### git (optional)
141
+
142
+ Only needed to clone the repo. You can also [download the ZIP](https://github.com/aroaxinping/tokpipe/archive/refs/heads/main.zip) from GitHub.
143
+
144
+ ```bash
145
+ git --version
146
+ ```
147
+
148
+ ---
149
+
150
+ ## Get your TikTok data
151
+
152
+ tokpipe works with the analytics files that TikTok lets you export. No API keys, no scraping — just the file TikTok gives you.
153
+
154
+ **How to export:**
155
+
156
+ 1. Open TikTok on **desktop** (not the app) or go to [tiktok.com](https://www.tiktok.com)
157
+ 2. Go to your profile > **Creator tools** > **Analytics**
158
+ 3. Select the date range you want to analyze
159
+ 4. Click **Export data** (top right)
160
+ 5. Download the XLSX or CSV file
161
+
162
+ **What the file should contain:**
163
+
164
+ | Required columns | Optional columns |
165
+ |---|---|
166
+ | Views | Watch time |
167
+ | Likes | Video duration |
168
+ | Comments | Post date/time |
169
+ | Shares | Caption/description |
170
+
171
+ tokpipe auto-detects column names in both English and Spanish. If your export uses different names, the pipeline will try to match them — if it can't find a `views` column, it will tell you.
172
+
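The column auto-detection described above could work roughly as follows. This is a sketch only, not tokpipe's actual implementation: the alias table `ALIASES` and the helper `match_columns` are hypothetical, and the Spanish header spellings are illustrative guesses at what an export might contain. Only the error message is taken from the Troubleshooting section.

```python
import unicodedata

# Hypothetical alias table: canonical name -> header spellings an export might use.
ALIASES = {
    "views": {"views", "video views", "visualizaciones", "reproducciones"},
    "likes": {"likes", "me gusta"},
    "comments": {"comments", "comentarios"},
    "shares": {"shares", "compartidos"},
}


def _norm(name: str) -> str:
    """Lowercase, collapse whitespace, and strip accents from a header."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def match_columns(headers: list[str]) -> dict[str, str]:
    """Map each recognized header to its canonical name; fail if views is missing."""
    mapping = {}
    for header in headers:
        key = _norm(header)
        for canonical, names in ALIASES.items():
            if key in names:
                mapping[header] = canonical
                break
    if "views" not in mapping.values():
        raise ValueError("Could not find a 'views' column")
    return mapping


print(match_columns(["Visualizaciones", "Me gusta", "Comentarios", "Compartidos"]))
```

Normalizing before lookup keeps the alias table small: `" Me  Gusta "` and `"me gusta"` resolve to the same key.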
173
+ ---
174
+
175
+ ## Installation
176
+
177
+ ```bash
178
+ # Clone the repo
179
+ git clone https://github.com/aroaxinping/tokpipe.git
180
+ cd tokpipe
181
+
182
+ # Create a virtual environment
183
+ python3 -m venv .venv
184
+
185
+ # Activate it
186
+ source .venv/bin/activate # macOS / Linux
187
+ # .venv\Scripts\activate # Windows (cmd)
188
+ # .venv\Scripts\Activate.ps1 # Windows (PowerShell)
189
+
190
+ # Install tokpipe and all its dependencies
191
+ pip install -e .
192
+ ```
193
+
194
+ This installs: pandas, openpyxl, matplotlib, seaborn, plotly, and pyyaml.
195
+
196
+ Verify it worked:
197
+
198
+ ```bash
199
+ tokpipe --version
200
+ # tokpipe 0.1.0
201
+ ```
202
+
203
+ ---
204
+
205
+ ## Usage
206
+
207
+ ### Try it with sample data
208
+
209
+ Don't have a TikTok export yet? Use the included sample:
210
+
211
+ ```bash
212
+ tokpipe analyze examples/sample_data.csv --output sample_results/
213
+ ```
214
+
215
+ ### Quick start
216
+
217
+ ```bash
218
+ # Make sure your venv is active
219
+ source .venv/bin/activate
220
+
221
+ # Run the pipeline on your export file
222
+ tokpipe analyze ~/Downloads/TikTok_Analytics.xlsx
223
+ ```
224
+
225
+ That's it. It will create a `results/` folder with everything.
226
+
227
+ ### Output files
228
+
229
+ ```
230
+ results/
231
+ report.csv # Your data cleaned + engagement rate, completion rate, category
232
+ analytics.xlsx # Excel with native formulas (open in Excel/Google Sheets)
233
+ dashboard.html # Interactive dashboard (open in any browser)
234
+ engagement.png # Engagement rate distribution chart
235
+ best_hours.png # Which hours get the best engagement
236
+ growth.png # How your views are trending over time
237
+ ```
238
+
239
+ ### All CLI options
240
+
241
+ ```bash
242
+ tokpipe analyze <file> [options]
243
+ ```
244
+
245
+ | Option | What it does | Example |
246
+ |---|---|---|
247
+ | `--output`, `-o` | Output directory (default: `results/`) | `--output my_report/` |
248
+ | `--followers` | Your follower count (shown in reports) | `--followers 8728` |
249
+ | `--period` | Label for the date range you're analyzing | `--period "24 Feb - 23 Mar 2026"` |
250
+ | `--rules` | Path to YAML file with custom classification rules | `--rules my_rules.yaml` |
251
+ | `--no-charts` | Skip PNG chart generation | |
252
+ | `--no-dashboard` | Skip HTML dashboard generation | |
253
+ | `--no-excel` | Skip Excel report generation | |
254
+
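For readers who want to script around the CLI, the option table above maps naturally onto `argparse`. The parser below is a hypothetical reconstruction from that table, not the code in `tokpipe.cli`; the `int` type for `--followers` and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented `tokpipe analyze` options."""
    parser = argparse.ArgumentParser(prog="tokpipe")
    sub = parser.add_subparsers(dest="command", required=True)
    analyze = sub.add_parser("analyze", help="Run the pipeline on an export file")
    analyze.add_argument("file", help="Path to the TikTok export (XLSX or CSV)")
    analyze.add_argument("--output", "-o", default="results/", help="Output directory")
    analyze.add_argument("--followers", type=int, help="Follower count shown in reports")
    analyze.add_argument("--period", help="Label for the analyzed date range")
    analyze.add_argument("--rules", help="YAML file with classification rules")
    analyze.add_argument("--no-charts", action="store_true", help="Skip PNG charts")
    analyze.add_argument("--no-dashboard", action="store_true", help="Skip HTML dashboard")
    analyze.add_argument("--no-excel", action="store_true", help="Skip Excel report")
    return parser


args = build_parser().parse_args(
    ["analyze", "data.xlsx", "--followers", "8728", "--no-excel"]
)
print(args.file, args.output, args.followers, args.no_excel)
```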
255
+ ### Full example
256
+
257
+ ```bash
258
+ tokpipe analyze TikTok_Analytics.xlsx \
259
+ --output results/ \
260
+ --followers 8728 \
261
+ --period "24 Feb - 23 Mar 2026" \
262
+ --rules rules.yaml
263
+ ```
264
+
265
+ ### Only want the CSV?
266
+
267
+ ```bash
268
+ tokpipe analyze data.xlsx --no-dashboard --no-excel --no-charts
269
+ ```
270
+
271
+ ### Python API
272
+
273
+ ```python
274
+ from tokpipe import ingest, clean, classify, metrics, output, excel, dashboard
275
+
276
+ # Load and clean
277
+ raw = ingest.load("TikTok_Analytics.xlsx")
278
+ df = clean.normalize(raw)
279
+
280
+ # Classify content
281
+ df["category"] = classify.classify(df)
282
+
283
+ # Compute metrics
284
+ report = metrics.compute(df)
285
+ print(report.summary())
286
+
287
+ # Export
288
+ output.to_csv(report, "report.csv")
289
+ excel.to_excel(report, "analytics.xlsx", followers=8728)
290
+ dashboard.generate(report, "dashboard.html")
291
+ ```
292
+
293
+ ---
294
+
295
+ ## Content classification
296
+
297
+ By default, tokpipe classifies videos into: setup, coding, data, study, tech, other.
298
+
299
+ ### Custom rules via YAML
300
+
301
+ Create a `rules.yaml`:
302
+
303
+ ```yaml
304
+ setup:
305
+ - keyboard
306
+ - monitor
307
+ - desk
308
+ - compra
309
+ coding:
310
+ - python
311
+ - debug
312
+ - script
313
+ data:
314
+ - dataset
315
+ - pandas
316
+ - sql
317
+ study:
318
+ - exam
319
+ - uni
320
+ - homework
321
+ ```
322
+
323
+ ```bash
324
+ tokpipe analyze data.xlsx --rules rules.yaml
325
+ ```
326
+
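A rules file like the one above amounts to a keyword lookup. The sketch below assumes case-insensitive substring matching against the caption, with the first matching category winning; how `tokpipe.classify` actually resolves matches may differ. `RULES` is written out as the plain dict that loading the YAML would give, and `classify_caption` is a hypothetical helper.

```python
# The rules.yaml above, written as the dict a YAML loader would return for it.
RULES = {
    "setup": ["keyboard", "monitor", "desk", "compra"],
    "coding": ["python", "debug", "script"],
    "data": ["dataset", "pandas", "sql"],
    "study": ["exam", "uni", "homework"],
}


def classify_caption(caption: str, rules: dict[str, list[str]] = RULES) -> str:
    """Return the first category with a keyword in the caption, else 'other'."""
    text = caption.lower()
    for category, keywords in rules.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


for caption in ["New desk and monitor tour", "Debugging a Python script", "Sunday vlog"]:
    print(caption, "->", classify_caption(caption))
```

Plain substring matching is simple but can over-match: under this sketch the keyword `uni` also fires on "community". Longer keywords or word-boundary matching would avoid that.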
327
+ ### Custom function (Python API)
328
+
329
+ ```python
330
+ def my_classifier(text: str) -> str:
331
+ if "python" in text:
332
+ return "coding"
333
+ if "setup" in text:
334
+ return "setup"
335
+ return "other"
336
+
337
+ df["category"] = classify.classify(df, classifier_fn=my_classifier)
338
+ ```
339
+
340
+ ---
341
+
342
+ ## SQL queries
343
+
344
+ The `sql/` directory contains reference queries for analyzing your exported CSV with DuckDB, SQLite, or any SQL engine:
345
+
346
+ ```bash
347
+ # Example with DuckDB
348
+ duckdb -c "
349
+ CREATE TABLE videos AS SELECT * FROM read_csv_auto('results/report.csv');
350
+ SELECT * FROM videos ORDER BY engagement_rate DESC LIMIT 10;
351
+ "
352
+ ```
353
+
354
+ See [sql/queries.sql](sql/queries.sql) for the full set.
355
+
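The same top-10 query can be run from Python with the standard-library `sqlite3` module. The sketch below uses a few made-up rows in place of `results/report.csv`; only `engagement_rate` is a column name taken from the query above, while `caption`, `views`, and the helper `top_by_engagement` are assumptions for illustration.

```python
import sqlite3

# Made-up rows standing in for results/report.csv (caption, views, engagement_rate).
ROWS = [
    ("desk tour", 12000, 0.081),
    ("pandas tips", 5400, 0.125),
    ("exam week", 30000, 0.042),
]


def top_by_engagement(rows, limit=10):
    """Load rows into an in-memory table and return the top `limit` by engagement."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE videos (caption TEXT, views INTEGER, engagement_rate REAL)"
        )
        con.executemany("INSERT INTO videos VALUES (?, ?, ?)", rows)
        cur = con.execute(
            "SELECT caption, engagement_rate FROM videos "
            "ORDER BY engagement_rate DESC LIMIT ?",
            (limit,),
        )
        return cur.fetchall()
    finally:
        con.close()


print(top_by_engagement(ROWS, limit=2))
```

To run it on a real report, the rows could be read from the CSV with `csv.DictReader`, using whatever column names the file actually contains.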
356
+ ---
357
+
358
+ ## Available metrics
359
+
360
+ | Metric | Formula / Description |
361
+ |---|---|
362
+ | Engagement rate | (likes + comments + shares) / views |
363
+ | Average watch time | Total watch time / views |
364
+ | Completion rate | Average watch time / video duration |
365
+ | Best posting hour | Hour with highest median engagement |
366
+ | Growth trend | Rolling 7-day average of views |
367
+ | Top performers | Videos above 90th percentile engagement |
368
+
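The per-video formulas in the table are short enough to write out directly. The functions below are a dependency-free sketch of those formulas, not the code in `tokpipe.metrics`: the function names, the zero-view guard, and the assumption that watch time and duration share a unit (for example seconds) are mine.

```python
from statistics import quantiles


def engagement_rate(likes: int, comments: int, shares: int, views: int) -> float:
    """(likes + comments + shares) / views, or 0.0 for a video with no views."""
    return (likes + comments + shares) / views if views else 0.0


def completion_rate(total_watch_time: float, views: int, duration: float) -> float:
    """Average watch time (total watch time / views) divided by video duration."""
    if not views or not duration:
        return 0.0
    return (total_watch_time / views) / duration


def top_performers(rates: list[float]) -> list[float]:
    """Engagement rates strictly above the 90th percentile (needs >= 2 values)."""
    p90 = quantiles(rates, n=10, method="inclusive")[-1]
    return [rate for rate in rates if rate > p90]


print(engagement_rate(likes=900, comments=60, shares=40, views=10_000))
print(completion_rate(total_watch_time=120_000, views=10_000, duration=30))
print(top_performers([float(i) for i in range(1, 12)]))
```

Note that engagement rate here is a fraction of views, so 0.1 means 10%.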
369
+ ---
370
+
371
+ ## Project structure
372
+
373
+ ```
374
+ tokpipe/
375
+ .github/
376
+ workflows/
377
+ ci.yml # GitHub Actions CI (tests on Python 3.10-3.13)
378
+ ISSUE_TEMPLATE/
379
+ bug_report.md # Bug report template
380
+ feature_request.md # Feature request template
381
+ src/
382
+ tokpipe/
383
+ __init__.py # Package init, version
384
+ cli.py # Command-line interface
385
+ ingest.py # Load TikTok exports
386
+ clean.py # Normalize and clean data
387
+ classify.py # Content classifier (YAML / custom function)
388
+ metrics.py # Compute derived metrics
389
+ output.py # CSV/JSON export + matplotlib charts
390
+ excel.py # Excel report with formulas
391
+ dashboard.py # Interactive Plotly HTML dashboard
392
+ tests/
393
+ test_ingest.py
394
+ test_clean.py
395
+ test_metrics.py
396
+ sql/
397
+ queries.sql # Reference SQL queries
398
+ examples/
399
+ basic_analysis.py # Minimal working example
400
+ sample_data.csv # Fake data to test without a TikTok account
401
+ pyproject.toml
402
+ LICENSE
403
+ CONTRIBUTING.md
404
+ README.md
405
+ ```
406
+
407
+ ---
408
+
409
+ ## Troubleshooting
410
+
411
+ ### `ModuleNotFoundError: No module named 'tokpipe'`
412
+
413
+ Your virtual environment is not activated. Run:
414
+
415
+ ```bash
416
+ source .venv/bin/activate # macOS / Linux
417
+ .venv\Scripts\activate # Windows
418
+ ```
419
+
420
+ ### `ModuleNotFoundError: No module named 'pandas'`
421
+
422
+ Dependencies are not installed. Run:
423
+
424
+ ```bash
425
+ pip install -e .
426
+ ```
427
+
428
+ ### `ValueError: Could not find a 'views' column`
429
+
430
+ tokpipe couldn't match any column in your export to "views". This happens when the export uses a language tokpipe doesn't recognize yet. Open your file, check the column name for views, and [open an issue](https://github.com/aroaxinping/tokpipe/issues/new) with the column names so we can add support.
431
+
432
+ ### `FileNotFoundError: File not found`
433
+
434
+ Check that the path to your export file is correct. Use the full path:
435
+
436
+ ```bash
437
+ tokpipe analyze /Users/you/Downloads/TikTok_Analytics.xlsx
438
+ ```
439
+
440
+ ### Charts are not generated
441
+
442
+ If you see `-- Skipping best hours` or `-- Skipping growth trend`, your export file doesn't have a date/time column. tokpipe needs a column with post dates to generate time-based charts. The engagement distribution chart will still work.
443
+
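Both skipped charts depend on a post timestamp, which the sketch below makes concrete. It follows the definitions in the metrics table (best hour = hour with the highest median engagement, growth = rolling 7-day average of views), but the function names and the choice to average over only the days that have data are assumptions, not tokpipe's implementation.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from statistics import median


def best_posting_hour(posts: list[tuple[datetime, float]]) -> int:
    """Hour of day (0-23) whose posts have the highest median engagement rate."""
    by_hour = defaultdict(list)
    for posted_at, rate in posts:
        by_hour[posted_at.hour].append(rate)
    return max(by_hour, key=lambda hour: median(by_hour[hour]))


def rolling_7day_views(daily_views: dict[date, int]) -> dict[date, float]:
    """Mean views over each day and the six days before it, skipping missing days."""
    result = {}
    for day in sorted(daily_views):
        days = [day - timedelta(days=k) for k in range(7)]
        window = [daily_views[d] for d in days if d in daily_views]
        result[day] = sum(window) / len(window)
    return result


posts = [
    (datetime(2026, 3, 1, 18, 0), 0.10),
    (datetime(2026, 3, 2, 18, 30), 0.06),
    (datetime(2026, 3, 3, 9, 15), 0.07),
]
print(best_posting_hour(posts))
print(rolling_7day_views({date(2026, 3, 1): 100, date(2026, 3, 2): 300}))
```

Without a date column there is nothing to group by hour or to order by day, so only the engagement distribution can be drawn.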
444
+ ### `pip install -e .` fails
445
+
446
+ If you're on Python 3.14+, try installing without editable mode:
447
+
448
+ ```bash
449
+ pip install .
450
+ ```
451
+
452
+ Or install dependencies manually and run with PYTHONPATH:
453
+
454
+ ```bash
455
+ pip install pandas openpyxl matplotlib seaborn plotly pyyaml
456
+ PYTHONPATH=src tokpipe analyze data.xlsx
457
+ ```
458
+
459
+ ---
460
+
461
+ ## Contributing
462
+
463
+ See [CONTRIBUTING.md](CONTRIBUTING.md).
464
+
465
+ ---
466
+
467
+ ## License
468
+
469
+ MIT. See [LICENSE](LICENSE).