datalore-cli 0.4.1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Mihajlo Micic

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,347 @@
Metadata-Version: 2.4
Name: datalore-cli
Version: 0.4.1
Summary: Deterministic local data analysis CLI for coding agents.
Author: Mihajlo Micic
License-Expression: MIT
Project-URL: Repository, https://github.com/micic-mihajlo/datalore-cli
Project-URL: Issues, https://github.com/micic-mihajlo/datalore-cli/issues
Keywords: cli,data-analysis,agents,codex,claude-code,pandas
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Requires-Dist: matplotlib
Requires-Dist: seaborn
Requires-Dist: scikit-learn
Requires-Dist: numpy
Dynamic: license-file

# Datalore

Datalore is a deterministic local data-analysis CLI designed to amplify coding agents.

The core idea is simple: when Codex, Claude Code, or a human operator needs a reliable answer about a local dataset, they should call a stable binary with explicit arguments and parse structured output instead of improvising pandas, sklearn, and matplotlib every time.

## What It Does

- Inspect local CSV, Excel, and JSON datasets
- Profile columns, dtypes, missing values, and duplicates
- Generate structured descriptive statistics
- Generate structured correlation reports
- Run deterministic linear regression
- Apply deterministic cleaning and export operations
- Compare two datasets structurally and at the row level
- Generate reproducible plot artifacts
- Diagnose runtime/install issues with a doctor command
- Emit either human-readable text or machine-readable JSON
- Optionally persist the full JSON envelope to a report file
- Keep outputs under `artifacts/` by default

## Install

Current install flow, from any machine where you want to analyze local data:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install 'git+https://github.com/micic-mihajlo/datalore-cli.git'
```

After that, the public interface is just:

```bash
datalore --help
```

The repo-local `./datalore` wrapper is only a development convenience for contributors working inside this repository.

If you are developing Datalore itself, install it in editable mode from the repo root:

```bash
pip install -e .
```

## Use In Your Own Project

Move into the folder that contains your data:

```bash
cd /path/to/my/data
datalore init
```

That creates:
- `AGENTS.md`
- `CLAUDE.md`
- `./datalore`
- `artifacts/`
- a `.gitignore` entry for `artifacts/`

After that, both you and your coding agent can work from that folder directly:

```bash
./datalore files . --format json
./datalore profile sales.csv --format json
./datalore clean sales.csv --rename sales=>revenue --output artifacts/datasets/sales_clean.csv --format json
```

Why the wrapper matters:
- humans can keep using `datalore`
- agents are usually safer using `./datalore` because it binds the project to the exact Python interpreter that installed Datalore
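
As a rough illustration of why that binding helps (this is a hypothetical sketch, not the wrapper `datalore init` actually generates), a project-local launcher only needs to pin the interpreter before falling back to `PATH`:

```shell
# Hypothetical wrapper sketch: prefer the project venv's entry point when it
# exists, otherwise fall back to whatever `datalore` resolves to on PATH.
cat > datalore-wrapper <<'EOF'
#!/usr/bin/env bash
if [ -x ".venv/bin/datalore" ]; then
  exec ".venv/bin/datalore" "$@"
fi
exec datalore "$@"
EOF
chmod +x datalore-wrapper
```

Because the venv path is checked first, the wrapper keeps working even when an agent's shell has a different Python on `PATH`.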

## CLI Usage

Inspect a dataset:

```bash
datalore inspect data.csv
```

Profile a dataset in JSON mode for agent consumption:

```bash
datalore profile data.csv --format json
```

Run a simple regression:

```bash
datalore regression data.csv --target last --format json
```

Constrain the predictors:

```bash
datalore regression data.csv --target revenue --predictors spend,visits,leads --format json
```

Create a plot with a stable output path:

```bash
datalore plot data.csv --kind histogram --x revenue --output artifacts/plots/revenue_hist.png --format json
```

Generate a correlation heatmap:

```bash
datalore plot data.csv --kind correlation-heatmap --format json
```

Compare two datasets:

```bash
datalore compare before.csv after.csv --format json
datalore compare before.csv after.csv --key-columns id --format json
```

Generate structured summary statistics:

```bash
./datalore summary data.csv --format json
```

Generate a structured correlation report:

```bash
./datalore correlate data.csv --target revenue --min-abs-correlation 0.3 --format json
```

Clean and export a dataset deterministically:

```bash
datalore clean data.csv --fill-numeric median --drop-duplicates --output artifacts/datasets/data_clean.csv --format json
```

Apply deterministic transforms without falling back to ad hoc pandas:

```bash
datalore clean data.csv \
  --rename sales=>revenue \
  --derive sales_per_customer=>revenue\|/\|customers \
  --filter 'customers>=24' \
  --limit 100 \
  --output artifacts/datasets/data_prepared.csv \
  --format json
```
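
The `--derive` spec above follows a small `new=>left|op|right` shape (the pipes are backslash-escaped for the shell). As a hypothetical illustration of that shape only, not Datalore's actual parsing code, a spec could be decomposed like this:

```python
# Illustrative parser for a derive spec of the form `new=>left|op|right`,
# e.g. `sales_per_customer=>revenue|/|customers`. This sketches the spec's
# apparent shape; the real grammar is owned by the CLI.
def parse_derive(spec: str) -> dict:
    new_column, expression = spec.split("=>", 1)
    left, op, right = expression.split("|")
    return {"new_column": new_column, "left": left, "op": op, "right": right}

parsed = parse_derive("sales_per_customer=>revenue|/|customers")
# parsed["op"] is "/" and parsed["new_column"] is "sales_per_customer"
```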
177
+
178
+ Write the full structured result to disk for downstream tooling:
179
+
180
+ ```bash
181
+ datalore profile data.csv --format json --report-file artifacts/reports/profile.json
182
+ ```
183
+
184
+ Find local datasets, including gitignored files in the repo:
185
+
186
+ ```bash
187
+ datalore files . --format json
188
+ ```
189
+
190
+ Inspect the runtime/install state:
191
+
192
+ ```bash
193
+ datalore doctor --format json
194
+ ```
195
+
196
+ Initialize a non-repo project for Codex and Claude Code:
197
+
198
+ ```bash
199
+ datalore init
200
+ datalore init --agent codex
201
+ ```
202
+
203
+ ## Command Surface
204
+
205
+ `datalore inspect`
206
+ - Quick shape, dtype, missing-value, duplicate, and preview summary.
207
+
208
+ `datalore profile`
209
+ - Adds per-column detail on top of `inspect`.
210
+
211
+ `datalore regression`
212
+ - Runs a deterministic linear regression using numeric columns only.
213
+
214
+ `datalore summary`
215
+ - Emits machine-readable descriptive statistics for numeric and categorical columns.
216
+
217
+ `datalore correlate`
218
+ - Emits a machine-readable correlation matrix, strongest pairs, and optional target-column ranking.
219
+
220
+ `datalore clean`
221
+ - Applies deterministic cleaning and export operations such as filter, derive, rename, select, fill, drop NA, dedupe, sort, limit, and standardize.
222
+ - The transform pipeline runs in a fixed order: `rename -> derive -> filter -> select -> fill -> drop NA -> dedupe -> standardize -> sort -> limit`.
223
+
224
+ `datalore compare`
225
+ - Compares two datasets for shape, schema, dtype, missing-value, duplicate, and row-level changes. With `--key-columns`, it can report changed rows keyed by business identifiers.
226
+
227
+ `datalore files`
228
+ - Discovers local dataset files by walking the workspace, including gitignored inputs that `rg --files` would skip.
229
+
230
+ `datalore plot`
231
+ - Generates histogram, scatter, line, bar, or correlation heatmap artifacts.
232
+
233
+ `datalore doctor`
234
+ - Reports interpreter, artifact-root, and dependency status for install/runtime debugging.
235
+
236
+ `datalore init`
237
+ - Scaffolds `AGENTS.md`, `CLAUDE.md`, a local `./datalore` wrapper, `artifacts/`, and an `artifacts/` ignore rule in the current project.
238
+
239
+ ## Reliability
240
+
241
+ - JSON output uses a stable envelope with `schema_version`, `command`, `result`, and `error`
242
+ - every command supports `--report-file` for downstream automation
243
+ - `datalore` works from any folder after installation
244
+ - `./datalore` prefers the repo venv automatically when developing inside this repo
245
+ - `datalore doctor` gives a stable diagnostics entrypoint when the runtime is suspect
246
+ - startup no longer crashes on `--help` if scientific dependencies are missing
247
+ - CI runs editable install, distribution builds, and the CLI test suite on multiple Python versions via `.github/workflows/ci.yml`
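
A consumer only needs the four top-level envelope keys. A minimal sketch of agent-side handling, assuming an invented `profile` payload (the key names follow the envelope contract above; the values are illustrative, not real Datalore output):

```python
import json

# Parse a Datalore-style JSON envelope. The four top-level keys
# (schema_version, command, result, error) come from the envelope
# described above; the payload values here are made up.
raw = (
    '{"schema_version": 1, "command": "profile",'
    ' "result": {"rows": 12, "columns": 4}, "error": null}'
)
envelope = json.loads(raw)

# Branch on the error field before trusting the result payload.
if envelope["error"] is None:
    row_count = envelope["result"]["rows"]
```

The same pattern works whether the envelope comes from stdout in `--format json` mode or from a `--report-file` on disk.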

## Agent-Friendly Conventions

In a normal user project, prefer:

```bash
./datalore <command> ... --format json
```

Why:
- JSON output is stable and easy to parse
- exit codes are predictable
- artifact paths are explicit
- the analysis method is fixed rather than improvised
- `--report-file` gives you a stable on-disk result for chained workflows
- `datalore files` avoids the common failure mode where the dataset is gitignored and invisible to `rg --files`
- `datalore init` gives Codex and Claude Code repo-local instructions plus a reliable local wrapper in the user's own data project

Inside the Datalore repository itself, contributors can still prefer:

```bash
./datalore <command> ... --format json
```

Repo-native instructions are included for both ecosystems:
- `AGENTS.md` for Codex
- `CLAUDE.md` for Claude Code
- `.agents/skills/` for Codex skills
- `.claude/skills/` for Claude Code skills

These instructions tell agents when to use Datalore instead of writing ad hoc analysis code.

## Artifacts

Generated outputs go under `artifacts/` by default:

- `artifacts/plots/`
- `artifacts/datasets/`
- `artifacts/reports/`
- `artifacts/scripts/`

Override the root with:

```bash
export DATALORE_ARTIFACT_DIR=/path/to/artifacts
```
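
Downstream tooling can mirror the documented resolution rule when locating artifacts. A minimal sketch of that rule (the documented behavior, not Datalore's internal code):

```python
import os
from pathlib import Path

# Resolve the artifact root the way the docs describe: honor
# DATALORE_ARTIFACT_DIR when set, otherwise default to ./artifacts.
def artifact_root() -> Path:
    return Path(os.environ.get("DATALORE_ARTIFACT_DIR", "artifacts"))
```

This lets a chained workflow read plots or reports from the same place Datalore wrote them, regardless of whether the override is set.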

## Publishing

The repository is set up for PyPI Trusted Publishing through GitHub Actions.

Release flow:

```bash
git tag v0.4.1
git push origin v0.4.1
gh release create v0.4.1 --generate-notes
```

Before the first release, create or register the `datalore-cli` project on PyPI and add a Trusted Publisher that matches:

- owner: `micic-mihajlo`
- repository: `datalore-cli`
- workflow: `.github/workflows/publish.yml`
- environment: `pypi`

If the project does not exist yet on PyPI, use PyPI's pending publisher flow for the first release. After that, publishing happens from GitHub Actions without local API tokens.

## Testing

Run the CLI test suite:

```bash
.venv/bin/python -m unittest discover -s tests -v
```

Smoke test the fixture dataset manually:

```bash
./datalore files tests --format json
./datalore inspect tests/fixtures/mini_dataset.csv --format json
./datalore summary tests/fixtures/mini_dataset.csv --format json
./datalore correlate tests/fixtures/mini_dataset.csv --target sales --format json
./datalore regression tests/fixtures/mini_dataset.csv --target sales --predictors marketing,customers --format json
./datalore clean tests/fixtures/mini_dataset.csv --rename sales=>revenue --derive sales_per_customer=>revenue\|/\|customers --filter 'customers>=24' --limit 3 --format json
./datalore clean tests/fixtures/mini_dataset_changed.csv --fill-numeric median --output artifacts/datasets/mini_clean.csv --format json
./datalore compare tests/fixtures/mini_dataset.csv tests/fixtures/mini_dataset_changed.csv --key-columns month --format json
./datalore doctor --format json
```

Smoke test the installed package from outside the repo:

```bash
mkdir -p /tmp/datalore-demo
cp tests/fixtures/mini_dataset.csv /tmp/datalore-demo/
cd /tmp/datalore-demo
datalore init
./datalore files . --format json
./datalore profile mini_dataset.csv --format json
```