csvtrim-1.0.1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
csvtrim-1.0.1/PKG-INFO ADDED
@@ -0,0 +1,307 @@
1
+ Metadata-Version: 2.4
2
+ Name: csvtrim
3
+ Version: 1.0.1
4
+ Summary: Filter and trim large CSV files by column values — keep only the rows and columns you need.
5
+ License: MIT
6
+ Project-URL: Homepage, https://github.com/kimtholstorf/csvtrim
7
+ Project-URL: Repository, https://github.com/kimtholstorf/csvtrim
8
+ Keywords: csv,data,filter,trim,azure,billing
9
+ Classifier: Programming Language :: Python :: 3
10
+ Classifier: License :: OSI Approved :: MIT License
11
+ Classifier: Operating System :: OS Independent
12
+ Classifier: Environment :: Console
13
+ Classifier: Topic :: Utilities
14
+ Requires-Python: >=3.10
15
+ Description-Content-Type: text/markdown
16
+ Requires-Dist: pandas
17
+ Requires-Dist: openpyxl
18
+
19
+ # csvTrim
20
+
21
+ Filter and trim large CSV files by column values — keep only the rows and columns you need.
22
+
23
+ csvTrim processes a single file or an entire folder of CSVs in one pass. It is optimised for large billing exports (e.g. Azure cost data) but works with any structured CSV. Results can also be exported to Excel.
24
+
25
+ ---
26
+
27
+ ## Features
28
+
29
+ - **Row filtering** — keep only rows whose filter column matches a list of values
30
+ - **Column trimming** — drop every column not in your keep list
31
+ - **Folder processing** — pass a folder path to process all `.csv` files at once
32
+ - **Preset system** — save named filter configurations to `presets.json` and load them by name
33
+ - **Auto-default preset** — run with just `--input` / `--output` to use the preset marked as default
34
+ - **Excel export** — optional `.xlsx` output; splits automatically across sheets if rows exceed Excel's worksheet limit
35
+ - **Memory-efficient** — reads files in 100,000-row chunks so large exports don't run out of RAM
36
+ - **Run summary** — shows row counts, reduction percentage, per-value breakdown, and elapsed time
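csvTrim itself does this with pandas in 100,000-row chunks; purely to illustrate the streaming idea, here is a standard-library sketch of row filtering plus column trimming in one pass (function and parameter names are illustrative, not csvTrim's actual internals):

```python
import csv

def trim_csv(src, dst, filter_column, keep_values, keep_columns):
    """Stream src to dst, keeping only matching rows and the listed columns."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=keep_columns)
        writer.writeheader()
        kept = 0
        for row in reader:  # one row at a time, so memory stays flat
            if row[filter_column] in keep_values:
                writer.writerow({c: row[c] for c in keep_columns})
                kept += 1
    return kept
```

Because rows are processed as they are read, the whole input never needs to fit in RAM — the same property the chunked pandas reader provides.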
37
+
38
+ ---
39
+
40
+ ## Requirements
41
+
42
+ - Python 3.10+
43
+ - `pandas`
44
+ - `openpyxl` (only needed for `--excel`)
45
+
46
+ ### Install
47
+
48
+ ```bash
49
+ # One-time setup (creates .venv with pandas + openpyxl)
50
+ bash setup_python_env.sh
51
+
52
+ # Activate the environment
53
+ source .venv/bin/activate
54
+ ```
55
+
56
+ The setup script installs [uv](https://github.com/astral-sh/uv) if it isn't already present (via [Homebrew](https://formulae.brew.sh/formula/uv) if available, otherwise via curl).
57
+
58
+ ---
59
+
60
+ ## Install via pip
61
+
62
+ ```bash
63
+ pip install csvtrim
64
+
65
+ # or, for an isolated install that won't affect your system Python:
66
+ pipx install csvtrim
67
+ ```
68
+
69
+ After installation, `csvtrim` is available as a shell command — no venv activation needed:
70
+
71
+ ```bash
72
+ csvtrim --input data.csv --output trimmed.csv
73
+ ```
74
+
75
+ The default `presets.json` is bundled with the package. To use a custom presets file, pass `--preset-file /path/to/your_presets.json`.
76
+
77
+ ---
78
+
79
+ ## Docker
80
+
81
+ ### Build
82
+
83
+ ```bash
84
+ docker build -t csvtrim .
85
+ ```
86
+
87
+ ### Run
88
+
89
+ Either use the locally built image or pull the published one from GitHub Container Registry. Mount a local folder to `/data` with `-v` to pass files in and retrieve output. All arguments work identically to the local script.
90
+
91
+ ```bash
92
+ docker pull ghcr.io/kimtholstorf/csvtrim:latest
93
+
94
+ docker run --rm -it \
95
+ -v /your/data:/data \
96
+ ghcr.io/kimtholstorf/csvtrim:latest \
97
+ --input /data/export.csv --output /data/trimmed.csv
98
+ ```
99
+
100
+ The `-it` flag gives csvTrim a real terminal so the progress bar and ANSI output render correctly. `--rm` removes the container automatically when it exits.
101
+
102
+ ---
103
+
104
+ ## Quick start
105
+
106
+ ```bash
107
+ # Use the default preset, trim a single file
108
+ python3 csvTrim.py --input data.csv --output trimmed.csv
109
+
110
+ # Process an entire folder, also produce Excel output
111
+ python3 csvTrim.py --input ./exports --output trimmed.csv --excel
112
+
113
+ # Use a named preset
114
+ python3 csvTrim.py --input data.csv --output trimmed.csv --preset Azure
115
+ ```
116
+
117
+ ---
118
+
119
+ ## CLI reference
120
+
121
+ | Argument | Short | Description |
122
+ |---|---|---|
123
+ | `--input PATH` | `-i` | Single `.csv` file or folder of `.csv` files to process. Required unless `--preset-save` is used. |
124
+ | `--output FILE` | `-o` | Output CSV file path (e.g. `trimmed.csv`). Required unless `--preset-save` is used. |
125
+ | `--excel` | `-e` | Also write an `.xlsx` file alongside the output CSV. Splits into multiple sheets if the row count exceeds Excel's worksheet limit. |
126
+ | `--filter LIST` | `-f` | Python list of values to keep, matched against `--filter-column`. Omit to use the default preset. Example: `"['Compute', 'Storage']"` |
127
+ | `--filter-column COL` | `-fc` | Column name to match filter values against. Omit to use the default preset. |
128
+ | `--columns LIST` | `-c` | Python list of column names to keep in the output. Omit to use the default preset. Example: `"['meterCategory', 'quantity']"` |
129
+ | `--preset NAME` | `-p` | Load all filter settings from a named preset. Overrides `--filter`, `--filter-column`, and `--columns`. If no `--preset` and no individual flags are given, the `_default` preset is loaded automatically. |
130
+ | `--preset-file FILE` | `-pf` | Path to a custom JSON presets file. Defaults to `presets.json` next to the script. |
131
+ | `--preset-save NAME` | `-ps` | Save the current `--filter`, `--filter-column`, and `--columns` as a named preset (or overwrite an existing one). No CSV trimming is performed. |
132
+ | `--version` | `-v` | Print the version and exit. |
133
+
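The `LIST` arguments above are Python-literal strings. One safe way to turn such a string into an actual list — a sketch, not necessarily how csvTrim parses its flags — is `ast.literal_eval`, which evaluates literals only and never executes code:

```python
import ast

def parse_list_flag(raw):
    """Parse a flag value like "['Compute', 'Storage']" into a Python list."""
    value = ast.literal_eval(raw)  # literals only; arbitrary code is rejected
    if not isinstance(value, list):
        raise ValueError(f"expected a list, got {type(value).__name__}")
    return value
```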
134
+ ### Flag resolution order
135
+
136
+ When deciding which filter settings to use, csvTrim applies this priority:
137
+
138
+ 1. **`--preset NAME`** — load everything from the named preset; individual flags are ignored.
139
+ 2. **No flags at all** — auto-load the `_default` preset from `presets.json`.
140
+ 3. **One or more individual flags** — load the `_default` preset as a base, then apply any explicitly passed flags on top.
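The precedence above can be sketched as a simple merge over preset dictionaries (a hedged illustration — the function and settings names are hypothetical, not csvTrim's actual code):

```python
def resolve_settings(presets, preset_name=None, **flags):
    """Apply the priority: named preset > explicit flags layered on the default."""
    if preset_name:  # 1. --preset wins outright; individual flags are ignored
        return dict(presets[preset_name])
    # 2./3. start from the '_default' preset, then overlay any flags given
    base = dict(presets[presets["_default"]])
    base.update({k: v for k, v in flags.items() if v is not None})
    return base
```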
141
+
142
+ ---
143
+
144
+ ## Preset system
145
+
146
+ Presets are stored in a JSON file (`presets.json` by default, next to the script). Each preset holds three values: the column to filter on, which values to keep, and which output columns to retain.
147
+ The `"_default"` key names which preset to load when no `--preset` or individual flags are given. To change the default, edit the string value — no other changes needed.
148
+
149
+ ### File format
150
+
151
+ ```json
152
+ {
153
+ "_default": "Azure",
154
+ "Azure": {
155
+ "filter_column": "serviceFamily",
156
+ "filter": ["Compute", "Networking", "Storage"],
157
+ "columns": [
158
+ "serviceFamily",
159
+ "meterCategory",
160
+ "meterSubCategory",
161
+ "meterName",
162
+ "ProductName",
163
+ "productOrderName",
164
+ "meterRegion",
165
+ "quantity",
166
+ "pricingModel",
167
+ "term",
168
+ "unitOfMeasure",
169
+ "ResourceId",
170
+ "date"
171
+ ]
172
+ }
173
+ }
174
+ ```
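Reading such a file takes only the standard library. A minimal loader sketch (csvTrim's own loader may differ), including the sanity check that `"_default"` names a preset that actually exists:

```python
import json
from pathlib import Path

def load_presets(path="presets.json"):
    """Load the presets file and validate the '_default' pointer."""
    presets = json.loads(Path(path).read_text())
    if "_default" not in presets or presets["_default"] not in presets:
        raise ValueError("'_default' must name an existing preset")
    return presets
```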
175
+
176
+ ### Using a preset
177
+
178
+ ```bash
179
+ python3 csvTrim.py --input data.csv --output out.csv --preset Azure
180
+ ```
181
+
182
+ ### Saving a new preset
183
+
184
+ Use `--preset-save` together with the individual flags. No trimming is performed — the preset is written to `presets.json` and the script exits.
185
+
186
+ ```bash
187
+ # Save a brand-new preset
188
+ python3 csvTrim.py --preset-save GCP \
189
+ --filter-column "service.description" \
190
+ --filter "['Compute Engine', 'Cloud Storage', 'BigQuery']" \
191
+ --columns "['billing_account_id', 'service.description', 'cost', 'currency']"
192
+
193
+ # Copy an existing preset under a new name
194
+ python3 csvTrim.py --preset Azure --preset-save AzureBackup
195
+ ```
196
+
197
+ If the preset name already exists it is overwritten. The script prints a confirmation showing what was saved.
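Under the hood, saving a preset amounts to a read-modify-write of the JSON file. A sketch with illustrative names (seeding `_default` on first write is this sketch's choice, not documented csvTrim behaviour):

```python
import json
from pathlib import Path

def save_preset(path, name, filter_column, filter_values, columns):
    """Add or overwrite a named preset in the presets file."""
    file = Path(path)
    # Seed a fresh file with this preset as the default (illustrative choice)
    presets = json.loads(file.read_text()) if file.exists() else {"_default": name}
    presets[name] = {
        "filter_column": filter_column,
        "filter": filter_values,
        "columns": columns,
    }
    file.write_text(json.dumps(presets, indent=2))
```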
198
+
199
+ ### Using a custom presets file
200
+
201
+ ```bash
202
+ python3 csvTrim.py --input data.csv --output out.csv \
203
+ --preset MyPreset --preset-file /path/to/my_presets.json
204
+ ```
205
+
206
+ `--preset-file` works with `--preset`, `--preset-save`, and the auto-default flow.
207
+
208
+ ---
209
+
210
+ ## Examples
211
+
212
+ ```bash
213
+ # Default run — auto-loads the '_default' preset
214
+ python3 csvTrim.py --input data.csv --output trimmed.csv
215
+
216
+ # Named preset
217
+ python3 csvTrim.py --input data.csv --output trimmed.csv --preset Azure
218
+
219
+ # Folder of CSVs + Excel output
220
+ python3 csvTrim.py --input ./monthly_exports --output combined.csv --excel
221
+
222
+ # Override only the filter values; other settings come from the default preset
223
+ python3 csvTrim.py --input data.csv --output out.csv \
224
+ --filter "['SaaS', 'Developer Tools', 'Containers', 'Databases']"
225
+
226
+ # Fully custom filter (no preset)
227
+ python3 csvTrim.py --input data.csv --output out.csv \
228
+ --filter-column meterCategory \
229
+ --filter "['Virtual Machines', 'Storage']" \
230
+ --columns "['meterCategory', 'quantity', 'date']"
231
+
232
+ # Save a preset then use it
233
+ python3 csvTrim.py --preset-save Prod \
234
+ --filter-column serviceFamily \
235
+ --filter "['Compute', 'Networking']" \
236
+ --columns "['serviceFamily', 'meterCategory', 'quantity', 'date']"
237
+
238
+ python3 csvTrim.py --input data.csv --output out.csv --preset Prod
239
+ ```
240
+
241
+ ---
242
+
243
+ ## Docker examples
244
+
245
+ Same examples as above, run inside the container. Mount your data folder to `/data` and prefix paths accordingly. Use `--preset-file /data/presets.json` when saving or loading presets so the presets file lives in the mounted folder and survives after the container exits.
246
+
247
+ ```bash
248
+ # Default run — auto-loads the '_default' preset
249
+ docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
250
+ --input /data/export.csv --output /data/trimmed.csv
251
+
252
+ # Named preset
253
+ docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
254
+ --input /data/export.csv --output /data/trimmed.csv --preset Azure
255
+
256
+ # Folder of CSVs + Excel output
257
+ docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
258
+ --input /data/monthly_exports --output /data/combined.csv --excel
259
+
260
+ # Override only the filter values; other settings come from the default preset
261
+ docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
262
+ --input /data/export.csv --output /data/out.csv \
263
+ --filter "['SaaS', 'Developer Tools', 'Containers', 'Databases']"
264
+
265
+ # Fully custom filter (no preset)
266
+ docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
267
+ --input /data/export.csv --output /data/out.csv \
268
+ --filter-column meterCategory \
269
+ --filter "['Virtual Machines', 'Storage']" \
270
+ --columns "['meterCategory', 'quantity', 'date']"
271
+
272
+ # Save a preset to the mounted folder, then use it
273
+ docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
274
+ --preset-save Prod \
275
+ --filter-column serviceFamily \
276
+ --filter "['Compute', 'Networking']" \
277
+ --columns "['serviceFamily', 'meterCategory', 'quantity', 'date']" \
278
+ --preset-file /data/presets.json
279
+
280
+ docker run --rm -it -v /your/data:/data ghcr.io/kimtholstorf/csvtrim:latest \
281
+ --input /data/export.csv --output /data/out.csv \
282
+ --preset Prod --preset-file /data/presets.json
283
+ ```
284
+
285
+ ---
286
+
287
+ ## Output
288
+
289
+ After processing, csvTrim prints a summary:
290
+
291
+ ```
292
+ ══════════════════════════════════════════════════════════
293
+ Files: 3 Rows in: 2,841,504 Elapsed: 8.3s
294
+ ──────────────────────────────────────────────────────────
295
+ Columns kept: 13
296
+ Columns removed: 51 (79.7%)
297
+ Rows out: 312,847
298
+ Rows removed: 2,528,657 (89.0% reduction)
299
+ ──────────────────────────────────────────────────────────
300
+ Rows by serviceFamily:
301
+ Compute 241,003
302
+ Networking 48,201
303
+ Storage 23,643
304
+ ══════════════════════════════════════════════════════════
305
+ ```
306
+
307
+ Skipped files (missing columns, encoding errors, etc.) are listed below the summary with the reason.
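Since rows stream through one filter pass, the per-value breakdown and reduction figure fall out of simple counting. An illustrative sketch (names hypothetical, not csvTrim's internals):

```python
from collections import Counter

def summarize(rows_in, kept_rows, filter_column):
    """Compute rows out, reduction percentage, and per-value row counts."""
    breakdown = Counter(row[filter_column] for row in kept_rows)
    rows_out = sum(breakdown.values())
    reduction = 100 * (rows_in - rows_out) / rows_in if rows_in else 0.0
    return rows_out, reduction, breakdown
```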
@@ -0,0 +1,2 @@
1
+ from csvtrim.csvTrim import main
2
+ main()