dataframe-textual 0.3.2__py3-none-any.whl → 1.5.0__py3-none-any.whl

@@ -0,0 +1,987 @@
1
+ Metadata-Version: 2.4
2
+ Name: dataframe-textual
3
+ Version: 1.5.0
4
+ Summary: Interactive terminal viewer/editor for tabular data
5
+ Project-URL: Homepage, https://github.com/need47/dataframe-textual
6
+ Project-URL: Repository, https://github.com/need47/dataframe-textual.git
7
+ Project-URL: Documentation, https://github.com/need47/dataframe-textual#readme
8
+ Project-URL: Bug Tracker, https://github.com/need47/dataframe-textual/issues
9
+ Author-email: Tiejun Cheng <need47@gmail.com>
10
+ License: MIT
11
+ License-File: LICENSE
12
+ Keywords: csv,data-analysis,editor,excel,interactive,polars,terminal,textual,tui,viewer
13
+ Classifier: Development Status :: 3 - Alpha
14
+ Classifier: Environment :: Console
15
+ Classifier: Intended Audience :: Developers
16
+ Classifier: Intended Audience :: End Users/Desktop
17
+ Classifier: License :: OSI Approved :: MIT License
18
+ Classifier: Natural Language :: English
19
+ Classifier: Operating System :: MacOS
20
+ Classifier: Operating System :: POSIX
21
+ Classifier: Operating System :: Unix
22
+ Classifier: Programming Language :: Python :: 3
23
+ Classifier: Programming Language :: Python :: 3.11
24
+ Classifier: Programming Language :: Python :: 3.12
25
+ Classifier: Programming Language :: Python :: 3.13
26
+ Classifier: Programming Language :: Python :: 3.14
27
+ Classifier: Topic :: Office/Business
28
+ Classifier: Topic :: Utilities
29
+ Classifier: Typing :: Typed
30
+ Requires-Python: >=3.11
31
+ Requires-Dist: polars>=1.34.0
32
+ Requires-Dist: textual[syntax]>=6.5.0
33
+ Provides-Extra: dev
34
+ Requires-Dist: textual-dev>=1.8.0; extra == 'dev'
35
+ Provides-Extra: excel
36
+ Requires-Dist: fastexcel>=0.16.0; extra == 'excel'
37
+ Requires-Dist: xlsxwriter>=3.2.9; extra == 'excel'
38
+ Description-Content-Type: text/markdown
39
+
40
+ # DataFrame Textual
41
+
42
+ A powerful, interactive terminal-based viewer/editor for CSV/TSV/Excel/Parquet/JSON/NDJSON files, built with Python, [Polars](https://pola.rs/), and [Textual](https://textual.textualize.io/). Inspired by [VisiData](https://www.visidata.org/), this tool provides smooth keyboard navigation, data manipulation, and a clean interface for exploring tabular data directly in the terminal, with multi-tab support for multiple files!
43
+
44
+ ![Screenshot](https://raw.githubusercontent.com/need47/dataframe-textual/refs/heads/main/screenshot.png)
45
+
46
+ ## Features
47
+
48
+ ### Data Viewing
49
+ - 🚀 **Fast Loading** - Powered by Polars for efficient data handling
50
+ - 🎨 **Rich Terminal UI** - Beautiful, color-coded columns with various data types (e.g., integer, float, string)
51
+ - ⌨️ **Comprehensive Keyboard Navigation** - Intuitive controls for navigating, editing, and manipulating data
66
+ - 📊 **Flexible Input** - Read from files and/or stdin (pipes/redirects)
67
+ - 🔄 **Smart Pagination** - Lazy load rows on demand for handling large datasets
68
+
69
+ ### Data Manipulation
70
+ - 📝 **Data Editing** - Edit cells, delete rows, and remove columns
71
+ - 🔍 **Search & Filter** - Find values, highlight matches, and filter selected rows
72
+ - ↔️ **Column/Row Reordering** - Move columns and rows with simple keyboard shortcuts
73
+ - 📈 **Sorting & Statistics** - Multi-column sorting and frequency distribution analysis
74
+ - 💾 **Save & Undo** - Save edits back to file with full undo/redo support
75
+
76
+ ### Advanced Features
77
+ - 📂 **Multi-File Support** - Open multiple files in separate tabs
78
+ - 🔄 **Tab Management** - Seamlessly switch between open files with keyboard shortcuts
79
+ - 📌 **Freeze Rows/Columns** - Keep important rows and columns visible while scrolling
80
+ - 🎯 **Cursor Type Cycling** - Switch between cell, row, and column selection modes
81
+ - 🔗 **Link Column Creation** - Generate clickable URLs using template expressions with placeholder support
82
+
83
+ ## Installation
84
+
85
+ ### Using pip
86
+
87
+ ```bash
88
+ # Install from PyPI
89
+ pip install dataframe-textual
90
+
91
+ # With Excel support (fastexcel, xlsxwriter)
92
+ pip install dataframe-textual[excel]
93
+ ```
94
+
95
+ This installs an executable `dv`.
96
+
97
+ Then run:
98
+ ```bash
99
+ dv <csv_file>
100
+ ```
101
+
102
+ ### Using [uv](https://docs.astral.sh/uv/)
103
+
104
+ ```bash
105
+ # Quick run using uvx without installation
106
+ uvx https://github.com/need47/dataframe-textual.git <csv_file>
107
+
108
+ # Clone or download the project
109
+ cd dataframe-textual
110
+ uv sync --extra excel # with Excel support
111
+
112
+ # Run directly with uv
113
+ uv run dv <csv_file>
114
+ ```
115
+
116
+ ### Development installation
117
+
118
+ ```bash
119
+ # Clone the repository
120
+ git clone https://github.com/need47/dataframe-textual.git
121
+ cd dataframe-textual
122
+
123
+ # Install from local source
124
+ pip install -e .
125
+
126
+ # Or with development dependencies
127
+ pip install -e ".[excel,dev]"
128
+ ```
129
+
130
+ ## Usage
131
+
132
+ ### Basic Usage - Single File
133
+
134
+ ```bash
135
+ # After pip install dataframe-textual
136
+ dv pokemon.csv
137
+
138
+ # Or if running from source
139
+ python main.py pokemon.csv
140
+
141
+ # Or with uv
142
+ uv run python main.py pokemon.csv
143
+
144
+ # Read from stdin (auto-detects format; defaults to TSV if not recognized)
145
+ cat data.tsv | dv
146
+ dv < data.tsv
147
+
148
+ # Gzipped files are supported
149
+ dv data.csv.gz
150
+ dv large_dataset.tsv.gz
151
+
152
+ # Specify format for gzipped stdin
153
+ zcat data.csv.gz | dv -f csv
154
+ ```
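Gzip input is often recognized by the two-byte gzip signature at the start of the data. The following is a minimal sketch of that idea using only the Python standard library. `read_maybe_gzipped` is a hypothetical helper written for illustration, not a function from `dataframe-textual`, and the package may detect compression differently (for example, by file extension).

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_maybe_gzipped(data: bytes) -> str:
    """Decode raw bytes as UTF-8 text, decompressing first if they look gzipped."""
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return data.decode("utf-8")
```

A check like this also works for piped input, where no file name is available to inspect.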
155
+
156
+ ### Multi-File Usage - Multiple Tabs
157
+
158
+ ```bash
159
+ # Open multiple files in tabs
160
+ dv file1.csv file2.csv file3.csv
161
+
162
+ # Open multiple sheets in tabs in an Excel file
163
+ dv file.xlsx
164
+
165
+ # Mix files and stdin (read from stdin, then open file)
166
+ dv data1.tsv < data2.tsv
167
+
168
+ # Mix regular and gzipped files
169
+ dv data1.csv data2.csv.gz data3.tsv.gz
170
+ ```
171
+
172
+ When multiple files are opened:
173
+ - Each file appears as a separate tab at the top
174
+ - Switch between tabs using `>` (next) or `<` (previous)
175
+ - Open additional files with `Ctrl+O`
176
+ - Close the current tab with `Ctrl+W`
177
+ - Each file maintains its own state (edits, sort order, selections, history, etc.)
178
+
179
+ ## Command Line Options
180
+
181
+ ```
182
+ usage: dv [-h] [-f {csv,excel,tsv,parquet,json,ndjson}] [-H] [-I] [-E] [-c COMMENT_PREFIX] [-q QUOTE_CHAR] [-l SKIP_LINES] [-a SKIP_ROWS_AFTER_HEADER] [-u NULL [NULL ...]] [files ...]
183
+
184
+ Interactive terminal based viewer/editor for tabular data (e.g., CSV/Excel).
185
+
186
+ positional arguments:
187
+ files Files to view (or read from stdin)
188
+
189
+ options:
190
+ -h, --help show this help message and exit
191
+ -f, --format {csv,excel,tsv,parquet,json,ndjson}
192
+ Specify the format of the input files
193
+ -H, --no-header Specify that input files have no header row
194
+ -I, --no-inferrence Do not infer data types when reading CSV/TSV
195
+ -E, --ignore-errors Ignore errors when reading CSV/TSV
196
+ -c, --comment-prefix COMMENT_PREFIX
197
+ Comment lines are skipped when reading CSV/TSV (default: skip none)
198
+ -q, --quote-char QUOTE_CHAR
199
+ Quote character for reading CSV/TSV (default: "; use None to disable)
200
+ -l, --skip-lines SKIP_LINES
201
+ Skip lines when reading CSV/TSV (default: 0)
202
+ -a, --skip-rows-after-header SKIP_ROWS_AFTER_HEADER
203
+ Skip rows after header when reading CSV/TSV (default: 0)
204
+ -u, --null NULL [NULL ...]
205
+ Values to interpret as null values when reading CSV/TSV
206
+ ```
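To make the interaction of `-l`, `-c`, `-H`, `-a`, and `-u` more concrete, here is a small standard-library sketch. `read_rows` and its parameter names are hypothetical, and the order in which the steps run is an assumption made for illustration. The actual reader is Polars, whose behavior may differ in edge cases.

```python
import csv


def read_rows(text, skip_lines=0, skip_rows_after_header=0,
              comment_prefix=None, null_values=(), has_header=True):
    """Parse CSV text roughly the way the CLI options describe."""
    lines = text.splitlines()[skip_lines:]           # -l: drop leading raw lines
    if comment_prefix:                               # -c: drop comment lines
        lines = [ln for ln in lines if not ln.startswith(comment_prefix)]
    rows = list(csv.reader(lines))
    header = rows.pop(0) if has_header else None     # -H sets has_header=False
    rows = rows[skip_rows_after_header:]             # -a: e.g. skip a units row
    nulls = set(null_values)                         # -u: values read as null
    rows = [[None if cell in nulls else cell for cell in row] for row in rows]
    return header, rows
```

For example, a file with one metadata line, a header, and a units row would be read with `skip_lines=1` and `skip_rows_after_header=1`.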
207
+
208
+ ### CLI Examples
209
+
210
+ ```bash
211
+ # View CSV file without header row
212
+ dv -H data_no_header.csv
213
+
214
+ # Disable type inference for faster loading
215
+ dv -I large_data.csv
216
+
217
+ # Ignore parsing errors in malformed CSV
218
+ dv -E data_with_errors.csv
219
+
220
+ # Skip first 3 lines of file (e.g., comments, metadata)
221
+ dv -l 3 data_with_comments.csv
222
+
223
+ # Skip 1 row after header (e.g., units row)
224
+ dv -a 1 data_with_units.csv
225
+
226
+ # Treat specific values as null/missing (e.g., 'NA', 'N/A', '-')
227
+ dv -u NA N/A - data.csv
228
+
229
+ # Multiple null values with different formats
230
+ dv -u NULL NA "" "Not Available" messy_data.csv
231
+
232
+ # Disable quote character processing for TSV with embedded quotes
233
+ dv -q "" data.tsv
234
+
235
+ # Use different quote character (e.g., single quote for CSV)
236
+ dv -q "'" data.csv
237
+
238
+ # Complex CSV with comments and units row
239
+ dv -l 3 -a 1 -I messy_scientific_data.csv
240
+
241
+ # Combine all options: skip lines, skip after header, no header, no inference, gzipped
242
+ dv -l 2 -a 1 -H -I complex_data.csv.gz
243
+
244
+ # Process compressed data from stdin with line skipping
245
+ zcat compressed_data.csv.gz | dv -f csv -l 2
246
+
247
+ # CSV with custom null values and no header
248
+ dv -H -u NA "N/A" "-" raw_data.csv
249
+
250
+ # Skip lines, specify null values, and disable type inference
251
+ dv -l 5 -I -u NA "" data_with_metadata.csv
252
+
253
+ # TSV file with problematic quotes in data fields
254
+ dv -q None data.tsv
255
+
256
+ # CSV with comment lines and custom null values
257
+ dv -c "#" -u NA "N/A" commented_data.csv
258
+ ```
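Because `-u` accepts one or more values, the position of file names relative to `-u` can matter. The sketch below assumes an argparse-style parser in which `-u` is declared with `nargs="+"`, which is what the usage line suggests. It is a stand-in written for illustration, not code from the package, so verify the behavior against your installed version.

```python
import argparse

# Hypothetical stand-in for the relevant part of the dv command line.
parser = argparse.ArgumentParser(prog="dv")
parser.add_argument("files", nargs="*")
parser.add_argument("-H", "--no-header", action="store_true")
parser.add_argument("-u", "--null", nargs="+")

# A file name that directly follows the -u values is collected as a null value.
greedy = parser.parse_args(["-u", "NA", "N/A", "data.csv"])

# Listing files first, or ending the -u values with "--", keeps them apart.
files_first = parser.parse_args(["data.csv", "-u", "NA", "N/A"])
separated = parser.parse_args(["-u", "NA", "N/A", "--", "data.csv"])
```

Under that assumption, forms such as `dv data.csv -u NA N/A` or `dv -u NA N/A -- data.csv` are the unambiguous ways to combine null values with file arguments.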
259
+
260
+ ## Keyboard Shortcuts
261
+
262
+ ### App-Level Controls
263
+
264
+ #### File & Tab Management
265
+
266
+ | Key | Action |
267
+ |-----|--------|
268
+ | `Ctrl+O` | Open file in a new tab |
269
+ | `Ctrl+W` | Close current tab |
270
+ | `Ctrl+A` | Save all open tabs to Excel file |
271
+ | `>` or `b` | Move to next tab |
272
+ | `<` | Move to previous tab |
273
+ | `B` | Toggle tab bar visibility |
274
+ | `q` | Quit the application |
275
+
276
+ #### View & Settings
277
+
278
+ | Key | Action |
279
+ |-----|--------|
280
+ | `F1` | Toggle help panel |
281
+ | `k` | Cycle through themes |
282
+
283
+ ---
284
+
285
+ ### Table-Level Controls
286
+
287
+ #### Navigation
288
+
289
+ | Key | Action |
290
+ |-----|--------|
291
+ | `g` | Jump to first row |
292
+ | `G` | Jump to last row (loads all remaining rows) |
293
+ | `↑` / `↓` | Move up/down one row |
294
+ | `←` / `→` | Move left/right one column |
295
+ | `Home` / `End` | Jump to first/last column in current row |
296
+ | `Ctrl+Home` / `Ctrl+End` | Jump to top/bottom in current page |
297
+ | `PageDown` / `PageUp` | Scroll down/up one page |
298
+ | `Ctrl+F` | Page down |
299
+ | `Ctrl+B` | Page up |
300
+
301
+ #### Viewing & Display
302
+
303
+ | Key | Action |
304
+ |-----|--------|
305
+ | `Enter` | View full details of current row in modal |
306
+ | `F` | Show frequency distribution for column |
307
+ | `s` | Show statistics for current column |
308
+ | `S` | Show statistics for entire dataframe |
309
+ | `K` | Cycle cursor type: cell → row → column → cell |
310
+ | `~` | Toggle row labels |
311
+ | `_` (underscore) | Expand column to full width |
312
+
313
+ #### Data Editing
314
+
315
+ | Key | Action |
316
+ |-----|--------|
317
+ | `Double-click` | Edit cell or rename column header |
318
+ | `delete` | Clear current cell (set to NULL) |
319
+ | `e` | Edit current cell (respects data type) |
320
+ | `E` | Edit entire column with expression |
321
+ | `a` | Add empty column after current |
322
+ | `A` | Add column with name and value/expression |
323
+ | `@` | Add a link column from template expression |
324
+ | `-` (minus) | Delete current column |
325
+ | `x` | Delete current row |
326
+ | `X` | Delete current row and all rows below |
327
+ | `Ctrl+X` | Delete current row and all rows above |
328
+ | `d` | Duplicate current column (appends '_copy' suffix) |
329
+ | `D` | Duplicate current row |
330
+ | `h` | Hide current column |
331
+ | `H` | Show all hidden rows/columns |
332
+
333
+ #### Searching & Filtering
334
+
335
+ | Key | Action |
336
+ |-----|--------|
337
+ | `\` | Search in current column using cursor value and select rows |
338
+ | `\|` (pipe) | Search in current column with expression and select rows |
339
+ | `/` | Find in current column with cursor value and highlight matches |
340
+ | `?` | Find in current column with expression and highlight matches |
341
+ | `n` | Go to next match |
342
+ | `N` | Go to previous match |
343
+ | `{` | Go to previous selected row |
344
+ | `}` | Go to next selected row |
345
+ | `'` | Select/deselect current row |
346
+ | `t` | Toggle selected rows (invert) |
347
+ | `T` | Clear all selected rows and/or matches |
348
+ | `"` (quote) | Filter to selected rows only |
349
+ | `v` | View only the selected rows and/or matches, or the rows matching the cursor value |
350
+ | `V` | View only rows by expression |
351
+
352
+ #### SQL Interface
353
+
354
+ | Key | Action |
355
+ |-----|--------|
356
+ | `l` | Simple SQL interface (select columns & WHERE clause) |
357
+ | `L` | Advanced SQL interface (full SQL queries) |
358
+
359
+ #### Find & Replace
360
+
361
+ | Key | Action |
362
+ |-----|--------|
363
+ | `;` | Find across all columns with cursor value |
364
+ | `:` | Find across all columns with expression |
365
+ | `r` | Find and replace in current column (interactive or replace all) |
366
+ | `R` | Find and replace across all columns (interactive or replace all) |
367
+
368
+ #### Sorting
369
+
370
+ | Key | Action |
371
+ |-----|--------|
372
+ | `[` | Sort current column ascending |
373
+ | `]` | Sort current column descending |
374
+
375
+ #### Reordering
376
+
377
+ | Key | Action |
378
+ |-----|--------|
379
+ | `Shift+↑` | Move current row up |
380
+ | `Shift+↓` | Move current row down |
381
+ | `Shift+←` | Move current column left |
382
+ | `Shift+→` | Move current column right |
383
+
384
+ #### Type Conversion
385
+
386
+ | Key | Action |
387
+ |-----|--------|
388
+ | `#` | Cast current column to integer (Int64) |
389
+ | `%` | Cast current column to float (Float64) |
390
+ | `!` | Cast current column to boolean |
391
+ | `$` | Cast current column to string |
392
+
393
+ #### Data Management
394
+
395
+ | Key | Action |
396
+ |-----|--------|
397
+ | `z` | Freeze rows and columns |
398
+ | `,` | Toggle thousand separator for numeric display |
399
+ | `c` | Copy current cell to clipboard |
400
+ | `Ctrl+C` | Copy column to clipboard |
401
+ | `Ctrl+R` | Copy row to clipboard (tab-separated) |
402
+ | `Ctrl+S` | Save current tab to file |
403
+ | `u` | Undo last action |
404
+ | `U` | Redo last undone action |
405
+ | `Ctrl+U` | Reset to initial state |
406
+
407
+ ## Features in Detail
408
+
409
+ ### 1. Color-Coded Data Types
410
+
411
+ Columns are automatically styled based on their data type:
412
+ - **integer**: Cyan text, right-aligned
413
+ - **float**: Magenta text, right-aligned
414
+ - **string**: Green text, left-aligned
415
+ - **boolean**: Blue text, centered
416
+ - **temporal**: Yellow text, centered
417
+
418
+ ### 2. Row Detail View
419
+
420
+ Press `Enter` on any row to open a modal showing all column values for that row.
421
+ Useful for examining wide datasets where columns don't fit on screen.
422
+
423
+ **In the Row Detail Modal**:
424
+ - Press `v` to **view** only the rows of the main table that have the selected column value
425
+ - Press `"` to **filter** the main table down to the rows containing the selected column value
426
+ - Press `q` or `Escape` to close the modal
427
+
428
+ ### 3. Search & Filtering
429
+
430
+ The application provides multiple search modes for different use cases:
431
+
432
+ **Search Operations** - Direct value/expression matching in current column:
433
+ - **`|` - Column Expression Search**: Opens dialog to search current column with custom expression
434
+ - **`\` - Column Cursor Search**: Instantly search current column using the cursor value
435
+
436
+ **Find Operations** - Find by value/expression:
437
+ - **`/` - Column Find**: Find cursor value within current column
438
+ - **`?` - Column Expression Find**: Open dialog to search current column with expression
439
+ - **`;` - Global Find**: Find cursor value across all columns
440
+ - **`:` - Global Expression Find**: Open dialog to search all columns with expression
441
+
442
+ **Selection & Filtering**:
443
+ - **`'` - Toggle Row Selection**: Select/deselect current row (marks it for filtering)
444
+ - **`t` - Invert Selections**: Flip selection state of all rows at once
445
+ - **`T` - Clear Selections**: Remove all row selections and matches
446
+ - **`"` - Filter Selected**: Display only the selected rows and remove others
447
+ - **`v` - View by Value**: Filter/view rows by selected rows or cursor value (others hidden but preserved)
448
+ - **`V` - View by Expression**: Filter/view rows using custom Polars expression (others hidden but preserved)
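The selection keys above behave like operations on a set of row indices. The following sketch models them that way. All function names are hypothetical and the code is not taken from the package. It only illustrates the difference between filtering (other rows are removed) and viewing (other rows are hidden but preserved).

```python
def toggle_row(selected: set[int], row: int) -> set[int]:
    # ' key: select the row if it is unselected, deselect it otherwise
    return selected ^ {row}


def invert(selected: set[int], n_rows: int) -> set[int]:
    # t key: flip the selection state of every row
    return set(range(n_rows)) - selected


def filter_selected(rows: list, selected: set[int]) -> list:
    # " key: keep only the selected rows; the others are dropped
    return [row for i, row in enumerate(rows) if i in selected]


def view_selected(rows: list, selected: set[int]) -> list[int]:
    # v key: report which rows are visible; `rows` itself is left untouched
    return [i for i in range(len(rows)) if i in selected]
```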
449
+
450
+ **Advanced Matching Options**:
451
+
452
+ When searching or finding, you can use checkboxes in the dialog to enable:
453
+ - **Match Nocase**: Ignore case differences (e.g., "john", "John", "JOHN" all match)
454
+ - **Match Whole**: Match complete value, not partial substrings or words (e.g., "cat" won't match in "catfish")
455
+
456
+ These options work with plain text searches. Use Polars regex patterns in expressions for more control:
457
+ - **Case-insensitive matching in expressions**: Use `(?i)` prefix in regex (e.g., `(?i)john`)
458
+ - **Word boundaries in expressions**: Use `\b` in regex (e.g., `\bjohn\b` matches whole word)
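As a rough illustration, the two checkboxes can be thought of as wrapping a plain-text term in the regex constructs just described. The sketch below uses Python's `re` module as a stand-in for the Polars regex engine, and `build_pattern` is a hypothetical helper. How the application actually implements the options, and whether **Match Whole** compares the whole cell value or uses word boundaries, is an assumption here.

```python
import re


def build_pattern(term: str, nocase: bool = False, whole: bool = False) -> str:
    """Turn a plain-text search term into a regex pattern string."""
    pattern = re.escape(term)           # treat the term literally
    if whole:
        pattern = rf"\b{pattern}\b"     # Match Whole: word boundaries
    if nocase:
        pattern = "(?i)" + pattern      # Match Nocase: inline ignore-case flag
    return pattern
```

For instance, `build_pattern("cat", whole=True)` produces a pattern that finds "cat" in "a cat sat" but not in "catfish".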
459
+
460
+ **Quick Tips:**
461
+ - Search results highlight matching rows/cells in **red**
462
+ - Multiple searches **accumulate selections** - each new search adds to the selections
463
+ - Type-aware matching automatically converts values and falls back to string comparison if the conversion fails
464
+ - Use `u` to undo any search or filter
465
+
466
+ ### 3b. Find & Replace
467
+
468
+ The application provides powerful find and replace functionality for both single-column and global replacements.
469
+
470
+ **Replace Operations**:
471
+ - **`r` - Column Replace**: Replace values in the current column
472
+ - **`R` - Global Replace**: Replace values across all columns
473
+
474
+ **How It Works:**
475
+
476
+ When you press `r` or `R`, a dialog opens where you can enter:
477
+ 1. **Find term**: The value or expression to search for
478
+ 2. **Replace term**: What to replace matches with
479
+ 3. **Matching options**:
480
+ - **Match Nocase**: Ignore case differences when matching (unchecked by default)
481
+ - **Match Whole**: Match complete words only, not partial words (unchecked by default)
482
+ 4. **Replace option**:
483
+ - Choose **"Replace All"** to replace all matches at once (with confirmation)
484
+ - Otherwise, review and confirm each match individually
485
+
486
+ **Replace All** (`r` or `R` → Choose "Replace All"):
487
+ - Shows a confirmation dialog with the number of matches and replacements
488
+ - Replaces all matches with a single operation
489
+ - Full undo support with `u`
490
+ - Useful for bulk replacements when you're confident about the change
491
+
492
+ **Replace Interactive** (`r` or `R` → Choose "Replace Interactive"):
493
+ - Shows each match one at a time with a preview of the replacement
494
+ - For each match, press:
495
+ - `Enter` or the `Yes` button - **Replace this occurrence** and move to next
496
+ - The `Skip` button - **Skip this occurrence** and move to next
497
+ - `Escape` or the `No` button - **Cancel** remaining replacements (but keep already-made replacements)
498
+ - Displays progress: `Occurrence X of Y` (Y = total matches, X = current)
499
+ - Shows the value that will be replaced and what it will become
500
+ - Useful for careful replacements where you want to review each change
501
+
502
+ **Search Term Types:**
503
+ - **Plain text**: Exact string match (e.g., "John" finds "John")
504
+ - Use **Match Nocase** checkbox to match regardless of case (e.g., find "john", "John", "JOHN")
505
+ - Use **Match Whole** checkbox to match complete words only (e.g., find "cat" but not in "catfish")
506
+ - **NULL**: Replace null/missing values (type `NULL`)
507
+ - **Expression**: Polars expressions for complex matching (e.g., `$_ > 50` for column replace)
508
+ - **Regex patterns**: Use Polars regex syntax for advanced matching
509
+ - Case-insensitive: Use `(?i)` prefix (e.g., `(?i)john`)
510
+ - Whole word: Use `\b` boundary markers (e.g., `\bjohn\b`)
511
+
512
+ **Examples:**
513
+
514
+ ```
515
+ Find: "John"
516
+ Replace: "Jane"
517
+ → All occurrences of "John" become "Jane"
518
+
519
+ Find: "john"
520
+ Replace: "jane"
521
+ Match Nocase: ✓ (checked)
522
+ → "John", "JOHN", "john" all become "jane"
523
+
524
+ Find: "cat"
525
+ Replace: "dog"
526
+ Match Whole: ✓ (checked)
527
+ → "cat" becomes "dog", but "catfish" is not matched
528
+
529
+ Find: "NULL"
530
+ Replace: "Unknown"
531
+ → All null/missing values become "Unknown"
532
+
533
+ Find: "(?i)active" # Case-insensitive
534
+ Replace: "inactive"
535
+ → "Active", "ACTIVE", "active" all become "inactive"
536
+ ```
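The examples above can be mimicked on a plain Python list to see how the options combine. This is a minimal sketch under stated assumptions: `replace_all` is a hypothetical function, `None` stands in for a null cell, and matches are replaced within the cell text. The application's own replacement logic is type-aware and may differ.

```python
import re


def replace_all(values, find, replace, nocase=False, whole=False):
    """Return (new_values, n_cells_changed) for a plain-text find term."""
    if find == "NULL":                        # special term: target missing values
        hits = sum(v is None for v in values)
        return [replace if v is None else v for v in values], hits
    pattern = re.escape(find)
    if whole:
        pattern = rf"\b{pattern}\b"
    regex = re.compile(pattern, re.IGNORECASE if nocase else 0)
    out, hits = [], 0
    for v in values:
        if v is not None and regex.search(v):
            # A function replacement keeps backslashes in `replace` literal.
            out.append(regex.sub(lambda _m: replace, v))
            hits += 1
        else:
            out.append(v)
    return out, hits
```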
537
+
538
+ **For Global Replace (`R`)**:
539
+ - Searches and replaces across all columns simultaneously
540
+ - Each column can have different matching behavior (string matching for text, numeric for numbers)
541
+ - Preview shows which columns contain matches before replacement
542
+ - Useful for standardizing values across multiple columns
543
+
544
+ **Features:**
545
+ - **Full history support**: Use `u` (undo) to revert any replacement
546
+ - **Visual feedback**: Matching cells are highlighted before you choose replacement mode
547
+ - **Safe operations**: Requires confirmation before replacing
548
+ - **Progress tracking**: Shows how many replacements have been made during interactive mode
549
+ - **Type-aware**: Respects column data types when matching and replacing
550
+ - **Flexible matching**: Support for case-insensitive and whole-word matching
551
+
552
+ **Tips:**
553
+ - Use interactive mode for one-time replacements to be absolutely sure
554
+ - Use "Replace All" for routine replacements (e.g., fixing typos, standardizing formats)
555
+ - Use **Match Nocase** for matching variations of names or titles
556
+ - Use **Match Whole** to avoid unintended partial replacements
557
+ - Use `u` immediately if you accidentally replace something wrong
558
+ - For complex replacements, use Polars expressions or regex patterns in the find term
559
+ - Test with a small dataset first before large replacements
560
+
561
+ ### 4. [Polars Expressions](https://docs.pola.rs/api/python/stable/reference/expressions/index.html)
562
+
563
+ Complex values or filters can be specified via Polars expressions, with the following adaptations for convenience:
564
+
565
+ **Column References:**
566
+ - `$_` - Current column (based on cursor position)
567
+ - `$1`, `$2`, etc. - Column by 1-based index
568
+ - `$age`, `$salary` - Column by name (use actual column names)
569
+
570
+ **Row References:**
571
+ - `$#` - Current row index (1-based)
572
+
573
+ **Basic Comparisons:**
574
+ - `$_ > 50` - Current column greater than 50
575
+ - `$salary >= 100000` - Salary at least 100,000
576
+ - `$age < 30` - Age less than 30
577
+ - `$status == 'active'` - Status exactly matches 'active'
578
+ - `$name != 'Unknown'` - Name is not 'Unknown'
579
+
580
+ **Logical Operators:**
581
+ - `&` - AND
582
+ - `|` - OR
583
+ - `~` - NOT
584
+
585
+ **Practical Examples:**
586
+ - `($age < 30) & ($status == 'active')` - Age less than 30 AND status is active
587
+ - `($name == 'Alice') | ($name == 'Bob')` - Name is Alice or Bob
588
+ - `$salary / 1000 >= 50` - Salary divided by 1,000 is at least 50
589
+ - `($department == 'Sales') & ($bonus > 5000)` - Sales department with bonus over 5,000
590
+ - `($score >= 80) & ($score <= 90)` - Score between 80 and 90
591
+ - `~($status == 'inactive')` - Status is not inactive
592
+ - `$revenue > $expenses` - Revenue exceeds expenses
593
+
594
+ **String Matching:**
595
+ - `$name.str.contains("John")` - Name contains "John" (case-sensitive)
596
+ - `$name.str.contains("(?i)john")` - Name contains "john" (case-insensitive)
597
+ - `$email.str.ends_with("@company.com")` - Email ends with domain
598
+ - `$code.str.starts_with("ABC")` - Code starts with "ABC"
599
+ - `$age.cast(pl.String).str.starts_with("7")` - Age (cast to string first) starts with "7"
600
+
601
+ **Number Operations:**
602
+ - `$age * 2 > 100` - Double age greater than 100
603
+ - `($salary + $bonus) > 150000` - Total compensation over 150,000
604
+ - `$percentage >= 50` - Percentage at least 50%
605
+
606
+ **Null Handling:**
607
+ - `$column.is_null()` - Find null/missing values
608
+ - `$column.is_not_null()` - Find non-null values
609
+ - `NULL` - a value to represent null for convenience
610
+
611
+ **Tips:**
612
+ - Use column names that match exactly (case-sensitive)
613
+ - Use parentheses to clarify complex expressions: `($a & $b) | ($c & $d)`
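One way to picture the `$` shorthand is as a text substitution that runs before the expression is evaluated. The sketch below rewrites the placeholders into Polars expression source text using only the standard library. `expand_placeholders` is a hypothetical function and the mapping it produces is an assumption for illustration, not the package's actual translation.

```python
import re


def expand_placeholders(expr: str, columns: list[str], current: str) -> str:
    """Rewrite $-style references into Polars expression source text."""
    def repl(match: re.Match) -> str:
        token = match.group(1)
        if token == "_":                       # $_ -> column under the cursor
            return f'pl.col("{current}")'
        if token == "#":                       # $# -> 1-based row index
            return "(pl.int_range(pl.len()) + 1)"
        if token.isdigit():                    # $1, $2 -> 1-based column index
            return f'pl.col("{columns[int(token) - 1]}")'
        return f'pl.col("{token}")'            # $name -> column by name
    return re.sub(r"\$(#|\w+)", repl, expr)
```

With columns `["name", "age", "salary"]` and the cursor on `age`, the input `$_ > 50` becomes `pl.col("age") > 50`.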
614
+
615
+ ### 5. Sorting
616
+
617
+ - Press `[` to sort current column ascending
618
+ - Press `]` to sort current column descending
619
+ - Multi-column sorting is supported (press a sort key on each column in turn)
620
+ - Press the same key twice to remove the column from the sort
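The sort behavior described above can be modeled as an ordered mapping from column name to direction. The following is a sketch with hypothetical names (`toggle_sort`, `apply_sort`) that works on a list of dicts. It is not the package's implementation, which sorts Polars dataframes.

```python
def toggle_sort(sort_keys: dict[str, bool], column: str, descending: bool) -> dict[str, bool]:
    """Add a column to the sort, switch its direction, or drop it on a repeat press."""
    keys = dict(sort_keys)
    if keys.get(column) == descending:
        del keys[column]               # same key pressed again: remove from the sort
    else:
        keys[column] = descending
    return keys


def apply_sort(rows: list[dict], sort_keys: dict[str, bool]) -> list[dict]:
    """Sort by every key; stable sorts applied last-to-first give multi-column order."""
    for column, descending in reversed(list(sort_keys.items())):
        rows = sorted(rows, key=lambda r: r[column], reverse=descending)
    return rows
```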
621
+
622
+ ### 6. Frequency Distribution
623
+
624
+ Press `F` to see how many times each value appears in the current column. The modal shows:
625
+ - Value
626
+ - Count
627
+ - Percentage
628
+ - Histogram
629
+ - **Total row** at the bottom
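The columns listed above can be reproduced for a plain Python list with `collections.Counter`. This is only an illustrative sketch: `frequency_table` and the `bar_width` parameter are hypothetical, and the application computes its table with Polars.

```python
from collections import Counter


def frequency_table(values, bar_width=20):
    """Return (value, count, percentage, histogram bar) rows plus a total row."""
    counts = Counter(values)
    total = sum(counts.values())
    if not total:
        return []
    rows = []
    for value, count in counts.most_common():    # most frequent values first
        pct = round(100 * count / total, 1)
        bar = "█" * round(bar_width * count / total)
        rows.append((value, count, pct, bar))
    rows.append(("Total", total, 100.0, ""))
    return rows
```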
630
+
631
+ **In the Frequency Table**:
632
+ - Press `[` and `]` to sort by any column (value, count, or percentage)
633
+ - Press `v` to **view** only the main-table rows with the selected value (other rows are hidden but preserved)
634
+ - Press `"` to **filter** the main table down to rows containing the selected value (other rows are removed)
635
+ - Press `q` or `Escape` to close the frequency table
636
+
637
+ This is useful for:
638
+ - Understanding value distributions
639
+ - Quickly filtering to specific values
640
+ - Identifying rare or common values
641
+ - Finding the most/least frequent entries
642
+
643
+ ### 7. Column & Dataframe Statistics
644
+
645
+ Press `s` to see summary statistics for the current column, or press `S` for statistics across the entire dataframe.
646
+
647
+ **Column Statistics** (`s`):
648
+ - Shows calculated statistics using Polars' `describe()` method
649
+ - Displays: count, null count, mean, median, std, min, max, etc.
650
+ - Values are color-coded according to their data type
651
+ - Statistics label column has no styling for clarity
652
+
653
+ **Dataframe Statistics** (`S`):
654
+ - Shows statistics for all numeric and applicable columns simultaneously
655
+ - Data columns are color-coded by their data type (integer, float, string, etc.)
656
+
657
+ **In the Statistics Modal**:
658
+ - Press `q` or `Escape` to close the statistics table
659
+ - Use arrow keys to navigate
660
+ - Useful for quick data validation and summary reviews
661
+
662
+ This is useful for:
663
+ - Understanding data distributions and characteristics
664
+ - Identifying outliers and anomalies
665
+ - Data quality assessment
666
+ - Quick statistical summaries without external tools
667
+ - Comparing statistics across columns
668
+
669
+ ### 8. Data Editing
670
+
671
+ **Edit Cell** (`e` or **Double-click**):
672
+ - Opens modal for editing current cell
673
+ - Validates input based on column data type
674
+
675
+ **Rename Column Header** (**Double-click** column header):
676
+ - Quick rename by double-clicking the column header
677
+
678
+ **Delete Row** (`x`):
679
+ - Delete all selected rows (if any) at once
680
+ - Or delete single row at cursor
681
+
682
+ **Delete Row and Below** (`X`):
683
+ - Deletes the current row and all rows below it
684
+ - Useful for removing trailing data or the end of a dataset
685
+
686
+ **Delete Row and Above** (`Ctrl+X`):
687
+ - Deletes the current row and all rows above it
688
+ - Useful for removing leading rows or the beginning of a dataset
689
+
690
+ **Delete Column** (`-`):
691
+ - Removes the entire column from view and dataframe
692
+
693
+ **Delete Column and After** (`_`):
694
+ - Deletes the current column and all columns to the right
695
+ - Useful for removing trailing columns or the end of a dataset
696
+
697
+ **Delete Column and Before** (`Ctrl+-`):
698
+ - Deletes the current column and all columns to the left
699
+ - Useful for removing leading columns or the beginning of a dataset
700
+
701
+ ### 9. Hide & Show Columns
702
+
703
+ **Hide Column** (`h`):
704
+ - Temporarily hides the current column from display
705
+ - Column data is preserved in the dataframe
706
+ - Hidden columns are included in saves
707
+
708
+ **Show Hidden Rows/Columns** (`H`):
709
+ - Restores all previously hidden rows/columns to the display
710
+
711
+ This is useful for:
712
+ - Focusing on specific columns without deleting data
713
+ - Temporarily removing cluttered or unnecessary columns
714
+
715
+ ### 10. Duplicate Column
716
+
717
+ Press `d` to duplicate the current column:
718
+ - Creates a new column immediately after the current column
719
+ - New column has '_copy' suffix (e.g., 'price' → 'price_copy')
720
+ - Duplicate preserves all data from original column
721
+ - New column is inserted into the dataframe
722
+
723
+ This is useful for:
724
+ - Creating backup copies of columns before transformation
725
+ - Working with alternative versions of column data
726
+ - Comparing original vs. processed column values side-by-side
727
+
728
+ ### 11. Duplicate Row
729
+
730
+ Press `D` to duplicate the current row:
731
+ - Creates a new row immediately after the current row
732
+ - Duplicate preserves all data from original row
733
+ - New row is inserted into the dataframe
734
+
735
+ This is useful for:
736
+ - Creating variations of existing data records
737
+ - Batch adding similar rows with modifications
738
+
739
+ ### 12. Column & Row Reordering
740
+
741
+ **Move Columns**: `Shift+←` and `Shift+→`
742
+ - Swaps adjacent columns
743
+ - Reorder is preserved when saving
744
+
745
+ **Move Rows**: `Shift+↑` and `Shift+↓`
746
+ - Swaps adjacent rows
747
+ - Reorder is preserved when saving
748
+
749
+ ### 13. Freeze Rows and Columns
750
+
751
+ Press `z` to open the dialog:
752
+ - Enter the number of rows and/or columns to freeze; the top rows and leftmost columns then stay visible while scrolling
753
+
754
+ ### 13.5. Thousand Separator Toggle
755
+
756
+ Press `,` to toggle thousand separator formatting for numeric data:
757
+ - Applies to **integer** and **float** columns
758
+ - Formats large numbers with commas for readability (e.g., `1000000` → `1,000,000`)
759
+ - Works across all numeric columns in the table
760
+ - Toggle on/off as needed for different viewing preferences
761
+ - Display-only: does not modify underlying data in the dataframe
762
+ - State persists during the session
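Display-only formatting of this kind can be expressed with Python's format mini-language, which has a `,` option for grouping digits. The sketch below is an illustration with a hypothetical `format_number` helper; how the application renders cells internally is not shown in this document.

```python
def format_number(value, thousand_separator=False):
    """Format a cell for display; the underlying value is never modified."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if thousand_separator and is_number:
        return f"{value:,}"
    return str(value)
```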
763
+
764
+ ### 14. Save File
765
+
766
+ Press `Ctrl+S` to save:
767
+ - Save filtered, edited, or sorted data back to file
768
+ - Choose filename in modal dialog
769
+ - Confirm if file already exists
770
+
771
+ ### 15. Undo/Redo/Reset
772
+
773
+ **Undo** (`u`):
774
+ - Reverts last action with full state restoration
775
+ - Works for edits, deletions, sorts, searches, etc.
776
+ - Shows description of reverted action
777
+
778
+ **Redo** (`U`):
779
+ - Reapplies the last undone action
780
+ - Restores the state before the undo was performed
781
+ - Useful for redoing actions you've undone by mistake
782
+ - Useful for alternating between two different states
783
+
784
+ **Reset** (`Ctrl+U`):
785
+ - Reverts all changes and restores the data to its state when the file was first loaded
786
+ - Clears all edits, deletions, selections, filters, and sorts
787
+ - Useful for starting fresh without reloading the file
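One common way to get undo, redo, and reset semantics like these is a pair of stacks plus a saved copy of the original state. The sketch below assumes that design; the `History` class and its method names are hypothetical, and the package may store its history differently (for example, keeping only a single redo step, or making reset itself undoable).

```python
class History:
    """Two-stack undo/redo with a saved original state (illustrative only)."""

    def __init__(self, initial):
        self.original = initial
        self.current = initial
        self.undo_stack = []  # (state before the action, description)
        self.redo_stack = []  # (state after the action, description)

    def apply(self, new_state, description):
        # Record the state being replaced, and drop any stale redo entries.
        self.undo_stack.append((self.current, description))
        self.redo_stack.clear()
        self.current = new_state

    def undo(self):
        if not self.undo_stack:
            return None
        previous, description = self.undo_stack.pop()
        self.redo_stack.append((self.current, description))
        self.current = previous
        return description  # lets the UI say which action was reverted

    def redo(self):
        if not self.redo_stack:
            return None
        state, description = self.redo_stack.pop()
        self.undo_stack.append((self.current, description))
        self.current = state
        return description

    def reset(self):
        # Assumption: reset discards the whole history.
        self.current = self.original
        self.undo_stack.clear()
        self.redo_stack.clear()
```

Alternating `undo()` and `redo()` flips between the same two states, which matches the "alternating between two different states" use mentioned above.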
788
+
789
+ ### 16. Column Type Conversion
790
+
791
+ Press the type conversion keys to instantly cast the current column to a different data type:
792
+
793
+ **Type Conversion Shortcuts**:
794
+ - `#` - Cast to **integer**
795
+ - `%` - Cast to **float**
796
+ - `!` - Cast to **boolean**
797
+ - `$` - Cast to **string**
798
+
799
+ **Features**:
800
+ - Instant conversion with visual feedback
801
+ - Full undo support - press `u` to revert
802
+ - Leverages Polars' robust type casting
803
+
804
+ **Note**: Type conversion attempts to preserve data where possible, but some conversions lose information (e.g., casting a float to an integer discards the fractional part).
805
+
806
+ ### 17. Cursor Type Cycling
807
+
808
+ Press `K` to cycle through selection modes:
809
+ 1. **Cell mode**: Highlight individual cell (and its row/column headers)
810
+ 2. **Row mode**: Highlight entire row
811
+ 3. **Column mode**: Highlight entire column
812
+
813
+ ### 18. SQL Interface
814
+
815
+ The SQL interface provides two modes for querying your dataframe:
816
+
817
+ #### Simple SQL Interface (`l`)
818
+ Select specific columns and apply WHERE conditions without writing full SQL:
819
+ - Choose which columns to include in results
820
+ - Specify WHERE clause for filtering
821
+ - Ideal for quick filtering and column selection
822
+
823
+ #### Advanced SQL Interface (`L`)
824
+ Execute complete SQL queries for advanced data manipulation:
825
+ - Write full SQL queries with standard [SQL syntax](https://docs.pola.rs/api/python/stable/reference/sql/index.html)
826
+ - Support for JOINs, GROUP BY, aggregations, and more
827
+ - Access to all SQL capabilities for complex transformations
828
+ - Always use `self` as the table name
829
+
830
+ **Examples:**
831
+ ```sql
832
+ -- Filter and select specific rows and/or columns
833
+ SELECT name, age FROM self WHERE age > 30
834
+
835
+ -- Aggregate with GROUP BY
836
+ SELECT department, COUNT(*) as count, AVG(salary) as avg_salary
837
+ FROM self
838
+ GROUP BY department
839
+
840
+ -- Complex filtering with multiple conditions
841
+ SELECT *
842
+ FROM self
843
+ WHERE (age > 25 AND salary > 50000) OR department = 'Management'
844
+ ```
845
+
846
+ ### 19. Clipboard Operations
847
+
848
+ Values are copied to the system clipboard using `pbcopy` on macOS and `xclip` on Linux.
849
+
850
+ Copy shortcuts:
851
+ - Press `c` to copy cursor value
852
+ - Press `Ctrl+C` to copy column values
853
+ - Press `Ctrl+R` to copy row values (delimited by tab)
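As a rough sketch of how a clipboard copy along these lines could work, the snippet below picks the platform tool and joins row values with tabs. All function names are hypothetical, and the `-selection clipboard` arguments passed to `xclip` are an assumption; the README only names the tools.

```python
import subprocess
import sys


def clipboard_command(platform=sys.platform):
    """Choose the external copy tool for the platform (illustrative)."""
    if platform == "darwin":
        return ["pbcopy"]
    if platform.startswith("linux"):
        # Assumption: target the regular clipboard, not the primary selection.
        return ["xclip", "-selection", "clipboard"]
    raise RuntimeError(f"no clipboard tool known for {platform!r}")


def row_to_text(values):
    """Join one row's values with tabs, as the row shortcut does."""
    return "\t".join(str(v) for v in values)


def copy_text(text):
    """Pipe text to the clipboard tool; requires pbcopy or xclip to be installed."""
    subprocess.run(clipboard_command(), input=text.encode("utf-8"), check=True)
```

Tab-delimited text pastes cleanly into spreadsheet cells, which is presumably why rows are joined that way.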
854
+
855
+ ### 20. Link Column Creation
856
+
857
+ Press `@` to create a new column containing dynamically generated URLs using template expressions.
858
+
859
+ **Template Placeholders:**
860
+
861
+ The link template supports multiple placeholder types for maximum flexibility:
862
+
863
+ - **`$_`** - Current column (the column where cursor was when `@` was pressed)
864
+ - Example: `https://example.com/search/$_` - Uses values from the current column
865
+ - Useful for quick links based on the focused column
866
+
867
+ - **`$1`, `$2`, `$3`, etc.** - Column by 1-based position index
868
+ - Example: `https://example.com/product/$1/details/$2` - Uses 1st and 2nd columns
869
+ - Useful for structured templates spanning multiple columns
870
+ - Index corresponds to column display order (left-to-right)
871
+
872
+ - **`$name`** - Column by name (use actual column names)
873
+ - Example: `https://pubchem.ncbi.nlm.nih.gov/search?q=$product_id` - Uses `product_id` column
874
+ - Example: `https://example.com/$region/$city/data` - Uses `region` and `city` columns
875
+ - Useful for readable, self-documenting templates
876
+
877
+ **Features:**
878
+
879
+ - **Vectorized Expression**: All rows processed efficiently using Polars' vectorized operations
880
+ - **Type Casting**: Column values automatically converted to strings for URL construction
881
+ - **Multiple Placeholders**: Mix and match placeholders in a single template
882
+ - **URL Prefix**: Automatically prepends `https://` if URL doesn't start with `http://` or `https://`
883
+ - **PubChem Support**: Special shorthand - `PC` in the template is expanded to the PubChem base URL (`https://pubchem.ncbi.nlm.nih.gov`)
884
+
885
+ **Examples:**
886
+
887
+ ```
888
+ Template: https://example.com/$_
889
+ Current column: product_id
890
+ Result: https://example.com/ABC123 (for each row's product_id value)
891
+
892
+ Template: https://database.org/view?id=$1&lang=$2
893
+ Column 1: item_code, Column 2: language
894
+ Result: https://database.org/view?id=X001&lang=en
895
+
896
+ Template: https://example.com/$username/profile
897
+ Column: username (must exist in dataframe)
898
+ Result: https://example.com/john_doe/profile
899
+
900
+ Template: https://example.com/$region/$city
901
+ Columns: region, city
902
+ Result: https://example.com/north/seattle
903
+
904
+ Template: PC/compound/$1
905
+ Column 1: pubchem_cid
906
+ Result: https://pubchem.ncbi.nlm.nih.gov/compound/12345
907
+ ```
908
+
909
+ **Error Handling:**
910
+
911
+ - **Invalid column index**: `$5` when only 3 columns exist → Error message showing valid range
912
+ - **Non-existent column name**: `$invalid_column` → Error message with available columns
913
+ - **No placeholders**: Template treated as constant → All rows get identical URL
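The placeholder rules and error cases above can be sketched for a single row in plain Python. This is only an illustration: the package is described as building a vectorized Polars expression, whereas the hypothetical `expand_link` helper below works on one row dictionary at a time, and the exact `PC` handling is inferred from the README example.

```python
import re

# Inferred from the "PC/compound/$1" example; an assumption, not the package's constant.
PUBCHEM_BASE = "https://pubchem.ncbi.nlm.nih.gov"

# Matches $_ , $<digits> (1-based position), or $<column_name>.
PLACEHOLDER = re.compile(r"\$(\d+|[A-Za-z_]\w*)")


def expand_link(template, columns, row, current):
    """Build one row's URL from a link template (hypothetical helper)."""
    if template == "PC" or template.startswith("PC/"):
        template = PUBCHEM_BASE + template[2:]

    def resolve(match):
        token = match.group(1)
        if token == "_":
            name = current  # column under the cursor
        elif token.isdigit():
            index = int(token)
            if not 1 <= index <= len(columns):
                raise ValueError(f"${index} is outside the valid range 1..{len(columns)}")
            name = columns[index - 1]
        else:
            name = token
            if name not in columns:
                raise ValueError(f"unknown column {name!r}; available: {columns}")
        return str(row[name])  # values are cast to strings

    url = PLACEHOLDER.sub(resolve, template)
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    return url


columns = ["region", "city"]
row = {"region": "north", "city": "seattle"}
print(expand_link("https://example.com/$region/$city", columns, row, current="city"))
```

A template with no placeholders passes through the substitution unchanged, so every row receives the same URL, as noted in the error-handling list.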
914
+
915
+ **Tips:**
916
+
917
+ - Use descriptive column names for `$name` placeholders to make templates self-documenting
918
+ - Test with a small dataset first to verify template correctness
919
+ - Use undo (`u`) if the template produces unexpected URLs
920
+ - For complex multi-column URLs, use column names (`$name`) for clarity over positions (`$1`)
921
+
922
+ ## Examples
923
+
924
+ ### Single File Examples
925
+
926
+ ```bash
927
+ # View Pokemon dataset
928
+ dv pokemon.csv
929
+
930
+ # Chain with another command and specify the input file format
931
+ cut -d',' -f1,2,3 pokemon.csv | dv -f csv
932
+
933
+ # Work with gzipped files
934
+ dv large_dataset.csv.gz
935
+
936
+ # CSV file without header row
937
+ dv -H raw_data.csv
938
+
939
+ # Skip type inference for faster loading
940
+ dv -I huge_file.csv
941
+
942
+ # Skip first 5 lines (comments, metadata)
943
+ dv -L 5 data_with_metadata.csv
944
+
945
+ # Skip 1 row after header (units row)
946
+ dv -K 1 data_with_units.csv
947
+
948
+ # Complex CSV with comments and units row
949
+ dv -L 3 -K 1 -I messy_scientific_data.csv
950
+
951
+ # Combine all options: skip lines, skip after header, no header, no inference, gzipped
952
+ dv -L 2 -K 1 -H -I complex_data.csv.gz
953
+
954
+ # Process compressed data from stdin with line skipping
955
+ zcat compressed_data.csv.gz | dv -f csv -L 2
956
+ ```
957
+
958
+ ### Multi-File/Tab Examples
959
+
960
+ ```bash
961
+ # Open multiple sheets of a single Excel file as tabs
962
+ dv sales.xlsx
963
+
964
+ # Open multiple files as tabs (including gzipped)
965
+ dv pokemon.csv titanic.csv large_data.csv.gz
966
+
967
+ # Start with one file, then open others using Ctrl+O
968
+ dv initial_data.csv
969
+ ```
970
+
971
+ ## Dependencies
972
+
973
+ - **polars**: Fast DataFrame library for data loading/processing
974
+ - **textual**: Terminal UI framework
975
+ - **fastexcel**: Read Excel files
976
+ - **xlsxwriter**: Write Excel files
977
+
978
+ ## Requirements
979
+
980
+ - Python 3.11+
981
+ - POSIX-compatible terminal (macOS, Linux, WSL)
982
+ - Terminal supporting ANSI escape sequences and mouse events
983
+
984
+ ## Acknowledgments
985
+
986
+ - Inspired by [VisiData](https://visidata.org/)
987
+ - Built with [Textual](https://textual.textualize.io/) and [Polars](https://www.pola.rs/)