@tidyjs/tidy 2.6.0 → 2.6.1

@@ -0,0 +1,239 @@
1
+ # Vector Functions API Reference
2
+
3
+ Vector functions produce an array of values equal in length to the input collection. They operate across items (not one at a time), enabling running totals, lookbacks, sliding windows, and row numbering.
4
+
5
+ **CRITICAL:** Vector functions go inside `mutateWithSummary()`, NOT inside `mutate()`. `mutate` passes items one at a time; `mutateWithSummary` passes the full array. Using vector functions inside `mutate()` will silently produce wrong results.
6
+
7
+ ```js
8
+ import { tidy, mutateWithSummary, cumsum, lag, lead, roll, rowNumber } from '@tidyjs/tidy';
9
+ ```
10
+
11
+ ---
12
+
13
+ <!-- keywords: mutateWithSummary, mutate summary, vector mutate, cross-item, broadcast, array function -->
14
+ ## mutateWithSummary
15
+
16
+ Add or replace columns using functions that receive the full array of items.
17
+
18
+ **Signature:** `mutateWithSummary(spec: Record<string, ((items: T[]) => value[] | value) | NonFunctionValue>)`
19
+ **Goes inside:** `tidy()`
20
+
21
+ ### Parameters
22
+ - `spec` -- an object where each key is a new (or existing) column name and each value is one of:
23
+ - A function `(items: T[]) => T[keyof T][] | T[keyof T]` -- receives the full array. If it returns an **array**, each element maps to the corresponding item. If it returns a **single value**, that value is broadcast to all items.
24
+ - A literal value (number, string, etc.) -- broadcast to all items.
25
+
26
+ ### Example
27
+ ```js
28
+ const data = [
29
+ { name: 'a', val: 1 },
30
+ { name: 'b', val: 2 },
31
+ { name: 'c', val: 3 },
32
+ ];
33
+
34
+ tidy(data, mutateWithSummary({
35
+ // array return: each element maps to the corresponding row
36
+ doubled: (items) => items.map(d => d.val * 2),
37
+ // single value return: broadcast to every row
38
+ total: (items) => items.reduce((s, d) => s + d.val, 0),
39
+ // literal value: broadcast to every row
40
+ flag: true,
41
+ }));
42
+ // output:
43
+ // [
44
+ // { name: 'a', val: 1, doubled: 2, total: 6, flag: true },
45
+ // { name: 'b', val: 2, doubled: 4, total: 6, flag: true },
46
+ // { name: 'c', val: 3, doubled: 6, total: 6, flag: true },
47
+ // ]
48
+ ```
49
+
50
+ ### Why not mutate?
51
+ `mutate` processes items one at a time: `(item, index) => value`. It cannot look at other rows. Vector and summary functions need the full array, so they must go inside `mutateWithSummary`.
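The three spec shapes described above (array return, single-value return, literal) can be illustrated with a minimal plain-JavaScript sketch. `applySummarySpec` is a hypothetical helper written for this illustration, not part of `@tidyjs/tidy`, and it only mimics the behavior as documented in this section.

```js
// Hypothetical helper (not the library's implementation): applies a spec
// the way this section describes mutateWithSummary behaving.
function applySummarySpec(items, spec) {
  // Work on shallow copies so the input rows are left untouched.
  const out = items.map((d) => ({ ...d }));
  for (const [key, entry] of Object.entries(spec)) {
    // Functions receive the full array; anything else is treated as a literal.
    const result = typeof entry === 'function' ? entry(out) : entry;
    out.forEach((d, i) => {
      // Arrays map element by element; single values are broadcast to every row.
      d[key] = Array.isArray(result) ? result[i] : result;
    });
  }
  return out;
}

const rows = applySummarySpec(
  [{ val: 1 }, { val: 2 }, { val: 3 }],
  {
    doubled: (items) => items.map((d) => d.val * 2),
    total: (items) => items.reduce((s, d) => s + d.val, 0),
    flag: true,
  }
);
console.log(rows);
// [
//   { val: 1, doubled: 2, total: 6, flag: true },
//   { val: 2, doubled: 4, total: 6, flag: true },
//   { val: 3, doubled: 6, total: 6, flag: true },
// ]
```

A per-item callback of the form `(item, index) => value` could not compute `total`, because it never sees the other rows.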
52
+
53
+ ---
54
+
55
+ <!-- keywords: cumsum, cumulative sum, running total, running sum -->
56
+ ## cumsum
57
+
58
+ Cumulative sum. Returns an array of running totals (uses full-precision floating-point summation).
59
+
60
+ **Signature:** `cumsum(key: keyof T | (d: T, index: number, array: Iterable<T>) => number | null | undefined)`
61
+ **Goes inside:** `mutateWithSummary()`
62
+
63
+ ### Parameters
64
+ - `key` -- a property name (string) or accessor function returning a number. `null`/`undefined` values are skipped (running total stays the same).
65
+
66
+ ### Example
67
+ ```js
68
+ const data = [
69
+ { item: 'a', val: 3 },
70
+ { item: 'b', val: 1 },
71
+ { item: 'c', val: null },
72
+ { item: 'd', val: 5 },
73
+ ];
74
+
75
+ tidy(data, mutateWithSummary({
76
+ running: cumsum('val'),
77
+ }));
78
+ // output:
79
+ // [
80
+ // { item: 'a', val: 3, running: 3 },
81
+ // { item: 'b', val: 1, running: 4 },
82
+ // { item: 'c', val: null, running: 4 },
83
+ // { item: 'd', val: 5, running: 9 },
84
+ // ]
85
+ ```
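To make the null-skipping behavior concrete, here is a plain-JavaScript sketch of a running total. `runningTotal` is a hypothetical stand-in for illustration only; it uses naive addition, so it does not reproduce the full-precision summation mentioned above.

```js
// Hypothetical stand-in for cumsum(key). Naive addition only; no
// floating-point error compensation.
function runningTotal(items, key) {
  let total = 0;
  return items.map((d) => {
    const v = d[key];
    // null/undefined leave the running total unchanged.
    if (v !== null && v !== undefined) total += v;
    return total;
  });
}

const totals = runningTotal(
  [{ val: 3 }, { val: 1 }, { val: null }, { val: 5 }],
  'val'
);
console.log(totals); // [3, 4, 4, 9]
```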
86
+
87
+ ---
88
+
89
+ <!-- keywords: lag, previous value, lookback, offset, delta, difference -->
90
+ ## lag
91
+
92
+ Value from N rows before the current row. Useful for computing deltas (current minus previous).
93
+
94
+ **Signature:** `lag(key: keyof T | (d: T, index: number, array: Iterable<T>) => any, options?)`
95
+ **Goes inside:** `mutateWithSummary()`
96
+
97
+ ### Parameters
98
+ - `key` -- a property name or accessor function.
99
+ - `options` (optional):
100
+ - `n` (number, default `1`) -- how many rows back to look.
101
+ - `default` (any, default `undefined`) -- fill value for the first N items where no previous row exists.
102
+
103
+ ### Example
104
+ ```js
105
+ const data = [
106
+ { day: 1, val: 10 },
107
+ { day: 2, val: 15 },
108
+ { day: 3, val: 12 },
109
+ ];
110
+
111
+ tidy(data, mutateWithSummary({
112
+ prev: lag('val'),
113
+ prev0: lag('val', { default: 0 }),
114
+ prev2: lag('val', { n: 2 }),
115
+ }));
116
+ // output:
117
+ // [
118
+ // { day: 1, val: 10, prev: undefined, prev0: 0, prev2: undefined },
119
+ // { day: 2, val: 15, prev: 10, prev0: 10, prev2: undefined },
120
+ // { day: 3, val: 12, prev: 15, prev0: 15, prev2: 10 },
121
+ // ]
122
+ ```
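The delta use case mentioned above (current minus previous) can be sketched in plain JavaScript. `lagValues` is a hypothetical helper that mirrors the documented `n` and `default` options; it is not the library's code.

```js
// Hypothetical helper mirroring the documented lag options (n, default).
function lagValues(items, key, { n = 1, default: fill = undefined } = {}) {
  // Rows without a row n positions earlier get the fill value.
  return items.map((_, i) => (i - n >= 0 ? items[i - n][key] : fill));
}

const days = [
  { day: 1, val: 10 },
  { day: 2, val: 15 },
  { day: 3, val: 12 },
];

const prev = lagValues(days, 'val');
// Delta is only defined where a previous value exists.
const deltas = days.map((d, i) =>
  prev[i] === undefined ? undefined : d.val - prev[i]
);
console.log(deltas); // [undefined, 5, -3]
```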
123
+
124
+ ---
125
+
126
+ <!-- keywords: lead, next value, lookahead, forward, offset -->
127
+ ## lead
128
+
129
+ Value from N rows after the current row. Useful for computing forward differences.
130
+
131
+ **Signature:** `lead(key: keyof T | (d: T, index: number, array: Iterable<T>) => any, options?)`
132
+ **Goes inside:** `mutateWithSummary()`
133
+
134
+ ### Parameters
135
+ - `key` -- a property name or accessor function.
136
+ - `options` (optional):
137
+ - `n` (number, default `1`) -- how many rows ahead to look.
138
+ - `default` (any, default `undefined`) -- fill value for the last N items where no next row exists.
139
+
140
+ ### Example
141
+ ```js
142
+ const data = [
143
+ { day: 1, val: 10 },
144
+ { day: 2, val: 15 },
145
+ { day: 3, val: 12 },
146
+ ];
147
+
148
+ tidy(data, mutateWithSummary({
149
+ next: lead('val'),
150
+ next0: lead('val', { default: 0 }),
151
+ next2: lead('val', { n: 2 }),
152
+ }));
153
+ // output:
154
+ // [
155
+ // { day: 1, val: 10, next: 15, next0: 15, next2: 12 },
156
+ // { day: 2, val: 15, next: 12, next0: 12, next2: undefined },
157
+ // { day: 3, val: 12, next: undefined, next0: 0, next2: undefined },
158
+ // ]
159
+ ```
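A forward difference (next minus current) follows the same idea in the other direction. The sketch below uses a hypothetical `leadValues` helper and passes `null` as the fill value so that the missing last difference is explicit; both choices are assumptions made for the illustration.

```js
// Hypothetical helper mirroring the documented lead options (n, default).
function leadValues(items, key, { n = 1, default: fill = undefined } = {}) {
  // Rows without a row n positions later get the fill value.
  return items.map((_, i) => (i + n < items.length ? items[i + n][key] : fill));
}

const series = [
  { day: 1, val: 10 },
  { day: 2, val: 15 },
  { day: 3, val: 12 },
];

const next = leadValues(series, 'val', { default: null });
const forwardDiff = series.map((d, i) =>
  next[i] === null ? null : next[i] - d.val
);
console.log(forwardDiff); // [5, -3, null]
```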
160
+
161
+ ---
162
+
163
+ <!-- keywords: roll, rolling, sliding window, moving average, running mean, window function -->
164
+ ## roll
165
+
166
+ Rolling/sliding window operation. Applies a summary function to a window of items as it slides across the array.
167
+
168
+ **Signature:** `roll(width: number, rollFn: (itemsInWindow: T[], endIndex: number) => any, options?)`
169
+ **Goes inside:** `mutateWithSummary()`
170
+
171
+ ### Parameters
172
+ - `width` (number) -- the window size (number of items).
173
+ - `rollFn` -- a function `(itemsInWindow, endIndex) => value` applied to each window. Typically a summary function like `mean('col')`.
174
+ - `options` (optional):
175
+ - `partial` (boolean, default `false`) -- if `true`, compute for windows smaller than `width` at the edges. If `false`, those positions are `undefined`.
176
+ - `align` (`'right'` | `'center'` | `'left'`, default `'right'`) -- window alignment relative to the current row:
177
+ - `'right'`: current row is the last item in the window (looks back).
178
+ - `'left'`: current row is the first item in the window (looks forward).
179
+ - `'center'`: current row is the center of the window.
180
+
181
+ ### Example
182
+ ```js
183
+ import { mean } from '@tidyjs/tidy';
184
+
185
+ const data = [
186
+ { day: 1, val: 3 },
187
+ { day: 2, val: 1 },
188
+ { day: 3, val: 3 },
189
+ { day: 4, val: 1 },
190
+ { day: 5, val: 7 },
191
+ ];
192
+
193
+ tidy(data, mutateWithSummary({
194
+ avg3: roll(3, mean('val')),
195
+ avg3p: roll(3, mean('val'), { partial: true }),
196
+ }));
197
+ // output:
198
+ // [
199
+ // { day: 1, val: 3, avg3: undefined, avg3p: 3/1 }, // partial
200
+ // { day: 2, val: 1, avg3: undefined, avg3p: 4/2 }, // partial
201
+ // { day: 3, val: 3, avg3: 7/3, avg3p: 7/3 },
202
+ // { day: 4, val: 1, avg3: 5/3, avg3p: 5/3 },
203
+ // { day: 5, val: 7, avg3: 11/3, avg3p: 11/3 },
204
+ // ]
205
+ ```
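The window mechanics can be sketched in plain JavaScript for the default right-aligned case. `rollRight` and `meanOf` are hypothetical names for this illustration; the sketch covers only `align: 'right'` and the `partial` option as described above, and makes no claim about how the library handles the other alignments.

```js
// Hypothetical right-aligned rolling window (the current row is the last
// item in the window). Not the library's implementation.
function rollRight(items, width, rollFn, { partial = false } = {}) {
  return items.map((_, i) => {
    const start = i - width + 1;
    // Without partial, positions lacking a full window stay undefined.
    if (start < 0 && !partial) return undefined;
    const windowItems = items.slice(Math.max(0, start), i + 1);
    return rollFn(windowItems, i);
  });
}

// Hypothetical summary function: arithmetic mean of one column.
const meanOf = (key) => (items) =>
  items.reduce((s, d) => s + d[key], 0) / items.length;

const readings = [3, 1, 3, 1, 7].map((val, i) => ({ day: i + 1, val }));

const avg3 = rollRight(readings, 3, meanOf('val'));
const avg3p = rollRight(readings, 3, meanOf('val'), { partial: true });
console.log(avg3);  // first two entries are undefined, then 7/3, 5/3, 11/3
console.log(avg3p); // starts with 3 and 2 (partial windows), then 7/3, 5/3, 11/3
```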
206
+
207
+ ---
208
+
209
+ <!-- keywords: rowNumber, row number, index, sequential, numbering -->
210
+ ## rowNumber
211
+
212
+ Assigns sequential row numbers starting from 0 (configurable).
213
+
214
+ **Signature:** `rowNumber(options?)`
215
+ **Goes inside:** `mutateWithSummary()`
216
+
217
+ ### Parameters
218
+ - `options` (optional):
219
+ - `startAt` (number, default `0`) -- the number to assign to the first row.
220
+
221
+ ### Example
222
+ ```js
223
+ const data = [
224
+ { name: 'a', val: 10 },
225
+ { name: 'b', val: 20 },
226
+ { name: 'c', val: 30 },
227
+ ];
228
+
229
+ tidy(data, mutateWithSummary({
230
+ row: rowNumber(),
231
+ row1: rowNumber({ startAt: 1 }),
232
+ }));
233
+ // output:
234
+ // [
235
+ // { name: 'a', val: 10, row: 0, row1: 1 },
236
+ // { name: 'b', val: 20, row: 1, row1: 2 },
237
+ // { name: 'c', val: 30, row: 2, row1: 3 },
238
+ // ]
239
+ ```
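Conceptually, row numbering is an index offset by `startAt`. A one-line plain-JavaScript sketch, using a hypothetical `rowNumbers` helper:

```js
// Hypothetical equivalent of rowNumber({ startAt }): index plus offset.
function rowNumbers(items, { startAt = 0 } = {}) {
  return items.map((_, i) => i + startAt);
}

const letters = [{ name: 'a' }, { name: 'b' }, { name: 'c' }];
console.log(rowNumbers(letters));                 // [0, 1, 2]
console.log(rowNumbers(letters, { startAt: 1 })); // [1, 2, 3]
```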
@@ -0,0 +1,193 @@
1
+ # Gotchas and Anti-Patterns
2
+
3
+ Common mistakes AI assistants make when generating tidyjs code, ordered by severity.
4
+
5
+ ---
6
+
7
+ ## 1. Using summary functions inside `mutate()` instead of `mutateWithSummary()`
8
+
9
+ This is the **most dangerous mistake** — it produces code that runs without errors but returns wrong data.
10
+
11
+ ```js
12
+ // WRONG — sum() receives one item at a time, not the full array
13
+ tidy(data, mutate({ total: sum('value') }))
14
+
15
+ // CORRECT — mutateWithSummary passes the full array to sum()
16
+ tidy(data, mutateWithSummary({ total: sum('value') }))
17
+ ```
18
+
19
+ **Rule:** If the function needs to see all items (summary functions like `sum`, `mean`, `median`, or vector functions like `cumsum`, `lag`, `lead`), use `mutateWithSummary`. If it only needs the current item, use `mutate`.
20
+
21
+ ---
22
+
23
+ ## 2. Using string column names instead of accessor functions
24
+
25
+ tidyjs is **not** pandas or SQL. Most verbs require accessor functions.
26
+
27
+ ```js
28
+ // WRONG
29
+ tidy(data, filter('value > 10'))
30
+ tidy(data, mutate({ doubled: 'value * 2' }))
31
+
32
+ // CORRECT
33
+ tidy(data, filter((d) => d.value > 10))
34
+ tidy(data, mutate({ doubled: (d) => d.value * 2 }))
35
+ ```
36
+
37
+ **Exception:** Summary functions and sort helpers accept string keys as shorthand:
38
+ ```js
39
+ sum('value') // OK — shorthand for sum((d) => d.value)
40
+ asc('name') // OK — shorthand for ascending sort by name
41
+ desc('value') // OK — shorthand for descending sort by value
42
+ ```
43
+
44
+ ---
45
+
46
+ ## 3. Forgetting the outer `tidy()` wrapper
47
+
48
+ Every transformation should go through `tidy()`. Verbs are curried — they return functions, not results.
49
+
50
+ ```js
51
+ // WRONG — filter() returns a function, not filtered data
52
+ const result = filter((d) => d.active);
53
+ // result is a function, not an array!
54
+
55
+ // CORRECT
56
+ const result = tidy(data, filter((d) => d.active));
57
+ ```
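The curried shape can be shown without the library. In this sketch `keepWhere` and `runPipeline` are hypothetical stand-ins for a verb and for `tidy()`; they only illustrate why calling a verb alone yields a function.

```js
// Hypothetical verb: configured first, applied to data later.
const keepWhere = (predicate) => (items) => items.filter(predicate);

// Hypothetical pipeline runner: feeds the data through each step in order.
const runPipeline = (items, ...steps) =>
  steps.reduce((acc, step) => step(acc), items);

const users = [
  { name: 'a', active: true },
  { name: 'b', active: false },
];

const step = keepWhere((d) => d.active);
console.log(typeof step); // 'function' (no data has been processed yet)

const activeUsers = runPipeline(users, step);
console.log(activeUsers); // [{ name: 'a', active: true }]
```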
58
+
59
+ ---
60
+
61
+ ## 4. Confusing `TMath.rate()` with item-level `rate()`
62
+
63
+ These are two different functions with the same name but different purposes:
64
+
65
+ ```js
66
+ import { rate, TMath } from '@tidyjs/tidy';
67
+
68
+ // TMath.rate — simple math: (numerator, denominator) => number
69
+ TMath.rate(10, 100); // => 0.1
70
+
71
+ // rate — item-level mutator for use inside mutate()
72
+ tidy(data, mutate({
73
+ convRate: rate('conversions', 'impressions')
74
+ }))
75
+ // rate('conversions', 'impressions') creates (d) => d.conversions / d.impressions
76
+ ```
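The relationship between the two can be sketched in plain JavaScript: one function divides two numbers, the other builds a per-item accessor on top of it. `ratio` and `rateOf` are hypothetical names, and the sketch does plain division only; the library may treat edge cases such as a zero denominator differently.

```js
// Hypothetical number-level helper: (numerator, denominator) => number.
const ratio = (numerator, denominator) => numerator / denominator;

// Hypothetical item-level helper: returns an accessor (d) => number.
const rateOf = (numKey, denKey) => (d) => ratio(d[numKey], d[denKey]);

const ads = [
  { conversions: 10, impressions: 100 },
  { conversions: 3, impressions: 60 },
];

const convRate = rateOf('conversions', 'impressions');
const withRate = ads.map((d) => ({ ...d, convRate: convRate(d) }));
console.log(withRate.map((d) => d.convRate)); // [0.1, 0.05]
```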
77
+
78
+ ---
79
+
80
+ ## 5. Not understanding groupBy export modes
81
+
82
+ Without an export option, `groupBy` returns a flat array. With one, the output shape changes completely.
83
+
84
+ ```js
85
+ // Returns flat array (default)
86
+ tidy(data, groupBy('type', [summarize({ n: n() })]))
87
+ // => [{ type: 'A', n: 5 }, { type: 'B', n: 3 }]
88
+
89
+ // Returns a plain object — DIFFERENT output shape
90
+ tidy(data, groupBy('type', [summarize({ n: n() })], groupBy.object()))
91
+ // => { A: [{ type: 'A', n: 5 }], B: [{ type: 'B', n: 3 }] }
92
+ ```
93
+
94
+ **Important:** When using an export mode, `groupBy` must be the **last step** in the pipeline. It returns a non-array type that cannot be piped further.
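The difference in output shape can be reproduced with plain JavaScript. `countByType` is a hypothetical helper; the sketch shows why the keyed object cannot be handed to another array-based step.

```js
// Hypothetical flat grouping: one summary row per group.
function countByType(items) {
  const counts = new Map();
  for (const d of items) counts.set(d.type, (counts.get(d.type) ?? 0) + 1);
  return [...counts].map(([type, n]) => ({ type, n }));
}

const events = [{ type: 'A' }, { type: 'A' }, { type: 'B' }];

const flat = countByType(events);
// Object-style shape: group key mapped to that group's rows.
const keyed = Object.fromEntries(flat.map((row) => [row.type, [row]]));

console.log(flat);  // [{ type: 'A', n: 2 }, { type: 'B', n: 1 }]
console.log(keyed); // { A: [{ type: 'A', n: 2 }], B: [{ type: 'B', n: 1 }] }
console.log(Array.isArray(flat), Array.isArray(keyed)); // true false
```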
95
+
96
+ ---
97
+
98
+ ## 6. `select` negation syntax requires negation first
99
+
100
+ The `-` prefix for excluding columns only works as expected when negation keys come first, when all keys are negations, or when they follow a selector such as `everything()` that already selects all keys.
101
+
102
+ ```js
103
+ // CORRECT — all negations
104
+ tidy(data, select(['-password', '-secret']))
105
+
106
+ // CORRECT — negations after selector that produces all keys
107
+ tidy(data, select([everything(), '-password']))
108
+
109
+ // WRONG — mixing positive and negative keys unpredictably
110
+ tidy(data, select(['name', '-password', 'email']))
111
+ ```
112
+
113
+ ---
114
+
115
+ ## 7. `fill()` only fills forward (down), not backward
116
+
117
+ `fill` replaces `null`/`undefined` with the most recent non-null value going **downward** through the array.
118
+
119
+ ```js
120
+ const data = [
121
+ { group: 'A', value: 10 },
122
+ { group: null, value: 20 },
123
+ { group: null, value: 30 },
124
+ { group: 'B', value: 40 },
125
+ ];
126
+
127
+ tidy(data, fill('group'))
128
+ // => [{ group: 'A', value: 10 }, { group: 'A', value: 20 },
129
+ // { group: 'A', value: 30 }, { group: 'B', value: 40 }]
130
+ ```
131
+
132
+ If you need backward fill, reverse the row order first, fill, then reverse the result back to the original order.
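A minimal plain-JavaScript sketch of that workaround: reverse the rows, fill downward, then reverse again. `fillDown` and `fillUp` are hypothetical helpers written for this illustration, not library functions.

```js
// Hypothetical forward fill: carry the last non-null value downward.
function fillDown(items, key) {
  let last;
  return items.map((d) => {
    if (d[key] != null) {
      last = d[key];
      return d;
    }
    // Leading nulls with nothing to carry are left as they are.
    return last === undefined ? d : { ...d, [key]: last };
  });
}

// Backward fill built from forward fill on the reversed rows.
function fillUp(items, key) {
  return fillDown([...items].reverse(), key).reverse();
}

const gaps = [
  { group: null, value: 10 },
  { group: 'A', value: 20 },
  { group: null, value: 30 },
  { group: 'B', value: 40 },
];

console.log(fillUp(gaps, 'group').map((d) => d.group)); // ['A', 'A', 'B', 'B']
```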
133
+
134
+ ---
135
+
136
+ ## 8. Wrapping accessors in `asc()` unnecessarily
137
+
138
+ `arrange` auto-promotes single-argument accessor functions to ascending comparators. You don't need `asc()` for the basic case.
139
+
140
+ ```js
141
+ // These are equivalent:
142
+ tidy(data, arrange((d) => d.name))
143
+ tidy(data, arrange(asc('name')))
144
+
145
+ // Use desc() explicitly for descending:
146
+ tidy(data, arrange(desc('value')))
147
+
148
+ // Multiple sort keys:
149
+ tidy(data, arrange(asc('category'), desc('value')))
150
+ ```
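The promotion rule can be sketched in plain JavaScript under the assumption that it keys off the function's declared argument count. `ascending` and `toComparator` are hypothetical names; the library's actual check may differ.

```js
// Hypothetical ascending comparator built from an accessor.
const ascending = (accessor) => (a, b) => {
  const x = accessor(a);
  const y = accessor(b);
  return x < y ? -1 : x > y ? 1 : 0;
};

// Assumption: one declared argument means accessor, two means comparator.
const toComparator = (fn) => (fn.length === 1 ? ascending(fn) : fn);

const people = [{ name: 'c' }, { name: 'a' }, { name: 'b' }];

// Accessor is promoted to an ascending comparator.
const byName = [...people].sort(toComparator((d) => d.name));
// A two-argument comparator is passed through unchanged.
const byNameDesc = [...people].sort(
  toComparator((a, b) => b.name.localeCompare(a.name))
);

console.log(byName.map((d) => d.name).join(''));     // 'abc'
console.log(byNameDesc.map((d) => d.name).join('')); // 'cba'
```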
151
+
152
+ ---
153
+
154
+ ## 9. Relying on join auto-detection for `by` keys
155
+
156
+ When `by` is omitted, `innerJoin`, `leftJoin`, and `fullJoin` auto-detect matching keys from the **first element** of each array. This can break when rows have inconsistent shapes, because keys missing from the first element are never considered.
157
+
158
+ ```js
159
+ // RISKY — auto-detects join keys from first element
160
+ tidy(data, leftJoin(lookupTable))
161
+
162
+ // SAFER — explicitly specify join keys
163
+ tidy(data, leftJoin(lookupTable, { by: 'id' }))
164
+ tidy(data, leftJoin(lookupTable, { by: { id: 'lookupId' } }))
165
+ ```
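To see why first-element detection is fragile, consider one plausible way such detection could work: intersecting the keys of the first row on each side. `detectSharedKeys` is a hypothetical function and an assumption about the mechanism, not the library's code.

```js
// Hypothetical key detection: keys present in the first row of both arrays.
function detectSharedKeys(left, right) {
  const rightKeys = new Set(Object.keys(right[0] ?? {}));
  return Object.keys(left[0] ?? {}).filter((k) => rightKeys.has(k));
}

const lookup = [{ id: 2, city: 'x' }];

// Consistent shapes: the shared key is found.
const consistent = detectSharedKeys([{ id: 1, name: 'a' }], lookup);

// Inconsistent shapes: the first left row lacks `id`, so nothing is detected.
const inconsistent = detectSharedKeys(
  [{ name: 'a' }, { id: 2, name: 'b' }],
  lookup
);

console.log(consistent);   // ['id']
console.log(inconsistent); // []
```

Passing `by` explicitly avoids depending on whichever row happens to come first.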
166
+
167
+ ---
168
+
169
+ ## 10. Exceeding the 10-step pipeline type limit
170
+
171
+ `tidy()` has TypeScript overloads for up to 10 pipeline steps. Beyond that, types fall back to `any`.
172
+
173
+ ```js
174
+ // If you need more than 10 steps, split into multiple tidy() calls:
175
+ const intermediate = tidy(data, step1(), step2(), /* ...up to 10 */);
176
+ const result = tidy(intermediate, step11(), step12());
177
+ ```
178
+
179
+ ---
180
+
181
+ ## 11. Using `summarize` when you want `total`
182
+
183
+ `summarize` reduces to a single row. `total` appends a summary row while keeping all original rows.
184
+
185
+ ```js
186
+ // summarize: replaces all rows with one summary row
187
+ tidy(data, summarize({ sum: sum('value') }))
188
+ // => [{ sum: 150 }]
189
+
190
+ // total: keeps all rows, adds a summary row at the end
191
+ tidy(data, total({ value: sum('value') }, { name: 'Total' }))
192
+ // => [{ name: 'a', value: 50 }, { name: 'b', value: 100 }, { name: 'Total', value: 150 }]
193
+ ```
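The two output shapes can be compared side by side in plain JavaScript. This sketch does not call the library; it builds by hand the results described above, and the `'Total'` label is an illustrative choice.

```js
const sales = [
  { name: 'a', value: 50 },
  { name: 'b', value: 100 },
];

const sumValue = sales.reduce((s, d) => s + d.value, 0);

// summarize-style result: the original rows are replaced by one summary row.
const summarized = [{ sum: sumValue }];

// total-style result: the original rows are kept and a summary row is appended.
const withTotal = [...sales, { name: 'Total', value: sumValue }];

console.log(summarized); // [{ sum: 150 }]
console.log(withTotal.length); // 3
console.log(withTotal[2]); // { name: 'Total', value: 150 }
```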
@@ -0,0 +1,44 @@
1
+ # @tidyjs/tidy — AI Documentation
2
+
3
+ > Tidy up your data with JavaScript! A data wrangling library inspired by R's dplyr and the tidyverse.
4
+
5
+ ## How to Use These Docs
6
+
7
+ 1. **Start here** — Read `mental-model.md` first. It teaches the pipeline pattern, accessor conventions, and function taxonomy that are essential for writing correct tidyjs code.
8
+ 2. **Look up functions** — Use the `api-*.md` files to find specific function signatures, parameters, and examples.
9
+ 3. **Find recipes** — Check `patterns.md` for multi-step transformation recipes.
10
+ 4. **Avoid mistakes** — Read `gotchas.md` for common AI mistakes and anti-patterns.
11
+ 5. **Quick lookup** — Use `quick-reference.md` to map a task ("I want to filter rows") to the right function.
12
+
13
+ ## File Index
14
+
15
+ | File | Purpose |
16
+ |------|---------|
17
+ | [`mental-model.md`](mental-model.md) | Core concepts: pipeline pattern, accessors, function taxonomy, mutate vs mutateWithSummary |
18
+ | [`quick-reference.md`](quick-reference.md) | Task-to-function cheat sheet |
19
+ | [`gotchas.md`](gotchas.md) | Common mistakes and anti-patterns |
20
+ | [`patterns.md`](patterns.md) | Multi-verb recipes for real-world tasks |
21
+ | [`api-core.md`](api-core.md) | tidy, filter, mutate, transmute, arrange, select, distinct, rename, when, debug, map |
22
+ | [`api-grouping.md`](api-grouping.md) | groupBy + all 8 export modes |
23
+ | [`api-summarize.md`](api-summarize.md) | summarize, summarizeAll/At/If + summary functions (sum, mean, median, etc.) |
24
+ | [`api-vector.md`](api-vector.md) | mutateWithSummary + vector functions (cumsum, lag, lead, roll, rowNumber) |
25
+ | [`api-joins.md`](api-joins.md) | innerJoin, leftJoin, fullJoin |
26
+ | [`api-pivot.md`](api-pivot.md) | pivotWider, pivotLonger |
27
+ | [`api-slice.md`](api-slice.md) | slice, sliceHead, sliceTail, sliceMin, sliceMax, sliceSample |
28
+ | [`api-selectors.md`](api-selectors.md) | everything, startsWith, endsWith, contains, matches, numRange, negate |
29
+ | [`api-sequences.md`](api-sequences.md) | fullSeq, fullSeqDate, fullSeqDateISOString, vectorSeq, vectorSeqDate |
30
+ | [`api-other.md`](api-other.md) | complete, expand, fill, replaceNully, count, tally, total, addRows, TMath |
31
+
32
+ ## Quick Install
33
+
34
+ ```bash
35
+ npm install @tidyjs/tidy
36
+ ```
37
+
38
+ ```js
39
+ import { tidy, filter, mutate, arrange, desc, groupBy, summarize, sum } from '@tidyjs/tidy';
40
+ ```
41
+
42
+ ## Extension Packages
43
+
44
+ - `@tidyjs/tidy-moment` — Moment.js date/time extensions (not covered in these docs)