npm - @tidyjs/tidy - Versions diffs - 2.6.0 → 2.6.1 - Mend

@tidyjs/tidy 2.6.0 → 2.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

package/genai-docs/api-core.md +357 -0
package/genai-docs/api-grouping.md +400 -0
package/genai-docs/api-joins.md +118 -0
package/genai-docs/api-other.md +238 -0
package/genai-docs/api-pivot.md +112 -0
package/genai-docs/api-selectors.md +159 -0
package/genai-docs/api-sequences.md +127 -0
package/genai-docs/api-slice.md +137 -0
package/genai-docs/api-summarize.md +528 -0
package/genai-docs/api-vector.md +239 -0
package/genai-docs/gotchas.md +193 -0
package/genai-docs/index.md +44 -0
package/genai-docs/mental-model.md +270 -0
package/genai-docs/patterns.md +384 -0
package/genai-docs/quick-reference.md +125 -0
package/package.json +3 -2

package/genai-docs/mental-model.md ADDED

@@ -0,0 +1,270 @@
+# Mental Model for tidyjs
+## What is tidyjs?
+tidyjs is a JavaScript/TypeScript library for data wrangling that works with **plain arrays of objects** — no special DataFrame wrapper. It is inspired by R's dplyr and the tidyverse. Think of it as a functional pipeline for transforming `{key: value}[]` data, similar to how you might chain SQL operations or pandas methods, but using composable JavaScript functions.
+## The Pipeline Pattern
+Everything in tidyjs flows through `tidy()`:
+```js
+import { tidy, filter, mutate, arrange, desc } from '@tidyjs/tidy';
+const result = tidy(
+  data,          // 1st arg: array of objects
+  filter(...),   // 2nd+ args: transformation functions (verbs)
+  mutate(...),
+  arrange(desc('value'))
+);
+// result is a new array of objects
+```
+**Key rules:**
+- First argument is always the data array — `tidy(data, ...fns)`
+- Each subsequent argument is a **verb** — a function that returns a `TidyFn`
+- Verbs are **curried**: `filter(predicate)` returns a function `(items[]) => items[]`
+- The output of each verb feeds into the next
+- `tidy()` returns a new array (never mutates the input)
+- You can pass up to **10 pipeline steps** with full TypeScript type inference
+**Common mistake — don't call verbs directly:**
+```js
+// WRONG: calling filter directly without tidy()
+const direct = filter((d) => d.value > 10)(data);
+// CORRECT: use tidy() as the pipeline
+const result = tidy(data, filter((d) => d.value > 10));
+```
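The key rules above can be illustrated without the library. The following is a minimal plain-JavaScript sketch, not tidyjs's actual implementation: `pipeline`, `keepWhere`, and `sortDescBy` are hypothetical stand-ins for `tidy`, `filter`, and `arrange(desc(...))`, written only to show how curried verbs compose and why the input array stays untouched.

```js
// Hypothetical stand-in for tidy(): feed the output of each step into the next.
function pipeline(items, ...steps) {
  return steps.reduce((current, step) => step(current), items);
}

// Hypothetical stand-in for filter(): a curried verb that returns (items) => items.
const keepWhere = (predicate) => (items) => items.filter(predicate);

// Hypothetical stand-in for arrange(desc(key)): sorts a copy, never the input.
const sortDescBy = (key) => (items) => [...items].sort((a, b) => b[key] - a[key]);

const rows = [{ value: 5 }, { value: 20 }, { value: 12 }];
const big = pipeline(rows, keepWhere((d) => d.value > 10), sortDescBy('value'));

console.log(big);         // [ { value: 20 }, { value: 12 } ]
console.log(rows.length); // 3 (the input array is unchanged)
```

Each verb here is just a function from array to array, which is why a pipeline can be extended by appending another step.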
+## Accessor Functions
+tidyjs uses **accessor functions** `(d) => d.column` to reference data fields, NOT string column names.
+```js
+// CORRECT: accessor function
+tidy(data, filter((d) => d.age > 30))
+tidy(data, mutate({ fullName: (d) => `${d.first} ${d.last}` }))
+// WRONG: string column names (this is NOT pandas or SQL)
+tidy(data, filter('age > 30'))  // won't work
+```
+**Exception:** Some summary functions accept either a key string or accessor for convenience:
+```js
+sum('value')              // shorthand — string key
+sum((d) => d.value)       // equivalent — accessor function
+mean('score')             // string key shorthand
+```
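+Strings work as field references inside summary functions like `sum`, `mean`, `min`, `max`, `median`, `first`, `last`, `nDistinct`, `deviation`, `variance`, inside vector functions like `cumsum('key')` and `lag('key')`, in sort helpers like `asc('key')` and `desc('key')`, and in verbs that take column names, such as `select(['name'])`, `distinct(['category'])`, and `groupBy('category', [...])`. Verbs that take a predicate or a computed value, such as `filter` and `mutate`, need an accessor function.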
+## The Function Taxonomy
+This is **critical** — each function type belongs in a specific context:
+### Tidy Verbs → go directly inside `tidy()`
+These are pipeline steps that transform the array:
+```js
+tidy(data,
+  filter((d) => d.active),        // filter rows
+  mutate({ tax: (d) => d.price * 0.1 }), // add/modify columns per item
+  arrange(desc('price')),          // sort rows
+  select(['name', 'price', 'tax']), // pick columns
+  distinct(['category']),          // deduplicate
+  rename({ old_name: 'new_name' }) // rename columns
+)
+```
+Full list: `filter`, `mutate`, `transmute`, `mutateWithSummary`, `arrange` (alias: `sort`), `select` (alias: `pick`), `distinct`, `rename`, `slice`, `sliceHead`, `sliceTail`, `sliceMin`, `sliceMax`, `sliceSample`, `groupBy`, `summarize`, `summarizeAll`, `summarizeAt`, `summarizeIf`, `total`, `totalAll`, `totalAt`, `totalIf`, `count`, `tally`, `innerJoin`, `leftJoin`, `fullJoin`, `pivotWider`, `pivotLonger`, `complete`, `expand`, `fill`, `replaceNully`, `addRows` (alias: `addItems`), `when`, `map`, `debug`
+### Summary Functions → go inside `summarize()` or `total()`
+These **reduce** an array of items to a single value:
+```js
+tidy(data,
+  summarize({
+    totalRevenue: sum('revenue'),
+    avgScore: mean('score'),
+    count: n(),
+  })
+)
+// => [{ totalRevenue: 1500, avgScore: 85, count: 10 }]
+```
+Summary functions: `sum`, `mean`, `median`, `min`, `max`, `n`, `nDistinct`, `first`, `last`, `deviation`, `variance`, `meanRate`
+**They also work inside `mutateWithSummary()`** to add summary-derived columns back to every row.
+### Vector Functions → go inside `mutateWithSummary()`
+These operate on the **full array** and return a new array of the same length:
+```js
+tidy(data,
+  mutateWithSummary({
+    runningTotal: cumsum('value'),
+    prevValue: lag('value'),
+    nextValue: lead('value'),
+    rank: rowNumber(),
+  })
+)
+```
+Vector functions: `cumsum`, `lag`, `lead`, `roll`, `rowNumber`
+### Item Functions → go inside `mutate()`
+These transform **one item at a time**:
+```js
+tidy(data,
+  mutate({
+    conversionRate: rate('conversions', 'impressions'),
+  })
+)
+```
+Item functions: `rate`
+### Selectors → go inside `select()`, `summarizeAt()`, `pivotLonger({ cols })`
+These dynamically select columns by pattern:
+```js
+tidy(data,
+  select([startsWith('revenue_'), 'name'])
+)
+```
+Selectors: `everything`, `startsWith`, `endsWith`, `contains`, `matches`, `numRange`, `negate`
+## mutate vs mutateWithSummary
+This is the **most important distinction** in tidyjs. Getting this wrong produces silent bugs — code that runs but returns incorrect data.
+### `mutate` — per-item transformation
+The function receives `(item, index, array)` for each item individually:
+```js
+tidy(data,
+  mutate({
+    doubled: (d) => d.value * 2,
+    label: (d) => `${d.name}: ${d.value}`,
+    constant: 42,  // non-function values are applied to all items
+  })
+)
+```
+### `mutateWithSummary` — cross-item transformation
+The function receives the **entire array** `(items[])` and must return an array of the same length OR a single value (broadcast to all items):
+```js
+tidy(data,
+  mutateWithSummary({
+    runningTotal: cumsum('value'),     // returns array
+    pctOfTotal: (items) => {           // custom: returns array
+      const total = sum('value')(items); // compute the total once, not once per item
+      return items.map((d) => d.value / total);
+    },
+    totalValue: sum('value'),          // returns single value → broadcast
+  })
+)
+```
+### When to use which?
+| Use `mutate` when... | Use `mutateWithSummary` when... |
+|---|---|
+| Each item's new value depends only on that item | New value depends on other items in the array |
+| Simple calculations: `(d) => d.a + d.b` | Cumulative ops: `cumsum`, `lag`, `lead`, `roll` |
+| String formatting: `(d) => d.name.toUpperCase()` | Summary-derived: adding `sum()` or `mean()` as a column |
+| Setting constants: `{ status: 'active' }` | Row numbering: `rowNumber()` |
+### The dangerous mistake
+```js
+// WRONG — sum() inside mutate() does NOT work correctly
+// sum() expects the full array, but mutate passes one item at a time
+tidy(data, mutate({ total: sum('value') }))
+// CORRECT — use mutateWithSummary for cross-item operations
+tidy(data, mutateWithSummary({ total: sum('value') }))
+```
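Why the two verbs are not interchangeable can be seen in a small plain-JavaScript sketch. This is an illustration under stated assumptions, not the library's code: `perItem`, `acrossItems`, and `total` are hypothetical helpers that mimic the calling conventions described above (one row at a time versus the whole array).

```js
// Hypothetical per-item helper: each spec function sees one row at a time.
const perItem = (spec) => (items) =>
  items.map((d, i) => {
    const out = { ...d };
    for (const [key, fn] of Object.entries(spec)) out[key] = fn(out, i, items);
    return out;
  });

// Hypothetical cross-item helper: each spec function sees the whole array and
// returns either an array (one value per row) or a single value to broadcast.
const acrossItems = (spec) => (items) => {
  const out = items.map((d) => ({ ...d }));
  for (const [key, fn] of Object.entries(spec)) {
    const result = fn(out);
    out.forEach((d, i) => { d[key] = Array.isArray(result) ? result[i] : result; });
  }
  return out;
};

// Hypothetical summary function in the style of sum('value').
const total = (key) => (items) => items.reduce((acc, d) => acc + d[key], 0);

const rows = [{ value: 1 }, { value: 2 }, { value: 3 }];
const good = acrossItems({ total: total('value') })(rows);
console.log(good); // every row gets total: 6

// Handing the same summary to the per-item helper passes it a single row,
// which has no reduce() method, so in this sketch the call throws.
let failed = false;
try { perItem({ total: total('value') })(rows); } catch (e) { failed = true; }
console.log(failed); // true
```

In this sketch the misuse fails loudly. Other summary implementations may instead return a wrong value without raising an error.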
+## groupBy Semantics
+`groupBy` splits data into groups, runs operations per-group, then recombines:
+```js
+tidy(data,
+  groupBy('category', [
+    summarize({ total: sum('value') })
+  ])
+)
+// => [{ category: 'A', total: 100 }, { category: 'B', total: 200 }]
+```
+**Key behaviors:**
+- Group keys are automatically merged back into results (disable with `addGroupKeys: false`)
+- Operations inside the `fns` array run on each group independently
+- Without an export option, results are flattened back to a single array (ungrouped)
+- Group by multiple keys: `groupBy(['category', 'region'], [...])`
+- Group by computed key: `groupBy((d) => d.date.getFullYear(), [...])`
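The split, apply, and recombine behavior listed above can be sketched in plain JavaScript. The helper `groupApply` is hypothetical and covers only the single-key, flat-output case with group keys merged back in. It is not how tidyjs implements `groupBy`.

```js
// Hypothetical stand-in for groupBy(key, fns): split, apply per group, recombine.
function groupApply(items, key, perGroup) {
  // Split: bucket rows by the value of the grouping key.
  const groups = new Map();
  for (const d of items) {
    if (!groups.has(d[key])) groups.set(d[key], []);
    groups.get(d[key]).push(d);
  }
  // Apply and recombine: run perGroup on each bucket, then flatten,
  // merging the group key back into every resulting row.
  const out = [];
  for (const [groupValue, members] of groups) {
    for (const row of perGroup(members)) {
      out.push({ [key]: groupValue, ...row });
    }
  }
  return out;
}

const rows = [
  { category: 'A', value: 40 },
  { category: 'B', value: 200 },
  { category: 'A', value: 60 },
];
const totals = groupApply(rows, 'category', (members) => [
  { total: members.reduce((acc, d) => acc + d.value, 0) },
]);
console.log(totals);
// [ { category: 'A', total: 100 }, { category: 'B', total: 200 } ]
```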
+### Export Modes
+By default, `groupBy` ungroups the result back into a flat array. Use export mode shortcuts to get different output shapes:
+```js
+// Flat array (default — no export option)
+groupBy('key', [summarize(...)])
+// => [{ key: 'a', total: 10 }, { key: 'b', total: 20 }]
+// Nested entries: [[key, values], ...]
+groupBy('key', [summarize(...)], groupBy.entries())
+// Entries as objects: [{ key, values }, ...]
+groupBy('key', [summarize(...)], groupBy.entriesObject())
+// Plain object: { key: values, ... }
+groupBy('key', [summarize(...)], groupBy.object())
+// ES Map: Map { key => values }
+groupBy('key', [summarize(...)], groupBy.map())
+// Grouped Map (raw internal structure)
+groupBy('key', [summarize(...)], groupBy.grouped())
+// Just the keys
+groupBy('key', [summarize(...)], groupBy.keys())
+// Just the values (arrays)
+groupBy('key', [summarize(...)], groupBy.values())
+// Per-level control for multi-level grouping
+groupBy(['cat', 'subcat'], [summarize(...)], groupBy.levels({ levels: ['object', 'entries'] }))
+```
+Export options also accept: `flat`, `single`, `mapLeaf`, `mapLeaves`, `mapEntry`, `compositeKey`.
+**Important:** When using an export mode, `groupBy` becomes a `TidyGroupExportFn` — it must be the **last step** in the `tidy()` pipeline (or used inside another `groupBy`).
+## TypeScript Tips
+- **Accessor typing:** `(d: MyType) => d.value` gives full type inference inside `mutate`, `filter`, etc.
+- **Pipeline step limit:** `tidy()` has type overloads for up to 10 steps. For longer pipelines, split into multiple `tidy()` calls or use `as` assertions.
+- **groupBy return types:** The return type changes based on the export option. `groupBy.object()` returns `ObjectOutput`, `groupBy.entries()` returns `EntriesOutput`, etc. Without an export option, it returns the flat array type.
+- **Summary function keys:** `sum('value')` infers the key must exist on the input type. Use accessor functions `sum((d) => d.value)` for computed values.
+## What tidyjs is NOT
+- **Not a DataFrame wrapper** — works directly with `{key: value}[]` arrays, no special data structure
+- **Not lazy-evaluated** — each verb executes immediately in the pipeline
+- **Not a database query builder** — all data is in memory
+- **Not a charting library** — it transforms data; use a separate library to visualize
+- **Not a replacement for lodash/Array methods** — use it when you need multi-step data wrangling pipelines; for simple `.filter()` or `.map()`, plain JS is fine

package/genai-docs/patterns.md ADDED

@@ -0,0 +1,384 @@
+# Patterns and Recipes
+Multi-verb recipes for common data transformation tasks.
+---
+## 1. Group and Summarize
+The most common tidyjs pattern — split data into groups, then aggregate each group.
+```js
+const data = [
+  { category: 'A', region: 'east', value: 10 },
+  { category: 'A', region: 'west', value: 20 },
+  { category: 'B', region: 'east', value: 30 },
+  { category: 'B', region: 'west', value: 40 },
+];
+tidy(data,
+  groupBy('category', [
+    summarize({
+      total: sum('value'),
+      avg: mean('value'),
+      count: n(),
+    })
+  ])
+)
+// => [
+//   { category: 'A', total: 30, avg: 15, count: 2 },
+//   { category: 'B', total: 70, avg: 35, count: 2 },
+// ]
+```
+**With multiple group keys:**
+```js
+tidy(data,
+  groupBy(['category', 'region'], [
+    summarize({ total: sum('value') })
+  ])
+)
+// => [
+//   { category: 'A', region: 'east', total: 10 },
+//   { category: 'A', region: 'west', total: 20 },
+//   { category: 'B', region: 'east', total: 30 },
+//   { category: 'B', region: 'west', total: 40 },
+// ]
+```
+**Export as a keyed object:**
+```js
+tidy(data,
+  groupBy('category', [
+    summarize({ total: sum('value') })
+  ], groupBy.object({ single: true }))
+)
+// => { A: { category: 'A', total: 30 }, B: { category: 'B', total: 70 } }
+```
+---
+## 2. Pivot Wider and Longer
+### Long to wide
+```js
+const data = [
+  { name: 'Alice', metric: 'score', value: 90 },
+  { name: 'Alice', metric: 'rank', value: 1 },
+  { name: 'Bob', metric: 'score', value: 80 },
+  { name: 'Bob', metric: 'rank', value: 2 },
+];
+tidy(data,
+  pivotWider({
+    namesFrom: 'metric',
+    valuesFrom: 'value',
+  })
+)
+// => [
+//   { name: 'Alice', score: 90, rank: 1 },
+//   { name: 'Bob', score: 80, rank: 2 },
+// ]
+```
+### Wide to long
+```js
+const data = [
+  { name: 'Alice', score: 90, rank: 1 },
+  { name: 'Bob', score: 80, rank: 2 },
+];
+tidy(data,
+  pivotLonger({
+    cols: ['score', 'rank'],
+    namesTo: 'metric',
+    valuesTo: 'value',
+  })
+)
+// => [
+//   { name: 'Alice', metric: 'score', value: 90 },
+//   { name: 'Alice', metric: 'rank', value: 1 },
+//   { name: 'Bob', metric: 'score', value: 80 },
+//   { name: 'Bob', metric: 'rank', value: 2 },
+// ]
+```
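The wide-to-long reshaping above amounts to emitting one output row per input row and pivoted column. A minimal plain-JavaScript sketch follows. The helper `toLong` is hypothetical and only mirrors the option names used in the example (`cols`, `namesTo`, `valuesTo`) with a literal list of column names.

```js
// Hypothetical stand-in for pivotLonger({ cols, namesTo, valuesTo }).
function toLong(items, { cols, namesTo, valuesTo }) {
  return items.flatMap((d) => {
    // Columns that are not pivoted are copied onto every output row.
    const rest = Object.fromEntries(
      Object.entries(d).filter(([key]) => !cols.includes(key))
    );
    return cols.map((col) => ({ ...rest, [namesTo]: col, [valuesTo]: d[col] }));
  });
}

const wide = [
  { name: 'Alice', score: 90, rank: 1 },
  { name: 'Bob', score: 80, rank: 2 },
];
const long = toLong(wide, { cols: ['score', 'rank'], namesTo: 'metric', valuesTo: 'value' });
console.log(long.length); // 4
console.log(long[1]);     // { name: 'Alice', metric: 'rank', value: 1 }
```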
+**Pivot longer with selectors:**
+```js
+tidy(data,
+  pivotLonger({
+    cols: [startsWith('q')],  // columns like q1, q2, q3, q4
+    namesTo: 'quarter',
+    valuesTo: 'revenue',
+  })
+)
+```
+---
+## 3. Fill Missing Time Series (complete + fill)
+Generate missing time periods and fill forward.
+```js
+const data = [
+  { date: '2024-01', category: 'A', value: 10 },
+  { date: '2024-03', category: 'A', value: 30 },  // 2024-02 missing
+  { date: '2024-01', category: 'B', value: 20 },
+  { date: '2024-02', category: 'B', value: 25 },
+];
+tidy(data,
+  complete({
+    date: ['2024-01', '2024-02', '2024-03'],
+    category: ['A', 'B'],
+  }),
+  fill('value')
+)
+// => all date/category combinations exist, nulls filled forward
+```
+**With numeric sequences:**
+```js
+tidy(data,
+  complete({
+    year: fullSeq('year', { period: 1 }),  // fills gaps in year column
+    category: ['A', 'B'],
+  }),
+  replaceNully({ value: 0 })  // fill missing with 0 instead of forward-fill
+)
+```
+---
+## 4. Rolling Aggregation
+Compute a moving average or other rolling window calculation.
+```js
+const data = [
+  { date: '2024-01', value: 10 },
+  { date: '2024-02', value: 20 },
+  { date: '2024-03', value: 15 },
+  { date: '2024-04', value: 25 },
+  { date: '2024-05', value: 30 },
+];
+tidy(data,
+  mutateWithSummary({
+    movingAvg3: roll(3, mean('value'), { partial: true }),
+  })
+)
+// => each row gets a 3-period moving average of 'value'
+// partial: true means first 2 rows use windows smaller than 3
+```
+**Rolling sum:**
+```js
+tidy(data,
+  mutateWithSummary({
+    rollingSum: roll(3, sum('value')),
+  })
+)
+```
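To make the effect of `partial` concrete, here is a plain-JavaScript sketch of a trailing window mean over the `value` column from the data above. It assumes the window ends at the current row and looks backward. `rollingMean` is a hypothetical helper, not the library's `roll`.

```js
// Hypothetical trailing-window helper: the window ends at the current row.
function rollingMean(values, width, { partial = false } = {}) {
  return values.map((_, i) => {
    const start = i - width + 1;
    if (start < 0 && !partial) return undefined; // not enough history yet
    const win = values.slice(Math.max(0, start), i + 1);
    return win.reduce((acc, v) => acc + v, 0) / win.length;
  });
}

const values = [10, 20, 15, 25, 30];
const withPartial = rollingMean(values, 3, { partial: true });
const withoutPartial = rollingMean(values, 3);
console.log(withPartial);    // [ 10, 15, 15, 20, 23.33... ]
console.log(withoutPartial); // [ undefined, undefined, 15, 20, 23.33... ]
```

With `partial: true` the first two rows average over one and two values. Without it, this sketch leaves them undefined until a full window is available.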
+---
+## 5. Cumulative Calculations
+Add a running total, cumulative count, or percentage of total.
+```js
+const data = [
+  { month: 'Jan', revenue: 100 },
+  { month: 'Feb', revenue: 150 },
+  { month: 'Mar', revenue: 200 },
+];
+tidy(data,
+  mutateWithSummary({
+    cumulativeRevenue: cumsum('revenue'),
+    rowNum: rowNumber(),
+    totalRevenue: sum('revenue'),  // broadcast single value to all rows
+  }),
+  mutate({
+    pctOfTotal: (d) => d.revenue / d.totalRevenue,
+  })
+)
+// => [
+//   { month: 'Jan', revenue: 100, cumulativeRevenue: 100, rowNum: 0, totalRevenue: 450, pctOfTotal: 0.222 },
+//   { month: 'Feb', revenue: 150, cumulativeRevenue: 250, rowNum: 1, totalRevenue: 450, pctOfTotal: 0.333 },
+//   { month: 'Mar', revenue: 200, cumulativeRevenue: 450, rowNum: 2, totalRevenue: 450, pctOfTotal: 0.444 },
+// ]
+```
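The arithmetic behind the output above can be checked by hand with plain arrays. The sketch below uses no library functions and reuses the `revenue` values from the example. The variable names are illustrative only.

```js
const revenue = [100, 150, 200];

// Running total: each entry is the sum of all values up to and including that row.
let running = 0;
const cumulative = revenue.map((v) => (running += v));

// Grand total, then each row's share of it.
const grandTotal = cumulative[cumulative.length - 1];
const pctOfTotal = revenue.map((v) => v / grandTotal);

console.log(cumulative);                            // [ 100, 250, 450 ]
console.log(pctOfTotal.map((p) => p.toFixed(3)));   // [ '0.222', '0.333', '0.444' ]
```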
+---
+## 6. Conditional Pipeline Branching
+Apply transformations only when a condition is met.
+```js
+const includeInactive = false;
+tidy(data,
+  // when() applies its steps only if the condition is true; otherwise data passes through unchanged
+  when(!includeInactive, [filter((d) => d.active)]),  // drop inactive rows unless they were requested
+  arrange(desc('value'))
+)
+```
+**With a predicate function:**
+```js
+tidy(data,
+  when(
+    (items) => items.length > 100,  // only truncate if the dataset is large
+    [sliceHead(100)]
+  ),
+  summarize({ avg: mean('score') })
+)
+```
+---
+## 7. Multi-Level Grouping with Export
+Nested grouping with per-level export control.
+```js
+const data = [
+  { dept: 'Eng', team: 'Frontend', name: 'Alice', salary: 100 },
+  { dept: 'Eng', team: 'Frontend', name: 'Bob', salary: 110 },
+  { dept: 'Eng', team: 'Backend', name: 'Carol', salary: 120 },
+  { dept: 'Sales', team: 'Enterprise', name: 'Dave', salary: 90 },
+];
+// Nested object: { dept: { team: [items] } }
+tidy(data,
+  groupBy(['dept', 'team'], [],
+    groupBy.levels({ levels: ['object', 'object'] })
+  )
+)
+// => {
+//   Eng: { Frontend: [Alice, Bob], Backend: [Carol] },
+//   Sales: { Enterprise: [Dave] }
+// }
+```
+**Flat export with composite keys:**
+```js
+tidy(data,
+  groupBy(['dept', 'team'], [summarize({ total: sum('salary') })],
+    groupBy.object({ flat: true, compositeKey: (keys) => keys.join(' > ') })
+  )
+)
+// => { 'Eng > Frontend': [...], 'Eng > Backend': [...], 'Sales > Enterprise': [...] }
+```
+---
+## 8. Join and Enrich
+Add columns from a lookup table.
+```js
+const orders = [
+  { orderId: 1, productId: 'A', qty: 5 },
+  { orderId: 2, productId: 'B', qty: 3 },
+  { orderId: 3, productId: 'A', qty: 2 },
+];
+const products = [
+  { productId: 'A', name: 'Widget', price: 10 },
+  { productId: 'B', name: 'Gadget', price: 25 },
+];
+tidy(orders,
+  leftJoin(products, { by: 'productId' }),
+  mutate({ total: (d) => d.qty * d.price }),
+  arrange(desc('total'))
+)
+// => [
+//   { orderId: 2, productId: 'B', qty: 3, name: 'Gadget', price: 25, total: 75 },
+//   { orderId: 1, productId: 'A', qty: 5, name: 'Widget', price: 10, total: 50 },
+//   { orderId: 3, productId: 'A', qty: 2, name: 'Widget', price: 10, total: 20 },
+// ]
+```
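Conceptually, enriching rows from a lookup table is an indexed lookup followed by an object merge. The plain-JavaScript sketch below makes two simplifying assumptions that may differ from the library's `leftJoin`: lookup keys are unique, and unmatched rows are returned without any added fields. `leftJoinBy` is a hypothetical helper.

```js
// Hypothetical helper: keep every left row and copy over the fields of the
// matching lookup row, if there is one. Assumes lookup keys are unique.
function leftJoinBy(left, lookup, by) {
  const index = new Map();
  for (const row of lookup) {
    if (!index.has(row[by])) index.set(row[by], row);
  }
  return left.map((d) => ({ ...d, ...(index.get(d[by]) || {}) }));
}

const orders = [
  { orderId: 1, productId: 'A', qty: 5 },
  { orderId: 2, productId: 'Z', qty: 3 }, // no matching product
];
const products = [{ productId: 'A', name: 'Widget', price: 10 }];

const enriched = leftJoinBy(orders, products, 'productId');
console.log(enriched[0]); // { orderId: 1, productId: 'A', qty: 5, name: 'Widget', price: 10 }
console.log(enriched[1]); // { orderId: 2, productId: 'Z', qty: 3 }
```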
+---
+## 9. Top-N Per Group
+Get the highest/lowest items within each group.
+```js
+const data = [
+  { category: 'A', name: 'a1', score: 90 },
+  { category: 'A', name: 'a2', score: 85 },
+  { category: 'A', name: 'a3', score: 70 },
+  { category: 'B', name: 'b1', score: 95 },
+  { category: 'B', name: 'b2', score: 60 },
+];
+// Top 2 per category
+tidy(data,
+  groupBy('category', [
+    arrange(desc('score')),
+    sliceHead(2),
+  ])
+)
+// => [
+//   { category: 'A', name: 'a1', score: 90 },
+//   { category: 'A', name: 'a2', score: 85 },
+//   { category: 'B', name: 'b1', score: 95 },
+//   { category: 'B', name: 'b2', score: 60 },
+// ]
+```
+**Alternative using sliceMax:**
+```js
+tidy(data,
+  groupBy('category', [
+    sliceMax(2, 'score'),
+  ])
+)
+```
+---
+## 10. Lag/Lead for Period-Over-Period Comparison
+Calculate change from previous period.
+```js
+const data = [
+  { month: 'Jan', revenue: 100 },
+  { month: 'Feb', revenue: 120 },
+  { month: 'Mar', revenue: 110 },
+];
+tidy(data,
+  mutateWithSummary({
+    prevRevenue: lag('revenue', { default: 0 }),
+  }),
+  mutate({
+    change: (d) => d.revenue - d.prevRevenue,
+    pctChange: (d) => d.prevRevenue ? (d.revenue - d.prevRevenue) / d.prevRevenue : null,
+  })
+)
+// => [
+//   { month: 'Jan', revenue: 100, prevRevenue: 0, change: 100, pctChange: null },
+//   { month: 'Feb', revenue: 120, prevRevenue: 100, change: 20, pctChange: 0.2 },
+//   { month: 'Mar', revenue: 110, prevRevenue: 120, change: -10, pctChange: -0.083 },
+// ]
+```
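The period-over-period numbers above can be reproduced with plain arrays. In this sketch, `previous` is a hypothetical helper that shifts values down by one row and fills the first slot with a fallback, which is the behavior the example relies on from `lag('revenue', { default: 0 })`.

```js
// Hypothetical helper: value from the previous row, or a fallback for the first row.
const previous = (values, fallback) =>
  values.map((_, i) => (i === 0 ? fallback : values[i - 1]));

const revenue = [100, 120, 110];
const prev = previous(revenue, 0);

const change = revenue.map((v, i) => v - prev[i]);
// Guard against dividing by the zero fallback in the first row.
const pctChange = revenue.map((v, i) => (prev[i] ? (v - prev[i]) / prev[i] : null));

console.log(prev);      // [ 0, 100, 120 ]
console.log(change);    // [ 100, 20, -10 ]
console.log(pctChange); // [ null, 0.2, about -0.083 ]
```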