twokeys 2.0.2 → 3.0.0

package/README.md CHANGED
@@ -1,38 +1,22 @@
1
1
  # Twokeys
2
2
 
3
- > A small data exploration and manipulation library, named after **John Tukey** the legendary statistician who pioneered exploratory data analysis (EDA).
3
+ > Exploratory data analysis for graphs, vectors, and series — named after **John Tukey**, the legendary statistician who pioneered EDA.
4
4
 
5
5
  [![CI](https://github.com/buley/twokeys/actions/workflows/ci.yml/badge.svg)](https://github.com/buley/twokeys/actions/workflows/ci.yml)
6
6
  [![npm version](https://badge.fury.io/js/twokeys.svg)](https://www.npmjs.com/package/twokeys)
7
7
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
8
8
 
9
- ## About John Tukey
10
-
11
- John Wilder Tukey (1915–2000) revolutionized how we look at data. He invented the box plot, coined the terms "bit" and "software," and championed the idea that **looking at data** is just as important as modeling it. His book *Exploratory Data Analysis* (1977) changed statistics forever.
12
-
13
- This library is named after him — a founding mind in [data exploration and analysis](http://en.wikipedia.org/wiki/Exploratory_data_analysis) and a personal hero of the author.
9
+ ## What Is This?
14
10
 
15
- ## Features
11
+ Tukey taught us that **looking at data** is just as important as modeling it. Twokeys applies that philosophy to five areas:
16
12
 
17
- - **Summary Statistics**: Mean, median, mode, trimean, quartiles (hinges)
18
- - **Outlier Detection**: Tukey fences (inner and outer)
19
- - **Letter Values**: Extended quartiles (eighths, sixteenths, etc.)
20
- - **Stem-and-Leaf**: Text-based distribution visualization
21
- - **Ranking**: Full ranking with tie handling
22
- - **Binning**: Histogram-style grouping
23
- - **Smoothing**: Hanning filter, Tukey's 3RSSH smoothing
24
- - **Transforms**: Logarithms, square roots, reciprocals
25
- - **WASM Support**: Optional WebAssembly for maximum performance
26
- - **Zero Dependencies**: Pure TypeScript, works everywhere
27
- - **Tiny**: <3KB minified and gzipped
13
+ 1. **Graph EDA**: Treat structural properties of graphs (degree distributions, clustering coefficients, assortativity) as data series that deserve full Tukey-style analysis
14
+ 2. **Vector Distance & Similarity**: Cosine similarity, Mahalanobis distance, Jaccard similarity, L2 normalization, and more
15
+ 3. **Multivariate Analysis**: Centroids, covariance matrices, correlation matrices, and Mahalanobis-based outlier detection via the `Points` class
16
+ 4. **1D EDA**: The original `Series` class, with medians, fences, letter values, stem-and-leaf plots, smoothing, and everything else Tukey invented
17
+ 5. **Graph Algorithms**: Centrality, community detection, shortest paths, flow, clustering (k-means, hierarchical, DBSCAN), TSP approximation, and a GDS-style catalog
28
18
 
29
- ## Packages
30
-
31
- | Package | Description |
32
- |---------|-------------|
33
- | `twokeys` | Core TypeScript library |
34
- | `@buley/twokeys-wasm` | WebAssembly implementation with TypeScript fallback |
35
- | `@buley/twokeys-types` | Shared Zod schemas for runtime validation |
19
+ Zero dependencies. Pure TypeScript. Works everywhere.
36
20
 
37
21
  ## Installation
38
22
 
@@ -40,316 +24,358 @@ This library is named after him — a founding mind in [data exploration and ana
40
24
  npm install twokeys
41
25
  # or
42
26
  bun add twokeys
43
- # or
44
- yarn add twokeys
45
27
  ```
46
28
 
47
- For WASM acceleration (optional):
48
-
49
- ```bash
50
- npm install @buley/twokeys-wasm
51
- ```
29
+ ## Graph EDA
52
30
 
53
- ## Quick Start
31
+ The core insight: graph structural properties ARE data series. Degree distributions get box plots. Clustering coefficients get outlier detection. Assortativity tells you whether your network is stratified.
54
32
 
55
33
  ```typescript
56
- import { Series } from 'twokeys';
57
-
58
- // Create a series from your data
59
- const series = new Series({ data: [1, 2, 3, 4, 5, 6, 7, 8, 9, 100] });
60
-
61
- // Get summary statistics
62
- console.log(series.mean()); // 14.5
63
- console.log(series.median()); // { datum: 5.5, depth: 5.5 }
64
- console.log(series.trimean()); // Tukey's trimean
65
-
66
- // Detect outliers (using Tukey fences)
67
- console.log(series.outliers()); // [100]
68
-
69
- // Get a full description
70
- const desc = series.describe();
71
- console.log(desc.summary);
34
+ import { graphEda, graphOutliers } from 'twokeys';
35
+
36
+ const nodes = ['alice', 'bob', 'carol', 'dave', 'eve'] as const;
37
+ const edges = [
38
+ { from: 'alice', to: 'bob', weight: 1 },
39
+ { from: 'alice', to: 'carol', weight: 1 },
40
+ { from: 'bob', to: 'carol', weight: 1 },
41
+ { from: 'carol', to: 'dave', weight: 1 },
42
+ { from: 'dave', to: 'eve', weight: 1 },
43
+ ];
44
+
45
+ const summary = graphEda(nodes, edges);
46
+
47
+ // Every structural metric, analyzed as a Tukey Series:
48
+ summary.density; // edges / maxPossibleEdges
49
+ summary.degreeDistribution; // Full SeriesDescription with median, fences, outliers
50
+ summary.clusteringDistribution; // Clustering coefficients as EDA
51
+ summary.globalClusteringCoefficient;
52
+ summary.averagePathLength;
53
+ summary.diameter;
54
+ summary.reciprocity;
55
+ summary.degreeAssortativity; // Do hubs connect to hubs?
56
+
57
+ // Find structurally unusual nodes
58
+ const unusual = graphOutliers(nodes, edges, { method: 'combined' });
59
+ // [{ nodeId: 'eve', score: 2.1, reason: 'Low degree + low clustering' }]
72
60
  ```
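To make the "structural properties are data series" idea concrete, here is a minimal, self-contained sketch of how a degree series and a density figure could be derived from the same five-node edge list. It is not the library's implementation: the `Edge` type, `degreeSeries`, and `undirectedDensity` are hypothetical names, and the sketch assumes an undirected reading of the edges, which `graphEda` may or may not share.

```typescript
// Hypothetical edge shape, mirroring the README example above.
type Edge = { from: string; to: string; weight: number };

const sketchNodes: string[] = ['alice', 'bob', 'carol', 'dave', 'eve'];
const sketchEdges: Edge[] = [
  { from: 'alice', to: 'bob', weight: 1 },
  { from: 'alice', to: 'carol', weight: 1 },
  { from: 'bob', to: 'carol', weight: 1 },
  { from: 'carol', to: 'dave', weight: 1 },
  { from: 'dave', to: 'eve', weight: 1 },
];

// Count each edge once per endpoint (assumption: edges are undirected).
function degreeSeries(ids: string[], es: Edge[]): number[] {
  const degree = new Map<string, number>();
  ids.forEach((id) => degree.set(id, 0));
  es.forEach((e) => {
    degree.set(e.from, (degree.get(e.from) ?? 0) + 1);
    degree.set(e.to, (degree.get(e.to) ?? 0) + 1);
  });
  return ids.map((id) => degree.get(id) ?? 0);
}

// Undirected density: edge count divided by the n * (n - 1) / 2 possible pairs.
function undirectedDensity(nodeCount: number, edgeCount: number): number {
  return nodeCount < 2 ? 0 : edgeCount / ((nodeCount * (nodeCount - 1)) / 2);
}

const degrees = degreeSeries(sketchNodes, sketchEdges);
const density = undirectedDensity(sketchNodes.length, sketchEdges.length);
console.log(degrees, density);
```

The resulting `degrees` array is an ordinary list of numbers, so it can be summarized with medians, fences, and outlier checks like any other one-dimensional series.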
73
61
 
74
- ### Using WASM (Optional)
75
-
76
- ```typescript
77
- import { loadWasm, analyze, isWasmLoaded } from '@buley/twokeys-wasm';
78
-
79
- // Load WASM (falls back to TypeScript if unavailable)
80
- await loadWasm();
62
+ ## Vector Distance & Similarity
81
63
 
82
- console.log(isWasmLoaded()); // true if WASM loaded
64
+ Standalone functions for vector math. These are the workhorses that `Points`, graph algorithms, and consumers all use.
83
65
 
84
- // Use the same API - automatically uses WASM when available
85
- const result = analyze([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]);
86
- console.log(result.summary.outliers); // [100]
66
+ ```typescript
67
+ import {
68
+ cosineSimilarity,
69
+ euclideanDistance,
70
+ manhattanDistance,
71
+ mahalanobisDistance,
72
+ normalizeL2,
73
+ cosineSimilaritySparse,
74
+ jaccardSimilarity,
75
+ overlapCoefficient,
76
+ } from 'twokeys';
77
+
78
+ // Dense vectors
79
+ cosineSimilarity([1, 2, 3], [4, 5, 6]); // ≈ 0.975
80
+ euclideanDistance([0, 0], [3, 4]); // 5
81
+ manhattanDistance([0, 0], [3, 4]); // 7
82
+ normalizeL2([3, 4]); // [0.6, 0.8]
83
+ mahalanobisDistance([5, 5], [0, 0], [1, 1]); // 7.07
84
+
85
+ // Sparse vectors (Map<string, number>)
86
+ const a = new Map([['x', 1], ['y', 2]]);
87
+ const b = new Map([['y', 3], ['z', 4]]);
88
+ cosineSimilaritySparse(a, b); // similarity on shared keys
89
+
90
+ // Set similarity
91
+ jaccardSimilarity(new Set([1, 2, 3]), new Set([2, 3, 4])); // 0.5
92
+ overlapCoefficient(new Set([1, 2]), new Set([2, 3, 4])); // 0.5
87
93
  ```
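For readers who want to check the sample values by hand, the sketch below re-derives three of them from the textbook formulas. The function names are hypothetical and this is not the package's source. In particular, treating `variances` as a diagonal covariance and using `epsilon` as a floor on each variance are assumptions about what `mahalanobisDistance(point, means, variances, epsilon?)` does.

```typescript
// Cosine similarity: dot product divided by the product of the L2 norms.
function cosineSketch(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Mahalanobis distance under a diagonal covariance (assumption):
// sqrt(sum((x_i - mean_i)^2 / variance_i)), with epsilon as a variance floor.
function diagonalMahalanobisSketch(
  point: number[],
  means: number[],
  variances: number[],
  epsilon = 1e-12,
): number {
  let sum = 0;
  for (let i = 0; i < point.length; i++) {
    const diff = (point[i] ?? 0) - (means[i] ?? 0);
    sum += (diff * diff) / Math.max(variances[i] ?? 0, epsilon);
  }
  return Math.sqrt(sum);
}

// Jaccard index: |A ∩ B| / |A ∪ B|.
function jaccardSketch<T>(a: Set<T>, b: Set<T>): number {
  let shared = 0;
  a.forEach((item) => {
    if (b.has(item)) shared++;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

const cosineValue = cosineSketch([1, 2, 3], [4, 5, 6]);
const mahalanobisValue = diagonalMahalanobisSketch([5, 5], [0, 0], [1, 1]);
const jaccardValue = jaccardSketch(new Set([1, 2, 3]), new Set([2, 3, 4]));
console.log(cosineValue, mahalanobisValue, jaccardValue);
```

With unit variances the diagonal Mahalanobis distance reduces to the Euclidean distance from the mean, which is why `[5, 5]` against `[0, 0]` comes out at the square root of 50.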
88
94
 
89
- ## Benchmarks
95
+ ## Multivariate Analysis (Points)
90
96
 
91
- Performance on different dataset sizes (operations per second, higher is better):
97
+ Full multivariate EDA: centroids, covariance, correlation, Mahalanobis outlier detection.
92
98
 
93
- ### TypeScript Implementation
94
-
95
- | Method | 100 elements | 1,000 elements | 10,000 elements |
96
- |--------|-------------:|---------------:|----------------:|
97
- | `sorted()` | 224,599 | 14,121 | 874 |
98
- | `median()` | 199,397 | 14,127 | 876 |
99
- | `mean()` | 550,610 | 413,551 | 68,399 |
100
- | `mode()` | 87,665 | 6,738 | 431 |
101
- | `fences()` | 238,486 | 13,270 | 864 |
102
- | `outliers()` | 210,058 | 12,584 | 854 |
103
- | `smooth()` | 61,268 | 1,599 | 31 |
104
- | `describe()` | 15,642 | 952 | 29 |
99
+ ```typescript
100
+ import { Points } from 'twokeys';
105
101
 
106
- ### v2.0 Performance Improvements
107
-
108
- Compared to v1.x (CoffeeScript), v2.0 TypeScript is dramatically faster:
109
-
110
- | Method | v1.x (10K) | v2.0 (10K) | Improvement |
111
- |--------|------------|------------|-------------|
112
- | `median()` | 6 ops/sec | 876 ops/sec | **146x faster** |
113
- | `counts()` | 1 ops/sec | 606 ops/sec | **606x faster** |
114
- | `fences()` | 5 ops/sec | 864 ops/sec | **173x faster** |
115
- | `describe()` | 1 ops/sec | 29 ops/sec | **29x faster** |
116
-
117
- Key optimizations:
118
- - O(1) index-based median (was O(n²) recursive)
119
- - Map-based frequency counting (was O(n²) recursive)
120
- - Eliminated unnecessary array copying in smoothing
121
-
122
- ## Example Output
123
-
124
- Applying `describe()` to a Series returns a complete analysis:
125
-
126
- ```javascript
127
- const series = new Series({ data: [48, 59, 63, 30, 57, 92, 73, 47, 31, 5] });
128
- const analysis = series.describe();
129
-
130
- // Result:
131
- {
132
- "original": [48, 59, 63, 30, 57, 92, 73, 47, 31, 5],
133
- "summary": {
134
- "median": { "datum": 52.5, "depth": 5.5 },
135
- "mean": 50.5,
136
- "hinges": [{ "datum": 31, "depth": 3 }, { "datum": 63, "depth": 8 }],
137
- "adjacent": [30, 92],
138
- "outliers": [],
139
- "extremes": [5, 92],
140
- "iqr": 32,
141
- "fences": [4.5, 100.5]
142
- },
143
- "smooths": {
144
- "smooth": [48, 30, 57, 57, 57, 47, 31, 5, 5, 5],
145
- "hanning": [48, 61, 46.5, 43.5, 74.5, 82.5, 60, 39, 18, 5]
146
- },
147
- "transforms": {
148
- "logs": [3.87, 4.08, 4.14, ...],
149
- "roots": [6.93, 7.68, 7.94, ...],
150
- "inverse": [0.021, 0.017, 0.016, ...]
151
- },
152
- "sorted": [5, 30, 31, 47, 48, 57, 59, 63, 73, 92],
153
- "ranked": { "up": {...}, "down": {...}, "groups": {...} },
154
- "binned": { "bins": 4, "width": 26, "binned": {...} }
155
- }
102
+ const points = new Points({
103
+ data: [
104
+ [1, 2, 3],
105
+ [4, 5, 6],
106
+ [7, 8, 9],
107
+ [100, 100, 100], // outlier
108
+ ],
109
+ });
110
+
111
+ points.centroid(); // [28, 28.75, 29.5]
112
+ points.variances(); // Per-dimension variance
113
+ points.standardDeviations(); // Per-dimension stddev
114
+ points.covarianceMatrix(); // Full covariance matrix
115
+ points.correlationMatrix(); // Pearson correlation matrix
116
+
117
+ // Mahalanobis distance — the multivariate equivalent of z-score
118
+ points.mahalanobis([50, 50, 50]); // Distance of a single point
119
+ points.mahalanobisAll(); // Distance for each stored point
120
+
121
+ // Tukey-style outlier detection for multivariate data
122
+ points.outliersByMahalanobis(); // [[100, 100, 100]]
123
+
124
+ // Normalization
125
+ points.normalizeL2(); // L2-normalize all points (returns new Points)
126
+ points.normalizeZscore(); // Z-score normalize per dimension
127
+
128
+ // Full description: each dimension analyzed as a Series
129
+ const desc = points.describe();
130
+ desc.centroid; // Mean point
131
+ desc.correlationMatrix; // Pearson correlations
132
+ desc.mahalanobisDistances; // Distance from centroid per point
133
+ desc.outlierCount; // How many outliers
134
+ desc.dimensionSummaries; // Each dimension as a full SeriesDescription
156
135
  ```
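The centroid shown above is simply the per-dimension mean, and the covariance matrix follows from the deviations around it. A minimal sketch under stated assumptions: `centroidSketch` and `covarianceSketch` are hypothetical names, and the `n - 1` (sample) denominator is an assumption, since the README does not say which convention `Points` uses.

```typescript
// Per-dimension mean of a list of equal-length rows.
function centroidSketch(rows: number[][]): number[] {
  const dims = rows[0]?.length ?? 0;
  const sums: number[] = [];
  for (let d = 0; d < dims; d++) sums.push(0);
  rows.forEach((row) => {
    for (let d = 0; d < dims; d++) sums[d] = (sums[d] ?? 0) + (row[d] ?? 0);
  });
  return sums.map((s) => s / rows.length);
}

// Sample covariance matrix (assumption: divide by n - 1).
function covarianceSketch(rows: number[][]): number[][] {
  const mean = centroidSketch(rows);
  const dims = mean.length;
  const cov: number[][] = [];
  for (let i = 0; i < dims; i++) {
    const line: number[] = [];
    for (let j = 0; j < dims; j++) {
      let sum = 0;
      rows.forEach((row) => {
        sum += ((row[i] ?? 0) - (mean[i] ?? 0)) * ((row[j] ?? 0) - (mean[j] ?? 0));
      });
      line.push(sum / (rows.length - 1));
    }
    cov.push(line);
  }
  return cov;
}

const samplePoints = [
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9],
  [100, 100, 100], // the same outlier as in the README example
];
const centroidValue = centroidSketch(samplePoints);
const covarianceValue = covarianceSketch(samplePoints);
console.log(centroidValue, covarianceValue);
```

Note how a single extreme row pulls every coordinate of the centroid toward it, which is the motivation for pairing the mean-based summary with an outlier check.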
157
136
 
158
- ## API
137
+ ## 1D Exploratory Data Analysis (Series)
159
138
 
160
- ### Series
161
-
162
- The `Series` class provides methods for exploring 1-dimensional numerical data.
139
+ The original Tukey toolkit: everything you need to explore univariate data.
163
140
 
164
141
  ```typescript
165
142
  import { Series } from 'twokeys';
166
143
 
167
- const series = new Series({ data: [1, 2, 3, 4, 5] });
144
+ const series = new Series({ data: [2, 4, 4, 4, 5, 5, 7, 9] });
145
+
146
+ // Summary statistics
147
+ series.mean(); // 5
148
+ series.median(); // { datum: 4.5, depth: 4.5 }
149
+ series.mode(); // { count: 3, data: [4] }
150
+ series.trimean(); // Tukey's trimean
151
+
152
+ // Dispersion
153
+ series.variance(); // Sample variance
154
+ series.stddev(); // Standard deviation
155
+ series.iqr(); // Interquartile range
156
+ series.skewness(); // Fisher-Pearson skewness
157
+ series.kurtosis(); // Excess kurtosis
158
+
159
+ // Outlier detection (Tukey fences)
160
+ series.fences(); // Inner fence boundaries (1.5 x IQR)
161
+ series.outliers(); // Values outside inner fences
162
+ series.outside(); // Values outside outer fences (3 x IQR)
163
+
164
+ // Time series
165
+ series.ema(0.3); // Exponential moving average
166
+ series.zscore(); // Z-score normalization
167
+ series.hanning(); // Hanning filter
168
+ series.smooth(); // Tukey's 3RSSH smoothing
169
+ series.rough(); // Residuals (original - smooth)
170
+
171
+ // Visualization
172
+ series.stemLeaf(); // Stem-and-leaf plot
173
+ series.letterValues(); // Extended quartiles (M, F, E, D, C, B, A...)
174
+
175
+ // Everything at once
176
+ series.describe();
168
177
  ```
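As a rough illustration of what the fence-based methods compute, the sketch below applies the textbook rule (hinges plus or minus 1.5 times the IQR) to the ten-value series from the 2.x Quick Start. The names are hypothetical, and the hinge convention (median of each half, including the overall median when the count is odd) is an assumption, so the library's own `fences()` and `hinges()` may return different numbers.

```typescript
// Median of an already sorted array.
function medianOfSorted(sorted: number[]): number {
  const n = sorted.length;
  if (n === 0) return NaN;
  const mid = Math.floor(n / 2);
  return n % 2 === 1 ? sorted[mid]! : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

type TukeySketch = {
  median: number;
  lowerHinge: number;
  upperHinge: number;
  iqr: number;
  fences: [number, number];
  outliers: number[];
  trimean: number;
};

function tukeySketch(data: number[]): TukeySketch {
  const sorted = data.slice().sort((a, b) => a - b);
  const n = sorted.length;
  const half = Math.ceil(n / 2); // halves overlap at the median when n is odd
  const median = medianOfSorted(sorted);
  const lowerHinge = medianOfSorted(sorted.slice(0, half));
  const upperHinge = medianOfSorted(sorted.slice(n - half));
  const iqr = upperHinge - lowerHinge;
  const fences: [number, number] = [lowerHinge - 1.5 * iqr, upperHinge + 1.5 * iqr];
  const outliers = sorted.filter((x) => x < fences[0] || x > fences[1]);
  // Tukey's trimean: (Q1 + 2 * median + Q3) / 4.
  const trimean = (lowerHinge + 2 * median + upperHinge) / 4;
  return { median, lowerHinge, upperHinge, iqr, fences, outliers, trimean };
}

const tukeyResult = tukeySketch([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]);
console.log(tukeyResult);
```

The mean of this series is 14.5, while the median and trimean both sit at 5.5, which shows why resistant summaries are preferred when a single value like 100 is present.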
169
178
 
170
- #### Summary Statistics
171
-
172
- | Method | Description |
173
- |--------|-------------|
174
- | `mean()` | Arithmetic mean |
175
- | `median()` | Median value and depth |
176
- | `mode()` | Most frequent value(s) |
177
- | `trimean()` | Tukey's trimean: (Q1 + 2×median + Q3) / 4 |
178
- | `extremes()` | [min, max] values |
179
- | `hinges()` | Quartile boundaries (Q1, Q3) |
180
- | `iqr()` | Interquartile range |
181
-
182
- #### Outlier Detection
183
-
184
- | Method | Description |
185
- |--------|-------------|
186
- | `fences()` | Inner fence boundaries (1.5 × IQR) |
187
- | `outer()` | Outer fence boundaries (3 × IQR) |
188
- | `outliers()` | Values outside inner fences |
189
- | `inside()` | Values within fences |
190
- | `outside()` | Values outside outer fences |
191
- | `adjacent()` | Most extreme non-outlier values |
192
-
193
- #### Letter Values & Visualization
194
-
195
- | Method | Description |
196
- |--------|-------------|
197
- | `letterValues()` | Extended quartiles (M, F, E, D, C, B, A...) |
198
- | `stemLeaf()` | Stem-and-leaf text display |
199
- | `midSummaries()` | Symmetric quantile pair averages |
200
-
201
- #### Ranking & Counting
202
-
203
- | Method | Description |
204
- |--------|-------------|
205
- | `sorted()` | Sorted copy of data |
206
- | `ranked()` | Rank information with tie handling |
207
- | `counts()` | Frequency of each value |
208
- | `binned()` | Histogram-style bins |
209
-
210
- #### Transforms
211
-
212
- | Method | Description |
213
- |--------|-------------|
214
- | `logs()` | Natural logarithm of each value |
215
- | `roots()` | Square root of each value |
216
- | `inverse()` | Reciprocal (1/x) of each value |
217
-
218
- #### Smoothing
219
-
220
- | Method | Description |
221
- |--------|-------------|
222
- | `hanning()` | Hanning filter (running averages) |
223
- | `smooth()` | Tukey's 3RSSH smoothing |
224
- | `rough()` | Residuals (original - smooth) |
179
+ ## Graph Algorithms
225
180
 
226
- #### Full Description
181
+ Centrality, community detection, shortest paths, flow, clustering, and a GDS-style catalog.
227
182
 
228
183
  ```typescript
229
- series.describe();
230
- // Returns complete analysis including all of the above
184
+ import {
185
+ // Centrality
186
+ degreeCentrality,
187
+ closenessCentrality,
188
+ betweennessCentrality,
189
+ pageRank,
190
+
191
+ // Community detection
192
+ louvainCommunities,
193
+ labelPropagationCommunities,
194
+
195
+ // Similarity & link prediction
196
+ nodeSimilarity,
197
+ kNearestNeighbors,
198
+ linkPrediction,
199
+
200
+ // Paths & flow
201
+ shortestPath,
202
+ aStarShortestPath,
203
+ yenKShortestPaths,
204
+ allPairsShortestPaths,
205
+ maximumFlow,
206
+ minCostMaxFlow,
207
+
208
+ // Structure
209
+ topologicalSort,
210
+ stronglyConnectedComponents,
211
+ weaklyConnectedComponents,
212
+ minimumSpanningTree,
213
+ articulationPointsAndBridges,
214
+ analyzeGraph,
215
+
216
+ // Clustering
217
+ kMeansClustering,
218
+ kMeansAuto,
219
+ hierarchicalClustering,
220
+ dbscan,
221
+
222
+ // TSP
223
+ travelingSalesmanApprox,
224
+
225
+ // GDS-style catalog
226
+ createGraphCatalog,
227
+ } from 'twokeys';
231
228
  ```
232
229
 
233
- ### Points
234
-
235
- The `Points` class handles n-dimensional point data.
230
+ ### Clustering
236
231
 
237
232
  ```typescript
238
- import { Points } from 'twokeys';
239
-
240
- // 100 random 2D points
241
- const points = new Points({ count: 100, dimensionality: 2 });
242
-
243
- // Or provide your own data
244
- const myPoints = new Points({ data: [[1, 2], [3, 4], [5, 6]] });
233
+ import { kMeansClustering, hierarchicalClustering, dbscan } from 'twokeys';
234
+
235
+ const points = [
236
+ [1, 1], [1.5, 2], [3, 4],
237
+ [5, 7], [3.5, 5], [4.5, 5],
238
+ [3.5, 4.5],
239
+ ];
240
+
241
+ // k-Means
242
+ const km = kMeansClustering(points, 2);
243
+
244
+ // Hierarchical (single, complete, average, or ward linkage)
245
+ const hc = hierarchicalClustering(points, 2, { linkage: 'ward' });
246
+ hc.clusters; // Point indices per cluster
247
+ hc.dendrogram; // Merge history
248
+ hc.silhouette; // Cluster quality score
249
+
250
+ // DBSCAN — density-based, finds natural shapes, handles noise
251
+ const db = dbscan(points, 1.5, 2);
252
+ db.clusters; // Point indices per cluster
253
+ db.noise; // Indices of noise points
254
+ db.clusterCount; // Number of clusters found
245
255
  ```
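To show what "density-based" means in practice, here is a compact DBSCAN sketch run on the same seven points. It is an illustration, not the package's code: `dbscanSketch` is a hypothetical name, and reading the README call `dbscan(points, 1.5, 2)` as a radius of 1.5 and a minimum of 2 points (counting the point itself) is an assumption.

```typescript
type DbscanSketchResult = { clusters: number[][]; noise: number[] };

function euclid(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = (a[i] ?? 0) - (b[i] ?? 0);
    sum += d * d;
  }
  return Math.sqrt(sum);
}

function dbscanSketch(points: number[][], eps: number, minPts: number): DbscanSketchResult {
  const UNVISITED = -2;
  const NOISE = -1;
  const labels: number[] = points.map(() => UNVISITED);

  // All indices within eps of point i, including i itself.
  const neighborsOf = (i: number): number[] => {
    const out: number[] = [];
    for (let j = 0; j < points.length; j++) {
      if (euclid(points[i]!, points[j]!) <= eps) out.push(j);
    }
    return out;
  };

  let clusterId = 0;
  for (let i = 0; i < points.length; i++) {
    if (labels[i] !== UNVISITED) continue;
    const seeds = neighborsOf(i);
    if (seeds.length < minPts) {
      labels[i] = NOISE; // may later be claimed as a border point
      continue;
    }
    labels[i] = clusterId;
    const queue = seeds.slice();
    while (queue.length > 0) {
      const q = queue.shift()!;
      if (labels[q] === NOISE) labels[q] = clusterId; // border point, not expanded
      if (labels[q] !== UNVISITED) continue;
      labels[q] = clusterId;
      const qNeighbors = neighborsOf(q);
      if (qNeighbors.length >= minPts) {
        for (const n of qNeighbors) queue.push(n); // q is a core point: keep growing
      }
    }
    clusterId++;
  }

  const clusters: number[][] = [];
  for (let c = 0; c < clusterId; c++) clusters.push([]);
  const noise: number[] = [];
  labels.forEach((label, idx) => {
    if (label === NOISE) noise.push(idx);
    else clusters[label]!.push(idx);
  });
  return { clusters, noise };
}

const dbscanResult = dbscanSketch(
  [[1, 1], [1.5, 2], [3, 4], [5, 7], [3.5, 5], [4.5, 5], [3.5, 4.5]],
  1.5,
  2,
);
console.log(dbscanResult);
```

Under these assumptions the sketch groups indices 0 and 1 together, groups 2, 4, 5, and 6 together, and leaves index 3 (`[5, 7]`) as noise because no other point lies within 1.5 of it. The neighbor search here is quadratic, which is fine for a handful of points but would need a spatial index for large inputs.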
246
256
 
247
- ### Twokeys
257
+ ### GDS-Style Catalog
248
258
 
249
- The main class provides factory methods and utilities.
259
+ In-memory graph projections with procedure wrappers and pipelines, inspired by Neo4j GDS.
250
260
 
251
261
  ```typescript
252
- import Twokeys from 'twokeys';
262
+ import { createGraphCatalog } from 'twokeys';
253
263
 
254
- // Generate random data
255
- const randomData = Twokeys.randomSeries(100, 50); // 100 values, max 50
256
- const randomPoints = Twokeys.randomPoints(50, 3); // 50 3D points
264
+ const gds = createGraphCatalog<string>();
265
+ gds.project('social', nodes, edges, { directed: true });
257
266
 
258
- // Access classes
259
- const series = new Twokeys.Series({ data: [1, 2, 3] });
260
- const points = new Twokeys.Points(100);
267
+ const rank = gds.pageRank('social');
268
+ const pipeline = gds.runPipeline('social', [
269
+ { id: 'rank', kind: 'page-rank' },
270
+ { id: 'sim', kind: 'similarity', options: { metric: 'jaccard' } },
271
+ { id: 'links', kind: 'link-prediction', options: { limit: 10 } },
272
+ ]);
261
273
  ```
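The `page-rank` step in a pipeline like the one above corresponds to a standard power iteration. The sketch below is a generic version of that algorithm, not the catalog's implementation: `pageRankSketch`, the damping factor of 0.85, the fixed 50 iterations, and the uniform redistribution of rank from nodes without outgoing edges are all assumptions.

```typescript
type DirectedEdge = { from: string; to: string };

// Power-iteration PageRank. Assumes every edge endpoint appears in `nodes`.
function pageRankSketch(
  nodes: string[],
  edges: DirectedEdge[],
  damping = 0.85,
  iterations = 50,
): Map<string, number> {
  const n = nodes.length;
  const outLinks = new Map<string, string[]>();
  nodes.forEach((id) => outLinks.set(id, []));
  edges.forEach((e) => outLinks.get(e.from)?.push(e.to));

  let rank = new Map<string, number>();
  nodes.forEach((id) => rank.set(id, 1 / n));

  for (let step = 0; step < iterations; step++) {
    // Rank held by dangling nodes is spread evenly over all nodes.
    let danglingMass = 0;
    nodes.forEach((id) => {
      if ((outLinks.get(id) ?? []).length === 0) danglingMass += rank.get(id) ?? 0;
    });
    const base = (1 - damping) / n + (damping * danglingMass) / n;

    const next = new Map<string, number>();
    nodes.forEach((id) => next.set(id, base));
    nodes.forEach((id) => {
      const targets = outLinks.get(id) ?? [];
      if (targets.length === 0) return;
      const share = (damping * (rank.get(id) ?? 0)) / targets.length;
      targets.forEach((t) => next.set(t, (next.get(t) ?? 0) + share));
    });
    rank = next;
  }
  return rank;
}

const prNodes = ['alice', 'bob', 'carol', 'dave', 'eve'];
const prEdges: DirectedEdge[] = [
  { from: 'alice', to: 'bob' },
  { from: 'alice', to: 'carol' },
  { from: 'bob', to: 'carol' },
  { from: 'carol', to: 'dave' },
  { from: 'dave', to: 'eve' },
];
const ranks = pageRankSketch(prNodes, prEdges);
let rankTotal = 0;
ranks.forEach((value) => {
  rankTotal += value;
});
console.log(ranks, rankTotal);
```

Because dangling mass is redistributed rather than dropped, the ranks keep summing to 1 at every iteration, and a node with no incoming edges (here `alice`) ends up with only the baseline share.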
262
274
 
263
- ## Examples
275
+ ## API Reference
264
276
 
265
- ### Box Plot Data
277
+ ### Distance & Similarity (`distance.ts`)
266
278
 
267
- ```typescript
268
- const series = new Series({ data: myData });
269
-
270
- const boxPlot = {
271
- min: series.extremes()[0],
272
- q1: series.hinges()[0].datum,
273
- median: series.median().datum,
274
- q3: series.hinges()[1].datum,
275
- max: series.extremes()[1],
276
- outliers: series.outliers(),
277
- };
278
- ```
279
+ | Function | Description |
280
+ |----------|-------------|
281
+ | `cosineSimilarity(a, b)` | Cosine similarity between dense vectors, in the range [-1, 1] |
282
+ | `euclideanDistance(a, b)` | Euclidean (L2) distance |
283
+ | `squaredEuclideanDistance(a, b)` | Squared Euclidean distance (avoids sqrt) |
284
+ | `manhattanDistance(a, b)` | Manhattan (L1) distance |
285
+ | `mahalanobisDistance(point, means, variances, epsilon?)` | Mahalanobis distance |
286
+ | `normalizeL2(vector)` | L2-normalize a vector to unit length |
287
+ | `cosineSimilaritySparse(a, b)` | Cosine similarity for sparse vectors (`Map<string, number>`) |
288
+ | `jaccardSimilarity(a, b)` | Jaccard index for sets |
289
+ | `overlapCoefficient(a, b)` | Overlap coefficient for sets |
279
290
 
280
- ### Outlier Detection
291
+ ### Graph EDA (`graph-eda.ts`)
281
292
 
282
- ```typescript
283
- const series = new Series({ data: measurements });
293
+ | Function | Description |
294
+ |----------|-------------|
295
+ | `graphEda(nodes, edges, options?)` | Full Tukey-style EDA of graph structure |
296
+ | `clusteringCoefficient(nodes, edges, options?)` | Per-node clustering coefficients |
297
+ | `graphOutliers(nodes, edges, options?)` | Structurally unusual nodes |
284
298
 
285
- // Inner fences: 1.5 × IQR from hinges
286
- const suspicious = series.outliers();
299
+ ### Series
287
300
 
288
- // Outer fences: 3 × IQR from hinges
289
- const extreme = series.outside();
290
- ```
301
+ | Category | Methods |
302
+ |----------|---------|
303
+ | **Statistics** | `mean()`, `median()`, `mode()`, `trimean()`, `variance()`, `stddev()`, `skewness()`, `kurtosis()` |
304
+ | **Dispersion** | `extremes()`, `hinges()`, `iqr()`, `fences()`, `outer()` |
305
+ | **Outliers** | `outliers()`, `inside()`, `outside()`, `adjacent()` |
306
+ | **Time Series** | `ema(alpha)`, `zscore()`, `hanning()`, `smooth()`, `rough()` |
307
+ | **Visualization** | `stemLeaf()`, `letterValues()`, `midSummaries()` |
308
+ | **Transforms** | `logs()`, `roots()`, `inverse()` |
309
+ | **Sorting** | `sorted()`, `ranked()`, `counts()`, `binned()` |
291
310
 
292
- ### Letter Values Display
311
+ ### Points
293
312
 
294
- ```typescript
295
- const series = new Series({ data: myData });
296
-
297
- // Get extended quartiles
298
- const lv = series.letterValues();
299
- // [
300
- // { letter: 'M', depth: 10.5, lower: 52.5, upper: 52.5, mid: 52.5, spread: 0 },
301
- // { letter: 'F', depth: 5, lower: 31, upper: 73, mid: 52, spread: 42 },
302
- // { letter: 'E', depth: 3, lower: 30, upper: 82, mid: 56, spread: 52 },
303
- // ...
304
- // ]
305
- ```
313
+ | Method | Description |
314
+ |--------|-------------|
315
+ | `centroid()` | Mean point across all dimensions |
316
+ | `variances()` | Per-dimension variance |
317
+ | `standardDeviations()` | Per-dimension standard deviation |
318
+ | `covarianceMatrix()` | Full covariance matrix |
319
+ | `correlationMatrix()` | Pearson correlation matrix |
320
+ | `mahalanobis(point)` | Mahalanobis distance of a single point from centroid |
321
+ | `mahalanobisAll()` | Mahalanobis distance for each stored point |
322
+ | `outliersByMahalanobis(threshold?)` | Points with Mahalanobis distance > threshold |
323
+ | `normalizeL2()` | L2-normalize all points (returns new Points) |
324
+ | `normalizeZscore()` | Z-score normalize per dimension (returns new Points) |
325
+ | `describe()` | Full multivariate EDA summary |
326
+
327
+ ### Graph Algorithms
328
+
329
+ | Category | Functions |
330
+ |----------|-----------|
331
+ | **Centrality** | `degreeCentrality`, `closenessCentrality`, `betweennessCentrality`, `pageRank` |
332
+ | **Community** | `louvainCommunities`, `labelPropagationCommunities` |
333
+ | **Similarity** | `nodeSimilarity`, `kNearestNeighbors`, `linkPrediction` |
334
+ | **Paths** | `shortestPath`, `aStarShortestPath`, `yenKShortestPaths`, `allPairsShortestPaths` |
335
+ | **Flow** | `maximumFlow`, `minCostMaxFlow` |
336
+ | **Structure** | `topologicalSort`, `stronglyConnectedComponents`, `weaklyConnectedComponents`, `minimumSpanningTree`, `articulationPointsAndBridges`, `analyzeGraph` |
337
+ | **Clustering** | `kMeansClustering`, `kMeansAuto`, `hierarchicalClustering`, `dbscan` |
338
+ | **TSP** | `travelingSalesmanApprox` |
339
+ | **Catalog** | `createGraphCatalog` — GDS-style projections and pipelines |
306
340
 
307
- ### Stem-and-Leaf Display
341
+ ## Packages
308
342
 
309
- ```typescript
310
- const series = new Series({ data: myData });
311
-
312
- const { display } = series.stemLeaf();
313
- // [
314
- // " 0 | 5",
315
- // " 3 | 0 1",
316
- // " 4 | 7 8",
317
- // " 5 | 7 9",
318
- // " 6 | 3",
319
- // " 7 | 3",
320
- // " 9 | 2"
321
- // ]
322
- ```
343
+ | Package | Description |
344
+ |---------|-------------|
345
+ | `twokeys` | Core TypeScript library |
346
+ | `@buley/twokeys-wasm` | WebAssembly implementation with TypeScript fallback |
347
+ | `@buley/twokeys-types` | Shared Zod schemas for runtime validation |
323
348
 
324
- ### Data Transformation
349
+ ## Benchmarks
325
350
 
326
- ```typescript
327
- const series = new Series({ data: skewedData });
351
+ Performance on different dataset sizes (operations per second):
328
352
 
329
- // Try different transforms to normalize
330
- const logTransformed = series.logs();
331
- const sqrtTransformed = series.roots();
332
- ```
353
+ | Method | 100 elements | 1,000 elements | 10,000 elements |
354
+ |--------|-------------:|---------------:|----------------:|
355
+ | `sorted()` | 224,599 | 14,121 | 874 |
356
+ | `median()` | 199,397 | 14,127 | 876 |
357
+ | `mean()` | 550,610 | 413,551 | 68,399 |
358
+ | `mode()` | 87,665 | 6,738 | 431 |
359
+ | `fences()` | 238,486 | 13,270 | 864 |
360
+ | `outliers()` | 210,058 | 12,584 | 854 |
361
+ | `smooth()` | 61,268 | 1,599 | 31 |
362
+ | `describe()` | 15,642 | 952 | 29 |
333
363
 
334
364
  ## Development
335
365
 
336
366
  ```bash
337
- # Install dependencies
338
367
  bun install
339
-
340
- # Run tests
341
368
  bun test
342
-
343
- # Run tests with coverage
344
369
  bun test --coverage
345
-
346
- # Build all packages
347
370
  bun run build
348
-
349
- # Run benchmark
350
- bun run bench.ts
351
371
  ```
352
372
 
373
+ ## About John Tukey
374
+
375
+ John Wilder Tukey (1915-2000) revolutionized how we look at data. He invented the box plot, coined the terms "bit" and "software," and championed the idea that looking at data is just as important as modeling it. His book *Exploratory Data Analysis* (1977) changed statistics forever.
376
+
377
+ This library is named after him — a founding mind in [data exploration and analysis](http://en.wikipedia.org/wiki/Exploratory_data_analysis) and a personal hero of the author.
378
+
353
379
  ## License
354
380
 
355
381
  MIT