datly 0.0.2 → 0.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.MD CHANGED
@@ -1,2986 +1,2373 @@
1
- # Datly
2
-
3
- ![Version](https://img.shields.io/badge/version-1.0.0-blue.svg)
4
- ![License](https://img.shields.io/badge/license-MIT-green.svg)
5
- ![NPM](https://img.shields.io/badge/npm-datly-red.svg)
6
-
7
- **Javascript toolkit for data science, statistical analysis and machine learning.**
8
- ---
9
-
10
- ### ⚡ Key Features:
11
- - 📈 **Complete Statistics Suite** - From descriptive stats to advanced hypothesis testing
12
- - 🤖 **7 ML Algorithms** - Classification, regression, and ensemble methods
13
- - 📊 **13 Visualization Types** - Interactive D3.js charts with one-line commands
14
- - 🔄 **Auto-Analysis** - Intelligent data exploration with automated insights
15
- - 🎨 **Zero Config** - Works out of the box, customizable when needed
16
- - 🌐 **Universal** - Same API for browser and Node.js
1
+ # datly
2
+ A comprehensive JavaScript library for data analysis, statistics, machine learning, and visualization.
3
+
17
4
  ---
18
5
 
19
- **Datly** is a comprehensive JavaScript library for statistical analysis, machine learning, and data visualization. Built for both browser and Node.js environments, it provides a complete toolkit for data scientists, analysts, and developers.
6
+ ## Table of Contents
7
+
8
+ 1. [Introduction](#introduction)
9
+ 2. [Installation](#installation)
10
+ 3. [Core Concepts](#core-concepts)
11
+ 4. [Data Preparation](#data-preparation)
12
+ 5. [Descriptive Statistics](#descriptive-statistics)
13
+ 6. [Exploratory Data Analysis](#exploratory-data-analysis)
14
+ 7. [Probability Distributions](#probability-distributions)
15
+ 8. [Hypothesis Testing](#hypothesis-testing)
16
+ 9. [Correlation Analysis](#correlation-analysis)
17
+ 10. [Regression Models](#regression-models)
18
+ 11. [Classification Models](#classification-models)
19
+ 12. [Clustering](#clustering)
20
+ 13. [Ensemble Methods](#ensemble-methods)
21
+ 14. [Visualization](#visualization)
20
22
 
21
23
  ---
22
24
 
23
- ## 📚 Table of Contents
24
-
25
- - [Installation](#installation)
26
- - [Quick Start](#quick-start)
27
- - [Core Modules](#core-modules)
28
- - [1. Data Loading](#1-data-loading)
29
- - [2. Data Validation](#2-data-validation)
30
- - [3. Utility Functions](#3-utility-functions)
31
- - [4. Central Tendency](#4-central-tendency)
32
- - [5. Dispersion Measures](#5-dispersion-measures)
33
- - [6. Position Measures](#6-position-measures)
34
- - [7. Shape Analysis](#7-shape-analysis)
35
- - [8. Hypothesis Testing](#8-hypothesis-testing)
36
- - [9. Confidence Intervals](#9-confidence-intervals)
37
- - [10. Normality Tests](#10-normality-tests)
38
- - [11. Correlation Analysis](#11-correlation-analysis)
39
- - [12. Regression Analysis](#12-regression-analysis)
40
- - [13. Report Generation](#13-report-generation)
41
- - [14. Pattern Detection](#14-pattern-detection)
42
- - [15. Result Interpretation](#15-result-interpretation)
43
- - [16. Auto-Analysis](#16-auto-analysis)
44
- - [17. Machine Learning](#17-machine-learning)
45
- - [18. Data Visualization](#18-data-visualization)
46
- - [Complete Examples](#complete-examples)
47
- - [API Reference](#api-reference)
25
+ ## Introduction
26
+
27
+ datly is a comprehensive JavaScript library that brings powerful data analysis, statistical testing, machine learning, and visualization capabilities to the browser and Node.js environments.
28
+
29
+ ### Key Features
30
+
31
+ - **Descriptive Statistics**: Mean, median, variance, standard deviation, skewness, kurtosis
32
+ - **Statistical Tests**: t-tests, ANOVA, chi-square, normality tests
33
+ - **Machine Learning**: Linear/logistic regression, KNN, decision trees, random forests, Naive Bayes
34
+ - **Clustering**: K-means clustering
35
+ - **Dimensionality Reduction**: PCA (Principal Component Analysis)
36
+ - **Data Visualization**: Histograms, scatter plots, box plots, heatmaps, and more
37
+ - **Time Series**: Moving averages, exponential smoothing, autocorrelation
48
38
 
49
39
  ---
50
40
 
51
- ## 🚀 Installation
41
+ ## Installation
52
42
 
53
43
  ### Browser (CDN)
54
44
 
55
45
  ```html
56
- <!-- Include Datly -->
57
46
  <script src="https://unpkg.com/datly"></script>
58
-
59
47
  <script>
60
- const datly = new Datly();
48
+ const result = datly.mean([1, 2, 3, 4, 5]);
49
+ console.log(result);
61
50
  </script>
62
51
  ```
63
52
 
64
- ### Node.js (NPM)
65
-
66
- ```bash
67
- # Core library (statistics and machine learning)
68
- npm install datly
69
- ```
53
+ ### Module Import
70
54
 
71
55
  ```javascript
72
- const Datly = require('datly');
73
- const datly = new Datly();
56
+ import * as datly from 'datly';
74
57
  ```
75
58
 
76
59
  ---
77
60
 
78
- ## Quick Start
61
+ ## Core Concepts
79
62
 
80
- ```javascript
81
- // Initialize the library
82
- const datly = new Datly();
83
-
84
- // Load data from CSV
85
- const data = await datly.loadCSV('data.csv');
63
+ ### Output Format
86
64
 
87
- // Calculate mean
88
- const ages = [25, 30, 35, 40, 45];
89
- const meanAge = datly.mean(ages);
90
- console.log('Mean Age:', meanAge); // 35
65
+ All analysis functions return results in a structured YAML-like text format that can be parsed or displayed:
91
66
 
92
- // Perform t-test
93
- const group1 = [23, 25, 27, 29, 31];
94
- const group2 = [30, 32, 34, 36, 38];
95
- const tTest = datly.tTest(group1, group2);
96
- console.log('T-Test Result:', tTest);
97
-
98
- // Create visualization
99
- datly.plotHistogram(ages, {
100
- title: 'Age Distribution',
101
- xlabel: 'Age',
102
- ylabel: 'Frequency'
103
- });
67
+ ```yaml
68
+ type: statistic
69
+ name: mean
70
+ value: 3
71
+ n: 5
104
72
  ```
105
73
 
106
- ---
107
-
108
- ## 📦 Core Modules
74
+ This format makes it easy to:
75
+ - Display results in a readable format
76
+ - Parse results programmatically
77
+ - Store analysis outputs as text
78
+ - Share results across different systems
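+
+ A minimal parsing sketch (assuming the flat `key: value` layout shown above; nested reports would need a real YAML parser):
+
+ ```javascript
+ const text = datly.mean([1, 2, 3, 4, 5]);
+ const parsed = {};
+ for (const line of String(text).split('\n')) {
+ const i = line.indexOf(':');
+ if (i > 0) parsed[line.slice(0, i).trim()] = line.slice(i + 1).trim();
+ }
+ console.log(parsed.name, parsed.value); // expected: "mean" and "3"
+ ```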
109
79
 
110
- ### 1. Data Loading
80
+ ---
111
81
 
112
- Load data from various sources including CSV, JSON, and file systems.
82
+ ## Data Preparation
113
83
 
114
- #### Methods
84
+ ### `dataframe_from_json(data)`
115
85
 
116
- ##### `loadCSV(filePath, options)`
117
- Load data from a CSV file.
86
+ Creates a dataframe summary from JSON data.
118
87
 
88
+ **Parameters:**
89
+ - `data`: Array of objects or single object
90
+
91
+ **Returns:**
92
+ ```yaml
93
+ type: dataframe
94
+ columns:
95
+ - name
96
+ - age
97
+ - salary
98
+ n_rows: 100
99
+ n_cols: 3
100
+ dtypes:
101
+ - string
102
+ - number
103
+ - number
104
+ preview:
105
+ - name: alice
106
+ age: 30
107
+ salary: 50000
108
+ - name: bob
109
+ age: 25
110
+ salary: 45000
111
+ ```
112
+
113
+ **Example:**
119
114
  ```javascript
120
- const data = await datly.loadCSV('sales.csv', {
121
- delimiter: ',',
122
- header: true,
123
- skipEmptyLines: true,
124
- encoding: 'utf8'
125
- });
115
+ const data = [
116
+ { name: 'Alice', age: 30, salary: 50000 },
117
+ { name: 'Bob', age: 25, salary: 45000 },
118
+ { name: 'Charlie', age: 35, salary: 60000 }
119
+ ];
126
120
 
127
- console.log(data);
128
- // {
129
- // headers: ['product', 'sales', 'revenue'],
130
- // data: [
131
- // { product: 'A', sales: 100, revenue: 5000 },
132
- // { product: 'B', sales: 150, revenue: 7500 }
133
- // ],
134
- // length: 2,
135
- // columns: 3
136
- // }
121
+ const df = datly.dataframe_from_json(data);
122
+ console.log(df);
137
123
  ```
138
124
 
139
- **Parameters:**
140
- - `filePath` (string): Path to CSV file
141
- - `options` (object): Configuration options
142
- - `delimiter` (string): Column delimiter (default: ',')
143
- - `header` (boolean): First row contains headers (default: true)
144
- - `skipEmptyLines` (boolean): Skip empty rows (default: true)
145
- - `encoding` (string): File encoding (default: 'utf8')
125
+ ---
146
126
 
147
- **Returns:** Dataset object with headers, data array, length, and column count
127
+ ## Descriptive Statistics
148
128
 
149
- ---
129
+ ### `mean(array)`
150
130
 
151
- ##### `loadJSON(source, options)`
152
- Load data from JSON file, string, or object.
131
+ Calculates the arithmetic mean of an array of numbers.
132
+
133
+ **Returns:**
134
+ ```yaml
135
+ type: statistic
136
+ name: mean
137
+ n: 5
138
+ value: 3
139
+ ```
153
140
 
141
+ **Example:**
154
142
  ```javascript
155
- // From file
156
- const data1 = await datly.loadJSON('data.json');
143
+ datly.mean([1, 2, 3, 4, 5]); // 3
144
+ ```
145
+
146
+ ### `median(array)`
157
147
 
158
- // From JSON string
159
- const jsonString = '{"users": [{"name": "John", "age": 30}]}';
160
- const data2 = await datly.loadJSON(jsonString);
148
+ Calculates the median value.
161
149
 
162
- // From object
163
- const obj = {
164
- headers: ['name', 'age'],
165
- data: [
166
- { name: 'Alice', age: 25 },
167
- { name: 'Bob', age: 30 }
168
- ]
169
- };
170
- const data3 = await datly.loadJSON(obj);
150
+ **Returns:**
151
+ ```yaml
152
+ type: statistic
153
+ name: median
154
+ n: 5
155
+ value: 3
171
156
  ```
172
157
 
173
- **Parameters:**
174
- - `source` (string|object): JSON file path, string, or object
175
- - `options` (object): Configuration options
176
- - `validateTypes` (boolean): Auto-infer data types (default: true)
177
- - `autoInferHeaders` (boolean): Extract headers from data (default: true)
158
+ **Example:**
159
+ ```javascript
160
+ datly.median([1, 2, 3, 4, 5]); // 3
161
+ datly.median([1, 2, 3, 4]); // 2.5
162
+ ```
178
163
 
179
- **Returns:** Structured dataset object
164
+ ### `variance(array, sample = true)`
180
165
 
181
- ---
166
+ Calculates the variance.
167
+
168
+ **Parameters:**
169
+ - `array`: Array of numbers
170
+ - `sample`: If true, uses sample variance (n-1); if false, uses population variance (n)
182
171
 
183
- ##### `parseCSV(text, options)`
184
- Parse CSV text into structured data.
172
+ **Returns:**
173
+ ```yaml
174
+ type: statistic
175
+ name: variance
176
+ sample: true
177
+ n: 5
178
+ value: 2.5
179
+ ```
185
180
 
181
+ **Example:**
186
182
  ```javascript
187
- const csvText = `name,age,city
188
- John,30,NYC
189
- Jane,25,LA`;
183
+ datly.variance([1, 2, 3, 4, 5]); // Sample variance
184
+ datly.variance([1, 2, 3, 4, 5], false); // Population variance
185
+ ```
190
186
 
191
- const parsed = datly.parseCSV(csvText, {
192
- delimiter: ',',
193
- header: true
194
- });
187
+ ### `stddeviation(array, sample = true)`
188
+
189
+ Calculates the standard deviation.
195
190
 
196
- console.log(parsed.data);
197
- // [
198
- // { name: 'John', age: 30, city: 'NYC' },
199
- // { name: 'Jane', age: 25, city: 'LA' }
200
- // ]
191
+ **Returns:**
192
+ ```yaml
193
+ type: statistic
194
+ name: std_deviation
195
+ sample: true
196
+ n: 5
197
+ value: 1.5811388300841898
201
198
  ```
202
199
 
203
- ---
200
+ **Example:**
201
+ ```javascript
202
+ datly.stddeviation([1, 2, 3, 4, 5]);
203
+ ```
204
204
 
205
- ##### `cleanData(dataset)`
206
- Remove rows with all null values.
205
+ ### `minv(array)`
207
206
 
208
- ```javascript
209
- const dataset = {
210
- headers: ['a', 'b', 'c'],
211
- data: [
212
- { a: 1, b: 2, c: 3 },
213
- { a: null, b: null, c: null }, // Will be removed
214
- { a: 4, b: 5, c: 6 }
215
- ],
216
- length: 3,
217
- columns: 3
218
- };
207
+ Returns the minimum value.
219
208
 
220
- const cleaned = datly.cleanData(dataset);
221
- console.log(cleaned.length); // 2
209
+ **Returns:**
210
+ ```yaml
211
+ type: statistic
212
+ name: min
213
+ value: 1
222
214
  ```
223
215
 
224
- ---
225
-
226
- ##### `getColumn(dataset, columnName)`
227
- Extract a single column as an array.
216
+ ### `maxv(array)`
228
217
 
229
- ```javascript
230
- const data = {
231
- headers: ['name', 'age', 'salary'],
232
- data: [
233
- { name: 'Alice', age: 25, salary: 50000 },
234
- { name: 'Bob', age: 30, salary: 60000 },
235
- { name: 'Charlie', age: null, salary: 55000 }
236
- ]
237
- };
218
+ Returns the maximum value.
238
219
 
239
- const ages = datly.getColumn(data, 'age');
240
- console.log(ages); // [25, 30] (null values filtered)
220
+ **Returns:**
221
+ ```yaml
222
+ type: statistic
223
+ name: max
224
+ value: 5
241
225
  ```
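+
+ **Example** (a small sketch covering both `minv` and `maxv`; each takes a plain numeric array like the other statistics helpers):
+ ```javascript
+ datly.minv([1, 2, 3, 4, 5]); // value: 1
+ datly.maxv([1, 2, 3, 4, 5]); // value: 5
+ ```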
242
226
 
243
- ---
227
+ ### `quantile(array, q)`
244
228
 
245
- ##### `getColumns(dataset, columnNames)`
246
- Extract multiple columns as an object.
229
+ Calculates the q-th quantile (0 ≤ q ≤ 1).
247
230
 
231
+ **Returns:**
232
+ ```yaml
233
+ type: statistic
234
+ name: quantile
235
+ q: 0.25
236
+ n: 100
237
+ value: 25.5
238
+ ```
239
+
240
+ **Example:**
248
241
  ```javascript
249
- const columns = datly.getColumns(data, ['age', 'salary']);
250
- console.log(columns);
251
- // {
252
- // age: [25, 30],
253
- // salary: [50000, 60000, 55000]
254
- // }
242
+ datly.quantile([1, 2, 3, 4, 5], 0.25); // First quartile
243
+ datly.quantile([1, 2, 3, 4, 5], 0.5); // Median
244
+ datly.quantile([1, 2, 3, 4, 5], 0.75); // Third quartile
255
245
  ```
256
246
 
257
- ---
247
+ ### `skewness(array)`
248
+
249
+ Calculates the skewness (measure of asymmetry).
258
250
 
259
- ##### `filterRows(dataset, predicate)`
260
- Filter dataset rows based on condition.
251
+ **Returns:**
252
+ ```yaml
253
+ type: statistic
254
+ name: skewness
255
+ value: 0
256
+ ```
261
257
 
258
+ **Example:**
262
259
  ```javascript
263
- const filtered = datly.filterRows(data, row => row.age > 25);
264
- console.log(filtered.data);
265
- // [{ name: 'Bob', age: 30, salary: 60000 }]
260
+ datly.skewness([1, 2, 3, 4, 5]); // ~0 for symmetric data
266
261
  ```
267
262
 
268
- ---
263
+ ### `kurtosis(array)`
269
264
 
270
- ##### `sortBy(dataset, column, order)`
271
- Sort dataset by column.
265
+ Calculates the kurtosis (measure of tailedness).
266
+
267
+ **Returns:**
268
+ ```yaml
269
+ type: statistic
270
+ name: kurtosis
271
+ value: -1.2
272
+ ```
272
273
 
274
+ **Example:**
273
275
  ```javascript
274
- const sorted = datly.sortBy(data, 'salary', 'desc');
275
- console.log(sorted.data);
276
- // [
277
- // { name: 'Bob', age: 30, salary: 60000 },
278
- // { name: 'Charlie', age: null, salary: 55000 },
279
- // { name: 'Alice', age: 25, salary: 50000 }
280
- // ]
276
+ datly.kurtosis([1, 2, 3, 4, 5]);
281
277
  ```
282
278
 
283
279
  ---
284
280
 
285
- ### 2. Data Validation
281
+ ## Exploratory Data Analysis
286
282
 
287
- Validate data integrity and structure.
283
+ ### `df_describe(data)`
288
284
 
289
- #### Methods
285
+ Generates comprehensive descriptive statistics for a dataset.
290
286
 
291
- ##### `validateData(dataset)`
292
- Validate dataset structure and integrity.
287
+ **Returns:**
288
+ ```yaml
289
+ type: describe
290
+ columns:
291
+ age:
292
+ dtype: number
293
+ count: 100
294
+ missing: 0
295
+ mean: 35.5
296
+ std: 10.2
297
+ min: 18
298
+ q1: 28
299
+ median: 35
300
+ q3: 43
301
+ max: 65
302
+ skewness: 0.15
303
+ kurtosis: -0.5
304
+ name:
305
+ dtype: string
306
+ count: 100
307
+ missing: 2
308
+ unique: 95
309
+ top:
310
+ - value: john
311
+ freq: 3
312
+ - value: alice
313
+ freq: 2
314
+ ```
293
315
 
316
+ **Example:**
294
317
  ```javascript
295
- const dataset = {
296
- headers: ['a', 'b'],
297
- data: [
298
- { a: 1, b: 2 },
299
- { a: 3, b: 4, c: 5 } // Extra column 'c'
300
- ]
301
- };
318
+ const data = [
319
+ { age: 25, salary: 50000, dept: 'IT' },
320
+ { age: 30, salary: 60000, dept: 'HR' },
321
+ { age: 35, salary: 70000, dept: 'IT' }
322
+ ];
302
323
 
303
- const validation = datly.validateData(dataset);
304
- console.log(validation);
305
- // {
306
- // valid: true,
307
- // errors: [],
308
- // warnings: ['Row 1: Extra columns: c']
309
- // }
324
+ const description = datly.df_describe(data);
325
+ console.log(description);
310
326
  ```
311
327
 
312
- ---
328
+ ### `df_missing_report(data)`
313
329
 
314
- ##### `validateNumericColumn(column)`
315
- Validate that column contains numeric values.
330
+ Analyzes missing values in the dataset.
316
331
 
332
+ **Returns:**
333
+ ```yaml
334
+ type: missing_report
335
+ rows:
336
+ - column: age
337
+ missing: 5
338
+ missing_rate: 0.05
339
+ - column: salary
340
+ missing: 0
341
+ missing_rate: 0
342
+ - column: name
343
+ missing: 10
344
+ missing_rate: 0.1
345
+ ```
346
+
347
+ **Example:**
317
348
  ```javascript
318
- const column = [1, 2, 'three', 4, NaN, 5];
319
- const result = datly.validateNumericColumn(column);
320
- console.log(result);
321
- // {
322
- // valid: true,
323
- // validCount: 4,
324
- // invalidCount: 2,
325
- // cleanData: [1, 2, 4, 5]
326
- // }
349
+ const report = datly.df_missing_report(data);
327
350
  ```
328
351
 
329
- ---
352
+ ### `df_corr(data, method = 'pearson')`
353
+
354
+ Calculates correlation matrix between numeric columns.
330
355
 
331
- ##### `validateSampleSize(sample, minSize)`
332
- Ensure sample has minimum required size.
356
+ **Parameters:**
357
+ - `data`: Array of objects
358
+ - `method`: 'pearson' or 'spearman'
359
+
360
+ **Returns:**
361
+ ```yaml
362
+ type: correlation_matrix
363
+ method: pearson
364
+ matrix:
365
+ age:
366
+ age: 1
367
+ salary: 0.85
368
+ experience: 0.92
369
+ salary:
370
+ age: 0.85
371
+ salary: 1
372
+ experience: 0.78
373
+ experience:
374
+ age: 0.92
375
+ salary: 0.78
376
+ experience: 1
377
+ ```
333
378
 
379
+ **Example:**
334
380
  ```javascript
335
- const sample = [1, 2, 3];
336
- try {
337
- datly.validateSampleSize(sample, 5);
338
- } catch (error) {
339
- console.log(error.message); // "Sample size (3) must be at least 5"
340
- }
381
+ const corr = datly.df_corr(data, 'pearson');
382
+ const spearman = datly.df_corr(data, 'spearman');
341
383
  ```
342
384
 
343
- ---
385
+ ### `eda_overview(data)`
344
386
 
345
- ##### `validateConfidenceLevel(level)`
346
- Validate confidence level is between 0 and 1.
387
+ Generates a comprehensive EDA report combining describe, missing values, and correlation.
347
388
 
389
+ **Returns:**
390
+ ```yaml
391
+ type: eda
392
+ summary:
393
+ age:
394
+ dtype: number
395
+ count: 100
396
+ mean: 35.5
397
+ std: 10.2
398
+ ...
399
+ missing:
400
+ - column: age
401
+ missing: 5
402
+ missing_rate: 0.05
403
+ correlation:
404
+ age:
405
+ age: 1
406
+ salary: 0.85
407
+ ```
408
+
409
+ **Example:**
348
410
  ```javascript
349
- datly.validateConfidenceLevel(0.95); // true
350
- datly.validateConfidenceLevel(1.5); // throws error
411
+ const overview = datly.eda_overview(data);
351
412
  ```
352
413
 
353
414
  ---
354
415
 
355
- ### 3. Utility Functions
356
-
357
- General-purpose statistical utilities.
358
-
359
- #### Methods
416
+ ## Probability Distributions
360
417
 
361
- ##### `detectOutliers(data, method)`
362
- Detect outliers using various methods.
418
+ ### Normal Distribution
363
419
 
364
- ```javascript
365
- const data = [1, 2, 3, 4, 5, 100]; // 100 is an outlier
420
+ #### `normal_pdf(x, mu = 0, sigma = 1)`
366
421
 
367
- // IQR method (default)
368
- const outliers1 = datly.detectOutliers(data, 'iqr');
369
- console.log(outliers1);
370
- // {
371
- // outliers: [100],
372
- // indices: [5],
373
- // count: 1,
374
- // percentage: 16.67
375
- // }
422
+ Probability density function of normal distribution.
376
423
 
377
- // Z-score method
378
- const outliers2 = datly.detectOutliers(data, 'zscore');
424
+ **Returns:**
425
+ ```yaml
426
+ type: distribution
427
+ name: normal_pdf
428
+ params:
429
+ mu: 0
430
+ sigma: 1
431
+ value: 0.3989422804014327
432
+ ```
379
433
 
380
- // Modified Z-score method
381
- const outliers3 = datly.detectOutliers(data, 'modified_zscore');
434
+ **Example:**
435
+ ```javascript
436
+ datly.normal_pdf(0); // PDF at x=0
437
+ datly.normal_pdf([0, 1, 2], 0, 1); // PDF for multiple values
382
438
  ```
383
439
 
384
- **Methods available:**
385
- - `'iqr'`: Interquartile Range method (default)
386
- - `'zscore'`: Z-score method (|z| > 3)
387
- - `'modified_zscore'`: Modified Z-score method
440
+ #### `normal_cdf(x, mu = 0, sigma = 1)`
388
441
 
389
- ---
442
+ Cumulative distribution function of normal distribution.
390
443
 
391
- ##### `frequencyTable(data)`
392
- Create frequency distribution table.
444
+ **Returns:**
445
+ ```yaml
446
+ type: distribution
447
+ name: normal_cdf
448
+ params:
449
+ mu: 0
450
+ sigma: 1
451
+ value: 0.5
452
+ ```
393
453
 
454
+ **Example:**
394
455
  ```javascript
395
- const colors = ['red', 'blue', 'red', 'green', 'blue', 'red'];
396
- const freq = datly.frequencyTable(colors);
397
- console.log(freq);
398
- // [
399
- // { value: 'red', frequency: 3, relativeFrequency: 0.5, percentage: 50 },
400
- // { value: 'blue', frequency: 2, relativeFrequency: 0.333, percentage: 33.33 },
401
- // { value: 'green', frequency: 1, relativeFrequency: 0.167, percentage: 16.67 }
402
- // ]
456
+ datly.normal_cdf(0); // P(X ≤ 0) = 0.5
457
+ datly.normal_cdf(1.96); // P(X ≤ 1.96) ≈ 0.975
403
458
  ```
404
459
 
405
- ---
460
+ #### `normal_ppf(p, mu = 0, sigma = 1)`
406
461
 
407
- ##### `groupBy(dataset, column, aggregations)`
408
- Group data and calculate aggregations.
462
+ Percent point function (inverse CDF) of normal distribution.
463
+
464
+ **Returns:**
465
+ ```yaml
466
+ type: distribution
467
+ name: normal_ppf
468
+ params:
469
+ mu: 0
470
+ sigma: 1
471
+ value: 1.959963984540054
472
+ ```
409
473
 
474
+ **Example:**
410
475
  ```javascript
411
- const data = {
412
- headers: ['category', 'sales', 'profit'],
413
- data: [
414
- { category: 'A', sales: 100, profit: 20 },
415
- { category: 'B', sales: 150, profit: 30 },
416
- { category: 'A', sales: 200, profit: 40 },
417
- { category: 'B', sales: 180, profit: 35 }
418
- ]
419
- };
476
+ datly.normal_ppf(0.975); // Returns ~1.96
477
+ ```
420
478
 
421
- const grouped = datly.groupBy(data, 'category', {
422
- sales: 'mean',
423
- profit: 'sum'
424
- });
479
+ ### Binomial Distribution
425
480
 
426
- console.log(grouped);
427
- // {
428
- // A: {
429
- // count: 2,
430
- // mean_sales: 150,
431
- // sum_profit: 60,
432
- // data: [...]
433
- // },
434
- // B: {
435
- // count: 2,
436
- // mean_sales: 165,
437
- // sum_profit: 65,
438
- // data: [...]
439
- // }
440
- // }
441
- ```
442
-
443
- **Aggregation functions:**
444
- - `'mean'`: Average value
445
- - `'median'`: Median value
446
- - `'sum'`: Sum of values
447
- - `'min'`: Minimum value
448
- - `'max'`: Maximum value
449
- - `'std'`: Standard deviation
450
- - `'var'`: Variance
451
- - `'count'`: Count of values
481
+ #### `binomial_pmf(k, n, p)`
452
482
 
453
- ---
483
+ Probability mass function of binomial distribution.
484
+
485
+ **Parameters:**
486
+ - `k`: Number of successes (can be array)
487
+ - `n`: Number of trials
488
+ - `p`: Probability of success
454
489
 
455
- ##### `sample(dataset, size, method)`
456
- Extract a sample from dataset.
490
+ **Returns:**
491
+ ```yaml
492
+ type: distribution
493
+ name: binomial_pmf
494
+ params:
495
+ n: 10
496
+ p: 0.5
497
+ value: 0.24609375
498
+ ```
457
499
 
500
+ **Example:**
458
501
  ```javascript
459
- const data = {
460
- headers: ['id', 'value'],
461
- data: Array.from({ length: 100 }, (_, i) => ({ id: i, value: i * 10 })),
462
- length: 100
463
- };
464
-
465
- // Random sampling
466
- const randomSample = datly.sample(data, 10, 'random');
502
+ datly.binomial_pmf(5, 10, 0.5); // P(X = 5)
503
+ datly.binomial_pmf([0, 1, 2, 3], 10, 0.3); // Multiple values
504
+ ```
467
505
 
468
- // Systematic sampling
469
- const systematicSample = datly.sample(data, 10, 'systematic');
506
+ #### `binomial_cdf(k, n, p)`
470
507
 
471
- // First n records
472
- const firstSample = datly.sample(data, 10, 'first');
508
+ Cumulative distribution function of binomial distribution.
473
509
 
474
- // Last n records
475
- const lastSample = datly.sample(data, 10, 'last');
510
+ **Returns:**
511
+ ```yaml
512
+ type: distribution
513
+ name: binomial_cdf
514
+ params:
515
+ n: 10
516
+ p: 0.5
517
+ value: 0.623046875
476
518
  ```
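+
+ **Example** (sketch; same `(k, n, p)` arguments as `binomial_pmf`):
+ ```javascript
+ datly.binomial_cdf(5, 10, 0.5); // P(X ≤ 5) for n=10, p=0.5 → value: 0.623046875
+ ```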
477
519
 
478
- ---
520
+ ### Poisson Distribution
479
521
 
480
- ##### `bootstrap(data, statistic, iterations)`
481
- Bootstrap resampling for estimating statistic distribution.
522
+ #### `poisson_pmf(k, lambda)`
482
523
 
483
- ```javascript
484
- const data = [23, 25, 27, 29, 31, 33, 35];
524
+ Probability mass function of Poisson distribution.
485
525
 
486
- const bootstrap = datly.bootstrap(data, 'mean', 1000);
487
- console.log(bootstrap);
488
- // {
489
- // bootstrapStats: [...], // 1000 bootstrap means
490
- // mean: 29.5,
491
- // standardError: 1.23,
492
- // confidenceInterval: { lower: 27.1, upper: 31.9 }
493
- // }
526
+ **Returns:**
527
+ ```yaml
528
+ type: distribution
529
+ name: poisson_pmf
530
+ params:
531
+ lambda: 3
532
+ value: 0.22404180765538775
494
533
  ```
495
534
 
496
- **Statistics available:**
497
- - `'mean'`: Bootstrap mean
498
- - `'median'`: Bootstrap median
499
- - `'std'`: Bootstrap standard deviation
500
- - `'var'`: Bootstrap variance
501
- - Custom function: Pass your own function
502
-
503
- ---
535
+ **Example:**
536
+ ```javascript
537
+ datly.poisson_pmf(3, 3); // P(X = 3) when λ = 3
538
+ ```
504
539
 
505
- ##### `contingencyTable(column1, column2)`
506
- Create contingency table for categorical data.
540
+ #### `poisson_cdf(k, lambda)`
507
541
 
508
- ```javascript
509
- const gender = ['M', 'F', 'M', 'F', 'M', 'F'];
510
- const preference = ['A', 'B', 'A', 'A', 'B', 'B'];
542
+ Cumulative distribution function of Poisson distribution.
511
543
 
512
- const table = datly.contingencyTable(gender, preference);
513
- console.log(table);
514
- // {
515
- // table: {
516
- // M: { A: 2, B: 1 },
517
- // F: { A: 1, B: 2 }
518
- // },
519
- // totals: {
520
- // row: { M: 3, F: 3 },
521
- // col: { A: 3, B: 3 },
522
- // grand: 6
523
- // },
524
- // rows: ['M', 'F'],
525
- // columns: ['A', 'B']
526
- // }
544
+ **Returns:**
545
+ ```yaml
546
+ type: distribution
547
+ name: poisson_cdf
548
+ params:
549
+ lambda: 3
550
+ value: 0.6472319374260858
527
551
  ```
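+
+ **Example** (sketch; mirrors the `poisson_pmf` call):
+ ```javascript
+ datly.poisson_cdf(3, 3); // P(X ≤ 3) when λ = 3 → value ≈ 0.6472
+ ```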
528
552
 
529
553
  ---
530
554
 
531
- ### 4. Central Tendency
555
+ ## Hypothesis Testing
532
556
 
533
- Measures of central tendency (mean, median, mode).
557
+ ### `t_test_one_sample(array, hypothesized_mean)`
534
558
 
535
- #### Methods
559
+ One-sample t-test.
536
560
 
537
- ##### `mean(data)`
538
- Calculate arithmetic mean.
561
+ **Returns:**
562
+ ```yaml
563
+ type: hypothesis_test
564
+ name: one_sample_t_test
565
+ statistic: 2.345
566
+ df: 99
567
+ p_value: 0.021
568
+ mean: 105
569
+ hypothesized_mean: 100
570
+ ```
539
571
 
572
+ **Example:**
540
573
  ```javascript
541
- const data = [10, 20, 30, 40, 50];
542
- const avg = datly.mean(data);
543
- console.log(avg); // 30
574
+ const data = [102, 98, 105, 110, 95, 100, 108];
575
+ datly.t_test_one_sample(data, 100);
544
576
  ```
545
577
 
546
- ---
578
+ ### `t_test_paired(array1, array2)`
547
579
 
548
- ##### `median(data)`
549
- Calculate median (middle value).
580
+ Paired samples t-test.
550
581
 
551
- ```javascript
552
- const data1 = [1, 3, 5, 7, 9];
553
- console.log(datly.median(data1)); // 5
582
+ **Returns:**
583
+ ```yaml
584
+ type: hypothesis_test
585
+ name: paired_t_test
586
+ statistic: 3.456
587
+ df: 29
588
+ p_value: 0.0018
589
+ mean_difference: 2.5
590
+ ```
554
591
 
555
- const data2 = [1, 3, 5, 7];
556
- console.log(datly.median(data2)); // 4 (average of 3 and 5)
592
+ **Example:**
593
+ ```javascript
594
+ const before = [120, 115, 130, 125, 140];
595
+ const after = [115, 110, 125, 120, 135];
596
+ datly.t_test_paired(before, after);
557
597
  ```
558
598
 
559
- ---
599
+ ### `t_test_independent(array1, array2, equal_var = true)`
560
600
 
561
- ##### `mode(data)`
562
- Find mode (most frequent value).
601
+ Independent samples t-test.
563
602
 
564
- ```javascript
565
- const data = [1, 2, 2, 3, 3, 3, 4, 4];
566
- const modeResult = datly.mode(data);
567
- console.log(modeResult);
568
- // {
569
- // values: [3],
570
- // frequency: 3,
571
- // isMultimodal: false,
572
- // isUniform: false
573
- // }
603
+ **Parameters:**
604
+ - `equal_var`: If true, assumes equal variances (pooled t-test); if false, uses Welch's t-test
574
605
 
575
- // Multimodal example
576
- const data2 = [1, 1, 2, 2, 3];
577
- const modeResult2 = datly.mode(data2);
578
- console.log(modeResult2);
579
- // {
580
- // values: [1, 2],
581
- // frequency: 2,
582
- // isMultimodal: true,
583
- // isUniform: false
584
- // }
606
+ **Returns:**
607
+ ```yaml
608
+ type: hypothesis_test
609
+ name: independent_t_test
610
+ statistic: 2.105
611
+ df: 48
612
+ p_value: 0.041
613
+ means:
614
+ group_a: 105.5
615
+ group_b: 98.3
585
616
  ```
586
617
 
587
- ---
588
-
589
- ##### `geometricMean(data)`
590
- Calculate geometric mean (for positive values).
591
-
618
+ **Example:**
592
619
  ```javascript
593
- const data = [2, 8, 32]; // Growth rates
594
- const geoMean = datly.geometricMean(data);
595
- console.log(geoMean); // 8 (∛(2×8×32))
620
+ const group1 = [100, 105, 110, 115, 120];
621
+ const group2 = [95, 98, 100, 102, 105];
622
+ datly.t_test_independent(group1, group2);
596
623
  ```
597
624
 
598
- **Use cases:** Growth rates, ratios, percentages
625
+ ### `z_test_one_sample(array, mu = 0, sigma = null, alpha = 0.05)`
599
626
 
600
- ---
627
+ One-sample z-test with confidence interval.
601
628
 
602
- ##### `harmonicMean(data)`
603
- Calculate harmonic mean (for rates).
629
+ **Returns:**
630
+ ```yaml
631
+ type: hypothesis_test
632
+ name: one_sample_z_test
633
+ statistic: 2.345
634
+ p_value: 0.019
635
+ ci_lower: 102.5
636
+ ci_upper: 107.5
637
+ confidence: 0.95
638
+ extra:
639
+ sample_mean: 105
640
+ hypothesized_mean: 100
641
+ se: 2.13
642
+ sigma_used: 10
643
+ n: 22
644
+ effect_size: 0.5
645
+ ```
604
646
 
647
+ **Example:**
605
648
  ```javascript
606
- const speeds = [60, 40, 30]; // km/h on different segments
607
- const harmMean = datly.harmonicMean(speeds);
608
- console.log(harmMean); // 40 (average speed)
649
+ datly.z_test_one_sample([102, 98, 105, 110], 100, 5, 0.05);
609
650
  ```
610
651
 
611
- **Use cases:** Average rates, speeds, ratios
652
+ ### `anova_oneway(groups, alpha = 0.05)`
612
653
 
613
- ---
654
+ One-way ANOVA test.
655
+
656
+ **Parameters:**
657
+ - `groups`: Array of arrays, each representing a group
614
658
 
615
- ##### `trimmedMean(data, percentage)`
616
- Calculate trimmed mean (remove extreme values).
659
+ **Returns:**
660
+ ```yaml
661
+ type: hypothesis_test
662
+ name: anova_oneway
663
+ statistic: 5.678
664
+ df:
665
+ between: 2
666
+ within: 27
667
+ p_value: 0.009
668
+ confidence: 0.95
669
+ extra:
670
+ group_means:
671
+ - 102.5
672
+ - 108.3
673
+ - 115.7
674
+ grand_mean: 108.8
675
+ ssb: 450.5
676
+ ssw: 890.2
677
+ ```
617
678
 
679
+ **Example:**
618
680
  ```javascript
619
- const data = [1, 2, 3, 100, 4, 5, 6]; // 100 is outlier
620
- const trimmed = datly.trimmedMean(data, 10); // Trim 10% from each end
621
- console.log(trimmed); // ~3.75 (without extreme values)
681
+ const group1 = [100, 105, 110];
682
+ const group2 = [108, 112, 115];
683
+ const group3 = [115, 120, 125];
684
+ datly.anova_oneway([group1, group2, group3]);
622
685
  ```
623
686
 
624
- ---
687
+ ### `chi_square_independence(observed, alpha = 0.05)`
625
688
 
626
- ##### `weightedMean(values, weights)`
627
- Calculate weighted average.
689
+ Chi-square test for independence (contingency table).
628
690
 
629
- ```javascript
630
- const grades = [85, 90, 78, 92];
631
- const weights = [0.3, 0.3, 0.2, 0.2]; // Exam weights
632
- const weightedGrade = datly.weightedMean(grades, weights);
633
- console.log(weightedGrade); // 86.5
634
- ```
691
+ **Parameters:**
692
+ - `observed`: 2D array (contingency table)
693
+
694
+ **Returns:**
695
+ ```yaml
696
+ type: hypothesis_test
697
+ name: chi_square_independence
698
+ statistic: 8.456
699
+ df: 2
700
+ p_value: 0.015
701
+ confidence: 0.95
702
+ extra:
703
+ observed:
704
+ - - 10
705
+ - 20
706
+ - 30
707
+ - - 15
708
+ - 25
709
+ - 35
710
+ expected:
711
+ - - 12.5
712
+ - 22.5
713
+ - 32.5
714
+ - - 12.5
715
+ - 22.5
716
+ - 32.5
717
+ dof: 2
718
+ ```
719
+
720
+ **Example:**
721
+ ```javascript
722
+ const table = [
723
+ [10, 20, 30],
724
+ [15, 25, 35]
725
+ ];
726
+ datly.chi_square_independence(table);
635
727
  ```
636
728
 
637
- **Use cases:** Growth rates, ratios, percentages
729
+ ### `chi_square_goodness(observed, expected, alpha = 0.05)`
638
730
 
639
- ---
731
+ Chi-square goodness of fit test.
640
732
 
641
- ##### `harmonicMean(data)`
642
- Calculate harmonic mean (for rates).
733
+ **Returns:**
734
+ ```yaml
735
+ type: hypothesis_test
736
+ name: chi_square_goodness_of_fit
737
+ statistic: 3.456
738
+ df: 3
739
+ p_value: 0.327
740
+ confidence: 0.95
741
+ extra:
742
+ observed:
743
+ - 45
744
+ - 55
745
+ - 48
746
+ - 52
747
+ expected:
748
+ - 50
749
+ - 50
750
+ - 50
751
+ - 50
752
+ dof: 3
753
+ ```
643
754
 
755
+ **Example:**
644
756
  ```javascript
645
- const speeds = [60, 40, 30]; // km/h on different segments
646
- const harmMean = datly.centralTendency.harmonicMean(speeds);
647
- console.log(harmMean); // 40 (average speed)
757
+ const observed = [45, 55, 48, 52];
758
+ const expected = [50, 50, 50, 50];
759
+ datly.chi_square_goodness(observed, expected);
648
760
  ```
649
761
 
650
- **Use cases:** Average rates, speeds, ratios
762
+ ### `shapiro_wilk(array)`
651
763
 
652
- ---
764
+ Shapiro-Wilk test for normality.
653
765
 
654
- ##### `trimmedMean(data, percentage)`
655
- Calculate trimmed mean (remove extreme values).
766
+ **Returns:**
767
+ ```yaml
768
+ type: hypothesis_test
769
+ name: shapiro_wilk
770
+ statistic: 0.987
771
+ n: 50
772
+ note: approximation; w > 0.9 suggests normality
773
+ ```
656
774
 
775
+ **Example:**
657
776
  ```javascript
658
- const data = [1, 2, 3, 100, 4, 5, 6]; // 100 is outlier
659
- const trimmed = datly.centralTendency.trimmedMean(data, 10); // Trim 10% from each end
660
- console.log(trimmed); // ~3.75 (without extreme values)
777
+ datly.shapiro_wilk([1.2, 2.3, 1.8, 2.1, 1.9, 2.0]);
661
778
  ```
662
779
 
663
- ---
780
+ ### `jarque_bera(array)`
664
781
 
665
- ##### `weightedMean(values, weights)`
666
- Calculate weighted average.
782
+ Jarque-Bera test for normality.
667
783
 
668
- ```javascript
669
- const grades = [85, 90, 78, 92];
670
- const weights = [0.3, 0.3, 0.2, 0.2]; // Exam weights
671
- const weightedGrade = datly.centralTendency.weightedMean(grades, weights);
672
- console.log(weightedGrade); // 86.5
784
+ **Returns:**
785
+ ```yaml
786
+ type: hypothesis_test
787
+ name: jarque_bera
788
+ statistic: 2.345
789
+ n: 100
790
+ df: 2
791
+ note: tests normality; low p-value rejects normality
673
792
  ```
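+
+ **Example** (sketch; takes a single numeric array, like `shapiro_wilk`):
+ ```javascript
+ datly.jarque_bera([1.2, 2.3, 1.8, 2.1, 1.9, 2.0]);
+ ```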
674
793
 
675
- ---
676
-
677
- ### 5. Dispersion Measures
678
-
679
- Measures of variability and spread.
794
+ ### `levene_test(groups)`
680
795
 
681
- #### Methods
796
+ Levene's test for homogeneity of variance.
682
797
 
683
- ##### `variance(data, sample)`
684
- Calculate variance.
798
+ **Returns:**
799
+ ```yaml
800
+ type: hypothesis_test
801
+ name: levene_test
802
+ statistic: 1.234
803
+ df_between: 2
804
+ df_within: 27
805
+ note: tests homogeneity of variance
806
+ ```
685
807
 
808
+ **Example:**
686
809
  ```javascript
687
- const data = [2, 4, 6, 8, 10];
810
+ const g1 = [1, 2, 3, 4, 5];
811
+ const g2 = [2, 3, 4, 5, 6];
812
+ const g3 = [3, 4, 5, 6, 7];
813
+ datly.levene_test([g1, g2, g3]);
814
+ ```
688
815
 
689
- // Sample variance (default)
690
- const sampleVar = datly.variance(data, true);
691
- console.log(sampleVar); // 10
816
+ ### `kruskal_wallis(groups)`
692
817
 
693
- // Population variance
694
- const popVar = datly.variance(data, false);
695
- console.log(popVar); // 8
818
+ Kruskal-Wallis H-test (non-parametric alternative to ANOVA).
819
+
820
+ **Returns:**
821
+ ```yaml
822
+ type: hypothesis_test
823
+ name: kruskal_wallis
824
+ statistic: 8.765
825
+ df: 2
826
+ note: non-parametric alternative to anova
696
827
  ```
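+
+ **Example** (sketch; takes an array of groups, as `anova_oneway` does):
+ ```javascript
+ const g1 = [100, 105, 110];
+ const g2 = [108, 112, 115];
+ const g3 = [115, 120, 125];
+ datly.kruskal_wallis([g1, g2, g3]);
+ ```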
697
828
 
698
- ---
829
+ ### `mann_whitney(array1, array2)`
699
830
 
700
- ##### `standardDeviation(data, sample)`
701
- Calculate standard deviation.
831
+ Mann-Whitney U test (non-parametric alternative to t-test).
702
832
 
703
- ```javascript
704
- const data = [2, 4, 6, 8, 10];
705
- const std = datly.standardDeviation(data);
706
- console.log(std); // 3.162 (√10)
833
+ **Returns:**
834
+ ```yaml
835
+ type: hypothesis_test
836
+ name: mann_whitney_u
837
+ statistic: 45
838
+ z_score: -1.234
839
+ p_value: 0.217
840
+ note: non-parametric alternative to t-test
707
841
  ```
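+
+ **Example** (sketch; two independent samples, as in `t_test_independent`):
+ ```javascript
+ const group1 = [100, 105, 110, 115, 120];
+ const group2 = [95, 98, 100, 102, 105];
+ datly.mann_whitney(group1, group2);
+ ```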
708
842
 
709
- ---
843
+ ### `wilcoxon_signed_rank(array1, array2)`
710
844
 
711
- ##### `range(data)`
712
- Calculate range (max - min).
845
+ Wilcoxon signed-rank test (non-parametric paired test).
713
846
 
714
- ```javascript
715
- const data = [5, 10, 15, 20, 25];
716
- const rangeResult = datly.range(data);
717
- console.log(rangeResult);
718
- // {
719
- // range: 20,
720
- // min: 5,
721
- // max: 25
722
- // }
847
+ **Returns:**
848
+ ```yaml
849
+ type: hypothesis_test
850
+ name: wilcoxon_signed_rank
851
+ statistic: 28
852
+ z_score: 1.567
853
+ p_value: 0.117
854
+ n: 20
723
855
  ```
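+
+ **Example** (sketch; paired samples, as in `t_test_paired`):
+ ```javascript
+ const before = [120, 115, 130, 125, 140];
+ const after = [115, 110, 125, 120, 135];
+ datly.wilcoxon_signed_rank(before, after);
+ ```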
724
856
 
725
- ---
857
+ ### Confidence Intervals
726
858
 
727
- ##### `interquartileRange(data)`
728
- Calculate IQR (Q3 - Q1).
859
+ #### `confidence_interval_mean(array, confidence = 0.95)`
729
860
 
730
- ```javascript
731
- const data = [1, 2, 3, 4, 5, 6, 7, 8, 9];
732
- const iqr = datly.interquartileRange(data);
733
- console.log(iqr);
734
- // {
735
- // iqr: 4,
736
- // q1: 3,
737
- // q3: 7
738
- // }
861
+ Confidence interval for the mean.
862
+
863
+ **Returns:**
864
+ ```yaml
865
+ type: confidence_interval
866
+ parameter: mean
867
+ confidence: 0.95
868
+ n: 50
869
+ mean: 102.5
870
+ lower: 98.3
871
+ upper: 106.7
872
+ margin: 4.2
739
873
  ```
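+
+ **Example** (sketch):
+ ```javascript
+ datly.confidence_interval_mean([102, 98, 105, 110, 95, 100, 108], 0.95);
+ ```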
740
874
 
741
- ---
875
+ #### `confidence_interval_proportion(successes, n, confidence = 0.95)`
742
876
 
743
- ##### `coefficientOfVariation(data)`
744
- Calculate coefficient of variation (CV%).
877
+ Confidence interval for a proportion.
745
878
 
746
- ```javascript
747
- const data = [10, 20, 30, 40, 50];
748
- const cv = datly.coefficientOfVariation(data);
749
- console.log(cv);
750
- // {
751
- // cv: 0.471,
752
- // cvPercent: 47.1
753
- // }
879
+ **Returns:**
880
+ ```yaml
881
+ type: confidence_interval
882
+ parameter: proportion
883
+ confidence: 0.95
884
+ n: 100
885
+ proportion: 0.65
886
+ lower: 0.551
887
+ upper: 0.749
888
+ margin: 0.099
754
889
  ```
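+
+ **Example** (sketch; 65 successes out of 100 trials, matching the output above):
+ ```javascript
+ datly.confidence_interval_proportion(65, 100, 0.95);
+ ```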
755
890
 
756
- **Interpretation:**
757
- - CV < 15%: Low variability
758
- - CV 15-30%: Moderate variability
759
- - CV > 30%: High variability
760
-
761
- ---
891
+ #### `confidence_interval_variance(array, confidence = 0.95)`
762
892
 
763
- ##### `meanAbsoluteDeviation(data)`
764
- Calculate MAD (mean absolute deviation).
893
+ Confidence interval for variance.
765
894
 
766
- ```javascript
767
- const data = [2, 4, 6, 8, 10];
768
- const mad = datly.meanAbsoluteDeviation(data);
769
- console.log(mad);
770
- // {
771
- // mad: 2.4,
772
- // mean: 6
773
- // }
895
+ **Returns:**
896
+ ```yaml
897
+ type: confidence_interval
898
+ parameter: variance
899
+ confidence: 0.95
900
+ n: 30
901
+ variance: 25.5
902
+ lower: 18.2
903
+ upper: 38.7
774
904
  ```
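+
+ **Example** (sketch):
+ ```javascript
+ datly.confidence_interval_variance([102, 98, 105, 110, 95, 100, 108], 0.95);
+ ```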
775
905
 
776
- ---
906
+ #### `confidence_interval_difference(array1, array2, confidence = 0.95)`
777
907
 
778
- ##### `standardError(data)`
779
- Calculate standard error of the mean.
908
+ Confidence interval for difference of means.
780
909
 
781
- ```javascript
782
- const data = [10, 12, 14, 16, 18, 20];
783
- const se = datly.standardError(data);
784
- console.log(se); // 1.29 (σ/√n)
910
+ **Returns:**
911
+ ```yaml
912
+ type: confidence_interval
913
+ parameter: difference_of_means
914
+ confidence: 0.95
915
+ difference: 5.5
916
+ lower: 2.3
917
+ upper: 8.7
918
+ margin: 3.2
919
+ means:
920
+ group_a: 105.5
921
+ group_b: 100
785
922
  ```
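+
+ **Example** (sketch; two independent samples):
+ ```javascript
+ const groupA = [100, 105, 110, 115, 120];
+ const groupB = [95, 98, 100, 102, 105];
+ datly.confidence_interval_difference(groupA, groupB, 0.95);
+ ```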
786
923
 
787
924
  ---
788
925
 
789
- ### 6. Position Measures
926
+ ## Correlation Analysis
790
927
 
791
- Quantiles, percentiles, and ranking.
928
+ ### `corr_pearson(array1, array2)`
792
929
 
793
- #### Methods
930
+ Pearson correlation coefficient.
794
931
 
795
- ##### `quantile(data, q)`
796
- Calculate quantile.
932
+ **Returns:**
933
+ ```yaml
934
+ type: statistic
935
+ name: pearson_correlation
936
+ value: 0.856
937
+ ```
797
938
 
939
+ **Example:**
798
940
  ```javascript
799
- const data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
941
+ const x = [1, 2, 3, 4, 5];
942
+ const y = [2, 4, 5, 4, 5];
943
+ datly.corr_pearson(x, y);
944
+ ```
800
945
 
801
- // Median (0.5 quantile)
802
- console.log(datly.quantile(data, 0.5)); // 5.5
946
+ ### `corr_spearman(array1, array2)`
803
947
 
804
- // First quartile
805
- console.log(datly.quantile(data, 0.25)); // 3.25
948
+ Spearman rank correlation coefficient.
806
949
 
807
- // Third quartile
808
- console.log(datly.quantile(data, 0.75)); // 7.75
950
+ **Returns:**
951
+ ```yaml
952
+ type: statistic
953
+ name: spearman_correlation
954
+ value: 0.9
809
955
  ```
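+
+ **Example** (sketch; same inputs as `corr_pearson`):
+ ```javascript
+ const x = [1, 2, 3, 4, 5];
+ const y = [2, 4, 5, 4, 5];
+ datly.corr_spearman(x, y);
+ ```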
810
956
 
811
- ---
957
+ ### `corr_kendall(array1, array2)`
812
958
 
813
- ##### `percentile(data, p)`
814
- Calculate percentile (0-100 scale).
959
+ Kendall's tau correlation coefficient.
815
960
 
816
- ```javascript
817
- const scores = [65, 70, 75, 80, 85, 90, 95];
818
- const p90 = datly.percentile(scores, 90);
819
- console.log(p90); // 93.5
961
+ **Returns:**
962
+ ```yaml
963
+ type: statistic
964
+ name: kendall_tau
965
+ value: 0.8
966
+ concordant: 8
967
+ discordant: 2
968
+ n: 5
820
969
  ```
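+
+ **Example** (sketch):
+ ```javascript
+ const x = [1, 2, 3, 4, 5];
+ const y = [2, 4, 5, 4, 5];
+ datly.corr_kendall(x, y);
+ ```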
821
970
 
822
- ---
971
+ ### `corr_partial(array1, array2, array3)`
823
972
 
824
- ##### `quartiles(data)`
825
- Calculate all quartiles.
973
+ Partial correlation controlling for a third variable.
826
974
 
827
- ```javascript
828
- const data = [1, 2, 3, 4, 5, 6, 7, 8, 9];
829
- const quartiles = datly.quartiles(data);
830
- console.log(quartiles);
831
- // {
832
- // q1: 2.5,
833
- // q2: 5, // Median
834
- // q3: 7.5,
835
- // iqr: 5
836
- // }
975
+ **Returns:**
976
+ ```yaml
977
+ type: statistic
978
+ name: partial_correlation
979
+ value: 0.456
980
+ controlling_for: third_variable
837
981
  ```
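+
+ **Example** (sketch; the third array is the control variable):
+ ```javascript
+ const x = [1, 2, 3, 4, 5];
+ const y = [2, 4, 5, 4, 5];
+ const z = [1, 1, 2, 2, 3];
+ datly.corr_partial(x, y, z);
+ ```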
838
982
 
839
- ---
983
+ ### `corr_matrix_all(data)`
840
984
 
841
- ##### `percentileRank(data, value)`
842
- Calculate percentile rank of a value.
985
+ Comprehensive correlation matrix with Pearson, Spearman, and Kendall.
843
986
 
844
- ```javascript
845
- const scores = [60, 70, 80, 90, 100];
846
- const rank = datly.percentileRank(scores, 80);
847
- console.log(rank); // 50 (80 is at the 50th percentile)
987
+ **Returns:**
988
+ ```yaml
989
+ type: correlation_analysis
990
+ pearson:
991
+ age:
992
+ age: 1
993
+ salary: 0.85
994
+ salary:
995
+ age: 0.85
996
+ salary: 1
997
+ spearman:
998
+ age:
999
+ age: 1
1000
+ salary: 0.82
1001
+ salary:
1002
+ age: 0.82
1003
+ salary: 1
1004
+ kendall:
1005
+ age:
1006
+ age: 1
1007
+ salary: 0.75
1008
+ salary:
1009
+ age: 0.75
1010
+ salary: 1
848
1011
  ```
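+
+ **Example** (sketch; accepts the same array-of-objects data as `df_corr`):
+ ```javascript
+ const data = [
+ { age: 25, salary: 50000 },
+ { age: 30, salary: 60000 },
+ { age: 35, salary: 70000 }
+ ];
+ datly.corr_matrix_all(data);
+ ```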
849
1012
 
850
1013
  ---
851
1014
 
852
- ##### `zScore(data, value)`
853
- Calculate z-score (standardized value).
1015
+ ## Regression Models
1016
+
1017
+ ### Linear Regression
854
1018
 
1019
+ #### `train_linear_regression(X, y)`
1020
+
1021
+ Trains a multiple linear regression model.
1022
+
1023
+ **Parameters:**
1024
+ - `X`: 2D array of features [[x1, x2, ...], ...]
1025
+ - `y`: Array of target values
1026
+
1027
+ **Returns:**
1028
+ ```yaml
1029
+ type: linear_regression
1030
+ weights:
1031
+ - 2.5
1032
+ - 1.8
1033
+ - -0.3
1034
+ mse: 12.34
1035
+ r2: 0.856
1036
+ n: 100
1037
+ p: 2
1038
+ ```
1039
+
1040
+ **Example:**
855
1041
  ```javascript
856
- const data = [10, 20, 30, 40, 50];
857
- const z = datly.zScore(data, 40);
858
- console.log(z); // 0.632 standard deviations above mean
1042
+ const X = [[1, 2], [2, 3], [3, 4], [4, 5]];
1043
+ const y = [3, 5, 7, 9];
1044
+ const model = datly.train_linear_regression(X, y);
859
1045
  ```
860
1046
 
861
- **Interpretation:**
862
- - |z| < 1: Within 1 standard deviation
863
- - |z| < 2: Within 2 standard deviations
864
- - |z| > 3: Potential outlier
1047
+ #### `predict_linear(model, X)`
865
1048
 
866
- ---
1049
+ Makes predictions using a trained linear regression model.
867
1050
 
868
- ##### `boxplotStats(data)`
869
- Calculate box plot statistics.
1051
+ **Parameters:**
1052
+ - `model`: Model text/object from `train_linear_regression`
1053
+ - `X`: 2D array of features
1054
+
1055
+ **Returns:**
1056
+ ```yaml
1057
+ type: prediction
1058
+ name: linear_regression
1059
+ predictions:
1060
+ - 105.3
1061
+ - 110.7
1062
+ - 98.2
1063
+ ```
870
1064
 
1065
+ **Example:**
871
1066
  ```javascript
872
- const data = [1, 2, 3, 4, 5, 100]; // 100 is outlier
873
- const stats = datly.boxplotStats(data);
874
- console.log(stats);
875
- // {
876
- // min: 1,
877
- // q1: 2,
878
- // median: 3.5,
879
- // q3: 5,
880
- // max: 5,
881
- // iqr: 3,
882
- // lowerFence: -2.5,
883
- // upperFence: 9.5,
884
- // outliers: [100],
885
- // outlierCount: 1
886
- // }
1067
+ const predictions = datly.predict_linear(model, [[5, 6], [6, 7]]);
887
1068
  ```
888
1069
 
889
- ---
1070
+ ### Logistic Regression
890
1071
 
891
- ##### `rank(data, method)`
892
- Rank data values.
1072
+ #### `train_logistic_regression(X, y, options = {})`
893
1073
 
894
- ```javascript
895
- const data = [10, 20, 20, 30, 40];
1074
+ Trains a logistic regression model for binary classification.
1075
+
1076
+ **Parameters:**
1077
+ - `X`: 2D array of features
1078
+ - `y`: Array of binary labels (0 or 1)
1079
+ - `options`:
1080
+ - `learning_rate`: Learning rate (default: 0.1)
1081
+ - `iterations`: Number of iterations (default: 1000)
1082
+ - `l2`: L2 regularization parameter (default: 0)
1083
+
1084
+ **Returns:**
1085
+ ```yaml
1086
+ type: logistic_regression
1087
+ weights:
1088
+ - 0.5
1089
+ - 1.2
1090
+ - -0.8
1091
+ accuracy: 0.92
1092
+ n: 100
1093
+ p: 2
1094
+ ```
1095
+
1096
+ **Example:**
1097
+ ```javascript
1098
+ const X = [[1, 2], [2, 3], [3, 1], [4, 2]];
1099
+ const y = [0, 0, 1, 1];
1100
+ const model = datly.train_logistic_regression(X, y, {
1101
+ learning_rate: 0.1,
1102
+ iterations: 1000,
1103
+ l2: 0.01
1104
+ });
1105
+ ```
896
1106
 
897
- // Average ranking (ties get average rank)
898
- const ranks1 = datly.rank(data, 'average');
899
- console.log(ranks1); // [1, 2.5, 2.5, 4, 5]
1107
+ #### `predict_logistic(model, X, threshold = 0.5)`
900
1108
 
901
- // Min ranking (ties get minimum rank)
902
- const ranks2 = datly.rank(data, 'min');
903
- console.log(ranks2); // [1, 2, 2, 4, 5]
1109
+ Makes predictions using a trained logistic regression model.
1110
+
1111
+ **Returns:**
1112
+ ```yaml
1113
+ type: prediction
1114
+ name: logistic_regression
1115
+ threshold: 0.5
1116
+ probabilities:
1117
+ - 0.234
1118
+ - 0.789
1119
+ - 0.456
1120
+ classes:
1121
+ - 0
1122
+ - 1
1123
+ - 0
1124
+ ```
904
1125
 
905
- // Max ranking (ties get maximum rank)
906
- const ranks3 = datly.rank(data, 'max');
907
- console.log(ranks3); // [1, 3, 3, 4, 5]
1126
+ **Example:**
1127
+ ```javascript
1128
+ const predictions = datly.predict_logistic(model, [[5, 6], [6, 7]], 0.5);
908
1129
  ```
909
1130
 
910
1131
  ---
911
1132
 
912
- ### 7. Shape Analysis
1133
+ ## Classification Models
1134
+
1135
+ ### K-Nearest Neighbors (KNN)
913
1136
 
914
- Distribution shape: skewness and kurtosis.
1137
+ #### `train_knn_classifier(X, y, k = 5)`
915
1138
 
916
- #### Methods
1139
+ Trains a KNN classifier.
1140
+
1141
+ **Parameters:**
1142
+ - `X`: 2D array of features
1143
+ - `y`: Array of class labels
1144
+ - `k`: Number of neighbors (default: 5)
917
1145
 
918
- ##### `skewness(data, bias)`
919
- Calculate skewness (measure of asymmetry).
1146
+ **Returns:**
1147
+ ```yaml
1148
+ type: knn_classifier
1149
+ k: 5
1150
+ x:
1151
+ - - 1
1152
+ - 2
1153
+ - - 2
1154
+ - 3
1155
+ y:
1156
+ - 0
1157
+ - 1
1158
+ n: 100
1159
+ p: 2
1160
+ ```
920
1161
 
1162
+ **Example:**
921
1163
  ```javascript
922
- const data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]; // Right-skewed
1164
+ const X = [[1, 2], [2, 3], [3, 1], [4, 2]];
1165
+ const y = [0, 0, 1, 1];
1166
+ const model = datly.train_knn_classifier(X, y, 3);
1167
+ ```
923
1168
 
924
- // Biased estimate (default)
925
- const skew1 = datly.skewness(data, true);
926
- console.log(skew1); // Positive value
1169
+ #### `predict_knn_classifier(model, X)`
927
1170
 
928
- // Unbiased estimate
929
- const skew2 = datly.skewness(data, false);
930
- console.log(skew2);
1171
+ Makes predictions using KNN classifier.
1172
+
1173
+ **Returns:**
1174
+ ```yaml
1175
+ type: prediction
1176
+ name: knn_classifier
1177
+ k: 5
1178
+ predictions:
1179
+ - 0
1180
+ - 1
1181
+ - 1
931
1182
  ```
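+
+ **Example** (sketch; `model` comes from `train_knn_classifier`):
+ ```javascript
+ const predictions = datly.predict_knn_classifier(model, [[5, 6], [6, 7]]);
+ ```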
932
1183
 
933
- **Interpretation:**
934
- - Skewness < -1: Highly left-skewed
935
- - -1 < Skewness < -0.5: Moderately left-skewed
936
- - -0.5 < Skewness < 0.5: Approximately symmetric
937
- - 0.5 < Skewness < 1: Moderately right-skewed
938
- - Skewness > 1: Highly right-skewed
1184
+ #### `train_knn_regressor(X, y, k = 5)`
939
1185
 
940
- ---
1186
+ Trains a KNN regressor.
941
1187
 
942
- ##### `kurtosis(data, bias, excess)`
943
- Calculate kurtosis (measure of tailedness).
1188
+ **Returns:**
1189
+ ```yaml
1190
+ type: knn_regressor
1191
+ k: 5
1192
+ x:
1193
+ - - 1
1194
+ - 2
1195
+ - - 2
1196
+ - 3
1197
+ y:
1198
+ - 10.5
1199
+ - 12.3
1200
+ n: 100
1201
+ p: 2
1202
+ ```
944
1203
 
945
- ```javascript
946
- const data = [1, 2, 3, 4, 5, 6, 7, 8, 9];
1204
+ #### `predict_knn_regressor(model, X)`
947
1205
 
948
- // Excess kurtosis (default)
949
- const kurt1 = datly.kurtosis(data, false, true);
950
- console.log(kurt1); // -3 adjustment applied
1206
+ Makes predictions using KNN regressor.
951
1207
 
952
- // Regular kurtosis
953
- const kurt2 = datly.kurtosis(data, false, false);
954
- console.log(kurt2);
1208
+ **Returns:**
1209
+ ```yaml
1210
+ type: prediction
1211
+ name: knn_regressor
1212
+ k: 5
1213
+ predictions:
1214
+ - 10.7
1215
+ - 11.8
1216
+ - 12.5
955
1217
  ```
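+
+ **Example** (a short sketch covering both training the regressor above and predicting with it):
+ ```javascript
+ const X = [[1, 2], [2, 3], [3, 4], [4, 5]];
+ const y = [10.5, 12.3, 14.1, 15.8];
+ const model = datly.train_knn_regressor(X, y, 3);
+ const predictions = datly.predict_knn_regressor(model, [[5, 6]]);
+ ```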
956
1218
 
957
- **Interpretation (excess kurtosis):**
958
- - Excess < -1: Platykurtic (thin tails)
959
- - -1 < Excess < 1: Mesokurtic (normal)
960
- - Excess > 1: Leptokurtic (fat tails)
1219
+ ### Decision Trees
961
1220
 
962
- ---
1221
+ #### `train_decision_tree_classifier(X, y, options = {})`
963
1222
 
964
- ##### `isNormalDistribution(data, alpha)`
965
- Test if data follows normal distribution.
1223
+ Trains a decision tree classifier.
966
1224
 
967
- ```javascript
968
- const normalData = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7];
969
- const test = datly.isNormalDistribution(normalData, 0.05);
970
- console.log(test);
971
- // {
972
- // shapiroWilk: { statistic: 0.95, pValue: 0.12, isNormal: true },
973
- // jarqueBera: { statistic: 1.23, pValue: 0.54, isNormal: true },
974
- // skewness: 0.12,
975
- // kurtosis: -0.34,
976
- // isNormalByTests: true
977
- // }
1225
+ **Parameters:**
1226
+ - `options`:
1227
+ - `max_depth`: Maximum depth of tree (default: 5)
1228
+ - `min_samples_split`: Minimum samples required to split (default: 2)
1229
+
1230
+ **Returns:**
1231
+ ```yaml
1232
+ type: decision_tree_classifier
1233
+ tree:
1234
+ leaf: false
1235
+ feature: 0
1236
+ threshold: 2.5
1237
+ left:
1238
+ leaf: true
1239
+ prediction: 0
1240
+ n: 50
1241
+ right:
1242
+ leaf: true
1243
+ prediction: 1
1244
+ n: 50
1245
+ max_depth: 5
1246
+ min_samples: 2
1247
+ n: 100
1248
+ p: 2
1249
+ ```
1250
+
1251
+ **Example:**
1252
+ ```javascript
1253
+ const model = datly.train_decision_tree_classifier(X, y, {
1254
+ max_depth: 5,
1255
+ min_samples_split: 2
1256
+ });
978
1257
  ```
979
1258
 
980
- ---
1259
+ #### `train_decision_tree_regressor(X, y, options = {})`
981
1260
 
982
- ##### `jarqueBeraTest(data, alpha)`
983
- Jarque-Bera normality test.
1261
+ Trains a decision tree regressor.
984
1262
 
985
- ```javascript
986
- const data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
987
- const jb = datly.jarqueBeraTest(data);
988
- console.log(jb);
989
- // {
990
- // statistic: 0.84,
991
- // pValue: 0.66,
992
- // skewness: 0,
993
- // excessKurtosis: -1.2,
994
- // isNormal: true
995
- // }
1263
+ **Returns:**
1264
+ ```yaml
1265
+ type: decision_tree_regressor
1266
+ tree:
1267
+ leaf: false
1268
+ feature: 0
1269
+ threshold: 2.5
1270
+ left: ...
1271
+ right: ...
1272
+ max_depth: 5
1273
+ min_samples: 2
1274
+ n: 100
1275
+ p: 2
996
1276
  ```
997
1277
 
998
- ---
1278
+ #### `predict_decision_tree(model, X)`
1279
+
1280
+ Makes predictions using a decision tree.
999
1281
 
1000
- ### 8. Hypothesis Testing
1001
-
1002
- Statistical hypothesis tests.
1003
-
1004
- #### Methods
1005
-
1006
- ##### `tTest(sample1, sample2, type, alpha)`
1007
- Perform t-test.
1008
-
1009
- ```javascript
1010
- // One-sample t-test
1011
- const sample = [23, 25, 27, 29, 31];
1012
- const populationMean = 25;
1013
- const oneSample = datly.tTest(sample, populationMean, 'one-sample');
1014
- console.log(oneSample);
1015
- // {
1016
- // type: 'one-sample',
1017
- // statistic: 1.89,
1018
- // pValue: 0.13,
1019
- // degreesOfFreedom: 4,
1020
- // significant: false
1021
- // }
1022
-
1023
- // Two-sample t-test
1024
- const group1 = [23, 25, 27, 29, 31];
1025
- const group2 = [28, 30, 32, 34, 36];
1026
- const twoSample = datly.tTest(group1, group2, 'two-sample');
1027
- console.log(twoSample);
1028
- // {
1029
- // type: 'two-sample',
1030
- // statistic: -3.46,
1031
- // pValue: 0.008,
1032
- // significant: true,
1033
- // meanDifference: -5
1034
- // }
1035
-
1036
- // Paired t-test
1037
- const before = [120, 125, 130, 128, 122];
1038
- const after = [115, 118, 125, 120, 115];
1039
- const paired = datly.tTest(before, after, 'paired');
1040
- console.log(paired);
1282
+ **Returns:**
1283
+ ```yaml
1284
+ type: prediction
1285
+ name: decision_tree_classifier
1286
+ predictions:
1287
+ - 0
1288
+ - 1
1289
+ - 1
1041
1290
  ```
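+
+ **Example** (sketch; assumes `predict_decision_tree` accepts both classifier and regressor trees):
+ ```javascript
+ const regressor = datly.train_decision_tree_regressor(X, y, { max_depth: 5, min_samples_split: 2 });
+ const predictions = datly.predict_decision_tree(regressor, [[5, 6], [6, 7]]);
+ ```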
1042
1291
 
1043
- ---
1292
+ ### Random Forest
1044
1293
 
1045
- ##### `zTest(sample, populationMean, populationStd, alpha)`
1046
- Perform z-test (known population variance).
1294
+ #### `train_random_forest_classifier(X, y, options = {})`
1047
1295
 
1048
- ```javascript
1049
- const sample = [105, 110, 108, 112, 115];
1050
- const popMean = 100;
1051
- const popStd = 15;
1296
+ Trains a random forest classifier.
1052
1297
 
1053
- const zTest = datly.zTest(sample, popMean, popStd);
1054
- console.log(zTest);
1055
- // {
1056
- // type: 'z-test',
1057
- // statistic: 1.49,
1058
- // pValue: 0.136,
1059
- // significant: false,
1060
- // sampleMean: 110
1061
- // }
1298
+ **Parameters:**
1299
+ - `options`:
1300
+ - `n_estimators`: Number of trees (default: 10)
1301
+ - `max_depth`: Maximum depth (default: 5)
1302
+ - `min_samples_split`: Minimum samples to split (default: 2)
1303
+ - `seed`: Random seed (default: 42)
1304
+
1305
+ **Returns:**
1306
+ ```yaml
1307
+ type: random_forest_classifier
1308
+ trees:
1309
+ - leaf: false
1310
+ feature: 0
1311
+ threshold: 2.5
1312
+ ...
1313
+ - leaf: false
1314
+ feature: 1
1315
+ threshold: 3.2
1316
+ ...
1317
+ n_trees: 10
1318
+ max_depth: 5
1319
+ min_samples: 2
1320
+ n: 100
1321
+ p: 2
1322
+ ```
1323
+
1324
+ **Example:**
1325
+ ```javascript
1326
+ const model = datly.train_random_forest_classifier(X, y, {
1327
+ n_estimators: 10,
1328
+ max_depth: 5,
1329
+ seed: 42
1330
+ });
1062
1331
  ```
1063
1332
 
1064
- ---
1065
-
1066
- ##### `anovaTest(groups, alpha)`
1067
- One-way ANOVA test.
1333
+ #### `train_random_forest_regressor(X, y, options = {})`
1068
1334
 
1069
- ```javascript
1070
- const groupA = [23, 25, 27, 29];
1071
- const groupB = [30, 32, 34, 36];
1072
- const groupC = [28, 30, 32, 34];
1335
+ Trains a random forest regressor.
1073
1336
 
1074
- const anova = datly.anovaTest([groupA, groupB, groupC]);
1075
- console.log(anova);
1076
- // {
1077
- // type: 'one-way-anova',
1078
- // statistic: 12.45,
1079
- // pValue: 0.001,
1080
- // dfBetween: 2,
1081
- // dfWithin: 9,
1082
- // significant: true,
1083
- // groupMeans: [26, 33, 31]
1084
- // }
1337
+ **Returns:**
1338
+ ```yaml
1339
+ type: random_forest_regressor
1340
+ trees: [...]
1341
+ n_trees: 10
1342
+ max_depth: 5
1343
+ min_samples: 2
1344
+ n: 100
1345
+ p: 2
1085
1346
  ```
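+
+ **Example** (sketch; the matching `predict_random_forest_regressor` helper is documented below):
+ ```javascript
+ const rfReg = datly.train_random_forest_regressor(X, y, { n_estimators: 10, max_depth: 5 });
+ const predictions = datly.predict_random_forest_regressor(rfReg, [[5, 6], [6, 7]]);
+ ```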
1086
1347
 
1087
- ---
1088
-
1089
- ##### `chiSquareTest(column1, column2, alpha)`
1090
- Chi-square test of independence.
1348
+ #### `predict_random_forest_classifier(model, X)`
1091
1349
 
1092
- ```javascript
1093
- const gender = ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F'];
1094
- const preference = ['A', 'B', 'A', 'A', 'B', 'B', 'A', 'B'];
1350
+ Makes predictions using random forest classifier.
1095
1351
 
1096
- const chiTest = datly.chiSquareTest(gender, preference);
1097
- console.log(chiTest);
1098
- // {
1099
- // type: 'chi-square-independence',
1100
- // statistic: 0.5,
1101
- // pValue: 0.48,
1102
- // degreesOfFreedom: 1,
1103
- // significant: false,
1104
- // cramersV: 0.25
1105
- // }
1352
+ **Returns:**
1353
+ ```yaml
1354
+ type: prediction
1355
+ name: random_forest_classifier
1356
+ n_trees: 10
1357
+ predictions:
1358
+ - 0
1359
+ - 1
1360
+ - 1
1106
1361
  ```
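+
+ **Example** (illustrative; reuses the classifier `model` trained above, with `newX` as hypothetical new observations):
+ ```javascript
+ const newX = [[1.5, 2.5], [6.5, 7.5]];
+ const predictions = datly.predict_random_forest_classifier(model, newX);
+ ```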
1107
1362
 
1108
- ---
1109
-
1110
- ##### `mannWhitneyTest(sample1, sample2, alpha)`
1111
- Mann-Whitney U test (non-parametric alternative to t-test).
1363
+ #### `predict_random_forest_regressor(model, X)`
1112
1364
 
1113
- ```javascript
1114
- const group1 = [1, 2, 3, 4, 5];
1115
- const group2 = [6, 7, 8, 9, 10];
1365
+ Makes predictions using a trained random forest regressor.
1116
1366
 
1117
- const mwTest = datly.mannWhitneyTest(group1, group2);
1118
- console.log(mwTest);
1119
- // {
1120
- // type: 'mann-whitney-u',
1121
- // statistic: 0,
1122
- // u1: 0,
1123
- // u2: 25,
1124
- // zStatistic: -2.61,
1125
- // pValue: 0.009,
1126
- // significant: true
1127
- // }
1367
+ **Returns:**
1368
+ ```yaml
1369
+ type: prediction
1370
+ name: random_forest_regressor
1371
+ n_trees: 10
1372
+ predictions:
1373
+ - 10.7
1374
+ - 11.8
1375
+ - 12.5
1128
1376
  ```
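+
+ **Example** (illustrative; reuses the regressor `model` from the training sketch above):
+ ```javascript
+ const newX = [[2.5, 3.5], [3.5, 4.5]];
+ const predictions = datly.predict_random_forest_regressor(model, newX);
+ ```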
1129
1377
 
1130
- ---
1378
+ ### Naive Bayes
1131
1379
 
1132
- ### 9. Confidence Intervals
1380
+ #### `train_naive_bayes(X, y)`
1133
1381
 
1134
- Estimate population parameters with confidence.
1382
+ Trains a Gaussian Naive Bayes classifier.
1135
1383
 
1136
- #### Methods
1384
+ **Parameters:**
1385
+ - `X`: 2D array of features
1386
+ - `y`: Array of class labels
1137
1387
 
1138
- ##### `mean(data, confidence)`
1139
- Confidence interval for mean.
1388
+ **Returns:**
1389
+ ```yaml
1390
+ type: naive_bayes
1391
+ classes:
1392
+ - 0
1393
+ - 1
1394
+ priors:
1395
+ 0: 0.5
1396
+ 1: 0.5
1397
+ stats:
1398
+ 0:
1399
+ - mean: 2.5
1400
+ std: 1.2
1401
+ - mean: 3.1
1402
+ std: 0.8
1403
+ 1:
1404
+ - mean: 5.2
1405
+ std: 1.5
1406
+ - mean: 6.3
1407
+ std: 1.1
1408
+ n: 100
1409
+ p: 2
1410
+ ```
1140
1411
 
1412
+ **Example:**
1141
1413
  ```javascript
1142
- const data = [23, 25, 27, 29, 31, 33, 35];
1143
- const ci = datly.confidenceInterval(data, 0.95);
1144
- console.log(ci);
1145
- // {
1146
- // mean: 29,
1147
- // standardError: 1.63,
1148
- // marginOfError: 4.03,
1149
- // lowerBound: 24.97,
1150
- // upperBound: 33.03,
1151
- // confidence: 0.95,
1152
- // degreesOfFreedom: 6
1153
- // }
1414
+ const X = [[1, 2], [2, 3], [5, 6], [6, 7]];
1415
+ const y = [0, 0, 1, 1];
1416
+ const model = datly.train_naive_bayes(X, y);
1154
1417
  ```
1155
1418
 
1156
- ---
1419
+ #### `predict_naive_bayes(model, X)`
1420
+
1421
+ Makes predictions using a trained Naive Bayes classifier.
1157
1422
 
1158
- ##### `proportion(successes, total, confidence)`
1159
- Confidence interval for proportion.
1160
-
1161
- ```javascript
1162
- const successes = 45; // Number of successes
1163
- const total = 100; // Total trials
1164
-
1165
- const ci = datly.confidenceIntervals.proportion(successes, total, 0.95);
1166
- console.log(ci);
1167
- // {
1168
- // normal: {
1169
- // proportion: 0.45,
1170
- // lowerBound: 0.353,
1171
- // upperBound: 0.547,
1172
- // confidence: 0.95
1173
- // },
1174
- // wilson: {
1175
- // proportion: 0.45,
1176
- // center: 0.453,
1177
- // lowerBound: 0.355,
1178
- // upperBound: 0.551,
1179
- // confidence: 0.95
1180
- // },
1181
- // recommended: {...} // Wilson interval for small samples
1182
- // }
1423
+ **Returns:**
1424
+ ```yaml
1425
+ type: prediction
1426
+ name: naive_bayes
1427
+ predictions:
1428
+ - 0
1429
+ - 1
1430
+ - 1
1183
1431
  ```
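+
+ **Example** (illustrative; reuses the `model` returned by `train_naive_bayes` above):
+ ```javascript
+ const newX = [[1.5, 2.5], [5.5, 6.5]];
+ const predictions = datly.predict_naive_bayes(model, newX);
+ ```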
1184
1432
 
1185
1433
  ---
1186
1434
 
1187
- ##### `variance(data, confidence)`
1188
- Confidence interval for variance.
1435
+ ## Clustering
1189
1436
 
1190
- ```javascript
1191
- const data = [10, 12, 14, 16, 18, 20];
1192
- const ci = datly.confidenceIntervals.variance(data, 0.95);
1193
- console.log(ci);
1194
- // {
1195
- // sampleVariance: 14.67,
1196
- // lowerBound: 6.23,
1197
- // upperBound: 73.33,
1198
- // confidence: 0.95
1199
- // }
1437
+ ### K-Means Clustering
1438
+
1439
+ #### `train_kmeans(X, k = 3, options = {})`
1440
+
1441
+ Trains a K-means clustering model.
1442
+
1443
+ **Parameters:**
1444
+ - `X`: 2D array of features
1445
+ - `k`: Number of clusters (default: 3)
1446
+ - `options`:
1447
+ - `max_iterations`: Maximum iterations (default: 100)
1448
+ - `seed`: Random seed (default: 42)
1449
+
1450
+ **Returns:**
1451
+ ```yaml
1452
+ type: kmeans
1453
+ k: 3
1454
+ centroids:
1455
+ - - 2.1
1456
+ - 3.5
1457
+ - - 5.8
1458
+ - 6.2
1459
+ - - 9.1
1460
+ - 8.7
1461
+ inertia: 45.67
1462
+ n: 150
1463
+ p: 2
1464
+ ```
1465
+
1466
+ **Example:**
1467
+ ```javascript
1468
+ const X = [[1, 2], [2, 3], [5, 6], [6, 7], [9, 8], [10, 9]];
1469
+ const model = datly.train_kmeans(X, 3, {
1470
+ max_iterations: 100,
1471
+ seed: 42
1472
+ });
1200
1473
  ```
1201
1474
 
1202
- ---
1475
+ #### `predict_kmeans(model, X)`
1203
1476
 
1204
- ##### `meanDifference(sample1, sample2, confidence, equalVariances)`
1205
- Confidence interval for difference between two means.
1477
+ Assigns cluster labels to new data points.
1206
1478
 
1207
- ```javascript
1208
- const before = [120, 125, 130, 128, 122];
1209
- const after = [115, 118, 125, 120, 115];
1479
+ **Returns:**
1480
+ ```yaml
1481
+ type: prediction
1482
+ name: kmeans
1483
+ k: 3
1484
+ cluster_labels:
1485
+ - 0
1486
+ - 0
1487
+ - 1
1488
+ - 1
1489
+ - 2
1490
+ - 2
1491
+ ```
1210
1492
 
1211
- const ci = datly.confidenceIntervals.meanDifference(before, after, 0.95);
1212
- console.log(ci);
1213
- // {
1214
- // meanDifference: 9.2,
1215
- // sample1Mean: 125,
1216
- // sample2Mean: 115.8,
1217
- // lowerBound: 3.5,
1218
- // upperBound: 14.9,
1219
- // confidence: 0.95
1220
- // }
1493
+ **Example:**
1494
+ ```javascript
1495
+ const newData = [[1.5, 2.5], [5.5, 6.5], [9.5, 8.5]];
1496
+ const clusters = datly.predict_kmeans(model, newData);
1221
1497
  ```
1222
1498
 
1223
1499
  ---
1224
1500
 
1225
- ##### `correlation(x, y, confidence, method)`
1226
- Confidence interval for correlation coefficient.
1501
+ ## Ensemble Methods
1227
1502
 
1228
- ```javascript
1229
- const x = [1, 2, 3, 4, 5];
1230
- const y = [2, 4, 5, 4, 5];
1503
+ ### `ensemble_voting_classifier(models, X, method = 'hard')`
1231
1504
 
1232
- const ci = datly.confidenceIntervals.correlation(x, y, 0.95);
1233
- console.log(ci);
1234
- // {
1235
- // correlation: 0.775,
1236
- // fisherZ: 1.033,
1237
- // lowerBound: 0.034,
1238
- // upperBound: 0.965,
1239
- // confidence: 0.95
1240
- // }
1505
+ Combines multiple classifier predictions through voting.
1506
+
1507
+ **Parameters:**
1508
+ - `models`: Array of trained models (objects or serialized model text), as returned by the `train_*` functions
1509
+ - `X`: 2D array of features
1510
+ - `method`: 'hard' for majority voting, 'soft' for probability averaging
1511
+
1512
+ **Returns:**
1513
+ ```yaml
1514
+ type: ensemble_prediction
1515
+ method: voting_hard
1516
+ n_models: 3
1517
+ predictions:
1518
+ - 0
1519
+ - 1
1520
+ - 1
1521
+ - 0
1522
+ ```
1523
+
1524
+ **Example:**
1525
+ ```javascript
1526
+ const model1 = datly.train_logistic_regression(X, y);
1527
+ const model2 = datly.train_knn_classifier(X, y, 5);
1528
+ const model3 = datly.train_decision_tree_classifier(X, y);
1529
+
1530
+ const ensemble = datly.ensemble_voting_classifier(
1531
+ [model1, model2, model3],
1532
+ X_test,
1533
+ 'hard'
1534
+ );
1241
1535
  ```
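+
+ The same ensemble can vote on averaged probabilities by passing `'soft'` instead (illustrative):
+ ```javascript
+ const softEnsemble = datly.ensemble_voting_classifier(
+   [model1, model2, model3],
+   X_test,
+   'soft'
+ );
+ ```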
1242
1536
 
1243
- ---
1537
+ ### `ensemble_voting_regressor(models, X)`
1244
1538
 
1245
- ##### `bootstrapCI(data, statistic, confidence, iterations)`
1246
- Bootstrap confidence interval.
1539
+ Combines multiple regressor predictions through averaging.
1247
1540
 
1541
+ **Returns:**
1542
+ ```yaml
1543
+ type: ensemble_prediction
1544
+ method: voting_average
1545
+ n_models: 3
1546
+ predictions:
1547
+ - 105.3
1548
+ - 110.7
1549
+ - 98.2
1550
+ ```
1551
+
1552
+ **Example:**
1248
1553
  ```javascript
1249
- const data = [23, 25, 27, 29, 31];
1554
+ const model1 = datly.train_linear_regression(X, y);
1555
+ const model2 = datly.train_knn_regressor(X, y, 5);
1556
+ const model3 = datly.train_decision_tree_regressor(X, y);
1250
1557
 
1251
- // Bootstrap CI for median
1252
- const ci = datly.confidenceIntervals.bootstrapCI(data, 'median', 0.95, 1000);
1253
- console.log(ci);
1254
- // {
1255
- // originalStatistic: 27,
1256
- // bootstrapMean: 27.1,
1257
- // bias: 0.1,
1258
- // standardError: 1.4,
1259
- // lowerBound: 24.5,
1260
- // upperBound: 29.8,
1261
- // confidence: 0.95,
1262
- // iterations: 1000
1263
- // }
1558
+ const ensemble = datly.ensemble_voting_regressor(
1559
+ [model1, model2, model3],
1560
+ X_test
1561
+ );
1264
1562
  ```
1265
1563
 
1266
1564
  ---
1267
1565
 
1268
- ### 10. Normality Tests
1566
+ ## Model Evaluation
1567
+
1568
+ ### `train_test_split(X, y, test_size = 0.2, seed = 42)`
1269
1569
 
1270
- Test if data follows normal distribution.
1570
+ Splits data into training and testing sets.
1271
1571
 
1272
- #### Methods
1572
+ **Parameters:**
1573
+ - `X`: 2D array of features
1574
+ - `y`: Array of labels
1575
+ - `test_size`: Proportion for test set (default: 0.2)
1576
+ - `seed`: Random seed (default: 42)
1273
1577
 
1274
- ##### `shapiroWilk(data, alpha)`
1275
- Shapiro-Wilk test (most powerful for small samples).
1578
+ **Returns:**
1579
+ ```yaml
1580
+ type: split
1581
+ sizes:
1582
+ train: 80
1583
+ test: 20
1584
+ indices:
1585
+ train:
1586
+ - 0
1587
+ - 2
1588
+ - 3
1589
+ ...
1590
+ test:
1591
+ - 1
1592
+ - 4
1593
+ ...
1594
+ preview:
1595
+ x_train:
1596
+ - - 1
1597
+ - 2
1598
+ - - 3
1599
+ - 4
1600
+ y_train:
1601
+ - 0
1602
+ - 1
1603
+ - 0
1604
+ ```
1276
1605
 
1606
+ **Example:**
1277
1607
  ```javascript
1278
- const data = [2.3, 2.5, 2.7, 2.9, 3.1, 3.3, 3.5];
1279
- const sw = datly.normalityTests.shapiroWilk(data);
1280
- console.log(sw);
1281
- // {
1282
- // statistic: 0.96,
1283
- // pValue: 0.82,
1284
- // isNormal: true,
1285
- // alpha: 0.05,
1286
- // interpretation: "Fail to reject null hypothesis..."
1287
- // }
1608
+ const split = datly.train_test_split(X, y, 0.2, 42);
1609
+ // Use split.indices to extract train/test data
1288
1610
  ```
1289
1611
 
1290
- **Best for:** Sample sizes 3-5000
1612
+ ### Classification Metrics
1291
1613
 
1292
- ---
1614
+ #### `metrics_classification(y_true, y_pred)`
1615
+
1616
+ Calculates classification metrics including accuracy, precision, recall, and F1-score.
1293
1617
 
1294
- ##### `kolmogorovSmirnov(data, alpha)`
1295
- Kolmogorov-Smirnov test.
1618
+ **Returns:**
1619
+ ```yaml
1620
+ type: metric
1621
+ name: classification_report
1622
+ confusion_matrix:
1623
+ tp: 45
1624
+ fp: 5
1625
+ tn: 42
1626
+ fn: 8
1627
+ accuracy: 0.87
1628
+ precision: 0.9
1629
+ recall: 0.849
1630
+ f1: 0.874
1631
+ ```
1296
1632
 
1633
+ **Example:**
1297
1634
  ```javascript
1298
- const data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
1299
- const ks = datly.normalityTests.kolmogorovSmirnov(data);
1300
- console.log(ks);
1301
- // {
1302
- // statistic: 0.15,
1303
- // pValue: 0.95,
1304
- // isNormal: true,
1305
- // lambda: 0.47
1306
- // }
1635
+ const y_true = [0, 1, 1, 0, 1, 1, 0, 0];
1636
+ const y_pred = [0, 1, 0, 0, 1, 1, 0, 1];
1637
+ const metrics = datly.metrics_classification(y_true, y_pred);
1307
1638
  ```
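+
+ For reference, the reported values follow the standard definitions: `accuracy = (tp + tn) / total`, `precision = tp / (tp + fp)`, `recall = tp / (tp + fn)`, and `f1 = 2 * precision * recall / (precision + recall)`, consistent with the sample report above (e.g., `precision = 45 / 50 = 0.9`).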
1308
1639
 
1309
- **Best for:** Large samples, continuous data
1640
+ ### Regression Metrics
1310
1641
 
1311
- ---
1642
+ #### `metrics_regression(y_true, y_pred)`
1312
1643
 
1313
- ##### `andersonDarling(data, alpha)`
1314
- Anderson-Darling test (sensitive to tails).
1644
+ Calculates regression metrics including MSE, MAE, and R².
1645
+
1646
+ **Returns:**
1647
+ ```yaml
1648
+ type: metric
1649
+ name: regression_report
1650
+ mse: 12.34
1651
+ mae: 2.87
1652
+ r2: 0.856
1653
+ ```
1315
1654
 
1655
+ **Example:**
1316
1656
  ```javascript
1317
- const data = [10, 12, 14, 15, 16, 18, 20];
1318
- const ad = datly.normalityTests.andersonDarling(data);
1319
- console.log(ad);
1320
- // {
1321
- // statistic: 0.234,
1322
- // adjustedStatistic: 0.247,
1323
- // pValue: 0.75,
1324
- // isNormal: true
1325
- // }
1657
+ const y_true = [3.0, 5.0, 7.0, 9.0];
1658
+ const y_pred = [2.8, 5.2, 6.9, 9.1];
1659
+ const metrics = datly.metrics_regression(y_true, y_pred);
1326
1660
  ```
1327
1661
 
1328
- **Best for:** Detecting tail deviations
1662
+ ### Cross Validation
1329
1663
 
1330
- ---
1664
+ #### `cross_validate(X, y, model_type, options = {})`
1331
1665
 
1332
- ##### `jarqueBera(data, alpha)`
1333
- Jarque-Bera test (based on skewness and kurtosis).
1666
+ Performs k-fold cross-validation.
1334
1667
 
1335
- ```javascript
1336
- const data = [5, 6, 7, 8, 9, 10, 11, 12];
1337
- const jb = datly.normalityTests.jarqueBera(data);
1338
- console.log(jb);
1339
- // {
1340
- // statistic: 0.45,
1341
- // pValue: 0.80,
1342
- // skewness: 0,
1343
- // excessKurtosis: -1.2,
1344
- // isNormal: true
1345
- // }
1668
+ **Parameters:**
1669
+ - `X`: 2D array of features
1670
+ - `y`: Array of labels
1671
+ - `model_type`: String - 'linear_regression', 'logistic_regression', 'knn_classifier', 'decision_tree_classifier', 'random_forest_classifier'
1672
+ - `options`:
1673
+ - `k_folds`: Number of folds (default: 5)
1674
+ - Model-specific options (e.g., `k` for KNN, `max_depth` for trees)
1675
+
1676
+ **Returns:**
1677
+ ```yaml
1678
+ type: cross_validation
1679
+ model_type: logistic_regression
1680
+ k_folds: 5
1681
+ scores:
1682
+ - 0.85
1683
+ - 0.88
1684
+ - 0.82
1685
+ - 0.87
1686
+ - 0.86
1687
+ mean_score: 0.856
1688
+ std_score: 0.022
1689
+ ```
1690
+
1691
+ **Example:**
1692
+ ```javascript
1693
+ const cv = datly.cross_validate(X, y, 'logistic_regression', {
1694
+ k_folds: 5,
1695
+ learning_rate: 0.1,
1696
+ iterations: 1000
1697
+ });
1346
1698
  ```
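+
+ A second illustrative call, passing a model-specific option (`k` for KNN, as listed above):
+ ```javascript
+ const cvKnn = datly.cross_validate(X, y, 'knn_classifier', {
+   k_folds: 5,
+   k: 3
+ });
+ ```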
1347
1699
 
1348
- **Best for:** Large samples (n > 30)
1349
-
1350
- ---
1700
+ ### Feature Importance
1351
1701
 
1352
- ##### `dagoTest(data, alpha)`
1353
- D'Agostino K² test.
1702
+ #### `feature_importance_tree(model)`
1354
1703
 
1355
- ```javascript
1356
- const data = Array.from({ length: 50 }, () => Math.random() * 10);
1357
- const dago = datly.normalityTests.dagoTest(data);
1358
- console.log(dago);
1359
- // {
1360
- // statistic: 2.34,
1361
- // pValue: 0.31,
1362
- // isNormal: true,
1363
- // skewness: 0.23,
1364
- // excessKurtosis: -0.45
1365
- // }
1366
- ```
1704
+ Extracts feature importance from tree-based models.
1367
1705
 
1368
- **Best for:** Medium to large samples (n ≥ 20)
1706
+ **Parameters:**
1707
+ - `model`: Trained decision tree or random forest model
1369
1708
 
1370
- ---
1709
+ **Returns:**
1710
+ ```yaml
1711
+ type: feature_importance
1712
+ model: random_forest_classifier
1713
+ n_trees: 10
1714
+ importance:
1715
+ - 0.45
1716
+ - 0.32
1717
+ - 0.15
1718
+ - 0.08
1719
+ ```
1371
1720
 
1372
- ##### `batchNormalityTest(data, alpha)`
1373
- Run all normality tests at once.
1374
-
1375
- ```javascript
1376
- const data = [12, 14, 15, 16, 17, 18, 20, 22];
1377
- const batch = datly.normalityTests.batchNormalityTest(data);
1378
- console.log(batch);
1379
- // {
1380
- // individualTests: {
1381
- // shapiroWilk: {...},
1382
- // jarqueBera: {...},
1383
- // andersonDarling: {...},
1384
- // kolmogorovSmirnov: {...}
1385
- // },
1386
- // summary: {
1387
- // testsRun: 4,
1388
- // testsPassingNormality: 4,
1389
- // consensusNormal: true,
1390
- // strongNormalEvidence: true
1391
- // },
1392
- // recommendation: "Strong evidence for normality..."
1393
- // }
1721
+ **Example:**
1722
+ ```javascript
1723
+ const model = datly.train_random_forest_classifier(X, y);
1724
+ const importance = datly.feature_importance_tree(model);
1394
1725
  ```
1395
1726
 
1396
1727
  ---
1397
1728
 
1398
- ### 11. Correlation Analysis
1729
+ ## Data Preprocessing
1399
1730
 
1400
- Measure relationships between variables.
1731
+ ### Scaling
1401
1732
 
1402
- #### Methods
1733
+ #### `standard_scaler_fit(X)`
1403
1734
 
1404
- ##### `pearson(x, y)`
1405
- Pearson correlation (linear relationships).
1735
+ Fits a standard scaler (z-score normalization).
1406
1736
 
1407
- ```javascript
1408
- const height = [160, 165, 170, 175, 180];
1409
- const weight = [55, 60, 65, 70, 75];
1737
+ **Returns:**
1738
+ ```yaml
1739
+ type: standard_scaler
1740
+ params:
1741
+ - mean: 50.5
1742
+ std: 15.2
1743
+ - mean: 100.3
1744
+ std: 25.7
1745
+ n: 100
1746
+ p: 2
1747
+ ```
1410
1748
 
1411
- const corr = datly.correlation.pearson(height, weight);
1412
- console.log(corr);
1413
- // {
1414
- // correlation: 1.0,
1415
- // pValue: 0.000,
1416
- // tStatistic: Infinity,
1417
- // significant: true,
1418
- // confidenceInterval: { lower: 1.0, upper: 1.0 }
1419
- // }
1749
+ **Example:**
1750
+ ```javascript
1751
+ const X = [[50, 100], [60, 120], [40, 90]];
1752
+ const scaler = datly.standard_scaler_fit(X);
1420
1753
  ```
1421
1754
 
1422
- **Range:** -1 (perfect negative) to +1 (perfect positive)
1755
+ #### `standard_scaler_transform(scaler, X)`
1423
1756
 
1424
- **Interpretation:**
1425
- - |r| < 0.3: Weak
1426
- - 0.3 ≤ |r| < 0.7: Moderate
1427
- - |r| ≥ 0.7: Strong
1757
+ Transforms data using fitted standard scaler.
1428
1758
 
1429
- ---
1430
-
1431
- ##### `spearman(x, y)`
1432
- Spearman correlation (monotonic relationships, rank-based).
1433
-
1434
- ```javascript
1435
- const x = [1, 2, 3, 4, 5];
1436
- const y = [1, 4, 9, 16, 25]; // Non-linear but monotonic
1437
-
1438
- const corr = datly.correlation.spearman(x, y);
1439
- console.log(corr);
1440
- // {
1441
- // correlation: 1.0,
1442
- // pValue: 0.000,
1443
- // significant: true,
1444
- // xRanks: [1, 2, 3, 4, 5],
1445
- // yRanks: [1, 2, 3, 4, 5]
1446
- // }
1759
+ **Returns:**
1760
+ ```yaml
1761
+ type: scaled_data
1762
+ method: standard
1763
+ preview:
1764
+ - - 0.0
1765
+ - 0.0
1766
+ - - 0.625
1767
+ - 0.767
1768
+ - - -0.625
1769
+ - -0.767
1447
1770
  ```
1448
1771
 
1449
- **Use when:** Data is ordinal or non-linear
1450
-
1451
- ---
1452
-
1453
- ##### `kendall(x, y)`
1454
- Kendall's Tau correlation.
1455
-
1772
+ **Example:**
1456
1773
  ```javascript
1457
- const x = [1, 2, 3, 4, 5];
1458
- const y = [2, 1, 4, 3, 5];
1459
-
1460
- const corr = datly.correlation.kendall(x, y);
1461
- console.log(corr);
1462
- // {
1463
- // correlation: 0.6,
1464
- // pValue: 0.142,
1465
- // zStatistic: 1.47,
1466
- // concordantPairs: 8,
1467
- // discordantPairs: 2,
1468
- // significant: false
1469
- // }
1774
+ const scaled = datly.standard_scaler_transform(scaler, X);
1470
1775
  ```
1471
1776
 
1472
- **Use when:** Small sample sizes, ordinal data
1777
+ #### `minmax_scaler_fit(X)`
1473
1778
 
1474
- ---
1779
+ Fits a min-max scaler (scales to [0, 1] range).
1475
1780
 
1476
- ##### `matrix(dataset, method)`
1477
- Correlation matrix for multiple variables.
1478
-
1479
- ```javascript
1480
- const data = {
1481
- headers: ['age', 'income', 'spending'],
1482
- data: [
1483
- { age: 25, income: 30000, spending: 20000 },
1484
- { age: 30, income: 40000, spending: 25000 },
1485
- { age: 35, income: 50000, spending: 30000 },
1486
- { age: 40, income: 60000, spending: 35000 }
1487
- ]
1488
- };
1489
-
1490
- const matrix = datly.correlation.matrix(data, 'pearson');
1491
- console.log(matrix);
1492
- // {
1493
- // correlations: {
1494
- // age: { age: 1, income: 1, spending: 1 },
1495
- // income: { age: 1, income: 1, spending: 1 },
1496
- // spending: { age: 1, income: 1, spending: 1 }
1497
- // },
1498
- // pValues: {...},
1499
- // strongCorrelations: [
1500
- // { variable1: 'age', variable2: 'income', correlation: 1.0 }
1501
- // ]
1502
- // }
1781
+ **Returns:**
1782
+ ```yaml
1783
+ type: minmax_scaler
1784
+ params:
1785
+ - min: 40
1786
+ max: 60
1787
+ - min: 90
1788
+ max: 120
1789
+ n: 100
1790
+ p: 2
1503
1791
  ```
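+
+ **Example** (illustrative; the same data as the standard scaler example, matching the `min`/`max` values shown above):
+ ```javascript
+ const X = [[50, 100], [60, 120], [40, 90]];
+ const scaler = datly.minmax_scaler_fit(X);
+ ```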
1504
1792
 
1505
- ---
1793
+ #### `minmax_scaler_transform(scaler, X)`
1506
1794
 
1507
- ##### `partialCorrelation(x, y, z)`
1508
- Partial correlation (controlling for third variable).
1509
-
1510
- ```javascript
1511
- const x = [1, 2, 3, 4, 5];
1512
- const y = [2, 3, 4, 5, 6];
1513
- const z = [1, 1, 2, 2, 3]; // Control variable
1795
+ Transforms data using a fitted min-max scaler.
1514
1796
 
1515
- const partial = datly.correlation.partialCorrelation(x, y, z);
1516
- console.log(partial);
1517
- // {
1518
- // correlation: 0.95,
1519
- // pValue: 0.013,
1520
- // significant: true,
1521
- // controllingFor: 'third variable'
1522
- // }
1797
+ **Returns:**
1798
+ ```yaml
1799
+ type: scaled_data
1800
+ method: minmax
1801
+ preview:
1802
+ - - 0.5
1803
+ - 0.333
1804
+ - - 1.0
1805
+ - 1.0
1806
+ - - 0.0
1807
+ - 0.0
1523
1808
  ```
1524
1809
 
1525
1810
  ---
1526
1811
 
1527
- ##### `covariance(x, y, sample)`
1528
- Calculate covariance.
1812
+ ## Dimensionality Reduction
1529
1813
 
1530
- ```javascript
1531
- const x = [1, 2, 3, 4, 5];
1532
- const y = [2, 4, 6, 8, 10];
1814
+ ### Principal Component Analysis (PCA)
1533
1815
 
1534
- // Sample covariance
1535
- const cov = datly.correlation.covariance(x, y, true);
1536
- console.log(cov);
1537
- // {
1538
- // covariance: 5,
1539
- // meanX: 3,
1540
- // meanY: 6,
1541
- // sampleSize: 5
1542
- // }
1543
- ```
1816
+ #### `train_pca(X, n_components = 2)`
1544
1817
 
1545
- ---
1818
+ Trains a PCA model.
1546
1819
 
1547
- ### 12. Regression Analysis
1548
-
1549
- Model relationships and make predictions.
1550
-
1551
- #### Methods
1552
-
1553
- ##### `linear(x, y)`
1554
- Simple linear regression.
1555
-
1556
- ```javascript
1557
- const x = [1, 2, 3, 4, 5];
1558
- const y = [2, 4, 5, 4, 5];
1559
-
1560
- const model = datly.regression.linear(x, y);
1561
- console.log(model);
1562
- // {
1563
- // slope: 0.6,
1564
- // intercept: 2.2,
1565
- // rSquared: 0.46,
1566
- // adjustedRSquared: 0.28,
1567
- // equation: 'y = 2.2000 + 0.6000x',
1568
- // pValueSlope: 0.158,
1569
- // pValueModel: 0.158,
1570
- // residuals: [-0.2, 0.2, 0.4, -1.0, 0.6],
1571
- // predicted: [2.8, 3.4, 4.0, 4.6, 5.2]
1572
- // }
1573
- ```
1574
-
1575
- ---
1576
-
1577
- ##### `multiple(dataset, dependent, independents)`
1578
- Multiple linear regression.
1579
-
1580
- ```javascript
1581
- const data = {
1582
- headers: ['sales', 'advertising', 'price', 'competition'],
1583
- data: [
1584
- { sales: 100, advertising: 10, price: 50, competition: 3 },
1585
- { sales: 150, advertising: 15, price: 45, competition: 2 },
1586
- { sales: 120, advertising: 12, price: 48, competition: 4 },
1587
- { sales: 180, advertising: 20, price: 40, competition: 1 }
1588
- ]
1589
- };
1590
-
1591
- const model = datly.regression.multiple(
1592
- data,
1593
- 'sales',
1594
- ['advertising', 'price', 'competition']
1595
- );
1596
-
1597
- console.log(model);
1598
- // {
1599
- // coefficients: [
1600
- // { variable: 'Intercept', coefficient: 50, pValue: 0.05 },
1601
- // { variable: 'advertising', coefficient: 5.5, pValue: 0.01 },
1602
- // { variable: 'price', coefficient: -2.1, pValue: 0.03 },
1603
- // { variable: 'competition', coefficient: -10, pValue: 0.02 }
1604
- // ],
1605
- // rSquared: 0.95,
1606
- // adjustedRSquared: 0.90,
1607
- // fStatistic: 19.0,
1608
- // pValueModel: 0.001,
1609
- // equation: 'y = 50.0000 + 5.5000*advertising + -2.1000*price + -10.0000*competition'
1610
- // }
1611
- ```
1612
-
1613
- ---
1614
-
1615
- ##### `polynomial(x, y, degree)`
1616
- Polynomial regression.
1617
-
1618
- ```javascript
1619
- const x = [1, 2, 3, 4, 5];
1620
- const y = [1, 4, 9, 16, 25]; // y = x²
1621
-
1622
- const model = datly.regression.polynomial(x, y, 2);
1623
- console.log(model);
1624
- // {
1625
- // coefficients: [0, 0, 1], // y = 0 + 0x + 1x²
1626
- // degree: 2,
1627
- // rSquared: 1.0,
1628
- // equation: 'y = 0.0000 + 0.0000*x + 1.0000*x^2',
1629
- // predicted: [1, 4, 9, 16, 25]
1630
- // }
1631
- ```
1632
-
1633
- ---
1634
-
1635
- ##### `logistic(x, y, maxIterations, tolerance)`
1636
- Logistic regression (binary classification).
1637
-
1638
- ```javascript
1639
- const x = [1, 2, 3, 4, 5, 6];
1640
- const y = [0, 0, 0, 1, 1, 1]; // Binary outcome
1641
-
1642
- const model = datly.regression.logistic(x, y);
1643
- console.log(model);
1644
- // {
1645
- // intercept: -3.5,
1646
- // slope: 1.2,
1647
- // probabilities: [0.12, 0.23, 0.38, 0.55, 0.70, 0.81],
1648
- // predicted: [0, 0, 0, 1, 1, 1],
1649
- // accuracy: 1.0,
1650
- // logLikelihood: -2.1,
1651
- // mcFaddenR2: 0.68
1652
- // }
1653
- ```
1654
-
1655
- ---
1656
-
1657
- ##### `predict(model, x)`
1658
- Make predictions using fitted model.
1659
-
1660
- ```javascript
1661
- // After fitting a model
1662
- const newX = [6, 7, 8];
1663
- const predictions = datly.regression.predict(model, newX);
1664
- console.log(predictions); // [5.8, 6.4, 7.0]
1665
- ```
1666
-
1667
- ---
1820
+ **Parameters:**
1821
+ - `X`: 2D array of features
1822
+ - `n_components`: Number of principal components (default: 2)
1668
1823
 
1669
- ### 13. Report Generation
1670
-
1671
- Generate comprehensive statistical reports.
1672
-
1673
- #### Methods
1674
-
1675
- ##### `summary(dataset)`
1676
- Generate complete statistical summary.
1677
-
1678
- ```javascript
1679
- const data = {
1680
- headers: ['age', 'income', 'department'],
1681
- data: [
1682
- { age: 25, income: 30000, department: 'Sales' },
1683
- { age: 30, income: 45000, department: 'IT' },
1684
- { age: 35, income: 50000, department: 'Sales' },
1685
- { age: 40, income: 60000, department: 'IT' }
1686
- ],
1687
- length: 4,
1688
- columns: 3
1689
- };
1690
-
1691
- const report = datly.reportGenerator.summary(data);
1692
- console.log(report);
1693
- // {
1694
- // title: 'Statistical Summary Report',
1695
- // generatedAt: '2025-01-15T10:30:00.000Z',
1696
- // basicInfo: {
1697
- // totalRows: 4,
1698
- // totalColumns: 3,
1699
- // headers: ['age', 'income', 'department']
1700
- // },
1701
- // columnAnalysis: {
1702
- // age: {
1703
- // type: 'numeric',
1704
- // mean: 32.5,
1705
- // median: 32.5,
1706
- // min: 25,
1707
- // max: 40,
1708
- // standardDeviation: 6.45
1709
- // },
1710
- // income: {
1711
- // type: 'numeric',
1712
- // mean: 46250,
1713
- // median: 47500,
1714
- // ...
1715
- // },
1716
- // department: {
1717
- // type: 'categorical',
1718
- // categories: [...],
1719
- // mostFrequent: { value: 'Sales', frequency: 2 }
1720
- // }
1721
- // },
1722
- // dataQuality: {
1723
- // overallScore: 95,
1724
- // completenessScore: 100,
1725
- // consistencyScore: 90
1726
- // },
1727
- // keyInsights: [
1728
- // {
1729
- // type: 'correlation',
1730
- // title: 'Strong correlation between age and income',
1731
- // importance: 8
1732
- // }
1733
- // ],
1734
- // recommendations: [...]
1735
- // }
1824
+ **Returns:**
1825
+ ```yaml
1826
+ type: pca
1827
+ n_components: 2
1828
+ means:
1829
+ - 50.5
1830
+ - 100.3
1831
+ - 75.8
1832
+ components:
1833
+ - - 0.707
1834
+ - 0.707
1835
+ - 0.0
1836
+ - - -0.707
1837
+ - 0.707
1838
+ - 0.0
1839
+ n: 100
1840
+ p: 3
1736
1841
  ```
1737
1842
 
1738
- ---
1739
-
1740
- ##### `exportSummary(summary, format)`
1741
- Export report in different formats.
1742
-
1843
+ **Example:**
1743
1844
  ```javascript
1744
- const report = datly.reportGenerator.summary(data);
1745
-
1746
- // Export as JSON
1747
- const json = datly.reportGenerator.exportSummary(report, 'json');
1748
-
1749
- // Export as text
1750
- const text = datly.reportGenerator.exportSummary(report, 'text');
1751
- console.log(text);
1752
- // STATISTICAL SUMMARY REPORT
1753
- // Generated: 1/15/2025, 10:30:00 AM
1754
- // ==================================================
1755
- //
1756
- // BASIC INFORMATION
1757
- // --------------------
1758
- // Rows: 4
1759
- // Columns: 3
1760
- // ...
1761
-
1762
- // Export as CSV
1763
- const csv = datly.reportGenerator.exportSummary(report, 'csv');
1764
- ```
1765
-
1766
- ---
1767
-
1768
- ### 14. Pattern Detection
1769
-
1770
- Automatically detect patterns in data.
1771
-
1772
- #### Methods
1773
-
1774
- ##### `analyze(dataset)`
1775
- Comprehensive pattern analysis.
1776
-
1777
- ```javascript
1778
- const data = {
1779
- headers: ['date', 'sales', 'temperature'],
1780
- data: [
1781
- { date: '2024-01-01', sales: 100, temperature: 20 },
1782
- { date: '2024-01-02', sales: 110, temperature: 22 },
1783
- { date: '2024-01-03', sales: 105, temperature: 21 },
1784
- { date: '2024-01-04', sales: 115, temperature: 23 }
1785
- ],
1786
- length: 4,
1787
- columns: 3
1788
- };
1789
-
1790
- const patterns = datly.patternDetector.analyze(data);
1791
- console.log(patterns);
1792
- // {
1793
- // patterns: {
1794
- // trends: [
1795
- // {
1796
- // column: 'sales',
1797
- // direction: 'increasing',
1798
- // slope: 5,
1799
- // rSquared: 0.75,
1800
- // strength: 'strong'
1801
- // }
1802
- // ],
1803
- // seasonality: [...],
1804
- // outliers: [...],
1805
- // correlations: {
1806
- // strongCorrelations: [
1807
- // {
1808
- // variable1: 'sales',
1809
- // variable2: 'temperature',
1810
- // correlation: 0.95,
1811
- // strength: 'very_strong'
1812
- // }
1813
- // ]
1814
- // },
1815
- // distributions: [...],
1816
- // clustering: [...],
1817
- // temporal: [...]
1818
- // },
1819
- // insights: [
1820
- // {
1821
- // type: 'trend',
1822
- // importance: 'high',
1823
- // message: 'Found 1 strong trend(s) in your data',
1824
- // details: ['sales: increasing trend']
1825
- // }
1826
- // ]
1827
- // }
1828
- ```
1829
-
1830
- ---
1831
-
1832
- ### 15. Result Interpretation
1833
-
1834
- Interpret statistical test results in plain language.
1835
-
1836
- #### Methods
1837
-
1838
- ##### `interpret(testResult)`
1839
- Interpret any statistical test result.
1840
-
1841
- ```javascript
1842
- // After performing a t-test
1843
- const tTestResult = datly.hypothesisTesting.tTest(group1, group2);
1844
-
1845
- const interpretation = datly.interpreter.interpret(tTestResult);
1846
- console.log(interpretation);
1847
- // {
1848
- // testType: 't-test',
1849
- // summary: 'significant difference between groups (t = -3.46, p = 0.008)',
1850
- // conclusion: {
1851
- // decision: 'reject_null',
1852
- // statement: 'At the 95% confidence level, we reject the null hypothesis',
1853
- // pValue: 0.008,
1854
- // confidenceLevel: 95
1855
- // },
1856
- // significance: {
1857
- // level: 'strong',
1858
- // pValue: 0.008,
1859
- // interpretation: 'Strong evidence against null hypothesis',
1860
- // isSignificant: true
1861
- // },
1862
- // effectSize: {
1863
- // value: 0.85,
1864
- // magnitude: 'Large',
1865
- // interpretation: 'large effect size'
1866
- // },
1867
- // plainLanguage: '✓ SIGNIFICANT RESULT: Found a meaningful difference between the groups. (p-value: 0.0080)',
1868
- // recommendations: [
1869
- // 'Very strong result - investigate practical significance',
1870
- // 'Replicate findings with independent data when possible'
1871
- // ]
1872
- // }
1845
+ const X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]];
1846
+ const pca = datly.train_pca(X, 2);
1873
1847
  ```
1874
1848
 
1875
- ---
1876
-
1877
- ### 16. Auto-Analysis
1878
-
1879
- Automated end-to-end analysis.
1880
-
1881
- #### Methods
1849
+ #### `transform_pca(model, X)`
1882
1850
 
1883
- ##### `autoAnalyze(dataset, options)`
1884
- Perform comprehensive automatic analysis.
1851
+ Transforms data to principal component space.
1885
1852
 
1886
- ```javascript
1887
- const data = {
1888
- headers: ['age', 'income', 'gender', 'purchase'],
1889
- data: [
1890
- { age: 25, income: 30000, gender: 'M', purchase: 100 },
1891
- { age: 30, income: 45000, gender: 'F', purchase: 150 },
1892
- { age: 35, income: 50000, gender: 'M', purchase: 120 },
1893
- { age: 40, income: 60000, gender: 'F', purchase: 180 },
1894
- // ... more data
1895
- ],
1896
- length: 100,
1897
- columns: 4
1898
- };
1899
-
1900
- const analysis = datly.autoAnalyzer.autoAnalyze(data, {
1901
- minCorrelationThreshold: 0.3,
1902
- significanceLevel: 0.05,
1903
- generateVisualizations: true,
1904
- includeAdvancedAnalysis: true
1905
- });
1906
-
1907
- console.log(analysis);
1908
- // {
1909
- // metadata: {
1910
- // analysisDate: '2025-01-15T10:30:00.000Z',
1911
- // datasetSize: 100,
1912
- // columnsAnalyzed: 4
1913
- // },
1914
- // variableClassification: {
1915
- // quantitative: [
1916
- // { name: 'age', type: 'quantitative', subtype: 'discrete' },
1917
- // { name: 'income', type: 'quantitative', subtype: 'continuous' },
1918
- // { name: 'purchase', type: 'quantitative', subtype: 'continuous' }
1919
- // ],
1920
- // qualitative: [
1921
- // { name: 'gender', type: 'binary', categories: ['M', 'F'] }
1922
- // ]
1923
- // },
1924
- // descriptiveStatistics: {
1925
- // age: { mean: 32.5, median: 32, std: 6.45, ... },
1926
- // income: { mean: 46250, median: 47500, ... },
1927
- // purchase: { mean: 137.5, median: 135, ... }
1928
- // },
1929
- // correlationAnalysis: {
1930
- // strongCorrelations: [
1931
- // {
1932
- // variable1: 'income',
1933
- // variable2: 'purchase',
1934
- // correlation: 0.92,
1935
- // significance: true
1936
- // }
1937
- // ]
1938
- // },
1939
- // regressionAnalysis: {
1940
- // models: [
1941
- // {
1942
- // independent: 'income',
1943
- // dependent: 'purchase',
1944
- // rSquared: 0.85,
1945
- // significant: true,
1946
- // equation: 'y = 10.5 + 0.0025*income'
1947
- // }
1948
- // ]
1949
- // },
1950
- // distributionAnalysis: {
1951
- // age: {
1952
- // isNormal: true,
1953
- // normalityPValue: 0.15,
1954
- // skewness: 0.12,
1955
- // distributionType: 'normal'
1956
- // }
1957
- // },
1958
- // outlierAnalysis: {
1959
- // income: {
1960
- // count: 2,
1961
- // percentage: 2,
1962
- // severity: 'low'
1963
- // }
1964
- // },
1965
- // insights: [
1966
- // {
1967
- // category: 'overview',
1968
- // priority: 'high',
1969
- // title: 'Dataset Composition',
1970
- // description: 'Dataset with 100 records, 3 numeric and 1 categorical variables',
1971
- // icon: '📊'
1972
- // },
1973
- // {
1974
- // category: 'correlation',
1975
- // priority: 'high',
1976
- // title: 'Very strong correlation between income and purchase',
1977
- // description: 'Positive correlation of 0.920',
1978
- // icon: '🔗'
1979
- // }
1980
- // ],
1981
- // visualizationSuggestions: [
1982
- // {
1983
- // type: 'scatter',
1984
- // variables: ['income', 'purchase'],
1985
- // priority: 'high',
1986
- // title: 'income vs purchase'
1987
- // },
1988
- // {
1989
- // type: 'histogram',
1990
- // variable: 'age',
1991
- // priority: 'medium',
1992
- // title: 'Distribution of age'
1993
- // }
1994
- // ],
1995
- // summary: {
1996
- // totalInsights: 8,
1997
- // highPriorityInsights: 3,
1998
- // keyFindings: [...],
1999
- // recommendations: [
2000
- // 'Explore correlations identified for possible predictive modeling',
2001
- // 'Consider transformations for non-normal distributions'
2002
- // ]
2003
- // }
2004
- // }
1853
+ **Returns:**
1854
+ ```yaml
1855
+ type: pca_transform
1856
+ n_components: 2
1857
+ preview:
1858
+ - - 2.121
1859
+ - 0.0
1860
+ - - 0.707
1861
+ - 0.0
1862
+ - - -1.414
1863
+ - 0.0
2005
1864
  ```
2006
1865
 
2007
- ---
2008
-
2009
- ### 17. Machine Learning
2010
-
2011
- Build and train ML models.
2012
-
2013
- #### Creating Models
2014
-
2015
- ##### Linear Regression
2016
-
1866
+ **Example:**
2017
1867
  ```javascript
2018
- // Create model
2019
- const model = datly.ml.createLinearRegression({
2020
- learningRate: 0.01,
2021
- iterations: 1000,
2022
- regularization: 'l2', // 'l1', 'l2', or null
2023
- lambda: 0.01
2024
- });
2025
-
2026
- // Prepare data
2027
- const X = [
2028
- [1, 2],
2029
- [2, 3],
2030
- [3, 4],
2031
- [4, 5]
2032
- ];
2033
- const y = [3, 5, 7, 9];
2034
-
2035
- // Train model
2036
- model.fit(X, y, true); // true = normalize features
2037
-
2038
- // Make predictions
2039
- const predictions = model.predict([[5, 6]]);
2040
- console.log(predictions); // [11]
2041
-
2042
- // Evaluate model
2043
- const score = model.score(X, y);
2044
- console.log(score);
2045
- // {
2046
- // r2Score: 1.0,
2047
- // mse: 0.0,
2048
- // rmse: 0.0,
2049
- // mae: 0.0
2050
- // }
1868
+ const transformed = datly.transform_pca(pca, X);
2051
1869
  ```
2052
1870
 
2053
1871
  ---
2054
1872
 
2055
- ##### Logistic Regression
1873
+ ## Time Series Analysis
2056
1874
 
2057
- ```javascript
2058
- // Create model for classification
2059
- const model = datly.ml.createLogisticRegression({
2060
- learningRate: 0.01,
2061
- iterations: 1000
2062
- });
1875
+ ### `moving_average(array, window = 3)`
2063
1876
 
2064
- // Binary classification data
2065
- const X = [
2066
- [1, 2], [2, 3], [3, 4], [4, 5],
2067
- [5, 6], [6, 7], [7, 8], [8, 9]
2068
- ];
2069
- const y = [0, 0, 0, 0, 1, 1, 1, 1];
2070
-
2071
- // Train
2072
- model.fit(X, y);
2073
-
2074
- // Predict classes
2075
- const predictions = model.predict([[3.5, 4.5], [7, 8]]);
2076
- console.log(predictions); // [0, 1]
1877
+ Calculates moving average.
2077
1878
 
2078
- // Predict probabilities
2079
- const probabilities = model.predictProba([[3.5, 4.5]]);
2080
- console.log(probabilities); // [{ 0: 0.62, 1: 0.38 }]
1879
+ **Parameters:**
1880
+ - `array`: Time series data
1881
+ - `window`: Window size (default: 3)
2081
1882
 
2082
- // Evaluate
2083
- const score = model.score(X, y);
2084
- console.log(score);
2085
- // {
2086
- // accuracy: 1.0,
2087
- // confusionMatrix: {...},
2088
- // classMetrics: {...}
2089
- // }
1883
+ **Returns:**
1884
+ ```yaml
1885
+ type: time_series
1886
+ method: moving_average
1887
+ window: 3
1888
+ values:
1889
+ - 10
1890
+ - 15
1891
+ - 20
1892
+ - 22
1893
+ - 25
2090
1894
  ```
2091
1895
 
2092
- ---
2093
-
2094
- ##### K-Nearest Neighbors (KNN)
2095
-
1896
+ **Example:**
2096
1897
  ```javascript
2097
- // Create KNN model
2098
- const model = datly.ml.createKNN({
2099
- k: 5,
2100
- metric: 'euclidean', // 'euclidean', 'manhattan', 'minkowski'
2101
- weights: 'uniform' // 'uniform' or 'distance'
2102
- });
2103
-
2104
- // Prepare data
2105
- const X = [
2106
- [1, 2], [2, 3], [3, 4],
2107
- [6, 7], [7, 8], [8, 9]
2108
- ];
2109
- const y = [0, 0, 0, 1, 1, 1]; // Classes
2110
-
2111
- // Train (KNN just stores the data)
2112
- model.fit(X, y, true, 'classification');
2113
-
2114
- // Predict
2115
- const predictions = model.predict([[2, 2], [7, 7]]);
2116
- console.log(predictions); // [0, 1]
2117
-
2118
- // Predict with probabilities
2119
- const proba = model.predictProba([[4, 5]]);
2120
- console.log(proba); // [{ 0: 0.4, 1: 0.6 }]
1898
+ const data = [10, 20, 30, 20, 30, 25];
1899
+ const ma = datly.moving_average(data, 3);
2121
1900
  ```
2122
1901
 
2123
- ---
2124
-
2125
- ##### Decision Tree
2126
-
2127
- ```javascript
2128
- // Create decision tree
2129
- const model = datly.ml.createDecisionTree({
2130
- maxDepth: 10,
2131
- minSamplesSplit: 2,
2132
- minSamplesLeaf: 1,
2133
- criterion: 'gini' // 'gini' or 'entropy'
2134
- });
2135
-
2136
- // Train
2137
- const X = [
2138
- [2.5], [3.5], [4.5], [5.5], [6.5], [7.5]
2139
- ];
2140
- const y = [0, 0, 1, 1, 2, 2]; // Multi-class
1902
+ ### `exponential_smoothing(array, alpha = 0.3)`
2141
1903
 
2142
- model.fit(X, y, 'classification');
1904
+ Applies exponential smoothing: the first value is carried over unchanged, and each subsequent smoothed value is `alpha * x[t] + (1 - alpha) * smoothed[t - 1]`.
2143
1905
 
2144
- // Predict
2145
- const predictions = model.predict([[3.0], [6.0]]);
2146
- console.log(predictions); // [0, 1]
2147
-
2148
- // Get feature importance
2149
- const importance = model.getFeatureImportance();
2150
- console.log(importance); // { feature_0: 1.0 }
1906
+ **Parameters:**
1907
+ - `array`: Time series data
1908
+ - `alpha`: Smoothing parameter (0 < α < 1)
2151
1909
 
2152
- // Model summary
2153
- const summary = model.summary();
2154
- console.log(summary);
2155
- // {
2156
- // modelType: 'Decision Tree',
2157
- // taskType: 'classification',
2158
- // trainingMetrics: {
2159
- // treeDepth: 3,
2160
- // leafCount: 6,
2161
- // nodeCount: 11
2162
- // }
2163
- // }
1910
+ **Returns:**
1911
+ ```yaml
1912
+ type: time_series
1913
+ method: exponential_smoothing
1914
+ alpha: 0.3
1915
+ values:
1916
+ - 10
1917
+ - 13
1918
+ - 18.1
1919
+ - 18.47
1920
+ - 21.73
2164
1921
  ```
2165
1922
 
2166
- ---
2167
-
2168
- ##### Random Forest
2169
-
1923
+ **Example:**
2170
1924
  ```javascript
2171
- // Create random forest
2172
- const model = datly.ml.createRandomForest({
2173
- nEstimators: 100, // Number of trees
2174
- maxDepth: 10,
2175
- minSamplesSplit: 2,
2176
- maxFeatures: 'sqrt', // 'sqrt', 'log2', or number
2177
- bootstrap: true
2178
- });
2179
-
2180
- // Train
2181
- const X = [
2182
- [1, 2], [2, 3], [3, 4], [4, 5],
2183
- [5, 6], [6, 7], [7, 8], [8, 9]
2184
- ];
2185
- const y = ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'];
2186
-
2187
- model.fit(X, y, 'classification');
2188
-
2189
- // Predict
2190
- const predictions = model.predict([[2.5, 3.5], [7, 8]]);
2191
- console.log(predictions); // ['A', 'C']
2192
-
2193
- // Get feature importance
2194
- const importance = model.getFeatureImportance();
2195
- console.log(importance); // [0.6, 0.4]
1925
+ const smoothed = datly.exponential_smoothing(data, 0.3);
2196
1926
  ```
2197
1927
 
2198
- ---
2199
-
2200
- ##### Naive Bayes
2201
-
2202
- ```javascript
2203
- // Create Naive Bayes classifier
2204
- const model = datly.ml.createNaiveBayes({
2205
- type: 'gaussian' // 'gaussian', 'multinomial', or 'bernoulli'
2206
- });
2207
-
2208
- // Train
2209
- const X = [
2210
- [1, 2], [2, 3], [3, 4],
2211
- [5, 6], [6, 7], [7, 8]
2212
- ];
2213
- const y = ['spam', 'spam', 'spam', 'ham', 'ham', 'ham'];
1928
+ ### `autocorrelation(array, lag = 1)`
2214
1929
 
2215
- model.fit(X, y);
1930
+ Calculates autocorrelation at a given lag.
2216
1931
 
2217
- // Predict
2218
- const predictions = model.predict([[2, 2], [6, 6]]);
2219
- console.log(predictions); // ['spam', 'ham']
1932
+ **Parameters:**
1933
+ - `array`: Time series data
1934
+ - `lag`: Lag value (default: 1)
2220
1935
 
2221
- // Predict probabilities
2222
- const proba = model.predictProba([[4, 5]]);
2223
- console.log(proba); // [{ spam: 0.3, ham: 0.7 }]
1936
+ **Returns:**
1937
+ ```yaml
1938
+ type: statistic
1939
+ name: autocorrelation
1940
+ lag: 1
1941
+ value: 0.456
2224
1942
  ```
2225
1943
 
2226
- ---
2227
-
2228
- ##### Support Vector Machine (SVM)
2229
-
1944
+ **Example:**
2230
1945
  ```javascript
2231
- // Create SVM
2232
- const model = datly.ml.createSVM({
2233
- C: 1.0, // Regularization parameter
2234
- kernel: 'linear', // 'linear', 'rbf', 'poly'
2235
- gamma: 'scale', // 'scale', 'auto', or number
2236
- degree: 3, // For polynomial kernel
2237
- learningRate: 0.001,
2238
- iterations: 1000
2239
- });
2240
-
2241
- // Train
2242
- const X = [
2243
- [1, 2], [2, 3], [3, 4],
2244
- [6, 7], [7, 8], [8, 9]
2245
- ];
2246
- const y = [0, 0, 0, 1, 1, 1];
2247
-
2248
- model.fit(X, y);
2249
-
2250
- // Predict
2251
- const predictions = model.predict([[2, 2], [7, 7]]);
2252
- console.log(predictions); // [0, 1]
2253
-
2254
- // Summary
2255
- const summary = model.summary();
2256
- console.log(summary.trainingMetrics.nSupportVectors); // Number of support vectors
1946
+ const acf = datly.autocorrelation(data, 1);
2257
1947
  ```
2258
1948
 
2259
1949
  ---
2260
1950
 
2261
- #### Model Utilities
1951
+ ## Outlier Detection
2262
1952
 
2263
- ##### Train-Test Split
1953
+ ### `outliers_iqr(array)`
2264
1954
 
2265
- ```javascript
2266
- const X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]];
2267
- const y = [0, 0, 1, 1, 1];
1955
+ Detects outliers using the IQR (Interquartile Range) method.
2268
1956
 
2269
- const split = datly.ml.trainTestSplit(X, y, 0.2, true);
2270
- // {
2271
- // X_train: [[...], [...], [...]], // 80% of data
2272
- // X_test: [[...], [...]], // 20% of data
2273
- // y_train: [0, 1, 1],
2274
- // y_test: [0, 1]
2275
- // }
1957
+ **Returns:**
1958
+ ```yaml
1959
+ type: outlier_detection
1960
+ method: iqr
1961
+ lower_bound: 45.5
1962
+ upper_bound: 154.5
1963
+ n_outliers: 3
1964
+ outlier_indices:
1965
+ - 5
1966
+ - 12
1967
+ - 23
1968
+ outlier_values:
1969
+ - 200
1970
+ - 30
1971
+ - 180
2276
1972
  ```
2277
1973
 
2278
- ---
2279
-
2280
- ##### Cross-Validation
2281
-
1974
+ **Example:**
2282
1975
  ```javascript
2283
- const model = datly.ml.createKNN({ k: 3 });
2284
- const X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9]];
2285
- const y = [0, 0, 0, 0, 1, 1, 1, 1];
2286
-
2287
- const cv = datly.ml.crossValidate(model, X, y, 5, 'classification');
2288
- console.log(cv);
2289
- // {
2290
- // scores: [1.0, 0.8, 1.0, 0.8, 0.9],
2291
- // meanScore: 0.9,
2292
- // stdScore: 0.089,
2293
- // folds: 5
2294
- // }
1976
+ const data = [50, 55, 60, 65, 70, 200, 75, 80];
1977
+ const outliers = datly.outliers_iqr(data);
2295
1978
  ```
2296
1979
 
2297
- ---
2298
-
2299
- ##### Compare Models
1980
+ ### `outliers_zscore(array, threshold = 3)`
2300
1981
 
2301
- ```javascript
2302
- const X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7]];
2303
- const y = [0, 0, 0, 1, 1, 1];
1982
+ Detects outliers using z-score method.
2304
1983
 
2305
- const models = [
2306
- { name: 'KNN', model: datly.ml.createKNN({ k: 3 }) },
2307
- { name: 'Decision Tree', model: datly.ml.createDecisionTree() },
2308
- { name: 'Logistic Regression', model: datly.ml.createLogisticRegression() }
2309
- ];
1984
+ **Parameters:**
1985
+ - `array`: Array of numbers
1986
+ - `threshold`: Z-score threshold (default: 3)
2310
1987
 
2311
- const comparison = datly.ml.compareModels(models, X, y, 'classification');
2312
- console.log(comparison);
2313
- // {
2314
- // results: [
2315
- // { name: 'KNN', score: 0.95, trainTime: 5, evalTime: 2 },
2316
- // { name: 'Decision Tree', score: 0.90, trainTime: 15, evalTime: 1 },
2317
- // { name: 'Logistic Regression', score: 0.85, trainTime: 100, evalTime: 1 }
2318
- // ],
2319
- // bestModel: { name: 'KNN', score: 0.95, ... },
2320
- // comparison: "📊 MODEL COMPARISON REPORT\n..."
2321
- // }
1988
+ **Returns:**
1989
+ ```yaml
1990
+ type: outlier_detection
1991
+ method: zscore
1992
+ threshold: 3
1993
+ n_outliers: 2
1994
+ outlier_indices:
1995
+ - 5
1996
+ - 12
1997
+ outlier_values:
1998
+ - 200
1999
+ - 30
2322
2000
  ```
2323
2001
 
2324
- ---
2325
-
2326
- ##### Quick Train (One-liner)
2327
-
2002
+ **Example:**
2328
2003
  ```javascript
2329
- // Train and evaluate a model in one line
2330
- const result = datly.ml.quickTrain(
2331
- 'randomforest', // Model type
2332
- X, // Features
2333
- y, // Target
2334
- {
2335
- taskType: 'classification',
2336
- testSize: 0.2,
2337
- normalize: true,
2338
- nEstimators: 50
2339
- }
2340
- );
2341
-
2342
- console.log(result);
2343
- // {
2344
- // model: RandomForest {...},
2345
- // score: {
2346
- // accuracy: 0.95,
2347
- // confusionMatrix: {...}
2348
- // },
2349
- // trainTime: 150,
2350
- // summary: {...}
2351
- // }
2004
+ const outliers = datly.outliers_zscore(data, 3);
2352
2005
  ```
2353
2006
 
2354
2007
  ---
2355
2008
 
2356
- ##### Feature Engineering
2009
+ ## Visualization
2357
2010
 
2358
- ```javascript
2359
- // Polynomial features
2360
- const X = [[1], [2], [3]];
2361
- const polyFeatures = datly.ml.polynomialFeatures(X, 2);
2362
- console.log(polyFeatures);
2363
- // [[1, 1], [2, 4], [3, 9]] // [x, x²]
2011
+ All visualization functions create SVG-based charts. They accept optional configuration and a selector for where to render the chart.
2364
2012
 
2365
- // Standard scaling
2366
- const data = [[1, 2], [3, 4], [5, 6]];
2367
- const scaler = datly.ml.standardScaler(data);
2368
- console.log(scaler.scaled);
2369
- // [[-1.22, -1.22], [0, 0], [1.22, 1.22]]
2013
+ ### Configuration Options
2370
2014
 
2371
- // Transform new data
2372
- const newData = [[2, 3]];
2373
- const scaled = scaler.transform(newData);
2374
- console.log(scaled);
2015
+ Common options for all plots:
2016
+ - `width`: Chart width in pixels (default: 400)
2017
+ - `height`: Chart height in pixels (default: 400)
2018
+ - `color`: Primary color (default: '#000')
2019
+ - `background`: Background color (default: '#fff')
2020
+ - `title`: Chart title
2021
+ - `xlabel`: X-axis label
2022
+ - `ylabel`: Y-axis label
2375
2023
 
2376
- // Min-Max scaling
2377
- const minMaxScaler = datly.ml.minMaxScaler(data, [0, 1]);
2378
- console.log(minMaxScaler.scaled);
2379
- // [[0, 0], [0.5, 0.5], [1, 1]]
2380
- ```
2024
+ ### `plotHistogram(array, options = {}, selector)`
2381
2025
 
2382
- ---
2026
+ Creates a histogram.
2383
2027
 
2384
- ##### ROC Curve
2028
+ **Additional Options:**
2029
+ - `bins`: Number of bins (default: 10)
2385
2030
 
2031
+ **Example:**
2386
2032
  ```javascript
2387
- const yTrue = [0, 0, 1, 1, 1, 0, 1, 0];
2388
- const yProba = [0.1, 0.3, 0.6, 0.8, 0.9, 0.2, 0.7, 0.4];
2389
-
2390
- const roc = datly.ml.rocCurve(yTrue, yProba);
2391
- console.log(roc);
2392
- // {
2393
- // fpr: [0, 0, 0.25, 0.25, 0.5, ...],
2394
- // tpr: [0, 0.2, 0.2, 0.4, 0.4, ...],
2395
- // auc: 0.85, // Area Under Curve
2396
- // thresholds: [...]
2397
- // }
2398
- ```
2399
-
2400
- ---
2401
-
2402
- ##### Precision-Recall Curve
2403
-
2404
- ```javascript
2405
- const yTrue = [0, 0, 1, 1, 1, 0, 1, 0];
2406
- const yProba = [0.1, 0.3, 0.6, 0.8, 0.9, 0.2, 0.7, 0.4];
2407
-
2408
- const pr = datly.ml.precisionRecallCurve(yTrue, yProba);
2409
- console.log(pr);
2410
- // {
2411
- // precision: [0.5, 0.6, 0.67, ...],
2412
- // recall: [1.0, 0.8, 0.67, ...],
2413
- // thresholds: [...]
2414
- // }
2033
+ const data = [1, 2, 2, 3, 3, 3, 4, 4, 5];
2034
+ datly.plotHistogram(data, {
2035
+ width: 600,
2036
+ height: 400,
2037
+ bins: 10,
2038
+ title: 'Distribution',
2039
+ color: '#4CAF50'
2040
+ }, '#chart');
2415
2041
  ```
2416
2042
 
2417
- ---
2043
+ ### `plotScatter(x, y, options = {}, selector)`
2418
2044
 
2419
- ### 18. Data Visualization
2045
+ Creates a scatter plot.
2420
2046
 
2421
- Create interactive D3.js visualizations
2422
-
2423
- #### Setup
2047
+ **Additional Options:**
2048
+ - `size`: Point size (default: 4)
2424
2049
 
2050
+ **Example:**
2425
2051
  ```javascript
2426
- // Initialize visualizer
2427
- const viz = datly.viz;
2428
-
2429
- // Or create custom container
2430
- const customViz = new Datly.Visualizer('my-container-id');
2052
+ const x = [1, 2, 3, 4, 5];
2053
+ const y = [2, 4, 3, 5, 6];
2054
+ datly.plotScatter(x, y, {
2055
+ width: 600,
2056
+ height: 400,
2057
+ title: 'Scatter Plot',
2058
+ xlabel: 'X Variable',
2059
+ ylabel: 'Y Variable',
2060
+ size: 5
2061
+ }, '#chart');
2431
2062
  ```
2432
2063
 
2433
- ---
2064
+ ### `plotLine(x, y, options = {}, selector)`
2434
2065
 
2435
- #### Methods
2066
+ Creates a line chart.
2436
2067
 
2437
- ##### `histogram(data, options)`
2068
+ **Additional Options:**
2069
+ - `lineWidth`: Line width (default: 2)
2070
+ - `showPoints`: Show data points (default: false)
2438
2071
 
2072
+ **Example:**
2439
2073
  ```javascript
2440
- const ages = [23, 25, 27, 29, 31, 33, 35, 37, 39, 41];
2441
-
2442
- datly.plotHistogram(ages, {
2443
- title: 'Age Distribution',
2444
- xlabel: 'Age',
2445
- ylabel: 'Frequency',
2446
- bins: 10,
2447
- color: '#4299e1',
2448
- width: 800,
2449
- height: 600
2450
- });
2074
+ const x = [1, 2, 3, 4, 5];
2075
+ const y = [2, 4, 3, 5, 6];
2076
+ datly.plotLine(x, y, {
2077
+ lineWidth: 3,
2078
+ showPoints: true,
2079
+ title: 'Time Series'
2080
+ }, '#chart');
2451
2081
  ```
2452
2082
 
2453
- ---
2083
+ ### `plotBar(categories, values, options = {}, selector)`
2454
2084
 
2455
- ##### `boxplot(data, options)`
2085
+ Creates a bar chart.
2456
2086
 
2087
+ **Example:**
2457
2088
  ```javascript
2458
- // Single box plot
2459
- const data1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100];
2460
- datly.plotBoxplot(data1, {
2461
- title: 'Sales Distribution',
2462
- ylabel: 'Sales ($)'
2463
- });
2464
-
2465
- // Multiple box plots
2466
- const groupA = [20, 22, 23, 25, 27];
2467
- const groupB = [30, 32, 35, 37, 40];
2468
- const groupC = [25, 27, 29, 31, 33];
2469
-
2470
- datly.plotBoxplot([groupA, groupB, groupC], {
2471
- title: 'Sales by Region',
2472
- labels: ['North', 'South', 'East'],
2089
+ const categories = ['A', 'B', 'C', 'D'];
2090
+ const values = [10, 25, 15, 30];
2091
+ datly.plotBar(categories, values, {
2092
+ title: 'Sales by Category',
2473
2093
  ylabel: 'Sales ($)'
2474
- });
2094
+ }, '#chart');
2475
2095
  ```
2476
2096
 
2477
- ---
2478
-
2479
- ##### `scatter(x, y, options)`
2097
+ ### `plotBoxplot(data, options = {}, selector)`
2480
2098
 
2481
- ```javascript
2482
- const height = [160, 165, 170, 175, 180, 185];
2483
- const weight = [55, 60, 65, 70, 75, 80];
2099
+ Creates box plots for one or more groups.
2484
2100
 
2485
- datly.plotScatter(height, weight, {
2486
- title: 'Height vs Weight',
2487
- xlabel: 'Height (cm)',
2488
- ylabel: 'Weight (kg)',
2489
- color: '#e74c3c',
2490
- size: 6,
2491
- labels: ['Person 1', 'Person 2', ...] // Optional
2492
- });
2493
- ```
2494
-
2495
- ---
2496
-
2497
- ##### `line(x, y, options)`
2101
+ **Parameters:**
2102
+ - `data`: Array of arrays (each array is a group) or single array
2103
+ - `options`:
2104
+ - `labels`: Array of group labels
2498
2105
 
2106
+ **Example:**
2499
2107
  ```javascript
2500
- const months = [1, 2, 3, 4, 5, 6];
2501
- const revenue = [100, 120, 140, 130, 160, 180];
2108
+ const group1 = [1, 2, 3, 4, 5, 6];
2109
+ const group2 = [2, 3, 4, 5, 6, 7];
2110
+ const group3 = [3, 4, 5, 6, 7, 8];
2502
2111
 
2503
- datly.plotLine(months, revenue, {
2504
- title: 'Monthly Revenue',
2505
- xlabel: 'Month',
2506
- ylabel: 'Revenue ($1000)',
2507
- color: '#2ecc71',
2508
- lineWidth: 3,
2509
- showPoints: true
2510
- });
2112
+ datly.plotBoxplot([group1, group2, group3], {
2113
+ labels: ['Group A', 'Group B', 'Group C'],
2114
+ title: 'Comparison'
2115
+ }, '#chart');
2511
2116
  ```
2512
2117
 
2513
- ---
2514
-
2515
- ##### `bar(categories, values, options)`
2516
-
2517
- ```javascript
2518
- const products = ['Product A', 'Product B', 'Product C'];
2519
- const sales = [150, 230, 180];
2520
-
2521
- datly.plotBar(products, sales, {
2522
- title: 'Sales by Product',
2523
- xlabel: 'Product',
2524
- ylabel: 'Sales',
2525
- color: '#f39c12',
2526
- horizontal: false // Set to true for horizontal bars
2527
- });
2528
- ```
2118
+ ### `plotPie(labels, values, options = {}, selector)`
2529
2119
 
2530
- ---
2120
+ Creates a pie chart.
2531
2121
 
2532
- ##### `pie(labels, values, options)`
2122
+ **Additional Options:**
2123
+ - `showLabels`: Display labels (default: true)
2533
2124
 
2125
+ **Example:**
2534
2126
  ```javascript
2535
- const categories = ['Electronics', 'Clothing', 'Food', 'Other'];
2536
- const amounts = [35, 25, 20, 20];
2537
-
2538
- datly.plotPie(categories, amounts, {
2539
- title: 'Sales Distribution',
2540
- showLabels: true,
2541
- showPercentage: true
2542
- });
2127
+ const labels = ['Category A', 'Category B', 'Category C'];
2128
+ const values = [30, 45, 25];
2129
+ datly.plotPie(labels, values, {
2130
+ title: 'Market Share',
2131
+ showLabels: true
2132
+ }, '#chart');
2543
2133
  ```
2544
2134
 
2545
- ---
2135
+ ### `plotHeatmap(matrix, options = {}, selector)`
2546
2136
 
2547
- ##### `heatmap(matrix, options)`
2137
+ Creates a heatmap for a correlation matrix.
2548
2138
 
2139
+ **Additional Options:**
2140
+ - `labels`: Array of variable names
2141
+ - `showValues`: Display correlation values (default: true)
2142
+
2143
+ **Example:**
2549
2144
  ```javascript
2550
- const correlationMatrix = [
2145
+ const corrMatrix = [
2551
2146
  [1.0, 0.8, 0.3],
2552
2147
  [0.8, 1.0, 0.5],
2553
2148
  [0.3, 0.5, 1.0]
2554
2149
  ];
2555
2150
 
2556
- datly.plotHeatmap(correlationMatrix, {
2557
- title: 'Correlation Heatmap',
2151
+ datly.plotHeatmap(corrMatrix, {
2558
2152
  labels: ['Var1', 'Var2', 'Var3'],
2559
- colorScheme: 'RdBu', // Color scheme
2560
- showValues: true
2561
- });
2153
+ showValues: true,
2154
+ title: 'Correlation Matrix'
2155
+ }, '#chart');
2562
2156
  ```
2563
2157
 
- ---
+ ### `plotViolin(data, options = {}, selector)`
+
+ Creates violin plots showing distribution density.

- ##### `violin(data, options)`
+ **Parameters:**
+ - `data`: Array of arrays or single array
+ - `options`:
+   - `labels`: Group labels

+ **Example:**
  ```javascript
- const groupA = [1, 2, 3, 4, 5, 6, 7];
- const groupB = [3, 4, 5, 6, 7, 8, 9];
+ const group1 = [1, 2, 2, 3, 3, 3, 4, 4, 5];
+ const group2 = [2, 3, 3, 4, 4, 4, 5, 5, 6];

- datly.plotViolin([groupA, groupB], {
-   title: 'Distribution Comparison',
-   labels: ['Control', 'Treatment'],
-   ylabel: 'Value',
-   color: '#9b59b6'
- });
+ datly.plotViolin([group1, group2], {
+   labels: ['Before', 'After'],
+   title: 'Distribution Comparison'
+ }, '#chart');
  ```
 
- ---
+ ### `plotDensity(array, options = {}, selector)`

- ##### `density(data, options)`
+ Creates a kernel density plot.

- ```javascript
- const data = [23, 25, 27, 29, 31, 33, 35];
+ **Additional Options:**
+ - `bandwidth`: Smoothing bandwidth (default: 5)

+ **Example:**
+ ```javascript
+ const data = [1, 2, 2, 3, 3, 3, 4, 4, 5];
  datly.plotDensity(data, {
-   title: 'Density Plot',
-   xlabel: 'Value',
-   ylabel: 'Density',
-   color: '#1abc9c',
-   bandwidth: null // Auto-calculate or specify
- });
+   bandwidth: 0.5,
+   title: 'Density Plot'
+ }, '#chart');
  ```
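+ For intuition about what `bandwidth` controls, here is a plain-JS Gaussian kernel density estimate (illustrative only, not datly's internal implementation): smaller bandwidths follow the sample closely, larger ones smooth it out.
+
+ ```javascript
+ // Gaussian KDE sketch: returns a function that estimates density at x.
+ function gaussianKde(sample, bandwidth) {
+   const norm = 1 / (sample.length * bandwidth * Math.sqrt(2 * Math.PI));
+   return x => norm * sample.reduce((sum, xi) => {
+     const u = (x - xi) / bandwidth;
+     return sum + Math.exp(-0.5 * u * u);
+   }, 0);
+ }
+
+ const density = gaussianKde([1, 2, 2, 3, 3, 3, 4, 4, 5], 0.5);
+ console.log(density(3)); // estimated density near the mode
+ ```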
 
- ---
+ ### `plotQQ(array, options = {}, selector)`

- ##### `qqplot(data, options)`
+ Creates a Q-Q plot for normality assessment.

+ **Example:**
  ```javascript
- const data = [2.3, 2.5, 2.7, 2.9, 3.1, 3.3];
-
+ const data = [1.2, 2.3, 1.8, 2.1, 1.9, 2.0, 2.4];
  datly.plotQQ(data, {
-   title: 'Q-Q Plot',
-   xlabel: 'Theoretical Quantiles',
-   ylabel: 'Sample Quantiles',
-   color: '#34495e'
- });
+   title: 'Q-Q Plot'
+ }, '#chart');
  ```
 
- ---
+ ### `plotParallel(data, columns, options = {}, selector)`
+
+ Creates a parallel coordinates plot.

- ##### `parallel(data, dimensions, options)`
+ **Parameters:**
+ - `data`: Array of objects
+ - `columns`: Array of column names to include
+ - `options`:
+   - `colors`: Array of colors for each observation

+ **Example:**
  ```javascript
  const data = [
-   { age: 25, income: 30000, spending: 15000, satisfaction: 7 },
-   { age: 30, income: 45000, spending: 20000, satisfaction: 8 },
-   { age: 35, income: 60000, spending: 25000, satisfaction: 9 }
+   { age: 25, salary: 50000, experience: 2 },
+   { age: 30, salary: 60000, experience: 5 },
+   { age: 35, salary: 70000, experience: 8 }
  ];

- datly.plotParallel(data, ['age', 'income', 'spending', 'satisfaction'], {
-   title: 'Parallel Coordinates Plot',
-   colors: ['#e74c3c', '#3498db', '#2ecc71']
- });
+ datly.plotParallel(data, ['age', 'salary', 'experience'], {
+   title: 'Parallel Coordinates'
+ }, '#chart');
  ```
 
- ---
+ ### `plotPairplot(data, columns, options = {}, selector)`

- ##### `pairplot(data, columns, options)`
+ Creates a pairplot matrix showing all pairwise relationships.

+ **Parameters:**
+ - `data`: Array of objects
+ - `columns`: Array of column names
+ - `options`:
+   - `size`: Size of each subplot (default: 120)
+   - `color`: Point color
+
+ **Example:**
  ```javascript
  const data = [
-   { height: 160, weight: 55, age: 25 },
-   { height: 165, weight: 60, age: 30 },
-   { height: 170, weight: 65, age: 35 }
+   { age: 25, salary: 50000, experience: 2 },
+   { age: 30, salary: 60000, experience: 5 },
+   { age: 35, salary: 70000, experience: 8 }
  ];

- datly.plotPairplot(data, ['height', 'weight', 'age'], {
-   title: 'Pair Plot',
-   color: '#3498db',
-   size: 3
- });
+ datly.plotPairplot(data, ['age', 'salary', 'experience'], {
+   size: 150
+ }, '#chart');
  ```
 
- ---
+ ### `plotMultiline(series, options = {}, selector)`
+
+ Creates a multi-line chart for comparing time series.

- ##### `multiline(series, options)`
+ **Parameters:**
+ - `series`: Array of objects with `name` and `data` properties
+   - `data`: Array of `{x, y}` objects
+ - `options`:
+   - `legend`: Show legend (default: false)

+ **Example:**
  ```javascript
  const series = [
    {
      name: 'Series A',
-     data: [
-       { x: 1, y: 10 },
-       { x: 2, y: 15 },
-       { x: 3, y: 12 }
-     ]
+     data: [{x: 1, y: 10}, {x: 2, y: 20}, {x: 3, y: 15}]
    },
    {
      name: 'Series B',
-     data: [
-       { x: 1, y: 5 },
-       { x: 2, y: 8 },
-       { x: 3, y: 10 }
-     ]
+     data: [{x: 1, y: 15}, {x: 2, y: 25}, {x: 3, y: 20}]
    }
  ];

  datly.plotMultiline(series, {
-   title: 'Multiple Time Series',
-   xlabel: 'Time',
-   ylabel: 'Value',
-   legend: true
- });
+   legend: true,
+   title: 'Comparison'
+ }, '#chart');
  ```
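+ If your observations live in plain parallel arrays, a small helper (illustrative, not a datly API) can build the `{x, y}` series format shown above:
+
+ ```javascript
+ // Build one series object from a name plus parallel x/y arrays.
+ const toSeries = (name, xs, ys) => ({
+   name,
+   data: xs.map((x, i) => ({ x, y: ys[i] }))
+ });
+
+ const series = [
+   toSeries('Series A', [1, 2, 3], [10, 20, 15]),
+   toSeries('Series B', [1, 2, 3], [15, 25, 20])
+ ];
+
+ datly.plotMultiline(series, { legend: true, title: 'Comparison' }, '#chart');
+ ```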
 
  ---

- ##### Special Visualization Methods
-
- ```javascript
- // Correlation matrix heatmap from dataset
- const data = {
-   headers: ['var1', 'var2', 'var3'],
-   data: [
-     { var1: 1, var2: 2, var3: 3 },
-     { var1: 2, var2: 4, var3: 5 },
-     { var1: 3, var2: 5, var3: 7 }
-   ]
- };
+ ## Complete Example Workflow

- datly.plotCorrelationMatrix(data, {
-   title: 'Correlation Matrix'
- });
-
- // Distribution plot from dataset column
- datly.plotDistribution(data, 'var1', {
-   title: 'Distribution of var1'
- });
-
- // Compare multiple distributions
- datly.plotMultipleDistributions(data, ['var1', 'var2', 'var3'], {
-   title: 'Distribution Comparison'
- });
- ```
-
- ---
-
- ## 🎯 Complete Examples
-
- ### Example 1: Comprehensive Data Analysis
+ Here's a complete example demonstrating a typical data analysis workflow:

  ```javascript
- const datly = new Datly();
-
- // Load data
- const data = datly.dataLoader.loadCSV('sales_data.csv');
-
- // Validate data
- const validation = datly.validator.validateData(data);
- if (!validation.valid) {
-   console.error('Data validation failed:', validation.errors);
- }
-
- // Get descriptive statistics
- const sales = datly.dataLoader.getColumn(data, 'sales');
- console.log('Mean Sales:', datly.centralTendency.mean(sales));
- console.log('Median Sales:', datly.centralTendency.median(sales));
- console.log('Std Dev:', datly.dispersion.standardDeviation(sales));
-
- // Check for outliers
- const outliers = datly.utils.detectOutliers(sales, 'iqr');
- console.log('Outliers:', outliers);
-
- // Test normality
- const normalityTest = datly.normalityTests.shapiroWilk(sales);
- console.log('Is Normal:', normalityTest.isNormal);
-
- // Generate report
- const report = datly.reportGenerator.summary(data);
- console.log(report);
-
- datly.plotHistogram(sales, { title: 'Sales Distribution' });
- datly.plotBoxplot(sales, { title: 'Sales Box Plot' });
- ```
-
- ---
-
- ### Example 2: Hypothesis Testing Workflow
-
- ```javascript
- const datly = new Datly();
-
- // Two groups to compare
- const controlGroup = [23, 25, 27, 29, 31, 33];
- const treatmentGroup = [28, 30, 32, 34, 36, 38];
+ // 1. Load and explore data
+ const data = [
+   { age: 25, salary: 50000, experience: 2, department: 'IT' },
+   { age: 30, salary: 60000, experience: 5, department: 'HR' },
+   { age: 35, salary: 70000, experience: 8, department: 'IT' },
+   // ... more data
+ ];

- // Perform t-test
- const tTest = datly.hypothesisTesting.tTest(
-   controlGroup,
-   treatmentGroup,
-   'two-sample'
+ // 2. Perform EDA
+ const overview = datly.eda_overview(data);
+ console.log(overview);
+
+ // 3. Check correlations
+ const correlations = datly.df_corr(data, 'pearson');
+ console.log(correlations);
+
+ // 4. Prepare features and target
+ const X = data.map(d => [d.age, d.experience]);
+ const y = data.map(d => d.salary);
+
+ // 5. Split data
+ const split = datly.train_test_split(X, y, 0.2, 42);
+ const trainIndices = split.indices.train;
+ const testIndices = split.indices.test;
+
+ const X_train = trainIndices.map(i => X[i]);
+ const y_train = trainIndices.map(i => y[i]);
+ const X_test = testIndices.map(i => X[i]);
+ const y_test = testIndices.map(i => y[i]);
+
+ // 6. Scale features
+ const scaler = datly.standard_scaler_fit(X_train);
+ const X_train_scaled = datly.standard_scaler_transform(scaler, X_train);
+ const X_test_scaled = datly.standard_scaler_transform(scaler, X_test);
+
+ // 7. Train model
+ const model = datly.train_linear_regression(
+   JSON.parse(X_train_scaled).preview,
+   y_train
  );

- // Interpret results
- const interpretation = datly.interpreter.interpret(tTest);
- console.log(interpretation.plainLanguage);
- console.log('Decision:', interpretation.conclusion.decision);
- console.log('Effect Size:', interpretation.effectSize.magnitude);
-
- // Calculate confidence interval for difference
- const ci = datly.confidenceIntervals.meanDifference(
-   controlGroup,
-   treatmentGroup,
-   0.95
+ // 8. Make predictions
+ const predictions = datly.predict_linear(
+   model,
+   JSON.parse(X_test_scaled).preview
  );
- console.log('95% CI for difference:', ci.lowerBound, 'to', ci.upperBound);
-
- datly.plotBoxplot([controlGroup, treatmentGroup], {
-   title: 'Control vs Treatment',
-   labels: ['Control', 'Treatment']
- });
- ```
-
- ---

- ### Example 3: Correlation and Regression
-
- ```javascript
- const datly = new Datly();
-
- const data = {
-   headers: ['advertising', 'sales'],
-   data: [
-     { advertising: 10, sales: 100 },
-     { advertising: 15, sales: 150 },
-     { advertising: 20, sales: 180 },
-     { advertising: 25, sales: 230 },
-     { advertising: 30, sales: 270 }
-   ]
- };
-
- // Extract columns
- const advertising = datly.dataLoader.getColumn(data, 'advertising');
- const sales = datly.dataLoader.getColumn(data, 'sales');
-
- // Calculate correlation
- const correlation = datly.correlation.pearson(advertising, sales);
- console.log('Correlation:', correlation.correlation);
- console.log('P-value:', correlation.pValue);
-
- // Fit regression model
- const model = datly.regression.linear(advertising, sales);
- console.log('Equation:', model.equation);
- console.log('R²:', model.rSquared);
- console.log('Model significant:', model.pValueModel < 0.05);
-
- // Make prediction
- const newAdvertising = [35];
- const prediction = datly.regression.predict(model, newAdvertising);
- console.log('Predicted sales for $35k advertising:', prediction[0]);
-
- datly.plotScatter(advertising, sales, {
-   title: 'Advertising vs Sales',
-   xlabel: 'Advertising Budget ($1000)',
-   ylabel: 'Sales ($1000)'
- });
- ```
-
- ---
-
- ### Example 4: Machine Learning Pipeline
-
- ```javascript
- const datly = new Datly();
-
- // Load data
- const data = datly.dataLoader.loadJSON('iris.json');
-
- // Prepare features and target
- const X = data.data.map(row => [
-   row.sepal_length,
-   row.sepal_width,
-   row.petal_length,
-   row.petal_width
- ]);
- const y = data.data.map(row => row.species);
-
- // Split data
- const split = datly.ml.trainTestSplit(X, y, 0.2, true);
-
- // Create and train model
- const model = datly.ml.createRandomForest({
-   nEstimators: 100,
-   maxDepth: 10
- });
-
- model.fit(split.X_train, split.y_train, 'classification');
-
- // Evaluate
- const score = model.score(split.X_test, split.y_test);
- console.log('Accuracy:', score.accuracy);
- console.log('Confusion Matrix:', score.confusionMatrix.display);
-
- // Cross-validation
- const cv = datly.ml.crossValidate(model, X, y, 5, 'classification');
- console.log('CV Mean Score:', cv.meanScore);
- console.log('CV Std:', cv.stdScore);
-
- // Feature importance
- const importance = model.getFeatureImportance();
- console.log('Feature Importance:', importance);
- ```
-
- ---
-
- ### Example 5: Automatic Analysis
-
- ```javascript
- const datly = new Datly();
-
- // Load your data
- const data = datly.dataLoader.loadCSV('customer_data.csv');
-
- // Run automatic analysis
- const analysis = datly.autoAnalyzer.autoAnalyze(data, {
-   minCorrelationThreshold: 0.5,
-   significanceLevel: 0.05,
-   generateVisualizations: true
- });
-
- // View insights
- analysis.insights.forEach(insight => {
-   console.log(`${insight.icon} [${insight.priority}] ${insight.title}`);
-   console.log(`   ${insight.description}`);
-   if (insight.recommendation) {
-     console.log(`   → ${insight.recommendation}`);
-   }
- });
-
- // View recommended visualizations
- analysis.visualizationSuggestions.forEach(viz => {
-   console.log(`📊 ${viz.title} (${viz.type}) - Priority: ${viz.priority}`);
- });
-
- // Generate report
- const textReport = datly.reportGenerator.exportSummary(
-   analysis,
-   'text'
+ // 9. Evaluate model
+ const metrics = datly.metrics_regression(
+   y_test,
+   JSON.parse(predictions).predictions
  );
- console.log(textReport);
- ```
-
- ---
+ console.log(metrics);

- ## 📖 API Reference
-
- ### Core Classes
-
- - **`DataLoader`**: Load and manipulate datasets
- - **`Validator`**: Validate data integrity
- - **`Utils`**: Utility functions for data analysis
- - **`CentralTendency`**: Mean, median, mode calculations
- - **`Dispersion`**: Variance, standard deviation measures
- - **`Position`**: Quantiles, percentiles, rankings
- - **`Shape`**: Skewness and kurtosis analysis
- - **`HypothesisTesting`**: Statistical tests
- - **`ConfidenceIntervals`**: Interval estimation
- - **`NormalityTests`**: Test for normal distribution
- - **`Correlation`**: Correlation analysis
- - **`Regression`**: Regression modeling
- - **`ReportGenerator`**: Generate statistical reports
- - **`PatternDetector`**: Detect patterns in data
- - **`Interpreter`**: Interpret statistical results
- - **`AutoAnalyzer`**: Automated analysis
- - **`ML`**: Machine learning models
- - **`Visualizer`**: Data visualization
-
- ---
-
- ## 🌐 Browser Support
-
- - Chrome (latest)
- - Firefox (latest)
- - Safari (latest)
- - Edge (latest)
-
- **Requirements:**
- - Modern JavaScript (ES6+)
+ // 10. Visualize results
+ datly.plotScatter(y_test, JSON.parse(predictions).predictions, {
+   title: 'Actual vs Predicted',
+   xlabel: 'Actual',
+   ylabel: 'Predicted'
+ }, '#results');
+ ```
 
  ---

- ## 🤝 Contributing
-
- Contributions are welcome! Please follow these steps:
+ ## Tips and Best Practices

- 1. Fork the repository
- 2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
- 3. Commit your changes (`git commit -m 'Add AmazingFeature'`)
- 4. Push to the branch (`git push origin feature/AmazingFeature`)
- 5. Open a Pull Request
+ 1. **Data Preparation**: Always check for missing values and outliers before analysis
+ 2. **Feature Scaling**: Scale features before training distance-based models (KNN, SVM); see the sketch after this list
+ 3. **Cross-Validation**: Use cross-validation to assess model performance reliably
+ 4. **Model Selection**: Start with simple models (linear regression) before trying complex ones
+ 5. **Hyperparameter Tuning**: Experiment with different hyperparameters (k in KNN, max_depth in trees)
+ 6. **Visualization**: Always visualize your data and results to gain insights
+ 7. **Statistical Tests**: Check assumptions (normality, homogeneity of variance) before parametric tests
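+ As a minimal sketch of tips 2 and 3, the snippet below reuses the call signatures shown in the workflow example above (the fold splitting itself is plain JavaScript, not a datly API); exact return shapes may differ, so treat it as a starting point rather than a drop-in utility.
+
+ ```javascript
+ // Split n row indices into k roughly equal folds.
+ function kFoldIndices(n, k) {
+   const folds = Array.from({ length: k }, () => []);
+   for (let i = 0; i < n; i++) folds[i % k].push(i);
+   return folds;
+ }
+
+ // Cross-validate a linear regression, scaling inside each fold.
+ function crossValidateLinear(X, y, k = 5) {
+   return kFoldIndices(X.length, k).map(testIdx => {
+     const trainIdx = X.map((_, i) => i).filter(i => !testIdx.includes(i));
+
+     // Fit the scaler on the training fold only, then apply it to both folds.
+     const scaler = datly.standard_scaler_fit(trainIdx.map(i => X[i]));
+     const XTrain = JSON.parse(
+       datly.standard_scaler_transform(scaler, trainIdx.map(i => X[i]))
+     ).preview;
+     const XTest = JSON.parse(
+       datly.standard_scaler_transform(scaler, testIdx.map(i => X[i]))
+     ).preview;
+
+     const model = datly.train_linear_regression(XTrain, trainIdx.map(i => y[i]));
+     const preds = JSON.parse(datly.predict_linear(model, XTest)).predictions;
+     return datly.metrics_regression(testIdx.map(i => y[i]), preds);
+   });
+ }
+ ```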
 
  ---

- ## 📝 License
-
- MIT License
+ ## License

- Copyright (c) 2025 Datly
-
- Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including without limitation the rights
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- copies of the Software.
+ This documentation is provided as-is. Please refer to the library's official repository for licensing information.

  ---

- ## 📧 Contact
-
- For questions, issues, or feature requests:
- - GitHub Issues: [github.com/yourrepo/datly/issues](https://github.com/yourrepo/datly/issues)
- - NPM Package: [npmjs.com/package/datly](https://npmjs.com/package/datly)
+ ## Support

- **Made with ❤️ for the data science community**
+ For issues, questions, or contributions, please visit the official Datly.js repository.