datly 0.0.2 → 0.0.3
- package/README.MD +1773 -2386
- package/dist/datly.cjs +1 -1
- package/dist/datly.mjs +1 -1
- package/dist/datly.umd.js +1 -1
- package/package.json +3 -3
- package/src/code.js +2466 -0
- package/src/index.js +236 -480
- package/src/plot.js +609 -0
- package/src/core/dataLoader.js +0 -407
- package/src/core/utils.js +0 -306
- package/src/core/validator.js +0 -205
- package/src/dataviz/index.js +0 -1566
- package/src/descriptive/centralTendency.js +0 -208
- package/src/descriptive/dispersion.js +0 -273
- package/src/descriptive/position.js +0 -268
- package/src/descriptive/shape.js +0 -336
- package/src/inferential/confidenceIntervals.js +0 -561
- package/src/inferential/hypothesisTesting.js +0 -527
- package/src/inferential/normalityTests.js +0 -587
- package/src/insights/autoAnalyser.js +0 -685
- package/src/insights/interpreter.js +0 -543
- package/src/insights/patternDetector.js +0 -897
- package/src/insights/reportGenerator.js +0 -1072
- package/src/ml/ClassificationMetrics.js +0 -336
- package/src/ml/DecisionTree.js +0 -412
- package/src/ml/KNearestNeighbors.js +0 -317
- package/src/ml/LinearRegression.js +0 -179
- package/src/ml/LogisticRegression.js +0 -396
- package/src/ml/MachineLearning.js +0 -490
- package/src/ml/NaiveBayes.js +0 -296
- package/src/ml/RandomForest.js +0 -323
- package/src/ml/SupportVectorMachine.js +0 -299
- package/src/ml/baseModel.js +0 -106
- package/src/multivariate/correlation.js +0 -653
- package/src/multivariate/regression.js +0 -660
package/README.MD
CHANGED
@@ -1,2986 +1,2373 @@
|
|
1
|
-
#
|
2
|
-
|
3
|
-
|
4
|
-

|
5
|
-

|
6
|
-
|
7
|
-
**Javascript toolkit for data science, statistical analysis and machine learning.**
|
8
|
-
---
|
9
|
-
|
10
|
-
### ⚡ Key Features:
|
11
|
-
- 📈 **Complete Statistics Suite** - From descriptive stats to advanced hypothesis testing
|
12
|
-
- 🤖 **7 ML Algorithms** - Classification, regression, and ensemble methods
|
13
|
-
- 📊 **13 Visualization Types** - Interactive D3.js charts with one-line commands
|
14
|
-
- 🔄 **Auto-Analysis** - Intelligent data exploration with automated insights
|
15
|
-
- 🎨 **Zero Config** - Works out of the box, customizable when needed
|
16
|
-
- 🌐 **Universal** - Same API for browser and Node.js
|
1
|
+
# datly
|
2
|
+
A comprehensive JavaScript library for data analysis, statistics, machine learning, and visualization.
|
3
|
+
|
17
4
|
---
|
18
5
|
|
19
|
-
|
6
|
+
## Table of Contents
|
7
|
+
|
8
|
+
1. [Introduction](#introduction)
|
9
|
+
2. [Installation](#installation)
|
10
|
+
3. [Core Concepts](#core-concepts)
|
11
|
+
4. [Data Preparation](#data-preparation)
|
12
|
+
5. [Descriptive Statistics](#descriptive-statistics)
|
13
|
+
6. [Exploratory Data Analysis](#exploratory-data-analysis)
|
14
|
+
7. [Probability Distributions](#probability-distributions)
|
15
|
+
8. [Hypothesis Testing](#hypothesis-testing)
|
16
|
+
9. [Correlation Analysis](#correlation-analysis)
|
17
|
+
10. [Regression Models](#regression-models)
|
18
|
+
11. [Classification Models](#classification-models)
|
19
|
+
12. [Clustering](#clustering)
|
20
|
+
13. [Ensemble Methods](#ensemble-methods)
|
21
|
+
14. [Visualization](#visualization)
|
20
22
|
|
21
23
|
---
|
22
24
|
|
23
|
-
##
|
24
|
-
|
25
|
-
|
26
|
-
|
27
|
-
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
|
32
|
-
|
33
|
-
|
34
|
-
|
35
|
-
|
36
|
-
- [9. Confidence Intervals](#9-confidence-intervals)
|
37
|
-
- [10. Normality Tests](#10-normality-tests)
|
38
|
-
- [11. Correlation Analysis](#11-correlation-analysis)
|
39
|
-
- [12. Regression Analysis](#12-regression-analysis)
|
40
|
-
- [13. Report Generation](#13-report-generation)
|
41
|
-
- [14. Pattern Detection](#14-pattern-detection)
|
42
|
-
- [15. Result Interpretation](#15-result-interpretation)
|
43
|
-
- [16. Auto-Analysis](#16-auto-analysis)
|
44
|
-
- [17. Machine Learning](#17-machine-learning)
|
45
|
-
- [18. Data Visualization](#18-data-visualization)
|
46
|
-
- [Complete Examples](#complete-examples)
|
47
|
-
- [API Reference](#api-reference)
|
25
|
+
## Introduction
|
26
|
+
|
27
|
+
datly is a comprehensive JavaScript library that brings powerful data analysis, statistical testing, machine learning, and visualization capabilities to the browser and Node.js environments.
|
28
|
+
|
29
|
+
### Key Features
|
30
|
+
|
31
|
+
- **Descriptive Statistics**: Mean, median, variance, standard deviation, skewness, kurtosis
|
32
|
+
- **Statistical Tests**: t-tests, ANOVA, chi-square, normality tests
|
33
|
+
- **Machine Learning**: Linear/logistic regression, KNN, decision trees, random forests, Naive Bayes
|
34
|
+
- **Clustering**: K-means clustering
|
35
|
+
- **Dimensionality Reduction**: PCA (Principal Component Analysis)
|
36
|
+
- **Data Visualization**: Histograms, scatter plots, box plots, heatmaps, and more
|
37
|
+
- **Time Series**: Moving averages, exponential smoothing, autocorrelation
|
48
38
|
|
49
39
|
---
|
50
40
|
|
51
|
-
##
|
41
|
+
## Installation
|
52
42
|
|
53
43
|
### Browser (CDN)
|
54
44
|
|
55
45
|
```html
|
56
|
-
<!-- Include Datly -->
|
57
46
|
<script src="https://unpkg.com/datly"></script>
|
58
|
-
|
59
47
|
<script>
|
60
|
-
const
|
48
|
+
const result = datly.mean([1, 2, 3, 4, 5]);
|
49
|
+
console.log(result);
|
61
50
|
</script>
|
62
51
|
```
|
63
52
|
|
64
|
-
###
|
65
|
-
|
66
|
-
```bash
|
67
|
-
# Core library (statistics and machine learning)
|
68
|
-
npm install datly
|
69
|
-
```
|
53
|
+
### Module Import
|
70
54
|
|
71
55
|
```javascript
|
72
|
-
|
73
|
-
const datly = new Datly();
|
56
|
+
import * as datly from 'datly';
|
74
57
|
```
|
75
58
|
|
76
59
|
---
|
77
60
|
|
78
|
-
##
|
61
|
+
## Core Concepts
|
79
62
|
|
80
|
-
|
81
|
-
// Initialize the library
|
82
|
-
const datly = new Datly();
|
83
|
-
|
84
|
-
// Load data from CSV
|
85
|
-
const data = await datly.loadCSV('data.csv');
|
63
|
+
### Output Format
|
86
64
|
|
87
|
-
|
88
|
-
const ages = [25, 30, 35, 40, 45];
|
89
|
-
const meanAge = datly.mean(ages);
|
90
|
-
console.log('Mean Age:', meanAge); // 35
|
65
|
+
All analysis functions return results in a structured YAML-like text format that can be parsed or displayed:
|
91
66
|
|
92
|
-
|
93
|
-
|
94
|
-
|
95
|
-
|
96
|
-
|
97
|
-
|
98
|
-
// Create visualization
|
99
|
-
datly.plotHistogram(ages, {
|
100
|
-
title: 'Age Distribution',
|
101
|
-
xlabel: 'Age',
|
102
|
-
ylabel: 'Frequency'
|
103
|
-
});
|
67
|
+
```yaml
|
68
|
+
type: statistic
|
69
|
+
name: mean
|
70
|
+
value: 3
|
71
|
+
n: 5
|
104
72
|
```
|
105
73
|
|
106
|
-
|
107
|
-
|
108
|
-
|
74
|
+
This format makes it easy to:
|
75
|
+
- Display results in a readable format
|
76
|
+
- Parse results programmatically
|
77
|
+
- Store analysis outputs as text
|
78
|
+
- Share results across different systems
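A minimal sketch of the "parse results programmatically" point, assuming the output is flat `key: value` text like the `mean` example above. `parseFlatResult` is a hypothetical helper, not part of datly; nested results (lists, sub-maps) would need a real YAML parser.

```javascript
// Hypothetical helper (not a datly function): parse flat "key: value" lines.
function parseFlatResult(text) {
  const result = {};
  for (const line of text.split('\n')) {
    const idx = line.indexOf(':');
    if (idx === -1) continue; // skip blank or malformed lines
    const key = line.slice(0, idx).trim();
    const raw = line.slice(idx + 1).trim();
    if (key === '') continue;
    const num = Number(raw);
    // Numeric-looking values become numbers, everything else stays a string
    result[key] = raw !== '' && !Number.isNaN(num) ? num : raw;
  }
  return result;
}

const parsed = parseFlatResult('type: statistic\nname: mean\nvalue: 3\nn: 5');
console.log(parsed.value + 1); // 4
```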
|
109
79
|
|
110
|
-
|
80
|
+
---
|
111
81
|
|
112
|
-
|
82
|
+
## Data Preparation
|
113
83
|
|
114
|
-
|
84
|
+
### `dataframe_from_json(data)`
|
115
85
|
|
116
|
-
|
117
|
-
Load data from a CSV file.
|
86
|
+
Creates a dataframe summary from JSON data.
|
118
87
|
|
88
|
+
**Parameters:**
|
89
|
+
- `data`: Array of objects or single object
|
90
|
+
|
91
|
+
**Returns:**
|
92
|
+
```yaml
|
93
|
+
type: dataframe
|
94
|
+
columns:
|
95
|
+
- name
|
96
|
+
- age
|
97
|
+
- salary
|
98
|
+
n_rows: 100
|
99
|
+
n_cols: 3
|
100
|
+
dtypes:
|
101
|
+
- string
|
102
|
+
- number
|
103
|
+
- number
|
104
|
+
preview:
|
105
|
+
- name: alice
|
106
|
+
age: 30
|
107
|
+
salary: 50000
|
108
|
+
- name: bob
|
109
|
+
age: 25
|
110
|
+
salary: 45000
|
111
|
+
```
|
112
|
+
|
113
|
+
**Example:**
|
119
114
|
```javascript
|
120
|
-
const data =
|
121
|
-
|
122
|
-
|
123
|
-
|
124
|
-
|
125
|
-
});
|
115
|
+
const data = [
|
116
|
+
{ name: 'Alice', age: 30, salary: 50000 },
|
117
|
+
{ name: 'Bob', age: 25, salary: 45000 },
|
118
|
+
{ name: 'Charlie', age: 35, salary: 60000 }
|
119
|
+
];
|
126
120
|
|
127
|
-
|
128
|
-
|
129
|
-
// headers: ['product', 'sales', 'revenue'],
|
130
|
-
// data: [
|
131
|
-
// { product: 'A', sales: 100, revenue: 5000 },
|
132
|
-
// { product: 'B', sales: 150, revenue: 7500 }
|
133
|
-
// ],
|
134
|
-
// length: 2,
|
135
|
-
// columns: 3
|
136
|
-
// }
|
121
|
+
const df = datly.dataframe_from_json(data);
|
122
|
+
console.log(df);
|
137
123
|
```
|
138
124
|
|
139
|
-
|
140
|
-
- `filePath` (string): Path to CSV file
|
141
|
-
- `options` (object): Configuration options
|
142
|
-
- `delimiter` (string): Column delimiter (default: ',')
|
143
|
-
- `header` (boolean): First row contains headers (default: true)
|
144
|
-
- `skipEmptyLines` (boolean): Skip empty rows (default: true)
|
145
|
-
- `encoding` (string): File encoding (default: 'utf8')
|
125
|
+
---
|
146
126
|
|
147
|
-
|
127
|
+
## Descriptive Statistics
|
148
128
|
|
149
|
-
|
129
|
+
### `mean(array)`
|
150
130
|
|
151
|
-
|
152
|
-
|
131
|
+
Calculates the arithmetic mean of an array of numbers.
|
132
|
+
|
133
|
+
**Returns:**
|
134
|
+
```yaml
|
135
|
+
type: statistic
|
136
|
+
name: mean
|
137
|
+
n: 5
|
138
|
+
value: 3
|
139
|
+
```
|
153
140
|
|
141
|
+
**Example:**
|
154
142
|
```javascript
|
155
|
-
//
|
156
|
-
|
143
|
+
datly.mean([1, 2, 3, 4, 5]); // 3
|
144
|
+
```
|
145
|
+
|
146
|
+
### `median(array)`
|
157
147
|
|
158
|
-
|
159
|
-
const jsonString = '{"users": [{"name": "John", "age": 30}]}';
|
160
|
-
const data2 = await datly.loadJSON(jsonString);
|
148
|
+
Calculates the median value.
|
161
149
|
|
162
|
-
|
163
|
-
|
164
|
-
|
165
|
-
|
166
|
-
|
167
|
-
|
168
|
-
]
|
169
|
-
};
|
170
|
-
const data3 = await datly.loadJSON(obj);
|
150
|
+
**Returns:**
|
151
|
+
```yaml
|
152
|
+
type: statistic
|
153
|
+
name: median
|
154
|
+
n: 5
|
155
|
+
value: 3
|
171
156
|
```
|
172
157
|
|
173
|
-
**
|
174
|
-
|
175
|
-
|
176
|
-
|
177
|
-
|
158
|
+
**Example:**
|
159
|
+
```javascript
|
160
|
+
datly.median([1, 2, 3, 4, 5]); // 3
|
161
|
+
datly.median([1, 2, 3, 4]); // 2.5
|
162
|
+
```
|
178
163
|
|
179
|
-
|
164
|
+
### `variance(array, sample = true)`
|
180
165
|
|
181
|
-
|
166
|
+
Calculates the variance.
|
167
|
+
|
168
|
+
**Parameters:**
|
169
|
+
- `array`: Array of numbers
|
170
|
+
- `sample`: If true, uses sample variance (n-1); if false, uses population variance (n)
|
182
171
|
|
183
|
-
|
184
|
-
|
172
|
+
**Returns:**
|
173
|
+
```yaml
|
174
|
+
type: statistic
|
175
|
+
name: variance
|
176
|
+
sample: true
|
177
|
+
n: 5
|
178
|
+
value: 2.5
|
179
|
+
```
|
185
180
|
|
181
|
+
**Example:**
|
186
182
|
```javascript
|
187
|
-
|
188
|
-
|
189
|
-
|
183
|
+
datly.variance([1, 2, 3, 4, 5]); // Sample variance
|
184
|
+
datly.variance([1, 2, 3, 4, 5], false); // Population variance
|
185
|
+
```
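The difference between the two denominators can be sketched in plain JavaScript. This is the textbook formula, not datly's source, and `varianceSketch` is a hypothetical name.

```javascript
// Textbook variance with a selectable denominator (not datly's implementation).
function varianceSketch(values, sample = true) {
  const n = values.length;
  const mean = values.reduce((sum, x) => sum + x, 0) / n;
  const sumSq = values.reduce((sum, x) => sum + (x - mean) ** 2, 0);
  // Sample variance divides by n - 1, population variance by n
  return sumSq / (sample ? n - 1 : n);
}

console.log(varianceSketch([1, 2, 3, 4, 5]));        // 2.5
console.log(varianceSketch([1, 2, 3, 4, 5], false)); // 2
```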
|
190
186
|
|
191
|
-
|
192
|
-
|
193
|
-
|
194
|
-
});
|
187
|
+
### `stddeviation(array, sample = true)`
|
188
|
+
|
189
|
+
Calculates the standard deviation.
|
195
190
|
|
196
|
-
|
197
|
-
|
198
|
-
|
199
|
-
|
200
|
-
|
191
|
+
**Returns:**
|
192
|
+
```yaml
|
193
|
+
type: statistic
|
194
|
+
name: std_deviation
|
195
|
+
sample: true
|
196
|
+
n: 5
|
197
|
+
value: 1.5811388300841898
|
201
198
|
```
|
202
199
|
|
203
|
-
|
200
|
+
**Example:**
|
201
|
+
```javascript
|
202
|
+
datly.stddeviation([1, 2, 3, 4, 5]);
|
203
|
+
```
|
204
204
|
|
205
|
-
|
206
|
-
Remove rows with all null values.
|
205
|
+
### `minv(array)`
|
207
206
|
|
208
|
-
|
209
|
-
const dataset = {
|
210
|
-
headers: ['a', 'b', 'c'],
|
211
|
-
data: [
|
212
|
-
{ a: 1, b: 2, c: 3 },
|
213
|
-
{ a: null, b: null, c: null }, // Will be removed
|
214
|
-
{ a: 4, b: 5, c: 6 }
|
215
|
-
],
|
216
|
-
length: 3,
|
217
|
-
columns: 3
|
218
|
-
};
|
207
|
+
Returns the minimum value.
|
219
208
|
|
220
|
-
|
221
|
-
|
209
|
+
**Returns:**
|
210
|
+
```yaml
|
211
|
+
type: statistic
|
212
|
+
name: min
|
213
|
+
value: 1
|
222
214
|
```
|
223
215
|
|
224
|
-
|
225
|
-
|
226
|
-
##### `getColumn(dataset, columnName)`
|
227
|
-
Extract a single column as an array.
|
216
|
+
### `maxv(array)`
|
228
217
|
|
229
|
-
|
230
|
-
const data = {
|
231
|
-
headers: ['name', 'age', 'salary'],
|
232
|
-
data: [
|
233
|
-
{ name: 'Alice', age: 25, salary: 50000 },
|
234
|
-
{ name: 'Bob', age: 30, salary: 60000 },
|
235
|
-
{ name: 'Charlie', age: null, salary: 55000 }
|
236
|
-
]
|
237
|
-
};
|
218
|
+
Returns the maximum value.
|
238
219
|
|
239
|
-
|
240
|
-
|
220
|
+
**Returns:**
|
221
|
+
```yaml
|
222
|
+
type: statistic
|
223
|
+
name: max
|
224
|
+
value: 5
|
241
225
|
```
|
242
226
|
|
243
|
-
|
227
|
+
### `quantile(array, q)`
|
244
228
|
|
245
|
-
|
246
|
-
Extract multiple columns as an object.
|
229
|
+
Calculates the q-th quantile (0 ≤ q ≤ 1).
|
247
230
|
|
231
|
+
**Returns:**
|
232
|
+
```yaml
|
233
|
+
type: statistic
|
234
|
+
name: quantile
|
235
|
+
q: 0.25
|
236
|
+
n: 100
|
237
|
+
value: 25.5
|
238
|
+
```
|
239
|
+
|
240
|
+
**Example:**
|
248
241
|
```javascript
|
249
|
-
|
250
|
-
|
251
|
-
//
|
252
|
-
// age: [25, 30],
|
253
|
-
// salary: [50000, 60000, 55000]
|
254
|
-
// }
|
242
|
+
datly.quantile([1, 2, 3, 4, 5], 0.25); // First quartile
|
243
|
+
datly.quantile([1, 2, 3, 4, 5], 0.5); // Median
|
244
|
+
datly.quantile([1, 2, 3, 4, 5], 0.75); // Third quartile
|
255
245
|
```
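The README does not say which interpolation rule `quantile` uses. The sketch below assumes linear interpolation between the two nearest order statistics (a common default); `quantileSketch` is a hypothetical name, and datly's actual results may differ.

```javascript
// Quantile by linear interpolation between neighbouring sorted values.
// Assumption: position = q * (n - 1); datly's rule is not documented here.
function quantileSketch(values, q) {
  const sorted = [...values].sort((a, b) => a - b);
  const pos = q * (sorted.length - 1);
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  const weight = pos - lower;
  return sorted[lower] * (1 - weight) + sorted[upper] * weight;
}

console.log(quantileSketch([1, 2, 3, 4, 5], 0.25)); // 2
console.log(quantileSketch([1, 2, 3, 4], 0.5));     // 2.5
```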
|
256
246
|
|
257
|
-
|
247
|
+
### `skewness(array)`
|
248
|
+
|
249
|
+
Calculates the skewness (measure of asymmetry).
|
258
250
|
|
259
|
-
|
260
|
-
|
251
|
+
**Returns:**
|
252
|
+
```yaml
|
253
|
+
type: statistic
|
254
|
+
name: skewness
|
255
|
+
value: 0
|
256
|
+
```
|
261
257
|
|
258
|
+
**Example:**
|
262
259
|
```javascript
|
263
|
-
|
264
|
-
console.log(filtered.data);
|
265
|
-
// [{ name: 'Bob', age: 30, salary: 60000 }]
|
260
|
+
datly.skewness([1, 2, 3, 4, 5]); // ~0 for symmetric data
|
266
261
|
```
|
267
262
|
|
268
|
-
|
263
|
+
### `kurtosis(array)`
|
269
264
|
|
270
|
-
|
271
|
-
|
265
|
+
Calculates the kurtosis (measure of tailedness).
|
266
|
+
|
267
|
+
**Returns:**
|
268
|
+
```yaml
|
269
|
+
type: statistic
|
270
|
+
name: kurtosis
|
271
|
+
value: -1.2
|
272
|
+
```
|
272
273
|
|
274
|
+
**Example:**
|
273
275
|
```javascript
|
274
|
-
|
275
|
-
console.log(sorted.data);
|
276
|
-
// [
|
277
|
-
// { name: 'Bob', age: 30, salary: 60000 },
|
278
|
-
// { name: 'Charlie', age: null, salary: 55000 },
|
279
|
-
// { name: 'Alice', age: 25, salary: 50000 }
|
280
|
-
// ]
|
276
|
+
datly.kurtosis([1, 2, 3, 4, 5]);
|
281
277
|
```
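For reference, a moment-based sketch of both shape statistics. The sample-adjusted excess kurtosis below reproduces the `-1.2` shown for `[1, 2, 3, 4, 5]`, which suggests (but does not confirm) that datly uses a similar correction. `shapeSketch` is a hypothetical name and the adjustment needs at least four values.

```javascript
// Moment-based skewness and excess kurtosis (not datly's source).
function shapeSketch(values) {
  const n = values.length;
  const mean = values.reduce((s, x) => s + x, 0) / n;
  // k-th central moment with denominator n
  const moment = (k) => values.reduce((s, x) => s + (x - mean) ** k, 0) / n;
  const m2 = moment(2);
  const skewness = moment(3) / m2 ** 1.5;
  const excess = moment(4) / m2 ** 2 - 3;
  // Small-sample adjustment of excess kurtosis (requires n >= 4)
  const adjustedKurtosis =
    ((n - 1) / ((n - 2) * (n - 3))) * ((n + 1) * excess + 6);
  return { skewness, excess, adjustedKurtosis };
}

const shape = shapeSketch([1, 2, 3, 4, 5]);
console.log(shape.skewness);         // 0 for symmetric data
console.log(shape.adjustedKurtosis); // approximately -1.2
```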
|
282
278
|
|
283
279
|
---
|
284
280
|
|
285
|
-
|
281
|
+
## Exploratory Data Analysis
|
286
282
|
|
287
|
-
|
283
|
+
### `df_describe(data)`
|
288
284
|
|
289
|
-
|
285
|
+
Generates comprehensive descriptive statistics for a dataset.
|
290
286
|
|
291
|
-
|
292
|
-
|
287
|
+
**Returns:**
|
288
|
+
```yaml
|
289
|
+
type: describe
|
290
|
+
columns:
|
291
|
+
age:
|
292
|
+
dtype: number
|
293
|
+
count: 100
|
294
|
+
missing: 0
|
295
|
+
mean: 35.5
|
296
|
+
std: 10.2
|
297
|
+
min: 18
|
298
|
+
q1: 28
|
299
|
+
median: 35
|
300
|
+
q3: 43
|
301
|
+
max: 65
|
302
|
+
skewness: 0.15
|
303
|
+
kurtosis: -0.5
|
304
|
+
name:
|
305
|
+
dtype: string
|
306
|
+
count: 100
|
307
|
+
missing: 2
|
308
|
+
unique: 95
|
309
|
+
top:
|
310
|
+
- value: john
|
311
|
+
freq: 3
|
312
|
+
- value: alice
|
313
|
+
freq: 2
|
314
|
+
```
|
293
315
|
|
316
|
+
**Example:**
|
294
317
|
```javascript
|
295
|
-
const
|
296
|
-
|
297
|
-
|
298
|
-
|
299
|
-
|
300
|
-
]
|
301
|
-
};
|
318
|
+
const data = [
|
319
|
+
{ age: 25, salary: 50000, dept: 'IT' },
|
320
|
+
{ age: 30, salary: 60000, dept: 'HR' },
|
321
|
+
{ age: 35, salary: 70000, dept: 'IT' }
|
322
|
+
];
|
302
323
|
|
303
|
-
const
|
304
|
-
console.log(
|
305
|
-
// {
|
306
|
-
// valid: true,
|
307
|
-
// errors: [],
|
308
|
-
// warnings: ['Row 1: Extra columns: c']
|
309
|
-
// }
|
324
|
+
const description = datly.df_describe(data);
|
325
|
+
console.log(description);
|
310
326
|
```
|
311
327
|
|
312
|
-
|
328
|
+
### `df_missing_report(data)`
|
313
329
|
|
314
|
-
|
315
|
-
Validate that column contains numeric values.
|
330
|
+
Analyzes missing values in the dataset.
|
316
331
|
|
332
|
+
**Returns:**
|
333
|
+
```yaml
|
334
|
+
type: missing_report
|
335
|
+
rows:
|
336
|
+
- column: age
|
337
|
+
missing: 5
|
338
|
+
missing_rate: 0.05
|
339
|
+
- column: salary
|
340
|
+
missing: 0
|
341
|
+
missing_rate: 0
|
342
|
+
- column: name
|
343
|
+
missing: 10
|
344
|
+
missing_rate: 0.1
|
345
|
+
```
|
346
|
+
|
347
|
+
**Example:**
|
317
348
|
```javascript
|
318
|
-
const
|
319
|
-
const result = datly.validateNumericColumn(column);
|
320
|
-
console.log(result);
|
321
|
-
// {
|
322
|
-
// valid: true,
|
323
|
-
// validCount: 4,
|
324
|
-
// invalidCount: 2,
|
325
|
-
// cleanData: [1, 2, 4, 5]
|
326
|
-
// }
|
349
|
+
const report = datly.df_missing_report(data);
|
327
350
|
```
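A rough sketch of how such a report could be computed from an array of objects. Which values count as "missing" is an assumption here (`null`, `undefined`, empty string, `NaN`); `missingReportSketch` is a hypothetical name, not datly's implementation.

```javascript
// Count missing values per column across an array of row objects.
function missingReportSketch(rows) {
  // Union of all keys, so columns absent from some rows are still reported
  const columns = [...new Set(rows.flatMap((row) => Object.keys(row)))];
  const isMissing = (v) =>
    v === null || v === undefined || v === '' || Number.isNaN(v);
  return columns.map((column) => {
    const missing = rows.filter((row) => isMissing(row[column])).length;
    return {
      column,
      missing,
      missing_rate: rows.length ? missing / rows.length : 0,
    };
  });
}

const sampleRows = [
  { age: 25, salary: 50000 },
  { age: null, salary: 60000 },
  { age: 35, salary: 70000 },
  { age: 40 }, // salary key absent
];
console.log(missingReportSketch(sampleRows));
```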
|
328
351
|
|
329
|
-
|
352
|
+
### `df_corr(data, method = 'pearson')`
|
353
|
+
|
354
|
+
Calculates correlation matrix between numeric columns.
|
330
355
|
|
331
|
-
|
332
|
-
|
356
|
+
**Parameters:**
|
357
|
+
- `data`: Array of objects
|
358
|
+
- `method`: 'pearson' or 'spearman'
|
359
|
+
|
360
|
+
**Returns:**
|
361
|
+
```yaml
|
362
|
+
type: correlation_matrix
|
363
|
+
method: pearson
|
364
|
+
matrix:
|
365
|
+
age:
|
366
|
+
age: 1
|
367
|
+
salary: 0.85
|
368
|
+
experience: 0.92
|
369
|
+
salary:
|
370
|
+
age: 0.85
|
371
|
+
salary: 1
|
372
|
+
experience: 0.78
|
373
|
+
experience:
|
374
|
+
age: 0.92
|
375
|
+
salary: 0.78
|
376
|
+
experience: 1
|
377
|
+
```
|
333
378
|
|
379
|
+
**Example:**
|
334
380
|
```javascript
|
335
|
-
const
|
336
|
-
|
337
|
-
datly.validateSampleSize(sample, 5);
|
338
|
-
} catch (error) {
|
339
|
-
console.log(error.message); // "Sample size (3) must be at least 5"
|
340
|
-
}
|
381
|
+
const corr = datly.df_corr(data, 'pearson');
|
382
|
+
const spearman = datly.df_corr(data, 'spearman');
|
341
383
|
```
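Each cell of a Pearson matrix is a pairwise coefficient. A plain-JS sketch of that coefficient follows; it is the standard formula, not datly's code, and `pearsonSketch` is a hypothetical name.

```javascript
// Pearson correlation coefficient between two equal-length numeric arrays.
function pearsonSketch(x, y) {
  const n = x.length;
  const mean = (a) => a.reduce((s, v) => s + v, 0) / n;
  const mx = mean(x);
  const my = mean(y);
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

console.log(pearsonSketch([25, 30, 35], [50000, 60000, 70000])); // 1
```

A Spearman coefficient can be obtained by applying the same function to the ranks of each column instead of the raw values.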
|
342
384
|
|
343
|
-
|
385
|
+
### `eda_overview(data)`
|
344
386
|
|
345
|
-
|
346
|
-
Validate confidence level is between 0 and 1.
|
387
|
+
Generates a comprehensive EDA report combining describe, missing values, and correlation.
|
347
388
|
|
389
|
+
**Returns:**
|
390
|
+
```yaml
|
391
|
+
type: eda
|
392
|
+
summary:
|
393
|
+
age:
|
394
|
+
dtype: number
|
395
|
+
count: 100
|
396
|
+
mean: 35.5
|
397
|
+
std: 10.2
|
398
|
+
...
|
399
|
+
missing:
|
400
|
+
- column: age
|
401
|
+
missing: 5
|
402
|
+
missing_rate: 0.05
|
403
|
+
correlation:
|
404
|
+
age:
|
405
|
+
age: 1
|
406
|
+
salary: 0.85
|
407
|
+
```
|
408
|
+
|
409
|
+
**Example:**
|
348
410
|
```javascript
|
349
|
-
datly.
|
350
|
-
datly.validateConfidenceLevel(1.5); // throws error
|
411
|
+
const overview = datly.eda_overview(data);
|
351
412
|
```
|
352
413
|
|
353
414
|
---
|
354
415
|
|
355
|
-
|
356
|
-
|
357
|
-
General-purpose statistical utilities.
|
358
|
-
|
359
|
-
#### Methods
|
416
|
+
## Probability Distributions
|
360
417
|
|
361
|
-
|
362
|
-
Detect outliers using various methods.
|
418
|
+
### Normal Distribution
|
363
419
|
|
364
|
-
|
365
|
-
const data = [1, 2, 3, 4, 5, 100]; // 100 is an outlier
|
420
|
+
#### `normal_pdf(x, mu = 0, sigma = 1)`
|
366
421
|
|
367
|
-
|
368
|
-
const outliers1 = datly.detectOutliers(data, 'iqr');
|
369
|
-
console.log(outliers1);
|
370
|
-
// {
|
371
|
-
// outliers: [100],
|
372
|
-
// indices: [5],
|
373
|
-
// count: 1,
|
374
|
-
// percentage: 16.67
|
375
|
-
// }
|
422
|
+
Probability density function of normal distribution.
|
376
423
|
|
377
|
-
|
378
|
-
|
424
|
+
**Returns:**
|
425
|
+
```yaml
|
426
|
+
type: distribution
|
427
|
+
name: normal_pdf
|
428
|
+
params:
|
429
|
+
mu: 0
|
430
|
+
sigma: 1
|
431
|
+
value: 0.3989422804014327
|
432
|
+
```
|
379
433
|
|
380
|
-
|
381
|
-
|
434
|
+
**Example:**
|
435
|
+
```javascript
|
436
|
+
datly.normal_pdf(0); // PDF at x=0
|
437
|
+
datly.normal_pdf([0, 1, 2], 0, 1); // PDF for multiple values
|
382
438
|
```
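The density itself is a one-line formula. The sketch below is independent of datly (`normalPdfSketch` is a hypothetical name) and shows how the array form can be expressed with `map`.

```javascript
// Normal density: exp(-z^2 / 2) / (sigma * sqrt(2 * pi)), z = (x - mu) / sigma
function normalPdfSketch(x, mu = 0, sigma = 1) {
  const z = (x - mu) / sigma;
  return Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2 * Math.PI));
}

console.log(normalPdfSketch(0)); // about 0.39894
console.log([0, 1, 2].map((x) => normalPdfSketch(x, 0, 1)));
```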
|
383
439
|
|
384
|
-
|
385
|
-
- `'iqr'`: Interquartile Range method (default)
|
386
|
-
- `'zscore'`: Z-score method (|z| > 3)
|
387
|
-
- `'modified_zscore'`: Modified Z-score method
|
440
|
+
#### `normal_cdf(x, mu = 0, sigma = 1)`
|
388
441
|
|
389
|
-
|
442
|
+
Cumulative distribution function of normal distribution.
|
390
443
|
|
391
|
-
|
392
|
-
|
444
|
+
**Returns:**
|
445
|
+
```yaml
|
446
|
+
type: distribution
|
447
|
+
name: normal_cdf
|
448
|
+
params:
|
449
|
+
mu: 0
|
450
|
+
sigma: 1
|
451
|
+
value: 0.5
|
452
|
+
```
|
393
453
|
|
454
|
+
**Example:**
|
394
455
|
```javascript
|
395
|
-
|
396
|
-
|
397
|
-
console.log(freq);
|
398
|
-
// [
|
399
|
-
// { value: 'red', frequency: 3, relativeFrequency: 0.5, percentage: 50 },
|
400
|
-
// { value: 'blue', frequency: 2, relativeFrequency: 0.333, percentage: 33.33 },
|
401
|
-
// { value: 'green', frequency: 1, relativeFrequency: 0.167, percentage: 16.67 }
|
402
|
-
// ]
|
456
|
+
datly.normal_cdf(0); // P(X ≤ 0)
|
457
|
+
datly.normal_cdf(1.96); // P(X ≤ 1.96) ≈ 0.975
|
403
458
|
```
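JavaScript has no built-in error function, so a CDF needs an approximation. The sketch below uses the Abramowitz and Stegun polynomial (formula 7.1.26, absolute error around 1.5e-7); whether datly uses the same approximation is not stated, and both function names are hypothetical.

```javascript
// erf via the Abramowitz-Stegun 7.1.26 polynomial approximation.
function erfSketch(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t -
      0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Normal CDF expressed through erf: 0.5 * (1 + erf(z / sqrt(2)))
function normalCdfSketch(x, mu = 0, sigma = 1) {
  return 0.5 * (1 + erfSketch((x - mu) / (sigma * Math.SQRT2)));
}

console.log(normalCdfSketch(1.96)); // close to 0.975
```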
|
404
459
|
|
405
|
-
|
460
|
+
#### `normal_ppf(p, mu = 0, sigma = 1)`
|
406
461
|
|
407
|
-
|
408
|
-
|
462
|
+
Percent point function (inverse CDF) of normal distribution.
|
463
|
+
|
464
|
+
**Returns:**
|
465
|
+
```yaml
|
466
|
+
type: distribution
|
467
|
+
name: normal_ppf
|
468
|
+
params:
|
469
|
+
mu: 0
|
470
|
+
sigma: 1
|
471
|
+
value: 1.959963984540054
|
472
|
+
```
|
409
473
|
|
474
|
+
**Example:**
|
410
475
|
```javascript
|
411
|
-
|
412
|
-
|
413
|
-
data: [
|
414
|
-
{ category: 'A', sales: 100, profit: 20 },
|
415
|
-
{ category: 'B', sales: 150, profit: 30 },
|
416
|
-
{ category: 'A', sales: 200, profit: 40 },
|
417
|
-
{ category: 'B', sales: 180, profit: 35 }
|
418
|
-
]
|
419
|
-
};
|
476
|
+
datly.normal_ppf(0.975); // Returns ~1.96
|
477
|
+
```
|
420
478
|
|
421
|
-
|
422
|
-
sales: 'mean',
|
423
|
-
profit: 'sum'
|
424
|
-
});
|
479
|
+
### Binomial Distribution
|
425
480
|
|
426
|
-
|
427
|
-
// {
|
428
|
-
// A: {
|
429
|
-
// count: 2,
|
430
|
-
// mean_sales: 150,
|
431
|
-
// sum_profit: 60,
|
432
|
-
// data: [...]
|
433
|
-
// },
|
434
|
-
// B: {
|
435
|
-
// count: 2,
|
436
|
-
// mean_sales: 165,
|
437
|
-
// sum_profit: 65,
|
438
|
-
// data: [...]
|
439
|
-
// }
|
440
|
-
// }
|
441
|
-
```
|
442
|
-
|
443
|
-
**Aggregation functions:**
|
444
|
-
- `'mean'`: Average value
|
445
|
-
- `'median'`: Median value
|
446
|
-
- `'sum'`: Sum of values
|
447
|
-
- `'min'`: Minimum value
|
448
|
-
- `'max'`: Maximum value
|
449
|
-
- `'std'`: Standard deviation
|
450
|
-
- `'var'`: Variance
|
451
|
-
- `'count'`: Count of values
|
481
|
+
#### `binomial_pmf(k, n, p)`
|
452
482
|
|
453
|
-
|
483
|
+
Probability mass function of binomial distribution.
|
484
|
+
|
485
|
+
**Parameters:**
|
486
|
+
- `k`: Number of successes (can be array)
|
487
|
+
- `n`: Number of trials
|
488
|
+
- `p`: Probability of success
|
454
489
|
|
455
|
-
|
456
|
-
|
490
|
+
**Returns:**
|
491
|
+
```yaml
|
492
|
+
type: distribution
|
493
|
+
name: binomial_pmf
|
494
|
+
params:
|
495
|
+
n: 10
|
496
|
+
p: 0.5
|
497
|
+
value: 0.24609375
|
498
|
+
```
|
457
499
|
|
500
|
+
**Example:**
|
458
501
|
```javascript
|
459
|
-
|
460
|
-
|
461
|
-
|
462
|
-
length: 100
|
463
|
-
};
|
464
|
-
|
465
|
-
// Random sampling
|
466
|
-
const randomSample = datly.sample(data, 10, 'random');
|
502
|
+
datly.binomial_pmf(5, 10, 0.5); // P(X = 5)
|
503
|
+
datly.binomial_pmf([0, 1, 2, 3], 10, 0.3); // Multiple values
|
504
|
+
```
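A plain-JS sketch of the binomial PMF and its running sum, written from the textbook definition rather than datly's source. For `n = 10`, `p = 0.5`, `k = 5` it yields the same figures as the sample outputs in this section. Both function names are hypothetical.

```javascript
// P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
function binomialPmfSketch(k, n, p) {
  if (k < 0 || k > n) return 0;
  let coeff = 1;
  // Build C(n, k) multiplicatively to avoid large factorials
  for (let i = 1; i <= k; i++) coeff = (coeff * (n - k + i)) / i;
  return coeff * p ** k * (1 - p) ** (n - k);
}

// P(X <= k) as the sum of PMF terms from 0 to k
function binomialCdfSketch(k, n, p) {
  let total = 0;
  for (let i = 0; i <= Math.min(k, n); i++) total += binomialPmfSketch(i, n, p);
  return total;
}

console.log(binomialPmfSketch(5, 10, 0.5)); // 0.24609375
console.log(binomialCdfSketch(5, 10, 0.5)); // 0.623046875
```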
|
467
505
|
|
468
|
-
|
469
|
-
const systematicSample = datly.sample(data, 10, 'systematic');
|
506
|
+
#### `binomial_cdf(k, n, p)`
|
470
507
|
|
471
|
-
|
472
|
-
const firstSample = datly.sample(data, 10, 'first');
|
508
|
+
Cumulative distribution function of binomial distribution.
|
473
509
|
|
474
|
-
|
475
|
-
|
510
|
+
**Returns:**
|
511
|
+
```yaml
|
512
|
+
type: distribution
|
513
|
+
name: binomial_cdf
|
514
|
+
params:
|
515
|
+
n: 10
|
516
|
+
p: 0.5
|
517
|
+
value: 0.623046875
|
476
518
|
```
|
477
519
|
|
478
|
-
|
520
|
+
### Poisson Distribution
|
479
521
|
|
480
|
-
|
481
|
-
Bootstrap resampling for estimating statistic distribution.
|
522
|
+
#### `poisson_pmf(k, lambda)`
|
482
523
|
|
483
|
-
|
484
|
-
const data = [23, 25, 27, 29, 31, 33, 35];
|
524
|
+
Probability mass function of Poisson distribution.
|
485
525
|
|
486
|
-
|
487
|
-
|
488
|
-
|
489
|
-
|
490
|
-
|
491
|
-
|
492
|
-
|
493
|
-
// }
|
526
|
+
**Returns:**
|
527
|
+
```yaml
|
528
|
+
type: distribution
|
529
|
+
name: poisson_pmf
|
530
|
+
params:
|
531
|
+
lambda: 3
|
532
|
+
value: 0.22404180765538775
|
494
533
|
```
|
495
534
|
|
496
|
-
**
|
497
|
-
|
498
|
-
|
499
|
-
|
500
|
-
- `'var'`: Bootstrap variance
|
501
|
-
- Custom function: Pass your own function
|
502
|
-
|
503
|
-
---
|
535
|
+
**Example:**
|
536
|
+
```javascript
|
537
|
+
datly.poisson_pmf(3, 3); // P(X = 3) when λ = 3
|
538
|
+
```
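The Poisson PMF can be sketched in log space to stay stable for larger `k`. This is the standard formula, assuming `lambda > 0` and a non-negative integer `k`; `poissonPmfSketch` is a hypothetical name, not datly's code.

```javascript
// P(X = k) = lambda^k * exp(-lambda) / k!, evaluated via logarithms.
function poissonPmfSketch(k, lambda) {
  if (k < 0 || !Number.isInteger(k)) return 0;
  let logFactorial = 0;
  for (let i = 2; i <= k; i++) logFactorial += Math.log(i);
  return Math.exp(k * Math.log(lambda) - lambda - logFactorial);
}

console.log(poissonPmfSketch(3, 3)); // about 0.22404
```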
|
504
539
|
|
505
|
-
|
506
|
-
Create contingency table for categorical data.
|
540
|
+
#### `poisson_cdf(k, lambda)`
|
507
541
|
|
508
|
-
|
509
|
-
const gender = ['M', 'F', 'M', 'F', 'M', 'F'];
|
510
|
-
const preference = ['A', 'B', 'A', 'A', 'B', 'B'];
|
542
|
+
Cumulative distribution function of Poisson distribution.
|
511
543
|
|
512
|
-
|
513
|
-
|
514
|
-
|
515
|
-
|
516
|
-
|
517
|
-
|
518
|
-
|
519
|
-
// totals: {
|
520
|
-
// row: { M: 3, F: 3 },
|
521
|
-
// col: { A: 3, B: 3 },
|
522
|
-
// grand: 6
|
523
|
-
// },
|
524
|
-
// rows: ['M', 'F'],
|
525
|
-
// columns: ['A', 'B']
|
526
|
-
// }
|
544
|
+
**Returns:**
|
545
|
+
```yaml
|
546
|
+
type: distribution
|
547
|
+
name: poisson_cdf
|
548
|
+
params:
|
549
|
+
lambda: 3
|
550
|
+
value: 0.6472319374260858
|
527
551
|
```
|
528
552
|
|
529
553
|
---
|
530
554
|
|
531
|
-
|
555
|
+
## Hypothesis Testing
|
532
556
|
|
533
|
-
|
557
|
+
### `t_test_one_sample(array, hypothesized_mean)`
|
534
558
|
|
535
|
-
|
559
|
+
One-sample t-test.
|
536
560
|
|
537
|
-
|
538
|
-
|
561
|
+
**Returns:**
|
562
|
+
```yaml
|
563
|
+
type: hypothesis_test
|
564
|
+
name: one_sample_t_test
|
565
|
+
statistic: 2.345
|
566
|
+
df: 99
|
567
|
+
p_value: 0.021
|
568
|
+
mean: 105
|
569
|
+
hypothesized_mean: 100
|
570
|
+
```
|
539
571
|
|
572
|
+
**Example:**
|
540
573
|
```javascript
|
541
|
-
const data = [
|
542
|
-
|
543
|
-
console.log(avg); // 30
|
574
|
+
const data = [102, 98, 105, 110, 95, 100, 108];
|
575
|
+
datly.t_test_one_sample(data, 100);
|
544
576
|
```
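The `statistic` and `df` fields follow from a short calculation. The sketch below covers only those two fields (a p-value needs the t-distribution CDF, which is omitted); it is the textbook formula, and `oneSampleTSketch` is a hypothetical name.

```javascript
// t = (mean - mu0) / (s / sqrt(n)), with df = n - 1
function oneSampleTSketch(values, hypothesizedMean) {
  const n = values.length;
  const mean = values.reduce((s, x) => s + x, 0) / n;
  // Sample variance (denominator n - 1)
  const variance = values.reduce((s, x) => s + (x - mean) ** 2, 0) / (n - 1);
  const standardError = Math.sqrt(variance / n);
  return {
    statistic: (mean - hypothesizedMean) / standardError,
    df: n - 1,
    mean,
  };
}

console.log(oneSampleTSketch([102, 98, 105, 110, 95, 100, 108], 100));
```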
|
545
577
|
|
546
|
-
|
578
|
+
### `t_test_paired(array1, array2)`
|
547
579
|
|
548
|
-
|
549
|
-
Calculate median (middle value).
|
580
|
+
Paired samples t-test.
|
550
581
|
|
551
|
-
|
552
|
-
|
553
|
-
|
582
|
+
**Returns:**
|
583
|
+
```yaml
|
584
|
+
type: hypothesis_test
|
585
|
+
name: paired_t_test
|
586
|
+
statistic: 3.456
|
587
|
+
df: 29
|
588
|
+
p_value: 0.0018
|
589
|
+
mean_difference: 2.5
|
590
|
+
```
|
554
591
|
|
555
|
-
|
556
|
-
|
592
|
+
**Example:**
|
593
|
+
```javascript
|
594
|
+
const before = [120, 115, 130, 125, 140];
|
595
|
+
const after = [115, 110, 125, 120, 135];
|
596
|
+
datly.t_test_paired(before, after);
|
557
597
|
```
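A paired test is a one-sample test on the element-wise differences. Note that in the example above every difference equals 5, so the standard deviation of the differences is zero and the textbook statistic is not finite. The sketch below therefore uses different, made-up data; `pairedTSketch` is a hypothetical name.

```javascript
// Paired t statistic: one-sample t on the differences against zero.
function pairedTSketch(first, second) {
  const diffs = first.map((x, i) => x - second[i]);
  const n = diffs.length;
  const meanDiff = diffs.reduce((s, d) => s + d, 0) / n;
  const variance = diffs.reduce((s, d) => s + (d - meanDiff) ** 2, 0) / (n - 1);
  return {
    statistic: meanDiff / Math.sqrt(variance / n),
    df: n - 1,
    mean_difference: meanDiff,
  };
}

// Illustrative data (not from the README): differences are 2, 5, 4, 6, 3
const beforeSketch = [120, 115, 130, 125, 140];
const afterSketch = [118, 110, 126, 119, 137];
console.log(pairedTSketch(beforeSketch, afterSketch));
```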
|
558
598
|
|
559
|
-
|
599
|
+
### `t_test_independent(array1, array2, equal_var = true)`
|
560
600
|
|
561
|
-
|
562
|
-
Find mode (most frequent value).
|
601
|
+
Independent samples t-test.
|
563
602
|
|
564
|
-
|
565
|
-
|
566
|
-
const modeResult = datly.mode(data);
|
567
|
-
console.log(modeResult);
|
568
|
-
// {
|
569
|
-
// values: [3],
|
570
|
-
// frequency: 3,
|
571
|
-
// isMultimodal: false,
|
572
|
-
// isUniform: false
|
573
|
-
// }
|
603
|
+
**Parameters:**
|
604
|
+
- `equal_var`: If true, assumes equal variances (pooled t-test); if false, uses Welch's t-test
|
574
605
|
|
575
|
-
|
576
|
-
|
577
|
-
|
578
|
-
|
579
|
-
|
580
|
-
|
581
|
-
|
582
|
-
|
583
|
-
|
584
|
-
|
606
|
+
**Returns:**
|
607
|
+
```yaml
|
608
|
+
type: hypothesis_test
|
609
|
+
name: independent_t_test
|
610
|
+
statistic: 2.105
|
611
|
+
df: 48
|
612
|
+
p_value: 0.041
|
613
|
+
means:
|
614
|
+
group_a: 105.5
|
615
|
+
group_b: 98.3
|
585
616
|
```
|
586
617
|
|
587
|
-
|
588
|
-
|
589
|
-
##### `geometricMean(data)`
|
590
|
-
Calculate geometric mean (for positive values).
|
591
|
-
|
618
|
+
**Example:**
|
592
619
|
```javascript
|
593
|
-
const
|
594
|
-
const
|
595
|
-
|
620
|
+
const group1 = [100, 105, 110, 115, 120];
|
621
|
+
const group2 = [95, 98, 100, 102, 105];
|
622
|
+
datly.t_test_independent(group1, group2);
|
596
623
|
```
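The effect of `equal_var` can be sketched as two ways of computing the standard error and degrees of freedom. This follows the textbook pooled and Welch formulas, not datly's source; `independentTSketch` is a hypothetical name.

```javascript
// Two-sample t statistic with pooled (equalVar) or Welch standard error.
function independentTSketch(a, b, equalVar = true) {
  const mean = (v) => v.reduce((s, x) => s + x, 0) / v.length;
  const variance = (v) => {
    const m = mean(v);
    return v.reduce((s, x) => s + (x - m) ** 2, 0) / (v.length - 1);
  };
  const na = a.length;
  const nb = b.length;
  const va = variance(a);
  const vb = variance(b);
  let se;
  let df;
  if (equalVar) {
    // Pooled variance weights each group by its degrees of freedom
    const pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2);
    se = Math.sqrt(pooled * (1 / na + 1 / nb));
    df = na + nb - 2;
  } else {
    // Welch-Satterthwaite approximation for the degrees of freedom
    const ta = va / na;
    const tb = vb / nb;
    se = Math.sqrt(ta + tb);
    df = (ta + tb) ** 2 / (ta ** 2 / (na - 1) + tb ** 2 / (nb - 1));
  }
  return { statistic: (mean(a) - mean(b)) / se, df };
}

const g1 = [100, 105, 110, 115, 120];
const g2 = [95, 98, 100, 102, 105];
console.log(independentTSketch(g1, g2));        // pooled, df = 8
console.log(independentTSketch(g1, g2, false)); // Welch, fractional df
```

With equal group sizes the two statistics coincide and only the degrees of freedom differ.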
|
597
624
|
|
598
|
-
|
625
|
+
### `z_test_one_sample(array, mu = 0, sigma = null, alpha = 0.05)`
|
599
626
|
|
600
|
-
|
627
|
+
One-sample z-test with confidence interval.
|
601
628
|
|
602
|
-
|
603
|
-
|
629
|
+
**Returns:**
|
630
|
+
```yaml
|
631
|
+
type: hypothesis_test
|
632
|
+
name: one_sample_z_test
|
633
|
+
statistic: 2.345
|
634
|
+
p_value: 0.019
|
635
|
+
ci_lower: 102.5
|
636
|
+
ci_upper: 107.5
|
637
|
+
confidence: 0.95
|
638
|
+
extra:
|
639
|
+
sample_mean: 105
|
640
|
+
hypothesized_mean: 100
|
641
|
+
se: 2.13
|
642
|
+
sigma_used: 10
|
643
|
+
n: 22
|
644
|
+
effect_size: 0.5
|
645
|
+
```
|
604
646
|
|
647
|
+
**Example:**
|
605
648
|
```javascript
|
606
|
-
|
607
|
-
const harmMean = datly.harmonicMean(speeds);
|
608
|
-
console.log(harmMean); // 40 (average speed)
|
649
|
+
datly.z_test_one_sample([102, 98, 105, 110], 100, 5, 0.05);
|
609
650
|
```
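A sketch of the statistic and confidence bounds for a known `sigma`. The critical value is passed in rather than derived from `alpha`; its default is the `normal_ppf(0.975)` value quoted earlier in this README. `zTestSketch` is a hypothetical name and the field names simply mirror the sample output.

```javascript
// z = (mean - mu) / (sigma / sqrt(n)); CI = mean +/- zCritical * se
function zTestSketch(values, mu, sigma, zCritical = 1.959963984540054) {
  const n = values.length;
  const mean = values.reduce((s, x) => s + x, 0) / n;
  const se = sigma / Math.sqrt(n);
  return {
    statistic: (mean - mu) / se,
    ci_lower: mean - zCritical * se,
    ci_upper: mean + zCritical * se,
  };
}

console.log(zTestSketch([102, 98, 105, 110], 100, 5).statistic); // 1.5
```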
|
610
651
|
|
611
|
-
|
652
|
+
### `anova_oneway(groups, alpha = 0.05)`
|
612
653
|
|
613
|
-
|
654
|
+
One-way ANOVA test.
|
655
|
+
|
656
|
+
**Parameters:**
|
657
|
+
- `groups`: Array of arrays, each representing a group
|
614
658
|
|
615
|
-
|
616
|
-
|
659
|
+
**Returns:**
|
660
|
+
```yaml
|
661
|
+
type: hypothesis_test
|
662
|
+
name: anova_oneway
|
663
|
+
statistic: 5.678
|
664
|
+
df:
|
665
|
+
between: 2
|
666
|
+
within: 27
|
667
|
+
p_value: 0.009
|
668
|
+
confidence: 0.95
|
669
|
+
extra:
|
670
|
+
group_means:
|
671
|
+
- 102.5
|
672
|
+
- 108.3
|
673
|
+
- 115.7
|
674
|
+
grand_mean: 108.8
|
675
|
+
ssb: 450.5
|
676
|
+
ssw: 890.2
|
677
|
+
```
|
617
678
|
|
679
|
+
**Example:**
|
618
680
|
```javascript
|
619
|
-
const
|
620
|
-
const
|
621
|
-
|
681
|
+
const group1 = [100, 105, 110];
|
682
|
+
const group2 = [108, 112, 115];
|
683
|
+
const group3 = [115, 120, 125];
|
684
|
+
datly.anova_oneway([group1, group2, group3]);
|
622
685
|
```
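The `ssb`, `ssw`, and `df` fields map directly onto the usual sums of squares. A textbook sketch follows (no p-value, since that needs the F-distribution CDF); `anovaSketch` is a hypothetical name, not datly's implementation.

```javascript
// One-way ANOVA F statistic from between- and within-group sums of squares.
function anovaSketch(groups) {
  const mean = (v) => v.reduce((s, x) => s + x, 0) / v.length;
  const all = groups.flat();
  const grandMean = mean(all);
  let ssb = 0; // between-group sum of squares
  let ssw = 0; // within-group sum of squares
  for (const g of groups) {
    const m = mean(g);
    ssb += g.length * (m - grandMean) ** 2;
    ssw += g.reduce((s, x) => s + (x - m) ** 2, 0);
  }
  const dfBetween = groups.length - 1;
  const dfWithin = all.length - groups.length;
  return {
    statistic: ssb / dfBetween / (ssw / dfWithin),
    dfBetween,
    dfWithin,
    ssb,
    ssw,
  };
}

console.log(anovaSketch([[100, 105, 110], [108, 112, 115], [115, 120, 125]]));
```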
|
623
686
|
|
624
|
-
|
687
|
+
### `chi_square_independence(observed, alpha = 0.05)`
|
625
688
|
|
626
|
-
|
627
|
-
Calculate weighted average.
|
689
|
+
Chi-square test for independence (contingency table).
|
628
690
|
|
629
|
-
|
630
|
-
|
631
|
-
|
632
|
-
|
633
|
-
|
634
|
-
|
691
|
+
**Parameters:**
|
692
|
+
- `observed`: 2D array (contingency table)
|
693
|
+
|
694
|
+
**Returns:**
|
695
|
+
```yaml
|
696
|
+
type: hypothesis_test
|
697
|
+
name: chi_square_independence
|
698
|
+
statistic: 8.456
|
699
|
+
df: 2
|
700
|
+
p_value: 0.015
|
701
|
+
confidence: 0.95
|
702
|
+
extra:
|
703
|
+
observed:
|
704
|
+
- - 10
|
705
|
+
- 20
|
706
|
+
- 30
|
707
|
+
- - 15
|
708
|
+
- 25
|
709
|
+
- 35
|
710
|
+
expected:
|
711
|
+
- - 12.5
|
712
|
+
- 22.5
|
713
|
+
- 32.5
|
714
|
+
- - 12.5
|
715
|
+
- 22.5
|
716
|
+
- 32.5
|
717
|
+
dof: 2
|
718
|
+
```
|
719
|
+
|
720
|
+
**Example:**
|
721
|
+
```javascript
|
722
|
+
const table = [
|
723
|
+
[10, 20, 30],
|
724
|
+
[15, 25, 35]
|
725
|
+
];
|
726
|
+
datly.chi_square_independence(table);
|
635
727
|
```
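Expected counts in a contingency table are `row total × column total / grand total`. The sketch below applies that standard rule; for the 2×3 table in the example it gives first-row expected counts of about 11.11, 20, and 28.89, which differ from the illustrative figures in the sample output above. `chiSquareIndependenceSketch` is a hypothetical name.

```javascript
// Chi-square statistic for independence in an r x c table of counts.
function chiSquareIndependenceSketch(observed) {
  const rowTotals = observed.map((row) => row.reduce((s, v) => s + v, 0));
  const colTotals = observed[0].map((_, j) =>
    observed.reduce((s, row) => s + row[j], 0)
  );
  const grand = rowTotals.reduce((s, v) => s + v, 0);
  // expected[i][j] = rowTotal_i * colTotal_j / grandTotal
  const expected = observed.map((_, i) =>
    colTotals.map((c) => (rowTotals[i] * c) / grand)
  );
  let statistic = 0;
  observed.forEach((row, i) =>
    row.forEach((o, j) => {
      statistic += (o - expected[i][j]) ** 2 / expected[i][j];
    })
  );
  const df = (observed.length - 1) * (observed[0].length - 1);
  return { statistic, df, expected };
}

console.log(chiSquareIndependenceSketch([[10, 20, 30], [15, 25, 35]]));
```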
|
636
728
|
|
637
|
-
|
729
|
+
### `chi_square_goodness(observed, expected, alpha = 0.05)`
|
638
730
|
|
639
|
-
|
731
|
+
Chi-square goodness of fit test.
|
640
732
|
|
641
|
-
|
642
|
-
|
733
|
+
**Returns:**
|
734
|
+
```yaml
|
735
|
+
type: hypothesis_test
|
736
|
+
name: chi_square_goodness_of_fit
|
737
|
+
statistic: 3.456
|
738
|
+
df: 3
|
739
|
+
p_value: 0.327
|
740
|
+
confidence: 0.95
|
741
|
+
extra:
|
742
|
+
observed:
|
743
|
+
- 45
|
744
|
+
- 55
|
745
|
+
- 48
|
746
|
+
- 52
|
747
|
+
expected:
|
748
|
+
- 50
|
749
|
+
- 50
|
750
|
+
- 50
|
751
|
+
- 50
|
752
|
+
dof: 3
|
753
|
+
```
|
643
754
|
|
755
|
+
**Example:**
|
644
756
|
```javascript
|
645
|
-
const
|
646
|
-
const
|
647
|
-
|
757
|
+
const observed = [45, 55, 48, 52];
|
758
|
+
const expected = [50, 50, 50, 50];
|
759
|
+
datly.chi_square_goodness(observed, expected);
|
648
760
|
```
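As a rough check of the goodness-of-fit statistic, the sum of `(observed - expected)^2 / expected` can be computed directly. This is a sketch with a hypothetical helper name, assuming `df = categories - 1` as in the Returns block. It is not datly's code.

```javascript
// Illustrative chi-square goodness-of-fit statistic.
function chiSquareGoodnessSketch(observed, expected) {
  if (observed.length !== expected.length) {
    throw new Error("observed and expected must have the same length");
  }
  const statistic = observed.reduce(
    (s, o, i) => s + (o - expected[i]) ** 2 / expected[i],
    0
  );
  return { statistic, df: observed.length - 1 };
}

const gofResult = chiSquareGoodnessSketch([45, 55, 48, 52], [50, 50, 50, 50]);
console.log(gofResult);
```

For the example arrays the statistic is 1.16 with 3 degrees of freedom.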
|
649
761
|
|
650
|
-
|
762
|
+
### `shapiro_wilk(array)`
|
651
763
|
|
652
|
-
|
764
|
+
Shapiro-Wilk test for normality.
|
653
765
|
|
654
|
-
|
655
|
-
|
766
|
+
**Returns:**
|
767
|
+
```yaml
|
768
|
+
type: hypothesis_test
|
769
|
+
name: shapiro_wilk
|
770
|
+
statistic: 0.987
|
771
|
+
n: 50
|
772
|
+
note: approximation; w > 0.9 suggests normality
|
773
|
+
```
|
656
774
|
|
775
|
+
**Example:**
|
657
776
|
```javascript
|
658
|
-
|
659
|
-
const trimmed = datly.centralTendency.trimmedMean(data, 10); // Trim 10% from each end
|
660
|
-
console.log(trimmed); // ~3.75 (without extreme values)
|
777
|
+
datly.shapiro_wilk([1.2, 2.3, 1.8, 2.1, 1.9, 2.0]);
|
661
778
|
```
|
662
779
|
|
663
|
-
|
780
|
+
### `jarque_bera(array)`
|
664
781
|
|
665
|
-
|
666
|
-
Calculate weighted average.
|
782
|
+
Jarque-Bera test for normality.
|
667
783
|
|
668
|
-
|
669
|
-
|
670
|
-
|
671
|
-
|
672
|
-
|
784
|
+
**Returns:**
|
785
|
+
```yaml
|
786
|
+
type: hypothesis_test
|
787
|
+
name: jarque_bera
|
788
|
+
statistic: 2.345
|
789
|
+
n: 100
|
790
|
+
df: 2
|
791
|
+
note: tests normality; low p-value rejects normality
|
673
792
|
```
|
674
793
|
|
675
|
-
|
676
|
-
|
677
|
-
### 5. Dispersion Measures
|
678
|
-
|
679
|
-
Measures of variability and spread.
|
794
|
+
### `levene_test(groups)`
|
680
795
|
|
681
|
-
|
796
|
+
Levene's test for homogeneity of variance.
|
682
797
|
|
683
|
-
|
684
|
-
|
798
|
+
**Returns:**
|
799
|
+
```yaml
|
800
|
+
type: hypothesis_test
|
801
|
+
name: levene_test
|
802
|
+
statistic: 1.234
|
803
|
+
df_between: 2
|
804
|
+
df_within: 27
|
805
|
+
note: tests homogeneity of variance
|
806
|
+
```
|
685
807
|
|
808
|
+
**Example:**
|
686
809
|
```javascript
|
687
|
-
const
|
810
|
+
const g1 = [1, 2, 3, 4, 5];
|
811
|
+
const g2 = [2, 3, 4, 5, 6];
|
812
|
+
const g3 = [3, 4, 5, 6, 7];
|
813
|
+
datly.levene_test([g1, g2, g3]);
|
814
|
+
```
|
688
815
|
|
689
|
-
|
690
|
-
const sampleVar = datly.variance(data, true);
|
691
|
-
console.log(sampleVar); // 10
|
816
|
+
### `kruskal_wallis(groups)`
|
692
817
|
|
693
|
-
|
694
|
-
|
695
|
-
|
818
|
+
Kruskal-Wallis H-test (non-parametric alternative to ANOVA).
|
819
|
+
|
820
|
+
**Returns:**
|
821
|
+
```yaml
|
822
|
+
type: hypothesis_test
|
823
|
+
name: kruskal_wallis
|
824
|
+
statistic: 8.765
|
825
|
+
df: 2
|
826
|
+
note: non-parametric alternative to anova
|
696
827
|
```
|
697
828
|
|
698
|
-
|
829
|
+
### `mann_whitney(array1, array2)`
|
699
830
|
|
700
|
-
|
701
|
-
Calculate standard deviation.
|
831
|
+
Mann-Whitney U test (non-parametric alternative to t-test).
|
702
832
|
|
703
|
-
|
704
|
-
|
705
|
-
|
706
|
-
|
833
|
+
**Returns:**
|
834
|
+
```yaml
|
835
|
+
type: hypothesis_test
|
836
|
+
name: mann_whitney_u
|
837
|
+
statistic: 45
|
838
|
+
z_score: -1.234
|
839
|
+
p_value: 0.217
|
840
|
+
note: non-parametric alternative to t-test
|
707
841
|
```
|
708
842
|
|
709
|
-
|
843
|
+
### `wilcoxon_signed_rank(array1, array2)`
|
710
844
|
|
711
|
-
|
712
|
-
Calculate range (max - min).
|
845
|
+
Wilcoxon signed-rank test (non-parametric paired test).
|
713
846
|
|
714
|
-
|
715
|
-
|
716
|
-
|
717
|
-
|
718
|
-
|
719
|
-
|
720
|
-
|
721
|
-
|
722
|
-
// }
|
847
|
+
**Returns:**
|
848
|
+
```yaml
|
849
|
+
type: hypothesis_test
|
850
|
+
name: wilcoxon_signed_rank
|
851
|
+
statistic: 28
|
852
|
+
z_score: 1.567
|
853
|
+
p_value: 0.117
|
854
|
+
n: 20
|
723
855
|
```
|
724
856
|
|
725
|
-
|
857
|
+
### Confidence Intervals
|
726
858
|
|
727
|
-
|
728
|
-
Calculate IQR (Q3 - Q1).
|
859
|
+
#### `confidence_interval_mean(array, confidence = 0.95)`
|
729
860
|
|
730
|
-
|
731
|
-
|
732
|
-
|
733
|
-
|
734
|
-
|
735
|
-
|
736
|
-
|
737
|
-
|
738
|
-
|
861
|
+
Confidence interval for the mean.
|
862
|
+
|
863
|
+
**Returns:**
|
864
|
+
```yaml
|
865
|
+
type: confidence_interval
|
866
|
+
parameter: mean
|
867
|
+
confidence: 0.95
|
868
|
+
n: 50
|
869
|
+
mean: 102.5
|
870
|
+
lower: 98.3
|
871
|
+
upper: 106.7
|
872
|
+
margin: 4.2
|
739
873
|
```
|
740
874
|
|
741
|
-
|
875
|
+
#### `confidence_interval_proportion(successes, n, confidence = 0.95)`
|
742
876
|
|
743
|
-
|
744
|
-
Calculate coefficient of variation (CV%).
|
877
|
+
Confidence interval for a proportion.
|
745
878
|
|
746
|
-
|
747
|
-
|
748
|
-
|
749
|
-
|
750
|
-
|
751
|
-
|
752
|
-
|
753
|
-
|
879
|
+
**Returns:**
|
880
|
+
```yaml
|
881
|
+
type: confidence_interval
|
882
|
+
parameter: proportion
|
883
|
+
confidence: 0.95
|
884
|
+
n: 100
|
885
|
+
proportion: 0.65
|
886
|
+
lower: 0.551
|
887
|
+
upper: 0.749
|
888
|
+
margin: 0.099
|
754
889
|
```
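The `lower`, `upper` and `margin` fields are related by `lower = proportion - margin` and `upper = proportion + margin`. The sketch below uses the normal-approximation (Wald) interval with a fixed `z = 1.96` for 95% confidence. Both the method and the helper name are assumptions for illustration; the README does not state which interval datly computes.

```javascript
// Wald (normal approximation) interval for a proportion. Assumed method, not confirmed by datly.
function waldProportionIntervalSketch(successes, n, z = 1.96) {
  const proportion = successes / n;
  const standardError = Math.sqrt((proportion * (1 - proportion)) / n);
  const margin = z * standardError;
  return {
    proportion,
    lower: proportion - margin,
    upper: proportion + margin,
    margin,
  };
}

const propInterval = waldProportionIntervalSketch(65, 100);
console.log(propInterval);
```

With 65 successes out of 100 this gives a margin of about 0.093, slightly narrower than the sample output above, which may come from a different method or be illustrative.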
|
755
890
|
|
756
|
-
|
757
|
-
- CV < 15%: Low variability
|
758
|
-
- CV 15-30%: Moderate variability
|
759
|
-
- CV > 30%: High variability
|
760
|
-
|
761
|
-
---
|
891
|
+
#### `confidence_interval_variance(array, confidence = 0.95)`
|
762
892
|
|
763
|
-
|
764
|
-
Calculate MAD (mean absolute deviation).
|
893
|
+
Confidence interval for variance.
|
765
894
|
|
766
|
-
|
767
|
-
|
768
|
-
|
769
|
-
|
770
|
-
|
771
|
-
|
772
|
-
|
773
|
-
|
895
|
+
**Returns:**
|
896
|
+
```yaml
|
897
|
+
type: confidence_interval
|
898
|
+
parameter: variance
|
899
|
+
confidence: 0.95
|
900
|
+
n: 30
|
901
|
+
variance: 25.5
|
902
|
+
lower: 18.2
|
903
|
+
upper: 38.7
|
774
904
|
```
|
775
905
|
|
776
|
-
|
906
|
+
#### `confidence_interval_difference(array1, array2, confidence = 0.95)`
|
777
907
|
|
778
|
-
|
779
|
-
Calculate standard error of the mean.
|
908
|
+
Confidence interval for difference of means.
|
780
909
|
|
781
|
-
|
782
|
-
|
783
|
-
|
784
|
-
|
910
|
+
**Returns:**
|
911
|
+
```yaml
|
912
|
+
type: confidence_interval
|
913
|
+
parameter: difference_of_means
|
914
|
+
confidence: 0.95
|
915
|
+
difference: 5.5
|
916
|
+
lower: 2.3
|
917
|
+
upper: 8.7
|
918
|
+
margin: 3.2
|
919
|
+
means:
|
920
|
+
group_a: 105.5
|
921
|
+
group_b: 100
|
785
922
|
```
|
786
923
|
|
787
924
|
---
|
788
925
|
|
789
|
-
|
926
|
+
## Correlation Analysis
|
790
927
|
|
791
|
-
|
928
|
+
### `corr_pearson(array1, array2)`
|
792
929
|
|
793
|
-
|
930
|
+
Pearson correlation coefficient.
|
794
931
|
|
795
|
-
|
796
|
-
|
932
|
+
**Returns:**
|
933
|
+
```yaml
|
934
|
+
type: statistic
|
935
|
+
name: pearson_correlation
|
936
|
+
value: 0.856
|
937
|
+
```
|
797
938
|
|
939
|
+
**Example:**
|
798
940
|
```javascript
|
799
|
-
const
|
941
|
+
const x = [1, 2, 3, 4, 5];
|
942
|
+
const y = [2, 4, 5, 4, 5];
|
943
|
+
datly.corr_pearson(x, y);
|
944
|
+
```
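For reference, the Pearson coefficient is the covariance of the two arrays divided by the product of their standard deviations. A minimal standalone sketch (hypothetical helper, not datly's code) for the arrays in the example:

```javascript
// Illustrative Pearson correlation from centered sums.
function pearsonSketch(x, y) {
  const mean = (arr) => arr.reduce((s, v) => s + v, 0) / arr.length;
  const mx = mean(x);
  const my = mean(y);
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < x.length; i++) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) ** 2;
    syy += (y[i] - my) ** 2;
  }
  return sxy / Math.sqrt(sxx * syy);
}

const pearsonR = pearsonSketch([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]);
console.log(pearsonR.toFixed(4));
```

For these arrays the coefficient is about 0.7746.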
|
800
945
|
|
801
|
-
|
802
|
-
console.log(datly.quantile(data, 0.5)); // 5.5
|
946
|
+
### `corr_spearman(array1, array2)`
|
803
947
|
|
804
|
-
|
805
|
-
console.log(datly.quantile(data, 0.25)); // 3.25
|
948
|
+
Spearman rank correlation coefficient.
|
806
949
|
|
807
|
-
|
808
|
-
|
950
|
+
**Returns:**
|
951
|
+
```yaml
|
952
|
+
type: statistic
|
953
|
+
name: spearman_correlation
|
954
|
+
value: 0.9
|
809
955
|
```
|
810
956
|
|
811
|
-
|
957
|
+
### `corr_kendall(array1, array2)`
|
812
958
|
|
813
|
-
|
814
|
-
Calculate percentile (0-100 scale).
|
959
|
+
Kendall's tau correlation coefficient.
|
815
960
|
|
816
|
-
|
817
|
-
|
818
|
-
|
819
|
-
|
961
|
+
**Returns:**
|
962
|
+
```yaml
|
963
|
+
type: statistic
|
964
|
+
name: kendall_tau
|
965
|
+
value: 0.6
|
966
|
+
concordant: 8
|
967
|
+
discordant: 2
|
968
|
+
n: 5
|
820
969
|
```
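The `concordant` and `discordant` counts come from comparing every pair of observations. The sketch below counts them and computes tau-a, `(concordant - discordant) / (n * (n - 1) / 2)`. Whether datly uses tau-a or a tie-corrected variant is not stated, so treat the formula and the helper name as assumptions.

```javascript
// Illustrative Kendall tau-a: tied pairs count as neither concordant nor discordant.
function kendallTauSketch(x, y) {
  let concordant = 0;
  let discordant = 0;
  for (let i = 0; i < x.length; i++) {
    for (let j = i + 1; j < x.length; j++) {
      const direction = Math.sign(x[i] - x[j]) * Math.sign(y[i] - y[j]);
      if (direction > 0) concordant++;
      else if (direction < 0) discordant++;
    }
  }
  const pairs = (x.length * (x.length - 1)) / 2;
  return { concordant, discordant, tau: (concordant - discordant) / pairs };
}

const kendallResult = kendallTauSketch([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]);
console.log(kendallResult);
```

With `n = 5` there are 10 pairs. For these arrays 7 are concordant, 1 is discordant and 2 are tied, so tau-a is 0.6.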
|
821
970
|
|
822
|
-
|
971
|
+
### `corr_partial(array1, array2, array3)`
|
823
972
|
|
824
|
-
|
825
|
-
Calculate all quartiles.
|
973
|
+
Partial correlation controlling for a third variable.
|
826
974
|
|
827
|
-
|
828
|
-
|
829
|
-
|
830
|
-
|
831
|
-
|
832
|
-
|
833
|
-
// q2: 5, // Median
|
834
|
-
// q3: 7.5,
|
835
|
-
// iqr: 5
|
836
|
-
// }
|
975
|
+
**Returns:**
|
976
|
+
```yaml
|
977
|
+
type: statistic
|
978
|
+
name: partial_correlation
|
979
|
+
value: 0.456
|
980
|
+
controlling_for: third_variable
|
837
981
|
```
|
838
982
|
|
839
|
-
|
983
|
+
### `corr_matrix_all(data)`
|
840
984
|
|
841
|
-
|
842
|
-
Calculate percentile rank of a value.
|
985
|
+
Computes Pearson, Spearman, and Kendall correlation matrices for every pair of variables in `data`.
|
843
986
|
|
844
|
-
|
845
|
-
|
846
|
-
|
847
|
-
|
987
|
+
**Returns:**
|
988
|
+
```yaml
|
989
|
+
type: correlation_analysis
|
990
|
+
pearson:
|
991
|
+
age:
|
992
|
+
age: 1
|
993
|
+
salary: 0.85
|
994
|
+
salary:
|
995
|
+
age: 0.85
|
996
|
+
salary: 1
|
997
|
+
spearman:
|
998
|
+
age:
|
999
|
+
age: 1
|
1000
|
+
salary: 0.82
|
1001
|
+
salary:
|
1002
|
+
age: 0.82
|
1003
|
+
salary: 1
|
1004
|
+
kendall:
|
1005
|
+
age:
|
1006
|
+
age: 1
|
1007
|
+
salary: 0.75
|
1008
|
+
salary:
|
1009
|
+
age: 0.75
|
1010
|
+
salary: 1
|
848
1011
|
```
|
849
1012
|
|
850
1013
|
---
|
851
1014
|
|
852
|
-
|
853
|
-
|
1015
|
+
## Regression Models
|
1016
|
+
|
1017
|
+
### Linear Regression
|
854
1018
|
|
1019
|
+
#### `train_linear_regression(X, y)`
|
1020
|
+
|
1021
|
+
Trains a multiple linear regression model.
|
1022
|
+
|
1023
|
+
**Parameters:**
|
1024
|
+
- `X`: 2D array of features [[x1, x2, ...], ...]
|
1025
|
+
- `y`: Array of target values
|
1026
|
+
|
1027
|
+
**Returns:**
|
1028
|
+
```yaml
|
1029
|
+
type: linear_regression
|
1030
|
+
weights:
|
1031
|
+
- 2.5
|
1032
|
+
- 1.8
|
1033
|
+
- -0.3
|
1034
|
+
mse: 12.34
|
1035
|
+
r2: 0.856
|
1036
|
+
n: 100
|
1037
|
+
p: 2
|
1038
|
+
```
|
1039
|
+
|
1040
|
+
**Example:**
|
855
1041
|
```javascript
|
856
|
-
const
|
857
|
-
const
|
858
|
-
|
1042
|
+
const X = [[1, 2], [2, 3], [3, 4], [4, 5]];
|
1043
|
+
const y = [3, 5, 7, 9];
|
1044
|
+
const model = datly.train_linear_regression(X, y);
|
859
1045
|
```
|
860
1046
|
|
861
|
-
|
862
|
-
- |z| < 1: Within 1 standard deviation
|
863
|
-
- |z| < 2: Within 2 standard deviations
|
864
|
-
- |z| > 3: Potential outlier
|
1047
|
+
#### `predict_linear(model, X)`
|
865
1048
|
|
866
|
-
|
1049
|
+
Makes predictions using a trained linear regression model.
|
867
1050
|
|
868
|
-
|
869
|
-
|
1051
|
+
**Parameters:**
|
1052
|
+
- `model`: Model text/object from `train_linear_regression`
|
1053
|
+
- `X`: 2D array of features
|
1054
|
+
|
1055
|
+
**Returns:**
|
1056
|
+
```yaml
|
1057
|
+
type: prediction
|
1058
|
+
name: linear_regression
|
1059
|
+
predictions:
|
1060
|
+
- 105.3
|
1061
|
+
- 110.7
|
1062
|
+
- 98.2
|
1063
|
+
```
|
870
1064
|
|
1065
|
+
**Example:**
|
871
1066
|
```javascript
|
872
|
-
const
|
873
|
-
const stats = datly.boxplotStats(data);
|
874
|
-
console.log(stats);
|
875
|
-
// {
|
876
|
-
// min: 1,
|
877
|
-
// q1: 2,
|
878
|
-
// median: 3.5,
|
879
|
-
// q3: 5,
|
880
|
-
// max: 5,
|
881
|
-
// iqr: 3,
|
882
|
-
// lowerFence: -2.5,
|
883
|
-
// upperFence: 9.5,
|
884
|
-
// outliers: [100],
|
885
|
-
// outlierCount: 1
|
886
|
-
// }
|
1067
|
+
const predictions = datly.predict_linear(model, [[5, 6], [6, 7]]);
|
887
1068
|
```
|
888
1069
|
|
889
|
-
|
1070
|
+
### Logistic Regression
|
890
1071
|
|
891
|
-
|
892
|
-
Rank data values.
|
1072
|
+
#### `train_logistic_regression(X, y, options = {})`
|
893
1073
|
|
894
|
-
|
895
|
-
|
1074
|
+
Trains a logistic regression model for binary classification.
|
1075
|
+
|
1076
|
+
**Parameters:**
|
1077
|
+
- `X`: 2D array of features
|
1078
|
+
- `y`: Array of binary labels (0 or 1)
|
1079
|
+
- `options`:
|
1080
|
+
- `learning_rate`: Learning rate (default: 0.1)
|
1081
|
+
- `iterations`: Number of iterations (default: 1000)
|
1082
|
+
- `l2`: L2 regularization parameter (default: 0)
|
1083
|
+
|
1084
|
+
**Returns:**
|
1085
|
+
```yaml
|
1086
|
+
type: logistic_regression
|
1087
|
+
weights:
|
1088
|
+
- 0.5
|
1089
|
+
- 1.2
|
1090
|
+
- -0.8
|
1091
|
+
accuracy: 0.92
|
1092
|
+
n: 100
|
1093
|
+
p: 2
|
1094
|
+
```
|
1095
|
+
|
1096
|
+
**Example:**
|
1097
|
+
```javascript
|
1098
|
+
const X = [[1, 2], [2, 3], [3, 1], [4, 2]];
|
1099
|
+
const y = [0, 0, 1, 1];
|
1100
|
+
const model = datly.train_logistic_regression(X, y, {
|
1101
|
+
learning_rate: 0.1,
|
1102
|
+
iterations: 1000,
|
1103
|
+
l2: 0.01
|
1104
|
+
});
|
1105
|
+
```
|
896
1106
|
|
897
|
-
|
898
|
-
const ranks1 = datly.rank(data, 'average');
|
899
|
-
console.log(ranks1); // [1, 2.5, 2.5, 4, 5]
|
1107
|
+
#### `predict_logistic(model, X, threshold = 0.5)`
|
900
1108
|
|
901
|
-
|
902
|
-
|
903
|
-
|
1109
|
+
Makes predictions using a trained logistic regression model.
|
1110
|
+
|
1111
|
+
**Returns:**
|
1112
|
+
```yaml
|
1113
|
+
type: prediction
|
1114
|
+
name: logistic_regression
|
1115
|
+
threshold: 0.5
|
1116
|
+
probabilities:
|
1117
|
+
- 0.234
|
1118
|
+
- 0.789
|
1119
|
+
- 0.456
|
1120
|
+
classes:
|
1121
|
+
- 0
|
1122
|
+
- 1
|
1123
|
+
- 0
|
1124
|
+
```
|
904
1125
|
|
905
|
-
|
906
|
-
|
907
|
-
|
1126
|
+
**Example:**
|
1127
|
+
```javascript
|
1128
|
+
const predictions = datly.predict_logistic(model, [[5, 6], [6, 7]], 0.5);
|
908
1129
|
```
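The `probabilities` and `classes` fields are linked by the threshold: a row is assigned class 1 when its probability is at least `threshold`. The sketch below shows that step with the sample weights from the training output, assuming the first weight is the intercept (an assumption suggested by three weights for `p: 2`, not confirmed by the README). The helper name is hypothetical.

```javascript
// Illustrative logistic prediction: sigmoid of a linear score, then thresholding.
function logisticPredictSketch(weights, X, threshold = 0.5) {
  const sigmoid = (z) => 1 / (1 + Math.exp(-z));
  const probabilities = X.map((row) => {
    // weights[0] is treated as the intercept (assumption).
    const z = row.reduce((s, v, j) => s + weights[j + 1] * v, weights[0]);
    return sigmoid(z);
  });
  const classes = probabilities.map((p) => (p >= threshold ? 1 : 0));
  return { threshold, probabilities, classes };
}

const logisticResult = logisticPredictSketch([0.5, 1.2, -0.8], [[1, 3], [3, 1]]);
console.log(logisticResult);
```

Raising `threshold` trades recall for precision on the positive class; lowering it does the opposite.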
|
909
1130
|
|
910
1131
|
---
|
911
1132
|
|
912
|
-
|
1133
|
+
## Classification Models
|
1134
|
+
|
1135
|
+
### K-Nearest Neighbors (KNN)
|
913
1136
|
|
914
|
-
|
1137
|
+
#### `train_knn_classifier(X, y, k = 5)`
|
915
1138
|
|
916
|
-
|
1139
|
+
Trains a KNN classifier.
|
1140
|
+
|
1141
|
+
**Parameters:**
|
1142
|
+
- `X`: 2D array of features
|
1143
|
+
- `y`: Array of class labels
|
1144
|
+
- `k`: Number of neighbors (default: 5)
|
917
1145
|
|
918
|
-
|
919
|
-
|
1146
|
+
**Returns:**
|
1147
|
+
```yaml
|
1148
|
+
type: knn_classifier
|
1149
|
+
k: 5
|
1150
|
+
x:
|
1151
|
+
- - 1
|
1152
|
+
- 2
|
1153
|
+
- - 2
|
1154
|
+
- 3
|
1155
|
+
y:
|
1156
|
+
- 0
|
1157
|
+
- 1
|
1158
|
+
n: 100
|
1159
|
+
p: 2
|
1160
|
+
```
|
920
1161
|
|
1162
|
+
**Example:**
|
921
1163
|
```javascript
|
922
|
-
const
|
1164
|
+
const X = [[1, 2], [2, 3], [3, 1], [4, 2]];
|
1165
|
+
const y = [0, 0, 1, 1];
|
1166
|
+
const model = datly.train_knn_classifier(X, y, 3);
|
1167
|
+
```
|
923
1168
|
|
924
|
-
|
925
|
-
const skew1 = datly.skewness(data, true);
|
926
|
-
console.log(skew1); // Positive value
|
1169
|
+
#### `predict_knn_classifier(model, X)`
|
927
1170
|
|
928
|
-
|
929
|
-
|
930
|
-
|
1171
|
+
Makes predictions using a trained KNN classifier.
|
1172
|
+
|
1173
|
+
**Returns:**
|
1174
|
+
```yaml
|
1175
|
+
type: prediction
|
1176
|
+
name: knn_classifier
|
1177
|
+
k: 5
|
1178
|
+
predictions:
|
1179
|
+
- 0
|
1180
|
+
- 1
|
1181
|
+
- 1
|
931
1182
|
```
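Conceptually, a KNN classifier stores the training data (the `x` and `y` fields of the model) and predicts by majority vote among the `k` closest training rows. A minimal sketch assuming Euclidean distance follows. The distance metric, tie handling and helper name are assumptions, not taken from datly.

```javascript
// Illustrative KNN classification for a single query row.
function knnClassifySketch(trainX, trainY, k, query) {
  const neighbors = trainX
    .map((row, i) => ({
      label: trainY[i],
      // Euclidean distance between the training row and the query.
      dist: Math.sqrt(row.reduce((s, v, j) => s + (v - query[j]) ** 2, 0)),
    }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);

  // Count votes per label and return the most frequent one.
  const votes = new Map();
  for (const { label } of neighbors) {
    votes.set(label, (votes.get(label) || 0) + 1);
  }
  let best = null;
  let bestCount = -1;
  for (const [label, count] of votes) {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }
  return best;
}

const knnTrainX = [[1, 2], [2, 3], [3, 1], [4, 2]];
const knnTrainY = [0, 0, 1, 1];
const knnPredictions = [[1.5, 2.5], [3.5, 1.5]].map((q) =>
  knnClassifySketch(knnTrainX, knnTrainY, 3, q)
);
console.log(knnPredictions);
```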
|
932
1183
|
|
933
|
-
|
934
|
-
- Skewness < -1: Highly left-skewed
|
935
|
-
- -1 < Skewness < -0.5: Moderately left-skewed
|
936
|
-
- -0.5 < Skewness < 0.5: Approximately symmetric
|
937
|
-
- 0.5 < Skewness < 1: Moderately right-skewed
|
938
|
-
- Skewness > 1: Highly right-skewed
|
1184
|
+
#### `train_knn_regressor(X, y, k = 5)`
|
939
1185
|
|
940
|
-
|
1186
|
+
Trains a KNN regressor.
|
941
1187
|
|
942
|
-
|
943
|
-
|
1188
|
+
**Returns:**
|
1189
|
+
```yaml
|
1190
|
+
type: knn_regressor
|
1191
|
+
k: 5
|
1192
|
+
x:
|
1193
|
+
- - 1
|
1194
|
+
- 2
|
1195
|
+
- - 2
|
1196
|
+
- 3
|
1197
|
+
y:
|
1198
|
+
- 10.5
|
1199
|
+
- 12.3
|
1200
|
+
n: 100
|
1201
|
+
p: 2
|
1202
|
+
```
|
944
1203
|
|
945
|
-
|
946
|
-
const data = [1, 2, 3, 4, 5, 6, 7, 8, 9];
|
1204
|
+
#### `predict_knn_regressor(model, X)`
|
947
1205
|
|
948
|
-
|
949
|
-
const kurt1 = datly.kurtosis(data, false, true);
|
950
|
-
console.log(kurt1); // -3 adjustment applied
|
1206
|
+
Makes predictions using a trained KNN regressor.
|
951
1207
|
|
952
|
-
|
953
|
-
|
954
|
-
|
1208
|
+
**Returns:**
|
1209
|
+
```yaml
|
1210
|
+
type: prediction
|
1211
|
+
name: knn_regressor
|
1212
|
+
k: 5
|
1213
|
+
predictions:
|
1214
|
+
- 10.7
|
1215
|
+
- 11.8
|
1216
|
+
- 12.5
|
955
1217
|
```
|
956
1218
|
|
957
|
-
|
958
|
-
- Excess < -1: Platykurtic (thin tails)
|
959
|
-
- -1 < Excess < 1: Mesokurtic (normal)
|
960
|
-
- Excess > 1: Leptokurtic (fat tails)
|
1219
|
+
### Decision Trees
|
961
1220
|
|
962
|
-
|
1221
|
+
#### `train_decision_tree_classifier(X, y, options = {})`
|
963
1222
|
|
964
|
-
|
965
|
-
Test if data follows normal distribution.
|
1223
|
+
Trains a decision tree classifier.
|
966
1224
|
|
967
|
-
|
968
|
-
|
969
|
-
|
970
|
-
|
971
|
-
|
972
|
-
|
973
|
-
|
974
|
-
|
975
|
-
|
976
|
-
|
977
|
-
|
1225
|
+
**Parameters:**
|
1226
|
+
- `options`:
|
1227
|
+
- `max_depth`: Maximum depth of tree (default: 5)
|
1228
|
+
- `min_samples_split`: Minimum samples required to split (default: 2)
|
1229
|
+
|
1230
|
+
**Returns:**
|
1231
|
+
```yaml
|
1232
|
+
type: decision_tree_classifier
|
1233
|
+
tree:
|
1234
|
+
leaf: false
|
1235
|
+
feature: 0
|
1236
|
+
threshold: 2.5
|
1237
|
+
left:
|
1238
|
+
leaf: true
|
1239
|
+
prediction: 0
|
1240
|
+
n: 50
|
1241
|
+
right:
|
1242
|
+
leaf: true
|
1243
|
+
prediction: 1
|
1244
|
+
n: 50
|
1245
|
+
max_depth: 5
|
1246
|
+
min_samples: 2
|
1247
|
+
n: 100
|
1248
|
+
p: 2
|
1249
|
+
```
|
1250
|
+
|
1251
|
+
**Example:**
|
1252
|
+
```javascript
|
1253
|
+
const model = datly.train_decision_tree_classifier(X, y, {
|
1254
|
+
max_depth: 5,
|
1255
|
+
min_samples_split: 2
|
1256
|
+
});
|
978
1257
|
```
|
979
1258
|
|
980
|
-
|
1259
|
+
#### `train_decision_tree_regressor(X, y, options = {})`
|
981
1260
|
|
982
|
-
|
983
|
-
Jarque-Bera normality test.
|
1261
|
+
Trains a decision tree regressor.
|
984
1262
|
|
985
|
-
|
986
|
-
|
987
|
-
|
988
|
-
|
989
|
-
|
990
|
-
|
991
|
-
|
992
|
-
|
993
|
-
|
994
|
-
|
995
|
-
|
1263
|
+
**Returns:**
|
1264
|
+
```yaml
|
1265
|
+
type: decision_tree_regressor
|
1266
|
+
tree:
|
1267
|
+
leaf: false
|
1268
|
+
feature: 0
|
1269
|
+
threshold: 2.5
|
1270
|
+
left: ...
|
1271
|
+
right: ...
|
1272
|
+
max_depth: 5
|
1273
|
+
min_samples: 2
|
1274
|
+
n: 100
|
1275
|
+
p: 2
|
996
1276
|
```
|
997
1277
|
|
998
|
-
|
1278
|
+
#### `predict_decision_tree(model, X)`
|
1279
|
+
|
1280
|
+
Makes predictions using a decision tree.
|
999
1281
|
|
1000
|
-
|
1001
|
-
|
1002
|
-
|
1003
|
-
|
1004
|
-
|
1005
|
-
|
1006
|
-
|
1007
|
-
|
1008
|
-
|
1009
|
-
```javascript
|
1010
|
-
// One-sample t-test
|
1011
|
-
const sample = [23, 25, 27, 29, 31];
|
1012
|
-
const populationMean = 25;
|
1013
|
-
const oneSample = datly.tTest(sample, populationMean, 'one-sample');
|
1014
|
-
console.log(oneSample);
|
1015
|
-
// {
|
1016
|
-
// type: 'one-sample',
|
1017
|
-
// statistic: 1.89,
|
1018
|
-
// pValue: 0.13,
|
1019
|
-
// degreesOfFreedom: 4,
|
1020
|
-
// significant: false
|
1021
|
-
// }
|
1022
|
-
|
1023
|
-
// Two-sample t-test
|
1024
|
-
const group1 = [23, 25, 27, 29, 31];
|
1025
|
-
const group2 = [28, 30, 32, 34, 36];
|
1026
|
-
const twoSample = datly.tTest(group1, group2, 'two-sample');
|
1027
|
-
console.log(twoSample);
|
1028
|
-
// {
|
1029
|
-
// type: 'two-sample',
|
1030
|
-
// statistic: -3.46,
|
1031
|
-
// pValue: 0.008,
|
1032
|
-
// significant: true,
|
1033
|
-
// meanDifference: -5
|
1034
|
-
// }
|
1035
|
-
|
1036
|
-
// Paired t-test
|
1037
|
-
const before = [120, 125, 130, 128, 122];
|
1038
|
-
const after = [115, 118, 125, 120, 115];
|
1039
|
-
const paired = datly.tTest(before, after, 'paired');
|
1040
|
-
console.log(paired);
|
1282
|
+
**Returns:**
|
1283
|
+
```yaml
|
1284
|
+
type: prediction
|
1285
|
+
name: decision_tree_classifier
|
1286
|
+
predictions:
|
1287
|
+
- 0
|
1288
|
+
- 1
|
1289
|
+
- 1
|
1041
1290
|
```
|
1042
1291
|
|
1043
|
-
|
1292
|
+
### Random Forest
|
1044
1293
|
|
1045
|
-
|
1046
|
-
Perform z-test (known population variance).
|
1294
|
+
#### `train_random_forest_classifier(X, y, options = {})`
|
1047
1295
|
|
1048
|
-
|
1049
|
-
const sample = [105, 110, 108, 112, 115];
|
1050
|
-
const popMean = 100;
|
1051
|
-
const popStd = 15;
|
1296
|
+
Trains a random forest classifier.
|
1052
1297
|
|
1053
|
-
|
1054
|
-
|
1055
|
-
|
1056
|
-
|
1057
|
-
|
1058
|
-
|
1059
|
-
|
1060
|
-
|
1061
|
-
|
1298
|
+
**Parameters:**
|
1299
|
+
- `options`:
|
1300
|
+
- `n_estimators`: Number of trees (default: 10)
|
1301
|
+
- `max_depth`: Maximum depth (default: 5)
|
1302
|
+
- `min_samples_split`: Minimum samples to split (default: 2)
|
1303
|
+
- `seed`: Random seed (default: 42)
|
1304
|
+
|
1305
|
+
**Returns:**
|
1306
|
+
```yaml
|
1307
|
+
type: random_forest_classifier
|
1308
|
+
trees:
|
1309
|
+
- leaf: false
|
1310
|
+
feature: 0
|
1311
|
+
threshold: 2.5
|
1312
|
+
...
|
1313
|
+
- leaf: false
|
1314
|
+
feature: 1
|
1315
|
+
threshold: 3.2
|
1316
|
+
...
|
1317
|
+
n_trees: 10
|
1318
|
+
max_depth: 5
|
1319
|
+
min_samples: 2
|
1320
|
+
n: 100
|
1321
|
+
p: 2
|
1322
|
+
```
|
1323
|
+
|
1324
|
+
**Example:**
|
1325
|
+
```javascript
|
1326
|
+
const model = datly.train_random_forest_classifier(X, y, {
|
1327
|
+
n_estimators: 10,
|
1328
|
+
max_depth: 5,
|
1329
|
+
seed: 42
|
1330
|
+
});
|
1062
1331
|
```
|
1063
1332
|
|
1064
|
-
|
1065
|
-
|
1066
|
-
##### `anovaTest(groups, alpha)`
|
1067
|
-
One-way ANOVA test.
|
1333
|
+
#### `train_random_forest_regressor(X, y, options = {})`
|
1068
1334
|
|
1069
|
-
|
1070
|
-
const groupA = [23, 25, 27, 29];
|
1071
|
-
const groupB = [30, 32, 34, 36];
|
1072
|
-
const groupC = [28, 30, 32, 34];
|
1335
|
+
Trains a random forest regressor.
|
1073
1336
|
|
1074
|
-
|
1075
|
-
|
1076
|
-
|
1077
|
-
|
1078
|
-
|
1079
|
-
|
1080
|
-
|
1081
|
-
|
1082
|
-
|
1083
|
-
// groupMeans: [26, 33, 31]
|
1084
|
-
// }
|
1337
|
+
**Returns:**
|
1338
|
+
```yaml
|
1339
|
+
type: random_forest_regressor
|
1340
|
+
trees: [...]
|
1341
|
+
n_trees: 10
|
1342
|
+
max_depth: 5
|
1343
|
+
min_samples: 2
|
1344
|
+
n: 100
|
1345
|
+
p: 2
|
1085
1346
|
```
|
1086
1347
|
|
1087
|
-
|
1088
|
-
|
1089
|
-
##### `chiSquareTest(column1, column2, alpha)`
|
1090
|
-
Chi-square test of independence.
|
1348
|
+
#### `predict_random_forest_classifier(model, X)`
|
1091
1349
|
|
1092
|
-
|
1093
|
-
const gender = ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F'];
|
1094
|
-
const preference = ['A', 'B', 'A', 'A', 'B', 'B', 'A', 'B'];
|
1350
|
+
Makes predictions using a trained random forest classifier.
|
1095
1351
|
|
1096
|
-
|
1097
|
-
|
1098
|
-
|
1099
|
-
|
1100
|
-
|
1101
|
-
|
1102
|
-
|
1103
|
-
|
1104
|
-
|
1105
|
-
// }
|
1352
|
+
**Returns:**
|
1353
|
+
```yaml
|
1354
|
+
type: prediction
|
1355
|
+
name: random_forest_classifier
|
1356
|
+
n_trees: 10
|
1357
|
+
predictions:
|
1358
|
+
- 0
|
1359
|
+
- 1
|
1360
|
+
- 1
|
1106
1361
|
```
|
1107
1362
|
|
1108
|
-
|
1109
|
-
|
1110
|
-
##### `mannWhitneyTest(sample1, sample2, alpha)`
|
1111
|
-
Mann-Whitney U test (non-parametric alternative to t-test).
|
1363
|
+
#### `predict_random_forest_regressor(model, X)`
|
1112
1364
|
|
1113
|
-
|
1114
|
-
const group1 = [1, 2, 3, 4, 5];
|
1115
|
-
const group2 = [6, 7, 8, 9, 10];
|
1365
|
+
Makes predictions using a trained random forest regressor.
|
1116
1366
|
|
1117
|
-
|
1118
|
-
|
1119
|
-
|
1120
|
-
|
1121
|
-
|
1122
|
-
|
1123
|
-
|
1124
|
-
|
1125
|
-
|
1126
|
-
// significant: true
|
1127
|
-
// }
|
1367
|
+
**Returns:**
|
1368
|
+
```yaml
|
1369
|
+
type: prediction
|
1370
|
+
name: random_forest_regressor
|
1371
|
+
n_trees: 10
|
1372
|
+
predictions:
|
1373
|
+
- 10.7
|
1374
|
+
- 11.8
|
1375
|
+
- 12.5
|
1128
1376
|
```
|
1129
1377
|
|
1130
|
-
|
1378
|
+
### Naive Bayes
|
1131
1379
|
|
1132
|
-
|
1380
|
+
#### `train_naive_bayes(X, y)`
|
1133
1381
|
|
1134
|
-
|
1382
|
+
Trains a Gaussian Naive Bayes classifier.
|
1135
1383
|
|
1136
|
-
|
1384
|
+
**Parameters:**
|
1385
|
+
- `X`: 2D array of features
|
1386
|
+
- `y`: Array of class labels
|
1137
1387
|
|
1138
|
-
|
1139
|
-
|
1388
|
+
**Returns:**
|
1389
|
+
```yaml
|
1390
|
+
type: naive_bayes
|
1391
|
+
classes:
|
1392
|
+
- 0
|
1393
|
+
- 1
|
1394
|
+
priors:
|
1395
|
+
0: 0.5
|
1396
|
+
1: 0.5
|
1397
|
+
stats:
|
1398
|
+
0:
|
1399
|
+
- mean: 2.5
|
1400
|
+
std: 1.2
|
1401
|
+
- mean: 3.1
|
1402
|
+
std: 0.8
|
1403
|
+
1:
|
1404
|
+
- mean: 5.2
|
1405
|
+
std: 1.5
|
1406
|
+
- mean: 6.3
|
1407
|
+
std: 1.1
|
1408
|
+
n: 100
|
1409
|
+
p: 2
|
1410
|
+
```
|
1140
1411
|
|
1412
|
+
**Example:**
|
1141
1413
|
```javascript
|
1142
|
-
const
|
1143
|
-
const
|
1144
|
-
|
1145
|
-
// {
|
1146
|
-
// mean: 29,
|
1147
|
-
// standardError: 1.63,
|
1148
|
-
// marginOfError: 4.03,
|
1149
|
-
// lowerBound: 24.97,
|
1150
|
-
// upperBound: 33.03,
|
1151
|
-
// confidence: 0.95,
|
1152
|
-
// degreesOfFreedom: 6
|
1153
|
-
// }
|
1414
|
+
const X = [[1, 2], [2, 3], [5, 6], [6, 7]];
|
1415
|
+
const y = [0, 0, 1, 1];
|
1416
|
+
const model = datly.train_naive_bayes(X, y);
|
1154
1417
|
```
|
1155
1418
|
|
1156
|
-
|
1419
|
+
#### `predict_naive_bayes(model, X)`
|
1420
|
+
|
1421
|
+
Makes predictions using a trained Naive Bayes classifier.
|
1157
1422
|
|
1158
|
-
|
1159
|
-
|
1160
|
-
|
1161
|
-
|
1162
|
-
|
1163
|
-
|
1164
|
-
|
1165
|
-
|
1166
|
-
console.log(ci);
|
1167
|
-
// {
|
1168
|
-
// normal: {
|
1169
|
-
// proportion: 0.45,
|
1170
|
-
// lowerBound: 0.353,
|
1171
|
-
// upperBound: 0.547,
|
1172
|
-
// confidence: 0.95
|
1173
|
-
// },
|
1174
|
-
// wilson: {
|
1175
|
-
// proportion: 0.45,
|
1176
|
-
// center: 0.453,
|
1177
|
-
// lowerBound: 0.355,
|
1178
|
-
// upperBound: 0.551,
|
1179
|
-
// confidence: 0.95
|
1180
|
-
// },
|
1181
|
-
// recommended: {...} // Wilson interval for small samples
|
1182
|
-
// }
|
1423
|
+
**Returns:**
|
1424
|
+
```yaml
|
1425
|
+
type: prediction
|
1426
|
+
name: naive_bayes
|
1427
|
+
predictions:
|
1428
|
+
- 0
|
1429
|
+
- 1
|
1430
|
+
- 1
|
1183
1431
|
```
|
1184
1432
|
|
1185
1433
|
---
|
1186
1434
|
|
1187
|
-
|
1188
|
-
Confidence interval for variance.
|
1435
|
+
## Clustering
|
1189
1436
|
|
1190
|
-
|
1191
|
-
|
1192
|
-
|
1193
|
-
|
1194
|
-
|
1195
|
-
|
1196
|
-
|
1197
|
-
|
1198
|
-
|
1199
|
-
|
1437
|
+
### K-Means Clustering
|
1438
|
+
|
1439
|
+
#### `train_kmeans(X, k = 3, options = {})`
|
1440
|
+
|
1441
|
+
Trains a K-means clustering model.
|
1442
|
+
|
1443
|
+
**Parameters:**
|
1444
|
+
- `X`: 2D array of features
|
1445
|
+
- `k`: Number of clusters (default: 3)
|
1446
|
+
- `options`:
|
1447
|
+
- `max_iterations`: Maximum iterations (default: 100)
|
1448
|
+
- `seed`: Random seed (default: 42)
|
1449
|
+
|
1450
|
+
**Returns:**
|
1451
|
+
```yaml
|
1452
|
+
type: kmeans
|
1453
|
+
k: 3
|
1454
|
+
centroids:
|
1455
|
+
- - 2.1
|
1456
|
+
- 3.5
|
1457
|
+
- - 5.8
|
1458
|
+
- 6.2
|
1459
|
+
- - 9.1
|
1460
|
+
- 8.7
|
1461
|
+
inertia: 45.67
|
1462
|
+
n: 150
|
1463
|
+
p: 2
|
1464
|
+
```
|
1465
|
+
|
1466
|
+
**Example:**
|
1467
|
+
```javascript
|
1468
|
+
const X = [[1, 2], [2, 3], [5, 6], [6, 7], [9, 8], [10, 9]];
|
1469
|
+
const model = datly.train_kmeans(X, 3, {
|
1470
|
+
max_iterations: 100,
|
1471
|
+
seed: 42
|
1472
|
+
});
|
1200
1473
|
```
|
1201
1474
|
|
1202
|
-
|
1475
|
+
#### `predict_kmeans(model, X)`
|
1203
1476
|
|
1204
|
-
|
1205
|
-
Confidence interval for difference between two means.
|
1477
|
+
Assigns cluster labels to new data points.
|
1206
1478
|
|
1207
|
-
|
1208
|
-
|
1209
|
-
|
1479
|
+
**Returns:**
|
1480
|
+
```yaml
|
1481
|
+
type: prediction
|
1482
|
+
name: kmeans
|
1483
|
+
k: 3
|
1484
|
+
cluster_labels:
|
1485
|
+
- 0
|
1486
|
+
- 0
|
1487
|
+
- 1
|
1488
|
+
- 1
|
1489
|
+
- 2
|
1490
|
+
- 2
|
1491
|
+
```
|
1210
1492
|
|
1211
|
-
|
1212
|
-
|
1213
|
-
|
1214
|
-
|
1215
|
-
// sample1Mean: 125,
|
1216
|
-
// sample2Mean: 115.8,
|
1217
|
-
// lowerBound: 3.5,
|
1218
|
-
// upperBound: 14.9,
|
1219
|
-
// confidence: 0.95
|
1220
|
-
// }
|
1493
|
+
**Example:**
|
1494
|
+
```javascript
|
1495
|
+
const newData = [[1.5, 2.5], [5.5, 6.5], [9.5, 8.5]];
|
1496
|
+
const clusters = datly.predict_kmeans(model, newData);
|
1221
1497
|
```
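Assigning a cluster label amounts to picking the nearest centroid. The sketch below applies that rule to the sample centroids from the `train_kmeans` output, assuming squared Euclidean distance. The helper name is hypothetical and this is not datly's code.

```javascript
// Illustrative K-means assignment: index of the closest centroid for each row.
function assignClustersSketch(centroids, X) {
  return X.map((row) => {
    let bestIndex = 0;
    let bestDist = Infinity;
    centroids.forEach((centroid, c) => {
      const dist = centroid.reduce((s, v, j) => s + (v - row[j]) ** 2, 0);
      if (dist < bestDist) {
        bestDist = dist;
        bestIndex = c;
      }
    });
    return bestIndex;
  });
}

const sampleCentroids = [[2.1, 3.5], [5.8, 6.2], [9.1, 8.7]];
const clusterLabels = assignClustersSketch(sampleCentroids, [
  [1.5, 2.5],
  [5.5, 6.5],
  [9.5, 8.5],
]);
console.log(clusterLabels);
```

Cluster indices are arbitrary: a different seed can produce the same grouping with the labels permuted.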
|
1222
1498
|
|
1223
1499
|
---
|
1224
1500
|
|
1225
|
-
|
1226
|
-
Confidence interval for correlation coefficient.
|
1501
|
+
## Ensemble Methods
|
1227
1502
|
|
1228
|
-
|
1229
|
-
const x = [1, 2, 3, 4, 5];
|
1230
|
-
const y = [2, 4, 5, 4, 5];
|
1503
|
+
### `ensemble_voting_classifier(models, X, method = 'hard')`
|
1231
1504
|
|
1232
|
-
|
1233
|
-
|
1234
|
-
|
1235
|
-
|
1236
|
-
|
1237
|
-
|
1238
|
-
|
1239
|
-
|
1240
|
-
|
1505
|
+
Combines multiple classifier predictions through voting.
|
1506
|
+
|
1507
|
+
**Parameters:**
|
1508
|
+
- `models`: Array of trained model texts/objects
|
1509
|
+
- `X`: 2D array of features
|
1510
|
+
- `method`: 'hard' for majority voting, 'soft' for probability averaging
|
1511
|
+
|
1512
|
+
**Returns:**
|
1513
|
+
```yaml
|
1514
|
+
type: ensemble_prediction
|
1515
|
+
method: voting_hard
|
1516
|
+
n_models: 3
|
1517
|
+
predictions:
|
1518
|
+
- 0
|
1519
|
+
- 1
|
1520
|
+
- 1
|
1521
|
+
- 0
|
1522
|
+
```
|
1523
|
+
|
1524
|
+
**Example:**
|
1525
|
+
```javascript
|
1526
|
+
const model1 = datly.train_logistic_regression(X, y);
|
1527
|
+
const model2 = datly.train_knn_classifier(X, y, 5);
|
1528
|
+
const model3 = datly.train_decision_tree_classifier(X, y);
|
1529
|
+
|
1530
|
+
const ensemble = datly.ensemble_voting_classifier(
|
1531
|
+
[model1, model2, model3],
|
1532
|
+
X_test,
|
1533
|
+
'hard'
|
1534
|
+
);
|
1241
1535
|
```
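Hard voting takes each model's predicted class for a row and returns the most frequent one. A minimal sketch of that step, working on plain prediction arrays rather than datly model objects (the helper name and the tie rule, first label to reach the top count, are assumptions):

```javascript
// Illustrative hard (majority) voting over per-model prediction arrays.
function hardVoteSketch(predictionsPerModel) {
  const nRows = predictionsPerModel[0].length;
  const result = [];
  for (let i = 0; i < nRows; i++) {
    const votes = new Map();
    for (const preds of predictionsPerModel) {
      votes.set(preds[i], (votes.get(preds[i]) || 0) + 1);
    }
    let best = null;
    let bestCount = -1;
    for (const [label, count] of votes) {
      if (count > bestCount) {
        best = label;
        bestCount = count;
      }
    }
    result.push(best);
  }
  return result;
}

const votedPredictions = hardVoteSketch([
  [0, 1, 1, 0], // model 1
  [0, 1, 0, 0], // model 2
  [1, 1, 1, 0], // model 3
]);
console.log(votedPredictions);
```

Using an odd number of binary classifiers avoids ties altogether.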
|
1242
1536
|
|
1243
|
-
|
1537
|
+
### `ensemble_voting_regressor(models, X)`
|
1244
1538
|
|
1245
|
-
|
1246
|
-
Bootstrap confidence interval.
|
1539
|
+
Combines multiple regressor predictions through averaging.
|
1247
1540
|
|
1541
|
+
**Returns:**
|
1542
|
+
```yaml
|
1543
|
+
type: ensemble_prediction
|
1544
|
+
method: voting_average
|
1545
|
+
n_models: 3
|
1546
|
+
predictions:
|
1547
|
+
- 105.3
|
1548
|
+
- 110.7
|
1549
|
+
- 98.2
|
1550
|
+
```
|
1551
|
+
|
1552
|
+
**Example:**
|
1248
1553
|
```javascript
|
1249
|
-
const
|
1554
|
+
const model1 = datly.train_linear_regression(X, y);
|
1555
|
+
const model2 = datly.train_knn_regressor(X, y, 5);
|
1556
|
+
const model3 = datly.train_decision_tree_regressor(X, y);
|
1250
1557
|
|
1251
|
-
|
1252
|
-
|
1253
|
-
|
1254
|
-
|
1255
|
-
// originalStatistic: 27,
|
1256
|
-
// bootstrapMean: 27.1,
|
1257
|
-
// bias: 0.1,
|
1258
|
-
// standardError: 1.4,
|
1259
|
-
// lowerBound: 24.5,
|
1260
|
-
// upperBound: 29.8,
|
1261
|
-
// confidence: 0.95,
|
1262
|
-
// iterations: 1000
|
1263
|
-
// }
|
1558
|
+
const ensemble = datly.ensemble_voting_regressor(
|
1559
|
+
[model1, model2, model3],
|
1560
|
+
X_test
|
1561
|
+
);
|
1264
1562
|
```
|
1265
1563
|
|
1266
1564
|
---
|
1267
1565
|
|
1268
|
-
|
1566
|
+
## Model Evaluation
|
1567
|
+
|
1568
|
+
### `train_test_split(X, y, test_size = 0.2, seed = 42)`
|
1269
1569
|
|
1270
|
-
|
1570
|
+
Splits data into training and testing sets.
|
1271
1571
|
|
1272
|
-
|
1572
|
+
**Parameters:**
|
1573
|
+
- `X`: 2D array of features
|
1574
|
+
- `y`: Array of labels
|
1575
|
+
- `test_size`: Proportion for test set (default: 0.2)
|
1576
|
+
- `seed`: Random seed (default: 42)
|
1273
1577
|
|
1274
|
-
|
1275
|
-
|
1578
|
+
**Returns:**
|
1579
|
+
```yaml
|
1580
|
+
type: split
|
1581
|
+
sizes:
|
1582
|
+
train: 80
|
1583
|
+
test: 20
|
1584
|
+
indices:
|
1585
|
+
train:
|
1586
|
+
- 0
|
1587
|
+
- 2
|
1588
|
+
- 3
|
1589
|
+
...
|
1590
|
+
test:
|
1591
|
+
- 1
|
1592
|
+
- 4
|
1593
|
+
...
|
1594
|
+
preview:
|
1595
|
+
x_train:
|
1596
|
+
- - 1
|
1597
|
+
- 2
|
1598
|
+
- - 3
|
1599
|
+
- 4
|
1600
|
+
y_train:
|
1601
|
+
- 0
|
1602
|
+
- 1
|
1603
|
+
- 0
|
1604
|
+
```
|
1276
1605
|
|
1606
|
+
**Example:**
|
1277
1607
|
```javascript
|
1278
|
-
const
|
1279
|
-
|
1280
|
-
console.log(sw);
|
1281
|
-
// {
|
1282
|
-
// statistic: 0.96,
|
1283
|
-
// pValue: 0.82,
|
1284
|
-
// isNormal: true,
|
1285
|
-
// alpha: 0.05,
|
1286
|
-
// interpretation: "Fail to reject null hypothesis..."
|
1287
|
-
// }
|
1608
|
+
const split = datly.train_test_split(X, y, 0.2, 42);
|
1609
|
+
// Use split.indices to extract train/test data
|
1288
1610
|
```
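Extracting the actual rows from `indices` could look like the sketch below. It assumes the result is available as a plain object with `indices.train` and `indices.test` arrays, as in the Returns block. `exampleSplit` is a hand-written stand-in, not real datly output, and `takeRows` is a hypothetical helper.

```javascript
// Stand-in for a parsed train_test_split result (hypothetical values).
const exampleSplit = { indices: { train: [0, 2, 3], test: [1, 4] } };

const features = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]];
const labels = [0, 1, 0, 1, 0];

// Pick the rows at the given positions, preserving the order of the index list.
const takeRows = (rows, indexList) => indexList.map((i) => rows[i]);

const xTrain = takeRows(features, exampleSplit.indices.train);
const yTrain = takeRows(labels, exampleSplit.indices.train);
const xTest = takeRows(features, exampleSplit.indices.test);
const yTest = takeRows(labels, exampleSplit.indices.test);

console.log({ xTrain, yTrain, xTest, yTest });
```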
|
1289
1611
|
|
1290
|
-
|
1612
|
+
### Classification Metrics
|
1291
1613
|
|
1292
|
-
|
1614
|
+
#### `metrics_classification(y_true, y_pred)`
|
1615
|
+
|
1616
|
+
Calculates classification metrics including accuracy, precision, recall, and F1-score.
|
1293
1617
|
|
1294
|
-
|
1295
|
-
|
1618
|
+
**Returns:**
|
1619
|
+
```yaml
|
1620
|
+
type: metric
|
1621
|
+
name: classification_report
|
1622
|
+
confusion_matrix:
|
1623
|
+
tp: 45
|
1624
|
+
fp: 5
|
1625
|
+
tn: 42
|
1626
|
+
fn: 8
|
1627
|
+
accuracy: 0.87
|
1628
|
+
precision: 0.9
|
1629
|
+
recall: 0.849
|
1630
|
+
f1: 0.874
|
1631
|
+
```
|
1296
1632
|
|
1633
|
+
**Example:**
|
1297
1634
|
```javascript
|
1298
|
-
const
|
1299
|
-
const
|
1300
|
-
|
1301
|
-
// {
|
1302
|
-
// statistic: 0.15,
|
1303
|
-
// pValue: 0.95,
|
1304
|
-
// isNormal: true,
|
1305
|
-
// lambda: 0.47
|
1306
|
-
// }
|
1635
|
+
const y_true = [0, 1, 1, 0, 1, 1, 0, 0];
|
1636
|
+
const y_pred = [0, 1, 0, 0, 1, 1, 0, 1];
|
1637
|
+
const metrics = datly.metrics_classification(y_true, y_pred);
|
1307
1638
|
```
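For reference, the reported fields follow the usual binary definitions. The sketch below recomputes them in plain JavaScript for the arrays in the example, assuming label `1` is the positive class. `classificationReport` is a hypothetical stand-in written for illustration, not datly's implementation:

```javascript
// Plain-JS sketch of binary classification metrics (label 1 = positive class).
function classificationReport(yTrue, yPred) {
  let tp = 0, fp = 0, tn = 0, fn = 0;
  for (let i = 0; i < yTrue.length; i++) {
    if (yPred[i] === 1 && yTrue[i] === 1) tp++;
    else if (yPred[i] === 1 && yTrue[i] === 0) fp++;
    else if (yPred[i] === 0 && yTrue[i] === 0) tn++;
    else fn++; // predicted 0, actual 1
  }
  const accuracy = (tp + tn) / yTrue.length;
  // Guard against division by zero when a class is never predicted or never present.
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  const f1 = precision + recall === 0
    ? 0
    : (2 * precision * recall) / (precision + recall);
  return { confusion_matrix: { tp, fp, tn, fn }, accuracy, precision, recall, f1 };
}

const report = classificationReport(
  [0, 1, 1, 0, 1, 1, 0, 0],
  [0, 1, 0, 0, 1, 1, 0, 1]
);
console.log(report);
```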
|
1308
1639
|
|
1309
|
-
|
1640
|
+
### Regression Metrics
|
1310
1641
|
|
1311
|
-
|
1642
|
+
#### `metrics_regression(y_true, y_pred)`
|
1312
1643
|
|
1313
|
-
|
1314
|
-
|
1644
|
+
Calculates regression metrics including MSE, MAE, and R².
|
1645
|
+
|
1646
|
+
**Returns:**
|
1647
|
+
```yaml
|
1648
|
+
type: metric
|
1649
|
+
name: regression_report
|
1650
|
+
mse: 12.34
|
1651
|
+
mae: 2.87
|
1652
|
+
r2: 0.856
|
1653
|
+
```
|
1315
1654
|
|
1655
|
+
**Example:**
|
1316
1656
|
```javascript
|
1317
|
-
const
|
1318
|
-
const
|
1319
|
-
|
1320
|
-
// {
|
1321
|
-
// statistic: 0.234,
|
1322
|
-
// adjustedStatistic: 0.247,
|
1323
|
-
// pValue: 0.75,
|
1324
|
-
// isNormal: true
|
1325
|
-
// }
|
1657
|
+
const y_true = [3.0, 5.0, 7.0, 9.0];
|
1658
|
+
const y_pred = [2.8, 5.2, 6.9, 9.1];
|
1659
|
+
const metrics = datly.metrics_regression(y_true, y_pred);
|
1326
1660
|
```
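The three metrics can be written out directly. A minimal plain-JavaScript sketch using the arrays from the example; `regressionReport` is a hypothetical helper, and the R² formula used here (1 minus residual sum of squares over total sum of squares) is the standard definition, assumed rather than confirmed to match datly's internals:

```javascript
// Plain-JS sketch of MSE, MAE and R² for paired true/predicted values.
function regressionReport(yTrue, yPred) {
  const n = yTrue.length;
  const mean = yTrue.reduce((s, v) => s + v, 0) / n;
  let ssRes = 0; // sum of squared residuals
  let ssTot = 0; // total sum of squares around the mean
  let absSum = 0; // sum of absolute errors
  for (let i = 0; i < n; i++) {
    const err = yTrue[i] - yPred[i];
    ssRes += err * err;
    absSum += Math.abs(err);
    ssTot += (yTrue[i] - mean) ** 2;
  }
  return {
    mse: ssRes / n,
    mae: absSum / n,
    // R² is undefined for a constant target; NaN signals that case here.
    r2: ssTot === 0 ? NaN : 1 - ssRes / ssTot,
  };
}

const regression = regressionReport([3.0, 5.0, 7.0, 9.0], [2.8, 5.2, 6.9, 9.1]);
console.log(regression);
```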
|
1327
1661
|
|
1328
|
-
|
1662
|
+
### Cross Validation
|
1329
1663
|
|
1330
|
-
|
1664
|
+
#### `cross_validate(X, y, model_type, options = {})`
|
1331
1665
|
|
1332
|
-
|
1333
|
-
Jarque-Bera test (based on skewness and kurtosis).
|
1666
|
+
Performs k-fold cross-validation.
|
1334
1667
|
|
1335
|
-
|
1336
|
-
|
1337
|
-
|
1338
|
-
|
1339
|
-
|
1340
|
-
|
1341
|
-
|
1342
|
-
|
1343
|
-
|
1344
|
-
|
1345
|
-
|
1668
|
+
**Parameters:**
|
1669
|
+
- `X`: 2D array of features
|
1670
|
+
- `y`: Array of labels
|
1671
|
+
- `model_type`: String, one of 'linear_regression', 'logistic_regression', 'knn_classifier', 'decision_tree_classifier', 'random_forest_classifier'
|
1672
|
+
- `options`:
|
1673
|
+
- `k_folds`: Number of folds (default: 5)
|
1674
|
+
- Model-specific options (e.g., `k` for KNN, `max_depth` for trees)
|
1675
|
+
|
1676
|
+
**Returns:**
|
1677
|
+
```yaml
|
1678
|
+
type: cross_validation
|
1679
|
+
model_type: logistic_regression
|
1680
|
+
k_folds: 5
|
1681
|
+
scores:
|
1682
|
+
- 0.85
|
1683
|
+
- 0.88
|
1684
|
+
- 0.82
|
1685
|
+
- 0.87
|
1686
|
+
- 0.86
|
1687
|
+
mean_score: 0.856
|
1688
|
+
std_score: 0.022
|
1689
|
+
```
|
1690
|
+
|
1691
|
+
**Example:**
|
1692
|
+
```javascript
|
1693
|
+
const cv = datly.cross_validate(X, y, 'logistic_regression', {
|
1694
|
+
k_folds: 5,
|
1695
|
+
learning_rate: 0.1,
|
1696
|
+
iterations: 1000
|
1697
|
+
});
|
1346
1698
|
```
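To make the k-fold procedure concrete, here is a sketch of how row indices can be partitioned into folds and how per-fold scores reduce to a mean and standard deviation. Both helpers are hypothetical; the contiguous, unshuffled split and the population standard deviation are assumptions, and datly may shuffle or use a different estimator:

```javascript
// Sketch: contiguous k-fold partition of row indices (no shuffling).
function kFoldIndices(n, k) {
  const folds = [];
  for (let f = 0; f < k; f++) {
    const start = Math.floor((f * n) / k);
    const end = Math.floor(((f + 1) * n) / k);
    const train = [];
    const test = [];
    // Rows inside [start, end) form the held-out fold; the rest are training rows.
    for (let i = 0; i < n; i++) {
      if (i >= start && i < end) test.push(i);
      else train.push(i);
    }
    folds.push({ train, test });
  }
  return folds;
}

// Summarize per-fold scores (population standard deviation, by assumption).
function summarizeScores(scores) {
  const mean = scores.reduce((s, v) => s + v, 0) / scores.length;
  const variance =
    scores.reduce((s, v) => s + (v - mean) ** 2, 0) / scores.length;
  return { mean, std: Math.sqrt(variance) };
}

const folds = kFoldIndices(10, 5);
const summary = summarizeScores([0.85, 0.88, 0.82, 0.87, 0.86]);
console.log(folds[0], summary);
```

Each row appears in exactly one test fold, so every observation is used for evaluation once.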
|
1347
1699
|
|
1348
|
-
|
1349
|
-
|
1350
|
-
---
|
1700
|
+
### Feature Importance
|
1351
1701
|
|
1352
|
-
|
1353
|
-
D'Agostino K² test.
|
1702
|
+
#### `feature_importance_tree(model)`
|
1354
1703
|
|
1355
|
-
|
1356
|
-
const data = Array.from({ length: 50 }, () => Math.random() * 10);
|
1357
|
-
const dago = datly.normalityTests.dagoTest(data);
|
1358
|
-
console.log(dago);
|
1359
|
-
// {
|
1360
|
-
// statistic: 2.34,
|
1361
|
-
// pValue: 0.31,
|
1362
|
-
// isNormal: true,
|
1363
|
-
// skewness: 0.23,
|
1364
|
-
// excessKurtosis: -0.45
|
1365
|
-
// }
|
1366
|
-
```
|
1704
|
+
Extracts feature importance from tree-based models.
|
1367
1705
|
|
1368
|
-
**
|
1706
|
+
**Parameters:**
|
1707
|
+
- `model`: Trained decision tree or random forest model
|
1369
1708
|
|
1370
|
-
|
1709
|
+
**Returns:**
|
1710
|
+
```yaml
|
1711
|
+
type: feature_importance
|
1712
|
+
model: random_forest_classifier
|
1713
|
+
n_trees: 10
|
1714
|
+
importance:
|
1715
|
+
- 0.45
|
1716
|
+
- 0.32
|
1717
|
+
- 0.15
|
1718
|
+
- 0.08
|
1719
|
+
```
|
1371
1720
|
|
1372
|
-
|
1373
|
-
|
1374
|
-
|
1375
|
-
|
1376
|
-
const data = [12, 14, 15, 16, 17, 18, 20, 22];
|
1377
|
-
const batch = datly.normalityTests.batchNormalityTest(data);
|
1378
|
-
console.log(batch);
|
1379
|
-
// {
|
1380
|
-
// individualTests: {
|
1381
|
-
// shapiroWilk: {...},
|
1382
|
-
// jarqueBera: {...},
|
1383
|
-
// andersonDarling: {...},
|
1384
|
-
// kolmogorovSmirnov: {...}
|
1385
|
-
// },
|
1386
|
-
// summary: {
|
1387
|
-
// testsRun: 4,
|
1388
|
-
// testsPassingNormality: 4,
|
1389
|
-
// consensusNormal: true,
|
1390
|
-
// strongNormalEvidence: true
|
1391
|
-
// },
|
1392
|
-
// recommendation: "Strong evidence for normality..."
|
1393
|
-
// }
|
1721
|
+
**Example:**
|
1722
|
+
```javascript
|
1723
|
+
const model = datly.train_random_forest_classifier(X, y);
|
1724
|
+
const importance = datly.feature_importance_tree(model);
|
1394
1725
|
```
|
1395
1726
|
|
1396
1727
|
---
|
1397
1728
|
|
1398
|
-
|
1729
|
+
## Data Preprocessing
|
1399
1730
|
|
1400
|
-
|
1731
|
+
### Scaling
|
1401
1732
|
|
1402
|
-
####
|
1733
|
+
#### `standard_scaler_fit(X)`
|
1403
1734
|
|
1404
|
-
|
1405
|
-
Pearson correlation (linear relationships).
|
1735
|
+
Fits a standard scaler (z-score normalization).
|
1406
1736
|
|
1407
|
-
|
1408
|
-
|
1409
|
-
|
1737
|
+
**Returns:**
|
1738
|
+
```yaml
|
1739
|
+
type: standard_scaler
|
1740
|
+
params:
|
1741
|
+
- mean: 50.5
|
1742
|
+
std: 15.2
|
1743
|
+
- mean: 100.3
|
1744
|
+
std: 25.7
|
1745
|
+
n: 100
|
1746
|
+
p: 2
|
1747
|
+
```
|
1410
1748
|
|
1411
|
-
|
1412
|
-
|
1413
|
-
|
1414
|
-
|
1415
|
-
// pValue: 0.000,
|
1416
|
-
// tStatistic: Infinity,
|
1417
|
-
// significant: true,
|
1418
|
-
// confidenceInterval: { lower: 1.0, upper: 1.0 }
|
1419
|
-
// }
|
1749
|
+
**Example:**
|
1750
|
+
```javascript
|
1751
|
+
const X = [[50, 100], [60, 120], [40, 90]];
|
1752
|
+
const scaler = datly.standard_scaler_fit(X);
|
1420
1753
|
```
|
1421
1754
|
|
1422
|
-
|
1755
|
+
#### `standard_scaler_transform(scaler, X)`
|
1423
1756
|
|
1424
|
-
|
1425
|
-
- |r| < 0.3: Weak
|
1426
|
-
- 0.3 ≤ |r| < 0.7: Moderate
|
1427
|
-
- |r| ≥ 0.7: Strong
|
1757
|
+
Transforms data using a fitted standard scaler.
|
1428
1758
|
|
1429
|
-
|
1430
|
-
|
1431
|
-
|
1432
|
-
|
1433
|
-
|
1434
|
-
|
1435
|
-
|
1436
|
-
|
1437
|
-
|
1438
|
-
|
1439
|
-
|
1440
|
-
// {
|
1441
|
-
// correlation: 1.0,
|
1442
|
-
// pValue: 0.000,
|
1443
|
-
// significant: true,
|
1444
|
-
// xRanks: [1, 2, 3, 4, 5],
|
1445
|
-
// yRanks: [1, 2, 3, 4, 5]
|
1446
|
-
// }
|
1759
|
+
**Returns:**
|
1760
|
+
```yaml
|
1761
|
+
type: scaled_data
|
1762
|
+
method: standard
|
1763
|
+
preview:
|
1764
|
+
- - 0.0
|
1765
|
+
- 0.0
|
1766
|
+
- - 0.625
|
1767
|
+
- 0.767
|
1768
|
+
- - -0.625
|
1769
|
+
- -0.767
|
1447
1770
|
```
|
1448
1771
|
|
1449
|
-
**
|
1450
|
-
|
1451
|
-
---
|
1452
|
-
|
1453
|
-
##### `kendall(x, y)`
|
1454
|
-
Kendall's Tau correlation.
|
1455
|
-
|
1772
|
+
**Example:**
|
1456
1773
|
```javascript
|
1457
|
-
const
|
1458
|
-
const y = [2, 1, 4, 3, 5];
|
1459
|
-
|
1460
|
-
const corr = datly.correlation.kendall(x, y);
|
1461
|
-
console.log(corr);
|
1462
|
-
// {
|
1463
|
-
// correlation: 0.6,
|
1464
|
-
// pValue: 0.142,
|
1465
|
-
// zStatistic: 1.47,
|
1466
|
-
// concordantPairs: 8,
|
1467
|
-
// discordantPairs: 2,
|
1468
|
-
// significant: false
|
1469
|
-
// }
|
1774
|
+
const scaled = datly.standard_scaler_transform(scaler, X);
|
1470
1775
|
```
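The fit and transform steps amount to computing a mean and standard deviation per column, then applying `(x - mean) / std`. A plain-JavaScript sketch under stated assumptions: `fitStandard` and `applyStandard` are hypothetical names, and the population standard deviation (dividing by `n`) is assumed, since this README does not say which estimator datly uses:

```javascript
// Sketch: per-column mean and population standard deviation.
function fitStandard(X) {
  const n = X.length;
  const p = X[0].length;
  const params = [];
  for (let j = 0; j < p; j++) {
    const mean = X.reduce((s, row) => s + row[j], 0) / n;
    const variance = X.reduce((s, row) => s + (row[j] - mean) ** 2, 0) / n;
    params.push({ mean, std: Math.sqrt(variance) });
  }
  return { params, n, p };
}

// Sketch: z-score each value; constant columns (std = 0) map to 0.
function applyStandard(scaler, X) {
  return X.map((row) =>
    row.map((v, j) => {
      const { mean, std } = scaler.params[j];
      return std === 0 ? 0 : (v - mean) / std;
    })
  );
}

const rows = [[50, 100], [60, 120], [40, 90]];
const fitted = fitStandard(rows);
const zScores = applyStandard(fitted, rows);
console.log(fitted.params, zScores);
```

After the transform, each column has mean 0 and, under the population convention, standard deviation 1.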
|
1471
1776
|
|
1472
|
-
|
1777
|
+
#### `minmax_scaler_fit(X)`
|
1473
1778
|
|
1474
|
-
|
1779
|
+
Fits a min-max scaler (scales to [0, 1] range).
|
1475
1780
|
|
1476
|
-
|
1477
|
-
|
1478
|
-
|
1479
|
-
|
1480
|
-
|
1481
|
-
|
1482
|
-
|
1483
|
-
|
1484
|
-
|
1485
|
-
|
1486
|
-
{ age: 40, income: 60000, spending: 35000 }
|
1487
|
-
]
|
1488
|
-
};
|
1489
|
-
|
1490
|
-
const matrix = datly.correlation.matrix(data, 'pearson');
|
1491
|
-
console.log(matrix);
|
1492
|
-
// {
|
1493
|
-
// correlations: {
|
1494
|
-
// age: { age: 1, income: 1, spending: 1 },
|
1495
|
-
// income: { age: 1, income: 1, spending: 1 },
|
1496
|
-
// spending: { age: 1, income: 1, spending: 1 }
|
1497
|
-
// },
|
1498
|
-
// pValues: {...},
|
1499
|
-
// strongCorrelations: [
|
1500
|
-
// { variable1: 'age', variable2: 'income', correlation: 1.0 }
|
1501
|
-
// ]
|
1502
|
-
// }
|
1781
|
+
**Returns:**
|
1782
|
+
```yaml
|
1783
|
+
type: minmax_scaler
|
1784
|
+
params:
|
1785
|
+
- min: 40
|
1786
|
+
max: 60
|
1787
|
+
- min: 90
|
1788
|
+
max: 120
|
1789
|
+
n: 100
|
1790
|
+
p: 2
|
1503
1791
|
```
|
1504
1792
|
|
1505
|
-
|
1793
|
+
#### `minmax_scaler_transform(scaler, X)`
|
1506
1794
|
|
1507
|
-
|
1508
|
-
Partial correlation (controlling for third variable).
|
1509
|
-
|
1510
|
-
```javascript
|
1511
|
-
const x = [1, 2, 3, 4, 5];
|
1512
|
-
const y = [2, 3, 4, 5, 6];
|
1513
|
-
const z = [1, 1, 2, 2, 3]; // Control variable
|
1795
|
+
Transforms data using a fitted min-max scaler.
|
1514
1796
|
|
1515
|
-
|
1516
|
-
|
1517
|
-
|
1518
|
-
|
1519
|
-
|
1520
|
-
|
1521
|
-
|
1522
|
-
|
1797
|
+
**Returns:**
|
1798
|
+
```yaml
|
1799
|
+
type: scaled_data
|
1800
|
+
method: minmax
|
1801
|
+
preview:
|
1802
|
+
- - 0.5
|
1803
|
+
- 0.333
|
1804
|
+
- - 1.0
|
1805
|
+
- 1.0
|
1806
|
+
- - 0.0
|
1807
|
+
- 0.0
|
1523
1808
|
```
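Min-max scaling maps each column to [0, 1] with `(x - min) / (max - min)`. The sketch below reproduces the preview values for the three-row matrix used in the scaling examples; `fitMinMax` and `applyMinMax` are hypothetical helpers written for illustration, not datly functions:

```javascript
// Sketch: per-column minimum and maximum.
function fitMinMax(X) {
  const p = X[0].length;
  const params = [];
  for (let j = 0; j < p; j++) {
    const column = X.map((row) => row[j]);
    params.push({ min: Math.min(...column), max: Math.max(...column) });
  }
  return { params, n: X.length, p };
}

// Sketch: rescale to [0, 1]; constant columns (max = min) map to 0.
function applyMinMax(scaler, X) {
  return X.map((row) =>
    row.map((v, j) => {
      const { min, max } = scaler.params[j];
      return max === min ? 0 : (v - min) / (max - min);
    })
  );
}

const samples = [[50, 100], [60, 120], [40, 90]];
const minmax = fitMinMax(samples);
const rescaled = applyMinMax(minmax, samples);
console.log(minmax.params, rescaled);
```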
|
1524
1809
|
|
1525
1810
|
---
|
1526
1811
|
|
1527
|
-
|
1528
|
-
Calculate covariance.
|
1812
|
+
## Dimensionality Reduction
|
1529
1813
|
|
1530
|
-
|
1531
|
-
const x = [1, 2, 3, 4, 5];
|
1532
|
-
const y = [2, 4, 6, 8, 10];
|
1814
|
+
### Principal Component Analysis (PCA)
|
1533
1815
|
|
1534
|
-
|
1535
|
-
const cov = datly.correlation.covariance(x, y, true);
|
1536
|
-
console.log(cov);
|
1537
|
-
// {
|
1538
|
-
// covariance: 5,
|
1539
|
-
// meanX: 3,
|
1540
|
-
// meanY: 6,
|
1541
|
-
// sampleSize: 5
|
1542
|
-
// }
|
1543
|
-
```
|
1816
|
+
#### `train_pca(X, n_components = 2)`
|
1544
1817
|
|
1545
|
-
|
1818
|
+
Trains a PCA model.
|
1546
1819
|
|
1547
|
-
|
1548
|
-
|
1549
|
-
|
1550
|
-
|
1551
|
-
#### Methods
|
1552
|
-
|
1553
|
-
##### `linear(x, y)`
|
1554
|
-
Simple linear regression.
|
1555
|
-
|
1556
|
-
```javascript
|
1557
|
-
const x = [1, 2, 3, 4, 5];
|
1558
|
-
const y = [2, 4, 5, 4, 5];
|
1559
|
-
|
1560
|
-
const model = datly.regression.linear(x, y);
|
1561
|
-
console.log(model);
|
1562
|
-
// {
|
1563
|
-
// slope: 0.6,
|
1564
|
-
// intercept: 2.2,
|
1565
|
-
// rSquared: 0.46,
|
1566
|
-
// adjustedRSquared: 0.28,
|
1567
|
-
// equation: 'y = 2.2000 + 0.6000x',
|
1568
|
-
// pValueSlope: 0.158,
|
1569
|
-
// pValueModel: 0.158,
|
1570
|
-
// residuals: [-0.2, 0.2, 0.4, -1.0, 0.6],
|
1571
|
-
// predicted: [2.8, 3.4, 4.0, 4.6, 5.2]
|
1572
|
-
// }
|
1573
|
-
```
|
1574
|
-
|
1575
|
-
---
|
1576
|
-
|
1577
|
-
##### `multiple(dataset, dependent, independents)`
|
1578
|
-
Multiple linear regression.
|
1579
|
-
|
1580
|
-
```javascript
|
1581
|
-
const data = {
|
1582
|
-
headers: ['sales', 'advertising', 'price', 'competition'],
|
1583
|
-
data: [
|
1584
|
-
{ sales: 100, advertising: 10, price: 50, competition: 3 },
|
1585
|
-
{ sales: 150, advertising: 15, price: 45, competition: 2 },
|
1586
|
-
{ sales: 120, advertising: 12, price: 48, competition: 4 },
|
1587
|
-
{ sales: 180, advertising: 20, price: 40, competition: 1 }
|
1588
|
-
]
|
1589
|
-
};
|
1590
|
-
|
1591
|
-
const model = datly.regression.multiple(
|
1592
|
-
data,
|
1593
|
-
'sales',
|
1594
|
-
['advertising', 'price', 'competition']
|
1595
|
-
);
|
1596
|
-
|
1597
|
-
console.log(model);
|
1598
|
-
// {
|
1599
|
-
// coefficients: [
|
1600
|
-
// { variable: 'Intercept', coefficient: 50, pValue: 0.05 },
|
1601
|
-
// { variable: 'advertising', coefficient: 5.5, pValue: 0.01 },
|
1602
|
-
// { variable: 'price', coefficient: -2.1, pValue: 0.03 },
|
1603
|
-
// { variable: 'competition', coefficient: -10, pValue: 0.02 }
|
1604
|
-
// ],
|
1605
|
-
// rSquared: 0.95,
|
1606
|
-
// adjustedRSquared: 0.90,
|
1607
|
-
// fStatistic: 19.0,
|
1608
|
-
// pValueModel: 0.001,
|
1609
|
-
// equation: 'y = 50.0000 + 5.5000*advertising + -2.1000*price + -10.0000*competition'
|
1610
|
-
// }
|
1611
|
-
```
|
1612
|
-
|
1613
|
-
---
|
1614
|
-
|
1615
|
-
##### `polynomial(x, y, degree)`
|
1616
|
-
Polynomial regression.
|
1617
|
-
|
1618
|
-
```javascript
|
1619
|
-
const x = [1, 2, 3, 4, 5];
|
1620
|
-
const y = [1, 4, 9, 16, 25]; // y = x²
|
1621
|
-
|
1622
|
-
const model = datly.regression.polynomial(x, y, 2);
|
1623
|
-
console.log(model);
|
1624
|
-
// {
|
1625
|
-
// coefficients: [0, 0, 1], // y = 0 + 0x + 1x²
|
1626
|
-
// degree: 2,
|
1627
|
-
// rSquared: 1.0,
|
1628
|
-
// equation: 'y = 0.0000 + 0.0000*x + 1.0000*x^2',
|
1629
|
-
// predicted: [1, 4, 9, 16, 25]
|
1630
|
-
// }
|
1631
|
-
```
|
1632
|
-
|
1633
|
-
---
|
1634
|
-
|
1635
|
-
##### `logistic(x, y, maxIterations, tolerance)`
|
1636
|
-
Logistic regression (binary classification).
|
1637
|
-
|
1638
|
-
```javascript
|
1639
|
-
const x = [1, 2, 3, 4, 5, 6];
|
1640
|
-
const y = [0, 0, 0, 1, 1, 1]; // Binary outcome
|
1641
|
-
|
1642
|
-
const model = datly.regression.logistic(x, y);
|
1643
|
-
console.log(model);
|
1644
|
-
// {
|
1645
|
-
// intercept: -3.5,
|
1646
|
-
// slope: 1.2,
|
1647
|
-
// probabilities: [0.12, 0.23, 0.38, 0.55, 0.70, 0.81],
|
1648
|
-
// predicted: [0, 0, 0, 1, 1, 1],
|
1649
|
-
// accuracy: 1.0,
|
1650
|
-
// logLikelihood: -2.1,
|
1651
|
-
// mcFaddenR2: 0.68
|
1652
|
-
// }
|
1653
|
-
```
|
1654
|
-
|
1655
|
-
---
|
1656
|
-
|
1657
|
-
##### `predict(model, x)`
|
1658
|
-
Make predictions using fitted model.
|
1659
|
-
|
1660
|
-
```javascript
|
1661
|
-
// After fitting a model
|
1662
|
-
const newX = [6, 7, 8];
|
1663
|
-
const predictions = datly.regression.predict(model, newX);
|
1664
|
-
console.log(predictions); // [5.8, 6.4, 7.0]
|
1665
|
-
```
|
1666
|
-
|
1667
|
-
---
|
1820
|
+
**Parameters:**
|
1821
|
+
- `X`: 2D array of features
|
1822
|
+
- `n_components`: Number of principal components (default: 2)
|
1668
1823
|
|
1669
|
-
|
1670
|
-
|
1671
|
-
|
1672
|
-
|
1673
|
-
|
1674
|
-
|
1675
|
-
|
1676
|
-
|
1677
|
-
|
1678
|
-
|
1679
|
-
|
1680
|
-
|
1681
|
-
|
1682
|
-
|
1683
|
-
|
1684
|
-
|
1685
|
-
|
1686
|
-
],
|
1687
|
-
length: 4,
|
1688
|
-
columns: 3
|
1689
|
-
};
|
1690
|
-
|
1691
|
-
const report = datly.reportGenerator.summary(data);
|
1692
|
-
console.log(report);
|
1693
|
-
// {
|
1694
|
-
// title: 'Statistical Summary Report',
|
1695
|
-
// generatedAt: '2025-01-15T10:30:00.000Z',
|
1696
|
-
// basicInfo: {
|
1697
|
-
// totalRows: 4,
|
1698
|
-
// totalColumns: 3,
|
1699
|
-
// headers: ['age', 'income', 'department']
|
1700
|
-
// },
|
1701
|
-
// columnAnalysis: {
|
1702
|
-
// age: {
|
1703
|
-
// type: 'numeric',
|
1704
|
-
// mean: 32.5,
|
1705
|
-
// median: 32.5,
|
1706
|
-
// min: 25,
|
1707
|
-
// max: 40,
|
1708
|
-
// standardDeviation: 6.45
|
1709
|
-
// },
|
1710
|
-
// income: {
|
1711
|
-
// type: 'numeric',
|
1712
|
-
// mean: 46250,
|
1713
|
-
// median: 47500,
|
1714
|
-
// ...
|
1715
|
-
// },
|
1716
|
-
// department: {
|
1717
|
-
// type: 'categorical',
|
1718
|
-
// categories: [...],
|
1719
|
-
// mostFrequent: { value: 'Sales', frequency: 2 }
|
1720
|
-
// }
|
1721
|
-
// },
|
1722
|
-
// dataQuality: {
|
1723
|
-
// overallScore: 95,
|
1724
|
-
// completenessScore: 100,
|
1725
|
-
// consistencyScore: 90
|
1726
|
-
// },
|
1727
|
-
// keyInsights: [
|
1728
|
-
// {
|
1729
|
-
// type: 'correlation',
|
1730
|
-
// title: 'Strong correlation between age and income',
|
1731
|
-
// importance: 8
|
1732
|
-
// }
|
1733
|
-
// ],
|
1734
|
-
// recommendations: [...]
|
1735
|
-
// }
|
1824
|
+
**Returns:**
|
1825
|
+
```yaml
|
1826
|
+
type: pca
|
1827
|
+
n_components: 2
|
1828
|
+
means:
|
1829
|
+
- 50.5
|
1830
|
+
- 100.3
|
1831
|
+
- 75.8
|
1832
|
+
components:
|
1833
|
+
- - 0.707
|
1834
|
+
- 0.707
|
1835
|
+
- 0.0
|
1836
|
+
- - -0.707
|
1837
|
+
- 0.707
|
1838
|
+
- 0.0
|
1839
|
+
n: 100
|
1840
|
+
p: 3
|
1736
1841
|
```
|
1737
1842
|
|
1738
|
-
|
1739
|
-
|
1740
|
-
##### `exportSummary(summary, format)`
|
1741
|
-
Export report in different formats.
|
1742
|
-
|
1843
|
+
**Example:**
|
1743
1844
|
```javascript
|
1744
|
-
const
|
1745
|
-
|
1746
|
-
// Export as JSON
|
1747
|
-
const json = datly.reportGenerator.exportSummary(report, 'json');
|
1748
|
-
|
1749
|
-
// Export as text
|
1750
|
-
const text = datly.reportGenerator.exportSummary(report, 'text');
|
1751
|
-
console.log(text);
|
1752
|
-
// STATISTICAL SUMMARY REPORT
|
1753
|
-
// Generated: 1/15/2025, 10:30:00 AM
|
1754
|
-
// ==================================================
|
1755
|
-
//
|
1756
|
-
// BASIC INFORMATION
|
1757
|
-
// --------------------
|
1758
|
-
// Rows: 4
|
1759
|
-
// Columns: 3
|
1760
|
-
// ...
|
1761
|
-
|
1762
|
-
// Export as CSV
|
1763
|
-
const csv = datly.reportGenerator.exportSummary(report, 'csv');
|
1764
|
-
```
|
1765
|
-
|
1766
|
-
---
|
1767
|
-
|
1768
|
-
### 14. Pattern Detection
|
1769
|
-
|
1770
|
-
Automatically detect patterns in data.
|
1771
|
-
|
1772
|
-
#### Methods
|
1773
|
-
|
1774
|
-
##### `analyze(dataset)`
|
1775
|
-
Comprehensive pattern analysis.
|
1776
|
-
|
1777
|
-
```javascript
|
1778
|
-
const data = {
|
1779
|
-
headers: ['date', 'sales', 'temperature'],
|
1780
|
-
data: [
|
1781
|
-
{ date: '2024-01-01', sales: 100, temperature: 20 },
|
1782
|
-
{ date: '2024-01-02', sales: 110, temperature: 22 },
|
1783
|
-
{ date: '2024-01-03', sales: 105, temperature: 21 },
|
1784
|
-
{ date: '2024-01-04', sales: 115, temperature: 23 }
|
1785
|
-
],
|
1786
|
-
length: 4,
|
1787
|
-
columns: 3
|
1788
|
-
};
|
1789
|
-
|
1790
|
-
const patterns = datly.patternDetector.analyze(data);
|
1791
|
-
console.log(patterns);
|
1792
|
-
// {
|
1793
|
-
// patterns: {
|
1794
|
-
// trends: [
|
1795
|
-
// {
|
1796
|
-
// column: 'sales',
|
1797
|
-
// direction: 'increasing',
|
1798
|
-
// slope: 5,
|
1799
|
-
// rSquared: 0.75,
|
1800
|
-
// strength: 'strong'
|
1801
|
-
// }
|
1802
|
-
// ],
|
1803
|
-
// seasonality: [...],
|
1804
|
-
// outliers: [...],
|
1805
|
-
// correlations: {
|
1806
|
-
// strongCorrelations: [
|
1807
|
-
// {
|
1808
|
-
// variable1: 'sales',
|
1809
|
-
// variable2: 'temperature',
|
1810
|
-
// correlation: 0.95,
|
1811
|
-
// strength: 'very_strong'
|
1812
|
-
// }
|
1813
|
-
// ]
|
1814
|
-
// },
|
1815
|
-
// distributions: [...],
|
1816
|
-
// clustering: [...],
|
1817
|
-
// temporal: [...]
|
1818
|
-
// },
|
1819
|
-
// insights: [
|
1820
|
-
// {
|
1821
|
-
// type: 'trend',
|
1822
|
-
// importance: 'high',
|
1823
|
-
// message: 'Found 1 strong trend(s) in your data',
|
1824
|
-
// details: ['sales: increasing trend']
|
1825
|
-
// }
|
1826
|
-
// ]
|
1827
|
-
// }
|
1828
|
-
```
|
1829
|
-
|
1830
|
-
---
|
1831
|
-
|
1832
|
-
### 15. Result Interpretation
|
1833
|
-
|
1834
|
-
Interpret statistical test results in plain language.
|
1835
|
-
|
1836
|
-
#### Methods
|
1837
|
-
|
1838
|
-
##### `interpret(testResult)`
|
1839
|
-
Interpret any statistical test result.
|
1840
|
-
|
1841
|
-
```javascript
|
1842
|
-
// After performing a t-test
|
1843
|
-
const tTestResult = datly.hypothesisTesting.tTest(group1, group2);
|
1844
|
-
|
1845
|
-
const interpretation = datly.interpreter.interpret(tTestResult);
|
1846
|
-
console.log(interpretation);
|
1847
|
-
// {
|
1848
|
-
// testType: 't-test',
|
1849
|
-
// summary: 'significant difference between groups (t = -3.46, p = 0.008)',
|
1850
|
-
// conclusion: {
|
1851
|
-
// decision: 'reject_null',
|
1852
|
-
// statement: 'At the 95% confidence level, we reject the null hypothesis',
|
1853
|
-
// pValue: 0.008,
|
1854
|
-
// confidenceLevel: 95
|
1855
|
-
// },
|
1856
|
-
// significance: {
|
1857
|
-
// level: 'strong',
|
1858
|
-
// pValue: 0.008,
|
1859
|
-
// interpretation: 'Strong evidence against null hypothesis',
|
1860
|
-
// isSignificant: true
|
1861
|
-
// },
|
1862
|
-
// effectSize: {
|
1863
|
-
// value: 0.85,
|
1864
|
-
// magnitude: 'Large',
|
1865
|
-
// interpretation: 'large effect size'
|
1866
|
-
// },
|
1867
|
-
// plainLanguage: '✓ SIGNIFICANT RESULT: Found a meaningful difference between the groups. (p-value: 0.0080)',
|
1868
|
-
// recommendations: [
|
1869
|
-
// 'Very strong result - investigate practical significance',
|
1870
|
-
// 'Replicate findings with independent data when possible'
|
1871
|
-
// ]
|
1872
|
-
// }
|
1845
|
+
const X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]];
|
1846
|
+
const pca = datly.train_pca(X, 2);
|
1873
1847
|
```
|
1874
1848
|
|
1875
|
-
|
1876
|
-
|
1877
|
-
### 16. Auto-Analysis
|
1878
|
-
|
1879
|
-
Automated end-to-end analysis.
|
1880
|
-
|
1881
|
-
#### Methods
|
1849
|
+
#### `transform_pca(model, X)`
|
1882
1850
|
|
1883
|
-
|
1884
|
-
Perform comprehensive automatic analysis.
|
1851
|
+
Transforms data to principal component space.
|
1885
1852
|
|
1886
|
-
|
1887
|
-
|
1888
|
-
|
1889
|
-
|
1890
|
-
|
1891
|
-
|
1892
|
-
|
1893
|
-
|
1894
|
-
|
1895
|
-
|
1896
|
-
|
1897
|
-
columns: 4
|
1898
|
-
};
|
1899
|
-
|
1900
|
-
const analysis = datly.autoAnalyzer.autoAnalyze(data, {
|
1901
|
-
minCorrelationThreshold: 0.3,
|
1902
|
-
significanceLevel: 0.05,
|
1903
|
-
generateVisualizations: true,
|
1904
|
-
includeAdvancedAnalysis: true
|
1905
|
-
});
|
1906
|
-
|
1907
|
-
console.log(analysis);
|
1908
|
-
// {
|
1909
|
-
// metadata: {
|
1910
|
-
// analysisDate: '2025-01-15T10:30:00.000Z',
|
1911
|
-
// datasetSize: 100,
|
1912
|
-
// columnsAnalyzed: 4
|
1913
|
-
// },
|
1914
|
-
// variableClassification: {
|
1915
|
-
// quantitative: [
|
1916
|
-
// { name: 'age', type: 'quantitative', subtype: 'discrete' },
|
1917
|
-
// { name: 'income', type: 'quantitative', subtype: 'continuous' },
|
1918
|
-
// { name: 'purchase', type: 'quantitative', subtype: 'continuous' }
|
1919
|
-
// ],
|
1920
|
-
// qualitative: [
|
1921
|
-
// { name: 'gender', type: 'binary', categories: ['M', 'F'] }
|
1922
|
-
// ]
|
1923
|
-
// },
|
1924
|
-
// descriptiveStatistics: {
|
1925
|
-
// age: { mean: 32.5, median: 32, std: 6.45, ... },
|
1926
|
-
// income: { mean: 46250, median: 47500, ... },
|
1927
|
-
// purchase: { mean: 137.5, median: 135, ... }
|
1928
|
-
// },
|
1929
|
-
// correlationAnalysis: {
|
1930
|
-
// strongCorrelations: [
|
1931
|
-
// {
|
1932
|
-
// variable1: 'income',
|
1933
|
-
// variable2: 'purchase',
|
1934
|
-
// correlation: 0.92,
|
1935
|
-
// significance: true
|
1936
|
-
// }
|
1937
|
-
// ]
|
1938
|
-
// },
|
1939
|
-
// regressionAnalysis: {
|
1940
|
-
// models: [
|
1941
|
-
// {
|
1942
|
-
// independent: 'income',
|
1943
|
-
// dependent: 'purchase',
|
1944
|
-
// rSquared: 0.85,
|
1945
|
-
// significant: true,
|
1946
|
-
// equation: 'y = 10.5 + 0.0025*income'
|
1947
|
-
// }
|
1948
|
-
// ]
|
1949
|
-
// },
|
1950
|
-
// distributionAnalysis: {
|
1951
|
-
// age: {
|
1952
|
-
// isNormal: true,
|
1953
|
-
// normalityPValue: 0.15,
|
1954
|
-
// skewness: 0.12,
|
1955
|
-
// distributionType: 'normal'
|
1956
|
-
// }
|
1957
|
-
// },
|
1958
|
-
// outlierAnalysis: {
|
1959
|
-
// income: {
|
1960
|
-
// count: 2,
|
1961
|
-
// percentage: 2,
|
1962
|
-
// severity: 'low'
|
1963
|
-
// }
|
1964
|
-
// },
|
1965
|
-
// insights: [
|
1966
|
-
// {
|
1967
|
-
// category: 'overview',
|
1968
|
-
// priority: 'high',
|
1969
|
-
// title: 'Dataset Composition',
|
1970
|
-
// description: 'Dataset with 100 records, 3 numeric and 1 categorical variables',
|
1971
|
-
// icon: '📊'
|
1972
|
-
// },
|
1973
|
-
// {
|
1974
|
-
// category: 'correlation',
|
1975
|
-
// priority: 'high',
|
1976
|
-
// title: 'Very strong correlation between income and purchase',
|
1977
|
-
// description: 'Positive correlation of 0.920',
|
1978
|
-
// icon: '🔗'
|
1979
|
-
// }
|
1980
|
-
// ],
|
1981
|
-
// visualizationSuggestions: [
|
1982
|
-
// {
|
1983
|
-
// type: 'scatter',
|
1984
|
-
// variables: ['income', 'purchase'],
|
1985
|
-
// priority: 'high',
|
1986
|
-
// title: 'income vs purchase'
|
1987
|
-
// },
|
1988
|
-
// {
|
1989
|
-
// type: 'histogram',
|
1990
|
-
// variable: 'age',
|
1991
|
-
// priority: 'medium',
|
1992
|
-
// title: 'Distribution of age'
|
1993
|
-
// }
|
1994
|
-
// ],
|
1995
|
-
// summary: {
|
1996
|
-
// totalInsights: 8,
|
1997
|
-
// highPriorityInsights: 3,
|
1998
|
-
// keyFindings: [...],
|
1999
|
-
// recommendations: [
|
2000
|
-
// 'Explore correlations identified for possible predictive modeling',
|
2001
|
-
// 'Consider transformations for non-normal distributions'
|
2002
|
-
// ]
|
2003
|
-
// }
|
2004
|
-
// }
|
1853
|
+
**Returns:**
|
1854
|
+
```yaml
|
1855
|
+
type: pca_transform
|
1856
|
+
n_components: 2
|
1857
|
+
preview:
|
1858
|
+
- - 2.121
|
1859
|
+
- 0.0
|
1860
|
+
- - 0.707
|
1861
|
+
- 0.0
|
1862
|
+
- - -1.414
|
1863
|
+
- 0.0
|
2005
1864
|
```
|
2006
1865
|
|
2007
|
-
|
2008
|
-
|
2009
|
-
### 17. Machine Learning
|
2010
|
-
|
2011
|
-
Build and train ML models.
|
2012
|
-
|
2013
|
-
#### Creating Models
|
2014
|
-
|
2015
|
-
##### Linear Regression
|
2016
|
-
|
1866
|
+
**Example:**
|
2017
1867
|
```javascript
|
2018
|
-
|
2019
|
-
const model = datly.ml.createLinearRegression({
|
2020
|
-
learningRate: 0.01,
|
2021
|
-
iterations: 1000,
|
2022
|
-
regularization: 'l2', // 'l1', 'l2', or null
|
2023
|
-
lambda: 0.01
|
2024
|
-
});
|
2025
|
-
|
2026
|
-
// Prepare data
|
2027
|
-
const X = [
|
2028
|
-
[1, 2],
|
2029
|
-
[2, 3],
|
2030
|
-
[3, 4],
|
2031
|
-
[4, 5]
|
2032
|
-
];
|
2033
|
-
const y = [3, 5, 7, 9];
|
2034
|
-
|
2035
|
-
// Train model
|
2036
|
-
model.fit(X, y, true); // true = normalize features
|
2037
|
-
|
2038
|
-
// Make predictions
|
2039
|
-
const predictions = model.predict([[5, 6]]);
|
2040
|
-
console.log(predictions); // [11]
|
2041
|
-
|
2042
|
-
// Evaluate model
|
2043
|
-
const score = model.score(X, y);
|
2044
|
-
console.log(score);
|
2045
|
-
// {
|
2046
|
-
// r2Score: 1.0,
|
2047
|
-
// mse: 0.0,
|
2048
|
-
// rmse: 0.0,
|
2049
|
-
// mae: 0.0
|
2050
|
-
// }
|
1868
|
+
const transformed = datly.transform_pca(pca, X);
|
2051
1869
|
```
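To illustrate what training and transforming involve, the sketch below finds only the first principal component by power iteration on the sample covariance matrix, then projects the centered rows onto it. This is a simplified stand-in, not datly's algorithm: `firstPrincipalComponent` and `projectOntoComponent` are hypothetical names, and the method assumes the start vector is not orthogonal to the leading eigenvector:

```javascript
// Sketch: leading eigenvector of the sample covariance matrix via power iteration.
function firstPrincipalComponent(X, iterations = 100) {
  const n = X.length;
  const p = X[0].length;
  const means = Array.from({ length: p }, (_, j) =>
    X.reduce((s, row) => s + row[j], 0) / n
  );
  const centered = X.map((row) => row.map((v, j) => v - means[j]));
  // Sample covariance matrix (p x p), dividing by n - 1.
  const cov = Array.from({ length: p }, (_, a) =>
    Array.from({ length: p }, (_, b) =>
      centered.reduce((s, row) => s + row[a] * row[b], 0) / (n - 1)
    )
  );
  // Repeatedly multiply by the covariance matrix and renormalize.
  let v = Array.from({ length: p }, (_, j) => (j === 0 ? 1 : 0));
  for (let it = 0; it < iterations; it++) {
    const w = cov.map((row) => row.reduce((s, c, j) => s + c * v[j], 0));
    const norm = Math.hypot(...w);
    if (norm === 0) break; // no variance along the current direction
    v = w.map((x) => x / norm);
  }
  return { means, component: v };
}

// Sketch: coordinate of each centered row along the component.
function projectOntoComponent(model, X) {
  return X.map((row) =>
    row.reduce((s, v, j) => s + (v - model.means[j]) * model.component[j], 0)
  );
}

const matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]];
const pcModel = firstPrincipalComponent(matrix);
const projected = projectOntoComponent(pcModel, matrix);
console.log(pcModel.component, projected);
```

For this matrix every column moves together, so the component points along the diagonal direction and the middle row projects to 0.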
|
2052
1870
|
|
2053
1871
|
---
|
2054
1872
|
|
2055
|
-
|
1873
|
+
## Time Series Analysis
|
2056
1874
|
|
2057
|
-
|
2058
|
-
// Create model for classification
|
2059
|
-
const model = datly.ml.createLogisticRegression({
|
2060
|
-
learningRate: 0.01,
|
2061
|
-
iterations: 1000
|
2062
|
-
});
|
1875
|
+
### `moving_average(array, window = 3)`
|
2063
1876
|
|
2064
|
-
|
2065
|
-
const X = [
|
2066
|
-
[1, 2], [2, 3], [3, 4], [4, 5],
|
2067
|
-
[5, 6], [6, 7], [7, 8], [8, 9]
|
2068
|
-
];
|
2069
|
-
const y = [0, 0, 0, 0, 1, 1, 1, 1];
|
2070
|
-
|
2071
|
-
// Train
|
2072
|
-
model.fit(X, y);
|
2073
|
-
|
2074
|
-
// Predict classes
|
2075
|
-
const predictions = model.predict([[3.5, 4.5], [7, 8]]);
|
2076
|
-
console.log(predictions); // [0, 1]
|
1877
|
+
Calculates the moving average of a series over a fixed window.
|
2077
1878
|
|
2078
|
-
|
2079
|
-
|
2080
|
-
|
1879
|
+
**Parameters:**
|
1880
|
+
- `array`: Time series data
|
1881
|
+
- `window`: Window size (default: 3)
|
2081
1882
|
|
2082
|
-
|
2083
|
-
|
2084
|
-
|
2085
|
-
|
2086
|
-
|
2087
|
-
|
2088
|
-
|
2089
|
-
|
1883
|
+
**Returns:**
|
1884
|
+
```yaml
|
1885
|
+
type: time_series
|
1886
|
+
method: moving_average
|
1887
|
+
window: 3
|
1888
|
+
values:
|
1889
|
+
- 10
|
1890
|
+
- 15
|
1891
|
+
- 20
|
1892
|
+
- 22
|
1893
|
+
- 25
|
2090
1894
|
```
|
2091
1895
|
|
2092
|
-
|
2093
|
-
|
2094
|
-
##### K-Nearest Neighbors (KNN)
|
2095
|
-
|
1896
|
+
**Example:**
|
2096
1897
|
```javascript
|
2097
|
-
|
2098
|
-
const
|
2099
|
-
k: 5,
|
2100
|
-
metric: 'euclidean', // 'euclidean', 'manhattan', 'minkowski'
|
2101
|
-
weights: 'uniform' // 'uniform' or 'distance'
|
2102
|
-
});
|
2103
|
-
|
2104
|
-
// Prepare data
|
2105
|
-
const X = [
|
2106
|
-
[1, 2], [2, 3], [3, 4],
|
2107
|
-
[6, 7], [7, 8], [8, 9]
|
2108
|
-
];
|
2109
|
-
const y = [0, 0, 0, 1, 1, 1]; // Classes
|
2110
|
-
|
2111
|
-
// Train (KNN just stores the data)
|
2112
|
-
model.fit(X, y, true, 'classification');
|
2113
|
-
|
2114
|
-
// Predict
|
2115
|
-
const predictions = model.predict([[2, 2], [7, 7]]);
|
2116
|
-
console.log(predictions); // [0, 1]
|
2117
|
-
|
2118
|
-
// Predict with probabilities
|
2119
|
-
const proba = model.predictProba([[4, 5]]);
|
2120
|
-
console.log(proba); // [{ 0: 0.4, 1: 0.6 }]
|
1898
|
+
const data = [10, 20, 30, 20, 30, 25];
|
1899
|
+
const ma = datly.moving_average(data, 3);
|
2121
1900
|
```
|
2122
1901
|
|
2123
|
-
|
2124
|
-
|
2125
|
-
##### Decision Tree
|
2126
|
-
|
2127
|
-
```javascript
|
2128
|
-
// Create decision tree
|
2129
|
-
const model = datly.ml.createDecisionTree({
|
2130
|
-
maxDepth: 10,
|
2131
|
-
minSamplesSplit: 2,
|
2132
|
-
minSamplesLeaf: 1,
|
2133
|
-
criterion: 'gini' // 'gini' or 'entropy'
|
2134
|
-
});
|
2135
|
-
|
2136
|
-
// Train
|
2137
|
-
const X = [
|
2138
|
-
[2.5], [3.5], [4.5], [5.5], [6.5], [7.5]
|
2139
|
-
];
|
2140
|
-
const y = [0, 0, 1, 1, 2, 2]; // Multi-class
|
1902
|
+
### `exponential_smoothing(array, alpha = 0.3)`
|
2141
1903
|
|
2142
|
-
|
1904
|
+
Applies exponential smoothing.
|
2143
1905
|
|
2144
|
-
|
2145
|
-
|
2146
|
-
|
2147
|
-
|
2148
|
-
// Get feature importance
|
2149
|
-
const importance = model.getFeatureImportance();
|
2150
|
-
console.log(importance); // { feature_0: 1.0 }
|
1906
|
+
**Parameters:**
|
1907
|
+
- `array`: Time series data
|
1908
|
+
- `alpha`: Smoothing parameter (0 < α < 1, default: 0.3)
|
2151
1909
|
|
2152
|
-
|
2153
|
-
|
2154
|
-
|
2155
|
-
|
2156
|
-
|
2157
|
-
|
2158
|
-
|
2159
|
-
|
2160
|
-
|
2161
|
-
|
2162
|
-
|
2163
|
-
// }
|
1910
|
+
**Returns:**
|
1911
|
+
```yaml
|
1912
|
+
type: time_series
|
1913
|
+
method: exponential_smoothing
|
1914
|
+
alpha: 0.3
|
1915
|
+
values:
|
1916
|
+
- 10
|
1917
|
+
- 13
|
1918
|
+
- 18.1
|
1919
|
+
- 18.47
|
1920
|
+
- 21.73
|
2164
1921
|
```
|
2165
1922
|
|
2166
|
-
|
2167
|
-
|
2168
|
-
##### Random Forest
|
2169
|
-
|
1923
|
+
**Example:**
|
2170
1924
|
```javascript
|
2171
|
-
|
2172
|
-
const model = datly.ml.createRandomForest({
|
2173
|
-
nEstimators: 100, // Number of trees
|
2174
|
-
maxDepth: 10,
|
2175
|
-
minSamplesSplit: 2,
|
2176
|
-
maxFeatures: 'sqrt', // 'sqrt', 'log2', or number
|
2177
|
-
bootstrap: true
|
2178
|
-
});
|
2179
|
-
|
2180
|
-
// Train
|
2181
|
-
const X = [
|
2182
|
-
[1, 2], [2, 3], [3, 4], [4, 5],
|
2183
|
-
[5, 6], [6, 7], [7, 8], [8, 9]
|
2184
|
-
];
|
2185
|
-
const y = ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'];
|
2186
|
-
|
2187
|
-
model.fit(X, y, 'classification');
|
2188
|
-
|
2189
|
-
// Predict
|
2190
|
-
const predictions = model.predict([[2.5, 3.5], [7, 8]]);
|
2191
|
-
console.log(predictions); // ['A', 'C']
|
2192
|
-
|
2193
|
-
// Get feature importance
|
2194
|
-
const importance = model.getFeatureImportance();
|
2195
|
-
console.log(importance); // [0.6, 0.4]
|
1925
|
+
const smoothed = datly.exponential_smoothing(data, 0.3);
|
2196
1926
|
```
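Simple exponential smoothing is commonly defined by the recurrence `s[0] = x[0]`, `s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]`. The sketch below applies that recurrence in plain JavaScript; `smoothSeries` is a hypothetical helper, and initializing with the first observation is an assumption about the convention rather than a statement of what datly does:

```javascript
// Sketch: simple exponential smoothing, seeded with the first observation.
function smoothSeries(values, alpha = 0.3) {
  if (values.length === 0) return [];
  const out = [values[0]];
  for (let t = 1; t < values.length; t++) {
    // Blend the new observation with the previous smoothed value.
    out.push(alpha * values[t] + (1 - alpha) * out[t - 1]);
  }
  return out;
}

const observations = [10, 20, 30, 20, 30, 25];
const smoothedSeries = smoothSeries(observations, 0.3);
console.log(smoothedSeries);
```

A larger `alpha` tracks recent observations more closely; a smaller one yields a smoother, slower-moving series.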
|
2197
1927
|
|
2198
|
-
|
2199
|
-
|
2200
|
-
##### Naive Bayes
|
2201
|
-
|
2202
|
-
```javascript
|
2203
|
-
// Create Naive Bayes classifier
|
2204
|
-
const model = datly.ml.createNaiveBayes({
|
2205
|
-
type: 'gaussian' // 'gaussian', 'multinomial', or 'bernoulli'
|
2206
|
-
});
|
2207
|
-
|
2208
|
-
// Train
|
2209
|
-
const X = [
|
2210
|
-
[1, 2], [2, 3], [3, 4],
|
2211
|
-
[5, 6], [6, 7], [7, 8]
|
2212
|
-
];
|
2213
|
-
const y = ['spam', 'spam', 'spam', 'ham', 'ham', 'ham'];
|
1928
|
+
### `autocorrelation(array, lag = 1)`
|
2214
1929
|
|
2215
|
-
|
1930
|
+
Calculates autocorrelation at a given lag.
|
2216
1931
|
|
2217
|
-
|
2218
|
-
|
2219
|
-
|
1932
|
+
**Parameters:**
|
1933
|
+
- `array`: Time series data
|
1934
|
+
- `lag`: Lag value (default: 1)
|
2220
1935
|
|
2221
|
-
|
2222
|
-
|
2223
|
-
|
1936
|
+
**Returns:**
|
1937
|
+
```yaml
|
1938
|
+
type: statistic
|
1939
|
+
name: autocorrelation
|
1940
|
+
lag: 1
|
1941
|
+
value: 0.456
|
2224
1942
|
```
|
2225
1943
|
|
2226
|
-
|
2227
|
-
|
2228
|
-
##### Support Vector Machine (SVM)
|
2229
|
-
|
1944
|
+
**Example:**
|
2230
1945
|
```javascript
|
2231
|
-
|
2232
|
-
const model = datly.ml.createSVM({
|
2233
|
-
C: 1.0, // Regularization parameter
|
2234
|
-
kernel: 'linear', // 'linear', 'rbf', 'poly'
|
2235
|
-
gamma: 'scale', // 'scale', 'auto', or number
|
2236
|
-
degree: 3, // For polynomial kernel
|
2237
|
-
learningRate: 0.001,
|
2238
|
-
iterations: 1000
|
2239
|
-
});
|
2240
|
-
|
2241
|
-
// Train
|
2242
|
-
const X = [
|
2243
|
-
[1, 2], [2, 3], [3, 4],
|
2244
|
-
[6, 7], [7, 8], [8, 9]
|
2245
|
-
];
|
2246
|
-
const y = [0, 0, 0, 1, 1, 1];
|
2247
|
-
|
2248
|
-
model.fit(X, y);
|
2249
|
-
|
2250
|
-
// Predict
|
2251
|
-
const predictions = model.predict([[2, 2], [7, 7]]);
|
2252
|
-
console.log(predictions); // [0, 1]
|
2253
|
-
|
2254
|
-
// Summary
|
2255
|
-
const summary = model.summary();
|
2256
|
-
console.log(summary.trainingMetrics.nSupportVectors); // Number of support vectors
|
1946
|
+
const acf = datly.autocorrelation(data, 1);
|
2257
1947
|
```
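The statistic returned above can be sketched as follows. This uses the common estimator that divides the lagged covariance by the total sum of squares; `autocorrAtLag` is a hypothetical name, and whether Datly uses exactly this normalisation is an assumption.

```javascript
// Hypothetical helper (not part of Datly): lag-k sample autocorrelation.
function autocorrAtLag(values, lag = 1) {
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  let num = 0;
  let den = 0;
  for (let t = 0; t < n; t++) {
    den += (values[t] - mean) ** 2;
    if (t + lag < n) {
      num += (values[t] - mean) * (values[t + lag] - mean);
    }
  }
  // A constant series has zero variance, so the ratio is undefined.
  return den === 0 ? NaN : num / den;
}

const acfSketch = autocorrAtLag([1, 2, 3, 4, 5], 1);
console.log(acfSketch); // 0.4
```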
|
2258
1948
|
|
2259
1949
|
---
|
2260
1950
|
|
2261
|
-
|
1951
|
+
## Outlier Detection
|
2262
1952
|
|
2263
|
-
|
1953
|
+
### `outliers_iqr(array)`
|
2264
1954
|
|
2265
|
-
|
2266
|
-
const X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]];
|
2267
|
-
const y = [0, 0, 1, 1, 1];
|
1955
|
+
Detects outliers using the IQR (Interquartile Range) method.
|
2268
1956
|
|
2269
|
-
|
2270
|
-
|
2271
|
-
|
2272
|
-
|
2273
|
-
|
2274
|
-
|
2275
|
-
|
1957
|
+
**Returns:**
|
1958
|
+
```yaml
|
1959
|
+
type: outlier_detection
|
1960
|
+
method: iqr
|
1961
|
+
lower_bound: 45.5
|
1962
|
+
upper_bound: 154.5
|
1963
|
+
n_outliers: 3
|
1964
|
+
outlier_indices:
|
1965
|
+
- 5
|
1966
|
+
- 12
|
1967
|
+
- 23
|
1968
|
+
outlier_values:
|
1969
|
+
- 200
|
1970
|
+
- 30
|
1971
|
+
- 180
|
2276
1972
|
```
|
2277
1973
|
|
2278
|
-
|
2279
|
-
|
2280
|
-
##### Cross-Validation
|
2281
|
-
|
1974
|
+
**Example:**
|
2282
1975
|
```javascript
|
2283
|
-
const
|
2284
|
-
const
|
2285
|
-
const y = [0, 0, 0, 0, 1, 1, 1, 1];
|
2286
|
-
|
2287
|
-
const cv = datly.ml.crossValidate(model, X, y, 5, 'classification');
|
2288
|
-
console.log(cv);
|
2289
|
-
// {
|
2290
|
-
// scores: [1.0, 0.8, 1.0, 0.8, 0.9],
|
2291
|
-
// meanScore: 0.9,
|
2292
|
-
// stdScore: 0.089,
|
2293
|
-
// folds: 5
|
2294
|
-
// }
|
1976
|
+
const data = [50, 55, 60, 65, 70, 200, 75, 80];
|
1977
|
+
const outliers = datly.outliers_iqr(data);
|
2295
1978
|
```
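As a rough illustration of the IQR rule, the sketch below flags values outside `[Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]`. The helper names, the 1.5 multiplier, and the linear-interpolation quartile definition are assumptions; Datly may compute quartiles differently, so its bounds can differ slightly from these.

```javascript
// Hypothetical helpers (not part of Datly).
// Quantile by linear interpolation between order statistics.
function quantileSorted(sorted, p) {
  const pos = (sorted.length - 1) * p;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

function iqrOutliers(values, k = 1.5) {
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantileSorted(sorted, 0.25);
  const q3 = quantileSorted(sorted, 0.75);
  const lower = q1 - k * (q3 - q1);
  const upper = q3 + k * (q3 - q1);
  const indices = [];
  values.forEach((v, i) => {
    if (v < lower || v > upper) indices.push(i);
  });
  return { lower, upper, indices, values: indices.map(i => values[i]) };
}

const iqrResult = iqrOutliers([50, 55, 60, 65, 70, 200, 75, 80]);
console.log(iqrResult);
// { lower: 32.5, upper: 102.5, indices: [5], values: [200] }
```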
|
2296
1979
|
|
2297
|
-
|
2298
|
-
|
2299
|
-
##### Compare Models
|
1980
|
+
### `outliers_zscore(array, threshold = 3)`
|
2300
1981
|
|
2301
|
-
|
2302
|
-
const X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7]];
|
2303
|
-
const y = [0, 0, 0, 1, 1, 1];
|
1982
|
+
Detects outliers using the z-score method.
|
2304
1983
|
|
2305
|
-
|
2306
|
-
|
2307
|
-
|
2308
|
-
{ name: 'Logistic Regression', model: datly.ml.createLogisticRegression() }
|
2309
|
-
];
|
1984
|
+
**Parameters:**
|
1985
|
+
- `array`: Array of numbers
|
1986
|
+
- `threshold`: Z-score threshold (default: 3)
|
2310
1987
|
|
2311
|
-
|
2312
|
-
|
2313
|
-
|
2314
|
-
|
2315
|
-
|
2316
|
-
|
2317
|
-
|
2318
|
-
|
2319
|
-
|
2320
|
-
|
2321
|
-
|
1988
|
+
**Returns:**
|
1989
|
+
```yaml
|
1990
|
+
type: outlier_detection
|
1991
|
+
method: zscore
|
1992
|
+
threshold: 3
|
1993
|
+
n_outliers: 2
|
1994
|
+
outlier_indices:
|
1995
|
+
- 5
|
1996
|
+
- 12
|
1997
|
+
outlier_values:
|
1998
|
+
- 200
|
1999
|
+
- 30
|
2322
2000
|
```
|
2323
2001
|
|
2324
|
-
|
2325
|
-
|
2326
|
-
##### Quick Train (One-liner)
|
2327
|
-
|
2002
|
+
**Example:**
|
2328
2003
|
```javascript
|
2329
|
-
|
2330
|
-
const result = datly.ml.quickTrain(
|
2331
|
-
'randomforest', // Model type
|
2332
|
-
X, // Features
|
2333
|
-
y, // Target
|
2334
|
-
{
|
2335
|
-
taskType: 'classification',
|
2336
|
-
testSize: 0.2,
|
2337
|
-
normalize: true,
|
2338
|
-
nEstimators: 50
|
2339
|
-
}
|
2340
|
-
);
|
2341
|
-
|
2342
|
-
console.log(result);
|
2343
|
-
// {
|
2344
|
-
// model: RandomForest {...},
|
2345
|
-
// score: {
|
2346
|
-
// accuracy: 0.95,
|
2347
|
-
// confusionMatrix: {...}
|
2348
|
-
// },
|
2349
|
-
// trainTime: 150,
|
2350
|
-
// summary: {...}
|
2351
|
-
// }
|
2004
|
+
const outliers = datly.outliers_zscore(data, 3);
|
2352
2005
|
```
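A minimal sketch of the z-score rule, assuming the sample standard deviation (divisor `n - 1`); `zscoreOutliers` is a hypothetical name and Datly's choice of divisor is not stated here.

```javascript
// Hypothetical helper (not part of Datly): flag values whose |z| exceeds the threshold.
function zscoreOutliers(values, threshold = 3) {
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const variance = values.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1);
  const sd = Math.sqrt(variance);
  const indices = [];
  values.forEach((v, i) => {
    if (sd > 0 && Math.abs((v - mean) / sd) > threshold) indices.push(i);
  });
  return { threshold, indices, values: indices.map(i => values[i]) };
}

const sample = [50, 55, 60, 65, 70, 200, 75, 80];
const strict = zscoreOutliers(sample, 3); // flags nothing for this sample
const loose = zscoreOutliers(sample, 2);  // flags index 5 (value 200)
```

In small samples the z-score is bounded: with `n` values and the sample standard deviation, no point can exceed `(n - 1) / sqrt(n)`, which is about 2.47 for eight values. That is why a threshold of 3 flags nothing here even though 200 is clearly unusual; the IQR rule or a lower threshold may be more useful for short arrays.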
|
2353
2006
|
|
2354
2007
|
---
|
2355
2008
|
|
2356
|
-
|
2009
|
+
## Visualization
|
2357
2010
|
|
2358
|
-
|
2359
|
-
// Polynomial features
|
2360
|
-
const X = [[1], [2], [3]];
|
2361
|
-
const polyFeatures = datly.ml.polynomialFeatures(X, 2);
|
2362
|
-
console.log(polyFeatures);
|
2363
|
-
// [[1, 1], [2, 4], [3, 9]] // [x, x²]
|
2011
|
+
All visualization functions create SVG-based charts. Each accepts an optional configuration object and a selector (for example `'#chart'`) that identifies where the chart is rendered.
|
2364
2012
|
|
2365
|
-
|
2366
|
-
const data = [[1, 2], [3, 4], [5, 6]];
|
2367
|
-
const scaler = datly.ml.standardScaler(data);
|
2368
|
-
console.log(scaler.scaled);
|
2369
|
-
// [[-1.22, -1.22], [0, 0], [1.22, 1.22]]
|
2013
|
+
### Configuration Options
|
2370
2014
|
|
2371
|
-
|
2372
|
-
|
2373
|
-
|
2374
|
-
|
2015
|
+
Common options for all plots:
|
2016
|
+
- `width`: Chart width in pixels (default: 400)
|
2017
|
+
- `height`: Chart height in pixels (default: 400)
|
2018
|
+
- `color`: Primary color (default: '#000')
|
2019
|
+
- `background`: Background color (default: '#fff')
|
2020
|
+
- `title`: Chart title
|
2021
|
+
- `xlabel`: X-axis label
|
2022
|
+
- `ylabel`: Y-axis label
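
One common way to apply defaults like those listed above is to spread user options over a defaults object. The sketch below only illustrates that pattern with the documented default values; `PLOT_DEFAULTS` and `resolveOptions` are hypothetical names, not Datly internals.

```javascript
// Hypothetical defaults mirroring the documented values (not Datly's source).
const PLOT_DEFAULTS = {
  width: 400,
  height: 400,
  color: '#000',
  background: '#fff'
};

// Later spreads win, so caller-supplied keys override the defaults.
function resolveOptions(options = {}) {
  return { ...PLOT_DEFAULTS, ...options };
}

const resolved = resolveOptions({ width: 600, title: 'Distribution' });
console.log(resolved.width, resolved.height, resolved.title); // 600 400 Distribution
```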
|
2375
2023
|
|
2376
|
-
|
2377
|
-
const minMaxScaler = datly.ml.minMaxScaler(data, [0, 1]);
|
2378
|
-
console.log(minMaxScaler.scaled);
|
2379
|
-
// [[0, 0], [0.5, 0.5], [1, 1]]
|
2380
|
-
```
|
2024
|
+
### `plotHistogram(array, options = {}, selector)`
|
2381
2025
|
|
2382
|
-
|
2026
|
+
Creates a histogram.
|
2383
2027
|
|
2384
|
-
|
2028
|
+
**Additional Options:**
|
2029
|
+
- `bins`: Number of bins (default: 10)
|
2385
2030
|
|
2031
|
+
**Example:**
|
2386
2032
|
```javascript
|
2387
|
-
const
|
2388
|
-
|
2389
|
-
|
2390
|
-
|
2391
|
-
|
2392
|
-
|
2393
|
-
|
2394
|
-
|
2395
|
-
// auc: 0.85, // Area Under Curve
|
2396
|
-
// thresholds: [...]
|
2397
|
-
// }
|
2398
|
-
```
|
2399
|
-
|
2400
|
-
---
|
2401
|
-
|
2402
|
-
##### Precision-Recall Curve
|
2403
|
-
|
2404
|
-
```javascript
|
2405
|
-
const yTrue = [0, 0, 1, 1, 1, 0, 1, 0];
|
2406
|
-
const yProba = [0.1, 0.3, 0.6, 0.8, 0.9, 0.2, 0.7, 0.4];
|
2407
|
-
|
2408
|
-
const pr = datly.ml.precisionRecallCurve(yTrue, yProba);
|
2409
|
-
console.log(pr);
|
2410
|
-
// {
|
2411
|
-
// precision: [0.5, 0.6, 0.67, ...],
|
2412
|
-
// recall: [1.0, 0.8, 0.67, ...],
|
2413
|
-
// thresholds: [...]
|
2414
|
-
// }
|
2033
|
+
const data = [1, 2, 2, 3, 3, 3, 4, 4, 5];
|
2034
|
+
datly.plotHistogram(data, {
|
2035
|
+
width: 600,
|
2036
|
+
height: 400,
|
2037
|
+
bins: 10,
|
2038
|
+
title: 'Distribution',
|
2039
|
+
color: '#4CAF50'
|
2040
|
+
}, '#chart');
|
2415
2041
|
```
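To show what the `bins` option controls, here is a sketch of equal-width binning. `binCounts` is a hypothetical helper; Datly's exact bin edges and edge handling are assumptions.

```javascript
// Hypothetical helper (not part of Datly): count values in equal-width bins.
function binCounts(values, bins = 10) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  // Fall back to a width of 1 when all values are identical.
  const width = (max - min) / bins || 1;
  const counts = new Array(bins).fill(0);
  for (const v of values) {
    // The maximum value is placed in the last bin rather than a new one.
    const idx = Math.min(bins - 1, Math.floor((v - min) / width));
    counts[idx] += 1;
  }
  return { min, width, counts };
}

const hist = binCounts([1, 2, 2, 3, 3, 3, 4, 4, 5], 4);
console.log(hist.counts); // [1, 2, 3, 3]
```

With only nine observations, ten bins leave most bars at height 0 or 1, so a smaller `bins` value is usually easier to read for short arrays.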
|
2416
2042
|
|
2417
|
-
|
2043
|
+
### `plotScatter(x, y, options = {}, selector)`
|
2418
2044
|
|
2419
|
-
|
2045
|
+
Creates a scatter plot.
|
2420
2046
|
|
2421
|
-
|
2422
|
-
|
2423
|
-
#### Setup
|
2047
|
+
**Additional Options:**
|
2048
|
+
- `size`: Point size (default: 4)
|
2424
2049
|
|
2050
|
+
**Example:**
|
2425
2051
|
```javascript
|
2426
|
-
|
2427
|
-
const
|
2428
|
-
|
2429
|
-
|
2430
|
-
|
2052
|
+
const x = [1, 2, 3, 4, 5];
|
2053
|
+
const y = [2, 4, 3, 5, 6];
|
2054
|
+
datly.plotScatter(x, y, {
|
2055
|
+
width: 600,
|
2056
|
+
height: 400,
|
2057
|
+
title: 'Scatter Plot',
|
2058
|
+
xlabel: 'X Variable',
|
2059
|
+
ylabel: 'Y Variable',
|
2060
|
+
size: 5
|
2061
|
+
}, '#chart');
|
2431
2062
|
```
|
2432
2063
|
|
2433
|
-
|
2064
|
+
### `plotLine(x, y, options = {}, selector)`
|
2434
2065
|
|
2435
|
-
|
2066
|
+
Creates a line chart.
|
2436
2067
|
|
2437
|
-
|
2068
|
+
**Additional Options:**
|
2069
|
+
- `lineWidth`: Line width (default: 2)
|
2070
|
+
- `showPoints`: Show data points (default: false)
|
2438
2071
|
|
2072
|
+
**Example:**
|
2439
2073
|
```javascript
|
2440
|
-
const
|
2441
|
-
|
2442
|
-
datly.
|
2443
|
-
|
2444
|
-
|
2445
|
-
|
2446
|
-
|
2447
|
-
color: '#4299e1',
|
2448
|
-
width: 800,
|
2449
|
-
height: 600
|
2450
|
-
});
|
2074
|
+
const x = [1, 2, 3, 4, 5];
|
2075
|
+
const y = [2, 4, 3, 5, 6];
|
2076
|
+
datly.plotLine(x, y, {
|
2077
|
+
lineWidth: 3,
|
2078
|
+
showPoints: true,
|
2079
|
+
title: 'Time Series'
|
2080
|
+
}, '#chart');
|
2451
2081
|
```
|
2452
2082
|
|
2453
|
-
|
2083
|
+
### `plotBar(categories, values, options = {}, selector)`
|
2454
2084
|
|
2455
|
-
|
2085
|
+
Creates a bar chart.
|
2456
2086
|
|
2087
|
+
**Example:**
|
2457
2088
|
```javascript
|
2458
|
-
|
2459
|
-
const
|
2460
|
-
datly.
|
2461
|
-
title: 'Sales
|
2462
|
-
ylabel: 'Sales ($)'
|
2463
|
-
});
|
2464
|
-
|
2465
|
-
// Multiple box plots
|
2466
|
-
const groupA = [20, 22, 23, 25, 27];
|
2467
|
-
const groupB = [30, 32, 35, 37, 40];
|
2468
|
-
const groupC = [25, 27, 29, 31, 33];
|
2469
|
-
|
2470
|
-
datly.plotBoxplot([groupA, groupB, groupC], {
|
2471
|
-
title: 'Sales by Region',
|
2472
|
-
labels: ['North', 'South', 'East'],
|
2089
|
+
const categories = ['A', 'B', 'C', 'D'];
|
2090
|
+
const values = [10, 25, 15, 30];
|
2091
|
+
datly.plotBar(categories, values, {
|
2092
|
+
title: 'Sales by Category',
|
2473
2093
|
ylabel: 'Sales ($)'
|
2474
|
-
});
|
2094
|
+
}, '#chart');
|
2475
2095
|
```
|
2476
2096
|
|
2477
|
-
|
2478
|
-
|
2479
|
-
##### `scatter(x, y, options)`
|
2097
|
+
### `plotBoxplot(data, options = {}, selector)`
|
2480
2098
|
|
2481
|
-
|
2482
|
-
const height = [160, 165, 170, 175, 180, 185];
|
2483
|
-
const weight = [55, 60, 65, 70, 75, 80];
|
2099
|
+
Creates box plots for one or more groups.
|
2484
2100
|
|
2485
|
-
|
2486
|
-
|
2487
|
-
|
2488
|
-
|
2489
|
-
color: '#e74c3c',
|
2490
|
-
size: 6,
|
2491
|
-
labels: ['Person 1', 'Person 2', ...] // Optional
|
2492
|
-
});
|
2493
|
-
```
|
2494
|
-
|
2495
|
-
---
|
2496
|
-
|
2497
|
-
##### `line(x, y, options)`
|
2101
|
+
**Parameters:**
|
2102
|
+
- `data`: Array of arrays (each array is a group) or single array
|
2103
|
+
- `options`:
|
2104
|
+
  - `labels`: Array of group labels
|
2498
2105
|
|
2106
|
+
**Example:**
|
2499
2107
|
```javascript
|
2500
|
-
const
|
2501
|
-
const
|
2108
|
+
const group1 = [1, 2, 3, 4, 5, 6];
|
2109
|
+
const group2 = [2, 3, 4, 5, 6, 7];
|
2110
|
+
const group3 = [3, 4, 5, 6, 7, 8];
|
2502
2111
|
|
2503
|
-
datly.
|
2504
|
-
|
2505
|
-
|
2506
|
-
|
2507
|
-
color: '#2ecc71',
|
2508
|
-
lineWidth: 3,
|
2509
|
-
showPoints: true
|
2510
|
-
});
|
2112
|
+
datly.plotBoxplot([group1, group2, group3], {
|
2113
|
+
labels: ['Group A', 'Group B', 'Group C'],
|
2114
|
+
title: 'Comparison'
|
2115
|
+
}, '#chart');
|
2511
2116
|
```
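Each box in a box plot is drawn from a five-number summary. The sketch below computes one per group, assuming linear-interpolation quartiles; `fiveNumberSummary` is a hypothetical helper, and Datly's whisker and quartile conventions are not specified here.

```javascript
// Hypothetical helper (not part of Datly): min, Q1, median, Q3, max for one group.
function fiveNumberSummary(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const q = p => {
    const pos = (sorted.length - 1) * p;
    const lo = Math.floor(pos);
    const hi = Math.ceil(pos);
    return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
  };
  return {
    min: sorted[0],
    q1: q(0.25),
    median: q(0.5),
    q3: q(0.75),
    max: sorted[sorted.length - 1]
  };
}

const summaries = [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7]].map(fiveNumberSummary);
console.log(summaries[0]);
// { min: 1, q1: 2.25, median: 3.5, q3: 4.75, max: 6 }
```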
|
2512
2117
|
|
2513
|
-
|
2514
|
-
|
2515
|
-
##### `bar(categories, values, options)`
|
2516
|
-
|
2517
|
-
```javascript
|
2518
|
-
const products = ['Product A', 'Product B', 'Product C'];
|
2519
|
-
const sales = [150, 230, 180];
|
2520
|
-
|
2521
|
-
datly.plotBar(products, sales, {
|
2522
|
-
title: 'Sales by Product',
|
2523
|
-
xlabel: 'Product',
|
2524
|
-
ylabel: 'Sales',
|
2525
|
-
color: '#f39c12',
|
2526
|
-
horizontal: false // Set to true for horizontal bars
|
2527
|
-
});
|
2528
|
-
```
|
2118
|
+
### `plotPie(labels, values, options = {}, selector)`
|
2529
2119
|
|
2530
|
-
|
2120
|
+
Creates a pie chart.
|
2531
2121
|
|
2532
|
-
|
2122
|
+
**Additional Options:**
|
2123
|
+
- `showLabels`: Display labels (default: true)
|
2533
2124
|
|
2125
|
+
**Example:**
|
2534
2126
|
```javascript
|
2535
|
-
const
|
2536
|
-
const
|
2537
|
-
|
2538
|
-
|
2539
|
-
|
2540
|
-
|
2541
|
-
showPercentage: true
|
2542
|
-
});
|
2127
|
+
const labels = ['Category A', 'Category B', 'Category C'];
|
2128
|
+
const values = [30, 45, 25];
|
2129
|
+
datly.plotPie(labels, values, {
|
2130
|
+
title: 'Market Share',
|
2131
|
+
showLabels: true
|
2132
|
+
}, '#chart');
|
2543
2133
|
```
|
2544
2134
|
|
2545
|
-
|
2135
|
+
### `plotHeatmap(matrix, options = {}, selector)`
|
2546
2136
|
|
2547
|
-
|
2137
|
+
Creates a heatmap for a correlation matrix.
|
2548
2138
|
|
2139
|
+
**Additional Options:**
|
2140
|
+
- `labels`: Array of variable names
|
2141
|
+
- `showValues`: Display correlation values (default: true)
|
2142
|
+
|
2143
|
+
**Example:**
|
2549
2144
|
```javascript
|
2550
|
-
const
|
2145
|
+
const corrMatrix = [
|
2551
2146
|
[1.0, 0.8, 0.3],
|
2552
2147
|
[0.8, 1.0, 0.5],
|
2553
2148
|
[0.3, 0.5, 1.0]
|
2554
2149
|
];
|
2555
2150
|
|
2556
|
-
datly.plotHeatmap(
|
2557
|
-
title: 'Correlation Heatmap',
|
2151
|
+
datly.plotHeatmap(corrMatrix, {
|
2558
2152
|
labels: ['Var1', 'Var2', 'Var3'],
|
2559
|
-
|
2560
|
-
|
2561
|
-
});
|
2153
|
+
showValues: true,
|
2154
|
+
title: 'Correlation Matrix'
|
2155
|
+
}, '#chart');
|
2562
2156
|
```
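The matrix passed to the heatmap is typically built from raw columns. Here is a sketch that computes a Pearson correlation matrix in plain JavaScript; `pearson` and `corrMatrixFromColumns` are hypothetical helpers shown only to make the expected input shape concrete.

```javascript
// Hypothetical helpers (not part of Datly).
function pearson(a, b) {
  const n = a.length;
  const meanA = a.reduce((s, v) => s + v, 0) / n;
  const meanB = b.reduce((s, v) => s + v, 0) / n;
  let sab = 0;
  let saa = 0;
  let sbb = 0;
  for (let i = 0; i < n; i++) {
    sab += (a[i] - meanA) * (b[i] - meanB);
    saa += (a[i] - meanA) ** 2;
    sbb += (b[i] - meanB) ** 2;
  }
  return sab / Math.sqrt(saa * sbb);
}

// columns: array of equal-length numeric arrays, one per variable.
function corrMatrixFromColumns(columns) {
  return columns.map(ci => columns.map(cj => pearson(ci, cj)));
}

const matrixSketch = corrMatrixFromColumns([
  [1, 2, 3, 4],
  [2, 4, 6, 8],
  [4, 3, 2, 1]
]);
```

The result is square and symmetric with ones on the diagonal, which is the shape the example above passes to `plotHeatmap`.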
|
2563
2157
|
|
2564
|
-
|
2158
|
+
### `plotViolin(data, options = {}, selector)`
|
2159
|
+
|
2160
|
+
Creates violin plots showing distribution density.
|
2565
2161
|
|
2566
|
-
|
2162
|
+
**Parameters:**
|
2163
|
+
- `data`: Array of arrays or single array
|
2164
|
+
- `options`:
|
2165
|
+
  - `labels`: Group labels
|
2567
2166
|
|
2167
|
+
**Example:**
|
2568
2168
|
```javascript
|
2569
|
-
const
|
2570
|
-
const
|
2169
|
+
const group1 = [1, 2, 2, 3, 3, 3, 4, 4, 5];
|
2170
|
+
const group2 = [2, 3, 3, 4, 4, 4, 5, 5, 6];
|
2571
2171
|
|
2572
|
-
datly.plotViolin([
|
2573
|
-
|
2574
|
-
|
2575
|
-
|
2576
|
-
color: '#9b59b6'
|
2577
|
-
});
|
2172
|
+
datly.plotViolin([group1, group2], {
|
2173
|
+
labels: ['Before', 'After'],
|
2174
|
+
title: 'Distribution Comparison'
|
2175
|
+
}, '#chart');
|
2578
2176
|
```
|
2579
2177
|
|
2580
|
-
|
2178
|
+
### `plotDensity(array, options = {}, selector)`
|
2581
2179
|
|
2582
|
-
|
2180
|
+
Creates a kernel density plot.
|
2583
2181
|
|
2584
|
-
|
2585
|
-
|
2182
|
+
**Additional Options:**
|
2183
|
+
- `bandwidth`: Smoothing bandwidth (default: 5)
|
2586
2184
|
|
2185
|
+
**Example:**
|
2186
|
+
```javascript
|
2187
|
+
const data = [1, 2, 2, 3, 3, 3, 4, 4, 5];
|
2587
2188
|
datly.plotDensity(data, {
|
2588
|
-
|
2589
|
-
|
2590
|
-
|
2591
|
-
color: '#1abc9c',
|
2592
|
-
bandwidth: null // Auto-calculate or specify
|
2593
|
-
});
|
2189
|
+
bandwidth: 0.5,
|
2190
|
+
title: 'Density Plot'
|
2191
|
+
}, '#chart');
|
2594
2192
|
```
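To illustrate the role of `bandwidth`, the following sketch evaluates a Gaussian kernel density estimate. The Gaussian kernel and the helper name `gaussianKde` are assumptions; the kernel Datly uses is not stated in this section.

```javascript
// Hypothetical helper (not part of Datly): returns a function x -> estimated density.
function gaussianKde(values, bandwidth) {
  const norm = 1 / (values.length * bandwidth * Math.sqrt(2 * Math.PI));
  return x =>
    norm *
    values.reduce((s, v) => s + Math.exp(-0.5 * ((x - v) / bandwidth) ** 2), 0);
}

const density = gaussianKde([1, 2, 2, 3, 3, 3, 4, 4, 5], 0.5);

// Sample the curve on a grid, as a plotting routine would.
const curve = [];
for (let i = 0; i <= 60; i++) {
  const x = i * 0.1;
  curve.push({ x, y: density(x) });
}
```

A small bandwidth follows individual points and produces a bumpy curve; a large one oversmooths. For data spanning 1 to 5, a bandwidth near 0.5 is a plausible starting point.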
|
2595
2193
|
|
2596
|
-
|
2194
|
+
### `plotQQ(array, options = {}, selector)`
|
2597
2195
|
|
2598
|
-
|
2196
|
+
Creates a Q-Q plot for normality assessment.
|
2599
2197
|
|
2198
|
+
**Example:**
|
2600
2199
|
```javascript
|
2601
|
-
const data = [2.3,
|
2602
|
-
|
2200
|
+
const data = [1.2, 2.3, 1.8, 2.1, 1.9, 2.0, 2.4];
|
2603
2201
|
datly.plotQQ(data, {
|
2604
|
-
title: 'Q-Q Plot'
|
2605
|
-
|
2606
|
-
ylabel: 'Sample Quantiles',
|
2607
|
-
color: '#34495e'
|
2608
|
-
});
|
2202
|
+
title: 'Q-Q Plot'
|
2203
|
+
}, '#chart');
|
2609
2204
|
```
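A Q-Q plot pairs sorted sample values with the quantiles a normal distribution would give at the same ranks. The sketch below is one way to compute those pairs; all helper names are hypothetical, the plotting positions `(i + 0.5) / n` are an assumption, and the normal quantile is obtained by bisection over an approximate CDF, which is adequate for plotting but not for precise tail probabilities.

```javascript
// Abramowitz-Stegun 7.1.26 approximation of erf (absolute error about 1.5e-7).
function erfApprox(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

function normalCdf(z) {
  return 0.5 * (1 + erfApprox(z / Math.SQRT2));
}

// Invert the CDF by bisection on [-10, 10].
function normalQuantile(p) {
  let lo = -10;
  let hi = 10;
  for (let i = 0; i < 80; i++) {
    const mid = (lo + hi) / 2;
    if (normalCdf(mid) < p) lo = mid;
    else hi = mid;
  }
  return (lo + hi) / 2;
}

// Hypothetical helper (not part of Datly): one {theoretical, sample} pair per value.
function qqPoints(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  return sorted.map((sample, i) => ({
    theoretical: normalQuantile((i + 0.5) / n),
    sample
  }));
}

const qq = qqPoints([1.2, 2.3, 1.8, 2.1, 1.9, 2.0, 2.4]);
```

If the points fall close to a straight line, the sample is roughly consistent with a normal distribution; systematic curvature suggests skew or heavy tails.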
|
2610
2205
|
|
2611
|
-
|
2206
|
+
### `plotParallel(data, columns, options = {}, selector)`
|
2207
|
+
|
2208
|
+
Creates a parallel coordinates plot.
|
2612
2209
|
|
2613
|
-
|
2210
|
+
**Parameters:**
|
2211
|
+
- `data`: Array of objects
|
2212
|
+
- `columns`: Array of column names to include
|
2213
|
+
- `options`:
|
2214
|
+
  - `colors`: Array of colors for each observation
|
2614
2215
|
|
2216
|
+
**Example:**
|
2615
2217
|
```javascript
|
2616
2218
|
const data = [
|
2617
|
-
{ age: 25,
|
2618
|
-
{ age: 30,
|
2619
|
-
{ age: 35,
|
2219
|
+
{ age: 25, salary: 50000, experience: 2 },
|
2220
|
+
{ age: 30, salary: 60000, experience: 5 },
|
2221
|
+
{ age: 35, salary: 70000, experience: 8 }
|
2620
2222
|
];
|
2621
2223
|
|
2622
|
-
datly.plotParallel(data, ['age', '
|
2623
|
-
title: 'Parallel Coordinates
|
2624
|
-
|
2625
|
-
});
|
2224
|
+
datly.plotParallel(data, ['age', 'salary', 'experience'], {
|
2225
|
+
title: 'Parallel Coordinates'
|
2226
|
+
}, '#chart');
|
2626
2227
|
```
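Parallel-coordinates charts generally rescale each axis independently so that columns with very different ranges (age versus salary here) remain comparable. The sketch below shows that per-column min-max step; `minMaxByColumn` is a hypothetical helper, and whether Datly rescales in exactly this way is an assumption.

```javascript
// Hypothetical helper (not part of Datly): map each selected column to [0, 1].
function minMaxByColumn(rows, columns) {
  const ranges = columns.map(c => {
    const vals = rows.map(r => r[c]);
    return { min: Math.min(...vals), max: Math.max(...vals) };
  });
  return rows.map(r =>
    columns.map((c, j) => {
      const { min, max } = ranges[j];
      // A constant column has no spread; place it mid-axis.
      return max === min ? 0.5 : (r[c] - min) / (max - min);
    })
  );
}

const rows = [
  { age: 25, salary: 50000, experience: 2 },
  { age: 30, salary: 60000, experience: 5 },
  { age: 35, salary: 70000, experience: 8 }
];
const scaledRows = minMaxByColumn(rows, ['age', 'salary', 'experience']);
console.log(scaledRows[1]); // [0.5, 0.5, 0.5]
```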
|
2627
2228
|
|
2628
|
-
|
2229
|
+
### `plotPairplot(data, columns, options = {}, selector)`
|
2629
2230
|
|
2630
|
-
|
2231
|
+
Creates a pairplot matrix showing all pairwise relationships.
|
2631
2232
|
|
2233
|
+
**Parameters:**
|
2234
|
+
- `data`: Array of objects
|
2235
|
+
- `columns`: Array of column names
|
2236
|
+
- `options`:
|
2237
|
+
  - `size`: Size of each subplot (default: 120)
|
2238
|
+
  - `color`: Point color
|
2239
|
+
|
2240
|
+
**Example:**
|
2632
2241
|
```javascript
|
2633
2242
|
const data = [
|
2634
|
-
{
|
2635
|
-
{
|
2636
|
-
{
|
2243
|
+
{ age: 25, salary: 50000, experience: 2 },
|
2244
|
+
{ age: 30, salary: 60000, experience: 5 },
|
2245
|
+
{ age: 35, salary: 70000, experience: 8 }
|
2637
2246
|
];
|
2638
2247
|
|
2639
|
-
datly.plotPairplot(data, ['
|
2640
|
-
|
2641
|
-
|
2642
|
-
size: 3
|
2643
|
-
});
|
2248
|
+
datly.plotPairplot(data, ['age', 'salary', 'experience'], {
|
2249
|
+
size: 150
|
2250
|
+
}, '#chart');
|
2644
2251
|
```
|
2645
2252
|
|
2646
|
-
|
2253
|
+
### `plotMultiline(series, options = {}, selector)`
|
2254
|
+
|
2255
|
+
Creates a multi-line chart for comparing time series.
|
2647
2256
|
|
2648
|
-
|
2257
|
+
**Parameters:**
|
2258
|
+
- `series`: Array of objects with `name` and `data` properties
|
2259
|
+
  - `data`: Array of `{x, y}` objects
|
2260
|
+
- `options`:
|
2261
|
+
  - `legend`: Show legend (default: false)
|
2649
2262
|
|
2263
|
+
**Example:**
|
2650
2264
|
```javascript
|
2651
2265
|
const series = [
|
2652
2266
|
{
|
2653
2267
|
name: 'Series A',
|
2654
|
-
data: [
|
2655
|
-
{ x: 1, y: 10 },
|
2656
|
-
{ x: 2, y: 15 },
|
2657
|
-
{ x: 3, y: 12 }
|
2658
|
-
]
|
2268
|
+
data: [{x: 1, y: 10}, {x: 2, y: 20}, {x: 3, y: 15}]
|
2659
2269
|
},
|
2660
2270
|
{
|
2661
2271
|
name: 'Series B',
|
2662
|
-
data: [
|
2663
|
-
{ x: 1, y: 5 },
|
2664
|
-
{ x: 2, y: 8 },
|
2665
|
-
{ x: 3, y: 10 }
|
2666
|
-
]
|
2272
|
+
data: [{x: 1, y: 15}, {x: 2, y: 25}, {x: 3, y: 20}]
|
2667
2273
|
}
|
2668
2274
|
];
|
2669
2275
|
|
2670
2276
|
datly.plotMultiline(series, {
|
2671
|
-
|
2672
|
-
|
2673
|
-
|
2674
|
-
legend: true
|
2675
|
-
});
|
2277
|
+
legend: true,
|
2278
|
+
title: 'Comparison'
|
2279
|
+
}, '#chart');
|
2676
2280
|
```
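Data often arrives as parallel `x` and `y` arrays rather than the `{x, y}` objects that `plotMultiline` expects. A small conversion sketch follows; `toSeries` is a hypothetical helper, not a Datly function.

```javascript
// Hypothetical helper (not part of Datly): build one series entry from parallel arrays.
function toSeries(name, xs, ys) {
  if (xs.length !== ys.length) {
    throw new Error('x and y must have the same length');
  }
  return { name, data: xs.map((x, i) => ({ x, y: ys[i] })) };
}

const seriesSketch = [
  toSeries('Series A', [1, 2, 3], [10, 20, 15]),
  toSeries('Series B', [1, 2, 3], [15, 25, 20])
];
console.log(seriesSketch[0].data[1]); // { x: 2, y: 20 }
```

The resulting array has the same shape as the `series` value in the example above.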
|
2677
2281
|
|
2678
2282
|
---
|
2679
2283
|
|
2680
|
-
|
2681
|
-
|
2682
|
-
```javascript
|
2683
|
-
// Correlation matrix heatmap from dataset
|
2684
|
-
const data = {
|
2685
|
-
headers: ['var1', 'var2', 'var3'],
|
2686
|
-
data: [
|
2687
|
-
{ var1: 1, var2: 2, var3: 3 },
|
2688
|
-
{ var1: 2, var2: 4, var3: 5 },
|
2689
|
-
{ var1: 3, var2: 5, var3: 7 }
|
2690
|
-
]
|
2691
|
-
};
|
2284
|
+
## Complete Example Workflow
|
2692
2285
|
|
2693
|
-
|
2694
|
-
title: 'Correlation Matrix'
|
2695
|
-
});
|
2696
|
-
|
2697
|
-
// Distribution plot from dataset column
|
2698
|
-
datly.plotDistribution(data, 'var1', {
|
2699
|
-
title: 'Distribution of var1'
|
2700
|
-
});
|
2701
|
-
|
2702
|
-
// Compare multiple distributions
|
2703
|
-
datly.plotMultipleDistributions(data, ['var1', 'var2', 'var3'], {
|
2704
|
-
title: 'Distribution Comparison'
|
2705
|
-
});
|
2706
|
-
```
|
2707
|
-
|
2708
|
-
---
|
2709
|
-
|
2710
|
-
## 🎯 Complete Examples
|
2711
|
-
|
2712
|
-
### Example 1: Comprehensive Data Analysis
|
2286
|
+
Here's a complete example demonstrating a typical data analysis workflow:
|
2713
2287
|
|
2714
2288
|
```javascript
|
2715
|
-
|
2716
|
-
|
2717
|
-
|
2718
|
-
|
2719
|
-
|
2720
|
-
//
|
2721
|
-
|
2722
|
-
if (!validation.valid) {
|
2723
|
-
console.error('Data validation failed:', validation.errors);
|
2724
|
-
}
|
2725
|
-
|
2726
|
-
// Get descriptive statistics
|
2727
|
-
const sales = datly.dataLoader.getColumn(data, 'sales');
|
2728
|
-
console.log('Mean Sales:', datly.centralTendency.mean(sales));
|
2729
|
-
console.log('Median Sales:', datly.centralTendency.median(sales));
|
2730
|
-
console.log('Std Dev:', datly.dispersion.standardDeviation(sales));
|
2731
|
-
|
2732
|
-
// Check for outliers
|
2733
|
-
const outliers = datly.utils.detectOutliers(sales, 'iqr');
|
2734
|
-
console.log('Outliers:', outliers);
|
2735
|
-
|
2736
|
-
// Test normality
|
2737
|
-
const normalityTest = datly.normalityTests.shapiroWilk(sales);
|
2738
|
-
console.log('Is Normal:', normalityTest.isNormal);
|
2739
|
-
|
2740
|
-
// Generate report
|
2741
|
-
const report = datly.reportGenerator.summary(data);
|
2742
|
-
console.log(report);
|
2743
|
-
|
2744
|
-
datly.plotHistogram(sales, { title: 'Sales Distribution' });
|
2745
|
-
datly.plotBoxplot(sales, { title: 'Sales Box Plot' });
|
2746
|
-
```
|
2747
|
-
|
2748
|
-
---
|
2749
|
-
|
2750
|
-
### Example 2: Hypothesis Testing Workflow
|
2751
|
-
|
2752
|
-
```javascript
|
2753
|
-
const datly = new Datly();
|
2754
|
-
|
2755
|
-
// Two groups to compare
|
2756
|
-
const controlGroup = [23, 25, 27, 29, 31, 33];
|
2757
|
-
const treatmentGroup = [28, 30, 32, 34, 36, 38];
|
2289
|
+
// 1. Load and explore data
|
2290
|
+
const data = [
|
2291
|
+
{ age: 25, salary: 50000, experience: 2, department: 'IT' },
|
2292
|
+
{ age: 30, salary: 60000, experience: 5, department: 'HR' },
|
2293
|
+
{ age: 35, salary: 70000, experience: 8, department: 'IT' },
|
2294
|
+
// ... more data
|
2295
|
+
];
|
2758
2296
|
|
2759
|
-
// Perform
|
2760
|
-
const
|
2761
|
-
|
2762
|
-
|
2763
|
-
|
2297
|
+
// 2. Perform EDA
|
2298
|
+
const overview = datly.eda_overview(data);
|
2299
|
+
console.log(overview);
|
2300
|
+
|
2301
|
+
// 3. Check correlations
|
2302
|
+
const correlations = datly.df_corr(data, 'pearson');
|
2303
|
+
console.log(correlations);
|
2304
|
+
|
2305
|
+
// 4. Prepare features and target
|
2306
|
+
const X = data.map(d => [d.age, d.experience]);
|
2307
|
+
const y = data.map(d => d.salary);
|
2308
|
+
|
2309
|
+
// 5. Split data
|
2310
|
+
const split = datly.train_test_split(X, y, 0.2, 42);
|
2311
|
+
const trainIndices = split.indices.train;
|
2312
|
+
const testIndices = split.indices.test;
|
2313
|
+
|
2314
|
+
const X_train = trainIndices.map(i => X[i]);
|
2315
|
+
const y_train = trainIndices.map(i => y[i]);
|
2316
|
+
const X_test = testIndices.map(i => X[i]);
|
2317
|
+
const y_test = testIndices.map(i => y[i]);
|
2318
|
+
|
2319
|
+
// 6. Scale features
|
2320
|
+
const scaler = datly.standard_scaler_fit(X_train);
|
2321
|
+
const X_train_scaled = datly.standard_scaler_transform(scaler, X_train);
|
2322
|
+
const X_test_scaled = datly.standard_scaler_transform(scaler, X_test);
|
2323
|
+
|
2324
|
+
// 7. Train model
|
2325
|
+
const model = datly.train_linear_regression(
|
2326
|
+
JSON.parse(X_train_scaled).preview,
|
2327
|
+
y_train
|
2764
2328
|
);
|
2765
2329
|
|
2766
|
-
//
|
2767
|
-
const
|
2768
|
-
|
2769
|
-
|
2770
|
-
console.log('Effect Size:', interpretation.effectSize.magnitude);
|
2771
|
-
|
2772
|
-
// Calculate confidence interval for difference
|
2773
|
-
const ci = datly.confidenceIntervals.meanDifference(
|
2774
|
-
controlGroup,
|
2775
|
-
treatmentGroup,
|
2776
|
-
0.95
|
2330
|
+
// 8. Make predictions
|
2331
|
+
const predictions = datly.predict_linear(
|
2332
|
+
model,
|
2333
|
+
JSON.parse(X_test_scaled).preview
|
2777
2334
|
);
|
2778
|
-
console.log('95% CI for difference:', ci.lowerBound, 'to', ci.upperBound);
|
2779
|
-
|
2780
|
-
datly.plotBoxplot([controlGroup, treatmentGroup], {
|
2781
|
-
title: 'Control vs Treatment',
|
2782
|
-
labels: ['Control', 'Treatment']
|
2783
|
-
});
|
2784
|
-
```
|
2785
|
-
|
2786
|
-
---
|
2787
2335
|
|
2788
|
-
|
2789
|
-
|
2790
|
-
|
2791
|
-
|
2792
|
-
|
2793
|
-
const data = {
|
2794
|
-
headers: ['advertising', 'sales'],
|
2795
|
-
data: [
|
2796
|
-
{ advertising: 10, sales: 100 },
|
2797
|
-
{ advertising: 15, sales: 150 },
|
2798
|
-
{ advertising: 20, sales: 180 },
|
2799
|
-
{ advertising: 25, sales: 230 },
|
2800
|
-
{ advertising: 30, sales: 270 }
|
2801
|
-
]
|
2802
|
-
};
|
2803
|
-
|
2804
|
-
// Extract columns
|
2805
|
-
const advertising = datly.dataLoader.getColumn(data, 'advertising');
|
2806
|
-
const sales = datly.dataLoader.getColumn(data, 'sales');
|
2807
|
-
|
2808
|
-
// Calculate correlation
|
2809
|
-
const correlation = datly.correlation.pearson(advertising, sales);
|
2810
|
-
console.log('Correlation:', correlation.correlation);
|
2811
|
-
console.log('P-value:', correlation.pValue);
|
2812
|
-
|
2813
|
-
// Fit regression model
|
2814
|
-
const model = datly.regression.linear(advertising, sales);
|
2815
|
-
console.log('Equation:', model.equation);
|
2816
|
-
console.log('R²:', model.rSquared);
|
2817
|
-
console.log('Model significant:', model.pValueModel < 0.05);
|
2818
|
-
|
2819
|
-
// Make prediction
|
2820
|
-
const newAdvertising = [35];
|
2821
|
-
const prediction = datly.regression.predict(model, newAdvertising);
|
2822
|
-
console.log('Predicted sales for $35k advertising:', prediction[0]);
|
2823
|
-
|
2824
|
-
datly.plotScatter(advertising, sales, {
|
2825
|
-
title: 'Advertising vs Sales',
|
2826
|
-
xlabel: 'Advertising Budget ($1000)',
|
2827
|
-
ylabel: 'Sales ($1000)'
|
2828
|
-
});
|
2829
|
-
```
|
2830
|
-
|
2831
|
-
---
|
2832
|
-
|
2833
|
-
### Example 4: Machine Learning Pipeline
|
2834
|
-
|
2835
|
-
```javascript
|
2836
|
-
const datly = new Datly();
|
2837
|
-
|
2838
|
-
// Load data
|
2839
|
-
const data = datly.dataLoader.loadJSON('iris.json');
|
2840
|
-
|
2841
|
-
// Prepare features and target
|
2842
|
-
const X = data.data.map(row => [
|
2843
|
-
row.sepal_length,
|
2844
|
-
row.sepal_width,
|
2845
|
-
row.petal_length,
|
2846
|
-
row.petal_width
|
2847
|
-
]);
|
2848
|
-
const y = data.data.map(row => row.species);
|
2849
|
-
|
2850
|
-
// Split data
|
2851
|
-
const split = datly.ml.trainTestSplit(X, y, 0.2, true);
|
2852
|
-
|
2853
|
-
// Create and train model
|
2854
|
-
const model = datly.ml.createRandomForest({
|
2855
|
-
nEstimators: 100,
|
2856
|
-
maxDepth: 10
|
2857
|
-
});
|
2858
|
-
|
2859
|
-
model.fit(split.X_train, split.y_train, 'classification');
|
2860
|
-
|
2861
|
-
// Evaluate
|
2862
|
-
const score = model.score(split.X_test, split.y_test);
|
2863
|
-
console.log('Accuracy:', score.accuracy);
|
2864
|
-
console.log('Confusion Matrix:', score.confusionMatrix.display);
|
2865
|
-
|
2866
|
-
// Cross-validation
|
2867
|
-
const cv = datly.ml.crossValidate(model, X, y, 5, 'classification');
|
2868
|
-
console.log('CV Mean Score:', cv.meanScore);
|
2869
|
-
console.log('CV Std:', cv.stdScore);
|
2870
|
-
|
2871
|
-
// Feature importance
|
2872
|
-
const importance = model.getFeatureImportance();
|
2873
|
-
console.log('Feature Importance:', importance);
|
2874
|
-
```
|
2875
|
-
|
2876
|
-
---
|
2877
|
-
|
2878
|
-
### Example 5: Automatic Analysis
|
2879
|
-
|
2880
|
-
```javascript
|
2881
|
-
const datly = new Datly();
|
2882
|
-
|
2883
|
-
// Load your data
|
2884
|
-
const data = datly.dataLoader.loadCSV('customer_data.csv');
|
2885
|
-
|
2886
|
-
// Run automatic analysis
|
2887
|
-
const analysis = datly.autoAnalyzer.autoAnalyze(data, {
|
2888
|
-
minCorrelationThreshold: 0.5,
|
2889
|
-
significanceLevel: 0.05,
|
2890
|
-
generateVisualizations: true
|
2891
|
-
});
|
2892
|
-
|
2893
|
-
// View insights
|
2894
|
-
analysis.insights.forEach(insight => {
|
2895
|
-
console.log(`${insight.icon} [${insight.priority}] ${insight.title}`);
|
2896
|
-
console.log(` ${insight.description}`);
|
2897
|
-
if (insight.recommendation) {
|
2898
|
-
console.log(` → ${insight.recommendation}`);
|
2899
|
-
}
|
2900
|
-
});
|
2901
|
-
|
2902
|
-
// View recommended visualizations
|
2903
|
-
analysis.visualizationSuggestions.forEach(viz => {
|
2904
|
-
console.log(`📊 ${viz.title} (${viz.type}) - Priority: ${viz.priority}`);
|
2905
|
-
});
|
2906
|
-
|
2907
|
-
// Generate report
|
2908
|
-
const textReport = datly.reportGenerator.exportSummary(
|
2909
|
-
analysis,
|
2910
|
-
'text'
|
2336
|
+
// 9. Evaluate model
|
2337
|
+
const metrics = datly.metrics_regression(
|
2338
|
+
y_test,
|
2339
|
+
JSON.parse(predictions).predictions
|
2911
2340
|
);
|
2912
|
-
console.log(
|
2913
|
-
```
|
2914
|
-
|
2915
|
-
---
|
2341
|
+
console.log(metrics);
|
2916
2342
|
|
2917
|
-
|
2918
|
-
|
2919
|
-
|
2920
|
-
|
2921
|
-
|
2922
|
-
|
2923
|
-
|
2924
|
-
- **`CentralTendency`**: Mean, median, mode calculations
|
2925
|
-
- **`Dispersion`**: Variance, standard deviation measures
|
2926
|
-
- **`Position`**: Quantiles, percentiles, rankings
|
2927
|
-
- **`Shape`**: Skewness and kurtosis analysis
|
2928
|
-
- **`HypothesisTesting`**: Statistical tests
|
2929
|
-
- **`ConfidenceIntervals`**: Interval estimation
|
2930
|
-
- **`NormalityTests`**: Test for normal distribution
|
2931
|
-
- **`Correlation`**: Correlation analysis
|
2932
|
-
- **`Regression`**: Regression modeling
|
2933
|
-
- **`ReportGenerator`**: Generate statistical reports
|
2934
|
-
- **`PatternDetector`**: Detect patterns in data
|
2935
|
-
- **`Interpreter`**: Interpret statistical results
|
2936
|
-
- **`AutoAnalyzer`**: Automated analysis
|
2937
|
-
- **`ML`**: Machine learning models
|
2938
|
-
- **`Visualizer`**: Data visualization
|
2939
|
-
|
2940
|
-
---
|
2941
|
-
|
2942
|
-
## 🌐 Browser Support
|
2943
|
-
|
2944
|
-
- Chrome (latest)
|
2945
|
-
- Firefox (latest)
|
2946
|
-
- Safari (latest)
|
2947
|
-
- Edge (latest)
|
2948
|
-
|
2949
|
-
**Requirements:**
|
2950
|
-
- Modern JavaScript (ES6+)
|
2343
|
+
// 10. Visualize results
|
2344
|
+
datly.plotScatter(y_test, JSON.parse(predictions).predictions, {
|
2345
|
+
title: 'Actual vs Predicted',
|
2346
|
+
xlabel: 'Actual',
|
2347
|
+
ylabel: 'Predicted'
|
2348
|
+
}, '#results');
|
2349
|
+
```
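For step 9, it can help to know what typical regression metrics measure. The sketch below computes MAE, RMSE, and R² by hand; `regressionMetrics` is a hypothetical helper, and the fields actually returned by `metrics_regression` are not shown in this section, so treat the names as assumptions.

```javascript
// Hypothetical helper (not part of Datly): common regression error metrics.
function regressionMetrics(yTrue, yPred) {
  const n = yTrue.length;
  const meanTrue = yTrue.reduce((s, v) => s + v, 0) / n;
  let absErr = 0;
  let ssRes = 0;
  let ssTot = 0;
  for (let i = 0; i < n; i++) {
    const err = yTrue[i] - yPred[i];
    absErr += Math.abs(err);
    ssRes += err * err;
    ssTot += (yTrue[i] - meanTrue) ** 2;
  }
  return {
    mae: absErr / n,
    rmse: Math.sqrt(ssRes / n),
    // R² is undefined when the targets are constant.
    r2: ssTot === 0 ? NaN : 1 - ssRes / ssTot
  };
}

const m = regressionMetrics([3, 5, 7], [2, 5, 8]);
console.log(m.r2); // 0.75
```

Note that with only a handful of rows, as in the three-record sample above, a 20% test split leaves very few test points, so any metric computed on it is unstable.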
|
2951
2350
|
|
2952
2351
|
---
|
2953
2352
|
|
2954
|
-
##
|
2955
|
-
|
2956
|
-
Contributions are welcome! Please follow these steps:
|
2353
|
+
## Tips and Best Practices
|
2957
2354
|
|
2958
|
-
1.
|
2959
|
-
2.
|
2960
|
-
3.
|
2961
|
-
4.
|
2962
|
-
5.
|
2355
|
+
1. **Data Preparation**: Always check for missing values and outliers before analysis
|
2356
|
+
2. **Feature Scaling**: Scale features before training distance-based models (KNN, SVM)
|
2357
|
+
3. **Cross-Validation**: Use cross-validation to assess model performance reliably
|
2358
|
+
4. **Model Selection**: Start with simple models (linear regression) before trying complex ones
|
2359
|
+
5. **Hyperparameter Tuning**: Experiment with different hyperparameters (k in KNN, max_depth in trees)
|
2360
|
+
6. **Visualization**: Always visualize your data and results to gain insights
|
2361
|
+
7. **Statistical Tests**: Check assumptions (normality, homogeneity) before parametric tests
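
Tip 3 can be made concrete with a sketch of how k-fold cross-validation partitions row indices. `kFoldIndices` is a hypothetical helper, not a Datly function; it splits contiguous blocks without shuffling, so shuffle the rows first if their order carries information.

```javascript
// Hypothetical helper (not part of Datly): contiguous k-fold index splits.
function kFoldIndices(n, k) {
  const folds = [];
  for (let f = 0; f < k; f++) {
    const start = Math.floor((f * n) / k);
    const end = Math.floor(((f + 1) * n) / k);
    const train = [];
    const test = [];
    for (let i = 0; i < n; i++) {
      (i >= start && i < end ? test : train).push(i);
    }
    folds.push({ train, test });
  }
  return folds;
}

const folds = kFoldIndices(10, 5);
console.log(folds[0].test);  // [0, 1]
console.log(folds[0].train); // [2, 3, 4, 5, 6, 7, 8, 9]
```

Each row appears in exactly one test fold; a model is trained on `train`, scored on `test`, and the scores are averaged across folds.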
|
2963
2362
|
|
2964
2363
|
---
|
2965
2364
|
|
2966
|
-
##
|
2967
|
-
|
2968
|
-
MIT License
|
2365
|
+
## License
|
2969
2366
|
|
2970
|
-
|
2971
|
-
|
2972
|
-
Permission is hereby granted, free of charge, to any person obtaining a copy
|
2973
|
-
of this software and associated documentation files (the "Software"), to deal
|
2974
|
-
in the Software without restriction, including without limitation the rights
|
2975
|
-
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
2976
|
-
copies of the Software.
|
2367
|
+
This documentation is provided as-is. Please refer to the library's official repository for licensing information.
|
2977
2368
|
|
2978
2369
|
---
|
2979
2370
|
|
2980
|
-
##
|
2981
|
-
|
2982
|
-
For questions, issues, or feature requests:
|
2983
|
-
- GitHub Issues: [github.com/yourrepo/datly/issues](https://github.com/yourrepo/datly/issues)
|
2984
|
-
- NPM Package: [npmjs.com/package/datly](https://npmjs.com/package/datly)
|
2371
|
+
## Support
|
2985
2372
|
|
2986
|
-
|
2373
|
+
For issues, questions, or contributions, please visit the official Datly.js repository.
|