@datagrok/eda 1.4.11 → 1.4.13
- package/.eslintrc.json +0 -1
- package/CHANGELOG.md +15 -0
- package/CLAUDE.md +185 -0
- package/README.md +8 -0
- package/css/pmpo.css +35 -0
- package/dist/package-test.js +1 -1
- package/dist/package-test.js.map +1 -1
- package/dist/package.js +1 -1
- package/dist/package.js.map +1 -1
- package/eslintrc.json +45 -0
- package/files/drugs-props-test.csv +126 -0
- package/files/drugs-props-train-scores.csv +664 -0
- package/files/drugs-props-train.csv +664 -0
- package/package.json +9 -3
- package/src/anova/anova-tools.ts +1 -1
- package/src/anova/anova-ui.ts +1 -1
- package/src/package-api.ts +18 -0
- package/src/package-test.ts +4 -1
- package/src/package.g.ts +25 -0
- package/src/package.ts +55 -15
- package/src/pareto-optimization/pareto-computations.ts +6 -0
- package/src/pareto-optimization/utils.ts +6 -4
- package/src/probabilistic-scoring/data-generator.ts +157 -0
- package/src/probabilistic-scoring/nelder-mead.ts +204 -0
- package/src/probabilistic-scoring/pmpo-defs.ts +218 -0
- package/src/probabilistic-scoring/pmpo-utils.ts +603 -0
- package/src/probabilistic-scoring/prob-scoring.ts +991 -0
- package/src/probabilistic-scoring/stat-tools.ts +303 -0
- package/src/softmax-classifier.ts +1 -1
- package/src/tests/anova-tests.ts +1 -1
- package/src/tests/classifiers-tests.ts +1 -1
- package/src/tests/dim-reduction-tests.ts +1 -1
- package/src/tests/linear-methods-tests.ts +1 -1
- package/src/tests/mis-vals-imputation-tests.ts +1 -1
- package/src/tests/pareto-tests.ts +253 -0
- package/src/tests/pmpo-tests.ts +157 -0
- package/test-console-output-1.log +175 -209
- package/test-record-1.mp4 +0 -0
package/.eslintrc.json
CHANGED
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,20 @@
 # EDA changelog
 
+## 1.4.13 (WIP)
+
+Improved probabilistic multi-parameter optimization (pMPO):
+
+* ROC curve and confusion matrix
+* pMPO without sigmoidal correction
+* Correctness tests
+
+## 1.4.12 (2026-01-16)
+
+Implemented
+
+* Interactive app for training probabilistic multi-parameter optimization (pMPO) models
+* Export pMPO models to MPO desirability profiles
+
 ## 1.4.11 (2025-11-21)
 
 Implemented
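Reviewer note: the 1.4.13 changelog entry mentions a ROC curve and a confusion matrix for pMPO. The sketch below shows one common way to derive both from per-sample scores and binary labels. Every name in it (`confusionMatrix`, `rocCurve`, `auc`, `ConfusionMatrix`) is hypothetical and chosen for illustration; it is not taken from the package sources and may differ from what `prob-scoring.ts` actually does.

```typescript
// Hypothetical helper types and names; not taken from @datagrok/eda.
interface ConfusionMatrix { tp: number; fp: number; tn: number; fn: number; }

// Counts outcomes when a sample is predicted positive iff score >= threshold.
function confusionMatrix(scores: number[], labels: boolean[], threshold: number): ConfusionMatrix {
  const cm: ConfusionMatrix = {tp: 0, fp: 0, tn: 0, fn: 0};
  for (let i = 0; i < scores.length; ++i) {
    const predicted = scores[i] >= threshold;
    if (predicted && labels[i]) ++cm.tp;
    else if (predicted && !labels[i]) ++cm.fp;
    else if (!predicted && labels[i]) ++cm.fn;
    else ++cm.tn;
  }
  return cm;
}

// ROC points as [FPR, TPR], one per distinct score used as a threshold,
// ordered from the strictest threshold to the loosest.
function rocCurve(scores: number[], labels: boolean[]): Array<[number, number]> {
  const pos = labels.filter((l) => l).length;
  const neg = labels.length - pos;
  const thresholds = Array.from(new Set(scores)).sort((a, b) => b - a);
  const points: Array<[number, number]> = [[0, 0]];
  for (const t of thresholds) {
    const cm = confusionMatrix(scores, labels, t);
    points.push([neg > 0 ? cm.fp / neg : 0, pos > 0 ? cm.tp / pos : 0]);
  }
  return points;
}

// Area under the ROC curve by the trapezoidal rule.
function auc(points: Array<[number, number]>): number {
  let area = 0;
  for (let i = 1; i < points.length; ++i)
    area += (points[i][0] - points[i - 1][0]) * (points[i][1] + points[i - 1][1]) / 2;
  return area;
}
```

Recomputing the confusion matrix for every threshold keeps the sketch short at the cost of quadratic time; a single pass over sorted scores would be the usual choice for large tables.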
package/CLAUDE.md
ADDED
@@ -0,0 +1,185 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Overview
+
+EDA (Exploratory Data Analysis) is a Datagrok package providing statistical analysis and machine learning tools. It includes dimensionality reduction (PCA, UMAP, t-SNE, SPE), supervised learning (SVM, linear regression, softmax, PLS, gradient boosting via XGBoost), ANOVA, missing data imputation, Pareto optimization, and probabilistic multi-parameter optimization (pMPO).
+
+## Build Commands
+
+```bash
+npm install  # Install dependencies
+npm run build  # Full build: grok api && grok check --soft && webpack
+npm run build-eda  # Webpack only
+npm run build-all  # Build js-api, libraries, then this package
+
+npm run link-all  # Link local datagrok-api and libraries
+
+npm run debug-eda  # Build and publish to default server
+npm run debug-eda-local  # Build and publish to local server
+npm run release-eda  # Build and publish as release
+
+npm test  # Run tests against localhost
+grok test  # Run all tests
+grok test --test "TestName"  # Run specific test
+grok test --category "CategoryName"  # Run tests in category
+grok test --gui  # Run with visible browser
+```
+
+## Architecture
+
+### High-Level Structure
+
+The package combines TypeScript, WASM modules, and web workers for performance-critical computations:
+
+- **WASM Modules** (`wasm/`): C++ implementations compiled to WebAssembly for PCA, PLS, SVM, softmax regression, and XGBoost. These provide high-performance numerical computation.
+- **Web Workers** (`src/workers/`): JavaScript workers for t-SNE, UMAP, and softmax training to avoid blocking the UI thread.
+- **TypeScript API** (`src/`): High-level interfaces, UI components, and integration with Datagrok platform.
+
+### Core Modules
+
+#### Dimensionality Reduction
+- **PCA** (`eda-tools.ts`, `wasm/PCA/`): Principal Component Analysis via WASM
+- **UMAP/t-SNE** (`workers/umap-worker.ts`, `workers/tsne-worker.ts`): Manifold learning in workers
+- Uses `@datagrok-libraries/ml` for multi-column dimensionality reduction with custom distance metrics
+
+#### Partial Least Squares (`pls/`)
+- **PLS regression and analysis** (`pls-tools.ts`, `pls-ml.ts`): Multivariate analysis for high-dimensional data
+- **WASM backend** (`wasm/PLS/`): Performance-critical PLS computations
+- Supports both linear and quadratic PLS models
+
+#### Support Vector Machines (`svm.ts`)
+- **LS-SVM** implementation with multiple kernels (linear, RBF, polynomial, sigmoid)
+- Training/prediction via WASM (`wasm/svm.h`, `wasm/svmApi.cpp`)
+- Interactive model training with progress tracking
+
+#### XGBoost (`xgbooster.ts`)
+- **XGBoost** integration via WASM (`wasm/XGBoostAPI.wasm`)
+- Binary classification and regression
+- Requires initialization via `initXgboost()` in package init
+
+#### Other Supervised Machine Techniques
+- **Softmax Classifier** (`softmax-classifier.ts`): Multinomial logistic regression
+  - Training in web worker for non-blocking UI
+  - WASM backend for prediction
+- **Linear Regression** (`regression.ts`): Ordinary least squares
+
+#### Missing Values Imputation (`missing-values-imputation/`)
+- **K-Nearest Neighbors** imputation (`knn-imputer.ts`)
+- Handles both numerical and categorical features
+
+#### ANOVA (`anova/`)
+- **One-way ANOVA** (`anova-tools.ts`, `anova-ui.ts`)
+- Statistical analysis of variance with visual reports
+
+#### Pareto Optimization (`pareto-optimization/`)
+- **ParetoOptimizer** (`pareto-optimizer.ts`): Multi-objective optimization
+- **ParetoFrontViewer** (`pareto-front-viewer.ts`): Custom JsViewer for visualizing Pareto fronts
+- Computes Pareto-optimal solutions from multiple objectives (minimize/maximize)
+
+#### Probabilistic Scoring (`probabilistic-scoring/`)
+- **pMPO** (`prob-scoring.ts`): Probabilistic Multi-Parameter Optimization for drug discovery
+- Based on https://pmc.ncbi.nlm.nih.gov/articles/PMC4716604/
+- Features:
+  - Training with descriptor statistics and correlation filtering
+  - Sigmoid-corrected desirability functions
+  - ROC curve analysis and confusion matrix
+  - Model evaluation and auto-tuning via Nelder-Mead optimization (`nelder-mead.ts`)
+  - Export to MPO desirability profiles (integration with `@datagrok-libraries/statistics`)
+- Sample data in `files/` directory for testing and demos
+
+### WASM Integration
+
+WASM modules are initialized asynchronously in `PackageFunctions.init()`:
+- `_initEDAAPI()` from `wasm/EDAAPI.js`
+- `initXgboost()` from `wasm/xgbooster.js`
+
+WASM source files (`wasm/*.cpp`, `wasm/*.h`) are C++ implementations compiled to WebAssembly. Do not modify generated `.js` and `.wasm` files directly.
+
+### Function Registration
+
+Functions are registered via JSDoc-style metadata comments (not TypeScript decorators):
+
+```typescript
+@grok.decorators.func({
+  'top-menu': 'ML | Analyze | PCA...',
+  'description': 'Principal component analysis (PCA)',
+  'helpUrl': '/help/explore/dim-reduction#pca',
+})
+static async PCA(
+  @grok.decorators.param({'type': 'dataframe', 'options': {'caption': 'Table'}}) table: DG.DataFrame,
+  // ...
+) { }
+```
+
+Run `grok api` to generate `package.g.ts` and `package-api.ts` from these annotations.
+
+### External Dependencies
+
+Key library integrations:
+- **@datagrok-libraries/ml**: Distance metrics, dimensionality reduction, MCL clustering
+- **@datagrok-libraries/statistics**: Statistical functions, MPO profile editor
+- **@datagrok-libraries/math**: DBSCAN, mathematical utilities
+- **@datagrok-libraries/test**: Test framework
+
+External packages (must be in webpack externals):
+- `mathjs`: Matrix operations and numerical computation
+- `jstat`: Statistical distributions
+- `umap-js`: UMAP implementation
+- `@keckelt/tsne`: t-SNE implementation
+
+## Testing
+
+Tests are organized by feature in `src/tests/`:
+- `dim-reduction-tests.ts`: PCA, UMAP, t-SNE, SPE
+- `linear-methods-tests.ts`: Linear regression, PLS
+- `classifiers-tests.ts`: SVM, softmax, XGBoost
+- `mis-vals-imputation-tests.ts`: KNN imputation
+- `anova-tests.ts`: One-way ANOVA
+- `pmpo-tests.ts`: Probabilistic scoring
+- `pareto-tests.ts`: Pareto optimization
+
+Test framework from `@datagrok-libraries/test`:
+```typescript
+import {category, expect, test} from '@datagrok-libraries/test/src/test';
+
+category('Feature Name', () => {
+  test('Test description', async () => {
+    // test code
+  }, {timeout: 30000});
+});
+```
+
+## Scripts and Files
+
+- **`scripts/`**: Python scripts for exporting TypeScript constants from Python implementations
+- **`files/`**: Demo/test data for pMPO (drugs-props-train.csv, drugs-props-test.csv, scores)
+- **`css/`**: Custom stylesheets (pmpo.css for pMPO UI)
+
+## Key Workflows
+
+### Adding a New ML Method
+
+1. Implement the algorithm in TypeScript (or C++ for WASM)
+2. Add function registration with `@grok.decorators.func()`
+3. Create tests in appropriate test file
+4. Run `grok api` to generate wrappers
+5. Add to package menu structure in `package.json` meta.menu
+
+### Working with WASM
+
+- WASM sources in `wasm/*.cpp` and `wasm/*.h`
+- Generated files: `wasm/*.js`, `wasm/*.wasm`
+- Initialization required in `PackageFunctions.init()`
+- Web workers have separate WASM entry points (e.g., `EDAForWebWorker.js`)
+
+### pMPO Model Development
+
+The probabilistic scoring module is complex and tightly integrated:
+- Model training in `Pmpo` class (`prob-scoring.ts`)
+- Statistical computations in `stat-tools.ts`
+- UI utilities and serialization in `pmpo-utils.ts`
+- Optimization via Nelder-Mead in `nelder-mead.ts`
+- All constants and types in `pmpo-defs.ts`
+- Integration with `@datagrok-libraries/statistics` for MPO profile export
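Reviewer note: the added CLAUDE.md says the Pareto module "computes Pareto-optimal solutions from multiple objectives (minimize/maximize)". As a reading aid, here is a minimal, naive O(n²) sketch of that idea. The names `Sense`, `dominates`, and `paretoMask` are hypothetical and do not come from `pareto-optimizer.ts`, whose actual algorithm is not shown in this diff.

```typescript
// Hypothetical names; a naive sketch, not the package's ParetoOptimizer.
type Sense = 'min' | 'max';

// True if point a dominates point b: a is no worse in every objective
// and strictly better in at least one.
function dominates(a: number[], b: number[], senses: Sense[]): boolean {
  let strictlyBetter = false;
  for (let k = 0; k < senses.length; ++k) {
    // Flip the sign of maximized objectives so that smaller is always better.
    const sign = senses[k] === 'min' ? 1 : -1;
    const va = sign * a[k];
    const vb = sign * b[k];
    if (va > vb) return false;
    if (va < vb) strictlyBetter = true;
  }
  return strictlyBetter;
}

// Marks each point as Pareto-optimal (true) if no other point dominates it.
function paretoMask(points: number[][], senses: Sense[]): boolean[] {
  return points.map((p, i) => !points.some((q, j) => j !== i && dominates(q, p, senses)));
}

// Usage: minimize the first objective, maximize the second.
const mask = paretoMask([[1, 5], [2, 4], [3, 6], [2, 2]], ['min', 'max']);
console.log(mask); // [true, false, true, false]
```

Here `[1, 5]` dominates `[2, 4]` and `[2, 2]`, while `[3, 6]` survives because nothing else reaches its value on the maximized objective.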
package/README.md
CHANGED
@@ -1,5 +1,12 @@
 # EDA
 
+[](https://datagrok.ai/help/learn/)
+[](https://datagrok.ai/help/explore/dim-reduction)
+[](https://datagrok.ai/help/explore/cluster-data)
+[](https://datagrok.ai/help/explore/multivariate-analysis)
+[](https://datagrok.ai/help/explore/anova)
+[](https://public.datagrok.ai/apps/tutorials/Tutorials/MachineLearning/MultivariateAnalysis)
+
 EDA is a [package](https://datagrok.ai/help/develop/#packages) for the [Datagrok](https://datagrok.ai) platform. It provides the following exploratory data analysis and machine learning tools:
 
 * Dimensionality reduction
@@ -22,3 +29,4 @@ EDA is a [package](https://datagrok.ai/help/develop/#packages) for the [Datagrok
 * k-nearest neighbors method ([KNN](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm))
 * Multi-objective optimization
 * [Pareto front](https://en.wikipedia.org/wiki/Pareto_front) viewer
+* [Probabilistic MPO](https://pmc.ncbi.nlm.nih.gov/articles/PMC4716604/)
package/css/pmpo.css
ADDED
@@ -0,0 +1,35 @@
+.eda-pmpo-tooltip-line {
+  display: flex;
+  align-items: center;
+  gap: 6px;
+  margin-left: 6px;
+}
+
+.eda-pmpo-box {
+  width: 10px;
+  height: 10px;
+}
+
+.eda-pmpo-input-form {
+  padding-left: 10px;
+  padding-right: 5px;
+}
+
+.eda-pmpo-title {
+  font-size: 14.5px;
+  position: absolute;
+  top: 50%;
+  left: 50%;
+  transform: translate(-50%, -50%);
+  pointer-events: none;
+  white-space: nowrap;
+}
+
+.eda-pmpo-centered-text {
+  display: flex;
+  justify-content: center;
+  align-items: center;
+  height: 100%;
+  width: 100%;
+  text-align: center;
+}