@datagrok/eda 1.4.12 → 1.4.13

package/.eslintrc.json CHANGED
@@ -17,7 +17,6 @@
17
17
  "rules": {
18
18
  "no-unused-vars": "off",
19
19
  "@typescript-eslint/no-unused-vars": ["warn", { "varsIgnorePattern": "^(_|ui$|grok$|DG$)", "argsIgnorePattern": "^_"}],
20
- "@typescript-eslint/require-array-sort-compare": "error",
21
20
  "indent": [
22
21
  "error",
23
22
  2
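The removed rule, `@typescript-eslint/require-array-sort-compare`, reports `sort()` calls that omit a compare function. A minimal sketch of the behavior it guards against (the array values are illustrative and not taken from the package):

```typescript
// Without a compare function, sort() compares elements as strings,
// so numbers end up in lexicographic order.
const values: number[] = [10, 9, 1];

const lexicographic = values.slice().sort();          // [1, 10, 9]
const numeric = values.slice().sort((a, b) => a - b); // [1, 9, 10]

console.log(lexicographic, numeric);
```

With the rule dropped, the linter no longer flags such calls, so numeric sorts in the package have to pass a comparator explicitly.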
package/CHANGELOG.md CHANGED
@@ -1,5 +1,13 @@
1
1
  # EDA changelog
2
2
 
3
+ ## 1.4.13 (WIP)
4
+
5
+ Improved probabilistic multi-parameter optimization (pMPO):
6
+
7
+ * ROC curve and confusion matrix
8
+ * pMPO without sigmoidal correction
9
+ * Correctness tests
10
+
3
11
  ## 1.4.12 (2026-01-16)
4
12
 
5
13
  Implemented
package/CLAUDE.md ADDED
@@ -0,0 +1,185 @@
1
+ # CLAUDE.md
2
+
3
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4
+
5
+ ## Overview
6
+
7
+ EDA (Exploratory Data Analysis) is a Datagrok package providing statistical analysis and machine learning tools. It includes dimensionality reduction (PCA, UMAP, t-SNE, SPE), supervised learning (SVM, linear regression, softmax, PLS, gradient boosting via XGBoost), ANOVA, missing data imputation, Pareto optimization, and probabilistic multi-parameter optimization (pMPO).
8
+
9
+ ## Build Commands
10
+
11
+ ```bash
12
+ npm install # Install dependencies
13
+ npm run build # Full build: grok api && grok check --soft && webpack
14
+ npm run build-eda # Webpack only
15
+ npm run build-all # Build js-api, libraries, then this package
16
+
17
+ npm run link-all # Link local datagrok-api and libraries
18
+
19
+ npm run debug-eda # Build and publish to default server
20
+ npm run debug-eda-local # Build and publish to local server
21
+ npm run release-eda # Build and publish as release
22
+
23
+ npm test # Run tests against localhost
24
+ grok test # Run all tests
25
+ grok test --test "TestName" # Run specific test
26
+ grok test --category "CategoryName" # Run tests in category
27
+ grok test --gui # Run with visible browser
28
+ ```
29
+
30
+ ## Architecture
31
+
32
+ ### High-Level Structure
33
+
34
+ The package combines TypeScript, WASM modules, and web workers for performance-critical computations:
35
+
36
+ - **WASM Modules** (`wasm/`): C++ implementations compiled to WebAssembly for PCA, PLS, SVM, softmax regression, and XGBoost. These provide high-performance numerical computation.
37
+ - **Web Workers** (`src/workers/`): JavaScript workers for t-SNE, UMAP, and softmax training to avoid blocking the UI thread.
38
+ - **TypeScript API** (`src/`): High-level interfaces, UI components, and integration with the Datagrok platform.
39
+
40
+ ### Core Modules
41
+
42
+ #### Dimensionality Reduction
43
+ - **PCA** (`eda-tools.ts`, `wasm/PCA/`): Principal Component Analysis via WASM
44
+ - **UMAP/t-SNE** (`workers/umap-worker.ts`, `workers/tsne-worker.ts`): Manifold learning in workers
45
+ - Uses `@datagrok-libraries/ml` for multi-column dimensionality reduction with custom distance metrics
46
+
47
+ #### Partial Least Squares (`pls/`)
48
+ - **PLS regression and analysis** (`pls-tools.ts`, `pls-ml.ts`): Multivariate analysis for high-dimensional data
49
+ - **WASM backend** (`wasm/PLS/`): Performance-critical PLS computations
50
+ - Supports both linear and quadratic PLS models
51
+
52
+ #### Support Vector Machines (`svm.ts`)
53
+ - **LS-SVM** implementation with multiple kernels (linear, RBF, polynomial, sigmoid)
54
+ - Training/prediction via WASM (`wasm/svm.h`, `wasm/svmApi.cpp`)
55
+ - Interactive model training with progress tracking
56
+
57
+ #### XGBoost (`xgbooster.ts`)
58
+ - **XGBoost** integration via WASM (`wasm/XGBoostAPI.wasm`)
59
+ - Binary classification and regression
60
+ - Requires initialization via `initXgboost()` in package init
61
+
62
+ #### Other Supervised Machine Learning Techniques
63
+ - **Softmax Classifier** (`softmax-classifier.ts`): Multinomial logistic regression
64
+ - Training in web worker for non-blocking UI
65
+ - WASM backend for prediction
66
+ - **Linear Regression** (`regression.ts`): Ordinary least squares
67
+
68
+ #### Missing Values Imputation (`missing-values-imputation/`)
69
+ - **K-Nearest Neighbors** imputation (`knn-imputer.ts`)
70
+ - Handles both numerical and categorical features
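As an illustration of the idea behind KNN imputation, here is a minimal sketch for numerical features only. The names `knnImpute` and `distance` are hypothetical, and the actual `knn-imputer.ts` may use different metrics, handle categorical columns, and differ in other ways.

```typescript
type Row = number[];

// Hypothetical helper: root mean squared difference over the features
// that are present (non-NaN) in both rows; Infinity if none are shared.
function distance(a: Row, b: Row): number {
  let sum = 0;
  let shared = 0;
  for (let c = 0; c < a.length; ++c) {
    if (isNaN(a[c]) || isNaN(b[c])) continue;
    sum += (a[c] - b[c]) * (a[c] - b[c]);
    shared++;
  }
  return shared > 0 ? Math.sqrt(sum / shared) : Infinity;
}

// Hypothetical helper: replaces each NaN with the mean of that column
// taken over the k nearest rows that have a value there.
function knnImpute(rows: Row[], k: number): Row[] {
  return rows.map((row, i) => row.map((value, col) => {
    if (!isNaN(value)) return value;

    const donors = rows
      .filter((other, j) => j !== i && !isNaN(other[col]))
      .map((other) => ({value: other[col], dist: distance(row, other)}))
      .filter((d) => isFinite(d.dist))
      .sort((a, b) => a.dist - b.dist)
      .slice(0, k);

    if (donors.length === 0) return value; // nothing to impute from
    return donors.reduce((sum, d) => sum + d.value, 0) / donors.length;
  }));
}
```

Calling `knnImpute(rows, 2)` on a small table returns a new table and leaves the input rows unchanged.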
71
+
72
+ #### ANOVA (`anova/`)
73
+ - **One-way ANOVA** (`anova-tools.ts`, `anova-ui.ts`)
74
+ - Statistical analysis of variance with visual reports
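For reference, the F-statistic at the core of one-way ANOVA can be sketched as follows. The function name `oneWayAnovaF` is hypothetical, and the sketch omits the p-value and the validity checks that `anova-tools.ts` may perform.

```typescript
// Hypothetical helper: one-way ANOVA F-statistic for groups of observations.
// Assumes at least two groups and more observations than groups.
function oneWayAnovaF(groups: number[][]): {f: number; dfBetween: number; dfWithin: number} {
  const mean = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / xs.length;
  const all = groups.reduce((acc, g) => acc.concat(g), [] as number[]);
  const grandMean = mean(all);

  let ssBetween = 0; // variation of group means around the grand mean
  let ssWithin = 0;  // variation of observations around their group mean
  for (const g of groups) {
    const m = mean(g);
    ssBetween += g.length * (m - grandMean) * (m - grandMean);
    for (const x of g) ssWithin += (x - m) * (x - m);
  }

  const dfBetween = groups.length - 1;
  const dfWithin = all.length - groups.length;
  return {f: (ssBetween / dfBetween) / (ssWithin / dfWithin), dfBetween, dfWithin};
}
```

A large F value indicates that the group means differ more than the within-group scatter would suggest.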
75
+
76
+ #### Pareto Optimization (`pareto-optimization/`)
77
+ - **ParetoOptimizer** (`pareto-optimizer.ts`): Multi-objective optimization
78
+ - **ParetoFrontViewer** (`pareto-front-viewer.ts`): Custom JsViewer for visualizing Pareto fronts
79
+ - Computes Pareto-optimal solutions from multiple objectives (minimize/maximize)
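The notion of Pareto optimality used here can be sketched with a pairwise dominance check. The names `Sense`, `dominates`, and `paretoFront` are hypothetical, and `pareto-optimizer.ts` may use a different, more efficient algorithm.

```typescript
type Sense = 'min' | 'max';

// True if a is at least as good as b in every objective and strictly better in one.
function dominates(a: number[], b: number[], senses: Sense[]): boolean {
  let strictlyBetter = false;
  for (let i = 0; i < senses.length; ++i) {
    // diff > 0 means a is better than b in objective i.
    const diff = senses[i] === 'min' ? b[i] - a[i] : a[i] - b[i];
    if (diff < 0) return false;
    if (diff > 0) strictlyBetter = true;
  }
  return strictlyBetter;
}

// Indices of the non-dominated points (simple O(n^2) scan).
function paretoFront(points: number[][], senses: Sense[]): number[] {
  const front: number[] = [];
  points.forEach((p, i) => {
    const dominated = points.some((q, j) => j !== i && dominates(q, p, senses));
    if (!dominated) front.push(i);
  });
  return front;
}
```

Each objective carries its own direction, so a single call can mix minimized and maximized columns.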
80
+
81
+ #### Probabilistic Scoring (`probabilistic-scoring/`)
82
+ - **pMPO** (`prob-scoring.ts`): Probabilistic Multi-Parameter Optimization for drug discovery
83
+ - Based on https://pmc.ncbi.nlm.nih.gov/articles/PMC4716604/
84
+ - Features:
85
+ - Training with descriptor statistics and correlation filtering
86
+ - Sigmoid-corrected desirability functions
87
+ - ROC curve analysis and confusion matrix
88
+ - Model evaluation and auto-tuning via Nelder-Mead optimization (`nelder-mead.ts`)
89
+ - Export to MPO desirability profiles (integration with `@datagrok-libraries/statistics`)
90
+ - Sample data in `files/` directory for testing and demos
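The ROC curve and confusion matrix listed above are standard classifier diagnostics. The following is a generic sketch of how they can be computed from scores and binary labels; the names `confusionMatrix`, `rocCurve`, and `auc` are hypothetical and do not reflect the package's API.

```typescript
interface ConfusionMatrix {tp: number; fp: number; tn: number; fn: number}
interface RocPoint {fpr: number; tpr: number}

// Counts outcomes when scores >= threshold are predicted positive.
function confusionMatrix(scores: number[], labels: boolean[], threshold: number): ConfusionMatrix {
  const cm: ConfusionMatrix = {tp: 0, fp: 0, tn: 0, fn: 0};
  scores.forEach((s, i) => {
    const predicted = s >= threshold;
    if (predicted && labels[i]) cm.tp++;
    else if (predicted) cm.fp++;
    else if (labels[i]) cm.fn++;
    else cm.tn++;
  });
  return cm;
}

// One ROC point per distinct score used as a threshold, starting from (0, 0).
// Assumes both classes are present; otherwise the rates are undefined.
function rocCurve(scores: number[], labels: boolean[]): RocPoint[] {
  const thresholds = scores
    .filter((s, i) => scores.indexOf(s) === i)
    .sort((a, b) => b - a);
  const points: RocPoint[] = [{fpr: 0, tpr: 0}];
  for (const t of thresholds) {
    const {tp, fp, tn, fn} = confusionMatrix(scores, labels, t);
    points.push({fpr: fp / (fp + tn), tpr: tp / (tp + fn)});
  }
  return points;
}

// Area under the ROC curve by the trapezoidal rule.
function auc(points: RocPoint[]): number {
  let area = 0;
  for (let i = 1; i < points.length; ++i)
    area += (points[i].fpr - points[i - 1].fpr) * (points[i].tpr + points[i - 1].tpr) / 2;
  return area;
}
```

In a pMPO setting the scores would be the computed pMPO values and the labels the desired/undesired flags from the training table.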
91
+
92
+ ### WASM Integration
93
+
94
+ WASM modules are initialized asynchronously in `PackageFunctions.init()`:
95
+ - `_initEDAAPI()` from `wasm/EDAAPI.js`
96
+ - `initXgboost()` from `wasm/xgbooster.js`
97
+
98
+ WASM source files (`wasm/*.cpp`, `wasm/*.h`) are C++ implementations compiled to WebAssembly. Do not modify generated `.js` and `.wasm` files directly.
99
+
100
+ ### Function Registration
101
+
102
+ Functions are registered via TypeScript decorators (`@grok.decorators.func` and `@grok.decorators.param`), not JSDoc-style metadata comments:
103
+
104
+ ```typescript
105
+ @grok.decorators.func({
106
+ 'top-menu': 'ML | Analyze | PCA...',
107
+ 'description': 'Principal component analysis (PCA)',
108
+ 'helpUrl': '/help/explore/dim-reduction#pca',
109
+ })
110
+ static async PCA(
111
+ @grok.decorators.param({'type': 'dataframe', 'options': {'caption': 'Table'}}) table: DG.DataFrame,
112
+ // ...
113
+ ) { }
114
+ ```
115
+
116
+ Run `grok api` to generate `package.g.ts` and `package-api.ts` from these annotations.
117
+
118
+ ### External Dependencies
119
+
120
+ Key library integrations:
121
+ - **@datagrok-libraries/ml**: Distance metrics, dimensionality reduction, MCL clustering
122
+ - **@datagrok-libraries/statistics**: Statistical functions, MPO profile editor
123
+ - **@datagrok-libraries/math**: DBSCAN, mathematical utilities
124
+ - **@datagrok-libraries/test**: Test framework
125
+
126
+ External packages (must be in webpack externals):
127
+ - `mathjs`: Matrix operations and numerical computation
128
+ - `jstat`: Statistical distributions
129
+ - `umap-js`: UMAP implementation
130
+ - `@keckelt/tsne`: t-SNE implementation
131
+
132
+ ## Testing
133
+
134
+ Tests are organized by feature in `src/tests/`:
135
+ - `dim-reduction-tests.ts`: PCA, UMAP, t-SNE, SPE
136
+ - `linear-methods-tests.ts`: Linear regression, PLS
137
+ - `classifiers-tests.ts`: SVM, softmax, XGBoost
138
+ - `mis-vals-imputation-tests.ts`: KNN imputation
139
+ - `anova-tests.ts`: One-way ANOVA
140
+ - `pmpo-tests.ts`: Probabilistic scoring
141
+ - `pareto-tests.ts`: Pareto optimization
142
+
143
+ Test framework from `@datagrok-libraries/test`:
144
+ ```typescript
145
+ import {category, expect, test} from '@datagrok-libraries/test/src/test';
146
+
147
+ category('Feature Name', () => {
148
+ test('Test description', async () => {
149
+ // test code
150
+ }, {timeout: 30000});
151
+ });
152
+ ```
153
+
154
+ ## Scripts and Files
155
+
156
+ - **`scripts/`**: Python scripts for exporting TypeScript constants from Python implementations
157
+ - **`files/`**: Demo/test data for pMPO (drugs-props-train.csv, drugs-props-test.csv, scores)
158
+ - **`css/`**: Custom stylesheets (pmpo.css for pMPO UI)
159
+
160
+ ## Key Workflows
161
+
162
+ ### Adding a New ML Method
163
+
164
+ 1. Implement the algorithm in TypeScript (or C++ for WASM)
165
+ 2. Add function registration with `@grok.decorators.func()`
166
+ 3. Create tests in the appropriate test file
167
+ 4. Run `grok api` to generate wrappers
168
+ 5. Add to package menu structure in `package.json` meta.menu
169
+
170
+ ### Working with WASM
171
+
172
+ - WASM sources in `wasm/*.cpp` and `wasm/*.h`
173
+ - Generated files: `wasm/*.js`, `wasm/*.wasm`
174
+ - Initialization required in `PackageFunctions.init()`
175
+ - Web workers have separate WASM entry points (e.g., `EDAForWebWorker.js`)
176
+
177
+ ### pMPO Model Development
178
+
179
+ The probabilistic scoring module is complex and tightly integrated:
180
+ - Model training in `Pmpo` class (`prob-scoring.ts`)
181
+ - Statistical computations in `stat-tools.ts`
182
+ - UI utilities and serialization in `pmpo-utils.ts`
183
+ - Optimization via Nelder-Mead in `nelder-mead.ts`
184
+ - All constants and types in `pmpo-defs.ts`
185
+ - Integration with `@datagrok-libraries/statistics` for MPO profile export
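To make the Nelder-Mead step concrete, here is a simplified sketch of the simplex method with the usual coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5). It uses only inside contraction and a plain function-spread stopping rule. The names `Objective` and `nelderMead` are hypothetical, and this is not the content of `nelder-mead.ts`.

```typescript
type Objective = (x: number[]) => number;

// Simplified Nelder-Mead minimizer (sketch; defaults are assumptions).
function nelderMead(f: Objective, start: number[], maxIter = 500, tol = 1e-10, step = 0.5): number[] {
  const n = start.length;

  // Initial simplex: the start point plus one point offset along each axis.
  let simplex: number[][] = [start.slice()];
  for (let i = 0; i < n; ++i) {
    const p = start.slice();
    p[i] += step;
    simplex.push(p);
  }

  // Point a + t * (b - a).
  const combine = (a: number[], b: number[], t: number) => a.map((v, i) => v + t * (b[i] - v));

  for (let iter = 0; iter < maxIter; ++iter) {
    simplex.sort((a, b) => f(a) - f(b));
    const best = simplex[0];
    const worst = simplex[n];
    if (Math.abs(f(worst) - f(best)) < tol) break;

    // Centroid of all vertices except the worst one.
    const centroid = best.map((_, i) => {
      let s = 0;
      for (let j = 0; j < n; ++j) s += simplex[j][i];
      return s / n;
    });

    const reflected = combine(centroid, worst, -1); // c + (c - w)
    if (f(reflected) < f(best)) {
      const expanded = combine(centroid, worst, -2); // c + 2 * (c - w)
      simplex[n] = f(expanded) < f(reflected) ? expanded : reflected;
    } else if (f(reflected) < f(simplex[n - 1])) {
      simplex[n] = reflected;
    } else {
      const contracted = combine(centroid, worst, 0.5); // c + 0.5 * (w - c)
      if (f(contracted) < f(worst)) simplex[n] = contracted;
      else simplex = simplex.map((p) => combine(best, p, 0.5)); // shrink toward best
    }
  }

  simplex.sort((a, b) => f(a) - f(b));
  return simplex[0];
}
```

For auto-tuning, the objective would wrap a model-quality measure evaluated at a given parameter vector, negated if the measure is to be maximized.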
package/css/pmpo.css CHANGED
@@ -23,4 +23,13 @@
23
23
  transform: translate(-50%, -50%);
24
24
  pointer-events: none;
25
25
  white-space: nowrap;
26
+ }
27
+
28
+ .eda-pmpo-centered-text {
29
+ display: flex;
30
+ justify-content: center;
31
+ align-items: center;
32
+ height: 100%;
33
+ width: 100%;
34
+ text-align: center;
26
35
  }