bun-scikit 0.1.3 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

package/README.md +73 -137
package/package.json +2 -2
package/scripts/check-benchmark-health.ts +62 -1
package/scripts/sync-benchmark-readme.ts +56 -0
package/src/dummy/DummyClassifier.ts +190 -0
package/src/dummy/DummyRegressor.ts +108 -0
package/src/feature_selection/VarianceThreshold.ts +88 -0
package/src/index.ts +23 -0
package/src/metrics/classification.ts +30 -0
package/src/metrics/regression.ts +40 -0
package/src/model_selection/RandomizedSearchCV.ts +269 -0
package/src/native/node-addon/bun_scikit_addon.cpp +149 -0
package/src/native/zigKernels.ts +33 -4
package/src/preprocessing/Binarizer.ts +46 -0
package/src/preprocessing/LabelEncoder.ts +62 -0
package/src/preprocessing/MaxAbsScaler.ts +77 -0
package/src/preprocessing/Normalizer.ts +66 -0
package/src/tree/DecisionTreeClassifier.ts +146 -3
package/zig/kernels.zig +63 -40

package/README.md CHANGED

@@ -3,185 +3,121 @@
 [![CI](https://github.com/Seyamalam/bun-scikit/actions/workflows/ci.yml/badge.svg)](https://github.com/Seyamalam/bun-scikit/actions/workflows/ci.yml)
 [![Benchmark Snapshot](https://github.com/Seyamalam/bun-scikit/actions/workflows/benchmark-snapshot.yml/badge.svg)](https://github.com/Seyamalam/bun-scikit/actions/workflows/benchmark-snapshot.yml)
-`bun-scikit` is a scikit-learn-inspired machine learning library for Bun + TypeScript.
-## Features
-- `StandardScaler`
-- `LinearRegression` (native Zig `normal` solver)
-- `LogisticRegression` (binary classification, native Zig)
-- `KNeighborsClassifier`
-- `DecisionTreeClassifier`
-- `RandomForestClassifier`
-- `trainTestSplit`
-- Regression metrics: `meanSquaredError`, `meanAbsoluteError`, `r2Score`
-- Classification metrics: `accuracyScore`, `precisionScore`, `recallScore`, `f1Score`
-- Dataset-driven benchmark and CI comparison against Python `scikit-learn`
-`test_data/heart.csv` is used for integration testing and benchmark comparison.
-## Native Zig Backend
-`LinearRegression` (`solver: "normal"`) and `LogisticRegression` require native Zig kernels.
-```bash
-bun run native:build
-```
-Optional Node-API bridge (experimental):
-```bash
-bun run native:build:node-addon
-```
-```ts
-const linear = new LinearRegression({ solver: "normal" });
-const logistic = new LogisticRegression();
-linear.fit(XTrain, yTrain);
-logistic.fit(XTrain, yTrain);
-console.log(linear.fitBackend_, linear.fitBackendLibrary_);
-console.log(logistic.fitBackend_, logistic.fitBackendLibrary_);
-```
-If native kernels are missing, `fit()` throws with guidance to run `bun run native:build`.
-Bridge selection:
-- `BUN_SCIKIT_NATIVE_BRIDGE=node-api|ffi` (`node-api` is attempted first when available)
-- `BUN_SCIKIT_NODE_ADDON=/absolute/path/to/bun_scikit_node_addon.node`
-- `BUN_SCIKIT_ZIG_LIB=/absolute/path/to/bun_scikit_kernels.<ext>`
-Native ABI contract: `docs/native-abi.md`
+Scikit-learn-inspired machine learning for Bun + TypeScript, with native Zig acceleration for core training paths.
 ## Install
 ```bash
-bun install bun-scikit
+bun add bun-scikit
 ```
-Postinstall behavior:
-- Prebuilt native binaries for `linux-x64` and `windows-x64` are bundled in the npm package.
-- No `bun pm trust` step is required for normal install/use.
-- macOS prebuilt binaries are currently not published.
-## Usage
+## Quick Start
 ```ts
 import {
   LinearRegression,
+  LogisticRegression,
   StandardScaler,
-  meanSquaredError,
   trainTestSplit,
+  meanSquaredError,
+  accuracyScore,
 } from "bun-scikit";
-const X = [
-  [1, 2],
-  [2, 3],
-  [3, 4],
-  [4, 5],
-];
-const y = [5, 7, 9, 11];
+const X = [[1], [2], [3], [4], [5], [6]];
+const yReg = [3, 5, 7, 9, 11, 13];
+const yCls = [0, 0, 0, 1, 1, 1];
 const scaler = new StandardScaler();
-const XScaled = scaler.fitTransform(X);
-const { XTrain, XTest, yTrain, yTest } = trainTestSplit(XScaled, y, {
-  testSize: 0.25,
+const Xs = scaler.fitTransform(X);
+const { XTrain, XTest, yTrain, yTest } = trainTestSplit(Xs, yReg, {
+  testSize: 0.33,
   randomState: 42,
 });
-const model = new LinearRegression({ solver: "normal" });
-model.fit(XTrain, yTrain);
-const predictions = model.predict(XTest);
+const reg = new LinearRegression({ solver: "normal" });
+reg.fit(XTrain, yTrain);
+console.log("MSE:", meanSquaredError(yTest, reg.predict(XTest)));
-console.log("MSE:", meanSquaredError(yTest, predictions));
+const clf = new LogisticRegression({
+  solver: "gd",
+  learningRate: 0.8,
+  maxIter: 100,
+  tolerance: 1e-5,
+});
+clf.fit(Xs, yCls);
+console.log("Accuracy:", accuracyScore(yCls, clf.predict(Xs)));
 ```
-## Benchmarks
+## Included APIs
-The table below is generated from `bench/results/heart-ci-latest.json`.
-That snapshot is produced by CI in `.github/workflows/benchmark-snapshot.yml`.
+- Models: `LinearRegression`, `LogisticRegression`, `KNeighborsClassifier`, `DecisionTreeClassifier`, `RandomForestClassifier`, plus additional parity models (`LinearSVC`, `GaussianNB`, `SGDClassifier`, `SGDRegressor`, regressors for tree/forest).
+- Baselines: `DummyClassifier`, `DummyRegressor`.
+- Preprocessing: `StandardScaler`, `MinMaxScaler`, `RobustScaler`, `MaxAbsScaler`, `Normalizer`, `Binarizer`, `LabelEncoder`, `PolynomialFeatures`, `SimpleImputer`, `OneHotEncoder`.
+- Composition: `Pipeline`, `ColumnTransformer`, `FeatureUnion`.
+- Feature selection: `VarianceThreshold`.
+- Model selection: `trainTestSplit`, `KFold`, stratified/repeated splitters, `crossValScore`, `GridSearchCV`, `RandomizedSearchCV`.
+- Metrics: regression and classification metrics, including `logLoss`, `rocAucScore`, `confusionMatrix`, `classificationReport`, `balancedAccuracyScore`, `matthewsCorrcoef`, `brierScoreLoss`, `meanAbsolutePercentageError`, and `explainedVarianceScore`.
-<!-- BENCHMARK_TABLE_START -->
-Benchmark snapshot source: `bench/results/heart-ci-latest.json` (generated in CI workflow `Benchmark Snapshot`).
-Dataset: `test_data/heart.csv` (1025 samples, 13 features, test fraction 0.2).
+## Scikit Parity Matrix
-### Regression
+| Area | Status |
+| --- | --- |
+| Linear models | `LinearRegression`, `LogisticRegression`, `SGDClassifier`, `SGDRegressor`, `LinearSVC` |
+| Tree/ensemble | `DecisionTreeClassifier`, `DecisionTreeRegressor`, `RandomForestClassifier`, `RandomForestRegressor` |
+| Neighbors / Bayes | `KNeighborsClassifier`, `GaussianNB` |
+| Baselines | `DummyClassifier`, `DummyRegressor` |
+| Preprocessing | `StandardScaler`, `MinMaxScaler`, `RobustScaler`, `MaxAbsScaler`, `Normalizer`, `Binarizer`, `LabelEncoder`, `PolynomialFeatures`, `SimpleImputer`, `OneHotEncoder` |
+| Feature selection | `VarianceThreshold` |
+| Model selection | `trainTestSplit`, `KFold`, `StratifiedKFold`, `StratifiedShuffleSplit`, `RepeatedKFold`, `RepeatedStratifiedKFold`, `crossValScore`, `GridSearchCV`, `RandomizedSearchCV` |
+| Metrics (regression) | `meanSquaredError`, `meanAbsoluteError`, `r2Score`, `meanAbsolutePercentageError`, `explainedVarianceScore` |
+| Metrics (classification) | `accuracyScore`, `precisionScore`, `recallScore`, `f1Score`, `balancedAccuracyScore`, `matthewsCorrcoef`, `logLoss`, `brierScoreLoss`, `rocAucScore`, `confusionMatrix`, `classificationReport` |
-| Implementation | Model | Fit median (ms) | Predict median (ms) | MSE | R2 |
-|---|---|---:|---:|---:|---:|
-| bun-scikit | StandardScaler + LinearRegression(normal) | 0.2103 | 0.0216 | 0.117545 | 0.529539 |
-| python-scikit-learn | StandardScaler + LinearRegression | 0.3201 | 0.0365 | 0.117545 | 0.529539 |
+Near-term parity gaps vs scikit-learn include clustering, decomposition, calibration, advanced feature selection, and probability calibration/meta-estimators.
-Bun fit speedup vs scikit-learn: 1.522x
-Bun predict speedup vs scikit-learn: 1.684x
-MSE delta (bun - sklearn): 6.362e-14
-R2 delta (bun - sklearn): -2.539e-13
+## Native Runtime
-### Classification
+- Prebuilt binaries are bundled in the npm package for:
+  - `linux-x64`
+  - `windows-x64`
+- No `bun pm trust` step is required for standard install/use.
+- macOS prebuilt binaries are not published yet.
-| Implementation | Model | Fit median (ms) | Predict median (ms) | Accuracy | F1 |
-|---|---|---:|---:|---:|---:|
-| bun-scikit | StandardScaler + LogisticRegression(gd,zig) | 0.4868 | 0.0282 | 0.863415 | 0.876106 |
-| python-scikit-learn | StandardScaler + LogisticRegression(lbfgs) | 1.1246 | 0.0724 | 0.863415 | 0.875000 |
+Optional env vars:
-Bun fit speedup vs scikit-learn: 2.310x
-Bun predict speedup vs scikit-learn: 2.574x
-Accuracy delta (bun - sklearn): 0.000e+0
-F1 delta (bun - sklearn): 1.106e-3
+- `BUN_SCIKIT_NATIVE_BRIDGE=node-api|ffi`
+- `BUN_SCIKIT_NODE_ADDON=/absolute/path/to/bun_scikit_node_addon.node`
+- `BUN_SCIKIT_ZIG_LIB=/absolute/path/to/bun_scikit_kernels.<ext>`
+- `BUN_SCIKIT_TREE_BACKEND=zig` (opt-in native tree/forest training path; default keeps JS-fast tree splitter)
-### Tree Classification
+## Performance Snapshot
-| Model | Implementation | Fit median (ms) | Predict median (ms) | Accuracy | F1 |
-|---|---|---:|---:|---:|---:|
-| DecisionTreeClassifier(maxDepth=8) | bun-scikit | 0.8062 | 0.0190 | 0.946341 | 0.948837 |
-| DecisionTreeClassifier | python-scikit-learn | 1.4781 | 0.0999 | 0.931707 | 0.933962 |
-| RandomForestClassifier(nEstimators=80,maxDepth=8) | bun-scikit | 27.6225 | 1.8535 | 0.990244 | 0.990566 |
-| RandomForestClassifier | python-scikit-learn | 172.9585 | 6.4850 | 0.995122 | 0.995261 |
+Latest CI snapshot on `test_data/heart.csv` vs Python scikit-learn:
-DecisionTree fit speedup vs scikit-learn: 1.833x
-DecisionTree predict speedup vs scikit-learn: 5.244x
-DecisionTree accuracy delta (bun - sklearn): 1.463e-2
-DecisionTree f1 delta (bun - sklearn): 1.487e-2
+- Regression: fit `1.67x`, predict `1.84x`
+- Classification: fit `1.78x`, predict `2.66x`
+- DecisionTree (`js-fast`): fit `1.54x`, predict `4.06x`
+- RandomForest (`js-fast`): fit `2.59x`, predict `1.29x`
+- Tree backend matrix (`js-fast` vs `zig-tree` vs `sklearn`) is included in `bench/results/heart-ci-latest.md`
-RandomForest fit speedup vs scikit-learn: 6.262x
-RandomForest predict speedup vs scikit-learn: 3.499x
-RandomForest accuracy delta (bun - sklearn): -4.878e-3
-RandomForest f1 delta (bun - sklearn): -4.695e-3
+Raw benchmark artifacts:
-Snapshot generated at: 2026-02-23T14:55:51.251Z
-<!-- BENCHMARK_TABLE_END -->
+- `bench/results/heart-ci-latest.json`
+- `bench/results/heart-ci-latest.md`
 ## Documentation
-- Docs index: `docs/README.md`
 - Getting started: `docs/getting-started.md`
 - API reference: `docs/api.md`
-- Benchmarking flow: `docs/benchmarking.md`
+- Benchmarking: `docs/benchmarking.md`
 - Zig acceleration: `docs/zig-acceleration.md`
+- Native ABI: `docs/native-abi.md`
+- Release checklist: `docs/release-checklist.md`
-## Maintainer Files
+## Contributing / Project Files
 - Changelog: `CHANGELOG.md`
-- Contributing guide: `CONTRIBUTING.md`
+- Contributing: `CONTRIBUTING.md`
+- Security: `SECURITY.md`
 - Code of Conduct: `CODE_OF_CONDUCT.md`
-- Security policy: `SECURITY.md`
-- Support policy: `SUPPORT.md`
-- License: `LICENSE`
-## Local Commands
-```bash
-bun run test
-bun run typecheck
-bun run docs:api:generate
-bun run docs:coverage:check
-bun run bench
-bun run bench:heart:classification
-bun run bench:heart:tree
-bun run bench:ci
-bun run bench:ci:native
-bun run bench:snapshot
-bun run native:build
-```
+- Support: `SUPPORT.md`

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "bun-scikit",
-  "version": "0.1.3",
+  "version": "0.1.4",
   "description": "A scikit-learn-inspired machine learning library for Bun/TypeScript.",
   "license": "MIT",
   "module": "index.ts",
@@ -52,7 +52,7 @@
     "bench:synthetic": "bun run bench/linear-regression.bench.ts",
     "bench:ci": "bun run bench/run-ci-benchmarks.ts --output bench/results/heart-ci-current.json",
     "bench:ci:native": "bun run native:build && bun run bench:ci",
-    "bench:snapshot": "bun run bench/run-ci-benchmarks.ts --output bench/results/heart-ci-latest.json && bun run bench:sync-readme && bun run bench:history:update",
+    "bench:snapshot": "bun run bench/run-ci-benchmarks.ts --output bench/results/heart-ci-latest.json && bun run bench:history:update",
     "bench:sync-readme": "bun run scripts/sync-benchmark-readme.ts",
     "bench:readme:check": "bun run scripts/sync-benchmark-readme.ts --check",
     "bench:health": "bun run scripts/check-benchmark-health.ts",

package/scripts/check-benchmark-health.ts CHANGED

@@ -26,6 +26,17 @@ interface TreeModelComparison {
   };
 }
+interface TreeBackendModeComparison {
+  comparison: {
+    zigFitSpeedupVsJs: number;
+    zigPredictSpeedupVsJs: number;
+    jsFitSpeedupVsSklearn: number;
+    jsPredictSpeedupVsSklearn: number;
+    zigFitSpeedupVsSklearn: number;
+    zigPredictSpeedupVsSklearn: number;
+  };
+}
 interface BenchmarkSnapshot {
   suites: {
     regression: {
@@ -62,6 +73,10 @@ interface BenchmarkSnapshot {
         },
       ];
     };
+    treeBackendModes: {
+      enabled: boolean;
+      models: [TreeBackendModeComparison, TreeBackendModeComparison] | [];
+    };
   };
 }
@@ -106,7 +121,20 @@ const minDecisionTreePredictSpeedup = speedupThreshold(
 const minRandomForestFitSpeedup = speedupThreshold("BENCH_MIN_RANDOM_FOREST_FIT_SPEEDUP", 2.0);
 const minRandomForestPredictSpeedup = speedupThreshold(
   "BENCH_MIN_RANDOM_FOREST_PREDICT_SPEEDUP",
-  2.0,
+  1.2,
+);
+const maxZigTreeFitSlowdownVsJs = speedupThreshold("BENCH_MAX_ZIG_TREE_FIT_SLOWDOWN_VS_JS", 20);
+const maxZigTreePredictSlowdownVsJs = speedupThreshold(
+  "BENCH_MAX_ZIG_TREE_PREDICT_SLOWDOWN_VS_JS",
+  20,
+);
+const maxZigForestFitSlowdownVsJs = speedupThreshold(
+  "BENCH_MAX_ZIG_FOREST_FIT_SLOWDOWN_VS_JS",
+  20,
+);
+const maxZigForestPredictSlowdownVsJs = speedupThreshold(
+  "BENCH_MAX_ZIG_FOREST_PREDICT_SLOWDOWN_VS_JS",
+  20,
 );
 for (const result of [
@@ -237,4 +265,37 @@ if (randomForest.comparison.predictSpeedupVsSklearn < minRandomForestPredictSpee
   );
 }
+if (snapshot.suites.treeBackendModes.enabled) {
+  const [decisionTreeModes, randomForestModes] = snapshot.suites.treeBackendModes.models;
+  if (!decisionTreeModes || !randomForestModes) {
+    throw new Error("Tree backend mode suite is enabled but missing model comparisons.");
+  }
+  const decisionTreeFitSlowdown = 1 / decisionTreeModes.comparison.zigFitSpeedupVsJs;
+  const decisionTreePredictSlowdown = 1 / decisionTreeModes.comparison.zigPredictSpeedupVsJs;
+  const randomForestFitSlowdown = 1 / randomForestModes.comparison.zigFitSpeedupVsJs;
+  const randomForestPredictSlowdown = 1 / randomForestModes.comparison.zigPredictSpeedupVsJs;
+  if (decisionTreeFitSlowdown > maxZigTreeFitSlowdownVsJs) {
+    throw new Error(
+      `DecisionTree zig fit slowdown too large vs js-fast: ${decisionTreeFitSlowdown} > ${maxZigTreeFitSlowdownVsJs}.`,
+    );
+  }
+  if (decisionTreePredictSlowdown > maxZigTreePredictSlowdownVsJs) {
+    throw new Error(
+      `DecisionTree zig predict slowdown too large vs js-fast: ${decisionTreePredictSlowdown} > ${maxZigTreePredictSlowdownVsJs}.`,
+    );
+  }
+  if (randomForestFitSlowdown > maxZigForestFitSlowdownVsJs) {
+    throw new Error(
+      `RandomForest zig fit slowdown too large vs js-fast: ${randomForestFitSlowdown} > ${maxZigForestFitSlowdownVsJs}.`,
+    );
+  }
+  if (randomForestPredictSlowdown > maxZigForestPredictSlowdownVsJs) {
+    throw new Error(
+      `RandomForest zig predict slowdown too large vs js-fast: ${randomForestPredictSlowdown} > ${maxZigForestPredictSlowdownVsJs}.`,
+    );
+  }
+}
 console.log("Benchmark comparison health checks passed.");
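
The new checks in this hunk invert each zig-vs-js speedup ratio into a slowdown factor and fail when it exceeds a ceiling (20 by default). A minimal sketch of that pattern follows. The helper name `assertSlowdownWithin` is hypothetical and does not appear in the package, where the four checks are written inline.

```typescript
// Hypothetical helper (not part of bun-scikit): converts a speedup ratio
// into a slowdown factor and throws when the factor exceeds the ceiling.
function assertSlowdownWithin(
  label: string,
  speedupVsJs: number,
  maxSlowdown: number,
): number {
  // A speedup of 0.5x is a 2x slowdown; a speedup of 0 becomes Infinity and fails.
  const slowdown = 1 / speedupVsJs;
  if (slowdown > maxSlowdown) {
    throw new Error(
      `${label} slowdown too large vs js-fast: ${slowdown} > ${maxSlowdown}.`,
    );
  }
  return slowdown;
}

// Usage: a zig path that runs at half the js-fast speed stays under a ceiling of 20.
const treeFitSlowdown = assertSlowdownWithin("DecisionTree zig fit", 0.5, 20);
console.log(treeFitSlowdown);
```

Because the comparison is `slowdown > maxSlowdown`, a `NaN` ratio would pass silently; a stricter variant could reject non-finite inputs first.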

package/scripts/sync-benchmark-readme.ts CHANGED

@@ -62,6 +62,19 @@ interface BenchmarkSnapshot {
     treeClassification: {
       models: [TreeModelComparison, TreeModelComparison];
     };
+    treeBackendModes?: {
+      enabled: boolean;
+      models: Array<{
+        key: TreeModelKey;
+        jsFast: ClassificationBenchmarkResult;
+        zigTree: ClassificationBenchmarkResult;
+        sklearn: ClassificationBenchmarkResult;
+        comparison: {
+          zigFitSpeedupVsJs: number;
+          zigPredictSpeedupVsJs: number;
+        };
+      }>;
+    };
   };
 }
@@ -89,6 +102,11 @@ function renderBenchmarkSection(snapshot: BenchmarkSnapshot): string {
   const [bunReg, sklearnReg] = regression.results;
   const [bunCls, sklearnCls] = classification.results;
   const [decisionTree, randomForest] = treeClassification.models;
+  const treeBackendModes = snapshot.suites.treeBackendModes;
+  const hasTreeBackendModes =
+    treeBackendModes?.enabled === true && Array.isArray(treeBackendModes.models) && treeBackendModes.models.length === 2;
+  const decisionTreeModes = hasTreeBackendModes ? treeBackendModes.models[0] : null;
+  const randomForestModes = hasTreeBackendModes ? treeBackendModes.models[1] : null;
   return [
     START_MARKER,
@@ -138,6 +156,44 @@ function renderBenchmarkSection(snapshot: BenchmarkSnapshot): string {
     `RandomForest accuracy delta (bun - sklearn): ${randomForest.comparison.accuracyDeltaVsSklearn.toExponential(3)}`,
     `RandomForest f1 delta (bun - sklearn): ${randomForest.comparison.f1DeltaVsSklearn.toExponential(3)}`,
     "",
+    "### Tree Backend Modes (Bun vs Bun vs sklearn)",
+    "",
+    hasTreeBackendModes
+      ? "| Model | Backend | Fit median (ms) | Predict median (ms) | Accuracy | F1 |"
+      : "Tree backend mode matrix disabled (`BENCH_TREE_BACKEND_MATRIX=0`).",
+    hasTreeBackendModes ? "|---|---|---:|---:|---:|---:|" : "",
+    hasTreeBackendModes
+      ? `| DecisionTreeClassifier(maxDepth=8) | js-fast | ${decisionTreeModes!.jsFast.fitMsMedian.toFixed(4)} | ${decisionTreeModes!.jsFast.predictMsMedian.toFixed(4)} | ${decisionTreeModes!.jsFast.accuracy.toFixed(6)} | ${decisionTreeModes!.jsFast.f1.toFixed(6)} |`
+      : "",
+    hasTreeBackendModes
+      ? `| DecisionTreeClassifier(maxDepth=8) | zig-tree | ${decisionTreeModes!.zigTree.fitMsMedian.toFixed(4)} | ${decisionTreeModes!.zigTree.predictMsMedian.toFixed(4)} | ${decisionTreeModes!.zigTree.accuracy.toFixed(6)} | ${decisionTreeModes!.zigTree.f1.toFixed(6)} |`
+      : "",
+    hasTreeBackendModes
+      ? `| DecisionTreeClassifier | python-scikit-learn | ${decisionTreeModes!.sklearn.fitMsMedian.toFixed(4)} | ${decisionTreeModes!.sklearn.predictMsMedian.toFixed(4)} | ${decisionTreeModes!.sklearn.accuracy.toFixed(6)} | ${decisionTreeModes!.sklearn.f1.toFixed(6)} |`
+      : "",
+    hasTreeBackendModes
+      ? `| RandomForestClassifier(nEstimators=80,maxDepth=8) | js-fast | ${randomForestModes!.jsFast.fitMsMedian.toFixed(4)} | ${randomForestModes!.jsFast.predictMsMedian.toFixed(4)} | ${randomForestModes!.jsFast.accuracy.toFixed(6)} | ${randomForestModes!.jsFast.f1.toFixed(6)} |`
+      : "",
+    hasTreeBackendModes
+      ? `| RandomForestClassifier(nEstimators=80,maxDepth=8) | zig-tree | ${randomForestModes!.zigTree.fitMsMedian.toFixed(4)} | ${randomForestModes!.zigTree.predictMsMedian.toFixed(4)} | ${randomForestModes!.zigTree.accuracy.toFixed(6)} | ${randomForestModes!.zigTree.f1.toFixed(6)} |`
+      : "",
+    hasTreeBackendModes
+      ? `| RandomForestClassifier | python-scikit-learn | ${randomForestModes!.sklearn.fitMsMedian.toFixed(4)} | ${randomForestModes!.sklearn.predictMsMedian.toFixed(4)} | ${randomForestModes!.sklearn.accuracy.toFixed(6)} | ${randomForestModes!.sklearn.f1.toFixed(6)} |`
+      : "",
+    "",
+    hasTreeBackendModes
+      ? `DecisionTree zig/js fit speedup: ${decisionTreeModes!.comparison.zigFitSpeedupVsJs.toFixed(3)}x`
+      : "",
+    hasTreeBackendModes
+      ? `DecisionTree zig/js predict speedup: ${decisionTreeModes!.comparison.zigPredictSpeedupVsJs.toFixed(3)}x`
+      : "",
+    hasTreeBackendModes
+      ? `RandomForest zig/js fit speedup: ${randomForestModes!.comparison.zigFitSpeedupVsJs.toFixed(3)}x`
+      : "",
+    hasTreeBackendModes
+      ? `RandomForest zig/js predict speedup: ${randomForestModes!.comparison.zigPredictSpeedupVsJs.toFixed(3)}x`
+      : "",
+    "",
     `Snapshot generated at: ${snapshot.generatedAt}`,
     END_MARKER,
   ].join("\n");
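
Each backend row added in this hunk repeats the same formatting: four decimals for timings, six for accuracy and F1. The sketch below factors that into one function to make the row shape easier to see. `BackendResult` and `renderBackendRow` are illustrative names, not identifiers from the package; the field names mirror those used in the diff.

```typescript
// Illustrative shape; the field names follow those referenced in the diff.
interface BackendResult {
  fitMsMedian: number;
  predictMsMedian: number;
  accuracy: number;
  f1: number;
}

// Hypothetical helper: renders one markdown table row for a model/backend pair.
function renderBackendRow(
  model: string,
  backend: string,
  result: BackendResult,
): string {
  return [
    `| ${model}`,
    backend,
    result.fitMsMedian.toFixed(4),
    result.predictMsMedian.toFixed(4),
    result.accuracy.toFixed(6),
    `${result.f1.toFixed(6)} |`,
  ].join(" | ");
}

// Usage with made-up numbers (not benchmark results).
const exampleRow = renderBackendRow("DecisionTreeClassifier(maxDepth=8)", "js-fast", {
  fitMsMedian: 1.5,
  predictMsMedian: 0.25,
  accuracy: 0.5,
  f1: 0.75,
});
console.log(exampleRow);
```

A helper like this would also let the disabled case skip the rows entirely instead of emitting empty strings, though that is a design choice the published script does not make.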

package/src/dummy/DummyClassifier.ts ADDED

@@ -0,0 +1,190 @@
+import type { Matrix, Vector } from "../types";
+import { accuracyScore } from "../metrics/classification";
+import {
+  assertConsistentRowSize,
+  assertFiniteMatrix,
+  assertFiniteVector,
+  assertNonEmptyMatrix,
+  assertVectorLength,
+} from "../utils/validation";
+export type DummyClassifierStrategy =
+  | "most_frequent"
+  | "prior"
+  | "stratified"
+  | "uniform"
+  | "constant";
+export interface DummyClassifierOptions {
+  strategy?: DummyClassifierStrategy;
+  constant?: number;
+  randomState?: number;
+}
+class Mulberry32 {
+  private state: number;
+  constructor(seed: number) {
+    this.state = seed >>> 0;
+  }
+  next(): number {
+    this.state = (this.state + 0x6d2b79f5) >>> 0;
+    let t = this.state ^ (this.state >>> 15);
+    t = Math.imul(t, this.state | 1);
+    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
+    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
+  }
+}
+export class DummyClassifier {
+  classes_: number[] | null = null;
+  classPrior_: number[] | null = null;
+  constant_: number | null = null;
+  private readonly strategy: DummyClassifierStrategy;
+  private readonly configuredConstant?: number;
+  private readonly randomState: number;
+  private majorityClass: number | null = null;
+  private nFeaturesIn_: number | null = null;
+  constructor(options: DummyClassifierOptions = {}) {
+    this.strategy = options.strategy ?? "prior";
+    this.configuredConstant = options.constant;
+    this.randomState = options.randomState ?? 42;
+  }
+  fit(X: Matrix, y: Vector): this {
+    assertNonEmptyMatrix(X);
+    assertConsistentRowSize(X);
+    assertFiniteMatrix(X);
+    assertVectorLength(y, X.length);
+    assertFiniteVector(y);
+    this.nFeaturesIn_ = X[0].length;
+    const counts = new Map<number, number>();
+    for (let i = 0; i < y.length; i += 1) {
+      counts.set(y[i], (counts.get(y[i]) ?? 0) + 1);
+    }
+    const classes = Array.from(counts.keys()).sort((a, b) => a - b);
+    const priors = new Array<number>(classes.length);
+    for (let i = 0; i < classes.length; i += 1) {
+      priors[i] = (counts.get(classes[i]) ?? 0) / y.length;
+    }
+    let majorityClass = classes[0];
+    let majorityCount = counts.get(majorityClass) ?? 0;
+    for (let i = 1; i < classes.length; i += 1) {
+      const cls = classes[i];
+      const clsCount = counts.get(cls) ?? 0;
+      if (clsCount > majorityCount) {
+        majorityClass = cls;
+        majorityCount = clsCount;
+      }
+    }
+    if (this.strategy === "constant") {
+      if (!Number.isFinite(this.configuredConstant)) {
+        throw new Error("constant strategy requires a finite constant value.");
+      }
+      this.constant_ = this.configuredConstant!;
+    } else {
+      this.constant_ = majorityClass;
+    }
+    this.classes_ = classes;
+    this.classPrior_ = priors;
+    this.majorityClass = majorityClass;
+    return this;
+  }
+  private ensureFitted(): void {
+    if (!this.classes_ || !this.classPrior_ || this.nFeaturesIn_ === null || this.majorityClass === null) {
+      throw new Error("DummyClassifier has not been fitted.");
+    }
+  }
+  private sampleByPrior(rng: Mulberry32): number {
+    let r = rng.next();
+    for (let i = 0; i < this.classPrior_!.length; i += 1) {
+      r -= this.classPrior_![i];
+      if (r <= 0) {
+        return this.classes_![i];
+      }
+    }
+    return this.classes_![this.classes_!.length - 1];
+  }
+  predict(X: Matrix): Vector {
+    this.ensureFitted();
+    if (!Array.isArray(X) || X.length === 0) {
+      throw new Error("X must be a non-empty 2D array.");
+    }
+    if (!Array.isArray(X[0]) || X[0].length !== this.nFeaturesIn_) {
+      throw new Error(`Feature size mismatch. Expected ${this.nFeaturesIn_}, got ${X[0]?.length ?? 0}.`);
+    }
+    switch (this.strategy) {
+      case "most_frequent":
+      case "prior":
+        return new Array<number>(X.length).fill(this.majorityClass!);
+      case "constant":
+        return new Array<number>(X.length).fill(this.constant_!);
+      case "uniform": {
+        const rng = new Mulberry32(this.randomState);
+        const out = new Array<number>(X.length);
+        for (let i = 0; i < X.length; i += 1) {
+          const idx = Math.floor(rng.next() * this.classes_!.length);
+          out[i] = this.classes_![idx];
+        }
+        return out;
+      }
+      case "stratified": {
+        const rng = new Mulberry32(this.randomState);
+        const out = new Array<number>(X.length);
+        for (let i = 0; i < X.length; i += 1) {
+          out[i] = this.sampleByPrior(rng);
+        }
+        return out;
+      }
+      default: {
+        const exhaustive: never = this.strategy;
+        throw new Error(`Unsupported strategy: ${exhaustive}`);
+      }
+    }
+  }
+  predictProba(X: Matrix): Matrix {
+    this.ensureFitted();
+    if (!Array.isArray(X) || X.length === 0) {
+      throw new Error("X must be a non-empty 2D array.");
+    }
+    if (!Array.isArray(X[0]) || X[0].length !== this.nFeaturesIn_) {
+      throw new Error(`Feature size mismatch. Expected ${this.nFeaturesIn_}, got ${X[0]?.length ?? 0}.`);
+    }
+    if (this.strategy === "uniform") {
+      const value = 1 / this.classes_!.length;
+      return X.map(() => new Array(this.classes_!.length).fill(value));
+    }
+    if (this.strategy === "most_frequent" || this.strategy === "constant") {
+      const oneHot = new Array<number>(this.classes_!.length).fill(0);
+      const label = this.strategy === "constant" ? this.constant_! : this.majorityClass!;
+      const classIndex = this.classes_!.indexOf(label);
+      if (classIndex >= 0) {
+        oneHot[classIndex] = 1;
+      }
+      return X.map(() => [...oneHot]);
+    }
+    // prior / stratified share prior probabilities.
+    const prior = [...this.classPrior_!];
+    return X.map(() => [...prior]);
+  }
+  score(X: Matrix, y: Vector): number {
+    return accuracyScore(y, this.predict(X));
+  }
+}
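
The `stratified` strategy above draws each label by subtracting class priors from a seeded uniform draw until the draw is used up. The standalone sketch below reproduces that idea outside the class so it can be run on its own. The generator arithmetic is copied from the `Mulberry32` class in the diff; the function names `mulberry32`, `sampleByPrior`, and `drawLabels` are illustrative and are not exports of the package.

```typescript
// Seeded generator using the same arithmetic as the Mulberry32 class in the diff.
// Returns a function that yields values in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state ^ (state >>> 15);
    t = Math.imul(t, state | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Walks the priors, subtracting each from a uniform draw; the class whose
// prior brings the remainder to zero or below is selected.
function sampleByPrior(
  classes: number[],
  priors: number[],
  next: () => number,
): number {
  let r = next();
  for (let i = 0; i < priors.length; i += 1) {
    r -= priors[i];
    if (r <= 0) {
      return classes[i];
    }
  }
  // Fallback for floating-point drift when the priors sum to slightly under 1.
  return classes[classes.length - 1];
}

// Draws n labels from a fresh generator, so equal seeds give equal sequences.
function drawLabels(
  classes: number[],
  priors: number[],
  seed: number,
  n: number,
): number[] {
  const next = mulberry32(seed);
  return Array.from({ length: n }, () => sampleByPrior(classes, priors, next));
}

console.log(drawLabels([0, 1, 2], [0.5, 0.3, 0.2], 42, 10));
```

Because `predict` in the diff builds a new generator from `randomState` on every call, repeated calls on the same input return the same labels; the sketch mirrors that by seeding inside `drawLabels`.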