sphere-cli 0.2.8 → 0.2.10

package/README.md CHANGED
@@ -8,51 +8,40 @@ Command-line interface for **SPHERE** — synthetic data generation, evaluation,
 
  ## Install
 
- **npm (recommended):**
-
  ```sh
  npm install -g sphere-cli
  ```
 
- No Python, no curl, no PATH editing. Requires Node.js 16.
-
- **curl (no Node.js required):**
+ Requires **Node.js ≥ 18**. No Python and no manual PATH editing: the install downloads a self-contained, signed binary and wires everything up for you.
 
- ```sh
- curl -fsSL https://github.com/statzihuai/sphere-cli/releases/latest/download/install.sh | sh
- ```
+ **Install once, run anywhere (HPC).** The ~500 MB engine is **not** placed inside `node_modules`, so it never blows up a quota-limited home directory. On a cluster the installer auto-detects roomy shared storage (`$OAK`, `$SCRATCH`, `$WORK`, `$PROJECT`, `$GROUP_HOME`, …) and installs there; because that storage and your `~/.bashrc` are shared across every login and compute node, you install once and `sphere` works in every future session on every node — no reinstall. If the global `bin` isn't already on your `PATH`, the installer appends it to your shell rc automatically. To pin a location, set `SPHERE_HOME=/path/with/space` before installing.
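The storage selection described in the added paragraph can be sketched in shell. This is a hypothetical illustration, not the installer's actual code: the function name `pick_sphere_home`, the search order (taken from the order the variables are listed in), and the "exists and is writable" test are all assumptions. The `~/.local/share` fallback comes from the `SPHERE_HOME` row of the environment variable table later in this diff.

```sh
# Hypothetical sketch of the documented behaviour; NOT the real installer logic.
# Prints the directory the engine would be installed under.
pick_sphere_home() {
    # An explicit SPHERE_HOME pins the location and always wins.
    if [ -n "${SPHERE_HOME:-}" ]; then
        printf '%s\n' "$SPHERE_HOME"
        return 0
    fi
    # Assumed search order: the order in which the README lists the variables.
    for dir in "${OAK:-}" "${SCRATCH:-}" "${WORK:-}" "${PROJECT:-}" "${GROUP_HOME:-}"; do
        # Assumed criterion: the variable is set and names a writable directory.
        if [ -n "$dir" ] && [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    # Documented default when no roomy storage is found.
    printf '%s\n' "$HOME/.local/share"
}
```

To pin the location as the README describes, export `SPHERE_HOME=/path/with/space` before running `npm install -g sphere-cli`.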
 
- For HPC / cloud with no sudo:
+ **Update / uninstall:**
 
  ```sh
- curl -fsSL https://github.com/statzihuai/sphere-cli/releases/latest/download/install.sh | sh -s -- --prefix ~/.local
- # then add to ~/.bashrc or ~/.zshrc:
- export PATH="$HOME/.local/bin:$PATH"
+ npm install -g sphere-cli # update to the latest version
+ npm uninstall -g sphere-cli # remove
  ```
 
- **Uninstall:**
-
- ```sh
- sh install.sh --uninstall
- ```
+ **No Node.js?** Download the tarball for your platform from the [latest release](https://github.com/statzihuai/sphere-cli/releases/latest), extract it, and run `sphere-cli/sphere` directly (add it to your `PATH` if you like).
 
  ### Supported platforms
 
  | Platform | Architecture |
  |---|---|
  | macOS | Apple Silicon (arm64) |
- | Linux | x86\_64 |
- | Linux | arm64 (AWS Graviton, etc.) |
+ | macOS | Intel (x86\_64) |
+ | Linux | x86\_64 (glibc ≥ 2.17 — runs on CentOS 7 / RHEL 7 and newer, incl. most HPC clusters) |
 
  ---
 
  ## Quick start
 
  ```sh
- # Try the built-in demo (no data needed)
+ # Try the built-in demo (no data or license needed)
  sphere demo
 
- # Activate your license (once)
+ # Activate your license (once; required for generate/evaluate/certify)
  sphere license activate sphere_xxxxxxxxxxxxxxxxxxxx
 
  # Generate synthetic data
@@ -69,7 +58,7 @@ sphere certify real.csv synth.csv -o report.html
 
  ## First run
 
- On the very first invocation the CLI cold-loads its bundled Python libraries (pandas, pyarrow, anonymeter, sklearn) from disk. On Apple Silicon this typically takes **15–25 seconds** and is shown in the progress bar as each library finishes:
+ On the very first invocation the CLI cold-loads its bundled Python libraries (pandas, pyarrow, anonymeter, sklearn) from disk. On Apple Silicon this typically takes **15–25 seconds**, shown in the progress bar as each library finishes:
 
  ```
  Generating synthetic data from nhanes_sample.csv …
@@ -81,9 +70,9 @@ Generating synthetic data from nhanes_sample.csv …
  ✓ synth.csv 4,899 rows × 18 cols (load 17.4 s + run 1.8 s) seed 3721018536
  ```
 
- Subsequent calls in the same session skip loading entirely. The timing line always shows **load** (library startup) and **run** (actual SPHERE computation) separately so you can see which part is slow.
+ Subsequent calls on the same node skip loading (OS page cache). The timing line always shows **load** (library startup) and **run** (actual SPHERE computation) separately so you can see which part is slow.
 
- > Exact times vary by machine, OS page cache state, and whether the binary has been run recently.
+ > On a cluster, the engine lives on shared network storage; the first run on a fresh node re-pays that cold load. If you launch many `sphere` commands in one job and want each to start fast, the launcher transparently caches the engine to node-local disk (`$L_SCRATCH`/`$TMPDIR`) on network filesystems — set `SPHERE_NO_FAST=1` to disable.
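How the node-local cache directory might be resolved from these variables can be sketched as follows. This is an illustrative guess, not the launcher's code: the function name `resolve_fast_dir`, the precedence of `$L_SCRATCH` over `$TMPDIR`, and the final `/tmp` fallback are assumptions. `SPHERE_FAST_DIR` and `SPHERE_NO_FAST` are taken from the environment variable table later in this diff.

```sh
# Hypothetical sketch; the real launcher may resolve the directory differently.
# Prints the node-local cache directory, or returns 1 when caching is disabled.
resolve_fast_dir() {
    # SPHERE_NO_FAST=1 turns node-local caching off entirely.
    if [ "${SPHERE_NO_FAST:-0}" = "1" ]; then
        return 1
    fi
    # Assumed precedence: explicit override, then $L_SCRATCH, then $TMPDIR.
    # The /tmp fallback is an assumption added so the sketch always prints a path.
    printf '%s\n' "${SPHERE_FAST_DIR:-${L_SCRATCH:-${TMPDIR:-/tmp}}}"
}
```

In a batch job that launches many `sphere` commands, leaving these variables at their defaults keeps the caching on; exporting `SPHERE_NO_FAST=1` at the top of the job script disables it for every command in that job.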
 
  ---
 
@@ -97,48 +86,9 @@ Run SPHERE end-to-end on the built-in NHANES sample dataset (4,899 rows × 18 co
  sphere demo
  ```
 
- ```
- SPHERE demo — built-in NHANES dataset (4,899 rows × 18 cols, continuous + categorical)
- ────────────────────────────────────────────────────
-
- Generating synthetic data from nhanes_sample.csv …
- [░░░░░░░░░░░░░░░░░] 0.0% loading pandas . .
- [█░░░░░░░░░░░░░░░░] 3.0% ✓ pandas (12.4 s)
- [██░░░░░░░░░░░░░░░] 6.0% ✓ pyarrow (3.1 s)
- [███░░░░░░░░░░░░░░] 9.0% ✓ sphere core (1.8 s)
- [████████████████░] 85.0% writing output
- ✓ /tmp/synth.csv 4,899 rows × 18 cols (load 17.4 s + run 1.8 s) seed 3721018536
-
- Evaluating nhanes_sample.csv vs synth.csv …
- [████░░░░░░░░░░░░░] 16.0% loading anonymeter . .
- [████░░░░░░░░░░░░░] 17.0% ✓ anonymeter (3.2 s)
- [█████░░░░░░░░░░░░] 18.0% ✓ sklearn (0.8 s)
- [█████████████████] 89.0% inference 9/9
- ✓ Evaluation complete (load 4.0 s + run 14.2 s)
-
- Fidelity
- ────────────────────────────────────
- Mean 100.0 ████████████████████
- Variance 99.7 ████████████████████
- Correlation 95.1 ███████████████████░
- KS 96.8 ███████████████████░
- ────────────────────────────────────
- Composite 97.9 ████████████████████
-
- Privacy
- ────────────────────────────────────
- Singling Out 100.0 ████████████████████
- Linkability 97.5 ███████████████████░
- Inference 96.8 ███████████████████░
- ────────────────────────────────────
- Composite 98.1 ████████████████████
- ```
-
- ---
-
  ### `sphere license`
 
- Activate and manage your SPHERE license. A valid license is required to use `generate`, `evaluate`, and `certify`.
+ Activate and manage your SPHERE license. A valid license is required to use `generate`, `evaluate`, and `certify` (but **not** `demo`).
 
  ```
  sphere license activate [KEY] # Activate with a sphere_… key (prompts if omitted)
@@ -150,8 +100,6 @@ The key is stored at `~/.config/sphere/license_key` (mode 0600). After a success
 
  > Don't have a license? Contact [zihuai@stanford.edu](mailto:zihuai@stanford.edu) or visit [sphere.stanford.edu](https://sphere.stanford.edu).
 
- ---
-
  ### `sphere generate`
 
  ```
@@ -159,16 +107,13 @@ sphere generate <real.csv> [options]
 
  Options:
  -o, --output PATH Output CSV path (default: <input>_sphere.csv)
- -n, --rows INT Number of synthetic rows (default: same as input)
- -k INT Synthesis depth (default: 2)
- --seed INT Random seed for reproducibility
+ --k INT Synthesis passes (default: 2; more = stronger privacy)
  --mix-prob FLOAT Privacy/utility trade-off, 0–1 (default: 0.75)
+ --seed INT Random seed for reproducibility
  --json Machine-readable JSON output
  ```
 
- A `.sphere.json` provenance file is written alongside every output CSV and is automatically read by `sphere certify`.
-
- ---
+ The synthetic output has the **same number of rows** as the input (SPHERE transforms the data in place). Integer-coded categorical columns (≤ 10 distinct values, e.g. 0/1 flags or small ordinal scales) are preserved as exact discrete values; continuous columns are transformed while preserving the covariance structure. A `.sphere.json` provenance file is written alongside every output CSV and is read automatically by `sphere certify`.
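Since this version removes `-n, --rows` and states that the output always has the same number of rows as the input, a script can sanity-check that after generation. The helper below is a hypothetical sketch, not part of sphere-cli. It compares line counts, so it assumes both files end with a newline and contain no quoted fields that span several lines.

```sh
# Hypothetical helper, not shipped with sphere-cli.
# Succeeds (exit 0) when the two CSV files have the same number of lines.
same_row_count() {
    # The arithmetic expansion strips the padding some `wc` builds emit.
    real_rows=$(($(wc -l < "$1")))
    synth_rows=$(($(wc -l < "$2")))
    [ "$real_rows" -eq "$synth_rows" ]
}
```

A typical use would be `sphere generate real.csv -o synth.csv --seed 42 && same_row_count real.csv synth.csv`, where the file names and seed are placeholders.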
 
  ### `sphere evaluate`
 
@@ -177,13 +122,16 @@ sphere evaluate <real.csv> <synth.csv> [options]
 
  Options:
  --skip-privacy Skip privacy metrics (faster)
- --seed INT Fix the random seed for reproducible attack results
+ --n-attacks INT Anonymeter attacks per metric (default: 500)
+ --n-secrets INT Random secret columns per inference replicate (default: 5)
+ --n-reps INT Inference replicates to average (default: 10; more = tighter, slower)
+ --n-neighbors INT k for the linkability k-NN test (default: 1)
+ --n-aux-cols INT Feature columns for the linkability A/B split (default: 20)
+ --seed INT Fix the random seed for fully reproducible results
  --json Machine-readable JSON output
  ```
 
- Reports four fidelity metrics (mean, variance, correlation, KS) and three privacy metrics (singling-out, linkability, inference), each scored 0–100. Scores are normalised against a column-shuffled baseline so 100 = no measurable privacy leakage relative to a random permutation of the data.
-
- ---
+ Reports four fidelity metrics (mean, variance, correlation, KS) and three privacy metrics (singling-out, linkability, inference), each scored 0–100. Scores are normalised against a column-shuffled baseline, so 100 = no measurable leakage relative to a random permutation. The **inference** score averages `--n-reps` independent replicates of the random secret-column sampling, which makes it stable run-to-run (raise `--n-reps` for an even tighter estimate, or pass `--seed` for an exactly reproducible audit).
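Because `--seed` makes the audit exactly reproducible, the score can serve as a pass/fail gate in a script. The helper below is a hypothetical sketch, not part of sphere-cli; the threshold of 95 in the usage note is an arbitrary illustrative value, not a recommendation from the README.

```sh
# Hypothetical helper, not shipped with sphere-cli.
# Succeeds (exit 0) when the numeric score $1 is at least the threshold $2.
score_at_least() {
    # awk handles the floating-point comparison that POSIX `test` cannot.
    awk -v s="$1" -v t="$2" 'BEGIN { exit !(s + 0 >= t + 0) }'
}
```

Combined with the `jq` pipeline shown elsewhere in the README, a gate could read `score=$(sphere evaluate real.csv synth.csv --seed 42 --json | jq '.privacy.composite')` followed by `score_at_least "$score" 95`.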
 
  ### `sphere certify`
 
@@ -216,11 +164,12 @@ sphere evaluate real.csv synth.csv --json | jq '.privacy.composite'
  | Variable | Description |
  |---|---|
  | `SPHERE_LICENSE_REQUIRED` | Set to `false` to bypass license checks (research / unlocked builds) |
- | `SPHERE_WORKER_URL` | Override the license validation endpoint |
- | `SPHERE_PREFIX` | Override install prefix |
- | `SPHERE_VERSION` | Pin a release tag, e.g. `v0.1.38` |
- | `SPHERE_BUNDLE_URL` | Full URL to a `sphere-cli-*.tar.gz` (skip auto-detect) |
- | `SPHERE_GITHUB_REPO` | Override GitHub repo for downloads |
+ | `SPHERE_HOME` | Install location for the engine (default: auto-detected roomy/HPC storage, else `~/.local/share`) |
+ | `SPHERE_NO_FAST` | Set to `1` to disable node-local caching of the engine on network filesystems |
+ | `SPHERE_FAST_DIR` | Override the node-local cache directory (default: `$L_SCRATCH`/`$TMPDIR`) |
+ | `SPHERE_NO_PATH_SETUP` | Set to `1` to skip auto-adding the `bin` dir to your shell rc |
+ | `SPHERE_BINARY_BASEURL` | Override the release base URL the engine downloads from (testing) |
+ | `SPHERE_SKIP_POSTINSTALL` | Set to `1` to skip the binary download during `npm install` (CI / offline) |
 
  ---
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "sphere-cli",
- "version": "0.2.8",
+ "version": "0.2.10",
  "description": "SPHERE CLI — synthetic data generation, evaluation, and certification (sealed native binary)",
  "keywords": [
  "synthetic-data",
@@ -19,8 +19,7 @@
  "sphere": "bin/sphere.js"
  },
  "scripts": {
- "postinstall": "node scripts/postinstall.js",
- "release": "bash scripts/release.sh"
+ "postinstall": "node scripts/postinstall.js"
  },
  "files": [
  "bin/sphere.js",
package/scripts/engine.js CHANGED
@@ -30,7 +30,7 @@ const { execFileSync } = require('child_process');
  const REPO = 'statzihuai/sphere-cli';
  // Binary release tag — decoupled from the npm package version so JS-only patch
  // releases reuse the same prebuilt/notarized binaries.
- const BINARY_RELEASE = 'v0.2.6';
+ const BINARY_RELEASE = 'v0.2.9';
 
  const PLATFORM = process.platform; // 'darwin' | 'linux'
  const ARCH = process.arch; // 'arm64' | 'x64'