sphere-cli 0.2.8 → 0.2.10
- package/README.md +30 -81
- package/package.json +2 -3
- package/scripts/engine.js +1 -1
package/README.md
CHANGED
@@ -8,51 +8,40 @@ Command-line interface for **SPHERE** — synthetic data generation, evaluation,
 
 ## Install
 
-**npm (recommended):**
-
 ```sh
 npm install -g sphere-cli
 ```
 
-No Python
-
-**curl (no Node.js required):**
+Requires **Node.js ≥ 18**. No Python and no manual PATH editing — the install downloads a self-contained, signed binary and wires everything up for you.
 
-
-curl -fsSL https://github.com/statzihuai/sphere-cli/releases/latest/download/install.sh | sh
-```
+**Install once, run anywhere (HPC).** The ~500 MB engine is **not** placed inside `node_modules`, so it never blows up a quota-limited home directory. On a cluster the installer auto-detects roomy shared storage (`$OAK`, `$SCRATCH`, `$WORK`, `$PROJECT`, `$GROUP_HOME`, …) and installs there; because that storage and your `~/.bashrc` are shared across every login and compute node, you install once and `sphere` works in every future session on every node — no reinstall. If the global `bin` isn't already on your `PATH`, the installer appends it to your shell rc automatically. To pin a location, set `SPHERE_HOME=/path/with/space` before installing.
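Review note: the install-location rules in the paragraph above could be sketched roughly as follows. This is a minimal illustration, not the installer's actual code. `resolveInstallRoot` and `HPC_STORAGE_VARS` are hypothetical names, the list only covers the variables the README spells out (its own list ends with "…"), and the sketch skips the free-space check that "roomy" implies. The `~/.local/share` fallback is taken from the `SPHERE_HOME` row of the README's environment-variable table.

```javascript
const os = require('os');
const path = require('path');

// Storage variables named in the README, in the order it lists them.
// Assumption: the real installer may check more and may rank them differently.
const HPC_STORAGE_VARS = ['OAK', 'SCRATCH', 'WORK', 'PROJECT', 'GROUP_HOME'];

// Hypothetical helper: choose the directory the engine would be installed into.
function resolveInstallRoot(env = process.env, home = os.homedir()) {
  // 1. An explicit SPHERE_HOME pins the location.
  if (env.SPHERE_HOME) return env.SPHERE_HOME;

  // 2. Otherwise use the first shared-storage variable that is set.
  for (const name of HPC_STORAGE_VARS) {
    if (env[name]) return env[name];
  }

  // 3. Fall back to the documented default, ~/.local/share.
  return path.posix.join(home, '.local', 'share');
}
```

Passing `env` and `home` as parameters keeps the function pure, so the precedence order can be checked without touching the real environment.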
 
-
+**Update / uninstall:**
 
 ```sh
-
-
-export PATH="$HOME/.local/bin:$PATH"
+npm install -g sphere-cli # update to the latest version
+npm uninstall -g sphere-cli # remove
 ```
 
-**
-
-```sh
-sh install.sh --uninstall
-```
+**No Node.js?** Download the tarball for your platform from the [latest release](https://github.com/statzihuai/sphere-cli/releases/latest), extract it, and run `sphere-cli/sphere` directly (add it to your `PATH` if you like).
 
 ### Supported platforms
 
 | Platform | Architecture |
 |---|---|
 | macOS | Apple Silicon (arm64) |
-|
-| Linux |
+| macOS | Intel (x86\_64) |
+| Linux | x86\_64 (glibc ≥ 2.17 — runs on CentOS 7 / RHEL 7 and newer, incl. most HPC clusters) |
 
 ---
 
 ## Quick start
 
 ```sh
-# Try the built-in demo (no data needed)
+# Try the built-in demo (no data or license needed)
 sphere demo
 
-# Activate your license (once)
+# Activate your license (once; required for generate/evaluate/certify)
 sphere license activate sphere_xxxxxxxxxxxxxxxxxxxx
 
 # Generate synthetic data
@@ -69,7 +58,7 @@ sphere certify real.csv synth.csv -o report.html
 
 ## First run
 
-On the very first invocation the CLI cold-loads its bundled Python libraries (pandas, pyarrow, anonymeter, sklearn) from disk. On Apple Silicon this typically takes **15–25 seconds
+On the very first invocation the CLI cold-loads its bundled Python libraries (pandas, pyarrow, anonymeter, sklearn) from disk. On Apple Silicon this typically takes **15–25 seconds**, shown in the progress bar as each library finishes:
 
 ```
 Generating synthetic data from nhanes_sample.csv …
@@ -81,9 +70,9 @@ Generating synthetic data from nhanes_sample.csv …
 ✓ synth.csv 4,899 rows × 18 cols (load 17.4 s + run 1.8 s) seed 3721018536
 ```
 
-Subsequent calls
+Subsequent calls on the same node skip loading (OS page cache). The timing line always shows **load** (library startup) and **run** (actual SPHERE computation) separately so you can see which part is slow.
 
->
+> On a cluster, the engine lives on shared network storage; the first run on a fresh node re-pays that cold load. If you launch many `sphere` commands in one job and want each to start fast, the launcher transparently caches the engine to node-local disk (`$L_SCRATCH`/`$TMPDIR`) on network filesystems — set `SPHERE_NO_FAST=1` to disable.
 
 ---
 
@@ -97,48 +86,9 @@ Run SPHERE end-to-end on the built-in NHANES sample dataset (4,899 rows × 18 co
 sphere demo
 ```
 
-```
-SPHERE demo — built-in NHANES dataset (4,899 rows × 18 cols, continuous + categorical)
-────────────────────────────────────────────────────
-
-Generating synthetic data from nhanes_sample.csv …
-[░░░░░░░░░░░░░░░░░] 0.0% loading pandas . .
-[█░░░░░░░░░░░░░░░░] 3.0% ✓ pandas (12.4 s)
-[██░░░░░░░░░░░░░░░] 6.0% ✓ pyarrow (3.1 s)
-[███░░░░░░░░░░░░░░] 9.0% ✓ sphere core (1.8 s)
-[████████████████░] 85.0% writing output
-✓ /tmp/synth.csv 4,899 rows × 18 cols (load 17.4 s + run 1.8 s) seed 3721018536
-
-Evaluating nhanes_sample.csv vs synth.csv …
-[████░░░░░░░░░░░░░] 16.0% loading anonymeter . .
-[████░░░░░░░░░░░░░] 17.0% ✓ anonymeter (3.2 s)
-[█████░░░░░░░░░░░░] 18.0% ✓ sklearn (0.8 s)
-[█████████████████] 89.0% inference 9/9
-✓ Evaluation complete (load 4.0 s + run 14.2 s)
-
-Fidelity
-────────────────────────────────────
-Mean 100.0 ████████████████████
-Variance 99.7 ████████████████████
-Correlation 95.1 ███████████████████░
-KS 96.8 ███████████████████░
-────────────────────────────────────
-Composite 97.9 ████████████████████
-
-Privacy
-────────────────────────────────────
-Singling Out 100.0 ████████████████████
-Linkability 97.5 ███████████████████░
-Inference 96.8 ███████████████████░
-────────────────────────────────────
-Composite 98.1 ████████████████████
-```
-
----
-
 ### `sphere license`
 
-Activate and manage your SPHERE license. A valid license is required to use `generate`, `evaluate`, and `certify
+Activate and manage your SPHERE license. A valid license is required to use `generate`, `evaluate`, and `certify` (but **not** `demo`).
 
 ```
 sphere license activate [KEY] # Activate with a sphere_… key (prompts if omitted)
@@ -150,8 +100,6 @@ The key is stored at `~/.config/sphere/license_key` (mode 0600). After a success
 
 > Don't have a license? Contact [zihuai@stanford.edu](mailto:zihuai@stanford.edu) or visit [sphere.stanford.edu](https://sphere.stanford.edu).
 
----
-
 ### `sphere generate`
 
 ```
@@ -159,16 +107,13 @@ sphere generate <real.csv> [options]
 
 Options:
   -o, --output PATH Output CSV path (default: <input>_sphere.csv)
-
-  -k INT Synthesis depth (default: 2)
-  --seed INT Random seed for reproducibility
+  --k INT Synthesis passes (default: 2; more = stronger privacy)
   --mix-prob FLOAT Privacy/utility trade-off, 0–1 (default: 0.75)
+  --seed INT Random seed for reproducibility
   --json Machine-readable JSON output
 ```
 
-A `.sphere.json` provenance file is written alongside every output CSV and is automatically
-
----
+The synthetic output has the **same number of rows** as the input (SPHERE transforms the data in place). Integer-coded categorical columns (≤ 10 distinct values, e.g. 0/1 flags or small ordinal scales) are preserved as exact discrete values; continuous columns are transformed while preserving the covariance structure. A `.sphere.json` provenance file is written alongside every output CSV and is read automatically by `sphere certify`.
 
 ### `sphere evaluate`
 
@@ -177,13 +122,16 @@ sphere evaluate <real.csv> <synth.csv> [options]
 
 Options:
   --skip-privacy Skip privacy metrics (faster)
-  --
+  --n-attacks INT Anonymeter attacks per metric (default: 500)
+  --n-secrets INT Random secret columns per inference replicate (default: 5)
+  --n-reps INT Inference replicates to average (default: 10; more = tighter, slower)
+  --n-neighbors INT k for the linkability k-NN test (default: 1)
+  --n-aux-cols INT Feature columns for the linkability A/B split (default: 20)
+  --seed INT Fix the random seed for fully reproducible results
   --json Machine-readable JSON output
 ```
 
-Reports four fidelity metrics (mean, variance, correlation, KS) and three privacy metrics (singling-out, linkability, inference), each scored 0–100. Scores are normalised against a column-shuffled baseline so 100 = no measurable
-
----
+Reports four fidelity metrics (mean, variance, correlation, KS) and three privacy metrics (singling-out, linkability, inference), each scored 0–100. Scores are normalised against a column-shuffled baseline, so 100 = no measurable leakage relative to a random permutation. The **inference** score averages `--n-reps` independent replicates of the random secret-column sampling, which makes it stable run-to-run (raise `--n-reps` for an even tighter estimate, or pass `--seed` for an exactly reproducible audit).
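Review note: a reproducible audit run, as described above, might be scripted from Node along these lines. This is a sketch under stated assumptions, not part of the package. The helper names are hypothetical; the flags (`--json`, `--n-reps`, `--seed`) come from the options list, and the only JSON field relied on is `.privacy.composite`, which the README's own `jq` example uses. It also assumes that `--json` prints nothing but the JSON document on stdout.

```javascript
const { execFileSync } = require('child_process');

// Hypothetical helper: build the argument list for `sphere evaluate`.
function buildEvaluateArgs(realCsv, synthCsv, { seed, nReps } = {}) {
  const args = ['evaluate', realCsv, synthCsv, '--json'];
  if (nReps !== undefined) args.push('--n-reps', String(nReps)); // more replicates = tighter inference score
  if (seed !== undefined) args.push('--seed', String(seed));     // fixed seed = exactly reproducible run
  return args;
}

// Hypothetical helper: read the composite privacy score from the JSON report.
function privacyComposite(jsonText) {
  return JSON.parse(jsonText).privacy.composite;
}

// Runs the CLI; needs `sphere` on PATH and an active license.
function evaluatePrivacy(realCsv, synthCsv, options) {
  const stdout = execFileSync('sphere', buildEvaluateArgs(realCsv, synthCsv, options), {
    encoding: 'utf8',
  });
  return privacyComposite(stdout);
}
```

For example, `evaluatePrivacy('real.csv', 'synth.csv', { seed: 42, nReps: 20 })` would invoke `sphere evaluate real.csv synth.csv --json --n-reps 20 --seed 42`.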
 
 ### `sphere certify`
 
@@ -216,11 +164,12 @@ sphere evaluate real.csv synth.csv --json | jq '.privacy.composite'
 | Variable | Description |
 |---|---|
 | `SPHERE_LICENSE_REQUIRED` | Set to `false` to bypass license checks (research / unlocked builds) |
-| `
-| `
-| `
-| `
-| `
+| `SPHERE_HOME` | Install location for the engine (default: auto-detected roomy/HPC storage, else `~/.local/share`) |
+| `SPHERE_NO_FAST` | Set to `1` to disable node-local caching of the engine on network filesystems |
+| `SPHERE_FAST_DIR` | Override the node-local cache directory (default: `$L_SCRATCH`/`$TMPDIR`) |
+| `SPHERE_NO_PATH_SETUP` | Set to `1` to skip auto-adding the `bin` dir to your shell rc |
+| `SPHERE_BINARY_BASEURL` | Override the release base URL the engine downloads from (testing) |
+| `SPHERE_SKIP_POSTINSTALL` | Set to `1` to skip the binary download during `npm install` (CI / offline) |
 
 ---
 
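Review note: the interplay of `SPHERE_NO_FAST` and `SPHERE_FAST_DIR` in the environment-variable table can be pictured with a small sketch. It is only an illustration of the documented behaviour, not the launcher's code. `resolveFastCacheDir` is a hypothetical name, the preference of `$L_SCRATCH` over `$TMPDIR` is assumed from the order in which the README lists them, and how the launcher detects a network filesystem is unknown, so that is passed in as a plain boolean.

```javascript
// Hypothetical helper: decide where, if anywhere, the engine is cached on node-local disk.
// Returns a directory path, or null when no caching should happen.
function resolveFastCacheDir(env = process.env, onNetworkFs = true) {
  // SPHERE_NO_FAST=1 switches node-local caching off entirely.
  if (env.SPHERE_NO_FAST === '1') return null;

  // The README only describes caching when the engine sits on a network filesystem.
  if (!onNetworkFs) return null;

  // An explicit override wins; otherwise try the documented defaults in listed order.
  return env.SPHERE_FAST_DIR || env.L_SCRATCH || env.TMPDIR || null;
}
```

Returning `null` rather than throwing lets a caller fall back to running the engine straight from shared storage.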
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "sphere-cli",
-  "version": "0.2.
+  "version": "0.2.10",
   "description": "SPHERE CLI — synthetic data generation, evaluation, and certification (sealed native binary)",
   "keywords": [
     "synthetic-data",
@@ -19,8 +19,7 @@
     "sphere": "bin/sphere.js"
   },
   "scripts": {
-    "postinstall": "node scripts/postinstall.js"
-    "release": "bash scripts/release.sh"
+    "postinstall": "node scripts/postinstall.js"
   },
   "files": [
     "bin/sphere.js",
package/scripts/engine.js
CHANGED
@@ -30,7 +30,7 @@ const { execFileSync } = require('child_process');
 const REPO = 'statzihuai/sphere-cli';
 // Binary release tag — decoupled from the npm package version so JS-only patch
 // releases reuse the same prebuilt/notarized binaries.
-const BINARY_RELEASE = 'v0.2.
+const BINARY_RELEASE = 'v0.2.9';
 
 const PLATFORM = process.platform; // 'darwin' | 'linux'
 const ARCH = process.arch; // 'arm64' | 'x64'
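Review note: because `BINARY_RELEASE` is decoupled from the npm version, package 0.2.10 still fetches binaries tagged `v0.2.9`. A rough sketch of how a download URL could be assembled from these constants follows. The rest of `engine.js` is not shown in this diff, so `engineUrl` and the asset file-name pattern are hypothetical, as is the assumption that `SPHERE_BINARY_BASEURL` replaces the whole prefix including the tag. The `releases/download/<tag>/<asset>` layout is the standard GitHub release URL form.

```javascript
const REPO = 'statzihuai/sphere-cli';
const BINARY_RELEASE = 'v0.2.9'; // binary tag, independent of the npm package version

// Hypothetical helper: build the URL of the engine archive for one platform/arch pair.
function engineUrl(platform, arch, env = process.env) {
  // SPHERE_BINARY_BASEURL (from the README table) overrides the default release location.
  const base =
    env.SPHERE_BINARY_BASEURL ||
    `https://github.com/${REPO}/releases/download/${BINARY_RELEASE}`;

  // Assumed asset naming, for illustration only; the real file names are not shown in the diff.
  const asset = `sphere-cli-${platform}-${arch}.tar.gz`;

  // Drop trailing slashes from the base so the join never produces "//".
  return `${base.replace(/\/+$/, '')}/${asset}`;
}
```

In the real module the arguments would presumably be the `PLATFORM` and `ARCH` constants shown in the hunk above.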