@skill-map/spec 0.40.0 → 0.42.0
- package/CHANGELOG.md +36 -0
- package/README.md +6 -9
- package/architecture.md +68 -40
- package/cli-contract.md +52 -17
- package/conformance/coverage.md +1 -1
- package/db-schema.md +2 -2
- package/index.json +12 -11
- package/package.json +2 -1
- package/plugin-author-guide.md +273 -776
- package/schemas/extensions/provider.schema.json +20 -0
- package/schemas/user-settings.schema.json +31 -0
- package/telemetry.md +294 -0
package/schemas/extensions/provider.schema.json
|
@@ -93,6 +93,26 @@
|
|
|
93
93
|
"description": "Path globs (relative to scope root) that this Provider claims. **Enforcement-grade since structure-as-truth refactor**: a Provider declaring `roots` only receives files that match at least one entry of the array; a Provider without `roots` acts as a fallback and receives files unmatched by every other Provider's roots. Two Providers whose `roots` both match the same file produce a `provider-ambiguous` issue and the file stays unclassified. `sm plugins doctor` warns when no file matched a specific Provider's roots in the latest scan.",
|
|
94
94
|
"items": { "type": "string" }
|
|
95
95
|
},
|
|
96
|
+
"scaffold": {
|
|
97
|
+
"type": "object",
|
|
98
|
+
"required": ["skillDir"],
|
|
99
|
+
"additionalProperties": false,
|
|
100
|
+
"description": "Authoring targets for verbs that MATERIALISE files into this Provider's on-disk territory (today only `sm tutorial`, which drops a skill folder where the Provider's runtime will discover it). Distinct from `detect` (which READS markers to suggest a lens) and from `classify` (which READS paths during a scan): `scaffold` is the WRITE side, the directory a generator drops new content into so the target runtime picks it up. Optional: a Provider with no `scaffold` block is never offered as a destination by a materialising verb (e.g. `openai` until Codex skills land, `antigravity` whose skills live under the open-standard `agent-skills` territory, `core/markdown` which owns no authoring convention). The skill-folder convention is uniform across hosts (`<skillDir>/<name>/SKILL.md`), so a single `skillDir` is enough today; a future verb that scaffolds agents or commands adds a sibling field (`agentDir`, `commandDir`) without breaking this one.",
|
|
101
|
+
"properties": {
|
|
102
|
+
"skillDir": {
|
|
103
|
+
"type": "string",
|
|
104
|
+
"minLength": 1,
|
|
105
|
+
"pattern": "^\\.?[A-Za-z0-9][A-Za-z0-9._/-]*$",
|
|
106
|
+
"description": "Directory (relative to the scope root) under which a materialising verb writes a skill folder, e.g. `.claude/skills` for Claude, `.agents/skills` for the open standard. The verb appends `/<skillName>/SKILL.md`. Relative, with no leading slash and no leading `..` (the pattern rejects both); the pattern alone does not exclude `..` segments later in the path, such as `a/../b`. The consuming verb joins it onto the cwd."
|
|
107
|
+
},
|
|
108
|
+
"aka": {
|
|
109
|
+
"type": "array",
|
|
110
|
+
"minItems": 1,
|
|
111
|
+
"description": "Display-only hints naming the agents that consume this Provider's scaffold territory, shown in parentheses next to the Provider label in the `sm tutorial` destination prompt (e.g. the open-standard `.agents/skills` is read by Antigravity and OpenAI Codex, so its `agent-skills` Provider lists them here). Purely presentational: these strings are NOT matched by `--for` (only registered Provider ids are) and have no runtime effect. Optional; absent means the prompt shows the bare Provider label.",
|
|
112
|
+
"items": { "type": "string", "minLength": 1 }
|
|
113
|
+
}
|
|
114
|
+
}
|
|
115
|
+
},
|
|
96
116
|
"gatedByActiveLens": {
|
|
97
117
|
"type": "boolean",
|
|
98
118
|
"description": "Lens gating flag for vendor providers. When `true`, this Provider's `classify()` only runs (and the walker only iterates its territory) if `provider.id === activeProvider` (the project's active lens). When `false` or omitted (default), the Provider is universal and classifies unconditionally. Vendor providers (`claude`, `openai`, `antigravity`) MUST set this to `true`: the actual runtimes never read each other's on-disk formats (Claude Code does not consume `.codex/`; Codex CLI does not consume `.claude/`), and offering every file to every provider fabricates cross-vendor graph edges the runtimes themselves reject. Universal providers (open-standard `agent-skills`, markdown fallback `core/markdown`, any future format-based fallback) keep this `false` so their territory is consumed by every vendor and they run on every scan. When `activeProvider === null` (no lens resolved), the walker bypasses the gate entirely and every gated Provider runs, mirroring the permissive extractor-side fallback for unlensed projects. Affects classification ONLY; extractors continue to filter via their own `precondition.provider` allowlist."
|
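A minimal sketch, in TypeScript, of how a materialising verb might validate `skillDir` and compose the target path. The regular expression is copied from the schema hunk above; the helper names (`matchesSkillDirPattern`, `hasParentSegment`, `skillFilePath`) are hypothetical and not part of the package. The extra segment check is an assumption added for illustration: the pattern rejects a leading `/` and a leading `..`, but it still matches values such as `a/../b`.

```typescript
// Pattern string copied verbatim from the `skillDir` schema entry.
const SKILL_DIR_PATTERN = new RegExp("^\\.?[A-Za-z0-9][A-Za-z0-9._/-]*$");

// Hypothetical helper: true when the value satisfies the schema
// (`minLength: 1` plus the pattern).
function matchesSkillDirPattern(value: string): boolean {
  return value.length >= 1 && SKILL_DIR_PATTERN.test(value);
}

// Hypothetical helper: true when any path segment is exactly `..`.
// The schema pattern does not catch this case when the segment is not leading.
function hasParentSegment(value: string): boolean {
  return value.split("/").some((segment) => segment === "..");
}

// Hypothetical helper: builds `<skillDir>/<skillName>/SKILL.md`, the uniform
// skill-folder convention described in the schema.
function skillFilePath(skillDir: string, skillName: string): string {
  if (!matchesSkillDirPattern(skillDir) || hasParentSegment(skillDir)) {
    throw new Error("invalid skillDir: " + skillDir);
  }
  const trimmed = skillDir.replace(/\/+$/, ""); // drop any trailing slash
  return trimmed + "/" + skillName + "/SKILL.md";
}
```

For example, `skillFilePath(".claude/skills", "demo")` yields `.claude/skills/demo/SKILL.md`, which the consuming verb would then join onto the cwd.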
|
package/schemas/user-settings.schema.json
@@ -34,6 +34,37 @@
|
|
|
34
34
|
"description": "Unix milliseconds of the last banner emission to stderr. `null` (or absent) when never shown. Used so a single probe does not re-emit the banner across back-to-back `sm` invocations."
|
|
35
35
|
}
|
|
36
36
|
}
|
|
37
|
+
},
|
|
38
|
+
"telemetry": {
|
|
39
|
+
"type": "object",
|
|
40
|
+
"additionalProperties": false,
|
|
41
|
+
"description": "User consent + prompt bookkeeping for the opt-in, anonymous telemetry surfaces (see `spec/telemetry.md`). All surfaces default OFF and run only after explicit consent. Three independent toggles (`errorsEnabled` for Sentry error reporting, `usageCliEnabled` / `usageUiEnabled` for PostHog usage analytics) plus the usage `distinct_id` (`anonymousId`) and the shared prompt bookkeeping (`firstRunAt`, `promptedAt`) the CLI maintains so it asks once, at the right time.",
|
|
42
|
+
"properties": {
|
|
43
|
+
"errorsEnabled": {
|
|
44
|
+
"type": "boolean",
|
|
45
|
+
"description": "Operator opt-in toggle for error reporting (Sentry). **Default OFF**: when absent or `false`, no Sentry SDK is initialised and no event leaves the machine. Set to `true` only after explicit consent (the consent prompt or Settings UI). The `SKILL_MAP_TELEMETRY=0` env var forces OFF regardless of this value."
|
|
46
|
+
},
|
|
47
|
+
"usageCliEnabled": {
|
|
48
|
+
"type": "boolean",
|
|
49
|
+
"description": "Operator opt-in toggle for CLI usage analytics (PostHog). **Default OFF**: when absent or `false`, no PostHog SDK is initialised in the CLI and no usage event leaves the machine. Independent of `errorsEnabled` and `usageUiEnabled`; the `SKILL_MAP_TELEMETRY=0` env var forces OFF regardless."
|
|
50
|
+
},
|
|
51
|
+
"usageUiEnabled": {
|
|
52
|
+
"type": "boolean",
|
|
53
|
+
"description": "Operator opt-in toggle for UI usage analytics (PostHog). **Default OFF**: when absent or `false`, the browser never loads the PostHog SDK. Independent of `errorsEnabled` and `usageCliEnabled`; the `SKILL_MAP_TELEMETRY=0` env var forces OFF regardless."
|
|
54
|
+
},
|
|
55
|
+
"anonymousId": {
|
|
56
|
+
"type": ["string", "null"],
|
|
57
|
+
"description": "Random UUID v4 used as the PostHog `distinct_id` for the usage surface, shared by the CLI and UI so both are attributed to one install. Carries no personal data. Minted exactly once, the first time any usage toggle becomes `true`, and never regenerated. The single anonymous correlation id the contract permits, scoped to usage only; the BFF exposes it read-only and it is never writable over the wire. `null` (or absent) until usage is first enabled."
|
|
58
|
+
},
|
|
59
|
+
"firstRunAt": {
|
|
60
|
+
"type": ["integer", "null"],
|
|
61
|
+
"description": "Unix milliseconds of the first run on which the consent prompt was eligible to appear (interactive TTY, at least one carrier configured, not opted-out by env, not yet answered). The prompt is intentionally deferred to the NEXT eligible run so it does not stack on top of the first-run provider-lens prompt. `null` (or absent) before any eligible run."
|
|
62
|
+
},
|
|
63
|
+
"promptedAt": {
|
|
64
|
+
"type": ["integer", "null"],
|
|
65
|
+
"description": "Unix milliseconds of the moment the shared consent prompt was shown. `null` (or absent) when the user has never been prompted. Once set, the prompt is never shown again; the persisted toggles are authoritative."
|
|
66
|
+
}
|
|
67
|
+
}
|
|
37
68
|
}
|
|
38
69
|
}
|
|
39
70
|
}
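As an illustration of the `telemetry` object added in this hunk, the sketch below mirrors its six properties as a TypeScript interface and resolves the effective state of each toggle. The interface and the `effectiveToggles` helper are hypothetical names, not part of the package. The sketch assumes that only the exact value `0` of `SKILL_MAP_TELEMETRY` acts as the kill switch; how other values are treated is not stated in the schema.

```typescript
// Mirrors the `telemetry` properties in the schema; every field is optional.
interface TelemetrySettings {
  errorsEnabled?: boolean;
  usageCliEnabled?: boolean;
  usageUiEnabled?: boolean;
  anonymousId?: string | null;
  firstRunAt?: number | null;
  promptedAt?: number | null;
}

interface EffectiveToggles {
  errors: boolean;
  usageCli: boolean;
  usageUi: boolean;
}

// Hypothetical helper. `killSwitchValue` stands for the value of the
// SKILL_MAP_TELEMETRY environment variable (undefined when unset).
function effectiveToggles(
  settings: TelemetrySettings | undefined,
  killSwitchValue: string | undefined
): EffectiveToggles {
  // The kill switch forces OFF regardless of the persisted values.
  if (killSwitchValue === "0") {
    return { errors: false, usageCli: false, usageUi: false };
  }
  // Absent or `false` is OFF; only an explicit `true` enables a surface.
  return {
    errors: settings?.errorsEnabled === true,
    usageCli: settings?.usageCliEnabled === true,
    usageUi: settings?.usageUiEnabled === true,
  };
}
```

Because each toggle is compared against `true` explicitly, a missing `telemetry` object and an empty one both resolve to every surface being OFF.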
|
package/telemetry.md
ADDED
|
@@ -0,0 +1,294 @@
|
|
|
1
|
+
# Telemetry
|
|
2
|
+
|
|
3
|
+
skill-map is a local-first tool. By default it sends **nothing** off the
|
|
4
|
+
operator's machine. This document is the normative contract for the optional
|
|
5
|
+
exceptions: two independently-consented, anonymous telemetry surfaces, both
|
|
6
|
+
**OFF by default**.
|
|
7
|
+
|
|
8
|
+
- **Error reporting** (Sentry), so crashes happening in installations the
|
|
9
|
+
maintainers do not control can be learned about and fixed.
|
|
10
|
+
- **Usage analytics** (PostHog), so the maintainers can learn which verbs and
|
|
11
|
+
built-in extensions are actually used in the wild and prioritise the
|
|
12
|
+
roadmap accordingly.
|
|
13
|
+
|
|
14
|
+
The two surfaces share one consent prompt, one kill switch, and one
|
|
15
|
+
scrubber, but each has its own carrier, its own toggle, and its own stability
|
|
16
|
+
contract. Either can be shipped, or not, independently.
|
|
17
|
+
|
|
18
|
+
## Scope and non-goals
|
|
19
|
+
|
|
20
|
+
In scope:
|
|
21
|
+
|
|
22
|
+
- **Errors.** Uncaught exceptions and unhandled rejections in the CLI
|
|
23
|
+
process, unhandled errors in the BFF (`sm serve`) request path, unhandled
|
|
24
|
+
runtime errors in the browser UI, plus a small fixed set of triage tags
|
|
25
|
+
(`surface`, `verb`, `phase`, `plugin_id` for built-ins, `extension_kind`,
|
|
26
|
+
`route`, `method`, `status`).
|
|
27
|
+
- **Usage.** Which `sm` verb ran and the NAMES of the flags it was given;
|
|
28
|
+
the set of built-in extension ids that executed during a scan (presence,
|
|
29
|
+
not volume); which UI view or feature was opened. Plus environment facts
|
|
30
|
+
(`cli_version`, `node_major`, `os`, `arch`).
|
|
31
|
+
|
|
32
|
+
Out of scope (MUST NOT be collected under this contract, on either surface):
|
|
33
|
+
|
|
34
|
+
- **Flag values, file names, markdown bodies, frontmatter values, annotation
|
|
35
|
+
contents, settings values.** Only flag names and built-in extension ids
|
|
36
|
+
ever leave the machine.
|
|
37
|
+
- **Performance traces:** latency, throughput, span timing.
|
|
38
|
+
- **Project-shape signals:** file counts, node counts, frontmatter key sets,
|
|
39
|
+
project size. "Which extensions ran" is presence only, never a count.
|
|
40
|
+
- **Any cross-session or cross-install correlation identifier**, with one
|
|
41
|
+
documented exception: the single anonymous usage `distinct_id`
|
|
42
|
+
(`telemetry.anonymousId`, below), which carries no identity and exists only
|
|
43
|
+
so usage events from the same install can be de-duplicated. The error
|
|
44
|
+
surface carries no correlation id at all.
|
|
45
|
+
|
|
46
|
+
## Consent contract (shared)
|
|
47
|
+
|
|
48
|
+
Both surfaces are **OFF by default**. They run only after the operator has
|
|
49
|
+
explicitly opted in. Consent state lives in the user-settings file at
|
|
50
|
+
`~/.skill-map/settings.json` under the `telemetry` object (see
|
|
51
|
+
[`user-settings.schema.json`](./schemas/user-settings.schema.json) and the
|
|
52
|
+
narrow `$HOME` exception in [`cli-contract.md`](./cli-contract.md) §User-settings file):
|
|
53
|
+
|
|
54
|
+
- `telemetry.errorsEnabled` (boolean). Opt-in for error reporting. Absent or
|
|
55
|
+
`false` MUST be treated as OFF.
|
|
56
|
+
- `telemetry.usageCliEnabled` (boolean). Opt-in for CLI usage analytics.
|
|
57
|
+
Absent or `false` MUST be treated as OFF.
|
|
58
|
+
- `telemetry.usageUiEnabled` (boolean). Opt-in for UI usage analytics. Absent
|
|
59
|
+
or `false` MUST be treated as OFF.
|
|
60
|
+
- `telemetry.anonymousId` (string UUID, or null). The PostHog `distinct_id`
|
|
61
|
+
for the usage surface. Minted once when any usage toggle first becomes
|
|
62
|
+
`true`; never regenerated. The single allowed anonymous correlation id,
|
|
63
|
+
scoped to usage only.
|
|
64
|
+
- `telemetry.firstRunAt` (integer milliseconds, or null). Records the first
|
|
65
|
+
run on which the prompt was eligible, so the prompt can be deferred to the
|
|
66
|
+
next eligible run.
|
|
67
|
+
- `telemetry.promptedAt` (integer milliseconds, or null). Records when the
|
|
68
|
+
consent prompt was shown so it is never shown twice.
|
|
69
|
+
|
|
70
|
+
Rules:
|
|
71
|
+
|
|
72
|
+
1. **Default OFF.** When a toggle is absent or `false`, the matching SDK is
|
|
73
|
+
not initialised, no endpoint is contacted, and there is zero added latency.
|
|
74
|
+
This MUST hold on every surface (CLI, BFF, UI).
|
|
75
|
+
2. **One shared consent prompt, TTY only, deferred to the second eligible
|
|
76
|
+
run.** A run is "eligible" when the prompt could appear: an interactive
|
|
77
|
+
terminal (`process.stdout.isTTY` true), at least one carrier configured
|
|
78
|
+
(a Sentry DSN or the PostHog key non-empty), the kill switch unset, and
|
|
79
|
+
`promptedAt` absent. The CLI MUST NOT prompt on the FIRST eligible run; it
|
|
80
|
+
only stamps `firstRunAt` and stays silent, so the operator's first `sm`
|
|
81
|
+
invocation is not asked two things at once (a first `sm scan` may already
|
|
82
|
+
prompt for the provider lens). The NEXT eligible run shows the interactive
|
|
83
|
+
prompt (yes (default) / no / details). A single **yes** sets
|
|
84
|
+
`errorsEnabled`, `usageCliEnabled`, and `usageUiEnabled` all to `true` and
|
|
85
|
+
mints `anonymousId`; a **no** sets all three to `false` and mints nothing.
|
|
86
|
+
Either way it stamps `promptedAt`. On a non-eligible run (non-TTY CI,
|
|
87
|
+
pipes) nothing is asked or recorded and every surface stays OFF.
|
|
88
|
+
3. **Asked once.** Once `promptedAt` is set, the prompt MUST NOT be shown
|
|
89
|
+
again. The persisted toggles are authoritative thereafter.
|
|
90
|
+
4. **Env override.** The `SKILL_MAP_TELEMETRY=0` environment variable forces
|
|
91
|
+
OFF on every surface (errors and both usage toggles) regardless of the
|
|
92
|
+
persisted settings. It is a kill switch, not a toggle: there is no value of
|
|
93
|
+
the variable that forces ON. There is exactly one kill-switch variable for
|
|
94
|
+
all surfaces.
|
|
95
|
+
5. **Independent toggles.** After the first run, the operator changes consent
|
|
96
|
+
through the Settings UI (persisted via the BFF), the same way the
|
|
97
|
+
update-check toggle works today. The three toggles are independent:
|
|
98
|
+
`usageCliEnabled` and `usageUiEnabled` can each be turned off without
|
|
99
|
+
affecting the other or `errorsEnabled`. Because the CLI reads
|
|
100
|
+
`~/.skill-map/settings.json` fresh on every invocation, turning CLI usage
|
|
101
|
+
off from the browser is honoured on the next `sm` run. There is
|
|
102
|
+
intentionally no dedicated `sm config` key, because `sm config` writes
|
|
103
|
+
project-local settings and these flags are per-machine. A future
|
|
104
|
+
`sm telemetry` verb family MAY expose status and toggling from the CLI.
|
|
105
|
+
6. **Anonymous id.** `anonymousId` is a random UUID v4 with no personal data.
|
|
106
|
+
It is minted exactly once, the first time any usage toggle becomes `true`
|
|
107
|
+
(through the consent prompt or a Settings enable), and is never
|
|
108
|
+
regenerated for the life of the install. It is the PostHog `distinct_id`
|
|
109
|
+
shared by the CLI and UI usage surfaces so the two are attributed to one
|
|
110
|
+
install. The BFF exposes it read-only (see below) so the browser uses the
|
|
111
|
+
same id; it MUST NOT be writable over the wire.
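The deferral and ask-once rules above can be read as a small state machine. The following TypeScript sketch is one possible reading, not the package's implementation: the type and function names are hypothetical, and the UUID generator is injected as `mintId` (for instance a wrapper around a UUID v4 source) so the logic stays pure.

```typescript
interface PromptState {
  firstRunAt: number | null;
  promptedAt: number | null;
}

interface RunContext {
  isTty: boolean;             // interactive terminal
  carrierConfigured: boolean; // a Sentry DSN or the PostHog key is non-empty
  killSwitchSet: boolean;     // SKILL_MAP_TELEMETRY=0
}

type PromptAction = "skip" | "stamp-first-run" | "prompt";

// Rule 2: a run is eligible only when the prompt could appear.
function isEligible(state: PromptState, ctx: RunContext): boolean {
  return ctx.isTty && ctx.carrierConfigured && !ctx.killSwitchSet && state.promptedAt === null;
}

// First eligible run only stamps `firstRunAt`; the next eligible run prompts.
function nextPromptAction(state: PromptState, ctx: RunContext): PromptAction {
  if (!isEligible(state, ctx)) return "skip";
  return state.firstRunAt === null ? "stamp-first-run" : "prompt";
}

interface ConsentResult {
  errorsEnabled: boolean;
  usageCliEnabled: boolean;
  usageUiEnabled: boolean;
  anonymousId: string | null;
  promptedAt: number;
}

// A single yes enables all three toggles and mints the id if none exists;
// a no disables all three and mints nothing. Both stamp `promptedAt`.
function applyAnswer(
  answer: "yes" | "no",
  now: number,
  existingId: string | null,
  mintId: () => string
): ConsentResult {
  const enabled = answer === "yes";
  return {
    errorsEnabled: enabled,
    usageCliEnabled: enabled,
    usageUiEnabled: enabled,
    anonymousId: enabled ? (existingId ?? mintId()) : existingId,
    promptedAt: now,
  };
}
```

Keeping `existingId` as an input reflects the never-regenerated rule: an id that already exists is reused rather than replaced.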
|
|
112
|
+
|
|
113
|
+
## Surface: Errors (Sentry)
|
|
114
|
+
|
|
115
|
+
Three surfaces report independently so a crash can be attributed to the right
|
|
116
|
+
layer. They report to **two** Sentry projects: the two Node surfaces (CLI and
|
|
117
|
+
BFF) share one project and are told apart by a `surface` tag; the browser UI
|
|
118
|
+
reports to its own project.
|
|
119
|
+
|
|
120
|
+
| Surface | Runtime | Discriminator | Project |
|
|
121
|
+
|---|---|---|---|
|
|
122
|
+
| `sm <verb>` | Node (CLI) | `surface: cli` tag | shared Node project |
|
|
123
|
+
| `sm serve` BFF | Node (Hono) | `surface: bff` tag | shared Node project |
|
|
124
|
+
| UI | Browser (Angular) | own project | `skill-map-ui` |
|
|
125
|
+
|
|
126
|
+
The two Node surfaces share one project because they are the same workspace
|
|
127
|
+
code in the same runtime; the `surface` tag, plus the per-event `route` /
|
|
128
|
+
`method` tags, separate a CLI crash from a BFF request-path crash. The UI has
|
|
129
|
+
its own project, so it needs no `surface` tag. Each project carries a
|
|
130
|
+
hardcoded DSN (`SENTRY_DSN_NODE` for the shared Node project, `SENTRY_DSN_UI`
|
|
131
|
+
for the UI), centralized in `src/public-config.ts` and
|
|
132
|
+
`ui/src/app/core/public-config.ts`. Sentry DSNs are public by design (they
|
|
133
|
+
identify an ingest endpoint; they are not secrets) and are safe to ship in
|
|
134
|
+
the published artifact. The BFF MUST NOT emit usage events; it reports only
|
|
135
|
+
unhandled errors in the request path.
|
|
136
|
+
|
|
137
|
+
The error surfaces send **no proactive beacons**: no release-health sessions,
|
|
138
|
+
no transactions, no performance traces. An event leaves the machine ONLY when
|
|
139
|
+
an error is captured. In particular the browser SDK MUST drop the default
|
|
140
|
+
session integration so no session is sent on page load or route change.
|
|
141
|
+
|
|
142
|
+
### Error wire format
|
|
143
|
+
|
|
144
|
+
An error event MAY carry:
|
|
145
|
+
|
|
146
|
+
- A stack trace whose `filename` and `abs_path` frames have been run through
|
|
147
|
+
the path scrubber (below).
|
|
148
|
+
- Environment facts: `cli_version`, `node_major`, `os`, `arch`, and, for the
|
|
149
|
+
UI, browser family and version.
|
|
150
|
+
- The fixed tag set: `surface` (`cli` / `bff` on the shared Node project),
|
|
151
|
+
`verb`, `phase`, `plugin_id` (built-in ids only), `extension_kind`,
|
|
152
|
+
`route` (BFF), `method`, `status`.
|
|
153
|
+
- The error name, error code, and a scrubbed message.
|
|
154
|
+
- Breadcrumbs (a bounded recent-event trail) with each message scrubbed.
|
|
155
|
+
|
|
156
|
+
## Surface: Usage (PostHog)
|
|
157
|
+
|
|
158
|
+
Usage analytics are carried by **PostHog Cloud (EU region)**, for data
|
|
159
|
+
residency parity with the Sentry `.de` projects. The public PostHog project
|
|
160
|
+
key is hardcoded and centralized in `src/public-config.ts` (`POSTHOG_KEY_NODE`)
|
|
161
|
+
and `ui/src/app/core/public-config.ts` (`POSTHOG_KEY_UI`). Like a Sentry DSN,
|
|
162
|
+
a PostHog project key is a public ingest identifier, not a secret, and is safe
|
|
163
|
+
to ship. Setting a key to the empty string `''` forces that surface dormant
|
|
164
|
+
(no init, no network, the SDK is not even imported), the same dormancy gate
|
|
165
|
+
the error surface uses.
|
|
166
|
+
|
|
167
|
+
Only **two** runtimes emit usage events:
|
|
168
|
+
|
|
169
|
+
| Surface | Runtime | Toggle | Carrier |
|
|
170
|
+
|---|---|---|---|
|
|
171
|
+
| `sm <verb>` | Node (CLI) | `usageCliEnabled` | PostHog (server SDK) |
|
|
172
|
+
| UI | Browser (Angular) | `usageUiEnabled` | PostHog (browser SDK) |
|
|
173
|
+
|
|
174
|
+
The **BFF MUST NOT emit usage events** (the BFF's activity is the UI's
|
|
175
|
+
activity, already covered by the UI surface; double-emitting would
|
|
176
|
+
double-count). The BFF participates only by reading/writing consent and by
|
|
177
|
+
exposing `anonymousId` read-only on `GET /api/preferences` so the browser uses
|
|
178
|
+
the same `distinct_id` as the CLI.
|
|
179
|
+
|
|
180
|
+
Both usage SDKs are configured to send nothing beyond the allow-list below:
|
|
181
|
+
PostHog autocapture, pageview/pageleave capture, session recording, and
|
|
182
|
+
client IP / geo-IP enrichment are all disabled.
|
|
183
|
+
|
|
184
|
+
## Usage event taxonomy
|
|
185
|
+
|
|
186
|
+
Usage collection is **deny by default**: only the events and properties named
|
|
187
|
+
here may be sent. Every event carries `distinct_id = telemetry.anonymousId`,
|
|
188
|
+
the common environment facts (`cli_version`, `node_major`, `os`, `arch`; the UI
|
|
189
|
+
additionally carries browser family/version where the SDK provides it), and
|
|
190
|
+
`environment` (`dev` / `prod`, see below). The UI also attaches
|
|
191
|
+
the active theme as super-properties on every event: `theme_base` (`light` /
|
|
192
|
+
`dark`) and `theme_extra` (the active extra theme id, or `none`); future extra
|
|
193
|
+
themes flow through by value with no spec change. No other identity property is
|
|
194
|
+
ever attached.
|
|
195
|
+
|
|
196
|
+
The `environment` tag lets the maintainers filter their own dogfooding out of
|
|
197
|
+
the real-world data. It is `dev` when the `SKILL_MAP_TELEMETRY_ENV` environment
|
|
198
|
+
variable is set to any non-empty value other than a production marker
|
|
199
|
+
(`prod` / `production`); the dev tooling sets it. It is `prod` when the
|
|
200
|
+
variable is absent, empty, or a production marker. It is NOT a kill switch (it
|
|
201
|
+
never disables telemetry, only labels the source) and rides on both surfaces:
|
|
202
|
+
usage events as above, and Sentry's native `environment` field on error
|
|
203
|
+
events.
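The `environment` rule in the paragraph above reduces to a three-way check. A minimal TypeScript sketch follows; the function name is hypothetical, and exact lower-case matching of the production markers is an assumption, since the text does not say whether the comparison is case-sensitive.

```typescript
type TelemetryEnvironment = "dev" | "prod";

// `value` stands for the SKILL_MAP_TELEMETRY_ENV variable (undefined when unset).
function resolveEnvironment(value: string | undefined): TelemetryEnvironment {
  // Absent or empty: real-world install.
  if (value === undefined || value === "") return "prod";
  // Explicit production markers (assumed to match case-sensitively).
  if (value === "prod" || value === "production") return "prod";
  // Any other non-empty value labels the source as dev.
  return "dev";
}
```

The function only labels events; it never returns a value that would disable telemetry, which matches the statement that the variable is not a kill switch.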
|
|
204
|
+
|
|
205
|
+
| Event | Surface | Properties |
|
|
206
|
+
|---|---|---|
|
|
207
|
+
| `cli.<verb>` | CLI | `flags` (array of flag NAMES that were set), and on a scan, `extensions` (deduped, sorted set of built-in extractor ids that ran in the walk). One event per invocation; the event NAME is the verb (`cli.scan`, `cli.check`, ...), restricted to the registered closed verb set so an unknown command collapses to `cli.unknown` (a typo never mints a junk event name). |
|
|
208
|
+
| `ui.view.<view>` | UI | the opened view is the event name (`ui.view.map`, `ui.view.files`), from a closed route set. No properties beyond the common env facts. One per route change. |
|
|
209
|
+
| `ui.feature.<feature>` | UI | the opened feature is the event name (`ui.feature.inspector`, `ui.feature.settings`), from a closed set. |
|
|
210
|
+
| `plugin.apply` | CLI + UI | `enabled` / `disabled`: deduped, sorted sets of the plugin / extension ids toggled (built-in ids pass through, third-party collapse to `external_plugin`). Emitted on `sm plugins enable` / `disable` and on the Settings plugins Apply. |
|
|
211
|
+
|
|
212
|
+
Rules:
|
|
213
|
+
|
|
214
|
+
- **Flag names only, never values.** `--max-nodes 500` reports the name
|
|
215
|
+
`max-nodes`, never `500`.
|
|
216
|
+
- **Extractor ids are presence, not counts.** `extensions` is a set; it never
|
|
217
|
+
carries how many nodes an extractor processed or how large the project is.
|
|
218
|
+
Only extractors that ran in the walk appear (cached extractors on an
|
|
219
|
+
incremental scan do not), so the signal is "which extractors this project
|
|
220
|
+
exercises", aggregated across runs.
|
|
221
|
+
- **Third-party ids collapse.** Any extension id whose plugin is not a
|
|
222
|
+
built-in (`claude`, `antigravity`, `openai`, `agent-skills`, `core`) MUST be
|
|
223
|
+
replaced with the literal `external_plugin` before the event leaves the
|
|
224
|
+
machine.
|
|
225
|
+
- **No node paths, titles, or content** in any UI event; the view / feature is
|
|
226
|
+
the event name, drawn from a closed set, and nothing else is attached.
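To make the rules above concrete, here is a hedged TypeScript sketch of assembling a `cli.<verb>` event. Everything named here is illustrative: `KNOWN_VERBS` is a two-entry stand-in for the registered verb set (which this document does not list), extension ids are assumed to take the form `<plugin>/<name>`, and sorting the flag names is a choice of this sketch rather than a stated requirement.

```typescript
// Built-in plugin ids as listed in the rules above.
const BUILT_IN_PLUGINS = ["claude", "antigravity", "openai", "agent-skills", "core"];
// Illustrative subset only; the real closed verb set is not given here.
const KNOWN_VERBS = ["scan", "check"];

interface CliUsageEvent {
  event: string;
  properties: { flags: string[]; extensions?: string[] };
}

// Third-party ids collapse to the literal `external_plugin`.
function collapseExtensionId(id: string): string {
  const plugin = id.split("/", 1)[0] ?? id; // assumed `<plugin>/<name>` form
  return BUILT_IN_PLUGINS.indexOf(plugin) !== -1 ? id : "external_plugin";
}

// Presence, not counts: duplicates are dropped and the result is sorted.
function dedupeSorted(values: string[]): string[] {
  return values.filter((v, i) => values.indexOf(v) === i).sort();
}

// `parsedFlags` maps flag names to their values; only the names are read.
function buildCliEvent(
  verb: string,
  parsedFlags: Record<string, unknown>,
  extensionIds?: string[]
): CliUsageEvent {
  // An unknown command collapses to `cli.unknown`.
  const name = KNOWN_VERBS.indexOf(verb) !== -1 ? verb : "unknown";
  const properties: CliUsageEvent["properties"] = {
    flags: Object.keys(parsedFlags).sort(),
  };
  if (name === "scan" && extensionIds !== undefined) {
    properties.extensions = dedupeSorted(extensionIds.map(collapseExtensionId));
  }
  return { event: "cli." + name, properties };
}
```

With this shape, `--max-nodes 500` contributes only the key `max-nodes` to `flags`; the value `500` is never read.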
|
|
227
|
+
|
|
228
|
+
## Scrubbing rules (shared)
|
|
229
|
+
|
|
230
|
+
Scrubbing is **deny by default** and applied client-side in each SDK's
|
|
231
|
+
pre-send hook (`beforeSend` for Sentry, `before_send` for PostHog), before any
|
|
232
|
+
event leaves the machine. It applies to error events AND usage event
|
|
233
|
+
properties (defense in depth: the usage collectors emit only names and enums
|
|
234
|
+
by construction, but every event's payload is still walked). An event MUST
|
|
235
|
+
have the following removed or replaced:
|
|
236
|
+
|
|
237
|
+
- **Absolute paths**, anywhere they appear (frame `abs_path`, frame
|
|
238
|
+
`filename`, inside the error message, inside breadcrumb messages, inside
|
|
239
|
+
any nested event or property field). The user's home directory is replaced
|
|
240
|
+
with the literal `<HOME>` and the OS username with `<USER>`.
|
|
241
|
+
- **File names of user content** (scanned markdown files).
|
|
242
|
+
- **Markdown bodies, frontmatter values, annotation contents.** None of these
|
|
243
|
+
are ever attached to an event.
|
|
244
|
+
- **IP address.** Opted out client-side and disabled at the project level.
|
|
245
|
+
- **Hostname** (`server_name` stripped).
|
|
246
|
+
- **OS username.**
|
|
247
|
+
- **Third-party plugin ids.** Only built-in plugin ids may appear; any
|
|
248
|
+
non-built-in id MUST be replaced with the literal `external_plugin`.
|
|
249
|
+
- **Settings values** (`scan.extraFolders`, `scan.referencePaths`, etc.).
|
|
250
|
+
|
|
251
|
+
The scrubber is a pure function with no SDK dependency, so it can be unit
|
|
252
|
+
tested against hostile inputs (Windows paths, symlinked paths, paths embedded
|
|
253
|
+
mid-message, nested `abs_path` fields, breadcrumb data, structured usage
|
|
254
|
+
property objects) independently of the SDK wiring.
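A pure scrubber of the kind described above might look like the following TypeScript sketch. It is deliberately partial and is not the package's scrubber: it covers only the `<HOME>` and `<USER>` substitutions, applied recursively to every string in a payload, and leaves out the other removals in the list. The names `ScrubContext`, `scrubString`, and `scrubValue` are hypothetical. Replacing the username as a plain substring is intentionally over-eager, so a short username may also rewrite unrelated text.

```typescript
interface ScrubContext {
  homeDir: string;  // e.g. the operator's home directory
  username: string; // the OS username
}

// Escape characters that are special in a regular expression, so Windows
// paths such as C:\Users\name are matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace the home directory first, then any remaining username occurrences.
function scrubString(text: string, ctx: ScrubContext): string {
  let out = text;
  if (ctx.homeDir !== "") {
    out = out.replace(new RegExp(escapeRegExp(ctx.homeDir), "g"), "<HOME>");
  }
  if (ctx.username !== "") {
    out = out.replace(new RegExp(escapeRegExp(ctx.username), "g"), "<USER>");
  }
  return out;
}

// Walk arrays and plain objects so nested fields (frames, breadcrumbs,
// usage properties) are scrubbed too. Returns a new value; input is untouched.
function scrubValue(value: unknown, ctx: ScrubContext): unknown {
  if (typeof value === "string") return scrubString(value, ctx);
  if (Array.isArray(value)) return value.map((item) => scrubValue(item, ctx));
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const result: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      result[key] = scrubValue(source[key], ctx);
    }
    return result;
  }
  return value; // numbers, booleans, null, undefined pass through
}
```

Because the functions take the home directory and username as arguments instead of reading them from the process, they can be exercised in unit tests with hostile inputs without any SDK wiring.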
|
|
255
|
+
|
|
256
|
+
## Server-side guarantees
|
|
257
|
+
|
|
258
|
+
As a second line of defense behind the client-side scrubber:
|
|
259
|
+
|
|
260
|
+
- Each **Sentry** project MUST be configured to not store IP addresses and to
|
|
261
|
+
run a server-side data-scrubbing rule with the same path pattern as the
|
|
262
|
+
client scrubber. The UI error surface additionally restricts reporting to
|
|
263
|
+
loopback: Sentry retired its server-side allowed-domains setting, so this is
|
|
264
|
+
enforced client-side via the SDK `allowUrls` option pinned to `localhost` /
|
|
265
|
+
`127.0.0.1` (the UI is only ever served from loopback).
|
|
266
|
+
- The **PostHog** project MUST be configured to discard client IP addresses
|
|
267
|
+
and disable geo-IP enrichment (the client SDKs also disable geo and
|
|
268
|
+
autocapture, but the project setting is the backstop).
|
|
269
|
+
|
|
270
|
+
## Stability
|
|
271
|
+
|
|
272
|
+
The **consent model** (default OFF on every surface, the `telemetry` toggles
|
|
273
|
+
and bookkeeping in `user-settings.schema.json`, the `SKILL_MAP_TELEMETRY=0`
|
|
274
|
+
kill switch, prompt-once semantics) is stable as of the spec minor in which it
|
|
275
|
+
lands. Loosening any default (anything other than OFF), removing the kill
|
|
276
|
+
switch, or removing the consent gate is a major bump.
|
|
277
|
+
|
|
278
|
+
The **two surfaces are independent.** Error scope and usage scope each evolve
|
|
279
|
+
on their own minor bump. Adding a new usage event or property, or a new error
|
|
280
|
+
tag or environment fact, is a minor bump. Performance traces remain out of
|
|
281
|
+
scope on both surfaces and would be a third, separately-consented surface.
|
|
282
|
+
|
|
283
|
+
The **`anonymousId` exception** is normatively scoped to the usage surface
|
|
284
|
+
only: it is the one anonymous correlation id the contract permits, and the
|
|
285
|
+
error surface MUST remain free of any cross-session or cross-install id.
|
|
286
|
+
Widening the id beyond usage, or attaching any identity to it, is a major
|
|
287
|
+
bump.
|
|
288
|
+
|
|
289
|
+
The scrubbing exclusion list (what MUST NOT leave the machine) is the stable,
|
|
290
|
+
normative core and may only grow, never shrink, without a major bump.
|
|
291
|
+
|
|
292
|
+
Consumers and alternate implementations MAY choose not to ship either surface;
|
|
293
|
+
both are optional. An implementation that does ship a surface MUST honor the
|
|
294
|
+
consent contract and the scrubbing rules in full.
|