@skill-map/spec 0.40.0 → 0.41.0

@@ -34,6 +34,37 @@
  "description": "Unix milliseconds of the last banner emission to stderr. `null` (or absent) when never shown. Used so a single probe does not re-emit the banner across back-to-back `sm` invocations."
  }
  }
+ },
+ "telemetry": {
+ "type": "object",
+ "additionalProperties": false,
+ "description": "User consent + prompt bookkeeping for the opt-in, anonymous telemetry surfaces (see `spec/telemetry.md`). All surfaces default OFF and run only after explicit consent. Three independent toggles (`errorsEnabled` for Sentry error reporting, `usageCliEnabled` / `usageUiEnabled` for PostHog usage analytics) plus the usage `distinct_id` (`anonymousId`) and the shared prompt bookkeeping (`firstRunAt`, `promptedAt`) the CLI maintains so it asks once, at the right time.",
+ "properties": {
+ "errorsEnabled": {
+ "type": "boolean",
+ "description": "Operator opt-in toggle for error reporting (Sentry). **Default OFF**: when absent or `false`, no Sentry SDK is initialised and no event leaves the machine. Set to `true` only after explicit consent (the consent prompt or Settings UI). The `SKILL_MAP_TELEMETRY=0` env var forces OFF regardless of this value."
+ },
+ "usageCliEnabled": {
+ "type": "boolean",
+ "description": "Operator opt-in toggle for CLI usage analytics (PostHog). **Default OFF**: when absent or `false`, no PostHog SDK is initialised in the CLI and no usage event leaves the machine. Independent of `errorsEnabled` and `usageUiEnabled`; the `SKILL_MAP_TELEMETRY=0` env var forces OFF regardless."
+ },
+ "usageUiEnabled": {
+ "type": "boolean",
+ "description": "Operator opt-in toggle for UI usage analytics (PostHog). **Default OFF**: when absent or `false`, the browser never loads the PostHog SDK. Independent of `errorsEnabled` and `usageCliEnabled`; the `SKILL_MAP_TELEMETRY=0` env var forces OFF regardless."
+ },
+ "anonymousId": {
+ "type": ["string", "null"],
+ "description": "Random UUID v4 used as the PostHog `distinct_id` for the usage surface, shared by the CLI and UI so both are attributed to one install. Carries no personal data. Minted exactly once, the first time any usage toggle becomes `true`, and never regenerated. The single anonymous correlation id the contract permits, scoped to usage only; the BFF exposes it read-only and it is never writable over the wire. `null` (or absent) until usage is first enabled."
+ },
+ "firstRunAt": {
+ "type": ["integer", "null"],
+ "description": "Unix milliseconds of the first run on which the consent prompt was eligible to appear (interactive TTY, at least one carrier configured, not opted-out by env, not yet answered). The prompt is intentionally deferred to the NEXT eligible run so it does not stack on top of the first-run provider-lens prompt. `null` (or absent) before any eligible run."
+ },
+ "promptedAt": {
+ "type": ["integer", "null"],
+ "description": "Unix milliseconds of the moment the shared consent prompt was shown. `null` (or absent) when the user has never been prompted. Once set, the prompt is never shown again, the persisted toggles are authoritative."
+ }
+ }
  }
  }
  }
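As an illustration only (this is not part of the published package), the new `telemetry` object and its "absent or `false` means OFF" rule could be mirrored in TypeScript roughly as follows. The names `TelemetrySettings`, `EffectiveToggles`, and `resolveToggles` are hypothetical; only the property names and the `SKILL_MAP_TELEMETRY=0` kill switch come from the schema descriptions above.

```typescript
// Hypothetical TypeScript mirror of the `telemetry` object added in this hunk.
interface TelemetrySettings {
  errorsEnabled?: boolean;
  usageCliEnabled?: boolean;
  usageUiEnabled?: boolean;
  anonymousId?: string | null;
  firstRunAt?: number | null;
  promptedAt?: number | null;
}

// Hypothetical result type: what is actually allowed to run.
interface EffectiveToggles {
  errors: boolean;
  usageCli: boolean;
  usageUi: boolean;
}

// Absent or `false` is treated as OFF, and SKILL_MAP_TELEMETRY=0 forces OFF
// regardless of the persisted values. No value of the variable forces ON.
function resolveToggles(
  telemetry: TelemetrySettings | undefined,
  env: Record<string, string | undefined>,
): EffectiveToggles {
  const killed = env["SKILL_MAP_TELEMETRY"] === "0";
  const on = (value: boolean | undefined): boolean => !killed && value === true;
  return {
    errors: on(telemetry?.errorsEnabled),
    usageCli: on(telemetry?.usageCliEnabled),
    usageUi: on(telemetry?.usageUiEnabled),
  };
}
```

A caller would pass the parsed `telemetry` object from the settings file and the process environment; an empty settings file yields all three toggles OFF.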
package/telemetry.md ADDED
@@ -0,0 +1,294 @@
+ # Telemetry
+
+ skill-map is a local-first tool. By default it sends **nothing** off the
+ operator's machine. This document is the normative contract for the optional
+ exceptions: two independently-consented, anonymous telemetry surfaces, both
+ **OFF by default**.
+
+ - **Error reporting** (Sentry), so crashes happening in installations the
+ maintainers do not control can be learned about and fixed.
+ - **Usage analytics** (PostHog), so the maintainers can learn which verbs and
+ built-in extensions are actually used in the wild and prioritise the
+ roadmap accordingly.
+
+ The two surfaces share one consent prompt, one kill switch, and one
+ scrubber, but each has its own carrier, its own toggle, and its own stability
+ contract. Either can be shipped, or not, independently.
+
+ ## Scope and non-goals
+
+ In scope:
+
+ - **Errors.** Uncaught exceptions and unhandled rejections in the CLI
+ process, unhandled errors in the BFF (`sm serve`) request path, unhandled
+ runtime errors in the browser UI, plus a small fixed set of triage tags
+ (`surface`, `verb`, `phase`, `plugin_id` for built-ins, `extension_kind`,
+ `route`, `method`, `status`).
+ - **Usage.** Which `sm` verb ran and the NAMES of the flags it was given;
+ the set of built-in extension ids that executed during a scan (presence,
+ not volume); which UI view or feature was opened. Plus environment facts
+ (`cli_version`, `node_major`, `os`, `arch`).
+
+ Out of scope (MUST NOT be collected under this contract, on either surface):
+
+ - **Flag values, file names, markdown bodies, frontmatter values, annotation
+ contents, settings values.** Only flag names and built-in extension ids
+ ever leave the machine.
+ - **Performance traces:** latency, throughput, span timing.
+ - **Project-shape signals:** file counts, node counts, frontmatter key sets,
+ project size. "Which extensions ran" is presence only, never a count.
+ - **Any cross-session or cross-install correlation identifier**, with one
+ documented exception: the single anonymous usage `distinct_id`
+ (`telemetry.anonymousId`, below), which carries no identity and exists only
+ so usage events from the same install can be de-duplicated. The error
+ surface carries no correlation id at all.
+
+ ## Consent contract (shared)
+
+ Both surfaces are **OFF by default**. They run only after the operator has
+ explicitly opted in. Consent state lives in the user-settings file at
+ `~/.skill-map/settings.json` under the `telemetry` object (see
+ [`user-settings.schema.json`](./schemas/user-settings.schema.json) and the
+ narrow `$HOME` exception in [`cli-contract.md`](./cli-contract.md) §User-settings file):
+
+ - `telemetry.errorsEnabled` (boolean). Opt-in for error reporting. Absent or
+ `false` MUST be treated as OFF.
+ - `telemetry.usageCliEnabled` (boolean). Opt-in for CLI usage analytics.
+ Absent or `false` MUST be treated as OFF.
+ - `telemetry.usageUiEnabled` (boolean). Opt-in for UI usage analytics. Absent
+ or `false` MUST be treated as OFF.
+ - `telemetry.anonymousId` (string UUID, or null). The PostHog `distinct_id`
+ for the usage surface. Minted once when any usage toggle first becomes
+ `true`; never regenerated. The single allowed anonymous correlation id,
+ scoped to usage only.
+ - `telemetry.firstRunAt` (integer milliseconds, or null). Records the first
+ run on which the prompt was eligible, so the prompt can be deferred to the
+ next eligible run.
+ - `telemetry.promptedAt` (integer milliseconds, or null). Records when the
+ consent prompt was shown so it is never shown twice.
+
+ Rules:
+
+ 1. **Default OFF.** When a toggle is absent or `false`, the matching SDK is
+ not initialised, no endpoint is contacted, and there is zero added latency.
+ This MUST hold on every surface (CLI, BFF, UI).
+ 2. **One shared consent prompt, TTY only, deferred to the second eligible
+ run.** A run is "eligible" when the prompt could appear: an interactive
+ terminal (`process.stdout.isTTY` true), at least one carrier configured
+ (a Sentry DSN or the PostHog key non-empty), the kill switch unset, and
+ `promptedAt` absent. The CLI MUST NOT prompt on the FIRST eligible run; it
+ only stamps `firstRunAt` and stays silent, so the operator's first `sm`
+ invocation is not asked two things at once (a first `sm scan` may already
+ prompt for the provider lens). The NEXT eligible run shows the interactive
+ prompt (yes (default) / no / details). A single **yes** sets
+ `errorsEnabled`, `usageCliEnabled`, and `usageUiEnabled` all to `true` and
+ mints `anonymousId`; a **no** sets all three to `false` and mints nothing.
+ Either way it stamps `promptedAt`. On a non-eligible run (non-TTY CI,
+ pipes) nothing is asked or recorded and every surface stays OFF.
+ 3. **Asked once.** Once `promptedAt` is set, the prompt MUST NOT be shown
+ again. The persisted toggles are authoritative thereafter.
+ 4. **Env override.** The `SKILL_MAP_TELEMETRY=0` environment variable forces
+ OFF on every surface (errors and both usage toggles) regardless of the
+ persisted settings. It is a kill switch, not a toggle: there is no value of
+ the variable that forces ON. There is exactly one kill-switch variable for
+ all surfaces.
+ 5. **Independent toggles.** After the first run, the operator changes consent
+ through the Settings UI (persisted via the BFF), the same way the
+ update-check toggle works today. The three toggles are independent:
+ `usageCliEnabled` and `usageUiEnabled` can each be turned off without
+ affecting the other or `errorsEnabled`. Because the CLI reads
+ `~/.skill-map/settings.json` fresh on every invocation, turning CLI usage
+ off from the browser is honoured on the next `sm` run. There is
+ intentionally no dedicated `sm config` key, because `sm config` writes
+ project-local settings and these flags are per-machine. A future
+ `sm telemetry` verb family MAY expose status and toggling from the CLI.
+ 6. **Anonymous id.** `anonymousId` is a random UUID v4 with no personal data.
+ It is minted exactly once, the first time any usage toggle becomes `true`
+ (through the consent prompt or a Settings enable), and is never
+ regenerated for the life of the install. It is the PostHog `distinct_id`
+ shared by the CLI and UI usage surfaces so the two are attributed to one
+ install. The BFF exposes it read-only (see below) so the browser uses the
+ same id; it MUST NOT be writable over the wire.
+
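As an illustration only (not part of the published package), rules 2, 3, and 6 can be read as a small state machine over the bookkeeping fields. The sketch below assumes hypothetical names (`TelemetryState`, `RunContext`, `planRun`, `applyAnswer`) and leaves out the prompt's "details" branch, whose behaviour the spec does not describe; `randomUUID` from `node:crypto` returns a version 4 UUID.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical shape of the persisted `telemetry` object.
interface TelemetryState {
  errorsEnabled?: boolean;
  usageCliEnabled?: boolean;
  usageUiEnabled?: boolean;
  anonymousId?: string | null;
  firstRunAt?: number | null;
  promptedAt?: number | null;
}

// Hypothetical per-run facts the CLI would gather before deciding.
interface RunContext {
  isTTY: boolean;             // process.stdout.isTTY
  carrierConfigured: boolean; // a Sentry DSN or the PostHog key is non-empty
  killSwitch: boolean;        // SKILL_MAP_TELEMETRY=0 is set
}

// A run is eligible when the prompt could appear at all (rule 2).
function isEligible(state: TelemetryState, ctx: RunContext): boolean {
  return ctx.isTTY && ctx.carrierConfigured && !ctx.killSwitch && state.promptedAt == null;
}

// First eligible run: stamp firstRunAt and stay silent.
// Next eligible run: show the prompt. Non-eligible run: record nothing.
function planRun(
  state: TelemetryState,
  ctx: RunContext,
  now: number,
): { state: TelemetryState; showPrompt: boolean } {
  if (!isEligible(state, ctx)) return { state, showPrompt: false };
  if (state.firstRunAt == null) {
    return { state: { ...state, firstRunAt: now }, showPrompt: false };
  }
  return { state, showPrompt: true };
}

// "yes" enables all three toggles and mints the id once; "no" disables all
// three and mints nothing. Either answer stamps promptedAt (rule 3).
function applyAnswer(state: TelemetryState, answer: "yes" | "no", now: number): TelemetryState {
  const enabled = answer === "yes";
  const existingId = state.anonymousId ?? null;
  return {
    ...state,
    errorsEnabled: enabled,
    usageCliEnabled: enabled,
    usageUiEnabled: enabled,
    anonymousId: enabled ? existingId ?? randomUUID() : existingId,
    promptedAt: now,
  };
}
```

Keeping the decision pure (state in, state out) means the TTY check, the settings-file write, and the actual prompt rendering stay at the edges, where they are easier to test separately.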
+ ## Surface: Errors (Sentry)
+
+ Three surfaces report independently so a crash can be attributed to the right
+ layer. They report to **two** Sentry projects: the two Node surfaces (CLI and
+ BFF) share one project and are told apart by a `surface` tag; the browser UI
+ reports to its own project.
+
+ | Surface | Runtime | Discriminator | Project |
+ |---|---|---|---|
+ | `sm <verb>` | Node (CLI) | `surface: cli` tag | shared Node project |
+ | `sm serve` BFF | Node (Hono) | `surface: bff` tag | shared Node project |
+ | UI | Browser (Angular) | own project | `skill-map-ui` |
+
+ The two Node surfaces share one project because they are the same workspace
+ code in the same runtime; the `surface` tag, plus the per-event `route` /
+ `method` tags, separate a CLI crash from a BFF request-path crash. The UI has
+ its own project, so it needs no `surface` tag. Each project carries a
+ hardcoded DSN (`SENTRY_DSN_NODE` for the shared Node project, `SENTRY_DSN_UI`
+ for the UI), centralized in `src/public-config.ts` and
+ `ui/src/app/core/public-config.ts`. Sentry DSNs are public by design (they
+ identify an ingest endpoint, they are not secrets) and are safe to ship in
+ the published artifact. The BFF MUST NOT emit usage events; it reports only
+ unhandled errors in the request path.
+
+ The error surfaces send **no proactive beacons**: no release-health sessions,
+ no transactions, no performance traces. An event leaves the machine ONLY when
+ an error is captured. In particular the browser SDK MUST drop the default
+ session integration so no session is sent on page load or route change.
+
+ ### Error wire format
+
+ An error event MAY carry:
+
+ - A stack trace whose `filename` and `abs_path` frames have been run through
+ the path scrubber (below).
+ - Environment facts: `cli_version`, `node_major`, `os`, `arch`, and, for the
+ UI, browser family and version.
+ - The fixed tag set: `surface` (`cli` / `bff` on the shared Node project),
+ `verb`, `phase`, `plugin_id` (built-in ids only), `extension_kind`,
+ `route` (BFF), `method`, `status`.
+ - The error name, error code, and a scrubbed message.
+ - Breadcrumbs (a bounded recent-event trail) with each message scrubbed.
+
+ ## Surface: Usage (PostHog)
+
+ Usage analytics are carried by **PostHog Cloud (EU region)**, for data
+ residency parity with the Sentry `.de` projects. The public PostHog project
+ key is hardcoded and centralized in `src/public-config.ts` (`POSTHOG_KEY_NODE`)
+ and `ui/src/app/core/public-config.ts` (`POSTHOG_KEY_UI`). Like a Sentry DSN,
+ a PostHog project key is a public ingest identifier, not a secret, and is safe
+ to ship. Setting a key to the empty string `''` forces that surface dormant
+ (no init, no network, the SDK is not even imported), the same dormancy gate
+ the error surface uses.
+
+ Only **two** runtimes emit usage events:
+
+ | Surface | Runtime | Toggle | Carrier |
+ |---|---|---|---|
+ | `sm <verb>` | Node (CLI) | `usageCliEnabled` | PostHog (server SDK) |
+ | UI | Browser (Angular) | `usageUiEnabled` | PostHog (browser SDK) |
+
+ The **BFF MUST NOT emit usage events** (the BFF's activity is the UI's
+ activity, already covered by the UI surface; double-emitting would
+ double-count). The BFF participates only by reading/writing consent and by
+ exposing `anonymousId` read-only on `GET /api/preferences` so the browser uses
+ the same `distinct_id` as the CLI.
+
+ Both usage SDKs are configured to send nothing beyond the allow-list below:
+ PostHog autocapture, pageview/pageleave capture, session recording, and
+ client IP / geo-IP enrichment are all disabled.
+
+ ## Usage event taxonomy
+
+ Usage collection is **deny by default**: only the events and properties named
+ here may be sent. Every event carries `distinct_id = telemetry.anonymousId`,
+ the common environment facts (`cli_version`, `node_major`, `os`, `arch`; the UI
+ additionally carries browser family/version where the SDK provides it), and
+ `environment` (`dev` / `prod`, see below). The UI also attaches
+ the active theme as super-properties on every event: `theme_base` (`light` /
+ `dark`) and `theme_extra` (the active extra theme id, or `none`); future extra
+ themes flow through by value with no spec change. No other identity property is
+ ever attached.
+
+ The `environment` tag lets the maintainers filter their own dogfooding out of
+ the real-world data. It is `dev` when the `SKILL_MAP_TELEMETRY_ENV` environment
+ variable is set to any non-empty value other than a production marker
+ (`prod` / `production`); the dev tooling sets it. It is `prod` when the
+ variable is absent, empty, or a production marker. It is NOT a kill switch (it
+ never disables telemetry, only labels the source) and rides on both surfaces:
+ usage events as above, and Sentry's native `environment` field on error
+ events.
+
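As an illustration only (not part of the published package), the `environment` rule above reduces to a three-way check. The function name `resolveEnvironment` is hypothetical, and exact, case-sensitive matching of the production markers is an assumption; the spec does not say whether case or surrounding whitespace is normalised.

```typescript
type TelemetryEnvironment = "dev" | "prod";

// Label the event source from the raw SKILL_MAP_TELEMETRY_ENV value.
// This only labels events; it never enables or disables telemetry.
function resolveEnvironment(raw: string | undefined): TelemetryEnvironment {
  // Absent or empty counts as production.
  if (raw === undefined || raw === "") return "prod";
  // Explicit production markers (assumed to be matched exactly).
  if (raw === "prod" || raw === "production") return "prod";
  // Any other non-empty value marks a development source.
  return "dev";
}
```

A caller would pass `process.env.SKILL_MAP_TELEMETRY_ENV` and attach the result both to usage events and to Sentry's `environment` field.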
+ | Event | Surface | Properties |
+ |---|---|---|
+ | `cli.<verb>` | CLI | `flags` (array of flag NAMES that were set), and on a scan, `extensions` (deduped, sorted set of built-in extractor ids that ran in the walk). One event per invocation; the event NAME is the verb (`cli.scan`, `cli.check`, ...), restricted to the registered closed verb set so an unknown command collapses to `cli.unknown` (a typo never mints a junk event name). |
+ | `ui.view.<view>` | UI | the opened view is the event name (`ui.view.map`, `ui.view.files`), from a closed route set. No properties beyond the common env facts. One per route change. |
+ | `ui.feature.<feature>` | UI | the opened feature is the event name (`ui.feature.inspector`, `ui.feature.settings`), from a closed set. |
+ | `plugin.apply` | CLI + UI | `enabled` / `disabled`: deduped, sorted sets of the plugin / extension ids toggled (built-in ids pass through, third-party collapse to `external_plugin`). Emitted on `sm plugins enable` / `disable` and on the Settings plugins Apply. |
+
+ Rules:
+
+ - **Flag names only, never values.** `--max-nodes 500` reports the name
+ `max-nodes`, never `500`.
+ - **Extractor ids are presence, not counts.** `extensions` is a set; it never
+ carries how many nodes an extractor processed or how large the project is.
+ Only extractors that ran in the walk appear (cached extractors on an
+ incremental scan do not), so the signal is "which extractors this project
+ exercises", aggregated across runs.
+ - **Third-party ids collapse.** Any extension id whose plugin is not a
+ built-in (`claude`, `antigravity`, `openai`, `agent-skills`, `core`) MUST be
+ replaced with the literal `external_plugin` before the event leaves the
+ machine.
+ - **No node paths, titles, or content** in any UI event; the view / feature is
+ the event name, drawn from a closed set, and nothing else is attached.
+
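As an illustration only (not part of the published package), the `cli.<verb>` row and the rules above could be applied when building an event as sketched below. Everything here beyond the spec's own literals is an assumption: the `KNOWN_VERBS` contents (only `scan` and `check` are named in the spec), the `RanExtension` shape with a separate `pluginId`, the example extension ids, and the naive `--flag` / `--flag=value` argv parsing.

```typescript
// Built-in plugin ids, as listed in the rules above.
const BUILT_IN_PLUGINS = new Set(["claude", "antigravity", "openai", "agent-skills", "core"]);

// Hypothetical closed verb set; the real one is the registered verb list.
const KNOWN_VERBS = new Set(["scan", "check"]);

// Hypothetical record of an extension that ran during the walk.
interface RanExtension {
  id: string;
  pluginId: string;
}

interface CliUsageEvent {
  name: string;
  properties: { flags: string[]; extensions?: string[] };
}

// Keep flag NAMES only: `--max-nodes 500` and `--max-nodes=500` both yield
// `max-nodes`; bare values such as `500` are never recorded.
function flagNames(argv: string[]): string[] {
  const names = new Set<string>();
  for (const token of argv) {
    if (!token.startsWith("--") || token === "--") continue;
    const name = token.slice(2).split("=")[0];
    if (name) names.add(name);
  }
  return Array.from(names).sort();
}

// Presence only: a deduped, sorted set, with third-party ids collapsed.
function extensionSet(ran: RanExtension[]): string[] {
  const ids = new Set<string>();
  for (const ext of ran) {
    ids.add(BUILT_IN_PLUGINS.has(ext.pluginId) ? ext.id : "external_plugin");
  }
  return Array.from(ids).sort();
}

function buildCliEvent(verb: string, argv: string[], ran: RanExtension[]): CliUsageEvent {
  // Unknown verbs collapse so a typo never mints a new event name.
  const name = KNOWN_VERBS.has(verb) ? `cli.${verb}` : "cli.unknown";
  const properties: CliUsageEvent["properties"] = { flags: flagNames(argv) };
  if (verb === "scan") properties.extensions = extensionSet(ran);
  return { name, properties };
}
```

Because both properties are built as sets, running the same extractor over many files, or repeating a flag, changes nothing in the payload, which is what keeps volume and project shape out of the event.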
+ ## Scrubbing rules (shared)
+
+ Scrubbing is **deny by default** and applied client-side in each SDK's
+ pre-send hook (`beforeSend` for Sentry, `before_send` for PostHog), before any
+ event leaves the machine. It applies to error events AND usage event
+ properties (defense in depth: the usage collectors emit only names and enums
+ by construction, but every event's payload is still walked). An event MUST
+ have the following removed or replaced:
+
+ - **Absolute paths**, anywhere they appear (frame `abs_path`, frame
+ `filename`, inside the error message, inside breadcrumb messages, inside
+ any nested event or property field). The user's home directory is replaced
+ with the literal `<HOME>` and the OS username with `<USER>`.
+ - **File names of user content** (scanned markdown files).
+ - **Markdown bodies, frontmatter values, annotation contents.** None of these
+ are ever attached to an event.
+ - **IP address.** Opted out client-side and disabled at the project level.
+ - **Hostname** (`server_name` stripped).
+ - **OS username.**
+ - **Third-party plugin ids.** Only built-in plugin ids may appear; any
+ non-built-in id MUST be replaced with the literal `external_plugin`.
+ - **Settings values** (`scan.extraFolders`, `scan.referencePaths`, etc.).
+
+ The scrubber is a pure function with no SDK dependency, so it can be unit
+ tested against hostile inputs (Windows paths, symlinked paths, paths embedded
+ mid-message, nested `abs_path` fields, breadcrumb data, structured usage
+ property objects) independently of the SDK wiring.
+
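As an illustration only (not part of the published package), a pure scrubber of the kind described above might walk the payload recursively. This sketch covers just three of the listed rules (home directory to `<HOME>`, OS username to `<USER>`, `server_name` stripped); the names `ScrubContext`, `scrubString`, and `scrubValue` are hypothetical, and plain substring replacement is an assumption that a real implementation would likely harden for Windows separators, symlinks, and very short usernames.

```typescript
// Hypothetical inputs; a caller would typically read them from the OS.
interface ScrubContext {
  homeDir: string;
  username: string;
}

// Replace the home directory first, then any remaining username occurrences.
function scrubString(value: string, ctx: ScrubContext): string {
  let out = value;
  if (ctx.homeDir) out = out.split(ctx.homeDir).join("<HOME>");
  if (ctx.username) out = out.split(ctx.username).join("<USER>");
  return out;
}

// Walk every nested field so paths embedded in messages, frames,
// breadcrumbs, or usage properties are all covered.
function scrubValue(value: unknown, ctx: ScrubContext): unknown {
  if (typeof value === "string") return scrubString(value, ctx);
  if (Array.isArray(value)) return value.map((item) => scrubValue(item, ctx));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      if (key === "server_name") continue; // hostname is stripped entirely
      out[key] = scrubValue(inner, ctx);
    }
    return out;
  }
  return value; // numbers, booleans, null, undefined pass through
}
```

Because the function takes its context as an argument and touches no SDK types, the same code can be called from a Sentry `beforeSend` hook and a PostHog `before_send` hook, and exercised directly in unit tests.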
+ ## Server-side guarantees
+
+ As a second line of defense behind the client-side scrubber:
+
+ - Each **Sentry** project MUST be configured to not store IP addresses and to
+ run a server-side data-scrubbing rule with the same path pattern as the
+ client scrubber. The UI error surface additionally restricts reporting to
+ loopback: Sentry retired its server-side allowed-domains setting, so this is
+ enforced client-side via the SDK `allowUrls` option pinned to `localhost` /
+ `127.0.0.1` (the UI is only ever served from loopback).
+ - The **PostHog** project MUST be configured to discard client IP addresses
+ and disable geo-IP enrichment (the client SDKs also disable geo and
+ autocapture, but the project setting is the backstop).
+
+ ## Stability
+
+ The **consent model** (default OFF on every surface, the `telemetry` toggles
+ and bookkeeping in `user-settings.schema.json`, the `SKILL_MAP_TELEMETRY=0`
+ kill switch, prompt-once semantics) is stable as of the spec minor in which it
+ lands. Loosening any default (anything other than OFF), removing the kill
+ switch, or removing the consent gate is a major bump.
+
+ The **two surfaces are independent.** Error scope and usage scope each evolve
+ on their own minor bump. Adding a new usage event or property, or a new error
+ tag or environment fact, is a minor bump. Performance traces remain out of
+ scope on both surfaces and would be a third, separately-consented surface.
+
+ The **`anonymousId` exception** is normatively scoped to the usage surface
+ only: it is the one anonymous correlation id the contract permits, and the
+ error surface MUST remain free of any cross-session or cross-install id.
+ Widening the id beyond usage, or attaching any identity to it, is a major
+ bump.
+
+ The scrubbing exclusion list (what MUST NOT leave the machine) is the stable,
+ normative core and may only grow, never shrink, without a major bump.
+
+ Consumers and alternate implementations MAY choose not to ship either surface;
+ both are optional. An implementation that does ship a surface MUST honor the
+ consent contract and the scrubbing rules in full.