@skill-map/spec 0.1.0

Files changed (43)
  1. package/CHANGELOG.md +96 -0
  2. package/README.md +105 -0
  3. package/architecture.md +218 -0
  4. package/cli-contract.md +336 -0
  5. package/conformance/README.md +140 -0
  6. package/conformance/cases/basic-scan.json +17 -0
  7. package/conformance/cases/kernel-empty-boot.json +24 -0
  8. package/conformance/fixtures/minimal-claude/agents/reviewer.md +16 -0
  9. package/conformance/fixtures/minimal-claude/commands/status.md +17 -0
  10. package/conformance/fixtures/minimal-claude/hooks/pre-commit.md +13 -0
  11. package/conformance/fixtures/minimal-claude/notes/architecture.md +11 -0
  12. package/conformance/fixtures/minimal-claude/skills/hello.md +22 -0
  13. package/conformance/fixtures/preamble-v1.txt +54 -0
  14. package/db-schema.md +359 -0
  15. package/dispatch-lifecycle.md +213 -0
  16. package/index.json +205 -0
  17. package/interfaces/security-scanner.md +233 -0
  18. package/job-events.md +322 -0
  19. package/package.json +49 -0
  20. package/plugin-kv-api.md +208 -0
  21. package/prompt-preamble.md +152 -0
  22. package/schemas/conformance-case.schema.json +185 -0
  23. package/schemas/execution-record.schema.json +88 -0
  24. package/schemas/frontmatter/agent.schema.json +22 -0
  25. package/schemas/frontmatter/base.schema.json +136 -0
  26. package/schemas/frontmatter/command.schema.json +39 -0
  27. package/schemas/frontmatter/hook.schema.json +29 -0
  28. package/schemas/frontmatter/note.schema.json +11 -0
  29. package/schemas/frontmatter/skill.schema.json +37 -0
  30. package/schemas/issue.schema.json +54 -0
  31. package/schemas/job.schema.json +75 -0
  32. package/schemas/link.schema.json +66 -0
  33. package/schemas/node.schema.json +95 -0
  34. package/schemas/plugins-registry.schema.json +99 -0
  35. package/schemas/project-config.schema.json +87 -0
  36. package/schemas/report-base.schema.json +41 -0
  37. package/schemas/scan-result.schema.json +71 -0
  38. package/schemas/summaries/agent.schema.json +46 -0
  39. package/schemas/summaries/command.schema.json +50 -0
  40. package/schemas/summaries/hook.schema.json +43 -0
  41. package/schemas/summaries/note.schema.json +37 -0
  42. package/schemas/summaries/skill.schema.json +57 -0
  43. package/versioning.md +94 -0
package/db-schema.md ADDED
@@ -0,0 +1,359 @@
1
+ # Database schema
2
+
3
+ Normative catalog of tables owned by the kernel. Plugins MAY add their own tables under a strict prefix (see `plugin-kv-api.md`). An implementation MUST provision every kernel table described here and MUST reject writes that violate the stated constraints.
4
+
5
+ The spec assumes a relational, SQL-like store but is **engine-agnostic**. The reference implementation uses SQLite (`node:sqlite`) + Kysely + `CamelCasePlugin`. Alternative backends (Postgres, DuckDB, in-memory) are permitted as long as:
6
+
7
+ - Atomic single-statement transitions are available for the job claim (see `dispatch-lifecycle.md`).
8
+ - Migrations track applied versions per scope.
9
+ - Read isolation avoids phantom reads inside a single scan write.
10
+
11
+ ---
12
+
13
+ ## Scope and location
14
+
15
+ Two scopes. Each has its own database file and its own migration ledger.
16
+
17
+ | Scope | Default DB location | Scan roots |
18
+ |---|---|---|
19
+ | `project` (default) | `./.skill-map/skill-map.db` | The current repository. |
20
+ | `global` (`-g`) | `~/.skill-map/skill-map.db` | User-level skill directories (e.g. `~/.claude/`). |
21
+
22
+ The project DB is gitignored by default. Teams MAY opt in to sharing it by setting `history.share: true` in `.skill-map.json`; the file is then committed and the execution log becomes a team artifact. Both scopes use the same schema.
23
+
24
+ The `--db <path>` CLI flag overrides location for both scopes as an escape hatch.
25
+
26
+ ---
27
+
28
+ ## Zones
29
+
30
+ Every kernel table belongs to exactly one zone, identified by a mandatory name prefix.
31
+
32
+ | Zone | Prefix | Nature | Regenerable | Backed up | Example |
33
+ |---|---|---|---|---|---|
34
+ | Scan | `scan_` | Output of the last scan. Truncated and repopulated by `sm scan`. | Yes | No | `scan_nodes` |
35
+ | State | `state_` | Persistent operational data: jobs, executions, summaries, enrichment, plugin KV. | No | Yes | `state_jobs` |
36
+ | Config | `config_` | User-owned configuration: plugin enable/disable, preferences, migration ledger. | No | Yes | `config_plugins` |
37
+
38
+ `sm db reset` drops `scan_*` + `state_*`, keeps `config_*`. `sm db backup` preserves `state_*` + `config_*`; `scan_*` is regenerated on demand.
39
+
40
+ ---
41
+
42
+ ## Naming conventions (normative)
43
+
44
+ These rules apply to every kernel table and to every plugin-authored table under its prefix.
45
+
46
+ - **Tables**: `snake_case`, plural. Zone prefix REQUIRED. Example: `scan_nodes`, `state_jobs`.
47
+ - **Columns**: `snake_case`. Primary key column is always `id`.
48
+ - **Foreign keys**: `<referenced_table_singular>_id`. Example: `job_id` references `state_jobs.id`.
49
+ - **Timestamps**: suffix `_at`, type `INTEGER` (Unix milliseconds). Example: `created_at`, `claimed_at`.
50
+ - **Durations**: suffix `_seconds` or `_ms`. Example: `ttl_seconds`, `duration_ms`.
51
+ - **Booleans**: prefix `is_` or `has_`. Stored as `INTEGER` (`0`/`1`) per SQLite convention; other engines use their native boolean.
52
+ - **Hashes**: suffix `_hash`, `TEXT`, hex-encoded lowercase. Example: `body_hash`, `content_hash`.
53
+ - **JSON blobs**: suffix `_json`, `TEXT`. Parsed on read, serialized on write.
54
+ - **Counts**: suffix `_count`, `INTEGER`. Example: `links_out_count`.
55
+ - **Enums**: plain column + `CHECK` constraint listing allowed values. Values are kebab-case lowercase. No lookup tables.
56
+ - **Indexes**: named `ix_<table>_<cols>`. Example: `ix_state_jobs_status`.
57
+ - **Constraints**: `fk_`, `uq_`, `ck_` prefixes.
58
+ - **SQL keywords**: UPPERCASE. Identifiers lowercase.
59
+
60
+ The kernel MUST reject any plugin migration that violates these rules at validation time (see `plugin-kv-api.md`).
61
+
62
+ Domain types exposed to driving adapters use `camelCase`. The SQLite reference impl uses Kysely's `CamelCasePlugin` to bridge `snake_case ↔ camelCase` at the port boundary.
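
As an illustration only, a few of the naming rules above can be checked mechanically. The sketch below is TypeScript; `checkTableName`, `checkIndexName`, and the regular expression are hypothetical helpers, not part of the spec or the reference implementation, and they cover only the snake_case, prefix, and index-name rules (the plural rule, for instance, is not checked).

```typescript
// Hypothetical helpers: a partial check of the naming rules, not the kernel's validator.
const ZONE_PREFIXES: readonly string[] = ["scan_", "state_", "config_"];
const SNAKE_CASE = /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/;

// Returns a list of violations; an empty list means the name passed these checks.
// `pluginPrefix` (e.g. "plugin_foo_") replaces the zone prefixes for plugin-owned tables.
function checkTableName(name: string, pluginPrefix?: string): string[] {
  const problems: string[] = [];
  if (!SNAKE_CASE.test(name)) {
    problems.push("table name must be snake_case");
  }
  const allowed: readonly string[] = pluginPrefix ? [pluginPrefix] : ZONE_PREFIXES;
  if (!allowed.some((prefix) => name.startsWith(prefix))) {
    problems.push(`table name must start with one of: ${allowed.join(", ")}`);
  }
  return problems;
}

// Indexes are named ix_<table>_<cols>.
function checkIndexName(index: string, table: string): string[] {
  return index.startsWith(`ix_${table}_`) ? [] : [`index must be named ix_${table}_<cols>`];
}

console.log(checkTableName("scan_nodes")); // no violations
console.log(checkTableName("Jobs"));       // two violations: case and missing prefix
```

A check like this would sit in front of migration application, so a violating plugin migration is rejected before any DDL runs.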
63
+
64
+ ---
65
+
66
+ ## Table catalog: zone `scan_`
67
+
68
+ ### `scan_nodes`
69
+
70
+ One row per detected node, matching `schemas/node.schema.json`.
71
+
72
+ | Column | Type | Constraint | Notes |
73
+ |---|---|---|---|
74
+ | `path` | TEXT | PRIMARY KEY | Relative path from scope root. Canonical node identifier. |
75
+ | `kind` | TEXT | NOT NULL, CHECK in (`skill`, `agent`, `command`, `hook`, `note`) | |
76
+ | `adapter` | TEXT | NOT NULL | Adapter extension id. |
77
+ | `title` | TEXT | NULL | |
78
+ | `description` | TEXT | NULL | |
79
+ | `stability` | TEXT | CHECK in (`experimental`, `stable`, `deprecated`) OR NULL | Denormalized from frontmatter. |
80
+ | `version` | TEXT | NULL | Denormalized from frontmatter. |
81
+ | `author` | TEXT | NULL | Denormalized. |
82
+ | `frontmatter_json` | TEXT | NOT NULL | Full parsed frontmatter as JSON. |
83
+ | `body_hash` | TEXT | NOT NULL | sha256, hex. |
84
+ | `frontmatter_hash` | TEXT | NOT NULL | sha256, hex. |
85
+ | `bytes_frontmatter` | INTEGER | NOT NULL | |
86
+ | `bytes_body` | INTEGER | NOT NULL | |
87
+ | `bytes_total` | INTEGER | NOT NULL | |
88
+ | `tokens_frontmatter` | INTEGER | NULL | NULL when tokenization disabled. |
89
+ | `tokens_body` | INTEGER | NULL | |
90
+ | `tokens_total` | INTEGER | NULL | |
91
+ | `links_out_count` | INTEGER | NOT NULL DEFAULT 0 | |
92
+ | `links_in_count` | INTEGER | NOT NULL DEFAULT 0 | |
93
+ | `external_refs_count` | INTEGER | NOT NULL DEFAULT 0 | |
94
+ | `scanned_at` | INTEGER | NOT NULL | Unix ms. |
95
+
96
+ Indexes: `ix_scan_nodes_kind`, `ix_scan_nodes_adapter`, `ix_scan_nodes_body_hash` (rename heuristic).
97
+
98
+ ### `scan_links`
99
+
100
+ One row per detected link, matching `schemas/link.schema.json`.
101
+
102
+ | Column | Type | Constraint | Notes |
103
+ |---|---|---|---|
104
+ | `id` | INTEGER | PRIMARY KEY AUTOINCREMENT | |
105
+ | `source_path` | TEXT | NOT NULL | FK semantically; MAY be unenforced for performance. |
106
+ | `target_path` | TEXT | NOT NULL | MAY point to a missing node (broken ref). |
107
+ | `kind` | TEXT | NOT NULL, CHECK in (`invokes`, `references`, `mentions`, `supersedes`) | |
108
+ | `confidence` | TEXT | NOT NULL, CHECK in (`high`, `medium`, `low`) | |
109
+ | `sources_json` | TEXT | NOT NULL | JSON array of detector ids. |
110
+ | `original_trigger` | TEXT | NULL | |
111
+ | `normalized_trigger` | TEXT | NULL | |
112
+ | `location_line` | INTEGER | NULL | |
113
+ | `location_column` | INTEGER | NULL | |
114
+ | `location_offset` | INTEGER | NULL | |
115
+ | `raw` | TEXT | NULL | |
116
+
117
+ Indexes: `ix_scan_links_source_path`, `ix_scan_links_target_path`, `ix_scan_links_normalized_trigger`.
118
+
119
+ ### `scan_issues`
120
+
121
+ One row per rule-emitted issue, matching `schemas/issue.schema.json`.
122
+
123
+ | Column | Type | Constraint | Notes |
124
+ |---|---|---|---|
125
+ | `id` | INTEGER | PRIMARY KEY AUTOINCREMENT | |
126
+ | `rule_id` | TEXT | NOT NULL | |
127
+ | `severity` | TEXT | NOT NULL, CHECK in (`error`, `warn`, `info`) | |
128
+ | `node_ids_json` | TEXT | NOT NULL | JSON array. |
129
+ | `link_indices_json` | TEXT | NULL | JSON array of `scan_links.id`. |
130
+ | `message` | TEXT | NOT NULL | |
131
+ | `detail` | TEXT | NULL | |
132
+ | `fix_json` | TEXT | NULL | |
133
+ | `data_json` | TEXT | NULL | |
134
+
135
+ Indexes: `ix_scan_issues_rule_id`, `ix_scan_issues_severity`.
136
+
137
+ ---
138
+
139
+ ## Table catalog: zone `state_`
140
+
141
+ ### `state_jobs`
142
+
143
+ Matching `schemas/job.schema.json`.
144
+
145
+ | Column | Type | Constraint |
146
+ |---|---|---|
147
+ | `id` | TEXT | PRIMARY KEY |
148
+ | `action_id` | TEXT | NOT NULL |
149
+ | `action_version` | TEXT | NOT NULL |
150
+ | `node_id` | TEXT | NOT NULL |
151
+ | `content_hash` | TEXT | NOT NULL |
152
+ | `nonce` | TEXT | NOT NULL |
153
+ | `priority` | INTEGER | NOT NULL DEFAULT 0 |
154
+ | `status` | TEXT | NOT NULL, CHECK in (`queued`, `running`, `completed`, `failed`) |
155
+ | `failure_reason` | TEXT | NULL, CHECK in (`runner-error`, `report-invalid`, `timeout`, `abandoned`, `job-file-missing`, `user-cancelled`) |
156
+ | `runner` | TEXT | NULL, CHECK in (`cli`, `skill`, `in-process`) |
157
+ | `ttl_seconds` | INTEGER | NOT NULL |
158
+ | `file_path` | TEXT | NULL |
159
+ | `created_at` | INTEGER | NOT NULL |
160
+ | `claimed_at` | INTEGER | NULL |
161
+ | `finished_at` | INTEGER | NULL |
162
+ | `expires_at` | INTEGER | NULL |
163
+ | `submitted_by` | TEXT | NULL |
164
+
165
+ Indexes: `ix_state_jobs_status`, `ix_state_jobs_action_node_hash` (unique partial index WHERE `status IN ('queued','running')` for duplicate detection).
166
+
167
+ ### `state_executions`
168
+
169
+ Matching `schemas/execution-record.schema.json`.
170
+
171
+ | Column | Type | Constraint |
172
+ |---|---|---|
173
+ | `id` | TEXT | PRIMARY KEY |
174
+ | `kind` | TEXT | NOT NULL, CHECK in (`action`, `audit`) |
175
+ | `extension_id` | TEXT | NOT NULL |
176
+ | `extension_version` | TEXT | NOT NULL |
177
+ | `node_ids_json` | TEXT | NOT NULL DEFAULT '[]' |
178
+ | `content_hash` | TEXT | NULL |
179
+ | `status` | TEXT | NOT NULL, CHECK in (`completed`, `failed`, `cancelled`) |
180
+ | `failure_reason` | TEXT | NULL |
181
+ | `exit_code` | INTEGER | NULL |
182
+ | `runner` | TEXT | NULL |
183
+ | `started_at` | INTEGER | NOT NULL |
184
+ | `finished_at` | INTEGER | NOT NULL |
185
+ | `duration_ms` | INTEGER | NULL |
186
+ | `tokens_in` | INTEGER | NULL |
187
+ | `tokens_out` | INTEGER | NULL |
188
+ | `report_path` | TEXT | NULL |
189
+ | `job_id` | TEXT | NULL |
190
+
191
+ Indexes: `ix_state_executions_extension_id`, `ix_state_executions_started_at`, `ix_state_executions_job_id`.
192
+
193
+ ### `state_summaries`
194
+
195
+ One row per `(node_id, summarizer_action_id)`. See `schemas/summaries/`.
196
+
197
+ | Column | Type | Constraint |
198
+ |---|---|---|
199
+ | `node_id` | TEXT | NOT NULL |
200
+ | `kind` | TEXT | NOT NULL, CHECK in (`skill`, `agent`, `command`, `hook`, `note`) |
201
+ | `summarizer_action_id` | TEXT | NOT NULL |
202
+ | `summarizer_version` | TEXT | NOT NULL |
203
+ | `body_hash_at_generation` | TEXT | NOT NULL |
204
+ | `generated_at` | INTEGER | NOT NULL |
205
+ | `summary_json` | TEXT | NOT NULL |
206
+
207
+ Primary key: `(node_id, summarizer_action_id)`. Indexes: `ix_state_summaries_generated_at`.
208
+
209
+ ### `state_enrichment`
210
+
211
+ One row per `(node_id, provider_id)`.
212
+
213
+ | Column | Type | Constraint |
214
+ |---|---|---|
215
+ | `node_id` | TEXT | NOT NULL |
216
+ | `provider_id` | TEXT | NOT NULL |
217
+ | `data_json` | TEXT | NOT NULL |
218
+ | `verified` | INTEGER | NULL (0/1/NULL) |
219
+ | `fetched_at` | INTEGER | NOT NULL |
220
+ | `stale_after` | INTEGER | NULL |
221
+
222
+ Primary key: `(node_id, provider_id)`. Indexes: `ix_state_enrichment_stale_after`.
223
+
224
+ ### `state_plugin_kv`
225
+
226
+ Shared key-value store for plugins that declared storage mode `kv`. See `plugin-kv-api.md` for the accessor contract.
227
+
228
+ | Column | Type | Constraint | Notes |
229
+ |---|---|---|---|
230
+ | `plugin_id` | TEXT | NOT NULL | |
231
+ | `node_id` | TEXT | NULL | Optional scoping by node. |
232
+ | `key` | TEXT | NOT NULL | |
233
+ | `value_json` | TEXT | NOT NULL | |
234
+ | `updated_at` | INTEGER | NOT NULL | |
235
+
236
+ Primary key: `(plugin_id, node_id, key)` with `node_id` using a sentinel empty string when NULL to satisfy PK constraints on engines that reject NULL in PK columns. Indexes: `ix_state_plugin_kv_plugin_id`.
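
A minimal sketch of the sentinel handling described above, in TypeScript. The type and function names (`KvPrimaryKey`, `toKvPrimaryKey`, `fromStoredNodeId`) are hypothetical and only illustrate the mapping between a nullable node id at the API level and the empty-string sentinel in the key.

```typescript
// Hypothetical shape of a state_plugin_kv primary key; names are illustrative.
type KvPrimaryKey = { pluginId: string; nodeId: string; key: string };

// Sentinel stored in node_id when the entry is not scoped to a node.
const NO_NODE = "";

// Builds the key tuple, replacing a missing node id with the sentinel.
function toKvPrimaryKey(pluginId: string, nodeId: string | null, key: string): KvPrimaryKey {
  return { pluginId, nodeId: nodeId ?? NO_NODE, key };
}

// Maps the stored sentinel back to null for callers.
function fromStoredNodeId(stored: string): string | null {
  return stored === NO_NODE ? null : stored;
}
```

Keeping the sentinel behind two small functions means callers never see the empty string and the choice of sentinel stays in one place.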
237
+
238
+ ---
239
+
240
+ ## Table catalog: zone `config_`
241
+
242
+ ### `config_plugins`
243
+
244
+ Persists user-toggled enable/disable overrides. Discovery is still filesystem-based; this table records user intent.
245
+
246
+ | Column | Type | Constraint |
247
+ |---|---|---|
248
+ | `plugin_id` | TEXT | PRIMARY KEY |
249
+ | `enabled` | INTEGER | NOT NULL DEFAULT 1 |
250
+ | `config_json` | TEXT | NULL |
251
+ | `updated_at` | INTEGER | NOT NULL |
252
+
253
+ ### `config_preferences`
254
+
255
+ General-purpose key-value for user preferences (`sm config set`).
256
+
257
+ | Column | Type | Constraint |
258
+ |---|---|---|
259
+ | `key` | TEXT | PRIMARY KEY |
260
+ | `value_json` | TEXT | NOT NULL |
261
+ | `updated_at` | INTEGER | NOT NULL |
262
+
263
+ ### `config_schema_versions`
264
+
265
+ Migration ledger. One row per successfully applied migration, per scope.
266
+
267
+ | Column | Type | Constraint | Notes |
268
+ |---|---|---|---|
269
+ | `scope` | TEXT | NOT NULL, CHECK in (`kernel`, `plugin`) | |
270
+ | `owner_id` | TEXT | NOT NULL | `kernel` for kernel migrations, plugin id otherwise. |
271
+ | `version` | INTEGER | NOT NULL | |
272
+ | `description` | TEXT | NOT NULL | |
273
+ | `applied_at` | INTEGER | NOT NULL | |
274
+
275
+ Primary key: `(scope, owner_id, version)`.
276
+
277
+ The kernel ALSO maintains `PRAGMA user_version` (or the engine equivalent) as a fast pre-check for kernel migrations. A mismatch between `user_version` and `config_schema_versions` is a diagnostic flagged by `sm doctor`.
278
+
279
+ ---
280
+
281
+ ## Migrations
282
+
283
+ - **Format**: `.sql` files. Up-only. Rollback is `sm db restore <backup>`.
284
+ - **Naming**: `NNN_snake_case.sql` where `NNN` is 3-digit sequential, zero-padded. Example: `001_initial.sql`, `042_add_provenance.sql`.
285
+ - **Location**: kernel migrations in `src/migrations/` (reference impl); plugin migrations in `<plugin-dir>/migrations/`.
286
+ - **Wrapping**: the kernel wraps each file in `BEGIN; ... ; COMMIT;`. Files contain DDL only.
287
+ - **Strict versioning**: no idempotency is required. `CREATE TABLE IF NOT EXISTS` is DISCOURAGED in kernel migrations (but permitted in plugin migrations, at the plugin author's discretion).
288
+ - **Auto-apply**: on startup, unless `auto_migrate: false` in config. A backup is written to `.skill-map/backups/skill-map-pre-migrate-v<N>.db` before applying.
289
+ - **Plugin migration order**: plugins are migrated after kernel migrations and in stable alphabetical order by plugin id. A failing plugin migration disables only that plugin; other plugins and the kernel continue.
290
+
291
+ `sm db migrate` controls migration flow manually: `--dry-run`, `--status`, `--to <n>`, `--kernel-only`, `--plugin <id>`, `--no-backup`.
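
The naming and ordering rules above could be applied roughly as follows. This is a TypeScript sketch under stated assumptions: `parseMigrationFile` and `pendingMigrations` are hypothetical names, and the list of applied versions is assumed to come from `config_schema_versions` for a single owner.

```typescript
// Hypothetical helpers for `NNN_snake_case.sql` migration files.
interface MigrationFile {
  version: number;
  name: string;
  file: string;
}

const MIGRATION_NAME = /^(\d{3})_([a-z0-9]+(?:_[a-z0-9]+)*)\.sql$/;

// Returns null when the file name does not follow the naming rule.
function parseMigrationFile(file: string): MigrationFile | null {
  const match = MIGRATION_NAME.exec(file);
  if (match === null) return null;
  return { version: Number(match[1]), name: String(match[2]), file };
}

// Returns the migrations still to apply, in ascending version order.
function pendingMigrations(files: string[], appliedVersions: number[]): MigrationFile[] {
  const applied = new Set(appliedVersions);
  return files
    .map(parseMigrationFile)
    .filter((m): m is MigrationFile => m !== null && !applied.has(m.version))
    .sort((a, b) => a.version - b.version);
}

console.log(pendingMigrations(["002_add_index.sql", "001_initial.sql"], [1]));
```

Files that do not match the pattern are silently skipped here; a real implementation would more likely report them.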
292
+
293
+ ---
294
+
295
+ ## Plugin storage
296
+
297
+ Two modes declared in `plugin.json` (see `schemas/plugins-registry.schema.json`).
298
+
299
+ | Mode | Manifest | Backing |
300
+ |---|---|---|
301
+ | **KV** (mode A) | `"storage": { "mode": "kv" }` | Shared `state_plugin_kv`. See `plugin-kv-api.md`. |
302
+ | **Dedicated** (mode B) | `"storage": { "mode": "dedicated", "tables": [...], "migrations": [...] }` | Plugin-owned tables, prefixed `plugin_<normalized_id>_`. |
303
+
304
+ Normalization of `plugin_id` for the prefix:
305
+
306
+ 1. Lowercase.
307
+ 2. Replace `[^a-z0-9]` with `_`.
308
+ 3. Collapse runs of `_`.
309
+ 4. Strip leading/trailing `_`.
310
+
311
+ Example: `@skill-map/cluster-triggers` → `skill_map_cluster_triggers` → prefix `plugin_skill_map_cluster_triggers_`.
312
+
313
+ Collisions after normalization are a load-time error; both plugins are disabled with reason `invalid-manifest`.
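
The four normalization steps and the collision rule can be sketched in TypeScript as below. The function names are illustrative, not taken from the reference implementation; the example id is the one used above.

```typescript
// Applies the four normalization steps in order.
function normalizePluginId(pluginId: string): string {
  return pluginId
    .toLowerCase()               // 1. lowercase
    .replace(/[^a-z0-9]/g, "_")  // 2. replace anything outside [a-z0-9] with "_"
    .replace(/_+/g, "_")         // 3. collapse runs of "_"
    .replace(/^_+|_+$/g, "");    // 4. strip leading/trailing "_"
}

// Table prefix for a mode B (dedicated storage) plugin.
function pluginTablePrefix(pluginId: string): string {
  return `plugin_${normalizePluginId(pluginId)}_`;
}

// Groups plugin ids whose normalized prefixes collide.
function findPrefixCollisions(pluginIds: string[]): string[][] {
  const byPrefix = new Map<string, string[]>();
  for (const id of pluginIds) {
    const prefix = pluginTablePrefix(id);
    byPrefix.set(prefix, [...(byPrefix.get(prefix) ?? []), id]);
  }
  return [...byPrefix.values()].filter((ids) => ids.length > 1);
}

console.log(pluginTablePrefix("@skill-map/cluster-triggers"));
// prints "plugin_skill_map_cluster_triggers_"
```

Every id in a returned group would be disabled with reason `invalid-manifest`, per the rule above.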
314
+
315
+ ### Triple protection for mode B
316
+
317
+ The kernel MUST enforce all three layers:
318
+
319
+ 1. **Prefix injection**: the kernel rewrites the `CREATE TABLE` statements in the plugin migration to inject `plugin_<id>_` into every table name that doesn't already have it. A plugin CANNOT create un-prefixed tables.
320
+ 2. **DDL validation**: plugin migrations are parsed before application. The kernel MUST reject: foreign keys to kernel tables, triggers on kernel tables, `DROP` / `ALTER` against kernel tables, `ATTACH` statements, global `PRAGMA` statements (except `PRAGMA <plugin>_*` if applicable to the backend).
321
+ 3. **Scoped connection**: at runtime, the plugin receives a `Database` wrapper (not a raw handle). The wrapper rejects queries that touch tables outside the plugin's own prefix.
322
+
323
+ Honest note: plugins are user-placed code. Protection guards against accidents (a plugin that mistakenly names a table `state_jobs`), not against hostile plugins. A malicious plugin running in the same process can bypass any JS-level guard. Post-v1.0 evaluates sandboxing (worker threads, VM contexts) and/or signing.
324
+
325
+ ---
326
+
327
+ ## Backups
328
+
329
+ - `sm db backup [--out <path>]` — WAL checkpoint (SQLite; engine-equivalent for others) + file copy.
330
+ - Default backup location: `.skill-map/backups/<timestamp>.db`.
331
+ - Auto-backup before migrations: `.skill-map/backups/skill-map-pre-migrate-v<N>.db`.
332
+ - `sm db restore <path>` swaps the current DB with the supplied file. Interactive confirmation required unless `--force`.
333
+
334
+ Backups include `state_*` + `config_*` only; `scan_*` is regenerated after restore by running `sm scan`.
335
+
336
+ ---
337
+
338
+ ## Integrity
339
+
340
+ `sm doctor` MUST check at least:
341
+
342
+ - DB file exists and is readable.
343
+ - `PRAGMA quick_check` (or equivalent) returns OK.
344
+ - Applied migration version matches code-bundled migrations.
345
+ - No orphan job files (`.skill-map/jobs/*.md` without a matching DB row).
346
+ - No orphan DB rows (jobs whose `file_path` does not exist).
347
+ - No plugin in `load-error` or `incompatible-spec` status.
348
+
349
+ Failures are reported with suggested remediation (e.g., "run `sm db migrate`", "run `sm job prune --orphan-files`").
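
The two orphan checks reduce to set differences between job files on disk and `file_path` values in `state_jobs`. A TypeScript sketch, assuming the caller has already listed the files and loaded the rows (`findOrphans` and `JobFileRow` are hypothetical names):

```typescript
// Hypothetical inputs: job file paths found on disk and rows read from state_jobs.
interface JobFileRow {
  id: string;
  filePath: string | null;
}

function findOrphans(filesOnDisk: string[], rows: JobFileRow[]) {
  const referenced = new Set(
    rows.map((r) => r.filePath).filter((p): p is string => p !== null),
  );
  const onDisk = new Set(filesOnDisk);
  return {
    // Files with no matching DB row.
    orphanFiles: filesOnDisk.filter((f) => !referenced.has(f)),
    // Ids of rows whose file_path points at a file that is not on disk.
    orphanRows: rows
      .filter((r) => r.filePath !== null && !onDisk.has(r.filePath))
      .map((r) => r.id),
  };
}
```

Rows with a NULL `file_path` are treated as neither orphaned nor matched; that choice is an assumption of this sketch.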
350
+
351
+ ---
352
+
353
+ ## Stability
354
+
355
+ The **three-zone model** and the **naming conventions** are stable as of spec v1.0.0. Adding a fourth zone is a major bump.
356
+
357
+ The **table catalog** above is stable within a spec major version. Adding a column to a kernel table is a minor bump (consumers MUST ignore unknown columns). Adding a table is a minor bump. Removing or renaming a column is a major bump.
358
+
359
+ Plugin storage mode names (`kv`, `dedicated`) are stable. Adding a third mode is a minor bump.
package/dispatch-lifecycle.md ADDED
@@ -0,0 +1,213 @@
1
+ # Dispatch lifecycle
2
+
3
+ Normative state machine for jobs. A `Job` (see `schemas/job.schema.json`) is the runtime instance of an `Action` applied to one or more `Node`s. Every job moves through this lifecycle exactly once.
4
+
5
+ ---
6
+
7
+ ## State machine
8
+
9
+ ```
10
+   submit
11
+      │
12
+      ▼
13
+ ┌──────────┐   atomic claim   ┌──────────┐
14
+ │  queued  │ ───────────────▶ │ running  │
15
+ └────┬─────┘                  └─────┬────┘
16
+      │                              │
17
+      │ cancel                       │
18
+      │                              │
19
+      │               record success │
20
+      │              ┌───────────────┤
21
+      │              │               │ record failure
22
+      │              │               │ TTL expires (reap)
23
+      │              │               │ runner error
24
+      ▼              ▼               ▼
25
+ ┌────────┐     ┌──────────┐   ┌──────────┐
26
+ │ failed │     │ completed│   │  failed  │
27
+ └────────┘     └──────────┘   └──────────┘
28
+ ```
29
+
30
+ Terminal states: `completed`, `failed`. Once terminal, a job MUST NOT transition again.
31
+
32
+ ---
33
+
34
+ ## Allowed transitions
35
+
36
+ | From | To | Trigger |
37
+ |---|---|---|
38
+ | (none) | `queued` | `sm job submit` succeeds. |
39
+ | `queued` | `running` | Atomic claim by a runner. |
40
+ | `queued` | `failed` | `sm job cancel <id>` (reason `user-cancelled`). |
41
+ | `running` | `completed` | `sm record --status completed` with valid nonce. |
42
+ | `running` | `failed` | `sm record --status failed`, OR TTL expired (reason `abandoned`), OR runner subprocess returned non-zero (reason `runner-error`), OR report failed schema validation (reason `report-invalid`), OR job file missing at runtime (reason `job-file-missing`), OR `sm job cancel <id>` (reason `user-cancelled`; see Cancellation). |
43
+
44
+ Any other transition attempt MUST be rejected and MUST NOT mutate state. Implementations SHOULD log the attempt.
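
A transition guard that follows this table might look like the TypeScript sketch below. `ALLOWED`, `canTransition`, and `transition` are illustrative names; the map is transcribed from the From/To columns, and triggers and failure reasons are left out.

```typescript
type JobStatus = "queued" | "running" | "completed" | "failed";

// Allowed target states per source state; terminal states allow none.
const ALLOWED: Record<JobStatus, readonly JobStatus[]> = {
  queued: ["running", "failed"],
  running: ["completed", "failed"],
  completed: [],
  failed: [],
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return ALLOWED[from].includes(to);
}

// Returns the new status, or throws without touching the job when the move is not allowed.
function transition(job: { id: string; status: JobStatus }, to: JobStatus): JobStatus {
  if (!canTransition(job.status, to)) {
    throw new Error(`job ${job.id}: transition ${job.status} -> ${to} rejected`);
  }
  return to;
}
```

Because `transition` returns the new status instead of mutating the job, a rejected attempt cannot leave partially updated state behind.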
45
+
46
+ ---
47
+
48
+ ## Submit
49
+
50
+ `sm job submit <action> -n <node.path>`:
51
+
52
+ 1. Resolve the action (`actionId`, `actionVersion`, `promptTemplateHash`).
53
+ 2. Resolve the target node (`bodyHash`, `frontmatterHash`). Fail with exit 5 if the node does not exist.
54
+ 3. Compute `contentHash = sha256(actionId + actionVersion + bodyHash + frontmatterHash + promptTemplateHash)`.
55
+ 4. **Duplicate check**: query `state_jobs` for any row with `(actionId, actionVersion, nodeId, contentHash)` AND `status IN ('queued', 'running')`. If found, refuse with exit 3 and print the existing job id (unless `--force`).
56
+ 5. Compute `ttlSeconds = max(action.expectedDurationSeconds × graceMultiplier, minimumTtlSeconds)`. Frozen for the life of this job. User overrides via `--ttl`.
57
+ 6. Generate `nonce` (implementation-chosen; MUST be cryptographically random, ≥ 128 bits of entropy).
58
+ 7. Render the job file at `.skill-map/jobs/<id>.md`, applying the canonical preamble (see `prompt-preamble.md`).
59
+ 8. Insert a row in `state_jobs` with `status = 'queued'`, `createdAt = now`.
60
+ 9. Return the job id.
61
+
62
+ `--all` fans out one job per node matching the action's `preconditions`. Each fan-out job is independent: some may be duplicates and be refused, others succeed. The CLI reports a summary.
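
Steps 3 and 6 of the submit procedure can be sketched in TypeScript with Node's built-in `node:crypto` module. Several details are assumptions of this sketch rather than statements of the spec: the fields are joined by plain string concatenation with no delimiter, the input is UTF-8 encoded, and both the hash and the nonce are hex-encoded. `computeContentHash` and `generateNonce` are hypothetical names.

```typescript
import { createHash, randomBytes } from "node:crypto";

interface HashInputs {
  actionId: string;
  actionVersion: string;
  bodyHash: string;
  frontmatterHash: string;
  promptTemplateHash: string;
}

// Step 3: sha256 over the five fields, concatenated in the order given in the formula.
// Assumption: no delimiter between fields, UTF-8 input, lowercase hex output.
function computeContentHash(i: HashInputs): string {
  const input =
    i.actionId + i.actionVersion + i.bodyHash + i.frontmatterHash + i.promptTemplateHash;
  return createHash("sha256").update(input, "utf8").digest("hex");
}

// Step 6: 16 random bytes give 128 bits of entropy; hex encoding is an assumption.
function generateNonce(): string {
  return randomBytes(16).toString("hex");
}
```

The same inputs always produce the same `contentHash`, which is what lets the duplicate check in step 4 compare submissions.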
63
+
64
+ ---
65
+
66
+ ## Atomic claim
67
+
68
+ A runner acquires the next queued job with a single atomic operation:
69
+
70
+ ```sql
71
+ UPDATE state_jobs
72
+ SET status = 'running',
73
+     claimed_at = <now>,
74
+     runner = <runner-id>,
75
+     expires_at = <now> + ttl_seconds * 1000
76
+ WHERE id = (
77
+     SELECT id FROM state_jobs
78
+     WHERE status = 'queued'
79
+       AND (<filter>)
80
+     ORDER BY priority DESC, created_at ASC
81
+     LIMIT 1
82
+ )
83
+   AND status = 'queued' -- re-checked so a concurrent claim of the same id fails
84
+ RETURNING id;
85
+ ```
86
+
87
+ The second `AND status = 'queued'` guards against a race where two runners select the same id at the same instant; only one succeeds.
88
+
89
+ **Non-SQLite implementations**: MUST provide an equivalent single-statement atomic transition. A two-step `SELECT then UPDATE` is NOT acceptable — it is observable as a double-claim bug.
90
+
91
+ `sm job claim` exposes this primitive to Skill runners: returns the id on stdout (exit 0) or exits 1 if the queue is empty.
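
To make the selection order concrete, here is an in-memory TypeScript sketch of what the statement does to a list of rows. It shows the ordering and the fields set on claim only; it is single-threaded and does not demonstrate the atomicity that the single SQL statement provides. `JobRow`, `pickNextQueued`, and `claimNext` are hypothetical names.

```typescript
type JobStatus = "queued" | "running" | "completed" | "failed";

interface JobRow {
  id: string;
  status: JobStatus;
  priority: number;
  createdAt: number; // Unix ms
  ttlSeconds: number;
  claimedAt?: number;
  expiresAt?: number;
}

// Mirrors the subquery: queued rows, highest priority first, oldest first within a priority.
function pickNextQueued(jobs: JobRow[]): JobRow | undefined {
  return jobs
    .filter((j) => j.status === "queued")
    .sort((a, b) => b.priority - a.priority || a.createdAt - b.createdAt)[0];
}

// Mirrors the UPDATE: marks the picked row as running and returns its id, or null if none.
function claimNext(jobs: JobRow[], now: number): string | null {
  const next = pickNextQueued(jobs);
  if (next === undefined) return null; // empty queue: `sm job claim` would exit 1
  next.status = "running";
  next.claimedAt = now;
  next.expiresAt = now + next.ttlSeconds * 1000;
  return next.id;
}
```

Repeated calls drain the queue in priority order and return `null` once no queued rows remain.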
92
+
93
+ ---
94
+
95
+ ## TTL and auto-reap
96
+
97
+ Every `running` job has an `expiresAt = claimedAt + ttlSeconds × 1000`. Once real time passes `expiresAt`, the job is considered abandoned.
98
+
99
+ ### Reap procedure
100
+
101
+ Run at the **start of every `sm job run`** invocation, before the first claim:
102
+
103
+ ```sql
104
+ UPDATE state_jobs
105
+ SET status = 'failed',
106
+     failure_reason = 'abandoned',
107
+     finished_at = <now>
108
+ WHERE status = 'running'
109
+   AND expires_at < <now>;
110
+ ```
111
+
112
+ Number of rows affected is reported as `run.reap.completed.reapedCount` in the event stream.
113
+
114
+ Implementations MAY expose `sm job reap` as an explicit verb for diagnostics, but MUST perform reaping automatically inside `sm job run`.
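
The reap rule is small enough to state as a pure function. The TypeScript sketch below works on in-memory rows and returns the number of reaped jobs, which corresponds to `reapedCount`; `ReapableJob` and `reapExpired` are hypothetical names.

```typescript
type JobStatus = "queued" | "running" | "completed" | "failed";

interface ReapableJob {
  id: string;
  status: JobStatus;
  expiresAt: number | null; // Unix ms; null until the job is claimed
  failureReason: string | null;
  finishedAt: number | null;
}

// Marks every running job whose expiry is strictly in the past as failed/abandoned.
function reapExpired(jobs: ReapableJob[], now: number): number {
  let reapedCount = 0;
  for (const job of jobs) {
    if (job.status === "running" && job.expiresAt !== null && job.expiresAt < now) {
      job.status = "failed";
      job.failureReason = "abandoned";
      job.finishedAt = now;
      reapedCount += 1;
    }
  }
  return reapedCount;
}
```

The comparison is strict, matching the `<` in the SQL: a job whose expiry equals the current time is left running until the next reap.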
115
+
116
+ ### TTL precedence
117
+
118
+ When computing the TTL at submit time (in order):
119
+
120
+ 1. Global default (`minimumTtlSeconds` from config).
121
+ 2. Action manifest (`expectedDurationSeconds`).
122
+ 3. User config override (`jobs.perActionTtl.<actionId>`).
123
+ 4. Flag (`sm job submit --ttl <seconds>`).
124
+
125
+ Later wins. The resolved value is written to `state_jobs.ttlSeconds` and is immutable for the life of the job.
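
One possible reading of this precedence, combined with the formula in Submit step 5, is sketched below in TypeScript. How the two fit together is an assumption of this sketch: the first two sources are combined as `max(expectedDurationSeconds × graceMultiplier, minimumTtlSeconds)`, and the config override and the flag then replace that value outright. `TtlInputs` and `resolveTtlSeconds` are hypothetical names.

```typescript
interface TtlInputs {
  minimumTtlSeconds: number;        // 1. global default from config
  expectedDurationSeconds?: number; // 2. action manifest
  graceMultiplier: number;          // multiplier from Submit step 5
  perActionTtlSeconds?: number;     // 3. jobs.perActionTtl.<actionId>
  flagTtlSeconds?: number;          // 4. sm job submit --ttl
}

// Later sources win; the computed value is the fallback when no override is set.
function resolveTtlSeconds(i: TtlInputs): number {
  const computed = Math.max(
    (i.expectedDurationSeconds ?? 0) * i.graceMultiplier,
    i.minimumTtlSeconds,
  );
  return i.flagTtlSeconds ?? i.perActionTtlSeconds ?? computed;
}

console.log(resolveTtlSeconds({ minimumTtlSeconds: 60, expectedDurationSeconds: 100, graceMultiplier: 3 }));
// prints 300
```

Under this reading an explicit override may be lower than `minimumTtlSeconds`; whether overrides should be clamped to the minimum is not settled by the text above.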
126
+
127
+ ---
128
+
129
+ ## Record (callback)
130
+
131
+ `sm record --id <id> --nonce <n> --status completed|failed ...`:
132
+
133
+ 1. Load the job by id. If not found → exit 5.
134
+ 2. Compare the supplied nonce against `state_jobs.nonce`. Mismatch → exit 4 without mutation.
135
+ 3. If `state_jobs.status != 'running'` → exit 2 with message "job not in running state". This catches late callbacks after a reap.
136
+ 4. If `--status completed`: validate the report file against the action's declared report schema. On validation failure → transition to `failed` with reason `report-invalid`; DO NOT stay `running`.
137
+ 5. Write the execution record (see `schemas/execution-record.schema.json`) with the full metrics.
138
+ 6. Transition the job to the terminal state.
139
+ 7. Emit `job.callback.received` followed by `job.completed` or `job.failed`.
140
+
141
+ The nonce is the sole authentication factor. A compromised nonce allows forged callbacks for that single job. Nonces MUST be generated per-job; never reused; never logged at info level or above.
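
Steps 1 to 3 of the record procedure amount to an ordered series of checks, each with its own exit code. A TypeScript sketch follows; `checkRecordCallback` and the result type are hypothetical, and apart from "job not in running state" the messages are invented for illustration.

```typescript
type JobStatus = "queued" | "running" | "completed" | "failed";

interface StoredJob {
  id: string;
  nonce: string;
  status: JobStatus;
}

type RecordCheck =
  | { ok: true }
  | { ok: false; exitCode: 5 | 4 | 2; message: string };

// Applies the checks in the order listed above; nothing is mutated here.
function checkRecordCallback(job: StoredJob | undefined, suppliedNonce: string): RecordCheck {
  if (job === undefined) {
    return { ok: false, exitCode: 5, message: "job not found" };
  }
  if (job.nonce !== suppliedNonce) {
    return { ok: false, exitCode: 4, message: "nonce mismatch" };
  }
  if (job.status !== "running") {
    return { ok: false, exitCode: 2, message: "job not in running state" };
  }
  return { ok: true };
}
```

The sketch compares nonces with `!==` for brevity; an implementation concerned about timing side channels might prefer a constant-time comparison, which the text above does not require.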
142
+
143
+ ---
144
+
145
+ ## Duplicate prevention rationale
146
+
147
+ The deduplication key `(actionId, actionVersion, nodeId, contentHash)` exists to prevent:
148
+
149
+ - Accidental double-submit when a user re-runs a command.
150
+ - Race conditions where two processes both try to submit the same action over the same node at the same content hash.
151
+ - Waste of LLM tokens re-computing an unchanged result.
152
+
153
+ Post-completion, the check is NOT performed: resubmitting a completed job is always allowed (the previous result is kept in history).
154
+
155
+ `--force` bypasses the check for legitimate reruns (e.g., re-testing an action after debugging).
156
+
157
+ ---
158
+
159
+ ## Concurrency
160
+
161
+ MVP (v0.x): **one job at a time**. `sm job run --all` drains sequentially. Enforced by the claim semantics above — there is no pool or scheduler.
162
+
163
+ The event schema carries a `jobId` on every event specifically so that parallel execution becomes a non-breaking extension. A future implementation MAY spawn multiple claim/run loops concurrently and interleave events; consumers identify which job an event belongs to by `jobId`.
164
+
165
+ Parallelism is NOT a v1.0 commitment. Implementations that offer it MUST still emit the canonical event stream correctly.
166
+
167
+ ---
168
+
169
+ ## Atomicity edge cases
170
+
171
+ Implementations MUST handle each of the following:
172
+
173
+ | Scenario | Required handling |
174
+ |---|---|
175
+ | DB says `queued` or `running`, but the job MD file is missing on disk. | Mark `failed` with `failureReason = job-file-missing`. `sm doctor` MUST report these proactively. |
176
+ | MD file present in `.skill-map/jobs/`, no matching DB row. | `sm doctor` MUST list them. Implementations MUST NOT auto-delete. `sm job prune --orphan-files` removes them explicitly. |
177
+ | User edited the MD file between submit and run. | By design: the runner uses the current file contents. The user owns the consequences. Event stream MAY note the mtime change. |
178
+ | Job `completed`, MD file still present. | Normal. Retention policy (`sm job prune` per `jobs.retention.*` config) eventually cleans up. |
179
+ | Runner crashes between `claim` and reading the file. | Covered by TTL/reap: when `expiresAt` passes, the next reap marks the job `failed` with `abandoned`. |
180
+ | Callback arrives after reap already failed the job. | Reject with exit 2 (see Record step 3). The runner should treat this as an error and log it. |
181
+
182
+ ---
183
+
184
+ ## Cancellation
185
+
186
+ `sm job cancel <id>` is the only user-facing transition outside the normal flow. Effects:
187
+
188
+ | From | Effect |
189
+ |---|---|
190
+ | `queued` | Transition to `failed` with `failureReason = user-cancelled`. |
191
+ | `running` | Transition to `failed` with `failureReason = user-cancelled`. DOES NOT interrupt a subprocess runner; the runner will discover the failed state on its next callback and exit cleanly. Implementations MAY additionally send a signal to the subprocess but this is not normative. |
192
+ | Terminal | Reject with exit 2 ("already terminal"). |
193
+
194
+ ---
195
+
196
+ ## Retention and GC
197
+
198
+ Config controls (`jobs.retention.completed`, `jobs.retention.failed`):
199
+
200
+ - `completed` default 30 days (2592000 seconds).
201
+ - `failed` default `null` = never auto-purge (preserves history of failures for analysis).
202
+
203
+ `sm job prune` applies retention. Implementations MAY run this on a schedule (e.g., on `sm doctor`, or in a cron adapter) but MUST NOT prune implicitly during normal verb execution.
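
The retention defaults translate into a simple eligibility test. The TypeScript sketch below assumes retention is measured from the job's finish time and that a job becomes prunable only once its age strictly exceeds the limit; both are assumptions of this sketch, and `RetentionConfig` and `isPrunable` are hypothetical names.

```typescript
// Retention limits in seconds; null means never auto-purge.
interface RetentionConfig {
  completed: number | null;
  failed: number | null;
}

// Defaults from the list above: 30 days for completed, never for failed.
const DEFAULT_RETENTION: RetentionConfig = { completed: 30 * 24 * 60 * 60, failed: null };

function isPrunable(
  job: { status: "completed" | "failed"; finishedAt: number }, // finishedAt in Unix ms
  now: number,
  cfg: RetentionConfig = DEFAULT_RETENTION,
): boolean {
  const limitSeconds = cfg[job.status];
  if (limitSeconds === null) return false;
  return now - job.finishedAt > limitSeconds * 1000;
}
```

With the defaults, a failed job is never selected, however old it is.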
204
+
205
+ ---
206
+
207
+ ## Stability
208
+
209
+ The state machine diagram above is **stable** as of spec v1.0.0. Adding a new state is a major bump. Adding a new terminal reason (`failureReason` enum value) is a minor bump.
210
+
211
+ The `contentHash` formula is **stable**. Changing what goes into the hash breaks duplicate detection across versions and is a major bump.
212
+
213
+ The atomic-claim semantics are **stable**. A double-claim would be a silent correctness bug observable through event-stream anomalies.