@alexandrealvaro/agentic 0.11.2-beta.1 → 0.12.0-beta.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -40,6 +40,7 @@ Two categories ([ADR-0007](doc/adr/0007-workflow-operational-skills.md)) and two
40
40
  | `agentic-review` | workflow-operational | universal | Fresh-context code review per WORKFLOW §10; structured findings, no "approve" | `/agentic-review <range>` |
41
41
  | `agentic-ground` | workflow-operational | universal | Four-source pre-implementation research (docs / OSS / in-repo / git history) + happy-path synthesis + deviation gate per WORKFLOW §4 + §5 | `/agentic-ground` |
42
42
  | `agentic-next` | workflow-operational | universal | State-aware navigation aid (`flutter doctor` pattern) — surveys the four-layer artifact stack and recommends prioritized next actions; complements `agentic-audit` (drift) | `/agentic-next` |
43
+ | `agentic-spike` | workflow-operational | universal | Staged spike with golden fixtures per WORKFLOW §14, for cases where the *technique* is uncertain across multiple plausible approaches; produces `spikes/NNNN-<slug>/` with discovery + fixture + pipeline-with-gates + two-layer evaluation | `/agentic-spike` |
43
44
  | `agentic-design` | spec-driven | auto if frontend detected | Bootstrap `DESIGN.md` from existing tokens (Figma, tailwind.config, tokens.json, CSS custom props) | `/agentic-design` |
44
45
  | `agentic-subagent` | spec-driven | auto if installing for Claude Code | Drafts `.claude/agents/<name>.md` (Claude Code only — Codex has no subagent primitive) | `/agentic-subagent` |
45
46
  | `agentic-skill` | spec-driven | opt-in only | Drafts a new Claude Code or Codex skill at the appropriate path | `/agentic-skill` |
@@ -155,6 +156,8 @@ The kit's discipline scales with the project's maturity. A solo PoC may legitima
155
156
 
156
157
  **Lost mid-flow?** Invoke `/agentic-next` at any time to survey the project's state across the four-layer artifact stack (Constitution → Spec → Plan/Decisions → Code) and get prioritized next-action recommendations. Read-only; complements `/agentic-audit` (drift detection — different question).
157
158
 
159
+ **Technique uncertain across multiple plausible approaches?** Invoke `/agentic-spike` (per WORKFLOW §14) when the spec is clear but the *how* is unknown — library choice, multi-stage transformation, novel domain. The skill scaffolds a staged spike with golden fixtures + per-stage debug artifacts + two-layer evaluation under `spikes/NNNN-<slug>/`. The directory is throwaway by design; conclude with `/agentic-adr` and delete.
160
+
158
161
  ## Manual prompts
159
162
 
160
163
  If you prefer to skip the installer, the same artifacts can be generated by pasting prompts directly into your agent. Each prompt file has the literal text to copy, plus the matching template structure:
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@alexandrealvaro/agentic",
3
- "version": "0.11.2-beta.1",
3
+ "version": "0.12.0-beta.1",
4
4
  "description": "Bootstrap and audit AGENTS.md, ARCHITECTURE.md, ADRs, skills, and subagents for engineering production code with LLMs",
5
5
  "type": "module",
6
6
  "bin": {
@@ -94,6 +94,28 @@ export const CONDITIONAL_SKILLS = [
94
94
  hintWhenAuto: 'opt-in',
95
95
  hintWhenManual: 'WORKFLOW §11 hooks scaffolder (pre-commit, pre-push)',
96
96
  },
97
+ // The next two skills are universal in `team` / `mature` profiles
98
+ // (declared in PROFILES['team' / 'mature'].universal in src/lib/profiles.js)
99
+ // and conditional/opt-in in `solo`. They must appear in this catalog so
100
+ // `availableConditionalsForProfile('solo')` lookups in `pickConditionalAuto`
101
+ // succeed — without these entries, `if (!def) continue` silently skipped
102
+ // them and a `solo` user could not opt-in to either (review B1, v0.11.3).
103
+ // The autoIf rule here is the universal-default; per-profile overrides
104
+ // come from `availableConditionalsForProfile`'s rule field.
105
+ {
106
+ name: 'agentic-architecture',
107
+ autoIf: () => true,
108
+ agents: ['claude-code', 'codex'],
109
+ hintWhenAuto: 'system patterns + boundaries',
110
+ hintWhenManual: 'opt-in (recommended once load-bearing patterns emerge)',
111
+ },
112
+ {
113
+ name: 'agentic-adr',
114
+ autoIf: () => true,
115
+ agents: ['claude-code', 'codex'],
116
+ hintWhenAuto: 'binding architectural decisions (Nygard pattern)',
117
+ hintWhenManual: 'opt-in (recommended for binding decisions worth recording)',
118
+ },
97
119
  ];
98
120
 
99
121
  const CONDITIONAL_BY_NAME = Object.fromEntries(
@@ -295,16 +317,23 @@ export async function initCommand(opts) {
295
317
  confirmReplace,
296
318
  previousStates,
297
319
  kitVersion: pkg.version,
320
+ profile: profileName,
298
321
  });
299
322
  allActions.push(...actions);
300
- const next = nextStates[agent];
301
- next.profile = profileName;
302
- saveState(cwd, agent, next);
323
+ // installSkills now stamps `profile` into nextStates per review C3.
324
+ // No post-hoc injection.
325
+ saveState(cwd, agent, nextStates[agent]);
303
326
  }
304
327
 
328
+ // Dedup: agentic-architecture and agentic-adr are universal at team /
329
+ // mature (in REQUIRED_SKILLS) AND conditional at solo (in
330
+ // CONDITIONAL_SKILLS) per review B1 (v0.11.3). Without the Set, the
331
+ // managed-skills section would list those rows twice.
305
332
  const skillDisplayOrder = [
306
- ...REQUIRED_SKILLS,
307
- ...CONDITIONAL_SKILLS.map((s) => s.name),
333
+ ...new Set([
334
+ ...REQUIRED_SKILLS,
335
+ ...CONDITIONAL_SKILLS.map((s) => s.name),
336
+ ]),
308
337
  ].filter((s) => installedSkillSet.has(s));
309
338
 
310
339
  const confirmAppend = interactive
@@ -341,6 +370,7 @@ export async function initCommand(opts) {
341
370
  '/agentic-review (WORKFLOW §10)',
342
371
  '/agentic-ground (WORKFLOW §4 + §5)',
343
372
  '/agentic-next (state survey + recommendations)',
373
+ '/agentic-spike (WORKFLOW §14 — staged spike with golden fixtures)',
344
374
  ...(optedSkills.includes('agentic-design') ? ['/agentic-design (DESIGN.md)'] : []),
345
375
  ...(optedSkills.includes('agentic-subagent') && agents.includes('claude-code')
346
376
  ? ['/agentic-subagent']
@@ -17,6 +17,10 @@ const AGENT_LABEL = {
17
17
  };
18
18
 
19
19
  function readProjectProfile(cwd) {
20
+ // Returns { agent: profileName }. Use `loadProjectStates(cwd)` instead
21
+ // when you need both the profile and the full state objects in a single
22
+ // pass (avoids the TOCTOU window where state files could be deleted
23
+ // between two reads — review C2, v0.11.3).
20
24
  const perAgent = {};
21
25
  for (const agent of VALID_AGENTS) {
22
26
  const state = loadState(cwd, agent);
@@ -25,6 +29,23 @@ function readProjectProfile(cwd) {
25
29
  return perAgent;
26
30
  }
27
31
 
32
+ function loadProjectStates(cwd) {
33
+ // Single-pass load of every per-agent state file. Returns
34
+ // { statesByAgent, profilesByAgent }. Both objects share the same agent
35
+ // keys so callers can iterate one and look up the other without a
36
+ // second filesystem read.
37
+ const statesByAgent = {};
38
+ const profilesByAgent = {};
39
+ for (const agent of VALID_AGENTS) {
40
+ const state = loadState(cwd, agent);
41
+ if (state) {
42
+ statesByAgent[agent] = state;
43
+ profilesByAgent[agent] = state.profile ?? DEFAULT_PROFILE;
44
+ }
45
+ }
46
+ return { statesByAgent, profilesByAgent };
47
+ }
48
+
28
49
  function showProfile(cwd) {
29
50
  const perAgent = readProjectProfile(cwd);
30
51
  if (Object.keys(perAgent).length === 0) {
@@ -64,8 +85,12 @@ function formatRule(rule) {
64
85
  async function setProfile(cwd, name, opts) {
65
86
  validateProfile(name);
66
87
 
67
- const perAgent = readProjectProfile(cwd);
68
- if (Object.keys(perAgent).length === 0) {
88
+ // Single load — reuse below to write. Avoids the TOCTOU window where
89
+ // state files could be deleted between read and re-read (review C2,
90
+ // v0.11.3). Previous implementation called readProjectProfile then
91
+ // loadState again per agent in the write loop.
92
+ const { statesByAgent, profilesByAgent } = loadProjectStates(cwd);
93
+ if (Object.keys(statesByAgent).length === 0) {
69
94
  throw new Error(
70
95
  'no agentic install detected. Run `agentic init --profile <name>` first.'
71
96
  );
@@ -73,7 +98,7 @@ async function setProfile(cwd, name, opts) {
73
98
 
74
99
  const interactive = process.stdout.isTTY && !opts.yes;
75
100
 
76
- const currentProfiles = [...new Set(Object.values(perAgent))];
101
+ const currentProfiles = [...new Set(Object.values(profilesByAgent))];
77
102
  if (currentProfiles.length === 1 && currentProfiles[0] === name) {
78
103
  process.stdout.write(`Profile already \`${name}\` for all installed agents. No change.\n`);
79
104
  return;
@@ -82,7 +107,7 @@ async function setProfile(cwd, name, opts) {
82
107
  if (interactive) {
83
108
  p.intro(`agentic profile set ${name}`);
84
109
  p.note(
85
- Object.entries(perAgent)
110
+ Object.entries(profilesByAgent)
86
111
  .map(([agent, profile]) => `${AGENT_LABEL[agent]}: ${profile} → ${name}`)
87
112
  .join('\n'),
88
113
  'Profile change'
@@ -101,9 +126,9 @@ async function setProfile(cwd, name, opts) {
101
126
  }
102
127
 
103
128
  // Write the new profile to each installed agent's state file before
104
- // running update, so update reads the new profile.
105
- for (const agent of Object.keys(perAgent)) {
106
- const state = loadState(cwd, agent);
129
+ // running update, so update reads the new profile. Reuses the in-memory
130
+ // states loaded above — no second filesystem read.
131
+ for (const [agent, state] of Object.entries(statesByAgent)) {
107
132
  state.profile = name;
108
133
  saveState(cwd, agent, state);
109
134
  }
@@ -128,18 +128,27 @@ function previouslyOptedConditional(previousStates, currentAgents, profileName)
128
128
  return [...opted];
129
129
  }
130
130
 
131
- function profileFromStates(previousStates, currentAgents) {
132
- // If multiple agents disagree on profile, surface and bail. Profile is
133
- // expected to match across agents in the same project.
131
+ function profileFromStates(statesByAgent, currentAgents) {
132
+ // Profile must match across every installed agent in the project — not
133
+ // only across the agents the current invocation targets. Without this,
134
+ // `--agent claude-code` on a project where codex was installed with a
135
+ // different profile masks the disagreement and produces inconsistent
136
+ // installs. Per review B2 (v0.11.3): always inspect the FULL set of
137
+ // loaded states, not the narrowed slice.
134
138
  const seen = new Set();
135
- for (const agent of currentAgents) {
136
- const prev = previousStates[agent];
137
- if (prev?.profile) seen.add(prev.profile);
139
+ for (const [agent, state] of Object.entries(statesByAgent)) {
140
+ if (state?.profile) seen.add(state.profile);
141
+ }
142
+ if (seen.size === 0) {
143
+ // No state on disk for any agent. Fall back to the default; current
144
+ // invocation is a fresh / legacy install handled by the legacy path.
145
+ return DEFAULT_PROFILE;
138
146
  }
139
- if (seen.size === 0) return DEFAULT_PROFILE;
140
147
  if (seen.size > 1) {
141
148
  throw new Error(
142
- `state files disagree on profile (${[...seen].join(', ')}). Run \`agentic profile set <name>\` to reconcile.`
149
+ `state files disagree on profile (${[...seen].join(
150
+ ', '
151
+ )}). Run \`agentic profile set <name>\` to reconcile across all installed agents before re-running update.`
143
152
  );
144
153
  }
145
154
  return [...seen][0];
@@ -172,7 +181,10 @@ export async function updateCommand(opts) {
172
181
  previousStates[agent] = statesByAgent[agent] ?? null;
173
182
  }
174
183
 
175
- const profileName = profileFromStates(previousStates, agents);
184
+ // Pass the FULL loaded set, not the narrowed slice. profileFromStates
185
+ // surfaces cross-agent disagreement even when the current invocation
186
+ // targets only one agent (review B2, v0.11.3).
187
+ const profileName = profileFromStates(statesByAgent, agents);
176
188
  const previousOpted = previouslyOptedConditional(
177
189
  previousStates,
178
190
  agents,
@@ -269,13 +281,14 @@ export async function updateCommand(opts) {
269
281
  confirmReplace,
270
282
  previousStates: { [agent]: previousStates[agent] ?? null },
271
283
  kitVersion: pkg.version,
284
+ profile: profileName,
272
285
  dryRun,
273
286
  force,
274
287
  });
275
288
  allActions.push(...result.actions);
276
- const next = result.nextStates[agent];
277
- next.profile = profileName;
278
- nextStates[agent] = next;
289
+ // installSkills now stamps `profile` into nextStates per review C3.
290
+ // No post-hoc injection.
291
+ nextStates[agent] = result.nextStates[agent];
279
292
  }
280
293
 
281
294
  if (!dryRun) {
@@ -284,9 +297,15 @@ export async function updateCommand(opts) {
284
297
  }
285
298
  }
286
299
 
300
+ // Dedup: agentic-architecture and agentic-adr are universal at team /
301
+ // mature (in REQUIRED_SKILLS) AND conditional at solo (in
302
+ // CONDITIONAL_SKILLS) per review B1 (v0.11.3). Without the Set, the
303
+ // managed-skills section would list those rows twice.
287
304
  const skillDisplayOrder = [
288
- ...REQUIRED_SKILLS,
289
- ...CONDITIONAL_SKILLS.map((s) => s.name),
305
+ ...new Set([
306
+ ...REQUIRED_SKILLS,
307
+ ...CONDITIONAL_SKILLS.map((s) => s.name),
308
+ ]),
290
309
  ].filter((s) => installedSkillSet.has(s));
291
310
 
292
311
  const confirmAppend = interactive
package/src/index.js CHANGED
@@ -36,12 +36,25 @@ export async function run(argv) {
36
36
  .option('--force', 'overwrite user-edited files on conflict (non-interactive default: no)')
37
37
  .action(updateCommand);
38
38
 
39
+ // Profile command accepts two positionals so `agentic profile set <name>`
40
+ // captures the name natively. Per review C1 (v0.11.3): the prior single-
41
+ // positional form had Commander swallow the second arg, leaving the
42
+ // documented `Usage: agentic profile set <name>` error message misleading.
43
+ // All forms work now:
44
+ // agentic profile → show
45
+ // agentic profile show → show
46
+ // agentic profile list → list
47
+ // agentic profile set <name> → set
48
+ // agentic profile <name> → shorthand for `set <name>`
49
+ // agentic profile set --name <name> → flag form (back-compat)
39
50
  program
40
- .command('profile [subcommand]')
51
+ .command('profile [subcommand] [name]')
41
52
  .description('Show, list, or set the project maturity profile (poc | solo | team | mature)')
42
- .option('-n, --name <name>', 'profile name when used with `set` subcommand')
53
+ .option('-n, --name <name>', 'profile name (alternative to positional, for `set` subcommand)')
43
54
  .option('-y, --yes', 'skip confirmation prompts (non-interactive)')
44
- .action(profileCommand);
55
+ .action((subcommand, name, opts) =>
56
+ profileCommand(subcommand, { ...opts, name: opts.name ?? name })
57
+ );
45
58
 
46
59
  await program.parseAsync(argv);
47
60
  }
@@ -12,6 +12,7 @@ import {
12
12
  import { fileURLToPath } from 'node:url';
13
13
  import { basename, dirname, join, relative, sep as PATH_SEP } from 'node:path';
14
14
  import { SCHEMA_VERSION } from './state.js';
15
+ import { DEFAULT_PROFILE, validateProfile } from './profiles.js';
15
16
 
16
17
  const __dirname = dirname(fileURLToPath(import.meta.url));
17
18
  const KIT_ROOT = join(__dirname, '..', '..');
@@ -190,9 +191,11 @@ export async function installSkills({
190
191
  confirmReplace = async () => false,
191
192
  previousStates = {},
192
193
  kitVersion = null,
194
+ profile = null,
193
195
  dryRun = false,
194
196
  force = false,
195
197
  }) {
198
+ if (profile !== null) validateProfile(profile);
196
199
  const actions = [];
197
200
  const nextStates = {};
198
201
 
@@ -302,10 +305,15 @@ export async function installSkills({
302
305
  };
303
306
  }
304
307
 
308
+ // Profile resolution order: explicit `profile` arg > prior state's
309
+ // profile > DEFAULT_PROFILE. installSkills is the single owner of the
310
+ // returned nextStates' shape; callers no longer inject `profile`
311
+ // post-hoc per review C3 (v0.11.3).
305
312
  nextStates[agent] = {
306
313
  schemaVersion: SCHEMA_VERSION,
307
314
  kitVersion: kitVersion ?? prev?.kitVersion ?? null,
308
315
  agent,
316
+ profile: profile ?? prev?.profile ?? DEFAULT_PROFILE,
309
317
  skills: nextSkills,
310
318
  };
311
319
  }
@@ -20,7 +20,7 @@ export const PROFILE_NAMES = ['poc', 'solo', 'team', 'mature'];
20
20
 
21
21
  export const PROFILES = {
22
22
  poc: {
23
- universal: ['agentic-philosophy', 'agentic-ground', 'agentic-audit', 'agentic-next'],
23
+ universal: ['agentic-philosophy', 'agentic-ground', 'agentic-audit', 'agentic-next', 'agentic-spike'],
24
24
  conditional: {
25
25
  'agentic-design': 'blocked',
26
26
  'agentic-subagent': 'blocked',
@@ -35,6 +35,7 @@ export const PROFILES = {
35
35
  'agentic-ground',
36
36
  'agentic-audit',
37
37
  'agentic-next',
38
+ 'agentic-spike',
38
39
  'agentic-bootstrap',
39
40
  'agentic-spec',
40
41
  'agentic-task',
@@ -62,6 +63,7 @@ export const PROFILES = {
62
63
  'agentic-review',
63
64
  'agentic-ground',
64
65
  'agentic-next',
66
+ 'agentic-spike',
65
67
  ],
66
68
  conditional: {
67
69
  'agentic-design': 'frontend',
@@ -83,6 +85,7 @@ export const PROFILES = {
83
85
  'agentic-review',
84
86
  'agentic-ground',
85
87
  'agentic-next',
88
+ 'agentic-spike',
86
89
  ],
87
90
  conditional: {
88
91
  'agentic-design': 'frontend',
@@ -28,6 +28,8 @@ export const SKILL_DESCRIPTIONS = {
28
28
  'Four-source pre-implementation research (docs / OSS / in-repo / git history) + happy-path synthesis + deviation gate. WORKFLOW §4 + §5.',
29
29
  'agentic-next':
30
30
  'State survey + prioritized next-action recommendations across the four-layer artifact stack. Read-only navigation aid (`flutter doctor` pattern).',
31
+ 'agentic-spike':
32
+ 'Staged spike with golden fixtures per WORKFLOW §14. Discovery + fixture + pipeline-with-gates + two-layer evaluation, when the *technique* is uncertain across multiple plausible approaches.',
31
33
  'agentic-design': 'Bootstrap `DESIGN.md` from existing tokens (frontend projects).',
32
34
  'agentic-subagent': 'Draft a new Claude Code subagent at `.claude/agents/<name>.md`.',
33
35
  'agentic-skill': 'Draft a new Claude Code or Codex skill at the appropriate path.',
@@ -0,0 +1,220 @@
1
+ ---
2
+ name: agentic-spike
3
+ description: Scaffold a staged spike with golden fixtures per WORKFLOW.md §14, for cases where the spec is clear but the technique is uncertain across multiple plausible approaches. Four stages — discovery, golden fixture, pipeline with gates, two-layer evaluation. Use when the unknown is *how*, not *what*. Triggers on "spike", "uncertain technique", "which library", "CV pipeline", "evaluate approaches", "ground truth", "golden fixture", "staged pipeline", "debug per stage". Routes to `agentic-ground` if the *how* is routine and a single happy path is obvious. Read-and-write — creates `spikes/NNNN-<slug>/` with fixtures, per-stage debug artifacts, and eval results.
4
+ allowed-tools: Read, Write, Glob, Grep, Bash, WebFetch, WebSearch
5
+ ---
6
+
7
+ # /agentic-spike
8
+
9
+ Implements WORKFLOW.md §14 (Staged Spikes With Golden Fixtures) end-to-end. The skill is for cases where the spec is clear but the *technique* is uncertain across multiple plausible approaches — library choice, CV approach, multi-stage transformation. WORKFLOW §9 (TDG) assumes the path is known and validates end-to-end; §14 assumes the path is unknown and validates per stage. Different uncertainty regimes; this skill is for the unknown one.
10
+
11
+ The skill creates a working directory under `spikes/NNNN-<slug>/` and fills it stage-by-stage. The directory is throwaway by design — when the spike concludes, an ADR records the decision (`/agentic-adr`) and the spike directory is deleted. See ADR-0017 for the promote-or-delete lifecycle rationale.
12
+
13
+ ## Step 0 — Confirm uncertainty
14
+
15
+ The skill is for *unknown technique* across multiple plausible approaches, not for *non-trivial work* in general. If a single happy path is obvious, **do not start a spike**. If the *how* is knowable from official docs / OSS examples / in-repo patterns / git history, route to `agentic-ground` and stop.
16
+
17
+ Concrete tests to run before starting:
18
+
19
+ * Could `agentic-ground`'s four-source research surface a single happy path with a defensible deviation gate? If yes, run that instead.
20
+ * Are there ≥2 candidate techniques with materially different trade-offs that no source resolves? If no, this is not a spike.
21
+ * Is end-to-end validation against expected outputs feasible without per-stage debug? If yes, this is `agentic-task` + `agentic-philosophy` Goal-Driven Execution territory, not a spike.
22
+
23
+ If the spike is warranted, confirm with the user the *scope* (the specific surface where uncertainty sits — not the whole feature) and proceed.
24
+
25
+ ## Step 1 — Discovery
26
+
27
+ List canonical approaches grounded in **official docs and real examples**. Pick one (or a small set, ≤3) by an **explicit criterion**.
28
+
29
+ Candidate-listing process:
30
+
31
+ 1. Search official documentation for the language / library / domain in question. Cite URL + version.
32
+ 2. Search public OSS for repos that solve the same technical scope. Cite `<repo>:<path>:<line-range>` and fetch via tools — never paraphrase from training memory.
33
+ 3. Survey in-repo for analogous patterns the codebase already uses. Cite `<file>:<line>` or "no analog found".
34
+ 4. Survey git history for prior attempts at the same problem. Cite `<commit-sha>` or "no prior attempt".
35
+
36
+ Output format:
37
+
38
+ ```markdown
39
+ ## Discovery — <scope>
40
+
41
+ ### Candidate techniques
42
+ 1. **<name>** — <one-line description>. Source: <URL or repo:path>. Trade-offs: <pros / cons>.
43
+ 2. **<name>** — ...
44
+ 3. **<name>** — ...
45
+
46
+ ### Selection criterion
47
+ <one-line criterion: latency / accuracy / readability / dependencies / etc>
48
+
49
+ ### Picked
50
+ <technique X>, picked by criterion <Y>. Alternatives held in reserve: <list>.
51
+ ```
52
+
53
+ The output of this step is **information, not code**. No spike directory is created yet. The user reviews the candidate list and confirms the picked approach (or revises) before Step 2.
54
+
55
+ ## Step 2 — Golden fixture
56
+
57
+ Curate inputs with rich expected outputs. The fixture is the ground truth the staged pipeline validates against; richer fixtures catch more failure modes.
58
+
59
+ Create the spike directory:
60
+
61
+ ```bash
62
+ mkdir -p spikes/NNNN-<slug>/{fixtures,debug,eval}
63
+ ```
64
+
65
+ Where `NNNN` is the next available 4-digit number (mirrors ADR / task / spec numbering). List `spikes/` and pick the next slot.
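As a sketch, the next-slot computation might look like this in Python (the helper name and zero-padded return are illustrative, not part of the kit):

```python
import re
from pathlib import Path

def next_spike_slot(spikes_dir="spikes"):
    """Return the next available 4-digit NNNN under spikes/.

    Illustrative helper: scans spikes/ for `NNNN-<slug>` entries and
    zero-pads max + 1, falling back to 0001 when spikes/ is empty or absent.
    """
    nums = [
        int(m.group(1))
        for p in Path(spikes_dir).glob("[0-9]*-*")
        if (m := re.match(r"(\d{4})-", p.name))
    ]
    return f"{max(nums, default=0) + 1:04d}"
```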
66
+
67
+ The fixture format is JSON keyed by input path (recommended) or whatever shape the domain demands. For computer vision: bounding boxes, sizes, lighting condition, difficulty tag, edge case markers. For multi-stage transformations: intermediate states. For library choice: representative inputs covering typical and edge cases.
68
+
69
+ Example fixture file (`spikes/0001-detect-circles/fixtures/golden.json`):
70
+
71
+ ```json
72
+ {
73
+ "inputs/easy-01.jpg": {
74
+ "expected": [
75
+ { "bbox": [120, 80, 240, 200], "label": "circle", "size": "large", "lighting": "even" }
76
+ ],
77
+ "difficulty": "easy",
78
+ "edge_cases": []
79
+ },
80
+ "inputs/hard-01.jpg": {
81
+ "expected": [
82
+ { "bbox": [50, 60, 90, 100], "label": "circle", "size": "small", "lighting": "low" },
83
+ { "bbox": [200, 80, 260, 140], "label": "circle", "size": "medium", "lighting": "even", "occluded": true }
84
+ ],
85
+ "difficulty": "hard",
86
+ "edge_cases": ["low-light", "partial-occlusion", "multiple-objects"]
87
+ }
88
+ }
89
+ ```
90
+
91
+ Curation principles:
92
+
93
+ * Include **edge cases** (low light, partial occlusion, malformed inputs, large inputs, empty inputs) — not just "happy path" examples. The fixture's job is to surface *where* a technique fails, not just whether it succeeds on easy cases.
94
+ * Include **difficulty tags** so per-stage evaluation can report performance segmented by difficulty.
95
+ * Keep the fixture as data, not code. JSON / YAML / CSV — anything that diffs cleanly and survives a refactor.
96
+
97
+ The fixture is the contract the pipeline validates against. Treat it like spec text — it should not change once the spike runs unless ground truth itself changes.
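A minimal loader sketch, assuming the JSON shape of the example above (the helper and its key checks are illustrative, not prescribed by the skill):

```python
import json
from pathlib import Path

def load_fixture(path):
    """Load golden.json and do a light schema sanity check.

    Illustrative helper: the skill does not prescribe a loader. This just
    enforces the shape the example fixture implies (a dict keyed by input
    path, each entry carrying 'expected', 'difficulty', 'edge_cases').
    """
    fixture = json.loads(Path(path).read_text())
    for input_path, entry in fixture.items():
        missing = {"expected", "difficulty", "edge_cases"} - entry.keys()
        if missing:
            raise ValueError(f"{input_path}: missing keys {sorted(missing)}")
    return fixture
```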
98
+
99
+ ## Step 3 — Pipeline with gates
100
+
101
+ One technique per stage. Each stage emits a **debug artifact** that makes its output inspectable.
102
+
103
+ Pipeline structure:
104
+
105
+ ```
106
+ spikes/NNNN-<slug>/
107
+ ├── README.md # spike framing (Step 1 output)
108
+ ├── fixtures/ # golden inputs + expected outputs
109
+ │ └── golden.json
110
+ ├── pipeline/ # one file per stage
111
+ │ ├── 01-preprocess.<ext>
112
+ │ ├── 02-detect.<ext>
113
+ │ └── 03-postprocess.<ext>
114
+ ├── debug/ # per-stage debug artifacts
115
+ │ ├── 01-preprocess/
116
+ │ ├── 02-detect/
117
+ │ └── 03-postprocess/
118
+ └── eval/ # evaluation results (Step 4)
119
+ ```
120
+
121
+ Each stage's debug artifact format depends on the domain:
122
+
123
+ * CV pipelines: image saved to `debug/NN-<stage>/<input-name>.png` showing the stage's output.
124
+ * Multi-stage transformations: intermediate JSON saved to `debug/NN-<stage>/<input-name>.json`.
125
+ * Library evaluation: log row per (input, library) saved to `debug/NN-<stage>/log.csv`.
126
+
127
+ The discipline: **each stage's output must be inspectable independently**. End-to-end output alone tells you *that* it failed; per-stage debug tells you *where*.
128
+
129
+ Implementation pattern (any language):
130
+
131
+ * Stage takes (input, context) and returns (output, debug-record). Debug-record is written to `debug/NN-<stage>/`.
132
+ * Pipeline runs stages sequentially. Failure at any stage halts the pipeline and reports the stage where divergence happened.
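That stage contract can be sketched as follows (names illustrative; any language works):

```python
import json
from pathlib import Path

def run_pipeline(stages, item, debug_root):
    """Run (name, fn) stages sequentially; each fn returns (output, debug_record).

    Illustrative sketch of the stage contract above: every stage's debug
    record is written to <debug_root>/<stage>/ so its output is inspectable
    on its own, and a stage raising halts the pipeline, reporting the stage
    where divergence happened.
    """
    value = item
    for name, fn in stages:
        stage_dir = Path(debug_root) / name
        stage_dir.mkdir(parents=True, exist_ok=True)
        try:
            value, debug_record = fn(value)
        except Exception as exc:
            (stage_dir / "error.json").write_text(json.dumps({"error": str(exc)}))
            raise RuntimeError(f"pipeline diverged at stage {name}") from exc
        (stage_dir / "record.json").write_text(json.dumps(debug_record))
    return value
```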
133
+
134
+ ## Step 4 — Two-layer evaluation
135
+
136
+ Run the pipeline against the fixture and emit two layers of results:
137
+
138
+ * **End-to-end:** how many fixture inputs produced expected outputs? Reported as pass / fail per input, plus aggregate pass rate.
139
+ * **Per-stage:** for each fixture input, where did the pipeline diverge? Stage NN's output vs the expected intermediate. Reported as pass / fail per (input, stage).
140
+
141
+ Output to `spikes/NNNN-<slug>/eval/results.json`:
142
+
143
+ ```json
144
+ {
145
+ "fixture": "fixtures/golden.json",
146
+ "pipeline_version": "<commit-sha or timestamp>",
147
+ "end_to_end": {
148
+ "total": 10,
149
+ "passed": 7,
150
+ "failed": 3
151
+ },
152
+ "per_stage": {
153
+ "01-preprocess": { "passed": 10, "failed": 0 },
154
+ "02-detect": { "passed": 8, "failed": 2 },
155
+ "03-postprocess": { "passed": 7, "failed": 1 }
156
+ },
157
+ "failures": [
158
+ {
159
+ "input": "inputs/hard-02.jpg",
160
+ "diverged_at": "02-detect",
161
+ "expected": [...],
162
+ "actual": [...],
163
+ "debug_artifact": "debug/02-detect/hard-02.png"
164
+ }
165
+ ]
166
+ }
167
+ ```
168
+
169
+ The per-stage layer is what makes the spike actionable. End-to-end says *that* it failed; per-stage + debug artifact says *where* and *why*.
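One way the two layers might be aggregated from per-(input, stage) outcomes, as an illustrative Python sketch (only the `results.json` shape above is the fixed contract; the helper is an assumption):

```python
def two_layer_summary(runs):
    """Aggregate two-layer results from per-(input, stage) outcomes.

    `runs` maps input path -> ordered list of (stage, passed) pairs, e.g.
    produced by comparing each stage's output to the fixture's expected
    intermediates. Returns end-to-end counts, per-stage counts, and the
    first stage each failing input diverged at.
    """
    end_to_end = {"total": len(runs), "passed": 0, "failed": 0}
    per_stage = {}
    failures = []
    for input_path, stage_results in runs.items():
        diverged_at = None
        for stage, passed in stage_results:
            bucket = per_stage.setdefault(stage, {"passed": 0, "failed": 0})
            bucket["passed" if passed else "failed"] += 1
            if not passed and diverged_at is None:
                diverged_at = stage
        if diverged_at is None:
            end_to_end["passed"] += 1
        else:
            end_to_end["failed"] += 1
            failures.append({"input": input_path, "diverged_at": diverged_at})
    return {"end_to_end": end_to_end, "per_stage": per_stage, "failures": failures}
```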
170
+
171
+ ## Step 5 — Conclude (promote or delete)
172
+
173
+ When the spike concludes — either the picked technique works or it does not — record the outcome via `/agentic-adr` and delete the spike directory. The ADR is the persistent artifact; the spike code is throwaway.
174
+
175
+ ADR template for spike outcomes:
176
+
177
+ ```markdown
178
+ # ADR-NNNN: We will use technique X for <scope>
179
+
180
+ ## Context
181
+
182
+ <why the spike was needed — what was uncertain>
183
+
184
+ ## Decision
185
+
186
+ We will use technique X. The spike at `spikes/NNNN-<slug>/` (now deleted) showed:
187
+ - End-to-end pass rate: <%>
188
+ - Failures concentrated at stage <NN>, root cause <Y>
189
+ - Mitigation: <Z>
190
+
191
+ Alternatives held in reserve and rejected:
192
+ - Technique A: rejected because <reason from spike eval>
193
+ - Technique B: rejected because <reason from spike eval>
194
+
195
+ ## Consequences
196
+
197
+ <follow-on work this decision unblocks; rails to maintain>
198
+ ```
199
+
200
+ Then:
201
+
202
+ ```bash
203
+ rm -rf spikes/NNNN-<slug>/
204
+ git add doc/adr/NNNN-<slug>.md
205
+ git commit -m "feat: adopt technique X for <scope> per spike NNNN"
206
+ ```
207
+
208
+ Spikes that end inconclusively get an ADR too — `Decision: defer; the spike at NNNN was inconclusive because Y` — and the directory is deleted. Inconclusive spikes are real signal; preserving the framing in an ADR prevents re-litigation.
209
+
210
+ ## Output contract
211
+
212
+ A spike directory at `spikes/NNNN-<short-slug>/` with the four-stage layout above (discovery README, fixtures, pipeline, debug per stage, eval results). The directory is throwaway by design — promote-or-delete lifecycle per ADR-0017 §4. No `Status: shipped` lifecycle; spikes do not "ship" — they conclude with an ADR.
213
+
214
+ When the host exposes `AskUserQuestion` (per ADR-0014), use it for the Step 1 selection criterion confirmation and the Step 5 promote/delete decision.
215
+
216
+ ## Next
217
+
218
+ - After Step 1 (discovery output reviewed): proceed to Step 2 to create the spike directory + fixture, or abort if the discovery surfaced a single happy path (route to `agentic-ground`).
219
+ - After Step 4 (eval results): `/agentic-adr` to record the outcome, then delete the spike directory.
220
+ - If the spike succeeds and production work follows: `/agentic-task` for the work units to apply the spike's findings to production code (set `Spec ref` to the original spec if applicable; cite the ADR in the task `Notes`).
@@ -0,0 +1,89 @@
1
+ ---
2
+ name: agentic-spike
3
+ description: Scaffold a staged spike with golden fixtures per WORKFLOW.md §14, for cases where the spec is clear but the technique is uncertain across multiple plausible approaches. Four stages — discovery, golden fixture, pipeline with gates, two-layer evaluation. Use when the unknown is *how*, not *what*. Triggers on "spike", "uncertain technique", "which library", "CV pipeline", "evaluate approaches", "ground truth", "golden fixture", "staged pipeline", "debug per stage". Routes to `agentic-ground` if the *how* is routine and a single happy path is obvious.
4
+ ---
5
+
6
+ <background_information>
7
+ Implements WORKFLOW.md §14 (Staged Spikes With Golden Fixtures) end-to-end. The skill is for cases where the spec is clear but the technique is uncertain across multiple plausible approaches. WORKFLOW §9 (TDG) assumes the path is known and validates end-to-end; §14 assumes the path is unknown and validates per stage.
8
+
9
+ The skill creates `spikes/NNNN-<slug>/` and fills it stage-by-stage. The directory is throwaway by design — when the spike concludes, an ADR records the decision and the spike directory is deleted (promote-or-delete lifecycle per ADR-0017 §4).
10
+
11
+ Codex auto-trigger on description keywords is less mature than Claude Code's. If auto-invocation does not fire when the user mentions an uncertain technique or asks to evaluate approaches, invoke this skill manually.
12
+ </background_information>
13
+
+ <instructions>
+ Step 0 — confirm uncertainty. The skill is for an unknown technique across multiple plausible approaches, not for non-trivial work in general. If a single happy path is obvious, do NOT start a spike — route to `agentic-ground` and stop.
+
+ Tests:
+ - Could `agentic-ground`'s four-source research surface a single happy path with a defensible deviation gate? If yes, run that instead.
+ - Are there ≥2 candidate techniques with materially different trade-offs that no source resolves? If no, this is not a spike.
+ - Is end-to-end validation feasible without per-stage debug? If yes, this is `agentic-task` + `agentic-philosophy` Goal-Driven Execution territory.
+
+ If a spike is warranted, confirm the scope with the user and proceed.
+
+ Step 1 — discovery. List canonical approaches grounded in official docs and real examples. Pick one (or ≤3) by an explicit criterion.
+
+ Process:
+ - Search official documentation. Cite URL + version.
+ - Search OSS for repos solving the same problem. Cite `<repo>:<path>:<line-range>`; fetch via tools, never paraphrase from training memory.
+ - Survey in-repo for analogous patterns. Cite `<file>:<line>` or "no analog found".
+ - Survey git history for prior attempts. Cite `<commit-sha>` or "no prior attempt".
+
+ Output: candidate-list markdown with techniques, sources, trade-offs, the selection criterion, and the picked technique. NO code yet. User reviews before Step 2.
+
+ Step 2 — golden fixture. Curate inputs with rich expected outputs, as JSON keyed by input path (recommended). Include edge cases (low light, partial occlusion, malformed inputs, large inputs, empty inputs) and difficulty tags.
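A golden fixture for a detection-style spike might be seeded like this; the input paths, expected-output fields, and difficulty values are all hypothetical illustrations, not prescribed by the workflow:

```python
import json

# Hypothetical golden fixture: keyed by input path, with a rich expected
# output per entry and a difficulty tag for later failure triage.
GOLDEN = {
    "fixtures/inputs/well-lit-01.png": {
        "expected": {"boxes": [[12, 40, 96, 180]], "label": "card"},
        "difficulty": "easy",
    },
    "fixtures/inputs/occluded-02.png": {
        "expected": {"boxes": [[5, 8, 60, 90]], "label": "card"},
        "difficulty": "hard",  # partial-occlusion edge case
    },
    "fixtures/inputs/empty-03.png": {
        "expected": {"boxes": [], "label": None},
        "difficulty": "edge",  # empty-input edge case
    },
}

with open("golden.json", "w") as f:
    json.dump(GOLDEN, f, indent=2)
```

Keying by input path keeps each expected output next to its difficulty tag, so the Step 4 evaluation can slice failures by difficulty as well as by stage.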
+
+ Create the spike directory:
+ ```
+ mkdir -p spikes/NNNN-<slug>/{fixtures,debug,eval}
+ ```
+ NNNN = next 4-digit number after highest existing under `spikes/`.
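The numbering rule can be sketched as a small helper; the `next_spike_number` name and the `NNNN-<slug>` regex are illustrative assumptions:

```python
import re
from pathlib import Path

def next_spike_number(spikes_dir: str = "spikes") -> str:
    """Return the next 4-digit spike number after the highest existing one."""
    highest = 0
    root = Path(spikes_dir)
    if root.is_dir():
        for entry in root.iterdir():
            m = re.match(r"(\d{4})-", entry.name)  # matches NNNN-<slug>
            if m:
                highest = max(highest, int(m.group(1)))
    return f"{highest + 1:04d}"
```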
+
+ The fixture is the contract the pipeline validates against. Treat it like spec text — it should not change once the spike runs unless the ground truth changes.
+
+ Step 3 — pipeline with gates. One technique per stage. Each stage emits a debug artifact making its output inspectable.
+
+ Layout:
+ ```
+ spikes/NNNN-<slug>/
+ ├── README.md   # spike framing (Step 1 output)
+ ├── fixtures/   # golden inputs + expected outputs
+ ├── pipeline/   # one file per stage (01-preprocess, 02-detect, etc)
+ ├── debug/      # per-stage debug artifacts (image / JSON / log row)
+ └── eval/       # evaluation results (Step 4)
+ ```
+
+ Each stage takes (input, context) and returns (output, debug-record). The debug-record is written to `debug/NN-<stage>/`. The pipeline halts and reports the stage on first divergence.
+
+ Step 4 — two-layer evaluation:
+ - End-to-end: pass rate against fixture inputs.
+ - Per-stage: for each input, where did the pipeline diverge?
+
+ Output to `spikes/NNNN-<slug>/eval/results.json`:
+ ```
+ {
+   "fixture": "fixtures/golden.json",
+   "end_to_end": { "total": 10, "passed": 7, "failed": 3 },
+   "per_stage": { "01-preprocess": { "passed": 10, "failed": 0 }, ... },
+   "failures": [{ "input": "...", "diverged_at": "02-detect", "debug_artifact": "..." }]
+ }
+ ```
+
+ The per-stage layer is what makes the spike actionable.
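One way to fold per-input run records into the two layers is sketched below; the run-record shape is an assumption, while the output fields follow the `results.json` sketch above:

```python
from collections import defaultdict

def summarize(runs: list[dict]) -> dict:
    """Fold per-input run records into end-to-end and per-stage layers.

    Assumed record shape:
    {"input": str, "stages": {stage_name: bool}, "diverged_at": str | None}
    """
    end_to_end = {"total": len(runs), "passed": 0, "failed": 0}
    per_stage: dict = defaultdict(lambda: {"passed": 0, "failed": 0})
    failures = []
    for run in runs:
        for stage, ok in run["stages"].items():
            per_stage[stage]["passed" if ok else "failed"] += 1
        if run["diverged_at"] is None:
            end_to_end["passed"] += 1
        else:
            end_to_end["failed"] += 1
            failures.append({"input": run["input"],
                             "diverged_at": run["diverged_at"]})
    return {"end_to_end": end_to_end,
            "per_stage": dict(per_stage),
            "failures": failures}
```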
+
+ Step 5 — conclude (promote or delete). When the spike concludes:
+ - Record the outcome via `/agentic-adr` (the ADR is the persistent artifact).
+ - Delete the spike directory: `rm -rf spikes/NNNN-<slug>/`.
+
+ The ADR captures: which technique was picked, alternatives held in reserve, end-to-end pass rate, failures and root causes, mitigation. Inconclusive spikes get ADRs too — this preserves the framing and prevents re-litigation.
+ </instructions>
+
+ <output_contract>
+ A spike directory at `spikes/NNNN-<short-slug>/` with the four-stage layout (discovery README, fixtures, pipeline, debug per stage, eval results). The directory is throwaway by design — promote-or-delete lifecycle per ADR-0017 §4. No `Status: shipped` lifecycle; spikes conclude with an ADR.
+ </output_contract>
+
+ ## Next
+
+ - After Step 1: proceed to Step 2, or abort if discovery surfaced a single happy path (route to `agentic-ground`).
+ - After Step 4: `/agentic-adr` to record the outcome, then delete the spike directory.
+ - If the spike succeeds and production work follows: `/agentic-task` for the work units (reference the original spec in `Spec ref`; cite the ADR in the task `Notes`).
@@ -0,0 +1,5 @@
+ interface:
+   display_name: agentic-spike
+   short_description: Staged spike with golden fixtures per WORKFLOW §14. Discovery + fixture + pipeline-with-gates + two-layer evaluation. For unknown-technique uncertainty, not non-trivial work in general.
+ policy:
+   allow_implicit_invocation: false