npm - oh-my-customcodex - Versions diffs - 0.3.10 → 0.4.1 - Mend

oh-my-customcodex 0.3.10 → 0.4.1

Files changed (37)

package/README.md +9 -8
package/dist/cli/index.js +2 -9
package/dist/index.js +1 -1
package/package.json +1 -1
package/templates/.claude/agents/mgr-creator.md +11 -0
package/templates/.claude/agents/mgr-sauron.md +1 -1
package/templates/.claude/agents/tracker-checkpoint.md +77 -0
package/templates/.claude/output-styles/korean-engineer.md +24 -0
package/templates/.claude/rules/MUST-agent-design.md +2 -1
package/templates/.claude/rules/MUST-completion-verification.md +13 -0
package/templates/.claude/rules/SHOULD-interaction.md +2 -0
package/templates/.claude/skills/agent-eval-framework/SKILL.md +92 -0
package/templates/.claude/skills/agora/SKILL.md +11 -0
package/templates/.claude/skills/codex-exec/SKILL.md +12 -0
package/templates/.claude/skills/dag-orchestration/SKILL.md +20 -0
package/templates/.claude/skills/evaluator-optimizer/SKILL.md +20 -0
package/templates/.claude/skills/harness-eval/SKILL.md +13 -0
package/templates/.claude/skills/pipeline-guards/SKILL.md +19 -0
package/templates/.claude/skills/roundtable-debate/SKILL.md +60 -0
package/templates/.claude/skills/sauron-watch/SKILL.md +16 -4
package/templates/.claude/skills/sdd-dev/SKILL.md +6 -3
package/templates/.claude/skills/sdd-dev/templates/decision-record.md +45 -0
package/templates/.claude/skills/secretary-routing/SKILL.md +3 -0
package/templates/.github/scripts/verify-fork-list.sh +97 -0
package/templates/AGENTS.md.en +12 -26
package/templates/AGENTS.md.ko +12 -26
package/templates/CLAUDE.md +5 -4
package/templates/CLAUDE.md.en +8 -7
package/templates/CLAUDE.md.ko +8 -7
package/templates/guides/agent-eval/README.md +48 -0
package/templates/guides/agent-eval/index.yaml +6 -0
package/templates/guides/browser-automation/README.md +12 -0
package/templates/guides/index.yaml +12 -0
package/templates/guides/multi-agent-debate-patterns/README.md +26 -0
package/templates/guides/multi-agent-debate-patterns/index.yaml +6 -0
package/templates/manifest.json +5 -5
package/templates/workflows/auto-dev.yaml +7 -1

package/README.md CHANGED

@@ -13,7 +13,7 @@
 **[한국어 문서 (Korean)](./README_ko.md)**
-48 agents. 112 skills. 22 rules. One command.
+49 agents. 114 skills. 22 rules. One command.
 ```bash
 npm install -g oh-my-customcodex && cd your-project && omcustomcodex init
@@ -112,7 +112,7 @@ Agent(arch-documenter):haiku      ┘
 ---
-### Agents (48)
+### Agents (49)
 | Category | Count | Agents |
 |----------|-------|--------|
@@ -121,19 +121,20 @@ Agent(arch-documenter):haiku      ┘
 | Frontend | 5 | fe-vercel, fe-vuejs, fe-svelte, fe-flutter, fe-design |
 | Data Engineering | 6 | de-airflow, de-dbt, de-spark, de-kafka, de-snowflake, de-pipeline |
 | Database | 4 | db-supabase, db-postgres, db-redis, db-alembic |
-| Tooling | 4 | tool-npm, tool-optimizer, tool-bun, slack-cli |
+| Tooling | 3 | tool-npm, tool-optimizer, tool-bun |
 | Architecture | 2 | arch-documenter, arch-speckit |
 | Infrastructure | 2 | infra-docker, infra-aws |
 | QA | 3 | qa-planner, qa-writer, qa-engineer |
 | Security | 1 | sec-codeql |
 | Managers | 6 | mgr-creator, mgr-updater, mgr-supplier, mgr-gitnerd, mgr-sauron, mgr-claude-code-bible |
-| System | 2 | sys-memory-keeper, sys-naggy |
+| System | 3 | sys-memory-keeper, sys-naggy, tracker-checkpoint |
+| Auxiliary | 2 | slack-cli, wiki-curator |
 Each agent declares its tools, model, memory scope, and limitations in YAML frontmatter. Tool budgets are enforced per agent type for accuracy.
 ---
-### Skills (112)
+### Skills (114)
 | Category | Count | Includes |
 |----------|-------|----------|
@@ -226,7 +227,7 @@ Key rules: R010 (orchestrator never writes files), R009 (parallel execution mand
 ---
-### Guides (40)
+### Guides (42)
 Reference documentation covering best practices, architecture decisions, and integration patterns. Located in `guides/` at project root, covering topics from agent design to CI/CD to observability.
@@ -277,7 +278,7 @@ omcustomcodex serve-stop            # Stop Web UI
 your-project/
 ├── AGENTS.md                   # Entry point
 ├── .codex/
-│   ├── agents/                 # 48 agent definitions
+│   ├── agents/                 # 49 agent definitions
 │   ├── rules/                  # 22 governance rules (R000-R021)
 │   ├── hooks/                  # 15 lifecycle hook scripts
 │   ├── schemas/                # Tool input validation schemas
@@ -285,7 +286,7 @@ your-project/
 │   ├── contexts/               # 4 shared context files
 │   └── ontology/               # Knowledge graph for RAG
 ├── .agents/
-│   └── skills/                 # 112 installed skill modules
+│   └── skills/                 # 114 installed skill modules
 └── guides/                     # 40 reference documents
 ```

package/dist/cli/index.js CHANGED

@@ -3091,7 +3091,7 @@ var init_package = __esm(() => {
     workspaces: [
       "packages/*"
     ],
-    version: "0.3.10",
+    version: "0.4.1",
     description: "Batteries-included agent harness on top of GPT Codex + OMX",
     type: "module",
     bin: {
@@ -29925,14 +29925,7 @@ async function initCommand(options) {
       await registerProject(targetDir, package_default.version);
     } catch {}
     console.log("");
-    console.log("Required plugins (install manually):");
-    console.log("  /plugin marketplace add obra/superpowers-marketplace");
-    console.log("  /plugin install superpowers");
-    console.log("  /plugin install openai-docs");
-    console.log("  /plugin install elements-of-style");
-    console.log("  /plugin install context7");
-    console.log("");
-    console.log('See AGENTS.md "외부 의존성" section for details.');
+    console.log("Codex setup complete. See AGENTS.md for Codex-native MCP and runtime guidance.");
     return {
       success: true,
       message: i18n.t("cli.init.success"),

package/dist/index.js CHANGED

@@ -2180,7 +2180,7 @@ var package_default = {
   workspaces: [
     "packages/*"
   ],
-  version: "0.3.10",
+  version: "0.4.1",
   description: "Batteries-included agent harness on top of GPT Codex + OMX",
   type: "module",
   bin: {

package/package.json CHANGED

@@ -3,7 +3,7 @@
   "workspaces": [
     "packages/*"
   ],
-  "version": "0.3.10",
+  "version": "0.4.1",
   "description": "Batteries-included agent harness on top of GPT Codex + OMX",
   "type": "module",
   "bin": {

package/templates/.claude/agents/mgr-creator.md CHANGED

@@ -7,6 +7,7 @@ memory: project
 effort: high
 skills:
   - create-agent
+  - agent-eval-framework
 tools:
   - Read
   - Write
@@ -36,6 +37,16 @@ Frontmatter (name, description, model, tools, skills, memory) + body (purpose, c
 No registry update needed - agents auto-discovered from `.claude/agents/*.md`.
+### Phase 4: Optional Quantitative Gate
+For high-risk or reusable agents, use `agent-eval-framework` after creation:
+1. Define an ideal trajectory for the agent's first representative task.
+2. Run correctness checks before measuring efficiency.
+3. Record `step_ratio`, `tool_call_ratio`, and `latency_ratio` as advisory evidence.
+Do not force this gate for every small helper agent. It is opt-in when the extra cost is justified by reuse, safety, or routing criticality.
 ## Rules Applied
 - R000: All files in English

package/templates/.claude/agents/mgr-sauron.md CHANGED

@@ -30,7 +30,7 @@ You are an automated verification specialist that executes the mandatory R017 ve
 6. Verify philosophy compliance (R006-R011)
 7. Verify Claude-native compatibility
 8. Spec density analysis: detects agents with excessive inline implementation detail (R006 compliance)
-9. Structural linting: routing coverage (unreachable agents), orphan skill detection, circular dependency check, context:fork cap verification
+9. Structural linting: routing coverage (unreachable agents), orphan skill detection, circular dependency check, context:fork cap verification, R006 fork-list/frontmatter cross-validation
 10. Auto-fix simple issues (count mismatches, missing fields)
 11. Generate verification report

package/templates/.claude/agents/tracker-checkpoint.md ADDED

@@ -0,0 +1,77 @@
+---
+name: tracker-checkpoint
+description: Pipeline execution state tracker with checkpoint persistence. Reads and writes /tmp/.codex-pipeline-*-{PPID}.json state files and validates state transitions for pipeline and DAG resume flows.
+model: sonnet
+effort: medium
+tools: [Read, Write, Edit, Bash, Glob, Grep]
+memory: project
+skills: [dag-orchestration, pipeline-guards]
+domain: universal
+permissionMode: bypassPermissions
+---
+# Tracker Checkpoint Agent
+## Purpose
+Manage pipeline execution state through persistent checkpoint files. This agent works with `/pipeline resume`, `dag-orchestration`, and `pipeline-guards` so failed or preempted runs can resume from a known state.
+## Capabilities
+- Read and write `/tmp/.codex-pipeline-{name}-{PPID}.json` state files
+- Read and write `/tmp/.codex-dag-{PPID}.json` DAG state files when a DAG workflow owns the run
+- Validate state transitions: `pending -> running -> completed | failed`
+- Preserve failure context for halted pipeline steps
+- Support `/pipeline resume` by loading the last known state
+## Workflow
+### 1. Pipeline Start
+- Create `/tmp/.codex-pipeline-{name}-{PPID}.json` with initial state
+- Record pipeline name, start timestamp, total steps, and `current_step: 0`
+### 2. Step Checkpoint
+- Update state after each step
+- Record step name, status, duration, and artifact paths
+- Use atomic write semantics: write temporary JSON, then move it into place
+### 3. Failure Freeze
+- Mark the pipeline status as `halted`
+- Preserve failed step, error message, and partial artifact paths
+- Leave the checkpoint file available for resume inspection
+### 4. Resume Coordination
+- Scan `/tmp/.codex-pipeline-*-{PPID}.json`
+- Return pipeline name, failed step, error, and retry/skip/abort options to the orchestrator
+- On retry, reset the failed step to `pending` and resume execution from that step
+## State File Schema
+```json
+{
+  "pipeline": "{name}",
+  "started": "ISO-8601",
+  "status": "running|completed|halted",
+  "current_step": 0,
+  "steps": [
+    {"name": "triage", "status": "completed", "duration_ms": 5000, "artifacts": []},
+    {"name": "plan", "status": "running"}
+  ]
+}
+```
+## Integration Points
+- `pipeline` skill: `/pipeline resume` state loader
+- `dag-orchestration` skill: step dependency resolution and checkpoint restoration
+- `pipeline-guards` skill: guard gate state snapshots
+## Rules Compliance
+- R006: this is an agent artifact; checkpoint workflow logic remains in skills
+- R010: orchestrator owns scheduling, this agent owns checkpoint file operations
+- R017: structural changes to checkpoint contracts require sauron verification
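
Reviewer note: the transition rule and the atomic-write step described in this agent file can be sketched as follows. This is a minimal illustration, not code shipped in the package. The names `validate_transition` and `write_checkpoint` are hypothetical, and the `failed -> pending` edge is an assumption drawn from the retry behaviour listed under Resume Coordination.

```python
import json
import os
import tempfile

# Allowed step transitions, taken from "pending -> running -> completed | failed".
# The failed -> pending edge is an assumption based on the retry behaviour
# described under Resume Coordination.
ALLOWED = {
    "pending": {"running"},
    "running": {"completed", "failed"},
    "failed": {"pending"},
    "completed": set(),
}


def validate_transition(current: str, new: str) -> bool:
    """Return True when moving a step from `current` to `new` is allowed."""
    return new in ALLOWED.get(current, set())


def write_checkpoint(path: str, state: dict) -> None:
    """Write `state` as JSON atomically: temporary file first, then rename."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
        # os.replace swaps the file in one step when both paths share a filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Creating the temporary file in the same directory as the target keeps the final rename on one filesystem, which is what makes the "write temporary JSON, then move it into place" step safe against partially written checkpoints.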

package/templates/.claude/output-styles/korean-engineer.md ADDED

@@ -0,0 +1,24 @@
+---
+name: korean-engineer
+description: Korean-first engineering responses with agent identity and evidence-focused completion
+keep-coding-instructions: true
+---
+# Korean Engineer Output Style
+Use Korean for user-facing communication unless the user explicitly asks otherwise. Keep code, file contents, identifiers, and commit trailers in English when that is the repository convention.
+Every response starts with the agent identity block required by the project guidance:
+```text
+┌─ Agent: {agent-name} / {model}
+│ Skill: {active-skill-or-none}
+└─ Status: {current action or result}
+```
+Prefer concise, evidence-focused engineering reports:
+- State the current action or outcome first.
+- Cite concrete verification evidence before declaring completion.
+- Do not claim release, deploy, or publish completion until the external surface has been checked.
+- Keep uncertainty explicit and tied to the missing evidence.

package/templates/.claude/rules/MUST-agent-design.md CHANGED

@@ -254,6 +254,7 @@ Recommended practice:
 2. Keep allow rules only as defensive documentation; do not rely on them to suppress sensitive-path prompts.
 3. Do not run unattended Claude Code release automation that writes `templates/.claude/**` unless the workflow can handle interactive approval.
 4. In this Codex port, update `.codex/...` source files and their `templates/.claude/...` mirrors deliberately instead of bulk-copying with shell commands.
+5. For unattended Claude compatibility-template writes, use a reviewed temporary script wrapper and verify the resulting diff; direct Bash/Write/Edit targets under `templates/.claude/**` can all trigger the sensitive-path guard.
 ## Separation of Concerns
@@ -344,7 +345,7 @@ Default: `core` (when field is omitted)
 ### Context Fork Criteria
-Use `context: fork` for multi-agent orchestration skills only. Cap: **12 total**. Current: 12/12 (secretary/dev-lead/de-lead/qa-lead-routing, dag-orchestration, task-decomposition, worker-reviewer-pipeline, pipeline-guards, deep-plan, professor-triage, evaluator-optimizer, sauron-watch).
+Use `context: fork` for multi-agent orchestration skills only. Cap: **12 total**. Current: 10/12 (secretary-routing, dev-lead-routing, de-lead-routing, qa-lead-routing, dag-orchestration, task-decomposition, worker-reviewer-pipeline, pipeline-guards, deep-plan, professor-triage).
 <!-- DETAIL: Context Fork decision table
 | Use context:fork | Do NOT use context:fork |

package/templates/.claude/rules/MUST-completion-verification.md CHANGED

@@ -21,6 +21,19 @@ Before declaring any task `[Done]`, verify completion against task-type-specific
 Before [Done]: (1) Verify ACTUAL outcome not just attempt — "ran command" ≠ "succeeded". (2) Check task-type criteria above. (3) No unchecked items. (4) Would bet $100 it's complete.
+## Optional: Quantitative Evidence
+For agent, skill, or workflow changes, completion evidence MAY include `agent-eval-framework` metrics:
+| Metric | Meaning | Gate |
+|--------|---------|------|
+| `correctness` | Acceptance criteria satisfied | Required if included |
+| `step_ratio` | Observed steps vs. ideal steps | Advisory |
+| `tool_call_ratio` | Observed tool calls vs. ideal tool calls | Advisory |
+| `latency_ratio` | Observed duration vs. ideal duration | Advisory |
+These metrics strengthen a `[Done]` claim but do not replace task-specific verification. A failed correctness score blocks completion even if efficiency ratios are good.
 <!-- DETAIL: Self-Check box
 1. Did I verify ACTUAL outcome? "I ran the command" ≠ "the command succeeded" → YES: Continue / NO: Verify first
 2. Does task type have specific criteria? YES: Check each / NO: Apply general verification

package/templates/.claude/rules/SHOULD-interaction.md CHANGED

@@ -35,6 +35,8 @@
 ## Output Styles
+Session-level style enforcement belongs in runtime output-style mechanisms when the host supports them. In this Codex port, R003 remains the portable source of style-selection rules; packaged Claude compatibility may additionally provide `.claude/output-styles/` presets that reinforce the same constraints.
 | Style | Trigger | Behavior |
 |-------|---------|----------|
 | `concise` | effort: low, batch operations | Key result only, no preamble, no elaboration |

package/templates/.claude/skills/agent-eval-framework/SKILL.md ADDED

@@ -0,0 +1,92 @@
+---
+name: agent-eval-framework
+description: Quantitative agent evaluation using correctness, step ratio, tool-call ratio, and latency ratio
+scope: harness
+user-invocable: true
+argument-hint: "<trace-or-task> [--ideal <path>] [--format markdown|json]"
+effort: high
+version: 1.0.0
+---
+# Agent Eval Framework
+## Purpose
+Evaluate agent runs with a two-phase quantitative gate:
+1. **Correctness first**: the task must meet its stated acceptance criteria.
+2. **Efficiency second**: only correctness-passing runs are compared by step, tool-call, and latency ratios.
+This keeps eval pressure useful. A faster run that fails the task is not a better run.
+## Metric Framework
+| Metric | Formula | Pass Signal |
+|--------|---------|-------------|
+| `correctness` | `passed_criteria / total_criteria` | `1.0` for release-quality evidence |
+| `step_ratio` | `observed_steps / ideal_steps` | `<= 1.25` preferred |
+| `tool_call_ratio` | `observed_tool_calls / ideal_tool_calls` | `<= 1.25` preferred |
+| `latency_ratio` | `observed_ms / ideal_ms` | `<= 1.50` preferred |
+Use ratios as advisory evidence unless a task explicitly opts into a stricter gate.
+## Ideal Trajectory Schema
+```yaml
+task: "short task name"
+capability: "file_operations | retrieval | tool_use | memory | conversation | summarization"
+ideal:
+  steps: 4
+  tool_calls: 5
+  latency_ms: 120000
+acceptance_criteria:
+  - "Criterion one"
+  - "Criterion two"
+notes: "Why this ideal path is reasonable"
+```
+## Capability Taxonomy
+| Capability | Typical Evidence |
+|------------|------------------|
+| `file_operations` | precise diffs, no unrelated churn, verification after writes |
+| `retrieval` | targeted `rg`/file reads, source references, low duplicate search |
+| `tool_use` | appropriate tool choice, no unnecessary escalation |
+| `memory` | relevant memory used and cited, stale facts re-verified when needed |
+| `conversation` | clear routing, no repeated clarification for known constraints |
+| `summarization` | faithful compression, preserved blockers and evidence |
+## Workflow
+1. Define or load an ideal trajectory for the task.
+2. Collect observed run data from trace, transcript, hook output, or manual evidence.
+3. Score correctness against acceptance criteria.
+4. If correctness fails, stop and report failed criteria.
+5. If correctness passes, compute efficiency ratios.
+6. Attach the metric table to the completion evidence or improvement report.
+## Output Format
+```markdown
+## Agent Eval Result
+| Metric | Observed | Ideal | Ratio | Verdict |
+|--------|----------|-------|-------|---------|
+| correctness | 4/4 | 4/4 | 1.00 | pass |
+| steps | 5 | 4 | 1.25 | pass |
+| tool calls | 7 | 5 | 1.40 | advisory |
+| latency | 150s | 120s | 1.25 | pass |
+Decision: correctness-pass, efficiency-advisory
+```
+## Integration Points
+- `harness-eval`: use this framework to add trajectory efficiency evidence to benchmark runs.
+- `evaluator-optimizer`: run correctness before efficiency comparisons.
+- `mgr-creator`: opt in for high-risk new agents where quantitative validation is worth the extra cost.
+- `omcustomcodex:improve-report`: include repeated ratio regressions as improvement suggestions.
+## Attribution
+Adapted from LangChain Deep Agents eval methodology: correctness-first scoring, ideal trajectory annotation, and efficiency ratios for step, tool-call, and latency comparison.
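
Reviewer note: the two-phase gate in this skill (correctness first, efficiency second) can be sketched as below. The thresholds come from the Metric Framework table. The `Run` dataclass, the `evaluate` function, the `correctness-fail` and `efficiency-pass` labels, and the choice to require a correctness of exactly 1.0 are assumptions made for illustration; the skill file itself does not define an implementation.

```python
from dataclasses import dataclass

# Advisory thresholds from the Metric Framework table.
THRESHOLDS = {"step_ratio": 1.25, "tool_call_ratio": 1.25, "latency_ratio": 1.50}


@dataclass
class Run:
    """Step, tool-call, and latency counts for an observed or ideal run."""
    steps: int
    tool_calls: int
    latency_ms: int


def evaluate(passed: int, total: int, observed: Run, ideal: Run) -> dict:
    """Score correctness first; compute efficiency ratios only if it passes."""
    if total <= 0:
        raise ValueError("total acceptance criteria must be positive")
    result = {"correctness": passed / total}
    # Assumption: any unmet criterion counts as a correctness failure.
    if passed < total:
        result["decision"] = "correctness-fail"
        return result
    ratios = {
        "step_ratio": observed.steps / ideal.steps,
        "tool_call_ratio": observed.tool_calls / ideal.tool_calls,
        "latency_ratio": observed.latency_ms / ideal.latency_ms,
    }
    result.update(ratios)
    within = all(ratios[name] <= limit for name, limit in THRESHOLDS.items())
    result["decision"] = "correctness-pass, efficiency-" + ("pass" if within else "advisory")
    return result
```

Feeding in the numbers from the Output Format example (4/4 criteria, 5 vs. 4 steps, 7 vs. 5 tool calls, 150 s vs. 120 s) reproduces the ratios 1.25, 1.40, and 1.25 and the decision `correctness-pass, efficiency-advisory`.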

package/templates/.claude/skills/agora/SKILL.md CHANGED

@@ -43,6 +43,17 @@ source:
 Spawn 3 reviewers as Agent Team members:
 ```
+### Anti-Groupthink Mode
+Use `--anti-groupthink` when consensus itself is a risk:
+1. Reviewers submit independent findings before seeing peer output.
+2. One reviewer is assigned as devil's advocate.
+3. Minority findings are preserved unless the synthesis explicitly rejects them with evidence.
+4. Debate is capped at two challenge rounds before the lead either decides or requests more facts.
+For decisions where dissent preservation is the main goal, use `roundtable-debate` directly instead of `agora`.
 Agent(name: "claude-critic", model: opus, effort: max)
   → 20-point deep adversarial review

package/templates/.claude/skills/codex-exec/SKILL.md CHANGED

@@ -204,3 +204,15 @@ When routing skills detect a code generation task and codex is available:
 ```
 /codex-exec "Generate {description} following {framework} best practices" --effort high --full-auto
 ```
+## Browser Verify Workflow
+For frontend or browser-visible changes, use a Build + Vision + Verify loop instead of stopping at a successful build:
+1. Build or start the local dev server.
+2. Open the target in the available browser automation surface.
+3. Capture screenshot evidence and console/network errors.
+4. If the visual state or console is wrong, run `codex-exec` with the concrete evidence and repeat.
+5. Stop only when build, browser render, and error checks all pass.
+This pattern composes with the Codex App Browser Use plugin or any local browser MCP. Keep the loop evidence-driven: screenshot, console output, network status, and the exact command that produced the build.

package/templates/.claude/skills/dag-orchestration/SKILL.md CHANGED

@@ -193,6 +193,26 @@ Execute? [Y/n]
 The orchestrator builds the DAG from this inline format and executes using the same algorithm.
+## State Management via tracker-checkpoint
+Pipeline and DAG state is delegated to the `tracker-checkpoint` agent.
+### Flow
+1. Pipeline start: orchestrator delegates to `tracker-checkpoint` to create an initial state file (`/tmp/.codex-pipeline-{name}-{PPID}.json`)
+2. After each step: `tracker-checkpoint` updates step state with atomic writes
+3. Step failure: `tracker-checkpoint` freezes the state as `halted`
+4. `/pipeline resume`: `tracker-checkpoint` loads state and returns restore options to the orchestrator
+### Integration
+- PPID-scoped pipeline state path: `/tmp/.codex-pipeline-{name}-{PPID}.json`
+- PPID-scoped DAG state path: `/tmp/.codex-dag-{PPID}.json`
+- Delegate before and after step execution when resume support is required
+- On resume, rebuild the DAG from checkpoint state and continue from incomplete steps
+See `.codex/agents/tracker-checkpoint.md` for the agent contract.
 ## Limitations
 - No cycles allowed (DAG = acyclic)
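
Reviewer note: the resume rule added in this hunk ("rebuild the DAG from checkpoint state and continue from incomplete steps") could look roughly like the sketch below. The `depends_on` field and the `next_runnable` function are hypothetical. The state file schema shown for `tracker-checkpoint` lists only name, status, duration, and artifacts per step, so dependency data would have to come from the workflow definition.

```python
def next_runnable(steps: list[dict]) -> list[str]:
    """Return names of incomplete steps whose dependencies are all completed."""
    done = {step["name"] for step in steps if step["status"] == "completed"}
    return [
        step["name"]
        for step in steps
        if step["status"] != "completed"
        # depends_on is a hypothetical field; a missing value means no dependencies.
        and all(dep in done for dep in step.get("depends_on", []))
    ]
```

Completed steps are never re-run, and a failed step becomes runnable again as soon as its own dependencies are complete, which matches the retry option described for `/pipeline resume`.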

package/templates/.claude/skills/evaluator-optimizer/SKILL.md CHANGED

@@ -104,6 +104,26 @@ When `conditional.enabled: true` and ANY `skip_when` condition is met, the evalu
 | Complex architecture, security-critical | High | Run with pre-negotiation |
 | Previously failed task retry | Any | Always run |
+### Quantitative Efficiency Metrics
+When a task provides an ideal trajectory, the evaluator MAY attach `agent-eval-framework` metrics after the normal quality gate:
+```yaml
+evaluator-optimizer:
+  quantitative_metrics:
+    enabled: true
+    ideal:
+      steps: 4
+      tool_calls: 5
+      latency_ms: 120000
+    advisory_thresholds:
+      step_ratio: 1.25
+      tool_call_ratio: 1.25
+      latency_ratio: 1.50
+```
+Correctness remains the primary gate. Efficiency ratios are used to compare correctness-passing candidates or to create follow-up improvement suggestions.
 ### Parameter Details
 | Parameter | Required | Default | Description |

package/templates/.claude/skills/harness-eval/SKILL.md CHANGED

@@ -86,6 +86,19 @@ This skill provides preset rubrics for the evaluator-optimizer pipeline:
 The evaluator-optimizer skill's `pre_negotiation` phase accepts harness-eval rubric dimensions as sprint contract criteria.
+## Optional 4-Metric Trajectory Evidence
+For agent or skill benchmarks, enrich the 0-100 quality score with the `agent-eval-framework` metrics:
+| Metric | Source | Use |
+|--------|--------|-----|
+| `correctness` | benchmark assertions and acceptance criteria | Required before efficiency is considered |
+| `step_ratio` | observed steps vs. ideal trajectory | Advisory signal for unnecessary loops |
+| `tool_call_ratio` | observed tool calls vs. ideal trajectory | Advisory signal for noisy tool use |
+| `latency_ratio` | observed duration vs. ideal trajectory | Advisory signal for runtime regression |
+Evaluation order is fixed: correctness first, efficiency second. A benchmark run with failed correctness cannot be rescued by strong efficiency ratios.
 ## Output
 Results saved to `.codex/outputs/sessions/{YYYY-MM-DD}/harness-eval-{HHmmss}.md` with per-task scores and aggregate grade.

package/templates/.claude/skills/pipeline-guards/SKILL.md CHANGED

@@ -158,6 +158,25 @@ Guard warnings appear inline:
 | stuck-recovery | Guard triggers feed into stuck detection |
 | model-escalation | Repeated failures trigger escalation advisory |
+## Checkpoint Gate Integration
+Guard pass/fail state is recorded through the `tracker-checkpoint` agent when a pipeline needs resumable execution.
+### Flow
+1. Guard entry: record gate state as `running`
+2. Guard pass: record gate state as `passed` with relevant metrics
+3. Guard failure: record gate state as `failed` and freeze failure reason
+4. Next step: read checkpoint state to decide whether to resume or halt
+### Benefits
+- Long pipelines gain restore points at guard boundaries
+- Partial failures can retry from the prior guard boundary
+- Guard metrics accumulate for release-quality trend analysis
+See `.codex/agents/tracker-checkpoint.md` for the checkpoint contract.
 ## Override Policy
 - Defaults can be overridden in pipeline spec (within hard caps)

package/templates/.claude/skills/roundtable-debate/SKILL.md ADDED

@@ -0,0 +1,60 @@
+---
+name: roundtable-debate
+description: Structured multi-agent debate that preserves dissent with a mandatory devil's advocate and two-round cap
+scope: core
+user-invocable: true
+argument-hint: "<topic-or-document> [--rounds 1|2] [--decision required|advisory]"
+effort: high
+version: 1.0.0
+---
+# Roundtable Debate
+## Purpose
+Run a bounded debate when convergence would hide useful disagreement. Unlike `agora`, which drives toward consensus, this workflow preserves minority positions and requires explicit justification before dismissing them.
+## When To Use
+- Architecture or product choices with multiple defensible paths.
+- Review work where anchoring or groupthink is likely.
+- Decisions where a minority risk could be more important than the majority preference.
+## Workflow
+1. **Independent-first analysis**: spawn 3-5 reviewers in parallel. Do not share intermediate opinions before each reviewer submits an initial view.
+2. **Mandatory devil's advocate**: one reviewer argues against the emerging default, even if they personally agree with it.
+3. **Round 1 synthesis**: group findings into majority positions, minority positions, and unresolved facts.
+4. **Round 2 challenge**: reviewers respond only to disputed claims and missing evidence.
+5. **Decision record**: keep the final recommendation and any protected dissent.
+Hard cap: two debate rounds. If the decision still depends on missing facts, stop and gather evidence instead of debating longer.
+## Output
+```markdown
+# Roundtable Debate Result
+## Topic
+{topic}
+## Majority Recommendation
+{recommendation}
+## Protected Dissent
+| Position | Advocate | Why It Was Not Dismissed |
+|----------|----------|--------------------------|
+| {position} | devil's advocate | {evidence or risk} |
+## Decision
+{adopt | defer | reject | gather-more-evidence}
+```
+## Relationship To Agora
+| Workflow | Goal | Best For |
+|----------|------|----------|
+| `agora` | adversarial consensus | release gates, spec approval |
+| `roundtable-debate` | dissent preservation | ambiguous strategy, architectural tradeoffs |
+Use `agora --anti-groupthink` when you need consensus plus explicit dissent handling.
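
Reviewer note: the Round 1 synthesis step (grouping findings into majority and minority positions) can be illustrated with a small tally. This is a sketch under stated assumptions: the skill does not define how a majority is computed, so the strict more-than-half rule and the `synthesize` function are hypothetical.

```python
from collections import Counter


def synthesize(positions: dict[str, str]) -> dict[str, list[str]]:
    """Split reviewer positions into a majority view and protected dissent.

    `positions` maps a reviewer name to the position that reviewer submitted.
    """
    tally = Counter(positions.values())
    total = len(positions)
    # Assumption: a majority needs strictly more than half of the reviewers.
    majority = [position for position, votes in tally.items() if votes * 2 > total]
    # Every other position is kept rather than dropped.
    dissent = sorted(position for position in tally if position not in majority)
    return {"majority": majority, "protected_dissent": dissent}
```

A tie produces an empty majority and keeps every position as dissent, which fits the skill's instruction to stop and gather evidence rather than debate past the two-round cap.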

package/templates/.claude/skills/sauron-watch/SKILL.md CHANGED

@@ -99,10 +99,22 @@ Build dependency graph:
 Count skills with context: fork in frontmatter:
   grep "context: fork" .codex/skills/*/SKILL.md
-  If count > 10:
-    ERROR: "Context fork cap exceeded: {count}/10"
-  If count >= 8:
-    WARN: "Context fork usage high: {count}/10 — only {10-count} slots remaining"
+  If count > 12:
+    ERROR: "Context fork cap exceeded: {count}/12"
+  If count >= 10:
+    WARN: "Context fork usage high: {count}/12 — only {12-count} slots remaining"
+```
+**Lint 5: R006 Fork List Cross-Validation**
+```
+Run: bash .github/scripts/verify-fork-list.sh
+Compare:
+  - R006 Context Fork Criteria current count/list
+  - Actual .codex/skills/*/SKILL.md frontmatter with context: fork
+If count or list differs:
+  ERROR: "R006 fork list drift detected"
 ```
 All structural lints are **advisory** (WARN level) except circular dependencies and fork cap exceeded (ERROR level — should block commit).
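
Reviewer note: the updated cap thresholds and the new Lint 5 drift check can be expressed as two small functions. This sketch does not reproduce `verify-fork-list.sh`, whose contents are not shown in this diff. The function names, return shapes, and the abbreviated WARN message are assumptions made for illustration.

```python
def fork_cap_lint(count: int, cap: int = 12, warn_at: int = 10) -> tuple[str, str]:
    """Apply the thresholds from the updated lint: ERROR above cap, WARN at warn_at."""
    if count > cap:
        return ("ERROR", f"Context fork cap exceeded: {count}/{cap}")
    if count >= warn_at:
        return ("WARN", f"Context fork usage high: {count}/{cap}, {cap - count} slots remaining")
    return ("OK", "")


def fork_list_drift(documented: set[str], actual: set[str]) -> dict[str, list[str]]:
    """Compare the R006 fork list with skills that declare context: fork."""
    return {
        # Skills that fork but are not listed in R006.
        "missing_from_r006": sorted(actual - documented),
        # Skills listed in R006 whose frontmatter no longer declares a fork.
        "missing_from_frontmatter": sorted(documented - actual),
    }
```

Under these thresholds, the 10/12 count recorded in `MUST-agent-design.md` for this release would already report WARN, since the warning starts at 10.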

package/templates/.claude/skills/sdd-dev/SKILL.md CHANGED

@@ -2,7 +2,7 @@
 name: sdd-dev
 description: Spec-Driven Development workflow — enforces sdd/ folder hierarchy with planning-first gates, current-state artifacts, and completion verification
 scope: core
-version: 1.0.0
+version: 1.1.0
 user-invocable: true
 argument-hint: "[task description or leave empty for guided workflow]"
 ---
@@ -27,7 +27,8 @@ sdd/
 ├── 03_build/        # Current build state, implementation notes
 ├── 04_verify/       # Verification evidence, test results, residual risks
 ├── 05_operate/      # Deployment notes, runbooks (conditional)
-└── 99_toolchain/    # Tool configs, scripts, environment setup
+├── 99_toolchain/    # Tool configs, scripts, environment setup
+└── decisions/       # Decision records for major design choices
 ```
 **Key Principle**: These folders are **current-state artifacts**, not history archives. Each file reflects the current state of the work — update in place rather than appending new versions.
@@ -44,7 +45,7 @@ ls sdd/ 2>/dev/null || echo "sdd/ folder not found"
 If `sdd/` does not exist:
 1. Inform the user that SDD workflow requires a `sdd/` folder
-2. Offer to create the folder structure: `mkdir -p sdd/{01_planning,02_plan,03_build,04_verify,05_operate,99_toolchain}`
+2. Offer to create the folder structure: `mkdir -p sdd/{01_planning,02_plan,03_build,04_verify,05_operate,99_toolchain,decisions}`
 3. Ask user to confirm before proceeding
 If `sdd/` exists, continue to Step 1.
@@ -121,6 +122,7 @@ Artifact to produce or update: `sdd/03_build/current.md`
 ## Decisions Made
 - {decision}: {rationale}
+- Write decision records for major choices: `sdd/decisions/{YYYY-MM-DD}-{topic}.md` using `templates/decision-record.md`
 ## Known Issues
 - {issue}: {planned resolution}
@@ -129,6 +131,7 @@ Artifact to produce or update: `sdd/03_build/current.md`
 During implementation:
 - Follow the plan from Step 2
 - Update `sdd/03_build/current.md` as work progresses
+- Create or update a decision record when a choice materially changes architecture, workflow behavior, dependency strategy, or release risk
 - Keep the artifact current (not a log — overwrite stale entries)
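
Reviewer note: the folder layout and the decision-record naming pattern `sdd/decisions/{YYYY-MM-DD}-{topic}.md` introduced in version 1.1.0 can be sketched as below. The helper names and the slug rule (lowercase, non-alphanumeric runs collapsed to hyphens) are assumptions; the skill only specifies the folder names and the file name pattern.

```python
import re
from datetime import date
from pathlib import Path

# Folder names from the updated mkdir command, including the new decisions/ folder.
SDD_FOLDERS = [
    "01_planning", "02_plan", "03_build", "04_verify",
    "05_operate", "99_toolchain", "decisions",
]


def scaffold_sdd(root: Path) -> list[Path]:
    """Create the sdd/ hierarchy under `root`; existing folders are left alone."""
    created = []
    for name in SDD_FOLDERS:
        folder = root / "sdd" / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def decision_record_path(root: Path, topic: str, day: date) -> Path:
    """Build sdd/decisions/{YYYY-MM-DD}-{topic}.md with a hyphenated topic slug."""
    # Assumed slug rule: lowercase, with non-alphanumeric runs collapsed to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return root / "sdd" / "decisions" / f"{day.isoformat()}-{slug}.md"
```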
 **Display**: