npm - devlyn-cli - Versions diffs - 1.15.0 → 2.0.0 - Mend

devlyn-cli 1.15.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (158)

package/config/skills/_shared/expected.schema.json ADDED

@@ -0,0 +1,93 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://github.com/fysoul17/devlyn-cli/config/skills/_shared/expected.schema.json",
+  "title": "spec.expected.json — mechanical acceptance contract",
+  "description": "Load-bearing LLM-agnostic decoupler for the devlyn-cli harness. Defines the machine-readable acceptance criteria a spec ships alongside spec.md. Stable across model upgrades — when Opus 5 / GPT-6 / Qwen / Gemini land, this schema does not move; only the per-model adapter files in _shared/adapters/ do.",
+  "type": "object",
+  "additionalProperties": false,
+  "properties": {
+    "verification_commands": {
+      "type": "array",
+      "description": "Each command is executed against the post-BUILD code. Each pass/fail contributes to verify_score. At least one entry is required when the spec has any observable runtime check (CLI, test command, HTTP request).",
+      "items": {
+        "type": "object",
+        "additionalProperties": false,
+        "required": ["cmd"],
+        "properties": {
+          "cmd": {
+            "type": "string",
+            "description": "Shell command, executed via `subprocess.run(..., shell=True)` from the build's working directory.",
+            "minLength": 1
+          },
+          "exit_code": {
+            "type": "integer",
+            "description": "Required exit code. Default 0 if omitted.",
+            "default": 0
+          },
+          "stdout_contains": {
+            "type": "array",
+            "description": "Each substring must appear verbatim in (stdout + stderr) for pass.",
+            "items": { "type": "string", "minLength": 1 },
+            "default": []
+          },
+          "stdout_not_contains": {
+            "type": "array",
+            "description": "None of these substrings may appear in (stdout + stderr) for pass.",
+            "items": { "type": "string", "minLength": 1 },
+            "default": []
+          }
+        }
+      }
+    },
+    "forbidden_patterns": {
+      "type": "array",
+      "description": "Regex patterns scanned across diff.patch. Match at severity=disqualifier is a hard-floor fail; match at severity=warning is judge-only critical-finding.",
+      "items": {
+        "type": "object",
+        "additionalProperties": false,
+        "required": ["pattern", "description", "severity"],
+        "properties": {
+          "pattern": {
+            "type": "string",
+            "description": "Python re.search-compatible regex. Anchored implicitly by the surrounding regex, NOT by ^ / $ unless intended.",
+            "minLength": 1
+          },
+          "description": {
+            "type": "string",
+            "description": "Human-readable explanation of what the pattern catches and why it's forbidden.",
+            "minLength": 1
+          },
+          "files": {
+            "type": "array",
+            "description": "Optional allow-list of files (substrings of diff --git lines). When present, scan is sliced to hunks touching only these files.",
+            "items": { "type": "string", "minLength": 1 },
+            "default": []
+          },
+          "severity": {
+            "type": "string",
+            "enum": ["disqualifier", "warning"],
+            "description": "disqualifier = hard-floor fail (DQ). warning = judge-visible critical-finding only."
+          }
+        }
+      }
+    },
+    "required_files": {
+      "type": "array",
+      "description": "Files that must exist after the arm runs.",
+      "items": { "type": "string", "minLength": 1 },
+      "default": []
+    },
+    "forbidden_files": {
+      "type": "array",
+      "description": "Files that must NOT appear in the arm's diff (e.g. tooling artifacts the spec didn't request).",
+      "items": { "type": "string", "minLength": 1 },
+      "default": []
+    },
+    "max_deps_added": {
+      "type": "integer",
+      "description": "Hard cap on new entries under dependencies/devDependencies in package.json. Exceeds → DQ.",
+      "minimum": 0,
+      "default": 0
+    }
+  }
+}
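
The `verification_commands` item described in the schema above can be read as a small pass/fail check. The following is a minimal sketch of that reading, not the harness's actual code: the function name `check_command` and the `cwd` parameter are hypothetical, and concatenating stdout before stderr is an assumption (the schema only says "stdout + stderr").

```python
import subprocess


def check_command(entry: dict, cwd: str = ".") -> bool:
    """Evaluate one verification_commands entry (hypothetical helper).

    Assumes `entry` already validated against the schema, so "cmd" is present.
    """
    proc = subprocess.run(
        entry["cmd"],
        shell=True,            # the schema states commands run via shell=True
        cwd=cwd,
        capture_output=True,
        text=True,
    )
    combined = proc.stdout + proc.stderr  # assumption: simple concatenation

    # exit_code defaults to 0 when omitted.
    if proc.returncode != entry.get("exit_code", 0):
        return False
    # Every required substring must appear verbatim.
    if any(s not in combined for s in entry.get("stdout_contains", [])):
        return False
    # No forbidden substring may appear.
    if any(s in combined for s in entry.get("stdout_not_contains", [])):
        return False
    return True
```

For example, `check_command({"cmd": "echo hello", "stdout_contains": ["hello"]})` would pass on a POSIX shell, while the same command with `"stdout_not_contains": ["hello"]` would fail. How the results feed into `verify_score` is not specified in the schema and is left out here.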

package/config/skills/_shared/pair-plan-schema.md ADDED

@@ -0,0 +1,298 @@
+# Shared — `pair-plan.json` schema (iter-0022 archive)
+> **Archive header (iter-0034 Phase 4 cutover, 2026-05-04)** — this schema was iter-0022 infrastructure for the now-deleted `/devlyn:auto-resolve` PHASE 0 plan-pair contract. The `/devlyn:resolve` PHASE 1 PLAN at HEAD runs solo (per iter-0033 (C1) PASS evidence + iter-0033g § "CLOSURE"). The schema is preserved here as a design archive: the unblock conditions for re-instating PLAN-pair (per `iterations/0034-phase-4-cutover.md` § "L2 PLAN-pair research-only label") are A — container/sandbox isolation justified by another product need, OR B — production telemetry captures positive evidence of subagent introspection that a PLAN-pair measurement would need to isolate. When either condition fires, this schema (and the associated lint / idgen / preflight tooling under `benchmark/auto-resolve/scripts/`) is the starting point.
+Single source of truth for `pair-plan.json` and its companion `canonical_id_registry.json` when the architecture re-enters scope. Read this once before editing `pair-plan-idgen.py`, `pair-plan-lint.py`, `pair-plan-preflight.sh`, or any future plan-pair PHASE that consumes `state.plan`.
+## Audience (when re-instated)
+- `benchmark/auto-resolve/scripts/pair-plan-idgen.py` — produces `canonical_id_registry.json` from `expected.json` + checked-in oracle scripts.
+- `benchmark/auto-resolve/scripts/pair-plan-lint.py` — validates a `pair-plan.json` against its registry.
+- `autoresearch/scripts/pair-plan-preflight.sh` — orchestrates solo + pair plan generation against blind-aliased fixtures.
+- A future `/devlyn:resolve` PHASE 1 plan-pair branch (currently solo; gated on unblock A or B above): would accept `--plan-path` / JSON payload, set `state.plan.{mode, path}`, and run lint before IMPLEMENT, mirroring the deleted `devlyn:auto-resolve` PHASE 0 contract.
+## File locations and naming (canonical)
+- Registry per fixture: `benchmark/auto-resolve/fixtures/<F>/expected-pair-plan-registry.json` (committed snapshot for diff-against-baseline; iter-0023 verifies the live idgen output equals this snapshot).
+- Plan produced by preflight: `benchmark/auto-resolve/results/<run_id>/<blind_fixture>/plan-preflight/merged/pair-plan.json`.
+- Plan supplied to a re-instated plan-pair branch by an external caller: any path the user chooses, passed via `--plan-path <path>` (the contract surface is preserved as iter-0022 design archive).
+- The registry filename is `canonical_id_registry.json` for **runtime artifacts** — both inside the bundle dir and in the preflight output root. (HANDOFF.md:280 mentions `canonical-ids.json` for the preflight output dir; that name is deprecated — D4 emits `canonical_id_registry.json` to align with the rest of the toolchain.)
+- The **committed fixture snapshot** is named `expected-pair-plan-registry.json` (one per fixture, under `benchmark/auto-resolve/fixtures/<F>/`) — distinct file name to make snapshots greppable separately from runtime artifacts. iter-0023 verifies the live idgen output equals the committed snapshot for the same fixture.
+## `canonical_id_registry.json` shape
+Top-level wrapper:
+```jsonc
+{
+  "schema_version": "1",
+  "fixture_id": "F2-cli-medium-subcommand",
+  "generated_at": "2026-04-29T18:30:00Z",
+  "generated_from": {
+    "expected_path": "benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected.json",
+    "expected_sha256": "...",          // raw file bytes sha256
+    "metadata_path":  "benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/metadata.json",
+    "metadata_sha256": "...",          // raw file bytes sha256
+    "oracle_script_shas": {
+      "test-fidelity":   "...",        // raw bytes sha256 of oracle-test-fidelity.py
+      "scope-tier-a":    "...",
+      "scope-tier-b":    "..."
+    }
+  },
+  "required_invariants": [
+    {
+      "id": "...",
+      "source_field": "expected.json/forbidden_patterns/0 | expected.json/verification_commands/3 | expected.json/required_files | expected.json/forbidden_files | expected.json/max_deps_added | expected.json/spec_output_files | oracle/<oracle-name>/<category-id>",
+      "source_ref":   "expected.json:60 | expected.json/verification_commands/0 | oracle-test-fidelity.py",
+      "operational_check": "...natural-language description of what the variant must do or must not do...",
+      "severity": "disqualifier | hard | flag | warn",
+      "authority": "expected.json/forbidden_patterns | expected.json/verification_commands | expected.json/required_files | expected.json/forbidden_files | expected.json/max_deps_added | expected.json/spec_output_files | metadata/oracle-allowlist"
+    }
+    // ...sorted lexicographically by id
+  ]
+}
+```
+**Hard rules**:
+- `required_invariants` MUST be sorted lexicographically by `id`. idgen sorts before serializing; lint rejects an unsorted file.
+- All file shas (`expected_sha256`, `metadata_sha256`, `oracle_script_shas.*`) are **raw file bytes sha256** — `sha256(open(path, "rb").read())`. NOT canonical-JSON form. (Canonical form is reserved for the pair-plan pre-stamp hash; see below.)
+- `info`-severity oracle categories are NOT registry entries (e.g. scope-tier-b's `tier-b-reachable` is a positive signal, not an invariant violation).
+- The umbrella oracle category `scope-tier-a:tier-a-violation` is ONE registry entry; the 5 path-glob groups (planning-doc, ci-config, node-modules, test-results-or-coverage, env-secrets) are described inside `operational_check`, not split into 5 entries.
+**Determinism**: same `(expected.json, metadata.json, oracle scripts)` input → byte-identical `canonical_id_registry.json`. Achieved by:
+- `json.dumps(obj, sort_keys=True, indent=2, ensure_ascii=False)` for the on-disk file.
+- All lists pre-sorted before dumping (registry items by `id`).
+- No timestamps that change run-to-run except `generated_at` — see exemption below.
+`generated_at` is the ONE volatile field. Lint ignores it for sha-stability checks; lint's determinism check sets `generated_at` to a fixed value before comparing two consecutive idgen runs. (Implementation: idgen accepts `--generated-at <iso8601>` for testing.)
+## `pair-plan.json` shape
+```jsonc
+{
+  "schema_version": "1",
+  "plan_status": "final | blocked | draft",
+  "planning_mode": "solo | pair",
+  "fixture_id": "F2-cli-medium-subcommand",          // human label; not authoritative
+  "source": {
+    "spec_path":          "benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/spec.md",
+    "spec_sha256":        "...",                     // raw file bytes
+    "expected_path":      "benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected.json",
+    "expected_sha256":    "...",                     // raw file bytes (optional only when expected.json absent)
+    "rubric_path":        "benchmark/auto-resolve/RUBRIC.md",
+    "rubric_sha256":      "...",                     // raw file bytes
+    "canonical_id_registry_path":   "benchmark/auto-resolve/fixtures/F2-cli-medium-subcommand/expected-pair-plan-registry.json",
+    "canonical_id_registry_sha256": "..."            // raw file bytes of the registry file
+  },
+  "authority_order": [
+    "spec.md",
+    "expected.json/rubric",
+    "phase prompt",
+    "model preference"
+  ],
+  "rounds": [
+    {
+      "round": 1,
+      "claude_draft_sha256": "...",                  // raw file bytes of the per-round draft artifact
+      "codex_draft_sha256":  "...",
+      "merged_sha256":       "...",
+      "note": "..."
+    }
+    // up to 3 rounds; iter-0022 preflight stops at the first round where neither model has new substantive critique
+  ],
+  "accepted_invariants": [
+    {
+      "id":           "no_silent_catch_return_fallback",
+      "paraphrase":   "...",                         // human-readable; informational only, NOT enforced
+      "source_refs":  ["spec.md:36", "expected.json/forbidden_patterns/0"],
+      "operational_check": "BUILD output must not contain `catch[^{]*\\{[^}]*return [^}]*\\}` in bin/cli.js",
+      "authority":    "expected.json/forbidden_patterns"
+    }
+  ],
+  "rejected_alternatives": [
+    {
+      "id":              "alt_silent_catch_with_log",
+      "rationale":       "Authority order says expected.json/forbidden_patterns dominates; logging does not change visible-error contract.",
+      "conflicts_with_ids": ["no_silent_catch_return_fallback"],
+      "claude_stamp":    "rejected",
+      "codex_stamp":     "rejected"
+    }
+  ],
+  "unresolved":          [],                         // MUST be empty in final plans
+  "escalated_to_user":   [],                         // populated only during draft / blocked status; final must have user_resolution per item if non-empty
+  "model_stamps": {
+    "claude": {
+      "status":              "sign | block",
+      "blocked_ids":         [],
+      "signed_plan_sha256":  "...",                  // canonical pre-stamp sha (see below)
+      "model":               "claude-opus-4-7",
+      "timestamp":           "2026-04-29T..."
+    },
+    "codex": {
+      "status":              "sign | block",
+      "blocked_ids":         [],
+      "signed_plan_sha256":  "...",
+      "model":               "gpt-5.5",
+      "timestamp":           "..."
+    }
+  }
+}
+```
+## Severity decoupling (registry vs findings)
+The registry's `required_invariants[].severity` taxonomy is **metadata for human review only**: `disqualifier | hard | flag | warn`. It is NOT mapped onto the `references/findings-schema.md` taxonomy used by EVAL / CRITIC findings (`CRITICAL | HIGH | MEDIUM | LOW`). When a phase emits a finding for a missed plan invariant, severity is assigned by that phase's own existing severity policy (per `findings-schema.md`), not by reading the registry severity directly. The two taxonomies serve different audiences (registry severity = "how the oracle classifies it"; findings severity = "what the orchestrator should do about it") and are intentionally not coupled in iter-0022.
+## Hard rules (lint-enforced)
+1. `unresolved.length > 0` → `plan_status` MUST be `blocked` or `draft`. Final accepted plan MUST have `unresolved == []`.
+2. `escalated_to_user[]` non-empty → each item MUST carry a `user_resolution` field, OR `plan_status` MUST be `blocked` / `draft`.
+3. Every `accepted_invariants[].id` MUST appear in the registry's `required_invariants[].id` exactly (string match — no paraphrase, no synonym, no new IDs at plan-time). `paraphrase` is informational only.
+4. **Final-plan coverage**: when `plan_status == "final"`, every registry entry MUST be accounted for in the plan — each `required_invariants[].id` is in `accepted_invariants[].id` OR in some `rejected_alternatives[].conflicts_with_ids[]` OR in `escalated_to_user[].id` OR in `unresolved[].id`. (`draft` and `blocked` plans are NOT subject to full coverage; they may still carry un-decided ids in `unresolved[]` per Rule #1.)
+5. `authority_order` MUST be the exact 4-string array `["spec.md", "expected.json/rubric", "phase prompt", "model preference"]` (snapshot at iter-0022 ship time; future iters can amend with explicit `schema_version` bump).
+6. `model_stamps.{claude,codex}.status == "sign"` MUST hold for `plan_status: "final"`. A `block` from either model forces `plan_status` to `blocked` or `draft`.
+7. `model_stamps.{claude,codex}.signed_plan_sha256` MUST be byte-identical AND MUST equal the canonical pre-stamp sha256 of the file (see "Two sha256 contracts" below).
+8. `source.{spec_sha256, expected_sha256, rubric_sha256, canonical_id_registry_sha256}` MUST equal the actual raw-bytes sha256 of the referenced files at lint time (catches stale plans against changed sources).
+9. `source.canonical_id_registry_path` MUST resolve to an existing registry file. lint reads it from this field; if `--registry <path>` is passed on the lint command line, the override wins.
+10. `planning_mode: "pair"` requires `rounds.length >= 1`. `planning_mode: "solo"` requires `rounds.length == 0` (no merge artifacts).
+## Two sha256 contracts (DO NOT CONFLATE)
+### Contract A — raw file bytes
+Used for: every `source.*_sha256` field (spec, expected, rubric, registry), every `generated_from.*_sha256` field in the registry, every `rounds[].*_draft_sha256` and `merged_sha256`.
+```python
+import hashlib
+with open(path, "rb") as f:
+    sha = hashlib.sha256(f.read()).hexdigest()
+```
+No canonicalization. The bytes on disk are what gets hashed. This catches "the plan claims spec.md is sha X but spec.md actually has bytes producing sha Y" drift.
+### Contract B — canonical pre-stamp form (pair-plan stamps only)
+Used for: `model_stamps.claude.signed_plan_sha256` and `model_stamps.codex.signed_plan_sha256`. Both stamps sign **byte-identical** canonical bytes, so both sha values are byte-identical.
+Algorithm (writers and verifiers MUST implement exactly):
+```python
+import json
+import hashlib
+import copy
+def canonical_pre_stamp_sha256(plan: dict) -> str:
+    # Reject duplicate keys when LOADING the plan; this function assumes a clean dict.
+    pre = copy.deepcopy(plan)
+    pre["model_stamps"] = {}                       # replace value, keep key
+    s = json.dumps(
+        pre,
+        sort_keys=True,
+        separators=(",", ":"),
+        ensure_ascii=False,
+        allow_nan=False,
+    )
+    return hashlib.sha256(s.encode("utf-8")).hexdigest()
+```
+When LOADING the plan, reject duplicate keys:
+```python
+import json
+def _strict_pairs(pairs):
+    keys = [k for k, _ in pairs]
+    if len(keys) != len(set(keys)):
+        raise ValueError("duplicate key in pair-plan.json")
+    return dict(pairs)
+with open(path, "r", encoding="utf-8") as f:
+    plan = json.load(f, object_pairs_hook=_strict_pairs)
+```
+**Why no Unicode normalization**: the canonical form hashes input bytes as-is. Writers and verifiers must agree on input form (NFC recommended for any user-supplied free-text strings, but not enforced — the scheme survives because both sides derive from the same source bytes).
+**Why no floats**: integer + string serialize byte-stably across implementations. Floats vary (e.g. `1.0` vs `1`). Avoid floats in this schema until a future field absolutely requires one; if added, document the canonical float-printing rule in this file.
+## Slug rules for registry IDs (idgen)
+When an `expected.json` item lacks an explicit `id` field, idgen synthesizes a deterministic slug.
+### `forbidden_patterns[i]` slug
+```
+forbidden_pattern__<sanitize(description, 60)>__<sanitize(files[0], 30)>
+```
+`sanitize(s, max_len)`: lowercase; replace any non-`[a-z0-9]` run with a single `_`; strip leading/trailing `_`; truncate to `max_len` (right-truncate, no hash suffix at this level).
+If two items in the same `forbidden_patterns[]` array produce the same slug after sanitization, the FIRST one (by source-array index) keeps the bare slug; each subsequent collision appends `__i<index>`. idgen detects this deterministically by walking the array in order.
+Example F2:
+- `forbidden_patterns[0]` (description="silent catch returning a fallback value — violates no-silent-catches policy", files=["bin/cli.js"]) → `forbidden_pattern__silent_catch_returning_a_fallback_value_violate__bin_cli_js`
+- `forbidden_patterns[1]` (description="@ts-ignore escape hatch", files=["bin/cli.js"]) → `forbidden_pattern__ts_ignore_escape_hatch__bin_cli_js`
+### `verification_commands[i]` slug
+```
+verification__<sha8(canonical_json(verification_obj))>
+```
+`canonical_json(obj)`: same compact form as Contract B (`json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False, allow_nan=False)`).
+`sha8(s)`: first 8 hex chars of `sha256(s.encode("utf-8"))`.
+The full verification object is hashed (cmd + exit_code + stdout_contains + stdout_not_contains), so reordering the array does not change the slug. Array-index lives in `source_ref` (`expected.json/verification_commands/<i>`) for human navigation only.
+### Other expected.json fields
+- `required_files`: one registry entry per file path: `required_file__<sanitize(path, 60)>`.
+- `forbidden_files`: same shape: `forbidden_file__<sanitize(path, 60)>`.
+- `max_deps_added`: one registry entry: `max_deps_added__<value>` (e.g. `max_deps_added__0`).
+- `spec_output_files`: one registry entry per path: `spec_output_file__<sanitize(path, 60)>`.
+### Oracle category IDs (no slug — fixed strings)
+Oracle `--list-categories` returns category IDs in the form `<oracle-name>:<finding-type>`. These are stable strings that idgen passes through verbatim into `required_invariants[].id`. Each oracle script defines its own enum; iter-0022 ship snapshot:
+- `test-fidelity:test-file-deleted`
+- `test-fidelity:test-file-renamed`
+- `test-fidelity:mock-swap`
+- `test-fidelity:assertion-regression`
+- `scope-tier-a:lockfile-deletion`
+- `scope-tier-a:tier-a-violation`
+- `scope-tier-b:scope-unmatched`
+`scope-tier-b:tier-b-reachable` is `info`-severity and NOT a registry entry.
+## metadata.json field for per-fixture oracle allowlist
+iter-0022 adds one new field to each fixture's `metadata.json`:
+```jsonc
+{
+  "id": "F2-cli-medium-subcommand",
+  // ... existing fields unchanged ...
+  "pair_plan_oracle_categories": [
+    "test-fidelity:test-file-deleted",
+    "test-fidelity:test-file-renamed",
+    "test-fidelity:mock-swap",
+    "test-fidelity:assertion-regression",
+    "scope-tier-a:lockfile-deletion",
+    "scope-tier-a:tier-a-violation",
+    "scope-tier-b:scope-unmatched"
+  ]
+}
+```
+Hard rule: idgen filters oracle categories to exactly this allowlist. If the field is missing, idgen treats it as the empty array (no oracle categories registered) — `expected.json`-derived invariants still appear. Schema-version bump if the allowlist semantics change.
+The runner `run-fixture.sh` reads `timeout_seconds` (line 54) and the report reads `category` (compile-report.py line 76); no other consumer reads metadata.json today, so adding a new field is a pure metadata enrichment with no scoring implication.
+## Plan field minimum/maximum policy
+- A field listed in this schema with no "optional" annotation is REQUIRED.
+- Fields explicitly marked optional: `source.expected_path` / `source.expected_sha256` (only when `expected.json` is genuinely absent — not the case for any current fixture).
+- Unknown extra fields in `pair-plan.json` are NOT rejected by lint (forward-compat), but the canonical pre-stamp sha is computed over the whole object so unknown fields participate in the signature.
+- Unknown extra fields in `canonical_id_registry.json` ARE rejected by lint (idgen owns the registry shape; drift here is a bug).
+## Versioning
+`schema_version` starts at `"1"`. A breaking change to any hard rule above bumps the version and the lint script gains a per-version dispatcher. iter-0022 ships version `1`. Future iters MUST update this file before bumping the version field anywhere else.
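
The slug rules in `pair-plan-schema.md` above can be sketched as follows. This is an illustrative reading, not the shipped `pair-plan-idgen.py`: all function names are hypothetical, applying the strip before the truncation is an assumption about the order of operations, and `forbidden_pattern_slug` assumes `files` is present and non-empty (the document does not say what idgen does otherwise).

```python
import hashlib
import json
import re


def sanitize(s: str, max_len: int) -> str:
    """Lowercase, collapse non-[a-z0-9] runs to "_", strip "_", right-truncate."""
    collapsed = re.sub(r"[^a-z0-9]+", "_", s.lower()).strip("_")
    return collapsed[:max_len]


def forbidden_pattern_slug(item: dict) -> str:
    # Assumption: item["files"] exists and has at least one entry.
    return (
        "forbidden_pattern__"
        f"{sanitize(item['description'], 60)}__{sanitize(item['files'][0], 30)}"
    )


def forbidden_pattern_slugs(items: list) -> list:
    """Walk the array in order; later collisions get an "__i<index>" suffix."""
    seen = set()
    slugs = []
    for index, item in enumerate(items):
        slug = forbidden_pattern_slug(item)
        if slug in seen:
            slug = f"{slug}__i{index}"
        else:
            seen.add(slug)
        slugs.append(slug)
    return slugs


def verification_slug(obj: dict) -> str:
    """verification__<first 8 hex chars of sha256 over the compact canonical JSON>."""
    canonical = json.dumps(
        obj,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,
    )
    return "verification__" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
```

Under these assumptions, the `@ts-ignore escape hatch` item for `bin/cli.js` yields `forbidden_pattern__ts_ignore_escape_hatch__bin_cli_js`, matching the second F2 example. Because `verification_slug` hashes the object with sorted keys, key order inside a verification object does not affect its slug.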

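A few of the lint-enforced hard rules in `pair-plan-schema.md` (rules 1, 3, 5, 6 and 10) are simple structural checks. The sketch below shows one way they might be expressed. It is not `pair-plan-lint.py`: the function name `lint_plan`, the error strings, and passing registry ids as a plain set are all assumptions, and the sha256 and coverage rules are deliberately omitted.

```python
AUTHORITY_ORDER = ["spec.md", "expected.json/rubric", "phase prompt", "model preference"]


def lint_plan(plan: dict, registry_ids: set) -> list:
    """Return a list of rule violations (hypothetical helper; empty list = clean)."""
    errors = []
    status = plan["plan_status"]

    # Rule 1: unresolved items are only allowed in blocked/draft plans.
    if plan["unresolved"] and status not in ("blocked", "draft"):
        errors.append("rule 1: unresolved is non-empty in a final plan")

    # Rule 3: accepted invariant ids must match registry ids exactly.
    for inv in plan["accepted_invariants"]:
        if inv["id"] not in registry_ids:
            errors.append(f"rule 3: unknown invariant id {inv['id']!r}")

    # Rule 5: authority_order is a fixed 4-string array.
    if plan["authority_order"] != AUTHORITY_ORDER:
        errors.append("rule 5: authority_order mismatch")

    # Rule 6: a final plan needs a "sign" stamp from both models.
    if status == "final":
        for model in ("claude", "codex"):
            if plan["model_stamps"][model]["status"] != "sign":
                errors.append(f"rule 6: {model} has not signed a final plan")

    # Rule 10: pair plans carry rounds, solo plans carry none.
    rounds = len(plan["rounds"])
    if plan["planning_mode"] == "pair" and rounds < 1:
        errors.append("rule 10: pair plan has no rounds")
    if plan["planning_mode"] == "solo" and rounds != 0:
        errors.append("rule 10: solo plan has rounds")

    return errors
```

Returning a list of violations rather than raising on the first one is a design choice made for this sketch, so that a caller can report every broken rule in one pass.
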
package/config/skills/_shared/runtime-principles.md ADDED

@@ -0,0 +1,110 @@
+# Runtime principles — sub-agent contract
+The runtime contract every sub-agent inside `/devlyn:resolve` (PLAN / IMPLEMENT / BUILD_GATE / CLEANUP / VERIFY) and `/devlyn:ideate` (FRAME / EXPLORE / SPEC / CHALLENGE) must satisfy. Source of truth for sub-agent behavior on user tasks. NOT for autoresearch-loop / harness-developer concerns (see `autoresearch/PRINCIPLES.md`).
+The four sections below mirror the corresponding CLAUDE.md sections (Subtractive-first editing, Goal-locked execution, No-workaround discipline, Evidence over claim). Each section is wrapped in `<!-- runtime-principles:section=NAME:begin -->` / `:end -->` markers in BOTH this file and CLAUDE.md; lint Check 12 (added in iter-0019.A Step 5) extracts each named block from both files and diffs to detect drift.
+<!-- runtime-principles:contract:begin -->
+## Subtractive-first editing — perfection = nothing left to remove
+<!-- runtime-principles:section=subtractive-first:begin -->
+> "Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away." — Saint-Exupéry. **This is the operating definition of "done" in this repo.** A change is finished when no further line, branch, flag, or doc paragraph can be removed without breaking a learned failure mode. Not before.
+This rule overrides instinct. LLMs (including you) are trained on corpora that reward elaborate, defensive, "thorough" code — so the default impulse is to add. That impulse is wrong here. Read the rules below as hard tests, not aesthetic preferences. They are not optional, not negotiable, and not satisfiable by writing more careful additions.
+**Mandatory pre-edit question.** Before writing any change, you must answer in this order:
+1. **What can I delete that makes the addition unnecessary?** If the addition becomes redundant after the deletion, ship the deletion alone.
+2. **What can I delete that makes the addition smaller?** Trim the surrounding accretion before adding.
+3. **Only then**, what is the minimum addition required?
+If you skip question 1 or 2, you are violating this rule even if the resulting code looks clean.
+**Hard tests every edit must pass:**
+- **Net-negative is the default; pure-addition needs a citation.** A diff that adds N lines and removes 0 must point to a specific cause: a previously-observed failure mode (commit hash, fixture ID, finding ID, user-reported incident), OR an explicit user request / spec requirement that demands new user-visible behavior. The latter is a sufficient citation — do not block legitimate requested additions on the absence of a past failure. What is rejected: vague justifications like "it seems clearer," "for future flexibility," "just in case," "to be safe," "for completeness," "to handle edge cases" — these are the exact phrases that produce accretion.
+- **Delete the line that makes the bug impossible, not the line that catches it.** Defensive wrappers, validation layers, error normalizers, and `try/catch` shells are usually evidence that an upstream contract is unclear. Fix the contract upstream and remove the defenses downstream. The trap: adding the wrapper feels like progress because it makes a test pass. The wrapper is debt; the contract fix is the work. **Scope guard**: if the upstream contract fix is outside the user's stated scope, stop and surface the scope expansion to the user before editing — Goal-locked execution overrides this. The right scope-expansion outcome is "user authorizes the upstream fix" or "user accepts a scoped local fix and a follow-up for upstream"; never silently restructure something the user didn't ask you to.
+- **A new flag, branch, or option is admitting two failures**: (a) the default was wrong, (b) every reader pays attention cost forever. Default-fix-and-delete-flag beats add-flag-with-better-default. The bar for adding a configuration knob is "I have observed two real users with genuinely conflicting needs," not "this might be useful someday."
+- **Doc additions are subject to the same rule.** Before adding a section to any `.md` file (CLAUDE.md, SKILL.md, README, references/), find the now-stale sentence or section the new one supersedes — delete that first. A growing instructions file dilutes the instructions that actually need to be followed; readers (human and LLM) skim long files and miss load-bearing rules.
+- **A "cleaner" refactor that grows line count is not cleaner.** It is a sideways move that increases context, parsing, and review cost. **For refactor-only changes**, line count must drop unless a cited observed failure requires the new shape. **Never delete tests, contracts, public API, comments documenting non-obvious WHY, or user-facing behavior just to win the count** — that is gaming the metric, not honoring the principle. The metric serves complexity reduction; if a deletion would lose information not recoverable from code + commit history, it is the wrong deletion.
+- **Stop adding when no further deletion is possible.** This is the Saint-Exupéry test inverted into a stopping rule: if you have made an addition and you cannot identify anything else that can be removed, examine the addition itself — is part of it still removable? Iterate until the diff is irreducible.
+**Anti-rationalization clause** — explicitly guarding against LLM-style hedging:
+- "More explicit is safer" is **not** a justification. Explicitness has a cost in attention and rot. Required-explicit goes in; nice-to-explicit gets cut.
+- "Adding context for future readers" is **not** a justification. Future readers benefit more from shorter files than from explanatory prose. The code and the commit message together carry the why.
+- "Defense-in-depth" is **not** a justification at the harness layer. Two layers that catch the same bug are evidence one of them should be the only layer.
+- If you find yourself writing the phrase "in case" in a comment, code reviewer note, or doc, **stop and re-evaluate** — that phrase predicts an unjustified addition.
+**Stopping rule.** A change is done when (a) all hypotheses it was meant to close are closed, AND (b) you have attempted at least one further deletion and confirmed it would break something. If you have not tried to delete more, you are not done. If nothing can be deleted to justify the current addition, the addition itself is too large — re-scope or surface the conflict to the user before proceeding.
+**Never grow surface area silently.** Every accretion-shaped change must be visible: in the commit message, in the iteration file, or in a flagged review. Silent growth is the failure mode this rule exists to prevent.
+<!-- runtime-principles:section=subtractive-first:end -->
+## Goal-locked execution — stay on the North Star, do not wander
+<!-- runtime-principles:section=goal-locked:begin -->
+Even with a North Star defined, work drifts off-course (the Korean idioms for this translate roughly as "heading up the mountain" and "straying off to Samcheonpo": going up the wrong mountain instead of forward). The harness must **actively block** this drift at run time, not merely discourage it. The default is ruler-straight execution toward the user's stated goal; any deviation requires explicit justification, not the inverse.
+This rule exists because LLMs (including you) are trained to be helpful, comprehensive, and thorough — and "helpful" easily becomes "did more than asked." Doing more than asked is not helpfulness; it is scope creep. Read the rules below as hard blocks, not soft preferences.
+**The five drift patterns you must refuse to execute on:**
+1. **Unrequested work.** "While I'm here, I noticed X is broken/ugly/inefficient" → **stop**. The user did not ask for X. If X is a real defect, surface it as a finding, a follow-up suggestion, or an entry in a TODO list — do NOT fix it inside the current change. Mixing unrequested work with requested work is what makes diffs unreviewable and PRs eternal.
+2. **Tangential cleanup.** "This file looks messy, let me also tidy..." → **stop**. The current task is the only task. Unrelated cleanup is a separate change requiring its own justification, scope, and pre-flight 0 check.
+3. **Speculative robustness.** "Just adding a check / fallback / handler for the case where..." → **stop**. If the case has not been observed (in production, in tests, in a finding), it does not belong in this change. Defensive code added for unobserved cases is the most common form of accretion debt — it never gets removed because nobody can prove the case never happens.
+4. **Re-scoping mid-flight.** "Actually, the better way to do this is to also restructure / rename / migrate..." → **stop**. If you discover the requested approach is wrong, surface that to the user with evidence and let them adjudicate. Do NOT silently expand scope. The user's explicit redirect is the only authorization to enlarge a task.
+5. **Curiosity detours.** "Let me also explore how Y works to understand this better..." → **stop**, unless Y is provably on the goal's critical path. Curiosity-driven exploration is creative-mode; default is execution-mode.
+**The single drift test before any deviation from the stated goal:** *"Did the user ask for this, OR does the user's stated goal strictly require it?"* If the answer to both is no, do not do it. Surface it as a note (commit message, end-of-turn summary, finding) and continue on the original path.
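The drift test above reduces to a two-input predicate. The following is a minimal sketch, assuming a hypothetical `CandidateAction` record that the harness itself does not define; it only illustrates the "requested OR strictly required" logic, not how either flag is actually determined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateAction:
    """A piece of work under consideration (hypothetical structure)."""
    description: str
    user_requested: bool      # did the user ask for this?
    strictly_required: bool   # does the stated goal strictly require it?


def drift_test(action: CandidateAction) -> str:
    """Return "execute" if the action passes the drift test, else "note"."""
    if action.user_requested or action.strictly_required:
        return "execute"
    # Both answers are "no": surface it as a note and stay on the original path.
    return "note"


def partition(actions):
    """Split candidate actions into work to do now and notes to surface later."""
    to_execute = [a.description for a in actions if drift_test(a) == "execute"]
    to_note = [a.description for a in actions if drift_test(a) == "note"]
    return to_execute, to_note
```

Anything that lands in the second list belongs in a commit message, end-of-turn summary, or finding rather than in the diff.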
+**Creative-mode is the narrow exception, not the default.** Creative-mode applies only when (a) the user explicitly invoked an ideation/exploration surface (`/devlyn:ideate`, optional `/devlyn:design-system`, "let's brainstorm", "explore options for"), OR (b) the goal is genuinely under-specified and a clarifying question is impossible (extremely rare — usually you should ask). For everything else — bug fixes, feature work, refactors, doc updates, pipeline runs, code review, debugging — execution-mode is the default and drift is a defect, not a feature.
+**Anti-rationalization clause** — explicitly guarding against LLM hedging:
+- "It's a small extra change" is **not** a justification. Small accretions compound; each one looks small on its own.
+- "It's related to what they asked for" is **not** a justification. Related ≠ requested. Requested is the only standard.
+- "It would be incomplete without this" is **not** a justification. The user defines completeness, not your sense of it.
+- "I'm being thorough" is **not** a justification. Thoroughness on the requested goal is required; thoroughness extending past the goal is drift.
+**When in doubt, ask (outside hands-free pipelines).** In interactive sessions a short clarification ("the requested fix touches the X code path; I notice Y also looks broken. Should I fix it in this change or surface it as a follow-up?") is always cheaper than a wrong-scope diff. Asking is not a weakness; silently expanding scope is. **Inside hands-free pipelines** (`/devlyn:resolve`, scheduled remote agents, autonomous skill runs) the contract forbids mid-pipeline prompts: asking is unsafe there because no user is present to answer. The substitute is: stay strictly on the requested goal, do not expand scope, and log the question/assumption explicitly in the final report (or `.devlyn/runs/<run_id>/` artifacts) so the user can adjudicate after the run completes. Choosing scope creep over logging-and-staying-on-path is always wrong.
+**Stopping rule.** A task is done when the user's stated goal is closed AND no off-path work was added. If you find yourself hesitating because "I should also do Z", then Z is drift. Note it for follow-up; do not execute it.
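The interactive versus hands-free split can be sketched as a single branch. This is an illustration under stated assumptions: the function name, the `open-questions.jsonl` file name, and the record fields are hypothetical. Only the idea of logging into the run's artifact directory comes from the rule itself.

```python
import json
from pathlib import Path


def handle_scope_question(question: str, interactive: bool, run_dir: Path) -> str:
    """Ask in interactive sessions; log and stay on path in hands-free runs."""
    if interactive:
        # A user is present, so a short clarification is the cheap option.
        return f"ASK: {question}"
    # Hands-free: no user can answer, so record the question for later review.
    run_dir.mkdir(parents=True, exist_ok=True)
    log_path = run_dir / "open-questions.jsonl"  # hypothetical file name
    record = {"question": question, "action": "stayed-on-goal"}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return "LOGGED"
```

In neither branch does the function expand scope; the only difference is where the question goes.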
+<!-- runtime-principles:section=goal-locked:end -->
+## No-workaround discipline
+<!-- runtime-principles:section=no-workaround:begin -->
+No `any`, no `@ts-ignore`, no silent `catch`, no hardcoded values, no helper scripts that bypass the root cause. Fix root causes; handle errors with user-visible state per the rule above.
+**Permitted exceptions** (explicitly carved out):
+- CSS fallback fonts, CDN failover, image placeholders — widely-accepted best practices.
+- Codex CLI availability downgrade — the one documented silent fallback in this repo. Fires when the resolved engine is `auto` or `codex` (either via skill default or explicit `--engine` flag) and the Codex CLI is absent. Banner `engine downgraded: codex-unavailable` always prints; verdict identical to `--engine claude`. Any other silent fallback in skills code is a bug — file it against the skill that introduced it.
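The Codex downgrade described in the last exception can be sketched as follows. The banner string and the engine names are taken from the text above; the function name, the `codex` binary name passed to `shutil.which`, and returning the requested engine unchanged when no downgrade applies are assumptions made for illustration.

```python
import shutil
from typing import Optional, Tuple

DOWNGRADE_BANNER = "engine downgraded: codex-unavailable"


def resolve_engine(
    requested: str, codex_available: Optional[bool] = None
) -> Tuple[str, Optional[str]]:
    """Return (engine, banner); banner is None when no downgrade happened."""
    if codex_available is None:
        # Assumption: the Codex CLI is discoverable on PATH as "codex".
        codex_available = shutil.which("codex") is not None
    if requested in ("auto", "codex") and not codex_available:
        # The one permitted fallback: behave like --engine claude, always with a banner.
        return "claude", DOWNGRADE_BANNER
    return requested, None
```

Returning the banner alongside the engine keeps the fallback visible to the caller, which is what separates it from the silent fallbacks the rule forbids.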
+<!-- runtime-principles:section=no-workaround:end -->
+## Evidence over claim
+<!-- runtime-principles:section=evidence:begin -->
+Every finding cites concrete evidence. Vague claims are speculation; exclude them.
+- **Code findings**: `file:line` you have opened.
+- **Missing findings**: explicit "searched X and found no implementation" statement.
+- **Doc findings**: quote of the stale text + section/line reference.
+- **Browser findings**: screenshot reference + URL/route.
+A finding without one of these forms is excluded. Vague findings produce vague fixes.
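The four evidence forms can be expressed as a filter over findings. This is a sketch only: the dictionary keys (`kind`, `location`, `searched`, `quote`, `reference`, `screenshot`, `url`) are hypothetical and not the harness's actual finding schema, and a regex cannot verify that a cited file was actually opened.

```python
import re

# "file:line" shape, e.g. "src/app.ts:42" (shape check only).
FILE_LINE = re.compile(r"^[^\s:]+:\d+$")


def has_evidence(finding: dict) -> bool:
    """Return True if the finding carries the evidence form its kind requires."""
    kind = finding.get("kind")
    if kind == "code":
        return bool(FILE_LINE.match(finding.get("location", "")))
    if kind == "missing":
        return bool(finding.get("searched"))
    if kind == "doc":
        return bool(finding.get("quote")) and bool(finding.get("reference"))
    if kind == "browser":
        return bool(finding.get("screenshot")) and bool(finding.get("url"))
    return False


def keep_evidenced(findings):
    """Drop findings that lack one of the accepted evidence forms."""
    return [f for f in findings if has_evidence(f)]
```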
+<!-- runtime-principles:section=evidence:end -->
+<!-- runtime-principles:contract:end -->
+<!-- runtime-principles:consumption:begin -->
+## Consumption (as of iter-0019.A)
+**Consumers**:
+- `auto-resolve/SKILL.md` `<harness_principles>` block points here as the contract source. Phase prompt bodies (`phase-1-build.md`, `phase-2-evaluate.md`, `phase-3-critic.md`) inline a compact operational excerpt derived from the contract — phase-specific rule_id mappings + the four section names — not the full text.
+- `preflight/SKILL.md` PHASE 3 (Synthesize) and PHASE 3.5 (RND2) reference this file. Auditor prompts (`code-auditor.md`, `browser-auditor.md`) emit `principle.*` rule_ids derived from the rules above.
+**Codex routing**: skills that route to Codex (auto-resolve fix-loop on `--engine auto`/`codex`, preflight code-auditor on `--engine auto`/`codex`) MUST inline the contract excerpt directly into the Codex prompt, because Codex has no filesystem access under the `read-only` sandbox.
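One way to inline contract text into a prompt is to cut it out by the `runtime-principles:section=…` markers this file already carries. The sketch below is a simplification: it inlines whole sections rather than the compact excerpt the skills derive, and the function names and the `<contract_excerpt>` wrapper are hypothetical.

```python
import re


def extract_section(contract_text: str, name: str) -> str:
    """Return the text between a section's begin and end markers."""
    marker = re.escape(name)
    pattern = re.compile(
        rf"<!-- runtime-principles:section={marker}:begin -->(.*?)"
        rf"<!-- runtime-principles:section={marker}:end -->",
        re.DOTALL,
    )
    match = pattern.search(contract_text)
    if match is None:
        raise KeyError(f"section not found: {name}")
    return match.group(1).strip()


def build_codex_prompt(task: str, contract_text: str, sections) -> str:
    """Prepend the task, then inline the selected sections verbatim."""
    excerpt = "\n\n".join(extract_section(contract_text, s) for s in sections)
    return f"{task}\n\n<contract_excerpt>\n{excerpt}\n</contract_excerpt>"
```

Raising on a missing section, instead of silently omitting it, keeps the sketch consistent with the no-workaround rule.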
+**Non-consumers**:
+- `ideate/SKILL.md` does NOT consume this file. Ideate is planning-layer; its CHALLENGE rubric (`references/challenge-rubric.md`) covers analogous concerns at planning scope, with deliberate one-shot Codex critic discipline.
+<!-- runtime-principles:consumption:end -->