@trohde/earos 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +156 -0
- package/assets/init/.agents/skills/earos-artifact-gen/SKILL.md +106 -0
- package/assets/init/.agents/skills/earos-artifact-gen/references/interview-guide.md +313 -0
- package/assets/init/.agents/skills/earos-artifact-gen/references/output-guide.md +367 -0
- package/assets/init/.agents/skills/earos-assess/SKILL.md +212 -0
- package/assets/init/.agents/skills/earos-assess/references/calibration-benchmarks.md +160 -0
- package/assets/init/.agents/skills/earos-assess/references/output-templates.md +311 -0
- package/assets/init/.agents/skills/earos-assess/references/scoring-protocol.md +281 -0
- package/assets/init/.agents/skills/earos-calibrate/SKILL.md +153 -0
- package/assets/init/.agents/skills/earos-calibrate/references/agreement-metrics.md +188 -0
- package/assets/init/.agents/skills/earos-calibrate/references/calibration-protocol.md +263 -0
- package/assets/init/.agents/skills/earos-create/SKILL.md +257 -0
- package/assets/init/.agents/skills/earos-create/references/criterion-writing-guide.md +268 -0
- package/assets/init/.agents/skills/earos-create/references/dependency-rules.md +193 -0
- package/assets/init/.agents/skills/earos-create/references/rubric-interview-guide.md +123 -0
- package/assets/init/.agents/skills/earos-create/references/validation-checklist.md +238 -0
- package/assets/init/.agents/skills/earos-profile-author/SKILL.md +251 -0
- package/assets/init/.agents/skills/earos-profile-author/references/criterion-writing-guide.md +280 -0
- package/assets/init/.agents/skills/earos-profile-author/references/design-methods.md +158 -0
- package/assets/init/.agents/skills/earos-profile-author/references/profile-checklist.md +173 -0
- package/assets/init/.agents/skills/earos-remediate/SKILL.md +118 -0
- package/assets/init/.agents/skills/earos-remediate/references/output-template.md +199 -0
- package/assets/init/.agents/skills/earos-remediate/references/remediation-patterns.md +330 -0
- package/assets/init/.agents/skills/earos-report/SKILL.md +85 -0
- package/assets/init/.agents/skills/earos-report/references/portfolio-template.md +181 -0
- package/assets/init/.agents/skills/earos-report/references/single-artifact-template.md +168 -0
- package/assets/init/.agents/skills/earos-review/SKILL.md +130 -0
- package/assets/init/.agents/skills/earos-review/references/challenge-patterns.md +163 -0
- package/assets/init/.agents/skills/earos-review/references/output-template.md +180 -0
- package/assets/init/.agents/skills/earos-template-fill/SKILL.md +177 -0
- package/assets/init/.agents/skills/earos-template-fill/references/evidence-writing-guide.md +186 -0
- package/assets/init/.agents/skills/earos-template-fill/references/section-rubric-mapping.md +200 -0
- package/assets/init/.agents/skills/earos-validate/SKILL.md +113 -0
- package/assets/init/.agents/skills/earos-validate/references/fix-patterns.md +281 -0
- package/assets/init/.agents/skills/earos-validate/references/validation-checks.md +287 -0
- package/assets/init/.claude/CLAUDE.md +4 -0
- package/assets/init/AGENTS.md +293 -0
- package/assets/init/CLAUDE.md +635 -0
- package/assets/init/README.md +507 -0
- package/assets/init/calibration/gold-set/.gitkeep +0 -0
- package/assets/init/calibration/results/.gitkeep +0 -0
- package/assets/init/core/core-meta-rubric.yaml +643 -0
- package/assets/init/docs/consistency-report.md +325 -0
- package/assets/init/docs/getting-started.md +194 -0
- package/assets/init/docs/profile-authoring-guide.md +51 -0
- package/assets/init/docs/terminology.md +126 -0
- package/assets/init/earos.manifest.yaml +104 -0
- package/assets/init/evaluations/.gitkeep +0 -0
- package/assets/init/examples/aws-event-driven-order-processing/artifact.yaml +2056 -0
- package/assets/init/examples/aws-event-driven-order-processing/evaluation.yaml +973 -0
- package/assets/init/examples/aws-event-driven-order-processing/report.md +244 -0
- package/assets/init/examples/example-solution-architecture.evaluation.yaml +136 -0
- package/assets/init/examples/multi-cloud-data-analytics/artifact.yaml +715 -0
- package/assets/init/overlays/data-governance.yaml +94 -0
- package/assets/init/overlays/regulatory.yaml +154 -0
- package/assets/init/overlays/security.yaml +92 -0
- package/assets/init/profiles/adr.yaml +225 -0
- package/assets/init/profiles/capability-map.yaml +223 -0
- package/assets/init/profiles/reference-architecture.yaml +426 -0
- package/assets/init/profiles/roadmap.yaml +205 -0
- package/assets/init/profiles/solution-architecture.yaml +227 -0
- package/assets/init/research/architecture-assessment-rubrics-research.docx +0 -0
- package/assets/init/research/architecture-assessment-rubrics-research.md +566 -0
- package/assets/init/research/reference-architecture-research.md +751 -0
- package/assets/init/standard/EAROS.md +1426 -0
- package/assets/init/standard/schemas/artifact.schema.json +1295 -0
- package/assets/init/standard/schemas/artifact.uischema.json +65 -0
- package/assets/init/standard/schemas/evaluation.schema.json +284 -0
- package/assets/init/standard/schemas/rubric.schema.json +383 -0
- package/assets/init/templates/evaluation-record.template.yaml +58 -0
- package/assets/init/templates/new-profile.template.yaml +65 -0
- package/bin.js +188 -0
- package/dist/assets/_basePickBy-BVu6YmSW.js +1 -0
- package/dist/assets/_baseUniq-CWRzQDz_.js +1 -0
- package/dist/assets/arc-CyDBhtDM.js +1 -0
- package/dist/assets/architectureDiagram-2XIMDMQ5-BH6O4dvN.js +36 -0
- package/dist/assets/blockDiagram-WCTKOSBZ-2xmwdjpg.js +132 -0
- package/dist/assets/c4Diagram-IC4MRINW-BNmPRFJF.js +10 -0
- package/dist/assets/channel-CiySTNoJ.js +1 -0
- package/dist/assets/chunk-4BX2VUAB-DGQTvirp.js +1 -0
- package/dist/assets/chunk-55IACEB6-DNMAQAC_.js +1 -0
- package/dist/assets/chunk-FMBD7UC4-BJbVTQ5o.js +15 -0
- package/dist/assets/chunk-JSJVCQXG-BCxUL74A.js +1 -0
- package/dist/assets/chunk-KX2RTZJC-H7wWZOfz.js +1 -0
- package/dist/assets/chunk-NQ4KR5QH-BK4RlTQF.js +220 -0
- package/dist/assets/chunk-QZHKN3VN-0chxDV5g.js +1 -0
- package/dist/assets/chunk-WL4C6EOR-DexfQ-AV.js +189 -0
- package/dist/assets/classDiagram-VBA2DB6C-D7luWJQn.js +1 -0
- package/dist/assets/classDiagram-v2-RAHNMMFH-D7luWJQn.js +1 -0
- package/dist/assets/clone-ylgRbd3D.js +1 -0
- package/dist/assets/cose-bilkent-S5V4N54A-DS2IOCfZ.js +1 -0
- package/dist/assets/cytoscape.esm-CyJtwmzi.js +331 -0
- package/dist/assets/dagre-KLK3FWXG-BbSoTTa3.js +4 -0
- package/dist/assets/defaultLocale-DX6XiGOO.js +1 -0
- package/dist/assets/diagram-E7M64L7V-C9TvYgv0.js +24 -0
- package/dist/assets/diagram-IFDJBPK2-DowUMWrg.js +43 -0
- package/dist/assets/diagram-P4PSJMXO-BL6nrnQF.js +24 -0
- package/dist/assets/erDiagram-INFDFZHY-rXPRl8VM.js +70 -0
- package/dist/assets/flowDiagram-PKNHOUZH-DBRM99-W.js +162 -0
- package/dist/assets/ganttDiagram-A5KZAMGK-INcWFsBT.js +292 -0
- package/dist/assets/gitGraphDiagram-K3NZZRJ6-DMwpfE91.js +65 -0
- package/dist/assets/graph-DLQn37b-.js +1 -0
- package/dist/assets/index-BFFITMT8.js +650 -0
- package/dist/assets/index-H7f6VTz1.css +1 -0
- package/dist/assets/infoDiagram-LFFYTUFH-B0f4TWRM.js +2 -0
- package/dist/assets/init-Gi6I4Gst.js +1 -0
- package/dist/assets/ishikawaDiagram-PHBUUO56-CsU6XimZ.js +70 -0
- package/dist/assets/journeyDiagram-4ABVD52K-CQ7ibNib.js +139 -0
- package/dist/assets/kanban-definition-K7BYSVSG-DzEN7THt.js +89 -0
- package/dist/assets/katex-B1X10hvy.js +261 -0
- package/dist/assets/layout-C0dvb42R.js +1 -0
- package/dist/assets/linear-j4a8mGj7.js +1 -0
- package/dist/assets/mindmap-definition-YRQLILUH-DP8iEuCf.js +68 -0
- package/dist/assets/ordinal-Cboi1Yqb.js +1 -0
- package/dist/assets/pieDiagram-SKSYHLDU-BpIAXgAm.js +30 -0
- package/dist/assets/quadrantDiagram-337W2JSQ-DrpXn5Eg.js +7 -0
- package/dist/assets/requirementDiagram-Z7DCOOCP-Bg7EwHlG.js +73 -0
- package/dist/assets/sankeyDiagram-WA2Y5GQK-BWagRs1F.js +10 -0
- package/dist/assets/sequenceDiagram-2WXFIKYE-q5jwhivG.js +145 -0
- package/dist/assets/stateDiagram-RAJIS63D-B_J9pE-2.js +1 -0
- package/dist/assets/stateDiagram-v2-FVOUBMTO-Q_1GcybB.js +1 -0
- package/dist/assets/timeline-definition-YZTLITO2-dv0jgQ0z.js +61 -0
- package/dist/assets/treemap-KZPCXAKY-Dt1dkIE7.js +162 -0
- package/dist/assets/vennDiagram-LZ73GAT5-BdO5RgRZ.js +34 -0
- package/dist/assets/xychartDiagram-JWTSCODW-CpDVe-8v.js +7 -0
- package/dist/index.html +23 -0
- package/export-docx.js +1583 -0
- package/init.js +353 -0
- package/manifest-cli.mjs +207 -0
- package/package.json +83 -0
- package/schemas/artifact.schema.json +1295 -0
- package/schemas/artifact.uischema.json +65 -0
- package/schemas/evaluation.schema.json +284 -0
- package/schemas/rubric.schema.json +383 -0
- package/serve.js +238 -0
@@ -0,0 +1,268 @@
# Criterion Writing Guide

This guide covers how to write well-calibrated EAROS criteria with all 13 v2 required fields. Use it during Step 4 (draft) and whenever a criterion feels ambiguous or hard to score.

---

## The 13 Required Fields

Every criterion in a v2 rubric must have all of these fields. An incomplete criterion cannot be reliably calibrated.

```yaml
- id: [ARTIFACT-TOPIC-NN]
  question: "[The scoring question]"
  description: "[Why this matters — what goes wrong when it's absent]"
  metric_type: ordinal
  scale: [0, 1, 2, 3, 4, "N/A"]
  gate: false  # or gate: { enabled: true, severity: major, failure_effect: "..." }
  required_evidence:
    - "[observable item 1]"
    - "[observable item 2]"
  scoring_guide:
    "0": "[Absent — what 0 looks like]"
    "1": "[Weak — what 1 looks like]"
    "2": "[Partial — what 2 looks like]"
    "3": "[Good — what 3 looks like]"
    "4": "[Strong — what 4 looks like]"
  anti_patterns:
    - "[common wrong pattern]"
  examples:
    good:
      - "[Direct quote or close paraphrase of a strong version]"
    bad:
      - "[Direct quote or close paraphrase of a weak version]"
  decision_tree: >
    IF [observable condition] THEN score [N].
    IF [observable condition] THEN score [N].
  remediation_hints:
    - "[Specific action to improve the score]"
```
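To make the template operational, the sketch below checks a drafted rubric for missing fields. It is an illustration only: the authoritative field list lives in `standard/schemas/rubric.schema.json`, the names simply mirror the template above, and the top-level `criteria:` key is an assumption about how the profile file is laid out.

```python
# Minimal completeness check; an illustration, not part of the EAROS tooling.
# Assumptions: field names mirror the template above, and criteria sit under
# a top-level "criteria" key in the rubric YAML.
import sys
from pathlib import Path

import yaml  # pip install pyyaml

REQUIRED_FIELDS = [
    "id", "question", "description", "metric_type", "scale", "gate",
    "required_evidence", "scoring_guide", "anti_patterns", "examples",
    "decision_tree", "remediation_hints",
]

def missing_fields(criterion: dict) -> list[str]:
    """Return the required fields absent from a single criterion."""
    missing = [f for f in REQUIRED_FIELDS if f not in criterion]
    examples = criterion.get("examples") or {}
    # good and bad are nested under examples, but both must be present
    missing += [f"examples.{k}" for k in ("good", "bad") if k not in examples]
    return missing

if __name__ == "__main__":
    rubric = yaml.safe_load(Path(sys.argv[1]).read_text(encoding="utf-8"))
    for criterion in rubric.get("criteria", []):
        gaps = missing_fields(criterion)
        if gaps:
            print(f"{criterion.get('id', '<no id>')}: missing {', '.join(gaps)}")
```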
---

## Field-by-Field Guide

### `id`
- Pattern: `[ARTIFACT-ABBREV]-[TOPIC]-[NN]` — e.g., `PM-ROOT-01`, `DC-SCHEMA-02`
- Must be unique across **all** files in `core/`, `profiles/`, and `overlays/`
- Topic codes are short uppercase abbreviations: `ROOT`, `OWNER`, `VIEW`, `TRACE`, `SCHEMA`, `SLO`, `DEPR`

### `question`
- Must be a yes/no or does-the-artifact question, not an open-ended one
- Bad: "How well does the artifact describe the solution?"
- Good: "Does the artifact include a numbered data flow walkthrough covering all integration points?"
- The question should be answerable from observable evidence in the artifact

### `description`
- Explain **why this criterion matters** — what goes wrong when it's absent
- Two sentences minimum: (1) what it checks, (2) what the failure consequence is
- Avoid restating the question. Add depth: what's the downstream impact on reviewers, delivery teams, or governance?

### `metric_type`
- Always `ordinal` for EAROS rubric criteria

### `scale`
- Always `[0, 1, 2, 3, 4, "N/A"]`

### `gate`
- See the [Gate Guidance](#gate-guidance) section below
- Either `gate: false` or `gate: { enabled: true, severity: [type], failure_effect: "[text]" }`
- Valid severities: `none`, `advisory`, `major`, `critical`
### `required_evidence`
- List observable, concrete items — not abstract properties
- Bad: `- quality evidence` or `- relevant content`
- Good: `- named data retention policy with owner and review date`
- Good: `- deployment diagram showing network zone boundaries`
- Good: `- exception log with approver and expiry for each non-compliant item`

### `scoring_guide`
The most critical field for calibration. All five levels must be distinct and observable.

**The cardinal rule:** each level descriptor must be distinguishable from adjacent levels by observable features — not by degree words like "somewhat" or "mostly".

Level patterns that work:
- Level 0: "Absent or directly contradicted"
- Level 1: "Present but [specific limitation]"
- Level 2: "Present with [specific limitation] — [what's still missing]"
- Level 3: "Addressed for most [X], with [specific remaining gap]"
- Level 4: "Complete, consistent, and [specific discriminating feature that 3 lacks]"

**Level 2–3 is where calibration breaks down most often.** Make these descriptions as specific as possible. The difference between 2 and 3 should be observable, not a matter of opinion.

### `anti_patterns`
- List 2–4 common failures you've actually seen for this criterion
- This is the most valuable field for AI disambiguation — it provides negative examples that help agents avoid false positives
- Be specific: not "incomplete diagram" but "deployment diagram that shows application tier only, with no network zone boundaries or external dependencies"

### `examples`
- Both `good` and `bad` are required
- Use direct quotes or close paraphrases from real artifacts (anonymised if needed)
- Good examples should demonstrate level 3–4 evidence
- Bad examples should demonstrate level 0–1 evidence
- Do not use placeholder text like "[Strong evidence example]"

### `decision_tree`
A series of IF/THEN conditions that operationalise the scoring_guide for AI agents.

Structure:
```
IF [observable condition A] THEN score 0.
IF [observable condition B] THEN score 1.
IF [observable condition C] AND [observable condition D] THEN score 2.
IF [observable condition E] THEN score 3.
IF [observable condition E] AND [observable condition F] THEN score 4.
```

Rules:
- Conditions must be **observable from the artifact** — not judgment-dependent
- Cover all five score levels
- Use count-based conditions where possible: "IF less than 2 [X]", "IF 3+ [X]"
- Avoid "adequate", "sufficient", "thorough" — replace with observable counts or presence/absence

### `remediation_hints`
- 2–4 specific actions the author can take to improve the score
- Each hint should be actionable within a week, not aspirational
- Bad: "Improve the quality of the traceability"
- Good: "Add a traceability matrix linking each business driver to the specific design section that addresses it"
---

## Gate Guidance {#gate-guidance}

Gates prevent weak scores on important criteria from being hidden by weighted averages. The key design principle: **gate deliberately, not reflexively**.

### Gate Types and Effects

| Severity | Effect | When to use |
|----------|--------|-------------|
| `none` | Contributes to score only | No governance consequence for weakness |
| `advisory` | Weakness triggers a recommendation | Important but not blocking |
| `major` | Caps status at `conditional_pass` | Important enough that weakness prevents full approval |
| `critical` | Forces `reject` status regardless of average | Compliance-level failures; no-go conditions |
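The table translates fairly directly into a status calculation. The sketch below is illustrative only and not the normative EAROS scoring logic: it assumes a gated criterion has failed when it scores below 2 (the threshold used in the wording patterns later in this section) and that statuses rank reject < conditional_pass < pass.

```python
# Illustrative only; the normative rules live in the EAROS scoring protocol.
# Assumptions: a gate fails when the criterion scores below 2, and statuses
# rank reject < conditional_pass < pass.
from dataclasses import dataclass

@dataclass
class ScoredCriterion:
    id: str
    score: int | None            # None represents "N/A"
    gate_severity: str = "none"  # none | advisory | major | critical

def apply_gates(proposed_status: str, scored: list[ScoredCriterion]) -> tuple[str, list[str]]:
    """Cap the weighted-average status according to any failed gates."""
    status, notes = proposed_status, []
    for c in scored:
        if c.score is None or c.score >= 2:
            continue  # criterion is N/A or strong enough; the gate does not trigger
        if c.gate_severity == "critical":
            return "reject", notes + [f"{c.id}: critical gate failed"]
        if c.gate_severity == "major" and status == "pass":
            status = "conditional_pass"
            notes.append(f"{c.id}: major gate caps status at conditional_pass")
        elif c.gate_severity == "advisory":
            notes.append(f"{c.id}: advisory gate failed, add a recommendation")
    return status, notes

# A strong weighted average cannot hide a weak, major-gated criterion:
print(apply_gates("pass", [ScoredCriterion("PM-ROOT-01", 1, "major")]))
```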
### Gate Quotas per Profile

| Gate type | Target count | Absolute max |
|-----------|-------------|--------------|
| `critical` | 0–1 | 1 |
| `major` | 1–2 | 3 |
| `advisory` | 0–3 | — |

If you're assigning more than 3 major gates, you're over-gating. Ask: "Would I really reject an otherwise excellent artifact because this one criterion is weak?"

### Gate Failure Effects — Wording Patterns

```yaml
# Critical gate
gate:
  enabled: true
  severity: critical
  failure_effect: reject when [the specific condition that cannot be ignored]

# Major gate
gate:
  enabled: true
  severity: major
  failure_effect: Cannot pass above conditional_pass if score < 2
```

### Gate Decision Heuristics

**Use critical when:**
- The artifact is completely un-reviewable without this (scope/boundary is an example)
- A mandatory regulatory or compliance control is at stake
- The failure mode is "we're approving something we literally cannot evaluate"

**Use major when:**
- The concern is important enough to cap approval even if the rest is strong
- Weakness here creates concrete delivery or governance risk
- The concern is the core quality signal for this artifact type

**Use advisory or none when:**
- Weakness is annoying but survivable
- The criterion is secondary or supplementary to the main quality signal
- You're adding it because it's sometimes relevant, not because it's always critical

### Common Over-Gating Patterns to Avoid

- Gating every criterion in a profile (dilutes the gate signal)
- Critical-gating criteria that are actually style preferences
- Gating criteria that duplicate core meta-rubric gates
- Gating criteria where "absent" is not actually a blocking failure

---
## Worked Example: Complete v2 Criterion

This is a full criterion from a hypothetical post-mortem profile, demonstrating all 13 fields at the expected quality level.

```yaml
- id: PM-ROOT-01
  question: Does the post-mortem identify the root cause(s) of the incident, distinguished from contributing factors and symptoms?
  description: >
    A post-mortem that describes only symptoms or contributing factors without identifying
    the root cause provides no basis for systemic remediation. Teams will implement surface
    fixes that leave the underlying cause in place, and the incident will recur. Root cause
    analysis must distinguish between what triggered the incident (symptom), what enabled
    it to occur (contributing factors), and what must change to prevent recurrence (root cause).
    Without this distinction, remediation actions are likely to be ineffective.
  metric_type: ordinal
  scale: [0, 1, 2, 3, 4, "N/A"]
  gate:
    enabled: true
    severity: major
    failure_effect: Cannot pass above conditional_pass — post-mortems without root cause analysis cannot drive systemic improvement
  required_evidence:
    - root cause statement (explicitly labelled, not implied)
    - distinction between root cause, contributing factors, and trigger/symptoms
    - causal chain or 5-whys analysis
  scoring_guide:
    "0": No root cause analysis — incident described as a sequence of events only
    "1": Root cause mentioned but not distinguished from symptoms or contributing factors
    "2": Root cause identified but causal chain is missing or assumed — why the root cause exists is not explained
    "3": Root cause identified with causal chain — contributing factors distinguished — remediation plausibly addresses root cause
    "4": Root cause identified with validated causal chain, distinguished from all contributing factors and symptoms, and remediation actions directly address root cause with acceptance criteria
  anti_patterns:
    - '"Root cause: human error — no further analysis" (symptom, not root cause)'
    - Listing 8 root causes (usually means contributing factors were not separated)
    - Root cause section absent; narrative describes "what happened" only
    - Remediation actions that address symptoms rather than the root cause
  examples:
    good:
      - >
        "Root cause: Database connection pool was not configurable at runtime; configuration
        was compiled into the service image. Contributing factors: (1) No alerting on
        connection exhaustion — only on query failure. (2) Load test did not simulate
        connection-heavy bursts. Trigger: Black Friday traffic spike. [Causal chain, 5 sections,
        remediation: add runtime configurability with separate alerting threshold.]"
    bad:
      - >
        "Root cause: The database ran out of connections during the sale event.
        [This is a symptom. No causal chain. No contributing factor separation.
        Remediation: increase connection pool size — addresses the trigger, not the cause.]"
  decision_tree: >
    IF no root cause section or root cause not labelled THEN score 0.
    IF root cause labelled but indistinguishable from symptom or contributing factor THEN score 1.
    IF root cause identified but no causal chain explaining why it exists THEN score 2.
    IF root cause with causal chain AND contributing factors distinguished THEN score 3.
    IF root cause validated, causal chain complete, all factors distinguished, AND remediation directly addresses root cause with acceptance criteria THEN score 4.
  remediation_hints:
    - Apply 5-whys or fishbone analysis to find the systemic cause behind the immediate trigger
    - "Explicitly label and separate: trigger (what set it off), contributing factors (what enabled it), root cause (what must change)"
    - Verify that each remediation action addresses the root cause, not just the trigger
    - Add acceptance criteria to remediation actions so you can verify the root cause has been addressed
```
---

## Common Criterion Writing Mistakes

| Mistake | Problem | Fix |
|---------|---------|-----|
| Scoring guide uses "somewhat", "mostly", "adequately" | Not observable; different reviewers interpret differently | Replace with observable features: counts, presence/absence, named sections |
| Decision tree mirrors scoring guide word-for-word | Provides no additional disambiguation | Add IF/THEN logic with countable conditions the scoring guide doesn't have |
| Required evidence is abstract ("relevant content") | Evaluators can't know what to look for | List specific observable items: "named owner", "diagram with X", "table with Y columns" |
| Examples are placeholder text | Evaluator has no real reference | Use real quotes (anonymised if needed) — even invented but realistic examples are better than placeholders |
| anti_patterns are too generic ("missing information") | Doesn't help AI agents avoid false positives | Be specific about what the wrong version looks like |
| All criteria have critical gates | Dilutes gate signal; too many false rejects | Reserve critical for compliance-level no-gos; use major for quality caps |
@@ -0,0 +1,193 @@
# Dependency Rules

This file documents what depends on what in the EAROS three-layer model, how to check for conflicts before creating a new rubric, and the profile-vs-overlay decision framework.

---

## The Three-Layer Dependency Model

```
┌──────────────────────────────────────────────────┐
│  OVERLAYS (cross-cutting)                        │
│  security · data-governance · regulatory         │
│  Applied by context, not artifact type           │
├──────────────────────────────────────────────────┤
│  PROFILES (artifact-specific)                    │
│  solution-architecture · reference-architecture  │
│  adr · capability-map · roadmap                  │
│  Each inherits: [EAROS-CORE-002]                 │
├──────────────────────────────────────────────────┤
│  CORE (universal foundation)                     │
│  core-meta-rubric.yaml (EAROS-CORE-002)          │
│  9 dimensions · 10 criteria · gate model         │
└──────────────────────────────────────────────────┘
```

**Dependency direction:**
- Profiles depend on Core: if Core changes, profiles may need updating
- Overlays depend on nothing: they append to whatever base rubric is in use
- Core has no dependencies
---

## Checks Before Creating Each Type

### Before creating a Profile

1. **Does the core exist?**
   - Read `core/core-meta-rubric.yaml`
   - If absent: ask "Do you want to create a core rubric first, or proceed with a standalone profile (without inheriting from core)?"

2. **Does a profile already exist for this artifact type?**
   - List `profiles/` directory
   - If a profile exists: "A profile for [artifact-type] already exists (`profiles/[name].yaml`). Do you want to revise it (bump version) or create a supplementary profile for a specific sub-type?"

3. **What does Core already cover?**
   - Core's 10 criteria cover: stakeholder fit, concern-view mapping, scope & boundaries, viewpoint appropriateness, traceability, internal consistency, risks/assumptions/tradeoffs, compliance, actionability, stewardship
   - **Do not add profile criteria that duplicate these.** Every criterion must add something that Core cannot measure.

4. **Is the artifact type genuinely distinct from existing profiles?**
   - "Platform handover doc" and "solution architecture" might overlap — verify they need different criteria
   - If 70%+ of the criteria you'd write already exist in an existing profile, consider whether a revision is better than a new profile

### Before creating an Overlay

1. **Does a similar overlay already exist?**
   - List `overlays/` directory
   - Check scope: could the new concern be added to an existing overlay as additional criteria?
   - If yes: "The [existing-overlay] already covers [X]. Should we add your concern there, or is it distinct enough to warrant a separate overlay?"

2. **Which profiles will this overlay apply to?**
   - List existing profiles
   - For each profile, confirm: "Does this overlay make sense for a [profile-type]?" — e.g., a "cost transparency" overlay applies to reference architectures and solution architectures but probably not to ADRs
   - Document the intended applicability scope in the overlay's purpose field

3. **Is the concern truly cross-cutting, or artifact-specific?**
   - If the concern only applies to one artifact type, it should be a profile addition, not an overlay
   - Test: "Would you apply this concern when reviewing a capability map? A roadmap? An ADR?" — if the answer is "no" for most types, it's a profile concern

4. **Does the overlay have at least one critical or major gate?**
   - Overlays exist because the concern matters enough to inject into any rubric
   - An overlay with no gates is likely just a checklist — consider whether it's needed at all

### Before creating a Core Rubric

1. **Is modifying EAROS-CORE-002 truly necessary?**
   - Core changes affect every profile and every assessment. This is a governance decision.
   - Ask: "What specific capability does the existing core lack for your use case?"
   - If the answer is artifact-specific, it's a profile, not a core change

2. **Are you replacing or creating a domain-specific core?**
   - Replacing `EAROS-CORE-002`: requires version bump and governance approval
   - Domain-specific core (e.g., `EAROS-CORE-DATAPLATFORM-001`): a parallel core for a specific domain; profiles in that domain would inherit it instead of `EAROS-CORE-002`

3. **ID assignment for a new core:**
   - Pattern: `EAROS-CORE-[DOMAIN]-[NNN]` for domain-specific cores
   - Or bump `EAROS-CORE-002` → `EAROS-CORE-003` for a full replacement
---

## Profile vs. Overlay Decision Framework {#profile-vs-overlay}

Use this when the user is unsure which to create.

### Ask these three questions:

**1. Does it apply to only one artifact type, or many?**
- Only one → profile addition
- Many → overlay candidate

**2. Is it driven by the artifact's purpose, or by external context?**
- Purpose-driven ("capability maps need heat-map coverage") → profile
- Context-driven ("this artifact involves PII data") → overlay

**3. Would a reviewer need it for every artifact of type X, or only sometimes?**
- Always for type X → profile
- Sometimes (when the context warrants it) → overlay

### Decision Matrix

| Criterion applies to... | Driven by... | When needed... | Use |
|------------------------|-------------|----------------|-----|
| One artifact type | Artifact's purpose | Always | Profile |
| Multiple artifact types | External context | Sometimes | Overlay |
| One artifact type | External context | Sometimes | Profile addition OR overlay |
| Multiple types | Artifact purpose | Always | Consider merging into Core |
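The matrix is mechanical enough to restate as a tiny helper. The sketch below is only a paraphrase of the table; the function name, parameters, and return strings are illustrative, not part of EAROS.

```python
# A paraphrase of the decision matrix above; names and strings are illustrative.
def profile_or_overlay(applies_to_many_types: bool,
                       context_driven: bool,
                       always_needed: bool) -> str:
    """Answer the three questions above and return the recommended home."""
    if applies_to_many_types and not context_driven and always_needed:
        return "consider merging into core"
    if applies_to_many_types and context_driven:
        return "overlay"
    if not applies_to_many_types and context_driven and not always_needed:
        return "profile addition or overlay (judgment call)"
    return "profile"

# "This artifact involves PII data": many types, context-driven, only sometimes
print(profile_or_overlay(True, context_driven=True, always_needed=False))  # overlay
```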
### Edge Cases

**"Security concerns for solution architectures only":**
- If your security criteria are specific to how solution architectures describe security (e.g., threat model views, security zone diagrams) → add to the solution-architecture profile
- If the concern applies regardless of artifact type (e.g., "any artifact describing authentication must show the control mapping") → overlay

**"Data governance for capability maps specifically":**
- If it's about how capability maps represent data ownership → profile addition
- If it's triggered by "this capability map describes a data platform (context)" → overlay

---

## ID Uniqueness Rules

All criterion IDs must be unique across the entire EAROS repository — not just within the file.

### ID Namespace Check

Before assigning any criterion ID, scan:
1. `core/core-meta-rubric.yaml` — existing IDs: `STK-01`, `STK-02`, `SCP-01`, `CVP-01`, `TRC-01`, `CON-01`, `RAT-01`, `CMP-01`, `ACT-01`, `MNT-01`
2. All files in `profiles/` — check criterion IDs in each
3. All files in `overlays/` — check criterion IDs in each

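This scan is easy to automate. The sketch below is an illustration rather than part of the EAROS tooling: it assumes every `id:` key found in the YAML files under `core/`, `profiles/`, and `overlays/` is a rubric or criterion ID that must be unique, and it reports any value that appears more than once.

```python
# Pre-flight duplicate-ID check; an illustration, not part of the EAROS tooling.
# Assumption: every "id" key in these YAML files is a rubric or criterion ID.
from collections import defaultdict
from pathlib import Path

import yaml  # pip install pyyaml

def collect_ids(node, found, source):
    """Recursively collect every value stored under an 'id' key."""
    if isinstance(node, dict):
        if "id" in node:
            found[str(node["id"])].append(source)
        for value in node.values():
            collect_ids(value, found, source)
    elif isinstance(node, list):
        for item in node:
            collect_ids(item, found, source)

found = defaultdict(list)
for folder in ("core", "profiles", "overlays"):
    for path in Path(folder).glob("*.yaml"):
        collect_ids(yaml.safe_load(path.read_text(encoding="utf-8")), found, str(path))

for each_id, sources in sorted(found.items()):
    if len(sources) > 1:
        print(f"DUPLICATE {each_id}: {', '.join(sources)}")
```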
### ID Patterns by Type

| File type | Rubric ID pattern | Criterion ID pattern |
|-----------|------------------|---------------------|
| Core rubric | `EAROS-CORE-[DOMAIN?]-[NNN]` | `[ABBREV]-[NN]` e.g. `STK-01` |
| Profile | `EAROS-[ARTIFACT]-[NNN]` | `[ARTIFACT-ABBREV]-[TOPIC]-[NN]` |
| Overlay | `EAROS-OVR-[CONCERN]-[NNN]` | `[OVR-ABBREV]-[TOPIC]-[NN]` |

### Examples of Good vs. Conflicting IDs

| Proposed ID | Problem | Fix |
|-------------|---------|-----|
| `STK-01` | Conflicts with core | Use `PM-STK-01` or `PM-OWNER-01` |
| `RA-VIEW-01` | Conflicts with reference-architecture profile | Use `SA-VIEW-01` for solution architecture |
| `REG-ID-01` | Conflicts with regulatory overlay | Use `AI-TRANS-01` for AI transparency overlay |

---

## How Core Changes Ripple to Profiles

If `EAROS-CORE-002` changes (new criteria, revised gates, changed scoring thresholds):

1. **All profiles with `inherits: [EAROS-CORE-002]` are affected** — they will now include the new/changed core criteria automatically
2. **Overlays are unaffected** — they append to the base rubric at runtime
3. **Calibration packs are potentially invalidated** — if scoring thresholds or gate models change, previous calibration results may no longer apply
4. **Worked examples need review** — existing evaluation records in `examples/` may score differently under the new core

This is why core changes require a governance decision. A profile addition is always safer than a core change.

---

## Versioning Rules

### When to bump versions

| Change | Version bump |
|--------|-------------|
| New criterion added | MINOR (e.g. 1.0.0 → 1.1.0) |
| Existing criterion scoring guide revised | MINOR |
| Gate severity changed | MAJOR (e.g. 1.1.0 → 2.0.0) |
| Status threshold changed | MAJOR |
| Criterion removed | MAJOR |
| Typo fix, documentation improvement | PATCH (e.g. 1.1.0 → 1.1.1) |
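In other words, the most disruptive change in a release decides the bump. A sketch of that rule (the change-type names are mine, not from the EAROS schema):

```python
# Change-type names are illustrative; the table above is the source of truth.
BUMP_FOR_CHANGE = {
    "criterion_added": "minor",
    "scoring_guide_revised": "minor",
    "gate_severity_changed": "major",
    "status_threshold_changed": "major",
    "criterion_removed": "major",
    "typo_or_docs": "patch",
}
RANK = {"patch": 0, "minor": 1, "major": 2}

def next_version(current: str, changes: list[str]) -> str:
    """Apply the largest required bump for a set of changes to a semver string."""
    major, minor, patch = (int(part) for part in current.split("."))
    bump = max((BUMP_FOR_CHANGE[c] for c in changes), key=RANK.__getitem__)
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

print(next_version("1.1.0", ["criterion_added", "gate_severity_changed"]))  # 2.0.0
```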
### Status transitions

```
draft → candidate → approved → deprecated
```

- `draft`: Not calibrated. Must not be used in live governance.
- `candidate`: Calibrated against at least 3 artifacts with 2+ reviewers. Safe for pilot use.
- `approved`: Fully validated and ratified by governance owner.
- `deprecated`: Superseded by a newer version. Existing evaluation records retain the version reference.
@@ -0,0 +1,123 @@
# Rubric Interview Guide

This guide deepens the Step 3 interview in SKILL.md. Use it when initial answers are thin, when the user seems uncertain about what they're trying to create, or when you need richer material to generate good scoring guide descriptors.

---

## Why the Interview Matters More Than the Template

A rubric written from a template but without understanding the artifact type produces criteria that:
- Are too abstract to calibrate reliably (different reviewers interpret them differently)
- Duplicate what the core already measures (wasted criteria budget)
- Miss the actual failure modes that make this artifact type problematic

The interview is where you learn the specific quality signals for this artifact type. You cannot invent them — they come from people who have reviewed many examples.

---

## Deepening Question Bank

Use these follow-up questions when initial answers are surface-level. You don't need all of them — pick the ones relevant to what's missing.

### On Artifact Identity

**If the user can't describe the decision it supports:**
- "Imagine someone reads this artifact and then has to do something. What do they do — and for whom?"
- "What would be different in the world if this artifact didn't exist?"
- "Who would be upset if this artifact were missing from a review pack? What would they be unable to decide?"

**If the artifact type seems too broad:**
- "Is there a sub-type that behaves very differently? For example, a 'design document' could be a solution architecture, a data model, or an API contract — each needs different criteria."
- "Would you review a 10-page version and a 50-page version the same way? If not, they might be different types."

**If the frequency or stakes are unclear:**
- "Does this artifact go through a formal governance gate, or is it reviewed informally?"
- "What happens if a bad version gets approved — what's the worst-case consequence?"
---

### On Quality Markers

**If the user can't describe what a good example looks like:**
- "Think of the best piece of work you've ever reviewed of this type. What stood out? What did it have that others didn't?"
- "If you were teaching someone to write this artifact type, what's the first thing you'd tell them?"
- "What question does a good version answer that a weak version leaves hanging?"

**If the user can only list obvious failures:**
- "Beyond 'it's incomplete' or 'it's unclear' — what specifically is always missing from a weak version of this artifact type?"
- "What does the author always get wrong on the first draft? Not whether they tried — but what does the output miss?"
- "Have you ever had an artifact rejected or sent back? What triggered that? What was the specific problem?"

**If the mid-range (level 2–3) is unclear:**
- "If an artifact addresses this thing partially — some of it there, some missing — what does 'partial' look like specifically?"
- "What's the minimum version that would still be useful to a reviewer, even if imperfect?"

---

### On Structure and Criteria

**If the user proposes too many criteria (> 12):**
- "If you had to cut this list in half, which 5–6 things are most important? What would you still be willing to fail an artifact on?"
- "Are any of these criteria really different aspects of the same underlying thing? If so, they might merge into one criterion with a richer scoring guide."
- "Which of these does the core meta-rubric already cover? Remember: stakeholders, scope, traceability, consistency, compliance, and actionability are already in every assessment."

**If the criteria feel abstract:**
- "For this criterion, what's the specific observable evidence you'd look for — a table, a diagram, a named section, a decision statement?"
- "If you were writing a checklist for a junior reviewer, what would the first item say?"
- "How would you know, in 30 seconds of skimming, whether an artifact passes this criterion?"

---

### On Gates and Stakes

**If the user wants everything to be a critical gate:**
- "If every criterion were a critical gate, any weakness anywhere would cause a reject. Is that what you intend?"
- "What's the minimum viable artifact of this type — the version you'd conditionally approve with named conditions? What must be present for that to be possible?"
- "Reserve critical gates for: (1) things that make the artifact completely un-reviewable, (2) mandatory compliance requirements. Everything else should be major or advisory."

**If the user can't identify any gates:**
- "Is there anything so important that its absence should immediately cap or block approval — regardless of how good the rest of the artifact is?"
- "What would you tell an Architecture Board member if they asked 'why did this pass?' — if there's something you'd be embarrassed to have absent, that's a gate candidate."

---
## Profile-Specific Interview Patterns

### For ADR-like artifacts (decision records)
- "What makes a good decision record vs. a good argument document? The difference matters for criteria."
- "Does the artifact need to be reversible — can the decision be revisited? If so, criteria for time-bounding decisions matter."
- "Who needs to be able to reconstruct the reasoning 2 years from now with no context? That drives the evidence requirements."

### For Capability Maps
- "What level of abstraction is expected — business capability, product capability, or technical capability?"
- "Is this a current state, target state, or transition map? Each has different criteria."
- "What makes a capability 'well-defined' in your context — is it a name, a description, a set of processes, a heat map?"

### For Roadmaps
- "What's the planning horizon — 3 months, 12 months, 3 years? This changes what 'specificity' means."
- "Who owns delivery of the roadmap items? Named owners or just teams?"
- "Is this a commitment roadmap (things we will do) or a strategy roadmap (things we aim to do)? They have different evidence requirements."

### For Platform/Operational Handover Docs
- "Who's the recipient — someone taking over operational ownership, a new team adopting the platform, or an integration partner?"
- "What must they be able to do after reading this that they couldn't do before?"
- "What's the runbook equivalent for this artifact type?"

### For Overlays (Cross-cutting concerns)
- "Does this overlay apply to all artifact types equally, or are some more important than others?"
- "For which artifact types is this concern most critical? (Those are your gate candidates.)"
- "What's the minimum evidence that would satisfy this concern — even for a simple, low-risk artifact?"

---

## When to Stop the Interview

Stop when you can answer these five questions from the interview responses:

1. **Artifact identity**: What is this thing and what decision does it support?
2. **Quality levels**: What does a level 0, 2, and 4 artifact look like for the 3–5 key criteria?
3. **Common failures**: What are the 3 most reliable signs of a weak artifact of this type?
4. **Critical/major gates**: What would cause an outright reject vs. a conditional pass?
5. **Design method**: Which of the 5 methods (A–E) best fits the dimensional structure?

If you can answer all five, you have enough to generate good YAML. If you cannot, ask one more question before drafting.
|