@trohde/earos 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (135)
  1. package/README.md +156 -0
  2. package/assets/init/.agents/skills/earos-artifact-gen/SKILL.md +106 -0
  3. package/assets/init/.agents/skills/earos-artifact-gen/references/interview-guide.md +313 -0
  4. package/assets/init/.agents/skills/earos-artifact-gen/references/output-guide.md +367 -0
  5. package/assets/init/.agents/skills/earos-assess/SKILL.md +212 -0
  6. package/assets/init/.agents/skills/earos-assess/references/calibration-benchmarks.md +160 -0
  7. package/assets/init/.agents/skills/earos-assess/references/output-templates.md +311 -0
  8. package/assets/init/.agents/skills/earos-assess/references/scoring-protocol.md +281 -0
  9. package/assets/init/.agents/skills/earos-calibrate/SKILL.md +153 -0
  10. package/assets/init/.agents/skills/earos-calibrate/references/agreement-metrics.md +188 -0
  11. package/assets/init/.agents/skills/earos-calibrate/references/calibration-protocol.md +263 -0
  12. package/assets/init/.agents/skills/earos-create/SKILL.md +257 -0
  13. package/assets/init/.agents/skills/earos-create/references/criterion-writing-guide.md +268 -0
  14. package/assets/init/.agents/skills/earos-create/references/dependency-rules.md +193 -0
  15. package/assets/init/.agents/skills/earos-create/references/rubric-interview-guide.md +123 -0
  16. package/assets/init/.agents/skills/earos-create/references/validation-checklist.md +238 -0
  17. package/assets/init/.agents/skills/earos-profile-author/SKILL.md +251 -0
  18. package/assets/init/.agents/skills/earos-profile-author/references/criterion-writing-guide.md +280 -0
  19. package/assets/init/.agents/skills/earos-profile-author/references/design-methods.md +158 -0
  20. package/assets/init/.agents/skills/earos-profile-author/references/profile-checklist.md +173 -0
  21. package/assets/init/.agents/skills/earos-remediate/SKILL.md +118 -0
  22. package/assets/init/.agents/skills/earos-remediate/references/output-template.md +199 -0
  23. package/assets/init/.agents/skills/earos-remediate/references/remediation-patterns.md +330 -0
  24. package/assets/init/.agents/skills/earos-report/SKILL.md +85 -0
  25. package/assets/init/.agents/skills/earos-report/references/portfolio-template.md +181 -0
  26. package/assets/init/.agents/skills/earos-report/references/single-artifact-template.md +168 -0
  27. package/assets/init/.agents/skills/earos-review/SKILL.md +130 -0
  28. package/assets/init/.agents/skills/earos-review/references/challenge-patterns.md +163 -0
  29. package/assets/init/.agents/skills/earos-review/references/output-template.md +180 -0
  30. package/assets/init/.agents/skills/earos-template-fill/SKILL.md +177 -0
  31. package/assets/init/.agents/skills/earos-template-fill/references/evidence-writing-guide.md +186 -0
  32. package/assets/init/.agents/skills/earos-template-fill/references/section-rubric-mapping.md +200 -0
  33. package/assets/init/.agents/skills/earos-validate/SKILL.md +113 -0
  34. package/assets/init/.agents/skills/earos-validate/references/fix-patterns.md +281 -0
  35. package/assets/init/.agents/skills/earos-validate/references/validation-checks.md +287 -0
  36. package/assets/init/.claude/CLAUDE.md +4 -0
  37. package/assets/init/AGENTS.md +293 -0
  38. package/assets/init/CLAUDE.md +635 -0
  39. package/assets/init/README.md +507 -0
  40. package/assets/init/calibration/gold-set/.gitkeep +0 -0
  41. package/assets/init/calibration/results/.gitkeep +0 -0
  42. package/assets/init/core/core-meta-rubric.yaml +643 -0
  43. package/assets/init/docs/consistency-report.md +325 -0
  44. package/assets/init/docs/getting-started.md +194 -0
  45. package/assets/init/docs/profile-authoring-guide.md +51 -0
  46. package/assets/init/docs/terminology.md +126 -0
  47. package/assets/init/earos.manifest.yaml +104 -0
  48. package/assets/init/evaluations/.gitkeep +0 -0
  49. package/assets/init/examples/aws-event-driven-order-processing/artifact.yaml +2056 -0
  50. package/assets/init/examples/aws-event-driven-order-processing/evaluation.yaml +973 -0
  51. package/assets/init/examples/aws-event-driven-order-processing/report.md +244 -0
  52. package/assets/init/examples/example-solution-architecture.evaluation.yaml +136 -0
  53. package/assets/init/examples/multi-cloud-data-analytics/artifact.yaml +715 -0
  54. package/assets/init/overlays/data-governance.yaml +94 -0
  55. package/assets/init/overlays/regulatory.yaml +154 -0
  56. package/assets/init/overlays/security.yaml +92 -0
  57. package/assets/init/profiles/adr.yaml +225 -0
  58. package/assets/init/profiles/capability-map.yaml +223 -0
  59. package/assets/init/profiles/reference-architecture.yaml +426 -0
  60. package/assets/init/profiles/roadmap.yaml +205 -0
  61. package/assets/init/profiles/solution-architecture.yaml +227 -0
  62. package/assets/init/research/architecture-assessment-rubrics-research.docx +0 -0
  63. package/assets/init/research/architecture-assessment-rubrics-research.md +566 -0
  64. package/assets/init/research/reference-architecture-research.md +751 -0
  65. package/assets/init/standard/EAROS.md +1426 -0
  66. package/assets/init/standard/schemas/artifact.schema.json +1295 -0
  67. package/assets/init/standard/schemas/artifact.uischema.json +65 -0
  68. package/assets/init/standard/schemas/evaluation.schema.json +284 -0
  69. package/assets/init/standard/schemas/rubric.schema.json +383 -0
  70. package/assets/init/templates/evaluation-record.template.yaml +58 -0
  71. package/assets/init/templates/new-profile.template.yaml +65 -0
  72. package/bin.js +188 -0
  73. package/dist/assets/_basePickBy-BVu6YmSW.js +1 -0
  74. package/dist/assets/_baseUniq-CWRzQDz_.js +1 -0
  75. package/dist/assets/arc-CyDBhtDM.js +1 -0
  76. package/dist/assets/architectureDiagram-2XIMDMQ5-BH6O4dvN.js +36 -0
  77. package/dist/assets/blockDiagram-WCTKOSBZ-2xmwdjpg.js +132 -0
  78. package/dist/assets/c4Diagram-IC4MRINW-BNmPRFJF.js +10 -0
  79. package/dist/assets/channel-CiySTNoJ.js +1 -0
  80. package/dist/assets/chunk-4BX2VUAB-DGQTvirp.js +1 -0
  81. package/dist/assets/chunk-55IACEB6-DNMAQAC_.js +1 -0
  82. package/dist/assets/chunk-FMBD7UC4-BJbVTQ5o.js +15 -0
  83. package/dist/assets/chunk-JSJVCQXG-BCxUL74A.js +1 -0
  84. package/dist/assets/chunk-KX2RTZJC-H7wWZOfz.js +1 -0
  85. package/dist/assets/chunk-NQ4KR5QH-BK4RlTQF.js +220 -0
  86. package/dist/assets/chunk-QZHKN3VN-0chxDV5g.js +1 -0
  87. package/dist/assets/chunk-WL4C6EOR-DexfQ-AV.js +189 -0
  88. package/dist/assets/classDiagram-VBA2DB6C-D7luWJQn.js +1 -0
  89. package/dist/assets/classDiagram-v2-RAHNMMFH-D7luWJQn.js +1 -0
  90. package/dist/assets/clone-ylgRbd3D.js +1 -0
  91. package/dist/assets/cose-bilkent-S5V4N54A-DS2IOCfZ.js +1 -0
  92. package/dist/assets/cytoscape.esm-CyJtwmzi.js +331 -0
  93. package/dist/assets/dagre-KLK3FWXG-BbSoTTa3.js +4 -0
  94. package/dist/assets/defaultLocale-DX6XiGOO.js +1 -0
  95. package/dist/assets/diagram-E7M64L7V-C9TvYgv0.js +24 -0
  96. package/dist/assets/diagram-IFDJBPK2-DowUMWrg.js +43 -0
  97. package/dist/assets/diagram-P4PSJMXO-BL6nrnQF.js +24 -0
  98. package/dist/assets/erDiagram-INFDFZHY-rXPRl8VM.js +70 -0
  99. package/dist/assets/flowDiagram-PKNHOUZH-DBRM99-W.js +162 -0
  100. package/dist/assets/ganttDiagram-A5KZAMGK-INcWFsBT.js +292 -0
  101. package/dist/assets/gitGraphDiagram-K3NZZRJ6-DMwpfE91.js +65 -0
  102. package/dist/assets/graph-DLQn37b-.js +1 -0
  103. package/dist/assets/index-BFFITMT8.js +650 -0
  104. package/dist/assets/index-H7f6VTz1.css +1 -0
  105. package/dist/assets/infoDiagram-LFFYTUFH-B0f4TWRM.js +2 -0
  106. package/dist/assets/init-Gi6I4Gst.js +1 -0
  107. package/dist/assets/ishikawaDiagram-PHBUUO56-CsU6XimZ.js +70 -0
  108. package/dist/assets/journeyDiagram-4ABVD52K-CQ7ibNib.js +139 -0
  109. package/dist/assets/kanban-definition-K7BYSVSG-DzEN7THt.js +89 -0
  110. package/dist/assets/katex-B1X10hvy.js +261 -0
  111. package/dist/assets/layout-C0dvb42R.js +1 -0
  112. package/dist/assets/linear-j4a8mGj7.js +1 -0
  113. package/dist/assets/mindmap-definition-YRQLILUH-DP8iEuCf.js +68 -0
  114. package/dist/assets/ordinal-Cboi1Yqb.js +1 -0
  115. package/dist/assets/pieDiagram-SKSYHLDU-BpIAXgAm.js +30 -0
  116. package/dist/assets/quadrantDiagram-337W2JSQ-DrpXn5Eg.js +7 -0
  117. package/dist/assets/requirementDiagram-Z7DCOOCP-Bg7EwHlG.js +73 -0
  118. package/dist/assets/sankeyDiagram-WA2Y5GQK-BWagRs1F.js +10 -0
  119. package/dist/assets/sequenceDiagram-2WXFIKYE-q5jwhivG.js +145 -0
  120. package/dist/assets/stateDiagram-RAJIS63D-B_J9pE-2.js +1 -0
  121. package/dist/assets/stateDiagram-v2-FVOUBMTO-Q_1GcybB.js +1 -0
  122. package/dist/assets/timeline-definition-YZTLITO2-dv0jgQ0z.js +61 -0
  123. package/dist/assets/treemap-KZPCXAKY-Dt1dkIE7.js +162 -0
  124. package/dist/assets/vennDiagram-LZ73GAT5-BdO5RgRZ.js +34 -0
  125. package/dist/assets/xychartDiagram-JWTSCODW-CpDVe-8v.js +7 -0
  126. package/dist/index.html +23 -0
  127. package/export-docx.js +1583 -0
  128. package/init.js +353 -0
  129. package/manifest-cli.mjs +207 -0
  130. package/package.json +83 -0
  131. package/schemas/artifact.schema.json +1295 -0
  132. package/schemas/artifact.uischema.json +65 -0
  133. package/schemas/evaluation.schema.json +284 -0
  134. package/schemas/rubric.schema.json +383 -0
  135. package/serve.js +238 -0
@@ -0,0 +1,163 @@ package/assets/init/.agents/skills/earos-review/references/challenge-patterns.md
# Challenge Patterns — EAROS Review

This file describes the five systemic failure modes in EAROS evaluations and how to detect each one. Read this before running the evidence audit (Phase 2).

---

## Why Systematic Patterns Matter

Individual scoring errors are expected and easy to catch — a score that doesn't match the level descriptor. Systematic patterns are harder to spot: they make the evaluation internally consistent (all scores hang together) while being consistently wrong. A generosity-biased evaluator produces scores that each seem plausible in isolation; the problem only surfaces when you apply level descriptors strictly across the whole set.

Knowing the five patterns lets you detect them quickly rather than reviewing every criterion with equal effort.

---

## Failure Mode 1 — Optimistic Evidence Classification

**What it looks like:** The evaluator marks `judgment_type: observed` but the excerpt is a paraphrase, interpretation, or inference — not a direct quote or clearly stated fact.

**Why it happens:** Evaluators unconsciously promote their interpretations to `observed` status to feel more confident. The distinction matters: `observed` evidence is more credible and defensible in governance contexts. Misclassifying `inferred` as `observed` overstates the artifact's quality.

**How to detect:**
- For each `observed` criterion, ask: "Could a skeptic argue this is an interpretation rather than a direct statement?"
- Check whether the excerpt uses quotation marks (direct quote) or paraphrase language ("the section suggests...", "it appears that...")
- Rule of thumb: `observed` + score 3 or 4 means the artifact explicitly and directly makes a strong claim. If it doesn't, the classification is wrong.

**Good example (legitimately `observed`):**
```yaml
criterion_id: SCP-01
score: 3
judgment_type: observed
excerpt: >
  "In scope: Payments service, Notification service, upstream Banking Core API.
  Out of scope: Authentication (handled by IAM platform), analytics pipeline."
```
This is a direct quote with named elements — clearly `observed`.

**Bad example (should be `inferred`):**
```yaml
criterion_id: SCP-01
score: 3
judgment_type: observed
excerpt: "The document clearly defines scope boundaries across all relevant components."
rationale: "Scope is comprehensively covered."
```
The excerpt is a generalization, not a quote. This should be `inferred` at most, and the score should probably be 2.
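The detection bullets above can be approximated mechanically. A minimal sketch, assuming the excerpt is a plain string (the function name and marker list are illustrative, not part of EAROS):

```python
import re

# Hedging or evaluative words that signal a paraphrase rather than a quote.
PARAPHRASE_MARKERS = ("suggests", "appears", "seems", "clearly", "comprehensively")

def looks_observed(excerpt: str) -> bool:
    """Heuristic: an `observed` excerpt should read like a direct quote,
    not an evaluator's summary of the artifact."""
    quoted = bool(re.search(r'["\'].+["\']', excerpt))  # contains quoted material
    paraphrased = any(m in excerpt.lower() for m in PARAPHRASE_MARKERS)
    return quoted and not paraphrased

# The good example above passes; the bad example is flagged for review.
good = '"In scope: Payments service, Notification service, upstream Banking Core API."'
bad = "The document clearly defines scope boundaries across all relevant components."
```

A failing check does not prove misclassification; it marks the criterion for the skeptic question in the first bullet.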

---

## Failure Mode 2 — Generosity Bias

**What it looks like:** Scores of 3 where 2 is more accurate; consistent benefit-of-the-doubt across multiple criteria.

**Why it happens:** Evaluators interpret "this section exists" as "this criterion is addressed." The EAROS 0–4 scale requires progressively stronger evidence for higher scores — existence alone is typically score 2 ("present but incomplete"). Score 3 requires "clearly addressed with adequate evidence."

**How to detect:**
- For every score of 3: "Does the level descriptor for '3' describe what's in the artifact, or what a good version of it would look like?"
- Check the `scoring_guide` level 2 and 3 descriptors — the boundary is usually between "present but incomplete" (2) and "clearly addressed with adequate evidence" (3)
- Pattern check: if 60%+ of criteria score 3, generosity bias is likely

**Score 2 vs. 3 example using STK-01:**

| Score | What the artifact shows |
|-------|------------------------|
| 2 | "Technical stakeholders and business owner listed." — Explicit but incomplete (concerns not mapped) |
| 3 | "Stakeholders listed with their primary concerns mapped to each section." — Explicit and mostly complete |

**Challenge question:** "If I applied the level descriptors strictly, ignoring how well-written the artifact is, what score would this be?"
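The 60% pattern check is simple to automate. A sketch, assuming scores are plain integers (the function name and threshold default are taken from this file, nothing else is):

```python
from collections import Counter

def generosity_flag(scores: list[int], threshold: float = 0.60) -> bool:
    """Flag likely generosity bias when the share of 3s exceeds the threshold."""
    if not scores:
        return False
    return Counter(scores)[3] / len(scores) > threshold

# Seven of ten criteria at score 3 → 70% > 60%, so the pattern check fires.
```

A fired flag is a prompt to re-apply the level descriptors strictly, not proof of bias on its own.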

---

## Failure Mode 3 — Missing Evidence Anchors

**What it looks like:** Rationale cites general impressions ("The architecture appears well-structured for...") rather than specific locations ("Section 3.2 states...").

**Why it happens:** Evaluators write rationale from memory of the artifact rather than from specific citations. The result is unverifiable — a reviewer cannot check the claim against the artifact.

**How to detect:**
- For each criterion: "Can I locate the specific evidence in the artifact from what the rationale says?"
- Flag vague `evidence_refs.location`: "Section 3", "Throughout the document", "Various sections"
- Flag rationale using evaluative language without quotes: "appears", "seems", "comprehensive", "well-structured" without a cited excerpt

**Actionable challenge:** "The rationale cites 'Section 3' — which subsection? What does it say? The evidence anchor must be specific enough that an independent reviewer can find it."

**Good anchor:** `"Section 2.3 Scope — page 7: 'In scope: Payments service...'"`
**Bad anchor:** `"Section 2 contains scope information"`

---

## Failure Mode 4 — Gate Blindness

**What it looks like:** A gate criterion fails (score below threshold) but is not listed in `gate_failures`, or is listed but the status doesn't reflect the correct effect.

**Why it happens:** Evaluators compute the weighted average and set status from it, forgetting to check gates first. Or they note the low score in the criterion result but don't escalate it to a gate failure.

**How to detect:**

Step 1: For every criterion with `gate.enabled: true`, read the `failure_effect` field.
Step 2: Check the criterion score against the gate threshold (typically: any `critical` gate fails if score < 2; `major` gates flag if score < 2).
Step 3: If failed → verify it appears in `gate_failures`.
Step 4: Verify the status matches the gate effect:
- Any `critical` gate failure → status MUST be `reject`
- Any `major` gate failure → status CANNOT be `pass` (must be `conditional_pass` at best)

**Common scenario:**
```yaml
# CMP-01 has gate.severity: critical
# Evaluation shows CMP-01 score: 1
# gate_failures: [] ← ERROR
# status: conditional_pass ← SHOULD BE: reject
```

**Flag format:** `[CRITICAL] Gate missed: CMP-01 scored 1 (below critical threshold) but absent from gate_failures. Status must be 'reject', not 'conditional_pass'.`
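The four steps reduce to a scan over criterion results. A sketch; the dict shapes loosely mirror the YAML records in this file, and the function itself is illustrative rather than part of the skill:

```python
def check_gates(results, status):
    """Scan criterion results for gate breaches (score < 2) and return the
    breached criterion IDs plus the strictest status the gates allow."""
    failed = [r for r in results
              if r.get("gate", {}).get("enabled") and r["score"] < 2]
    failure_ids = [r["criterion_id"] for r in failed]
    severities = {r["gate"]["severity"] for r in failed}
    if "critical" in severities:
        status = "reject"                      # critical gate → must reject
    elif "major" in severities and status == "pass":
        status = "conditional_pass"            # major gate → pass not allowed
    return failure_ids, status

# The common scenario above: CMP-01 (a critical gate) scored 1, yet the record
# claimed conditional_pass with an empty gate_failures list.
results = [{"criterion_id": "CMP-01", "score": 1,
            "gate": {"enabled": True, "severity": "critical"}}]
failure_ids, status = check_gates(results, "conditional_pass")
```

Compare `failure_ids` against the record's `gate_failures` and `status` against the record's status; any mismatch is a gate-blindness flag.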

---

## Failure Mode 5 — Confidence Inflation

**What it looks like:** `confidence: high` on criteria where the evidence is thin, ambiguous, or heavily inferred.

**Why it matters:** Confidence labels inform human reviewers which agent scores to trust. Inflated confidence misdirects reviewers away from criteria that actually need human scrutiny.

**How to detect:**
- `judgment_type: inferred` + `confidence: high` is almost always wrong
- `evidence_sufficiency: partial` should have `confidence: medium` at most
- Gate criteria with `confidence: low` must be flagged for human review — check that they are

**Correct confidence mapping:**

| Evidence quality | Expected confidence |
|-----------------|---------------------|
| Direct quote, unambiguous level match | high |
| Paraphrase or reasonable inference, clear level match | medium |
| Thin/ambiguous evidence, or heavy inference | low |
| No evidence found (score 0 or N/A) | high (absence is certain) |
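The mapping can be read as a ceiling on claimed confidence. A simplified sketch (field values follow the evaluation records used in this file; the function names and the collapsing of "thin/ambiguous" into the `partial` case are assumptions):

```python
def expected_confidence(judgment_type: str, evidence_sufficiency: str, score) -> str:
    """Ceiling on confidence implied by the mapping above."""
    if score in (0, "N/A"):
        return "high"                  # absence of evidence is certain
    if evidence_sufficiency == "partial":
        return "medium"                # partial evidence caps at medium
    if judgment_type == "inferred":
        return "medium"                # inference caps at medium
    return "high"

def inflated(claimed: str, judgment_type: str, sufficiency: str, score) -> bool:
    """True when the claimed confidence exceeds the evidence-implied ceiling."""
    order = {"low": 0, "medium": 1, "high": 2}
    return order[claimed] > order[expected_confidence(judgment_type, sufficiency, score)]
```

Anything `inflated` returns `True` for is a candidate for the confidence-inflation flag; downgrading to `low` for genuinely thin evidence remains a judgment call.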

---

## Score Calibration Reference {#score-calibration}

When challenging a specific score, use this decision process:

1. Read the rubric's `scoring_guide` for the criterion — what does each level say?
2. Read the `decision_tree` — what observable conditions produce each score?
3. Apply the decision tree to the evidence cited in the evaluation record
4. If your result differs from the primary score by ≥ 1: flag as a challenge
5. If your result differs by ≥ 2: flag as a critical challenge (may affect status)

**The critical boundary is 2.0:**
- Dimensions scoring < 2.0 prevent Pass status
- A criterion at a `major` gate scoring < 2 triggers a gate failure
- Re-examine any criterion sitting at exactly 2.0 — this is where generosity bias appears most frequently

---

## Pattern Summary Table

| Failure Mode | Key Signal | Detection Method |
|--------------|-----------|-----------------|
| Optimistic evidence classification | `observed` on paraphrased content | Can I find a direct quote? |
| Generosity bias | 60%+ criteria at score 3 | Strict level descriptor check |
| Missing evidence anchors | Vague location, evaluative language | Can I find this in the artifact from this anchor? |
| Gate blindness | Low gate criteria not in `gate_failures` | Systematic gate threshold scan |
| Confidence inflation | `inferred` + `confidence: high` | Evidence quality vs. confidence mapping |
@@ -0,0 +1,180 @@ package/assets/init/.agents/skills/earos-review/references/output-template.md
# Challenger Report Template — EAROS Review

This file contains the full output format for the challenger report. Read this before writing the report (Phase 4).

---

## Why This Format

The challenger report must serve two audiences:

1. **The primary evaluator** — who needs specific, actionable feedback on which scores to revise and why
2. **The governance reviewer** — who needs to know whether to accept the evaluation as-is, accept it conditional on fixes, or reject it for re-scoring

The format separates these concerns: the summary and critical findings tell the governance reviewer what to do; the criterion-by-criterion section tells the evaluator what to fix.

---

## Full Challenger Report Template

```markdown
# EAROS Challenger Report

**Evaluation ID:** [from evaluation record]
**Artifact:** [artifact title from record]
**Primary Evaluator:** [evaluators from record]
**Primary Status:** [status from record]
**Primary Score:** [overall_score from record]
**Challenger Status:** [your determination, or "Concur"]
**Challenger Score:** [your weighted average if different, or "Concur"]
**Review Date:** [today]

---

## Structural Issues

[List schema errors found in Phase 1, each as:]
**[SCHEMA ERROR]** [field path] — [description of issue]

[Or: "No structural issues found."]

---

## Challenge Summary

| | Count |
|---|---|
| Criteria reviewed | [N] |
| Criteria agreed | [N] |
| Criteria challenged | [N] |
| — Over-scored | [N] |
| — Under-scored | [N] |
| — Evidence quality issues | [N] |
| Gate errors | [N] |

**Overall verdict:** [Accept as-is | Accept with noted reservations | Reject — requires re-scoring | Escalate to human reviewer]

---

## Critical Findings

[Challenges that materially affect the evaluation status — list these first]

**[CRITICAL]** [criterion_id]: [description of the critical finding]

> **Impact:** [what changes if this is corrected — e.g., "Gate failure missed; status should be 'reject' not 'conditional_pass'"]

> **Required action:** [what the primary evaluator must do]

[Or: "No critical findings that affect evaluation status."]

---

## Systemic Patterns Detected

[One paragraph per pattern detected, or "No systemic patterns detected."]

**[Pattern name]** — [description of where it appears and what the effect is]

---

## Criterion-by-Criterion Verdicts

| Criterion | Primary Score | Verdict | Challenger Score | Issue Type |
|-----------|---------------|---------|-----------------|------------|
| [ID] | [score] | Agree | — | none |
| [ID] | [score] | Disagree | [score] | [issue type] |

---

## Detailed Challenge Notes

[For each Disagree or Partial verdict:]

### [criterion_id]: [criterion question]

**Primary score:** [score] | **Challenger score:** [score] | **Issue:** [type]

**What the primary evaluation claimed:**
> "[excerpt from the evaluation record's rationale]"

**What the rubric requires at score [primary_score]:**
> "[level descriptor from scoring_guide]"

**What the artifact actually contains:**
> "[your finding from the artifact]"

**Why this is wrong:**
[1–3 sentences citing the specific mismatch between the evidence and the level descriptor]

**Correct score:** [score] — [1 sentence justification citing the level descriptor]

---

## Recommendation

**[Choose one:]**

- **Accept as-is** — All scores are supported by evidence. No gate errors. Challenger concurs with primary evaluation.
- **Accept with noted reservations** — [N] minor scoring discrepancies noted but none affect the evaluation status. Reservations listed above.
- **Reject — requires re-scoring** — [N] criteria require correction. Specifically: [list criterion IDs with critical issues]. Re-evaluation required before governance use.
- **Escalate to human reviewer** — [N] criteria have conflicting evidence or scope ambiguity requiring domain expertise to resolve.
```

---

## Field Guidance

### Challenger Score {#challenger-score}

Compute the challenger overall score only if you have challenged enough criteria to change the weighted average. The formula is the same as the primary evaluation:

```
challenger_score = sum(revised_dimension_score × weight) / sum(dimension_weights)
```

Where `revised_dimension_score` is the average of your challenger scores for criteria in that dimension (replacing only the criteria you challenged; retaining primary scores for agreed criteria).

If only 1–2 criteria are challenged and the overall score doesn't materially change, write "Concur" for the challenger score.
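A sketch of the formula, assuming a minimal dict shape for dimensions (the shape is illustrative; the real evaluation schema differs):

```python
def challenger_score(dimensions, challenged):
    """Weighted average with challenged criterion scores substituted in.
    `dimensions`: {name: {"weight": w, "criteria": {criterion_id: primary_score}}}
    `challenged`: {criterion_id: challenger_score} for disputed criteria only."""
    total = weight_sum = 0.0
    for dim in dimensions.values():
        # Replace only challenged criteria; keep primary scores for the rest.
        scores = [challenged.get(cid, s) for cid, s in dim["criteria"].items()]
        total += (sum(scores) / len(scores)) * dim["weight"]
        weight_sum += dim["weight"]
    return total / weight_sum

dims = {
    "scope": {"weight": 2, "criteria": {"SCP-01": 3, "SCP-02": 3}},
    "risk":  {"weight": 1, "criteria": {"RAT-01": 3}},
}
# Challenging SCP-01 down to 1 drops the scope average to 2.0:
# (2.0 × 2 + 3.0 × 1) / 3 ≈ 2.33
score = challenger_score(dims, {"SCP-01": 1})
```

If the result rounds to the primary score, that is the "doesn't materially change" case where "Concur" is the right entry.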

### Verdict vs. Recommendation

- **Verdict** (per criterion): agree / disagree / partial — whether the specific score is correct
- **Recommendation** (overall): what to do with the evaluation — accept / accept with reservations / reject / escalate

These are different judgments. You can agree with 90% of scores and still recommend rejection if the 10% include a missed critical gate failure.

### Issue Types

| Issue Type | Meaning |
|------------|---------|
| `over_scored` | Score is higher than the evidence supports |
| `under_scored` | Score is lower than the evidence supports |
| `evidence_unsupported` | The cited evidence does not match the rationale claim |
| `wrong_evidence_class` | Classified `observed` when evidence is actually `inferred` or `external` |
| `gate_missed` | Gate threshold breached but not listed in `gate_failures` |
| `none` | No issue — agree with primary evaluation |

### Critical Findings Section

A finding is "critical" if correcting it would change the evaluation status. Examples:
- A missed `critical` gate failure (status should be `reject` not `conditional_pass`)
- A missed `major` gate failure (status should be `conditional_pass` not `pass`)
- Multiple over-scores that bring the overall average below 3.2 (changes `pass` to `conditional_pass`)

Non-critical findings still appear in the criterion detail section but not in Critical Findings.
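The status-change test can be sketched as follows. The 3.2 pass threshold and the gate effects are taken from this document; everything else (the names, and collapsing all sub-threshold averages to `conditional_pass`) is a simplifying assumption:

```python
def status_after_corrections(avg: float, gate_severities: set) -> str:
    """Status implied by the corrected scores and any confirmed gate failures."""
    if "critical" in gate_severities:
        return "reject"
    if "major" in gate_severities or avg < 3.2:
        return "conditional_pass"
    return "pass"

def is_critical(primary_status: str, avg: float, gate_severities: set) -> bool:
    """A finding is critical when the corrected status differs from the primary one."""
    return status_after_corrections(avg, gate_severities) != primary_status
```

For example, over-scores that pull a corrected average from 3.4 down to 3.1 flip `pass` to `conditional_pass`, so the finding belongs in Critical Findings.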

---

## Examples of Good vs. Bad Challenge Notes

**Good challenge note** (specific, cites level descriptor):
> **What the primary claimed:** "The architecture addresses compliance through GDPR and ISO 27001 requirements."
> **What the rubric requires at score 3:** "Specific controls mapped to specific design elements with named exceptions."
> **What the artifact contains:** Section 6 mentions "GDPR applies" and "ISO 27001 compliant" with no control-to-design mapping found in full document review.
> **Correct score:** 1 — The criterion scoring_guide level 1 states "compliance mentioned without control mapping." The primary score of 3 requires explicit control-to-design mapping which is absent.

**Bad challenge note** (vague, no level descriptor reference):
> "The compliance section seems insufficient. I would score this lower."

This is not a valid challenge — it doesn't cite the level descriptor, doesn't reference specific evidence in the artifact, and gives no guidance on what to fix.
@@ -0,0 +1,177 @@ package/assets/init/.agents/skills/earos-template-fill/SKILL.md
---
name: earos-template-fill
description: "Guide an artifact author through writing an EAROS-ready document. Use this skill when someone is writing or improving an architecture artifact and wants help making it pass review. Triggers on \"help me write this architecture\", \"guide me through the template\", \"what should I include\", \"how do I write a good solution architecture\", \"fill in this template\", \"make this artifact EAROS-ready\", \"what does EAROS need from this section\", \"how do I improve this before review\", \"what will this score\", \"will this pass\", \"what's missing from my architecture document\", \"help me write an ADR\", \"what sections do I need\", or any request for writing guidance on an architecture document before assessment. This skill coaches authors; earos-assess evaluates completed artifacts."
---

# EAROS Template Fill Skill

You are an architecture writing coach. Your job is to help authors write architecture artifacts that will score well in EAROS assessment — not by gaming the rubric, but by addressing the real quality concerns the rubric encodes.

**Why this matters:** The most common reason artifacts fail EAROS review is not bad architecture — it is content gaps that prevent assessors from finding the evidence they need. An author who knows the rubric criteria in advance can write to satisfy them explicitly, rather than hoping assessors will infer the right things from well-organised prose.

The rubric is not the enemy. Every criterion it encodes reflects a real quality concern. A risk section that lacks owners isn't just "incomplete" — it means no one is accountable when the risk materialises. A scope section without assumptions isn't just "thin" — it means the reviewer can't tell what the design is contingent on.

---

## Step 0 — Identify Artifact Type and Load Rubric

Read these before giving any guidance:
1. `core/core-meta-rubric.yaml` — the universal criteria every artifact must address
2. The matching profile:
   - Solution architecture → `profiles/solution-architecture.yaml`
   - Reference architecture → `profiles/reference-architecture.yaml`
   - ADR → `profiles/adr.yaml`
   - Capability map → `profiles/capability-map.yaml`
   - Roadmap → `profiles/roadmap.yaml`
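The routing above can be sketched as a lookup. The profile paths are the ones listed; the keyword matching itself is an illustrative heuristic, not part of the skill:

```python
# Keyword → profile routing; keys are lowercase phrases to look for in the
# user's stated artifact type.
PROFILE_BY_TYPE = {
    "solution architecture":  "profiles/solution-architecture.yaml",
    "reference architecture": "profiles/reference-architecture.yaml",
    "adr":                    "profiles/adr.yaml",
    "capability map":         "profiles/capability-map.yaml",
    "roadmap":                "profiles/roadmap.yaml",
}

def select_profile(artifact_type: str):
    """Return the profile path for a stated artifact type, or None when the
    type is unclear and the clarifying question should be asked instead."""
    key = artifact_type.strip().lower()
    return next((path for t, path in PROFILE_BY_TYPE.items() if t in key), None)
```

A `None` result corresponds to the clarifying question in the next paragraph.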

If the artifact type is unclear, ask: "What type of architecture document are you writing?"

If the user has a draft, read it — you need to know what's already there before advising what's missing.

Tell the user: "I'm going to guide you through the EAROS criteria for a [artifact type]. I'll flag which sections are gates (failing them prevents a Pass regardless of everything else), what strong evidence looks like, and where most authors lose points."

---

## Step 1 — Completeness Pre-Check

If the user has a draft, run a rapid scan and present this table:

| Section | Present? | Notes |
|---------|----------|-------|
| Title and version | | |
| Named owner/author | | |
| Purpose and scope | | |
| Stakeholder list | | |
| Architecture content (diagrams, views) | | |
| Risks/assumptions/constraints | | |
| Compliance/standards references | | |
| Actions and decisions | | |
| Change history | | |

Identify critical gaps, especially those that map to gate criteria.
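A sketch of such a rapid scan over markdown headings (the section keywords are illustrative, and a real draft would need fuzzier matching than substring checks):

```python
import re

# Section → heading keywords to look for in the draft.
EXPECTED_SECTIONS = {
    "Purpose and scope": ["purpose", "scope"],
    "Stakeholder list": ["stakeholder"],
    "Risks/assumptions/constraints": ["risk", "assumption", "constraint"],
    "Change history": ["change history", "revision"],
}

def precheck(draft: str) -> dict:
    """Presence scan: collect markdown headings, then check each expected
    section's keywords against them."""
    headings = " ".join(m.lower() for m in re.findall(r"^#+\s+(.*)$", draft, re.M))
    return {section: any(k in headings for k in keywords)
            for section, keywords in EXPECTED_SECTIONS.items()}

draft = "# Payments Platform\n## Purpose and Scope\n## Risks and Assumptions\n"
```

The resulting dict fills the Present? column; absent rows are the gaps to raise first.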

> **To see which sections map to which EAROS criteria and gate types**, read `references/section-rubric-mapping.md`.

---

## Step 2 — Section-by-Section Guidance

Walk the user through each criterion in the loaded rubric. Format for each criterion:

**[Criterion ID] — [criterion question]**

> **Why this matters:** [1–2 sentences on the real quality concern this criterion encodes — explain the consequence of getting it wrong, not just what to include]

> **⚠️ GATE** (if `gate.enabled: true`):
> - `major`: "Scoring below 2 here prevents a Pass status."
> - `critical`: "Being absent or failing here triggers an automatic Reject regardless of all other scores."

> **What you need:** [from `required_evidence` in the rubric]

> **Strong evidence looks like:** [from `examples.good` in the rubric]

> **Common mistakes:** [from `anti_patterns`]

> **Prompt:** "Does your draft include [specific thing]? Paste the relevant section and I'll check it, or tell me if it's missing and I'll help you draft it."

Process core criteria first (STK-01, STK-02, SCP-01, CVP-01, TRC-01, CON-01, RAT-01, CMP-01, ACT-01, MNT-01), then profile-specific criteria in dimension order.

> **For section-to-criterion mappings and score 2 vs. 3 boundaries**, read `references/section-rubric-mapping.md`. For writing patterns with good/bad examples, read `references/evidence-writing-guide.md`.

---

## Step 3 — Section Drafting Help

When the user provides content or asks for help drafting:

1. Identify which EAROS criteria the content addresses
2. Estimate what score it would get against the rubric level descriptors
3. Suggest specific improvements using `remediation_hints` and `scoring_guide` from the rubric
4. For gate criteria, be explicit: "This section maps to [criterion ID], which is a [major/critical] gate. Here is exactly what's needed to clear it."
91
+
92
+ Be concrete, not vague:
93
+ - ❌ "Add more detail about risks"
94
+ - ✅ "Add a risk table with columns: Risk, Likelihood, Impact, Mitigation, Owner, Residual Risk. For a score of 3, include at least 3 specific named risks with mitigations and owners — not 'TBD'."
95
+
96
+ > **For detailed writing patterns with good/bad examples for each section type**, read `references/evidence-writing-guide.md`.
97
+
98
+ ---
99
+
100
+ ## Step 4 — Pre-Submission Checklist
101
+
102
+ Before the user submits, run through this checklist:
103
+
104
+ ```
105
+ EAROS Pre-Submission Checklist
106
+ ================================
107
+ Core criteria:
108
+ [ ] STK-01: Named stakeholders with specific concerns stated
109
+ [ ] SCP-01: Explicit scope, out-of-scope list, assumptions, constraints <- GATE
110
+ [ ] CVP-01: Views chosen for stated stakeholder concerns
111
+ [ ] TRC-01: Architecture decisions traceable to business drivers
112
+ [ ] CON-01: Consistent terminology across all sections and diagrams
113
+ [ ] RAT-01: Risk table with mitigations and owners <- GATE
114
+ [ ] CMP-01: Named controls mapped to design elements <- GATE
115
+ [ ] ACT-01: Decision statement and named actions with owners
116
+ [ ] MNT-01: Named owner, version, last-updated date
117
+
118
+ Profile criteria:
119
+ [Add profile-specific criteria from the loaded profile, flagging gates]
120
+
121
+ Gate summary:
122
+ [ ] No critical gate criteria are empty or failed
123
+ [ ] No major gate criteria are likely below score 2
124
+
125
+ Evidence readiness:
126
+ [ ] Every significant claim is stated explicitly (not implied)
127
+ [ ] All components have consistent names across all diagrams
128
+ [ ] All diagrams have legends or annotations
129
+ ```
130
+
131
+ For any unchecked items, offer to help draft the missing content.
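
If the loaded profile exposes its criteria as structured data, the profile-specific block of the checklist can be generated rather than hand-maintained. A minimal sketch, assuming each criterion is available as an `(id, summary, gate)` tuple (these names are illustrative, not the EAROS schema):

```python
def checklist_lines(criteria):
    """Render pre-submission checklist entries, flagging gate criteria.

    `criteria` is an iterable of (criterion_id, summary, gate) tuples;
    gate is None for non-gate criteria, else "major" or "critical".
    """
    lines = []
    for cid, summary, gate in criteria:
        flag = " <- GATE" if gate else ""
        lines.append(f"[ ] {cid}: {summary}{flag}")
    return lines

profile = [
    ("SCP-01", "Explicit scope, out-of-scope list, assumptions, constraints", "major"),
    ("STK-01", "Named stakeholders with specific concerns stated", None),
]
for line in checklist_lines(profile):
    print(line)
```

This keeps the checklist in lockstep with the loaded profile, so newly added gate criteria cannot silently go unchecked.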

---

## Step 5 — Score Estimate

After reviewing the draft, provide an estimated score:

```
Estimated EAROS Score
======================
Criterion | Est. Score | Confidence | Gap
STK-01 | 3 | Medium | Add concern-to-view mapping
SCP-01 | 2 | High | No assumptions listed -- GATE AT RISK
...

Overall estimate: ~[X.X]
Likely status: [Pass | Conditional Pass | Rework Required]

Top 3 improvements before submission:
1. [most impactful, specific action]
2. [second]
3. [third]
```
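
The gate logic behind the "Likely status" line can be made precise. A hedged sketch of one way to aggregate the estimate, where the numeric thresholds are illustrative assumptions and the loaded rubric's `scoring_guide` remains authoritative:

```python
def estimate_status(estimates, pass_threshold=2.5, conditional_threshold=2.0):
    """Aggregate per-criterion estimates into (overall, likely_status).

    `estimates` is a list of (criterion_id, score, gate) tuples, with score
    on the 0-3 rubric scale and gate one of None, "major", "critical".
    Mirrors the gate rules above: a failed or absent critical gate forces
    Rework Required (the automatic-Reject case), and a major gate below 2
    prevents a Pass. The thresholds are assumptions, not part of EAROS.
    """
    overall = sum(score for _, score, _ in estimates) / len(estimates)
    if any(gate == "critical" and score == 0 for _, score, gate in estimates):
        return overall, "Rework Required"
    major_at_risk = any(gate == "major" and score < 2 for _, score, gate in estimates)
    if overall >= pass_threshold and not major_at_risk:
        return overall, "Pass"
    if overall >= conditional_threshold:
        return overall, "Conditional Pass"
    return overall, "Rework Required"
```

For example, `estimate_status([("STK-01", 3, None), ("SCP-01", 2, "major")])` yields a Pass at 2.5, while dropping SCP-01 to 1 caps the outcome at Conditional Pass even though the average stays at 2.0.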

---

## Non-Negotiable Rules

1. **Never compromise rigor for politeness.** If a gate criterion is empty, say so directly: "This is a critical gate — submitting without it will result in an automatic Reject."
2. **Reference actual rubric criteria.** Every suggestion must be anchored to a criterion ID and level descriptor.
3. **Distinguish gate from non-gate.** Clearly communicate which gaps are fatal vs. which reduce the score.
4. **Show examples, not descriptions.** Always show what strong evidence looks like (from `examples.good`) rather than just describing what to include.
5. **Three evaluation types are distinct.** Remind authors that artifact quality, architectural fitness, and governance fit are evaluated separately — a well-written document can still fail if the architecture it describes is unsound.

---

## When to Read Which Reference File

| When | Read |
|------|------|
| Mapping document sections to criteria | `references/section-rubric-mapping.md` |
| Explaining gate criteria and their thresholds | `references/section-rubric-mapping.md` |
| Providing writing examples (good and bad) | `references/evidence-writing-guide.md` |
| Helping draft a specific section | `references/evidence-writing-guide.md` |
| Explaining score 2 vs. 3 differences | `references/section-rubric-mapping.md` |
| Author asks "what does strong evidence look like?" | `references/evidence-writing-guide.md` |