@trohde/earos 1.0.0

This diff shows the content of publicly released package versions as they appear in the supported public registries. It is provided for informational purposes only.
Files changed (135)
  1. package/README.md +156 -0
  2. package/assets/init/.agents/skills/earos-artifact-gen/SKILL.md +106 -0
  3. package/assets/init/.agents/skills/earos-artifact-gen/references/interview-guide.md +313 -0
  4. package/assets/init/.agents/skills/earos-artifact-gen/references/output-guide.md +367 -0
  5. package/assets/init/.agents/skills/earos-assess/SKILL.md +212 -0
  6. package/assets/init/.agents/skills/earos-assess/references/calibration-benchmarks.md +160 -0
  7. package/assets/init/.agents/skills/earos-assess/references/output-templates.md +311 -0
  8. package/assets/init/.agents/skills/earos-assess/references/scoring-protocol.md +281 -0
  9. package/assets/init/.agents/skills/earos-calibrate/SKILL.md +153 -0
  10. package/assets/init/.agents/skills/earos-calibrate/references/agreement-metrics.md +188 -0
  11. package/assets/init/.agents/skills/earos-calibrate/references/calibration-protocol.md +263 -0
  12. package/assets/init/.agents/skills/earos-create/SKILL.md +257 -0
  13. package/assets/init/.agents/skills/earos-create/references/criterion-writing-guide.md +268 -0
  14. package/assets/init/.agents/skills/earos-create/references/dependency-rules.md +193 -0
  15. package/assets/init/.agents/skills/earos-create/references/rubric-interview-guide.md +123 -0
  16. package/assets/init/.agents/skills/earos-create/references/validation-checklist.md +238 -0
  17. package/assets/init/.agents/skills/earos-profile-author/SKILL.md +251 -0
  18. package/assets/init/.agents/skills/earos-profile-author/references/criterion-writing-guide.md +280 -0
  19. package/assets/init/.agents/skills/earos-profile-author/references/design-methods.md +158 -0
  20. package/assets/init/.agents/skills/earos-profile-author/references/profile-checklist.md +173 -0
  21. package/assets/init/.agents/skills/earos-remediate/SKILL.md +118 -0
  22. package/assets/init/.agents/skills/earos-remediate/references/output-template.md +199 -0
  23. package/assets/init/.agents/skills/earos-remediate/references/remediation-patterns.md +330 -0
  24. package/assets/init/.agents/skills/earos-report/SKILL.md +85 -0
  25. package/assets/init/.agents/skills/earos-report/references/portfolio-template.md +181 -0
  26. package/assets/init/.agents/skills/earos-report/references/single-artifact-template.md +168 -0
  27. package/assets/init/.agents/skills/earos-review/SKILL.md +130 -0
  28. package/assets/init/.agents/skills/earos-review/references/challenge-patterns.md +163 -0
  29. package/assets/init/.agents/skills/earos-review/references/output-template.md +180 -0
  30. package/assets/init/.agents/skills/earos-template-fill/SKILL.md +177 -0
  31. package/assets/init/.agents/skills/earos-template-fill/references/evidence-writing-guide.md +186 -0
  32. package/assets/init/.agents/skills/earos-template-fill/references/section-rubric-mapping.md +200 -0
  33. package/assets/init/.agents/skills/earos-validate/SKILL.md +113 -0
  34. package/assets/init/.agents/skills/earos-validate/references/fix-patterns.md +281 -0
  35. package/assets/init/.agents/skills/earos-validate/references/validation-checks.md +287 -0
  36. package/assets/init/.claude/CLAUDE.md +4 -0
  37. package/assets/init/AGENTS.md +293 -0
  38. package/assets/init/CLAUDE.md +635 -0
  39. package/assets/init/README.md +507 -0
  40. package/assets/init/calibration/gold-set/.gitkeep +0 -0
  41. package/assets/init/calibration/results/.gitkeep +0 -0
  42. package/assets/init/core/core-meta-rubric.yaml +643 -0
  43. package/assets/init/docs/consistency-report.md +325 -0
  44. package/assets/init/docs/getting-started.md +194 -0
  45. package/assets/init/docs/profile-authoring-guide.md +51 -0
  46. package/assets/init/docs/terminology.md +126 -0
  47. package/assets/init/earos.manifest.yaml +104 -0
  48. package/assets/init/evaluations/.gitkeep +0 -0
  49. package/assets/init/examples/aws-event-driven-order-processing/artifact.yaml +2056 -0
  50. package/assets/init/examples/aws-event-driven-order-processing/evaluation.yaml +973 -0
  51. package/assets/init/examples/aws-event-driven-order-processing/report.md +244 -0
  52. package/assets/init/examples/example-solution-architecture.evaluation.yaml +136 -0
  53. package/assets/init/examples/multi-cloud-data-analytics/artifact.yaml +715 -0
  54. package/assets/init/overlays/data-governance.yaml +94 -0
  55. package/assets/init/overlays/regulatory.yaml +154 -0
  56. package/assets/init/overlays/security.yaml +92 -0
  57. package/assets/init/profiles/adr.yaml +225 -0
  58. package/assets/init/profiles/capability-map.yaml +223 -0
  59. package/assets/init/profiles/reference-architecture.yaml +426 -0
  60. package/assets/init/profiles/roadmap.yaml +205 -0
  61. package/assets/init/profiles/solution-architecture.yaml +227 -0
  62. package/assets/init/research/architecture-assessment-rubrics-research.docx +0 -0
  63. package/assets/init/research/architecture-assessment-rubrics-research.md +566 -0
  64. package/assets/init/research/reference-architecture-research.md +751 -0
  65. package/assets/init/standard/EAROS.md +1426 -0
  66. package/assets/init/standard/schemas/artifact.schema.json +1295 -0
  67. package/assets/init/standard/schemas/artifact.uischema.json +65 -0
  68. package/assets/init/standard/schemas/evaluation.schema.json +284 -0
  69. package/assets/init/standard/schemas/rubric.schema.json +383 -0
  70. package/assets/init/templates/evaluation-record.template.yaml +58 -0
  71. package/assets/init/templates/new-profile.template.yaml +65 -0
  72. package/bin.js +188 -0
  73. package/dist/assets/_basePickBy-BVu6YmSW.js +1 -0
  74. package/dist/assets/_baseUniq-CWRzQDz_.js +1 -0
  75. package/dist/assets/arc-CyDBhtDM.js +1 -0
  76. package/dist/assets/architectureDiagram-2XIMDMQ5-BH6O4dvN.js +36 -0
  77. package/dist/assets/blockDiagram-WCTKOSBZ-2xmwdjpg.js +132 -0
  78. package/dist/assets/c4Diagram-IC4MRINW-BNmPRFJF.js +10 -0
  79. package/dist/assets/channel-CiySTNoJ.js +1 -0
  80. package/dist/assets/chunk-4BX2VUAB-DGQTvirp.js +1 -0
  81. package/dist/assets/chunk-55IACEB6-DNMAQAC_.js +1 -0
  82. package/dist/assets/chunk-FMBD7UC4-BJbVTQ5o.js +15 -0
  83. package/dist/assets/chunk-JSJVCQXG-BCxUL74A.js +1 -0
  84. package/dist/assets/chunk-KX2RTZJC-H7wWZOfz.js +1 -0
  85. package/dist/assets/chunk-NQ4KR5QH-BK4RlTQF.js +220 -0
  86. package/dist/assets/chunk-QZHKN3VN-0chxDV5g.js +1 -0
  87. package/dist/assets/chunk-WL4C6EOR-DexfQ-AV.js +189 -0
  88. package/dist/assets/classDiagram-VBA2DB6C-D7luWJQn.js +1 -0
  89. package/dist/assets/classDiagram-v2-RAHNMMFH-D7luWJQn.js +1 -0
  90. package/dist/assets/clone-ylgRbd3D.js +1 -0
  91. package/dist/assets/cose-bilkent-S5V4N54A-DS2IOCfZ.js +1 -0
  92. package/dist/assets/cytoscape.esm-CyJtwmzi.js +331 -0
  93. package/dist/assets/dagre-KLK3FWXG-BbSoTTa3.js +4 -0
  94. package/dist/assets/defaultLocale-DX6XiGOO.js +1 -0
  95. package/dist/assets/diagram-E7M64L7V-C9TvYgv0.js +24 -0
  96. package/dist/assets/diagram-IFDJBPK2-DowUMWrg.js +43 -0
  97. package/dist/assets/diagram-P4PSJMXO-BL6nrnQF.js +24 -0
  98. package/dist/assets/erDiagram-INFDFZHY-rXPRl8VM.js +70 -0
  99. package/dist/assets/flowDiagram-PKNHOUZH-DBRM99-W.js +162 -0
  100. package/dist/assets/ganttDiagram-A5KZAMGK-INcWFsBT.js +292 -0
  101. package/dist/assets/gitGraphDiagram-K3NZZRJ6-DMwpfE91.js +65 -0
  102. package/dist/assets/graph-DLQn37b-.js +1 -0
  103. package/dist/assets/index-BFFITMT8.js +650 -0
  104. package/dist/assets/index-H7f6VTz1.css +1 -0
  105. package/dist/assets/infoDiagram-LFFYTUFH-B0f4TWRM.js +2 -0
  106. package/dist/assets/init-Gi6I4Gst.js +1 -0
  107. package/dist/assets/ishikawaDiagram-PHBUUO56-CsU6XimZ.js +70 -0
  108. package/dist/assets/journeyDiagram-4ABVD52K-CQ7ibNib.js +139 -0
  109. package/dist/assets/kanban-definition-K7BYSVSG-DzEN7THt.js +89 -0
  110. package/dist/assets/katex-B1X10hvy.js +261 -0
  111. package/dist/assets/layout-C0dvb42R.js +1 -0
  112. package/dist/assets/linear-j4a8mGj7.js +1 -0
  113. package/dist/assets/mindmap-definition-YRQLILUH-DP8iEuCf.js +68 -0
  114. package/dist/assets/ordinal-Cboi1Yqb.js +1 -0
  115. package/dist/assets/pieDiagram-SKSYHLDU-BpIAXgAm.js +30 -0
  116. package/dist/assets/quadrantDiagram-337W2JSQ-DrpXn5Eg.js +7 -0
  117. package/dist/assets/requirementDiagram-Z7DCOOCP-Bg7EwHlG.js +73 -0
  118. package/dist/assets/sankeyDiagram-WA2Y5GQK-BWagRs1F.js +10 -0
  119. package/dist/assets/sequenceDiagram-2WXFIKYE-q5jwhivG.js +145 -0
  120. package/dist/assets/stateDiagram-RAJIS63D-B_J9pE-2.js +1 -0
  121. package/dist/assets/stateDiagram-v2-FVOUBMTO-Q_1GcybB.js +1 -0
  122. package/dist/assets/timeline-definition-YZTLITO2-dv0jgQ0z.js +61 -0
  123. package/dist/assets/treemap-KZPCXAKY-Dt1dkIE7.js +162 -0
  124. package/dist/assets/vennDiagram-LZ73GAT5-BdO5RgRZ.js +34 -0
  125. package/dist/assets/xychartDiagram-JWTSCODW-CpDVe-8v.js +7 -0
  126. package/dist/index.html +23 -0
  127. package/export-docx.js +1583 -0
  128. package/init.js +353 -0
  129. package/manifest-cli.mjs +207 -0
  130. package/package.json +83 -0
  131. package/schemas/artifact.schema.json +1295 -0
  132. package/schemas/artifact.uischema.json +65 -0
  133. package/schemas/evaluation.schema.json +284 -0
  134. package/schemas/rubric.schema.json +383 -0
  135. package/serve.js +238 -0
**`package/assets/init/.agents/skills/earos-calibrate/references/calibration-protocol.md`** (new file, 263 lines added)

# Calibration Protocol

Step-by-step procedure for running an EAROS calibration exercise. Follow this exactly — shortcuts produce unreliable results.

---

## Phase 1 — Setup

### 1.1 Identify the rubric being calibrated

Confirm:
- `rubric_id` (e.g., `EAROS-POSTMORTEM-001`)
- Current `status` (should be `draft` before calibration; changes to `candidate` after passing)
- Date of last calibration (if any)

### 1.2 Assemble the calibration artifact set

**Minimum set (3 artifacts):**

| Artifact | Expected score range | Purpose |
|----------|---------------------|---------|
| Strong | ≥ 3.2 overall | Confirms the rubric correctly identifies high-quality artifacts |
| Weak | < 2.4 overall | Confirms the rubric correctly identifies poor artifacts |
| Ambiguous | 2.4–3.2 overall | Tests boundary detection — the hardest cases |

**Recommended set (4 artifacts):**
Add a fourth: an artifact with a known gate failure. This verifies the gate logic and its effect on status.

**Requirements for calibration artifacts:**
- Real artifacts (not synthetic examples created to match the rubric)
- Representative of the artifact type in practice
- Diverse: different teams, different systems, different quality levels
- Stored in `calibration/gold-set/[rubric-id]/` with known scores recorded

### 1.3 Record known gold-set scores

For each artifact, the gold-set must contain:
```yaml
artifact_id: [ID]
artifact_title: [title]
known_status: [pass | conditional_pass | rework_required | reject]
known_overall_score: [X.X]
criterion_scores:
  - criterion_id: [ID]
    gold_score: [0-4 or N/A]
    gold_evidence_class: [observed | inferred | external]
    gold_rationale: "[why this score was assigned]"
```

Do NOT share these gold-set scores with the evaluator before they complete their independent assessment.
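A small sanity check on each gold-set record catches malformed entries before the exercise starts. A minimal Python sketch, assuming the record has already been parsed into a dict (e.g., by a YAML loader) with the field names from the template above:

```python
ALLOWED_STATUS = {"pass", "conditional_pass", "rework_required", "reject"}
ALLOWED_EVIDENCE = {"observed", "inferred", "external"}

def check_gold_entry(entry):
    """Return a list of problems with one gold-set record (empty list = usable)."""
    errors = []
    if entry.get("known_status") not in ALLOWED_STATUS:
        errors.append(f"known_status {entry.get('known_status')!r} is not a valid status")
    for cs in entry.get("criterion_scores", []):
        cid = cs.get("criterion_id", "?")
        score = cs.get("gold_score")
        if score != "N/A" and not (isinstance(score, int) and 0 <= score <= 4):
            errors.append(f"{cid}: gold_score must be an integer 0-4 or the string 'N/A'")
        if cs.get("gold_evidence_class") not in ALLOWED_EVIDENCE:
            errors.append(f"{cid}: gold_evidence_class must be observed, inferred, or external")
    return errors
```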
---

## Phase 2 — Independent Scoring

### 2.1 Read the rubric

Read the full rubric YAML for the profile being calibrated. Read `core/core-meta-rubric.yaml` as well — core criteria are always evaluated.

Do NOT read the gold-set scores. Close any document containing them.

### 2.2 Score each artifact independently

Follow the standard earos-assess 8-step DAG for each artifact:

```
structural_validation
→ content_extraction
→ criterion_scoring
→ cross_reference_validation
→ dimension_aggregation
→ challenge_pass
→ calibration
→ status_determination
```

For each criterion, record:
```yaml
criterion_id: [ID]
score: [0-4 or N/A]
evidence_anchor: "[direct quote or specific section reference]"
evidence_class: [observed | inferred | external]
confidence: [high | medium | low]
rationale: "[why this score was assigned — specific, not vague]"
```

**Critical rules during scoring:**
- Extract an evidence anchor BEFORE assigning a score (RULERS protocol)
- If no evidence can be found, record N/A and explain why — do not score from impression
- Do not look at the gold-set until scoring for all artifacts is complete

### 2.3 Complete the challenge_pass step

Before finalising scores, run the internal challenge:
- For each score of 3 or 4: "What specific evidence in the artifact earns this score?"
- If the answer is vague, revisit the scoring_guide and decision_tree
- Document the challenge outcome in the evaluation record

### 2.4 Determine status for each artifact

Apply the status thresholds:
1. Check gate failures first — any critical gate failure = reject
2. Check overall score: ≥ 3.2 = pass, 2.4–3.19 = conditional_pass, < 2.4 = rework_required
3. Check dimension floor: no dimension < 2.0 for a pass status

Record the determined status.
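The three threshold steps above can be sketched in Python. One assumption to flag: the protocol says a pass requires no dimension below 2.0 but does not state which status applies when that floor fails, so demoting to `conditional_pass` here is an illustrative choice:

```python
def determine_status(overall, dimension_scores, critical_gate_failed):
    """Apply the status thresholds in order: gates, overall score, dimension floor."""
    # 1. Any critical gate failure overrides everything else.
    if critical_gate_failed:
        return "reject"
    # 2. Overall score bands.
    if overall < 2.4:
        return "rework_required"
    status = "pass" if overall >= 3.2 else "conditional_pass"
    # 3. Dimension floor: a dimension under 2.0 blocks a pass.
    #    (Demotion to conditional_pass is an assumption, not stated in the protocol.)
    if status == "pass" and min(dimension_scores.values()) < 2.0:
        status = "conditional_pass"
    return status
```

For example, an artifact scoring 3.4 overall with one dimension at 1.8 would come out as `conditional_pass` rather than `pass`.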
---

## Phase 3 — Score Comparison

### 3.1 Access the gold-set scores

Read the gold-set scores only after completing your independent assessment of all artifacts.

### 3.2 Build the comparison table

For each artifact, for each criterion:
```
artifact_id | criterion_id | gold_score | agent_score | delta | agreement
[ID]        | [ID]         | [score]    | [score]     | [d]   | exact|within_1|disagreement
```

Where:
- `delta = gold_score - agent_score`
- Positive delta = agent under-scored (too harsh)
- Negative delta = agent over-scored (too generous)
- `agreement = exact` if delta = 0; `within_1` if |delta| = 1; `disagreement` if |delta| ≥ 2

### 3.3 Flag systematic bias

Compute mean delta across all criteria:
- Mean delta near 0 = no systematic bias
- Mean delta > +0.5 = agent consistently under-scores (too harsh)
- Mean delta < -0.5 = agent consistently over-scores (too generous)

Systematic bias is more serious than random disagreement — it indicates a calibration problem that will affect every future evaluation.
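The comparison and bias computations follow directly from the definitions above; a minimal sketch:

```python
def compare(gold, agent):
    """One comparison-table row: delta = gold_score - agent_score.
    Positive delta means the agent under-scored (too harsh)."""
    delta = gold - agent
    if delta == 0:
        agreement = "exact"
    elif abs(delta) == 1:
        agreement = "within_1"
    else:
        agreement = "disagreement"
    return {"delta": delta, "agreement": agreement}

def bias_flag(deltas, threshold=0.5):
    """Mean delta beyond +/-threshold signals systematic harshness or generosity."""
    mean = sum(deltas) / len(deltas)
    if mean > threshold:
        return mean, "under_scoring"    # consistently too harsh
    if mean < -threshold:
        return mean, "over_scoring"     # consistently too generous
    return mean, "no_systematic_bias"
```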
---

## Phase 4 — Metric Computation

Read `references/agreement-metrics.md` for computation procedures.

Compute and record:
1. Binary agreement (artifact status match rate)
2. Weighted kappa per criterion
3. Spearman ρ across artifact overall scores
4. Reliability flag per criterion: `reliable` | `moderate` | `unreliable`
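For reference, two of these metrics can be sketched in pure Python. These are minimal illustrations: N/A score pairs must be dropped before calling them, and the Spearman formula below assumes no tied values (a ties-aware implementation such as `scipy.stats.spearmanr` is preferable in practice):

```python
def weighted_kappa(gold, agent, k=5):
    """Quadratic-weighted Cohen's kappa for one criterion across artifacts.

    gold, agent: equal-length lists of integer scores 0..k-1.
    1.0 = perfect agreement, 0.0 = chance level, negative = worse than chance.
    """
    n = len(gold)
    observed = [[0.0] * k for _ in range(k)]
    for g, a in zip(gold, agent):
        observed[g][a] += 1
    gold_marginal = [sum(row) for row in observed]
    agent_marginal = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2   # quadratic disagreement weight
            num += w * observed[i][j]
            den += w * gold_marginal[i] * agent_marginal[j] / n
    return 1.0 if den == 0 else 1.0 - num / den

def spearman_rho(x, y):
    """Spearman rank correlation of two score lists (assumes no tied values)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```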
---

## Phase 5 — Root Cause Analysis

For each `disagreement` (|delta| ≥ 2) or `unreliable` criterion:

**Five root cause categories:**

1. **Ambiguous level descriptor**
   - Symptom: Both evaluators can justify their score from the current scoring_guide
   - Fix: Sharpen the level descriptors, especially at the 2/3 boundary
   - Example fix: Add a specific observable feature that distinguishes score 2 from score 3

2. **Missing or vague decision_tree**
   - Symptom: The decision_tree doesn't resolve the specific case that caused disagreement
   - Fix: Add an explicit branch for the ambiguous condition
   - Example fix: "IF X present but Y absent THEN score 2 (not 3)"

3. **Evidence classification disagreement**
   - Symptom: Evaluators agree on what the artifact says but classify differently (observed vs. inferred)
   - Fix: Add an `examples.good` entry clarifying what qualifies as `observed` for this criterion
   - Example fix: "Quote must include [specific field] to qualify as observed"

4. **Anti-pattern not captured**
   - Symptom: The artifact exhibits a form of failure the rubric doesn't explicitly call out
   - Fix: Add the new failure mode to `anti_patterns` and update the scoring_guide accordingly

5. **Context sensitivity**
   - Symptom: The criterion means something different for this artifact than the rubric anticipates
   - Fix: Add conditional guidance to the `description` or `decision_tree`
   - Example fix: "For artifacts covering only a single domain, N/A is acceptable for criterion X"

---

## Phase 6 — Calibration Report

Save results to: `calibration/results/[rubric-id]-calibration-[YYYY-MM-DD].yaml`

Report format:

```yaml
calibration_id: CAL-[RUBRIC-ID]-[YYYYMMDD]
rubric_id: [rubric ID]
calibration_date: [today]
artifacts_scored: [N]
evaluator: agent

summary_metrics:
  binary_agreement: "[X%]"
  mean_delta: [X.X]
  spearman_rho: [X.XX]
  overall_verdict: [pass_for_production | borderline | not_ready]

criterion_results:
  - criterion_id: [ID]
    gold_score: [score]
    agent_score: [score]
    delta: [delta]
    agreement: [exact | within_1 | disagreement]
    reliability_flag: [reliable | moderate | unreliable]
    root_cause: "[if disagreement — brief root cause analysis]"
    recommended_rubric_change: "[if unreliable — specific change]"

artifact_results:
  - artifact_id: [ID]
    gold_status: [status]
    agent_status: [status]
    gold_score: [X.X]
    agent_score: [X.X]
    status_match: [true | false]
    notes: "[any notable findings]"

recommendations:
  proceed_to_production: [true | false]
  blocking_issues:
    - "[issue preventing production use, if any]"
  rubric_improvements:
    - criterion_id: [ID]
      field_to_change: [scoring_guide | decision_tree | anti_patterns | examples]
      change_description: "[specific change]"
```

---

## Phase 7 — Post-Calibration Actions

**If verdict is `pass_for_production`:**
1. Change profile `status` from `draft` to `candidate`
2. Update `calibration_date` in the profile YAML
3. Schedule next calibration (6 months or at next major rubric revision)

**If verdict is `borderline`:**
1. Implement recommended rubric improvements
2. Re-run calibration on the same artifacts to verify improvement
3. Do not promote to `candidate` until re-calibration passes

**If verdict is `not_ready`:**
1. Analyse root causes for all unreliable criteria
2. Revise `scoring_guide`, `decision_tree`, and `examples` for each
3. Rebuild calibration artifact set if artifacts were unrepresentative
4. Re-run full calibration from Phase 2

---

## Recalibration Triggers

Recalibrate when any of these occur:
- A criterion's `scoring_guide` or `decision_tree` is materially changed
- A new overlay is applied alongside this profile
- Agreement drops below targets in a post-production review
- New artifact formats appear (new diagramming tools, new document templates)
- The agent model is updated and behaviour may differ
- Governance expectations change materially
**`package/assets/init/.agents/skills/earos-create/SKILL.md`** (new file, 257 lines added)

---
name: earos-create
description: "Create a new EAROS rubric — core rubric, artifact profile, or cross-cutting overlay. Use this skill when someone wants to \"create a rubric\", \"new profile\", \"new overlay\", \"define criteria for\", \"make an assessment rubric for\", \"I need a rubric for\", \"how do I assess [artifact type]\", \"create evaluation criteria\", \"build a scoring framework\", \"new EAROS rubric\", \"add a rubric for [type]\", \"we don't have a rubric for\", \"extend EAROS for\", \"create evaluation standards for\", or any request to create, define, or build evaluation criteria for architecture artifacts. Also triggers on \"I need something to score [artifact type]\", \"how do I make EAROS work for [artifact type]\", \"we need criteria for [artifact type]\", or \"I want to add [artifact type] to EAROS\". This skill supersedes earos-profile-author for creating new rubrics from scratch."
---

# EAROS Create Skill

You are an architecture governance consultant guiding the creation of a new EAROS rubric. This is a design process, not a form-filling exercise. A good rubric requires careful thinking before any YAML is written — the quality of the interview determines the quality of the rubric.

**The most common failure mode:** jumping to criteria before understanding what quality looks like for this artifact type. Resist that. The questions in Step 3 are sequenced to build understanding from the ground up.

---

## Step 0 — Load Reference Files

Before the interview begins, read:
1. `earos.manifest.yaml` (repo root) — the authoritative registry; lists all existing profiles and overlays with their paths, IDs, and statuses
2. `core/core-meta-rubric.yaml` — understand what the core already covers (never duplicate it)

Use the manifest to show the user what already exists during Step 2. Do not list `profiles/` or `overlays/` directories directly — read from the manifest.

> **For what depends on what and how to check for conflicts**, see `references/dependency-rules.md`.

---

## Step 1 — Detect the Rubric Type

Ask: **What do you want to create?**

Present three options with brief explanations:

| Type | Use when |
|------|----------|
| **Profile** | You're adding evaluation criteria for a new artifact type (post-mortems, data contracts, platform handover docs). The most common case. |
| **Overlay** | You're adding criteria that cut across multiple artifact types (AI governance, resilience, cost transparency). Applied by context, not artifact type. |
| **Core rubric** | You're replacing or extending the universal foundation. This affects every artifact type. Rare — usually a governance decision. |

If the user is unsure:
- "Is this something unique to a specific artifact type, or would it apply to many types?" → profile vs. overlay
- "Are you building on top of the existing core, or replacing its foundations?" → profile/overlay vs. core rubric

> **For the full profile-vs-overlay decision framework**, see `references/dependency-rules.md#profile-vs-overlay`.

---

## Step 2 — Check Dependencies

**For a core rubric:**
- No dependencies. Warn: changing the core affects all profiles and overlays.
- Ask: "Is the existing `EAROS-CORE-002` insufficient, or do you need a supplementary core for a specific domain? Modifying the core requires a governance decision."

**For a profile:**
- Confirm `core/core-meta-rubric.yaml` exists. If it doesn't: "No core rubric exists. Create one first, or proceed with a standalone profile?"
- Show existing profiles. Ask: "Does a profile for this artifact type already exist? Here's what we have: [list]. Is this a new type or a revision of an existing one?"

**For an overlay:**
- Show existing profiles and overlays.
- Ask: "Which artifact types will this overlay apply to? Here are the current profiles: [list]. Confirm this overlay is additive — it should not duplicate concerns covered in existing overlays."
- Check: does a similar overlay already exist? (e.g., if they want "AI ethics", does the security overlay already cover it?)

> **For detailed dependency checks**, see `references/dependency-rules.md`.

---

## Step 3 — Guided Interview

Work through these questions **one topic at a time**. Don't list them all at once. Wait for the answer before asking the next. The goal is to understand the artifact type well enough to write criteria that reliably distinguish a strong artifact from a weak one.

**Explain why each question matters as you ask it; understanding the reason helps the user give better answers.**

### 3a — Artifact Identity

1. **What is this artifact type?** Name and one-sentence definition.
   *(Why: Drives rubric_id, artifact_type field, and dimension names.)*

2. **What decision does this artifact support?** Who reads it and what do they do with the information?
   *(Why: EAROS criteria are always tied to decision-support. Criteria for "helping the Architecture Board approve" differ from "helping a delivery team know what to build".)*

3. **How often does this artifact type appear — and what are the stakes?**
   *(Why: High-frequency/low-stakes artifacts need lightweight criteria. Low-frequency/high-stakes artifacts justify more gates.)*

### 3b — Quality Markers

4. **Describe a great version of this artifact you've seen.** What made it stand out?
   *(Why: Positive markers are harder to articulate than failure modes. Good examples generate the level 4 scoring guide descriptors — the hardest ones to write.)*

5. **What does a bad version look like?** The 3 most common ways this artifact type fails.
   *(Why: Common failures become `anti_patterns` and level 0–1 scoring guide descriptors. These are the most effective disambiguation tools for AI agents.)*

6. **What's missing from an average version?** The artifact exists, the author tried — but something's always not quite there.
   *(Why: This generates the level 2–3 descriptors — the most critical for calibration, since most artifacts land in this range.)*

### 3c — Structure and Method

7. **What are the 3–5 most important things a reviewer must check?** These become your candidate criteria.
   *(Why: If you can't name 5 things, you'll over-specify. Naming them forces prioritisation before writing.)*

8. **Which design method fits best?**

   | Method | Best For |
   |--------|----------|
   | A — Decision-Centred | ADRs, investment reviews, exception requests |
   | B — Viewpoint-Centred | Capability maps, reference architectures |
   | C — Lifecycle-Centred | Transition designs, roadmaps, handover docs |
   | D — Risk-Centred | Security, regulatory, resilience assessments |
   | E — Pattern-Library | Recurring platform blueprints, reference patterns |

   *(Why: The design method shapes the dimensional structure and where to put emphasis.)*

### 3d — Gates and Stakes

9. **What would make you reject this artifact outright, no matter how well-written the rest is?**
   *(Why: That's your critical gate. At most 1–2 of these — if everything is critical, nothing is.)*

10. **What would make you say "passes with conditions"?**
    *(Why: That's major gate territory — serious enough to cap the outcome but not an automatic reject.)*

> **For gate guidance with worked examples**, see `references/criterion-writing-guide.md#gate-guidance`.

---
## Step 4 — Draft the Rubric YAML

Use the interview answers to generate the complete YAML. Before drafting, read:
- `templates/new-profile.template.yaml` — the scaffold to start from
- `references/criterion-writing-guide.md` — all 13 v2 required fields with worked examples

**Profile header skeleton:**
```yaml
rubric_id: EAROS-[ARTIFACT]-001
version: 1.0.0
kind: profile
title: "[Artifact Type] Profile"
status: draft
effective_date: "[today]"
next_review_date: "[6 months from today]"
owner: enterprise-architecture
artifact_type: [artifact_type_snake_case]
inherits:
  - EAROS-CORE-002
design_method: [method from Step 3c]
```

**Overlay header skeleton:**
```yaml
rubric_id: EAROS-OVR-[CONCERN]-001
version: 1.0.0
kind: overlay
title: "[Concern] Overlay"
status: draft
effective_date: "[today]"
artifact_type: any
# No 'inherits' field
scoring:
  method: append_to_base_rubric
```
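The date fields in the header skeletons can be generated rather than typed by hand. A small sketch; clamping days past the 28th for "6 months from today" is a simplification for illustration, not an EAROS rule:

```python
from datetime import date

def header_dates(today=None):
    """Return (effective_date, next_review_date) as ISO strings, with the review
    six calendar months out. Days past the 28th are clamped to the 28th to avoid
    invalid dates (a simplification, not part of the standard)."""
    t = today or date.today()
    months = t.month + 6
    year, month = t.year + (months - 1) // 12, (months - 1) % 12 + 1
    return t.isoformat(), date(year, month, min(t.day, 28)).isoformat()
```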
**Criteria count targets:**
- Profile: 5–12 criteria across 2–6 dimensions (core has 10; don't add more than needed)
- Overlay: 2–6 criteria (overlays inject focused concerns, not full rubrics)
- Core rubric: 8–12 criteria across 6–10 dimensions

Every criterion must have all 13 v2 fields. See `references/criterion-writing-guide.md` for the complete field list with a worked example.

**ID assignment:**
- Check `core/`, `profiles/`, and `overlays/` for existing IDs before assigning.
- Profile criteria: `[ARTIFACT-ABBREV]-[TOPIC]-[NN]` — e.g., `PM-ROOT-01` for post-mortem root cause
- Overlay criteria: `[OVR-ABBREV]-[TOPIC]-[NN]` — e.g., `AI-TRANS-01` for AI transparency
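The uniqueness check can be automated with a simple scan. A sketch, assuming criteria carry an `id:` field in the rubric YAML; adjust the pattern to whatever `standard/schemas/rubric.schema.json` actually specifies:

```python
import re
from pathlib import Path

# Assumes criterion IDs look like PM-ROOT-01 and sit in an `id:` field.
ID_PATTERN = re.compile(r"^\s*(?:-\s+)?id:\s*([A-Z][A-Z0-9-]*)\s*$", re.MULTILINE)

def duplicate_ids(files):
    """files maps path -> YAML text; returns {id: sorted paths} for IDs in >1 file."""
    locations = {}
    for path, text in files.items():
        for cid in set(ID_PATTERN.findall(text)):
            locations.setdefault(cid, []).append(path)
    return {cid: sorted(p) for cid, p in locations.items() if len(p) > 1}

def scan_repo(roots=("core", "profiles", "overlays")):
    """Collect rubric YAML from the standard directories (missing dirs yield nothing)."""
    return duplicate_ids({str(p): p.read_text() for r in roots for p in Path(r).glob("*.yaml")})
```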
---

## Step 5 — Validate

After generating YAML, run these checks before presenting to the user.

> **Full validation checklist in `references/validation-checklist.md`.**

Quick checks:
1. **ID uniqueness**: no criterion ID matches anything in `core/`, `profiles/`, or `overlays/`
2. **Criteria count**: 5–12 for profiles, 2–6 for overlays
3. **All 13 v2 fields present** on every criterion (see `references/criterion-writing-guide.md`)
4. **Gate balance**: at most 1–2 major gates, 0–1 critical gates per profile
5. **Core overlap**: no criterion duplicates what `EAROS-CORE-002` already covers
6. **Schema conformance**: structure matches `standard/schemas/rubric.schema.json`

Tell the user: "Run `earos-validate` after placing the file to catch any remaining schema errors."
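Checks 2 and 4 can be spot-checked mechanically once the YAML is parsed into a dict. A sketch; the `criteria`, `kind`, and `gate` field names and values are assumptions about the schema, and this does not replace `earos-validate`:

```python
def quick_checks(rubric):
    """Structural spot-checks on a drafted rubric dict (not a schema validation)."""
    problems = []
    criteria = rubric.get("criteria", [])
    # Count targets from Step 4; kind values are assumed, adjust to the schema.
    bounds = {"profile": (5, 12), "overlay": (2, 6), "core": (8, 12)}
    kind = rubric.get("kind", "profile")
    lo, hi = bounds.get(kind, (2, 12))
    if not lo <= len(criteria) <= hi:
        problems.append(f"criteria count: expected {lo}-{hi} for a {kind}, found {len(criteria)}")
    critical = sum(1 for c in criteria if c.get("gate") == "critical")
    major = sum(1 for c in criteria if c.get("gate") == "major")
    if critical > 1:
        problems.append(f"{critical} critical gates (max 1)")
    if major > 2:
        problems.append(f"{major} major gates (max 2)")
    return problems
```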
---

## Step 6 — Calibration Readiness

Before the rubric can be used in a live governance process, it must be calibrated. Give the user:

```
Pre-Calibration
[ ] 3+ real artifacts collected:
    - 1 strong artifact (expected overall score >= 3.2)
    - 1 weak artifact (expected overall score < 2.4)
    - 1 ambiguous artifact (borderline 2.4–3.2)
[ ] 2+ reviewers identified (at least one domain expert)
[ ] YAML complete and schema-valid (run earos-validate)

Calibration Run
[ ] Each reviewer scores all artifacts independently
[ ] Cohen's kappa computed per criterion:
    - Target > 0.70 for well-defined, observable criteria
    - Target > 0.50 for subjective or judgment-heavy criteria
[ ] Disagreements of >= 2 points identified and resolved against level descriptors
[ ] decision_tree entries updated where disagreements clustered

Post-Calibration
[ ] Profile status: draft → candidate
[ ] Worked example saved to examples/example-[artifact-type].evaluation.yaml
[ ] CHANGELOG.md updated
[ ] earos-validate run on complete repository
```

---

## Step 7 — File Placement

- Profiles → `profiles/<artifact-type>.yaml` (kebab-case, lowercase)
- Overlays → `overlays/<concern>.yaml`
- Core rubric → `core/<name>.yaml`

After placing the file, **update `earos.manifest.yaml`** by running:
```
node tools/editor/bin.js manifest add <path/to/new-file.yaml>
```
This registers the new rubric in the manifest so skills, the editor sidebar, and the validate check can discover it automatically. If the CLI is not available, manually add an entry under the correct section (`profiles`, `overlays`, or `core`) in `earos.manifest.yaml`.

Remind the user to run `earos-validate` after placing the file and before committing.

---

## Non-Negotiable Rules

1. **Interview before YAML.** Never generate the rubric before completing Steps 3a–3d.
2. **Never duplicate core criteria.** Read `EAROS-CORE-002` before finalising criteria.
3. **5–12 criteria for profiles, 2–6 for overlays.** More criteria = less reliable calibration.
4. **All 13 v2 fields on every criterion.** Incomplete criteria cannot be calibrated.
5. **Gates are rare.** At most 1–2 major gates and 0–1 critical gates per profile. Over-gating creates false rejects.
6. **Calibrate before production.** `status: draft` must not be used in a live governance process.
7. **Explain your questions.** Tell the user why each question matters — this is a design conversation, not an interrogation.

---

## When to Read Which Reference File

| When | Read |
|------|------|
| Checking what already exists | `references/dependency-rules.md` |
| Deciding profile vs. overlay | `references/dependency-rules.md#profile-vs-overlay` |
| Writing criteria | `references/criterion-writing-guide.md` |
| Deciding gate types and weights | `references/criterion-writing-guide.md#gate-guidance` |
| Deepening the interview | `references/rubric-interview-guide.md` |
| Before publishing the file | `references/validation-checklist.md` |