@trohde/earos 1.0.0

Files changed (135)
  1. package/README.md +156 -0
  2. package/assets/init/.agents/skills/earos-artifact-gen/SKILL.md +106 -0
  3. package/assets/init/.agents/skills/earos-artifact-gen/references/interview-guide.md +313 -0
  4. package/assets/init/.agents/skills/earos-artifact-gen/references/output-guide.md +367 -0
  5. package/assets/init/.agents/skills/earos-assess/SKILL.md +212 -0
  6. package/assets/init/.agents/skills/earos-assess/references/calibration-benchmarks.md +160 -0
  7. package/assets/init/.agents/skills/earos-assess/references/output-templates.md +311 -0
  8. package/assets/init/.agents/skills/earos-assess/references/scoring-protocol.md +281 -0
  9. package/assets/init/.agents/skills/earos-calibrate/SKILL.md +153 -0
  10. package/assets/init/.agents/skills/earos-calibrate/references/agreement-metrics.md +188 -0
  11. package/assets/init/.agents/skills/earos-calibrate/references/calibration-protocol.md +263 -0
  12. package/assets/init/.agents/skills/earos-create/SKILL.md +257 -0
  13. package/assets/init/.agents/skills/earos-create/references/criterion-writing-guide.md +268 -0
  14. package/assets/init/.agents/skills/earos-create/references/dependency-rules.md +193 -0
  15. package/assets/init/.agents/skills/earos-create/references/rubric-interview-guide.md +123 -0
  16. package/assets/init/.agents/skills/earos-create/references/validation-checklist.md +238 -0
  17. package/assets/init/.agents/skills/earos-profile-author/SKILL.md +251 -0
  18. package/assets/init/.agents/skills/earos-profile-author/references/criterion-writing-guide.md +280 -0
  19. package/assets/init/.agents/skills/earos-profile-author/references/design-methods.md +158 -0
  20. package/assets/init/.agents/skills/earos-profile-author/references/profile-checklist.md +173 -0
  21. package/assets/init/.agents/skills/earos-remediate/SKILL.md +118 -0
  22. package/assets/init/.agents/skills/earos-remediate/references/output-template.md +199 -0
  23. package/assets/init/.agents/skills/earos-remediate/references/remediation-patterns.md +330 -0
  24. package/assets/init/.agents/skills/earos-report/SKILL.md +85 -0
  25. package/assets/init/.agents/skills/earos-report/references/portfolio-template.md +181 -0
  26. package/assets/init/.agents/skills/earos-report/references/single-artifact-template.md +168 -0
  27. package/assets/init/.agents/skills/earos-review/SKILL.md +130 -0
  28. package/assets/init/.agents/skills/earos-review/references/challenge-patterns.md +163 -0
  29. package/assets/init/.agents/skills/earos-review/references/output-template.md +180 -0
  30. package/assets/init/.agents/skills/earos-template-fill/SKILL.md +177 -0
  31. package/assets/init/.agents/skills/earos-template-fill/references/evidence-writing-guide.md +186 -0
  32. package/assets/init/.agents/skills/earos-template-fill/references/section-rubric-mapping.md +200 -0
  33. package/assets/init/.agents/skills/earos-validate/SKILL.md +113 -0
  34. package/assets/init/.agents/skills/earos-validate/references/fix-patterns.md +281 -0
  35. package/assets/init/.agents/skills/earos-validate/references/validation-checks.md +287 -0
  36. package/assets/init/.claude/CLAUDE.md +4 -0
  37. package/assets/init/AGENTS.md +293 -0
  38. package/assets/init/CLAUDE.md +635 -0
  39. package/assets/init/README.md +507 -0
  40. package/assets/init/calibration/gold-set/.gitkeep +0 -0
  41. package/assets/init/calibration/results/.gitkeep +0 -0
  42. package/assets/init/core/core-meta-rubric.yaml +643 -0
  43. package/assets/init/docs/consistency-report.md +325 -0
  44. package/assets/init/docs/getting-started.md +194 -0
  45. package/assets/init/docs/profile-authoring-guide.md +51 -0
  46. package/assets/init/docs/terminology.md +126 -0
  47. package/assets/init/earos.manifest.yaml +104 -0
  48. package/assets/init/evaluations/.gitkeep +0 -0
  49. package/assets/init/examples/aws-event-driven-order-processing/artifact.yaml +2056 -0
  50. package/assets/init/examples/aws-event-driven-order-processing/evaluation.yaml +973 -0
  51. package/assets/init/examples/aws-event-driven-order-processing/report.md +244 -0
  52. package/assets/init/examples/example-solution-architecture.evaluation.yaml +136 -0
  53. package/assets/init/examples/multi-cloud-data-analytics/artifact.yaml +715 -0
  54. package/assets/init/overlays/data-governance.yaml +94 -0
  55. package/assets/init/overlays/regulatory.yaml +154 -0
  56. package/assets/init/overlays/security.yaml +92 -0
  57. package/assets/init/profiles/adr.yaml +225 -0
  58. package/assets/init/profiles/capability-map.yaml +223 -0
  59. package/assets/init/profiles/reference-architecture.yaml +426 -0
  60. package/assets/init/profiles/roadmap.yaml +205 -0
  61. package/assets/init/profiles/solution-architecture.yaml +227 -0
  62. package/assets/init/research/architecture-assessment-rubrics-research.docx +0 -0
  63. package/assets/init/research/architecture-assessment-rubrics-research.md +566 -0
  64. package/assets/init/research/reference-architecture-research.md +751 -0
  65. package/assets/init/standard/EAROS.md +1426 -0
  66. package/assets/init/standard/schemas/artifact.schema.json +1295 -0
  67. package/assets/init/standard/schemas/artifact.uischema.json +65 -0
  68. package/assets/init/standard/schemas/evaluation.schema.json +284 -0
  69. package/assets/init/standard/schemas/rubric.schema.json +383 -0
  70. package/assets/init/templates/evaluation-record.template.yaml +58 -0
  71. package/assets/init/templates/new-profile.template.yaml +65 -0
  72. package/bin.js +188 -0
  73. package/dist/assets/_basePickBy-BVu6YmSW.js +1 -0
  74. package/dist/assets/_baseUniq-CWRzQDz_.js +1 -0
  75. package/dist/assets/arc-CyDBhtDM.js +1 -0
  76. package/dist/assets/architectureDiagram-2XIMDMQ5-BH6O4dvN.js +36 -0
  77. package/dist/assets/blockDiagram-WCTKOSBZ-2xmwdjpg.js +132 -0
  78. package/dist/assets/c4Diagram-IC4MRINW-BNmPRFJF.js +10 -0
  79. package/dist/assets/channel-CiySTNoJ.js +1 -0
  80. package/dist/assets/chunk-4BX2VUAB-DGQTvirp.js +1 -0
  81. package/dist/assets/chunk-55IACEB6-DNMAQAC_.js +1 -0
  82. package/dist/assets/chunk-FMBD7UC4-BJbVTQ5o.js +15 -0
  83. package/dist/assets/chunk-JSJVCQXG-BCxUL74A.js +1 -0
  84. package/dist/assets/chunk-KX2RTZJC-H7wWZOfz.js +1 -0
  85. package/dist/assets/chunk-NQ4KR5QH-BK4RlTQF.js +220 -0
  86. package/dist/assets/chunk-QZHKN3VN-0chxDV5g.js +1 -0
  87. package/dist/assets/chunk-WL4C6EOR-DexfQ-AV.js +189 -0
  88. package/dist/assets/classDiagram-VBA2DB6C-D7luWJQn.js +1 -0
  89. package/dist/assets/classDiagram-v2-RAHNMMFH-D7luWJQn.js +1 -0
  90. package/dist/assets/clone-ylgRbd3D.js +1 -0
  91. package/dist/assets/cose-bilkent-S5V4N54A-DS2IOCfZ.js +1 -0
  92. package/dist/assets/cytoscape.esm-CyJtwmzi.js +331 -0
  93. package/dist/assets/dagre-KLK3FWXG-BbSoTTa3.js +4 -0
  94. package/dist/assets/defaultLocale-DX6XiGOO.js +1 -0
  95. package/dist/assets/diagram-E7M64L7V-C9TvYgv0.js +24 -0
  96. package/dist/assets/diagram-IFDJBPK2-DowUMWrg.js +43 -0
  97. package/dist/assets/diagram-P4PSJMXO-BL6nrnQF.js +24 -0
  98. package/dist/assets/erDiagram-INFDFZHY-rXPRl8VM.js +70 -0
  99. package/dist/assets/flowDiagram-PKNHOUZH-DBRM99-W.js +162 -0
  100. package/dist/assets/ganttDiagram-A5KZAMGK-INcWFsBT.js +292 -0
  101. package/dist/assets/gitGraphDiagram-K3NZZRJ6-DMwpfE91.js +65 -0
  102. package/dist/assets/graph-DLQn37b-.js +1 -0
  103. package/dist/assets/index-BFFITMT8.js +650 -0
  104. package/dist/assets/index-H7f6VTz1.css +1 -0
  105. package/dist/assets/infoDiagram-LFFYTUFH-B0f4TWRM.js +2 -0
  106. package/dist/assets/init-Gi6I4Gst.js +1 -0
  107. package/dist/assets/ishikawaDiagram-PHBUUO56-CsU6XimZ.js +70 -0
  108. package/dist/assets/journeyDiagram-4ABVD52K-CQ7ibNib.js +139 -0
  109. package/dist/assets/kanban-definition-K7BYSVSG-DzEN7THt.js +89 -0
  110. package/dist/assets/katex-B1X10hvy.js +261 -0
  111. package/dist/assets/layout-C0dvb42R.js +1 -0
  112. package/dist/assets/linear-j4a8mGj7.js +1 -0
  113. package/dist/assets/mindmap-definition-YRQLILUH-DP8iEuCf.js +68 -0
  114. package/dist/assets/ordinal-Cboi1Yqb.js +1 -0
  115. package/dist/assets/pieDiagram-SKSYHLDU-BpIAXgAm.js +30 -0
  116. package/dist/assets/quadrantDiagram-337W2JSQ-DrpXn5Eg.js +7 -0
  117. package/dist/assets/requirementDiagram-Z7DCOOCP-Bg7EwHlG.js +73 -0
  118. package/dist/assets/sankeyDiagram-WA2Y5GQK-BWagRs1F.js +10 -0
  119. package/dist/assets/sequenceDiagram-2WXFIKYE-q5jwhivG.js +145 -0
  120. package/dist/assets/stateDiagram-RAJIS63D-B_J9pE-2.js +1 -0
  121. package/dist/assets/stateDiagram-v2-FVOUBMTO-Q_1GcybB.js +1 -0
  122. package/dist/assets/timeline-definition-YZTLITO2-dv0jgQ0z.js +61 -0
  123. package/dist/assets/treemap-KZPCXAKY-Dt1dkIE7.js +162 -0
  124. package/dist/assets/vennDiagram-LZ73GAT5-BdO5RgRZ.js +34 -0
  125. package/dist/assets/xychartDiagram-JWTSCODW-CpDVe-8v.js +7 -0
  126. package/dist/index.html +23 -0
  127. package/export-docx.js +1583 -0
  128. package/init.js +353 -0
  129. package/manifest-cli.mjs +207 -0
  130. package/package.json +83 -0
  131. package/schemas/artifact.schema.json +1295 -0
  132. package/schemas/artifact.uischema.json +65 -0
  133. package/schemas/evaluation.schema.json +284 -0
  134. package/schemas/rubric.schema.json +383 -0
  135. package/serve.js +238 -0
@@ -0,0 +1,635 @@
# CLAUDE.md — EAROS Project Guide

**Enterprise Architecture Rubric Operational Standard · Version 2.0**

This file tells Claude how to work effectively in this project.

> **Greenfield project.** There are no published prior versions. Do not worry about backward compatibility — optimize for clarity and consistency over preserving legacy conventions.

---

## 1. Project Overview

EAROS is a structured, extensible framework for evaluating enterprise architecture artifacts. It makes architecture review consistent, explainable, and automatable — for both human reviewers and AI agents.

**The core problem it solves:** Architecture artifacts (solution designs, ADRs, capability maps, reference architectures, roadmaps) are evaluated constantly but rarely consistently. Different reviewers apply different mental models; unguided AI assessments hallucinate quality judgments. EAROS codifies evaluation criteria into governed, machine-readable rubrics with precise level descriptors, mandatory evidence requirements, and unambiguous pass/fail gates.

**Analogy:** EAROS is to architecture review what a marking rubric is to an exam — criteria explicit, scoring reproducible, feedback actionable.

### The Three-Layer Model

```
┌──────────────────────────────────────────────────────────┐
│ OVERLAYS (cross-cutting concerns)                        │
│ security · data-governance · regulatory                  │
├──────────────────────────────────────────────────────────┤
│ PROFILES (artifact-specific extensions)                  │
│ solution-architecture · reference-architecture · adr     │
│ capability-map · roadmap                                 │
├──────────────────────────────────────────────────────────┤
│ CORE (universal foundation — all artifacts)              │
│ core-meta-rubric.yaml (EAROS-CORE-002)                   │
│ 9 dimensions · 0–4 ordinal scale · gate model            │
└──────────────────────────────────────────────────────────┘
```

- **Core** (`core/core-meta-rubric.yaml`, `rubric_id: EAROS-CORE-002`) defines 9 universal dimensions with 10 criteria that apply to every architecture artifact. Always evaluated.
- **Profiles** (`profiles/`) extend the core for specific artifact types (e.g., reference-architecture adds 11 criteria across 6 dimensions). Each profile `inherits: [EAROS-CORE-002]`.
- **Overlays** (`overlays/`) inject cross-cutting concerns (security, data governance, regulatory) on top of any core+profile combination. Applied by context, not by artifact type.

One global rubric is too generic; fully bespoke rubrics are ungovernable. This layered model is the balance.

---

## 2. Key Concepts

For definitions of all technical terms used in EAROS, see `docs/terminology.md`.

### 2.1 Scoring Model — 0–4 Ordinal + N/A

| Score | Label | Meaning |
|-------|-------|---------|
| 4 | Strong | Fully addressed, well evidenced, internally consistent, decision-ready |
| 3 | Good | Clearly addressed with adequate evidence and only minor gaps |
| 2 | Partial | Explicitly addressed but coverage incomplete, inconsistent, or weakly evidenced |
| 1 | Weak | Acknowledged or implied, but inadequate for decision support |
| 0 | Absent | No meaningful evidence, or evidence directly contradicts the criterion |
| N/A | Not applicable | Criterion genuinely does not apply in this scope/context |

The 0–4 scale is intentional. A 1–10 scale creates false precision and lowers calibration quality. For pure agent evaluation, an optional 0–3 collapse is permitted but must be declared in the metadata.

**N/A policy:** Exclude N/A criteria from the denominator. Every N/A must be justified in the narrative.

**Confidence policy:** Confidence (`high` / `medium` / `low`) is reported separately from the score. It must NOT mathematically modify the score. These are two different things.
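
A minimal sketch of the aggregation math, assuming simple per-dimension weights (the function names here are illustrative, not part of the standard):

```python
def dimension_score(criterion_scores):
    """Average a dimension's criterion scores, excluding N/A.

    criterion_scores: list of ints 0-4 or the string "N/A".
    Returns None when every criterion is N/A (nothing to average).
    """
    applicable = [s for s in criterion_scores if s != "N/A"]
    if not applicable:
        return None  # dimension is entirely out of scope
    return sum(applicable) / len(applicable)


def overall_score(dimensions):
    """Weighted average over (weight, criterion_scores) pairs.

    N/A-only dimensions drop out of both numerator and denominator,
    so their weight does not dilute the result.
    """
    total, weight_sum = 0.0, 0.0
    for weight, scores in dimensions:
        avg = dimension_score(scores)
        if avg is None:
            continue
        total += weight * avg
        weight_sum += weight
    return total / weight_sum
```

Note how excluding N/A from the denominator means a dimension scored `[3, 4, "N/A"]` averages 3.5, not 2.33.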

### 2.2 Gate Types

Gates prevent bad scores from being hidden by weighted averages.

| Gate Type | Effect |
|-----------|--------|
| `none` | Contributes to score only; no gate logic |
| `advisory` | Weak performance triggers a recommendation |
| `major` | Significant weakness may cap the status (e.g., cannot pass above `conditional_pass`) |
| `critical` | Failure blocks pass status entirely; triggers `reject` regardless of average |

Gate fields in YAML:
```yaml
gate:
  enabled: true
  severity: critical   # none | advisory | major | critical
  failure_effect: reject when mandatory control compliance cannot be determined
```

Or simply `gate: false` for no gate.

### 2.3 Status Model

Evaluate gates first, then compute the weighted average.

| Status | Threshold |
|--------|-----------|
| **Pass** | No critical gate failure + overall ≥ 3.2 + no dimension < 2.0 |
| **Conditional Pass** | No critical gate failure + overall 2.4–3.19 (weaknesses containable with named actions) |
| **Rework Required** | Overall < 2.4, or repeated weak dimensions, or insufficient evidence |
| **Reject** | Any critical gate failure, or mandatory control breach |
| **Not Reviewable** | Evidence too incomplete to score responsibly; core gate criteria unresolvable |
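
Sketched as code, under one reading of the thresholds above (hypothetical function and argument names; `not_reviewable` is decided earlier, when evidence is too incomplete to score at all):

```python
def determine_status(critical_gate_failed, overall, dimension_minimum):
    """Apply the EAROS status rules: gates first, then thresholds."""
    if critical_gate_failed:
        return "reject"  # no weighted average can rescue a critical gate
    if overall >= 3.2 and dimension_minimum >= 2.0:
        return "pass"
    if overall >= 2.4:
        return "conditional_pass"
    return "rework_required"
```

The gate check deliberately comes before any arithmetic, matching the "evaluate gates first" rule.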

### 2.4 Evidence Classes

Every score must cite evidence. Reviewers (human or agent) must classify the evidence type:

| Class | Meaning |
|-------|---------|
| `observed` | Directly supported by a quote or excerpt from the artifact |
| `inferred` | Reasonable interpretation not directly stated |
| `external` | Judgment based on a standard, policy, or source outside the artifact |

This separation is a design principle, not optional annotation. Observed > inferred > external in credibility.
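
An illustrative criterion-level entry showing the separation (field names here are indicative only; the authoritative structure is defined by `evaluation.schema.json`):

```yaml
- criterion_id: RA-VIEW-01
  score: 3
  confidence: medium          # reported separately; never modifies the score
  evidence_class: observed
  evidence_ref: "Section 3.2: 'The context diagram in Figure 1 shows...'"
```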

### 2.5 The Three Evaluation Types (Never Collapse)

The standard distinguishes three distinct judgment types that must not be merged into a single score:

1. **Artifact quality** — Is the artifact complete, coherent, clear, traceable, and fit for its stated purpose?
2. **Architectural fitness** — Does the described architecture appear sound relative to business drivers, quality attributes, risks, and tradeoffs?
3. **Governance fit** — Does the artifact/design comply with mandatory principles, standards, controls, and review expectations?

These are related but distinct. A beautiful, complete artifact can describe an architecturally unsound system.

### 2.6 DAG Evaluation Flow (Agent Mode)

Agents must follow this 8-step directed acyclic graph:

```
structural_validation
  → content_extraction
  → criterion_scoring
  → cross_reference_validation
  → dimension_aggregation
  → challenge_pass
  → calibration
  → status_determination
```

The rubric is locked during evaluation (`rubric_locked: true`). Calibration uses the RULERS Wasserstein-based method (`calibration_method: rulers_wasserstein`).

**RULERS protocol** (evidence-anchored scoring): For each criterion, extract a direct quote or reference from the artifact before assigning a score. If no evidence can be found, record N/A and explain — never score from impression alone.

---

## 3. Project Structure

```
EAROS/
├── earos.manifest.yaml              Inventory of all EAROS rubric files (auto-generated; keep up to date)
│
├── standard/
│   ├── EAROS.md                     Canonical standard (read this first for deep understanding)
│   ├── EAROS_Standard_v2.docx       Word version
│   └── schemas/
│       ├── rubric.schema.json       JSON Schema for all rubric/profile/overlay YAML files
│       ├── evaluation.schema.json   JSON Schema for evaluation record output files
│       ├── artifact.schema.json     JSON Schema for architecture artifact documents
│       └── artifact.uischema.json   UI Schema for artifact editor form layout (7 tabs)
│
├── core/
│   └── core-meta-rubric.yaml        The universal foundation (EAROS-CORE-002)
│                                    9 dimensions, 10 criteria, always applied
│
├── profiles/                        Artifact-type extensions (inherit core)
│   ├── solution-architecture.yaml
│   ├── reference-architecture.yaml  ← First full profile; model for others
│   ├── adr.yaml
│   ├── capability-map.yaml
│   └── roadmap.yaml
│
├── overlays/                        Cross-cutting injectors (applied by context)
│   ├── security.yaml                (EAROS-OVR-SEC-001)
│   ├── data-governance.yaml
│   └── regulatory.yaml
│
├── templates/
│   ├── new-profile.template.yaml          Scaffold for new profiles
│   └── evaluation-record.template.yaml    Blank evaluation record
│
├── tools/scoring-sheets/
│   └── EAROS_Scoring_Sheet_v2.xlsx        General-purpose manual scoring
│
├── examples/
│   └── example-solution-architecture.evaluation.yaml   Worked evaluation record
│
├── calibration/
│   ├── gold-set/                    Reference artifacts with known scores (calibrate against these)
│   └── results/                     Calibration run outputs
│
├── docs/
│   ├── getting-started.md
│   └── profile-authoring-guide.md   How to create profiles
│
└── research/                        Research underpinning the standard (63 sources)
```

---

## 4. Working with Rubric YAML Files

### Structure: Core Meta-Rubric and Profiles

```yaml
rubric_id: EAROS-PROF-XXX        # Unique ID
version: 1.0.0                   # Semver
kind: profile                    # core_rubric | profile | overlay
title: "..."
status: draft                    # draft | candidate | approved | deprecated
effective_date: "YYYY-MM-DD"
owner: enterprise-architecture
artifact_type: reference_architecture
inherits:
  - EAROS-CORE-002               # Profiles always inherit the core
design_method: pattern_library   # See Section 5 below

dimensions:
  - id: RA-D1
    name: Architecture views and completeness
    description: "..."
    weight: 1.2                  # Relative weight for aggregation (default 1.0)
    criteria:
      - id: RA-VIEW-01
        question: "Does the reference architecture include context, functional, deployment, and data flow views?"
        description: "..."
        metric_type: ordinal
        scale: [0, 1, 2, 3, 4, "N/A"]
        gate:
          enabled: true
          severity: major
          failure_effect: Cannot pass if score < 2
        required_evidence:
          - context diagram (C4 Level 1 or equivalent)
          - deployment diagram showing infrastructure topology
        scoring_guide:
          "0": Single diagram only, or no architectural views
          "1": Two views present but incomplete
          "2": Three views present, data flow narrative partial
          "3": All four views present with adequate detail
          "4": All four views, consistent, with security view and cross-references
        anti_patterns:
          - Single box-and-arrow diagram presented as complete architecture
        examples:
          good:
            - "Section 3 provides C4 context diagram. Section 5 shows container decomposition..."
          bad:
            - "See architecture diagram on page 3."
        decision_tree: >
          Count distinct views: IF < 2 THEN score 0-1. IF 2-3 views THEN score 2.
          IF 4+ views AND data flow narrative exists THEN score 3.
          IF all views cross-referenced AND security view included THEN score 4.
        remediation_hints:
          - Add missing views using C4 model levels
          - Add numbered data flow walkthrough

scoring:
  scale: 0-4 ordinal plus N/A
  method: gates_first_then_weighted_average
  thresholds:
    pass: No critical gate failure, overall >= 3.2, and no dimension < 2.0
    conditional_pass: No critical gate failure and overall 2.4-3.19
    rework_required: Overall < 2.4
    reject: Critical gate failure

outputs:
  require_evidence_refs: true
  require_confidence: true
  require_actions: true
  require_evidence_class: true
  require_evidence_anchors: true

calibration:
  required_before_production: true
  minimum_examples: 3
```

### Overlay Structure

Overlays use `kind: overlay` and `artifact_type: any`. Their `scoring.method` is `append_to_base_rubric` — they do not replace the base scoring; they add criteria on top. Overlays typically have at least one `critical` gate.

### Schema Validation

Four schemas live in `standard/schemas/`:

| Schema | Purpose | Kind discriminator |
|--------|---------|--------------------|
| `rubric.schema.json` | **Data schema** — validates rubric definitions (core, profiles, overlays) | `kind: core_rubric`, `profile`, `overlay` |
| `evaluation.schema.json` | **Data schema** — validates evaluation records | `kind: evaluation` |
| `artifact.schema.json` | **Data schema** — validates architecture artifact documents | `kind: artifact` |
| `artifact.uischema.json` | **UI schema** — controls how the artifact editor renders the form (7 tabs) | N/A (presentation only) |

**Derivation chain:** Rubric → Artifact Schema → Artifact UI Schema → Template. The `artifact.schema.json` is derived directly from the `required_evidence` fields of the core meta-rubric and profiles. Each section in the artifact schema maps to the evidence a rubric criterion requires. When a profile adds criteria with new `required_evidence`, the corresponding artifact schema should be updated to add those sections. This chain means a well-completed artifact document will satisfy the evidence requirements for its rubric criteria.

**Data Schema vs UI Schema:** JSON Forms (used by the EAROS editor) separates concerns into two schemas:
- **Data Schema** (`artifact.schema.json`) — defines what data exists and its structure. Used for validation. Changing it changes the data model.
- **UI Schema** (`artifact.uischema.json`) — defines how the data is presented in the editor form. Controls tab grouping, field ordering, and layout. Changing it only affects the editor experience, not the data.

The artifact UI Schema splits the editor into 7 tabs: Overview & Metadata, Business Context, Architecture Views, Decisions & Crosscutting, Quality & Operations, Implementation, and Governance & Glossary. Without the UI Schema, JSON Forms would create only 2 tabs (one per top-level property: `metadata` and `sections`).

This pattern should be replicated when rubric or evaluation editors need better tab layouts: create a `rubric.uischema.json` or `evaluation.uischema.json` to control the form layout.

Rubric YAML files must validate against `rubric.schema.json`. Required top-level fields: `rubric_id`, `version`, `kind`, `title`, `artifact_type`, `dimensions`, `scoring`, `outputs`.

Evaluation records must validate against `evaluation.schema.json`.

Artifact documents must validate against `artifact.schema.json`.
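
As a lightweight pre-flight check before full JSON Schema validation, the required-field list above can be verified directly (a sketch; `missing_rubric_fields` is illustrative and not part of the tooling):

```python
# Required top-level fields for a rubric, per rubric.schema.json.
REQUIRED_RUBRIC_FIELDS = {
    "rubric_id", "version", "kind", "title",
    "artifact_type", "dimensions", "scoring", "outputs",
}

def missing_rubric_fields(rubric: dict) -> set:
    """Return the required top-level keys absent from a parsed rubric."""
    return REQUIRED_RUBRIC_FIELDS - rubric.keys()
```

Run this on the parsed YAML before invoking the full schema validator to get a readable first error message.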

---

## 5. How to Create a New Profile

### Step 1 — Qualify the need
- The artifact type must recur enough to justify standardization.
- The core meta-rubric alone must be insufficient for this artifact type.
- Gather 3–5 representative real artifacts for calibration.

### Step 2 — Choose a design method

| Method | Best For |
|--------|----------|
| A: Decision-Centred | ADRs, investment reviews, exception requests |
| B: Viewpoint-Centred | Capability maps, reference architectures |
| C: Lifecycle-Centred | Transition designs, roadmaps, handover docs |
| D: Risk-Centred | Security, regulatory, resilience architecture |
| E: Pattern-Library | Recurring reference patterns, platform services |

### Step 3 — Copy the template

Start from `templates/new-profile.template.yaml`. Set:
- `kind: profile`
- `inherits: [EAROS-CORE-002]`
- `design_method` from step 2
- `rubric_id` using pattern `EAROS-<ARTIFACT>-<NNN>`

### Step 4 — Write 5–12 criteria

Rules:
- Add **5–12 criteria, never more than 12** (the core already has 10)
- Every criterion needs: `question`, `description`, `scoring_guide` (all 5 levels 0–4), `required_evidence`, `anti_patterns`, `examples.good`, `examples.bad`, `decision_tree`, `remediation_hints`
- Assign each criterion to a dimension with an appropriate `weight`
- Designate gate types deliberately — not every criterion needs a gate; over-gating creates false rejects
- Include at least one `major` gate for the most critical dimension

### Step 5 — Calibrate before production
1. Build a calibration pack: 1 strong, 1 weak, 1 ambiguous, 1 incomplete artifact
2. Have 2+ reviewers score independently against the profile
3. Compute Cohen's κ — target > 0.70 for well-defined criteria, > 0.50 for subjective ones
4. Identify disagreements; resolve against the level descriptors
5. Update `decision_tree` and `scoring_guide` where disagreements clustered
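
Step 3's agreement check can be computed with plain unweighted Cohen's κ (a sketch for two reviewers; production calibration may prefer a weighted variant for ordinal scales):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of scores."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of criteria scored identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal distribution.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both raters constant and identical
    return (observed - expected) / (1 - expected)
```

κ = 1.0 means perfect agreement; κ ≈ 0 means agreement no better than chance, which fails the > 0.70 target.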

### Step 6 — Publish
- Validate YAML against `standard/schemas/rubric.schema.json`
- Add worked examples to `examples/`
- Document in `CHANGELOG.md`
- File naming: `<artifact-type>.yaml` (version inside the file, per the conventions in Section 8)

---

## 6. How to Create a New Overlay

### Profile vs. Overlay — the distinction

| Use a **profile** when... | Use an **overlay** when... |
|---------------------------|----------------------------|
| The artifact type is distinct and recurring | The concern cuts across multiple artifact types |
| Criteria only make sense for this artifact type | Criteria apply regardless of artifact type |
| You extend the dimensional structure | You inject additional criteria into any rubric |

### Overlay structure

```yaml
kind: overlay
artifact_type: any               # Not tied to a specific artifact type
# No 'inherits' field — overlays don't inherit, they append
scoring:
  method: append_to_base_rubric  # Key difference from profiles
```

### When to apply an overlay

Apply overlays based on context, not artifact type. Examples:
- **Security overlay** whenever the design touches authentication, authorization, or personal data
- **Data governance overlay** whenever the artifact describes data flows, retention, or classification
- **Regulatory overlay** for artifacts in regulated domains (payments, healthcare, financial reporting)

Overlays are additive — they cannot remove or weaken gates from the base rubric. An overlay's `critical` gate adds to, not replaces, the base gate model.
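
The additive rule can be sketched as an append-only merge (hypothetical data shapes; the real composition happens in the evaluation tooling):

```python
def apply_overlay(base_criteria, overlay_criteria):
    """Append overlay criteria to a base rubric's criteria list.

    Overlays may only add: the base list is preserved verbatim, so no
    base criterion or gate can be removed or weakened by an overlay.
    """
    merged = list(base_criteria)  # base rubric untouched
    existing_ids = {c["id"] for c in merged}
    for criterion in overlay_criteria:
        if criterion["id"] in existing_ids:
            raise ValueError(f"overlay criterion id collides: {criterion['id']}")
        merged.append(criterion)
    return merged
```

Raising on id collisions keeps an overlay from silently shadowing (and thereby weakening) a base criterion.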

---

## 7. How to Perform an Assessment

### Human Mode

1. Identify artifact type → select core + matching profile + applicable overlays
2. Open `tools/scoring-sheets/EAROS_Scoring_Sheet_v2.xlsx`
3. Score each criterion 0–4 using the `scoring_guide` level descriptors
4. Record the evidence: quote or reference for each score (observed / inferred / external)
5. Check gates — any critical gate failure → Reject immediately; do not compute average
6. Compute weighted dimension average → apply status thresholds
7. Record the evaluation in an output file conforming to `evaluation.schema.json`

### Agent Mode

Minimal prompt pattern:
```
You are an architecture quality assessor. Apply the EAROS rubric defined in
[rubric YAML] to the artifact below. For each criterion:
1. Extract the relevant evidence (direct quote or reference) — RULERS protocol
2. Score 0–4 against the level descriptors in scoring_guide
3. If you cannot find evidence, score N/A and explain why
4. Flag any gate criteria that fail
5. Classify evidence as observed / inferred / external
6. Report confidence (high/medium/low) separately from the score
Produce output conforming to evaluation.schema.json.

<artifact>
[artifact content]
</artifact>
```

Follow the DAG exactly:
`structural_validation → content_extraction → criterion_scoring → cross_reference_validation → dimension_aggregation → challenge_pass → calibration → status_determination`

Do not skip `challenge_pass` — this step has a second agent challenge the primary evaluator's scores.

Calibrate against `calibration/gold-set/` before production use. Target κ > 0.70.

### Hybrid Mode

Human and agent evaluate independently, then reconcile. Disagreements of ≥ 2 points on any criterion must be resolved against the level descriptors before finalizing the record. The evaluation record captures both evaluators (`mode: human` and `mode: agent`).
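
Reconciliation can start from a simple disagreement report (a sketch; score maps keyed by criterion id are an assumed shape, not the schema's):

```python
def flag_disagreements(human_scores, agent_scores, threshold=2):
    """List criterion ids where human and agent differ by >= threshold.

    Scores are 0-4 ints keyed by criterion id; pairs where either side
    recorded "N/A" are skipped (numeric distance is undefined there).
    """
    flagged = []
    for cid in sorted(set(human_scores) & set(agent_scores)):
        h, a = human_scores[cid], agent_scores[cid]
        if "N/A" in (h, a):
            continue
        if abs(h - a) >= threshold:
            flagged.append(cid)
    return flagged
```

Every flagged criterion then goes back to the level descriptors for resolution before the record is finalized.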
426
+
427
+ ---
428
+
429
+ ## 8. Conventions
430
+
431
+ ### File Naming
432
+
433
+ The `kind` field is the universal type discriminator. Version is tracked inside the file (`version: 2.0.0`), not in the filename.
434
+
435
+ | File type | Pattern | Example |
436
+ |-----------|---------|---------|
437
+ | Rubric definitions (core, profiles, overlays) | `<name>.yaml` | `reference-architecture.yaml` |
438
+ | Evaluation records | `<name>.evaluation.yaml` | `payments-api.evaluation.yaml` |
439
+ | Templates | `<name>.template.yaml` | `evaluation-record.template.yaml` |
440
+ | JSON schemas | `<name>.schema.json` | `rubric.schema.json` |
441
+
442
+ - Kebab-case throughout; no spaces in filenames
443
+ - Version is tracked inside the file only (`version: 2.0.0`), never in the filename
444
+ - The `kind` field distinguishes file purpose: `core_rubric`, `profile`, `overlay`, `evaluation`
445
+
446
+ ### Versioning (Semver)
447
+ - `MAJOR` — breaking change to scoring model, gate structure, or status thresholds
448
+ - `MINOR` — new criteria added, existing criteria improved
449
+ - `PATCH` — documentation, examples, typo fixes
450
+
451
+ The `rubric_locked: true` flag in `agent_evaluation` means an agent must not modify rubric criteria during evaluation. Changes require a version bump and governance.
452
+
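A rubric header under these rules might look like the following (a minimal sketch; the field values are illustrative):

```yaml
kind: profile
version: 2.1.0            # MINOR bump: new criteria were added
agent_evaluation:
  rubric_locked: true     # agents must not alter criteria mid-evaluation
```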
453
+ ### YAML Style
454
+ - Two-space indentation
455
+ - String keys quoted when they are numeric: `"0": "Absent"`, `"4": "Strong"`
456
+ - Multi-line descriptions use `>` block scalar
457
+ - Lists of evidence items use `- item` format (one item per line)
458
+ - `gate: false` (not `gate: {enabled: false}`) when no gate needed
459
+
460
+ ### Required Fields for Every New Criterion
461
+
462
+ `id`, `question`, `description`, `metric_type: ordinal`, `scale: [0, 1, 2, 3, 4, "N/A"]`, `gate`, `required_evidence`, `scoring_guide` (keys `"0"` through `"4"`), `anti_patterns`, `examples.good`, `examples.bad`, `decision_tree`, `remediation_hints`
463
+
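A minimal sketch of a criterion carrying every required field (the ID, wording, and level descriptors here are invented for illustration; validate real criteria against `standard/schemas/rubric.schema.json`):

```yaml
- id: XX-EXAMPLE-01                # hypothetical ID for illustration
  question: "Does the artifact document its deployment topology?"
  description: >
    Hypothetical criterion used only to show the required field set
    and the YAML style rules from this document.
  metric_type: ordinal
  scale: [0, 1, 2, 3, 4, "N/A"]
  gate: false                      # no gate needed for this criterion
  required_evidence:
    - deployment diagram or equivalent view
  scoring_guide:
    "0": "Absent"
    "1": "Mentioned without detail"
    "2": "Partially documented"
    "3": "Documented with rationale"
    "4": "Strong: documented, justified, and cross-referenced"
  anti_patterns:
    - "Diagram present but never referenced from the narrative"
  examples:
    good: "Section 4 shows a deployment view with node-level annotations"
    bad: "A single unlabeled box-and-line picture titled 'architecture'"
  decision_tree: >
    IF no deployment view THEN score 0-1;
    IF view present but unexplained THEN score 2;
    IF view explained THEN score 3;
    IF view explained AND cross-referenced THEN score 4
  remediation_hints:
    - "Add a deployment view and reference it from the overview section"
```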
464
+ ---
465
+
466
+ ## 9. Important Rules
467
+
468
+ 1. **Never collapse the three evaluation types.** Artifact quality, architectural fitness, and governance fit are distinct judgments. Never merge them into a single opaque score.
469
+
470
+ 2. **Gates before averages.** Always check gates before computing a weighted average. A single critical gate failure = Reject, no matter how high the average.
471
+
472
+ 3. **Evidence first.** Every score requires a cited excerpt or reference. "Evidence: section 3 states X" is valid. "The artifact seems to address this" is not. Use RULERS anchoring.
473
+
474
+ 4. **Confidence separate from score.** Reporting low confidence does not lower the score. Confidence informs how much weight a human reviewer places on the agent's output; it does not modify the numerical score.
475
+
476
+ 5. **N/A requires justification.** You cannot use N/A to avoid a hard criterion. The narrative must explain why the criterion genuinely does not apply.
477
+
478
+ 6. **Machine-readable formats preferred.** Artifacts in structured formats (YAML frontmatter, ArchiMate exchange, diagram-as-code) are assessed more reliably. Prefer structured output formats (YAML/JSON) for evaluation records.
479
+
480
+ 7. **Rubrics are governed assets.** Do not modify a rubric YAML's scoring model or gate structure without a version bump and owner approval. The rubric is locked during evaluation.
481
+
482
+ 8. **Calibrate before production.** Any new profile or overlay must be calibrated against at least 3 representative artifacts with 2+ reviewers before being used in a live governance process.
483
+
484
+ 9. **Do not average across dimensions prematurely.** A dimension score of 0 is not neutralized by a dimension score of 4. The status thresholds include a floor check: no dimension < 2.0 for a Pass status.
485
+
486
+ 10. **Agentic evaluations must be auditable.** The evaluation record must capture evidence anchors, evidence classes, and confidence so a human can inspect and override any agent judgment.
487
+
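Rules 2 and 9 together define the order of status determination. A minimal sketch, assuming a `Pass` threshold of 3.0 and illustrative status labels (the real thresholds and labels live in the rubric YAML):

```python
def determine_status(gate_failures, dimensions, pass_threshold=3.0):
    """gate_failures: gate types that failed, e.g. ["major"];
    dimensions: (score, weight) pairs on the 0-4 scale."""
    # Rule 2: check gates before computing any average.
    if "critical" in gate_failures:
        return "Reject"  # no average can rescue a critical gate failure
    # Rule 9: floor check -- a dimension score of 0 is not neutralized by a 4.
    if any(score < 2.0 for score, _ in dimensions):
        return "Revise"
    # Only now compute the weighted average across dimensions.
    weighted_avg = sum(s * w for s, w in dimensions) / sum(w for _, w in dimensions)
    return "Pass" if weighted_avg >= pass_threshold else "Revise"
```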
488
+ ---
489
+
490
+ ## 10. The Reference Architecture Profile — Model for Other Profiles
491
+
492
+ `profiles/reference-architecture.yaml` (`EAROS-REFARCH-001`) is the first full profile in EAROS v2 and serves as the reference implementation for how profiles should be built.
493
+
494
+ **Why it is a good model:**
495
+ - Uses `design_method: pattern_library` (Method E) — appropriate for recurring platform blueprints
496
+ - Has 9 criteria across 6 profile-specific dimensions; combined with the 10 core criteria, this gives 19 criteria in total
497
+ - Every criterion has all required fields including `examples.good`, `examples.bad`, and `decision_tree`
498
+ - Gate types are carefully graduated: 4 `major` gates, no `critical` gates (critical gates reserved for compliance-level concerns)
499
+ - Dimension weights are tuned: implementation actionability (RA-D4) and views (RA-D1) weighted at 1.2 to reflect their importance; reusability/evolution (RA-D6) at 0.8 as secondary
500
+ - Calibration pack is specified explicitly: 1 strong, 1 weak, 1 ambiguous, 1 golden-path artifact
501
+
502
+ **Gold-standard calibration example:** `examples/aws-event-driven-order-processing/` contains the EAROS calibration benchmark for reference architecture assessments:
503
+ - `artifact.yaml` — A complete, approved reference architecture (Event-Driven Order Processing on AWS). Scores 3.73/4.0. Use as the "strong" artifact in calibration packs.
504
+ - `evaluation.yaml` — Full 19-criterion EAROS evaluation with RULERS evidence anchors, evidence class, confidence, and challenger notes. All 8 DAG steps completed.
505
+ - `report.md` — Human-readable assessment report with traffic-light dashboard, dimension table, and recommended actions.
506
+
507
+ Before using EAROS-REFARCH-001 in a production governance process, evaluators must independently score `examples/aws-event-driven-order-processing/artifact.yaml` and achieve κ > 0.70 against the reference scores in `evaluation.yaml`. The two intentionally score-3 criteria (RA-VIEW-02 and RA-IMP-02) are calibration checkpoints — inflating these to 4 is a calibration failure.
508
+
509
+ **Paired with artifact schemas:** `standard/schemas/artifact.schema.json` is derived from the rubric's `required_evidence` fields and defines the structure of a compliant reference architecture document. `standard/schemas/artifact.uischema.json` controls how JSON Forms renders that schema in the editor — splitting it into 7 tabs rather than a flat 2-tab layout. This pattern — rubric + data schema + UI schema — should be replicated for each new profile. When creating a new profile, update `artifact.schema.json` with any new required sections, then update `artifact.uischema.json` to add those sections to the appropriate tab.
510
+
511
+ **Illustrative decision tree pattern** (from RA-VIEW-01):
512
+ ```
513
+ Count distinct views:
514
+ IF < 2 THEN score 0-1
515
+ IF 2-3 views THEN score 2
516
+ IF 4+ views AND data flow narrative exists THEN score 3
517
+ IF all views cross-referenced AND security view included THEN score 4
518
+ ```
519
+ This pattern — count observable features, branch on presence — is the right template for `decision_tree` fields throughout the framework.
520
+
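Embedded in a rubric, such a tree sits in the criterion's `decision_tree` field using the `>` block scalar mandated by the YAML style rules (a sketch, not the verbatim RA-VIEW-01 text):

```yaml
decision_tree: >
  Count distinct views:
  IF < 2 THEN score 0-1;
  IF 2-3 views THEN score 2;
  IF 4+ views AND data flow narrative exists THEN score 3;
  IF all views cross-referenced AND security view included THEN score 4
```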
521
+ ---
522
+
523
+ ## 11. Agent Skills
524
+
525
+ The `.agents/skills/` directory contains Claude agent skills for working with EAROS. Each skill lives in its own subdirectory with a `SKILL.md` file. Skills are auto-triggered when their description matches the user's request — no slash command needed.
526
+
527
+ ```
528
+ .agents/skills/
529
+ ├── earos-assess/SKILL.md Core assessment — runs the full 8-step DAG evaluation on any artifact
530
+ ├── earos-review/SKILL.md Challenger — audits an existing evaluation record for over-scoring and unsupported claims
531
+ ├── earos-template-fill/SKILL.md Author guide — coaches artifact authors through writing assessment-ready documents
532
+ ├── earos-create/SKILL.md Rubric creation — guided interview + YAML generation for profiles, overlays, and core rubrics
533
+ ├── earos-profile-author/SKILL.md Profile YAML authoring — technical reference for v2 field structure and schema compliance
534
+ ├── earos-calibrate/SKILL.md Calibration — runs calibration exercises and computes inter-rater reliability
535
+ ├── earos-report/SKILL.md Reporting — generates executive reports from evaluation records
536
+ ├── earos-validate/SKILL.md Health check — validates all YAML rubrics against schemas and checks consistency
537
+ ├── earos-remediate/SKILL.md Remediation planner — generates prioritized improvement plans from evaluation records
538
+ └── earos-artifact-gen/SKILL.md Artifact generator — interviews architects and produces schema-compliant artifact YAML
539
+ ```
540
+
541
+ ### When to use which skill
542
+
543
+ | Task | Skill |
544
+ |------|-------|
545
+ | Assess an architecture artifact | `earos-assess` |
546
+ | Challenge or audit an existing evaluation | `earos-review` |
547
+ | Help write an artifact that will pass EAROS | `earos-template-fill` |
548
+ | Create a new architecture artifact through guided interview | `earos-artifact-gen` |
549
+ | Create a new rubric from scratch (profile, overlay, or core) | `earos-create` |
550
+ | Get YAML structure help after criteria are defined | `earos-profile-author` |
551
+ | Calibrate a rubric against gold-standard examples | `earos-calibrate` |
552
+ | Generate an executive report from evaluation(s) | `earos-report` |
553
+ | Check the repo for schema errors and inconsistencies | `earos-validate` |
554
+ | Get a prioritized fix list from an evaluation record | `earos-remediate` |
555
+
556
+ **Key design principle for all skills:** Every skill instructs Claude to read the actual YAML rubric files at runtime. The skills do not embed rubric content — they load it dynamically. This means skills automatically use the latest rubric version without needing updates.
557
+
558
+ ---
559
+
560
+ ## 12. Manifest (earos.manifest.yaml)
561
+
562
+ `earos.manifest.yaml` (at the repo root) is the authoritative inventory of all EAROS rubric files. It lists every core rubric, profile, and overlay with their paths, rubric IDs, titles, artifact types, and statuses.
563
+
564
+ **Purpose:**
565
+ - Gives skills a single source of truth for discovering available profiles and overlays — no hardcoded paths
566
+ - Powers the editor's file sidebar (browse and load rubrics directly)
567
+ - Enables `earos-validate` to detect drift between the manifest and the filesystem
568
+
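An entry might look like the following (a sketch assuming a plausible layout; only the path and rubric ID below come from this document, and the generator in `tools/editor/` defines the authoritative shape):

```yaml
profiles:
  - path: profiles/reference-architecture.yaml
    id: EAROS-REFARCH-001
    title: Reference Architecture            # illustrative
    artifact_type: reference_architecture    # illustrative
    status: active                           # illustrative
```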
569
+ **CLI commands** (from `tools/editor/`):
570
+ ```
571
+ node bin.js manifest # Regenerate manifest by scanning core/, profiles/, overlays/
572
+ node bin.js manifest add <file> # Add a single file to the manifest
573
+ node bin.js manifest check # Verify manifest matches filesystem; exits non-zero on drift
574
+ ```
575
+
576
+ **Keeping it current:**
577
+ - After creating a new rubric with `earos-create`: run `node bin.js manifest add <path>` (or manually add the entry)
578
+ - After deleting a rubric: re-run `node bin.js manifest` to regenerate
579
+ - `earos-validate` Check 8 reports any manifest-filesystem inconsistency as an ERROR
580
+
581
+ **Skills that use the manifest:**
582
+ - `earos-assess` — reads manifest first to discover available profiles and overlays
583
+ - `earos-create` — updates manifest as the final step of rubric creation
584
+ - `earos-validate` — Check 8 validates manifest-filesystem consistency
585
+
586
+ ---
587
+
588
+ ## 13. Key Terms (Glossary)
589
+
590
+ The full glossary is in [`docs/terminology.md`](docs/terminology.md). It covers statistical & calibration terms (Cohen's kappa, Wasserstein distance, IRR), EAROS-specific terms (gate, overlay, RULERS protocol, DAG evaluation flow), and architecture terms as used in EAROS (viewpoint, concern, quality attribute, ADR, golden path). Below are the most important terms for day-to-day work in this repository.
591
+
592
+ | Term | Definition |
593
+ |------|------------|
594
+ | **Core meta-rubric** | Universal foundation rubric (`EAROS-CORE-002`): 9 dimensions, 10 criteria, applied to every artifact |
595
+ | **Profile** | Artifact-type extension of the core (5–12 extra criteria). Declares `inherits: [EAROS-CORE-002]` |
596
+ | **Overlay** | Cross-cutting concern extension (e.g. security). Applied by context, not artifact type. Uses `append_to_base_rubric` scoring |
597
+ | **Gate** | Criterion-level control that blocks a passing status regardless of average. Types: `none`, `advisory`, `major`, `critical` |
598
+ | **Evidence anchor** | Specific reference (section, page, diagram ID) in the artifact supporting a score. Required by RULERS protocol |
599
+ | **Evidence class** | `observed` (directly stated), `inferred` (interpreted), or `external` (from outside the artifact) |
600
+ | **RULERS protocol** | Rubric Unification, Locking, and Evidence-anchored Robust Scoring — prevents LLM scoring drift via locked rubrics + mandatory evidence citation |
601
+ | **DAG evaluation flow** | 8-step evaluation sequence: structural validation → content extraction → criterion scoring → cross-reference validation → dimension aggregation → challenge pass → calibration → status determination |
602
+ | **Challenge pass** | Step 6 of the DAG: evaluator challenges their own highest and lowest scores for weak evidence or over-scoring |
603
+ | **Rubric locking** | Compiling rubrics into immutable specs before evaluation (`rubric_locked: true`). Changes require a version bump |
604
+ | **Decision tree** | IF/THEN scoring logic per criterion. Helps evaluators and agents resolve ambiguous cases consistently |
605
+ | **Cohen's kappa (κ)** | Inter-rater reliability measure. 0 = chance, 1 = perfect. EAROS target: κ > 0.70 (well-defined), > 0.50 (subjective) |
606
+ | **Weighted kappa** | Kappa variant for ordinal scales — treats adjacent disagreements (2 vs 3) as less severe than distant ones (1 vs 4) |
607
+ | **Wasserstein distance** | Metric used in RULERS calibration to align AI agent score distributions with human reviewer distributions |
608
+ | **Quality attribute** | Measurable system characteristic (e.g. "99.95% availability", "P99 < 200ms"). Must be quantified, not adjectival |
609
+ | **Fitness function** | Automated test/check that validates an architecture meets a quality attribute target. Used in CI/CD |
610
+ | **ADR** | Architecture Decision Record — captures a decision with context, options, rationale, consequences, and revisit triggers |
611
+ | **Golden path** | Opinionated, fully supported implementation path for a reference architecture pattern |
612
+
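The linear-weighted kappa described above can be sketched in a few stdlib-only lines for two raters (production calibration should go through `earos-calibrate`):

```python
from collections import Counter

def linear_weighted_kappa(r1, r2, levels):
    """Linear-weighted Cohen's kappa for two raters on an ordinal scale."""
    k, n = len(levels), len(r1)
    idx = {v: i for i, v in enumerate(levels)}
    # Agreement weights: adjacent disagreements are penalized less than distant ones.
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    # Observed weighted agreement.
    po = sum(w[idx[a]][idx[b]] for a, b in zip(r1, r2)) / n
    # Expected weighted agreement from each rater's marginal distribution.
    m1, m2 = Counter(r1), Counter(r2)
    pe = sum(w[i][j] * (m1[levels[i]] / n) * (m2[levels[j]] / n)
             for i in range(k) for j in range(k))
    return (po - pe) / (1 - pe)

# Perfect agreement yields the maximum value.
print(linear_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]))  # prints 1.0
```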
613
+ ---
614
+
615
+ ## Quick Reference
616
+
617
+ | Task | Where to start |
618
+ |------|---------------|
619
+ | Understand the full standard | `standard/EAROS.md` |
620
+ | Score an artifact (reference architecture or any other type) | `earos-assess` skill or `tools/scoring-sheets/EAROS_Scoring_Sheet_v2.xlsx` |
622
+ | Create a new rubric (profile, overlay, or core) | `earos-create` skill |
623
+ | Get YAML authoring help for an existing rubric design | `earos-profile-author` skill or `templates/new-profile.template.yaml` + `docs/profile-authoring-guide.md` |
624
+ | See a worked evaluation (solution architecture) | `examples/example-solution-architecture.evaluation.yaml` |
625
+ | See a gold-standard reference architecture artifact | `examples/aws-event-driven-order-processing/artifact.yaml` |
626
+ | See a gold-standard evaluation (calibration benchmark) | `examples/aws-event-driven-order-processing/evaluation.yaml` |
627
+ | See a gold-standard assessment report | `examples/aws-event-driven-order-processing/report.md` |
628
+ | Validate a rubric YAML | `earos-validate` skill or `standard/schemas/rubric.schema.json` |
629
+ | Validate an evaluation record | `standard/schemas/evaluation.schema.json` |
630
+ | Validate an artifact document | `standard/schemas/artifact.schema.json` |
631
+ | Calibrate | `earos-calibrate` skill or `calibration/gold-set/` |
632
+ | Generate an executive report | `earos-report` skill |
633
+ | Regenerate the manifest | `node tools/editor/bin.js manifest` |
634
+ | Add a new rubric to the manifest | `node tools/editor/bin.js manifest add <path>` |
635
+ | Check manifest-filesystem consistency | `node tools/editor/bin.js manifest check` |