@trohde/earos 1.1.1 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (76)
  1. package/README.md +1 -1
  2. package/assets/init/.agents/skills/earos-assess/SKILL.md +2 -2
  3. package/assets/init/.agents/skills/earos-assess/references/output-templates.md +61 -67
  4. package/assets/init/.agents/skills/earos-calibrate/SKILL.md +1 -1
  5. package/assets/init/.agents/skills/earos-calibrate/references/calibration-protocol.md +1 -1
  6. package/assets/init/.agents/skills/earos-create/references/validation-checklist.md +2 -2
  7. package/assets/init/.agents/skills/earos-profile-author/SKILL.md +1 -1
  8. package/assets/init/CLAUDE.md +46 -8
  9. package/assets/init/README.md +5 -6
  10. package/assets/init/core/core-meta-rubric.yaml +1 -1
  11. package/assets/init/docs/getting-started.md +20 -42
  12. package/assets/init/docs/onboarding/agent-assisted.md +2 -2
  13. package/assets/init/docs/onboarding/first-assessment.md +10 -4
  14. package/assets/init/docs/onboarding/governed-review.md +26 -16
  15. package/assets/init/docs/onboarding/overview.md +1 -49
  16. package/assets/init/docs/onboarding/scaling-optimization.md +2 -2
  17. package/assets/init/docs/profile-authoring-guide.md +6 -10
  18. package/assets/init/docs/terminology.md +4 -4
  19. package/assets/init/profiles/reference-architecture.yaml +1 -1
  20. package/assets/init/standard/EAROS.md +6 -16
  21. package/bin.js +1 -1
  22. package/dist/assets/{_basePickBy-BVu6YmSW.js → _basePickBy-PmSUrUsK.js} +1 -1
  23. package/dist/assets/{_baseUniq-CWRzQDz_.js → _baseUniq-HuZouVIz.js} +1 -1
  24. package/dist/assets/{arc-CyDBhtDM.js → arc-CJFxtF3d.js} +1 -1
  25. package/dist/assets/{architectureDiagram-2XIMDMQ5-BH6O4dvN.js → architectureDiagram-2XIMDMQ5-XA-oU2UG.js} +1 -1
  26. package/dist/assets/{blockDiagram-WCTKOSBZ-2xmwdjpg.js → blockDiagram-WCTKOSBZ-Oxp-wAST.js} +1 -1
  27. package/dist/assets/{c4Diagram-IC4MRINW-BNmPRFJF.js → c4Diagram-IC4MRINW-D8m5hQH9.js} +1 -1
  28. package/dist/assets/channel-SoktpVBQ.js +1 -0
  29. package/dist/assets/{chunk-4BX2VUAB-DGQTvirp.js → chunk-4BX2VUAB-D2kBTn2O.js} +1 -1
  30. package/dist/assets/{chunk-55IACEB6-DNMAQAC_.js → chunk-55IACEB6-Dxqrf5oZ.js} +1 -1
  31. package/dist/assets/{chunk-FMBD7UC4-BJbVTQ5o.js → chunk-FMBD7UC4-DoOEFFQC.js} +1 -1
  32. package/dist/assets/{chunk-JSJVCQXG-BCxUL74A.js → chunk-JSJVCQXG-BerphV2K.js} +1 -1
  33. package/dist/assets/{chunk-KX2RTZJC-H7wWZOfz.js → chunk-KX2RTZJC-CxUAqT05.js} +1 -1
  34. package/dist/assets/{chunk-NQ4KR5QH-BK4RlTQF.js → chunk-NQ4KR5QH-fCqZgFkU.js} +1 -1
  35. package/dist/assets/{chunk-QZHKN3VN-0chxDV5g.js → chunk-QZHKN3VN-HlpHnJEy.js} +1 -1
  36. package/dist/assets/{chunk-WL4C6EOR-DexfQ-AV.js → chunk-WL4C6EOR-D9yxAHyd.js} +1 -1
  37. package/dist/assets/classDiagram-VBA2DB6C-BT2AdZTe.js +1 -0
  38. package/dist/assets/classDiagram-v2-RAHNMMFH-BT2AdZTe.js +1 -0
  39. package/dist/assets/clone-DOjIfi5r.js +1 -0
  40. package/dist/assets/{cose-bilkent-S5V4N54A-DS2IOCfZ.js → cose-bilkent-S5V4N54A-F5xOBvqW.js} +1 -1
  41. package/dist/assets/{dagre-KLK3FWXG-BbSoTTa3.js → dagre-KLK3FWXG-CD3BTpHv.js} +1 -1
  42. package/dist/assets/{diagram-E7M64L7V-C9TvYgv0.js → diagram-E7M64L7V-C3D9MCay.js} +1 -1
  43. package/dist/assets/{diagram-IFDJBPK2-DowUMWrg.js → diagram-IFDJBPK2-zJBVM-GK.js} +1 -1
  44. package/dist/assets/{diagram-P4PSJMXO-BL6nrnQF.js → diagram-P4PSJMXO-BrmFZOLB.js} +1 -1
  45. package/dist/assets/{erDiagram-INFDFZHY-rXPRl8VM.js → erDiagram-INFDFZHY-aSMhKiV2.js} +1 -1
  46. package/dist/assets/{flowDiagram-PKNHOUZH-DBRM99-W.js → flowDiagram-PKNHOUZH-DwgX7l8F.js} +1 -1
  47. package/dist/assets/{ganttDiagram-A5KZAMGK-INcWFsBT.js → ganttDiagram-A5KZAMGK-C57Hz6QW.js} +1 -1
  48. package/dist/assets/{gitGraphDiagram-K3NZZRJ6-DMwpfE91.js → gitGraphDiagram-K3NZZRJ6-CuchqqGh.js} +1 -1
  49. package/dist/assets/{graph-DLQn37b-.js → graph-CPFGBV5J.js} +1 -1
  50. package/dist/assets/{index-BFFITMT8.js → index-DMt1cpG6.js} +124 -124
  51. package/dist/assets/{infoDiagram-LFFYTUFH-B0f4TWRM.js → infoDiagram-LFFYTUFH-Dd_5tfX7.js} +1 -1
  52. package/dist/assets/{ishikawaDiagram-PHBUUO56-CsU6XimZ.js → ishikawaDiagram-PHBUUO56-DwosSEvT.js} +1 -1
  53. package/dist/assets/{journeyDiagram-4ABVD52K-CQ7ibNib.js → journeyDiagram-4ABVD52K-BuCxcsX0.js} +1 -1
  54. package/dist/assets/{kanban-definition-K7BYSVSG-DzEN7THt.js → kanban-definition-K7BYSVSG-DF_1UCkW.js} +1 -1
  55. package/dist/assets/{layout-C0dvb42R.js → layout-DIcS6m1g.js} +1 -1
  56. package/dist/assets/{linear-j4a8mGj7.js → linear-BXkwBhoJ.js} +1 -1
  57. package/dist/assets/{mindmap-definition-YRQLILUH-DP8iEuCf.js → mindmap-definition-YRQLILUH-DcDvYagd.js} +1 -1
  58. package/dist/assets/{pieDiagram-SKSYHLDU-BpIAXgAm.js → pieDiagram-SKSYHLDU-BmeDeWDM.js} +1 -1
  59. package/dist/assets/{quadrantDiagram-337W2JSQ-DrpXn5Eg.js → quadrantDiagram-337W2JSQ-3zfjULUM.js} +1 -1
  60. package/dist/assets/{requirementDiagram-Z7DCOOCP-Bg7EwHlG.js → requirementDiagram-Z7DCOOCP-B2wQMJpq.js} +1 -1
  61. package/dist/assets/{sankeyDiagram-WA2Y5GQK-BWagRs1F.js → sankeyDiagram-WA2Y5GQK-__kKlCTq.js} +1 -1
  62. package/dist/assets/{sequenceDiagram-2WXFIKYE-q5jwhivG.js → sequenceDiagram-2WXFIKYE-B7O81Vih.js} +1 -1
  63. package/dist/assets/{stateDiagram-RAJIS63D-B_J9pE-2.js → stateDiagram-RAJIS63D-CcJaDrAK.js} +1 -1
  64. package/dist/assets/stateDiagram-v2-FVOUBMTO-B2goOPt-.js +1 -0
  65. package/dist/assets/{timeline-definition-YZTLITO2-dv0jgQ0z.js → timeline-definition-YZTLITO2-DSaQQqIU.js} +1 -1
  66. package/dist/assets/treemap-KZPCXAKY-9Hcrd8XD.js +162 -0
  67. package/dist/assets/{vennDiagram-LZ73GAT5-BdO5RgRZ.js → vennDiagram-LZ73GAT5-BqHNyca2.js} +1 -1
  68. package/dist/assets/{xychartDiagram-JWTSCODW-CpDVe-8v.js → xychartDiagram-JWTSCODW-BqeYf6Fk.js} +1 -1
  69. package/dist/index.html +1 -1
  70. package/package.json +1 -1
  71. package/dist/assets/channel-CiySTNoJ.js +0 -1
  72. package/dist/assets/classDiagram-VBA2DB6C-D7luWJQn.js +0 -1
  73. package/dist/assets/classDiagram-v2-RAHNMMFH-D7luWJQn.js +0 -1
  74. package/dist/assets/clone-ylgRbd3D.js +0 -1
  75. package/dist/assets/stateDiagram-v2-FVOUBMTO-Q_1GcybB.js +0 -1
  76. package/dist/assets/treemap-KZPCXAKY-Dt1dkIE7.js +0 -162
package/README.md CHANGED
@@ -55,7 +55,7 @@ The workspace is **agent-agnostic** — the `.agents/skills/` directory works wi
 |---------|-------------|
 | `earos` | Start the web editor (Express server, opens browser) |
 | `earos init [dir] [--icons]` | Scaffold a complete EaROS workspace in `dir` and optionally download architecture icon packages from AWS, Azure, and GCP into `icons/`, with stable aliases in `icons/aws/`, `icons/azure/`, and `icons/gcp/` |
- | `earos validate <file>` | Validate a rubric or evaluation YAML against EaROS schemas (exit 0/1) |
+ | `earos validate <file>` | Validate any EaROS YAML (rubric, evaluation, or artifact) against schemas (exit 0/1) |
 | `earos manifest` | Regenerate `earos.manifest.yaml` by scanning the filesystem |
 | `earos manifest add <file>` | Add a single file to the manifest |
 | `earos manifest check` | Verify the manifest matches the filesystem (exits non-zero on drift) |
package/assets/init/.agents/skills/earos-assess/SKILL.md CHANGED
@@ -156,11 +156,11 @@ Quick self-checks:

 ### Step 8 — Status Determination

- **Gates first** — check gate criteria before computing any weighted average. A single critical gate failure = Reject, no matter how high the average is.
+ **Gates first** — check gate criteria before computing any weighted average. A single critical gate failure blocks a passing status, no matter how high the average is. The specific outcome (`reject` or `not_reviewable`) is determined by the criterion's `failure_effect`.

 | Gate type | Effect |
 |-----------|--------|
- | `critical` failure | Status = `reject` regardless of average |
+ | `critical` failure | Status = `reject` or `not_reviewable` (per `failure_effect`) regardless of average |
 | `major` failure | Status cannot exceed `conditional_pass` |

 Then compute the weighted overall average and apply thresholds:
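To make the revised gate rule concrete, here is a minimal hypothetical sketch of the fields involved. The field names (`overall_score`, `gate_failures`, `overall_status`, `failure_effect`) and the 3.2 threshold come from this release's templates, but the list shape, criterion ID, and values below are illustrative assumptions, not package content:

```yaml
# Hypothetical illustration — a critical gate with failure_effect: reject
# forces the status even though the weighted average clears the 3.2 pass threshold.
overall_score: 3.4
gate_failures:
  - SCP-01          # assumed shape: list of failed gate criterion IDs
overall_status: reject
```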
package/assets/init/.agents/skills/earos-assess/references/output-templates.md CHANGED
@@ -13,36 +13,41 @@ The YAML record is the machine-readable, archivable output. It is the authoritat
 ### Full Template

 ```yaml
+ kind: evaluation
+
 evaluation_id: EVAL-[TYPE]-[NNNN]
 # Format: EVAL-SOL-0001 (solution), EVAL-REF-0001 (reference arch), EVAL-ADR-0001, etc.
 # Use a sequential number within the artifact type.

- rubric_id: [rubric IDs used, comma-separated]
- # Example: EAROS-CORE-002, EAROS-REFARCH-001
- # If overlays applied: EAROS-CORE-002, EAROS-SOL-001, EAROS-OVR-SEC-001
+ rubric_id: [primary rubric ID e.g. EAROS-CORE-002 or EAROS-SOL-001]
+ rubric_version: [version of the rubric used — e.g. 2.0.0]

- rubric_version: [version of the profile used]
- # Example: 2.0.0
+ # If profile and/or overlays are applied in addition to the core:
+ profiles_applied:
+ - [profile rubric ID — e.g. EAROS-REFARCH-001]
+ overlays_applied:
+ - [overlay rubric ID — e.g. EAROS-OVR-SEC-001]

- artifact_ref:
- id: [artifact identifier if one exists, or omit]
- title: [full title of the artifact as it appears in the document]
- artifact_type: [solution_architecture | reference_architecture | adr | capability_map | roadmap]
- owner: [team or individual named as owner in the artifact]
- uri: [repo path, URL, or file path — omit if not available]
+ artifact_id: [artifact identifier — e.g. SOL-ART-042]
+ artifact_type: [solution_architecture | reference_architecture | adr | capability_map | roadmap]
+ artifact_version: [version of the artifact being evaluated omit if not available]

 evaluation_date: '[YYYY-MM-DD]'
+ evaluation_mode: [human | agent | hybrid]

- evaluators:
- - name: EAROS evaluator
- role: rubric-evaluator
- mode: agent
+ evaluated_by:
+ - role: evaluator
+ actor: agent
+ identity: EAROS evaluator
 # If human also evaluated, add:
- # - name: [name]
- # role: domain architect
- # mode: human
+ # - role: evaluator
+ # actor: human
+ # identity: [name or role]
+ # If a challenge pass was performed:
+ # - role: challenger
+ # actor: agent

- status: [pass | conditional_pass | rework_required | reject | not_reviewable]
+ overall_status: [pass | conditional_pass | rework_required | reject | not_reviewable]

 overall_score: [weighted average to 1 decimal place — e.g. 2.8]
 # Compute: sum(dimension_score × weight) / sum(weights)
@@ -58,22 +63,27 @@ criterion_results:
 - criterion_id: [ID]
 # Use IDs from the rubric YAML, e.g. STK-01, SCP-01, TRC-01
 score: [0 | 1 | 2 | 3 | 4 | "N/A"]
- judgment_type: [observed | inferred | external | mixed | none]
- # 'mixed' when evidence combines observed and inferred
+ evidence_class: [observed | inferred | external]
+ # observed: directly supported by a quote from the artifact
+ # inferred: reasonable interpretation not directly stated
+ # external: judgment based on a standard or source outside the artifact
 confidence: [high | medium | low]
- evidence_sufficiency: [sufficient | partial | absent]
+ confidence_reason: "[why confidence is below high — omit if high]"
+ evidence_sufficiency: [sufficient | partial | insufficient | none]
 # sufficient: evidence supports the score without reservation
 # partial: evidence exists but is incomplete or ambiguous
- # absent: no evidence found; score is 0 or N/A
+ # insufficient: evidence exists but is too weak to confidently score
+ # none: no evidence found; score is 0 or N/A
 evidence_refs:
- - location: "[section heading, page number, or diagram label]"
- excerpt: "[direct quote or very close paraphrase from the artifact]"
+ - section: "[section heading or number]"
+ quotation: "[direct quote or very close paraphrase from the artifact]"
 # Add more refs if multiple evidence sources support the score
+ # Can also be a simple string: "Section 3.2, paragraph 2"
 rationale: >
 [1-3 sentences explaining why the evidence maps to this score level.
 Cite the specific evidence. Explain why it is not one level higher
 if the score is below 4.]
- missing_information:
+ evidence_gaps:
 - "[specific piece of information that would improve this score]"
 # Leave empty if score is 4 or N/A
 recommended_actions:
@@ -83,48 +93,29 @@ criterion_results:

 # Repeat for every criterion in core + profile + overlays

- dimension_scores:
+ dimension_results:
 - dimension_id: [D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9 | RA-D1 | etc.]
- dimension_name: [name from rubric]
- score: [weighted average of criteria in this dimension, 1 decimal place]
- weight: [weight from rubric YAML — default 1.0]
- summary: "[1 sentence summary of why this dimension scored this way]"
+ weighted_score: [weighted average of criteria in this dimension, 1 decimal place]

 # Repeat for every dimension in core + profile

- narrative_summary: |
+ decision_summary: >
 [2-3 paragraphs for a governance reviewer. Cover:
 1. What the artifact is, who it is for, and the overall verdict
 2. The most significant strengths (what was well done)
 3. The most significant weaknesses (what holds it back)
+ Address all three evaluation perspectives: artifact quality,
+ architectural fitness, and governance fit.
 Do NOT restate the criterion scores — synthesize them into a judgment.]

- summary:
- strengths:
- - "[Key strength — specific, not generic]"
- - "[Second strength]"
- weaknesses:
- - "[Key weakness — specific, not generic]"
- - "[Second weakness]"
- risks:
- - "[Risk that follows from a weakness — what could go wrong in delivery/governance]"
- next_actions:
- - "[Top-priority action]"
- - "[Second priority action]"
- decision_narrative: >
- [1-2 sentences on what happens next — should this go to governance board as-is,
- conditional on specific fixes, or returned for rework?]
-
 recommended_actions:
- - priority: 1
- criterion_id: [ID of the criterion this addresses]
- action: "[Specific, actionable step — verb-first]"
- owner_suggestion: "[Who should own this — team role, not individual]"
- - priority: 2
- criterion_id: [ID]
- action: "[Action]"
- owner_suggestion: "[Role]"
+ - "[Top-priority action — verb-first, specific]"
+ - "[Second priority action]"
 # Top 5 actions, ordered by impact on overall status
+
+ challenger_notes: >
+ [Findings from the challenge pass (Step 6). Which scores were
+ challenged, what was the outcome, and any adjustments made.]
 ```

 ---
@@ -135,7 +126,7 @@ recommended_actions:

 Use a sequential ID within the artifact type. If you don't have a numbering system, use the date: `EVAL-SOL-20260319-001`. The ID must be unique within the organization's evaluation records.

- ### status
+ ### overall_status

 The status is determined by gates first, then thresholds (Step 8 in SKILL.md). Do not set status until all gate checks and aggregation are complete. Common error: setting `conditional_pass` when a critical gate has failed — critical gate failure always = `reject`.

@@ -148,21 +139,23 @@ overall_score = sum(dimension_score × dimension_weight) / sum(dimension_weights

 Round to 1 decimal place. A score of 2.35 rounds to 2.4, which is the `conditional_pass` threshold — be precise.

- ### judgment_type
+ ### evidence_class

- This is the evidence class for the criterion as a whole. If all evidence is `observed`, use `observed`. If you used a mix of observed and inferred evidence to reach the score, use `mixed`. `none` means no evidence was found (score must be 0 or 1).
+ This is the evidence class for the criterion as a whole — `observed`, `inferred`, or `external`. Use the highest-credibility class that applies. If the primary evidence is a direct quote from the artifact, use `observed`. If you are interpreting content that is not directly stated, use `inferred`. If your judgment relies on a standard or source outside the artifact, use `external`.

 ### evidence_sufficiency

 This is your assessment of whether the evidence you found is adequate to confidently assign the score:
 - `sufficient` — the evidence clearly matches one level; you wouldn't expect a reviewer to disagree
 - `partial` — evidence exists but is ambiguous; a different reviewer might score differently
- - `absent` — no evidence was found; score is based on absence
+ - `insufficient` — evidence exists but is too weak to support the score with confidence
+ - `none` — no evidence was found; score is based on absence

- ### narrative_summary
+ ### decision_summary

 This is the most important text in the record for human reviewers. Write it for a governance board member who will skim the criterion table but read the narrative carefully. The narrative should:
 - Name what the artifact is and its governance context
+ - Address all three evaluation perspectives: artifact quality, architectural fitness, and governance fit
 - Identify the 2-3 things that most determine the outcome
 - Give a clear recommendation (proceed, fix X first, rework)

@@ -267,9 +260,9 @@ No gate failures. The artifact passes all gate checks.
 | Field | Value |
 |-------|-------|
 | Score | [0-4 or N/A] |
- | Evidence Class | [observed / inferred / external / none] |
+ | Evidence Class | [observed / inferred / external] |
 | Confidence | [high / medium / low] |
- | Evidence Sufficiency | [sufficient / partial / absent] |
+ | Evidence Sufficiency | [sufficient / partial / insufficient / none] |

 **Evidence:** [Section/location] — "[Direct quote or close paraphrase]"

@@ -288,7 +281,7 @@ No gate failures. The artifact passes all gate checks.
 ## Narrative Summary

 [2-3 paragraphs — synthesized judgment for a governance reviewer.
- Copy from the YAML narrative_summary field.]
+ Copy from the YAML decision_summary field.]

 ---

@@ -302,10 +295,11 @@ Copy from the YAML narrative_summary field.]
 Before submitting the YAML evaluation record, check:

 1. Every criterion in the loaded rubric files has a result entry
- 2. Every score has at least one `evidence_refs` entry (unless `evidence_class: none`)
+ 2. Every score has at least one `evidence_refs` entry (unless `evidence_sufficiency: none`)
 3. `gate_failures` matches the gate criteria that failed (not just any low score)
 4. `overall_score` is the weighted average, not a simple average
- 5. `status` was determined by gates first, then thresholds
- 6. The `narrative_summary` does not just list criterion scores — it synthesizes them
+ 5. `overall_status` was determined by gates first, then thresholds
+ 6. The `decision_summary` does not just list criterion scores — it synthesizes them
+ 7. All required schema fields are present: `kind`, `artifact_id`, `artifact_type`, `evaluated_by`, `evaluation_mode`, `overall_status`, `overall_score`

 The full JSON Schema for validation is at `standard/schemas/evaluation.schema.json`. If you have access to a YAML validator, validate the output before delivery.
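As a quick cross-check against checklist item 7, a minimal sketch of a record header using the renamed fields — every value below is a placeholder drawn from the examples in this template, not a real evaluation:

```yaml
kind: evaluation
evaluation_id: EVAL-SOL-0001
rubric_id: EAROS-CORE-002
rubric_version: 2.0.0
artifact_id: SOL-ART-042
artifact_type: solution_architecture
evaluation_date: '2026-03-19'
evaluation_mode: agent
evaluated_by:
  - role: evaluator
    actor: agent
    identity: EAROS evaluator
overall_status: conditional_pass
overall_score: 2.8
```

If the CLI is available, `earos validate <file>` (per the README change above) can be run against such a record to check it against the schemas.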
package/assets/init/.agents/skills/earos-calibrate/SKILL.md CHANGED
@@ -54,7 +54,7 @@ For each calibration artifact, run a full EAROS assessment using the `earos-asse

 **This step cannot be skipped or abbreviated.** Independent scoring is the entire point of calibration. If you score after seeing the benchmark, you measure nothing.

- > **For the full assessment protocol**, see `.claude/skills/earos-assess/SKILL.md`.
+ > **For the full assessment protocol**, see `.agents/skills/earos-assess/SKILL.md`.

 ---

package/assets/init/.agents/skills/earos-calibrate/references/calibration-protocol.md CHANGED
@@ -99,7 +99,7 @@ Before finalising scores, run the internal challenge:
 ### 2.4 Determine status for each artifact

 Apply the status thresholds:
- 1. Check gate failures first — any critical gate failure = reject
+ 1. Check gate failures first — any critical gate failure blocks a passing status (outcome per `failure_effect`: `reject` or `not_reviewable`)
 2. Check overall score: ≥ 3.2 = pass, 2.4–3.19 = conditional_pass, < 2.4 = rework_required
 3. Check dimension floor: no dimension < 2.0 for a pass status

package/assets/init/.agents/skills/earos-create/references/validation-checklist.md CHANGED
@@ -6,11 +6,11 @@ This checklist covers all pre-publication quality checks for a new EAROS rubric

 ## Quick Reference — What "Valid" Means

- A valid EAROS v2 rubric file:
+ A valid EAROS rubric file:
 1. Passes schema validation against `standard/schemas/rubric.schema.json`
 2. Has a unique rubric ID and unique criterion IDs (no conflicts across the entire repo)
 3. Has the correct YAML structure for its kind (profile, overlay, or core rubric)
- 4. Has all 13 v2 required fields on every criterion
+ 4. Has all 13 required fields on every criterion
 5. Has a calibrated gate model (not over- or under-gated)
 6. Does not duplicate what `EAROS-CORE-002` already covers

package/assets/init/.agents/skills/earos-profile-author/SKILL.md CHANGED
@@ -179,7 +179,7 @@ change_log:
 date: "[today]"
 author: "[author]"
 changes:
- - Initial profile for EAROS v2.0
+ - Initial profile
 ```

 ---
package/assets/init/CLAUDE.md CHANGED
@@ -1,6 +1,6 @@
 # CLAUDE.md — EAROS Project Guide

- **Enterprise Architecture Rubric Operational Standard · Version 2.0**
+ **Enterprise Architecture Rubric Operational Standard**

 This file tells Claude how to work effectively in this project.

@@ -327,10 +327,10 @@ Start from `templates/new-profile.template.yaml`. Set:
 - `design_method` from step 2
 - `rubric_id` using pattern `EAROS-<ARTIFACT>-<NNN>`

- ### Step 4 — Write 5–12 criteria
+ ### Step 4 — Write up to 12 criteria

 Rules:
- - Add **no more than 5–12 criteria** (the core already has 10)
+ - Add **no more than 12 criteria** (the core already has 10; built-in profiles use 3–9)
 - Every criterion needs: `question`, `description`, `scoring_guide` (all 5 levels 0–4), `required_evidence`, `anti_patterns`, `examples.good`, `examples.bad`, `decision_tree`, `remediation_hints`
 - Assign each criterion to a dimension with an appropriate `weight`
 - Designate gate types deliberately — not every criterion needs a gate; over-gating creates false rejects
@@ -467,7 +467,7 @@ The `rubric_locked: true` flag in `agent_evaluation` means an agent must not mod

 1. **Never collapse the three evaluation types.** Artifact quality, architectural fitness, and governance fit are distinct judgments. Never merge them into a single opaque score.

- 2. **Gates before averages.** Always check gates before computing a weighted average. A single critical gate failure = Reject, no matter how high the average.
+ 2. **Gates before averages.** Always check gates before computing a weighted average. A single critical gate failure blocks a passing status — the outcome (`Reject` or `Not Reviewable`) depends on the criterion's `failure_effect`.

 3. **Evidence first.** Every score requires a cited excerpt or reference. "Evidence: section 3 states X" is valid. "The artifact seems to address this" is not. Use RULERS anchoring.

@@ -489,7 +489,7 @@ The `rubric_locked: true` flag in `agent_evaluation` means an agent must not mod

 ## 10. The Reference Architecture Profile — Model for Other Profiles

- `profiles/reference-architecture.yaml` (`EAROS-REFARCH-001`) is the first full profile in EAROS v2 and serves as the reference implementation for how profiles should be built.
+ `profiles/reference-architecture.yaml` (`EAROS-REFARCH-001`) is the first full profile and serves as the reference implementation for how profiles should be built.

 **Why it is a good model:**
 - Uses `design_method: pattern_library` (Method E) — appropriate for recurring platform blueprints
@@ -522,7 +522,7 @@ This pattern — count observable features, branch on presence — is the right

 ## 11. Agent Skills

- The `.agents/skills/` directory contains Claude agent skills for working with EAROS. Each skill lives in its own subdirectory with a `SKILL.md` file. Skills are auto-triggered when their description matches the user's request — no slash command needed.
+ The `.agents/skills/` directory contains Claude Code skills for working with EAROS in this development repo. In scaffolded workspaces (`earos init`), skills live in `.agents/skills/` — an agent-agnostic convention readable by Cursor, Copilot, Windsurf, and other AI coding tools. Each skill lives in its own subdirectory with a `SKILL.md` file. Skills are auto-triggered when their description matches the user's request — no slash command needed.

 ```
 .agents/skills/
@@ -530,7 +530,7 @@ The `.agents/skills/` directory contains Claude agent skills for working with EA
 ├── earos-review/SKILL.md Challenger — audits an existing evaluation record for over-scoring and unsupported claims
 ├── earos-template-fill/SKILL.md Author guide — coaches artifact authors through writing assessment-ready documents
 ├── earos-create/SKILL.md Rubric creation — guided interview + YAML generation for profiles, overlays, and core rubrics
- ├── earos-profile-author/SKILL.md Profile YAML authoring — technical reference for v2 field structure and schema compliance
+ ├── earos-profile-author/SKILL.md Profile YAML authoring — technical reference for field structure and schema compliance
 ├── earos-calibrate/SKILL.md Calibration — runs calibration exercises and computes inter-rater reliability
 ├── earos-report/SKILL.md Reporting — generates executive reports from evaluation records
 ├── earos-validate/SKILL.md Health check — validates all YAML rubrics against schemas and checks consistency
@@ -592,7 +592,7 @@ The full glossary is in [`docs/terminology.md`](docs/terminology.md). It covers
 | Term | Definition |
 |------|------------|
 | **Core meta-rubric** | Universal foundation rubric (`EAROS-CORE-002`): 9 dimensions, 10 criteria, applied to every artifact |
- | **Profile** | Artifact-type extension of the core (5–12 extra criteria). Declares `inherits: [EAROS-CORE-002]` |
+ | **Profile** | Artifact-type extension of the core (additional criteria, typically 3–9). Declares `inherits: [EAROS-CORE-002]` |
 | **Overlay** | Cross-cutting concern extension (e.g. security). Applied by context, not artifact type. Uses `append_to_base_rubric` scoring |
 | **Gate** | Criterion-level control that blocks a passing status regardless of average. Types: `none`, `advisory`, `major`, `critical` |
 | **Evidence anchor** | Specific reference (section, page, diagram ID) in the artifact supporting a score. Required by RULERS protocol |
@@ -612,6 +612,43 @@ The full glossary is in [`docs/terminology.md`](docs/terminology.md). It covers

 ---

+ ## 14. Publishing the CLI to npm
+
+ The `@trohde/earos` CLI is published from `tools/editor/`. A GitHub Actions workflow (`.github/workflows/publish-npm.yml`) auto-publishes when the version in `tools/editor/package.json` changes on `master`.
+
+ ### When the user says "publish to npm"
+
+ 1. **Review all changes since the last publish** — run `git log` to see commits since the last `release:` commit
+ 2. **Choose the version bump** based on what changed:
+ - **patch** — bug fixes, documentation, typo fixes, dependency updates, minor UI tweaks
+ - **minor** — new features, new commands, new editor capabilities, new schema fields, new skills bundled in `assets/init/`
+ - **major** — breaking CLI changes (renamed commands, removed flags), breaking changes to `earos init` scaffold structure, incompatible schema changes
+ 3. **Bump, commit, and push:**
+ ```bash
+ cd tools/editor && npm run version:patch # or version:minor / version:major
+ cd ../..
+ git add tools/editor/package.json
+ git commit -m "release: v<NEW_VERSION>"
+ git push origin master
+ ```
+ 4. **Watch the workflow** — `gh run watch` on the triggered run to confirm publish succeeds
+ 5. **Report the result** — tell the user the new version and confirm it's live
+
+ ### Version scripts (in `tools/editor/`)
+
+ | Script | Effect |
+ |--------|--------|
+ | `npm run version:patch` | Bump patch (1.0.1 → 1.0.2) |
+ | `npm run version:minor` | Bump minor (1.0.2 → 1.1.0) |
+ | `npm run version:major` | Bump major (1.1.0 → 2.0.0) |
+ | `npm run release:patch` | Bump + publish locally (bypasses CI) |
+ | `npm run release:minor` | Bump + publish locally (bypasses CI) |
+ | `npm run release:major` | Bump + publish locally (bypasses CI) |
+
+ **CI token note:** The `NPM_TOKEN` GitHub secret holds a granular access token with "Bypass 2FA" enabled, scoped to `@trohde`. It expires periodically and must be rotated on npmjs.com → Access Tokens.
+
+ ---
+
 ## Quick Reference

 | Task | Where to start |
@@ -633,3 +670,4 @@ The full glossary is in [`docs/terminology.md`](docs/terminology.md). It covers
 | Regenerate the manifest | `node tools/editor/bin.js manifest` |
 | Add a new rubric to the manifest | `node tools/editor/bin.js manifest add <path>` |
 | Check manifest-filesystem consistency | `node tools/editor/bin.js manifest check` |
+ | Publish CLI to npm | Say "publish to npm" — Claude chooses version bump, commits, pushes, CI publishes |
package/assets/init/README.md CHANGED
@@ -1,10 +1,9 @@
 # EaROS — Enterprise Architecture Rubric Operational Standard

 [![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
- [![Version](https://img.shields.io/badge/Version-2.0.0-blue.svg)](CHANGELOG.md)
 [![GitHub](https://img.shields.io/badge/GitHub-ThomasRohde%2FEAROS-blue?logo=github)](https://github.com/ThomasRohde/EAROS)

- **Version 2.0.0 · March 2026** · [github.com/ThomasRohde/EAROS](https://github.com/ThomasRohde/EAROS)
+ **March 2026** · [github.com/ThomasRohde/EAROS](https://github.com/ThomasRohde/EAROS)

 EaROS is a structured, extensible framework for evaluating enterprise architecture artifacts. It provides a universal rubric foundation, artifact-specific profiles, and cross-cutting overlays that together enable consistent, evidence-anchored assessment — by human reviewers and AI agents alike.

@@ -367,7 +366,7 @@ flowchart LR
 style S8 fill:#4caf50,stroke:#2e7d32,color:#fff
 ```

- Calibrate your agent against `calibration/gold-set/` before production use. Target inter-rater reliability of Cohen's κ > 0.70.
+ Calibrate your agent before production use. Start with the benchmark artifact in `examples/aws-event-driven-order-processing/`, then populate `calibration/gold-set/` with your own reference artifacts. Target inter-rater reliability of Cohen's κ > 0.70.

 ---

@@ -450,9 +449,9 @@ Use `kind: overlay` and `artifact_type: any`. Overlays use `scoring.method: appe

 ### Calibrating Before Production

- 1. Score the artifacts in `calibration/gold-set/` independently
- 2. Compare against reference scores using `calibration/results/`
- 3. Resolve disagreements against the level descriptors
+ 1. Start with the benchmark at `examples/aws-event-driven-order-processing/`, then add your own artifacts to `calibration/gold-set/`
+ 2. Have 2+ reviewers score each artifact independently
+ 3. Compare scores and resolve disagreements against the level descriptors
 4. Iterate until κ > 0.70 on well-defined criteria, > 0.50 on subjective ones

 ---
package/assets/init/core/core-meta-rubric.yaml CHANGED
@@ -635,7 +635,7 @@ change_log:
 - Added reliability_targets to scoring
 - Added evidence_class and evidence_anchors to outputs
 - Added DAG evaluation steps
- - Updated from EAROS v1 based on 63-source research programme
+ - Incorporated findings from 63-source research programme
 - version: "1.0.0"
 date: "2026-03-16"
 author: "Thomas Rohde"
package/assets/init/docs/getting-started.md CHANGED
@@ -2,8 +2,6 @@

 This guide walks you through your first architecture artifact assessment using EaROS. By the end, you will have scored an artifact, produced a structured evaluation record, and know how to interpret the results.

- ---
-
 ## Before You Start

 **What you need:**
@@ -15,24 +13,22 @@ This guide walks you through your first architecture artifact assessment using E
 - A scoring sheet to record your evidence and scores
 - Clear pass/fail thresholds

- ---
-
 ## Step 1: Identify the Artifact Type

 EaROS has profiles for the most common enterprise architecture artifact types:

- | Artifact Type | Profile to Use |
- |--------------|----------------|
- | Solution architecture document | `profiles/solution-architecture.yaml` |
- | Reference architecture | `profiles/reference-architecture.yaml` |
- | Architecture Decision Record (ADR) | `profiles/adr.yaml` |
- | Capability map | `profiles/capability-map.yaml` |
- | Architecture roadmap | `profiles/roadmap.yaml` |
- | Other / unknown | Core only: `core/core-meta-rubric.yaml` |
+ | Artifact Type | Profile to Use | Status |
+ |--------------|----------------|--------|
+ | Solution architecture document | `profiles/solution-architecture.yaml` | Approved |
+ | Reference architecture | `profiles/reference-architecture.yaml` | Draft |
+ | Architecture Decision Record (ADR) | `profiles/adr.yaml` | Approved |
+ | Capability map | `profiles/capability-map.yaml` | Approved |
+ | Architecture roadmap | `profiles/roadmap.yaml` | Draft |
+ | Other / unknown | Core only: `core/core-meta-rubric.yaml` | --- |

- If your artifact does not match any profile, apply only the core rubric. The core dimensions are universal.
+ > **Status:** *Approved* profiles have completed calibration. *Draft* profiles are usable but have not yet been calibrated with inter-rater reliability measured. Check `earos.manifest.yaml` for the latest status of each rubric.

- ---
+ If your artifact does not match any profile, apply only the core rubric. The core dimensions are universal.

 ## Step 2: Select Your Rubric Set

@@ -60,14 +56,11 @@ overlays/regulatory.yaml ← if the design is subject to compliance requir

 Apply overlays selectively. Not every artifact needs every overlay.

- ---
-
 ## Step 3: Open the Scoring Sheet

- Open the appropriate Excel scoring sheet from `tools/scoring-sheets/`:
+ Open the Excel scoring sheet from `tools/scoring-sheets/`:

- - **`EAROS_Scoring_Sheet_v2.xlsx`** — use for most artifact types
- - **`EAROS_RefArch_Scoring_Sheet.xlsx`** — use specifically for reference architectures
+ - **`EAROS_Scoring_Sheet_v2.xlsx`** — general-purpose, works for all artifact types

 The scoring sheet has:
 - One tab per rubric section (core dimensions + profile dimensions)
@@ -75,8 +68,6 @@ The scoring sheet has:
 - Evidence fields for recording your cited text or reference
 - An automatic aggregation tab that calculates the weighted score and indicates the pass threshold

- ---
-
 ## Step 4: Read the Rubric, Then Read the Artifact

 Open the relevant YAML rubric files. For each criterion, familiarise yourself with:
@@ -87,8 +78,6 @@ Open the relevant YAML rubric files. For each criterion, familiarise yourself wi

 **Then read the artifact end-to-end** before scoring. Do not score as you read on the first pass. Form an overall impression first, then return to score criterion by criterion.

- ---
-
 ## Step 5: Score Each Criterion

 For each criterion:
@@ -109,35 +98,30 @@ For each criterion:

 You read the artifact and find a scope statement that defines what is in scope but does not list explicit exclusions. → **Score: 3** → Record: "Section 1.2: scope statement defines in-scope components but exclusions are not listed."

- ---
-
 ## Step 6: Check the Gates

 Before calculating the aggregate, check every criterion with a `gate` object (not `gate: false`) in the rubric files. Gate behaviour depends on severity:

- - **`critical`** — Any score below the threshold triggers an immediate **Reject**, regardless of the aggregate score.
+ - **`critical`** — Any score below the threshold blocks passing. The gate's `failure_effect` determines the outcome: **Reject** (mandatory control breach) or **Not Reviewable** (evidence too incomplete to score).
 - **`major`** — A weak score (typically < 2) caps the status at **Conditional Pass** at best; cannot achieve a Pass.
 - **`advisory`** — Triggers a recommendation but does not cap the status.

 Gate criteria represent non-negotiable minimums on their respective concern. A critical gate failure means the artifact has a fundamental deficiency that makes it unsuitable for its purpose.

- ---
-
 ## Step 7: Determine the Status

 The scoring sheet calculates the weighted dimension average automatically. Read the status from the aggregation tab:

 | Weighted Average | Status |
 |-----------------|--------|
- | ≥ 3.2 | **Pass** |
- | 2.4 – 3.19 | **Conditional Pass** |
+ | ≥ 3.2 (no critical gate failure, no dimension < 2.0) | **Pass** |
+ | 2.4 – 3.19 (no critical gate failure) | **Conditional Pass** |
 | < 2.4 | **Rework Required** |
- | Any gate at 0 | **Reject** |
+ | Critical gate failure (mandatory control breach) | **Reject** |
+ | Critical gate failure (evidence too incomplete to score) | **Not Reviewable** |

 **Conditional Pass** means the artifact is acceptable for use but has identified remediation items that must be addressed before the next formal review. Document each item with the criterion ID, the score, and the specific improvement needed.

- ---
-
 ## Step 8: Write the Evaluation Record

 Use `templates/evaluation-record.template.yaml` to produce a structured evaluation record. See `examples/example-solution-architecture.evaluation.yaml` for a completed example.
@@ -154,8 +138,6 @@ The evaluation record captures:

 Store completed evaluation records with the artifact or in your architecture governance system.

- ---
-
 ## Interpreting Results

 ### Pass
@@ -170,8 +152,6 @@ The artifact has pervasive or significant gaps. Return it to the author with the
 ### Reject
 The artifact has failed one or more gate criteria, indicating a fundamental deficiency. Reject means the artifact should not be used or progressed until the gate issue is fully resolved. A gate failure is not about quality level — it is about something that makes the artifact unsuitable for its purpose.

- ---
-
 ## Calibrating Your Assessments

 If you are introducing EaROS to a team or beginning to use it for formal governance, calibrate before going live:
@@ -184,11 +164,9 @@ If you are introducing EaROS to a team or beginning to use it for formal governa

 Target inter-rater reliability: Cohen's κ > 0.70 for well-defined criteria.

- ---
-
 ## Next Steps

 - **Create a profile** for an artifact type not yet covered → [`docs/profile-authoring-guide.md`](profile-authoring-guide.md)
- - **Set up AI-agent assessment** → [`README.md`](../README.md#ai-agent-assessment) and [`standard/EAROS.md`](../standard/EAROS.md)
- - **Review the research behind EaROS** → [`research/`](../research/)
- - **Run a team calibration session** → [`calibration/`](../calibration/)
+ - **Set up AI-agent assessment** → [`standard/EAROS.md`](../standard/EAROS.md)
+ - **Review the research behind EaROS** → `research/` directory in the repository
+ - **Run a team calibration session** → `calibration/` directory in the repository
package/assets/init/docs/onboarding/agent-assisted.md CHANGED
@@ -28,7 +28,7 @@ Agent evaluations follow an 8-step directed acyclic graph (DAG). Each step must

 **Step 7 --- Calibration.** The agent aligns its score distribution to reference human distributions using the Wasserstein-based method (`rulers_wasserstein`). This prevents systematic over-scoring or under-scoring relative to human reviewers.

- **Step 8 --- Status Determination.** Gates are checked first (critical gate failure equals Reject), then the weighted average is computed and applied against the status thresholds.
+ **Step 8 --- Status Determination.** Gates are checked first (a critical gate failure blocks a passing status --- the specific outcome, `Reject` or `Not Reviewable`, is determined by the criterion's `failure_effect`), then the weighted average is computed and applied against the status thresholds.

 > **The DAG is not optional.** Skipping steps --- particularly the challenge pass (Step 6) --- undermines evaluation quality. An agent evaluation without a challenge pass is an unchecked evaluation.

@@ -36,7 +36,7 @@ Agent evaluations follow an 8-step directed acyclic graph (DAG). Each step must

 ### With Claude Code

- The `earos init` command scaffolds agent skills into `.claude/skills/` in your workspace. These are ready to use immediately:
+ The `earos init` command scaffolds agent skills into `.agents/skills/` in your workspace. These are ready to use immediately:

 ```bash
 earos init my-workspace