loreli 1.0.0 → 2.0.0 (npm version diff)

Files changed (63)

package/README.md +66 -26
package/package.json +17 -14
package/packages/action/prompts/action.md +172 -0
package/packages/action/src/index.js +33 -5
package/packages/agent/README.md +107 -18
package/packages/agent/src/backends/claude.js +111 -11
package/packages/agent/src/backends/codex.js +78 -5
package/packages/agent/src/backends/cursor.js +104 -27
package/packages/agent/src/backends/index.js +162 -5
package/packages/agent/src/cli.js +80 -3
package/packages/agent/src/discover.js +396 -0
package/packages/agent/src/factory.js +39 -34
package/packages/agent/src/models.js +24 -6
package/packages/classify/README.md +136 -0
package/packages/classify/prompts/blocker.md +12 -0
package/packages/classify/prompts/feedback.md +14 -0
package/packages/classify/prompts/pane-state.md +20 -0
package/packages/classify/src/index.js +81 -0
package/packages/config/README.md +156 -91
package/packages/config/src/defaults.js +32 -21
package/packages/config/src/index.js +33 -2
package/packages/config/src/schema.js +57 -39
package/packages/hub/src/github.js +59 -20
package/packages/identity/README.md +1 -1
package/packages/identity/src/index.js +2 -2
package/packages/knowledge/README.md +86 -106
package/packages/knowledge/src/index.js +56 -225
package/packages/mcp/README.md +51 -7
package/packages/mcp/instructions.md +6 -1
package/packages/mcp/scaffolding/loreli.yml +115 -77
package/packages/mcp/scaffolding/mcp-configs/.codex/config.toml +1 -0
package/packages/mcp/scaffolding/mcp-configs/.cursor/mcp.json +4 -1
package/packages/mcp/scaffolding/mcp-configs/.mcp.json +4 -1
package/packages/mcp/src/index.js +45 -16
package/packages/mcp/src/tools/agent-context.js +44 -0
package/packages/mcp/src/tools/agents.js +34 -13
package/packages/mcp/src/tools/context.js +3 -2
package/packages/mcp/src/tools/github.js +11 -47
package/packages/mcp/src/tools/hitl.js +19 -6
package/packages/mcp/src/tools/index.js +2 -1
package/packages/mcp/src/tools/refactor.js +227 -0
package/packages/mcp/src/tools/repo.js +44 -0
package/packages/mcp/src/tools/start.js +159 -90
package/packages/mcp/src/tools/status.js +5 -2
package/packages/mcp/src/tools/work.js +18 -8
package/packages/orchestrator/src/index.js +345 -79
package/packages/planner/README.md +84 -1
package/packages/planner/prompts/plan-reviewer.md +109 -0
package/packages/planner/prompts/planner.md +191 -0
package/packages/planner/prompts/tiebreaker-reviewer.md +71 -0
package/packages/planner/src/index.js +326 -111
package/packages/review/README.md +2 -2
package/packages/review/prompts/reviewer.md +158 -0
package/packages/review/src/index.js +196 -76
package/packages/risk/README.md +81 -22
package/packages/risk/prompts/risk.md +272 -0
package/packages/risk/src/index.js +44 -33
package/packages/tmux/src/index.js +61 -12
package/packages/workflow/README.md +18 -14
package/packages/workflow/prompts/preamble.md +14 -0
package/packages/workflow/src/index.js +191 -12
package/packages/workspace/README.md +2 -2
package/packages/workspace/src/index.js +69 -18

package/packages/risk/README.md CHANGED

@@ -4,21 +4,82 @@ Risk assessment workflow for Loreli's orchestration pipeline. Extends the `Workf
 ## Research Findings
-No existing npm packages perform LLM-driven PR risk assessment with label-based routing. This is domain-specific to Loreli's adversarial review model.
+No existing npm packages perform LLM-driven PR risk assessment with label-based routing. This is domain-specific to Loreli's adversarial review model. Calibration techniques (file categorization, test coverage gap detection, numeric anchors, structured per-criterion scoring) were informed by the [PR Impact](https://github.com/ducdmdev/pr-impact) skill.
 ## How It Works
 RiskWorkflow runs **before** ReviewWorkflow in the reactor chain. When a new PR appears from an action agent, the risk workflow:
-1. **Assess** — Dispatches an opposing-provider risk agent with the PR's diff, file stats, linked issue body, and planning objective.
+1. **Assess** — Dispatches an opposing-provider risk agent with the PR's diff, file stats, aggregate stats, linked issue body, and planning objective.
 2. **Verdict** — Reads the risk agent's verdict comment (posted via the `comment` MCP tool with `risk: true`). The comment tool applies the appropriate GitHub label as a side effect.
 3. **Route** — Based on the verdict label:
    - `loreli:low-risk` — PR passes through to `ReviewWorkflow.scan()` for normal reviewer dispatch.
    - `loreli:medium-risk` — PR passes through with risk context attached to the reviewer prompt.
    - `loreli:critical-risk` — PR is escalated to HITL. No reviewer is dispatched.
+   - `loreli:risk-unassessed` — Assessment failed (dispatch error, timeout, dead agent). Escalated to HITL as a fail-safe.
 The contract between risk and review is **GitHub labels** — visible, auditable, and filterable. No shared in-memory state is needed.
+### Fail-Safe Behavior
+When risk assessment cannot complete (dispatch failure, stall timeout, or dead agent reap), the workflow applies `loreli:risk-unassessed` instead of `loreli:low-risk`. This ensures unassessed PRs are escalated to human review rather than silently passing through to automated review.
+## Prompt Structure
+The risk prompt (`prompts/risk.md`) uses a structured multi-step evaluation process:
+### Step 0 — Evidence Sufficiency
+Before any evaluation, the agent checks whether the diff and context are complete. Truncated diffs or missing context prevent LOW verdicts and may force CRITICAL when combined with destructive signals.
+### Step 1 — File Categorization
+Every changed file is classified into one of five categories:
+| Category | Examples |
+|----------|----------|
+| **source** | `.js`, `.ts`, `.jsx`, `.tsx`, `.mjs`, `.cjs` (excluding tests) |
+| **test** | Files in `test/`, `__tests__/`, or matching `*.test.*`, `*.spec.*` |
+| **docs** | `.md`, `.mdx`, `README`, `CHANGELOG` |
+| **config** | `package.json`, lock files, CI workflows, `Dockerfile`, `tsconfig.json` |
+| **other** | Assets, generated files, data |
+Tallies are referenced by subsequent criteria.
+### Step 2 — Evaluate Criteria
+Seven criteria are evaluated in order, each assigned a signal level:
+| Level | Meaning |
+|-------|---------|
+| **none** | No concern detected |
+| **minor** | Small signal, unlikely to affect verdict |
+| **elevated** | Meaningful concern, contributes to MEDIUM |
+| **severe** | Strong signal, contributes to CRITICAL or demands justification |
+The criteria:
+1. **Proportionality** — Are changes proportional to the objective? Calibration: >20 files = large, >500 lines = substantial, >1000 = outsized.
+2. **Destructiveness** — Does the PR remove significant code without replacement?
+3. **Infrastructure Impact** — Does it modify files with outsized blast radius (`package.json`, lock files, CI, `Dockerfile`, etc.)?
+4. **Reversibility** — How difficult to undo?
+5. **Scope Creep** — Changes unrelated to the issue?
+6. **Documentation Gap** — Architectural changes without corresponding docs?
+7. **Test Coverage Gap** — Source changes without corresponding test file changes? Zero test files alongside source changes = elevated at minimum.
+### Hard Escalation Triggers
+CRITICAL is assigned immediately when any of these hold:
+1. Objective requests minor/docs-only work but PR is severely destructive or infrastructure-impacting.
+2. Critical infrastructure/security files are removed without objective support.
+3. Two or more criteria rated severe.
+4. Evidence is incomplete with elevated/severe destructiveness or infrastructure risk.
+### Step 3 — Structured Verdict
+The agent posts a structured comment containing file category tallies, per-criterion signal levels with evidence, escalation triggers fired, and a verdict paragraph.
 ## API Reference
 ### `RiskWorkflow` (extends Workflow)
@@ -41,13 +102,13 @@ const risk = new RiskWorkflow(orchestrator, hub);
 #### `risk.assess(repo)` → Promise\<Array\<{pr, agent}\>\>
 Scan for new PRs from action agents and dispatch risk agents. Skips PRs that:
-- Already have a risk label (previously assessed)
+- Already have a risk label or `risk-unassessed` label
 - Are currently being assessed (`_assessing` map)
 - Were already assessed (`_assessed` set)
-Returns early with an empty array when `review.skipRiskAssessment` is `true`.
+Returns early with an empty array when `workflows.risk.skip` is `true`.
-When enlisting a risk agent fails, the PR is marked as assessed so `ReviewWorkflow.scan()` can proceed without the risk gate blocking it.
+When dispatch fails, the PR receives `loreli:risk-unassessed` (fail-safe) and is marked assessed so `ReviewWorkflow.scan()` can escalate to HITL.
 #### `risk.verdict(repo)` → Promise\<Array\<{pr, level}\>\>
@@ -57,11 +118,11 @@ Check for risk verdicts on PRs being assessed. For each PR in `_assessing`:
 - Kills the risk agent
 - Moves the PR from `_assessing` to `_assessed`
-If the stall timeout is exceeded without a verdict, the PR is marked as assessed with level `TIMEOUT`.
+If the stall timeout is exceeded without a verdict, the PR receives `loreli:risk-unassessed` (fail-safe) and is marked assessed with level `TIMEOUT`.
 #### `risk.reap(repo)` → Promise\<void\>
-Clean up stale risk assessments. Risk agents that die without posting a verdict leave PRs stuck in `_assessing`. This handler checks for dead agents (no longer in the orchestrator's agents map), marks their PRs as assessed, and applies a fallback `loreli:low-risk` label so `ReviewWorkflow.scan()` can proceed. Without the fallback label, the PR would be assessed in memory but unlabeled on GitHub — permanently skipped by the review label gate.
+Clean up stale risk assessments. Risk agents that die without posting a verdict leave PRs stuck in `_assessing`. This handler checks for dead agents (no longer in the orchestrator's agents map), marks their PRs as assessed, and applies `loreli:risk-unassessed` so `ReviewWorkflow.scan()` escalates to HITL. Without the fail-safe label, the PR would be assessed in memory but unlabeled on GitHub — permanently skipped by the review label gate.
 #### `risk.closesIssue(body)` → number|null
@@ -91,29 +152,27 @@ These must be registered **before** review handlers in the orchestrator so label
 | Key | Type | Default | Description |
 |-----|------|---------|-------------|
-| `review.skipRiskAssessment` | `boolean` | `false` | When `true`, disables the risk workflow entirely. `assess()` returns early and `ReviewWorkflow.scan()` skips the label gate. |
-## Prompt Template
+| `workflows.risk.skip` | `boolean` | `false` | When `true`, disables the risk workflow entirely. `assess()` returns early and `ReviewWorkflow.scan()` skips the label gate. |
-The risk prompt (`prompts/risk.md`) provides the risk agent with:
-- The original planning objective
-- The linked issue body
-- PR metadata (number, title, branch, author)
-- File change stats (filename, status, additions, deletions)
-- Unified diff
+## Label Contract
-The agent posts its verdict using the `comment` MCP tool with `risk: true` and the appropriate `level`. The comment tool applies the `loreli:{level}-risk` label automatically.
+| Label | Applied By | Meaning | Review Routing |
+|-------|-----------|---------|----------------|
+| `loreli:low-risk` | Comment tool (agent verdict) | PR cleared | Normal reviewer dispatch |
+| `loreli:medium-risk` | Comment tool (agent verdict) | PR cleared with concerns | Reviewer dispatch with risk context warning |
+| `loreli:critical-risk` | Comment tool (agent verdict) | PR dangerous | Escalated to HITL, no reviewer |
+| `loreli:risk-unassessed` | RiskWorkflow (fail-safe) | Assessment failed | Escalated to HITL, no reviewer |
 ## Errors
 | Error | When | Resolution |
 |-------|------|------------|
-| Enlist failure | No opposing-provider backend available | PR marked as assessed; review proceeds without risk gate |
-| Dispatch failure | Context gathering (diff, files, issue) fails | Risk agent killed; PR marked as assessed |
-| Stall timeout | Risk agent doesn't respond within stall timeout | PR marked as assessed with `TIMEOUT` level |
+| Enlist failure | No opposing-provider backend available | PR labeled `risk-unassessed`; escalated to HITL |
+| Dispatch failure | Context gathering (diff, files, issue) fails | Risk agent killed; PR labeled `risk-unassessed` |
+| Stall timeout | Risk agent doesn't respond within stall timeout | PR labeled `risk-unassessed` with `TIMEOUT` level |
 ## Scope Boundary
-**In scope**: Risk agent dispatch, verdict parsing, label routing, stale assessment cleanup, objective resolution, issue linkage parsing.
+**In scope**: Risk agent dispatch, verdict parsing, label routing, stale assessment cleanup, objective resolution, issue linkage parsing, fail-safe escalation for unassessed PRs.
-**Out of scope**: Reviewer dispatch (review package), label application (comment tool in mcp package), HITL escalation mechanics (review package).
+**Out of scope**: Reviewer dispatch (review package), label application for verdicts (comment tool in mcp package), HITL escalation mechanics (review package).
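The Label Contract table in the README diff above can be read as a small lookup from label to review routing. The following is a minimal sketch of that reading, not the package's implementation: the real routing lives in `ReviewWorkflow.scan()`, which this diff does not show. The names `ROUTES` and `routeFor`, the route strings, the `'pending'` result, and the "most restrictive label wins" rule are all assumptions made for illustration.

```javascript
// Hypothetical lookup mirroring the Label Contract table.
// Route names are illustrative, not identifiers from the loreli source.
const ROUTES = {
  'loreli:low-risk': 'review',
  'loreli:medium-risk': 'review-with-warning',
  'loreli:critical-risk': 'hitl',
  'loreli:risk-unassessed': 'hitl'
};

// Labels may be plain strings or GitHub label objects, as handled in index.js.
function routeFor(labels) {
  const names = (labels ?? []).map(function toName(l) { return l.name ?? l; });

  // Assumption: when several risk labels are present, the most restrictive wins.
  if (names.some(function isHitl(n) { return ROUTES[n] === 'hitl'; })) return 'hitl';
  if (names.includes('loreli:medium-risk')) return 'review-with-warning';
  if (names.includes('loreli:low-risk')) return 'review';

  // No risk label yet: the PR is still waiting on RiskWorkflow.
  return 'pending';
}

console.log(routeFor([{ name: 'loreli:risk-unassessed' }]));
console.log(routeFor(['loreli:medium-risk', 'bug']));
```

Under these assumptions, a PR carrying `loreli:risk-unassessed` is routed the same way as `loreli:critical-risk`, which is the fail-safe behavior the README describes.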

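The README lists `risk.closesIssue(body)` → number|null, and the source diff further down shows the `CLOSES_RE` pattern in a hunk header. The method body itself is not part of this diff, so the following is only a plausible sketch of how such a regex could be applied; the standalone function form and the handling of a missing body are assumptions.

```javascript
// Pattern copied from the hunk header in packages/risk/src/index.js.
const CLOSES_RE = /(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)/i;

// Hypothetical standalone version of closesIssue: returns the first
// linked issue number found in a PR body, or null when none is present.
function closesIssue(body) {
  const match = CLOSES_RE.exec(body ?? '');
  return match ? Number(match[1]) : null;
}

console.log(closesIssue('Fixes #42 by adding retry logic'));
console.log(closesIssue('Refactor only, no linked issue'));
```

Only the first keyword match is returned in this sketch; a PR body that closes several issues would need a global regex and a loop.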
package/packages/risk/prompts/risk.md ADDED

@@ -0,0 +1,272 @@
+You are a risk assessment agent evaluating a pull request for safety before a reviewer is dispatched.
+<instructions>
+## Task
+Analyze the PR below and render a risk verdict: **LOW**, **MEDIUM**, or **CRITICAL**.
+- **LOW**: Changes are proportional, non-destructive, well-scoped, and adequately tested. Proceed to normal review.
+- **MEDIUM**: Some concerns exist (large scope, infrastructure files touched, moderate deletion, missing test coverage) but the changes appear justified. Proceed to review with a risk warning attached.
+- **CRITICAL**: The PR is disproportionate, destructive, or dangerous. Escalate to a human immediately. Do NOT proceed to automated review.
+Your verdict determines the PR's path: LOW/MEDIUM proceed to review, CRITICAL escalates to a human. Be calibrated — false CRITICAL verdicts block work, false LOW verdicts let dangerous changes through to automated review.
+## Step 0 — Evidence Sufficiency (Fail-Safe)
+Before assigning any level, check whether the provided evidence is complete enough for a confident decision.
+- If the unified diff includes a truncation marker (for example, `... [diff truncated at N bytes]`), evidence is incomplete.
+- If objective or issue context is missing/placeholder-only, evidence is incomplete.
+- If file changes appear disproportionately large relative to visible diff content, treat evidence as incomplete.
+Rules:
+- With incomplete evidence, you **must not** assign LOW.
+- If incomplete evidence combines with elevated/severe destructiveness or infrastructure signals, assign **CRITICAL**.
+- Otherwise, assign **MEDIUM** and include a warning that escalation may be required after full diff inspection.
+## Step 1 — Categorize Files
+Before evaluating risk, classify every file in the File Changes list into one of these categories:
+- **source**: Production code (`.js`, `.ts`, `.jsx`, `.tsx`, `.mjs`, `.cjs` — excluding test files)
+- **test**: Files in `test/`, `__tests__/`, or matching `*.test.*`, `*.spec.*`
+- **docs**: Documentation (`.md`, `.mdx`, `README`, `CHANGELOG`)
+- **config**: `package.json`, lock files, `.github/workflows/*`, `Dockerfile`, `tsconfig.json`, `.eslintrc.*`, build/bundler configs
+- **other**: Everything else (assets, generated files, data)
+Tally each category with its aggregate additions and deletions. You will reference these tallies throughout the evaluation.
+## Step 2 — Evaluate Criteria
+Evaluate each criterion below in order. For each, assign a signal level and cite specific evidence from the file list and diff:
+- **none**: No concern detected.
+- **minor**: Small signal, unlikely to affect the verdict.
+- **elevated**: Meaningful concern that contributes to a MEDIUM verdict.
+- **severe**: Strong signal that contributes to CRITICAL or demands justification to avoid it.
+### 1. Proportionality
+Are the changes proportional to the objective and issue?
+Calibration anchors:
+- **>20 files touched** is large scope — justified only if the objective is broad (migration, rename, new package).
+- **>500 net lines changed** is a substantial PR. >1000 is outsized unless the objective explicitly calls for it.
+- A PR touching 50 files for a one-line bug fix is disproportionate. A PR deleting an entire directory is acceptable if the objective explicitly requests it.
+### 2. Destructiveness
+Does the PR remove significant code, files, or directories?
+Mass deletion must be justified by the objective and issue. Removing test fixtures, generated files, or explicitly requested removals is acceptable. Deleting production source without replacement or migration is severe.
+### 3. Infrastructure Impact
+Does the PR modify files with outsized blast radius?
+Watch for these specifically: `package.json`, `pnpm-lock.yaml`/`package-lock.json`, `.github/workflows/*`, `Dockerfile`, `tsconfig.json`, `.eslintrc.*`, `.env*`, `docker-compose.yml`. A broken CI config blocks all work; a broken lock file breaks every developer's install.
+If the config category from Step 1 contains modified files, this criterion is elevated at minimum. Justify if the objective warrants it.
+### 4. Reversibility
+How difficult would it be to undo this change?
+A single file edit is trivially reversible. Deleting an entire package with its history is not. Renaming or moving files across directories is moderately reversible (git tracks renames, but downstream references break).
+### 5. Scope Creep
+Does the PR introduce changes unrelated to the issue?
+Compare every changed file against the objective and issue. Unscoped work bypasses the planning process and creates conflicts with parallel agents. Opportunistic cleanup in touched files is acceptable; touching unrelated subsystems is not.
+### 6. Documentation Gap
+Do architectural or API changes have corresponding documentation updates?
+Check: if source files introduce new exports, public functions, config keys, or structural changes — do any docs-category files appear in the file list? Architectural changes without documentation carry elevated risk because future agents and humans lack context for the decisions.
+### 7. Test Coverage Gap
+Do source file changes have corresponding test file changes?
+Check the file categorization from Step 1. If source files changed, test files should also appear in the changeset. This is the single strongest signal of change confidence.
+Calibration anchors:
+- If **zero test files** changed alongside source changes, this is elevated at minimum.
+- If **>50% of changed source files** have no corresponding test file change (e.g., `src/foo.js` changed but no `test/foo.test.js` in the list), signal elevated.
+- Docs-only or config-only PRs are exempt from this criterion.
+## Hard Escalation Triggers
+Assign **CRITICAL** immediately when any of the following hold, even if some other criteria are mild:
+1. Objective/issue requests minor or docs-only work, but PR performs severe destructive or infrastructure-impacting changes.
+2. Critical infrastructure/security files are removed or dangerously modified without explicit objective support (e.g., package manifest/lockfiles, CI workflows, auth/security config, deployment config).
+3. Two or more criteria are rated **severe**.
+4. Evidence is incomplete and visible changes already show elevated/severe destructiveness or infrastructure risk.
+## Step 3 — Render Verdict
+Synthesize your per-criterion findings into a final verdict. Use the **comment** tool with `risk: true` and the appropriate `level`.
+Structure your comment body as:
+### File Categories
+- source: N files (+X −Y)
+- test: N files (+X −Y)
+- docs: N files (+X −Y)
+- config: N files (+X −Y)
+- other: N files (+X −Y)
+### Evaluation
+1. **Proportionality**: [none/minor/elevated/severe] — evidence
+2. **Destructiveness**: [none/minor/elevated/severe] — evidence
+3. **Infrastructure**: [none/minor/elevated/severe] — evidence
+4. **Reversibility**: [none/minor/elevated/severe] — evidence
+5. **Scope creep**: [none/minor/elevated/severe] — evidence
+6. **Documentation gap**: [none/minor/elevated/severe] — evidence
+7. **Test coverage gap**: [none/minor/elevated/severe] — evidence
+### Escalation Triggers Fired
+- List any hard escalation triggers that fired.
+- If none fired, state: `None`.
+### Verdict: [LOW/MEDIUM/CRITICAL]
+One paragraph synthesizing the above into your routing decision. If MEDIUM, include a **Warning for reviewer** line highlighting what the reviewer should pay attention to.
+</instructions>
+<examples>
+<example title="LOW — proportional, tested, non-destructive">
+### File Categories
+- source: 1 file (+80 −3)
+- test: 1 file (+65 −0)
+- docs: 0 files
+- config: 0 files
+### Evaluation
+1. **Proportionality**: none — 2 files, +145 −3 lines for a retry logic feature. Well-scoped.
+2. **Destructiveness**: none — 3 lines removed (replaced with retry wrapper). No deletions of substance.
+3. **Infrastructure**: none — no config files touched.
+4. **Reversibility**: none — single module change, trivially revertable.
+5. **Scope creep**: none — all changes relate to the retry objective.
+6. **Documentation gap**: minor — no README update for the new retry config option, but the feature is internal. Acceptable for LOW.
+7. **Test coverage gap**: none — `test/client.test.js` added with 65 new lines covering retry scenarios.
+### Escalation Triggers Fired
+None.
+### Verdict: LOW
+Changes are proportional to the objective, non-destructive, scoped, and tested. Proceed to normal review.
+</example>
+<example title="MEDIUM — justified scope but missing test coverage">
+### File Categories
+- source: 8 files (+320 −45)
+- test: 0 files (+0 −0)
+- docs: 1 file (+12 −0)
+- config: 0 files
+### Evaluation
+1. **Proportionality**: minor — 9 files for a new caching layer is reasonable scope, though on the larger side.
+2. **Destructiveness**: none — 45 lines removed are replaced with the new cache calls. No mass deletion.
+3. **Infrastructure**: none — no config files touched.
+4. **Reversibility**: minor — changes span 8 source files but each is a contained edit.
+5. **Scope creep**: none — all changes relate to the caching objective.
+6. **Documentation gap**: none — README updated with cache configuration docs.
+7. **Test coverage gap**: elevated — 8 source files changed with zero test files in the changeset. No `test/cache.test.js` or updates to existing test files. The caching layer introduces new behavior paths that are untested.
+### Escalation Triggers Fired
+None.
+### Verdict: MEDIUM
+The implementation is proportional and well-documented, but the complete absence of test coverage for 8 changed source files is a significant gap. The caching layer introduces TTL expiry, invalidation, and fallback paths — all of which need test coverage.
+**Warning for reviewer**: Verify test coverage exists or request it. Pay special attention to cache invalidation edge cases and TTL boundary behavior.
+</example>
+<example title="MEDIUM — justified but elevated blast radius">
+### File Categories
+- source: 20 files (+150 −140)
+- test: 3 files (+45 −40)
+- docs: 0 files
+- config: 2 files (+8 −6)
+### Evaluation
+1. **Proportionality**: minor — 25 files for an ESM migration is proportional to the objective.
+2. **Destructiveness**: none — line counts are balanced (+150 −140), indicating transformation rather than deletion.
+3. **Infrastructure**: elevated — `package.json` modified (type: module) and `jest.config.js` updated. The `package.json` change affects all imports project-wide.
+4. **Reversibility**: minor — ESM migration is reversible but would require touching all the same files again.
+5. **Scope creep**: none — all changes are ESM-related.
+6. **Documentation gap**: elevated — no docs updated to reflect the ESM migration. Import examples in README may now be wrong.
+7. **Test coverage gap**: minor — 3 test files updated out of 20 source files, but most source changes are mechanical import/export syntax.
+### Escalation Triggers Fired
+None.
+### Verdict: MEDIUM
+The ESM migration is proportional to the objective, but infrastructure files carry outsized blast radius and documentation hasn't been updated.
+**Warning for reviewer**: Verify that all import paths use `.js` extensions, that the test runner config is correct for ESM, and that README examples reflect ESM syntax.
+</example>
+<example title="CRITICAL — disproportionate destruction">
+### File Categories
+- source: 0 files (+0 −2,400)
+- test: 0 files
+- docs: 1 file (+1 −0)
+- config: 1 file (+0 −35)
+### Evaluation
+1. **Proportionality**: severe — the issue requests fixing a typo in the README. The PR deletes the entire `src/` directory and `package.json`.
+2. **Destructiveness**: severe — 2,400 lines of production code removed. `package.json` deleted entirely.
+3. **Infrastructure**: severe — `package.json` removal breaks the entire package.
+4. **Reversibility**: severe — directory deletion with all contents. History exists in git but recovery requires manual reconstruction.
+5. **Scope creep**: severe — typo fix does not justify any source or config changes.
+6. **Documentation gap**: none — N/A given the destruction.
+7. **Test coverage gap**: none — N/A, no source files added.
+### Escalation Triggers Fired
+1. Objective requests minor docs-only work but PR performs severe destructive changes.
+2. Critical infrastructure files removed without objective support (`package.json`).
+3. Four criteria rated severe (proportionality, destructiveness, infrastructure, reversibility).
+### Verdict: CRITICAL
+The destruction is completely disproportionate to the objective. A one-line typo fix does not justify deleting the entire source tree and package manifest. Escalate immediately.
+</example>
+</examples>
+<context>
+### Original Objective
+{{{objective}}}
+### Issue
+{{{issue}}}
+### PR Metadata
+- **PR**: #{{number}} — {{title}}
+- **Branch**: `{{head}}` → `{{base}}`
+- **Author**: {{author}} ({{authorProvider}})
+- **Stats**: {{stats.total}} files changed, +{{stats.additions}} −{{stats.deletions}} lines
+### File Changes
+{{#files}}
+- `{{{filename}}}` ({{status}}) +{{additions}} −{{deletions}}
+{{/files}}
+### Unified Diff
+```diff
+{{{diff}}}
+```
+</context>
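Step 1 of the prompt asks the agent to sort every changed file into one of five categories and tally additions and deletions per category. In the package this is done by the LLM agent reading the prompt, not by code; the sketch below only illustrates one deterministic reading of those rules. The regexes, the check order (test, then config, then docs, then source), and the `*.config.*` pattern standing in for "build/bundler configs" are assumptions.

```javascript
// Hypothetical patterns approximating the Step 1 category definitions.
const TEST_RE = /(?:^|\/)(?:test|__tests__)\/|\.(?:test|spec)\.[^/]+$/;
const CONFIG_RE = /(?:^|\/)(?:package\.json|package-lock\.json|pnpm-lock\.yaml|Dockerfile|tsconfig\.json|\.eslintrc\.[^/]+|[^/]+\.config\.[^/]+)$|^\.github\/workflows\//;
const DOCS_RE = /\.mdx?$|(?:^|\/)(?:README|CHANGELOG)[^/]*$/;
const SOURCE_RE = /\.(?:js|ts|jsx|tsx|mjs|cjs)$/;

// Tests and config are checked before source so that files such as
// `foo.test.js` or `jest.config.js` are not counted as production code.
function categorize(filename) {
  if (TEST_RE.test(filename)) return 'test';
  if (CONFIG_RE.test(filename)) return 'config';
  if (DOCS_RE.test(filename)) return 'docs';
  if (SOURCE_RE.test(filename)) return 'source';
  return 'other';
}

// Aggregate file counts and line stats per category, in the shape of the
// "File Categories" section of the structured verdict.
function tally(files) {
  const out = {};
  for (const cat of ['source', 'test', 'docs', 'config', 'other']) {
    out[cat] = { files: 0, additions: 0, deletions: 0 };
  }
  for (const f of files) {
    const bucket = out[categorize(f.filename)];
    bucket.files += 1;
    bucket.additions += f.additions ?? 0;
    bucket.deletions += f.deletions ?? 0;
  }
  return out;
}

console.log(tally([
  { filename: 'src/client.js', additions: 80, deletions: 3 },
  { filename: 'test/client.test.js', additions: 65, deletions: 0 }
]));
```

The input objects use the same `filename`, `additions`, and `deletions` fields that `RiskWorkflow.assess()` passes to the template as `files`.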

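Two of the prompt's rules are mechanical enough to sketch in code: the Step 0 evidence rules and hard escalation trigger 3 (two or more criteria rated severe). The sketch below is an illustration of those rules only. Triggers 1 and 2 depend on reading the objective and are left to the agent. The function name `forcedVerdict`, the shape of the `signals` object, and the `null` return for "no mechanical rule applies" are hypothetical.

```javascript
const LEVELS = ['none', 'minor', 'elevated', 'severe'];

// True when `level` is at or above `floor` on the signal scale.
function atLeast(level, floor) {
  return LEVELS.indexOf(level) >= LEVELS.indexOf(floor);
}

// Returns a verdict when a mechanical rule forces one, otherwise null
// (meaning the agent still has to weigh the criteria itself).
function forcedVerdict(signals, evidenceComplete) {
  // Hard escalation trigger 3: two or more criteria rated severe.
  const severe = Object.values(signals).filter(function isSevere(l) {
    return l === 'severe';
  }).length;
  if (severe >= 2) return 'CRITICAL';

  // Step 0 and trigger 4: incomplete evidence never yields LOW, and it
  // escalates when destructiveness or infrastructure is elevated or worse.
  if (!evidenceComplete) {
    const risky = atLeast(signals.destructiveness, 'elevated')
      || atLeast(signals.infrastructure, 'elevated');
    return risky ? 'CRITICAL' : 'MEDIUM';
  }

  return null;
}

console.log(forcedVerdict({ destructiveness: 'none', infrastructure: 'minor' }, false));
console.log(forcedVerdict({ proportionality: 'severe', destructiveness: 'severe' }, true));
```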
package/packages/risk/src/index.js CHANGED

@@ -27,6 +27,7 @@ const CLOSES_RE = /(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)/i;
  * - `loreli:low-risk` — PR cleared for normal review
  * - `loreli:medium-risk` — PR cleared with risk context warning
  * - `loreli:critical-risk` — PR escalated to HITL, no reviewer dispatched
+ * - `loreli:risk-unassessed` — assessment failed; escalated to HITL (fail-safe)
  *
  * @extends Workflow
  */
@@ -61,7 +62,7 @@ export class RiskWorkflow extends Workflow {
    * @returns {Promise<{workload: number, supply: number, deficit: number}>}
    */
   async demand(repo) {
-    const skip = this.orchestrator.cfg?.get?.('review.skipRiskAssessment') ?? false;
+    const skip = this.orchestrator.cfg?.get?.('workflows.risk.skip') ?? false;
     if (skip) return { workload: 0, supply: 0, deficit: 0 };
     const prs = await this.hub.pulls(repo, { state: 'open' });
@@ -73,7 +74,7 @@ export class RiskWorkflow extends Workflow {
       const hasRiskLabel = pr.labels?.some(function isRisk(l) {
         const name = l.name ?? l;
-        return name.endsWith('-risk') && name.startsWith('loreli:');
+        return (name.endsWith('-risk') || name === 'loreli:risk-unassessed') && name.startsWith('loreli:');
       });
       if (hasRiskLabel) continue;
@@ -127,7 +128,7 @@ export class RiskWorkflow extends Workflow {
       const hasRiskLabel = pr.labels?.some(function isRisk(l) {
         const name = l.name ?? l;
-        return name.endsWith('-risk') && name.startsWith('loreli:');
+        return (name.endsWith('-risk') || name === 'loreli:risk-unassessed') && name.startsWith('loreli:');
       });
       if (hasRiskLabel) {
         this._assessed.add(pr.number);
@@ -206,7 +207,7 @@ export class RiskWorkflow extends Workflow {
    * @returns {Promise<Array<{pr: number, agent: string}>>}
    */
   async assess(repo) {
-    const skip = this.orchestrator.cfg?.get?.('review.skipRiskAssessment') ?? false;
+    const skip = this.orchestrator.cfg?.get?.('workflows.risk.skip') ?? false;
     if (skip) return [];
     const dispatched = [];
@@ -219,9 +220,10 @@ export class RiskWorkflow extends Workflow {
       if (this._assessed.has(pr.number)) continue;
       // Skip PRs that already have a risk label (from a previous assessment)
+      // or a risk-unassessed label (from a failed assessment)
       const hasRiskLabel = pr.labels?.some(function isRisk(l) {
         const name = l.name ?? l;
-        return name.endsWith('-risk') && name.startsWith('loreli:');
+        return (name.endsWith('-risk') || name === 'loreli:risk-unassessed') && name.startsWith('loreli:');
       });
       if (hasRiskLabel) continue;
@@ -281,6 +283,21 @@ export class RiskWorkflow extends Workflow {
           pr: pr.number
         });
+        const mapped = files.map(function fmt(f) {
+          return {
+            filename: f.filename,
+            status: f.status,
+            additions: f.additions ?? 0,
+            deletions: f.deletions ?? 0
+          };
+        });
+        const stats = {
+          total: mapped.length,
+          additions: mapped.reduce(function sum(s, f) { return s + f.additions; }, 0),
+          deletions: mapped.reduce(function sum(s, f) { return s + f.deletions; }, 0)
+        };
         const prompt = await this.render({
           objective: obj,
           issue: issueBody,
@@ -290,14 +307,8 @@ export class RiskWorkflow extends Workflow {
           base: pr.base,
           author: agent.identity.name,
           authorProvider: agent.identity.provider,
-          files: files.map(function fmt(f) {
-            return {
-              filename: f.filename,
-              status: f.status,
-              additions: f.additions ?? 0,
-              deletions: f.deletions ?? 0
-            };
-          }),
+          files: mapped,
+          stats,
           diff
         });
@@ -313,13 +324,14 @@ export class RiskWorkflow extends Workflow {
         log.warn(`assess: dispatch failed for PR #${pr.number}: ${err.message}`);
         try { await this.orchestrator.kill(riskAgent.identity.name); } catch { /* best-effort */ }
-        // Apply fallback low-risk label so review's label gate passes
+        // Fail-safe: unassessed PRs must not pass as low-risk.
+        // The risk-unassessed label routes to HITL in ReviewWorkflow.scan().
         try {
-          const label = 'loreli:low-risk';
-          await this.hub.ensure(repo, [{ name: label, color: '0e8a16', description: 'Risk: LOW' }]);
+          const label = 'loreli:risk-unassessed';
+          await this.hub.ensure(repo, [{ name: label, color: 'e11d48', description: 'Risk: assessment failed — requires human review' }]);
           await this.hub.label(repo, pr.number, [label]);
-          log.info(`assess: applied fallback ${label} to PR #${pr.number}`);
-        } catch { /* best-effort — review may still proceed on next scan */ }
+          log.info(`assess: applied fail-safe ${label} to PR #${pr.number}`);
+        } catch { /* best-effort */ }
         this._assessed.add(pr.number);
       }
@@ -352,22 +364,22 @@ export class RiskWorkflow extends Workflow {
     for (const [prNum, tracking] of this._assessing) {
       const elapsed = now - tracking.dispatchedAt;
-      // Stall timeout — risk agent didn't respond. Apply a fallback
-      // low-risk label so ReviewWorkflow.scan() can proceed — without
-      // a label the PR stays gated indefinitely.
+      // Stall timeout — risk agent didn't respond. Fail-safe: apply
+      // risk-unassessed label so ReviewWorkflow escalates to HITL
+      // instead of silently passing as low-risk.
       if (elapsed > stallTimeout) {
-        log.warn(`verdict: risk assessment timed out for PR #${prNum} — applying low-risk fallback`);
+        log.warn(`verdict: risk assessment timed out for PR #${prNum} — applying fail-safe label`);
         this._assessing.delete(prNum);
         this._assessed.add(prNum);
         try { await this.orchestrator.kill(tracking.riskAgent); } catch { /* best-effort */ }
         try {
-          const label = 'loreli:low-risk';
-          await this.hub.ensure(repo, [{ name: label, color: '0e8a16', description: 'Risk: LOW' }]);
+          const label = 'loreli:risk-unassessed';
+          await this.hub.ensure(repo, [{ name: label, color: 'e11d48', description: 'Risk: assessment failed — requires human review' }]);
           await this.hub.label(repo, prNum, [label]);
-          log.info(`verdict: applied fallback ${label} to PR #${prNum}`);
+          log.info(`verdict: applied fail-safe ${label} to PR #${prNum}`);
         } catch (err) {
-          log.warn(`verdict: failed to apply fallback label to PR #${prNum}: ${err.message}`);
+          log.warn(`verdict: failed to apply fail-safe label to PR #${prNum}: ${err.message}`);
         }
         results.push({ pr: prNum, level: 'TIMEOUT' });
@@ -411,16 +423,15 @@ export class RiskWorkflow extends Workflow {
         this._assessing.delete(prNum);
         this._assessed.add(prNum);
-        // Apply fallback low-risk label so ReviewWorkflow's label gate
-        // can proceed. Without this, the PR enters a dead zone: assessed
-        // in memory but unlabeled on GitHub, permanently skipped by scan().
+        // Fail-safe: unassessed PRs must not pass as low-risk.
+        // The risk-unassessed label routes to HITL in ReviewWorkflow.scan().
         try {
-          const label = 'loreli:low-risk';
-          await this.hub.ensure(repo, [{ name: label, color: '0e8a16', description: 'Risk: LOW' }]);
+          const label = 'loreli:risk-unassessed';
+          await this.hub.ensure(repo, [{ name: label, color: 'e11d48', description: 'Risk: assessment failed — requires human review' }]);
           await this.hub.label(repo, prNum, [label]);
-          log.info(`reap: applied fallback ${label} to PR #${prNum}`);
+          log.info(`reap: applied fail-safe ${label} to PR #${prNum}`);
         } catch (err) {
-          log.warn(`reap: failed to apply fallback label to PR #${prNum}: ${err.message}`);
+          log.warn(`reap: failed to apply fail-safe label to PR #${prNum}: ${err.message}`);
         }
       }
     }
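The same label predicate is updated in three places in the hunks above (`demand`, the scan around line 128, and `assess`). As a reading aid, the check can be expressed as one helper. This is a sketch of an equivalent predicate, not code from the package: the names `UNASSESSED`, `isRiskLabel`, and `hasRiskLabel` are hypothetical, and unlike the original `pr.labels?.some(...)` expression this version returns `false` rather than `undefined` when a PR has no `labels` field.

```javascript
const UNASSESSED = 'loreli:risk-unassessed';

// Matches `loreli:*-risk` verdict labels and the fail-safe label.
// Labels may be plain strings or GitHub label objects.
function isRiskLabel(l) {
  const name = l.name ?? l;
  return name.startsWith('loreli:')
    && (name.endsWith('-risk') || name === UNASSESSED);
}

// True when the PR has already been through (or failed) risk assessment.
function hasRiskLabel(pr) {
  return pr.labels?.some(isRiskLabel) ?? false;
}

console.log(hasRiskLabel({ labels: [{ name: 'loreli:risk-unassessed' }] }));
console.log(hasRiskLabel({ labels: ['bug', 'high-risk'] }));
```

The explicit `UNASSESSED` comparison is needed because `loreli:risk-unassessed` does not end in `-risk`, which is why each of the three call sites gained the extra clause in 2.0.0.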