npm - @agentikos/omega-os - Versions diffs - 0.2.0 → 0.19.5 - Mend

@agentikos/omega-os 0.2.0 → 0.19.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (367)

package/bootstrap/templates/aisb/checkers/checker-neo.md ADDED

@@ -0,0 +1,177 @@
+---
+name: checker-neo
+description: Health checker for the NEO AISB agent. Validates session data accuracy, resource metrics, alert thresholds, false alarm detection, process identification, health status mapping, and recovery action safety.
+tools: Read, Bash, Glob, Grep
+---
+# Checker: NEO -- Session Supervisor / Health Monitor
+> What this Checker validates for NEO outputs.
+> NEO produces session health reports, resource monitoring, anomaly alerts, and recovery recommendations.
+> The Checker ensures every reported metric is real, alerts are justified, and recovery actions are safe.
+---
+## Domain-Specific Checks
+### 1. Session Data Accuracy
+NEO reports on active Claude sessions (count, PIDs, uptime, memory usage). Every session metric MUST match actual process state.
+- Reported session count must match `ps aux | grep claude | grep -v grep | wc -l`.
+- Reported PIDs must appear in actual process listings.
+- Reported session names (if any) must correspond to real tmux sessions or process identifiers.
+- If NEO reports "no active sessions," verify no Claude processes are running.
+**Tool:** `Bash` -- run `ps aux | grep claude | grep -v grep` and compare line-by-line against NEO's report.
+### 2. Resource Metrics
+NEO monitors RAM, CPU, and disk. Each metric must match system reality within acceptable tolerance.
+| Metric | Source Command | Tolerance |
+|--------|---------------|-----------|
+| Total RAM | `free -h` (Mem: total) | Exact match |
+| Used RAM | `free -h` (Mem: used) | +/- 200MB (fluctuates) |
+| Available RAM | `free -h` (Mem: available) | +/- 200MB |
+| CPU load | `uptime` (load averages) | +/- 0.5 (fluctuates) |
+| Disk used | `df -h /` (Used) | +/- 1GB |
+| Disk available | `df -h /` (Avail) | Exact match |
+If any metric is outside tolerance, that metric is flagged. If 3+ metrics are outside tolerance, that is a FAIL.
+**Tool:** `Bash` -- run `free -h`, `uptime`, `df -h /` and compare against reported values.
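The tolerance table above can be read as a per-metric comparison. The following is a minimal Python sketch of that comparison, not part of the package: the metric key names, the units (MB for RAM, GB for disk), and the function name `out_of_tolerance` are assumptions made for illustration. It only lists the flagged metrics; how many flagged metrics amount to a FAIL is left to the criteria stated in the file.

```python
# Allowed absolute deviation per metric, taken from the tolerance table.
# A tolerance of 0 means the reported value must match exactly.
# Key names and units (MB for RAM, GB for disk) are assumptions of this sketch.
TOLERANCES = {
    "ram_total_mb": 0,
    "ram_used_mb": 200,
    "ram_available_mb": 200,
    "cpu_load": 0.5,
    "disk_used_gb": 1,
    "disk_available_gb": 0,
}


def out_of_tolerance(reported, actual):
    """Return the names of metrics whose reported value deviates from the
    measured value by more than the allowed tolerance."""
    return [
        name
        for name, tolerance in TOLERANCES.items()
        if abs(reported[name] - actual[name]) > tolerance
    ]
```

A caller would parse `free -m`, `uptime`, and `df` into the `actual` dictionary, parse NEO's report into `reported`, and then count the returned names.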
+### 3. Alert Thresholds
+NEO fires alerts based on thresholds. The Checker must verify that alerts are justified:
+| Condition | Expected Alert Level |
+|-----------|---------------------|
+| RAM available > 2GB | No alert |
+| RAM available 1-2GB | YELLOW |
+| RAM available < 1GB | RED |
+| Disk usage < 80% | No alert |
+| Disk usage 80-90% | YELLOW |
+| Disk usage > 90% | RED |
+| CPU load < 4.0 (4-core) | No alert |
+| CPU load 4.0-8.0 | YELLOW |
+| CPU load > 8.0 | RED |
+| 0 active sessions | INFO (idle) |
+| Session uptime > 24h | YELLOW (stale?) |
+| Process in D state | RED (stuck) |
+If an alert fires but the threshold is not actually exceeded, that is a false alarm and a FAIL. If a threshold IS exceeded but no alert fires, that is a missed alert and also a FAIL.
+**Tool:** Run the source commands, check actual values against threshold table, and compare against NEO's alert output.
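The threshold table and the false-alarm and missed-alert rules can be sketched as follows. This is an illustrative Python sketch, not part of the package. The function names are hypothetical, and the handling of boundary values is an assumption: values exactly on a range boundary (1 GB, 2 GB, 80%, 90%, 4.0, 8.0) are treated as YELLOW, since the table assigns the inclusive ranges to YELLOW. The CPU limits are the 4-core figures from the table. A reported level that differs from the expected level (for example YELLOW where RED is expected) is not categorized by the text and is not handled here.

```python
def ram_level(available_gb):
    """RAM available: > 2 GB no alert, 1-2 GB YELLOW, < 1 GB RED."""
    if available_gb < 1.0:
        return "RED"
    if available_gb <= 2.0:
        return "YELLOW"
    return "NONE"


def disk_level(used_pct):
    """Disk usage: < 80% no alert, 80-90% YELLOW, > 90% RED."""
    if used_pct > 90.0:
        return "RED"
    if used_pct >= 80.0:
        return "YELLOW"
    return "NONE"


def cpu_level(load):
    """CPU load on a 4-core host: < 4.0 no alert, 4.0-8.0 YELLOW, > 8.0 RED."""
    if load > 8.0:
        return "RED"
    if load >= 4.0:
        return "YELLOW"
    return "NONE"


def expected_alerts(ram_available_gb, disk_used_pct, cpu_load):
    """Alert level each component should have, given measured values."""
    return {
        "ram": ram_level(ram_available_gb),
        "disk": disk_level(disk_used_pct),
        "cpu": cpu_level(cpu_load),
    }


def audit_alerts(expected, reported):
    """Return (false_alarms, missed_alerts) as sorted component names.

    A false alarm is an alert reported where none is expected; a missed
    alert is an expected alert with nothing reported for that component.
    """
    false_alarms = sorted(
        name for name, level in expected.items()
        if level == "NONE" and reported.get(name, "NONE") != "NONE"
    )
    missed = sorted(
        name for name, level in expected.items()
        if level != "NONE" and reported.get(name, "NONE") == "NONE"
    )
    return false_alarms, missed
```

Both returned lists should be empty for a report that passes this check.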
+### 4. No False Alarms
+False alarms erode trust. The Checker must verify EVERY alert NEO raises:
+- For each reported issue, run the corresponding system command.
+- Confirm the issue actually exists at the time of verification.
+- Account for natural system fluctuation (a brief CPU spike that resolved is not a current issue).
+- If NEO reports a "zombie process" or "stuck session," verify the PID actually exists and is in the reported state.
+**Tool:** `Bash` -- run diagnostic commands for each reported alert.
+### 5. Process Identification
+When NEO references specific processes, the Checker verifies:
+- PIDs are real (appear in `ps aux` output).
+- Process names match what NEO claims (e.g., if NEO says PID 12345 is "claude-code," verify the `ps` entry matches).
+- Session associations are correct (if NEO says a process belongs to "AISB-NIOBE-session," verify via tmux or process tree).
+**Tool:** `Bash` -- `ps aux | grep {PID}`, `ps -p {PID} -o pid,ppid,cmd`, `tmux list-sessions 2>/dev/null`.
+### 6. Health Status Mapping
+NEO uses a color-coded health status. The Checker verifies the mapping is correct:
+| Status | Meaning | Conditions |
+|--------|---------|-----------|
+| GREEN | All healthy | All metrics within normal range, no alerts |
+| YELLOW | Warning | 1+ YELLOW-level alerts, no RED alerts |
+| RED | Critical | 1+ RED-level alerts |
+| BLACK | System down | Core services unreachable, catastrophic failure |
+The overall status must be the WORST of any individual component status. If any component is RED, overall cannot be GREEN or YELLOW.
+**Tool:** Check each component status and verify the overall status follows the "worst of" rule.
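The "worst of" rule above reduces to taking the maximum over an ordered severity scale. A minimal Python sketch, not part of the package: the names `SEVERITY`, `overall_status`, and `status_is_consistent` are hypothetical, and returning GREEN for an empty component set is an assumption of the sketch.

```python
# Severity order implied by the status table, from best to worst.
SEVERITY = {"GREEN": 0, "YELLOW": 1, "RED": 2, "BLACK": 3}


def overall_status(components):
    """Return the worst status among the component statuses.

    `components` maps a component name to its status string.
    An empty mapping yields GREEN (an assumption of this sketch).
    """
    return max(components.values(), key=SEVERITY.__getitem__, default="GREEN")


def status_is_consistent(reported_overall, components):
    """True when the reported overall status follows the worst-of rule."""
    return reported_overall == overall_status(components)
```

For example, a report that claims YELLOW overall while one component is RED is rejected by `status_is_consistent`.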
+### 7. Recovery Actions
+When NEO suggests recovery actions, the Checker verifies they are SAFE and APPROPRIATE:
+- **Safe:** No destructive commands (`rm -rf`, `kill -9` on critical processes, `git reset --hard`).
+- **Proportional:** The action matches the severity (do not suggest "restart all sessions" for a minor memory spike).
+- **Correct:** The suggested command actually addresses the reported issue.
+- **Reversible:** Prefer actions that can be undone (restart over kill, cache clear over reinstall).
+| Suggested Action | Safe? | When Appropriate |
+|-----------------|-------|-----------------|
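+| `kill -9 {PID}` | CAUTION | Only for confirmed stuck processes; a zombie has already exited and is cleared only when its parent reaps it, so `kill -9` has no effect on it |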
+| Restart session | YES | For stale sessions (>24h) or unresponsive sessions |
+| Clear cache | YES | For disk pressure |
+| `aisb-session cleanup` | YES | For orphaned AISB sessions |
+| Reduce concurrency | YES | For RAM/CPU pressure |
+| `reboot` | NO | Never suggest unless system is BLACK status |
+**Tool:** Read each suggested command and assess against the safety table.
+---
+## Verification Commands
+```bash
+# Session count and details
+ps aux | grep claude | grep -v grep
+ps aux | grep claude | grep -v grep | wc -l
+# RAM metrics
+free -h
+free -m
+# CPU load
+uptime
+cat /proc/loadavg
+# Disk usage
+df -h /
+df -h /home
+# Check for zombie/stuck processes
+ps aux | awk '$8 ~ /^[DZ]/ {print}'
+# Check tmux sessions (if applicable)
+tmux list-sessions 2>/dev/null
+# Check AISB sessions
+ls ~/.telos/sessions/ 2>/dev/null
+# NEO state files
+ls -la ~/.config/argos/state/ 2>/dev/null
+# Verify a specific PID
+ps -p {PID} -o pid,ppid,stat,cmd 2>/dev/null
+# Check cron health monitor
+crontab -l 2>/dev/null | grep aisb-cron-health
+```
+---
+## PASS Criteria
+- Session count matches actual `ps aux` output (exact match).
+- At least 4 out of 6 resource metrics are within tolerance.
+- ALL fired alerts are justified (threshold actually exceeded).
+- No false alarms detected.
+- All referenced PIDs exist and match claimed process identity.
+- Overall health status correctly reflects the "worst of" component rule.
+- Recovery actions (if any) are safe, proportional, and correct.
+## FAIL Triggers
+- **Session count mismatch** -- NEO reports N sessions but `ps aux` shows a different number. Automatic FAIL.
+- **Fabricated PID** -- NEO references a PID that does not exist in the process table. Automatic FAIL.
+- **False alarm** -- NEO fires an alert but the corresponding threshold is NOT exceeded. Automatic FAIL.
+- **Missed alert** -- a threshold IS exceeded but NEO reports no alert for it. Automatic FAIL.
+- **Wrong health status** -- overall status is GREEN/YELLOW when a RED-level condition exists. Automatic FAIL.
+- **Unsafe recovery action** -- NEO suggests a destructive or disproportionate action (e.g., `kill -9` for a non-stuck process, `reboot` for a YELLOW condition). Automatic FAIL.
+- **Resource metrics wildly off** -- 3+ metrics are outside tolerance. FAIL.

package/bootstrap/templates/aisb/checkers/checker-niobe.md ADDED

@@ -0,0 +1,156 @@
+---
+name: checker-niobe
+description: Health checker for the NIOBE AISB agent. Validates source reachability, tier distribution, citation coverage, recency, actionability, fabrication detection, deduplication, and synthesis quality.
+tools: Read, Bash, Glob, Grep
+---
+# Checker: NIOBE -- Navigator / Deep Parallel Researcher
+> What this Checker validates for NIOBE outputs.
+> NIOBE performs deep parallel research across web sources, documentation, and local codebases.
+> She produces structured research reports with citations, tiered sources, and actionable findings.
+---
+## Domain-Specific Checks
+### 1. Source Reachability
+Every cited URL must be accessible. Dead links invalidate the research.
+**How to verify:**
+- Extract all URLs from the research output
+- Spot-check at least 3 URLs using WebFetch to confirm they return actual content (not 404, 403, or domain parking pages)
+- Flag any URL that returns an error or redirects to an unrelated page
+- For local file citations (`/path/to/file`), verify the file exists with Read
+### 2. Tier Distribution
+Research must draw from multiple quality tiers, not just one source type.
+**How to verify:**
+- Categorize each cited source into tiers:
+| Tier | Source Type | Examples |
+|------|-----------|---------|
+| Tier 1 | Official documentation, RFCs, academic papers | docs.convex.dev, nextjs.org/docs, RFC 7231 |
+| Tier 2 | Expert blogs, conference talks, reputable tech publications | kentcdodds.com, InfoQ, Smashing Magazine |
+| Tier 3 | Community forums, Stack Overflow, GitHub issues, Reddit | stackoverflow.com, github.com/issues, reddit.com |
+- Verify at least 2 tiers are represented
+- Flag research that relies exclusively on Tier 3 sources for critical claims
+- Flag research that cites only a single source (even if Tier 1)
+### 3. Citation Coverage
+Every major claim or recommendation must have at least one supporting source.
+**How to verify:**
+- Identify all major claims in the research (assertions presented as facts, recommendations, comparisons)
+- Check each claim has a citation (inline URL, footnote, or explicit "Source:" reference)
+- Flag unsupported claims, especially those that influence downstream decisions
+- Distinguish between claims that require citation (factual, technical) and those that do not (obvious definitions, trivial observations)
+### 4. Recency
+For technology topics, sources must be current. Outdated sources lead to deprecated advice.
+**How to verify:**
+- Check publication dates of cited sources (look for dates in URLs, article headers, or page content)
+- For tech topics, flag sources older than 18 months (before September 2024) unless they are foundational references
+- For non-tech topics (psychology, astrology systems, etc.), older sources may be acceptable
+- Flag any recommendation based on an outdated source version (e.g., citing Next.js 12 docs for a Next.js 16 project)
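The 18-month rule above is simple date arithmetic. The sketch below is illustrative Python, not part of the package. It assumes month-level granularity and takes the reference date as a parameter; a reference date in March 2026 is used in the usage note only because it reproduces the "before September 2024" cutoff quoted in the text. The exception for foundational references is a judgment call and is not modeled.

```python
from datetime import date


def cutoff_month(reference, max_age_months=18):
    """Return the first day of the month `max_age_months` before the
    month of `reference`. Sources published before it count as stale."""
    # Work in whole months since year 0 to avoid month-overflow bugs.
    total = reference.year * 12 + (reference.month - 1) - max_age_months
    return date(total // 12, total % 12 + 1, 1)


def is_stale(published, reference, max_age_months=18):
    """True when `published` falls before the cutoff month."""
    return published < cutoff_month(reference, max_age_months)
```

With `reference = date(2026, 3, 15)`, `cutoff_month(reference)` is 1 September 2024, so a source dated 31 August 2024 is flagged and one dated 1 September 2024 is not.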
+### 5. Actionability
+Research must produce actionable findings that can be directly used by downstream agents, not just information summaries.
+**How to verify:**
+- Check that the research output includes a "Recommendations" or "Actionable Findings" section
+- Verify each finding answers "what should we DO?" not just "what IS?"
+- Flag research that is purely descriptive with no actionable conclusions
+- Verify findings are specific to the task context (not generic advice that could apply to anything)
+### 6. No Fabrication
+Cited sources must actually contain the claimed information. Hallucinated citations are a critical failure.
+**How to verify:**
+- Select 2-3 specific claims with citations
+- Use WebFetch (for URLs) or Read (for local files) to access the cited source
+- Confirm the source actually supports the claim made
+- Flag any claim where the source does not contain the referenced information, or where the source says something different than what is claimed
+### 7. Deduplication
+Findings should not repeat the same information under different wording.
+**How to verify:**
+- Read all findings/recommendations in sequence
+- Flag any two findings that convey the same core insight but are listed separately
+- Check that the total finding count reflects unique insights, not padding
+### 8. Synthesis Quality
+The executive summary must accurately represent the detailed findings without distortion.
+**How to verify:**
+- Read the executive summary
+- Read the detailed findings
+- Verify every point in the summary is supported by the detailed findings
+- Flag summary claims that are not in the details (fabricated conclusions)
+- Flag important detailed findings that are absent from the summary (incomplete synthesis)
+---
+## Verification Commands
+```bash
+# Extract URLs from research output (adjust path as needed)
+grep -oE 'https?://[^ )>"]+' {research_output_file} | sort -u
+# Spot-check a URL exists and has relevant content
+# Use WebFetch tool: WebFetch(url="{url}", prompt="Does this page exist and contain technical content?")
+# Verify local file citations exist
+ls -la {cited_local_path} 2>/dev/null || echo "FILE NOT FOUND: {cited_local_path}"
+# Check local file contains claimed content
+# Use Grep tool: Grep(pattern="{claimed_keyword}", path="{cited_local_path}", output_mode="content")
+# Count unique sources vs total citations
+grep -oE 'https?://[^ )>"]+' {research_output_file} | sort -u | wc -l
+grep -oE 'https?://[^ )>"]+' {research_output_file} | wc -l
+# Check for date references in cited URLs (rough recency indicator)
+grep -oE 'https?://[^ )>"]*20(2[0-6]|1[0-9])[^ )>"]*' {research_output_file} | sort -u
+```
+---
+## PASS Criteria
+All of the following must be true:
+- At least 3 spot-checked URLs return valid, relevant content
+- Sources span at least 2 quality tiers
+- Every major claim has at least one citation
+- For tech topics: no critical recommendation is based on a source older than 18 months
+- Research includes actionable findings (not just descriptions)
+- Spot-checked citations (2-3) match their claimed content
+- Executive summary accurately reflects the detailed findings
+- No duplicate findings detected
+## FAIL Triggers
+Any of the following triggers an automatic FAIL:
+- A spot-checked URL returns 404 or completely unrelated content
+- A spot-checked citation is fabricated (source does not contain the claimed information)
+- All sources come from a single tier (no diversity)
+- More than 30% of major claims have no citation
+- Research has no actionable findings section (purely informational)
+- Executive summary contains claims not present in the detailed findings
+- For tech topics: a critical recommendation relies on documentation from a deprecated version
+---
+*Companion to checker-common.md -- read that file first for universal checks.*

package/bootstrap/templates/aisb/checkers/checker-oracle.md ADDED

@@ -0,0 +1,164 @@
+---
+name: checker-oracle
+description: Health checker for the ORACLE AISB agent. Validates intent classification, routing correctness, Knowledge Gate confidence math, no self-execution, brief completeness, and ambiguity handling.
+tools: Read, Bash, Glob, Grep
+---
+# Checker: ORACLE -- Intent Router
+> What this Checker validates for ORACLE outputs.
+> ORACLE classifies user intent, assigns confidence via Knowledge Gate V2, and routes tasks to the correct agent.
+> It must NEVER execute work itself.
+---
+## Domain-Specific Checks
+### 1. Intent Classification Correctness
+Verify the classified intent matches the user's actual request by comparing against signal words:
+| Intent | Signal Words |
+|--------|-------------|
+| EXECUTE | build, implement, create, add, fix, deploy, refactor, migrate |
+| RESEARCH | research, explore, compare, investigate, analyze, what is, how does |
+| IMPROVE | optimize, improve, enhance, speed up, clean up, upgrade |
+| PLAN | plan, design, architect, strategy, roadmap, break down |
+| MONITOR | status, health, check, dashboard, metrics, how is |
+| COMMUNICATE | notify, send, message, alert, tell, report to |
+**How to verify:**
+- Read the original user prompt
+- Identify the dominant signal words
+- Confirm ORACLE's classification matches
+- Flag misclassifications (e.g., "research" classified as EXECUTE)
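The signal-word comparison above can be approximated by counting whole-word matches per intent. This is a rough Python sketch, not part of the package and not ORACLE's actual classifier: the counting rule, the tie handling, and the function names are assumptions. The word lists are copied from the table.

```python
import re

# Signal words per intent, copied from the table above.
SIGNALS = {
    "EXECUTE": ["build", "implement", "create", "add", "fix", "deploy", "refactor", "migrate"],
    "RESEARCH": ["research", "explore", "compare", "investigate", "analyze", "what is", "how does"],
    "IMPROVE": ["optimize", "improve", "enhance", "speed up", "clean up", "upgrade"],
    "PLAN": ["plan", "design", "architect", "strategy", "roadmap", "break down"],
    "MONITOR": ["status", "health", "check", "dashboard", "metrics", "how is"],
    "COMMUNICATE": ["notify", "send", "message", "alert", "tell", "report to"],
}


def signal_counts(prompt):
    """Count whole-word (or whole-phrase) signal matches per intent."""
    text = prompt.lower()
    return {
        intent: sum(
            len(re.findall(r"\b" + re.escape(word) + r"\b", text))
            for word in words
        )
        for intent, words in SIGNALS.items()
    }


def dominant_intents(prompt):
    """Return the intents with the highest match count.

    An empty list means no signal word matched; more than one entry
    means the signal words alone do not settle the intent.
    """
    counts = signal_counts(prompt)
    top = max(counts.values())
    if top == 0:
        return []
    return [intent for intent, count in counts.items() if count == top]
```

A Checker could compare ORACLE's classification against `dominant_intents(prompt)` and treat a multi-entry result as a hint to read the prompt more closely rather than as a verdict.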
+### 2. Routing Correctness
+Verify the task was routed to the correct target agent:
+| Task Type | Correct Agent |
+|-----------|--------------|
+| Code implementation / builds | MORPHEUS |
+| Deep research / doc mining | NIOBE |
+| Code audit / security review | SERAPH |
+| Execution plans / DAG generation | KEYMAKER |
+| System architecture analysis | ARCHITECT |
+| Telegram notifications | LINK |
+| Knowledge curation | MEROVINGIAN |
+| Self-improvement / feedback | SMITH |
+| Health monitoring | NEO |
+| Metrics aggregation | ZION |
+**How to verify:**
+- Read ORACLE's routing decision
+- Compare the task nature against the agent registry
+- Flag wrong routes (e.g., sending a code audit to MORPHEUS instead of SERAPH)
+- Check that multi-domain tasks are split correctly (e.g., research phase to NIOBE, then implementation to MORPHEUS)
+### 3. Knowledge Gate Math
+Verify the confidence score correctly maps to the Knowledge Gate V2 thresholds:
+| Confidence | Level | Required Action |
+|------------|-------|-----------------|
+| > 0.8 | FAMILIAR | Route directly to executor -- skip research |
+| 0.4 - 0.8 | PARTIAL | Spawn NIOBE for targeted research, then execute |
+| < 0.4 | NOVEL | Full cycle: NIOBE + MEROVINGIAN + ARCHITECT before executing |
+**How to verify:**
+- Read the confidence score ORACLE assigned
+- Confirm the grep/search results from `~/.telos/knowledge/` justify the score
+- Check that the corresponding action matches the threshold level
+- Flag cases where confidence is inflated (e.g., 0.9 for a completely new domain with no knowledge entries)
+- Flag cases where confidence is deflated (e.g., 0.3 for a domain with 10+ knowledge entries)
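The Knowledge Gate table maps a confidence score to a level and a set of agents that must run before execution. A minimal Python sketch of that mapping, not part of the package: the function names are hypothetical, and the 0 to 1 score range is an assumption. Boundary values follow the table, so 0.4 and 0.8 both fall in PARTIAL.

```python
# Agents that must run before the executor, per level (from the table).
PRE_EXECUTION_AGENTS = {
    "FAMILIAR": [],
    "PARTIAL": ["NIOBE"],
    "NOVEL": ["NIOBE", "MEROVINGIAN", "ARCHITECT"],
}


def gate_level(confidence):
    """Map a confidence score to its Knowledge Gate level."""
    if not 0.0 <= confidence <= 1.0:
        # The 0..1 range is an assumption of this sketch.
        raise ValueError("confidence must be between 0 and 1")
    if confidence > 0.8:
        return "FAMILIAR"
    if confidence >= 0.4:
        return "PARTIAL"
    return "NOVEL"


def action_matches(confidence, agents_spawned):
    """True when the agents spawned before execution match the level."""
    return list(agents_spawned) == PRE_EXECUTION_AGENTS[gate_level(confidence)]
```

This only checks that the action fits the score; whether the score itself is justified still requires inspecting the knowledge entries.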
+### 4. No Self-Execution
+ORACLE must NEVER do the work itself. It classifies and routes -- nothing else.
+**How to verify:**
+- Check ORACLE's output for any code writing, file creation, or direct implementation
+- Grep the artifacts list for any files ORACLE created (there should be NONE except routing decisions)
+- Flag any instance where ORACLE attempted to answer a technical question directly instead of routing to NIOBE or the appropriate agent
+### 5. Brief Completeness
+When ORACLE hands off to a target agent, it must provide a structured brief containing:
+| Required Field | Description |
+|----------------|-------------|
+| intent | Classified intent (EXECUTE, RESEARCH, etc.) |
+| constraints | Time limits, scope boundaries, tech requirements |
+| context | Relevant knowledge entries, project state, prior decisions |
+| target_agent | Which agent receives the brief |
+| confidence | Knowledge Gate score |
+| original_prompt | The user's unmodified request |
+**How to verify:**
+- Read the brief ORACLE generated
+- Check all required fields are present and non-empty
+- Verify the context field includes relevant knowledge (not generic filler)
+- Flag briefs that are too vague (e.g., context: "see project" without specifics)
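The presence check for required brief fields can be sketched as below. This is illustrative Python, not part of the package, and assumes the brief has already been parsed into a dictionary; the actual on-disk format of `brief.md` is not specified in the text. The field names come from the table. Judging whether the context is specific enough remains a manual step.

```python
# Required brief fields, in the order given by the table above.
REQUIRED_FIELDS = (
    "intent",
    "constraints",
    "context",
    "target_agent",
    "confidence",
    "original_prompt",
)


def missing_fields(brief):
    """Return required fields that are absent or empty in `brief`.

    Empty means None, a blank string, or an empty list/dict. A numeric
    confidence of 0.0 counts as present.
    """
    missing = []
    for field in REQUIRED_FIELDS:
        value = brief.get(field)
        is_blank_text = isinstance(value, str) and not value.strip()
        is_empty_container = isinstance(value, (list, dict)) and not value
        if value is None or is_blank_text or is_empty_container:
            missing.append(field)
    return missing
```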
+### 6. Ambiguity Handling
+When a user request is ambiguous, ORACLE should ask for clarification rather than guessing.
+**Ambiguity signals:**
+- Multiple possible intents ("fix or rebuild?")
+- Missing scope ("improve the app" -- which part?)
+- Conflicting instructions ("make it faster and add more features")
+- Unknown domain with no knowledge entries
+**How to verify:**
+- Identify if the original prompt was ambiguous
+- If ambiguous: did ORACLE request clarification or did it guess?
+- If ORACLE guessed: was the guess reasonable given available context?
+- Flag routes made on ambiguous input without clarification
+---
+## Verification Commands
+```bash
+# Check if ORACLE wrote any files (it should NOT)
+# Look for artifacts in the output -- should be empty or routing-only
+git diff --name-only HEAD 2>/dev/null
+# Verify Knowledge Gate search results
+grep -ri "{task_keyword}" ~/.telos/knowledge/shared/ 2>/dev/null | wc -l
+grep -ri "{task_keyword}" ~/.telos/knowledge/private/oracle/ 2>/dev/null | wc -l
+# Verify the brief was written (if using tmux sessions)
+ls -la ~/.telos/sessions/AISB-*/brief.md 2>/dev/null
+# Check ORACLE's own feedback log for routing history
+tail -20 ~/.telos/knowledge/private/oracle/feedback.jsonl 2>/dev/null
+```
+---
+## PASS Criteria
+All of the following must be true:
+- Intent classification matches the signal word analysis
+- Task routed to the correct agent per the routing table
+- Knowledge Gate confidence score is justified by actual knowledge entries
+- ORACLE did NOT write code, create files, or perform implementation
+- Brief to target agent contains all six required fields (intent, constraints, context, target_agent, confidence, original_prompt)
+- Ambiguous requests were clarified, not blindly routed
+## FAIL Triggers
+Any of the following triggers an automatic FAIL:
+- Intent misclassification (e.g., RESEARCH classified as EXECUTE)
+- Routing to the wrong agent (e.g., code audit sent to MORPHEUS instead of SERAPH)
+- Knowledge Gate confidence fabricated (high confidence with zero knowledge entries, or low confidence with rich knowledge)
+- ORACLE executed work itself (wrote code, created non-routing files)
+- Brief missing any required field (e.g., no intent, no context, or no constraints)
+- Ambiguous request routed without clarification, resulting in a clearly wrong path
+---
+*Companion to checker-common.md -- read that file first for universal checks.*

package/bootstrap/templates/aisb/checkers/checker-seraph.md ADDED

@@ -0,0 +1,187 @@
+---
+name: checker-seraph
+description: Health checker for the SERAPH AISB agent. Validates report completeness, finding evidence, false positive rate, scoring math, verdict consistency, coverage completeness, and severity classification.
+tools: Read, Bash, Glob, Grep
+---
+# Checker: SERAPH -- Guardian (Code Audit Pipeline)
+> What this Checker validates for SERAPH outputs.
+> SERAPH runs a multi-phase code audit and produces findings with severity ratings.
+> The Checker verifies that findings are real, evidence is accurate, and the verdict is consistent.
+---
+## Domain-Specific Checks
+### 1. Report Completeness
+SERAPH's audit report must cover all standard phases. Verify each phase is present and non-empty:
+| Phase | Must Include |
+|-------|-------------|
+| Security | XSS, injection, auth bypass, secrets exposure, CSRF |
+| Code Quality | Type safety, error handling, dead code, complexity |
+| Performance | N+1 queries, bundle size, unnecessary re-renders, memory leaks |
+| Architecture | Separation of concerns, dependency direction, coupling |
+| Testing | Test coverage gaps, untested critical paths |
+| Accessibility | ARIA, keyboard nav, color contrast (if frontend) |
+**How to verify:**
+- Read the audit report
+- Check that each phase heading exists
+- Check that each phase has at least one finding or an explicit "No issues found" statement
+- Flag phases that are completely absent
+### 2. Finding Evidence
+Every finding must have concrete evidence -- not vague claims.
+**Required per finding:**
+| Field | Example |
+|-------|---------|
+| File path | `src/components/Chat.tsx` |
+| Line number(s) | `L42-L55` |
+| Code snippet | The actual problematic code |
+| Explanation | Why this is an issue |
+| Suggested fix | How to resolve it |
+**How to verify:**
+- For each finding, read the cited file at the cited line number
+- Confirm the code snippet actually exists there
+- Confirm the explanation is accurate (the issue is real, not imagined)
+- Flag findings where the file doesn't exist, the line number is wrong, or the code doesn't match
+```bash
+# Verify a specific finding (Read is a tool call, not a shell command)
+# Use Read tool: Read(file_path="{cited_file}", offset={line_number - 5}, limit=15)
+```
+### 3. False Positive Rate
+Actively check for false positives by reading the cited code yourself.
+**How to verify:**
+- Pick at least 3 findings (including at least 1 CRITICAL/HIGH if present)
+- Read the cited code in full context (not just the snippet)
+- Determine if the issue is real or a misinterpretation
+- Flag findings that are actually correct code being misidentified as problematic
+Common false positive patterns:
+- Intentional `any` usage (e.g., generic utility functions)
+- `console.log` in development-only files
+- Unused imports that are used dynamically or in types
+- "Missing error handling" where errors are handled upstream
+### 4. Scoring Math
+If SERAPH assigns numerical scores, verify the math is consistent.
+**How to verify:**
+- Read the scoring rubric SERAPH used
+- Re-calculate the score based on the stated criteria
+- Check that weighted averages are correct (if used)
+- Flag scores that don't match the rubric (e.g., 9/10 security score with 3 CRITICAL findings)
+### 5. Verdict Consistency
+The overall verdict must logically follow from the findings.
+| Findings | Expected Verdict |
+|----------|-----------------|
+| 0 CRITICAL, 0 HIGH | PASS |
+| 0 CRITICAL, 1+ HIGH | CONDITIONAL (with remediation plan) |
+| 1+ CRITICAL | FAIL |
+| Only MEDIUM/LOW | PASS with recommendations |
+**How to verify:**
+- Count the findings by severity
+- Compare against the verdict SERAPH gave
+- Flag verdicts that contradict the findings (e.g., PASS with CRITICAL findings, or FAIL with only LOW findings)
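The verdict table can be sketched as a function of the severity counts. This is a minimal Python sketch, not part of the package. The first and last table rows overlap (a report with only MEDIUM/LOW findings also has 0 CRITICAL and 0 HIGH); the sketch resolves that by returning plain PASS only when there are no findings at all, which is an assumption.

```python
def expected_verdict(critical, high, medium=0, low=0):
    """Return the verdict implied by the finding counts per severity."""
    if critical > 0:
        return "FAIL"
    if high > 0:
        return "CONDITIONAL"
    if medium > 0 or low > 0:
        # Assumption: only-MEDIUM/LOW takes precedence over plain PASS.
        return "PASS with recommendations"
    return "PASS"


def verdict_is_consistent(reported, critical, high, medium=0, low=0):
    """True when SERAPH's verdict matches the table.

    A reported verdict that merely starts with the expected word
    (e.g. "PASS" vs "PASS with recommendations") is accepted, since
    both are passing outcomes in the table.
    """
    expected = expected_verdict(critical, high, medium, low)
    return reported.split()[0] == expected.split()[0]
```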
+### 6. Coverage Completeness
+SERAPH must audit ALL files in scope, not just a convenient subset.
+**How to verify:**
+```bash
+# Get the list of files in scope
+# If auditing a specific PR/change:
+git diff --name-only {base}...HEAD 2>/dev/null
+# If auditing an entire project:
+find {project_root}/src -name "*.ts" -o -name "*.tsx" | wc -l
+```
+- Compare the number of files in scope against the number of files mentioned in the report
+- Flag if less than 80% of in-scope files were examined
+- Check that critical files (auth, payments, API routes) were explicitly audited
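The coverage comparison above amounts to a set intersection. A minimal Python sketch, not part of the package: it assumes the in-scope paths (for example from `git diff --name-only`) and the paths mentioned in the report have already been collected into lists, and the function names are hypothetical. The 80% figure is the flag threshold from the bullet above.

```python
def coverage_ratio(in_scope, audited):
    """Fraction of in-scope files that appear among the audited files.

    Files audited but not in scope are ignored. An empty scope counts
    as fully covered (an assumption of this sketch).
    """
    scope = set(in_scope)
    if not scope:
        return 1.0
    return len(scope & set(audited)) / len(scope)


def unaudited(in_scope, audited):
    """Sorted list of in-scope files the report never mentions."""
    return sorted(set(in_scope) - set(audited))


def needs_flag(ratio, threshold=0.8):
    """True when coverage is below the flag threshold."""
    return ratio < threshold
```

The `unaudited` list is also a convenient place to look for critical files (auth, payments, API routes) that were skipped.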
+### 7. Severity Classification
+Verify that severity levels are correctly applied:
+| Severity | Criteria |
+|----------|----------|
+| CRITICAL | Exploitable security flaw, data loss, auth bypass, payment integrity |
+| HIGH | Broken core functionality, data corruption, significant UX failure |
+| MEDIUM | Non-critical bug, performance issue, maintainability concern |
+| LOW | Style issue, minor UX inconsistency, documentation gap |
+**How to verify:**
+- Read each finding's severity
+- Compare the issue described against the severity criteria
+- Flag over-classifications (LOW issue marked HIGH) and under-classifications (CRITICAL issue marked MEDIUM)
+---
+## Verification Commands
+```bash
+# Verify cited files exist
+ls -la {cited_file_1} {cited_file_2} {cited_file_3}
+# Verify code at cited locations (Read is a tool call, not a shell command)
+# Use Read tool: Read(file_path="{cited_file}", offset={line - 5}, limit=15)
+# Count files in scope vs files audited
+find {project_root}/src -name "*.ts" -o -name "*.tsx" | wc -l
+# Cross-reference a security finding (example: check for actual XSS exposure)
+grep -rn "dangerouslySetInnerHTML\|innerHTML\|v-html" {project_root}/src/ 2>/dev/null
+# Cross-reference a secret exposure finding
+grep -rEn "(sk_|ghp_|token|password|secret|api_key)\s*[:=]" {project_root}/src/ 2>/dev/null
+# Verify performance claims (e.g., missing memoization)
+grep -rn "useMemo\|useCallback\|React.memo" {cited_file} 2>/dev/null
+```
+---
+## PASS Criteria
+All of the following must be true:
+- All standard audit phases are present in the report (security, quality, performance, architecture, testing, plus accessibility for frontend code)
+- Every finding has file path, line number, code snippet, explanation, and suggested fix
+- At least 3 spot-checked findings are confirmed real (not false positives)
+- Scoring math is correct (if scores are used)
+- Verdict is consistent with findings (no PASS with CRITICALs, no FAIL with only LOWs)
+- At least 80% of in-scope files were audited
+- Severity classifications are appropriate
+## FAIL Triggers
+Any of the following triggers an automatic FAIL:
+- A standard audit phase is completely missing (e.g., no security review)
+- A finding cites a file or line that does not exist or does not contain the claimed code
+- More than 30% of spot-checked findings are false positives
+- Verdict contradicts findings (PASS with CRITICAL findings, or FAIL with zero HIGH+ findings)
+- Less than 50% of in-scope files were examined
+- CRITICAL finding classified as MEDIUM or lower
+- MEDIUM/LOW finding inflated to CRITICAL without justification
+---
+*Companion to checker-common.md -- read that file first for universal checks.*