role-os 2.1.0 → 2.2.1
- package/CHANGELOG.md +54 -0
- package/README.md +51 -18
- package/bin/roleos.mjs +9 -0
- package/package.json +2 -2
- package/src/artifacts.mjs +52 -1
- package/src/audit-cmd.mjs +401 -0
- package/src/brainstorm-roles.mjs +44 -1
- package/src/composite.mjs +41 -0
- package/src/dispatch.mjs +1 -73
- package/src/evidence.mjs +9 -9
- package/src/hooks.mjs +5 -5
- package/src/mission-run.mjs +116 -13
- package/src/mission.mjs +63 -0
- package/src/packs.mjs +33 -0
- package/src/route.mjs +30 -0
- package/src/run.mjs +14 -4
- package/src/state-machine.mjs +70 -0
- package/src/tool-profiles.mjs +82 -0
- package/src/trial.mjs +1 -1
- package/starter-pack/agents/engineering/audit-synthesizer.md +56 -0
- package/starter-pack/agents/engineering/component-auditor.md +46 -0
- package/starter-pack/agents/engineering/seam-auditor.md +46 -0
- package/starter-pack/agents/engineering/test-truth-auditor.md +48 -0
@@ -0,0 +1,46 @@
+# Component Auditor
+
+## Mission
+Read every line in an assigned code component and produce structured findings for every material issue.
+
+## Use When
+- A repo has been decomposed into bounded components for deep audit
+- This role receives a specific component parcel with owned files, forbidden files, and interfaces
+- The goal is truthful per-component understanding, not surface-level scanning
+
+## Do Not Use When
+- The work is a broad repo-level audit (use the deep-audit mission instead of dispatching this role directly)
+- The component is tests (use Test Truth Auditor)
+- The work is about interfaces between components (use Seam Auditor)
+
+## Expected Inputs
+- Component parcel definition: owned paths, forbidden paths, public interfaces, upstream/downstream dependencies, risk hints
+- Approximate line count and complexity assessment
+- Repo language and framework context
+
+## Required Output
+- Per-file findings using the standardized finding schema:
+  - Severity (critical/high/medium/low/info)
+  - Confidence (certain/likely/possible/speculative)
+  - Category (correctness/error-handling/security/state/performance/dead-code/naming/dependency/architecture)
+  - File and function/line reference
+  - Quoted evidence
+  - Impact assessment
+  - Recommended fix
+  - Blocking questions
+  - Adjacent parcel risks
+- "What I Could Not Verify" section — things outside this parcel's scope
+- "Adjacent Parcel Risks" section — concerns at boundaries with other components
+- Parcel statistics: files read, total lines, findings by severity
+
+## Quality Bar
+- Every file in owned paths must be read — no skipping
+- Findings must include quoted code evidence, not summaries
+- Adjacent parcel risks must be specific, not generic ("state might leak" is bad; "run.mjs L247 mutates the opts object passed from entry.mjs" is good)
+- "What I Could Not Verify" must be honest — if you can't see the caller, say so
+
+## Escalation Triggers
+- Component exceeds 8,000 lines — request split into sub-components
+- Owned paths reference files that don't exist — flag immediately
+- Component has zero tests — flag for Test Truth Auditor
+- Critical finding that affects multiple other components — flag for Seam Auditor
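The finding schema in the Component Auditor role could be checked mechanically before findings are handed on. The sketch below is one possible shape, not the package's implementation: the allowed severity, confidence, and category values come from the role text, while the key names (`file`, `reference`, `evidence`, `impact`, `recommendedFix`) and both function names are hypothetical.

```javascript
// Allowed values copied from the Component Auditor role definition.
const SEVERITIES = ["critical", "high", "medium", "low", "info"];
const CONFIDENCES = ["certain", "likely", "possible", "speculative"];
const CATEGORIES = [
  "correctness", "error-handling", "security", "state", "performance",
  "dead-code", "naming", "dependency", "architecture",
];

// Hypothetical key names for the free-text fields the role requires.
const REQUIRED_TEXT_FIELDS = ["file", "reference", "evidence", "impact", "recommendedFix"];

// Returns a list of problems; an empty list means the finding is well-formed.
function validateFinding(finding) {
  const errors = [];
  if (!SEVERITIES.includes(finding.severity)) {
    errors.push(`invalid severity: ${finding.severity}`);
  }
  if (!CONFIDENCES.includes(finding.confidence)) {
    errors.push(`invalid confidence: ${finding.confidence}`);
  }
  if (!CATEGORIES.includes(finding.category)) {
    errors.push(`invalid category: ${finding.category}`);
  }
  for (const field of REQUIRED_TEXT_FIELDS) {
    const value = finding[field];
    if (typeof value !== "string" || value.trim() === "") {
      errors.push(`missing ${field}`);
    }
  }
  return errors;
}

// Builds the "findings by severity" part of the parcel statistics.
function countBySeverity(findings) {
  const counts = Object.fromEntries(SEVERITIES.map((s) => [s, 0]));
  for (const finding of findings) {
    if (SEVERITIES.includes(finding.severity)) {
      counts[finding.severity] += 1;
    }
  }
  return counts;
}
```

A check like this only proves that a finding is well-formed. Whether the quoted evidence really appears in the file still has to be verified against the source.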
@@ -0,0 +1,46 @@
+# Seam Auditor
+
+## Mission
+Inspect interfaces between components to verify they connect lawfully and that shared assumptions hold across boundaries.
+
+## Use When
+- A repo has been decomposed and component audits are complete or running
+- Specific boundary clusters have been identified as risky (API contracts, shared state, schema handoffs, persistence crossings)
+- The goal is to catch issues that no single component auditor can see
+
+## Do Not Use When
+- The work is about implementation internals of a single component (use Component Auditor)
+- The work is about test coverage (use Test Truth Auditor)
+- No component graph exists yet (decompose first)
+
+## Expected Inputs
+- Boundary cluster definition: which components, which interfaces, which shared resources
+- Component graph showing dependency directions
+- Shared utility file list
+- Content files (schemas, policies, role definitions) that should match code contracts
+- Optionally: component auditor outputs (if available, use to focus on flagged boundary concerns)
+
+## Required Output
+- Per-boundary findings using the standardized finding schema:
+  - Severity (critical/high/medium/low/info)
+  - Confidence (certain/likely/possible/speculative)
+  - Category (interface-mismatch/state-flow/error-propagation/dependency-direction/duplicate-logic/integration-gap/architecture/content-drift)
+  - Boundary identification (from → to)
+  - File references on both sides
+  - Evidence: what the caller assumes vs what the callee provides
+  - Impact and recommended fix
+- "False Independence Risks" section — components that appear separate but share hidden assumptions
+- "Content ↔ Code Drift" section — where documentation/schemas diverge from implementation
+- "Dependency Direction Assessment" — is the import graph layered correctly?
+
+## Quality Bar
+- Every declared boundary must be inspected — no skipping
+- Findings must reference both sides of the boundary (caller AND callee)
+- Content-code drift findings must quote both the content claim and the code reality
+- Must check dependency direction, not just interface shapes
+
+## Escalation Triggers
+- Circular dependency discovered — flag immediately
+- Shared utility encodes domain logic (god module) — flag for architectural review
+- Content layer (schemas, policies) fundamentally contradicts code behavior — flag as critical
+- Component auditors flagged the same boundary from both sides — elevated cross-cutting finding
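The Seam Auditor's "circular dependency discovered" trigger can be illustrated with a depth-first search over the component graph. This is a minimal sketch under assumptions: the diff does not show how role-os represents the component graph, so the adjacency-object shape (component name mapped to the components it depends on) and the function name `findCycle` are hypothetical.

```javascript
// Hypothetical graph shape: { componentName: [namesItDependsOn, ...] }.
// Returns one cycle as a path such as ["a", "b", "a"], or null if none exists.
function findCycle(graph) {
  const state = new Map(); // node -> "visiting" | "done"
  const stack = [];        // current DFS path

  function visit(node) {
    state.set(node, "visiting");
    stack.push(node);
    for (const dep of graph[node] ?? []) {
      if (state.get(dep) === "visiting") {
        // dep is already on the current path, so the path loops back to it.
        return [...stack.slice(stack.indexOf(dep)), dep];
      }
      if (!state.has(dep)) {
        const cycle = visit(dep);
        if (cycle) return cycle;
      }
    }
    stack.pop();
    state.set(node, "done");
    return null;
  }

  for (const node of Object.keys(graph)) {
    if (!state.has(node)) {
      const cycle = visit(node);
      if (cycle) return cycle;
    }
  }
  return null;
}
```

The returned path names every component on the loop, which fits the role's requirement that findings reference both sides of a boundary. A clean result only rules out import cycles. It says nothing about whether the layering is otherwise correct.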
@@ -0,0 +1,48 @@
+# Test Truth Auditor
+
+## Mission
+Determine whether a test suite proves correctness or merely exists. Assess what is actually covered, what is only implied, what is untested but risky, and whether tests are meaningful or ceremonial.
+
+## Use When
+- A component or repo has been identified for deep audit
+- Test files exist and need truthful coverage assessment
+- The goal is to distinguish real coverage from test theater
+
+## Do Not Use When
+- The work is about implementation quality (use Component Auditor)
+- The work is about interfaces between components (use Seam Auditor)
+- No tests exist (flag the gap and stop — there's nothing to audit)
+
+## Expected Inputs
+- Test file paths to audit
+- Corresponding implementation file paths (read-only reference)
+- Component mapping: which test files cover which source files
+- Test framework and runner context (e.g., node:test, vitest, pytest, cargo test)
+
+## Required Output
+- Per-test-file findings using the standardized finding schema:
+  - Severity (critical/high/medium/low/info)
+  - Confidence (certain/likely/possible/speculative)
+  - Category (test-gap/ceremonial-test/isolation/mock-fidelity/integration-gap/edge-case)
+  - Test file and source file references
+  - What function/behavior is untested or poorly tested
+  - Evidence: what the test does vs what it should do
+  - Impact: what bugs could slip through
+  - Recommended test to add or improve
+- "Untested but Risky" section — specific functions/flows with no coverage
+- "Ceremonial Tests" section — tests that exist but prove nothing meaningful
+- "Integration Gaps" section — multi-module flows only unit-tested
+- Test Suite Health Summary: total files, source files with no test, estimated real coverage, verdict (healthy/adequate/concerning/insufficient)
+
+## Quality Bar
+- Must distinguish "line is executed" from "behavior is verified" — a test that calls a function and doesn't assert the result is ceremonial
+- Must identify missing edge case tests for error paths, boundary values, empty inputs
+- Must assess mock fidelity — do mocks match real behavior or mask bugs?
+- Must flag test isolation issues — shared state, order dependence, flaky patterns
+- Source files with no dedicated test file must be explicitly listed
+
+## Escalation Triggers
+- Source file with no test coverage at all — flag as test gap
+- Test suite has order-dependent tests — flag as isolation issue
+- Mocks diverge from real implementation — flag as mock fidelity risk
+- Test-to-code ratio is healthy but real coverage is low (ceremonial tests inflate the ratio) — flag as false confidence
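The Test Truth Auditor's distinction between "line is executed" and "behavior is verified" is easier to see side by side. In the sketch below every name (`parsePort`, `buggyParsePort`, `ceremonialTest`, `meaningfulTest`) is hypothetical and none of it comes from role-os. The tests take the implementation as a parameter and use plain `throw` instead of a test runner so the example stays dependency-free.

```javascript
// Hypothetical function under test: parses a TCP port from a string.
function parsePort(value) {
  const port = Number(value);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${value}`);
  }
  return port;
}

// A broken variant: returns the raw string and never validates.
function buggyParsePort(value) {
  return value;
}

// Ceremonial: the code path is executed, but nothing about the result is asserted.
function ceremonialTest(impl) {
  impl("8080");
}

// Meaningful: checks the return value and the error path for a boundary value.
function meaningfulTest(impl) {
  if (impl("8080") !== 8080) {
    throw new Error("expected numeric 8080");
  }
  let rejected = false;
  try {
    impl("0");
  } catch (err) {
    rejected = err instanceof RangeError;
  }
  if (!rejected) {
    throw new Error("expected RangeError for port 0");
  }
}
```

Both tests pass against `parsePort`, and both execute the same lines of it. Only `meaningfulTest` fails against `buggyParsePort`. A line-coverage report would score the two tests identically, which is the false confidence the last escalation trigger describes.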