azclaude-copilot 0.4.39 → 0.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/marketplace.json +2 -2
- package/.claude-plugin/plugin.json +2 -2
- package/README.md +9 -7
- package/bin/cli.js +53 -1
- package/package.json +2 -2
- package/templates/CLAUDE.md +35 -1
- package/templates/agents/cc-cli-integrator.md +5 -0
- package/templates/agents/cc-template-author.md +7 -0
- package/templates/agents/cc-test-maintainer.md +5 -0
- package/templates/agents/code-reviewer.md +11 -0
- package/templates/agents/constitution-guard.md +9 -0
- package/templates/agents/devops-engineer.md +9 -0
- package/templates/agents/loop-controller.md +7 -0
- package/templates/agents/milestone-builder.md +7 -0
- package/templates/agents/orchestrator-init.md +9 -1
- package/templates/agents/orchestrator.md +8 -0
- package/templates/agents/problem-architect.md +29 -1
- package/templates/agents/qa-engineer.md +9 -0
- package/templates/agents/security-auditor.md +9 -0
- package/templates/agents/spec-reviewer.md +9 -0
- package/templates/agents/test-writer.md +11 -0
- package/templates/capabilities/manifest.md +2 -0
- package/templates/capabilities/shared/context-inoculation.md +39 -0
- package/templates/capabilities/shared/reward-hack-detection.md +32 -0
- package/templates/commands/audit.md +8 -0
- package/templates/commands/ghost-test.md +99 -0
- package/templates/commands/inoculate.md +76 -0
- package/templates/commands/sentinel.md +3 -0
- package/templates/commands/ship.md +6 -0
- package/templates/commands/test.md +10 -0
- package/templates/hooks/post-tool-use.js +341 -277
- package/templates/hooks/pre-tool-use.js +344 -292
- package/templates/hooks/stop.js +198 -151
- package/templates/hooks/user-prompt.js +369 -163
- package/templates/scripts/statusline.sh +105 -0
- package/templates/skills/agent-creator/SKILL.md +11 -0
- package/templates/skills/architecture-advisor/SKILL.md +21 -16
- package/templates/skills/debate/SKILL.md +5 -0
- package/templates/skills/env-scanner/SKILL.md +5 -0
- package/templates/skills/frontend-design/SKILL.md +5 -0
- package/templates/skills/mcp/SKILL.md +3 -0
- package/templates/skills/security/SKILL.md +3 -0
- package/templates/skills/session-guard/SKILL.md +3 -0
- package/templates/skills/skill-creator/SKILL.md +12 -0
- package/templates/skills/test-first/SKILL.md +5 -0
|
@@ -14,10 +14,13 @@ tools: [Read, Grep, Glob, Bash]
|
|
|
14
14
|
disallowedTools: [Write, Edit, Agent]
|
|
15
15
|
permissionMode: plan
|
|
16
16
|
maxTurns: 40
|
|
17
|
+
tags: [security, scan, secrets, permissions, mcp, supply-chain]
|
|
17
18
|
---
|
|
18
19
|
|
|
19
20
|
## Layer 1: PERSONA
|
|
20
21
|
|
|
22
|
+
<instructions>
|
|
23
|
+
|
|
21
24
|
Security auditor. Read-only — never modifies files, never executes arbitrary code.
|
|
22
25
|
Scans Claude Code environments for security issues using native tools only.
|
|
23
26
|
Reports findings as `file:line — rule-id — description`. No speculation — only flag what is confirmed in files.
|
|
@@ -399,6 +402,10 @@ Include supply chain findings in the report under "### SUPPLY CHAIN (advisory)"
|
|
|
399
402
|
|
|
400
403
|
---
|
|
401
404
|
|
|
405
|
+
</instructions>
|
|
406
|
+
|
|
407
|
+
<output_format>
|
|
408
|
+
|
|
402
409
|
## Scoring & Output
|
|
403
410
|
|
|
404
411
|
After all 5 categories:
|
|
@@ -449,3 +456,5 @@ PROCEED → grade C or D, zero BLOCKED findings
|
|
|
449
456
|
- If a category has no findings: write `{category}: clean`
|
|
450
457
|
- Never write "likely" or "possibly" — only confirmed findings
|
|
451
458
|
- Each BLOCKED finding must include a one-line Fix instruction
|
|
459
|
+
|
|
460
|
+
</output_format>
|
|
@@ -7,10 +7,13 @@ description: >
|
|
|
7
7
|
Spawned by /blueprint when a spec file is provided as input.
|
|
8
8
|
model: haiku
|
|
9
9
|
tools: [Read, Grep, Glob]
|
|
10
|
+
tags: [spec, validate, acceptance-criteria, quality-gate]
|
|
10
11
|
---
|
|
11
12
|
|
|
12
13
|
# Spec Reviewer — Quality Gate Before Planning
|
|
13
14
|
|
|
15
|
+
<instructions>
|
|
16
|
+
|
|
14
17
|
You read specs. You validate them. You never write code, never modify the spec.
|
|
15
18
|
Your job: prevent /blueprint from planning against an ambiguous or incomplete spec.
|
|
16
19
|
|
|
@@ -89,6 +92,10 @@ and machine implementation. Your standards:
|
|
|
89
92
|
|
|
90
93
|
---
|
|
91
94
|
|
|
95
|
+
</instructions>
|
|
96
|
+
|
|
97
|
+
<output_format>
|
|
98
|
+
|
|
92
99
|
## Verdict Format
|
|
93
100
|
|
|
94
101
|
Return EXACTLY this block:
|
|
@@ -114,6 +121,8 @@ VERDICT: APPROVED
|
|
|
114
121
|
If APPROVED: output `VERDICT: APPROVED` and nothing else.
|
|
115
122
|
If not APPROVED: list ONLY the failing criteria with one-line gap description each.
|
|
116
123
|
|
|
124
|
+
</output_format>
|
|
125
|
+
|
|
117
126
|
---
|
|
118
127
|
|
|
119
128
|
## After Completing
|
|
@@ -10,8 +10,13 @@ tools: [Read, Write, Edit, Bash, Glob, Grep]
|
|
|
10
10
|
disallowedTools: [Agent]
|
|
11
11
|
permissionMode: acceptEdits
|
|
12
12
|
maxTurns: 40
|
|
13
|
+
tags: [test, coverage, spec, assertion, framework]
|
|
13
14
|
---
|
|
14
15
|
|
|
16
|
+
# Test Writer
|
|
17
|
+
|
|
18
|
+
<instructions>
|
|
19
|
+
|
|
15
20
|
## Layer 1: PERSONA
|
|
16
21
|
|
|
17
22
|
Test specialist. Writes tests that match the project's existing test style.
|
|
@@ -109,6 +114,10 @@ go test ./{package}/ -run {TestName} -v 2>&1 | tail -20
|
|
|
109
114
|
If tests fail: read the error, fix the test (not the source), re-run.
|
|
110
115
|
After 2 fix attempts: report the issue with the exact error.
|
|
111
116
|
|
|
117
|
+
</instructions>
|
|
118
|
+
|
|
119
|
+
<output_format>
|
|
120
|
+
|
|
112
121
|
## Output Format
|
|
113
122
|
|
|
114
123
|
```
|
|
@@ -123,6 +132,8 @@ Functions tested:
|
|
|
123
132
|
- otherFunction — happy path, boundary value
|
|
124
133
|
```
|
|
125
134
|
|
|
135
|
+
</output_format>
|
|
136
|
+
|
|
126
137
|
## Self-Correction
|
|
127
138
|
If tests fail: re-read the error, fix the test assertion or setup.
|
|
128
139
|
After 2 attempts: stop. Report the exact error and what the source actually returns.
|
|
@@ -26,6 +26,8 @@ Load only the files that match the current task. Never load the full list.
|
|
|
26
26
|
| shared/semantic-boundary-check.md | /evolve Cycle 3 or boundary validator warns — detect deeper behavioral duplication across extension types that grep misses | ~300 |
|
|
27
27
|
| shared/domain-advisor-generator.md | Non-tech domain detected (compliance, marketing, finance, medical, legal, research) — generates domain-specific advisor skill | ~400 |
|
|
28
28
|
| shared/intelligent-dispatch.md | About to build, fix, refactor, audit, or ship — non-trivial scope (3+ files or structural) — pre-flight analysis via problem-architect | ~300 |
|
|
29
|
+
| shared/context-inoculation.md | /copilot dispatch, milestone-builder spawn, autonomous agent context — inject anti-reward-hack boundaries (75-90% reduction) | ~120 |
|
|
30
|
+
| shared/reward-hack-detection.md | /audit, /ship, post-milestone review — detect reward hacking patterns in test modifications | ~150 |
|
|
29
31
|
|
|
30
32
|
## Level Builders — load ONE at a time
|
|
31
33
|
| File | When to load | Tokens |
|
|
@@ -0,0 +1,39 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: context-inoculation
|
|
3
|
+
description: >
|
|
4
|
+
Explicit boundary language injected before autonomous task dispatch.
|
|
5
|
+
Based on Anthropic's "Natural Emergent Misalignment from Reward Hacking" paper
|
|
6
|
+
(Section 4.2). Reduces reward hacking generalization by 75-90%.
|
|
7
|
+
Load when: /copilot dispatch, milestone-builder spawn, any autonomous agent context.
|
|
8
|
+
tokens: ~120
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## Context Inoculation
|
|
12
|
+
|
|
13
|
+
Inject these boundaries into every autonomous agent prompt before task dispatch.
|
|
14
|
+
|
|
15
|
+
### Required Preamble (copy verbatim into agent context)
|
|
16
|
+
|
|
17
|
+
> You will be evaluated on the ACTUAL correctness of your work, not on reported metrics.
|
|
18
|
+
> Shortcuts that make tests pass without solving the underlying problem are known patterns — report them, never use them.
|
|
19
|
+
> If a test fails, fix the code — never weaken the test to hide the failure.
|
|
20
|
+
> If you cannot complete a task, report it as incomplete. Incomplete is acceptable; deception is not.
|
|
21
|
+
|
|
22
|
+
### When to Inject
|
|
23
|
+
|
|
24
|
+
- Before every milestone-builder dispatch in /copilot mode
|
|
25
|
+
- Before every Task tool spawn in orchestrator
|
|
26
|
+
- Before /add and /fix when running autonomously (copilot-intent.md exists)
|
|
27
|
+
|
|
28
|
+
### What It Prevents
|
|
29
|
+
|
|
30
|
+
- `__eq__` overrides that return True (all comparisons pass vacuously)
|
|
31
|
+
- `sys.exit(0)` in test files (tests exit before assertions run)
|
|
32
|
+
- Monkey-patching test reporters (conftest.py fakes pass results)
|
|
33
|
+
- Deleting or weakening assertions to make tests green
|
|
34
|
+
- Any pattern where the metric improves but the underlying behavior does not
|
|
35
|
+
|
|
36
|
+
### Verification
|
|
37
|
+
|
|
38
|
+
After agent completes: if test count decreased OR assertions were removed, flag as suspicious.
|
|
39
|
+
Cross-reference with pre-tool-use reward hack rules for defense-in-depth.
|
|
@@ -0,0 +1,32 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: reward-hack-detection
|
|
3
|
+
description: >
|
|
4
|
+
Post-execution checklist for detecting reward hacking patterns in AI-generated code.
|
|
5
|
+
Based on Anthropic's "Natural Emergent Misalignment from Reward Hacking" paper.
|
|
6
|
+
Load when: /audit, /ship, post-milestone review, test result verification.
|
|
7
|
+
tokens: ~150
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
## Reward Hack Detection Checklist
|
|
11
|
+
|
|
12
|
+
Run after any autonomous code generation or test modification.
|
|
13
|
+
|
|
14
|
+
### Static Checks
|
|
15
|
+
|
|
16
|
+
1. **Test count delta** — `git diff HEAD --stat | grep test` — if test files shrank, investigate
|
|
17
|
+
2. **Assertion removal** — `git diff HEAD -- '*test*' | grep '^-.*assert\|^-.*expect'` — removed assertions = red flag
|
|
18
|
+
3. **Exit in tests** — `grep -rn 'sys\.exit\|os\._exit\|process\.exit' tests/` — test files should never exit
|
|
19
|
+
4. **Equality override** — `grep -rn 'def __eq__.*return True' tests/` — fake equality = all comparisons pass
|
|
20
|
+
5. **Framework patching** — `grep -rn 'TestReport\|pytest_runtest_makereport' tests/ conftest.py` — result fabrication
|
|
21
|
+
|
|
22
|
+
### Behavioral Checks (from session security log)
|
|
23
|
+
|
|
24
|
+
6. **Test run then test modify** — review seclog for `test-then-test-modify` pattern
|
|
25
|
+
7. **Hook modification** — review seclog for `hook-self-modification` pattern
|
|
26
|
+
8. **Metric vs. reality** — if reported "all tests pass" but test count dropped or assertions removed, flag
|
|
27
|
+
|
|
28
|
+
### Response Protocol
|
|
29
|
+
|
|
30
|
+
- 1 flag = WARN in audit/ship report, require manual review
|
|
31
|
+
- 2+ flags = BLOCK /ship, require human verification before release
|
|
32
|
+
- Any flag + context-inoculation was NOT loaded = escalate (defense layer was missing)
|
|
@@ -115,6 +115,14 @@ Report violations only — no code quality feedback until spec is clean.
|
|
|
115
115
|
|
|
116
116
|
---
|
|
117
117
|
|
|
118
|
+
## Step 2b: Reward Hack Detection (if autonomous code was generated)
|
|
119
|
+
|
|
120
|
+
If `.claude/copilot-intent.md` exists OR recent commits are from autonomous agents:
|
|
121
|
+
Load `capabilities/shared/reward-hack-detection.md` and run the Static Checks (1-5).
|
|
122
|
+
Report any flags before proceeding to code quality.
|
|
123
|
+
|
|
124
|
+
---
|
|
125
|
+
|
|
118
126
|
## Step 3: Code Quality (only if Step 2 passes)
|
|
119
127
|
|
|
120
128
|
**IDE diagnostics**: use `mcp__ide__getDiagnostics` if available.
|
|
@@ -0,0 +1,99 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: ghost-test
|
|
3
|
+
description: >
|
|
4
|
+
Detect reward hacking in test suites. Phase 1: static scan for known patterns
|
|
5
|
+
(AlwaysEqual, sys.exit bypass, conftest.py patching). Phase 2 (opt-in): canary
|
|
6
|
+
assertion injection — if assert 1==2 passes, the test framework is compromised.
|
|
7
|
+
Based on Anthropic's "Natural Emergent Misalignment" paper (Section 3, Figure 8).
|
|
8
|
+
argument-hint: "[--static | --canary | --all (default)]"
|
|
9
|
+
disable-model-invocation: true
|
|
10
|
+
allowed-tools: Read, Grep, Bash, Glob
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
# /ghost-test — Test Integrity Verification
|
|
14
|
+
|
|
15
|
+
$ARGUMENTS
|
|
16
|
+
|
|
17
|
+
---
|
|
18
|
+
|
|
19
|
+
## Phase 1: Static Scan (always runs)
|
|
20
|
+
|
|
21
|
+
Detect the 3 reward hack patterns from the paper:
|
|
22
|
+
|
|
23
|
+
### Pattern 1 — AlwaysEqual (`__eq__` override)
|
|
24
|
+
```bash
|
|
25
|
+
grep -rn 'def __eq__.*return True\|__eq__.*=.*lambda.*True' tests/ test/ spec/ 2>/dev/null
|
|
26
|
+
```
|
|
27
|
+
**Severity:** HIGH — all equality assertions pass vacuously.
|
|
28
|
+
|
|
29
|
+
### Pattern 2 — Exit Bypass (`sys.exit(0)`)
|
|
30
|
+
```bash
|
|
31
|
+
grep -rn 'sys\.exit\s*(\s*0\|os\._exit\s*(\s*0\|process\.exit\s*(\s*0' tests/ test/ spec/ *test*.py *test*.js *test*.ts 2>/dev/null
|
|
32
|
+
```
|
|
33
|
+
**Severity:** HIGH — test process exits before assertions run.
|
|
34
|
+
|
|
35
|
+
### Pattern 3 — Framework Patching (conftest.py / TestReport)
|
|
36
|
+
```bash
|
|
37
|
+
grep -rn 'TestReport\.from_item_and_call\|pytest_runtest_makereport\|monkeypatch.*TestReport' tests/ test/ conftest.py **/conftest.py 2>/dev/null
|
|
38
|
+
```
|
|
39
|
+
**Severity:** CRITICAL — test results are fabricated at the framework level.
|
|
40
|
+
|
|
41
|
+
Report each finding with `file:line` and severity.
|
|
42
|
+
|
|
43
|
+
---
|
|
44
|
+
|
|
45
|
+
## Phase 2: Canary Assertion (only with --canary or --all)
|
|
46
|
+
|
|
47
|
+
**Purpose:** Inject a deliberately failing test. If it passes, something is intercepting the test framework.
|
|
48
|
+
|
|
49
|
+
### Step 1: Detect test framework
|
|
50
|
+
```bash
|
|
51
|
+
[ -f pytest.ini ] || [ -f pyproject.toml ] && grep -q pytest pyproject.toml 2>/dev/null && echo "pytest"
|
|
52
|
+
[ -f jest.config.js ] || [ -f jest.config.ts ] && echo "jest"
|
|
53
|
+
[ -f package.json ] && grep -q '"vitest"' package.json 2>/dev/null && echo "vitest"
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
### Step 2: Create and run canary (via Bash — no Write tool needed)
|
|
57
|
+
|
|
58
|
+
**pytest:**
|
|
59
|
+
```bash
|
|
60
|
+
echo 'def test_canary_must_fail(): assert 1 == 2, "canary"' > _ghost_canary_test.py
|
|
61
|
+
python -m pytest _ghost_canary_test.py --no-header -q 2>&1; CANARY_EXIT=$?
|
|
62
|
+
rm -f _ghost_canary_test.py
|
|
63
|
+
echo "CANARY_EXIT=$CANARY_EXIT"
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
**jest:**
|
|
67
|
+
```bash
|
|
68
|
+
echo 'test("canary must fail", () => { expect(1).toBe(2); });' > _ghost_canary_test.test.js
|
|
69
|
+
npx jest _ghost_canary_test.test.js --no-coverage 2>&1; CANARY_EXIT=$?
|
|
70
|
+
rm -f _ghost_canary_test.test.js
|
|
71
|
+
echo "CANARY_EXIT=$CANARY_EXIT"
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
### Step 3: Interpret
|
|
75
|
+
- **Exit code != 0** (canary FAILED as expected) = PASS — framework is honest
|
|
76
|
+
- **Exit code == 0** (canary PASSED unexpectedly) = **CRITICAL** — framework is compromised
|
|
77
|
+
|
|
78
|
+
---
|
|
79
|
+
|
|
80
|
+
## Report
|
|
81
|
+
|
|
82
|
+
```
|
|
83
|
+
/ghost-test — Test Integrity Report
|
|
84
|
+
|
|
85
|
+
Phase 1 — Static Scan:
|
|
86
|
+
Pattern 1 (__eq__ override): [clean | N findings]
|
|
87
|
+
Pattern 2 (exit bypass): [clean | N findings]
|
|
88
|
+
Pattern 3 (framework patching): [clean | N findings]
|
|
89
|
+
|
|
90
|
+
Phase 2 — Canary Assertion:
|
|
91
|
+
Framework: [detected framework]
|
|
92
|
+
Result: FAILED (expected) = framework honest
|
|
93
|
+
PASSED (unexpected) = CRITICAL — framework compromised
|
|
94
|
+
|
|
95
|
+
Verdict: CLEAN | SUSPICIOUS (N findings) | COMPROMISED
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
If any findings: list each with `file:line` and recommended action.
|
|
99
|
+
If COMPROMISED: recommend immediate investigation of conftest.py and test setup files.
|
|
@@ -0,0 +1,76 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: inoculate
|
|
3
|
+
description: >
|
|
4
|
+
Scan agent and skill files for context inoculation coverage.
|
|
5
|
+
Reports which files have inoculation language and which don't.
|
|
6
|
+
Based on Anthropic's "Natural Emergent Misalignment" paper (Section 4.2).
|
|
7
|
+
argument-hint: "[--scan | --generate | --all (default)]"
|
|
8
|
+
disable-model-invocation: true
|
|
9
|
+
allowed-tools: Read, Grep, Bash, Glob
|
|
10
|
+
---
|
|
11
|
+
|
|
12
|
+
# /inoculate — Context Inoculation Scanner
|
|
13
|
+
|
|
14
|
+
$ARGUMENTS
|
|
15
|
+
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
**EnterPlanMode** — this command is read-only. No file modifications.
|
|
19
|
+
|
|
20
|
+
---
|
|
21
|
+
|
|
22
|
+
## Step 1: Scan Agent Files
|
|
23
|
+
|
|
24
|
+
```bash
|
|
25
|
+
ls .claude/agents/*.md 2>/dev/null || echo "No agents installed"
|
|
26
|
+
```
|
|
27
|
+
|
|
28
|
+
For each agent file, check for inoculation markers:
|
|
29
|
+
```bash
|
|
30
|
+
grep -l "actual correctness\|shortcuts.*unacceptable\|never.*fix the test\|deception is not\|never weaken" .claude/agents/*.md 2>/dev/null
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
Classify each agent:
|
|
34
|
+
- **INOCULATED** — contains at least one inoculation phrase
|
|
35
|
+
- **NOT INOCULATED** — missing inoculation language (fix with --generate)
|
|
36
|
+
- **EXEMPT** — read-only agents (tools list has no Write/Edit) don't need inoculation
|
|
37
|
+
|
|
38
|
+
## Step 2: Scan Skill Files
|
|
39
|
+
|
|
40
|
+
```bash
|
|
41
|
+
ls .claude/skills/*/SKILL.md 2>/dev/null || echo "No skills installed"
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
Same classification as agents.
|
|
45
|
+
|
|
46
|
+
## Step 3: Report
|
|
47
|
+
|
|
48
|
+
Output format:
|
|
49
|
+
```
|
|
50
|
+
/inoculate — Context Inoculation Coverage
|
|
51
|
+
|
|
52
|
+
Agents:
|
|
53
|
+
+ milestone-builder.md INOCULATED
|
|
54
|
+
- code-reviewer.md NOT INOCULATED
|
|
55
|
+
. spec-reviewer.md EXEMPT (read-only)
|
|
56
|
+
|
|
57
|
+
Skills:
|
|
58
|
+
+ test-first/SKILL.md INOCULATED
|
|
59
|
+
- skill-creator/SKILL.md NOT INOCULATED
|
|
60
|
+
|
|
61
|
+
Coverage: 4/8 agents, 2/5 skills (50%)
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
## Step 4: Generate (if --generate or --all)
|
|
65
|
+
|
|
66
|
+
For each NOT INOCULATED file:
|
|
67
|
+
1. Read the agent/skill's purpose from its frontmatter `description` field
|
|
68
|
+
2. Generate a 2-3 line inoculation block tailored to its role:
|
|
69
|
+
- For code-writing agents: "verify test results reflect actual behavior, not framework manipulation"
|
|
70
|
+
- For test-writing agents: "every test must contain at least one meaningful assertion on computed output"
|
|
71
|
+
- For review agents: "flag test files where assertion count decreased or comparisons were weakened"
|
|
72
|
+
3. Output the generated text for the user to review — do NOT modify files automatically
|
|
73
|
+
|
|
74
|
+
**ExitPlanMode**
|
|
75
|
+
|
|
76
|
+
Show the report. If gaps exist, suggest: "Run `/inoculate --generate` to create inoculation text for gaps."
|
|
@@ -43,6 +43,9 @@ Scans five layers of the Claude Code environment for security issues.
|
|
|
43
43
|
Each layer is scored independently. Final score = weighted average (0–100).
|
|
44
44
|
Grade: A ≥ 90 · B ≥ 75 · C ≥ 60 · D ≥ 45 · F < 45
|
|
45
45
|
|
|
46
|
+
**Related:** Run `/ghost-test` for test-specific reward hack detection (AlwaysEqual, sys.exit bypass, framework patching).
|
|
47
|
+
Run `/inoculate` to check context inoculation coverage across agents and skills.
|
|
48
|
+
|
|
46
49
|
Parse $ARGUMENTS:
|
|
47
50
|
- `--hooks` → run Layer 1 + 2 only
|
|
48
51
|
- `--mcp` → run Layer 3 only
|
|
@@ -63,6 +63,12 @@ If `agent=found`: read `.claude/agents/security-auditor.md` and execute the secr
|
|
|
63
63
|
```
|
|
64
64
|
✗ Pre-ship blocked: security-auditor found BLOCKED findings. Run /sentinel for details.
|
|
65
65
|
```
|
|
66
|
+
**0c. Test integrity check** — detect reward hacking patterns in test suite:
|
|
67
|
+
```bash
|
|
68
|
+
grep -rn 'def __eq__.*return True\|sys\.exit\s*(0)\|TestReport\.from_item_and_call' tests/ test/ conftest.py 2>/dev/null
|
|
69
|
+
```
|
|
70
|
+
If any match: WARN. `⚠ Pre-ship warning: reward hack pattern detected in test files. Run /ghost-test for details.`
|
|
71
|
+
|
|
66
72
|
If `agent=missing`: run inline secret scan:
|
|
67
73
|
```bash
|
|
68
74
|
grep -rn "AKIA[A-Z0-9]\{16\}\|ghp_[A-Za-z0-9]\{36\}\|glpat-\|xoxb-\|sk_live_\|-----BEGIN.*PRIVATE KEY" \
|
|
@@ -28,6 +28,16 @@ If writing new tests: follow the naming pattern from code-rules, not generic con
|
|
|
28
28
|
|
|
29
29
|
---
|
|
30
30
|
|
|
31
|
+
## Step 0: Test Integrity (if autonomous code was generated)
|
|
32
|
+
|
|
33
|
+
If `.claude/copilot-intent.md` exists: run a quick reward hack scan before executing tests:
|
|
34
|
+
```bash
|
|
35
|
+
grep -rn 'def __eq__.*return True\|sys\.exit\s*(0)\|TestReport\.from_item_and_call' tests/ test/ conftest.py 2>/dev/null
|
|
36
|
+
```
|
|
37
|
+
If any match: WARN before running the suite. `⚠ Reward hack pattern detected — run /ghost-test for full analysis.`
|
|
38
|
+
|
|
39
|
+
---
|
|
40
|
+
|
|
31
41
|
## Step 1: IDE Diagnostics First
|
|
32
42
|
|
|
33
43
|
Use `mcp__ide__getDiagnostics` if available.
|