@neikyun/ciel 6.11.0 → 6.11.1
- package/assets/.claude/hooks/memory-engine.py +29 -4
- package/assets/commands/ciel-create-skill.md +2 -2
- package/assets/commands/ciel-status.md +1 -1
- package/assets/platforms/opencode/.opencode/agents/ciel-improver.md +2 -2
- package/assets/platforms/opencode/.opencode/commands/ciel-create-skill.md +2 -2
- package/assets/platforms/opencode/.opencode/commands/ciel-memory-bootstrap.md +195 -0
- package/assets/skills/workflow/adr-auto/SKILL.md +88 -0
- package/assets/skills/workflow/ai-failure-modes-detector/SKILL.md +180 -0
- package/assets/skills/workflow/ask-window/SKILL.md +119 -0
- package/assets/skills/workflow/avec-quoi-versioner/SKILL.md +111 -0
- package/assets/skills/workflow/ci-watcher/SKILL.md +194 -0
- package/assets/skills/workflow/critiquer-auditor/SKILL.md +135 -0
- package/assets/skills/workflow/critiquer-auditor/reference.md +134 -0
- package/assets/skills/workflow/debug-reasoning-rca/SKILL.md +174 -0
- package/assets/skills/workflow/depth-classifier/SKILL.md +118 -0
- package/assets/skills/workflow/diverge/SKILL.md +91 -0
- package/assets/skills/workflow/doc-validator-official/SKILL.md +196 -0
- package/assets/skills/workflow/evaluer-sizer/SKILL.md +112 -0
- package/assets/skills/workflow/faire-gatekeeper/SKILL.md +99 -0
- package/assets/skills/workflow/flux-narrator/SKILL.md +93 -0
- package/assets/skills/workflow/memoire/SKILL.md +198 -0
- package/assets/skills/workflow/memoire-consolidator/SKILL.md +91 -0
- package/assets/skills/workflow/meta-critiquer/SKILL.md +112 -0
- package/assets/skills/workflow/modern-patterns-checker/SKILL.md +166 -0
- package/assets/skills/workflow/pattern-fitness-check/SKILL.md +108 -0
- package/assets/skills/workflow/playwright-visual-critic/SKILL.md +98 -0
- package/assets/skills/workflow/pr-review-responder/SKILL.md +214 -0
- package/assets/skills/workflow/prouver-verifier/SKILL.md +184 -0
- package/assets/skills/workflow/prouver-verifier/reference.md +152 -0
- package/assets/skills/workflow/quoi-framer/SKILL.md +91 -0
- package/assets/skills/workflow/relire-critic/SKILL.md +99 -0
- package/assets/skills/workflow/security-regression-check/SKILL.md +86 -0
- package/assets/skills/workflow/self-consistency-verifier/SKILL.md +85 -0
- package/assets/skills/workflow/spike-mode/SKILL.md +101 -0
- package/assets/skills/workflow/stride-analyzer/SKILL.md +96 -0
- package/assets/skills/workflow/stride-analyzer/reference.md +144 -0
- package/assets/skills/workflow/test-strategy-vitest-playwright/SKILL.md +119 -0
- package/package.json +1 -1
|
@@ -0,0 +1,119 @@
---
name: ask-window
description: How to use the ASK window in Ciel v5 — before coding, clarify ambiguities using the question tool (OpenCode) or plan mode (Claude Code). Covers steps 3 (ASK) and 10 (ASK2) of the pipeline. Prevents coding on assumptions.
---

# ASK Window — Clarify Before You Code (Ciel v5)

## What this covers

How to use the ASK window in the Ciel v5 pipeline. Before coding, the agent must ask clarifying questions rather than assuming. This skill covers steps 3 (ASK, after QUOI) and 10 (ASK2, after EVALUER).

## Core principle

**Do not code on assumptions.** When requirements are ambiguous, parameters are undefined, or choices are implicit, ask. Use the question tool (OpenCode) or plan mode (Claude Code).

## Two modes — when to use which

```
MODE ASK  (step 3)  → "What should I build?"     → after QUOI, before research
MODE ASK2 (step 10) → "Should I build this way?" → after EVALUER, before coding
```

ASK is about **requirements** — clarify what to build. ASK2 is about **the plan** — validate how to build it.

### ASK (step 3) — clarify requirements

After QUOI, before any research or coding. Questions are about the **what**, not the **how**:

- Requirements: "Is the email field required?"
- Ambiguities: "Session cookie or JWT?"
- Assumptions: "I assume the database is PostgreSQL, correct?"
- Missing info: "What is the expected throughput?"
- Scope boundaries: "Does this include the admin panel?"

### ASK2 (step 10) — validate the plan

After EVALUER, before FAIRE. Questions are about the **approach**, not the **requirements**:

- Approach validation: "I'm going with approach A because X. OK?"
- Trade-off validation: "Approach A is simpler but B is more flexible. OK with A?"
- Risk confirmation: "The main risk is X. Acceptable?"
- Effort check: "This will take ~2 hours. OK?"

## How to ask

### OpenCode: use the `question` tool

The `question` tool is a built-in OpenCode tool. Each question includes:
1. A header (category of question)
2. The question text
3. A list of options (at least 2)
4. A custom answer option

Example:
```
Tool: question
Parameters:
  header: "Database choice"
  question: "Which database should we use for the new feature?"
  options: ["PostgreSQL (existing)", "SQLite (simpler)", "MySQL (new)"]
```

### Claude Code: use plan mode

In Claude Code, switch to plan mode (Shift+Tab cycles through the modes), then:
1. List your assumptions explicitly
2. Ask questions one at a time
3. Wait for answers before proceeding
4. Update the plan based on responses

## Structure of a good question

```
QUESTION CATEGORY: <requirement | assumption | tradeoff | risk | scope>

Context: <1-2 sentences explaining why you're asking>

Question: <clear, specific question>

Options:
  A: <option>
  B: <option>
  C: <other> (if relevant)
```

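
The template above can be filled in mechanically. The following is a minimal sketch, assuming a hypothetical helper named `render_question` (it is not part of Ciel, OpenCode, or Claude Code) that takes a category, a context sentence, the question, and at least two options, mirroring the "at least 2" rule of the `question` tool.

```bash
# Hypothetical helper: render_question CATEGORY CONTEXT QUESTION OPTION OPTION...
# Prints the question in the template layout shown above.
render_question() {
  if [ "$#" -lt 5 ]; then
    echo "render_question: need category, context, question and at least 2 options" >&2
    return 1
  fi
  category=$1
  context=$2
  question=$3
  shift 3

  printf 'QUESTION CATEGORY: %s\n\n' "$category"
  printf 'Context: %s\n\n' "$context"
  printf 'Question: %s\n\n' "$question"
  printf 'Options:\n'

  label=A
  for option in "$@"; do
    printf '  %s: %s\n' "$label" "$option"
    # Advance the label one letter: A -> B -> C ...
    label=$(printf '%s' "$label" | tr 'A-Y' 'B-Z')
  done
}

render_question tradeoff \
  "Two approaches satisfy the requirement." \
  "Approach A is simpler but B is more flexible. OK with A?" \
  "Go with A" "Go with B"
```

Refusing to render a question with fewer than two options keeps the helper from producing open-ended prompts such as "what do you think?".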
## What NOT to ask

- Things you can discover yourself (read the code, check package.json)
- Trivial preferences that don't affect the design (naming, formatting)
- The same question twice (check the answers you already received)
- Questions you already have the answer to (check .ciel/memory.json, overlay)

## Output format

After the ASK phase, include in the plan:

```
## Questions asked (ASK)

1. <question> -> <answer selected>
2. <question> -> <custom answer>
```

## Common rationalizations

| Rationalization | Reality |
|---|---|
| "I'll just assume and fix it later" | Later is when it's in production and costs 10x to fix. Asking now costs 30 seconds. |
| "The user would have told me if it mattered" | Users don't know what they don't specify. Assumptions are silent bugs waiting to surface. |
| "I can figure it out from the code" | Code tells you WHAT, not WHY. If there are two valid approaches, code can't tell you which one the project prefers. |
| "Asking makes me look uncertain" | Coding on wrong assumptions makes you look incompetent. Asking is what senior engineers do. |

## How to verify

- [ ] All unclear requirements have been asked about?
- [ ] Assumptions have been validated (not silently filled)?
- [ ] Trade-offs have been offered to the user?
- [ ] Questions are specific, not vague ("what do you think?")?
- [ ] Answers are captured in the plan?
@@ -0,0 +1,111 @@
---
name: avec-quoi-versioner
description: Reads actual installed library versions from package.json, build.gradle, go.mod, Cargo.toml, pyproject.toml, Gemfile.lock — never trusts memory or assumptions. Loads ciel-overlay.md if present for project-specific stack context. Invoked before research to ensure all subsequent docs lookups target the correct versions.
allowed-tools: Read, Grep, Glob, Bash
---

# avec-quoi-versioner — Read real installed versions

Step 2 of CRÉER. Research quality is bounded by version accuracy: a skill that looks up "Ktor 2.x docs" when the project runs Ktor 3.x produces anti-patterns.

---

## Process

### 1. Detect package manager(s)

Scan project root for the following files (in order):

| File | Stack |
|------|-------|
| `package.json` + `package-lock.json` | npm / Node.js |
| `package.json` + `yarn.lock` | yarn |
| `package.json` + `pnpm-lock.yaml` | pnpm |
| `package.json` + `bun.lockb` | bun |
| `build.gradle.kts` / `build.gradle` | JVM / Gradle |
| `pom.xml` | Maven |
| `go.mod` + `go.sum` | Go |
| `Cargo.toml` + `Cargo.lock` | Rust |
| `pyproject.toml` + `poetry.lock` / `uv.lock` | Python |
| `requirements.txt` | Python (pip) |
| `Gemfile` + `Gemfile.lock` | Ruby |
| `composer.json` | PHP |
| `Package.swift` / `Package.resolved` | Swift |

Multiple lockfiles may exist (monorepo). Read them all.

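
The detection table can be sketched as a small scan. This is an illustrative sketch only: `stack_for_file` and `detect_stacks` are hypothetical helper names, the stack labels are shortened versions of the table entries, and skipping `node_modules` is an assumption about what a monorepo scan should ignore.

```bash
# Hypothetical helper: map a marker file name to the stack it indicates.
stack_for_file() {
  case "$(basename "$1")" in
    package-lock.json) echo "npm" ;;
    yarn.lock) echo "yarn" ;;
    pnpm-lock.yaml) echo "pnpm" ;;
    bun.lockb) echo "bun" ;;
    build.gradle|build.gradle.kts) echo "gradle" ;;
    pom.xml) echo "maven" ;;
    go.mod) echo "go" ;;
    Cargo.lock) echo "rust" ;;
    poetry.lock|uv.lock|requirements.txt) echo "python" ;;
    Gemfile.lock) echo "ruby" ;;
    composer.json) echo "php" ;;
    Package.resolved) echo "swift" ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: print "<stack><TAB><path>" for every marker file
# under DIR, so divergent lockfiles in a monorepo are all reported.
detect_stacks() {
  find "${1:-.}" -type d -name node_modules -prune -o -type f -print |
    while IFS= read -r path; do
      if stack=$(stack_for_file "$path"); then
        printf '%s\t%s\n' "$stack" "$path"
      fi
    done
}
```

Printing the path next to each stack keeps the "which package does this version apply to" question answerable later.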
### 2. Extract exact versions (not semver ranges)

For each relevant dependency in the task scope:

- Read the **lockfile** for the pinned version (not `package.json`'s range)
- For Gradle, run `./gradlew dependencies` if needed, or read `gradle.properties` or `gradle/libs.versions.toml` when versions are declared there
- For Go, `go.mod` already pins; verify with `go list -m all`
- For Maven, inspect the effective POM: `mvn help:effective-pom`

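
As one concrete instance of reading the pinned version rather than the manifest range, here is a minimal sketch for `Cargo.lock`, chosen only because its `name = "..."` / `version = "..."` layout can be read with plain `awk`. The helper name `cargo_locked_version` is hypothetical. When the lockfile or the crate is absent, it prints "version unknown" instead of guessing.

```bash
# Hypothetical helper: cargo_locked_version LOCKFILE CRATE
# Prints the version pinned in Cargo.lock, or "version unknown".
# If a crate is locked at several versions, only the first is printed.
cargo_locked_version() {
  lockfile=$1
  crate=$2
  if [ ! -f "$lockfile" ]; then
    echo "version unknown"
    return 0
  fi
  awk -v crate="$crate" '
    # A [[package]] header starts a new entry: forget the previous match.
    /^\[\[package\]\]/ { found = 0 }
    # name = "<crate>" marks the entry we are looking for.
    $1 == "name" && $3 == ("\"" crate "\"") { found = 1; next }
    # The version line of the matching entry holds the pinned version.
    found && $1 == "version" { gsub(/"/, "", $3); print $3; printed = 1; exit }
    END { if (!printed) print "version unknown" }
  ' "$lockfile"
}
```

Other lockfile formats (JSON, YAML) would need a real parser; the point of the sketch is only the "lockfile first, unknown otherwise" behaviour.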
### 3. Load ciel-overlay.md

If present at project root, extract:

- `## Stack` section — project's declared stack
- `## Versions` section — URLs to docs
- Any project-specific rules in `## Règles projet-spécifiques`

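
Pulling one section out of the overlay can be sketched with `awk`. This assumes the overlay uses `## ` headings exactly as named above and that a section ends at the next `## ` heading; `overlay_section` is a hypothetical helper name.

```bash
# Hypothetical helper: overlay_section FILE HEADING
# Prints the body of the "## HEADING" section, stopping at the next "## "
# heading. Prints nothing if the file or the section is absent.
overlay_section() {
  file=$1
  heading=$2
  [ -f "$file" ] || return 0
  awk -v h="## $heading" '
    $0 == h { inside = 1; next }      # section start: skip the heading itself
    inside && /^## / { exit }         # next section: stop
    inside { print }                  # body line
  ' "$file"
}
```

For example, `overlay_section ciel-overlay.md Stack` would print the declared stack, and an empty result means the overlay has nothing to say about it.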
### 4. State assumptions explicitly

For anything NOT verified from a lockfile:

- "Assuming build tool X because [reason]."
- "Assuming PostgreSQL is running on default port because [reason]."

These assumptions must be flagged for `researcher` to verify.

---

## Output format

```
## AVEC QUOI

Stack detected:
- Frontend: <framework> <version> (from <file>)
- Backend: <framework> <version> (from <file>)
- Database: <type> <version> (from <file or overlay>)
- Test: <framework> <version> (from <file>)
- Build: <tool> <version>

Overlay:
- [Loaded: yes/no]
- [Relevant sections: Stack, Versions, Règles, Leçons]

Assumptions (NOT from lockfile):
- <assumption> — <reason>

Docs URLs (from overlay):
- <lib>: <url>
```

---

## Guardrails

- **Never assume a version** — if lockfile is absent, state "version unknown" and flag it
- **Range vs pinned**: always report the pinned version from the lockfile, not the `^1.2.3` range from the manifest
- **Monorepo caution**: multiple lockfiles may diverge across packages. Specify which package the version applies to.
- **Don't guess URLs**: only report doc URLs from the overlay. Let `researcher` agent WebSearch for the rest.

---

## How to verify

- [ ] Versions read from lockfiles (not package.json ranges)?
- [ ] ciel-overlay.md consulted for project-specific versions?
- [ ] Framework detected (React/Vue/Svelte/Ktor/Express/etc)?
- [ ] Version gaps flagged (installed vs latest)?
- [ ] Overlay updated if new versions discovered?

## When triggered

- Standard/Critical tasks, immediately after `quoi-framer`
- Before dispatching `researcher` agent (research quality depends on version accuracy)
- When user asks "what versions are we on?" or the task mentions a specific library
@@ -0,0 +1,194 @@
---
name: ci-watcher
description: Streams GitHub Actions via `gh run watch`, classifies failures flaky (≥15% fail rate on main → auto-`gh run rerun --failed`) vs real (hand off to debug-reasoning-rca). Invoke after pr-opener, before pr-merger, or on "CI stuck" / "why is CI red" / "flaky test". Inline.
allowed-tools: Bash, Read
context: inline
---

# ci-watcher — Watch CI, distinguish flaky from broken, retry smart

`prouver-verifier` takes a single-point snapshot of CI state. `ci-watcher` watches over time: follows the run, waits for completion, classifies failures as flaky vs real, retries only what's safe.

---

## Inputs

```
BRANCH: [current branch — from git rev-parse]
PR_NUMBER: [optional — derived if branch has an open PR]
WORKFLOW: [optional — filter to specific workflow name; default all]
MODE: [watch | snapshot — default watch; snapshot = single poll + return]
MAX_RETRIES: [default 1 — for flaky-detected failures only]
FLAKY_THRESHOLD: [default 15 — % fail rate on main that classifies as flaky]
```

### Auto-inference sources

- **BRANCH** → `git rev-parse --abbrev-ref HEAD`
- **PR_NUMBER** → `gh pr view --json number --jq .number 2>/dev/null`
- **WORKFLOW** → all workflows that ran on the branch

---

## Preflight

```bash
# gh auth status exits non-zero when no account is authenticated
gh auth status >/dev/null 2>&1 || { echo "gh is not authenticated"; exit 1; }
BRANCH=${BRANCH:-$(git rev-parse --abbrev-ref HEAD)}

# Confirm branch has at least one run
LATEST=$(gh run list --branch="$BRANCH" --limit=1 --json databaseId,status,conclusion --jq '.[0] // empty')
[ -z "$LATEST" ] && { echo "No runs found for branch $BRANCH — push first"; exit 1; }
```

---

## Process

### 1. Stream or snapshot

**Watch mode (default)** — stream until completion:

```bash
RUN_ID=$(gh run list --branch="$BRANCH" --limit=1 --json databaseId --jq '.[0].databaseId')
gh run watch "$RUN_ID" --exit-status
RESULT=$?
```

`--exit-status` makes `gh run watch` exit non-zero if the run fails. `gh run watch` refreshes job and step status until the run completes; it does not print full logs (step 4 fetches those).

**Snapshot mode** — single poll:

```bash
gh run list --branch="$BRANCH" --limit=5 --json databaseId,name,status,conclusion,url
```

### 2. On failure — classify flaky vs real

```bash
# Get failing jobs for this run (one job name per line)
FAILED_JOBS=$(gh run view "$RUN_ID" --json jobs --jq '.jobs[] | select(.conclusion == "failure") | .name')

# Compare against the history of the same workflow on the base branch
BASE=$(gh pr view "$PR_NUMBER" --json baseRefName --jq .baseRefName 2>/dev/null || echo "main")
WORKFLOW=$(gh run view "$RUN_ID" --json workflowName --jq .workflowName)
FLAKY_THRESHOLD=${FLAKY_THRESHOLD:-15}
FLAKY_JOBS=()
REAL_FAILURES=()

# Last 50 runs on the base branch, reported as "<total><TAB><failures>".
# This is a workflow-level rate, used as a proxy for every failed job.
read -r TOTAL FAILS < <(gh run list \
  --branch="$BASE" \
  --workflow="$WORKFLOW" \
  --limit=50 \
  --json conclusion \
  --jq '[length, ([.[] | select(.conclusion == "failure")] | length)] | @tsv')

# Divide by the number of runs actually returned, not a fixed 50
if [ "${TOTAL:-0}" -gt 0 ]; then
  FAIL_PCT=$((FAILS * 100 / TOTAL))
else
  FAIL_PCT=0
fi

# Read names line by line so job names containing spaces stay intact
while IFS= read -r JOB; do
  [ -n "$JOB" ] || continue
  if [ "$FAIL_PCT" -ge "$FLAKY_THRESHOLD" ]; then
    echo "Job '$JOB' fails ${FAIL_PCT}% of the time on $BASE — CLASSIFIED FLAKY"
    FLAKY_JOBS+=("$JOB")
  else
    echo "Job '$JOB' fails ${FAIL_PCT}% of the time on $BASE — CLASSIFIED REAL FAILURE"
    REAL_FAILURES+=("$JOB")
  fi
done <<< "$FAILED_JOBS"
```

**Flaky threshold rationale**: 15% of 50 runs is 7.5, so with integer division a workflow needs 8 failures in 50 runs to count as flaky. Below that, a single failure is likely the PR's fault. Above it, environmental or test-harness instability is the more likely cause.

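
The threshold arithmetic can be checked in isolation, without calling `gh`. A minimal sketch, assuming a hypothetical helper `classify_failure` that takes the failure count, the number of runs inspected, and an optional threshold (default 15, as in the inputs above); treating an empty history as a real failure is a conservative assumption of this sketch.

```bash
# Hypothetical helper: classify_failure FAILS TOTAL [THRESHOLD]
# Prints FLAKY when the historical failure percentage reaches the threshold,
# REAL otherwise.
classify_failure() {
  fails=$1
  total=$2
  threshold=${3:-15}
  if [ "$total" -eq 0 ]; then
    # No history on the base branch: nothing suggests flakiness.
    echo "REAL"
    return 0
  fi
  pct=$((fails * 100 / total))   # integer division rounds down
  if [ "$pct" -ge "$threshold" ]; then
    echo "FLAKY"
  else
    echo "REAL"
  fi
}

classify_failure 7 50   # 14% of runs failed
classify_failure 8 50   # 16% of runs failed
```

With the default threshold, the first call prints `REAL` and the second prints `FLAKY`, which is the 7-versus-8 boundary described above.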
### 3. Retry flaky jobs (up to MAX_RETRIES)

```bash
RETRY_COUNT=${RETRY_COUNT:-0}
MAX_RETRIES=${MAX_RETRIES:-1}

# Retry only when every failure is flaky (see Guardrails) and the budget allows
if [ ${#FLAKY_JOBS[@]} -gt 0 ] && [ ${#REAL_FAILURES[@]} -eq 0 ] && [ "$RETRY_COUNT" -lt "$MAX_RETRIES" ]; then
  echo "Retrying flaky jobs (attempt $((RETRY_COUNT+1))/$MAX_RETRIES)"
  gh run rerun "$RUN_ID" --failed
  RETRY_COUNT=$((RETRY_COUNT+1))
  # Re-enter step 1 (watch the rerun)
fi
```

`--failed` only reruns failed jobs (saves CI minutes).

### 4. Extract log excerpt for real failures

For handoff to `debug-reasoning-rca`:

```bash
# REAL_FAILURES is an array: expand every element, quoted
for JOB in "${REAL_FAILURES[@]}"; do
  JOB_ID=$(gh run view "$RUN_ID" --json jobs --jq ".jobs[] | select(.name == \"$JOB\") | .databaseId")

  # Last 50 lines of the failing step
  gh run view --job="$JOB_ID" --log-failed | tail -n 50
done
```

### 5. Emit output

```
[CI WATCHER]
Run: <URL>
Status: <success | failure | in_progress>
Duration: <Xm Ys>

Jobs:
  [OK] build
  [OK] lint
  [WARN] integration-tests — FLAKY (fails 18% on main, retried — now green)
  [FAIL] unit-tests — REAL FAILURE (fails 2% on main — investigate)

Handoff (if real failures):
  - debug-reasoning-rca with SYMPTOM=<failing test name> + LOG excerpt
```

---

## Guardrails

- **MAX_RETRIES=1 by default** — a flaky test that fails twice in a row is likely not flaky. Don't spam retries.
- **Never retry real failures** — the retry mechanism is ONLY for jobs classified flaky. Real failures need a code fix.
- **Never retry pre-merge checks on main** — only PR branches. Retrying on main risks hiding real regressions.
- **Budget-aware**: large rerun loops burn CI minutes. Log estimated minutes cost before retry on repos with tight budgets.
- **Respect timeouts**: `gh run watch` can hang if a job hangs. Wrap in `timeout 1800 gh run watch` for 30-min ceiling.
- **Flaky classification is per-job, not per-run**: if 3 of 5 jobs are flaky but 1 is real, DO NOT retry — fix the real one first.
- **Store flaky detections** — append to `.claude/flaky-tests.log` (optional, per-project) so patterns surface across sessions.

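
The retry guardrails combine into a single yes/no decision. The sketch below assumes a hypothetical helper `should_retry`, and it assumes the protected branch is literally named `main`; a project with a different default branch would pass or compare a different name.

```bash
# Hypothetical helper:
#   should_retry FLAKY_COUNT REAL_COUNT RETRY_COUNT MAX_RETRIES BRANCH
# Exit status 0 means "rerun the failed jobs"; non-zero means "do not retry".
should_retry() {
  flaky=$1
  real=$2
  retries=$3
  max=$4
  branch=$5
  [ "$branch" != "main" ] || return 1     # never retry on main (assumed name)
  [ "$real" -eq 0 ] || return 1           # a real failure needs a code fix first
  [ "$flaky" -gt 0 ] || return 1          # nothing flaky to retry
  [ "$retries" -lt "$max" ] || return 1   # retry budget exhausted
  return 0
}

if should_retry 2 0 0 1 "feature/login"; then
  echo "retry"
else
  echo "no retry"
fi
```

Keeping the decision in one function makes it easy to see that a single real failure blocks the rerun even when other jobs are flaky.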
---

## When triggered

- After `pr-opener` in Standard pipeline (step 11 post-insertion)
- Before `pr-merger` as CI-green verification (replaces inline `gh run list` check)
- User says: "watch CI", "is CI green?", "CI is flaky", "rerun failed jobs"
- `prouver-verifier` detects a red CI and needs disambiguation

---

## Anti-pattern

```
❌ Failed → rerun blindly → rerun → rerun → real bug hidden, minutes wasted
✅ Failed → classify (fail % on main) → retry only flaky → real fail → debug-reasoning-rca
```

```
❌ sleep 300 && gh run list # blocked by harness; also cache-cold
✅ gh run watch --exit-status # streams, no sleep
```

---

## Handoff

- **If all green** → `pr-merger` can proceed
- **If real failure** → `debug-reasoning-rca` via `@ciel-critic` with log excerpt as SYMPTOM + failing job as SCOPE
- **If flaky detected + retry succeeded** → proceed to `pr-merger`, log flaky for future `/ciel-improve` signal
- **If flaky + retry failed** → escalate to user (flaky-turned-real or real-misclassified)

---

## References

- `gh run watch` — cli.github.com/manual/gh_run_watch
- `gh run rerun --failed` — cli.github.com/manual/gh_run_rerun
- Flaky test classification — Google's 2017 paper "Taming Google-scale continuous testing" (15% threshold baseline)
- Ciel pipeline: pr-opener → ci-watcher → (flaky? retry : debug-reasoning-rca) → pr-merger
@@ -0,0 +1,135 @@
---
name: critiquer-auditor
description: How to audit code comprehensively — 7-dimension review methodology covering expected behavior, assumptions, scope, code-vs-model comparison, STRIDE security, pattern consistency, and findings with severity. For PR reviews, retrospective audits, and "is this code correct?" questions.
allowed-tools: Read, Grep, Glob, Bash, WebSearch
---

# Code Audit — 7-Dimension Review Methodology

## What this covers

How to do a thorough code audit. Distinct from quick self-review (relire-critic) — this is the comprehensive methodology for PR reviews, retrospective audits, and quality checks.

## Core principle

**Read the diff/changed files FIRST.** All dimensions operate on actual code, never on assumptions. Descriptions lie; code doesn't.

## Dimension 1: Expected behavior model

From issue/spec/PR description: "what was this SUPPOSED to do?"

- Build a bypass signal checklist for this change type BEFORE scanning code
- If external lib involved: search `[lib] [version] anti-patterns common mistakes`

Output: 1-2 sentence behavior model + min 3 bypass signals to look for.

## Dimension 2: Assumptions

- Git blame: why was the original code written this way?
- Surface 3 assumptions, verify each (grep / blame / read)

Output: 3 assumptions + verification status each.

## Dimension 3: Scope

- "What if we do nothing?" considered?
- Scope of change proportional to the problem?

Output: counterfactual + proportionality judgment.

## Dimension 4: Code vs model + STRIDE + OPS

- Code matches expected behavior model? (grep-backed)
- All bypass signals checked from dimension 1's list?
- **STRIDE all 6 categories**: S / T / R / I / D / E — mark N/A explicitly, never skip silently
- OPS lens: unclosed connections, memory leaks, locks, 100x volume

### STRIDE reference

| Category | What to check |
|----------|--------------|
| **S**poofing | Authentication bypass, identity assumption |
| **T**ampering | Data integrity, unauthorized modification |
| **R**epudiation | Audit trail, logging completeness |
| **I**nformation disclosure | Data exposure, error messages, logs |
| **D**enial of service | Resource exhaustion, infinite loops, missing limits |
| **E**levation of privilege | Authorization bypass, role escalation |

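
The "never skip silently" rule can be checked mechanically on a written audit. This sketch assumes each category is recorded on its own line as `- S: ...`, `- T: ...` and so on, one letter per line; `missing_stride` is a hypothetical helper name.

```bash
# Hypothetical helper: missing_stride FILE
# Prints, one per line, each STRIDE letter that has no non-empty
# "- <letter>: ..." entry in the audit file. No output means all six are present.
missing_stride() {
  file=$1
  for letter in S T R I D E; do
    grep -Eq "^[[:space:]]*- ${letter}: *[^[:space:]]" "$file" || echo "$letter"
  done
}
```

An entry such as `- R: N/A because no state changes` counts as present; a missing or empty entry is reported, which is the distinction the methodology asks for.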
## Dimension 5: Consistency

- Grep: pattern used consistently elsewhere in the codebase?
- Layer boundaries respected (no business logic in routes, no DB in controllers)?
- Health thresholds from overlay met (complexity, coverage)?

## Dimension 6: Findings with severity

Format: `RISQUE: X parce que Y — IMPACT: Z`

Severity levels:
- **BLOCKING** — must fix before merge (correctness, security, data loss). Requires specific FIX.
- **IMPORTANT** — should fix (degraded behavior, tech debt with near-term risk)
- **MINOR** — nice to fix (style, naming, low-risk improvement)
- **VALIDATED** — explicitly checked and confirmed correct

Every finding: RISQUE format. Every BLOCKING: specific FIX + NOT-X (what solution must NOT do).

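
The "every BLOCKING names a FIX" rule lends itself to a quick lint over the findings. The sketch assumes findings are written one per line, starting with the severity (`BLOCKING: ...`) and carrying a literal `FIX:` marker; that single-line layout and the helper name `check_blocking_have_fix` are assumptions for illustration.

```bash
# Hypothetical helper: check_blocking_have_fix FILE
# Succeeds when every line starting with "BLOCKING:" also contains "FIX:".
# Otherwise lists the offending lines on stderr and returns 1.
check_blocking_have_fix() {
  missing=$(grep -E '^BLOCKING:' "$1" | grep -v 'FIX:' || true)
  if [ -n "$missing" ]; then
    printf 'BLOCKING without FIX:\n%s\n' "$missing" >&2
    return 1
  fi
  return 0
}
```

A finding that fails this check is a candidate for either a concrete fix or a downgrade to IMPORTANT.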
## Dimension 7: Close the loop

- New anti-pattern found? → add to Guards or project overlay
- New failure mode? → add Guard immediately
- Capture learnings for future reference

## Output format

```
## AUDIT

### Expected behavior
<1-2 sentences + bypass signals>

### Assumptions
1. <assumption> — verified: <yes/no, evidence>
2. ...
3. ...

### Scope
- Nothing-counterfactual: <consequence if no change>
- Scope proportional: <yes/no, reason>

### Code vs model + STRIDE
- Code vs model: <matches | deviates at file:line>
- Bypass signals: <N/3 flagged>
- STRIDE:
  - S: <N/A because X | RISQUE: ...>
  - T/R/I/D/E: ...

### Consistency
- Pattern: <grep evidence>
- Layers: <clean | violation at file:line>
- Thresholds: <met | violation>

### Findings
BLOCKING: <RISQUE + FIX>
IMPORTANT: <RISQUE + FIX/ACCEPT>
MINOR: <note>
VALIDATED: <what was verified>

### Learnings
- New Guard: <yes/no>
- Overlay update: <yes/no>
```

## How to verify

- [ ] All 7 dimensions completed (Expected behavior, Assumptions, Scope, Code vs model + STRIDE, Consistency, Findings, Learnings)?
- [ ] All 6 STRIDE categories present (even if N/A)?
- [ ] Findings have severity (BLOCKING/IMPORTANT/MINOR)?
- [ ] VALIDATED section identifies what code got right?
- [ ] Learnings captured?

## Common mistakes

- **Operating from PR description alone**: always read the actual code
- **Skipping STRIDE categories**: all 6 must be explicit, even if N/A
- **BLOCKING without FIX**: if you can't name the fix, it's not actionable enough for BLOCKING
- **No VALIDATED section**: reviews that only report problems miss what the code got right
@@ -0,0 +1,134 @@
# critiquer-auditor — Reference

## STRIDE — audit probes (7-step audit context)

Use these probes when running COMPARER on each STRIDE category. Mark N/A explicitly; never skip.

### S — Spoofing
- Can I impersonate another user/service in this code path?
- Identity: client-supplied or server-resolved?
- WebSocket / SSE / GraphQL subscription: same auth as REST?

### T — Tampering
- Input modified in transit? HTTPS? Signatures?
- Idempotency keys present?
- CSRF protection on state-changing endpoints?

### R — Repudiation
- Audit log coverage: who, what, when recorded?
- Log integrity: append-only? remote-shipped?

### I — Information Disclosure
- Error messages: stack traces? SQL? paths?
- Logs: PII? secrets?
- Response bodies: over-fetching? unprojected columns?
- Side channels: does a 404 vs 403 distinction, or a response-time difference, reveal whether a resource exists?

### D — Denial of Service
- Rate limiting per IP/user/endpoint?
- Resource bounds: payload size, query depth, file upload?
- Algorithmic complexity on user-controlled input?
- Regex catastrophic backtracking?

### E — Elevation of Privilege
- Permission check BEFORE action?
- Horizontal escalation: can user A read user B's data?
- Vertical escalation: mass assignment setting `isAdmin`?

## Severity rubric

### BLOCKING
- Correctness bug: code produces wrong result for some input
- Security: any STRIDE finding that an attacker can exploit
- Data loss: delete/overwrite without backup/confirm
- Production crash: uncaught exception on common path

### IMPORTANT
- Degraded behavior: works but slow / intermittent
- Tech debt with near-term risk: pattern that will break at 2x current load
- Accessibility violation: keyboard/screen reader broken
- Test debt: feature ships without meaningful test

### MINOR
- Naming / style inconsistency
- Unused import
- Todo comment for future work
- Minor DRY violation (< 3 copies)

### VALIDATED
- Explicit callout of what was checked and confirmed correct
- Useful because it shows the reviewer's mental map
- Helps author understand what was covered vs skipped

## Counterfactual analysis

Questions to answer in QUESTIONNER step:

- What if we merged without this change? What breaks?
- Is there a 10% of this change that would solve 90% of the problem?
- Is this fixing a symptom or a cause? If symptom: where's the cause?
- Is this change reversible? If yes, risk is lower.

## Bypass signal checklist (build in APPRENDRE)

Common bypass signals to look for per framework:

### React / frontend
- `window.*` or `document.*` inside components
- `useEffect` with no dependency array
- Direct DOM manipulation via `ref.current`
- `dangerouslySetInnerHTML` with non-sanitized input

### Backend / JVM
- Raw SQL string concatenation
- `catch(Exception e) { }` or `catch → null`
- `as` cast without type guard (Kotlin) or unchecked cast (Java)
- Thread creation without pool

### Async / concurrent
- `async` function called without `await`
- Promise created but not awaited
- Race conditions on shared state
- Timeout of 0 or infinite

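
Some of these signals are purely textual and can be pre-screened with `grep` before reading the diff in detail. The sketch below covers only three of them (`dangerouslySetInnerHTML`, direct `window.`/`document.` access, and an empty `catch` block). The patterns are illustrative assumptions, they will produce false positives, and `scan_bypass_signals` is a hypothetical helper name.

```bash
# Hypothetical helper: scan_bypass_signals DIR
# Prints "file:line:text" for a few textual bypass signals under DIR.
# A hit is a prompt to read the code, not a finding by itself.
scan_bypass_signals() {
  grep -rnE \
    -e 'dangerouslySetInnerHTML' \
    -e '(window|document)\.[A-Za-z]' \
    -e 'catch[[:space:]]*\([^)]*\)[[:space:]]*\{[[:space:]]*\}' \
    "${1:-.}" || true   # no match is not an error
}
```

Signals such as race conditions or a missing `await` need actual reading or a linter; a text scan cannot establish them.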
## Layer boundary violations

- Business logic in routes / controllers → should be in services
- DB calls in controllers → should be behind repository
- UI logic in models → should be in view layer
- Tests reaching across layers without mocks

## Overlay thresholds

If `ciel-overlay.md` exists and has a `## Santé du code` section, check its thresholds:

```
### Santé du code
- Complexité cyclomatique: < 15 par fonction
- Profondeur d'imbrication: < 4
- Taille de fonction: < 50 lignes
- Couverture test: > 80% lignes modifiées
```

If any violation: IMPORTANT finding (can be demoted to MINOR if tiny exceedance).

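
A minimal sketch of the threshold check, using the example values above (cyclomatic complexity below 15, nesting depth below 4, function length below 50 lines, coverage of modified lines above 80%). The helper name `check_thresholds` and the idea that the four metrics arrive as plain integers from some other tool are assumptions; real projects would read the limits from their own overlay.

```bash
# Hypothetical helper:
#   check_thresholds COMPLEXITY NESTING FUNC_LINES COVERAGE_PCT
# Prints one "IMPORTANT: ..." line per exceeded threshold and returns 1
# if any threshold is exceeded, 0 otherwise. All arguments are integers.
check_thresholds() {
  complexity=$1
  nesting=$2
  func_lines=$3
  coverage=$4
  status=0
  [ "$complexity" -lt 15 ] || { echo "IMPORTANT: cyclomatic complexity $complexity (limit < 15)"; status=1; }
  [ "$nesting" -lt 4 ]     || { echo "IMPORTANT: nesting depth $nesting (limit < 4)"; status=1; }
  [ "$func_lines" -lt 50 ] || { echo "IMPORTANT: function length $func_lines lines (limit < 50)"; status=1; }
  [ "$coverage" -gt 80 ]   || { echo "IMPORTANT: coverage ${coverage}% of modified lines (limit > 80%)"; status=1; }
  return "$status"
}
```

Whether a small exceedance is demoted to MINOR stays a reviewer judgment; the sketch only reports the raw violations.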
## Capitalization format

When `learnings-capture` is invoked from CAPITALISER:

```
[YYYY-MM-DD] MISTAKE: <what happened, 1 line>
→ RULE: <how to avoid in future, 1 line>
→ Invoke: <which skill/guard catches this>
→ Evidence: <file:line where it was found>
```

This format feeds into `ciel-overlay.md` under `## Leçons projet` (project-specific) or `.claude/learnings.md` (general).

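
Appending an entry in this format can be sketched as follows. `record_learning` is a hypothetical helper, not the actual `learnings-capture` skill, and it only appends to a file path it is given; placing the entry under a specific overlay heading is left out.

```bash
# Hypothetical helper: record_learning FILE MISTAKE RULE SKILL EVIDENCE
# Appends one four-line entry, dated today, in the format shown above.
record_learning() {
  file=$1
  mistake=$2
  rule=$3
  skill=$4
  evidence=$5
  mkdir -p "$(dirname "$file")"
  {
    printf '[%s] MISTAKE: %s\n' "$(date +%Y-%m-%d)" "$mistake"
    printf '→ RULE: %s\n' "$rule"
    printf '→ Invoke: %s\n' "$skill"
    printf '→ Evidence: %s\n' "$evidence"
  } >> "$file"
}
```

For example, `record_learning .claude/learnings.md "assumed Ktor 2.x" "read the lockfile first" "avec-quoi-versioner" "build.gradle.kts:12"` would add one dated entry to the general learnings file.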
## Anti-patterns in audits

- Reviewing without reading the diff first → operating on assumptions
- STRIDE performed but all 6 "N/A" → didn't actually probe each category
- Only finding problems (no VALIDATED) → unclear what was checked
- BLOCKING without FIX → not actionable, author can't resolve
- Copying PR description into audit → pure theater, no independent thought