@minhduydev/mdpi 0.4.1 → 0.6.0
- package/dist/index.js +4 -2
- package/dist/template/.pi/AGENTS.md +1 -1
- package/dist/template/.pi/README.md +2 -3
- package/dist/template/.pi/VERSION +1 -1
- package/dist/template/.pi/agents/explore.md +1 -1
- package/dist/template/.pi/agents/scout.md +1 -1
- package/dist/template/.pi/extensions/templates-injector.ts +35 -7
- package/dist/template/.pi/prompts/INDEX.md +3 -9
- package/dist/template/.pi/prompts/gc.md +2 -1
- package/dist/template/.pi/prompts/verify.md +24 -0
- package/dist/template/.pi/skills/INDEX.md +40 -8
- package/dist/template/.pi/skills/dcp-hygiene/SKILL.md +1 -1
- package/dist/template/.pi/skills/frontend-design/SKILL.md +1 -1
- package/dist/template/.pi/skills/frontend-design/references/animation/motion-advanced.md +88 -15
- package/dist/template/.pi/skills/frontend-design/references/animation/motion-core.md +148 -13
- package/dist/template/.pi/skills/frontend-design/references/shadcn/setup.md +127 -20
- package/dist/template/.pi/skills/nextjs-app-router/SKILL.md +334 -0
- package/dist/template/.pi/skills/nextjs-cache/SKILL.md +262 -0
- package/dist/template/.pi/skills/react-best-practices/SKILL.md +79 -1
- package/dist/template/.pi/skills/react-compiler/SKILL.md +237 -0
- package/dist/template/.pi/skills/react-hook-form/SKILL.md +374 -0
- package/dist/template/.pi/skills/react-server-actions/SKILL.md +299 -0
- package/dist/template/.pi/skills/shadcn-ui/SKILL.md +404 -0
- package/dist/template/.pi/skills/tanstack-query/SKILL.md +330 -0
- package/dist/template/.pi/skills/v0/SKILL.md +264 -0
- package/dist/template/.pi/skills/zustand/SKILL.md +333 -0
- package/package.json +1 -1
- package/dist/template/.pi/context/fallow.md +0 -137
- package/dist/template/.pi/prompts/loop-check.md +0 -87
- package/dist/template/.pi/prompts/loop-init.md +0 -157
- package/dist/template/.pi/prompts/loop-review.md +0 -90
- package/dist/template/.pi/skills/loop-audit/SKILL.md +0 -141
- package/dist/template/.pi/skills/loop-cost/SKILL.md +0 -130
- package/dist/template/.pi/skills/loop-engineering/SKILL.md +0 -175
- package/dist/template/.pi/templates/loop-github-action.yml +0 -162
- package/dist/template/.pi/templates/loop-orchestrator.sh +0 -514
- package/dist/template/.pi/templates/loop-orchestrator.test.ts +0 -332
- package/dist/template/.pi/templates/loop-orchestrator.ts +0 -936
- package/dist/template/.pi/templates/loop-state.json +0 -24
- package/dist/template/.pi/templates/loop-state.md +0 -98
- package/dist/template/.pi/templates/loop-vision.md +0 -110
@@ -1,157 +0,0 @@
----
-description: Scaffold a new unattended loop at .pi/loops/<name>/ with VISION, STATE, and a per-loop SKILL stub
-argument-hint: "<name> [--help]"
----
-
-# Loop Init: $ARGUMENTS
-
-Scaffold a new loop-engineering harness at `.pi/loops/<name>/` from the templates shipped in `.pi/templates/`. Run once per loop. The scaffold is the contract + ledger + procedure the orchestrator (T9/T10) drives unattended.
-
-> **Prerequisite:** `/loop-check <task>` returned GO. If not, refuse and tell the user to qualify the task first.
-
-## Parse Arguments
-
-| Argument | Default | Description |
-| -------- | --------- | -------------------------------------------- |
-| `<name>` | required | Loop slug; used as the directory name and as the `loop/<name>/<ts>` branch prefix |
-| `--help` | false | Show this usage |
-
-**Validation:** `<name>` must be a filesystem-safe slug (`^[a-z0-9][a-z0-9-]*$`, lowercase). Reject names containing `/`, spaces, or upper-case. Trim surrounding whitespace before use.
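The validation rule above can be sketched as a small helper. This is a minimal TypeScript sketch under stated assumptions, not code from the package: `validateLoopName` and `SLUG_PATTERN` are hypothetical names, and the error wording is illustrative.

```typescript
// Hypothetical helper (not part of the package): applies the slug rule quoted above.
const SLUG_PATTERN = /^[a-z0-9][a-z0-9-]*$/;

function validateLoopName(raw: string): string {
  // Trim surrounding whitespace before checking, as the validation note requires.
  const name = raw.trim();
  if (!SLUG_PATTERN.test(name)) {
    // Abort with the regex and an example, mirroring the failure-handling guidance.
    throw new Error(
      `Invalid loop name "${name}": must match ${SLUG_PATTERN.source} (e.g. "nightly-ci-triage")`
    );
  }
  return name;
}
```

The pattern already excludes `/`, spaces, and upper-case letters, so no separate checks are needed for those cases.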
-
-## When to Use
-
-- You want to start a new unattended loop (nightly CI triage, dependency bumps, doc sync, PR babysitting).
-- `/loop-check <task>` already returned GO (verification automated + token budget absorbs waste + human approves merge/deploy/deps).
-- You need the contract (VISION.md), dedup ledger (STATE.json + STATE.md), and per-loop procedure (SKILL.md) that the orchestrator will drive.
-
-Do NOT use for:
-- One-off tasks (use `/create` or `/fix`).
-- Tasks `/loop-check` refused (auth, payments, architecture) — refuse here too.
-
-## The Scaffold Steps
-
-### 1. Create the loop directory
-
-Create `.pi/loops/<name>/` (parent `.pi/loops/` may not exist yet — create it).
-
-### 2. Copy the four artifacts
-
-Copy map (exact source → destination):
-
-| Source (template) | Destination | Purpose |
-| ----------------------------------------- | ------------------------------------ | ----------------------------------------- |
-| `.pi/templates/loop-vision.md` | `.pi/loops/<name>/VISION.md` | Anti goal-drift contract (FR2) |
-| `.pi/templates/loop-state.md` | `.pi/loops/<name>/STATE.md` | Human-readable state mirror (FR3) |
-| `.pi/templates/loop-state.json` | `.pi/loops/<name>/STATE.json` | Machine dedup ledger (FR3, authoritative) |
-| (new stub) | `.pi/loops/<name>/SKILL.md` | Per-loop procedure (classification + fix patterns) |
-
-Copy the three templates byte-for-byte first (placeholders intact), then fill placeholders in the copied files. The `SKILL.md` stub is not a template — write a minimal seed:
-
-```markdown
----
-name: <name>
-description: Per-loop procedure for <name> — classify, fix, escalate
----
-
-# Loop Procedure: <name>
-
-Reread `VISION.md` at the start of every run. Do not act outside it.
-
-## Procedure
-
-1. Reread `VISION.md` (boundaries authoritative).
-2. Read `STATE.json` — build the dedup set from `processed[]`.
-3. Fetch candidate items (per the loop's source: CI runs, PRs, packages, commits).
-4. For each item: skip if in `processed[]`; else classify → fix / escalate / reject.
-5. Run the Gate command (the ```bash block directly under `## Gate` in VISION.md); ship only on exit 0.
-6. Append to `STATE.json.processed[]`, `completed[]`, `escalated[]`, or `failures[]`.
-7. Enforce hard stops (see VISION.md "Hard stops").
-
-## Classification rubric
-
-<!-- Fill by hand after supervising the first manual runs. -->
-
-## Fix patterns
-
-<!-- Fill by hand as repeatable fixes are discovered. -->
-```
-
-### 3. Fill `<name>` placeholders
-
-In the copied `VISION.md`, `STATE.md`, and `STATE.json`, replace every placeholder occurrence with the actual loop name:
-
-- `<loop-name>` → `<name>`
-- Leave human-fill placeholders (`[Owner]`, `[Date]`, `[cron expression or "manual"]`, `[Allowed action 1]`, `<GATE_COMMAND>`, etc.) as bracketed prompts for the user to edit by hand — do NOT invent values.
-
-Tell the user explicitly which placeholders still need their input.
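The placeholder step can be illustrated as follows. This is a rough sketch, assuming the template text is already in memory; `fillLoopName` is a hypothetical name, and the bracket regex is only a heuristic (it would also match a non-empty JSON array such as `["a"]`, so a real implementation would need a tighter rule for `STATE.json`).

```typescript
// Hypothetical sketch: fill only the loop-name placeholder, then report what is left.
function fillLoopName(
  template: string,
  name: string
): { text: string; remaining: string[] } {
  // Replace every literal "<loop-name>" occurrence; nothing else is touched.
  const text = template.split("<loop-name>").join(name);
  // Heuristic for human-fill prompts: bracketed values like [Owner], plus <GATE_COMMAND>.
  const matches: string[] = text.match(/\[[^\]\n]+\]|<GATE_COMMAND>/g) ?? [];
  // De-duplicate while keeping first-seen order, so the user gets one entry per prompt.
  const remaining = Array.from(new Set(matches));
  return { text, remaining };
}
```

The `remaining` list is what would be shown to the user as "placeholders still needing input"; no values are invented for them.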
-
-### 4. Print the rollout order
-
-After scaffolding, print this rollout order so the user knows the path from scaffold to unattended run:
-
-```
-Rollout order:
-1. check — /loop-check <task> (already GO; re-run if scope changes)
-2. init — /loop-init <name> (this step — scaffold created)
-3. supervise — run the loop's SKILL.md manually in a session; refine classification/fix patterns
-4. wire — copy loop-orchestrator.ts/.sh + loop-github-action.yml; set cadence + gate + scope
-5. run — schedule cron/launchd (local) or GitHub Actions `on: schedule` (CI); loop runs unattended
-6. review — /loop-review <name> for interactive maker/checker verify
-7. audit/cost — loop-audit scores readiness (L0-L3); track cost-per-accepted-change; kill if acceptance < 50%
-```
-
-## Idempotency Guard
-
-Before writing anything:
-
-1. Check whether `.pi/loops/<name>/` already exists.
-2. **If it exists:** STOP. Do not overwrite. Ask the user:
-> `.pi/loops/<name>/` already exists. Overwrite all files (VISION.md, STATE.md, STATE.json, SKILL.md)? This destroys any hand-edited contract/state. (y/N)
-3. **Refuse overwrite without explicit confirmation.** On `N` or no answer, abort and report the existing tree without modifying it.
-4. **If it does not exist:** proceed with the scaffold.
-
-This guard protects hand-edited VISION.md contracts and STATE.json dedup ledgers from being clobbered by a re-run.
-
-## Output
-
-Print:
-
-1. **Created tree** (e.g.):
-```
-.pi/loops/<name>/
-├── VISION.md (contract — fill [Owner], [Date], cadence, Scope, Gate)
-├── STATE.md (human-readable state mirror)
-├── STATE.json (machine dedup ledger — authoritative)
-└── SKILL.md (per-loop procedure — fill classification + fix patterns by hand)
-```
-2. **Rollout order** (the 7-step block above).
-3. **Placeholders still needing input** (list each `[...]` / `<GATE_COMMAND>` the user must fill by hand, with file:line where useful).
-4. **Next step:** `supervise` — run the loop's `SKILL.md` manually in a session and refine classification/fix patterns before wiring the orchestrator.
-
-## Failure Handling
-
-| Scenario | Action |
-| ------------------------------------- | ------------------------------------------------------------ |
-| `<name>` missing or invalid slug | Abort with the validation regex and an example |
-| `.pi/loops/<name>/` exists, no confirm| Abort, print existing tree, do not modify |
-| Template missing from `.pi/templates/`| Abort, list which template is missing (T2 must be shipped first) |
-| `<name>` would collide with a reserved path | Abort, suggest an alternate slug |
-
-## Stop Conditions
-
-- `<name>` invalid → stop, report the regex.
-- Directory exists and user declines overwrite → stop, report existing tree.
-- Any of the three templates missing → stop, report (T2 prerequisite unmet).
-
-## Related Commands
-
-| Need | Command |
-| ------------------------ | ----------------- |
-| Qualify a task first | `/loop-check` |
-| Review a running loop | `/loop-review` |
-| Audit loop readiness | `loop-audit` |
-
-## Related Skills
-
-- `loop-engineering` — 2-condition test, 5 building blocks, VISION/state contract, failure modes
-- `planning-and-task-breakdown` — decompose the loop's procedure into verifiable steps
@@ -1,90 +0,0 @@
----
-description: Maker/checker review for a loop — spawns a verifier subagent that runs the gate and inspects the diff, then emits ACCEPT/REJECT with evidence
-argument-hint: "<loop-name> [--help]"
----
-
-# Loop Review: $ARGUMENTS
-
-Run the maker/checker gate for `<loop-name>`: dispatch an independent **verifier subagent** that runs the `## Gate` command from the loop's `VISION.md` via bash and reads the exit code, inspect the working-tree diff for scope creep and forbidden touches, then emit exactly `DECISION: ACCEPT|REJECT` plus `EVIDENCE:`. The maker never self-approves — default to **REJECT** on any uncertainty.
-
-## When to Use
-
-- After a loop's maker (the `pi -p` capability-deprived agent) reports its work done and before the orchestrator ships.
-- Interactive `/loop-review <loop-name>` to manually gate a loop cycle.
-- Whenever you need an independent, computational gate decision — never trust the maker's self-report.
-- Do **not** use for the orchestrator's own gate run (FR7); this prompt is the interactive maker/checker wrapper around the same gate-parse contract.
-
-## The Verifier Subagent
-
-Dispatch a **verifier** as an independent subagent so the maker cannot influence or self-approve the decision. The verifier's sole job: run the gate, read the exit code, collect diff findings, return raw evidence.
-
-### Dispatch
-
-Use the `subagent` tool with **type `review`**. Pass a prompt that instructs the verifier to:
-
-1. Read `.pi/loops/<loop-name>/VISION.md` (or wherever the loop's VISION lives for this run) from disk — do not trust context-supplied copies.
-2. Extract the gate command using the **exact gate-parse contract** below.
-3. Run the gate via bash and capture the **exit code** (the computational signal — never an opinion).
-4. Inspect the working-tree/staged diff for scope creep and forbidden paths (see The Diff Check).
-5. Return raw evidence only: exit code, gate command run, diff findings (paths touched, any forbidden hits). **No verdict** — the maker (this agent) issues the verdict.
-
-### Gate-parse contract (must match T2 exactly)
-
-Extract the gate command from `VISION.md` as: **the SINGLE fenced ```bash block located directly under the `## Gate` heading** — require **EXACTLY ONE** such block directly under `## Gate` (zero or more-than-one → REJECT / hard fail). Take that single block's content, strip trailing whitespace, run it via `bash -c "<command>"`, and read the exit code.
-
-- `exit 0` → PASS → ship (orchestrator pushes `loop/<name>/<ts>` + opens PR)
-- non-zero → FAIL → no ship; record failure in `STATE.json.failures[]`; cleanup worktree
-
-The gate decision is **computational (exit code), never an LLM's opinion**. If `## Gate` is missing, the fenced block is empty, or the block count under `## Gate` is not exactly one (zero or more-than-one), treat that as a hard fail: emit `DECISION: REJECT` with `EVIDENCE: gate not parseable` and do not run anything.
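The parse half of this contract could look roughly like the sketch below. It is an illustration only, not the orchestrator's actual parser: `parseGate` and `GateParse` are hypothetical names, and "directly under `## Gate`" is interpreted here as "anywhere in the Gate section, up to the next `#` or `##` heading", which is an assumption. Running the command and reading the exit code are left out.

```typescript
// Hypothetical sketch of the gate-parse contract; names and section rule are assumptions.
type GateParse = { ok: true; command: string } | { ok: false; reason: string };

function parseGate(vision: string): GateParse {
  const lines = vision.split(/\r?\n/);
  const start = lines.findIndex((l) => l.trim() === "## Gate");
  if (start === -1) return { ok: false, reason: "gate not parseable: no ## Gate heading" };

  const blocks: string[] = [];
  let current: string[] | null = null; // non-null while inside a ```bash fence
  let inOtherFence = false; // inside a fence with some other (or no) language tag
  for (const line of lines.slice(start + 1)) {
    const isFence = line.trim().startsWith("```");
    if (current !== null) {
      if (isFence) {
        blocks.push(current.join("\n"));
        current = null;
      } else {
        current.push(line); // "# comment" lines inside the block are kept, not read as headings
      }
    } else if (inOtherFence) {
      if (isFence) inOtherFence = false;
    } else if (isFence) {
      if (line.trim() === "```bash") current = [];
      else inOtherFence = true;
    } else if (/^#{1,2}\s/.test(line)) {
      break; // the next top-level section ends the Gate section
    }
  }

  // Exactly one bash block is required; zero or several is a hard fail.
  if (blocks.length !== 1) {
    return { ok: false, reason: `gate not parseable: found ${blocks.length} bash blocks` };
  }
  const command = (blocks[0] ?? "").replace(/\s+$/, ""); // strip trailing whitespace
  if (command.trim() === "") return { ok: false, reason: "gate not parseable: empty bash block" };
  return { ok: true, command };
}
```

An unclosed fence is never pushed to `blocks`, so it falls into the same hard-fail path as a missing block.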
-
-### Evidence to collect from the verifier
-
-- The exact gate command extracted (verbatim, after whitespace strip).
-- The bash exit code (integer).
-- stdout/stderr tail (last ~20 lines) — enough to cite a concrete failure.
-- List of paths touched in the diff (added/modified/deleted).
-- Any forbidden-path hits (see The Diff Check).
-
-## The Diff Check
-
-Independently of the verifier (or have the verifier report and you confirm), inspect the diff against the loop's declared boundaries.
-
-### Scope creep
-
-Compare every path in the diff to `## Scope` and `## Out-of-scope` in `VISION.md`. Any touched path that is not clearly inside `## Scope` is scope creep → REJECT. When a path is ambiguous (not explicitly listed in either section), treat it as out-of-scope and escalate, do not approve.
-
-### Forbidden touches (always REJECT)
-
-Regardless of scope wording, REJECT immediately if the diff touches any of:
-
-- `VISION.md` (the loop contract itself — protected path).
-- The gate command / gate script referenced by `## Gate`.
-- `package.json`, `package-lock.json`, `pnpm-lock.yaml`, `yarn.lock`, or any lockfile.
-- Any path under `## Out-of-scope` in `VISION.md`.
-- Auth, payments, or architectural-decision files (per `## Human-approval-required`).
-
-Cite the offending path verbatim in `EVIDENCE:`.
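A path-level version of the diff check might be sketched like this. It assumes `## Scope` and `## Out-of-scope` have already been reduced to lists of path prefixes, which the source does not specify; `checkDiff`, `DiffFindings`, and `ALWAYS_FORBIDDEN` are hypothetical names. The sketch covers only the file names listed above, not "any lockfile", the gate script, or auth/payments detection.

```typescript
// Hypothetical sketch: classify touched paths as forbidden, scope creep, or acceptable.
const ALWAYS_FORBIDDEN = [
  "VISION.md",
  "package.json",
  "package-lock.json",
  "pnpm-lock.yaml",
  "yarn.lock",
];

interface DiffFindings {
  forbidden: string[];
  scopeCreep: string[];
}

function checkDiff(touched: string[], scope: string[], outOfScope: string[]): DiffFindings {
  // A path is "under" a prefix if it equals it or sits below it as a directory.
  const under = (path: string, prefixes: string[]): boolean =>
    prefixes.some((p) => path === p || path.startsWith(p.endsWith("/") ? p : p + "/"));

  const forbidden: string[] = [];
  const scopeCreep: string[] = [];
  for (const path of touched) {
    const base = path.split("/").pop() ?? path;
    if (ALWAYS_FORBIDDEN.indexOf(base) !== -1 || under(path, outOfScope)) {
      forbidden.push(path); // cited verbatim in EVIDENCE
    } else if (!under(path, scope)) {
      scopeCreep.push(path); // ambiguous paths count as out of scope
    }
  }
  return { forbidden, scopeCreep };
}
```

Treating any path outside the scope prefixes as scope creep matches the "ambiguous means out-of-scope" rule; it errs toward REJECT.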
-
-## Output Contract
-
-Emit exactly two lines (no prose around them, no extra fields):
-
-```
-DECISION: ACCEPT|REJECT
-EVIDENCE: <exit code + diff findings>
-```
-
-- `DECISION: ACCEPT` only when the verifier reports **exit code 0** AND the diff has zero scope-creep and zero forbidden-touch findings.
-- `DECISION: REJECT` otherwise. `EVIDENCE:` must cite the concrete signal — e.g. `gate exit 1: npm test failed (see stderr tail)` or `forbidden touch: package.json` or `scope creep: src/auth/* not in VISION.md Scope`.
-- The orchestrator consumes these two lines machine-readably; do not decorate them with markdown, prefixes, or commentary.
-
-## Default-Reject Rule
-
-**The maker never self-approves.** The verdict is issued here (the maker/checker prompt), but the *evidence* comes from the independent verifier subagent — never from the maker's own self-report. On any uncertainty — gate not parseable, exit code unreadable, diff not inspectable, scope ambiguous, verifier subagent failed to return — emit:
-
-```
-DECISION: REJECT
-EVIDENCE: <what was uncertain — e.g. "verifier did not return exit code" / "scope ambiguous for src/foo.ts">
-```
-
-Default-reject is the safe state; the orchestrator records it and retries or escalates. Never substitute opinion for the computational exit-code signal.
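The output contract and the default-reject rule combine into one small decision function. The sketch below is illustrative only: `decide` and `VerifierEvidence` are hypothetical names, and a missing exit code is modelled as `null`.

```typescript
// Hypothetical sketch: turn raw verifier evidence into the two-line verdict.
interface VerifierEvidence {
  exitCode: number | null; // null when the verifier did not return an exit code
  forbidden: string[];
  scopeCreep: string[];
}

function decide(e: VerifierEvidence): string {
  let decision = "REJECT"; // default-reject: only the last branch can flip this
  let evidence: string;
  if (e.exitCode === null) {
    evidence = "verifier did not return exit code";
  } else if (e.exitCode !== 0) {
    evidence = `gate exit ${e.exitCode}`;
  } else if (e.forbidden.length > 0) {
    evidence = `forbidden touch: ${e.forbidden.join(", ")}`;
  } else if (e.scopeCreep.length > 0) {
    evidence = `scope creep: ${e.scopeCreep.join(", ")}`;
  } else {
    decision = "ACCEPT";
    evidence = "gate exit 0; no scope-creep or forbidden-touch findings";
  }
  // Exactly two undecorated lines, as the orchestrator reads them mechanically.
  return `DECISION: ${decision}\nEVIDENCE: ${evidence}`;
}
```

Because `ACCEPT` is reachable from a single branch, any evidence shape not explicitly handled stays a REJECT.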
@@ -1,141 +0,0 @@
----
-name: loop-audit
-description: Use when scoring a project's loop-engineering readiness 0-100 + L0/L1/L2/L3 from concrete repo signals (state file, verifier/gate, loop skills, safety docs, GitHub workflows, MCP, worktree evidence, cost observability, real loop activity). Emits a numeric score, a level, and ≥1 recommendation. L3 is gated on a proven committed run.
----
-
-# Loop Audit
-
-Readiness scoring for loop-engineering. A loop is a system that prompts an agent on a schedule; before you trust one to run unattended, measure whether the project is actually loop-ready. This skill converts concrete repo signals into a **reproducible 0-100 score** and a **L0/L1/L2/L3 level**, then emits ≥1 recommendation. The score is computed from a fixed rubric (below) — it is not an opinion. L3 is capped at L2 until a real cycle has been run and committed.
-
-## When to Use
-
-- Before wiring a loop to run unattended (cron/launchd/GitHub Actions `on: schedule`).
-- Before promoting a loop from supervised to unattended.
-- On a cadence, to detect readiness drift (skills deleted, guard unregistered, state file dropped).
-- When a stakeholder asks "is this project ready to loop?"
-
-## When NOT to Use
-
-- A single interactive change with no schedule or repetition (no loop to score).
-- Scoring the *correctness* of loop output — that is `/loop-review`'s job (exit-code evidence). This skill scores *readiness*, not output quality.
-
-## The Signals
-
-Each signal is a concrete, checkable artifact. Check the repo; mark present/absent; sum the points. No signal is subjective — if you cannot point to a file/line, the signal is absent.
-
-| # | Signal | What counts as present | Points |
-|---|---|---|---|
-| 1 | **State file present** | A loop state ledger exists and is non-empty: `.pi/loops/*/STATE.json` (or `STATE.md`) with `processed`, `in_progress`, `completed`, `failures`, `lessons`, `metrics`, `last_run`, `stop_conditions_met`. A blank/template file scores 0. | 12 |
-| 2 | **Verifier/gate present** | An objective gate command is named in a VISION/contract file (`Gate:` section of `.pi/loops/*/VISION.md`) OR a gate script exists (e.g. `scripts/loop-gate.sh`, `npm test` referenced by the orchestrator). The gate must be a real command whose exit code decides pass/fail. | 14 |
-| 3 | **Loop skills present** | Count of loop-* skills/prompts installed: `loop-engineering`, `loop-audit`, `loop-cost`, `/loop-check`, `/loop-review`, `/loop-init`. 0 → 0; 1-2 → 4; 3-4 → 8; 5-6 → 12. | 12 |
-| 4 | **Safety docs** | Never-do rules + security checklist documented (in `loop-engineering` skill or a loop's `SKILL.md`): refuse-list (auth/payments/architecture), path protection, dangerous-cmd block, human-approval-required. | 10 |
-| 5 | **GitHub workflows** | A loop CI workflow exists: `.github/workflows/*loop*` referencing `pi -p` and `on: schedule`, OR the `loop-github-action.yml` template is present. | 10 |
-| 6 | **MCP / tool_call hook** | A `tool_call` hook is registered (`.pi/extensions/loop-guard.ts` in `.pi/settings.json` `extensions[]`) OR an MCP adapter is wired for loop tools. Capability-deprivation (`--tools` allowlist) documented in the orchestrator counts. | 8 |
-| 7 | **Worktree evidence** | The orchestrator uses `git worktree add` for isolation (grep the orchestrator source/template for `worktree`). | 10 |
-| 8 | **Cost observability** | Both: (a) a budget/cost doc or estimate (e.g. `loop-cost` output, a `BUDGET.md`, a cap value in the orchestrator) AND (b) a per-run log exists (a `logs/` or `.pi/loops/*/logs/` dir with at least one run log). One without the other → 6. | 12 |
-| 9 | **Real loop activity** | A proven committed run: `STATE.json.last_run` set AND at least one item in `processed[]`/`completed[]` AND a git commit/PR attributable to the loop (branch `loop/<name>/<ts>` merged or open). A scaffolded-but-never-run loop scores 0 here. | 12 |
-| | **Total** | | **100** |
-
-## Scoring (0-100 → level)
-
-Compute the raw score by summing the present signals (0-100). Then map to a level.
-
-| Level | Raw score | Extra requirements |
-|---|---|---|
-| **L0** — nothing real | 0-29 | (none) |
-| **L1** — structure, not yet run | 30-54 | (none) |
-| **L2** — capable but unproven | 55-77, **OR** score ≥78 but missing any L3 gate requirement | (none) |
-| **L3** — proven, gated, ready for unattended | ≥78 **AND** verifier/gate present (signal 2) **AND** state file present (signal 1) **AND** cost-ready (signal 8 full) **AND** real loop activity (signal 9) | **All four gate requirements must hold.** |
-
-### The L3 gate (hard)
-
-L3 is **capped at L2** until a real cycle has been run and committed. Structure alone — no matter how complete — cannot earn L3. The four gate requirements are conjunctive:
-
-1. **Verifier/gate present** (signal 2, worth 14 points; must be present).
-2. **State file present** (signal 1; a populated, non-template ledger).
-3. **Cost-ready** (signal 8 full — both budget doc *and* run log).
-4. **Proven committed run** (signal 9 — a real cycle, committed, not a scaffold).
-
-If raw score ≥78 but any gate requirement is missing → **L2** with a recommendation naming the missing requirement. This is the anti-Ralph-Wiggum guard: a project that *looks* ready but has never actually run a cycle is not trusted to run unattended.
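The score-to-level mapping, including the L3 cap, is mechanical enough to express in a few lines. This is a sketch under the thresholds in the table above; `levelFor`, `Level`, and `L3Gates` are hypothetical names.

```typescript
// Hypothetical sketch of the score-to-level mapping with the hard L3 gate.
type Level = "L0" | "L1" | "L2" | "L3";

interface L3Gates {
  gatePresent: boolean; // signal 2
  statePresent: boolean; // signal 1
  costReady: boolean; // signal 8 in full (budget doc and run log)
  provenRun: boolean; // signal 9
}

function levelFor(score: number, gates: L3Gates): Level {
  if (score < 30) return "L0";
  if (score < 55) return "L1";
  // The four gate requirements are conjunctive; any miss caps the level at L2.
  const allGates =
    gates.gatePresent && gates.statePresent && gates.costReady && gates.provenRun;
  return score >= 78 && allGates ? "L3" : "L2";
}
```

For instance, a score of 80 with `provenRun: false` yields `"L2"`, since a high score alone never earns L3.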
-
-> **Scenario (FR4):** project has structure (score 80) but no proven runs → **L2** with recommendation "run one L1 cycle and commit state before L3".
-
-## Output Contract
-
-The skill always emits three things — no exceptions:
-
-1. **Score:** the integer 0-100, with a one-line breakdown (which signals were present and their points). Reproducible: another agent re-running the rubric on the same repo must get the same number.
-2. **Level:** one of `L0`, `L1`, `L2`, `L3`. If the raw score would map to L3 but a gate requirement fails, state the **capped** level (L2) and name the failed gate requirement.
-3. **Recommendations:** ≥1 concrete, actionable next step tied to a missing/weak signal. Each recommendation cites the signal it would raise and the points recoverable. Never empty.
-
-Output shape (markdown):
-
-```markdown
-# Loop Readiness Audit
-
-**Score:** 68/100
-**Level:** L2 (L3 gate also unmet: no run log, no proven committed run)
-
-## Breakdown
-| Signal | Present? | Points |
-|---|---|---|
-| State file | ✅ | 12 |
-| Verifier/gate | ✅ | 14 |
-| Loop skills | ✅ (4/6) | 8 |
-| Safety docs | ✅ | 10 |
-| GitHub workflows | ❌ | 0 |
-| MCP/tool_call hook | ✅ | 8 |
-| Worktree evidence | ✅ | 10 |
-| Cost observability | ⚠ partial (budget doc only) | 6 |
-| Real loop activity | ❌ | 0 |
-| **Total** | | **68** |
-
-## Gate check (L3)
-- Verifier/gate: ✅
-- State file: ✅
-- Cost-ready (both): ❌ (no run log)
-- Proven committed run: ❌
-→ L3 gate not met.
-
-## Recommendations
-1. Add `.github/workflows/loop-*.yml` with `on: schedule` + `pi -p` to recover +10 (signal 5) and unblock CI unattended runs.
-2. Run one supervised L1 cycle end-to-end and commit `STATE.json` (branch `loop/<name>/<ts>` + PR) to recover +12 (signal 9) and satisfy the L3 proven-run gate.
-3. Emit per-run logs to `.pi/loops/*/logs/` to complete cost observability (+6, signal 8) and satisfy the cost-ready gate requirement.
-```
-
-## Verification
-
-Before claiming an audit is complete:
-
-- [ ] The output contains a numeric score 0-100 and a reproducible breakdown (every signal marked present/absent/partial with its points).
-- [ ] The output contains a level L0/L1/L2/L3. If L3 is claimed, all four gate requirements are explicitly checked and present.
-- [ ] The output contains ≥1 recommendation, each tied to a missing/weak signal with recoverable points.
-- [ ] A second pass over the same repo yields the same score (rubric is deterministic — disagreement means a signal was scored subjectively; re-check the artifact).
-- [ ] L3 is never awarded without a proven committed run (signal 9 present AND a `loop/<name>/<ts>` branch/PR exists in git).
-
-## See Also
-
-| Companion | Role |
-|---|---|
-| `loop-engineering` | Methodology this skill scores against (2-condition test, 5 building blocks) |
-| `/loop-check` | NO-GO qualification gate (run *before* a loop, not after) |
-| `/loop-review` | Maker/checker — scores a *run's* output via exit code, not project readiness |
-| `loop-cost` | Cost estimation feeding the cost-observability signal (8) |
-| `loop-orchestrator.ts`/`.sh` | Runtime whose worktree/gate/state usage signals 1, 2, 7 detect |
-
-## Common Rationalizations
-
-| Rationalization | Reality |
-|---|---|
-| "We scaffolded the whole kit, that's L3" | Structure without a proven committed run is L2 max. The gate exists to prevent this exact claim. |
-| "The score feels low" | The score is the rubric. Re-check the artifacts; if a signal is absent, it's 0 — no partial credit for "almost". |
-| "We ran a loop once manually, that's proven" | Manual ≠ committed. Proven requires a committed cycle (branch + PR/commit attributable to the loop). |
-| "L3 just needs a high score" | L3 needs score ≥78 **and** all four gate requirements. High score alone caps at L2. |
-
-## Red Flags
-
-- An L3 score with no `loop/<name>/<ts>` branch or PR in git history (gate failed; cap to L2).
-- A score with no breakdown table (not reproducible — redo the audit).
-- Zero recommendations (the contract requires ≥1; an empty recommendation list means the audit is incomplete).
-- A signal marked present with no file/line citation (subjective scoring; re-check).
-- Cost observability scored full with only a budget doc and no run log (signal 8 requires both).
@@ -1,130 +0,0 @@
----
-name: loop-cost
-description: Use when estimating tokens/day for an unattended coding loop before it ships — computes cadence × realistic blend per L1/L2/L3, a suggested daily cap, and the early-exit-required flag (early-exit on empty watchlist < 5k tokens). Load before approving a loop's budget or setting a per-run cap.
----
-
-# Loop Cost
-
-Token-budget skill for unattended coding loops. Answers the budget half of the loop-engineering 2-condition test: *is the cost of one wasted run small enough that the cadence does not bankrupt you?* Produces a tokens/day estimate, a suggested daily cap, and an early-exit flag the orchestrator (`loop-orchestrator.ts`) consumes from `--mode json` token events (FR13).
-
-## When to Use
-
-- You are about to qualify or ship an unattended loop and need to set a per-run cap.
-- An orchestrator asks for a daily budget number before registering a cron/GitHub Actions schedule.
-- You are sizing whether a no-op (watchlist empty) run is cheap enough to schedule hourly.
-- You are revising a cap after cadence or level changes (e.g., L2 triage promoted to L3 fix loop).
-
-## When NOT to Use
-
-- Interactive single runs with a human watching every turn — no cadence, no budget problem.
-- Estimating dollar cost or latency — this is tokens only; multiply by model price elsewhere.
-- Choosing loop qualification or confidence thresholds — those live in `loop-engineering` and `loop-guard`.
-
-## The Blend Model
-
-A loop's per-run cost is not a single number; it is a **blend** over what actually happens across runs. Each loop is assigned one Level; within that Level, runs distribute across three outcomes:
-
-| Level | Early-exit (watchlist empty / no-op) | Triage (diagnosis-only, no fix) | Full fix (edit + verify) |
-| ----- | ------------------------------------ | ------------------------------- | ------------------------ |
-| **L1** | 90% | 10% | — |
-| **L2** | 85% | 10% | 5% |
-| **L3** | 40% | 35% | 25% |
-
-**Per-run token estimates (assumptions — state yours explicitly and override for your stack):**
-
-- **Early-exit:** 2,000 tokens (read watchlist, confirm empty, write state, exit).
-- **Triage:** 12,000 tokens (read failure, reproduce attempt, write diagnosis-only comment, no edits).
-- **Full fix:** 80,000 tokens (read + reproduce + edit + run gate + revise + post).
-
-These are starting points for a pi 0.79.x maker on a mid-size TS/JS repo with a passing gate. For larger repos, heavier languages, or failing gates, raise them. Always state the assumptions used so the arithmetic is reproducible.
-
-**Expected tokens per run** = blend:
-
-```
-E[run] = p_early * T_early + p_triage * T_triage + p_fix * T_fix
-```
-
-where `p_*` are the Level's percentages (as fractions) and `T_*` the per-run estimates above.
|
|
48
|
-
|
|
49
|
-
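The blend can be sketched in a few lines of Python. This is an illustrative sketch, not part of the skill: the names `BLEND` and `expected_run_tokens` are hypothetical, and the fractions and token estimates are the defaults stated above.

```python
# Blend fractions per Level, copied from the table above:
# (early-exit, triage, full fix).
BLEND = {
    "L1": (0.90, 0.10, 0.00),
    "L2": (0.85, 0.10, 0.05),
    "L3": (0.40, 0.35, 0.25),
}

# Default per-run token estimates from this section; override for your stack.
T_EARLY, T_TRIAGE, T_FIX = 2_000, 12_000, 80_000


def expected_run_tokens(level, t_early=T_EARLY, t_triage=T_TRIAGE, t_fix=T_FIX):
    """Return E[run] = p_early*T_early + p_triage*T_triage + p_fix*T_fix."""
    p_early, p_triage, p_fix = BLEND[level]
    return p_early * t_early + p_triage * t_triage + p_fix * t_fix


for level in BLEND:
    # round() hides floating-point noise from the fractional weights.
    print(level, round(expected_run_tokens(level)))
```

Passing different `t_*` arguments is how a stack-specific override would look under this sketch; the Level fractions stay fixed.
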
## Daily Cap Formula

Given a **cadence** (runs/day) and a Level, the daily budget is:

```
tokens_per_day = cadence_per_day * E[run]
suggested_daily_cap = ceil(tokens_per_day * safety_factor)
```

- **cadence_per_day**: scheduled runs in 24h. Hourly = 24, every-15-min = 96, nightly = 1. A weekdays-only nightly loop is still 1 on the days it runs (about 22 runs per month).
- **safety_factor**: 1.5 default. The blend is an *expectation*; real days can be worse than the average (for example, a flaky CI run and a real bug on the same day). A factor of 1.5 absorbs one such day without doubling the budget. Lower it to 1.25 for well-observed stable loops; raise it to 2.0 for new loops with unknown variance.
- **per_run_cap**: set on the orchestrator's `--mode json` token watcher to `safety_factor * T_fix` (the worst single-run case), not the blend. A single run that exceeds this cap is a runaway, not a bad day. FR13's "exceeds a per-run cap" kill trigger uses this number.

The orchestrator accumulates usage from token events and **kills the loop when usage exceeds `suggested_daily_cap`** (FR13), recording it in state.

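As a rough sketch of the two caps, assuming the defaults above, the arithmetic might look like this in Python. The function names `daily_budget` and `should_kill` are hypothetical; the real orchestrator's token watcher is not shown in this document, so the kill check below only illustrates the comparison.

```python
import math


def daily_budget(cadence_per_day, e_run, t_fix=80_000, safety_factor=1.5):
    """Return (tokens_per_day, suggested_daily_cap, per_run_cap) in tokens."""
    tokens_per_day = cadence_per_day * e_run
    suggested_daily_cap = math.ceil(tokens_per_day * safety_factor)
    # The per-run cap scales the worst single run (T_fix), not the blend.
    per_run_cap = math.ceil(safety_factor * t_fix)
    return tokens_per_day, suggested_daily_cap, per_run_cap


def should_kill(run_tokens, day_tokens, per_run_cap, daily_cap):
    """Hypothetical kill check: exceeding either cap stops the loop."""
    return run_tokens > per_run_cap or day_tokens > daily_cap


# Hourly L2 loop with E[run] = 6,900 tokens.
per_day, daily_cap, run_cap = daily_budget(24, 6_900)
print(per_day, daily_cap, run_cap)
print(should_kill(run_tokens=130_000, day_tokens=130_000,
                  per_run_cap=run_cap, daily_cap=daily_cap))
```
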
## Early-Exit Rule

The **early-exit-required flag** is set (true) when the loop's no-op case costs less than the early-exit ceiling of 5,000 tokens:

```
early_exit_required = T_early < 5000
```

With the default `T_early = 2,000`, the flag is **true** for every Level. If your stack pushes `T_early ≥ 5,000` (huge watchlist read, chatty state writes), the flag flips to **false**: that loop is too expensive to schedule freely and should be re-scoped or scheduled less often.

When `early_exit_required` is true and the watchlist is empty, the orchestrator **must** early-exit and must not proceed to triage/fix. FR13: *"no-op (watchlist empty) → early-exit < 5k tokens."* The non-functional ceiling is a hard 5k.

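One way to read the rule is as a small dispatch at the start of each scheduled run. The Python sketch below is an assumption about how an orchestrator could apply it: `next_step` and its return labels are hypothetical, and only the 5,000-token ceiling and the flag formula come from the text above.

```python
EARLY_EXIT_CEILING = 5_000  # tokens; the hard ceiling quoted from FR13 above


def early_exit_required(t_early: int) -> bool:
    """True when the no-op case costs less than the ceiling."""
    return t_early < EARLY_EXIT_CEILING


def next_step(watchlist: list, t_early: int = 2_000) -> str:
    """Hypothetical dispatch for one scheduled run."""
    if not early_exit_required(t_early):
        return "rescope"        # no-op too expensive to schedule freely
    if not watchlist:
        return "early-exit"     # must not proceed to triage or fix
    return "triage-or-fix"


print(next_step([]))                  # early-exit
print(next_step(["failing-check"]))   # triage-or-fix
print(next_step([], t_early=6_000))   # rescope
```
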
## Worked Examples

Assumptions (used in every row): `T_early = 2,000`, `T_triage = 12,000`, `T_fix = 80,000`, `safety_factor = 1.5`.

| Cadence (runs/day) | Level | E[run] arithmetic | E[run] (tokens) | Tokens/day | Suggested daily cap (×1.5) | Per-run cap (1.5 × T_fix) | Early-exit required? |
| ------------------ | ----- | ----------------- | --------------- | ---------- | -------------------------- | ------------------------- | -------------------- |
| 1 (nightly) | L1 | 0.90·2k + 0.10·12k + 0·80k = 1,800 + 1,200 | 3,000 | 3,000 | 4,500 | 120,000 | yes (2k < 5k) |
| 24 (hourly) | L1 | 0.90·2k + 0.10·12k = 3,000 | 3,000 | 72,000 | 108,000 | 120,000 | yes |
| 1 (nightly) | L2 | 0.85·2k + 0.10·12k + 0.05·80k = 1,700 + 1,200 + 4,000 | 6,900 | 6,900 | 10,350 | 120,000 | yes |
| 24 (hourly) | L2 | 0.85·2k + 0.10·12k + 0.05·80k = 6,900 | 6,900 | 165,600 | 248,400 | 120,000 | yes |
| 1 (nightly) | L3 | 0.40·2k + 0.35·12k + 0.25·80k = 800 + 4,200 + 20,000 | 25,000 | 25,000 | 37,500 | 120,000 | yes |
| 24 (hourly) | L3 | 0.40·2k + 0.35·12k + 0.25·80k = 25,000 | 25,000 | 600,000 | 900,000 | 120,000 | yes |
| 96 (every 15 min) | L1 | 0.90·2k + 0.10·12k = 3,000 | 3,000 | 288,000 | 432,000 | 120,000 | yes |
| 96 (every 15 min) | L3 | 0.40·2k + 0.35·12k + 0.25·80k = 25,000 | 25,000 | 2,400,000 | 3,600,000 | 120,000 | yes |

**Reading the table:** the hourly-L3 row (600k tokens/day, 900k cap) is the warning zone: an L3 fix loop scheduled hourly is almost certainly over budget unless you have a large API quota. The nightly-L1 row (4,500 cap) is the cheap, safe default for triage loops. The every-15-min-L3 row is a NO-GO on cost alone; re-scope to L2 or drop the cadence.

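The rows can be regenerated mechanically, which is a quick way to recompute the table after overriding an assumption. The following Python sketch uses the defaults stated for this section; the helper name `row` is hypothetical.

```python
import math

BLEND = {
    "L1": (0.90, 0.10, 0.00),
    "L2": (0.85, 0.10, 0.05),
    "L3": (0.40, 0.35, 0.25),
}
T = (2_000, 12_000, 80_000)  # T_early, T_triage, T_fix
SAFETY = 1.5


def row(cadence, level):
    """Return (E[run], tokens/day, daily cap, per-run cap) for one table row."""
    e_run = round(sum(p * t for p, t in zip(BLEND[level], T)))
    per_day = cadence * e_run
    return e_run, per_day, math.ceil(per_day * SAFETY), math.ceil(SAFETY * T[2])


ROWS = [(1, "L1"), (24, "L1"), (1, "L2"), (24, "L2"),
        (1, "L3"), (24, "L3"), (96, "L1"), (96, "L3")]

for cadence, level in ROWS:
    print(cadence, level, *row(cadence, level))
```
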
## Output Contract

`loop-cost` produces a single budget record the orchestrator consumes. The block below is a schema sketch rather than literal JSON: `|` separates the allowed values and `<number>` marks a computed field.

```json
{
  "level": "L1" | "L2" | "L3",
  "cadence_per_day": <number>,
  "assumptions": {
    "T_early": 2000,
    "T_triage": 12000,
    "T_fix": 80000,
    "safety_factor": 1.5
  },
  "E_run_tokens": <number>,
  "tokens_per_day": <number>,
  "suggested_daily_cap": <number>,
  "per_run_cap": <number>,
  "early_exit_required": true | false,
  "verdict": "GO" | "NO-GO"
}
```

- `verdict` is **NO-GO** if `suggested_daily_cap` exceeds your stated daily quota, or if `early_exit_required` is false, or if the loop is L3 at sub-hourly cadence (cost-only refuse, independent of the refuse-list in `loop-engineering`).
- The orchestrator reads `suggested_daily_cap` into its `--mode json` token watcher and `per_run_cap` as the per-run kill threshold (FR13).

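A minimal Python sketch of building such a record follows. It is illustrative only: `budget_record` and its `daily_quota` parameter are hypothetical names, "sub-hourly" is read here as more than 24 runs per day, and rounding `E_run_tokens` to a whole number is an assumption.

```python
import json
import math

BLEND = {
    "L1": (0.90, 0.10, 0.00),
    "L2": (0.85, 0.10, 0.05),
    "L3": (0.40, 0.35, 0.25),
}


def budget_record(level, cadence_per_day, daily_quota, t_early=2_000,
                  t_triage=12_000, t_fix=80_000, safety_factor=1.5):
    """Build the budget record; daily_quota is the caller's stated quota."""
    p_early, p_triage, p_fix = BLEND[level]
    e_run = round(p_early * t_early + p_triage * t_triage + p_fix * t_fix)
    tokens_per_day = cadence_per_day * e_run
    daily_cap = math.ceil(tokens_per_day * safety_factor)
    early_exit = t_early < 5_000
    # The three cost-only NO-GO conditions listed above.
    no_go = (daily_cap > daily_quota
             or not early_exit
             or (level == "L3" and cadence_per_day > 24))
    return {
        "level": level,
        "cadence_per_day": cadence_per_day,
        "assumptions": {"T_early": t_early, "T_triage": t_triage,
                        "T_fix": t_fix, "safety_factor": safety_factor},
        "E_run_tokens": e_run,
        "tokens_per_day": tokens_per_day,
        "suggested_daily_cap": daily_cap,
        "per_run_cap": math.ceil(safety_factor * t_fix),
        "early_exit_required": early_exit,
        "verdict": "NO-GO" if no_go else "GO",
    }


print(json.dumps(budget_record("L2", 1, daily_quota=50_000), indent=2))
```
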
## Verification

A `loop-cost` estimate is correct if and only if all of these hold:

1. **Arithmetic reproduces.** Re-run `E[run] = p_early·T_early + p_triage·T_triage + p_fix·T_fix` with the Level's percentages and the stated `T_*`; the result matches `E_run_tokens`. The percentages are the only Level-dependent input.
2. **Percentages sum to 1.0 per Level.** L1: 0.90 + 0.10 = 1.00. L2: 0.85 + 0.10 + 0.05 = 1.00. L3: 0.40 + 0.35 + 0.25 = 1.00. If they don't sum to 1.00, the blend is wrong.
3. **Cap ≥ tokens/day.** `suggested_daily_cap = ceil(tokens_per_day * safety_factor)` and `safety_factor ≥ 1.0`. A cap below `tokens_per_day` is a bug.
4. **per_run_cap uses the worst case, not the blend.** `per_run_cap = safety_factor * T_fix`. If it equals `safety_factor * E_run_tokens`, the runaway-run protection is broken.
5. **Early-exit flag matches `T_early`.** `early_exit_required = (T_early < 5000)`. With `T_early = 2,000` it is `true`. If `T_early` is overridden and the flag wasn't recomputed, it's stale.
6. **Assumptions are explicit.** Every number in the output traces to an entry in `assumptions`. No hidden constants.

Run the check manually: pick any Worked Examples row, substitute the Level percentages into the blend formula with the stated `T_*`, and confirm `E_run_tokens` and `suggested_daily_cap` match. If they do, the model is internally consistent; if a row is off, the skill is broken.
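
Checks 1 through 5 can also be automated. The sketch below is one possible validator in Python, assuming a record shaped like the Output Contract; `check_record` and its failure labels are hypothetical names. Check 6 is only approximated here, since the lookups fail if any `assumptions` key is missing.

```python
import math

BLEND = {
    "L1": (0.90, 0.10, 0.00),
    "L2": (0.85, 0.10, 0.05),
    "L3": (0.40, 0.35, 0.25),
}


def check_record(rec):
    """Return the names of failed checks; an empty list means consistent."""
    a = rec["assumptions"]  # raises KeyError if assumptions are not explicit
    p = BLEND[rec["level"]]
    sf = a["safety_factor"]
    failures = []
    e_run = p[0] * a["T_early"] + p[1] * a["T_triage"] + p[2] * a["T_fix"]
    if not math.isclose(e_run, rec["E_run_tokens"]):
        failures.append("arithmetic")
    if not math.isclose(sum(p), 1.0):
        failures.append("percentages")
    expected_cap = math.ceil(rec["tokens_per_day"] * sf)
    if sf < 1.0 or rec["suggested_daily_cap"] != expected_cap:
        failures.append("daily_cap")
    if not math.isclose(rec["per_run_cap"], sf * a["T_fix"]):
        failures.append("per_run_cap")
    if rec["early_exit_required"] != (a["T_early"] < 5000):
        failures.append("early_exit")
    return failures


# The hourly-L3 row from Worked Examples.
hourly_l3 = {
    "level": "L3", "cadence_per_day": 24,
    "assumptions": {"T_early": 2000, "T_triage": 12000,
                    "T_fix": 80000, "safety_factor": 1.5},
    "E_run_tokens": 25000, "tokens_per_day": 600000,
    "suggested_daily_cap": 900000, "per_run_cap": 120000,
    "early_exit_required": True, "verdict": "GO",
}
print(check_record(hourly_l3))
```
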