@minhduydev/mdpi 0.4.1 → 0.5.0
- package/dist/index.js +1 -1
- package/dist/template/.pi/VERSION +1 -1
- package/dist/template/.pi/extensions/templates-injector.ts +35 -7
- package/dist/template/.pi/prompts/INDEX.md +3 -9
- package/dist/template/.pi/skills/INDEX.md +39 -8
- package/dist/template/.pi/skills/dcp-hygiene/SKILL.md +1 -1
- package/dist/template/.pi/skills/frontend-design/SKILL.md +1 -1
- package/dist/template/.pi/skills/frontend-design/references/animation/motion-advanced.md +88 -15
- package/dist/template/.pi/skills/frontend-design/references/animation/motion-core.md +148 -13
- package/dist/template/.pi/skills/frontend-design/references/shadcn/setup.md +127 -20
- package/dist/template/.pi/skills/nextjs-app-router/SKILL.md +334 -0
- package/dist/template/.pi/skills/nextjs-cache/SKILL.md +262 -0
- package/dist/template/.pi/skills/react-best-practices/SKILL.md +79 -1
- package/dist/template/.pi/skills/react-compiler/SKILL.md +237 -0
- package/dist/template/.pi/skills/react-hook-form/SKILL.md +374 -0
- package/dist/template/.pi/skills/react-server-actions/SKILL.md +299 -0
- package/dist/template/.pi/skills/shadcn-ui/SKILL.md +404 -0
- package/dist/template/.pi/skills/tanstack-query/SKILL.md +330 -0
- package/dist/template/.pi/skills/v0/SKILL.md +264 -0
- package/dist/template/.pi/skills/zustand/SKILL.md +333 -0
- package/package.json +1 -1
- package/dist/template/.pi/prompts/loop-check.md +0 -87
- package/dist/template/.pi/prompts/loop-init.md +0 -157
- package/dist/template/.pi/prompts/loop-review.md +0 -90
- package/dist/template/.pi/skills/loop-audit/SKILL.md +0 -141
- package/dist/template/.pi/skills/loop-cost/SKILL.md +0 -130
- package/dist/template/.pi/skills/loop-engineering/SKILL.md +0 -175
- package/dist/template/.pi/templates/loop-github-action.yml +0 -162
- package/dist/template/.pi/templates/loop-orchestrator.sh +0 -514
- package/dist/template/.pi/templates/loop-orchestrator.test.ts +0 -332
- package/dist/template/.pi/templates/loop-orchestrator.ts +0 -936
- package/dist/template/.pi/templates/loop-state.json +0 -24
- package/dist/template/.pi/templates/loop-state.md +0 -98
- package/dist/template/.pi/templates/loop-vision.md +0 -110
|
@@ -1,141 +0,0 @@
|
|
|
1
|
-
---
|
|
2
|
-
name: loop-audit
|
|
3
|
-
description: Use when scoring a project's loop-engineering readiness 0-100 + L0/L1/L2/L3 from concrete repo signals (state file, verifier/gate, loop skills, safety docs, GitHub workflows, MCP, worktree evidence, cost observability, real loop activity). Emits a numeric score, a level, and ≥1 recommendation. L3 is gated on a proven committed run.
|
|
4
|
-
---
|
|
5
|
-
|
|
6
|
-
# Loop Audit
|
|
7
|
-
|
|
8
|
-
Readiness scoring for loop-engineering. A loop is a system that prompts an agent on a schedule; before you trust one to run unattended, measure whether the project is actually loop-ready. This skill converts concrete repo signals into a **reproducible 0-100 score** and a **L0/L1/L2/L3 level**, then emits ≥1 recommendation. The score is computed from a fixed rubric (below) — it is not an opinion. L3 is capped at L2 until a real cycle has been run and committed.
|
|
9
|
-
|
|
10
|
-
## When to Use
|
|
11
|
-
|
|
12
|
-
- Before wiring a loop to run unattended (cron/launchd/GitHub Actions `on: schedule`).
|
|
13
|
-
- Before promoting a loop from supervised to unattended.
|
|
14
|
-
- On a cadence, to detect readiness drift (skills deleted, guard unregistered, state file dropped).
|
|
15
|
-
- When a stakeholder asks "is this project ready to loop?"
|
|
16
|
-
|
|
17
|
-
## When NOT to Use
|
|
18
|
-
|
|
19
|
-
- A single interactive change with no schedule or repetition (no loop to score).
|
|
20
|
-
- Scoring the *correctness* of loop output — that is `/loop-review`'s job (exit-code evidence). This skill scores *readiness*, not output quality.
|
|
21
|
-
|
|
22
|
-
## The Signals
|
|
23
|
-
|
|
24
|
-
Each signal is a concrete, checkable artifact. Check the repo; mark present/absent; sum the points. No signal is subjective — if you cannot point to a file/line, the signal is absent.
|
|
25
|
-
|
|
26
|
-
| # | Signal | What counts as present | Points |
|---|---|---|---|
| 1 | **State file present** | A loop state ledger exists and is non-empty: `.pi/loops/*/STATE.json` (or `STATE.md`) with `processed`, `in_progress`, `completed`, `failures`, `lessons`, `metrics`, `last_run`, `stop_conditions_met`. A blank/template file scores 0. | 12 |
| 2 | **Verifier/gate present** | An objective gate command is named in a VISION/contract file (`Gate:` section of `.pi/loops/*/VISION.md`) OR a gate script exists (e.g. `scripts/loop-gate.sh`, `npm test` referenced by the orchestrator). The gate must be a real command whose exit code decides pass/fail. | 14 |
| 3 | **Loop skills present** | Count of loop-* skills/prompts installed: `loop-engineering`, `loop-audit`, `loop-cost`, `/loop-check`, `/loop-review`, `/loop-init`. 0 → 0; 1-2 → 4; 3-4 → 8; 5-6 → 12. | 12 |
| 4 | **Safety docs** | Never-do rules + security checklist documented (in `loop-engineering` skill or a loop's `SKILL.md`): refuse-list (auth/payments/architecture), path protection, dangerous-cmd block, human-approval-required. | 10 |
| 5 | **GitHub workflows** | A loop CI workflow exists: `.github/workflows/*loop*` referencing `pi -p` and `on: schedule`, OR the `loop-github-action.yml` template is present. | 10 |
| 6 | **MCP / tool_call hook** | A `tool_call` hook is registered (`.pi/extensions/loop-guard.ts` in `.pi/settings.json` `extensions[]`) OR an MCP adapter is wired for loop tools. Capability-deprivation (`--tools` allowlist) documented in the orchestrator counts. | 8 |
| 7 | **Worktree evidence** | The orchestrator uses `git worktree add` for isolation (grep the orchestrator source/template for `worktree`). | 10 |
| 8 | **Cost observability** | Both: (a) a budget/cost doc or estimate (e.g. `loop-cost` output, a `BUDGET.md`, a cap value in the orchestrator) AND (b) a per-run log exists (a `logs/` or `.pi/loops/*/logs/` dir with at least one run log). One without the other → 6. | 12 |
| 9 | **Real loop activity** | A proven committed run: `STATE.json.last_run` set AND at least one item in `processed[]`/`completed[]` AND a git commit/PR attributable to the loop (branch `loop/<name>/<ts>` merged or open). A scaffolded-but-never-run loop scores 0 here. | 12 |
| | **Total** | | **100** |
|
|
38
|
-
|
|
39
|
-
## Scoring (0-100 → level)
|
|
40
|
-
|
|
41
|
-
Compute the raw score by summing the present signals (0-100). Then map to a level.
|
|
42
|
-
|
|
43
|
-
| Level | Raw score | Extra requirements |
|---|---|---|
| **L0**: nothing real | 0-29 | (none) |
| **L1**: structure, not yet run | 30-54 | (none) |
| **L2**: capable but unproven | 55-77, **OR** score ≥78 but missing any L3 gate requirement | (none) |
| **L3**: proven, gated, ready for unattended | ≥78 **AND** verifier/gate present (signal 2) **AND** state file present (signal 1) **AND** cost-ready (signal 8 full) **AND** real loop activity (signal 9) | **All four gate requirements must hold.** |
|
|
49
|
-
|
|
50
|
-
### The L3 gate (hard)
|
|
51
|
-
|
|
52
|
-
L3 is **capped at L2** until a real cycle has been run and committed. Structure alone — no matter how complete — cannot earn L3. The four gate requirements are conjunctive:
|
|
53
|
-
|
|
54
|
-
1. **Verifier/gate present** (signal 2, worth 14 points; it must be present).
|
|
55
|
-
2. **State file present** (signal 1; a populated, non-template ledger).
|
|
56
|
-
3. **Cost-ready** (signal 8 full — both budget doc *and* run log).
|
|
57
|
-
4. **Proven committed run** (signal 9 — a real cycle, committed, not a scaffold).
|
|
58
|
-
|
|
59
|
-
If raw score ≥78 but any gate requirement is missing → **L2** with a recommendation naming the missing requirement. This is the anti-Ralph-Wiggum guard: a project that *looks* ready but has never actually run a cycle is not trusted to run unattended.
|
|
60
|
-
|
|
61
|
-
> **Scenario (FR4):** project has structure (score 80) but no proven runs → **L2** with recommendation "run one L1 cycle and commit state before L3".
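
The rubric and the L3 gate can be written down as a small deterministic function. The following is a minimal sketch, assuming the point values and level bands from the tables above; the `Signals` class and its field names are hypothetical and are not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Presence flags for the nine signals (field names are illustrative)."""
    state_file: bool        # signal 1, 12 points
    gate: bool              # signal 2, 14 points
    loop_skill_count: int   # signal 3, number of loop-* skills/prompts (0-6)
    safety_docs: bool       # signal 4, 10 points
    github_workflow: bool   # signal 5, 10 points
    tool_call_hook: bool    # signal 6, 8 points
    worktree: bool          # signal 7, 10 points
    budget_doc: bool        # signal 8, part (a)
    run_log: bool           # signal 8, part (b)
    real_activity: bool     # signal 9, 12 points


def skills_points(count: int) -> int:
    """Signal 3 tiers: 0 -> 0; 1-2 -> 4; 3-4 -> 8; 5-6 -> 12."""
    if count <= 0:
        return 0
    if count <= 2:
        return 4
    if count <= 4:
        return 8
    return 12


def cost_points(budget_doc: bool, run_log: bool) -> int:
    """Signal 8: both parts -> 12; exactly one -> 6; neither -> 0."""
    if budget_doc and run_log:
        return 12
    return 6 if (budget_doc or run_log) else 0


def raw_score(s: Signals) -> int:
    """Sum the points of every present signal (0-100)."""
    return (12 * s.state_file + 14 * s.gate + skills_points(s.loop_skill_count)
            + 10 * s.safety_docs + 10 * s.github_workflow + 8 * s.tool_call_hook
            + 10 * s.worktree + cost_points(s.budget_doc, s.run_log)
            + 12 * s.real_activity)


def level(s: Signals) -> str:
    """Map the raw score to L0-L3, capping at L2 unless all four gates hold."""
    score = raw_score(s)
    if score < 30:
        return "L0"
    if score < 55:
        return "L1"
    gate_ok = (s.gate and s.state_file
               and s.budget_doc and s.run_log and s.real_activity)
    return "L3" if score >= 78 and gate_ok else "L2"
```

Under these assumptions, a fully scaffolded project that has never completed a committed run (every signal except 9) scores 88 and still maps to L2, which is the behaviour the hard gate is meant to enforce.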
|
|
62
|
-
|
|
63
|
-
## Output Contract
|
|
64
|
-
|
|
65
|
-
The skill always emits three things — no exceptions:
|
|
66
|
-
|
|
67
|
-
1. **Score:** the integer 0-100, with a one-line breakdown (which signals were present and their points). Reproducible: another agent re-running the rubric on the same repo must get the same number.
|
|
68
|
-
2. **Level:** one of `L0`, `L1`, `L2`, `L3`. If the raw score would map to L3 but a gate requirement fails, state the **capped** level (L2) and name the failed gate requirement.
|
|
69
|
-
3. **Recommendations:** ≥1 concrete, actionable next step tied to a missing/weak signal. Each recommendation cites the signal it would raise and the points recoverable. Never empty.
|
|
70
|
-
|
|
71
|
-
Output shape (markdown):
|
|
72
|
-
|
|
73
|
-
```markdown
# Loop Readiness Audit

**Score:** 68/100
**Level:** L2 (raw score 68 falls in the 55-77 band; the L3 gate is also unmet: no run log, no proven committed run)

## Breakdown
| Signal | Present? | Points |
|---|---|---|
| State file | ✅ | 12 |
| Verifier/gate | ✅ | 14 |
| Loop skills | ✅ (4/6) | 8 |
| Safety docs | ✅ | 10 |
| GitHub workflows | ❌ | 0 |
| MCP/tool_call hook | ✅ | 8 |
| Worktree evidence | ✅ | 10 |
| Cost observability | ⚠ partial (budget doc only) | 6 |
| Real loop activity | ❌ | 0 |
| **Total** | | **68** |

## Gate check (L3)
- Verifier/gate: ✅
- State file: ✅
- Cost-ready (both): ❌ (no run log)
- Proven committed run: ❌
→ L3 gate not met; level is L2.

## Recommendations
1. Add `.github/workflows/loop-*.yml` with `on: schedule` + `pi -p` to recover +10 (signal 5) and unblock CI unattended runs.
2. Run one supervised L1 cycle end-to-end and commit `STATE.json` (branch `loop/<name>/<ts>` + PR) to recover +12 (signal 9) and satisfy the L3 proven-run gate.
3. Emit per-run logs to `.pi/loops/*/logs/` to complete cost observability (+6, signal 8) and satisfy the cost-ready gate requirement.
```
|
|
105
|
-
|
|
106
|
-
## Verification
|
|
107
|
-
|
|
108
|
-
Before claiming an audit is complete:
|
|
109
|
-
|
|
110
|
-
- [ ] The output contains a numeric score 0-100 and a reproducible breakdown (every signal marked present/absent/partial with its points).
|
|
111
|
-
- [ ] The output contains a level L0/L1/L2/L3. If L3 is claimed, all four gate requirements are explicitly checked and present.
|
|
112
|
-
- [ ] The output contains ≥1 recommendation, each tied to a missing/weak signal with recoverable points.
|
|
113
|
-
- [ ] A second pass over the same repo yields the same score (rubric is deterministic — disagreement means a signal was scored subjectively; re-check the artifact).
|
|
114
|
-
- [ ] L3 is never awarded without a proven committed run (signal 9 present AND a `loop/<name>/<ts>` branch/PR exists in git).
|
|
115
|
-
|
|
116
|
-
## See Also
|
|
117
|
-
|
|
118
|
-
| Companion | Role |
|---|---|
| `loop-engineering` | Methodology this skill scores against (2-condition test, 5 building blocks) |
| `/loop-check` | NO-GO qualification gate (run *before* a loop, not after) |
| `/loop-review` | Maker/checker: scores a *run's* output via exit code, not project readiness |
| `loop-cost` | Cost estimation feeding the cost-observability signal (8) |
| `loop-orchestrator.ts`/`.sh` | Runtime whose worktree/gate/state usage signals 1, 2, 7 detect |
|
|
125
|
-
|
|
126
|
-
## Common Rationalizations
|
|
127
|
-
|
|
128
|
-
| Rationalization | Reality |
|---|---|
| "We scaffolded the whole kit, that's L3" | Structure without a proven committed run is L2 max. The gate exists to prevent this exact claim. |
| "The score feels low" | The score is the rubric. Re-check the artifacts; if a signal is absent, it scores 0. The only partial credit is what the rubric itself defines (signals 3 and 8); "almost" earns nothing. |
| "We ran a loop once manually, that's proven" | Manual ≠ committed. Proven requires a committed cycle (branch + PR/commit attributable to the loop). |
| "L3 just needs a high score" | L3 needs score ≥78 **and** all four gate requirements. High score alone caps at L2. |
|
|
134
|
-
|
|
135
|
-
## Red Flags
|
|
136
|
-
|
|
137
|
-
- An L3 score with no `loop/<name>/<ts>` branch or PR in git history (gate failed; cap to L2).
|
|
138
|
-
- A score with no breakdown table (not reproducible — redo the audit).
|
|
139
|
-
- Zero recommendations (the contract requires ≥1; an empty recommendation list means the audit is incomplete).
|
|
140
|
-
- A signal marked present with no file/line citation (subjective scoring; re-check).
|
|
141
|
-
- Cost observability scored full with only a budget doc and no run log (signal 8 requires both).
|
|
@@ -1,130 +0,0 @@
|
|
|
1
|
-
---
|
|
2
|
-
name: loop-cost
|
|
3
|
-
description: Use when estimating tokens/day for an unattended coding loop before it ships — computes cadence × realistic blend per L1/L2/L3, a suggested daily cap, and the early-exit-required flag (early-exit on empty watchlist < 5k tokens). Load before approving a loop's budget or setting a per-run cap.
|
|
4
|
-
---
|
|
5
|
-
|
|
6
|
-
# Loop Cost
|
|
7
|
-
|
|
8
|
-
Token-budget skill for unattended coding loops. Answers the budget half of the loop-engineering 2-condition test: *is the cost of one wasted run small enough that the cadence does not bankrupt you?* Produces a tokens/day estimate, a suggested daily cap, and an early-exit flag the orchestrator (`loop-orchestrator.ts`) consumes from `--mode json` token events (FR13).
|
|
9
|
-
|
|
10
|
-
## When to Use
|
|
11
|
-
|
|
12
|
-
- You are about to qualify or ship an unattended loop and need to set a per-run cap.
|
|
13
|
-
- An orchestrator asks for a daily budget number before registering a cron/GitHub Actions schedule.
|
|
14
|
-
- You are sizing whether a no-op (watchlist empty) run is cheap enough to schedule hourly.
|
|
15
|
-
- You are revising a cap after cadence or level changes (e.g., L2 triage promoted to L3 fix loop).
|
|
16
|
-
|
|
17
|
-
## When NOT to Use
|
|
18
|
-
|
|
19
|
-
- Interactive single runs with a human watching every turn — no cadence, no budget problem.
|
|
20
|
-
- Estimating dollar cost or latency — this is tokens only; multiply by model price elsewhere.
|
|
21
|
-
- Choosing loop qualification or confidence thresholds — those live in `loop-engineering` and `loop-guard`.
|
|
22
|
-
|
|
23
|
-
## The Blend Model
|
|
24
|
-
|
|
25
|
-
A loop's per-run cost is not a single number; it is a **blend** over what actually happens across runs. Each loop is assigned one Level; within that Level, runs distribute across three outcomes:
|
|
26
|
-
|
|
27
|
-
| Level | Early-exit (watchlist empty / no-op) | Triage (diagnosis-only, no fix) | Full fix (edit + verify) |
| ----- | ------------------------------------ | ------------------------------- | ------------------------ |
| **L1** | 90% | 10% | 0% |
| **L2** | 85% | 10% | 5% |
| **L3** | 40% | 35% | 25% |
|
|
32
|
-
|
|
33
|
-
**Per-run token estimates (assumptions — state yours explicitly and override for your stack):**
|
|
34
|
-
|
|
35
|
-
- **Early-exit:** 2,000 tokens (read watchlist, confirm empty, write state, exit).
|
|
36
|
-
- **Triage:** 12,000 tokens (read failure, reproduce attempt, write diagnosis-only comment, no edits).
|
|
37
|
-
- **Full fix:** 80,000 tokens (read + reproduce + edit + run gate + revise + post).
|
|
38
|
-
|
|
39
|
-
These are starting points for a pi 0.79.x maker on a mid-size TS/JS repo with a passing gate. For larger repos, heavier languages, or failing gates, raise them. Always state the assumptions used so the arithmetic is reproducible.
|
|
40
|
-
|
|
41
|
-
**Expected tokens per run** = blend:
|
|
42
|
-
|
|
43
|
-
```
E[run] = p_early * T_early + p_triage * T_triage + p_fix * T_fix
```
|
|
46
|
-
|
|
47
|
-
where `p_*` are the Level's percentages (as fractions) and `T_*` the per-run estimates above.
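
The blend can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the percentages and default per-run estimates stated above; the names `BLEND_PERCENT` and `expected_tokens_per_run` are hypothetical. Percentages are kept as integers so the result is exact.

```python
# Blend per Level as integer percentages: (early-exit, triage, full fix).
BLEND_PERCENT = {
    "L1": (90, 10, 0),
    "L2": (85, 10, 5),
    "L3": (40, 35, 25),
}

# Default per-run token estimates (assumptions; override for your stack).
T_EARLY, T_TRIAGE, T_FIX = 2_000, 12_000, 80_000


def expected_tokens_per_run(level, t_early=T_EARLY, t_triage=T_TRIAGE, t_fix=T_FIX):
    """E[run] = p_early*T_early + p_triage*T_triage + p_fix*T_fix."""
    p_early, p_triage, p_fix = BLEND_PERCENT[level]
    if p_early + p_triage + p_fix != 100:
        raise ValueError(f"blend for {level} does not sum to 100%")
    return (p_early * t_early + p_triage * t_triage + p_fix * t_fix) // 100


for lvl in ("L1", "L2", "L3"):
    print(lvl, expected_tokens_per_run(lvl))
# L1 3000
# L2 6900
# L3 25000
```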
|
|
48
|
-
|
|
49
|
-
## Daily Cap Formula
|
|
50
|
-
|
|
51
|
-
Given a **cadence** (runs/day) and a Level, the daily budget is:
|
|
52
|
-
|
|
53
|
-
```
tokens_per_day = cadence_per_day * E[run]
suggested_daily_cap = ceil(tokens_per_day * safety_factor)
```
|
|
57
|
-
|
|
58
|
-
- **cadence_per_day**: scheduled runs per 24h, averaged over the schedule. Hourly = 24, every-15-min = 96, nightly = 1, weekdays-only nightly ≈ 0.71 (5 runs per 7 days, roughly 22 runs per month).
|
|
59
|
-
- **safety_factor**: 1.5 default. The blend is an *expectation*; real days cluster worse (a flaky CI + a real bug in one day). 1.5 covers one bad day without doubling cost. Lower to 1.25 for well-observed stable loops; raise to 2.0 for new loops with unknown variance.
|
|
60
|
-
- **per_run_cap**: set on the orchestrator's `--mode json` token watcher to `safety_factor * T_fix` (the worst single-run case), not the blend — a single run that blows past `T_fix` is a runaway, not a bad day. FR13's "exceeds a per-run cap" kill trigger uses this number.
|
|
61
|
-
|
|
62
|
-
The orchestrator accumulates usage from token events and **kills the loop when usage exceeds `suggested_daily_cap`** (FR13), recording it in state.
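
The two caps and the kill decision can be sketched as follows. This is an illustration under the formulas above, not the orchestrator's actual code; `daily_budget` and `should_kill` are hypothetical names, and how token usage is accumulated from the event stream is left out.

```python
import math


def daily_budget(cadence_per_day, e_run_tokens, t_fix=80_000, safety_factor=1.5):
    """Compute tokens/day, the suggested daily cap, and the per-run cap."""
    tokens_per_day = cadence_per_day * e_run_tokens
    return {
        "tokens_per_day": tokens_per_day,
        # Daily cap scales the expected daily usage by the safety factor.
        "suggested_daily_cap": math.ceil(tokens_per_day * safety_factor),
        # Per-run cap uses the worst single-run case (T_fix), not the blend.
        "per_run_cap": math.ceil(safety_factor * t_fix),
    }


def should_kill(used_today, used_this_run, budget):
    """True when either the daily cap or the per-run cap has been exceeded."""
    return (used_today > budget["suggested_daily_cap"]
            or used_this_run > budget["per_run_cap"])


# Hourly L3 loop with the default estimates (E[run] = 25,000).
budget = daily_budget(cadence_per_day=24, e_run_tokens=25_000)
print(budget)
```

Keeping the per-run cap independent of the blend means a single runaway run is caught even on a day whose total usage is still well under the daily cap.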
|
|
63
|
-
|
|
64
|
-
## Early-Exit Rule
|
|
65
|
-
|
|
66
|
-
**Early-exit-required flag** is set (true) when the loop's no-op case costs under the early-exit ceiling:
|
|
67
|
-
|
|
68
|
-
```
early_exit_required = T_early < 5000
```
|
|
71
|
-
|
|
72
|
-
With the default `T_early = 2,000`, the flag is **true** for every Level. If your stack pushes `T_early ≥ 5,000` (huge watchlist read, chatty state writes), the flag flips to **false** — that loop is too expensive to schedule freely and should be re-scoped or scheduled less often.
|
|
73
|
-
|
|
74
|
-
When `early_exit_required` is true and the watchlist is empty, the orchestrator **must** early-exit and must not proceed to triage/fix — FR13: *"no-op (watchlist empty) → early-exit < 5k tokens."* The non-functional ceiling is a hard 5k.
|
|
75
|
-
|
|
76
|
-
## Worked Examples
|
|
77
|
-
|
|
78
|
-
Assumptions (used in every row): `T_early = 2,000`, `T_triage = 12,000`, `T_fix = 80,000`, `safety_factor = 1.5`.
|
|
79
|
-
|
|
80
|
-
| Cadence (runs/day) | Level | E[run] arithmetic | E[run] (tokens) | tokens/day | Suggested daily cap (×1.5) | per-run cap (×1.5 of T_fix=80k) | Early-exit required? |
| ------------------ | ----- | ----------------- | --------------- | ---------- | -------------------------- | ------------------------------ | -------------------- |
| 1 (nightly) | L1 | 0.90·2k + 0.10·12k + 0·80k = 1,800 + 1,200 | 3,000 | 3,000 | 4,500 | 120,000 | yes (2k < 5k) |
| 24 (hourly) | L1 | 0.90·2k + 0.10·12k = 3,000 | 3,000 | 72,000 | 108,000 | 120,000 | yes |
| 1 (nightly) | L2 | 0.85·2k + 0.10·12k + 0.05·80k = 1,700 + 1,200 + 4,000 | 6,900 | 6,900 | 10,350 | 120,000 | yes |
| 24 (hourly) | L2 | 0.85·2k + 0.10·12k + 0.05·80k = 6,900 | 6,900 | 165,600 | 248,400 | 120,000 | yes |
| 1 (nightly) | L3 | 0.40·2k + 0.35·12k + 0.25·80k = 800 + 4,200 + 20,000 | 25,000 | 25,000 | 37,500 | 120,000 | yes |
| 24 (hourly) | L3 | 0.40·2k + 0.35·12k + 0.25·80k = 25,000 | 25,000 | 600,000 | 900,000 | 120,000 | yes |
| 96 (every 15 min) | L1 | 3,000 | 3,000 | 288,000 | 432,000 | 120,000 | yes |
| 96 (every 15 min) | L3 | 25,000 | 25,000 | 2,400,000 | 3,600,000 | 120,000 | yes |
|
|
90
|
-
|
|
91
|
-
**Reading the table:** the hourly-L3 row (600k tokens/day, 900k cap) is the warning zone — an L3 fix loop scheduled hourly is almost certainly over-budget unless you have a large API quota. The nightly-L1 row (4,500 cap) is the cheap, safe default for triage loops. The every-15-min-L3 row is a NO-GO on cost alone — re-scope to L2 or drop cadence.
|
|
92
|
-
|
|
93
|
-
## Output Contract
|
|
94
|
-
|
|
95
|
-
`loop-cost` produces a single budget record the orchestrator consumes:
|
|
96
|
-
|
|
97
|
-
```json
{
  "level": "L1" | "L2" | "L3",
  "cadence_per_day": <number>,
  "assumptions": {
    "T_early": 2000,
    "T_triage": 12000,
    "T_fix": 80000,
    "safety_factor": 1.5
  },
  "E_run_tokens": <number>,
  "tokens_per_day": <number>,
  "suggested_daily_cap": <number>,
  "per_run_cap": <number>,
  "early_exit_required": true | false,
  "verdict": "GO" | "NO-GO"
}
```
|
|
115
|
-
|
|
116
|
-
- `verdict` is **NO-GO** if `suggested_daily_cap` exceeds your stated daily quota, or if `early_exit_required` is false, or if the loop is L3 at sub-hourly cadence (cost-only refuse, independent of the refuse-list in `loop-engineering`).
|
|
117
|
-
- The orchestrator reads `suggested_daily_cap` into its `--mode json` token watcher and `per_run_cap` as the per-run kill threshold (FR13).
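
The verdict rule reads naturally as three independent checks. A minimal sketch, assuming "sub-hourly" means more than 24 runs/day and that the caller supplies a `daily_quota` value; both the parameter name and the function name `cost_verdict` are hypothetical.

```python
EARLY_EXIT_CEILING = 5_000  # tokens; the hard ceiling for a no-op run


def cost_verdict(level, cadence_per_day, suggested_daily_cap, daily_quota, t_early):
    """Return (verdict, early_exit_required, reasons) for a budget record."""
    early_exit_required = t_early < EARLY_EXIT_CEILING
    reasons = []
    if suggested_daily_cap > daily_quota:
        reasons.append("suggested_daily_cap exceeds daily quota")
    if not early_exit_required:
        reasons.append("no-op run is at or above the early-exit ceiling")
    if level == "L3" and cadence_per_day > 24:
        reasons.append("L3 at sub-hourly cadence")
    return ("NO-GO" if reasons else "GO"), early_exit_required, reasons


# Nightly L2 loop against a hypothetical 50,000-token daily quota.
print(cost_verdict("L2", 1, 10_350, 50_000, 2_000))
```

Returning the reasons alongside the verdict keeps a NO-GO explainable: the caller can cite which of the three conditions failed.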
|
|
118
|
-
|
|
119
|
-
## Verification
|
|
120
|
-
|
|
121
|
-
A `loop-cost` estimate is correct iff all of these hold:
|
|
122
|
-
|
|
123
|
-
1. **Arithmetic reproduces.** Re-run `E[run] = p_early·T_early + p_triage·T_triage + p_fix·T_fix` with the Level's percentages and the stated `T_*`; the result matches `E_run_tokens`. The percentages are the only Level-dependent input.
|
|
124
|
-
2. **Percentages sum to 1.0 per Level.** L1: 0.90 + 0.10 = 1.00. L2: 0.85 + 0.10 + 0.05 = 1.00. L3: 0.40 + 0.35 + 0.25 = 1.00. If they don't sum to 1.00, the blend is wrong.
|
|
125
|
-
3. **Cap ≥ tokens/day.** `suggested_daily_cap = ceil(tokens_per_day * safety_factor)` and `safety_factor ≥ 1.0`. A cap below `tokens_per_day` is a bug.
|
|
126
|
-
4. **per_run_cap uses the worst case, not the blend.** `per_run_cap = safety_factor * T_fix`. If it equals `safety_factor * E_run_tokens`, the runaway-run protection is broken.
|
|
127
|
-
5. **Early-exit flag matches `T_early`.** `early_exit_required = (T_early < 5000)`. With `T_early = 2,000` it is `true`. If `T_early` is overridden and the flag wasn't recomputed, it's stale.
|
|
128
|
-
6. **Assumptions are explicit.** Every number in the output traces to an entry in `assumptions`. No hidden constants.
|
|
129
|
-
|
|
130
|
-
Run the check manually: pick any Worked Examples row, substitute the Level percentages into the blend formula with the stated `T_*`, and confirm `E_run_tokens` and `suggested_daily_cap` match. If they do, the model is internally consistent; if a row is off, the skill is broken.
|
|
@@ -1,175 +0,0 @@
|
|
|
1
|
-
---
|
|
2
|
-
name: loop-engineering
|
|
3
|
-
description: Use when designing, qualifying, or running unattended coding loops (nightly CI triage, dependency bumps, doc sync, PR babysitting). Encodes the 2-condition test, the 5 building blocks, the VISION/state anti-drift contract, failure modes (Ralph Wiggum), confidence-gated action, and the honest ceiling.
|
|
4
|
-
---
|
|
5
|
-
|
|
6
|
-
# Loop Engineering
|
|
7
|
-
|
|
8
|
-
Methodology for designing systems that prompt an agent on a schedule, rather than hand-prompting it every turn. Composes native pi capabilities — never ships a daemon, never rebuilds scheduling/subagent/worktree. A loop is **qualified before it runs**, **gated by exit code before it ships**, **bound by a contract before it drifts**, and **budget-capped before it bankrupts you**.
|
|
9
|
-
|
|
10
|
-
## When to Use
|
|
11
|
-
|
|
12
|
-
- You are asked to "automate", "schedule", or "run nightly/unattended" a coding task.
|
|
13
|
-
- You are about to let an agent act repeatedly without you watching every turn.
|
|
14
|
-
- A task is being proposed for an orchestrator (`loop-orchestrator.ts`/`.sh`) or a GitHub Actions `on: schedule` run.
|
|
15
|
-
- You need to qualify, contract, score, review, or budget a loop before it ships anything.
|
|
16
|
-
|
|
17
|
-
## When NOT to Use
|
|
18
|
-
|
|
19
|
-
- A single interactive change with a human in the loop and a clear exit (no schedule, no repetition).
|
|
20
|
-
- Tasks that cannot pass the 2-condition test (see below) — those are NO-GO; do not loop them.
|
|
21
|
-
|
|
22
|
-
## The 2-Condition Test
|
|
23
|
-
|
|
24
|
-
Before any loop is allowed to run, **both** conditions must hold. If either fails, refuse (NO-GO) and cite which one:
|
|
25
|
-
|
|
26
|
-
1. **Verification is automated.** There exists an objective command whose exit code decides pass/fail — `npm test`, `tsc --noEmit`, `eslint`, a custom gate script. The stop condition is **computational (exit code)**, never an LLM's opinion.
|
|
27
|
-
2. **The token budget absorbs the waste.** The cost of one wasted run (gate fails, no-op, early-exit) is small enough that running the loop on its scheduled cadence does not blow the budget. Estimate it with `loop-cost` first; set a per-run cap.
|
|
28
|
-
|
|
29
|
-
Refuse-list (immediate NO-GO, no further test): **auth, payments, architecture.** These need human judgement a loop cannot provide. `/loop-check` codifies this gate.
|
|
30
|
-
|
|
31
|
-
> **Example (GO):** "triage failing CI nightly" with `npm test` as gate → verification automated, no-op early-exit <5k tokens → GO, citing "verification automated + budget absorbs waste".
|
|
32
|
-
> **Example (NO-GO):** "rewrite auth module" → refuse-list hit → NO-GO, citing "architecture refused".
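
The qualification order (refuse-list first, then the two conditions) can be sketched as a single function. This is an illustration only: the source does not say how a task is matched against the refuse-list, so the `task_tags` input is a hypothetical stand-in, and reading "the budget absorbs the waste" as "wasted-run cost times cadence fits in the daily budget" is an assumption.

```python
REFUSE_LIST = ("auth", "payments", "architecture")


def qualify(task_tags, gate_command, wasted_run_tokens, cadence_per_day, daily_budget):
    """Return (verdict, reason) for a proposed loop."""
    # Refuse-list: immediate NO-GO, no further test.
    for topic in REFUSE_LIST:
        if topic in task_tags:
            return "NO-GO", f"{topic} refused"
    # Condition 1: an objective command whose exit code decides pass/fail.
    if not gate_command:
        return "NO-GO", "verification not automated"
    # Condition 2: wasted runs at the scheduled cadence must fit the budget.
    if wasted_run_tokens * cadence_per_day > daily_budget:
        return "NO-GO", "budget does not absorb waste"
    return "GO", "verification automated + budget absorbs waste"


# Nightly CI triage with `npm test` as the gate and a 2,000-token no-op run.
print(qualify({"ci-triage"}, "npm test", 2_000, 1, 50_000))
```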
|
|
33
|
-
|
|
34
|
-
## The 5 Building Blocks
|
|
35
|
-
|
|
36
|
-
Every loop is built from exactly five components. Missing any one is a NO-GO.
|
|
37
|
-
|
|
38
|
-
| # | Block | What it is | Where it lives |
|---|---|---|---|
| 1 | **VISION** | The contract: Goal, Scope, Out-of-scope, Definition-of-done, Gate (exact command, pass = exit 0), Hard stops, Human-approval-required. Reread at the start of every run. | `loop-vision.md` (template → `.pi/loops/<name>/VISION.md`) |
| 2 | **State** | Dedup ledger + working memory: Last run, In progress, Completed, Escalated, Lessons, **Processed items** keyed by stable IDs (CI run IDs, PR numbers, `package@version`, commit SHAs), Stop conditions. | `loop-state.md` + `loop-state.json` (machine ledger) |
| 3 | **Gate** | The exact command the orchestrator runs via bash and reads the exit code of. Ships only on exit 0; on non-zero, logs + records failure + cleans up. | The `Gate` section of `VISION.md` |
| 4 | **Qualification** | The NO-GO gate applied before the loop is ever scheduled (`/loop-check`): refuse-list + 2-condition test + 30-second checklist (objective gate exists, hard stop exists, human approves merge/deploy/deps). | `/loop-check` prompt |
| 5 | **Readiness** | A scored measure of whether this project is actually loop-ready: 0-100 + L0/L1/L2/L3 from nine signals (state file, verifier/gate, loop skills, safety docs, GH workflows, MCP, worktree, cost observability, **real loop activity**). L3 is capped at L2 until a real cycle is run and committed. | `loop-audit` skill |
|
|
45
|
-
|
|
46
|
-
The maker (the `pi -p` invocation that does the work) is **not** a building block — it is structurally deprived of ship tools (`--tools read,edit,write,bash,grep,find`). It can only stage work; the orchestrator ships **after** the gate passes. Capability-deprivation is structural, not behavioural.
|
|
47
|
-
|
|
48
|
-
## VISION/State Pattern (Anti Goal-Drift Contract)
|
|
49
|
-
|
|
50
|
-
Goal drift kills loops. Context summarization silently drops constraints mid-run; the agent then acts outside the original boundaries. The VISION/state pattern is the contract that prevents this.
|
|
51
|
-
|
|
52
|
-
- **Reread VISION.md at the start of every run.** Boundaries are re-derived from disk, not from context. When context summarization drops a hard stop, VISION.md restores it.
|
|
53
|
-
- **The loop must not act outside VISION.md.** Out-of-scope is as binding as in-scope. If a run discovers work that looks useful but is not in scope, escalate (write it to state's `Escalated`), do not do it.
|
|
54
|
-
- **State dedup makes re-runs idempotent.** `STATE.json.processed[]` is keyed by stable IDs. The orchestrator skips already-processed items; deleting `STATE.json` reprocesses everything. This is what makes scheduling safe — a duplicate trigger is harmless.
|
|
55
|
-
- **Hard stops are non-negotiable.** If a hard stop fires (gate fails repeatedly, budget cap hit, forbidden path touched), the loop records the stop in state and exits; it does not improvise a workaround.
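
The dedup behaviour in the third bullet can be illustrated with a small ledger helper. This is a sketch under the assumption that `STATE.json` holds a top-level `processed` list of string IDs; the real ledger schema is not shown here, and both function names are hypothetical.

```python
import json
from pathlib import Path


def load_state(state_path):
    """Read the ledger, treating a missing file as an empty state."""
    path = Path(state_path)
    return json.loads(path.read_text()) if path.exists() else {}


def pending_items(state_path, candidate_ids):
    """Return the candidates that are not yet recorded as processed."""
    processed = set(load_state(state_path).get("processed", []))
    return [item for item in candidate_ids if item not in processed]


def mark_processed(state_path, item_id):
    """Record an item by its stable ID; recording it twice is a no-op."""
    state = load_state(state_path)
    processed = state.setdefault("processed", [])
    if item_id not in processed:
        processed.append(item_id)
    Path(state_path).write_text(json.dumps(state, indent=2))
```

Because items are keyed by stable IDs, a duplicate trigger sees an empty pending list and can exit early, while deleting the ledger file makes every candidate pending again.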
|
|
56
|
-
|
|
57
|
-
## Failure Modes
|
|
58
|
-
|
|
59
|
-
### The Ralph Wiggum Loop
|
|
60
|
-
|
|
61
|
-
> _"I'm helping! I'm helping!"_ — shipping on LLM opinion instead of an exit code.
|
|
62
|
-
|
|
63
|
-
The canonical failure: the maker claims the work is done ("tests pass", "it works"), the loop ships it, and nobody ran the gate. The LLM self-approves. The result is broken work merged to main on the strength of a confident sentence.
|
|
64
|
-
|
|
65
|
-
**Prevention:** the stop condition is the gate's exit code, full stop. The orchestrator runs the gate via bash and reads `$?`. The maker's opinion is not evidence. `/loop-review` (maker/checker) defaults to REJECT when uncertain and cites the exact exit code in its evidence block. **Never let "I think it passes" substitute for `exit 0`.**
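
Reading the gate's exit code is a few lines in any language. The sketch below assumes `bash` is on the PATH and uses Python's `subprocess` module rather than the orchestrator's own code; `run_gate` and its timeout default are hypothetical.

```python
import subprocess


def run_gate(gate_command, cwd=None, timeout_s=1800):
    """Run the gate via bash and return (passed, exit_code).

    passed is True only when the command exits 0. A timeout counts as a
    failure and is reported with exit_code None.
    """
    try:
        result = subprocess.run(
            ["bash", "-c", gate_command], cwd=cwd, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False, None
    return result.returncode == 0, result.returncode


passed, code = run_gate("exit 0")
print("ship" if passed else f"do not ship (exit code {code})")
```

Nothing the maker says is an input to this function, which is the point: the decision to ship depends on the exit code alone.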
|
|
66
|
-
|
|
67
|
-
### Acting Without Confidence
|
|
68
|
-
|
|
69
|
-
The companion failure: the agent fixes a CI failure it cannot reproduce, on a hunch, and ships the fix. The gate may even pass — but the fix is a guess, not a diagnosis.
|
|
70
|
-
|
|
71
|
-
**Prevention:** confidence-gated action (next section). No local reproduction → no fix; write diagnosis-only and escalate.
|
|
72
|
-
|
|
73
|
-
### Other failure modes
|
|
74
|
-
|
|
75
|
-
- **Context drop** → mitigated by VISION.md reread (above).
|
|
76
|
-
- **Cost runaway** → mitigated by the 2-condition test (budget absorbs waste) + `loop-cost` estimate + per-run cap in the orchestrator.
|
|
77
|
-
- **Stray ship to main** → mitigated structurally: always ship to `loop/<name>/<ts>` branch + PR; human approves merge; `--tools` omits ship tools; never-do rules block auth/payments/architecture.
|
|
78
|
-
- **Gaming the gate** (maker edits the test/gate file to force a pass) → mitigated by Phase C path protection (`loop-guard.ts` blocks edits to `VISION.md`, `package.json`, lockfiles, the gate script). This is **not** eliminated — see Honest Ceiling.
|
|
79
|
-
|
|
80
|
-
## Confidence-Gated Action
|
|
81
|
-
|
|
82
|
-
When the maker is classifying a CI failure (or any observed anomaly) and deciding whether to act, run this decision procedure — do not fix on a hunch.
|
|
83
|
-
|
|
84
|
-
1. **Classify** the failure into one of: `flaky`, `infra`, `unknown`, `logic`, `regression`, `dependency`, `docs`, `test-only`. (Extend the taxonomy in the loop's `SKILL.md` as patterns emerge.)
|
|
85
|
-
2. **Estimate confidence** 0.0–1.0 from the evidence available to the maker *right now* (logs, diff, reproducer, prior `STATE.json.lessons`).
|
|
86
|
-
3. **Reproduce locally** if the class is reproducible. If you **cannot reproduce** the failure locally, **lower confidence and abort the fix** — write a diagnosis-only entry to state and escalate.
|
|
87
|
-
4. **Act only if** `confidence > threshold` **AND** `class ∉ {flaky, infra, unknown}`. Otherwise: **diagnosis-only**, no code change. Record the diagnosis in `STATE.json.failures[]` / `lessons[]` so the next run starts warmer.

```
if class in {flaky, infra, unknown}:
    → diagnosis-only (no fix)
elif not reproducible_locally:
    → lower confidence, abort fix, diagnosis-only
elif confidence > threshold:
    → act (fix), then run the gate
else:
    → diagnosis-only
```

The threshold is set per-loop in `VISION.md` (default `0.7`). Flaky/infra/unknown are **never** auto-fixed: they are noise that a loop must not chase, because chasing them is how the Ralph Wiggum loop starts (shipping a guess to make the gate green).
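The decision procedure above can be sketched as a small pure function. The names `FailureClass`, `Decision`, and `decide` are hypothetical and are not taken from the shipped templates. The sketch also folds "lower confidence and abort" into a plain diagnosis-only result, and it assumes every class it is asked about is one that can be reproduced locally.

```typescript
// Hypothetical sketch of the confidence gate; names are illustrative only.
type FailureClass =
  | "flaky" | "infra" | "unknown" | "logic"
  | "regression" | "dependency" | "docs" | "test-only";

type Decision = "act" | "diagnosis-only";

// Classes that are never auto-fixed, regardless of confidence.
const NEVER_FIX: ReadonlySet<FailureClass> = new Set<FailureClass>([
  "flaky",
  "infra",
  "unknown",
]);

function decide(
  cls: FailureClass,
  confidence: number,         // 0.0 to 1.0, estimated from current evidence
  reproducedLocally: boolean, // result of the local reproduction attempt
  threshold: number = 0.7,    // default from the text; normally set per-loop in VISION.md
): Decision {
  if (NEVER_FIX.has(cls)) return "diagnosis-only";
  if (!reproducedLocally) return "diagnosis-only";
  // Strictly greater than the threshold, matching `confidence > threshold`.
  return confidence > threshold ? "act" : "diagnosis-only";
}

console.log(decide("regression", 0.9, true)); // act
console.log(decide("flaky", 0.99, true));     // diagnosis-only
console.log(decide("logic", 0.9, false));     // diagnosis-only
console.log(decide("logic", 0.7, true));      // diagnosis-only
```

Keeping the function free of side effects means the caller decides how to record a diagnosis-only result in `STATE.json`, and the gate still runs after any `act` outcome.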

## Security Checklist

Before a loop is allowed to run unattended, confirm every item. A miss is a NO-GO.

- [ ] **Maker is capability-deprived.** `pi -p --tools read,edit,write,bash,grep,find`, with no `push`/`pr`/`slack` in the allowlist. Audit the `--mode json` log: zero ship-tool calls recorded.
- [ ] **Gate is the only ship signal.** Exit 0 → push `loop/<name>/<ts>` branch + PR. Non-zero → no ship, record failure. Never ship to `main`.
- [ ] **Human approves merge/deploy/dependency changes.** The loop opens PRs; a human merges them. Never auto-merge.
- [ ] **Never-do rules enforced.** `loop-guard.ts` blocks bash matching auth/payments/architecture patterns, and protects `VISION.md`, `package.json`, lockfiles, and the gate script from edits.
- [ ] **Dangerous commands blocked when unattended.** `rm -rf`, `sudo`, `chmod 777` are blocked when `!ctx.hasUI` (no UI to confirm).
- [ ] **No secrets in the repo.** API keys via env/CI secrets (`PI_API_KEY`, `GH_TOKEN`), never committed.
- [ ] **Logs sanitized in unattended runs.** No secrets, tokens, or PII in per-run log files.
- [ ] **Idempotent + isolated.** `git worktree` per run (no file collision between parallel loops); `STATE.json.processed[]` skips duplicates; cleanup on pass *and* fail.
- [ ] **Graceful degradation.** Bash orchestrator uses `set -uo pipefail` (NOT `-e`); SDK orchestrator wraps each step in try/catch. A failing loop logs and records the failure, and the scheduler moves on; a loop never crashes the scheduler.
- [ ] **Permissions re-audited periodically.** Loop permissions drift; re-check never-do lists and protected paths on a cadence.
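As an illustration of the "dangerous commands blocked when unattended" item, here is a minimal sketch of such a check. It is an assumption about how a guard could be written, not the shipped `loop-guard.ts`: `DANGEROUS_PATTERNS` and `shouldBlock` are hypothetical names, and the `hasUI` parameter stands in for the `ctx.hasUI` flag mentioned in the checklist.

```typescript
// Hypothetical sketch; the real guard's patterns and hook signature may differ.
const DANGEROUS_PATTERNS: readonly RegExp[] = [
  /\brm\s+-rf\b/,    // recursive forced delete
  /\bsudo\b/,        // privilege escalation
  /\bchmod\s+777\b/, // world-writable permissions
];

// Block only when nobody can confirm interactively (unattended run).
function shouldBlock(command: string, hasUI: boolean): boolean {
  if (hasUI) return false; // a human can be asked to confirm instead
  return DANGEROUS_PATTERNS.some((pattern) => pattern.test(command));
}

console.log(shouldBlock("rm -rf build/", false));    // true
console.log(shouldBlock("sudo apt-get update", false)); // true
console.log(shouldBlock("rm -rf build/", true));     // false
console.log(shouldBlock("ls -la", false));           // false
```

Pattern matching of this kind is easy to evade (for example `rm -fr` or a command assembled from variables), so it works as one layer of defense in depth alongside the restricted tool allowlist, not as a replacement for it.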

## The Honest Ceiling

What loops can and cannot do, stated plainly, because overselling the gate is how loops ship broken work to main.

- **The exit-code gate does not verify semantic correctness.** It verifies that a command exited 0. A test suite that passes can still encode wrong behaviour (Martin Fowler's "behaviour harness" gap, industry-unsolved). The gate is only as good as the test suite behind it.
- **Structural gaming is mitigated, not eliminated.** `loop-guard.ts` blocks edits to test/gate files, but cannot catch all semantic gaming (e.g. weakening an assertion without touching the gate file). An optional advisory LLM verifier can flag suspicious diffs, but the binding decision remains the exit code.
- **A loop cannot provide judgement it was not given.** Auth, payments, architecture, and "is this the right product decision" stay human. The refuse-list is the honest acknowledgment that some questions are not computable from the project's current state.
- **No live TUI/dashboard.** Observability is logs + `STATE.json` + PR history + `loop-audit` scores. A daemon with a dashboard is explicitly out of scope; compose external schedulers (cron/launchd/GitHub Actions) instead.
- **One loop per orchestrator invocation.** Concurrency is the scheduler's job, not the orchestrator's. Run multiple schedulers for multiple loops.

If a stakeholder asks "can the loop guarantee the work is correct?", the honest answer is: **no, it guarantees only that the gate passed, on a branch, reviewed by a human before merge.** That is the contract. Do not promise more.

## Verification

Before claiming a loop is engineered (not just scripted):

- [ ] `/loop-check <task>` returns GO with a cited condition (or NO-GO with a reason). Refuse-list tasks return NO-GO.
- [ ] `.pi/loops/<name>/VISION.md` has Goal, Scope, Out-of-scope, DoD, Gate, Hard stops, Human-approval-required.
- [ ] `.pi/loops/<name>/STATE.json` is valid JSON with `processed`, `in_progress`, `completed`, `escalated`, `failures`, `lessons`, `metrics`, `last_run`, `stop_conditions_met`.
- [ ] The gate command from `VISION.md` runs via bash and its exit code is the ship signal (parse the `--mode json` log; assert no `push`/`pr`/`slack` tool calls).
- [ ] Failing gate → no PR + `STATE.json` records `failed`; second run on the same item → skipped (idempotent).
- [ ] `loop-audit` returns a numeric score + L0-L3 + ≥1 recommendation; L3 only if a proven committed run exists.
- [ ] Cost: `loop-cost` estimate is within the daily cap; per-run cap in the orchestrator kills an over-budget loop and records the kill in state.
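The `STATE.json` item in the checklist above lends itself to an automated check. The following is a minimal sketch under stated assumptions: `REQUIRED_STATE_KEYS` and `missingStateKeys` are hypothetical names, and the sketch checks only that the keys are present, since the text does not specify the type of each field.

```typescript
// Hypothetical validation helper; it checks key presence only, not value types.
const REQUIRED_STATE_KEYS = [
  "processed",
  "in_progress",
  "completed",
  "escalated",
  "failures",
  "lessons",
  "metrics",
  "last_run",
  "stop_conditions_met",
] as const;

// Returns the required keys that are absent. Throws if `raw` is not valid JSON.
function missingStateKeys(raw: string): string[] {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    // Valid JSON but not an object: every required key is missing.
    return [...REQUIRED_STATE_KEYS];
  }
  return REQUIRED_STATE_KEYS.filter((key) => !(key in parsed));
}

const sample = JSON.stringify({ processed: [], failures: [], lessons: [] });
console.log(missingStateKeys(sample));
// [ 'in_progress', 'completed', 'escalated', 'metrics', 'last_run', 'stop_conditions_met' ]
```

An empty result means the file passes this part of the checklist. A thrown `SyntaxError` from `JSON.parse` means the file is not valid JSON at all.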

## See Also

| Companion | Role |
|---|---|
| `/loop-check` | NO-GO qualification gate (refuse-list + 2-condition test + 30s checklist) |
| `/loop-init` | Scaffold `.pi/loops/<name>/` from templates |
| `/loop-review` | Maker/checker: verifier runs the gate, cites exit code, defaults to REJECT |
| `loop-audit` | Readiness scoring (0-100 + L0-L3, L3 gated on proven run) |
| `loop-cost` | Token-cost estimation (cadence × blend per level + daily cap + early-exit) |
| `loop-vision.md` / `loop-state.md` / `loop-state.json` | Contract + working memory + dedup ledger templates |
| `loop-orchestrator.ts` / `loop-orchestrator.sh` | Unattended runtime (worktree → restricted maker → gate → ship-on-pass → state → cleanup) |
| `loop-github-action.yml` | CI unattended scheduling (`on: schedule: cron`) |
| `loop-guard.ts` | `tool_call` defense-in-depth (never-do bash + path protection + dangerous-cmd block when `!ctx.hasUI`) |
| `behavioral-kernel` | Re-center on smallest-working-change discipline when authoring loop procedures |
| `defense-in-depth` | Layered-validation patterns the security checklist draws from |

## Common Rationalizations

| Rationalization | Reality |
|---|---|
| "The maker said tests pass" | Opinion is not evidence. Run the gate; read the exit code. |
| "It's just one flaky test, I'll fix it" | Flaky/infra/unknown → diagnosis-only. Chasing noise is the Ralph Wiggum loop. |
| "The loop can probably handle auth if I prompt it well" | Refuse-list. Judgement a loop cannot provide stays human. |
| "We can skip the budget cap, it's a cheap model" | Cost runaway is the second most common loop failure. Estimate with `loop-cost` first. |
| "The gate passing means the work is correct" | The gate means the gate passed. Semantic correctness is the test suite's job, and that gap is industry-unsolved. |

## Red Flags

- A loop that ships on LLM opinion instead of exit code (Ralph Wiggum).
- A loop with no `VISION.md`, or one that isn't reread each run (goal drift incoming).
- A maker with ship tools in its allowlist (capability-deprivation broken).
- A loop that auto-merges PRs (human approval removed).
- A loop fixing flaky/infra/unknown classes (acting without confidence).
- A loop with no budget cap, or one that exceeded cap and wasn't killed.
- An L3 readiness score with no proven committed run (capped at L2 until proven).
@@ -1,162 +0,0 @@
# =============================================================================
# loop-github-action.yml — GitHub Actions workflow for unattended pi loop runs.
#
# Implements FR11 (Scheduling — unattended): a scheduled trigger fires the
# loop-engineering orchestrator headless. The orchestrator runs the MAKER phase
# (`pi -p` with capability-deprivation, --approve/-a auto-trusts project files
# for non-interactive runs, --offline prevents phone-home), an exit-code GATE,
# and ships-on-pass (push `loop/<name>/<ts>` + `gh pr create`).
#
# Self-contained: this workflow uses raw `pi -p` via the shipped orchestrators
# (.pi/templates/loop-orchestrator.ts [T9] or loop-orchestrator.sh [T10]). It
# does NOT hard-depend on any third-party GitHub Action. An optional compose
# with the pi-coding-agent action is documented at the bottom of this file —
# use it only if you prefer the maintained action wrapper over the raw CLI.
#
# PARAMETERIZATION:
#   - loop_name (workflow_dispatch input): which loop to run
#     (the .pi/loops/<loop_name>/ directory). Defaults to `ci-triage`.
#   - cron (workflow_dispatch input): overrides the placeholder schedule
#     below for manual/ad-hoc runs. NOTE: workflow_dispatch inputs cannot
#     change the `on.schedule` cron of an already-registered workflow — the
#     schedule cron is fixed at registration time. To run a different cadence,
#     copy this file, change the `on.schedule` cron placeholder, and register
#     the new workflow. The `cron` input is documented for clarity and is
#     consumed by the run step as `${{ inputs.cron }}` when you wire it into a
#     wrapper; here it is surfaced in the step env for downstream tooling.
#
# CRON PLACEHOLDER:
#   on.schedule.cron is set to "0 3 * * *" (03:00 UTC daily). Edit this value
#   to match the loop's cadence (see .pi/loops/<name>/VISION.md cadence field).
#   GitHub Actions cron is UTC and has a best-effort (not exact) fire time.
#
# Secrets required (repo → Settings → Secrets and variables → Actions):
#   - PI_API_KEY — the pi API key (provider key). Injected as env PI_API_KEY.
#   - GH_TOKEN   — (optional) a GitHub PAT with repo + pull-requests scopes.
#                  If unset, the workflow falls back to the auto-provided
#                  `GITHUB_TOKEN` (permissions block below grants write).
#
# Requires (installed in the job): Node 24, pi CLI (global), git, gh.
# =============================================================================

name: loop-run

# -----------------------------------------------------------------------------
# Triggers
# -----------------------------------------------------------------------------
# schedule.cron is a PLACEHOLDER — edit to match the loop's cadence.
# GitHub Actions cron is UTC, best-effort (may lag minutes).
# workflow_dispatch inputs parameterize loop_name + cron for ad-hoc/manual runs.
on:
  schedule:
    - cron: "0 3 * * *"
  workflow_dispatch:
    inputs:
      loop_name:
        description: "Loop name (the .pi/loops/<loop_name>/ directory to run, e.g. ci-triage)"
        required: true
        default: "ci-triage"
        type: string
      cron:
        description: "Cadence hint (informational; schedule cron is fixed at registration — copy this file to change cadence). Example: '0 3 * * *'"
        required: false
        default: "0 3 * * *"
        type: string

# -----------------------------------------------------------------------------
# Permissions — grant write so the orchestrator can push branches + open PRs.
# (FR11 ship-on-pass: push loop/<name>/<ts> + gh pr create.)
# -----------------------------------------------------------------------------
permissions:
  contents: write
  pull-requests: write

# A single run per loop at a time — avoid overlap/collision (FR8 worktree
# isolation is per-invocation, but we still avoid duplicate scheduled runs).
concurrency:
  group: loop-run-${{ github.event.inputs.loop_name || 'ci-triage' }}
  cancel-in-progress: false

jobs:
  loop:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    env:
      # Resolve loop name: workflow_dispatch input wins; scheduled runs default.
      LOOP_NAME: ${{ github.event.inputs.loop_name || 'ci-triage' }}
      PI_API_KEY: ${{ secrets.PI_API_KEY }}
      # Prefer a dedicated GH_TOKEN secret if provided; else use GITHUB_TOKEN.
      GH_TOKEN: ${{ secrets.GH_TOKEN || github.token }}
      # Surface the cron hint for downstream tooling/logging.
      LOOP_CRON: ${{ github.event.inputs.cron || '0 3 * * *' }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0 # full history for branch + worktree ops

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: "24"
          # No cache: pi is installed globally, not from package.json.

      - name: Install pi globally
        run: npm i -g @earendil-works/pi-coding-agent

      - name: Configure gh CLI auth
        env:
          GH_TOKEN: ${{ env.GH_TOKEN }}
        run: |
          # If GH_TOKEN secret was set, use it; otherwise rely on the
          # auto-provided GITHUB_TOKEN (permissions block grants write).
          if [ -n "$GH_TOKEN" ]; then
            echo "$GH_TOKEN" | gh auth login --with-token
          else
            echo "GH_TOKEN empty — relying on auto-provided GITHUB_TOKEN"
          fi
          gh auth status || true

      - name: Run loop orchestrator (run-once)
        env:
          PI_API_KEY: ${{ env.PI_API_KEY }}
          GH_TOKEN: ${{ env.GH_TOKEN }}
        run: |
          set -uo pipefail
          echo "::group::loop run-once ${LOOP_NAME} (cadence=${LOOP_CRON})"
          # Primary: Node SDK orchestrator (T9) via tsx (plain `node` cannot
          # execute TypeScript). Fallback to the portable bash orchestrator
          # (T10) if tsx/node module resolution fails.
          # --approve/-a auto-trusts project files (required non-interactive).
          # --offline prevents phone-home.
          if command -v npx >/dev/null 2>&1; then
            echo "Running Node orchestrator (loop-orchestrator.ts) via tsx..."
            npx --yes tsx .pi/templates/loop-orchestrator.ts run-once "${LOOP_NAME}" . \
              || echo "::warning::Node orchestrator exited non-zero (FR10: recorded, not fatal)"
          else
            echo "npx not found — running bash orchestrator (loop-orchestrator.sh)..."
            bash .pi/templates/loop-orchestrator.sh run-once "${LOOP_NAME}" . \
              || echo "::warning::Bash orchestrator exited non-zero (FR10: recorded, not fatal)"
          fi
          echo "::endgroup::"

# =============================================================================
# OPTIONAL COMPOSE (do NOT hard-depend — kept as a comment only).
# -----------------------------------------------------------------------------
# If you prefer the maintained action wrapper over raw `pi -p`, you can replace
# the "Install pi globally" + "Run loop orchestrator" steps with:
#
#   - name: Run pi loop (via action)
#     uses: shaftoe/pi-coding-agent-action@v1
#     with:
#       loop-name: ${{ env.LOOP_NAME }}
#       approve: true   # -a: auto-trust project files (non-interactive)
#       offline: true   # --offline: prevent phone-home
#       api-key: ${{ secrets.PI_API_KEY }}
#       gh-token: ${{ secrets.GH_TOKEN || github.token }}
#       run-command: "node .pi/templates/loop-orchestrator.ts run-once ${{ env.LOOP_NAME }} ."
#
# This workflow stays self-contained with raw `pi -p` so it has zero external
# action dependency. Use the compose above only if you accept the external
# dependency and pin a version.
# =============================================================================