agent-step-gate 0.3.0
- package/ARCHITECTURE.md +393 -0
- package/README.md +662 -0
- package/SKILL.md +190 -0
- package/Weaver.md +140 -0
- package/dist/cli.d.ts +1 -0
- package/dist/cli.js +573 -0
- package/dist/cli.js.map +1 -0
- package/dist/core/errors.d.ts +16 -0
- package/dist/core/errors.js +32 -0
- package/dist/core/errors.js.map +1 -0
- package/dist/core/gate.d.ts +20 -0
- package/dist/core/gate.js +82 -0
- package/dist/core/gate.js.map +1 -0
- package/dist/core/keys.d.ts +18 -0
- package/dist/core/keys.js +37 -0
- package/dist/core/keys.js.map +1 -0
- package/dist/core/plan.d.ts +2 -0
- package/dist/core/plan.js +135 -0
- package/dist/core/plan.js.map +1 -0
- package/dist/core/program.d.ts +69 -0
- package/dist/core/program.js +191 -0
- package/dist/core/program.js.map +1 -0
- package/dist/core/reconcile.d.ts +37 -0
- package/dist/core/reconcile.js +198 -0
- package/dist/core/reconcile.js.map +1 -0
- package/dist/core/session.d.ts +25 -0
- package/dist/core/session.js +88 -0
- package/dist/core/session.js.map +1 -0
- package/dist/index.d.ts +1 -0
- package/dist/index.js +29 -0
- package/dist/index.js.map +1 -0
- package/dist/storage/db.d.ts +3 -0
- package/dist/storage/db.js +117 -0
- package/dist/storage/db.js.map +1 -0
- package/dist/storage/repository.d.ts +24 -0
- package/dist/storage/repository.js +449 -0
- package/dist/storage/repository.js.map +1 -0
- package/dist/tools/activeTask.d.ts +2 -0
- package/dist/tools/activeTask.js +41 -0
- package/dist/tools/activeTask.js.map +1 -0
- package/dist/tools/cancelTask.d.ts +2 -0
- package/dist/tools/cancelTask.js +39 -0
- package/dist/tools/cancelTask.js.map +1 -0
- package/dist/tools/checkpoint.d.ts +2 -0
- package/dist/tools/checkpoint.js +71 -0
- package/dist/tools/checkpoint.js.map +1 -0
- package/dist/tools/current.d.ts +2 -0
- package/dist/tools/current.js +64 -0
- package/dist/tools/current.js.map +1 -0
- package/dist/tools/finalize.d.ts +2 -0
- package/dist/tools/finalize.js +95 -0
- package/dist/tools/finalize.js.map +1 -0
- package/dist/tools/index.d.ts +6 -0
- package/dist/tools/index.js +7 -0
- package/dist/tools/index.js.map +1 -0
- package/dist/tools/startPlan.d.ts +2 -0
- package/dist/tools/startPlan.js +124 -0
- package/dist/tools/startPlan.js.map +1 -0
- package/dist/types/index.d.ts +142 -0
- package/dist/types/index.js +6 -0
- package/dist/types/index.js.map +1 -0
- package/package.json +48 -0
- package/scripts/interactive-demo.ts +394 -0
- package/scripts/mcp-call.mjs +56 -0
- package/scripts/prompt-check-hook.sh +27 -0
- package/scripts/session-start-hook.sh +47 -0
- package/scripts/stop-hook.mjs +83 -0
- package/scripts/stop-hook.sh +75 -0
package/SKILL.md
ADDED
@@ -0,0 +1,190 @@
---
name: Step Gate
description: >
  Use this skill whenever working on multi-step tasks, large refactors, cross-session
  development, or multi-agent orchestration (any situation where skipping a planned step
  would be costly). This skill enforces an external cryptographic ledger: every planned step
  must be checkpointed with a valid key before the task can be finalized. Triggers on
  phrases like "multi-step plan", "refactor in phases", "orchestrate agents", "long task",
  "don't skip steps", "checkpoint my work", "gate my steps", or any mention of Step Gate.
---

# Step Gate: External Execution Ledger

An external cryptographic gate for agent task execution. It does not control *how* you
work; it only verifies *that you did* what you planned. Think of it as a
proof-of-work chain for agent steps: every completed step produces a cryptographic key,
and you cannot finalize a task without the final chain key.

## Why this exists

Long-context agents lose track of plans. A 15-step refactor becomes 12 steps in the
agent's memory by the time it reaches step 9. Context compression drops the original
plan. A Sub Agent claims "all done" when it skipped step 7.

The Gate solves this by moving the plan ledger **outside** the agent's context. The plan
lives in SQLite. Each step is locked behind a 6-character key. Each key appears only once,
in the `start-plan` or `checkpoint` response that unlocks its step; an agent that loses
the key cannot fake the step.

## Core rule

**One interaction = One Task.** At the start of each interaction, create a Task with the
steps you plan to do. Before the interaction ends, checkpoint every step and finalize
the Task. The Stop Hook will block exit if a Task is left unfinalized.

```
Interaction start → start-plan → checkpoint × N → finalize(taskKey) → done
```

Node and Program layers are optional; most work only needs the Task level.

## CLI reference

All commands return JSON. Invoke the CLI as `node dist/cli.js` from the project root.

### Task commands

**start-plan**: Create a task for this interaction
```bash
node dist/cli.js start-plan '{"title":"What this task does","steps":[...]}'
```
Each step takes `id` (optional), `title` (required), `dependsOn` (a string array, or omit
for auto-serial), and `children` (nested container). The first call auto-creates session
files. Returns `taskId`, `currentSteps`, and `stepKeys`.
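
The step fields listed above can be written down as a type. The following is a minimal sketch, assuming the payload shape is exactly what the field list describes; the names `StepInput`, `PlanInput` and `validatePlan`, and the duplicate-id check, are illustrative and not taken from the package.

```typescript
// Hypothetical input shape for start-plan, inferred from the field list above.
interface StepInput {
  id?: string;            // optional
  title: string;          // required
  dependsOn?: string[];   // [] = parallel entry; omitted = auto-serial
  children?: StepInput[]; // nested container
}

interface PlanInput {
  title: string;
  steps: StepInput[];
}

// Collects every problem instead of throwing on the first one.
// The rules checked here are illustrative assumptions, not the CLI's own.
function validatePlan(plan: PlanInput): string[] {
  const problems: string[] = [];
  const seenIds = new Set<string>();

  const visit = (step: StepInput, path: string): void => {
    // The typeof guard covers raw JSON that omitted the field.
    if (typeof step.title !== "string" || step.title.trim() === "") {
      problems.push(`${path}: title is required`);
    }
    if (step.id !== undefined) {
      if (seenIds.has(step.id)) {
        problems.push(`${path}: duplicate id "${step.id}"`);
      }
      seenIds.add(step.id);
    }
    (step.children ?? []).forEach((child, i) =>
      visit(child, `${path}.children[${i}]`),
    );
  };

  if (plan.title.trim() === "") problems.push("title is required");
  plan.steps.forEach((step, i) => visit(step, `steps[${i}]`));
  return problems;
}
```

Running a check like this before calling `start-plan` would catch malformed plans locally; whether the CLI performs the same checks is not stated in this document.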

**checkpoint**: Complete a step and unlock its dependents
```bash
node dist/cli.js checkpoint '{"taskId":"tsk_XXX","stepId":"tsk_XXX_yy","stepKey":"KEY"}'
```
The key is consumed on use and cannot be reused. Returns `nextSteps` and `nextStepKeys`
for newly unlocked steps. When all steps are done, returns `allStepsCompleted: true` and
a `taskKey`.
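
The checkpoint behaviour described above can be sketched in memory. This is a simplified model, not the package's implementation: the `Step` shape, the `"locked"` status name and the injected `newKey` generator are assumptions made for illustration.

```typescript
type StepStatus = "locked" | "current" | "completed";

interface Step {
  id: string;
  dependsOn: string[];
  status: StepStatus;
  key?: string; // present only while the step is "current"
}

type CheckpointResult =
  | { ok: true; nextSteps: string[]; nextStepKeys: Record<string, string> }
  | { ok: true; allStepsCompleted: true; taskKey: string }
  | { ok: false; error: string };

// newKey is injected so the sketch stays deterministic; a real gate would
// presumably draw keys from a cryptographic random source.
function checkpoint(
  steps: Step[],
  stepId: string,
  stepKey: string,
  newKey: () => string,
): CheckpointResult {
  const step = steps.find((s) => s.id === stepId);
  if (!step || step.status !== "current" || step.key !== stepKey) {
    return { ok: false, error: "invalid step or key" };
  }
  step.status = "completed";
  delete step.key; // the key is consumed and cannot be replayed

  const isCompleted = (id: string): boolean =>
    steps.some((s) => s.id === id && s.status === "completed");

  // Unlock every locked step whose dependencies are now all completed.
  const nextSteps: string[] = [];
  const nextStepKeys: Record<string, string> = {};
  for (const candidate of steps) {
    if (candidate.status === "locked" && candidate.dependsOn.every(isCompleted)) {
      const key = newKey();
      candidate.status = "current";
      candidate.key = key;
      nextSteps.push(candidate.id);
      nextStepKeys[candidate.id] = key;
    }
  }

  if (steps.every((s) => s.status === "completed")) {
    return { ok: true, allStepsCompleted: true, taskKey: newKey() };
  }
  return { ok: true, nextSteps, nextStepKeys };
}
```

Replaying a consumed key fails because the step is no longer `current`, which mirrors the single-use rule stated above.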

**current**: Read current progress (does NOT return keys)
```bash
node dist/cli.js current '{"taskId":"tsk_XXX"}'
```

**finalize**: Complete the task and auto-propagate upward
```bash
node dist/cli.js finalize '{"taskId":"tsk_XXX","taskKey":"KEY"}'
```
Verifies the taskKey, marks the task completed, then automatically checks whether the
parent Node (if any) and Program are also complete. Returns a `level` field: `task`,
`node`, or `program`.
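
The upward propagation can be sketched as follows. This is an in-memory model under stated assumptions: a Node counts as complete when every Task recorded for it is complete, and a Program when every Node is. The record shapes and the name `finalizeTask` are hypothetical.

```typescript
type Level = "task" | "node" | "program";

interface TaskRecord {
  id: string;
  nodeId?: string; // undefined in pure Task mode
  taskKey: string;
  completed: boolean;
}

interface NodeRecord {
  id: string;
  completed: boolean;
}

interface Ledger {
  tasks: TaskRecord[];
  nodes: NodeRecord[];
}

type FinalizeOutcome = { ok: true; level: Level } | { ok: false };

function finalizeTask(ledger: Ledger, taskId: string, taskKey: string): FinalizeOutcome {
  const task = ledger.tasks.find((t) => t.id === taskId);
  if (!task || task.completed || task.taskKey !== taskKey) return { ok: false };
  task.completed = true;

  // Pure Task mode: nothing to propagate to.
  const node = ledger.nodes.find((n) => n.id === task.nodeId);
  if (!node) return { ok: true, level: "task" };

  // Assumption: the Node is done once all Tasks recorded for it are done.
  const nodeDone = ledger.tasks
    .filter((t) => t.nodeId === node.id)
    .every((t) => t.completed);
  if (!nodeDone) return { ok: true, level: "task" };
  node.completed = true;

  const programDone = ledger.nodes.every((n) => n.completed);
  return { ok: true, level: programDone ? "program" : "node" };
}
```

The returned `level` tells the caller how far completion travelled, which is why no separate Node or Program finalize call appears in the command list.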

**cancel-task**: Cancel the current session's task
```bash
node dist/cli.js cancel-task '{"taskId":"tsk_XXX"}'
```
Session-gated: you can only cancel your own tasks. Cross-session cancel requires
`--admin --recovery-token <token>`.

**active-task**: List active tasks
```bash
node dist/cli.js active-task          # current session only
node dist/cli.js active-task --all    # all sessions
```

### Program commands (cross-session projects)

```bash
node dist/cli.js program init '{"title":"Big project","nodes":[...]}'
node dist/cli.js program start '{"programId":"pgm_XXX","nodeId":"phase-1"}'
node dist/cli.js program status '{"programId":"pgm_XXX"}'
```

Program finalization is automatic: when the last Task in the last Node is finalized,
the system propagates completion all the way up. No manual `program finalize` is needed.

**program rebuild**: Rebuild node/program after plan changes (dry-run first, then `--confirm`)
```bash
node dist/cli.js program rebuild '{"programId":"pgm_XXX"}'            # dry-run
node dist/cli.js program rebuild '{"programId":"pgm_XXX"}' --confirm
```

Always show the user the dry-run output and get confirmation before running `--confirm`.

### Diagnostics

```bash
node dist/cli.js gate reconcile                              # full read-only health check
node dist/cli.js gate reconcile '{"programId":"pgm_XXX"}'    # scoped to one program
```

## DAG rules

**Example: parallel branches and a merge point**
```bash
node dist/cli.js start-plan '{
  "title":"Backend refactor",
  "steps":[
    {"id":"auth","title":"Auth module","dependsOn":[]},
    {"id":"api","title":"API layer","dependsOn":[]},
    {"id":"db","title":"DB migration","dependsOn":["auth"]},
    {"id":"test","title":"Integration tests","dependsOn":["api","db"]}
  ]
}'
# auth + api activate immediately; db waits for auth; test waits for api + db
```

| dependsOn | Behavior |
|-----------|----------|
| `[]` (explicit empty) | Parallel entry: activated immediately |
| omitted / undefined | Auto-serial: depends on previous leaf |
| `["a", "b"]` | Merge point: unlocks after both a and b complete |
| Container with children | Children inherit the container's dependsOn |
| `skipKey` + `skipTaskId` | Skip a previously completed step (one-time use) |
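
The first three rows of the table can be sketched as a small resolver. This is a simplified illustration for a flat step list only: containers with `children` and skip keys are left out, and the function names are hypothetical.

```typescript
interface FlatStep {
  id: string;
  dependsOn?: string[];
}

// Turns the three dependsOn forms into explicit dependency lists.
function resolveDependencies(steps: FlatStep[]): Record<string, string[]> {
  const resolved: Record<string, string[]> = {};
  let previous: string | undefined;
  for (const step of steps) {
    if (step.dependsOn !== undefined) {
      // Explicit list: [] means parallel entry, ["a","b"] means merge point.
      resolved[step.id] = step.dependsOn.slice();
    } else {
      // Omitted: auto-serial, so depend on the step declared just before.
      resolved[step.id] = previous === undefined ? [] : [previous];
    }
    previous = step.id;
  }
  return resolved;
}

// Steps with no dependencies are the ones activated immediately.
function initiallyActive(resolved: Record<string, string[]>): string[] {
  return Object.keys(resolved).filter((id) => resolved[id].length === 0);
}
```

Applied to the "Backend refactor" example, `initiallyActive` yields `auth` and `api`, matching the comment in the example.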

Cycle detection runs at plan creation time: circular dependencies are rejected before
any step starts.
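
One common way to detect such cycles is to repeatedly peel off steps whose dependencies are all satisfied, the idea behind Kahn's topological sort. The sketch below uses that approach; it is not necessarily the algorithm the package uses, and `blockedSteps` is a hypothetical name.

```typescript
// Returns the ids that can never unlock: steps on a cycle, steps that wait
// on a cycle, and steps that depend on an unknown id. Empty means the plan
// is a valid DAG.
function blockedSteps(deps: Record<string, string[]>): string[] {
  const done: Record<string, boolean> = {};
  let pending = Object.keys(deps);
  let progressed = true;

  while (progressed) {
    progressed = false;
    pending = pending.filter((id) => {
      const ready = deps[id].every((dep) => done[dep] === true);
      if (ready) {
        done[id] = true;
        progressed = true;
      }
      return !ready; // keep only the steps that are still waiting
    });
  }
  return pending;
}
```

A plan creator could reject the input whenever the returned list is non-empty and report those ids to the caller.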

## Interruption recovery

When a session is interrupted, completed steps remain as permanent cryptographic proofs,
so a new plan can skip past them:

```bash
# Rebuild with skipKey to jump past already-completed steps
node dist/cli.js start-plan '{
  "title":"Resume wave 2",
  "steps":[
    {"id":"auth","title":"Auth module","dependsOn":[],"skipKey":"OLD_KEY","skipTaskId":"tsk_OLD"},
    {"id":"ci","title":"CI tests","dependsOn":["auth"]}
  ]
}'
```

A skipKey can only be consumed once: the system writes a `skip_key_consumed` event on
first use and rejects subsequent attempts. Skipped steps are marked `skipped` (not
`completed`) to preserve traceability.
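
The one-time rule can be sketched with an append-only event list. This is a hedged illustration: it assumes a skipKey is the key recorded for the completed step in the old task, and every record shape here except the `skip_key_consumed` event name is hypothetical.

```typescript
interface CompletedStep {
  taskId: string;
  stepId: string;
  key: string;
}

interface GateEvent {
  type: "skip_key_consumed";
  taskId: string;
  stepId: string;
  skipKey: string;
}

interface SkipRequest {
  skipTaskId: string;
  stepId: string;
  skipKey: string;
}

function consumeSkipKey(
  completed: CompletedStep[],
  events: GateEvent[],
  req: SkipRequest,
): "skipped" | "rejected" {
  // Assumption: the key must match a step that really completed in the old task.
  const proven = completed.some(
    (c) => c.taskId === req.skipTaskId && c.stepId === req.stepId && c.key === req.skipKey,
  );
  if (!proven) return "rejected";

  // A recorded consumption event makes any second attempt fail.
  const alreadyUsed = events.some(
    (e) => e.taskId === req.skipTaskId && e.skipKey === req.skipKey,
  );
  if (alreadyUsed) return "rejected";

  events.push({
    type: "skip_key_consumed",
    taskId: req.skipTaskId,
    stepId: req.stepId,
    skipKey: req.skipKey,
  });
  return "skipped"; // recorded as skipped, not completed, for traceability
}
```

Because the check reads the event log rather than mutating the old step, the original completion record stays intact.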

## Key rules

1. Keys appear exactly once, in the checkpoint or start-plan response. If lost, they
   cannot be recovered. The `current` command never returns keys.
2. Step double-consumption is impossible: the DB transaction uses `WHERE status='current'`
   with an affected-rows guard.
3. Cancel-task is session-gated: agents cannot cancel tasks they don't own.
4. SkipKey is one-time: the `events` table records every consumption.
5. Cycle detection runs at plan creation: dead DAGs are rejected before execution.
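
The affected-rows guard in rule 2 can be illustrated without a database. The sketch below mimics a conditional `UPDATE` over an in-memory array; the row shape, the status names other than `current`, and the function names are assumptions, and the real code runs against SQLite.

```typescript
interface StepRow {
  id: string;
  status: "locked" | "current" | "completed";
}

// Stands in for a statement such as
//   UPDATE steps SET status='completed' WHERE id=? AND status='current'
// and returns the number of rows it changed.
function updateCurrentToCompleted(table: StepRow[], stepId: string): number {
  let affected = 0;
  for (const row of table) {
    if (row.id === stepId && row.status === "current") {
      row.status = "completed";
      affected += 1;
    }
  }
  return affected;
}

// The guard: the step counts as consumed only if exactly one row changed.
function completeStepOnce(table: StepRow[], stepId: string): boolean {
  return updateCurrentToCompleted(table, stepId) === 1;
}
```

A second call for the same step changes zero rows, so the guard turns a race or a replay into a clean rejection rather than a double completion.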

The Gate is a proof-of-completion system, not a security product. It protects against
agent hallucination, context loss, and accidental step-skipping. It does not protect
against deliberate external attack.

## Session files

The first `start-plan` call creates:
- `.step-gate/sessions/ses_XXXXXX.json`: session credentials
- `.step-gate/bindings/bind_cli_XXXXXX.json`: hook binding

The CLI auto-discovers the session from binding files. No manual session management is needed.

## Further reading

- `Weaver.md`: Multi-agent orchestration, covering how a Main Agent spawns Sub Agents,
  injects taskId + stepKey, and verifies returned taskKeys. Read this before orchestrating
  parallel Sub Agents.
- `ARCHITECTURE.md`: Full architecture, including the 4-layer model, 7 DB tables,
  5 credential types, 12 CLI commands, and 20+ core functions
- `docs/security-stress-test-report.md`: Security audit with 9 issues, all resolved
package/Weaver.md
ADDED
@@ -0,0 +1,140 @@
# Weaver: Step Gate Orchestration Engine

## Three-layer roles

```
Main Agent (orchestrator)    ← holds the global Node/Program view
    │                          does only three things: dispatch, verify, advance
    │                          writes no code, executes no Steps
    │
    ├── Sub Agent A          ← knows only its own taskId + taskGoal
    ├── Sub Agent B          ← unaware of other Tasks or the DAG
    └── Sub Agent C          ← unaware of the global Node/Program state
```

A Sub Agent's context is injected precisely by the Main Agent at spawn time. It cannot see the global plan, does not know the preceding or following Tasks, and holds no verification logic.

## Full execution flow

```
═══════════════════════════════════════════════════════
Phase 0: Planning
═══════════════════════════════════════════════════════
Main Agent:
  program init     → split the work into Nodes
  reconcile        → routine diagnostics

═══════════════════════════════════════════════════════
Phase 1: Start a Node
═══════════════════════════════════════════════════════
Main Agent:
  program start <node-id>             ← binds the session to the node
  start-plan → creates a Task (DAG)   ← one interaction = one Task
    → receives taskId + stepKeys

═══════════════════════════════════════════════════════
Phase 2: Dispatch
═══════════════════════════════════════════════════════
Main Agent → Sub Agent:
  {
    "taskId": "tsk_XXX",
    "taskGoal": "Extract the auth middleware",
    "constraints": ["Handle only this Task's scope", "Call checkpoint when done"]
  }

Sub Agent starts in the same working directory:
  → ensureSession() auto-discovers the session from .step-gate/bindings/
  → no need to pass sessionId manually

═══════════════════════════════════════════════════════
Phase 3: Sub Agent execution loop
═══════════════════════════════════════════════════════
Sub Agent:
  current(taskId)
    → { currentSteps }     ← no keys here (SKILL.md: current never returns keys);
                             stepKeys come from the start-plan / checkpoint responses

  for each step:
    execute the step
    checkpoint(taskId, stepId, stepKey)
      → { nextSteps, nextStepKeys }
      → or { allStepsCompleted: true, taskKey }

═══════════════════════════════════════════════════════
Phase 4: Hand back credentials
═══════════════════════════════════════════════════════
Sub Agent → Main Agent:
  {
    "taskId": "tsk_XXX",
    "taskKey": "A1B2C3",
    "summary": "Finished extracting the auth middleware",
    "artifacts": ["src/middleware/auth.ts"]
  }

═══════════════════════════════════════════════════════
Phase 5: Main Agent verification + automatic advancement
═══════════════════════════════════════════════════════
Main Agent:
  finalize(taskId, taskKey)

  ✅ Pass:
    → returns { ok: true, level, ... }
    → level="task":    the Node still has unfinished Tasks; keep dispatching
    → level="node":    Node complete! nodeKey is returned; advance automatically
    → level="program": all Nodes complete! Done
    → Sub Agent is released

  ❌ Fail:
    → returns { actualStatus, completedSteps, missingSteps,
                currentStepId, stepKey }
    → Main Agent sends the real ledger back to the Sub Agent:
      "Your TaskKey did not pass Gate verification.
       Completed: step_001, step_002
       Missing: step_003, step_004
       Continue with step_003, StepKey: SK_REAL33"
    → Sub Agent resumes checkpointing from currentStepId
    → once fixed, finalize again

═══════════════════════════════════════════════════════
Phase 6: Next Node (automatic)
═══════════════════════════════════════════════════════
When finalize returns level="node", the Main Agent:
  program status → find the next ready node
  program start <next-node>
  → create a new Task → dispatch → loop

═══════════════════════════════════════════════════════
Wrap-up
═══════════════════════════════════════════════════════
Last Node completes:
  finalize → level="program" → Program completed
  Done
```
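
The Phase 5 branching in the flow above can be sketched as a pure decision function. This is a minimal illustration, assuming the failure payload also carries `ok: false` (the flow lists its other fields but not that one); the `Action` type and `nextAction` are hypothetical names for what a Main Agent might do next.

```typescript
type FinalizeResult =
  | { ok: true; level: "task" | "node" | "program" }
  | {
      ok: false; // assumption: failures are flagged this way
      actualStatus: string;
      completedSteps: string[];
      missingSteps: string[];
      currentStepId: string;
      stepKey: string;
    };

type Action =
  | { kind: "dispatch-next-task" }
  | { kind: "start-next-node" }
  | { kind: "done" }
  | { kind: "return-to-sub-agent"; message: string };

function nextAction(result: FinalizeResult): Action {
  if (!result.ok) {
    // Send the real ledger back so the Sub Agent can resume where it stopped.
    const message = [
      "Your taskKey did not pass Gate verification.",
      `Completed: ${result.completedSteps.join(", ")}`,
      `Missing: ${result.missingSteps.join(", ")}`,
      `Continue with ${result.currentStepId}, stepKey: ${result.stepKey}`,
    ].join("\n");
    return { kind: "return-to-sub-agent", message };
  }
  switch (result.level) {
    case "task":
      return { kind: "dispatch-next-task" }; // Node still has open Tasks
    case "node":
      return { kind: "start-next-node" };    // program status, then program start
    case "program":
      return { kind: "done" };
  }
}
```

Keeping the decision separate from the CLI calls makes the orchestration loop easy to test without spawning any Sub Agents.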

## Key design points

**The Main Agent calls a single command to verify and advance**: `finalize(taskKey)`. The system propagates the rest automatically from Task → Node → Program.

**TaskKey verification is consumption**: finalize consumes the taskKey and advances the DAG; there is no "verified but not advanced" state.

**What a Sub Agent does not need to know**:
- the structural meaning of taskId
- the full DAG
- what the preceding and following Tasks are
- the global Node/Program state
- the verification logic (the system verifies on its own)

**Interruption recovery**: rebuild with taskId + skipKey; old step credentials are retained permanently.

**Pure Task mode**: without Program/Node, all you need is `start-plan → checkpoint → finalize`. Use one Task per interaction; the Stop Hook checks automatically when the interaction ends.

## Progressive disclosure

```
SKILL.md (execution protocol)          ← required reading for all Agents; basic CLI commands
  └─ Weaver.md (orchestration engine)  ← for the Main Agent; how to orchestrate Sub Agents
       └─ CLI (state machine)          ← underlying implementation
            └─ SQLite (persistence)
```

A Sub Agent needs only the CLI commands in SKILL.md, not Weaver.md.
The Main Agent needs SKILL.md + Weaver.md.
package/dist/cli.d.ts
ADDED
@@ -0,0 +1 @@
export {};