@a5c-ai/babysitter-codex 0.1.6-staging.2dca8387

Files changed (51)
  1. package/.codex/AGENTS.md +53 -0
  2. package/.codex/command-catalog.json +130 -0
  3. package/.codex/config.toml +24 -0
  4. package/.codex/hooks/babysitter-session-start.sh +15 -0
  5. package/.codex/hooks/babysitter-stop-hook.sh +15 -0
  6. package/.codex/hooks/user-prompt-submit.sh +15 -0
  7. package/.codex/hooks.json +37 -0
  8. package/.codex/plugin.json +132 -0
  9. package/.codex/skills/babysitter/assimilate/SKILL.md +58 -0
  10. package/.codex/skills/babysitter/call/SKILL.md +590 -0
  11. package/.codex/skills/babysitter/doctor/SKILL.md +89 -0
  12. package/.codex/skills/babysitter/forever/SKILL.md +45 -0
  13. package/.codex/skills/babysitter/help/SKILL.md +49 -0
  14. package/.codex/skills/babysitter/issue/SKILL.md +36 -0
  15. package/.codex/skills/babysitter/model/SKILL.md +31 -0
  16. package/.codex/skills/babysitter/observe/SKILL.md +38 -0
  17. package/.codex/skills/babysitter/plan/SKILL.md +44 -0
  18. package/.codex/skills/babysitter/project-install/SKILL.md +65 -0
  19. package/.codex/skills/babysitter/resume/SKILL.md +30 -0
  20. package/.codex/skills/babysitter/retrospect/SKILL.md +43 -0
  21. package/.codex/skills/babysitter/team-install/SKILL.md +31 -0
  22. package/.codex/skills/babysitter/user-install/SKILL.md +53 -0
  23. package/.codex/skills/babysitter/yolo/SKILL.md +48 -0
  24. package/AGENTS.md +91 -0
  25. package/CHANGELOG.md +162 -0
  26. package/README.md +146 -0
  27. package/SKILL.md +89 -0
  28. package/agents/openai.yaml +4 -0
  29. package/babysitter.lock.json +18 -0
  30. package/bin/postinstall.js +225 -0
  31. package/bin/uninstall.js +37 -0
  32. package/commands/README.md +23 -0
  33. package/commands/assimilate.md +27 -0
  34. package/commands/call.md +30 -0
  35. package/commands/doctor.md +27 -0
  36. package/commands/forever.md +27 -0
  37. package/commands/help.md +28 -0
  38. package/commands/issue.md +27 -0
  39. package/commands/model.md +27 -0
  40. package/commands/observe.md +27 -0
  41. package/commands/plan.md +27 -0
  42. package/commands/project-install.md +31 -0
  43. package/commands/resume.md +29 -0
  44. package/commands/retrospect.md +27 -0
  45. package/commands/team-install.md +29 -0
  46. package/commands/user-install.md +27 -0
  47. package/commands/yolo.md +28 -0
  48. package/package.json +50 -0
  49. package/scripts/team-install.js +257 -0
  50. package/test/integration.test.js +69 -0
  51. package/test/packaged-install.test.js +191 -0
@@ -0,0 +1,590 @@
1
+ ---
2
+ name: babysitter:call
3
+ description: Orchestrate via @babysitter. Use this skill when asked to babysit a run, orchestrate a process or whenever it is called explicitly. (babysit, babysitter, orchestrate, orchestrate a run, workflow, etc.)
4
+ argument-hint: Specific instructions for the run
5
+ version: 0.1.1
6
+ ---
7
+
8
+ # babysitter:call
9
+
10
+ Orchestrate `.a5c/runs/<runId>/` through iterative execution. Use the SDK CLI to drive the orchestration loop.
11
+
12
+ Start a babysitter run with the Codex hook model. The workspace must have
13
+ `.codex/hooks.json` wired for `SessionStart`, `UserPromptSubmit`, and `Stop`.
14
+
15
+ ## Dependencies
16
+
17
+ ### Babysitter SDK and CLI
18
+
19
+ Read the SDK version from the plugin manifest to ensure version compatibility:
20
+
21
+ ```bash
22
+ PLUGIN_ROOT="${CODEX_PLUGIN_ROOT:-$(pwd)/.codex}"
23
+ SDK_VERSION=$(node -e "try{console.log(JSON.parse(require('fs').readFileSync('${PLUGIN_ROOT}/plugin.json','utf8')).sdkVersion||'latest')}catch{console.log('latest')}")
24
+ sudo npm i -g @a5c-ai/babysitter-sdk@$SDK_VERSION
25
+ # Whether sudo is needed depends on the environment and system
26
+ ```
27
+
28
+ Then use the CLI alias: `CLI="babysitter"`
29
+
30
+ **Alternatively, use the CLI alias:** `CLI="npx -y @a5c-ai/babysitter-sdk@$SDK_VERSION"`
31
+
32
+ ### jq
33
+
34
+ Make sure `jq` is installed and available on the `PATH`. If it is not, install it.
35
+
36
+ ---
37
+
38
+ ## Core Iteration Workflow
39
+
40
+ The babysitter workflow has 4 steps:
41
+
42
+ 1. **Run iteration** - Execute one orchestration step
43
+ 2. **Get effects** - Check what tasks are requested
44
+ 3. **Perform effects** - Execute the requested tasks
45
+ 4. **Post results** - Tasks auto-record results to journal
46
+
47
+ ### 1. Create or find the process for the run
48
+
49
+ #### Interview phase
50
+
51
+ ##### Interactive mode (default)
52
+
53
+ Interview the user for the intent, requirements, goal, scope, etc. through conversational interaction (before setting the in-session loop).
54
+
55
+ This is a multi-step phase for understanding the user's intent and deciding how to approach process building. It draws on research in the target repo, brief online research if needed, any additional instructions, and the library (processes, specializations, skills, subagents, methodologies, references, etc.), which also serves as a guide for methodology building. Ask for clarifications regarding the intent, requirements, goal, scope, etc. The library is at `[skill-root]/process/specializations/**/**/**`, `[skill-root]/process/methodologies/`, and `[skill-root]/process/contrib/[contributer-username]/`.
56
+
57
+ The first step should be to look at the state of the repo, then find the most relevant processes, specializations, skills, subagents, methodologies, references, etc. to use as a reference. Use the babysitter CLI discover command to find the relevant processes, skills, subagents, etc. at various stages.
58
+
59
+ This phase can then include online research, repo research, user questions, and other steps, one after the other, until the intent, requirements, goal, scope, etc. are clear and the user is satisfied with the understanding. After each step, decide what type of step to take next. Do not plan more than one step ahead in this phase. The same step type can be used more than once.
60
+
61
+ ##### Non-interactive mode (running non-interactively)
62
+
63
+ When running non-interactively, skip the interview phase entirely. Instead:
64
+ 1. Parse the initial prompt to extract intent, scope, and requirements.
65
+ 2. Research the repo structure to understand the codebase.
66
+ 3. Search the process library for the most relevant specialization/methodology.
67
+ 4. Proceed directly to the process creation phase using the extracted requirements.
68
+
69
+ #### User Profile Integration
70
+
71
+ Before building the process, check for an existing user profile to personalize the orchestration:
72
+
73
+ 1. **Read user profile**: Run `babysitter profile:read --user --json` to load the user profile from `~/.a5c/user-profile.json`. **Always use the CLI for profile operations — never import or call SDK profile functions directly.**
74
+
75
+ 2. **Pre-fill context**: Use the profile to understand the user's specialties, expertise levels, preferences, and communication style. This informs how you conduct the interview (skip questions the profile already answers) and how you build the process.
76
+
77
+ 3. **Breakpoint density**: Use the `breakpointTolerance` field to calibrate breakpoint placement in the generated process:
78
+ - `minimal`/`low` (expert users): Fewer breakpoints — only at critical decision points (architecture choices, deployment, destructive operations)
79
+ - `moderate` (intermediate users): Standard breakpoints at phase boundaries
80
+ - `high`/`maximum` (novice users): More breakpoints — add review gates after each implementation step, before each integration, and at every quality gate
81
+ - Always respect `alwaysBreakOn` for operations that must always pause (e.g., destructive-git, deploy)
82
+ - If `skipBreakpointsForKnownPatterns` is true, reduce breakpoints for operations the user has previously approved
83
+
84
+ 4. **Tool preferences**: Use `toolPreferences` and `installedSkills`/`installedAgents` to prioritize which agents and skills to use in the process. Prefer tools the user is familiar with.
85
+
86
+ 5. **Communication style**: Adapt process descriptions and breakpoint questions to match the user's `communicationStyle` preferences (tone, explanationDepth, preferredResponseFormat).
87
+
88
+ 6. **If no profile exists**: Proceed normally with the interview phase. Consider suggesting the user run `/user-install` first to create a profile for better personalization.
89
+
90
+ 7. **CLI profile commands (mandatory)**: **All profile operations MUST use the babysitter CLI — never import SDK profile functions directly.** This applies to the babysit skill itself, all generated processes, and all agent task instructions:
91
+ - `babysitter profile:read --user --json` — Read user profile as JSON
92
+ - `babysitter profile:read --project --json` — Read project profile as JSON
93
+ - `babysitter profile:write --user --input <file> --json` — Write user profile from file
94
+ - `babysitter profile:write --project --input <file> --json` — Write project profile from file
95
+ - `babysitter profile:merge --user --input <file> --json` — Merge partial updates into user profile
96
+ - `babysitter profile:merge --project --input <file> --json` — Merge partial updates into project profile
97
+ - `babysitter profile:render --user` — Render user profile as readable markdown
98
+ - `babysitter profile:render --project` — Render project profile as readable markdown
99
+
100
+ Use `--dir <dir>` to override the default profile directory when needed.
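
The breakpoint-density rules above can be sketched as a small lookup. This is a minimal illustration, not SDK code: `breakpointPlan`, `mustBreak`, and the density labels (`critical-only`, `phase-boundaries`, `every-step`) are hypothetical names, and falling back to `moderate` when no profile exists is an assumption. Only the profile field names come from the text above.

```javascript
// Hypothetical helper, not part of the babysitter SDK.
// Maps the profile's breakpointTolerance to an illustrative density label.
const DENSITY_BY_TOLERANCE = new Map([
  ['minimal', 'critical-only'],
  ['low', 'critical-only'],
  ['moderate', 'phase-boundaries'],
  ['high', 'every-step'],
  ['maximum', 'every-step'],
]);

function breakpointPlan(profile) {
  // Assumption: with no profile, use the standard (moderate) density.
  const tolerance = profile?.breakpointTolerance ?? 'moderate';
  const density = DENSITY_BY_TOLERANCE.get(tolerance);
  if (!density) {
    throw new Error(`Unknown breakpointTolerance: ${tolerance}`);
  }
  return {
    density,
    // Operations listed here always pause, whatever the density is.
    alwaysBreakOn: [...(profile?.alwaysBreakOn ?? [])],
    reduceForKnownPatterns: profile?.skipBreakpointsForKnownPatterns === true,
  };
}

function mustBreak(plan, operation) {
  return plan.alwaysBreakOn.includes(operation);
}
```

A process generator could call `breakpointPlan(profile)` once after `profile:read` and consult `mustBreak(plan, 'deploy')` before deciding whether a gate may be dropped.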
101
+
102
+ #### Process creation phase
103
+
104
+ After the interview phase, create the complete custom process files (JS and JSON) for the run according to the Process Creation Guidelines and methodologies section. Also install the babysitter-sdk inside `.a5c` if it is not already installed (add it to `.a5c/package.json`, and make sure to use the latest version). **IMPORTANT**: When installing into `.a5c/`, use `npm i --prefix .a5c @a5c-ai/babysitter-sdk@latest` or a subshell `(cd .a5c && npm i @a5c-ai/babysitter-sdk@latest)` to avoid leaving CWD inside `.a5c/`, which causes doubled path resolution bugs.
105
+ You must abide by the syntax and structure of the process files from the process library.
106
+
107
+ **IMPORTANT — Path resolution**: Always use **absolute paths** for `--entry` when calling `run:create`, and always run the CLI from the **project root** directory (not from `.a5c/`). Using relative paths while CWD is inside `.a5c/` causes doubled paths like `.a5c/.a5c/runs/` or `.a5c/.a5c/processes/`.
108
+
109
+ **User profile awareness**: If a user profile was loaded in the User Profile Integration step, use it to inform process design — adjust breakpoint density per the user's tolerance level, select agents/skills the user prefers, and match the process complexity to the user's expertise.
110
+
111
+ **IMPORTANT — Profile I/O in processes**: When generating process files, all profile read/write/merge operations MUST use the babysitter CLI commands (`babysitter profile:read`, `profile:write`, `profile:merge`, `profile:render`). Never instruct agents to import or call SDK profile functions (`readUserProfile`, `writeUserProfile`, etc.) directly. The CLI handles atomic writes, directory creation, and markdown generation automatically.
112
+
113
+ After the process is created and before creating the run:
114
+ - **Interactive mode**: Describe the process at a high level (not the code or implementation details) to the user and ask for confirmation to use it. Also generate it as `[process-name].diagram.md` and `[process-name].process.md` files. If the user is not satisfied, go back to the process creation phase and modify the process according to the user's feedback until the user is satisfied.
115
+ - **Non-interactive mode**: proceed directly to creating the run without user confirmation.
116
+
117
+ ### 2. Create run and bind session (single command):
118
+
119
+ **For new runs:**
120
+
121
+ ```bash
122
+ # Detect session ID from Codex environment variables
123
+ SESSION_ID="${CODEX_THREAD_ID:-${CODEX_SESSION_ID:-}}"
124
+
125
+ $CLI run:create \
126
+ --process-id <id> \
127
+ --entry <path>#<export> \
128
+ --inputs <file> \
129
+ --prompt "$PROMPT" \
130
+ --harness codex \
131
+ --state-dir .a5c \
132
+ --json
133
+ ```
134
+
135
+ If a Codex session or thread ID is available (`CODEX_THREAD_ID` or `CODEX_SESSION_ID`), add `--session-id "$SESSION_ID"` to bind the session at creation time. If no stable session/thread ID is available, omit `--session-id` rather than fabricating one.
136
+
137
+ **Required flags:**
138
+ - `--process-id <id>` — unique identifier for the process definition
139
+ - `--entry <path>#<export>` — path to the process JS file and its named export (e.g., `./my-process.js#process`)
140
+ - `--prompt "$PROMPT"` — the user's initial prompt/request text
141
+ - `--harness codex` — activates Codex session binding. The session ID is detected from `CODEX_THREAD_ID` or `CODEX_SESSION_ID` environment variables.
142
+
143
+ **Optional flags:**
144
+ - `--inputs <file>` — path to a JSON file with process inputs
145
+ - `--session-id <id>` — explicit session/thread ID (auto-detected from env vars if omitted)
146
+ - `--run-id <id>` — override auto-generated run ID
147
+ - `--runs-dir <dir>` — override runs directory (default: `.a5c/runs`)
148
+ - `--state-dir <dir>` — state directory for session tracking
149
+
150
+ This single command creates the run AND binds the session (initializing the stop-hook loop). The JSON output includes `runId`, `runDir`, and `session` binding status.
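
The session-binding rule described above (prefer `CODEX_THREAD_ID`, fall back to `CODEX_SESSION_ID`, and omit `--session-id` when neither is set) could be expressed as an argument builder. This is a sketch under stated assumptions: `buildRunCreateArgs` is a hypothetical helper, and only the flags and environment variable names are taken from the text above.

```javascript
// Hypothetical helper: assembles the argument list for `run:create`.
// It only builds an array; it does not invoke the CLI.
function buildRunCreateArgs({ processId, entry, prompt, inputs, stateDir = '.a5c' }, env = {}) {
  const args = ['run:create', '--process-id', processId, '--entry', entry];
  if (inputs) {
    args.push('--inputs', inputs); // optional JSON file with process inputs
  }
  args.push('--prompt', prompt, '--harness', 'codex', '--state-dir', stateDir, '--json');

  // Mirrors ${CODEX_THREAD_ID:-${CODEX_SESSION_ID:-}}: an empty value counts as unset.
  const sessionId = env.CODEX_THREAD_ID || env.CODEX_SESSION_ID || '';
  if (sessionId) {
    args.push('--session-id', sessionId);
  }
  // No stable ID available: leave --session-id out rather than fabricating one.
  return args;
}
```

Passing `process.env` as the second argument would reproduce the shell behaviour; the `entry` value should already be an absolute path, as the path-resolution note requires.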
151
+
152
+ **For resuming existing runs:**
153
+
154
+ ```bash
155
+ $CLI session:resume \
156
+ --state-dir .a5c \
157
+ --run-id <runId> --runs-dir .a5c/runs --json
158
+ ```
159
+
160
+ ### 3. Run Iteration
161
+
162
+ ```bash
163
+ $CLI run:iterate .a5c/runs/<runId> --json --iteration <n>
164
+ ```
165
+
166
+ **Output:**
167
+ ```json
168
+ {
169
+ "iteration": 1,
170
+ "status": "executed|waiting|completed|failed|none",
171
+ "action": "executed-tasks|waiting|none",
172
+ "reason": "auto-runnable-tasks|breakpoint-waiting|terminal-state",
173
+ "count": 3,
174
+ "completionProof": "only-present-when-completed",
175
+ "metadata": { "runId": "...", "processId": "..." }
176
+ }
177
+ ```
178
+
179
+ **Status values:**
180
+ - `"executed"` - Tasks executed, continue looping
181
+ - `"waiting"` - Breakpoint/sleep, pause until released
182
+ - `"completed"` - Run finished successfully
183
+ - `"failed"` - Run failed with error
184
+ - `"none"` - No pending effects
185
+
186
+ **Common mistake to avoid:**
187
+ - WRONG: Calling run:iterate, performing the effect, posting the result,
188
+ then calling run:iterate again in the same session
189
+ - CORRECT: Calling run:iterate, performing the effect, posting the result,
190
+ then STOPPING the session so the Stop hook triggers the next iteration
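
As a rough illustration of how the status values above might drive the next move, the lookup below maps each documented `status` to an action label. The labels and the `actionForStatus` function are hypothetical; only the five status strings come from the output schema above.

```javascript
// Hypothetical lookup; the action names are illustrative, not SDK values.
const ACTION_BY_STATUS = new Map([
  ['executed', 'continue-loop'],            // tasks executed, keep looping
  ['waiting', 'wait-for-release'],          // breakpoint or sleep
  ['completed', 'return-completion-proof'], // run finished successfully
  ['failed', 'recover'],                    // run failed with an error
  ['none', 'no-pending-effects'],
]);

function actionForStatus(iterateOutput) {
  const action = ACTION_BY_STATUS.get(iterateOutput.status);
  if (!action) {
    throw new Error(`Unexpected run:iterate status: ${iterateOutput.status}`);
  }
  return action;
}
```

Whatever the action, the session still stops after one iteration; the Stop hook, not this function, triggers the next `run:iterate`.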
191
+
192
+ ### 4. Get Effects
193
+
194
+ ```bash
195
+ $CLI task:list .a5c/runs/<runId> --pending --json
196
+ ```
197
+
198
+ **Output:**
199
+ ```json
200
+ {
201
+ "tasks": [
202
+ {
203
+ "effectId": "effect-abc123",
204
+ "kind": "node|agent|skill|breakpoint",
205
+ "label": "auto",
206
+ "status": "requested"
207
+ }
208
+ ]
209
+ }
210
+ ```
211
+
212
+ ### 5. Perform Effects
213
+
214
+ Run the effect externally to the SDK (by you, your hook, or another worker). After execution (by delegation to an agent or skill), post the outcome summary into the run by calling `task:post`, which:
215
+ - Writes the committed result to `tasks/<effectId>/result.json`
216
+ - Appends an `EFFECT_RESOLVED` event to the journal
217
+ - Updates the state cache
218
+
219
+ IMPORTANT:
220
+ - Make sure the change was actually performed and not described or implied. (for example, if code files were mentioned as created in the summary, make sure they were actually created.)
221
+ - Include in the instructions to the agent or skill that it must perform the task in full and return only the summary result in the requested schema.
222
+
223
+ #### 5.1 Breakpoint Handling
224
+
225
+ ##### 5.1.1 Interactive mode
226
+
227
+ If running in interactive mode, present the breakpoint question to the user through Codex's native interaction model (direct conversational prompt).
228
+
229
+ **CRITICAL: Response validation rules:**
230
+ - The breakpoint question MUST include explicit "Approve" and "Reject" (or similar) options so the user's intent is unambiguous.
231
+ - If the user provides an empty, ambiguous, or no response: treat as **NOT approved**. Re-ask the question or keep the breakpoint in a pending/waiting state. Do NOT proceed.
232
+ - NEVER fabricate, synthesize, or infer approval text. Only pass through the user's actual response verbatim.
233
+ - NEVER assume approval from ambiguous, empty, or missing responses. When in doubt, the answer is "not approved".
234
+
235
+ **CRITICAL: Breakpoint rejection posting rules:**
236
+ - Breakpoint rejection MUST be posted with `--status ok` and a value of `{"approved": false, "response": "..."}`. NEVER use `--status error` for a user rejection — that signals a task execution failure and will trigger `RUN_FAILED`, requiring manual journal surgery to recover.
237
+ - Only use `--status error` if a genuine error occurs during breakpoint handling.
238
+
239
+ **Breakpoint posting examples:**
240
+
241
+ ```bash
242
+ # CORRECT: User approved the breakpoint
243
+ echo '{"approved": true, "response": "Looks good, proceed"}' > tasks/<effectId>/output.json
244
+ $CLI task:post <runId> <effectId> --status ok --value tasks/<effectId>/output.json
245
+
246
+ # CORRECT: User rejected the breakpoint
247
+ echo '{"approved": false, "response": "Stop here"}' > tasks/<effectId>/output.json
248
+ $CLI task:post <runId> <effectId> --status ok --value tasks/<effectId>/output.json
249
+
250
+ # WRONG: Posting rejection as error — causes RUN_FAILED
251
+ $CLI task:post <runId> <effectId> --status error
252
+ ```
253
+
254
+ **Breakpoint value payload schema:**
255
+
256
+ | Field | Type | Required | Description |
257
+ |-------|------|----------|-------------|
258
+ | `approved` | `boolean` | Yes | Whether the user approved the breakpoint |
259
+ | `response` | `string` | No | The user's response text or selected option |
260
+ | `feedback` | `string` | No | Additional feedback from the user |
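
One way to honour the validation rules above is to accept only an exact match against the offered options and treat everything else as unanswered. The sketch below assumes the question offered the literal options "Approve" and "Reject"; `breakpointValue` and its exact-match policy are hypothetical and stricter than the text requires.

```javascript
// Hypothetical helper: turns a user's reply into a breakpoint value payload.
// Returns null when the reply is empty or ambiguous, meaning "re-ask, do not post".
function breakpointValue(rawResponse, { approveLabel = 'Approve', rejectLabel = 'Reject' } = {}) {
  const normalized = typeof rawResponse === 'string' ? rawResponse.trim().toLowerCase() : '';
  if (normalized === approveLabel.toLowerCase()) {
    // Pass the user's actual text through verbatim; never synthesize it.
    return { approved: true, response: rawResponse };
  }
  if (normalized === rejectLabel.toLowerCase()) {
    // A rejection is still a successful task result (posted with --status ok).
    return { approved: false, response: rawResponse };
  }
  return null; // not approved: keep the breakpoint pending
}
```

A `null` result means nothing is written to the value file and `task:post` is not called yet.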
261
+
262
+ After receiving an explicit approval or rejection from the user, post the result of the breakpoint to the run by calling `task:post`.
263
+
264
+ Breakpoints are meant for human approval. NEVER auto-approve breakpoints in interactive mode. NEVER release or approve breakpoints yourself. Once the user responds, post the result of the breakpoint to the run by calling `task:post` when the breakpoint is resolved.
265
+
266
+ Otherwise:
267
+
268
+ ##### 5.1.2 Non-interactive mode
269
+
270
+ If running in non-interactive mode, resolve the breakpoint by selecting the best option according to the context and the intent of the user, then post the result via `task:post`.
271
+
272
+ **CRITICAL:** When rejecting a breakpoint in non-interactive mode, always use `--status ok` with `{"approved": false}` in the value payload. Never use `--status error` for rejections — it will fail the entire run.
273
+
274
+ **Non-interactive breakpoint posting:**
275
+ ```bash
276
+ # Approve: proceed with the action
277
+ echo '{"approved": true, "response": "Auto-approved based on context"}' > tasks/<effectId>/output.json
278
+ $CLI task:post <runId> <effectId> --status ok --value tasks/<effectId>/output.json
279
+
280
+ # Reject: skip but keep the run alive
281
+ echo '{"approved": false, "response": "Skipped — not applicable in current context"}' > tasks/<effectId>/output.json
282
+ $CLI task:post <runId> <effectId> --status ok --value tasks/<effectId>/output.json
283
+ ```
284
+
285
+ ### 6. Results Posting
286
+
287
+ **IMPORTANT**: Do NOT write `result.json` directly. The SDK owns that file.
288
+
289
+ **Workflow:**
290
+
291
+ 1. Write the result **value** to a separate file (e.g., `output.json` or `value.json`):
292
+ ```json
293
+ {
294
+ "score": 85,
295
+ "details": { ... }
296
+ }
297
+ ```
298
+
299
+ 2. Post the result, passing the value file:
300
+ ```bash
301
+ $CLI task:post .a5c/runs/<runId> <effectId> \
302
+ --status ok \
303
+ --value tasks/<effectId>/output.json \
304
+ --json
305
+ ```
306
+
307
+ The `task:post` command will:
308
+ - Read the value from your file
309
+ - Write the complete `result.json` (including schema, metadata, and your value)
310
+ - Append an `EFFECT_RESOLVED` event to the journal
311
+ - Update the state cache
312
+
313
+ **Available flags:**
314
+ - `--status <ok|error>` (required)
315
+ - `--value <file>` - Result value (for status=ok)
316
+ - `--error <file>` - Error payload (for status=error)
317
+ - `--stdout-file <file>` - Capture stdout
318
+ - `--stderr-file <file>` - Capture stderr
319
+ - `--started-at <iso8601>` - Task start time
320
+ - `--finished-at <iso8601>` - Task end time
321
+ - `--metadata <file>` - Additional metadata JSON
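
The flag rules above can be captured in a small builder that refuses the most common mistake, pointing `--value` at `result.json`. This is a sketch only: `buildTaskPostArgs` is a hypothetical helper, and the `result.json` guard is an added safety check, not documented CLI behaviour.

```javascript
// Hypothetical helper: assembles the argument list for `task:post`.
function buildTaskPostArgs(runDir, effectId, { status, valueFile, errorFile } = {}) {
  if (status !== 'ok' && status !== 'error') {
    throw new Error('--status is required and must be "ok" or "error"');
  }
  // Assumed guard: the SDK owns result.json, so it must not be the value file.
  if (valueFile && /(^|[\\/])result\.json$/.test(valueFile)) {
    throw new Error('Write the value to a separate file; the SDK owns result.json');
  }
  const args = ['task:post', runDir, effectId, '--status', status];
  if (status === 'ok' && valueFile) {
    args.push('--value', valueFile);
  }
  if (status === 'error' && errorFile) {
    args.push('--error', errorFile);
  }
  args.push('--json');
  return args;
}
```

For a breakpoint rejection the call would still use `status: 'ok'`, with the `{"approved": false, ...}` payload in the value file.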
322
+
323
+ **Common mistake to avoid:**
324
+ ```bash
325
+ # WRONG: Writing result.json directly
326
+ echo '{"result": {...}}' > tasks/<effectId>/result.json
327
+ $CLI task:post <runId> <effectId> --status ok
328
+
329
+ # CORRECT: Write value to separate file, let SDK create result.json
330
+ echo '{"score": 85}' > tasks/<effectId>/output.json
331
+ $CLI task:post <runId> <effectId> --status ok --value tasks/<effectId>/output.json
332
+ ```
333
+
334
+ **Breakpoint-specific posting:**
335
+
336
+ Breakpoints use the same `task:post` workflow but require a specific value payload with an `approved` field:
337
+
338
+ ```bash
339
+ # Breakpoint approval
340
+ echo '{"approved": true, "response": "User approved"}' > tasks/<effectId>/output.json
341
+ $CLI task:post <runId> <effectId> --status ok --value tasks/<effectId>/output.json
342
+
343
+ # Breakpoint rejection (ALWAYS use --status ok, not --status error)
344
+ echo '{"approved": false, "response": "User rejected"}' > tasks/<effectId>/output.json
345
+ $CLI task:post <runId> <effectId> --status ok --value tasks/<effectId>/output.json
346
+ ```
347
+
348
+ ---
349
+
350
+ ### 7. STOP after every phase after run-session association — the Stop hook drives the loop, not you.
351
+
352
+ You will be called back by the Codex Stop hook (`.codex/hooks/babysitter-stop-hook.sh`) and asked to either repeat the orchestration loop or exit it by posting the completion proof. Repeat the orchestration loop by calling `run:iterate` or by taking whatever next step the run requires.
353
+
354
+ The Codex hook model uses three hooks wired in `.codex/hooks.json`:
355
+ - **SessionStart** (`.codex/hooks/babysitter-session-start.sh`) — initializes the babysitter state at session start
356
+ - **UserPromptSubmit** (`.codex/hooks/user-prompt-submit.sh`) — processes user prompts
357
+ - **Stop** (`.codex/hooks/babysitter-stop-hook.sh`) — drives the orchestration loop between iterations
358
+
359
+ Common mistakes to avoid:
360
+ - WRONG: Stopping the session before run-session association
361
+ - CORRECT: Stopping the session after run-session association, then after every iteration, letting the Stop hook call you back to continue the loop until completion.
362
+
363
+ - WRONG: Calling run:iterate multiple times in the same session without stopping and letting the Stop hook call you back.
364
+ - CORRECT: Calling run:iterate once, then stopping and letting the Stop hook call you back for the next iteration until completion.
365
+
366
+ ### 8. Completion Proof
367
+
368
+ When the run is completed, the CLI will emit a `completionProof` value in the output of `run:iterate` and `run:status`. You must return that exact value wrapped in a `<promise>...</promise>` tag to signal completion to the stop-hook loop.
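
Wrapping the proof is mechanical, so a tiny helper is enough to show it. `completionMessage` is a hypothetical name; the sketch relies only on the statement above that `completionProof` is present in the JSON output once the run has completed.

```javascript
// Hypothetical helper: builds the completion signal from parsed CLI output.
// Returns null while the run is not complete (no completionProof field yet).
function completionMessage(cliOutput) {
  const proof = cliOutput?.completionProof;
  if (typeof proof !== 'string' || proof.length === 0) {
    return null;
  }
  // The exact value must be returned unchanged inside the <promise> tag.
  return `<promise>${proof}</promise>`;
}
```

Given parsed output such as `{ status: 'completed', completionProof: 'abc' }`, the helper returns `<promise>abc</promise>`; for any output without the field it returns `null` and the loop continues.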
369
+
370
+ ### Progressing Between The Phases
371
+
372
+ After you run `run:create`, progress through the phases above by stopping the session, which triggers the Codex Stop hook and calls you back with the next phase. After you finish each phase following `run:create`, stop the session and return control to the user until the hook calls you again.
373
+
374
+ ## Task Kinds
375
+
376
+ **CRITICAL RULE: NEVER use `node` kind effects in generated processes.** The `node` kind executes inline Node.js scripts which bypasses the agent orchestration model entirely — it produces deterministic scripts instead of leveraging LLM reasoning. Every task that would be a `node` effect MUST be converted to an `agent` or `skill` effect instead. If you encounter `node` kind tasks in reference process files, convert them to `agent` kind when using them as templates. The only acceptable non-agent kinds are `shell` (for running existing test suites, git commands, or build tools), `breakpoint` (human approval), and `sleep` (time gates).
377
+
378
+ | Kind | Description | Executor | When to use |
379
+ |------|-------------|----------|-------------|
380
+ | ~~`node`~~ | ~~Node.js script~~ | ~~Local node process~~ | **NEVER — forbidden. Convert to `agent` or `skill`.** |
381
+ | `shell` | Shell command | Local shell process | Only for running existing CLI tools, test suites, git, linters, builds |
382
+ | `agent` | LLM agent | Agent runtime | **Default for all tasks** — planning, implementation, analysis, verification, scoring, debugging, code writing, research |
383
+ | `skill` | Codex skill | Skill system | When a matching installed skill exists (preferred over agent when available) |
384
+ | `breakpoint` | Human approval | UI/CLI | Decision gates requiring user input |
385
+ | `sleep` | Time gate | Scheduler | Time-based pauses |
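
A generated process could be checked against the table above before the run is created. The sketch below is an illustration, not an SDK feature: `findForbiddenTasks` is a hypothetical helper, and it assumes each task definition is a plain object with `kind` and `title` fields, as in the examples in this document.

```javascript
// Kinds permitted by the table above; `node` is deliberately absent.
const ALLOWED_KINDS = new Set(['shell', 'agent', 'skill', 'breakpoint', 'sleep']);

// Hypothetical helper: returns the titles of tasks whose kind is not allowed.
function findForbiddenTasks(taskDefinitions) {
  return taskDefinitions
    .filter((task) => !ALLOWED_KINDS.has(task.kind))
    .map((task) => task.title ?? '(untitled)');
}
```

An empty array means every task uses a permitted kind; any returned title marks a task that should be converted to `agent` or `skill`.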
386
+
387
+ ### Agent Task Example
388
+
389
+ Important: Check which subagents and agents are actually available before assigning the name. If none are available, pass the general-purpose subagent. Check the subagents and agents in the plugin (in nested folders) to find relevant ones to use as a reference. Specifically, check subagents and agents in folders next to the reference process file.
390
+ When executing the agent task, delegate to an appropriate agent. Never use the Babysitter skill or agent to execute the task. If the subagent or agent is not installed for the project before running the process, install it first.
391
+
392
+ ```javascript
393
+ export const agentTask = defineTask('agent-scorer', (args, taskCtx) => ({
394
+ kind: 'agent', // ← Use "agent" not "node"
395
+ title: 'Agent scoring',
396
+ agent: {
397
+ name: 'quality-scorer',
398
+ prompt: {
399
+ role: 'QA engineer',
400
+ task: 'Score results 0-100',
401
+ context: { ...args },
402
+ instructions: ['Review', 'Score', 'Recommend'],
403
+ outputFormat: 'JSON'
404
+ },
405
+ outputSchema: {
406
+ type: 'object',
407
+ required: ['score']
408
+ }
409
+ },
410
+
411
+ io: {
412
+ inputJsonPath: `tasks/${taskCtx.effectId}/input.json`,
413
+ outputJsonPath: `tasks/${taskCtx.effectId}/result.json`
414
+ }
415
+ }));
416
+ ```
417
+
418
+ ### Skill Task Example
419
+
420
+ Important: Check which skills are actually available before assigning the skill name. Check the skills in the plugin (in nested folders) to find relevant skills to use as a reference. Specifically, check skills in folders next to the reference process file.
421
+
422
+ Never use the Babysitter skill or agent to execute the task. If the skill or subagent is not installed for the project before running the process, install it first. Skills are preferred over subagents for executing tasks, especially if you can find the right skill for the task. You can convert an agent call to a skill call even if the reference process mentions an agent call.
423
+
424
+ ```javascript
425
+ export const skillTask = defineTask('analyzer-skill', (args, taskCtx) => ({
426
+ kind: 'skill', // ← Use "skill" not "node"
427
+ title: 'Analyze codebase',
428
+
429
+ skill: {
430
+ name: 'codebase-analyzer',
431
+ context: {
432
+ scope: args.scope,
433
+ depth: args.depth,
434
+ analysisType: args.type,
435
+ criteria: ['Code consistency', 'Naming conventions', 'Error handling'],
436
+ instructions: [
437
+ 'Scan specified paths for code patterns',
438
+ 'Analyze consistency across the codebase',
439
+ 'Check naming conventions',
440
+ 'Review error handling patterns',
441
+ 'Generate structured analysis report'
442
+ ]
443
+ }
444
+ },
445
+
446
+ io: {
447
+ inputJsonPath: `tasks/${taskCtx.effectId}/input.json`,
448
+ outputJsonPath: `tasks/${taskCtx.effectId}/result.json`
449
+ }
450
+ }));
451
+ ```
452
+
453
+ ---
454
+
455
+ ## Quick Commands Reference
456
+
457
+ **Create run (with session binding):**
458
+ ```bash
459
+ SESSION_ID="${CODEX_THREAD_ID:-${CODEX_SESSION_ID:-}}"
460
+ $CLI run:create --process-id <id> --entry <path>#<export> --inputs <file> \
461
+ --prompt "$PROMPT" --harness codex \
462
+ --state-dir .a5c --json
463
+ # Add --session-id "$SESSION_ID" if a session/thread ID is available
464
+ ```
465
+
466
+ **Check status:**
467
+ ```bash
468
+ $CLI run:status <runId> --json
469
+ ```
470
+
471
+ When the run completes, `run:iterate` and `run:status` emit `completionProof`. Use that exact value in a `<promise>...</promise>` tag to end the loop.
472
+
473
+ **View events:**
474
+ ```bash
475
+ $CLI run:events <runId> --limit 20 --reverse
476
+ ```
477
+
478
+ **List tasks:**
479
+ ```bash
480
+ $CLI task:list <runId> --pending --json
481
+ ```
482
+
483
+ **Post task result:**
484
+ ```bash
485
+ $CLI task:post <runId> <effectId> --status <ok|error> --json
486
+ ```
487
+
488
+ **Iterate:**
489
+ ```bash
490
+ $CLI run:iterate <runId> --json --iteration <n>
491
+ ```
492
+ ---
493
+
494
+ ## Recovery from failure
495
+
496
+ If at any point the run fails due to SDK issues or a corrupted state or journal, analyze the error and the journal events. Recover the state and journal to the last known good state, then adapt and try to continue the run.
497
+
498
+ ## Process Creation Guidelines and methodologies
499
+
500
+ - When building UX and full-stack applications, integrate/link the main pages of the frontend with the functionality created in every phase of the development process (where relevant), so that there is a way to test the functionality of the app as we go.
501
+
502
+ - Unless otherwise specified, prefer quality gated iterative development loops in the process.
503
+
504
+ - You can change the process after the run is created or during the run (and adapt the process accordingly and journal accordingly) in case you discovered new information or requirements that were not previously known that changes the approach or the process.
505
+
506
+ - The process should be a comprehensive and complete solution to the user request. It should not be a partial solution or a work in progress. It should be a complete and working solution that can be used to test the functionality of the app as we go.
507
+
508
+ - The process should usually be a composition (in code) of multiple processes from the process library (not just one), covering multiple phases and parts of the process, each using a different library process as a reference. This lets the user request be performed by the most accurate and robust process, applying the library's best practices in every part.
509
+
510
+ - Include verification and refinement steps (and loops) for planning phases, integration phases, debugging phases, refactoring phases, etc. as well.
511
+
512
+ - Create the process with (and around) the available skills and subagents. Check which ones are available first, using the discover command.
513
+
514
+ - Prefer incremental work that allows testing and experimentation with the new functionality of the work or app as we go. For example, when building a new feature, prefer building it in a way that allows testing it with a simple script or a simple page in the frontend before integrating it into the main pages and flows of the app.
515
+
516
+ ### Process File Discovery Markers
517
+
518
+ When creating process files, include `@skill` and `@agent` markers in the JSDoc header listing the skills and agents relevant to this process. The SDK reads these markers to provide targeted discovery results instead of scanning all available skills.
519
+
520
+ **Format** (one per line, path relative to process root):
521
+ ```javascript
522
+ /**
523
+ * @process specializations/web-development/react-app-development
524
+ * @description React app development with TDD
525
+ * @skill frontend-design specializations/web-development/skills/frontend-design/SKILL.md
526
+ * @skill visual-diff-scorer specializations/web-development/skills/visual-diff-scorer/SKILL.md
527
+ * @agent frontend-architect specializations/web-development/agents/frontend-architect/AGENT.md
528
+ * @agent fullstack-architect specializations/web-development/agents/fullstack-architect/AGENT.md
529
+ */
530
+ ```
531
+
532
+ **Steps during process creation:**
533
+ 1. Use `babysitter skill:discover --process-path <path> --json` to find relevant skills/agents in the specialization directory
534
+ 2. Select the ones actually needed by the process tasks
535
+ 3. Add them as `@skill`/`@agent` markers in the JSDoc header
536
+ 4. Use full relative path from the process root
537
+
538
+ When these markers are present, `run:create` and `run:iterate` will return only the marked skills/agents (with full file paths) instead of scanning the entire plugin tree. Without markers, the SDK falls back to scanning ALL specializations, which can return dozens of irrelevant results and degrade orchestration quality.
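To make the marker format concrete, here is a minimal sketch of how `@skill` and `@agent` lines could be read out of a process file's JSDoc header. This is an illustration only, not the SDK's actual parser: the function name `parseDiscoveryMarkers` and the shape of its return value are assumptions made for this example.

```javascript
// Hypothetical helper: NOT the SDK's real implementation, only an
// illustration of the "@skill <name> <path>" / "@agent <name> <path>" format.
function parseDiscoveryMarkers(source) {
  const result = { skills: [], agents: [] };

  // Take the first JSDoc block in the file (the header).
  const header = source.match(/\/\*\*[\s\S]*?\*\//);
  if (!header) return result;

  // One marker per line: " * @skill <name> <path relative to process root>"
  const markerLine = /^\s*\*\s*@(skill|agent)\s+(\S+)\s+(\S+)\s*$/gm;
  for (const [, kind, name, path] of header[0].matchAll(markerLine)) {
    (kind === "skill" ? result.skills : result.agents).push({ name, path });
  }
  return result;
}

const example = [
  "/**",
  " * @process specializations/web-development/react-app-development",
  " * @skill frontend-design specializations/web-development/skills/frontend-design/SKILL.md",
  " * @agent frontend-architect specializations/web-development/agents/frontend-architect/AGENT.md",
  " */",
].join("\n");

const markers = parseDiscoveryMarkers(example);
console.log(markers.skills.map((s) => s.name)); // [ 'frontend-design' ]
console.log(markers.agents.map((a) => a.name)); // [ 'frontend-architect' ]
```

A file with no JSDoc header yields empty lists, which corresponds to the fallback case described above where no markers are available.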
539
+
540
+ - Unless otherwise specified, prefer processes that close the widest loop in the quality gates (for example, e2e tests with a full browser, or an emulator/VM if it is a mobile or desktop app) AND gates that make sure the work is accurate against the user request (all the specs are covered and no extra stuff was added unless permitted by the intent of the user).
541
+
542
+ - Scan the methodologies and processes in the plugin and the SDK package to find relevant processes and methodologies to use as a reference. Also search for process files bundled in active skills and for processes in the repo (.a5c/processes/).
543
+
544
+ - If you encounter a generic part of a process that can later be reused and composed, build it in a modular way, organize it in the .a5c/processes directory, and import it to compose the specific process for the current user request. Prefer architecting processes in this modular way for reusability and composition.
545
+
546
+ Prefer processes that have the following characteristics unless otherwise specified:
547
+ - In case of a new project, plan the architecture, stack, parts, milestones, etc.
548
+ - In case of an existing project, analyze the architecture, stack, relevant parts, milestones, etc., and plan the changes to be made in terms of milestones, modification/preparation steps for existing modules, new modules, integration steps, etc.
549
+ - Integrate/link the main pages (or entry points) with the functionality created in every phase of the development process (where relevant), so that there is a way to test and experiment with the new functionality of the work or app as we go.
550
+ - Quality gated iterative and convergent development/refinement/optimization loops for each part of the implementation, definition, ux design and definition, specs, etc.
551
+ - Test driven - where quality gates agents can use executable tools, scripts and tests to verify the accuracy and completeness of the implementation.
552
+ - Integration phases for each new functionality in every milestone, with integration tests and quality gates, where quality-gate agents can use executable tools, scripts and tests to verify the accuracy and completeness of the integration.
553
+ - Where relevant, ensures beautiful and polished UX design and implementation, with pixel-perfect verification and refinement loops.
554
+ - Ensures accurate and complete implementation of the user request.
555
+ - Ensures closing quality feedback loops in the most complete and comprehensive way possible and practical.
556
+ - In case the scope includes work in an existing deployed application and the feedback loop requires validation in the deployed (or remote) environment, analyze the deployment methods and understand how the existing delivery pipeline works, how you can deliver changes to the sandbox/staging environment, and how to verify the accuracy and completeness of those changes on the remote environment, with observability into the CI pipelines and the logs of the cluster/app/infra/etc. (for requests like: "fix this bug and make sure that it is fixed locally, then deploy to staging and verify that the bug is fixed there too")
557
+ - If the user is very explicit about the flow and process, create a process that follows it closely and strictly. (ad hoc requests like: "try this functionality and make sure it works as expected, repeat until it works as expected")
558
+ - Search for processes (js files), skills and agents (SKILL.md and AGENT.md files) during the interactive process building phase to compose a comprehensive process that may combine various parts from different sources:
559
+ - .a5c/processes/ (project level processes)
560
+ - plugins/babysitter/skills/babysit/process/specializations/[rnd-specialization-name-slugified]/ (rnd specializations)
561
+ - plugins/babysitter/skills/babysit/process/specializations/domains/[domain-name-slugified]/[specialization-name-slugified]/ (non rnd specializations)
562
+ - plugins/babysitter/skills/babysit/process/methodologies/ (methodologies)
563
+ - When creating the process file, add `@skill` and `@agent` JSDoc markers for the relevant skills and agents found during this search (see "Process File Discovery Markers" above). This ensures only the needed dependencies are surfaced during orchestration instead of scanning all available specializations.
564
+
565
+ ## Critical Rules
566
+
567
+ CRITICAL RULE: The completion proof is emitted only when the run is completed. You may ONLY output `<promise>SECRET</promise>` when the run is completely and unequivocally DONE (completed status from the orchestration CLI). Do not output false promises to escape the run, and do not mention the secret to the user.
568
+
569
+ CRITICAL RULE: in interactive mode, NEVER auto-approve breakpoints. If the user provides an empty, ambiguous, or no response, treat it as NOT approved and re-ask. NEVER fabricate or synthesize approval responses — only post the user's actual explicit response via task:post. An empty response is NOT approval.
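The "empty or ambiguous is NOT approval" rule can be pictured as a strict allow-list check. The sketch below is only an illustration: the function name `isExplicitApproval` and the list of accepted phrases are assumptions for this example and are not defined by the skill or the CLI.

```javascript
// Hypothetical allow-list; the accepted phrases are an assumption for
// illustration and are not specified anywhere in the skill.
const APPROVALS = new Set(["approve", "approved", "yes", "y"]);

function isExplicitApproval(response) {
  // A missing response (undefined, null, non-string) is never approval.
  if (typeof response !== "string") return false;

  // Empty or whitespace-only input normalizes to "" and is rejected,
  // as is anything that is not an exact match in the allow-list.
  const normalized = response.trim().toLowerCase();
  return APPROVALS.has(normalized);
}

console.log(isExplicitApproval(""));            // false
console.log(isExplicitApproval("maybe later")); // false
console.log(isExplicitApproval("  Yes "));      // true
```

The design choice here is to default to "not approved": anything that is not a clear match leads to re-asking the user rather than to a guessed approval.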
570
+
571
+ CRITICAL RULE: If a run is broken, failed, or in an unknown state, one way to recover is to remove the last bad entries in the journal and rebuild the state. In interactive mode, ask the user if you need clarification about the recovery and you have exhausted all other options.
572
+
573
+ CRITICAL RULE: When creating processes, search for available skills and subagents before thinking about the exact orchestration. Prefer processes that close the widest loop in the quality gates (for example, e2e tests with a full browser, or an emulator/VM if it is a mobile or desktop app) AND gates that make sure the work is accurate against the user request (all the specs are covered and no extra stuff was added unless permitted by the intent of the user).
574
+
575
+ CRITICAL RULE: Do not use the babysit skill inside delegated tasks. If you are performing a delegated task as a subagent, you will get an error when trying to run the setup shell script. That means you have to actually perform the task yourself and not orchestrate, babysit or even use this skill.
576
+
577
+ CRITICAL RULE: Never build wrapper, Python, or helper scripts to orchestrate the runs. Use the CLI to drive the orchestration loop and never loop programmatically.
578
+
579
+ CRITICAL RULE: Never fall back to simpler execution if the user activated this skill. Persist in executing the orchestration itself rather than simplifying the execution by bypassing the process above: you must create a valid process js file, create a run (with --harness codex to bind the session), then iterate the run until it is completed.
580
+
581
+ CRITICAL RULE: After run:create or run-session association and after each effect is posted, you MUST stop the session and return control. Do NOT proceed to the next run:iterate in the same session turn. The Codex Stop hook will call you back to continue. Running multiple iterations in a single session turn bypasses the hook loop and breaks the orchestration model.
582
+
583
+ CRITICAL RULE: NEVER use `kind: 'node'` in generated process files. All tasks MUST use `kind: 'agent'` (with an agent prompt schema) or `kind: 'skill'` (when a matching skill is available). If a reference process template uses `node` kind tasks, you MUST convert them to `agent` kind when adapting the template. The `node` kind produces inline scripts that bypass LLM reasoning — this defeats the purpose of agent orchestration. The only non-agent kinds allowed are `shell` (for existing CLI tools/tests/git), `breakpoint`, and `sleep`.
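As a rough illustration of the allowed task kinds, the sketch below flags any task whose kind is outside the permitted set. The `{ name, kind }` task shape and the helper name `findDisallowedTasks` are assumptions for this example; they are not the SDK's real task-definition API.

```javascript
// Kinds permitted by the rule above; 'node' is deliberately absent.
const ALLOWED_KINDS = new Set(["agent", "skill", "shell", "breakpoint", "sleep"]);

// Hypothetical check: `tasks` is an assumed array of { name, kind } objects,
// used here only to illustrate the rule.
function findDisallowedTasks(tasks) {
  return tasks
    .filter((task) => !ALLOWED_KINDS.has(task.kind))
    .map((task) => task.name);
}

const tasks = [
  { name: "plan-architecture", kind: "agent" },
  { name: "run-unit-tests", kind: "shell" },
  { name: "generate-report", kind: "node" }, // must be converted to 'agent'
];

console.log(findDisallowedTasks(tasks)); // [ 'generate-report' ]
```

Any task reported by such a check would need to be rewritten as an `agent` task (or a `skill` task when a matching skill is available) before the process file is used.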
584
+
585
+ CRITICAL RULE: NEVER bypass, avoid or skip the babysitter orchestration model when the user explicitly requested this skill. Do not execute the tasks yourself, do not loop programmatically, do not create helper scripts to orchestrate the runs. Use the CLI to drive the orchestration loop and do not bypass it, not even for simple tasks, and not even if things get complicated or don't go smoothly. Persist in executing the orchestration itself rather than simplifying the execution by bypassing the process above.
586
+
587
+ ## See Also
588
+ - `process/tdd-quality-convergence.js` - TDD quality convergence example. Read it and look for relevant processes and methodologies before creating the code process for a new run (create the run using the CLI, then use these processes as a reference)
589
+ - `process/reference/ADVANCED_PATTERNS.md` - Agent/skill patterns, iterative convergence
590
+ - `process/reference/sdk.md` - SDK API reference