npm - @a5c-ai/babysitter-codex - Versions diffs - 0.1.6-staging.2dca8387 - Mend

@a5c-ai/babysitter-codex 0.1.6-staging.2dca8387

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (51)

package/.codex/AGENTS.md +53 -0
package/.codex/command-catalog.json +130 -0
package/.codex/config.toml +24 -0
package/.codex/hooks/babysitter-session-start.sh +15 -0
package/.codex/hooks/babysitter-stop-hook.sh +15 -0
package/.codex/hooks/user-prompt-submit.sh +15 -0
package/.codex/hooks.json +37 -0
package/.codex/plugin.json +132 -0
package/.codex/skills/babysitter/assimilate/SKILL.md +58 -0
package/.codex/skills/babysitter/call/SKILL.md +590 -0
package/.codex/skills/babysitter/doctor/SKILL.md +89 -0
package/.codex/skills/babysitter/forever/SKILL.md +45 -0
package/.codex/skills/babysitter/help/SKILL.md +49 -0
package/.codex/skills/babysitter/issue/SKILL.md +36 -0
package/.codex/skills/babysitter/model/SKILL.md +31 -0
package/.codex/skills/babysitter/observe/SKILL.md +38 -0
package/.codex/skills/babysitter/plan/SKILL.md +44 -0
package/.codex/skills/babysitter/project-install/SKILL.md +65 -0
package/.codex/skills/babysitter/resume/SKILL.md +30 -0
package/.codex/skills/babysitter/retrospect/SKILL.md +43 -0
package/.codex/skills/babysitter/team-install/SKILL.md +31 -0
package/.codex/skills/babysitter/user-install/SKILL.md +53 -0
package/.codex/skills/babysitter/yolo/SKILL.md +48 -0
package/AGENTS.md +91 -0
package/CHANGELOG.md +162 -0
package/README.md +146 -0
package/SKILL.md +89 -0
package/agents/openai.yaml +4 -0
package/babysitter.lock.json +18 -0
package/bin/postinstall.js +225 -0
package/bin/uninstall.js +37 -0
package/commands/README.md +23 -0
package/commands/assimilate.md +27 -0
package/commands/call.md +30 -0
package/commands/doctor.md +27 -0
package/commands/forever.md +27 -0
package/commands/help.md +28 -0
package/commands/issue.md +27 -0
package/commands/model.md +27 -0
package/commands/observe.md +27 -0
package/commands/plan.md +27 -0
package/commands/project-install.md +31 -0
package/commands/resume.md +29 -0
package/commands/retrospect.md +27 -0
package/commands/team-install.md +29 -0
package/commands/user-install.md +27 -0
package/commands/yolo.md +28 -0
package/package.json +50 -0
package/scripts/team-install.js +257 -0
package/test/integration.test.js +69 -0
package/test/packaged-install.test.js +191 -0

package/.codex/skills/babysitter/doctor/SKILL.md ADDED

@@ -0,0 +1,89 @@
+---
+name: babysitter:doctor
+description: Diagnose babysitter run health — journal integrity, state cache, effects, locks, sessions, logs, and disk usage.
+argument-hint: "[run-id] Optional run ID to diagnose. If omitted, uses the most recent run."
+---
+# babysitter:doctor
+Comprehensive diagnostic agent for babysitter runtime health. Performs 10 mandatory health checks on a specific run (or the most recent run if no ID is provided).
+## Run Discovery
+If no run-id argument provided, find the most recent run:
+```bash
+ls -t .a5c/runs/ | head -1
+```
+## 10 Mandatory Health Checks
+For each check, report **PASS**, **WARN**, or **FAIL**.
+### 1. Run Discovery
+- Verify run directory exists at `.a5c/runs/<runId>/`
+- Display run metadata (process ID, created at, status)
+- Check manifest.json exists and is valid
+### 2. Journal Integrity
+- Verify `.a5c/runs/<runId>/journal.jsonl` exists
+- Check each line is valid JSON
+- Verify timestamps are monotonically increasing
+- Verify event type consistency
+- Count events by type
+### 3. State Cache Consistency
+- Compare `state.json` against journal events
+- Verify derived state matches journal history
+- Check for orphaned state entries
+### 4. Effect Status
+- List all effects and their statuses
+- Flag stuck effects (requested > 30 minutes ago, still pending)
+- Flag errored effects
+- Count pending vs completed
+### 5. Lock Status
+- Check for `run.lock` file
+- If lock exists, verify PID is alive
+- Flag stale/orphaned locks
+### 6. Session State
+- Inspect session state file
+- Check iteration count vs max
+- Detect potential runaway loops (fast iteration pattern)
+### 7. Log Analysis
+- Check `.a5c/logs/` for hook logs
+- Scan for errors in stderr logs
+- Report log file sizes
+### 8. Disk Usage
+- Report `.a5c/runs/<runId>/` total size
+- Identify files > 10MB
+- Report blob storage usage
+### 9. Process Validation
+- Verify process entry point exists and passes `node --check`
+- Check `@a5c-ai/babysitter-sdk` dependency is installed
+- Verify process exports match manifest
+### 10. Hook Health
+- Check all hook scripts exist and are executable
+- Verify `sh -n` passes for shell scripts
+- Verify `node --check` passes for JS hooks
+## Output
+Print a summary table:
+```
+Check                    Status  Details
+─────────────────────────────────────────
+1. Run Discovery         PASS    Run 01KJY... found
+2. Journal Integrity     PASS    42 events, checksums valid
+3. State Cache           PASS    Consistent
+4. Effect Status         WARN    1 stuck effect (> 30min)
+5. Lock Status           PASS    No stale locks
+...
+```
+Then print detailed findings and recommendations for any WARN/FAIL checks.
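Check 2 (Journal Integrity) above could be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the `type` and `recordedAt` field names are assumptions about the journal schema, and `checkJournal` is a hypothetical helper that takes the contents of `journal.jsonl` as a string.

```javascript
// Hypothetical helper: validates JSONL journal text.
// Assumes each event has `type` and `recordedAt` (ISO timestamp) fields;
// the real journal schema may use different names.
function checkJournal(jsonlText) {
  const findings = [];
  const countsByType = {};
  let previousTime = -Infinity;

  const lines = jsonlText.split("\n").filter((line) => line.trim() !== "");
  lines.forEach((line, index) => {
    let event;
    try {
      event = JSON.parse(line);
    } catch (err) {
      findings.push(`line ${index + 1}: invalid JSON`);
      return;
    }
    if (event === null || typeof event !== "object") {
      findings.push(`line ${index + 1}: not a JSON object`);
      return;
    }

    // Count events by type.
    const type = event.type ?? "(missing type)";
    countsByType[type] = (countsByType[type] || 0) + 1;

    // Timestamps must never go backwards (equal timestamps are tolerated here).
    const time = Date.parse(event.recordedAt);
    if (Number.isNaN(time)) {
      findings.push(`line ${index + 1}: missing or unparseable timestamp`);
    } else if (time < previousTime) {
      findings.push(`line ${index + 1}: timestamp goes backwards`);
    } else {
      previousTime = time;
    }
  });

  const status = findings.length === 0 ? "PASS" : "FAIL";
  return { status, eventCount: lines.length, countsByType, findings };
}
```

The returned `status`, `eventCount`, and `findings` map onto the Status and Details columns of the summary table. Whether a given finding should be a WARN rather than a FAIL is left open here.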

package/.codex/skills/babysitter/forever/SKILL.md ADDED

@@ -0,0 +1,45 @@
+---
+name: babysitter:forever
+description: Start a never-ending babysitter run with infinite loops and sleep gates.
+argument-hint: Specific instructions for the periodic run
+---
+# babysitter:forever
+Start a **never-ending** babysitter orchestration run. The process runs in an infinite loop with `ctx.sleep()` gates between iterations — ideal for periodic maintenance tasks, monitoring, or continuous improvement workflows.
+## Workflow
+### 1. Interview Phase
+Same as `babysitter call` — gather intent, requirements, and scope. Focus on:
+- What should happen each cycle?
+- How long between cycles? (sleep duration)
+- What quality gates should be checked?
+- Under what conditions should the loop stop? (manual breakpoint, quality threshold, error count)
+### 2. Process Creation
+Create a process with an infinite loop pattern:
+```javascript
+export async function process(inputs, ctx) {
+  while (true) {
+    // Execute the periodic task
+    const result = await ctx.task(periodicTask, { ...inputs });
+    // Optional: quality gate
+    if (result.shouldStop) break;
+    // Sleep until next cycle (e.g., 4 hours)
+    await ctx.sleep({ duration: inputs.sleepDuration || '4h' });
+  }
+  return { completed: true };
+}
+```
+### 3. Run Creation and Loop
+Same as `babysitter call` — create the run and iterate. Sleep effects are handled by waiting until the specified timestamp before continuing.
+The run will only complete when the process logic breaks out of the loop or when manually stopped.
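The "wait until the specified timestamp" handling described above might look like the sketch below. It assumes a duration grammar of an integer followed by `s`, `m`, `h`, or `d` (the source only shows `'4h'`), and the helper names `parseDurationMs`, `wakeTimestamp`, and `isSleepDue` are hypothetical, not part of the babysitter SDK.

```javascript
// Assumed duration grammar: "<integer><unit>", unit in s/m/h/d (e.g. "4h").
const UNIT_MS = {
  s: 1000,
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

// Convert a duration string such as "4h" into milliseconds.
function parseDurationMs(duration) {
  const match = /^(\d+)([smhd])$/.exec(String(duration).trim());
  if (!match) throw new Error(`Unsupported duration: ${duration}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Compute the ISO timestamp at which a sleep gate should release.
function wakeTimestamp(duration, nowMs = Date.now()) {
  return new Date(nowMs + parseDurationMs(duration)).toISOString();
}

// True once the current time has reached the wake timestamp.
function isSleepDue(wakeAtIso, nowMs = Date.now()) {
  return nowMs >= Date.parse(wakeAtIso);
}
```

Storing an absolute wake timestamp rather than a relative delay lets a resumed session decide whether the gate has already passed without knowing when the sleep began.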

package/.codex/skills/babysitter/help/SKILL.md ADDED

@@ -0,0 +1,49 @@
+---
+name: babysitter:help
+description: Help and documentation for babysitter on Codex.
+argument-hint: "[command|process|skill|agent|methodology] topic to get help on"
+---
+# babysitter:help
+Babysitter for Codex uses skills, AGENTS guidance, project config, and
+workspace lifecycle hooks so Codex stays in the orchestration loop.
+## No Arguments
+Show a short summary using normal-language activation examples:
+```text
+Babysitter for Codex
+Primary examples:
+  babysitter call build auth with tests
+  babysitter resume recent
+  babysitter yolo fix lint and failing tests
+  babysitter plan a migration workflow
+  babysitter doctor current run
+Codex-native surfaces:
+  - skills
+  - AGENTS.md guidance
+  - project .codex/config.toml
+  - project .codex/hooks.json (SessionStart, UserPromptSubmit, Stop)
+  - optional notify monitoring
+Hook model responsibilities:
+  - SessionStart: seed session state
+  - UserPromptSubmit: prompt-level transforms
+  - Stop: continue or approve exit after each yielded turn
+```
+## With Arguments
+If an argument is provided:
+1. Command help: read the relevant `.codex/skills/babysitter/<name>/SKILL.md`
+2. Process help: inspect the process file and summarize it
+3. Skill and agent help: use discovery helpers or `babysitter skill:discover`
+4. Methodology help: search the upstream process library
+Legacy `/babysitter:*` aliases may be mentioned only as optional compatibility
+shims, not as native Codex commands.

package/.codex/skills/babysitter/issue/SKILL.md ADDED

@@ -0,0 +1,36 @@
+---
+name: babysitter:issue
+description: Start a babysitter workflow from a GitHub issue.
+argument-hint: "<issue-number|url> [--repo owner/name] [--apply] [--pr <number>] [--open-pr]"
+---
+# babysitter:issue
+Run issue-driven orchestration.
+## Behavior
+1. Parse issue input:
+   - Numeric issue id, or full GitHub issue URL.
+   - Optional `--repo owner/name`.
+2. Resolve repository:
+   - If `--repo` omitted, infer from current git remote.
+3. Fetch issue details:
+   - Use `.codex/github-workflow.js` helper.
+4. Generate:
+   - Concise implementation plan.
+   - Proposed steps and risk notes.
+5. Optional actions:
+   - `--apply` -> return `mode=apply` and ready-to-run babysitter prompt.
+   - `--pr <number>` -> post plan comment to existing PR.
+   - `--open-pr` -> attempt `gh pr create --fill`.
+6. If in apply mode:
+   - Start babysitter run with issue context in prompt.
+## Output Contract
+- Return JSON with:
+  - `issue`: metadata
+  - `plan`: list of actionable steps
+  - `mode`: `plan` or `apply`
+  - `nextAction`: command suggestion
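Steps 1 and 2 of the behavior above (parse the issue input, then resolve the repository) could be sketched as follows. `parseIssueRef` is a hypothetical helper, not the `.codex/github-workflow.js` API. Letting an explicit `--repo` value take precedence over the repository in a URL is an assumption, as is returning `null` to signal "infer from the current git remote".

```javascript
// Hypothetical helper: accepts a numeric issue id or a full GitHub issue URL.
// `repoFlag` is the optional value of --repo ("owner/name").
function parseIssueRef(input, repoFlag) {
  const text = String(input).trim();

  // Plain issue number: the repo comes from --repo, or is left null so the
  // caller can infer it from the current git remote.
  if (/^\d+$/.test(text)) {
    return { number: Number(text), repo: repoFlag ?? null };
  }

  // Full GitHub issue URL: https://github.com/<owner>/<name>/issues/<number>
  const match =
    /^https:\/\/github\.com\/([^/]+)\/([^/]+)\/issues\/(\d+)(?:[/?#].*)?$/.exec(text);
  if (match) {
    // Assumption: an explicit --repo wins over the repo named in the URL.
    return { number: Number(match[3]), repo: repoFlag ?? `${match[1]}/${match[2]}` };
  }

  throw new Error(`Not an issue number or GitHub issue URL: ${input}`);
}
```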

package/.codex/skills/babysitter/model/SKILL.md ADDED

@@ -0,0 +1,31 @@
+---
+name: babysitter:model
+description: Set or view model routing policy for plan/execute/review phases.
+argument-hint: "[show|clear|set <phase>=<model> ...]"
+---
+# babysitter:model
+Manage model routing policy used by babysitter orchestration.
+## Behavior
+1. If argument is `show` or empty:
+   - Read `.a5c/config/model-policy.json` (and env override if provided).
+   - Return current policy JSON and effective defaults.
+2. If argument is `clear`:
+   - Clear persisted policy map.
+   - Keep runtime fallback to `BABYSITTER_MODEL_DEFAULT` if set.
+3. If argument starts with `set`:
+   - Parse one or more `phase=model` pairs.
+   - Valid phases: `plan`, `execute`, `review`, `fix`.
+   - Update `.a5c/config/model-policy.json`.
+   - Confirm the new policy map.
+## Output Contract
+- Always return valid JSON:
+  - `action`: `show`, `clear`, or `set`
+  - `policy`: object
+  - `applied`: boolean
+  - `notes`: array
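The `set` branch above, together with the output contract, might be sketched like this. `applySetArgument` is a hypothetical helper and the model names in the usage note are placeholders. Recording rejected pairs in `notes` instead of failing the whole command is an assumption, since the source does not say how invalid pairs are handled.

```javascript
// Valid phases as listed in the skill description.
const VALID_PHASES = ["plan", "execute", "review", "fix"];

// Hypothetical helper: parses "set <phase>=<model> ..." and merges the pairs
// into a copy of the current policy map.
function applySetArgument(argument, currentPolicy = {}) {
  const tokens = argument.trim().split(/\s+/);
  if (tokens[0] !== "set" || tokens.length < 2) {
    throw new Error("Expected: set <phase>=<model> ...");
  }

  const policy = { ...currentPolicy };
  const notes = [];
  let accepted = 0;

  for (const pair of tokens.slice(1)) {
    const separator = pair.indexOf("=");
    const phase = pair.slice(0, separator);
    const model = pair.slice(separator + 1);
    // Reject pairs with no "=", an empty side, or an unknown phase.
    if (separator <= 0 || model === "" || !VALID_PHASES.includes(phase)) {
      notes.push(`ignored invalid pair: ${pair}`);
      continue;
    }
    policy[phase] = model;
    accepted += 1;
  }

  // Shape follows the output contract: action, policy, applied, notes.
  return { action: "set", policy, applied: accepted > 0, notes };
}
```

For example, `applySetArgument("set plan=model-a review=model-b")` returns a policy with both phases set. Persisting the result to `.a5c/config/model-policy.json` is left to the caller.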

package/.codex/skills/babysitter/observe/SKILL.md ADDED

@@ -0,0 +1,38 @@
+---
+name: babysitter:observe
+description: Launch the babysitter observer dashboard for real-time run monitoring.
+argument-hint: "[--watch-dir <dir>] or 'stop' to kill running dashboard"
+---
+# babysitter:observe
+Launch the babysitter observer dashboard — a real-time web UI for monitoring runs, tasks, journal events, and orchestration state.
+## Usage
+### Start the Dashboard
+1. Determine the watch directory (usually the project's container directory or cwd)
+2. Launch:
+```bash
+npx -y @yoavmayer/babysitter-observer-dashboard@latest --watch-dir <dir>
+```
+3. This is a **blocking process** — it will keep running until stopped
+4. Open the browser at the URL printed by the dashboard
+### Stop the Dashboard
+If the argument is `stop`:
+1. Find the running dashboard process:
+```bash
+ps aux | grep babysitter-observer-dashboard | grep -v grep
+```
+2. Kill it:
+```bash
+kill <pid>
+```
+3. Confirm it stopped
+### Default Watch Directory
+If no `--watch-dir` is specified, use the parent of the current project directory. For example, if the project directory is `/data/repos`, watch `/data`.

package/.codex/skills/babysitter/plan/SKILL.md ADDED

@@ -0,0 +1,44 @@
+---
+name: babysitter:plan
+description: Plan a babysitter workflow without executing it. Focus on creating the best process possible.
+argument-hint: Specific instructions for the plan
+---
+# babysitter:plan
+Plan a complex workflow **without executing it**. This command goes through the full interview and process creation phases but does NOT create a run or execute any tasks.
+## Workflow
+### 1. Interview Phase
+Same as `babysitter call`:
+- Research the repo structure
+- Search the process library for relevant specializations/methodologies
+- Gather user intent, requirements, goals, and scope
+- Use wrapper discovery helpers to find available skills/agents.
+- If invoking SDK CLI directly, use:
+  `babysitter skill:discover --plugin-root "$CODEX_PLUGIN_ROOT" --json`
+### 2. Process Creation
+Create the complete process .js file with all task definitions, quality gates, and convergence loops. Also generate:
+- `<process-name>.diagram.md` — Visual process flow diagram
+- `<process-name>.process.md` — High-level process description
+### 3. Review
+Present the process to the user for review. The process should be complete and ready to run — the user can later execute it with `babysitter call` or `babysitter yolo`.
+### 4. Output
+Store the process files in `.a5c/processes/`:
+```
+.a5c/processes/
+├── <process-name>.js          # Process definition
+├── <process-name>-inputs.json # Default inputs
+├── <process-name>.diagram.md  # Visual diagram
+└── <process-name>.process.md  # Description
+```
+**Do NOT create a run.** The plan is the deliverable.

package/.codex/skills/babysitter/project-install/SKILL.md ADDED

@@ -0,0 +1,65 @@
+---
+name: babysitter:project-install
+description: Set up a project for babysitting. Research the codebase, build project profile, install tools.
+argument-hint: Specific instructions for project onboarding
+---
+# babysitter:project-install
+Guide the user through onboarding a new or existing project for babysitter orchestration.
+## Workflow
+### 1. Research the Codebase
+- Analyze project structure, language, framework, build tools
+- Check for existing `.a5c/` directory
+- Detect CI/CD configuration (GitHub Actions, Jenkins, etc.)
+- Identify key entry points and test suites
+### 2. Interview
+- What are the project's goals?
+- What workflows should be orchestrated?
+- What quality gates matter most?
+- Any specific tools or frameworks to prefer?
+### 3. Install the Real Codex Payload
+- Verify the user has installed `@yaniv-tg/babysitter-codex`
+- Run the packaged team installer from the installed skill payload
+- Confirm:
+  - `.a5c/team/install.json`
+  - `.a5c/team/profile.json`
+- Treat the installed skill root as the source of truth for bundled rules,
+  docs, processes, and hook scripts
+### 4. Build Project Profile
+- If `babysitter profile:*` commands are supported, write the project profile
+  through the CLI
+- If the SDK is running in `compat-core`, do not block onboarding on profile
+  writes; instead record the discovered build/test/gate choices in workspace
+  onboarding notes under `.a5c/`
+The profile or onboarding notes should cover:
+- Project name, description, language, framework
+- Build and test commands
+- Quality gates configuration
+- Preferred skills and agents
+- CI/CD integration settings
+### 5. Install Project-Level Codex Settings
+- Ensure `@a5c-ai/babysitter-sdk` is available
+- Create `.a5c/` directory structure
+- Set up `.codex/config.toml` using real Codex settings (sandbox, approval, optional notify)
+- Set up `.codex/hooks.json` with `SessionStart`, `UserPromptSubmit`, and `Stop`
+- Create AGENTS.md if not present
+- Do not add fake manifest/plugin sections or external supervisor loops
+### 6. Optional: Configure CI/CD
+- Add babysitter orchestration to CI pipeline
+- Set up automated quality gates
+- Configure deployment hooks
+### Done!
+Project is ready for babysitting. Try `babysitter call ...` to start your first orchestrated workflow.
+Star the repo: https://github.com/a5c-ai/babysitter

package/.codex/skills/babysitter/resume/SKILL.md ADDED

@@ -0,0 +1,30 @@
+---
+name: babysitter:resume
+description: Resume an existing babysitter run from Codex.
+argument-hint: "[recent|tag:<tag>|search:<query>|list|name <alias>|tag +/-<tag>|sessionId]"
+---
+# babysitter:resume
+Resume an incomplete babysitter run with Codex re-entering through the
+workspace hook model on the next turn.
+## Workflow
+### 1. Select the run
+- Use the session index helpers when available
+- Otherwise inspect `.a5c/runs/*` and choose the most recent incomplete run
+### 2. Resume from persisted run state
+- Resolve the target run directory or selector
+- Check `babysitter run:status <runDir> --json`
+- Continue by handling pending tasks and posting outputs
+- After each yield, let `Stop` drive re-entry on the next Codex turn
+### 3. Close out
+- Report the new run status
+- If the run still needs user input, say so directly
+- Do not claim Codex will continue automatically without hook registration

package/.codex/skills/babysitter/retrospect/SKILL.md ADDED

@@ -0,0 +1,43 @@
+---
+name: babysitter:retrospect
+description: Analyze a completed or in-flight run and propose process improvements for future runs.
+argument-hint: "[run-id] Optional run ID, defaults to latest run"
+---
+# babysitter:retrospect
+Run a structured retrospective over the last babysitter execution and convert findings into actionable process/library improvements.
+## Workflow
+1. Resolve target run
+- If run ID is provided, use it.
+- Otherwise select the latest run under `.a5c/runs`.
+2. Collect run evidence
+- Read run status/events/task outputs:
+```bash
+babysitter run:status .a5c/runs/<runId> --json
+babysitter task:list .a5c/runs/<runId> --json
+```
+- Inspect trace and event stream:
+  - `.a5c/runs/<runId>/run-trace.jsonl`
+  - `.a5c/events/events.jsonl`
+3. Analyze against process library
+- Resolve the active process library first:
+```bash
+babysitter process-library:active --state-dir .a5c --json
+```
+- Compare what was executed vs the active process-library binding and relevant methodologies/specializations.
+4. Produce retrospective output
+- Include:
+  - What worked well
+  - What failed or slowed convergence
+  - Concrete process modifications (tasks/hooks/policies/budgets/breakpoints)
+  - Suggested library contributions (processes/skills/agents/docs)
+5. If user approves, apply improvements
+- Update process files under `.a5c/processes` and/or codex harness config/docs.
+- Suggest contribution flow to upstream Babysitter for generally useful changes.

package/.codex/skills/babysitter/team-install/SKILL.md ADDED

@@ -0,0 +1,31 @@
+---
+name: babysitter:team-install
+description: Install or refresh a team-pinned babysitter runtime/content setup from lockfile.
+argument-hint: "[--dry-run]"
+---
+# babysitter:team-install
+Install the team-standard babysitter-codex setup from the installed skill payload into the current workspace.
+## Steps
+1. Resolve the installed skill root from this `SKILL.md` location, not from the repo `cwd`.
+2. Run the packaged installer against the current workspace.
+3. Confirm generated files:
+- `.a5c/team/install.json`
+- `.a5c/team/profile.json`
+- `.a5c/active/process-library.json`
+The installer must read:
+- `<skillRoot>/babysitter.lock.json`
+The installer must bootstrap the active process library through the Babysitter SDK CLI:
+- clone or update the original Babysitter repo under `<workspace>/.a5c/process-library/...`
+- bind the active process root with `babysitter process-library:use ...`
+- resolve the active path later with `babysitter process-library:active --state-dir <workspace>/.a5c --json`
+The installer must write workspace state only under:
+- `<workspace>/.a5c/`
+Use this before onboarding new repos or contributors so command/process/rules mappings are deterministic and do not depend on the plugin repo being checked out beside the target project.

package/.codex/skills/babysitter/user-install/SKILL.md ADDED

@@ -0,0 +1,53 @@
+---
+name: babysitter:user-install
+description: Set up babysitter for yourself. Install deps, build user profile, configure tools.
+argument-hint: Specific instructions for user onboarding
+---
+# babysitter:user-install
+Guide the user through onboarding for babysitter orchestration.
+## Workflow
+### 1. Install Dependencies
+- Ensure Node.js >= 18 is installed
+- Install babysitter SDK globally:
+```bash
+npm install -g @a5c-ai/babysitter-sdk
+```
+- Verify: `babysitter version --json`
+### 2. Interview
+- What are your specialties? (frontend, backend, devops, ML, etc.)
+- What's your expertise level? (junior, mid, senior, expert)
+- Communication preferences? (concise, detailed, step-by-step)
+- Breakpoint tolerance? (minimal, low, moderate, high, maximum)
+- Preferred tools and frameworks?
+### 3. Build User Profile
+Write the user profile using the CLI:
+```bash
+echo '<profile-json>' > /tmp/user-profile.json
+babysitter profile:write --user --input /tmp/user-profile.json --json
+```
+The profile includes:
+- Name, role, specialties
+- Expertise levels per domain
+- Communication style preferences
+- Breakpoint tolerance settings
+- Tool preferences
+- Installed skills and agents
+### 4. Configure Tools
+- Set up preferred editor/IDE integration
+- Configure default process templates
+- Set environment variables
+### Done!
+Your babysitter profile is configured. It will personalize all future orchestration runs.
+Star the repo: https://github.com/a5c-ai/babysitter

package/.codex/skills/babysitter/yolo/SKILL.md ADDED

@@ -0,0 +1,48 @@
+---
+name: babysitter:yolo
+description: Start babysitting in non-interactive mode - no user interaction or breakpoints, fully autonomous execution.
+argument-hint: Specific instructions for the run
+---
+# babysitter:yolo
+Identical to `babysitter call` but runs in non-interactive mode:
+- Skip the interview phase - parse intent directly from the user's prompt
+- Auto-approve all breakpoints - never pause for human approval
+- No user questions - proceed autonomously through the orchestration loop
+## Workflow
+1. Parse the initial prompt to extract intent, scope, and requirements
+2. Research the repo structure to understand the codebase
+3. Search the process library for relevant specializations and methodologies
+4. Create the process `.js` file and inputs
+5. Create the run:
+```bash
+babysitter run:create \
+  --process-id <id> \
+  --entry <path>#<export> \
+  --inputs <inputs-file> \
+  --prompt "$PROMPT" \
+  --harness codex \
+  --state-dir .a5c \
+  --json
+```
+6. Continue by handling returned tasks and auto-approving breakpoints
+The hook model remains active:
+- `SessionStart` initializes state
+- `Stop` decides continuation
+- yolo only removes human approval pauses
+7. When `completionProof` is emitted, report it plainly
+## Key Difference from `babysitter call`
+The only difference is that breakpoints are auto-approved and no user questions
+are asked. The hook-owned continuation and result posting contract stay the
+same.
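When the `run:create` call from step 5 is issued programmatically, the arguments could be assembled as an array instead of a shell string. The sketch below uses only the flags shown in that step. `buildRunCreateArgs` and its parameter names are hypothetical.

```javascript
// Hypothetical helper: builds the argv for `babysitter run:create` using the
// flags listed in step 5. Harness and state dir are fixed to the values the
// skill specifies (codex, .a5c).
function buildRunCreateArgs({ processId, entryPath, exportName, inputsFile, prompt }) {
  return [
    "run:create",
    "--process-id", processId,
    "--entry", `${entryPath}#${exportName}`, // <path>#<export>
    "--inputs", inputsFile,
    "--prompt", prompt,
    "--harness", "codex",
    "--state-dir", ".a5c",
    "--json",
  ];
}
```

Passing this array to something like Node's `child_process.execFile("babysitter", args)` keeps a prompt that contains quotes or spaces as a single argument, with no shell escaping needed.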