npm - @pennyfarthing/core - Versions diffs - 7.6.1 → 7.8.0 - Mend

@pennyfarthing/core 7.6.1 → 7.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (109) hide show

package/pennyfarthing-dist/agents/workflow-status-check.md CHANGED Viewed

@@ -10,6 +10,12 @@ Universal entry point telling agents: what work exists, what phase, and whether
 Uses `/sprint` skill scripts for deterministic output.
 </info>
+<arguments>
+| Argument | Required | Description |
+|----------|----------|-------------|
+| `CALLING_AGENT` | Yes | Agent requesting status check (e.g., "SM", "Architect", "PM") |
+</arguments>
 ---
 ## Execution
@@ -37,32 +43,54 @@ fi
 | State | Condition |
 |-------|-----------|
 | `FINISH_STATE` | Session exists with Phase=approved OR Status=approved |
-| `IN_PROGRESS_STATE` | Session exists with active phase (tea/dev/review) |
+| `IN_PROGRESS_STATE` | Phased workflow session with active phase (setup/red/green/impl/review) |
+| `STEPPED_WORKFLOW_STATE` | Stepped workflow session (workflow type = stepped) |
 | `NEW_WORK_STATE` | No sessions AND sprint has backlog/ready stories |
 | `EMPTY_BACKLOG_STATE` | No sessions AND sprint has NO backlog/ready stories |
+**Detecting workflow type from session:**
+```bash
+# Read workflow name from session
+WORKFLOW=$(grep '^\*\*Workflow:\*\*' .session/*-session.md | head -1 | sed 's/.*: //')
+# Check if stepped
+.pennyfarthing/scripts/core/run.sh workflow/get-workflow-type.sh "$WORKFLOW"
+```
 **Important:** Sprints are fixed two-week periods (kanban-style). Never suggest closing a sprint early or starting sprint planning when backlog is empty. The correct response to `EMPTY_BACKLOG_STATE` is to suggest promoting stories from `future.yaml`.
 ---
+<output>
 ## Output Format
-```markdown
-## Workflow Status Report
-### Detected State
-**{STATE}**
+Return a `STATUS_CHECK_RESULT` block:
-### Sprint Summary
-[Output from sprint-status.sh]
-### Active Session
-| Story | Phase | Status | Branch |
-|-------|-------|--------|--------|
+### Success
+```
+STATUS_CHECK_RESULT:
+  status: success
+  state: {FINISH_STATE|IN_PROGRESS_STATE|NEW_WORK_STATE|EMPTY_BACKLOG_STATE}
+  story_id: {ID or null}
+  phase: {current phase or null}
+  phase_owner: {agent name or null}
+  sprint_number: {N}
+  backlog_count: {N}
+  next_steps:
+    - FINISH_STATE: "Proceed to Finish Flow - spawn sm-finish with PHASE=preflight"
+    - IN_PROGRESS_STATE: "Report phase owner '{phase_owner}' should continue. Run handoff-marker.sh {phase_owner}"
+    - STEPPED_WORKFLOW_STATE: "Stepped workflow in progress. Tell user to run /workflow resume or /workflow status"
+    - NEW_WORK_STATE: "Present available stories to user. Await selection, then spawn sm-setup MODE=setup"
+    - EMPTY_BACKLOG_STATE: "Report backlog empty. Suggest promoting from future.yaml"
+```
-### Recommended Action
-- FINISH_STATE → Proceed to finish flow
-- IN_PROGRESS_STATE → Report which agent should continue
-- NEW_WORK_STATE → Show available stories
-- EMPTY_BACKLOG_STATE → Suggest promoting stories from future.yaml
+### Active Session Details (if IN_PROGRESS_STATE)
+```
+  session:
+    story_id: {ID}
+    title: "{title}"
+    workflow: {workflow}
+    phase: {phase}
+    branch: {branch}
 ```
+</output>

package/pennyfarthing-dist/commands/benchmark.md CHANGED Viewed

@@ -151,7 +151,11 @@ Cross-role mode: Prospero --as dev should see dev scenarios, not SM scenarios.
 ls scenarios/{category}/*.yaml | xargs -I {} yq -r '"{}|\(.name)|\(.difficulty)|\(.title)|\(.description)"' {}
 ```
-**Present choices with AskUserQuestion:**
+**Present choices (Reflector-aware):**
+First output marker: `<!-- CYCLIST:CHOICES:scenario -->`
+Then use AskUserQuestion:
 ```yaml
 AskUserQuestion:
   questions:
@@ -416,6 +420,20 @@ agent:
   cross_role: true
 ```
+**REQUIRED: Capture Pennyfarthing version in metadata:**
+```bash
+# Get version from package.json
+version=$(node -p "require('./package.json').version")
+```
+Include in summary.yaml:
+```yaml
+metadata:
+  created_at: "{ISO timestamp}"
+  pennyfarthing_version: "{version}"  # REQUIRED for baseline staleness detection
+  model: sonnet
+```
 **ALWAYS save summary.yaml, even for n=1.** This ensures consistent data structure for analysis.
 Display:

package/pennyfarthing-dist/commands/continue-session.md CHANGED Viewed

@@ -58,7 +58,7 @@ If checkpoints exist, parse and present them:
 Which checkpoint would you like to restore? (Enter number or 'all' for most recent of each label)
 ```
-Use `AskUserQuestion` to let user choose.
+Output `<!-- CYCLIST:CHOICES:checkpoint -->` marker, then use AskUserQuestion to let user choose.
 ## Step 3: Restore Checkpoint

package/pennyfarthing-dist/commands/git-cleanup.md CHANGED Viewed

@@ -1,340 +1,75 @@
 ---
 description: Clean up git repos by organizing changes into proper commits/branches by initiative
+workflow: git-cleanup
 ---
 # Git Cleanup Command
 Analyze and organize uncommitted changes across all repos into proper commits and branches based on initiative/feature groupings.
-## Git Workflow Rules
-**CRITICAL: Never commit directly to develop. Branch protection hooks will reject direct commits.**
-All changes MUST follow this workflow:
-1. Create a branch from develop
-2. Commit changes to the branch
-3. Merge to develop locally
-4. Push develop (branches don't need to be pushed)
-## Analysis Phase
-### 1. Gather Current State and Check for Unpushed Commits
-```bash
-./scripts/run.sh git/git-status-all.sh
-```
-This shows branch, changes, and unpushed commits for all repos.
-### 2. Check Recent Commits for Patterns
-```bash
-echo "=== Recent Commit Patterns ==="
-git log --oneline -10 | head -10
-echo ""
-echo "=== Recent Branches ==="
-git branch --sort=-committerdate | head -10
-```
-### 3. Check Active Work Sessions
-```bash
-echo "=== Active Work ==="
-if [ -f ".session/{STORY_ID}-session.md" ]; then
-  head -30 .session/{STORY_ID}-session.md
-else
-  echo "No active work session"
-fi
-```
-### 4. Check Worktree Status
-```bash
-echo "=== Worktree Status ==="
-./scripts/run.sh git/worktree-manager.sh status
-```
-This shows all active worktrees with their branches and uncommitted changes.
-### 5. Check Epics for Story Context
-```bash
-echo "=== Active Epics ==="
-head -60 docs/epics.md
-```
-## Categorization Guidelines
-Group changes by these initiative types (based on conventional commits):
+## BikeLane Workflow
-| Prefix | Type | Branch Pattern | Example |
-|--------|------|----------------|---------|
-| `docs:` | Documentation | `docs/description` | docs/update-epics |
-| `chore:` | Maintenance | `chore/description` | chore/sprint-cleanup |
-| `chore(sprint):` | Sprint tracking | `chore/sprint-update` | - |
-| `chore(pennyfarthing):` | Pennyfarthing config | `chore/pennyfarthing-cleanup` | - |
-| `feat:` | New feature | `feat/story-id-desc` | feat/4-12-risk-scoring |
-| `fix:` | Bug fix | `fix/issue-desc` | fix/validation-error |
-| `refactor:` | Code improvement | `refactor/description` | refactor/seed-data |
+This command uses the **git-cleanup** stepped workflow:
-## Common Change Categories
+| Step | Name | Purpose |
+|------|------|---------|
+| 1 | Analyze | Gather git status across all repos |
+| 2 | Categorize | Group changes by initiative type |
+| 3 | Execute | Create branches, commit, merge |
+| 4 | Verify | Confirm clean state, optionally push |
+| 5 | Complete | Summary and next steps |
-### Sprint/Docs Changes
-Files: `docs/*.md`, `sprint/*.yaml`, `sprint/*.md`
-- Branch: `chore/sprint-cleanup` or `docs/update-[topic]`
-- Commit: `chore(sprint): update sprint tracking` or `docs: update documentation`
+## Quick Start
-### Pennyfarthing Configuration
-Files: `.claude/**/*`
-- Branch: `chore/pennyfarthing-[description]`
-- Commit: `chore(pennyfarthing): description`
-### Feature Work
-Files: `internal/**`, `src/**`, `migrations/**`
-- Branch: `feat/[story-id]-description`
-- Commit: `feat: description`
-### Seed/Test Data
-Files: `internal/seed/**`, `cmd/seed/**`
-- Branch: `chore/seed-improvements` or part of feature branch
-- Commit: `chore(seed): description`
-### Bug Fixes
-Files: Various
-- Branch: `fix/[issue-description]`
-- Commit: `fix: description`
-## Execution Phase - Branch Workflow
-**For each group of changes, follow this exact workflow:**
-### Step 1: Stash All Changes First
-```bash
-# Stash everything to start clean
-git stash push -m "cleanup-wip"
-```
-### Step 2: Create Branch from Develop
-```bash
-# Ensure on develop and up to date
-git checkout develop
-git pull origin develop
-# Create feature branch
-git checkout -b type/description
-```
-### Step 3: Apply Relevant Changes
-```bash
-# Pop stash
-git stash pop
-# Stage only files for this group
-git add <specific-files>
-# Stash remaining changes for next group
-git stash push -m "remaining-cleanup"
-```
-### Step 4: Commit
-```bash
-# Commit with proper message
-git commit -m "$(cat <<'EOF'
-type(scope): description
-Details if needed.
-🤖 Generated with [Claude Code](https://claude.com/claude-code)
-Co-Authored-By: Claude <noreply@anthropic.com>
-EOF
-)"
-```
-### Step 5: Merge to Develop
-```bash
-# Switch to develop
-git checkout develop
+Run `/git-cleanup` to start the workflow.
-# Merge the branch
-git merge type/description
-# Delete local branch
-git branch -d type/description
-```
-### Step 6: Repeat for Next Group
-```bash
-# Pop remaining stash and repeat from Step 2
-git stash pop
-```
-## Multi-Repo Cleanup Pattern
-When cleaning up changes across multiple repos (configured in `.claude/project/repos.yaml`):
-```bash
-# Source repo utilities
-source $CLAUDE_PROJECT_DIR/scripts/repo-utils.sh
-# For each configured repo, run the same workflow
-for_each_repo '
-  # ... branch workflow ...
-  git status --short
-'
-```
-Or manually iterate:
-```bash
-source $CLAUDE_PROJECT_DIR/scripts/repo-utils.sh
-for repo in $(get_repos); do
-  repo_path=$(get_repo_full_path "$repo")
-  echo "=== Processing $repo ==="
-  cd "$repo_path"
-  # ... branch workflow ...
-  cd -
-done
-```
-## Interactive Cleanup Process
-When running this command, Claude should:
+## Git Workflow Rules
-1. **Show current state** - Display all uncommitted changes grouped by repo
-2. **Check for unpushed commits** - Identify any commits on develop not yet pushed
-3. **Analyze changes** - Look for patterns in file paths, recent commits, and epics
-4. **Propose groupings** - Suggest how to organize changes:
-   ```
-   Group 1: Sprint Updates
-   - docs/epics.md
-   - sprint/current-sprint.yaml
-   → Branch: chore/sprint-cleanup
-   → Commit: "chore(sprint): clean up sprint tracking files"
+**CRITICAL: Never commit directly to develop.** Branch protection hooks will reject direct commits.
-   Group 2: Migration Fix
-   - migrations/056_*.sql
-   → Branch: fix/risk-scoring-migration
-   → Commit: "fix(db): add missing risk scoring migration"
+All changes follow this pattern:
+1. Create branch from develop
+2. Commit changes to branch
+3. Merge to develop locally
+4. Push develop (branches stay local)
-   Group 3: Seed Improvements
-   - internal/seed/*.go
-   → Branch: chore/seed-improvements
-   → Commit: "chore(seed): improve demo data generation"
-   ```
-4. **Ask for confirmation** - Let user approve or modify groupings
-5. **Execute** - Create branches, commit, merge to develop for each group
-6. **Push develop** - Push all merged changes at the end
-7. **Verify** - Show clean git status at the end
+## Categorization Reference
-## Safety Rules
+| Prefix | Type | Branch Pattern |
+|--------|------|----------------|
+| `docs:` | Documentation | `docs/description` |
+| `chore:` | Maintenance | `chore/description` |
+| `chore(sprint):` | Sprint tracking | `chore/sprint-update` |
+| `feat:` | New feature | `feat/story-id-desc` |
+| `fix:` | Bug fix | `fix/issue-desc` |
+| `refactor:` | Code improvement | `refactor/description` |
+| `test:` | Test changes | `test/description` |
-- **NEVER commit directly to develop** - hooks will reject it
-- **Never force push**
-- **Never commit secrets** (.env, credentials, etc.)
-- **Show diff before committing**
-- **Respect .gitignore**
-- **Handle unpushed commits on develop before starting cleanup**
+## Manual Quick Reference
-## Quick Reference
+For quick cleanup without the full workflow:
 ```bash
-# View what needs cleanup (all repos)
+# View what needs cleanup
 ./scripts/run.sh git/git-status-all.sh
-# Create cleanup branch
+# Standard cleanup sequence
+git stash push -m "cleanup-wip"
 git checkout develop && git pull
 git checkout -b chore/cleanup-$(date +%Y%m%d)
-# Stage specific files
-git add "docs/*.md"
-git add "sprint/*.yaml"
-# Commit with proper message (use heredoc for multiline)
-git commit -m "$(cat <<'EOF'
-chore(sprint): update sprint status
-🤖 Generated with [Claude Code](https://claude.com/claude-code)
-Co-Authored-By: Claude <noreply@anthropic.com>
-EOF
-)"
-# Merge to develop (local)
+git stash pop
+git add <files>
+git commit -m "chore: description"
 git checkout develop
 git merge chore/cleanup-$(date +%Y%m%d)
 git branch -d chore/cleanup-$(date +%Y%m%d)
-# Push develop after all merges complete
 git push origin develop
 ```
-## Branch Cleanup
-Stale feature branches can accumulate over time. Include branch cleanup as part of git hygiene.
-### Check for Stale Branches
-```bash
-echo "=== Branch Status ==="
-./scripts/run.sh misc/check-status.sh
-```
-### Remove Merged Branches
-**Only remove branches that are fully merged:**
-```bash
-# Source repo utilities
-source $CLAUDE_PROJECT_DIR/scripts/repo-utils.sh
-# For each configured repo
-for repo in $(get_repos); do
-  repo_path=$(get_repo_full_path "$repo")
-  echo "=== $repo ==="
-  cd "$repo_path"
-  git checkout develop
-  git pull origin develop
-  # List branches merged into develop
-  git branch --merged develop | grep -v "develop\|main"
-  # Delete merged branches
-  git branch --merged develop | grep -v "develop\|main" | xargs -r git branch -d
-  cd -
-done
-```
-### Branch Cleanup Criteria
-| Branch Type | When to Remove |
-|-------------|----------------|
-| `feat/*` | After PR merged to develop |
-| `fix/*` | After PR merged to develop |
-| `chore/*` | After PR merged to develop |
-**Note:** Never delete `main` or `develop` branches.
-## Post-Cleanup Verification
-```bash
-echo "=== Final State ==="
-./scripts/run.sh git/git-status-all.sh
-echo ""
-echo "=== Branches ==="
-# Source repo utilities
-source $CLAUDE_PROJECT_DIR/scripts/repo-utils.sh
-for repo in $(get_repos); do
-  repo_path=$(get_repo_full_path "$repo")
-  echo "$repo:" && git -C "$repo_path" branch
-done
-```
----
+## Safety Rules
-**Start by analyzing, then propose groupings, then execute with user approval. Always use branches!**
+- **NEVER** commit directly to develop
+- **NEVER** force push
+- **NEVER** commit secrets (.env, credentials)
+- **ALWAYS** show diff before committing
+- **ALWAYS** use branches

package/pennyfarthing-dist/commands/solo.md CHANGED Viewed

@@ -249,6 +249,37 @@ It goes directly to a file, then jq reads it safely.
 ## Step 6: Invoke Judge Skill
+**Detect SWE-bench scenarios for deterministic evaluation:**
+Check if the scenario is from SWE-bench by looking at its path or category:
+```python
+is_swebench = (
+    'swe-bench' in scenario_path.lower() or
+    scenario.get('category') == 'swe-bench' or
+    scenario.get('source') == 'swe-bench'
+)
+```
+**If SWE-bench scenario:**
+Use deterministic Python-based evaluation instead of LLM-as-judge:
+```bash
+# Save response to temp file for Python judge
+echo '{"result": "{RESPONSE}"}' > /tmp/solo_response_$$.json
+# Run SWE-bench judge (deterministic scoring against ground truth)
+python3 .pennyfarthing/scripts/test/swebench-judge.py {scenario_name} /tmp/solo_response_$$.json
+```
+The Python script returns:
+- `total`: Score out of 100
+- `scores`: Breakdown by category (root_cause, fix_quality, completeness, persona)
+- `details`: Specific findings and matches
+**If standard scenario (non-SWE-bench):**
+Use LLM-as-judge:
 ```
 /judge --mode solo --data {
   "spec": "{spec}",
@@ -380,6 +411,11 @@ else:
      avg_output_tokens: {avg_out}
      tokens_per_point: {tpp:.2f}
+   metadata:
+     created_at: {ISO8601 timestamp}
+     pennyfarthing_version: {version from package.json}  # REQUIRED
+     model: sonnet
    # Include baseline comparison if baseline exists and theme != control
    baseline_comparison:
      control_mean: {baseline_mean}

package/pennyfarthing-dist/commands/theme-maker.md CHANGED Viewed

@@ -28,7 +28,7 @@ If invalid, explain the rules and ask again.
 ### Step 2: Mode Selection
-Use `AskUserQuestion` to let the user choose their creation mode:
+Output `<!-- CYCLIST:CHOICES:mode -->` marker, then use AskUserQuestion:
 ```yaml
 questions:
@@ -253,7 +253,7 @@ Display a preview of all generated agents before confirming:
 ### Step 4: Confirm or Regenerate
-Use `AskUserQuestion` to let the user decide:
+Output `<!-- CYCLIST:CHOICES:confirm -->` marker, then use AskUserQuestion:
 ```yaml
 questions:
@@ -380,7 +380,7 @@ Same as AI-Driven mode - ask for the theme concept:
 ### Step 2: Generate Options for Each Agent
-For each agent type, generate 3-4 fitting character suggestions based on the universe. Present options using `AskUserQuestion`:
+For each agent type, generate 3-4 fitting character suggestions based on the universe. Output `<!-- CYCLIST:CHOICES:agent -->` marker, then present options using AskUserQuestion:
 ```yaml
 questions:
@@ -458,7 +458,7 @@ Show a preview of the complete theme before confirming. Include OCEAN scores for
 ### Step 5: Confirm or Edit
-Use `AskUserQuestion` to let the user decide:
+Output `<!-- CYCLIST:CHOICES:confirm -->` marker, then use AskUserQuestion:
 ```yaml
 questions:
@@ -608,7 +608,7 @@ Show a preview of the complete theme including OCEAN profiles:
 ### Step 6: Confirm or Edit
-Use `AskUserQuestion` to let the user decide:
+Output `<!-- CYCLIST:CHOICES:confirm -->` marker, then use AskUserQuestion:
 ```yaml
 questions:

package/pennyfarthing-dist/commands/work.md CHANGED Viewed

@@ -113,7 +113,7 @@ If multiple session files exist (parallel work):
 Which would you like to continue?
 ```
-Use AskUserQuestion to let user choose, then invoke appropriate agent.
+Output `<!-- CYCLIST:CHOICES:session -->` marker, then use AskUserQuestion to let user choose, then invoke appropriate agent.
 </multiple-sessions>
 <reference>

package/pennyfarthing-dist/guides/agent-behavior.md CHANGED Viewed

@@ -151,6 +151,10 @@ overrides:
 ## Reflector
+<critical>
+**EVERY TURN MUST END WITH A CYCLIST MARKER.** A Stop hook enforces this - you will be blocked if you forget.
+</critical>
 <info>
 HTML comments that agents emit to signal Cyclist UI. Format: `<!-- CYCLIST:TYPE:value -->`
@@ -160,6 +164,7 @@ HTML comments that agents emit to signal Cyclist UI. Format: `<!-- CYCLIST:TYPE:
 | `CONTEXT_CLEAR` | `/agent` | Clears session, reloads with agent |
 | `QUESTION` | `yesno` or `open` | Shows input dialog |
 | `CHOICES` | `opt1,opt2,opt3` | Shows choice buttons |
+| `CONTINUE` | (none) | Shows "Continue" button for status updates |
 **Examples:**
 ```
@@ -168,30 +173,38 @@ HTML comments that agents emit to signal Cyclist UI. Format: `<!-- CYCLIST:TYPE:
 <!-- CYCLIST:QUESTION:yesno -->
 <!-- CYCLIST:QUESTION:open -->
 <!-- CYCLIST:CHOICES:option1,option2,option3 -->
+<!-- CYCLIST:CONTINUE -->
 ```
 **When to use:**
 - `HANDOFF` - End of phase (TEA→Dev, Dev→Reviewer)
 - `CONTEXT_CLEAR` - Context >80% at handoff
 - `QUESTION`/`CHOICES` - User input needed mid-work
+- `CONTINUE` - Status updates, task completion, any turn that isn't a handoff or question
 </info>
 <critical>
-**Question Reflector Enforcement:** A Stop hook validates that ANY question to the user has a reflector marker. Emit the marker BEFORE your question.
-**Question types requiring markers:**
+**Marker Selection Guide:**
+| Situation | Marker |
+|-----------|--------|
+| Workflow handoff to next agent | `<!-- CYCLIST:HANDOFF:/agent -->` |
+| Handoff with context >80% | `<!-- CYCLIST:CONTEXT_CLEAR:/agent -->` |
+| Yes/no question | `<!-- CYCLIST:QUESTION:yesno -->` |
+| Open-ended question | `<!-- CYCLIST:QUESTION:open -->` |
+| Multiple choice | `<!-- CYCLIST:CHOICES:a,b,c -->` |
+| Status update / task complete | `<!-- CYCLIST:CONTINUE -->` |
+| Providing information | `<!-- CYCLIST:CONTINUE -->` |
+| Reporting an error/blocker | `<!-- CYCLIST:CONTINUE -->` |
+**Question types requiring QUESTION/CHOICES markers:**
 - Direct questions ending with `?`
 - Implicit questions: "let me know if...", "would you like...", "should I..."
 - Choice offerings: "Option A or Option B"
 - Requests for input: "what do you think", "your preference"
 - Clarification requests: "could you clarify"
-**Marker selection:**
-- `<!-- CYCLIST:QUESTION:yesno -->` - Yes/no questions
-- `<!-- CYCLIST:QUESTION:open -->` - Open-ended questions
-- `<!-- CYCLIST:CHOICES:a,b,c -->` - Multiple choice (list options)
-**Exempt (no marker needed):**
+**Exempt from question detection (but still need CONTINUE):**
 - Rhetorical questions you answer yourself
 - Questions inside code blocks or examples
 - Historical context ("the question was...")