npm: agentplane 0.6.7 → 0.6.9

This diff shows the content changes between two publicly released versions of the package as they appear in the public registry. It is provided for informational purposes only.

Files changed (8)

package/assets/AGENTS.md +4 -3
package/assets/agents/EVALUATOR.json +6 -6
package/assets/codex-plugin/skills/agentplane/SKILL.md +7 -4
package/assets/evaluators/recovery-context.md +7 -1
package/assets/policy/workflow.branch_pr.md +2 -0
package/dist/.build-manifest.json +3 -3
package/dist/cli.js +646 -630
package/package.json +3 -3

package/assets/AGENTS.md

@@ -20,7 +20,7 @@ Detailed procedures live in canonical modules from `## CANONICAL DOCS`.
 - Repository type: user project initialized with `agentplane`.
 - CLI rule: prefer `ap` for compact agent-oriented commands; fall back to `agentplane`; if neither is available, stop and request installation guidance (do not invent repo-local entrypoints).
-- Startup shortcut: run `## COMMANDS -> Preflight`, then use `ap quickstart`; activate `ap role ORCHESTRATOR` for planning and `ap role <ROLE>` for the active owner before owner-scoped execution; then apply `## LOAD RULES` before any mutation. The guarded route is determined by `workflow.mode` in `.agentplane/WORKFLOW.md`; use `ap quickstart` as the canonical summary of the active path before mutating. In `branch_pr`, start from `ap work start ... --worktree`; in `direct`, stay in the current checkout and use the task lifecycle route.
+- Startup shortcut: run `## COMMANDS -> Preflight`, then use `ap quickstart`; activate `ap role ORCHESTRATOR` for planning and `ap role <ROLE>` for the active owner before owner-scoped execution; then apply `## LOAD RULES` before any mutation. The guarded route is determined by `workflow.mode` in `.agentplane/WORKFLOW.md`; use `ap task brief <task-id>` and the emitted next command before manually assembling route commands. In `branch_pr`, start from the emitted `work start` route command or `ap work start ... --worktree`; in `direct`, stay in the current checkout and use the task lifecycle route.
 <!-- /ap:fragment -->
 <!-- ap:fragment id="gateway.agents.source_of_truth.sources.of.truth" slot="source_of_truth" mutability="replaceable" -->
@@ -85,7 +85,7 @@ git commit -m "Implement <task>"
 ap task verify-show <task-id>
 ap pr open <task-id> --branch task/<task-id>/<slug> --author <ROLE>
 ap verify <task-id> --ok|--rework --by <ROLE> --note "..."
-ap verify <task-id> --ok|--rework --by EVALUATOR --note "..." # verify --by EVALUATOR
+ap evaluator run <task-id> --verdict pass|rework|blocked|human_review --summary "..." --finding "..." --evidence <path-or-check>
 ap integrate <task-id> --branch task/<task-id>/<slug> --run-verify
 ap finish <task-id> --author INTEGRATOR --body "Verified: ..." --result "..." --commit <git-rev> --close-commit
 ```
@@ -95,6 +95,7 @@ ap finish <task-id> --author INTEGRATOR --body "Verified: ..." --result "..." --
 ```bash
 ap vshow <task-id>
 ap verify <task-id> --ok|--rework --by <ROLE> --note "..." [--observation "..." --impact "..." --resolution "..."] [--local-only]
+ap evaluator run <task-id> --verdict pass|rework|blocked|human_review --summary "..." --finding "..." --evidence <path-or-check> [--missing-test "..." --hidden-assumption "..." --residual-risk "..."]
 ap incidents advise <task-id>
 ap incidents collect <task-id> --check
 ap doctor
@@ -116,7 +117,7 @@ node .agentplane/policy/check-routing.mjs
 ## SHARED PROMPT CONTRACT
 - Outcome-first, concise, evidence-first: state goal, success criteria, constraints, stop rules, and output; use procedure only for command contracts, state machines, or irreversible gates; ask one narrow question only when missing information changes scope, task graph, security, or irreversible action.
-- Retrieval/progress/cache: preamble before multi-step or tool-heavy work; load only matched policy, task README, Verify Steps, and relevant files; use incidents only for analogous scope/tags; final output names actions, checks, blockers/drift, and next approval; keep stable gateway/policy/role before dynamic context and never cache mutable task state.
+- Retrieval/progress/cache: preamble before multi-step or tool-heavy work; use `ap task active` and `ap task brief <task-id>` before manually combining task docs, route status, Verify Steps, PR metadata, and policy notes; load only matched policy, task README, Verify Steps, and relevant files; use incidents only for analogous scope/tags; final output names actions, checks, blockers/drift, and next approval; keep stable gateway/policy/role before dynamic context and never cache mutable task state.
 <!-- /ap:fragment -->
 <!-- ap:fragment id="gateway.user.instructions" slot="body" mutability="append_only" -->
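
The `ap evaluator run` usage lines added in the hunks above can be read as a command contract. The following is a minimal sketch of assembling that command as an argument list, using only the flags and verdict values that appear in the diff. The helper name `build_evaluator_run` is hypothetical, and passing each flag exactly once is an assumption; the diff does not show whether flags such as `--finding` may be repeated.

```python
import shlex

# Verdict values listed in the updated `ap evaluator run` usage line.
ALLOWED_VERDICTS = ("pass", "rework", "blocked", "human_review")


def build_evaluator_run(task_id, verdict, summary, finding, evidence,
                        missing_test=None, hidden_assumption=None,
                        residual_risk=None):
    """Return an argv list for `ap evaluator run` (hypothetical helper)."""
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unsupported verdict: {verdict!r}")
    argv = ["ap", "evaluator", "run", task_id,
            "--verdict", verdict,
            "--summary", summary,
            "--finding", finding,
            "--evidence", evidence]
    # Optional flags shown in brackets in the verification command list.
    optional = (("--missing-test", missing_test),
                ("--hidden-assumption", hidden_assumption),
                ("--residual-risk", residual_risk))
    for flag, value in optional:
        if value is not None:
            argv += [flag, value]
    return argv


if __name__ == "__main__":
    cmd = build_evaluator_run("<task-id>", "rework", "Verify Steps unmet",
                              "No test covers the failure path",
                              "<path-or-check>")
    # Print a shell-quoted form for inspection; nothing is executed.
    print(shlex.join(cmd))
```

Building an argument list rather than a single string keeps the quoting of free-text values such as the summary out of the caller's hands.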

package/assets/agents/EVALUATOR.json

@@ -8,19 +8,19 @@
     "reference.behavior": "Optional reference behavior for prompt/module/recipe evals, including expected outputs, hard gates, scoring rubric, and promotion policy."
   },
   "outputs": {
-    "verdict": "One of pass, rework, or blocked, with the criteria and evidence that determined the result.",
+    "verdict": "One of pass, rework, blocked, or human_review, with the criteria and evidence that determined the result.",
     "rework.context": "Focused instructions for the next runner pass when criteria are not yet satisfied.",
-    "quality.report": "Deterministic gate results, LLM quality assessment when requested, residual risks, and promotion/finish recommendation."
+    "quality.report": "Structured `ap evaluator run` report with findings, evidence_refs, missing_tests, hidden_assumptions, residual_risks, evaluated_sha, and promotion/finish recommendation."
   },
   "permissions": {
     "review.artifacts": "Read task documentation, runner artifacts, diffs, reports, and eval outputs.",
-    "task.verification": "Record verification or rework through `ap` when the active workflow authorizes evaluator-scoped updates."
+    "task.verification": "Record evaluator-scoped quality_review through `ap evaluator run`; use ordinary `agentplane verify --by EVALUATOR` only for legacy/manual records that are not sufficient for finish/integrate gates."
   },
   "workflow": {
     "goal": "Goal: decide whether the latest task or eval attempt satisfies the documented quality contract without relying on the runner's self-claim alone.",
-    "success.criteria": "Success criteria: required task sections and Verify Steps are mapped to concrete evidence; result manifest and artifacts are structurally valid; hard policy/security/lifecycle gates pass; LLM quality scoring is used only where the approved rubric asks for judgement; context.maximum_assimilation work is checked for source-shaped wiki topology, useful Obsidian-compatible wikilinks, page granularity, line-addressed provenance, coverage gaps, glossary alias safety, raw-deletion resilience, and leakage risk; the final verdict is reproducible from cited evidence.",
+    "success.criteria": "Success criteria: required task sections and Verify Steps are mapped to concrete evidence; result manifest and artifacts are structurally valid; hard policy/security/lifecycle gates pass; pass reviews include non-empty findings and a quality-report.json evidence ref written by `ap evaluator run`; LLM quality scoring is used only where the approved rubric asks for judgement; context.maximum_assimilation work is checked for source-shaped wiki topology, useful Obsidian-compatible wikilinks, page granularity, line-addressed provenance, coverage gaps, glossary alias safety, raw-deletion resilience, and leakage risk; the final verdict is reproducible from cited evidence.",
     "constraints": "Constraints: use loaded gateway and policy modules as binding constraints; separate deterministic gates from LLM judgement; do not edit implementation files; do not finish or integrate tasks unless the approved plan explicitly assigns evaluator closure; preserve raw trace/artifact paths instead of copying assistant prose into task docs.",
-    "stop.rules": "Stop rules: mark blocked when evidence is missing, stale, unverifiable, policy-sensitive, or outside approved scope; mark rework when criteria are testable but unmet; require human approval before changing pass criteria, promotion thresholds, or security-sensitive interpretation.",
-    "output": "Output: verdict, failed or satisfied criteria, evidence paths, LLM judgement summary when used, rework context for the next runner pass, and finish/promote recommendation."
+    "stop.rules": "Stop rules: mark blocked when evidence is missing, stale, unverifiable, policy-sensitive, outside approved scope, or cannot be tied to the reviewed commit; mark rework when criteria are testable but unmet; require human approval before changing pass criteria, promotion thresholds, or security-sensitive interpretation.",
+    "output": "Output: run `ap evaluator run <task-id> --verdict <pass|rework|blocked|human_review> --summary \"...\" --finding \"...\" --evidence <path-or-check>` or provide the exact equivalent command; include failed or satisfied criteria, evidence paths, missing tests, hidden assumptions, residual risks, rework context for the next runner pass, and finish/promote recommendation."
   }
 }
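
The updated success criteria state that a verdict is one of four values and that pass reviews include non-empty findings and a `quality-report.json` evidence ref. A minimal sketch of those two checks follows. It assumes the report is available as a dictionary with `verdict`, `findings`, and `evidence_refs` keys and that evidence refs are path-like strings; the actual schema written by `ap evaluator run` is not shown in this diff, and `check_review` is a hypothetical name.

```python
# Verdict values named in the updated "verdict" output description.
ALLOWED_VERDICTS = {"pass", "rework", "blocked", "human_review"}


def check_review(report):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    verdict = report.get("verdict")
    if verdict not in ALLOWED_VERDICTS:
        problems.append(f"unknown verdict: {verdict!r}")
    if verdict == "pass":
        # Pass reviews need at least one finding.
        if not report.get("findings"):
            problems.append("pass review has no findings")
        # Pass reviews need an evidence ref pointing at quality-report.json
        # (assumed here to be a string ending with that file name).
        refs = report.get("evidence_refs") or []
        if not any(str(ref).endswith("quality-report.json") for ref in refs):
            problems.append("pass review lacks a quality-report.json evidence ref")
    return problems
```

Non-pass verdicts are only checked for membership in the allowed set, since the diff attaches the findings and evidence requirements to pass reviews specifically.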

package/assets/codex-plugin/skills/agentplane/SKILL.md

@@ -17,15 +17,18 @@ Use AgentPlane through its CLI instead of editing `.agentplane/` state directly.
 1. If the repository is not initialized, run `ap init` or `agentplane init`.
 2. Run `ap quickstart`.
-3. Inspect `AGENTS.md`, `ap task list`, `git status --short --untracked-files=no`, and `git rev-parse --abbrev-ref HEAD`.
-4. Use `ap role ORCHESTRATOR` while planning and approvals are active.
-5. Switch to `ap role <ROLE>` before owner-scoped execution or verification.
+3. Inspect `AGENTS.md`, `ap task list`, `ap task active`, `git status --short --untracked-files=no`, and `git rev-parse --abbrev-ref HEAD`.
+4. Use `ap task brief <task-id>` before owner-scoped execution; add `--remote` only when hosted PR/check/review state is needed.
+5. Use `ap role ORCHESTRATOR` while planning and approvals are active.
+6. Switch to `ap role <ROLE>` before owner-scoped execution or verification.
 ## Rules
 - Treat `AGENTS.md`, `ap quickstart`, and `ap role <ROLE>` as the policy surface.
 - Use `ap task ...`, `ap work ...`, `ap verify ...`, and `ap finish ...`; do not edit `.agentplane/tasks.json` manually.
-- In `branch_pr`, start from `ap work start <task-id> --agent <ROLE> --slug <slug> --worktree`.
+- Prefer `ap task brief <task-id>` and `ap task next-action <task-id> --explain` over manually combining task docs, route status, Verify Steps, PR metadata, and policy notes.
+- In `branch_pr`, use the concrete route command emitted by `task brief` or `task next-action` when available; fall back to `ap work start <task-id> --agent <ROLE> --slug <slug> --worktree` only as the low-level command contract.
+- Treat weak `source_confidence` or non-ready `verify_steps_quality` as a context gap to resolve before mutation.
 - Keep repository artifacts in English unless the user explicitly requests another language for a specific artifact.
 - Record verification evidence in the task README and through `ap verify`.
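
The new rule about `source_confidence` and `verify_steps_quality` amounts to a gate before mutation. One way to sketch it is shown below, assuming the brief is available as a dictionary and that the literal values are `"weak"` and `"ready"`. The output format of `ap task brief` is not shown in this diff, so both the shape and the values are assumptions, and `context_gaps` is a hypothetical name.

```python
def context_gaps(brief):
    """Return the context gaps to resolve before mutating (empty if none)."""
    gaps = []
    # A weak source_confidence counts as a gap.
    if brief.get("source_confidence") == "weak":
        gaps.append("source_confidence is weak")
    # Anything other than "ready" counts as a gap, including a missing field.
    if brief.get("verify_steps_quality") != "ready":
        gaps.append("verify_steps_quality is not ready")
    return gaps
```

In this sketch a missing `verify_steps_quality` is treated as non-ready, while a missing `source_confidence` is not treated as weak; a stricter gate could flag both.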

package/assets/evaluators/recovery-context.md

@@ -29,15 +29,21 @@ Use this evaluator only when the primary implementation path already produced a
 4. Inspect concurrency-sensitive paths and classify whether observed drift belongs to active agent work, stale handoff, or unrelated workspace drift.
 5. Identify missing tests, missing docs, or verification that only proves the happy path.
 6. Do not execute fixes. Return review findings only.
+7. When recording the result, use `agentplane evaluator run <task-id>` so the task gets prompt,
+   `quality-report.json`, and opinion artifacts. A bare `verify --by EVALUATOR` note is legacy
+   evidence and is not sufficient for finish/integrate gates.
 ## Output
 Return a concise structured review:
-- `verdict`: `pass`, `rework`, or `blocked`.
+- `verdict`: `pass`, `rework`, `blocked`, or `human_review`.
 - `findings`: ordered by severity, each with file/path evidence and the broken invariant.
+- `evidence_refs`: concrete files, checks, PRs, traces, or reports inspected; pass reviews must
+  include the generated `quality-report.json`.
 - `missing_tests`: concrete tests or checks that would have caught the issue.
 - `hidden_assumptions`: assumptions the implementation relies on but did not prove.
+- `residual_risks`: known risks after the review.
 - `recovery_context`: what the next agent should know only if normal context is insufficient.
 ## Stop Rules

package/assets/policy/workflow.branch_pr.md

@@ -58,6 +58,8 @@ Default branch names are `task/<task-id>/<slug>` for implementation branches and
 through `branch.task_prefix` and `branch.task_close_prefix`; task id, slug, and sha positions remain
 fixed.
+Before manually filling `<slug>` or `<branch>`, use `agentplane task brief <task-id>` or `agentplane task next-action <task-id> --explain` and prefer the emitted concrete command.
 <!-- /ap:fragment -->
 <!-- ap:fragment id="policy.workflow.branch_pr.hard_constraint.constraints" slot="hard_constraint" mutability="append_only" -->

package/dist/.build-manifest.json

@@ -2,7 +2,7 @@
   "schema_version": 1,
   "manifest_kind": "package",
   "package_name": "agentplane",
-  "package_version": "0.6.7",
-  "git_head": "5937638eb63301ff4f249104e4d130ee19d378d3",
-  "watched_runtime_snapshot_hash": "d4f774d913105cf1fc015861fad16769e44ebd5371c12a92f1b7e5c83dd6a941"
+  "package_version": "0.6.9",
+  "git_head": "c64a147ee51ff07019a834b314face264cabb948",
+  "watched_runtime_snapshot_hash": "615cd705f0e53934ac27d05278232358ece4cef3ec436e27ed514c314755d66b"
 }