npm - @phamvuhoang/otto-core - Versions diffs - 0.4.1 → 0.5.0 - Mend

@phamvuhoang/otto-core 0.4.1 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/package.json +1 -1
package/templates/acceptance-prompts.md +42 -0
package/templates/apply-review.md +18 -1
package/templates/ghprompt-workflow.md +9 -0
package/templates/linear-completion.md +10 -0
package/templates/quality-report.md +80 -0
package/templates/review-lens.md +1 -0
package/templates/verify.md +17 -21

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@phamvuhoang/otto-core",
-  "version": "0.4.1",
+  "version": "0.5.0",
   "description": "Claude Code AFK orchestration: iteration loop, native-sandbox runner, template renderer.",
   "type": "module",
   "license": "MIT",

package/templates/acceptance-prompts.md ADDED

@@ -0,0 +1,42 @@
+<!--
+  Per-mode human-acceptance prompts for the Otto quality report. Included ONCE by
+  quality-report.md, so every run mode inherits the same set through the single
+  existing contract include — never re-describe these per template (the same
+  drift-proofing as the contract itself). The generic Human Acceptance Checklist
+  stays; these add the task-fulfillment questions specific to the run's Mode.
+-->
+**Mode-specific acceptance prompts.** Beyond the generic checklist, fold the
+prompts for **your Mode** (from Task Source) into the Human Acceptance Checklist.
+Answer each with cited evidence, or mark it an explicit gap — never drop one
+silently.
+### afk — plan/PRD completion
+- [ ] Every PRD acceptance criterion is met or explicitly deferred.
+- [ ] All plan tasks are checked off, or the unchecked ones are recorded as gaps.
+- [ ] The product behavior is demonstrable, not just coded.
+### ghafk — GitHub issue burn-down
+- [ ] The change resolves what the issue actually asked, not an adjacent reading.
+- [ ] Work is scoped to this issue; unrelated changes are called out.
+- [ ] The issue will close cleanly when the PR merges (PR/issue links cited).
+### linear-afk — Linear issue burn-down
+- [ ] The change resolves the Linear issue's stated intent.
+- [ ] The comment cites the branch/PR and the explicit human next step.
+- [ ] The issue is left in the correct state (OPEN for PR-based repos).
+### apply-review — external review repair
+- [ ] Every CONFIRMED finding was actually fixed, not just acknowledged.
+- [ ] The fixes introduced no regression (suites re-run green).
+- [ ] Deferred / rejected findings are recorded with a reason.
+### verify — read-only verification
+- [ ] Each task's claimed status matches committed reality (evidence cited).
+- [ ] Suite results are current, not stale.
+- [ ] Gaps and deferrals are honest, not optimistic.
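To make the per-mode rule concrete, here is a minimal TypeScript sketch, assuming the prompts are held in a lookup table keyed by Mode: the generic checklist is always kept, only the current Mode's prompts are appended, and a prompt without cited evidence is rendered as an explicit gap instead of being dropped. `Mode`, `MODE_PROMPTS`, `checklistFor`, and `renderItem` are hypothetical names; the package itself ships these prompts as a Markdown template, not as code.

```typescript
// Hypothetical sketch of the per-mode lookup; otto-core renders Markdown
// templates and does not necessarily contain a table like this.
type Mode = "afk" | "ghafk" | "linear-afk" | "apply-review" | "verify";

// Generic Human Acceptance Checklist items, kept for every mode.
const GENERIC: readonly string[] = [
  "Solves the stated problem.",
  "Behavior is observable or explained.",
  "Scope is appropriate.",
  "Docs/examples are updated when needed.",
  "Risks and assumptions are clear.",
];

// Mode-specific prompts, copied from acceptance-prompts.md.
const MODE_PROMPTS: Record<Mode, readonly string[]> = {
  afk: [
    "Every PRD acceptance criterion is met or explicitly deferred.",
    "All plan tasks are checked off, or the unchecked ones are recorded as gaps.",
    "The product behavior is demonstrable, not just coded.",
  ],
  ghafk: [
    "The change resolves what the issue actually asked, not an adjacent reading.",
    "Work is scoped to this issue; unrelated changes are called out.",
    "The issue will close cleanly when the PR merges (PR/issue links cited).",
  ],
  "linear-afk": [
    "The change resolves the Linear issue's stated intent.",
    "The comment cites the branch/PR and the explicit human next step.",
    "The issue is left in the correct state (OPEN for PR-based repos).",
  ],
  "apply-review": [
    "Every CONFIRMED finding was actually fixed, not just acknowledged.",
    "The fixes introduced no regression (suites re-run green).",
    "Deferred / rejected findings are recorded with a reason.",
  ],
  verify: [
    "Each task's claimed status matches committed reality (evidence cited).",
    "Suite results are current, not stale.",
    "Gaps and deferrals are honest, not optimistic.",
  ],
};

// Generic items first, then only the prompts for this run's Mode.
function checklistFor(mode: Mode): string[] {
  return [...GENERIC, ...MODE_PROMPTS[mode]];
}

// Every prompt is answered with cited evidence or marked as an explicit gap;
// the box stays unchecked because acceptance is the human's call.
function renderItem(prompt: string, evidence?: string): string {
  return evidence
    ? `- [ ] ${prompt} Evidence: ${evidence}`
    : `- [ ] ${prompt} Gap: no evidence cited.`;
}
```

Keeping the generic list separate from the per-mode table mirrors the template's "the generic checklist stays; these add" split, so one edit to either list reaches every mode.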

package/templates/apply-review.md CHANGED

@@ -28,7 +28,7 @@
 `<review-doc>` names a code-review document (a file path). `Read` it. It contains findings, usually with severities. Your job is to fix the actionable ones — ONE finding per iteration — and track the rest.
-When every actionable finding has been addressed (fixed, or already fixed in git, or recorded as a follow-up), output `<promise>NO MORE TASKS</promise>`.
+When every actionable finding has been addressed (fixed, or already fixed in git, or recorded as a follow-up), produce the completion report (see COMPLETION REPORT below), then output `<promise>NO MORE TASKS</promise>`.
 # TRIAGE
@@ -66,6 +66,23 @@ Make a single `git commit -am` with a short message:
 - Body: which finding (and its review section), key decision, and a one-line note of any follow-ups recorded.
 - No file lists, no `Co-Authored-By`.
+# COMPLETION REPORT
+Only on the final iteration — when every actionable finding has been addressed
+and you are about to output the sentinel — hand the maintainer one readable
+summary of the whole review-fix round. Do NOT emit it per-iteration. Map the
+contract below onto this round:
+- **What Changed / Evidence:** the findings you CONFIRMED and fixed, each with
+  its `fix(review):` commit SHA and the review section it came from; the
+  feedback loops you ran (tests / typecheck) and their result.
+- **Gaps And Follow-Ups:** findings you DEFERRED to `./.otto/review-followups.md`
+  (with why), and any REJECTED / won't-fix findings with their reason. Verdict
+  defaults to **Needs human review** when any actionable finding was left
+  unfixed.
+@include:quality-report.md
 # FINAL RULES
 ONLY ADDRESS A SINGLE FINDING per iteration.
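The COMPLETION REPORT mapping can be sketched as a small TypeScript function, assuming each finding is tracked with a status and, when fixed, its `fix(review):` commit SHA. The `Finding` shape, the status names, and `mapFindings` are hypothetical. The function only yields the default the template prescribes (**Needs human review** when an actionable finding was left unfixed) and otherwise leaves the verdict to the agent's judgment.

```typescript
// Hypothetical sketch; status names loosely mirror the template's
// CONFIRMED-and-fixed / DEFERRED / REJECTED wording.
type Status = "fixed" | "deferred" | "rejected";

interface Finding {
  id: string;          // e.g. the review section the finding came from
  actionable: boolean;
  status: Status;
  sha?: string;        // fix(review): commit SHA, when fixed
  reason?: string;     // why it was deferred or rejected
}

type Verdict =
  | "Accepted"
  | "Accepted with follow-ups"
  | "Needs human review"
  | "Rejected";

interface CompletionSections {
  whatChanged: string[]; // fixed findings with their SHAs
  gaps: string[];        // deferred/rejected findings with their reasons
  // undefined means no forced default: the agent still picks one honest verdict.
  defaultVerdict?: Verdict;
}

function mapFindings(findings: Finding[]): CompletionSections {
  const whatChanged: string[] = [];
  const gaps: string[] = [];
  for (const f of findings) {
    if (f.status === "fixed") {
      whatChanged.push(`${f.id}: fixed in ${f.sha ?? "<missing SHA>"}`);
    } else {
      gaps.push(`${f.id}: ${f.status} (${f.reason ?? "<missing reason>"})`);
    }
  }
  // Any actionable finding left unfixed forces the cautious default.
  const unfixed = findings.some((f) => f.actionable && f.status !== "fixed");
  return {
    whatChanged,
    gaps,
    defaultVerdict: unfixed ? "Needs human review" : undefined,
  };
}
```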

package/templates/ghprompt-workflow.md CHANGED

@@ -59,6 +59,15 @@ Committing the code is NOT necessarily the end of the run. How work "ships" depe
 When unsure which applies, prefer leaving the issue OPEN and surfacing the branch — never close an issue whose work has not landed on the default branch.
+## Quality report (completion handoff)
+Whatever the completion surface, hand the maintainer **one readable Otto quality report** so they can accept, reject, or request follow-up without replaying the run log — green tests alone are not the handoff. Emit it into the completion surface and cite concrete links/SHAs:
+- **PR-based repo:** put the report in the **PR description** (create or refresh it there) and reference it from any issue comment. Cite the PR URL, the issue link, and the commit SHAs on this branch.
+- **Commit-to-branch repo:** put the report in the **issue comment**, citing the branch and the commit SHAs.
+@include:quality-report.md
 # LEARNINGS
 The repo's accumulated learnings are in the `<learnings>` block — durable, reusable knowledge from prior iterations (conventions, gotchas, decisions and their why, dead ends). Consult it during EXPLORATION and IMPLEMENTATION so you don't relearn what's known or repeat a dead end.
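A minimal sketch of the quality-report placement rule above, assuming the repo's convention (PR-based or commit-to-branch) is already known. `placeReport`, `RunContext`, `Placement`, and their fields are illustrative names, not part of otto-core.

```typescript
// Hypothetical sketch of where the quality report goes in GitHub runs.
type RepoConvention = "pr-based" | "commit-to-branch";

interface RunContext {
  issueUrl: string;
  branch: string;
  shas: string[];   // commit SHAs on this branch
  prUrl?: string;   // required for PR-based repos
}

interface Placement {
  surface: "pr-description" | "issue-comment";
  // Whether the issue comment carries the full report or just points to it.
  issueComment: "full-report" | "reference";
  cite: string[];   // links/SHAs the report must cite
}

function placeReport(conv: RepoConvention, ctx: RunContext): Placement {
  if (conv === "pr-based") {
    if (!ctx.prUrl) {
      // The template says to create or refresh the PR description first.
      throw new Error("PR-based repo: open the PR before writing the report");
    }
    return {
      surface: "pr-description",
      issueComment: "reference",
      cite: [ctx.prUrl, ctx.issueUrl, ...ctx.shas],
    };
  }
  // Commit-to-branch: the issue comment is the report, citing branch + SHAs.
  return {
    surface: "issue-comment",
    issueComment: "full-report",
    cite: [ctx.branch, ...ctx.shas],
  };
}
```

Throwing when a PR-based run has no PR URL reflects the template's instruction to create or refresh the PR description rather than silently falling back to an issue comment.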

package/templates/linear-completion.md CHANGED

@@ -15,3 +15,13 @@ every Linear write — never raw GraphQL, and never `gh`:
   the issue for a human to move.
 When unsure which convention applies, comment and leave the issue OPEN.
+## Quality report placement (Linear)
+The FINISHING handoff above already defines the **Otto quality report** shape —
+do not re-describe it here. On Linear the **comment body IS that report**: write
+the full quality report (verdict, task source, what changed, evidence, human
+acceptance checklist, gaps/follow-ups) to a file and post it with
+`otto-linear comment <ref> --body-file <path>`, citing the branch/PR, the commit
+SHAs, the checks run, and the explicit human next step. For this PR-based repo
+that comment is the handoff surface — the issue stays OPEN until the PR merges.
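The Linear handoff could be guarded like this: check that the comment cites everything the template requires, then build the `otto-linear comment <ref> --body-file <path>` invocation. Only that command and its `--body-file` flag come from the template; `LinearHandoff`, `missingCitations`, and `linearCommentArgv` are hypothetical, and writing the report file and spawning the process are left to the caller.

```typescript
// Hypothetical sketch. Only `otto-linear comment <ref> --body-file <path>`
// comes from the template; the types and helpers are illustrative.
interface LinearHandoff {
  ref: string;         // Linear issue ref
  report: string;      // full quality report Markdown (the comment body)
  branchOrPr: string;  // branch name or PR URL
  shas: string[];      // commit SHAs
  checks: string[];    // checks run, with results
  nextStep: string;    // explicit human next step
}

// Lists the citations the template requires that are absent.
function missingCitations(h: LinearHandoff): string[] {
  const missing: string[] = [];
  if (!h.report.trim()) missing.push("report body");
  if (!h.branchOrPr) missing.push("branch/PR");
  if (h.shas.length === 0) missing.push("commit SHAs");
  if (h.checks.length === 0) missing.push("checks run");
  if (!h.nextStep) missing.push("human next step");
  return missing;
}

// Returns the command followed by its arguments. The caller is assumed to
// have written `h.report` to `bodyPath` before running it.
function linearCommentArgv(h: LinearHandoff, bodyPath: string): string[] {
  const missing = missingCitations(h);
  if (missing.length > 0) {
    throw new Error(`quality report is missing: ${missing.join(", ")}`);
  }
  return ["otto-linear", "comment", h.ref, "--body-file", bodyPath];
}
```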

package/templates/quality-report.md ADDED

@@ -0,0 +1,80 @@
+<!--
+  The Otto quality report contract. ONE readable verification artifact, reused
+  across every run mode (verify / afk / ghafk / linear-afk / apply-review) by
+  @include — never re-describe the shape per template, or the provider workflows
+  drift apart. Readable first; every claim cites concrete proof.
+-->
+Produce an **Otto quality report** with the exact section headings below. Rules:
+- **Readable first.** Keep it short enough to review in a couple of minutes — a
+  maintainer should not have to replay the run log. Specific beats exhaustive.
+- **Cite evidence for every claim.** A `file:line`, a commit SHA, a command +
+  its result, a report section, or an issue/PR link — never a vague assertion.
+- **Tests are evidence, not the verdict.** Green checks go in the Evidence
+  section; they do not by themselves make the verdict Accepted.
+- **Pick one honest verdict. When evidence is thin, scope is uncertain, or you
+  are unsure, choose _Needs human review_ — never self-declare _Accepted_.**
+  Model self-evaluation does not replace human review.
+```markdown
+# Otto quality report
+## Verdict
+One of — **Accepted** · **Accepted with follow-ups** · **Needs human review** · **Rejected**
+(when uncertain, choose **Needs human review**)
+## Task Source
+- Mode: <afk | ghafk | linear-afk | apply-review | verify>
+- Source: <plan/PRD path, GitHub issue #, or Linear ref>
+- Issue or plan: <link or path>
+## What Changed
+- Summary: <one or two sentences — what was actually done>
+- Commits: <SHAs on this branch>
+- Files: <paths touched>
+## Evidence
+- Implementation evidence: <file:line or commit proving each claim>
+- Test/typecheck evidence: <commands run + pass/fail counts>
+- Manual or acceptance evidence: <what was observed, or "none">
+## Human Acceptance Checklist
+- [ ] Solves the stated problem.
+- [ ] Behavior is observable or explained.
+- [ ] Scope is appropriate.
+- [ ] Docs/examples are updated when needed.
+- [ ] Risks and assumptions are clear.
+## Gaps And Follow-Ups
+- Gap: <known gap that remains, or "none">
+- Deferred: <intentionally not done in this run + why, or "none">
+- Recommended next action: <what a maintainer should do next>
+```
+### Human verdict trail
+Prior **human** verdicts on past Otto runs (most recent last) — consult them so a
+recurring reason ("scope creep", "thin evidence") informs *this* run's Verdict
+and *Recommended next action* before you commit to one:
+<verdict-trail>
+!?`cat ./.otto/verdicts.md|||_No human verdicts recorded yet._`
+</verdict-trail>
+**Maintainer:** after reviewing this report, append your verdict to
+`./.otto/verdicts.md` (create it lazily) — a dated `##` heading plus one line:
+the human verdict (**Accepted** · **Accepted with follow-ups** · **Rejected** ·
+**Needs investigation**) and *why* (what was accepted with caveats, or the
+concrete reason it was rejected). The file is git-tracked; it feeds the existing
+learning loop, so future runs see what was accepted or rejected and why.
+@include:acceptance-prompts.md
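Because the contract demands the exact section headings, a report's shape can be checked mechanically. The following TypeScript validator is a hypothetical sketch, assuming each heading sits on its own line and the verdict is the first non-empty line under `## Verdict` (optionally bolded). `checkReport` is not part of otto-core.

```typescript
// Hypothetical sketch: a mechanical check of the quality report's shape.
// otto-core does not necessarily ship a validator like this.
const REQUIRED_HEADINGS: readonly string[] = [
  "# Otto quality report",
  "## Verdict",
  "## Task Source",
  "## What Changed",
  "## Evidence",
  "## Human Acceptance Checklist",
  "## Gaps And Follow-Ups",
];

const VERDICTS: readonly string[] = [
  "Accepted",
  "Accepted with follow-ups",
  "Needs human review",
  "Rejected",
];

interface ReportCheck {
  ok: boolean;
  problems: string[];
}

function checkReport(markdown: string): ReportCheck {
  const lines = markdown.split(/\r?\n/).map((l) => l.trim());
  const problems: string[] = [];

  // Headings must appear exactly as written and in contract order.
  let cursor = 0;
  for (const heading of REQUIRED_HEADINGS) {
    const idx = lines.indexOf(heading, cursor);
    if (idx === -1) {
      problems.push(`missing or out of order: ${heading}`);
    } else {
      cursor = idx + 1;
    }
  }

  // The first non-empty line under "## Verdict" must be exactly one verdict.
  const v = lines.indexOf("## Verdict");
  const first = v === -1 ? undefined : lines.slice(v + 1).find((l) => l !== "");
  const verdict = (first ?? "").replace(/\*/g, "").trim();
  if (VERDICTS.indexOf(verdict) === -1) {
    problems.push(`verdict is not one of the allowed values: "${verdict}"`);
  }

  return { ok: problems.length === 0, problems };
}
```

Passing this check shows only that the shape is right, not that the cited evidence is real, which is why the contract still routes acceptance through the human checklist and verdict trail.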

package/templates/review-lens.md CHANGED

@@ -27,6 +27,7 @@ You review the most recent commit (HEAD) through ONE lens only: **{{ LENS }}**.
 - `correctness` — bugs, regressions, broken logic, unhandled edge cases.
 - `security` — input validation, secrets, injection, auth bypass.
 - `tests` — coverage gaps for the changed code; missing/weak assertions.
+- `task-fit` — did the change solve the **right problem**? Does it map back to the source plan/issue, stay in scope (no unrequested extras, no missed sub-task), and leave a reviewer-useful trail (clear commit, evidence, surfaced gaps)? Flag scope drift, unaddressed acceptance criteria, and work that is mechanically correct but doesn't fulfil the task.
 If `<head>` shows `(no commits)`, output `<lens>SKIP</lens>` and stop.

package/templates/verify.md CHANGED

@@ -45,30 +45,26 @@ Put every task in exactly one bucket:
 # REPORT
-Write your report to `.otto-tmp/verify-report.md` using the `Write` tool (this path is gitignored scratch — it is the one write you may make). Structure it:
+Write your report to `.otto-tmp/verify-report.md` using the `Write` tool (this path is gitignored scratch — it is the one write you may make). Use the Otto quality report contract below: fold the RECONCILE/CLASSIFY results into it — DONE tasks (with their `file:line`/SHA evidence) into **What Changed** + **Evidence**, the suite pass/fail counts into the Test/typecheck evidence line, and GAP/DEFERRED tasks into **Gaps And Follow-Ups**.
-```
-# Verify report
+@include:quality-report.md
-## Verdict
+# CROSS-RUN QUALITY SUMMARY (READ-ONLY)
-<one-line: all done / N gaps / N deferred>
+Beyond *this* run, give the maintainer a quality rollup **across** runs so they can
+spot recurring output-quality failures without reading every NDJSON log. `Read`
+`./.otto/verdicts.md` (the git-tracked human-verdict trail). If it is absent, skip
+this section. Otherwise append a short `## Cross-Run Quality Summary` block to the
+same report file (`.otto-tmp/verify-report.md`) with:
-## Done
+- **Completions:** how many runs recorded a verdict, and the tally per verdict
+  (Accepted / Accepted with follow-ups / Rejected / Needs investigation).
+- **Common causes:** recurring reasons behind rejections or follow-ups (e.g.
+  "scope creep", "thin evidence"), most frequent first.
+- **Outstanding gaps & deferred work:** gaps and deferred items still open across
+  runs, so a maintainer can turn them into follow-up issues.
-- <task> — <evidence: file:line or commit>
+Keep it to a few lines and cite the trail entries you counted. This is read-only —
+do not edit or commit the trail.
-## Gaps
-- <task> — <what is missing>
-## Deferred
-- <task> — <why>
-## Suites
-- <command> — <pass/fail counts>
-```
-Also print the Verdict + section counts to your final message. Do not commit.
+Also print the Verdict + a one-line tally of done/gap/deferred to your final message. Do not commit.
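The cross-run rollup amounts to parsing `./.otto/verdicts.md` and counting. The template fixes only a dated `##` heading plus one line naming the human verdict and why; the `**<verdict>**: <why>` line layout assumed below is hypothetical, as are `parseTrail` and `summarize`. Reasons are grouped by exact (case-insensitive) text, so a real rollup would likely need looser matching.

```typescript
// Hypothetical sketch of the Cross-Run Quality Summary tally.
// Assumed entry layout: "## <date>" then "**<verdict>**: <why>".
const HUMAN_VERDICTS: readonly string[] = [
  "Accepted with follow-ups", // checked before "Accepted" so the longer name wins
  "Accepted",
  "Rejected",
  "Needs investigation",
];

interface Entry {
  date: string;
  verdict: string;
  why: string;
}

function parseTrail(md: string): Entry[] {
  const entries: Entry[] = [];
  const lines = md.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const m = /^##\s+(.+)$/.exec(lines[i]);
    if (!m) continue;
    // The first non-empty line after the heading carries the verdict.
    const next = lines.slice(i + 1).find((l) => l.trim() !== "") ?? "";
    const text = next.replace(/\*/g, "").trim();
    const verdict = HUMAN_VERDICTS.find((v) => text.startsWith(v));
    if (!verdict) continue; // unrecognized entry: skip rather than guess
    const why = text.slice(verdict.length).replace(/^[\s:.-]+/, "");
    entries.push({ date: m[1].trim(), verdict, why });
  }
  return entries;
}

function summarize(entries: Entry[]) {
  const tally = new Map<string, number>();
  const causes = new Map<string, number>();
  for (const e of entries) {
    tally.set(e.verdict, (tally.get(e.verdict) ?? 0) + 1);
    // Only rejections and follow-ups contribute to "common causes".
    if ((e.verdict === "Rejected" || e.verdict === "Accepted with follow-ups") && e.why) {
      const key = e.why.toLowerCase();
      causes.set(key, (causes.get(key) ?? 0) + 1);
    }
  }
  // Most frequent cause first.
  const common = Array.from(causes.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([cause]) => cause);
  return { completions: entries.length, tally, common };
}
```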