@phamvuhoang/otto-core 0.4.1 → 0.5.0

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@phamvuhoang/otto-core",
- "version": "0.4.1",
+ "version": "0.5.0",
  "description": "Claude Code AFK orchestration: iteration loop, native-sandbox runner, template renderer.",
  "type": "module",
  "license": "MIT",
@@ -0,0 +1,42 @@
+ <!--
+ Per-mode human-acceptance prompts for the Otto quality report. Included ONCE by
+ quality-report.md, so every run mode inherits the same set through the single
+ existing contract include — never re-describe these per template (the same
+ drift-proofing as the contract itself). The generic Human Acceptance Checklist
+ stays; these add the task-fulfillment questions specific to the run's Mode.
+ -->
+
+ **Mode-specific acceptance prompts.** Beyond the generic checklist, fold the
+ prompts for **your Mode** (from Task Source) into the Human Acceptance Checklist.
+ Answer each with cited evidence, or mark it an explicit gap — never drop one
+ silently.
+
+ ### afk — plan/PRD completion
+
+ - [ ] Every PRD acceptance criterion is met or explicitly deferred.
+ - [ ] All plan tasks are checked off, or the unchecked ones are recorded as gaps.
+ - [ ] The product behavior is demonstrable, not just coded.
+
+ ### ghafk — GitHub issue burn-down
+
+ - [ ] The change resolves what the issue actually asked, not an adjacent reading.
+ - [ ] Work is scoped to this issue; unrelated changes are called out.
+ - [ ] The issue will close cleanly when the PR merges (PR/issue links cited).
+
+ ### linear-afk — Linear issue burn-down
+
+ - [ ] The change resolves the Linear issue's stated intent.
+ - [ ] The comment cites the branch/PR and the explicit human next step.
+ - [ ] The issue is left in the correct state (OPEN for PR-based repos).
+
+ ### apply-review — external review repair
+
+ - [ ] Every CONFIRMED finding was actually fixed, not just acknowledged.
+ - [ ] The fixes introduced no regression (suites re-run green).
+ - [ ] Deferred / rejected findings are recorded with a reason.
+
+ ### verify — read-only verification
+
+ - [ ] Each task's claimed status matches committed reality (evidence cited).
+ - [ ] Suite results are current, not stale.
+ - [ ] Gaps and deferrals are honest, not optimistic.
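The rule in the hunk above ("answer each with cited evidence, or mark it an explicit gap, never drop one silently") can be read as a simple completeness check. The following is a minimal TypeScript sketch of that reading. The names `Mode`, `Answer`, `PROMPTS`, and `silentlyDropped` are hypothetical and do not come from otto-core, which ships these prompts as markdown rather than code; the prompt strings are copied from the template.

```typescript
// Hypothetical illustration only; not part of otto-core.
type Mode = "afk" | "ghafk" | "linear-afk" | "apply-review" | "verify";

// An answer either cites evidence or records an explicit gap.
type Answer = { evidence: string } | { gap: string };

// Prompt text copied from the acceptance-prompts template in this diff.
const PROMPTS: Record<Mode, string[]> = {
  afk: [
    "Every PRD acceptance criterion is met or explicitly deferred.",
    "All plan tasks are checked off, or the unchecked ones are recorded as gaps.",
    "The product behavior is demonstrable, not just coded.",
  ],
  ghafk: [
    "The change resolves what the issue actually asked, not an adjacent reading.",
    "Work is scoped to this issue; unrelated changes are called out.",
    "The issue will close cleanly when the PR merges (PR/issue links cited).",
  ],
  "linear-afk": [
    "The change resolves the Linear issue's stated intent.",
    "The comment cites the branch/PR and the explicit human next step.",
    "The issue is left in the correct state (OPEN for PR-based repos).",
  ],
  "apply-review": [
    "Every CONFIRMED finding was actually fixed, not just acknowledged.",
    "The fixes introduced no regression (suites re-run green).",
    "Deferred / rejected findings are recorded with a reason.",
  ],
  verify: [
    "Each task's claimed status matches committed reality (evidence cited).",
    "Suite results are current, not stale.",
    "Gaps and deferrals are honest, not optimistic.",
  ],
};

// Return the prompts for `mode` that have no answer at all,
// i.e. the ones that were dropped silently.
function silentlyDropped(mode: Mode, answers: Map<string, Answer>): string[] {
  return PROMPTS[mode].filter((prompt) => !answers.has(prompt));
}
```

A report would pass this check only when `silentlyDropped` returns an empty list for its Mode; a prompt answered with `{ gap: "..." }` counts as handled, which matches the "explicit gap" wording.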
@@ -28,7 +28,7 @@
 
  `<review-doc>` names a code-review document (a file path). `Read` it. It contains findings, usually with severities. Your job is to fix the actionable ones — ONE finding per iteration — and track the rest.
 
- When every actionable finding has been addressed (fixed, or already fixed in git, or recorded as a follow-up), output `<promise>NO MORE TASKS</promise>`.
+ When every actionable finding has been addressed (fixed, or already fixed in git, or recorded as a follow-up), produce the completion report (see COMPLETION REPORT below), then output `<promise>NO MORE TASKS</promise>`.
 
  # TRIAGE
 
@@ -66,6 +66,23 @@ Make a single `git commit -am` with a short message:
  - Body: which finding (and its review section), key decision, and a one-line note of any follow-ups recorded.
  - No file lists, no `Co-Authored-By`.
 
+ # COMPLETION REPORT
+
+ Only on the final iteration — when every actionable finding has been addressed
+ and you are about to output the sentinel — hand the maintainer one readable
+ summary of the whole review-fix round. Do NOT emit it per-iteration. Map the
+ contract below onto this round:
+
+ - **What Changed / Evidence:** the findings you CONFIRMED and fixed, each with
+ its `fix(review):` commit SHA and the review section it came from; the
+ feedback loops you ran (tests / typecheck) and their result.
+ - **Gaps And Follow-Ups:** findings you DEFERRED to `./.otto/review-followups.md`
+ (with why), and any REJECTED / won't-fix findings with their reason. Verdict
+ defaults to **Needs human review** when any actionable finding was left
+ unfixed.
+
+ @include:quality-report.md
+
  # FINAL RULES
 
  ONLY ADDRESS A SINGLE FINDING per iteration.
@@ -59,6 +59,15 @@ Committing the code is NOT necessarily the end of the run. How work "ships" depe
 
  When unsure which applies, prefer leaving the issue OPEN and surfacing the branch — never close an issue whose work has not landed on the default branch.
 
+ ## Quality report (completion handoff)
+
+ Whatever the completion surface, hand the maintainer **one readable Otto quality report** so they can accept, reject, or request follow-up without replaying the run log — green tests alone are not the handoff. Emit it into the completion surface and cite concrete links/SHAs:
+
+ - **PR-based repo:** put the report in the **PR description** (create or refresh it there) and reference it from any issue comment. Cite the PR URL, the issue link, and the commit SHAs on this branch.
+ - **Commit-to-branch repo:** put the report in the **issue comment**, citing the branch and the commit SHAs.
+
+ @include:quality-report.md
+
  # LEARNINGS
 
  The repo's accumulated learnings are in the `<learnings>` block — durable, reusable knowledge from prior iterations (conventions, gotchas, decisions and their why, dead ends). Consult it during EXPLORATION and IMPLEMENTATION so you don't relearn what's known or repeat a dead end.
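The "Quality report (completion handoff)" section added in this hunk describes a two-way branch on how the repo ships work. A minimal TypeScript sketch of that branch follows, assuming hypothetical names (`RepoStyle`, `Placement`, `reportPlacement`) that are not part of otto-core. In the package itself the rule is expressed as prompt text for the agent, not as code.

```typescript
// Hypothetical illustration only; otto-core states this rule in a markdown template.
type RepoStyle = "pr-based" | "commit-to-branch";

interface Placement {
  // Where the Otto quality report is written.
  surface: "PR description" | "issue comment";
  // What the report must cite, per the template text.
  cite: string[];
  // Whether an issue comment should point at the report.
  referenceFromIssueComment: boolean;
}

function reportPlacement(style: RepoStyle): Placement {
  if (style === "pr-based") {
    return {
      surface: "PR description",
      cite: ["PR URL", "issue link", "commit SHAs on this branch"],
      referenceFromIssueComment: true,
    };
  }
  // Commit-to-branch repos have no PR, so the issue comment is the handoff surface.
  return {
    surface: "issue comment",
    cite: ["branch", "commit SHAs"],
    referenceFromIssueComment: false,
  };
}
```

Either way the report body is the same shared contract pulled in by `@include:quality-report.md`; only the surface and the required citations differ.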
@@ -15,3 +15,13 @@ every Linear write — never raw GraphQL, and never `gh`:
  the issue for a human to move.
 
  When unsure which convention applies, comment and leave the issue OPEN.
+
+ ## Quality report placement (Linear)
+
+ The FINISHING handoff above already defines the **Otto quality report** shape —
+ do not re-describe it here. On Linear the **comment body IS that report**: write
+ the full quality report (verdict, task source, what changed, evidence, human
+ acceptance checklist, gaps/follow-ups) to a file and post it with
+ `otto-linear comment <ref> --body-file <path>`, citing the branch/PR, the commit
+ SHAs, the checks run, and the explicit human next step. For this PR-based repo
+ that comment is the handoff surface — the issue stays OPEN until the PR merges.
@@ -0,0 +1,80 @@
+ <!--
+ The Otto quality report contract. ONE readable verification artifact, reused
+ across every run mode (verify / afk / ghafk / linear-afk / apply-review) by
+ @include — never re-describe the shape per template, or the provider workflows
+ drift apart. Readable first; every claim cites concrete proof.
+ -->
+
+ Produce an **Otto quality report** with the exact section headings below. Rules:
+
+ - **Readable first.** Keep it short enough to review in a couple of minutes — a
+ maintainer should not have to replay the run log. Specific beats exhaustive.
+ - **Cite evidence for every claim.** A `file:line`, a commit SHA, a command +
+ its result, a report section, or an issue/PR link — never a vague assertion.
+ - **Tests are evidence, not the verdict.** Green checks go in the Evidence
+ section; they do not by themselves make the verdict Accepted.
+ - **Pick one honest verdict. When evidence is thin, scope is uncertain, or you
+ are unsure, choose _Needs human review_ — never self-declare _Accepted_.**
+ Model self-evaluation does not replace human review.
+
+ ```markdown
+ # Otto quality report
+
+ ## Verdict
+
+ One of — **Accepted** · **Accepted with follow-ups** · **Needs human review** · **Rejected**
+ (when uncertain, choose **Needs human review**)
+
+ ## Task Source
+
+ - Mode: <afk | ghafk | linear-afk | apply-review | verify>
+ - Source: <plan/PRD path, GitHub issue #, or Linear ref>
+ - Issue or plan: <link or path>
+
+ ## What Changed
+
+ - Summary: <one or two sentences — what was actually done>
+ - Commits: <SHAs on this branch>
+ - Files: <paths touched>
+
+ ## Evidence
+
+ - Implementation evidence: <file:line or commit proving each claim>
+ - Test/typecheck evidence: <commands run + pass/fail counts>
+ - Manual or acceptance evidence: <what was observed, or "none">
+
+ ## Human Acceptance Checklist
+
+ - [ ] Solves the stated problem.
+ - [ ] Behavior is observable or explained.
+ - [ ] Scope is appropriate.
+ - [ ] Docs/examples are updated when needed.
+ - [ ] Risks and assumptions are clear.
+
+ ## Gaps And Follow-Ups
+
+ - Gap: <known gap that remains, or "none">
+ - Deferred: <intentionally not done in this run + why, or "none">
+ - Recommended next action: <what a maintainer should do next>
+ ```
+
+ ### Human verdict trail
+
+ Prior **human** verdicts on past Otto runs (most recent last) — consult them so a
+ recurring reason ("scope creep", "thin evidence") informs *this* run's Verdict
+ and *Recommended next action* before you commit to one:
+
+ <verdict-trail>
+
+ !?`cat ./.otto/verdicts.md|||_No human verdicts recorded yet._`
+
+ </verdict-trail>
+
+ **Maintainer:** after reviewing this report, append your verdict to
+ `./.otto/verdicts.md` (create it lazily) — a dated `##` heading plus one line:
+ the human verdict (**Accepted** · **Accepted with follow-ups** · **Rejected** ·
+ **Needs investigation**) and *why* (what was accepted with caveats, or the
+ concrete reason it was rejected). The file is git-tracked; it feeds the existing
+ learning loop, so future runs see what was accepted or rejected and why.
+
+ @include:acceptance-prompts.md
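The templates in this diff chain `@include:` directives: the run-mode templates include `quality-report.md`, which in turn includes `acceptance-prompts.md`. The diff does not show how otto-core's template renderer resolves these, so the following is only a sketch of one plausible approach: recursive expansion over an in-memory map with a cycle guard. The function name `expandIncludes`, the in-memory `Map`, and the error messages are assumptions, not the package's API.

```typescript
// Hypothetical sketch; the real renderer in otto-core is not shown in this diff.
// Expands every line of the form "@include:<name>" with the named template's body.
function expandIncludes(
  name: string,
  templates: Map<string, string>,
  stack: string[] = [],
): string {
  // Guard against a template that (directly or indirectly) includes itself.
  if (stack.includes(name)) {
    throw new Error(`include cycle: ${[...stack, name].join(" -> ")}`);
  }
  const body = templates.get(name);
  if (body === undefined) {
    throw new Error(`unknown template: ${name}`);
  }
  return body
    .split("\n")
    .map((line) => {
      const target = /^@include:(\S+)\s*$/.exec(line)?.[1];
      return target !== undefined
        ? expandIncludes(target, templates, [...stack, name])
        : line;
    })
    .join("\n");
}
```

Under this reading, including `acceptance-prompts.md` once from `quality-report.md` is what lets every run mode inherit the same prompts through the single contract include, as the template comments describe.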
@@ -27,6 +27,7 @@ You review the most recent commit (HEAD) through ONE lens only: **{{ LENS }}**.
  - `correctness` — bugs, regressions, broken logic, unhandled edge cases.
  - `security` — input validation, secrets, injection, auth bypass.
  - `tests` — coverage gaps for the changed code; missing/weak assertions.
+ - `task-fit` — did the change solve the **right problem**? Does it map back to the source plan/issue, stay in scope (no unrequested extras, no missed sub-task), and leave a reviewer-useful trail (clear commit, evidence, surfaced gaps)? Flag scope drift, unaddressed acceptance criteria, and work that is mechanically correct but doesn't fulfil the task.
 
  If `<head>` shows `(no commits)`, output `<lens>SKIP</lens>` and stop.
 
@@ -45,30 +45,26 @@ Put every task in exactly one bucket:
 
  # REPORT
 
- Write your report to `.otto-tmp/verify-report.md` using the `Write` tool (this path is gitignored scratch — it is the one write you may make). Structure it:
+ Write your report to `.otto-tmp/verify-report.md` using the `Write` tool (this path is gitignored scratch — it is the one write you may make). Use the Otto quality report contract below: fold the RECONCILE/CLASSIFY results into it — DONE tasks (with their `file:line`/SHA evidence) into **What Changed** + **Evidence**, the suite pass/fail counts into the Test/typecheck evidence line, and GAP/DEFERRED tasks into **Gaps And Follow-Ups**.
 
- ```
- # Verify report
+ @include:quality-report.md
 
- ## Verdict
+ # CROSS-RUN QUALITY SUMMARY (READ-ONLY)
 
- <one-line: all done / N gaps / N deferred>
+ Beyond *this* run, give the maintainer a quality rollup **across** runs so they can
+ spot recurring output-quality failures without reading every NDJSON log. `Read`
+ `./.otto/verdicts.md` (the git-tracked human-verdict trail). If it is absent, skip
+ this section. Otherwise append a short `## Cross-Run Quality Summary` block to the
+ same report file (`.otto-tmp/verify-report.md`) with:
 
- ## Done
+ - **Completions:** how many runs recorded a verdict, and the tally per verdict
+ (Accepted / Accepted with follow-ups / Rejected / Needs investigation).
+ - **Common causes:** recurring reasons behind rejections or follow-ups (e.g.
+ "scope creep", "thin evidence"), most frequent first.
+ - **Outstanding gaps & deferred work:** gaps and deferred items still open across
+ runs, so a maintainer can turn them into follow-up issues.
 
- - <task> <evidence: file:line or commit>
+ Keep it to a few lines and cite the trail entries you counted. This is read-only —
+ do not edit or commit the trail.
 
- ## Gaps
-
- - <task> — <what is missing>
-
- ## Deferred
-
- - <task> — <why>
-
- ## Suites
-
- - <command> — <pass/fail counts>
- ```
-
- Also print the Verdict + section counts to your final message. Do not commit.
+ Also print the Verdict + a one-line tally of done/gap/deferred to your final message. Do not commit.
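The "Completions" tally in the cross-run summary above is a count of human verdicts per label in `./.otto/verdicts.md`. The sketch below shows one way to compute it in TypeScript. It assumes each entry is a `## ` heading followed by a line that starts with the verdict label, optionally wrapped in `**`. The templates only say "a dated `##` heading plus one line", so that exact layout is an assumption, as are the names `HUMAN_VERDICTS` and `tallyVerdicts`. In otto-core the tally is produced by the agent reading the file, not by code like this.

```typescript
// Hypothetical sketch; the entry layout of verdicts.md is assumed, not specified.
// Longest label first so "Accepted with follow-ups" is not counted as "Accepted".
const HUMAN_VERDICTS = [
  "Accepted with follow-ups",
  "Accepted",
  "Rejected",
  "Needs investigation",
] as const;

type HumanVerdict = (typeof HUMAN_VERDICTS)[number];

function tallyVerdicts(trail: string): Record<HumanVerdict, number> {
  const tally: Record<HumanVerdict, number> = {
    "Accepted with follow-ups": 0,
    Accepted: 0,
    Rejected: 0,
    "Needs investigation": 0,
  };
  let awaitingVerdict = false;
  for (const raw of trail.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("## ")) {
      // A heading opens a new entry; its verdict is on the next non-empty line.
      awaitingVerdict = true;
      continue;
    }
    if (!awaitingVerdict || line === "") continue;
    // Drop markdown bold markers before matching the label.
    const text = line.replace(/\*\*/g, "");
    const verdict = HUMAN_VERDICTS.find((label) => text.startsWith(label));
    if (verdict !== undefined) tally[verdict] += 1;
    awaitingVerdict = false;
  }
  return tally;
}
```

Entries whose first line does not start with a known label are skipped rather than guessed at, which keeps the tally conservative.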