@papi-ai/server 0.7.34 → 0.7.35

package/dist/prompts.js CHANGED
@@ -154,11 +154,19 @@ This is Cycle 0 \u2014 the first planning cycle for a brand-new project.
 
  2. **North Star** \u2014 Propose a one-sentence North Star statement, a success metric, and a key metric.
 
- 3. **Initial Board** \u2014 Generate 3-5 tasks based on the project's actual tech stack and goals:
+ 3. **Initial Board** \u2014 Generate 3-5 tasks based on the project's actual tech stack and goals.
+
+ **Sequencing principle \u2014 local-first / thinnest-vertical-slice-first (AD-51).** Plan the way the everyday builder gets value, NOT the way a backend team scaffolds. Order the first cycle so the user SEES the core loop working as fast as possible:
+ 1. **Thinnest visible vertical slice first** \u2014 the core loop on screen, running on local / mock / hardcoded data. Cycle 1 should make the thing actually work in front of the user.
+ 2. **Persistence / real data model** \u2014 only after the slice exists.
+ 3. **Auth** \u2014 later, and **NEVER third-party OAuth (Google/GitHub/etc.) as an opening task.** Start with no-auth or the simplest local auth.
+ 4. **RLS / cloud hardening / multi-tenant security** \u2014 last. These are not cycle-1 work for a new project.
+ Defer infrastructure (cloud DB schema, RLS, OAuth providers, large data seeds) until a working slice exists. A first cycle that opens with "Supabase schema + RLS + third-party OAuth + a big data seed" is the exact anti-pattern this rule exists to prevent.
+
  - Infer the project type from the brief/description (CLI, web app, mobile app, API, library, game, data pipeline, etc.)
- - Task 1: Project-appropriate setup (toolchain, dependencies, config \u2014 NOT "scaffolding" if the project already has code)
- - Task 2: Core functionality that proves the concept works (data model, main loop, core algorithm \u2014 whatever the project needs first)
- - Tasks 3-5: First deliverables that demonstrate value, broken into small steps appropriate for the project type
+ - Task 1: The thinnest vertical slice that puts the core loop in front of the user (local/mock data is fine \u2014 and preferred \u2014 at this stage). NOT infra scaffolding unless the project literally has no runnable surface without it.
+ - Task 2: The next slice, or the minimal persistence the core loop needs \u2014 only once the slice works.
+ - Tasks 3-5: Further value-demonstrating slices, broken into small steps appropriate for the project type. Keep auth, cloud, and hardening OUT of the opening cycle unless the brief makes them the literal core product.
  - Do NOT assume web-app patterns (routes, pages, components) unless the brief explicitly describes a web application
  - All tasks: status Backlog, priority P1-P2, reviewed true, phase "Phase 1"
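The sequencing principle added in this hunk can be read as a fixed ordering over task stages. The following is only an illustrative sketch of that ordering, not code from the package: the `stage` field, the stage names, and `sequenceLocalFirst` are all hypothetical.

```javascript
// Hypothetical sketch: order tasks by the AD-51 sequence described above.
// The `stage` field and the stage names are assumptions for illustration;
// the package expresses this rule as prompt text, not as code.
const STAGE_ORDER = ["slice", "persistence", "auth", "hardening"];

function sequenceLocalFirst(tasks) {
  // Copy before sorting so the caller's array is left untouched.
  // Array.prototype.sort is stable, so same-stage tasks keep their order.
  return [...tasks].sort(
    (a, b) => STAGE_ORDER.indexOf(a.stage) - STAGE_ORDER.indexOf(b.stage)
  );
}

const ordered = sequenceLocalFirst([
  { title: "Add RLS policies", stage: "hardening" },
  { title: "Render the core loop on mock data", stage: "slice" },
  { title: "Add simple local auth", stage: "auth" },
  { title: "Persist entries to a local store", stage: "persistence" },
]);

console.log(ordered.map((t) => t.title));
```

Under this reading, the visible slice always lands first and hardening always lands last, whatever order the tasks were proposed in.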
 
@@ -321,7 +329,7 @@ Standard planning cycle with full board review.
  }
  parts.push(`
  6. **Maturity Gate** \u2014 Before scheduling any task, check whether the project is ready for it:
- - **Cycle number as signal:** A Cycle 3 project should not be scheduling OAuth, billing, or analytics tasks. Early cycles focus on core functionality and proving the concept works.
+ - **Cycle number as signal (local-first sequencing, AD-51):** A Cycle 1-3 project should not be scheduling OAuth, billing, analytics, or RLS/cloud-hardening tasks. Early cycles sequence local-first / thinnest-vertical-slice-first: get the core loop working on local/mock data, THEN persistence, THEN auth (never third-party OAuth as an opening task), and RLS/cloud hardening LAST. Defer infrastructure until a working slice exists \u2014 plan the way the everyday builder gets value, not the way a backend team scaffolds.
  - **Phase prerequisites:** If the board has phases, tasks from later phases should only be scheduled when earlier phases have completed tasks (check Done count per phase). A task in "Phase 4: Monetisation" is premature if Phase 2 tasks are still in Backlog.
  - **Dependency chain:** If a task's \`dependsOn\` references incomplete tasks, it cannot be scheduled regardless of priority.
  - **Task maturity:** Tasks with \`maturity: "raw"\` are unscoped ideas from the idea tool. The planner IS the scoping mechanism \u2014 scope them as part of planning. For raw tasks selected for a cycle: (a) derive clear scope, acceptance criteria, and effort from the title, notes, and project context, (b) upgrade them to \`maturity: "investigated"\` via a \`boardCorrections\` entry, and (c) generate a BUILD HANDOFF as normal. For research-type raw tasks, scope the handoff as an investigation task \u2014 the deliverable is findings + follow-up backlog tasks, not code. Only leave a raw task unscheduled if you genuinely cannot derive scope from the available context \u2014 note why in the cycle log. Tasks with \`maturity: "ready"\` or no maturity field are considered cycle-ready. Tasks with \`maturity: "investigated"\` have been scoped but may still need refinement \u2014 schedule them if priority warrants it.
@@ -395,7 +403,7 @@ ${AD_REJECTION_RULES}
  - **Architecture Notes:** If a pattern was established that needs follow-up (e.g. "shared service layer created, MCP migration needed"), propose the follow-up.
  - **Strategy gaps:** If an Active Decision has no board tasks supporting it, propose one.
  - **Dogfood observations:** Unactioned dogfood entries span FOUR categories (friction, methodology, signal, commercial) \u2014 consider ALL of them, not just friction. If an entry (with ID) maps to no existing task, propose one, matching task type to category: friction/signal \u2192 fix or improvement; methodology \u2192 process/tooling change; commercial \u2192 GTM/positioning. **CRITICAL: Include \`dogfood:<uuid>\` in the new task's \`notes\` field** (e.g. \`"notes": "dogfood:abc12345-..."\`). This links the task to the observation so the pipeline can track what was actioned. Without this annotation, the observation stays unactioned forever.
- Create new tasks via the \`newTasks\` array in Part 2. Use \`new-N\` IDs in \`cycleHandoffs\` to reference them. **Limit: 3 new tasks per cycle** to prevent backlog bloat.
+ Create new tasks via the \`newTasks\` array in Part 2. Use \`new-N\` IDs in \`cycleHandoffs\` to reference them. **New-task cap \u2014 scale it to backlog depth, do NOT hard-cap at 3:** when 5+ unblocked backlog candidates already exist, limit new tasks to 3 (there is plenty to select from, so prevent backlog bloat). When the backlog is thin (fewer than 5 unblocked candidates), create as many as needed to bring the cycle into the healthy 5-8 range \u2014 roughly \`6 \u2212 unblocked_candidates\` additional new tasks, derived from the brief, roadmap phases, and recent build reports \u2014 up to an absolute ceiling of **8 new tasks**. This removes the contradiction with the Cycle-sizing rule above (a healthy cycle is 6-10 tasks; <5 needs justification), which the old flat cap of 3 violated whenever the backlog was thin. Never invent filler to hit a number: every new task must trace to a real discovered issue, dogfood entry, roadmap phase, or brief item.
  **\u26A0\uFE0F DUPLICATE CHECK:** Before adding a task to \`newTasks\`, scan the Cycle Board above for any existing task with the same or very similar title/scope. If a matching task already exists (even with slightly different wording), do NOT create a duplicate \u2014 reference the existing task ID instead. The board already contains all active tasks; re-creating them wastes IDs and bloats the board.
  **\u26A0\uFE0F ALREADY-BUILT CHECK:** Before creating a task, check the recent build reports and cycle log for evidence that this capability was already shipped. If a recent build report shows this feature was completed (even under a different task name), do NOT create a new task for it. This is especially important for UI features, data models, and integrations that may already exist.`);
  parts.push(PLAN_FRAGMENT_PRODUCT_BRIEF);
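The new-task cap in this hunk is stated as prompt text, but the arithmetic can be sketched as a small function. This is an illustration under assumptions: `newTaskCap` is a hypothetical name, and clamping to the range 0 to 8 is one reading of "roughly `6 − unblocked_candidates`" combined with the absolute ceiling.

```javascript
// Hypothetical helper illustrating the scaled new-task cap from the prompt.
// Not part of the package: the prompt leaves this judgment to the planner.
function newTaskCap(unblockedCandidates) {
  const ABSOLUTE_CEILING = 8;
  // Deep backlog (5+ unblocked candidates): keep the old limit of 3.
  if (unblockedCandidates >= 5) return 3;
  // Thin backlog: roughly 6 minus the unblocked candidates, never negative
  // and never above the absolute ceiling.
  const needed = 6 - unblockedCandidates;
  return Math.min(ABSOLUTE_CEILING, Math.max(0, needed));
}

for (const n of [0, 2, 4, 5, 9]) {
  console.log(`${n} unblocked candidates -> up to ${newTaskCap(n)} new tasks`);
}
```

The result is an upper bound rather than a quota: the prompt still requires every new task to trace to a real issue, dogfood entry, roadmap phase, or brief item.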
@@ -473,6 +481,9 @@ function buildPlanUserMessage(ctx) {
  ""
  );
  }
+ if (ctx.siblingRepoWarning) {
+ parts.push("", ctx.siblingRepoWarning, "");
+ }
  parts.push("", "---", "", "## PROJECT CONTEXT", "");
  if (ctx.contextTier) {
  parts.push(`**Context tier:** ${ctx.contextTier}`, "");
@@ -828,7 +839,9 @@ After your natural language output, include this EXACT format on its own line:
  {
  "id": "string \u2014 AD-N (existing) or new AD-N (for new decisions)",
  "action": "confidence_change | modify | resolve | supersede | new | delete",
- "body": "string \u2014 full AD block including ### heading, confidence tag, and body text (empty string for delete)"
+ "body": "string \u2014 full AD block including ### heading, confidence tag, and body text (empty string for delete)",
+ "evidenceRef": "string (optional) \u2014 for DELIBERATE decisions (modify / resolve / delete = validate/modify/invalidate), a pointer to the evidence that justified the change: a doc path (docs/research/foo.md), a build-report id, or a metric name. Builds the decision->outcome ledger. Omit if no concrete evidence.",
+ "metricDelta": "object (optional) \u2014 { "metric": string, "before": number, "after": number, "delta": number } \u2014 which metric moved and by how much. Use a REAL metric name from cycle_metrics_snapshots where possible (e.g. scope_accuracy, velocity, est_actual_drift). Omit if no metric moved."
  }
  ],
  "decisionScores": [
@@ -888,6 +901,8 @@ The JSON must be valid. Use null for optional fields that don't apply.
  For activeDecisionUpdates, the body field must be the COMPLETE replacement text for the AD block (including the ### heading line).
  Only include ADs that need changes \u2014 omit unchanged ADs.${compressionNote}
 
+ **Decision-outcome ledger (task-2168):** For DELIBERATE decisions \u2014 \`modify\`, \`resolve\`, or \`delete\` (i.e. you validated, modified, or invalidated an AD based on what actually happened) \u2014 include \`evidenceRef\` and/or \`metricDelta\` so the decision becomes a queryable ledger entry rather than freetext. \`evidenceRef\` points at WHAT justified the change (a doc path, a build-report id, or a metric name). \`metricDelta\` records WHICH metric moved (prefer a real metric from cycle_metrics_snapshots) and its before->after. This is **guidance, not a hard requirement** \u2014 if a deliberate change genuinely has no concrete evidence, omit both and proceed; the apply will record the event with a non-blocking warning.
+
  ## PERSISTENCE RULES \u2014 READ THIS CAREFULLY
 
  Everything in Part 1 (natural language) is **display-only**. Part 2 (structured JSON) is what gets written to files.
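For orientation, an `activeDecisionUpdates` entry carrying the two new optional fields might look like the sketch below. Every value is invented for illustration (the AD number, the body text, and the metric figures), `makeMetricDelta` is a hypothetical helper, and computing `delta` as `after - before` is an assumption, since the prompt only names the four fields.

```javascript
// Hypothetical helper: builds the metricDelta object named in the schema.
// Assumption: delta is after minus before; the prompt does not define it.
function makeMetricDelta(metric, before, after) {
  return { metric, before, after, delta: after - before };
}

// Illustrative ledger entry for a deliberate change (action "modify").
// All values are made up; only the field names come from the prompt schema.
const update = {
  id: "AD-12",
  action: "modify",
  body: "### AD-12: Example decision\n(confidence tag and body text go here)",
  evidenceRef: "docs/research/foo.md",
  metricDelta: makeMetricDelta("scope_accuracy", 0.5, 0.75),
};

console.log(JSON.stringify(update, null, 2));
```

Both fields stay optional: per the guidance above, a deliberate change with no concrete evidence omits them and the apply step records the event with a non-blocking warning.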
@@ -1053,7 +1068,9 @@ After your natural language output, include this EXACT format on its own line:
  {
  "id": "string \u2014 AD-N (existing) or new AD-N (for new decisions)",
  "action": "confidence_change | modify | resolve | supersede | new | delete",
- "body": "string \u2014 full AD block including ### heading, confidence tag, and body text (empty string for delete)"
+ "body": "string \u2014 full AD block including ### heading, confidence tag, and body text (empty string for delete)",
+ "evidenceRef": "string (optional) \u2014 for DELIBERATE changes (modify / resolve / delete), a pointer to the justifying evidence: a doc path, a build-report id, or a metric name. Builds the decision->outcome ledger. Omit if none.",
+ "metricDelta": "object (optional) \u2014 { "metric": string, "before": number, "after": number, "delta": number } \u2014 which metric moved. Prefer a real metric name from cycle_metrics_snapshots. Omit if none."
  }
  ],
  "phaseUpdates": [
@@ -1370,6 +1387,37 @@ Return a JSON array of 3-10 tasks. Each task must have:
  - Use the full complexity range: XS (config/one-liner), Small (one file), Medium (2-5 files), Large (cross-module), XL (architectural)
  - Tasks should be specific enough to execute without further investigation
  - Maximum 10 tasks \u2014 fewer is better if the codebase is well-maintained`;
+ var VISION_TASKS_SYSTEM = `You are a product engineer turning a project VISION into a starter backlog for a brand-new project (no code exists yet). The user has described what they want to build; your job is to make that vision legible as concrete, buildable tasks.
+
+ IMPORTANT: You are running as a non-interactive API call. Do NOT ask questions. Produce tasks directly.
+
+ ## OUTPUT FORMAT
+
+ Return a JSON array of 15-20 tasks. Each task must have:
+ - "title": Clear, actionable task title (start with a verb)
+ - "priority": "P0 Critical", "P1 High", "P2 Medium", or "P3 Low"
+ - "complexity": "XS", "Small", "Medium", "Large", or "XL"
+ - "module": A module name inferred from the vision (e.g. "Core", "Auth", "Frontend", "API", "Payments")
+ - "phase": A phase name ("Phase 1" for the first shippable slice, "Phase 2" for what follows, etc.)
+ - "notes": 1-2 sentences tying this task to the user's stated vision
+
+ ## GUIDELINES
+
+ - Cover the VISION, not a generic app skeleton. Every task must trace to something the user actually described.
+ - Sequence local-first / thinnest-vertical-slice-first (AD-51): the first few P0/P1 tasks should be the smallest path to a visibly working thing, NOT infrastructure scaffolding (no "set up CI", "configure auth provider", "design database schema" as opening tasks). Plan the way the user gets value, not the way a backend team scaffolds.
+ - Mix: the first shippable feature slice (Phase 1), then the next features, then supporting/foundational work behind them.
+ - 15-20 is a TARGET BAND for good vision coverage, NOT a quota. If the vision is genuinely small, produce fewer high-quality tasks. NEVER pad with filler ("add tests", "write docs", "refactor") to hit a number.
+ - Use the full complexity range. Keep titles concrete enough to execute without re-deriving the vision.
+ - Do NOT add PAPI-setup tasks \u2014 those are handled by the setup flow.`;
+ function buildVisionTasksPrompt(inputs) {
+ const line = (label, val) => val?.trim() ? `**${label}:** ${val.trim()}
+ ` : "";
+ return `Turn this project vision into a 15-20 item starter backlog.
+
+ **Project:** ${inputs.projectName}
+ ${line("What it is", inputs.description)}${line("Target users", inputs.targetUsers)}${line("Problems it solves", inputs.problems)}${line("Project type", inputs.projectType)}
+ You are ALSO generating this project's Product Brief in this same setup round \u2014 use that fuller vision as your primary source. Return a JSON array of 15-20 tasks (coverage over count; do not pad) that make this vision a buildable backlog, thinnest-shippable-slice first.`;
+ }
  function buildInitialTasksPrompt(inputs) {
  const description = inputs.description?.trim() ? `**Description:** ${inputs.description}
  ` : `**Description:** (not provided \u2014 derive from the codebase analysis below)
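The `line` helper inside the new `buildVisionTasksPrompt` drops any field that is missing or blank instead of emitting an empty label. The sketch below re-creates just that helper to show the effect; the sample inputs are invented, and `\n` stands in for the literal newline used in the bundled template string.

```javascript
// Re-creation of the `line` helper from the hunk above, for illustration only.
// A field is rendered as "**Label:** value" plus a newline, or as an empty
// string when the value is undefined, empty, or whitespace-only.
const line = (label, val) => (val?.trim() ? `**${label}:** ${val.trim()}\n` : "");

// Hypothetical inputs: two fields filled in, two left blank.
const inputs = {
  description: "  A habit tracker  ",
  targetUsers: "",
  problems: undefined,
  projectType: "CLI",
};

const header =
  line("What it is", inputs.description) +
  line("Target users", inputs.targetUsers) +
  line("Problems it solves", inputs.problems) +
  line("Project type", inputs.projectType);

console.log(header);
```

Only the two populated fields appear in `header`, each trimmed and on its own line, so a sparse setup form yields a shorter prompt rather than one with empty headings.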
@@ -1398,6 +1446,7 @@ export {
  PRODUCT_BRIEF_SYSTEM,
  REVIEW_SYSTEM,
  STRATEGY_CHANGE_SYSTEM,
+ VISION_TASKS_SYSTEM,
  buildAdSeedPrompt,
  buildConventionsPrompt,
  buildHandoffRegenMessage,
@@ -1408,6 +1457,7 @@ export {
  buildProductBriefPrompt,
  buildReviewSystemPrompt,
  buildReviewUserMessage,
+ buildVisionTasksPrompt,
  parseReviewStructuredOutput,
  parseStrategyChangeOutput,
  parseStructuredOutput
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@papi-ai/server",
- "version": "0.7.34",
+ "version": "0.7.35",
  "description": "PAPI MCP server — AI-powered sprint planning, build execution, and strategy review for software projects",
  "license": "Elastic-2.0",
  "mcpName": "io.github.cathalos92/papi",
@@ -23,6 +23,7 @@
  "build:skills": "PAPI_SKILLS_OUT=skills/papi-cycle node scripts/export-skills.mjs",
  "build:prompts": "tsup src/prompts.ts --format esm",
  "start": "node dist/index.js",
+ "backfill-cycle-metrics": "node dist/backfill-cycle-metrics.js",
  "test": "vitest run",
  "test:watch": "vitest",
  "lint": "eslint src",
@@ -40,6 +40,8 @@ plan → build_list → build_execute → audit → review_list → review_submi
  7. **review_submit** — Records the review verdict and updates task status.
  Next: `build_list` to view next build
 
+ **Quality Gate (standard for every build).** Before accepting a build, run a code review of the branch diff — either call `review_submit` with `dispatch:"subagent"` to auto-review, or attach `auto_review` findings yourself. `review_submit` surfaces the findings on every verdict. Accepting past a `fail`/`warn` gate is a deliberate **override** and is recorded on the review for the audit trail. Always run the gate on risk-tier work (auth, data, migrations, CI).
+
  **DO NOT** use `review_submit` as a substitute for `review_list`. If you need to see what is pending review, always call `review_list` first. If `review_list` is unavailable in your tool set (e.g. your MCP client filters parameterless tools), STOP and tell the human their MCP integration is incomplete — never guess at the next pending task. To submit an accept verdict on a build-acceptance review, either pass `reviewer_confirmed: true` or ensure `review_list` has run in the same session within the last 15 minutes. (SUP-2026-010.)
 
  ### Strategy Review
@@ -70,6 +72,18 @@ idea → (picked up by next plan)
  - **idea** — Captures a new task idea and writes it to the backlog.
  Next: The next `plan` run will prioritise and schedule it.
 
+ ### Friction Moment — Report PAPI Bugs & Ideas Upstream
+
+ When a PAPI tool returns a **workflow-blocking error** (it crashed, won't advance the cycle, or repeats after a retry), don't just stop — offer the user a zero-friction escape hatch to send it upstream to the PAPI team, then keep them moving:
+
+ 1. Offer to submit it: run `bug` with `report=true`. Set `type='bug'` for a defect or `type='idea'` for a feature request / suggestion (the same tool handles both).
+ 2. Ask two quick questions and pass the answers through:
+ - **Notify when fixed?** → `notify_when_fixed=true` (the resolution surfaces back in a later `orient` and on their dashboard).
+ - **OK for the PAPI team to reach out?** → `contact_ok=true`.
+ 3. Confirm the submission ID back to the user, then suggest the workaround / next step so they're unblocked.
+
+ The same path works any time a user *wants* to send feedback — not only on errors. Diagnostics (Node version, platform, adapter) are attached automatically; the submission is project-scoped and only visible to the user and PAPI maintainers.
+
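As a rough illustration of the Friction Moment steps added above, the arguments for the `bug` tool could be assembled as follows. Only the four parameter names (`report`, `type`, `notify_when_fixed`, `contact_ok`) come from the text; `buildFrictionReport` and its option names are hypothetical, and the real tool may require further fields (such as a description) that this diff does not show.

```javascript
// Hypothetical helper: maps the user's answers to the two quick questions
// onto the `bug` tool parameters named in the section above.
// Assumption: the tool may need additional arguments not shown in this diff.
function buildFrictionReport({ isFeatureRequest, notify, contactOk }) {
  return {
    report: true, // step 1: submit upstream
    type: isFeatureRequest ? "idea" : "bug", // one tool handles both kinds
    notify_when_fixed: notify, // step 2, first question
    contact_ok: contactOk, // step 2, second question
  };
}

const args = buildFrictionReport({
  isFeatureRequest: false,
  notify: true,
  contactOk: false,
});

console.log(args);
```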
  ### Project Bootstrap
 
  ```
@@ -10,7 +10,7 @@ description: Invoke when running strategy_review or making Active Decision chang
  Every 5 cycles, PAPI offers a strategy review — a deep analysis of velocity, estimation accuracy, active decisions, and project direction.
 
  - **Don't skip them.** They're where compounding value comes from.
- - Strategy reviews run in their own session; don't mix with building.
+ - If your session is already heavy with build context, run the review fresh for cleaner output; a genuinely fresh session needs no restart.
  - Reviews produce recommendations that feed into the next plan.
  - If the review recommends AD changes, use `strategy_change` to apply them.