npm - @papi-ai/server - Versions diffs - 0.7.34 → 0.7.35 - Mend

@papi-ai/server 0.7.34 → 0.7.35

Files changed (6)

package/dist/backfill-cycle-metrics.js +4329 -0
package/dist/index.js +1694 -489
package/dist/prompts.js +58 -8
package/package.json +2 -1
package/skills/papi-cycle/papi-plan/SKILL.md +14 -0
package/skills/papi-cycle/papi-strategy/SKILL.md +1 -1

package/dist/prompts.js CHANGED

@@ -154,11 +154,19 @@ This is Cycle 0 \u2014 the first planning cycle for a brand-new project.
 2. **North Star** \u2014 Propose a one-sentence North Star statement, a success metric, and a key metric.
-3. **Initial Board** \u2014 Generate 3-5 tasks based on the project's actual tech stack and goals:
+3. **Initial Board** \u2014 Generate 3-5 tasks based on the project's actual tech stack and goals.
+   **Sequencing principle \u2014 local-first / thinnest-vertical-slice-first (AD-51).** Plan the way the everyday builder gets value, NOT the way a backend team scaffolds. Order the first cycle so the user SEES the core loop working as fast as possible:
+   1. **Thinnest visible vertical slice first** \u2014 the core loop on screen, running on local / mock / hardcoded data. Cycle 1 should make the thing actually work in front of the user.
+   2. **Persistence / real data model** \u2014 only after the slice exists.
+   3. **Auth** \u2014 later, and **NEVER third-party OAuth (Google/GitHub/etc.) as an opening task.** Start with no-auth or the simplest local auth.
+   4. **RLS / cloud hardening / multi-tenant security** \u2014 last. These are not cycle-1 work for a new project.
+   Defer infrastructure (cloud DB schema, RLS, OAuth providers, large data seeds) until a working slice exists. A first cycle that opens with "Supabase schema + RLS + third-party OAuth + a big data seed" is the exact anti-pattern this rule exists to prevent.
    - Infer the project type from the brief/description (CLI, web app, mobile app, API, library, game, data pipeline, etc.)
-   - Task 1: Project-appropriate setup (toolchain, dependencies, config \u2014 NOT "scaffolding" if the project already has code)
-   - Task 2: Core functionality that proves the concept works (data model, main loop, core algorithm \u2014 whatever the project needs first)
-   - Tasks 3-5: First deliverables that demonstrate value, broken into small steps appropriate for the project type
+   - Task 1: The thinnest vertical slice that puts the core loop in front of the user (local/mock data is fine \u2014 and preferred \u2014 at this stage). NOT infra scaffolding unless the project literally has no runnable surface without it.
+   - Task 2: The next slice, or the minimal persistence the core loop needs \u2014 only once the slice works.
+   - Tasks 3-5: Further value-demonstrating slices, broken into small steps appropriate for the project type. Keep auth, cloud, and hardening OUT of the opening cycle unless the brief makes them the literal core product.
    - Do NOT assume web-app patterns (routes, pages, components) unless the brief explicitly describes a web application
    - All tasks: status Backlog, priority P1-P2, reviewed true, phase "Phase 1"
@@ -321,7 +329,7 @@ Standard planning cycle with full board review.
   }
   parts.push(`
 6. **Maturity Gate** \u2014 Before scheduling any task, check whether the project is ready for it:
-   - **Cycle number as signal:** A Cycle 3 project should not be scheduling OAuth, billing, or analytics tasks. Early cycles focus on core functionality and proving the concept works.
+   - **Cycle number as signal (local-first sequencing, AD-51):** A Cycle 1-3 project should not be scheduling OAuth, billing, analytics, or RLS/cloud-hardening tasks. Early cycles sequence local-first / thinnest-vertical-slice-first: get the core loop working on local/mock data, THEN persistence, THEN auth (never third-party OAuth as an opening task), and RLS/cloud hardening LAST. Defer infrastructure until a working slice exists \u2014 plan the way the everyday builder gets value, not the way a backend team scaffolds.
    - **Phase prerequisites:** If the board has phases, tasks from later phases should only be scheduled when earlier phases have completed tasks (check Done count per phase). A task in "Phase 4: Monetisation" is premature if Phase 2 tasks are still in Backlog.
    - **Dependency chain:** If a task's \`dependsOn\` references incomplete tasks, it cannot be scheduled regardless of priority.
    - **Task maturity:** Tasks with \`maturity: "raw"\` are unscoped ideas from the idea tool. The planner IS the scoping mechanism \u2014 scope them as part of planning. For raw tasks selected for a cycle: (a) derive clear scope, acceptance criteria, and effort from the title, notes, and project context, (b) upgrade them to \`maturity: "investigated"\` via a \`boardCorrections\` entry, and (c) generate a BUILD HANDOFF as normal. For research-type raw tasks, scope the handoff as an investigation task \u2014 the deliverable is findings + follow-up backlog tasks, not code. Only leave a raw task unscheduled if you genuinely cannot derive scope from the available context \u2014 note why in the cycle log. Tasks with \`maturity: "ready"\` or no maturity field are considered cycle-ready. Tasks with \`maturity: "investigated"\` have been scoped but may still need refinement \u2014 schedule them if priority warrants it.
@@ -395,7 +403,7 @@ ${AD_REJECTION_RULES}
    - **Architecture Notes:** If a pattern was established that needs follow-up (e.g. "shared service layer created, MCP migration needed"), propose the follow-up.
    - **Strategy gaps:** If an Active Decision has no board tasks supporting it, propose one.
    - **Dogfood observations:** Unactioned dogfood entries span FOUR categories (friction, methodology, signal, commercial) \u2014 consider ALL of them, not just friction. If an entry (with ID) maps to no existing task, propose one, matching task type to category: friction/signal \u2192 fix or improvement; methodology \u2192 process/tooling change; commercial \u2192 GTM/positioning. **CRITICAL: Include \`dogfood:<uuid>\` in the new task's \`notes\` field** (e.g. \`"notes": "dogfood:abc12345-..."\`). This links the task to the observation so the pipeline can track what was actioned. Without this annotation, the observation stays unactioned forever.
-   Create new tasks via the \`newTasks\` array in Part 2. Use \`new-N\` IDs in \`cycleHandoffs\` to reference them. **Limit: 3 new tasks per cycle** to prevent backlog bloat.
+   Create new tasks via the \`newTasks\` array in Part 2. Use \`new-N\` IDs in \`cycleHandoffs\` to reference them. **New-task cap \u2014 scale it to backlog depth, do NOT hard-cap at 3:** when 5+ unblocked backlog candidates already exist, limit new tasks to 3 (there is plenty to select from, so prevent backlog bloat). When the backlog is thin (fewer than 5 unblocked candidates), create as many as needed to bring the cycle into the healthy 5-8 range \u2014 roughly \`6 \u2212 unblocked_candidates\` additional new tasks, derived from the brief, roadmap phases, and recent build reports \u2014 up to an absolute ceiling of **8 new tasks**. This removes the contradiction with the Cycle-sizing rule above (a healthy cycle is 6-10 tasks; <5 needs justification), which the old flat cap of 3 violated whenever the backlog was thin. Never invent filler to hit a number: every new task must trace to a real discovered issue, dogfood entry, roadmap phase, or brief item.
    **\u26A0\uFE0F DUPLICATE CHECK:** Before adding a task to \`newTasks\`, scan the Cycle Board above for any existing task with the same or very similar title/scope. If a matching task already exists (even with slightly different wording), do NOT create a duplicate \u2014 reference the existing task ID instead. The board already contains all active tasks; re-creating them wastes IDs and bloats the board.
    **\u26A0\uFE0F ALREADY-BUILT CHECK:** Before creating a task, check the recent build reports and cycle log for evidence that this capability was already shipped. If a recent build report shows this feature was completed (even under a different task name), do NOT create a new task for it. This is especially important for UI features, data models, and integrations that may already exist.`);
   parts.push(PLAN_FRAGMENT_PRODUCT_BRIEF);
@@ -473,6 +481,9 @@ function buildPlanUserMessage(ctx) {
       ""
     );
   }
+  if (ctx.siblingRepoWarning) {
+    parts.push("", ctx.siblingRepoWarning, "");
+  }
   parts.push("", "---", "", "## PROJECT CONTEXT", "");
   if (ctx.contextTier) {
     parts.push(`**Context tier:** ${ctx.contextTier}`, "");
@@ -828,7 +839,9 @@ After your natural language output, include this EXACT format on its own line:
     {
       "id": "string \u2014 AD-N (existing) or new AD-N (for new decisions)",
       "action": "confidence_change | modify | resolve | supersede | new | delete",
-      "body": "string \u2014 full AD block including ### heading, confidence tag, and body text (empty string for delete)"
+      "body": "string \u2014 full AD block including ### heading, confidence tag, and body text (empty string for delete)",
+      "evidenceRef": "string (optional) \u2014 for DELIBERATE decisions (modify / resolve / delete = validate/modify/invalidate), a pointer to the evidence that justified the change: a doc path (docs/research/foo.md), a build-report id, or a metric name. Builds the decision->outcome ledger. Omit if no concrete evidence.",
+      "metricDelta": "object (optional) \u2014 { "metric": string, "before": number, "after": number, "delta": number } \u2014 which metric moved and by how much. Use a REAL metric name from cycle_metrics_snapshots where possible (e.g. scope_accuracy, velocity, est_actual_drift). Omit if no metric moved."
     }
   ],
   "decisionScores": [
@@ -888,6 +901,8 @@ The JSON must be valid. Use null for optional fields that don't apply.
 For activeDecisionUpdates, the body field must be the COMPLETE replacement text for the AD block (including the ### heading line).
 Only include ADs that need changes \u2014 omit unchanged ADs.${compressionNote}
+**Decision-outcome ledger (task-2168):** For DELIBERATE decisions \u2014 \`modify\`, \`resolve\`, or \`delete\` (i.e. you validated, modified, or invalidated an AD based on what actually happened) \u2014 include \`evidenceRef\` and/or \`metricDelta\` so the decision becomes a queryable ledger entry rather than freetext. \`evidenceRef\` points at WHAT justified the change (a doc path, a build-report id, or a metric name). \`metricDelta\` records WHICH metric moved (prefer a real metric from cycle_metrics_snapshots) and its before->after. This is **guidance, not a hard requirement** \u2014 if a deliberate change genuinely has no concrete evidence, omit both and proceed; the apply will record the event with a non-blocking warning.
 ## PERSISTENCE RULES \u2014 READ THIS CAREFULLY
 Everything in Part 1 (natural language) is **display-only**. Part 2 (structured JSON) is what gets written to files.
@@ -1053,7 +1068,9 @@ After your natural language output, include this EXACT format on its own line:
     {
       "id": "string \u2014 AD-N (existing) or new AD-N (for new decisions)",
       "action": "confidence_change | modify | resolve | supersede | new | delete",
-      "body": "string \u2014 full AD block including ### heading, confidence tag, and body text (empty string for delete)"
+      "body": "string \u2014 full AD block including ### heading, confidence tag, and body text (empty string for delete)",
+      "evidenceRef": "string (optional) \u2014 for DELIBERATE changes (modify / resolve / delete), a pointer to the justifying evidence: a doc path, a build-report id, or a metric name. Builds the decision->outcome ledger. Omit if none.",
+      "metricDelta": "object (optional) \u2014 { "metric": string, "before": number, "after": number, "delta": number } \u2014 which metric moved. Prefer a real metric name from cycle_metrics_snapshots. Omit if none."
     }
   ],
   "phaseUpdates": [
@@ -1370,6 +1387,37 @@ Return a JSON array of 3-10 tasks. Each task must have:
 - Use the full complexity range: XS (config/one-liner), Small (one file), Medium (2-5 files), Large (cross-module), XL (architectural)
 - Tasks should be specific enough to execute without further investigation
 - Maximum 10 tasks \u2014 fewer is better if the codebase is well-maintained`;
+var VISION_TASKS_SYSTEM = `You are a product engineer turning a project VISION into a starter backlog for a brand-new project (no code exists yet). The user has described what they want to build; your job is to make that vision legible as concrete, buildable tasks.
+IMPORTANT: You are running as a non-interactive API call. Do NOT ask questions. Produce tasks directly.
+## OUTPUT FORMAT
+Return a JSON array of 15-20 tasks. Each task must have:
+- "title": Clear, actionable task title (start with a verb)
+- "priority": "P0 Critical", "P1 High", "P2 Medium", or "P3 Low"
+- "complexity": "XS", "Small", "Medium", "Large", or "XL"
+- "module": A module name inferred from the vision (e.g. "Core", "Auth", "Frontend", "API", "Payments")
+- "phase": A phase name ("Phase 1" for the first shippable slice, "Phase 2" for what follows, etc.)
+- "notes": 1-2 sentences tying this task to the user's stated vision
+## GUIDELINES
+- Cover the VISION, not a generic app skeleton. Every task must trace to something the user actually described.
+- Sequence local-first / thinnest-vertical-slice-first (AD-51): the first few P0/P1 tasks should be the smallest path to a visibly working thing, NOT infrastructure scaffolding (no "set up CI", "configure auth provider", "design database schema" as opening tasks). Plan the way the user gets value, not the way a backend team scaffolds.
+- Mix: the first shippable feature slice (Phase 1), then the next features, then supporting/foundational work behind them.
+- 15-20 is a TARGET BAND for good vision coverage, NOT a quota. If the vision is genuinely small, produce fewer high-quality tasks. NEVER pad with filler ("add tests", "write docs", "refactor") to hit a number.
+- Use the full complexity range. Keep titles concrete enough to execute without re-deriving the vision.
+- Do NOT add PAPI-setup tasks \u2014 those are handled by the setup flow.`;
+function buildVisionTasksPrompt(inputs) {
+  const line = (label, val) => val?.trim() ? `**${label}:** ${val.trim()}
+` : "";
+  return `Turn this project vision into a 15-20 item starter backlog.
+**Project:** ${inputs.projectName}
+${line("What it is", inputs.description)}${line("Target users", inputs.targetUsers)}${line("Problems it solves", inputs.problems)}${line("Project type", inputs.projectType)}
+You are ALSO generating this project's Product Brief in this same setup round \u2014 use that fuller vision as your primary source. Return a JSON array of 15-20 tasks (coverage over count; do not pad) that make this vision a buildable backlog, thinnest-shippable-slice first.`;
+}
 function buildInitialTasksPrompt(inputs) {
   const description = inputs.description?.trim() ? `**Description:** ${inputs.description}
 ` : `**Description:** (not provided \u2014 derive from the codebase analysis below)
@@ -1398,6 +1446,7 @@ export {
   PRODUCT_BRIEF_SYSTEM,
   REVIEW_SYSTEM,
   STRATEGY_CHANGE_SYSTEM,
+  VISION_TASKS_SYSTEM,
   buildAdSeedPrompt,
   buildConventionsPrompt,
   buildHandoffRegenMessage,
@@ -1408,6 +1457,7 @@ export {
   buildProductBriefPrompt,
   buildReviewSystemPrompt,
   buildReviewUserMessage,
+  buildVisionTasksPrompt,
   parseReviewStructuredOutput,
   parseStrategyChangeOutput,
   parseStructuredOutput
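
The reworded new-task cap in the `@@ -395,7 +403,7 @@` hunk is stated only as prompt text for the planner model. As a rough illustration of the arithmetic it describes, the sketch below encodes the rule in a helper. The function name `newTaskCap` and its constants are hypothetical and do not appear in the package; the thresholds (5 unblocked candidates, cap of 3, target of 6, ceiling of 8) are taken from the prompt wording.

```javascript
// Hypothetical helper, NOT part of @papi-ai/server. The package expresses this
// rule only as natural-language guidance inside a prompt string.
function newTaskCap(unblockedCandidates) {
  if (!Number.isInteger(unblockedCandidates) || unblockedCandidates < 0) {
    throw new RangeError("unblockedCandidates must be a non-negative integer");
  }

  const DEEP_BACKLOG_THRESHOLD = 5; // "5+ unblocked backlog candidates"
  const DEEP_BACKLOG_CAP = 3;       // "limit new tasks to 3"
  const THIN_BACKLOG_TARGET = 6;    // "roughly 6 - unblocked_candidates"
  const ABSOLUTE_CEILING = 8;       // "absolute ceiling of 8 new tasks"

  if (unblockedCandidates >= DEEP_BACKLOG_THRESHOLD) {
    // Plenty to select from, so keep backlog growth small.
    return DEEP_BACKLOG_CAP;
  }

  // Thin backlog: top the cycle up toward the healthy range.
  return Math.min(THIN_BACKLOG_TARGET - unblockedCandidates, ABSOLUTE_CEILING);
}

// Print the cap for a few backlog depths.
for (const n of [0, 2, 4, 5, 12]) {
  console.log(`${n} unblocked -> up to ${newTaskCap(n)} new tasks`);
}
```

Under this reading the `6 - unblocked_candidates` estimate never exceeds 6, so the ceiling of 8 only matters if the planner goes above the rough estimate, which the word "roughly" in the prompt allows.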

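The two optional fields added to `activeDecisionUpdates` entries (`evidenceRef` and `metricDelta`) are described in the diff only as schema comments. The sketch below shows one possible shape of such an entry. The helper `buildMetricDelta` is hypothetical, the numbers are illustrative rather than real measurements, and computing `delta` as `after - before` is an assumption: the diff names the field but does not define its sign convention.

```javascript
// Hypothetical helper, NOT part of @papi-ai/server.
// Assumption: delta = after - before (the diff does not define the convention).
function buildMetricDelta(metric, before, after) {
  if (typeof metric !== "string" || metric.trim() === "") {
    throw new TypeError("metric must be a non-empty string");
  }
  if (!Number.isFinite(before) || !Number.isFinite(after)) {
    throw new TypeError("before and after must be finite numbers");
  }
  return { metric, before, after, delta: after - before };
}

// One activeDecisionUpdates entry using the new optional fields.
// "velocity" and "docs/research/foo.md" are the examples given in the prompt
// text; the values 6 and 8 are made up for illustration.
const update = {
  id: "AD-51",
  action: "modify",
  body: "(complete replacement AD block, including the ### heading line)",
  evidenceRef: "docs/research/foo.md",
  metricDelta: buildMetricDelta("velocity", 6, 8),
};

console.log(JSON.stringify(update, null, 2));
```

Per the added guidance paragraph, both fields may be omitted when a deliberate change has no concrete evidence behind it.
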
package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@papi-ai/server",
-  "version": "0.7.34",
+  "version": "0.7.35",
   "description": "PAPI MCP server — AI-powered sprint planning, build execution, and strategy review for software projects",
   "license": "Elastic-2.0",
   "mcpName": "io.github.cathalos92/papi",
@@ -23,6 +23,7 @@
     "build:skills": "PAPI_SKILLS_OUT=skills/papi-cycle node scripts/export-skills.mjs",
     "build:prompts": "tsup src/prompts.ts --format esm",
     "start": "node dist/index.js",
+    "backfill-cycle-metrics": "node dist/backfill-cycle-metrics.js",
     "test": "vitest run",
     "test:watch": "vitest",
     "lint": "eslint src",

package/skills/papi-cycle/papi-plan/SKILL.md CHANGED

@@ -40,6 +40,8 @@ plan → build_list → build_execute → audit → review_list → review_submi
 7. **review_submit** — Records the review verdict and updates task status.
    Next: `build_list` to view next build
+   **Quality Gate (standard for every build).** Before accepting a build, run a code review of the branch diff — either call `review_submit` with `dispatch:"subagent"` to auto-review, or attach `auto_review` findings yourself. `review_submit` surfaces the findings on every verdict. Accepting past a `fail`/`warn` gate is a deliberate **override** and is recorded on the review for the audit trail. Always run the gate on risk-tier work (auth, data, migrations, CI).
    **DO NOT** use `review_submit` as a substitute for `review_list`. If you need to see what is pending review, always call `review_list` first. If `review_list` is unavailable in your tool set (e.g. your MCP client filters parameterless tools), STOP and tell the human their MCP integration is incomplete — never guess at the next pending task. To submit an accept verdict on a build-acceptance review, either pass `reviewer_confirmed: true` or ensure `review_list` has run in the same session within the last 15 minutes. (SUP-2026-010.)
 ### Strategy Review
@@ -70,6 +72,18 @@ idea → (picked up by next plan)
 - **idea** — Captures a new task idea and writes it to the backlog.
   Next: The next `plan` run will prioritise and schedule it.
+### Friction Moment — Report PAPI Bugs & Ideas Upstream
+When a PAPI tool returns a **workflow-blocking error** (it crashed, won't advance the cycle, or repeats after a retry), don't just stop — offer the user a zero-friction escape hatch to send it upstream to the PAPI team, then keep them moving:
+1. Offer to submit it: run `bug` with `report=true`. Set `type='bug'` for a defect or `type='idea'` for a feature request / suggestion (the same tool handles both).
+2. Ask two quick questions and pass the answers through:
+   - **Notify when fixed?** → `notify_when_fixed=true` (the resolution surfaces back in a later `orient` and on their dashboard).
+   - **OK for the PAPI team to reach out?** → `contact_ok=true`.
+3. Confirm the submission ID back to the user, then suggest the workaround / next step so they're unblocked.
+The same path works any time a user *wants* to send feedback — not only on errors. Diagnostics (Node version, platform, adapter) are attached automatically; the submission is project-scoped and only visible to the user and PAPI maintainers.
 ### Project Bootstrap
 ```

package/skills/papi-cycle/papi-strategy/SKILL.md CHANGED

@@ -10,7 +10,7 @@ description: Invoke when running strategy_review or making Active Decision chang
 Every 5 cycles, PAPI offers a strategy review — a deep analysis of velocity, estimation accuracy, active decisions, and project direction.
 - **Don't skip them.** They're where compounding value comes from.
-- Strategy reviews run in their own session — don't mix with building.
+- If your session is already heavy with build context, run the review fresh for cleaner output — a genuinely fresh session needs no restart.
 - Reviews produce recommendations that feed into the next plan.
 - If the review recommends AD changes, use `strategy_change` to apply them.