npm - waypoint-codex - Versions diffs - 0.19.2 → 1.0.0 - Mend

waypoint-codex 0.19.2 → 1.0.0

Files changed (41)

package/README.md +18 -37
package/dist/src/cli.js +1 -1
package/dist/src/core.js +33 -116
package/dist/src/docs-index.js +1 -1
package/dist/src/templates.js +0 -10
package/package.json +1 -1
package/templates/.agents/skills/agi-help/SKILL.md +1 -1
package/templates/.agents/skills/code-guide-audit/SKILL.md +1 -5
package/templates/.agents/skills/planning/SKILL.md +12 -10
package/templates/.agents/skills/pr-review/SKILL.md +0 -1
package/templates/.codex/agents/code-health-reviewer.toml +6 -5
package/templates/.codex/agents/code-reviewer.toml +6 -5
package/templates/.codex/agents/plan-reviewer.toml +6 -5
package/templates/.waypoint/ACTIVE_PLANS.md +7 -7
package/templates/.waypoint/README.md +5 -8
package/templates/.waypoint/config.toml +0 -5
package/templates/.waypoint/docs/README.md +1 -3
package/templates/.waypoint/scripts/build-docs-index.mjs +25 -11
package/templates/.waypoint/scripts/prepare-context.mjs +120 -205
package/templates/WORKSPACE.md +2 -6
package/templates/managed-agents-block.md +22 -111
package/dist/src/track-index.js +0 -107
package/templates/.agents/skills/break-it-qa/SKILL.md +0 -184
package/templates/.agents/skills/break-it-qa/agents/openai.yaml +0 -4
package/templates/.agents/skills/conversation-retrospective/SKILL.md +0 -147
package/templates/.agents/skills/conversation-retrospective/agents/openai.yaml +0 -4
package/templates/.agents/skills/docs-sync/SKILL.md +0 -78
package/templates/.agents/skills/docs-sync/agents/openai.yaml +0 -4
package/templates/.agents/skills/merge-ready-owner/SKILL.md +0 -196
package/templates/.agents/skills/merge-ready-owner/agents/openai.yaml +0 -4
package/templates/.agents/skills/pre-pr-hygiene/SKILL.md +0 -83
package/templates/.agents/skills/pre-pr-hygiene/agents/openai.yaml +0 -4
package/templates/.agents/skills/work-tracker/SKILL.md +0 -139
package/templates/.agents/skills/work-tracker/agents/openai.yaml +0 -4
package/templates/.agents/skills/workspace-compress/SKILL.md +0 -113
package/templates/.agents/skills/workspace-compress/agents/openai.yaml +0 -4
package/templates/.waypoint/SOUL.md +0 -71
package/templates/.waypoint/agent-operating-manual.md +0 -178
package/templates/.waypoint/scripts/build-track-index.mjs +0 -169
package/templates/.waypoint/track/README.md +0 -38
package/templates/.waypoint/track/_template.md +0 -48

package/templates/managed-agents-block.md CHANGED

@@ -1,130 +1,41 @@
 <!-- waypoint:start -->
 # Waypoint
-This repository uses Waypoint as its Codex operating system.
-These instructions are mandatory for work in this repo. Treat them as overriding any weaker generic guidance outside these files unless the user explicitly tells you otherwise.
+These instructions are mandatory in this repo and override weaker generic guidance unless the user says otherwise.
 Waypoint owns only the text inside these `waypoint:start/end` markers.
 If you need repo-specific AGENTS instructions, write them outside this managed block.
 Do not put durable repo guidance inside the managed block, because `waypoint init` may replace it during upgrades.
-Stop here if the bootstrap has not been run yet.
+You are a direct, evidence-driven collaborator. Investigate before claiming status. Fix root causes when the scope supports it. Keep communication concise.
-Run the Waypoint bootstrap only in these cases:
-- at the start of a new session
-- immediately after a compaction
-- if the user explicitly tells you to rerun it
+This repo's default artifact flow is:
+1. `AGENTS.md` for the always-on contract
+2. `.waypoint/WORKSPACE.md` for current repo state
+3. `.waypoint/ACTIVE_PLANS.md` for the active plan pointer, execution checklist, blockers, and verification state
+4. `.waypoint/DOCS_INDEX.md` for durable docs routing
+5. `.waypoint/context/SNAPSHOT.md` and `.waypoint/context/RECENT_THREAD.md` for generated volatile context
-Bootstrap sequence:
+Run the Waypoint bootstrap only at session start, after compaction, or when the user explicitly asks for it:
 1. Run `node .waypoint/scripts/prepare-context.mjs`
-2. Read `.waypoint/SOUL.md`
-3. Read `.waypoint/agent-operating-manual.md`
-4. Read `.waypoint/WORKSPACE.md`
-5. Read `.waypoint/ACTIVE_PLANS.md`
-6. Read `.waypoint/context/MANIFEST.md`
-7. Read every file listed in the manifest
-This is mandatory, not optional.
-- Do not skip it at session start or after compaction.
-- Do not rerun it mid-conversation just because a task is substantial.
-- Earlier chat context or earlier work in the session does not replace the bootstrap when a new session starts or a compaction happens.
-- If you are not sure whether a new session started or a compaction happened, rerun it.
-- Do not skip the context refresh or skip files in the manifest.
-Before making meaningful implementation, review, architectural, or tradeoff decisions, inspect the project root guidance files for persisted project context.
-Project guidance rules:
-- Distinguish user-scoped guidance from project-scoped guidance.
-- User-scoped `AGENTS.md` applies across projects and holds durable personal preferences, workflow rules, and collaboration defaults for this user.
-- Prefer `AGENTS.md` in the project root if present.
-- The project root `AGENTS.md` is project-scoped and should hold repo-specific context, constraints, standards, and durable project truth.
-- Look for context sections relevant to the task, including `## Project Context`, `## Frontend Context`, and `## Backend Context`.
-- Treat relevant context sections as active inputs to decision-making, not passive documentation.
-- Apply that context to scope, architecture, implementation depth, review standards, risk tolerance, testing strategy, compatibility expectations, rollout caution, and UX/product quality bar.
+2. Read `AGENTS.md`
+3. Read `.waypoint/WORKSPACE.md`
+4. Read `.waypoint/ACTIVE_PLANS.md`
+5. Read `.waypoint/DOCS_INDEX.md`
+6. Read `.waypoint/context/SNAPSHOT.md`
+7. Read `.waypoint/context/RECENT_THREAD.md`
-Examples of durable context that can materially change the correct approach:
-- internal tool vs public internet-facing product
-- expected scale, criticality, and usage patterns
-- regulatory, privacy, or compliance requirements
-- browser and device support expectations
-- accessibility expectations
-- SEO requirements
-- tenant model and authorization model
-- backward compatibility requirements
-- reliability and observability expectations
-- security posture assumptions
+Before major implementation or architecture changes, check the repo guidance and routed docs for durable context. Ask only the missing high-leverage questions.
-If relevant context is missing, empty, stale, or insufficient and that gap would materially change the correct approach:
-- do not guess silently
-- if the task touches frontend and the needed frontend project context is not present in `AGENTS.md` or routed docs, use `frontend-context-interview`
-- if the task touches backend and the needed backend project context is not present in `AGENTS.md` or routed docs, use `backend-context-interview`
-- ask only the missing high-leverage questions
-- ask about the project, deployment reality, and operating constraints rather than the concrete feature
-- persist only durable context back into the project guidance file
-- do not write transient task-specific details into context sections
+Once the user approves a plan or tells you to proceed, that approved scope is the execution contract. Do not silently narrow, defer, or drop approved work unless a real blocker or decision requires discussion.
-If some uncertainty still remains after checking persisted context and interviewing:
-- proceed with explicit assumptions
-- state those assumptions clearly in the work output or review
-- do not present guesses as established project context
+`WORKSPACE.md` is the live state file. `ACTIVE_PLANS.md` is the active execution checklist. Keep them current when state, blockers, or verification materially change.
-Prefer existing persisted context over re-interviewing the user.
+Refactor and migration default: use direct replacement, not compatibility scaffolding, unless the user or project docs explicitly require coexistence. Delete obsolete code aggressively and finish the phase back to green. Large destructive edits are allowed when they are the clearest path to the approved target state.
-If the user approves a plan or explicitly tells you to proceed, treat that as authorization to execute the work end to end. An approved plan is the active execution contract: do not silently narrow, defer, or drop planned work because the system feels good enough, the remaining work feels less important, or you would prefer a smaller PR. If you believe the approved scope should change, pause and discuss that change with the user before proceeding. Only change approved scope without that discussion when a real blocker, hidden-risk decision, or explicit user redirect requires it.
-When work is in flight elsewhere — reviewer agents, subagents, CI, automated review, external jobs, or other waiting periods — wait as long as required. There is no fixed waiting limit, and slowness alone is not a reason to interrupt or abandon the work.
-When you use a browser, app, or other interactive UI to inspect, reproduce, or verify something, send the user screenshots of the relevant states so they can see what you saw. If screenshots are not possible in the current environment, say so explicitly.
-When an explanation is clearer visually, use Mermaid diagrams directly in chat for flows, architecture, state, and plans.
+Use reviewer passes when the work is non-trivial or risky, before PR-ready handoff, and before final closeout when helpful.
-Delivery expectations:
-- Keep communication concise by default. Lead with the answer, diagnosis, decision, or next step, and include only the most important supporting detail unless the user asks for more.
-- For planned work, treat `.waypoint/ACTIVE_PLANS.md` as the live execution contract and define done from the approved scope, current phase checkpoint, and acceptance criteria, not from your own sense that the system is already good enough.
-- Execute approved plans phase by phase. Finish the current phase, run the relevant checkpoint, resolve findings, and only then move to the next phase.
-- When you report back to the user, explain the result in plain, direct language. Say what you changed, what happened, and anything the user actually needs to know, but do not lean on jargon, low-level implementation detail, or code-heavy narration unless the user asks for it.
-- Write for a smart person who is not looking at the code. The goal is clarity, not technical performance.
-- This communication rule applies to how you explain the work, not to how you do it. Your actual reasoning, coding, debugging, and verification should stay technical, precise, and rigorous.
-- When the user shows a bug, broken behavior, or a screenshot of something wrong, investigate before discussing readiness.
-- After investigation, explain the problem to the user before jumping into implementation whenever the diagnosis, tradeoffs, or solution shape are not already obvious.
-- Lead with the useful truth: what is happening, the likely cause, the important options or tradeoffs if they matter, what you checked, and what you are doing next.
-- Fix the underlying problem, not only the visible symptom. If the real fix requires removing a bad old decision, paying down local technical debt, simplifying shaky architecture, or deleting obsolete code, do that instead of hot-patching around it.
-- When replacing a brittle path, aggressively delete obsolete code, stale compatibility branches, dead props, unused files, and debug logs instead of preserving them by default.
-- Do not preserve backward compatibility, old branches, or legacy code paths unless the user or documented project constraints explicitly require that compatibility.
-- Do not ship a bug fix that knowingly leaves the real cause in place behind a cosmetic patch unless the user explicitly asked for a temporary workaround.
-- Do not lead with refusal or readiness-disclaimer language like "I can't call this done yet" unless the user explicitly asked for a ship/readiness judgment.
-- Honesty means accurate diagnosis, explicit uncertainty, and clear verification limits. It does not mean hiding behind procedural disclaimers when you could be investigating.
-- Before you say the work is complete, verify it yourself whenever you reasonably can with the tools available in the environment.
-- Before you report completion, reread `.waypoint/ACTIVE_PLANS.md`, the active tracker if one exists, `WORKSPACE.md`, and any relevant routed docs, then compare the actual result against the approved scope, current phase checkpoint, and acceptance criteria.
-- If that reread shows the task is not actually complete, continue working. Do not stop just to report partial progress as if it were completion.
-- Match the verification to the task. Run code and inspect real output for scripts and backend changes. Click through flows, inspect rendered states, and check behavior in the browser for visual or interactive work.
-- Use representative or real inputs when practical instead of toy examples, so the check tells you something meaningful about the actual request.
-- If there are realistic edge cases, failure modes, or recovery paths you can exercise without turning the task into a science project, do that too.
-- If something looks wrong, incomplete, or unproven, keep going. Fix it, rerun the check, and only report completion once the result matches the request.
-- Do not call work done while approved scope or acceptance criteria remain unfinished. If any approved item was skipped or deferred, report that plainly as partial work or a scope-change proposal, not as completion.
-- The point of this is to keep iteration off the user's shoulders. Return finished work when possible, not a first pass that still depends on the user to spot-check it for you.
-- Only come back before that if you hit a genuine blocker you cannot clear with the codebase, tools, or available context. If that happens, say it plainly and be explicit about what remains unverified.
+Keep communication concise. Lead with the answer, diagnosis, decision, or next step. Explain the diagnosis before implementation when the cause, tradeoffs, or solution shape are not already obvious.
-Working rules:
-- Treat `.waypoint/WORKSPACE.md` as a mandatory live execution log, not a closeout chore.
-- Treat `.waypoint/ACTIVE_PLANS.md` as the mandatory live execution-contract file for approved plans.
-- Update `.waypoint/WORKSPACE.md` during the work whenever the active goal, current phase, next step, blocker, verification state, or handoff context materially changes.
-- Update `.waypoint/ACTIVE_PLANS.md` whenever the active approved plan, current phase, phase checklist, checkpoint, or approved scope changes.
-- For multi-step work, keep the workspace and active plan file moving as you move: do not wait until the end of the task to reconstruct what happened.
-- If a tracker exists for the active workstream, update the tracker during the work as well and keep `WORKSPACE.md` pointing at the current tracker state.
-- Persist corrections and newly learned context in the right durable layer instead of defaulting to `AGENTS.md`.
-- Update user-scoped `AGENTS.md` only for true cross-project standing preferences or global operating rules.
-- Update the project-scoped repo `AGENTS.md` only for durable repo context or project-wide rules that should always apply in this repo.
-- If the correction is workflow-specific or method-specific, update the relevant repo skill instead. If no existing skill owns it well, propose creating one instead of stuffing that guidance into `AGENTS.md`.
-- Update `.waypoint/docs/` when durable project knowledge changes, update `.waypoint/plans/` when a durable plan changes, update `.waypoint/ACTIVE_PLANS.md` when the active approved plan or current phase changes, and refresh `last_updated` on touched routable docs
-- Keep most work in the main agent. Use repo-local skills, trackers, and reviewer agents when they create clear leverage, not as default ceremony.
-- Let repo-local skills describe their own triggers. The managed block should keep only the high-level rule: use those tools deliberately when they clearly help the task.
-- Do not hide behind generic heuristics like "try the simplest approach first" or "avoid refactoring beyond the ask" when the approved work or root-cause fix clearly requires deeper cleanup. Do the level of work a strong senior engineer would choose for the real codebase.
-- Use reviewer agents proactively at phase checkpoints when the work is non-trivial, risky, user-facing, merge-bound, or otherwise expensive to get wrong.
-- Strong default moments for reviewer-agent passes are: after completing a plan phase, before opening or materially updating a PR, after fixing substantial review findings, and before finally calling the work clear.
-- Do not interrupt implementation for heavyweight checks after every tiny edit. Batch related work into the current plan phase, then run the checkpoint.
-- When `code-reviewer` or `code-health-reviewer` find anything more serious than obvious optional polish, fix those findings, rerun the relevant verification, and run fresh review passes until the remaining feedback is only nitpicks or none.
-- Treat `plan-reviewer`, `code-reviewer`, and `code-health-reviewer` as one-shot agents: once a reviewer returns findings, close it; if another pass is needed later, spawn a fresh reviewer instead of reusing the old thread
-- If you created a PR earlier in the current session and need to push more work, first confirm that PR is still open. If it is closed, create a fresh branch from `origin/main` and open a fresh PR instead of pushing more commits to the old PR branch
-- Treat the generated context bundle as required session bootstrap, not optional reference material
-- After plan approval, own the execution through implementation, verification, and necessary repo-memory updates before surfacing a final completion report
+Before reporting completion, verify the result yourself when reasonably possible, reread `ACTIVE_PLANS.md` and `WORKSPACE.md`, and compare reality against the approved scope. If the work is not actually complete, keep going.
 <!-- waypoint:end -->
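
Note on the hunk above: the bootstrap read list changes from `SOUL.md`, `agent-operating-manual.md`, `WORKSPACE.md`, `ACTIVE_PLANS.md`, and the manifest files (0.19.2) to six fixed files (1.0.0). The sketch below is one way a repo could check itself against the new list. `checkBootstrap`, both constant names, and the injected `exists` callback are hypothetical and are not part of waypoint-codex. Whether the package still generates `.waypoint/context/MANIFEST.md` is not visible in this hunk, so the sketch only records that the managed block no longer lists it.

```javascript
// Hypothetical helper, not shipped by waypoint-codex.
// Files the 1.0.0 managed block says to read after prepare-context.mjs runs.
const BOOTSTRAP_READ_ORDER = [
  "AGENTS.md",
  ".waypoint/WORKSPACE.md",
  ".waypoint/ACTIVE_PLANS.md",
  ".waypoint/DOCS_INDEX.md",
  ".waypoint/context/SNAPSHOT.md",
  ".waypoint/context/RECENT_THREAD.md",
];

// Files the 0.19.2 block listed that the 1.0.0 block no longer mentions.
const NO_LONGER_LISTED = [
  ".waypoint/SOUL.md",
  ".waypoint/agent-operating-manual.md",
  ".waypoint/context/MANIFEST.md",
];

// `exists` is injected so the check can run against a real repo
// (for example fs.existsSync) or against a stub.
function checkBootstrap(exists) {
  // Files the new block expects but the repo does not have.
  const missing = BOOTSTRAP_READ_ORDER.filter((file) => !exists(file));
  // Files from the old sequence that are still present; not necessarily an error.
  const leftover = NO_LONGER_LISTED.filter((file) => exists(file));
  return { missing, leftover };
}
```

Run from the project root, passing Node's `fs.existsSync` as `exists` would report which expected files are absent and which 0.19.2-era files remain after an upgrade.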

package/dist/src/track-index.js DELETED

@@ -1,107 +0,0 @@
-import { existsSync, readFileSync, readdirSync, statSync } from "node:fs";
-import path from "node:path";
-const VALID_TRACK_STATUSES = new Set(["active", "blocked", "paused", "done", "archived"]);
-const ACTIVE_TRACK_STATUSES = new Set(["active", "blocked", "paused"]);
-const SKIP_NAMES = new Set(["README.md", "CHANGELOG.md", "LICENSE.md"]);
-function shouldSkipTrackFile(entry) {
-    return SKIP_NAMES.has(entry) || entry.startsWith("_");
-}
-function parseFrontmatter(filePath) {
-    const text = readFileSync(filePath, "utf8");
-    if (!text.startsWith("---\n")) {
-        return { summary: "", lastUpdated: "", readWhen: [], status: "" };
-    }
-    const endIndex = text.indexOf("\n---\n", 4);
-    if (endIndex === -1) {
-        return { summary: "", lastUpdated: "", readWhen: [], status: "" };
-    }
-    const frontmatter = text.slice(4, endIndex);
-    let summary = "";
-    let lastUpdated = "";
-    let status = "";
-    const readWhen = [];
-    let collectingReadWhen = false;
-    for (const rawLine of frontmatter.split("\n")) {
-        const line = rawLine.trim();
-        if (line.startsWith("summary:")) {
-            summary = line.slice("summary:".length).trim().replace(/^['"]|['"]$/g, "");
-            collectingReadWhen = false;
-            continue;
-        }
-        if (line.startsWith("last_updated:")) {
-            lastUpdated = line.slice("last_updated:".length).trim().replace(/^['"]|['"]$/g, "");
-            collectingReadWhen = false;
-            continue;
-        }
-        if (line.startsWith("status:")) {
-            status = line.slice("status:".length).trim().replace(/^['"]|['"]$/g, "").toLowerCase();
-            collectingReadWhen = false;
-            continue;
-        }
-        if (line.startsWith("read_when:")) {
-            collectingReadWhen = true;
-            continue;
-        }
-        if (collectingReadWhen && line.startsWith("- ")) {
-            readWhen.push(line.slice(2).trim());
-            continue;
-        }
-        if (collectingReadWhen && line.length > 0) {
-            collectingReadWhen = false;
-        }
-    }
-    return { summary, lastUpdated, readWhen, status };
-}
-function walkTracks(projectRoot, currentDir, output, invalid) {
-    for (const entry of readdirSync(currentDir)) {
-        const fullPath = path.join(currentDir, entry);
-        const stat = statSync(fullPath);
-        if (stat.isDirectory()) {
-            walkTracks(projectRoot, fullPath, output, invalid);
-            continue;
-        }
-        if (!entry.endsWith(".md") || shouldSkipTrackFile(entry)) {
-            continue;
-        }
-        const { summary, lastUpdated, readWhen, status } = parseFrontmatter(fullPath);
-        const relPath = path.relative(projectRoot, fullPath);
-        if (!summary || !lastUpdated || readWhen.length === 0 || !VALID_TRACK_STATUSES.has(status)) {
-            invalid.push(relPath);
-            continue;
-        }
-        output.push({ path: relPath, summary, readWhen, status });
-    }
-}
-export function renderTracksIndex(projectRoot, trackDir) {
-    const entries = [];
-    const invalidTracks = [];
-    if (existsSync(trackDir)) {
-        walkTracks(projectRoot, trackDir, entries, invalidTracks);
-    }
-    const lines = [
-        "# Tracks Index",
-        "",
-        "Auto-generated by `waypoint sync` / `waypoint doctor`. Read active trackers when resuming long-running work.",
-        "",
-        "## .waypoint/track/",
-        "",
-    ];
-    if (entries.length === 0) {
-        lines.push("No tracker files found.");
-    }
-    else {
-        for (const entry of entries.sort((a, b) => a.path.localeCompare(b.path))) {
-            lines.push(`- **${entry.path}** — [${entry.status}] ${entry.summary}`);
-            lines.push(`  Read when: ${entry.readWhen.join("; ")}`);
-        }
-    }
-    lines.push("");
-    return {
-        content: `${lines.join("\n")}`,
-        invalidTracks,
-        activeTrackPaths: entries
-            .filter((entry) => ACTIVE_TRACK_STATUSES.has(entry.status))
-            .map((entry) => entry.path)
-            .sort((a, b) => a.localeCompare(b)),
-    };
-}
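
Note on the deleted module above: a tracker file counted as valid only when its frontmatter carried a non-empty `summary`, a `last_updated` value, at least one `read_when` item, and a recognized `status`. The sketch below restates that rule in condensed form over a string instead of a file path. `parseTrackText`, `isValidTrack`, and the sample tracker contents are hypothetical illustrations. Only the two status sets are copied from the deleted source.

```javascript
// Status sets copied from the deleted track-index.js.
const VALID_TRACK_STATUSES = new Set(["active", "blocked", "paused", "done", "archived"]);
const ACTIVE_TRACK_STATUSES = new Set(["active", "blocked", "paused"]);

// Hypothetical tracker file; every field value here is made up.
const sampleTracker = [
  "---",
  "summary: Migrate billing webhooks",
  'last_updated: "2025-01-15"',
  "status: Blocked",
  "read_when:",
  "  - resuming the billing migration",
  "---",
  "# Billing migration",
  "",
].join("\n");

// Condensed restatement of the deleted parseFrontmatter, taking text directly.
function parseTrackText(text) {
  const meta = { summary: "", lastUpdated: "", readWhen: [], status: "" };
  if (!text.startsWith("---\n")) return meta;
  const end = text.indexOf("\n---\n", 4);
  if (end === -1) return meta;
  // Trim, then strip one leading and one trailing quote character.
  const unquote = (value) => value.trim().replace(/^['"]|['"]$/g, "");
  let inReadWhen = false;
  for (const rawLine of text.slice(4, end).split("\n")) {
    const line = rawLine.trim();
    if (line.startsWith("summary:")) {
      meta.summary = unquote(line.slice("summary:".length));
      inReadWhen = false;
    } else if (line.startsWith("last_updated:")) {
      meta.lastUpdated = unquote(line.slice("last_updated:".length));
      inReadWhen = false;
    } else if (line.startsWith("status:")) {
      meta.status = unquote(line.slice("status:".length)).toLowerCase();
      inReadWhen = false;
    } else if (line.startsWith("read_when:")) {
      inReadWhen = true;
    } else if (inReadWhen && line.startsWith("- ")) {
      meta.readWhen.push(line.slice(2).trim());
    } else if (inReadWhen && line.length > 0) {
      inReadWhen = false;
    }
  }
  return meta;
}

// Same condition walkTracks used before pushing a path onto `invalid`.
function isValidTrack(meta) {
  return Boolean(
    meta.summary && meta.lastUpdated && meta.readWhen.length > 0 &&
      VALID_TRACK_STATUSES.has(meta.status)
  );
}
```

The sample uses `status: Blocked` with a capital letter to show that the status is lowercased before validation, so it still counts as an active tracker.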

package/templates/.agents/skills/break-it-qa/SKILL.md DELETED

@@ -1,184 +0,0 @@
----
-name: break-it-qa
-description: Verify a user-facing feature by trying to break it on purpose instead of only following the happy path. Use after building forms, multistep flows, settings pages, onboarding, stateful UI, destructive actions, or any browser-facing feature where invalid inputs, refreshes, back navigation, repeated clicks, wrong action order, or recovery paths might expose real bugs.
----
-# Break-It QA
-Use this skill to attack the feature like an impatient, confused, or careless user.
-This skill is for adversarial manual QA. It tries to make the feature fail through invalid, interrupted, stale, repeated, or out-of-order interactions instead of only proving the happy path works.
-## Step 1: Ask The Three Setup Questions
-Before testing, ask the user these questions if the answer is not already clear from context:
-- what exact feature or scope should this cover?
-- how many attack items should the break log reach before stopping?
-- should the skill stop at findings or also fix clear issues after they are found?
-Keep this intake short. These are the main user-controlled knobs for the skill.
-If the user does not specify a count, use a reasonable default such as `40`.
-## Step 2: Read First
-Before verification:
-1. Read `.waypoint/SOUL.md`
-2. Read `.waypoint/agent-operating-manual.md`
-3. Read `.waypoint/WORKSPACE.md`
-4. Read `.waypoint/context/MANIFEST.md`
-5. Read every file listed in that manifest
-6. Read the routed docs or nearby code that define the feature being tested
-## Step 3: Identify Break Surfaces
-- Identify the happy path first so you know what "broken" means.
-- Find the fragile surfaces: forms, wizards, pending states, destructive actions, async transitions, navigation changes, and persisted state.
-- For each major step or transition, ask explicit "What if...?" questions before testing. Examples:
-  - What if the user refreshes here?
-  - What if they go back now?
-  - What if they click twice?
-  - What if this input is empty, malformed, too long, or contradictory?
-  - What if this action succeeds in the UI but fails in persistence?
-Do not test blindly.
-## Step 4: Create A Break Log
-Write or update a durable markdown log under `.waypoint/docs/`.
-- Prefer a focused path such as `.waypoint/docs/verification/<feature>-break-it-qa.md`.
-- If a routed verification doc already exists for this feature, update it instead of creating a competing file.
-- The log is part of the skill, not an optional extra.
-- Pre-generate the attack plan in this log before executing it. Do not improvise everything live.
-Use one item per attempted action. A good entry shape is:
-```markdown
-- [ ] What if the user refreshes on the confirmation step before the request finishes?
-  Step: confirmation
-  Category: navigation
-  Status: pending
-  Observed: not tried yet
-```
-Then update each item as you go:
-- `survived`
-- `broke`
-- `fixed`
-- `retested-survived`
-- `blocked`
-- `not-applicable`
-Every executed item must include:
-- `Step`
-- `Category`
-- `Status`
-- `Observed`
-If the user sets a target such as "make this file 150 items long before you stop," treat that as a hard stopping condition unless you hit a real blocker and explain why.
-Use consistent categories such as:
-- `navigation`
-- `input-validation`
-- `repeat-action`
-- `stale-state`
-- `error-recovery`
-- `destructive-action`
-- `permissions`
-- `async-state`
-- `persistence`
-## Step 5: Enforce Coverage Before Execution
-Before you start executing attacks:
-- pre-generate a meaningful attack list
-- spread it across the major flow steps
-- spread it across relevant categories
-- make sure the count is not satisfied by one repetitive corner of the feature
-Do not treat total item count alone as sufficient coverage.
-If the user asks for a large target such as `150`, ensure the log covers multiple steps and multiple categories instead of padding one surface.
-Anti-cheating rules:
-- no filler items
-- each attack must be meaningfully distinct
-- reworded duplicates do not count toward the target
-## Step 6: Use The Real UI
-- Use `playwright-interactive`.
-- Exercise the actual UI instead of mocking the flow in code.
-- Keep the scope focused on the feature the user asked you to verify.
-- Capture screenshots of the important states you observe so the user can see the evidence directly.
-## Step 7: Try To Break It On Purpose
-Do more than a happy-path walkthrough.
-Actively try:
-- invalid inputs
-- empty required fields
-- boundary-length or malformed inputs
-- repeated or double clicks
-- submitting twice
-- wrong action order
-- back and forward navigation
-- page refresh during the flow
-- closing and reopening modals or screens
-- canceling mid-flow and re-entering
-- stale UI state after edits
-- conflicting selections or toggles
-- error recovery after a failed action
-If the feature is stateful, also check whether the UI, network result, and persisted state stay coherent after those interactions.
-As you test, keep expanding the break log with new "What if...?" cases that emerge from the flow. Do not rely on memory or chat-only notes.
-## Step 8: Record And Fix Real Bugs
-- Document each meaningful issue you find.
-- Fix the issue when the remediation is clear and the chosen mode includes fixes.
-- If the behavior is ambiguous, call out the product decision instead of bluffing a fix.
-- Update docs when the verification exposes stale assumptions about how the feature works.
-- Update the break log entry for each attempted action with what happened and whether the feature survived.
-- Require a short observed-result note for every executed item. "Worked" is too weak; capture what actually happened.
-- Save screenshots for the key broken, risky, or fixed states as you go.
-Do not stop at the first bug.
-## Step 9: Repeat Until The Feature Resists Abuse
-After fixes:
-- rerun the relevant happy path
-- rerun the break attempts that previously failed
-- rerun directly related attacks
-- rerun neighboring attacks that touch the same step, state transition, or failure surface
-- verify the fix did not create a new inconsistent state
-- keep adding and executing new "What if...?" items until the requested target coverage is reached
-The skill is not done when the feature only works once. It is done when the feature behaves predictably under sloppy real-world use.
-## Step 10: Report Truthfully
-Summarize:
-- the path to the break log markdown file
-- how many attack items were recorded and exercised
-- how coverage was distributed across steps and categories
-- which screenshots you captured and what each one shows
-- what break attempts you tried
-- which issues you found
-- what you fixed
-- a short systemic-risks summary describing recurring weakness patterns, not just individual bugs
-- what still looks risky or was not exercised

package/templates/.agents/skills/break-it-qa/agents/openai.yaml DELETED

@@ -1,4 +0,0 @@
-interface:
-  display_name: "Break-It QA"
-  short_description: "Try to break a feature through the UI"
-  default_prompt: "Use $break-it-qa to verify this user-facing feature by trying to break it through the browser with invalid inputs, wrong action order, refreshes, back navigation, repeated clicks, and other adversarial interactions, then fix clear issues and repeat."

package/templates/.agents/skills/conversation-retrospective/SKILL.md DELETED

@@ -1,147 +0,0 @@
----
-name: conversation-retrospective
-description: Harvest durable knowledge, user feedback, skill lessons, and repeated workflow patterns from the active conversation into the repo's existing memory system. Use when the user asks to save what was learned, write down what changed, capture lessons from this thread, update docs or handoff state without more prompting, improve skills that were used or exposed gaps, or record new skill ideas based on repetitive work in the live conversation. Do not use this for generic planning, broad docs audits, or digging through archived session history unless the user explicitly asks for that.
----
-# Conversation Retrospective
-Use this skill to harvest the active conversation into the repo's existing memory system.
-This skill works from the live conversation already in context. Do not go hunting through archived session files unless the user explicitly asks for that.
-This is a closeout and distillation workflow, not a generic planning pass or a broad docs audit.
-## When Not To Use This Skill
-- Skip it for generic planning or implementation design; use the planning workflow for that.
-- Skip it for broad docs audits that are not driven by what happened in this conversation.
-- Skip it when the user wants archived history analysis rather than the live thread; only dig into old sessions if they explicitly ask.
-- Skip it when there is nothing durable to preserve and no skill or workflow lesson to capture.
-## Read First
-Before persisting anything:
-1. Read the repo's main agent guidance and project-context files
-2. Read the repo's current durable memory surfaces, such as docs, workspace/handoff files, trackers, decision logs, or knowledge files
-3. Read the exact docs, notes, and skill files that the conversation touched
-Do not assume the repo uses Waypoint. Adapt to the memory structure that already exists.
-## Step 1: Distill Durable Knowledge
-Review the current conversation and separate:
-- durable project knowledge
-- live execution state
-- transient chatter
-- direct user feedback, corrections, complaints, and preferences
-Persist without asking follow-up questions when the correct destination is clear.
-Treat explicit user feedback as a high-priority signal. If the user corrected the approach, rejected a behavior, called out friction, or stated a standing preference, prefer preserving that over the agent's earlier assumptions.
-Write durable knowledge to the smallest truthful home the repo already uses:
-- the main docs or knowledge layer for architecture, behavior, decisions, debugging knowledge, and reusable operating guidance
-- the repo's plans layer for durable implementation, rollout, migration, or investigation plans
-- the repo's standing guidance file for durable project context or long-lived working rules
-- the repo's live handoff or workspace file for current state, blockers, and immediate next steps
-- the repo's tracker or execution-log layer when the conversation created or materially changed a long-running workstream
-If the repo uses doc metadata such as `last_updated`, refresh it when needed.
-If the repo has no obvious durable home but the need is clear, create the smallest coherent doc or note that fits the surrounding patterns instead of leaving the learning only in chat.
-Do not leave important truths only in chat.
-## Step 2: Improve Existing Skills
-Identify which skills were actually used in this conversation, or which existing skills clearly should have covered the workflow but left avoidable gaps.
-For each used or clearly relevant skill, explicitly decide whether it:
-- succeeded
-- partially succeeded
-- failed
-Base that judgment on the actual conversation, especially:
-- direct user feedback
-- whether the skill helped complete the task
-- whether the agent had to work around missing guidance
-- whether concrete errors, dead ends, or repeated corrections happened while using it
-Distinguish between:
-- a skill problem
-- an execution mistake by the agent
-- an external/tooling failure
-- a one-off user preference that should not be generalized
-Only change the skill when the problem is truly in the skill guidance.
-For each affected skill:
-- read the existing skill before editing it
-- update only reusable guidance, not one-off transcript details
-- add missing guardrails, path hints, failure modes, error-handling guidance, decision rules, or references that would have made the conversation easier to complete
-- keep `SKILL.md` concise; prefer targeted structural improvements over turning the skill into a diary
-If the environment has both a source-of-truth skill and one or more mirrored or installed copies, update the source-of-truth version and any copies the user expects to stay in sync.
-Do not assume there is only one skill location, and do not assume there are many.
-## Step 3: Propose New Skills
-When the conversation revealed repetitive work that existing skills do not cover well:
-- do not silently scaffold a new skill unless the user asked for implementation
-- record the proposal in the repo's existing docs or idea-capture layer
-If there is no obvious place for durable skill proposals, create a small doc such as `skill-ideas.md` in the repo's normal docs area.
-Each proposal should include:
-- the repeated workflow or problem
-- likely trigger phrases
-- expected outputs or side effects
-- why existing skills were insufficient
-Skip this doc when there is no real new-skill candidate.
-## Step 4: Refresh Repo Memory
-After changing docs, handoff state, trackers, or skills:
-- run whatever repo-local refresh or index step the project uses, if one exists
-- otherwise make sure the edited memory surfaces are internally consistent and discoverable
-Do not invent a refresh command when the repo does not have one.
-## Step 5: Report
-Summarize:
-- what durable knowledge you saved and where
-- which skills you evaluated and whether they succeeded, partially succeeded, or failed
-- which skills you improved
-- which concrete errors, failure modes, or repeated friction points you captured
-- which new skill ideas you recorded, if any
-- what you intentionally left unpersisted because it was transient
-If no substantive persistence changes were needed, say that explicitly instead of inventing updates.
-## Gotchas
-- Do not turn this skill into a transcript dump. Persist only durable knowledge, live state, or reusable lessons.
-- Do not scatter the same learning across multiple files. Pick the smallest truthful home the repo already uses.
-- Do not blame a skill for a problem that was really an execution mistake or an external tool failure.
-- Do not preserve one-off user phrasing or temporary frustration as if it were standing repo policy unless the user clearly framed it that way.
-- Do not go hunting through archived session files just because the live thread feels incomplete. This skill should work from the current conversation unless the user explicitly broadens the scope.
-## Keep This Skill Sharp
-- After meaningful retrospectives, add new gotchas when the same persistence mistake, memory-placement mistake, or skill-triage mistake keeps recurring.
-- Tighten the description if the skill misses real prompts like "save what we learned here" or fires on requests that are really planning or docs-audit work.
-- If the same kind of durable learning keeps needing a custom destination, add that routing guidance to the skill instead of leaving the decision to be rediscovered in chat.

package/templates/.agents/skills/conversation-retrospective/agents/openai.yaml DELETED

@@ -1,4 +0,0 @@
-interface:
-  display_name: "Conversation Retrospective"
-  short_description: "Harvest the live conversation into repo memory"
-  default_prompt: "Use $conversation-retrospective to preserve the durable lessons, repo-memory updates, and skill learnings from this live conversation."