npm - @williambeto/ai-workflow - Versions diffs - 1.19.0 → 2.1.0 - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (391)

package/.agents/skills/interface-design/references/principles.md DELETED

@@ -1,235 +0,0 @@
-# Core Craft Principles
-These apply regardless of design direction. This is the quality floor.
----
-## Surface & Token Architecture
-Professional interfaces don't pick colors randomly — they build systems. Understanding this architecture is the difference between "looks okay" and "feels like a real product."
-### The Primitive Foundation
-Every color in your interface should trace back to a small set of primitives:
-- **Foreground** — text colors (primary, secondary, muted)
-- **Background** — surface colors (base, elevated, overlay)
-- **Border** — edge colors (default, subtle, strong)
-- **Brand** — your primary accent
-- **Semantic** — functional colors (destructive, warning, success)
-Don't invent new colors. Map everything to these primitives.
-### Surface Elevation Hierarchy
-Surfaces stack. A dropdown sits above a card which sits above the page. Build a numbered system:
-```
-Level 0: Base background (the app canvas)
-Level 1: Cards, panels (same visual plane as base)
-Level 2: Dropdowns, popovers (floating above)
-Level 3: Nested dropdowns, stacked overlays
-Level 4: Highest elevation (rare)
-```
-In dark mode, higher elevation = slightly lighter. In light mode, higher elevation = slightly lighter or uses shadow. The principle: **elevated surfaces need visual distinction from what's beneath them.**
-### The Subtlety Principle
-This is where most interfaces fail. Study Vercel, Supabase, Linear — their surfaces are **barely different** but still distinguishable. Their borders are **light but not invisible**.
-**For surfaces:** The difference between elevation levels should be subtle — a few percentage points of lightness, not dramatic jumps. In dark mode, surface-100 might be 7% lighter than base, surface-200 might be 9%, surface-300 might be 12%. You can barely see it, but you feel it.
-**For borders:** Borders should define regions without demanding attention. Use low opacity (0.05-0.12 alpha for dark mode, slightly higher for light). The border should disappear when you're not looking for it, but be findable when you need to understand the structure.
-**The test:** Squint at your interface. You should still perceive the hierarchy — what's above what, where regions begin and end. But no single border or surface should jump out at you. If borders are the first thing you notice, they're too strong. If you can't find where one region ends and another begins, they're too subtle.
-**Common AI mistakes to avoid:**
-- Borders that are too visible (1px solid gray instead of subtle rgba)
-- Surface jumps that are too dramatic (going from dark to light instead of dark to slightly-less-dark)
-- Using different hues for different surfaces (gray card on blue background)
-- Harsh dividers where subtle borders would do
-### Text Hierarchy via Tokens
-Don't just have "text" and "gray text." Build four levels:
-- **Primary** — default text, highest contrast
-- **Secondary** — supporting text, slightly muted
-- **Tertiary** — metadata, timestamps, less important
-- **Muted** — disabled, placeholder, lowest contrast
-Use all four consistently. If you're only using two, your hierarchy is too flat.
-### Border Progression
-Borders aren't binary. Build a scale:
-- **Default** — standard borders
-- **Subtle/Muted** — softer separation
-- **Strong** — emphasis, hover states
-- **Stronger** — maximum emphasis, focus rings
-Match border intensity to the importance of the boundary.
-### Dedicated Control Tokens
-Form controls (inputs, checkboxes, selects) have specific needs. Don't just reuse surface tokens — create dedicated ones:
-- **Control background** — often different from surface backgrounds
-- **Control border** — needs to feel interactive
-- **Control focus** — clear focus indication
-This separation lets you tune controls independently from layout surfaces.
-### Context-Aware Bases
-Different areas of your app might need different base surfaces:
-- **Marketing pages** — might use darker/richer backgrounds
-- **Dashboard/app** — might use neutral working backgrounds
-- **Sidebar** — might differ from main canvas
-The surface hierarchy works the same way — it just starts from a different base.
-### Alternative Backgrounds for Depth
-Beyond shadows, use contrasting backgrounds to create depth. An "alternative" or "inset" background makes content feel recessed. Useful for:
-- Empty states in data grids
-- Code blocks
-- Inset panels
-- Visual grouping without borders
----
-## Spacing System
-Pick a base unit (4px and 8px are common) and use multiples throughout. The specific number matters less than consistency — every spacing value should be explainable as "X times the base unit."
-Build a scale for different contexts:
-- Micro spacing (icon gaps, tight element pairs)
-- Component spacing (within buttons, inputs, cards)
-- Section spacing (between related groups)
-- Major separation (between distinct sections)
-## Symmetrical Padding
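-Padding on all four sides (top, left, bottom, right) should match: if top padding is 16px, the left, bottom, and right padding should also be 16px. Exception: when the content itself creates visual balance, for example when the horizontal axis needs more room.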
-```css
-/* Good */
-padding: 16px;
-padding: 12px 16px; /* Only when horizontal needs more room */
-/* Bad */
-padding: 24px 16px 12px 16px;
-```
-## Border Radius Consistency
-Sharper corners feel technical, rounder corners feel friendly. Pick a scale that fits your product's personality and use it consistently.
-The key is having a system: small radius for inputs and buttons, medium for cards, large for modals or containers. Don't mix sharp and soft randomly — inconsistent radius is as jarring as inconsistent spacing.
-## Depth & Elevation Strategy
-Match your depth approach to your design direction. Choose ONE and commit:
-**Borders-only (flat)** — Clean, technical, dense. Works for utility-focused tools where information density matters more than visual lift. Linear, Raycast, and many developer tools use almost no shadows — just subtle borders to define regions.
-**Subtle single shadows** — Soft lift without complexity. A simple `0 1px 3px rgba(0,0,0,0.08)` can be enough. Works for approachable products that want gentle depth.
-**Layered shadows** — Rich, premium, dimensional. Multiple shadow layers create realistic depth. Stripe and Mercury use this approach. Best for cards that need to feel like physical objects.
-**Surface color shifts** — Background tints establish hierarchy without any shadows. A card at `#fff` on a `#f8fafc` background already feels elevated.
-```css
-/* Borders-only approach */
---border: rgba(0, 0, 0, 0.08);
---border-subtle: rgba(0, 0, 0, 0.05);
-border: 0.5px solid var(--border);
-/* Single shadow approach */
---shadow: 0 1px 3px rgba(0, 0, 0, 0.08);
-/* Layered shadow approach */
---shadow-layered:
-  0 0 0 0.5px rgba(0, 0, 0, 0.05),
-  0 1px 2px rgba(0, 0, 0, 0.04),
-  0 2px 4px rgba(0, 0, 0, 0.03),
-  0 4px 8px rgba(0, 0, 0, 0.02);
-```
-## Card Layouts
-Monotonous card layouts are lazy design. A metric card doesn't have to look like a plan card, which doesn't have to look like a settings card.
-Design each card's internal structure for its specific content — but keep the surface treatment consistent: same border weight, shadow depth, corner radius, padding scale, typography.
-## Isolated Controls
-UI controls deserve container treatment. Date pickers, filters, dropdowns — these should feel like crafted objects.
-**Never use native form elements for styled UI.** Native `<select>`, `<input type="date">`, and similar elements render OS-native dropdowns and pickers that can be styled only to a very limited extent. Build custom components instead:
-- Custom select: trigger button + positioned dropdown menu
-- Custom date picker: input + calendar popover
-- Custom checkbox/radio: styled div with state management
-Custom select triggers must use `display: inline-flex` with `white-space: nowrap` to keep text and chevron icons on the same row.
-## Typography Hierarchy
-Build distinct levels that are visually distinguishable at a glance:
-- **Headlines** — heavier weight, tighter letter-spacing for presence
-- **Body** — comfortable weight for readability
-- **Labels/UI** — medium weight, works at smaller sizes
-- **Data** — often monospace, needs `tabular-nums` for alignment
-Don't rely on size alone. Combine size, weight, and letter-spacing to create clear hierarchy. If you squint and can't tell headline from body, the hierarchy is too weak.
-## Monospace for Data
-Numbers, IDs, codes, timestamps belong in monospace. Use `tabular-nums` for columnar alignment. Mono signals "this is data."
-## Iconography
-Icons should clarify, not decorate. If removing an icon loses no meaning, remove it. Choose a consistent icon set and stick with it throughout the product.
-Give standalone icons presence with subtle background containers. Icons next to text should align optically, not mathematically.
-## Animation
-Keep it fast and functional. Micro-interactions (hover, focus) should feel instant — around 150ms. Larger transitions (modals, panels) can be slightly longer — 200-250ms.
-Use smooth deceleration easing (ease-out variants). Avoid spring/bounce effects in professional interfaces — they feel playful, not serious.
-## Contrast Hierarchy
-Build a four-level system: foreground (primary) → secondary → muted → faint. Use all four consistently.
-## Color Carries Meaning
-Gray builds structure. Color communicates — status, action, emphasis, identity. Unmotivated color is noise. Color that reinforces the product's world is character.
-## Navigation Context
-Screens need grounding. A data table floating in space feels like a component demo, not a product. Consider including:
-- **Navigation** — sidebar or top nav showing where you are in the app
-- **Location indicator** — breadcrumbs, page title, or active nav state
-- **User context** — who's logged in, what workspace/org
-When building sidebars, consider using the same background as the main content area. Rely on a subtle border for separation rather than different background colors.
-## Dark Mode
-Dark interfaces have different needs:
-**Borders over shadows** — Shadows are less visible on dark backgrounds. Lean more on borders for definition.
-**Adjust semantic colors** — Status colors (success, warning, error) often need to be slightly desaturated for dark backgrounds.
-**Same structure, different values** — The hierarchy system still applies, just with inverted values.
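
The surface ladder described under "The Subtlety Principle" in the deleted file can be sketched in code. This is a minimal illustration, not part of the package: the names `Hsl`, `SURFACE_OFFSETS`, and `surfaceLadder` are hypothetical, the base color is arbitrary, and "7% lighter" is assumed to mean 7 percentage points of HSL lightness added to the base.

```typescript
// Hypothetical helper: derive a dark-mode surface ladder from one base color.
// Assumption: "7% lighter" is read as +7 percentage points of HSL lightness.
interface Hsl {
  h: number; // hue in degrees
  s: number; // saturation in percent
  l: number; // lightness in percent
}

// Lightness offsets taken from the example figures in the text (7, 9, 12).
const SURFACE_OFFSETS: Record<string, number> = {
  "surface-100": 7,
  "surface-200": 9,
  "surface-300": 12,
};

function surfaceLadder(base: Hsl): Record<string, string> {
  const tokens: Record<string, string> = {
    "surface-base": `hsl(${base.h} ${base.s}% ${base.l}%)`,
  };
  for (const [name, offset] of Object.entries(SURFACE_OFFSETS)) {
    // Hue and saturation stay fixed at every level; only lightness moves.
    const lightness = Math.min(100, base.l + offset);
    tokens[name] = `hsl(${base.h} ${base.s}% ${lightness}%)`;
  }
  return tokens;
}

// Arbitrary near-black base chosen for illustration.
const ladder = surfaceLadder({ h: 220, s: 10, l: 5 });
```

With this base, `ladder["surface-100"]` is `hsl(220 10% 12%)` and `ladder["surface-300"]` is `hsl(220 10% 17%)`. Keeping hue and saturation constant follows the file's warning against using different hues for different surfaces.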

package/.agents/skills/interface-design/references/validation.md DELETED

@@ -1,48 +0,0 @@
-# Memory Management
-When and how to update `.interface-design/system.md`.
-## When to Add Patterns
-Add to system.md when:
-- Component used 2+ times
-- Pattern is reusable across the project
-- Has specific measurements worth remembering
-## Pattern Format
-```markdown
-### Button Primary
-- Height: 36px
-- Padding: 12px 16px
-- Radius: 6px
-- Font: 14px, 500 weight
-```
-## Don't Document
-- One-off components
-- Temporary experiments
-- Variations better handled with props
-## Pattern Reuse
-Before creating a component, check system.md:
-- Pattern exists? Use it.
-- Need variation? Extend, don't create new.
-Memory compounds: each pattern saved makes future work faster and more consistent.
----
-# Validation Checks
-If system.md defines specific values, check consistency:
-**Spacing** — All values multiples of the defined base?
-**Depth** — Using the declared strategy throughout? (borders-only means no shadows)
-**Colors** — Using defined palette, not random hex codes?
-**Patterns** — Reusing documented patterns instead of creating new?
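
The **Spacing** check in the deleted file ("All values multiples of the defined base?") is simple enough to sketch. The function name `offGridSpacing` is hypothetical, and the 4px base unit is an assumption; the file only requires that `system.md` define some base.

```typescript
// Hypothetical check: every spacing value should be a multiple of the base unit.
// Returns the values that break the rule, in their original order.
function offGridSpacing(valuesPx: number[], baseUnitPx: number): number[] {
  if (baseUnitPx <= 0) {
    throw new Error("baseUnitPx must be positive");
  }
  return valuesPx.filter((value) => value % baseUnitPx !== 0);
}

// Height and padding values from the "Button Primary" pattern, assumed 4px base.
const buttonOffGrid = offGridSpacing([36, 12, 16], 4);

// A made-up set of values that includes two off-grid entries.
const mixedOffGrid = offGridSpacing([4, 8, 12, 14, 16, 30], 4);
```

Here `buttonOffGrid` is empty, while `mixedOffGrid` contains `14` and `30`. The other checks (depth, colors, patterns) depend on what `system.md` declares and are not covered by this sketch.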

package/.agents/skills/minimal-context/SKILL.md DELETED

@@ -1,177 +0,0 @@
----
-name: minimal-context
-description: Use when reading or searching the repository to avoid broad context loading and keep input tokens minimal while preserving task-relevant evidence.
----
-# Minimal Context Mode Skill
-## Purpose
-Reduce INPUT token usage by reading only task-relevant files, using targeted search before reading, and stopping once enough evidence is found.
-`minimal-context` reduces what is sent TO the model.
-`token-economy` reduces what the model SENDS BACK.
-Use both together: `minimal-context` for input discipline, `token-economy` for output discipline.
-## Use when
-- Task is scoped and file list is known or guessable.
-- No audit or full-repository scan is requested.
-- Input tokens are high (90k–164k per call) while output is small.
-- A targeted fix, implementation, review, or validation is needed.
-## Core responsibilities
-1. **List before read** — use `grep` or `glob` to find target files before reading them.
-2. **Limit reads to 3–7 files by default** — stop and summarize after each batch.
-3. **Prefer targeted search over broad reads** — `grep` first, read second.
-4. **Stop after enough evidence** — do not read all files in a directory.
-5. **Send compact summaries to subagents** — not full file contents.
-6. **Exclude unrelated conversation history** — start from the task, not from prior context.
-## Input vs output distinction
-| Token type | What it is | How to reduce |
-|---|---|---|
-| **Input tokens** | Files, history, and context sent TO the model | `minimal-context` skill |
-| **Output tokens** | Model response verbosity | `token-economy` skill |
-High input tokens come from:
-- Reading entire directories or large files
-- Broad glob patterns that match many files
-- Including prior conversation or handoff history
-- Passing full file contents to subagents
-## Constraints
-- Do not glob or read the entire repository unless the task is explicitly an audit.
-- For audit tasks, define the audit scope before reading broadly.
-- Do not pass full file contents to subagents; pass compact summaries or relevant snippets.
-- Do not include unrelated previous conversation context in new tasks.
-- Ask at most one follow-up read after initial targeted search.
-- Stop when the evidence found is sufficient for the current task.
-## File reading rules
-### Default (non-audit task)
-1. State what you are looking for.
-2. Run `grep` or `glob` to narrow to 3–7 files max.
-3. Read only the most relevant files.
-4. Stop and summarize findings.
-5. If more context is needed, request explicit confirmation before reading more.
-### Audit task (allowed broad read)
-1. Define audit scope explicitly before starting.
-2. List categories of files to inspect.
-3. Run broader searches but still stop after each category.
-4. Summarize findings per category before moving to the next.
-5. Do not read every file; sample strategically.
-## Subagent handoff rules
-When delegating to a subagent:
-- Send only: task summary, target files (by path), constraints, expected output.
-- Do NOT send: full file contents, glob results, prior conversation history.
-- If subagent needs more context, it should ask or use targeted search itself.
-## Expected output
-- Files read (by path and line range when relevant)
-- Why each file was chosen
-- Summary of findings
-- What was NOT read and why
-- Next action or decision
-## Validation
-- Confirm input budget was respected (3–7 files per read batch).
-- Confirm grep/search was used before read.
-- Confirm subagents received summaries, not full contents.
-- Confirm audit scope was defined before broad reads.
-## Stop conditions
-- Stop when enough evidence is found for the current task.
-- Stop when the file read count exceeds 7 without explicit task justification.
-- Stop when a subagent receives full file contents without necessity.
-## Troubleshooting: Input tokens still high?
-If input tokens remain high (90k+) despite following the rules above, the primary driver is **conversation history accumulation**, not file reading.
-### Session history is the biggest input cost
-Each message in a session resends the full conversation history as input, so the history portion of every call grows with each exchange: roughly 10 messages' worth after 10 messages, 50 messages' worth after 50.
-**Symptoms:**
-- File reads are limited (3–7 per batch) ✅
-- Grep/glob used before read ✅
-- But total input tokens still 90k–164k per call
-- Output is tiny (74–700 tokens)
-**Cause:** The session is carrying full history, not the files.
-### Solutions (in order of impact)
-#### 1. Start new sessions after each PR
-Session length is the primary input driver. Reset at natural boundaries:
-- After each PR is merged
-- Before starting a new feature
-- After handoff to another agent
-Do not accumulate more than 5–10 exchanges in one session for high-token workflows.
-#### 2. Use session truncation if available
-If your tool (OpenCode, Codex, Claude Code) supports it:
-- Enable sliding window context
-- Set max conversation turns before summary
-- Configure context eviction after N messages
-#### 3. Explicit session boundary in handoff
-When delegating to another agent, start a fresh session:
-```
-"Session note: Begin fresh session. Prior conversation not needed.
-Focus on: [task summary]. Start from this file: [path]"
-```
-#### 4. Use minimal task scoping
-Before each exchange, state explicitly:
-```
-Task: [one sentence]
-Files needed: [list 3–5 paths max]
-Do not reference prior conversation history.
-```
-#### 5. Use compact output to reduce future input
-Every verbose response becomes future input. Use:
-- `/caveman` (terse output)
-- `compact` mode in token-economy skill
-- Fewer findings = less history = lower next input
-### Quick diagnostic
-Run the same task at different session lengths and compare:
-| Scenario | Expected input |
-|----------|---------------|
-| Fresh session (0 history) | 15k–40k |
-| After 10 exchanges | 60k–100k |
-| After 30 exchanges | 120k–200k |
-If your input is 2–3x higher than in a fresh session, the cause is history, not files.
-### Rule of thumb
-**If input > 50k and you have not read any files recently, it is history.**
-Start a new session. Problem solved.
-## See also
-- `.agents/skills/token-economy/SKILL.md` — output compact mode
-- `docs/npm-consumer-quickstart.md` — `.ai-workflow/` symlinked install model
-- `AGENTS.md` — anti-overdelegation rules
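
The "Quick diagnostic" and "Rule of thumb" in the deleted skill reduce to two comparisons, sketched below. The names `CallStats` and `likelyInputDriver` are hypothetical, and treating "2–3x higher" as "at least 2x" is an assumption made for the sketch; the 50k threshold comes from the file.

```typescript
// Hypothetical classifier for the two heuristics in the troubleshooting section.
type InputDriver = "history" | "files-or-other";

interface CallStats {
  inputTokens: number; // input tokens for the current call
  freshSessionTokens: number; // input tokens for the same task in a fresh session
  filesReadRecently: boolean; // whether any files were read in recent turns
}

function likelyInputDriver(stats: CallStats): InputDriver {
  // Rule of thumb: input above 50k with no recent file reads points to history.
  const overThreshold = stats.inputTokens > 50000 && !stats.filesReadRecently;
  // Quick diagnostic: input at least 2x the fresh-session baseline (assumed cutoff).
  const inflated = stats.inputTokens >= 2 * stats.freshSessionTokens;
  return overThreshold || inflated ? "history" : "files-or-other";
}

// Made-up numbers chosen to fall inside the ranges quoted in the table.
const lateSession = likelyInputDriver({
  inputTokens: 120000,
  freshSessionTokens: 30000,
  filesReadRecently: true,
});
```

In this example `lateSession` is `"history"`, because 120k is four times the 30k baseline. The sketch only labels the likely cause; the remedy in the file is still to start a new session.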

package/.agents/skills/napkin/SKILL.md DELETED

@@ -1,84 +0,0 @@
----
-name: napkin
-description: Use when managing discoverable project memory for durable decisions, recurring corrections, and startup or handoff context without forcing full-memory loading in every task.
----
-> Token economy active: keep output compact, context minimal. Reference: `token-economy` + `minimal-context` skills.
-# Napkin Skill
-## Purpose
-Maintain durable, per-repository operational memory in `./.agents/napkin.md`.
-This memory is always discoverable and selectively consulted when relevant.
-## When to use
-- At the start of tasks that depend on repository workflow rules, conventions, validation, or recurring decisions.
-- When a durable lesson or correction should be preserved for future sessions.
-- Before handoff or final reporting, to capture reusable guidance from the completed work.
-## When not to use
-- Do not store temporary notes, one-off timelines, or task logs.
-- Do not duplicate canonical source documents such as `AGENTS.md`, runbooks, or schema definitions.
-- Do not store secrets, credentials, tokens, private keys, or personal/environment-sensitive data.
-## Entry format
-```md
-## [YYYY-MM-DD] <Category>: <Rule>
-- Instead of: <old behavior or failed assumption>
-- Do: <small repeatable behavior>
-- Evidence/source: <file path, command output, review note, or user correction>
-```
-## Maintenance rules
-- Keep entries short, actionable, and reusable.
-- Merge duplicates and remove stale or superseded guidance.
-- Prefer one clear rule over long narrative.
-- If a lesson is uncertain, mark it as candidate material and avoid promoting it to durable memory yet.
-## Safety rules
-- Never save secrets or sensitive data.
-- Never invent memory; only record observed, validated, or explicitly directed guidance.
-- If durability is unclear, skip the update or mark as candidate in working notes instead of adding to Napkin.
-## How to operate
-1. Read `./.agents/napkin.md` at the start of relevant tasks.
-2. Apply only relevant entries (progressive disclosure).
-3. Update the file only with durable reusable rules.
-4. Curate entries before handoff to keep memory high signal.
-## Responsibilities
-- Keep project memory discoverable and operational across Codex and OpenCode usage.
-- Preserve durable corrections that reduce repeated mistakes.
-- Keep memory clean so it improves execution instead of adding noise.
-## Constraints
-- Do not convert Napkin into a daily log or status report.
-- Do not duplicate large sections from canonical docs.
-- Do not overwrite broad repository policy; reference canonical files when needed.
-## Expected output
-- Updated `./.agents/napkin.md` entries that follow the required format.
-- A concise note in implementation/review output describing what durable memory was added, updated, merged, or removed.
-## Stop conditions
-- Stop when relevant durable memory has been applied or curated for the current task.
-- Stop when no durable lesson exists; do not force an update.
-## Validation
-- `./.agents/napkin.md` exists.
-- Entries follow date/category/rule plus `Instead of`, `Do`, and `Evidence/source` fields.
-- Memory remains concise, durable, and free of secrets.
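
The entry format and the second validation bullet of the deleted skill could be checked mechanically. The sketch below is one possible reading, not the package's own tooling: `entryProblems`, the regular expression, and the sample entry are all hypothetical.

```typescript
// Hypothetical validator for one Napkin entry, based on the documented format:
//   ## [YYYY-MM-DD] <Category>: <Rule>
//   - Instead of: ... / - Do: ... / - Evidence/source: ...
const HEADING = /^## \[\d{4}-\d{2}-\d{2}\] [^:]+: .+$/;
const REQUIRED_FIELDS = ["- Instead of: ", "- Do: ", "- Evidence/source: "];

function entryProblems(entry: string): string[] {
  const lines = entry.trim().split("\n").map((line) => line.trim());
  const problems: string[] = [];
  const heading = lines[0] ?? "";
  if (!HEADING.test(heading)) {
    problems.push("heading must look like: ## [YYYY-MM-DD] <Category>: <Rule>");
  }
  for (const field of REQUIRED_FIELDS) {
    // A field counts only if it is present and followed by some text.
    const hasField = lines.some(
      (line) => line.startsWith(field) && line.length > field.length,
    );
    if (!hasField) {
      problems.push(`missing field: ${field.trim()}`);
    }
  }
  return problems;
}

// Made-up sample entry, for illustration only.
const sampleEntry = [
  "## [2025-01-15] Validation: Run lint before tests",
  "- Instead of: running the full test suite first",
  "- Do: run lint, then the targeted tests",
  "- Evidence/source: reviewer note (made-up example)",
].join("\n");

const sampleProblems = entryProblems(sampleEntry);
```

`sampleProblems` is empty for the sample above. The check covers shape only; it cannot tell whether an entry is durable or free of secrets, which the skill leaves to the author's judgment.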

package/.agents/skills/opencode-agent-design/SKILL.md DELETED

@@ -1,77 +0,0 @@
----
-name: opencode-agent-design
-description: Use when designing or reviewing OpenCode agents, commands, routing, role boundaries, and agent, skill, and command separation.
----
-> Token economy active: keep output compact, context minimal. Reference: `token-economy` + `minimal-context` skills.
-# OpenCode Agent Design Skill
-## Purpose
-Design OpenCode workflow assets with clear role boundaries, practical routing, and explicit stop conditions.
-## Use when
-- Creating or reviewing OpenCode agent definitions.
-- Deciding whether something should be an agent, skill, or command.
-- Improving command-to-agent routing.
-- Reducing duplicate responsibilities across agents.
-- Validating agent prompts against real delivery workflows.
-## Guidance
-- Agents are operational roles.
-- Skills are reusable technical capabilities.
-- Commands are compact entrypoints.
-- Give each agent one primary job and clear required context.
-- Define expected output and stop conditions.
-- Avoid duplicate agent responsibilities unless the handoff boundary is explicit.
-- Validate agents against real workflows such as discovery, estimation, planning, implementation, fixing, review, validation, and release.
-## Core responsibilities
-1. Design agents with one primary job and clear required context.
-2. Separate agents, skills, and commands by their role (operational, capability, entrypoint).
-3. Avoid duplicate responsibilities unless the handoff boundary is explicit.
-4. Define expected output and stop conditions for each agent.
-5. Validate agent chain against at least one realistic workflow scenario.
-6. Confirm commands stay short and route to existing assets.
-7. Review stop conditions for ambiguity.
-## Constraints
-- Do not name every skill as an agent.
-- Do not create agents that all plan, implement, and review.
-- Do not omit stop conditions from agents or commands.
-- Do not duplicate large prompt content in every agent.
-- Do not design around tool registration instead of real workflow needs.
-- Do not create agents without validating them against a realistic scenario.
-## Expected output
-- Agent definitions with distinct roles, routing, and stop conditions.
-- Command-to-agent mapping with validation evidence.
-- Tested agent chain against at least one realistic scenario.
-- Final status: `Approved`, `Approved with notes`, `Changes requested`, or `Blocked`.
-## Validation
-- Check that each agent has a distinct role.
-- Confirm commands stay short and route to existing assets.
-- Confirm skills are capabilities, not disguised agents.
-- Review stop conditions for ambiguity.
-- Test the agent chain against at least one realistic project scenario when possible.
-## Common mistakes
-- Naming every skill as an agent.
-- Creating agents that all plan, implement, and review.
-- Omitting stop conditions.
-- Duplicating large prompt content in every agent.
-- Designing around tool registration instead of real workflow needs.
-## Stop conditions
-- Stop when each agent has a distinct role, clear routing, and stop conditions.
-- Stop and report `Blocked` if required agent context or workflow scenario is missing.

package/.agents/skills/playwright-cli/SKILL.md DELETED

@@ -1,62 +0,0 @@
----
-name: playwright-cli
-description: Use when running, debugging, or stabilizing Playwright CLI workflows for browser tests, traces, retries, and CI evidence collection.
----
-> Token economy active: keep output compact, context minimal. Reference: `token-economy` + `minimal-context` skills.
-# Playwright CLI Skill
-## Purpose
-Guide safe, repeatable Playwright CLI usage for validation and troubleshooting without changing unrelated workflow scope.
-## Use when
-- Running Playwright E2E checks from local or CI contexts.
-- Collecting traces, videos, and screenshots for failed tests.
-- Diagnosing flaky browser tests and retry behavior.
-- Reviewing Playwright command output as validation evidence.
-## Guidance
-- Prefer project-defined Playwright scripts before ad-hoc commands.
-- Keep test scope focused on affected user flows.
-- Capture reproducible evidence (`trace`, screenshot, failing test name).
-- Distinguish product regressions from environment issues.
-- Document skipped checks and reasons.
-## Core responsibilities
-1. Run Playwright commands from project-defined scripts before ad-hoc commands.
-2. Capture reproducible evidence (trace, screenshot, failing test name) for failures.
-3. Distinguish product regressions from environment issues.
-4. Document skipped checks and reasons explicitly.
-5. Re-run targeted failing specs when diagnosing flakiness.
-6. Report untested browsers and devices explicitly.
-## Constraints
-- Do not invent test results or fabricate evidence.
-- Do not claim tests passed without running them.
-- Do not skip reporting untested browsers or devices.
-- Do not expand test scope beyond affected user flows.
-- Do not approve deployment when Playwright validation fails without explanation.
-## Expected output
-- Validation report with command, result, and failure excerpt.
-- Explicit list of untested browsers/devices and skipped checks.
-- Final status: `Approved`, `Approved with notes`, `Changes requested`, or `Blocked`.
-## Validation
-- Run relevant Playwright commands available in the repository.
-- Record command, result, and failure excerpt when any test fails.
-- Re-run targeted failing specs when diagnosing flakiness.
-- Report untested browsers/devices explicitly.
-## Stop conditions
-- Stop when validation evidence is collected and the final report is complete.
-- Stop and report `Blocked` if required Playwright setup, browsers, or test files are missing.
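
The "Expected output" and "Constraints" sections of the deleted skill describe a report shape that can be sketched as a type plus a consistency check. Everything named here (`CheckResult`, `ValidationReport`, `reportProblems`, the `npm run test:e2e` script, the failure excerpt) is hypothetical. Flagging a plain `Approved` status whenever a check failed is one assumed reading of "do not approve deployment when Playwright validation fails without explanation".

```typescript
// Hypothetical report shape; the four status strings come from the skill text.
type FinalStatus = "Approved" | "Approved with notes" | "Changes requested" | "Blocked";

interface CheckResult {
  command: string; // the command that was actually run
  passed: boolean;
  failureExcerpt?: string; // expected whenever passed is false
}

interface ValidationReport {
  checks: CheckResult[];
  untested: string[]; // browsers/devices not covered
  skipped: string[]; // skipped checks, each with its reason
  status: FinalStatus;
}

function reportProblems(report: ValidationReport): string[] {
  const problems: string[] = [];
  if (report.checks.length === 0) {
    problems.push("no command was run");
  }
  for (const check of report.checks) {
    if (!check.passed && !check.failureExcerpt) {
      problems.push(`failed check has no failure excerpt: ${check.command}`);
    }
  }
  const anyFailed = report.checks.some((check) => !check.passed);
  if (anyFailed && report.status === "Approved") {
    problems.push("status is Approved but a check failed");
  }
  return problems;
}

// Made-up report: one failing run with evidence attached.
const report: ValidationReport = {
  checks: [
    {
      command: "npm run test:e2e", // hypothetical project-defined script
      passed: false,
      failureExcerpt: "checkout.spec.ts: expected heading to be visible (made-up excerpt)",
    },
  ],
  untested: ["webkit", "mobile viewports"],
  skipped: [],
  status: "Changes requested",
};

const problems = reportProblems(report);
```

For this report `problems` is empty: the failure carries an excerpt and the status does not claim approval. The check cannot confirm that the command was really run, so the rule against fabricated evidence still rests on the reporter.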