npm - @kontourai/flow-agents - Versions diffs - 1.1.0 → 1.3.0 - Mend

@kontourai/flow-agents 1.1.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (119)

package/.github/workflows/ci.yml +6 -1
package/.github/workflows/kit-gates-demo.yml +6 -2
package/.github/workflows/runtime-compat.yml +5 -2
package/CHANGELOG.md +51 -0
package/CONTRIBUTING.md +30 -0
package/README.md +26 -5
package/agents/dev.json +1 -1
package/agents/tool-planner.json +1 -1
package/build/src/cli/{flow-kit.js → kit.js} +122 -108
package/build/src/cli/validate-source-tree.js +4 -4
package/build/src/cli/workflow-sidecar.js +70 -5
package/build/src/cli.js +3 -3
package/build/src/flow-kit/validate.js +89 -62
package/build/src/tools/build-universal-bundles.js +78 -17
package/build/src/tools/generate-context-map.js +49 -7
package/build/src/tools/validate-source-tree.js +32 -1
package/console.telemetry.json +1 -1
package/docs/adr/0004-gates-expect-surface-claims.md +7 -7
package/docs/adr/0007-flow-skill-kit-tool-boundary.md +169 -0
package/docs/adr/0007-skill-audit.md +112 -0
package/docs/adr/0008-kit-operation-boundary.md +88 -0
package/docs/context-map.md +18 -22
package/docs/flow-kit-repository-contract.md +5 -5
package/docs/getting-started.md +177 -0
package/docs/index.md +19 -8
package/docs/kit-authoring-guide.md +125 -13
package/docs/knowledge-kit.md +2 -2
package/docs/operating-layers.md +2 -2
package/docs/spec/runtime-hook-surface.md +1 -1
package/docs/veritas-integration.md +4 -4
package/docs/vision.md +1 -1
package/docs/workflow-eval-strategy.md +2 -2
package/docs/workflow-usage-guide.md +2 -2
package/evals/acceptance/test_opencode_harness.sh +18 -10
package/evals/acceptance/test_pi_harness.sh +10 -6
package/evals/ci/run-baseline.sh +1 -1
package/evals/fixtures/builder-kit-workflow-state/happy-path.json +2 -2
package/evals/fixtures/builder-kit-workflow-state/mid-work-resume.json +2 -2
package/evals/fixtures/console-learning-projection/artifacts/console-learning-correction/learning.json +1 -1
package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/flows/runtime.flow.json +4 -4
package/evals/fixtures/flow-kit-repository/valid-local-kit/flows/review.flow.json +4 -4
package/evals/fixtures/kit-conformance-levels/k0-flows-only/flows/review.flow.json +4 -4
package/evals/fixtures/kit-conformance-levels/k1-agent-extension/flows/build.flow.json +4 -4
package/evals/fixtures/kit-conformance-levels/k2-with-evals/flows/synthesize.flow.json +4 -4
package/evals/fixtures/kit-conformance-levels/third-party-extension/flows/review.flow.json +4 -4
package/evals/fixtures/pull-work-provider/github-issues.json +5 -5
package/evals/fixtures/surface-trust/accepted-claim-trust-report.json +2 -2
package/evals/fixtures/surface-trust/artifact-absent.json +2 -2
package/evals/fixtures/surface-trust/integrity-mismatch-trust-report.json +2 -2
package/evals/fixtures/surface-trust/missing-authority-trust-report.json +2 -2
package/evals/fixtures/surface-trust/provider-absent.json +2 -2
package/evals/fixtures/surface-trust/rejected-claim-trust-report.json +2 -2
package/evals/fixtures/surface-trust/stale-claim-trust-snapshot.json +2 -2
package/evals/integration/test_activate_npx_context.sh +2 -2
package/evals/integration/test_bundle_install.sh +17 -12
package/evals/integration/test_console_learning_projection.sh +2 -2
package/evals/integration/test_flow_kit_install_git.sh +7 -7
package/evals/integration/test_flow_kit_repository.sh +4 -4
package/evals/integration/test_goal_fit_hook.sh +144 -0
package/evals/integration/test_kit_conformance_levels.sh +56 -2
package/evals/integration/test_local_flow_kit_install.sh +7 -7
package/evals/integration/test_publish_change_helper.sh +1 -1
package/evals/integration/test_pull_work_provider.sh +1 -1
package/evals/integration/test_runtime_adapter_activation.sh +3 -3
package/evals/integration/test_workflow_sidecar_writer.sh +9 -9
package/evals/lib/node.sh +2 -2
package/evals/static/test_package.sh +3 -3
package/evals/static/test_workflow_skills.sh +19 -19
package/integrations/strands/flow_agents_strands/steering.py +1 -1
package/integrations/strands-ts/src/hooks.ts +1 -1
package/kits/builder/flows/build.flow.json +48 -48
package/kits/builder/flows/shape.flow.json +36 -36
package/kits/builder/kit.json +17 -0
package/{skills → kits/builder/skills}/builder-shape/SKILL.md +4 -4
package/{skills → kits/builder/skills}/idea-to-backlog/SKILL.md +1 -1
package/kits/knowledge/adapters/obsidian-store/index.js +137 -26
package/kits/knowledge/evals/contract-suite/suite.test.js +90 -0
package/kits/knowledge/flows/compile.flow.json +12 -12
package/kits/knowledge/flows/consolidate.flow.json +16 -16
package/kits/knowledge/flows/ingest.flow.json +12 -12
package/kits/knowledge/flows/retire.flow.json +16 -16
package/kits/knowledge/flows/store-contract.flow.json +12 -12
package/kits/knowledge/flows/synthesize.flow.json +16 -16
package/kits/knowledge/kit.json +16 -9
package/kits/release-evidence/flows/release-evidence.flow.json +3 -3
package/package.json +11 -5
package/packaging/packs.json +1 -21
package/schemas/workflow-evidence.schema.json +2 -1
package/scripts/README.md +1 -1
package/scripts/hooks/stop-goal-fit.js +66 -18
package/scripts/kit.js +2 -0
package/skills/README.md +23 -0
package/src/cli/{flow-kit.ts → kit.ts} +124 -109
package/src/cli/validate-source-tree.ts +4 -4
package/src/cli/workflow-sidecar.ts +62 -4
package/src/cli.ts +3 -3
package/src/flow-kit/validate.ts +118 -58
package/src/tools/build-universal-bundles.ts +74 -13
package/src/tools/generate-context-map.ts +36 -6
package/src/tools/validate-source-tree.ts +27 -1
package/scripts/flow-kit.js +0 -2
package/skills/context-budget/SKILL.md +0 -40
package/skills/explore/SKILL.md +0 -137
package/skills/feedback-loop/SKILL.md +0 -87
package/skills/frontend-design/SKILL.md +0 -80
package/{skills → kits/builder/skills}/deliver/SKILL.md +0 -0
package/{skills → kits/builder/skills}/design-probe/SKILL.md +0 -0
package/{skills → kits/builder/skills}/evidence-gate/SKILL.md +0 -0
package/{skills → kits/builder/skills}/execute-plan/SKILL.md +0 -0
package/{skills → kits/builder/skills}/fix-bug/SKILL.md +0 -0
package/{skills → kits/builder/skills}/learning-review/SKILL.md +0 -0
package/{skills → kits/builder/skills}/pickup-probe/SKILL.md +0 -0
package/{skills → kits/builder/skills}/plan-work/SKILL.md +0 -0
package/{skills → kits/builder/skills}/pull-work/SKILL.md +0 -0
package/{skills → kits/builder/skills}/release-readiness/SKILL.md +0 -0
package/{skills → kits/builder/skills}/review-work/SKILL.md +0 -0
package/{skills → kits/builder/skills}/tdd-workflow/SKILL.md +0 -0
package/{skills → kits/builder/skills}/verify-work/SKILL.md +0 -0
package/{skills → kits/knowledge/skills}/knowledge-capture/SKILL.md +0 -0

package/skills/explore/SKILL.md DELETED

@@ -1,137 +0,0 @@
----
-name: "explore"
-description: "Parallel codebase exploration — fans out subagents to map structure, entry points, dependencies, patterns, config, and tests in one pass."
----
-# Codebase Exploration
-Efficiently gather context about repositories by running parallel exploration tasks.
-## Harness Limit
-Some harnesses cap a single delegation batch at 4 subagents.
-- Respect the current harness limit.
-- If the limit is unknown, assume 4.
-- Never submit more than 4 subagents in one batch unless the harness explicitly allows more.
-- Use multiple waves when needed rather than overfilling the first fan-out.
-## Exploration Strategy
-Spawn MULTIPLE subagents IN PARALLEL to investigate different dimensions:
-### Wave 1A (parallel, up to 4 subagents)
-1. **Structure Scout** - Map directory structure, identify key folders (src, lib, tests, config)
-2. **Entry Point Finder** - Locate main files, CLI entry points, API routes, exports
-3. **Dependency Analyzer** - Parse package.json, requirements.txt, go.mod, Cargo.toml, pom.xml
-4. **Pattern Detective** - Identify architectural patterns, frameworks, coding conventions
-### Wave 1B (parallel, after Wave 1A if needed)
-5. **Config Inspector** - Find and summarize configuration files, env vars, build configs
-6. **Test Mapper** - Locate test files, understand testing strategy and coverage areas
-7. **Documentation Auditor** - Cross-reference all documentation against actual file system state:
-   - README agent tables vs actual `agents/*.agent-spec.json` files (ghost agents? missing agents?)
-   - README skill lists vs actual `skills/*/SKILL.md` files
-   - README dependency lists vs `Config` file declarations
-   - Consistency of shared AGENTS.md sections across packages (paths, naming examples, model references)
-   - All `.md` and `.json` files: grep for references to agents, skills, or paths that don't exist
-   - Agent spec `resources` paths: verify referenced context files exist
-   - Agent spec `model` fields: verify they follow conventions (orchestrators=opus, tools=haiku/sonnet)
-   - Typos and spelling errors in documentation files
-   - Empty directories or dead skill/SOP stubs
-### Wave 2 (after Wave 1A/1B — needs dependency list)
-8. **Tech Stack Researcher** - Research the identified tech stack using web search tools (`web_search`, `web_fetch`) and `tool-dependencies-updater` (audit-only; do NOT apply updates). Goals:
-   - Identify outdated or deprecated dependencies and how significant an upgrade would be (patch vs minor vs major, breaking changes)
-   - Discover new features in the current stack that the project could leverage
-   - Assess whether any part of the stack is irrelevant, superseded, or approaching EOL
-   - Surface project-specific context (migration guides, EOL announcements, known issues)
-## Execution Model
-```
-[User Request]
-      |
-      v
-[Wave 1A: Spawn first 4 dimensions in parallel]
-      |
-      v
-[Wave 1B: Spawn remaining dimensions in parallel if needed]
-      |
-      v
-[Aggregate Wave 1 findings]
-      |
-      v
-[Wave 2: Spawn Tech Stack Researcher with dependency list from Wave 1]
-  - tool-dependencies-updater: audit-only scan for outdated packages, version gaps, security advisories
-  - web search: research key frameworks/libraries for new features, deprecation, relevance
-      |
-      v
-[Final Synthesis]
-```
-## Subagent Prompts (use these as templates)
-Wave 1A:
-- "Explore the directory structure of this repo. List key folders and their purposes. Focus on: [specific area if provided]"
-- "Find all entry points in this codebase - main files, CLI commands, API routes, exported modules"
-- "Analyze dependencies - what frameworks, libraries, and tools does this project use?"
-- "Identify architectural patterns - is this MVC, microservices, monolith? What conventions are used?"
-Wave 1B:
-- "Find and summarize all configuration files - what can be configured and how?"
-- "Map the test structure - where are tests, what testing frameworks, what's the coverage strategy?"
-- "Audit all documentation for accuracy: (1) List every agent-spec.json file and cross-reference against README agent tables — flag any agents listed in docs but missing from disk or vice versa. (2) List every skills/*/SKILL.md and cross-reference against README skill lists. (3) Compare Config dependency declarations against README dependency sections. (4) Grep all .md and .json files for references to agent names and verify each referenced agent exists as an agent-spec.json. (5) Check AGENTS.md files across packages for inconsistent paths, naming examples, or model references. (6) Flag empty directories, typos, and dead stubs."
-Wave 2 (spawn these two in parallel):
-- tool-dependencies-updater: "Scan this project for all dependency manifests, check every dependency against the latest available version, run security advisory checks on outdated packages, and report findings grouped by risk level (critical/major/minor). Do NOT apply any updates — audit only."
-- web search: "Research the following tech stack: [list key frameworks/libraries from Wave 1]. For each, find: (1) latest stable version and what's new, (2) any deprecation or EOL announcements, (3) notable new features that could benefit this project, (4) whether any component has been superseded by a better alternative. Cite sources."
-## Output Format
-After all subagents complete, synthesize into:
-```
-## Codebase Overview
-[1-2 sentence summary]
-## Key Findings
-- **Tech Stack**: [languages, frameworks, tools]
-- **Architecture**: [pattern, structure]
-- **Entry Points**: [main files, commands]
-- **Configuration**: [key config files]
-- **Testing**: [strategy, frameworks]
-## Tech Stack Health
-- **Outdated (Critical)**: [packages with security vulnerabilities]
-- **Outdated (Major)**: [packages with major version bumps available — note breaking change risk]
-- **Outdated (Minor)**: [packages with minor/patch updates]
-- **New Features Available**: [notable new capabilities in current stack]
-- **Deprecation/EOL Warnings**: [anything approaching end of life]
-- **Upgrade Effort Summary**: [overall assessment — low/medium/high effort to get current]
-## Recommended Starting Points
-[Files to read first for understanding]
-## Potential Concerns
-[Any issues, outdated deps, missing tests, etc.]
-## Documentation Audit
-- **Ghost references**: [agents/skills/paths mentioned in docs but not on disk]
-- **Missing from docs**: [agents/skills that exist on disk but aren't documented]
-- **Stale content**: [outdated descriptions, wrong dependency lists, inconsistent AGENTS.md sections]
-- **Config mismatches**: [README deps vs Config file deps]
-- **Path inconsistencies**: [resource paths in agent specs that don't follow conventions]
-- **Empty/dead artifacts**: [empty directories, stub files with no content]
-- **Typos**: [spelling errors found in documentation]
-```
-## Key Principles
-- ALWAYS run explorations in PARALLEL within the current harness limit - this is the whole point
-- Never exceed 4 subagents in one batch unless the harness explicitly allows more
-- Wave 2 (Tech Stack Researcher) runs AFTER Wave 1A/1B completes because it needs the dependency list
-- tool-dependencies-updater is used in AUDIT-ONLY mode — never apply updates during explore
-- Be thorough but efficient - don't read entire files, scan for structure
-- Focus on what helps someone GET STARTED quickly
-- Flag anything unusual or concerning
-- If a specific area is requested, weight exploration toward that area
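The batching rule in the deleted `explore` skill (at most 4 subagents per batch when the limit is unknown, Wave 1A/1B before Wave 2) can be sketched in TypeScript, the language of the package's `src/` tree. This is a hypothetical illustration under stated assumptions: `ExplorationTask`, `planWaves`, and `DEFAULT_HARNESS_LIMIT` are invented names and are not part of `@kontourai/flow-agents`; the skill itself describes agent behavior, not a code API.

```typescript
// Hypothetical sketch of the explore skill's batching rule; these names are
// illustrative and do not exist in @kontourai/flow-agents.

type Phase = "wave1" | "wave2";

interface ExplorationTask {
  name: string;
  phase: Phase; // "wave2" tasks need the dependency list from Wave 1
}

// The skill says: if the harness limit is unknown, assume 4.
const DEFAULT_HARNESS_LIMIT = 4;

// Splits tasks into batches of at most `harnessLimit` subagents.
// Every Wave 1 batch (1A, 1B, ...) is emitted before any Wave 2 batch,
// because Wave 2 depends on the aggregated Wave 1 findings.
function planWaves(
  tasks: ExplorationTask[],
  harnessLimit: number = DEFAULT_HARNESS_LIMIT,
): string[][] {
  if (!Number.isInteger(harnessLimit) || harnessLimit < 1) {
    throw new RangeError("harnessLimit must be a positive integer");
  }
  const waves: string[][] = [];
  for (const phase of ["wave1", "wave2"] as const) {
    const names = tasks.filter((t) => t.phase === phase).map((t) => t.name);
    for (let i = 0; i < names.length; i += harnessLimit) {
      waves.push(names.slice(i, i + harnessLimit));
    }
  }
  return waves;
}

// The seven Wave 1 dimensions plus the two parallel Wave 2 researchers.
const wave1Names = [
  "Structure Scout", "Entry Point Finder", "Dependency Analyzer",
  "Pattern Detective", "Config Inspector", "Test Mapper", "Documentation Auditor",
];
const tasks: ExplorationTask[] = [
  ...wave1Names.map((name): ExplorationTask => ({ name, phase: "wave1" })),
  { name: "tool-dependencies-updater (audit-only)", phase: "wave2" },
  { name: "web search", phase: "wave2" },
];

console.log(planWaves(tasks).map((wave) => wave.length)); // → [ 4, 3, 2 ]
```

With the default limit this reproduces the skill's layout: Wave 1A (4 subagents), Wave 1B (3), then Wave 2 (2). Passing a larger limit, when a harness explicitly allows it, simply merges the Wave 1 batches while still keeping Wave 2 last.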

package/skills/feedback-loop/SKILL.md DELETED

@@ -1,87 +0,0 @@
----
-name: "feedback-loop"
-description: "Verify implementation actually works. Visual changes → Playwright; integration changes → commands/tests. Run after completing builds."
----
-# Feedback Loop
-Verify that what you claim to have built actually works. Don't just say "done" — prove it.
-## When to Use
-- After implementing changes, before declaring them complete
-- When the user asks you to verify or prove your work
-- As the final step of any implementation workflow
-- When you're uncertain whether your changes actually function correctly
-## Workflow
-### Step 1: IDENTIFY CHANGES
-Determine what was just built:
-- Check `git diff` for modified/added files
-- Review the active TODO list for context on what was implemented
-- Identify the nature of the change: what should be different now?
-### Step 2: CLASSIFY
-Determine the verification method:
-| Change Type | Method | Examples |
-|---|---|---|
-| **Visual** | Playwright via `tool-playwright` | UI components, pages, styles, layouts, forms, visual regressions |
-| **Integration** | Commands, tests, execution | APIs, CLIs, libraries, configs, build scripts, data processing |
-If changes span both, run both verification paths.
-### Step 3: VERIFY
-#### Visual Path (frontend/UI changes)
-Delegate to `tool-playwright`:
-1. Load the relevant URL (local dev server, preview, etc.)
-2. Take an accessibility snapshot to confirm elements exist and are structured correctly
-3. Take a screenshot for visual confirmation
-4. If interactive — click, type, navigate to exercise the changed behavior
-5. Compare against expected state: are the right elements present? Does the layout match intent?
-If the dev server isn't running, start it (or tell the user to) before proceeding.
-#### Integration Path (non-visual changes)
-Use the most direct verification available, in priority order:
-1. **Run existing tests** — if tests cover the changed code, run them
-2. **Execute the code** — run the CLI command, call the API endpoint, import the module
-3. **Check build** — compile/lint to confirm no syntax or type errors
-4. **Inspect output** — verify the output matches expected behavior
-Always capture actual output as evidence.
-### Step 4: REPORT
-State clearly:
-- **What was verified** — which changes, which method
-- **Evidence** — actual output, screenshots, test results, command output
-- **Verdict** — ✅ confirmed working, or ❌ found issues with specifics
-If verification fails, fix the issue and re-verify. Don't report failure without attempting a fix first.
-## Persistence Rule
-**Keep trying until the user says stop.** This is the core behavior of the feedback loop.
-- If a verification method fails (Playwright won't connect, tests error out, server won't start), **debug and retry**. Don't downgrade to a weaker method or declare "good enough."
-- If visual verification is required and Playwright is having issues, fix the Playwright issue. Don't fall back to "well the build passes so it's probably fine."
-- If integration tests fail, diagnose why, fix, and re-run. Don't report partial success.
-- Cycle: **verify → fail → diagnose → fix → verify again**. Repeat until either:
-  1. ✅ All verification methods pass with evidence, OR
-  2. 🛑 The user explicitly says to stop or skip a method
-Never self-exit the loop. Never decide on the user's behalf that a failure is acceptable. The user breaks the loop, not the agent.
-## Key Principles
-- **Evidence over assertion.** Show output, not just "it works."
-- **Never settle.** If a verification method should work but isn't, that's a bug to fix — not a reason to skip it.
-- **Fix before reporting.** If verification reveals a bug you introduced, fix it and re-run.
-- **Match the medium.** UI changes need visual proof. Backend changes need execution proof.
-- **Be specific.** "Tests pass" is weak. "Ran `npm test` — 14 tests passed, 0 failed, output attached" is strong.
-- **Don't skip this.** The whole point is catching the gap between "I wrote the code" and "the code works."
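The classification step and the persistence rule of the deleted `feedback-loop` skill can be sketched as a small TypeScript control loop. This is a hypothetical illustration: `verificationPaths`, `runFeedbackLoop`, and the callback parameters are invented names, and in the skill the "verify", "fix", and "user says stop" steps are agent actions rather than functions.

```typescript
// Hypothetical sketch of the verify → fail → diagnose → fix → verify cycle.
// All names here are illustrative; the skill describes agent behavior.

type ChangeKind = "visual" | "integration";

interface VerificationResult {
  method: string;   // e.g. "playwright", "npm test"
  passed: boolean;
  evidence: string; // actual output, kept as proof
}

type Outcome =
  | { status: "confirmed"; evidence: string[] }
  | { status: "stopped-by-user"; lastFailures: VerificationResult[] };

// Step 2 (CLASSIFY): visual changes are verified with Playwright,
// integration changes with commands/tests; mixed changes get both paths.
function verificationPaths(kinds: ReadonlySet<ChangeKind>): string[] {
  const paths: string[] = [];
  if (kinds.has("visual")) paths.push("playwright");
  if (kinds.has("integration")) paths.push("commands-and-tests");
  return paths;
}

// Persistence rule: there is deliberately no retry cap. The loop ends only
// when every check passes or the user explicitly says stop.
function runFeedbackLoop(
  verify: () => VerificationResult[],
  fix: (failures: VerificationResult[]) => void,
  userSaysStop: () => boolean,
): Outcome {
  for (;;) {
    const results = verify();
    const failures = results.filter((r) => !r.passed);
    if (failures.length === 0) {
      return { status: "confirmed", evidence: results.map((r) => r.evidence) };
    }
    if (userSaysStop()) {
      return { status: "stopped-by-user", lastFailures: failures };
    }
    fix(failures); // diagnose and fix, then loop back to verify again
  }
}

// Usage: a check that fails once, gets fixed, then passes.
let bugFixed = false;
const outcome = runFeedbackLoop(
  () => [{ method: "npm test", passed: bugFixed, evidence: bugFixed ? "14 passed, 0 failed" : "1 failed" }],
  () => { bugFixed = true; },
  () => false,
);
console.log(outcome.status); // → confirmed
```

The absence of a maximum-iteration guard is the design choice the skill insists on: only passing evidence or an explicit user stop ends the loop, and a confirmed outcome always carries the captured evidence.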

package/skills/frontend-design/SKILL.md DELETED

@@ -1,80 +0,0 @@
----
-name: "frontend-design"
-description: "Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics."
----
-# Frontend Design
-Delegate frontend implementation to `tool-worker` with the design guidelines below included in the prompt. The orchestrator's job is to understand the user's requirements, choose an aesthetic direction, and hand off to tool-worker with clear instructions plus the full aesthetics guidelines.
-## Trigger Patterns
-This skill activates when the user:
-- Asks to build a UI, page, component, or web application
-- Wants a landing page, dashboard, form, or interactive interface
-- Mentions design quality, aesthetics, or visual polish
-- Asks for something "that looks good" or "production-grade"
-## Workflow
-### Step 1: UNDERSTAND REQUIREMENTS
-Gather from the user:
-- What to build (component, page, app)
-- Purpose and audience
-- Technical constraints (framework, existing codebase)
-- Any aesthetic preferences or references
-### Step 2: DELEGATE TO tool-worker
-Delegate to the exact `tool-worker` role; do not spawn an unnamed/default implementation agent. Give it a prompt that includes:
-1. The specific implementation task (what to build, where files go, framework)
-2. The **full Design Guidelines section below**: copy it into the delegation prompt so tool-worker has it in context
-### Step 3: VERIFY VISUALLY (mandatory)
-After tool-worker completes, you MUST delegate to tool-playwright to screenshot the result and confirm it renders correctly. Do NOT skip this step. Do NOT treat implementation as the final step. Visual verification is required before relaying results to the user.
-## Design Guidelines
-Include everything below this line in the tool-worker delegation prompt.
----
-### Design Thinking
-Before coding, understand the context and commit to a BOLD aesthetic direction:
-- **Purpose**: What problem does this interface solve? Who uses it?
-- **Tone**: Pick an extreme: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian, etc. Use these for inspiration but design one that is true to the aesthetic direction.
-- **Constraints**: Technical requirements (framework, performance, accessibility).
-- **Differentiation**: What makes this UNFORGETTABLE? What's the one thing someone will remember?
-**CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work — the key is intentionality, not intensity.
-Then implement working code (HTML/CSS/JS, React, Vue, etc.) that is:
-- Production-grade and functional
-- Visually striking and memorable
-- Cohesive with a clear aesthetic point-of-view
-- Meticulously refined in every detail
-### Frontend Aesthetics
-- **Typography**: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt for distinctive choices that elevate the aesthetic. Pair a distinctive display font with a refined body font.
-- **Color & Theme**: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly-distributed palettes.
-- **Motion**: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML. Use the Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals creates more delight than scattered micro-interactions.
-- **Spatial Composition**: Unexpected layouts. Asymmetry. Overlap. Diagonal flow. Grid-breaking elements. Generous negative space OR controlled density.
-- **Backgrounds & Visual Details**: Create atmosphere and depth rather than defaulting to solid colors. Apply creative forms like gradient meshes, noise textures, geometric patterns, layered transparencies, dramatic shadows, decorative borders, and grain overlays.
-### Anti-Patterns (NEVER use)
-- Generic font families (Inter, Roboto, Arial, system fonts)
-- Clichéd color schemes (particularly purple gradients on white backgrounds)
-- Predictable layouts and cookie-cutter component patterns
-- Converging on the same "safe" choices across generations (e.g., Space Grotesk every time)
-Vary between light and dark themes, different fonts, different aesthetics. Every design should feel unique to its context.
-### Calibration
-Match implementation complexity to the aesthetic vision. Maximalist designs need elaborate code with extensive animations and effects. Minimalist or refined designs need restraint, precision, and careful attention to spacing, typography, and subtle details.
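The three-step workflow of the deleted `frontend-design` skill (gather requirements, delegate to `tool-worker` with the full guidelines pasted into the prompt, then require visual verification through `tool-playwright`) can be sketched as a TypeScript function that produces the two delegations in order. This is a hypothetical illustration: `Requirements`, `Delegation`, and `planFrontendDelegations` are invented names, and the prompt wording is an assumption rather than the skill's exact text.

```typescript
// Hypothetical sketch of the frontend-design workflow; the types and the
// planFrontendDelegations function are illustrative, not a package API.

interface Requirements {
  whatToBuild: string;          // component, page, or app
  purposeAndAudience: string;
  constraints: string;          // framework, existing codebase
  aestheticPreferences?: string;
}

interface Delegation {
  role: "tool-worker" | "tool-playwright"; // exact roles, never a default agent
  prompt: string;
}

function planFrontendDelegations(
  req: Requirements,
  designGuidelines: string,
): Delegation[] {
  // Step 1: the requirements gathered from the user become the task brief.
  const brief = [
    `Build: ${req.whatToBuild}`,
    `Purpose and audience: ${req.purposeAndAudience}`,
    `Constraints: ${req.constraints}`,
    req.aestheticPreferences ? `Aesthetic preferences: ${req.aestheticPreferences}` : "",
  ]
    .filter((line) => line.length > 0) // drop the optional line when absent
    .join("\n");

  return [
    // Step 2: the full Design Guidelines text is copied into the prompt.
    { role: "tool-worker", prompt: `${brief}\n\n## Design Guidelines\n${designGuidelines}` },
    // Step 3 (mandatory): visual verification before reporting to the user.
    {
      role: "tool-playwright",
      prompt: `Screenshot the result of "${req.whatToBuild}" and confirm it renders correctly.`,
    },
  ];
}

const plan = planFrontendDelegations(
  { whatToBuild: "pricing page", purposeAndAudience: "developers comparing plans", constraints: "React" },
  "### Design Thinking\n...",
);
console.log(plan.map((d) => d.role)); // → [ 'tool-worker', 'tool-playwright' ]
```

Returning the verification step as part of the same plan, rather than leaving it to a later decision, mirrors the skill's rule that implementation is never the final step.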