npm - @cubis/foundry - Versions diffs - 0.3.71 → 0.3.72 - Mend

@cubis/foundry 0.3.71 → 0.3.72


Files changed (270)

package/workflows/powers/agent-design/references/workflow-patterns.md ADDED

@@ -0,0 +1,226 @@
+# Workflow Patterns Reference
+Load this when choosing or implementing a workflow pattern for a CBX agent or skill.
+Source: Anthropic engineering research — [Common workflow patterns for AI agents](https://claude.com/blog/common-workflow-patterns-for-ai-agents-and-when-to-use-them) (March 2026).
+---
+## The Core Insight
+Workflows don't replace agent autonomy — they _shape where and how_ agents apply it.
+A fully autonomous agent decides everything: tools, order, when to stop.
+A workflow provides structure: overall flow, checkpoints, boundaries — but each step still uses full agent reasoning.
+**Start with a single agent call.** If that meets the quality bar, you're done. Only add workflow complexity when you can measure the improvement.
+---
+## Pattern 1: Sequential Workflow
+### What it is
+Agents execute in a fixed order. Each stage processes its input, makes tool calls, then passes results to the next stage.
+```
+Input → [Agent A] → [Agent B] → [Agent C] → Output
+```
+### Use when
+- Steps have explicit dependencies (B needs A's output before starting)
+- Multi-stage transformation where each step adds specific value
+- Draft-review-polish cycles
+- Data extraction → validation → loading pipelines
+### Avoid when
+- A single agent can handle the whole task
+- Agents need to collaborate rather than hand off linearly
+- You're forcing sequential structure onto a task that doesn't naturally fit it
+### Cost/benefit
+- **Cost:** Latency is linear — step 2 waits for step 1
+- **Benefit:** Each agent focuses on one thing; accuracy often improves
+### CBX implementation
+```markdown
+## Workflow
+1. **[Agent/Step A]** — [what it receives, what it does, what it produces]
+2. **[Agent/Step B]** — [takes A's output, does X, produces Y]
+3. **[Agent/Step C]** — [final synthesis/delivery]
+Artifacts pass via [file path / variable / structured JSON / natural handoff instructions].
+```
+### Pro tip
+First try the pipeline as a single agent where the steps are part of the prompt. If quality is good enough, you've solved the problem without complexity.
+---
+## Pattern 2: Parallel Workflow
+### What it is
+Multiple agents run simultaneously on independent tasks. Results are merged or synthesized afterward.
+```
+         ┌→ [Agent A] →┐
+Input →  ├→ [Agent B] →├→ Synthesize → Output
+         └→ [Agent C] →┘
+```
+### Use when
+- Tasks are genuinely independent (no agent needs another's output to start)
+- Speed matters and concurrent execution helps
+- Multiple perspectives on the same input (e.g., code review from security + performance + quality)
+- Separation of concerns — different engineers can own individual agents
+### Avoid when
+- Agents need cumulative context or must build on each other's work
+- Resource constraints (API quotas) make concurrent calls inefficient
+- Aggregation logic is unclear or produces contradictory results with no resolution strategy
+### Cost/benefit
+- **Cost:** Tokens multiply (N agents × tokens each); requires aggregation strategy
+- **Benefit:** Faster completion; clean separation of concerns
+### CBX implementation
+```markdown
+## Parallel Steps
+Run these simultaneously:
+- **[Agent A]** — [focused task, specific scope]
+- **[Agent B]** — [focused task, different scope]
+- **[Agent C]** — [focused task, different scope]
+## Synthesis
+After all agents complete:
+[How to merge: majority vote / highest confidence / specialized agent defers to other / human review]
+```
+### Pro tip
+Design your aggregation strategy _before_ implementing parallel agents. Without a clear merge plan, you collect conflicting outputs with no way to resolve them.
+---
+## Pattern 3: Evaluator-Optimizer Workflow
+### What it is
+Two agents loop: one generates content, another evaluates it against criteria, and the generator refines based on the feedback. Repeat until the quality threshold is met or the maximum number of iterations is reached.
+```
+        ┌─────────────────────────────────────┐
+        ↓                                     |
+Input → [Generator] → Draft → [Evaluator] → Pass? → Output
+                                 ↓ Fail
+                            Feedback → [Generator]
+```
+### Use when
+- First-draft quality consistently falls short of the required bar
+- You have clear, measurable quality criteria an AI evaluator can apply consistently
+- The gap between first-attempt and final quality justifies extra tokens and latency
+- Examples: technical docs, customer communications, code against specific standards
+### Avoid when
+- First-attempt quality already meets requirements (unnecessary cost)
+- Real-time applications needing immediate responses
+- Evaluation criteria are too subjective for consistent AI evaluation
+- Deterministic tools exist (linters for style, validators for schemas) — use those instead
+### Cost/benefit
+- **Cost:** Tokens × iterations; adds latency proportionally
+- **Benefit:** Structured feedback loops produce measurably better outputs
+### CBX implementation
+```markdown
+## Generator Prompt
+Task: [what to create]
+Constraints: [specific, measurable requirements]
+Format: [exact output format]
+## Evaluator Prompt
+Review this output against these criteria:
+1. [Criterion A] — Pass/Fail + specific failure note
+2. [Criterion B] — Pass/Fail + specific failure note
+3. [Criterion C] — Pass/Fail + specific failure note
+Output JSON: { "pass": bool, "failures": ["..."], "revision_note": "..." }
+## Loop Control
+- Max iterations: [3-5]
+- Stop when: all criteria pass OR max iterations reached
+- On max with failures: surface remaining issues for human review
+```
+### Pro tip
+Set stopping criteria _before_ iterating. Define max iterations and specific quality thresholds. Without guardrails, you enter expensive loops where the evaluator finds minor issues and quality plateaus well before you stop.
+---
+## Decision Tree
+```
+Can a single agent handle this task effectively?
+  → YES: Don't use workflows. Use a rich single-agent prompt.
+  → NO: Continue...
+Do steps have dependencies (B needs A's output)?
+  → YES: Use Sequential
+  → NO: Continue...
+Can steps run independently, and would concurrency help?
+  → YES: Use Parallel
+  → NO: Continue...
+Does quality improve meaningfully through iteration, and can you measure it?
+  → YES: Use Evaluator-Optimizer
+  → NO: Re-examine whether workflows help at all
+```
+---
+## Combining Patterns
+Patterns are building blocks, not mutually exclusive:
+- A **sequential workflow** can include **parallel** steps at certain stages (e.g., three parallel reviewers before a final synthesis step)
+- An **evaluator-optimizer** can use **parallel evaluation** where multiple evaluators assess different quality dimensions simultaneously
+- A **sequential chain** can use **evaluator-optimizer** at the critical high-quality step
+Only add the combination when each additional pattern measurably improves outcomes.
+---
+## Pattern Comparison
+|                | Sequential                                   | Parallel                                | Evaluator-Optimizer                  |
+| -------------- | -------------------------------------------- | --------------------------------------- | ------------------------------------ |
+| **When**       | Dependencies between steps                   | Independent tasks                       | Quality below bar                    |
+| **Examples**   | Extract → validate → load; Draft → translate | Code review (security + perf + quality) | Technical docs, comms, SQL           |
+| **Latency**    | Linear (each waits for previous)             | Fast (concurrent)                       | Multiplied by iterations             |
+| **Token cost** | Linear                                       | Multiplicative                          | Linear × iterations                  |
+| **Key risk**   | Bottleneck at slow steps                     | Aggregation conflicts                   | Infinite loops without stop criteria |
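Reviewer sketch (not part of the package diff): the loop control that workflow-patterns.md describes for the evaluator-optimizer pattern could look roughly like the Python below. The names `evaluator_optimizer`, `LoopResult`, `stub_generate`, and `stub_evaluate` are hypothetical, and the stubs stand in for real model calls. The evaluator is assumed to return the `pass` / `failures` / `revision_note` shape shown in the Evaluator Prompt.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoopResult:
    draft: str
    passed: bool
    iterations: int
    failures: list[str] = field(default_factory=list)  # issues left for human review


def evaluator_optimizer(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str], dict],
    max_iterations: int = 3,
) -> LoopResult:
    """Generate, evaluate, and revise until the draft passes or the budget runs out."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    feedback: Optional[str] = None
    for iteration in range(1, max_iterations + 1):
        draft = generate(task, feedback)
        verdict = evaluate(draft)
        if verdict["pass"]:
            return LoopResult(draft, True, iteration)
        # Feed the evaluator's note back into the next generation attempt.
        feedback = verdict["revision_note"]
    # Budget exhausted: surface the remaining failures instead of looping further.
    return LoopResult(draft, False, max_iterations, verdict["failures"])


# Hypothetical stubs: the generator only "improves" once it receives feedback.
def stub_generate(task: str, feedback: Optional[str]) -> str:
    return f"{task} (revised)" if feedback else task


def stub_evaluate(draft: str) -> dict:
    ok = draft.endswith("(revised)")
    return {
        "pass": ok,
        "failures": [] if ok else ["missing revision"],
        "revision_note": "" if ok else "Add the revision marker.",
    }


result = evaluator_optimizer("Write release notes", stub_generate, stub_evaluate)
print(result.passed, result.iterations)
```

Fixing `max_iterations` up front, before any iteration runs, is what keeps a nitpicking evaluator from producing an open-ended loop. The final return carries the unresolved failures so a human can pick them up.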

package/workflows/powers/agentic-eval/POWER.md ADDED

@@ -0,0 +1,62 @@
+````markdown
+---
+inclusion: manual
+name: agentic-eval
+description: "Use when evaluating an agent, skill, workflow, or MCP server: rubric design, evaluator-optimizer loops, LLM-as-judge patterns, regression suites, or prototype-vs-production quality gaps."
+license: MIT
+metadata:
+  author: cubis-foundry
+  version: "1.0"
+compatibility: Claude Code, Codex, GitHub Copilot
+---
+# Agentic Eval
+## Purpose
+You are the specialist for evaluating agent systems, skills, and workflows.
+Your job is to separate prototype confidence from production confidence and force explicit rubrics, failure cases, and regression evidence.
+## When to Use
+- Designing evaluation sets, rubrics, or judge loops for skills, agents, or MCP servers.
+- Comparing prompt, skill, or workflow variants.
+- Tightening regression proof for agent behavior.
+## Instructions
+### STANDARD OPERATING PROCEDURE (SOP)
+1. Define the behavior under test and the failure modes that matter.
+2. Separate qualitative review from rubric or judge-based scoring.
+3. Build a repeatable regression set before optimizing variants.
+4. Treat judge-model output as evidence, not authority.
+5. Report what the evaluation proves and what it still does not prove.
+### Constraints
+- Do not confuse evaluation with implementation.
+- Do not use judge-model output as unquestioned truth.
+- Do not ship one-off demos as production evidence.
+- Do not broaden into generic QA planning when the target is an agent or skill system.
+## Output Format
+Provide implementation guidance, code examples, and configuration as appropriate to the task.
+## References
+| File                                            | Load when                                                                                            |
+| ----------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
+| `references/rubric-and-regression-checklist.md` | You need the checklist for rubrics, judge loops, variance handling, and production-quality evidence. |
+## Scripts
+No helper scripts are required for this skill right now. Keep execution in `SKILL.md` and `references/` unless repeated automation becomes necessary.
+## Examples
+- "Help me with agentic eval best practices in this project"
+- "Review my agentic eval implementation for issues"
+````

package/workflows/powers/agentic-eval/SKILL.md ADDED

@@ -0,0 +1,59 @@
+---
+name: agentic-eval
+description: "Use when evaluating an agent, skill, workflow, or MCP server: rubric design, evaluator-optimizer loops, LLM-as-judge patterns, regression suites, or prototype-vs-production quality gaps."
+license: MIT
+metadata:
+  author: cubis-foundry
+  version: "1.0"
+compatibility: Claude Code, Codex, GitHub Copilot
+---
+# Agentic Eval
+## Purpose
+You are the specialist for evaluating agent systems, skills, and workflows.
+Your job is to separate prototype confidence from production confidence and force explicit rubrics, failure cases, and regression evidence.
+## When to Use
+- Designing evaluation sets, rubrics, or judge loops for skills, agents, or MCP servers.
+- Comparing prompt, skill, or workflow variants.
+- Tightening regression proof for agent behavior.
+## Instructions
+### STANDARD OPERATING PROCEDURE (SOP)
+1. Define the behavior under test and the failure modes that matter.
+2. Separate qualitative review from rubric or judge-based scoring.
+3. Build a repeatable regression set before optimizing variants.
+4. Treat judge-model output as evidence, not authority.
+5. Report what the evaluation proves and what it still does not prove.
+### Constraints
+- Do not confuse evaluation with implementation.
+- Do not use judge-model output as unquestioned truth.
+- Do not ship one-off demos as production evidence.
+- Do not broaden into generic QA planning when the target is an agent or skill system.
+## Output Format
+Provide implementation guidance, code examples, and configuration as appropriate to the task.
+## References
+| File                                            | Load when                                                                                            |
+| ----------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
+| `references/rubric-and-regression-checklist.md` | You need the checklist for rubrics, judge loops, variance handling, and production-quality evidence. |
+## Scripts
+No helper scripts are required for this skill right now. Keep execution in `SKILL.md` and `references/` unless repeated automation becomes necessary.
+## Examples
+- "Help me with agentic eval best practices in this project"
+- "Review my agentic eval implementation for issues"
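Reviewer sketch (not part of the package diff): one possible reading of "treat judge-model output as evidence, not authority" is to call the judge several times and report how often it agrees with itself. The function `judge_with_agreement`, its parameters, and the scripted stub judge are all hypothetical. A real judge would be a model call.

```python
from collections import Counter
from typing import Callable


def judge_with_agreement(
    judge: Callable[[str], str],
    output: str,
    runs: int = 5,
    min_agreement: float = 0.8,
) -> dict:
    """Call the judge several times and report how consistent its verdicts are."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    verdicts = [judge(output) for _ in range(runs)]
    label, count = Counter(verdicts).most_common(1)[0]
    agreement = count / runs
    return {
        "verdict": label,
        "agreement": agreement,
        # Low self-agreement means the verdict should not be trusted on its own.
        "needs_human_review": agreement < min_agreement,
    }


# Hypothetical stub judge that replays a scripted, inconsistent set of verdicts.
scripted = iter(["pass", "fail", "pass", "fail", "pass"])
report = judge_with_agreement(lambda _output: next(scripted), "candidate answer")
print(report)
```

The majority verdict is still reported, but the agreement ratio travels with it, so a reader can see how much weight the score deserves.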

package/workflows/powers/agentic-eval/references/rubric-and-regression-checklist.md ADDED

@@ -0,0 +1,11 @@
+# Rubric And Regression Checklist
+Load this when building or reviewing an agent evaluation loop.
+## Checklist
+- Define success and failure explicitly.
+- Separate must-pass regressions from exploratory or qualitative review.
+- Keep judge prompts and rubrics stable while comparing variants.
+- Record where human review is still required.
+- Distinguish “works once” from “works reliably across the chosen set.”
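Reviewer sketch (not part of the package diff): the checklist's split between must-pass regressions and exploratory cases, and between "works once" and "works reliably", can be approximated by running every case several times and tracking pass rates. `Case`, `run_suite`, the case names, and the scripted check results below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Case:
    name: str
    prompt: str
    must_pass: bool  # True for regressions, False for exploratory cases


def run_suite(
    cases: Sequence[Case], check: Callable[[str], bool], trials: int = 3
) -> dict:
    """Run each case `trials` times and flag must-pass cases that ever fail."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    pass_rates = {}
    for case in cases:
        passes = sum(1 for _ in range(trials) if check(case.prompt))
        pass_rates[case.name] = passes / trials
    # A must-pass case blocks the release unless it passes on every trial.
    blocking = [c.name for c in cases if c.must_pass and pass_rates[c.name] < 1.0]
    return {"pass_rates": pass_rates, "blocking_failures": blocking}


# Hypothetical scripted outcomes standing in for real agent runs.
scripted = {
    "stable": iter([True, True, True]),
    "flaky": iter([True, False, True]),
    "tone": iter([False, True, True]),
}
cases = [
    Case("login-flow", "stable", must_pass=True),
    Case("refund-policy", "flaky", must_pass=True),
    Case("friendly-tone", "tone", must_pass=False),
]
report = run_suite(cases, lambda prompt: next(scripted[prompt]))
print(report["blocking_failures"])
```

A case that passes two trials out of three would look fine in a one-off demo. Repeating the trial is what exposes it.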

package/workflows/powers/api-designer/POWER.md CHANGED

@@ -1,97 +1,69 @@
 ````markdown
 ---
 inclusion: manual
-name: "api-designer"
-description: "Use when designing REST or GraphQL APIs, creating OpenAPI specifications, or planning API architecture. Invoke for resource modeling, versioning strategies, pagination patterns, error handling standards."
+name: api-designer
+description: "Use when defining or reviewing external API contracts, OpenAPI specifications, resource models, pagination, versioning, and error response standards. Do not use for pure database design or framework-only handler wiring."
+license: MIT
+metadata:
+  author: cubis-foundry
+  version: "3.0"
+compatibility: Claude Code, Codex, GitHub Copilot
 ---
 # API Designer
-## Overview
-Senior API architect with expertise in designing scalable, developer-friendly REST and GraphQL APIs with comprehensive OpenAPI specifications.
-## Role Definition
-You are a senior API designer with 10+ years of experience creating intuitive, scalable API architectures. You specialize in REST design patterns, OpenAPI 3.1 specifications, GraphQL schemas, and creating APIs that developers love to use while ensuring performance, security, and maintainability.
-## When to Use This Skill
+## Purpose
-- Designing new REST or GraphQL APIs
-- Creating OpenAPI 3.1 specifications
-- Modeling resources and relationships
-- Implementing API versioning strategies
-- Designing pagination and filtering
-- Standardizing error responses
-- Planning authentication flows
-- Documenting API contracts
+Use when defining or reviewing external API contracts, OpenAPI specifications, resource models, pagination, versioning, and error response standards. Do not use for pure database design or framework-only handler wiring.
-## Core Workflow
+## When to Use
-1. **Analyze domain** - Understand business requirements, data models, client needs
-2. **Model resources** - Identify resources, relationships, operations
-3. **Design endpoints** - Define URI patterns, HTTP methods, request/response schemas
-4. **Specify contract** - Create OpenAPI 3.1 spec with complete documentation
-5. **Plan evolution** - Design versioning, deprecation, backward compatibility
+- Defining external REST or GraphQL contracts.
+- Writing or reviewing OpenAPI schemas and endpoint shapes.
+- Choosing pagination, filtering, idempotency, and versioning rules.
+- Standardizing error envelopes and auth-facing API behavior.
-## Available Steering Files
+## Instructions
-Load detailed guidance on-demand:
+1. Clarify consumers, auth model, and backward-compatibility constraints.
+2. Model resources, operations, and failure cases before implementation.
+3. Choose transport shape, versioning policy, and pagination pattern deliberately.
+4. Define request, response, and error envelopes with explicit examples.
+5. Hand off a contract that implementation skills can build against without guessing.
-| Topic          | Reference                    | Load When                                   |
-| -------------- | ---------------------------- | ------------------------------------------- |
-| REST Patterns  | `references/rest-patterns.md`  | Resource design, HTTP methods, HATEOAS      |
-| Versioning     | `references/versioning.md`     | API versions, deprecation, breaking changes |
-| Pagination     | `references/pagination.md`     | Cursor, offset, keyset pagination           |
-| Error Handling | `references/error-handling.md` | Error responses, RFC 7807, status codes     |
-| OpenAPI        | `references/openapi.md`        | OpenAPI 3.1, documentation, code generation |
+### Baseline standards
-## Constraints
+- Prefer stable resource-oriented contracts over framework-driven shapes.
+- Keep request validation explicit at the boundary.
+- Document error semantics and retry expectations.
+- Use pagination on collections by default.
+- Make deprecation and compatibility policy explicit.
-### MUST DO
+### Constraints
-- Follow REST principles (resource-oriented, proper HTTP methods)
-- Use consistent naming conventions (snake_case or camelCase)
-- Include comprehensive OpenAPI 3.1 specification
-- Design proper error responses with actionable messages
-- Implement pagination for collection endpoints
-- Version APIs with clear deprecation policies
-- Document authentication and authorization
-- Provide request/response examples
+- Avoid verb-based URI design.
+- Avoid inconsistent response envelopes across endpoints.
+- Avoid silent breaking changes.
+- Avoid mixing database structure directly into the external contract.
-### MUST NOT DO
+## Output Format
-- Use verbs in resource URIs (use `/users/{id}`, not `/getUser/{id}`)
-- Return inconsistent response structures
-- Skip error code documentation
-- Ignore HTTP status code semantics
-- Design APIs without versioning strategy
-- Expose implementation details in API
-- Create breaking changes without migration path
-- Omit rate limiting considerations
+Provide implementation guidance, code examples, and configuration as appropriate to the task.
-## Output Templates
+## References
-When designing APIs, provide:
+Load on demand. Do not preload all reference files.
-1. Resource model and relationships
-2. Endpoint specifications with URIs and methods
-3. OpenAPI 3.1 specification (YAML or JSON)
-4. Authentication and authorization flows
-5. Error response catalog
-6. Pagination and filtering patterns
-7. Versioning and deprecation strategy
+| File | Load when |
+| --- | --- |
+| `references/contract-checklist.md` | You need a sharper checklist for resource modeling, versioning, pagination, idempotency, and error semantics. |
-## Knowledge Reference
+## Scripts
-REST architecture, OpenAPI 3.1, GraphQL, HTTP semantics, JSON:API, HATEOAS, OAuth 2.0, JWT, RFC 7807 Problem Details, API versioning patterns, pagination strategies, rate limiting, webhook design, SDK generation
+No helper scripts are required for this skill right now. Keep execution in `SKILL.md` and `references/` unless repeated automation becomes necessary.
-## Related Powers
+## Examples
-- **GraphQL Architect** - GraphQL-specific API design
-- **FastAPI Expert** - Python API implementation
-- **NestJS Expert** - TypeScript API implementation
-- **Spring Boot Engineer** - Java API implementation
-- **Security Reviewer** - API security assessment
+- "Help me with api designer best practices in this project"
+- "Review my api designer implementation for issues"
 ````
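Reviewer sketch (not part of the package diff): the new constraint against inconsistent response envelopes, together with the standard "document error semantics and retry expectations", could be served by routing every error through one envelope type. The field names (`code`, `message`, `retryable`, `details`) and the `error_response` helper are assumptions for illustration. The skill does not prescribe a particular shape.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class ApiError:
    code: str        # stable, machine-readable identifier
    message: str     # human-readable explanation
    retryable: bool  # tells clients whether retrying can succeed
    details: dict[str, Any] = field(default_factory=dict)


def error_response(status: int, error: ApiError) -> tuple[int, dict]:
    """Wrap every error in the same top-level envelope, whatever the endpoint."""
    return status, {"error": asdict(error)}


status, body = error_response(
    429,
    ApiError(
        code="rate_limited",
        message="Too many requests; retry later.",
        retryable=True,
        details={"retry_after_seconds": 30},
    ),
)
print(status, json.dumps(body))
```

Because each handler builds errors through the same function, clients only need to parse one error shape.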

package/workflows/powers/api-designer/SKILL.md CHANGED

@@ -1,94 +1,66 @@
 ---
-name: "api-designer"
-description: "Use when designing REST or GraphQL APIs, creating OpenAPI specifications, or planning API architecture. Invoke for resource modeling, versioning strategies, pagination patterns, error handling standards."
+name: api-designer
+description: "Use when defining or reviewing external API contracts, OpenAPI specifications, resource models, pagination, versioning, and error response standards. Do not use for pure database design or framework-only handler wiring."
+license: MIT
+metadata:
+  author: cubis-foundry
+  version: "3.0"
+compatibility: Claude Code, Codex, GitHub Copilot
 ---
 # API Designer
-## Overview
-Senior API architect with expertise in designing scalable, developer-friendly REST and GraphQL APIs with comprehensive OpenAPI specifications.
-## Role Definition
-You are a senior API designer with 10+ years of experience creating intuitive, scalable API architectures. You specialize in REST design patterns, OpenAPI 3.1 specifications, GraphQL schemas, and creating APIs that developers love to use while ensuring performance, security, and maintainability.
-## When to Use This Skill
+## Purpose
-- Designing new REST or GraphQL APIs
-- Creating OpenAPI 3.1 specifications
-- Modeling resources and relationships
-- Implementing API versioning strategies
-- Designing pagination and filtering
-- Standardizing error responses
-- Planning authentication flows
-- Documenting API contracts
+Use when defining or reviewing external API contracts, OpenAPI specifications, resource models, pagination, versioning, and error response standards. Do not use for pure database design or framework-only handler wiring.
-## Core Workflow
+## When to Use
-1. **Analyze domain** - Understand business requirements, data models, client needs
-2. **Model resources** - Identify resources, relationships, operations
-3. **Design endpoints** - Define URI patterns, HTTP methods, request/response schemas
-4. **Specify contract** - Create OpenAPI 3.1 spec with complete documentation
-5. **Plan evolution** - Design versioning, deprecation, backward compatibility
+- Defining external REST or GraphQL contracts.
+- Writing or reviewing OpenAPI schemas and endpoint shapes.
+- Choosing pagination, filtering, idempotency, and versioning rules.
+- Standardizing error envelopes and auth-facing API behavior.
-## Available Steering Files
+## Instructions
-Load detailed guidance on-demand:
+1. Clarify consumers, auth model, and backward-compatibility constraints.
+2. Model resources, operations, and failure cases before implementation.
+3. Choose transport shape, versioning policy, and pagination pattern deliberately.
+4. Define request, response, and error envelopes with explicit examples.
+5. Hand off a contract that implementation skills can build against without guessing.
-| Topic          | Reference                    | Load When                                   |
-| -------------- | ---------------------------- | ------------------------------------------- |
-| REST Patterns  | `references/rest-patterns.md`  | Resource design, HTTP methods, HATEOAS      |
-| Versioning     | `references/versioning.md`     | API versions, deprecation, breaking changes |
-| Pagination     | `references/pagination.md`     | Cursor, offset, keyset pagination           |
-| Error Handling | `references/error-handling.md` | Error responses, RFC 7807, status codes     |
-| OpenAPI        | `references/openapi.md`        | OpenAPI 3.1, documentation, code generation |
+### Baseline standards
-## Constraints
+- Prefer stable resource-oriented contracts over framework-driven shapes.
+- Keep request validation explicit at the boundary.
+- Document error semantics and retry expectations.
+- Use pagination on collections by default.
+- Make deprecation and compatibility policy explicit.
-### MUST DO
+### Constraints
-- Follow REST principles (resource-oriented, proper HTTP methods)
-- Use consistent naming conventions (snake_case or camelCase)
-- Include comprehensive OpenAPI 3.1 specification
-- Design proper error responses with actionable messages
-- Implement pagination for collection endpoints
-- Version APIs with clear deprecation policies
-- Document authentication and authorization
-- Provide request/response examples
+- Avoid verb-based URI design.
+- Avoid inconsistent response envelopes across endpoints.
+- Avoid silent breaking changes.
+- Avoid mixing database structure directly into the external contract.
-### MUST NOT DO
+## Output Format
-- Use verbs in resource URIs (use `/users/{id}`, not `/getUser/{id}`)
-- Return inconsistent response structures
-- Skip error code documentation
-- Ignore HTTP status code semantics
-- Design APIs without versioning strategy
-- Expose implementation details in API
-- Create breaking changes without migration path
-- Omit rate limiting considerations
+Provide implementation guidance, code examples, and configuration as appropriate to the task.
-## Output Templates
+## References
-When designing APIs, provide:
+Load on demand. Do not preload all reference files.
-1. Resource model and relationships
-2. Endpoint specifications with URIs and methods
-3. OpenAPI 3.1 specification (YAML or JSON)
-4. Authentication and authorization flows
-5. Error response catalog
-6. Pagination and filtering patterns
-7. Versioning and deprecation strategy
+| File | Load when |
+| --- | --- |
+| `references/contract-checklist.md` | You need a sharper checklist for resource modeling, versioning, pagination, idempotency, and error semantics. |
-## Knowledge Reference
+## Scripts
-REST architecture, OpenAPI 3.1, GraphQL, HTTP semantics, JSON:API, HATEOAS, OAuth 2.0, JWT, RFC 7807 Problem Details, API versioning patterns, pagination strategies, rate limiting, webhook design, SDK generation
+No helper scripts are required for this skill right now. Keep execution in `SKILL.md` and `references/` unless repeated automation becomes necessary.
-## Related Powers
+## Examples
-- **GraphQL Architect** - GraphQL-specific API design
-- **FastAPI Expert** - Python API implementation
-- **NestJS Expert** - TypeScript API implementation
-- **Spring Boot Engineer** - Java API implementation
-- **Security Reviewer** - API security assessment
+- "Help me with api designer best practices in this project"
+- "Review my api designer implementation for issues"
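Reviewer sketch (not part of the package diff): "use pagination on collections by default" leaves the pattern open. The Python below shows one option, an opaque cursor over an in-memory list. `paginate`, `encode_cursor`, `decode_cursor`, and the `data` / `next_cursor` response keys are hypothetical. A production cursor would usually encode a stable sort key rather than an offset.

```python
import base64
import json
from typing import Optional, Sequence


def encode_cursor(offset: int) -> str:
    """Serialize the position as an opaque, URL-safe token."""
    raw = json.dumps({"offset": offset}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: Optional[str]) -> int:
    """Return the start offset; a missing cursor means the first page."""
    if cursor is None:
        return 0
    return int(json.loads(base64.urlsafe_b64decode(cursor.encode()))["offset"])


def paginate(items: Sequence, limit: int = 2, cursor: Optional[str] = None) -> dict:
    """Return one page plus a cursor for the next page, or None on the last page."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    start = decode_cursor(cursor)
    page = list(items[start:start + limit])
    end = start + len(page)
    next_cursor = encode_cursor(end) if end < len(items) else None
    return {"data": page, "next_cursor": next_cursor}


users = ["ada", "grace", "linus"]
first = paginate(users)
second = paginate(users, cursor=first["next_cursor"])
print(first["data"], second["data"], second["next_cursor"])
```

Keeping the cursor opaque means the server can later change what it encodes without breaking the external contract.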