npm - convoke-agents - Versions diffs - 2.4.0 → 3.0.0 - Mend

convoke-agents 2.4.0 → 3.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (67)

package/_bmad/bme/_gyre/contracts/gc2-capabilities-manifest.md ADDED

@@ -0,0 +1,189 @@
+# GC2: Capabilities Manifest — Schema Definition
+> **Contract:** GC2 | **Type:** Artifact | **Flow:** Atlas → Lens, Coach
+>
+> This schema defines the structure for the Capabilities Manifest produced by the model-generation workflow. The manifest contains contextual capabilities relevant to the detected stack — NOT source code, file contents, or secrets.
+## Frontmatter Schema
+```yaml
+---
+contract: GC2
+type: artifact
+source_agent: atlas
+source_workflow: model-generation
+target_agents: [lens, coach]
+input_artifacts: [GC1]
+created: YYYY-MM-DD
+---
+```
+### Frontmatter Field Reference
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `contract` | Yes | string | Always `GC2` |
+| `type` | Yes | string | Always `artifact` |
+| `source_agent` | Yes | string | Always `atlas` |
+| `source_workflow` | Yes | string | Always `model-generation` |
+| `target_agents` | Yes | array | Agent IDs that consume this artifact: `[lens, coach]` |
+| `input_artifacts` | Yes | array | `[GC1]` — requires Stack Profile |
+| `created` | Yes | date | ISO date when artifact was created |
+---
+## Artifact Safety Rule
+**GC2 must not contain source code, file contents, or secrets (NFR9).**
+Capabilities describe WHAT should exist (categories, practices, standards) — not WHAT currently exists in the codebase. Evidence comparison is Lens's job (GC3), not Atlas's.
+---
+## Body Schema
+The Capabilities Manifest is written to `.gyre/capabilities.yaml` with the following structure:
+```yaml
+gyre_manifest:
+  version: string                 # manifest schema version, e.g., "1.0"
+  generated_at: ISO-8601          # generation timestamp
+  stack_summary: string           # one-line stack description from GC1
+  capability_count: integer       # total number of capabilities (excluding removed)
+  limited_coverage: boolean       # true if <20 capabilities generated
+  capabilities:
+    - id: string                  # kebab-case identifier, e.g., "health-check-liveness"
+      category: string            # "observability" | "deployment" | "reliability" | "security"
+      name: string                # human-readable name
+      description: string         # 1-3 sentences: what it is + why it matters for THIS stack
+      source: string              # "standard" | "practice" | "reasoning"
+      relevance: string           # why this matters for THIS stack specifically
+      amended: boolean            # true if user-modified via Coach review (GC4)
+      removed: boolean            # true if user removed via Coach review (GC4)
+  provenance:
+    standards_referenced: string[]  # e.g., ["DORA", "OpenTelemetry", "Google PRR"]
+    web_search_performed: boolean
+    web_search_date: ISO-8601       # null if web search not performed
+```
+### Field Reference
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `version` | Yes | string | Schema version for future breaking changes (NFR17) |
+| `generated_at` | Yes | string | ISO-8601 timestamp of generation |
+| `stack_summary` | Yes | string | One-line description of detected stack |
+| `capability_count` | Yes | integer | Total active capabilities (excluding removed) |
+| `limited_coverage` | Yes | boolean | True if fewer than 20 capabilities generated |
+| `capabilities` | Yes | array | List of capability objects |
+| `capabilities[].id` | Yes | string | Unique kebab-case identifier |
+| `capabilities[].category` | Yes | string | Domain category |
+| `capabilities[].name` | Yes | string | Human-readable capability name |
+| `capabilities[].description` | Yes | string | 1-3 sentence description with stack-specific context |
+| `capabilities[].source` | Yes | string | Origin: "standard", "practice", or "reasoning" |
+| `capabilities[].relevance` | Yes | string | Why this capability matters for this specific stack |
+| `capabilities[].amended` | Yes | boolean | True if modified by user during Coach review |
+| `capabilities[].removed` | Yes | boolean | True if removed by user during Coach review |
+| `provenance` | Yes | object | Generation metadata |
+| `provenance.standards_referenced` | Yes | array | Industry standards used |
+| `provenance.web_search_performed` | Yes | boolean | Whether web search enrichment was done |
+| `provenance.web_search_date` | No | string | ISO-8601 date of web search, null if not performed |
+---
+## Capability Categories
+| Category | Description | Typical Count |
+|----------|-------------|:------------:|
+| `observability` | Logging, tracing, metrics, health checks, alerting | 6-10 |
+| `deployment` | CI/CD, containers, orchestration, rollback, IaC | 5-8 |
+| `reliability` | Graceful shutdown, circuit breakers, rate limiting, fault tolerance | 4-6 |
+| `security` | Secrets management, vulnerability scanning, network policies, auth | 3-5 |
+---
+## Capability Sources
+| Source | Meaning | Examples |
+|--------|---------|---------|
+| `standard` | Derived from a named industry framework | DORA metrics, OpenTelemetry SDK, Google PRR checklist |
+| `practice` | Common industry practice found via web search or domain expertise | Structured logging with correlation IDs, multi-stage Docker builds |
+| `reasoning` | Derived from stack analysis — Atlas inferred this is relevant | "gRPC health checking protocol" inferred from gRPC + Kubernetes stack |
+---
+## Artifact Location
+- **Path:** `.gyre/capabilities.yaml` (relative to project root, or service root in monorepo)
+- **Caching:** The manifest file is the cache: re-runs load it and do not regenerate it unless explicitly requested (NFR10)
+- **Amendment persistence:** Coach writes amendments directly to this file via GC4
+---
+## Downstream Consumption
+| Consumer | Purpose |
+|----------|---------|
+| **Lens** (readiness-analyst) | Compares each capability against filesystem evidence to identify absences — what's missing, not just what's misconfigured |
+| **Coach** (review-coach) | Presents capabilities for user review — keep/remove/edit. Captures amendments written back via GC4 |
+---
+## Example
+```yaml
+---
+contract: GC2
+type: artifact
+source_agent: atlas
+source_workflow: model-generation
+target_agents: [lens, coach]
+input_artifacts: [GC1]
+created: 2026-03-22
+---
+gyre_manifest:
+  version: "1.0"
+  generated_at: "2026-03-22T14:30:00Z"
+  stack_summary: "Node.js Express web service on AWS EKS via GitHub Actions"
+  capability_count: 24            # example abridged: only 2 of the 24 capabilities are listed below
+  limited_coverage: false
+  capabilities:
+    - id: "structured-logging"
+      category: "observability"
+      name: "Structured JSON Logging"
+      description: "Application logs in structured JSON format with correlation IDs for request tracing. Essential for EKS workloads where CloudWatch Logs Insights or Elasticsearch are used for log analysis."
+      source: "practice"
+      relevance: "Node.js services on EKS need structured logs for CloudWatch Logs Insights queries and cross-service correlation."
+      amended: false
+      removed: false
+    - id: "health-check-liveness"
+      category: "observability"
+      name: "Kubernetes Liveness Probe"
+      description: "HTTP endpoint (typically /healthz) that Kubernetes uses to detect stuck containers and restart them. Must check application responsiveness, not downstream dependencies."
+      source: "standard"
+      relevance: "EKS requires liveness probes to auto-heal unresponsive pods. Express apps need a lightweight /healthz endpoint."
+      amended: false
+      removed: false
+  provenance:
+    standards_referenced: ["DORA", "OpenTelemetry", "Google PRR"]
+    web_search_performed: true
+    web_search_date: "2026-03-22"
+```
+---
+## Validation Rules
+A valid GC2 artifact must:
+1. Have all required frontmatter fields present and correctly typed
+2. Have all required body fields present and non-empty
+3. Contain NO source code, file contents, or secrets
+4. Have `version` as a string holding the schema version number (e.g., "1.0")
+5. Have each capability with all required fields present
+6. Have `category` as one of: "observability", "deployment", "reliability", "security"
+7. Have `source` as one of: "standard", "practice", "reasoning"
+8. Have unique `id` values across all capabilities
+9. Have `capability_count` matching the actual count of non-removed capabilities
+10. Have `limited_coverage` set to true if capability_count < 20
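
Rules 4 through 10 of the GC2 validation list above are mechanical enough to check in code. The following is a minimal Python sketch, assuming the `gyre_manifest` mapping has already been parsed into a plain dict (for example by a YAML loader). The function name `validate_gc2_body`, the constant names, and the error-message wording are hypothetical and are not part of the package.

```python
# Hypothetical helper: illustrative only, not shipped with convoke-agents.
ALLOWED_CATEGORIES = {"observability", "deployment", "reliability", "security"}
ALLOWED_SOURCES = {"standard", "practice", "reasoning"}
REQUIRED_CAPABILITY_FIELDS = (
    "id", "category", "name", "description",
    "source", "relevance", "amended", "removed",
)


def validate_gc2_body(manifest: dict) -> list[str]:
    """Return a list of rule violations for a parsed `gyre_manifest` mapping."""
    errors = []
    capabilities = manifest.get("capabilities", [])

    # Rule 4: version must be a string.
    if not isinstance(manifest.get("version"), str):
        errors.append("version must be a string")

    seen_ids = set()
    for cap in capabilities:
        cap_id = cap.get("id", "<missing id>")
        # Rule 5: every required capability field is present.
        for field in REQUIRED_CAPABILITY_FIELDS:
            if field not in cap:
                errors.append(f"{cap_id}: missing field {field}")
        # Rules 6 and 7: category and source use the enumerated values.
        if cap.get("category") not in ALLOWED_CATEGORIES:
            errors.append(f"{cap_id}: invalid category")
        if cap.get("source") not in ALLOWED_SOURCES:
            errors.append(f"{cap_id}: invalid source")
        # Rule 8: ids are unique across all capabilities.
        if cap_id in seen_ids:
            errors.append(f"{cap_id}: duplicate id")
        seen_ids.add(cap_id)

    # Rule 9: capability_count equals the number of non-removed capabilities.
    active = sum(1 for cap in capabilities if not cap.get("removed", False))
    if manifest.get("capability_count") != active:
        errors.append("capability_count does not match active capabilities")

    # Rule 10: limited_coverage must be true when capability_count < 20.
    count = manifest.get("capability_count")
    if isinstance(count, int) and count < 20 and manifest.get("limited_coverage") is not True:
        errors.append("limited_coverage must be true when capability_count < 20")

    return errors
```

Rules 1 to 3 (frontmatter typing, non-empty fields, and the ban on source code or secrets) are left out because they need either the frontmatter or a content heuristic. A capability carrying `source: "user-added"`, which GC4 introduces, would be reported under rule 7 as GC2 currently words it.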

package/_bmad/bme/_gyre/contracts/gc3-findings-report.md ADDED

@@ -0,0 +1,197 @@
+# GC3: Findings Report — Schema Definition
+> **Contract:** GC3 | **Type:** Artifact | **Flow:** Lens → Coach
+>
+> This schema defines the structure for the Findings Report produced by the gap-analysis workflow. Contains absence-based findings with severity, confidence, evidence summaries, and cross-domain compounds.
+## Frontmatter Schema
+```yaml
+---
+contract: GC3
+type: artifact
+source_agent: lens
+source_workflow: gap-analysis
+target_agents: [coach]
+input_artifacts: [GC2]
+created: YYYY-MM-DD
+---
+```
+### Frontmatter Field Reference
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `contract` | Yes | string | Always `GC3` |
+| `type` | Yes | string | Always `artifact` |
+| `source_agent` | Yes | string | Always `lens` |
+| `source_workflow` | Yes | string | Always `gap-analysis` |
+| `target_agents` | Yes | array | Agent IDs that consume this artifact: `[coach]` |
+| `input_artifacts` | Yes | array | `[GC2]` — requires Capabilities Manifest |
+| `created` | Yes | date | ISO date when artifact was created |
+---
+## Body Schema
+The Findings Report is written to `.gyre/findings.yaml` with the following structure:
+```yaml
+gyre_findings:
+  version: string                 # schema version, e.g., "1.0"
+  analyzed_at: ISO-8601           # analysis timestamp
+  mode: string                    # "crisis" | "anticipation"
+  stack_summary: string           # from GC2
+  summary:
+    blockers: integer
+    recommended: integer
+    nice_to_have: integer
+    total: integer
+    novelty_ratio: string         # e.g., "8 of 12 contextual"
+  findings:
+    - id: string                  # e.g., "OBS-001" or "DEP-003"
+      domain: string              # "observability" | "deployment"
+      severity: string            # "blocker" | "recommended" | "nice-to-have"
+      source: string              # "static-analysis" | "contextual-model"
+      confidence: string          # "high" | "medium" | "low"
+      capability_ref: string      # references GC2 capability ID
+      description: string         # what's missing and why it matters
+      evidence_summary: string    # what was searched and what was/wasn't found
+      severity_rationale: string  # why this severity level
+  compound_findings:
+    - id: string                  # e.g., "COMPOUND-001"
+      domain: "cross-domain"
+      severity: string            # "blocker" | "recommended" | "nice-to-have"
+      source: "contextual-model"
+      confidence: string          # lower of the two component confidences
+      capability_ref: string[]    # references 2 GC2 capability IDs
+      description: string
+      evidence_summary: string
+      related_findings: string[]  # references 2 finding IDs from different domains
+      combined_impact: string     # reasoning chain: why these together are worse
+  sanity_check:
+    passed: boolean
+    warnings: string[]            # e.g., ">80% capabilities flagged missing"
+```
+### Field Reference — Findings
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `id` | Yes | string | Unique finding ID: OBS-NNN or DEP-NNN |
+| `domain` | Yes | string | "observability" or "deployment" |
+| `severity` | Yes | string | "blocker", "recommended", or "nice-to-have" |
+| `source` | Yes | string | "static-analysis" (file evidence) or "contextual-model" (inferred) |
+| `confidence` | Yes | string | "high", "medium", or "low" |
+| `capability_ref` | Yes | string | Must reference a valid GC2 capability ID |
+| `description` | Yes | string | What's missing and why it matters |
+| `evidence_summary` | Yes | string | What was searched and what was/wasn't found |
+| `severity_rationale` | Yes | string | Why this severity level was assigned |
+### Field Reference — Compound Findings
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `id` | Yes | string | Unique compound ID: COMPOUND-NNN |
+| `domain` | Yes | string | Always "cross-domain" |
+| `severity` | Yes | string | "blocker", "recommended", or "nice-to-have"; may be higher than the severity of either component finding |
+| `source` | Yes | string | Always "contextual-model" |
+| `confidence` | Yes | string | Lower of the two component confidences |
+| `capability_ref` | Yes | array | Exactly 2 GC2 capability IDs |
+| `related_findings` | Yes | array | Exactly 2 finding IDs from different domains |
+| `combined_impact` | Yes | string | Reasoning chain explaining amplification |
+---
+## Artifact Location
+- **Path:** `.gyre/findings.yaml` (relative to project root, or service root in monorepo)
+- **Overwritten:** Each analysis run produces a fresh findings report
+- **Previous report:** Backed up to `.gyre/findings.previous.yaml` for delta-report comparison
+---
+## Downstream Consumption
+| Consumer | Purpose |
+|----------|---------|
+| **Coach** (review-coach) | Presents findings for user review, captures feedback on accuracy, guides severity adjustments |
+---
+## Example
+```yaml
+---
+contract: GC3
+type: artifact
+source_agent: lens
+source_workflow: gap-analysis
+target_agents: [coach]
+input_artifacts: [GC2]
+created: 2026-03-22
+---
+gyre_findings:
+  version: "1.0"
+  analyzed_at: "2026-03-22T15:00:00Z"
+  mode: "crisis"
+  stack_summary: "Node.js Express web service on AWS EKS via GitHub Actions"
+  summary:
+    blockers: 2
+    recommended: 5
+    nice_to_have: 3
+    total: 10                     # example abridged: not all findings are listed below
+    novelty_ratio: "7 of 10 contextual"
+  findings:
+    - id: "OBS-001"
+      domain: "observability"
+      severity: "blocker"
+      source: "static-analysis"
+      confidence: "high"
+      capability_ref: "health-check-liveness"
+      description: "No Kubernetes liveness probe detected. EKS pods will not be auto-healed when unresponsive."
+      evidence_summary: "Glob for **/health*, **/liveness*: no files found. Grep for 'healthz', 'liveness': no matches in source files. No HEALTHCHECK in Dockerfile."
+      severity_rationale: "EKS requires liveness probes for auto-healing. Without them, stuck pods persist until manual intervention."
+    - id: "DEP-003"
+      domain: "deployment"
+      severity: "recommended"
+      source: "static-analysis"
+      confidence: "medium"
+      capability_ref: "rollback-strategy"
+      description: "No rollback mechanism detected in deployment configuration."
+      evidence_summary: "Grep for 'rollback', 'revision', 'undo' in k8s/ and .github/workflows/: no matches. K8s deployments use default rolling update without explicit rollback steps."
+      severity_rationale: "Rolling updates without explicit rollback increase recovery time from failed deployments."
+  compound_findings:
+    - id: "COMPOUND-001"
+      domain: "cross-domain"
+      severity: "blocker"
+      source: "contextual-model"
+      confidence: "medium"        # lower of OBS-001 (high) and DEP-003 (medium)
+      capability_ref: ["health-check-liveness", "rollback-strategy"]
+      description: "No health checks combined with no rollback creates unrecoverable deployment failure risk."
+      evidence_summary: "Combines OBS-001 (no liveness probe) with DEP-003 (no rollback). Failed deployments cannot be detected by K8s and cannot be reversed."
+      related_findings: ["OBS-001", "DEP-003"]
+      combined_impact: "Without liveness probes, K8s cannot detect that a new deployment is unhealthy. Without rollback, the failed deployment persists. Together: a bad deploy goes undetected and unrecoverable until manual SSH intervention."
+  sanity_check:
+    passed: true
+    warnings: []
+```
+---
+## Validation Rules
+A valid GC3 artifact must:
+1. Have all required frontmatter fields present and correctly typed
+2. Have all required body fields present and non-empty
+3. Have every finding reference a valid `capability_ref` from GC2
+4. Have every compound reference exactly 2 findings from different domains
+5. Have compound `confidence` equal to the lower of component confidences
+6. Have no compounds where either component has confidence "low"
+7. Have finding IDs unique within the report
+8. Have `severity` as one of: "blocker", "recommended", "nice-to-have"
+9. Have `source` as one of: "static-analysis", "contextual-model"
+10. Have `confidence` as one of: "high", "medium", "low"
+11. Have summary counts matching actual finding counts
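
GC3 validation rules 4 to 6 constrain how a compound finding relates to its two component findings. The sketch below shows one way they could be checked in Python, assuming the findings have been parsed into dicts and indexed by `id`. The names `check_compound`, `CONFIDENCE_RANK`, and `findings_by_id` are hypothetical, and the code assumes every confidence value is already one of "high", "medium", or "low".

```python
# Hypothetical helper: illustrative only, not shipped with convoke-agents.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def check_compound(compound: dict, findings_by_id: dict) -> list[str]:
    """Check one compound finding against GC3 validation rules 4 to 6."""
    errors = []
    related = compound.get("related_findings", [])

    # Rule 4: exactly two related findings, both of which must exist.
    if len(related) != 2 or any(fid not in findings_by_id for fid in related):
        return ["compound must reference exactly 2 existing findings"]

    first, second = (findings_by_id[fid] for fid in related)

    # Rule 4 (continued): the two findings come from different domains.
    if first["domain"] == second["domain"]:
        errors.append("related findings must come from different domains")

    # Rule 6: neither component may have confidence "low".
    if "low" in (first["confidence"], second["confidence"]):
        errors.append("components with confidence 'low' cannot be compounded")

    # Rule 5: compound confidence is the lower of the two components.
    expected = min(
        first["confidence"], second["confidence"],
        key=CONFIDENCE_RANK.__getitem__,
    )
    if compound.get("confidence") != expected:
        errors.append(f"confidence should be {expected!r}")

    return errors
```

For components with confidences "high" and "medium", as with OBS-001 and DEP-003 in the example, the expected compound confidence under rule 5 is "medium".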

package/_bmad/bme/_gyre/contracts/gc4-feedback-loop.md ADDED

@@ -0,0 +1,209 @@
+# GC4: Feedback Loop — Schema Definition
+> **Contract:** GC4 | **Type:** Artifact | **Flow:** Coach → Atlas
+>
+> This schema defines the structure for the Feedback Loop produced by the model-review workflow. Contains two mechanisms: (1) amendment flags on capabilities in GC2, and (2) a standalone feedback file for missed-gap reports.
+## Overview
+GC4 is a dual-mechanism contract:
+1. **Amendment Flags** — stored directly on capabilities in `.gyre/capabilities.yaml` (GC2). Capabilities marked `removed: true` or `amended: true` persist across regeneration. Atlas reads these flags and respects them.
+2. **Feedback File** — stored in `.gyre/feedback.yaml`. Contains user-reported missed capabilities, missed gaps, and severity adjustments. Atlas uses these to inform future model generation.
+## Mechanism 1: Amendment Flags (in GC2)
+Amendments are stored as additional fields on capability entries in `.gyre/capabilities.yaml`:
+```yaml
+# Removed capability — excluded from future analysis
+- id: "obs-custom-metrics"
+  name: "Custom metrics collection"
+  category: "observability"
+  # ... standard fields ...
+  removed: true
+  removed_at: "2026-03-22"
+# Edited capability — persists across regeneration
+- id: "dep-rollback"
+  name: "Automated rollback"           # updated by user
+  category: "deployment"
+  description: "Canary-based rollback"  # updated by user
+  # ... standard fields ...
+  amended: true
+  amended_at: "2026-03-22"
+  original_name: "Rollback strategy"
+  original_description: "Mechanism to revert failed deployments"
+# User-added capability
+- id: "obs-custom-001"
+  name: "SLO dashboard"
+  category: "observability"
+  description: "Service Level Objective tracking dashboard with burn rate alerts"
+  source: "user-added"
+  amended: true
+  amended_at: "2026-03-22"
+```
+### Amendment Flag Reference
+| Field | Type | When Present | Description |
+|-------|------|-------------|-------------|
+| `removed` | boolean | Capability removed by user | Always `true` when present |
+| `removed_at` | date | With `removed` | ISO date of removal |
+| `amended` | boolean | Capability edited or added by user | Always `true` when present |
+| `amended_at` | date | With `amended` | ISO date of amendment |
+| `original_name` | string | Name was changed | Previous name value |
+| `original_description` | string | Description was changed | Previous description value |
+| `original_category` | string | Category was changed | Previous category value |
+| `source` | string | User-added capability | Set to `"user-added"` for new entries |
+### Amendment Frontmatter Fields (in GC2)
+When amendments are applied, the GC2 frontmatter is updated:
+```yaml
+---
+contract: GC2
+# ... standard GC2 fields ...
+last_reviewed: "2026-03-22"      # date of most recent review
+review_deferred: false            # true if user chose "later"
+amendment_count: 3                # total amendments applied
+---
+```
+---
+## Mechanism 2: Feedback File
+Feedback is stored in a standalone file at `.gyre/feedback.yaml`.
+### Frontmatter Schema
+```yaml
+---
+contract: GC4
+type: artifact
+source_agent: coach
+target_agents: [atlas]
+created: YYYY-MM-DD
+updated: YYYY-MM-DD
+---
+```
+### Frontmatter Field Reference
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `contract` | Yes | string | Always `GC4` |
+| `type` | Yes | string | Always `artifact` |
+| `source_agent` | Yes | string | Always `coach` |
+| `target_agents` | Yes | array | Agent IDs that consume this artifact: `[atlas]` |
+| `created` | Yes | date | ISO date when first feedback entry was written |
+| `updated` | Yes | date | ISO date of most recent feedback entry |
+### Body Schema
+```yaml
+feedback:
+  - timestamp: ISO-8601          # when the feedback was provided
+    reporter: string             # user name from config
+    type: string                 # "missed-capability" | "missed-gap" | "severity-adjustment" | "other"
+    description: string          # user's feedback in their own words
+    domain: string               # "observability" | "deployment" | "reliability" | "security" | "general"
+```
+### Feedback Entry Field Reference
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `timestamp` | Yes | string | ISO-8601 timestamp of when feedback was captured |
+| `reporter` | Yes | string | User name (from config.yaml `user_name`) |
+| `type` | Yes | string | One of: `missed-capability`, `missed-gap`, `severity-adjustment`, `other` |
+| `description` | Yes | string | User's feedback in their own words |
+| `domain` | Yes | string | One of: `observability`, `deployment`, `reliability`, `security`, `general` |
+---
+## Artifact Locations
+| File | Purpose | Updated By |
+|------|---------|------------|
+| `.gyre/capabilities.yaml` | Amendment flags on capabilities | Coach (step-03-apply-amendments) |
+| `.gyre/feedback.yaml` | Missed-gap reports and feedback | Coach (step-04-capture-feedback) |
+---
+## Downstream Consumption
+| Consumer | Mechanism | Purpose |
+|----------|-----------|---------|
+| **Atlas** (model-curator) | Amendment flags in GC2 | Respect `removed`/`amended` flags on regeneration — never re-add removed capabilities, preserve edits |
+| **Atlas** (model-curator) | Feedback file | Incorporate missed-capability and missed-gap feedback into future model generation |
+### Atlas Regeneration Rules
+When Atlas regenerates the model (model-generation workflow):
+1. Load existing `.gyre/capabilities.yaml` if present
+2. For each capability with `removed: true` — do NOT regenerate, preserve the removal entry
+3. For each capability with `amended: true` — preserve user's edits, do not overwrite
+4. For each `missed-capability` feedback entry — consider generating a matching capability
+5. For each `missed-gap` feedback entry — consider adding a relevant capability
+6. New capabilities from regeneration are added alongside preserved amendments
+---
+## Example
+### Feedback File Example
+```yaml
+---
+contract: GC4
+type: artifact
+source_agent: coach
+target_agents: [atlas]
+created: 2026-03-22
+updated: 2026-03-22
+---
+feedback:
+  - timestamp: "2026-03-22T15:30:00Z"
+    reporter: "Alice"
+    type: "missed-capability"
+    description: "We use SLO-based alerting with burn rate windows — the model should include SLO tracking as a capability"
+    domain: "observability"
+  - timestamp: "2026-03-22T15:32:00Z"
+    reporter: "Alice"
+    type: "severity-adjustment"
+    description: "OBS-003 (structured logging) should be a blocker, not recommended — we've had incidents from unstructured logs"
+    domain: "observability"
+  - timestamp: "2026-03-22T15:35:00Z"
+    reporter: "Alice"
+    type: "missed-gap"
+    description: "No finding about database connection pooling — we don't have it and it's causing issues under load"
+    domain: "reliability"
+```
+---
+## Validation Rules
+A valid GC4 feedback file must:
+1. Have all required frontmatter fields present and correctly typed
+2. Have every feedback entry with all required fields present and non-empty
+3. Have `type` as one of: `missed-capability`, `missed-gap`, `severity-adjustment`, `other`
+4. Have `domain` as one of: `observability`, `deployment`, `reliability`, `security`, `general`
+5. Have `timestamp` in ISO-8601 format
+6. Have `updated` date >= `created` date
+7. Not contain source code, file contents, or secrets (NFR9)
+Amendment flags in GC2 must:
+1. Have `removed_at` present when `removed: true`
+2. Have `amended_at` present when `amended: true`
+3. Have at least one `original_*` field when `amended: true` (unless `source: "user-added"`)
+4. Have unique capability IDs (user-added capabilities use `[category]-custom-NNN` pattern)
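
The Atlas Regeneration Rules in GC4 (rules 2, 3, and 6) amount to a merge between the existing manifest and a freshly generated capability list. A minimal Python sketch of that merge follows, assuming capabilities are plain dicts and that a regenerated capability corresponds to an existing one when their `id` values are equal. That matching rule and the function name `merge_regenerated` are assumptions made for illustration, not something the contract states.

```python
# Hypothetical helper: illustrative only, not shipped with convoke-agents.
def merge_regenerated(existing: list[dict], regenerated: list[dict]) -> list[dict]:
    """Merge freshly generated capabilities with user amendments (rules 2, 3, 6)."""
    # Entries the user removed or amended always survive regeneration.
    preserved = [
        cap for cap in existing
        if cap.get("removed") is True or cap.get("amended") is True
    ]
    preserved_ids = {cap["id"] for cap in preserved}

    # A regenerated capability is added only if it does not collide with a
    # preserved entry. This keeps removed capabilities from being re-added
    # and keeps user edits from being overwritten.
    fresh = [cap for cap in regenerated if cap["id"] not in preserved_ids]
    return preserved + fresh
```

Rules 4 and 5 (considering `missed-capability` and `missed-gap` feedback) are not covered, since the contract leaves that decision to Atlas's judgment.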