npm - convoke-agents - Versions diffs - 2.3.1 → 3.0.0 - Mend


This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (89)

package/_bmad/bme/_gyre/contracts/gc3-findings-report.md ADDED

@@ -0,0 +1,197 @@
+# GC3: Findings Report — Schema Definition
+> **Contract:** GC3 | **Type:** Artifact | **Flow:** Lens → Coach
+>
+> This schema defines the structure for the Findings Report produced by the gap-analysis workflow. The report contains absence-based findings (capabilities that are missing), each with a severity, confidence level, and evidence summary, plus cross-domain compound findings.
+## Frontmatter Schema
+```yaml
+---
+contract: GC3
+type: artifact
+source_agent: lens
+source_workflow: gap-analysis
+target_agents: [coach]
+input_artifacts: [GC2]
+created: YYYY-MM-DD
+---
+```
+### Frontmatter Field Reference
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `contract` | Yes | string | Always `GC3` |
+| `type` | Yes | string | Always `artifact` |
+| `source_agent` | Yes | string | Always `lens` |
+| `source_workflow` | Yes | string | Always `gap-analysis` |
+| `target_agents` | Yes | array | Agent IDs that consume this artifact: `[coach]` |
+| `input_artifacts` | Yes | array | `[GC2]` — requires Capabilities Manifest |
+| `created` | Yes | date | ISO date when artifact was created |
+---
+## Body Schema
+The Findings Report is written to `.gyre/findings.yaml` with the following structure:
+```yaml
+gyre_findings:
+  version: string                 # schema version, e.g., "1.0"
+  analyzed_at: ISO-8601           # analysis timestamp
+  mode: string                    # "crisis" | "anticipation"
+  stack_summary: string           # from GC2
+  summary:
+    blockers: integer
+    recommended: integer
+    nice_to_have: integer
+    total: integer
+    novelty_ratio: string         # e.g., "8 of 12 contextual"
+  findings:
+    - id: string                  # e.g., "OBS-001" or "DEP-003"
+      domain: string              # "observability" | "deployment"
+      severity: string            # "blocker" | "recommended" | "nice-to-have"
+      source: string              # "static-analysis" | "contextual-model"
+      confidence: string          # "high" | "medium" | "low"
+      capability_ref: string      # references GC2 capability ID
+      description: string         # what's missing and why it matters
+      evidence_summary: string    # what was searched and what was/wasn't found
+      severity_rationale: string  # why this severity level
+  compound_findings:
+    - id: string                  # e.g., "COMPOUND-001"
+      domain: "cross-domain"
+      severity: string            # "blocker" | "recommended" | "nice-to-have"
+      source: "contextual-model"
+      confidence: string          # lower of the two component confidences
+      capability_ref: string[]    # references 2 GC2 capability IDs
+      description: string
+      evidence_summary: string
+      related_findings: string[]  # references 2 finding IDs from different domains
+      combined_impact: string     # reasoning chain: why these together are worse
+  sanity_check:
+    passed: boolean
+    warnings: string[]            # e.g., ">80% capabilities flagged missing"
+```
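The `summary` and `sanity_check` blocks are derived from the finding lists. The following is a minimal Python sketch of how a producer might compute them, assuming the report body has already been parsed into plain dicts and lists. Two choices here are assumptions, not part of the contract. First, compound findings count toward the totals by default. Second, a capability counts as "flagged missing" when any finding's `capability_ref` points at it. The function names `build_summary` and `sanity_check` are hypothetical.

```python
from collections import Counter

# GC3 severity labels mapped to their keys in the `summary` block.
SEVERITY_KEYS = {
    "blocker": "blockers",
    "recommended": "recommended",
    "nice-to-have": "nice_to_have",
}


def build_summary(report: dict, include_compounds: bool = True) -> dict:
    """Derive `summary` from a parsed `gyre_findings` mapping.

    Assumption: compound findings count toward the totals; pass
    include_compounds=False to count single-domain findings only.
    """
    entries = list(report.get("findings", []))
    if include_compounds:
        entries += report.get("compound_findings", [])
    by_severity = Counter(entry["severity"] for entry in entries)
    contextual = sum(1 for entry in entries if entry["source"] == "contextual-model")
    # Counter returns 0 for severities that never occur.
    summary = {key: by_severity[label] for label, key in SEVERITY_KEYS.items()}
    summary["total"] = len(entries)
    summary["novelty_ratio"] = f"{contextual} of {len(entries)} contextual"
    return summary


def sanity_check(report: dict, capability_ids: list, threshold: float = 0.8) -> dict:
    """Warn when more than `threshold` of GC2 capabilities are flagged missing.

    Assumptions: a capability is "flagged" when at least one finding references
    it via `capability_ref`, and any warning makes the check fail.
    """
    all_ids = set(capability_ids)
    flagged = {f["capability_ref"] for f in report.get("findings", [])} & all_ids
    warnings = []
    if all_ids and len(flagged) / len(all_ids) > threshold:
        warnings.append(f">{threshold:.0%} capabilities flagged missing")
    return {"passed": not warnings, "warnings": warnings}
```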
+### Field Reference — Findings
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `id` | Yes | string | Unique finding ID: OBS-NNN or DEP-NNN |
+| `domain` | Yes | string | "observability" or "deployment" |
+| `severity` | Yes | string | "blocker", "recommended", or "nice-to-have" |
+| `source` | Yes | string | "static-analysis" (file evidence) or "contextual-model" (inferred) |
+| `confidence` | Yes | string | "high", "medium", or "low" |
+| `capability_ref` | Yes | string | Must reference a valid GC2 capability ID |
+| `description` | Yes | string | What's missing and why it matters |
+| `evidence_summary` | Yes | string | What was searched and what was/wasn't found |
+| `severity_rationale` | Yes | string | Why this severity level was assigned |
+### Field Reference — Compound Findings
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `id` | Yes | string | Unique compound ID: COMPOUND-NNN |
+| `domain` | Yes | string | Always "cross-domain" |
+| `severity` | Yes | string | "blocker", "recommended", or "nice-to-have"; may be higher than either component's severity |
+| `source` | Yes | string | Always "contextual-model" |
+| `confidence` | Yes | string | Lower of the two component confidences |
+| `capability_ref` | Yes | array | Exactly 2 GC2 capability IDs |
+| `related_findings` | Yes | array | Exactly 2 finding IDs from different domains |
+| `combined_impact` | Yes | string | Reasoning chain explaining amplification |
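The compound confidence rule is small enough to express directly: the lower component confidence wins, and the Validation Rules below exclude low-confidence components entirely. This is a hedged sketch; `compound_confidence` is a hypothetical helper, not part of Gyre.

```python
# Ordering of GC3 confidence levels, lowest first.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def compound_confidence(first: str, second: str) -> str:
    """Return the confidence a compound inherits from its two component findings."""
    for level in (first, second):
        if level not in CONFIDENCE_RANK:
            raise ValueError(f"unknown confidence level: {level!r}")
    if "low" in (first, second):
        # Low-confidence findings are never compounded.
        raise ValueError("a compound cannot include a low-confidence finding")
    # The weaker of the two components caps the compound's confidence.
    return min(first, second, key=CONFIDENCE_RANK.__getitem__)
```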
+---
+## Artifact Location
+- **Path:** `.gyre/findings.yaml` (relative to the project root, or to the service root in a monorepo)
+- **Overwritten:** Each analysis run produces a fresh findings report
+- **Previous report:** Backed up to `.gyre/findings.previous.yaml` for delta-report comparison
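The overwrite-with-backup behaviour described above could look like the following sketch. It assumes the caller has already rendered the full YAML text (frontmatter plus body). `write_findings` is a hypothetical name, and moving the old file aside with `Path.replace` is only one possible approach, not necessarily what Lens does.

```python
from pathlib import Path


def write_findings(project_root: Path, rendered_report: str) -> Path:
    """Write a fresh `.gyre/findings.yaml`, keeping the prior run as the previous report.

    `rendered_report` is assumed to be the complete YAML text (frontmatter + body).
    """
    gyre_dir = Path(project_root) / ".gyre"
    gyre_dir.mkdir(parents=True, exist_ok=True)
    current = gyre_dir / "findings.yaml"
    previous = gyre_dir / "findings.previous.yaml"
    if current.exists():
        # Move the last report aside (overwriting any older backup)
        # so a delta report can compare the two most recent runs.
        current.replace(previous)
    current.write_text(rendered_report, encoding="utf-8")
    return current
```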
+---
+## Downstream Consumption
+| Consumer | Purpose |
+|----------|---------|
+| **Coach** (review-coach) | Presents findings for user review, captures feedback on accuracy, guides severity adjustments |
+---
+## Example
+```yaml
+---
+contract: GC3
+type: artifact
+source_agent: lens
+source_workflow: gap-analysis
+target_agents: [coach]
+input_artifacts: [GC2]
+created: 2026-03-22
+---
+gyre_findings:
+  version: "1.0"
+  analyzed_at: "2026-03-22T15:00:00Z"
+  mode: "crisis"
+  stack_summary: "Node.js Express web service on AWS EKS via GitHub Actions"
+  summary:
+    blockers: 2
+    recommended: 5
+    nice_to_have: 3
+    total: 10
+    novelty_ratio: "7 of 10 contextual"
+  findings:
+    - id: "OBS-001"
+      domain: "observability"
+      severity: "blocker"
+      source: "static-analysis"
+      confidence: "high"
+      capability_ref: "health-check-liveness"
+      description: "No Kubernetes liveness probe detected. EKS pods will not be auto-healed when unresponsive."
+      evidence_summary: "Glob for **/health*, **/liveness*: no files found. Grep for 'healthz', 'liveness': no matches in source files. No HEALTHCHECK in Dockerfile."
+      severity_rationale: "EKS requires liveness probes for auto-healing. Without them, stuck pods persist until manual intervention."
+    - id: "DEP-003"
+      domain: "deployment"
+      severity: "recommended"
+      source: "static-analysis"
+      confidence: "medium"
+      capability_ref: "rollback-strategy"
+      description: "No rollback mechanism detected in deployment configuration."
+      evidence_summary: "Grep for 'rollback', 'revision', 'undo' in k8s/ and .github/workflows/: no matches. K8s deployments use default rolling update without explicit rollback steps."
+      severity_rationale: "Rolling updates without explicit rollback increase recovery time from failed deployments."
+    # ... remaining findings omitted for brevity (summary counts reflect the full report)
+  compound_findings:
+    - id: "COMPOUND-001"
+      domain: "cross-domain"
+      severity: "blocker"
+      source: "contextual-model"
+      confidence: "medium"        # lower of OBS-001 (high) and DEP-003 (medium)
+      capability_ref: ["health-check-liveness", "rollback-strategy"]
+      description: "No health checks combined with no rollback creates unrecoverable deployment failure risk."
+      evidence_summary: "Combines OBS-001 (no liveness probe) with DEP-003 (no rollback). Failed deployments cannot be detected by K8s and cannot be reversed."
+      related_findings: ["OBS-001", "DEP-003"]
+      combined_impact: "Without liveness probes, K8s cannot detect that a new deployment is unhealthy. Without rollback, the failed deployment persists. Together: a bad deploy goes undetected and unrecoverable until manual SSH intervention."
+  sanity_check:
+    passed: true
+    warnings: []
+```
+---
+## Validation Rules
+A valid GC3 artifact must:
+1. Have all required frontmatter fields present and correctly typed
+2. Have all required body fields present and non-empty
+3. Have every finding reference a valid `capability_ref` from GC2
+4. Have every compound reference exactly 2 findings from different domains
+5. Have compound `confidence` equal to the lower of component confidences
+6. Have no compounds where either component has confidence "low"
+7. Have finding IDs unique within the report
+8. Have `severity` as one of: "blocker", "recommended", "nice-to-have"
+9. Have `source` as one of: "static-analysis", "contextual-model"
+10. Have `confidence` as one of: "high", "medium", "low"
+11. Have summary counts matching actual finding counts
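Several of these rules can be checked mechanically. The sketch below is an illustrative validator for rules 3 through 10. It assumes the `gyre_findings` body is already parsed into dicts and that `capability_ids` holds the IDs from GC2. Frontmatter checks (rules 1 and 2) and summary counts (rule 11) are left out to keep it short. `validate_gc3` is a hypothetical helper.

```python
ALLOWED_VALUES = {
    "severity": {"blocker", "recommended", "nice-to-have"},
    "source": {"static-analysis", "contextual-model"},
    "confidence": {"high", "medium", "low"},
}
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def validate_gc3(report: dict, capability_ids: set) -> list:
    """Check GC3 body rules 3-10; return a list of violations (empty means valid)."""
    errors = []
    findings = report.get("findings", [])
    compounds = report.get("compound_findings", [])
    seen_ids = set()
    for entry in findings + compounds:
        entry_id = entry.get("id")
        if entry_id in seen_ids:
            errors.append(f"{entry_id}: duplicate id (rule 7)")
        seen_ids.add(entry_id)
        for field, allowed in ALLOWED_VALUES.items():
            if entry.get(field) not in allowed:
                errors.append(f"{entry_id}: invalid {field} {entry.get(field)!r} (rules 8-10)")
    for finding in findings:
        if finding.get("capability_ref") not in capability_ids:
            errors.append(f"{finding.get('id')}: unknown capability_ref (rule 3)")
    findings_by_id = {f.get("id"): f for f in findings}
    for compound in compounds:
        cid = compound.get("id")
        related = [findings_by_id.get(rid) for rid in compound.get("related_findings", [])]
        if len(related) != 2 or None in related:
            errors.append(f"{cid}: must reference exactly 2 existing findings (rule 4)")
            continue
        if related[0].get("domain") == related[1].get("domain"):
            errors.append(f"{cid}: related findings share a domain (rule 4)")
        levels = [r.get("confidence") for r in related]
        if any(level not in CONFIDENCE_RANK for level in levels):
            continue  # invalid levels were already reported above
        if "low" in levels:
            errors.append(f"{cid}: compounds a low-confidence finding (rule 6)")
        elif compound.get("confidence") != min(levels, key=CONFIDENCE_RANK.__getitem__):
            errors.append(f"{cid}: confidence must equal the lower component confidence (rule 5)")
    return errors
```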

package/_bmad/bme/_gyre/contracts/gc4-feedback-loop.md ADDED

@@ -0,0 +1,209 @@
+# GC4: Feedback Loop — Schema Definition
+> **Contract:** GC4 | **Type:** Artifact | **Flow:** Coach → Atlas
+>
+> This schema defines the structure for the Feedback Loop produced by the model-review workflow. It combines two mechanisms: (1) amendment flags on capabilities in GC2, and (2) a standalone feedback file for missed-gap reports.
+## Overview
+GC4 is a dual-mechanism contract:
+1. **Amendment Flags** — stored directly on capabilities in `.gyre/capabilities.yaml` (GC2). Capabilities marked `removed: true` or `amended: true` persist across regeneration. Atlas reads these flags and respects them.
+2. **Feedback File** — stored in `.gyre/feedback.yaml`. Contains user-reported missed capabilities, missed gaps, and severity adjustments. Atlas uses these to inform future model generation.
+## Mechanism 1: Amendment Flags (in GC2)
+Amendments are stored as additional fields on capability entries in `.gyre/capabilities.yaml`:
+```yaml
+# Removed capability — excluded from future analysis
+- id: "obs-custom-metrics"
+  name: "Custom metrics collection"
+  category: "observability"
+  # ... standard fields ...
+  removed: true
+  removed_at: "2026-03-22"
+# Edited capability — persists across regeneration
+- id: "dep-rollback"
+  name: "Automated rollback"           # updated by user
+  category: "deployment"
+  description: "Canary-based rollback"  # updated by user
+  # ... standard fields ...
+  amended: true
+  amended_at: "2026-03-22"
+  original_name: "Rollback strategy"
+  original_description: "Mechanism to revert failed deployments"
+# User-added capability
+- id: "obs-custom-001"
+  name: "SLO dashboard"
+  category: "observability"
+  description: "Service Level Objective tracking dashboard with burn rate alerts"
+  source: "user-added"
+  amended: true
+  amended_at: "2026-03-22"
+```
+### Amendment Flag Reference
+| Field | Type | When Present | Description |
+|-------|------|-------------|-------------|
+| `removed` | boolean | Capability removed by user | Always `true` when present |
+| `removed_at` | date | With `removed` | ISO date of removal |
+| `amended` | boolean | Capability edited or added by user | Always `true` when present |
+| `amended_at` | date | With `amended` | ISO date of amendment |
+| `original_name` | string | Name was changed | Previous name value |
+| `original_description` | string | Description was changed | Previous description value |
+| `original_category` | string | Category was changed | Previous category value |
+| `source` | string | User-added capability | Set to `"user-added"` for new entries |
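Applying an edit or a removal amounts to copying the capability and recording the flags from the table. This is a minimal sketch with hypothetical helper names. It makes two assumptions the contract does not state: only `name`, `description`, and `category` are user-editable (the fields with `original_*` counterparts), and repeated edits keep the first original value.

```python
from datetime import date

# Fields the flag reference tracks with an `original_*` counterpart.
TRACKED_FIELDS = ("name", "description", "category")


def amend_capability(capability: dict, changes: dict, on: date) -> dict:
    """Return a copy of `capability` with user edits and amendment flags applied."""
    updated = dict(capability)
    changed = False
    for field, new_value in changes.items():
        if field not in TRACKED_FIELDS:
            raise ValueError(f"{field!r} is not an editable field in this sketch")
        if updated.get(field) != new_value:
            # Assumption: on repeated edits, keep the very first original value.
            updated.setdefault(f"original_{field}", updated.get(field))
            updated[field] = new_value
            changed = True
    if changed:
        updated["amended"] = True
        updated["amended_at"] = on.isoformat()
    return updated


def remove_capability(capability: dict, on: date) -> dict:
    """Return a copy of `capability` flagged as removed by the user."""
    return {**capability, "removed": True, "removed_at": on.isoformat()}
```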
+### Amendment Frontmatter Fields (in GC2)
+When amendments are applied, the GC2 frontmatter is updated:
+```yaml
+---
+contract: GC2
+# ... standard GC2 fields ...
+last_reviewed: "2026-03-22"      # date of most recent review
+review_deferred: false            # true if user chose "later"
+amendment_count: 3                # total amendments applied
+---
+```
+---
+## Mechanism 2: Feedback File
+Feedback is stored in a standalone file at `.gyre/feedback.yaml`.
+### Frontmatter Schema
+```yaml
+---
+contract: GC4
+type: artifact
+source_agent: coach
+target_agents: [atlas]
+created: YYYY-MM-DD
+updated: YYYY-MM-DD
+---
+```
+### Frontmatter Field Reference
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `contract` | Yes | string | Always `GC4` |
+| `type` | Yes | string | Always `artifact` |
+| `source_agent` | Yes | string | Always `coach` |
+| `target_agents` | Yes | array | Agent IDs that consume this artifact: `[atlas]` |
+| `created` | Yes | date | ISO date when first feedback entry was written |
+| `updated` | Yes | date | ISO date of most recent feedback entry |
+### Body Schema
+```yaml
+feedback:
+  - timestamp: ISO-8601          # when the feedback was provided
+    reporter: string             # user name from config
+    type: string                 # "missed-capability" | "missed-gap" | "severity-adjustment" | "other"
+    description: string          # user's feedback in their own words
+    domain: string               # "observability" | "deployment" | "reliability" | "security" | "general"
+```
+### Feedback Entry Field Reference
+| Field | Required | Type | Description |
+|-------|----------|------|-------------|
+| `timestamp` | Yes | string | ISO-8601 timestamp of when feedback was captured |
+| `reporter` | Yes | string | User name (from config.yaml `user_name`) |
+| `type` | Yes | string | One of: `missed-capability`, `missed-gap`, `severity-adjustment`, `other` |
+| `description` | Yes | string | User's feedback in their own words |
+| `domain` | Yes | string | One of: `observability`, `deployment`, `reliability`, `security`, `general` |
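Capturing feedback appends one entry and moves the `updated` date; `created` is set only by the first entry. The sketch below assumes a hypothetical in-memory layout of `{"frontmatter": {...}, "feedback": [...]}` and UTC timestamps in the `YYYY-MM-DDTHH:MM:SSZ` form used in the example below. `append_feedback` is not a documented Coach function.

```python
from datetime import datetime, timezone
from typing import Optional

FEEDBACK_TYPES = {"missed-capability", "missed-gap", "severity-adjustment", "other"}
FEEDBACK_DOMAINS = {"observability", "deployment", "reliability", "security", "general"}


def append_feedback(doc: dict, reporter: str, kind: str, description: str,
                    domain: str, now: Optional[datetime] = None) -> dict:
    """Append one entry to an in-memory feedback file and update its dates."""
    if kind not in FEEDBACK_TYPES:
        raise ValueError(f"unknown feedback type: {kind!r}")
    if domain not in FEEDBACK_DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    today = moment.date().isoformat()
    front = doc.setdefault("frontmatter", {
        "contract": "GC4",
        "type": "artifact",
        "source_agent": "coach",
        "target_agents": ["atlas"],
    })
    front.setdefault("created", today)  # only the first entry sets `created`
    front["updated"] = today            # every entry moves `updated`
    doc.setdefault("feedback", []).append({
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "reporter": reporter,
        "type": kind,
        "description": description,
        "domain": domain,
    })
    return doc
```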
+---
+## Artifact Locations
+| File | Purpose | Updated By |
+|------|---------|------------|
+| `.gyre/capabilities.yaml` | Amendment flags on capabilities | Coach (step-03-apply-amendments) |
+| `.gyre/feedback.yaml` | Missed-gap reports and feedback | Coach (step-04-capture-feedback) |
+---
+## Downstream Consumption
+| Consumer | Mechanism | Purpose |
+|----------|-----------|---------|
+| **Atlas** (model-curator) | Amendment flags in GC2 | Respect `removed`/`amended` flags on regeneration — never re-add removed capabilities, preserve edits |
+| **Atlas** (model-curator) | Feedback file | Incorporate missed-capability and missed-gap feedback into future model generation |
+### Atlas Regeneration Rules
+When Atlas regenerates the model (model-generation workflow):
+1. Load existing `.gyre/capabilities.yaml` if present
+2. For each capability with `removed: true`, do NOT regenerate it; preserve the removal entry
+3. For each capability with `amended: true`, preserve the user's edits and do not overwrite them
+4. For each `missed-capability` feedback entry — consider generating a matching capability
+5. For each `missed-gap` feedback entry — consider adding a relevant capability
+6. New capabilities from regeneration are added alongside preserved amendments
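Rules 2, 3, and 6 describe a merge that can be sketched as follows. The sketch assumes regenerated capabilities reuse stable IDs, so matching by `id` identifies the same capability across runs. Rules 4 and 5 (acting on feedback) are judgement calls for the model and are not shown. `merge_regenerated` is a hypothetical name.

```python
def merge_regenerated(existing: list, generated: list) -> list:
    """Merge a fresh capability list with the user's preserved amendments.

    Assumes regenerated capabilities reuse stable IDs, so a match by `id`
    identifies the same capability across runs.
    """
    # Rules 2-3: removed and amended entries survive exactly as the user left them.
    preserved = [cap for cap in existing if cap.get("removed") or cap.get("amended")]
    preserved_ids = {cap["id"] for cap in preserved}
    # Rule 6: everything else comes from the new generation run.
    fresh = [cap for cap in generated if cap["id"] not in preserved_ids]
    return preserved + fresh
```

Unamended entries from the previous manifest are simply replaced by the new run in this sketch, which is why `obs-logging` disappears in the usage below if regeneration no longer produces it.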
+---
+## Example
+### Feedback File Example
+```yaml
+---
+contract: GC4
+type: artifact
+source_agent: coach
+target_agents: [atlas]
+created: 2026-03-22
+updated: 2026-03-22
+---
+feedback:
+  - timestamp: "2026-03-22T15:30:00Z"
+    reporter: "Alice"
+    type: "missed-capability"
+    description: "We use SLO-based alerting with burn rate windows — the model should include SLO tracking as a capability"
+    domain: "observability"
+  - timestamp: "2026-03-22T15:32:00Z"
+    reporter: "Alice"
+    type: "severity-adjustment"
+    description: "OBS-003 (structured logging) should be a blocker, not recommended — we've had incidents from unstructured logs"
+    domain: "observability"
+  - timestamp: "2026-03-22T15:35:00Z"
+    reporter: "Alice"
+    type: "missed-gap"
+    description: "No finding about database connection pooling — we don't have it and it's causing issues under load"
+    domain: "reliability"
+```
+---
+## Validation Rules
+A valid GC4 feedback file must:
+1. Have all required frontmatter fields present and correctly typed
+2. Have every feedback entry with all required fields present and non-empty
+3. Have `type` as one of: `missed-capability`, `missed-gap`, `severity-adjustment`, `other`
+4. Have `domain` as one of: `observability`, `deployment`, `reliability`, `security`, `general`
+5. Have `timestamp` in ISO-8601 format
+6. Have `updated` date >= `created` date
+7. Not contain source code, file contents, or secrets (NFR9)
+Amendment flags in GC2 must:
+1. Have `removed_at` present when `removed: true`
+2. Have `amended_at` present when `amended: true`
+3. Have at least one `original_*` field when `amended: true` (unless `source: "user-added"`)
+4. Have unique capability IDs (user-added capabilities follow the `[category]-custom-NNN` pattern using the short category prefix, e.g. `obs-custom-001`)
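Most of these rules are mechanical checks. The sketch below covers feedback-file rules 1 through 6 and amendment-flag rules 1 through 4. For rule 1 it checks presence only, not types. Rule 7 (no source code or secrets) still needs human review or dedicated tooling. The sketch assumes parsed dicts and reads the `[category]-custom-NNN` pattern as a lowercase prefix plus three digits; both are assumptions. The helper names are hypothetical.

```python
import re
from datetime import datetime

REQUIRED_FRONTMATTER = ("contract", "type", "source_agent", "target_agents", "created", "updated")
REQUIRED_ENTRY_FIELDS = ("timestamp", "reporter", "type", "description", "domain")
FEEDBACK_TYPES = {"missed-capability", "missed-gap", "severity-adjustment", "other"}
FEEDBACK_DOMAINS = {"observability", "deployment", "reliability", "security", "general"}
# Assumption: "[category]-custom-NNN" means a lowercase prefix and three digits.
CUSTOM_ID = re.compile(r"[a-z]+-custom-\d{3}")


def is_iso_timestamp(value) -> bool:
    """True if `value` parses as ISO-8601; "Z" is mapped to +00:00 for Python < 3.11."""
    try:
        datetime.fromisoformat(str(value).replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


def validate_feedback(doc: dict) -> list:
    """Feedback-file rules 1-6; rule 7 (no code or secrets) needs separate review."""
    front = doc.get("frontmatter", {})
    errors = [f"frontmatter missing {key}" for key in REQUIRED_FRONTMATTER if not front.get(key)]
    # str() lets parsed date objects and quoted YYYY-MM-DD strings compare alike.
    if front.get("created") and front.get("updated"):
        if str(front["updated"]) < str(front["created"]):
            errors.append("updated is earlier than created")
    for index, entry in enumerate(doc.get("feedback", [])):
        errors += [f"entry {index}: missing {key}" for key in REQUIRED_ENTRY_FIELDS if not entry.get(key)]
        if entry.get("type") not in FEEDBACK_TYPES:
            errors.append(f"entry {index}: invalid type {entry.get('type')!r}")
        if entry.get("domain") not in FEEDBACK_DOMAINS:
            errors.append(f"entry {index}: invalid domain {entry.get('domain')!r}")
        if not is_iso_timestamp(entry.get("timestamp")):
            errors.append(f"entry {index}: timestamp is not ISO-8601")
    return errors


def validate_amendments(capabilities: list) -> list:
    """Amendment-flag rules 1-4 for capability entries in GC2."""
    errors, seen = [], set()
    for cap in capabilities:
        cid = cap.get("id")
        if cid in seen:
            errors.append(f"{cid}: duplicate id")
        seen.add(cid)
        if cap.get("removed") and not cap.get("removed_at"):
            errors.append(f"{cid}: removed without removed_at")
        user_added = cap.get("source") == "user-added"
        if cap.get("amended"):
            if not cap.get("amended_at"):
                errors.append(f"{cid}: amended without amended_at")
            if not user_added and not any(key.startswith("original_") for key in cap):
                errors.append(f"{cid}: amended without any original_* field")
        if user_added and not CUSTOM_ID.fullmatch(str(cid)):
            errors.append(f"{cid}: user-added id does not match <prefix>-custom-NNN")
    return errors
```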

package/_bmad/bme/_gyre/guides/ATLAS-USER-GUIDE.md ADDED

@@ -0,0 +1,177 @@
+# Atlas 📐 User Guide
+**(model-curator.md)**
+- **Version:** 1.0.0
+- **Module:** Gyre Pattern (Production Readiness)
+- **Last Updated:** 2026-03-24
+---
+## Quick Start
+**Who is Atlas?** Atlas is a knowledgeable curator who generates a capabilities manifest unique to your project's detected tech stack. Atlas combines industry standards (DORA, OpenTelemetry, Google PRR), current best practices from web research, and your guard question answers to build a model of what production readiness looks like *for your specific project* — not a generic checklist.
+**When to use Atlas:**
+- Scout has produced a Stack Profile and you need a capabilities model
+- You want to regenerate the model after amending it through Coach
+- You need to validate model accuracy against a known stack
+**Atlas vs Other Gyre Agents:**
+| If you want to... | Use... |
+|---|---|
+| Know what tech stack a project uses | **Scout** 🔎 |
+| Generate a capabilities model for that stack | **Atlas** 📐 |
+| Find what's missing from the project | **Lens** 🔬 |
+| Review and refine the findings | **Coach** 🏋️ |
+**What you'll get:** A Capabilities Manifest (`.gyre/capabilities.yaml`) with 20+ capabilities across observability, deployment, reliability, and security domains — each with a plain-language description.
+---
+## How to Invoke
+**Claude Code (skills) — recommended:**
+```
+/bmad-agent-bme-model-curator
+```
+**Claude Code (terminal) / Other AI assistants:**
+```bash
+cat _bmad/bme/_gyre/agents/model-curator.md
+```
+**Claude.ai:**
+Open `_bmad/bme/_gyre/agents/model-curator.md` and paste its contents into your conversation.
+---
+## Menu Options
+| # | Code | Description |
+|---|------|-------------|
+| 1 | **MH** | Redisplay Menu Help |
+| 2 | **CH** | Chat with Atlas about capabilities and models |
+| 3 | **GM** | Generate Model — build a capabilities manifest from the Stack Profile |
+| 4 | **AV** | Accuracy Validation — validate model quality against test repos |
+| 5 | **FA** | Full Analysis — run the complete Gyre pipeline |
+| 6 | **PM** | Start Party Mode |
+| 7 | **DA** | Dismiss Agent |
+Select by number, code, or fuzzy text match.
+---
+## Workflows
+### [GM] Generate Model
+Loads the Stack Profile (GC1) and generates a contextual capabilities manifest tailored to your project's tech stack.
+- **Prerequisite:** `.gyre/stack-profile.yaml` must exist (run Scout first)
+- **Output:** `.gyre/capabilities.yaml` (GC2 contract)
+- **When to use:** After stack detection, or when regenerating the model
+**How Atlas generates capabilities:**
+1. **Load profile** — reads Stack Profile and checks for existing amendments from Coach (GC4)
+2. **Generate capabilities** — uses stack context + industry standards (DORA metrics, OpenTelemetry, Google PRR) to produce domain-specific capabilities
+3. **Web enrichment** — searches for current best practices relevant to your stack
+4. **Write manifest** — produces the final capabilities.yaml
+**Capability domains:** observability (6-10 capabilities), deployment (5-8), reliability (4-6), security (3-5).
+**Capability sources:**
+- **standard** — derived from industry standards (DORA, OTel, PRR)
+- **practice** — discovered via web search for current best practices
+- **reasoning** — inferred from stack context by Atlas
+**Limited coverage warning:** If fewer than 20 capabilities are generated, Atlas surfaces a warning and offers you the choice to continue or abort.
+**Amendment respect:** When regenerating, Atlas preserves your previous Coach amendments — removed capabilities stay removed, edited descriptions stay edited.
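The coverage expectations above can be checked after generation. This is a hedged sketch, not Atlas's implementation. It assumes that each capability carries a `category` field naming its domain (as in the GC4 contract examples) and that removed capabilities do not count. It treats the per-domain ranges as soft expectations that produce warnings rather than errors. `coverage_warnings` is a hypothetical name.

```python
from collections import Counter

# Per-domain ranges and the 20-capability floor quoted in this guide.
EXPECTED_RANGES = {
    "observability": (6, 10),
    "deployment": (5, 8),
    "reliability": (4, 6),
    "security": (3, 5),
}
MIN_TOTAL = 20


def coverage_warnings(capabilities: list) -> list:
    """Return coverage warnings for a generated capability list (empty = none).

    Assumptions: each capability has a `category` naming its domain, and
    capabilities flagged `removed` do not count toward coverage.
    """
    active = [cap for cap in capabilities if not cap.get("removed")]
    warnings = []
    if len(active) < MIN_TOTAL:
        warnings.append(f"limited coverage: {len(active)} capabilities (fewer than {MIN_TOTAL})")
    per_domain = Counter(cap.get("category") for cap in active)
    for domain, (low, high) in EXPECTED_RANGES.items():
        if not low <= per_domain[domain] <= high:
            warnings.append(f"{domain}: {per_domain[domain]} capabilities (expected {low}-{high})")
    return warnings
```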
+### [AV] Accuracy Validation
+Validates model quality by scoring capabilities against synthetic ground truth across multiple stack archetypes.
+- **Output:** Accuracy report with per-archetype scores
+- **When to use:** Quality gate before trusting the model for gap analysis
+**Scoring:**
+- 1.0 = Relevant (appropriate for this stack)
+- 0.5 = Partially relevant (tangential or poorly described)
+- 0.0 = Irrelevant (no relation to stack)
+**Gate:** Must achieve ≥70% accuracy across ≥3 stack archetypes. If any archetype falls below 70%, Atlas iterates.
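One way to express the gate in code is sketched below. It assumes an archetype's accuracy is the mean of its per-capability scores, since the guide does not define the aggregate. The helper names and the archetype names in the usage below are hypothetical.

```python
# Per-capability relevance scores as defined above.
RELEVANT, PARTIAL, IRRELEVANT = 1.0, 0.5, 0.0


def archetype_accuracy(scores: list) -> float:
    """Mean relevance score across one archetype's capabilities (assumed metric)."""
    if not scores:
        raise ValueError("an archetype needs at least one scored capability")
    return sum(scores) / len(scores)


def evaluate_gate(results: dict, threshold: float = 0.70, min_archetypes: int = 3):
    """Return (passed, failing_archetypes) for the >=70% across >=3 archetypes gate."""
    failing = sorted(name for name, scores in results.items()
                     if archetype_accuracy(scores) < threshold)
    passed = len(results) >= min_archetypes and not failing
    return passed, failing
```

Returning the failing archetype names, not just a boolean, gives the iteration step something concrete to work on.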
+### [FA] Full Analysis
+Same as Scout's Full Analysis — runs the complete pipeline. Atlas handles step 3 (model generation).
+---
+## Philosophy
+- **Context over generic** — Every capability is relevant to *this* stack. Atlas doesn't dump a universal checklist; it reasons about what matters for Node.js on Kubernetes differently than Python on ECS.
+- **Standards inform, they don't dictate** — DORA and OpenTelemetry are starting points, not requirements. Atlas adapts to your project's actual architecture.
+- **Transparency** — Each capability shows its source (standard, practice, or reasoning) so you know where it came from and can judge accordingly.
+- **Team ownership** — The capabilities manifest belongs to your team. Amendments persist. The model improves with use.
+---
+## Chat with Atlas
+Use **[CH]** to discuss model generation topics:
+- "Why did you include distributed tracing for my project?"
+- "What DORA metrics are relevant to my stack?"
+- "Can you explain the difference between standard and practice sources?"
+- "My project is a CLI tool, not a web service — should I regenerate?"
+---
+## Troubleshooting
+**"GC1 not found" error**
+Atlas requires a Stack Profile to generate the model. Run Scout's **[DS]** workflow first, then return to Atlas.
+**Model seems too generic**
+Check that Scout's guard answers are accurate — Atlas uses them to tune the model. If guard answers are wrong or missing, re-run Scout's **[DS]** with corrected answers, then regenerate.
+**Fewer than 20 capabilities generated**
+Atlas surfaces a limited-coverage warning. This can happen with unusual or very simple stacks. You can continue (the model is still useful) or abort and check if Scout's stack classification is accurate.
+**Amendments disappeared after regeneration**
+Atlas reads GC4 (Coach amendments) before regenerating. If amendments are missing, check that `.gyre/capabilities.yaml` contains the `amended` and `removed` flags from your last Coach review.
+---
+## Tips
+- **Don't skip Scout.** Atlas needs the Stack Profile as input. Running Atlas without it produces an error, not a generic model.
+- **Web search results are fresh.** Atlas searches for current-year best practices each time it generates a model, so there are no stale caches. This means models improve naturally over time.
+- **Regeneration preserves your work.** Coach amendments survive regeneration. You don't lose your customizations when the model updates.
+- **`capabilities.yaml` IS the cache.** In anticipation mode, Atlas's model is reused rather than regenerated. This is by design: the model is stable until you explicitly regenerate.
+---
+## Credits
+- **Agent:** Atlas (Model Curator)
+- **Module:** Gyre Pattern v1.0.0
+- **Pattern:** Convoke Agent Architecture
+---
+*Scout finds the facts. Atlas builds the model. Lens finds the gaps. Coach helps you act on them.*