npm - convoke-agents - Versions diffs - 2.4.0 → 3.0.0 - Mend

convoke-agents 2.4.0 → 3.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (67)

package/_bmad/bme/_gyre/workflows/accuracy-validation/steps/step-01-select-repos.md ADDED

@@ -0,0 +1,55 @@
+---
+step: 1
+workflow: accuracy-validation
+title: Select Ground Truth Repos
+---
+# Step 1: Select Ground Truth Repos
+Choose ≥3 synthetic ground truth repositories representing diverse stack archetypes.
+## MANDATORY EXECUTION RULES
+- Select repos that are well-known, publicly available, and representative of their archetype
+- Each repo must have a DIFFERENT primary language AND deployment pattern
+- Do NOT use toy projects — repos must have real production-grade structure
+## RECOMMENDED ARCHETYPES
+Select at least 3 from:
+| # | Archetype | Example Repo | Stack Signature |
+|---|-----------|-------------|-----------------|
+| 1 | Go microservice on K8s | CNCF project (e.g., CoreDNS, Prometheus) | Go + gRPC/REST + Kubernetes + Prometheus |
+| 2 | Node.js web service | Express/Fastify API with Docker | Node.js + REST + Docker + GitHub Actions |
+| 3 | Python data pipeline | Airflow DAG or FastAPI service | Python + Celery/Airflow + Docker Compose |
+| 4 | JVM enterprise service | Spring Boot with Gradle | Java + REST + Maven/Gradle + Jenkins |
+| 5 | Rust system service | Actix/Axum with monitoring | Rust + REST + Docker + CI/CD |
+## SELECTION PROCESS
+1. Present the archetype options to the user
+2. User selects ≥3 (or accepts the recommended set)
+3. For each selected archetype, confirm the specific repo or let Atlas use a synthetic profile
+**Synthetic profile option:** If specific repos are unavailable, Atlas can generate capabilities based on the archetype description alone (the Stack Profile simulates what Scout would produce). Note this in the results as "synthetic profile" vs "live repo".
+## OUTPUT
+Record selections as a table:
+```
+## Selected Archetypes
+| # | Archetype | Repo/Profile | Method |
+|---|-----------|-------------|--------|
+| 1 | [archetype] | [repo URL or "synthetic"] | [live scan / synthetic profile] |
+| 2 | [archetype] | [repo URL or "synthetic"] | [live scan / synthetic profile] |
+| 3 | [archetype] | [repo URL or "synthetic"] | [live scan / synthetic profile] |
+```
+---
+## NEXT STEP
+Load step: {project-root}/_bmad/bme/_gyre/workflows/accuracy-validation/steps/step-02-run-validation.md
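
The selection rules in step-01-select-repos.md (at least three archetypes, each with a different primary language and deployment pattern) can be checked mechanically. The following is a minimal sketch, not part of the package: the `Selection` type, its field names, and the idea of reading the deployment pattern off the "Stack Signature" column are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    """One row of the 'Selected Archetypes' table (hypothetical structure)."""
    archetype: str
    primary_language: str
    deployment_pattern: str
    method: str  # "live scan" or "synthetic profile"


def validate_selection(selections: list[Selection], minimum: int = 3) -> list[str]:
    """Return rule violations; an empty list means the selection is acceptable."""
    problems = []
    if len(selections) < minimum:
        problems.append(f"need at least {minimum} archetypes, got {len(selections)}")
    # Each repo must differ in BOTH primary language and deployment pattern.
    for field in ("primary_language", "deployment_pattern"):
        seen = {}
        for s in selections:
            value = getattr(s, field).lower()
            if value in seen:
                problems.append(f"{s.archetype} and {seen[value]} share {field} '{value}'")
            else:
                seen[value] = s.archetype
    return problems
```

An empty result means the table can be recorded as-is; any returned message points at the pair of archetypes that needs to be swapped out.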

package/_bmad/bme/_gyre/workflows/accuracy-validation/steps/step-02-run-validation.md ADDED

@@ -0,0 +1,78 @@
+---
+step: 2
+workflow: accuracy-validation
+title: Run Validation
+---
+# Step 2: Run Validation
+For each selected archetype, run stack detection + model generation and collect the capabilities manifest.
+## MANDATORY EXECUTION RULES
+- Process each archetype independently — do not carry context between runs
+- Record the exact Stack Profile (GC1) used for each archetype
+- Record the complete capabilities manifest (GC2) generated for each
+- If using live repos, run Scout's scan sequence from step-01-scan-filesystem.md
+- If using synthetic profiles, construct a GC1-compliant Stack Profile from the archetype description
+## EXECUTION SEQUENCE
+For each archetype:
+### 1. Produce Stack Profile
+**Live repo method:**
+- Run Scout's filesystem scan (step-01-scan-filesystem.md) against the repo
+- Classify the stack (step-02-classify-stack.md)
+- Skip guard questions (assume high confidence for validation purposes)
+- Record the resulting GC1 Stack Profile
+**Synthetic profile method:**
+- Construct a GC1-compliant stack profile from the archetype description:
+  ```yaml
+  stack_profile:
+    primary_language: "[from archetype]"
+    primary_framework: "[from archetype]"
+    secondary_stacks: []
+    container_orchestration: "[from archetype]"
+    ci_cd_platform: "[from archetype]"
+    observability_tooling: ["[from archetype]"]
+    cloud_provider: "[from archetype]"
+    communication_protocol: "[from archetype]"
+    detection_confidence: "high"
+    detection_summary: "[archetype description]"
+  ```
+### 2. Generate Capabilities Manifest
+- Run Atlas's model generation workflow against the Stack Profile
+- Record the complete capabilities list
+- Note: web search enrichment should be performed if available, skipped if not (record which)
+### 3. Record Results
+For each archetype, record:
+```
+## Archetype [N]: [Name]
+**Stack Profile:** [summary]
+**Method:** [live scan / synthetic profile]
+**Web search:** [performed / skipped]
+**Capabilities generated:** [count]
+**Limited coverage:** [yes/no]
+### Capabilities List
+| # | ID | Category | Name | Description |
+|---|-----|----------|------|-------------|
+| 1 | [id] | [category] | [name] | [description] |
+| ... | | | | |
+```
+---
+## NEXT STEP
+Load step: {project-root}/_bmad/bme/_gyre/workflows/accuracy-validation/steps/step-03-score-results.md
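
The synthetic profile method in step-02-run-validation.md amounts to filling the GC1 template from an archetype description. A rough sketch of that construction follows. The field names inside `stack_profile` come from the template in the diff; the function name, its parameters, and the `"unspecified"` placeholder for values the archetype does not state are assumptions.

```python
# Assumption: the source does not say how to mark values an archetype leaves open.
PLACEHOLDER = "unspecified"


def build_synthetic_profile(
    description: str,
    *,
    primary_language: str,
    primary_framework: str,
    container_orchestration: str = PLACEHOLDER,
    ci_cd_platform: str = PLACEHOLDER,
    observability_tooling: tuple[str, ...] = (),
    cloud_provider: str = PLACEHOLDER,
    communication_protocol: str = PLACEHOLDER,
) -> dict:
    """Build a GC1-shaped stack profile from an archetype description."""
    return {
        "stack_profile": {
            "primary_language": primary_language,
            "primary_framework": primary_framework,
            "secondary_stacks": [],
            "container_orchestration": container_orchestration,
            "ci_cd_platform": ci_cd_platform,
            "observability_tooling": list(observability_tooling),
            "cloud_provider": cloud_provider,
            "communication_protocol": communication_protocol,
            # Validation runs skip guard questions and assume high confidence.
            "detection_confidence": "high",
            "detection_summary": description,
        }
    }
```

Keyword-only parameters keep each call site readable next to the archetype table, and every profile produced this way should still be recorded as "synthetic profile" in the results.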

package/_bmad/bme/_gyre/workflows/accuracy-validation/steps/step-03-score-results.md ADDED

@@ -0,0 +1,143 @@
+---
+step: 3
+workflow: accuracy-validation
+title: Score Results
+---
+# Step 3: Score Results
+Score each capability for relevance and compute accuracy per archetype.
+## MANDATORY EXECUTION RULES
+- Score EVERY capability — do not skip any
+- Apply scoring criteria consistently across archetypes
+- Be honest about borderline cases — use 0.5, not 1.0, when uncertain
+- Document reasoning for any capability scored 0.0 (irrelevant)
+- The user makes the final call on disputed scores
+## SCORING CRITERIA
+For each capability, assign a relevance score:
+| Score | Label | Criteria |
+|-------|-------|----------|
+| **1.0** | Relevant | Would appear in a production readiness checklist for this stack, written by a domain expert. Clear, specific, and actionable. |
+| **0.5** | Partially relevant | Related to the stack's concerns but either: (a) too generic to be actionable, (b) specific to a different deployment model, or (c) relevant but poorly described |
+| **0.0** | Irrelevant | No meaningful relationship to this stack. Wrong technology, wrong domain, or nonsensical for the architecture. |
+### Scoring Guidelines
+**Score 1.0 when:**
+- The capability references technology actually in the stack (e.g., "Kubernetes liveness probes" for a K8s-deployed service)
+- The practice is industry-standard for this stack type (e.g., "structured logging" for any production service)
+- The description explains WHY it matters for THIS stack (not a generic definition)
+**Score 0.5 when:**
+- The capability is correct but the description is too generic (e.g., "monitoring" without specifying what to monitor)
+- The capability applies to a related but different deployment model (e.g., "ECS task health" for a K8s service)
+- The capability is relevant but at a different maturity level than the stack suggests
+**Score 0.0 when:**
+- The capability references technology not in the stack at all
+- The capability is a duplicate of another capability (higher-quality version gets the score)
+- The capability is meaningless or contradictory (e.g., "serverless cold start optimization" for a bare-metal service)
+## SCORING PROCESS
+Present each archetype's capabilities for scoring:
+```
+## Scoring: Archetype [N] — [Name]
+| # | Capability | Score | Reasoning |
+|---|-----------|:-----:|-----------|
+| 1 | [name]: [description summary] | [1.0/0.5/0.0] | [brief reasoning] |
+| 2 | ... | | |
+**Total capabilities:** [N]
+**Sum of scores:** [X]
+**Accuracy:** [X/N = Y%]
+```
+After scoring all archetypes, present the summary:
+```
+## Accuracy Validation Results
+| Archetype | Capabilities | Sum | Accuracy | Pass? |
+|-----------|:-----------:|:---:|:--------:|:-----:|
+| [Archetype 1] | [N] | [X] | [Y%] | ✓/✗ |
+| [Archetype 2] | [N] | [X] | [Y%] | ✓/✗ |
+| [Archetype 3] | [N] | [X] | [Y%] | ✓/✗ |
+**Overall:** [PASS / FAIL] — [lowest accuracy]% (gate: ≥70%)
+```
+## GATE DECISION
+### If PASS (all archetypes ≥70%):
+```
+✅ Model accuracy validated. Atlas is ready for production use.
+**Findings:**
+- Strongest archetype: [name] at [X%]
+- Weakest archetype: [name] at [X%]
+- Common issues: [brief summary of 0.0 and 0.5 patterns]
+**Recommendation:** Proceed to Epic 3 (Absence Detection).
+```
+### If FAIL (any archetype <70%):
+```
+❌ Model accuracy below threshold. Iteration required.
+**Failing archetypes:**
+- [name]: [X%] — [primary issue pattern]
+**Iteration guidance:**
+1. Review 0.0-scored capabilities — are they prompt issues or knowledge gaps?
+2. Review 0.5-scored capabilities — can descriptions be improved?
+3. Adjust Atlas's generation prompts (step-02-generate-capabilities.md)
+4. Re-run this validation workflow
+**BLOCKER:** Do not proceed to Epic 3 until ≥70% accuracy achieved across all archetypes.
+```
+## OUTPUT ARTIFACT
+Write the validation results to `_bmad-output/gyre-artifacts/accuracy-validation-[date].md` for team reference:
+```markdown
+# Gyre Model Accuracy Validation — [date]
+## Summary
+- **Result:** [PASS/FAIL]
+- **Archetypes tested:** [N]
+- **Accuracy range:** [lowest%] — [highest%]
+- **Gate threshold:** ≥70%
+## Detailed Scores
+[Full scoring tables from above]
+## Methodology Notes
+- [Live scan vs synthetic profiles used]
+- [Web search performed vs skipped]
+- [Any scoring disputes and resolutions]
+```
+---
+## Gyre Compass
+Based on what you just completed, here are your options:
+| If you want to... | Consider next... | Agent | Why |
+|---|---|---|---|
+| Iterate model prompts | model-generation | Atlas 📐 | Improve accuracy for failing archetypes |
+| Run full analysis pipeline | full-analysis | Scout 🔎 | Complete end-to-end with validated model |
+| Re-detect a different stack | stack-detection | Scout 🔎 | Test against a new archetype |
+> **Note:** These are recommendations. You can run any Gyre workflow at any time.
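
The scoring arithmetic in step-03-score-results.md is simple enough to sketch. The example below assumes scores arrive as plain lists of 1.0 / 0.5 / 0.0 per archetype; the function names and the data shape are illustrative, not something the package defines. It also shows why the gate is evaluated per archetype rather than on an average.

```python
ALLOWED_SCORES = {0.0, 0.5, 1.0}
GATE = 0.70  # gate threshold from the workflow: every archetype must reach 70%


def accuracy(scores: list[float]) -> float:
    """accuracy = sum_of_scores / total_capabilities."""
    if not scores:
        raise ValueError("no capabilities to score")
    invalid = [s for s in scores if s not in ALLOWED_SCORES]
    if invalid:
        raise ValueError(f"scores must be 1.0, 0.5 or 0.0, got {invalid}")
    return sum(scores) / len(scores)


def gate_decision(per_archetype: dict[str, list[float]]) -> tuple[bool, dict[str, float]]:
    """Pass only if the weakest archetype meets the gate (no averaging)."""
    results = {name: accuracy(scores) for name, scores in per_archetype.items()}
    passed = min(results.values()) >= GATE
    return passed, results


if __name__ == "__main__":
    passed, results = gate_decision({
        "Go microservice": [1.0, 1.0, 0.5, 0.5, 1.0],   # 4.0 / 5 = 80%
        "Node.js web service": [1.0, 0.5, 0.0, 1.0],    # 2.5 / 4 = 62.5%
    })
    for name, value in results.items():
        print(f"{name}: {value:.1%}")
    print("PASS" if passed else "FAIL")
```

In this made-up example the mean of the two accuracies is about 71%, yet the run fails because the Node.js archetype alone is below 70%.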

package/_bmad/bme/_gyre/workflows/accuracy-validation/workflow.md ADDED

@@ -0,0 +1,41 @@
+---
+name: accuracy-validation
+agent: model-curator
+title: Model Accuracy Validation
+description: Pre-pilot validation of model accuracy against synthetic ground truth repos — NFR19 ≥70% gate
+steps: 3
+---
+# Accuracy Validation Workflow
+Validates that Atlas can generate capabilities manifests that are ≥70% relevant across diverse stack archetypes. This is the **go/no-go gate** for the entire Gyre product (NFR19).
+## Purpose
+Model generation is the critical path for Gyre. If Atlas produces irrelevant capabilities, Lens will flag false positives and Coach will waste the user's time reviewing noise. This workflow provides a repeatable methodology to measure and iterate model quality.
+## Scoring Methodology
+Each generated capability is scored:
+| Score | Meaning | Criteria |
+|-------|---------|----------|
+| 1.0 | **Relevant** | Capability is appropriate for this stack archetype and would appear in a production readiness checklist written by an expert |
+| 0.5 | **Partially relevant** | Capability is tangentially related but not specific to this stack, OR is relevant but poorly described |
+| 0.0 | **Irrelevant** | Capability has no meaningful relationship to this stack archetype |
+**Accuracy formula:** `accuracy = sum_of_scores / total_capabilities`
+## Gate Criteria
+- Run against ≥3 stack archetypes (different language/framework/deployment combos)
+- Each archetype must individually reach ≥70% accuracy (per-archetype scores are not averaged)
+- If any archetype scores <70%, iterate Atlas prompts before proceeding to Epic 3
+## Instructions
+Load the first step to begin:
+```
+Load step: {project-root}/_bmad/bme/_gyre/workflows/accuracy-validation/steps/step-01-select-repos.md
+```

package/_bmad/bme/_gyre/workflows/delta-report/steps/step-01-load-history.md ADDED

@@ -0,0 +1,63 @@
+---
+step: 1
+workflow: delta-report
+title: Load History
+implements: Story 4.6 (FR39)
+---
+# Step 1: Load History
+Load current findings and previous findings for comparison.
+## MANDATORY EXECUTION RULES
+- Current findings (`.gyre/findings.yaml`) MUST exist — if missing, STOP
+- Previous findings (`.gyre/history.yaml`) are optional — first run has no history
+## EXECUTION
+### 1. Load Current Findings
+Read `.gyre/findings.yaml`. If missing:
+```
+❌ No current findings found at .gyre/findings.yaml
+Delta report requires a current analysis to compare against.
+Run gap-analysis first to generate findings, then run delta-report.
+```
+Then STOP — do not proceed.
+Parse the findings array and compound_findings array from GC3.
+### 2. Load Previous Findings
+Read `.gyre/history.yaml`. If missing:
+```
+💡 No previous findings found — this is your first delta-capable run.
+All current findings will be tagged [NEW]. After this report,
+your current findings will be saved as history for future comparison.
+```
+Set `first_run = true` and proceed with empty previous findings.
+If present, parse the findings array and compound_findings array.
+### 3. Validate Compatibility
+Check that both files have the same schema version. If versions differ:
+```
+⚠️ Schema version mismatch: current is v[X], history is v[Y].
+Proceeding with best-effort comparison — some findings may not correlate perfectly.
+```
+---
+## NEXT STEP
+Load step: {project-root}/_bmad/bme/_gyre/workflows/delta-report/steps/step-02-compute-delta.md

package/_bmad/bme/_gyre/workflows/delta-report/steps/step-02-compute-delta.md ADDED

@@ -0,0 +1,72 @@
+---
+step: 2
+workflow: delta-report
+title: Compute Delta
+implements: Story 4.6 (FR40)
+---
+# Step 2: Compute Delta
+Compare current findings against previous findings to classify each as new, carried-forward, or resolved.
+## MANDATORY EXECUTION RULES
+- Match findings by `capability_ref` (primary key) and `domain` (secondary key)
+- New = in current but not in previous
+- Carried-forward = in both current and previous (same capability_ref)
+- Resolved = in previous but not in current
+- Also compute compound finding delta using `related_findings` arrays
+## EXECUTION
+### 1. Build Finding Maps
+Create lookup maps from both finding sets:
+- **Current map:** `{ capability_ref → finding }` for all current findings
+- **Previous map:** `{ capability_ref → finding }` for all previous findings
+### 2. Classify Findings
+For each current finding:
+- If `capability_ref` exists in previous map → **CARRIED** (carried-forward)
+  - Note any severity changes: "was [old severity], now [new severity]"
+  - Note any confidence changes
+- If `capability_ref` NOT in previous map → **NEW**
+For each previous finding:
+- If `capability_ref` NOT in current map → **RESOLVED**
+### 3. Classify Compound Findings
+For each current compound:
+- If both `related_findings` IDs have carried-forward counterparts → **CARRIED**
+- Otherwise → **NEW**
+For each previous compound:
+- If either `related_findings` ID is resolved → **RESOLVED**
+### 4. Compute Summary Statistics
+```
+delta_summary:
+  new_findings: [count]
+  carried_forward: [count]
+  resolved: [count]
+  severity_changes: [count]
+  new_compounds: [count]
+  resolved_compounds: [count]
+  net_change: [current total - previous total] (positive = more findings)
+```
+### 5. First Run Handling
+If `first_run = true`:
+- All current findings are **NEW**
+- No carried-forward or resolved findings
+- No previous compounds
+---
+## NEXT STEP
+Load step: {project-root}/_bmad/bme/_gyre/workflows/delta-report/steps/step-03-present-delta.md
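
The classification in step-02-compute-delta.md is a set comparison keyed on `capability_ref`. One possible sketch is below. It assumes each finding is a mapping with `capability_ref`, `severity`, and an `id` that `related_findings` entries refer to within the same run; only `capability_ref`, `related_findings`, and the summary keys are named in the diff, so the rest is illustrative.

```python
def compute_delta(current, previous, current_compounds=(), previous_compounds=()):
    """Classify findings and compounds as new, carried-forward, or resolved."""
    current_map = {f["capability_ref"]: f for f in current}
    previous_map = {f["capability_ref"]: f for f in previous}

    new = [f for ref, f in current_map.items() if ref not in previous_map]
    carried = [f for ref, f in current_map.items() if ref in previous_map]
    resolved = [f for ref, f in previous_map.items() if ref not in current_map]

    # Severity changes are noted only on carried-forward findings.
    severity_changes = [
        (f["capability_ref"], previous_map[f["capability_ref"]].get("severity"), f.get("severity"))
        for f in carried
        if previous_map[f["capability_ref"]].get("severity") != f.get("severity")
    ]

    # Assumption: findings carry an "id" that related_findings refers to.
    carried_ids = {f.get("id") for f in carried}
    resolved_ids = {f.get("id") for f in resolved}
    new_compounds = [
        c for c in current_compounds
        if not all(i in carried_ids for i in c["related_findings"])
    ]
    resolved_compounds = [
        c for c in previous_compounds
        if any(i in resolved_ids for i in c["related_findings"])
    ]

    return {
        "new": new,
        "carried": carried,
        "resolved": resolved,
        "delta_summary": {
            "new_findings": len(new),
            "carried_forward": len(carried),
            "resolved": len(resolved),
            "severity_changes": len(severity_changes),
            "new_compounds": len(new_compounds),
            "resolved_compounds": len(resolved_compounds),
            "net_change": len(current) - len(previous),  # positive = more findings
        },
    }
```

Calling it with an empty `previous` list reproduces the first-run behaviour: everything lands in `new` and nothing is carried or resolved.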

package/_bmad/bme/_gyre/workflows/delta-report/steps/step-03-present-delta.md ADDED

@@ -0,0 +1,143 @@
+---
+step: 3
+workflow: delta-report
+title: Present Delta
+implements: Story 4.6 (FR41)
+---
+# Step 3: Present Delta
+Present the delta report and save current findings as history for next run.
+## MANDATORY EXECUTION RULES
+- Present delta with [NEW], [CARRIED] tags (FR41)
+- List resolved findings briefly
+- Note severity changes on carried-forward findings
+- Save current findings as history after presentation
+- Output must be copy-pasteable into Slack/Jira/docs (FR49)
+## CONVERSATIONAL PRESENTATION
+### 1. Delta Header
+```
+## Delta Report — [Crisis Mode / Anticipation Mode]
+**Comparing:** [current analyzed_at] vs [previous analyzed_at]
+```
+If first run:
+```
+## Delta Report — First Run (Baseline)
+**Baseline established:** [current analyzed_at]
+All findings are new — this report establishes your baseline for future comparison.
+```
+### 2. Delta Summary
+```
+| Status | Count |
+|--------|:-----:|
+| 🆕 New findings | [N] |
+| ➡️ Carried forward | [N] |
+| ✅ Resolved | [N] |
+| **Net change** | **[+/-N]** |
+```
+### 3. New Findings [NEW]
+```
+### 🆕 New Findings
+| # | [NEW] Finding | Severity | Confidence |
+|---|--------------|----------|:----------:|
+| [ID] | [description] | [severity] | [confidence] |
+```
+If no new findings:
+```
+### 🆕 New Findings
+No new findings since last analysis. ✅
+```
+### 4. Carried Forward [CARRIED]
+```
+### ➡️ Carried Forward
+| # | [CARRIED] Finding | Severity | Change |
+|---|-------------------|----------|--------|
+| [ID] | [description] | [severity] | [unchanged / was: old severity] |
+```
+If no carried-forward:
+```
+### ➡️ Carried Forward
+No carried-forward findings — all previous findings are resolved! 🎉
+```
+### 5. Resolved Findings
+```
+### ✅ Resolved
+These findings from your previous analysis were not found this time:
+| # | Finding | Was Severity |
+|---|---------|-------------|
+| [ID] | [description] | [severity] |
+```
+If no resolved:
+```
+### ✅ Resolved
+No findings were resolved since last analysis.
+```
+### 6. Compound Finding Changes
+If there are new or resolved compounds:
+```
+### ⚡ Compound Finding Changes
+**New compounds:**
+- [COMPOUND-NNN]: [description] (combines [ID] + [ID])
+**Resolved compounds:**
+- [COMPOUND-NNN]: [description] — no longer applies
+```
+### 7. Save History
+After presenting, save current findings as history:
+1. Copy `.gyre/findings.yaml` content to `.gyre/history.yaml`
+2. Confirm:
+```
+---
+Current findings saved as history for next delta comparison.
+Written to `.gyre/history.yaml`
+```
+### 8. Gyre Compass
+```
+## What's Next?
+| If you want to... | Consider next... | Agent | Why |
+|---|---|---|---|
+| Review and customize findings | model-review | Coach 🏋️ | Walk through findings, amend capabilities |
+| Re-run analysis to check progress | gap-analysis | Lens 🔬 | See if fixes resolved findings |
+| Regenerate the model | model-generation | Atlas 📐 | Model may need adjustment |
+| Run the full pipeline | full-analysis | Scout 🔎 | Complete end-to-end analysis |
+| Share progress with your team | — | — | Commit .gyre/ directory to your repository |
+> **Note:** These are recommendations. You can run any Gyre workflow at any time.
+```
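
The "Delta Summary" table in step-03-present-delta.md can be rendered from the counts computed in the previous step. This is only a sketch of one way to produce copy-pasteable Markdown; the function name and signature are assumptions, while the row labels follow the template in the diff.

```python
def render_delta_summary(new: int, carried: int, resolved: int, net_change: int) -> str:
    """Render the delta summary as a Markdown table with a signed net change."""
    rows = [
        "| Status | Count |",
        "|--------|:-----:|",
        f"| 🆕 New findings | {new} |",
        f"| ➡️ Carried forward | {carried} |",
        f"| ✅ Resolved | {resolved} |",
        # The "+d" format always shows the sign, so growth and shrinkage are explicit.
        f"| **Net change** | **{net_change:+d}** |",
    ]
    return "\n".join(rows)


if __name__ == "__main__":
    print(render_delta_summary(new=2, carried=3, resolved=1, net_change=1))
```

Plain pipe tables like this paste cleanly into most Markdown-aware tools, which is the property FR49 asks for.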

package/_bmad/bme/_gyre/workflows/delta-report/workflow.md ADDED

@@ -0,0 +1,34 @@
+---
+name: delta-report
+agent: readiness-analyst
+title: Delta Report
+description: Compare current findings against previous run — track progress over time
+steps: 3
+implements: Epic 4 (Story 4.6)
+---
+# Delta Report Workflow
+Compares current `.gyre/findings.yaml` against `.gyre/history.yaml` to show what changed: new findings, carried-forward findings, and resolved gaps.
+## Prerequisites
+- GC3 (Findings Report) at `.gyre/findings.yaml` — current analysis results
+- `.gyre/history.yaml` — previous findings (saved automatically after each delta report)
+## Pipeline
+| Step | File | Action |
+|------|------|--------|
+| 1 | step-01-load-history.md | Load current and previous findings |
+| 2 | step-02-compute-delta.md | Compute: new findings, carried-forward, resolved |
+| 3 | step-03-present-delta.md | Present delta with [NEW], [CARRIED], resolved list |
+## First Run Behavior
+If no `.gyre/history.yaml` exists, this is the first delta-capable run. All current findings are tagged [NEW], and the current findings are saved as history for future comparison.
+## Error Recovery
+- If `.gyre/findings.yaml` missing: inform user to run gap-analysis first
+- If `.gyre/history.yaml` missing: treat as first run — all findings are [NEW]

package/_bmad/bme/_gyre/workflows/full-analysis/steps/step-01-initialize.md ADDED

@@ -0,0 +1,68 @@
+---
+step: 1
+workflow: full-analysis
+title: Initialize Analysis
+---
+# Step 1: Initialize Analysis
+Set up the `.gyre/` directory and detect the analysis mode.
+## MANDATORY EXECUTION RULES
+- Use Claude Code tools (Glob, Read) — do NOT ask the user for filesystem information
+- Create `.gyre/` at the project root (or service root in monorepo) if it doesn't exist
+- Detect mode automatically — only ask the user if regeneration intent is ambiguous
+## INITIALIZATION SEQUENCE
+### 1. Check for Existing `.gyre/` Directory
+**Glob** for `.gyre/` at the project root:
+- `.gyre/stack-profile.yaml` → GC1 exists (previous detection)
+- `.gyre/capabilities.yaml` → GC2 exists (previous model generation)
+- `.gyre/findings.yaml` → GC3 exists (previous analysis)
+- `.gyre/.lock` → Concurrent analysis guard (NFR13)
+### 2. Detect Analysis Mode
+Based on what exists in `.gyre/`:
+| Condition | Mode | Behavior |
+|-----------|------|----------|
+| `.gyre/` does not exist | **Crisis** | First run — create directory, run full pipeline |
+| `.gyre/capabilities.yaml` exists | **Anticipation** | Re-run — skip model generation (step 3), use cached model |
+| User explicitly says "regenerate" or "fresh" | **Regeneration** | Fresh model — ignore cache, run full pipeline |
+### 3. Lock File Check (NFR13)
+If `.gyre/.lock` exists:
+- Read the lock file for timestamp and process info
+- If older than 5 minutes: warn the user, suggest removing it
+- If recent: inform the user another analysis may be in progress, ask to proceed or wait
+### 4. Create `.gyre/` Directory
+If `.gyre/` doesn't exist, create it.
+### 5. Report Initialization
+Present the initialization status conversationally:
+```
+## Analysis Initialized
+**Mode:** [Crisis / Anticipation / Regeneration]
+**Directory:** .gyre/ [created / already exists]
+**Existing artifacts:** [list any found, or "none — first run"]
+Starting stack detection...
+```
+---
+## NEXT STEP
+Proceed to stack detection:
+Load step: {project-root}/_bmad/bme/_gyre/workflows/full-analysis/steps/step-02-detect-stack.md