npm - @crewpilot/agent - Versions diffs - 2.0.0 → 3.0.0 - Mend

@crewpilot/agent 2.0.0 → 3.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

package/README.md +131 -131
package/dist-npm/cli.js +5 -5
package/dist-npm/index.js +100 -100
package/package.json +69 -69
package/prompts/agent.md +282 -282
package/prompts/copilot-instructions.md +36 -36
package/prompts/{catalyst.config.json → crewpilot.config.json} +72 -72
package/prompts/skills/assure-code-quality/SKILL.md +112 -112
package/prompts/skills/assure-pr-intelligence/SKILL.md +148 -148
package/prompts/skills/assure-review-functional/SKILL.md +114 -114
package/prompts/skills/assure-review-standards/SKILL.md +106 -106
package/prompts/skills/assure-threat-model/SKILL.md +182 -182
package/prompts/skills/assure-vulnerability-scan/SKILL.md +146 -146
package/prompts/skills/autopilot-meeting/SKILL.md +434 -434
package/prompts/skills/autopilot-worker/SKILL.md +737 -737
package/prompts/skills/daily-digest/SKILL.md +188 -188
package/prompts/skills/deliver-change-management/SKILL.md +132 -132
package/prompts/skills/deliver-deploy-guard/SKILL.md +144 -144
package/prompts/skills/deliver-doc-governance/SKILL.md +130 -130
package/prompts/skills/engineer-feature-builder/SKILL.md +270 -270
package/prompts/skills/engineer-root-cause-analysis/SKILL.md +150 -150
package/prompts/skills/engineer-test-first/SKILL.md +148 -148
package/prompts/skills/insights-knowledge-base/SKILL.md +202 -202
package/prompts/skills/insights-pattern-detection/SKILL.md +142 -142
package/prompts/skills/strategize-architecture-planner/SKILL.md +141 -141
package/prompts/skills/strategize-solution-design/SKILL.md +118 -118
package/scripts/postinstall.js +108 -108

package/prompts/skills/assure-pr-intelligence/SKILL.md CHANGED

@@ -1,148 +1,148 @@
-# PR Intelligence
-> **Pillar**: Assure | **ID**: `assure-pr-intelligence`
-## Purpose
-Automated pull request analysis — generates structured summaries, risk assessments, reviewer guidance, and change impact analysis. Turns PRs from walls of diff into clear narratives.
-## Activation Triggers
-- "review this PR", "summarize PR", "PR summary", "pull request review"
-- "what changed in this PR", "is this PR safe to merge"
-- When a PR URL or branch diff is provided
-## Methodology
-### Process Flow
-```dot
-digraph pr_intelligence {
-    rankdir=LR;
-    node [shape=box];
-    inventory [label="Phase 1\nChange Inventory"];
-    narrative [label="Phase 2\nNarrative Summary"];
-    risk [label="Phase 3\nRisk Assessment"];
-    guidance [label="Phase 4\nReviewer Guidance"];
-    checklist [label="Phase 5\nMerge Readiness", shape=doublecircle];
-    inventory -> narrative;
-    narrative -> risk;
-    risk -> guidance;
-    guidance -> checklist;
-}
-```
-### Phase 0 — Acceptance Criteria Verification
-1. Fetch the linked issue/task (via `catalyst_board_get` or the PR description's `Closes #N`)
-2. Extract the acceptance criteria checklist from the issue description
-3. For each criterion, verify whether the PR's changes satisfy it:
-   - **Met** — Code changes clearly implement the criterion
-   - **Partially met** — Some work done but incomplete
-   - **Not met** — No evidence of this criterion in the diff
-4. Any **Not met** criteria are automatic blockers — the PR cannot be approved
-5. Include the acceptance criteria verdict in the output:
-   ```
-   ### Acceptance Criteria
-   - [x] Criterion 1 — Met (implemented in file.py)
-   - [ ] Criterion 2 — Not met (missing from PR)
-   - [~] Criterion 3 — Partially met (needs X)
-   ```
-### Phase 1 — Change Inventory
-1. Get the diff (via git or GitHub API)
-2. Categorize files changed:
-   - `core` — Business logic, domain models
-   - `api` — Endpoint changes, route modifications
-   - `infra` — CI/CD, Dockerfiles, IaC
-   - `test` — Test files
-   - `config` — Configuration, env vars
-   - `docs` — Documentation
-3. Calculate change metrics: files changed, lines added/removed, churn
-4. Identify new vs. modified vs. deleted files
-### Phase 2 — Narrative Summary
-Generate a human-readable summary:
-1. **What**: One paragraph explaining what the PR accomplishes
-2. **Why**: Inferred motivation (from commit messages, PR description, code context)
-3. **How**: Key implementation decisions and patterns used
-### Phase 3 — Risk Assessment
-Evaluate each risk dimension:
-| Dimension | Low | Medium | High |
-|---|---|---|---|
-| **Scope** | < 50 lines, 1-2 files | 50-200 lines, 3-5 files | > 200 lines or > 5 files |
-| **Complexity** | Simple refactors | New logic paths | Algorithm/architecture changes |
-| **Blast radius** | Isolated module | Shared utilities | Core framework, DB schema |
-| **Test coverage** | Well-tested changes | Partial coverage | No tests for new code |
-| **Reversibility** | Feature flag or easy revert | Rollback possible | DB migration, API contract |
-Produce overall risk score: **Low / Medium / High / Critical**
-### Phase 4 — Reviewer Guidance
-1. List files to review first (highest risk → lowest)
-2. Call out specific lines that need careful attention
-3. Suggest questions the reviewer should ask
-4. Identify what's NOT in the PR that probably should be (missing tests, missing docs, missing migration)
-### Phase 5 — Merge Readiness Checklist
-- [ ] Tests pass / test coverage adequate
-- [ ] No security findings above medium
-- [ ] Breaking changes documented
-- [ ] PR description matches actual changes
-- [ ] Dependencies updated safely
-## Tools Required
-- `githubRepo` — Fetch PR details, diff, commit history
-- `codebase` — Understand impacted areas in the broader codebase
-- `catalyst_git_diff` — Get precise diff data
-- `catalyst_git_log` — Understand commit narrative
-## Output Format
-```
-## [Catalyst → PR Intelligence]
-### Summary
-**What**: {one paragraph}
-**Why**: {motivation}
-**How**: {key decisions}
-### Change Inventory
-| Category | Files | Lines (+/-) |
-|---|---|---|
-| core | | |
-| test | | |
-| ... | | |
-### Risk Assessment: {Low/Medium/High/Critical}
-{risk table with evaluations}
-### Review Guide
-**Start with**: {ordered file list}
-**Pay attention to**:
-- {file}:{line} — {why}
-- ...
-**Missing from PR**:
-- {what's absent}
-### Merge Readiness
-{checklist with status}
-```
-## Chains To
-- `code-quality` — Deep review of flagged files
-- `vulnerability-scan` — If risk assessment flags security-adjacent changes
-- `change-management` — Verify commit message quality
-## Anti-Patterns
-- Do NOT rubber-stamp — always identify at least one concern or question
-- Do NOT summarize the diff line-by-line — synthesize into a narrative
-- Do NOT skip risk assessment for "small" PRs — small and dangerous is common
-- Do NOT ignore test absence — explicitly call it out
+# PR Intelligence
+> **Pillar**: Assure | **ID**: `assure-pr-intelligence`
+## Purpose
+Automated pull request analysis — generates structured summaries, risk assessments, reviewer guidance, and change impact analysis. Turns PRs from walls of diff into clear narratives.
+## Activation Triggers
+- "review this PR", "summarize PR", "PR summary", "pull request review"
+- "what changed in this PR", "is this PR safe to merge"
+- When a PR URL or branch diff is provided
+## Methodology
+### Process Flow
+```dot
+digraph pr_intelligence {
+    rankdir=LR;
+    node [shape=box];
+    inventory [label="Phase 1\nChange Inventory"];
+    narrative [label="Phase 2\nNarrative Summary"];
+    risk [label="Phase 3\nRisk Assessment"];
+    guidance [label="Phase 4\nReviewer Guidance"];
+    checklist [label="Phase 5\nMerge Readiness", shape=doublecircle];
+    inventory -> narrative;
+    narrative -> risk;
+    risk -> guidance;
+    guidance -> checklist;
+}
+```
+### Phase 0 — Acceptance Criteria Verification
+1. Fetch the linked issue/task (via `crewpilot_board_get` or the PR description's `Closes #N`)
+2. Extract the acceptance criteria checklist from the issue description
+3. For each criterion, verify whether the PR's changes satisfy it:
+   - **Met** — Code changes clearly implement the criterion
+   - **Partially met** — Some work done but incomplete
+   - **Not met** — No evidence of this criterion in the diff
+4. Any **Not met** criteria are automatic blockers — the PR cannot be approved
+5. Include the acceptance criteria verdict in the output:
+   ```
+   ### Acceptance Criteria
+   - [x] Criterion 1 — Met (implemented in file.py)
+   - [ ] Criterion 2 — Not met (missing from PR)
+   - [~] Criterion 3 — Partially met (needs X)
+   ```
+### Phase 1 — Change Inventory
+1. Get the diff (via git or GitHub API)
+2. Categorize files changed:
+   - `core` — Business logic, domain models
+   - `api` — Endpoint changes, route modifications
+   - `infra` — CI/CD, Dockerfiles, IaC
+   - `test` — Test files
+   - `config` — Configuration, env vars
+   - `docs` — Documentation
+3. Calculate change metrics: files changed, lines added/removed, churn
+4. Identify new vs. modified vs. deleted files
+### Phase 2 — Narrative Summary
+Generate a human-readable summary:
+1. **What**: One paragraph explaining what the PR accomplishes
+2. **Why**: Inferred motivation (from commit messages, PR description, code context)
+3. **How**: Key implementation decisions and patterns used
+### Phase 3 — Risk Assessment
+Evaluate each risk dimension:
+| Dimension | Low | Medium | High |
+|---|---|---|---|
+| **Scope** | < 50 lines, 1-2 files | 50-200 lines, 3-5 files | > 200 lines or > 5 files |
+| **Complexity** | Simple refactors | New logic paths | Algorithm/architecture changes |
+| **Blast radius** | Isolated module | Shared utilities | Core framework, DB schema |
+| **Test coverage** | Well-tested changes | Partial coverage | No tests for new code |
+| **Reversibility** | Feature flag or easy revert | Rollback possible | DB migration, API contract |
+Produce overall risk score: **Low / Medium / High / Critical**
+### Phase 4 — Reviewer Guidance
+1. List files to review first (highest risk → lowest)
+2. Call out specific lines that need careful attention
+3. Suggest questions the reviewer should ask
+4. Identify what's NOT in the PR that probably should be (missing tests, missing docs, missing migration)
+### Phase 5 — Merge Readiness Checklist
+- [ ] Tests pass / test coverage adequate
+- [ ] No security findings above medium
+- [ ] Breaking changes documented
+- [ ] PR description matches actual changes
+- [ ] Dependencies updated safely
+## Tools Required
+- `githubRepo` — Fetch PR details, diff, commit history
+- `codebase` — Understand impacted areas in the broader codebase
+- `crewpilot_git_diff` — Get precise diff data
+- `crewpilot_git_log` — Understand commit narrative
+## Output Format
+```
+## [CrewPilot → PR Intelligence]
+### Summary
+**What**: {one paragraph}
+**Why**: {motivation}
+**How**: {key decisions}
+### Change Inventory
+| Category | Files | Lines (+/-) |
+|---|---|---|
+| core | | |
+| test | | |
+| ... | | |
+### Risk Assessment: {Low/Medium/High/Critical}
+{risk table with evaluations}
+### Review Guide
+**Start with**: {ordered file list}
+**Pay attention to**:
+- {file}:{line} — {why}
+- ...
+**Missing from PR**:
+- {what's absent}
+### Merge Readiness
+{checklist with status}
+```
+## Chains To
+- `code-quality` — Deep review of flagged files
+- `vulnerability-scan` — If risk assessment flags security-adjacent changes
+- `change-management` — Verify commit message quality
+## Anti-Patterns
+- Do NOT rubber-stamp — always identify at least one concern or question
+- Do NOT summarize the diff line-by-line — synthesize into a narrative
+- Do NOT skip risk assessment for "small" PRs — small and dangerous is common
+- Do NOT ignore test absence — explicitly call it out
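
The Scope row of the Phase 3 risk table in this skill file can be read as a small classifier over line and file counts. The following is a minimal Python sketch of that reading, not code from the package. The function name `scope_risk` is hypothetical, and the rule for combining the two measures (take the more severe level when lines and files disagree) is an assumption, since the table does not say how to resolve that case.

```python
# Illustrative only: thresholds are taken from the Scope row of the Phase 3
# table; the names and the tie-breaking rule are assumptions.
LEVELS = ["Low", "Medium", "High"]


def scope_risk(lines_changed: int, files_changed: int) -> str:
    """Classify the Scope dimension of a PR as Low, Medium, or High."""
    # Level by line count: < 50 is low, 50-200 is medium, > 200 is high.
    if lines_changed > 200:
        by_lines = 2
    elif lines_changed >= 50:
        by_lines = 1
    else:
        by_lines = 0

    # Level by file count: 1-2 is low, 3-5 is medium, > 5 is high.
    if files_changed > 5:
        by_files = 2
    elif files_changed >= 3:
        by_files = 1
    else:
        by_files = 0

    # Assumption: when the two measures disagree, report the more severe one.
    return LEVELS[max(by_lines, by_files)]


if __name__ == "__main__":
    print(scope_risk(lines_changed=120, files_changed=2))
```

Taking the maximum matches the "or" in the High column (`> 200 lines or > 5 files`) and keeps a small diff spread across many files from being rated Low.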

package/prompts/skills/assure-review-functional/SKILL.md CHANGED

@@ -1,114 +1,114 @@
-# Code Review — Functional
-> **Pillar**: Assure | **ID**: `assure-review-functional`
-## Purpose
-Focused code review that evaluates **correctness, security, and performance** — the functional aspects that determine whether code works safely and efficiently. Separated from standards review so each can be delegated to a specialized subagent or run independently.
-## Activation Triggers
-- "functional review", "correctness review", "does this code work", "security review", "performance review"
-- Automatically invoked by autopilot-worker Phase 6 via subagent delegation (role: `code-reviewer`)
-- Can be run standalone for targeted reviews
-## Methodology
-### Pass 1 — Correctness
-1. Trace the primary execution path against acceptance criteria
-2. Identify logic errors, off-by-one, null/undefined risks, race conditions
-3. Check edge cases: empty inputs, boundary values, error paths
-4. Verify resource cleanup (connections, file handles, subscriptions)
-5. Verify error handling: are errors caught, logged, and surfaced appropriately?
-6. Confidence-gate: only report findings ≥ threshold
-### Pass 2 — Security (OWASP Top 10 Quick Check)
-1. **Injection** (A03): SQL, NoSQL, OS command, LDAP injection vectors
-2. **Broken Auth** (A07): hardcoded credentials, weak session management
-3. **Sensitive Data Exposure** (A02): secrets in code, unencrypted PII, overly broad API responses
-4. **XSS** (A03): unescaped user input in HTML/templates
-5. **Insecure Deserialization** (A08): untrusted JSON/YAML parsing without validation
-6. **Broken Access Control** (A01): missing authorization checks, IDOR vulnerabilities
-7. **Security Misconfiguration** (A05): debug mode in prod, overly permissive CORS, default credentials
-8. Flag any `eval()`, `dangerouslySetInnerHTML`, `exec()`, or equivalent patterns
-### Pass 3 — Performance
-1. Identify O(n²) or worse patterns in hot paths
-2. Flag await-in-loops and N+1 query patterns
-3. Check for unnecessary allocations in loops
-4. Look for missing caching opportunities on repeated computations
-5. Identify blocking calls that could be async
-6. Run `catalyst_metrics_complexity` on changed files — flag functions above threshold
-### Pass 4 — Requirements Alignment (optional, requires Work IQ)
-If M365 context is available (via prior `analysis` artifact or direct query), validate the code changes against stated requirements:
-1. Read the `analysis` artifact from the workflow (if running as subagent with a `workflow_id`) to retrieve M365 requirements context
-2. If no artifact exists but `mcp_workiq_ask_work_iq` is available, query: "What requirements and acceptance criteria were stated for {feature/issue title} in recent meetings and emails?"
-3. For each stated requirement, check the code changes:
-   - **Implemented**: requirement is fully addressed by the code ✓
-   - **Partial**: requirement is partially addressed — note what's missing
-   - **Not addressed**: requirement has no corresponding implementation
-4. Cross-reference acceptance criteria from the board issue against the actual behavior of the code
-5. Flag any requirement gaps as `medium` severity findings
-> **Note**: This pass is skipped if no M365 context is available and no board issue acceptance criteria exist. It does not block the review.
-### Synthesis
-1. Rank all findings by severity: `critical > high > medium > low`
-2. Filter by `severity_floor` from config
-3. Group by file/function
-4. Provide specific fix suggestions with code snippets
-5. If invoked as subagent, write output as artifact via `catalyst_artifact_write` (phase: `review-functional`)
-## Tools Required
-- `catalyst_metrics_complexity` — Get cyclomatic/cognitive complexity scores
-- `catalyst_metrics_coverage` — Check test coverage for reviewed code
-- `catalyst_artifact_write` — Persist review findings as artifact (when run as subagent)
-- `catalyst_artifact_read` — Read prior analysis artifacts for context (includes M365 requirements context)
-- `mcp_workiq_ask_work_iq` — (optional) Query M365 for stated requirements when no analysis artifact exists
-## Output Format
-```
-## [Catalyst → Functional Review]
-### Summary
-{N} findings across {files}: {critical} critical, {high} high, {medium} medium
-### Correctness
-| Severity | File:Line | Issue | Suggested Fix |
-|----------|-----------|-------|---------------|
-| ...      | ...       | ...   | ...           |
-### Security
-| Severity | OWASP Cat | File:Line | Issue | Mitigation |
-|----------|-----------|-----------|-------|------------|
-| ...      | ...       | ...       | ...   | ...        |
-### Performance
-| Severity | File:Line | Issue | Suggested Fix |
-|----------|-----------|-------|---------------|
-| ...      | ...       | ...   | ...           |
-### Requirements Alignment (if M365 context available)
-| Requirement | Status | Evidence | Gap |
-|-------------|--------|----------|-----|
-| ...         | ✓/⚠️/❌ | ...      | ... |
-### Verdict
-{PASS | PASS_WITH_WARNINGS | FAIL}
-Confidence: {N}/10
-```
-## Chains To
-- `assure-review-standards` — Companion skill for conventions/consistency review
-- `assure-code-quality` — Full 4-pass review (superset of this skill)
-- `assure-vulnerability-scan` — Deep security audit (more thorough than Pass 2 here)
+# Code Review — Functional
+> **Pillar**: Assure | **ID**: `assure-review-functional`
+## Purpose
+Focused code review that evaluates **correctness, security, and performance** — the functional aspects that determine whether code works safely and efficiently. Separated from standards review so each can be delegated to a specialized subagent or run independently.
+## Activation Triggers
+- "functional review", "correctness review", "does this code work", "security review", "performance review"
+- Automatically invoked by autopilot-worker Phase 6 via subagent delegation (role: `code-reviewer`)
+- Can be run standalone for targeted reviews
+## Methodology
+### Pass 1 — Correctness
+1. Trace the primary execution path against acceptance criteria
+2. Identify logic errors, off-by-one, null/undefined risks, race conditions
+3. Check edge cases: empty inputs, boundary values, error paths
+4. Verify resource cleanup (connections, file handles, subscriptions)
+5. Verify error handling: are errors caught, logged, and surfaced appropriately?
+6. Confidence-gate: only report findings ≥ threshold
+### Pass 2 — Security (OWASP Top 10 Quick Check)
+1. **Injection** (A03): SQL, NoSQL, OS command, LDAP injection vectors
+2. **Broken Auth** (A07): hardcoded credentials, weak session management
+3. **Sensitive Data Exposure** (A02): secrets in code, unencrypted PII, overly broad API responses
+4. **XSS** (A03): unescaped user input in HTML/templates
+5. **Insecure Deserialization** (A08): untrusted JSON/YAML parsing without validation
+6. **Broken Access Control** (A01): missing authorization checks, IDOR vulnerabilities
+7. **Security Misconfiguration** (A05): debug mode in prod, overly permissive CORS, default credentials
+8. Flag any `eval()`, `dangerouslySetInnerHTML`, `exec()`, or equivalent patterns
+### Pass 3 — Performance
+1. Identify O(n²) or worse patterns in hot paths
+2. Flag await-in-loops and N+1 query patterns
+3. Check for unnecessary allocations in loops
+4. Look for missing caching opportunities on repeated computations
+5. Identify blocking calls that could be async
+6. Run `crewpilot_metrics_complexity` on changed files — flag functions above threshold
+### Pass 4 — Requirements Alignment (optional, requires Work IQ)
+If M365 context is available (via prior `analysis` artifact or direct query), validate the code changes against stated requirements:
+1. Read the `analysis` artifact from the workflow (if running as subagent with a `workflow_id`) to retrieve M365 requirements context
+2. If no artifact exists but `mcp_workiq_ask_work_iq` is available, query: "What requirements and acceptance criteria were stated for {feature/issue title} in recent meetings and emails?"
+3. For each stated requirement, check the code changes:
+   - **Implemented**: requirement is fully addressed by the code ✓
+   - **Partial**: requirement is partially addressed — note what's missing
+   - **Not addressed**: requirement has no corresponding implementation
+4. Cross-reference acceptance criteria from the board issue against the actual behavior of the code
+5. Flag any requirement gaps as `medium` severity findings
+> **Note**: This pass is skipped if no M365 context is available and no board issue acceptance criteria exist. It does not block the review.
+### Synthesis
+1. Rank all findings by severity: `critical > high > medium > low`
+2. Filter by `severity_floor` from config
+3. Group by file/function
+4. Provide specific fix suggestions with code snippets
+5. If invoked as subagent, write output as artifact via `crewpilot_artifact_write` (phase: `review-functional`)
+## Tools Required
+- `crewpilot_metrics_complexity` — Get cyclomatic/cognitive complexity scores
+- `crewpilot_metrics_coverage` — Check test coverage for reviewed code
+- `crewpilot_artifact_write` — Persist review findings as artifact (when run as subagent)
+- `crewpilot_artifact_read` — Read prior analysis artifacts for context (includes M365 requirements context)
+- `mcp_workiq_ask_work_iq` — (optional) Query M365 for stated requirements when no analysis artifact exists
+## Output Format
+```
+## [CrewPilot → Functional Review]
+### Summary
+{N} findings across {files}: {critical} critical, {high} high, {medium} medium
+### Correctness
+| Severity | File:Line | Issue | Suggested Fix |
+|----------|-----------|-------|---------------|
+| ...      | ...       | ...   | ...           |
+### Security
+| Severity | OWASP Cat | File:Line | Issue | Mitigation |
+|----------|-----------|-----------|-------|------------|
+| ...      | ...       | ...       | ...   | ...        |
+### Performance
+| Severity | File:Line | Issue | Suggested Fix |
+|----------|-----------|-------|---------------|
+| ...      | ...       | ...   | ...           |
+### Requirements Alignment (if M365 context available)
+| Requirement | Status | Evidence | Gap |
+|-------------|--------|----------|-----|
+| ...         | ✓/⚠️/❌ | ...      | ... |
+### Verdict
+{PASS | PASS_WITH_WARNINGS | FAIL}
+Confidence: {N}/10
+```
+## Chains To
+- `assure-review-standards` — Companion skill for conventions/consistency review
+- `assure-code-quality` — Full 4-pass review (superset of this skill)
+- `assure-vulnerability-scan` — Deep security audit (more thorough than Pass 2 here)
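
The Synthesis steps in this skill file (rank by severity, filter by `severity_floor`, group by file) can be sketched as a short pipeline. This is a minimal Python illustration under stated assumptions, not the package's implementation: the `Finding` dataclass, the `synthesize` function, and the default floor of `medium` are all hypothetical names and values chosen for the example. Only the severity ordering `critical > high > medium > low` comes from the skill text.

```python
from collections import defaultdict
from dataclasses import dataclass

# Ordering taken from the Synthesis section: critical > high > medium > low.
SEVERITY_ORDER = ["critical", "high", "medium", "low"]


@dataclass
class Finding:
    """Hypothetical shape for a single review finding."""
    severity: str
    file: str
    issue: str


def synthesize(findings, severity_floor="medium"):
    """Filter by the severity floor, rank by severity, then group by file."""
    rank = {name: index for index, name in enumerate(SEVERITY_ORDER)}
    floor = rank[severity_floor]

    # Keep findings at or above the floor (a lower index is more severe).
    kept = [f for f in findings if rank[f.severity] <= floor]

    # Stable sort, so findings of equal severity keep their input order.
    kept.sort(key=lambda f: rank[f.severity])

    # Group by file; files appear in order of their most severe finding.
    grouped = defaultdict(list)
    for finding in kept:
        grouped[finding.file].append(finding)
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        Finding("low", "a.py", "unused variable"),
        Finding("critical", "b.py", "SQL built from user input"),
        Finding("high", "a.py", "await inside loop"),
    ]
    for path, items in synthesize(sample).items():
        print(path, [item.severity for item in items])
```

Sorting before grouping means each file's list is already ordered by severity, and the file with the most severe finding comes first.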