npm - @fyow/copilot-everything - Versions diffs - 1.0.2 → 1.0.3 - Mend

@fyow/copilot-everything 1.0.2 → 1.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (24) hide show

package/.github/agents/architect.agent.md +113 -5
package/.github/agents/build-error-resolver.agent.md +454 -42
package/.github/agents/code-reviewer.agent.md +24 -13
package/.github/agents/doc-updater.agent.md +366 -36
package/.github/agents/e2e-runner.agent.md +626 -69
package/.github/agents/planner.agent.md +25 -2
package/.github/agents/refactor-cleaner.agent.md +218 -35
package/.github/agents/security-reviewer.agent.md +485 -70
package/.github/agents/tdd-guide.agent.md +174 -22
package/.github/instructions/agents.instructions.md +53 -0
package/.github/instructions/coding-style.instructions.md +20 -13
package/.github/instructions/git-workflow.instructions.md +5 -18
package/.github/instructions/hooks.instructions.md +52 -0
package/.github/instructions/patterns.instructions.md +59 -0
package/.github/instructions/performance.instructions.md +30 -29
package/.github/instructions/security.instructions.md +12 -35
package/.github/instructions/testing.instructions.md +4 -25
package/.github/skills/continuous-learning/SKILL.md +80 -0
package/.github/skills/eval-harness/SKILL.md +221 -0
package/.github/skills/project-guidelines-example/SKILL.md +5 -10
package/.github/skills/strategic-compact/SKILL.md +63 -0
package/.github/skills/verification-loop/SKILL.md +1 -6
package/package.json +1 -1
package/src/commands/init.js +21 -0

package/.github/skills/continuous-learning/SKILL.md ADDED Viewed

@@ -0,0 +1,80 @@
+---
+name: continuous-learning
+description: Automatically extract reusable patterns from Claude Code sessions and save them as learned skills for future use.
+---
+# Continuous Learning Skill
+Automatically evaluates Claude Code sessions on end to extract reusable patterns that can be saved as learned skills.
+## How It Works
+This skill runs as a **Stop hook** at the end of each session:
+1. **Session Evaluation**: Checks if session has enough messages (default: 10+)
+2. **Pattern Detection**: Identifies extractable patterns from the session
+3. **Skill Extraction**: Saves useful patterns to `~/.claude/skills/learned/`
+## Configuration
+Edit `config.json` to customize:
+```json
+{
+  "min_session_length": 10,
+  "extraction_threshold": "medium",
+  "auto_approve": false,
+  "learned_skills_path": "~/.claude/skills/learned/",
+  "patterns_to_detect": [
+    "error_resolution",
+    "user_corrections",
+    "workarounds",
+    "debugging_techniques",
+    "project_specific"
+  ],
+  "ignore_patterns": [
+    "simple_typos",
+    "one_time_fixes",
+    "external_api_issues"
+  ]
+}
+```
+## Pattern Types
+| Pattern | Description |
+|---------|-------------|
+| `error_resolution` | How specific errors were resolved |
+| `user_corrections` | Patterns from user corrections |
+| `workarounds` | Solutions to framework/library quirks |
+| `debugging_techniques` | Effective debugging approaches |
+| `project_specific` | Project-specific conventions |
+## Hook Setup
+Add to your `~/.claude/settings.json`:
+```json
+{
+  "hooks": {
+    "Stop": [{
+      "matcher": "*",
+      "hooks": [{
+        "type": "command",
+        "command": "~/.claude/skills/continuous-learning/evaluate-session.sh"
+      }]
+    }]
+  }
+}
+```
+## Why Stop Hook?
+- **Lightweight**: Runs once at session end
+- **Non-blocking**: Doesn't add latency to every message
+- **Complete context**: Has access to full session transcript
+## Related
+- [The Longform Guide](https://x.com/affaanmustafa/status/2014040193557471352) - Section on continuous learning
+- `/learn` command - Manual pattern extraction mid-session

package/.github/skills/eval-harness/SKILL.md ADDED Viewed

@@ -0,0 +1,221 @@
+# Eval Harness Skill
+A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.
+## Philosophy
+Eval-Driven Development treats evals as the "unit tests of AI development":
+- Define expected behavior BEFORE implementation
+- Run evals continuously during development
+- Track regressions with each change
+- Use pass@k metrics for reliability measurement
+## Eval Types
+### Capability Evals
+Test if Claude can do something it couldn't before:
+```markdown
+[CAPABILITY EVAL: feature-name]
+Task: Description of what Claude should accomplish
+Success Criteria:
+  - [ ] Criterion 1
+  - [ ] Criterion 2
+  - [ ] Criterion 3
+Expected Output: Description of expected result
+```
+### Regression Evals
+Ensure changes don't break existing functionality:
+```markdown
+[REGRESSION EVAL: feature-name]
+Baseline: SHA or checkpoint name
+Tests:
+  - existing-test-1: PASS/FAIL
+  - existing-test-2: PASS/FAIL
+  - existing-test-3: PASS/FAIL
+Result: X/Y passed (previously Y/Y)
+```
+## Grader Types
+### 1. Code-Based Grader
+Deterministic checks using code:
+```bash
+# Check if file contains expected pattern
+grep -q "export function handleAuth" src/auth.ts && echo "PASS" || echo "FAIL"
+# Check if tests pass
+npm test -- --testPathPattern="auth" && echo "PASS" || echo "FAIL"
+# Check if build succeeds
+npm run build && echo "PASS" || echo "FAIL"
+```
+### 2. Model-Based Grader
+Use Claude to evaluate open-ended outputs:
+```markdown
+[MODEL GRADER PROMPT]
+Evaluate the following code change:
+1. Does it solve the stated problem?
+2. Is it well-structured?
+3. Are edge cases handled?
+4. Is error handling appropriate?
+Score: 1-5 (1=poor, 5=excellent)
+Reasoning: [explanation]
+```
+### 3. Human Grader
+Flag for manual review:
+```markdown
+[HUMAN REVIEW REQUIRED]
+Change: Description of what changed
+Reason: Why human review is needed
+Risk Level: LOW/MEDIUM/HIGH
+```
+## Metrics
+### pass@k
+"At least one success in k attempts"
+- pass@1: First attempt success rate
+- pass@3: Success within 3 attempts
+- Typical target: pass@3 > 90%
+### pass^k
+"All k trials succeed"
+- Higher bar for reliability
+- pass^3: 3 consecutive successes
+- Use for critical paths
+## Eval Workflow
+### 1. Define (Before Coding)
+```markdown
+## EVAL DEFINITION: feature-xyz
+### Capability Evals
+1. Can create new user account
+2. Can validate email format
+3. Can hash password securely
+### Regression Evals
+1. Existing login still works
+2. Session management unchanged
+3. Logout flow intact
+### Success Metrics
+- pass@3 > 90% for capability evals
+- pass^3 = 100% for regression evals
+```
+### 2. Implement
+Write code to pass the defined evals.
+### 3. Evaluate
+```bash
+# Run capability evals
+[Run each capability eval, record PASS/FAIL]
+# Run regression evals
+npm test -- --testPathPattern="existing"
+# Generate report
+```
+### 4. Report
+```markdown
+EVAL REPORT: feature-xyz
+========================
+Capability Evals:
+  create-user:     PASS (pass@1)
+  validate-email:  PASS (pass@2)
+  hash-password:   PASS (pass@1)
+  Overall:         3/3 passed
+Regression Evals:
+  login-flow:      PASS
+  session-mgmt:    PASS
+  logout-flow:     PASS
+  Overall:         3/3 passed
+Metrics:
+  pass@1: 67% (2/3)
+  pass@3: 100% (3/3)
+Status: READY FOR REVIEW
+```
+## Integration Patterns
+### Pre-Implementation
+```
+/eval define feature-name
+```
+Creates eval definition file at `.claude/evals/feature-name.md`
+### During Implementation
+```
+/eval check feature-name
+```
+Runs current evals and reports status
+### Post-Implementation
+```
+/eval report feature-name
+```
+Generates full eval report
+## Eval Storage
+Store evals in project:
+```
+.claude/
+  evals/
+    feature-xyz.md      # Eval definition
+    feature-xyz.log     # Eval run history
+    baseline.json       # Regression baselines
+```
+## Best Practices
+1. **Define evals BEFORE coding** - Forces clear thinking about success criteria
+2. **Run evals frequently** - Catch regressions early
+3. **Track pass@k over time** - Monitor reliability trends
+4. **Use code graders when possible** - Deterministic > probabilistic
+5. **Human review for security** - Never fully automate security checks
+6. **Keep evals fast** - Slow evals don't get run
+7. **Version evals with code** - Evals are first-class artifacts
+## Example: Adding Authentication
+```markdown
+## EVAL: add-authentication
+### Phase 1: Define (10 min)
+Capability Evals:
+- [ ] User can register with email/password
+- [ ] User can login with valid credentials
+- [ ] Invalid credentials rejected with proper error
+- [ ] Sessions persist across page reloads
+- [ ] Logout clears session
+Regression Evals:
+- [ ] Public routes still accessible
+- [ ] API responses unchanged
+- [ ] Database schema compatible
+### Phase 2: Implement (varies)
+[Write code]
+### Phase 3: Evaluate
+Run: /eval check add-authentication
+### Phase 4: Report
+EVAL REPORT: add-authentication
+==============================
+Capability: 5/5 passed (pass@3: 100%)
+Regression: 3/3 passed (pass^3: 100%)
+Status: SHIP IT
+```

package/.github/skills/project-guidelines-example/SKILL.md CHANGED Viewed

@@ -1,8 +1,3 @@
----
-name: project-guidelines-example
-description: Example project-specific skill template. Use this as a starting point for creating your own project skills with architecture, patterns, and guidelines.
----
 # Project Guidelines Skill (Example)
 This is an example of a project-specific skill. Use this as a template for your own projects.
@@ -28,7 +23,7 @@ Reference this skill when working on the specific project it's designed for. Pro
 - **Frontend**: Next.js 15 (App Router), TypeScript, React
 - **Backend**: FastAPI (Python), Pydantic models
 - **Database**: Supabase (PostgreSQL)
-- **AI**: LLM API with tool calling and structured output
+- **AI**: Claude API with tool calling and structured output
 - **Deployment**: Google Cloud Run
 - **Testing**: Playwright (E2E), pytest (backend), React Testing Library
@@ -50,7 +45,7 @@ Reference this skill when working on the specific project it's designed for. Pro
               ┌───────────────┼───────────────┐
               ▼               ▼               ▼
         ┌──────────┐   ┌──────────┐   ┌──────────┐
-        │ Supabase │   │   LLM    │   │  Redis   │
+        │ Supabase │   │  Claude  │   │  Redis   │
         │ Database │   │   API    │   │  Cache   │
         └──────────┘   └──────────┘   └──────────┘
 ```
@@ -149,7 +144,7 @@ async function fetchApi<T>(
 }
 ```
-### LLM AI Integration (Structured Output)
+### Claude AI Integration (Structured Output)
 ```python
 from anthropic import Anthropic
@@ -160,11 +155,11 @@ class AnalysisResult(BaseModel):
     key_points: list[str]
     confidence: float
-async def analyze_with_llm(content: str) -> AnalysisResult:
+async def analyze_with_claude(content: str) -> AnalysisResult:
     client = Anthropic()
     response = client.messages.create(
-        model="claude-sonnet-4-5-20250514",  # Or use OpenAI, etc.
+        model="claude-sonnet-4-5-20250514",
         max_tokens=1024,
         messages=[{"role": "user", "content": content}],
         tools=[{

package/.github/skills/strategic-compact/SKILL.md ADDED Viewed

@@ -0,0 +1,63 @@
+---
+name: strategic-compact
+description: Suggests manual context compaction at logical intervals to preserve context through task phases rather than arbitrary auto-compaction.
+---
+# Strategic Compact Skill
+Suggests manual `/compact` at strategic points in your workflow rather than relying on arbitrary auto-compaction.
+## Why Strategic Compaction?
+Auto-compaction triggers at arbitrary points:
+- Often mid-task, losing important context
+- No awareness of logical task boundaries
+- Can interrupt complex multi-step operations
+Strategic compaction at logical boundaries:
+- **After exploration, before execution** - Compact research context, keep implementation plan
+- **After completing a milestone** - Fresh start for next phase
+- **Before major context shifts** - Clear exploration context before different task
+## How It Works
+The `suggest-compact.sh` script runs on PreToolUse (Edit/Write) and:
+1. **Tracks tool calls** - Counts tool invocations in session
+2. **Threshold detection** - Suggests at configurable threshold (default: 50 calls)
+3. **Periodic reminders** - Reminds every 25 calls after threshold
+## Hook Setup
+Add to your `~/.claude/settings.json`:
+```json
+{
+  "hooks": {
+    "PreToolUse": [{
+      "matcher": "tool == \"Edit\" || tool == \"Write\"",
+      "hooks": [{
+        "type": "command",
+        "command": "~/.claude/skills/strategic-compact/suggest-compact.sh"
+      }]
+    }]
+  }
+}
+```
+## Configuration
+Environment variables:
+- `COMPACT_THRESHOLD` - Tool calls before first suggestion (default: 50)
+## Best Practices
+1. **Compact after planning** - Once plan is finalized, compact to start fresh
+2. **Compact after debugging** - Clear error-resolution context before continuing
+3. **Don't compact mid-implementation** - Preserve context for related changes
+4. **Read the suggestion** - The hook tells you *when*, you decide *if*
+## Related
+- [The Longform Guide](https://x.com/affaanmustafa/status/2014040193557471352) - Token optimization section
+- Memory persistence hooks - For state that survives compaction

package/.github/skills/verification-loop/SKILL.md CHANGED Viewed

@@ -1,11 +1,6 @@
----
-name: verification-loop
-description: A comprehensive verification system for coding sessions. Use this after completing features, before creating PRs, or when ensuring quality gates pass.
----
 # Verification Loop Skill
-A comprehensive verification system for coding sessions.
+A comprehensive verification system for Claude Code sessions.
 ## When to Use

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@fyow/copilot-everything",
-  "version": "1.0.2",
+  "version": "1.0.3",
   "description": "Everything you need for GitHub Copilot CLI - agents, skills, instructions, and hooks configurations",
   "main": "src/index.js",
   "bin": {

package/src/commands/init.js CHANGED Viewed

@@ -188,6 +188,27 @@ function init(flags = {}) {
     totalCopied += agentsMdResult.copied;
     totalSkipped += agentsMdResult.skipped;
     totalErrors.push(...agentsMdResult.errors);
+    // MCP config to ~/.copilot/
+    const homeDir = process.env.HOME || process.env.USERPROFILE;
+    const mcpConfigSrc = path.join(PACKAGE_ROOT, 'copilot', 'mcp-config.json');
+    const mcpConfigDest = path.join(homeDir, '.copilot', 'mcp-config.json');
+    if (fs.existsSync(mcpConfigSrc)) {
+      if (fs.existsSync(mcpConfigDest) && !force) {
+        console.log(`   ⚠️  MCP config: skipped (${mcpConfigDest} already exists)`);
+        console.log(`      💡 Use --force to overwrite, or manually merge configurations`);
+        totalSkipped += 1;
+      } else {
+        const mcpResult = copyFile(mcpConfigSrc, mcpConfigDest, options);
+        if (mcpResult.copied > 0) {
+          console.log(`   ✅ MCP config: installed to ${mcpConfigDest}`);
+        }
+        totalCopied += mcpResult.copied;
+        totalSkipped += mcpResult.skipped;
+        totalErrors.push(...mcpResult.errors);
+      }
+    }
   }
   // Install Claude Code configurations (for backwards compatibility)