oh-my-customcode 0.33.1 → 0.35.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +19 -19
- package/package.json +1 -1
- package/templates/.claude/hooks/hooks.json +10 -0
- package/templates/.claude/hooks/scripts/cost-cap-advisor.sh +71 -0
- package/templates/.claude/hooks/scripts/stuck-detector.sh +62 -2
- package/templates/.claude/hooks/scripts/task-outcome-recorder.sh +34 -7
- package/templates/.claude/rules/MUST-agent-design.md +2 -2
- package/templates/.claude/skills/analysis/SKILL.md +2 -2
- package/templates/.claude/skills/audit-agents/SKILL.md +1 -1
- package/templates/.claude/skills/create-agent/SKILL.md +1 -1
- package/templates/.claude/skills/dev-refactor/SKILL.md +76 -0
- package/templates/.claude/skills/dev-review/SKILL.md +76 -0
- package/templates/.claude/skills/evaluator-optimizer/SKILL.md +256 -0
- package/templates/.claude/skills/fix-refs/SKILL.md +1 -1
- package/templates/.claude/skills/help/SKILL.md +2 -2
- package/templates/.claude/skills/lists/SKILL.md +2 -2
- package/templates/.claude/skills/monitoring-setup/SKILL.md +1 -1
- package/templates/.claude/skills/npm-audit/SKILL.md +1 -1
- package/templates/.claude/skills/npm-publish/SKILL.md +1 -1
- package/templates/.claude/skills/npm-version/SKILL.md +1 -1
- package/templates/.claude/skills/research/SKILL.md +116 -0
- package/templates/.claude/skills/sauron-watch/SKILL.md +1 -1
- package/templates/.claude/skills/status/SKILL.md +2 -2
- package/templates/.claude/skills/task-decomposition/SKILL.md +13 -0
- package/templates/.claude/skills/update-docs/SKILL.md +1 -1
- package/templates/.claude/skills/update-external/SKILL.md +1 -1
- package/templates/.claude/skills/worker-reviewer-pipeline/SKILL.md +10 -0
- package/templates/.claude/statusline.sh +6 -0
- package/templates/CLAUDE.md.en +21 -21
- package/templates/CLAUDE.md.ko +21 -21
- package/templates/guides/claude-code/12-workflow-patterns.md +182 -0
- package/templates/manifest.json +3 -3
package/templates/.claude/skills/evaluator-optimizer/SKILL.md
ADDED

@@ -0,0 +1,256 @@
+---
+name: evaluator-optimizer
+description: Parameterized evaluator-optimizer loop for quality-critical output with configurable rubrics
+scope: core
+user-invocable: false
+---
+
+# Evaluator-Optimizer Skill
+
+## Purpose
+
+General-purpose iterative refinement loop. A generator agent produces output, an evaluator agent scores it against a configurable rubric, and the loop continues until the quality gate is met or max iterations are reached.
+
+This skill generalizes the worker-reviewer-pipeline pattern beyond code review to any domain requiring quality-critical output: documentation, architecture decisions, test plans, configurations, and more.
+
+## Configuration Schema
+
+```yaml
+evaluator-optimizer:
+  generator:
+    agent: {subagent_type}   # Agent that produces output
+    model: sonnet            # Default model
+  evaluator:
+    agent: {subagent_type}   # Agent that reviews output
+    model: opus              # Evaluator benefits from stronger reasoning
+  rubric:
+    - criterion: {name}
+      weight: {0.0-1.0}
+      description: {what to evaluate}
+  quality_gate:
+    type: all_pass | majority_pass | score_threshold
+    threshold: 0.8           # For score_threshold type
+  max_iterations: 3          # Default, hard cap: 5
+```
+
+### Parameter Details
+
+| Parameter | Required | Default | Description |
+|-----------|----------|---------|-------------|
+| `generator.agent` | Yes | — | Subagent type that produces output |
+| `generator.model` | No | `sonnet` | Model for generation |
+| `evaluator.agent` | Yes | — | Subagent type that evaluates output |
+| `evaluator.model` | No | `opus` | Model for evaluation (stronger reasoning preferred) |
+| `rubric` | Yes | — | List of evaluation criteria with weights |
+| `quality_gate.type` | No | `score_threshold` | Gate strategy |
+| `quality_gate.threshold` | No | `0.8` | Score threshold (for `score_threshold` type) |
+| `max_iterations` | No | `3` | Max refinement loops (hard cap: 5) |
+
+## Quality Gate Types
+
+| Type | Behavior |
+|------|----------|
+| `all_pass` | Every rubric criterion must pass |
+| `majority_pass` | >50% of weighted criteria pass |
+| `score_threshold` | Weighted average score >= threshold |
+
+### Gate Evaluation Logic
+
+- **all_pass**: Each criterion scored individually. All must receive `pass: true`.
+- **majority_pass**: Sum weights of passing criteria. If > 0.5 of total weight, gate passes.
+- **score_threshold**: Compute weighted average: `sum(score_i * weight_i) / sum(weight_i)`. Compare against threshold.
+
+## Workflow
+
+```
+1. Generator produces output
+   → Orchestrator spawns generator agent with task prompt
+   → Generator returns output artifact
+
+2. Evaluator scores against rubric
+   → Orchestrator spawns evaluator agent with:
+     - The output artifact
+     - The rubric criteria
+     - Instructions to produce verdict JSON
+   → Evaluator returns structured verdict
+
+3. Quality gate check:
+   - PASS → return output + final verdict
+   - FAIL → extract feedback, append to generator prompt → iteration N+1
+
+4. Max iterations reached → return best output + warning
+   → "Best" = output from iteration with highest weighted score
+```
+
+### Iteration Flow Diagram
+
+```
+┌─────────────────────────────────────────────────┐
+│                  Orchestrator                   │
+│                                                 │
+│  ┌──────────┐    ┌──────────┐    ┌──────────┐   │
+│  │ Generate │───→│ Evaluate │───→│   Gate   │   │
+│  │ (iter N) │    │          │    │  Check   │   │
+│  └──────────┘    └──────────┘    └────┬─────┘   │
+│       ↑                               │         │
+│       │     ┌──────────┐  FAIL        │  PASS   │
+│       └─────│ Feedback │←─────────────┘         │
+│             └──────────┘              ↓         │
+│                                    Return       │
+└─────────────────────────────────────────────────┘
+```
+
+## Stopping Criteria Display
+
+```
+[Evaluator-Optimizer]
+├── Generator: {agent}:{model}
+├── Evaluator: {agent}:{model}
+├── Max iterations: {max_iterations} (hard cap: 5)
+├── Quality gate: {type} (threshold: {threshold})
+└── Rubric: {N} criteria
+```
+
+Display this at the start of the loop to provide transparency into the refinement configuration.
+
+## Verdict Format
+
+The evaluator MUST return a structured verdict in this format:
+
+```json
+{
+  "status": "pass | fail",
+  "iteration": 2,
+  "score": 0.85,
+  "rubric_results": [
+    {"criterion": "clarity", "pass": true, "score": 0.9, "feedback": "Clear structure and logical flow"},
+    {"criterion": "accuracy", "pass": true, "score": 0.8, "feedback": "All facts verified, one minor imprecision in section 3"}
+  ],
+  "improvement_summary": "Section 3 terminology tightened. Examples added to section 2."
+}
+```
+
+### Verdict Fields
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `status` | `pass` or `fail` | Overall quality gate result |
+| `iteration` | number | Current iteration number (1-indexed) |
+| `score` | number (0.0-1.0) | Weighted average score across all criteria |
+| `rubric_results` | array | Per-criterion evaluation details |
+| `improvement_summary` | string | Summary of changes from previous iteration (empty on iteration 1) |
+
+## Domain Examples
+
+| Domain | Generator | Evaluator | Rubric Focus |
+|--------|-----------|-----------|--------------|
+| Code review | `lang-*-expert` | opus reviewer | Correctness, style, security |
+| Documentation | `arch-documenter` | opus reviewer | Completeness, clarity, accuracy |
+| Architecture | Plan agent | opus reviewer | No SPOFs, no circular deps |
+| Test plans | `qa-planner` | `qa-engineer` | Coverage, edge cases, feasibility |
+| Agent creation | `mgr-creator` | opus reviewer | Frontmatter validity, R006 compliance |
+| Security audit | `sec-codeql-expert` | opus reviewer | Vulnerability coverage, false positive rate |
+
+### Example: Documentation Review
+
+```yaml
+evaluator-optimizer:
+  generator:
+    agent: arch-documenter
+    model: sonnet
+  evaluator:
+    agent: general-purpose
+    model: opus
+  rubric:
+    - criterion: completeness
+      weight: 0.3
+      description: All sections present, no gaps in coverage
+    - criterion: clarity
+      weight: 0.3
+      description: Clear language, no ambiguity, proper examples
+    - criterion: accuracy
+      weight: 0.25
+      description: All technical details correct and verifiable
+    - criterion: consistency
+      weight: 0.15
+      description: Consistent terminology, formatting, and style
+  quality_gate:
+    type: score_threshold
+    threshold: 0.8
+  max_iterations: 3
+```
+
+### Example: Code Implementation
+
+```yaml
+evaluator-optimizer:
+  generator:
+    agent: lang-typescript-expert
+    model: sonnet
+  evaluator:
+    agent: general-purpose
+    model: opus
+  rubric:
+    - criterion: correctness
+      weight: 0.35
+      description: Code compiles, logic is correct, edge cases handled
+    - criterion: style
+      weight: 0.2
+      description: Follows project conventions, clean and readable
+    - criterion: security
+      weight: 0.25
+      description: No injection risks, proper input validation
+    - criterion: performance
+      weight: 0.2
+      description: No unnecessary allocations, efficient algorithms
+  quality_gate:
+    type: all_pass
+  max_iterations: 3
+```
+
+## Integration
+
+| Rule | Integration |
+|------|-------------|
+| R009 | Generator and evaluator run sequentially (dependent — evaluator needs generator output) |
+| R010 | Orchestrator configures and invokes the loop; generator and evaluator agents execute via Agent tool |
+| R007 | Each iteration displays agent identification for both generator and evaluator |
+| R008 | Tool calls within generator/evaluator follow tool identification rules |
+| R013 | Ecomode: return verdict summary only, skip per-criterion details |
+| R015 | Display configuration block at loop start for intent transparency |
+
+## Ecomode Behavior
+
+When ecomode is active (R013), compress output:
+
+**Normal mode:**
+```
+[Evaluator-Optimizer] Iteration 2/3
+├── Generator: lang-typescript-expert:sonnet → produced 45-line module
+├── Evaluator: general-purpose:opus → scored 0.85
+├── Rubric: correctness ✓(0.9), style ✓(0.8), security ✓(0.85), performance ✓(0.8)
+└── Gate: score_threshold(0.8) → PASS
+```
+
+**Ecomode:**
+```
+[EO] iter 2/3 → 0.85 → PASS
+```
+
+## Error Handling
+
+| Scenario | Action |
+|----------|--------|
+| Generator fails to produce output | Retry once with simplified prompt; if still fails, abort with error |
+| Evaluator returns malformed verdict | Retry once; if still malformed, treat as fail with score 0.0 |
+| Max iterations reached without passing | Return best-scored output with warning: "Quality gate not met after {N} iterations" |
+| Rubric has zero total weight | Reject configuration, report error before starting loop |
+| Hard cap exceeded in config | Clamp `max_iterations` to 5, emit warning |
+
+## Constraints
+
+- This skill does NOT use `context: fork` — it operates within the caller's context
+- Generator and evaluator MUST be different agent invocations (no self-review)
+- The evaluator prompt MUST include the full rubric to ensure consistent scoring
+- Iteration state (best score, best output) is tracked by the orchestrator
+- The hard cap of 5 iterations prevents runaway refinement loops
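The `score_threshold` gate math above (`sum(score_i * weight_i) / sum(weight_i)`) can be sketched in a few lines of shell. This is a hypothetical illustration, not part of the package — the criterion names, scores, and weights are made up, and awk does the floating-point work:

```shell
# Hypothetical sketch of the score_threshold gate from the skill above.
# Input: "criterion score weight" triples; awk computes the weighted
# average and compares it against the threshold.
rubric="clarity 0.9 0.5
accuracy 0.8 0.5"
threshold=0.8
result=$(printf '%s\n' "$rubric" | awk -v t="$threshold" '
  { sum += $2 * $3; wsum += $3 }   # accumulate score*weight and total weight
  END {
    avg = sum / wsum
    printf "score=%.2f gate=%s", avg, (avg >= t ? "PASS" : "FAIL")
  }')
echo "$result"
```

With these sample numbers the weighted average is 0.85, which clears the 0.8 threshold, so the gate reports PASS.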
package/templates/.claude/skills/research/SKILL.md
CHANGED

@@ -20,10 +20,124 @@ Orchestrates 10 parallel research teams for comprehensive deep analysis of any t
 /research Rust async runtime comparison
 ```
 
+## When NOT to Use
+
+| Scenario | Better Alternative |
+|----------|--------------------|
+| Simple factual question | Direct answer or single WebSearch |
+| Single-file code review | `/dev-review` with specific file |
+| Known solution implementation | `/structured-dev-cycle` |
+| Topic with < 3 comparison dimensions | Single Explore agent |
+
+**Pre-execution check**: If the query can be answered with < 3 sources, skip 10-team research.
+
+## Pre-flight Guards
+
+Before executing the 10-team research workflow, the agent MUST run these checks. Research is a high-cost operation (~$8-15) — these guards prevent wasteful execution.
+
+### Guard Levels
+
+| Level | Meaning | Action |
+|-------|---------|--------|
+| PASS | No issues detected | Proceed with research |
+| INFO | Minor suggestion | Log note, proceed |
+| WARN | Potentially wasteful | Show warning with cost estimate, ask confirmation |
+| GATE | Wrong tool — use simpler alternative | Block execution, suggest alternative |
+
+### Guard 1: Query Complexity Assessment
+
+**Level**: GATE or PASS
+
+**Check**: Assess if the query requires multi-team research
+
+```
+# Simple factual questions → GATE
+indicators_simple:
+  - Query is < 10 words
+  - Query asks "what is", "how to", "when was" (factual)
+  - Query has a single definitive answer
+  - Can be answered from a single documentation source
+
+# Complex research questions → PASS
+indicators_complex:
+  - Query involves comparison of 3+ alternatives
+  - Query requires analysis across multiple dimensions
+  - Query mentions "compare", "evaluate", "analyze", "research"
+  - Query references a repository or ecosystem for deep analysis
+```
+
+**Action (GATE)**: `[Pre-flight] GATE: Query appears to be a simple factual question. Use direct answer or single WebSearch instead. 10-team research (~$8-15) would be wasteful. Override with /research --force if intended.`
+
+### Guard 2: Single-File Review Detection
+
+**Level**: GATE
+
+**Check**: If the query references a single file for review
+
+```
+# Detection
+- Query mentions a specific file path (e.g., src/main.go)
+- Query asks to "review" or "analyze" a single file
+- No broader context requested
+```
+
+**Action**: `[Pre-flight] GATE: For single-file review, use /dev-review {file} instead. Research is for multi-source analysis.`
+
+### Guard 3: Known Solution Detection
+
+**Level**: INFO
+
+**Check**: If the query is about implementing a known solution
+
+```
+# Detection
+keywords: implement, build, create, add feature, 구현, 만들어
+# AND the solution approach is well-known (not requiring research)
+```
+
+**Action**: `[Pre-flight] INFO: If the implementation approach is already known, consider /structured-dev-cycle instead of research. Proceeding with research.`
+
+### Guard 4: Context Budget Check
+
+**Level**: WARN
+
+**Check**: Estimate context impact of 10-team research
+
+```bash
+# Check current context usage from statusline data
+CONTEXT_FILE="/tmp/.claude-context-$PPID"
+if [ -f "$CONTEXT_FILE" ]; then
+  context_pct=$(cat "$CONTEXT_FILE")
+  if [ "$context_pct" -gt 40 ]; then
+    :  # WARN — research will consume significant additional context
+  fi
+fi
+```
+
+**Action**: `[Pre-flight] WARN: Context usage at {pct}%. 10-team research typically adds 30-40% context. Consider /compact before proceeding, or results may be truncated.`
+
+### Display Format
+
+```
+[Pre-flight] research
+├── Query complexity: PASS — multi-dimensional comparison detected
+├── Single-file review: PASS
+├── Known solution: PASS
+└── Context budget: WARN — context at 45%, research adds ~35%
+
+Result: PROCEED WITH CAUTION (0 GATE, 1 WARN, 0 INFO)
+Cost estimate: ~$8-15 for 10-team parallel research
+```
+
+If any GATE: block and suggest alternative. User can override with `--force`.
+If any WARN: show warning with cost context, ask user to confirm.
+If only PASS/INFO: proceed automatically.
+
 ## Architecture — 4 Phases
 
 ### Phase 1: Parallel Research (10 teams, batched per R009)
 
+**Step 0**: Pre-flight guards pass (see Pre-flight Guards section)
+
 Teams operate in breadth/depth pairs across 5 domains:
 
 | Pair | Domain | Team | Role | Focus |

@@ -258,6 +372,8 @@ Before execution:
 └── Phase 4: Report + GitHub issue
 
 Estimated: {time} | Teams: 10 | Models: sonnet → opus → codex
+Stopping: max 30 verification rounds, convergence at 0 contradictions
+Cost: ~$8-15 (10 teams × sonnet + opus verification)
 Execute? [Y/n]
 ```
 
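Guard 4's snippet leaves the WARN branch as a bare comment. A fuller, hypothetical expansion is sketched below — the 40% cutoff and the bridge-file convention follow the skill text, but the file here is simulated so the sketch runs standalone (a real hook would read the `$PPID`-keyed path):

```shell
# Hypothetical expansion of the Guard 4 context-budget check. A real hook
# would read /tmp/.claude-context-$PPID; here the file is simulated.
CONTEXT_FILE="/tmp/.claude-context-demo-$$"
echo 45 > "$CONTEXT_FILE"   # pretend the statusline recorded 45% usage
warn=""
if [ -f "$CONTEXT_FILE" ]; then
  context_pct=$(cat "$CONTEXT_FILE")
  if [ "$context_pct" -gt 40 ]; then
    warn="[Pre-flight] WARN: Context usage at ${context_pct}%. 10-team research typically adds 30-40% context."
  fi
fi
echo "$warn"
rm -f "$CONTEXT_FILE"
```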
package/templates/.claude/skills/task-decomposition/SKILL.md
CHANGED

@@ -20,6 +20,19 @@ Decomposition is **recommended** when any of these thresholds are met:
 | Domains involved | > 2 domains | Requires multiple specialists |
 | Agent types needed | > 2 types | Cross-specialty coordination |
 
+### Step 0: Pattern Selection
+
+Before decomposing, select the appropriate workflow pattern:
+
+| Pattern | When to Use | Primitive |
+|---------|-------------|-----------|
+| Sequential | Steps must execute in order, each depends on previous | dag-orchestration (linear) |
+| Parallel | Independent subtasks with no shared state | Agent tool (R009) or Agent Teams (R018) |
+| Evaluator-Optimizer | Quality-critical output needing iterative refinement | worker-reviewer-pipeline |
+| Orchestrator | Complex multi-step with dynamic routing | Routing skills (secretary/dev-lead/de-lead/qa-lead) |
+
+**Decision**: If task has independent subtasks → Parallel. If quality-critical → add EO review cycle. If multi-step with dependencies → Sequential/Orchestrator.
+
 ## Decomposition Process
 
 ```
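The Step 0 decision rule reads naturally as a small dispatch function. The sketch below is hypothetical (the function name and yes/no flags are invented for illustration), and it collapses "add EO review cycle" into selecting Evaluator-Optimizer outright:

```shell
# Hypothetical sketch of the Step 0 pattern-selection rule. Flags are
# yes/no answers to: independent subtasks? quality-critical? ordered deps?
select_pattern() {
  independent=$1; quality_critical=$2; ordered=$3
  if [ "$independent" = yes ]; then echo "Parallel"
  elif [ "$quality_critical" = yes ]; then echo "Evaluator-Optimizer"
  elif [ "$ordered" = yes ]; then echo "Sequential"
  else echo "Orchestrator"
  fi
}
select_pattern yes no no   # independent subtasks → Parallel
```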
package/templates/.claude/skills/worker-reviewer-pipeline/SKILL.md
CHANGED

@@ -98,6 +98,16 @@ When Agent Teams is NOT available, falls back to sequential Agent tool calls:
 Agent(worker) → result → Agent(reviewer) → verdict → Agent(worker) → ...
 ```
 
+## Stopping Criteria Display
+
+Before execution, display:
+```
+[Worker-Reviewer Pipeline]
+├── Max iterations: {max_iterations} (default: 3, hard cap: 5)
+├── Quality gate: {pass_threshold}% approval required
+└── Early stop: All reviewers approve → stop immediately
+```
+
 ## Display Format
 
 ```
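The stopping criteria above (iteration cap plus early stop on approval) amount to a bounded loop with a break. A hypothetical skeleton, with the reviewer verdict stubbed to approve on iteration 2:

```shell
# Hypothetical loop skeleton for the worker-reviewer stopping criteria:
# bounded by max_iterations, early stop once the reviewer approves.
max_iterations=3
log=$(
  i=1
  while [ "$i" -le "$max_iterations" ]; do
    # Stub: worker output + reviewer verdict; approval arrives at iteration 2.
    if [ "$i" -eq 2 ]; then verdict=approve; else verdict=revise; fi
    echo "iter $i/$max_iterations: $verdict"
    [ "$verdict" = approve ] && break
    i=$((i + 1))
  done
)
echo "$log"
```

Note the two exits: the `break` implements the early stop, while the `while` condition enforces the hard cap even when no verdict ever reaches approve.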
package/templates/.claude/statusline.sh
CHANGED

@@ -69,6 +69,12 @@ IFS=$'\t' read -r model_name project_dir ctx_pct ctx_size cost_usd <<< "$(
 ] | @tsv'
 )"
 
+# ---------------------------------------------------------------------------
+# 4b. Cost & context data bridge — write to temp file for hooks
+# ---------------------------------------------------------------------------
+COST_BRIDGE_FILE="/tmp/.claude-cost-${PPID}"
+printf '%s\t%s\t%s\n' "$cost_usd" "$ctx_pct" "$(date +%s)" > "$COST_BRIDGE_FILE" 2>/dev/null || true
+
 # ---------------------------------------------------------------------------
 # 5. Model display name + color (bash 3.2 compatible case pattern matching)
 # Model detection (kept for internal reference, not displayed in statusline)
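The statusline change above writes a tab-separated cost/context/timestamp line for hooks to consume. A hypothetical hook-side reader is sketched below; the demo writes its own sample file in the same format so it runs standalone, whereas a real hook would read the `$PPID`-keyed path:

```shell
# Hypothetical hook-side reader for the cost bridge written by statusline.sh.
# Demo: write a sample line in the writer's format, then read it back.
COST_BRIDGE_FILE="/tmp/.claude-cost-demo-$$"
printf '%s\t%s\t%s\n' "1.42" "37" "$(date +%s)" > "$COST_BRIDGE_FILE"
read -r cost_usd ctx_pct ts < "$COST_BRIDGE_FILE"   # default IFS splits on tabs
echo "cost=\$${cost_usd} ctx=${ctx_pct}%"
rm -f "$COST_BRIDGE_FILE"
```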
package/templates/CLAUDE.md.en
CHANGED

@@ -151,31 +151,31 @@ Violation = immediate correction. No exception for "small changes".
 
 | Command | Description |
 |---------|-------------|
-| `/analysis` | Analyze project and auto-configure customizations |
-| `/create-agent` | Create a new agent |
-| `/update-docs` | Sync documentation with project structure |
-| `/update-external` | Update agents from external sources |
-| `/audit-agents` | Audit agent dependencies |
-| `/fix-refs` | Fix broken references |
+| `/omcustom:analysis` | Analyze project and auto-configure customizations |
+| `/omcustom:create-agent` | Create a new agent |
+| `/omcustom:update-docs` | Sync documentation with project structure |
+| `/omcustom:update-external` | Update agents from external sources |
+| `/omcustom:audit-agents` | Audit agent dependencies |
+| `/omcustom:fix-refs` | Fix broken references |
 | `/dev-review` | Review code for best practices |
 | `/dev-refactor` | Refactor code |
 | `/memory-save` | Save session context to claude-mem |
 | `/memory-recall` | Search and recall memories |
-| `/monitoring-setup` | Enable/disable OTel console monitoring |
-| `/npm-publish` | Publish package to npm registry |
-| `/npm-version` | Manage semantic versions |
-| `/npm-audit` | Audit dependencies |
+| `/omcustom:monitoring-setup` | Enable/disable OTel console monitoring |
+| `/omcustom:npm-publish` | Publish package to npm registry |
+| `/omcustom:npm-version` | Manage semantic versions |
+| `/omcustom:npm-audit` | Audit dependencies |
 | `/codex-exec` | Execute Codex CLI prompts |
 | `/optimize-analyze` | Analyze bundle and performance |
 | `/optimize-bundle` | Optimize bundle size |
 | `/optimize-report` | Generate optimization report |
 | `/research` | 10-team parallel deep analysis and cross-verification |
 | `/deep-plan` | Research-validated planning (research → plan → verify) |
-| `/sauron-watch` | Full R017 verification |
+| `/omcustom:sauron-watch` | Full R017 verification |
 | `/structured-dev-cycle` | 6-stage structured development cycle (Plan → Verify → Implement → Verify → Compound → Done) |
-| `/lists` | Show all available commands |
-| `/status` | Show system status |
-| `/help` | Show help information |
+| `/omcustom:lists` | Show all available commands |
+| `/omcustom:status` | Show system status |
+| `/omcustom:help` | Show help information |
 
 ## Project Structure
 

@@ -184,7 +184,7 @@ project/
 +-- CLAUDE.md              # Entry point
 +-- .claude/
 |   +-- agents/            # Subagent definitions (44 files)
-|   +-- skills/            # Skills (
+|   +-- skills/            # Skills (71 directories)
 |   +-- rules/             # Global rules (R000-R019)
 |   +-- hooks/             # Hook scripts (memory, HUD)
 |   +-- contexts/          # Context files (ecomode)

@@ -250,15 +250,15 @@ Task tool + routing skills remain the fallback for simple/cost-sensitive tasks.
 
 ```bash
 # Project analysis
-/analysis
+/omcustom:analysis
 
 # Show all commands
-/lists
+/omcustom:lists
 
 # Agent management
-/create-agent my-agent
-/update-docs
-/audit-agents
+/omcustom:create-agent my-agent
+/omcustom:update-docs
+/omcustom:audit-agents
 
 # Code review
 /dev-review src/main.go

@@ -268,7 +268,7 @@ Task tool + routing skills remain the fallback for simple/cost-sensitive tasks.
 /memory-recall authentication
 
 # Verification
-/sauron-watch
+/omcustom:sauron-watch
 ```
 
 ## External Dependencies
|