npm - claude-code-pilot - Versions diffs - 3.0.0 → 3.1.0 - Mend

claude-code-pilot 3.0.0 → 3.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (124)

package/README.md +76 -97
package/bin/install.js +13 -13
package/manifest.json +1 -1
package/package.json +1 -1
package/src/agents/doc-updater.md +1 -1
package/src/agents/gan-evaluator.md +209 -0
package/src/agents/gan-generator.md +131 -0
package/src/agents/gan-planner.md +99 -0
package/src/agents/harness-optimizer.md +35 -0
package/src/agents/loop-operator.md +36 -0
package/src/agents/opensource-forker.md +198 -0
package/src/agents/opensource-packager.md +249 -0
package/src/agents/opensource-sanitizer.md +188 -0
package/src/agents/performance-optimizer.md +446 -0
package/src/available-rules/README.md +1 -1
package/src/commands/{aside.md → ccp/aside.md} +14 -13
package/src/commands/{build-fix.md → ccp/build-fix.md} +5 -0
package/src/commands/{checkpoint.md → ccp/checkpoint.md} +12 -7
package/src/commands/{code-review.md → ccp/code-review.md} +5 -0
package/src/commands/{context-budget.md → ccp/context-budget.md} +2 -1
package/src/commands/{cpp-build.md → ccp/cpp-build.md} +6 -5
package/src/commands/{cpp-review.md → ccp/cpp-review.md} +7 -6
package/src/commands/{cpp-test.md → ccp/cpp-test.md} +6 -5
package/src/commands/ccp/docs-update.md +48 -0
package/src/commands/{docs.md → ccp/docs.md} +4 -3
package/src/commands/{e2e.md → ccp/e2e.md} +7 -6
package/src/commands/{eval.md → ccp/eval.md} +10 -5
package/src/commands/{evolve.md → ccp/evolve.md} +3 -3
package/src/commands/{go-build.md → ccp/go-build.md} +6 -5
package/src/commands/{go-review.md → ccp/go-review.md} +7 -6
package/src/commands/{go-test.md → ccp/go-test.md} +6 -5
package/src/commands/{gradle-build.md → ccp/gradle-build.md} +1 -0
package/src/commands/{harness-audit.md → ccp/harness-audit.md} +6 -1
package/src/commands/{kotlin-build.md → ccp/kotlin-build.md} +6 -5
package/src/commands/{kotlin-review.md → ccp/kotlin-review.md} +7 -6
package/src/commands/{kotlin-test.md → ccp/kotlin-test.md} +6 -5
package/src/commands/{learn.md → ccp/learn.md} +7 -2
package/src/commands/{model-route.md → ccp/model-route.md} +6 -1
package/src/commands/{orchestrate.md → ccp/orchestrate.md} +4 -3
package/src/commands/{plan.md → ccp/plan.md} +6 -5
package/src/commands/ccp/profile-user.md +46 -0
package/src/commands/{prompt-optimize.md → ccp/prompt-optimize.md} +3 -2
package/src/commands/{prune.md → ccp/prune.md} +4 -4
package/src/commands/{python-review.md → ccp/python-review.md} +7 -6
package/src/commands/{quality-gate.md → ccp/quality-gate.md} +6 -1
package/src/commands/{refactor-clean.md → ccp/refactor-clean.md} +5 -0
package/src/commands/{resume-session.md → ccp/resume-session.md} +9 -8
package/src/commands/ccp/review.md +37 -0
package/src/commands/{rules-distill.md → ccp/rules-distill.md} +2 -1
package/src/commands/{rust-build.md → ccp/rust-build.md} +6 -5
package/src/commands/{rust-review.md → ccp/rust-review.md} +7 -6
package/src/commands/{rust-test.md → ccp/rust-test.md} +6 -5
package/src/commands/{save-session.md → ccp/save-session.md} +2 -1
package/src/commands/ccp/secure-phase.md +35 -0
package/src/commands/{sessions.md → ccp/sessions.md} +29 -24
package/src/commands/{setup-pm.md → ccp/setup-pm.md} +1 -0
package/src/commands/{setup-refresh.md → ccp/setup-refresh.md} +4 -3
package/src/commands/{setup.md → ccp/setup.md} +24 -23
package/src/commands/{skill-create.md → ccp/skill-create.md} +8 -8
package/src/commands/{skill-health.md → ccp/skill-health.md} +5 -5
package/src/commands/{tdd.md → ccp/tdd.md} +9 -8
package/src/commands/{test-coverage.md → ccp/test-coverage.md} +5 -0
package/src/commands/{tool-guide.md → ccp/tool-guide.md} +2 -1
package/src/commands/{update-codemaps.md → ccp/update-codemaps.md} +5 -0
package/src/commands/{update-docs.md → ccp/update-docs.md} +5 -0
package/src/commands/{verify.md → ccp/verify.md} +5 -0
package/src/commands/ccp/workstreams.md +68 -0
package/src/examples/CLAUDE.md +4 -4
package/src/examples/django-api-CLAUDE.md +5 -5
package/src/examples/go-microservice-CLAUDE.md +6 -6
package/src/examples/rust-api-CLAUDE.md +4 -4
package/src/examples/saas-nextjs-CLAUDE.md +8 -8
package/src/hooks/session-start.js +1 -1
package/src/pilot/references/mcp-servers.json +1 -1
package/src/pilot/workflows/docs-update.md +1165 -0
package/src/pilot/workflows/help.md +48 -56
package/src/pilot/workflows/profile-user.md +452 -0
package/src/pilot/workflows/review.md +244 -0
package/src/pilot/workflows/secure-phase.md +164 -0
package/src/rules/common/code-review.md +124 -0
package/src/rules/zh/README.md +108 -0
package/src/rules/zh/agents.md +50 -0
package/src/rules/zh/code-review.md +124 -0
package/src/rules/zh/coding-style.md +48 -0
package/src/rules/zh/development-workflow.md +44 -0
package/src/rules/zh/git-workflow.md +24 -0
package/src/rules/zh/hooks.md +30 -0
package/src/rules/zh/patterns.md +31 -0
package/src/rules/zh/performance.md +55 -0
package/src/rules/zh/security.md +29 -0
package/src/rules/zh/testing.md +29 -0
package/src/skills/autonomous-agent-harness/SKILL.md +267 -0
package/src/skills/autonomous-loops/SKILL.md +610 -0
package/src/skills/bun-runtime/SKILL.md +84 -0
package/src/skills/content-hash-cache-pattern/SKILL.md +161 -0
package/src/skills/context-budget/SKILL.md +3 -3
package/src/skills/continuous-learning-v2/SKILL.md +4 -4
package/src/skills/continuous-learning-v2/agents/observer.md +1 -1
package/src/skills/cost-aware-llm-pipeline/SKILL.md +183 -0
package/src/skills/design-system/SKILL.md +82 -0
package/src/skills/eval-harness/SKILL.md +270 -0
package/src/skills/flutter-dart-code-review/SKILL.md +435 -0
package/src/skills/gan-style-harness/SKILL.md +278 -0
package/src/skills/git-workflow/SKILL.md +715 -0
package/src/skills/hexagonal-architecture/SKILL.md +276 -0
package/src/skills/iterative-retrieval/SKILL.md +211 -0
package/src/skills/laravel-plugin-discovery/SKILL.md +229 -0
package/src/skills/nextjs-turbopack/SKILL.md +44 -0
package/src/skills/nuxt4-patterns/SKILL.md +100 -0
package/src/skills/opensource-pipeline/SKILL.md +255 -0
package/src/skills/perl-security/SKILL.md +503 -0
package/src/skills/project-flow-ops/SKILL.md +111 -0
package/src/skills/project-guidelines-example/SKILL.md +349 -0
package/src/skills/prompt-optimizer/SKILL.md +38 -38
package/src/skills/pytorch-patterns/SKILL.md +396 -0
package/src/skills/regex-vs-llm-structured-text/SKILL.md +220 -0
package/src/skills/repo-scan/SKILL.md +78 -0
package/src/skills/rules-distill/SKILL.md +264 -0
package/src/skills/rules-distill/scripts/scan-rules.sh +58 -0
package/src/skills/rules-distill/scripts/scan-skills.sh +129 -0
package/src/skills/swift-concurrency-6-2/SKILL.md +216 -0
package/src/skills/token-budget-advisor/SKILL.md +133 -0
package/src/skills/verification-loop/SKILL.md +1 -1
package/src/skills/workspace-surface-audit/SKILL.md +125 -0

package/src/skills/pytorch-patterns/SKILL.md ADDED

@@ -0,0 +1,396 @@
+---
+name: pytorch-patterns
+description: PyTorch deep learning patterns and best practices for building robust, efficient, and reproducible training pipelines, model architectures, and data loading.
+origin: ECC
+---
+# PyTorch Development Patterns
+Idiomatic PyTorch patterns and best practices for building robust, efficient, and reproducible deep learning applications.
+## When to Activate
+- Writing new PyTorch models or training scripts
+- Reviewing deep learning code
+- Debugging training loops or data pipelines
+- Optimizing GPU memory usage or training speed
+- Setting up reproducible experiments
+## Core Principles
+### 1. Device-Agnostic Code
+Always write code that works on both CPU and GPU without hardcoding devices.
+```python
+# Good: Device-agnostic
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model = MyModel().to(device)
+data = data.to(device)
+# Bad: Hardcoded device
+model = MyModel().cuda()  # Crashes if no GPU
+data = data.cuda()
+```
+### 2. Reproducibility First
+Set all random seeds for reproducible results.
+```python
+# Good: Full reproducibility setup
+import random
+import numpy as np
+import torch
+def set_seed(seed: int = 42) -> None:
+    torch.manual_seed(seed)
+    torch.cuda.manual_seed_all(seed)
+    np.random.seed(seed)
+    random.seed(seed)
+    torch.backends.cudnn.deterministic = True
+    torch.backends.cudnn.benchmark = False
+# Bad: No seed control
+model = MyModel()  # Different weights every run
+```
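Seeding the global RNGs does not by itself pin the shuffle order of a `DataLoader` or the NumPy/`random` state inside its worker processes. The following is a minimal sketch, not taken from the skill file, of the worker-seeding approach described in PyTorch's reproducibility notes. `seed_worker` and `make_loader` are illustrative names, and the toy `TensorDataset` is an assumption made to keep the example runnable on CPU.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id: int) -> None:
    # Inside a worker, torch.initial_seed() is already derived from the
    # loader's base seed, so NumPy and random can be seeded from it.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def make_loader(dataset, seed: int, num_workers: int = 0) -> DataLoader:
    # A dedicated generator fixes the shuffle order independently of
    # whatever else has consumed the global RNG.
    generator = torch.Generator()
    generator.manual_seed(seed)
    return DataLoader(
        dataset,
        batch_size=4,
        shuffle=True,
        num_workers=num_workers,
        worker_init_fn=seed_worker,  # only called when num_workers > 0
        generator=generator,
    )


dataset = TensorDataset(torch.arange(16))
order_a = [batch[0].tolist() for batch in make_loader(dataset, seed=42)]
order_b = [batch[0].tolist() for batch in make_loader(dataset, seed=42)]
print(order_a == order_b)
```

Two loaders built with the same seed should visit the samples in the same order, which is what the final comparison checks.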
+### 3. Explicit Shape Management
+Always document and verify tensor shapes.
+```python
+# Good: Shape-annotated forward pass
+def forward(self, x: torch.Tensor) -> torch.Tensor:
+    # x: (batch_size, channels, height, width)
+    x = self.conv1(x)    # -> (batch_size, 32, H, W)
+    x = self.pool(x)     # -> (batch_size, 32, H//2, W//2)
+    x = x.view(x.size(0), -1)  # -> (batch_size, 32 * (H//2) * (W//2))
+    return self.fc(x)    # -> (batch_size, num_classes)
+# Bad: No shape tracking
+def forward(self, x):
+    x = self.conv1(x)
+    x = self.pool(x)
+    x = x.view(x.size(0), -1)  # What size is this?
+    return self.fc(x)           # Will this even work?
+```
+## Model Architecture Patterns
+### Clean nn.Module Structure
+```python
+# Good: Well-organized module
+class ImageClassifier(nn.Module):
+    def __init__(self, num_classes: int, dropout: float = 0.5) -> None:
+        super().__init__()
+        self.features = nn.Sequential(
+            nn.Conv2d(3, 64, kernel_size=3, padding=1),
+            nn.BatchNorm2d(64),
+            nn.ReLU(inplace=True),
+            nn.MaxPool2d(2),
+        )
+        self.classifier = nn.Sequential(
+            nn.Dropout(dropout),
+            nn.Linear(64 * 16 * 16, num_classes),
+        )
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        x = self.features(x)
+        x = x.view(x.size(0), -1)
+        return self.classifier(x)
+# Bad: Everything in forward
+class ImageClassifier(nn.Module):
+    def __init__(self):
+        super().__init__()
+    def forward(self, x):
+        x = F.conv2d(x, weight=self.make_weight())  # Creates weight each call!
+        return x
+```
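The `nn.Linear(64 * 16 * 16, num_classes)` layer in the "Good" example fits only one input resolution. The following arithmetic sketch in plain Python is not from the skill file; it shows where `64 * 16 * 16` comes from, assuming 32x32 inputs and ignoring dilation. `conv_out` and `flattened_features` are illustrative names.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of one spatial dimension after a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(height: int, width: int, channels: int = 64) -> int:
    # Conv2d(kernel_size=3, padding=1) keeps the spatial size unchanged.
    h = conv_out(height, kernel=3, padding=1)
    w = conv_out(width, kernel=3, padding=1)
    # MaxPool2d(2) uses kernel=2 and stride=2, halving each dimension.
    h = conv_out(h, kernel=2, stride=2)
    w = conv_out(w, kernel=2, stride=2)
    return channels * h * w


print(flattened_features(32, 32))    # 16384, i.e. 64 * 16 * 16
print(flattened_features(224, 224))  # 802816, so the Linear layer would not fit
```

Running the same calculation for a different resolution makes it clear when the classifier's input size has to change, or when an adaptive pooling layer would be the simpler choice.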
+### Proper Weight Initialization
+```python
+# Good: Explicit initialization
+def _init_weights(self, module: nn.Module) -> None:
+    if isinstance(module, nn.Linear):
+        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
+        if module.bias is not None:
+            nn.init.zeros_(module.bias)
+    elif isinstance(module, nn.Conv2d):
+        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
+    elif isinstance(module, nn.BatchNorm2d):
+        nn.init.ones_(module.weight)
+        nn.init.zeros_(module.bias)
+model = MyModel()
+model.apply(model._init_weights)
+```
+## Training Loop Patterns
+### Standard Training Loop
+```python
+# Good: Complete training loop with best practices
+def train_one_epoch(
+    model: nn.Module,
+    dataloader: DataLoader,
+    optimizer: torch.optim.Optimizer,
+    criterion: nn.Module,
+    device: torch.device,
+    scaler: torch.amp.GradScaler | None = None,
+) -> float:
+    model.train()  # Always set train mode
+    total_loss = 0.0
+    for batch_idx, (data, target) in enumerate(dataloader):
+        data, target = data.to(device), target.to(device)
+        optimizer.zero_grad(set_to_none=True)  # Frees grads instead of zeroing them (the default since PyTorch 2.0)
+        # Mixed precision training
+        with torch.amp.autocast("cuda", enabled=scaler is not None):
+            output = model(data)
+            loss = criterion(output, target)
+        if scaler is not None:
+            scaler.scale(loss).backward()
+            scaler.unscale_(optimizer)
+            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
+            scaler.step(optimizer)
+            scaler.update()
+        else:
+            loss.backward()
+            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
+            optimizer.step()
+        total_loss += loss.item()
+    return total_loss / len(dataloader)
+```
+### Validation Loop
+```python
+# Good: Proper evaluation
+@torch.no_grad()  # Equivalent to wrapping the body in a torch.no_grad() block, but harder to forget
+def evaluate(
+    model: nn.Module,
+    dataloader: DataLoader,
+    criterion: nn.Module,
+    device: torch.device,
+) -> tuple[float, float]:
+    model.eval()  # Always set eval mode — disables dropout, uses running BN stats
+    total_loss = 0.0
+    correct = 0
+    total = 0
+    for data, target in dataloader:
+        data, target = data.to(device), target.to(device)
+        output = model(data)
+        total_loss += criterion(output, target).item()
+        correct += (output.argmax(1) == target).sum().item()
+        total += target.size(0)
+    return total_loss / len(dataloader), correct / total
+```
+## Data Pipeline Patterns
+### Custom Dataset
+```python
+# Good: Clean Dataset with type hints
+class ImageDataset(Dataset):
+    def __init__(
+        self,
+        image_dir: str,
+        labels: dict[str, int],
+        transform: transforms.Compose | None = None,
+    ) -> None:
+        self.image_paths = list(Path(image_dir).glob("*.jpg"))
+        self.labels = labels
+        self.transform = transform
+    def __len__(self) -> int:
+        return len(self.image_paths)
+    def __getitem__(self, idx: int) -> tuple[torch.Tensor, int]:
+        img = Image.open(self.image_paths[idx]).convert("RGB")
+        label = self.labels[self.image_paths[idx].stem]
+        if self.transform:
+            img = self.transform(img)
+        return img, label
+```
+### Efficient DataLoader Configuration
+```python
+# Good: Optimized DataLoader
+dataloader = DataLoader(
+    dataset,
+    batch_size=32,
+    shuffle=True,            # Shuffle for training
+    num_workers=4,           # Parallel data loading
+    pin_memory=True,         # Faster CPU->GPU transfer
+    persistent_workers=True, # Keep workers alive between epochs
+    drop_last=True,          # Consistent batch sizes for BatchNorm
+)
+# Bad: Slow defaults
+dataloader = DataLoader(dataset, batch_size=32)  # num_workers=0, no pin_memory
+```
+### Custom Collate for Variable-Length Data
+```python
+# Good: Pad sequences in collate_fn
+def collate_fn(batch: list[tuple[torch.Tensor, int]]) -> tuple[torch.Tensor, torch.Tensor]:
+    sequences, labels = zip(*batch)
+    # Pad to max length in batch
+    padded = nn.utils.rnn.pad_sequence(sequences, batch_first=True, padding_value=0)
+    return padded, torch.tensor(labels)
+dataloader = DataLoader(dataset, batch_size=32, collate_fn=collate_fn)
+```
+## Checkpointing Patterns
+### Save and Load Checkpoints
+```python
+# Good: Complete checkpoint with all training state
+def save_checkpoint(
+    model: nn.Module,
+    optimizer: torch.optim.Optimizer,
+    epoch: int,
+    loss: float,
+    path: str,
+) -> None:
+    torch.save({
+        "epoch": epoch,
+        "model_state_dict": model.state_dict(),
+        "optimizer_state_dict": optimizer.state_dict(),
+        "loss": loss,
+    }, path)
+def load_checkpoint(
+    path: str,
+    model: nn.Module,
+    optimizer: torch.optim.Optimizer | None = None,
+) -> dict:
+    checkpoint = torch.load(path, map_location="cpu", weights_only=True)
+    model.load_state_dict(checkpoint["model_state_dict"])
+    if optimizer:
+        optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
+    return checkpoint
+# Bad: Only saving model weights (can't resume training)
+torch.save(model.state_dict(), "model.pt")
+```
+## Performance Optimization
+### Mixed Precision Training
+```python
+# Good: AMP with GradScaler
+scaler = torch.amp.GradScaler("cuda")
+for data, target in dataloader:
+    with torch.amp.autocast("cuda"):
+        output = model(data)
+        loss = criterion(output, target)
+    scaler.scale(loss).backward()
+    scaler.step(optimizer)
+    scaler.update()
+    optimizer.zero_grad(set_to_none=True)
+```
+### Gradient Checkpointing for Large Models
+```python
+# Good: Trade compute for memory
+from torch.utils.checkpoint import checkpoint
+class LargeModel(nn.Module):
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        # Recompute activations during backward to save memory
+        x = checkpoint(self.block1, x, use_reentrant=False)
+        x = checkpoint(self.block2, x, use_reentrant=False)
+        return self.head(x)
+```
+### torch.compile for Speed
+```python
+# Good: Compile the model for faster execution (PyTorch 2.0+)
+model = MyModel().to(device)
+model = torch.compile(model, mode="reduce-overhead")
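+# Modes: "default" (balanced compile time and speed), "reduce-overhead" (cuts Python overhead
+# with CUDA graphs, may use more memory), "max-autotune" (longest compile, tunes kernels)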
+```
+## Quick Reference: PyTorch Idioms
+| Idiom | Description |
+|-------|-------------|
+| `model.train()` / `model.eval()` | Always set mode before train/eval |
+| `torch.no_grad()` | Disable gradients for inference |
+| `optimizer.zero_grad(set_to_none=True)` | Frees gradients instead of zeroing them (default since 2.0) |
+| `.to(device)` | Device-agnostic tensor/model placement |
+| `torch.amp.autocast` | Mixed precision; the speedup depends on model and hardware |
+| `pin_memory=True` | Faster CPU→GPU data transfer |
+| `torch.compile` | JIT compilation for speed (2.0+) |
+| `weights_only=True` | Secure model loading |
+| `torch.manual_seed` | Reproducible experiments |
+| `torch.utils.checkpoint.checkpoint` | Gradient checkpointing: trade compute for memory |
+## Anti-Patterns to Avoid
+```python
+# Bad: Forgetting model.eval() during validation
+model.train()
+with torch.no_grad():
+    output = model(val_data)  # Dropout still active! BatchNorm uses batch stats!
+# Good: Always set eval mode
+model.eval()
+with torch.no_grad():
+    output = model(val_data)
+# Bad: In-place operations breaking autograd
+x = F.relu(x, inplace=True)  # Can break gradient computation
+x += residual                  # In-place add breaks autograd graph
+# Good: Out-of-place operations
+x = F.relu(x)
+x = x + residual
+# Bad: Moving the model to the GPU inside the training loop on every iteration
+for data, target in dataloader:
+    model = model.cuda()  # Moves model EVERY iteration!
+# Good: Move model once before the loop
+model = model.to(device)
+for data, target in dataloader:
+    data, target = data.to(device), target.to(device)
+# Bad: Using .item() before backward
+loss = criterion(output, target).item()  # Returns a plain Python float, detached from the graph
+loss.backward()  # AttributeError: 'float' object has no attribute 'backward'
+# Good: Call .item() only for logging
+loss = criterion(output, target)
+loss.backward()
+print(f"Loss: {loss.item():.4f}")  # .item() after backward is fine
+# Bad: Not using torch.save properly
+torch.save(model, "model.pt")  # Saves entire model (fragile, not portable)
+# Good: Save state_dict
+torch.save(model.state_dict(), "model.pt")
+```
+__Remember__: PyTorch code should be device-agnostic, reproducible, and memory-conscious. When in doubt, profile with `torch.profiler` and check GPU memory with `torch.cuda.memory_summary()`.

package/src/skills/regex-vs-llm-structured-text/SKILL.md ADDED

@@ -0,0 +1,220 @@
+---
+name: regex-vs-llm-structured-text
+description: Decision framework for choosing between regex and LLM when parsing structured text — start with regex, add LLM only for low-confidence edge cases.
+origin: ECC
+---
+# Regex vs LLM for Structured Text Parsing
+A practical decision framework for parsing structured text (quizzes, forms, invoices, documents). The key insight: regex handles 95-98% of cases cheaply and deterministically. Reserve expensive LLM calls for the remaining edge cases.
+## When to Activate
+- Parsing structured text with repeating patterns (questions, forms, tables)
+- Deciding between regex and LLM for text extraction
+- Building hybrid pipelines that combine both approaches
+- Optimizing cost/accuracy tradeoffs in text processing
+## Decision Framework
+```
+Is the text format consistent and repeating?
+├── Yes (>90% follows a pattern) → Start with Regex
+│   ├── Regex handles 95%+ → Done, no LLM needed
+│   └── Regex handles <95% → Add LLM for edge cases only
+└── No (free-form, highly variable) → Use LLM directly
+```
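The decision tree above can be written as a small function. This is a sketch only: `choose_strategy` and its return labels are hypothetical names, and the 0.90 and 0.95 cut-offs are copied from the tree rather than derived from any measurement.

```python
from typing import Optional


def choose_strategy(pattern_share: float, regex_success: Optional[float] = None) -> str:
    """Pick a parsing strategy from two fractions in the range 0..1.

    pattern_share: share of the text that follows a repeating pattern.
    regex_success: share of items the regex parses cleanly, once measured.
    """
    if pattern_share <= 0.90:
        return "llm"  # free-form or highly variable text
    if regex_success is None:
        return "regex"  # start with regex, then measure
    if regex_success >= 0.95:
        return "regex"  # no LLM needed
    return "regex+llm"  # LLM only for the edge cases


print(choose_strategy(0.97))        # regex
print(choose_strategy(0.97, 0.98))  # regex
print(choose_strategy(0.97, 0.80))  # regex+llm
print(choose_strategy(0.40))        # llm
```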
+## Architecture Pattern
+```
+Source Text
+    │
+    ▼
+[Regex Parser] ─── Extracts structure (95-98% accuracy)
+    │
+    ▼
+[Text Cleaner] ─── Removes noise (markers, page numbers, artifacts)
+    │
+    ▼
+[Confidence Scorer] ─── Flags low-confidence extractions
+    │
+    ├── High confidence (≥0.95) → Direct output
+    │
+    └── Low confidence (<0.95) → [LLM Validator] → Output
+```
+## Implementation
+### 1. Regex Parser (Handles the Majority)
+```python
+import re
+from dataclasses import dataclass
+@dataclass(frozen=True)
+class ParsedItem:
+    id: str
+    text: str
+    choices: tuple[str, ...]
+    answer: str
+    confidence: float = 1.0
+def parse_structured_text(content: str) -> list[ParsedItem]:
+    """Parse structured text using regex patterns."""
+    pattern = re.compile(
+        r"(?P<id>\d+)\.\s*(?P<text>.+?)\n"
+        r"(?P<choices>(?:[A-D]\..+?\n)+)"
+        r"Answer:\s*(?P<answer>[A-D])",
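+        # DOTALL lets question text span lines, but a malformed item can then
+        # absorb the lines of the item that follows it.
+        re.MULTILINE | re.DOTALL,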
+    )
+    items = []
+    for match in pattern.finditer(content):
+        choices = tuple(
+            c.strip() for c in re.findall(r"[A-D]\.\s*(.+)", match.group("choices"))
+        )
+        items.append(ParsedItem(
+            id=match.group("id"),
+            text=match.group("text").strip(),
+            choices=choices,
+            answer=match.group("answer"),
+        ))
+    return items
+```
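Items the regex cannot match never reach the confidence scorer, so it helps to measure how much of the input was actually parsed. The sketch below is not the skill's parser but a stricter variant, assuming each question and each choice fits on a single line; `parse_strict` and `coverage` are illustrative names. It anchors each item at the start of a line and compares the number of parsed items with the number of numbered lines.

```python
import re

# Stricter variant: no DOTALL, and every item must start at a line start.
ITEM = re.compile(
    r"^(?P<id>\d+)\.[ \t]*(?P<text>.+)\n"
    r"(?P<choices>(?:[A-D]\..+\n)+)"
    r"Answer:[ \t]*(?P<answer>[A-D])",
    re.MULTILINE,
)
CHOICE = re.compile(r"^[A-D]\.\s*(.+)", re.MULTILINE)
NUMBERED_LINE = re.compile(r"^\d+\.", re.MULTILINE)


def parse_strict(content: str) -> list:
    """Return one dict per well-formed item; malformed items are skipped."""
    items = []
    for match in ITEM.finditer(content):
        choices = tuple(c.strip() for c in CHOICE.findall(match.group("choices")))
        items.append({
            "id": match.group("id"),
            "text": match.group("text").strip(),
            "choices": choices,
            "answer": match.group("answer"),
        })
    return items


def coverage(content: str, items: list) -> float:
    """Share of numbered lines that produced a parsed item."""
    expected = len(NUMBERED_LINE.findall(content))
    return len(items) / expected if expected else 1.0


SAMPLE = (
    "1. What is 2 + 2?\nA. 3\nB. 4\nC. 5\nAnswer: B\n"
    "2. Capital of France?\nA) Paris\nB) Lyon\nAnswer: A\n"  # wrong choice markers
    "3. Largest planet?\nA. Mars\nB. Jupiter\nC. Venus\nAnswer: B\n"
)

items = parse_strict(SAMPLE)
print([item["id"] for item in items])     # ['1', '3']
print(round(coverage(SAMPLE, items), 2))  # 0.67
```

Item 2 uses `A)` instead of `A.`, so it is skipped rather than merged into item 3, and the coverage figure exposes the gap.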
+### 2. Confidence Scoring
+Flag items that may need LLM review:
+```python
+@dataclass(frozen=True)
+class ConfidenceFlag:
+    item_id: str
+    score: float
+    reasons: tuple[str, ...]
+def score_confidence(item: ParsedItem) -> ConfidenceFlag:
+    """Score extraction confidence and flag issues."""
+    reasons = []
+    score = 1.0
+    if len(item.choices) < 3:
+        reasons.append("few_choices")
+        score -= 0.3
+    if not item.answer:
+        reasons.append("missing_answer")
+        score -= 0.5
+    if len(item.text) < 10:
+        reasons.append("short_text")
+        score -= 0.2
+    return ConfidenceFlag(
+        item_id=item.id,
+        score=max(0.0, score),
+        reasons=tuple(reasons),
+    )
+def identify_low_confidence(
+    items: list[ParsedItem],
+    threshold: float = 0.95,
+) -> list[ConfidenceFlag]:
+    """Return items below confidence threshold."""
+    flags = [score_confidence(item) for item in items]
+    return [f for f in flags if f.score < threshold]
+```
+### 3. LLM Validator (Edge Cases Only)
+```python
+def validate_with_llm(
+    item: ParsedItem,
+    original_text: str,
+    client,
+) -> ParsedItem:
+    """Use LLM to fix low-confidence extractions."""
+    response = client.messages.create(
+        model="claude-haiku-4-5-20251001",  # Cheapest model for validation
+        max_tokens=500,
+        messages=[{
+            "role": "user",
+            "content": (
+                f"Extract the question, choices, and answer from this text.\n\n"
+                f"Text: {original_text}\n\n"
+                f"Current extraction: {item}\n\n"
+                f"Return corrected JSON if needed, or 'CORRECT' if accurate."
+            ),
+        }],
+    )
+    # Parse LLM response and return corrected item...
+    return corrected_item
+```
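The response-parsing step elided in `validate_with_llm` could look roughly like the sketch below. Everything about the reply format is an assumption: the prompt above does not define a JSON schema, so the `text`, `choices`, and `answer` keys are hypothetical, as is the helper name `apply_llm_reply`. With the Anthropic Python SDK the reply string would typically be read from `response.content[0].text`.

```python
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ParsedItem:
    id: str
    text: str
    choices: tuple
    answer: str
    confidence: float = 1.0


def apply_llm_reply(item: ParsedItem, reply: str) -> ParsedItem:
    """Return a corrected copy of item, or item itself if nothing changes."""
    cleaned = reply.strip()
    if cleaned.strip("'\"").upper() == "CORRECT":
        return item
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        return item  # keep the regex result when the reply is unusable
    if not isinstance(data, dict):
        return item
    # Assumed keys; missing ones fall back to the regex extraction.
    return replace(
        item,
        text=str(data.get("text", item.text)),
        choices=tuple(data.get("choices", item.choices)),
        answer=str(data.get("answer", item.answer)),
    )


original = ParsedItem(id="7", text="Capital of France", choices=("Paris", "Lyon"), answer="B")
fixed = apply_llm_reply(original, '{"answer": "A"}')
print(fixed.answer, original.answer)  # A B
```

Using `dataclasses.replace` returns a new instance and leaves the frozen original untouched, in line with the "never mutate" guidance.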
+### 4. Hybrid Pipeline
+```python
+def process_document(
+    content: str,
+    *,
+    llm_client=None,
+    confidence_threshold: float = 0.95,
+) -> list[ParsedItem]:
+    """Full pipeline: regex -> confidence check -> LLM for edge cases."""
+    # Step 1: Regex extraction (handles 95-98%)
+    items = parse_structured_text(content)
+    # Step 2: Confidence scoring
+    low_confidence = identify_low_confidence(items, confidence_threshold)
+    if not low_confidence or llm_client is None:
+        return items
+    # Step 3: LLM validation (only for flagged items)
+    low_conf_ids = {f.item_id for f in low_confidence}
+    result = []
+    for item in items:
+        if item.id in low_conf_ids:
+            # Note: this passes the whole document as context for every flagged item
+            result.append(validate_with_llm(item, content, llm_client))
+        else:
+            result.append(item)
+    return result
+```
+## Real-World Metrics
+From a production quiz parsing pipeline (410 items):
+| Metric | Value |
+|--------|-------|
+| Regex success rate | 98.0% |
+| Low confidence items | 8 (2.0%) |
+| LLM calls needed | ~5 |
+| Cost savings vs all-LLM | ~95% |
+| Test coverage | 93% |
+## Best Practices
+- **Start with regex** — even imperfect regex gives you a baseline to improve
+- **Use confidence scoring** to programmatically identify what needs LLM help
+- **Use the cheapest LLM** for validation (Haiku-class models are sufficient)
+- **Never mutate** parsed items — return new instances from cleaning/validation steps
+- **TDD works well** for parsers — write tests for known patterns first, then edge cases
+- **Log metrics** (regex success rate, LLM call count) to track pipeline health
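The metrics named in the last bullet can be tracked with a small record type. This is a sketch, not part of the skill: `PipelineMetrics` and its fields are hypothetical, and defining the regex success rate as the share of items not flagged low-confidence is an assumption, although it reproduces the 98.0% in the table above (8 flagged out of 410).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineMetrics:
    total_items: int
    low_confidence: int
    llm_calls: int

    @property
    def regex_success_rate(self) -> float:
        # Share of items that did not need a second look.
        if self.total_items == 0:
            return 0.0
        return 1 - self.low_confidence / self.total_items

    @property
    def llm_call_rate(self) -> float:
        return self.llm_calls / self.total_items if self.total_items else 0.0

    def summary(self) -> str:
        return (
            f"items={self.total_items} "
            f"regex_success={self.regex_success_rate:.1%} "
            f"llm_calls={self.llm_calls} ({self.llm_call_rate:.1%})"
        )


metrics = PipelineMetrics(total_items=410, low_confidence=8, llm_calls=5)
print(metrics.summary())
# items=410 regex_success=98.0% llm_calls=5 (1.2%)
```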
+## Anti-Patterns to Avoid
+- Sending all text to an LLM when regex handles 95%+ of cases (expensive and slow)
+- Using regex for free-form, highly variable text (LLM is better here)
+- Skipping confidence scoring and hoping regex "just works"
+- Mutating parsed objects during cleaning/validation steps
+- Not testing edge cases (malformed input, missing fields, encoding issues)
+## When to Use
+- Quiz/exam question parsing
+- Form data extraction
+- Invoice/receipt processing
+- Document structure parsing (headers, sections, tables)
+- Any structured text with repeating patterns where cost matters

package/src/skills/repo-scan/SKILL.md ADDED

@@ -0,0 +1,78 @@
+---
+name: repo-scan
+description: Cross-stack source code asset audit — classifies every file, detects embedded third-party libraries, and delivers actionable four-level verdicts per module with interactive HTML reports.
+origin: community
+---
+# repo-scan
+> Every ecosystem has its own dependency manager, but no tool looks across C++, Android, iOS, and Web to tell you: how much code is actually yours, what's third-party, and what's dead weight.
+## When to Use
+- Taking over a large legacy codebase and need a structural overview
+- Before major refactoring — identify what's core, what's duplicate, what's dead
+- Auditing third-party dependencies embedded directly in source (not declared in package managers)
+- Preparing architecture decision records for monorepo reorganization
+## Installation
+```bash
+# Fetch only the pinned commit for reproducibility
+mkdir -p ~/.claude/skills/repo-scan
+git init repo-scan
+cd repo-scan
+git remote add origin https://github.com/haibindev/repo-scan.git
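+# Note: git fetch generally needs the full 40-character commit SHA; the short hash is shown as published
+git fetch --depth 1 origin 2742664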
+git checkout --detach FETCH_HEAD
+cp -r . ~/.claude/skills/repo-scan
+```
+> Review the source before installing any agent skill.
+## Core Capabilities
+| Capability | Description |
+|---|---|
+| **Cross-stack scanning** | C/C++, Java/Android, iOS (OC/Swift), Web (TS/JS/Vue) in one pass |
+| **File classification** | Every file tagged as project code, third-party, or build artifact |
+| **Library detection** | 50+ known libraries (FFmpeg, Boost, OpenSSL…) with version extraction |
+| **Four-level verdicts** | Core Asset / Extract & Merge / Rebuild / Deprecate |
+| **HTML reports** | Interactive dark-theme pages with drill-down navigation |
+| **Monorepo support** | Hierarchical scanning with summary + sub-project reports |
+## Analysis Depth Levels
+| Level | Files Read | Use Case |
+|---|---|---|
+| `fast` | 1-2 per module | Quick inventory of huge directories |
+| `standard` | 2-5 per module | Default audit with full dependency + architecture checks |
+| `deep` | 5-10 per module | Adds thread safety, memory management, API consistency |
+| `full` | All files | Pre-merge comprehensive review |
+## How It Works
+1. **Classify the repo surface**: enumerate files, then tag each as project code, embedded third-party code, or build artifact.
+2. **Detect embedded libraries**: inspect directory names, headers, license files, and version markers to identify bundled dependencies and likely versions.
+3. **Score each module**: group files by module or subsystem, then assign one of the four verdicts based on ownership, duplication, and maintenance cost.
+4. **Highlight structural risks**: call out dead-weight artifacts, duplicated wrappers, outdated vendored code, and modules that should be extracted, rebuilt, or deprecated.
+5. **Produce the report**: return a concise summary plus the interactive HTML output with per-module drill-down so the audit can be reviewed asynchronously.
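Step 1 above can be illustrated with a path-based heuristic. This is not repo-scan's implementation, which the skill file does not show; it is a rough sketch in which the directory names, file suffixes, and the `classify` function are all assumptions chosen for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical heuristics; a real audit would also inspect headers and licenses.
THIRD_PARTY_DIRS = {"third_party", "3rdparty", "vendor", "external", "node_modules"}
ARTIFACT_DIRS = {"Debug", "Release", "ipch", "obj"}
ARTIFACT_SUFFIXES = {".obj", ".o", ".pdb", ".pch", ".ipch"}


def classify(path: str) -> str:
    """Tag a repo-relative path as project, third_party, or build_artifact."""
    p = PurePosixPath(path)
    parent_dirs = set(p.parts[:-1])
    if parent_dirs & ARTIFACT_DIRS or p.suffix.lower() in ARTIFACT_SUFFIXES:
        return "build_artifact"
    if parent_dirs & THIRD_PARTY_DIRS:
        return "third_party"
    return "project"


paths = [
    "src/player/decoder.cpp",
    "third_party/ffmpeg/libavcodec/avcodec.h",
    "Debug/ipch/app.ipch",
    "src/player/obj/decoder.obj",
]
counts = Counter(classify(p) for p in paths)
print(dict(counts))
# {'project': 1, 'third_party': 1, 'build_artifact': 2}
```

Build artifacts are checked first so that compiled output under a vendored directory is still counted as dead weight rather than as third-party source.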
+## Examples
+On a 50,000-file C++ monorepo:
+- Found FFmpeg 2.x (2015 vintage) still in production
+- Discovered the same SDK wrapper duplicated 3 times
+- Identified 636 MB of committed Debug/ipch/obj build artifacts
+- Classified: 3 MB project code vs 596 MB third-party
+## Best Practices
+- Start with `standard` depth for first-time audits
+- Use `fast` for monorepos with 100+ modules to get a quick inventory
+- Run `deep` incrementally on modules flagged for refactoring
+- Review the cross-module analysis for duplicate detection across sub-projects
+## Links
+- [GitHub Repository](https://github.com/haibindev/repo-scan)