PyPI - buildlog - version diff: 0.6.1 (tar.gz) → 0.7.0 (tar.gz)

Files changed (41)

{buildlog-0.6.1 → buildlog-0.7.0}/.gitignore RENAMED

@@ -44,3 +44,11 @@ htmlcov/
 # Build artifacts
 *.whl
+# Development artifacts
+CHAT.txt
+results/
+sketches/
+# buildlog runtime data (in project root, not in src/)
+buildlog/.buildlog/

{buildlog-0.6.1 → buildlog-0.7.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: buildlog
-Version: 0.6.1
+Version: 0.7.0
 Summary: Engineering notebook for AI-assisted development
 Project-URL: Homepage, https://github.com/Peleke/buildlog-template
 Project-URL: Repository, https://github.com/Peleke/buildlog-template
@@ -123,11 +123,30 @@ RMR is not the only metric that matters. But it's one we can measure, and measur
 ## The Mechanism
-buildlog uses **contextual bandits** to select which rules to surface.
+buildlog is building toward **contextual bandits** for automatic rule selection. Here's where we are:
+### What Exists Today (v0.7)
 ```
 ┌─────────────────────────────────────────────────────────────────┐
-│                    CONTEXTUAL BANDIT SETUP                      │
+│                    CURRENT INFRASTRUCTURE                       │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                 │
+│  ✅ Rule extraction     From entries, reviews, curated seeds   │
+│  ✅ Confidence scoring  Frequency + recency based              │
+│  ✅ Reward logging      Accept/reject/revision signals         │
+│  ✅ Experiment tracking Sessions, mistakes, RMR calculation    │
+│  ✅ Review gauntlet     Curated persona-based code review      │
+│  ⏳ Manual promotion    Human selects rules to surface         │
+│                                                                 │
+└─────────────────────────────────────────────────────────────────┘
+```
+### What's Coming (v0.8+)
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                    CONTEXTUAL BANDIT (PLANNED)                  │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                 │
 │  Context (c):     Error class, file type, task category        │
@@ -147,9 +166,9 @@ buildlog uses **contextual bandits** to select which rules to surface.
 **Reward** = did surfacing this rule actually help?
-The system explores (tries uncertain rules) and exploits (uses proven rules) based on accumulated evidence. Thompson Sampling provides theoretical guarantees: O(√(KT log K)) regret bounds.
+The reward infrastructure exists. The bandit policy is next. Thompson Sampling will provide theoretical guarantees: O(√(KT log K)) regret bounds.
-This isn't magic. It's a well-understood framework with decades of research. We're applying it to agent rule selection.
+We're building in public—the bandit implementation will be developed with full documentation of the process.
 ---
@@ -161,16 +180,20 @@ buildlog captures signal at every stage:
 flowchart LR
     A["Work Sessions"] --> B["Structured Entries"]
     B --> C["Extracted Rules"]
-    C --> D["Bandit Selection"]
+    C --> D["Manual Promotion"]
     D --> E["Rule Surfaced"]
     E --> F["Human Feedback"]
-    F --> G["Posterior Update"]
-    G --> D
+    F --> G["Reward Logged"]
+    G -.-> H["Bandit Policy"]
+    H -.-> D
     style F fill:#ff6b6b,color:#fff
     style G fill:#4ecdc4,color:#fff
+    style H fill:#666,color:#fff,stroke-dasharray: 5 5
 ```
+*Dashed: Coming in v0.8 — automatic rule selection via Thompson Sampling*
 ### Stage 1: Capture
 Document your work. Include the fuckups—they're the most valuable signal.
@@ -269,6 +292,27 @@ buildlog gauntlet rules --format markdown -o review_checklist.md
 buildlog gauntlet learn review_issues.json --source "PR#42"
 ```
+### Gauntlet Loop (Agent Integration)
+For AI agents, the gauntlet loop automates the fix-rerun cycle:
+```bash
+buildlog gauntlet loop src/ --persona security_karen --persona test_terrorist
+```
+The loop provides structured checkpoints:
+| Severity | Action | Human Needed? |
+|----------|--------|---------------|
+| **Critical** | Agent fixes, reruns | No |
+| **Major** | Checkpoint: continue? | Yes |
+| **Minor** | Accept risk or fix? | Yes |
+| **Clean** | Done | No |
+MCP tools for agent integration:
+- `buildlog_gauntlet_issues` — Report findings, get next action
+- `buildlog_gauntlet_accept_risk` — Accept remaining issues (optionally create GitHub issues)
 The gauntlet integrates with the learning loop—issues found become rules that accumulate confidence.
 ---
@@ -359,6 +403,8 @@ Available tools:
 | `buildlog_start_session` | Begin tracked experiment |
 | `buildlog_log_mistake` | Record mistake during session |
 | `buildlog_experiment_report` | Full experiment report |
+| `buildlog_gauntlet_issues` | Report gauntlet findings, get next action |
+| `buildlog_gauntlet_accept_risk` | Accept remaining issues, optionally create GH issues |
 ### CLI Commands
@@ -382,6 +428,7 @@ buildlog gauntlet list           # Show reviewers
 buildlog gauntlet rules          # Export rules
 buildlog gauntlet prompt <path>  # Generate review prompt
 buildlog gauntlet learn <file>   # Persist learnings
+buildlog gauntlet loop <path>    # Auto-fix loop with HITL checkpoints
 ```
 ---
@@ -421,21 +468,28 @@ This is how you know. Not vibes. Data.
 For the technically curious:
-| Concept | Application in buildlog |
-|---------|------------------------|
-| **Thompson Sampling** | Rule selection under uncertainty |
-| **Beta-Bernoulli model** | Posterior updates from binary reward |
-| **Contextual bandits** | Context-dependent rule selection |
-| **Regret bounds** | O(√(KT log K)) theoretical guarantee |
-| **Semantic hashing** | Mistake deduplication for RMR |
+| Concept | Application in buildlog | Status |
+|---------|------------------------|--------|
+| **Confidence scoring** | Frequency + recency decay | ✅ Implemented |
+| **Semantic hashing** | Mistake deduplication for RMR | ✅ Implemented |
+| **Reward signals** | Binary feedback infrastructure | ✅ Implemented |
+| **Thompson Sampling** | Rule selection under uncertainty | ⏳ Planned (v0.8) |
+| **Beta-Bernoulli model** | Posterior updates from binary reward | ⏳ Planned (v0.8) |
+| **Contextual bandits** | Context-dependent rule selection | ⏳ Planned (v0.8) |
+| **Regret bounds** | O(√(KT log K)) theoretical guarantee | ⏳ Planned (v0.8) |
-We're not inventing new math. We're applying proven frameworks to a new domain.
+We're not inventing new math. We're applying proven frameworks to a new domain. The infrastructure for reward collection is live; the bandit policy is the next milestone.
 ---
 ## Honest Limitations
-Things we don't have figured out yet:
+### Not Yet Implemented
+- **Automatic rule selection**: Currently manual promotion; Thompson Sampling bandit planned for v0.8
+- **Context-aware surfacing**: Rules are surfaced globally, not based on task context
+### Hard Problems We're Working On
 - **Credit assignment**: When multiple rules are active, which one helped?
 - **Non-stationarity**: Developer skill changes over time

{buildlog-0.6.1 → buildlog-0.7.0}/README.md RENAMED

@@ -75,11 +75,30 @@ RMR is not the only metric that matters. But it's one we can measure, and measur
 ## The Mechanism
-buildlog uses **contextual bandits** to select which rules to surface.
+buildlog is building toward **contextual bandits** for automatic rule selection. Here's where we are:
+### What Exists Today (v0.7)
 ```
 ┌─────────────────────────────────────────────────────────────────┐
-│                    CONTEXTUAL BANDIT SETUP                      │
+│                    CURRENT INFRASTRUCTURE                       │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                 │
+│  ✅ Rule extraction     From entries, reviews, curated seeds   │
+│  ✅ Confidence scoring  Frequency + recency based              │
+│  ✅ Reward logging      Accept/reject/revision signals         │
+│  ✅ Experiment tracking Sessions, mistakes, RMR calculation    │
+│  ✅ Review gauntlet     Curated persona-based code review      │
+│  ⏳ Manual promotion    Human selects rules to surface         │
+│                                                                 │
+└─────────────────────────────────────────────────────────────────┘
+```
+### What's Coming (v0.8+)
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                    CONTEXTUAL BANDIT (PLANNED)                  │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                 │
 │  Context (c):     Error class, file type, task category        │
@@ -99,9 +118,9 @@ buildlog uses **contextual bandits** to select which rules to surface.
 **Reward** = did surfacing this rule actually help?
-The system explores (tries uncertain rules) and exploits (uses proven rules) based on accumulated evidence. Thompson Sampling provides theoretical guarantees: O(√(KT log K)) regret bounds.
+The reward infrastructure exists. The bandit policy is next. Thompson Sampling will provide theoretical guarantees: O(√(KT log K)) regret bounds.
-This isn't magic. It's a well-understood framework with decades of research. We're applying it to agent rule selection.
+We're building in public—the bandit implementation will be developed with full documentation of the process.
 ---
@@ -113,16 +132,20 @@ buildlog captures signal at every stage:
 flowchart LR
     A["Work Sessions"] --> B["Structured Entries"]
     B --> C["Extracted Rules"]
-    C --> D["Bandit Selection"]
+    C --> D["Manual Promotion"]
     D --> E["Rule Surfaced"]
     E --> F["Human Feedback"]
-    F --> G["Posterior Update"]
-    G --> D
+    F --> G["Reward Logged"]
+    G -.-> H["Bandit Policy"]
+    H -.-> D
     style F fill:#ff6b6b,color:#fff
     style G fill:#4ecdc4,color:#fff
+    style H fill:#666,color:#fff,stroke-dasharray: 5 5
 ```
+*Dashed: Coming in v0.8 — automatic rule selection via Thompson Sampling*
 ### Stage 1: Capture
 Document your work. Include the fuckups—they're the most valuable signal.
@@ -221,6 +244,27 @@ buildlog gauntlet rules --format markdown -o review_checklist.md
 buildlog gauntlet learn review_issues.json --source "PR#42"
 ```
+### Gauntlet Loop (Agent Integration)
+For AI agents, the gauntlet loop automates the fix-rerun cycle:
+```bash
+buildlog gauntlet loop src/ --persona security_karen --persona test_terrorist
+```
+The loop provides structured checkpoints:
+| Severity | Action | Human Needed? |
+|----------|--------|---------------|
+| **Critical** | Agent fixes, reruns | No |
+| **Major** | Checkpoint: continue? | Yes |
+| **Minor** | Accept risk or fix? | Yes |
+| **Clean** | Done | No |
+MCP tools for agent integration:
+- `buildlog_gauntlet_issues` — Report findings, get next action
+- `buildlog_gauntlet_accept_risk` — Accept remaining issues (optionally create GitHub issues)
 The gauntlet integrates with the learning loop—issues found become rules that accumulate confidence.
 ---
@@ -311,6 +355,8 @@ Available tools:
 | `buildlog_start_session` | Begin tracked experiment |
 | `buildlog_log_mistake` | Record mistake during session |
 | `buildlog_experiment_report` | Full experiment report |
+| `buildlog_gauntlet_issues` | Report gauntlet findings, get next action |
+| `buildlog_gauntlet_accept_risk` | Accept remaining issues, optionally create GH issues |
 ### CLI Commands
@@ -334,6 +380,7 @@ buildlog gauntlet list           # Show reviewers
 buildlog gauntlet rules          # Export rules
 buildlog gauntlet prompt <path>  # Generate review prompt
 buildlog gauntlet learn <file>   # Persist learnings
+buildlog gauntlet loop <path>    # Auto-fix loop with HITL checkpoints
 ```
 ---
@@ -373,21 +420,28 @@ This is how you know. Not vibes. Data.
 For the technically curious:
-| Concept | Application in buildlog |
-|---------|------------------------|
-| **Thompson Sampling** | Rule selection under uncertainty |
-| **Beta-Bernoulli model** | Posterior updates from binary reward |
-| **Contextual bandits** | Context-dependent rule selection |
-| **Regret bounds** | O(√(KT log K)) theoretical guarantee |
-| **Semantic hashing** | Mistake deduplication for RMR |
+| Concept | Application in buildlog | Status |
+|---------|------------------------|--------|
+| **Confidence scoring** | Frequency + recency decay | ✅ Implemented |
+| **Semantic hashing** | Mistake deduplication for RMR | ✅ Implemented |
+| **Reward signals** | Binary feedback infrastructure | ✅ Implemented |
+| **Thompson Sampling** | Rule selection under uncertainty | ⏳ Planned (v0.8) |
+| **Beta-Bernoulli model** | Posterior updates from binary reward | ⏳ Planned (v0.8) |
+| **Contextual bandits** | Context-dependent rule selection | ⏳ Planned (v0.8) |
+| **Regret bounds** | O(√(KT log K)) theoretical guarantee | ⏳ Planned (v0.8) |
-We're not inventing new math. We're applying proven frameworks to a new domain.
+We're not inventing new math. We're applying proven frameworks to a new domain. The infrastructure for reward collection is live; the bandit policy is the next milestone.
 ---
 ## Honest Limitations
-Things we don't have figured out yet:
+### Not Yet Implemented
+- **Automatic rule selection**: Currently manual promotion; Thompson Sampling bandit planned for v0.8
+- **Context-aware surfacing**: Rules are surfaced globally, not based on task context
+### Hard Problems We're Working On
 - **Credit assignment**: When multiple rules are active, which one helped?
 - **Non-stationarity**: Developer skill changes over time

{buildlog-0.6.1 → buildlog-0.7.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "buildlog"
-version = "0.6.1"
+version = "0.7.0"
 description = "Engineering notebook for AI-assisted development"
 readme = "README.md"
 license = "MIT"

{buildlog-0.6.1 → buildlog-0.7.0}/src/buildlog/cli.py RENAMED

@@ -1264,5 +1264,158 @@ def gauntlet_learn(issues_file: str, source: str | None, output_json: bool):
         click.echo(f"  Total processed: {result.total_issues_processed}")
+@gauntlet.command("loop")
+@click.argument("target", type=click.Path(exists=True))
+@click.option(
+    "--persona",
+    "-p",
+    multiple=True,
+    help="Personas to run (default: all)",
+)
+@click.option(
+    "--max-iterations",
+    "-n",
+    default=10,
+    help="Maximum iterations to prevent infinite loops (default: 10)",
+)
+@click.option(
+    "--stop-at",
+    type=click.Choice(["criticals", "majors", "minors"]),
+    default="minors",
+    help="Stop after clearing this severity level (default: minors)",
+)
+@click.option(
+    "--auto-gh-issues",
+    is_flag=True,
+    help="Create GitHub issues for remaining items when accepting risk",
+)
+@click.option("--json", "output_json", is_flag=True, help="Output as JSON")
+def gauntlet_loop(
+    target: str,
+    persona: tuple[str, ...],
+    max_iterations: int,
+    stop_at: str,
+    auto_gh_issues: bool,
+    output_json: bool,
+):
+    """Run the gauntlet loop: review, fix, repeat until clean.
+    This command orchestrates the gauntlet loop workflow:
+    1. Generate review prompt for target code
+    2. Process issues and determine action
+    3. On criticals: output fix instructions, expect re-run
+    4. On majors only: checkpoint (ask to continue)
+    5. On minors only: checkpoint (accept risk?)
+    6. Optionally create GitHub issues for remaining items
+    The loop is designed to be run interactively with an agent
+    (Claude Code, Cursor, etc.) that does the actual fixing.
+    Examples:
+        buildlog gauntlet loop src/
+        buildlog gauntlet loop tests/ --stop-at majors
+        buildlog gauntlet loop . --auto-gh-issues
+    """
+    import json as json_module
+    from buildlog.seeds import get_default_seeds_dir, load_all_seeds
+    # Find seeds directory
+    seeds_dir = get_default_seeds_dir()
+    if seeds_dir is None:
+        click.echo("No seed files found.", err=True)
+        raise SystemExit(1)
+    seeds = load_all_seeds(seeds_dir)
+    if not seeds:
+        click.echo("No seed files found in directory.", err=True)
+        raise SystemExit(1)
+    # Filter personas
+    if persona:
+        seeds = {k: v for k, v in seeds.items() if k in persona}
+        if not seeds:
+            click.echo(f"No matching personas: {', '.join(persona)}", err=True)
+            raise SystemExit(1)
+    target_path = Path(target)
+    # Generate persona rules summary
+    rules_by_persona: dict[str, list[dict[str, str]]] = {}
+    for name, sf in seeds.items():
+        rules_by_persona[name] = [
+            {"rule": r.rule, "antipattern": r.antipattern, "category": r.category}
+            for r in sf.rules
+        ]
+    # Loop instructions
+    instructions = [
+        "1. Review the target code using the rules from each persona",
+        "2. Report all violations as JSON issues with: severity, category, description, rule_learned, location",
+        "3. Call `buildlog_gauntlet_issues` with the issues list to determine next action",
+        "4. If action='fix_criticals': Fix critical+major issues, then re-run gauntlet",
+        "5. If action='checkpoint_majors': Ask user whether to continue fixing majors",
+        "6. If action='checkpoint_minors': Ask user whether to accept risk or continue",
+        "7. If user accepts risk and --auto-gh-issues: Call `buildlog_gauntlet_accept_risk` with remaining issues",
+        "8. Repeat until action='clean' or max_iterations reached",
+    ]
+    # Expected issue format
+    issue_format = {
+        "severity": "critical|major|minor|nitpick",
+        "category": "security|testing|architectural|workflow|...",
+        "description": "Concrete description of what's wrong",
+        "rule_learned": "Generalizable rule for the future",
+        "location": "file:line (optional)",
+    }
+    # Build the loop output
+    output = {
+        "command": "gauntlet_loop",
+        "target": str(target_path),
+        "personas": list(seeds.keys()),
+        "max_iterations": max_iterations,
+        "stop_at": stop_at,
+        "auto_gh_issues": auto_gh_issues,
+        "rules_by_persona": rules_by_persona,
+        "instructions": instructions,
+        "issue_format": issue_format,
+    }
+    if output_json:
+        click.echo(json_module.dumps(output, indent=2))
+    else:
+        # Human-readable output
+        click.echo("=" * 60)
+        click.echo("GAUNTLET LOOP")
+        click.echo("=" * 60)
+        click.echo(f"\nTarget: {target_path}")
+        click.echo(f"Personas: {', '.join(seeds.keys())}")
+        click.echo(f"Max iterations: {max_iterations}")
+        click.echo(f"Stop at: {stop_at}")
+        click.echo(f"Auto GH issues: {auto_gh_issues}")
+        click.echo("\n--- RULES ---")
+        for name, rules in rules_by_persona.items():
+            click.echo(f"\n## {name.replace('_', ' ').title()}")
+            for r in rules:
+                click.echo(f"  • {r['rule']}")
+        click.echo("\n--- LOOP WORKFLOW ---")
+        for instruction in instructions:
+            click.echo(f"  {instruction}")
+        click.echo("\n--- ISSUE FORMAT ---")
+        click.echo(json_module.dumps(issue_format, indent=2))
+        click.echo("\n" + "=" * 60)
+        click.echo("Ready. Run gauntlet review and process issues.")
+        click.echo("=" * 60)
 if __name__ == "__main__":
     main()
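
Annotation on the cli.py hunk above: as written, `gauntlet loop` prints rules, workflow instructions, and the issue format, and leaves the actual iteration to the agent. The following sketch shows how an agent-side driver might follow those instructions. Everything here is hypothetical: `drive_loop` and its callables (`review`, `decide`, `fix`, `ask_user`) are stand-ins, with `decide` playing the role of the `buildlog_gauntlet_issues` tool's `action` field.

```python
from typing import Callable

Issue = dict[str, str]


def drive_loop(
    review: Callable[[], list[Issue]],
    decide: Callable[[list[Issue], int], str],
    fix: Callable[[list[Issue]], None],
    ask_user: Callable[[str, list[Issue]], bool],
    max_iterations: int = 10,
) -> str:
    """Review, decide, fix, repeat; stop on clean, accepted risk, or the cap."""
    for iteration in range(1, max_iterations + 1):
        issues = review()
        action = decide(issues, iteration)
        if action == "clean":
            return "clean"
        if action == "fix_criticals":
            fix(issues)  # no human needed, go straight to the next review
            continue
        # checkpoint_majors / checkpoint_minors: a human decides whether to go on.
        if not ask_user(action, issues):
            return "accepted_risk"
        fix(issues)
    return "max_iterations"


def fake_decide(issues: list[Issue], iteration: int) -> str:
    """Simplified stand-in for the severity-based action selection."""
    severities = {issue["severity"] for issue in issues}
    if "critical" in severities:
        return "fix_criticals"
    if "major" in severities:
        return "checkpoint_majors"
    if "minor" in severities:
        return "checkpoint_minors"
    return "clean"


# Scripted review results: a critical on the first pass, only a minor on the second.
scripted = iter(
    [
        [{"severity": "critical"}, {"severity": "minor"}],
        [{"severity": "minor"}],
    ]
)
outcome = drive_loop(
    review=lambda: next(scripted),
    decide=fake_decide,
    fix=lambda issues: None,
    ask_user=lambda action, issues: False,  # user accepts the remaining risk
)
print(outcome)
```

The iteration cap mirrors the `--max-iterations` option: without it, a reviewer that keeps reporting criticals would never let the loop end.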

{buildlog-0.6.1 → buildlog-0.7.0}/src/buildlog/core/__init__.py RENAMED

@@ -3,6 +3,8 @@
 from buildlog.core.operations import (
     DiffResult,
     EndSessionResult,
+    GauntletAcceptRiskResult,
+    GauntletLoopResult,
     LearnFromReviewResult,
     LogMistakeResult,
     LogRewardResult,
@@ -20,6 +22,8 @@ from buildlog.core.operations import (
     diff,
     end_session,
     find_skills_by_ids,
+    gauntlet_accept_risk,
+    gauntlet_process_issues,
     get_experiment_report,
     get_rewards,
     get_session_metrics,
@@ -50,6 +54,9 @@ __all__ = [
     "StartSessionResult",
     "EndSessionResult",
     "LogMistakeResult",
+    # Gauntlet loop
+    "GauntletLoopResult",
+    "GauntletAcceptRiskResult",
     "status",
     "promote",
     "reject",
@@ -64,4 +71,7 @@ __all__ = [
     "log_mistake",
     "get_session_metrics",
     "get_experiment_report",
+    # Gauntlet loop operations
+    "gauntlet_process_issues",
+    "gauntlet_accept_risk",
 ]

{buildlog-0.6.1 → buildlog-0.7.0}/src/buildlog/core/operations.py RENAMED

@@ -35,6 +35,9 @@ __all__ = [
     "StartSessionResult",
     "EndSessionResult",
     "LogMistakeResult",
+    # Gauntlet loop
+    "GauntletLoopResult",
+    "GauntletAcceptRiskResult",
     "status",
     "promote",
     "reject",
@@ -49,6 +52,9 @@ __all__ = [
     "log_mistake",
     "get_session_metrics",
     "get_experiment_report",
+    # Gauntlet loop operations
+    "gauntlet_process_issues",
+    "gauntlet_accept_risk",
 ]
@@ -1652,3 +1658,231 @@ def get_experiment_report(buildlog_dir: Path) -> dict:
         "sessions": session_metrics,
         "error_classes": error_classes,
     }
+# =============================================================================
+# Gauntlet Loop Operations
+# =============================================================================
+@dataclass
+class GauntletLoopResult:
+    """Result of processing gauntlet issues.
+    Attributes:
+        action: What to do next:
+            - "fix_criticals": Criticals remain, auto-fix and loop
+            - "checkpoint_majors": No criticals, but majors remain (HITL)
+            - "checkpoint_minors": Only minors remain (HITL)
+            - "clean": No issues remain
+        criticals: List of critical severity issues
+        majors: List of major severity issues
+        minors: List of minor/nitpick severity issues
+        iteration: Current iteration number
+        learnings_persisted: Number of learnings persisted this iteration
+        message: Human-readable summary
+    """
+    action: Literal["fix_criticals", "checkpoint_majors", "checkpoint_minors", "clean"]
+    criticals: list[dict]
+    majors: list[dict]
+    minors: list[dict]
+    iteration: int
+    learnings_persisted: int
+    message: str
+@dataclass
+class GauntletAcceptRiskResult:
+    """Result of accepting risk with remaining issues.
+    Attributes:
+        accepted_issues: Number of issues accepted as risk
+        github_issues_created: Number of GitHub issues created (if enabled)
+        github_issue_urls: URLs of created GitHub issues
+        message: Human-readable summary
+        error: Error message if operation failed
+    """
+    accepted_issues: int
+    github_issues_created: int
+    github_issue_urls: list[str]
+    message: str
+    error: str | None = None
+def gauntlet_process_issues(
+    buildlog_dir: Path,
+    issues: list[dict],
+    iteration: int = 1,
+    source: str | None = None,
+) -> GauntletLoopResult:
+    """Process gauntlet issues and determine next action.
+    Categorizes issues by severity, persists learnings, and returns
+    the appropriate next action for the gauntlet loop.
+    Args:
+        buildlog_dir: Path to buildlog directory.
+        issues: List of issues from the gauntlet review.
+        iteration: Current iteration number (for tracking).
+        source: Optional source identifier for learnings.
+    Returns:
+        GauntletLoopResult with categorized issues and next action.
+    """
+    # Categorize by severity
+    criticals = [i for i in issues if i.get("severity") == "critical"]
+    majors = [i for i in issues if i.get("severity") == "major"]
+    minors = [i for i in issues if i.get("severity") in ("minor", "nitpick", None)]
+    # Persist learnings for this iteration
+    learn_source = source or f"gauntlet:iteration-{iteration}"
+    learn_result = learn_from_review(buildlog_dir, issues, learn_source)
+    learnings_persisted = len(learn_result.new_learnings) + len(
+        learn_result.reinforced_learnings
+    )
+    # Determine action
+    if criticals:
+        action: Literal[
+            "fix_criticals", "checkpoint_majors", "checkpoint_minors", "clean"
+        ] = "fix_criticals"
+        message = (
+            f"Iteration {iteration}: {len(criticals)} critical, "
+            f"{len(majors)} major, {len(minors)} minor. "
+            f"Fix criticals (and majors) then re-run."
+        )
+    elif majors:
+        action = "checkpoint_majors"
+        message = (
+            f"Iteration {iteration}: No criticals! "
+            f"{len(majors)} major, {len(minors)} minor remain. "
+            f"Continue clearing majors?"
+        )
+    elif minors:
+        action = "checkpoint_minors"
+        message = (
+            f"Iteration {iteration}: Only {len(minors)} minor issues remain. "
+            f"Accept risk or continue?"
+        )
+    else:
+        action = "clean"
+        message = f"Iteration {iteration}: All clear! No issues found."
+    return GauntletLoopResult(
+        action=action,
+        criticals=criticals,
+        majors=majors,
+        minors=minors,
+        iteration=iteration,
+        learnings_persisted=learnings_persisted,
+        message=message,
+    )
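
Review note on `gauntlet_process_issues` above: the three list comprehensions match `severity` exactly, so an issue whose severity is outside `critical`, `major`, `minor`, `nitpick`, or missing appears to land in no bucket. The sketch below reproduces only the bucketing and action selection from the hunk (persistence via `learn_from_review` is omitted) to show that edge case. `normalize_severity` is a hypothetical hardening step, and mapping unknown values to `major` is just one possible choice.

```python
KNOWN_SEVERITIES = ("critical", "major", "minor", "nitpick")


def action_for(issues: list[dict]) -> str:
    """Bucketing and action selection as they read in the diff."""
    criticals = [i for i in issues if i.get("severity") == "critical"]
    majors = [i for i in issues if i.get("severity") == "major"]
    minors = [i for i in issues if i.get("severity") in ("minor", "nitpick", None)]
    if criticals:
        return "fix_criticals"
    if majors:
        return "checkpoint_majors"
    if minors:
        return "checkpoint_minors"
    return "clean"


def normalize_severity(issue: dict) -> dict:
    """Hypothetical pre-processing: case-fold, and escalate unknown values."""
    raw = issue.get("severity")
    severity = "minor" if raw is None else str(raw).strip().lower()
    if severity not in KNOWN_SEVERITIES:
        severity = "major"  # assumption: unknown labels should reach a human
    return {**issue, "severity": severity}


odd = [{"severity": "blocker", "description": "SQL built by string concat"}]
print(action_for(odd))                                   # unknown label is ignored
print(action_for([normalize_severity(i) for i in odd]))  # routed to a checkpoint
```

If agents are expected to emit only the four documented severities, this case may never arise in practice; the sketch only shows what the selection logic does when it does.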
+def gauntlet_accept_risk(
+    remaining_issues: list[dict],
+    create_github_issues: bool = False,
+    repo: str | None = None,
+) -> GauntletAcceptRiskResult:
+    """Accept risk for remaining issues, optionally creating GitHub issues.
+    Args:
+        remaining_issues: Issues being accepted as risk.
+        create_github_issues: Whether to create GitHub issues for tracking.
+        repo: Repository for GitHub issues (uses current repo if None).
+    Returns:
+        GauntletAcceptRiskResult with created issue info.
+    """
+    import subprocess
+    github_urls: list[str] = []
+    error: str | None = None
+    if create_github_issues and remaining_issues:
+        for issue in remaining_issues:
+            severity = issue.get("severity", "minor")
+            rule = issue.get("rule_learned", issue.get("description", "Unknown"))
+            description = issue.get("description", "")
+            location = issue.get("location", "")
+            # Sanitize inputs for GitHub issue creation
+            # Note: We use list args (not shell=True), so this is defense-in-depth
+            def _sanitize_for_gh(text: str, max_len: int = 256) -> str:
+                """Sanitize text for GitHub issue fields."""
+                # Remove/replace problematic characters
+                sanitized = text.replace("\n", " ").replace("\r", " ")
+                # Truncate to max length
+                if len(sanitized) > max_len:
+                    sanitized = sanitized[: max_len - 3] + "..."
+                return sanitized.strip()
+            safe_severity = _sanitize_for_gh(str(severity), 20)
+            safe_rule = _sanitize_for_gh(str(rule), 200)
+            safe_description = _sanitize_for_gh(str(description), 1000)
+            safe_location = _sanitize_for_gh(str(location), 100)
+            # Build issue body
+            body_parts = [
+                f"**Severity:** {safe_severity}",
+                f"**Rule:** {safe_rule}",
+                "",
+                "## Description",
+                safe_description,
+            ]
+            if safe_location:
+                body_parts.extend(["", f"**Location:** `{safe_location}`"])
+            body_parts.extend(
+                [
+                    "",
+                    "---",
+                    "_Created by buildlog gauntlet loop (accepted risk)_",
+                ]
+            )
+            body = "\n".join(body_parts)
+            title = f"[Gauntlet/{safe_severity}] {safe_rule[:60]}"
+            # Create GitHub issue
+            cmd = [
+                "gh",
+                "issue",
+                "create",
+                "--title",
+                title,
+                "--body",
+                body,
+                "--label",
+                severity,
+            ]
+            if repo:
+                cmd.extend(["--repo", repo])
+            try:
+                result = subprocess.run(cmd, capture_output=True, text=True, check=True)
+                # gh issue create outputs the URL
+                url = result.stdout.strip()
+                if url:
+                    github_urls.append(url)
+            except subprocess.CalledProcessError as e:
+                # Don't fail entirely, just note the error
+                error = f"Failed to create some GitHub issues: {e.stderr}"
+            except FileNotFoundError:
+                error = "gh CLI not found. Install GitHub CLI to create issues."
+                break
+    return GauntletAcceptRiskResult(
+        accepted_issues=len(remaining_issues),
+        github_issues_created=len(github_urls),
+        github_issue_urls=github_urls,
+        message=(
+            f"Accepted {len(remaining_issues)} issues as risk. "
+            f"Created {len(github_urls)} GitHub issues."
+            if create_github_issues
+            else f"Accepted {len(remaining_issues)} issues as risk."
+        ),
+        error=error,
+    )

{buildlog-0.6.1 → buildlog-0.7.0}/src/buildlog/mcp/server.py RENAMED

@@ -8,6 +8,8 @@ from buildlog.mcp.tools import (
     buildlog_diff,
     buildlog_end_session,
     buildlog_experiment_report,
+    buildlog_gauntlet_accept_risk,
+    buildlog_gauntlet_issues,
     buildlog_learn_from_review,
     buildlog_log_mistake,
     buildlog_log_reward,
@@ -37,6 +39,10 @@ mcp.tool()(buildlog_log_mistake)
 mcp.tool()(buildlog_session_metrics)
 mcp.tool()(buildlog_experiment_report)
+# Gauntlet loop tools
+mcp.tool()(buildlog_gauntlet_issues)
+mcp.tool()(buildlog_gauntlet_accept_risk)
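
Review note on the two registration lines above: `mcp.tool()(fn)` is the call form of a decorator factory, used here because the tool functions are defined in another module. The toy stand-in below shows that the call form and the `@` form register a function the same way. `ToyRegistry` and the stub functions are hypothetical and are not the MCP library's API.

```python
from typing import Callable


class ToyRegistry:
    """Stand-in for a tool server: `tool()` returns a registering decorator."""

    def __init__(self) -> None:
        self.tools: dict[str, Callable] = {}

    def tool(self) -> Callable[[Callable], Callable]:
        def register(fn: Callable) -> Callable:
            self.tools[fn.__name__] = fn
            return fn  # the function itself is returned unchanged

        return register


registry = ToyRegistry()


def gauntlet_issues_stub(issues: list[dict]) -> dict:
    return {"count": len(issues)}


# Call form, as in server.py: register a function defined elsewhere.
registry.tool()(gauntlet_issues_stub)


# Decorator form: same effect, applied at definition time.
@registry.tool()
def accept_risk_stub(remaining_issues: list[dict]) -> dict:
    return {"accepted_issues": len(remaining_issues)}


print(sorted(registry.tools))
```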
 def main() -> None:
     """Run the MCP server."""

{buildlog-0.6.1 → buildlog-0.7.0}/src/buildlog/mcp/tools.py RENAMED

@@ -405,3 +405,108 @@ def buildlog_experiment_report(
         buildlog_experiment_report()
     """
     return get_experiment_report(Path(buildlog_dir))
+# -----------------------------------------------------------------------------
+# Gauntlet Loop MCP Tools
+# -----------------------------------------------------------------------------
+def buildlog_gauntlet_issues(
+    issues: list[dict],
+    iteration: int = 1,
+    source: str | None = None,
+    buildlog_dir: str = "buildlog",
+) -> dict:
+    """Process gauntlet review issues and determine next action.
+    Call this after running a gauntlet review. It categorizes issues by
+    severity, persists learnings, and returns the appropriate next action.
+    Args:
+        issues: List of issues from the gauntlet review, each with:
+            {
+                "severity": "critical|major|minor|nitpick",
+                "category": "security|testing|architectural|...",
+                "description": "What's wrong",
+                "rule_learned": "Generalizable rule",
+                "location": "file:line (optional)"
+            }
+        iteration: Current iteration number (for tracking loops)
+        source: Optional source identifier for learnings
+        buildlog_dir: Path to buildlog directory
+    Returns:
+        Dict with:
+            - action: What to do next:
+                - "fix_criticals": Criticals remain, auto-fix and loop
+                - "checkpoint_majors": No criticals, majors remain (ask user)
+                - "checkpoint_minors": Only minors remain (ask user)
+                - "clean": No issues remain
+            - criticals: List of critical issues
+            - majors: List of major issues
+            - minors: List of minor/nitpick issues
+            - iteration: Current iteration number
+            - learnings_persisted: Number of learnings saved
+            - message: Human-readable summary
+    Example:
+        # After running gauntlet review
+        result = buildlog_gauntlet_issues(
+            issues=[
+                {"severity": "critical", "category": "security", ...},
+                {"severity": "major", "category": "testing", ...},
+            ],
+            iteration=1
+        )
+        # result["action"] tells you what to do next
+    """
+    from buildlog.core import gauntlet_process_issues
+    result = gauntlet_process_issues(
+        Path(buildlog_dir),
+        issues=issues,
+        iteration=iteration,
+        source=source,
+    )
+    return asdict(result)
+def buildlog_gauntlet_accept_risk(
+    remaining_issues: list[dict],
+    create_github_issues: bool = False,
+    repo: str | None = None,
+) -> dict:
+    """Accept risk for remaining issues, optionally creating GitHub issues.
+    Call this when the user decides to accept remaining issues as risk
+    (e.g., only minors remain and they want to move on).
+    Args:
+        remaining_issues: Issues being accepted as risk
+        create_github_issues: Whether to create GitHub issues for tracking
+        repo: Repository for GitHub issues (uses current repo if None)
+    Returns:
+        Dict with:
+            - accepted_issues: Number of issues accepted
+            - github_issues_created: Number of GitHub issues created
+            - github_issue_urls: URLs of created issues
+            - message: Human-readable summary
+            - error: Error message if GitHub issue creation failed
+    Example:
+        # User accepts risk with minors, wants GitHub issues
+        result = buildlog_gauntlet_accept_risk(
+            remaining_issues=[...],
+            create_github_issues=True
+        )
+    """
+    from buildlog.core import gauntlet_accept_risk
+    result = gauntlet_accept_risk(
+        remaining_issues=remaining_issues,
+        create_github_issues=create_github_issues,
+        repo=repo,
+    )
+    return asdict(result)
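
Review note: the action contract described in the `buildlog_gauntlet_issues` docstring can be sketched as a small pure function. This is only a sketch under stated assumptions: `decide_action` is a hypothetical name, it is not the `gauntlet_process_issues` implementation in `buildlog.core` (which this diff does not show), and treating unknown severities as minors is an assumption rather than documented behavior.

```python
from collections import defaultdict


def decide_action(issues: list[dict]) -> dict:
    """Hypothetical helper: bucket issues by severity and pick the next action.

    Mirrors the action names listed in the docstring above; the real
    logic lives in buildlog.core and may differ.
    """
    buckets: dict[str, list[dict]] = defaultdict(list)
    for issue in issues:
        severity = issue.get("severity", "minor")
        if severity == "critical":
            buckets["criticals"].append(issue)
        elif severity == "major":
            buckets["majors"].append(issue)
        else:
            # Assumption: "minor", "nitpick", and unknown values share one bucket.
            buckets["minors"].append(issue)

    # The most severe non-empty bucket decides the action.
    if buckets["criticals"]:
        action = "fix_criticals"
    elif buckets["majors"]:
        action = "checkpoint_majors"
    elif buckets["minors"]:
        action = "checkpoint_minors"
    else:
        action = "clean"

    return {
        "action": action,
        "criticals": buckets["criticals"],
        "majors": buckets["majors"],
        "minors": buckets["minors"],
    }
```

A caller would loop while the action is `"fix_criticals"`, pause for user input on either checkpoint action, and stop on `"clean"`.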

{buildlog-0.6.1 → buildlog-0.7.0}/src/buildlog/render/claude_md.py RENAMED

@@ -6,7 +6,7 @@ from datetime import datetime
 from pathlib import Path
 from typing import TYPE_CHECKING
-from buildlog.render.tracking import track_promoted
+from buildlog.render.tracking import get_promoted_ids, track_promoted
 from buildlog.skills import _to_imperative
 if TYPE_CHECKING:
@@ -33,6 +33,8 @@ class ClaudeMdRenderer:
     def render(self, skills: list[Skill]) -> str:
         """Append skills to CLAUDE.md.
+        Filters out skills that have already been promoted to prevent duplicates.
         Args:
             skills: List of skills to append.
@@ -42,9 +44,16 @@ class ClaudeMdRenderer:
         if not skills:
             return "No skills to promote"
+        # Filter out already-promoted skills
+        already_promoted = get_promoted_ids(self.tracking_path)
+        new_skills = [s for s in skills if s.id not in already_promoted]
+        if not new_skills:
+            return f"All {len(skills)} skills already promoted"
         # Group by category
         by_category: dict[str, list[Skill]] = {}
-        for skill in skills:
+        for skill in new_skills:
             by_category.setdefault(skill.category, []).append(skill)
         # Build section
@@ -80,6 +89,10 @@ class ClaudeMdRenderer:
             self.path.write_text(content)
         # Track promoted skill IDs using shared utility
-        track_promoted(skills, self.tracking_path)
+        track_promoted(new_skills, self.tracking_path)
-        return f"Appended {len(skills)} rules to {self.path}"
+        skipped = len(skills) - len(new_skills)
+        msg = f"Appended {len(new_skills)} rules to {self.path}"
+        if skipped > 0:
+            msg += f" ({skipped} already promoted, skipped)"
+        return msg
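
Review note: the duplicate filter added to `ClaudeMdRenderer.render` can be exercised in isolation. The sketch below assumes a stand-in `Skill` dataclass with only `id` and `category`, and a hypothetical `filter_new_skills` helper; it omits the file write and drops the target path from the message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Stand-in for buildlog.skills.Skill with only the fields used here."""

    id: str
    category: str


def filter_new_skills(
    skills: list[Skill], already_promoted: set[str]
) -> tuple[list[Skill], str]:
    """Hypothetical helper: return not-yet-promoted skills and a summary."""
    if not skills:
        return [], "No skills to promote"

    # Keep only skills whose ID has not been recorded as promoted.
    new_skills = [s for s in skills if s.id not in already_promoted]
    if not new_skills:
        return [], f"All {len(skills)} skills already promoted"

    skipped = len(skills) - len(new_skills)
    msg = f"Appended {len(new_skills)} rules"
    if skipped > 0:
        msg += f" ({skipped} already promoted, skipped)"
    return new_skills, msg


skills = [Skill("a1", "testing"), Skill("b2", "security"), Skill("c3", "testing")]
new, msg = filter_new_skills(skills, already_promoted={"b2"})
print(msg)  # Appended 2 rules (1 already promoted, skipped)
```

Only `new_skills` is passed on to tracking in the diff, so a second call with the same input reports every skill as already promoted instead of appending duplicates.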

{buildlog-0.6.1 → buildlog-0.7.0}/src/buildlog/render/tracking.py RENAMED

@@ -10,7 +10,26 @@ from typing import TYPE_CHECKING
 if TYPE_CHECKING:
     from buildlog.skills import Skill
-__all__ = ["track_promoted"]
+__all__ = ["track_promoted", "get_promoted_ids"]
+def get_promoted_ids(tracking_path: Path) -> set[str]:
+    """Get the set of already-promoted skill IDs.
+    Args:
+        tracking_path: Path to the tracking JSON file.
+    Returns:
+        Set of skill IDs that have been promoted.
+    """
+    if not tracking_path.exists():
+        return set()
+    try:
+        tracking = json.loads(tracking_path.read_text())
+        return set(tracking.get("skill_ids", []))
+    except json.JSONDecodeError:
+        return set()
 def track_promoted(skills: list[Skill], tracking_path: Path) -> None:
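
Review note: a standalone demo of how `get_promoted_ids` behaves on a missing, corrupt, and valid tracking file. The function body is copied from the hunk above; the file name `promoted.json` is a hypothetical placeholder, since the real tracking path is not shown in this diff.

```python
import json
import tempfile
from pathlib import Path


def get_promoted_ids(tracking_path: Path) -> set[str]:
    """Copy of the function added in this diff, for a standalone demo."""
    if not tracking_path.exists():
        return set()
    try:
        tracking = json.loads(tracking_path.read_text())
        return set(tracking.get("skill_ids", []))
    except json.JSONDecodeError:
        return set()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "promoted.json"  # hypothetical file name

    # Case 1: the file does not exist yet.
    missing = get_promoted_ids(path)

    # Case 2: the file is not valid JSON; the decode error is swallowed.
    path.write_text("{not valid json")
    corrupt = get_promoted_ids(path)

    # Case 3: a valid file; duplicate IDs collapse in the returned set.
    path.write_text(json.dumps({"skill_ids": ["a1", "b2", "a1"]}))
    tracked = get_promoted_ids(path)
```

One case the demo does not cover: a file whose top-level JSON value is not an object (for example a bare list) would raise `AttributeError` on `.get`, because only `json.JSONDecodeError` is caught.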

{buildlog-0.6.1 → buildlog-0.7.0}/src/buildlog/seeds.py RENAMED

@@ -156,6 +156,36 @@ class SeedFile:
         )
+def _validate_seed_schema(data: dict) -> bool:
+    """Validate seed file has expected schema structure.
+    Defense-in-depth validation for seed files. While yaml.safe_load
+    prevents code execution, this ensures data structure matches expectations.
+    Args:
+        data: Parsed YAML data.
+    Returns:
+        True if schema is valid, False otherwise.
+    """
+    if not isinstance(data, dict):
+        return False
+    # Rules must be a list if present
+    rules = data.get("rules", [])
+    if not isinstance(rules, list):
+        return False
+    # Each rule must be a dict with at least a "rule" key
+    for rule in rules:
+        if not isinstance(rule, dict):
+            return False
+        if "rule" not in rule:
+            return False
+    return True
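
Review note: the schema check above can be tried against plain Python values shaped like `yaml.safe_load` output. The function body is copied from the hunk; the sample inputs are illustrative and not taken from any real seed file.

```python
def _validate_seed_schema(data: dict) -> bool:
    """Copy of the validator added in this diff, for a standalone demo."""
    if not isinstance(data, dict):
        return False
    # Rules must be a list if present
    rules = data.get("rules", [])
    if not isinstance(rules, list):
        return False
    # Each rule must be a dict with at least a "rule" key
    for rule in rules:
        if not isinstance(rule, dict):
            return False
        if "rule" not in rule:
            return False
    return True


# Illustrative inputs, named by the situation they stand for.
samples = {
    "valid": {"rules": [{"rule": "Validate input at the boundary"}]},
    "no_rules_key": {},                      # "rules" is optional
    "rules_not_list": {"rules": "oops"},
    "rule_missing_key": {"rules": [{"category": "security"}]},
    "empty_file": None,                      # safe_load of an empty file
    "rules_null": {"rules": None},           # "rules:" with no value
}

results = {name: _validate_seed_schema(data) for name, data in samples.items()}
```

Only `valid` and `no_rules_key` pass. An explicit `rules:` key with no value fails because `None` is not a list, which differs from omitting the key entirely.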
 def load_seed_file(path: Path) -> SeedFile | None:
     """Load a single seed file from disk.
@@ -164,6 +194,10 @@ def load_seed_file(path: Path) -> SeedFile | None:
     Returns:
         Parsed SeedFile or None if loading fails.
+    Note:
+        Uses yaml.safe_load which is safe from code execution attacks.
+        Additional schema validation ensures data structure is as expected.
     """
     if not path.exists():
         logger.warning(f"Seed file not found: {path}")
@@ -171,7 +205,14 @@ def load_seed_file(path: Path) -> SeedFile | None:
     try:
         with open(path) as f:
+            # yaml.safe_load is safe - no arbitrary code execution
             data = yaml.safe_load(f)
+        # Validate schema before parsing
+        if not _validate_seed_schema(data):
+            logger.error(f"Invalid seed file schema: {path}")
+            return None
         return SeedFile.from_dict(data)
     except (yaml.YAMLError, KeyError, TypeError) as e:
         logger.error(f"Failed to parse seed file {path}: {e}")