PyPI - buildlog - Versions diffs - 0.6.0__tar.gz → 0.7.0__tar.gz - Mend

buildlog 0.6.0__tar.gz → 0.7.0__tar.gz

Files changed (41)

{buildlog-0.6.0 → buildlog-0.7.0}/.gitignore RENAMED

@@ -44,3 +44,11 @@ htmlcov/
 # Build artifacts
 *.whl
+# Development artifacts
+CHAT.txt
+results/
+sketches/
+# buildlog runtime data (in project root, not in src/)
+buildlog/.buildlog/

{buildlog-0.6.0 → buildlog-0.7.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: buildlog
-Version: 0.6.0
+Version: 0.7.0
 Summary: Engineering notebook for AI-assisted development
 Project-URL: Homepage, https://github.com/Peleke/buildlog-template
 Project-URL: Repository, https://github.com/Peleke/buildlog-template
@@ -123,11 +123,30 @@ RMR is not the only metric that matters. But it's one we can measure, and measur
 ## The Mechanism
-buildlog uses **contextual bandits** to select which rules to surface.
+buildlog is building toward **contextual bandits** for automatic rule selection. Here's where we are:
+### What Exists Today (v0.7)
 ```
 ┌─────────────────────────────────────────────────────────────────┐
-│                    CONTEXTUAL BANDIT SETUP                      │
+│                    CURRENT INFRASTRUCTURE                       │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                 │
+│  ✅ Rule extraction     From entries, reviews, curated seeds   │
+│  ✅ Confidence scoring  Frequency + recency based              │
+│  ✅ Reward logging      Accept/reject/revision signals         │
+│  ✅ Experiment tracking Sessions, mistakes, RMR calculation    │
+│  ✅ Review gauntlet     Curated persona-based code review      │
+│  ⏳ Manual promotion    Human selects rules to surface         │
+│                                                                 │
+└─────────────────────────────────────────────────────────────────┘
+```
+### What's Coming (v0.8+)
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                    CONTEXTUAL BANDIT (PLANNED)                  │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                 │
 │  Context (c):     Error class, file type, task category        │
@@ -147,9 +166,9 @@ buildlog uses **contextual bandits** to select which rules to surface.
 **Reward** = did surfacing this rule actually help?
-The system explores (tries uncertain rules) and exploits (uses proven rules) based on accumulated evidence. Thompson Sampling provides theoretical guarantees: O(√(KT log K)) regret bounds.
+The reward infrastructure exists. The bandit policy is next. Thompson Sampling will provide theoretical guarantees: O(√(KT log K)) regret bounds.
-This isn't magic. It's a well-understood framework with decades of research. We're applying it to agent rule selection.
+We're building in public—the bandit implementation will be developed with full documentation of the process.
 ---
@@ -161,16 +180,20 @@ buildlog captures signal at every stage:
 flowchart LR
     A["Work Sessions"] --> B["Structured Entries"]
     B --> C["Extracted Rules"]
-    C --> D["Bandit Selection"]
+    C --> D["Manual Promotion"]
     D --> E["Rule Surfaced"]
     E --> F["Human Feedback"]
-    F --> G["Posterior Update"]
-    G --> D
+    F --> G["Reward Logged"]
+    G -.-> H["Bandit Policy"]
+    H -.-> D
     style F fill:#ff6b6b,color:#fff
     style G fill:#4ecdc4,color:#fff
+    style H fill:#666,color:#fff,stroke-dasharray: 5 5
 ```
+*Dashed: Coming in v0.8 — automatic rule selection via Thompson Sampling*
 ### Stage 1: Capture
 Document your work. Include the fuckups—they're the most valuable signal.
@@ -269,6 +292,27 @@ buildlog gauntlet rules --format markdown -o review_checklist.md
 buildlog gauntlet learn review_issues.json --source "PR#42"
 ```
+### Gauntlet Loop (Agent Integration)
+For AI agents, the gauntlet loop automates the fix-rerun cycle:
+```bash
+buildlog gauntlet loop src/ --persona security_karen --persona test_terrorist
+```
+The loop provides structured checkpoints:
+| Severity | Action | Human Needed? |
+|----------|--------|---------------|
+| **Critical** | Agent fixes, reruns | No |
+| **Major** | Checkpoint: continue? | Yes |
+| **Minor** | Accept risk or fix? | Yes |
+| **Clean** | Done | No |
+MCP tools for agent integration:
+- `buildlog_gauntlet_issues` — Report findings, get next action
+- `buildlog_gauntlet_accept_risk` — Accept remaining issues (optionally create GitHub issues)
 The gauntlet integrates with the learning loop—issues found become rules that accumulate confidence.
 ---
@@ -359,6 +403,8 @@ Available tools:
 | `buildlog_start_session` | Begin tracked experiment |
 | `buildlog_log_mistake` | Record mistake during session |
 | `buildlog_experiment_report` | Full experiment report |
+| `buildlog_gauntlet_issues` | Report gauntlet findings, get next action |
+| `buildlog_gauntlet_accept_risk` | Accept remaining issues, optionally create GH issues |
 ### CLI Commands
@@ -382,6 +428,7 @@ buildlog gauntlet list           # Show reviewers
 buildlog gauntlet rules          # Export rules
 buildlog gauntlet prompt <path>  # Generate review prompt
 buildlog gauntlet learn <file>   # Persist learnings
+buildlog gauntlet loop <path>    # Auto-fix loop with HITL checkpoints
 ```
 ---
@@ -421,21 +468,28 @@ This is how you know. Not vibes. Data.
 For the technically curious:
-| Concept | Application in buildlog |
-|---------|------------------------|
-| **Thompson Sampling** | Rule selection under uncertainty |
-| **Beta-Bernoulli model** | Posterior updates from binary reward |
-| **Contextual bandits** | Context-dependent rule selection |
-| **Regret bounds** | O(√(KT log K)) theoretical guarantee |
-| **Semantic hashing** | Mistake deduplication for RMR |
+| Concept | Application in buildlog | Status |
+|---------|------------------------|--------|
+| **Confidence scoring** | Frequency + recency decay | ✅ Implemented |
+| **Semantic hashing** | Mistake deduplication for RMR | ✅ Implemented |
+| **Reward signals** | Binary feedback infrastructure | ✅ Implemented |
+| **Thompson Sampling** | Rule selection under uncertainty | ⏳ Planned (v0.8) |
+| **Beta-Bernoulli model** | Posterior updates from binary reward | ⏳ Planned (v0.8) |
+| **Contextual bandits** | Context-dependent rule selection | ⏳ Planned (v0.8) |
+| **Regret bounds** | O(√(KT log K)) theoretical guarantee | ⏳ Planned (v0.8) |
-We're not inventing new math. We're applying proven frameworks to a new domain.
+We're not inventing new math. We're applying proven frameworks to a new domain. The infrastructure for reward collection is live; the bandit policy is the next milestone.
 ---
 ## Honest Limitations
-Things we don't have figured out yet:
+### Not Yet Implemented
+- **Automatic rule selection**: Currently manual promotion; Thompson Sampling bandit planned for v0.8
+- **Context-aware surfacing**: Rules are surfaced globally, not based on task context
+### Hard Problems We're Working On
 - **Credit assignment**: When multiple rules are active, which one helped?
 - **Non-stationarity**: Developer skill changes over time
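
Reviewer note: the hunks above move Thompson Sampling with a Beta-Bernoulli model from "in use" to "planned for v0.8". For readers who have not met the technique, here is a minimal, non-contextual sketch of what it involves. It is not buildlog's code: `BetaArm`, `ThompsonSelector`, and the uniform Beta(1, 1) prior are all assumptions made for illustration, and the binary reward stands in for the accept/reject signal the README describes.

```python
import random
from dataclasses import dataclass, field


@dataclass
class BetaArm:
    """Beta(alpha, beta) posterior over one rule's probability of helping."""

    alpha: float = 1.0  # assumed prior: one pseudo-observation of "helped"
    beta: float = 1.0   # assumed prior: one pseudo-observation of "did not help"

    def sample(self, rng: random.Random) -> float:
        # Draw a plausible success rate from the current posterior.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, reward: int) -> None:
        # Conjugate update for a Bernoulli reward (1 = helped, 0 = did not).
        if reward:
            self.alpha += 1
        else:
            self.beta += 1


@dataclass
class ThompsonSelector:
    """Hypothetical selector: one BetaArm per rule id, no context features."""

    arms: dict[str, BetaArm] = field(default_factory=dict)
    rng: random.Random = field(default_factory=random.Random)

    def select(self, rule_ids: list[str]) -> str:
        # Sample once per candidate rule and surface the highest draw.
        for rule_id in rule_ids:
            self.arms.setdefault(rule_id, BetaArm())
        return max(rule_ids, key=lambda rid: self.arms[rid].sample(self.rng))

    def record(self, rule_id: str, reward: int) -> None:
        self.arms.setdefault(rule_id, BetaArm()).update(reward)
```

Rules with little feedback have wide posteriors, so they occasionally win the draw (exploration); rules with a strong track record win most of the time (exploitation). A contextual version would keep separate posteriors, or a shared model, per context.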

{buildlog-0.6.0 → buildlog-0.7.0}/README.md RENAMED

@@ -75,11 +75,30 @@ RMR is not the only metric that matters. But it's one we can measure, and measur
 ## The Mechanism
-buildlog uses **contextual bandits** to select which rules to surface.
+buildlog is building toward **contextual bandits** for automatic rule selection. Here's where we are:
+### What Exists Today (v0.7)
 ```
 ┌─────────────────────────────────────────────────────────────────┐
-│                    CONTEXTUAL BANDIT SETUP                      │
+│                    CURRENT INFRASTRUCTURE                       │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                 │
+│  ✅ Rule extraction     From entries, reviews, curated seeds   │
+│  ✅ Confidence scoring  Frequency + recency based              │
+│  ✅ Reward logging      Accept/reject/revision signals         │
+│  ✅ Experiment tracking Sessions, mistakes, RMR calculation    │
+│  ✅ Review gauntlet     Curated persona-based code review      │
+│  ⏳ Manual promotion    Human selects rules to surface         │
+│                                                                 │
+└─────────────────────────────────────────────────────────────────┘
+```
+### What's Coming (v0.8+)
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                    CONTEXTUAL BANDIT (PLANNED)                  │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                 │
 │  Context (c):     Error class, file type, task category        │
@@ -99,9 +118,9 @@ buildlog uses **contextual bandits** to select which rules to surface.
 **Reward** = did surfacing this rule actually help?
-The system explores (tries uncertain rules) and exploits (uses proven rules) based on accumulated evidence. Thompson Sampling provides theoretical guarantees: O(√(KT log K)) regret bounds.
+The reward infrastructure exists. The bandit policy is next. Thompson Sampling will provide theoretical guarantees: O(√(KT log K)) regret bounds.
-This isn't magic. It's a well-understood framework with decades of research. We're applying it to agent rule selection.
+We're building in public—the bandit implementation will be developed with full documentation of the process.
 ---
@@ -113,16 +132,20 @@ buildlog captures signal at every stage:
 flowchart LR
     A["Work Sessions"] --> B["Structured Entries"]
     B --> C["Extracted Rules"]
-    C --> D["Bandit Selection"]
+    C --> D["Manual Promotion"]
     D --> E["Rule Surfaced"]
     E --> F["Human Feedback"]
-    F --> G["Posterior Update"]
-    G --> D
+    F --> G["Reward Logged"]
+    G -.-> H["Bandit Policy"]
+    H -.-> D
     style F fill:#ff6b6b,color:#fff
     style G fill:#4ecdc4,color:#fff
+    style H fill:#666,color:#fff,stroke-dasharray: 5 5
 ```
+*Dashed: Coming in v0.8 — automatic rule selection via Thompson Sampling*
 ### Stage 1: Capture
 Document your work. Include the fuckups—they're the most valuable signal.
@@ -221,6 +244,27 @@ buildlog gauntlet rules --format markdown -o review_checklist.md
 buildlog gauntlet learn review_issues.json --source "PR#42"
 ```
+### Gauntlet Loop (Agent Integration)
+For AI agents, the gauntlet loop automates the fix-rerun cycle:
+```bash
+buildlog gauntlet loop src/ --persona security_karen --persona test_terrorist
+```
+The loop provides structured checkpoints:
+| Severity | Action | Human Needed? |
+|----------|--------|---------------|
+| **Critical** | Agent fixes, reruns | No |
+| **Major** | Checkpoint: continue? | Yes |
+| **Minor** | Accept risk or fix? | Yes |
+| **Clean** | Done | No |
+MCP tools for agent integration:
+- `buildlog_gauntlet_issues` — Report findings, get next action
+- `buildlog_gauntlet_accept_risk` — Accept remaining issues (optionally create GitHub issues)
 The gauntlet integrates with the learning loop—issues found become rules that accumulate confidence.
 ---
@@ -311,6 +355,8 @@ Available tools:
 | `buildlog_start_session` | Begin tracked experiment |
 | `buildlog_log_mistake` | Record mistake during session |
 | `buildlog_experiment_report` | Full experiment report |
+| `buildlog_gauntlet_issues` | Report gauntlet findings, get next action |
+| `buildlog_gauntlet_accept_risk` | Accept remaining issues, optionally create GH issues |
 ### CLI Commands
@@ -334,6 +380,7 @@ buildlog gauntlet list           # Show reviewers
 buildlog gauntlet rules          # Export rules
 buildlog gauntlet prompt <path>  # Generate review prompt
 buildlog gauntlet learn <file>   # Persist learnings
+buildlog gauntlet loop <path>    # Auto-fix loop with HITL checkpoints
 ```
 ---
@@ -373,21 +420,28 @@ This is how you know. Not vibes. Data.
 For the technically curious:
-| Concept | Application in buildlog |
-|---------|------------------------|
-| **Thompson Sampling** | Rule selection under uncertainty |
-| **Beta-Bernoulli model** | Posterior updates from binary reward |
-| **Contextual bandits** | Context-dependent rule selection |
-| **Regret bounds** | O(√(KT log K)) theoretical guarantee |
-| **Semantic hashing** | Mistake deduplication for RMR |
+| Concept | Application in buildlog | Status |
+|---------|------------------------|--------|
+| **Confidence scoring** | Frequency + recency decay | ✅ Implemented |
+| **Semantic hashing** | Mistake deduplication for RMR | ✅ Implemented |
+| **Reward signals** | Binary feedback infrastructure | ✅ Implemented |
+| **Thompson Sampling** | Rule selection under uncertainty | ⏳ Planned (v0.8) |
+| **Beta-Bernoulli model** | Posterior updates from binary reward | ⏳ Planned (v0.8) |
+| **Contextual bandits** | Context-dependent rule selection | ⏳ Planned (v0.8) |
+| **Regret bounds** | O(√(KT log K)) theoretical guarantee | ⏳ Planned (v0.8) |
-We're not inventing new math. We're applying proven frameworks to a new domain.
+We're not inventing new math. We're applying proven frameworks to a new domain. The infrastructure for reward collection is live; the bandit policy is the next milestone.
 ---
 ## Honest Limitations
-Things we don't have figured out yet:
+### Not Yet Implemented
+- **Automatic rule selection**: Currently manual promotion; Thompson Sampling bandit planned for v0.8
+- **Context-aware surfacing**: Rules are surfaced globally, not based on task context
+### Hard Problems We're Working On
 - **Credit assignment**: When multiple rules are active, which one helped?
 - **Non-stationarity**: Developer skill changes over time
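
Reviewer note: the README now lists confidence scoring as "Frequency + recency decay" and marks it implemented, but the scoring code is not part of the hunks shown here. The sketch below is one common way to combine the two signals, included only to make the idea concrete. The function name, the saturating frequency term, and both default constants are assumptions, not values taken from buildlog.

```python
import math
from datetime import datetime


def confidence(
    occurrences: int,
    last_seen: datetime,
    now: datetime,
    half_life_days: float = 30.0,  # assumed: score halves every 30 days unseen
    saturation: float = 5.0,       # assumed: controls how fast frequency saturates
) -> float:
    """Return a score in [0, 1]: a frequency term scaled by recency decay."""
    # Frequency: 0 for an unseen rule, approaching 1 as occurrences grow.
    frequency = 1.0 - math.exp(-occurrences / saturation)
    # Recency: exponential decay with the given half-life; future dates clamp to 0 age.
    age_days = max((now - last_seen).total_seconds() / 86400.0, 0.0)
    recency = 0.5 ** (age_days / half_life_days)
    return frequency * recency
```

Under these assumptions a rule seen often but long ago and a rule seen once yesterday can end up with similar scores, which is the trade-off a frequency-plus-recency scheme is meant to expose.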

{buildlog-0.6.0 → buildlog-0.7.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "buildlog"
-version = "0.6.0"
+version = "0.7.0"
 description = "Engineering notebook for AI-assisted development"
 readme = "README.md"
 license = "MIT"

{buildlog-0.6.0 → buildlog-0.7.0}/src/buildlog/cli.py RENAMED

@@ -921,15 +921,18 @@ def gauntlet_list(output_json: bool):
     """
     import json as json_module
-    from buildlog.seeds import load_all_seeds
+    from buildlog.seeds import get_default_seeds_dir, load_all_seeds
-    # Find seeds directory
-    buildlog_dir = Path("buildlog")
-    seeds_dir = buildlog_dir / ".buildlog" / "seeds"
+    # Find seeds directory (local overrides > buildlog template > package bundled)
+    seeds_dir = get_default_seeds_dir()
-    # Also check .buildlog at repo root (common for installed templates)
-    if not seeds_dir.exists():
-        seeds_dir = Path(".buildlog") / "seeds"
+    if seeds_dir is None:
+        if output_json:
+            click.echo('{"personas": {}, "total_rules": 0, "error": "No seeds found"}')
+        else:
+            click.echo("No seed files found.")
+            click.echo("Seeds are bundled with buildlog - check your installation.")
+        return
     seeds = load_all_seeds(seeds_dir)
@@ -997,18 +1000,22 @@ def gauntlet_rules(persona: str, fmt: str, output: str | None):
     """
     import json as json_module
-    from buildlog.seeds import load_all_seeds
+    from buildlog.seeds import get_default_seeds_dir, load_all_seeds
-    # Find seeds directory
-    seeds_dir = Path(".buildlog") / "seeds"
-    if not seeds_dir.exists():
-        seeds_dir = Path("buildlog") / ".buildlog" / "seeds"
+    # Find seeds directory (local overrides > buildlog template > package bundled)
+    seeds_dir = get_default_seeds_dir()
+    if seeds_dir is None:
+        click.echo("No seed files found.", err=True)
+        click.echo(
+            "Seeds are bundled with buildlog - check your installation.", err=True
+        )
+        raise SystemExit(1)
     seeds = load_all_seeds(seeds_dir)
     if not seeds:
-        click.echo("No seed files found.", err=True)
-        click.echo("Initialize with: buildlog init", err=True)
+        click.echo("No seed files found in directory.", err=True)
         raise SystemExit(1)
     # Filter personas
@@ -1117,17 +1124,22 @@ def gauntlet_prompt(target: str, persona: tuple[str, ...], output: str | None):
         buildlog gauntlet prompt src/api.py -p security_karen
         buildlog gauntlet prompt . -o review_prompt.md
     """
-    from buildlog.seeds import load_all_seeds
+    from buildlog.seeds import get_default_seeds_dir, load_all_seeds
-    # Find seeds directory
-    seeds_dir = Path(".buildlog") / "seeds"
-    if not seeds_dir.exists():
-        seeds_dir = Path("buildlog") / ".buildlog" / "seeds"
+    # Find seeds directory (local overrides > buildlog template > package bundled)
+    seeds_dir = get_default_seeds_dir()
+    if seeds_dir is None:
+        click.echo("No seed files found.", err=True)
+        click.echo(
+            "Seeds are bundled with buildlog - check your installation.", err=True
+        )
+        raise SystemExit(1)
     seeds = load_all_seeds(seeds_dir)
     if not seeds:
-        click.echo("No seed files found.", err=True)
+        click.echo("No seed files found in directory.", err=True)
         raise SystemExit(1)
     # Filter personas
@@ -1252,5 +1264,158 @@ def gauntlet_learn(issues_file: str, source: str | None, output_json: bool):
         click.echo(f"  Total processed: {result.total_issues_processed}")
+@gauntlet.command("loop")
+@click.argument("target", type=click.Path(exists=True))
+@click.option(
+    "--persona",
+    "-p",
+    multiple=True,
+    help="Personas to run (default: all)",
+)
+@click.option(
+    "--max-iterations",
+    "-n",
+    default=10,
+    help="Maximum iterations to prevent infinite loops (default: 10)",
+)
+@click.option(
+    "--stop-at",
+    type=click.Choice(["criticals", "majors", "minors"]),
+    default="minors",
+    help="Stop after clearing this severity level (default: minors)",
+)
+@click.option(
+    "--auto-gh-issues",
+    is_flag=True,
+    help="Create GitHub issues for remaining items when accepting risk",
+)
+@click.option("--json", "output_json", is_flag=True, help="Output as JSON")
+def gauntlet_loop(
+    target: str,
+    persona: tuple[str, ...],
+    max_iterations: int,
+    stop_at: str,
+    auto_gh_issues: bool,
+    output_json: bool,
+):
+    """Run the gauntlet loop: review, fix, repeat until clean.
+    This command orchestrates the gauntlet loop workflow:
+    1. Generate review prompt for target code
+    2. Process issues and determine action
+    3. On criticals: output fix instructions, expect re-run
+    4. On majors only: checkpoint (ask to continue)
+    5. On minors only: checkpoint (accept risk?)
+    6. Optionally create GitHub issues for remaining items
+    The loop is designed to be run interactively with an agent
+    (Claude Code, Cursor, etc.) that does the actual fixing.
+    Examples:
+        buildlog gauntlet loop src/
+        buildlog gauntlet loop tests/ --stop-at majors
+        buildlog gauntlet loop . --auto-gh-issues
+    """
+    import json as json_module
+    from buildlog.seeds import get_default_seeds_dir, load_all_seeds
+    # Find seeds directory
+    seeds_dir = get_default_seeds_dir()
+    if seeds_dir is None:
+        click.echo("No seed files found.", err=True)
+        raise SystemExit(1)
+    seeds = load_all_seeds(seeds_dir)
+    if not seeds:
+        click.echo("No seed files found in directory.", err=True)
+        raise SystemExit(1)
+    # Filter personas
+    if persona:
+        seeds = {k: v for k, v in seeds.items() if k in persona}
+        if not seeds:
+            click.echo(f"No matching personas: {', '.join(persona)}", err=True)
+            raise SystemExit(1)
+    target_path = Path(target)
+    # Generate persona rules summary
+    rules_by_persona: dict[str, list[dict[str, str]]] = {}
+    for name, sf in seeds.items():
+        rules_by_persona[name] = [
+            {"rule": r.rule, "antipattern": r.antipattern, "category": r.category}
+            for r in sf.rules
+        ]
+    # Loop instructions
+    instructions = [
+        "1. Review the target code using the rules from each persona",
+        "2. Report all violations as JSON issues with: severity, category, description, rule_learned, location",
+        "3. Call `buildlog_gauntlet_issues` with the issues list to determine next action",
+        "4. If action='fix_criticals': Fix critical+major issues, then re-run gauntlet",
+        "5. If action='checkpoint_majors': Ask user whether to continue fixing majors",
+        "6. If action='checkpoint_minors': Ask user whether to accept risk or continue",
+        "7. If user accepts risk and --auto-gh-issues: Call `buildlog_gauntlet_accept_risk` with remaining issues",
+        "8. Repeat until action='clean' or max_iterations reached",
+    ]
+    # Expected issue format
+    issue_format = {
+        "severity": "critical|major|minor|nitpick",
+        "category": "security|testing|architectural|workflow|...",
+        "description": "Concrete description of what's wrong",
+        "rule_learned": "Generalizable rule for the future",
+        "location": "file:line (optional)",
+    }
+    # Build the loop output
+    output = {
+        "command": "gauntlet_loop",
+        "target": str(target_path),
+        "personas": list(seeds.keys()),
+        "max_iterations": max_iterations,
+        "stop_at": stop_at,
+        "auto_gh_issues": auto_gh_issues,
+        "rules_by_persona": rules_by_persona,
+        "instructions": instructions,
+        "issue_format": issue_format,
+    }
+    if output_json:
+        click.echo(json_module.dumps(output, indent=2))
+    else:
+        # Human-readable output
+        click.echo("=" * 60)
+        click.echo("GAUNTLET LOOP")
+        click.echo("=" * 60)
+        click.echo(f"\nTarget: {target_path}")
+        click.echo(f"Personas: {', '.join(seeds.keys())}")
+        click.echo(f"Max iterations: {max_iterations}")
+        click.echo(f"Stop at: {stop_at}")
+        click.echo(f"Auto GH issues: {auto_gh_issues}")
+        click.echo("\n--- RULES ---")
+        for name, rules in rules_by_persona.items():
+            click.echo(f"\n## {name.replace('_', ' ').title()}")
+            for r in rules:
+                click.echo(f"  • {r['rule']}")
+        click.echo("\n--- LOOP WORKFLOW ---")
+        for instruction in instructions:
+            click.echo(f"  {instruction}")
+        click.echo("\n--- ISSUE FORMAT ---")
+        click.echo(json_module.dumps(issue_format, indent=2))
+        click.echo("\n" + "=" * 60)
+        click.echo("Ready. Run gauntlet review and process issues.")
+        click.echo("=" * 60)
 if __name__ == "__main__":
     main()
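
Reviewer note: the cli.py hunks replace three copies of ad hoc path probing with a single call to `get_default_seeds_dir()`, whose comment describes the precedence "local overrides > buildlog template > package bundled". Its body lives in `buildlog.seeds` and is not shown in this diff. The sketch below illustrates the general first-match pattern only; `first_existing_dir` is a hypothetical helper, and the candidate order in the usage note is an assumption based on the paths the removed code probed.

```python
from pathlib import Path
from typing import Iterable, Optional


def first_existing_dir(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that is an existing directory, else None.

    Candidates are checked in the order given, so earlier entries take
    precedence over later ones.
    """
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None
```

A caller could pass something like `[Path(".buildlog") / "seeds", Path("buildlog") / ".buildlog" / "seeds", bundled_dir]`, where `bundled_dir` is a placeholder for wherever the package ships its seeds. Returning `None` instead of a non-existent path is what lets each command print "No seed files found." and exit early, as the new code does.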

{buildlog-0.6.0 → buildlog-0.7.0}/src/buildlog/core/__init__.py RENAMED

@@ -3,6 +3,8 @@
 from buildlog.core.operations import (
     DiffResult,
     EndSessionResult,
+    GauntletAcceptRiskResult,
+    GauntletLoopResult,
     LearnFromReviewResult,
     LogMistakeResult,
     LogRewardResult,
@@ -20,6 +22,8 @@ from buildlog.core.operations import (
     diff,
     end_session,
     find_skills_by_ids,
+    gauntlet_accept_risk,
+    gauntlet_process_issues,
     get_experiment_report,
     get_rewards,
     get_session_metrics,
@@ -50,6 +54,9 @@ __all__ = [
     "StartSessionResult",
     "EndSessionResult",
     "LogMistakeResult",
+    # Gauntlet loop
+    "GauntletLoopResult",
+    "GauntletAcceptRiskResult",
     "status",
     "promote",
     "reject",
@@ -64,4 +71,7 @@ __all__ = [
     "log_mistake",
     "get_session_metrics",
     "get_experiment_report",
+    # Gauntlet loop operations
+    "gauntlet_process_issues",
+    "gauntlet_accept_risk",
 ]
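
Reviewer note: `gauntlet_process_issues` is exported here, but its implementation in `buildlog.core.operations` is outside the hunks shown. The sketch below reconstructs the decision the README table and the loop instructions describe, using the action names that appear in cli.py (`fix_criticals`, `checkpoint_majors`, `checkpoint_minors`, `clean`). The function name `next_action` is hypothetical, treating `nitpick` issues as non-blocking is an assumption, and the `--stop-at` and `--max-iterations` options are not modeled.

```python
from collections import Counter


def next_action(issues: list[dict[str, str]]) -> str:
    """Map a list of reported issues to the next gauntlet loop action.

    The highest severity present decides the outcome:
    critical -> agent fixes and reruns, major/minor -> human checkpoint,
    nothing blocking -> clean.
    """
    counts = Counter(issue.get("severity", "") for issue in issues)
    if counts["critical"]:
        return "fix_criticals"
    if counts["major"]:
        return "checkpoint_majors"
    if counts["minor"]:
        return "checkpoint_minors"
    # Assumption: nitpicks and unknown severities do not block completion.
    return "clean"
```

For example, a report with one critical and three minor issues yields `fix_criticals`; once the critical is fixed and the review rerun, the same minors yield `checkpoint_minors`, where the human decides whether to accept the risk.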
