PyPI - codeprobe - Versions diffs - 0.2.8__tar.gz → 0.3.1__tar.gz - Mend

codeprobe 0.2.8tar.gz → 0.3.1tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (174)

{codeprobe-0.2.8 → codeprobe-0.3.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codeprobe
-Version: 0.2.8
+Version: 0.3.1
 Summary: Benchmark AI coding agents against your own codebase. Mine real tasks from repo history, run agents, interpret results.
 Author: codeprobe contributors
 License-Expression: Apache-2.0
@@ -24,6 +24,7 @@ Requires-Dist: anthropic>=0.39
 Requires-Dist: openai>=1.66
 Requires-Dist: tiktoken<1,>=0.7
 Requires-Dist: scipy<2,>=1.11
+Requires-Dist: rich<14,>=13.7
 Provides-Extra: dev
 Requires-Dist: pytest<9,>=8.0; extra == "dev"
 Requires-Dist: pytest-cov<6,>=5.0; extra == "dev"
@@ -37,11 +38,11 @@ Dynamic: license-file
 Benchmark AI coding agents against **your own codebase**.
-Mine real tasks from your repo history, run agents against them, and find out which setup actually works best for YOUR code — not someone else's benchmark suite.
+Mine real tasks from your repo history, run agents against them, and find out which setup actually works best for **your** code, not someone else's benchmark suite.
 ## Why codeprobe?
-Existing benchmarks (SWE-bench, HumanEval) use fixed task sets that AI models may have memorized from training data. codeprobe mines tasks from **your private repo history**, producing benchmarks that are impossible to contaminate.
+Existing benchmarks (SWE-bench, HumanEval) use fixed task sets that AI models may have memorized from training data, and as general public benchmarks likely don't capture what is most important to your unique  workflows. codeprobe mines tasks from **your private repo history**, producing benchmarks that are impossible to contaminate. You can also point the tool at any public repo to mine tasks from.
 ## Prerequisites
@@ -84,18 +85,20 @@ codeprobe interpret .   # Get recommendations
 ## Commands
-| Command                  | Purpose                                          |
-| ------------------------ | ------------------------------------------------ |
-| `codeprobe assess`       | Score a codebase's benchmarking potential        |
-| `codeprobe init`         | Interactive wizard — choose what to compare      |
-| `codeprobe mine`         | Mine eval tasks from merged PRs/MRs              |
-| `codeprobe probe`        | Generate fast micro-benchmark probes (30s each)  |
-| `codeprobe experiment`   | Manage comparison experiments (init, add-config) |
-| `codeprobe run`          | Execute tasks against AI agents                  |
-| `codeprobe interpret`    | Analyze results, rank configurations             |
-| `codeprobe oracle-check` | Compare agent answer against oracle ground truth |
-| `codeprobe scaffold`     | Create/validate eval task directories            |
-| `codeprobe ratings`      | Record and analyze agent session quality ratings |
+| Command                    | Purpose                                          |
+| -------------------------- | ------------------------------------------------ |
+| `codeprobe assess`         | Score a codebase's benchmarking potential        |
+| `codeprobe init`           | Interactive wizard — choose what to compare      |
+| `codeprobe mine`           | Mine eval tasks from merged PRs/MRs              |
+| `codeprobe probe`          | Generate fast micro-benchmark probes (30s each)  |
+| `codeprobe experiment`     | Manage comparison experiments (init, add-config) |
+| `codeprobe run`            | Execute tasks against AI agents                  |
+| `codeprobe interpret`      | Analyze results, rank configurations             |
+| `codeprobe doctor`         | Check environment readiness (agents, keys, git)  |
+| `codeprobe preambles list` | List available preambles at all search levels    |
+| `codeprobe oracle-check`   | Compare agent answer against oracle ground truth |
+| `codeprobe scaffold`       | Create/validate eval task directories            |
+| `codeprobe ratings`        | Record and analyze agent session quality ratings |
 ## Two Ways to Generate Tasks
@@ -181,17 +184,32 @@ Template variables: `{{sg_repo}}`, `{{repo_name}}`, `{{repo_path}}`, `{{task_id}
 codeprobe run . --parallel 5          # Run 5 tasks concurrently (worktree-isolated)
 codeprobe run . --max-cost-usd 2.00   # Stop when cost budget is reached
 codeprobe run . --dry-run             # Estimate resource usage without running
+codeprobe run . --model opus-4        # Override experiment.json model
+codeprobe run . --timeout 600         # Override default 300s timeout
+codeprobe run . --repeats 3           # Run each task 3 times
+codeprobe run . --show-prompt         # Print resolved prompt without running agent
 # Mining
 codeprobe mine . --enrich             # Use LLM to improve weak task instructions
 codeprobe mine . --org-scale          # Mine comprehension tasks (not SDLC)
 codeprobe mine . --mcp-families       # Include MCP-optimized task families
 codeprobe mine . --sg-repo REPO       # Sourcegraph repo for ground truth enrichment
+codeprobe mine . --preset quick       # Quick scan: count=3
+codeprobe mine . --preset mcp         # MCP eval: org-scale + MCP families + enrich
+# Mine profiles (save/load custom flag combinations)
+codeprobe mine --save-profile my-setup --count 10 --org-scale .
+codeprobe mine --profile my-setup .   # Load saved flags
+codeprobe mine --list-profiles        # Show available profiles
 # Experiment configs
 codeprobe experiment add-config . --preamble sourcegraph  # Attach MCP preamble
 codeprobe experiment add-config . --mcp-config config.json  # Attach MCP server
+# Diagnostics
+codeprobe doctor                      # Check agents, API keys, git, Python
+codeprobe preambles list              # Show available preambles at all levels
 # Output
 codeprobe interpret . --format csv    # Export for pivot tables
 codeprobe interpret . --format html   # Self-contained HTML report
@@ -210,14 +228,9 @@ GitHub, GitLab, Bitbucket, Azure DevOps, Gitea/Forgejo, and local repos.
 ## Configuration
-Create a `.evalrc.yaml` in your repo root:
+Configuration lives in `experiment.json` (created by `codeprobe init` or `codeprobe experiment init`). CLI flags override experiment.json values — precedence: built-in defaults < experiment.json < CLI flags.
-```yaml
-name: my-experiment
-agents: [claude, copilot]
-models: [claude-sonnet-4-6, claude-opus-4-6]
-tasks_dir: .codeprobe/tasks
-```
+Run-time observability is on by default: Rich Live dashboard in TTY, JSON event lines with `--log-format json` for CI. Cost budget warnings at 80% and 100% thresholds are always visible on stderr.
 ## License
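
The precedence rule stated in the new Configuration text (built-in defaults < experiment.json < CLI flags) can be illustrated with a layered lookup. This is only a minimal sketch, not codeprobe's implementation: the key names `model`, `timeout_seconds` and `repeats`, the function `resolve_settings`, and the model name `sonnet` are assumptions made for illustration. The 300s timeout and the single repeat are the defaults the README mentions.

```python
from collections import ChainMap

# Hypothetical built-in defaults. The README states only the 300s timeout
# and one repeat per task; the key names themselves are assumed.
BUILTIN_DEFAULTS = {"model": None, "timeout_seconds": 300, "repeats": 1}


def resolve_settings(experiment: dict, cli_flags: dict) -> dict:
    """Merge settings so CLI flags beat experiment.json, which beats defaults."""
    # A flag that was not passed arrives as None and must not mask lower layers.
    explicit = {k: v for k, v in cli_flags.items() if v is not None}
    # ChainMap searches its maps left to right, so the highest-priority
    # layer is listed first.
    return dict(ChainMap(explicit, experiment, BUILTIN_DEFAULTS))


settings = resolve_settings(
    experiment={"model": "sonnet", "timeout_seconds": 450},
    cli_flags={"model": None, "timeout_seconds": 600, "repeats": None},
)
print(settings)
```

Here the timeout comes from the CLI layer, the model from the experiment layer, and the repeat count from the defaults, because each key is answered by the first layer that defines it.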

{codeprobe-0.2.8 → codeprobe-0.3.1}/README.md RENAMED

@@ -2,11 +2,11 @@
 Benchmark AI coding agents against **your own codebase**.
-Mine real tasks from your repo history, run agents against them, and find out which setup actually works best for YOUR code — not someone else's benchmark suite.
+Mine real tasks from your repo history, run agents against them, and find out which setup actually works best for **your** code, not someone else's benchmark suite.
 ## Why codeprobe?
-Existing benchmarks (SWE-bench, HumanEval) use fixed task sets that AI models may have memorized from training data. codeprobe mines tasks from **your private repo history**, producing benchmarks that are impossible to contaminate.
+Existing benchmarks (SWE-bench, HumanEval) use fixed task sets that AI models may have memorized from training data, and as general public benchmarks likely don't capture what is most important to your unique  workflows. codeprobe mines tasks from **your private repo history**, producing benchmarks that are impossible to contaminate. You can also point the tool at any public repo to mine tasks from.
 ## Prerequisites
@@ -49,18 +49,20 @@ codeprobe interpret .   # Get recommendations
 ## Commands
-| Command                  | Purpose                                          |
-| ------------------------ | ------------------------------------------------ |
-| `codeprobe assess`       | Score a codebase's benchmarking potential        |
-| `codeprobe init`         | Interactive wizard — choose what to compare      |
-| `codeprobe mine`         | Mine eval tasks from merged PRs/MRs              |
-| `codeprobe probe`        | Generate fast micro-benchmark probes (30s each)  |
-| `codeprobe experiment`   | Manage comparison experiments (init, add-config) |
-| `codeprobe run`          | Execute tasks against AI agents                  |
-| `codeprobe interpret`    | Analyze results, rank configurations             |
-| `codeprobe oracle-check` | Compare agent answer against oracle ground truth |
-| `codeprobe scaffold`     | Create/validate eval task directories            |
-| `codeprobe ratings`      | Record and analyze agent session quality ratings |
+| Command                    | Purpose                                          |
+| -------------------------- | ------------------------------------------------ |
+| `codeprobe assess`         | Score a codebase's benchmarking potential        |
+| `codeprobe init`           | Interactive wizard — choose what to compare      |
+| `codeprobe mine`           | Mine eval tasks from merged PRs/MRs              |
+| `codeprobe probe`          | Generate fast micro-benchmark probes (30s each)  |
+| `codeprobe experiment`     | Manage comparison experiments (init, add-config) |
+| `codeprobe run`            | Execute tasks against AI agents                  |
+| `codeprobe interpret`      | Analyze results, rank configurations             |
+| `codeprobe doctor`         | Check environment readiness (agents, keys, git)  |
+| `codeprobe preambles list` | List available preambles at all search levels    |
+| `codeprobe oracle-check`   | Compare agent answer against oracle ground truth |
+| `codeprobe scaffold`       | Create/validate eval task directories            |
+| `codeprobe ratings`        | Record and analyze agent session quality ratings |
 ## Two Ways to Generate Tasks
@@ -146,17 +148,32 @@ Template variables: `{{sg_repo}}`, `{{repo_name}}`, `{{repo_path}}`, `{{task_id}
 codeprobe run . --parallel 5          # Run 5 tasks concurrently (worktree-isolated)
 codeprobe run . --max-cost-usd 2.00   # Stop when cost budget is reached
 codeprobe run . --dry-run             # Estimate resource usage without running
+codeprobe run . --model opus-4        # Override experiment.json model
+codeprobe run . --timeout 600         # Override default 300s timeout
+codeprobe run . --repeats 3           # Run each task 3 times
+codeprobe run . --show-prompt         # Print resolved prompt without running agent
 # Mining
 codeprobe mine . --enrich             # Use LLM to improve weak task instructions
 codeprobe mine . --org-scale          # Mine comprehension tasks (not SDLC)
 codeprobe mine . --mcp-families       # Include MCP-optimized task families
 codeprobe mine . --sg-repo REPO       # Sourcegraph repo for ground truth enrichment
+codeprobe mine . --preset quick       # Quick scan: count=3
+codeprobe mine . --preset mcp         # MCP eval: org-scale + MCP families + enrich
+# Mine profiles (save/load custom flag combinations)
+codeprobe mine --save-profile my-setup --count 10 --org-scale .
+codeprobe mine --profile my-setup .   # Load saved flags
+codeprobe mine --list-profiles        # Show available profiles
 # Experiment configs
 codeprobe experiment add-config . --preamble sourcegraph  # Attach MCP preamble
 codeprobe experiment add-config . --mcp-config config.json  # Attach MCP server
+# Diagnostics
+codeprobe doctor                      # Check agents, API keys, git, Python
+codeprobe preambles list              # Show available preambles at all levels
 # Output
 codeprobe interpret . --format csv    # Export for pivot tables
 codeprobe interpret . --format html   # Self-contained HTML report
@@ -175,14 +192,9 @@ GitHub, GitLab, Bitbucket, Azure DevOps, Gitea/Forgejo, and local repos.
 ## Configuration
-Create a `.evalrc.yaml` in your repo root:
+Configuration lives in `experiment.json` (created by `codeprobe init` or `codeprobe experiment init`). CLI flags override experiment.json values — precedence: built-in defaults < experiment.json < CLI flags.
-```yaml
-name: my-experiment
-agents: [claude, copilot]
-models: [claude-sonnet-4-6, claude-opus-4-6]
-tasks_dir: .codeprobe/tasks
-```
+Run-time observability is on by default: Rich Live dashboard in TTY, JSON event lines with `--log-format json` for CI. Cost budget warnings at 80% and 100% thresholds are always visible on stderr.
 ## License
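
The README says cost budget warnings appear on stderr at the 80% and 100% thresholds. One way such a check could work is to report a threshold only at the moment spend crosses it, so each warning prints once. The sketch below is an assumption about the mechanism, not code from the package: `crossed_thresholds` and `warn_on_budget` are hypothetical names, and only the two threshold values come from the README.

```python
import sys

WARN_THRESHOLDS = (0.8, 1.0)  # 80% and 100%, the values named in the README


def crossed_thresholds(
    previous_usd: float, current_usd: float, budget_usd: float
) -> list[float]:
    """Return the thresholds crossed as spend moves from previous to current."""
    return [
        t
        for t in WARN_THRESHOLDS
        if previous_usd < t * budget_usd <= current_usd
    ]


def warn_on_budget(previous_usd: float, current_usd: float, budget_usd: float) -> None:
    """Print one stderr line per newly crossed threshold (hypothetical format)."""
    for t in crossed_thresholds(previous_usd, current_usd, budget_usd):
        print(
            f"warning: cost ${current_usd:.2f} reached {t:.0%} "
            f"of the ${budget_usd:.2f} budget",
            file=sys.stderr,
        )


warn_on_budget(previous_usd=1.50, current_usd=1.70, budget_usd=2.00)
```

Comparing the previous and current totals, rather than only the current one, keeps the 80% warning from repeating after every later task.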

{codeprobe-0.2.8 → codeprobe-0.3.1}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "codeprobe"
-version = "0.2.8"
+version = "0.3.1"
 description = "Benchmark AI coding agents against your own codebase. Mine real tasks from repo history, run agents, interpret results."
 readme = "README.md"
 license = "Apache-2.0"
@@ -25,6 +25,7 @@ dependencies = [
     "openai>=1.66",
     "tiktoken>=0.7,<1",
     "scipy>=1.11,<2",
+    "rich>=13.7,<14",
 ]
 [project.urls]
@@ -46,17 +47,21 @@ dev = [
 codeprobe = "codeprobe.cli:main"
 [project.entry-points."codeprobe.agents"]
-aider = "codeprobe.adapters.aider:AiderAdapter"
 claude = "codeprobe.adapters.claude:ClaudeAdapter"
 codex = "codeprobe.adapters.codex:CodexAdapter"
 copilot = "codeprobe.adapters.copilot:CopilotAdapter"
-openai = "codeprobe.adapters.openai_compat:OpenAICompatAdapter"
 [project.entry-points."codeprobe.sessions"]
 claude = "codeprobe.adapters.session:ClaudeSessionCollector"
 codex = "codeprobe.adapters.session:CodexSessionCollector"
 copilot = "codeprobe.adapters.session:CopilotSessionCollector"
+[project.entry-points."codeprobe.scorers"]
+binary = "codeprobe.core.scoring:BinaryScorer"
+continuous = "codeprobe.core.scoring:ContinuousScorer"
+checkpoint = "codeprobe.core.scoring:CheckpointScorer"
+test_ratio = "codeprobe.core.scoring:ContinuousScorer"
 [build-system]
 requires = ["setuptools>=68", "wheel"]
 build-backend = "setuptools.build_meta"

{codeprobe-0.2.8 → codeprobe-0.3.1}/src/codeprobe/__init__.py RENAMED

@@ -1,3 +1,3 @@
 """codeprobe — Benchmark AI coding agents against your own codebase."""
-__version__ = "0.2.8"
+__version__ = "0.3.1"

{codeprobe-0.2.8 → codeprobe-0.3.1}/src/codeprobe/assess/heuristics.py RENAMED

@@ -142,7 +142,12 @@ def _run_git(args: list[str], cwd: Path) -> str:
             timeout=30,
         )
         if result.returncode != 0:
-            logger.debug("git %s exited %d: %s", " ".join(args), result.returncode, result.stderr.strip())
+            logger.debug(
+                "git %s exited %d: %s",
+                " ".join(args),
+                result.returncode,
+                result.stderr.strip(),
+            )
             return ""
         return result.stdout.strip()
     except (subprocess.TimeoutExpired, OSError) as exc:
@@ -307,7 +312,9 @@ def gather_heuristics(repo_path: Path) -> RepoHeuristics:
     history, CI presence, test coverage, languages, and activity.
     """
     total_commits_str = _run_git(["rev-list", "--count", "HEAD"], cwd=repo_path)
-    merge_commits_str = _run_git(["rev-list", "--merges", "--count", "HEAD"], cwd=repo_path)
+    merge_commits_str = _run_git(
+        ["rev-list", "--merges", "--count", "HEAD"], cwd=repo_path
+    )
     contributors_str = _run_git(["shortlog", "-sn", "HEAD"], cwd=repo_path)
     file_list = _run_git(["ls-files"], cwd=repo_path)
@@ -354,7 +361,10 @@ def score_repo_heuristic(heuristics: RepoHeuristics) -> AssessmentScore:
     has_ci = heuristics.has_ci
     has_fw = len(heuristics.test_frameworks) > 0
     if has_tests and has_ci and has_fw:
-        tc_score, tc_reason = 1.0, f"Tests + CI + framework ({', '.join(heuristics.test_frameworks)})"
+        tc_score, tc_reason = (
+            1.0,
+            f"Tests + CI + framework ({', '.join(heuristics.test_frameworks)})",
+        )
     elif has_tests and (has_ci or has_fw):
         tc_score, tc_reason = 0.7, "Tests present with partial CI/framework support"
     elif has_tests:
@@ -409,15 +419,29 @@ def score_repo_heuristic(heuristics: RepoHeuristics) -> AssessmentScore:
         DimensionScore(name="ci_maturity", score=ci_score, reasoning=ci_reason),
     )
-    # Equal weights for heuristic path (model path lets the model weight them).
-    overall = sum(d.score for d in dimensions) / len(dimensions)
+    # Weighted average — ci_maturity is a weak signal because CI configs are
+    # often absent in shallow clones / Sourcegraph views, and codeprobe
+    # validates via mined test.sh scripts, not CI pipelines.
+    _WEIGHTS: dict[str, float] = {
+        "task_richness": 0.25,
+        "test_coverage": 0.25,
+        "complexity": 0.20,
+        "activity": 0.15,
+        "documentation": 0.10,
+        "ci_maturity": 0.05,
+    }
+    overall = sum(d.score * _WEIGHTS[d.name] for d in dimensions)
     if overall >= 0.7:
         recommendation = "Excellent benchmarking candidate — rich history with tests"
     elif overall >= 0.5:
-        recommendation = "Good candidate — may need more merge history for diverse tasks"
+        recommendation = (
+            "Good candidate — may need more merge history for diverse tasks"
+        )
     elif overall >= 0.3:
-        recommendation = "Fair candidate — limited test coverage may reduce task quality"
+        recommendation = (
+            "Fair candidate — limited test coverage may reduce task quality"
+        )
     else:
         recommendation = "Poor candidate — consider a repo with more history and tests"
@@ -458,11 +482,15 @@ def _parse_model_assessment(
         score_val = float(item.get("score", 0))
         score_val = max(0.0, min(1.0, score_val))
         reasoning = str(item.get("reasoning", ""))
-        dim_by_name[name] = DimensionScore(name=name, score=score_val, reasoning=reasoning)
+        dim_by_name[name] = DimensionScore(
+            name=name, score=score_val, reasoning=reasoning
+        )
     missing = set(RUBRIC_V1) - set(dim_by_name)
     if missing:
-        raise LLMParseError(f"Model response missing dimensions: {', '.join(sorted(missing))}")
+        raise LLMParseError(
+            f"Model response missing dimensions: {', '.join(sorted(missing))}"
+        )
     dimensions = tuple(dim_by_name[name] for name in RUBRIC_V1)
@@ -498,6 +526,11 @@ def score_repo_with_model(heuristics: RepoHeuristics) -> AssessmentScore:
         "You are evaluating a code repository's suitability for AI agent benchmarking.\n\n"
         f"Here are the raw repository statistics:\n{stats_json}\n\n"
         f"Score this repository on each of these dimensions (0.0 to 1.0):\n{rubric_list}\n\n"
+        "Weighting guidance for the overall score: task_richness and test_coverage "
+        "are the most important (~25% each), followed by complexity (~20%), "
+        "activity (~15%), documentation (~10%). ci_maturity should be a minor "
+        "signal (~5%) because CI configs are often absent in cloned repos and "
+        "codeprobe validates via mined test scripts, not CI pipelines.\n\n"
         "Respond with ONLY valid JSON matching this exact schema:\n"
         "{\n"
         '  "overall": <float 0.0-1.0>,\n'
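
To see what the move from equal weights to `_WEIGHTS` does to the overall score, the two formulas can be compared on one set of dimension scores. The weights below are copied from the hunk above; the scores are invented for illustration, and `weighted_overall` is a hypothetical stand-in for the inline `sum(...)` expression.

```python
import math

# Weights copied from the _WEIGHTS table in the diff.
WEIGHTS = {
    "task_richness": 0.25,
    "test_coverage": 0.25,
    "complexity": 0.20,
    "activity": 0.15,
    "documentation": 0.10,
    "ci_maturity": 0.05,
}


def weighted_overall(scores: dict[str, float]) -> float:
    """Weighted sum of dimension scores; no division because the weights sum to 1."""
    return sum(scores[name] * weight for name, weight in WEIGHTS.items())


# Invented scores: a strong repo whose CI config is missing from the clone.
scores = {
    "task_richness": 1.0,
    "test_coverage": 0.8,
    "complexity": 0.6,
    "activity": 0.6,
    "documentation": 0.5,
    "ci_maturity": 0.0,
}

equal = sum(scores.values()) / len(scores)  # the 0.2.8 formula
weighted = weighted_overall(scores)         # the 0.3.1 formula
print(f"equal={equal:.3f} weighted={weighted:.3f}")
```

With these numbers the equal-weight average is about 0.583 and the weighted score is about 0.71, so the missing CI configuration no longer keeps the repo below the 0.7 cut-off used for the "Excellent" recommendation.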

{codeprobe-0.2.8 → codeprobe-0.3.1}/src/codeprobe/cli/__init__.py RENAMED

@@ -84,6 +84,10 @@ def main(verbose: int, quiet: bool, log_format: str) -> None:
     and interpret the results to find which setup works best for YOUR code.
     """
     _configure_logging(verbose=verbose, quiet=quiet, log_format=log_format)
+    ctx = click.get_current_context()
+    ctx.ensure_object(dict)
+    ctx.obj["log_format"] = log_format
+    ctx.obj["quiet"] = quiet
 @main.command()
@@ -101,6 +105,40 @@ def init(path: str) -> None:
 @main.command()
 @click.argument("path", default=".")
+@click.option(
+    "--preset",
+    type=click.Choice(["quick", "mcp"], case_sensitive=False),
+    default=None,
+    help="Apply a named preset: 'quick' (count=3) or 'mcp' (org-scale + MCP families).",
+)
+@click.option(
+    "--goal",
+    type=click.Choice(
+        ["quality", "navigation", "mcp", "general"], case_sensitive=False
+    ),
+    default=None,
+    help="Eval goal: quality, navigation, mcp, general. Skips interactive goal prompt.",
+)
+@click.option(
+    "--profile",
+    "profile_name",
+    default=None,
+    help="Load a user-defined profile from ~/.codeprobe/mine-profiles.json "
+    "or .codeprobe/mine-profiles.json. Explicit flags override profile values.",
+)
+@click.option(
+    "--save-profile",
+    "save_profile_name",
+    default=None,
+    help="Save current flag values as a named profile to ~/.codeprobe/mine-profiles.json.",
+)
+@click.option(
+    "--list-profiles",
+    "list_profiles_flag",
+    is_flag=True,
+    default=False,
+    help="Show available profiles from user and project levels.",
+)
 @click.option("--count", default=5, help="Number of tasks to mine (3-20).")
 @click.option(
     "--source",
@@ -206,8 +244,15 @@ def init(path: str) -> None:
     "(e.g. github.com/sg-evals/numpy). Defaults to github.com/sg-evals/{repo_name} "
     "when --mcp-families is used. Requires SOURCEGRAPH_TOKEN env var.",
 )
+@click.pass_context
 def mine(
+    ctx: click.Context,
     path: str,
+    preset: str | None,
+    goal: str | None,
+    profile_name: str | None,
+    save_profile_name: str | None,
+    list_profiles_flag: bool,
     count: int,
     source: str,
     min_files: int,
@@ -232,6 +277,21 @@ def mine(
     Extracts real code-change tasks from merged PRs/MRs with ground truth,
     test scripts, and scoring rubrics.
+    \b
+    Presets (--preset):
+      quick  — Fast scan: count=3, default SDLC mode
+      mcp    — MCP eval: count=8, org-scale + MCP families + enrich
+    \b
+    Profiles (--profile / --save-profile / --list-profiles):
+      Save:  codeprobe mine --save-profile my-setup --count 10 --org-scale .
+      Load:  codeprobe mine --profile my-setup /path/to/repo
+      List:  codeprobe mine --list-profiles
+    \b
+    Precedence: built-in defaults < profile < --preset < explicit CLI flags.
+    \b
     Use --org-scale to mine comprehension/IR tasks with oracle verification
     instead of SDLC code-change tasks.
@@ -242,10 +302,100 @@ def mine(
     choosing an eval goal, task count, and git host before mining.
     Use --no-interactive to skip the prompts and use defaults/flags directly.
     """
-    from codeprobe.cli.mine_cmd import run_mine
+    from pathlib import Path as _Path
+    from codeprobe.cli.mine_cmd import (
+        list_profiles,
+        load_profile,
+        run_mine,
+        save_profile,
+    )
+    # --list-profiles: show and exit
+    if list_profiles_flag:
+        repo_path = _Path(path).resolve() if path != "." else _Path.cwd()
+        entries = list_profiles(repo_path)
+        if not entries:
+            click.echo("No profiles found.")
+        else:
+            click.echo(f"{'Name':<20s} {'Source':<10s} {'Settings'}")
+            click.echo("-" * 60)
+            for name, source_label, prof in entries:
+                summary = ", ".join(f"{k}={v}" for k, v in sorted(prof.items()))
+                click.echo(f"{name:<20s} {source_label:<10s} {summary}")
+        return
+    # --save-profile: save current flags and exit
+    if save_profile_name is not None:
+        # Collect all current param values, keeping only those that differ
+        # from Click defaults.
+        param_defaults = {p.name: p.default for p in ctx.command.params}
+        # Exclude meta-params that aren't mining flags
+        _EXCLUDE_FROM_PROFILE = frozenset(
+            {
+                "path",
+                "profile_name",
+                "save_profile_name",
+                "list_profiles_flag",
+            }
+        )
+        values = {
+            k: (list(v) if isinstance(v, tuple) else v)
+            for k, v in ctx.params.items()
+            if k not in _EXCLUDE_FROM_PROFILE and v != param_defaults.get(k)
+        }
+        saved_path = save_profile(save_profile_name, values)
+        click.echo(f"Profile '{save_profile_name}' saved to {saved_path}")
+        return
+    # --profile: load profile values as defaults, then apply preset and CLI overrides
+    if profile_name is not None:
+        repo_path = _Path(path).resolve() if path != "." else _Path.cwd()
+        prof = load_profile(profile_name, repo_path)
+        # Determine which params were explicitly set on the CLI
+        explicitly_set = {
+            p.name
+            for p in ctx.command.params
+            if ctx.get_parameter_source(p.name) is not None
+            and ctx.get_parameter_source(p.name).name == "COMMANDLINE"
+        }
+        # Apply profile values for params NOT explicitly set on CLI.
+        # Tuple-typed params (click multiple=True) need list→tuple coercion.
+        _TUPLE_PARAMS = frozenset({"subsystem", "family", "repos", "backends"})
+        def _prof_val(key: str, current: object) -> object:
+            if key in explicitly_set or key not in prof:
+                return current
+            v = prof[key]
+            return tuple(v) if key in _TUPLE_PARAMS else v
+        count = _prof_val("count", count)  # type: ignore[assignment]
+        source = _prof_val("source", source)  # type: ignore[assignment]
+        min_files = _prof_val("min_files", min_files)  # type: ignore[assignment]
+        enrich = _prof_val("enrich", enrich)  # type: ignore[assignment]
+        org_scale = _prof_val("org_scale", org_scale)  # type: ignore[assignment]
+        mcp_families = _prof_val("mcp_families", mcp_families)  # type: ignore[assignment]
+        no_llm = _prof_val("no_llm", no_llm)  # type: ignore[assignment]
+        discover_subsystems = _prof_val("discover_subsystems", discover_subsystems)  # type: ignore[assignment]
+        scan_timeout = _prof_val("scan_timeout", scan_timeout)  # type: ignore[assignment]
+        validate_flag = _prof_val("validate_flag", validate_flag)  # type: ignore[assignment]
+        curate = _prof_val("curate", curate)  # type: ignore[assignment]
+        verify_curation_flag = _prof_val("verify_curation_flag", verify_curation_flag)  # type: ignore[assignment]
+        sg_repo = _prof_val("sg_repo", sg_repo)  # type: ignore[assignment]
+        subsystem = _prof_val("subsystem", subsystem)  # type: ignore[assignment]
+        family = _prof_val("family", family)  # type: ignore[assignment]
+        repos = _prof_val("repos", repos)  # type: ignore[assignment]
+        backends = _prof_val("backends", backends)  # type: ignore[assignment]
+        interactive = _prof_val("interactive", interactive)  # type: ignore[assignment]
+        preset = _prof_val("preset", preset)  # type: ignore[assignment]
+        goal = _prof_val("goal", goal)  # type: ignore[assignment]
     run_mine(
         path,
+        preset=preset,
+        goal=goal,
         count=count,
         source=source,
         min_files=min_files,
@@ -294,7 +444,46 @@ def mine(
     default=False,
     help="Print estimated resource requirements without executing any agents.",
 )
+@click.option(
+    "--force-plain",
+    is_flag=True,
+    default=False,
+    help="Force plain-text output even in a TTY (disable Rich dashboard).",
+)
+@click.option(
+    "--force-rich",
+    is_flag=True,
+    default=False,
+    help="Force Rich Live dashboard even in non-TTY environments.",
+)
+@click.option(
+    "--timeout",
+    default=None,
+    type=int,
+    help="Timeout in seconds per task (overrides experiment.json extra.timeout_seconds).",
+)
+@click.option(
+    "--repeats",
+    default=None,
+    type=int,
+    help="Number of repeats per task (overrides default of 1).",
+)
+@click.option(
+    "--show-prompt",
+    is_flag=True,
+    default=False,
+    help="Print the fully-resolved prompt for the first task and exit (no agent spawned).",
+)
+@click.option(
+    "--suite",
+    "suite_path",
+    default=None,
+    type=click.Path(exists=True),
+    help="Path to a suite.toml manifest to filter tasks by type, difficulty, and tags.",
+)
+@click.pass_context
 def run(
+    ctx: click.Context,
     path: str,
     agent: str,
     model: str | None,
@@ -302,6 +491,12 @@ def run(
     max_cost_usd: float | None,
     parallel: int,
     dry_run: bool,
+    force_plain: bool,
+    force_rich: bool,
+    timeout: int | None,
+    repeats: int | None,
+    show_prompt: bool,
+    suite_path: str | None,
 ) -> None:
     """Run eval tasks against an AI coding agent.
@@ -310,6 +505,16 @@ def run(
     """
     from codeprobe.cli.run_cmd import run_eval
+    ctx.ensure_object(dict)
+    log_format = ctx.obj.get("log_format", "text")
+    quiet = ctx.obj.get("quiet", False)
+    if show_prompt:
+        from codeprobe.cli.run_cmd import show_prompt_and_exit
+        show_prompt_and_exit(path, config=config, agent=agent, model=model)
+        return
     run_eval(
         path,
         agent=agent,
@@ -318,6 +523,13 @@ def run(
         max_cost_usd=max_cost_usd,
         parallel=parallel,
         dry_run=dry_run,
+        log_format=log_format,
+        quiet=quiet,
+        force_plain=force_plain,
+        force_rich=force_rich,
+        timeout=timeout,
+        repeats=repeats if repeats is not None else 1,
+        suite_path=suite_path,
     )
@@ -488,3 +700,18 @@ main.add_command(scaffold)
 from codeprobe.cli.probe_cmd import probe  # noqa: E402
 main.add_command(probe)
+# Register the preambles subcommand group
+from codeprobe.cli.preamble_cmd import preambles  # noqa: E402
+main.add_command(preambles)
+# Register the doctor command
+from codeprobe.cli.doctor_cmd import doctor  # noqa: E402
+main.add_command(doctor)
+# Register the validate command
+from codeprobe.cli.validate_cmd import validate  # noqa: E402
+main.add_command(validate)
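
The `--save-profile` and `--profile` branches of `mine` above form a round trip: non-default values are written out with tuples turned into lists, then read back and overlaid on every parameter that was not set explicitly. The sketch below reproduces that flow with the standard library only, as an illustration rather than the package's code. `DEFAULTS` is a hypothetical subset of the real parameters, `to_profile` and `apply_profile` are assumed names, and the family values `a` and `b` are placeholders.

```python
import json

# Hypothetical subset of the mine parameters and their defaults.
DEFAULTS = {"count": 5, "org_scale": False, "enrich": False, "family": ()}
# Parameters that take repeated values and therefore arrive as tuples.
TUPLE_PARAMS = frozenset({"family"})


def to_profile(params: dict) -> dict:
    """Keep only non-default values, converting tuples to lists for JSON."""
    return {
        k: (list(v) if isinstance(v, tuple) else v)
        for k, v in params.items()
        if v != DEFAULTS.get(k)
    }


def apply_profile(current: dict, profile: dict, explicitly_set: set[str]) -> dict:
    """Overlay profile values, except where a flag was given on the command line."""
    merged = dict(current)
    for key, value in profile.items():
        if key in explicitly_set or key not in current:
            continue
        # JSON has no tuple type, so repeated-value params are coerced back.
        merged[key] = tuple(value) if key in TUPLE_PARAMS else value
    return merged


# Save step: count, org_scale and family differ from the defaults.
saved = json.loads(
    json.dumps(
        to_profile(
            {"count": 10, "org_scale": True, "enrich": False, "family": ("a", "b")}
        )
    )
)

# Load step: the profile is applied, but an explicit count of 3 wins.
loaded = apply_profile(dict(DEFAULTS, count=3), saved, explicitly_set={"count"})
print(loaded)
```

Storing only the values that differ from the defaults keeps a saved profile small and lets later changes to the built-in defaults flow through to anything the profile does not pin.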
