PyPI - agentpack-cli - Versions diffs - 0.1.22__tar.gz → 0.1.24__tar.gz - Mend

agentpack-cli 0.1.22.tar.gz → 0.1.24.tar.gz


Files changed (85)

{agentpack_cli-0.1.22 → agentpack_cli-0.1.24}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: agentpack-cli
-Version: 0.1.22
+Version: 0.1.24
 Summary: Task-aware context packing for AI coding agents — Claude, Cursor, Windsurf, Codex, and Antigravity
 License: MIT
 License-File: LICENSE
@@ -44,7 +44,7 @@ Description-Content-Type: text/markdown
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![CI](https://github.com/vishal2612200/agentpack/actions/workflows/ci.yml/badge.svg)](https://github.com/vishal2612200/agentpack/actions/workflows/ci.yml)
-> **Status: alpha (v0.1.22).** Works, tested, used in real sessions. Python and JavaScript/TypeScript are the best-supported languages. Not yet validated across a wide range of repos. API may change before 1.0.
+> **Status: alpha (v0.1.24).** Works, tested, used in real sessions. Python and JavaScript/TypeScript are the best-supported languages. Not yet validated across a wide range of repos. API may change before 1.0.
 >
 > **Platform note:** macOS and Linux are fully supported. Windows support is not yet implemented (git hooks use POSIX shell; the Claude Code session hooks use `python3`/`rm -f`). Contributions welcome.
@@ -278,6 +278,17 @@ Requires Python 3.10+.
 > **PyPI note:** The package is `agentpack-cli` (the name `agentpack` was already taken). The CLI command is still `agentpack`.
+### npm wrapper
+AgentPack can also be installed from npm:
+```bash
+npm install -g @vishal2612200/agentpack
+agentpack --version
+```
+The npm package is a thin Node.js wrapper around the Python CLI. It requires Node.js 18+ and Python 3.10+, then installs the matching `agentpack-cli` PyPI package into a per-version virtual environment on first run. This keeps the implementation single-source while giving JavaScript-heavy teams a familiar install path.
 ---
 ## Start Once, Then Work Normally
@@ -885,6 +896,7 @@ Mode comparison: fix auth token expiry
 [[cases]]
 task = "fix auth token expiry"
 mode = "balanced"
+task_type = "backend-api"
 expected_files = [
   "src/auth/token.py",
   "src/auth/session.py",
@@ -898,6 +910,8 @@ expected_files = [
 Use `--misses` when recall is low. It prints each expected file that was not selected with status, rank, score, and scoring reasons, which helps separate ignored files, budget cuts, low scores, and missing dependency signals.
+Add `task_type` to group results by workflow area. Benchmark summaries report average precision, recall, F1, and token noise by type, so a repo can show "backend-api is good, frontend-web is noisy" instead of hiding that under one aggregate.
 ---
 ### `agentpack scan`
@@ -938,7 +952,7 @@ agentpack benchmark --compare --misses
 `--sample-fixtures` runs bundled FastAPI, Next.js, and mixed Python/TypeScript fixture evals from an AgentPack source checkout. It is a smoke test, not a claim about your repo.
-For an 8+ usefulness signal, use `benchmark.toml` with real third-party or customer-style repos: 5-20 historical tasks, the files actually changed for each task, and `--compare` results for recall, F1, rank@K, and token noise. That is better than trusting generic benchmarks because it tells you whether AgentPack selects the files that matter in code the package has never seen.
+For an 8+ usefulness signal, use `benchmark.toml` with real third-party or customer-style repos: 5-20 historical tasks, `task_type` labels, the files actually changed for each task, and `--compare` results for recall, F1, rank@K, and token noise. That is better than trusting generic benchmarks because it tells you whether AgentPack selects the files that matter in code the package has never seen.
 ---
@@ -1566,7 +1580,10 @@ Useful checks before opening a PR:
 ```bash
 pytest
+python -m ruff check src tests
 python -m build
+npm test --prefix npm
+(cd npm && npm pack --dry-run)
 agentpack benchmark --sample-fixtures --misses
 ```
@@ -1577,6 +1594,7 @@ Good contribution areas:
 - Better symbol extraction for Go, Rust, Java, and Kotlin
 - More precise import/dependency resolution for framework-heavy repos
 - Ranking regressions with `expected_files` cases that reproduce misses
+- npm wrapper improvements that preserve the Python CLI as the source of truth
 Please include tests for ranking changes. A good ranking PR usually adds one focused unit test and one scenario in `tests/test_ranking_evals.py`.

{agentpack_cli-0.1.22 → agentpack_cli-0.1.24}/README.md RENAMED

@@ -5,7 +5,7 @@
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![CI](https://github.com/vishal2612200/agentpack/actions/workflows/ci.yml/badge.svg)](https://github.com/vishal2612200/agentpack/actions/workflows/ci.yml)
-> **Status: alpha (v0.1.22).** Works, tested, used in real sessions. Python and JavaScript/TypeScript are the best-supported languages. Not yet validated across a wide range of repos. API may change before 1.0.
+> **Status: alpha (v0.1.24).** Works, tested, used in real sessions. Python and JavaScript/TypeScript are the best-supported languages. Not yet validated across a wide range of repos. API may change before 1.0.
 >
 > **Platform note:** macOS and Linux are fully supported. Windows support is not yet implemented (git hooks use POSIX shell; the Claude Code session hooks use `python3`/`rm -f`). Contributions welcome.
@@ -239,6 +239,17 @@ Requires Python 3.10+.
 > **PyPI note:** The package is `agentpack-cli` (the name `agentpack` was already taken). The CLI command is still `agentpack`.
+### npm wrapper
+AgentPack can also be installed from npm:
+```bash
+npm install -g @vishal2612200/agentpack
+agentpack --version
+```
+The npm package is a thin Node.js wrapper around the Python CLI. It requires Node.js 18+ and Python 3.10+, then installs the matching `agentpack-cli` PyPI package into a per-version virtual environment on first run. This keeps the implementation single-source while giving JavaScript-heavy teams a familiar install path.
 ---
 ## Start Once, Then Work Normally
@@ -846,6 +857,7 @@ Mode comparison: fix auth token expiry
 [[cases]]
 task = "fix auth token expiry"
 mode = "balanced"
+task_type = "backend-api"
 expected_files = [
   "src/auth/token.py",
   "src/auth/session.py",
@@ -859,6 +871,8 @@ expected_files = [
 Use `--misses` when recall is low. It prints each expected file that was not selected with status, rank, score, and scoring reasons, which helps separate ignored files, budget cuts, low scores, and missing dependency signals.
+Add `task_type` to group results by workflow area. Benchmark summaries report average precision, recall, F1, and token noise by type, so a repo can show "backend-api is good, frontend-web is noisy" instead of hiding that under one aggregate.
 ---
 ### `agentpack scan`
@@ -899,7 +913,7 @@ agentpack benchmark --compare --misses
 `--sample-fixtures` runs bundled FastAPI, Next.js, and mixed Python/TypeScript fixture evals from an AgentPack source checkout. It is a smoke test, not a claim about your repo.
-For an 8+ usefulness signal, use `benchmark.toml` with real third-party or customer-style repos: 5-20 historical tasks, the files actually changed for each task, and `--compare` results for recall, F1, rank@K, and token noise. That is better than trusting generic benchmarks because it tells you whether AgentPack selects the files that matter in code the package has never seen.
+For an 8+ usefulness signal, use `benchmark.toml` with real third-party or customer-style repos: 5-20 historical tasks, `task_type` labels, the files actually changed for each task, and `--compare` results for recall, F1, rank@K, and token noise. That is better than trusting generic benchmarks because it tells you whether AgentPack selects the files that matter in code the package has never seen.
 ---
@@ -1527,7 +1541,10 @@ Useful checks before opening a PR:
 ```bash
 pytest
+python -m ruff check src tests
 python -m build
+npm test --prefix npm
+(cd npm && npm pack --dry-run)
 agentpack benchmark --sample-fixtures --misses
 ```
@@ -1538,6 +1555,7 @@ Good contribution areas:
 - Better symbol extraction for Go, Rust, Java, and Kotlin
 - More precise import/dependency resolution for framework-heavy repos
 - Ranking regressions with `expected_files` cases that reproduce misses
+- npm wrapper improvements that preserve the Python CLI as the source of truth
 Please include tests for ranking changes. A good ranking PR usually adds one focused unit test and one scenario in `tests/test_ranking_evals.py`.

{agentpack_cli-0.1.22 → agentpack_cli-0.1.24}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "agentpack-cli"
-version = "0.1.22"
+version = "0.1.24"
 description = "Task-aware context packing for AI coding agents — Claude, Cursor, Windsurf, Codex, and Antigravity"
 readme = "README.md"
 requires-python = ">=3.10"

{agentpack_cli-0.1.22 → agentpack_cli-0.1.24}/src/agentpack/__init__.py RENAMED

@@ -1,3 +1,3 @@
 """AgentPack — task-aware context packing for AI coding agents."""
-__version__ = "0.1.22"
+__version__ = "0.1.24"

{agentpack_cli-0.1.22 → agentpack_cli-0.1.24}/src/agentpack/analysis/ranking.py RENAMED

@@ -16,6 +16,16 @@ _STOPWORDS = {
     "use", "using", "used", "how", "what", "when", "where", "why",
 }
+_GENERIC_TASK_TERMS = {
+    "add", "added", "change", "changed", "changes", "clean", "cleanup",
+    "code", "commit", "context", "debug", "dev", "development", "doc",
+    "docs", "eval", "evals", "feature", "fix", "freshness", "general",
+    "impl", "implement", "implementation", "improve", "issue", "metric", "metrics",
+    "noise", "noisy", "package", "pack", "packs", "release", "repo",
+    "source", "sync", "task", "tasks", "test", "tests", "update", "use",
+    "useful", "usefulness", "version", "workflow", "workflows",
+}
 _CONCEPT_MAP: dict[str, frozenset[str]] = {
     # rate limiting
     "rate": frozenset({"throttle", "ratelimit", "leaky", "bucket", "debounce", "backoff", "quota"}),
@@ -219,15 +229,18 @@ def extract_keyword_weights(task: str) -> dict[str, float]:
             continue
         if word in _STOPWORDS:
             continue
-        _add_keyword_weight(keyword_weights, word, 1.0)
+        literal_weight = 0.25 if word in _GENERIC_TASK_TERMS else 1.0
+        _add_keyword_weight(keyword_weights, word, literal_weight)
         if word in _VARIANTS:
-            _add_keyword_weight(keyword_weights, _VARIANTS[word], 0.75)
+            variant = _VARIANTS[word]
+            variant_weight = 0.25 if variant in _GENERIC_TASK_TERMS else min(0.75, literal_weight)
+            _add_keyword_weight(keyword_weights, variant, variant_weight)
     # Expand via concept map one level only. Expanded concepts are weaker than
     # literal task words so broad terms like "task" do not dominate ranking.
     expanded: dict[str, float] = {}
     for kw in keyword_weights:
-        if kw in _CONCEPT_MAP:
+        if kw in _CONCEPT_MAP and kw not in _GENERIC_TASK_TERMS:
             for synonym in _CONCEPT_MAP[kw]:
                 _add_keyword_weight(expanded, synonym, 0.35)
                 if synonym in _VARIANTS:
@@ -237,6 +250,17 @@ def extract_keyword_weights(task: str) -> dict[str, float]:
     return keyword_weights
+def generic_task_term_ratio(task: str) -> float:
+    words = [
+        word for word in re.split(r"[^a-zA-Z0-9]+", task.lower())
+        if len(word) >= 3 and word not in _STOPWORDS
+    ]
+    if not words:
+        return 0.0
+    generic = sum(1 for word in words if word in _GENERIC_TASK_TERMS)
+    return generic / len(words)
 def extract_keywords(task: str) -> set[str]:
     return set(extract_keyword_weights(task))
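
The ranking.py change above can be read as two small rules: generic task words keep a reduced literal weight of 0.25, and the share of generic words in a task is reported as a ratio. A minimal sketch of both, assuming tiny stand-in vocabularies and an assumed "keep the strongest weight" merge rule (the package's `_add_keyword_weight` is not shown in this diff, and the names below are illustrative, not the package's API):

```python
import re

# Stand-in vocabularies for illustration only; the real sets in ranking.py are larger.
STOPWORDS = {"the", "and", "for", "with", "how", "what"}
GENERIC_TASK_TERMS = {"fix", "add", "update", "test", "tests", "code", "issue"}


def tokenize(task: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, drop short words and stopwords."""
    return [
        word
        for word in re.split(r"[^a-zA-Z0-9]+", task.lower())
        if len(word) >= 3 and word not in STOPWORDS
    ]


def keyword_weights(task: str) -> dict[str, float]:
    """Give generic words a literal weight of 0.25 and all other words 1.0."""
    weights: dict[str, float] = {}
    for word in tokenize(task):
        weight = 0.25 if word in GENERIC_TASK_TERMS else 1.0
        # Assumed merge rule: keep the strongest weight seen for a word.
        weights[word] = max(weights.get(word, 0.0), weight)
    return weights


def generic_ratio(task: str) -> float:
    """Fraction of kept task words that are generic; 0.0 for an empty task."""
    words = tokenize(task)
    if not words:
        return 0.0
    return sum(1 for word in words if word in GENERIC_TASK_TERMS) / len(words)
```

With these stand-in sets, "fix auth token expiry" keeps `auth`, `token`, and `expiry` at full weight while `fix` contributes a quarter as much, so the domain words drive the ranking.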

{agentpack_cli-0.1.22 → agentpack_cli-0.1.24}/src/agentpack/application/pack_service.py RENAMED

@@ -22,6 +22,7 @@ from agentpack.analysis.ranking import (
     enrich_keyword_weights_from_files,
     boost_paired_tests,
     boost_cross_layer_related,
+    generic_task_term_ratio,
 )
 from agentpack.analysis.tests import find_related_tests
 from agentpack.analysis import dependency_graph as dep_graph_mod
@@ -37,6 +38,7 @@ class PackRequest:
     budget: int
     since: str | None
     refresh: bool
+    task_source: str = "explicit"
 @dataclass
@@ -57,6 +59,7 @@ class ChangeSet:
     all_changed: set[str]
     git_staged: set[str]
     recently_modified: list[str]
+    source: str
     current_snap: dict[str, Any] = field(default_factory=dict)
@@ -64,6 +67,7 @@ class ChangeSet:
 class RankResult:
     """Result of keyword extraction and file scoring."""
     keywords: set[str]
+    generic_ratio: float
     scored: list[tuple[Any, float, list[str]]]
@@ -80,6 +84,8 @@ class PackPlan:
     git_staged: set[str]
     recently_modified: list[str]
     keywords: set[str]
+    generic_task_ratio: float
+    changed_files_source: str
     scored: list[tuple[Any, float, list[str]]]
     selected: list[SelectedFile]
     receipts: list[Receipt]
@@ -119,6 +125,7 @@ class ChangeDetector:
             all_changed=changed_from_snap | git_changed,
             git_staged=git_staged,
             recently_modified=recently_modified,
+            source=_change_source(root, since, changed_from_snap, git_changed),
             current_snap=current_snap,
         )
@@ -140,6 +147,7 @@ class FileRanker:
         keyword_weights = extract_keyword_weights(task)
         keyword_weights = enrich_keyword_weights_from_files(keyword_weights, changes.all_changed, packable)
         keywords = set(keyword_weights)
+        generic_ratio = generic_task_term_ratio(task)
         all_paths = {f.path for f in packable}
         for fi in packable:
@@ -165,7 +173,7 @@ class FileRanker:
         )
         scored = boost_cross_layer_related(scored, keyword_weights, weights=cfg.scoring)
         scored = boost_paired_tests(scored, weights=cfg.scoring)
-        return RankResult(keywords=keywords, scored=scored)
+        return RankResult(keywords=keywords, generic_ratio=generic_ratio, scored=scored)
 class PackPlanner:
@@ -217,8 +225,8 @@ class PackPlanner:
             budget=effective_budget,
             max_file_tokens=cfg.context.max_file_tokens,
             keywords=rank_result.keywords,
-            min_summary_score=cfg.context.min_summary_score,
-            max_summary_files=_summary_cap_for_mode(cfg, request.mode),
+            min_summary_score=_summary_score_floor(cfg, rank_result.generic_ratio),
+            max_summary_files=_summary_cap_for_mode(cfg, request.mode, rank_result.generic_ratio),
         )
         phase_times["select"] = time.perf_counter() - t0
@@ -233,6 +241,8 @@ class PackPlanner:
             git_staged=changes.git_staged,
             recently_modified=changes.recently_modified,
             keywords=rank_result.keywords,
+            generic_task_ratio=rank_result.generic_ratio,
+            changed_files_source=changes.source,
             scored=rank_result.scored,
             selected=selected,
             receipts=receipts,
@@ -279,6 +289,13 @@ class PackService:
         saving_pct = (1 - packed_tokens / all_tokens) * 100 if all_tokens > 0 else 0.0
         all_redaction_warnings = [w for sf in plan.selected for w in sf.redaction_warnings]
+        freshness = _build_freshness_metadata(
+            root,
+            request=request,
+            plan=plan,
+            snapshot_root_hash=plan.current_snap["root_hash"],
+        )
+        freshness_warnings = _freshness_warnings(root, request, freshness)
         pack_obj = ContextPack(
             task=request.task,
@@ -294,6 +311,8 @@ class PackService:
             receipts=plan.receipts if cfg.context.include_receipts else [],
             redaction_warnings=all_redaction_warnings,
             stale=False,
+            freshness=freshness,
+            freshness_warnings=freshness_warnings,
         )
         adapter = AdapterRegistry.get(request.agent, cfg)
@@ -312,6 +331,8 @@ class PackService:
             mode=request.mode,
             budget=plan.budget,
             token_estimate=packed_tokens,
+            freshness=freshness,
+            freshness_warnings=freshness_warnings,
         )
         excluded_receipts = [r for r in plan.receipts if r.action == "excluded"]
         # Budget-cut: files that scored OK but didn't fit — more useful signal than "score too low"
@@ -359,14 +380,104 @@ def _sf_tokens(sf: SelectedFile) -> int:
     return estimate_tokens("\n".join(parts)) if parts else 50
-def _summary_cap_for_mode(cfg: Any, mode: str) -> int:
+def _summary_score_floor(cfg: Any, generic_ratio: float) -> float:
+    floor = cfg.context.min_summary_score
+    if generic_ratio >= 0.5:
+        return floor + 15
+    if generic_ratio >= 0.35:
+        return floor + 8
+    return floor
+def _summary_cap_for_mode(cfg: Any, mode: str, generic_ratio: float = 0.0) -> int:
     if mode == "minimal":
-        return cfg.context.max_summary_files_minimal
-    if mode == "balanced":
-        return cfg.context.max_summary_files_balanced
-    if mode == "deep":
-        return cfg.context.max_summary_files_deep
-    return 0
+        cap = cfg.context.max_summary_files_minimal
+    elif mode == "balanced":
+        cap = cfg.context.max_summary_files_balanced
+    elif mode == "deep":
+        cap = cfg.context.max_summary_files_deep
+    else:
+        cap = 0
+    if cap > 0 and generic_ratio >= 0.5:
+        return max(8, cap // 2)
+    if cap > 0 and generic_ratio >= 0.35:
+        return max(12, int(cap * 0.75))
+    return cap
+def _change_source(root: Path, since: str | None, snapshot_changed: set[str], git_changed: set[str]) -> str:
+    if not git.is_git_repo(root):
+        return "snapshot diff"
+    if since:
+        return f"git diff since {since} + snapshot diff"
+    if git_changed and snapshot_changed:
+        return "git working tree + snapshot diff"
+    if git_changed:
+        return "git working tree"
+    if snapshot_changed:
+        return "snapshot diff"
+    return "no live changes; ranking used task keywords and history"
+def _task_md_body(root: Path) -> str | None:
+    task_md_path = root / ".agentpack" / "task.md"
+    if not task_md_path.exists():
+        return None
+    try:
+        content = task_md_path.read_text(encoding="utf-8").strip()
+    except OSError:
+        return None
+    lines = [ln for ln in content.splitlines() if ln.strip() and not ln.startswith("#")]
+    body = lines[0].strip() if lines else ""
+    placeholder = "Write or update the current coding task here."
+    if body and placeholder not in body:
+        return body
+    return None
+def _build_freshness_metadata(
+    root: Path,
+    *,
+    request: PackRequest,
+    plan: PackPlan,
+    snapshot_root_hash: str,
+) -> dict[str, Any]:
+    dirty = git.dirty_files(root) if git.is_git_repo(root) else set()
+    metadata: dict[str, Any] = {
+        "generated_at": datetime.now(timezone.utc).isoformat(),
+        "task_source": request.task_source,
+        "changed_files_source": plan.changed_files_source,
+        "snapshot_root_hash": snapshot_root_hash,
+        "generic_task_ratio": round(plan.generic_task_ratio, 3),
+        "dirty_files_count": len(dirty),
+    }
+    if git.is_git_repo(root):
+        metadata["git_sha"] = git.current_sha(root)
+        metadata["git_branch"] = git.current_branch(root)
+    if dirty:
+        metadata["dirty_files_sample"] = sorted(dirty)[:8]
+    task_md = _task_md_body(root)
+    if task_md:
+        metadata["task_md"] = task_md
+    return metadata
+def _freshness_warnings(root: Path, request: PackRequest, freshness: dict[str, Any]) -> list[str]:
+    warnings: list[str] = []
+    task_md = freshness.get("task_md")
+    if task_md and task_md != request.task:
+        warnings.append(
+            ".agentpack/task.md differs from the packed task; rerun with --task auto if task.md should win."
+        )
+    if freshness.get("changed_files_source") == "no live changes; ranking used task keywords and history":
+        warnings.append("No live changed files were detected; treat selected files as keyword-based hints.")
+    if freshness.get("generic_task_ratio", 0) >= 0.5:
+        warnings.append("Task terms are broad/generic; pack tightened weak-summary selection.")
+    saved_sha = freshness.get("git_sha")
+    current_sha = git.current_sha(root) if git.is_git_repo(root) else None
+    if saved_sha and current_sha and saved_sha != current_sha:
+        warnings.append("Git HEAD changed since this pack was generated.")
+    return warnings
 def _load_last_record(metrics_path: Path) -> dict[str, Any] | None:
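
The two helpers `_summary_score_floor` and `_summary_cap_for_mode` tighten summary selection as the generic-term ratio rises. The sketch below restates their thresholds as standalone functions that take plain numbers instead of the `cfg` object (the function and parameter names here are illustrative), which makes the arithmetic easy to check by hand:

```python
def summary_score_floor(base_floor: float, generic_ratio: float) -> float:
    """Raise the minimum summary score when the task wording is mostly generic."""
    if generic_ratio >= 0.5:
        return base_floor + 15
    if generic_ratio >= 0.35:
        return base_floor + 8
    return base_floor


def summary_cap(cap: int, generic_ratio: float) -> int:
    """Shrink the summary-file cap for generic tasks, with a lower bound of 8 or 12."""
    if cap > 0 and generic_ratio >= 0.5:
        return max(8, cap // 2)
    if cap > 0 and generic_ratio >= 0.35:
        return max(12, int(cap * 0.75))
    return cap
```

For example, a configured cap of 40 becomes 20 at a ratio of 0.5 and 30 at a ratio of 0.4. One consequence of the `max(...)` bounds is that a small positive cap can come out larger than configured: a cap of 6 at a ratio of 0.5 yields 8.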

{agentpack_cli-0.1.22 → agentpack_cli-0.1.24}/src/agentpack/commands/benchmark.py RENAMED

@@ -23,6 +23,7 @@ class BenchmarkCase:
     task: str
     mode: str = "balanced"
     expected_files: list[str] = field(default_factory=list)
+    task_type: str = "general"
 @dataclass
@@ -60,43 +61,54 @@ def _sample_fixture_cases(fixtures_root: Path) -> list[FixtureCase]:
             "py_fastapi_app",
             "fix FastAPI auth token validation",
             ["src/app/auth.py", "tests/test_auth.py"],
+            "backend-api",
         ),
         (
             "py_fastapi_app",
             "add user profile API endpoint",
             ["src/app/main.py", "src/app/users.py", "tests/test_users.py"],
+            "backend-api",
         ),
         (
             "nextjs_app",
             "fix Next.js auth helper and API client",
             ["src/lib/auth.ts", "src/lib/api.ts"],
+            "frontend-web",
         ),
         (
             "nextjs_app",
             "debug dashboard page data loading",
             ["src/app/page.tsx", "src/lib/api.ts"],
+            "frontend-web",
         ),
         (
             "mixed_repo",
             "fix TypeScript API serialization utility",
             ["src/ts/api.ts", "src/ts/utils.ts"],
+            "typescript",
         ),
         (
             "mixed_repo",
             "fix Python utility parsing edge case",
             ["src/py/utils.py"],
+            "python",
         ),
     ]
     cases: list[FixtureCase] = []
-    for fixture, task, expected_files in specs:
+    for fixture, task, expected_files, task_type in specs:
         fixture_root = fixtures_root / fixture
         if fixture_root.exists():
             cases.append(
                 FixtureCase(
                     fixture=fixture,
                     root=fixture_root,
-                    case=BenchmarkCase(task=task, mode="balanced", expected_files=expected_files),
+                    case=BenchmarkCase(
+                        task=task,
+                        mode="balanced",
+                        expected_files=expected_files,
+                        task_type=task_type,
+                    ),
                 )
             )
     return cases
@@ -115,6 +127,7 @@ def _load_cases(path: Path) -> list[BenchmarkCase]:
             task=raw["task"],
             mode=raw.get("mode", "balanced"),
             expected_files=raw.get("expected_files", []),
+            task_type=raw.get("task_type", "general"),
         ))
     return cases
@@ -136,13 +149,15 @@ def _scaffold_cases(root: Path) -> Path:
         '[[cases]]\n'
         'task = "fix auth token expiry"\n'
         'mode = "balanced"\n'
+        'task_type = "backend-api"\n'
         '# expected_files = [\n'
         '#   "src/auth/token.py",\n'
         '#   "src/auth/session.py",\n'
         '# ]\n\n'
         '[[cases]]\n'
         'task = "add rate limiting to API endpoints"\n'
-        'mode = "balanced"\n',
+        'mode = "balanced"\n'
+        'task_type = "backend-api"\n',
         encoding="utf-8",
     )
     return out
@@ -170,7 +185,7 @@ def _load_history_cases(root: Path, n: int) -> list[BenchmarkCase]:
                     break
         except json.JSONDecodeError:
             pass
-    return [BenchmarkCase(task=t, mode=m) for t, m in seen]
+    return [BenchmarkCase(task=t, mode=m, task_type="history") for t, m in seen]
 def _random_baseline(
@@ -268,20 +283,20 @@ def _run_case(root: Path, case: BenchmarkCase) -> CaseResult:
         for expected_path in sorted(expected_set - selected_set):
             fi = all_file_map.get(expected_path)
             scored_info = scored_map.get(expected_path)
-            if fi is None:
-                status = "not found in scanned files"
-            elif fi.ignored or fi.binary:
-                status = "ignored or binary"
-            elif expected_path in receipt_map:
-                status = receipt_map[expected_path]
-            else:
-                status = "ranked but not selected" if scored_info else "not scored"
+            status = _miss_status(
+                fi=fi,
+                expected_path=expected_path,
+                receipt_map=receipt_map,
+                scored_info=scored_info,
+                changed_files_source=plan.changed_files_source,
+            )
             missed_expected.append({
                 "path": expected_path,
                 "status": status,
                 "rank": scored_info["rank"] if scored_info else None,
                 "score": round(scored_info["score"], 1) if scored_info else None,
                 "reasons": scored_info["reasons"][:4] if scored_info else [],
+                "basis": plan.changed_files_source,
             })
     else:
         missed_expected = []
@@ -320,12 +335,37 @@ def _precision_recall(result: CaseResult) -> tuple[float, float, float]:
     return p, r, f1
+def _miss_status(
+    *,
+    fi: Any,
+    expected_path: str,
+    receipt_map: dict[str, str],
+    scored_info: dict[str, Any] | None,
+    changed_files_source: str,
+) -> str:
+    suffix = ""
+    if changed_files_source.startswith("no live changes"):
+        suffix = "; no live changed-file signal"
+    if fi is None:
+        return "not found in scanned files"
+    if fi.ignored or fi.binary:
+        return "ignored or binary"
+    if expected_path in receipt_map:
+        return receipt_map[expected_path] + suffix
+    if scored_info:
+        if scored_info["score"] <= 0:
+            return "scored too low" + suffix
+        return "ranked but not selected" + suffix
+    return "not scored" + suffix
 def _persist_result(root: Path, result: CaseResult) -> None:
     out = root / ".agentpack" / "benchmark_results.jsonl"
     p, r, f1 = _precision_recall(result) if result.case.expected_files else (None, None, None)
     record = {
         "ts": datetime.now(timezone.utc).isoformat(),
         "task": result.case.task,
+        "task_type": result.case.task_type,
         "mode": result.case.mode,
         "packed_tokens": result.packed_tokens,
         "raw_tokens": result.raw_tokens,
@@ -356,7 +396,10 @@ def _print_case_detail(result: CaseResult, show_misses: bool = False) -> None:
     has_gt = bool(result.case.expected_files)
     p, r, f1 = _precision_recall(result) if has_gt else (0.0, 0.0, 0.0)
-    console.print(f"\n[bold cyan]{result.case.task}[/]  [dim]mode={result.case.mode}[/]")
+    console.print(
+        f"\n[bold cyan]{result.case.task}[/]  "
+        f"[dim]mode={result.case.mode} type={result.case.task_type}[/]"
+    )
     tbl = Table(box=box.SIMPLE, show_header=False, padding=(0, 2))
     tbl.add_column(style="dim")
@@ -467,6 +510,42 @@ def _print_summary_table(results: list[CaseResult]) -> None:
     console.print(tbl)
+def _print_task_type_summary(results: list[CaseResult]) -> None:
+    grouped: dict[str, list[CaseResult]] = {}
+    for result in results:
+        if result.case.expected_files:
+            grouped.setdefault(result.case.task_type, []).append(result)
+    if not grouped:
+        return
+    tbl = Table(box=box.SIMPLE, show_header=True, padding=(0, 1))
+    tbl.add_column("task type", max_width=28)
+    tbl.add_column("cases", justify="right")
+    tbl.add_column("avg P", justify="right")
+    tbl.add_column("avg R", justify="right")
+    tbl.add_column("avg F1", justify="right")
+    tbl.add_column("avg noise", justify="right")
+    for task_type, rows in sorted(grouped.items()):
+        metrics = [_precision_recall(row) for row in rows]
+        avg_p = sum(item[0] for item in metrics) / len(metrics)
+        avg_r = sum(item[1] for item in metrics) / len(metrics)
+        avg_f1 = sum(item[2] for item in metrics) / len(metrics)
+        noise_values = [row.noise_pct for row in rows if row.noise_pct is not None]
+        avg_noise = sum(noise_values) / len(noise_values) if noise_values else None
+        tbl.add_row(
+            task_type,
+            str(len(rows)),
+            f"{avg_p:.1%}",
+            f"{avg_r:.1%}",
+            f"{avg_f1:.1%}",
+            f"{avg_noise:.0f}%" if avg_noise is not None else "-",
+        )
+    console.print("\n[bold]By Task Type[/]")
+    console.print(tbl)
 def _print_miss_details(results: list[CaseResult]) -> None:
     rows = [miss | {"task": result.case.task[:30]} for result in results for miss in result.missed_expected]
     if not rows:
@@ -588,11 +667,12 @@ def register(app: typer.Typer) -> None:
                             FixtureCase(
                                 fixture=fixture_case.fixture,
                                 root=fixture_case.root,
-                                case=BenchmarkCase(
-                                    task=fixture_case.case.task,
-                                    mode=fixture_mode,
-                                    expected_files=fixture_case.case.expected_files,
-                                ),
+                case=BenchmarkCase(
+                    task=fixture_case.case.task,
+                    mode=fixture_mode,
+                    expected_files=fixture_case.case.expected_files,
+                    task_type=fixture_case.case.task_type,
+                ),
                             )
                         )
                 fixture_cases = expanded_fixtures
@@ -625,6 +705,7 @@ def register(app: typer.Typer) -> None:
             else:
                 console.print("\n[bold]Summary[/]")
                 _print_fixture_summary_table(results)
+                _print_task_type_summary(results)
                 if misses:
                     _print_miss_details(results)
             return
@@ -654,7 +735,14 @@ def register(app: typer.Typer) -> None:
             expanded: list[BenchmarkCase] = []
             for c in bench_cases:
                 for m in ("minimal", "balanced", "deep"):
-                    expanded.append(BenchmarkCase(task=c.task, mode=m, expected_files=c.expected_files))
+                    expanded.append(
+                        BenchmarkCase(
+                            task=c.task,
+                            mode=m,
+                            expected_files=c.expected_files,
+                            task_type=c.task_type,
+                        )
+                    )
             bench_cases = expanded
         console.print(f"\n[bold]Running {len(bench_cases)} benchmark case(s)...[/]\n")
@@ -686,5 +774,6 @@ def register(app: typer.Typer) -> None:
                     _print_case_detail(r, show_misses=misses)
             console.print("\n[bold]Summary[/]")
             _print_summary_table(results)
+            _print_task_type_summary(results)
             if misses:
                 _print_miss_details(results)
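
The per-type summary added in benchmark.py groups cases that have expected files by `task_type` and averages their metrics. A minimal sketch of that grouping, assuming a simplified `Case` record and a textbook precision/recall/F1 definition (the package's `CaseResult` and `_precision_recall` are not fully shown in this diff, so both are stand-ins):

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Case:
    """Hypothetical stand-in for a benchmark result."""
    task_type: str
    expected: set[str] = field(default_factory=set)
    selected: set[str] = field(default_factory=set)


def precision_recall_f1(case: Case) -> tuple[float, float, float]:
    """Standard set-based precision, recall, and F1 (assumed definition)."""
    hits = len(case.expected & case.selected)
    p = hits / len(case.selected) if case.selected else 0.0
    r = hits / len(case.expected) if case.expected else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


def summarize_by_type(cases: list[Case]) -> dict[str, tuple[int, float, float, float]]:
    """Map each task type to (case count, avg precision, avg recall, avg F1)."""
    grouped: dict[str, list[tuple[float, float, float]]] = defaultdict(list)
    for case in cases:
        if case.expected:  # cases without ground truth are skipped, as in the diff
            grouped[case.task_type].append(precision_recall_f1(case))
    summary: dict[str, tuple[int, float, float, float]] = {}
    for task_type, metrics in sorted(grouped.items()):
        n = len(metrics)
        summary[task_type] = (
            n,
            sum(m[0] for m in metrics) / n,
            sum(m[1] for m in metrics) / n,
            sum(m[2] for m in metrics) / n,
        )
    return summary
```

Keeping the averages per type is what lets a report say that one workflow area is precise while another is noisy, rather than blending both into a single number.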

{agentpack_cli-0.1.22 → agentpack_cli-0.1.24}/src/agentpack/commands/pack.py RENAMED

@@ -33,7 +33,7 @@ def register(app: typer.Typer) -> None:
             raise typer.Exit(1)
         resolved_agent = _resolve_agent(agent)
-        resolved_task = _resolve_task(task)
+        resolved_task, task_source = _resolve_task_with_source(task)
         if watch or session:
             _pack_watch(agent=resolved_agent, task=resolved_task, mode=mode, budget=budget,
@@ -48,6 +48,7 @@ def register(app: typer.Typer) -> None:
             budget=budget,
             since=since,
             refresh=refresh,
+            task_source=task_source,
         ))
         _print_pack_summary(result)
@@ -62,8 +63,13 @@ def _resolve_agent(agent: str) -> str:
 def _resolve_task(task: str) -> str:
+    resolved, _source = _resolve_task_with_source(task)
+    return resolved
+def _resolve_task_with_source(task: str) -> tuple[str, str]:
     if task != "auto":
-        return task
+        return task, "explicit"
     root = _root()
     # task.md takes priority over all git heuristics
     task_md_path = root / ".agentpack" / "task.md"
@@ -74,10 +80,10 @@ def _resolve_task(task: str) -> str:
         _PLACEHOLDER = "Write or update the current coding task here."
         if body and _PLACEHOLDER not in body:
             console.print(f"[dim]Auto task (task.md): {body}[/]")
-            return body
+            return body, "task.md"
     inferred, source = git.infer_task_with_source(root)
     console.print(f"[dim]Auto task ({source}): {inferred}[/]")
-    return inferred
+    return inferred, source
 def _print_pack_summary(result: PackResult) -> None:
@@ -224,7 +230,7 @@ def _pack_watch(
     def _run_pack() -> None:
         result = PackService().run(PackRequest(
             root=root, agent=agent, task=task, mode=mode, budget=budget,
-            since=since, refresh=False,
+            since=since, refresh=False, task_source="watch",
         ))
         _print_pack_summary(result)
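
Note on the `pack.py` change: the task resolver now reports where the task came from (`"explicit"`, `"task.md"`, or whatever source the git heuristic names). The precedence can be sketched as a standalone function. The `task.md` parsing here is borrowed from `_task_md_body` in the `mcp_server.py` hunk and may differ from the elided lines in `pack.py`; `infer_from_git` is a hypothetical callable standing in for `git.infer_task_with_source`.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable

PLACEHOLDER = "Write or update the current coding task here."


def resolve_task_with_source(
    task: str,
    root: Path,
    infer_from_git: Callable[[Path], tuple[str, str]],
) -> tuple[str, str]:
    """Return (task, source): explicit task, then task.md, then git inference."""
    if task != "auto":
        return task, "explicit"
    task_md = root / ".agentpack" / "task.md"
    if task_md.exists():
        # Assumed parsing: first non-blank line that is not a markdown heading.
        lines = [
            line
            for line in task_md.read_text(encoding="utf-8").splitlines()
            if line.strip() and not line.startswith("#")
        ]
        body = lines[0].strip() if lines else ""
        if body and PLACEHOLDER not in body:
            return body, "task.md"
    return infer_from_git(root)
```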

{agentpack_cli-0.1.22 → agentpack_cli-0.1.24}/src/agentpack/core/context_pack.py RENAMED

@@ -55,17 +55,30 @@ def save_pack_metadata(
     mode: str,
     budget: int,
     token_estimate: int = 0,
+    freshness: dict[str, Any] | None = None,
+    freshness_warnings: list[str] | None = None,
 ) -> None:
+    generated_at = (
+        freshness.get("generated_at")
+        if freshness and freshness.get("generated_at")
+        else datetime.now(timezone.utc).isoformat()
+    )
     meta = {
         "context_path": context_path,
-        "generated_at": datetime.now(timezone.utc).isoformat(),
+        "generated_at": generated_at,
         "snapshot_root_hash": snapshot_root_hash,
         "task": task,
         "agent": agent,
         "mode": mode,
         "budget": budget,
         "token_estimate": token_estimate,
+        "freshness": freshness or {},
+        "freshness_warnings": freshness_warnings or [],
     }
+    if freshness:
+        for key in ("git_sha", "git_branch", "task_source", "changed_files_source"):
+            if key in freshness:
+                meta[key] = freshness[key]
     _metadata_path(root).write_text(json.dumps(meta, indent=2))
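
Note on the metadata change: `generated_at` now prefers the timestamp carried in `freshness`, and four freshness keys are also copied to the top level of the metadata. A reduced sketch of that shape, with most fields omitted and no file written (`build_pack_metadata` is a hypothetical name):

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any

PROMOTED_KEYS = ("git_sha", "git_branch", "task_source", "changed_files_source")


def build_pack_metadata(
    task: str,
    freshness: dict[str, Any] | None = None,
    freshness_warnings: list[str] | None = None,
) -> dict[str, Any]:
    """Build a trimmed-down metadata dict mirroring the hunk above."""
    # Reuse the pack's own timestamp when present so both records agree.
    generated_at = (freshness or {}).get("generated_at") or datetime.now(
        timezone.utc
    ).isoformat()
    meta: dict[str, Any] = {
        "generated_at": generated_at,
        "task": task,
        "freshness": freshness or {},
        "freshness_warnings": freshness_warnings or [],
    }
    # Promote selected keys so readers need not look inside "freshness".
    for key in PROMOTED_KEYS:
        if freshness and key in freshness:
            meta[key] = freshness[key]
    return meta
```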

{agentpack_cli-0.1.22 → agentpack_cli-0.1.24}/src/agentpack/core/git.py RENAMED

@@ -74,6 +74,38 @@ def changed_files_since(root: Path, ref: str) -> set[str]:
     return result
+def current_sha(root: Path) -> str | None:
+    out = _run(["git", "rev-parse", "HEAD"], root)
+    return out.strip() if out else None
+def current_branch(root: Path) -> str | None:
+    out = _run(["git", "rev-parse", "--abbrev-ref", "HEAD"], root)
+    if not out:
+        return None
+    branch = out.strip()
+    return branch if branch and branch != "HEAD" else None
+def dirty_files(root: Path) -> set[str]:
+    """Tracked and untracked files in git status --short output."""
+    out = _run(["git", "status", "--short"], root)
+    if not out:
+        return set()
+    paths: set[str] = set()
+    for line in out.splitlines():
+        line = line.strip()
+        if not line:
+            continue
+        # Handles ordinary status lines and simple renames.
+        raw_path = line[3:].strip() if len(line) > 3 else line
+        if " -> " in raw_path:
+            raw_path = raw_path.rsplit(" -> ", 1)[1]
+        if raw_path:
+            paths.add(raw_path)
+    return paths
 def file_churn_counts(root: Path, max_commits: int = 200) -> dict[str, int]:
     """Return commit count per file from the last max_commits commits.
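
Note on `dirty_files`: in `git status --short` output each line is two status columns, a space, then the path, and the first column can itself be a space (for example ` M src/app.py`). Stripping a line before slicing at index 3 would cut into such a path, so the sketch below slices first. It works on a plain string rather than calling git; `parse_status_short` is a hypothetical name, and how the package's `_run` helper treats leading whitespace is not shown in the diff. Quoted paths are not handled.

```python
def parse_status_short(output: str) -> set[str]:
    """Extract paths from `git status --short` text, keeping rename targets."""
    paths: set[str] = set()
    for line in output.splitlines():
        # Columns 0-1 are status codes, column 2 is a space, path starts at 3.
        if len(line) < 4:
            continue
        raw_path = line[3:].strip()
        # Simple renames appear as "old -> new"; keep the new path.
        if " -> " in raw_path:
            raw_path = raw_path.rsplit(" -> ", 1)[1]
        if raw_path:
            paths.add(raw_path)
    return paths
```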

{agentpack_cli-0.1.22 → agentpack_cli-0.1.24}/src/agentpack/core/models.py RENAMED

@@ -1,6 +1,7 @@
 from pathlib import Path
 from typing import Literal
-from pydantic import BaseModel
+from typing import Any
+from pydantic import BaseModel, Field
 class ScanResult(BaseModel):
@@ -81,6 +82,8 @@ class ContextPack(BaseModel):
     receipts: list[Receipt]
     redaction_warnings: list[str] = []
     stale: bool = False
+    freshness: dict[str, Any] = Field(default_factory=dict)
+    freshness_warnings: list[str] = Field(default_factory=list)
 class DependencyNode(BaseModel):

{agentpack_cli-0.1.22 → agentpack_cli-0.1.24}/src/agentpack/mcp_server.py RENAMED

@@ -27,6 +27,7 @@ import json
 import sys
 from pathlib import Path
+from agentpack.core import git
 from agentpack.core.token_estimator import estimate_tokens
@@ -110,15 +111,46 @@ def _get_context_impl(root: Path) -> str:
     generated_at = metadata.get("generated_at", "unknown") if metadata else "unknown"
     token_estimate = metadata.get("token_estimate", 0) if metadata else 0
+    stale_reasons: list[str] = []
     if metadata is None or snapshot is None or metadata.get("snapshot_root_hash") != snapshot.get("root_hash"):
-        header = f"> **Stale context** — repo changed since last pack (generated: {generated_at}). Run pack_context() to refresh.\n\n"
+        stale_reasons.append("repo snapshot changed")
+    if metadata:
+        saved_sha = metadata.get("git_sha") or (metadata.get("freshness") or {}).get("git_sha")
+        current_sha = git.current_sha(root) if git.is_git_repo(root) else None
+        if saved_sha and current_sha and saved_sha != current_sha:
+            stale_reasons.append("git HEAD changed")
+        task_md = _task_md_body(root)
+        if task_md and task_md != metadata.get("task"):
+            stale_reasons.append(".agentpack/task.md differs")
+    if stale_reasons:
+        reason_text = ", ".join(stale_reasons)
+        header = (
+            f"> **Stale context** — {reason_text} since last pack "
+            f"(generated: {generated_at}). Run pack_context() to refresh.\n\n"
+        )
     else:
         header = f"> Context is fresh (generated: {generated_at}, {token_estimate:,} tokens).\n\n"
     return header + content
+def _task_md_body(root: Path) -> str | None:
+    path = root / ".agentpack" / "task.md"
+    if not path.exists():
+        return None
+    try:
+        content = path.read_text(encoding="utf-8").strip()
+    except OSError:
+        return None
+    lines = [line for line in content.splitlines() if line.strip() and not line.startswith("#")]
+    body = lines[0].strip() if lines else ""
+    if body and "Write or update the current coding task here." not in body:
+        return body
+    return None
 def _explain_file_impl(root: Path, path: str, task: str = "") -> str:
     """Testable core of the explain_file MCP tool."""
     from agentpack.application.pack_service import PackPlanner, PackRequest, _sf_tokens
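
Note on the staleness check: `_get_context_impl` now collects up to three independent reasons (snapshot hash, git HEAD, `task.md`) instead of a single snapshot comparison. A pure-function sketch of that decision, assuming the caller has already read the current values (`stale_reasons` and its parameters are hypothetical names):

```python
from __future__ import annotations

from typing import Any


def stale_reasons(
    metadata: dict[str, Any] | None,
    current_root_hash: str | None,
    head_sha: str | None,
    task_md_body: str | None,
) -> list[str]:
    """Return human-readable reasons the saved pack may be out of date."""
    reasons: list[str] = []
    if (
        metadata is None
        or current_root_hash is None
        or metadata.get("snapshot_root_hash") != current_root_hash
    ):
        reasons.append("repo snapshot changed")
    if metadata:
        # The SHA may sit at the top level or inside the freshness dict.
        saved_sha = metadata.get("git_sha") or (metadata.get("freshness") or {}).get("git_sha")
        if saved_sha and head_sha and saved_sha != head_sha:
            reasons.append("git HEAD changed")
        if task_md_body and task_md_body != metadata.get("task"):
            reasons.append(".agentpack/task.md differs")
    return reasons
```

Keeping the check free of I/O makes each reason easy to test in isolation; the real function also reads the files and builds the header.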

{agentpack_cli-0.1.22 → agentpack_cli-0.1.24}/src/agentpack/renderers/markdown.py RENAMED

@@ -71,6 +71,26 @@ def render_claude(pack: ContextPack) -> str:
     sections.append("## Task")
     sections.append("")
+    if pack.freshness or pack.freshness_warnings:
+        sections.append("## Freshness")
+        sections.append("")
+        if pack.freshness_warnings:
+            sections.append("> **Refresh recommended:** " + " ".join(pack.freshness_warnings))
+            sections.append("")
+        for label, key in (
+            ("Generated", "generated_at"),
+            ("Git branch", "git_branch"),
+            ("Git SHA", "git_sha"),
+            ("Task source", "task_source"),
+            ("Changed-file source", "changed_files_source"),
+            ("Snapshot hash", "snapshot_root_hash"),
+            ("Dirty files at pack time", "dirty_files_count"),
+        ):
+            value = pack.freshness.get(key)
+            if value is not None:
+                sections.append(f"- **{label}:** {value}")
+        sections.append("")
     sections.append(pack.task)
     sections.append("")
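
Note on the renderer change: the new Freshness block prints an optional warning line followed by one bullet per populated field. The same logic as a standalone sketch that returns the lines instead of appending to `sections` (`render_freshness` is a hypothetical name):

```python
from __future__ import annotations

from typing import Any

FRESHNESS_FIELDS = (
    ("Generated", "generated_at"),
    ("Git branch", "git_branch"),
    ("Git SHA", "git_sha"),
    ("Task source", "task_source"),
    ("Changed-file source", "changed_files_source"),
    ("Snapshot hash", "snapshot_root_hash"),
    ("Dirty files at pack time", "dirty_files_count"),
)


def render_freshness(freshness: dict[str, Any], warnings: list[str]) -> list[str]:
    """Return markdown lines for the Freshness block, or [] if there is none."""
    if not freshness and not warnings:
        return []
    lines = ["## Freshness", ""]
    if warnings:
        lines.append("> **Refresh recommended:** " + " ".join(warnings))
        lines.append("")
    for label, key in FRESHNESS_FIELDS:
        value = freshness.get(key)
        # `is not None` keeps legitimate falsy values such as a count of 0.
        if value is not None:
            lines.append(f"- **{label}:** {value}")
    lines.append("")
    return lines
```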