npm - cawdex - Versions diffs - 1.35.74 → 1.35.76 - Mend

cawdex 1.35.74 → 1.35.76

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (87)

package/README.md +5 -5
package/bin/anycode.js +2 -2
package/bin/cawdex.js +408 -408
package/bin/ecc-hooks.cjs +11 -11
package/dist/agents-md.d.ts +31 -0
package/dist/agents-md.js +340 -0
package/dist/agents-md.js.map +1 -0
package/dist/agents.js +1424 -1424
package/dist/api.d.ts +1 -0
package/dist/api.js +19 -14
package/dist/api.js.map +1 -1
package/dist/autonomous-loops.js +287 -287
package/dist/benchmark-repos.d.ts +31 -0
package/dist/benchmark-repos.js +234 -8
package/dist/benchmark-repos.js.map +1 -1
package/dist/command-palette.js +4 -2
package/dist/command-palette.js.map +1 -1
package/dist/compaction.js +8 -8
package/dist/config.js +51 -36
package/dist/config.js.map +1 -1
package/dist/content-engine.js +543 -543
package/dist/context-brief.d.ts +4 -0
package/dist/context-brief.js +230 -0
package/dist/context-brief.js.map +1 -0
package/dist/cost-tracker.d.ts +33 -14
package/dist/cost-tracker.js +81 -19
package/dist/cost-tracker.js.map +1 -1
package/dist/coverage.js +39 -39
package/dist/docs-sync.js +98 -98
package/dist/evaluation.js +452 -452
package/dist/fixed-footer.d.ts +7 -1
package/dist/fixed-footer.js +92 -18
package/dist/fixed-footer.js.map +1 -1
package/dist/git-workflow.js +49 -49
package/dist/index.d.ts +2 -0
package/dist/index.js +197 -65
package/dist/index.js.map +1 -1
package/dist/instant-artifact.d.ts +6 -0
package/dist/instant-artifact.js +397 -0
package/dist/instant-artifact.js.map +1 -0
package/dist/live-queue.js +1 -1
package/dist/live-queue.js.map +1 -1
package/dist/model-aliases.d.ts +37 -0
package/dist/model-aliases.js +203 -0
package/dist/model-aliases.js.map +1 -0
package/dist/orchestration.js +15 -15
package/dist/permissions.d.ts +6 -0
package/dist/permissions.js +53 -0
package/dist/permissions.js.map +1 -1
package/dist/pm2-manager.js +26 -26
package/dist/query.d.ts +0 -1
package/dist/query.js +74 -39
package/dist/query.js.map +1 -1
package/dist/refactor.js +87 -87
package/dist/repo-command.js +7 -1
package/dist/repo-command.js.map +1 -1
package/dist/search-first.js +92 -92
package/dist/skill-create.js +100 -100
package/dist/stitch.js +1 -1
package/dist/system-prompt.d.ts +2 -1
package/dist/system-prompt.js +10 -5
package/dist/system-prompt.js.map +1 -1
package/dist/tools/github-repo-digest.d.ts +1 -1
package/dist/tools/github-repo-digest.js +38 -6
package/dist/tools/github-repo-digest.js.map +1 -1
package/dist/types.d.ts +3 -0
package/dist/types.js.map +1 -1
package/dist/verification.js +55 -55
package/package.json +1 -1
package/resources/__init__.py +1 -1
package/resources/exgentic/cawdex_agent/README.md +114 -114
package/resources/exgentic/cawdex_agent/__init__.py +5 -5
package/resources/exgentic/cawdex_agent/agent.py +605 -605
package/resources/exgentic/cawdex_agent/requirements.txt +2 -2
package/resources/exgentic/cawdex_agent/setup.sh +21 -21
package/resources/exgentic/cawdex_agent/utils.py +1061 -1061
package/resources/hal/cawdex_agent/README.md +24 -24
package/resources/hal/cawdex_agent/__init__.py +1 -1
package/resources/hal/cawdex_agent/main.py +550 -550
package/resources/hal/cawdex_agent/requirements.txt +2 -2
package/resources/kbench/cawdex_agent/README.md +107 -107
package/resources/kbench/cawdex_agent/adapter.manifest.json +19 -19
package/resources/kbench/cawdex_agent/runner.mjs +753 -753
package/resources/open_agent_leaderboard/cawdex-agent-card.md +119 -119
package/resources/terminal_bench/__init__.py +1 -1
package/resources/terminal_bench/cawdex_agent.py +174 -174
package/resources/terminal_bench/setup.sh +121 -121

package/resources/hal/cawdex_agent/requirements.txt CHANGED

@@ -1,2 +1,2 @@
-# Cawdex HAL adapter has no Python dependencies.
-# It shells out to the installed cawdex CLI.
+# Cawdex HAL adapter has no Python dependencies.
+# It shells out to the installed cawdex CLI.

package/resources/kbench/cawdex_agent/README.md CHANGED

@@ -1,107 +1,107 @@
-# Cawdex KBench Adapter
-This directory is a KBench `custom-adapter` for Cawdex.
-```bash
-kbench run \
-  --benchmark swe \
-  --harness custom-adapter \
-  --adapter /path/to/resources/kbench/cawdex_agent \
-  --model-name openrouter/free \
-  --instruction "Fix the bug"
-```
-The runner reads the KBench JSON payload from `KBENCH_ADAPTER_INPUT` or stdin,
-invokes `cawdex --prompt "/benchmark ..."` in task mode, and emits one
-`AdapterRunnerOutput` JSON object to stdout.
-Known KBench slugs are mapped to benchmark profiles before dispatch:
-`swe`/`swe-bench`, `tb2`/`terminal-bench`, `terminalworld`/`terminal-world`,
-`swe-chain`,
-`swe-cycle`/`fullcycle`/`swe-judge`, `swe-ci`/`swecibench`, `swe-prbench`/`prbench`/`pr-review`, `tml-bench`/`tabular-ml`/`kaggle-ml`, `pi-bench`/`proactive-assistant`, `ci-repair`/`ci-repair-bench`, `roadmapbench`, `saasbench`,
-`swe-bench-mobile`, `webdevbench`/`swe-webdev-bench`, `appworld`, `browsecomp`/`browsecompplus`, and
-`tau2`/`tau-bench` use specialized prompts; unknown slugs use
-`generic`.
-The output includes redacted instruction/stdout/stderr artifact refs, native
-Cawdex trace refs, and redacted git patch/status refs when the task
-worktree is a git repo. If a native `summary.json` exists, compact verifier
-evidence, including parsed counts, compact failure signatures, and final-answer
-verification-claim plus incomplete/blocked completion evidence, usage/cost
-telemetry, cost-efficiency risk, invalid tool-action telemetry, task-contract checklist completion/no-edit/test-edit signals,
-task-alignment risk signals, spec-compliance risk signals, reward-hack risk signals, long-horizon coverage risk signals, Pi-Bench proactivity ledger signals, incomplete/inconclusive verifier markers,
-environment setup/reconstruction signals for missing dependencies, toolchains,
-or build artifacts, dependency manifest/lockfile setup-validation signals,
-HarnessAudit-style harness-safety signals for protected-resource access, external information transfer, destructive operations, and oracle access,
-candidate-file dossier signals for broad pre-edit inspection without a compact dossier,
-root-cause hypothesis signals for repair edits after failed verifiers without an explicit diagnosis,
-targeted-fix manifest signals for repair edits after failed verifiers without a fix plan,
-trajectory-cleanup signals for base64/data-URI blobs, high-entropy encoded output, duplicate output, and excessive truncation,
-skill-view fit/timing signals, per-target edit localization signals, large edit-surface
-signals, scratch/probe artifact signals, redundant tool-call signals,
-redundant failing-verifier rerun signals, blind-repair signals, post-edit regression-cycle signals,
-AHE publish-state mutation signals, latest post-edit verifier signals, post-edit and final-state
-diff-review signals, final-edit validation stability/lucky-pass signals, broad-validation signals,
-CI-derived validation signals,
-source-research recency signals,
-process-defect scoring, AHE-style change-evaluation verdicts, submission bundle manifest readiness/hash metadata,
-and trajectory-quality fields are copied to
-`benchmarkResult.traceSummary` for harness-side scoring. `benchmarkResult.usage`
-also aliases the native usage block for cost-aware leaderboards. Native verifier
-trace previews preserve both head and tail output so final test summaries survive
-noisy install/build logs. `benchmarkResult.experienceCard` includes bounded
-task-alignment/spec-compliance/reward-hack/harness-safety/long-horizon/proactivity risk blocks, component-observability edit classification for AHE-style surface attribution, including SWE-WebDevBench canary/frontend-backend/security validation signals, SWE-Cycle lifecycle/setup/test-generation/judge validation signals, SWE-CI evolution/checklist/CI-loop validation signals, and Pi-Bench context-contract/hidden-intent/clarification/privacy/completion evidence, root-cause hypothesis state and targeted-fix counts for failed-verifier repair edits, decision-observability predictions for edits and validation-reliability evidence
-for final verifier stability, broad validation, and CI-derived validation, plus
-context-utilization precision/miss evidence, candidate-dossier status, and
-trajectory-cleanup summaries for retrieval-aware scoring and avoiding noisy prior
-traces, plus run-efficiency action/usage/cost/time evidence for cost-aware scoring. Prior
-experience hints also expose compact source-research coverage, including
-hit/error counts, targeted/fresh coverage, recency windows, top URLs, and
-Kaggle fallback status. When present, `benchmarkResult.traceSummary` also
-includes the redacted ACC-style task/context/answer compilation from the native
-Cawdex trace for retrieval, replay, or training-data curation.
-It also includes `changeEvaluation` and `submissionBundleManifest` when present, so leaderboard
-submission tooling can inspect artifact hashes and missing official score/session
-fields without parsing the full summary.
-Inside benchmark mode, the read-only `benchmark_context` preflight also surfaces
-CI workflow run commands plus setup actions, env key names, service containers,
-job containers, and images from GitHub Actions, GitLab CI, CircleCI, Azure
-Pipelines, and Jenkins files. Env values are not printed. Agents can reconstruct
-the relevant CI environment and then reproduce project-native test/build/lint
-steps before finalizing.
-It also separates reusable prior local benchmark experience from similar failed
-or unsafe prior runs, so context reuse stays method-level and current verifier
-evidence remains authoritative. Pi-Bench-like tasks additionally prefer prior
-experience with complete context/hidden-intent/clarification/privacy/completion
-proactivity ledgers and surface incomplete ledgers as warnings. AHE change
-evaluations also participate in reuse: confirmed manifests can rank higher,
-while contradicted, regression-risk, pending-verification, missing-prediction,
-or missing-regression-forecast manifests are warnings rather than replay hints.
-Context-utilization evidence participates in reuse as well: concise runs whose
-inspected context was used by the eventual patch and whose pre-edit search was
-compressed into a candidate-file dossier can rank higher, while low-utilization,
-missing-dossier, or pre-edit context-bloat runs are warnings rather than replay
-hints.
-AHE-style cleanup evidence participates in reuse too: prior runs with encoded blobs,
-duplicate observations, or excessive truncation are surfaced as warnings instead of
-replay hints.
-AHE-style diagnosis evidence participates in reuse too: prior runs that repaired after
-failed verifiers without a root-cause hypothesis are surfaced as warnings instead of
-replay hints.
-AHE-style fix-plan evidence participates in reuse too: prior runs that repaired after
-failed verifiers without a targeted-fix manifest are surfaced as warnings instead of
-replay hints.
-Interactive type-ahead is preserved across active-turn cancellation and
-permission interruptions, so user drafts return to the prompt instead of being
-silently submitted while the harness is still running.
-Useful env vars:
-- `CAWDEX_KBENCH_COMMAND` or `CAWDEX_KBENCH_COMMAND`: command used to launch Cawdex, default `cawdex`.
-- `CAWDEX_KBENCH_PERMISSION`: permission flag value, default `yolo`.
-- `CAWDEX_KBENCH_EXTRA_ARGS`: extra Cawdex CLI flags.
-- `CAWDEX_KBENCH_ARTIFACT_DIR`: directory for redacted instruction/stdout/stderr and trace files.
-- `CAWDEX_BASH_TIMEOUT_MS`: default Cawdex `bash` tool timeout; the adapter defaults to `300000` when unset.
-Provider keys should be passed via normal Cawdex env config or KBench's
-`--api-key-env`, which the runner forwards as Cawdex `--api-key-env`.
+# Cawdex KBench Adapter
+This directory is a KBench `custom-adapter` for Cawdex.
+```bash
+kbench run \
+  --benchmark swe \
+  --harness custom-adapter \
+  --adapter /path/to/resources/kbench/cawdex_agent \
+  --model-name openrouter/free \
+  --instruction "Fix the bug"
+```
+The runner reads the KBench JSON payload from `KBENCH_ADAPTER_INPUT` or stdin,
+invokes `cawdex --prompt "/benchmark ..."` in task mode, and emits one
+`AdapterRunnerOutput` JSON object to stdout.
+Known KBench slugs are mapped to benchmark profiles before dispatch:
+`swe`/`swe-bench`, `tb2`/`terminal-bench`, `terminalworld`/`terminal-world`,
+`swe-chain`,
+`swe-cycle`/`fullcycle`/`swe-judge`, `swe-ci`/`swecibench`, `swe-prbench`/`prbench`/`pr-review`, `tml-bench`/`tabular-ml`/`kaggle-ml`, `pi-bench`/`proactive-assistant`, `ci-repair`/`ci-repair-bench`, `roadmapbench`, `saasbench`,
+`swe-bench-mobile`, `webdevbench`/`swe-webdev-bench`, `appworld`, `browsecomp`/`browsecompplus`, and
+`tau2`/`tau-bench` use specialized prompts; unknown slugs use
+`generic`.
+The output includes redacted instruction/stdout/stderr artifact refs, native
+Cawdex trace refs, and redacted git patch/status refs when the task
+worktree is a git repo. If a native `summary.json` exists, compact verifier
+evidence, including parsed counts, compact failure signatures, and final-answer
+verification-claim plus incomplete/blocked completion evidence, usage/cost
+telemetry, cost-efficiency risk, invalid tool-action telemetry, task-contract checklist completion/no-edit/test-edit signals,
+task-alignment risk signals, spec-compliance risk signals, reward-hack risk signals, long-horizon coverage risk signals, Pi-Bench proactivity ledger signals, incomplete/inconclusive verifier markers,
+environment setup/reconstruction signals for missing dependencies, toolchains,
+or build artifacts, dependency manifest/lockfile setup-validation signals,
+HarnessAudit-style harness-safety signals for protected-resource access, external information transfer, destructive operations, and oracle access,
+candidate-file dossier signals for broad pre-edit inspection without a compact dossier,
+root-cause hypothesis signals for repair edits after failed verifiers without an explicit diagnosis,
+targeted-fix manifest signals for repair edits after failed verifiers without a fix plan,
+trajectory-cleanup signals for base64/data-URI blobs, high-entropy encoded output, duplicate output, and excessive truncation,
+skill-view fit/timing signals, per-target edit localization signals, large edit-surface
+signals, scratch/probe artifact signals, redundant tool-call signals,
+redundant failing-verifier rerun signals, blind-repair signals, post-edit regression-cycle signals,
+AHE publish-state mutation signals, latest post-edit verifier signals, post-edit and final-state
+diff-review signals, final-edit validation stability/lucky-pass signals, broad-validation signals,
+CI-derived validation signals,
+source-research recency signals,
+process-defect scoring, AHE-style change-evaluation verdicts, submission bundle manifest readiness/hash metadata,
+and trajectory-quality fields are copied to
+`benchmarkResult.traceSummary` for harness-side scoring. `benchmarkResult.usage`
+also aliases the native usage block for cost-aware leaderboards. Native verifier
+trace previews preserve both head and tail output so final test summaries survive
+noisy install/build logs. `benchmarkResult.experienceCard` includes bounded
+task-alignment/spec-compliance/reward-hack/harness-safety/long-horizon/proactivity risk blocks, component-observability edit classification for AHE-style surface attribution, including SWE-WebDevBench canary/frontend-backend/security validation signals, SWE-Cycle lifecycle/setup/test-generation/judge validation signals, SWE-CI evolution/checklist/CI-loop validation signals, and Pi-Bench context-contract/hidden-intent/clarification/privacy/completion evidence, root-cause hypothesis state and targeted-fix counts for failed-verifier repair edits, decision-observability predictions for edits and validation-reliability evidence
+for final verifier stability, broad validation, and CI-derived validation, plus
+context-utilization precision/miss evidence, candidate-dossier status, and
+trajectory-cleanup summaries for retrieval-aware scoring and avoiding noisy prior
+traces, plus run-efficiency action/usage/cost/time evidence for cost-aware scoring. Prior
+experience hints also expose compact source-research coverage, including
+hit/error counts, targeted/fresh coverage, recency windows, top URLs, and
+Kaggle fallback status. When present, `benchmarkResult.traceSummary` also
+includes the redacted ACC-style task/context/answer compilation from the native
+Cawdex trace for retrieval, replay, or training-data curation.
+It also includes `changeEvaluation` and `submissionBundleManifest` when present, so leaderboard
+submission tooling can inspect artifact hashes and missing official score/session
+fields without parsing the full summary.
+Inside benchmark mode, the read-only `benchmark_context` preflight also surfaces
+CI workflow run commands plus setup actions, env key names, service containers,
+job containers, and images from GitHub Actions, GitLab CI, CircleCI, Azure
+Pipelines, and Jenkins files. Env values are not printed. Agents can reconstruct
+the relevant CI environment and then reproduce project-native test/build/lint
+steps before finalizing.
+It also separates reusable prior local benchmark experience from similar failed
+or unsafe prior runs, so context reuse stays method-level and current verifier
+evidence remains authoritative. Pi-Bench-like tasks additionally prefer prior
+experience with complete context/hidden-intent/clarification/privacy/completion
+proactivity ledgers and surface incomplete ledgers as warnings. AHE change
+evaluations also participate in reuse: confirmed manifests can rank higher,
+while contradicted, regression-risk, pending-verification, missing-prediction,
+or missing-regression-forecast manifests are warnings rather than replay hints.
+Context-utilization evidence participates in reuse as well: concise runs whose
+inspected context was used by the eventual patch and whose pre-edit search was
+compressed into a candidate-file dossier can rank higher, while low-utilization,
+missing-dossier, or pre-edit context-bloat runs are warnings rather than replay
+hints.
+AHE-style cleanup evidence participates in reuse too: prior runs with encoded blobs,
+duplicate observations, or excessive truncation are surfaced as warnings instead of
+replay hints.
+AHE-style diagnosis evidence participates in reuse too: prior runs that repaired after
+failed verifiers without a root-cause hypothesis are surfaced as warnings instead of
+replay hints.
+AHE-style fix-plan evidence participates in reuse too: prior runs that repaired after
+failed verifiers without a targeted-fix manifest are surfaced as warnings instead of
+replay hints.
+Interactive type-ahead is preserved across active-turn cancellation and
+permission interruptions, so user drafts return to the prompt instead of being
+silently submitted while the harness is still running.
+Useful env vars:
+- `CAWDEX_KBENCH_COMMAND` or `CAWDEX_KBENCH_COMMAND`: command used to launch Cawdex, default `cawdex`.
+- `CAWDEX_KBENCH_PERMISSION`: permission flag value, default `yolo`.
+- `CAWDEX_KBENCH_EXTRA_ARGS`: extra Cawdex CLI flags.
+- `CAWDEX_KBENCH_ARTIFACT_DIR`: directory for redacted instruction/stdout/stderr and trace files.
+- `CAWDEX_BASH_TIMEOUT_MS`: default Cawdex `bash` tool timeout; the adapter defaults to `300000` when unset.
+Provider keys should be passed via normal Cawdex env config or KBench's
+`--api-key-env`, which the runner forwards as Cawdex `--api-key-env`.
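
The README above describes two lookups the runner performs before dispatch: mapping a KBench slug to a benchmark profile (falling back to `generic`) and reading the `CAWDEX_*` environment variables with their defaults. The following is a minimal Python sketch of that behavior, not the code in `runner.mjs`. The alias groups and defaults are copied from the README; using the first alias of each group as the profile name, normalizing slugs with `strip().lower()`, and splitting `CAWDEX_KBENCH_EXTRA_ARGS` with `shlex` are assumptions made for illustration, as are the function names `resolve_profile` and `load_settings`.

```python
import shlex

# Alias groups as listed in the README. Treating the first alias of each
# group as the profile name is an assumption for this sketch.
ALIAS_GROUPS = [
    ("swe", "swe-bench"),
    ("tb2", "terminal-bench"),
    ("terminalworld", "terminal-world"),
    ("swe-chain",),
    ("swe-cycle", "fullcycle", "swe-judge"),
    ("swe-ci", "swecibench"),
    ("swe-prbench", "prbench", "pr-review"),
    ("tml-bench", "tabular-ml", "kaggle-ml"),
    ("pi-bench", "proactive-assistant"),
    ("ci-repair", "ci-repair-bench"),
    ("roadmapbench",),
    ("saasbench",),
    ("swe-bench-mobile",),
    ("webdevbench", "swe-webdev-bench"),
    ("appworld",),
    ("browsecomp", "browsecompplus"),
    ("tau2", "tau-bench"),
]

# Flatten the groups into a single alias -> profile lookup table.
SLUG_TO_PROFILE = {alias: group[0] for group in ALIAS_GROUPS for alias in group}


def resolve_profile(slug: str) -> str:
    """Return the profile for a slug; unknown slugs use 'generic'."""
    # Case/whitespace normalization is an assumption, not documented behavior.
    return SLUG_TO_PROFILE.get(slug.strip().lower(), "generic")


def load_settings(env: dict) -> dict:
    """Read the documented env vars, applying the defaults the README states."""
    return {
        "command": env.get("CAWDEX_KBENCH_COMMAND") or "cawdex",
        "permission": env.get("CAWDEX_KBENCH_PERMISSION") or "yolo",
        # Shell-style splitting of the extra flags is an assumption.
        "extra_args": shlex.split(env.get("CAWDEX_KBENCH_EXTRA_ARGS", "")),
        # The README gives no default for the artifact directory.
        "artifact_dir": env.get("CAWDEX_KBENCH_ARTIFACT_DIR"),
        "bash_timeout_ms": int(env.get("CAWDEX_BASH_TIMEOUT_MS") or 300000),
    }
```

Passing the environment as a plain dict (for example `load_settings(dict(os.environ))`) keeps the sketch easy to exercise without touching the real process environment.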

package/resources/kbench/cawdex_agent/adapter.manifest.json CHANGED

@@ -1,19 +1,19 @@
-{
-  "schemaVersion": "kbench.adapter/v1",
-  "id": "cawdex",
-  "kind": "node",
-  "entry": "./runner.mjs",
-  "version": "0.1.0",
-  "supportedBenchmarks": ["swe", "tb2", "sae"],
-  "capabilities": {
-    "runModes": ["task"],
-    "machineReadableStdout": true,
-    "supportsPatchOutput": false,
-    "supportsTrajectory": true,
-    "supportsToolCallTrace": true,
-    "supportsResume": false,
-    "supportsImages": false,
-    "supportsSandboxBridge": false,
-    "supportsPromptTemplate": false
-  }
-}
+{
+  "schemaVersion": "kbench.adapter/v1",
+  "id": "cawdex",
+  "kind": "node",
+  "entry": "./runner.mjs",
+  "version": "0.1.0",
+  "supportedBenchmarks": ["swe", "tb2", "sae"],
+  "capabilities": {
+    "runModes": ["task"],
+    "machineReadableStdout": true,
+    "supportsPatchOutput": false,
+    "supportsTrajectory": true,
+    "supportsToolCallTrace": true,
+    "supportsResume": false,
+    "supportsImages": false,
+    "supportsSandboxBridge": false,
+    "supportsPromptTemplate": false
+  }
+}
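
For readers who want to inspect the manifest programmatically, the sketch below parses a copy of the JSON shown in this hunk and reports which benchmarks it declares and which capability flags are set to `true`. It is an illustration only: the helper names `supports_benchmark` and `enabled_capabilities` are hypothetical, and whether KBench itself consults `supportedBenchmarks` in this way is an assumption.

```python
import json

# Copy of the manifest content shown in the hunk above.
MANIFEST_JSON = """
{
  "schemaVersion": "kbench.adapter/v1",
  "id": "cawdex",
  "kind": "node",
  "entry": "./runner.mjs",
  "version": "0.1.0",
  "supportedBenchmarks": ["swe", "tb2", "sae"],
  "capabilities": {
    "runModes": ["task"],
    "machineReadableStdout": true,
    "supportsPatchOutput": false,
    "supportsTrajectory": true,
    "supportsToolCallTrace": true,
    "supportsResume": false,
    "supportsImages": false,
    "supportsSandboxBridge": false,
    "supportsPromptTemplate": false
  }
}
"""

MANIFEST = json.loads(MANIFEST_JSON)


def supports_benchmark(manifest: dict, slug: str) -> bool:
    """True if the slug appears in the manifest's declared benchmark list."""
    return slug in manifest.get("supportedBenchmarks", [])


def enabled_capabilities(manifest: dict) -> list:
    """Sorted names of capability flags whose value is exactly true."""
    caps = manifest.get("capabilities", {})
    # Non-boolean entries such as "runModes" are skipped by the identity check.
    return sorted(name for name, value in caps.items() if value is True)


print(enabled_capabilities(MANIFEST))
```

Under these assumptions, `supports_benchmark(MANIFEST, "appworld")` is false even though the README lists `appworld` among the slugs with specialized prompts, since the manifest declares only `swe`, `tb2`, and `sae`.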