npm - trace-to-skill - Versions diffs - 0.1.78 → 0.1.80 - Mend

trace-to-skill 0.1.78 → 0.1.80

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

package/README.md +10 -7
package/dist/src/benchmark.js +6 -0
package/dist/src/benchmark.js.map +1 -1
package/dist/src/demo.js +8 -0
package/dist/src/demo.js.map +1 -1
package/dist/src/index.d.ts +1 -1
package/dist/src/rules.js +18 -0
package/dist/src/rules.js.map +1 -1
package/dist/src/types.d.ts +1 -1
package/dist/src/usageEvidence.d.ts +31 -1
package/dist/src/usageEvidence.js +192 -4
package/dist/src/usageEvidence.js.map +1 -1
package/docs/BENCHMARK.md +2 -1
package/docs/CODEX_ISSUE_MAP.md +6 -3
package/docs/DEMO.md +2 -0
package/docs/DISCOVERY.md +4 -3
package/docs/FAILURE_TAXONOMY.md +11 -3
package/docs/OPENAI_OSS_BRIEF.md +6 -6
package/docs/SCORECARD.md +3 -2
package/docs/USE_CASES.md +29 -13
package/fixtures/codex-mcp-streamable-http.md +29 -0
package/fixtures/codex-token-burn.md +9 -1
package/llms.txt +6 -3
package/package.json +8 -1
package/schemas/analysis-result.schema.json +1 -0
package/schemas/usage-evidence-result.schema.json +110 -3

package/docs/FAILURE_TAXONOMY.md CHANGED

@@ -120,6 +120,14 @@ Common signals include `user cancelled MCP tool call`, `request_user_input is no
 The fix is to capture the Codex version, MCP server name and transport, tool name, exposed callable name, whether `tools/list` and manual `tools/call` succeed, `approval_policy`, sandbox mode, exec or interactive mode, elicitation setting, namespace or `serverName` metadata, exact `item.started` / `item.completed` JSONL, stderr or backpressure evidence, and whether restarting or reinitializing the transport changes the result.
+## Codex Streamable HTTP MCP
+Streamable HTTP and SSE MCP servers can be reachable and still fail inside Codex before or during tool calls. This is different from discovery mismatch because the server may initialize or expose tools, and different from stdio runtime failure because the failure sits in HTTP framing, JSON-RPC parsing, session reuse, auth expectations, or reconnect behavior.
+Common signals include Penpot response parse or `JsonRpcMessage deserialize` errors, `Content-Type: text/event-stream` SSE frames that Codex cannot parse, n8n `initialize` success followed by `Transport closed`, DingTalk OAuth/login gating that contradicts local config, stale `streamable-http` session ids after a remote server restart, missing header or User-Agent requirements, and recovery only after restarting Codex.
+The fix is to capture Codex version, MCP server name, transport URL without secrets, initialize/tools/list/tools/call results, HTTP status, `Content-Type`, SSE event framing, JSON-RPC message shape, session id before and after reconnect or server restart, auth/OAuth expectations, User-Agent and header requirements, exact parse/deserialize error, whether curl or another MCP client succeeds, and whether restarting Codex or reinitializing the transport recovers.
 ## Codex MCP Discovery Mismatch
 Codex MCP servers can work in CLI or one config scope but disappear in another surface before any tool call is possible. This is different from runtime failure: the user may have no `mcp__*` tools exposed in VS Code, Desktop, WSL, a remote session, project-local `.codex/config.toml`, or an older conversation even though CLI `/mcp` works.
@@ -146,11 +154,11 @@ The fix is to capture app and CLI versions, OS, session or thread id, rollout JS
 ## Codex Token Burn
-Codex usage can drain unexpectedly even when users cannot tell whether the cause is useful model work, background process polling, idle app activity, compaction/replay overhead, retry loops, subagent fan-out, fast-mode drift, large context, long `AGENTS.md`, MCP/skill overhead, or cached-token-heavy turns.
+Codex usage can drain unexpectedly even when users cannot tell whether the cause is useful model work, backend prompt-cache collapse, background process polling, idle app activity, compaction/replay overhead, retry loops, subagent fan-out, fast-mode drift, large context, long `AGENTS.md`, MCP/skill overhead, or cached-token-heavy turns.
-Common signals include tokens `burning very fast`, usage dropping by visible percentages after one or two prompts, weekly allowance depletion under normal usage, 5-hour usage reaching 0%, large `input` plus `cached input` totals, `write_stdin` empty polling, background commands repeatedly checking for no new output, Codex using daily usage while idle or only open, compaction tax, retry/tool loops, and requests for usage attribution across normal turns, compaction, retries, subagents, and background polling.
+Common signals include tokens `burning very fast`, usage dropping by visible percentages after one or two prompts, weekly allowance depletion under normal usage, 5-hour usage reaching 0%, large `input` plus `cached input` totals, `input_tokens` plus `cached_input_tokens` / `cached_tokens` rows, `prompt_cache_key` staying stable while cached tokens drop, websocket reconnect notes, `write_stdin` empty polling, background commands repeatedly checking for no new output, Codex using daily usage while idle or only open, compaction tax, retry/tool loops, and requests for usage attribution across normal turns, compaction, retries, subagents, prompt cache, and background polling.
-The fix is to capture plan/workspace, client and version, model and reasoning/speed settings, fast-mode/large-context/subagent/review flags, recent `/status` and usage-dashboard deltas, local token totals including cached input/output/reasoning if available, background process ids and `write_stdin` poll cadence, compaction attempts and failures, retry/tool-loop counts, whether the app was idle, and a minimal reproduction with before/after usage percentages.
+The fix is to capture plan/workspace, client and version, model and reasoning/speed settings, fast-mode/large-context/subagent/review flags, recent `/status` and usage-dashboard deltas, local token totals including cached input/output/reasoning if available, adjacent prompt-cache rows with `input_tokens`, `cached_input_tokens` / `cached_tokens`, `prompt_cache_key`, response ids, transport and reconnect timing, background process ids and `write_stdin` poll cadence, compaction attempts and failures, retry/tool-loop counts, whether the app was idle, and a minimal reproduction with before/after usage percentages.
 ## Codex Resource Leak
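
The Streamable HTTP MCP capture list added in the first hunk above asks for the "transport URL without secrets". A minimal sketch of one way to do that before pasting a URL into a report is shown here. The helper name `redactTransportUrl`, the `SENSITIVE_PARAM` pattern, and the `REDACTED` placeholder are assumptions made for illustration; they are not taken from trace-to-skill, whose own `redact` command may behave differently.

```typescript
// Hypothetical helper, not part of trace-to-skill.
// Parameter names that commonly carry credentials (an assumed, non-exhaustive list).
const SENSITIVE_PARAM = /token|key|secret|password|auth|signature/i;

function redactTransportUrl(raw: string): string {
  const url = new URL(raw);
  // Drop userinfo such as https://user:password@host/...
  url.username = "";
  url.password = "";
  // Replace the value of any query parameter whose name looks sensitive.
  for (const name of Array.from(url.searchParams.keys())) {
    if (SENSITIVE_PARAM.test(name)) {
      url.searchParams.set(name, "REDACTED");
    }
  }
  return url.toString();
}
```

The host and path are kept so that maintainers can still tell which server and endpoint were involved; secrets embedded in the path itself would need separate handling.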

package/docs/OPENAI_OSS_BRIEF.md CHANGED

@@ -3,18 +3,18 @@
 | Field | Value |
 | --- | --- |
 | Repository | https://github.com/grnbtqdbyx-create/trace-to-skill |
-| Package | trace-to-skill@0.1.78 |
+| Package | trace-to-skill@0.1.80 |
 | License | Apache-2.0 |
 | Codex readiness | ready (100/100) |
-| Benchmark | pass, 33 cases |
+| Benchmark | pass, 34 cases |
 ## Why This Repository Qualifies
-trace-to-skill helps open-source maintainers adopt Codex safely by turning failed coding-agent runs into evidence-backed rules, reusable workflows, and CI gates. It supports real maintenance work: PR review, issue triage, release quality, MCP risk, prompt-injection defense, privacy-preserving trace sharing, and repeat failure reduction. The repository is ready, scores 100/100 on the local Codex readiness doctor, and ships a deterministic benchmark with 33 public fixture cases.
+trace-to-skill helps open-source maintainers adopt Codex safely by turning failed coding-agent runs into evidence-backed rules, reusable workflows, and CI gates. It supports real maintenance work: PR review, issue triage, release quality, MCP risk, prompt-injection defense, privacy-preserving trace sharing, and repeat failure reduction. The repository is ready, scores 100/100 on the local Codex readiness doctor, and ships a deterministic benchmark with 34 public fixture cases.
 ### 500-Character Version
-> trace-to-skill helps open-source maintainers adopt Codex safely by turning failed coding-agent runs into evidence-backed rules, reusable workflows, and CI gates. It supports real maintenance work: PR review, issue triage, release quality, MCP risk, prompt-injection defense, privacy-preserving trace sharing, and repeat failure reduction. The repository is ready, scores 100/100 on the local Codex readiness doctor, and ships a deterministic benchmark with 33 public fixture cases.
+> trace-to-skill helps open-source maintainers adopt Codex safely by turning failed coding-agent runs into evidence-backed rules, reusable workflows, and CI gates. It supports real maintenance work: PR review, issue triage, release quality, MCP risk, prompt-injection defense, privacy-preserving trace sharing, and repeat failure reduction. The repository is ready, scores 100/100 on the local Codex readiness doctor, and ships a deterministic benchmark with 34 public fixture cases.
 ## How API Credits Would Be Used
@@ -27,10 +27,10 @@ API credits would power optional maintainer workflows on top of the local determ
 ## Evidence
 - Public repository: https://github.com/grnbtqdbyx-create/trace-to-skill
-- One-command package: npx trace-to-skill@0.1.78
+- One-command package: npx trace-to-skill@0.1.80
 - Open-source license: Apache-2.0
 - Codex readiness doctor: ready, 100/100, 0 failed checks.
-- Public fixture benchmark: pass, 33 cases.
+- Public fixture benchmark: pass, 34 cases.
 - Maintainer control: generated rules are suggestions, evidence is line-linked, and secrets can be redacted before sharing.
 ## Next Steps Before Submitting

package/docs/SCORECARD.md CHANGED

@@ -9,7 +9,7 @@ Status: **pass**
 | Failed doctor checks | 0 |
 | Critical findings | 0 |
 | Built-in benchmark | pass |
-| Benchmark cases | 33 |
+| Benchmark cases | 34 |
 ## Doctor Summary
@@ -45,11 +45,12 @@ This benchmark runs the public fixture pack that ships with the repository and p
 | MCP config with secret exposure | `fixtures/mcp-risk.json` | 59 | 2 | 1 | `mcp_risk`, `secret_exposure` | pass |
 | Sensitive file access in agent context | `fixtures/sensitive-file-access.md` | 75 | 2 | 0 | `sensitive_file_access`, `weak_evidence` | pass |
 | Codex MCP runtime failure | `fixtures/codex-mcp-runtime.md` | 75 | 2 | 0 | `codex_mcp_runtime`, `weak_evidence` | pass |
+| Codex Streamable HTTP MCP parse and handshake failure | `fixtures/codex-mcp-streamable-http.md` | 75 | 2 | 0 | `codex_mcp_streamable_http`, `weak_evidence` | pass |
 | Codex MCP discovery and config-scope mismatch | `fixtures/codex-mcp-discovery-mismatch.md` | 75 | 2 | 0 | `codex_mcp_discovery_mismatch`, `weak_evidence` | pass |
 | Codex plugin runtime and bundled capability failure | `fixtures/codex-plugin-runtime.md` | 59 | 3 | 0 | `codex_plugin_runtime`, `codex_windows_helper_path`, `weak_evidence` | pass |
 | Codex file tree and workspace navigation UI failure | `fixtures/codex-file-tree-ui.md` | 75 | 2 | 0 | `codex_file_tree_ui`, `weak_evidence` | pass |
 | Codex session resume and state failure | `fixtures/codex-session-state.md` | 59 | 3 | 0 | `codex_resource_leak`, `codex_session_state`, `weak_evidence` | pass |
-| Codex token burn and usage-drain loop | `fixtures/codex-token-burn.md` | 59 | 3 | 0 | `codex_resource_leak`, `codex_token_burn`, `weak_evidence` | pass |
+| Codex token burn and usage-drain loop | `fixtures/codex-token-burn.md` | 75 | 2 | 0 | `codex_token_burn`, `weak_evidence` | pass |
 | Codex resource leak and runaway process | `fixtures/codex-resource-leak.md` | 75 | 2 | 0 | `codex_resource_leak`, `weak_evidence` | pass |
 | Codex tool-call integrity and rollback failure | `fixtures/codex-tool-call-integrity.md` | 43 | 4 | 0 | `codex_resource_leak`, `codex_subagent_lifecycle`, `codex_tool_call_integrity`, `weak_evidence` | pass |
 | Codex apply_patch Add File overwrite safety | `fixtures/codex-apply-patch-overwrite.md` | 75 | 2 | 0 | `codex_tool_call_integrity`, `weak_evidence` | pass |

package/docs/USE_CASES.md CHANGED

@@ -18,6 +18,7 @@ npx trace-to-skill demo clipboard-attachment
 npx trace-to-skill demo deeplink-launch
 npx trace-to-skill demo connector-auth-cache
 npx trace-to-skill demo mcp-discovery-mismatch
+npx trace-to-skill demo mcp-streamable-http
 npx trace-to-skill demo terminal-output-integrity
 npx trace-to-skill demo subagent-lifecycle
 npx trace-to-skill sensitive-audit .
@@ -29,7 +30,7 @@ What it proves:
 - packaged fixtures can produce a real Codex issue report immediately
 - maintainers can inspect the output shape before sharing any private log
-- demos cover remote compact failures, Windows helper path failures, patch overwrite safety, approval friction, latency, Thinking hangs, clipboard/attachment regressions, deeplink/OAuth launch regressions, connector auth-cache regressions, MCP discovery/config-scope mismatches, terminal output/scrollback integrity, subagent lifecycle drift, token burn, sensitive files, and prompt injection
+- demos cover remote compact failures, Windows helper path failures, patch overwrite safety, approval friction, latency, Thinking hangs, clipboard/attachment regressions, deeplink/OAuth launch regressions, connector auth-cache regressions, MCP discovery/config-scope mismatches, Streamable HTTP MCP parse/handshake failures, terminal output/scrollback integrity, subagent lifecycle drift, token burn, sensitive files, and prompt injection
 - `sensitive-audit` scans filenames and paths before an agent run, without reading file contents, so teams can build `.agentignore`, `.aiexclude`, `.codexignore`, `.gitignore`, or sandbox permission profiles from a concrete repo report
 - `lsp-audit` scans repo language signals and PATH availability so teams know which language servers are ready before asking Codex for symbol-aware edits
@@ -54,7 +55,7 @@ What it proves:
 Recommended CI surface:
 ```yaml
-- uses: grnbtqdbyx-create/trace-to-skill@v0.1.78
+- uses: grnbtqdbyx-create/trace-to-skill@v0.1.80
   with:
     mode: all
     doctor-threshold: "85"
@@ -145,19 +146,20 @@ This catches signals such as `Error running remote compact task`, `timeout waiti
 ## 8. Codex Usage Evidence Packaging
-Use this when a Codex usage issue has scattered evidence across `/status`, dashboard notes, reset tables, token totals, cached input, and local overhead clues.
+Use this when a Codex usage issue has scattered evidence across `/status`, dashboard notes, reset tables, token totals, prompt-cache rows, cached input, and local overhead clues.
 ```bash
 npx trace-to-skill usage-evidence ./usage-notes.md --output usage-evidence.md
 npx trace-to-skill usage-evidence ./usage-notes.md --format json
 ```
-This turns Markdown polling tables, CSV-like rows, JSON/JSONL snapshots, `reset_at` values, usage-limit errors, rapid drain experiment notes like `1% in 4 minutes`, `22 credits`, or `70% weekly in a day`, `Token usage: total=... cached` lines, `write_stdin` polling, compaction loops, retry/tool loops, subagent fan-out, and idle-drain notes into a single report with a usage receipt.
+This turns Markdown polling tables, CSV-like rows, JSON/JSONL snapshots, `reset_at` values, usage-limit errors, rapid drain experiment notes like `1% in 4 minutes`, `22 credits`, or `70% weekly in a day`, prompt-cache rows with `input_tokens`, `cached_input_tokens` / `cached_tokens`, `prompt_cache_key`, response ids, websocket/reconnect notes, `Token usage: total=... cached` lines, `write_stdin` polling, compaction loops, retry/tool loops, subagent fan-out, and idle-drain notes into a single report with a usage receipt.
 The receipt separates:
 - backend quota-window percentage evidence
 - local token totals, including cached input and reasoning
+- prompt-cache records and adjacent cache-collapse events
 - bounded rapid-drain experiment rows with model, plan, prompt count, elapsed time, percent, and credits when present
 - orchestration-overhead signals that may burn usage without accepted work
 - suspected cause buckets to keep public reports comparable
@@ -258,7 +260,7 @@ npx trace-to-skill analyze ./runs --format json
 npx trace-to-skill codex-report ./runs --output openai-codex-issue.md
 ```
-This catches signals such as tokens `burning very fast`, usage dropping by visible percentages after one or two prompts, weekly allowance depletion, 5-hour usage reaching 0%, large `input` plus `cached input` totals, `write_stdin` empty polling, background commands repeatedly reporting no new output, idle app usage, compaction tax, retry/tool loops, and missing attribution between normal turns, compaction, background polling, subagents, and retries.
+This catches signals such as tokens `burning very fast`, usage dropping by visible percentages after one or two prompts, weekly allowance depletion, 5-hour usage reaching 0%, large `input` plus `cached input` totals, `input_tokens` / `cached_input_tokens` / `prompt_cache_key` rows that show cache collapse, `write_stdin` empty polling, background commands repeatedly reporting no new output, idle app usage, compaction tax, retry/tool loops, and missing attribution between normal turns, compaction, background polling, subagents, and retries.
 For public reports, prefer `usage-evidence` first so the quota-window, local-token, and orchestration-overhead layers are visible separately.
@@ -375,7 +377,21 @@ This catches signals such as `MCP servers not detected in Codex VS Code extensio
 Include app/CLI/extension version, OS, IDE, remote/WSL/SSH state, workspace root, effective `CODEX_HOME`, all config files considered (`~/.codex/config.toml`, project `.codex/config.toml`, `.vscode/mcp.json`, `.mcp.json`), redacted MCP sections, trust/profile/default-permissions state, `codex mcp list`, `codex mcp get <server>`, CLI-versus-Desktop/VS Code comparison, loaded config path/log lines, whether moving the same server to user-global config fixes it, and whether the current session exposes `mcp__*` tools.
-## 25. Patch Overwrite Guard
+## 25. Codex Streamable HTTP MCP Evidence
+Use this when a Streamable HTTP or SSE MCP server is reachable but Codex fails during JSON-RPC parsing, handshake, auth gating, stale session reuse, or reconnect.
+```bash
+npx trace-to-skill demo mcp-streamable-http
+npx trace-to-skill analyze ./runs --format json
+npx trace-to-skill codex-report ./runs --output openai-codex-mcp-streamable-http.md
+```
+This catches signals such as Penpot `JsonRpcMessage deserialize` or response-parse failures, `Content-Type: text/event-stream` framing problems, n8n `initialize` followed by `Transport closed`, DingTalk OAuth/login gates that do not match config expectations, stale `streamable-http` session ids after server restart, missing header/User-Agent requirements, and recovery only after restarting Codex.
+Include Codex version, MCP server name, transport URL without secrets, initialize/tools/list/tools/call results, HTTP status, `Content-Type`, SSE event framing, JSON-RPC message shape, session id before and after reconnect or server restart, auth/OAuth expectations, User-Agent/header requirements, exact parse/deserialize error, whether curl or another MCP client succeeds, and whether restarting Codex or reinitializing the transport recovers.
+## 26. Patch Overwrite Guard
 Use this before applying a generated patch when you want create/update/delete semantics checked against the actual workspace.
@@ -392,7 +408,7 @@ For a public demo report:
 npx trace-to-skill demo patch-overwrite
 ```
-## 26. Sensitive Path Preflight Before Agent Runs
+## 27. Sensitive Path Preflight Before Agent Runs
 Use this before giving an AI coding agent a repository.
@@ -407,7 +423,7 @@ This finds sensitive-looking paths such as `.env`, `.env.*`, `.npmrc`, `.pypirc`
 The output includes a stable JSON schema plus recommended exclude globs that can seed `.agentignore`, `.aiexclude`, `.codexignore`, `.gitignore`, local sandbox permission profiles, or team security review checklists. `--format ignore` renders a reviewable generated file candidate and still does not mutate the repo. It is a preflight report, not a sandbox boundary.
-## 27. Workspace Checkpoint Before Agent Runs
+## 28. Workspace Checkpoint Before Agent Runs
 Use this before giving Codex, Claude, Cursor, or another coding agent a dirty repository where untracked local work matters.
@@ -420,7 +436,7 @@ This writes a local checkpoint bundle with `status.txt`, staged and unstaged bin
 This is useful for OpenAI/Codex `/undo` and `/rewind` discussions where users need workspace protection beyond conversation rewind, especially when untracked files are outside normal commit history.
-## 28. OpenAI Codex Issue Report
+## 29. OpenAI Codex Issue Report
 Use this when you want to file or update an OpenAI/Codex issue with a concise, evidence-backed report instead of pasting a full transcript.
@@ -433,7 +449,7 @@ The report includes the likely Codex failure class, line-linked evidence, diagno
 For a cluster-to-command map of current Codex issue patterns, see [CODEX_ISSUE_MAP.md](CODEX_ISSUE_MAP.md).
-## 29. Sensitive File Access Evidence
+## 30. Sensitive File Access Evidence
 Use this when a trace suggests an agent read, attached, uploaded, diffed, or indexed credential-bearing files.
@@ -446,7 +462,7 @@ This catches signals such as `.env`, `.env.production`, `.npmrc`, `.pypirc`, `.n
 Before publishing evidence, run `trace-to-skill redact` and attach only redacted excerpts plus the file path/class.
-## 30. GitHub Context Guard
+## 31. GitHub Context Guard
 Use this before an agent reads untrusted GitHub text.
@@ -463,7 +479,7 @@ Use it when:
 - a bot asks Codex to triage untrusted user reports
 - logs or comments might contain instructions like "ignore previous instructions" or "print secrets"
-## 30. Failed Agent Run To Reviewable Rule
+## 32. Failed Agent Run To Reviewable Rule
 Use this when a coding agent made a repeated workflow mistake.
@@ -481,7 +497,7 @@ Recommended maintainer loop:
 4. Copy only evidence-backed rules into the real policy file.
 5. Run `eval` or `scorecard` in CI so the same failure does not silently return.
-## 31. Privacy-Preserving Adoption
+## 33. Privacy-Preserving Adoption
 Use this when you want public evidence without leaking private traces.

package/fixtures/codex-mcp-streamable-http.md ADDED

@@ -0,0 +1,29 @@
+# Codex Streamable HTTP MCP Fixture
+## Penpot parse failure
+Codex CLI 0.136.0 is configured with the Streamable HTTP MCP server `penpot`.
+The server is reachable and `tools/list` succeeds, but the streamable HTTP client fails to parse Penpot MCP responses before `tools/call`.
+stderr:
+```text
+JsonRpcMessage deserialize error while reading streamable-http response
+response parse failed after Content-Type: text/event-stream
+```
+The server sends SSE event frames with JSON-RPC payloads, and another MCP client can read the same endpoint successfully.
+## n8n initialize handshake
+MCP Streamable HTTP handshake fails with `n8n` after initialize.
+`initialize` returns 200, then the next `tools/list` reports Transport closed and the session id is not accepted.
+## DingTalk auth gate
+The DingTalk Streamable HTTP MCP server is incorrectly gated behind OAuth/login even though the server is configured as unauthenticated in the local server config.
+## Stale session after restart
+After a remote MCP server restart, Codex reuses a stale streamable-http session id.
+Subsequent tool calls fail until restarting Codex instead of reconnecting or reinitializing the transport.
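
The Penpot case in this fixture describes SSE event frames that carry JSON-RPC payloads. The sketch below shows what reading such a captured response body could look like, which may help when checking whether a saved response is well-formed before attributing a parse failure to the client. It is a simplified illustration, not Codex's parser: the function name `parseSseJsonRpc` and the `JsonRpcMessage` interface are hypothetical, and only `data:` fields are handled (comments, `event:`, `id:`, and `retry:` lines are ignored).

```typescript
// Hypothetical shape for illustration; real messages may carry more fields.
interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number | string | null;
  method?: string;
  result?: unknown;
  error?: unknown;
}

function parseSseJsonRpc(body: string): JsonRpcMessage[] {
  const messages: JsonRpcMessage[] = [];
  // SSE events are separated by a blank line; accept LF and CRLF endings.
  for (const frame of body.split(/\r?\n\r?\n/)) {
    const data: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith("data:")) {
        // Strip the field name and at most one leading space.
        data.push(line.slice(5).replace(/^ /, ""));
      }
    }
    if (data.length === 0) continue; // comment-only or empty frame
    // Multiple data lines in one event are joined with newlines.
    const parsed = JSON.parse(data.join("\n"));
    if (parsed?.jsonrpc !== "2.0") {
      throw new Error("frame is not a JSON-RPC 2.0 message");
    }
    messages.push(parsed as JsonRpcMessage);
  }
  return messages;
}
```

If a captured body passes a check like this while Codex still reports a deserialize error, that difference is worth stating in the report alongside the exact error text.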

package/fixtures/codex-token-burn.md CHANGED

@@ -29,6 +29,14 @@ During background waits, the cadence is about one poll every 5-10 seconds.
 Cached tokens are still charged by some API/proxy billing paths.
 ```
+## Prompt cache collapse
+```text
+Previous WebSocket request: input_tokens=183,426 cached_input_tokens=152,448 prompt_cache_key="019e74ff-6cf1-7d40-80ce-0c8baa3ad6cf" id="resp_0442f7dc" outcome=incremental
+Reconnect request: input_tokens=184,739 cached_tokens=91,520 prompt_cache_key="019e74ff-6cf1-7d40-80ce-0c8baa3ad6cf" id="resp_0fd6c965" outcome=incremental websocket reconnect
+Recovered request: input_tokens=185,010 cached_input_tokens=153,000 prompt_cache_key="019e74ff-6cf1-7d40-80ce-0c8baa3ad6cf" id="resp_1a2b3c" outcome=incremental
+```
 ## Compaction and replay cost
 Another report said:
@@ -39,4 +47,4 @@ The weekly usage limit depletes unusually fast on 5.5, worsened by unstable cont
 Failed context compaction forces users to restart tasks and re-explain project state, creating compaction tax.
 ```
-The report should include the plan/workspace, app or CLI version, model, reasoning effort, speed mode, large context setting, subagent and /review usage, recent `/status` and dashboard deltas, token totals including cached input/output/reasoning, background process ids, write_stdin poll cadence, compaction attempts, retry/tool-loop counts, whether the app was idle, and before/after usage percentages.
+The report should include the plan/workspace, app or CLI version, model, reasoning effort, speed mode, large context setting, subagent and /review usage, recent `/status` and dashboard deltas, token totals including cached input/output/reasoning, adjacent prompt-cache rows with response ids and prompt_cache_key, background process ids, write_stdin poll cadence, compaction attempts, retry/tool-loop counts, whether the app was idle, and before/after usage percentages.
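
The three rows in the new "Prompt cache collapse" fixture block can be read as a stable `prompt_cache_key` whose cached share of input tokens drops on the reconnect request and then recovers. The sketch below illustrates that reading. It is not the logic in `usageEvidence.js`, which this diff does not show: the `CacheRow` shape, the function names, and the 0.25 drop threshold are all assumptions chosen for the example.

```typescript
// Hypothetical row shape; the package's own cacheRecord schema is not shown in this diff.
interface CacheRow {
  responseId: string;
  cacheKey: string;
  inputTokens: number;
  cachedTokens: number;
}

function parseCacheRow(line: string): CacheRow | null {
  // \b keeps input_tokens from matching inside cached_input_tokens.
  const input = /\binput_tokens=([\d,]+)/.exec(line);
  // Accept both cached_input_tokens and cached_tokens, as the fixture uses both.
  const cached = /\bcached(?:_input)?_tokens=([\d,]+)/.exec(line);
  const key = /\bprompt_cache_key="([^"]+)"/.exec(line);
  const id = /\bid="([^"]+)"/.exec(line);
  if (!input || !cached || !key || !id) return null;
  const toInt = (s: string): number => Number(s.replace(/,/g, ""));
  return {
    responseId: id[1],
    cacheKey: key[1],
    inputTokens: toInt(input[1]),
    cachedTokens: toInt(cached[1]),
  };
}

// Flag adjacent rows that share a cache key but lose at least `minDrop`
// of cache hit rate (cached / input). The threshold is an assumed value.
function findCacheCollapses(rows: CacheRow[], minDrop = 0.25): string[] {
  const rate = (r: CacheRow): number =>
    r.inputTokens > 0 ? r.cachedTokens / r.inputTokens : 0;
  const flagged: string[] = [];
  for (let i = 1; i < rows.length; i++) {
    const prev = rows[i - 1];
    const cur = rows[i];
    if (prev.cacheKey === cur.cacheKey && rate(prev) - rate(cur) >= minDrop) {
      flagged.push(cur.responseId);
    }
  }
  return flagged;
}
```

Applied to the fixture rows, the hit rate is roughly 0.83, then 0.50, then 0.83 again, so under these assumptions only the reconnect request would be flagged.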

package/llms.txt CHANGED

@@ -23,6 +23,7 @@ Runtime: Node.js 20+
 - Codex auth and connectivity failures such as `token_exchange_failed`, `auth.openai.com/oauth/token`, missing `ca-certificates`, proxy or MITM TLS behavior, IPv6 fallback problems, Cloudflare challenge responses, and ChatGPT stream disconnects
 - Codex mobile and remote-control route health failures such as `Waiting for desktop`, `Directory Unavailable`, stale listeners on `127.0.0.1:14567`, stale `server_name` enrollment, empty backend environments, and incomplete helper bundles
 - Codex MCP runtime failures such as cancelled non-interactive approvals, `request_user_input is not supported in exec mode`, dropped namespace or `serverName` metadata, `unsupported call: mcp__...__...`, and closed `StdioServerTransport` sessions
+- Codex Streamable HTTP MCP failures where Penpot, n8n, DingTalk, or another HTTP/SSE MCP server initializes but fails JSON-RPC parsing, `Content-Type: text/event-stream` handling, handshakes, OAuth gating, stale session ids, missing headers, or reconnects
 - Codex MCP discovery and config-scope mismatches where CLI `/mcp` works but VS Code, Desktop, WSL, project `.codex/config.toml`, `CODEX_HOME`, or an older conversation exposes no `mcp__*` tools
 - Codex terminal output and scrollback integrity failures where streamed lines disappear, get overwritten, truncate, duplicate, misalign, snap to the bottom, or only survive in raw logs/transcripts
 - Codex subagent lifecycle failures where completed or closed agents remain visible, `thread_spawn_edges` drift, child threads crowd the recent list, `agent thread limit reached` blocks spawns, or compaction loses prior subagent IDs; `session-audit` can summarize local subagent signal counts without exposing transcripts
@@ -32,10 +33,10 @@ Runtime: Node.js 20+
 - Codex clipboard, copy/export, long paste, and generated `Pasted text.txt` attachment regressions that break prompt, `/goal`, preview/edit, or support-report workflows
 - Codex deeplink, OAuth callback, notification click, browser-extension activation, mobile pairing, and `codex app <path>` launch regressions
 - Codex app connector stale auth/cache regressions such as `401 Reauthentication required`, unchanged `link_*`, `isAccessible: false`, and broken `codex_apps_tools` metadata
-- Codex token-burn and usage-drain failures such as rapid drain experiments (`1% in 4 minutes`, `22 credits`, `70% weekly in a day`), background `write_stdin` polling, idle app usage, compaction tax, retry/tool loops, cached-token-heavy turns, fast-mode drift, subagent fan-out, and unclear usage attribution
+- Codex token-burn, prompt-cache collapse, and usage-drain failures such as rapid drain experiments (`1% in 4 minutes`, `22 credits`, `70% weekly in a day`), `input_tokens` / `cached_input_tokens` / `prompt_cache_key` rows, websocket reconnect cache drops, background `write_stdin` polling, idle app usage, compaction tax, retry/tool loops, cached-token-heavy turns, fast-mode drift, subagent fan-out, and unclear usage attribution
 - Codex process evidence packaging for Windows PowerShell/pwsh CIM polling, high-CPU helpers, stale process-manager entries, and renderer runaways
 - Codex usage reset schedule drift such as weekly reset dates moving, `reset_at` jumping, saved usage disappearing, outage compensation resets changing the anchor, and `/status` disagreeing with enforcement
-- Codex usage evidence packaging for scattered `/status`, reset-table, usage-limit, token-total, cached-input, and orchestration-overhead snippets
+- Codex usage evidence packaging for scattered `/status`, reset-table, usage-limit, token-total, prompt-cache, cached-input, and orchestration-overhead snippets
 - Codex usage receipts that separate backend quota-window percentages, bounded drain experiments, local token totals, and overhead signals such as background polling, compaction loops, retry/tool loops, subagent fan-out, or idle drain
 - Codex undo/rewind support workflows where maintainers need a local pre-agent workspace checkpoint with git diffs plus copied changed/untracked files before agent edits
 - Codex resource leaks and runaway local processes such as high CPU/GPU, `Code Helper`, `Codex Helper Renderer`, orphaned shell snapshots, log floods, thinking animation GPU loops, and non-Git workspace CPU loops
@@ -81,6 +82,7 @@ npx trace-to-skill demo clipboard-attachment
 npx trace-to-skill demo deeplink-launch
 npx trace-to-skill demo connector-auth-cache
 npx trace-to-skill demo mcp-discovery-mismatch
+npx trace-to-skill demo mcp-streamable-http
 npx trace-to-skill demo terminal-output-integrity
 npx trace-to-skill demo subagent-lifecycle
 npx trace-to-skill lint-agents .
@@ -109,7 +111,7 @@ npx trace-to-skill init --comment --sarif
 ## GitHub Action
 ```yaml
-- uses: grnbtqdbyx-create/trace-to-skill@v0.1.78
+- uses: grnbtqdbyx-create/trace-to-skill@v0.1.80
   with:
     mode: all
     doctor-threshold: "85"
@@ -216,6 +218,7 @@ npx trace-to-skill init --comment --sarif
 - Codex usage evidence report
 - Codex rate-limit evidence report
 - Codex cached input token usage drain
+- Codex prompt cache collapse evidence with prompt_cache_key, cached_input_tokens, cached_tokens, response ids, websocket reconnect, and low cache hit rate
 - Codex weekly reset date changed
 - Codex usage reset_at jumping
 - Codex deterministic reset schedule

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "trace-to-skill",
-  "version": "0.1.78",
+  "version": "0.1.80",
   "description": "Turn failed AI coding-agent runs into reusable AGENTS.md rules, SKILL.md files, and eval evidence.",
   "type": "module",
   "main": "dist/src/index.js",
@@ -119,6 +119,10 @@
     "mcp-config",
     "codex-vscode",
     "mcp-runtime",
+    "codex-mcp-streamable-http",
+    "streamable-http-mcp",
+    "mcp-http",
+    "mcp-sse",
     "codex-plugins",
     "codex-file-tree",
     "codex-navigation",
@@ -151,6 +155,9 @@
     "codex-rate-limits",
     "codex-usage-evidence",
     "codex-rate-limit-evidence",
+    "codex-prompt-cache",
+    "prompt-cache-collapse",
+    "cache-hit-rate",
     "usage-receipt",
     "codex-usage-receipt",
     "codex-usage-spike",

package/schemas/analysis-result.schema.json CHANGED

@@ -85,6 +85,7 @@
         "codex_subagent_lifecycle",
         "codex_mcp_discovery_mismatch",
         "codex_mcp_runtime",
+        "codex_mcp_streamable_http",
         "codex_plugin_runtime",
         "codex_file_tree_ui",
         "codex_session_state",

package/schemas/usage-evidence-result.schema.json CHANGED

@@ -3,7 +3,7 @@
   "$id": "https://github.com/grnbtqdbyx-create/trace-to-skill/schemas/usage-evidence-result.schema.json",
   "title": "trace-to-skill Codex usage evidence result",
   "type": "object",
-  "required": ["generatedAt", "status", "inputs", "summary", "snapshots", "tokenUsage", "drainExperiments", "receipt", "findings", "checklist"],
+  "required": ["generatedAt", "status", "inputs", "summary", "snapshots", "tokenUsage", "cacheRecords", "cacheCollapseEvents", "drainExperiments", "receipt", "findings", "checklist"],
   "additionalProperties": false,
   "properties": {
     "generatedAt": {
@@ -20,7 +20,7 @@
     },
     "summary": {
       "type": "object",
-      "required": ["snapshots", "tokenUsageRecords", "usageLimitSignals", "resetDriftWindows", "highCachedInputRecords", "drainExperiments", "overheadSignals"],
+      "required": ["snapshots", "tokenUsageRecords", "usageLimitSignals", "resetDriftWindows", "highCachedInputRecords", "cacheRecords", "cacheCollapseEvents", "drainExperiments", "overheadSignals"],
       "additionalProperties": false,
       "properties": {
         "snapshots": {
@@ -43,6 +43,14 @@
           "type": "integer",
           "minimum": 0
         },
+        "cacheRecords": {
+          "type": "integer",
+          "minimum": 0
+        },
+        "cacheCollapseEvents": {
+          "type": "integer",
+          "minimum": 0
+        },
         "drainExperiments": {
           "type": "integer",
           "minimum": 0
@@ -65,6 +73,18 @@
         "$ref": "#/$defs/tokenUsage"
       }
     },
+    "cacheRecords": {
+      "type": "array",
+      "items": {
+        "$ref": "#/$defs/cacheRecord"
+      }
+    },
+    "cacheCollapseEvents": {
+      "type": "array",
+      "items": {
+        "$ref": "#/$defs/cacheCollapseEvent"
+      }
+    },
     "drainExperiments": {
       "type": "array",
       "items": {
@@ -160,6 +180,7 @@
             "quota_percentage_jump",
             "usage_limit_with_remaining_quota",
             "high_cached_input",
+            "prompt_cache_collapse",
             "high_total_tokens",
             "orchestration_overhead_signal",
             "rapid_quota_drain_experiment"
@@ -204,7 +225,7 @@
     },
     "receipt": {
       "type": "object",
-      "required": ["quotaWindows", "localTokenTotals", "drainExperiments", "overheadSignals", "suspectedCauses"],
+      "required": ["quotaWindows", "localTokenTotals", "drainExperiments", "overheadSignals", "cacheCollapseEvents", "suspectedCauses"],
       "additionalProperties": false,
       "properties": {
         "quotaWindows": {
@@ -228,6 +249,12 @@
             "$ref": "#/$defs/overheadSignal"
           }
         },
+        "cacheCollapseEvents": {
+          "type": "array",
+          "items": {
+            "$ref": "#/$defs/cacheCollapseEvent"
+          }
+        },
         "suspectedCauses": {
           "type": "array",
           "items": {
@@ -321,6 +348,86 @@
         }
       }
     },
+    "cacheRecord": {
+      "type": "object",
+      "required": ["source", "line", "excerpt"],
+      "additionalProperties": false,
+      "properties": {
+        "source": {
+          "type": "string"
+        },
+        "line": {
+          "type": "integer",
+          "minimum": 1
+        },
+        "inputTokens": {
+          "type": "number"
+        },
+        "cachedInputTokens": {
+          "type": "number"
+        },
+        "cacheHitPercent": {
+          "type": "number"
+        },
+        "promptCacheKey": {
+          "type": "string"
+        },
+        "responseId": {
+          "type": "string"
+        },
+        "transport": {
+          "type": "string"
+        },
+        "outcome": {
+          "type": "string"
+        },
+        "excerpt": {
+          "type": "string"
+        }
+      }
+    },
+    "cacheCollapseEvent": {
+      "type": "object",
+      "required": ["source", "line", "previousLine", "cachedInputTokens", "previousCachedInputTokens", "dropPercent", "excerpt"],
+      "additionalProperties": false,
+      "properties": {
+        "source": {
+          "type": "string"
+        },
+        "line": {
+          "type": "integer",
+          "minimum": 1
+        },
+        "previousLine": {
+          "type": "integer",
+          "minimum": 1
+        },
+        "inputTokens": {
+          "type": "number"
+        },
+        "cachedInputTokens": {
+          "type": "number"
+        },
+        "previousCachedInputTokens": {
+          "type": "number"
+        },
+        "dropPercent": {
+          "type": "number"
+        },
+        "promptCacheKey": {
+          "type": "string"
+        },
+        "responseId": {
+          "type": "string"
+        },
+        "previousResponseId": {
+          "type": "string"
+        },
+        "excerpt": {
+          "type": "string"
+        }
+      }
+    },
     "overheadSignal": {
       "type": "object",
       "required": ["kind", "source", "line", "excerpt"],