trace-to-skill 0.1.38 → 0.1.39
- package/README.md +37 -4
- package/dist/src/benchmark.js +36 -0
- package/dist/src/benchmark.js.map +1 -1
- package/dist/src/cli.js +8 -1
- package/dist/src/cli.js.map +1 -1
- package/dist/src/doctor.js +52 -0
- package/dist/src/doctor.js.map +1 -1
- package/dist/src/index.d.ts +1 -1
- package/dist/src/index.js +1 -1
- package/dist/src/index.js.map +1 -1
- package/dist/src/report.d.ts +1 -0
- package/dist/src/report.js +73 -0
- package/dist/src/report.js.map +1 -1
- package/dist/src/rules.js +133 -2
- package/dist/src/rules.js.map +1 -1
- package/dist/src/types.d.ts +1 -1
- package/docs/ADOPTION_GUIDE.md +4 -0
- package/docs/BENCHMARK.md +6 -0
- package/docs/CODEX_ISSUE_MAP.md +47 -0
- package/docs/DISCOVERY.md +11 -1
- package/docs/FAILURE_TAXONOMY.md +48 -0
- package/docs/RELEASE.md +54 -0
- package/docs/SCORECARD.md +7 -1
- package/docs/USE_CASES.md +82 -5
- package/fixtures/codex-connectivity.md +32 -0
- package/fixtures/codex-mcp-runtime.md +51 -0
- package/fixtures/codex-remote-control.md +29 -0
- package/fixtures/codex-session-state.md +62 -0
- package/fixtures/codex-token-burn.md +42 -0
- package/fixtures/sensitive-file-access.md +14 -0
- package/llms.txt +27 -1
- package/package.json +18 -1
- package/schemas/analysis-result.schema.json +6 -0
package/docs/USE_CASES.md
CHANGED
|
@@ -21,7 +21,7 @@ What it proves:
|
|
|
21
21
|
Recommended CI surface:
|
|
22
22
|
|
|
23
23
|
```yaml
|
|
24
|
-
- uses: grnbtqdbyx-create/trace-to-skill@v0.1.
|
|
24
|
+
- uses: grnbtqdbyx-create/trace-to-skill@v0.1.39
|
|
25
25
|
with:
|
|
26
26
|
mode: all
|
|
27
27
|
doctor-threshold: "85"
|
|
@@ -61,7 +61,58 @@ npx trace-to-skill analyze ./runs --format json
|
|
|
61
61
|
|
|
62
62
|
This catches signals such as Windows sandbox setup refresh failures, `os error 740`, `CodexSandboxOffline` ownership drift, ACL denial, approval-policy mismatch, and Full Access sessions behaving like workspace-write or on-request mode.
|
|
63
63
|
|
|
64
|
-
## 4.
|
|
64
|
+
## 4. Codex Auth And Connectivity Triage
|
|
65
|
+
|
|
66
|
+
Use this when Codex cannot log in, exchange an auth token, stream a response, or connect through a container, proxy, VPN, corporate CA, IPv6 network, or Cloudflare challenge.
|
|
67
|
+
|
|
68
|
+
```bash
|
|
69
|
+
npx trace-to-skill analyze ./runs --format json
|
|
70
|
+
```
|
|
71
|
+
|
|
72
|
+
This catches signals such as `token_exchange_failed`, `auth.openai.com/oauth/token`, `codex_login::server`, `cf-mitigated: challenge`, missing `ca-certificates`, `update-ca-certificates`, `CODEX_CA_CERTIFICATE`, IPv6 fallback evidence, proxy/MITM TLS failures, and `stream disconnected before completion` on `chatgpt.com/backend-api/codex/responses`.
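As an illustration of the signal list above (this note is not part of the package diff), here is a minimal TypeScript sketch of substring-based signal matching. It is an assumption about how such matching could work, not the package's actual rule code: `CONNECTIVITY_SIGNALS`, `SignalHit`, and `findConnectivitySignals` are hypothetical names, and the strings are copied from the sentence above.

```typescript
// Hypothetical signal list copied from the use-case text; the real rules in
// trace-to-skill may match differently (regexes, scoring, context windows).
const CONNECTIVITY_SIGNALS: string[] = [
  "token_exchange_failed",
  "auth.openai.com/oauth/token",
  "codex_login::server",
  "cf-mitigated: challenge",
  "update-ca-certificates",
  "CODEX_CA_CERTIFICATE",
  "stream disconnected before completion",
];

interface SignalHit {
  signal: string;
  line: number; // 1-based line number in the trace
}

// Scan a trace line by line and record every listed signal it contains.
function findConnectivitySignals(trace: string): SignalHit[] {
  const hits: SignalHit[] = [];
  const lines = trace.split(/\r?\n/);
  lines.forEach((text, index) => {
    for (const signal of CONNECTIVITY_SIGNALS) {
      if (text.indexOf(signal) !== -1) {
        hits.push({ signal, line: index + 1 });
      }
    }
  });
  return hits;
}
```

Keeping the line number with each hit mirrors the idea of line-linked evidence: a report can point at the exact trace line instead of quoting the whole transcript.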
|
|
73
|
+
|
|
74
|
+
## 5. Codex Mobile And Remote-Control Route Health
|
|
75
|
+
|
|
76
|
+
Use this when Codex mobile, SSH remote, or desktop remote-control says it is connected but commands do not reach the expected host, workspace, or app-server.
|
|
77
|
+
|
|
78
|
+
```bash
|
|
79
|
+
npx trace-to-skill analyze ./runs --format json
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
This catches signals such as `Waiting for desktop`, `Directory: Unavailable`, stale `server_name` enrollment, stale remote-control listener, `127.0.0.1:14567`, missing cached helper files such as `codex-windows-sandbox-setup.exe` or `codex-command-runner.exe`, empty backend environments, stale Android session lists, and temporary recovery after re-pairing or listener restart.
|
|
83
|
+
|
|
84
|
+
## 6. Codex MCP Runtime Triage
|
|
85
|
+
|
|
86
|
+
Use this when MCP tools are configured and visible, but Codex cannot actually call them at runtime.
|
|
87
|
+
|
|
88
|
+
```bash
|
|
89
|
+
npx trace-to-skill analyze ./runs --format json
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
This catches signals such as `user cancelled MCP tool call`, `request_user_input is not supported in exec mode`, `Approve app tool call?`, `tool_call_mcp_elicitation`, routed callable names like `mcp__node_repl__js` becoming `unsupported call`, deferred discovery dropping namespace or `serverName`, `tools/list` succeeding while Codex routing fails, and stdio transport lifecycle failures such as `Transport closed`, `stdin_end`, `stdin_close`, `transport_close`, or stderr backpressure.
|
|
93
|
+
|
|
94
|
+
## 7. Codex Resume And Session State Triage
|
|
95
|
+
|
|
96
|
+
Use this when long Codex sessions become difficult to resume, Desktop history rendering gets sluggish, or local state migrations break goals/projects/history.
|
|
97
|
+
|
|
98
|
+
```bash
|
|
99
|
+
npx trace-to-skill analyze ./runs --format json
|
|
100
|
+
```
|
|
101
|
+
|
|
102
|
+
This catches signals such as `codex resume` picker hangs, `codex resume <id>` working while the picker freezes, large `rollout-*.jsonl` histories, high JSONL line and `response_item` / `event_msg` / `function_call` counts, large `input_image` payloads, slow `thread/resume` and `thread/goal/get` timings, `Could not load archived chats`, resume compression dropping the last 3-5 turns, `state_5.sqlite` / `goals_1.sqlite` migration mismatches, `no such table: thread_goals`, stale `projectless-thread-ids`, and `thread-workspace-root-hints` reverting after restart.
|
|
103
|
+
|
|
104
|
+
## 8. Codex Token Burn Attribution
|
|
105
|
+
|
|
106
|
+
Use this when Codex usage drains faster than expected and the trace needs to separate useful model work from orchestration overhead.
|
|
107
|
+
|
|
108
|
+
```bash
|
|
109
|
+
npx trace-to-skill analyze ./runs --format json
|
|
110
|
+
npx trace-to-skill codex-report ./runs --output openai-codex-issue.md
|
|
111
|
+
```
|
|
112
|
+
|
|
113
|
+
This catches signals such as tokens `burning very fast`, usage dropping by visible percentages after one or two prompts, weekly allowance depletion, 5-hour usage reaching 0%, large `input` plus `cached input` totals, `write_stdin` empty polling, background commands repeatedly reporting no new output, idle app usage, compaction tax, retry/tool loops, and missing attribution between normal turns, compaction, background polling, subagents, and retries.
|
|
114
|
+
|
|
115
|
+
## 9. Quota And Usage-Limit Evidence
|
|
65
116
|
|
|
66
117
|
Use this when Codex blocks a prompt with a usage-limit message but another surface still shows remaining quota.
|
|
67
118
|
|
|
@@ -71,7 +122,33 @@ npx trace-to-skill analyze ./runs --format json
|
|
|
71
122
|
|
|
72
123
|
This catches traces where `/status` or the usage page shows remaining 5h or weekly quota, accounts appear to share limits unexpectedly, a Team account inherits a Plus account's limit state, or quota reset times jump after logout/login.
|
|
73
124
|
|
|
74
|
-
##
|
|
125
|
+
## 10. OpenAI Codex Issue Report
|
|
126
|
+
|
|
127
|
+
Use this when you want to file or update an OpenAI/Codex issue with a concise, evidence-backed report instead of pasting a full transcript.
|
|
128
|
+
|
|
129
|
+
```bash
|
|
130
|
+
npx trace-to-skill redact ./runs --output redacted-runs
|
|
131
|
+
npx trace-to-skill codex-report redacted-runs --output openai-codex-issue.md
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
The report includes the likely Codex failure class, line-linked evidence, diagnostics to attach, and a privacy checklist. This is useful for issues about auth/connectivity, sandbox setup, remote-control routing, MCP runtime calls, resume/session-state failures, quota mismatches, and context compaction.
|
|
135
|
+
|
|
136
|
+
For a cluster-to-command map of current Codex issue patterns, see [CODEX_ISSUE_MAP.md](CODEX_ISSUE_MAP.md).
|
|
137
|
+
|
|
138
|
+
## 11. Sensitive File Access Evidence
|
|
139
|
+
|
|
140
|
+
Use this when a trace suggests an agent read, attached, uploaded, diffed, or indexed credential-bearing files.
|
|
141
|
+
|
|
142
|
+
```bash
|
|
143
|
+
npx trace-to-skill analyze ./runs --format json
|
|
144
|
+
npx trace-to-skill codex-report ./runs --output openai-codex-issue.md
|
|
145
|
+
```
|
|
146
|
+
|
|
147
|
+
This catches signals such as `.env`, `.env.production`, `.npmrc`, `.pypirc`, `.netrc`, `.aws/credentials`, `.kube/config`, `.docker/config.json`, private-key PEM blocks, `.sqlite`, `.db`, `secrets.yaml`, and production secret manifests entering agent context.
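As an illustration of the file list above (not part of the package diff), the sketch below classifies a path against those names. The class labels, `SENSITIVE_PATTERNS`, and `classifySensitivePath` are hypothetical, and the regular expressions are one possible reading of the list; the package's own detection rules may differ. PEM private-key blocks are content rather than paths, so they are out of scope here.

```typescript
type SensitiveClass =
  | "env-file"
  | "package-auth"
  | "cloud-credentials"
  | "local-database"
  | "secret-manifest";

// Illustrative patterns derived from the list above; class names are hypothetical.
const SENSITIVE_PATTERNS: Array<{ cls: SensitiveClass; pattern: RegExp }> = [
  { cls: "env-file", pattern: /(^|\/)\.env(\.[\w-]+)?$/ },
  { cls: "package-auth", pattern: /(^|\/)\.(npmrc|pypirc|netrc)$/ },
  {
    cls: "cloud-credentials",
    pattern: /(^|\/)(\.aws\/credentials|\.kube\/config|\.docker\/config\.json)$/,
  },
  { cls: "local-database", pattern: /\.(sqlite|db)$/ },
  { cls: "secret-manifest", pattern: /(^|\/)secrets\.ya?ml$/ },
];

// Return the first matching class, or null when the path looks harmless.
function classifySensitivePath(path: string): SensitiveClass | null {
  const normalized = path.replace(/\\/g, "/"); // treat Windows separators like "/"
  for (const entry of SENSITIVE_PATTERNS) {
    if (entry.pattern.test(normalized)) return entry.cls;
  }
  return null;
}
```

Reporting the class together with the path, rather than the file contents, fits the advice to attach only redacted excerpts plus the file path/class.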
|
|
148
|
+
|
|
149
|
+
Before publishing evidence, run `trace-to-skill redact` and attach only redacted excerpts plus the file path/class.
|
|
150
|
+
|
|
151
|
+
## 12. GitHub Context Guard
|
|
75
152
|
|
|
76
153
|
Use this before an agent reads untrusted GitHub text.
|
|
77
154
|
|
|
@@ -88,7 +165,7 @@ Use it when:
|
|
|
88
165
|
- a bot asks Codex to triage untrusted user reports
|
|
89
166
|
- logs or comments might contain instructions like "ignore previous instructions" or "print secrets"
|
|
90
167
|
|
|
91
|
-
##
|
|
168
|
+
## 13. Failed Agent Run To Reviewable Rule
|
|
92
169
|
|
|
93
170
|
Use this when a coding agent made a repeated workflow mistake.
|
|
94
171
|
|
|
@@ -106,7 +183,7 @@ Recommended maintainer loop:
|
|
|
106
183
|
4. Copy only evidence-backed rules into the real policy file.
|
|
107
184
|
5. Run `eval` or `scorecard` in CI so the same failure does not silently return.
|
|
108
185
|
|
|
109
|
-
##
|
|
186
|
+
## 14. Privacy-Preserving Adoption
|
|
110
187
|
|
|
111
188
|
Use this when you want public evidence without leaking private traces.
|
|
112
189
|
|
|
package/fixtures/codex-connectivity.md
@@ -0,0 +1,32 @@
|
|
|
1
|
+
# Codex auth and connectivity failure fixture
|
|
2
|
+
|
|
3
|
+
A Dev Container using Ubuntu 24.04 failed after the browser login step:
|
|
4
|
+
|
|
5
|
+
```text
|
|
6
|
+
2026-05-26T19:42:22.056568Z ERROR codex_login::server: oauth token exchange transport failure is_timeout=false is_connect=true is_request=true error=error sending request for url (https://auth.openai.com/oauth/token)
|
|
7
|
+
Token exchange error: error sending request for url (https://auth.openai.com/oauth/token)
|
|
8
|
+
codex_login::server: login callback token exchange failed
|
|
9
|
+
```
|
|
10
|
+
|
|
11
|
+
The environment later showed missing ca-certificates setup before the OpenAI extension was installed:
|
|
12
|
+
|
|
13
|
+
```Dockerfile
|
|
14
|
+
RUN apt-get update && apt-get install -y ca-certificates && update-ca-certificates
|
|
15
|
+
```
|
|
16
|
+
|
|
17
|
+
The API key path also looked connected in the UI but produced the same transport failure when Codex tried to reach ChatGPT:
|
|
18
|
+
|
|
19
|
+
```text
|
|
20
|
+
stream disconnected before completion: Transport error: network error: error decoding response body
|
|
21
|
+
Reconnecting... 1/5 (stream disconnected before completion: error sending request for url (https://chatgpt.com/backend-api/codex/responses))
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
Another trace from the same issue class showed IPv6 and proxy-specific evidence:
|
|
25
|
+
|
|
26
|
+
```text
|
|
27
|
+
auth.openai.com resolves to IPv6 first; curl -4 works, curl -6 hangs.
|
|
28
|
+
Cloudflare returned cf-mitigated: challenge when the corporate MITM proxy intercepted auth.openai.com.
|
|
29
|
+
CODEX_CA_CERTIFICATE was set but did not help this WAF challenge.
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
The final report must include the Codex version, container image, proxy/VPN state, DNS IPv4/IPv6 checks, CA variables, and whether browser login, device auth, and API-key auth fail differently.
|
|
package/fixtures/codex-mcp-runtime.md
@@ -0,0 +1,51 @@
|
|
|
1
|
+
# Codex MCP runtime failure fixture
|
|
2
|
+
|
|
3
|
+
A minimal MCP server was configured correctly and `tools/list` returned the expected tool, but Codex failed at runtime.
|
|
4
|
+
|
|
5
|
+
## Non-interactive approval cancellation
|
|
6
|
+
|
|
7
|
+
```json
|
|
8
|
+
{"type":"item.started","item":{"type":"mcp_tool_call","server":"minimal","tool":"ping","status":"in_progress"}}
|
|
9
|
+
{"type":"item.completed","item":{"type":"mcp_tool_call","server":"minimal","tool":"ping","status":"failed","error":{"message":"user cancelled MCP tool call"}}}
|
|
10
|
+
```
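As an illustration (not part of the fixture), the two JSONL events above can be reduced to a list of failed MCP calls. This is a minimal sketch that assumes the event shape shown in the fixture (`type`, `item.type`, `item.server`, `item.tool`, `item.status`, `item.error.message`); the interface and function names are hypothetical and real Codex item streams may carry more fields.

```typescript
// Shape inferred from the two fixture lines; every field is optional because
// real traces may omit or rename them.
interface McpItemEvent {
  type?: string;
  item?: {
    type?: string;
    server?: string;
    tool?: string;
    status?: string;
    error?: { message?: string };
  };
}

interface FailedMcpCall {
  server: string;
  tool: string;
  message: string;
}

// Collect completed MCP tool calls whose status is "failed".
function failedMcpCalls(jsonl: string): FailedMcpCall[] {
  const failures: FailedMcpCall[] = [];
  for (const line of jsonl.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const event = parsed as McpItemEvent;
    const item = event.item;
    if (event.type !== "item.completed" || !item) continue;
    if (item.type !== "mcp_tool_call" || item.status !== "failed") continue;
    failures.push({
      server: item.server ?? "(unknown server)",
      tool: item.tool ?? "(unknown tool)",
      message: item.error?.message ?? "(no error message)",
    });
  }
  return failures;
}
```

Applied to the fixture, this would yield one entry for server `minimal`, tool `ping`, with the message `user cancelled MCP tool call`; the `item.started` line is ignored because it is not a completed item.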
|
|
11
|
+
|
|
12
|
+
Logs showed:
|
|
13
|
+
|
|
14
|
+
```text
|
|
15
|
+
request_user_input is not supported in exec mode
|
|
16
|
+
Approve app tool call?
|
|
17
|
+
tool_call_mcp_elicitation = true
|
|
18
|
+
maybe_request_mcp_tool_approval called for a custom MCP tool
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
## Unsupported routed callable
|
|
22
|
+
|
|
23
|
+
The server could execute the tool manually, but Codex routed the exposed name incorrectly:
|
|
24
|
+
|
|
25
|
+
```text
|
|
26
|
+
tools/list returned js for node_repl
|
|
27
|
+
manual tools/call succeeded
|
|
28
|
+
runtime routes mcp__node_repl__js as unsupported call
|
|
29
|
+
unsupported call: mcp__node_repl__js
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
Another trace showed deferred tool discovery replay losing namespace metadata:
|
|
33
|
+
|
|
34
|
+
```text
|
|
35
|
+
Namespaced MCP tool calls fail after deferred tool discovery because replayed function_call drops namespace.
|
|
36
|
+
function_call omitted namespace and serverName, so the runtime received an un-namespaced tool name.
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
## Stdio transport lifecycle
|
|
40
|
+
|
|
41
|
+
A healthy stdio MCP server returned one result, exited, and Codex reused a closed client:
|
|
42
|
+
|
|
43
|
+
```text
|
|
44
|
+
tool call failed for `minimal_stdio/echo`
|
|
45
|
+
Caused by:
|
|
46
|
+
Transport closed for MCP stdio client
|
|
47
|
+
StdioServerTransport shutdown: stdin_end
|
|
48
|
+
StdioServerTransport shutdown: transport_close
|
|
49
|
+
```
|
|
50
|
+
|
|
51
|
+
The report should include the Codex version, MCP server name and transport, callable name, whether `tools/list` and manual `tools/call` succeed, approval policy, sandbox mode, exec/non-interactive mode, elicitation setting, namespace/serverName metadata, item JSONL, stderr/backpressure evidence, and whether a transport restart changes the result.
|
|
package/fixtures/codex-remote-control.md
@@ -0,0 +1,29 @@
|
|
|
1
|
+
# Codex remote-control route health fixture
|
|
2
|
+
|
|
3
|
+
A Windows Desktop remote-control route reported connected, but mobile commands did not execute reliably.
|
|
4
|
+
|
|
5
|
+
The local diagnostic showed the listener was stale:
|
|
6
|
+
|
|
7
|
+
```text
|
|
8
|
+
remote-control: listening on 127.0.0.1:14567, pid=18452, exe=C:\Users\user\.codex\cache\7dea4a003bc76627\codex.exe, bundle_complete=false, sandbox_helper=missing
|
|
9
|
+
candidate cache directory has codex.exe but is missing codex-windows-sandbox-setup.exe and codex-command-runner.exe
|
|
10
|
+
```
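As an illustration (not part of the fixture), the `key=value` fields in the listener line above can be pulled into a record before judging route health. This is a sketch under assumptions: `parseDiagnosticFields` and `listenerLooksStale` are hypothetical helpers, values are assumed to contain no commas or whitespace, and the staleness rule simply restates the two fields the fixture flags.

```typescript
// Extract key=value fields from a diagnostic line. Values end at the next
// comma or whitespace, which holds for the fixture line but is an assumption.
function parseDiagnosticFields(line: string): Record<string, string> {
  const fields: Record<string, string> = {};
  const pattern = /(\w+)=([^,\s]+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    fields[match[1]] = match[2];
  }
  return fields;
}

// Hypothetical health rule: the route is suspect when the cached bundle is
// incomplete or the sandbox helper is reported missing.
function listenerLooksStale(fields: Record<string, string>): boolean {
  return (
    fields["bundle_complete"] === "false" ||
    fields["sandbox_helper"] === "missing"
  );
}
```

Separating parsing from the health rule keeps the evidence (pid, executable path, helper state) available for the report even if the rule itself changes.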
|
|
11
|
+
|
|
12
|
+
Mobile still displayed a weak connected state:
|
|
13
|
+
|
|
14
|
+
```text
|
|
15
|
+
ChatGPT mobile pairing stuck on Waiting for desktop
|
|
16
|
+
remote-control stays connecting
|
|
17
|
+
backend environments returns empty
|
|
18
|
+
Android Codex Mobile shows "Directory: Unavailable" after Desktop conversation updates
|
|
19
|
+
Revoking the Android device and re-pairing restores access temporarily.
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
Another trace showed stale enrollment state:
|
|
23
|
+
|
|
24
|
+
```text
|
|
25
|
+
codex remote-control reuses persisted enrollment with stale server_name, causing silent WebSocket disconnect
|
|
26
|
+
remoteControl/status/read succeeded, but the next mobile command never reached the active app-server.
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
The report should include desktop/app/CLI versions, mobile OS/app version, host id, listener pid/executable path, bound port, cache directory id, helper-file completeness, active server_name/enrollment, workspace root, last mobile command id, and whether restarting the listener or re-pairing changes the route.
|
|
package/fixtures/codex-session-state.md
@@ -0,0 +1,62 @@
|
|
|
1
|
+
# Codex session resume and local state failure fixture
|
|
2
|
+
|
|
3
|
+
This fixture captures a long-running Codex Desktop and CLI session that became hard to resume after local history grew large.
|
|
4
|
+
|
|
5
|
+
## Resume picker hangs on large JSONL history
|
|
6
|
+
|
|
7
|
+
The user reported:
|
|
8
|
+
|
|
9
|
+
```text
|
|
10
|
+
codex resume interactive picker hangs/freezes when session files are large.
|
|
11
|
+
The UI renders the list, but Enter has no effect.
|
|
12
|
+
codex resume <id> works fine as a workaround.
|
|
13
|
+
```
|
|
14
|
+
|
|
15
|
+
The session inventory showed large rollout files:
|
|
16
|
+
|
|
17
|
+
```text
|
|
18
|
+
rollout-2026-05-11T23-56-03-019e17c0.jsonl 17.7 MB
|
|
19
|
+
session files include large jsonl rollout history and the picker freezes when that session is visible.
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
## State migration mismatch
|
|
23
|
+
|
|
24
|
+
After a Codex update, goal state reads failed:
|
|
25
|
+
|
|
26
|
+
```text
|
|
27
|
+
state_5.sqlite migration version 34: drop thread goals
|
|
28
|
+
goals_1.sqlite was created, but goals_1.sqlite.thread_goals was empty.
|
|
29
|
+
Runtime still queried the old thread_goals table.
|
|
30
|
+
error returned from database: no such table: thread_goals
|
|
31
|
+
codex_core::goals failed to read thread goal for continuation
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
## Desktop thread open is sluggish
|
|
35
|
+
|
|
36
|
+
A Windows Codex Desktop trace showed the same session-state path causing slow UI:
|
|
37
|
+
|
|
38
|
+
```text
|
|
39
|
+
Thread id: 019e5fe0-5e81-7a02-abef-f2bde9c26a4a
|
|
40
|
+
Rollout/history size: 242.5 MB
|
|
41
|
+
50,589 JSONL lines
|
|
42
|
+
29,169 response_item records
|
|
43
|
+
21,178 event_msg records
|
|
44
|
+
10,538 function_call records
|
|
45
|
+
112 input_image records
|
|
46
|
+
Max JSONL line: 3,631,856 characters
|
|
47
|
+
thread/resume took 7,760 ms
|
|
48
|
+
thread/goal/get took 8,606 ms
|
|
49
|
+
Codex Desktop becomes extremely sluggish with high app-server/renderer CPU when opening a large local thread.
|
|
50
|
+
```
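As an illustration (not part of the fixture), figures such as the JSONL line count, per-type record counts, and the longest line could be gathered with a small summary pass. The sketch assumes each record is a JSON object with a top-level string `type` field; that layout is an assumption, since the fixture only reports totals, and `RolloutSummary` and `summarizeRollout` are hypothetical names.

```typescript
interface RolloutSummary {
  lineCount: number;
  maxLineLength: number; // in UTF-16 code units, as reported by String.length
  countsByType: Record<string, number>;
}

// Summarize a rollout JSONL string. Lines that are not JSON are counted as
// "unparsed"; JSON values without a string "type" are counted as "untyped".
function summarizeRollout(jsonl: string): RolloutSummary {
  const summary: RolloutSummary = { lineCount: 0, maxLineLength: 0, countsByType: {} };
  const counts = summary.countsByType;
  for (const line of jsonl.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    summary.lineCount += 1;
    summary.maxLineLength = Math.max(summary.maxLineLength, line.length);
    let type = "unparsed";
    try {
      const parsed: unknown = JSON.parse(line);
      const candidate =
        typeof parsed === "object" && parsed !== null
          ? (parsed as { type?: unknown }).type
          : undefined;
      type = typeof candidate === "string" ? candidate : "untyped";
    } catch {
      // keep "unparsed"
    }
    const previous = Object.prototype.hasOwnProperty.call(counts, type) ? counts[type] : 0;
    counts[type] = previous + 1;
  }
  return summary;
}
```

For a history of the size quoted above (242.5 MB), reading the whole file into one string may itself be slow, so a streaming reader would be a better fit in practice; the in-memory version is kept here for brevity.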
|
|
51
|
+
|
|
52
|
+
## Resume compression drops recent context
|
|
53
|
+
|
|
54
|
+
Another trace showed the resumed agent could not continue:
|
|
55
|
+
|
|
56
|
+
```text
|
|
57
|
+
codex resume context compression drops recent conversation context.
|
|
58
|
+
The last 3-5 turns are discarded, including the user's latest instruction, recent errors, and file paths.
|
|
59
|
+
The resumed session is effectively amnesic and cannot continue the previous task.
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
The report should include the Codex app and CLI versions, OS, thread id, rollout size, JSONL line count, response/event/function/image counts, largest line size, thread/resume and thread/goal/get timings, renderer/app-server CPU and memory, state database paths and migration versions, whether `codex resume <id>` works, whether a new thread works, and backup steps before any manual state edit.
|
|
package/fixtures/codex-token-burn.md
@@ -0,0 +1,42 @@
|
|
|
1
|
+
# Codex unexpected usage fixture
|
|
2
|
+
|
|
3
|
+
These notes preserve reports where the user cannot tell whether the cause is model choice, background polling, compaction, retries, or idle app activity.
|
|
4
|
+
|
|
5
|
+
## General token burn regression
|
|
6
|
+
|
|
7
|
+
```text
|
|
8
|
+
Am I the only one still seeing my tokens burning very fast after today's extension update?
|
|
9
|
+
Just by writing 1 or 2 prompts, usage drops by 1%.
|
|
10
|
+
Within 2 hours of working I managed to burn through ~20% of my tokens.
|
|
11
|
+
Token usage: total=742,555 input=697,188 (+ 9,077,504 cached) output=45,367 (reasoning 11,450)
|
|
12
|
+
```
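As an illustration (not part of the fixture), the `Token usage:` line above can be parsed to separate cached input from fresh input. In the sample line, `total` equals `input` plus `output` (697,188 + 45,367 = 742,555), so the cached figure is reported on top of the total. The sketch below assumes that exact line layout; `TokenUsage`, `parseCount`, `parseTokenUsage`, and the `cachedShare` metric are hypothetical and not taken from the package.

```typescript
interface TokenUsage {
  total: number;
  input: number;
  cachedInput: number;
  output: number;
  reasoning: number | null;
  cachedShare: number; // cachedInput / (input + cachedInput)
}

// Convert a count with thousands separators, such as "9,077,504", to a number.
function parseCount(text: string): number {
  return parseInt(text.replace(/,/g, ""), 10);
}

// Parse a usage line in the layout shown in the fixture; returns null when
// the line does not match that layout.
function parseTokenUsage(line: string): TokenUsage | null {
  const match =
    /total=([\d,]+)\s+input=([\d,]+)\s+\(\+\s*([\d,]+)\s+cached\)\s+output=([\d,]+)(?:\s+\(reasoning\s+([\d,]+)\))?/.exec(
      line
    );
  if (match === null) return null;
  const input = parseCount(match[2]);
  const cachedInput = parseCount(match[3]);
  const inputSide = input + cachedInput;
  return {
    total: parseCount(match[1]),
    input,
    cachedInput,
    output: parseCount(match[4]),
    reasoning: match[5] !== undefined ? parseCount(match[5]) : null,
    cachedShare: inputSide > 0 ? cachedInput / inputSide : 0,
  };
}
```

For the sample line, cached input is about 93% of all input-side tokens (9,077,504 of 9,774,692), which is the kind of cached-token-heavy turn the attribution question is about.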
|
|
13
|
+
|
|
14
|
+
## Idle app usage
|
|
15
|
+
|
|
16
|
+
```text
|
|
17
|
+
Codex is using daily usage even when it is not doing anything.
|
|
18
|
+
I opened the app, went to eat, and came back to all 5-hour usage consumed plus 15% of weekly usage.
|
|
19
|
+
Just by being open it used it all.
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
## Background polling loop
|
|
23
|
+
|
|
24
|
+
```text
|
|
25
|
+
Background process polling wastes tokens.
|
|
26
|
+
Each write_stdin empty poll triggers a full API turn with entire conversation history.
|
|
27
|
+
The model keeps polling while no new output is available and the process is still running.
|
|
28
|
+
During background waits, the cadence is about one poll every 5-10 seconds.
|
|
29
|
+
Cached tokens are still charged by some API/proxy billing paths.
|
|
30
|
+
```
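As an illustration (not part of the fixture), the cost of the polling loop described above can be estimated with simple arithmetic: if every empty poll resends the whole conversation history, input tokens scale with the number of polls times the history size. The function name and the example figures (a 10-minute wait and a 100,000-token history) are hypothetical; only the 5-10 second cadence comes from the report.

```typescript
// Estimate input tokens spent on empty polls during a background wait.
// contextTokens is the assumed size of the history resent on each poll.
function estimatePollingTokens(
  waitSeconds: number,
  pollIntervalSeconds: number,
  contextTokens: number
): number {
  if (pollIntervalSeconds <= 0) {
    throw new RangeError("pollIntervalSeconds must be positive");
  }
  const polls = Math.floor(waitSeconds / pollIntervalSeconds);
  return polls * contextTokens;
}

// Hypothetical scenario: 10-minute wait, 100,000-token history.
const slowCadence = estimatePollingTokens(600, 10, 100000); // 60 polls
const fastCadence = estimatePollingTokens(600, 5, 100000); // 120 polls
```

Under these assumptions a single 10-minute wait would resend between 6 and 12 million input tokens, most of which would likely be cached; whether cached tokens are billed depends on the billing path, as the report notes.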
|
|
31
|
+
|
|
32
|
+
## Compaction and replay cost
|
|
33
|
+
|
|
34
|
+
Another report said:
|
|
35
|
+
|
|
36
|
+
```text
|
|
37
|
+
My Pro weekly allowance was almost fully depleted in about two days under normal usage.
|
|
38
|
+
The weekly usage limit depletes unusually fast on 5.5, worsened by unstable context compaction.
|
|
39
|
+
Failed context compaction forces users to restart tasks and re-explain project state, creating compaction tax.
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
The report should include the plan/workspace, app or CLI version, model, reasoning effort, speed mode, large context setting, subagent and /review usage, recent `/status` and dashboard deltas, token totals including cached input/output/reasoning, background process ids, write_stdin poll cadence, compaction attempts, retry/tool-loop counts, whether the app was idle, and before/after usage percentages.
|
|
package/fixtures/sensitive-file-access.md
@@ -0,0 +1,14 @@
|
|
|
1
|
+
# Codex Sensitive File Access Trace
|
|
2
|
+
|
|
3
|
+
User report:
|
|
4
|
+
|
|
5
|
+
Codex indexed the repository before I could exclude sensitive files. The trace shows:
|
|
6
|
+
|
|
7
|
+
```text
|
|
8
|
+
exec_command: cat .env.production
|
|
9
|
+
exec_command: sed -n '1,80p' ~/.aws/credentials
|
|
10
|
+
diff --git a/.env b/.env
|
|
11
|
+
-----BEGIN OPENSSH PRIVATE KEY-----
|
|
12
|
+
```
|
|
13
|
+
|
|
14
|
+
Expected behavior: the agent should stop and ask for an explicit redaction or exclusion rule before putting those files into context.
|
package/llms.txt
CHANGED
|
@@ -16,7 +16,13 @@ Runtime: Node.js 20+
|
|
|
16
16
|
- failed tests hidden behind optimistic summaries
|
|
17
17
|
- Codex context compaction failures such as remote compact stream disconnects, `context_length_exceeded`, stuck automatic compaction, and `unknown variant auto`
|
|
18
18
|
- Codex sandbox and permission failures such as setup refresh errors, `os error 740`, `CodexSandboxOffline` ownership drift, ACL denial, and approval-mode downgrades
|
|
19
|
+
- Codex auth and connectivity failures such as `token_exchange_failed`, `auth.openai.com/oauth/token`, missing `ca-certificates`, proxy or MITM TLS behavior, IPv6 fallback problems, Cloudflare challenge responses, and ChatGPT stream disconnects
|
|
20
|
+
- Codex mobile and remote-control route health failures such as `Waiting for desktop`, `Directory Unavailable`, stale listeners on `127.0.0.1:14567`, stale `server_name` enrollment, empty backend environments, and incomplete helper bundles
|
|
21
|
+
- Codex MCP runtime failures such as cancelled non-interactive approvals, `request_user_input is not supported in exec mode`, dropped namespace or `serverName` metadata, `unsupported call: mcp__...__...`, and closed `StdioServerTransport` sessions
|
|
22
|
+
- Codex resume and session-state failures such as frozen resume pickers, large rollout JSONL histories, sluggish Desktop thread rendering, dropped recent context after resume, archived chat loading failures, and `state_5.sqlite` / `goals_1.sqlite` migration drift
|
|
23
|
+
- Codex token-burn and usage-drain failures such as background `write_stdin` polling, idle app usage, compaction tax, retry/tool loops, cached-token-heavy turns, fast-mode drift, subagent fan-out, and unclear usage attribution
|
|
19
24
|
- Codex quota and usage-limit mismatches where `/status` or the usage page shows remaining quota, accounts share limits unexpectedly, or 5h and weekly quotas move together
|
|
25
|
+
- sensitive-file access in traces, including `.env`, private keys, package auth files, cloud credentials, local databases, and production secret manifests entering agent context
|
|
20
26
|
- hallucinated files and broad over-editing
|
|
21
27
|
- conflicting `AGENTS.md`, `CLAUDE.md`, Cursor, Copilot, or Gemini instructions
|
|
22
28
|
- stale path references, missing `@file.md` includes, nested `AGENTS.md` visibility gaps, invalid UTF-8, and oversized instruction files that can make Codex follow wrong or truncated guidance
|
|
@@ -34,6 +40,7 @@ failed agent run -> failure class -> evidence-backed AGENTS.md/SKILL.md suggesti
|
|
|
34
40
|
|
|
35
41
|
- README: https://github.com/grnbtqdbyx-create/trace-to-skill#readme
|
|
36
42
|
- Use cases: https://github.com/grnbtqdbyx-create/trace-to-skill/blob/main/docs/USE_CASES.md
|
|
43
|
+
- Codex issue map: https://github.com/grnbtqdbyx-create/trace-to-skill/blob/main/docs/CODEX_ISSUE_MAP.md
|
|
37
44
|
- Discovery summary: https://github.com/grnbtqdbyx-create/trace-to-skill/blob/main/docs/DISCOVERY.md
|
|
38
45
|
- Adoption guide: https://github.com/grnbtqdbyx-create/trace-to-skill/blob/main/docs/ADOPTION_GUIDE.md
|
|
39
46
|
- Failure taxonomy: https://github.com/grnbtqdbyx-create/trace-to-skill/blob/main/docs/FAILURE_TAXONOMY.md
|
|
@@ -51,6 +58,7 @@ npx trace-to-skill lint-agents .
|
|
|
51
58
|
npx trace-to-skill guard-github-event "$GITHUB_EVENT_PATH"
|
|
52
59
|
npx trace-to-skill redact ./runs --output redacted-runs
|
|
53
60
|
npx trace-to-skill analyze ./runs
|
|
61
|
+
npx trace-to-skill codex-report ./runs --output openai-codex-issue.md
|
|
54
62
|
npx trace-to-skill suggest ./runs --target agents-md
|
|
55
63
|
npx trace-to-skill eval ./runs --threshold 80
|
|
56
64
|
npx trace-to-skill benchmark
|
|
@@ -61,7 +69,7 @@ npx trace-to-skill init --comment --sarif
|
|
|
61
69
|
## GitHub Action
|
|
62
70
|
|
|
63
71
|
```yaml
|
|
64
|
-
- uses: grnbtqdbyx-create/trace-to-skill@v0.1.
|
|
72
|
+
- uses: grnbtqdbyx-create/trace-to-skill@v0.1.39
|
|
65
73
|
with:
|
|
66
74
|
mode: all
|
|
67
75
|
doctor-threshold: "85"
|
|
@@ -91,5 +99,23 @@ npx trace-to-skill init --comment --sarif
|
|
|
91
99
|
- Codex OSS maintainer automation evidence
|
|
92
100
|
- Codex sandbox setup refresh failed
|
|
93
101
|
- Windows sandbox permission failure for Codex CLI
|
|
102
|
+
- Codex token_exchange_failed auth.openai.com oauth token
|
|
103
|
+
- Codex stream disconnected before completion chatgpt backend api responses
|
|
104
|
+
- Codex ca-certificates update-ca-certificates Dev Container
|
|
105
|
+
- Codex remote-control stale listener 127.0.0.1:14567
|
|
106
|
+
- Codex mobile Waiting for desktop Directory Unavailable
|
|
107
|
+
- Codex MCP unsupported call mcp__node_repl__js
|
|
108
|
+
- Codex MCP tools/list succeeds but runtime call fails
|
|
109
|
+
- Codex MCP Transport closed StdioServerTransport
|
|
110
|
+
- Codex resume picker hangs with large session files
|
|
111
|
+
- Codex Desktop sluggish opening large rollout JSONL thread
|
|
112
|
+
- Codex state_5.sqlite no such table thread_goals
|
|
113
|
+
- OpenAI Codex issue report from failed trace
|
|
114
|
+
- Codex bug report evidence checklist
|
|
115
|
+
- Codex token burn write_stdin background polling
|
|
116
|
+
- Codex cached input token usage drain
|
|
117
|
+
- Codex compaction tax weekly usage depleted
|
|
94
118
|
- Codex usage limit despite remaining quota
|
|
95
119
|
- Codex quota mismatch and rate limit debugging
|
|
120
|
+
- Codex exclude sensitive files from agent context
|
|
121
|
+
- detect .env private key credential file access in agent traces
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "trace-to-skill",
|
|
3
|
-
"version": "0.1.
|
|
3
|
+
"version": "0.1.39",
|
|
4
4
|
"description": "Turn failed AI coding-agent runs into reusable AGENTS.md rules, SKILL.md files, and eval evidence.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"main": "dist/src/index.js",
|
|
@@ -21,8 +21,10 @@
|
|
|
21
21
|
"docs/ADOPTION_GUIDE.md",
|
|
22
22
|
"docs/AGENTS_LINT.md",
|
|
23
23
|
"docs/BENCHMARK.md",
|
|
24
|
+
"docs/CODEX_ISSUE_MAP.md",
|
|
24
25
|
"docs/DISCOVERY.md",
|
|
25
26
|
"docs/FAILURE_TAXONOMY.md",
|
|
27
|
+
"docs/RELEASE.md",
|
|
26
28
|
"docs/SCORECARD.md",
|
|
27
29
|
"docs/USE_CASES.md",
|
|
28
30
|
"examples",
|
|
@@ -63,9 +65,24 @@
|
|
|
63
65
|
"sandbox-permission",
|
|
64
66
|
"codex-sandbox",
|
|
65
67
|
"windows-sandbox",
|
|
68
|
+
"codex-connectivity",
|
|
69
|
+
"codex-auth",
|
|
70
|
+
"token-exchange-failed",
|
|
71
|
+
"codex-remote-control",
|
|
72
|
+
"codex-mobile",
|
|
73
|
+
"codex-mcp",
|
|
74
|
+
"mcp-runtime",
|
|
75
|
+
"codex-session",
|
|
76
|
+
"codex-resume",
|
|
77
|
+
"codex-issue-report",
|
|
78
|
+
"openai-triage",
|
|
79
|
+
"codex-token-burn",
|
|
80
|
+
"codex-usage",
|
|
66
81
|
"quota-mismatch",
|
|
67
82
|
"codex-rate-limits",
|
|
68
83
|
"usage-limit",
|
|
84
|
+
"sensitive-files",
|
|
85
|
+
"codex-privacy",
|
|
69
86
|
"evals",
|
|
70
87
|
"open-source-maintainers",
|
|
71
88
|
"self-improvement",
|
|
package/schemas/analysis-result.schema.json
@@ -65,10 +65,16 @@
|
|
|
65
65
|
"over_editing",
|
|
66
66
|
"unsafe_command",
|
|
67
67
|
"secret_exposure",
|
|
68
|
+
"sensitive_file_access",
|
|
68
69
|
"hidden_unicode",
|
|
69
70
|
"prompt_injection",
|
|
70
71
|
"context_compaction",
|
|
71
72
|
"sandbox_permission",
|
|
73
|
+
"codex_connectivity",
|
|
74
|
+
"codex_remote_control",
|
|
75
|
+
"codex_mcp_runtime",
|
|
76
|
+
"codex_session_state",
|
|
77
|
+
"codex_token_burn",
|
|
72
78
|
"quota_mismatch",
|
|
73
79
|
"mcp_risk",
|
|
74
80
|
"weak_evidence"
|
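As an illustration (not part of the package diff), consumers of the analysis JSON could narrow the six enum values added in the hunk above with a type guard. This sketch lists only the values marked `+` in that hunk; the constant, type, and function names are hypothetical, and the package's own `types.d.ts` may model the failure classes differently.

```typescript
// The six enum values added in this release, as shown in the hunk above.
const ADDED_FAILURE_CLASSES = [
  "sensitive_file_access",
  "codex_connectivity",
  "codex_remote_control",
  "codex_mcp_runtime",
  "codex_session_state",
  "codex_token_burn",
] as const;

type AddedFailureClass = (typeof ADDED_FAILURE_CLASSES)[number];

// Type guard: true when the value is one of the newly added classes.
function isAddedFailureClass(value: string): value is AddedFailureClass {
  return (ADDED_FAILURE_CLASSES as readonly string[]).indexOf(value) !== -1;
}
```

A guard like this lets downstream tooling handle results from older package versions, which cannot emit these classes, without loosening the type to a plain string.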