job-forge 2.14.12 → 2.14.13
- package/.cursor/rules/main.mdc +1 -1
- package/.opencode/skills/job-forge.md +8 -3
- package/AGENTS.md +1 -1
- package/CLAUDE.md +1 -1
- package/iso/commands/job-forge.md +8 -3
- package/iso/instructions.md +1 -1
- package/modes/apply.md +5 -2
- package/package.json +1 -1
- package/scripts/telemetry.mjs +256 -20
package/.cursor/rules/main.mdc
CHANGED
@@ -12,7 +12,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
 why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
 
-- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If any source shows APPLIED / Applied, skip the dispatch.
+- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
 why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
|
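As a rough illustration of the [H2] preflight in the hunk above, the sketch below checks each candidate against all four sources before it can count toward N. The helper names (`isAlreadyApplied`, `pickCandidates`), the sample rows, and the line-based matching are assumptions made for this example; job-forge's actual file layouts and matching logic are not shown in this diff.

```javascript
// Hypothetical helpers, not part of job-forge. Sources are passed in as
// already-read text so the sketch needs no filesystem access.
function isAlreadyApplied(candidate, sources) {
  const company = candidate.company.toLowerCase();
  const role = candidate.role.toLowerCase();
  return Object.values(sources).some((text) =>
    text.split('\n').some((line) => {
      const lower = line.toLowerCase();
      const sameJob = line.includes(candidate.url) ||
        (lower.includes(company) && lower.includes(role));
      // Matches "APPLIED" and "Applied" but not "APPLY FAILED".
      return sameJob && /\bapplied\b/i.test(line);
    })
  );
}

// Drop duplicates first, then take n, so duplicates never count toward n.
function pickCandidates(candidates, sources, n) {
  return candidates.filter((c) => !isAlreadyApplied(c, sources)).slice(0, n);
}

// Sample data (assumed formats): only the merged TSV still records the apply.
const sources = {
  'data/pipeline.md': '- https://example.com/jobs/1\n- https://example.com/jobs/2',
  'data/applications/2026-04-25.md': '| Acme | Backend Engineer | Evaluated |',
  'batch/tracker-additions/001-acme.tsv': '',
  'batch/tracker-additions/merged/000-acme.tsv':
    'Acme\tBackend Engineer\thttps://example.com/jobs/1\tAPPLIED',
};
const candidates = [
  { url: 'https://example.com/jobs/1', company: 'Acme', role: 'Backend Engineer' },
  { url: 'https://example.com/jobs/2', company: 'Globex', role: 'Data Engineer' },
  { url: 'https://example.com/jobs/3', company: 'Initech', role: 'SRE' },
];
const picked = pickCandidates(candidates, sources, 2);
console.log(picked.map((c) => c.url));
```

Filtering before slicing is what keeps duplicates out of the requested total: the merged TSV row removes the first candidate, and the third candidate fills the freed slot.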
@@ -137,13 +137,18 @@ When the user says "apply to N jobs", "process the pipeline", or similar, execut
 
 ```
 Step 1 — Enumerate candidates
-- Grep data/applications
+- Grep data/applications/*.md for status "Evaluated" without loading every file into context
 - Also read data/pipeline.md for unprocessed URLs
 - Build ordered list: candidates = [job_1, job_2, ..., job_N]
 
 Step 2 — Dedup against already-applied
-- For each candidate,
-
+- For each candidate, grep all four sources for URL and company+role:
+data/pipeline.md, data/applications/*.md, batch/tracker-additions/*.tsv,
+batch/tracker-additions/merged/*.tsv
+- Drop any APPLIED / Applied match before counting toward N. Never re-apply.
+- If a subagent later returns SKIP because it found a duplicate, treat that as
+a missed preflight check; finish the current round, re-run dedupe, then pick
+a replacement from the remaining candidates.
 
 Step 3 — Pre-flight cleanup (once, before the loop)
 - geometra_list_sessions()
package/AGENTS.md
CHANGED
@@ -7,7 +7,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
 why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
 
-- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If any source shows APPLIED / Applied, skip the dispatch.
+- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
 why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
package/CLAUDE.md
CHANGED
@@ -7,7 +7,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
 why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
 
-- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If any source shows APPLIED / Applied, skip the dispatch.
+- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
 why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
@@ -140,13 +140,18 @@ When the user says "apply to N jobs", "process the pipeline", or similar, execut
 
 ```
 Step 1 — Enumerate candidates
-- Grep data/applications
+- Grep data/applications/*.md for status "Evaluated" without loading every file into context
 - Also read data/pipeline.md for unprocessed URLs
 - Build ordered list: candidates = [job_1, job_2, ..., job_N]
 
 Step 2 — Dedup against already-applied
-- For each candidate,
-
+- For each candidate, grep all four sources for URL and company+role:
+data/pipeline.md, data/applications/*.md, batch/tracker-additions/*.tsv,
+batch/tracker-additions/merged/*.tsv
+- Drop any APPLIED / Applied match before counting toward N. Never re-apply.
+- If a subagent later returns SKIP because it found a duplicate, treat that as
+a missed preflight check; finish the current round, re-run dedupe, then pick
+a replacement from the remaining candidates.
 
 Step 3 — Pre-flight cleanup (once, before the loop)
 - geometra_list_sessions()
package/iso/instructions.md
CHANGED
@@ -7,7 +7,7 @@ AI-powered job search pipeline: scans portals, evaluates offers, generates CVs v
 - [H1] Max 2 parallel `task` dispatches per message. For N jobs, run `ceil(N/2)` sequential rounds of 2. A round is not complete until both subagents return a final outcome (`APPLIED`, `APPLY FAILED`, `SKIP`, `Discarded`, or a written TSV path). A `task` tool result that only gives a session id / title is a launch acknowledgement, not completion. Applies in all modes, for all user phrasings ("urgent", "apply to 10 jobs now").
 why: each subagent requires post-cleanup and racing more than 2 reliably loses at least one result. On 2026-04-25 the orchestrator launched round 2 while round 1 had only returned task ids, leaving four application subagents in flight and losing two provider recoveries
 
-- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If any source shows APPLIED / Applied, skip the dispatch.
+- [H2] Max 1 application per company+role. Before every `apply` dispatch, grep all four sources for the URL and for `company+role`: `data/pipeline.md`, all `data/applications/*.md` day files, `batch/tracker-additions/*.tsv`, `batch/tracker-additions/merged/*.tsv`. If any source shows APPLIED / Applied, skip the dispatch and pick a replacement from the remaining candidate list. Do not count duplicates toward a requested "apply to N jobs" total, and do not delegate obvious duplicates just so a subagent can return SKIP.
 why: 2026-04 same-day batch collision — when two batches target the same role, `npx job-forge merge` updates the existing day-file row rather than appending, so grepping day files alone misses earlier-batch applies; merged/*.tsv is the only place the breadcrumb remains
 
 - [H3] Before every batch of `task` dispatches that will use Geometra, call `geometra_list_sessions` then `geometra_disconnect({closeBrowser: true})`. Every round, no exceptions. Name this cleanup as an explicit "step 0" in your first-response plan for any multi-apply request — it is the most frequently skipped guardrail in practice, and skipping it produces cascade "Not connected" failures on the next dispatch.
package/modes/apply.md
CHANGED
@@ -176,7 +176,10 @@ When `location_constraints` is absent, use the prose fields:
 
 ```
 Step 1 — Build the job list (N items)
-Step 2 — Dedup:
+Step 2 — Dedup: for each candidate, grep all four sources for the URL and for company+role:
+data/pipeline.md, all data/applications/*.md day files,
+batch/tracker-additions/*.tsv, batch/tracker-additions/merged/*.tsv.
+Drop any already APPLIED before counting toward N; pick replacements from the remaining list.
 Step 3 — geometra_list_sessions() + geometra_disconnect({closeBrowser: true}) [once, before loop]
 Step 4 — For round in ceil(N/2):
 pair = jobs[round*2 : round*2 + 2]
@@ -192,7 +195,7 @@ Step 6 — Reconcile outcomes (Hard Limit #6):
 Step 7 — Summarize outcomes; do NOT auto-retry failures.
 ```
 
-If a subagent fails, report it in the summary and let the user decide whether to retry. Never auto-retry — re-running a submit step risks duplicate applications.
+If a subagent fails, report it in the summary and let the user decide whether to retry. Never auto-retry — re-running a submit step risks duplicate applications. If a subagent returns SKIP because it discovered a duplicate, treat that as a missed preflight check: finish the current round, then choose a replacement candidate only after re-running dedupe against all four sources.
 
 **Outcome routing (Hard Limit #6 in `AGENTS.md`):**
 - Subagents write `batch/tracker-additions/{num}-{slug}.tsv` — one TSV per job.
|
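The round loop from Step 4 of `modes/apply.md` (pairs of jobs, `ceil(N/2)` rounds, no new round until both results are back) might be sketched as follows. `runRounds` and `fakeDispatch` are hypothetical names, and the sketch assumes a dispatch promise resolves only with a final outcome, never with a bare launch acknowledgement.

```javascript
// Hypothetical orchestration sketch; `dispatch` stands in for a `task` call.
async function runRounds(jobs, dispatch) {
  const outcomes = [];
  const rounds = Math.ceil(jobs.length / 2);
  for (let round = 0; round < rounds; round += 1) {
    const pair = jobs.slice(round * 2, round * 2 + 2);
    // Awaiting both results keeps the next round from starting early.
    const results = await Promise.all(pair.map((job) => dispatch(job)));
    outcomes.push(...results);
  }
  // Failures are reported by the caller, not retried here.
  return outcomes;
}

// Fake dispatcher that records start/finish order for the demonstration.
const log = [];
const fakeDispatch = async (job) => {
  log.push(`start ${job}`);
  await new Promise((resolve) => setTimeout(resolve, 5));
  log.push(`done ${job}`);
  return { job, outcome: 'APPLIED' };
};

const finished = runRounds(['a', 'b', 'c'], fakeDispatch).then((outcomes) => {
  console.log(outcomes.length, log);
  return outcomes;
});
```

With three jobs the loop runs two rounds, and the log shows job `c` starting only after both `a` and `b` have finished.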
package/package.json
CHANGED
package/scripts/telemetry.mjs
CHANGED
@@ -142,15 +142,32 @@ function analyzeSession(session, allSessions, opts) {
   const parts = rows.parts.map((row) => ({ row, data: parseJson(row.data) }));
   const messageById = new Map(messages.map((m) => [m.row.id, m.data]));
   const textParts = parts.filter((p) => p.data.type === 'text');
-  const userPrompt = firstUserText(textParts, messageById);
   const taskCalls = parts.filter((p) => p.data.type === 'tool' && p.data.tool === 'task').map(taskCallSummary);
+  const userRequests = userRequestSummaries(textParts, messageById);
+  const activeRequest = userRequests.at(-1) || null;
+  const userPrompt = activeRequest?.prompt || userRequests[0]?.prompt || '';
+  const latestTaskCalls = activeRequest
+    ? taskCalls.filter((task) => task.atMs >= activeRequest.atMs)
+    : taskCalls;
   const providerErrors = messages.map(providerErrorSummary).filter(Boolean);
-  const
+  const rootModels = modelUsageFromMessages(messages);
   const tracker = trackerStatus(opts.cwd);
   const children = allSessions
     .filter((candidate) => candidate.parentId === session.id)
     .sort((a, b) => a.startedAt.localeCompare(b.startedAt))
     .map((child) => childSummary(child));
+  const latestChildren = activeRequest
+    ? children.filter((child) => child.startedAtMs >= activeRequest.atMs)
+    : children;
+  const models = mergeModelUsage([rootModels, ...children.map((child) => child.models)]);
+  const policyIssues = detectPolicyIssues(session, parts, textParts, messageById, providerErrors, {
+    taskCalls,
+    latestTaskCalls,
+    children,
+    latestChildren,
+    activeRequest,
+    models,
+  });
   const childOutcomes = children.filter((child) => child.outcome !== 'unknown').length;
   const childProviderErrors = children.reduce((sum, child) => sum + child.providerErrors, 0);
   const status = sessionStatus({ session, taskCalls, children, childOutcomes, childProviderErrors, policyIssues, providerErrors });
|
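The hunk above scopes task calls to the most recent user request by comparing millisecond timestamps. A minimal standalone sketch of that filter is shown below; the wrapper name `scopeToLatestRequest` and the sample timestamps are invented for illustration, while the filter expression follows the added lines.

```javascript
// Keep only task calls made at or after the latest user request.
function scopeToLatestRequest(userRequests, taskCalls) {
  const activeRequest = userRequests.at(-1) || null;
  const latestTaskCalls = activeRequest
    ? taskCalls.filter((task) => task.atMs >= activeRequest.atMs)
    : taskCalls;
  return { activeRequest, latestTaskCalls };
}

// Made-up session: two user requests, four task calls.
const userRequests = [
  { atMs: 1000, prompt: 'apply to 4 jobs', requestedJobs: 4 },
  { atMs: 5000, prompt: 'apply to 2 more jobs', requestedJobs: 2 },
];
const taskCalls = [
  { atMs: 1200, isStatusPoll: false },
  { atMs: 1300, isStatusPoll: false },
  { atMs: 5100, isStatusPoll: false },
  { atMs: 5200, isStatusPoll: true },
];

const scoped = scopeToLatestRequest(userRequests, taskCalls);
// Status polls are excluded when counting dispatches, as in `latestRequest.taskDispatches`.
const dispatches = scoped.latestTaskCalls.filter((task) => !task.isStatusPoll).length;
const shortfall = scoped.activeRequest.requestedJobs - dispatches;
console.log({ dispatches, shortfall });
```

Here only one real dispatch follows the second prompt, so a request for two jobs is one short. That gap is the kind of shortfall the `requested_count_not_met` issue reports.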
@@ -161,17 +178,27 @@ function analyzeSession(session, allSessions, opts) {
     projectDir: opts.cwd,
     status,
     prompt: userPrompt,
+    userRequests,
+    latestRequest: activeRequest ? {
+      ...activeRequest,
+      taskDispatches: latestTaskCalls.filter((task) => !task.isStatusPoll).length,
+      children: latestChildren.length,
+      childOutcomes: latestChildren.filter((child) => child.outcome !== 'unknown').length,
+    } : null,
     tasks: {
       total: taskCalls.length,
       statusPolls: taskCalls.filter((task) => task.isStatusPoll).length,
+      running: taskCalls.filter((task) => task.status && task.status !== 'completed').length,
       calls: taskCalls,
     },
     children: {
       total: children.length,
       withOutcomes: childOutcomes,
       providerErrors: childProviderErrors,
+      toolErrors: children.reduce((sum, child) => sum + child.toolErrors, 0),
       sessions: children,
     },
+    models,
     providerErrors,
     policyIssues,
     tracker,
@@ -179,13 +206,19 @@ function analyzeSession(session, allSessions, opts) {
   };
 }
 
-function
-
-
-
-
-
-
+function userRequestSummaries(textParts, messageById) {
+  return textParts
+    .filter((part) => messageById.get(part.row.message_id)?.role === 'user')
+    .map((part) => {
+      const prompt = clean(redactSecrets(part.data.text || ''));
+      return {
+        at: msToIso(part.row.time_created),
+        atMs: Number(part.row.time_created),
+        prompt,
+        requestedJobs: requestedJobCount(prompt),
+      };
+    })
+    .filter((request) => request.prompt.length > 0);
 }
 
 function taskCallSummary(part) {
|
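`userRequestSummaries` calls `requestedJobCount(prompt)`, whose body does not appear in this diff. Purely as an assumption about what such a helper could look like, the sketch below extracts N from phrasings like "apply to N jobs"; the function name, the regular expression, and the `null` fallback are all hypothetical.

```javascript
// Hypothetical stand-in for requestedJobCount; the real implementation is not shown in this diff.
function requestedJobCountSketch(prompt) {
  const match = /\bapply\s+to\s+(\d+)\s+(?:more\s+)?jobs?\b/i.exec(prompt);
  // Returning null keeps a guard like `activeRequest?.requestedJobs && ...` falsy
  // when the prompt names no count.
  return match ? Number(match[1]) : null;
}

const withCount = requestedJobCountSketch('urgent: apply to 10 jobs now');
const withoutCount = requestedJobCountSketch('process the pipeline');
console.log(withCount, withoutCount);
```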
@@ -201,6 +234,7 @@ function taskCallSummary(part) {
 
   return {
     at: msToIso(part.row.time_created),
+    atMs: Number(part.row.time_created),
     description,
     subagentType,
     sessionId,
@@ -208,6 +242,7 @@ function taskCallSummary(part) {
     isStatusPoll,
     promptBytes: Buffer.byteLength(prompt, 'utf8'),
     proxyLeak: hasProxyLeak(prompt),
+    url: firstUrl(prompt),
   };
 }
 
@@ -226,10 +261,15 @@ function providerErrorSummary(message) {
   };
 }
 
-function detectPolicyIssues(session, parts, textParts, messageById, providerErrors) {
+function detectPolicyIssues(session, parts, textParts, messageById, providerErrors, context = {}) {
   const issues = [];
   const taskParts = parts.filter((p) => p.data.type === 'tool' && p.data.tool === 'task');
-  const
+  const taskCalls = context.taskCalls || taskParts.map(taskCallSummary);
+  const latestTaskCalls = context.latestTaskCalls || taskCalls;
+  const children = context.children || [];
+  const latestChildren = context.latestChildren || children;
+  const activeRequest = context.activeRequest || null;
+  const statusPolls = taskCalls.filter((task) => task.isStatusPoll);
   if (statusPolls.length > 0) {
     issues.push({
       type: 'task_status_poll',
@@ -269,12 +309,90 @@ function detectPolicyIssues(session, parts, textParts, messageById, providerErro
     });
   }
 
+  const dedupeMisses = children.filter((child) => child.dedupeMiss).length;
+  if (dedupeMisses > 0) {
+    issues.push({
+      type: 'dedupe_preflight_missed',
+      severity: 'high',
+      count: dedupeMisses,
+      detail: 'One or more child sessions found an already-applied duplicate that should have been filtered before dispatch.',
+    });
+  }
+
+  const freeModels = context.models?.filter((model) => isFreeModelRoute(model.provider, model.model)) || [];
+  if (freeModels.length > 0) {
+    issues.push({
+      type: 'free_model_usage',
+      severity: 'high',
+      count: freeModels.reduce((sum, model) => sum + model.count, 0),
+      detail: `Trace used free/legacy model routes: ${freeModels.map(modelLabel).join(', ')}.`,
+    });
+  }
+
+  const duplicateUrlCount = duplicateTaskUrlCount(taskCalls);
+  if (duplicateUrlCount > 0) {
+    issues.push({
+      type: 'duplicate_task_url',
+      severity: 'high',
+      count: duplicateUrlCount,
+      detail: 'The same job URL was dispatched more than once in this root session.',
+    });
+  }
+
+  const runningTasks = taskCalls.filter((task) => task.status && task.status !== 'completed');
+  if (runningTasks.length > 0) {
+    const consumed = runningTasks.filter((task) => {
+      if (!task.sessionId) return false;
+      const child = children.find((candidate) => candidate.id === task.sessionId);
+      return child && child.outcome !== 'unknown';
+    }).length;
+    issues.push({
+      type: consumed === runningTasks.length ? 'task_result_not_consumed' : 'task_still_running',
+      severity: consumed === runningTasks.length ? 'medium' : 'high',
+      count: runningTasks.length,
+      detail: consumed === runningTasks.length
+        ? 'One or more task calls still show running even though child sessions have terminal-looking outcomes; root did not consume the final task result.'
+        : 'One or more task calls still show running and do not have terminal child outcomes.',
+    });
+  }
+
+  const latestAssistantText = textParts
+    .filter((part) => messageById.get(part.row.message_id)?.role === 'assistant')
+    .filter((part) => !activeRequest || Number(part.row.time_created) >= activeRequest.atMs)
+    .map((part) => part.data.text || '')
+    .join('\n');
+  const latestDispatches = latestTaskCalls.filter((task) => !task.isStatusPoll).length;
+  if (activeRequest?.requestedJobs && latestDispatches > 0 && latestDispatches < activeRequest.requestedJobs && !mentionsLimitedCandidatePool(latestAssistantText)) {
+    issues.push({
+      type: 'requested_count_not_met',
+      severity: 'high',
+      count: activeRequest.requestedJobs - latestDispatches,
+      detail: `Latest request asked for ${activeRequest.requestedJobs} jobs, but only ${latestDispatches} task dispatches are visible after that prompt.`,
+    });
+  }
+
+  if (latestDispatches > 0 && latestChildren.some((child) => child.outcome === 'unknown') && !/round .*in flight|still running|waiting/i.test(latestAssistantText)) {
+    issues.push({
+      type: 'latest_children_missing_outcomes',
+      severity: 'high',
+      count: latestChildren.filter((child) => child.outcome === 'unknown').length,
+      detail: 'Latest request has child sessions without visible terminal outcomes.',
+    });
+  }
+
   const finalText = textParts
     .filter((part) => messageById.get(part.row.message_id)?.role === 'assistant')
     .slice(-5)
     .map((part) => part.data.text || '')
     .join('\n');
-  if (
+  if (latestDispatches > 0 && !hasOutcome(latestAssistantText) && !/round .*in flight|still running|waiting/i.test(latestAssistantText)) {
+    issues.push({
+      type: 'latest_request_no_visible_final_outcome',
+      severity: 'high',
+      count: 1,
+      detail: 'Latest request dispatched task work but assistant text after that request has no final outcome or in-flight notice.',
+    });
+  } else if (taskParts.length > 0 && !hasOutcome(finalText) && !/round .*in flight|still running|waiting/i.test(finalText)) {
     issues.push({
       type: 'no_visible_final_outcome',
       severity: 'medium',
|
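The `duplicate_task_url` check relies on `firstUrl` and `duplicateTaskUrlCount`, neither of which is shown in this diff. The sketch below is one plausible reading, offered only as an assumption: both function names carry a `Sketch` suffix to mark them as hypothetical, and skipping status polls and counting only the extra dispatches per URL are illustrative choices.

```javascript
// Hypothetical stand-in for firstUrl: first http(s) URL in a prompt, or null.
function firstUrlSketch(prompt) {
  const match = /https?:\/\/[^\s)>\]"']+/.exec(prompt);
  return match ? match[0] : null;
}

// Hypothetical stand-in for duplicateTaskUrlCount.
function duplicateTaskUrlCountSketch(taskCalls) {
  const seen = new Map();
  for (const task of taskCalls) {
    // Assumption: status polls and URL-less calls are not dispatches.
    if (task.isStatusPoll || !task.url) continue;
    seen.set(task.url, (seen.get(task.url) || 0) + 1);
  }
  // Count only the extra dispatches beyond the first for each URL.
  let duplicates = 0;
  for (const count of seen.values()) duplicates += count - 1;
  return duplicates;
}

const tasks = [
  'Apply to https://example.com/jobs/1 for Acme',
  'Apply to https://example.com/jobs/2 for Globex',
  'Apply to https://example.com/jobs/1 for Acme again',
].map((prompt) => ({ url: firstUrlSketch(prompt), isStatusPoll: false }));

const duplicateCount = duplicateTaskUrlCountSketch(tasks);
console.log(duplicateCount);
```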
@@ -297,39 +415,68 @@ function childSummary(session) {
   const rows = loadRows(session.id);
   const messages = rows.messages.map((row) => ({ row, data: parseJson(row.data) }));
   const parts = rows.parts.map((row) => ({ row, data: parseJson(row.data) }));
-  const
+  const messageById = new Map(messages.map((m) => [m.row.id, m.data]));
+  const assistantTexts = parts
+    .filter((p) => p.data.type === 'text' && messageById.get(p.row.message_id)?.role === 'assistant')
+    .map((p) => p.data.text || '');
+  const finalText = assistantTexts.slice(-5).join('\n');
   const providerErrors = messages.map(providerErrorSummary).filter(Boolean);
   const taskCalls = parts.filter((p) => p.data.type === 'tool' && p.data.tool === 'task').length;
   const trackerWrites = parts.filter((p) => p.data.type === 'tool' && /batch\/tracker-additions\/.*\.tsv/.test(JSON.stringify(p.data.state?.input || {}))).length;
+  const toolErrors = parts.filter((p) => p.data.type === 'tool' && (p.data.state?.status === 'error' || p.data.state?.error)).length;
+  const dedupeMiss = /\b(DUPLICATE|already\s+\*{0,2}Applied|already applied|per \[H2\]|Hard Limit #2|No re-dispatch needed)\b/i.test(finalText) ||
+    /\bpreviously applied (on|as|under)\b/i.test(finalText);
 
   return {
     id: session.id,
     title: session.title,
     startedAt: session.startedAt,
+    startedAtMs: Date.parse(session.startedAt),
     endedAt: session.endedAt,
-    outcome: outcomeFromText(
+    outcome: outcomeFromText(finalText, trackerWrites),
     providerErrors: providerErrors.length,
     taskCalls,
+    toolErrors,
+    dedupeMiss,
     trackerWrites,
+    models: modelUsageFromMessages(messages),
   };
 }
 
 function outcomeFromText(text, trackerWrites = 0) {
-
-
-
+  const explicitFailed = /\b(APPLICATION OUTCOME|RESULT|STATUS)(?:\*\*)?\s*[:|-]\s*\*{0,2}\s*(FAILED|APPLY FAILED)\b/i.test(text) ||
+    /\|\s*\*\*?Status\*\*?\s*\|\s*\*\*?Failed\*\*?/i.test(text);
+  const explicitSkipped = /\b(APPLICATION OUTCOME|RESULT|STATUS)(?:\*\*)?\s*[:|-]\s*\*{0,2}\s*(SKIP|SKIPPED|DISCARDED|DISCARD)\b/i.test(text) ||
+    /\|\s*\*\*?Status\*\*?\s*\|\s*\*\*?(SKIP|SKIPPED|Discarded|DISCARDED)\*\*?/i.test(text);
+  const explicitApplied = /\b(APPLICATION OUTCOME|RESULT|STATUS)(?:\*\*)?\s*[:|-]\s*\*{0,2}\s*APPLIED\b/i.test(text) ||
+    /\|\s*\*\*?Status\*\*?\s*\|\s*\*\*?Applied\*\*?/i.test(text);
+
+  if (explicitFailed) return 'Failed';
+  if (explicitSkipped) return 'Discarded';
+  if (explicitApplied) return 'Applied';
+
+  if (/\bAPPLY FAILED\b/i.test(text) || /^\s*(FAILED|Failed)\b/m.test(text)) return 'Failed';
+  if (/^\s*(SKIP|SKIPPED|DISCARDED|Discarded)\b/m.test(text) ||
+    /\b(DUPLICATE|job posting closed|role no longer available)\b/i.test(text)) return 'Discarded';
+  if (/\bwith\s+\*\*?Applied\*\*?\s+status\b/i.test(text) ||
+    /\bAPPLIED\s+https?:\/\//i.test(text) ||
+    /\b(successfully submitted|Applied via|Thank you for applying|confirmation page)\b/i.test(text)) return 'Applied';
   if (trackerWrites > 0) return 'TSV written';
   return 'unknown';
 }
 
 function hasOutcome(text) {
-  return outcomeFromText(text) !== 'unknown' ||
+  return outcomeFromText(text) !== 'unknown' ||
+    /tracker-additions\/.*\.tsv/i.test(text) ||
+    /\bAll\s+\d+\s+jobs?\s+dispatched\b/i.test(text) ||
+    /\*\*(Applied|Skipped|Failed|Discarded)\s*\(\d+\):\*\*/i.test(text);
 }
 
 function sessionStatus({ taskCalls, children, childOutcomes, childProviderErrors, policyIssues, providerErrors }) {
   if (policyIssues.some((issue) => issue.severity === 'high')) return 'attention';
   if (providerErrors.length > 0) return 'attention';
   if (childProviderErrors > 0) return 'attention';
+  if (taskCalls.some((task) => task.status && task.status !== 'completed')) return 'in-flight-or-incomplete';
   if (taskCalls.length > 0 && children.length > childOutcomes) return 'in-flight-or-incomplete';
   if (taskCalls.length > 0 && children.length === childOutcomes) return 'complete';
   return 'observed';
|
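The rewritten `outcomeFromText` checks explicit status lines before it falls back to looser phrases, so an explicit failure outranks an incidental success phrase in the same text. The reduced sketch below shows that ordering using two of the explicit patterns and one heuristic pattern copied from the hunk above; the function name `classify` and the sample strings are invented for the example, and the skip, table-row, and TSV branches are omitted.

```javascript
// Reduced classifier: explicit status lines are tested before heuristic phrases.
function classify(text) {
  const explicitFailed = /\b(APPLICATION OUTCOME|RESULT|STATUS)(?:\*\*)?\s*[:|-]\s*\*{0,2}\s*(FAILED|APPLY FAILED)\b/i.test(text);
  const explicitApplied = /\b(APPLICATION OUTCOME|RESULT|STATUS)(?:\*\*)?\s*[:|-]\s*\*{0,2}\s*APPLIED\b/i.test(text);
  if (explicitFailed) return 'Failed';
  if (explicitApplied) return 'Applied';
  if (/\b(successfully submitted|Applied via|Thank you for applying|confirmation page)\b/i.test(text)) return 'Applied';
  return 'unknown';
}

const samples = {
  mixed: classify('Reached the confirmation page, but the upload errored.\nRESULT: APPLY FAILED'),
  explicit: classify('**Status:** APPLIED'),
  heuristic: classify('Thank you for applying'),
  none: classify('Form loaded, nothing submitted yet.'),
};
console.log(samples);
```

The `mixed` sample contains the phrase "confirmation page", which the heuristic alone would read as applied, yet it classifies as `Failed` because the explicit `RESULT:` line is tested first.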
@@ -361,6 +508,12 @@ function nextActions({ tracker, policyIssues, providerErrors, children }) {
   if (tracker.pending.length > 0) actions.push('Run `npm run merge && npm run verify` when you are ready to fold pending TSV outcomes into day files.');
   if (policyIssues.some((issue) => issue.type === 'task_status_poll')) actions.push('Avoid resuming by spawning "check task status" tasks; inspect telemetry/trace and tracker files instead.');
   if (policyIssues.some((issue) => issue.type === 'proxy_prompt_leak')) actions.push('Restart OpenCode after updating the harness so new sessions load the proxy prompt hygiene rule.');
+  if (policyIssues.some((issue) => issue.type === 'free_model_usage')) actions.push('Restart OpenCode and rerun `npm run update-harness` so application tiers use the bundled DeepSeek V4 Flash route.');
+  if (policyIssues.some((issue) => issue.type === 'requested_count_not_met')) actions.push('Resume the latest apply request or start a new run for the remaining requested jobs; telemetry did not see enough dispatches after the latest prompt.');
+  if (policyIssues.some((issue) => issue.type === 'latest_request_no_visible_final_outcome')) actions.push('Inspect the latest child sessions before treating the current OpenCode run as complete.');
+  if (policyIssues.some((issue) => issue.type === 'task_result_not_consumed')) actions.push('Resume the root session only to collect final task results and summarize; do not dispatch new applications until it reconciles current children.');
+  if (policyIssues.some((issue) => issue.type === 'duplicate_task_url')) actions.push('Do not re-dispatch duplicate URLs automatically; inspect the prior child result and tracker TSV before retrying.');
+  if (policyIssues.some((issue) => issue.type === 'dedupe_preflight_missed')) actions.push('Tighten candidate preflight: grep all application day files plus pending/merged TSVs before dispatching replacements.');
   if (providerErrors.some((err) => err.statusCode === 402)) actions.push('Provider balance errors occurred; use a non-402 fallback or add provider credits before retrying paid routes.');
   if (children.some((child) => child.outcome === 'unknown')) actions.push('Some child sessions have no visible final outcome; inspect them with `npm run telemetry:show -- <child-session-id>`.');
   return actions;
|
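The added checks in the hunk above each test `policyIssues` for one issue type and push a fixed message. A minimal sketch of the same mapping as a lookup table follows; `ACTION_BY_ISSUE_TYPE` and `suggestActions` are hypothetical names used only for illustration and do not appear in `telemetry.mjs`. Only three of the issue types from the diff are reproduced.

```javascript
// Hypothetical table-driven variant of the issue-to-action checks.
// The issue types and messages are copied from the diff; the names
// ACTION_BY_ISSUE_TYPE and suggestActions are assumptions.
const ACTION_BY_ISSUE_TYPE = new Map([
  ['task_result_not_consumed', 'Resume the root session only to collect final task results and summarize; do not dispatch new applications until it reconciles current children.'],
  ['duplicate_task_url', 'Do not re-dispatch duplicate URLs automatically; inspect the prior child result and tracker TSV before retrying.'],
  ['dedupe_preflight_missed', 'Tighten candidate preflight: grep all application day files plus pending/merged TSVs before dispatching replacements.'],
]);

function suggestActions(policyIssues) {
  // Collect the distinct issue types so each action is emitted at most once,
  // which mirrors the `policyIssues.some(...)` checks in the diff.
  const present = new Set(policyIssues.map((issue) => issue.type));
  const actions = [];
  for (const [type, action] of ACTION_BY_ISSUE_TYPE) {
    if (present.has(type)) actions.push(action);
  }
  return actions;
}

// Two issues of the same type still yield a single action.
console.log(suggestActions([
  { type: 'duplicate_task_url' },
  { type: 'duplicate_task_url' },
]).length); // 1
```

Actions come out in table order, not in the order the issues were reported, which matches the fixed ordering of the `if` statements in the diff.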
@@ -381,6 +534,78 @@ function summaryForList(telemetry) {
|
|
|
381
534
|
};
|
|
382
535
|
}
|
|
383
536
|
|
|
537
|
+
function modelUsageFromMessages(messages) {
|
|
538
|
+
const counts = new Map();
|
|
539
|
+
for (const message of messages) {
|
|
540
|
+
const provider = stringValue(message.data.providerID);
|
|
541
|
+
const model = stringValue(message.data.modelID);
|
|
542
|
+
if (!provider && !model) continue;
|
|
543
|
+
const key = `${provider}\u0000${model}`;
|
|
544
|
+
const current = counts.get(key) || { provider, model, count: 0 };
|
|
545
|
+
current.count += 1;
|
|
546
|
+
counts.set(key, current);
|
|
547
|
+
}
|
|
548
|
+
return [...counts.values()].sort((a, b) => b.count - a.count || modelLabel(a).localeCompare(modelLabel(b)));
|
|
549
|
+
}
|
|
550
|
+
|
|
551
|
+
function mergeModelUsage(groups) {
|
|
552
|
+
const counts = new Map();
|
|
553
|
+
for (const group of groups) {
|
|
554
|
+
for (const item of group || []) {
|
|
555
|
+
const provider = stringValue(item.provider);
|
|
556
|
+
const model = stringValue(item.model);
|
|
557
|
+
const key = `${provider}\u0000${model}`;
|
|
558
|
+
const current = counts.get(key) || { provider, model, count: 0 };
|
|
559
|
+
current.count += Number(item.count || 0);
|
|
560
|
+
counts.set(key, current);
|
|
561
|
+
}
|
|
562
|
+
}
|
|
563
|
+
return [...counts.values()].sort((a, b) => b.count - a.count || modelLabel(a).localeCompare(modelLabel(b)));
|
|
564
|
+
}
|
|
565
|
+
|
|
566
|
+
function modelLabel(model) {
|
|
567
|
+
return `${model.provider || '(unknown)'}/${model.model || '(unknown)'} x${model.count}`;
|
|
568
|
+
}
|
|
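The tallying logic added above can be tried in isolation. The sketch below copies `modelUsageFromMessages` and `modelLabel` from the diff. `stringValue` is defined elsewhere in `telemetry.mjs` and is not shown in this diff, so the version here is an assumed stand-in; the provider and model names in the sample messages are made up.

```javascript
// Assumed stand-in for the real stringValue helper, which this diff does not show.
function stringValue(value) {
  return typeof value === 'string' ? value : '';
}

// Copied from the diff: formats one usage entry as "provider/model xN".
function modelLabel(model) {
  return `${model.provider || '(unknown)'}/${model.model || '(unknown)'} x${model.count}`;
}

// Copied from the diff: counts messages per provider/model pair.
function modelUsageFromMessages(messages) {
  const counts = new Map();
  for (const message of messages) {
    const provider = stringValue(message.data.providerID);
    const model = stringValue(message.data.modelID);
    if (!provider && !model) continue; // skip messages with no routing info
    const key = `${provider}\u0000${model}`; // NUL separator keeps the two parts unambiguous
    const current = counts.get(key) || { provider, model, count: 0 };
    current.count += 1;
    counts.set(key, current);
  }
  // Most-used route first; ties are broken by label.
  return [...counts.values()].sort((a, b) => b.count - a.count || modelLabel(a).localeCompare(modelLabel(b)));
}

// Made-up sample messages.
const sampleMessages = [
  { data: { providerID: 'provider-a', modelID: 'model-x' } },
  { data: { providerID: 'provider-b', modelID: 'model-y' } },
  { data: { providerID: 'provider-a', modelID: 'model-x' } },
  { data: {} }, // ignored: neither provider nor model
];

console.log(modelUsageFromMessages(sampleMessages).map(modelLabel).join(', '));
// provider-a/model-x x2, provider-b/model-y x1
```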
569
|
+
|
|
570
|
+
function isFreeModelRoute(provider, model) {
|
|
571
|
+
const route = `${provider}/${model}`.toLowerCase();
|
|
572
|
+
return route.includes(':free') ||
|
|
573
|
+
route.includes('/big-pickle') ||
|
|
574
|
+
route.includes('minimax-m2.5-free') ||
|
|
575
|
+
route.includes('glm-4.5-air') ||
|
|
576
|
+
route.includes('gpt-oss-20b') ||
|
|
577
|
+
route.includes('qwen3-next-80b-a3b-instruct:free');
|
|
578
|
+
}
|
|
579
|
+
|
|
580
|
+
function requestedJobCount(prompt) {
|
|
581
|
+
const text = String(prompt || '').toLowerCase();
|
|
582
|
+
if (!/\b(job|jobs|application|applications)\b/.test(text)) return null;
|
|
583
|
+
if (!/\b(apply|applt|another|nother|more|process)\b/.test(text)) return null;
|
|
584
|
+
const match = text.match(/\b(\d{1,3})\b/);
|
|
585
|
+
return match ? Number(match[1]) : null;
|
|
586
|
+
}
|
|
587
|
+
|
|
588
|
+
function firstUrl(text) {
|
|
589
|
+
const match = String(text || '').match(/https?:\/\/[^\s)>\]]+/i);
|
|
590
|
+
return match ? match[0].replace(/[.,;]+$/, '') : '';
|
|
591
|
+
}
|
|
592
|
+
|
|
593
|
+
function duplicateTaskUrlCount(taskCalls) {
|
|
594
|
+
const seen = new Set();
|
|
595
|
+
const duplicates = new Set();
|
|
596
|
+
for (const task of taskCalls) {
|
|
597
|
+
if (!task.url || task.isStatusPoll) continue;
|
|
598
|
+
if (seen.has(task.url)) duplicates.add(task.url);
|
|
599
|
+
seen.add(task.url);
|
|
600
|
+
}
|
|
601
|
+
return duplicates.size;
|
|
602
|
+
}
|
|
603
|
+
|
|
604
|
+
function mentionsLimitedCandidatePool(text) {
|
|
605
|
+
return /\b(only|just)\s+\d+\s+(candidate|candidates|jobs?|applications?)\b/i.test(text) ||
|
|
606
|
+
/\b(no more|not enough|ran out of|exhausted)\s+(candidate|candidates|jobs?|applications?|pipeline)\b/i.test(text);
|
|
607
|
+
}
|
|
608
|
+
|
|
384
609
|
function printList(items) {
|
|
385
610
|
const rows = items.map((item) => [
|
|
386
611
|
item.id,
|
|
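The prompt and URL helpers added in this hunk are pure functions, so their behavior is easy to check by hand. The sketch below copies `requestedJobCount`, `firstUrl`, and `duplicateTaskUrlCount` from the diff; the sample prompts, URLs, and task objects are made up for illustration.

```javascript
// Copied from the diff: returns the first 1-3 digit number in an apply-style prompt.
function requestedJobCount(prompt) {
  const text = String(prompt || '').toLowerCase();
  if (!/\b(job|jobs|application|applications)\b/.test(text)) return null;
  if (!/\b(apply|applt|another|nother|more|process)\b/.test(text)) return null;
  const match = text.match(/\b(\d{1,3})\b/);
  return match ? Number(match[1]) : null;
}

// Copied from the diff: first http(s) URL in the text, minus trailing punctuation.
function firstUrl(text) {
  const match = String(text || '').match(/https?:\/\/[^\s)>\]]+/i);
  return match ? match[0].replace(/[.,;]+$/, '') : '';
}

// Copied from the diff: number of distinct URLs dispatched more than once.
function duplicateTaskUrlCount(taskCalls) {
  const seen = new Set();
  const duplicates = new Set();
  for (const task of taskCalls) {
    if (!task.url || task.isStatusPoll) continue; // status polls do not count as dispatches
    if (seen.has(task.url)) duplicates.add(task.url);
    seen.add(task.url);
  }
  return duplicates.size;
}

console.log(requestedJobCount('apply to 10 jobs now')); // 10
console.log(requestedJobCount('apply to more jobs'));   // null (no number)
console.log(requestedJobCount('what is the weather'));  // null (not a job request)

console.log(firstUrl('Apply at https://example.com/jobs/123. Thanks'));
// https://example.com/jobs/123

// Made-up task calls: one URL dispatched three times, one status poll.
const sampleTasks = [
  { url: 'https://example.com/jobs/1' },
  { url: 'https://example.com/jobs/2' },
  { url: 'https://example.com/jobs/1' },
  { url: 'https://example.com/jobs/1' },
  { url: 'https://example.com/jobs/2', isStatusPoll: true },
];
console.log(duplicateTaskUrlCount(sampleTasks)); // 1
```

`duplicateTaskUrlCount` counts distinct duplicated URLs, not extra dispatches, so the URL dispatched three times contributes 1 to the result.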
@@ -401,10 +626,18 @@ function printStatus(telemetry) {
|
|
|
401
626
|
console.log(`status: ${telemetry.status}`);
|
|
402
627
|
console.log(`started: ${telemetry.session.startedAt}`);
|
|
403
628
|
console.log(`prompt: ${shorten(telemetry.prompt || '', 100)}`);
|
|
404
|
-
|
|
629
|
+
if (telemetry.userRequests.length > 1 || telemetry.latestRequest?.requestedJobs) {
|
|
630
|
+
const latest = telemetry.latestRequest;
|
|
631
|
+
const requestDetail = latest?.requestedJobs
|
|
632
|
+
? `latest ${latest.taskDispatches}/${latest.requestedJobs} dispatches`
|
|
633
|
+
: `latest ${latest?.taskDispatches ?? 0} dispatches`;
|
|
634
|
+
console.log(`requests: ${telemetry.userRequests.length} user prompt${telemetry.userRequests.length === 1 ? '' : 's'} (${requestDetail})`);
|
|
635
|
+
}
|
|
636
|
+
console.log(`tasks: ${telemetry.tasks.total} (${telemetry.tasks.statusPolls} status-poll, ${telemetry.tasks.running} running)`);
|
|
405
637
|
console.log(`children: ${telemetry.children.withOutcomes}/${telemetry.children.total} with outcomes`);
|
|
406
638
|
console.log(`tracker: ${telemetry.tracker.pending.length} pending TSVs, ${telemetry.tracker.mergedCount} merged TSVs`);
|
|
407
|
-
console.log(`
|
|
639
|
+
console.log(`models: ${telemetry.models.slice(0, 3).map(modelLabel).join(', ') || 'none'}`);
|
|
640
|
+
console.log(`errors: ${telemetry.providerErrors.length} root, ${telemetry.children.providerErrors} child provider errors, ${telemetry.children.toolErrors} child tool errors`);
|
|
408
641
|
console.log(`issues: ${telemetry.policyIssues.length}`);
|
|
409
642
|
|
|
410
643
|
if (telemetry.policyIssues.length > 0) {
|
|
@@ -427,6 +660,8 @@ function printStatus(telemetry) {
|
|
|
427
660
|
for (const child of telemetry.children.sessions) {
|
|
428
661
|
const alerts = [];
|
|
429
662
|
if (child.providerErrors) alerts.push(`${child.providerErrors} provider error`);
|
|
663
|
+
if (child.toolErrors) alerts.push(`${child.toolErrors} tool error`);
|
|
664
|
+
if (child.dedupeMiss) alerts.push('dedupe miss');
|
|
430
665
|
if (child.taskCalls) alerts.push(`${child.taskCalls} task call`);
|
|
431
666
|
console.log(` - ${child.id} ${child.outcome} ${child.title}${alerts.length ? ` (${alerts.join(', ')})` : ''}`);
|
|
432
667
|
}
|
|
@@ -445,6 +680,7 @@ function printShow(telemetry) {
|
|
|
445
680
|
for (const task of telemetry.tasks.calls) {
|
|
446
681
|
const flags = [
|
|
447
682
|
task.isStatusPoll ? 'status-poll' : '',
|
|
683
|
+
task.status && task.status !== 'completed' ? task.status : '',
|
|
448
684
|
task.proxyLeak ? 'proxy-values-detected' : '',
|
|
449
685
|
].filter(Boolean).join(', ');
|
|
450
686
|
console.log(` - ${task.at} ${task.description || '(no description)'} ${task.sessionId || ''} ${task.subagentType || ''}${flags ? ` [${flags}]` : ''}`);
|