@ishlabs/cli 0.21.0 → 0.23.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/commands/chat.js +2 -2
- package/dist/commands/config.js +17 -3
- package/dist/commands/source.js +1 -1
- package/dist/commands/study-analyze.js +15 -2
- package/dist/commands/study-participant.js +19 -0
- package/dist/commands/study-run.d.ts +2 -0
- package/dist/commands/study-run.js +71 -20
- package/dist/commands/study.js +96 -34
- package/dist/lib/command-helpers.js +4 -3
- package/dist/lib/docs.js +114 -43
- package/dist/lib/output.d.ts +14 -9
- package/dist/lib/output.js +91 -19
- package/dist/lib/skill-content.js +10 -1
- package/dist/lib/study-participants.d.ts +3 -0
- package/dist/lib/study-results-filters.js +35 -14
- package/dist/lib/study-results-projections.d.ts +47 -17
- package/dist/lib/study-results-projections.js +39 -36
- package/dist/lib/types.d.ts +4 -0
- package/package.json +1 -1
package/dist/lib/docs.js
CHANGED
@@ -635,7 +635,7 @@ Tunables (both modes):
 the parties signal the conversation is over.

 Pair-mode rules:
-- Each side needs **either** \`--
+- Each side needs **either** \`--group-a\` / \`--group-b\` (explicit IDs) **or**
 \`--role-criteria-*\` (filter the backend resolves). The two can also
 be combined — criteria then acts as validation on the explicit list.
 - When both sides use explicit \`--group-a\` / \`--group-b\`, they
@@ -657,7 +657,7 @@ Pair-mode rules:
 \`type\` field in \`--questionnaire\` / \`--questions\` manifests
 (\`single-choice\` ↔ \`single_choice\`).
 - Audiences are pinned to the iteration. \`ish study run\` refuses
-run-time people overrides (\`--
+run-time people overrides (\`--person\` / \`--sample\` / \`--all\` /
 filters) on a pair iteration — change the peoples via
 \`ish iteration update <id> --details-json '{...}'\` instead.
 - \`--max-turns\` / \`--early-termination\` on \`ish study run\` override
@@ -1174,7 +1174,7 @@ const CONCEPT_PROFILE = `# concept: person
 A **person** is a reusable persona — the simulated
 human whose behaviour drives a participant instance during a study or ask.

-- Alias prefix: \`
+- Alias prefix: \`p-\`
 - Lives at the workspace level, reusable across studies and asks.
 - Distinct from a "participant" (\`pt-\`) — a participant is one *instance* of a
 profile inside one iteration.
@@ -1336,7 +1336,7 @@ A **source** is an input to \`ish person generate\`: a transcript,
 audio file, image, or PDF that an LLM reads to ground generated profiles
 in real customer evidence.

-- Alias prefix: \`
+- Alias prefix: \`ps-\`
 - Source kinds: \`text_file | audio | image\` (auto-detected from extension; \`text-file\` is accepted as a hyphen variant).
 - Audio supports speaker diarization via \`--diarize\`.

@@ -1406,7 +1406,7 @@ flags. Two ways to select:
 \`platform\` until the next release with a server-side
 deprecation warning)

-The two modes are **mutually exclusive** — pass either \`--
+The two modes are **mutually exclusive** — pass either \`--person\` or
 the filter set, not both.

 ## Empty-pool suggestions
@@ -1658,7 +1658,7 @@ and what they target differ.
 | Default | latest iteration of the active study | append a round to the active ask |
 | Fresh setup | \`ish iteration create …\` first, then run | \`--new\` (creates ask + round 1 in one shot) |
 | Specific target| \`--iteration <id>\` | positional ask id (\`a-6ec\`) |
-| Audience | \`--
+| Audience | \`--person\` OR filters with \`--sample\`/\`--all\` — else reuse iteration's participants | only at \`--new\`; fixed for the ask afterwards |
 | Output unit | per-participant interactions + questionnaire answers | per-participant reactions per round |

 ## Decision rule
@@ -1711,6 +1711,23 @@ removed); \`extend\` then spawns a fresh participant branched from the
 cancelled participant's last interaction. See
 \`concepts/extending-a-simulation\` for the full mental model.

+## Stuck runs are auto-failed (no manual intervention)
+
+If a worker dies mid-run (instance preemption, OOM, infra restart), the
+backend reaper transitions the participant to
+\`status: failed, error_kind: stale_worker\` within ~15 min — you don't
+need to \`cancel\` it. The status payload returned by
+\`/simulation/status/{participant_id}\` (and surfaced on \`study wait\`,
+\`study run --wait\`, \`study poll\`) includes \`age_seconds\` so agents
+can tell "just slow" from "the worker is gone." Once \`age_seconds\`
+exceeds ~900s for a non-terminal participant the wait-timeout envelope
+explicitly flags it as likely stuck — stop polling and let the reaper
+finish the row.
+
+\`error_kind: self_timeout\` is the same idea written by the worker
+itself when it self-detects passing its 25-min ceiling; \`stale_worker\`
+is the reaper's verdict when the row simply stopped reporting.
+
 ## Related

 - \`reference/json-mode\` — output modes (display vs capture vs chain).
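The new "Stuck runs" text reduces to a simple rule for agents: a non-terminal participant whose server-computed `age_seconds` exceeds roughly 900 is probably orphaned, so polling should stop. A minimal TypeScript sketch of that check follows. The `ParticipantStatus` interface, the `classifyLiveness` helper and the set of terminal status strings are assumptions for illustration, not part of the ish CLI.

```typescript
// Sketch only: field names come from the docs text; the helper is hypothetical.
interface ParticipantStatus {
  status: string;
  age_seconds?: number;
  error_kind?: string;
}

// Assumed terminal statuses (the docs mention completed/complete, failed, cancelled).
const TERMINAL_STATUSES = new Set(["completed", "complete", "failed", "cancelled"]);

// The docs say the wait envelope flags non-terminal rows older than ~900s.
const LIKELY_STUCK_AFTER_SECONDS = 900;

type Liveness = "terminal" | "running" | "likely_stuck" | "unknown";

function classifyLiveness(row: ParticipantStatus): Liveness {
  if (TERMINAL_STATUSES.has(row.status.toLowerCase())) {
    return "terminal";
  }
  // Absent age_seconds: read it as "older backend, info unavailable".
  if (row.age_seconds === undefined) {
    return "unknown";
  }
  return row.age_seconds > LIKELY_STUCK_AFTER_SECONDS ? "likely_stuck" : "running";
}

// Usage: stop polling rows that are likely stuck and let the reaper fail them.
const rows: ParticipantStatus[] = [
  { status: "running", age_seconds: 120 },
  { status: "RUNNING", age_seconds: 1200 },
  { status: "failed", age_seconds: 1500, error_kind: "stale_worker" },
  { status: "pending" },
];
console.log(rows.map(classifyLiveness).join(", "));
// running, likely_stuck, terminal, unknown
```

Returning `unknown` rather than `running` for a missing `age_seconds` follows the docs' advice to treat absent fields as coming from an older backend.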
@@ -1744,9 +1761,12 @@ mid-run?" scenario without restarting from scratch.
 When extend is **not** the right verb:

 - Source participant is still RUNNING. \`cancel\` it first, then extend.
-Extend refuses non-terminal sources server-side.
+Extend refuses non-terminal sources server-side. **Exception:** a
+stale-heartbeat RUNNING row (worker died mid-run) is reaped to
+\`failed, error_kind: stale_worker\` automatically within ~15 min — no
+manual \`cancel\` needed; just wait for the reaper, then extend.
 - You want a fresh cohort with new people flags. Use \`study run\`
-with \`--
+with \`--person\` / \`--sample\` / \`--all\` instead — extend is a
 per-participant resume, not a batch op.
 - You want to change the iteration's URL or content. Edit the iteration
 itself (\`iteration update\` or a fresh iteration) — extend always
@@ -1906,8 +1926,8 @@ time the CLI sees an entity.
 - \`s-\` study
 - \`i-\` iteration
 - \`pt-\` participant (instance of a person in an iteration)
-- \`
-- \`
+- \`p-\` person
+- \`ps-\` person source
 - \`a-\` ask
 - \`r-\` ask round
 - \`c-\` config (simulation config)
@@ -2223,7 +2243,30 @@ The CLI guarantees these contracts so agents can chain safely:
 envelope carries \`progress: {study_id, iteration_id?,
 timeout_seconds, done, total, pending, rows[]}\` so the agent
 can resume by polling rather than re-dispatching. Same shape on
-\`study wait\` (single-participant rows[] has length 1).
+\`study wait\` (single-participant rows[] has length 1). Each row
+in \`progress.rows[]\` carries \`age_seconds\` (server-computed
+liveness from \`started_at\`) plus \`error_kind\` when populated;
+when any non-terminal row's \`age_seconds\` exceeds ~900s the
+envelope's \`error\` message explicitly flags "the worker likely
+died" — don't keep polling, the backend reaper will mark it
+\`failed, error_kind=stale_worker\` within ~15 min.
+- **Participant \`error_kind\` enumeration.** Failed participants
+carry a classified \`error_kind\` so agents branch without parsing
+prose. Lifecycle/infra kinds: \`stale_worker\` (worker died mid-run,
+reaper transitioned the row), \`self_timeout\` (worker self-aborted
+past its 25-min runtime ceiling). Modality kinds:
+\`first_impression_llm_failed\`, \`interview_llm_failed\`,
+\`variant_preparation_failed\` (ask responses). CLI-side kinds:
+\`ConfirmationRequired\` (destructive op in \`--json\` mode without
+\`--yes\`), \`TunnelInactive\`, \`BotAuthError\`, \`BotShapeError\`,
+\`BotInvalidResponseError\`. The full set is open — branch on the
+ones you handle and treat the rest as "unknown failure, surface to
+user."
+- **Per-participant status payload (\`/simulation/status/{id}\`)** carries
+\`{job_id, status, create_time, completion_time?, error?, error_kind?,
+started_at?, last_heartbeat_at?, age_seconds?}\`. \`age_seconds\` is
+server-computed so clock skew between caller and backend doesn't
+matter; treat absent fields as "older backend, info unavailable."
 - **\`study run\` accepts \`--dispatch-timeout <s>\`** (default 120)
 for the per-POST participants/batch + simulation/start budget. On
 timeout (or any dispatch failure), the error envelope includes
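Because the added contract says failed participants carry a classified `error_kind` and that the set is open, an agent's branching logic can be a plain switch with a catch-all default. The sketch below is hypothetical: `nextStepFor` and its action names are invented, and mapping both lifecycle kinds to `extend` assumes that resuming the failed participant is the recovery you want.

```typescript
// Sketch only: action names are illustrative, not part of the ish CLI.
type NextStep = "extend_from_failed_participant" | "rerun_with_yes" | "surface_to_user";

function nextStepFor(errorKind: string | undefined): NextStep {
  switch (errorKind) {
    // Lifecycle/infra kinds: the row is already failed (terminal), so extend
    // no longer refuses it as a non-terminal source.
    case "stale_worker":
    case "self_timeout":
      return "extend_from_failed_participant";
    // CLI-side kind: a destructive op ran in --json mode without --yes.
    case "ConfirmationRequired":
      return "rerun_with_yes";
    // The set is open: treat anything unhandled as an unknown failure.
    default:
      return "surface_to_user";
  }
}

console.log(nextStepFor("stale_worker"));
// extend_from_failed_participant
```

Keeping the default branch as "surface to user" means new kinds added by a later backend degrade safely instead of being misclassified.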
@@ -2423,7 +2466,7 @@ not branch on \`status: 0\` — that value is never emitted as of 0.20.
 - Lists print as JSON arrays (or paginated wrappers). Single resources
 as JSON objects.
 - Field names match the underlying API resource (snake_case).
-- Aliases (\`s-…\`, \`a-…\`, \`
+- Aliases (\`s-…\`, \`a-…\`, \`p-…\`, …) appear alongside UUIDs in
 \`--verbose\` mode and replace UUIDs in default lean mode.

 ## Examples
@@ -2473,11 +2516,14 @@ reshaping output.
 \`--turn\`, \`--side\`, \`--assignment\`, \`--step\`, \`--sentiment\`,
 \`--actor\`, \`--iteration\`, \`--participant\`) and projection flags
 (\`--group-by iteration|frame|segment|turn|assignment|step\`). When any
-filter is passed
-(\`{participant_count,
-
-
-(not 4) — slicing
+filter is passed on the default \`study results\` envelope, the envelope
+gains a \`totals_unfiltered\` field (\`{participant_count,
+interaction_count}\`) so an agent can sanity-check coverage: "matched
+12 / 80 participants". A zero-match filter returns the stable envelope
+with \`participant_count: 0\` and exit code **0** (not 4) — slicing
+never errors on no-match. \`--group-by\` returns a different shape — a
+uniform envelope \`{axis, rows, totals_unfiltered, modality_warnings,
+study_id, modality}\` (see \`guides/slicing-results\`).

 \`--group-by\` is **router-gated by modality**: \`frame\` requires
 interactive, \`segment\` requires media (video / audio / text / document),
@@ -2509,7 +2555,7 @@ client-side; no extra round trip beyond the standard study fetch.
 | \`--step <ref>\` | Filters \`participant_assignments[].step_results[]\` to verdicts matching the step id or name. | interactive + external_chatbot chat (steps live there) |
 | \`--sentiment <labels>\` | Comma-separated, case-insensitive label list (repeatable). Drops null-sentiment rows. | all |
 | \`--actor <ai\|human\|user>\` | Restrict by actor. | all |
-| \`--iteration <ref>\` | Iteration UUID or label (\`A\`, \`B\`, … case-insensitive).
+| \`--iteration <ref>\` | Iteration UUID, iteration alias (\`i-…\`), or label (\`A\`, \`B\`, … case-insensitive). | all |
 | \`--participant <ref>\` | Participant UUID or \`pt-…\` alias. | all |
 | \`--include-unmatched\` | With \`--frame\`, keep degraded captures (\`frame_version_id: null\`) under a synthetic \`_unmatched\` bucket instead of dropping them. | interactive |
 | \`--include-evidence\` | With \`--step\`, also drop interactions not listed in any surviving \`step_results[].evidence_interaction_ids[]\`. | interactive + external_chatbot chat |
@@ -2520,33 +2566,52 @@ The exception is \`--group-by\` — see below.

 ## Projection flags (--group-by)

-
+Every \`--group-by\` axis returns the same envelope:
+\`{axis, rows, totals_unfiltered, modality_warnings, study_id, modality}\`.
+Top-level \`axis\` echoes the requested axis; \`study_id\` is the \`s-…\`
+alias; \`modality\` echoes the study's modality. \`rows\` is an
+axis-specific array of slice objects (see the table below for the per-row
+shape). \`modality_warnings\` carries any filter-flag mismatches
+(e.g. \`--turn\` on a non-chat study); empty array when none.
+
+| Axis | Row shape (one element of \`rows[]\`) | Modality |
 |-------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|
-| \`iteration\` | \`{
-| \`frame\` | \`
-| \`segment\` | \`
-| \`turn\` | \`
-| \`assignment\` | \`
-| \`step\` | \`
+| \`iteration\` | \`{iteration_id, iteration_label, participant_count, interaction_count, sentiment, sample_comments, top_actions}\` | all |
+| \`frame\` | \`{frame_id, frame_label, interaction_count, sentiment_histogram, sample_comments, participant_aliases}\` | interactive (router errors on non-interactive) |
+| \`segment\` | \`{segment_index, segment_label, interaction_count, sentiment_histogram, engagement_histogram, sample_comments}\` | media (router errors on non-media) |
+| \`turn\` | \`{turn_index, interaction_count, sentiment_histogram, sample_replies, failures}\` | chat (router errors on non-chat) |
+| \`assignment\` | \`{assignment_id, assignment_name, interaction_count, sentiment_histogram, step_completion}\` | all |
+| \`step\` | \`{assignment_id, assignment_name, step_id, step_name, total, passed, inconclusive, failed, rate, participant_verdicts: [{participant_alias, verdict, reason, evidence_interaction_ids}]}\` | interactive + external_chatbot chat |

 \`--group-by\` is **mutually exclusive with \`--summary\` and
 \`--transcript\`**. \`--group-by frame\` on a chat study, \`--group-by
 turn\` on a video study, etc. error at the surface (exit 2) with a
-clear message before any IO.
+clear message before any IO. The error envelope includes a \`hint\`
+field naming the axis that DOES apply to the study's modality
+(\`use --group-by segment\` on audio/video/text/document, \`use --group-by
+turn\` on chat, \`use --group-by frame\` on interactive) — agents can
+branch on it to retry productively in one hop.

 ## The empty-slice contract

 A filter combination that matches zero interactions returns the
-**
+**uniform envelope** with:

-- \`
+- \`rows: []\`
 - \`totals_unfiltered: {participant_count: <N>, interaction_count: <M>}\` populated
+- \`axis\`, \`study_id\`, \`modality\` still populated
 - exit code **0** (not 4)

 \`totals_unfiltered\` is the agent's sanity check: *"my filter matched
 0 of 80 participants — is the filter too tight, or did the run not
 produce data?"*. The shape never collapses to \`null\` or a different
-envelope; \`--get participant_count\` is always safe
+envelope; \`--get participant_count\` is always safe on the default
+(non-\`--group-by\`) envelope.
+
+The default+filter envelope (no \`--group-by\`) also carries
+\`modality_warnings: string[]\` — any filter flags that were dropped as
+off-modality (e.g. \`--turn 1\` on an interactive study) appear here.
+Agents piping stderr to \`/dev/null\` get the same signal on stdout.

 ## Worked examples

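The uniform `--group-by` envelope and the empty-slice contract map naturally onto a typed shape. This sketch uses only the keys listed in the docs; `coverageNote` is a hypothetical helper, and the sample values (alias `s-1a2`, the counts, the `"chat"` modality string) are invented.

```typescript
// Sketch only: envelope keys come from the docs; the helper is hypothetical.
interface TotalsUnfiltered {
  participant_count: number;
  interaction_count: number;
}

interface SliceResponse<Row> {
  axis: string;
  rows: Row[];
  totals_unfiltered: TotalsUnfiltered;
  modality_warnings: string[];
  study_id: string;
  modality: string;
}

// Turn an envelope into a one-line coverage note for an agent log.
function coverageNote<Row>(env: SliceResponse<Row>): string {
  const total = env.totals_unfiltered.participant_count;
  const warnings =
    env.modality_warnings.length > 0 ? ` (warnings: ${env.modality_warnings.join("; ")})` : "";
  if (env.rows.length === 0) {
    // Empty-slice contract: rows is [] but the totals are still populated.
    return `${env.study_id}: no ${env.axis} slices matched; ${total} participants before filtering${warnings}`;
  }
  return `${env.study_id}: ${env.rows.length} ${env.axis} slice(s) over ${total} participants${warnings}`;
}

// Invented sample data for illustration.
const empty: SliceResponse<{ turn_index: number }> = {
  axis: "turn",
  rows: [],
  totals_unfiltered: { participant_count: 80, interaction_count: 640 },
  modality_warnings: [],
  study_id: "s-1a2",
  modality: "chat",
};
console.log(coverageNote(empty));
// s-1a2: no turn slices matched; 80 participants before filtering
```

Because the envelope never collapses to `null`, the helper can read `totals_unfiltered` unconditionally, which is the point of the contract.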
@@ -2617,22 +2682,26 @@ No match at all errors and lists the available frame names.

 \`\`\`
 # Sanity-check coverage:
+--get axis
+--get study_id
+--get modality
 --get totals_unfiltered.participant_count
 --get totals_unfiltered.interaction_count
+--get modality_warnings

-# Per-iteration projection:
---get
---get
---get
+# Per-iteration projection rows:
+--get rows.iteration_label # one label per line
+--get rows.0.participant_count
+--get rows.0.sentiment

-# Per-frame / per-segment / per-turn (
---get 0.frame_label
---get 0.segment_index
---get 0.sentiment_histogram
+# Per-frame / per-segment / per-turn (rows[] is the axis array):
+--get rows.0.frame_label
+--get rows.0.segment_index
+--get rows.0.sentiment_histogram

 # Per-step:
---get 0.rate
---get 0.participant_verdicts.verdict
+--get rows.0.rate
+--get rows.0.participant_verdicts.verdict
 \`\`\`

 ## Related
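The worked examples now read projections through `rows`, e.g. `--get rows.0.rate`. One plausible reading of those paths, sketched below, splits on dots, treats numeric segments as array indices, and maps a non-numeric segment over an array, which would explain `rows.iteration_label` yielding one label per line. This is a hypothetical resolver for illustration, not the CLI's actual `--get` implementation, and the sample envelope (including the verdict strings) is made up.

```typescript
// Hypothetical path resolver; not the ish CLI's --get code.
function getPath(value: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>((current, segment) => step(current, segment), value);
}

function step(value: unknown, segment: string): unknown {
  if (Array.isArray(value)) {
    // Numeric segment: index into the array.
    if (/^\d+$/.test(segment)) {
      return value[Number(segment)];
    }
    // Non-numeric segment on an array: project that key from every element.
    return value.map((item) => step(item, segment));
  }
  if (value !== null && typeof value === "object") {
    return (value as Record<string, unknown>)[segment];
  }
  return undefined;
}

// Made-up step-axis envelope.
const envelope = {
  axis: "step",
  rows: [
    {
      step_name: "checkout",
      rate: 0.5,
      participant_verdicts: [
        { participant_alias: "pt-1", verdict: "passed" },
        { participant_alias: "pt-2", verdict: "failed" },
      ],
    },
  ],
};

const rate = getPath(envelope, "rows.0.rate"); // returns 0.5
const verdicts = getPath(envelope, "rows.0.participant_verdicts.verdict"); // returns ["passed", "failed"]
console.log(rate, verdicts);
```

An out-of-range index such as `rows.3.rate` simply resolves to `undefined` here rather than throwing.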
@@ -3013,6 +3082,8 @@ free credits before re-dispatch.
 estimate at preview time — the CLI prints the shape (\`N × … × 2\`)
 instead of a number.

+**Naming note:** "tier" in ish means **billing** tier (FREE / STARTER / PRO / ENTERPRISE — a credit-budget knob). It is NOT a simulation-quality dial. Per-run simulation behaviour (model, timing, retries) is controlled via \`ish config\` — see \`ish config --help\`. \`docs search tier\` returns billing results by design.
+
 ## Related

 - \`reference/billing-limits\` — per-tier *entity* caps (max
@@ -3447,13 +3518,13 @@ Optional \`--max-turns <n>\` (default 12) caps the chat per participant.

 Audience size is set at run time for **external_chatbot** chat
 studies. Use \`--sample <N>\` to pick N random simulatable profiles,
-or \`--all\` for the full pool. \`--
+or \`--all\` for the full pool. \`--person <ids>\` is also supported
 for explicit selection:
 \`\`\`
 ish study run stu-xyz --sample 5 --wait
 \`\`\`

-> **Pair-mode is different.** \`--sample\` / \`--
+> **Pair-mode is different.** \`--sample\` / \`--person\` / demographic
 > filters on \`study run\` are **refused** for participant_pair iterations
 > — pair groups live on the iteration itself. Set them at
 > iteration-create time via \`--group-a/-b\` (with 1×N broadcast)
@@ -3609,7 +3680,7 @@ Keys (all optional): \`occupation\`, \`min_age\`, \`max_age\`,
 \`requires_captions\`, \`uses_screen_reader\`, \`prefers_reduced_motion\`,
 \`prefers_high_contrast\`, \`has_any_accessibility_need\`. The five \`*_in\`
 arrays accept snake_case spec values; the five accessibility filters are
-booleans. Combine \`--
+booleans. Combine \`--group-a\` / \`--group-b\` and \`--role-criteria-*\` on the same side
 to make criteria validate an explicit list (mismatch blocks the run).

 MECE notes for the list filters:
@@ -3995,7 +4066,7 @@ cap at 40 entries.
 - \`concepts/person\` — what a person is; structured fields.
 - \`concepts/source\` — interview transcripts / audio / PDF inputs
 for the people-generation flow.
-- \`reference/aliases\` — \`
+- \`reference/aliases\` — \`p-…\` is the person alias prefix.
 `;
 const GUIDE_MCP_ADD = `# guide: wire ish into your AI clients (\`ish mcp add\`)

package/dist/lib/output.d.ts
CHANGED
@@ -35,10 +35,16 @@ export declare function outputList(rows: unknown[], json: boolean): void;
 /**
 * Error with valid options — used for content_type and similar validation.
 * Surfaces valid_options in JSON so agents can self-correct.
+*
+* Optional `hint` is the agent's *actionable next step* (e.g. for a wrong
+* --group-by axis on the current modality, the axis that DOES apply). Distinct
+* from `valid_options`, which describes where the supplied value WOULD be
+* valid. Both serialize into the error envelope when present.
 */
 export declare class ValidationError extends Error {
 valid_options: string[];
-
+hint?: string | undefined;
+constructor(message: string, valid_options: string[], hint?: string | undefined);
 }
 export declare function outputError(err: unknown, json: boolean): void;
 export declare function printTable(headers: string[], rows: string[][]): void;
@@ -110,13 +116,12 @@ export declare function formatAskResults(ask: Record<string, unknown>, json: boo
 export declare function formatConfigList(configs: Record<string, unknown>[], json: boolean): void;
 export type StudyResultsGroupByKind = "iteration" | "frame" | "segment" | "turn" | "assignment" | "step";
 /**
-* Render a `--group-by <kind>` projection
-*
-*
-*
-*
-*
-*
-* surface (T5) doesn't need to know the difference.
+* Render a `--group-by <kind>` projection wrapped in the uniform
+* `SliceResponse` envelope (`{ axis, rows, totals_unfiltered,
+* modality_warnings, study_id, modality }`). JSON mode is a thin
+* pass-through to jsonOutput with `preProjected: true` so the lean
+* transform doesn't strip our stable empties. Human mode pulls slices
+* out of `rows` and renders one section per slice plus a small ASCII
+* sentiment histogram.
 */
 export declare function formatStudyResultsGroupBy(projection: unknown, kind: StudyResultsGroupByKind, json: boolean): void;
package/dist/lib/output.js
CHANGED
@@ -278,6 +278,53 @@ function pickFields(data, fields) {
 }
 return data;
 }
+/**
+* Pattern A: when an agent passes `--fields foo,bar` and one of those names
+* doesn't exist on the response, emit a one-line stderr warning naming the
+* missing fields plus a sample of what IS available. Otherwise unknown names
+* silently drop and the agent assumes the field doesn't exist on the wire,
+* when the more common cause is a typo or the wrong projection.
+*
+* Probes the response shape: for an object response, the top-level keys;
+* for a list-wrapper response, the keys of `items[0]`; for a bare array,
+* the keys of element 0. Warns at most once per command invocation
+* (the caller invokes this from jsonOutput before pickFields).
+*/
+function warnOnUnknownFields(data, fields) {
+let probe = null;
+if (Array.isArray(data) && data.length > 0 && typeof data[0] === "object" && data[0] !== null) {
+probe = data[0];
+}
+else if (data && typeof data === "object" && !Array.isArray(data)) {
+const obj = data;
+if (isListWrapper(obj) && Array.isArray(obj.items) && obj.items.length > 0
+&& typeof obj.items[0] === "object" && obj.items[0] !== null) {
+probe = obj.items[0];
+}
+else {
+probe = obj;
+}
+}
+if (!probe)
+return;
+const missing = fields.filter((f) => !(f in probe));
+if (missing.length === 0)
+return;
+// Pattern DD: surface↔backend rename hints. The agent-friendly noun is
+// "workspace" but the backend stores `product_id`; agents who guess the
+// surface name need a did-you-mean to find the actual response key.
+const RENAME_MAP = {
+workspace_id: "product_id",
+workspace: "product",
+};
+const renameHints = missing
+.filter((m) => RENAME_MAP[m] && RENAME_MAP[m] in probe)
+.map((m) => `${m} → ${RENAME_MAP[m]}`);
+const available = Object.keys(probe).slice(0, 12).join(", ");
+const more = Object.keys(probe).length > 12 ? `, … (${Object.keys(probe).length - 12} more)` : "";
+const didYouMean = renameHints.length > 0 ? ` Did you mean: ${renameHints.join(", ")}?` : "";
+console.error(`warning: --fields requested ${missing.length === 1 ? "name" : "names"} not on the response: ${missing.join(", ")}.${didYouMean} Available: ${available}${more}.`);
+}
 /** Serialize data as JSON, applying lean transform and field selection. */
 function jsonOutput(data, options = {}) {
 let out;
@@ -297,6 +344,7 @@ function jsonOutput(data, options = {}) {
 out = leanJson(data, options.writePath);
 }
 if (_fields && _fields.length > 0) {
+warnOnUnknownFields(out, _fields);
 out = pickFields(out, _fields);
 }
 // Pattern Ω capture mode: --get <field> returns bare values instead of
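The `warnOnUnknownFields` helper added above only writes to stderr, which makes its behaviour awkward to check in isolation. A typed TypeScript sketch of the same logic that returns the message instead follows. It assumes a list wrapper is any object with an `items` array, since `isListWrapper` itself is not shown in this diff; `pickProbe` and `unknownFieldsWarning` are illustrative names.

```typescript
// Typed sketch of the --fields check; returns the warning instead of printing it.
type Json = Record<string, unknown>;

const RENAME_MAP: Record<string, string> = {
  workspace_id: "product_id",
  workspace: "product",
};

function isPlainObject(v: unknown): v is Json {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Pick the object whose keys represent the response shape.
function pickProbe(data: unknown): Json | null {
  if (Array.isArray(data)) {
    const first: unknown = data[0];
    return isPlainObject(first) ? first : null;
  }
  if (isPlainObject(data)) {
    // Assumption: list wrappers expose their rows under `items`.
    const items = data["items"];
    if (Array.isArray(items)) {
      const head: unknown = items[0];
      if (isPlainObject(head)) return head;
    }
    return data;
  }
  return null;
}

function unknownFieldsWarning(data: unknown, fields: string[]): string | null {
  const probe = pickProbe(data);
  if (!probe) return null;
  const missing = fields.filter((f) => !(f in probe));
  if (missing.length === 0) return null;
  const renameHints = missing
    .filter((m) => Object.prototype.hasOwnProperty.call(RENAME_MAP, m) && RENAME_MAP[m] in probe)
    .map((m) => `${m} → ${RENAME_MAP[m]}`);
  const keys = Object.keys(probe);
  const available = keys.slice(0, 12).join(", ");
  const more = keys.length > 12 ? `, … (${keys.length - 12} more)` : "";
  const didYouMean = renameHints.length > 0 ? ` Did you mean: ${renameHints.join(", ")}?` : "";
  const noun = missing.length === 1 ? "name" : "names";
  return `warning: --fields requested ${noun} not on the response: ${missing.join(", ")}.${didYouMean} Available: ${available}${more}.`;
}

console.log(unknownFieldsWarning({ id: "s-1", name: "Pricing study", product_id: "w-9" }, ["name", "workspace_id"]));
// warning: --fields requested name not on the response: workspace_id. Did you mean: workspace_id → product_id? Available: id, name, product_id.
```

The own-property check on `RENAME_MAP` avoids treating inherited names such as `toString` as rename candidates.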
@@ -396,12 +444,19 @@ export function outputList(rows, json) {
 /**
 * Error with valid options — used for content_type and similar validation.
 * Surfaces valid_options in JSON so agents can self-correct.
+*
+* Optional `hint` is the agent's *actionable next step* (e.g. for a wrong
+* --group-by axis on the current modality, the axis that DOES apply). Distinct
+* from `valid_options`, which describes where the supplied value WOULD be
+* valid. Both serialize into the error envelope when present.
 */
 export class ValidationError extends Error {
 valid_options;
-
+hint;
+constructor(message, valid_options, hint) {
 super(message);
 this.valid_options = valid_options;
+this.hint = hint;
 this.name = "ValidationError";
 }
 }
@@ -434,6 +489,11 @@ function suggestionsForError(err) {
 return [
 "Run a list command to see available resources",
 "Check that the alias or ID is correct",
+// Pattern R: an active workspace / study / ask saved in config can
+// outlive the resource on the server. Implicit lookups then 404
+// with no indication that the ID came from config. `ish status`
+// flags orphans; `<entity> use --clear` resets the active value.
+"If you didn't pass the resource explicitly, your saved active workspace/study/ask may be stale — run `ish status` to check, then `ish workspace use --clear` (or `ish study use --clear` / `ish ask use --clear`) to reset.",
 ];
 case "insufficient_credits":
 return ["Purchase more credits at https://app.ishlabs.io"];
@@ -593,11 +653,14 @@ export function outputError(err, json) {
 error_code: "validation_error",
 retryable: false,
 valid_options: err.valid_options,
+...(err.hint && { hint: err.hint }),
 ...(suggestions.length > 0 && { suggestions }),
 }));
 }
 else {
 console.error(`Error: ${err.message}`);
+if (err.hint)
+console.error(` hint: ${err.hint}`);
 for (const s of suggestions)
 console.error(` → ${s}`);
 }
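Together, the `ValidationError` changes and the `outputError` hunk above put an optional `hint` next to `valid_options` in the JSON error envelope. The TypeScript sketch below mirrors that flow for the `--group-by` case described in the docs. The `axisFor` helper, the modality strings, the example message and the `message` key are assumptions; only `error_code`, `retryable`, `valid_options` and `hint` appear in the diff.

```typescript
// Sketch mirroring the ValidationError shape in the diff; not the CLI's code.
class ValidationError extends Error {
  valid_options: string[];
  hint: string | undefined;
  constructor(message: string, valid_options: string[], hint?: string) {
    super(message);
    this.valid_options = valid_options;
    this.hint = hint;
    this.name = "ValidationError";
  }
}

// Axis that applies to each modality, per the --group-by docs (strings assumed).
function axisFor(modality: string): string | undefined {
  switch (modality) {
    case "audio":
    case "video":
    case "text":
    case "document":
      return "segment";
    case "chat":
      return "turn";
    case "interactive":
      return "frame";
    default:
      return undefined;
  }
}

function toEnvelope(err: ValidationError): Record<string, unknown> {
  return {
    message: err.message, // assumption: the diff does not show this key
    error_code: "validation_error",
    retryable: false,
    valid_options: err.valid_options,
    // Serialize hint only when present, like outputError does.
    ...(err.hint ? { hint: err.hint } : {}),
  };
}

const suggested = axisFor("video");
const err = new ValidationError(
  "--group-by turn does not apply to a video study",
  ["chat"],
  suggested ? `use --group-by ${suggested}` : undefined,
);
console.log(JSON.stringify(toEnvelope(err)));
```

The ternary spread is used instead of `...(err.hint && { hint })` so the TypeScript compiler never has to spread a possibly empty string.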
@@ -635,6 +698,9 @@ export function outputError(err, json) {
 ? tagged.suggestions.filter((s) => typeof s === "string")
 : [];
 const mergedSuggestions = [...new Set([...suggestions, ...taggedSuggestions])];
+const availableValues = Array.isArray(tagged.available_values)
+? tagged.available_values.filter((s) => typeof s === "string")
+: undefined;
 if (json) {
 console.error(JSON.stringify({
 // Generic Error: CLI-thrown (we control the message), so we don't
@@ -647,6 +713,7 @@ export function outputError(err, json) {
 ...(errorKind && { error_kind: errorKind }),
 ...(example && { example }),
 ...(progress !== undefined && { progress }),
+...(availableValues && availableValues.length > 0 && { available_values: availableValues }),
 ...(seededIds && { seeded_but_not_dispatched_ids: seededIds }),
 ...(seededAliases && { seeded_but_not_dispatched_aliases: seededAliases }),
 ...(mergedSuggestions.length > 0 && { suggestions: mergedSuggestions }),
@@ -998,6 +1065,14 @@ export function buildStudyResultsEnvelope(study, participants) {
 ? deterministicAlias(ALIAS_PREFIX.study, String(study.id))
 : null;
 const completedCount = allParticipants.filter((t) => t.status === "completed" || t.status === "complete").length;
+// Pattern N: per-status breakdown so callers can distinguish running /
+// pending / cancelled from terminal completed/failed. Additive — the
+// aggregate counts (`completed_count` / `failed_count`) stay alongside.
+const participantStatusCounts = {};
+for (const t of allParticipants) {
+const key = (t.status || "unknown").toLowerCase();
+participantStatusCounts[key] = (participantStatusCounts[key] || 0) + 1;
+}
 // Aggregate sentiment across all interactions on all participants.
 const sentimentCounts = {};
 let sentimentTotal = 0;
@@ -1066,6 +1141,7 @@ export function buildStudyResultsEnvelope(study, participants) {
         participant_count: allParticipants.length,
         completed_count: completedCount,
         failed_count: failedCount,
+        participant_status_counts: participantStatusCounts,
         sentiment,
         interview_answers: interviewAnswers,
         participants: participantRows,
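The per-status tally added across these two hunks can be mirrored on the consumer side. A minimal TypeScript sketch, assuming the `--json` results envelope carries `participant_status_counts` as built here (lower-cased status keys, `"unknown"` when a status is missing). `countByStatus`, `inFlightCount`, and the `StatusCountsEnvelope` type are hypothetical names, not exports of `@ishlabs/cli`, and counting only `running` and `pending` as in flight is an assumption drawn from the code comment.

```typescript
// Hypothetical consumer-side view of the results envelope; only the
// counters this sketch reads are typed.
interface StatusCountsEnvelope {
  completed_count: number;
  failed_count: number;
  participant_status_counts: Record<string, number>;
}

interface ParticipantLike {
  status?: string | null;
}

// Mirrors the loop added to buildStudyResultsEnvelope: lower-case each
// status, fall back to "unknown" when it is missing, and count occurrences.
function countByStatus(participants: ParticipantLike[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const p of participants) {
    const key = (p.status || "unknown").toLowerCase();
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

// Assumption: "running" and "pending" are the in-flight keys worth waiting on.
function inFlightCount(env: StatusCountsEnvelope): number {
  const c = env.participant_status_counts;
  return (c["running"] ?? 0) + (c["pending"] ?? 0);
}

// Usage with made-up participants.
const statusCounts = countByStatus([
  { status: "Running" },
  { status: "completed" },
  { status: "pending" },
  { status: null },
]);
const sampleEnvelope: StatusCountsEnvelope = {
  completed_count: statusCounts["completed"] ?? 0,
  failed_count: statusCounts["failed"] ?? 0,
  participant_status_counts: statusCounts,
};
console.log(statusCounts, inFlightCount(sampleEnvelope));
```

Keeping `completed_count` / `failed_count` alongside the map, as the code comment notes, lets existing consumers keep working while newer ones branch on the finer-grained keys.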
@@ -2253,16 +2329,13 @@ function asciiHistogram(hist, options = {}) {
     });
 }
 function slicesFromProjection(projection) {
-    //
-    //
-
-
-
-
-
-        const slices = wrapped.slices;
-        if (Array.isArray(slices)) {
-            return slices.filter((s) => Boolean(s) && typeof s === "object" && !Array.isArray(s));
+    // Surface wraps every --group-by axis in the uniform SliceResponse envelope
+    // `{ axis, rows, totals_unfiltered, modality_warnings, study_id, modality }`;
+    // slices live under `rows`.
+    if (projection && typeof projection === "object" && !Array.isArray(projection)) {
+        const rows = projection.rows;
+        if (Array.isArray(rows)) {
+            return rows.filter((s) => Boolean(s) && typeof s === "object" && !Array.isArray(s));
         }
     }
     return [];
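With every `--group-by` axis sharing one envelope, slice extraction no longer needs per-axis unwrapping. A TypeScript sketch of the same defensive logic as `slicesFromProjection`, assuming the envelope field names from the code comment in this hunk; the `SliceResponse` interface, `extractSlices`, and the sample row contents are hypothetical, and every field except `rows` is typed `unknown` because its type is not shown in this diff.

```typescript
// Hypothetical typing of the uniform SliceResponse envelope. Only `rows` is
// relied on; the other fields stay `unknown` because their types are not shown.
interface SliceResponse {
  axis: unknown;
  rows: unknown;
  totals_unfiltered: unknown;
  modality_warnings: unknown;
  study_id: unknown;
  modality: unknown;
}

type Slice = Record<string, unknown>;

// Same test slicesFromProjection applies: truthy, an object, not an array.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return Boolean(value) && typeof value === "object" && !Array.isArray(value);
}

// Accept only an object-shaped envelope, read `rows`, and keep object rows.
function extractSlices(projection: unknown): Slice[] {
  if (!isPlainObject(projection)) return [];
  const rows = projection["rows"];
  if (!Array.isArray(rows)) return [];
  return rows.filter(isPlainObject);
}

// Usage with a made-up envelope; the row contents are illustrative only.
const sampleProjection: SliceResponse = {
  axis: "step",
  rows: [{ step: "verify-email" }, null, [1, 2], "not-a-slice"],
  totals_unfiltered: { participant_count: 12 },
  modality_warnings: [],
  study_id: "s-b2c",
  modality: null,
};
console.log(extractSlices(sampleProjection).length); // prints 1
```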
@@ -2393,14 +2466,13 @@ function renderStepSlice(slice) {
     }
 }
 /**
- * Render a `--group-by <kind>` projection
- *
- *
- *
- *
- *
- *
- * surface (T5) doesn't need to know the difference.
+ * Render a `--group-by <kind>` projection wrapped in the uniform
+ * `SliceResponse` envelope (`{ axis, rows, totals_unfiltered,
+ * modality_warnings, study_id, modality }`). JSON mode is a thin
+ * pass-through to jsonOutput with `preProjected: true` so the lean
+ * transform doesn't strip our stable empties. Human mode pulls slices
+ * out of `rows` and renders one section per slice plus a small ASCII
+ * sentiment histogram.
  */
 export function formatStudyResultsGroupBy(projection, kind, json) {
     if (json) {
@@ -218,6 +218,7 @@ When in doubt: side-by-side comparison usually beats in-place edits. Ids are che
 - **Chatbot endpoint response-shape mismatch**: \`chat_endpoint_test\` succeeds shallowly if the bot responds at all, but a wrong response path (e.g. bot returns \`{ data: { reply } }\` instead of \`{ reply }\`) produces empty transcripts on the actual run. Inspect one full test response before dispatching participants.
 - **Chatbot auth drift**: tokens/sessions baked into \`--from-curl\` expire. If transcripts come back as identical short error strings, re-run \`chat_endpoint_test\` and refresh the curl spec.
 - **401 surfaces as fake blocker**: an unauthenticated endpoint produces "participant got stuck on auth screen" — looks like a UX blocker but is config. Always confirm endpoint auth before reading transcripts as user-research data.
+- **Don't poll a stuck run forever**: a participant whose worker died will sit in \`status: running\` until the backend reaper transitions it to \`failed, error_kind: stale_worker\` (~15 min). The per-participant status payload exposes \`age_seconds\` (server-computed from \`started_at\`); once it's above ~900s on a non-terminal row, the run is almost certainly stuck. The CLI's \`wait_timeout\` envelope explicitly flags this case in its \`error\` message — when you see "the worker likely died," stop polling and surface the failure rather than retrying. \`error_kind: self_timeout\` is the same idea but written by the worker itself when it self-aborts past its 25-min ceiling.
 - **No per-page/per-timestamp scoping for media**: there's no "evaluate just slide 14" or "react to seconds 0-30" API. State the focus explicitly in the \`assignment\` text, or pre-stitch the artifact (e.g. replace one slide locally, upload as a new iteration).
 - **\`study get --json\` participants live at the top level**, not nested under \`iterations[*].participants\`. The backend split made \`/studies/{id}\` lite (metadata + iteration shells, no participant graph) and added \`/studies/{id}/participants\`; the CLI joins them so \`study get --json\` carries a flat \`participants[]\` with \`iteration_id\` on each row. Read \`.participants[]\`, not \`.iterations[].participants[]\`.
 - **All destructive deletes require \`--yes\` in non-TTY mode**: \`ish workspace delete\`, \`study delete\`, \`ask delete\`, \`person delete\`, \`source delete\`, \`chat endpoint delete\`. In \`--json\` mode (or any piped/non-TTY invocation), omitting \`--yes\` refuses with \`error_kind: "ConfirmationRequired"\` + an \`example\` field showing the same command with \`--yes\` appended. \`workspace delete\` is the highest-blast-radius: it removes ALL nested studies, asks, people, secrets, configs, sources, and chat endpoints — the prompt names them explicitly.
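The polling guidance in the new bullet reduces to a small check per participant row. A minimal TypeScript sketch, assuming rows carry the `status`, `error_kind`, `started_at`, and `age_seconds` fields that `StudyParticipant` declares; the 900-second threshold and the `stale_worker` / `self_timeout` kinds come from the bullet, while treating `completed`, `complete`, and `failed` as terminal, the local-clock fallback, and the `classifyParticipant` name are assumptions for illustration.

```typescript
type RunState = "terminal" | "worker_died" | "likely_stuck" | "in_progress";

// Subset of the StudyParticipant fields this check reads.
interface ParticipantStatusRow {
  status?: string | null;
  error_kind?: string | null;
  started_at?: string | null;
  age_seconds?: number | null;
}

// From the bullet: the reaper steps in after roughly 15 minutes.
const STUCK_AFTER_SECONDS = 900;

// Assumption: these statuses are terminal.
const TERMINAL_STATUSES = new Set(["completed", "complete", "failed"]);

// Prefer the server-computed age; fall back (assumption) to started_at.
function ageSeconds(row: ParticipantStatusRow, nowMs: number): number | null {
  if (typeof row.age_seconds === "number") return row.age_seconds;
  if (row.started_at) {
    const startedMs = Date.parse(row.started_at);
    if (!Number.isNaN(startedMs)) return (nowMs - startedMs) / 1000;
  }
  return null;
}

function classifyParticipant(row: ParticipantStatusRow, nowMs: number = Date.now()): RunState {
  const status = (row.status || "unknown").toLowerCase();
  // Both kinds named in the bullet mean the worker is gone: stop polling.
  if (status === "failed" && (row.error_kind === "stale_worker" || row.error_kind === "self_timeout")) {
    return "worker_died";
  }
  if (TERMINAL_STATUSES.has(status)) return "terminal";
  const age = ageSeconds(row, nowMs);
  if (age !== null && age > STUCK_AFTER_SECONDS) return "likely_stuck";
  return "in_progress";
}

// Usage with a made-up row: started 20 minutes before "now".
const now = Date.parse("2024-01-01T00:20:00Z");
console.log(classifyParticipant({ status: "running", started_at: "2024-01-01T00:00:00Z" }, now)); // prints "likely_stuck"
```

Preferring the server-computed `age_seconds` avoids client clock skew; the `started_at` fallback is only a guess for payloads that omit it.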
@@ -954,6 +955,12 @@ ish study results s-b2c --frame doesnotexist --json
 # degraded captures (frame_version_id: null) back.
 \`\`\`

+Every \`--group-by <axis>\` call returns the same envelope:
+\`{axis, rows, totals_unfiltered, modality_warnings, study_id, modality}\`.
+The \`rows\` array holds axis-specific slice objects. The envelope is
+uniform across all six axes — agents can code one shape and key on
+\`axis\` / \`modality\` to dispatch on what's inside \`rows\`.
+
 Rules to remember:
 - **Filters compose with AND across flags; OR within \`--sentiment\`.**
   \`--frame login --sentiment Frustrated,Confused\` keeps only login-frame
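The composition rule (AND across flags, OR within `--sentiment`) can be illustrated with a client-side predicate. This sketch models only the semantics described here, not how the CLI or backend applies filters: the `Interaction` and `ResultFilters` shapes, their field names, and the sample data are hypothetical, and an unset flag is assumed to impose no constraint.

```typescript
// Hypothetical interaction shape; real payload field names may differ.
interface Interaction {
  frame: string | null;
  sentiment: string | null;
}

// Hypothetical filter options standing in for --frame and --sentiment.
interface ResultFilters {
  frame?: string;
  sentiments?: string[]; // values of one comma-separated --sentiment flag
}

// AND across flags: each flag that is set must match.
// OR within --sentiment: matching any one listed value is enough.
function matches(i: Interaction, f: ResultFilters): boolean {
  if (f.frame !== undefined && i.frame !== f.frame) return false;
  if (f.sentiments !== undefined && f.sentiments.length > 0) {
    if (i.sentiment === null || !f.sentiments.includes(i.sentiment)) return false;
  }
  return true;
}

// Usage: the equivalent of `--frame login --sentiment Frustrated,Confused`
// over made-up interactions.
const filters: ResultFilters = { frame: "login", sentiments: "Frustrated,Confused".split(",") };
const interactions: Interaction[] = [
  { frame: "login", sentiment: "Frustrated" },
  { frame: "login", sentiment: "Satisfied" },
  { frame: "checkout", sentiment: "Confused" },
  { frame: "login", sentiment: "Confused" },
];
console.log(interactions.filter((i) => matches(i, filters)).length); // prints 2
```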
@@ -974,7 +981,8 @@ Rules to remember:
   the filtered set. \`--transcript\` is single-participant and errors
   (exit 2) when **any** filter or \`--group-by\` is set.
 - Per-step output exposes \`participant_verdicts: [{participant_alias,
-  verdict, reason, evidence_interaction_ids}]\`
+  verdict, reason, evidence_interaction_ids}]\` on **each row of
+  \`rows[]\`** (one per \`(assignment, step)\` pair) — not
   \`per_participant_verdicts\`. The verdict enum is \`passed\` /
   \`inconclusive\` / \`failed\`.

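Because verdicts now sit on each element of `rows[]`, collecting failures from a `--group-by step --json` response means flattening per-row arrays. A TypeScript sketch, assuming the `participant_verdicts` field names and verdict enum from the bullet; the `step` key on each row, the field types, the sample aliases, and the `failedVerdicts` helper are hypothetical.

```typescript
type VerdictValue = "passed" | "inconclusive" | "failed";

// Field names from the docs; the field types are assumptions.
interface ParticipantVerdict {
  participant_alias: string;
  verdict: VerdictValue;
  reason?: string | null;
  evidence_interaction_ids?: unknown[];
}

// Hypothetical row shape: `step` stands in for however a row identifies
// its (assignment, step) pair.
interface StepRow {
  step?: string;
  participant_verdicts?: ParticipantVerdict[];
}

interface StepSliceEnvelope {
  axis: string;
  rows: StepRow[];
}

// Flatten per-row verdicts and keep the row's step next to each failure.
function failedVerdicts(env: StepSliceEnvelope): Array<ParticipantVerdict & { step: string | undefined }> {
  return env.rows.flatMap((row) =>
    (row.participant_verdicts ?? [])
      .filter((v) => v.verdict === "failed")
      .map((v) => ({ ...v, step: row.step })),
  );
}

// Usage with made-up aliases and reasons.
const stepEnvelope: StepSliceEnvelope = {
  axis: "step",
  rows: [
    {
      step: "verify-email",
      participant_verdicts: [
        { participant_alias: "p-1", verdict: "failed", reason: "never found the email" },
        { participant_alias: "p-2", verdict: "passed" },
      ],
    },
    { step: "signup" },
  ],
};
console.log(failedVerdicts(stepEnvelope).map((v) => `${v.step}: ${v.participant_alias}`));
```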
@@ -1078,6 +1086,7 @@ table, projection shapes, and the defensive null-handling rules.
 | Per-step pass/fail with reasons inline | \`study participant --json\` per participant + jq | \`ish study results <id> --step verify-email --group-by step --json\` |
 | Frustrated reactions to one media segment | \`study results --json\` + jq | \`ish study results <id> --segment 3 --sentiment Frustrated --json\` |
 | Sanity-check filter coverage | hand-count \`.participants\` vs total | \`--get totals_unfiltered.participant_count\` (set on every sliced envelope) |
+| Know the sliced-results envelope shape | guess per axis | \`{axis, rows[], totals_unfiltered, modality_warnings, study_id, modality}\` — every \`--group-by\` axis |
 | Chat transcript for one participant (external_chatbot) | \`study participant --json\` + jq | \`ish study results <id> --transcript <participant_id> --json\` |
 | Pair-mode conversation transcripts | \`study participant --json\` per participant | \`ish iteration get <iter-id> --json \\| jq '.conversations[]'\` |
 | Participant headline only (no action timeline) | \`study participant --json\` + jq | \`ish study participant <id> --summary --json\` |
@@ -38,6 +38,9 @@ export interface StudyParticipant extends Participant {
     conversation_id?: string | null;
     error_message?: string | null;
     error_kind?: string | null;
+    started_at?: string | null;
+    last_heartbeat_at?: string | null;
+    age_seconds?: number | null;
     [k: string]: unknown;
 }
 export declare function fetchStudyParticipants(client: ApiClient, studyId: string, opts?: {