@ishlabs/cli 0.13.0 → 0.14.1
- package/dist/commands/iteration.js +219 -22
- package/dist/commands/profile.js +75 -9
- package/dist/commands/source.js +6 -4
- package/dist/commands/study-run.js +382 -34
- package/dist/commands/study.js +170 -9
- package/dist/commands/workspace.js +35 -2
- package/dist/lib/accessibility-profile.d.ts +12 -0
- package/dist/lib/accessibility-profile.js +136 -0
- package/dist/lib/ask-questions.js +9 -0
- package/dist/lib/billing.d.ts +55 -0
- package/dist/lib/billing.js +77 -0
- package/dist/lib/docs.js +1106 -36
- package/dist/lib/enums.d.ts +54 -0
- package/dist/lib/enums.js +100 -0
- package/dist/lib/local-sim/actions.d.ts +2 -1
- package/dist/lib/local-sim/actions.js +88 -13
- package/dist/lib/local-sim/loop.js +49 -19
- package/dist/lib/local-sim/tabs.d.ts +27 -0
- package/dist/lib/local-sim/tabs.js +157 -0
- package/dist/lib/local-sim/types.d.ts +15 -0
- package/dist/lib/modality.d.ts +70 -1
- package/dist/lib/modality.js +323 -17
- package/dist/lib/output.js +61 -4
- package/dist/lib/skill-content.js +382 -19
- package/dist/lib/types.d.ts +6 -1
- package/package.json +1 -1
package/dist/lib/docs.js
CHANGED
|
@@ -123,8 +123,43 @@ A \`null\` value on a \`*_max\` field means "unlimited" (paid tiers).
|
|
|
123
123
|
Branch on \`studies_used >= studies_max\` before \`study create\`,
|
|
124
124
|
likewise for \`testers_used\` before \`study run --sample\`.
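The branch above can be sketched as a small helper. This is an illustration only, not part of the CLI: the function name `hasRoom` and the sample `limits` object are hypothetical, and only the `*_used` / `*_max` field names and the `null`-means-unlimited rule come from the text.

```javascript
// Hypothetical helper: true when one more unit can be consumed.
// A null max is treated as "unlimited", per the rule stated above.
function hasRoom(used, max) {
  if (max === null) return true;
  return used < max;
}

// Assumed shape of a limits payload; only the field names are from the docs.
const limits = { studies_used: 3, studies_max: 3, testers_used: 10, testers_max: null };

const canCreateStudy = hasRoom(limits.studies_used, limits.studies_max);
const canSampleTesters = hasRoom(limits.testers_used, limits.testers_max);
console.log({ canCreateStudy, canSampleTesters });
```

With the sample values, `study create` would be skipped (the cap is reached) while `study run --sample` could proceed.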
|
|
125
125
|
|
|
126
|
+
## Cold start — \`workspace_create\` is not safe to call blind
|
|
127
|
+
|
|
128
|
+
On a saturated account, calling \`workspace_create\` (or
|
|
129
|
+
\`ish workspace create\`) without first inspecting state returns
|
|
130
|
+
\`error_code: usage_limit_reached\` immediately. A first-time agent
|
|
131
|
+
that was told "create a fresh workspace, then run a study" will trip
|
|
132
|
+
the cap on the very first call. **Always inspect existing workspaces
|
|
133
|
+
first** — \`ish workspace list\` / \`workspace_get\` returns per-row
|
|
134
|
+
metadata so you can pick a reuse target rather than blindly creating.
|
|
135
|
+
|
|
136
|
+
Each row in the list response carries:
|
|
137
|
+
|
|
138
|
+
- \`last_activity_at\` — most recent run, iteration, ask, or write on
|
|
139
|
+
this workspace. Pick the most recently active workspace if you want
|
|
140
|
+
one the user is likely already thinking about.
|
|
141
|
+
- \`child_counts\` — \`{ studies, asks, tester_profiles }\`. Zero across
|
|
142
|
+
the board = a quiet/empty workspace, safe to reuse without
|
|
143
|
+
cluttering anyone's view.
|
|
144
|
+
- \`has_headroom\` — \`true\` if the workspace is below
|
|
145
|
+
\`maxStudiesPerProduct\`, \`maxIterationsPerStudy\`, and
|
|
146
|
+
\`maxCustomTesterProfiles\` for the caller's tier. Branch on this
|
|
147
|
+
before \`study create\` / \`profile generate\`; \`false\` here means the
|
|
148
|
+
next call fails with \`usage_limit_reached\`.
|
|
149
|
+
|
|
150
|
+
For the idempotent create-or-reuse-by-name path, use
|
|
151
|
+
\`ish workspace create --name <name> --ensure\`: returns the existing
|
|
152
|
+
workspace owned by the caller if one with that name exists, otherwise
|
|
153
|
+
creates a fresh one. Safe to call from a cold-start script without
|
|
154
|
+
first scraping the list.
|
|
155
|
+
|
|
156
|
+
The full saturated-account walkthrough (with branch logic + a worked
|
|
157
|
+
transcript) lives at \`guides/cold-start\`.
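A minimal sketch of the reuse decision, assuming the list rows have already been fetched and parsed. The `id` field, the ISO-8601 timestamp format, and the function name `pickReuseTarget` are assumptions made for illustration; `last_activity_at`, `child_counts`, and `has_headroom` are the fields described above.

```javascript
// Sketch: choose a workspace to reuse from list rows.
function pickReuseTarget(rows) {
  // Only rows with headroom can accept a new study or profile.
  const candidates = rows.filter((row) => row.has_headroom === true);
  if (candidates.length === 0) return null; // nothing reusable: stop rather than create blind
  // Most recently active first (timestamps assumed to be ISO-8601 strings).
  candidates.sort((a, b) => Date.parse(b.last_activity_at) - Date.parse(a.last_activity_at));
  return candidates[0];
}

const rows = [
  { id: "w-1", last_activity_at: "2026-01-10T09:00:00Z", child_counts: { studies: 4, asks: 1, tester_profiles: 9 }, has_headroom: false },
  { id: "w-2", last_activity_at: "2026-02-01T12:00:00Z", child_counts: { studies: 1, asks: 0, tester_profiles: 2 }, has_headroom: true },
  { id: "w-3", last_activity_at: "2025-12-24T08:00:00Z", child_counts: { studies: 0, asks: 0, tester_profiles: 0 }, has_headroom: true },
];

const target = pickReuseTarget(rows);
console.log(target ? target.id : "no workspace with headroom");
```

Sorting by recency is one possible policy. An agent that prefers an empty workspace could instead pick the first candidate whose `child_counts` are all zero.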
|
|
158
|
+
|
|
126
159
|
## Related
|
|
127
160
|
|
|
161
|
+
- \`guides/cold-start\` — saturated-account first-step playbook
|
|
162
|
+
(\`workspace_get\` → inspect headroom → reuse or \`--ensure\`).
|
|
128
163
|
- \`concepts/secret\` — per-workspace secrets used in chatbot endpoint
|
|
129
164
|
headers via \`{{secret:KEY}}\` placeholders.
|
|
130
165
|
- \`reference/billing-limits\` — \`maxProducts\` cap on workspace creation.
|
|
@@ -134,7 +169,8 @@ const CONCEPT_STUDY = `# concept: study
|
|
|
134
169
|
A **study** is the persistent research artifact. It defines:
|
|
135
170
|
- \`modality\`: \`interactive\` (the tester drives a real browser), one of
|
|
136
171
|
\`text | video | audio | image | document\` (media reaction studies),
|
|
137
|
-
or \`chat\` (multi-turn
|
|
172
|
+
or \`chat\` (multi-turn conversation — either with an external chatbot
|
|
173
|
+
endpoint or between two AI personas via tester_pair mode).
|
|
138
174
|
- \`content_type\` (media studies only): \`email | social_post | ad | …\` —
|
|
139
175
|
controls the framing the tester is given.
|
|
140
176
|
- \`assignments\`: the tasks the tester performs. See \`concepts/assignment\`.
|
|
@@ -168,7 +204,7 @@ test artifact and don't need to A/B iterations:
|
|
|
168
204
|
| \`video\` | \`--content-url <url>\` |
|
|
169
205
|
| \`audio\` | \`--content-url <url>\` |
|
|
170
206
|
| \`document\` | \`--content-url <url>\` |
|
|
171
|
-
| \`chat\` | \`--endpoint <id>\` or \`--endpoint-config <file>\`
|
|
207
|
+
| \`chat\` | \`--endpoint <id>\` or \`--endpoint-config <file>\` (external_chatbot mode), or \`--chat-mode tester_pair --audience-a/-b --scenario-a/-b\` (two-AI rehearsal) |
|
|
172
208
|
|
|
173
209
|
\`\`\`
|
|
174
210
|
# Text — single email artifact:
|
|
@@ -261,14 +297,22 @@ pick was wrong.
|
|
|
261
297
|
- \`concepts/questionnaire\` — question types and timing.
|
|
262
298
|
- \`concepts/run-verbs\` — when to use \`study run\` vs \`ask run\`.
|
|
263
299
|
- \`reference/billing-limits\` — \`maxStudiesPerProduct\` cap on study creation.
|
|
300
|
+
- \`reference/credits\` — per-run credit cost & how to preview before dispatch.
|
|
264
301
|
`;
|
|
265
302
|
const CONCEPT_ITERATION = `# concept: iteration
|
|
266
303
|
|
|
267
304
|
An **iteration** is one configured run of a study. It carries the
|
|
268
305
|
volatile bits — the URL (interactive), the media (video/text/etc.), or
|
|
269
|
-
the
|
|
306
|
+
the chat payload (chat) — while the study carries the persistent
|
|
270
307
|
shape (assignments, questionnaire, modality).
|
|
271
308
|
|
|
309
|
+
For chat modality, the iteration's \`details.mode_details\` discriminator
|
|
310
|
+
selects between **external_chatbot** (testers probe a customer chatbot
|
|
311
|
+
endpoint) and **tester_pair** (two AI tester audiences converse with
|
|
312
|
+
each other, one Conversation per pair index). Wire-shape examples and
|
|
313
|
+
pair-mode rules live under the "## Chat modality" section below; the
|
|
314
|
+
full chat-author workflow is at \`guides/chat\`.
|
|
315
|
+
|
|
272
316
|
- Alias prefix: \`i-\`
|
|
273
317
|
- A study has 1..N iterations. \`ish study run\` defaults to the latest.
|
|
274
318
|
- Local files passed to \`--content-url\`, \`--image-urls\`, etc. are
|
|
@@ -311,9 +355,15 @@ ish iteration create --image-urls "./a.png,./b.png"
|
|
|
311
355
|
# Document (PDF):
|
|
312
356
|
ish iteration create --content-url ./report.pdf
|
|
313
357
|
|
|
314
|
-
# Chat — probe a saved chatbot endpoint:
|
|
358
|
+
# Chat (external_chatbot) — probe a saved chatbot endpoint:
|
|
315
359
|
ish iteration create --chat-endpoint-id ce-... --max-turns 10 --early-termination
|
|
316
360
|
|
|
361
|
+
# Chat (tester_pair) — rehearse a conversation between two AI audiences:
|
|
362
|
+
ish iteration create --chat-mode tester_pair \\
|
|
363
|
+
--audience-a tp-a1,tp-a2 --audience-b tp-b1,tp-b2 \\
|
|
364
|
+
--scenario-a "You're a senior sales rep pitching ish." \\
|
|
365
|
+
--scenario-b "You're a skeptical CTO evaluating ish."
|
|
366
|
+
|
|
317
367
|
# Inspect:
|
|
318
368
|
ish iteration list --study s-b2c
|
|
319
369
|
ish iteration get i-d4e
|
|
@@ -401,22 +451,279 @@ paragraph-by-paragraph reactions to a long caption. Use the
|
|
|
401
451
|
|
|
402
452
|
## Chat modality
|
|
403
453
|
|
|
404
|
-
Chat iterations
|
|
405
|
-
|
|
406
|
-
|
|
454
|
+
Chat iterations hold a multi-turn conversation. The conversation can
|
|
455
|
+
take one of two shapes, picked by the \`mode_details.mode\` discriminator
|
|
456
|
+
on the iteration:
|
|
457
|
+
|
|
458
|
+
- **\`external_chatbot\`** — a tester talks to a customer chatbot
|
|
459
|
+
endpoint (the original chat behaviour). The endpoint config or saved
|
|
460
|
+
chatbot-endpoint reference lives at
|
|
461
|
+
\`details.mode_details.endpoint\` / \`details.mode_details.chatbot_endpoint_id\`.
|
|
462
|
+
- **\`tester_pair\`** — two AI tester profiles talk to each other.
|
|
463
|
+
audience_a and audience_b pair 1:1 by index when counts match (N
|
|
464
|
+
pairs → N conversations); a side of exactly 1 broadcasts across the
|
|
465
|
+
other side (so 1 × N → N conversations all sharing the lone profile).
|
|
466
|
+
Each side carries its own scenario + goal; the other side does not
|
|
467
|
+
see it (the **asymmetry contract**). Useful for rehearsing a pitch, a
|
|
468
|
+
difficult conversation, a sales call, or any two-role scenario before
|
|
469
|
+
it happens.
|
|
470
|
+
|
|
471
|
+
Wire-shape:
|
|
407
472
|
|
|
408
|
-
\`\`\`
|
|
409
|
-
|
|
410
|
-
|
|
473
|
+
\`\`\`json
|
|
474
|
+
// external_chatbot
|
|
475
|
+
{
|
|
476
|
+
"type": "chat",
|
|
477
|
+
"mode_details": {
|
|
478
|
+
"mode": "external_chatbot",
|
|
479
|
+
"endpoint": { "url": "https://...", "headers": {} },
|
|
480
|
+
"chatbot_endpoint_id": "ep-uuid"
|
|
481
|
+
},
|
|
482
|
+
"max_turns": 14,
|
|
483
|
+
"early_termination": true
|
|
484
|
+
}
|
|
411
485
|
|
|
412
|
-
|
|
413
|
-
|
|
486
|
+
// tester_pair (with explicit audiences)
|
|
487
|
+
{
|
|
488
|
+
"type": "chat",
|
|
489
|
+
"mode_details": {
|
|
490
|
+
"mode": "tester_pair",
|
|
491
|
+
"audience_a": ["tp-uuid-1", "tp-uuid-2"],
|
|
492
|
+
"audience_b": ["tp-uuid-3", "tp-uuid-4"],
|
|
493
|
+
"scenario_a": "You are a senior sales rep pitching ish.",
|
|
494
|
+
"scenario_b": "You are a skeptical CTO evaluating ish.",
|
|
495
|
+
"initiator_side": "a"
|
|
496
|
+
},
|
|
497
|
+
"max_turns": 14,
|
|
498
|
+
"early_termination": true
|
|
499
|
+
}
|
|
500
|
+
|
|
501
|
+
// tester_pair (with role criteria — backend resolves the pool)
|
|
502
|
+
{
|
|
503
|
+
"type": "chat",
|
|
504
|
+
"mode_details": {
|
|
505
|
+
"mode": "tester_pair",
|
|
506
|
+
"audience_a": [],
|
|
507
|
+
"audience_b": [],
|
|
508
|
+
"role_criteria_a": {
|
|
509
|
+
"occupation": ["founder", "ceo"],
|
|
510
|
+
"min_age": 28, "max_age": 55,
|
|
511
|
+
"country": ["US", "SE"]
|
|
512
|
+
},
|
|
513
|
+
"role_criteria_b": { "occupation": ["investor", "vc"] },
|
|
514
|
+
"scenario_a": "...",
|
|
515
|
+
"scenario_b": "...",
|
|
516
|
+
"initiator_side": "a"
|
|
517
|
+
},
|
|
518
|
+
"max_turns": 14,
|
|
519
|
+
"early_termination": true
|
|
520
|
+
}
|
|
414
521
|
\`\`\`
|
|
415
522
|
|
|
416
|
-
|
|
417
|
-
|
|
523
|
+
## Audience selection (tester_pair)
|
|
524
|
+
|
|
525
|
+
Each side of a pair needs **either** an explicit audience list **or** a
|
|
526
|
+
role-criteria filter (or both). The supported combinations:
|
|
527
|
+
|
|
528
|
+
| Side A input | Side B input | Behaviour |
|
|
529
|
+
| --- | --- | --- |
|
|
530
|
+
| \`--audience-a\` (UUIDs) | \`--audience-b\` (UUIDs) | Explicit pairing. Equal counts zip 1:1 by index; a side of exactly 1 broadcasts to the other. |
|
|
531
|
+
| \`--role-criteria-a\` (JSON) | \`--role-criteria-b\` (JSON) | Backend resolves matching pool from each side's criteria and persists the IDs back to the iteration. |
|
|
532
|
+
| Either flag pair | Either flag pair | Mixed (e.g. explicit A + criteria B). Backend handles each side independently. |
|
|
533
|
+
| Both flags on one side | (any) | Criteria validates the explicit list; mismatch blocks run with a clear error. |
|
|
534
|
+
|
|
535
|
+
**Persona-first principle**: the tester's persona is sacred — never
|
|
536
|
+
altered by the scenario. Criteria filter the *eligible pool* upstream
|
|
537
|
+
so that by the time a tester reaches the LLM prompt, their persona is
|
|
538
|
+
already plausible for the role. The prompt construction itself does
|
|
539
|
+
not change between explicit-audience and criteria-driven flows.
|
|
540
|
+
|
|
541
|
+
\`RoleCriteria\` keys (all optional):
|
|
542
|
+
|
|
543
|
+
- \`occupation: string[]\` (job titles, case-insensitive match)
|
|
544
|
+
- \`min_age: int\`, \`max_age: int\`
|
|
545
|
+
- \`gender: string[]\` (e.g. \`["female", "male"]\`)
|
|
546
|
+
- \`country: string[]\` (ISO-3166-alpha-2 codes)
|
|
547
|
+
- \`education_level_in: string[]\` (less_than_secondary, secondary, some_post_secondary, vocational_or_associate, bachelor, graduate)
|
|
548
|
+
- \`household_in: string[]\` (single, couple_no_kids, couple_with_kids, single_parent, shared_housing, adult_with_parents, multi_generational). MECE: a couple raising children is \`couple_with_kids\`, not \`couple_no_kids\`; \`single\` means lives alone with no partner, roommates, parents, or children sharing the household.
|
|
549
|
+
- \`locale_type_in: string[]\` (urban, suburban, small_town, rural)
|
|
550
|
+
- \`income_level_in: string[]\` (lower, lower_middle, middle, upper_middle, upper, prefer_not_to_say)
|
|
551
|
+
- \`employment_status_in: string[]\` (employed_full_time, employed_part_time, self_employed, unemployed_seeking, student, homemaker, retired, unable_to_work, other). Primary daytime activity wins: a student who works part-time is \`student\`; a retiree who freelances is \`retired\`.
|
|
552
|
+
- \`requires_captions: bool\`, \`uses_screen_reader: bool\`, \`prefers_reduced_motion: bool\`, \`prefers_high_contrast: bool\`, \`has_any_accessibility_need: bool\` (coarse boolean filters over \`accessibility_profile\`)
|
|
553
|
+
|
|
554
|
+
If the resolved pool is smaller than the requested conversation count
|
|
555
|
+
for a side, \`ish study run\` exits 2 with the backend's error envelope
|
|
556
|
+
intact. No silent fallback. Broaden the criteria, generate more
|
|
557
|
+
profiles, or pass an explicit \`--audience-*\` list to recover.
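A criteria object can be sanity-checked locally before it is sent. The sketch below is a hypothetical client-side pre-check, not the backend's validation: the allowed values are copied from the key list above, while the function name `checkRoleCriteria` and the `min_age` / `max_age` ordering check are assumptions.

```javascript
// Allowed values copied from the RoleCriteria key list above.
const ROLE_CRITERIA_ENUMS = {
  education_level_in: ["less_than_secondary", "secondary", "some_post_secondary", "vocational_or_associate", "bachelor", "graduate"],
  household_in: ["single", "couple_no_kids", "couple_with_kids", "single_parent", "shared_housing", "adult_with_parents", "multi_generational"],
  locale_type_in: ["urban", "suburban", "small_town", "rural"],
  income_level_in: ["lower", "lower_middle", "middle", "upper_middle", "upper", "prefer_not_to_say"],
  employment_status_in: ["employed_full_time", "employed_part_time", "self_employed", "unemployed_seeking", "student", "homemaker", "retired", "unable_to_work", "other"],
};

// Hypothetical pre-check: returns a list of human-readable problems.
function checkRoleCriteria(criteria) {
  const problems = [];
  for (const [key, allowed] of Object.entries(ROLE_CRITERIA_ENUMS)) {
    for (const value of criteria[key] ?? []) {
      if (!allowed.includes(value)) problems.push(`${key}: unknown value "${value}"`);
    }
  }
  // Assumed rule: an inverted age range can never match anyone.
  if (criteria.min_age != null && criteria.max_age != null && criteria.min_age > criteria.max_age) {
    problems.push("min_age is greater than max_age");
  }
  return problems;
}

console.log(checkRoleCriteria({ occupation: ["founder"], min_age: 28, max_age: 55, household_in: ["couple_with_kids"] }));
console.log(checkRoleCriteria({ education_level_in: ["phd"], min_age: 60, max_age: 30 }));
```

The backend remains the source of truth. A check like this only catches typos before a run is dispatched.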
|
|
558
|
+
|
|
559
|
+
## Pair-mode flag names (CLI ↔ MCP alignment)
|
|
560
|
+
|
|
561
|
+
CLI flags on \`ish study create\` / \`ish iteration create\` use the
|
|
562
|
+
same nouns the MCP \`study_iterate.chat_pair\` payload uses, so an
|
|
563
|
+
agent doesn't pay a translation tax when switching surfaces:
|
|
564
|
+
|
|
565
|
+
| CLI flag | MCP field | What it carries |
|
|
566
|
+
|---------------------------|----------------------------|-----------------------------------------------------|
|
|
567
|
+
| \`--audience-a\` / \`-b\` | \`audience_a\` / \`audience_b\` | Explicit tester profile IDs (UUIDs or aliases) for that side. |
|
|
568
|
+
| \`--role-criteria-a\` / \`-b\` | \`role_criteria_a\` / \`role_criteria_b\` | JSON filter (occupation, country, …) the backend resolves into a pool. |
|
|
569
|
+
| \`--scenario-a\` / \`-b\` | \`scenario_a\` / \`scenario_b\` | The system-prompt-shaped role text injected into one side's prompt only (asymmetry contract). |
|
|
570
|
+
| \`--initiator-side\` | \`initiator_side\` | Which side speaks first (\`a\` default). |
|
|
571
|
+
| \`--max-turns\` | \`max_turns\` | Conversation cap (default 14). |
|
|
572
|
+
| \`--early-termination\` | \`early_termination\` | Allow the worker to end early when parties signal. |
|
|
573
|
+
|
|
574
|
+
The pre-2026-05 \`--profile-a\` / \`--profile-b\` CLI flags were
|
|
575
|
+
renamed to \`--audience-a\` / \`--audience-b\` to match the MCP and
|
|
576
|
+
the wire shape (\`mode_details.audience_a\` /
|
|
577
|
+
\`mode_details.audience_b\`). Same intent, same accepted inputs
|
|
578
|
+
(comma-separated UUIDs or aliases, repeatable). \`--role-criteria-a\`
|
|
579
|
+
/ \`--role-criteria-b\` were already aligned with MCP and did not
|
|
580
|
+
change.
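Because the nouns line up, translating between the two surfaces is mechanical. The sketch below assumes a flat payload object holding the MCP field names from the table; the function name `pairPayloadToArgs` and the flat shape are assumptions, and `--chat-mode tester_pair` is taken from the CLI examples rather than the table.

```javascript
// Sketch: turn a tester_pair payload into `ish iteration create` arguments.
function pairPayloadToArgs(payload) {
  const args = ["--chat-mode", "tester_pair"];
  for (const side of ["a", "b"]) {
    const audience = payload[`audience_${side}`];
    if (Array.isArray(audience) && audience.length > 0) {
      args.push(`--audience-${side}`, audience.join(",")); // comma-separated IDs or aliases
    }
    const criteria = payload[`role_criteria_${side}`];
    if (criteria) args.push(`--role-criteria-${side}`, JSON.stringify(criteria));
    const scenario = payload[`scenario_${side}`];
    if (scenario) args.push(`--scenario-${side}`, scenario);
  }
  if (payload.initiator_side) args.push("--initiator-side", payload.initiator_side);
  if (payload.max_turns != null) args.push("--max-turns", String(payload.max_turns));
  if (payload.early_termination) args.push("--early-termination"); // boolean flag, no value
  return args;
}

const pairArgs = pairPayloadToArgs({
  audience_a: ["tp-a1", "tp-a2"],
  audience_b: ["tp-b1", "tp-b2"],
  scenario_a: "Pitch ish.",
  scenario_b: "Evaluate ish.",
  initiator_side: "a",
  max_turns: 14,
  early_termination: true,
});
console.log(pairArgs.join(" "));
```

The returned array is meant to be handed to a process spawner as separate arguments, so scenario text with spaces needs no shell quoting.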
|
|
581
|
+
|
|
582
|
+
CLI authoring:
|
|
583
|
+
|
|
584
|
+
\`\`\`
|
|
585
|
+
# external_chatbot — reference a saved endpoint (recommended):
|
|
586
|
+
ish iteration create --endpoint ep-abc --max-turns 10 --early-termination
|
|
587
|
+
|
|
588
|
+
# external_chatbot — inline endpoint config:
|
|
589
|
+
ish iteration create --endpoint-config ./bot.json
|
|
590
|
+
|
|
591
|
+
# external_chatbot — legacy escape-hatch flags still work:
|
|
592
|
+
ish iteration create --chat-endpoint-id ep-abc --max-turns 10
|
|
593
|
+
ish iteration create --chat-endpoint-json '{"url":"https://..."}'
|
|
594
|
+
|
|
595
|
+
# tester_pair — two AI audiences, asymmetric per-side scenarios:
|
|
596
|
+
ish iteration create --chat-mode tester_pair \\
|
|
597
|
+
--audience-a tp-a1,tp-a2 --audience-b tp-b1,tp-b2 \\
|
|
598
|
+
--scenario-a @./sales_rep.md --scenario-b @./skeptical_cto.md \\
|
|
599
|
+
--max-turns 14
|
|
600
|
+
|
|
601
|
+
# tester_pair — criteria-driven audience (persona-first filtering):
|
|
602
|
+
ish iteration create --chat-mode tester_pair \\
|
|
603
|
+
--role-criteria-a '{"occupation":["founder","ceo"],"min_age":28}' \\
|
|
604
|
+
--role-criteria-b @./criteria_investor.json \\
|
|
605
|
+
--scenario-a @./sales_rep.md --scenario-b @./skeptical_cto.md \\
|
|
606
|
+
--max-turns 14
|
|
607
|
+
\`\`\`
|
|
608
|
+
|
|
609
|
+
Tunables (both modes):
|
|
610
|
+
- \`--max-turns N\` — cap the conversation length (default 12 for
|
|
611
|
+
external_chatbot, 14 for tester_pair; persona drift starts ~20 turns
|
|
612
|
+
so cap accordingly).
|
|
418
613
|
- \`--early-termination\` — let the worker end the session early when
|
|
419
|
-
the
|
|
614
|
+
the parties signal the conversation is over.
|
|
615
|
+
|
|
616
|
+
Pair-mode rules:
|
|
617
|
+
- Each side needs **either** \`--audience-*\` (explicit IDs) **or**
|
|
618
|
+
\`--role-criteria-*\` (filter the backend resolves). The two can also
|
|
619
|
+
be combined — criteria then acts as validation on the explicit list.
|
|
620
|
+
- When both sides use explicit \`--audience-a\` / \`--audience-b\`, they
|
|
621
|
+
must be the same length (≥ 1), except when one side has exactly one profile (the 1×N broadcast case). The same profile on both sides is allowed
|
|
622
|
+
(self-talk rehearsal). When either side defers to criteria, the
|
|
623
|
+
length match is enforced server-side after pool resolution.
|
|
624
|
+
- **1×N broadcast**: pass exactly one profile on one side and N
|
|
625
|
+
profiles on the other to rehearse the fixed side against N
|
|
626
|
+
variations. The CLI auto-broadcasts the singleton to match length
|
|
627
|
+
N. Example: \`--audience-a tp-rep --audience-b tp-cto1,tp-cto2,tp-cto3\`
|
|
628
|
+
produces 3 conversations, all sharing tp-rep on side A. The CLI
|
|
629
|
+
prints a stderr notice so you know broadcasting kicked in.
|
|
630
|
+
- Both \`--scenario-a\` and \`--scenario-b\` are required and asymmetric.
|
|
631
|
+
- \`--initiator-side\` defaults to \`a\` (side A speaks first).
|
|
632
|
+
- \`--chat-mode\` accepts both \`tester_pair\` and \`tester-pair\`
|
|
633
|
+
(hyphenated variants are normalised). Same normalisation applies to
|
|
634
|
+
\`--screen-format\` (\`mobile_portrait\` ↔ \`mobile-portrait\`),
|
|
635
|
+
\`--kind\` on \`source upload\` (\`text_file\` ↔ \`text-file\`), and the
|
|
636
|
+
\`type\` field in \`--questionnaire\` / \`--questions\` manifests
|
|
637
|
+
(\`single-choice\` ↔ \`single_choice\`).
|
|
638
|
+
- Audiences are pinned to the iteration. \`ish study run\` refuses
|
|
639
|
+
run-time audience overrides (\`--profile\` / \`--sample\` / \`--all\` /
|
|
640
|
+
filters) on a pair iteration — change the audiences via
|
|
641
|
+
\`ish iteration update <id> --details-json '{...}'\` instead.
|
|
642
|
+
- \`--max-turns\` / \`--early-termination\` on \`ish study run\` override
|
|
643
|
+
the iteration's saved values for that single dispatch (they are not
|
|
644
|
+
persisted back to the iteration).
|
|
645
|
+
- One Conversation row is created per pair index, server-side. The
|
|
646
|
+
per-conversation summary (\`end_reason\`, \`dominant_dynamic\`) lands on
|
|
647
|
+
the iteration response under \`conversations[]\`. Inspect via
|
|
648
|
+
\`ish iteration get <id>\`.
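The pairing rule for explicit audiences (zip by index when counts match, broadcast a singleton side otherwise) can be sketched as follows. This illustrates the rule as described, not the CLI's or the backend's actual code; the function name `pairAudiences` and the error messages are hypothetical.

```javascript
// Sketch of the pairing rule: one [sideA, sideB] pair per conversation.
function pairAudiences(audienceA, audienceB) {
  if (audienceA.length === 0 || audienceB.length === 0) {
    throw new Error("each side needs at least one profile");
  }
  const n = Math.max(audienceA.length, audienceB.length);
  // A side of exactly one profile is repeated to match the other side.
  const expand = (side) => (side.length === 1 ? Array(n).fill(side[0]) : side);
  const a = expand(audienceA);
  const b = expand(audienceB);
  if (a.length !== b.length) {
    throw new Error(`audience lengths differ: ${audienceA.length} vs ${audienceB.length}`);
  }
  return a.map((profile, index) => [profile, b[index]]);
}

console.log(pairAudiences(["tp-a1", "tp-a2"], ["tp-b1", "tp-b2"]));
console.log(pairAudiences(["tp-rep"], ["tp-cto1", "tp-cto2", "tp-cto3"]));
```

The second call models the 1×N broadcast example: three pairs, all sharing `tp-rep` on side A.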
|
|
649
|
+
|
|
650
|
+
## Writing a good scenario
|
|
651
|
+
|
|
652
|
+
Thin scenarios produce thin rehearsals. Both \`scenario_a\` and
|
|
653
|
+
\`scenario_b\` are injected into their own side's prompt as
|
|
654
|
+
role-playing context — the partner does **not** see the other side's
|
|
655
|
+
scenario or goal. Treat each scenario as a system prompt for one
|
|
656
|
+
character in a play. Cover five things:
|
|
657
|
+
|
|
658
|
+
1. **Role / identity** — who is this person?
|
|
659
|
+
2. **Voice** — how do they speak? Formal, casual, technical, blunt?
|
|
660
|
+
3. **What they know** — the context they came in with.
|
|
661
|
+
4. **What they don't know** — the asymmetry that makes the rehearsal
|
|
662
|
+
interesting.
|
|
663
|
+
5. **Goal** — what counts as success for *them*.
|
|
664
|
+
|
|
665
|
+
Example (\`scenario_a\` — the sales rep):
|
|
666
|
+
|
|
667
|
+
\`\`\`
|
|
668
|
+
You are Maya, a senior account executive at ish — three years of
|
|
669
|
+
experience selling research tooling to product orgs. You speak in
|
|
670
|
+
clear, plain sentences, push back when you disagree, and quantify
|
|
671
|
+
claims when you can. You know this is a 30-minute discovery call;
|
|
672
|
+
you've read the prospect's LinkedIn and that's it. You do NOT know
|
|
673
|
+
the prospect's current tooling, budget, or internal politics — your
|
|
674
|
+
job is to find out by listening and asking. Success = end the call
|
|
675
|
+
with a clear next step (a pilot, a follow-up demo, or a "no, here's
|
|
676
|
+
why"). A polite "we'll get back to you" is not success.
|
|
677
|
+
\`\`\`
|
|
678
|
+
|
|
679
|
+
Example (\`scenario_b\` — the buyer):
|
|
680
|
+
|
|
681
|
+
\`\`\`
|
|
682
|
+
You are Devon, the CTO at a 60-person Series B SaaS company. You
|
|
683
|
+
distrust new vendors by default — your team has been burned by
|
|
684
|
+
"AI for research" tools twice in the past 18 months. You speak in
|
|
685
|
+
short, sceptical sentences and interrupt vendor pitches with
|
|
686
|
+
specifics: pricing, integrations, where the data lives. You know
|
|
687
|
+
your team currently runs unmoderated tests via UserTesting and
|
|
688
|
+
Pendo; the budget for new tooling is tight (€8k/year max). You do
|
|
689
|
+
NOT know how ish prices, what it integrates with, or whether it
|
|
690
|
+
handles your stack (Mixpanel + Heap + Linear). Success = leave the
|
|
691
|
+
call with either a concrete proof point that addresses your top
|
|
692
|
+
risk, OR a clean way to decline without burning the relationship.
|
|
693
|
+
\`\`\`
|
|
694
|
+
|
|
695
|
+
Read those back to back: the personas are asymmetric (different
|
|
696
|
+
goals, different knowledge), grounded (specific tools, specific
|
|
697
|
+
numbers), and constrained (each has a stake). That's the difference
|
|
698
|
+
between a rehearsal that produces signal and one that produces
|
|
699
|
+
generic dialogue. Keep each scenario under ~250 words — past that,
|
|
700
|
+
persona drift starts to dominate.
|
|
701
|
+
|
|
702
|
+
### Don't put demographics in the scenario
|
|
703
|
+
|
|
704
|
+
A scenario describes **voice, knowledge, and goal** for one role —
|
|
705
|
+
*not* the demographics of who plays it. Demographic constraints
|
|
706
|
+
("you are a 35-year-old Swedish founder") belong in
|
|
707
|
+
\`--role-criteria-a\` / \`--role-criteria-b\` instead. The tester's
|
|
708
|
+
persona stays sacred; criteria filter the eligible pool upstream so
|
|
709
|
+
the persona is already plausible for the role by the time the LLM
|
|
710
|
+
sees the prompt. Mixing demographics into the scenario text
|
|
711
|
+
short-circuits the asymmetry contract and produces incoherent
|
|
712
|
+
characters (a retired farmer suddenly "pitching a Series A").
|
|
713
|
+
|
|
714
|
+
Paired with the Maya / Devon scenarios above, the criteria might
|
|
715
|
+
look like:
|
|
716
|
+
|
|
717
|
+
\`\`\`
|
|
718
|
+
# --role-criteria-a (the sales rep filter):
|
|
719
|
+
{"occupation":["sales","account executive"],"min_age":28,"max_age":50}
|
|
720
|
+
|
|
721
|
+
# --role-criteria-b (the skeptical CTO filter):
|
|
722
|
+
{"occupation":["cto","vp engineering","head of engineering"],
|
|
723
|
+
"country":["US","SE"],"education_level_in":["bachelor","graduate"]}
|
|
724
|
+
\`\`\`
|
|
725
|
+
|
|
726
|
+
Scenarios describe the role; criteria pick who plays it.
|
|
420
727
|
|
|
421
728
|
## No more auto-empty iteration A
|
|
422
729
|
|
|
@@ -444,6 +751,7 @@ Treat this as actionable, not transient — re-running won't change anything.
|
|
|
444
751
|
- \`concepts/run-verbs\` — how \`ish study run\` selects the iteration.
|
|
445
752
|
- \`concepts/audience\` — how testers are picked for a run.
|
|
446
753
|
- \`reference/billing-limits\` — \`maxIterationsPerStudy\` cap on iteration creation.
|
|
754
|
+
- \`reference/credits\` — per-iteration-run credit cost & preview shape (\`pair_preview.credit_estimate\` for tester-pair, top-level \`credit_estimate\` otherwise).
|
|
447
755
|
`;
|
|
448
756
|
const CONCEPT_ASSIGNMENT = `# concept: assignment
|
|
449
757
|
|
|
@@ -527,6 +835,11 @@ ish study create … --questionnaire ./questionnaire.json
|
|
|
527
835
|
\`questionnaire.json\` is an array of question objects in the shape above.
|
|
528
836
|
The same shape is accepted by \`ish ask add-questions … --questions …\`.
|
|
529
837
|
|
|
838
|
+
The \`type\` field is hyphenated for the multi-word values (\`single-choice\`,
|
|
839
|
+
\`multiple-choice\`). The CLI normalises the underscored variants
|
|
840
|
+
(\`single_choice\`, \`multiple_choice\`) back to the canonical hyphenated form,
|
|
841
|
+
so either works in your manifest.
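The normalisation described above amounts to a one-line mapping. The sketch below is an illustration, not the CLI's implementation: the function name `normaliseQuestionType` and the `text` field are hypothetical, and it assumes no question type uses an underscore in its canonical form.

```javascript
// Sketch: map underscored question types onto the hyphenated canonical form.
function normaliseQuestionType(type) {
  return type.replace(/_/g, "-");
}

// Assumed manifest entries; only the `type` field is described in the docs.
const manifest = [
  { type: "single_choice", text: "Which plan fits best?" },
  { type: "multiple-choice", text: "Which features matter?" },
];

const normalised = manifest.map((q) => ({ ...q, type: normaliseQuestionType(q.type) }));
console.log(normalised.map((q) => q.type));
```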
|
|
842
|
+
|
|
530
843
|
## Related
|
|
531
844
|
|
|
532
845
|
- \`concepts/ask\` — asks have per-round questions, similar shape.
|
|
@@ -700,6 +1013,33 @@ copy can safely append questions without losing prior round results.
|
|
|
700
1013
|
|
|
701
1014
|
See \`reference/json-mode\` for the full shape.
|
|
702
1015
|
|
|
1016
|
+
## Response-shape ergonomics
|
|
1017
|
+
|
|
1018
|
+
A few non-obvious shape rules on the MCP / ask endpoints that save
|
|
1019
|
+
round-trips when you know them up front:
|
|
1020
|
+
|
|
1021
|
+
- **\`cross_round_summary\` requires \`wants_pick=true\` on every
|
|
1022
|
+
round.** \`ask results\` / \`ask_get\` only compute the top-level
|
|
1023
|
+
\`cross_round_summary\` when *every* round in the ask was dispatched
|
|
1024
|
+
with \`wants_pick=true\` — picks across rounds are the only
|
|
1025
|
+
comparable signal. When even one round is a free-text drill
|
|
1026
|
+
question (\`wants_pick=false\`), the field is omitted and the
|
|
1027
|
+
response carries a \`cross_round_summary_reason\` string explaining
|
|
1028
|
+
which round(s) lacked \`wants_pick\` (e.g.
|
|
1029
|
+
\`"omitted: rounds 2, 3 lack wants_pick=true"\`). Branch on the
|
|
1030
|
+
reason, don't poll for the field to appear.
|
|
1031
|
+
- **\`audience_get\` omits \`accessibility_profile\` by default.** The
|
|
1032
|
+
field is ~1KB per row; on a 50-profile page it overflows
|
|
1033
|
+
agent-tool result budgets. Pass
|
|
1034
|
+
\`include_accessibility_profile=true\` to include it. Mirrors the
|
|
1035
|
+
existing \`include_bio=false\` default — same opt-in pattern.
|
|
1036
|
+
- **\`ask_testers\` uses \`dispatch_into_round\`, not \`round\`.** The
|
|
1037
|
+
parameter name was renamed from the ambiguous \`round\` (which read
|
|
1038
|
+
as "start from round N") to the more explicit \`dispatch_into_round\`
|
|
1039
|
+
("add these new testers into round N"). Behavior is unchanged —
|
|
1040
|
+
it appends testers to the named round on an existing ask, it does
|
|
1041
|
+
not roll back or restart any prior round.
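The first rule above suggests branching on the reason string instead of retrying. A minimal sketch, assuming the response has already been parsed into an object; the function name `readCrossRoundSummary`, the return shape, and the fallback text are hypothetical, while the two field names come from the text.

```javascript
// Sketch: decide what to do with an `ask results` / `ask_get` response.
function readCrossRoundSummary(response) {
  if (response.cross_round_summary !== undefined) {
    return { available: true, summary: response.cross_round_summary };
  }
  // Field omitted: the reason string names the rounds that lacked wants_pick=true.
  return {
    available: false,
    reason: response.cross_round_summary_reason ?? "no reason provided",
  };
}

const outcome = readCrossRoundSummary({
  cross_round_summary_reason: "omitted: rounds 2, 3 lack wants_pick=true",
});
console.log(outcome.available ? outcome.summary : outcome.reason);
```

An agent that receives `available: false` can stop polling right away, since re-fetching will not make the field appear.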
|
|
1042
|
+
|
|
703
1043
|
## Variant syntax
|
|
704
1044
|
|
|
705
1045
|
\`--variant <type>:<value>[::label=<label>]\`
|
|
@@ -714,6 +1054,7 @@ See \`reference/json-mode\` for the full shape.
|
|
|
714
1054
|
- \`concepts/round\` — what a round is and how it executes.
|
|
715
1055
|
- \`concepts/audience\` — how testers are chosen at ask creation.
|
|
716
1056
|
- \`concepts/run-verbs\` — \`ish ask run\` vs \`ish study run\`.
|
|
1057
|
+
- \`reference/credits\` — ask rounds bill 1 credit per successful response.
|
|
717
1058
|
`;
|
|
718
1059
|
const CONCEPT_ROUND = `# concept: round
|
|
719
1060
|
|
|
@@ -816,6 +1157,58 @@ Expected JSON: \`{ "name": "...", "type": "ai", "gender": "female",
|
|
|
816
1157
|
Re-generating the same name/country/occupation/age yields the
|
|
817
1158
|
same DOB.
|
|
818
1159
|
|
|
1160
|
+
## Structured profile fields
|
|
1161
|
+
|
|
1162
|
+
Five universal enums + a versioned accessibility JSONB live on every
|
|
1163
|
+
TesterProfile. Values are snake_case and match
|
|
1164
|
+
\`https://ishlabs.io/spec/profile-enums.v1.json\` byte-for-byte.
|
|
1165
|
+
|
|
1166
|
+
- \`education_level\`: \`less_than_secondary\`, \`secondary\`,
|
|
1167
|
+
\`some_post_secondary\`, \`vocational_or_associate\`, \`bachelor\`, \`graduate\`
|
|
1168
|
+
- \`household\` (MECE): \`single\`, \`couple_no_kids\`, \`couple_with_kids\`,
|
|
1169
|
+
\`single_parent\`, \`shared_housing\`, \`adult_with_parents\`,
|
|
1170
|
+
\`multi_generational\`. A couple raising children is \`couple_with_kids\`,
|
|
1171
|
+
not \`couple_no_kids\`. \`single\` means lives alone (no partner,
|
|
1172
|
+
roommates, parents, or children sharing the household).
|
|
1173
|
+
- \`locale_type\`: \`urban\`, \`suburban\`, \`small_town\`, \`rural\`
|
|
1174
|
+
- \`income_level\`: \`lower\`, \`lower_middle\`, \`middle\`, \`upper_middle\`,
|
|
1175
|
+
\`upper\`, \`prefer_not_to_say\`
|
|
1176
|
+
- \`employment_status\`: \`employed_full_time\`, \`employed_part_time\`,
|
|
1177
|
+
\`self_employed\`, \`unemployed_seeking\`, \`student\`, \`homemaker\`,
|
|
1178
|
+
\`retired\`, \`unable_to_work\`, \`other\`. Pick the primary daytime
|
|
1179
|
+
activity: a student who works part-time is \`student\`; a retiree who
|
|
1180
|
+
freelances is \`retired\`.
|
|
1181
|
+
- \`accessibility_profile\`: JSONB v1.0 with optional \`visual\`,
|
|
1182
|
+
\`auditory\`, \`motor\`, \`cognitive\`, \`data\` groups, plus
|
|
1183
|
+
\`assistive_tech: string[]\` and \`notes\`. Empty \`{}\` means "no
|
|
1184
|
+
accessibility configuration declared". Schema:
|
|
1185
|
+
\`https://ishlabs.io/spec/accessibility-profile-schema.v1.json\`.
|
|
1186
|
+
|
|
1187
|
+
Set them on \`ish profile update\`:
|
|
1188
|
+
|
|
1189
|
+
\`\`\`
|
|
1190
|
+
ish profile update tp-1b9 \\
|
|
1191
|
+
--education-level bachelor \\
|
|
1192
|
+
--household couple_with_kids \\
|
|
1193
|
+
--locale-type suburban \\
|
|
1194
|
+
--income-level middle \\
|
|
1195
|
+
--employment-status employed_full_time
|
|
1196
|
+
|
|
1197
|
+
# accessibility_profile accepts inline JSON or a path:
|
|
1198
|
+
ish profile update tp-1b9 --accessibility-profile '{
|
|
1199
|
+
"version": "1.0",
|
|
1200
|
+
"visual": {"uses_screen_reader": true, "text_size": "large"},
|
|
1201
|
+
"cognitive": {"reduce_motion": true},
|
|
1202
|
+
"assistive_tech": ["VoiceOver"]
|
|
1203
|
+
}'
|
|
1204
|
+
|
|
1205
|
+
ish profile update tp-1b9 --accessibility-profile ./a11y.json
|
|
1206
|
+
\`\`\`
|
|
1207
|
+
|
|
1208
|
+
The legacy \`--tech-savviness\` flag was removed in
|
|
1209
|
+
\`profile-schema-v2\`; passing it now produces commander's standard
|
|
1210
|
+
"unknown option" error.
|
|
1211
|
+
|
|
819
1212
|
## Related
|
|
820
1213
|
|
|
821
1214
|
- \`concepts/source\` — the inputs to \`profile generate\`.
|
|
@@ -829,7 +1222,7 @@ audio file, image, or PDF that an LLM reads to ground generated profiles
|
|
|
829
1222
|
in real customer evidence.
|
|
830
1223
|
|
|
831
1224
|
- Alias prefix: \`tps-\`
|
|
832
|
-
- Source kinds: \`text_file | audio | image\` (auto-detected from extension).
|
|
1225
|
+
- Source kinds: \`text_file | audio | image\` (auto-detected from extension; \`text-file\` is accepted as a hyphen variant).
|
|
833
1226
|
- Audio supports speaker diarization via \`--diarize\`.
|
|
834
1227
|
|
|
835
1228
|
## Two ways to use a source
|
|
@@ -894,6 +1287,52 @@ Error: No simulatable AI tester profiles in workspace w-b32 match:
|
|
|
894
1287
|
The suggestion is best-effort — it never replaces the original error,
|
|
895
1288
|
just augments it.
|
|
896
1289
|
|
|
1290
|
+
## Audience-build behaviors to know before dispatch
|
|
1291
|
+
|
|
1292
|
+
Two adjacent footguns surface most often on first-time audience
|
|
1293
|
+
construction. Both are documented here because they cost a round-trip
|
|
1294
|
+
to discover by experiment.
|
|
1295
|
+
|
|
1296
|
+
### \`occupation\` is a loose substring match
|
|
1297
|
+
|
|
1298
|
+
\`audience_build\` (and the \`--search\` flag) treats \`occupation\` as
|
|
1299
|
+
a **loose, case-insensitive substring filter**, not a whole-token /
|
|
1300
|
+
taxonomy match. \`occupation=["manager"]\` will match hotel managers,
|
|
1301
|
+
retail store managers, bank branch managers — anything containing
|
|
1302
|
+
the literal string "manager". Three patterns that recover the
|
|
1303
|
+
specificity you usually want:
|
|
1304
|
+
|
|
1305
|
+
- **Whole-token alternation**: \`occupation=["engineering manager",
|
|
1306
|
+
"software engineering manager", "vp engineering", "tech lead"]\` —
|
|
1307
|
+
exhaustive enumeration of the role surface beats one short token.
|
|
1308
|
+
- **Pair with other filters**: \`occupation=["manager"]\` +
|
|
1309
|
+
\`min_age=28\` + \`country=["US","SE"]\` narrows even a loose substring
|
|
1310
|
+
meaningfully.
|
|
1311
|
+
- **Preview before dispatch**: \`audience_build\` returns a
|
|
1312
|
+
\`match_preview\` summary on the response — a 1-line histogram of
|
|
1313
|
+
matched occupations (e.g. \`"matched 17 — software developer (12),
|
|
1314
|
+
DevOps engineer (3), other (2)"\`). Read it before
|
|
1315
|
+
\`ask_run\` / \`study_run\` to confirm the substring is matching what
|
|
1316
|
+
you intended; iterate on the filter cheaply if not.
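To see why a short token over-matches, here is a minimal Python sketch of a loose, case-insensitive substring filter. It models the behavior described above and is not the backend's implementation; \`matches_occupation\` and the sample occupations are made up for illustration.

```python
def matches_occupation(occupation: str, needles: list) -> bool:
    """True if any needle occurs anywhere inside the occupation, ignoring case."""
    haystack = occupation.casefold()
    return any(needle.casefold() in haystack for needle in needles)


pool = [
    "Hotel Manager",
    "Retail Store Manager",
    "Software Engineering Manager",
    "DevOps engineer",
]

# One short token matches every kind of manager.
loose = [o for o in pool if matches_occupation(o, ["manager"])]
# Enumerating fuller role strings recovers specificity.
tight = [o for o in pool if matches_occupation(o, ["engineering manager", "tech lead"])]

print(loose)
print(tight)
```

Note that the longer needles are still substrings, so listing several of them narrows the match without requiring an exact title.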
|
|
1317
|
+
|
|
1318
|
+
### The public profile pool skews non-tech / non-Western
|
|
1319
|
+
|
|
1320
|
+
The default public tester-profile pool was built from a broad
|
|
1321
|
+
demographic sample — so a substring like \`"software engineering
|
|
1322
|
+
manager"\` may return only a handful of matches, while \`"hotel
|
|
1323
|
+
manager"\` or \`"retail associate"\` return many. Two adaptations:
|
|
1324
|
+
|
|
1325
|
+
- **Don't assume Silicon Valley defaults.** A criteria-driven audience
|
|
1326
|
+
that works on a private testing pool may resolve to a much smaller
|
|
1327
|
+
count in the public pool. Read the \`match_preview\` (or count) on
|
|
1328
|
+
every \`audience_build\` before dispatching a run that depends on
|
|
1329
|
+
reaching N matches.
|
|
1330
|
+
- **Seed your own pool when you need a specific archetype.** If the
|
|
1331
|
+
public pool is genuinely thin for your role, generate the audience
|
|
1332
|
+
yourself via \`ish profile generate --description "..."\` — that
|
|
1333
|
+
produces profiles plausible for the role you described, regardless
|
|
1334
|
+
of public-pool composition. See \`concepts/profile\`.
|
|
1335
|
+
|
|
897
1336
|
## Defaults
|
|
898
1337
|
|
|
899
1338
|
- \`ish study run\` with no audience flags → reuses the iteration's
|
|
@@ -1347,6 +1786,15 @@ The CLI guarantees these contracts so agents can chain safely:
|
|
|
1347
1786
|
is collapsed to one batch entry per study (M13) with nested
|
|
1348
1787
|
\`tester_ids[]\`, \`tester_aliases[]\`, \`job_ids[]\`, and \`count\` —
|
|
1349
1788
|
an N-sample dispatch is a single row, not N near-duplicate rows.
|
|
1789
|
+
- **\`study\` JSON includes a \`url\` field.** \`study create\`,
|
|
1790
|
+
\`study generate\`, \`study get\`, \`study list\` (per item), and
|
|
1791
|
+
\`study run\` each return a top-level \`url\` pointing to the study
|
|
1792
|
+
in the web app — \`/<workspace>/<study>/overview\` on the read /
|
|
1793
|
+
write paths, \`/<workspace>/<study>/timeline\` on \`study run\`.
|
|
1794
|
+
Print it to the user instead of composing the host + path yourself.
|
|
1795
|
+
The base host follows the active backend: \`https://app.ishlabs.io\`
|
|
1796
|
+
on production, \`http://localhost:3000\` under \`--dev\`. Override
|
|
1797
|
+
with the \`ISH_APP_URL\` env var for staging or self-hosted UIs.
|
|
1350
1798
|
- **\`study results --json\` includes per-answer sentiment** (M10).
|
|
1351
1799
|
Every \`interview_answers[].answers[]\` row carries \`sentiment\`
|
|
1352
1800
|
(the tester's session-level label from \`tester_summary.sentiment\`),
|
|
@@ -1357,16 +1805,30 @@ The CLI guarantees these contracts so agents can chain safely:
|
|
|
1357
1805
|
error_message}. Drops \`interview_answers\` and per-interaction
|
|
1358
1806
|
breakdowns. Cheapest "did this run land?" shape.
|
|
1359
1807
|
- **\`study results --transcript <tester_id>\`** is the chat-modality
|
|
1360
|
-
projection
|
|
1361
|
-
|
|
1362
|
-
\`{role, text, turn_index,
|
|
1363
|
-
(set when the dispatch crashed);
|
|
1364
|
-
\`option_label\`, and \`sentiment\`.
|
|
1365
|
-
turns whose action carries no text
|
|
1366
|
-
\`ignore_offered\`); read intent from
|
|
1367
|
-
\`option_label\`. Same shape as the MCP
|
|
1368
|
-
tool. \`unique_bot_replies = 1\` on a
|
|
1369
|
-
signature.
|
|
1808
|
+
projection — **external_chatbot mode only in v1**. Returns
|
|
1809
|
+
\`{tester_id, tester_alias, transcript: [...], unique_bot_replies,
|
|
1810
|
+
tester_summary}\`. Each transcript entry is \`{role, text, turn_index,
|
|
1811
|
+
...}\` — bot turns add \`failure\` (set when the dispatch crashed);
|
|
1812
|
+
tester turns add \`action_type\`, \`option_label\`, and \`sentiment\`.
|
|
1813
|
+
\`text\` is null on tester turns whose action carries no text
|
|
1814
|
+
(\`select_option\`, \`ignore_offered\`); read intent from
|
|
1815
|
+
\`action_type\` + \`option_label\`. Same shape as the MCP
|
|
1816
|
+
\`get_chat_transcript\` tool. \`unique_bot_replies = 1\` on a
|
|
1817
|
+
multi-turn run is the M2 loop signature.
|
|
1818
|
+
|
|
1819
|
+
**For tester_pair conversations**, the bot/tester role pair doesn't
|
|
1820
|
+
apply (both speakers are testers). Inspect pair transcripts via the
|
|
1821
|
+
iteration response instead:
|
|
1822
|
+
|
|
1823
|
+
\`\`\`bash
|
|
1824
|
+
ish iteration get <iter-id> --json | jq '.conversations[]'
|
|
1825
|
+
# → [{ id, pair_index, started_at, ended_at, end_reason, summary, ... }]
|
|
1826
|
+
\`\`\`
|
|
1827
|
+
|
|
1828
|
+
Per-side tester summaries still land on each tester row
|
|
1829
|
+
(\`ish study tester <id> --json\`); the conversation-level summary
|
|
1830
|
+
(\`end_reason\`, \`dominant_dynamic\`, \`who_steered\`) lands on
|
|
1831
|
+
\`iteration.conversations[]\`.
|
|
1370
1832
|
- **\`study tester --summary\`** drops the action timeline and
|
|
1371
1833
|
returns just \`{tester, interaction_count, sentiment, comment,
|
|
1372
1834
|
error_message?, error_kind?}\`.
|
|
@@ -1438,9 +1900,24 @@ The CLI guarantees these contracts so agents can chain safely:
|
|
|
1438
1900
|
phase-2 LLM calls instead of 2N. Pass \`--redispatch-all\` for the
|
|
1439
1901
|
legacy reset behavior when you want fresh first impressions.
|
|
1440
1902
|
- **\`ask results --json\` includes \`cross_round_summary\` for 2+
|
|
1441
|
-
rounds
|
|
1442
|
-
|
|
1443
|
-
|
|
1903
|
+
rounds — when every round used \`wants_pick=true\`.** Top-level
|
|
1904
|
+
field with per-round picks/winner snapshots and a \`picks_delta\`
|
|
1905
|
+
(R1 → last round). Replaces hand-rolled diffing of two
|
|
1906
|
+
\`ask results\` calls. When **any** round was dispatched with
|
|
1907
|
+
\`wants_pick=false\` (typical for free-text follow-up rounds), the
|
|
1908
|
+
summary is omitted and \`cross_round_summary_reason\` carries the
|
|
1909
|
+
explanation (e.g. \`"omitted: rounds 2, 3 lack wants_pick=true"\`).
|
|
1910
|
+
Branch on the reason field, don't poll for the summary.
|
|
1911
|
+
- **\`audience_get\` omits \`accessibility_profile\` by default.** The
|
|
1912
|
+
block is ~1KB per row; including it on a 50-row page overflows
|
|
1913
|
+
agent tool result budgets. Pass
|
|
1914
|
+
\`include_accessibility_profile=true\` to opt in. Mirrors the
|
|
1915
|
+
existing \`include_bio=false\` opt-in.
|
|
1916
|
+
- **\`ask_testers\` parameter is \`dispatch_into_round\`, not
|
|
1917
|
+
\`round\`.** Reads verbatim — "dispatch these new testers into round
|
|
1918
|
+
N". The old name (\`round\`) read as "start from round N", which
|
|
1919
|
+
was wrong: the call never restarts prior rounds, it only appends
|
|
1920
|
+
testers to the named round. Behavior unchanged across the rename.
|
|
1444
1921
|
- **No more auto-empty iteration A.** \`study create\` and
|
|
1445
1922
|
\`study generate\` no longer produce a placeholder iteration A. The
|
|
1446
1923
|
first explicit \`ish iteration create\` becomes label A.
|
|
@@ -1739,6 +2216,168 @@ of scope: \`workspace\`, \`config\`, \`docs\`, \`init\`, \`login\`,
|
|
|
1739
2216
|
including \`--get workspace.alias\` to capture the active workspace
|
|
1740
2217
|
without piping \`ish status --json\` through \`jq\`.
|
|
1741
2218
|
`;
|
|
2219
|
+
const REFERENCE_CREDITS = `# reference: credits & cost preview
|
|
2220
|
+
|
|
2221
|
+
Every billable run (study, ask, insight) costs **credits**. The CLI
|
|
2222
|
+
surfaces a cost upper bound *before* you dispatch so you can budget. The
|
|
2223
|
+
backend is the authoritative source — its rejection envelope on
|
|
2224
|
+
\`insufficient_credits\` carries the live required/available pair.
|
|
2225
|
+
|
|
2226
|
+
## How costs are shaped
|
|
2227
|
+
|
|
2228
|
+
The formula has the same shape across modalities — \`max(1, round(N / 10))\`
|
|
2229
|
+
per principal — but the inputs differ. **Treat the rates below as the
|
|
2230
|
+
current calibration**; they will evolve as we differentiate per-modality
|
|
2231
|
+
compute cost. Agents should:
|
|
2232
|
+
|
|
2233
|
+
- For prospective cost preview: read \`credit_estimate\` from \`study run\`'s
|
|
2234
|
+
JSON envelope (top-level for solo/media runs; under \`pair_preview\` for
|
|
2235
|
+
tester-pair chat).
|
|
2236
|
+
- For hard budget checks: catch the backend's \`insufficient_credits\`
|
|
2237
|
+
rejection (HTTP 402; envelope shape below) and react to
|
|
2238
|
+
\`required\` / \`available\`.
|
|
2239
|
+
|
|
2240
|
+
| Surface | Per-principal cost | Total formula | Example |
|
|
2241
|
+
|---------------------|---------------------------------|--------------------------------------------------|--------------------------------------|
|
|
2242
|
+
| Interactive (URL) | \`max(1, round(steps/10))\` | \`testers × per-tester\` | 10 testers × 30 steps → 30 credits |
|
|
2243
|
+
| Text/image/video/audio/document | same | same | 5 testers × 20 steps → 10 credits |
|
|
2244
|
+
| Chat (external chatbot, solo) | \`max(1, round(turns/10))\` | \`testers × per-tester\` | 5 testers × 12 turns → 5 credits |
|
|
2245
|
+
| Chat (tester pair) | \`max(1, round(turns/10))\` per side | \`conv × per-side × 2\` | 3 conv × 14 turns → 6 credits |
|
|
2246
|
+
| Ask round | 1 / successful response | \`successful_testers\` | 50 responses → 50 credits |
|
|
2247
|
+
| Study insights | first free, then **10 flat** | n/a | 2nd analysis → 10 credits |
|
|
2248
|
+
|
|
2249
|
+
All numbers are **upper bounds**. Early termination, refusals, or
|
|
2250
|
+
backend audience trimming can reduce actual charge.
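The arithmetic in the table can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not the CLI's code: the function names are hypothetical, and \`round\` is taken to mean round-half-up (Python's built-in \`round\` rounds halves to even, so integer math is used instead).

```python
def per_principal(n: int) -> int:
    """max(1, round(n / 10)) with halves rounded up (assumed rounding mode)."""
    return max(1, (n + 5) // 10)


def per_tester_total(testers: int, steps_or_turns: int) -> int:
    """Interactive, media, and solo-chat runs: testers x per-tester cost."""
    return testers * per_principal(steps_or_turns)


def chat_pair_total(conversations: int, max_turns: int) -> int:
    """Tester-pair chat: conversations x per-side cost x 2 sides."""
    return conversations * per_principal(max_turns) * 2


print(per_tester_total(10, 30))  # interactive example from the table
print(chat_pair_total(3, 14))    # tester-pair example from the table
```

As with the table, these are upper bounds; the backend's charge is what counts.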
|
|
2251
|
+
|
|
2252
|
+
## Capping interactive/media spend (\`--max-interactions\`)
|
|
2253
|
+
|
|
2254
|
+
\`ish study run\` always sends \`max_interactions\` to the backend for
|
|
2255
|
+
interactive and media runs. Precedence: \`--max-interactions <n>\` flag >
|
|
2256
|
+
the iteration's stored \`details.max_interactions\` > **CLI default
|
|
2257
|
+
of 20**. The default exists to prevent runaway spend when a tester
|
|
2258
|
+
gets stuck on a broken or non-responsive surface — without a cap, one
|
|
2259
|
+
stuck tester can rack up 100+ steps before the SDK gives up. Pass
|
|
2260
|
+
\`--max-interactions\` to override (e.g. \`--max-interactions 50\` for
|
|
2261
|
+
deeper exploration, \`--max-interactions 5\` for a cheap smoke test).
|
|
2262
|
+
The confirmation block shows the resolved value and where it came
|
|
2263
|
+
from (flag / iteration / CLI default). The JSON envelope's
|
|
2264
|
+
\`credit_estimate.breakdown\` reflects the dispatched value.
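The precedence chain can be expressed as a small resolver. The Python below is an illustrative sketch, not the CLI's source; \`resolve_max_interactions\` is a hypothetical name, and the source labels mirror the ones the confirmation block is described as showing.

```python
from typing import Optional, Tuple

CLI_DEFAULT_MAX_INTERACTIONS = 20  # default stated in the text above


def resolve_max_interactions(
    flag_value: Optional[int],
    iteration_details: Optional[dict],
) -> Tuple[int, str]:
    """Return (resolved value, where it came from)."""
    # 1. An explicit --max-interactions flag wins.
    if flag_value is not None:
        return flag_value, "flag"
    # 2. Otherwise fall back to the value stored on the iteration, if any.
    stored = (iteration_details or {}).get("max_interactions")
    if stored is not None:
        return stored, "iteration"
    # 3. Otherwise use the CLI default.
    return CLI_DEFAULT_MAX_INTERACTIONS, "CLI default"


print(resolve_max_interactions(None, {"max_interactions": 35}))
```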
|
|
2265
|
+
|
|
2266
|
+
## Where the CLI surfaces it
|
|
2267
|
+
|
|
2268
|
+
**Human output — \`study run\` confirmation block:**
|
|
2269
|
+
|
|
2270
|
+
\`\`\`
|
|
2271
|
+
Run settings:
|
|
2272
|
+
...
|
|
2273
|
+
Scale: 3 conv × 14 turns × 2 sides ≈ 84 LLM calls (upper bound — early-termination may shorten)
|
|
2274
|
+
Credits (est): ≈ 6 credit(s) upper bound — see \`ish docs get-page reference/credits\`
|
|
2275
|
+
\`\`\`
|
|
2276
|
+
|
|
2277
|
+
**JSON envelope — \`study run --json\`:**
|
|
2278
|
+
|
|
2279
|
+
Pair chat — under \`pair_preview\`:
|
|
2280
|
+
|
|
2281
|
+
\`\`\`json
|
|
2282
|
+
{
|
|
2283
|
+
"pair_preview": {
|
|
2284
|
+
"conversation_count": 3,
|
|
2285
|
+
"max_turns": 14,
|
|
2286
|
+
"llm_calls_upper_bound": 84,
|
|
2287
|
+
"credit_estimate": {
|
|
2288
|
+
"upper_bound": 6,
|
|
2289
|
+
"formula": "chat_pair",
|
|
2290
|
+
"breakdown": "3 conv × max(1, round(14 turns / 10)) × 2 sides = 3 × 1 × 2 = 6",
|
|
2291
|
+
"unit": "credits"
|
|
2292
|
+
}
|
|
2293
|
+
}
|
|
2294
|
+
}
|
|
2295
|
+
\`\`\`
|
|
2296
|
+
|
|
2297
|
+
Solo media/interactive/chat — top-level \`credit_estimate\`:
|
|
2298
|
+
|
|
2299
|
+
\`\`\`json
|
|
2300
|
+
{
|
|
2301
|
+
"iteration_id": "…",
|
|
2302
|
+
"credit_estimate": {
|
|
2303
|
+
"upper_bound": 30,
|
|
2304
|
+
"formula": "media_per_tester",
|
|
2305
|
+
"breakdown": "10 tester(s) × max(1, round(30 steps / 10)) = 10 × 3 = 30",
|
|
2306
|
+
"unit": "credits"
|
|
2307
|
+
}
|
|
2308
|
+
}
|
|
2309
|
+
\`\`\`
|
|
2310
|
+
|
|
2311
|
+
The \`formula\` key is stable: agents can branch on it (\`media_per_tester\`,
|
|
2312
|
+
\`chat_solo\`, \`chat_pair\`, \`ask_per_response\`).
|
|
2313
|
+
|
|
2314
|
+
## Tier allotments
|
|
2315
|
+
|
|
2316
|
+
| Tier        | Credit allotment          | Notes                          |
|
|
2317
|
+
|-------------|---------------------------|--------------------------------|
|
|
2318
|
+
| FREE | 200 (one-time signup) | Never refilled |
|
|
2319
|
+
| STARTER | 1,000 / month | Monthly reset |
|
|
2320
|
+
| PRO | 3,000 / month | Monthly reset |
|
|
2321
|
+
| ENTERPRISE | unlimited | Custom contract |
|
|
2322
|
+
|
|
2323
|
+
The CLI does not enforce these — the backend does. The CLI's job is to
|
|
2324
|
+
*preview*, so an agent doesn't dispatch a 5,000-credit run on a
|
|
2325
|
+
200-credit account.
|
|
2326
|
+
|
|
2327
|
+
## Insufficient-credit rejection shape
|
|
2328
|
+
|
|
2329
|
+
When you try to dispatch beyond what's available, the backend returns
|
|
2330
|
+
HTTP 402. The CLI surfaces it as a structured error envelope:
|
|
2331
|
+
|
|
2332
|
+
\`\`\`json
|
|
2333
|
+
{
|
|
2334
|
+
"error": "Insufficient credits.",
|
|
2335
|
+
"error_code": "insufficient_credits",
|
|
2336
|
+
"status": 402,
|
|
2337
|
+
"retryable": false,
|
|
2338
|
+
"required": 30,
|
|
2339
|
+
"available": 8,
|
|
2340
|
+
"upgrade_url": "https://app.ishlabs.io/billing"
|
|
2341
|
+
}
|
|
2342
|
+
\`\`\`
|
|
2343
|
+
|
|
2344
|
+
Exit code \`1\` (non-retryable). Don't poll — the user has to upgrade or
|
|
2345
|
+
free up credits before re-dispatching.
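An agent can turn that envelope into a message for the human in a few lines. The Python below is a minimal sketch that assumes the envelope arrives as a JSON string with exactly the fields shown above; \`explain_rejection\` is a hypothetical helper, not part of the CLI.

```python
import json
from typing import Optional


def explain_rejection(raw: str) -> Optional[str]:
    """Summarize an insufficient_credits envelope; return None for anything else."""
    envelope = json.loads(raw)
    if envelope.get("error_code") != "insufficient_credits":
        return None
    required = envelope["required"]
    available = envelope["available"]
    shortfall = required - available
    return (
        f"Need {required} credits, have {available} "
        f"(short by {shortfall}). Upgrade: {envelope['upgrade_url']}"
    )


sample = json.dumps({
    "error": "Insufficient credits.",
    "error_code": "insufficient_credits",
    "status": 402,
    "retryable": False,
    "required": 30,
    "available": 8,
    "upgrade_url": "https://app.ishlabs.io/billing",
})
print(explain_rejection(sample))
```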
|
|
2346
|
+
|
|
2347
|
+
## Agent recipe
|
|
2348
|
+
|
|
2349
|
+
1. Build/draft the run (\`study create\`, \`iteration create\`).
|
|
2350
|
+
2. Call \`study run\` *without* \`--dispatch\` to read the
|
|
2351
|
+
\`credit_estimate\` upper bound from JSON. (Or \`--dry-run\` where
|
|
2352
|
+
supported — see modality concept pages.)
|
|
2353
|
+
3. If \`upper_bound\` fits your budget, re-call with \`--dispatch\`.
|
|
2354
|
+
4. If you hit \`error_code: insufficient_credits\`, surface
|
|
2355
|
+
\`required\` / \`available\` / \`upgrade_url\` to the human.
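Steps 2 and 3 of the recipe can be sketched in Python as follows. This assumes the preview JSON has already been parsed into a dict with the shapes shown earlier on this page. The helper names are hypothetical, and refusing to dispatch when no numeric estimate is available is a conservative choice made for this example, not documented CLI behavior.

```python
from typing import Optional


def credit_upper_bound(envelope: dict) -> Optional[int]:
    """Read upper_bound from pair_preview.credit_estimate or top-level credit_estimate."""
    pair = envelope.get("pair_preview")
    if isinstance(pair, dict):
        estimate = pair.get("credit_estimate")
    else:
        estimate = envelope.get("credit_estimate")
    # The estimate can be null (for example, pair chat without a finite max_turns).
    if not isinstance(estimate, dict):
        return None
    return estimate.get("upper_bound")


def should_dispatch(envelope: dict, budget: int) -> bool:
    """Dispatch only when a numeric upper bound exists and fits the budget."""
    bound = credit_upper_bound(envelope)
    return bound is not None and bound <= budget


pair_run = {"pair_preview": {"conversation_count": 3, "max_turns": 14,
                             "credit_estimate": {"upper_bound": 6, "formula": "chat_pair"}}}
solo_run = {"iteration_id": "...",
            "credit_estimate": {"upper_bound": 30, "formula": "media_per_tester"}}

print(should_dispatch(pair_run, budget=10))
print(should_dispatch(solo_run, budget=10))
```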
|
|
2356
|
+
|
|
2357
|
+
## Caveats
|
|
2358
|
+
|
|
2359
|
+
- The CLI's preview uses the **same formula** the backend bills with,
|
|
2360
|
+
but does **not** make a network preflight call — it's pure math
|
|
2361
|
+
client-side. If the backend formula changes mid-version, the preview
|
|
2362
|
+
will drift until the CLI is updated. The \`insufficient_credits\`
|
|
2363
|
+
rejection envelope is always authoritative.
|
|
2364
|
+
- Pair-chat \`credit_estimate\` is \`null\` if \`max_turns\` isn't a finite
|
|
2365
|
+
number (e.g. the iteration doesn't specify one and there's no
|
|
2366
|
+
\`--max-turns\` flag).
|
|
2367
|
+
- Audience criteria that resolve server-side won't have a precise
|
|
2368
|
+
estimate at preview time — the CLI prints the shape (\`N × … × 2\`)
|
|
2369
|
+
instead of a number.
|
|
2370
|
+
|
|
2371
|
+
## Related
|
|
2372
|
+
|
|
2373
|
+
- \`reference/billing-limits\` — per-tier *entity* caps (max
|
|
2374
|
+
workspaces/studies/iterations/profiles), separate from credit budget.
|
|
2375
|
+
- \`reference/json-mode\` — full error envelope shape and exit codes.
|
|
2376
|
+
- \`concepts/study\`, \`concepts/iteration\`, \`concepts/ask\` —
|
|
2377
|
+
per-modality run shapes.
|
|
2378
|
+
- \`guides/chat\` — worked example of a pair-chat run including
|
|
2379
|
+
\`pair_preview.credit_estimate\`.
|
|
2380
|
+
`;
|
|
1742
2381
|
const REFERENCE_BILLING_LIMITS = `# reference: billing tier limits
|
|
1743
2382
|
|
|
1744
2383
|
Some create operations are gated by your account's billing tier. The
|
|
@@ -1812,6 +2451,9 @@ upgrade or delete an existing resource to free up headroom.
|
|
|
1812
2451
|
|
|
1813
2452
|
## Related
|
|
1814
2453
|
|
|
2454
|
+
- \`reference/credits\` — per-run credit cost & preview (separate from
|
|
2455
|
+
these entity caps; this page is about *how many things you can have*,
|
|
2456
|
+
that page is about *how much each run costs*).
|
|
1815
2457
|
- \`concepts/workspace\` — \`maxProducts\` is per-account.
|
|
1816
2458
|
- \`concepts/study\` — \`maxStudiesPerProduct\` gates study creation.
|
|
1817
2459
|
- \`concepts/iteration\` — \`maxIterationsPerStudy\` gates iteration creation.
|
|
@@ -1820,6 +2462,51 @@ upgrade or delete an existing resource to free up headroom.
|
|
|
1820
2462
|
`;
|
|
1821
2463
|
const GUIDE_CHAT = `# guide: chat-modality studies
|
|
1822
2464
|
|
|
2465
|
+
Chat-modality studies cover two distinct shapes:
|
|
2466
|
+
|
|
2467
|
+
- **external_chatbot** — testers probe a customer chatbot endpoint
|
|
2468
|
+
(sections 1-3 below: configure → smoke test → run).
|
|
2469
|
+
- **tester_pair** — two AI personas converse with each other for
|
|
2470
|
+
rehearsal scenarios. Pitch rehearsals, difficult-conversation
|
|
2471
|
+
prep, founder-vs-investor archetypes. See section 7a/7b and the
|
|
2472
|
+
TL;DR below.
|
|
2473
|
+
|
|
2474
|
+
## TL;DR — rehearse a pitch in one shot
|
|
2475
|
+
|
|
2476
|
+
For "rehearse my pitch against 3 different skeptical CTOs" (the
|
|
2477
|
+
canonical 1×N variations shape), this is the whole flow. Inline
|
|
2478
|
+
scenarios — no extra files needed:
|
|
2479
|
+
|
|
2480
|
+
\`\`\`bash
|
|
2481
|
+
# Capture aliases for the rep (1) and CTOs (3) via command substitution:
|
|
2482
|
+
REP=$(ish profile generate \\
|
|
2483
|
+
--description "Senior B2B SaaS account executive; concise, technical" \\
|
|
2484
|
+
--count 1 --json | jq -r '.items[0].alias')
|
|
2485
|
+
CTOS=$(ish profile generate \\
|
|
2486
|
+
--description "Skeptical CTO at Series B SaaS; distrusts AI vendors" \\
|
|
2487
|
+
--count 3 --json | jq -r '[.items[].alias] | join(",")')
|
|
2488
|
+
|
|
2489
|
+
# One-shot study + iteration A (1×N broadcast does the rest):
|
|
2490
|
+
ish study create --modality chat --chat-mode tester_pair \\
|
|
2491
|
+
--name "Pitch rehearsal" \\
|
|
2492
|
+
--audience-a "$REP" --audience-b "$CTOS" \\
|
|
2493
|
+
--scenario-a "You are pitching <your product>. Be concise, push back on vague objections. Goal: land a pilot or a clear next step." \\
|
|
2494
|
+
--scenario-b "You are a skeptical CTO. Probe for technical depth, distrust marketing-speak, refuse to commit without evidence. Goal: leave with either a concrete proof point or a graceful 'no'." \\
|
|
2495
|
+
--assignment "Pitch:Land a pilot" --max-turns 14
|
|
2496
|
+
|
|
2497
|
+
# Run all 3 conversations:
|
|
2498
|
+
ish study run -y --wait
|
|
2499
|
+
|
|
2500
|
+
# Compare side-by-side:
|
|
2501
|
+
ish iteration get <iter-id> --json \\
|
|
2502
|
+
| jq '.conversations[] | {pair_index, end_reason, dynamic: .summary.dominant_dynamic}'
|
|
2503
|
+
\`\`\`
|
|
2504
|
+
|
|
2505
|
+
Section 7b below has the longer version with scenario-writing
|
|
2506
|
+
guidance, criteria-driven audiences, and the broadcast rule.
|
|
2507
|
+
|
|
2508
|
+
---
|
|
2509
|
+
|
|
1823
2510
|
Goal: from a customer chatbot endpoint to a finished chat-modality
|
|
1824
2511
|
study with parsed transcripts, end to end via the CLI. The flow has
|
|
1825
2512
|
three phases: configure the endpoint, smoke test it, run a study.
|
|
@@ -2113,13 +2800,20 @@ cat ./bot-config.json | ish study create \\
|
|
|
2113
2800
|
|
|
2114
2801
|
Optional \`--max-turns <n>\` (default 12) caps the chat per tester.
|
|
2115
2802
|
|
|
2116
|
-
Audience size is set at run time
|
|
2117
|
-
|
|
2118
|
-
\`--profile <id>\` is also supported
|
|
2803
|
+
Audience size is set at run time for **external_chatbot** chat
|
|
2804
|
+
studies. Use \`--sample <N>\` to pick N random simulatable profiles,
|
|
2805
|
+
or \`--all\` for the full pool. \`--profile <id>\` is also supported
|
|
2806
|
+
for explicit selection:
|
|
2119
2807
|
\`\`\`
|
|
2120
2808
|
ish study run stu-xyz --sample 5 --wait
|
|
2121
2809
|
\`\`\`
|
|
2122
2810
|
|
|
2811
|
+
> **Pair-mode is different.** \`--sample\` / \`--profile\` / demographic
|
|
2812
|
+
> filters on \`study run\` are **refused** for tester_pair iterations
|
|
2813
|
+
> — pair audiences live on the iteration itself. Set them at
|
|
2814
|
+
> iteration-create time via \`--audience-a/-b\` (with 1×N broadcast)
|
|
2815
|
+
> or \`--role-criteria-a/-b\`. See the tester_pair section below.
|
|
2816
|
+
|
|
2123
2817
|
Pull raw interactions:
|
|
2124
2818
|
\`\`\`
|
|
2125
2819
|
ish study results stu-xyz --json | jq '.interactions'
|
|
@@ -2141,6 +2835,171 @@ ish iteration create --study stu-xyz --endpoint-config ./bot.json
|
|
|
2141
2835
|
|
|
2142
2836
|
Same flag set as \`study create\`'s chat shortcut.
|
|
2143
2837
|
|
|
2838
|
+
## tester_pair: rehearse a conversation between two AI personas
|
|
2839
|
+
|
|
2840
|
+
\`Modality.CHAT\` also supports a **tester_pair** mode where two AI
|
|
2841
|
+
tester profiles converse with each other — useful for rehearsing a
|
|
2842
|
+
sales pitch, a difficult conversation, a fundraising chat, or any
|
|
2843
|
+
two-role scenario. Each side has its own scenario + goal text; the
|
|
2844
|
+
other side does NOT see it (the asymmetry contract). Audiences are
|
|
2845
|
+
1:1 paired by index (audience_a[i] talks to audience_b[i]).
|
|
2846
|
+
|
|
2847
|
+
One-shot study + iteration:
|
|
2848
|
+
|
|
2849
|
+
\`\`\`
|
|
2850
|
+
ish study create \\
|
|
2851
|
+
--modality chat --chat-mode tester_pair \\
|
|
2852
|
+
--name "Pitch rehearsal" \\
|
|
2853
|
+
--audience-a tp-sales-1,tp-sales-2 \\
|
|
2854
|
+
--audience-b tp-cto-skeptic-1,tp-cto-skeptic-2 \\
|
|
2855
|
+
--scenario-a @./sales_rep.md \\
|
|
2856
|
+
--scenario-b @./skeptical_cto.md \\
|
|
2857
|
+
--assignment "Pitch:Try to win the meeting"
|
|
2858
|
+
\`\`\`
|
|
2859
|
+
|
|
2860
|
+
Or add a pair iteration to an existing chat study:
|
|
2861
|
+
|
|
2862
|
+
\`\`\`
|
|
2863
|
+
ish iteration create --study stu-xyz --chat-mode tester_pair \\
|
|
2864
|
+
--audience-a tp-a1,tp-a2 --audience-b tp-b1,tp-b2 \\
|
|
2865
|
+
--scenario-a "..." --scenario-b "..." \\
|
|
2866
|
+
--max-turns 14
|
|
2867
|
+
\`\`\`
|
|
2868
|
+
|
|
2869
|
+
### Rehearsing against N variations of one side (1×N)
|
|
2870
|
+
|
|
2871
|
+
The most common rehearsal shape: fix one side (your role) and vary
|
|
2872
|
+
the other (the audience you're rehearsing against). E.g. "pitch this
|
|
2873
|
+
once and see how it lands against 3 different skeptical CTOs."
|
|
2874
|
+
|
|
2875
|
+
Step 1 — produce N distinct profiles for the varying side:
|
|
2876
|
+
|
|
2877
|
+
\`\`\`bash
|
|
2878
|
+
# Generate 3 skeptical-CTO profiles (or any archetype):
|
|
2879
|
+
ish profile generate \\
|
|
2880
|
+
--description "Skeptical CTO at a Series B SaaS startup; distrusts AI vendors" \\
|
|
2881
|
+
--count 3 --json | jq -r '.items[].alias'
|
|
2882
|
+
# → tp-cto1, tp-cto2, tp-cto3
|
|
2883
|
+
\`\`\`
|
|
2884
|
+
|
|
2885
|
+
If you already have profiles you want to reuse, list them:
|
|
2886
|
+
|
|
2887
|
+
\`\`\`bash
|
|
2888
|
+
ish profile list --search "cto" --json | jq -r '.items[].alias'
|
|
2889
|
+
\`\`\`
|
|
2890
|
+
|
|
2891
|
+
Step 2 — author the two scenarios as separate files (\`sales_rep.md\`
|
|
2892
|
+
and \`skeptical_cto.md\`). **Each scenario is a system prompt for one
|
|
2893
|
+
role — the other side never sees it.** Cover voice, what they know,
|
|
2894
|
+
what they don't know, and what counts as success for them. Don't
|
|
2895
|
+
cram demographic constraints into the text; that's what
|
|
2896
|
+
\`--role-criteria-\*\` is for. See the **"Writing a good scenario"**
|
|
2897
|
+
section below for the Maya/Devon worked example and the 5-point
|
|
2898
|
+
template.
|
|
2899
|
+
|
|
2900
|
+
Step 3 — create the iteration with **one profile** on the fixed
|
|
2901
|
+
side and **N profiles** on the varying side. The CLI auto-broadcasts
|
|
2902
|
+
the singleton to match length N (and prints a stderr notice like
|
|
2903
|
+
\`Broadcasting --audience-a (1 profile) to length 3 to match --audience-b\`
|
|
2904
|
+
when it does, so you can see it happen):
|
|
2905
|
+
|
|
2906
|
+
\`\`\`bash
|
|
2907
|
+
ish study create \\
|
|
2908
|
+
--modality chat --chat-mode tester_pair \\
|
|
2909
|
+
--name "Pitch rehearsal — 3 CTO variants" \\
|
|
2910
|
+
--audience-a tp-rep \\
|
|
2911
|
+
--audience-b tp-cto1,tp-cto2,tp-cto3 \\
|
|
2912
|
+
--scenario-a @./sales_rep.md \\
|
|
2913
|
+
--scenario-b @./skeptical_cto.md \\
|
|
2914
|
+
--assignment "Pitch:Land a pilot or a clear next step"
|
|
2915
|
+
|
|
2916
|
+
# Result: 3 conversations, all using tp-rep on side A, one each
|
|
2917
|
+
# of tp-cto1/2/3 on side B. Same scenario for the CTOs (they share
|
|
2918
|
+
# the role description) but different underlying personas, so the
|
|
2919
|
+
# conversations diverge in tone and pressure points.
|
|
2920
|
+
\`\`\`
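The broadcast-then-pair rule above can be pictured with a short Python sketch. It is a model of the described behavior, not the CLI's code: \`pair_audiences\` is a hypothetical name, and raising an error when two multi-profile lists differ in length is an assumption, since the text only describes the singleton case.

```python
def pair_audiences(audience_a: list, audience_b: list) -> list:
    """Broadcast a singleton side to the other side's length, then pair by index."""
    if len(audience_a) == 1 and len(audience_b) > 1:
        audience_a = audience_a * len(audience_b)
    elif len(audience_b) == 1 and len(audience_a) > 1:
        audience_b = audience_b * len(audience_a)
    # Assumption: unequal lengths with no singleton side cannot be paired 1:1.
    if len(audience_a) != len(audience_b):
        raise ValueError("audiences must have equal length or one side must be a single profile")
    return list(zip(audience_a, audience_b))


pairs = pair_audiences(["tp-rep"], ["tp-cto1", "tp-cto2", "tp-cto3"])
for index, (side_a, side_b) in enumerate(pairs):
    print(index, side_a, side_b)
```

Each resulting tuple corresponds to one conversation, which is why a 1×3 setup yields three conversations with the same profile on side A.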
|
|
2921
|
+
|
|
2922
|
+
Run it (\`--yes\` to skip the confirmation prompt):
|
|
2923
|
+
|
|
2924
|
+
\`\`\`bash
|
|
2925
|
+
ish study run -y --wait
|
|
2926
|
+
\`\`\`
|
|
2927
|
+
|
|
2928
|
+
Inspect the per-conversation summaries side-by-side:
|
|
2929
|
+
|
|
2930
|
+
\`\`\`bash
|
|
2931
|
+
ish iteration get <iter-id> --json \\
|
|
2932
|
+
| jq '.conversations[] | {pair_index, end_reason, dominant_dynamic: .summary.dominant_dynamic}'
|
|
2933
|
+
\`\`\`
|
|
2934
|
+
|
|
2935
|
+
**When to use criteria instead**: if you don't care about specific
|
|
2936
|
+
profile IDs and just want "any 3 CTOs the backend can find", pass
|
|
2937
|
+
\`--role-criteria-b '{"occupation":["cto"]}'\` (alone or with a single
|
|
2938
|
+
\`--audience-a tp-rep\`). The backend resolves the matching pool at
|
|
2939
|
+
iteration-create time. Caveat: the resolved pool may collapse onto
|
|
2940
|
+
similar personas — for guaranteed distinctness, generate explicit
|
|
2941
|
+
profiles first.
|
|
2942
|
+
|
|
2943
|
+
### Criteria-driven audience (persona-first filtering)
|
|
2944
|
+
|
|
2945
|
+
When you don't want to hand-pick UUIDs, pass a **role-criteria
|
|
2946
|
+
filter** per side. The backend resolves it into an eligible pool of
|
|
2947
|
+
tester profiles and pairs them 1:1. The persona itself is never
|
|
2948
|
+
altered — criteria filter the pool upstream so the persona is
|
|
2949
|
+
already plausible for the role:
|
|
2950
|
+
|
|
2951
|
+
\`\`\`
|
|
2952
|
+
ish study create \\
|
|
2953
|
+
--modality chat --chat-mode tester_pair \\
|
|
2954
|
+
--name "Pitch rehearsal" \\
|
|
2955
|
+
--role-criteria-a '{"occupation":["sales","account executive"],"min_age":28}' \\
|
|
2956
|
+
--role-criteria-b '{"occupation":["cto","vp engineering"],"country":["US","SE"]}' \\
|
|
2957
|
+
--scenario-a @./sales_rep.md --scenario-b @./skeptical_cto.md \\
|
|
2958
|
+
--assignment "Pitch:Try to land a pilot"
|
|
2959
|
+
\`\`\`
|
|
2960
|
+
|
|
2961
|
+
Keys (all optional): \`occupation\`, \`min_age\`, \`max_age\`,
|
|
2962
|
+
\`gender\`, \`country\`, \`education_level_in\`, \`household_in\`,
|
|
2963
|
+
\`locale_type_in\`, \`income_level_in\`, \`employment_status_in\`,
|
|
2964
|
+
\`requires_captions\`, \`uses_screen_reader\`, \`prefers_reduced_motion\`,
|
|
2965
|
+
\`prefers_high_contrast\`, \`has_any_accessibility_need\`. The five \`*_in\`
|
|
2966
|
+
arrays accept snake_case spec values; the five accessibility filters are
|
|
2967
|
+
booleans. Combine \`--profile-*\` and \`--role-criteria-*\` on the same side
|
|
2968
|
+
to make criteria validate an explicit list (mismatch blocks the run).
|
|
2969
|
+
|
|
2970
|
+
MECE notes for the list filters:
|
|
2971
|
+
- \`household_in\`: \`couple_with_kids\` covers couples raising children;
|
|
2972
|
+
\`couple_no_kids\` is strictly child-free. \`single\` means lives alone
|
|
2973
|
+
(no partner, no roommates, no parents, no children in the household).
|
|
2974
|
+
- \`employment_status_in\`: pick the tester's primary daytime activity.
|
|
2975
|
+
A student who works 15 hrs/week is \`student\`; a retiree who freelances
|
|
2976
|
+
is \`retired\`.
|
|
2977
|
+
|
|
2978
|
+
If the resolved pool is too small, \`ish study run\` exits 2 with the
|
|
2979
|
+
backend's error message intact — no silent fallback. Broaden the
|
|
2980
|
+
criteria or generate more matching profiles via
|
|
2981
|
+
\`ish profile generate --description "..."\`.
|
|
2982
|
+
|
|
2983
|
+
Dispatch is per-Conversation (one task per pair index). Run-time
|
|
2984
|
+
audience overrides (\`--profile\`, \`--sample\`, \`--all\`, demographic
|
|
2985
|
+
filters) are refused on pair iterations — the iteration's audiences
|
|
2986
|
+
are authoritative. To change them, update the iteration:
|
|
2987
|
+
|
|
2988
|
+
\`\`\`
|
|
2989
|
+
ish study run --study stu-xyz --iteration i-pair -y
|
|
2990
|
+
ish iteration update i-pair --details-json '{...}' # change audiences
|
|
2991
|
+
\`\`\`
|
|
2992
|
+
|
|
2993
|
+
Inspect:
|
|
2994
|
+
|
|
2995
|
+
\`\`\`
|
|
2996
|
+
ish iteration get i-pair --json | jq '.details.mode_details.mode, .conversations[]'
|
|
2997
|
+
\`\`\`
|
|
2998
|
+
|
|
2999
|
+
Per-Conversation summaries (\`end_reason\`, \`dominant_dynamic\`,
|
|
3000
|
+
\`who_steered\`) land on \`iteration.conversations[]\`. Per-tester
|
|
3001
|
+
summaries land on \`tester.summary\` as before.
|
|
3002
|
+
|
|
2144
3003
|
## Active-endpoint convention
|
|
2145
3004
|
|
|
2146
3005
|
\`ish chat endpoint use <id>\` writes the endpoint to
|
|
@@ -2171,12 +3030,211 @@ Mirrors \`workspace use\` / \`study use\` / \`ask use\`.
|
|
|
2171
3030
|
|
|
2172
3031
|
## Related
|
|
2173
3032
|
|
|
2174
|
-
- \`concepts/iteration\` — chat iteration shape
|
|
2175
|
-
\`details.
|
|
3033
|
+
- \`concepts/iteration\` — chat iteration shape
|
|
3034
|
+
(\`details.mode_details\` discriminator, \`mode_details.endpoint\` /
|
|
3035
|
+
\`mode_details.chatbot_endpoint_id\` for external_chatbot,
|
|
3036
|
+
\`mode_details.audience_a/_b\` + \`scenario_a/_b\` for tester_pair,
|
|
3037
|
+
\`details.max_turns\`).
|
|
2176
3038
|
- \`concepts/study\` — modality + assignments + iteration nesting.
|
|
2177
3039
|
- \`reference/json-mode\` — JSON output, error envelope, exit codes.
|
|
2178
3040
|
- \`guides/first-study\` — the same pattern for an interactive
|
|
2179
3041
|
modality study.
|
|
3042
|
+
- \`guides/cold-start\` — the saturated-account first-step playbook
|
|
3043
|
+
if \`workspace_create\` returns \`usage_limit_reached\`.
|
|
3044
|
+
`;
|
|
3045
|
+
const GUIDE_COLD_START = `# guide: cold start on a saturated account
|
|
3046
|
+
|
|
3047
|
+
The naive cold-start instruction — "create a fresh workspace, then run
|
|
3048
|
+
a study" — fails immediately on any account that has accumulated state.
|
|
3049
|
+
\`workspace_create\` (CLI: \`ish workspace create\`) returns
|
|
3050
|
+
\`error_code: usage_limit_reached\` once the caller hits
|
|
3051
|
+
\`maxProducts\` for their tier (1 on FREE). On a saturated dogfood
|
|
3052
|
+
account this is the first call an agent burns. This guide is the
|
|
3053
|
+
recovery path: inspect existing state, pick a reuse target, or call
|
|
3054
|
+
the idempotent create-or-reuse-by-name path.
|
|
3055
|
+
|
|
3056
|
+
## The shape of the failure
|
|
3057
|
+
|
|
3058
|
+
\`\`\`json
|
|
3059
|
+
// workspace_create / POST /products on a FREE-tier account with 22 workspaces:
|
|
3060
|
+
{
|
|
3061
|
+
"error": "Free plan allows 1 workspace (you have 22).",
|
|
3062
|
+
"error_code": "usage_limit_reached",
|
|
3063
|
+
"status": 403,
|
|
3064
|
+
"retryable": false,
|
|
3065
|
+
"tier": "free",
|
|
3066
|
+
"limit": "maxProducts",
|
|
3067
|
+
"current": 22,
|
|
3068
|
+
"max": 1,
|
|
3069
|
+
"upgrade_url": "https://app.ishlabs.io/billing"
|
|
3070
|
+
}
|
|
3071
|
+
\`\`\`
|
|
3072
|
+
|
|
3073
|
+
Don't retry. The cap is server-enforced. You have three recovery
|
|
3074
|
+
paths:
|
|
3075
|
+
|
|
3076
|
+
1. **Reuse an existing workspace** (most cases).
|
|
3077
|
+
2. **Use the idempotent \`--ensure\` path** if you have a stable name
|
|
3078
|
+
the user wants to claim.
|
|
3079
|
+
3. **Surface the upgrade link** if neither fits.
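
One way an agent could turn the envelope above into a decision, as a minimal sketch. The envelope field names (`error_code`, `retryable`, `limit`, `current`, `max`, `upgrade_url`) are taken from the JSON shown earlier; the function name and the `action` labels are assumptions made for illustration, not part of the CLI.

```javascript
// Hypothetical helper: classify a failed workspace_create response.
function classifyCreateFailure(envelope) {
  if (!envelope || envelope.error_code !== "usage_limit_reached") {
    // Anything else is out of scope here; only honor an explicit retryable flag.
    return { action: "unknown", retry: Boolean(envelope && envelope.retryable) };
  }
  return {
    action: "reuse-or-upgrade", // illustrative label, not a CLI value
    retry: false,               // the cap is server-enforced, so never retry
    limit: envelope.limit,
    overBy: envelope.current - envelope.max,
    upgradeUrl: envelope.upgrade_url,
  };
}

const verdict = classifyCreateFailure({
  error_code: "usage_limit_reached",
  retryable: false,
  limit: "maxProducts",
  current: 22,
  max: 1,
  upgrade_url: "https://app.ishlabs.io/billing",
});
console.log(verdict.action, verdict.overBy);
```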
|
|
3080
|
+
|
|
3081
|
+
## Step 1 — inspect before you create
|
|
3082
|
+
|
|
3083
|
+
Always start a cold-start session by listing what's already there.
|
|
3084
|
+
\`workspace_get\` / \`ish workspace list --json\` return rows with
|
|
3085
|
+
the metadata you need to pick safely:
|
|
3086
|
+
|
|
3087
|
+
\`\`\`bash
|
|
3088
|
+
ish workspace list --json
|
|
3089
|
+
\`\`\`
|
|
3090
|
+
|
|
3091
|
+
\`\`\`json
|
|
3092
|
+
{
|
|
3093
|
+
"items": [
|
|
3094
|
+
{
|
|
3095
|
+
"id": "...", "alias": "w-6ec", "name": "Onboarding revamp",
|
|
3096
|
+
"base_url": "https://example.com",
|
|
3097
|
+
"last_activity_at": "2026-05-10T14:22:00Z",
|
|
3098
|
+
"child_counts": { "studies": 2, "asks": 1, "tester_profiles": 4 },
|
|
3099
|
+
"has_headroom": true
|
|
3100
|
+
},
|
|
3101
|
+
{
|
|
3102
|
+
"id": "...", "alias": "w-d02", "name": "Demo",
|
|
3103
|
+
"last_activity_at": "2025-11-02T09:11:00Z",
|
|
3104
|
+
"child_counts": { "studies": 3, "asks": 0, "tester_profiles": 0 },
|
|
3105
|
+
"has_headroom": false
|
|
3106
|
+
}
|
|
3107
|
+
],
|
|
3108
|
+
"total": 22, "returned": 22, "limit": 50, "offset": 0, "has_more": false
|
|
3109
|
+
}
|
|
3110
|
+
\`\`\`
|
|
3111
|
+
|
|
3112
|
+
Read three fields per row:
|
|
3113
|
+
|
|
3114
|
+
- **\`last_activity_at\`** — most recent run, iteration, ask, or write
|
|
3115
|
+
on this workspace. The most recently active one is usually the
|
|
3116
|
+
workspace the user is mentally already in.
|
|
3117
|
+
- **\`child_counts\`** — \`{ studies, asks, tester_profiles }\`. Zero
|
|
3118
|
+
across the board = quiet/empty, ideal reuse target without
|
|
3119
|
+
cluttering anyone's view. A workspace with content the user owns is
|
|
3120
|
+
also fine to reuse if there's still headroom.
|
|
3121
|
+
- **\`has_headroom\`** — \`true\` if the workspace still has room under
|
|
3122
|
+
\`maxStudiesPerProduct\`, \`maxIterationsPerStudy\`, and
|
|
3123
|
+
\`maxCustomTesterProfiles\` for the caller's tier. If \`false\`, the
|
|
3124
|
+
next \`study create\` / \`profile generate\` against this workspace
|
|
3125
|
+
will be \`usage_limit_reached\`. Filter these out unless the user
|
|
3126
|
+
explicitly wants to free space by deleting state.
|
|
3127
|
+
|
|
3128
|
+
## Step 2 — pick a reuse target (decision rule)
|
|
3129
|
+
|
|
3130
|
+
\`\`\`
|
|
3131
|
+
For each workspace in workspace_get():
|
|
3132
|
+
if has_headroom == false: skip (next call would fail)
|
|
3133
|
+
if name matches the user's intent: use it (early return)
|
|
3134
|
+
if child_counts == 0 across board: candidate (empty workspace)
|
|
3135
|
+
else candidate (active but not user's intent)
|
|
3136
|
+
|
|
3137
|
+
If candidates exist:
|
|
3138
|
+
prefer name-match > most-recent last_activity_at > lowest child_counts
|
|
3139
|
+
|
|
3140
|
+
If zero candidates with has_headroom == true:
|
|
3141
|
+
the account is genuinely saturated — surface upgrade_url
|
|
3142
|
+
from the next workspace_create's error envelope.
|
|
3143
|
+
\`\`\`
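
The decision rule above could be sketched in JavaScript roughly as follows. The row fields (`has_headroom`, `name`, `last_activity_at`, `child_counts`) come from the list output shown earlier; the function names are hypothetical, and treating "name matches the user's intent" as a case-insensitive exact match is an assumption, since the rule does not define matching.

```javascript
// Sum of studies + asks + tester_profiles for one workspace row.
function totalChildren(ws) {
  const c = ws.child_counts || {};
  return (c.studies || 0) + (c.asks || 0) + (c.tester_profiles || 0);
}

// Hypothetical helper: returns the row to reuse, or null if none has headroom.
function pickReuseTarget(items, intentName) {
  // Rows without headroom are skipped: the next create against them would fail.
  const usable = items.filter((ws) => ws.has_headroom === true);

  // Assumption: "matches the user's intent" means same name, ignoring case.
  const wanted = (intentName || "").trim().toLowerCase();
  const match = wanted && usable.find((ws) => (ws.name || "").toLowerCase() === wanted);
  if (match) return match;

  // Otherwise: most recent last_activity_at first, then fewest children.
  const ts = (ws) => Date.parse(ws.last_activity_at) || 0;
  const ranked = [...usable].sort(
    (a, b) => ts(b) - ts(a) || totalChildren(a) - totalChildren(b)
  );
  return ranked[0] || null;
}
```

A `null` result corresponds to the "zero candidates" branch of the rule, where the agent surfaces `upgrade_url` instead of picking a workspace.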
|
|
3144
|
+
|
|
3145
|
+
\`\`\`bash
|
|
3146
|
+
ish workspace use w-6ec # commit the choice; saves to ~/.ish/config.json
|
|
3147
|
+
\`\`\`
|
|
3148
|
+
|
|
3149
|
+
## Step 3 — or use \`--ensure\` to skip the decision tree
|
|
3150
|
+
|
|
3151
|
+
When you have a stable workspace name the user owns (e.g. a brand
|
|
3152
|
+
name, a project codename), use the idempotent path:
|
|
3153
|
+
|
|
3154
|
+
\`\`\`bash
|
|
3155
|
+
ish workspace create --name "Acme — onboarding revamp" --ensure
|
|
3156
|
+
\`\`\`
|
|
3157
|
+
|
|
3158
|
+
Behavior:
|
|
3159
|
+
|
|
3160
|
+
- If a workspace with that exact name exists and is owned by the
|
|
3161
|
+
caller, returns it (HTTP 200, no quota consumed, no error).
|
|
3162
|
+
- Otherwise creates a fresh one (HTTP 201; consumes one
|
|
3163
|
+
\`maxProducts\` slot, so still subject to the tier cap).
|
|
3164
|
+
- The returned envelope is the same shape either way — agents don't
|
|
3165
|
+
need to branch on created vs. reused.
|
|
3166
|
+
|
|
3167
|
+
This is the right call when you don't want to scrape the list
|
|
3168
|
+
yourself or risk a name clash. Pair it with the inspection step
|
|
3169
|
+
when the saturated state matters (e.g. you also need to know
|
|
3170
|
+
\`has_headroom\` before \`study create\`).
|
|
3171
|
+
|
|
3172
|
+
## Worked transcript — saturated account, agent recovery
|
|
3173
|
+
|
|
3174
|
+
\`\`\`bash
|
|
3175
|
+
# 1. Probe state before doing anything else.
|
|
3176
|
+
ish workspace list --json --fields alias,name,last_activity_at,child_counts,has_headroom \\
|
|
3177
|
+
| jq '.items | sort_by(.last_activity_at) | reverse | .[0:5]'
|
|
3178
|
+
|
|
3179
|
+
# Output (truncated to top-5 most-recently-active):
|
|
3180
|
+
# [
|
|
3181
|
+
# {"alias":"w-6ec","name":"Onboarding revamp",
|
|
3182
|
+
# "last_activity_at":"2026-05-10T14:22:00Z",
|
|
3183
|
+
# "child_counts":{"studies":2,"asks":1,"tester_profiles":4},
|
|
3184
|
+
# "has_headroom":true},
|
|
3185
|
+
# {"alias":"w-d02","name":"Demo",
|
|
3186
|
+
# "last_activity_at":"2025-11-02T09:11:00Z",
|
|
3187
|
+
# "child_counts":{"studies":3,"asks":0,"tester_profiles":0},
|
|
3188
|
+
# "has_headroom":false},
|
|
3189
|
+
# ...
|
|
3190
|
+
# ]
|
|
3191
|
+
|
|
3192
|
+
# 2. Pick a workspace with has_headroom=true (w-6ec here).
|
|
3193
|
+
ish workspace use w-6ec
|
|
3194
|
+
|
|
3195
|
+
# 3. Carry on as if the workspace_create had succeeded.
|
|
3196
|
+
ish profile generate --description "..." --count 3
|
|
3197
|
+
ish study create --modality interactive --name "..." \\
|
|
3198
|
+
--url https://example.com \\
|
|
3199
|
+
--assignment "..." --question "..."
|
|
3200
|
+
ish study run --all --wait
|
|
3201
|
+
\`\`\`
|
|
3202
|
+
|
|
3203
|
+
If the agent prefers \`--ensure\` (e.g. so the user sees their
|
|
3204
|
+
preferred name in the UI):
|
|
3205
|
+
|
|
3206
|
+
\`\`\`bash
|
|
3207
|
+
WS=$(ish workspace create --name "Cold-start probe" --ensure --get alias)
|
|
3208
|
+
ish workspace use "$WS"
|
|
3209
|
+
\`\`\`
|
|
3210
|
+
|
|
3211
|
+
## When the account is genuinely saturated
|
|
3212
|
+
|
|
3213
|
+
If every workspace has \`has_headroom: false\` AND \`maxProducts\` is
|
|
3214
|
+
at cap (\`current >= max\`), there is no path to a new study without
|
|
3215
|
+
either upgrading the plan or deleting an existing workspace. Surface
|
|
3216
|
+
the \`upgrade_url\` from the \`usage_limit_reached\` envelope to the
|
|
3217
|
+
human and stop — don't guess which workspace to delete on the user's
|
|
3218
|
+
behalf.
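
The stop condition described above might be checked like this, as a sketch under stated assumptions. The inputs are the `items` array from `ish workspace list --json` and a `usage_limit_reached` envelope; the function name is hypothetical, and treating a missing or `null` `max` as "no cap" is an assumption carried over from how `*_max` fields are described elsewhere in these docs.

```javascript
// Hypothetical helper: true only when no workspace can take new state
// and the workspace cap itself is exhausted.
function isGenuinelySaturated(items, envelope) {
  const noHeadroom = items.every((ws) => ws.has_headroom !== true);
  const atCap =
    envelope.limit === "maxProducts" &&
    envelope.max != null && // assumption: null or missing max means unlimited
    envelope.current >= envelope.max;
  return noHeadroom && atCap;
}

const capEnvelope = { limit: "maxProducts", current: 22, max: 1 };
console.log(isGenuinelySaturated([{ has_headroom: false }], capEnvelope));
```

When this returns `true`, the agent hands `upgrade_url` to the human rather than deleting anything itself.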
|
|
3219
|
+
|
|
3220
|
+
## Why this matters
|
|
3221
|
+
|
|
3222
|
+
Two of four dogfood agents stopped on \`workspace_create\` on a
|
|
3223
|
+
saturated account before producing any signal — the very first call
|
|
3224
|
+
in the cold-start script was the cap-hitter. Inspecting
|
|
3225
|
+
\`workspace_get\` first (or going through \`--ensure\`) cuts that
|
|
3226
|
+
class of failure to zero. The \`last_activity_at\` / \`child_counts\` /
|
|
3227
|
+
\`has_headroom\` fields exist specifically so an agent can branch
|
|
3228
|
+
without a second round-trip.
|
|
3229
|
+
|
|
3230
|
+
## Related
|
|
3231
|
+
|
|
3232
|
+
- \`concepts/workspace\` — workspace fundamentals, including
|
|
3233
|
+
\`workspace info\` for in-workspace usage counters.
|
|
3234
|
+
- \`reference/billing-limits\` — the full tier × cap table; \`maxProducts\`
|
|
3235
|
+
drives \`workspace_create\` rejections.
|
|
3236
|
+
- \`reference/json-mode\` — error envelope shape and exit code mapping
|
|
3237
|
+
(\`usage_limit_reached\` is HTTP 403, exit 1, non-retryable).
|
|
2180
3238
|
`;
|
|
2181
3239
|
const PAGES = [
|
|
2182
3240
|
{
|
|
@@ -2200,7 +3258,7 @@ const PAGES = [
|
|
|
2200
3258
|
{
|
|
2201
3259
|
slug: "concepts/iteration",
|
|
2202
3260
|
title: "concept: iteration",
|
|
2203
|
-
description: "One configured run of a study (URL, media, or chat). Covers segments, segment labels, and
|
|
3261
|
+
description: "One configured run of a study (URL, media, or chat). Covers segments, segment labels, HTML content, and chat mode_details (external_chatbot vs tester_pair).",
|
|
2204
3262
|
body: CONCEPT_ITERATION,
|
|
2205
3263
|
},
|
|
2206
3264
|
{
|
|
@@ -2293,6 +3351,12 @@ const PAGES = [
|
|
|
2293
3351
|
description: "Per-tier caps on workspaces/studies/iterations/profiles; usage_limit_reached error shape.",
|
|
2294
3352
|
body: REFERENCE_BILLING_LIMITS,
|
|
2295
3353
|
},
|
|
3354
|
+
{
|
|
3355
|
+
slug: "reference/credits",
|
|
3356
|
+
title: "reference: credits & cost preview",
|
|
3357
|
+
description: "Per-modality credit cost formulas, where the CLI surfaces cost estimates (Scale line, pair_preview.credit_estimate, top-level credit_estimate), tier allotments, insufficient_credits error shape.",
|
|
3358
|
+
body: REFERENCE_CREDITS,
|
|
3359
|
+
},
|
|
2296
3360
|
{
|
|
2297
3361
|
slug: "guides/first-study",
|
|
2298
3362
|
title: "guide: your first study, end to end",
|
|
@@ -2302,9 +3366,15 @@ const PAGES = [
|
|
|
2302
3366
|
{
|
|
2303
3367
|
slug: "guides/chat",
|
|
2304
3368
|
title: "guide: chat-modality studies",
|
|
2305
|
-
description: "Configure a chatbot endpoint (slots-only model), smoke test it, run a chat-modality study.
|
|
3369
|
+
description: "Configure a chatbot endpoint (slots-only model), smoke test it, run a chat-modality study (external_chatbot mode). Also: tester_pair mode — two AI personas talk to each other for rehearsal scenarios.",
|
|
2306
3370
|
body: GUIDE_CHAT,
|
|
2307
3371
|
},
|
|
3372
|
+
{
|
|
3373
|
+
slug: "guides/cold-start",
|
|
3374
|
+
title: "guide: cold start on a saturated account",
|
|
3375
|
+
description: "What to do when workspace_create returns usage_limit_reached on a saturated account. Inspect workspace_get (has_headroom / child_counts / last_activity_at), pick a reuse target, or call ish workspace create --name <name> --ensure.",
|
|
3376
|
+
body: GUIDE_COLD_START,
|
|
3377
|
+
},
|
|
2308
3378
|
];
|
|
2309
3379
|
const PAGES_BY_SLUG = new Map(PAGES.map((p) => [p.slug, p]));
|
|
2310
3380
|
export function listPages() {
|