@ishlabs/cli 0.15.0 → 0.17.0

@@ -24,506 +24,205 @@ const VERSION = pkg.version;
24
24
  * "ish". Hard cap is 1024 chars. Front-load the use case.
25
25
  */
26
26
  const SKILL_DESCRIPTION = "Use this skill whenever the user mentions ish, a study, a tester profile, " +
27
- "a simulation run, an \"ask\", an audience, wants to dispatch tests against AI testers, " +
28
- "or wants to rehearse a conversation between two AI personas (e.g. sales rep vs. " +
29
- "skeptical buyer, founder vs. investor archetype). Wraps the `ish` CLI for managing " +
30
- "studies, asks, iterations, tester profiles, chatbot endpoints, and simulation runs " +
31
- "against the Ish platform. Always start by running `ish docs overview` to load the " +
32
- "domain model, then `ish docs list` and `ish docs get-page <slug>` for specifics. " +
33
- "Prefer this skill over guessing flags from `ish --help`.";
27
+ "a simulation run, an \"ask\", an audience, a chatbot probe, wants to " +
28
+ "dispatch tests against AI testers, or wants to rehearse a conversation " +
29
+ "between two AI personas (e.g. sales rep vs. skeptical buyer). Covers both " +
30
+ "the `ish` CLI (via Bash) and the hosted ish MCP server " +
31
+ "(`mcp__claude_ai_ish__*` on claude.ai); same operations, pick whichever " +
32
+ "your environment has. Read this skill first to orient on the mental model, " +
33
+ "then trust `ish docs` (CLI) or the MCP tool descriptions for argument details.";
34
34
  const SKILL_BODY = `# ish
35
35
 
36
- A CLI for the Ish platform: run user-research studies and quick "ask"
37
- reactions against AI tester audiences. The CLI is the agent surface;
38
- this skill teaches you how to use it without re-reading its docs every
39
- time.
36
+ ish runs user-research simulations: simulated people experience your draft (page, copy, ad, pitch, chatbot, video, document) and report what they noticed, where they stalled, what they would do next. Use before shipping, when you need a fast reaction round, or to rehearse a conversation between two AI personas.
40
37
 
41
- ## When to invoke this skill
38
+ ## When to invoke
42
39
 
43
- The user mentioned any of: \`ish\`, a study, a tester profile,
44
- a tester source, a simulation run, an iteration, an "ask", an audience,
45
- or wants to dispatch tests against AI testers. Also invoke if the user
46
- asks to "run a study", "generate testers", "compare variants", "test a
47
- prototype with users", or similar.
40
+ The user mentioned \`ish\`, a study, an "ask", a tester profile, an audience, a simulation, "rehearse", "compare variants", "test before shipping", "probe a chatbot".
48
41
 
49
- ## First step, every time: load the mental model
42
+ ## Drivers
50
43
 
51
- Before producing any \`ish\` command, run:
44
+ ish has two surfaces; pick whichever your environment has:
52
45
 
53
- \`\`\`bash
54
- ish docs overview
55
- \`\`\`
46
+ - **MCP** — \`mcp__claude_ai_ish__*\` on claude.ai. Tool descriptions are authoritative for argument schemas.
47
+ - **CLI** — the \`ish\` binary. \`ish --help\` per command; \`ish docs overview\` / \`ish docs list\` / \`ish docs search\` / \`ish docs get-page <slug>\` for concept docs.
56
48
 
57
- This prints a one-page mental model (workspace → study | ask → testers
58
- → results) and lists every concept page available offline. The model is
59
- non-obvious — *do not* skip this step the first time the user asks for
60
- anything ish-related in a session.
49
+ Both wrap the same operations. If neither is present, tell the user: \`npm i -g @ishlabs/cli\`, or enable the ish connector on claude.ai. Don't try to drive ish without a driver.
61
50
 
62
- If you need detail on a specific concept:
63
-
64
- \`\`\`bash
65
- ish docs list # every page available
66
- ish docs get-page concepts/study # one page, full markdown
67
- ish docs get-page concepts/run-verbs # study run vs ask run
68
- ish docs search "<keyword>" # ranked hits with snippets
69
- \`\`\`
51
+ **When both are available, pick by op:**
52
+ - Streaming results to a watching user → **CLI** with \`--wait\` (per-tester output as testers complete).
53
+ - Structured one-shot reads or run dispatch → **MCP** (JSON in, JSON out, no shell).
54
+ - Idempotent setup (e.g. cold-start workspace) → **CLI** has \`--ensure\`; MCP doesn't.
55
+ - Local file uploads (images, video, docs) → **CLI** only — MCP doesn't accept binaries.
70
56
 
71
- The pages \`ish docs\` exposes are the source of truth, newer than this
72
- skill file. **Trust \`ish docs\` over anything in this skill if they
73
- conflict.**
57
+ **Naming convention in this skill**: shapes below use MCP tool names (\`ask_run\`, \`study_create\`, \`chat_endpoint_init\`, …). The CLI equivalents are the same names kebab-cased under a noun group (\`ish ask run\`, \`ish study create\`, \`ish chat endpoint init\`, …). When in doubt: \`ish --help\` or \`ish <noun> --help\`.
74
58
 
75
- ## Quick orientation (one-screen)
59
+ ## Mental model
76
60
 
77
61
  \`\`\`
78
62
  Workspace (= product)
79
- ├── Tester Profiles (tp-…) reusable audience personas
80
- │ └── Sources (tps-…) transcripts/audio/images that seed generation
81
- ├── Study (s-…) persistent research artifact
82
- ├── modality interactive | text | video | audio | image | document | chat
83
- │ chat has two modes: external_chatbot (probe a customer bot)
84
- │ │ and tester_pair (two AI personas converse: rehearsal)
85
- │ ├── assignments tasks the tester does
86
- │ ├── questionnaire questions the tester answers
87
- │ └── Iterations (i-…) one configured run; carries the URL or media
88
- │ └── Testers (t-…) instance of a profile in this iteration
89
- └── Ask (a-…) lightweight reaction artifact
90
- └── Rounds unit of execution; audience fixed at ask creation
63
+ ├── Tester Profile (tp-…) reusable AI persona
64
+ ├── Study (s-…) persistent artifact for testing a real surface
65
+ │ └── Iteration (i-…) one configured run; carries the URL or media
66
+ ├── Ask (a-…) lightweight artifact for reactions to text/image variants
67
+ │ └── Round unit of execution; audience fixed at ask creation
68
+ └── Chat Endpoint workspace-level definition of an external chatbot
69
+ (referenced by study modality: chat, mode: external_chatbot)
91
70
  \`\`\`
92
71
 
93
- Two run verbs:
94
- - \`ish study run\` — dispatches simulations on the latest iteration of a study.
95
- - \`ish ask run\` — appends a round to an ask (or \`--new\` to create one).
96
-
97
- Use **study** when the tester must *do* something on a real surface;
98
- use **ask** for quick reactions to text/image variants.
99
-
100
- **Cold-start caveat — "create a fresh workspace" is conditional on
101
- quota headroom.** \`workspace_create\` returns
102
- \`error_code: usage_limit_reached\` the instant the account is at
103
- \`maxProducts\` (FREE caps at 1). Always inspect with \`workspace_get\`
104
- first and check the \`has_headroom\` flag per row, or use
105
- \`ish workspace create --name <name> --ensure\` — idempotent: returns
106
- the existing workspace by name when one exists, otherwise creates. See
107
- \`ish docs get-page guides/cold-start\` before producing a
108
- workspace_create call on a session you haven't already probed.
109
-
110
- ## High-frequency commands
111
-
112
- \`\`\`bash
113
- # First command on a cold start — confirms login + active context:
114
- ish status # or: ish whoami
115
- # → user, active workspace/study/ask, token validity, API url
116
-
117
- # Auth & active selection (saved to ~/.ish/config.json)
118
- ish login
119
- ish workspace use w-6ec
120
- ish study use s-b2c
121
- ish ask use a-6ec
122
-
123
- # Idempotent workspace create — returns existing if name matches.
124
- # Use this on cold-start instead of a blind workspace_create that may
125
- # hit usage_limit_reached. See \`ish docs get-page guides/cold-start\`.
126
- ish workspace create --name "Acme — onboarding" --ensure
127
-
128
- # Inspect
129
- ish workspace list
130
- ish study list
131
- ish iteration list --study s-b2c
132
- ish ask list
133
-
134
- # Define / configure (one-shot — iteration A inline)
135
- ish study create --modality interactive --name "..." --url https://example.com \
136
- --assignment "..." --question "..."
137
- ish study create --modality image --name "..." \
138
- --image-urls "https://cdn.example.com/a.png,https://cdn.example.com/b.png" \
139
- --assignment "Compare:Which feels more premium?"
140
- ish study create --modality video --name "..." \
141
- --content-url https://cdn.example.com/ad.mp4 --assignment "Watch:..."
142
-
143
- # Or 2-step (when you want to A/B iterations later, or upload local files)
144
- ish study create --name "..." --modality interactive --assignment "..."
145
- ish iteration create --url https://example.com # auto-uploads local files
146
-
147
- ish profile generate --description "..." --count 5
148
-
149
- # Chat modality (external_chatbot — talk to a customer chatbot).
150
- # Audience size lives on study run; study create defines the persistent shape only.
151
- ish chat endpoint init --from-curl ./bot.curl --name my-bot
152
- ish chat endpoint test my-bot -m "Hello"
153
- ish study create --modality chat --endpoint my-bot --assignment "Sign up:Try to sign up"
154
- # (then) ish study run --sample 5 --wait
155
-
156
- # Chat modality (tester_pair — rehearse a conversation between two AI personas).
157
- # Audiences are pinned to the iteration; study run refuses run-time audience
158
- # overrides. Each side accepts EITHER explicit profiles OR a role-criteria
159
- # filter (or both — criteria validates the explicit list).
160
- ish study create --modality chat --chat-mode tester_pair --name "Pitch rehearsal" \\
161
- --audience-a tp-sales-1,tp-sales-2 --audience-b tp-cto-skeptic-1,tp-cto-skeptic-2 \\
162
- --scenario-a @./sales_rep.md --scenario-b @./skeptical_cto.md \\
163
- --assignment "Pitch:Try to win the meeting"
164
- # (then) ish study run -y
165
-
166
- # Criteria-driven variant — backend resolves the eligible pool per side.
167
- # Persona-first: the persona is sacred, criteria filter who plays the role.
168
- ish study create --modality chat --chat-mode tester_pair --name "Pitch rehearsal" \\
169
- --role-criteria-a '{"occupation":["sales"],"min_age":28}' \\
170
- --role-criteria-b '{"occupation":["cto","vp engineering"],"country":["US","SE"]}' \\
171
- --scenario-a @./sales_rep.md --scenario-b @./skeptical_cto.md \\
172
- --assignment "Pitch:Try to land a pilot"
173
-
174
- # Run
175
- ish study run --sample 5 --country SE --wait
176
- ish ask run --new --name "..." --prompt "..." --variant text:"A" --variant text:"B" --sample 30 --wants-pick --wait
177
-
178
- # Stage an ask for human review, then dispatch (no credits charged on stage)
179
- ish ask create --name "..." --prompt "..." --variant text:"A" --variant text:"B" \
180
- --sample 30 --wants-pick --no-dispatch
181
- ish ask dispatch a-6ec --wait
182
-
183
- # Results
184
- ish study results
185
- ish ask results a-6ec --round 1
186
-
187
- # AI summary + key insights (any modality with completed testers)
188
- ish study analyze --wait # trigger + block
189
- ish study insights # read latest
190
-
191
- # Screenshots (interactive studies — see what testers actually saw)
192
- ish study screenshots # list, frame-grouped
193
- ish study screenshots download <study-id> --id <scid> --out shot.png
194
- ish study screenshots download <study-id> --all --out ./shots/
195
-
196
- # Chat configurations (model + system prompt + tools per chatbot endpoint)
197
- ish chat config list # active endpoint
198
- ish chat config set --name v1 --model claude-sonnet-4-6 \\
199
- --system-prompt-file ./prompt.txt --default
200
- ish chat config get cc-abc --view iterations # cross-study use
201
-
202
- # Read offline docs
203
- ish docs overview
204
- ish docs get-page <slug>
205
- ish docs search <query>
206
- \`\`\`
72
+ **Audience is a query, not an entity.** Both \`ask_run\` and \`study_run\` take an \`audience\` argument shaped as \`{ profile_ids: [...] }\` (explicit) or \`{ sample: N, filters: {...} }\` (sampled from an existing pool). There is no \`audience\` resource to create — you build profiles via \`audience_build\` (or reuse existing ones via \`profile_list\`) and pass them in.
207
73
 
208
- ## Common workflows (worked examples)
209
-
210
- See \`references/workflows.md\` in this skill for end-to-end transcripts:
211
- - First study from zero (auth → workspace → audience → study → iteration → run → results)
212
- - Quick A/B ask with image variants
213
- - Generating profiles from a transcript or audio source
214
- - Targeting a gated URL (basic auth, session cookie, login form)
215
- - Re-running a study with a fresh audience
216
- - Extending a tester past its step cap (or redirecting mid-run with \`study extend\`)
217
-
218
- ## Display vs. capture: the right output mode
219
-
220
- Three output modes: pick the one matching your intent, **don't reach
221
- for \`jq\` / \`python\` reflexively**:
222
-
223
- | Intent | Use |
224
- |-------------------------------------------------|----------------------|
225
- | Show the user a list/table | bare command (TTY) or \`--human\` |
226
- | Capture one value to feed into the next command | \`--get <field>\` |
227
- | Parse multiple fields / nested shape | \`--json\` |
228
-
229
- \`--get\` extracts a single field from the JSON response and prints its
230
- bare value. It supports dotted paths and auto-descends into list
231
- \`items\` so \`--get alias\` on a paginated list yields one alias per
232
- line. \`--human\` forces human output even when stdout is piped; use
233
- it when you want to \`tee\` a table to a file but still show it. The
234
- two flags are mutually exclusive (capture and display are different
235
- intents).
236
-
237
- ### Worked example — capture in a script, display to the user
238
-
239
- \`\`\`bash
240
- # DON'T: shim around the CLI with jq just to grab one value.
241
- # ASK=$(ish ask create --json | jq -r .alias)
242
-
243
- # DO: capture mode, bare value, exit 0.
244
- ASK=$(ish ask create --new --name demo \\
245
- --prompt "Which?" --variant text:A --variant text:B \\
246
- --sample 30 --get alias)
247
-
248
- # DON'T: pipe --json through jq when you want to show the user a table.
249
- # ish ask results "$ASK" --json | jq … | tee /tmp/x.txt
250
-
251
- # DO: --human keeps the table layout even through tee.
252
- ish ask results "$ASK" --human | tee /tmp/transcript.txt
253
- \`\`\`
254
-
255
- Missing field on \`--get\` → exit 2 with a usage error. \`--get\` also
256
- implies \`--quiet\` so the bare value is the only thing on stdout.
257
-
258
- ## Output handling
259
-
260
- - Every command supports \`--json\`. JSON mode is **auto-enabled when
261
- stdout is piped**, so an agent rarely needs \`--json\` explicitly.
262
- - **\`--get <field>\` is the right way to capture a single value.**
263
- Dotted paths supported (\`tester_profile.name\`); on a paginated
264
- \`{items: [...]}\` response, a leading non-\`items\` segment
265
- auto-descends into items. Replaces the
266
- \`--json | jq -r .field\` shim. Implies \`--json\` and \`--quiet\`.
267
- - **\`--human\` forces human output even when stdout is piped.** Use it
268
- to \`tee\` a table without losing the layout. Mutually exclusive
269
- with \`--get\`.
270
- - \`--fields a,b,c\` strips JSON output to the listed fields (saves
271
- tokens). \`--verbose\` adds full UUIDs and timestamps.
272
- - **Stdout is data only.** All progress, status, and "Open in browser"
273
- hints go to stderr; \`--json | jq -e .\` parses cleanly without
274
- defensive piping.
275
- - **List responses are a six-key envelope:** \`{items, total, returned,
276
- limit, offset, has_more}\`. Use \`has_more\` to detect truncation;
277
- don't count items yourself.
278
- - **\`study\` JSON includes a \`url\` field.** \`study create / generate /
279
- get / list / run\` each return a top-level \`url\` (per item on
280
- \`list\`) pointing to the study in the web app — \`overview\` for
281
- read/write commands, \`timeline\` for \`study run\`. Surface it to
282
- the user instead of composing \`<host>/<workspace>/<study>/...\`
283
- yourself. Host follows the active backend (\`app.ishlabs.io\` on
284
- production, \`localhost:3000\` under \`--dev\`); override with the
285
- \`ISH_APP_URL\` env var.
286
- - **Use \`runtime_status\`, not \`status\`, on study responses.** Values:
287
- \`draft | running | completed | completed_with_errors | cancelled\`.
288
- Derived from iteration testers' actual state — never reports
289
- \`failed\` while completed runs exist. The CLI also surfaces
290
- \`status_inferred\` + a stderr warning when raw \`status\` and the
291
- testers disagree.
292
- - **\`study generate --json\` returns \`modality_rationale\`** (one
293
- short sentence). Inspect it before adding iterations; if the LLM
294
- picked the wrong modality, override via
295
- \`ish study update <id> --modality text\`.
296
- - **Failed testers expose \`error_message\`.** \`study tester --json\`
297
- and \`study results --json\` (in \`testers[]\`) include
298
- \`error_message: "<reason>"\` for any tester with \`status: failed\`.
299
- Don't drill into logs; read the field. \`study results\` also
300
- includes a top-level \`failed_count\` alongside \`completed_count\`.
301
- - **\`ask add-questions\` is additive by default.** Appending a
302
- follow-up question to a completed round preserves prior comments,
303
- picks, and ratings; only the new question is dispatched. Pass
304
- \`--redispatch-all\` for the legacy reset-and-rerun behavior.
305
- - **\`ask create --no-dispatch\` stages a draft, no bill yet.** Pair
306
- with \`ish ask dispatch <id>\` to flip DRAFT → RUNNING and start
307
- the round. Use this when the user wants to review the audience or
308
- prompt before any credits are charged. Audience flags are still
309
- required (testers materialize at create time); only the worker
310
- enqueue and billing are deferred. Asks now carry a top-level
311
- \`status\` (\`draft | running | completed | cancelled\`) visible in
312
- \`ask list\` and \`ask get\`. \`dispatch\` is idempotent — a
313
- non-DRAFT ask returns 409 mapped to a usage error.
314
- - **\`ask results --json\` adds \`cross_round_summary\` for 2+ rounds.**
315
- Top-level field with per-round picks/winner snapshots and
316
- \`picks_delta\` (R1 → last). Don't diff two \`ask results\` calls by
317
- hand.
318
- - **\`ask retry <ask> --round N\` re-dispatches errored responses.**
319
- Use after a partial failure (e.g. 4 of 5 testers errored on round
320
- 1). Only ERRORED rows are reset to PENDING and re-run; COMPLETED
321
- rows are left untouched. Idempotent: zero-errored is a no-op. Add
322
- \`--wait\` to block.
323
- - **Errored ask responses carry \`error_message\` + \`error_kind\`.**
324
- Each \`responses[]\` entry whose \`status: errored\` exposes the
325
- classified failure (e.g. \`first_impression_llm_failed\`,
326
- \`interview_llm_failed\`, \`variant_preparation_failed\`). Branch on
327
- \`error_kind\` to decide retry vs abort.
328
- - **\`winner\` carries \`n\` and \`confidence\`.** \`n\` is the completed
329
- sample the verdict was elected from; \`confidence\` is \`low\` /
330
- \`medium\` / \`high\` based on completion ratio + tied-ness. When
331
- errored responses exceed 50%, the winner block is REPLACED by
332
- \`{ refused: true, reason: "error_rate_too_high", errored, total }\`;
333
- run \`ask retry\` first.
334
- - **\`--workspace\` works at the program root AND every subcommand.**
335
- \`ish --workspace w-6ec study list\` and \`ish study list --workspace
336
- w-6ec\` are equivalent; if both are passed, the subcommand-level
337
- flag wins. Without either, the CLI falls back to \`ISH_WORKSPACE\`
338
- env then the active workspace in \`~/.ish/config.json\`.
339
- - **\`profile generate\` emits stderr progress.** \`generating N
340
- profiles…\` then \`generated N profiles\` around the ~10–20s LLM
341
- call. Suppress with \`--quiet\`. Generated bios reference the
342
- brief's domain context naturally (occupation, daily work,
343
- frustrations); they no longer parrot vocabulary from the brief
344
- verbatim. DOBs spread across the year instead of all-on-\`06-15\`.
345
- - **Empty-pool errors include a country-suggestion line.** When
346
- \`study run\` / \`ask run --new\` rejects because \`--country XX\`
347
- matched zero profiles, the error includes the top-3 populated
348
- countries that satisfy your *other* filters. Pivot directly without
349
- a second \`profile list\` round-trip.
350
- - **\`<entity> list\` emits a stderr pagination hint** when
351
- \`has_more=true\` and \`--quiet\` is unset. Goes to stderr in **every
352
- mode** (including \`--json\` and piped stdout); it never pollutes
353
- machine-readable stdout but is visible to any agent reading stderr.
354
- Format: "showing N–M of TOTAL; pass --offset M --limit N for more."
355
- - **\`study delete\` requires explicit confirmation.** Interactive:
356
- prompts on stderr. Non-interactive (\`--json\`, piped, non-TTY
357
- stdin): pass \`-y\` / \`--yes\` to confirm. Without it, the CLI
358
- exits with usage code 2.
359
- - **\`ask add-questions\` supports \`--wait\` / \`--timeout\`.** Match
360
- the parity of \`ask create\` and \`ask run\`. Without \`--wait\` the
361
- command returns after dispatch (round still running).
362
- - **\`study extend <tester>\` resumes a terminal tester.** Use it when
363
- a run hit \`--max-interactions\` before finishing, or pair with
364
- \`study cancel\` to redirect mid-run via \`--instruction\` (inline,
365
- \`@path\`, or stdin via \`-\`). Spawns a **new** tester branched from
366
- the source's last interaction — source row untouched. Credits debit
367
- per \`max(1, round(additional_steps / 10))\`. See workflow #11 and
368
- \`ish docs get-page concepts/extending-a-simulation\`.
369
- - **\`pick_confidence\` (0..1) is on every \`--wants-pick\` response.**
370
- The model's self-reported confidence in its variant choice. Use it
371
- to break ties when nominal pick counts are close. See
372
- \`ish docs get-page concepts/ask\`.
373
- - Exit codes carry meaning: 0 success, 2 usage/validation,
374
- 3 auth, 4 not-found, 5 transient. See
375
- \`ish docs get-page reference/json-mode\`.
376
- - **Tier limits surface as \`error_code: "usage_limit_reached"\`**
377
- (HTTP 403, exit 1, non-retryable). The error body includes
378
- \`tier\`, \`limit\`, \`current\`, \`max\`, \`upgrade_url\`. Do not
379
- retry — branch on the code and surface the upgrade link. See
380
- \`ish docs get-page reference/billing-limits\`.
381
- - Aliases (\`s-…\`, \`a-…\`, \`tp-…\`, \`i-…\`, \`t-…\`, \`tps-…\`, \`w-…\`)
382
- are accepted anywhere a UUID is. See
383
- \`ish docs get-page reference/aliases\`.
384
-
385
- ## Credits & cost preview
386
-
387
- Every dispatched run costs **credits**. The CLI surfaces an upper-bound
388
- estimate *before* you dispatch so you can budget:
389
-
390
- - **Human output** — \`study run\` shows a \`Scale:\` + \`Credits (est):\`
391
- line in the confirmation block (skipped under \`--yes\` or \`--json\`).
392
- - **JSON output** — \`study run --json\` includes a \`credit_estimate\`
393
- field. For tester-pair chat it nests under \`pair_preview\`; for
394
- solo/media runs it's top-level. Shape:
395
- \`{ upper_bound: number, formula: "media_per_tester" | "chat_solo" |
396
- "chat_pair" | "ask_per_response", breakdown: string, unit: "credits" }\`.
397
- - **\`formula\` is stable** — agents can branch on it.
398
-
399
- Today every modality uses \`max(1, round(N / 10))\` per principal
400
- (per tester for media/interactive, per side per conversation for chat,
401
- ×2 for tester-pair). Asks bill flat **1 credit per successful response**.
402
- Insights cost **10 credits flat** (first per-study is free).
403
-
404
- If you exceed the available budget at dispatch time, the backend rejects
405
- with HTTP 402 / \`error_code: "insufficient_credits"\`. The envelope
406
- carries \`required\`, \`available\`, \`upgrade_url\`. Don't retry — surface
407
- the upgrade link.
408
-
409
- The full table (per-modality rates, tier allotments, error envelope)
410
- lives in \`ish docs get-page reference/credits\`.
411
-
412
- ## Common pitfalls (don't do these)
413
-
414
- 1. **Don't paste flags from memory.** The CLI evolves; flags change.
415
- Run \`ish <command> --help\` to confirm before constructing a command.
416
- 2. **Don't pipe \`--json\` through \`python\`/\`jq\` to reshape output** —
417
- the CLI already has the affordances:
418
- - Inspect a few specific entities? \`ish profile get tp-1b9 tp-fc1
419
- tp-2fc\` (also works for \`study get\`, \`iteration get\`, \`ask
420
- get\`). Returns a \`{items:[...], total:N}\` envelope.
421
- - Want only certain fields? \`--fields alias,name,country,occupation\`.
422
- - Need counts of a nested array? \`ask get\` / \`ask create --wait\`
423
- already include \`testers_count\`, \`responses_total\`,
424
- \`responses_complete\` (per-round and aggregate). Don't recount.
425
- - Want machine-readable A/B verdicts? \`ask results --json\` already
426
- ships \`aggregates: { picks, ratings, winner }\` per round.
427
- 3. **Don't run \`ish study run\` against an empty study.** \`ish study
428
- create\` and \`ish study generate\` no longer auto-create iteration
429
- A — the first explicit \`ish iteration create\` becomes A. Running
430
- \`study run\` on a study with zero iterations exits 2; create one
431
- first via \`ish iteration create --url …\` / \`--content-url …\` /
432
- \`--content-text …\`. Or pass \`--content-text\` / \`--url\` directly
433
- on \`study create\` for a one-shot study + iteration A.
434
- 4. **Don't pass \`--profile\` together with demographic filters** — they
435
- are mutually exclusive. Either explicit IDs or
436
- \`--country\`/\`--gender\`/\`--min-age\`/\`--max-age\` + \`--sample\`.
437
- 5. **Don't change audience between rounds of an ask.** It's fixed at
438
- ask creation. Use \`ish ask add-testers\` to *extend* it; you can't
439
- replace it.
440
- 6. **Don't try to put credentials in the URL** for gated study URLs.
441
- Configure them once on the workspace via
442
- \`ish workspace site-access …\` (basic-auth, cookie, login).
443
- See \`ish docs get-page concepts/site-access\`.
444
- 7. **Don't commit \`~/.ish/config.json\`** — it stores tokens and active
445
- workspace/study/ask selections. It lives in \`$HOME\`, not the repo.
446
- 8. **Don't pass run-time audience flags to a tester_pair chat iteration.**
447
- Pair iterations carry their own audiences (\`audience_a\` /
448
- \`audience_b\` inside \`details.mode_details\`); \`ish study run\`
449
- refuses \`--profile\` / \`--sample\` / \`--all\` / demographic filters
450
- on them. To change audiences, update the iteration via
451
- \`ish iteration update <id> --details-json '{...}'\`. When both sides
452
- ship explicit \`--audience-a\` / \`--audience-b\` lists, lengths must
453
- match (1:1 by index) — or use \`--role-criteria-a/-b\` and let the
454
- backend resolve a pool.
455
- 9. **Don't cram demographic constraints into \`scenario_a/_b\` text.**
456
- Demographics (occupation, age, country, gender) belong in
457
- \`--role-criteria-a/-b\` so the persona stays sacred — filtering
458
- happens upstream of the prompt. Scenario text is for voice, goal,
459
- and knowledge of the role, not for who plays it. Mixing the two
460
- breaks the asymmetry contract and produces incoherent characters.
461
- 10. **Don't retry \`usage_limit_reached\` errors.** Tier caps
462
- (\`maxProducts\`, \`maxStudiesPerProduct\`, \`maxIterationsPerStudy\`,
463
- \`maxCustomTesterProfiles\`) are enforced server-side. The error body
464
- carries \`tier\`, \`limit\`, \`current\`, \`max\`, \`upgrade_url\` — show
465
- the upgrade link or delete an existing resource to free headroom.
466
- See \`ish docs get-page reference/billing-limits\` for the table.
467
- 11. **Don't retry \`insufficient_credits\` errors either.** HTTP 402,
468
- non-retryable. Read the \`credit_estimate\` field on \`study run --json\`
469
- *before* dispatching to know what you'll spend; if the error fires
470
- after, surface \`required\` / \`available\` / \`upgrade_url\` to the
471
- human. See \`ish docs get-page reference/credits\`.
472
- 12. **Don't dispatch interactive/media runs without thinking about
473
- \`--max-interactions\`.** \`ish study run\` defaults to a 20-step
474
- cap (flag > iteration's stored value > 20), which is the right
475
- answer for most onboarding/landing-page probes. Raise it
476
- (\`--max-interactions 50\`) when testers genuinely need to roam
477
- further; lower it (\`--max-interactions 5\`) for a smoke probe
478
- against a surface you suspect is broken — a stuck tester on a
479
- non-responsive page will otherwise burn the full cap before the
480
- SDK gives up. The confirmation block prints the resolved value
481
- and where it came from. Credits debit per
482
- \`max(1, round(steps/10))\` per tester; see
483
- \`ish docs get-page reference/credits\`.
484
- 13. **Don't call \`workspace_create\` blind on a cold start.** On a
485
- saturated account it returns \`error_code: usage_limit_reached\`
486
- immediately — the dogfood account hits this on the first call.
487
- Always call \`workspace_get\` (or \`ish workspace list --json\`)
488
- first and inspect \`has_headroom\` per row; if any existing
489
- workspace fits the work, use it via \`ish workspace use <id>\`.
490
- To programmatically reuse-or-create idempotently, prefer
491
- \`ish workspace create --name <name> --ensure\` — returns the existing
492
- workspace owned by the caller when the name matches, otherwise
493
- creates a fresh one. Same response shape either way, so the
494
- agent doesn't branch on success vs. reuse. See
495
- \`ish docs get-page guides/cold-start\`.
496
- 14. **Don't trust \`occupation\` filters as whole-token matches.**
497
- \`audience_build\` treats \`occupation\` as a **loose,
498
- case-insensitive substring** — \`occupation=["manager"]\` matches
499
- hotel managers, retail managers, bank branch managers, not just
500
- the engineering managers you probably wanted. Two recovery
501
- paths: enumerate the role surface explicitly
502
- (\`occupation=["engineering manager", "software engineering
503
- manager", "vp engineering", "tech lead"]\`) or read
504
- \`match_preview\` on the \`audience_build\` response and iterate
505
- on the filter before \`ask_run\` / \`study_run\`. The public
506
- profile pool skews non-tech / non-Western, so even a precise
507
- filter may resolve to a small count — preview before dispatching
508
- a run that depends on reaching N matches. See
509
- \`ish docs get-page concepts/audience\`.
510
-
511
- ## Authentication
512
-
513
- \`ish login\` opens a browser and saves tokens to \`~/.ish/config.json\`.
514
- The CLI also accepts \`--token <token>\` or \`ISH_TOKEN\` env var. If a
515
- command exits with code 3 ("auth"), tell the user to re-run \`ish login\`.
516
-
517
- ## When ish is the wrong tool
518
-
519
- If the user wants to *write code* against the Ish API directly, point
520
- them at the API docs at https://ishlabs.io — this CLI is for
521
- orchestration, not as an API client library.
522
-
523
- ---
524
-
525
- **Skill version:** ${VERSION}
526
- **Skill source of truth:** \`ish docs\` (offline, ships with the binary)
74
+ Two run verbs:
75
+ - **study run** — simulate on a real surface (URL, media, document, chat endpoint).
76
+ - **ask run**: react to text or image variants.
77
+
78
+ Heuristic: **study** for "test this prototype/page/flow"; **ask** for "which copy/image lands better".
79
+
80
+ ## Workflow shapes
81
+
82
+ Each shape names the verb, the *required precursors*, and the **load-bearing knobs** — the arguments that change output quality, not just behavior. Look up the full schema in the MCP tool description or \`ish <command> --help\` once you've picked the shape.
83
+
84
+ Examples below use MCP shape; for CLI, kebab-case the tool name (\`ask_run\` → \`ish ask run\`) and pass equivalent flags (\`profile_ids: [...]\` → \`--profile-id tp-… --profile-id tp-…\`).
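The MCP-to-CLI mapping described above can be sketched as a small helper. This is only an illustration: `mcpToCliArgv` is a hypothetical name, and it covers just the two rewrites the text spells out (`ask_run` → `ish ask run`, and `profile_ids` → repeated `--profile-id`). Tool names with more than two parts (for example `chat_endpoint_init`) are deliberately rejected, because the text does not say how they map; check `ish <command> --help` for those.

```javascript
// Hypothetical helper, not part of the ish CLI or MCP server.
// Maps an MCP-style call onto CLI argv using only the rewrites named in the text.
function mcpToCliArgv(toolName, args = {}) {
  const parts = toolName.split("_");
  if (parts.length !== 2) {
    // Assumption: only <noun>_<verb> names are handled; others need a manual lookup.
    throw new Error(`no documented CLI mapping for "${toolName}"`);
  }
  const argv = ["ish", ...parts]; // "ask_run" -> ["ish", "ask", "run"]
  // profile_ids: [...] -> one --profile-id flag per id
  for (const id of args.profile_ids ?? []) {
    argv.push("--profile-id", id);
  }
  return argv;
}

console.log(mcpToCliArgv("ask_run", { profile_ids: ["tp-1", "tp-2"] }).join(" "));
// prints "ish ask run --profile-id tp-1 --profile-id tp-2"
```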
85
+
86
+ ### Compare text or image variants → \`ask_run\`
87
+
88
+ - **Precursor**: an audience (see "Audience is a query" above). If you don't already have suitable tester profiles, build them first via \`audience_build\`; reuse via \`profile_list\` when possible.
89
+ - **Load-bearing knobs**:
90
+ - \`wants_pick: true\` — adds an aggregate winner verdict. Without it you get prose reactions but no clear answer.
91
+ - \`wants_ratings: true\`: adds per-variant numeric scores.
92
+ - \`wait: true\`: blocks until done. Without it you get a round id and have to poll.
93
+ - \`variants\`: array of \`{ label, content }\` for text, or \`{ label, image_url }\` for hosted images. Two or more variants are required for \`wants_pick\` to be meaningful (with a single variant it degrades to a prose reaction round). **Local image files**: only the CLI accepts them. Use \`--variant LABEL:@./path.png\` per file (the \`@\` prefix triggers upload); MCP requires a hosted URL.
94
+ - \`ask_id\` (optional) — passing an existing \`a-…\` id re-runs against that ask. Omit (or pass \`--new\` on the CLI) to create a new ask in one shot.
95
+ - **Shape**:
96
+ \`\`\`
97
+ ask_run({
98
+ variants: [ { label: "A", content: "..." }, { label: "B", content: "..." } ],
99
+ audience: { profile_ids: ["tp-…", ...] }, // or { sample: 10 }
100
+ wants_pick: true,
101
+ wants_ratings: true,
102
+ wait: true,
103
+ })
104
+ \`\`\`
105
+ - **Output**: per-tester reasoning + (if \`wants_pick\`) aggregate winner with confidence.
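A pre-flight check for the knobs above might look like the following. It is a sketch under stated assumptions: `checkAskRun` is a hypothetical helper (neither the CLI nor the MCP server is known to ship one), and treating any `image_url` that does not start with `http://` or `https://` as a local file is a heuristic chosen for illustration.

```javascript
// Hypothetical pre-flight check for an ask_run request object.
// Returns human-readable warnings; an empty array means no known issue.
function checkAskRun(request) {
  const warnings = [];
  const variants = request.variants ?? [];
  if (request.wants_pick && variants.length < 2) {
    warnings.push("wants_pick needs two or more variants; with one it degrades to a prose reaction round");
  }
  if (!request.wants_pick) {
    warnings.push("wants_pick is off: expect prose reactions but no aggregate winner");
  }
  if (!request.wait) {
    warnings.push("wait is off: you get a round id and have to poll");
  }
  for (const v of variants) {
    // Assumption: anything that is not an http(s) URL is a local path.
    if (typeof v.image_url === "string" && !/^https?:\/\//.test(v.image_url)) {
      warnings.push(`variant ${v.label}: local image files are CLI-only; MCP requires a hosted URL`);
    }
  }
  return warnings;
}

console.log(checkAskRun({
  variants: [{ label: "A", content: "Short headline" }],
  wants_pick: true,
  wait: true,
}));
```

Running it on the single-variant request above yields one warning, about `wants_pick` needing at least two variants.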
106
+
107
+ ### Test a live page or prototype → \`study_run\` (modality: interactive)
108
+
109
+ - **Precursor**: a study with a URL. Either inline at create-time (\`study_create({ modality: "interactive", url: "..." })\`) or as a separate iteration (\`iteration_create({ study_id, url })\`) when you want to A/B iterations later or upload local files. An **assignment** is required — what the tester is supposed to attempt.
110
+ - **Audience**: pass \`audience: { profile_ids: [...] }\` or \`{ sample: N }\` to \`study_run\`, same contract as \`ask_run\`. Audience is set on the *run*, not the study.
111
+ - **Load-bearing knobs**:
112
+ - \`assignment\` (on \`study_create\`) — what the tester is supposed to do. Format: \`"<label>:<instruction>"\`. The whole run hinges on this being clear.
113
+ - \`wait\` (MCP) / \`--wait\` (CLI): waits for the run to finish. The CLI streams per-tester results to stdout in real time as they complete; MCP blocks until the whole run finishes. For a watching user, prefer the CLI here.
114
+ - \`count\` (on \`study_run\`): how many testers.
115
+ - **Shape**:
116
+ \`\`\`
117
+ study_create({
118
+ modality: "interactive",
119
+ url: "https://staging.acme.io/welcome",
120
+ assignment: "Complete signup:Go through the 4-step wizard end-to-end",
121
+ })
122
+ study_run({ study_id: "s-…", audience: { profile_ids: [...] }, count: 15, wait: true })
123
+ \`\`\`
124
+ - **Output**: per-tester journey transcripts + aggregate friction / blocker / positive-moment counts.
125
+
126
+ ### Probe a customer chatbot → \`study_run\` (modality: chat, mode: external_chatbot)
127
+
128
+ - **Precursors**:
129
+ 1. A **chat endpoint** definition at the workspace level. \`chat_endpoint_init\` from a curl spec (handles auth headers and request/response shape; **upsert-by-name**, so it is safe to re-call with the same \`name\` to rotate auth or change the request shape) → \`chat_endpoint_test\` to confirm it responds correctly before dispatching simulated testers.
130
+ 2. A study with \`modality: "chat"\`, \`mode: "external_chatbot"\`, the endpoint reference, and an \`assignment\`.
131
+ - **Audience**: same \`{ profile_ids } | { sample }\` contract; pass to \`study_run\`. For custom personas (e.g. "frustrated vs polite"), \`audience_build\` first.
132
+ - **Load-bearing knobs**:
133
+ - \`assignment\`: what the tester tries to do (\`"Cancel:Try to cancel your subscription"\`).
134
+ - \`count\` on the run.
135
+ - **Shape**:
136
+ \`\`\`
137
+ chat_endpoint_init({ name: "support-bot", from_curl: "..." }) // or describe request shape directly
138
+ chat_endpoint_test({ endpoint: "support-bot", message: "hi" })
139
+ study_create({ modality: "chat", mode: "external_chatbot", endpoint: "support-bot",
140
+ assignment: "Cancel:Try to cancel your subscription" })
141
+ study_run({ study_id: "s-…", audience: { profile_ids: [...] }, count: 8, wait: true })
142
+ \`\`\`
143
+ - **Output**: full conversation transcripts per tester + aggregate success / blocker analysis.
144
+
145
+ ### Test a media artifact (document, image, video, audio) → \`study_run\`
146
+
147
+ - **Precursors**:
148
+ 1. A study with the chosen modality: \`study_create({ modality: "document" | "image" | "video" | "audio", assignment: "..." })\`.
149
+ 2. An **iteration** carrying the media. For local files, **CLI only** — \`ish iteration create --study s-… --media @./deck.pdf\` (the \`@\` prefix triggers upload). For hosted URLs, either driver works: \`iteration_create({ study_id, content_url: "https://..." })\`.
150
+ - **Audience**: same \`{ profile_ids } | { sample }\` contract; pass to \`study_run\`. Reusable across runs (see "Lifecycle" below).
151
+ - **Load-bearing knobs**:
152
+ - \`assignment\` on \`study_create\`. For review-style media (decks, ad creative), frame it as a decision: \`"Take a first meeting:Review this Series A deck and decide whether you'd take a first meeting"\`. Page/timestamp-level attribution depends on the assignment asking for it explicitly.
153
+ - \`wait\` / \`--wait\`: same streaming story as interactive.
154
+ - \`count\` on \`study_run\`.
155
+ - **Iterating on the artifact** (v2 deck, v3 deck): create a **new iteration** on the same study (\`iteration_create\`), reuse the audience's \`profile_ids\`. See "Lifecycle".
156
+ - **Output**: per-tester reactions to the artifact + aggregate themes.
157
+
158
+ ### Rehearse a conversation between two AI personas → \`study_run\` (modality: chat, mode: tester_pair)
159
+
160
+ **If the user might want the same persona across multiple turns, pin profiles up-front — you can't retro-pin after a run.** Without pinning, personas are re-synthesized from the assignment text each time, so "the same VC from earlier" becomes prose-only continuity.
161
+
162
+ - **Precursor**: a workspace and (optionally) one or two tester profiles for persona pinning. If you skip the profiles, ish synthesizes both personas from the \`assignment\` text per-run — fine for one-shot rehearsals, drifts between iterations.
163
+ - **Audience**: optional. For persona continuity across iterations, build profiles via \`audience_build\` (or reuse via \`profile_list\`) and pass \`audience: { profile_ids: [...] }\` to \`study_run\` — the same profiles play the same roles each time.
164
+ - **Load-bearing knobs**:
165
+ - \`assignment\` encodes BOTH personas and what each is trying to do. More prose-heavy than other assignments; be specific. Example: \`"Founder pitches Series A to skeptical VC. Founder: defends AI customer-support startup, $2M ARR, 15% MoM. VC: thinks SaaS-for-SaaS is saturated, probes moat and unit economics."\`
166
+ - \`count\`: typically 1 per run; set higher to generate variations.
167
+ - **Iterating the scenario** (turn-by-turn refinement): create a **new iteration** with a revised assignment; reuse the same \`profile_ids\` if you pinned personas. See "Lifecycle".
168
+ - **Output**: a full transcript per rehearsal.
169
+
170
+ ### Generate a fresh audience → \`audience_build\`
171
+
172
+ - **Input**: a \`description\`, a \`count\`, and optionally \`sources\` (transcripts / audio / images / docs that seed persona generation — for "make profiles that feel like these real customers"). Local files force CLI (binary upload constraint).
173
+ - **Output**: a list of \`profile_ids\` to pass into \`ask_run\` or \`study_run\`.
174
+ - **Cost**: slow (~30-120s) + credit-bearing. Reuse profiles via \`profile_list\` when possible. Sensible defaults: \`count: 5-10\` for ad-hoc tests, \`count: 20+\` for studies where you want statistical signal.
175
+ - **Growing an audience**: build only the delta — don't rebuild. Concat the new \`profile_ids\` with the existing ones for the next run. The "audience is a query" framing means there's no audience entity to update.
176
+ - **Shapes**:
177
+ \`\`\`
178
+ // Simple description only
179
+ audience_build({
180
+ description: "Parents of toddlers (ages 1-3), US, evening-routine focused",
181
+ count: 8,
182
+ })
183
+ // → { profile_ids: ["tp-…", ...] }
184
+
185
+ // Seeded from real transcripts (CLI only for local files)
186
+ // ish audience build --description "..." --count 10 \\
187
+ // --source @./interviews/customer-1.md \\
188
+ // --source @./interviews/customer-2.md
189
+ \`\`\`
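The "build only the delta" advice can be sketched as follows. `growAudience` is a hypothetical wrapper, and `audienceBuild` stands in for whatever function actually invokes the `audience_build` tool in your environment; its `{ profile_ids }` return shape follows the comment in the shape block above. The sketch also reports how many profiles came back, since a build may return fewer than requested.

```javascript
// Hypothetical helper: grow an existing audience to targetSize by building only
// the missing profiles. `audienceBuild` is injected, so this runs without any
// real ish client; wiring it to MCP or the CLI is left to the caller.
async function growAudience(existingIds, targetSize, description, audienceBuild) {
  const missing = targetSize - existingIds.length;
  if (missing <= 0) {
    return { profile_ids: existingIds, built: 0, shortfall: 0 };
  }
  const { profile_ids: fresh } = await audienceBuild({ description, count: missing });
  return {
    profile_ids: [...existingIds, ...fresh], // concat, never rebuild
    built: fresh.length,
    shortfall: missing - fresh.length, // > 0 when the build under-delivers
  };
}

// Usage with a stub that under-delivers by one profile.
const stubBuild = async ({ count }) => ({
  profile_ids: Array.from({ length: count - 1 }, (_, i) => `tp-new-${i + 1}`),
});
growAudience(["tp-a", "tp-b"], 5, "Parents of toddlers", stubBuild).then((result) => {
  console.log(result.profile_ids.length, result.shortfall); // prints "4 1"
});
```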
190
+
191
+ ## Lifecycle (what to re-use vs create anew)
192
+
193
+ The most common multi-turn question: "user wants to change X: re-use the existing thing or create a new one?"
194
+
195
+ | Change you want | What to do |
196
+ |---|---|
197
+ | Same ask, **same audience**, new variants | Pass \`ask_id\` (MCP) or \`--ask\` (CLI) on \`ask_run\` — re-uses the locked audience. |
198
+ | Same ask, **different audience** | New ask: omit \`ask_id\` (MCP) or pass \`--new\` (CLI). Audience is locked at ask creation. |
199
+ | Same study, **new media** (v2 deck, new image) | New **iteration** on the same study (\`iteration_create({ study_id, content_url \\| --media @path })\`). Iterations are immutable once they have results — never edit. |
200
+ | Same study, **new assignment** | **New study.** Assignment lives on the study; there's no in-place edit. Keep the old study's id for side-by-side comparison. *(Tester-pair exception: the assignment IS the content there — use a new **iteration** on the same study, not a new study.)* |
201
+ | Same audience across multiple runs / studies | Reuse the \`profile_ids\` array. Profiles are workspace-scoped resources (\`tp-…\`); they live independently of any ask or study. |
202
+ | Chat endpoint definition needs to change (auth rotate, URL change) | \`chat_endpoint_init\` is **upsert-by-name** — re-init with the same \`name\` and a new \`from_curl\` spec. Re-run \`chat_endpoint_test\` to confirm. |
203
+ | Persona reuse in tester-pair | Pin via \`profile_ids\` on the first \`study_run\`; pass the same ids on subsequent runs. Without pinning, personas are re-synthesized from the assignment per run. |
204
+
205
+ When in doubt: side-by-side comparison usually beats in-place edits. Ids are cheap; result history isn't.
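The table above can be encoded as a lookup so an agent resolves the re-use question mechanically. The change keys (`"new-variants"`, `"new-media"`, and so on) are labels invented for this sketch; only the actions come from the table.

```javascript
// Hypothetical encoding of the Lifecycle table. Keys are illustrative labels.
const LIFECYCLE_ACTIONS = {
  "new-variants": "re-run the ask: pass ask_id (MCP) or --ask (CLI)",
  "different-audience": "new ask: omit ask_id (MCP) or pass --new (CLI)",
  "new-media": "new iteration on the same study",
  "new-assignment": "new study",
  "endpoint-change": "re-run chat_endpoint_init with the same name, then chat_endpoint_test",
};

function lifecycleAction(change, { testerPair = false } = {}) {
  // Tester-pair exception from the table: the assignment is the content,
  // so a revised assignment is a new iteration, not a new study.
  if (change === "new-assignment" && testerPair) {
    return LIFECYCLE_ACTIONS["new-media"];
  }
  const action = LIFECYCLE_ACTIONS[change];
  if (action === undefined) {
    throw new Error(`unknown change: ${change}`);
  }
  return action;
}

console.log(lifecycleAction("new-assignment")); // prints "new study"
console.log(lifecycleAction("new-assignment", { testerPair: true })); // prints "new iteration on the same study"
```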
206
+
207
+ ## Pitfalls
208
+
209
+ - **Cold start on free plan**: \`workspace_create\` returns \`usage_limit_reached\` at the free-plan cap (1 workspace). Always inspect with \`workspace_list\` first. **MCP-only recipe** (no \`--ensure\` available): \`workspace_list\` → if non-empty, use the first; if empty, \`workspace_create\`; if \`workspace_create\` returns \`usage_limit_reached\`, re-call \`workspace_list\` (a workspace exists that you didn't see, possibly created by another session). **CLI shortcut**: \`ish workspace create --name <name> --ensure\` is idempotent by name.
210
+ - **Ask audience vs variants**: see Lifecycle table for the re-use vs new-ask decision.
211
+ - **Study iterations are immutable once they have results** — see Lifecycle table for new-iteration vs new-study.
212
+ - **Credit costs**: \`ask_run\`, \`study_run\`, and \`audience_build\` consume credits. Check \`workspace_get\`'s \`credits\` headroom before dispatching large runs. For free-plan ad-hoc tests, default \`count: 5-8\` testers + 2 variants is usually within budget.
213
+ - **\`audience_build\` may return fewer profiles than requested** if the description is over-constrained. Always read the returned \`profile_ids\` count; don't trust the requested \`count\` blindly.
214
+ - **Variants of wildly different length** (one-line vs paragraph) can skew picks toward the longer one. Keep variants comparable in shape.
215
+ - **Chatbot endpoint response-shape mismatch**: \`chat_endpoint_test\` succeeds shallowly if the bot responds at all, but a wrong response path (e.g. bot returns \`{ data: { reply } }\` instead of \`{ reply }\`) produces empty transcripts on the actual run. Inspect one full test response before dispatching testers.
216
+ - **Chatbot auth drift**: tokens/sessions baked into \`--from-curl\` expire. If transcripts come back as identical short error strings, re-run \`chat_endpoint_test\` and refresh the curl spec.
217
+ - **401 surfaces as fake blocker**: an unauthenticated endpoint produces "tester got stuck on auth screen" — looks like a UX blocker but is config. Always confirm endpoint auth before reading transcripts as user-research data.
218
+ - **No per-page/per-timestamp scoping for media**: there's no "evaluate just slide 14" or "react to seconds 0-30" API. State the focus explicitly in the \`assignment\` text, or pre-stitch the artifact (e.g. replace one slide locally, upload as a new iteration).
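The MCP-only cold-start recipe from the first pitfall can be written as a short routine. Everything about the client here is an assumption made for illustration: `client.workspaceList()` and `client.workspaceCreate()` are hypothetical wrappers around the `workspace_list` and `workspace_create` tools, the list is assumed to resolve to an array, and the limit error is assumed to surface as `{ error: "usage_limit_reached" }`. Check the live MCP tool descriptions for the real shapes.

```javascript
// Sketch of the list -> create -> re-list recipe with an injected client.
async function ensureWorkspace(client, name) {
  const existing = await client.workspaceList();
  if (existing.length > 0) {
    return existing[0]; // non-empty: use the first
  }
  const created = await client.workspaceCreate({ name });
  if (created.error === "usage_limit_reached") {
    // A workspace exists that the first list did not show (e.g. another session).
    const retry = await client.workspaceList();
    if (retry.length > 0) {
      return retry[0];
    }
    throw new Error("usage_limit_reached, but no workspace is visible");
  }
  return created;
}

// Usage with a stub client that simulates a race with another session.
let listCalls = 0;
const stubClient = {
  workspaceList: async () => (++listCalls === 1 ? [] : [{ id: "w-1", name: "demo" }]),
  workspaceCreate: async () => ({ error: "usage_limit_reached" }),
};
ensureWorkspace(stubClient, "demo").then((ws) => console.log(ws.id)); // prints "w-1"
```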
219
+
220
+ ## When in doubt
221
+
222
+ \`ish docs\` (deep concept references, CLI-side) and live MCP tool descriptions (argument schemas, MCP-side) are closer to source-of-truth than this skill. **Trust them over this skill if they conflict.**
223
+
224
+ - **CLI present**: \`ish docs overview\`, \`ish docs get-page concepts/run-verbs\`, \`ish docs get-page guides/cold-start\`, \`ish docs search <keyword>\`.
225
+ - **MCP only**: read the tool description of the MCP tool you're about to call; cross-reference against this skill's "Shape" blocks. The MCP server's own \`instructions\` block (delivered automatically with the tool list) covers vocabulary and posture and is authoritative.
527
226
  `;
528
227
  const WORKFLOWS_MD = `# ish workflows — worked examples
529
228
 
@@ -694,6 +393,14 @@ ish study run --country SE --min-age 35 --max-age 50 --sample 5 --wait
694
393
 
695
394
  # Second run — every female profile in the workspace, same iteration:
696
395
  ish study run --gender female --all --wait
396
+
397
+ # Free-text filters: --search matches the profile **name**, --bio
398
+ # matches the profile **bio**, --occupation matches the profile
399
+ # **occupation** (repeatable, OR-joined). All are case-insensitive
400
+ # substrings — the same flag set works on \`ish profile list\`,
401
+ # \`ish ask run\`, \`ish ask add-testers\`, and \`ish ask create\`.
402
+ ish study run --bio "screen reader" --all --wait
403
+ ish study run --occupation founder --occupation designer --sample 6 --wait
697
404
  \`\`\`
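The free-text filter semantics in the comment above (case-insensitive substring, repeated `--occupation` values OR-joined) can be modeled locally to predict what a filter will catch. This is an illustrative model, not the platform's matching code: `matchesFilters` is a hypothetical function, the profile field names follow the comment, and AND-combining the different flags is an assumption, since the comment only states how repeated `--occupation` values combine.

```javascript
// Hypothetical local model of --search / --bio / --occupation matching.
function matchesFilters(profile, { search, bio, occupation = [] } = {}) {
  // Case-insensitive substring test; missing fields never match.
  const has = (field, needle) =>
    String(field ?? "").toLowerCase().includes(needle.toLowerCase());
  if (search !== undefined && !has(profile.name, search)) return false;
  if (bio !== undefined && !has(profile.bio, bio)) return false;
  // Repeated occupation values are OR-joined.
  if (occupation.length > 0 && !occupation.some((o) => has(profile.occupation, o))) {
    return false;
  }
  return true;
}

const profiles = [
  { name: "Ana", bio: "Uses a screen reader daily", occupation: "Hotel Manager" },
  { name: "Ben", bio: "Builds B2B tools", occupation: "Founder" },
  { name: "Cy", bio: "Freelance illustrator", occupation: "Designer" },
];
const picked = profiles.filter((p) => matchesFilters(p, { occupation: ["founder", "designer"] }));
console.log(picked.map((p) => p.name)); // prints [ 'Ben', 'Cy' ]
```

Because matching is by substring, a broad term such as `manager` also matches "Hotel Manager" in this model, so enumerate specific roles when precision matters.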
698
405
 
699
406
  If you don't pass any audience flags, \`ish study run\` reuses the
@@ -1349,7 +1056,6 @@ function buildSkillMd() {
1349
1056
  "metadata:",
1350
1057
  " author: ish",
1351
1058
  ` version: ${JSON.stringify(VERSION)}`,
1352
- "allowed-tools: Bash(ish:*)",
1353
1059
  "---",
1354
1060
  "",
1355
1061
  ].join("\n");