@agentled/cli 0.1.5 → 0.4.3

Files changed (43)
  1. package/README.md +136 -0
  2. package/dist/commands/auth.js +30 -0
  3. package/dist/commands/auth.js.map +1 -1
  4. package/dist/commands/examples.d.ts +15 -0
  5. package/dist/commands/examples.js +100 -0
  6. package/dist/commands/examples.js.map +1 -0
  7. package/dist/commands/scaffold.d.ts +14 -0
  8. package/dist/commands/scaffold.js +103 -0
  9. package/dist/commands/scaffold.js.map +1 -0
  10. package/dist/commands/schema.d.ts +10 -0
  11. package/dist/commands/schema.js +58 -0
  12. package/dist/commands/schema.js.map +1 -0
  13. package/dist/commands/skills.d.ts +9 -0
  14. package/dist/commands/skills.js +94 -0
  15. package/dist/commands/skills.js.map +1 -0
  16. package/dist/commands/workflows.js +227 -9
  17. package/dist/commands/workflows.js.map +1 -1
  18. package/dist/index.js +6 -0
  19. package/dist/index.js.map +1 -1
  20. package/dist/utils/preflight.d.ts +25 -0
  21. package/dist/utils/preflight.js +185 -0
  22. package/dist/utils/preflight.js.map +1 -0
  23. package/dist/utils/skills.d.ts +49 -0
  24. package/dist/utils/skills.js +214 -0
  25. package/dist/utils/skills.js.map +1 -0
  26. package/package.json +4 -1
  27. package/patterns/v1/00-why-agentic-ops.md +107 -0
  28. package/patterns/v1/01-trigger-design.md +107 -0
  29. package/patterns/v1/02-dedup-gates.md +135 -0
  30. package/patterns/v1/03-credit-efficiency.md +130 -0
  31. package/patterns/v1/04-loop-patterns.md +147 -0
  32. package/patterns/v1/05-child-workflow-contracts.md +151 -0
  33. package/patterns/v1/06-conditional-routing.md +151 -0
  34. package/patterns/v1/07-error-handling.md +157 -0
  35. package/patterns/v1/08-composed-email-approval.md +130 -0
  36. package/patterns/v1/09-reports-and-knowledge-storage.md +166 -0
  37. package/scaffolds/README.md +61 -0
  38. package/scaffolds/email-polling-dedup.json +71 -0
  39. package/scaffolds/extract-threshold-alert.json +131 -0
  40. package/scaffolds/lead-scoring-kg.json +84 -0
  41. package/scaffolds/list-match-email.json +131 -0
  42. package/scaffolds/minimal.json +20 -0
  43. package/skills/agentled/SKILL.md +568 -0
@@ -0,0 +1,107 @@
# 01 — Trigger design: polling vs event triggers

**Problem**: Developers default to event triggers for email/document intake workflows, creating fragile pipelines that drop records, can't backfill, and are hard to debug.

**Why it fails silently**: Event triggers appear to work in testing (low volume, reliable delivery). At production scale, re-deliveries cause duplicates, Pub/Sub TTL loses events during outages, and there's no way to backfill records missed during downtime — without any visible error.

---

## Decision framework

| | Schedule (polling) | App Event (real-time) |
|---|---|---|
| **Latency** | minutes–hours | seconds |
| **Idempotency** | trivial — label/flag marks processed | must dedupe on messageId; re-deliveries happen |
| **Backfill** | built-in — widen the query window | doesn't exist; needs a separate bootstrap run |
| **Replay after outage** | automatic on next scheduled run | events can be permanently lost (TTL) |
| **Debugging** | read last execution log | subscription status + delivery + filter + dedupe all need checking |
| **Infrastructure** | none | webhook receiver, watch renewal, Pub/Sub |

**Default rule: polling for intake, events for reactions.**

---

## Anti-pattern

Using an event trigger for email intake because it "feels more real-time":

```yaml
# Wrong: event trigger for deal flow email intake
trigger:
  type: app_event
  app: gmail
  event: GMAIL_NEW_MESSAGE_RECEIVED
  filters:
    query: "from:investor subject:pitch"
```

Problems:
- Duplicate delivery means the same email gets processed 2–3× with no dedup mechanism
- Gmail watch tokens expire — you need a renewal job or emails stop arriving silently
- No backfill: if the workflow is down for 2 days, those emails are gone
- Debugging requires checking: is the watch active? Did the webhook fire? Did the filter match? Did the dedup run?

---

## Correct pattern

Schedule trigger with label-based dedup:

```yaml
# Correct: scheduled polling with dedup gate
trigger:
  type: schedule
  config:
    frequency: daily
    time: "08:00"

steps:
  - id: ensure-label          # idempotent: returns the existing label if present
    action: GMAIL_CREATE_LABEL
    input:
      name: "processed"

  - id: fetch-emails
    action: GMAIL_FETCH_EMAILS
    input:
      query: "-label:processed newer_than:1d"
      max_results: 50

  - id: process-email
    type: loop
    over: "{{steps.fetch-emails.messages}}"
    # ... processing steps ...

  - id: mark-processed        # runs inside the loop, once per message
    action: GMAIL_ADD_LABEL
    input:
      message_id: "{{currentItem.id}}"
      label_id: "{{steps.ensure-label.id}}"  # resolved ID, not display name
```

The `-label:processed` filter does the dedup work. Each email is processed exactly once. If the workflow goes down for a week, widen to `newer_than:7d` on the next run to backfill.
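
The polling-plus-backfill behavior comes down to one query string; as a sketch, a tiny helper (the function name and its defaults are illustrative, not a platform API):

```javascript
// Illustrative helper for the polling query above. The name and defaults
// are ours, not part of any SDK. Widening `days` backfills after an outage.
function buildIntakeQuery({ label = "processed", days = 1 } = {}) {
  // Exclude already-labeled mail and bound the lookback window.
  return `-label:${label} newer_than:${days}d`;
}

// Normal daily run:
buildIntakeQuery();            // "-label:processed newer_than:1d"
// After a week-long outage, widen the window to backfill:
buildIntakeQuery({ days: 7 }); // "-label:processed newer_than:7d"
```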

---

## When to use event triggers

Event triggers are correct when:
- The user explicitly requires sub-minute latency ("alert within 30 seconds", "as soon as", "real-time")
- The workflow is a **side effect** (fire-and-forget notification), not a record-of-truth producer
- Missed events are acceptable (or you have a separate reconciliation job)

---

## Trigger type cheatsheet

| User says | Trigger |
|---|---|
| "process inbound pitch emails" | Schedule (daily) |
| "triage support emails every morning" | Schedule (daily 08:00) |
| "every Monday summarize last week's emails" | Schedule (weekly) |
| "analyze my inbox and create Notion entries" | Schedule (daily) |
| "page oncall within 30s of an escalation email" | App event |
| "create a ticket the moment a customer emails" | App event |
| "run every time a form is submitted" | Webhook |
| "user clicks Run" | Manual |

---

## One-line rule

> Default to Schedule + label-based dedup for email and document intake; use event triggers only when the user explicitly states a latency requirement under one minute.
@@ -0,0 +1,135 @@
# 02 — Dedup gates: idempotency for agentic workflows

**Problem**: Without a dedup gate, every record in a polling or webhook workflow gets processed multiple times — silently, expensively, and often with conflicting writes.

**Why it fails silently**: The first few runs look correct. Duplicates only surface when you notice your CRM has 3 entries for the same company, your enrichment bill is 2× expected, or your outreach tool sent the same email twice. By then, the damage is done.

---

## The Gmail label-ID bug (the most common dedup failure)

This is the one that wastes 2 hours and isn't documented anywhere.

You build an email polling workflow. You add a step to mark each email as processed. You pass the label name:

```json
{
  "action": "GMAIL_ADD_LABEL",
  "input": {
    "message_id": "{{currentItem.id}}",
    "label_id": "processed"
  }
}
```

Result: `400 Bad Request: Invalid label: processed`

The Gmail API does not accept label **display names**. It requires internal label **IDs** — strings that look like `Label_3456789012345678`. The display name "processed" is what you see in Gmail's UI. The ID is what the API needs.

Same bug with any user-created label: `"agentled"`, `"reviewed"`, `"done"` — all invalid.
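
Resolving a display name to its ID is a lookup over the labels list; the Gmail API's `users.labels.list` returns `{ id, name }` pairs. A minimal sketch, with the response abbreviated to the two fields that matter:

```javascript
// Resolve a Gmail label display name to its internal ID.
// `labels` is the array from a users.labels.list response, abbreviated here.
function resolveLabelId(labels, displayName) {
  const match = labels.find((l) => l.name === displayName);
  if (!match) throw new Error(`No label named "${displayName}", create it first`);
  return match.id;
}

const labels = [
  { id: "INBOX", name: "INBOX" },
  { id: "Label_3456789012345678", name: "processed" },
];
resolveLabelId(labels, "processed"); // "Label_3456789012345678"
```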

---

## Anti-pattern

```json
// Wrong: passing label display name
{
  "action": "GMAIL_ADD_LABEL",
  "input": {
    "message_id": "{{currentItem.id}}",
    "label_id": "processed"
  }
}
// → 400: Invalid label: processed
```

---

## Correct pattern

Always resolve the label ID first using a create-or-get-label step:

```json
// Step 1: create label if it doesn't exist, or get existing (idempotent)
{
  "id": "ensure-label",
  "action": "GMAIL_CREATE_LABEL",
  "input": { "name": "processed" }
}
// Returns: { "id": "Label_3456789012345678", "name": "processed" }

// Step 2: fetch unprocessed emails
{
  "id": "fetch-emails",
  "action": "GMAIL_FETCH_EMAILS",
  "input": {
    "query": "-label:processed newer_than:1d",
    "max_results": 50
  }
}

// Step 3 (inside loop): mark each email processed using the resolved ID
{
  "id": "mark-processed",
  "action": "GMAIL_ADD_LABEL",
  "input": {
    "message_id": "{{currentItem.id}}",
    "label_id": "{{steps.ensure-label.id}}"  // ← resolved ID, not display name
  }
}
```

`GMAIL_CREATE_LABEL` is idempotent — if the label already exists, it returns the existing label's ID. Run it every time with no side effects.

---

## How label-based dedup works

The `-label:processed` filter in the fetch query does the dedup work:

1. First run: fetches 50 emails. Processes each. Adds `processed` label to each.
2. Second run: fetches emails without `processed` label. Those 50 are now excluded. Only new emails are returned.
3. Outage for 3 days: widen to `newer_than:7d` on the next run. All unprocessed emails in the window are caught. Processed ones are excluded.

This gives you effectively exactly-once processing with no database, no external state store, and no coordination overhead.

---

## Dedup for webhook triggers

Webhooks re-deliver. Always. Your endpoint will receive the same event 2–5× under normal conditions (retries on timeout, delivery confirmation failures). Without dedup:

```
Webhook fires → workflow starts → enrichment call × 3 duplicates → 3 CRM entries
```

The fix: use a unique event ID as an idempotency key and check before processing:

```javascript
// Code step at workflow entry
const eventId = input.webhookPayload.id; // or messageId, leadId, etc.
const alreadyProcessed = await kv.get(`processed:${eventId}`);
if (alreadyProcessed) {
  return { skipped: true, reason: "duplicate" };
}
await kv.set(`processed:${eventId}`, true, { ttl: 86400 });
```
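
One caveat with the get-then-set above: two concurrent deliveries can both read "not processed" before either writes. Where the store supports it, prefer an atomic check-and-set; a sketch with an in-memory `Map` standing in for the KV store (the `kv` API is illustrative):

```javascript
// Idempotency gate with check-and-set in a single synchronous step.
// A Map stands in for the platform KV store; `markIfFirst` returns true
// only for the first caller with a given event ID.
const seen = new Map();

function markIfFirst(eventId) {
  const key = `processed:${eventId}`;
  if (seen.has(key)) return false; // duplicate delivery, skip
  seen.set(key, Date.now());       // no await between check and set
  return true;
}

markIfFirst("evt-123"); // true, first delivery: process it
markIfFirst("evt-123"); // false, re-delivery: skip
```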

---

## Dedup patterns by source

| Source | Dedup mechanism |
|---|---|
| Gmail polling | `-label:processed` query + `GMAIL_ADD_LABEL` after processing |
| Webhook | Idempotency key from event ID, stored in KV or DB |
| Scheduled API poll | Cursor / `since_id` / `updated_at` timestamp stored in persistent memory |
| File/S3 intake | Move to `processed/` prefix after reading |
| Form submissions | Unique submission ID checked before processing |

---

## One-line rule

> Always resolve label IDs before passing them to the Gmail API — display names fail with a `400 Invalid label` error — and always add the processed label as the final step in every email intake loop.
@@ -0,0 +1,130 @@
# 03 — Credit efficiency: not burning money while building

**Problem**: Developers restart full workflow executions to debug a single failed step, burning credits on work that was already done correctly.

**Why it fails silently**: The restarted execution appears to succeed. The wasted spend accumulates in the background — 3–5× expected credit usage during development — until the invoice arrives.

---

## The core discipline: fix → retry → verify

Every debugging cycle should follow exactly this sequence:

1. **Identify** the failed step and its error
2. **Fix** the configuration, prompt, or code
3. **Retry from the failed step** — not from the beginning
4. **Verify** the step output

Starting a new full execution to debug a failed step is the most expensive habit in agentic development. It re-runs every step that already succeeded: the enrichment API call, the LLM prompt, the database read. All paid again. None of them changed.

---

## Anti-pattern

```
Execution fails at step 5 (AI scoring)
→ Developer reads the error
→ Fixes the prompt
→ Starts a NEW execution from step 1
→ Steps 1-4 run again: enrichment (5 credits), profile fetch (2 credits), web scrape (0 credits), data parse (0 credits)
→ Step 5 runs with the fixed prompt
→ Total wasted: 7 credits × every debug cycle
```

In a workflow with 3 debug cycles per feature: 21 wasted credits before it works.
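
The waste compounds linearly with debug cycles; the arithmetic, as a sketch (step costs taken from the example above):

```javascript
// Credits wasted by full restarts: every successful upstream step is paid
// again on each debug cycle. Costs come from the example above.
const upstreamCosts = [5, 2, 0, 0]; // enrichment, profile fetch, scrape, parse

function wastedCredits(costs, debugCycles) {
  const perCycle = costs.reduce((sum, c) => sum + c, 0);
  return perCycle * debugCycles;
}

wastedCredits(upstreamCosts, 3); // 21, vs 0 when retrying from the failed step
```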

---

## Correct pattern

```
Execution fails at step 5 (AI scoring)
→ Developer reads the error
→ Fixes the prompt in the workflow config
→ Retries from step 5 — the platform reuses outputs from steps 1-4
→ Step 5 runs with the fixed prompt
→ Total wasted: 0 credits
```

Most workflow platforms expose a "retry from this step" action on failed executions. Use it every time.

---

## Test steps in isolation before wiring them

Before adding a step to a live workflow, test it standalone with representative input data:

```
# Test an AI step with real input — no execution, no credits for upstream steps
test_ai_action(
  template: "Analyze this company: {{input.company}}. Score fit 0-100.",
  responseStructure: { score: "number", reasoning: "string" },
  input: { company: { name: "Stripe", industry: "fintech", employees: 4000 } }
)

# Test a code step in the same sandbox as production
test_code_action(
  code: "return input.items.filter(i => i.score > 70)",
  input: { items: [{ name: "A", score: 85 }, { name: "B", score: 60 }] }
)
```

This catches errors before they're in a running execution. Zero credits for upstream steps.
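
The same check can be run locally before the step ever touches the platform; a sketch where `runCodeStep` is our stand-in for the sandbox, treating the step body as a function of `input`:

```javascript
// Minimal local stand-in for a code-step sandbox: wrap the step body in a
// function of `input` and run it against fixture data.
// `runCodeStep` is illustrative, not a platform API.
function runCodeStep(body, input) {
  return new Function("input", body)(input);
}

const result = runCodeStep(
  "return input.items.filter(i => i.score > 70)",
  { items: [{ name: "A", score: 85 }, { name: "B", score: 60 }] }
);
// result: [{ name: "A", score: 85 }]
```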

---

## Mock downstream steps with prior output

When you need to test a downstream step (step 6) but don't want to re-run expensive upstream steps (steps 1-5):

1. Find a prior execution where steps 1-5 succeeded
2. Copy the output of step 5 from that execution
3. Use it as mock input to `test_ai_action` or `test_code_action` for step 6

```
# Prior execution step 5 output (saved from execution abc-123):
priorOutput = {
  company: { name: "Stripe", score: 85, signals: ["YC", "series B"] }
}

# Test step 6 in isolation using that output
test_ai_action(
  template: "Based on this profile, draft a 3-sentence outreach: {{input.company}}",
  input: priorOutput
)
```

No re-enrichment. No re-fetching. No wasted credits.

---

## One execution at a time

Don't start a new execution while one is in flight for the same workflow. Reasons:
- Parallel executions on the same data produce duplicate writes
- You can't read the output of execution A while debugging it if execution B is also running
- If both fail, you now have two half-processed states to reconcile

The discipline: start → observe → retry or fix → verify. Sequential, not parallel.
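
A minimal sketch of that discipline as an in-flight guard (illustrative; on most platforms the equivalent is a per-workflow concurrency limit of 1):

```javascript
// One-execution-at-a-time guard: reject a new start while a run is in flight.
const inFlight = new Set();

function startExecution(workflowId) {
  if (inFlight.has(workflowId)) {
    return { started: false, reason: "execution already in flight" };
  }
  inFlight.add(workflowId);
  return { started: true };
}

function finishExecution(workflowId) {
  inFlight.delete(workflowId);
}

startExecution("intake"); // { started: true }
startExecution("intake"); // { started: false, reason: "execution already in flight" }
finishExecution("intake");
startExecution("intake"); // { started: true }
```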

---

## Credit cost by step type (reference)

| Step type | Typical cost | Notes |
|---|---|---|
| AI action (standard model) | 5–15 credits | Varies by model tier and output length |
| Data enrichment (LinkedIn, Hunter) | 2–5 credits | Per-record cost |
| Web scrape | 0 credits | Free |
| HTTP request | 0 credits | Free |
| Code step | 0 credits | Free |
| Knowledge graph read/write | 1 credit | Flat |
| Browser automation | 10–15 credits | Per task |

Expensive steps are AI and enrichment. These are the ones you never want to re-run unnecessarily.

---

## One-line rule

> When a step fails, fix it and retry from that step — never start a new execution; use isolated step testing to catch errors before they're in a running workflow.
@@ -0,0 +1,147 @@
# 04 — Loop patterns: iterating without N+1 or data loss

**Problem**: Loops in agentic workflows silently drop items, produce N+1 API calls, or pass incomplete results to downstream steps because the loop hasn't finished yet.

**Why it fails silently**: A loop that processes 10 items looks the same in logs as one that processes 9 — the missing item has no error, just an absence. Downstream steps that read loop output before completion get partial data with no warning.

---

## The loop completion trap

The most common loop mistake: a downstream step reads loop results before the loop has finished.

```yaml
# Wrong: downstream step starts before loop finishes
steps:
  - id: enrich-companies
    type: loop
    over: "{{input.companies}}"
    step: enrich-each

  - id: generate-report   # starts immediately, reads partial results
    type: ai-action
    input: "{{steps.enrich-companies.results}}"
```

In async execution, `generate-report` may start with 3 of 10 companies enriched. The report is incomplete. No error is raised.

---

## Anti-pattern

```yaml
# Wrong: no loop completion gate
- id: process-items
  type: loop
  over: "{{steps.fetch.items}}"
  step: process-each

- id: summarize   # may run with 0 items if loop is still in flight
  type: ai-action
  prompt: "Summarize these results: {{steps.process-items.outputs}}"
```

---

## Correct pattern

Add a `loop_completion` entry condition on every step that consumes loop output:

```yaml
- id: process-items
  type: loop
  over: "{{steps.fetch.items}}"
  step: process-each

- id: summarize
  type: ai-action
  entryConditions:
    onCriteriaFail: "wait"   # block until condition is met
    conditionText: "Wait for all processing to complete"
    criteria:
      - type: loop_completion
        stepId: process-items   # which loop to wait for
        operator: "=="
        value: true
  prompt: "Summarize these results: {{steps.process-items.outputs}}"
```

`onCriteriaFail: "wait"` blocks this step until all loop iterations finish. The step then runs once with the complete output.
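
What the gate waits for reduces to a predicate over the loop's iteration status; a sketch (the status shape here is illustrative, not a platform API):

```javascript
// The loop_completion criterion reduces to: every dispatched iteration has
// reached a terminal state. Only then is the aggregate output complete.
function loopComplete(status) {
  return status.completed + status.failed === status.total;
}

loopComplete({ total: 10, completed: 3, failed: 0 }); // false, partial data
loopComplete({ total: 10, completed: 9, failed: 1 }); // true, safe to aggregate
```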

---

## Pairing loop results back to source records

After a loop that calls an external API or runs an AI step per item, you often need to pair each result back to the original record for a KG or CRM write.

The problem: loop outputs are indexed by iteration order, not by the original record's ID.

```javascript
// Code step: pair loop outputs with source records
const sourceItems = input.sourceItems; // original array
const loopOutputs = input.loopOutputs; // same-length array of results

return sourceItems.map((item, index) => ({
  ...item,               // original fields
  ...loopOutputs[index], // enriched fields
  sourceId: item.id,     // explicit ID link
}));
```

Place this code step after the loop completion gate, before the write step.
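
Positional pairing assumes no iteration was skipped or reordered. When each output carries its source ID, a keyed join is the safer variant; a sketch (`pairById` is ours):

```javascript
// Keyed join: pair outputs to sources by ID instead of position, so a
// skipped or reordered iteration can't mis-assign results.
function pairById(sourceItems, loopOutputs) {
  const byId = new Map(loopOutputs.map((out) => [out.sourceId, out]));
  return sourceItems.map((item) => {
    const { sourceId, ...enriched } = byId.get(item.id) ?? {};
    return { ...item, ...enriched }; // missing output leaves the record unenriched
  });
}

pairById(
  [{ id: "a", name: "Acme" }, { id: "b", name: "Bolt" }],
  [{ sourceId: "b", score: 70 }] // iteration for "a" was skipped
);
// → [{ id: "a", name: "Acme" }, { id: "b", name: "Bolt", score: 70 }]
```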

---

## N+1: when to loop vs when to batch

A loop that calls an LLM or enrichment API once per item is an N+1 pattern. For 100 items: 100 API calls, 100 credit charges, 100× the latency.

**Ask: does the API support batch input?**

```yaml
# Wrong (N+1): one LLM call per item
- id: classify-each
  type: loop
  over: "{{input.emails}}"
  step:
    type: ai-action
    prompt: "Classify this email: {{currentItem.body}}"

# Correct (batch): one LLM call for all items
- id: classify-all
  type: ai-action
  prompt: |
    Classify each of these emails. Return a JSON array in the same order.
    Emails: {{input.emails}}
  responseStructure:
    classifications: "array of { id: string, category: string, priority: string }"
```

Not every step supports batching — enrichment APIs often don't. But AI steps almost always do. Default to batch for AI classification, extraction, and scoring over lists.
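
For long lists, one giant batch prompt can overflow the context window; a middle ground is chunked batching: `ceil(N / size)` calls instead of N. A sketch of the chunking helper (illustrative):

```javascript
// Chunked batching: split N items into ceil(N / size) batches, so an AI
// step makes a handful of calls instead of N, without overflowing context.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

chunk([1, 2, 3, 4, 5, 6, 7], 3); // [[1, 2, 3], [4, 5, 6], [7]]
```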

---

## Fire-and-forget anti-pattern

```yaml
# Wrong: loop dispatches child workflows with no completion tracking
- id: dispatch-scoring
  type: loop
  over: "{{input.candidates}}"
  step:
    type: call-workflow
    workflowId: score-candidate
    input: "{{currentItem}}"

- id: aggregate-scores   # starts immediately — child workflows haven't finished
  type: ai-action
  prompt: "Aggregate these scores: {{steps.dispatch-scoring.outputs}}"
```

When the loop calls child workflows, completion tracking is especially important — child workflow execution time varies. Always add a `loop_completion` gate before aggregating.

---

## One-line rule

> Always gate the step that consumes loop output on `loop_completion` with `onCriteriaFail: "wait"` — loops run asynchronously and downstream steps will read partial data without it.
@@ -0,0 +1,151 @@
# 05 — Child workflow contracts: composable workflows with typed returns

**Problem**: Monolithic workflows become unmaintainable, and child workflows called from orchestrators fail silently because their return contracts aren't defined — the calling workflow gets `undefined` for every field it tries to read.

**Why it fails silently**: A child workflow that ends with a `milestone` step instead of a `return` step completes successfully from the platform's perspective. The calling workflow receives no data and no error. Every field reference like `{{steps.call-child.score}}` resolves to empty string.

---

## The milestone vs return mistake

```yaml
# Wrong: child workflow ends with milestone
- id: score-company
  type: ai-action
  prompt: "Score this company 0-100..."
  responseStructure:
    score: "number"
    decision: "string"

- id: done   # ← milestone = terminal, no data returned to caller
  type: milestone
  name: "Complete"
```

The calling orchestrator runs `call-workflow` and gets back nothing. `{{steps.call-child.score}}` is empty. No error is raised.

---

## Anti-pattern

```yaml
# Wrong: child workflow
steps:
  - id: enrich
    ...
  - id: score
    ...
  - id: done   # milestone doesn't return data
    type: milestone

# Calling orchestrator:
  - id: call-child
    type: call-workflow
    input: { companyUrl: "{{input.url}}" }

  - id: use-result
    type: ai-action
    prompt: "Based on score {{steps.call-child.score}}..."
    # ^ always empty — milestone returned nothing
```

---

## Correct pattern

Child workflows must end with a `return` step that explicitly declares what they return:

```yaml
# Correct: child workflow
steps:
  - id: enrich
    ...
  - id: score
    type: ai-action
    responseStructure:
      score: "number 0-100"
      decision: "invest | pass | monitor"
      summary: "string"

  - id: return-results   # ← return step, not milestone
    type: return
    returnConfig:
      fields:
        - name: score     # the name the caller uses
          stepId: score   # which step produced it
          field: score    # which field from that step
        - name: decision
          stepId: score
          field: decision
        - name: summary
          stepId: score
          field: summary

# Calling orchestrator:
  - id: call-child
    type: call-workflow
    input: { companyUrl: "{{input.url}}" }

  - id: use-result
    type: ai-action
    prompt: "Based on score {{steps.call-child.score}}, decision: {{steps.call-child.decision}}..."
    # ^ now populated correctly
```

---

## Designing a return contract

A return contract is the interface between the child workflow and its callers. Treat it like a function signature:

1. **Be explicit**: list every field the caller might need — don't assume they'll dig into nested objects
2. **Use flat field names**: `score` not `scoringCard.total_score` — callers reference these as template variables
3. **Match names to caller expectations**: if the orchestrator uses `{{steps.call-child.decision}}`, the return contract must export a field named `decision`
4. **Version changes carefully**: adding fields is safe; renaming or removing fields breaks every orchestrator that calls this child

```yaml
# Comprehensive return contract
returnConfig:
  fields:
    - { name: score, stepId: score-step, field: total_score }
    - { name: decision, stepId: score-step, field: decision }
    - { name: summary, stepId: score-step, field: executive_summary }
    - { name: teamEvaluations, stepId: eval-team, field: evaluations }
    - { name: rawData, stepId: enrich, field: companyProfile }
```
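
The contract can also be checked mechanically on the caller side before any field is read; a sketch following the `returnConfig` shape above (the validator itself is ours, not a platform API):

```javascript
// Caller-side guard: verify a child workflow's return payload carries every
// field the contract declares, instead of silently reading undefined.
function validateReturn(contractFields, payload) {
  const missing = contractFields
    .map((f) => f.name)
    .filter((name) => payload[name] === undefined);
  return { ok: missing.length === 0, missing };
}

const contract = [
  { name: "score", stepId: "score-step", field: "total_score" },
  { name: "decision", stepId: "score-step", field: "decision" },
];

validateReturn(contract, { score: 85, decision: "invest" }); // { ok: true, missing: [] }
validateReturn(contract, {}); // { ok: false, missing: ["score", "decision"] }
```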

---

## The god-workflow anti-pattern

A single workflow with 25+ steps that handles intake, enrichment, scoring, routing, outreach, and CRM sync:

```
trigger → fetch → enrich → score → route → draft-email → approve → send →
update-crm → update-kg → notify-slack → generate-report → archive → done
```

Problems:
- A failure in step 12 requires re-running steps 1-11
- You can't reuse the scoring logic in another context
- Testing requires running the entire pipeline end-to-end
- A single developer change can break the entire flow

**Break it into composable child workflows:**

```
Orchestrator:
  trigger → call: enrich-workflow → call: score-workflow → call: route-workflow → done

enrich-workflow: fetch → enrich → return { profile }
score-workflow:  receive profile → score → return { score, decision }
route-workflow:  receive decision → route → draft → approve → send → return { sent }
```

Each child workflow can be tested independently, retried independently, and reused by other orchestrators.

---

## One-line rule

> Child workflows must end with a `return` step (not `milestone`) with an explicit `returnConfig.fields` list — milestone completes silently with no data returned to the caller.