npm - @temporal-architect/claude-plugin - Versions diffs - 0.9.0 - Mend

@temporal-architect/claude-plugin 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (77)

package/skills/temporal-architect-design/reference/anti-patterns.md ADDED

@@ -0,0 +1,332 @@
+# Common Anti-Patterns
+> **Re-check pass (required).** This catalog is not just reference — during the [Design Review](../SKILL.md#design-review), walk the *finished* design against **every** anti-pattern below and confirm none apply. Designs drift into these shapes by copying prior work uncritically; pattern-matching against a catalog of bad shapes is exactly what this pass is for. The checklist line is: "Design re-checked against every anti-pattern in `anti-patterns.md`."
+## Structural
+### Unbounded History
+A workflow that runs indefinitely without resetting its history accumulates events without bound, which slows replay and eventually fails the workflow.
+```twf
+# BAD: Infinite loop with no history reset
+# workflow EventProcessor(config: Config):
+#     for:
+#         activity PollEvents(config) -> events
+#         activity ProcessBatch(events)
+# GOOD: Continue-as-new resets history periodically
+workflow EventProcessor(config: Config):
+    state:
+        condition shutdownRequested
+    signal Shutdown():
+        set shutdownRequested
+    for:
+        if (shutdownRequested):
+            close complete
+        activity PollEvents(config) -> events
+        activity ProcessBatch(events)
+        close continue_as_new(config)
+activity PollEvents(config: Config) -> (Events):
+    return poll(config)
+activity ProcessBatch(events: Events):
+    process(events)
+```
+**Why:** Temporal stores every event in workflow history. Long-running workflows without `close continue_as_new` grow history without bound, causing slow replays and eventual failure. See [long-running.md](../topics/long-running.md).
+> **Bounded is not automatically safe.** The rule is not "infinite loops need `continue_as_new`" — it's "**loops whose accumulated history is large need `continue_as_new`; a bound alone is not sufficient.**" A loop with an internal bound (e.g. `for` over 40 iterations) still grows history linearly, and if per-iteration history is chunky (a large `LlmCall` result plus N tool calls each iteration) or the bound is high, it can blow the limit before finishing. State the strategy explicitly in the design — "bounded at N, per-iteration history small, no `continue_as_new`" or "resets every K iterations" — rather than leaving it silent.
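The "bounded is not automatically safe" rule can be checked with simple arithmetic. The sketch below is a rough estimate in Python, not a Temporal API: `LoopProfile`, `needs_continue_as_new`, and the per-iteration figures are hypothetical, and the two limits are assumed values (the commonly documented per-workflow history limits) that should be verified for your deployment.

```python
from dataclasses import dataclass

# Assumed history limits; verify against the Temporal docs for your deployment.
MAX_HISTORY_EVENTS = 51_200
MAX_HISTORY_BYTES = 50 * 1024 * 1024


@dataclass(frozen=True)
class LoopProfile:
    iterations: int            # the loop's internal bound
    events_per_iteration: int  # history events one iteration appends (estimate)
    bytes_per_iteration: int   # payload bytes one iteration appends (estimate)


def needs_continue_as_new(profile: LoopProfile, headroom: float = 0.5) -> bool:
    """True if the loop would use more than `headroom` of either limit."""
    total_events = profile.iterations * profile.events_per_iteration
    total_bytes = profile.iterations * profile.bytes_per_iteration
    return (
        total_events > MAX_HISTORY_EVENTS * headroom
        or total_bytes > MAX_HISTORY_BYTES * headroom
    )


# Same bound of 40 iterations, very different history footprints.
small = LoopProfile(iterations=40, events_per_iteration=12, bytes_per_iteration=4_096)
chunky = LoopProfile(iterations=40, events_per_iteration=60, bytes_per_iteration=1_000_000)

print(needs_continue_as_new(small))   # False
print(needs_continue_as_new(chunky))  # True
```

Both loops have the same bound; only the second one needs a reset strategy, because its per-iteration payload pushes total history size past the chosen headroom.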
+### Wrapper Workflow
+A child workflow containing a single activity call adds orchestration overhead with no benefit.
+```pseudo
+# BAD: Unnecessary child workflow wrapper
+workflow Parent():
+    workflow SendEmailWorkflow(to, body)
+workflow SendEmailWorkflow(to, body):
+    activity SendEmail(to, body)
+    close complete
+# GOOD: Call the activity directly
+workflow Parent():
+    activity SendEmail(to, body)
+```
+**Why:** Child workflows create separate history, require their own task queue routing, and add latency. Use them only when you need independent retry policies, a separate failure boundary, or multi-step orchestration.
+### Monolithic Workflow
+All business logic in a single workflow with dozens of sequential steps.
+```pseudo
+# BAD: One workflow doing everything
+workflow ProcessOrder(order):
+    activity Validate(order)
+    activity CheckInventory(order)
+    activity ReserveInventory(order)
+    activity ChargePayment(order)
+    activity CreateShipment(order)
+    activity NotifyWarehouse(order)
+    activity UpdateCRM(order)
+    activity SendConfirmation(order)
+    activity ScheduleFollowUp(order)
+    # ... 20 more steps
+# GOOD: Decompose into child workflows with clear boundaries
+workflow ProcessOrder(order):
+    activity ValidateOrder(order) -> validated
+    workflow FulfillOrder(validated) -> fulfillment
+    workflow NotifyStakeholders(order, fulfillment)
+    close complete(OrderResult{fulfillment})
+```
+**Why:** Large workflows have large histories (slow replay), make failure recovery coarse-grained (one failure may require re-running unrelated steps), and are hard to test. Decompose when a group of steps has its own lifecycle, retry needs, or failure boundary.
+### Large Payloads in Workflow State
+Storing large data (files, full database results, images) in workflow variables or signal/update payloads.
+```pseudo
+# BAD: Entire dataset in workflow state
+workflow AnalyzeData(datasetId):
+    activity FetchDataset(datasetId) -> dataset  # 500MB result stored in history
+    activity Analyze(dataset) -> results
+# GOOD: Pass references, not data
+workflow AnalyzeData(datasetId):
+    activity FetchAndStore(datasetId) -> dataRef  # Returns S3 key, not data
+    activity Analyze(dataRef) -> results
+```
+**Why:** Every activity input and result is persisted in workflow history. Large payloads bloat history size, slow down replay, and may exceed Temporal's payload size limit. Pass references (IDs, URLs, keys) instead of data.
+> **Default: defer to the payload codec.** Temporal supports a **payload codec / data converter** that transparently offloads, compresses, or encrypts large payloads with *no change* to workflow/activity signatures — and the claim-check pattern itself is, most of the time, best implemented as a codec server that swaps a large payload for a reference behind the scenes. So the decision is:
+>
+> - **Default — let the codec server handle it** (claim-check included). Note it cleanly and move on: *"large-payload claim-check handled by the codec server."* Do **not** invent a bespoke claim-check store at design time.
+> - **Escalate to an explicit application-level `*Ref`** only when the data outlives the workflow, is shared across services, or needs an ownership/GC story the codec can't own. *Only in that case* does the design owe a one-line note on backing store + lifecycle.
+## Primitive Misuse
+### Signal for Request-Response
+Using a signal when the caller needs confirmation or a return value.
+```pseudo
+# BAD: Signal has no return value — caller doesn't know if it worked
+signal ApproveOrder(orderId):
+    approved = true
+# GOOD: Update returns a result to the caller
+update ApproveOrder(orderId: string) -> (ApprovalResult):
+    activity ValidateApproval(orderId) -> validation
+    if (validation.ok):
+        approved = true
+        return ApprovalResult{accepted: true}
+    else:
+        return ApprovalResult{accepted: false, reason: validation.error}
+```
+**Why:** Signals are fire-and-forget — the sender gets no acknowledgment, no validation, and no result. Use `update` when the caller needs to know the mutation was accepted.
+### Query That Modifies State
+Using a query handler to change workflow state.
+```pseudo
+# BAD: Query with side effects
+query GetOrderStatus():
+    accessCount = accessCount + 1  # Modifies state!
+    return OrderStatus{status, accessCount}
+# GOOD: Query is a pure read
+query GetOrderStatus():
+    return OrderStatus{status}
+```
+**Why:** Queries are read-only by contract. They are not recorded in workflow history and may run any number of times, so a state change made in a query handler is not reproduced on replay. Such modifications produce unpredictable behavior and violate Temporal's execution model.
+### Update Without Validation
+Accepting an update without checking whether the mutation is valid.
+```pseudo
+# BAD: Blindly applies the update
+update SetShippingAddress(address):
+    shippingAddress = address
+    return Result{ok: true}
+# GOOD: Validate before committing
+update SetShippingAddress(address: Address) -> (Result):
+    activity ValidateAddress(address) -> validation
+    if (validation.valid):
+        shippingAddress = address
+        return Result{ok: true}
+    else:
+        return Result{ok: false, error: validation.reason}
+```
+**Why:** Updates execute inside the workflow — invalid data corrupts workflow state. Always validate before committing. The caller receives the validation result, so they can react to rejection.
+### Detach When You Need the Result
+Using `detach` on a child workflow or nexus call when the parent needs the outcome.
+```pseudo
+# BAD: Detached — parent can't observe success or failure
+detach workflow ProcessPayment(order)
+# ... parent continues, has no idea if payment succeeded
+# GOOD: Synchronous call or promise when result matters
+workflow ProcessPayment(order) -> paymentResult
+# or: promise p <- workflow ProcessPayment(order) ... await p -> paymentResult
+```
+**Why:** `detach` is fire-and-forget — the parent cannot await the result, check for errors, or compensate on failure. Use `detach` only when you genuinely don't care about the outcome (audit logs, analytics, notifications where failure is acceptable).
+## Activity Anti-Patterns
+### Non-Determinism in Workflows
+Using non-deterministic operations directly in workflow code.
+```pseudo
+# BAD: Current time varies on replay
+# if (current_time() > deadline):
+#     cancel()
+# BAD: Map iteration order varies across replays
+# for (key in map.keys()):
+#     activity Process(key)
+# BAD: Goroutines/threads — execution order not deterministic
+# go func() { activity DoWork() }
+# GOOD: Use Temporal primitives for time
+# await one:
+#     activity DoWork() -> result:
+#         close complete(Result{result})
+#     timer(deadline):
+#         close fail(Result{status: "timeout"})
+# GOOD: Sort before iterating
+# for (key in sorted(map.keys())):
+#     activity Process(key)
+# GOOD: Use promises for concurrency
+# promise a <- activity DoWorkA()
+# promise b <- activity DoWorkB()
+# await a -> resultA
+# await b -> resultB
+```
+**Why:** Temporal replays workflow code to reconstruct state. Any operation that produces different results on replay — time, random numbers, non-deterministic iteration, language-level threading — causes non-determinism errors. See [core-principles.md](./core-principles.md).
+### Non-Idempotent Activities
+Activities that fail or produce incorrect results on retry.
+```pseudo
+# BAD: Assumes fresh state — duplicate user on retry
+activity CreateUser(name):
+    db.insert(User(name))
+# GOOD: Create-or-get — idempotent
+activity CreateUser(name):
+    existing = db.get_by_name(name)
+    if existing: return existing
+    return db.insert(User(name))
+```
+**Why:** Activities may be retried on network failures, worker crashes, or timeouts. An activity that isn't idempotent (idempotent meaning same inputs → same result, however many times it runs) will produce duplicate records, double charges, or inconsistent state. See [core-principles.md](./core-principles.md) for idempotency patterns.
+### Orchestration in Activities
+Putting multi-step logic, retry loops, or conditional branching inside an activity.
+```pseudo
+# BAD: Multi-step orchestration in activity — partial failure unrecoverable
+activity DeployAll(specs):
+    for spec in specs:
+        deploy(spec)          # If this fails on spec #5 of 10,
+        wait_healthy(spec)    # specs 1-4 deployed but no rollback
+```
+```twf
+# GOOD: Workflow orchestrates, each step independently retryable
+workflow DeployAll(specs: Specs):
+    for (spec in specs.items):
+        activity Deploy(spec)
+        activity WaitHealthy(spec)
+    close complete
+activity Deploy(spec: Spec):
+    deploy(spec)
+activity WaitHealthy(spec: Spec):
+    wait_healthy(spec)
+```
+**Why:** Activities run outside Temporal's durable execution model — if an activity fails mid-way through a loop, there's no replay, no history, and no way to resume from the last successful step. Workflows provide exactly this: durable, retryable orchestration with full visibility into progress.
+### Activity Sprawl / Wrapping In-Memory Work
+Wrapping work in an activity when nothing touches an external system — reading data the workflow already holds, in-memory derivation, or accumulation.
+```pseudo
+# BAD: activities that touch no external system
+activity ReadCritiqueReady(state) -> ready   # field access on held data
+activity ListSubsetPaperIds(papers) -> ids   # in-memory filter
+activity AppendObservation(list, obs) -> list # building a collection
+# GOOD: this is workflow code
+ready = state.critiqueReady
+ids = filter(papers, isSubset)
+observations = append(observations, obs)
+```
+**Why:** Each spurious activity is a task-queue round-trip plus a history event for no resilience benefit. Activities are for I/O and side effects; in-memory work is deterministic workflow code. See [core-principles.md](./core-principles.md#activities-are-for-io--not-in-memory-work).
+## Deployment Topology
+### Nexus for Same-Namespace Calls
+Using a Nexus operation to call a workflow that lives in the same namespace.
+```pseudo
+# BAD: Nexus hop within one namespace — adds an endpoint, a service
+#      contract, and latency for no boundary benefit
+nexus InternalEndpoint InternalService.DoStep(args) -> result
+# GOOD: same namespace — call the child workflow (or activity) directly
+workflow DoStep(args) -> result
+```
+**Why:** Nexus exists to cross an **organizational** boundary — different team, security context, deployment lifecycle, or an external service contract. Within a single namespace those boundaries don't exist, so Nexus only adds an endpoint declaration, a typed contract, and a network hop. Coupling between workflows is an argument for *co-location*, not a Nexus boundary. See [namespaces.md](./namespaces.md) and [workflow-boundaries.md](./workflow-boundaries.md#use-nexus-when).
+### Deployment Config in Workers Instead of Namespaces
+Putting task queues, concurrency limits, or other deployment options on `worker` definitions.
+```pseudo
+# BAD: worker carries deployment config
+worker orderTypes:
+    workflow ProcessOrder
+    options:
+        task_queue: "orders"   # workers are type sets, not deployments
+# GOOD: worker is a reusable type set; the namespace instantiates it with config
+worker orderTypes:
+    workflow ProcessOrder
+namespace ecommerce:
+    worker orderTypes
+        options:
+            task_queue: "orders"
+```
+**Why:** A `worker` is a *reusable type set* (which workflows/activities/services run together) with no deployment config; the `namespace` is what instantiates it with a `task_queue` and options. Mixing the two prevents reusing the same type set across namespaces (staging vs prod) and is rejected by `twf check`. See [task-queues.md](../topics/task-queues.md).

package/skills/temporal-architect-design/reference/common-errors.md ADDED

@@ -0,0 +1,88 @@
+# Common Errors
+This file covers **parser, resolver, and validator diagnostics** emitted by
+`twf check` and `twf parse`. For **design-level anti-patterns** (structural
+mistakes, primitive misuse), see [anti-patterns.md](./anti-patterns.md).
+Each row lists the symbolic `code` (stable across releases), the human
+message you'll see, the cause, and the fix. The codes are also emitted by
+`twf parse` inside the structured envelope (`diagnostics[].code`); programmatic
+consumers should match on `kind+code` rather than the message.
+## Resolve errors (kind: `resolve`)
+| Code | Message | Cause | Fix |
+|------|---------|-------|-----|
+| `UNDEFINED_ACTIVITY` | `undefined activity: Foo` | Activity `Foo` is called but not defined | Add `activity Foo(...):` definition to the file |
+| `UNDEFINED_WORKFLOW` | `undefined workflow: Foo` | Child workflow `Foo` is called but not defined | Add `workflow Foo(...):` definition to the file |
+| `UNDEFINED_SIGNAL` | `undefined signal: Foo` | `await signal Foo` or `signal Foo:` case but no signal handler declared | Add `signal Foo(...):` declaration inside the workflow, before the body |
+| `UNDEFINED_UPDATE` | `undefined update: Foo` | `await update Foo` or `update Foo:` case but no update handler declared | Add `update Foo(...) -> (Type):` declaration inside the workflow, before the body |
+| `UNDEFINED_CONDITION` | `undefined condition: Foo` | `set Foo`, `unset Foo`, or `await Foo` but no condition declared | Add `condition Foo` inside the workflow's `state:` block |
+| `UNDEFINED_PROMISE_OR_CONDITION` | `undefined promise or condition: Foo` | `await Foo` or `Foo:` case in `await one` but `Foo` is not a promise or condition | Add `promise Foo <- ...` in the workflow body or `condition Foo` in the `state:` block |
+| `DUPLICATE_WORKFLOW` | `duplicate workflow definition: Foo` | Two `workflow Foo` definitions in the same file | Remove or rename the duplicate |
+| `DUPLICATE_ACTIVITY` | `duplicate activity definition: Foo` | Two `activity Foo` definitions in the same file | Remove or rename the duplicate |
+| `DUPLICATE_WORKER` | `duplicate worker definition: Foo` | Two `worker Foo` definitions | Remove or rename the duplicate |
+| `DUPLICATE_NAMESPACE` | `duplicate namespace definition: Foo` | Two `namespace Foo` blocks | Remove or rename the duplicate |
+| `DUPLICATE_NEXUS_SERVICE` | `duplicate nexus service definition: Foo` | Two `nexus service Foo` blocks | Remove or rename the duplicate |
+| `DUPLICATE_ENDPOINT` | `duplicate nexus endpoint name "Foo": defined in namespace A and namespace B` | Same endpoint name in multiple namespaces | Use unique endpoint names |
+| `CONDITION_RESULT_BINDING` | `condition "Foo" cannot have a result binding (-> identifier)` | `await Foo -> result` where `Foo` is a condition | Conditions are boolean — remove the `-> result` binding |
+| `NEXUS_ASYNC_UNDEFINED_WORKFLOW` | `async operation Foo references undefined workflow: Bar` | Async nexus op points at a workflow that doesn't exist | Add the workflow or fix the name |
+| `NEXUS_UNDEFINED_ENDPOINT` | `undefined nexus endpoint: Foo` | Endpoint referenced but not defined anywhere | Add a `nexus endpoint Foo:` in some namespace, or fix the name |
+| `NEXUS_UNDEFINED_SERVICE` | `undefined nexus service: Foo` | Service referenced but not defined | Add a `nexus service Foo:` block or fix the name |
+| `NEXUS_NO_OPERATION` | `nexus service Foo has no operation Bar` | Operation name not in the service | Add the operation or fix the name |
+| `WORKER_UNDEFINED_WORKFLOW` / `WORKER_UNDEFINED_ACTIVITY` / `WORKER_UNDEFINED_NEXUS_SERVICE` | `worker X references undefined ...` | Worker lists a name that doesn't exist | Add the definition or fix the name |
+| `NAMESPACE_UNDEFINED_WORKER` | `namespace X references undefined worker: Y` | Namespace uses unknown worker | Add worker block or fix name |
+### Nexus resolution: external (warning) vs. local (error)
+Nexus references resolve in one of two modes, decided **per category** — services and
+endpoints are independent axes:
+| Category | Nothing of that category defined in the file set | Any of that category defined |
+|----------|--------------------------------------------------|------------------------------|
+| Service  | `NEXUS_UNRESOLVED_SERVICE` — warning, exit 0 ("may be external") | `NEXUS_UNDEFINED_SERVICE` — error, exit 1 |
+| Endpoint | `NEXUS_UNRESOLVED_ENDPOINT` — warning, exit 0 ("may be external") | `NEXUS_UNDEFINED_ENDPOINT` — error, exit 1 |
+(Endpoints are defined inside `namespace` blocks; services via top-level `nexus service`.)
+**Gotcha:** defining *one* local service retroactively turns *every other* unresolved service
+reference into a hard error, even references to genuinely external services in other namespaces. This is
+a sharp cliff for a partial / per-package file that both *calls* external services and *provides*
+its own. Until an explicit external marker exists, either (a) add a local stub definition for
+each external service you call, or (b) define no nexus services in the file and accept the
+warnings.
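The per-category rule in the table above can be modelled in a few lines. This is an illustrative Python model of the documented behaviour, not the `twf` implementation; the function name `resolve_reference` and its signature are hypothetical.

```python
from typing import Optional, Set, Tuple


def resolve_reference(category: str, name: str, defined: Set[str]) -> Optional[Tuple[str, str]]:
    """Return (code, severity) for a nexus reference, or None if it resolves.

    `category` is "SERVICE" or "ENDPOINT"; `defined` holds every name of that
    category defined anywhere in the file set.
    """
    if name in defined:
        return None
    if not defined:
        # Nothing of this category is defined locally: the target may be external.
        return (f"NEXUS_UNRESOLVED_{category}", "warning")
    # At least one local definition exists, so any miss is a hard error.
    return (f"NEXUS_UNDEFINED_{category}", "error")


# No local services: the external reference is only a warning.
print(resolve_reference("SERVICE", "Billing", set()))
# One unrelated local service: the same reference becomes an error.
print(resolve_reference("SERVICE", "Billing", {"Inventory"}))
# Endpoints are an independent axis and are unaffected by service definitions.
print(resolve_reference("ENDPOINT", "BillingEndpoint", set()))
```

The second call is the cliff described in the gotcha: adding `Inventory` locally changes the outcome for `Billing` even though nothing about `Billing` changed.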
+## Parse errors (kind: `parse`)
+All parse failures share the single code `SYNTAX`. The message carries the
+detail; pin programmatic dispatch to `kind=parse, code=SYNTAX` and match on
+the message for now (categorical parse codes are future work).
+| Message | Cause | Fix |
+|---------|-------|-----|
+| `<keyword> is not allowed in activity body` | Using a temporal primitive (`workflow`, `activity`, `timer`, `signal`, `await`, etc.) inside an activity definition or query handler | Move the temporal primitive to a workflow. Activities run outside the replay-safe workflow context as normal side-effecting code — temporal primitives require deterministic replay and cannot function in activities. |
+| `expected ( after return type ->` | Return type not parenthesized: `-> Result` | Use `-> (Result)` — return types must be wrapped in parentheses |
+| `expected ( after if` / `expected ( after for` | Missing parentheses around condition/iterator | Use `if (expr):` / `for (x in items):` |
+| `unexpected token <tok> at top level` | Statement or keyword that doesn't start a top-level definition | Ensure all top-level items are `workflow`, `activity`, `worker`, `namespace`, or `nexus service` definitions |
+| `unexpected token <tok> in await one case` | Invalid case type inside `await one:` block | Cases must be `signal`, `update`, `timer`, `activity`, `workflow`, an identifier, or `await all` |
+| `expected COLON, got NEWLINE` | A definition is missing its `:` and indented body — a bare declaration like `activity Foo(x) -> (R)` with nothing under it. `activity`/`workflow`/`sync` nexus op definitions always require a body. (Often followed by a cascading `UNDEFINED_*` because the malformed definition didn't register.) | Add `:` and an indented body. For a not-yet-implemented stub, use a placeholder statement (e.g. `return Foo{}` or a single `log(...)`); a definition cannot be body-less. |
+## Validation diagnostics (kind: `validate`)
+| Code | Severity | Cause | Fix |
+|------|----------|-------|-----|
+| `MISSING_TASK_QUEUE` | error | Worker instantiation has no `task_queue` option | Add `options: task_queue: "..."` to the worker instantiation |
+| `MISSING_ENDPOINT_TASK_QUEUE` | error | Nexus endpoint instantiation has no `task_queue` | Add the option to the endpoint instantiation |
+| `EXPLICIT_ROUTING_MISMATCH` | error | An activity/workflow call's explicit `task_queue` doesn't match any worker registering it | Fix the queue name or register the target on a worker for that queue |
+| `IMPLICIT_ROUTING_MISMATCH` | error | An activity/workflow is called without an explicit `task_queue` and no worker on the caller's queue registers it | Add the target to a worker on the same queue, or pass an explicit `task_queue` option |
+| `ENDPOINT_SERVICE_LINKAGE` | error | Endpoint routes to a task queue but no worker on that queue registers the service | Register the service on a worker for the endpoint's queue |
+| `TASK_QUEUE_MISMATCH` | error | Two workers share a queue but register different type sets | Make the type sets identical, or use distinct queues |
+| `TASK_QUEUE_IDENTICAL` | warning | Two workers register identical type sets on the same queue (redundant) | Drop one of the workers |
+| `UNCOVERED_WORKFLOW` / `UNCOVERED_ACTIVITY` / `UNCOVERED_SERVICE` | warning | Definition exists but no instantiated worker registers it | Register on a worker or remove the unused definition |
+| `UNINSTANTIATED_WORKER` | warning | Worker defined but never instantiated in any namespace | Instantiate it in a namespace, or remove the worker |
+| `EMPTY_WORKFLOW` / `EMPTY_ACTIVITY` / `EMPTY_WORKER` / `EMPTY_NAMESPACE` | warning | Block has no body / no registrations / no instantiations | Add content or remove the empty block |
+The diagnostic shape is the `envelope.Diagnostic` Go struct
+(`tools/lsp/cmd/twf/internal/envelope/model.go`); run any `twf --json`
+subcommand to see it live, or read its TypeScript projection in
+`tools/wire-types`.
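A consumer that dispatches on `kind` plus `code`, as recommended at the top of this file, might look like the Python sketch below. Only `diagnostics[].code` and the kind/code pairing come from the text; the `message` field name, the sample envelope, and the `HINTS` table are assumptions made for illustration, so check the real shape with a `twf --json` run.

```python
import json

# Hypothetical envelope: field names other than `diagnostics[].code` are assumed,
# and the last message string is invented for the example.
SAMPLE = """
{"diagnostics": [
  {"kind": "resolve", "code": "UNDEFINED_ACTIVITY", "message": "undefined activity: Foo"},
  {"kind": "parse", "code": "SYNTAX", "message": "expected ( after if"},
  {"kind": "validate", "code": "UNINSTANTIATED_WORKER", "message": "worker never instantiated"}
]}
"""

# Keyed on (kind, code), never on the human-readable message.
HINTS = {
    ("resolve", "UNDEFINED_ACTIVITY"): "add the missing activity definition",
    ("validate", "UNINSTANTIATED_WORKER"): "instantiate the worker in a namespace",
}


def hints_for(envelope_json: str) -> list:
    """Map each diagnostic in the envelope to a short remediation hint."""
    hints = []
    for diag in json.loads(envelope_json).get("diagnostics", []):
        key = (diag.get("kind"), diag.get("code"))
        if key == ("parse", "SYNTAX"):
            # Parse failures share one code, so the message carries the detail.
            hints.append(f"syntax: {diag.get('message', '')}")
        elif key in HINTS:
            hints.append(HINTS[key])
    return hints


print(hints_for(SAMPLE))
```

Diagnostics with no entry in `HINTS` are skipped rather than guessed at, which keeps the consumer stable if new codes appear.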

package/skills/temporal-architect-design/reference/core-principles.md ADDED

@@ -0,0 +1,52 @@
+# Determinism & Idempotency
+## Determinism: Workflows Must Replay Identically
+Temporal replays workflow code to reconstruct state. Different replay results = non-determinism errors. See [Temporal: Deterministic Constraints](https://docs.temporal.io/workflows#deterministic-constraints) for the authoritative reference.
+| Safe in Workflows | Must Be in Activities |
+|-------------------|----------------------|
+| Logic on activity results | Current time, dates |
+| Deterministic loops/conditionals | Random numbers, UUIDs |
+| Child workflows | HTTP/API calls |
+| Temporal timers | Database operations |
+| Local variables | File I/O |
+| Signal waits | External service calls |
+| Deterministic iteration (arrays, slices) | Map/dictionary iteration (order varies) |
+| Temporal SDK concurrency (promises, await all) | Language-level threads, goroutines, async |
+| Workflow-local state | Mutable global/shared state |
+**Workflows = pure orchestration. Activities = side effects.**
+### Activities Are for I/O — Not In-Memory Work
+The table above is often *over*-applied into activity sprawl: wrapping things in an activity that were never side effects. **Activities are for I/O and side effects. Do not wrap in an activity:**
+- **Reads of data the workflow already holds** — field access, lookups into a struct/ref passed in or returned by an earlier activity (`ReadCritiqueReady`, `LookupBundleRef`). The workflow already has the data; reading it is workflow code.
+- **In-memory derivation** — filtering, mapping, computing a value from inputs the workflow holds (`ListSubsetPaperIds`).
+- **Accumulation** — appending to a list or building up state (`AppendObservations`, `AppendTrajectory`). Building a collection is deterministic in-memory work and belongs in the workflow body (expressible directly, including as a raw statement) — there is no need for an `Append*` activity.
+Each spurious activity is a task-queue round-trip plus a history event for **no resilience benefit**. The litmus test: *does it touch an external system or produce a side effect?* If not, it's workflow code.
+> **Optimization (away from the default):** batch several small calls into one activity only when they always succeed/fail together and per-call retry isn't meaningful; consider local activities for short deterministic helpers. These are deviations from "one activity per network call" (see [workflow-boundaries.md](./workflow-boundaries.md)), not the starting point.
+## Idempotency: Activities May Run Multiple Times
+Retries happen (network failures, crashes, timeouts). Activities must be **idempotent**: same inputs → same result regardless of execution count.
+| Pattern | Example |
+|---------|---------|
+| **Create-or-get** — when entity has a natural unique key | Check existence before creating |
+| **Idempotency keys** — when external system supports them | Workflow ID + activity name as operation key |
+| **Upsert** — when database supports atomic upsert | Prefer over insert-then-update |
+| **Deduplication** — last resort when no built-in mechanism | Query before mutating |
+**Think through retries:** CreateUser → return existing if exists. SendEmail → provider idempotency key. DeployResource → verify state, return success if deployed.
+### State the Strategy in the Design
+Knowing the patterns isn't enough — the *design* must **state** which one each side-effecting activity uses, so idempotency is a load-bearing decision rather than an assumed prose comment. For every activity that isn't idempotent by nature, name its strategy and key derivation, e.g.:
+> `ChargePayment` — idempotency key = `"{workflow_id}-ChargePayment"`; provider dedupes on it.
+This is a **skill/design concern, not a `twf check` rule** — Temporal has no call-site `idempotency_key` option, so the parser cannot validate it. The [Design Review](../SKILL.md#design-review) checks that each non-idempotent activity carries this note.
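The `ChargePayment` note above can be illustrated with a small Python sketch. Everything here is hypothetical: `idempotency_key` mirrors the `"{workflow_id}-ChargePayment"` derivation, and `FakePaymentProvider` stands in for an external provider that dedupes on the key. It is not a Temporal SDK or payment API.

```python
from typing import Dict


def idempotency_key(workflow_id: str, activity_name: str) -> str:
    """Derive a key that stays the same across retries of one activity."""
    return f"{workflow_id}-{activity_name}"


class FakePaymentProvider:
    """Stand-in for a provider that dedupes charges on the idempotency key."""

    def __init__(self) -> None:
        self._charges: Dict[str, str] = {}

    def charge(self, key: str, amount_cents: int) -> str:
        # A repeated key returns the original charge instead of charging again.
        if key not in self._charges:
            self._charges[key] = f"charge-{len(self._charges) + 1}"
        return self._charges[key]

    @property
    def charge_count(self) -> int:
        return len(self._charges)


provider = FakePaymentProvider()
key = idempotency_key("order-123", "ChargePayment")

first = provider.charge(key, 4_200)
retry = provider.charge(key, 4_200)  # simulated retry after a timeout

print(key)                    # order-123-ChargePayment
print(first == retry)         # True
print(provider.charge_count)  # 1
```

This derivation assumes the activity runs at most once per workflow execution. If it is called in a loop, the key would also need something that distinguishes iterations, such as an item ID.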

package/skills/temporal-architect-design/reference/design-checklist.md ADDED

@@ -0,0 +1,59 @@
+# Design Checklist
+**Validation ≠ review.** *Validation* asks "does it parse and resolve?" (`twf check`). *Review* asks "is it a good Temporal design?" — and no tool answers that. A clean `twf check` clears only the first group below; the [Design Review](#design-review) group is where design quality lives. Don't present on a green tool alone.
+## TWF Validation
+- [ ] `twf check` passes (`✓ OK`)
+- [ ] `twf symbols` lists all expected definitions
+- [ ] No undefined references
+- [ ] No SDK-specific code in `.twf`
+→ See [common-errors.md](./common-errors.md) for error troubleshooting
+## Design Review (fresh-eyes pass — no tool catches these)
+- [ ] **Call-site integrity** — every activity/workflow/nexus definition has a *structured* call site (no orphaned `x = Name(args)` parsing as `raw`)
+- [ ] **Reachability** — every workflow is reachable from a declared entry point; no dead workflows
+- [ ] Design re-checked against **every** anti-pattern in [anti-patterns.md](./anti-patterns.md)
+- [ ] Each non-idempotent activity names its idempotency strategy + key derivation
+- [ ] Concurrent fan-out branches that write to shared external state document their isolation/keying assumption
+→ See [SKILL.md § Design Review](../SKILL.md#design-review)
+## Determinism
+- [ ] All I/O, time, randomness in activities
+- [ ] No external calls in workflow code
+- [ ] Loops have deterministic bounds
+- [ ] Timers use Temporal primitives
+- [ ] No non-deterministic data structure iteration (maps, sets)
+- [ ] Version-specific branching uses proper versioning pattern
+→ See [core-principles.md](./core-principles.md) for determinism rules
+## Idempotency
+- [ ] Activities handle "already exists" gracefully
+- [ ] Retries produce same end state
+- [ ] No duplicate side effects on replay
+→ See [core-principles.md](./core-principles.md) for idempotency patterns
+## Failure Handling
+- [ ] Each failure mode identified
+- [ ] Recovery strategy defined (retry, compensate, fail)
+- [ ] Partial success handled
+- [ ] Timeouts configured
+→ See [anti-patterns.md](./anti-patterns.md) for common failure handling mistakes
+## Runtime, Cost & Lifecycle
+- [ ] Loops with large accumulated history reach `close continue_as_new` (bound alone is not enough); strategy stated
+- [ ] Large payloads: either deferred to the data converter/codec, or (if an explicit claim-check `*Ref`) a one-line store + lifecycle note
+→ See [anti-patterns.md](./anti-patterns.md#large-payloads-in-workflow-state) and [long-running.md](../topics/long-running.md)
+## Decomposition
+- [ ] Each workflow has single clear purpose
+- [ ] Child workflow vs activity choice justified
+- [ ] Workflow names describe outcomes, not steps
+→ See [workflow-boundaries.md](./workflow-boundaries.md) for boundary decisions
+## Deployment Topology (design review — `twf check` validates syntax)
+- [ ] Worker groupings reflect actual deployment needs (not just "one worker for everything")
+- [ ] Task queue separation matches scaling and isolation requirements
+- [ ] Namespace count justified by org / security / lifecycle / external-contract boundaries (default: one)
+- [ ] Cross-namespace calls have nexus endpoints
+- [ ] `twf check` passes topology validation
+→ See [task-queues.md](../topics/task-queues.md) for task queue design and [namespaces.md](./namespaces.md) for namespace count

package/skills/temporal-architect-design/reference/namespaces.md ADDED

@@ -0,0 +1,84 @@
+# Namespaces: How Many?
+How many namespaces should a design use? This is a cross-cutting deployment decision — it interacts with workers, task queues, and Nexus — so it lives here as a boundary call alongside [workflow-boundaries.md](./workflow-boundaries.md), not as a construct deep-dive. The load-bearing rule:
+> **Namespaces are organizational, not architectural.** They are an operational/ownership boundary, not a decomposition tool. The default is **one**; adding more requires justification.
+Without this rule, designs drift toward "one namespace per layer → one per worker," which is almost always an overuse. Agent/tool scoping is solved by worker registration; runtime heterogeneity by task queues; layer separation by workflow boundaries. **None of those justify a namespace.**
+---
+## Decision Ladder
+### Default: one namespace
+Start here. A single namespace holds all your workers, workflows, and activities. Tightly-coupled workflows belong **together** — coupling is an argument for co-location, not separation.
+### Add a namespace only when one of these demands it
+| Reason | Why it's a namespace boundary |
+|--------|-------------------------------|
+| **Distinct team owns the workflows** | Separate ownership, access control, on-call |
+| **Different security / compliance context** | PCI vs non-PCI, tenant isolation at the org level |
+| **Independent deployment lifecycle** | Separate release cadence and blast radius |
+| **External service contract across an org boundary** | The Nexus case — a typed contract between services |
+### Explicitly NOT reasons to add a namespace
+| Drift | What actually solves it |
+|-------|-------------------------|
+| Different worker / runtime (GPU, licensed software) | **Task queues** ([task-queues.md](../topics/task-queues.md)) |
+| Agent or tool scoping | **Worker registration** (which types run together) |
+| Layer separation (inner vs outer logic) | **Workflow boundaries** (child workflows / activities) |
+| "One per worker" by default | Nothing — co-locate them |
+| "It feels cleaner" | Nothing — resist it |
+---
+## Worked Judgment: Two-Layer Agent System
+Consider an inner agent (executes tools) and an outer agent (plans, calls the inner agent), each with its own tools.
+- **Tempting (wrong) start:** a namespace per worker — one for the planner, one for each tool runner — arriving at 5-6 namespaces.
+- **Why it's wrong:** inner-agent vs outer-agent *tool scoping* is **worker registration** (register each agent's tools on its own worker), not a namespace boundary. The layers are tightly coupled, which argues for co-location.
+- **The legitimate split:** the only real boundary is the **org/service contract** between the two layers — if the inner agent is genuinely an independent service with its own contract, that justifies **two** namespaces connected by **Nexus**. Not more.
+The mechanism is **worker registration for scoping, namespaces only for the service boundary**:
+```twf
+# Tool scoping is worker registration, NOT a namespace:
+worker outerAgentWorker:
+    workflow OuterAgent
+    activity PlanSteps
+    activity SummarizeOutcome
+worker innerAgentWorker:
+    workflow InnerAgent
+    activity SearchTool
+    activity CalcTool
+    nexus service InnerAgentService
+# Exactly two namespaces: one per org/service-contract boundary.
+namespace outerAgent:
+    worker outerAgentWorker
+        options:
+            task_queue: "outer-agent"
+namespace innerAgent:
+    worker innerAgentWorker
+        options:
+            task_queue: "inner-agent"
+    nexus endpoint InnerAgentEndpoint
+        options:
+            task_queue: "inner-agent"
+```
+The final two-namespaces-with-Nexus topology can be a fine outcome; the mistake is *starting* at one-per-worker and being talked back down. Start at one, and require a reason from the ladder above to add each additional namespace.
+---
+## Related
+- [task-queues.md](../topics/task-queues.md) — different runtimes use task queues, not namespaces.
+- [nexus.md](../topics/nexus.md) — the cross-namespace contract; the one mechanism that legitimately spans namespaces.
+- [workflow-boundaries.md](./workflow-boundaries.md) — child workflow vs activity vs nexus.