npm - @cubis/foundry - Versions diffs - 0.3.76 → 0.3.77 - Mend

@cubis/foundry 0.3.76 → 0.3.77

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (165)

package/workflows/workflows/agent-environment-setup/platforms/claude/hooks/settings.snippet.json ADDED

@@ -0,0 +1,15 @@
+{
+  "hooks": {
+    "UserPromptSubmit": [
+      {
+        "matcher": "*",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "node \"$CLAUDE_PROJECT_DIR/.claude/hooks/route-research-guard.mjs\""
+          }
+        ]
+      }
+    ]
+  }
+}
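
The `route-research-guard.mjs` script named in `command` is not shown in this part of the diff. As a rough illustration of what a `UserPromptSubmit` guard of this kind could decide, the sketch below classifies a prompt as an explicit route, a possible research escalation, or neither. The function name `classifyPrompt`, both regular expressions, and the returned `action` strings are assumptions made for illustration; they are not taken from the package.

```javascript
// Hypothetical sketch: this is NOT the content of route-research-guard.mjs.
// The patterns and action names are illustrative assumptions.
const EXPLICIT_ROUTE = /(^|\s)([\/@])([a-z][a-z0-9-]*)/i;
const RESEARCH_HINT = /\b(research|verify|latest|compare|comparison)\b/i;

function classifyPrompt(prompt) {
  // A token such as "/plan" or "@researcher" at the start of a word counts as explicit.
  const route = EXPLICIT_ROUTE.exec(prompt);
  if (route) {
    const kind = route[2] === "/" ? "workflow" : "agent";
    return { action: "honor-explicit-route", kind, name: route[3] };
  }
  // Wording that suggests freshness, comparison, or verification.
  if (RESEARCH_HINT.test(prompt)) {
    return { action: "consider-research-escalation" };
  }
  return { action: "none" };
}

console.log(classifyPrompt("/plan the migration"));
console.log(classifyPrompt("What is the latest SDK version?"));
```

Because the route pattern requires the `/` or `@` to start a word, a file path such as `src/app.js` in the middle of a prompt is not treated as a workflow name.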

package/workflows/workflows/agent-environment-setup/platforms/claude/rules/CLAUDE.md CHANGED

@@ -27,6 +27,7 @@ If any check fails, restart your reasoning.
 | Workflows     | `.claude/workflows`   |
 | Agents        | `.claude/agents`      |
 | Skills        | `.claude/skills`      |
+| Hook templates | `.claude/hooks`      |
 | Scoped rules  | `.claude/rules/*.md`  |
 | Project rules | `CLAUDE.md`           |
 | Global rules  | `~/.claude/CLAUDE.md` |
@@ -43,7 +44,7 @@ Execute this tree top-to-bottom. Stop at the **first match**. Never skip levels.
 ├─ [TRIVIAL] Single-step, obvious, reversible?
 │   → Execute directly. No routing. Stop.
 │
-├─ [EXPLICIT] User named a workflow, command, or @agent?
+├─ [EXPLICIT] User named a workflow, command, @agent, or exact skill?
 │   → Honor that route exactly. Stop.
 │
 ├─ [SINGLE-DOMAIN] Multi-step but contained in one specialty?
@@ -63,6 +64,7 @@ Execute this tree top-to-bottom. Stop at the **first match**. Never skip levels.
 **Hard rules:**
 - Never pre-load skills before route resolution.
+- If the user names an exact skill ID, run `skill_validate` on that ID before `route_resolve`.
 - Never delegate to a subagent when direct execution suffices.
 - Never chain more than one `skill_search` per request.
 - Treat this file as **durable project memory** — not a per-task playbook.
@@ -284,6 +286,7 @@ ORCHESTRATE(task):
 - Path-scoped rules: `.claude/rules/*.md` with `paths:` frontmatter for targeted guidance.
 - Global rules (`~/.claude/CLAUDE.md`) apply to all projects — keep them broad.
 - Skills with `context: fork` run as isolated subagents. `$ARGUMENTS` enables dynamic parameterization.
+- Optional hook templates in `.claude/hooks/` can reinforce explicit-route honoring and research escalation at `UserPromptSubmit`.
 ---
@@ -367,6 +370,7 @@ Use this matrix to match incoming tasks to the correct skill and primary agent.
 | docker-compose-dev | DevOps | Docker Compose local dev environments | @devops-engineer |
 | kubernetes-deploy | DevOps | K8s manifests, Helm charts, deployment | @devops-engineer |
 | observability | DevOps | Logging, metrics, tracing, alerting | @devops-engineer |
+| deep-research | Research | Latest docs, public comparisons, external verification | @researcher |
 | llm-eval | AI/ML | LLM evaluation, benchmarking, evals | @researcher |
 | rag-patterns | AI/ML | RAG architecture, embeddings, retrieval | @researcher |
 | prompt-engineering | AI/ML | Prompt design, few-shot, chain-of-thought | @researcher |
@@ -414,12 +418,15 @@ Selection policy:
 Keep MCP context lazy and exact. Skills are supporting context, not the route layer.
 1. Never begin with `skill_search`. Inspect the repo/task locally first.
-2. Resolve workflows, agents, or free-text route intent with `route_resolve` before loading any skills.
-3. If the route is still unresolved and local grounding leaves the domain unclear, use one narrow `skill_search`.
-4. Always run `skill_validate` on the exact selected ID before `skill_get`.
-5. Call `skill_get` with `includeReferences:false` by default.
-6. Load at most one sidecar markdown file at a time with `skill_get_reference`.
-7. Do not auto-prime every specialist with a skill. Load only what the task clearly needs.
-8. Use upstream MCP servers such as `postman`, `stitch`, or `playwright` for real cloud/browser actions when available.
+2. If the user already named `/workflow`, `@agent`, or an exact skill ID, honor it directly. For exact skills, run `skill_validate` first and skip `route_resolve` when valid.
+3. Resolve only free-text workflow/agent intent with `route_resolve` before loading non-explicit skills.
+4. If the route is still unresolved and local grounding leaves the domain unclear, use one narrow `skill_search`.
+5. Always run `skill_validate` on the exact selected ID before `skill_get`.
+6. Call `skill_get` with `includeReferences:false` by default.
+7. Load at most one sidecar markdown file at a time with `skill_get_reference`.
+8. Do not auto-prime every specialist with a skill. Load only what the task clearly needs.
+9. For research: repo/local evidence first, official docs next, Reddit/community only as labeled secondary evidence.
+10. Escalate to research only when freshness matters, public comparison matters, or the user explicitly asks to research/verify.
+11. Use upstream MCP servers such as `postman`, `stitch`, or `playwright` for real cloud/browser actions when available.
 <!-- cbx:mcp:auto:end -->
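
The revised ordering in the list above (explicit routes first, `route_resolve` only for free-text intent, at most one `skill_search`) can be read as a small decision procedure. The sketch below is one hedged interpretation: the tool names come from the rules, while the `request` shape, its field names, and the string-based step format are hypothetical.

```javascript
// Illustrative only. Tool names are from the rules above; the request
// object and the step strings are assumptions made for this sketch.
function planSkillCalls(request) {
  const steps = [];
  if (request.explicitSkillId) {
    // Exact skill named: validate first and skip route_resolve.
    // The skill_get step assumes validation succeeded.
    steps.push(`skill_validate:${request.explicitSkillId}`);
    steps.push(`skill_get:${request.explicitSkillId}`);
    return steps;
  }
  if (request.explicitRoute) {
    // A named /workflow or @agent is honored directly.
    steps.push(`honor:${request.explicitRoute}`);
    return steps;
  }
  // Free-text intent: resolve the route before loading any skill.
  steps.push("route_resolve");
  if (request.domainStillUnclear) {
    // At most one narrow search per request.
    steps.push("skill_search");
  }
  return steps;
}

console.log(planSkillCalls({ explicitSkillId: "deep-research" }));
console.log(planSkillCalls({ domainStillUnclear: true }));
```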

package/workflows/workflows/agent-environment-setup/platforms/claude/skills/deep-research/SKILL.md ADDED

@@ -0,0 +1,95 @@
+---
+name: deep-research
+description: Use when investigating latest vendor behavior, comparing tools or platforms, verifying claims beyond the repo, or gathering external evidence before implementation.
+allowed-tools: Read Grep Glob Bash
+context: fork
+agent: researcher
+user-invocable: true
+argument-hint: "Topic, vendor capability, or public comparison to research"
+---
+# Deep Research
+## Purpose
+Run a disciplined research pass before implementation when the repo alone is not enough. This skill keeps research evidence-driven: inspect the local codebase first, escalate to official docs when freshness or public comparison matters, then use labeled community evidence only when it adds practical context.
+## When to Use
+- Verifying latest SDK, CLI, API, or platform behavior
+- Comparing tools, frameworks, hosted services, or implementation approaches
+- Checking whether public docs and the local repo disagree
+- Gathering external evidence before planning a migration or new capability
+- Producing a structured research brief that hands off cleanly into implementation
+## Instructions
+1. **Define the research question before collecting sources** because vague research sprawls quickly. Restate the target topic, freshness requirement, comparison axis, and what decision the findings need to support.
+2. **Inspect the repo first** because many questions are already answerable from local code, configs, tests, docs, or generated assets. Do not browse externally until the local evidence is exhausted or clearly insufficient.
+3. **Decide whether external research is actually required** because not every task needs web evidence. Escalate only when freshness matters, public comparison matters, or the user explicitly asks to research or verify.
+4. **Follow the source ladder strictly** because evidence quality matters. Use official docs, upstream repositories, standards, and maintainer material as primary sources before looking at blogs, issue threads, or Reddit.
+5. **Capture concrete source details** because research without provenance is hard to trust. Record exact links, relevant dates, versions, and any repo files that support or contradict the external evidence.
+6. **Cross-check important claims across more than one source when possible** because public docs, repos, and community advice can drift. If sources disagree, say so explicitly instead of smoothing over the conflict.
+7. **Use Reddit and other community sources only as labeled secondary evidence** because they can surface practical gotchas but are not authoritative. Treat them as implementation color, not final truth.
+8. **Separate verified facts from inference** because downstream planning depends on confidence. Mark what is directly supported by repo evidence or official sources versus what you infer from patterns or secondary signals.
+9. **Keep the output decision-oriented** because the goal is not to dump links. Tie each finding back to the implementation, workflow, agent, or skill decision it affects.
+10. **Recommend the next route explicitly** because research is usually a handoff, not the end of the task. Name the next workflow, agent, or exact skill that should continue the work.
+11. **State the remaining gaps and risks** because incomplete research is still useful when the uncertainty is visible. Call out what you could not verify, what may have changed recently, and what assumptions remain.
+12. **Avoid over-quoting and over-collecting** because research quality comes from synthesis, not volume. Prefer concise summaries with high-signal citations over long pasted excerpts.
+13. **When the task turns into implementation, stop researching and hand off** because mixing discovery and execution usually creates drift. Deliver the research brief first, then route into the correct workflow or specialist.
+## Output Format
+Deliver:
+1. **Research question** — topic, freshness requirement, and decision to support
+2. **Verified facts** — repo evidence and primary-source findings
+3. **Secondary/community evidence** — labeled lower-trust supporting signals
+4. **Gaps / unknowns** — unresolved questions or contradictory evidence
+5. **Recommended next route** — direct execution, workflow, agent, or exact skill to use next
+## References
+Load only the file needed for the current question.
+| File | Load when |
+| --- | --- |
+| `references/source-ladder.md` | Need the repo-first and source-priority policy for official docs versus community evidence. |
+| `references/research-output.md` | Need the structured output format, evidence labeling rules, or handoff pattern. |
+| `references/comparison-checklist.md` | Comparing vendors, frameworks, or tools and need a concrete evaluation frame. |
+## Examples
+Use these when the task shape already matches.
+| File | Use when |
+| --- | --- |
+| `examples/01-latest-docs-check.md` | Verifying a latest capability or doc claim before implementation. |
+| `examples/02-ecosystem-comparison.md` | Comparing multiple tools or platforms with official-first sourcing. |
+| `examples/03-research-to-implementation-handoff.md` | Turning research findings into a concrete next workflow or specialist handoff. |
+## Claude Research Flow
+- Use `$ARGUMENTS` as the research topic when this skill is invoked directly.
+- Prefer official docs, upstream repos, and maintainer material before blog posts or Reddit threads.
+- If Claude hook templates are installed, let them reinforce repo-first inspection and research escalation, but keep the final research output aligned with this skill's evidence contract.
+## Claude Platform Notes
+- Use `$ARGUMENTS` to access user-provided arguments passed when the skill is invoked.
+- Reference skill-local files with `${CLAUDE_SKILL_DIR}/references/<file>` for portable paths.
+- When `context: fork` is set, the skill runs in an isolated subagent context; the `agent` field names the fork target.
+- MCP skill tools (`skill_search`, `skill_get`, `skill_validate`, `skill_get_reference`) are available for dynamic skill discovery and loading.
+- Use `allowed-tools` in frontmatter to restrict tool access for security-sensitive skills.

package/workflows/workflows/agent-environment-setup/platforms/claude/skills/deep-research/evals/assertions.md ADDED

@@ -0,0 +1,17 @@
+# Deep Research Eval Assertions
+## Eval 1: Latest Capability Verification
+1. **repo-first** — Starts by checking repo or local evidence before jumping to web claims.
+2. **official-first** — Uses official docs or upstream sources as the primary evidence for the capability.
+3. **secondary-labeled** — If community sources are mentioned, labels them as secondary evidence instead of presenting them as authoritative.
+4. **gaps-called-out** — Identifies unresolved uncertainty or missing confirmation.
+5. **next-route** — Ends with a concrete recommended workflow, agent, or skill to use next.
+## Eval 2: Tool Comparison
+1. **comparison-frame** — Defines the comparison axes instead of producing vague preferences.
+2. **repo-impact** — Connects the comparison back to the current repo or implementation constraints.
+3. **fact-vs-inference** — Separates verified facts from inference or interpretation.
+4. **decision-oriented** — Produces a recommendation or explicit defer condition.
+5. **no-research-sprawl** — Keeps the output concise and structured rather than dumping raw links.

package/workflows/workflows/agent-environment-setup/platforms/claude/skills/deep-research/evals/evals.json ADDED

@@ -0,0 +1,56 @@
+[
+  {
+    "name": "latest-capability-verification",
+    "description": "Validate that the skill performs repo-first research, prioritizes official documentation, labels secondary evidence, and recommends a next route.",
+    "prompt": "Research whether the latest official Claude Code hook surface supports reinforcing route honoring before implementation. Start with the current repo state, then use official docs if needed. If community sources add useful practical context, include them but label them appropriately. End with the next workflow, agent, or skill we should use.",
+    "assertions": [
+      {
+        "id": "repo-first",
+        "description": "Starts with repo or local evidence before using external sources."
+      },
+      {
+        "id": "official-first",
+        "description": "Treats official docs or upstream sources as the primary evidence."
+      },
+      {
+        "id": "secondary-labeled",
+        "description": "Labels any Reddit or community evidence as secondary rather than authoritative."
+      },
+      {
+        "id": "gaps-called-out",
+        "description": "States any unresolved gaps, conflicts, or unknowns."
+      },
+      {
+        "id": "next-route",
+        "description": "Ends with a concrete recommended next route."
+      }
+    ]
+  },
+  {
+    "name": "tool-comparison",
+    "description": "Validate that the skill compares options with a clear frame, ties findings back to repo impact, and produces a decision-ready output.",
+    "prompt": "Compare whether our CLI should keep enforcement in Gemini command wrappers only or also add Claude hook templates. Use the repo state first, then official docs for current platform capabilities, and include community evidence only if it adds implementation nuance. Finish with a recommendation and the next route to take.",
+    "assertions": [
+      {
+        "id": "comparison-frame",
+        "description": "Defines concrete comparison axes such as repo impact, enforcement surface, and maintenance cost."
+      },
+      {
+        "id": "repo-impact",
+        "description": "Connects each option back to the current repo or bundle behavior."
+      },
+      {
+        "id": "fact-vs-inference",
+        "description": "Separates verified facts from inference or interpretation."
+      },
+      {
+        "id": "decision-oriented",
+        "description": "Produces a recommendation or explicit defer condition."
+      },
+      {
+        "id": "no-research-sprawl",
+        "description": "Keeps the answer structured instead of turning into an unfiltered dump of sources."
+      }
+    ]
+  }
+]
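
Each entry in `evals.json` above carries a `name`, `description`, `prompt`, and a list of `assertions` with an `id` and `description`. A structural check over that shape might look like the sketch below. The function `findEvalProblems` and its message wording are hypothetical; nothing in the diff indicates that the package ships such a check.

```javascript
// Hypothetical structural check for the evals.json shape shown above.
// The function name and problem messages are illustrative assumptions.
function findEvalProblems(evals) {
  const problems = [];
  evals.forEach((entry, index) => {
    const label = entry.name || `entry ${index}`;
    // Every eval needs non-empty string fields.
    for (const field of ["name", "description", "prompt"]) {
      if (typeof entry[field] !== "string" || entry[field].length === 0) {
        problems.push(`${label}: missing ${field}`);
      }
    }
    const assertions = Array.isArray(entry.assertions) ? entry.assertions : [];
    if (assertions.length === 0) {
      problems.push(`${label}: no assertions`);
    }
    // Assertion ids should be unique within one eval.
    const seen = new Set();
    for (const assertion of assertions) {
      if (seen.has(assertion.id)) {
        problems.push(`${label}: duplicate assertion id ${assertion.id}`);
      }
      seen.add(assertion.id);
    }
  });
  return problems;
}

const sample = [
  {
    name: "tool-comparison",
    description: "Compare options with a clear frame.",
    prompt: "Compare A and B.",
    assertions: [{ id: "comparison-frame", description: "Defines axes." }],
  },
];
console.log(findEvalProblems(sample));
```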

package/workflows/workflows/agent-environment-setup/platforms/claude/skills/deep-research/examples/01-latest-docs-check.md ADDED

@@ -0,0 +1,12 @@
+# Example: Latest Docs Check
+## User Request
+> Research whether Claude Code hooks can reinforce route honoring before we add new workflow rules.
+## Expected Shape
+1. Inspect the repo's current Claude rule and hook support first.
+2. Verify the current official Claude docs for hooks, event names, and config format.
+3. Separate those verified facts from any community commentary about hook effectiveness.
+4. End with the recommended next route, for example `@researcher` continuing the research or `/create` applying the validated hook template changes.

package/workflows/workflows/agent-environment-setup/platforms/claude/skills/deep-research/examples/02-ecosystem-comparison.md ADDED

@@ -0,0 +1,12 @@
+# Example: Ecosystem Comparison
+## User Request
+> Compare whether our CLI should keep Gemini command enforcement only, or add another platform-native hook layer for Claude as well.
+## Expected Shape
+1. Start with repo constraints and current platform bundle behavior.
+2. Compare the official platform capabilities using primary docs.
+3. Add any useful community evidence as clearly labeled secondary input.
+4. Produce a recommendation tied to the repo: which platform gets which enforcement surface, and why.

package/workflows/workflows/agent-environment-setup/platforms/claude/skills/deep-research/examples/03-research-to-implementation-handoff.md ADDED

@@ -0,0 +1,12 @@
+# Example: Research To Implementation Handoff
+## User Request
+> Research the latest Codex, Claude, and Gemini MCP behavior, then tell me the next route to update our workflow rules safely.
+## Expected Shape
+1. Gather repo evidence first.
+2. Verify current official docs for each platform.
+3. Summarize verified facts, secondary evidence, and gaps.
+4. End with a precise next route such as `/plan` for a policy change or `skill-creator` for skill/rule packaging work.

package/workflows/workflows/agent-environment-setup/platforms/claude/skills/deep-research/references/comparison-checklist.md ADDED

@@ -0,0 +1,57 @@
+# Comparison Checklist
+Use this when evaluating tools, frameworks, APIs, or platforms.
+## 1. Scope the Comparison
+Define:
+- what is being compared
+- whether the comparison is about implementation fit, operational cost, or product capability
+- what time horizon matters: immediate migration, medium-term maintenance, or long-term platform fit
+## 2. Compare on Stable Axes
+Use a short set of dimensions:
+- integration fit with the current repo
+- maturity and maintenance signal
+- official documentation quality
+- configuration complexity
+- ecosystem and tooling support
+- operational constraints
+- migration cost
+Do not compare on vague criteria like "better DX" without concrete evidence.
+## 3. Capture Repo Impact
+Tie each option back to the current codebase:
+- what code would change
+- which workflows or agents would own the work
+- what risks are specific to this repo
+- whether new MCP tools or skills would be required
+## 4. Separate Product Claims from Team Constraints
+An option can be technically stronger and still be a worse fit for the repo.
+Keep these separate:
+- product capability
+- ecosystem quality
+- team familiarity
+- migration blast radius
+- existing architecture constraints
+## 5. Decision Frame
+Finish with one of:
+- recommend option A
+- recommend option B
+- defer decision pending one missing verification
+- keep current approach because switching cost outweighs gain
+If the evidence is mixed, say what would change the recommendation.

package/workflows/workflows/agent-environment-setup/platforms/claude/skills/deep-research/references/research-output.md ADDED

@@ -0,0 +1,69 @@
+# Research Output Contract
+## Required Sections
+### 1. Research question
+State:
+- the exact topic
+- why research was necessary
+- whether freshness or public comparison mattered
+- the decision this research is meant to support
+### 2. Verified facts
+List the strongest findings first.
+For each fact:
+- state the claim in one sentence
+- cite the source class: repo, official docs, upstream repo, standard
+- include the relevant link or file path
+- include date/version when it matters
+### 3. Secondary / community evidence
+Only include this when it adds signal the primary sources did not provide.
+For each item:
+- label it as secondary evidence
+- state what practical signal it adds
+- avoid presenting it as settled fact
+### 4. Gaps / unknowns
+Document:
+- unresolved conflicts
+- missing official confirmation
+- assumptions that still need validation
+- risks if the team proceeds anyway
+### 5. Recommended next route
+Research should end with one clear recommendation:
+- direct execution
+- a specific workflow like `/plan` or `/create`
+- a specialist like `@researcher` or `@frontend-specialist`
+- an exact skill like `stitch` or `deep-research`
+Keep this recommendation concrete enough that the next step does not need another routing pass.
+## Compression Rules
+- Prefer 5 strong findings over 20 weak ones.
+- Do not paste long quotes from docs when a citation plus summary will do.
+- If multiple sources say the same thing, summarize once and cite the strongest source.
+- If research found nothing reliable, say that directly.
+## Handoff Pattern
+When handing off to implementation or planning, include:
+- the decision summary
+- the highest-confidence constraints
+- the unresolved risks
+- the next route to take
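
The output contract above could be enforced mechanically before a handoff. The sketch below assumes a research brief represented as a plain object. The field names (`researchQuestion`, `verifiedFacts`, `nextRoute`, `sourceClass`, `reference`) are hypothetical, and only the four source classes are taken from the contract text.

```javascript
// Hypothetical check of a research brief against the contract above.
// Field names are assumptions; the source classes mirror the contract.
const SOURCE_CLASSES = ["repo", "official docs", "upstream repo", "standard"];

function checkBrief(brief) {
  const issues = [];
  if (!brief.researchQuestion) issues.push("missing research question");
  if (!brief.nextRoute) issues.push("missing recommended next route");
  for (const fact of brief.verifiedFacts || []) {
    // A verified fact must cite a primary source class and a link or file path.
    if (!SOURCE_CLASSES.includes(fact.sourceClass)) {
      issues.push(`fact "${fact.claim}" lacks a primary source class`);
    }
    if (!fact.reference) {
      issues.push(`fact "${fact.claim}" lacks a link or file path`);
    }
  }
  return issues;
}

console.log(
  checkBrief({
    researchQuestion: "Do hooks support UserPromptSubmit?",
    verifiedFacts: [{ claim: "Hook template exists", sourceClass: "repo", reference: ".claude/hooks" }],
    nextRoute: "/plan",
  })
);
```

Secondary evidence is left unchecked here because the contract only asks for it when it adds signal.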

package/workflows/workflows/agent-environment-setup/platforms/claude/skills/deep-research/references/source-ladder.md ADDED

@@ -0,0 +1,81 @@
+# Source Ladder
+## Goal
+Use the smallest amount of external research that still produces a decision-ready answer. Keep the evidence traceable and ordered by trust.
+## 1. Repo / Local Evidence First
+Start by inspecting:
+- application code and tests
+- README files and internal docs
+- generated workflow or skill assets
+- lockfiles, config files, and package manifests
+- existing integration code and migration history
+If the repo already answers the question, stop there. Do not browse externally just because web research feels safer.
+## 2. Primary External Sources
+Use these next:
+- official vendor docs
+- upstream repositories and release notes
+- standards bodies and reference specs
+- maintainer-authored examples
+Prefer sources that expose:
+- exact feature names
+- current version constraints
+- config formats
+- dates or changelog context
+When the topic is time-sensitive, capture the date you verified the source and the version or doc page involved.
+## 3. Secondary / Community Sources
+Use these only after primary evidence:
+- Reddit threads
+- issue comments
+- independent blog posts
+- forum discussions
+- third-party comparison articles
+Community evidence is useful for:
+- practical gotchas
+- migration pain points
+- missing-doc workarounds
+- real-world adoption patterns
+Community evidence is not enough on its own for authoritative claims about product behavior, supported configuration, or security guarantees.
+## 4. Conflict Handling
+When sources disagree:
+1. Prefer repo evidence for the current codebase state.
+2. Prefer official docs over community claims for product behavior.
+3. Prefer newer dated material when the sources cover the same feature.
+4. If the conflict remains unresolved, report it as a gap instead of guessing.
+## 5. Evidence Labels
+Use these labels in research output:
+- **Verified fact** — backed by repo evidence or a primary source
+- **Secondary evidence** — backed only by community or indirect sources
+- **Inference** — reasoned conclusion not directly stated by a source
+- **Gap** — could not be verified confidently
+## 6. Stop Conditions
+Stop researching when:
+- the decision is already clear
+- new sources only repeat the same point
+- the remaining uncertainty is small and clearly documented
+- the task should move into implementation or planning
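
The conflict-handling rules and evidence labels in the source ladder above can be approximated as an ordering over claims. The sketch below is a simplification under stated assumptions: it collapses the ladder into one trust order (repo, then primary, then secondary), breaks ties within a tier by the newer ISO date, and reports a gap when two equally ranked claims still disagree. The claim shape and the name `pickClaim` are hypothetical.

```javascript
// Simplified sketch of the conflict rules above; not part of the package.
// Claim shape { tier, date, value } and the single trust order are assumptions.
const TIER_RANK = { repo: 0, primary: 1, secondary: 2 };

function pickClaim(claims) {
  if (claims.length === 0) return { label: "Gap", value: null };
  // Higher-trust tier first, then newer ISO date (YYYY-MM-DD) first.
  const sorted = [...claims].sort(
    (a, b) => TIER_RANK[a.tier] - TIER_RANK[b.tier] || b.date.localeCompare(a.date)
  );
  const [best, next] = sorted;
  const unresolved =
    next !== undefined &&
    next.tier === best.tier &&
    next.date === best.date &&
    next.value !== best.value;
  if (unresolved) return { label: "Gap", value: null };
  const label = best.tier === "secondary" ? "Secondary evidence" : "Verified fact";
  return { label, value: best.value };
}

console.log(
  pickClaim([
    { tier: "secondary", date: "2025-01-01", value: "community claim" },
    { tier: "primary", date: "2024-06-01", value: "official docs claim" },
  ])
);
```

The ladder applies repo preference to questions about the current codebase state specifically, so a fuller version would also take the kind of question into account.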

package/workflows/workflows/agent-environment-setup/platforms/claude/skills/skills_index.json CHANGED

@@ -193,6 +193,38 @@
       "databases"
     ]
   },
+  {
+    "id": "deep-research",
+    "package_id": "deep-research",
+    "catalog_id": "deep-research",
+    "kind": "skill",
+    "name": "deep-research",
+    "canonical": true,
+    "canonical_id": "deep-research",
+    "deprecated": false,
+    "replaced_by": null,
+    "aliases": [],
+    "category": "core-operating",
+    "layer": "core-operating",
+    "maturity": "incubating",
+    "tier": "experimental",
+    "tags": [
+      "core-operating",
+      "deep",
+      "deep research",
+      "deep-research",
+      "research"
+    ],
+    "path": ".claude/skills/deep-research/SKILL.md",
+    "description": "Use when investigating latest vendor behavior, comparing tools or platforms, verifying claims beyond the repo, or gathering external evidence before implementation.",
+    "triggers": [
+      "deep-research",
+      "deep research",
+      "deep",
+      "research",
+      "core-operating"
+    ]
+  },
   {
     "id": "django-drf",
     "package_id": "django-drf",

package/workflows/workflows/agent-environment-setup/platforms/claude/workflows/onboard.md CHANGED

@@ -30,9 +30,9 @@ Use this when joining a new project, exploring an unfamiliar codebase, or prepar
 ## Skill Routing
-- Primary skills: `architecture-doc`, `system-design`
+- Primary skills: `deep-research`, `system-design`
 - Supporting skills (optional): `system-design`, `database-design`, `typescript-best-practices`, `javascript-best-practices`, `python-best-practices`
-- Start with `architecture-doc` for systematic exploration and `system-design` for architecture mapping. Add `system-design` for undocumented systems.
+- Start with `deep-research` for systematic exploration and `system-design` for architecture mapping. Add `system-design` for undocumented systems. Prefer repo evidence first; use external sources only when setup or dependency behavior cannot be confirmed locally.
 ## Workflow steps
@@ -55,7 +55,7 @@ Use this when joining a new project, exploring an unfamiliar codebase, or prepar
 ONBOARD_WORKFLOW_RESULT:
   primary_agent: researcher
   supporting_agents: [code-archaeologist?, backend-specialist?, frontend-specialist?]
-  primary_skills: [architecture-doc, system-design]
+  primary_skills: [deep-research, system-design]
   supporting_skills: [system-design?, database-design?]
   project_overview:
     purpose: <string>

package/workflows/workflows/agent-environment-setup/platforms/claude/workflows/orchestrate.md CHANGED

@@ -24,8 +24,8 @@ Use this when a task spans multiple domains (backend + frontend, security + infr
 ## Skill Routing
 - Primary skills: `system-design`, `api-design`
-- Supporting skills (optional): `database-design`, `architecture-doc`, `mcp-server-builder`, `tech-doc`, `prompt-engineering`, `skill-creator`
-- Start with `system-design` for system design coordination and `api-design` for integration contracts. Add supporting skills based on the coordination challenge.
+- Supporting skills (optional): `database-design`, `deep-research`, `mcp-server-builder`, `tech-doc`, `prompt-engineering`, `skill-creator`
+- Start with `system-design` for system design coordination and `api-design` for integration contracts. Add `deep-research` before implementation when the coordination challenge depends on fresh external facts or public comparison.
 ## Workflow steps

package/workflows/workflows/agent-environment-setup/platforms/claude/workflows/plan.md CHANGED

@@ -36,13 +36,13 @@ Use this when starting a new feature, project, or significant change that needs
 ## Skill Routing
 - Primary skills: `system-design`, `api-design`
-- Supporting skills (optional): `database-design`, `architecture-doc`, `mcp-server-builder`, `tech-doc`, `prompt-engineering`, `skill-creator`
-- Start with `system-design` for system design and `api-design` for API contracts. Add `database-design` when data modeling is central, `architecture-doc` when external knowledge is needed.
+- Supporting skills (optional): `database-design`, `deep-research`, `mcp-server-builder`, `tech-doc`, `prompt-engineering`, `skill-creator`
+- Start with `system-design` for system design and `api-design` for API contracts. Add `database-design` when data modeling is central, `deep-research` when fresh external knowledge or public comparison is needed.
 ## Workflow steps
 1. Clarify scope, success criteria, and constraints.
-2. Research existing patterns and dependencies.
+2. Research existing patterns and dependencies, starting in-repo and escalating to `deep-research` only when outside evidence is required.
 3. Decompose into tasks with ownership and dependencies.
 4. Define interfaces, contracts, and failure modes.
 5. Produce acceptance criteria for each milestone.
@@ -62,7 +62,7 @@ PLAN_WORKFLOW_RESULT:
   primary_agent: project-planner
   supporting_agents: [orchestrator?, backend-specialist?, frontend-specialist?, database-architect?]
   primary_skills: [system-design, api-design]
-  supporting_skills: [database-design?, architecture-doc?, mcp-server-builder?]
+  supporting_skills: [database-design?, deep-research?, mcp-server-builder?]
   plan:
     scope_summary: <string>
     tasks:

package/workflows/workflows/agent-environment-setup/platforms/codex/agents/orchestrator.md CHANGED

@@ -5,7 +5,7 @@ tools: Read, Grep, Glob, Bash, Write, Edit
 model: inherit
 maxTurns: 30
 memory: project
-skills: system-design, api-design, database-design, architecture-doc, mcp-server-builder, tech-doc, prompt-engineering, skill-creator, typescript-best-practices, javascript-best-practices, python-best-practices
+skills: system-design, api-design, database-design, deep-research, mcp-server-builder, tech-doc, prompt-engineering, skill-creator, typescript-best-practices, javascript-best-practices, python-best-practices
 handoffs:
   - agent: "validator"
     title: "Validate Results"
@@ -31,8 +31,8 @@ Your only permitted actions:
 ## Skill Loading Contract
-- Do not call `skill_search` for `system-design`, `api-design`, `database-design`, `architecture-doc`, `mcp-server-builder`, `tech-doc`, `prompt-engineering`, or `skill-creator` when the task is clearly multi-stream coordination, planning, architecture design, contract design, research, or skill package work.
-- Use `system-design` when the coordination problem is really a design tradeoff problem, `api-design` when integration contracts are the coordination bottleneck, `database-design` when the shared dependency is a data-model or migration concern, `architecture-doc` when the coordination risk is stale or conflicting external information, `mcp-server-builder` for MCP-specific streams, `tech-doc` for OpenAI-doc verification streams, `prompt-engineering` for instruction-quality streams, and `skill-creator` when the coordinated changes are in skills, mirrors, routing, or packaging.
+- Do not call `skill_search` for `system-design`, `api-design`, `database-design`, `deep-research`, `mcp-server-builder`, `tech-doc`, `prompt-engineering`, or `skill-creator` when the task is clearly multi-stream coordination, planning, architecture design, contract design, research, or skill package work.
+- Use `system-design` when the coordination problem is really a design tradeoff problem, `api-design` when integration contracts are the coordination bottleneck, `database-design` when the shared dependency is a data-model or migration concern, `deep-research` when the coordination risk is stale or conflicting external information, `mcp-server-builder` for MCP-specific streams, `tech-doc` for OpenAI-doc verification streams, `prompt-engineering` for instruction-quality streams, and `skill-creator` when the coordinated changes are in skills, mirrors, routing, or packaging.
 - Prefer platform-native delegation features when available, but keep the orchestration contract stable even when execution stays in a single track.
 - Use `skill_validate` before `skill_get`, and use `skill_get_reference` only for the specific sidecar file needed by the current coordination step.
@@ -45,7 +45,7 @@ Load on demand. Do not preload all references.
 | `system-design` | Coordination depends on resolving system design or interface tradeoffs first.                             |
 | `api-design`          | The critical shared dependency is an API contract or integration boundary.                                |
 | `database-design`       | The coordination risk centers on schema, migration, data ownership, or engine choice.                     |
-| `architecture-doc`         | External sources, latest information, or public-repo comparisons are blocking confident execution.        |
+| `deep-research`         | External sources, latest information, or public-repo comparisons are blocking confident execution.        |
 | `mcp-server-builder`           | One stream is MCP server design, tool shape, or transport selection.                                      |
 | `tech-doc`           | One stream needs current OpenAI docs or version-specific behavior verification.                           |
 | `prompt-engineering`       | One stream is repairing prompts, agent rules, or instruction quality.                                     |
@@ -158,7 +158,8 @@ ANTI-LAZINESS:
 5. **Iterate, don't accept mediocrity** — if output is incomplete or wrong, re-delegate with feedback.
 6. **Track progress visibly** — maintain a task list showing status of each work item.
 7. **Fail fast on blockers** — if a dependency is missing or a task is stuck after 3 iterations, escalate.
-8. **Synthesize at the end** — combine outputs with concrete actions, risks, and verification evidence.
+8. **Route research explicitly** — when freshness or public comparison matters, delegate to `@researcher` or load `deep-research` before implementation.
+9. **Synthesize at the end** — combine outputs with concrete actions, risks, and verification evidence.
 ## Anti-Patterns to Prevent