@skill-graph/cli 0.5.6
- package/CHANGELOG.md +247 -0
- package/LICENSE +200 -0
- package/NOTICE +62 -0
- package/README.md +398 -0
- package/SKILL_GRAPH.md +443 -0
- package/bin/skill-graph.js +374 -0
- package/docs/ADOPTION.md +117 -0
- package/docs/CONFORMANCE.md +66 -0
- package/docs/PRIMER.md +384 -0
- package/docs/QUICKSTART-30MIN.md +333 -0
- package/docs/ROUTING-METRICS.md +120 -0
- package/docs/SKILL-MD-FORMAT-COMPATIBILITY.md +127 -0
- package/docs/SKILL_AUDIT_CHECKLIST.md +199 -0
- package/docs/SKILL_AUDIT_LOOP.md +195 -0
- package/docs/SKILL_METADATA_PROTOCOL.md +609 -0
- package/docs/_archived/marketplace-publication-priority-2026-05-18.md +239 -0
- package/docs/adr/0001-predicate-set.md +69 -0
- package/docs/adr/0002-json-ld-context.md +82 -0
- package/docs/adr/0003-ontoclean-rigidity-tags.md +65 -0
- package/docs/adr/0004-persistent-identifiers.md +74 -0
- package/docs/adr/0005-freshness-consolidation.md +70 -0
- package/docs/adr/0006-revise-predicate-rename.md +105 -0
- package/docs/adr/0007-audit-loop-cadence.md +99 -0
- package/docs/adr/0008-skill-surface-split-and-curation-policy.md +93 -0
- package/docs/category-consumers.md +168 -0
- package/docs/concept-map.md +194 -0
- package/docs/diagrams/drift-states.mmd +21 -0
- package/docs/diagrams/manifest-pipeline.mmd +25 -0
- package/docs/diagrams/routing-harness.mmd +41 -0
- package/docs/diagrams/starter-graph.mmd +53 -0
- package/docs/field-decision-guide.md +315 -0
- package/docs/field-rationale.md +211 -0
- package/docs/field-reference.generated.md +624 -0
- package/docs/field-reference.md +1426 -0
- package/docs/glossary.md +190 -0
- package/docs/head-noun-glossary.md +63 -0
- package/docs/images/audit-phases.png +0 -0
- package/docs/images/drift-states.png +0 -0
- package/docs/images/graded-mode.png +0 -0
- package/docs/images/manifest-pipeline.png +0 -0
- package/docs/images/routing-harness.png +0 -0
- package/docs/images/skill-anatomy.png +0 -0
- package/docs/images/starter-graph.png +0 -0
- package/docs/images/system-model.png +0 -0
- package/docs/integrations/github-actions.md +155 -0
- package/docs/manifest-field-mapping.md +443 -0
- package/docs/marketplace-publication-queue.generated.md +240 -0
- package/docs/marketplace-release-agent-prompt.md +82 -0
- package/docs/marketplace-skill-candidate-list.md +272 -0
- package/docs/marketplace-syndication.md +222 -0
- package/docs/migration-sample-review.md +155 -0
- package/docs/migrations/v4-to-v5.md +168 -0
- package/docs/migrations/v5-to-v6.md +221 -0
- package/docs/name-exceptions.yaml +37 -0
- package/docs/plans/marketplace-p1-public-migration-plan.md +41 -0
- package/docs/plans/multi-root-workspace.md +148 -0
- package/docs/plans/scripts-roadmap.md +107 -0
- package/docs/plans/v4-schema-bump.md +160 -0
- package/docs/plans/wave-2-extraction.md +122 -0
- package/docs/positioning-vs-marketplaces.md +175 -0
- package/docs/proposals/skill-audit-loop-positioning.md +160 -0
- package/docs/quality-doctrine.md +138 -0
- package/docs/recommended-skills.md +150 -0
- package/docs/research/skill-comprehension-eval-research.md +1830 -0
- package/docs/research/skill-retrieval-evidence.md +66 -0
- package/docs/skill-metadata-protocol.md +471 -0
- package/docs/skills-sh-maintainer-cleanup-request.md +80 -0
- package/examples/audits/a11y/findings.md +52 -0
- package/examples/audits/a11y/scorecard.md +21 -0
- package/examples/audits/a11y/verdict.md +44 -0
- package/examples/audits/debugging/findings.md +59 -0
- package/examples/audits/debugging/scorecard.md +22 -0
- package/examples/audits/debugging/verdict.md +33 -0
- package/examples/audits/documentation/findings.md +59 -0
- package/examples/audits/documentation/scorecard.md +22 -0
- package/examples/audits/documentation/verdict.md +33 -0
- package/examples/evals/a11y.json +140 -0
- package/examples/evals/api-design.json +52 -0
- package/examples/evals/code-review.json +52 -0
- package/examples/evals/data-modeling.json +52 -0
- package/examples/evals/database-migration.json +52 -0
- package/examples/evals/debugging.json +118 -0
- package/examples/evals/dependency-architecture.json +52 -0
- package/examples/evals/design-system-architecture.json +52 -0
- package/examples/evals/error-tracking.json +52 -0
- package/examples/evals/event-contract-design.json +52 -0
- package/examples/evals/form-ux-architecture.json +52 -0
- package/examples/evals/framework-fit-analysis.json +52 -0
- package/examples/evals/graph-audit.json +139 -0
- package/examples/evals/information-architecture.json +52 -0
- package/examples/evals/interaction-feedback.json +52 -0
- package/examples/evals/interaction-patterns.json +52 -0
- package/examples/evals/layout-composition.json +52 -0
- package/examples/evals/lint-overlay.json +117 -0
- package/examples/evals/microcopy.json +52 -0
- package/examples/evals/observability-modeling.json +52 -0
- package/examples/evals/pattern-recognition.json +96 -0
- package/examples/evals/performance-engineering.json +52 -0
- package/examples/evals/refactor.json +128 -0
- package/examples/evals/semiotics.json +52 -0
- package/examples/evals/skill-infrastructure.json +96 -0
- package/examples/evals/skill-router.json +140 -0
- package/examples/evals/skill-router.routing.json +113 -0
- package/examples/evals/system-interface-contracts.json +52 -0
- package/examples/evals/task-analysis.json +52 -0
- package/examples/evals/testing-strategy.json +118 -0
- package/examples/evals/type-safety.json +249 -0
- package/examples/evals/visual-design-foundations.json +52 -0
- package/examples/evals/webhook-integration.json +52 -0
- package/examples/exports/a11y.skill-md.md +80 -0
- package/examples/exports/debugging.skill-md.md +80 -0
- package/examples/exports/refactor.skill-md.md +78 -0
- package/examples/exports/testing-strategy.skill-md.md +81 -0
- package/examples/projects/markdown-static-site/README.md +115 -0
- package/examples/projects/markdown-static-site/skills/content-source-router/SKILL.md +131 -0
- package/examples/projects/markdown-static-site/skills/image-optimization-pipeline-config/SKILL.md +132 -0
- package/examples/projects/markdown-static-site/skills/link-rot-detection/SKILL.md +103 -0
- package/examples/projects/markdown-static-site/skills/markdown-post-frontmatter-validation/SKILL.md +133 -0
- package/examples/projects/markdown-static-site/skills/migrate-posts-to-v2-frontmatter/SKILL.md +140 -0
- package/examples/projects/saas-stripe-postgres/README.md +208 -0
- package/examples/projects/saas-stripe-postgres/db/migrations/0004_canonicalize_orders.sql +37 -0
- package/examples/projects/saas-stripe-postgres/db/schema.sql +112 -0
- package/examples/projects/saas-stripe-postgres/skills/migrate-orders-to-canonical-schema/SKILL.md +149 -0
- package/examples/projects/saas-stripe-postgres/skills/nextjs-server-action-validation/SKILL.md +154 -0
- package/examples/projects/saas-stripe-postgres/skills/payment-provider-router/SKILL.md +153 -0
- package/examples/projects/saas-stripe-postgres/skills/postgres-rls-pattern/SKILL.md +163 -0
- package/examples/projects/saas-stripe-postgres/skills/stripe-webhook-signature-verification/SKILL.md +137 -0
- package/examples/protocol/skill-metadata-template.md +301 -0
- package/examples/protocol/skills.manifest.sample.json +13245 -0
- package/examples/skill-metadata-template.md +317 -0
- package/examples/skills.manifest.sample.json +13519 -0
- package/examples/tests/v3-1-skos-fixture/SKILL.md +93 -0
- package/marketplace/README.md +17 -0
- package/marketplace/skills/a11y/SKILL.md +66 -0
- package/marketplace/skills/acid-fundamentals/SKILL.md +106 -0
- package/marketplace/skills/agent-engineering/SKILL.md +386 -0
- package/marketplace/skills/agent-eval-design/SKILL.md +55 -0
- package/marketplace/skills/ai-native-development/SKILL.md +294 -0
- package/marketplace/skills/api-design/SKILL.md +60 -0
- package/marketplace/skills/architecture-decision-records/SKILL.md +55 -0
- package/marketplace/skills/background-jobs/SKILL.md +265 -0
- package/marketplace/skills/bounded-context-mapping/SKILL.md +55 -0
- package/marketplace/skills/cap-theorem-tradeoffs/SKILL.md +127 -0
- package/marketplace/skills/client-server-boundary/SKILL.md +187 -0
- package/marketplace/skills/code-review/SKILL.md +120 -0
- package/marketplace/skills/color-system-design/SKILL.md +43 -0
- package/marketplace/skills/component-architecture/SKILL.md +126 -0
- package/marketplace/skills/compression/SKILL.md +112 -0
- package/marketplace/skills/conceptual-modeling/SKILL.md +181 -0
- package/marketplace/skills/connection-pooling/SKILL.md +105 -0
- package/marketplace/skills/constraint-awareness/SKILL.md +287 -0
- package/marketplace/skills/content-monitor/SKILL.md +209 -0
- package/marketplace/skills/context-engineering/SKILL.md +320 -0
- package/marketplace/skills/context-graph/SKILL.md +174 -0
- package/marketplace/skills/context-management/SKILL.md +174 -0
- package/marketplace/skills/context-window/SKILL.md +239 -0
- package/marketplace/skills/contract-testing/SKILL.md +120 -0
- package/marketplace/skills/cron-scheduling/SKILL.md +223 -0
- package/marketplace/skills/dark-mode-implementation/SKILL.md +47 -0
- package/marketplace/skills/data-modeling/SKILL.md +59 -0
- package/marketplace/skills/data-modeling-fundamentals/SKILL.md +117 -0
- package/marketplace/skills/database-migration/SKILL.md +429 -0
- package/marketplace/skills/debugging/SKILL.md +67 -0
- package/marketplace/skills/dependency-architecture/SKILL.md +58 -0
- package/marketplace/skills/design-module-composition/SKILL.md +43 -0
- package/marketplace/skills/design-system-architecture/SKILL.md +61 -0
- package/marketplace/skills/design-thinking/SKILL.md +44 -0
- package/marketplace/skills/diagnosis/SKILL.md +296 -0
- package/marketplace/skills/diff-analysis/SKILL.md +188 -0
- package/marketplace/skills/e2e-test-design/SKILL.md +113 -0
- package/marketplace/skills/entity-relationship-modeling/SKILL.md +218 -0
- package/marketplace/skills/epistemic-grounding/SKILL.md +112 -0
- package/marketplace/skills/error-boundary/SKILL.md +235 -0
- package/marketplace/skills/error-tracking/SKILL.md +261 -0
- package/marketplace/skills/eval-driven-development/SKILL.md +147 -0
- package/marketplace/skills/evaluation/SKILL.md +113 -0
- package/marketplace/skills/event-contract-design/SKILL.md +60 -0
- package/marketplace/skills/event-storming/SKILL.md +56 -0
- package/marketplace/skills/form-ux-architecture/SKILL.md +60 -0
- package/marketplace/skills/framework-fit-analysis/SKILL.md +59 -0
- package/marketplace/skills/frontend-architecture/SKILL.md +43 -0
- package/marketplace/skills/generative-ui/SKILL.md +118 -0
- package/marketplace/skills/graph-audit/SKILL.md +81 -0
- package/marketplace/skills/guardrails/SKILL.md +118 -0
- package/marketplace/skills/hooks-patterns/SKILL.md +185 -0
- package/marketplace/skills/http-semantics/SKILL.md +136 -0
- package/marketplace/skills/ideation/SKILL.md +41 -0
- package/marketplace/skills/indexing-strategy/SKILL.md +108 -0
- package/marketplace/skills/information-architecture/SKILL.md +59 -0
- package/marketplace/skills/integration-test-design/SKILL.md +111 -0
- package/marketplace/skills/intent-recognition/SKILL.md +136 -0
- package/marketplace/skills/interaction-feedback/SKILL.md +59 -0
- package/marketplace/skills/interaction-patterns/SKILL.md +59 -0
- package/marketplace/skills/journey-mapping/SKILL.md +41 -0
- package/marketplace/skills/keywords/SKILL.md +213 -0
- package/marketplace/skills/knowledge-modeling/SKILL.md +232 -0
- package/marketplace/skills/layout-composition/SKILL.md +59 -0
- package/marketplace/skills/linguistics/SKILL.md +429 -0
- package/marketplace/skills/lint-overlay/SKILL.md +76 -0
- package/marketplace/skills/mental-models/SKILL.md +126 -0
- package/marketplace/skills/merge-queue/SKILL.md +94 -0
- package/marketplace/skills/methodology/SKILL.md +317 -0
- package/marketplace/skills/microcopy/SKILL.md +232 -0
- package/marketplace/skills/middleware-patterns/SKILL.md +363 -0
- package/marketplace/skills/mobile-responsive-ux/SKILL.md +287 -0
- package/marketplace/skills/mutation-testing/SKILL.md +112 -0
- package/marketplace/skills/naming-conventions/SKILL.md +112 -0
- package/marketplace/skills/observability-modeling/SKILL.md +59 -0
- package/marketplace/skills/ontology-modeling/SKILL.md +67 -0
- package/marketplace/skills/owasp-security/SKILL.md +153 -0
- package/marketplace/skills/pattern-recognition/SKILL.md +472 -0
- package/marketplace/skills/performance-budgets/SKILL.md +185 -0
- package/marketplace/skills/performance-engineering/SKILL.md +58 -0
- package/marketplace/skills/performance-testing/SKILL.md +125 -0
- package/marketplace/skills/printify/SKILL.md +42 -0
- package/marketplace/skills/prioritization/SKILL.md +118 -0
- package/marketplace/skills/problem-framing/SKILL.md +41 -0
- package/marketplace/skills/problem-locating-solving/SKILL.md +203 -0
- package/marketplace/skills/project-knowledge-extraction/SKILL.md +54 -0
- package/marketplace/skills/prompt-craft/SKILL.md +134 -0
- package/marketplace/skills/prompt-injection-defense/SKILL.md +132 -0
- package/marketplace/skills/property-based-testing/SKILL.md +100 -0
- package/marketplace/skills/prototyping/SKILL.md +43 -0
- package/marketplace/skills/query-optimization/SKILL.md +144 -0
- package/marketplace/skills/real-time-updates/SKILL.md +324 -0
- package/marketplace/skills/ref-patterns/SKILL.md +284 -0
- package/marketplace/skills/refactor/SKILL.md +65 -0
- package/marketplace/skills/rendering-models/SKILL.md +142 -0
- package/marketplace/skills/replication-patterns/SKILL.md +110 -0
- package/marketplace/skills/research-synthesis/SKILL.md +41 -0
- package/marketplace/skills/route-handler-design/SKILL.md +347 -0
- package/marketplace/skills/schema-evolution/SKILL.md +140 -0
- package/marketplace/skills/security-fundamentals/SKILL.md +139 -0
- package/marketplace/skills/semantic-center/SKILL.md +194 -0
- package/marketplace/skills/semantic-relations/SKILL.md +250 -0
- package/marketplace/skills/semantics/SKILL.md +366 -0
- package/marketplace/skills/semiotics/SKILL.md +230 -0
- package/marketplace/skills/seo-strategy/SKILL.md +260 -0
- package/marketplace/skills/server-actions-design/SKILL.md +243 -0
- package/marketplace/skills/server-components-design/SKILL.md +190 -0
- package/marketplace/skills/sharding-strategy/SKILL.md +123 -0
- package/marketplace/skills/shopify/SKILL.md +42 -0
- package/marketplace/skills/skill-infrastructure/SKILL.md +320 -0
- package/marketplace/skills/skill-router/SKILL.md +71 -0
- package/marketplace/skills/skill-scaffold/SKILL.md +105 -0
- package/marketplace/skills/snapshot-testing/SKILL.md +120 -0
- package/marketplace/skills/spec-driven-development/SKILL.md +148 -0
- package/marketplace/skills/state-machine-modeling/SKILL.md +56 -0
- package/marketplace/skills/state-management/SKILL.md +134 -0
- package/marketplace/skills/streaming-architecture/SKILL.md +194 -0
- package/marketplace/skills/summarization/SKILL.md +156 -0
- package/marketplace/skills/suspense-patterns/SKILL.md +265 -0
- package/marketplace/skills/system-interface-contracts/SKILL.md +59 -0
- package/marketplace/skills/task-analysis/SKILL.md +201 -0
- package/marketplace/skills/taxonomy-design/SKILL.md +66 -0
- package/marketplace/skills/test-coverage-strategy/SKILL.md +108 -0
- package/marketplace/skills/test-doubles-design/SKILL.md +98 -0
- package/marketplace/skills/test-driven-development/SKILL.md +96 -0
- package/marketplace/skills/testing-strategy/SKILL.md +67 -0
- package/marketplace/skills/theme-system-design/SKILL.md +43 -0
- package/marketplace/skills/tool-call-flow/SKILL.md +229 -0
- package/marketplace/skills/tool-call-strategy/SKILL.md +292 -0
- package/marketplace/skills/transaction-isolation/SKILL.md +98 -0
- package/marketplace/skills/type-safety/SKILL.md +177 -0
- package/marketplace/skills/typography-system/SKILL.md +43 -0
- package/marketplace/skills/usability-testing/SKILL.md +43 -0
- package/marketplace/skills/user-research/SKILL.md +43 -0
- package/marketplace/skills/vercel-composition-patterns/SKILL.md +157 -0
- package/marketplace/skills/version-control/SKILL.md +233 -0
- package/marketplace/skills/visual-design-foundations/SKILL.md +59 -0
- package/marketplace/skills/visual-hierarchy/SKILL.md +43 -0
- package/marketplace/skills/webhook-integration/SKILL.md +331 -0
- package/marketplace/skills/writing-humanizer/SKILL.md +380 -0
- package/package.json +67 -0
- package/schemas/manifest.schema.json +811 -0
- package/schemas/manifest.v2.schema.json +164 -0
- package/schemas/manifest.v3.schema.json +758 -0
- package/schemas/manifest.v4.schema.json +755 -0
- package/schemas/manifest.v5.schema.json +755 -0
- package/schemas/manifest.v6.schema.json +811 -0
- package/schemas/skill.context.jsonld +279 -0
- package/schemas/skill.schema.json +919 -0
- package/schemas/skill.v2.schema.json +201 -0
- package/schemas/skill.v3.schema.json +827 -0
- package/schemas/skill.v4.schema.json +822 -0
- package/schemas/skill.v5.schema.json +830 -0
- package/schemas/skill.v6.schema.json +946 -0
- package/schemas/vocabulary/keywords.json +180 -0
- package/schemas/vocabulary/workspace_tags.json +23 -0
- package/scripts/__tests__/migrate-skill-v2-to-v3.test.js +161 -0
- package/scripts/__tests__/migrate-skill-v3-to-v4.test.js +158 -0
- package/scripts/__tests__/test-export-parser-drift.js +149 -0
- package/scripts/__tests__/test-marketplace-export.js +114 -0
- package/scripts/__tests__/test-router-paths.js +82 -0
- package/scripts/__tests__/test-stability-promotion.js +244 -0
- package/scripts/__tests__/test-v3-1-alias-contract.js +109 -0
- package/scripts/__tests__/test-v3-1-skos-runtime.js +116 -0
- package/scripts/backfill-schema-version.js +198 -0
- package/scripts/build-field-reference.js +160 -0
- package/scripts/build-retrieval-baseline.js +511 -0
- package/scripts/check-markdown-links.js +211 -0
- package/scripts/check-protocol-consistency.js +979 -0
- package/scripts/export-marketplace-skills.js +610 -0
- package/scripts/export-skill.js +374 -0
- package/scripts/generate-manifest.js +787 -0
- package/scripts/lib/alias-contract.js +83 -0
- package/scripts/lib/audit-prompt-builder.js +771 -0
- package/scripts/lib/mock-grader.js +134 -0
- package/scripts/lib/parse-frontmatter.js +429 -0
- package/scripts/lib/roots.js +119 -0
- package/scripts/lint/check-archetype-sections.js +185 -0
- package/scripts/lint/check-category-enum.js +83 -0
- package/scripts/lint/check-routing-eval.js +146 -0
- package/scripts/lint/check-routing-quality.js +211 -0
- package/scripts/lint/check-stability-promotion.js +220 -0
- package/scripts/lint/format-code-frame.js +206 -0
- package/scripts/marketplace-install.js +125 -0
- package/scripts/migrate-category-to-enum.js +169 -0
- package/scripts/migrate-skill-v2-to-v3.js +424 -0
- package/scripts/migrate-skill-v3-to-v4.js +200 -0
- package/scripts/migrate-skill-v5-to-v6.js +304 -0
- package/scripts/restructure-by-category.js +85 -0
- package/scripts/seed-publication-classification.js +282 -0
- package/scripts/skill-audit.js +893 -0
- package/scripts/skill-graph-drift.js +483 -0
- package/scripts/skill-graph-route.js +766 -0
- package/scripts/skill-graph-routing-eval.js +393 -0
- package/scripts/skill-lint.js +1317 -0
- package/scripts/skill-overlap.js +213 -0
- package/scripts/verify-skill-md-export.js +201 -0
|
@@ -0,0 +1,229 @@
---
name: tool-call-flow
description: "Use when reasoning about the protocol-level cycle by which a language model uses external tools: the four phases (declaration, request, execution, continuation), the message-history state model that ties them together, the structural differences between vendor protocols (Anthropic tool-use, OpenAI function-calling, MCP) and how they compose, parallel vs sequential tool calls, error handling and retries inside the cycle, and the separation between the model (which produces structured intent) and the runtime (which executes the intent and routes results back). Do NOT use for the decision of when and how many tool calls to make (use tool-call-strategy), agent-system architecture and coordination patterns (use agent-engineering), prompt wording (use prompt-craft), or the design of evals for tool-use behavior (use agent-eval-design)."
license: MIT
allowed-tools: Read Grep
metadata:
metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"agent\",\"domain\":\"agent/protocol\",\"scope\":\"reference\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-15\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-15\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"comprehension_state\":\"present\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"tool call\\\\\\\",\\\\\\\"tool use\\\\\\\",\\\\\\\"function calling\\\\\\\",\\\\\\\"MCP\\\\\\\",\\\\\\\"Model Context Protocol\\\\\\\",\\\\\\\"tool result\\\\\\\",\\\\\\\"parallel tool calls\\\\\\\",\\\\\\\"tool schema\\\\\\\",\\\\\\\"JSON Schema\\\\\\\",\\\\\\\"assistant turn\\\\\\\",\\\\\\\"tool runtime\\\\\\\",\\\\\\\"tool router\\\\\\\",\\\\\\\"tool definitions\\\\\\\",\\\\\\\"agent calling tools\\\\\\\"]\",\"triggers\":\"[\\\\\\\"how does tool calling actually work\\\\\\\",\\\\\\\"what's the message shape for a tool result\\\\\\\",\\\\\\\"MCP vs function calling vs Anthropic tools\\\\\\\",\\\\\\\"can the model call tools in parallel\\\\\\\",\\\\\\\"where do tool errors live in the message history\\\\\\\"]\",\"examples\":\"[\\\\\\\"design the message-shape contract between a model and a tool runtime\\\\\\\",\\\\\\\"explain why a tool result must be appended to the message history before the next assistant turn\\\\\\\",\\\\\\\"decide whether to expose a capability as a tool, an MCP server, or an inline API\\\\\\\",\\\\\\\"diagnose why a model keeps re-calling the same tool with the same arguments\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"decide whether to call a tool or write a script (use tool-call-strategy)\\\\\\\",\\\\\\\"choose a multi-agent coordination pattern (use agent-engineering)\\\\\\\",\\\\\\\"design an eval suite that tests tool-call correctness (use 
agent-eval-design)\\\\\\\"]\",\"relations\":\"{\\\\\\\"related\\\\\\\":[\\\\\\\"tool-call-strategy\\\\\\\",\\\\\\\"agent-engineering\\\\\\\",\\\\\\\"api-design\\\\\\\",\\\\\\\"type-safety\\\\\\\",\\\\\\\"client-server-boundary\\\\\\\"],\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"tool-call-strategy\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"tool-call-strategy owns the decision of when, how many, and which tools to call (token cost, redundancy, parallelization, decision gate). tool-call-flow owns the protocol-level cycle that makes any call possible. The two compose: strategy decides what to do; flow describes the mechanism that carries it out.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"agent-engineering\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"agent-engineering owns multi-agent and multi-step system architecture (orchestrator/worker, consensus, sequential chains). tool-call-flow is one cycle inside a single agent — the protocol for a single model-to-runtime interaction.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"api-design\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"api-design owns the external API surface that tools may wrap. tool-call-flow owns the model-facing contract: how the tool is declared to the model, how the result is encoded back to it, and how the cycle is structured.\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"client-server-boundary\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"client-server-boundary owns the serialization frontier between server and client code. 
tool-call-flow is an analogous frontier between a language model (which produces structured intent) and a runtime (which executes the intent) — the trust direction is different but the discipline of explicit serialization is identical.\\\\\\\"}],\\\\\\\"verify_with\\\\\\\":[\\\\\\\"tool-call-strategy\\\\\\\",\\\\\\\"agent-eval-design\\\\\\\"]}\",\"mental_model\":\"|\",\"purpose\":\"|\",\"boundary\":\"|\",\"analogy\":\"A tool-call flow is to a language model what a procurement system is to an executive — the executive does not personally drive to the supplier; they sign a typed purchase order, the procurement department validates the order, executes it, and returns the receipt with whatever was delivered or with a documented reason it could not be. The executive's signature is intent; the department's stamp is authorization; the receipt is the only state of the cycle that survives, and the next decision is made against that record.\",\"misconception\":\"|\",\"concept\":\"{\\\\\\\"definition\\\\\\\":\\\\\\\"A tool-call flow is the multi-turn protocol by which a language model uses external capabilities. It has four phases — declaration (the runtime tells the model which tools exist and their parameter schemas), request (the model emits a structured tool-call message specifying tool name and arguments), execution (the runtime invokes the underlying capability and produces a result), continuation (the runtime appends the result to the message history and re-prompts the model, which either continues with another tool call or produces a final answer). 
The state of the cycle lives in the message history; the model is stateless across calls.\\\\\\\",\\\\\\\"mental_model\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"purpose\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"boundary\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"taxonomy\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"analogy\\\\\\\":\\\\\\\"|\\\\\\\",\\\\\\\"misconception\\\\\\\":\\\\\\\"|\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/tool-call-flow/SKILL.md\"}"
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
skill_graph_protocol: Skill Metadata Protocol v4
skill_graph_project: Skill Graph
skill_graph_canonical_skill: skills/tool-call-flow/SKILL.md
---

# Tool-Call Flow

## Coverage

The protocol-level cycle by which a language model uses external capabilities. Covers the four phases (declaration, request, execution, continuation), the message-history state model that ties them together, the structural differences between vendor protocols (Anthropic tool-use, OpenAI function-calling, Model Context Protocol, Gemini function calling), parallel tool calls, streaming tool calls, the runtime's role as orchestrator, error encoding inside the cycle, and the boundary between model-side intent and runtime-side execution.

## Philosophy

A tool-call flow is the smallest unit of agentic capability. Strip away orchestration patterns, multi-agent coordination, and evaluation harnesses, and what remains is a single language model alternating turns with a runtime that executes capabilities on its behalf. Understanding this cycle precisely is the foundation for understanding everything that builds on it.

The cycle's defining property is the separation of planning from execution. The model produces structured intent; the runtime carries it out. This separation is not a workaround for current model capabilities; it is a deliberate design choice that makes the system auditable, composable, and recoverable. A system that fuses the two (by letting the model execute code directly, or by letting the runtime make decisions) gains expressiveness and loses the benefits the separation provides.

The four-phase structure is the same across the vendor protocols covered here. The names differ, the message shapes differ, the encoding of parallelism differs, but the cycle (declare, request, execute, continue) is the same. A practitioner who understands the cycle can move between Anthropic, OpenAI, MCP, and Gemini at the cost of a translation layer; a practitioner who understands only one vendor's encoding cannot.

## The Four Phases

| Phase | Who acts | Output | Becomes |
|---|---|---|---|
| 1. Declaration | Runtime | List of tools with name, description, JSON-Schema parameter spec | Part of the request to the model |
| 2. Request | Model | Assistant message with one or more tool-call blocks (or a final-answer message, which ends the cycle) | Appended to message history |
| 3. Execution | Runtime | Result of invoking the named tool with the supplied arguments | A tool-result message |
| 4. Continuation | Runtime | Tool-result message paired with the request by ID | Appended to message history; cycle repeats |

The cycle ends when the model emits an assistant message *without* any tool-call blocks. The final message's content is the final answer.
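
The four phases can be sketched as a single driver loop. The following is a minimal, vendor-neutral sketch rather than any SDK's implementation: `run_cycle`, `scripted_model`, the `call_id` field, and the message shapes are hypothetical names chosen for illustration, the declarations omit parameter schemas for brevity, and the model is a scripted stand-in so the loop runs without network access.

```python
import json
from typing import Callable


def run_cycle(model: Callable[[list, list], dict], tools: dict,
              user_prompt: str, max_turns: int = 8) -> tuple:
    """Drive declare -> request -> execute -> continue until a final answer."""
    # Phase 1 (declaration): describe each tool to the model.
    declarations = [{"name": name, "description": fn.__doc__ or ""}
                    for name, fn in tools.items()]
    history = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        # Phase 2 (request): the model sees the full history on every turn.
        assistant = model(declarations, history)
        history.append(assistant)
        calls = assistant.get("tool_calls", [])
        if not calls:
            # No tool-call blocks: the cycle ends with the final answer.
            return assistant["content"], history
        for call in calls:
            # Phase 3 (execution): the runtime, not the model, invokes the tool.
            result = tools[call["name"]](**call["arguments"])
            # Phase 4 (continuation): pair the result with its request by ID.
            history.append({"role": "tool", "call_id": call["id"],
                            "content": json.dumps(result)})
    raise RuntimeError("cycle did not terminate within max_turns")


def get_weather(location: str) -> dict:
    """Returns current weather for a city (stubbed for the sketch)."""
    return {"temp_c": 18, "conditions": "cloudy"}


def scripted_model(declarations: list, history: list) -> dict:
    """Stand-in for a real model: requests the tool once, then answers."""
    if history[-1]["role"] == "tool":
        data = json.loads(history[-1]["content"])
        return {"role": "assistant",
                "content": f"It's {data['temp_c']}°C and {data['conditions']} in Paris."}
    return {"role": "assistant", "content": None,
            "tool_calls": [{"id": "call_1", "name": "get_weather",
                            "arguments": {"location": "Paris"}}]}


answer, history = run_cycle(scripted_model, {"get_weather": get_weather},
                            "What's the weather in Paris?")
print(answer)
```

The `max_turns` guard is an assumption of this sketch; some bound of that kind is a common way to stop a model that keeps requesting tools. Note that all state lives in `history`, which is passed to the model in full on each turn.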

## Vendor Protocol Comparison

The cycle is the same; the encoding differs. The examples below show the same single-call cycle in three protocols.

### Anthropic tool-use (Messages API)

```jsonc
// Request to model (turn 1)
{
  "model": "claude-opus-4-7",
  "tools": [{
    "name": "get_weather",
    "description": "Returns current weather for a city.",
    "input_schema": {
      "type": "object",
      "properties": { "location": { "type": "string" } },
      "required": ["location"]
    }
  }],
  "messages": [
    { "role": "user", "content": "What's the weather in Paris?" }
  ]
}

// Model response (turn 1 → 2)
{
  "role": "assistant",
  "content": [
    { "type": "text", "text": "I'll check the current weather." },
    { "type": "tool_use", "id": "toolu_01A", "name": "get_weather",
      "input": { "location": "Paris" } }
  ]
}

// Runtime executes get_weather("Paris") → { "temp_c": 18, "conditions": "cloudy" }

// Request to model (turn 2, includes appended tool result)
{
  "messages": [
    { "role": "user", "content": "What's the weather in Paris?" },
    { "role": "assistant", "content": [...] },  // turn 1 above
    { "role": "user", "content": [
      { "type": "tool_result", "tool_use_id": "toolu_01A",
        "content": "{\"temp_c\":18,\"conditions\":\"cloudy\"}" }
    ]}
  ]
}

// Model response (final)
{ "role": "assistant", "content": [
  { "type": "text", "text": "It's 18°C and cloudy in Paris." }
]}
```
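
The runtime's side of this exchange (execution and continuation) can be sketched without any SDK, using plain dictionaries in the shapes shown above. This is an illustrative sketch, not Anthropic's client code: `build_tool_result_message` and `registry` are hypothetical names, and reporting failures through the `is_error` flag on a `tool_result` block is one reasonable policy among several.

```python
import json


def build_tool_result_message(assistant_message: dict, registry: dict) -> dict:
    """Execute every tool_use block and return the user-role message answering them."""
    results = []
    for block in assistant_message["content"]:
        if block.get("type") != "tool_use":
            continue  # text blocks need no reply
        try:
            output = registry[block["name"]](**block["input"])
            result = {"type": "tool_result", "tool_use_id": block["id"],
                      "content": json.dumps(output, separators=(",", ":"))}
        except Exception as exc:
            # Surface the failure to the model instead of crashing the runtime.
            result = {"type": "tool_result", "tool_use_id": block["id"],
                      "content": f"{type(exc).__name__}: {exc}", "is_error": True}
        results.append(result)
    return {"role": "user", "content": results}


assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "I'll check the current weather."},
        {"type": "tool_use", "id": "toolu_01A", "name": "get_weather",
         "input": {"location": "Paris"}},
    ],
}
weather_registry = {"get_weather": lambda location: {"temp_c": 18, "conditions": "cloudy"}}

reply = build_tool_result_message(assistant_turn, weather_registry)
failed = build_tool_result_message(assistant_turn, {})  # tool not registered
print(json.dumps(reply, indent=2))
```

Encoding the error as a tool result keeps the failure inside the message history, so the model can react to it on the next turn.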

### OpenAI function-calling (Chat Completions)

```jsonc
// Same flow with different encoding:
// - Tool calls live in `tool_calls` adjacent to `content`, not inside `content`.
// - Tool results live in messages with `role: "tool"` (not inside a user-role message).
// - IDs pair via `tool_call_id`.
// - `arguments` is a JSON-encoded string, not a nested object.

// Assistant message carrying the tool call
{
  "role": "assistant",
  "content": null,
  "tool_calls": [{
    "id": "call_abc", "type": "function",
    "function": { "name": "get_weather",
                  "arguments": "{\"location\":\"Paris\"}" }
  }]
}

// Tool-result message appended by the runtime
{
  "role": "tool",
  "tool_call_id": "call_abc",
  "content": "{\"temp_c\":18,\"conditions\":\"cloudy\"}"
}
```
|
|
115
|
+
|
|
116
|
+
### Model Context Protocol (MCP)
|
|
117
|
+
|
|
118
|
+
```jsonc
|
|
119
|
+
// MCP externalizes the declaration phase. Tools live on an MCP server.
|
|
120
|
+
// The model client discovers them at runtime via the MCP transport
|
|
121
|
+
// (JSON-RPC over stdio or HTTP; older servers use HTTP+SSE), then inlines them into the
|
|
122
|
+
// per-call request to the underlying model API (Anthropic, OpenAI, etc.).
|
|
123
|
+
|
|
124
|
+
// 1. Client → MCP server: tools/list
|
|
125
|
+
// 2. MCP server → client: list of tool declarations
|
|
126
|
+
// 3. Client embeds those declarations in the model request (in whatever
|
|
127
|
+
// vendor's encoding the underlying model uses)
|
|
128
|
+
// 4. Model emits a tool-call message
|
|
129
|
+
// 5. Client → MCP server: tools/call { name, arguments }
|
|
130
|
+
// 6. MCP server → client: result
|
|
131
|
+
// 7. Client appends the result and re-prompts the model
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
MCP's contribution is dynamic discovery and provider neutrality at the catalog level, not a different cycle shape.
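The discovery and re-encoding steps can be sketched in Python. This is a minimal illustration, not an MCP client: the helper names (`jsonrpc_request`, `to_anthropic_tool`, `to_openai_tool`) and the `discovered` declaration are hypothetical, the transport and initialization handshake are omitted, and the field names reflect the vendor encodings as commonly documented.

```python
import json
from typing import Optional

def jsonrpc_request(request_id: int, method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request such as tools/list or tools/call."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)

def to_anthropic_tool(mcp_tool: dict) -> dict:
    """Re-encode an MCP tool declaration for an Anthropic-style request."""
    return {"name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            "input_schema": mcp_tool["inputSchema"]}

def to_openai_tool(mcp_tool: dict) -> dict:
    """Re-encode the same declaration for an OpenAI Chat Completions-style request."""
    return {"type": "function",
            "function": {"name": mcp_tool["name"],
                         "description": mcp_tool.get("description", ""),
                         "parameters": mcp_tool["inputSchema"]}}

# Steps 1 and 5 of the flow above
list_request = jsonrpc_request(1, "tools/list")
call_request = jsonrpc_request(2, "tools/call",
                               {"name": "get_weather", "arguments": {"location": "Paris"}})

# Step 3: one discovered declaration (hypothetical), two vendor encodings
discovered = {"name": "get_weather",
              "description": "Get current weather for a city.",
              "inputSchema": {"type": "object",
                              "properties": {"location": {"type": "string"}},
                              "required": ["location"]}}
anthropic_tool = to_anthropic_tool(discovered)
openai_tool = to_openai_tool(discovered)
```

The catalog entry is the same in both encodings; only the wrapper keys differ, which is the sense in which MCP is neutral at the catalog level.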
|
|
135
|
+
|
|
136
|
+
## Parallel Tool Calls
|
|
137
|
+
|
|
138
|
+
Modern protocols allow the model to emit multiple tool-call blocks in a single assistant message. The runtime can execute them concurrently, and it appends all results to the history before re-prompting.
|
|
139
|
+
|
|
140
|
+
```jsonc
|
|
141
|
+
// Anthropic — single assistant message with two tool_use blocks
|
|
142
|
+
{
|
|
143
|
+
"role": "assistant",
|
|
144
|
+
"content": [
|
|
145
|
+
{ "type": "tool_use", "id": "toolu_01A", "name": "get_weather",
|
|
146
|
+
"input": { "location": "Paris" } },
|
|
147
|
+
{ "type": "tool_use", "id": "toolu_01B", "name": "get_weather",
|
|
148
|
+
"input": { "location": "Tokyo" } }
|
|
149
|
+
]
|
|
150
|
+
}
|
|
151
|
+
// Runtime executes both concurrently
|
|
152
|
+
// Next user-role message contains both tool_result blocks
|
|
153
|
+
```
|
|
154
|
+
|
|
155
|
+
Parallel calls reduce wall-clock latency for independent operations but add coordination complexity for dependent ones: the model cannot use the result of one parallel call to inform another in the same turn (they all execute against the same pre-result state). Dependent calls must be sequential.
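A minimal Python sketch of the runtime side of a parallel turn, assuming Anthropic-style `tool_use` blocks like the ones shown above. The `TOOLS` registry, the `get_weather` stub with its canned data, and the `run_parallel` helper are hypothetical names for illustration, not part of any SDK.

```python
from concurrent.futures import ThreadPoolExecutor
import json

def get_weather(location: str) -> dict:
    """Stub tool returning canned data; a real tool would call a weather service."""
    canned = {"Paris": {"temp_c": 18, "conditions": "cloudy"},
              "Tokyo": {"temp_c": 24, "conditions": "clear"}}
    return canned[location]

TOOLS = {"get_weather": get_weather}

def run_parallel(tool_use_blocks: list) -> dict:
    """Execute independent tool_use blocks concurrently and return one
    user-role message holding a tool_result block for each call."""
    def execute(block: dict) -> dict:
        output = TOOLS[block["name"]](**block["input"])
        return {"type": "tool_result",
                "tool_use_id": block["id"],   # pairs the result with its call
                "content": json.dumps(output)}

    with ThreadPoolExecutor() as pool:
        # map() yields results in input order, whatever order the calls finish in
        results = list(pool.map(execute, tool_use_blocks))
    return {"role": "user", "content": results}

calls = [
    {"type": "tool_use", "id": "toolu_01A", "name": "get_weather",
     "input": {"location": "Paris"}},
    {"type": "tool_use", "id": "toolu_01B", "name": "get_weather",
     "input": {"location": "Tokyo"}},
]
message = run_parallel(calls)
```

Both calls run against the same pre-result state, which is why this shape only suits independent operations.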
|
|
156
|
+
|
|
157
|
+
## Streaming Tool Calls
|
|
158
|
+
|
|
159
|
+
Streaming protocols emit the assistant message token-by-token. A complete tool-call block can be reconstructed before the full message finishes streaming, allowing the runtime to begin execution speculatively:
|
|
160
|
+
|
|
161
|
+
1. Streamed tokens form a complete tool_use block.
|
|
162
|
+
2. Runtime parses the complete block and begins executing the tool.
|
|
163
|
+
3. Streaming continues; if a second tool-call block arrives, it also begins executing.
|
|
164
|
+
4. When the streamed message completes, the runtime has the full set of calls; some are already done.
|
|
165
|
+
|
|
166
|
+
Streaming is an optimization, not a different cycle. The correctness contract is the same: results must be appended in their assigned positions; the model sees the same history regardless of timing.
|
|
167
|
+
|
|
168
|
+
## Failure Encoding
|
|
169
|
+
|
|
170
|
+
All failures are tool-result messages — never out-of-band exceptions that break the cycle.
|
|
171
|
+
|
|
172
|
+
| Failure | Encoding |
|
|
173
|
+
|---|---|
|
|
174
|
+
| Schema validation failed | Result with error: "arguments did not match schema: missing required field 'location'" |
|
|
175
|
+
| Tool execution threw | Result with error: "OpenWeatherMap returned 503 Service Unavailable" |
|
|
176
|
+
| Timeout | Result with error: "tool execution exceeded 10s timeout" |
|
|
177
|
+
| Permission denied | Result with error: "this tool requires admin permission" |
|
|
178
|
+
| Unknown tool | Result with error: "no tool named 'get_wether' (did you mean 'get_weather'?)" |
|
|
179
|
+
|
|
180
|
+
The model sees the error in its next turn and can choose: retry with corrected arguments, try a different tool, abandon the goal, or surface the failure to the user with context. The runtime's job is to encode the failure faithfully; the model's job is to handle it.
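One way to keep failures inside the cycle is to wrap dispatch so that it always returns a result and never raises. The sketch below is an assumption-laden illustration: `lookup_order`, `TOOLS`, and `execute_call` are hypothetical, and the `is_error` flag is one possible result shape rather than a required one.

```python
import difflib
import json

def lookup_order(order_id: str) -> dict:
    """Hypothetical tool that rejects malformed input."""
    if not order_id.isdigit():
        raise ValueError("order_id must be numeric")
    return {"order_id": order_id, "status": "shipped"}

TOOLS = {"lookup_order": lookup_order}

def execute_call(call_id: str, name: str, arguments: dict) -> dict:
    """Run one tool call; always return a tool-result dict, never raise."""
    if name not in TOOLS:
        hint = difflib.get_close_matches(name, list(TOOLS), n=1)
        suggestion = f" (did you mean '{hint[0]}'?)" if hint else ""
        return {"tool_use_id": call_id, "is_error": True,
                "content": f"no tool named '{name}'{suggestion}"}
    try:
        output = TOOLS[name](**arguments)
    except Exception as exc:
        # Encode the failure as readable text instead of breaking the loop
        return {"tool_use_id": call_id, "is_error": True,
                "content": f"{type(exc).__name__}: {exc}"}
    return {"tool_use_id": call_id, "is_error": False,
            "content": json.dumps(output)}

ok = execute_call("c1", "lookup_order", {"order_id": "42"})
typo = execute_call("c2", "lookup_ordr", {"order_id": "42"})
bad_input = execute_call("c3", "lookup_order", {"order_id": "abc"})
```

Timeouts and permission checks would follow the same pattern: catch at the runtime boundary, return a diagnostic string, and let the model decide what to do next.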
|
|
181
|
+
|
|
182
|
+
## The Runtime's Responsibilities
|
|
183
|
+
|
|
184
|
+
The runtime is the orchestrator. Specifically, it owns:
|
|
185
|
+
|
|
186
|
+
- **Tool catalog management** — declare which tools exist and their schemas (or, with MCP, discover them).
|
|
187
|
+
- **Schema validation** — verify the model's tool-call arguments match the declared schema before invocation.
|
|
188
|
+
- **Dispatch** — route the validated call to the underlying implementation (local, network, MCP, subagent, human).
|
|
189
|
+
- **Execution policy** — timeouts, retries, rate limits, parallelism limits, side-effect gating.
|
|
190
|
+
- **Result encoding** — format the execution output as a tool-result message paired by ID.
|
|
191
|
+
- **Loop control** — re-prompt the model; cap the maximum number of turns; detect runaway loops.
|
|
192
|
+
- **Audit and persistence** — store the message history; make runs replayable.
|
|
193
|
+
|
|
194
|
+
The model owns only the planning: which tool, which arguments, when to stop.
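The split between planning and orchestration can be sketched as a small loop. Everything here is hypothetical: `run_cycle`, `fake_model`, and `fake_execute` stand in for a real model client and dispatcher, and the message shapes are simplified (arguments are plain dicts rather than JSON strings).

```python
from typing import Callable

def run_cycle(model: Callable, execute: Callable, user_prompt: str,
              max_turns: int = 8) -> list:
    """Drive the tool-call cycle until the model stops calling tools
    or the turn cap is reached. Returns the full message history."""
    history = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = model(history)              # planning: owned by the model
        history.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:                       # no tool calls means a final answer
            return history
        for call in calls:                  # dispatch and encoding: owned by the runtime
            history.append(execute(call))
    # Runtime-side note recording why the loop ended (a choice made for this sketch)
    history.append({"role": "system",
                    "content": f"stopped: exceeded {max_turns} turns"})
    return history

def fake_model(history: list) -> dict:
    """Calls get_weather once, then answers from the tool result."""
    if any(m["role"] == "tool" for m in history):
        return {"role": "assistant", "content": "It's 18°C and cloudy in Paris."}
    return {"role": "assistant", "content": None,
            "tool_calls": [{"id": "call_abc", "name": "get_weather",
                            "arguments": {"location": "Paris"}}]}

def fake_execute(call: dict) -> dict:
    return {"role": "tool", "tool_call_id": call["id"],
            "content": '{"temp_c":18,"conditions":"cloudy"}'}

history = run_cycle(fake_model, fake_execute, "What's the weather in Paris?")
```

The returned history doubles as the audit record: replaying a run means feeding the same messages back through the same loop.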
|
|
195
|
+
|
|
196
|
+
## Verification
|
|
197
|
+
|
|
198
|
+
After applying this skill, verify:
|
|
199
|
+
- [ ] Every tool declaration has a precise description naming when the tool should be used and what the arguments mean (the description is a prompt, not documentation).
|
|
200
|
+
- [ ] Every tool has a JSON Schema for its parameters with `required` fields and constrained types — the model's arguments are validated against this schema before execution.
|
|
201
|
+
- [ ] Tool-call and tool-result messages are paired by ID — the pairing is checked, not assumed.
|
|
202
|
+
- [ ] Errors are encoded into tool-result messages with diagnostic content the model can read — they do not break the loop.
|
|
203
|
+
- [ ] The cycle has a maximum-turns cap on the runtime side — a model that keeps tool-calling never exits without intervention.
|
|
204
|
+
- [ ] Side-effecting tools have explicit gating (confirmation, dry-run mode, allow-list) — the model does not enforce side-effect discipline.
|
|
205
|
+
- [ ] Parallel tool calls are used only for independent operations — dependent calls remain sequential.
|
|
206
|
+
- [ ] The message history is the only state — no hidden runtime memory the model cannot see.
|
|
207
|
+
- [ ] Logs include the full message history (or a structured trace) per cycle — runs are replayable.
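The ID-pairing item in the checklist can be checked mechanically. A minimal sketch, assuming Anthropic-style content blocks; `unpaired_ids` is a hypothetical helper and the sample history is invented for illustration.

```python
def unpaired_ids(history: list) -> dict:
    """Report tool-call IDs with no result and result IDs with no call."""
    called, answered = [], []
    for message in history:
        content = message.get("content")
        if not isinstance(content, list):   # plain-string content has no blocks
            continue
        for block in content:
            if block.get("type") == "tool_use":
                called.append(block["id"])
            elif block.get("type") == "tool_result":
                answered.append(block["tool_use_id"])
    return {"missing_results": [i for i in called if i not in answered],
            "orphan_results": [i for i in answered if i not in called]}

sample_history = [
    {"role": "user", "content": "Compare Paris and Tokyo weather."},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_01A", "name": "get_weather",
         "input": {"location": "Paris"}},
        {"type": "tool_use", "id": "toolu_01B", "name": "get_weather",
         "input": {"location": "Tokyo"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01A", "content": "{}"},
        {"type": "tool_result", "tool_use_id": "toolu_09Z", "content": "{}"},
    ]},
]
report = unpaired_ids(sample_history)
```

Running a check like this before each re-prompt turns "paired by ID" from an assumption into an invariant.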
|
|
208
|
+
|
|
209
|
+
## Do NOT Use When
|
|
210
|
+
|
|
211
|
+
| Instead of this skill | Use | Why |
|
|
212
|
+
|---|---|---|
|
|
213
|
+
| Deciding when to call a tool versus when to write a script | `tool-call-strategy` | tool-call-strategy owns the decision; tool-call-flow owns the mechanism that carries the decision out |
|
|
214
|
+
| Choosing a multi-agent coordination pattern | `agent-engineering` | agent-engineering owns system-level architecture; tool-call-flow is one cycle inside one agent |
|
|
215
|
+
| Writing the natural-language description that goes into a tool declaration | `prompt-craft` | prompt-craft owns wording; this skill owns the shape of what wording goes where |
|
|
216
|
+
| Designing the JSON shape of an external API a tool wraps | `api-design` | api-design owns the external surface; tool-call-flow owns the model-facing contract |
|
|
217
|
+
| Designing evals for tool-use correctness | `agent-eval-design` | agent-eval-design owns eval structure; tool-call-flow describes what is being evaluated |
|
|
218
|
+
| Debugging a tool that returns wrong results in production | `debugging` | debugging owns the diagnostic activity |
|
|
219
|
+
|
|
220
|
+
## Key Sources
|
|
221
|
+
|
|
222
|
+
- Anthropic. [Tool use overview](https://docs.anthropic.com/en/docs/build-with-claude/tool-use). Canonical reference for the Anthropic Messages API tool-use protocol — the tool_use / tool_result block structure, ID pairing, parallel calls.
|
|
223
|
+
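- OpenAI. [Function calling guide](https://platform.openai.com/docs/guides/function-calling). Canonical reference for function calling in the OpenAI Chat Completions and Responses APIs. Chat Completions pairs `tool_calls` entries with `role: "tool"` messages via `tool_call_id`; the Responses API expresses the same pairing with `function_call` and `function_call_output` items linked by `call_id`.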
|
|
224
|
+
- Anthropic. [Model Context Protocol specification](https://modelcontextprotocol.io/specification). The open specification for MCP — transport, tool discovery, the tools/list and tools/call methods.
|
|
225
|
+
- JSON Schema. [Draft 2020-12 specification](https://json-schema.org/draft/2020-12/json-schema-core). The schema language used for parameter declarations across all vendor protocols.
|
|
226
|
+
- Google. [Gemini function calling documentation](https://ai.google.dev/gemini-api/docs/function-calling). The function_call / function_response message-part structure — same four-phase cycle, third encoding.
|
|
227
|
+
- Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). ["ReAct: Synergizing Reasoning and Acting in Language Models"](https://arxiv.org/abs/2210.03629). The Thought-Action-Observation loop that prefigured the structured tool-call protocols.
|
|
228
|
+
- Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). ["Toolformer: Language Models Can Teach Themselves to Use Tools"](https://arxiv.org/abs/2302.04761). Meta's foundational paper on training models to invoke tools — the research thread that motivates the protocol design choices.
|
|
229
|
+
- LangChain. [Tool calling concepts](https://python.langchain.com/docs/concepts/tool_calling/). Model-vendor-neutral description of the cycle structure, useful as a third-party reference outside any single model provider.
|
|
@@ -0,0 +1,292 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: tool-call-strategy
|
|
3
|
+
description: "Use when an agent is making too many tool calls, when context is filling from verbose tool outputs, when the same operation could be a script instead of N individual calls, or when designing a tool-use protocol for a new agent or harness. Covers the three costs of every call (token, latency, context pollution), the script-vs-call decision gate, tool-selection decision trees (file-search vs content-search vs targeted-read vs full-read), call batching and parallelization, redundancy avoidance, the poka-yoke principle, subagent delegation for context protection, and cost-benchmark heuristics by task type. Do NOT use for prompt wording (use `prompt-craft`), broader context stack design across the five layers (use `context-engineering`), runtime tool failures or production debugging (use `debugging`), or behaviour-preserving refactor mechanics (use `refactor`)."
|
|
4
|
+
license: MIT
|
|
5
|
+
compatibility: "Provider-agnostic; abstract tool capabilities map to concrete tools across Claude Code, Cursor, Copilot, OpenCode, Aider, Continue. Specific tool names in this skill (read_file, grep_search, run_in_terminal, apply_patch) are concrete examples — substitute the equivalent in your harness."
|
|
6
|
+
allowed-tools: Read Grep Bash Edit
|
|
7
|
+
metadata:
|
|
8
|
+
metadata: "{\"schema_version\":6,\"version\":\"1.0.0\",\"type\":\"capability\",\"category\":\"engineering\",\"domain\":\"ai-engineering/tool-use\",\"scope\":\"portable\",\"owner\":\"skill-graph-maintainer\",\"freshness\":\"2026-05-06\",\"drift_check\":\"{\\\\\\\"last_verified\\\\\\\":\\\\\\\"2026-05-06\\\\\\\"}\",\"eval_artifacts\":\"planned\",\"eval_state\":\"unverified\",\"routing_eval\":\"absent\",\"stability\":\"experimental\",\"keywords\":\"[\\\\\\\"tool call optimization\\\\\\\",\\\\\\\"reduce tool calls\\\\\\\",\\\\\\\"too many tool calls\\\\\\\",\\\\\\\"script vs tool call\\\\\\\",\\\\\\\"batching tool calls\\\\\\\",\\\\\\\"parallel tool calls\\\\\\\",\\\\\\\"parallelize calls\\\\\\\",\\\\\\\"independent calls\\\\\\\",\\\\\\\"redundant reads\\\\\\\",\\\\\\\"re-reading file\\\\\\\",\\\\\\\"tool selection\\\\\\\",\\\\\\\"which tool to use\\\\\\\",\\\\\\\"grep before read\\\\\\\",\\\\\\\"file search before grep\\\\\\\",\\\\\\\"bulk edit script\\\\\\\",\\\\\\\"poka-yoke tool design\\\\\\\",\\\\\\\"subagent delegation for tool efficiency\\\\\\\",\\\\\\\"context-efficient tool use\\\\\\\",\\\\\\\"cost per tool call\\\\\\\",\\\\\\\"tool call benchmark\\\\\\\",\\\\\\\"agent efficiency\\\\\\\",\\\\\\\"token efficiency per call\\\\\\\"]\",\"examples\":\"[\\\\\\\"the agent made 17 read_file calls when 3 greps would have done — what should it have done?\\\\\\\",\\\\\\\"we're renaming a variable across 40 files — script or tool calls?\\\\\\\",\\\\\\\"the agent re-reads the same file three times in one task — fix the policy\\\\\\\",\\\\\\\"should I batch these reads into one message or wait for each result?\\\\\\\",\\\\\\\"design a tool-use protocol for our new agent harness — what rules matter?\\\\\\\",\\\\\\\"the context window is filling with verbose terminal output — how do I cut it?\\\\\\\",\\\\\\\"is it worth delegating this exploratory search to a subagent?\\\\\\\",\\\\\\\"what's a reasonable tool-call budget for a single-file bug 
fix?\\\\\\\"]\",\"anti_examples\":\"[\\\\\\\"improve this prompt's wording to get better outputs\\\\\\\",\\\\\\\"design what skills get loaded for which prompts\\\\\\\",\\\\\\\"the test suite is failing after my change — find the cause\\\\\\\",\\\\\\\"extract this repeated string-concat into a helper function\\\\\\\",\\\\\\\"scaffold a new SKILL.md for our team's tool-use rules\\\\\\\",\\\\\\\"review this AI-generated PR for correctness\\\\\\\"]\",\"relations\":\"{\\\\\\\"boundary\\\\\\\":[{\\\\\\\"skill\\\\\\\":\\\\\\\"context-engineering\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"context-engineering designs the entire information stack reaching the model; tool-call-strategy owns the per-call efficiency decisions inside that stack\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"prompt-craft\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"prompt-craft writes the wording of one instruction; tool-call-strategy decides which external operations the agent should invoke around that instruction\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"debugging\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"debugging chases a specific runtime failure; tool-call-strategy is about the efficiency profile of healthy tool use\\\\\\\"},{\\\\\\\"skill\\\\\\\":\\\\\\\"refactor\\\\\\\",\\\\\\\"reason\\\\\\\":\\\\\\\"refactor owns behaviour-preserving code transformations as the deliverable; tool-call-strategy decides whether to deliver that transformation through 50 tool calls or one script\\\\\\\"}],\\\\\\\"related\\\\\\\":[\\\\\\\"context-engineering\\\\\\\",\\\\\\\"refactor\\\\\\\",\\\\\\\"prompt-craft\\\\\\\"],\\\\\\\"verify_with\\\\\\\":[\\\\\\\"code-review\\\\\\\"]}\",\"portability\":\"{\\\\\\\"readiness\\\\\\\":\\\\\\\"scripted\\\\\\\",\\\\\\\"targets\\\\\\\":[\\\\\\\"skill-md\\\\\\\"]}\",\"lifecycle\":\"{\\\\\\\"stale_after_days\\\\\\\":90,\\\\\\\"review_cadence\\\\\\\":\\\\\\\"quarterly\\\\\\\"}\",\"skill_graph_source_repo\":\"https://github.com/jacob-balslev/skill-graph\",\"skill_graph_protocol\":\"Skill Metadata 
Protocol v5\",\"skill_graph_project\":\"Skill Graph\",\"skill_graph_canonical_skill\":\"skills/tool-call-strategy/SKILL.md\"}"
|
|
9
|
+
skill_graph_source_repo: "https://github.com/jacob-balslev/skill-graph"
|
|
10
|
+
skill_graph_protocol: Skill Metadata Protocol v4
|
|
11
|
+
skill_graph_project: Skill Graph
|
|
12
|
+
skill_graph_canonical_skill: skills/tool-call-strategy/SKILL.md
|
|
13
|
+
---
|
|
14
|
+
|
|
15
|
+
# Tool Call Strategy
|
|
16
|
+
|
|
17
|
+
## Coverage
|
|
18
|
+
|
|
19
|
+
- The three costs of every tool call: token cost (schema overhead and result size), latency cost (round-trip and decision time), context pollution (results persist in attention window)
|
|
20
|
+
- The script-vs-call decision gate: deterministic bulk work belongs in a script; reasoning-dependent work belongs in individual tool calls with the agent in the loop
|
|
21
|
+
- Tool selection decision tree: file-search vs content-search vs targeted-read vs full-read, the dedicated-tool-over-shell rule, and the harness-agnostic capability map
|
|
22
|
+
- Batching independent calls in a single message vs sequential round-trips, and the dependency-detection heuristic
|
|
23
|
+
- Redundancy avoidance: the conversation-as-cache mental model, recognising re-reads, re-searches, and re-runs
|
|
24
|
+
- Context-efficient patterns: targeted line ranges, summarised verification output, dedicated-tool defaults
|
|
25
|
+
- Subagent delegation for context protection: when exploration belongs in a disposable subagent context vs the main session
|
|
26
|
+
- The poka-yoke principle: design tool usage to prevent mistakes, not just optimise speed
|
|
27
|
+
- Cost benchmark heuristics: rough call-count ranges per task type and the "stop and reconsider" red flag
|
|
28
|
+
|
|
29
|
+
## Philosophy
|
|
30
|
+
|
|
31
|
+
Every tool call has three simultaneous costs: tokens (schema overhead plus result), latency (network round-trip plus decision time), and context pollution (results persist in the context window and degrade subsequent reasoning). Agents that issue 12 calls where 3 would suffice are not merely slower; they also tend to reason less accurately, because noise accumulated in the context window dilutes the signal the model needs to attend to.
|
|
32
|
+
|
|
33
|
+
The optimal strategy is not "minimise calls." Under-calling causes hallucination and skipped verification. The objective is **information gained per unit cost**: a single well-targeted grep that returns five matching lines is worth more than reading three entire files to find the same information. A script that processes fifty files in one shell call is worth more than fifty individual edit calls. The conversation history acts as a cache; information already retrieved does not need to be retrieved again.
|
|
34
|
+
|
|
35
|
+
> A tool call is like a SQL query against a slow, expensive, noisy database — plan it before you run it, do not re-run queries whose results are already in the result set, and reach for set-based operations (scripts) when you catch yourself doing the same row-level work N times.
|
|
36
|
+
|
|
37
|
+
## The Three Costs of Every Tool Call
|
|
38
|
+
|
|
39
|
+
| Cost | Mechanism | Magnitude |
|
|
40
|
+
|---|---|---|
|
|
41
|
+
| **Token cost** | Tool schemas are sent with every request (~500 tokens per declared tool), and each result is added to the conversation. Ten available tools ≈ 5,000 tokens of overhead before any user input. | Scales with the number of available tools and result size |
|
|
42
|
+
| **Latency cost** | Network round-trip, tool execution time, model decision time per call | ~200–2000 ms per call; compounds for sequential calls |
|
|
43
|
+
| **Context pollution** | Every result stays in conversation history. Failed attempts, verbose outputs, and redundant reads all persist | Degrades reasoning quality as context fills |
|
|
44
|
+
|
|
45
|
+
**The compound effect:** five unnecessary reads do not just cost five times the tokens — they push useful context further from the attention window, degrading the quality of subsequent reasoning. Context is not just a budget; it is a signal-to-noise ratio.
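The token and latency rows of the table reduce to simple arithmetic. The sketch below uses the rough ~500-tokens-per-tool figure from the table as an assumption, not a measurement; the function names are hypothetical, and prompt caching or provider-specific billing would change the real numbers.

```python
TOKENS_PER_TOOL_SCHEMA = 500  # rough figure from the table above, not a measured value

def schema_overhead(num_tools: int, num_requests: int) -> int:
    """Tokens spent re-sending tool schemas across a conversation."""
    return TOKENS_PER_TOOL_SCHEMA * num_tools * num_requests

def sequential_latency_ms(num_calls: int, per_call_ms: int) -> int:
    """Wall-clock latency when every call waits for the previous one."""
    return num_calls * per_call_ms

print(schema_overhead(10, 1))          # 5000
print(schema_overhead(10, 12))         # 60000
print(sequential_latency_ms(12, 500))  # 6000
```

Cutting 12 sequential calls to 3 removes nine round-trips and nine repeats of the schema overhead; the context-pollution saving is harder to quantify but moves in the same direction.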
|
|
46
|
+
|
|
47
|
+
## Harness-Agnostic Tool Capability Map
|
|
48
|
+
|
|
49
|
+
Most modern coding-agent harnesses expose the same five abstract tool capabilities under different concrete names. Substitute your harness's equivalents when applying this skill.
|
|
50
|
+
|
|
51
|
+
| Abstract capability | Claude Code | Cursor / Copilot / Continue | OpenCode | Shell-only fallback |
|
|
52
|
+
|---|---|---|---|---|
|
|
53
|
+
| File-pattern search (find files by name/path glob) | `Glob` | `file_search` | `glob` | `find` |
|
|
54
|
+
| Content search (find text inside files) | `Grep` | `grep_search` | `grep` | `grep -r`, `rg` |
|
|
55
|
+
| Targeted read (read specific lines of a file) | `Read` (with `offset`/`limit`) | `read_file` (with line range) | `read` | `sed -n 'A,Bp'` |
|
|
56
|
+
| Diff-based edit (modify part of a file) | `Edit` / `MultiEdit` | `replace_string_in_file` / `apply_patch` | `edit` | `sed -i` (avoid) |
|
|
57
|
+
| Shell execution (run an arbitrary command) | `Bash` | `run_in_terminal` | `bash` | direct shell |
|
|
58
|
+
|
|
59
|
+
The principles in this skill apply uniformly across all of them. Examples below use the Cursor/Copilot names because they are the most descriptive; the same advice applies to whichever set of names your harness exposes.
|
|
60
|
+
|
|
61
|
+
## The Script-vs-Call Decision Gate
|
|
62
|
+
|
|
63
|
+
The single most impactful optimisation: use scripts for deterministic work, tool calls for reasoning-dependent work.
|
|
64
|
+
|
|
65
|
+
```
|
|
66
|
+
Is the operation deterministic (known input → known output)?
|
|
67
|
+
YES → Can it be expressed as a shell command or script?
|
|
68
|
+
YES → Write a script, run once via a shell-execution tool
|
|
69
|
+
NO → Single tool call with structured output
|
|
70
|
+
NO → Does the operation require reasoning about intermediate results?
|
|
71
|
+
YES → Individual tool calls with the agent in the loop
|
|
72
|
+
NO → Batch into a script that returns structured data the agent can reason about
|
|
73
|
+
```
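The gate above can be read as a pure function of three yes/no answers. A minimal sketch; `choose_approach` and its return labels are hypothetical names chosen for illustration.

```python
def choose_approach(deterministic: bool,
                    scriptable: bool = False,
                    needs_intermediate_reasoning: bool = False) -> str:
    """Mirror the script-vs-call decision gate as a pure function."""
    if deterministic:
        if scriptable:
            return "script"              # write once, run via shell execution
        return "single-tool-call"        # one call with structured output
    if needs_intermediate_reasoning:
        return "individual-tool-calls"   # agent stays in the loop
    return "batch-script"                # script returns data the agent reasons about

# Renaming a variable across 20 files: deterministic and scriptable
rename_choice = choose_approach(deterministic=True, scriptable=True)
# An edit that depends on understanding surrounding code
edit_choice = choose_approach(deterministic=False, needs_intermediate_reasoning=True)
```

Writing the gate down this way makes the asymmetry visible: only one of the four outcomes keeps the agent in the loop for every step.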
|
|
74
|
+
|
|
75
|
+
### When scripts beat tool calls
|
|
76
|
+
|
|
77
|
+
| Scenario | Tool-call approach | Script approach | Savings |
|
|
78
|
+
|---|---|---|---|
|
|
79
|
+
| Rename a variable across 20 files | 20 diff-based edit calls | One project-owned script via shell execution | 19 fewer calls |
|
|
80
|
+
| Check which files import a module | 10 targeted-read calls | One content-search call | 9 fewer calls |
|
|
81
|
+
| Run lint + typecheck + test | 3 sequential shell calls | `pnpm lint && pnpm typecheck && pnpm test` | 2 fewer calls |
|
|
82
|
+
| Create 5 similar test files | 5 file-creation calls | Script that generates all 5 | 4 fewer calls |
|
|
83
|
+
| Collect metrics from multiple sources | N targeted-read calls | Script that aggregates and returns JSON | N−1 fewer calls |
|
|
84
|
+
|
|
85
|
+
### When tool calls beat scripts
|
|
86
|
+
|
|
87
|
+
| Scenario | Why a script fails |
|
|
88
|
+
|---|---|
|
|
89
|
+
| Edit depends on understanding the code around it | The agent needs to read, reason, then decide what to change |
|
|
90
|
+
| Search result determines next action | The search path cannot be predicted in advance |
|
|
91
|
+
| Error in one step changes the approach for the next | Scripts cannot reason about failures mid-flight |
|
|
92
|
+
| Output needs human or agent review before proceeding | Scripts execute blindly |
|
|
93
|
+
|
|
94
|
+
## Tool Selection Decision Tree
|
|
95
|
+
|
|
96
|
+
Choose the right tool for the information need. Wrong tool choice is the largest single source of wasted calls.
|
|
97
|
+
|
|
98
|
+
```
|
|
99
|
+
Need to find files by name or path pattern?
|
|
100
|
+
→ file-pattern search (NOT shell `find`, NOT shell `ls`)
|
|
101
|
+
|
|
102
|
+
Need to find content inside files?
|
|
103
|
+
→ Need the matching lines themselves?
|
|
104
|
+
YES → content search with matching lines (returns content)
|
|
105
|
+
NO → Just need file paths? content search that lists only the matching file paths
|
|
106
|
+
|
|
107
|
+
Need to read file contents?
|
|
108
|
+
→ Know which lines you need?
|
|
109
|
+
YES → targeted read with line range
|
|
110
|
+
NO → Need the whole file?
|
|
111
|
+
YES → full read (default)
|
|
112
|
+
NO → content search for the specific function or class first, then targeted read of the section
|
|
113
|
+
|
|
114
|
+
Need to modify a file?
|
|
115
|
+
→ Targeted change to existing content?
|
|
116
|
+
YES → diff-based edit (sends only the diff; fails if the old string does not match)
|
|
117
|
+
NO → Complete rewrite or new file?
|
|
118
|
+
YES → file-creation tool
|
|
119
|
+
NO → diff-based edit
|
|
120
|
+
|
|
121
|
+
Need to run a command?
|
|
122
|
+
→ Is there a dedicated tool for this? (read tool instead of `cat`/`head`/`tail`, content search instead of `grep`)
|
|
123
|
+
YES → use the dedicated tool
|
|
124
|
+
NO → shell execution
|
|
125
|
+
```
|
|
126
|
+
|
|
127
|
+
### Critical rules
|
|
128
|
+
|
|
129
|
+
| Rule | Why |
|
|
130
|
+
|---|---|
|
|
131
|
+
| Never use shell execution to read files with `cat`, `head`, `tail` | The dedicated read tool is purpose-built, shows line numbers, handles images/PDFs |
|
|
132
|
+
| Never use shell execution for `grep` or `rg` file searches | The dedicated content search has optimised permissions and structured output |
|
|
133
|
+
| Never use shell execution for `find` to discover files | The file-pattern search (glob) is faster and returns sorted results |
|
|
134
|
+
| Never default to inline `sed` or `awk` edits in shell | Prefer diff-based edits for reviewable changes; reserve raw `sed` for cases where a script would be disproportionate |
|
|
135
|
+
| Content search before full read | Content search returns only matching lines; full read returns the entire file |
|
|
136
|
+
| File-pattern search before content search | If you know the file pattern, narrow the search space first |
|
|
137
|
+
|
|
138
|
+
## Batching and Parallelization
|
|
139
|
+
|
|
140
|
+
### Independent calls: batch in a single message
|
|
141
|
+
|
|
142
|
+
If two or more tool calls do not depend on each other's output, make them all in the same message.
|
|
143
|
+
|
|
144
|
+
**Sequential (bad):**
|
|
145
|
+
```
|
|
146
|
+
Message 1: read file A → wait for result
|
|
147
|
+
Message 2: read file B → wait for result
|
|
148
|
+
Message 3: search for pattern → wait for result
|
|
149
|
+
Total: 3 round-trips
|
|
150
|
+
```
|
|
151
|
+
|
|
152
|
+
**Parallel (good):**
|
|
153
|
+
```
|
|
154
|
+
Message 1: read file A + read file B + search for pattern
|
|
155
|
+
Total: 1 round-trip (wall-clock = max of individual calls)
|
|
156
|
+
```
|
|
157
|
+
|
|
158
|
+
### Dependency detection
|
|
159
|
+
|
|
160
|
+
| Calls are independent when | Calls are dependent when |
|
|
161
|
+
|---|---|
|
|
162
|
+
| Different files, no shared state | Second call uses first call's output |
|
|
163
|
+
| Read-only operations | First call creates or modifies what second reads |
|
|
164
|
+
| Verification checks (lint, type, test) | Error in first determines whether to run second |
|
|
165
|
+
|
|
166
|
+
### Batching heuristic
|
|
167
|
+
|
|
168
|
+
Before making a tool call, ask: *"Is there another call I need to make that does not depend on this one's result?"* If yes, batch them.
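The heuristic amounts to layering calls by dependency: everything whose inputs are already available goes out together. A minimal sketch, assuming the dependencies are known up front; `plan_batches` and the call names are hypothetical.

```python
def plan_batches(calls: dict) -> list:
    """Group calls into batches. `calls` maps a call name to the set of call
    names whose results it needs. Each batch can be issued in one message."""
    remaining = {name: set(deps) for name, deps in calls.items()}
    done = set()
    batches = []
    while remaining:
        # A call is ready once all of its dependencies have completed
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("circular or unknown dependency among calls")
        batches.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return batches

plan = plan_batches({
    "read_a": set(),
    "read_b": set(),
    "search_pattern": set(),
    "edit_a": {"read_a", "search_pattern"},
    "run_tests": {"edit_a"},
})
print(plan)
# [['read_a', 'read_b', 'search_pattern'], ['edit_a'], ['run_tests']]
```

Five calls collapse into three round-trips; the two that remain sequential are exactly the ones that consume an earlier result.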
|
|
169
|
+
|
|
170
|
+
## Avoiding Redundant Operations
|
|
171
|
+
|
|
172
|
+
### The information-cache mental model
|
|
173
|
+
|
|
174
|
+
Treat the conversation context as a cache. Information already retrieved does not need to be retrieved again.
|
|
175
|
+
|
|
176
|
+
| Redundancy type | Example | Fix |
|
|
177
|
+
|---|---|---|
|
|
178
|
+
| Re-reading a file | Read file A, make an edit, re-read file A to verify | The edit tool confirms what changed; trust it |
|
|
179
|
+
| Re-searching for the same pattern | Two identical content searches in one conversation | Reference the earlier result |
|
|
180
|
+
| Reading a file just written | Create file, then read it to confirm contents | File creation confirms success; trust it |
|
|
181
|
+
| Running the same verification twice | `pnpm typecheck` after edit, then again before commit | Once is enough if no other changes were made |
|
|
182
|
+
| Exploring broadly then narrowly | Search all files, then search the same pattern in a subdirectory | Start narrow; widen only if needed |
|
|
183
|
+
|
|
184
|
+
### The "Do I already know this?" check
|
|
185
|
+
|
|
186
|
+
Before every tool call, answer: *"Is this information already in my context from a previous call?"* If yes, reference it instead of re-fetching.
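A harness can make the same check mechanical by logging calls and flagging exact repeats. This is only an illustration of the mental model: `CallLog` is a hypothetical helper, and the substring match in `forget` is a deliberate simplification.

```python
import json

class CallLog:
    """Track calls made in a session so exact repeats can be flagged."""

    def __init__(self) -> None:
        self._seen = set()

    def is_repeat(self, tool: str, args: dict) -> bool:
        """Return True if an identical call was already made; otherwise record it."""
        key = f"{tool}:{json.dumps(args, sort_keys=True)}"
        if key in self._seen:
            return True
        self._seen.add(key)
        return False

    def forget(self, path: str) -> None:
        """Drop calls that mention `path`, e.g. after a script or another
        process changed the file and a fresh read is legitimate."""
        self._seen = {k for k in self._seen if path not in k}

log = CallLog()
first = log.is_repeat("read_file", {"path": "src/app.py"})
second = log.is_repeat("read_file", {"path": "src/app.py"})
log.forget("src/app.py")
third = log.is_repeat("read_file", {"path": "src/app.py"})
```

Sorting keys before serializing means argument order does not hide a repeat.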
|
|
187
|
+
|
|
188
|
+
## Context-Efficient Patterns
|
|
189
|
+
|
|
190
|
+
### For file reading
|
|
191
|
+
|
|
192
|
+
| Need | Efficient pattern | Wasteful pattern |
|
|
193
|
+
|---|---|---|
|
|
194
|
+
| Find a specific function | Content search for the function name, then targeted read of the 30-line section | Full read of the entire 2000-line file |
|
|
195
|
+
| Check if a pattern exists | Content search (returns match count) | Full read of the entire file, then a manual scan for the pattern |
|
|
196
|
+
| Read multiple small sections | Multiple targeted reads with explicit line ranges | One full read that includes irrelevant code |
|
|
197
|
+
| Compare two files | Targeted reads of both with relevant line ranges | Full reads of both |
|
|
198
|
+
|
|
199
|
+
### For file modification
|
|
200
|
+
|
|
201
|
+
| Need | Efficient pattern | Wasteful pattern |
|
|
202
|
+
|---|---|---|
|
|
203
|
+
| Change one line | Diff-based edit with minimal old/new string | Full file rewrite |
|
|
204
|
+
| Change N similar lines | Diff-based edits batched in one message | N separate sequential edit calls |
|
|
205
|
+
| Change across many files | Project-owned script (Node, Python) — see note below | N separate edit calls |
|
|
206
|
+
| Create a new file | File-creation tool | Diff-based edit (cannot edit what does not exist) |
|
|
207
|
+
|
|
208
|
+
> **Bulk-edit note:** for "change across many files", prefer a project-owned script (Node, Python) that produces a reviewable diff rather than inline `sed -i` or `awk` in a shell call. Inline `sed -i` bypasses agent review and is hard to audit. Reserve raw `sed` for cases where a proper script would be disproportionate overhead.
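A project-owned bulk-edit script might look like the sketch below: it renames an identifier as a whole word, returns a unified diff for review, and writes nothing unless asked. The function name, its parameters, and the demo file are hypothetical; a real script would also need to respect ignore files and language syntax.

```python
import difflib
import re
import tempfile
from pathlib import Path

def rename_identifier(root: Path, old: str, new: str,
                      glob: str = "**/*.py", apply: bool = False) -> str:
    """Rename `old` to `new` as a whole word in files matching `glob`.
    Returns a unified diff; files are written only when apply=True."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    diff_parts = []
    for path in sorted(root.glob(glob)):
        if not path.is_file():
            continue
        before = path.read_text(encoding="utf-8")
        after = pattern.sub(lambda _: new, before)  # lambda avoids escape handling in `new`
        if after == before:
            continue
        rel = path.relative_to(root)
        diff_parts.extend(difflib.unified_diff(
            before.splitlines(keepends=True), after.splitlines(keepends=True),
            fromfile=f"a/{rel}", tofile=f"b/{rel}"))
        if apply:
            path.write_text(after, encoding="utf-8")
    return "".join(diff_parts)

# Dry run against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "a.py").write_text("user_id = 1\nuser_identity = 2\n", encoding="utf-8")
    demo_diff = rename_identifier(demo_root, "user_id", "account_id")
    demo_unchanged = (demo_root / "a.py").read_text(encoding="utf-8")
print(demo_diff)
```

The dry-run default is the design choice that matters: the agent sees one diff in one call, reviews it, and only then reruns with `apply=True`.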
|
|
209
|
+
|
|
210
|
+
### For verification
|
|
211
|
+
|
|
212
|
+
| Need | Efficient pattern | Wasteful pattern |
|
|
213
|
+
|---|---|---|
|
|
214
|
+
| Check if tests pass | `pnpm test 2>&1 \| tail -20` | Full unbounded test output in context |
|
|
215
|
+
| Check if a server is running | `curl -sf URL > /dev/null && echo up \|\| echo down` | Full curl output with headers and body |
|
|
216
|
+
| Check types | `pnpm typecheck 2>&1 \| head -30` | Unbounded typecheck output |
|
|
217
|
+
| Run multiple checks | `pnpm lint && pnpm typecheck && pnpm test` (one call) | Three separate shell calls |
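The same output-bounding idea can be applied from a wrapper script, which also keeps the command's real exit code. A minimal sketch; `run_bounded` is a hypothetical helper and the noisy command is a stand-in for a test runner.

```python
import subprocess
import sys

def run_bounded(cmd: list, keep_last: int = 20, timeout_s: int = 120) -> dict:
    """Run a command and keep only the tail of its combined stdout/stderr."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True, timeout=timeout_s)
    lines = proc.stdout.splitlines()
    tail = lines[-keep_last:] if keep_last > 0 else []
    return {
        "exit_code": proc.returncode,   # preserved, unlike `cmd | tail` without pipefail
        "dropped_lines": len(lines) - len(tail),
        "tail": "\n".join(tail),
    }

# Stand-in for a verbose test run: prints 100 lines
noisy = [sys.executable, "-c", "for i in range(100): print('line', i)"]
result = run_bounded(noisy, keep_last=3)
print(result["tail"])
```

Reporting how many lines were dropped tells the agent that more detail exists without paying for it in context.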
|
|
218
|
+
|
|
219
|
+
## Subagent Delegation for Context Protection
|
|
220
|
+
|
|
221
|
+
Subagents run in separate contexts. Use them to prevent context pollution from exploratory work.
|
|
222
|
+
|
|
223
|
+
### When to use subagents
|
|
224
|
+
|
|
225
|
+
| Scenario | Why subagent |
|
|
226
|
+
|---|---|
|
|
227
|
+
| Exploring an unfamiliar part of the codebase | Exploration reads many files; main context stays clean |
|
|
228
|
+
| Running a broad search that may return many results | Results stay in subagent context; only the summary returns |
|
|
229
|
+
| Reviewing code (writer/reviewer pattern) | Reviewer has fresh context without implementation bias |
|
|
230
|
+
| Parallel independent investigations | Each runs in its own context without cross-contamination |
|
|
231
|
+
|
|
232
|
+
### When NOT to use subagents
|
|
233
|
+
|
|
234
|
+
| Scenario | Why direct |
|
|
235
|
+
|---|---|
|
|
236
|
+
| Single targeted read or content search | Subagent overhead exceeds the call itself |
|
|
237
|
+
| Work that requires multiple back-and-forth decisions | Subagent cannot ask clarifying questions mid-task |
|
|
238
|
+
| Simple file edits | Direct is faster |
|
|
239
|
+
|
|
240
|
+
### Subagent context-efficiency rule
|
|
241
|
+
|
|
242
|
+
Brief subagents with the minimum context they need. Include: what to find, where to look, what format to report back in. Do not include: full conversation history, unrelated background, or "figure out what I need."

## The Poka-Yoke Principle

Design tool usage to prevent mistakes, not just optimise speed. *Poka-yoke* (Japanese: "mistake-proofing") is the lean-manufacturing principle of designing the work so the wrong action is hard or impossible.

| Poka-yoke | Why it prevents errors |
|---|---|
| Use absolute file paths | Relative paths break when working directory changes |
| Prefer diff-based edit over full-file rewrite for existing files | Diff-based edit fails if the old string does not match; full rewrite silently overwrites |
| Run a content search before a full read | Confirms the file exists and contains the pattern before reading the full content |
| Run verification *after* edits, not before | Pre-edit verification is wasted if the edit changes the result |
| Pipe long outputs through `tail` or `head` | Prevents context overflow from verbose commands |
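
The diff-based edit row is the clearest mistake-proofing case, and a short sketch shows why. The `replace_exact` helper below is hypothetical and does not describe any particular tool's edit command. It also assumes a stricter rule than the table states, requiring the old string to match exactly once, so an ambiguous edit fails as loudly as a stale one.

```python
import tempfile
from pathlib import Path


def replace_exact(path: Path, old: str, new: str) -> None:
    """Replace old with new in path, failing loudly unless old occurs once.

    Hypothetical helper; the exactly-once rule is an assumption.
    """
    text = path.read_text(encoding="utf-8")
    count = text.count(old)
    if count != 1:
        raise ValueError(
            f"expected exactly 1 match for {old!r} in {path}, found {count}"
        )
    path.write_text(text.replace(old, new, 1), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    # resolve() yields an absolute path, per the first row of the table.
    target = Path(tmp).resolve() / "config.txt"
    target.write_text("retries = 3\n", encoding="utf-8")

    replace_exact(target, "retries = 3", "retries = 5")
    result = target.read_text(encoding="utf-8")

    # A second edit based on the stale content must be rejected, not applied.
    try:
        replace_exact(target, "retries = 3", "retries = 9")
        stale_edit_rejected = False
    except ValueError:
        stale_edit_rejected = True

print(repr(result), stale_edit_rejected)
```

A full-file rewrite in the same situation would have succeeded and silently discarded whatever changed in between.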

## Cost Benchmark Heuristics

Rough guideline ranges for different task types. These are heuristic targets, not empirically calibrated thresholds: actual counts vary by task complexity, codebase familiarity, and how much context is already in the session. Treat them as "should I stop and reconsider?" thresholds, not hard limits.

| Task type | Guideline range | Typical tools |
|---|---|---|
| Simple bug fix (1 file) | 3–5 calls | Content search, targeted read, diff-based edit, verify |
| Feature addition (2–3 files) | 5–10 calls | Read existing patterns, write new code, verify |
| Refactor (many files) | 3–8 calls | Content search to find all sites, script to batch-edit, verify |
| Investigation / exploration | 5–15 calls | Multiple content searches and targeted reads |
| Complex multi-file feature | 10–20 calls | Plan, read patterns, implement, verify |

**Red flag:** if a task is taking more than 20 tool calls, stop and ask: *"Am I using the right approach?"* Consider scripting, subagent delegation, or a different strategy. The fact that 20 calls feels like a lot is itself a useful signal; listen to it.
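
The table and the red-flag rule can be expressed as a lookup plus a check. The sketch below copies the ranges from the table above; the dictionary keys, the `check_budget` name, and the three return labels are hypothetical names chosen for illustration.

```python
# Upper and lower guideline bounds copied from the table above.
# The key names are hypothetical identifiers, not defined by the skill.
GUIDELINE_RANGES = {
    "simple_bug_fix": (3, 5),
    "feature_addition": (5, 10),
    "refactor": (3, 8),
    "investigation": (5, 15),
    "complex_feature": (10, 20),
}
RED_FLAG_CALLS = 20


def check_budget(task_type: str, calls: int) -> str:
    """Classify a running tool-call count against the guideline range."""
    _, high = GUIDELINE_RANGES[task_type]
    if calls > RED_FLAG_CALLS:
        # More than 20 calls: stop and reconsider the approach.
        return "red_flag"
    if calls > high:
        # Past the range but under the red flag: worth documenting why.
        return "over_guideline"
    return "within_guideline"


print(check_budget("simple_bug_fix", 4))
print(check_budget("refactor", 12))
print(check_budget("investigation", 21))
```

Because the ranges are heuristics, the check returns a label to prompt reflection instead of raising an error.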

## Verification

After applying this skill, verify:

- [ ] Content search ran before targeted read when looking for specific content
- [ ] Independent tool calls were batched in the same message
- [ ] Scripts replaced N+1 individual calls for deterministic bulk operations
- [ ] No re-reads or re-searches for information already in context
- [ ] Targeted reads with explicit line ranges were used for large files
- [ ] Verbose command outputs were piped through `head` or `tail`
- [ ] Total tool calls fall within the benchmark range for this task type, or there is a documented reason they exceed it
- [ ] Subagents were used for context-heavy exploration, not for trivial single calls

## Do NOT Use When

| Use instead | When |
|---|---|
| `prompt-craft` | The fix is in the wording of one instruction (clarity, format, few-shot examples), not how the surrounding tool calls are sequenced |
| `context-engineering` | The question is about the entire information stack (system prompt, memory, rules, skills) reaching the model, not per-call efficiency |
| `debugging` | A tool is returning errors at runtime; that is a bug, not an efficiency problem |
| `refactor` | The deliverable is a behaviour-preserving code transformation; the tool-call efficiency of getting there is a means, not the end |
| `skill-router` | Deciding which *skill* should activate for a query, not which *tool call* the activated skill should make next |
| `documentation` | Writing prose for a human reader explaining how the agent's tool usage works |