gsd-remix 1.0.0
- package/LICENSE +21 -0
- package/README.md +939 -0
- package/README.zh-CN.md +876 -0
- package/agents/gsd-advisor-researcher.md +127 -0
- package/agents/gsd-ai-researcher.md +133 -0
- package/agents/gsd-assumptions-analyzer.md +105 -0
- package/agents/gsd-code-fixer.md +517 -0
- package/agents/gsd-code-reviewer.md +371 -0
- package/agents/gsd-codebase-mapper.md +781 -0
- package/agents/gsd-debug-session-manager.md +314 -0
- package/agents/gsd-debugger.md +1452 -0
- package/agents/gsd-doc-classifier.md +168 -0
- package/agents/gsd-doc-synthesizer.md +204 -0
- package/agents/gsd-doc-verifier.md +217 -0
- package/agents/gsd-doc-writer.md +615 -0
- package/agents/gsd-domain-researcher.md +153 -0
- package/agents/gsd-eval-auditor.md +191 -0
- package/agents/gsd-eval-planner.md +154 -0
- package/agents/gsd-executor.md +603 -0
- package/agents/gsd-framework-selector.md +160 -0
- package/agents/gsd-integration-checker.md +470 -0
- package/agents/gsd-intel-updater.md +334 -0
- package/agents/gsd-nyquist-auditor.md +203 -0
- package/agents/gsd-pattern-mapper.md +335 -0
- package/agents/gsd-phase-researcher.md +841 -0
- package/agents/gsd-plan-checker.md +978 -0
- package/agents/gsd-planner.md +1251 -0
- package/agents/gsd-project-researcher.md +677 -0
- package/agents/gsd-research-synthesizer.md +247 -0
- package/agents/gsd-roadmapper.md +688 -0
- package/agents/gsd-security-auditor.md +155 -0
- package/agents/gsd-ui-auditor.md +495 -0
- package/agents/gsd-ui-checker.md +309 -0
- package/agents/gsd-ui-researcher.md +380 -0
- package/agents/gsd-user-profiler.md +171 -0
- package/agents/gsd-verifier.md +830 -0
- package/bin/install.js +7062 -0
- package/commands/gsd/add-backlog.md +79 -0
- package/commands/gsd/add-phase.md +43 -0
- package/commands/gsd/add-tests.md +41 -0
- package/commands/gsd/add-todo.md +47 -0
- package/commands/gsd/ai-integration-phase.md +36 -0
- package/commands/gsd/analyze-dependencies.md +34 -0
- package/commands/gsd/audit-fix.md +33 -0
- package/commands/gsd/audit-milestone.md +36 -0
- package/commands/gsd/audit-uat.md +24 -0
- package/commands/gsd/autonomous.md +46 -0
- package/commands/gsd/check-todos.md +45 -0
- package/commands/gsd/cleanup.md +23 -0
- package/commands/gsd/code-review-fix.md +52 -0
- package/commands/gsd/code-review.md +55 -0
- package/commands/gsd/complete-milestone.md +136 -0
- package/commands/gsd/debug.md +263 -0
- package/commands/gsd/discuss-phase.md +69 -0
- package/commands/gsd/do.md +30 -0
- package/commands/gsd/docs-update.md +48 -0
- package/commands/gsd/eval-review.md +32 -0
- package/commands/gsd/execute-phase.md +63 -0
- package/commands/gsd/explore.md +27 -0
- package/commands/gsd/extract_learnings.md +22 -0
- package/commands/gsd/fast.md +30 -0
- package/commands/gsd/forensics.md +56 -0
- package/commands/gsd/from-gsd2.md +47 -0
- package/commands/gsd/graphify.md +201 -0
- package/commands/gsd/health.md +22 -0
- package/commands/gsd/help.md +24 -0
- package/commands/gsd/import.md +37 -0
- package/commands/gsd/inbox.md +38 -0
- package/commands/gsd/ingest-docs.md +42 -0
- package/commands/gsd/insert-phase.md +32 -0
- package/commands/gsd/intel.md +179 -0
- package/commands/gsd/join-discord.md +19 -0
- package/commands/gsd/list-phase-assumptions.md +46 -0
- package/commands/gsd/list-workspaces.md +19 -0
- package/commands/gsd/manager.md +40 -0
- package/commands/gsd/map-codebase.md +71 -0
- package/commands/gsd/milestone-summary.md +51 -0
- package/commands/gsd/new-milestone.md +44 -0
- package/commands/gsd/new-project.md +46 -0
- package/commands/gsd/new-workspace.md +44 -0
- package/commands/gsd/next.md +28 -0
- package/commands/gsd/note.md +34 -0
- package/commands/gsd/pause-work.md +38 -0
- package/commands/gsd/plan-milestone-gaps.md +34 -0
- package/commands/gsd/plan-phase.md +52 -0
- package/commands/gsd/plan-review-convergence.md +52 -0
- package/commands/gsd/plant-seed.md +28 -0
- package/commands/gsd/pr-branch.md +25 -0
- package/commands/gsd/profile-user.md +46 -0
- package/commands/gsd/progress.md +25 -0
- package/commands/gsd/quick.md +173 -0
- package/commands/gsd/reapply-patches.md +331 -0
- package/commands/gsd/remove-phase.md +31 -0
- package/commands/gsd/remove-workspace.md +26 -0
- package/commands/gsd/research-phase.md +195 -0
- package/commands/gsd/resume-work.md +40 -0
- package/commands/gsd/review-backlog.md +62 -0
- package/commands/gsd/review.md +40 -0
- package/commands/gsd/scan.md +26 -0
- package/commands/gsd/secure-phase.md +35 -0
- package/commands/gsd/session-report.md +19 -0
- package/commands/gsd/set-profile.md +12 -0
- package/commands/gsd/settings.md +36 -0
- package/commands/gsd/ship.md +23 -0
- package/commands/gsd/sketch-wrap-up.md +31 -0
- package/commands/gsd/sketch.md +49 -0
- package/commands/gsd/spec-phase.md +62 -0
- package/commands/gsd/spike-wrap-up.md +31 -0
- package/commands/gsd/spike.md +46 -0
- package/commands/gsd/stats.md +18 -0
- package/commands/gsd/sync-skills.md +19 -0
- package/commands/gsd/thread.md +227 -0
- package/commands/gsd/ui-phase.md +34 -0
- package/commands/gsd/ui-review.md +32 -0
- package/commands/gsd/ultraplan-phase.md +33 -0
- package/commands/gsd/undo.md +34 -0
- package/commands/gsd/update.md +37 -0
- package/commands/gsd/validate-phase.md +35 -0
- package/commands/gsd/verify-work.md +38 -0
- package/commands/gsd/workstreams.md +69 -0
- package/get-shit-done/bin/gsd-tools.cjs +1263 -0
- package/get-shit-done/bin/lib/artifacts.cjs +52 -0
- package/get-shit-done/bin/lib/audit.cjs +757 -0
- package/get-shit-done/bin/lib/commands.cjs +1023 -0
- package/get-shit-done/bin/lib/config-schema.cjs +79 -0
- package/get-shit-done/bin/lib/config.cjs +463 -0
- package/get-shit-done/bin/lib/core.cjs +1794 -0
- package/get-shit-done/bin/lib/docs.cjs +267 -0
- package/get-shit-done/bin/lib/frontmatter.cjs +379 -0
- package/get-shit-done/bin/lib/graphify.cjs +494 -0
- package/get-shit-done/bin/lib/gsd2-import.cjs +511 -0
- package/get-shit-done/bin/lib/init.cjs +1878 -0
- package/get-shit-done/bin/lib/intel.cjs +639 -0
- package/get-shit-done/bin/lib/learnings.cjs +378 -0
- package/get-shit-done/bin/lib/milestone.cjs +283 -0
- package/get-shit-done/bin/lib/model-profiles.cjs +71 -0
- package/get-shit-done/bin/lib/phase.cjs +1058 -0
- package/get-shit-done/bin/lib/profile-output.cjs +1080 -0
- package/get-shit-done/bin/lib/profile-pipeline.cjs +539 -0
- package/get-shit-done/bin/lib/roadmap.cjs +523 -0
- package/get-shit-done/bin/lib/schema-detect.cjs +238 -0
- package/get-shit-done/bin/lib/security.cjs +504 -0
- package/get-shit-done/bin/lib/state.cjs +1649 -0
- package/get-shit-done/bin/lib/template.cjs +226 -0
- package/get-shit-done/bin/lib/uat.cjs +288 -0
- package/get-shit-done/bin/lib/verify.cjs +1184 -0
- package/get-shit-done/bin/lib/workstream.cjs +495 -0
- package/get-shit-done/bin/repair-sdk.cjs +177 -0
- package/get-shit-done/contexts/dev.md +21 -0
- package/get-shit-done/contexts/research.md +22 -0
- package/get-shit-done/contexts/review.md +22 -0
- package/get-shit-done/references/agent-contracts.md +79 -0
- package/get-shit-done/references/ai-evals.md +156 -0
- package/get-shit-done/references/ai-frameworks.md +186 -0
- package/get-shit-done/references/artifact-types.md +131 -0
- package/get-shit-done/references/autonomous-smart-discuss.md +277 -0
- package/get-shit-done/references/checkpoints.md +808 -0
- package/get-shit-done/references/common-bug-patterns.md +114 -0
- package/get-shit-done/references/context-budget.md +49 -0
- package/get-shit-done/references/continuation-format.md +253 -0
- package/get-shit-done/references/debugger-philosophy.md +76 -0
- package/get-shit-done/references/decimal-phase-calculation.md +64 -0
- package/get-shit-done/references/doc-conflict-engine.md +91 -0
- package/get-shit-done/references/domain-probes.md +125 -0
- package/get-shit-done/references/executor-examples.md +110 -0
- package/get-shit-done/references/few-shot-examples/plan-checker.md +73 -0
- package/get-shit-done/references/few-shot-examples/verifier.md +109 -0
- package/get-shit-done/references/gate-prompts.md +100 -0
- package/get-shit-done/references/gates.md +70 -0
- package/get-shit-done/references/git-integration.md +295 -0
- package/get-shit-done/references/git-planning-commit.md +40 -0
- package/get-shit-done/references/ios-scaffold.md +123 -0
- package/get-shit-done/references/mandatory-initial-read.md +2 -0
- package/get-shit-done/references/model-profile-resolution.md +38 -0
- package/get-shit-done/references/model-profiles.md +145 -0
- package/get-shit-done/references/phase-argument-parsing.md +61 -0
- package/get-shit-done/references/planner-antipatterns.md +89 -0
- package/get-shit-done/references/planner-gap-closure.md +62 -0
- package/get-shit-done/references/planner-reviews.md +39 -0
- package/get-shit-done/references/planner-revision.md +87 -0
- package/get-shit-done/references/planner-source-audit.md +73 -0
- package/get-shit-done/references/planning-config.md +460 -0
- package/get-shit-done/references/project-skills-discovery.md +19 -0
- package/get-shit-done/references/questioning.md +162 -0
- package/get-shit-done/references/revision-loop.md +97 -0
- package/get-shit-done/references/sketch-interactivity.md +41 -0
- package/get-shit-done/references/sketch-theme-system.md +94 -0
- package/get-shit-done/references/sketch-tooling.md +45 -0
- package/get-shit-done/references/sketch-variant-patterns.md +81 -0
- package/get-shit-done/references/tdd.md +330 -0
- package/get-shit-done/references/thinking-models-debug.md +44 -0
- package/get-shit-done/references/thinking-models-execution.md +50 -0
- package/get-shit-done/references/thinking-models-planning.md +62 -0
- package/get-shit-done/references/thinking-models-research.md +50 -0
- package/get-shit-done/references/thinking-models-verification.md +55 -0
- package/get-shit-done/references/thinking-partner.md +96 -0
- package/get-shit-done/references/ui-brand.md +160 -0
- package/get-shit-done/references/universal-anti-patterns.md +63 -0
- package/get-shit-done/references/user-profiling.md +681 -0
- package/get-shit-done/references/verification-overrides.md +227 -0
- package/get-shit-done/references/verification-patterns.md +612 -0
- package/get-shit-done/references/workstream-flag.md +111 -0
- package/get-shit-done/templates/AI-SPEC.md +246 -0
- package/get-shit-done/templates/DEBUG.md +169 -0
- package/get-shit-done/templates/README.md +76 -0
- package/get-shit-done/templates/SECURITY.md +61 -0
- package/get-shit-done/templates/UAT.md +265 -0
- package/get-shit-done/templates/UI-SPEC.md +100 -0
- package/get-shit-done/templates/VALIDATION.md +76 -0
- package/get-shit-done/templates/claude-md.md +145 -0
- package/get-shit-done/templates/codebase/architecture.md +255 -0
- package/get-shit-done/templates/codebase/concerns.md +310 -0
- package/get-shit-done/templates/codebase/conventions.md +307 -0
- package/get-shit-done/templates/codebase/integrations.md +280 -0
- package/get-shit-done/templates/codebase/stack.md +186 -0
- package/get-shit-done/templates/codebase/structure.md +285 -0
- package/get-shit-done/templates/codebase/testing.md +480 -0
- package/get-shit-done/templates/config.json +56 -0
- package/get-shit-done/templates/context.md +352 -0
- package/get-shit-done/templates/continue-here.md +78 -0
- package/get-shit-done/templates/copilot-instructions.md +7 -0
- package/get-shit-done/templates/debug-subagent-prompt.md +91 -0
- package/get-shit-done/templates/dev-preferences.md +21 -0
- package/get-shit-done/templates/discovery.md +146 -0
- package/get-shit-done/templates/discussion-log.md +63 -0
- package/get-shit-done/templates/milestone-archive.md +123 -0
- package/get-shit-done/templates/milestone.md +115 -0
- package/get-shit-done/templates/phase-prompt.md +610 -0
- package/get-shit-done/templates/planner-subagent-prompt.md +117 -0
- package/get-shit-done/templates/project.md +186 -0
- package/get-shit-done/templates/requirements.md +231 -0
- package/get-shit-done/templates/research-project/ARCHITECTURE.md +204 -0
- package/get-shit-done/templates/research-project/FEATURES.md +147 -0
- package/get-shit-done/templates/research-project/PITFALLS.md +200 -0
- package/get-shit-done/templates/research-project/STACK.md +120 -0
- package/get-shit-done/templates/research-project/SUMMARY.md +170 -0
- package/get-shit-done/templates/research.md +592 -0
- package/get-shit-done/templates/retrospective.md +54 -0
- package/get-shit-done/templates/roadmap.md +202 -0
- package/get-shit-done/templates/spec.md +307 -0
- package/get-shit-done/templates/state.md +184 -0
- package/get-shit-done/templates/summary-complex.md +59 -0
- package/get-shit-done/templates/summary-minimal.md +41 -0
- package/get-shit-done/templates/summary-standard.md +48 -0
- package/get-shit-done/templates/summary.md +248 -0
- package/get-shit-done/templates/user-profile.md +146 -0
- package/get-shit-done/templates/user-setup.md +311 -0
- package/get-shit-done/templates/verification-report.md +322 -0
- package/get-shit-done/workflows/add-phase.md +112 -0
- package/get-shit-done/workflows/add-tests.md +354 -0
- package/get-shit-done/workflows/add-todo.md +160 -0
- package/get-shit-done/workflows/ai-integration-phase.md +284 -0
- package/get-shit-done/workflows/analyze-dependencies.md +96 -0
- package/get-shit-done/workflows/audit-fix.md +175 -0
- package/get-shit-done/workflows/audit-milestone.md +340 -0
- package/get-shit-done/workflows/audit-uat.md +109 -0
- package/get-shit-done/workflows/autonomous.md +789 -0
- package/get-shit-done/workflows/check-todos.md +179 -0
- package/get-shit-done/workflows/cleanup.md +154 -0
- package/get-shit-done/workflows/code-review-fix.md +497 -0
- package/get-shit-done/workflows/code-review.md +515 -0
- package/get-shit-done/workflows/complete-milestone.md +847 -0
- package/get-shit-done/workflows/diagnose-issues.md +238 -0
- package/get-shit-done/workflows/discovery-phase.md +291 -0
- package/get-shit-done/workflows/discuss-phase-assumptions.md +670 -0
- package/get-shit-done/workflows/discuss-phase-power.md +308 -0
- package/get-shit-done/workflows/discuss-phase.md +1378 -0
- package/get-shit-done/workflows/do.md +110 -0
- package/get-shit-done/workflows/docs-update.md +1155 -0
- package/get-shit-done/workflows/eval-review.md +155 -0
- package/get-shit-done/workflows/execute-phase.md +1677 -0
- package/get-shit-done/workflows/execute-plan.md +533 -0
- package/get-shit-done/workflows/explore.md +141 -0
- package/get-shit-done/workflows/extract_learnings.md +242 -0
- package/get-shit-done/workflows/fast.md +105 -0
- package/get-shit-done/workflows/forensics.md +265 -0
- package/get-shit-done/workflows/graduation.md +195 -0
- package/get-shit-done/workflows/health.md +314 -0
- package/get-shit-done/workflows/help.md +667 -0
- package/get-shit-done/workflows/import.md +246 -0
- package/get-shit-done/workflows/inbox.md +387 -0
- package/get-shit-done/workflows/ingest-docs.md +328 -0
- package/get-shit-done/workflows/insert-phase.md +130 -0
- package/get-shit-done/workflows/list-phase-assumptions.md +178 -0
- package/get-shit-done/workflows/list-workspaces.md +56 -0
- package/get-shit-done/workflows/manager.md +365 -0
- package/get-shit-done/workflows/map-codebase.md +393 -0
- package/get-shit-done/workflows/milestone-summary.md +223 -0
- package/get-shit-done/workflows/new-milestone.md +611 -0
- package/get-shit-done/workflows/new-project.md +1391 -0
- package/get-shit-done/workflows/new-workspace.md +239 -0
- package/get-shit-done/workflows/next.md +220 -0
- package/get-shit-done/workflows/node-repair.md +92 -0
- package/get-shit-done/workflows/note.md +158 -0
- package/get-shit-done/workflows/pause-work.md +243 -0
- package/get-shit-done/workflows/plan-milestone-gaps.md +273 -0
- package/get-shit-done/workflows/plan-phase.md +1349 -0
- package/get-shit-done/workflows/plan-review-convergence.md +254 -0
- package/get-shit-done/workflows/plant-seed.md +172 -0
- package/get-shit-done/workflows/pr-branch.md +157 -0
- package/get-shit-done/workflows/profile-user.md +452 -0
- package/get-shit-done/workflows/progress.md +619 -0
- package/get-shit-done/workflows/quick.md +970 -0
- package/get-shit-done/workflows/remove-phase.md +155 -0
- package/get-shit-done/workflows/remove-workspace.md +92 -0
- package/get-shit-done/workflows/research-phase.md +89 -0
- package/get-shit-done/workflows/resume-project.md +326 -0
- package/get-shit-done/workflows/review.md +344 -0
- package/get-shit-done/workflows/scan.md +102 -0
- package/get-shit-done/workflows/secure-phase.md +166 -0
- package/get-shit-done/workflows/session-report.md +146 -0
- package/get-shit-done/workflows/settings.md +319 -0
- package/get-shit-done/workflows/ship.md +302 -0
- package/get-shit-done/workflows/sketch-wrap-up.md +283 -0
- package/get-shit-done/workflows/sketch.md +286 -0
- package/get-shit-done/workflows/spec-phase.md +262 -0
- package/get-shit-done/workflows/spike-wrap-up.md +281 -0
- package/get-shit-done/workflows/spike.md +362 -0
- package/get-shit-done/workflows/stats.md +60 -0
- package/get-shit-done/workflows/sync-skills.md +182 -0
- package/get-shit-done/workflows/transition.md +693 -0
- package/get-shit-done/workflows/ui-phase.md +323 -0
- package/get-shit-done/workflows/ui-review.md +190 -0
- package/get-shit-done/workflows/ultraplan-phase.md +189 -0
- package/get-shit-done/workflows/undo.md +314 -0
- package/get-shit-done/workflows/update.md +587 -0
- package/get-shit-done/workflows/validate-phase.md +176 -0
- package/get-shit-done/workflows/verify-phase.md +465 -0
- package/get-shit-done/workflows/verify-work.md +740 -0
- package/hooks/dist/gsd-check-update-worker.js +108 -0
- package/hooks/dist/gsd-check-update.js +64 -0
- package/hooks/dist/gsd-context-monitor.js +192 -0
- package/hooks/dist/gsd-phase-boundary.sh +28 -0
- package/hooks/dist/gsd-prompt-guard.js +97 -0
- package/hooks/dist/gsd-read-guard.js +82 -0
- package/hooks/dist/gsd-read-injection-scanner.js +152 -0
- package/hooks/dist/gsd-session-state.sh +34 -0
- package/hooks/dist/gsd-statusline.js +293 -0
- package/hooks/dist/gsd-validate-commit.sh +48 -0
- package/hooks/dist/gsd-workflow-guard.js +94 -0
- package/hooks/gsd-check-update-worker.js +108 -0
- package/hooks/gsd-check-update.js +64 -0
- package/hooks/gsd-context-monitor.js +192 -0
- package/hooks/gsd-phase-boundary.sh +28 -0
- package/hooks/gsd-prompt-guard.js +97 -0
- package/hooks/gsd-read-guard.js +82 -0
- package/hooks/gsd-read-injection-scanner.js +152 -0
- package/hooks/gsd-session-state.sh +34 -0
- package/hooks/gsd-statusline.js +293 -0
- package/hooks/gsd-validate-commit.sh +48 -0
- package/hooks/gsd-workflow-guard.js +94 -0
- package/package.json +59 -0
- package/scripts/base64-scan.sh +262 -0
- package/scripts/build-hooks.js +95 -0
- package/scripts/gen-inventory-manifest.cjs +109 -0
- package/scripts/prompt-injection-scan.sh +201 -0
- package/scripts/run-tests.cjs +33 -0
- package/scripts/secret-scan.sh +227 -0
- package/sdk/package-lock.json +1998 -0
- package/sdk/package.json +52 -0
- package/sdk/prompts/agents/gsd-executor.md +110 -0
- package/sdk/prompts/agents/gsd-phase-researcher.md +158 -0
- package/sdk/prompts/agents/gsd-plan-checker.md +160 -0
- package/sdk/prompts/agents/gsd-planner.md +214 -0
- package/sdk/prompts/agents/gsd-project-researcher.md +323 -0
- package/sdk/prompts/agents/gsd-research-synthesizer.md +237 -0
- package/sdk/prompts/agents/gsd-roadmapper.md +670 -0
- package/sdk/prompts/agents/gsd-verifier.md +159 -0
- package/sdk/prompts/templates/project.md +186 -0
- package/sdk/prompts/templates/requirements.md +231 -0
- package/sdk/prompts/templates/research-project/ARCHITECTURE.md +204 -0
- package/sdk/prompts/templates/research-project/FEATURES.md +147 -0
- package/sdk/prompts/templates/research-project/PITFALLS.md +200 -0
- package/sdk/prompts/templates/research-project/STACK.md +120 -0
- package/sdk/prompts/templates/research-project/SUMMARY.md +170 -0
- package/sdk/prompts/templates/roadmap.md +202 -0
- package/sdk/prompts/templates/state.md +175 -0
- package/sdk/prompts/workflows/discuss-phase.md +126 -0
- package/sdk/prompts/workflows/execute-plan.md +106 -0
- package/sdk/prompts/workflows/plan-phase.md +84 -0
- package/sdk/prompts/workflows/research-phase.md +45 -0
- package/sdk/prompts/workflows/verify-phase.md +142 -0
- package/sdk/src/assembled-prompts.test.ts +349 -0
- package/sdk/src/cli-transport.test.ts +388 -0
- package/sdk/src/cli-transport.ts +130 -0
- package/sdk/src/cli.test.ts +383 -0
- package/sdk/src/cli.ts +670 -0
- package/sdk/src/config.test.ts +168 -0
- package/sdk/src/config.ts +177 -0
- package/sdk/src/context-engine.test.ts +295 -0
- package/sdk/src/context-engine.ts +170 -0
- package/sdk/src/context-truncation.test.ts +163 -0
- package/sdk/src/context-truncation.ts +233 -0
- package/sdk/src/e2e.integration.test.ts +178 -0
- package/sdk/src/errors.ts +72 -0
- package/sdk/src/event-stream.test.ts +661 -0
- package/sdk/src/event-stream.ts +441 -0
- package/sdk/src/failure-memory.test.ts +457 -0
- package/sdk/src/failure-memory.ts +1324 -0
- package/sdk/src/golden/capture.ts +95 -0
- package/sdk/src/golden/fixtures/generate-slug.golden.json +1 -0
- package/sdk/src/golden/fixtures/profile-sample-sessions/demo-project/sample.jsonl +3 -0
- package/sdk/src/golden/fixtures/summary-extract-sample.md +26 -0
- package/sdk/src/golden/fixtures/uat-render-checkpoint-sample.md +15 -0
- package/sdk/src/golden/golden-integration-covered.ts +30 -0
- package/sdk/src/golden/golden-mutation-covered.ts +7 -0
- package/sdk/src/golden/golden-policy.test.ts +8 -0
- package/sdk/src/golden/golden-policy.ts +112 -0
- package/sdk/src/golden/golden.integration.test.ts +373 -0
- package/sdk/src/golden/init-golden-normalize.ts +15 -0
- package/sdk/src/golden/read-only-golden-rows.ts +77 -0
- package/sdk/src/golden/read-only-parity.integration.test.ts +125 -0
- package/sdk/src/golden/registry-canonical-commands.ts +31 -0
- package/sdk/src/gsd-tools.test.ts +409 -0
- package/sdk/src/gsd-tools.ts +595 -0
- package/sdk/src/headless-prompts.test.ts +159 -0
- package/sdk/src/index.ts +333 -0
- package/sdk/src/init-e2e.integration.test.ts +136 -0
- package/sdk/src/init-runner.test.ts +783 -0
- package/sdk/src/init-runner.ts +735 -0
- package/sdk/src/lifecycle-e2e.integration.test.ts +258 -0
- package/sdk/src/logger.test.ts +149 -0
- package/sdk/src/logger.ts +113 -0
- package/sdk/src/milestone-runner.test.ts +421 -0
- package/sdk/src/phase-prompt.test.ts +538 -0
- package/sdk/src/phase-prompt.ts +264 -0
- package/sdk/src/phase-runner-types.test.ts +421 -0
- package/sdk/src/phase-runner.integration.test.ts +377 -0
- package/sdk/src/phase-runner.test.ts +2333 -0
- package/sdk/src/phase-runner.ts +1203 -0
- package/sdk/src/plan-parser.test.ts +528 -0
- package/sdk/src/plan-parser.ts +427 -0
- package/sdk/src/prompt-builder.test.ts +306 -0
- package/sdk/src/prompt-builder.ts +193 -0
- package/sdk/src/prompt-sanitizer.test.ts +260 -0
- package/sdk/src/prompt-sanitizer.ts +71 -0
- package/sdk/src/query/QUERY-HANDLERS.md +317 -0
- package/sdk/src/query/audit-open.ts +722 -0
- package/sdk/src/query/check-auto-mode.test.ts +77 -0
- package/sdk/src/query/check-auto-mode.ts +50 -0
- package/sdk/src/query/check-completion.test.ts +113 -0
- package/sdk/src/query/check-completion.ts +182 -0
- package/sdk/src/query/check-gates.test.ts +103 -0
- package/sdk/src/query/check-gates.ts +112 -0
- package/sdk/src/query/check-ship-ready.test.ts +77 -0
- package/sdk/src/query/check-ship-ready.ts +103 -0
- package/sdk/src/query/check-verification-status.test.ts +143 -0
- package/sdk/src/query/check-verification-status.ts +160 -0
- package/sdk/src/query/commit.test.ts +202 -0
- package/sdk/src/query/commit.ts +301 -0
- package/sdk/src/query/config-gates.test.ts +89 -0
- package/sdk/src/query/config-gates.ts +69 -0
- package/sdk/src/query/config-mutation.test.ts +365 -0
- package/sdk/src/query/config-mutation.ts +497 -0
- package/sdk/src/query/config-query.test.ts +161 -0
- package/sdk/src/query/config-query.ts +190 -0
- package/sdk/src/query/context-history.test.ts +165 -0
- package/sdk/src/query/context-history.ts +467 -0
- package/sdk/src/query/decomposed-handlers.test.ts +365 -0
- package/sdk/src/query/detect-custom-files.ts +97 -0
- package/sdk/src/query/detect-phase-type.test.ts +105 -0
- package/sdk/src/query/detect-phase-type.ts +141 -0
- package/sdk/src/query/docs-init.ts +257 -0
- package/sdk/src/query/failure-capture.ts +58 -0
- package/sdk/src/query/frontmatter-array.test.ts +14 -0
- package/sdk/src/query/frontmatter-mutation.test.ts +259 -0
- package/sdk/src/query/frontmatter-mutation.ts +343 -0
- package/sdk/src/query/frontmatter.test.ts +281 -0
- package/sdk/src/query/frontmatter.ts +397 -0
- package/sdk/src/query/helpers.test.ts +426 -0
- package/sdk/src/query/helpers.ts +482 -0
- package/sdk/src/query/index.ts +586 -0
- package/sdk/src/query/init-complex.test.ts +232 -0
- package/sdk/src/query/init-complex.ts +578 -0
- package/sdk/src/query/init.test.ts +522 -0
- package/sdk/src/query/init.ts +1046 -0
- package/sdk/src/query/intel.test.ts +90 -0
- package/sdk/src/query/intel.ts +404 -0
- package/sdk/src/query/normalize-query-command.test.ts +50 -0
- package/sdk/src/query/normalize-query-command.ts +56 -0
- package/sdk/src/query/phase-lifecycle.test.ts +1126 -0
- package/sdk/src/query/phase-lifecycle.ts +1799 -0
- package/sdk/src/query/phase-list-queries.test.ts +88 -0
- package/sdk/src/query/phase-list-queries.ts +152 -0
- package/sdk/src/query/phase-ready.test.ts +65 -0
- package/sdk/src/query/phase-ready.ts +158 -0
- package/sdk/src/query/phase.test.ts +307 -0
- package/sdk/src/query/phase.ts +340 -0
- package/sdk/src/query/pipeline.test.ts +169 -0
- package/sdk/src/query/pipeline.ts +243 -0
- package/sdk/src/query/plan-execution-route.test.ts +166 -0
- package/sdk/src/query/plan-execution-route.ts +209 -0
- package/sdk/src/query/plan-task-structure.test.ts +65 -0
- package/sdk/src/query/plan-task-structure.ts +63 -0
- package/sdk/src/query/profile-extract-messages.ts +247 -0
- package/sdk/src/query/profile-output.ts +908 -0
- package/sdk/src/query/profile-questionnaire-data.ts +181 -0
- package/sdk/src/query/profile-sample.ts +184 -0
- package/sdk/src/query/profile-scan-sessions.ts +174 -0
- package/sdk/src/query/profile.test.ts +74 -0
- package/sdk/src/query/profile.ts +337 -0
- package/sdk/src/query/progress.test.ts +156 -0
- package/sdk/src/query/progress.ts +566 -0
- package/sdk/src/query/registry.test.ts +216 -0
- package/sdk/src/query/registry.ts +174 -0
- package/sdk/src/query/requirements-extract-from-plans.test.ts +58 -0
- package/sdk/src/query/requirements-extract-from-plans.ts +86 -0
- package/sdk/src/query/roadmap-update-plan-progress.ts +132 -0
- package/sdk/src/query/roadmap.test.ts +359 -0
- package/sdk/src/query/roadmap.ts +591 -0
- package/sdk/src/query/route-next-action.test.ts +61 -0
- package/sdk/src/query/route-next-action.ts +345 -0
- package/sdk/src/query/runtime-health.ts +7 -0
- package/sdk/src/query/schema-detect.ts +189 -0
- package/sdk/src/query/skill-manifest.ts +214 -0
- package/sdk/src/query/skills.test.ts +80 -0
- package/sdk/src/query/skills.ts +62 -0
- package/sdk/src/query/state-mutation.test.ts +450 -0
- package/sdk/src/query/state-mutation.ts +1444 -0
- package/sdk/src/query/state-project-load.ts +109 -0
- package/sdk/src/query/state.test.ts +347 -0
- package/sdk/src/query/state.ts +397 -0
- package/sdk/src/query/summary.test.ts +95 -0
- package/sdk/src/query/summary.ts +296 -0
- package/sdk/src/query/template.test.ts +180 -0
- package/sdk/src/query/template.ts +242 -0
- package/sdk/src/query/uat.test.ts +77 -0
- package/sdk/src/query/uat.ts +314 -0
- package/sdk/src/query/utils.test.ts +82 -0
- package/sdk/src/query/utils.ts +92 -0
- package/sdk/src/query/validate.test.ts +656 -0
- package/sdk/src/query/validate.ts +807 -0
- package/sdk/src/query/verify.test.ts +414 -0
- package/sdk/src/query/verify.ts +645 -0
- package/sdk/src/query/websearch.test.ts +31 -0
- package/sdk/src/query/websearch.ts +82 -0
- package/sdk/src/query/workspace.test.ts +119 -0
- package/sdk/src/query/workspace.ts +131 -0
- package/sdk/src/query/workstream.test.ts +51 -0
- package/sdk/src/query/workstream.ts +434 -0
- package/sdk/src/research-gate.test.ts +190 -0
- package/sdk/src/research-gate.ts +94 -0
- package/sdk/src/runtime-health.test.ts +176 -0
- package/sdk/src/runtime-health.ts +387 -0
- package/sdk/src/session-runner.test.ts +98 -0
- package/sdk/src/session-runner.ts +299 -0
- package/sdk/src/tool-scoping.test.ts +160 -0
- package/sdk/src/tool-scoping.ts +61 -0
- package/sdk/src/types.ts +917 -0
- package/sdk/src/workstream-utils.ts +33 -0
- package/sdk/src/ws-flag.test.ts +285 -0
- package/sdk/src/ws-transport.test.ts +161 -0
- package/sdk/src/ws-transport.ts +93 -0
- package/sdk/tsconfig.json +20 -0
@@ -0,0 +1,22 @@
+# Research Context Profile
+
+Agent output guidance for research mode. Loaded when `context: research` is set in config.json.
+
+## Output Style
+
+- Verbose, exploratory responses that surface trade-offs and alternatives
+- Present multiple approaches with pros and cons before recommending one
+- Include links, references, and citations where available
+- Use structured headings and bullet lists for scan-ability
+
+## Focus Areas
+
+- Breadth of options — enumerate before narrowing
+- Prior art and ecosystem conventions
+- Risks, edge cases, and failure modes
+- Dependencies and compatibility implications
+- Long-term maintainability of each approach
+
+## Verbosity
+
+High. Explain reasoning, show evidence, and document assumptions. Include background context even if the developer likely knows it — research artifacts are read by future contributors who may not.
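The profile above says it is loaded when `context: research` is set in config.json. As a rough illustration of that lookup, the sketch below maps a `context` value to a profile file path. This is not the package's actual loader: the `GsdConfig` interface, the `resolveContextProfile` helper, and the `<name>.md` naming rule are assumptions made for illustration. The only details taken from this diff are the `context` key and the three files listed under `get-shit-done/contexts/`.

```typescript
// Hypothetical config shape: only the `context` key comes from the profile text.
interface GsdConfig {
  context?: string;
}

// Profile files listed under get-shit-done/contexts/ in this package version.
const KNOWN_CONTEXTS: string[] = ["dev", "research", "review"];

// Hypothetical helper (not the package's API): returns the profile path for a
// config, or null when `context` is unset or names an unknown profile.
function resolveContextProfile(config: GsdConfig, contextsDir: string): string | null {
  const name = config.context;
  if (name === undefined) {
    return null;
  }
  if (KNOWN_CONTEXTS.indexOf(name) === -1) {
    return null;
  }
  // Assumed naming rule: one Markdown file per context name.
  return `${contextsDir}/${name}.md`;
}
```

Returning `null` for an unknown value rather than throwing is a design choice of this sketch; how the real tooling handles an unrecognised `context` is not shown in the diff.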
@@ -0,0 +1,22 @@
+# Review Context Profile
+
+Agent output guidance for review mode. Loaded when `context: review` is set in config.json.
+
+## Output Style
+
+- Critical, detail-focused responses that prioritize correctness
+- Organize findings by severity: blocking, important, nit
+- Reference specific lines and files for every finding
+- State what is correct as well as what needs change — confirm the good parts
+
+## Focus Areas
+
+- Correctness — logic errors, off-by-ones, missing edge cases
+- Security — input validation, injection vectors, secret exposure
+- Performance — unnecessary allocations, O(n^2) patterns, missing caching
+- Style and consistency — naming, formatting, import order
+- Test coverage — untested branches, missing assertions, flaky patterns
+
+## Verbosity
+
+Medium. Be thorough on findings but terse in explanation. Each issue should be one to three sentences: what is wrong, why it matters, and how to fix it.
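The review profile above asks for findings grouped by severity (blocking, important, nit), each tied to a file and line and stating what is wrong, why it matters, and how to fix it. One way to picture that shape is the sketch below. The `Finding` interface, its field names, and both helpers are hypothetical; the diff does not show the package defining any such structure.

```typescript
// Severity levels named in the review profile, most urgent first.
type Severity = "blocking" | "important" | "nit";

// Hypothetical record for one finding; field names are illustrative only.
interface Finding {
  severity: Severity;
  file: string;
  line: number;
  what: string; // what is wrong
  why: string;  // why it matters
  fix: string;  // how to fix it
}

const SEVERITY_ORDER: Severity[] = ["blocking", "important", "nit"];

// Returns a new array ordered by severity, keeping input order within a level.
function sortBySeverity(findings: Finding[]): Finding[] {
  return findings
    .map((finding, index) => ({ finding, index }))
    .sort(
      (a, b) =>
        SEVERITY_ORDER.indexOf(a.finding.severity) -
          SEVERITY_ORDER.indexOf(b.finding.severity) || a.index - b.index
    )
    .map((entry) => entry.finding);
}

// Renders one finding as "file:line [severity] what why fix".
function formatFinding(f: Finding): string {
  return `${f.file}:${f.line} [${f.severity}] ${f.what} ${f.why} ${f.fix}`;
}
```

Carrying the original index through the sort keeps findings of equal severity in the order they were reported, without relying on the engine's sort being stable.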
@@ -0,0 +1,79 @@
+# Agent Contracts
+
+Completion markers and handoff schemas for all GSD agents. Workflows use these markers to detect agent completion and route accordingly.
+
+This doc describes what IS, not what should be. Casing inconsistencies are documented as they appear in agent source files.
+
+---
+
+## Agent Registry
+
+| Agent | Role | Completion Markers |
+|-------|------|--------------------|
+| gsd-planner | Plan creation | `## PLANNING COMPLETE` |
+| gsd-executor | Plan execution | `## PLAN COMPLETE`, `## CHECKPOINT REACHED` |
+| gsd-phase-researcher | Phase-scoped research | `## RESEARCH COMPLETE`, `## RESEARCH BLOCKED` |
+| gsd-project-researcher | Project-wide research | `## RESEARCH COMPLETE`, `## RESEARCH BLOCKED` |
+| gsd-plan-checker | Plan validation | `## VERIFICATION PASSED`, `## ISSUES FOUND` |
+| gsd-research-synthesizer | Multi-research synthesis | `## SYNTHESIS COMPLETE`, `## SYNTHESIS BLOCKED` |
+| gsd-debugger | Debug investigation | `## DEBUG COMPLETE`, `## ROOT CAUSE FOUND`, `## CHECKPOINT REACHED` |
+| gsd-roadmapper | Roadmap creation/revision | `## ROADMAP CREATED`, `## ROADMAP REVISED`, `## ROADMAP BLOCKED` |
+| gsd-ui-auditor | UI review | `## UI REVIEW COMPLETE` |
+| gsd-ui-checker | UI validation | `## ISSUES FOUND` |
+| gsd-ui-researcher | UI spec creation | `## UI-SPEC COMPLETE`, `## UI-SPEC BLOCKED` |
+| gsd-verifier | Post-execution verification | `## Verification Complete` (title case) |
+| gsd-integration-checker | Cross-phase integration check | `## Integration Check Complete` (title case) |
+| gsd-nyquist-auditor | Sampling audit | `## PARTIAL`, `## ESCALATE` (non-standard) |
+| gsd-security-auditor | Security audit | `## OPEN_THREATS`, `## ESCALATE` (non-standard) |
+| gsd-codebase-mapper | Codebase analysis | No marker (writes docs directly) |
+| gsd-assumptions-analyzer | Assumption extraction | No marker (returns `## Assumptions` sections) |
+| gsd-doc-verifier | Doc validation | No marker (writes JSON to `.planning/tmp/`) |
+| gsd-doc-writer | Doc generation | No marker (writes docs directly) |
+| gsd-advisor-researcher | Advisory research | No marker (utility agent) |
+| gsd-user-profiler | User profiling | No marker (returns JSON in analysis tags) |
+| gsd-intel-updater | Codebase intelligence analysis | `## INTEL UPDATE COMPLETE`, `## INTEL UPDATE FAILED` |
|
|
35
|
+
|
|
36
|
+
## Marker Rules
|
|
37
|
+
|
|
38
|
+
1. **ALL-CAPS markers** (e.g., `## PLANNING COMPLETE`) are the standard convention
|
|
39
|
+
2. **Title-case markers** (e.g., `## Verification Complete`) exist in gsd-verifier and gsd-integration-checker; they are intentional, not bugs
|
|
40
|
+
3. **Non-standard markers** (e.g., `## PARTIAL`, `## ESCALATE`) in audit agents indicate partial results requiring orchestrator judgment
|
|
41
|
+
4. **Agents without markers** either write artifacts directly to disk or return structured data (JSON/sections) that the caller parses
|
|
42
|
+
5. Markers must appear as H2 headings (`## `) at the start of a line in the agent's final output
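
The marker rules above can be sketched as a small detector. This is an illustration only, not the matching code the workflows actually use: it assumes a marker occupies the whole heading line, matches casing exactly, and checks only a subset of the registry. `find_markers` and `KNOWN_MARKERS` are hypothetical names.

```python
import re

# Subset of markers copied from the Agent Registry table; casing is matched exactly.
KNOWN_MARKERS = [
    "PLANNING COMPLETE",
    "PLAN COMPLETE",
    "CHECKPOINT REACHED",
    "RESEARCH COMPLETE",
    "RESEARCH BLOCKED",
    "VERIFICATION PASSED",
    "ISSUES FOUND",
    "Verification Complete",
]


def find_markers(output: str) -> list[str]:
    """Return known markers that appear as H2 headings at the start of a line."""
    found = []
    for marker in KNOWN_MARKERS:
        # Assumption: the marker is the entire heading line (trailing spaces allowed).
        pattern = rf"^## {re.escape(marker)}\s*$"
        if re.search(pattern, output, flags=re.MULTILINE):
            found.append(marker)
    return found
```

Anchoring on both ends keeps `## PLAN COMPLETE` from being confused with `## PLANNING COMPLETE`, and the case-sensitive match preserves the title-case exceptions noted in rule 2.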
|
|
43
|
+
|
|
44
|
+
## Key Handoff Contracts
|
|
45
|
+
|
|
46
|
+
### Planner -> Executor (via PLAN.md)
|
|
47
|
+
|
|
48
|
+
| Field | Required | Description |
|
|
49
|
+
|-------|----------|-------------|
|
|
50
|
+
| Frontmatter | Yes | phase, plan, type, wave, depends_on, files_modified, autonomous, requirements |
|
|
51
|
+
| `<objective>` | Yes | What the plan achieves |
|
|
52
|
+
| `<tasks>` | Yes | Ordered task list with type, files, action, verify, acceptance_criteria |
|
|
53
|
+
| `<verification>` | Yes | Overall verification steps |
|
|
54
|
+
| `<success_criteria>` | Yes | Measurable completion criteria |
|
|
55
|
+
|
|
56
|
+
### Executor -> Verifier (via SUMMARY.md)
|
|
57
|
+
|
|
58
|
+
| Field | Required | Description |
|
|
59
|
+
|-------|----------|-------------|
|
|
60
|
+
| Frontmatter | Yes | phase, plan, subsystem, tags, key-files, metrics |
|
|
61
|
+
| Commits table | Yes | Per-task commit hashes and descriptions |
|
|
62
|
+
| Deviations section | Yes | Auto-fixed issues or "None" |
|
|
63
|
+
| Self-Check | Yes | PASSED or FAILED with details |
|
|
64
|
+
|
|
65
|
+
## Workflow Regex Patterns
|
|
66
|
+
|
|
67
|
+
Workflows match these markers to detect agent completion:
|
|
68
|
+
|
|
69
|
+
**plan-phase.md matches:**
|
|
70
|
+
- `## RESEARCH COMPLETE` / `## RESEARCH BLOCKED` (researcher output)
|
|
71
|
+
- `## PLANNING COMPLETE` (planner output)
|
|
72
|
+
- `## CHECKPOINT REACHED` (planner/executor pause)
|
|
73
|
+
- `## VERIFICATION PASSED` / `## ISSUES FOUND` (plan-checker output)
|
|
74
|
+
|
|
75
|
+
**execute-phase.md matches:**
|
|
76
|
+
- `## PHASE COMPLETE` (all plans in phase done)
|
|
77
|
+
- `## Self-Check: FAILED` (summary self-check)
|
|
78
|
+
|
|
79
|
+
> **NOTE:** `## PLAN COMPLETE` is the gsd-executor's completion marker, but execute-phase.md does not regex-match it. Instead, it detects executor completion via spot-checks (SUMMARY.md existence, git commit state). This is intentional behavior, not a mismatch.
|
|
@@ -0,0 +1,156 @@
|
|
|
1
|
+
# AI Evaluation Reference
|
|
2
|
+
|
|
3
|
+
> Reference used by `gsd-eval-planner` and `gsd-eval-auditor`.
|
|
4
|
+
> Based on the "AI Evals for Everyone" course (Reganti & Badam) and industry practice.
|
|
5
|
+
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Core Concepts
|
|
9
|
+
|
|
10
|
+
### Why Evals Exist
|
|
11
|
+
AI systems are non-deterministic. Input X does not reliably produce output Y across runs, users, or edge cases. Evals are the continuous process of assessing whether your system's behavior meets expectations under real-world conditions — unit tests and integration tests alone are insufficient.
|
|
12
|
+
|
|
13
|
+
### Model vs. Product Evaluation
|
|
14
|
+
- **Model evals** (MMLU, HumanEval, GSM8K) — measure general capability in standardized conditions. Use as initial filter only.
|
|
15
|
+
- **Product evals** — measure behavior inside your specific system, with your data, your users, your domain rules. This is where 80% of eval effort belongs.
|
|
16
|
+
|
|
17
|
+
### The Three Components of Every Eval
|
|
18
|
+
- **Input** — everything affecting the system: query, history, retrieved docs, system prompt, config
|
|
19
|
+
- **Expected** — what good behavior looks like, defined through rubrics
|
|
20
|
+
- **Actual** — what the system produced, including intermediate steps, tool calls, and reasoning traces
|
|
21
|
+
|
|
22
|
+
### Three Measurement Approaches
|
|
23
|
+
1. **Code-based metrics** — deterministic checks: JSON validation, required disclaimers, performance thresholds, classification flags. Fast, cheap, reliable. Use first.
|
|
24
|
+
2. **LLM judges** — one model evaluates another against a rubric. Powerful for subjective qualities (tone, reasoning, escalation). Requires calibration against human judgment before trusting.
|
|
25
|
+
3. **Human evaluation** — gold standard for nuanced judgment. Doesn't scale. Use for calibration, edge cases, periodic sampling, and high-stakes decisions.
|
|
26
|
+
|
|
27
|
+
The most effective systems combine all three.
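
As an illustration of the first approach (code-based metrics), here is a minimal sketch of deterministic checks for JSON validity, required fields, and a required disclaimer. The `answer` field name and the `check_output` helper are assumptions made for the example; a real system would check its own schema.

```python
import json


def check_output(raw: str, required_fields: list[str], disclaimer: str) -> dict[str, bool]:
    """Run deterministic checks on a model response expected to be a JSON object."""
    results = {"valid_json": False, "has_required_fields": False, "has_disclaimer": False}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return results
    results["valid_json"] = True
    if isinstance(data, dict):
        results["has_required_fields"] = all(f in data for f in required_fields)
        # Hypothetical convention: the user-facing text lives in an "answer" field.
        answer = str(data.get("answer", ""))
        results["has_disclaimer"] = disclaimer.lower() in answer.lower()
    return results
```

Checks like these are cheap enough to run on every response, which is why they come before LLM judges and human review.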
|
|
28
|
+
|
|
29
|
+
---
|
|
30
|
+
|
|
31
|
+
## Evaluation Dimensions
|
|
32
|
+
|
|
33
|
+
### Pre-Deployment (Development Phase)
|
|
34
|
+
|
|
35
|
+
| Dimension | What It Measures | When It Matters |
|
|
36
|
+
|-----------|-----------------|-----------------|
|
|
37
|
+
| **Factual accuracy** | Correctness of claims against ground truth | RAG, knowledge bases, any factual assertions |
|
|
38
|
+
| **Context faithfulness** | Response grounded in provided context vs. fabricated | RAG pipelines, document Q&A, retrieval-augmented systems |
|
|
39
|
+
| **Hallucination detection** | Plausible but unsupported claims | All generative systems, high-stakes domains |
|
|
40
|
+
| **Escalation accuracy** | Correct identification of when human intervention is needed | Customer service, healthcare, financial advisory |
|
|
41
|
+
| **Policy compliance** | Adherence to business rules, legal requirements, disclaimers | Regulated industries, enterprise deployments |
|
|
42
|
+
| **Tone/style appropriateness** | Match with brand voice, audience expectations, emotional context | Customer-facing systems, content generation |
|
|
43
|
+
| **Output structure validity** | Schema compliance, required fields, format correctness | Structured extraction, API integrations, data pipelines |
|
|
44
|
+
| **Task completion** | Whether the system accomplished the stated goal | Agentic workflows, multi-step tasks |
|
|
45
|
+
| **Tool use correctness** | Correct selection and invocation of tools | Agent systems with tool calls |
|
|
46
|
+
| **Safety** | Absence of harmful, biased, or inappropriate outputs | All user-facing systems |
|
|
47
|
+
|
|
48
|
+
### Production Monitoring
|
|
49
|
+
|
|
50
|
+
| Dimension | Monitoring Approach |
|
|
51
|
+
|-----------|---------------------|
|
|
52
|
+
| **Safety violations** | Online guardrail — real-time, immediate intervention |
|
|
53
|
+
| **Compliance failures** | Online guardrail — block or escalate before user sees output |
|
|
54
|
+
| **Quality degradation trends** | Offline flywheel — batch analysis of sampled interactions |
|
|
55
|
+
| **Emerging failure modes** | Signal-metric divergence — when user behavior signals diverge from metric scores, investigate manually |
|
|
56
|
+
| **Cost/latency drift** | Code-based metrics — automated threshold alerts |
|
|
57
|
+
|
|
58
|
+
---
|
|
59
|
+
|
|
60
|
+
## The Guardrail vs. Flywheel Decision
|
|
61
|
+
|
|
62
|
+
Ask: "If this behavior goes wrong, would it be catastrophic for my business?"
|
|
63
|
+
|
|
64
|
+
- **Yes → Guardrail** — run online, real-time, with immediate intervention (block, escalate, hand off). Be selective: guardrails add latency.
|
|
65
|
+
- **No → Flywheel** — run offline as batch analysis feeding system refinements over time.
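
The decision above reduces to a single question per check. A small sketch, with a hypothetical `Check` record and example check names, of splitting a list of checks into online guardrails and offline flywheel analysis:

```python
from dataclasses import dataclass


@dataclass
class Check:
    # Hypothetical record: a named eval plus the answer to the "catastrophic?" question.
    name: str
    catastrophic_if_wrong: bool


def placement(check: Check) -> str:
    """Guardrail (online, blocking) if failure is catastrophic, else flywheel (offline)."""
    return "guardrail" if check.catastrophic_if_wrong else "flywheel"


def partition(checks: list[Check]) -> dict[str, list[str]]:
    """Group check names by where they should run."""
    plan = {"guardrail": [], "flywheel": []}
    for check in checks:
        plan[placement(check)].append(check.name)
    return plan
```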
|
|
66
|
+
|
|
67
|
+
---
|
|
68
|
+
|
|
69
|
+
## Rubric Design
|
|
70
|
+
|
|
71
|
+
Generic metrics are meaningless without context. "Helpfulness" in real estate means summarizing listings clearly. In healthcare it means knowing when *not* to answer.
|
|
72
|
+
|
|
73
|
+
A rubric must define:
|
|
74
|
+
1. The dimension being measured
|
|
75
|
+
2. What scores 1, 3, and 5 on a 5-point scale (or pass/fail criteria)
|
|
76
|
+
3. Domain-specific examples of acceptable vs. unacceptable behavior
|
|
77
|
+
|
|
78
|
+
Without rubrics, LLM judges produce noise rather than signal.
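
One way to make the three rubric requirements concrete is a small record that refuses to be called complete until each part is filled in. This is a sketch under assumptions: the `Rubric` class and its field names are hypothetical, and it covers the 5-point variant rather than pass/fail criteria.

```python
from dataclasses import dataclass, field


@dataclass
class Rubric:
    dimension: str                      # 1. the dimension being measured
    anchors: dict[int, str]             # 2. descriptions for scores 1, 3, and 5
    acceptable_examples: list[str] = field(default_factory=list)    # 3. domain examples
    unacceptable_examples: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """List what is still missing before the rubric is usable by a judge."""
        issues = []
        if not self.dimension.strip():
            issues.append("missing dimension")
        for score in (1, 3, 5):
            if not self.anchors.get(score):
                issues.append(f"missing anchor for score {score}")
        if not self.acceptable_examples or not self.unacceptable_examples:
            issues.append("needs acceptable and unacceptable examples")
        return issues
```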
|
|
79
|
+
|
|
80
|
+
---
|
|
81
|
+
|
|
82
|
+
## Reference Dataset Guidelines
|
|
83
|
+
|
|
84
|
+
- Start with **10-20 high-quality examples** — not 200 mediocre ones
|
|
85
|
+
- Cover: critical success scenarios, common user workflows, known edge cases, historical failure modes
|
|
86
|
+
- Have domain experts label the examples (not just engineers)
|
|
87
|
+
- Expand based on what you learn in production — don't build for hypothetical coverage
|
|
88
|
+
|
|
89
|
+
---
|
|
90
|
+
|
|
91
|
+
## Eval Tooling Guide
|
|
92
|
+
|
|
93
|
+
| Tool | Type | Best For | Key Strength |
|
|
94
|
+
|------|------|----------|-------------|
|
|
95
|
+
| **RAGAS** | Python library | RAG evaluation | Purpose-built metrics: faithfulness, answer relevance, context precision/recall |
|
|
96
|
+
| **Langfuse** | Platform (open-source, self-hostable) | All system types | Strong tracing, prompt management, good for teams wanting infrastructure control |
|
|
97
|
+
| **LangSmith** | Platform (commercial) | LangChain/LangGraph ecosystems | Tightest integration with LangChain; best if already in that ecosystem |
|
|
98
|
+
| **Arize Phoenix** | Platform (open-source + hosted) | RAG + multi-agent tracing | Strong RAG eval + trace visualization; open-source with hosted option |
|
|
99
|
+
| **Braintrust** | Platform (commercial) | Model-agnostic evaluation | Dataset and experiment management; good for comparing across frameworks |
|
|
100
|
+
| **Promptfoo** | CLI tool (open-source) | Prompt testing, CI/CD | CLI-first, excellent for CI/CD prompt regression testing |
|
|
101
|
+
|
|
102
|
+
### Tool Selection by System Type
|
|
103
|
+
|
|
104
|
+
| System Type | Recommended Tooling |
|
|
105
|
+
|-------------|---------------------|
|
|
106
|
+
| RAG / Knowledge Q&A | RAGAS + Arize Phoenix or Braintrust |
|
|
107
|
+
| Multi-agent systems | Langfuse + Arize Phoenix |
|
|
108
|
+
| Conversational / single-model | Promptfoo + Braintrust |
|
|
109
|
+
| Structured extraction | Promptfoo + code-based validators |
|
|
110
|
+
| LangChain/LangGraph projects | LangSmith (native integration) |
|
|
111
|
+
| Production monitoring (all types) | Langfuse, Arize Phoenix, or LangSmith |
|
|
112
|
+
|
|
113
|
+
---
|
|
114
|
+
|
|
115
|
+
## Evals in the Development Lifecycle
|
|
116
|
+
|
|
117
|
+
### Plan Phase (Evaluation-Aware Design)
|
|
118
|
+
Before writing code, define:
|
|
119
|
+
1. What type of AI system is being built → determines framework and dominant eval concerns
|
|
120
|
+
2. Critical failure modes (3-5 behaviors that cannot go wrong)
|
|
121
|
+
3. Rubrics — explicit definitions of acceptable/unacceptable behavior per dimension
|
|
122
|
+
4. Evaluation strategy — which dimensions use code metrics, LLM judges, or human review
|
|
123
|
+
5. Reference dataset requirements — size, composition, labeling approach
|
|
124
|
+
6. Eval tooling selection
|
|
125
|
+
|
|
126
|
+
Output: EVALS-SPEC section of AI-SPEC.md
|
|
127
|
+
|
|
128
|
+
### Execute Phase (Instrument While Building)
|
|
129
|
+
- Add tracing from day one (Langfuse, Arize Phoenix, or LangSmith)
|
|
130
|
+
- Build reference dataset concurrently with implementation
|
|
131
|
+
- Implement code-based checks first; add LLM judges only for subjective dimensions
|
|
132
|
+
- Run evals in CI/CD via Promptfoo or Braintrust
|
|
133
|
+
|
|
134
|
+
### Verify Phase (Pre-Deployment Validation)
|
|
135
|
+
- Run full reference dataset against all metrics
|
|
136
|
+
- Conduct human review of edge cases and LLM judge disagreements
|
|
137
|
+
- Calibrate LLM judges against human scores (target ≥ 0.7 correlation before trusting)
|
|
138
|
+
- Define and configure production guardrails
|
|
139
|
+
- Establish monitoring baseline
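
The calibration step above can be sketched as a correlation check between human and judge scores on the same examples. The text only says "correlation"; using Pearson's r is an assumption of this sketch (a rank correlation may suit ordinal scores better), and `judge_is_calibrated` is a hypothetical helper.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two score lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def judge_is_calibrated(human: list[float], judge: list[float], threshold: float = 0.7) -> bool:
    """True when judge scores track human scores at or above the target (0.7 in the text)."""
    return pearson(human, judge) >= threshold
```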
|
|
140
|
+
|
|
141
|
+
### Monitor Phase (Production Evaluation Loop)
|
|
142
|
+
- Smart sampling — weight toward interactions with concerning signals (retries, unusual length, explicit escalations)
|
|
143
|
+
- Online guardrails on every interaction
|
|
144
|
+
- Offline flywheel on sampled batch
|
|
145
|
+
- Watch for signal-metric divergence — the early warning system for evaluation gaps
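
Smart sampling, as described in the first bullet above, could look roughly like the following. The weights and the interaction field names (`retries`, `unusual_length`, `escalated`) are illustrative assumptions, and weighted sampling without replacement is done here with random keys of the form `u ** (1 / weight)`, one of several workable techniques.

```python
import random


def signal_weight(interaction: dict) -> float:
    """Heavier weight for interactions that show concerning signals (weights are made up)."""
    weight = 1.0
    weight += 2.0 * interaction.get("retries", 0)
    if interaction.get("unusual_length"):
        weight += 2.0
    if interaction.get("escalated"):
        weight += 5.0
    return weight


def smart_sample(interactions: list[dict], k: int, seed=None) -> list[dict]:
    """Pick up to k interactions without replacement, biased toward heavier weights."""
    rng = random.Random(seed)
    # Each item gets key u ** (1 / w); taking the k largest keys favors large w.
    keyed = [(rng.random() ** (1.0 / signal_weight(i)), i) for i in interactions]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed[:k]]
```

The sampled batch would then feed the offline flywheel, while guardrails continue to run on every interaction.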
|
|
146
|
+
|
|
147
|
+
---
|
|
148
|
+
|
|
149
|
+
## Common Pitfalls
|
|
150
|
+
|
|
151
|
+
1. **Assuming benchmarks predict product success** — they don't; model evals are a filter, not a verdict
|
|
152
|
+
2. **Engineering evals in isolation** — domain experts must co-define rubrics; engineers alone miss critical nuances
|
|
153
|
+
3. **Building comprehensive coverage on day one** — start small (10-20 examples), expand from real failure modes
|
|
154
|
+
4. **Trusting uncalibrated LLM judges** — validate against human judgment before relying on them
|
|
155
|
+
5. **Measuring everything** — only track metrics that drive decisions; "collect it all" produces noise
|
|
156
|
+
6. **Treating evaluation as one-time setup** — user behavior evolves, requirements change, failure modes emerge; evaluation is continuous
|
|
@@ -0,0 +1,186 @@
|
|
|
1
|
+
# AI Framework Decision Matrix
|
|
2
|
+
|
|
3
|
+
> Reference used by `gsd-framework-selector` and `gsd-ai-researcher`.
|
|
4
|
+
> Distilled from official docs, benchmarks, and developer reports (2026).
|
|
5
|
+
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Quick Picks
|
|
9
|
+
|
|
10
|
+
| Situation | Pick |
|
|
11
|
+
|-----------|------|
|
|
12
|
+
| Simplest path to a working agent (OpenAI) | OpenAI Agents SDK |
|
|
13
|
+
| Simplest path to a working agent (model-agnostic) | CrewAI |
|
|
14
|
+
| Production RAG / document Q&A | LlamaIndex |
|
|
15
|
+
| Complex stateful workflows with branching | LangGraph |
|
|
16
|
+
| Multi-agent teams with defined roles | CrewAI |
|
|
17
|
+
| Code-aware autonomous agents (Anthropic) | Claude Agent SDK |
|
|
18
|
+
| "I don't know my requirements yet" | LangChain |
|
|
19
|
+
| Regulated / audit-trail required | LangGraph |
|
|
20
|
+
| Enterprise Microsoft/.NET shops | AutoGen/AG2 |
|
|
21
|
+
| Google Cloud / Gemini-committed teams | Google ADK |
|
|
22
|
+
| Pure NLP pipelines with explicit control | Haystack |
|
|
23
|
+
|
|
24
|
+
---
|
|
25
|
+
|
|
26
|
+
## Framework Profiles
|
|
27
|
+
|
|
28
|
+
### CrewAI
|
|
29
|
+
- **Type:** Multi-agent orchestration
|
|
30
|
+
- **Language:** Python only
|
|
31
|
+
- **Model support:** Model-agnostic
|
|
32
|
+
- **Learning curve:** Beginner (role/task/crew maps to real teams)
|
|
33
|
+
- **Best for:** Content pipelines, research automation, business process workflows, rapid prototyping
|
|
34
|
+
- **Avoid if:** Fine-grained state management, TypeScript, fault-tolerant checkpointing, complex conditional branching
|
|
35
|
+
- **Strengths:** Fastest multi-agent prototyping, reported 5.76x faster than LangGraph on QA tasks, built-in memory (short/long/entity/contextual), Flows architecture, standalone (no LangChain dep)
|
|
36
|
+
- **Weaknesses:** Limited checkpointing, coarse error handling, Python only
|
|
37
|
+
- **Eval concerns:** Task decomposition accuracy, inter-agent handoff, goal completion rate, loop detection
|
|
38
|
+
|
|
39
|
+
### LlamaIndex
|
|
40
|
+
- **Type:** RAG and data ingestion
|
|
41
|
+
- **Language:** Python + TypeScript
|
|
42
|
+
- **Model support:** Model-agnostic
|
|
43
|
+
- **Learning curve:** Intermediate
|
|
44
|
+
- **Best for:** Legal research, internal knowledge assistants, enterprise document search, any system where retrieval quality is the #1 priority
|
|
45
|
+
- **Avoid if:** Primary need is agent orchestration, multi-agent collaboration, or chatbot conversation flow
|
|
46
|
+
- **Strengths:** Best-in-class document parsing (LlamaParse), reported 35% retrieval accuracy improvement and 20-30% faster queries, mixed retrieval strategies (vector + graph + reranker)
|
|
47
|
+
- **Weaknesses:** Data framework first — agent orchestration is secondary
|
|
48
|
+
- **Eval concerns:** Context faithfulness, hallucination, answer relevance, retrieval precision/recall
|
|
49
|
+
|
|
50
|
+
### LangChain
|
|
51
|
+
- **Type:** General-purpose LLM framework
|
|
52
|
+
- **Language:** Python + TypeScript
|
|
53
|
+
- **Model support:** Model-agnostic (widest ecosystem)
|
|
54
|
+
- **Learning curve:** Intermediate–Advanced
|
|
55
|
+
- **Best for:** Evolving requirements, many third-party integrations, teams wanting one framework for everything, RAG + agents + chains
|
|
56
|
+
- **Avoid if:** Simple well-defined use case, RAG-primary (use LlamaIndex), complex stateful workflows (use LangGraph), performance at scale is critical
|
|
57
|
+
- **Strengths:** Largest community and integration ecosystem, 25% faster development vs scratch, covers RAG/agents/chains/memory
|
|
58
|
+
- **Weaknesses:** Abstraction overhead, p99 latency degrades under load, complexity creep risk
|
|
59
|
+
- **Eval concerns:** End-to-end task completion, chain correctness, retrieval quality
|
|
60
|
+
|
|
61
|
+
### LangGraph
|
|
62
|
+
- **Type:** Stateful agent workflows (graph-based)
|
|
63
|
+
- **Language:** Python + TypeScript (full parity)
|
|
64
|
+
- **Model support:** Model-agnostic (inherits LangChain integrations)
|
|
65
|
+
- **Learning curve:** Intermediate–Advanced (graph mental model)
|
|
66
|
+
- **Best for:** Production-grade stateful workflows, regulated industries, audit trails, human-in-the-loop flows, fault-tolerant multi-step agents
|
|
67
|
+
- **Avoid if:** Simple chatbot, purely linear workflow, rapid prototyping
|
|
68
|
+
- **Strengths:** Best checkpointing (every node), time-travel debugging, native Postgres/Redis persistence, streaming support, chosen by 62% of developers for stateful agent work (2026)
|
|
69
|
+
- **Weaknesses:** More upfront scaffolding, steeper curve, overkill for simple cases
|
|
70
|
+
- **Eval concerns:** State transition correctness, goal completion rate, tool use accuracy, safety guardrails
|
|
71
|
+
|
|
72
|
+
### OpenAI Agents SDK
|
|
73
|
+
- **Type:** Native OpenAI agent framework
|
|
74
|
+
- **Language:** Python + TypeScript
|
|
75
|
+
- **Model support:** Optimized for OpenAI (supports 100+ via Chat Completions compatibility)
|
|
76
|
+
- **Learning curve:** Beginner (4 primitives: Agents, Handoffs, Guardrails, Tracing)
|
|
77
|
+
- **Best for:** OpenAI-committed teams, rapid agent prototyping, voice agents (gpt-realtime), teams wanting visual builder (AgentKit)
|
|
78
|
+
- **Avoid if:** Model flexibility needed, complex multi-agent collaboration, persistent state management required, vendor lock-in is a concern
|
|
79
|
+
- **Strengths:** Simplest mental model, built-in tracing and guardrails, Handoffs for agent delegation, Realtime Agents for voice
|
|
80
|
+
- **Weaknesses:** OpenAI vendor lock-in, no built-in persistent state, younger ecosystem
|
|
81
|
+
- **Eval concerns:** Instruction following, safety guardrails, escalation accuracy, tone consistency
|
|
82
|
+
|
|
83
|
+
### Claude Agent SDK (Anthropic)
|
|
84
|
+
- **Type:** Code-aware autonomous agent framework
|
|
85
|
+
- **Language:** Python + TypeScript
|
|
86
|
+
- **Model support:** Claude models only
|
|
87
|
+
- **Learning curve:** Intermediate (18 hook events, MCP, tool decorators)
|
|
88
|
+
- **Best for:** Developer tooling, code generation/review agents, autonomous coding assistants, MCP-heavy architectures, safety-critical applications
|
|
89
|
+
- **Avoid if:** Model flexibility needed, stable/mature API required, use case unrelated to code/tool-use
|
|
90
|
+
- **Strengths:** Deepest MCP integration, built-in filesystem/shell access, 18 lifecycle hooks, automatic context compaction, extended thinking, safety-first design
|
|
91
|
+
- **Weaknesses:** Claude-only vendor lock-in, newer/evolving API, smaller community
|
|
92
|
+
- **Eval concerns:** Tool use correctness, safety, code quality, instruction following
|
|
93
|
+
|
|
94
|
+
### AutoGen / AG2 / Microsoft Agent Framework
|
|
95
|
+
- **Type:** Multi-agent conversational framework
|
|
96
|
+
- **Language:** Python (AG2), Python + .NET (Microsoft Agent Framework)
|
|
97
|
+
- **Model support:** Model-agnostic
|
|
98
|
+
- **Learning curve:** Intermediate–Advanced
|
|
99
|
+
- **Best for:** Research applications, conversational problem-solving, code generation + execution loops, Microsoft/.NET shops
|
|
100
|
+
- **Avoid if:** You want ecosystem stability, deterministic workflows, or "safest long-term bet" (fragmentation risk)
|
|
101
|
+
- **Strengths:** Most sophisticated conversational agent patterns, code generation + execution loop, async event-driven (v0.4+), cross-language interop (Microsoft Agent Framework)
|
|
102
|
+
- **Weaknesses:** Ecosystem fragmented (AutoGen maintenance mode, AG2 fork, Microsoft Agent Framework preview) — genuine long-term risk
|
|
103
|
+
- **Eval concerns:** Conversation goal completion, consensus quality, code execution correctness
|
|
104
|
+
|
|
105
|
+
### Google ADK (Agent Development Kit)
|
|
106
|
+
- **Type:** Multi-agent orchestration framework
|
|
107
|
+
- **Language:** Python + Java
|
|
108
|
+
- **Model support:** Optimized for Gemini; supports other models via LiteLLM
|
|
109
|
+
- **Learning curve:** Intermediate (agent/tool/session model, familiar if you know LangGraph)
|
|
110
|
+
- **Best for:** Google Cloud / Vertex AI shops, multi-agent workflows needing built-in session management and memory, teams already committed to Gemini, agent pipelines that need Google Search / BigQuery tool integration
|
|
111
|
+
- **Avoid if:** Model flexibility is required beyond Gemini, no Google Cloud dependency acceptable, TypeScript-only stack
|
|
112
|
+
- **Strengths:** First-party Google support, built-in session/memory/artifact management, tight Vertex AI and Google Search integration, own eval framework (RAGAS-compatible), multi-agent by design (sequential, parallel, loop patterns), Java SDK for enterprise teams
|
|
113
|
+
- **Weaknesses:** Gemini vendor lock-in in practice, younger community than LangChain/LlamaIndex, less third-party integration depth
|
|
114
|
+
- **Eval concerns:** Multi-agent task decomposition, tool use correctness, session state consistency, goal completion rate
|
|
115
|
+
|
|
116
|
+
### Haystack
|
|
117
|
+
- **Type:** NLP pipeline framework
|
|
118
|
+
- **Language:** Python
|
|
119
|
+
- **Model support:** Model-agnostic
|
|
120
|
+
- **Learning curve:** Intermediate
|
|
121
|
+
- **Best for:** Explicit, auditable NLP pipelines, document processing with fine-grained control, enterprise search, regulated industries needing transparency
|
|
122
|
+
- **Avoid if:** Rapid prototyping, multi-agent workflows, or you want a large community
|
|
123
|
+
- **Strengths:** Explicit pipeline control, strong for structured data pipelines, good documentation
|
|
124
|
+
- **Weaknesses:** Smaller community, less agent-oriented than alternatives
|
|
125
|
+
- **Eval concerns:** Extraction accuracy, pipeline output validity, retrieval quality
|
|
126
|
+
|
|
127
|
+
---
|
|
128
|
+
|
|
129
|
+
## Decision Dimensions
|
|
130
|
+
|
|
131
|
+
### By System Type
|
|
132
|
+
|
|
133
|
+
| System Type | Primary Framework(s) | Key Eval Concerns |
|
|
134
|
+
|-------------|---------------------|-------------------|
|
|
135
|
+
| RAG / Knowledge Q&A | LlamaIndex, LangChain | Context faithfulness, hallucination, retrieval precision/recall |
|
|
136
|
+
| Multi-agent orchestration | CrewAI, LangGraph, Google ADK | Task decomposition, handoff quality, goal completion |
|
|
137
|
+
| Conversational assistants | OpenAI Agents SDK, Claude Agent SDK | Tone, safety, instruction following, escalation |
|
|
138
|
+
| Structured data extraction | LangChain, LlamaIndex | Schema compliance, extraction accuracy |
|
|
139
|
+
| Autonomous task agents | LangGraph, OpenAI Agents SDK | Safety guardrails, tool correctness, cost adherence |
|
|
140
|
+
| Content generation | Claude Agent SDK, OpenAI Agents SDK | Brand voice, factual accuracy, tone |
|
|
141
|
+
| Code automation | Claude Agent SDK | Code correctness, safety, test pass rate |
|
|
142
|
+
|
|
143
|
+
### By Team Size and Stage
|
|
144
|
+
|
|
145
|
+
| Context | Recommendation |
|
|
146
|
+
|---------|----------------|
|
|
147
|
+
| Solo dev, prototyping | OpenAI Agents SDK or CrewAI (fastest to running) |
|
|
148
|
+
| Solo dev, RAG | LlamaIndex (batteries included) |
|
|
149
|
+
| Team, production, stateful | LangGraph (best fault tolerance) |
|
|
150
|
+
| Team, evolving requirements | LangChain (broadest escape hatches) |
|
|
151
|
+
| Team, multi-agent | CrewAI (simplest role abstraction) |
|
|
152
|
+
| Enterprise, .NET | AutoGen/AG2 / Microsoft Agent Framework |
|
|
153
|
+
|
|
154
|
+
### By Model Commitment
|
|
155
|
+
|
|
156
|
+
| Preference | Framework |
|
|
157
|
+
|-----------|-----------|
|
|
158
|
+
| OpenAI-only | OpenAI Agents SDK |
|
|
159
|
+
| Anthropic/Claude-only | Claude Agent SDK |
|
|
160
|
+
| Google/Gemini-committed | Google ADK |
|
|
161
|
+
| Model-agnostic (full flexibility) | LangChain, LlamaIndex, CrewAI, LangGraph, Haystack |
|
|
162
|
+
|
|
163
|
+
---
|
|
164
|
+
|
|
165
|
+
## Anti-Patterns
|
|
166
|
+
|
|
167
|
+
1. **Using LangChain for simple chatbots** — Direct SDK call is less code, faster, and easier to debug
|
|
168
|
+
2. **Using CrewAI for complex stateful workflows** — Checkpointing gaps will bite you in production
|
|
169
|
+
3. **Using OpenAI Agents SDK with non-OpenAI models** — Loses the integration benefits you chose it for
|
|
170
|
+
4. **Using LlamaIndex as a multi-agent framework** — It can do agents, but that's not its strength
|
|
171
|
+
5. **Defaulting to LangChain without evaluating alternatives** — "Everyone uses it" ≠ right for your use case
|
|
172
|
+
6. **Starting a new project on AutoGen (not AG2)** — AutoGen is in maintenance mode; use AG2 or wait for Microsoft Agent Framework GA
|
|
173
|
+
7. **Choosing LangGraph for simple linear flows** — The graph overhead is not worth it; use LangChain chains instead
|
|
174
|
+
8. **Ignoring vendor lock-in** — Provider-native SDKs (OpenAI, Claude) trade flexibility for integration depth; decide consciously
|
|
175
|
+
|
|
176
|
+
---
|
|
177
|
+
|
|
178
|
+
## Combination Plays (Multi-Framework Stacks)
|
|
179
|
+
|
|
180
|
+
| Production Pattern | Stack |
|
|
181
|
+
|-------------------|-------|
|
|
182
|
+
| RAG with observability | LlamaIndex + LangSmith or Langfuse |
|
|
183
|
+
| Stateful agent with RAG | LangGraph + LlamaIndex |
|
|
184
|
+
| Multi-agent with tracing | CrewAI + Langfuse |
|
|
185
|
+
| OpenAI agents with evals | OpenAI Agents SDK + Promptfoo or Braintrust |
|
|
186
|
+
| Claude agents with MCP | Claude Agent SDK + LangSmith or Arize Phoenix |
|
|
@@ -0,0 +1,131 @@
|
|
|
1
|
+
# GSD Artifact Types
|
|
2
|
+
|
|
3
|
+
This reference documents all artifact types in the GSD planning taxonomy. Each type has a defined
|
|
4
|
+
shape, lifecycle, location, and consumption mechanism. A well-formatted artifact that no workflow
|
|
5
|
+
reads is inert — the consumption mechanism is what gives an artifact meaning.
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Core Artifacts
|
|
10
|
+
|
|
11
|
+
### ROADMAP.md
|
|
12
|
+
- **Shape**: Milestone + phase listing with goals and canonical refs
|
|
13
|
+
- **Lifecycle**: Created → Updated per milestone → Archived
|
|
14
|
+
- **Location**: `.planning/ROADMAP.md`
|
|
15
|
+
- **Consumed by**: `plan-phase`, `discuss-phase`, `execute-phase`, `progress`, `state` commands
|
|
16
|
+
|
|
17
|
+
### STATE.md
|
|
18
|
+
- **Shape**: Current position tracker (phase, plan, progress, decisions)
|
|
19
|
+
- **Lifecycle**: Continuously updated throughout the project
|
|
20
|
+
- **Location**: `.planning/STATE.md`
|
|
21
|
+
- **Consumed by**: All orchestration workflows; `resume-project`, `progress`, `next` commands
|
|
22
|
+
|
|
23
|
+
### REQUIREMENTS.md
|
|
24
|
+
- **Shape**: Numbered acceptance criteria with traceability table
|
|
25
|
+
- **Lifecycle**: Created at project start → Updated as requirements are satisfied
|
|
26
|
+
- **Location**: `.planning/REQUIREMENTS.md`
|
|
27
|
+
- **Consumed by**: `discuss-phase`, `plan-phase`, CONTEXT.md generation; executor marks complete
|
|
28
|
+
|
|
29
|
+
### CONTEXT.md (per-phase)
|
|
30
|
+
- **Shape**: 6-section format: domain, decisions, canonical_refs, code_context, specifics, deferred
|
|
31
|
+
- **Lifecycle**: Created before planning → Used during planning and execution → Superseded by next phase
|
|
32
|
+
- **Location**: `.planning/phases/XX-name/XX-CONTEXT.md`
|
|
33
|
+
- **Consumed by**: `plan-phase` (reads decisions), `execute-phase` (reads code_context and canonical_refs)
|
|
34
|
+
|
|
35
|
+
### PLAN.md (per-plan)
|
|
36
|
+
- **Shape**: Frontmatter + objective + tasks with types + success criteria + output spec
|
|
37
|
+
- **Lifecycle**: Created by planner → Executed → SUMMARY.md produced
|
|
38
|
+
- **Location**: `.planning/phases/XX-name/XX-YY-PLAN.md`
|
|
39
|
+
- **Consumed by**: `execute-phase` executor; task commits reference plan IDs
|
|
40
|
+
|
|
41
|
+
### SUMMARY.md (per-plan)
|
|
42
|
+
- **Shape**: Frontmatter with dependency graph + narrative + deviations + self-check
|
|
43
|
+
- **Lifecycle**: Created at plan completion → Read by subsequent plans in same phase
|
|
44
|
+
- **Location**: `.planning/phases/XX-name/XX-YY-SUMMARY.md`
|
|
45
|
+
- **Consumed by**: Orchestrator (progress), planner (context for future plans), `milestone-summary`
|
|
46
|
+
|
|
47
|
+
### HANDOFF.json / .continue-here.md
|
|
48
|
+
- **Shape**: Structured pause state (JSON machine-readable + Markdown human-readable)
|
|
49
|
+
- **Lifecycle**: Created on pause → Consumed on resume → Replaced by next pause
|
|
50
|
+
- **Location**: `.planning/HANDOFF.json` + `.planning/phases/XX-name/.continue-here.md` (or spike/deliberation path)
|
|
51
|
+
- **Consumed by**: `resume-project` workflow
|
|
52
|
+
|
|
53
|
+
---
|
|
54
|
+
|
|
55
|
+
## Extended Artifacts
|
|
56
|
+
|
|
57
|
+
### DISCUSSION-LOG.md (per-phase)
|
|
58
|
+
- **Shape**: Audit trail of assumptions and corrections from discuss-phase
|
|
59
|
+
- **Lifecycle**: Created at discussion time → Read-only audit record
|
|
60
|
+
- **Location**: `.planning/phases/XX-name/XX-DISCUSSION-LOG.md`
|
|
61
|
+
- **Consumed by**: Human review; not read by automated workflows
|
|
62
|
+
|
|
63
|
+
### USER-PROFILE.md
|
|
64
|
+
- **Shape**: Calibration tier and preferences profile
|
|
65
|
+
- **Lifecycle**: Created by `profile-user` → Updated as preferences are observed
|
|
66
|
+
- **Location**: `~/.claude/get-shit-done/USER-PROFILE.md`
|
|
67
|
+
- **Consumed by**: `discuss-phase-assumptions` (calibration tier), `plan-phase`
|
|
68
|
+
|
|
69
|
+
### SPIKE.md / DESIGN.md (per-spike)
|
|
70
|
+
- **Shape**: Research question + methodology + findings + recommendation
|
|
71
|
+
- **Lifecycle**: Created → Investigated → Decided → Archived
|
|
72
|
+
- **Location**: `.planning/spikes/SPIKE-NNN/`
|
|
73
|
+
- **Consumed by**: Planner when spike is referenced; `pause-work` for spike context handoff
|
|
74
|
+
|
|
75
|
+
### Spike README.md / MANIFEST.md (per-spike, via /gsd-spike)
|
|
76
|
+
- **Shape**: YAML frontmatter (spike, name, validates, verdict, related, tags) + run instructions + results
|
|
77
|
+
- **Lifecycle**: Created by `/gsd-spike` → Verified → Wrapped up by `/gsd-spike-wrap-up`
|
|
78
|
+
- **Location**: `.planning/spikes/NNN-name/README.md`, `.planning/spikes/MANIFEST.md`
|
|
79
|
+
- **Consumed by**: `/gsd-spike-wrap-up` for curation; `pause-work` for spike context handoff
|
|
80
|
+
|
|
81
|
+
### Sketch README.md / MANIFEST.md / index.html (per-sketch)
|
|
82
|
+
- **Shape**: YAML frontmatter (sketch, name, question, winner, tags) + variants as tabbed HTML
|
|
83
|
+
- **Lifecycle**: Created by `/gsd-sketch` → Evaluated → Wrapped up by `/gsd-sketch-wrap-up`
|
|
84
|
+
- **Location**: `.planning/sketches/NNN-name/README.md`, `.planning/sketches/NNN-name/index.html`, `.planning/sketches/MANIFEST.md`
|
|
85
|
+
- **Consumed by**: `/gsd-sketch-wrap-up` for curation; `pause-work` for sketch context handoff
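Both the spike and sketch READMEs lead with YAML frontmatter, so a wrap-up step could check that the listed keys are present before curating. The sketch below is deliberately naive: it only collects unindented `key:` lines between the opening and closing `---` delimiters and does not parse YAML values. The key lists come from the Shape bullets above; the function names are hypothetical.

```python
# Keys listed in the Shape bullets for spike and sketch READMEs.
SPIKE_KEYS = ("spike", "name", "validates", "verdict", "related", "tags")
SKETCH_KEYS = ("sketch", "name", "question", "winner", "tags")


def frontmatter_keys(text: str) -> list:
    """Return top-level keys found in a leading '---' ... '---' block.

    Naive by design: no YAML parsing, only unindented 'key: ...' lines.
    Returns an empty list if the block is missing or never closed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        is_top_level = bool(line) and not line[0].isspace()
        if is_top_level and ":" in line and not line.startswith(("#", "-")):
            keys.append(line.split(":", 1)[0].strip())
    return []  # closing delimiter never found


def missing_keys(text: str, required: tuple) -> list:
    """List required keys that the frontmatter does not declare."""
    present = set(frontmatter_keys(text))
    return [key for key in required if key not in present]
```

For real use, a proper YAML parser would be the safer choice; this version only illustrates the shape check.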
|
|
86
|
+
|
|
87
|
+
### WRAP-UP-SUMMARY.md (per wrap-up session)
|
|
88
|
+
- **Shape**: Curation results, included/excluded items, feature/design area groupings
|
|
89
|
+
- **Lifecycle**: Created by `/gsd-spike-wrap-up` or `/gsd-sketch-wrap-up`
|
|
90
|
+
- **Location**: `.planning/spikes/WRAP-UP-SUMMARY.md` or `.planning/sketches/WRAP-UP-SUMMARY.md`
|
|
91
|
+
- **Consumed by**: Project history; not read by automated workflows
|
|
92
|
+
|
|
93
|
+
---
|
|
94
|
+
|
|
95
|
+
## Standing Reference Artifacts
|
|
96
|
+
|
|
97
|
+
### METHODOLOGY.md
|
|
98
|
+
|
|
99
|
+
- **Shape**: Standing reference — reusable interpretive frameworks (lenses) that apply across phases
|
|
100
|
+
- **Lifecycle**: Created → Active → Superseded (when a lens is replaced by a better one)
|
|
101
|
+
- **Location**: `.planning/METHODOLOGY.md` (project-scoped, not phase-scoped)
|
|
102
|
+
- **Contents**: Named lenses, each documenting:
|
|
103
|
+
  - What it diagnoses (the class of problem it detects)
|
|
104
|
+
  - What it recommends (the class of response it prescribes)
|
|
105
|
+
  - When to apply (triggering conditions)
|
|
106
|
+
  - Example lenses: Bayesian updating, STRIDE threat modeling, cost-of-delay prioritization
|
|
107
|
+
- **Consumed by**:
|
|
108
|
+
  - `discuss-phase-assumptions`: reads METHODOLOGY.md (if it exists) and applies active lenses
|
|
109
|
+
    to the current assumption analysis before surfacing findings to the user
|
|
110
|
+
  - `plan-phase`: reads METHODOLOGY.md to inform methodology selection for each plan
|
|
111
|
+
  - `pause-work`: includes METHODOLOGY.md in the Required Reading section of `.continue-here.md`
|
|
112
|
+
    so resuming agents inherit the project's analytical orientation
|
|
113
|
+
|
|
114
|
+
**Why consumption matters:** A METHODOLOGY.md that no workflow reads is inert. The lenses only
|
|
115
|
+
take effect when an agent loads them into its reasoning context before analysis. This is why
|
|
116
|
+
both the `discuss-phase-assumptions` and `pause-work` workflows explicitly reference this file.
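To make the "if it exists" behaviour concrete, here is a minimal sketch of a consumer that loads the file only when it is present and places it ahead of the task prompt. The path comes from the Location bullet above; `load_methodology` and `with_methodology` are hypothetical names, and the way the text is combined with the prompt is an assumption rather than how the workflows actually do it.

```python
from pathlib import Path
from typing import Optional


def load_methodology(project_root: str) -> Optional[str]:
    """Return the text of .planning/METHODOLOGY.md, or None if it is absent."""
    path = Path(project_root) / ".planning" / "METHODOLOGY.md"
    if not path.is_file():
        return None
    return path.read_text(encoding="utf-8")


def with_methodology(prompt: str, project_root: str) -> str:
    """Prepend the lenses to a prompt so they are in context before analysis.

    Without the file the prompt is returned unchanged, so the lenses are
    strictly opt-in for a project.
    """
    lenses = load_methodology(project_root)
    if lenses is None:
        return prompt
    return f"{lenses.rstrip()}\n\n{prompt}"
```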
|
|
117
|
+
|
|
118
|
+
**Example lens entry:**
|
|
119
|
+
|
|
120
|
+
```markdown
|
|
121
|
+
## Bayesian Updating
|
|
122
|
+
|
|
123
|
+
**Diagnoses:** Decisions made with stale priors — assumptions formed early that evidence has since
|
|
124
|
+
contradicted, but which remain embedded in the plan.
|
|
125
|
+
|
|
126
|
+
**Recommends:** Before confirming an assumption, ask: "What evidence would make me change this?"
|
|
127
|
+
If no evidence could change it, it's a belief, not an assumption. Flag for user review.
|
|
128
|
+
|
|
129
|
+
**Apply when:** Any assumption that carries a Confident label but was formed before recent architectural
|
|
130
|
+
changes, library upgrades, or scope corrections.
|
|
131
|
+
```
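A consumer that wants to apply lenses selectively needs them as structured data rather than raw text. The following sketch assumes every lens follows the layout of the example entry above (a `## ` heading followed by `**Diagnoses:**`, `**Recommends:**`, and `**Apply when:**` paragraphs); that layout is inferred from the single example, and `parse_lenses` is a hypothetical helper.

```python
import re

# Field labels as they appear in the example lens entry.
_FIELD = re.compile(r"^\*\*(Diagnoses|Recommends|Apply when):\*\*\s*(.*)$")


def parse_lenses(markdown: str) -> dict:
    """Parse METHODOLOGY.md text into {lens_name: {field_label: text}}.

    Wrapped lines are joined with single spaces; a blank line ends a field.
    """
    lenses = {}
    name = None   # heading of the lens currently being read
    field = None  # label of the field currently being accumulated
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            name, field = line[3:].strip(), None
            lenses[name] = {}
            continue
        if name is None:
            continue  # ignore anything before the first lens heading
        if not line:
            field = None
            continue
        match = _FIELD.match(line)
        if match:
            field = match.group(1)
            lenses[name][field] = match.group(2)
        elif field is not None:
            lenses[name][field] += " " + line
    return lenses
```

With the lenses in this form, a workflow could, for example, show only the `Apply when` text for each lens and let the agent decide which ones are triggered.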
|