npm - gsd-antigravity-kit - Versions diffs - 2.0.1 → 2.1.0 - Mend

gsd-antigravity-kit 2.0.1 → 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (251)

package/.agent/skills/gsd/SKILL.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: gsd
-version: 1.34.2
+version: 1.37.1
 description: "Antigravity GSD (Get Stuff Done) - A spec-driven hierarchical planning and execution system. Triggers on project planning, phase management, and GSD slash commands."
 ---
@@ -25,6 +25,7 @@ This skill should be used when:
 - `gsd:add-phase`
 - `gsd:add-tests`
 - `gsd:add-todo`
+- `gsd:ai-integration-phase`
 - `gsd:analyze-dependencies`
 - `gsd:audit-fix`
 - `gsd:audit-milestone`
@@ -39,14 +40,19 @@ This skill should be used when:
 - `gsd:discuss-phase`
 - `gsd:do`
 - `gsd:docs-update`
+- `gsd:eval-review`
 - `gsd:execute-phase`
 - `gsd:explore`
+- `gsd:extract_learnings`
 - `gsd:fast`
 - `gsd:forensics`
+- `gsd:from-gsd2`
+- `gsd:graphify`
 - `gsd:gsd-tools`
 - `gsd:health`
 - `gsd:help`
 - `gsd:import`
+- `gsd:inbox`
 - `gsd:insert-phase`
 - `gsd:intel`
 - `gsd:join-discord`
@@ -81,6 +87,11 @@ This skill should be used when:
 - `gsd:set-profile`
 - `gsd:settings`
 - `gsd:ship`
+- `gsd:sketch`
+- `gsd:sketch-wrap-up`
+- `gsd:spec-phase`
+- `gsd:spike`
+- `gsd:spike-wrap-up`
 - `gsd:stats`
 - `gsd:thread`
 - `gsd:ui-phase`
@@ -128,16 +139,27 @@ The following slash commands are available in this skill. Use them to drive the
 - **[`gsd:review-backlog`](references/commands/milestone/review-backlog.md)**: Review and promote backlog items to active milestone
 ### Misc Commands
+- **[`gsd:ai-integration-phase`](references/commands/misc/ai-integration-phase.md)**: Generate AI design contract (AI-SPEC.md) for phases that involve building AI systems — framework selection, implementation guidance from official docs, and evaluation strategy
 - **[`gsd:audit-fix`](references/commands/misc/audit-fix.md)**: Autonomous audit-to-fix pipeline — find issues, classify, fix, test, commit
 - **[`gsd:audit-uat`](references/commands/misc/audit-uat.md)**: Cross-phase audit of all outstanding UAT and verification items
+- **[`gsd:eval-review`](references/commands/misc/eval-review.md)**: Retroactively audit an executed AI phase's evaluation coverage — scores each eval dimension as COVERED/PARTIAL/MISSING and produces an actionable EVAL-REVIEW.md with remediation plan
+- **[`gsd:extract_learnings`](references/commands/misc/extract_learnings.md)**: Extract decisions, lessons, patterns, and surprises from completed phase artifacts
+- **[`gsd:from-gsd2`](references/commands/misc/from-gsd2.md)**: Import a GSD-2 (.gsd/) project back to GSD v1 (.planning/) format
+- **[`gsd:graphify`](references/commands/misc/graphify.md)**: Build, query, and inspect the project knowledge graph in .planning/graphs/
+- **[`gsd:inbox`](references/commands/misc/inbox.md)**: Triage and review all open GitHub issues and PRs against project templates and contribution guidelines
 - **[`gsd:next`](references/commands/misc/next.md)**: Automatically advance to the next logical step in the GSD workflow
-- **[`gsd:progress`](references/commands/misc/progress.md)**: Check project progress, show context, and route to next action (execute or plan)
+- **[`gsd:progress`](references/commands/misc/progress.md)**: Check project progress, show context, and route to next action (execute or plan). Use --forensic to append a 6-check integrity audit after the standard report.
+- **[`gsd:sketch-wrap-up`](references/commands/misc/sketch-wrap-up.md)**: Package sketch design findings into a persistent project skill for future build conversations
+- **[`gsd:sketch`](references/commands/misc/sketch.md)**: Rapidly sketch UI/design ideas using throwaway HTML mockups with multi-variant exploration
+- **[`gsd:spec-phase`](references/commands/misc/spec-phase.md)**: Socratic spec refinement — clarify WHAT a phase delivers with ambiguity scoring before discuss-phase. Produces a SPEC.md with falsifiable requirements locked before implementation decisions begin.
+- **[`gsd:spike-wrap-up`](references/commands/misc/spike-wrap-up.md)**: Package spike findings into a persistent project skill for future build conversations
+- **[`gsd:spike`](references/commands/misc/spike.md)**: Rapidly spike an idea with throwaway experiments to validate feasibility before planning
 - **[`gsd:verify-work`](references/commands/misc/verify-work.md)**: Validate built features through conversational UAT
 ### Phase Commands
 - **[`gsd:add-phase`](references/commands/phase/add-phase.md)**: Add phase to end of current milestone in roadmap
 - **[`gsd:add-tests`](references/commands/phase/add-tests.md)**: Generate tests for a completed phase based on UAT criteria and implementation
-- **[`gsd:discuss-phase`](references/commands/phase/discuss-phase.md)**: Gather phase context through adaptive questioning before planning. Use --auto to skip interactive questions (Antigravity picks recommended defaults). Use --chain for interactive discuss followed by automatic plan+execute. Use --power for bulk question generation into a file-based UI (answer at your own pace).
+- **[`gsd:discuss-phase`](references/commands/phase/discuss-phase.md)**: Gather phase context through adaptive questioning before planning. Use --all to skip area selection and discuss all gray areas interactively. Use --auto to skip interactive questions (Antigravity picks recommended defaults). Use --chain for interactive discuss followed by automatic plan+execute. Use --power for bulk question generation into a file-based UI (answer at your own pace).
 - **[`gsd:execute-phase`](references/commands/phase/execute-phase.md)**: Execute all plans in a phase with wave-based parallelization
 - **[`gsd:insert-phase`](references/commands/phase/insert-phase.md)**: Insert urgent work as decimal phase (e.g., 72.1) between existing phases
 - **[`gsd:list-phase-assumptions`](references/commands/phase/list-phase-assumptions.md)**: Surface Antigravity's assumptions about a phase approach before planning
@@ -218,4 +240,4 @@ General documentation on the GSD philosophy, usage patterns, and configuration.
 5.  **CLI Invocation**: `gsd-tools` is **NOT** a global command. Always invoke it with the full node path: `node .agent/skills/gsd/bin/gsd-tools.cjs <command> [args]`. Never run `gsd-tools` bare.
 ---
-*Generated by gsd-converter on 2026-04-08*
+*Generated by gsd-converter on 2026-04-18*
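
The CLI Invocation rule in the last hunk says `gsd-tools` must always be run through `node` with the full script path. A minimal sketch of a helper that enforces this is shown here; the function name `gsd_tools_argv` and the subcommand `example-command` are illustrative assumptions, not part of the package.

```python
from pathlib import PurePosixPath

# Script path copied from the CLI Invocation rule in SKILL.md.
GSD_TOOLS = PurePosixPath(".agent/skills/gsd/bin/gsd-tools.cjs")


def gsd_tools_argv(command: str, *args: str) -> list[str]:
    """Build an argv list that always goes through node, never bare gsd-tools."""
    if not command:
        raise ValueError("a gsd-tools command is required")
    return ["node", str(GSD_TOOLS), command, *args]


# "example-command" is a placeholder, not a real gsd-tools subcommand.
print(gsd_tools_argv("example-command", "--flag"))
```

The resulting list can be passed to `subprocess.run` unchanged, which avoids shell quoting issues.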

package/.agent/skills/gsd/VERSION CHANGED

@@ -1 +1 @@
-1.34.2
+1.37.1

package/.agent/skills/gsd/assets/templates/AI-SPEC.md ADDED

@@ -0,0 +1,246 @@
+# AI-SPEC — Phase {N}: {phase_name}
+> AI design contract generated by `/gsd-ai-integration-phase`. Consumed by `gsd-planner` and `gsd-eval-auditor`.
+> Locks framework selection, implementation guidance, and evaluation strategy before planning begins.
+---
+## 1. System Classification
+**System Type:** <!-- RAG | Multi-Agent | Conversational | Extraction | Autonomous Agent | Content Generation | Code Automation | Hybrid -->
+**Description:**
+<!-- One-paragraph description of what this AI system does, who uses it, and what "good" looks like -->
+**Critical Failure Modes:**
+<!-- The 3-5 behaviors that absolutely cannot go wrong in this system -->
+1.
+2.
+3.
+---
+## 1b. Domain Context
+> Researched by `gsd-domain-researcher`. Grounds the evaluation strategy in domain expert knowledge.
+**Industry Vertical:** <!-- healthcare | legal | finance | customer service | education | developer tooling | e-commerce | etc. -->
+**User Population:** <!-- who uses this system and in what context -->
+**Stakes Level:** <!-- Low | Medium | High | Critical -->
+**Output Consequence:** <!-- what happens downstream when the AI output is acted on -->
+### What Domain Experts Evaluate Against
+<!-- Domain-specific rubric ingredients — in practitioner language, not AI jargon -->
+<!-- Format: Dimension / Good (expert accepts) / Bad (expert flags) / Stakes / Source -->
+### Known Failure Modes in This Domain
+<!-- Domain-specific failure modes from research — not generic hallucination, but how it manifests here -->
+### Regulatory / Compliance Context
+<!-- Relevant regulations or constraints — or "None identified" if genuinely none apply -->
+### Domain Expert Roles for Evaluation
+| Role | Responsibility |
+|------|---------------|
+| <!-- e.g., Senior practitioner --> | <!-- Dataset labeling / rubric calibration / production sampling --> |
+---
+## 2. Framework Decision
+**Selected Framework:** <!-- e.g., LlamaIndex v0.10.x -->
+**Version:** <!-- Pin the version -->
+**Rationale:**
+<!-- Why this framework fits this system type, team context, and production requirements -->
+**Alternatives Considered:**
+| Framework | Ruled Out Because |
+|-----------|------------------|
+| | |
+**Vendor Lock-In Accepted:** <!-- Yes / No / Partial — document the trade-off consciously -->
+---
+## 3. Framework Quick Reference
+> Fetched from official docs by `gsd-ai-researcher`. Distilled for this specific use case.
+### Installation
+```bash
+# Install command(s)
+```
+### Core Imports
+```python
+# Key imports for this use case
+```
+### Entry Point Pattern
+```python
+# Minimal working example for this system type
+```
+### Key Abstractions
+<!-- Framework-specific concepts the developer must understand before coding -->
+| Concept | What It Is | When You Use It |
+|---------|-----------|-----------------|
+| | | |
+### Common Pitfalls
+<!-- Gotchas specific to this framework and system type — from docs, issues, and community reports -->
+1.
+2.
+3.
+### Recommended Project Structure
+```
+project/
+├── # Framework-specific folder layout
+```
+---
+## 4. Implementation Guidance
+**Model Configuration:**
+<!-- Which model(s), temperature, max tokens, and other key parameters -->
+**Core Pattern:**
+<!-- The primary implementation pattern for this system type in this framework -->
+**Tool Use:**
+<!-- Tools/integrations needed and how to configure them -->
+**State Management:**
+<!-- How state is persisted, retrieved, and updated -->
+**Context Window Strategy:**
+<!-- How to manage context limits for this system type -->
+---
+## 4b. AI Systems Best Practices
+> Written by `gsd-ai-researcher`. Cross-cutting patterns every developer building AI systems needs — independent of framework choice.
+### Structured Outputs with Pydantic
+<!-- Framework-specific Pydantic integration pattern for this use case -->
+<!-- Include: output model definition, how the framework uses it, retry logic on validation failure -->
+```python
+# Pydantic output model for this system type
+```
+### Async-First Design
+<!-- How async is handled in this framework, the one common mistake, and when to stream vs. await -->
+### Prompt Engineering Discipline
+<!-- System vs. user prompt separation, few-shot guidance, token budget strategy -->
+### Context Window Management
+<!-- Strategy specific to this system type: RAG chunking / conversation summarisation / agent compaction -->
+### Cost and Latency Budget
+<!-- Per-call cost estimate, caching strategy, sub-task model routing -->
+---
+## 5. Evaluation Strategy
+### Dimensions
+| Dimension | Rubric (Pass/Fail or 1-5) | Measurement Approach | Priority |
+|-----------|--------------------------|---------------------|----------|
+| | | Code / LLM Judge / Human | Critical / High / Medium |
+### Eval Tooling
+**Primary Tool:** <!-- e.g., RAGAS + Langfuse -->
+**Setup:**
+```bash
+# Install and configure
+```
+**CI/CD Integration:**
+```bash
+# Command to run evals in CI/CD pipeline
+```
+### Reference Dataset
+**Size:** <!-- e.g., 20 examples to start -->
+**Composition:**
+<!-- What scenario types the dataset covers: critical paths, edge cases, failure modes -->
+**Labeling:**
+<!-- Who labels examples and how (domain expert, LLM judge with calibration, etc.) -->
+---
+## 6. Guardrails
+### Online (Real-Time)
+| Guardrail | Trigger | Intervention |
+|-----------|---------|--------------|
+| | | Block / Escalate / Flag |
+### Offline (Flywheel)
+| Metric | Sampling Strategy | Action on Degradation |
+|--------|------------------|----------------------|
+| | | |
+---
+## 7. Production Monitoring
+**Tracing Tool:** <!-- e.g., Langfuse self-hosted -->
+**Key Metrics to Track:**
+<!-- 3-5 metrics that will be monitored in production -->
+**Alert Thresholds:**
+<!-- When to page/alert -->
+**Smart Sampling Strategy:**
+<!-- How to select interactions for human review — signal-based filters -->
+---
+## Checklist
+- [ ] System type classified
+- [ ] Critical failure modes identified (≥ 3)
+- [ ] Domain context researched (Section 1b: vertical, stakes, expert criteria, failure modes)
+- [ ] Regulatory/compliance context identified or explicitly noted as none
+- [ ] Domain expert roles defined for evaluation involvement
+- [ ] Framework selected with rationale documented
+- [ ] Alternatives considered and ruled out
+- [ ] Framework quick reference written (install, imports, pattern, pitfalls)
+- [ ] AI systems best practices written (Section 4b: Pydantic, async, prompt discipline, context)
+- [ ] Evaluation dimensions grounded in domain rubric ingredients
+- [ ] Each eval dimension has a concrete rubric (Good/Bad in domain language)
+- [ ] Eval tooling selected — Arize Phoenix default confirmed or override noted
+- [ ] Reference dataset spec written (size ≥ 10, composition + labeling defined)
+- [ ] CI/CD eval integration specified
+- [ ] Online guardrails defined
+- [ ] Production monitoring configured (tracing tool + sampling strategy)
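
The new AI-SPEC.md template ends with a task-list checklist. As a rough sketch of how a filled-in copy could be checked for completeness, the snippet below splits task-list lines into done and pending items. The helper name `checklist_status` is an assumption for illustration; the diff does not show how GSD itself reads the checklist.

```python
import re

# Matches "- [ ] text" and "- [x] text" task-list lines.
CHECKBOX = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


def checklist_status(markdown: str) -> tuple[list[str], list[str]]:
    """Return (done, pending) item texts found in the given Markdown."""
    done, pending = [], []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if not match:
            continue  # not a task-list line
        target = pending if match.group(1) == " " else done
        target.append(match.group(2).strip())
    return done, pending


sample = """\
- [x] System type classified
- [ ] Framework selected with rationale documented
- [ ] Online guardrails defined
"""
done, pending = checklist_status(sample)
print(f"{len(done)} done, {len(pending)} pending")
```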

package/.agent/skills/gsd/assets/templates/DEBUG.md CHANGED

@@ -20,7 +20,9 @@ updated: [ISO timestamp]
 hypothesis: [current theory being tested]
 test: [how testing it]
 expecting: [what result means if true/false]
-next_action: [immediate next step]
+next_action: [immediate next step — be specific, not "continue investigating"]
+reasoning_checkpoint: null  <!-- populated before every fix attempt — see structured_returns -->
+tdd_checkpoint: null  <!-- populated when tdd_mode is active after root cause confirmed -->
 ## Symptoms
 <!-- Written during gathering, then immutable -->
@@ -69,7 +71,10 @@ files_changed: []
 - OVERWRITE entirely on each update
 - Always reflects what Antigravity is doing RIGHT NOW
 - If Antigravity reads this after /clear, it knows exactly where to resume
-- Fields: hypothesis, test, expecting, next_action
+- Fields: hypothesis, test, expecting, next_action, reasoning_checkpoint, tdd_checkpoint
+- `next_action`: must be concrete and actionable — bad: "continue investigating"; good: "Add logging at line 47 of auth.js to observe token value before jwt.verify()"
+- `reasoning_checkpoint`: OVERWRITE before every fix_and_verify — five-field structured reasoning record (hypothesis, confirming_evidence, falsification_test, fix_rationale, blind_spots)
+- `tdd_checkpoint`: OVERWRITE during TDD red/green phases — test file, name, status, failure output
 **Symptoms:**
 - Written during initial gathering phase
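
The DEBUG.md changes describe `reasoning_checkpoint` as a five-field record and require `next_action` to be concrete. The sketch below models both under stated assumptions: the class name, the `is_concrete` heuristic, and the list of vague phrases are illustrative, and the sample values are invented for the example.

```python
from dataclasses import dataclass, fields

# Phrases treated as too vague for next_action; this list is illustrative only.
VAGUE_PHRASES = ("continue investigating", "keep debugging", "look into it")


@dataclass
class ReasoningCheckpoint:
    # The five field names come from the DEBUG.md template text.
    hypothesis: str
    confirming_evidence: str
    falsification_test: str
    fix_rationale: str
    blind_spots: str

    def missing_fields(self) -> list[str]:
        """Names of fields that are empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


def is_concrete(next_action: str) -> bool:
    """Heuristic: reject empty text and the known vague phrases."""
    text = next_action.strip().lower()
    return bool(text) and not any(phrase in text for phrase in VAGUE_PHRASES)


checkpoint = ReasoningCheckpoint(
    hypothesis="Token is expired before jwt.verify() runs",
    confirming_evidence="Log shows exp earlier than the current time",
    falsification_test="Issue a fresh token and retry",
    fix_rationale="",
    blind_spots="Clock skew between services not checked",
)
print(checkpoint.missing_fields())
print(is_concrete("continue investigating"))
```

A record with any missing field would signal that the fix attempt should not start yet.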

package/.agent/skills/gsd/assets/templates/config.json CHANGED

@@ -1,48 +1,56 @@
-{
-  "mode": "interactive",
-  "granularity": "standard",
-  "workflow": {
-    "research": true,
-    "plan_check": true,
-    "verifier": true,
-    "auto_advance": false,
-    "nyquist_validation": true,
-    "security_enforcement": true,
-    "security_asvs_level": 1,
-    "security_block_on": "high",
-    "discuss_mode": "discuss",
-    "research_before_questions": false
-  },
-  "planning": {
-    "commit_docs": true,
-    "search_gitignored": false,
-    "sub_repos": []
-  },
-  "parallelization": {
-    "enabled": true,
-    "plan_level": true,
-    "task_level": false,
-    "skip_checkpoints": true,
-    "max_concurrent_agents": 3,
-    "min_plans_for_parallel": 2
-  },
-  "gates": {
-    "confirm_project": true,
-    "confirm_phases": true,
-    "confirm_roadmap": true,
-    "confirm_breakdown": true,
-    "confirm_plan": true,
-    "execute_next_plan": true,
-    "issues_review": true,
-    "confirm_transition": true
-  },
-  "safety": {
-    "always_confirm_destructive": true,
-    "always_confirm_external_services": true
-  },
-  "hooks": {
-    "context_warnings": true
-  },
-  "project_code": null,
-  "agent_skills": {}
-}
+{
+  "mode": "interactive",
+  "granularity": "standard",
+  "workflow": {
+    "research": true,
+    "plan_check": true,
+    "verifier": true,
+    "auto_advance": false,
+    "nyquist_validation": true,
+    "security_enforcement": true,
+    "security_asvs_level": 1,
+    "security_block_on": "high",
+    "discuss_mode": "discuss",
+    "research_before_questions": false,
+    "code_review_command": null,
+    "plan_bounce": false,
+    "plan_bounce_script": null,
+    "plan_bounce_passes": 2,
+    "cross_ai_execution": false,
+    "cross_ai_command": "",
+    "cross_ai_timeout": 300
+  },
+  "planning": {
+    "commit_docs": true,
+    "search_gitignored": false,
+    "sub_repos": []
+  },
+  "parallelization": {
+    "enabled": true,
+    "plan_level": true,
+    "task_level": false,
+    "skip_checkpoints": true,
+    "max_concurrent_agents": 3,
+    "min_plans_for_parallel": 2
+  },
+  "gates": {
+    "confirm_project": true,
+    "confirm_phases": true,
+    "confirm_roadmap": true,
+    "confirm_breakdown": true,
+    "confirm_plan": true,
+    "execute_next_plan": true,
+    "issues_review": true,
+    "confirm_transition": true
+  },
+  "safety": {
+    "always_confirm_destructive": true,
+    "always_confirm_external_services": true
+  },
+  "hooks": {
+    "context_warnings": true
+  },
+  "project_code": null,
+  "agent_skills": {},
+  "antigravity_md_path": "./ANTIGRAVITY.md"
+}
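
Although the hunk rewrites the whole file, the effective change is seven new `workflow` keys and one new top-level key, `antigravity_md_path`. A possible way to bring an older project config in line with the new template is sketched below. The function `with_new_defaults` is hypothetical; the diff does not show whether the package performs any such migration itself.

```python
import copy

# Keys and default values added to the template in this release.
NEW_WORKFLOW_DEFAULTS = {
    "code_review_command": None,
    "plan_bounce": False,
    "plan_bounce_script": None,
    "plan_bounce_passes": 2,
    "cross_ai_execution": False,
    "cross_ai_command": "",
    "cross_ai_timeout": 300,
}
NEW_TOP_LEVEL_DEFAULTS = {"antigravity_md_path": "./ANTIGRAVITY.md"}


def with_new_defaults(config: dict) -> dict:
    """Return a copy of config with missing new keys filled in; existing values win."""
    merged = copy.deepcopy(config)
    workflow = merged.setdefault("workflow", {})
    for key, value in NEW_WORKFLOW_DEFAULTS.items():
        workflow.setdefault(key, value)
    for key, value in NEW_TOP_LEVEL_DEFAULTS.items():
        merged.setdefault(key, value)
    return merged


old = {"mode": "interactive", "workflow": {"research": True, "plan_bounce": True}}
new = with_new_defaults(old)
print(new["workflow"]["plan_bounce"], new["workflow"]["cross_ai_timeout"])
```

Using `setdefault` keeps any value the user has already set, so only absent keys take the template defaults.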

package/.agent/skills/gsd/assets/templates/research.md CHANGED

@@ -38,6 +38,18 @@ Template for `.planning/phases/XX-name/{phase_num}-RESEARCH.md` - comprehensive
 **If no CONTEXT.md exists:** Write "No user constraints - all decisions at Antigravity's discretion"
 </user_constraints>
+<architectural_responsibility_map>
+## Architectural Responsibility Map
+Map each phase capability to its standard architectural tier owner before diving into framework research. This prevents tier misassignment from propagating into plans.
+| Capability | Primary Tier | Secondary Tier | Rationale |
+|------------|-------------|----------------|-----------|
+| [capability from phase description] | [Browser/Client, Frontend Server, API/Backend, CDN/Static, or Database/Storage] | [secondary tier or —] | [why this tier owns it] |
+**If single-tier application:** Write "Single-tier application — all capabilities reside in [tier]" and omit the table.
+</architectural_responsibility_map>
 <research_summary>
 ## Summary
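
The Architectural Responsibility Map added in the hunk above restricts each capability to one of five tiers and replaces the table with a single sentence for single-tier applications. The sketch below renders such a map under those rules. The function name, the use of `None` for "no secondary tier", and the sample rows are assumptions made for illustration.

```python
from typing import Optional

# Tier names copied from the template's table placeholder.
TIERS = (
    "Browser/Client",
    "Frontend Server",
    "API/Backend",
    "CDN/Static",
    "Database/Storage",
)
DASH = "\u2014"  # placeholder the template uses when there is no secondary tier

Row = tuple[str, str, Optional[str], str]  # capability, primary, secondary, rationale


def render_responsibility_map(rows: list[Row]) -> str:
    """Render rows as the template's Markdown table, or the single-tier sentence."""
    primaries = set()
    for capability, primary, secondary, _rationale in rows:
        if primary not in TIERS:
            raise ValueError(f"{capability}: unknown primary tier {primary!r}")
        if secondary is not None and secondary not in TIERS:
            raise ValueError(f"{capability}: unknown secondary tier {secondary!r}")
        primaries.add(primary)
    if len(primaries) == 1 and all(row[2] is None for row in rows):
        tier = next(iter(primaries))
        return f"Single-tier application {DASH} all capabilities reside in {tier}"
    lines = [
        "| Capability | Primary Tier | Secondary Tier | Rationale |",
        "|------------|-------------|----------------|-----------|",
    ]
    for capability, primary, secondary, rationale in rows:
        lines.append(f"| {capability} | {primary} | {secondary or DASH} | {rationale} |")
    return "\n".join(lines)


rows = [
    ("Login form", "Browser/Client", "API/Backend", "Renders and validates input"),
    ("Session storage", "Database/Storage", None, "Persists session records"),
]
print(render_responsibility_map(rows))
```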
@@ -82,6 +94,20 @@ yarn add [packages]
 <architecture_patterns>
 ## Architecture Patterns
+### System Architecture Diagram
+Architecture diagrams MUST show data flow through conceptual components, not file listings.
+Requirements:
+- Show entry points (how data/requests enter the system)
+- Show processing stages (what transformations happen, in what order)
+- Show decision points and branching paths
+- Show external dependencies and service boundaries
+- Use arrows to indicate data flow direction
+- A reader should be able to trace the primary use case from input to output by following the arrows
+File-to-implementation mapping belongs in the Component Responsibilities table, not in the diagram.
 ### Recommended Project Structure
 ```
 src/
@@ -300,6 +326,20 @@ npm install three @react-three/fiber @react-three/drei @react-three/rapier zusta
 <architecture_patterns>
 ## Architecture Patterns
+### System Architecture Diagram
+Architecture diagrams MUST show data flow through conceptual components, not file listings.
+Requirements:
+- Show entry points (how data/requests enter the system)
+- Show processing stages (what transformations happen, in what order)
+- Show decision points and branching paths
+- Show external dependencies and service boundaries
+- Use arrows to indicate data flow direction
+- A reader should be able to trace the primary use case from input to output by following the arrows
+File-to-implementation mapping belongs in the Component Responsibilities table, not in the diagram.
 ### Recommended Project Structure
 ```
 src/