npm - ultimate-pi - Versions diffs - 0.1.0 → 0.1.3 - Mend

ultimate-pi 0.1.0 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (509)

package/vault/wiki/questions/Research: Gemini CLI SOTA Harness Integration.md ADDED

@@ -0,0 +1,166 @@
+---
+type: synthesis
+title: "Research: Gemini CLI SOTA + Harness Integration from First Principles"
+created: 2026-05-01
+updated: 2026-05-01
+tags:
+  - research
+  - harness
+  - gemini-cli
+  - coding-agents
+  - first-principles
+status: developing
+related:
+  - "[[harness-implementation-plan]]"
+  - "[[harness]]"
+  - "[[harness-control-frameworks]]"
+  - "[[gemini-cli-architecture]]"
+  - "[[harness-engineering-first-principles]]"
+  - "[[agent-skills-pattern]]"
+sources:
+  - "[[Source: Google Gemini CLI Architecture Docs]]"
+  - "[[Source: Google Blog - Gemini CLI Announcement]]"
+  - "[[Source: Render AI Coding Agents Benchmark 2025]]"
+  - "[[Source: Martin Fowler - Harness Engineering]]"
+  - "[[Source: LangChain - Anatomy of Agent Harness]]"
+  - "[[Source: OpenAI Harness Engineering Five Principles]]"
+  - "[[Source: Augment - Harness Engineering for AI Coding Agents]]"
+  - "[[Source: Gemini CLI Changelogs]]"
+---
+# Research: Gemini CLI SOTA + Harness Integration from First Principles
+## Overview
+Gemini CLI (launched June 2025, now v0.40+) introduced a composable agent harness with 15+ SOTA primitives: agent skills with progressive disclosure, plan mode, subagent orchestration, policy engine, hooks/middleware, context compression, and multi-registry architecture. This research maps those innovations to harness engineering first principles and identifies 7 integration opportunities for the Ultimate-PI harness, rethinking from first principles rather than feature-copying.
+## Key Findings
+### Gemini CLI SOTA Innovations (by harness layer)
+1. **Agent Skills (v0.23, Jan 2026)** — Progressive disclosure system: skills loaded on-demand via activation tool, solving context rot. Formalized with frontmatter, `/memory inbox` for review, skill-creator meta-skill. Our `.pi/skills/` implements the same primitive but without formal activation mechanism. (Source: [[Source: Gemini CLI Changelogs]])
+2. **Plan Mode (v0.29, Feb 2026)** — Structured task decomposition with `/plan`, todo tracking, annotation support, research subagents, external editor integration. Enabled by default v0.34. Parallel to our L2 Structured Planning. (Source: [[Source: Gemini CLI Changelogs]])
+3. **Codebase Investigator Subagent (v0.12, Oct 2025)** — JIT context discovery: automatically explores workspace, resolves relevant files. Enhanced with JIT context injection in v0.36. Parallel to our Gitingest + ck semantic search for L3 Grounding. (Source: [[Source: Google Gemini CLI Architecture Docs]])
+4. **Context Compression Service (v0.38, Apr 2026)** — Advanced context management distilling conversation history. Configurable compression threshold. Parallel to our pi-lean-ctx. (Source: [[Source: Gemini CLI Changelogs]])
+5. **Chapters Narrative Flow (v0.38, Apr 2026)** — Groups agent interactions by intent and tool usage for session structure and narrative continuity. Novel concept; no parallel in our harness. (Source: [[Source: Gemini CLI Changelogs]])
+6. **Persistent Policy Engine (v0.18–v0.38)** — Fine-grained tool execution policies: project-level policies, MCP server wildcards, tool annotation matching, persistent "Always Allow" decisions, context-aware approvals. Pre-execution gates. We have drift detection but not pre-execution policy gates. (Source: [[Source: Gemini CLI Changelogs]])
+7. **Subagents + Remote Agents (v0.32, Mar 2026)** — Generalist agent for task routing, JIT context injection, resilient tool rejection with contextual feedback. A2A protocol support for remote agents (v0.33). Parallel to our Archon Orchestration L7. (Source: [[Source: Gemini CLI Changelogs]])
+8. **Event-Driven Hooks Architecture (v0.27, Feb 2026)** — Event-driven scheduler for tool execution, hooks for compaction/continuation/lint checks. MessageBus injection for internal communication. We have pipeline phases but no event-driven hook system. (Source: [[Source: Gemini CLI Changelogs]])
+9. **Four-Tier Memory System (v0.39, Apr 2026)** — Prompt-driven memory: transitioned from static files to four-tier system. `/memory inbox` for reviewing skills extracted during sessions. Auto Memory (experimental). (Source: [[Source: Gemini CLI Changelogs]])
+10. **Multi-Registry Architecture (v0.36, Apr 2026)** — Extensions, skills, MCP servers all managed as registries. Extensions loaded in parallel (v0.32). Extensions ecosystem: 20+ partner extensions launched by v0.12. (Source: [[Source: Gemini CLI Changelogs]])
+11. **Browser Agent (v0.31, Feb 2026)** — Experimental browser agent with persistent sessions, dynamic tool discovery. Chrome DevTools Protocol access for DOM snapshots, screenshots, navigation. (Source: [[Source: Gemini CLI Changelogs]])
+12. **Model Routing (v0.12, Oct 2025)** — Intelligently picks Flash for simple tasks, Pro for complex. Configurable model selection. Our model-adaptive profiles do similar but static. (Source: [[Source: Gemini CLI Changelogs]])
+13. **Sandboxing Stack (v0.34–v0.37)** — Docker, gVisor, LXC, macOS Seatbelt, Windows sandboxing. Dynamic sandbox expansion. Tool isolation via SandboxManager. (Source: [[Source: Gemini CLI Changelogs]])
+14. **Git Worktrees (v0.36, Apr 2026)** — Isolated parallel sessions via git worktrees. Allows multiple agents working on same repo without conflicts. (Source: [[Source: Gemini CLI Changelogs]])
+15. **Extensions Ecosystem (v0.8, Sep 2025)** — Partner extensions, custom extensions, A2A protocol. SDK package (v0.30) enabling dynamic system instructions. (Source: [[Source: Gemini CLI Changelogs]])
+### Benchmark Standing (Render, Aug 2025)
+- **Gemini CLI scored 6.8/10** overall (tied with Claude Code), behind Cursor (8/10)
+- **Context: 9/10** — best in class due to 1M token window + automatic codebase loading
+- **Quality: 7/10** — solid on production refactors, weak on vibe coding/greenfield
+- **Cost: 8/10** — free tier: 60 req/min, 1,000 req/day (industry best)
+- **Weakest: speed (5/10), integration (5/10)** — slow context loading, no native IDE integration
+- Pattern: excels at *editing existing codebases* (context-driven), struggles with *generating from scratch* (Source: [[Source: Render AI Coding Agents Benchmark 2025]])
+## Key Concepts
+- [[harness-engineering-first-principles]] — Agent = Model + Harness. Feedforward + Feedback. Steering loop.
+- [[agent-skills-pattern]] — Progressive disclosure: skills loaded on-demand to prevent context rot
+- [[policy-engine-pattern]] — Pre-execution gates: deterministic constraints vs probabilistic compliance
+- [[subagent-orchestration]] — Generalist router + specialist agents with JIT context
+- [[context-compression-techniques]] — Compaction, tool call offloading, progressive disclosure
+## Integration Opportunities (First Principles)
+### Gap Analysis: Ultimate-PI vs Gemini CLI SOTA
+| Harness Layer | Gemini CLI Has | Ultimate-PI Has | Gap |
+|---|---|---|---|
+| Progressive Skills | Formal activation, `/memory inbox` | `.pi/skills/` directory | No activation mechanism, no inbox review |
+| Planning | Plan Mode with todo tracking, research subagents | L2 Structured Planning | No research subagents, no annotation |
+| Grounding | Codebase Investigator (JIT) | Gitingest + ck semantic search | No JIT context discovery |
+| Context Mgmt | Compression Service, Chapters | pi-lean-ctx | No chapters/narrative flow |
+| Policy/Safety | Policy Engine, pre-execution gates | Drift detection (post-hoc) | No pre-execution policy gates |
+| Orchestration | Subagents + Remote Agents (A2A) | Archon Orchestration L7 | No A2A, no generalist router |
+| Hooks | Event-driven scheduler, hooks/middleware | Pipeline phases (static) | No event-driven hook system |
+| Memory | Four-tier memory, Auto Memory | Wiki-based persistent memory | No auto-memory extraction |
+| Browser | Browser Agent, CDP access | None | No browser/visual verification |
+| Worktrees | Git worktrees for isolation | None | No isolated parallel sessions |
+### Recommended Integration Priority (First Principles)
+**Why these? Not "feature copy Gemini CLI" — each derived from harness engineering first principles (Feedforward-Feedback, Steering Loop, Keep Quality Left).**
+#### Priority 1: Pre-Execution Policy Gates (P-F1)
+- **First principle**: Mechanical enforcement over documentation (OpenAI Principle 3). Deterministic constraints prevent failures before they occur.
+- **What**: Add pre-execution policy engine as L2.7 (between Plan and Drift). Reject tool calls that violate architectural invariants before execution.
+- **Leverage**: Our existing drift detection paradigms provide the detection logic; invert them from post-hoc to pre-execution.
+- **Token budget**: ~500 tokens per policy check.
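A minimal sketch of the pre-execution gate described in Priority 1, assuming a tool call can be modeled as a tool name plus an argument dict. `ToolCall`, `gate`, and the two example policies are hypothetical names for illustration; they are not part of Ultimate-PI or Gemini CLI.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict


# A policy returns a rejection reason, or None to allow the call.
Policy = Callable[[ToolCall], Optional[str]]


def no_writes_outside_src(call: ToolCall) -> Optional[str]:
    # Hypothetical architectural invariant: file writes stay under src/.
    path = str(call.args.get("path", ""))
    if call.tool == "write_file" and not path.startswith("src/"):
        return "writes are restricted to src/"
    return None


def no_force_push(call: ToolCall) -> Optional[str]:
    # Hypothetical invariant: shell commands may not force-push.
    command = str(call.args.get("command", ""))
    if call.tool == "shell" and "push --force" in command:
        return "force pushes are not allowed"
    return None


def gate(call: ToolCall, policies: list[Policy]) -> tuple[bool, list[str]]:
    """Run every policy before execution; any rejection blocks the call."""
    reasons = [r for p in policies if (r := p(call)) is not None]
    return (not reasons, reasons)


POLICIES = [no_writes_outside_src, no_force_push]
allowed, reasons = gate(ToolCall("write_file", {"path": "docs/x.md"}), POLICIES)
print(allowed, reasons)
```

Because each policy is a plain deterministic function, the same checks could in principle be reused post-hoc for drift detection, which is the inversion the Leverage bullet points at.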
+#### Priority 2: Skills Activation Mechanism (P-F2)
+- **First principle**: Progressive disclosure prevents context rot. "What the agent can't see doesn't exist" (OpenAI Principle 1) works both ways — irrelevant tools degrade performance.
+- **What**: Add `activate_skill` tool pattern and `/memory inbox` (skill review queue). Skills loaded on-demand instead of all at startup.
+- **Leverage**: Our `.pi/skills/` directory structure already supports this; add frontmatter metadata and an activation registry.
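The activation registry from Priority 2 might look roughly like the sketch below. It assumes each skill file carries `name` and `description` keys in simple `key: value` frontmatter; `Skill`, `SkillRegistry`, and `parse_skill` are hypothetical names, and the frontmatter parser is deliberately naive rather than a real YAML parser.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str
    body: str


def parse_skill(text: str) -> Skill:
    """Split a skill document into frontmatter fields and body."""
    _, header, body = text.split("---", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return Skill(meta["name"], meta.get("description", ""), body.strip())


@dataclass
class SkillRegistry:
    skills: dict[str, Skill] = field(default_factory=dict)
    active: set[str] = field(default_factory=set)

    def register(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def index(self) -> str:
        # Only names and descriptions go into the startup context.
        return "\n".join(f"{s.name}: {s.description}" for s in self.skills.values())

    def activate_skill(self, name: str) -> str:
        # The full body is disclosed only when the agent asks for it.
        skill = self.skills[name]
        self.active.add(name)
        return skill.body


registry = SkillRegistry()
registry.register(parse_skill("---\nname: review\ndescription: Review a diff\n---\nStep 1: read the diff."))
print(registry.index())
```

The startup cost is then one line per skill, and the body of a skill enters the context only after an explicit `activate_skill` call.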
+#### Priority 3: Research Subagents for Planning (P-F3)
+- **First principle**: Ask what capability is missing, not why the agent is failing (OpenAI Principle 2).
+- **What**: During L2 Structured Planning, spawn lightweight research subagents that explore codebase, fetch docs, validate assumptions before plan is committed.
+- **Leverage**: Gitingest + ck already provide codebase exploration; context7 provides docs.
+#### Priority 4: Event-Driven Hooks Middleware (P-F4)
+- **First principle**: The steering loop requires feedback after every action, not just at phase boundaries.
+- **What**: Add hook system: pre-tool-execution (policy check), post-tool-execution (drift check), pre-response (compaction), post-session (memory extraction).
+- **Leverage**: Our pipeline phases map naturally to hook points.
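A small sketch of the four hook points named in Priority 4, under the assumption that a hook is a callable receiving an event dict and that a `pre_tool` handler returning `False` vetoes the call. `HookBus` and `run_tool` are hypothetical names, not an existing harness or Gemini CLI API.

```python
from collections import defaultdict
from typing import Any, Callable

HOOK_POINTS = ("pre_tool", "post_tool", "pre_response", "post_session")
Handler = Callable[[dict], Any]


class HookBus:
    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, point: str, handler: Handler) -> None:
        if point not in HOOK_POINTS:
            raise ValueError(f"unknown hook point: {point}")
        self._handlers[point].append(handler)

    def emit(self, point: str, event: dict) -> list:
        # Handlers run in registration order; their return values are collected.
        return [handler(event) for handler in self._handlers[point]]


def run_tool(bus: HookBus, tool: str, fn: Callable[..., Any], **args: Any) -> dict:
    """Wrap one tool call with a policy hook before and a drift hook after."""
    event = {"tool": tool, "args": args}
    if any(verdict is False for verdict in bus.emit("pre_tool", event)):
        return {"tool": tool, "rejected": True}
    event["result"] = fn(**args)
    bus.emit("post_tool", event)
    return event


bus = HookBus()
bus.on("pre_tool", lambda e: e["tool"] != "rm")         # policy check
bus.on("post_tool", lambda e: print("ran", e["tool"]))  # stand-in for a drift check
run_tool(bus, "echo", lambda text: text.upper(), text="hi")
```

Feedback then fires after every action instead of only at phase boundaries, which is the point of the first principle above.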
+#### Priority 5: Git Worktree Sessions (P-F5)
+- **First principle**: Give the agent eyes (OpenAI Principle 4) — but also give it isolated space to experiment.
+- **What**: Use git worktrees for isolated agent sessions. Agents work in worktrees; harness verifies before merging to main.
+- **Leverage**: Our adversarial verification L4 provides the merge gate.
+#### Priority 6: Chapters Narrative for Sessions (P-F6)
+- **First principle**: A map, not a manual (OpenAI Principle 5). Session structure helps humans steer.
+- **What**: Group agent actions into chapters by intent. Display chapter summaries during review.
+- **Leverage**: Wiki log already captures session actions; add structural grouping.
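One way to picture the structural grouping in Priority 6, assuming each logged action already carries an `intent` label and a `tool` name (how intents get assigned is out of scope here). `to_chapters` is a hypothetical helper.

```python
from itertools import groupby


def to_chapters(actions: list[dict]) -> list[dict]:
    """Group consecutive actions with the same intent into one chapter."""
    chapters = []
    for intent, group in groupby(actions, key=lambda a: a["intent"]):
        items = list(group)
        chapters.append({
            "intent": intent,
            "steps": len(items),
            "tools": sorted({a["tool"] for a in items}),
        })
    return chapters


log = [
    {"intent": "explore", "tool": "grep"},
    {"intent": "explore", "tool": "read_file"},
    {"intent": "edit", "tool": "write_file"},
    {"intent": "explore", "tool": "grep"},
]
for chapter in to_chapters(log):
    print(chapter)
```

Only consecutive runs are merged, so a return to an earlier intent opens a new chapter and the session order is preserved for review.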
+#### Priority 7: Browser Agent for Visual Verification (P-F7)
+- **First principle**: Give the agent eyes — visual verification catches what code checks miss.
+- **What**: Integrate browser automation (Playwright/CDP) for UI verification in L4 Adversarial Verification.
+- **Leverage**: Extend existing verification layer.
+## Contradictions
+- [[Source: Render AI Coding Agents Benchmark 2025]] says Gemini CLI struggled on greenfield (3/10), but [[Source: Google Blog - Gemini CLI Announcement]] positions it as "excels at coding" universally. Render's controlled test methodology is more credible as independent verification.
+- [[Source: LangChain - Anatomy of Agent Harness]] says "best harness for your task is NOT the one a model was post-trained with" — Terminal Bench 2.0 shows Opus 4.6 scores lower in Claude Code than in other harnesses. Counter-intuitive: model-specific harness may underperform.
+## Open Questions
+- How does Gemini CLI's Policy Engine handle conflicting policies across project/user/admin levels? Resolution mechanism unclear from docs.
+- Does Chapters narrative flow improve agent performance or just human review UX? No published metrics.
+- Can our model-adaptive profiles be extended to dynamic model routing (like Gemini CLI's auto-select Flash vs Pro) without destabilizing the multi-model contract?
+- How does Gemini CLI's Auto Memory (experimental) compare to our wiki-based persistent memory in terms of retrieval accuracy and context injection cost?
+- What is the token overhead of Gemini CLI's event-driven hooks architecture? Our static pipeline has ~15K tokens of overhead; dynamic hooks may be lower or higher.
+## Sources
+- [[Source: Google Gemini CLI Architecture Docs]] — official architecture: 2 packages (cli + core), ReAct loop, tool system
+- [[Source: Google Blog - Gemini CLI Announcement]] — launch: free tier, 1M token window, MCP, Google Search grounding
+- [[Source: Render AI Coding Agents Benchmark 2025]] — independent benchmark: Cursor 8/10, Gemini 6.8/10, strengths/weaknesses
+- [[Source: Martin Fowler - Harness Engineering]] — canonical framework: feedforward/feedback, computational/inferential, steering loop
+- [[Source: LangChain - Anatomy of Agent Harness]] — Agent = Model + Harness, harness primitives derivation
+- [[Source: OpenAI Harness Engineering Five Principles]] — 5 principles from 1M-line agent-built codebase
+- [[Source: Augment - Harness Engineering for AI Coding Agents]] — PEV loop, constraint layers, measurement metrics
+- [[Source: Gemini CLI Changelogs]] — feature evolution: skills, plan mode, policy engine, hooks, subagents

package/vault/wiki/questions/Research: GitHub Issues as Harness Spec Storage.md ADDED

@@ -0,0 +1,188 @@
+---
+type: synthesis
+title: "Research: GitHub Issues as Harness Spec Storage"
+created: 2026-04-30
+updated: 2026-04-30
+tags:
+  - research
+  - harness
+  - github-issues
+  - spec-storage
+  - cli
+  - persistence
+status: developing
+related:
+  - "[[harness-implementation-plan]]"
+  - "[[spec-hardening]]"
+  - "[[structured-planning]]"
+sources:
+  - "[[github-sub-issues-docs]]"
+  - "[[github-issue-dependencies-docs]]"
+  - "[[gh-sub-issue-extension]]"
+  - "[[gh-cli-sub-issue-rfc]]"
+  - "[[github-fork-issues-discussion]]"
+---
+# Research: GitHub Issues as Harness Spec Storage
+## Overview
+GitHub Issues can serve as cloud-persistent, cross-session spec storage for the agentic harness. GitHub's native sub-issues (parent-child hierarchies, April 2025) and issue dependencies (blocked-by/blocking) map directly to the harness's spec decomposition and task dependency graphs. The `gh` CLI lacks native sub-issue support but a community extension (`gh-sub-issue`) fills the gap. A dual-tier architecture — local JSON cache + GitHub Issue ledger — gives speed and cloud persistence without over-reliance on network.
+## Key Findings
+- **GitHub has native sub-issues since April 2025**: Up to 8 levels deep, 100 sub-issues per parent, cross-repo support (Source: [[github-sub-issues-docs]])
+- **Issue dependencies are a separate feature**: Blocked-by / blocking relationships define execution order, distinct from sub-issues which define decomposition (Source: [[github-issue-dependencies-docs]])
+- **`gh` CLI has NO native sub-issue support**: cli/cli#10298 open since Jan 2025, PR #13057 in progress. Currently requires REST API or community extension (Source: [[gh-cli-sub-issue-rfc]])
+- **Community extension `gh-sub-issue`** (yahsan2): 110 stars, MIT, provides `add`/`create`/`list`/`remove` commands for parent-child relationships via `gh` CLI (Source: [[gh-sub-issue-extension]])
+- **Each hardened spec maps 1:1 to a GitHub Issue** with template body, labels for machine-readable state, and sub-issues for decomposed tasks
+- **Dual-tier architecture recommended**: Local `.pi/harness/specs/<id>.json` for speed, GitHub Issue for cross-session ledger
+- **Fork isolation is handled automatically**: Forks get their own issue tracker (enabled Dec 2025). Spec cache is gitignored — no stale upstream references leak into forks. `ultimate-pi harness init` bootstraps a fork's issue tracker with labels + templates
+## Key Entities
+- [[github-sub-issues-docs|GitHub Sub-Issues Feature]]: Native parent-child issue hierarchies in GitHub Issues
+- [[gh-sub-issue-extension|gh-sub-issue Extension]]: Community `gh` CLI extension (110 stars, MIT) bridging the CLI gap
+- [[gh-cli-sub-issue-rfc|gh CLI Sub-Issue RFC]]: cli/cli#10298 — official feature request for `--parent` on `gh issue create`
+## Key Concepts
+- **Sub-Issue vs Dependency**: Sub-issues = decomposition ("A contains B"). Dependencies = execution order ("A blocks B"). Both are native GitHub features with separate APIs.
+- **Dual-Tier Architecture**: Local cache (fast, offline-capable) + remote ledger (persistent, queryable). Local JSON is always the primary execution path. GitHub Issues are created at major state transitions only.
+- **Issue-as-Spec Template**: A HardenedSpec maps to an issue body with structured sections (intent_summary, success_criteria, anti_criteria, definition_of_done). Labels encode machine-readable state (`harness-spec`, `layer-{n}`, `status:{status}`).
+- **Execution Ledger**: Issue comments serve as an immutable audit trail. Each harness phase appends a status update comment.
+- **Phase-to-Issue Mapping**: Not every micro-step creates an issue. Only major state transitions: spec creation (P1), plan creation (P2), phase completion checkpoints (P8).
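The Issue-as-Spec template above can be sketched as a pure rendering step. The field names come from the concept list; the `HardenedSpec` dataclass shape, the `layer` and `status` defaults, and the Markdown section layout are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class HardenedSpec:
    intent_summary: str
    success_criteria: list[str]
    anti_criteria: list[str]
    definition_of_done: str
    layer: int = 1          # assumed default
    status: str = "draft"   # assumed default


def issue_labels(spec: HardenedSpec) -> list[str]:
    # Labels encode machine-readable state: harness-spec, layer-{n}, status:{status}.
    return ["harness-spec", f"layer-{spec.layer}", f"status:{spec.status}"]


def issue_body(spec: HardenedSpec) -> str:
    def bullets(items: list[str]) -> str:
        return "\n".join(f"- {item}" for item in items)

    sections = [
        ("intent_summary", spec.intent_summary),
        ("success_criteria", bullets(spec.success_criteria)),
        ("anti_criteria", bullets(spec.anti_criteria)),
        ("definition_of_done", spec.definition_of_done),
    ]
    return "\n\n".join(f"## {name}\n{text}" for name, text in sections)


spec = HardenedSpec("Add login", ["tests pass"], ["no plaintext passwords"], "merged to main")
print(issue_labels(spec))
print(issue_body(spec))
```

Keeping the rendering free of network calls means the local JSON cache and the issue body can both be derived from the same object.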
+## Contradictions
+None identified. All sources agree on the sub-issue feature's existence and CLI gap.
+## Fork / Multi-Tenant Considerations
+When someone forks a project using ultimate-pi, spec storage must not leak upstream state into the fork.
+### The Fork Problem
+1. **Issue tracker isolation**: GitHub historically blocked issues on forks. As of December 2025, forks CAN enable issues (Settings → General → Features → check "Issues"). But they start EMPTY — upstream issues are never copied to forks (Source: [[github-fork-issues-discussion]]).
+2. **Local cache leakage**: `.pi/harness/specs/<id>.json` files committed to the repo WOULD be forked, carrying stale upstream issue URLs. This is the primary contamination vector.
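+3. **`gh` CLI context**: `gh` authenticates per host and account, not per repository; its repository context comes from the git remotes and the default chosen with `gh repo set-default`. In a fork, that default can still resolve to upstream, so the fork must set its own default repo (and run `gh auth login` only if no credentials exist for the host).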
+### Solution: Local-First, Gitignored Cache, Init Bootstrap
+| Concern | Solution |
+|---------|----------|
+| Stale cache in forks | `.pi/harness/specs/` is in `.gitignore`. Cache is runtime-only, never committed. |
+| Empty issue tracker on fork | `ultimate-pi harness init` detects fork, prompts to enable issues (or auto-enables via API), creates harness label set |
+| Wrong `gh` repo context | `harness init` runs `gh repo set-default` for the fork. Config stored in `.pi/harness/config.json` (repo-relative, not global) |
+| Upstream spec references | Local cache stores `github_issue_url` as optional field. If absent or pointing to wrong repo, harness creates new issues in current repo |
+| No `gh` auth on fork | `harness init` checks `gh auth status`, guides user through `gh auth login` if needed |
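The "Upstream spec references" row can be illustrated with a small check, assuming cached URLs have the usual `https://github.com/<owner>/<repo>/issues/<number>` shape. `issue_repo` and `should_create_issue` are hypothetical helpers, not existing harness functions.

```python
from typing import Optional
from urllib.parse import urlparse


def issue_repo(url: str) -> Optional[str]:
    """Return 'owner/repo' for an issue URL, or None if the shape is unexpected."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) == 4 and parts[2] == "issues" and parts[3].isdigit():
        return f"{parts[0]}/{parts[1]}"
    return None


def should_create_issue(cached_url: Optional[str], current_repo: str) -> bool:
    """True when the cache has no URL or the URL points at a different repo."""
    if not cached_url:
        return True
    repo = issue_repo(cached_url)
    return repo is None or repo.lower() != current_repo.lower()


print(should_create_issue("https://github.com/upstream/proj/issues/5", "me/proj"))
```

A stale upstream URL therefore never blocks the fork; it simply causes a fresh issue to be created in the current repo.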
+### Init Flow for Forked Projects
+```
+ultimate-pi harness init
+  ├─ Detect: is this a fork? (gh repo view --json isFork)
+  ├─ Check: are issues enabled? If not → guide to enable or auto-enable via API
+  ├─ Auth: gh auth status → prompt login if missing
+  ├─ Labels: gh label create <name>, once per label: harness-spec, harness-task, layer-1..layer-8, status:*
+  ├─ Templates: create .github/ISSUE_TEMPLATE/harness-spec.yml
+  ├─ Gitignore: ensure .pi/harness/specs/ is in .gitignore
+  └─ Ready: harness can now create spec issues in fork's own tracker
+```
+### Why This Works
+- **No shared state between upstream and fork**: Each has its own isolated issue tracker
+- **Gitignored cache prevents stale refs**: Fork never sees upstream's runtime spec files
+- **Init is idempotent**: Running `harness init` on an already-initialized fork is a no-op
+- **Labels are the only shared artifact**: Label names are convention, not data. Forks recreate them locally
+## Creative Solution: Content-Addressed Spec Identity
+The fork-merge divergence problem (fork #5 ≠ upstream #5) is solved via **content-addressed spec identity** combined with **GitHub's native issue transfer API**. See [[content-addressed-spec-identity]] for full specification.
+### How It Works
+1. **Every HardenedSpec gets a content fingerprint**: `SHA256(intent_summary + success_criteria + definition_of_done)`. Embedded in issue body as `<!-- spec-fp: <hash> -->`, in title as `[spec:<first8>]`, and in local cache.
+2. **Resolution by hash, not number**: When harness needs a spec, it searches `gh search issues "spec-fp:<hash>"` across all repos. Issue number is irrelevant — found by content identity, not location.
+3. **Transfer on merge**: `ultimate-pi harness migrate` uses `gh issue transfer` (native GitHub API) to move specs from fork to upstream. Idempotent — searches by fingerprint before transferring.
+4. **Transfer-safe**: When an issue transfers between repos, only its number changes. The body (and fingerprint within it) stays the same. Labels must be reapplied (GitHub limitation).
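The fingerprint scheme in the steps above can be sketched as follows. The source gives the formula only as a concatenation of three fields; joining list items and fields with newlines before hashing is an assumption made here so that the input is well defined, and all function names are hypothetical.

```python
import hashlib
import re
from typing import Optional

FP_PATTERN = re.compile(r"<!-- spec-fp: ([0-9a-f]{64}) -->")


def spec_fingerprint(intent_summary: str, success_criteria: list[str], definition_of_done: str) -> str:
    # Assumption: fields are joined with newlines before hashing.
    material = "\n".join([intent_summary, "\n".join(success_criteria), definition_of_done])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def embed_fingerprint(body: str, fingerprint: str) -> str:
    """Append the hidden marker that travels with the issue body on transfer."""
    return f"{body}\n\n<!-- spec-fp: {fingerprint} -->"


def title_tag(fingerprint: str) -> str:
    return f"[spec:{fingerprint[:8]}]"


def extract_fingerprint(body: str) -> Optional[str]:
    match = FP_PATTERN.search(body)
    return match.group(1) if match else None


fp = spec_fingerprint("Add login", ["tests pass"], "merged to main")
body = embed_fingerprint("## intent_summary\nAdd login", fp)
print(title_tag(fp), extract_fingerprint(body) == fp)
```

Since the marker lives in the body, it survives a repo change even though the issue number does not, which is what makes resolution by hash possible.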
+### Why Content Addressing
+- **Repo-agnostic**: Same hash resolves to correct issue in any repo
+- **Deduplication**: Two issues with same fingerprint ARE the same spec — merge them
+- **No stale references**: Harness searches by hash, not by cached URL
+- **Inspired by Git's object model**: Content identity > location identity
+### Migration Flow
+```
+ultimate-pi harness migrate
+  ├─ Detect repo change (fork → upstream)
+  ├─ List fork specs by label
+  ├─ For each: search upstream by fingerprint
+  ├─ If not found → gh issue transfer + relabel
+  ├─ Update local cache URLs
+  └─ Idempotent — safe to re-run
+```
+Implementation effort: ~2-3 days. All operations use existing `gh` CLI.
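The idempotency claim in the migration flow can be made concrete with a planning function that decides which fork issues still need transferring. This is a sketch of the decision logic only; it assumes fork specs have already been listed and fingerprinted, performs no `gh` calls, and `plan_migration` is a hypothetical name.

```python
def plan_migration(fork_specs: list[dict], upstream_fingerprints: set[str]) -> list[int]:
    """Return fork issue numbers whose fingerprint is not yet present upstream."""
    seen = set(upstream_fingerprints)
    to_transfer = []
    for spec in fork_specs:
        fingerprint = spec["fingerprint"]
        if fingerprint in seen:
            continue  # already upstream, or a duplicate within the fork
        seen.add(fingerprint)
        to_transfer.append(spec["number"])
    return to_transfer


fork = [
    {"number": 5, "fingerprint": "aaa"},
    {"number": 6, "fingerprint": "bbb"},
    {"number": 7, "fingerprint": "aaa"},  # duplicate of issue 5
]
print(plan_migration(fork, {"bbb"}))
```

Re-running the plan after the transfers have landed yields an empty list, because every fingerprint is then found upstream.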
+## Open Questions
+- When will `gh` CLI gain native `--parent` flag? PR #13057 in progress but no merge timeline.
+- What rate limiting impact will harness-driven issue creation have? 5,000 req/hr for authenticated users. At 1 issue per subtask, a 5-subtask plan creates ~5-15 issues — well within limits.
+- Should issue creation be synchronous during harness execution, or batched after pipeline completion?
+- Can GitHub Projects v2 auto-track sub-issue progress for harness observability (L5)?
+- Should `harness init` auto-enable issues on fork via API, or require manual user action? (API approach is faster but may surprise users)
+- ~~What happens if a fork is later merged upstream?~~ **SOLVED**: Content-addressed spec identity + `gh issue transfer` migration. See [[content-addressed-spec-identity]].
+## Integration Points
+### L1 Spec Hardening → GitHub Issues
+| Step | Action | CLI Command |
+|------|--------|-------------|
+| 1 | Harden spec | `SpecHardener.harden()` → HardenedSpec |
+| 2 | Create spec issue | `gh issue create --title "Spec: {intent_summary}" --body "..." --label harness-spec,layer-1` |
+| 3 | Store local cache | `.pi/harness/specs/{issue_number}.json` with `github_issue_url` field |
+| 4 | Emit spec_hardened | → L2 |
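Step 2 of the table can be sketched as an argument-vector builder, which avoids shell quoting problems when titles or bodies contain spaces or quotes. The `--title`, `--body`, and `--label` flags are the ones shown in the table; `create_spec_issue_argv` is a hypothetical helper, and the command is only printed here, not executed.

```python
import shlex


def create_spec_issue_argv(intent_summary: str, body: str, layer: int = 1) -> list[str]:
    """Build the argv for `gh issue create` without going through a shell."""
    return [
        "gh", "issue", "create",
        "--title", f"Spec: {intent_summary}",
        "--body", body,
        "--label", f"harness-spec,layer-{layer}",
    ]


argv = create_spec_issue_argv("Add login", "## intent_summary\nAdd login")
print(shlex.join(argv))  # a shell-safe rendering, useful for logs and dry runs
```

Passing the list to `subprocess.run(argv, check=True)` would execute it, assuming `gh` is installed and authenticated.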
+### L2 Structured Planning → GitHub Sub-Issues
+| Action | CLI Command |
+|--------|-------------|
+| Create task sub-issues | `gh sub-issue create --parent {spec_issue} --title "{task_name}" --label harness-task` |
+| Link dependencies | `gh issue edit {task_A} --add-label "blocked-by:{task_B}"` (or use API for native deps) |
+| Add sprint contract | `gh issue comment {task_issue} --body "## Sprint Contract\n..."` |
+### GitHub Projects v2 (Optional Visualization)
+- Add spec issue to a "Harness" project board
+- Sub-issues auto-appear in board with parent/child progress fields
+- Filter by `label:harness-spec`, group by `status`
+- Roadmap view for phase timelines
+## Toolchain
+| Tool | Purpose | Command |
+|------|---------|---------|
+| `gh issue create` | Create spec/task issues | `gh issue create --title "..." --body "..." --label ...` |
+| `gh issue edit` | Update issue state/labels | `gh issue edit {id} --add-label "..." --remove-label "..."` |
+| `gh issue comment` | Append execution log entry | `gh issue comment {id} --body "..."` |
+| `gh issue list` | Query issue state | `gh issue list --label harness-spec --json number,title,state,labels` |
+| `gh issue view` | Read issue body as JSON | `gh issue view {id} --json body,title,labels,state` |
+| `gh sub-issue create` | Create child task | `gh sub-issue create --parent {id} --title "..."` |
+| `gh sub-issue list` | List child tasks | `gh sub-issue list {id} --json number,title,state` |
+| `gh api` | Raw API for dependencies | `gh api /repos/{owner}/{repo}/issues/{id}` |
+## Sources
+- [[github-sub-issues-docs]]: GitHub official docs on sub-issues (April 2025)
+- [[github-issue-dependencies-docs]]: GitHub official docs on issue dependencies
+- [[gh-sub-issue-extension]]: yahsan2/gh-sub-issue, MIT license, v0.5.1 (Oct 2025)
+- [[gh-cli-sub-issue-rfc]]: cli/cli#10298, feature request (Jan 2025)
+- [[github-fork-issues-discussion]]: GitHub Community discussion #161368 — fork issues enablement (Jun-Dec 2025)

package/vault/wiki/questions/Research: Google Antigravity Harness Integration.md ADDED

@@ -0,0 +1,120 @@
+---
+type: synthesis
+title: "Research: Google Antigravity Harness Integration"
+created: 2026-05-01
+updated: 2026-05-01
+tags:
+  - research
+  - antigravity
+  - google
+  - harness
+  - agent-first
+  - integration
+status: developing
+related:
+  - "[[antigravity-agent-first-architecture]]"
+  - "[[agent-artifacts-verifiable-deliverables]]"
+  - "[[browser-subagent-visual-verification]]"
+  - "[[harness-implementation-plan]]"
+  - "[[agentic-harness]]"
+  - "[[model-adaptive-harness]]"
+  - "[[self-evolving-harness]]"
+  - "[[cursor-harness-innovations]]"
+sources:
+  - "[[google-antigravity-official-blog]]"
+  - "[[google-antigravity-wikipedia]]"
+  - "[[cursor-vs-antigravity-2026]]"
+---
+# Research: Google Antigravity Harness Integration
+## Overview
+Google Antigravity is an agent-first development platform (launched November 18, 2025 alongside Gemini 3). It represents the first major IDE platform built from the ground up for autonomous coding agents — not an AI plugin bolted onto an existing editor. This research identifies SOTA innovations, maps them against our harness implementation plan, and proposes integration from first principles.
+## Key Findings
+### SOTA Innovations
+1. **Agent-First Dual-View Architecture** (Source: [[google-antigravity-official-blog]]): Editor View for hands-on coding + Manager View for multi-agent orchestration. This is a fundamentally different approach from our sequential pipeline model.
+2. **1M Token Context Window** (Source: [[cursor-vs-antigravity-2026]]): Ingests entire repositories into active memory instead of RAG-based retrieval. Natively understands cross-file dependencies. Tradeoff: massive token cost ($249.99/mo Ultra plan partly due to this).
+3. **Browser Subagent with Visual Verification** (Source: [[cursor-vs-antigravity-2026]]): Agent drives headless Chromium, takes screenshots, analyzes pixels with vision models. Closes the loop on UI changes. Our harness has NO equivalent capability.
+4. **Artifact-Based Trust System** (Source: [[google-antigravity-official-blog]]): Agents produce human-reviewable deliverables (task lists, screenshots, recordings) instead of raw tool logs. Asynchronous feedback on artifacts without stopping execution.
+5. **Skills with Progressive Disclosure** (Source: gap-fill research): SKILL.md files loaded only when semantically relevant. Community ecosystem ported from Claude Code. Directly analogous to our `.pi/skills/SKILL.md` system.
+6. **Cross-Project Learning Knowledge Base** (Source: [[google-antigravity-official-blog]]): Agents save successful strategies, code patterns, and solutions. Query across projects. Extends our L6 persistent memory concept.
+7. **Nano Banana (Built-in Asset Generation)** (Source: [[cursor-vs-antigravity-2026]]): Integrated image generator for UI assets directly in IDE. No external tools needed.
+8. **Deep System-Level AI Integration** (Source: [[google-antigravity-wikipedia]]): VS Code fork where AI is a system primitive, not an extension. Google hired Windsurf team + licensed tech for $2.4B.
+### What Antigravity Validates from Our Plan
+| Our Feature | Antigravity Equivalent | Confidence |
+|-------------|----------------------|------------|
+| Model-adaptive harness | Multi-model support (Gemini, Claude, GPT-OSS) with model-specific strengths | **high** |
+| Dynamic context | 1M token context window (different approach, same problem) | **medium** |
+| Pre-verification isolation (P15b) | Visual verification via browser subagent | **high** |
+| Subagent specialization (P25) | Manager View multi-agent orchestration | **high** |
+| Self-evolving harness (F1) | Cross-project learning knowledge base | **high** |
+| Skills system (F0) | SKILL.md progressive disclosure | **high** |
+| Adversarial verification (L4) | Artifact-based proof (complementary, not competing) | **medium** |
+### Critical Gaps Revealed
+1. **No Browser Control (NEW GAP)**: Our harness has zero visual verification capability. Browser subagent is Antigravity's killer feature. Phase P30 needed.
+2. **No Artifact Generation (NEW GAP)**: Our harness produces raw verification results (pass/fail, metrics). No human-reviewable deliverables. Phase P31 needed.
+3. **No Cross-Project Learning (PARTIAL GAP)**: L6 persistent memory is project-scoped. Cross-project knowledge transfer would accelerate agent performance. Phase P32 needed.
+4. **No Manager View / Control Plane (ARCHITECTURAL GAP)**: L7 orchestration is DAG-based sequential, not parallel agent dispatch. This is intentional for our CLI harness but worth reconsidering.
+5. **Context Strategy Divergence**: We use selective context (hot cache → index → pages). Antigravity uses massive context (1M tokens). Which is better? Depends on cost tolerance.
+## Key Entities
+- **Google Antigravity**: Agent-first IDE platform. Free public preview. VS Code fork. $2.4B Windsurf acquisition.
+- **Gemini 3.1 Pro**: Primary model powering Antigravity agents. Google's frontier coding model.
+- **Windsurf**: AI IDE acquired by Google ($2.4B, July 2025). Team now building Antigravity.
+## Key Concepts
+- [[antigravity-agent-first-architecture]]: The two-view (Editor + Manager) control plane model
+- [[agent-artifacts-verifiable-deliverables]]: Trust via human-reviewable deliverables
+- [[browser-subagent-visual-verification]]: Headless browser agent for UI verification
+## Contradictions
+- **1M Context Window vs RAG**: Antigravity bets on massive context. Our harness bets on selective retrieval. [[cursor-vs-antigravity-2026]] notes Antigravity's approach has massive cost implications ($249.99/mo Ultra plan). Our approach is more token-efficient but may miss cross-file dependency understanding. Verdict: both valid. Different cost/accuracy tradeoffs.
+- **Agent-First vs Pipeline-First**: Antigravity trusts agents to verify themselves (artifacts). Our harness enforces verification through pipeline stages (L3, L4). Both approaches have failure modes: agent self-verification misses errors; pipeline verification adds latency. Verdict: complementary. Best harness has both.
+## Open Questions
+- What would a CLI-native "Manager View" look like? Can we achieve multi-agent orchestration without a GUI?
+- At what task complexity does visual verification become necessary vs. nice-to-have?
+- Can cross-project learning be implemented without privacy/compliance issues?
+- How does Antigravity's approach to model-adaptive tool provisioning compare to our provider-native prompting (P22b)?
+## Integration Recommendations
+Three new phases for the [[harness-implementation-plan]]:
+### P30: Browser Subagent Integration (L3 tools)
+Add headless browser control capability via browser-harness (9.4K stars, MIT, thin CDP harness by browser-use). Self-healing: agent writes missing helpers mid-execution. Direct CDP access — one WebSocket to Chrome, nothing between. TypeScript variant: browser-harness-js (428 stars, 652 typed CDP methods). Replaces Puppeteer. See [[Source: browser-harness CDP Harness]] and [[browser-harness-agent]].
+### P31: Artifact Generation Layer (L4→L5 bridge)
+After L4 adversarial verification, agents generate human-reviewable artifacts: screenshots, browser recordings, test result summaries. Artifacts feed into L5 observability. Provides the "prove it worked" complement to L4's "prove it's wrong."
+### P32: Cross-Project Learning Knowledge Base (L6 extension)
+Extend persistent memory to support cross-project knowledge transfer. Agents save successful strategies and code patterns tagged by domain. Query across projects with relevance scoring. Foundation for F1 self-evolving harness.
+### What We Should NOT Adopt
+- **1M Token Context Window**: Token-inefficient for CLI harness. Selective context is better.
+- **Full IDE Integration**: Our harness is CLI-level. Different architecture, different constraints.
+- **Google Cloud Lock-in**: Antigravity's deep GCP integration is vendor lock-in. Harness stays platform-agnostic.
+- **$249.99/mo Pricing**: Unsustainable for individual developers. Our token budget optimization is a competitive advantage.