npm - ultimate-pi - Versions diffs - 0.1.2 → 0.1.4 - Mend

ultimate-pi 0.1.2 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (516) hide show

package/vault/wiki/sources/Source: Lovable Architecture & Clone Analysis.md ADDED Viewed

@@ -0,0 +1,83 @@
+---
+type: source
+source_type: blog
+title: "Lovable Architecture & Clone Analysis"
+author: "JIN (blog.devgenius.io), Neel S (Medium), Lovable Docs"
+date_published: 2025-09-05
+url:
+  - "https://blog.devgenius.io/lovables-architecture-decoded-how-ai-transforms-intent-into-production-ready-code-ceead05003e4"
+  - "https://docs.lovable.dev/introduction/welcome"
+  - "https://medium.com/@indraneelsarode22neel/building-a-lovable-clone-inside-the-architecture-of-agentic-ai-platforms-4d423dc53a9c"
+confidence: medium
+key_claims:
+  - "Lovable's key innovation is the orchestration layer on top of models, not the models themselves"
+  - "Multi-agent architecture: Planner → Architect → Coder with Pydantic-typed handoffs"
+  - "Structured outputs (Pydantic schemas) prevent chaos — transforms AI from demo to production"
+  - "LangGraph enables state-driven multi-agent workflows with conditional edges"
+  - "Groq's sub-100ms inference makes iterative development enjoyable"
+  - "Lovable supports full lifecycle: prototyping → deployment → operation with code ownership via GitHub sync"
+tags:
+  - lovable
+  - multi-agent
+  - agentic-ai
+  - structured-outputs
+  - langgraph
+created: 2026-05-03
+updated: 2026-05-03
+status: ingested
+---# Lovable Architecture & Clone Analysis
+Lovable (formerly GPT Engineer) is a full-stack AI development platform that transforms natural language into production-ready web applications. Built for enterprises with SOC 2 Type II, ISO 27001, and GDPR compliance.
+## Key Architecture Insight
+The critical point: Lovable's breakthrough is not about using better models — it's about the **orchestration layer** sitting on top of them. The system architecture bridges the "intent-to-execution chasm" that raw AI code generators fail at.
+## Lovable Clone Architecture (Neel S, Sept 2025)
+A simplified Lovable clone built with LangGraph, Groq, and Pydantic:
+### Three-Agent Pipeline
+**1. Planner Agent**: Raw user prompt → structured project plan (name, techstack, features, files). Output: Pydantic `Plan` object.
+**2. Architect Agent**: Project plan → detailed implementation steps with file-specific tasks. Output: `TaskPlan` object.
+**3. Coder Agent**: Implementation tasks → actual files on disk. Uses ReAct pattern with file system tools (read_file, write_file, list_files).
+### State Management
+State flows through agents as structured dict:
+```
+{
+  "user_prompt": str,
+  "plan": Plan,
+  "task_plan": TaskPlan,
+  "coder_state": CoderState,
+  "status": str
+}
+```
+LangGraph orchestrates: `graph.add_conditional_edges("coder", lambda s: "END" if s.get("status") == "DONE" else "coder")`
+### Key Patterns
+- **Structured outputs**: `llm.with_structured_output(Plan).invoke(prompt)` — no text parsing
+- **ReAct pattern**: Coder has real tools, not just text generation
+- **Handoffs via validated data contracts**: Each agent produces typed objects for downstream consumption
+## Lovable Production Architecture
+From official docs:
+- **Full-stack**: Frontend, backend, database, authentication, integrations
+- **Code ownership**: Sync to GitHub, integrate into existing workflows
+- **Enterprise**: SOC 2 Type II, ISO 27001, SSO/SCIM
+- **Security**: Built-in checks, data usage controls, data opt-out
+## Relevance to AI Coding Harness
+1. **Multi-agent decomposition with typed handoffs** is the central pattern — directly applicable to our harness L2 (planning) → L3 (execution) flow.
+2. **Structured outputs as reliability mechanism** — our harness should enforce schema-validated handoffs between phases, not free-text.
+3. **State management as first-class concern** — LangGraph's state graph pattern maps well to harness session state.
+4. **Orchestration layer > model layer** — invest in harness infrastructure, ride model improvements.

package/vault/wiki/sources/Source: Martin Fowler - Harness Engineering.md ADDED Viewed

@@ -0,0 +1,70 @@
+---
+type: source
+status: ingested
+source_type: engineering-blog
+author: Birgitta Böckeler (Thoughtworks)
+date_published: 2026-04-02
+date_accessed: 2026-05-01
+url: https://martinfowler.com/articles/harness-engineering.html
+confidence: high
+key_claims:
+  - Harness = everything in agent except the model itself
+  - Two control types: Feedforward (guides, prevent) + Feedback (sensors, self-correct)
+  - Two execution types: Computational (deterministic, fast) + Inferential (LLM-based, expensive)
+  - Three regulation categories: Maintainability, Architecture Fitness, Behaviour
+  - The steering loop: human iterates on harness when issues recur
+  - Keep quality left: fast checks pre-commit, expensive checks post-integration
+  - Harnessability: not every codebase equally amenable. "Ambient affordances" matter.
+  - Ashby's Law: regulator must have at least as much variety as system it governs
+  - Behaviour harness is the elephant in the room — unresolved
+created: 2026-05-02
+updated: 2026-05-02
+tags: [source]
+---
+# Martin Fowler: Harness Engineering for Coding Agent Users
+## What It Is
+Canonical framework for harness engineering from Martin Fowler (Thoughtworks). Published April 2, 2026. Supersedes earlier memo from Feb 2026. Defines the mental model for building trust in coding agents through constraints and feedback loops.
+## Core Framework
+### Feedforward and Feedback
+- **Guides (feedforward controls)**: Anticipate agent behaviour, steer before it acts. Increase probability of good first-attempt results.
+- **Sensors (feedback controls)**: Observe after agent acts, help it self-correct. Most powerful when signals are optimized for LLM consumption (e.g., custom linter messages with fix instructions).
+- Separately: agent repeats mistakes (feedback-only) OR encodes rules never tested (feedforward-only). Both needed.
+### Computational vs Inferential
+| Type | Computational | Inferential |
+|------|--------------|-------------|
+| Speed | milliseconds-seconds | seconds-minutes |
+| Cost | cheap | expensive |
+| Determinism | deterministic | non-deterministic |
+| Examples | linters, tests, type checkers | AI code review, "LLM as judge" |
+| Run frequency | every change | selectively |
+### The Steering Loop
+Human iterates on harness. When issue happens multiple times → improve feedforward/feedback controls. Agents can help write harness controls (custom linters, structural tests, how-to guides).
+### Regulation Categories
+1. **Maintainability harness**: Code quality, conventions. Easiest — lots of pre-existing computational tooling.
+2. **Architecture fitness harness**: Fitness functions for architecture characteristics (performance, observability, etc.).
+3. **Behaviour harness**: Functional correctness. "Elephant in the room" — AI-generated tests aren't reliable enough yet. Approved fixtures pattern shows promise.
+### Harnessability
+Not every codebase equally harnessable. Strongly typed languages, clear module boundaries, frameworks that abstract details all increase harnessability. "Ambient affordances" (Ned Letcher): structural properties that make environment legible to agents.
+### Harness Templates
+Pre-bundled guides + sensors for common service topologies (CRUD business service, event processor, data dashboard). Teams may pick tech stacks based on available harness templates.
+## Relevance to Ultimate-PI
+Our 8-layer pipeline directly implements Feedforward+Feedback. L1-L2 (Spec Hardening, Planning) are feedforward. L2.5-L4 (Drift, Grounding, Adversarial) are feedback. L5-L8 (Observability, Memory, Orchestration, Query) are the steering loop infrastructure. Our three drift paradigms map to the three regulation categories: Implementation drift = Maintainability, Spec drift = Behaviour, Tool-call drift crosses all three.
+Key gap: we don't separate computational vs inferential controls explicitly. Our drift detection is inferential; we could strengthen with computational sensors (custom linters, structural tests).

package/vault/wiki/sources/Source: OpenAI Harness Engineering Five Principles.md ADDED Viewed

@@ -0,0 +1,58 @@
+---
+type: source
+status: ingested
+source_type: engineering-blog
+author: Tony Lee
+date_published: 2026-02-12
+date_accessed: 2026-05-01
+url: https://tonylee.im/en/blog/openai-harness-engineering-five-principles-codex/
+confidence: high
+key_claims:
+  - OpenAI Codex team built 1M-line product using only agents, zero human-written code
+  - Took 1/10th the time vs manual (internal estimate, uncontrolled conditions)
+  - Five principles: visibility, capability-gap thinking, mechanical enforcement, agent eyes, map-not-manual
+  - Custom concurrency helpers instead of external libraries (API stability favors agents)
+  - Custom linters + structural tests enforce layered architecture; linters themselves written by Codex
+  - Chrome DevTools Protocol gives agent DOM snapshots, screenshots, navigation
+  - "A map, not a manual": ARCHITECTURE.md as bird's-eye view, not exhaustive documentation
+created: 2026-05-02
+updated: 2026-05-02
+tags: [source]
+---
+# OpenAI Harness Engineering: Five Principles
+## What It Is
+Summary of OpenAI's internal harness engineering practices from their Codex team, which built a 1M-line product using only AI agents (zero human-written code). Based on OpenAI's official post (openai.com/index/harness-engineering, Feb 11, 2026) but that page is 403-walled — this summary from Tony Lee's analysis provides the five principles.
+## The Five Principles
+### 1. What the Agent Can't See Doesn't Exist
+All decisions pushed into repository as markdown, schemas, and ExecPlans (PLANS.md). ExecPlan = self-contained design doc written so a beginner could implement end-to-end. Codex worked continuously for 7+ hours on single prompts — only possible with complete, stable context.
+### 2. Ask What Capability Is Missing, Not Why the Agent Is Failing
+When velocity was slow, team asked "what capability is missing?" instead of "why is the agent failing?" Reframed work from prompting harder to instrumenting environment better. Built custom concurrency helpers rather than external libraries — API stability + training data representation favor "boring technology."
+### 3. Mechanical Enforcement Over Documentation
+Enforced invariant rules mechanically (linters, structural tests) rather than prescribing implementation in text. Architecture locked into layered domain structure: Providers → Service → Runtime → UI. Dependency directions verified by linters. Custom linters written by Codex itself.
+### 4. Give the Agent Eyes
+Connected Chrome DevTools Protocol to agent runtime. Pre/post-task snapshot comparison + runtime event observation = agent fixes in loop until clean. Single Codex runs sustained 6+ hours on one task. Temporary observability stack per git worktree: Victoria Logs + Victoria Metrics. Prompts like "make the service start in under 800ms" become executable.
+### 5. A Map, Not a Manual
+ARCHITECTURE.md as bird's-eye view of project structure, including only what rarely changes. Architectural invariants expressed as "something does not exist here" — counterintuitive but effective. Stating boundaries explicitly constrains all downstream implementation.
+## Unresolved Questions
+- Can agent-built system maintain architectural consistency over years? Unknown.
+- How must harness change as models improve? Unknown.
+- 1M-line number represents single internal project under controlled conditions. Extrapolation requires caution.
+## Relevance to Ultimate-PI
+Principle 1 maps to L3 Grounding (everything in repo). Principle 2 maps to our tool-first approach (ck, Gitingest, pi-lean-ctx). Principle 3 maps to L2.5 Drift Monitor + Phase 16 Lint Gate — but we need *mechanical* enforcement (linters), not just drift detection. Principle 4 maps to L5 Observability but we lack browser/visual verification. Principle 5 maps to our wiki/overview.md + index.md but we could formalize ARCHITECTURE.md pattern.

package/vault/wiki/sources/Source: OpenAI Harness Engineering /342/200/224 0 Lines of Human Code.md" ADDED Viewed

@@ -0,0 +1,101 @@
+---
+type: source
+source_type: blog
+title: "OpenAI Harness Engineering — 0 Lines of Human Code"
+author: "Ryan Lopopolo, OpenAI Engineering"
+date_published: 2026-02-11
+url: "https://openai.com/index/harness-engineering/"
+confidence: high
+key_claims:
+  - "Built a product with 0 lines of manually-written code over 5 months"
+  - "~1M lines of code, ~1,500 PRs, 3-7 engineers steering Codex agents"
+  - "Average throughput: 3.5 PRs per engineer per day, increasing as team scaled"
+  - "Context is a scarce resource — use AGENTS.md as table of contents, not encyclopedia"
+  - "Enforce architecture mechanically via custom linters, not via prompts"
+  - "Codex can run single tasks for 6+ hours autonomously"
+  - "Dedicated doc-gardening agents scan for stale documentation"
+  - "Prefer 'boring' technology — easier for agents to model"
+tags:
+  - openai
+  - codex
+  - harness-engineering
+  - context-engineering
+  - agentic-coding
+created: 2026-05-03
+updated: 2026-05-03
+status: ingested
+---# OpenAI Harness Engineering — 0 Lines of Human Code
+OpenAI Engineering, February 2026. Ryan Lopopolo on building a product with Codex where humans never directly contributed any code.
+## Core Philosophy
+**"Humans steer. Agents execute."** The team's primary job became designing environments, specifying intent, and building feedback loops that allow Codex agents to do reliable work.
+## Key Architectural Decisions
+### 1. Progressive Disclosure (Maps, Not Encyclopedias)
+The "one big AGENTS.md" approach failed:
+- Context scarcity: giant file crowds out the task
+- Too much guidance becomes non-guidance
+- Rots instantly — agents can't tell what's stale
+- Hard to verify mechanically
+Solution: **AGENTS.md as table of contents** (~100 lines), pointing to structured `docs/` directory:
+```
+docs/
+├── design-docs/     (index, core beliefs)
+├── exec-plans/      (active, completed, tech-debt)
+├── product-specs/   (index, feature specs)
+├── references/      (design system, tool docs)
+├── DESIGN.md, FRONTEND.md, PLANS.md, QUALITY_SCORE.md
+```
+### 2. Mechanical Architecture Enforcement
+Layered domain architecture with strictly validated dependency directions:
+```
+Types → Config → Repo → Service → Runtime → UI
+```
+- Cross-cutting concerns enter only through explicit Providers interface
+- Enforced via custom linters and structural tests
+- Error messages injected as remediation instructions into agent context
+- "With agents, constraints become multipliers: once encoded, they apply everywhere at once"
+### 3. Agent Legibility as System of Record
+"From the agent's point of view, anything it can't access in-context while running effectively doesn't exist." Knowledge from Slack, Google Docs, or people's heads is invisible. All knowledge must be encoded into the repository as markdown.
+### 4. Environment Control
+Codex drives apps via Chrome DevTools Protocol: snapshots DOM, navigates, validates UI behavior. Ephemeral observability stack per worktree: logs (LogQL), metrics (PromQL), traces. Single Codex runs work on one task for 6+ hours.
+### 5. Garbage Collection for AI Slop
+Initial approach: humans spent Fridays (20% of week) cleaning "AI slop." Didn't scale. Solution: encode "golden principles" mechanically, run recurring background Codex tasks scanning for deviations, open targeted refactoring PRs. "Technical debt is like a high-interest loan — pay it down continuously in small increments."
+### 6. Minimal Blocking Merge Gates
+PRs are short-lived. Test flakes addressed with follow-up runs. "Corrections are cheap, waiting is expensive." In high-throughput agent systems, this is often the right tradeoff.
+## Full Autonomy Achieved
+Codex can now end-to-end drive a new feature from one prompt:
+1. Validate codebase state
+2. Reproduce reported bug
+3. Record video demonstrating failure
+4. Implement fix
+5. Validate fix by driving application
+6. Record video demonstrating resolution
+7. Open PR
+8. Respond to agent and human feedback
+9. Detect and remediate build failures
+10. Escalate to human only when judgment required
+11. Merge the change
+## Open Questions (from OpenAI)
+- How does architectural coherence evolve over years in a fully agent-generated system?
+- Where does human judgment add the most leverage?
+- How does the system evolve as models improve?

package/vault/wiki/sources/Source: OpenDev /342/200/224 Building AI Coding Agents for the Terminal.md" ADDED Viewed

@@ -0,0 +1,100 @@
+---
+type: source
+source_type: paper
+title: "OpenDev — Building AI Coding Agents for the Terminal"
+author: "Nghi D. Q. Bui, OpenDev"
+date_published: 2026-03-05
+url: "https://arxiv.org/html/2603.05344v1"
+confidence: high
+key_claims:
+  - "First comprehensive technical report for an open-source, terminal-native, interactive coding agent"
+  - "Compound AI system: per-workflow LLM binding (action, thinking, critique, vision, compact)"
+  - "5-stage adaptive context compaction reduces peak context consumption by ~54%"
+  - "Event-driven system reminders counteract instruction fade-out in long sessions"
+  - "5-layer defense-in-depth safety architecture (prompt, schema, runtime, tool-level, hooks)"
+  - "Lazy MCP tool discovery reduces startup context cost from 40% to <5%"
+  - "9-pass fuzzy edit matching chain resolves LLM formatting imprecision"
+tags:
+  - opendev
+  - terminal-agent
+  - context-engineering
+  - safety
+  - mcp
+  - compound-ai
+created: 2026-05-03
+updated: 2026-05-03
+status: ingested
+---# OpenDev — Building AI Coding Agents for the Terminal
+arXiv paper, March 2026. OpenDev is an open-source CLI coding agent with a published technical report — bridging the gap between closed-source industrial practice and open academic discourse.
+## Core Architecture
+### Compound AI System
+Not a single model but a structured ensemble of agents and workflows, each independently bound to a user-configured LLM. Five model roles with fallback chains:
+- **Action model**: Primary execution model for tool-based reasoning
+- **Thinking model**: Extended reasoning without tool access (prevents premature action)
+- **Critique model**: Self-evaluation (Reflexion-inspired, selective activation)
+- **Vision model**: Vision-language for screenshots/images
+- **Compact model**: Smaller/faster model for summarization during compaction
+### Dual-Agent Separation
+Main agent for execution + Planner subagent for planning. Planner has **read-only tools only** — write tools are absent from its schema entirely, making write attempts structurally impossible.
+### Extended ReAct Loop
+Four phases per iteration:
+1. **Context management**: 5-stage adaptive compaction (70% → 99% thresholds)
+2. **Thinking**: Separate LLM call without tools, at configurable depth (OFF/LOW/MEDIUM/HIGH)
+3. **Action**: Full LLM call with tool schemas
+4. **Decision**: Doom-loop detection, tool dispatch, error recovery
+## Context Engineering (First-Class Concern)
+### Adaptive Context Compaction (ACC)
+Five graduated stages:
+- **Stage 1 (70%)**: Warning — log utilization, no reduction
+- **Stage 2 (80%)**: Observation masking — replace old results with reference pointers
+- **Stage 2.5 (85%)**: Fast pruning — delete old tool outputs beyond recency window
+- **Stage 3 (90%)**: Aggressive masking — only most recent outputs preserved
+- **Stage 4 (99%)**: Full LLM compaction — summarize middle history, preserve recent
+Result: 54% reduction in peak context consumption. Artifact index tracks all files touched.
+### Event-Driven System Reminders
+24 reminder templates injected as `role: user` messages at decision points. Address attention-decay: after 30+ tool calls, agents silently stop following system prompt instructions. Reminders fire at precise decision points (tool failure, exploration spiral, premature completion, incomplete todos). Guardrail counters prevent noise (max 2-3 nudges per type).
+### Dual-Memory Architecture
+- **Episodic memory**: LLM-generated summary of full conversation (strategic context)
+- **Working memory**: Last 6 message pairs verbatim (operational detail)
+- Summary regenerated every 5 messages from full history to prevent drift accumulation
+### Dynamic System Prompt Construction
+Priority-ordered conditional sections. Each section has a predicate condition — gets loaded only when contextually relevant (e.g., git workflow section only in git repos). Provider-specific sections for Anthropic vs OpenAI vs Fireworks. Two-part composition for Anthropic prompt caching (88% cost reduction on cached portion).
+## Safety — Defense in Depth
+Five independent safety layers:
+1. **Prompt-level guardrails**: Security policy, action safety, git workflow
+2. **Schema-level tool gating**: Dangerous tools invisible to agent, not just blocked
+3. **Runtime approval system**: Manual/Semi-Auto/Auto levels, persistent permissions, pattern matching
+4. **Tool-level validation**: DANGEROUS_PATTERNS blocklist, stale-read detection, timeouts
+5. **Lifecycle hooks**: External scripts intercept 10 lifecycle events, can block or mutate
+## Tool System
+35 built-in tools across 12 categories. Key innovations:
+- **9-pass fuzzy edit matching**: Absorbs LLM formatting imprecision (trailing whitespace, indentation, escape sequences)
+- **Lazy MCP discovery**: `search_tools` with keyword scoring. Startup context cost: 40% → <5%
+- **Auto-promote server commands**: 16 regex patterns detect dev servers, auto-background them
+- **Dual-mode search**: ripgrep (text) + ast-grep (structural) with LSP for semantic code analysis
+## Discussion: Transferable Lessons
+1. **Context is a budget, not a buffer** — graduated reduction beats binary emergency compaction
+2. **Inject reminders at decision points, not upfront** — `role: user` beats `role: system`
+3. **Separate thinking from action** — absence of tool schemas changes behavior, not instructions
+4. **Make unsafe tools invisible, not blocked** — schema gating > runtime permission checks
+5. **Design tools to absorb LLM imprecision** — chain-of-responsibility matchers convert near-misses
+6. **Bound every resource that grows with session length** — caps on everything
+7. **Calibrate from API-reported token counts, not local estimates** — providers inject invisible content

package/vault/wiki/sources/Source: Render AI Coding Agents Benchmark 2025.md ADDED Viewed

@@ -0,0 +1,53 @@
+---
+type: source
+status: ingested
+source_type: benchmark-report
+author: Mitch Alderson (Render)
+date_published: 2025-08-12
+date_accessed: 2026-05-01
+url: https://render.com/blog/ai-coding-agents-benchmark
+confidence: high
+key_claims:
+  - Cursor leads overall (8/10): best setup speed, Docker/Render deployment, code quality
+  - Claude Code (6.8/10): best for rapid prototypes, productive terminal UX
+  - Gemini CLI (6.8/10): wins large-context refactors, weak on greenfield
+  - OpenAI Codex (6/10): powerful model, hampered by UX issues
+  - Gemini CLI pattern: excels at editing existing codebases (context-driven), struggles generating from scratch
+  - Free tier: 60 req/min, 1,000 req/day (industry best)
+created: 2026-05-02
+updated: 2026-05-02
+tags: [source]
+---
+# Render AI Coding Agents Benchmark (August 2025)
+## What It Is
+Independent benchmark comparing Cursor, Claude Code, Gemini CLI, and OpenAI Codex on production codebases in 2025. Two test categories: "vibe coding" (greenfield URL shortener) and production code tasks (Go monorepo, Astro.js site).
+## Final Scores
+| Tool | Setup | Cost | Quality | Context | Integration | Speed | Specialized | **Avg** |
+|------|-------|------|---------|---------|-------------|-------|-------------|---------|
+| Cursor | 9 | 5 | 9 | 8 | 8 | 9 | 8 | **8** |
+| Claude Code | 8 | 6 | 7 | 5 | 9 | 7 | 6 | **6.8** |
+| Gemini CLI | 6 | 8 | 7 | 9 | 5 | 5 | 8 | **6.8** |
+| Codex | 3 | 6 | 8 | 7 | 4 | 7 | 7 | **6** |
+## Gemini CLI Specific Findings
+- **Context: 9/10** — best in class. 1M token window + automatic codebase loading. Loaded most/all relevant files without manual intervention.
+- **Quality: 7/10** — solid on production refactors (first-try Go refactor with proper error handling), but 3/10 on vibe coding (7 follow-up error prompts needed, barebones output).
+- **Speed: 5/10** — slow due to automatic full-context loading.
+- **Hypothesis** (unconfirmed): Gemini may be tuned to make decisions based on context rather than pre-training, favoring editing existing codebases over generating from scratch.
+## Key Takeaways
+- Each tool excels in different areas; no single winner
+- For production refactoring: Gemini + Cursor best (context matters most)
+- For greenfield: Cursor + Claude Code best (model quality + UX matters)
+- AI agents best used by experienced engineers who audit output
+- All agents were great as "error assistants" — troubleshooting via chat
+## Relevance to Ultimate-PI
+Gemini CLI's context-driven approach validates our L3 Grounding layer (Gitingest + ck). The benchmark's finding that context quality beats model quality for production tasks reinforces our first-principles decision to invest heavily in grounding/context engineering.

package/vault/wiki/sources/Source: Rocket.new /342/200/224 Vibe Solutioning Platform.md" ADDED Viewed

@@ -0,0 +1,70 @@
+---
+type: source
+source_type: news_product
+title: "Rocket.new — Vibe Solutioning Platform"
+author: "Rocket (website), Jagmeet Singh (TechCrunch)"
+date_published: 2026-04-06
+url:
+  - "https://www.rocket.new/"
+  - "https://techcrunch.com/2026/04/06/indian-startup-rocket-wants-its-ai-to-do-mckinsey-style-consulting-at-a-fraction-of-the-cost/"
+confidence: medium
+key_claims:
+  - "World's first Vibe Solutioning platform: strategy → build → competitive intelligence in one system"
+  - "Code generation is a commodity — deciding what to build is the missing piece"
+  - "$15M seed from Accel, Salesforce Ventures, Together Fund"
+  - "1.5M users across 180 countries, 57 employees, based in Surat, India"
+  - "Generates 'McKinsey-grade' consulting-style reports from simple prompts"
+  - "Subscriptions: $25-$350/month"
+tags:
+  - rocket
+  - vibe-solutioning
+  - strategy
+  - competitive-intelligence
+created: 2026-05-03
+updated: 2026-05-03
+status: ingested
+---# Rocket.new — Vibe Solutioning Platform
+Rocket describes itself as the world's first "Vibe Solutioning" platform. It covers the full arc from market research → product strategy → app building → competitive intelligence in one system with shared context.
+## Three Capabilities
+### 1. Solve (The Thinking Before the Build)
+- Describe a market problem → AI returns research, evidence, and recommendation
+- Outputs: market analysis, what-to-build, GTM strategy, PRD, regulatory research
+- "Ready to present to a room, hand to a developer, or take straight into Build"
+- Draws on 1,000+ data sources: Meta ad libraries, Similarweb API, own crawlers
+### 2. Build (Production-Grade from First Prompt)
+- Web apps, mobile apps, landing pages, SaaS, internal tools, dashboards
+- Import from Figma, reimagine existing designs
+- One-click deploy with staging and production environments
+- Claims "100x better than anything else" in user testimonials
+### 3. Intelligence (Know What Your Competition Just Did)
+- Continuous monitoring of competitor pricing, messaging, launches, website changes
+- Daily briefs, hiring signals, social media intel
+- "Rocket saw it, connected it, and already knows what it means for you"
+## Business Model
+- $25/mo: Build only
+- $250/mo: Strategy + Research (2-3 "McKinsey-grade" reports + builds)
+- $350/mo: Full platform including competitive intelligence
+- ARPU ~$4,000/year; 20-30% customers are SMBs
+- Gross margins >50%
+## Funding & Traction
+- $15M seed (Sept 2025) from Accel, Salesforce Ventures, Together Fund
+- Grew from 400K to 1.5M+ users post-funding
+- Founded by Vishal Virani (previously co-founder of DhiWise, which pivoted to Rocket)
+## Key Thesis
+"Everyone can generate the code now — it has become a commodity. But what to build is something which everyone is missing. Running a business and just building a codebase are two different things."
+## Relevance to AI Coding Harness
+1. **Pre-build strategy layer**: Rocket validates that the "what to build" gap is real and commercially viable. Our harness could integrate a planning phase that does market/competitive analysis before generating code.
+2. **Shared context across lifecycle**: From strategy → build → monitoring, context compounds. A harness should treat context as persistent across all phases, not resetting between planning and coding.
+3. **Competitive intelligence as feedback loop**: After deployment, monitor what competitors do and feed that back into the planning phase. This creates a continuous improvement loop.
+4. **Limitation noted by TechCrunch**: Analysis is synthesized from existing data, not independently verifiable. Users should validate outputs before business decisions.

package/vault/wiki/sources/Source: SwirlAI Agent Skills Progressive Disclosure.md ADDED Viewed

@@ -0,0 +1,71 @@
+---
+type: source
+source_type: newsletter
+title: "SwirlAI — Agent Skills Progressive Disclosure"
+author: "Aurimas Griciūnas"
+date_published: 2026-03-11
+url: "https://www.newsletter.swirlai.com/p/agent-skills-progressive-disclosure"
+confidence: high
+key_claims:
+  - "Agent Skills use three-tier progressive disclosure: Discovery (~80 tokens/skill), Activation (~2,000 tokens median), Execution (unlimited supporting files)"
+  - "Anthropic released Agent Skills open standard Dec 18, 2025. Within weeks, OpenAI, Google, GitHub, Cursor adopted it."
+  - "Skills marketplaces like SkillsMP index over 400,000 skills across platforms."
+  - "Progressive disclosure is a SYSTEM DESIGN PATTERN, not just a coding agent feature."
+  - "Context windows are finite and lossy — models miss information in the middle of long contexts ('lost in the middle')"
+  - "Best practice: fewer than 20 tools available to an agent, accuracy degrades past 10"
+  - "Skill description quality directly determines routing accuracy — Claude selects skills through pure LLM reasoning"
+tags: [source, skills, progressive-disclosure, agent-architecture]
+related:
+  - "[[agent-skills-pattern]]"
+  - "[[progressive-disclosure-agents]]"
+  - "[[skill-first-architecture]]"
+---
+# SwirlAI — Agent Skills: Progressive Disclosure as a System Design Pattern
+## Summary
+Comprehensive analysis by Aurimas Griciūnas (SwirlAI Newsletter, 35K+ subscribers) on why Agent Skills became an industry standard within weeks. Published March 11, 2026 — three months after Anthropic's open standard release.
+## Key Contributions
+### Three-Tier Progressive Disclosure Architecture
+The `SKILL.md` file organizes information into three layers. The platform implements the loading logic.
+**Layer 1: Discovery** (~80 tokens/skill median). At startup, the platform reads only `name` and `description` from YAML frontmatter. All 17 of Anthropic's official skills together cost ~1,700 tokens at discovery — an agent can be aware of dozens of skills for less context than a single activated skill.
+**Layer 2: Activation** (~2,000 tokens median). When the platform determines a skill is relevant, it loads the full `SKILL.md` markdown body. Body sizes range from ~275 tokens (internal-comms) to ~8,000 tokens (skill-creator).
+**Layer 3: Execution** (unlimited). Supporting files (scripts, reference docs, templates, configs) loaded on demand. Scripts execute without their code entering context — only output consumes tokens.
+### Industry Adoption Speed
+- **Dec 18, 2025**: Anthropic releases open standard
+- **Within weeks**: OpenAI (Codex CLI, ChatGPT), Google (Gemini CLI), GitHub Copilot, Cursor all adopt
+- **By Mar 2026**: SkillsMP indexes 400,000+ skills
+> "Every one of these platforms faces the same two problems: how to give agents broad knowledge without destroying context quality, and how to let users configure agent behavior without requiring engineering expertise. The skills format solves both."
+### Non-Coding Applications
+OpenClaw (175K GitHub stars in <2 weeks) demonstrates the pattern works beyond coding agents: calendar management, email drafting, smart home control, meal planning, cross-platform coordination. Community registry ClawHub hosts 13,000+ skills, most non-technical.
+### Context Engineering
+> "Best practice recommends fewer than 20 tools available to an agent at once, with accuracy degrading past 10. The same principle applies to instructions."
+Context windows are finite and lossy. The "lost in the middle" phenomenon: models reliably miss information placed in the middle of long contexts.
+## What We Adopt
+- Three-tier progressive disclosure as the architectural model for harness skills
+- Skills as the atomic unit of harness behavior (not code modules)
+- Description quality as the routing mechanism (not keyword matching)
+- The insight that markdown skills make agent behavior configurable by non-engineers
+## What We Note
+- The ecosystem moved fast because the problem (context bloat + configuration accessibility) is universal
+- Skills compose with hooks — skills can define deterministic behavior in frontmatter
+- Marketplaces are forming — our harness skills could be published to SkillsMP