npm - feed-the-machine - Versions diffs - 1.0.0 → 1.2.0 - Mend

feed-the-machine 1.0.0 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (136)

package/bin/generate-manifest.mjs +253 -0
package/bin/install.mjs +134 -4
package/docs/HOOKS.md +243 -0
package/docs/INBOX.md +233 -0
package/ftm/SKILL.md +34 -0
package/ftm-audit/SKILL.md +69 -0
package/ftm-brainstorm/SKILL.md +51 -0
package/ftm-browse/SKILL.md +39 -0
package/ftm-capture/SKILL.md +370 -0
package/ftm-capture.yml +4 -0
package/ftm-codex-gate/SKILL.md +59 -0
package/ftm-config/SKILL.md +35 -0
package/ftm-council/SKILL.md +56 -0
package/ftm-dashboard/SKILL.md +163 -0
package/ftm-debug/SKILL.md +84 -0
package/ftm-diagram/SKILL.md +44 -0
package/ftm-executor/SKILL.md +97 -0
package/ftm-git/SKILL.md +60 -0
package/ftm-inbox/backend/__init__.py +0 -0
package/ftm-inbox/backend/__pycache__/main.cpython-314.pyc +0 -0
package/ftm-inbox/backend/adapters/__init__.py +0 -0
package/ftm-inbox/backend/adapters/_retry.py +64 -0
package/ftm-inbox/backend/adapters/base.py +230 -0
package/ftm-inbox/backend/adapters/freshservice.py +104 -0
package/ftm-inbox/backend/adapters/gmail.py +125 -0
package/ftm-inbox/backend/adapters/jira.py +136 -0
package/ftm-inbox/backend/adapters/registry.py +192 -0
package/ftm-inbox/backend/adapters/slack.py +110 -0
package/ftm-inbox/backend/db/__init__.py +0 -0
package/ftm-inbox/backend/db/connection.py +54 -0
package/ftm-inbox/backend/db/schema.py +78 -0
package/ftm-inbox/backend/executor/__init__.py +7 -0
package/ftm-inbox/backend/executor/engine.py +149 -0
package/ftm-inbox/backend/executor/step_runner.py +98 -0
package/ftm-inbox/backend/main.py +103 -0
package/ftm-inbox/backend/models/__init__.py +1 -0
package/ftm-inbox/backend/models/unified_task.py +36 -0
package/ftm-inbox/backend/planner/__init__.py +6 -0
package/ftm-inbox/backend/planner/__pycache__/__init__.cpython-314.pyc +0 -0
package/ftm-inbox/backend/planner/__pycache__/generator.cpython-314.pyc +0 -0
package/ftm-inbox/backend/planner/__pycache__/schema.cpython-314.pyc +0 -0
package/ftm-inbox/backend/planner/generator.py +127 -0
package/ftm-inbox/backend/planner/schema.py +34 -0
package/ftm-inbox/backend/requirements.txt +5 -0
package/ftm-inbox/backend/routes/__init__.py +0 -0
package/ftm-inbox/backend/routes/__pycache__/plan.cpython-314.pyc +0 -0
package/ftm-inbox/backend/routes/execute.py +186 -0
package/ftm-inbox/backend/routes/health.py +52 -0
package/ftm-inbox/backend/routes/inbox.py +68 -0
package/ftm-inbox/backend/routes/plan.py +271 -0
package/ftm-inbox/bin/launchagent.mjs +91 -0
package/ftm-inbox/bin/setup.mjs +188 -0
package/ftm-inbox/bin/start.sh +10 -0
package/ftm-inbox/bin/status.sh +17 -0
package/ftm-inbox/bin/stop.sh +8 -0
package/ftm-inbox/config.example.yml +55 -0
package/ftm-inbox/package-lock.json +2898 -0
package/ftm-inbox/package.json +26 -0
package/ftm-inbox/postcss.config.js +6 -0
package/ftm-inbox/src/app.css +199 -0
package/ftm-inbox/src/app.html +18 -0
package/ftm-inbox/src/lib/api.ts +166 -0
package/ftm-inbox/src/lib/components/ExecutionLog.svelte +81 -0
package/ftm-inbox/src/lib/components/InboxFeed.svelte +143 -0
package/ftm-inbox/src/lib/components/PlanStep.svelte +271 -0
package/ftm-inbox/src/lib/components/PlanView.svelte +206 -0
package/ftm-inbox/src/lib/components/StreamPanel.svelte +99 -0
package/ftm-inbox/src/lib/components/TaskCard.svelte +190 -0
package/ftm-inbox/src/lib/components/ui/EmptyState.svelte +63 -0
package/ftm-inbox/src/lib/components/ui/KawaiiCard.svelte +86 -0
package/ftm-inbox/src/lib/components/ui/PillButton.svelte +106 -0
package/ftm-inbox/src/lib/components/ui/StatusBadge.svelte +67 -0
package/ftm-inbox/src/lib/components/ui/StreamDrawer.svelte +149 -0
package/ftm-inbox/src/lib/components/ui/ThemeToggle.svelte +80 -0
package/ftm-inbox/src/lib/theme.ts +47 -0
package/ftm-inbox/src/routes/+layout.svelte +76 -0
package/ftm-inbox/src/routes/+page.svelte +401 -0
package/ftm-inbox/static/favicon.png +0 -0
package/ftm-inbox/svelte.config.js +12 -0
package/ftm-inbox/tailwind.config.ts +63 -0
package/ftm-inbox/tsconfig.json +13 -0
package/ftm-inbox/vite.config.ts +6 -0
package/ftm-intent/SKILL.md +44 -0
package/ftm-manifest.json +3794 -0
package/ftm-map/SKILL.md +259 -0
package/ftm-map/scripts/db.py +391 -0
package/ftm-map/scripts/index.py +341 -0
package/ftm-map/scripts/parser.py +455 -0
package/ftm-map/scripts/queries/.gitkeep +0 -0
package/ftm-map/scripts/queries/javascript-tags.scm +23 -0
package/ftm-map/scripts/queries/python-tags.scm +17 -0
package/ftm-map/scripts/queries/typescript-tags.scm +29 -0
package/ftm-map/scripts/query.py +149 -0
package/ftm-map/scripts/requirements.txt +2 -0
package/ftm-map/scripts/setup-hooks.sh +27 -0
package/ftm-map/scripts/setup.sh +45 -0
package/ftm-map/scripts/test_db.py +124 -0
package/ftm-map/scripts/test_parser.py +106 -0
package/ftm-map/scripts/test_query.py +66 -0
package/ftm-map/scripts/tests/fixtures/__init__.py +0 -0
package/ftm-map/scripts/tests/fixtures/sample_project/api.ts +16 -0
package/ftm-map/scripts/tests/fixtures/sample_project/auth.py +15 -0
package/ftm-map/scripts/tests/fixtures/sample_project/utils.js +16 -0
package/ftm-map/scripts/views.py +545 -0
package/ftm-mind/SKILL.md +173 -66
package/ftm-pause/SKILL.md +43 -0
package/ftm-researcher/SKILL.md +275 -0
package/ftm-researcher/evals/agent-diversity.yaml +17 -0
package/ftm-researcher/evals/synthesis-quality.yaml +12 -0
package/ftm-researcher/evals/trigger-accuracy.yaml +39 -0
package/ftm-researcher/references/adaptive-search.md +116 -0
package/ftm-researcher/references/agent-prompts.md +193 -0
package/ftm-researcher/references/council-integration.md +193 -0
package/ftm-researcher/references/output-format.md +203 -0
package/ftm-researcher/references/synthesis-pipeline.md +165 -0
package/ftm-researcher/scripts/score_credibility.py +234 -0
package/ftm-researcher/scripts/validate_research.py +92 -0
package/ftm-resume/SKILL.md +47 -0
package/ftm-retro/SKILL.md +54 -0
package/ftm-routine/SKILL.md +170 -0
package/ftm-state/blackboard/capabilities.json +5 -0
package/ftm-state/blackboard/capabilities.schema.json +27 -0
package/ftm-upgrade/SKILL.md +41 -0
package/ftm-upgrade/scripts/check-version.sh +1 -1
package/ftm-upgrade/scripts/upgrade.sh +1 -1
package/hooks/ftm-blackboard-enforcer.sh +94 -0
package/hooks/ftm-discovery-reminder.sh +90 -0
package/hooks/ftm-drafts-gate.sh +61 -0
package/hooks/ftm-event-logger.mjs +107 -0
package/hooks/ftm-map-autodetect.sh +79 -0
package/hooks/ftm-pending-sync-check.sh +22 -0
package/hooks/ftm-plan-gate.sh +96 -0
package/hooks/ftm-post-commit-trigger.sh +57 -0
package/hooks/settings-template.json +81 -0
package/install.sh +140 -11
package/package.json +12 -2

package/ftm-researcher/evals/agent-diversity.yaml ADDED

@@ -0,0 +1,17 @@
+# ftm-researcher/evals/agent-diversity.yaml
+description: Verify 7 finder agents produce non-overlapping results from different domains
+prompts:
+  - vars:
+      input: "Research how to implement WebSocket connections in a Node.js application"
+    assert:
+      - type: contains
+        value: "web_surveyor"
+      - type: contains
+        value: "github_miner"
+      - type: contains
+        value: "codebase_analyst"
+      - type: javascript
+        value: |
+          // Verify at least 5 different agent_roles appear in findings
+          const roles = new Set(output.findings?.map(f => f.agent_role) || []);
+          return roles.size >= 5;
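
The JavaScript assertion above counts the distinct `agent_role` values in the findings. The following is a minimal Python sketch of the same check, runnable outside the eval harness. The shape of `output`, the function name `has_diverse_roles`, and the sample role `academic_scout` are assumptions for illustration, not part of the package. Unlike the JavaScript `Set`, this sketch skips findings that have no `agent_role`.

```python
def has_diverse_roles(output, minimum=5):
    """Return True when at least `minimum` distinct agent roles appear in the findings."""
    # Tolerate a missing or null "findings" key, like the optional chaining in the JS check.
    findings = output.get("findings") or []
    roles = {f["agent_role"] for f in findings if f.get("agent_role")}
    return len(roles) >= minimum


# Hypothetical sample: four distinct roles, one repeated.
sample = {
    "findings": [
        {"agent_role": "web_surveyor"},
        {"agent_role": "github_miner"},
        {"agent_role": "codebase_analyst"},
        {"agent_role": "academic_scout"},
        {"agent_role": "web_surveyor"},
    ]
}
print(has_diverse_roles(sample))  # False: only four distinct roles
```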

package/ftm-researcher/evals/synthesis-quality.yaml ADDED

@@ -0,0 +1,12 @@
+# ftm-researcher/evals/synthesis-quality.yaml
+description: Verify synthesis pipeline produces valid disagreement maps
+prompts:
+  - vars:
+      input: "Given these 10 findings from different agents, produce a disagreement map"
+    assert:
+      - type: contains
+        value: "consensus"
+      - type: contains
+        value: "contested"
+      - type: contains
+        value: "unique_insights"

package/ftm-researcher/evals/trigger-accuracy.yaml ADDED

@@ -0,0 +1,39 @@
+# ftm-researcher/evals/trigger-accuracy.yaml
+description: Verify ftm-researcher triggers on research requests and not on brainstorm/debug/other
+prompts:
+  - vars:
+      input: "research parallel agent architectures"
+    assert:
+      - type: contains
+        value: "ftm-researcher"
+  - vars:
+      input: "what's the state of the art on LLM fine-tuning"
+    assert:
+      - type: contains
+        value: "ftm-researcher"
+  - vars:
+      input: "find me examples of rate limiting in Go"
+    assert:
+      - type: contains
+        value: "ftm-researcher"
+  - vars:
+      input: "compare Redis vs Memcached"
+    assert:
+      - type: contains
+        value: "ftm-researcher"
+  # Should NOT trigger
+  - vars:
+      input: "I have an idea for a dashboard"
+    assert:
+      - type: not-contains
+        value: "ftm-researcher"
+  - vars:
+      input: "debug this flaky test"
+    assert:
+      - type: not-contains
+        value: "ftm-researcher"
+  - vars:
+      input: "help me brainstorm auth design"
+    assert:
+      - type: not-contains
+        value: "ftm-researcher"

package/ftm-researcher/references/adaptive-search.md ADDED

@@ -0,0 +1,116 @@
+# Adaptive Search Protocol
+Wave 1 → Wave 2 refinement for Deep mode research.
+---
+## When It Runs
+Only in Deep mode. After wave 1 findings are normalized (Phase 1 of synthesis).
+---
+## How It Works
+The orchestrator analyzes wave 1 findings across 4 dimensions:
+### 1. Coverage Analysis
+For each original subtopic:
+- **SATURATED** (3+ findings with diverse sources): Well-covered. Agent can be reassigned.
+- **THIN** (1-2 findings): Partially covered. Same agent gets a refined query.
+- **GAP** (0 findings): Not covered. Agent gets a broader query + alternative search terms.
+### 2. Contradiction Detection
+- Identify claims where 2+ agents directly contradict each other
+- Mark these subtopics as CONTESTED — wave 2 agents prioritize resolution
+- For each contradiction, note: which agents, which claims, what the disagreement is
+### 3. Depth Opportunities
+- Identify findings that mention specific tools, libraries, or approaches worth deeper investigation
+- Generate drill-down queries for wave 2
+- Prioritize depth opportunities that the user's response highlighted as important
+### 4. Surprise Detection
+- Identify findings that don't fit any original subtopic — unexpected angles
+- Generate new subtopics to explore these surprises
+- Surprises are high-value: they represent information the user didn't know to ask about
+---
+## Wave 2 Dispatch
+Reassign agents based on analysis:
+| Coverage Status | Action |
+|---|---|
+| SATURATED | Reassign agent to a GAP or CONTESTED area |
+| THIN | Same agent, refined query with more specific terms |
+| GAP | Agent gets broader query + alternative search terms |
+| CONTESTED | Assign 2 agents (one per side) to find resolution evidence |
+| SURPRISE | Assign the most relevant agent to explore the unexpected angle |
+### Agent Reassignment Rules
+1. Prefer reassigning agents whose original domain is closest to the gap
+2. If a GAP exists in the academic domain, reassign Academic Scout even if it was SATURATED
+3. Codebase Analyst is never reassigned — it always re-searches with refined local queries
+4. If all subtopics are SATURATED, focus wave 2 on depth opportunities and surprises
+### Context Injection for Wave 2
+All wave 2 agents receive:
+- Full wave 1 findings summary (so they don't re-search)
+- Their specific wave 2 mission (gap-fill, deepen, resolve, or explore)
+- Explicit instruction: "Build on wave 1, do not repeat it"
+- The contradiction details if they're resolving a CONTESTED subtopic
+---
+## Merge Protocol
+Wave 2 findings merge with wave 1 before entering the synthesis pipeline:
+1. Wave 2 findings are added to the findings pool with `wave: 2` marker
+2. The normalize phase (Phase 1) runs again across ALL findings (wave 1 + wave 2)
+3. Deduplication groups wave 1 and wave 2 findings together — if wave 2 confirms a wave 1 finding, the agent_count increases
+4. New wave 2 findings that weren't in wave 1 are added as new unique claims
+5. The wave marker is preserved through synthesis for traceability
+### Contradiction Resolution
+When wave 2 agents were dispatched to resolve a CONTESTED subtopic:
+- If wave 2 finds evidence strongly supporting one side, the contest is resolved
+- If wave 2 finds evidence supporting both sides, the contest remains but with richer context
+- The pairwise ranking (Phase 3) benefits from the additional evidence
+---
+## Orchestrator Analysis Template
+After wave 1 normalization, the orchestrator produces this analysis:
+```
+COVERAGE ANALYSIS:
+1. [subtopic]: SATURATED | THIN | GAP — [N findings, M source types]
+2. [subtopic]: SATURATED | THIN | GAP — [N findings, M source types]
+...
+CONTRADICTIONS DETECTED:
+- [Agent A] claims [X] vs [Agent B] claims [Y] — on subtopic [Z]
+DEPTH OPPORTUNITIES:
+- Finding [N] mentions [specific tool/approach] worth investigating
+- Finding [M] suggests [unexpected constraint] that needs validation
+SURPRISES:
+- [Agent] found [unexpected finding] not covered by any original subtopic
+WAVE 2 PLAN:
+- [Agent]: [mission] — [refined query]
+- [Agent]: [mission] — [refined query]
+...
+```
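
The coverage analysis and Wave 2 dispatch table in adaptive-search.md can be sketched in a few lines of Python. This is an illustration only, not code from the package. The function names, the `source_type` key on each finding, and the reading of "diverse sources" as at least two distinct source types are assumptions. The document does not say how to label three or more findings from a single source type; this sketch treats them as THIN.

```python
# Wave 2 actions keyed by coverage status, copied from the dispatch table.
WAVE2_ACTIONS = {
    "SATURATED": "Reassign agent to a GAP or CONTESTED area",
    "THIN": "Same agent, refined query with more specific terms",
    "GAP": "Agent gets broader query + alternative search terms",
}


def classify_coverage(findings, min_source_types=2):
    """Label one subtopic as SATURATED, THIN, or GAP from its wave 1 findings."""
    count = len(findings)
    if count == 0:
        return "GAP"
    source_types = {f["source_type"] for f in findings}
    # Assumption: "diverse sources" means at least `min_source_types` distinct types.
    if count >= 3 and len(source_types) >= min_source_types:
        return "SATURATED"
    # Assumption: 3+ findings without source diversity are still treated as THIN.
    return "THIN"


def plan_wave2(findings_by_subtopic):
    """Return {subtopic: (status, action)} for every original subtopic."""
    plan = {}
    for subtopic, findings in findings_by_subtopic.items():
        status = classify_coverage(findings)
        plan[subtopic] = (status, WAVE2_ACTIONS[status])
    return plan
```

CONTESTED and SURPRISE are left out because they come from contradiction and surprise detection rather than from finding counts.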

package/ftm-researcher/references/agent-prompts.md ADDED

@@ -0,0 +1,193 @@
+# Agent Prompts: 7 Finder Agents + Orchestrator
+## Orchestrator Protocol: Subtopic Decomposition
+Given research question Q, decompose into 7 facets:
+1. GENERAL LANDSCAPE (→ Web Surveyor): What's the current state? Blog posts, case studies, tutorials.
+2. THEORETICAL FOUNDATIONS (→ Academic Scout): What does the research say? Papers, official docs, specs.
+3. IMPLEMENTATION PATTERNS (→ GitHub Miner): How have others built this? Repos, code, OSS.
+4. MARKET REALITY (→ Competitive Analyst): What products exist? User reviews, complaints, gaps.
+5. PRACTITIONER WISDOM (→ Stack Overflow Digger): What pitfalls exist? Common mistakes, solved problems.
+6. LOCAL CONTEXT (→ Codebase Analyst): How does our project relate? Existing patterns, conventions, integration points.
+7. HISTORICAL EVOLUTION (→ Historical Investigator): How was this solved before? What failed? What evolved?
+For each facet, generate a specific search query tailored to the information domain.
+### Decomposition Rules
+- Each subtopic maps to exactly one finder's domain
+- No overlap between subtopics
+- Coverage of the full research question
+- Adaptation to question type (technical, market, conceptual, comparative)
+### Quick Mode Subset
+In Quick mode, only dispatch 3 finders: Web Surveyor, GitHub Miner, Codebase Analyst.
+The orchestrator generates subtopics for only these 3 domains.
+---
+## Finder Agent Prompt Template
+Each agent prompt follows this structure. The orchestrator fills in the template variables at dispatch time.
+```
+RESEARCH QUESTION: {Q}
+YOUR SUBTOPIC: {specific facet assigned by orchestrator}
+PROJECT CONTEXT: {from Phase 0 repo scan}
+CONTEXT REGISTER: {accumulated findings from prior waves/turns}
+PREVIOUS FINDINGS TO BUILD ON: {summary — do NOT re-search these}
+DEPTH LEVEL: {broad | focused | implementation}
+```
+### Return Format (all agents)
+For each finding, return:
+- claim: [one-sentence factual claim]
+- evidence: [2-3 sentence supporting detail]
+- source_url: [URL]
+- source_type: [primary | peer_reviewed | official_docs | news | blog | forum | code_repo | qa_site | codebase]
+- confidence: [0.0-1.0, self-assessed]
+- agent_role: [your role name]
+Return 3-8 findings. Quality over quantity. If your domain has nothing relevant, return 0 findings with a note explaining why.
+---
+## Agent 1: Web Surveyor
+You are the Web Surveyor — your domain is the general web landscape: blog posts, case studies, tutorials, and technical write-ups.
+DOMAIN CONSTRAINT: Blog posts, case studies, tutorials, technical write-ups. Use WebSearch tool.
+ANTI-REDUNDANCY: Do NOT search GitHub repos, academic papers, or Stack Overflow.
+### Depth-Specific Instructions
+**BROAD:** Map the territory. What are the 3-5 major approaches? What's typically harder than expected? Search: "[core concept] architecture", "[concept] case study", "how [company] built [feature]".
+**FOCUSED:** Drill into the user's chosen approach. Find gotchas, failure modes, scaling limits. Compare 2-3 real implementations. Search: "[specific approach] [stack] production", "[approach] lessons learned".
+**IMPLEMENTATION:** Find concrete patterns, library recommendations, config examples. Search: "[specific library] [framework] tutorial", "[exact pattern] implementation".
+---
+## Agent 2: Academic Scout
+You are the Academic Scout — your domain is research papers, specifications, and official documentation.
+DOMAIN CONSTRAINT: Papers (arxiv, ACM, IEEE), official documentation, RFCs, specifications. WebSearch filtered to academic domains.
+ANTI-REDUNDANCY: Do NOT search blogs, forums, or product sites.
+### Depth-Specific Instructions
+**BROAD:** What does the research community say about this? What theoretical foundations exist? Search: "[concept] survey paper", "site:arxiv.org [concept]", "[concept] RFC".
+**FOCUSED:** Find papers that address the specific approach. What are the proven theoretical limits? Search: "[specific approach] analysis", "[approach] formal verification", "[approach] benchmark".
+**IMPLEMENTATION:** Find reference implementations from papers, official specs with code examples. Search: "[algorithm] reference implementation", "[spec] code example".
+---
+## Agent 3: GitHub Miner
+You are the GitHub Miner — your domain is open-source code, repositories, and implementation patterns.
+DOMAIN CONSTRAINT: GitHub repos, code patterns, OSS implementations. WebSearch filtered to github.com.
+ANTI-REDUNDANCY: Do NOT search blogs or Q&A sites. Report: repo URL, stars, last commit, architecture notes.
+### Depth-Specific Instructions
+**BROAD:** Find the most-starred repos. What patterns emerge across repos? Search: "[concept] [language]", "awesome-[concept]".
+**FOCUSED:** Find repos using the SAME stack. Dig into architecture decisions, open issues. Search: "[approach] [exact framework]", "[approach] example [language]".
+**IMPLEMENTATION:** Find repos that solved the EXACT sub-problem. Look at specific files/functions, test suites. Search: "[specific library] [pattern] example", "[exact integration] starter".
+---
+## Agent 4: Competitive Analyst
+You are the Competitive Analyst — your domain is the market landscape: products, tools, user reviews, and gaps.
+DOMAIN CONSTRAINT: Products, tools, user reviews on Reddit/HN/Twitter, market analysis. WebSearch filtered to reddit.com, news.ycombinator.com, product sites.
+ANTI-REDUNDANCY: Do NOT search GitHub repos or academic papers. Focus on what users love/hate.
+### Depth-Specific Instructions
+**BROAD:** What products/tools exist? What do users love/hate? Where are the gaps? Search: "site:reddit.com [problem] recommendation", "site:news.ycombinator.com [concept]".
+**FOCUSED:** Deep-dive 2-3 most relevant competitors. How do they handle the specific challenge? Search: "[product] review", "[product] vs [product]", "[product] limitations".
+**IMPLEMENTATION:** How do competitors implement the specific feature? Public APIs, SDKs? Search: "[product] API", "[product] architecture", "[product] integration guide".
+---
+## Agent 5: Stack Overflow Digger
+You are the Stack Overflow Digger — your domain is practitioner wisdom: common pitfalls, solved problems, and battle-tested solutions.
+DOMAIN CONSTRAINT: Stack Overflow, community Q&A, common pitfalls, solved problems. WebSearch filtered to stackoverflow.com, stackexchange.com.
+ANTI-REDUNDANCY: Do NOT search GitHub or blogs. Focus on battle-tested solutions and known footguns.
+### Depth-Specific Instructions
+**BROAD:** What are the common mistakes people make? What questions come up repeatedly? Search: "site:stackoverflow.com [concept] [common error]".
+**FOCUSED:** What are the subtle gotchas for this specific approach? Search: "site:stackoverflow.com [approach] gotcha", "[approach] edge case".
+**IMPLEMENTATION:** Find accepted answers with code for the exact pattern needed. Search: "site:stackoverflow.com [exact problem] [language] [framework]".
+---
+## Agent 6: Codebase Analyst
+You are the Codebase Analyst — your domain is the LOCAL repository only. You search the user's codebase for relevant patterns, conventions, and integration points.
+DOMAIN CONSTRAINT: Local repo ONLY. Uses Grep, Read, Glob tools. Searches code, git log, architecture docs, INTENT.md, ARCHITECTURE.mmd.
+ANTI-REDUNDANCY: Do NOT use WebSearch. No external sources. All findings cite file paths and line numbers.
+### Instructions
+1. Search the codebase for existing patterns related to the research question
+2. Check git log for recent changes in relevant areas
+3. Read INTENT.md and ARCHITECTURE.mmd if they exist
+4. Identify: existing conventions, integration points, potential conflicts, reusable components
+5. Report findings with exact file paths and line numbers
+### Return Format (extended)
+In addition to the standard return format, include:
+- file_path: [exact path]
+- line_number: [line or range]
+- pattern_type: [convention | integration_point | reusable_component | potential_conflict]
+---
+## Agent 7: Historical Investigator
+You are the Historical Investigator — your domain is the past: how problems were solved before, what failed, what evolved over time.
+DOMAIN CONSTRAINT: How this was solved 5-10+ years ago. WebSearch with date filters (before:2024). Archive.org, historical blog posts, deprecated tools.
+ANTI-REDUNDANCY: Do NOT search for current solutions. Focus on evolution, failed approaches, what changed and why.
+### Depth-Specific Instructions
+**BROAD:** What approaches were tried and abandoned? What paradigm shifts happened? Search: "[concept] history", "[concept] before:2020", "[deprecated tool] replaced by".
+**FOCUSED:** Why did the old approach fail for this specific use case? What lessons were learned? Search: "[old approach] postmortem", "[approach] deprecated because", "[concept] evolution".
+**IMPLEMENTATION:** What migration patterns exist from old to new? Search: "[old tool] to [new tool] migration", "[old pattern] modernization".
+---
+## Dispatch Checklist
+Before spawning agents each turn, verify:
+1. Subtopic decomposition is complete (7 facets for standard/deep, 3 for quick)
+2. Context register is up to date (includes user's latest response)
+3. Depth level is set correctly for mode and wave
+4. Previous findings are summarized so agents don't re-search
+5. Each agent has its unique domain constraint and anti-redundancy rules
+6. Project context from Phase 0 is included
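
The return format that agent-prompts.md defines for all finder agents lends itself to a simple structural check. The sketch below is a hypothetical validator, not the package's own `validate_research.py`, whose contents are not shown here. The field names and `source_type` values are taken from the return format; the function name and the error messages are assumptions.

```python
# Allowed values, copied from the "Return Format (all agents)" section.
SOURCE_TYPES = {
    "primary", "peer_reviewed", "official_docs", "news", "blog",
    "forum", "code_repo", "qa_site", "codebase",
}
REQUIRED_FIELDS = (
    "claim", "evidence", "source_url", "source_type", "confidence", "agent_role",
)


def validate_finding(finding):
    """Return a list of problems found in one finding; an empty list means it passes."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in finding:
            errors.append("missing field: " + field)
    source_type = finding.get("source_type")
    if source_type is not None and source_type not in SOURCE_TYPES:
        errors.append("unknown source_type: " + str(source_type))
    confidence = finding.get("confidence")
    if confidence is not None:
        # Confidence is self-assessed on a 0.0-1.0 scale.
        if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
            errors.append("confidence must be a number between 0.0 and 1.0")
    return errors
```

A Codebase Analyst finding would need the extended fields (`file_path`, `line_number`, `pattern_type`) checked as well; that is omitted here for brevity.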

package/ftm-researcher/references/council-integration.md ADDED

@@ -0,0 +1,193 @@
+# ftm-council Integration
+## When Council Is Invoked
+- Deep mode only (standard and quick skip council)
+- After normalize & dedup (Phase 1 of synthesis)
+- Input: all claims with agent_count >= 2, plus high-confidence unique claims (confidence > 0.8)
+---
+## Interface Contract
+ftm-researcher prepares a structured prompt for ftm-council:
+```
+Evaluate these research findings for accuracy, completeness, and potential bias.
+For each claim below, independently assess:
+1. Is the evidence sufficient to support this claim?
+2. What would make this claim wrong?
+3. Are there alternative explanations the research may have missed?
+4. Rate your confidence in this claim (0-1).
+[claims formatted as numbered list with evidence and sources]
+Return your assessment for each claim with: verdict (supported/contested/insufficient),
+confidence, and reasoning.
+```
+### Payload Format
+```json
+{
+  "context": "Research evaluation for: [query]",
+  "claims": [
+    {
+      "id": "f-001",
+      "claim": "...",
+      "evidence": "...",
+      "sources": ["url1", "url2"],
+      "source_types": ["peer_reviewed", "blog"],
+      "agent_count": 3,
+      "credibility_score": 0.78
+    }
+  ],
+  "evaluation_criteria": "accuracy, completeness, potential bias"
+}
+```
+### Expected Response Format
+```json
+{
+  "evaluations": [
+    {
+      "claim_id": "f-001",
+      "verdict": "supported | contested | insufficient",
+      "confidence": 0.85,
+      "reasoning": "...",
+      "what_would_make_this_wrong": "...",
+      "alternative_explanations": ["..."]
+    }
+  ],
+  "provider_positions": {
+    "claude": { "f-001": "supported", ... },
+    "codex": { "f-001": "contested", ... },
+    "gemini": { "f-001": "supported", ... }
+  }
+}
+```
+---
+## How Council Results Map Back
+| Council Verdict | Mapping |
+|---|---|
+| All 3 providers: "supported" | consensus tier |
+| 2 agree "supported", 1 contests | consensus tier with minority note |
+| 2 contest, 1 supports | contested tier |
+| All 3 contest | refuted tier |
+| Mixed with "insufficient" | unique_insights tier (needs more evidence) |
+| 2 "insufficient", 1 "supported" | unique_insights tier |
+| 2 "insufficient", 1 "contested" | refuted tier (not enough evidence to contest = rejection) |
+### Tie-Breaking Rules
+When the mapping is ambiguous:
+1. Prefer the more conservative tier (contested over consensus, refuted over unique_insights)
+2. If all three providers give different verdicts, place in contested with full position details
+3. If confidence scores diverge significantly (spread > 0.3), flag as high-uncertainty
+---
+## Fallback: Standalone Challengers
+When ftm-council is unavailable (neither the Codex CLI nor the Gemini CLI is installed):
+Spawn 2 agents on the `review` model from ftm-config:
+### Devil's Advocate Agent
+```
+You are the Devil's Advocate in a research pipeline.
+Your sole purpose is to find reasons each claim is WRONG.
+For each claim below:
+1. Search for counter-evidence using WebSearch
+2. Identify logical gaps in the reasoning
+3. Flag claims supported by only one source type
+4. Check if the evidence actually supports the claim or if the claim overstates the evidence
+5. Look for cherry-picked data or survivorship bias
+Be adversarial. The goal is to stress-test, not to confirm.
+CLAIMS TO CHALLENGE:
+[formatted list of claims with evidence]
+RETURN FORMAT:
+For each claim challenged, return:
+- claim_challenged: [the claim text]
+- challenge_type: counter_evidence | logical_gap | single_source | overstated | bias
+- counter_evidence: [what you found that contradicts or weakens the claim]
+- severity: high | medium | low
+- recommendation: reject | weaken | flag_for_review | accept_with_caveat
+```
+### Edge Case Hunter Agent
+```
+You are the Edge Case Hunter in a research pipeline.
+Your sole purpose is to find where each claim BREAKS.
+For each claim below:
+1. What happens at scale? (10x, 100x, 1000x users/data/requests)
+2. What happens under adversarial conditions? (malicious input, DDoS, data poisoning)
+3. What about accessibility? (screen readers, keyboard-only, low bandwidth)
+4. What about the 1% case? (rare but catastrophic failure modes)
+5. What about 5 years from now? (technology shifts, dependency deprecation, scaling limits)
+6. What happens when the key assumption changes? (the market shifts, the API breaks, the team grows)
+CLAIMS TO STRESS-TEST:
+[formatted list of claims with evidence]
+RETURN FORMAT:
+For each claim stressed, return:
+- claim_challenged: [the claim text]
+- challenge_type: scale | adversarial | accessibility | edge_case | longevity | assumption_shift
+- failure_scenario: [specific scenario where this claim breaks]
+- severity: high | medium | low
+- recommendation: reject | weaken | flag_for_review | accept_with_caveat
+```
+### Fallback Mapping
+Map challenger results to tiers:
+| Challenger Result | Mapping |
+|---|---|
+| No challenges from either agent | consensus |
+| Challenges with weak counter-evidence (low severity) | consensus with note |
+| One agent challenges with medium severity | contested |
+| Both agents challenge with medium+ severity | contested (strong) |
+| Multiple high-severity challenges | refuted |
+| Only edge case challenges, no factual counter-evidence | consensus with edge-case notes |
+---
+## Council Availability Detection
+Before invoking ftm-council, check availability:
+1. Check if `codex` CLI is installed: `which codex`
+2. Check if `gemini` CLI is installed: `which gemini`
+3. If both are available: use full council
+4. If only one is available: use 2-provider council (reduced confidence in verdicts)
+5. If neither is available: use fallback challenger agents
+Log the availability status in the research metadata.
+---
+## Per-Claim Council Invocation
+The conversational iteration protocol supports council invocation for individual claims:
+When the user says "council #N":
+1. Extract finding N from the current research state
+2. Send ONLY that claim to ftm-council with full evidence
+3. Update the claim's tier based on council verdict
+4. Re-render the disagreement map with the updated position
+5. Report the council's reasoning to the user
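
The verdict mapping table and tie-breaking rules in council-integration.md can be read as a small decision function. The sketch below is one possible interpretation, not the package's implementation; the function names and the note strings are assumptions. Combinations the table does not list explicitly (for example two "supported" plus one "insufficient", or three "insufficient") fall through to `unique_insights` under the "Mixed with insufficient" row, which is itself an assumption.

```python
from collections import Counter

VERDICTS = ("supported", "contested", "insufficient")


def map_council_verdicts(verdicts):
    """Map three provider verdicts to a (tier, note) pair."""
    if len(verdicts) != 3 or any(v not in VERDICTS for v in verdicts):
        raise ValueError("expected three verdicts from: " + ", ".join(VERDICTS))
    counts = Counter(verdicts)
    supported = counts["supported"]
    contested = counts["contested"]
    insufficient = counts["insufficient"]

    if supported == 3:
        return ("consensus", "")
    if supported == 2 and contested == 1:
        return ("consensus", "minority note: one provider contests")
    if contested == 2 and supported == 1:
        return ("contested", "")
    if contested == 3:
        return ("refuted", "")
    if supported == 1 and contested == 1 and insufficient == 1:
        # Tie-breaking rule 2: three different verdicts go to contested.
        return ("contested", "all three providers differ")
    if insufficient == 2 and contested == 1:
        return ("refuted", "")
    # Assumption: every remaining mix involving "insufficient" needs more evidence.
    return ("unique_insights", "needs more evidence")


def is_high_uncertainty(confidences, max_spread=0.3):
    """Tie-breaking rule 3: flag claims whose confidence scores diverge by more than 0.3."""
    return max(confidences) - min(confidences) > max_spread
```

For example, `map_council_verdicts(["supported", "contested", "supported"])` yields the consensus tier with a minority note, and `is_high_uncertainty([0.9, 0.5, 0.8])` flags the claim because the spread is 0.4.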