@companion-ai/feynman 0.2.0
- package/.env.example +8 -0
- package/.feynman/SYSTEM.md +62 -0
- package/.feynman/agents/researcher.md +63 -0
- package/.feynman/agents/reviewer.md +84 -0
- package/.feynman/agents/verifier.md +38 -0
- package/.feynman/agents/writer.md +51 -0
- package/.feynman/settings.json +20 -0
- package/.feynman/themes/feynman.json +85 -0
- package/AGENTS.md +53 -0
- package/README.md +99 -0
- package/bin/feynman.js +2 -0
- package/dist/bootstrap/sync.js +98 -0
- package/dist/cli.js +297 -0
- package/dist/config/commands.js +71 -0
- package/dist/config/feynman-config.js +42 -0
- package/dist/config/paths.js +32 -0
- package/dist/feynman-prompt.js +63 -0
- package/dist/index.js +5 -0
- package/dist/model/catalog.js +238 -0
- package/dist/model/commands.js +165 -0
- package/dist/pi/launch.js +31 -0
- package/dist/pi/runtime.js +70 -0
- package/dist/pi/settings.js +101 -0
- package/dist/pi/web-access.js +74 -0
- package/dist/search/commands.js +12 -0
- package/dist/setup/doctor.js +126 -0
- package/dist/setup/preview.js +20 -0
- package/dist/setup/prompts.js +29 -0
- package/dist/setup/setup.js +119 -0
- package/dist/system/executables.js +38 -0
- package/dist/system/promise-polyfill.js +12 -0
- package/dist/ui/terminal.js +53 -0
- package/dist/web-search.js +1 -0
- package/extensions/research-tools/alpha.ts +212 -0
- package/extensions/research-tools/header.ts +379 -0
- package/extensions/research-tools/help.ts +93 -0
- package/extensions/research-tools/preview.ts +233 -0
- package/extensions/research-tools/project.ts +116 -0
- package/extensions/research-tools/session-search.ts +223 -0
- package/extensions/research-tools/shared.ts +46 -0
- package/extensions/research-tools.ts +25 -0
- package/metadata/commands.d.mts +46 -0
- package/metadata/commands.mjs +133 -0
- package/package.json +71 -0
- package/prompts/audit.md +15 -0
- package/prompts/autoresearch.md +63 -0
- package/prompts/compare.md +16 -0
- package/prompts/deepresearch.md +167 -0
- package/prompts/delegate.md +21 -0
- package/prompts/draft.md +16 -0
- package/prompts/jobs.md +16 -0
- package/prompts/lit.md +16 -0
- package/prompts/log.md +14 -0
- package/prompts/replicate.md +22 -0
- package/prompts/review.md +15 -0
- package/prompts/watch.md +14 -0
- package/scripts/patch-embedded-pi.mjs +319 -0
- package/skills/agentcomputer/SKILL.md +108 -0
- package/skills/agentcomputer/references/acp-flow.md +23 -0
- package/skills/agentcomputer/references/cli-cheatsheet.md +68 -0
- package/skills/autoresearch/SKILL.md +12 -0
- package/skills/deep-research/SKILL.md +12 -0
- package/skills/docker/SKILL.md +84 -0
- package/skills/jobs/SKILL.md +10 -0
- package/skills/literature-review/SKILL.md +12 -0
- package/skills/paper-code-audit/SKILL.md +12 -0
- package/skills/paper-writing/SKILL.md +12 -0
- package/skills/peer-review/SKILL.md +12 -0
- package/skills/replication/SKILL.md +14 -0
- package/skills/session-log/SKILL.md +10 -0
- package/skills/source-comparison/SKILL.md +12 -0
- package/skills/watch/SKILL.md +12 -0
package/prompts/autoresearch.md
ADDED
@@ -0,0 +1,63 @@
+---
+description: Autonomous experiment loop — try ideas, measure results, keep what works, discard what doesn't, repeat.
+args: <idea>
+section: Research Workflows
+topLevelCli: true
+---
+Start an autoresearch optimization loop for: $@
+
+This command uses pi-autoresearch.
+
+## Step 1: Gather
+
+If `autoresearch.md` and `autoresearch.jsonl` already exist, ask the user if they want to resume or start fresh.
+
+Otherwise, collect the following from the user before doing anything else:
+- What to optimize (test speed, bundle size, training loss, build time, etc.)
+- The benchmark command to run
+- The metric name, unit, and direction (lower/higher is better)
+- Files in scope for changes
+- Maximum number of iterations (default: 20)
+
+## Step 2: Environment
+
+Ask the user where to run:
+- **Local** — run in the current working directory
+- **New git branch** — create a branch so main stays clean
+- **Virtual environment** — create an isolated venv/conda env first
+- **Docker** — run experiment code inside an isolated Docker container
+- **Cloud** — delegate to a remote Agent Computer machine via `/delegate`
+
+Do not proceed without a clear answer.
+
+## Step 3: Confirm
+
+Present the full plan to the user before starting:
+
+```
+Optimization target: [metric] ([direction])
+Benchmark command: [command]
+Files in scope: [files]
+Environment: [chosen environment]
+Max iterations: [N]
+```
+
+Ask the user to confirm. Do not start the loop without explicit approval.
+
+## Step 4: Run
+
+Initialize the session: create `autoresearch.md`, `autoresearch.sh`, run the baseline, and start looping.
+
+Each iteration: edit → commit → `run_experiment` → `log_experiment` → keep or revert → repeat. Do not stop unless interrupted or `maxIterations` is reached.
+
+## Key tools
+
+- `init_experiment` — one-time session config (name, metric, unit, direction)
+- `run_experiment` — run the benchmark command, capture output and wall-clock time
+- `log_experiment` — record result, auto-commit, update dashboard
+
+## Subcommands
+
+- `/autoresearch <text>` — start or resume the loop
+- `/autoresearch off` — stop the loop, keep data
+- `/autoresearch clear` — delete all state and start fresh
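The keep-or-revert decision in the autoresearch loop reduces to a metric comparison under the declared direction. A minimal sketch of that step, assuming a plain numeric comparison (the function name and signature are illustrative, not the pi-autoresearch API):

```python
def should_keep(baseline: float, candidate: float, direction: str) -> bool:
    """Return True when the candidate run improved on the baseline.

    direction is the value gathered in Step 1: "lower" when smaller is
    better (build time, loss), "higher" when larger is better (throughput).
    """
    if direction == "lower":
        return candidate < baseline
    if direction == "higher":
        return candidate > baseline
    raise ValueError(f"unknown direction: {direction!r}")
```

An iteration's edit is kept, and becomes the new baseline, only when this returns True; otherwise the commit is reverted.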
package/prompts/compare.md
ADDED
@@ -0,0 +1,16 @@
+---
+description: Compare multiple sources on a topic and produce a source-grounded matrix of agreements, disagreements, and confidence.
+args: <topic>
+section: Research Workflows
+topLevelCli: true
+---
+Compare sources for: $@
+
+Requirements:
+- Before starting, outline the comparison plan: which sources to compare, which dimensions to evaluate, expected output structure. Present the plan to the user and confirm before proceeding.
+- Use the `researcher` subagent to gather source material when the comparison set is broad, and the `verifier` subagent to verify sources and add inline citations to the final matrix.
+- Build a comparison matrix covering: source, key claim, evidence type, caveats, confidence.
+- Generate charts with `pi-charts` when the comparison involves quantitative metrics. Use Mermaid for method or architecture comparisons.
+- Distinguish agreement, disagreement, and uncertainty clearly.
+- Save exactly one comparison to `outputs/` as markdown.
+- End with a `Sources` section containing direct URLs for every source used.
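The comparison-matrix columns named in the compare prompt map directly onto a markdown table row. A small sketch, assuming a plain markdown rendering (the helper name is hypothetical):

```python
COLUMNS = ["Source", "Key claim", "Evidence type", "Caveats", "Confidence"]

def matrix_row(source: str, key_claim: str, evidence_type: str,
               caveats: str, confidence: str) -> str:
    """Render one comparison-matrix row in the column order listed above."""
    cells = [source, key_claim, evidence_type, caveats, confidence]
    return "| " + " | ".join(cells) + " |"
```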
package/prompts/deepresearch.md
ADDED
@@ -0,0 +1,167 @@
+---
+description: Run a thorough, source-heavy investigation on a topic and produce a durable research brief with inline citations.
+args: <topic>
+section: Research Workflows
+topLevelCli: true
+---
+Run a deep research workflow for: $@
+
+You are the Lead Researcher. You plan, delegate, evaluate, verify, write, and cite. Internal orchestration is invisible to the user unless they ask.
+
+## 1. Plan
+
+Analyze the research question using extended thinking. Develop a research strategy:
+- Key questions that must be answered
+- Evidence types needed (papers, web, code, data, docs)
+- Sub-questions disjoint enough to parallelize
+- Source types and time periods that matter
+- Acceptance criteria: what evidence would make the answer "sufficient"
+
+Write the plan to `outputs/.plans/deepresearch-plan.md` as a self-contained artifact:
+
+```markdown
+# Research Plan: [topic]
+
+## Questions
+1. ...
+
+## Strategy
+- Researcher allocations and dimensions
+- Expected rounds
+
+## Acceptance Criteria
+- [ ] All key questions answered with ≥2 independent sources
+- [ ] Contradictions identified and addressed
+- [ ] No single-source claims on critical findings
+
+## Decision Log
+(Updated as the workflow progresses)
+```
+
+Also save the plan with `memory_remember` (type: `fact`, key: `deepresearch.plan`) so it survives context truncation.
+
+Present the plan to the user and ask them to confirm before proceeding. If the user wants changes, revise the plan first.
+
+## 2. Scale decision
+
+| Query type | Execution |
+|---|---|
+| Single fact or narrow question | Search directly yourself, no subagents, 3-10 tool calls |
+| Direct comparison (2-3 items) | 2 parallel `researcher` subagents |
+| Broad survey or multi-faceted topic | 3-4 parallel `researcher` subagents |
+| Complex multi-domain research | 4-6 parallel `researcher` subagents |
+
+Never spawn subagents for work you can do in 5 tool calls.
+
+## 3. Spawn researchers
+
+Launch parallel `researcher` subagents via `subagent`. Each gets a structured brief with:
+- **Objective:** what to find
+- **Output format:** numbered sources, evidence table, inline source references
+- **Tool guidance:** which search tools to prioritize
+- **Task boundaries:** what NOT to cover (another researcher handles that)
+
+Assign each researcher a clearly disjoint dimension — different source types, geographic scopes, time periods, or technical angles. Never duplicate coverage.
+
+```
+{
+  tasks: [
+    { agent: "researcher", task: "...", output: "research-web.md" },
+    { agent: "researcher", task: "...", output: "research-papers.md" }
+  ],
+  concurrency: 4,
+  failFast: false
+}
+```
+
+Researchers write full outputs to files and pass references back — do not have them return full content into your context.
+
+## 4. Evaluate and loop
+
+After researchers return, read their output files and critically assess:
+- Which plan questions remain unanswered?
+- Which answers rest on only one source?
+- Are there contradictions needing resolution?
+- Is any key angle missing entirely?
+
+If gaps are significant, spawn another targeted batch of researchers. No fixed cap on rounds — iterate until evidence is sufficient or sources are exhausted.
+
+Update the plan artifact (`outputs/.plans/deepresearch-plan.md`) decision log after each round.
+
+Most topics need 1-2 rounds. Stop when additional rounds would not materially change conclusions.
+
+## 5. Write the report
+
+Once evidence is sufficient, YOU write the full research brief directly. Do not delegate writing to another agent. Read the research files, synthesize the findings, and produce a complete document:
+
+```markdown
+# Title
+
+## Executive Summary
+2-3 paragraph overview of key findings.
+
+## Section 1: ...
+Detailed findings organized by theme or question.
+
+## Section N: ...
+
+## Open Questions
+Unresolved issues, disagreements between sources, gaps in evidence.
+```
+
+When the research includes quantitative data (benchmarks, performance comparisons, trends), generate charts using `pi-charts`. Use Mermaid diagrams for architectures and processes. Every visual must have a caption and reference the underlying data.
+
+Save this draft to a temp file (e.g., `draft.md` in the chain artifacts dir or a temp path).
+
+## 6. Cite
+
+Spawn the `verifier` agent to post-process YOUR draft. The verifier agent adds inline citations, verifies every source URL, and produces the final output:
+
+```
+{ agent: "verifier", task: "Add inline citations to draft.md using the research files as source material. Verify every URL.", output: "brief.md" }
+```
+
+The verifier agent does not rewrite the report — it only anchors claims to sources and builds the numbered Sources section.
+
+## 7. Verify
+
+Spawn the `reviewer` agent against the cited draft. The reviewer checks for:
+- Unsupported claims that slipped past citation
+- Logical gaps or contradictions between sections
+- Single-source claims on critical findings
+- Overstated confidence relative to evidence quality
+
+```
+{ agent: "reviewer", task: "Verify brief.md — flag any claims that lack sufficient source backing, identify logical gaps, and check that confidence levels match evidence strength. This is a verification pass, not a peer review.", output: "verification.md" }
+```
+
+If the reviewer flags FATAL issues, fix them in the brief before delivering. MAJOR issues get noted in the Open Questions section. MINOR issues are accepted.
+
+## 8. Deliver
+
+Copy the final cited and verified output to the appropriate folder:
+- Paper-style drafts → `papers/`
+- Everything else → `outputs/`
+
+Use a descriptive filename based on the topic.
+
+Write a provenance record alongside the main artifact as `<filename>.provenance.md`:
+
+```markdown
+# Provenance: [topic]
+
+- **Date:** [date]
+- **Rounds:** [number of researcher rounds]
+- **Sources consulted:** [total unique sources across all research files]
+- **Sources accepted:** [sources that survived citation verification]
+- **Sources rejected:** [dead links, unverifiable, or removed]
+- **Verification:** [PASS / PASS WITH NOTES — summary of reviewer findings]
+- **Plan:** outputs/.plans/deepresearch-plan.md
+- **Research files:** [list of intermediate research-*.md files]
+```
+
+## Background execution
+
+If the user wants unattended execution or the sweep will clearly take a while:
+- Launch the full workflow via `subagent` using `clarify: false, async: true`
+- Report the async ID and how to check status with `subagent_status`
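The evaluate-and-loop cycle in the deepresearch workflow is an open-ended loop that stops when no gaps remain or a round makes no progress. A sketch under that assumption (both callables are placeholders for the real subagent batch and gap-assessment steps):

```python
def research_rounds(evaluate_gaps, spawn_batch):
    """Run researcher rounds until gaps close or sources are exhausted.

    evaluate_gaps() -> list of still-unanswered questions, re-read from
    the researcher output files; spawn_batch(gaps) launches one targeted
    batch of researcher subagents for those gaps.
    """
    rounds = 0
    gaps = evaluate_gaps()
    while gaps:
        rounds += 1
        spawn_batch(gaps)
        remaining = evaluate_gaps()
        if set(remaining) == set(gaps):
            break  # no progress this round: sources exhausted for these questions
        gaps = remaining
    return rounds
```

This mirrors "no fixed cap on rounds" while still terminating: the loop exits either when every question is answered or when a full round closes nothing.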
package/prompts/delegate.md
ADDED
@@ -0,0 +1,21 @@
+---
+description: Delegate a research task to a remote Agent Computer machine for cloud execution.
+args: <task>
+section: Internal
+---
+Delegate the following task to a remote Agent Computer machine: $@
+
+## Workflow
+
+1. **Check CLI** — Verify `computer` or `aicomputer` is installed and authenticated. If not, install with `npm install -g aicomputer` and run `computer login`.
+2. **Pick a machine** — Run `computer ls --json` and choose an appropriate machine. If none are running, tell the user to create one with `computer create`.
+3. **Pick an agent** — Run `computer agent agents <machine> --json` and choose an installed agent with credentials (prefer Claude).
+4. **Create a session** — Use `computer agent sessions new <machine> --agent claude --name research --json`.
+5. **Send the task** — Translate the user's research task into a self-contained prompt and send it via `computer agent prompt`. The prompt must include:
+   - The full research objective
+   - Where to write outputs (default: `/workspace/outputs/`)
+   - What artifact to produce when done (summary file)
+   - Any tools or data sources to use
+6. **Monitor** — Use `computer agent watch <machine> --session <session_id>` to stream progress. Report status to the user at meaningful milestones.
+7. **Retrieve results** — When the remote agent finishes, pull the summary back with `computer agent prompt <machine> "cat /workspace/outputs/summary.md" --session <session_id>`. Present results to the user.
+8. **Clean up** — Close the session with `computer agent close <machine> --session <session_id>` unless the user wants to continue.
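The delegate workflow above is a fixed sequence of CLI invocations. A sketch that only assembles the argument vectors in execution order, without running anything — the flags mirror the steps as written in the prompt, not an independently verified CLI surface:

```python
def delegation_commands(machine: str, session_id: str) -> list[list[str]]:
    """Argument vectors for the delegate workflow, in execution order."""
    return [
        ["computer", "ls", "--json"],                        # pick a machine
        ["computer", "agent", "agents", machine, "--json"],  # pick an agent
        ["computer", "agent", "sessions", "new", machine,
         "--agent", "claude", "--name", "research", "--json"],  # create a session
        ["computer", "agent", "watch", machine,
         "--session", session_id],                           # monitor progress
        ["computer", "agent", "prompt", machine,
         "cat /workspace/outputs/summary.md",
         "--session", session_id],                           # retrieve results
        ["computer", "agent", "close", machine,
         "--session", session_id],                           # clean up
    ]
```

Keeping the commands as argument lists (rather than shell strings) avoids quoting bugs if they are later passed to a process runner.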
package/prompts/draft.md
ADDED
@@ -0,0 +1,16 @@
+---
+description: Turn research findings into a polished paper-style draft with equations, sections, and explicit claims.
+args: <topic>
+section: Research Workflows
+topLevelCli: true
+---
+Write a paper-style draft for: $@
+
+Requirements:
+- Before writing, outline the draft structure: proposed title, sections, key claims to make, and source material to draw from. Present the outline to the user and confirm before proceeding.
+- Use the `writer` subagent when the draft should be produced from already-collected notes, then use the `verifier` subagent to add inline citations and verify sources.
+- Include at minimum: title, abstract, problem statement, related work, method or synthesis, evidence or experiments, limitations, conclusion.
+- Use clean Markdown with LaTeX where equations materially help.
+- Generate charts with `pi-charts` for quantitative data, benchmarks, and comparisons. Use Mermaid for architectures and pipelines. Every figure needs a caption.
+- Save exactly one draft to `papers/` as markdown.
+- End with a `Sources` appendix with direct URLs for all primary references.
package/prompts/jobs.md
ADDED
@@ -0,0 +1,16 @@
+---
+description: Inspect active background research work, including running processes and scheduled follow-ups.
+section: Project & Session
+topLevelCli: true
+---
+Inspect active background work for this project.
+
+Requirements:
+- Use the `process` tool with the `list` action to inspect running and finished managed background processes.
+- Use the scheduling tooling to list active recurring or deferred jobs if any are configured.
+- Summarize:
+  - active background processes
+  - queued or recurring research watches
+  - failures that need attention
+  - the next concrete command the user should run if they want logs or detailed status
+- Be concise and operational.
package/prompts/lit.md
ADDED
@@ -0,0 +1,16 @@
+---
+description: Run a literature review on a topic using paper search and primary-source synthesis.
+args: <topic>
+section: Research Workflows
+topLevelCli: true
+---
+Investigate the following topic as a literature review: $@
+
+## Workflow
+
+1. **Plan** — Outline the scope: key questions, source types to search (papers, web, repos), time period, and expected sections. Present the plan to the user and confirm before proceeding.
+2. **Gather** — Use the `researcher` subagent when the sweep is wide enough to benefit from delegated paper triage before synthesis. For narrow topics, search directly.
+3. **Synthesize** — Separate consensus, disagreements, and open questions. When useful, propose concrete next experiments or follow-up reading. Generate charts with `pi-charts` for quantitative comparisons across papers and Mermaid diagrams for taxonomies or method pipelines.
+4. **Cite** — Spawn the `verifier` agent to add inline citations and verify every source URL in the draft.
+5. **Verify** — Spawn the `reviewer` agent to check the cited draft for unsupported claims, logical gaps, and single-source critical findings. Fix FATAL issues before delivering. Note MAJOR issues in Open Questions.
+6. **Deliver** — Save exactly one literature review to `outputs/` as markdown. Write a provenance record alongside it as `<filename>.provenance.md` listing: date, sources consulted vs. accepted vs. rejected, verification status, and intermediate research files used.
package/prompts/log.md
ADDED
@@ -0,0 +1,14 @@
+---
+description: Write a durable session log with completed work, findings, open questions, and next steps.
+section: Project & Session
+topLevelCli: true
+---
+Write a session log for the current research work.
+
+Requirements:
+- Summarize what was done in this session.
+- Capture the strongest findings or decisions.
+- List open questions, unresolved risks, and concrete next steps.
+- Reference any important artifacts written to `notes/`, `outputs/`, `experiments/`, or `papers/`.
+- If any external claims matter, include direct source URLs.
+- Save the log to `notes/` as markdown with a date-oriented filename.
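One way the "date-oriented filename" requirement in the log prompt could be realized — the slug scheme here is an assumption for illustration, not something the package prescribes:

```python
from datetime import date

def session_log_path(topic: str, day: date) -> str:
    """Build a notes/ path like notes/2025-01-31-topic-slug.md."""
    slug = "-".join(topic.lower().split())
    return f"notes/{day.isoformat()}-{slug}.md"
```

ISO dates sort lexicographically, so logs in `notes/` list in chronological order with a plain directory listing.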
package/prompts/replicate.md
ADDED
@@ -0,0 +1,22 @@
+---
+description: Plan or execute a replication workflow for a paper, claim, or benchmark.
+args: <paper>
+section: Research Workflows
+topLevelCli: true
+---
+Design a replication plan for: $@
+
+## Workflow
+
+1. **Extract** — Use the `researcher` subagent to pull implementation details from the target paper and any linked code.
+2. **Plan** — Determine what code, datasets, metrics, and environment are needed. Be explicit about what is verified, what is inferred, and what is still missing.
+3. **Environment** — Before running anything, ask the user where to execute:
+   - **Local** — run in the current working directory
+   - **Virtual environment** — create an isolated venv/conda env first
+   - **Docker** — run experiment code inside an isolated Docker container
+   - **Cloud** — delegate to a remote Agent Computer machine via `/delegate`
+   - **Plan only** — produce the replication plan without executing
+4. **Execute** — If the user chose an execution environment, implement and run the replication steps there. Save notes, scripts, and results to disk in a reproducible layout.
+5. **Report** — End with a `Sources` section containing paper and repository URLs.
+
+Do not install packages, run training, or execute experiments without confirming the execution environment first.
package/prompts/review.md
ADDED
@@ -0,0 +1,15 @@
+---
+description: Simulate an AI research peer review with likely objections, severity, and a concrete revision plan.
+args: <artifact>
+section: Research Workflows
+topLevelCli: true
+---
+Review this AI research artifact: $@
+
+Requirements:
+- Before starting, outline what will be reviewed and the review criteria (novelty, empirical rigor, baselines, reproducibility, etc.). Present the plan to the user and confirm before proceeding.
+- Spawn a `researcher` subagent to gather evidence on the artifact — inspect the paper, code, cited work, and any linked experimental artifacts. Save to `research.md`.
+- Spawn a `reviewer` subagent with `research.md` to produce the final peer review with inline annotations.
+- For small or simple artifacts where evidence gathering is overkill, run the `reviewer` subagent directly instead.
+- Save exactly one review artifact to `outputs/` as markdown.
+- End with a `Sources` section containing direct URLs for every inspected external source.
package/prompts/watch.md
ADDED
@@ -0,0 +1,14 @@
+---
+description: Set up a recurring or deferred research watch on a topic, company, paper area, or product surface.
+args: <topic>
+section: Research Workflows
+topLevelCli: true
+---
+Create a research watch for: $@
+
+Requirements:
+- Before starting, outline the watch plan: what to monitor, what signals matter, what counts as a meaningful change, and the check frequency. Present the plan to the user and confirm before proceeding.
+- Start with a baseline sweep of the topic.
+- Use `schedule_prompt` to create the recurring or delayed follow-up instead of merely promising to check later.
+- Save exactly one baseline artifact to `outputs/`.
+- End with a `Sources` section containing direct URLs for every source used.