npm - open-research - Versions diffs - 0.1.25 → 0.1.26 - Mend

open-research 0.1.25 → 0.1.26

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/README.md +48 -18
package/builtin-skills/novelty-checker/SKILL.md +79 -0
package/builtin-skills/paper-explainer/SKILL.md +83 -21
package/builtin-skills/reviewer-response/SKILL.md +99 -0
package/builtin-skills/skill-creator/SKILL.md +98 -7
package/dist/chunk-AYB7CAO5.js +128 -0
package/dist/cli.js +3893 -1895
package/dist/sessions-FMB5GHSR.js +10 -0
package/package.json +1 -1
package/builtin-skills/literature-reviewer/SKILL.md +0 -72
package/builtin-skills/synthesis-updater/SKILL.md +0 -43

package/README.md CHANGED

@@ -70,7 +70,7 @@ The agent searches arXiv, Semantic Scholar, and OpenAlex — reads papers, runs
 Those are coding agents. Open Research is a **research agent**.
-It has tools that coding agents don't: federated academic paper search, PDF extraction, source-grounded synthesis, and pluggable research skills (devil's advocate, methodology critic, experiment designer, etc.).
+It has tools that coding agents don't: federated academic paper search, PDF extraction, source-grounded synthesis, sub-agent delegation, and pluggable research skills (novelty checker, experiment designer, reviewer response manager, etc.).
 Everything stays local. Your workspace is a directory with `sources/`, `notes/`, `papers/`, `experiments/`. The agent reads and writes to it. Risky edits go to a review queue.
@@ -102,17 +102,29 @@ You review the charter and either approve it, send it back for revision, or canc
 **Phase 2 — Execution.** Once approved, the agent executes the charter autonomously — searching papers, reading sources, running analysis code, writing notes, and producing artifacts. It runs until the success criteria are met or it hits a dead end and reports what it found.
+## Sub-Agents
+The main agent can delegate exploration tasks to lightweight sub-agents that run on their own context window. This keeps the main agent's context clean and improves token efficiency.
+```
+launch_subagent(type: "explore", goal: "Find all files related to the auth flow...")
+```
+The **explore** sub-agent runs on `gpt-5.4-mini` with high reasoning effort. It has read-only tools (`read_file`, `list_directory`, `search_workspace`) and returns a concise, conclusion-oriented summary. The main agent gets the answer without burning its context on raw file reads.
+Sub-agents are extensible — new types can be added as config entries without changing the tool schema.
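One way to read "new types can be added as config entries without changing the tool schema" is a registry keyed by type name. The sketch below is only an illustration under that assumption: `SubAgentSpec`, `SUBAGENT_TYPES`, `resolveSubAgent`, and the `reasoningEffort` field name are hypothetical and are not taken from the package source. Only the `explore` entry's model and tool list come from the README text.

```typescript
// Hypothetical sketch: none of these identifiers come from the open-research source.
type ToolName = "read_file" | "list_directory" | "search_workspace";

interface SubAgentSpec {
  model: string;                              // model the sub-agent runs on
  reasoningEffort: "low" | "medium" | "high"; // assumed setting name
  tools: ToolName[];                          // read-only tool allowlist
}

// Adding a sub-agent type means adding an entry here; the tool call itself
// (a `type` string plus a `goal` string) keeps the same shape.
const SUBAGENT_TYPES = new Map<string, SubAgentSpec>([
  [
    "explore",
    {
      model: "gpt-5.4-mini",
      reasoningEffort: "high",
      tools: ["read_file", "list_directory", "search_workspace"],
    },
  ],
]);

interface LaunchRequest {
  type: string;
  goal: string;
}

// Resolve a launch request against the registry, rejecting unknown types.
function resolveSubAgent(req: LaunchRequest): SubAgentSpec {
  const spec = SUBAGENT_TYPES.get(req.type);
  if (spec === undefined) {
    const known = Array.from(SUBAGENT_TYPES.keys()).join(", ");
    throw new Error(`Unknown sub-agent type "${req.type}" (known: ${known})`);
  }
  return spec;
}
```

With this shape, supporting a second type would be a one-entry change to the map, and `resolveSubAgent` would stay untouched.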
 ## Research Skills
 Skills are pluggable research methodologies — detailed workflow prompts that guide the agent through a specific research task. Type `/<skill-name>` to activate.
-### Discovery & Reading
+### Ideation & Discovery
 | Skill | What it does |
 |---|---|
+| **`/novelty-checker`** | Quick "has this been done?" assessment. Decomposes ideas into technique/domain/claim components, runs 5-8 search variations, and delivers a verdict: Novel, Partially novel, Incremental, or Already done — with closest existing work, white space map, and pivot recommendations. |
 | **`/source-scout`** | Systematically finds papers the workspace is missing. Searches with multiple query variations, evaluates relevance by citation count and venue, fetches key papers, produces a prioritized scout report with gap analysis. |
-| **`/paper-explainer`** | Deep-reads a paper and produces a structured breakdown: one-sentence summary, problem & motivation, key contributions, method explained at two levels (intuitive + technical), experimental results, limitations, and connections to your workspace. |
-| **`/literature-reviewer`** | Produces a structured literature review: inventories all sources, clusters by theme, synthesizes each theme chronologically, maps relationships between papers, performs gap analysis (methodological, empirical, theoretical), and writes the review with optional PRISMA systematic review support. |
+| **`/paper-explainer`** | Two modes: (1) Single paper deep read with structured breakdown including methodological red flags, or (2) Multi-paper comparison table with structured extraction across 6-10 dimensions (Elicit-style) and cross-paper synthesis. |
 ### Critical Evaluation
@@ -129,24 +141,28 @@ Skills are pluggable research methodologies — detailed workflow prompts that g
 | **`/experiment-designer`** | Autonomous proof engine. Takes a hypothesis and runs the full loop: formalize → design minimal experiment → write code → run it → analyze results → iterate (up to 5x) until proven or disproven. All artifacts saved to `experiments/` with versioned scripts. |
 | **`/data-analyst`** | End-to-end statistical analysis: explore data (distributions, missing values) → clean (with documented decisions) → analyze (appropriate tests, mandatory effect sizes and confidence intervals) → visualize (matplotlib/seaborn) → interpret with honest caveats. |
-### Synthesis & Writing
+### Writing & Revision
 | Skill | What it does |
 |---|---|
-| **`/synthesis-updater`** | Living-document management. Integrates new evidence into existing notes with full provenance tracking (`[Source: Author Year]`), confidence labels (`[Strong]`, `[Moderate]`, `[Weak]`, `[Contested]`), change trails, and a synthesis changelog. |
 | **`/draft-paper`** | Drafts a publication-quality LaTeX paper: gathers workspace evidence → outlines the argument → writes each section (intro through conclusion) → generates BibTeX from sources → self-reviews for unsupported claims and argument flow. |
+| **`/reviewer-response`** | Parses peer review comments into numbered items (R1.1, R1.2...), classifies as Major/Minor/Praise/Question, flags contradictions between reviewers, generates a point-by-point response letter with verbatim quotes and specific change locations, and maintains a revision completion checklist. |
 ### Meta
 | Skill | What it does |
 |---|---|
-| **`/skill-creator`** | Create your own custom skills in `~/.open-research/skills/`. Each skill is a markdown file with a workflow prompt — no code needed. |
+| **`/skill-creator`** | Create custom skills in `~/.open-research/skills/`. Full guidance on the SKILL.md format, directory structure, prompt design, and validation — with quality guidelines for writing effective workflow prompts. |
 ## Memory
 The agent learns about you automatically. After each conversation, a background process identifies facts worth remembering — your research field, preferred tools, current projects, methodological preferences.
-Memories persist in `~/.open-research/memory.json` across sessions. The agent uses them to tailor its responses without being told the same things twice.
+Memories are stored at two levels:
+- **Global** (`~/.open-research/memory.json`) — your profile, preferences, expertise
+- **Project** (`<workspace>/.open-research/memory.json`) — project-specific context
+Only relevant memories are injected each turn based on query similarity, keeping the context window efficient.
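The diff does not show how "query similarity" is computed. As a rough illustration of selective retrieval across the two tiers, the sketch below scores each memory by Jaccard word overlap with the query and keeps the top few. The similarity measure, the `Memory` shape, and the `k` and `minScore` defaults are all assumptions made for the example; the package may well use something different.

```typescript
// Hypothetical sketch; the real similarity measure is not shown in the diff.
interface Memory {
  scope: "global" | "project"; // which memory.json the entry came from
  text: string;
}

// Lowercased word set used for a crude similarity score.
function tokens(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

// Jaccard overlap between two token sets: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared += 1;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Keep at most k memories, from either tier, whose score clears the threshold.
function selectMemories(query: string, all: Memory[], k = 3, minScore = 0.1): Memory[] {
  const q = tokens(query);
  return all
    .map((m) => ({ m, score: jaccard(q, tokens(m.text)) }))
    .filter((x) => x.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.m);
}
```

Global and project memories are ranked together here, so a project note can displace a weakly related global preference. Whether the real implementation merges the tiers this way is not stated in the README.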
 ```
 /memory              View all stored memories
@@ -172,7 +188,7 @@ For final PDF output, the agent compiles with `pdflatex` or `tectonic` via `run_
 ## Tools
-The agent has 13 tools with full filesystem and shell access:
+The agent has 14 tools with full filesystem and shell access:
 | Tool | Description |
 |---|---|
@@ -189,6 +205,7 @@ The agent has 13 tools with full filesystem and shell access:
 | `create_paper` | Create LaTeX paper drafts |
 | `load_skill` | Activate a research skill |
 | `read_skill_reference` | Read reference materials from active skills |
+| `launch_subagent` | Delegate tasks to lightweight sub-agents with isolated context |
 ## Commands
@@ -200,7 +217,15 @@ The agent has 13 tools with full filesystem and shell access:
 | `/skills` | List available research skills |
 | `/preview <file>` | Live-preview a LaTeX file in browser |
 | `/memory` | View or manage stored memories |
-| `/config` | View or change settings (model, theme, mode) |
+| `/api-keys` | Set API keys for Semantic Scholar, OpenAlex |
+| `/config` | View or change settings (model, theme, mode, apikey) |
+| `/compact` | Manually compress conversation to save context |
+| `/cost` | Show token usage and cost for the session |
+| `/context` | Show context window usage — how full it is |
+| `/btw` | Ask a side question without affecting the main conversation |
+| `/export` | Export conversation as markdown |
+| `/diff` | Show files the agent has changed this session |
+| `/doctor` | Diagnose auth, connectivity, and tool availability |
 | `/resume` | Resume a previous session |
 | `/clear` | Start a new conversation |
 | `/help` | Show all commands |
@@ -214,18 +239,23 @@ my-research/
   artifacts/       # Generated outputs
   papers/          # LaTeX paper drafts
   experiments/     # Analysis scripts, results, hypotheses
-  .open-research/  # Workspace metadata and session logs
+  .open-research/  # Workspace metadata, sessions, project memory
+    AGENTS.md      # Auto-generated project context (injected into system prompt)
 ```
 ## Features
-- **Terminal markdown** — bold, italic, code blocks, headings rendered natively
-- **Autocomplete** — slash commands and skills in an arrow-key navigable dropdown
-- **@file mentions** — reference workspace files inline in prompts
+- **Senior research director persona** — concise, conclusion-oriented responses. Findings first, evidence second.
+- **Sub-agent delegation** — explore agent handles codebase navigation on its own context, returns summaries
+- **Terminal markdown** — bold, italic, code blocks, headings rendered natively with chalk
+- **Autocomplete** — slash commands, skills, and @file mentions in a scrollable arrow-key dropdown
+- **Condensed tool activity** — grouped summary per turn instead of per-tool spam, with live progress in footer
 - **Shift+Enter** — multi-line input
-- **Context management** — automatic compaction when history exceeds 90% of context window
-- **Token tracking** — context usage visible in the status bar
-- **Tool activity streaming** — real-time display of what the agent is doing
+- **Slash command highlighting** — commands appear in blue as you type
+- **Context management** — automatic two-phase compaction at 90% of context window
+- **Token tracking** — context usage visible in the status bar (input/output/reasoning/cache breakdown)
+- **AGENTS.md** — auto-generated project context file, updated after each turn, injected into system prompt
+- **Two-tier memory** — global + project-level, with selective retrieval based on query relevance
 - **Update notifications** — checks for new versions on launch
 ## Development
@@ -235,7 +265,7 @@ git clone https://github.com/gangj277/open-research.git
 cd open-research
 npm install
 npm run dev          # dev mode
-npm test             # 80 tests
+npm test             # tests
 npm run build        # production build
 ```

package/builtin-skills/novelty-checker/SKILL.md ADDED

@@ -0,0 +1,79 @@
+---
+name: novelty-checker
+description: Quick assessment of whether a research idea has been done before, and what the competitive landscape looks like.
+---
+# Novelty Checker
+You are a research landscape analyst. Your job is to take a research idea — often rough and early-stage — and quickly determine whether it's been done, what's close to it, and where the genuine white space is. You help researchers avoid spending weeks on something that already exists.
+## Workflow
+### Phase 1: Understand the Idea
+1. Read the user's input carefully. If it's vague, use `ask_user` to clarify:
+   - What specifically are they proposing? (method, finding, application, framework)
+   - What domain or field is this in?
+   - What makes them think this might be novel?
+2. Decompose the idea into its core components. Most research ideas combine:
+   - A **technique** (what approach or method)
+   - A **domain** (what field or application area)
+   - A **claim** (what result or contribution)
+   - Example: "Using transformer attention maps for interpretable medical diagnosis" = technique (attention maps) + domain (medical diagnosis) + claim (interpretability)
+### Phase 2: Systematic Search
+Search aggressively using `search_external_sources` with multiple query strategies:
+1. **Direct match** — search the idea as stated
+2. **Component combinations** — search each pair of components (technique + domain, technique + claim, domain + claim)
+3. **Synonym variations** — replace key terms with synonyms or related concepts (e.g., "interpretable" → "explainable", "medical" → "clinical")
+4. **Broader framing** — search the general area to find survey papers that would mention existing work
+5. **Narrower framing** — search for very specific variants that might be buried in larger papers
+Run at least 5-8 searches with different query formulations. Cast a wide net.
+### Phase 3: Assess Each Hit
+For each relevant paper found:
+1. Read the title and abstract (use `fetch_url` if available as open access)
+2. Determine the overlap:
+   - **Direct hit**: this paper does essentially the same thing
+   - **Partial overlap**: shares some components but differs meaningfully
+   - **Tangential**: related topic but different approach or contribution
+3. Note the year, venue, and citation count — recent work in top venues is more concerning for novelty than old work in minor venues
+### Phase 4: Deliver the Verdict
+Write a clear assessment to `notes/novelty-check-{topic}.md`:
+**Verdict** — one of:
+- **Novel**: No existing work does this. Genuine white space.
+- **Partially novel**: Components exist separately but the specific combination is new. Differentiation needed.
+- **Incremental**: Similar work exists. The idea could still be a paper but needs clear positioning against prior art.
+- **Already done**: This has been published. Cite the existing work and pivot.
+**Closest existing work** — list the 3-5 most relevant papers:
+- Title, authors, year, venue
+- One sentence on what they did
+- One sentence on how the user's idea differs (or doesn't)
+**White space map** — what nearby areas are genuinely unexplored:
+- What variations of this idea have NOT been tried?
+- What domains has this technique NOT been applied to?
+- What claims could be made that existing work doesn't support?
+**Recommendation** — based on the landscape:
+- If novel: "Proceed. No close competitors found. Suggested positioning: ..."
+- If partially novel: "Proceed with differentiation. Key distinction from [paper X] is ... Frame the contribution as ..."
+- If incremental: "Existing work by [authors] covers the core idea. To make this publishable, you would need to ... Consider pivoting to ..."
+- If already done: "Published by [authors] in [venue] ([year]). Read this paper first. Possible pivots: ..."
+## Rules
+- Search before judging. Never declare an idea novel without running at least 5 searches with different query formulations.
+- Be honest, not encouraging. If the idea has been done, say so immediately. A researcher would rather know on day 1 than day 60.
+- Absence of evidence is not evidence of absence. If you can't find prior work, say "no prior work found in my search" — not "this is definitely novel." The databases don't cover everything.
+- Distinguish between "no one has published this" and "no one has published this in a top venue." Workshop papers, preprints, and theses count as prior art.
+- Always suggest the closest pivot. Even if the exact idea is taken, there's usually an adjacent unexplored angle.
+- Speed matters. This skill is for quick validation (15-20 tool calls max), not exhaustive literature review. If the user wants depth, suggest running `/source-scout` after.
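The Phase 2 search strategies in the added novelty-checker skill can be sketched as a small query generator. This is an illustrative sketch only: the skill is a prompt rather than code, and `Idea`, `SYNONYMS`, and `buildQueries` are hypothetical names. The synonym table is hard-coded from the skill's own example, and the "narrower framing" strategy is left out because it depends on the specific idea.

```typescript
// Hypothetical sketch of the query strategies; not code from the package.
interface Idea {
  technique: string;
  domain: string;
  claim: string;
}

// Assumed synonym table, seeded with the examples given in the skill text.
const SYNONYMS = new Map<string, string>([
  ["interpretable", "explainable"],
  ["medical", "clinical"],
]);

// Replace each word that has a known synonym; leave the rest unchanged.
function swapSynonyms(query: string): string {
  return query
    .split(" ")
    .map((w) => SYNONYMS.get(w) ?? w)
    .join(" ");
}

function buildQueries(idea: Idea): string[] {
  const { technique, domain, claim } = idea;
  const direct = `${technique} ${domain} ${claim}`; // 1. direct match
  const pairs = [                                    // 2. component combinations
    `${technique} ${domain}`,
    `${technique} ${claim}`,
    `${domain} ${claim}`,
  ];
  const variants = [direct, ...pairs].map(swapSynonyms); // 3. synonym variations
  const broader = `${domain} survey`;                    // 4. broader framing

  // Deduplicate while preserving order.
  const seen = new Set<string>();
  const out: string[] = [];
  for (const q of [direct, ...pairs, ...variants, broader]) {
    if (!seen.has(q)) {
      seen.add(q);
      out.push(q);
    }
  }
  return out;
}
```

For the skill's own example (technique "attention maps", domain "medical diagnosis", claim "interpretable") this produces nine distinct queries, which clears the rule of at least five formulations.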

package/builtin-skills/paper-explainer/SKILL.md CHANGED

@@ -1,43 +1,105 @@
 ---
 name: paper-explainer
-description: Deep-read a paper and produce a structured, accessible breakdown of its contributions, methods, and significance.
+description: Deep-read papers and produce structured breakdowns, or compare multiple papers in an extraction table.
 ---
 # Paper Explainer
-You are an expert paper reader. Your job is to take a complex academic paper and produce a clear, structured explanation that makes its contributions, methods, and limitations accessible — without oversimplifying.
+You are an expert paper reader. Your job is to take academic papers and produce clear, structured explanations that make contributions, methods, and limitations accessible — without oversimplifying. You operate in two modes: single-paper deep read or multi-paper comparison.
-## Workflow
+## Mode 1: Single Paper Deep Read
-1. **Read the full paper** — use `read_file` or `read_pdf` to get the complete text. Don't skim.
+### Phase 1: Read
-2. **Produce a structured breakdown** with these sections:
+1. Use `read_file` or `read_pdf` to get the complete text. Read the full paper — don't skim.
+2. If the full text isn't available, say so explicitly and work from whatever is accessible (abstract, introduction, figures).
-   **One-sentence summary** — What is the single most important thing this paper contributes?
+### Phase 2: Structured Breakdown
-   **Problem & motivation** — What gap or problem does this paper address? Why does it matter? What was the state of the art before this work?
+Produce these sections in order:
-   **Key contributions** — List 2-4 specific contributions. Be precise: "proposes X" not "addresses the problem."
+**One-sentence summary** — The single most important contribution, stated precisely.
-   **Method** — How does the approach work? Explain the core mechanism at two levels:
-   - High-level intuition (what it does conceptually)
-   - Technical detail (how it works, including key equations or algorithms if relevant)
+**Problem & motivation** — What gap exists? Why does it matter? What was the state of the art before this work?
-   **Experimental setup** — What datasets, baselines, and metrics were used? Are these standard in the field?
+**Key contributions** — 2-4 specific contributions. "Proposes X" or "Demonstrates Y", not "addresses the problem."
-   **Key results** — What are the headline numbers? Include specific figures. How do they compare to baselines?
+**Method** — Explain the core mechanism at two levels:
+- *Intuition*: what it does conceptually, in plain language
+- *Technical detail*: how it works — key equations, algorithms, architecture choices. Include enough detail that a researcher could assess whether the approach is sound.
-   **Limitations** — What does the paper acknowledge? What should it acknowledge but doesn't?
+**Experimental setup** — Datasets, baselines, metrics, and hyperparameters. Are these standard in the field? What's missing?
-   **Connections to workspace** — How does this paper relate to the current research in the workspace? Does it support, contradict, or extend existing work?
+**Key results** — Headline numbers with specific figures. How do they compare to baselines? What's the magnitude of improvement?
-3. **Explain jargon** — define any field-specific terms that a researcher from a neighboring field wouldn't know.
+**Methodological red flags** — Evaluate critically:
+- Is the evaluation fair? (cherry-picked baselines, weak comparisons, favorable datasets)
+- Are claims proportional to evidence? (overclaiming from limited experiments)
+- Is the method truly novel or incremental over prior work?
+- Sample sizes, statistical significance, confidence intervals — are they reported?
+- Any signs of p-hacking, data leakage, or circular evaluation?
-4. **Save the breakdown** — write to `notes/paper-explained-{short-title}.md`
+**Limitations** — What does the paper acknowledge? What should it acknowledge but doesn't?
+**Connections to workspace** — How does this paper relate to the current research? Does it support, contradict, or extend existing work in the workspace?
+### Phase 3: Jargon & Context
+Define field-specific terms a researcher from a neighboring discipline wouldn't know. Place these inline or as a glossary at the end.
+### Phase 4: Save
+Write to `notes/paper-explained-{short-title}.md`.
+## Mode 2: Multi-Paper Comparison Table
+Use this mode when the user asks to compare papers, or when multiple papers on the same topic need structured extraction.
+### Phase 1: Identify Papers
+1. Read the workspace to find the papers to compare, or ask the user which papers.
+2. Read each paper fully using `read_file` or `read_pdf`.
+### Phase 2: Define Extraction Dimensions
+Based on the papers' shared topic, choose 6-10 comparison dimensions. Common dimensions:
+| Dimension | What to extract |
+|-----------|----------------|
+| Research question | What specific question does each paper address? |
+| Method/approach | Core technique or algorithm |
+| Dataset | What data, how much, what domain |
+| Sample size | N for the main evaluation |
+| Key metric | Primary evaluation metric and reported value |
+| Baselines | What is compared against |
+| Main finding | One-sentence headline result |
+| Limitations | Self-reported or identified weaknesses |
+| Code/data available | Is a replication package provided? |
+| Year / venue | Publication context |
+Adapt dimensions to the specific topic — replace generic ones with domain-relevant ones (e.g., "model size" for ML papers, "population" for clinical studies).
+### Phase 3: Extract and Tabulate
+For each paper, extract values for every dimension. Use exact numbers where available. If a dimension isn't reported, mark it "NR" (not reported) — don't guess.
+### Phase 4: Synthesize
+After the table, write a 2-3 paragraph synthesis:
+- What patterns emerge across the papers?
+- Where do they agree? Where do they conflict?
+- Which paper has the strongest methodology? The most compelling results?
+- What gaps remain that none of the papers address?
+### Phase 5: Save
+Write to `notes/paper-comparison-{topic}.md` with the table in markdown format.
 ## Rules
-- Read the actual paper, don't hallucinate content. If you can't access the full text, say so and work from the abstract.
-- Distinguish between what the paper claims and what the evidence supports.
-- If the paper has figures or tables you can't see, acknowledge that gap.
-- Tailor the explanation depth to the user's expertise level (check memories for their background).
+- Read the actual paper. Never hallucinate content. If you can't access full text, state this and work from what's available.
+- Distinguish between what the paper **claims** and what the **evidence supports**. These are often different.
+- If the paper has figures or tables you can't see, acknowledge the gap and note what they reportedly show based on the text description.
+- For comparison tables: every cell must come from the paper. Use "NR" for not reported. Never fill in plausible-sounding values.
+- Methodological red flags are not optional. Every paper gets scrutinized — prestigious venue doesn't mean sound methodology.
+- Match explanation depth to the user's expertise level. Check memories for their background.
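The Mode 2 rule that every unreported cell is marked "NR" rather than guessed can be made concrete with a small table renderer. The sketch below is an assumption-laden illustration, not part of the package: `PaperRow` and `comparisonTable` are hypothetical names, and the output is a plain Markdown pipe table.

```typescript
// Hypothetical sketch: renders extracted values as a Markdown comparison table.
interface PaperRow {
  title: string;
  values: Map<string, string>; // dimension name -> value extracted from the paper
}

// Escape pipes so a cell value cannot break the table layout.
function escapeCell(s: string): string {
  return s.replace(/\|/g, "\\|");
}

function comparisonTable(dimensions: string[], papers: PaperRow[]): string {
  const header = `| Paper | ${dimensions.join(" | ")} |`;
  const rule = `|${"---|".repeat(dimensions.length + 1)}`;
  const rows = papers.map((p) => {
    // A dimension with no extracted value becomes "NR" (not reported).
    const cells = dimensions.map((d) => escapeCell(p.values.get(d) ?? "NR"));
    return `| ${escapeCell(p.title)} | ${cells.join(" | ")} |`;
  });
  return [header, rule, ...rows].join("\n");
}
```

Because the renderer only ever reads from `values`, a missing dimension cannot silently pick up a plausible-looking default; it has to be extracted or it shows as "NR".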

package/builtin-skills/reviewer-response/SKILL.md ADDED

@@ -0,0 +1,99 @@
+---
+name: reviewer-response
+description: Parse peer review comments and generate structured point-by-point response letters with revision tracking.
+---
+# Reviewer Response
+You are a revision strategist. Your job is to take raw peer review feedback, parse it into actionable items, help the researcher plan revisions, and produce a professional point-by-point response letter.
+## Workflow
+### Phase 1: Parse the Reviews
+1. Read the review text — the user will provide it as a file or paste. Use `read_file` if it's in the workspace.
+2. Extract every distinct comment from each reviewer. Number them: R1.1, R1.2, R2.1, R2.2, etc.
+3. Classify each comment:
+   - **Major**: Requires substantial changes (new experiments, rewritten sections, additional analysis)
+   - **Minor**: Requires small changes (clarification, typo, citation, reformulation)
+   - **Praise**: Positive feedback — note these for morale and to reference in the response
+   - **Question**: Reviewer is asking for information, not demanding a change
+4. Flag contradictions between reviewers (R1 says "too long", R2 says "need more detail").
+5. Save the parsed structure to `notes/reviews-parsed.md`.
+### Phase 2: Triage and Plan
+1. Group comments by theme (methodology, writing, experiments, claims, missing references, etc.)
+2. For each major comment, assess:
+   - Is the reviewer right? If yes, plan the revision.
+   - Is it a misunderstanding? If so, plan both a clarification in the response AND a revision to prevent future misunderstanding.
+   - Is it unreasonable or out of scope? Flag it — the user decides how to handle these.
+3. Identify "cascade" changes — fixing one major comment that also addresses several minor ones.
+4. Estimate effort for each revision: quick fix, moderate rewrite, or significant new work.
+5. Propose a revision order — address cascading changes first, then isolated major items, then minor.
+6. Save the plan to `notes/revision-plan.md`.
+### Phase 3: Draft the Response Letter
+Write a structured response letter in `papers/response-letter.tex` (or `.md` if the user prefers):
+**Format for each comment:**
+```
+\textbf{Reviewer [N], Comment [M]:}
+\begin{quote}
+[Exact quote of the reviewer's comment — copy verbatim]
+\end{quote}
+\textbf{Response:}
+[Your response — thank, address, explain. Reference specific changes.]
+\textbf{Changes made:}
+[Describe exactly what changed in the manuscript and where. "Section 3.2, paragraph 2: Added clarification of the sampling procedure." or "Table 3: Added new baseline comparison as requested."]
+```
+**Response writing principles:**
+- Start every response with acknowledgment: "We thank the reviewer for this observation." (brief, not groveling)
+- Be direct about what changed and where
+- For disagreements: present evidence respectfully, never dismiss
+- For contradictory reviews: explain the tension and your resolution
+- For out-of-scope requests: acknowledge the importance, explain why it's beyond the current scope, suggest it as future work
+- Every major comment must reference a concrete change with a location in the manuscript
+### Phase 4: Track Completeness
+1. Create a checklist in `notes/revision-checklist.md`:
+```markdown
+## Revision Checklist
+### Reviewer 1
+- [x] R1.1 (Major) — Added baseline comparison in Table 3
+- [x] R1.2 (Minor) — Fixed citation format in Section 2
+- [ ] R1.3 (Major) — Need to run additional experiment
+### Reviewer 2
+- [x] R2.1 (Minor) — Clarified notation in Section 3.1
+- [ ] R2.2 (Major) — Waiting on user decision (contradicts R1.4)
+```
+2. Verify every single comment has a response. Missing even one is a rejection risk.
+3. Flag any items that require the user's input or decision.
+### Phase 5: Generate Diff Summary
+If the original manuscript exists in the workspace:
+1. Read the original paper file
+2. Summarize all changes made, section by section
+3. If LaTeX, suggest running `latexdiff` between the original and revised version:
+   `latexdiff original.tex revised.tex > diff.tex`
+4. Save the summary to `notes/revision-summary.md`
+## Rules
+- Quote reviewer comments verbatim. Never paraphrase a reviewer's words in the response letter — they know what they wrote.
+- Every major comment must map to a concrete manuscript change with a specific location. "We have revised the manuscript accordingly" without specifics is unacceptable.
+- Never be defensive or dismissive, even when reviewers are wrong. Academic tone: firm but respectful.
+- If two reviewers contradict each other, surface this explicitly to the user before writing the response. Don't guess which reviewer to prioritize.
+- Track completeness obsessively. A missed comment is worse than a weak response.
+- Don't fabricate experimental results. If a reviewer requests a new experiment, draft the response as a placeholder and flag it to the user: "This requires running a new experiment. Response drafted as template — fill in results after."
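The numbering scheme from Phase 1 (R1.1, R1.2, R2.1, ...) and the Phase 4 completeness check can be sketched as two small functions. This is a hedged illustration: the skill itself is a prompt, and `ReviewComment`, `numberComments`, and `missingResponses` are hypothetical names introduced for the example.

```typescript
// Hypothetical sketch of comment numbering and the completeness check.
type Kind = "Major" | "Minor" | "Praise" | "Question";

interface ReviewComment {
  id: string; // e.g. "R2.1" = reviewer 2, comment 1
  kind: Kind;
  text: string;
}

// Assign IDs from per-reviewer comment lists (outer index = reviewer).
function numberComments(reviews: { kind: Kind; text: string }[][]): ReviewComment[] {
  const out: ReviewComment[] = [];
  reviews.forEach((comments, r) => {
    comments.forEach((c, i) => {
      out.push({ id: `R${r + 1}.${i + 1}`, kind: c.kind, text: c.text });
    });
  });
  return out;
}

// Return the IDs of comments that have no drafted response yet.
function missingResponses(comments: ReviewComment[], responded: Set<string>): string[] {
  return comments.filter((c) => !responded.has(c.id)).map((c) => c.id);
}
```

An empty result from `missingResponses` corresponds to the "every single comment has a response" condition. Praise items are included on purpose, since the skill asks for all comments to be accounted for.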

package/builtin-skills/skill-creator/SKILL.md CHANGED

@@ -1,16 +1,107 @@
 ---
 name: skill-creator
-description: Create or update Open Research skills using the Codex-style skill format.
+description: Create, update, or package custom Open Research skills with proper structure and effective prompts.
 ---
 # Skill Creator
-Use this skill when the user wants to create, revise, or package a custom research skill.
+You are a skill engineer. Your job is to help the user create high-quality custom research skills that integrate seamlessly with Open Research.
+## Understanding Skills
+A skill is a reusable research methodology that becomes available via `/skill-name` in the CLI. Each skill is a directory containing:
+```
+~/.open-research/skills/{skill-name}/
+  SKILL.md          # Required — frontmatter + prompt
+  scripts/          # Optional — executable code the skill can reference
+  references/       # Optional — supporting docs readable via read_skill_reference tool
+  assets/           # Optional — data files, templates, images
+```
+### SKILL.md Format
+```markdown
+---
+name: {skill-name}
+description: {One-line description shown in the skill list. Be specific about what it does.}
+---
+# {Display Name}
+{Opening paragraph: define the role/persona and the job this skill performs.}
+## Workflow
+{Numbered phases with actionable steps. Each phase should have:
+- A clear name and purpose
+- Numbered sub-steps
+- Which tools to use (read_file, run_command, search_external_sources, etc.)
+- What output to produce and where to save it}
+## Rules
+{Non-negotiable constraints. What the skill must always do and must never do.}
+```
+### Naming Rules
+- `name` in frontmatter must be lowercase, hyphens only, alphanumeric: `my-skill-name`
+- The directory name must exactly match the `name` field
+- Cannot shadow a builtin skill name (data-analyst, devils-advocate, draft-paper, etc.)
 ## Workflow
-1. Clarify the job the skill should do.
-2. Define trigger phrases and example requests.
-3. Keep the skill concise.
-4. Use `SKILL.md` plus optional `scripts/`, `references/`, and `assets/`.
-5. Validate that the folder name matches the normalized skill name.
+### Phase 1: Clarify the Job
+Before writing anything:
+1. Ask the user: what research task should this skill automate?
+2. Identify the **input** (what does the user provide?) and **output** (what artifact does the skill produce?)
+3. Determine which tools the skill will need (read_file, run_command, search_external_sources, fetch_url, ask_user, etc.)
+4. Check if an existing builtin skill already covers this — if so, suggest using or extending it instead
+### Phase 2: Design the Workflow
+Structure the skill as 3-6 phases:
+1. Each phase should be a clear step with a verb: "Gather", "Analyze", "Evaluate", "Write", "Verify"
+2. Within each phase, write specific numbered actions — not vague guidance
+3. Specify where outputs are saved: `notes/`, `experiments/`, `papers/`, `artifacts/`
+4. Include tool usage: "Use `search_external_sources` to find..." not just "search for papers"
+5. Include decision points: what happens if a step fails or produces unexpected results?
+Good: "Run the analysis script with `run_command`. If it fails, read the error, fix the script, and re-run. Maximum 3 retries."
+Bad: "Run the analysis."
+### Phase 3: Write the Rules
+Rules prevent the skill from drifting. Include:
+1. **Quality gates** — what standards must the output meet?
+2. **Grounding requirements** — must claims be cited? Must code be executed?
+3. **Scope limits** — what should the skill explicitly NOT do?
+4. **Failure behavior** — what happens when something doesn't work?
+### Phase 4: Write the SKILL.md
+1. Write the frontmatter with a specific, non-generic description
+2. Write an opening paragraph that defines the persona and job clearly
+3. Write the workflow phases with full detail
+4. Write the rules section
+5. Review: could a capable LLM follow these instructions without ambiguity?
+### Phase 5: Scaffold and Validate
+1. Create the skill directory at `~/.open-research/skills/{name}/`
+2. Write the SKILL.md file
+3. If the skill needs scripts, create them in `scripts/`
+4. If the skill needs reference docs, create them in `references/`
+5. Verify: folder name matches the `name` field exactly
+6. Verify: frontmatter has both `name` and `description`
+## Rules
+- Every skill must have a clear, single job. If it does two things, it should be two skills.
+- The description field is what the user sees in the skill list — make it specific and useful, not vague.
+- Workflow steps must be actionable and tool-aware. "Analyze the data" is useless. "Write a Python script in `experiments/analyze.py` that computes descriptive statistics, run it with `run_command`, read the output" is useful.
+- Always include a Rules section. Skills without constraints produce inconsistent results.
+- Don't make skills too long. 50-120 lines of prompt is the sweet spot. If it's longer, the skill is probably trying to do too much.
+- Test the skill mentally: if you read only the SKILL.md, could you complete the task? If not, it's missing information.
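The naming rules and the Phase 5 checks lend themselves to a small validator. The sketch below is one possible reading, not the package's actual validation code: `validateSkill`, `parseFrontmatter`, and `NAME_PATTERN` are hypothetical, the regex additionally rejects leading, trailing, and doubled hyphens (an assumption), and `BUILTIN` holds only the three names the skill text lists before "etc.".

```typescript
// Hypothetical sketch; the real validation logic is not shown in the diff.
// Partial builtin list: only the names the skill text spells out.
const BUILTIN = new Set(["data-analyst", "devils-advocate", "draft-paper"]);

// Lowercase alphanumeric segments joined by single hyphens (assumed strictness).
const NAME_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Parse simple `key: value` pairs between the leading `---` fences.
function parseFrontmatter(skillMd: string): Map<string, string> {
  const fields = new Map<string, string>();
  let inside = false;
  for (const line of skillMd.split(/\r?\n/)) {
    if (line.trim() === "---") {
      if (inside) break; // closing fence
      inside = true;     // opening fence
      continue;
    }
    if (!inside) break;  // frontmatter must start on the first line
    const colon = line.indexOf(":");
    if (colon > 0) {
      fields.set(line.slice(0, colon).trim(), line.slice(colon + 1).trim());
    }
  }
  return fields;
}

// Return a list of problems; an empty list means the skill passes these checks.
function validateSkill(dirName: string, skillMd: string): string[] {
  const problems: string[] = [];
  const fm = parseFrontmatter(skillMd);
  const name = fm.get("name");
  if (name === undefined || name === "") {
    problems.push("frontmatter is missing name");
  } else {
    if (!NAME_PATTERN.test(name)) problems.push("name must be lowercase alphanumeric with hyphens");
    if (name !== dirName) problems.push("directory name must match the name field");
    if (BUILTIN.has(name)) problems.push("name shadows a builtin skill");
  }
  const description = fm.get("description");
  if (description === undefined || description === "") {
    problems.push("frontmatter is missing description");
  }
  return problems;
}
```

Returning a list of problems, rather than throwing on the first one, lets all of the Phase 5 "Verify" items be reported in a single pass.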