npm - clideck - Versions diffs - 1.30.5 → 1.30.7 - Mend

clideck 1.30.5 → 1.30.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

package/README.md +8 -2
package/bin/clideck.js +7 -1
package/clideck-ask-cli.js +110 -0
package/config.js +3 -0
package/package.json +1 -1
package/public/js/app.js +3 -0
package/public/js/creator.js +63 -31
package/public/js/drag.js +1 -0
package/server.js +17 -1
package/session-ask.js +171 -0
package/sessions.js +7 -1
package/skills/research-experiment/SKILL.md +350 -163
package/skills/research-experiment/SKILL.md.bak +224 -0
package/skills/research-experiment/scripts/init-research-layout.mjs +184 -0
package/transcript.js +5 -1
package/utils.js +8 -2
package/skills/research-experiment/agents/openai.yaml +0 -4

package/skills/research-experiment/SKILL.md CHANGED

@@ -1,224 +1,411 @@
 ---
 name: research-experiment
-description: Coordinate autonomous research experiments across multiple coding agents using isolated git worktrees. Use when a user wants the main agent to define a goal, constraints, acceptance criteria, and experiment boundaries, then dispatch Codex/Claude/Gemini or other agents to independently search for solutions without touching production code.
+description: Set up and coordinate parallel autonomous research across multiple agents. Use when a user wants a manager agent to interview for requirements, create the research package, prepare isolated researcher workspaces, and dispatch independent researchers to explore the problem without repeated check-ins.
 ---
 # Research Experiment
-Use this skill to run parallel, autonomous experiments safely.
+Use this skill to run parallel autonomous research with one manager and multiple researchers.
-There are two roles:
+This skill is generic. It is for:
-- Main Agent: owns the research brief, workspace setup, researcher prompts, verification, and merge decision.
-- Experiment Agent: owns independent exploration inside one assigned worktree and must not redefine the experiment.
+- code optimization
+- implementation experiments
+- product or workflow discovery
+- marketing or advertising research
+- strategy comparison
+- scientific or technical literature review
+- invention and concept exploration
-If the user says "you are the main agent", follow the Main Agent Role. If the user says "you are an experiment agent" or "researcher agent", follow the Experiment Agent Role.
+The core model is:
-## Main Agent Role
+- The **manager** interviews the user, defines the experiment, prepares the workspace, writes the researcher prompt files, and later synthesizes results.
+- Each **researcher** works independently inside an assigned workspace and prompt file, explores multiple approaches, and produces a final findings document.
-The main agent defines the experiment and coordinates researchers. It must not let each researcher invent different goals or acceptance criteria.
+If the user says "you are the main agent" or "you are the manager", follow the Manager Role.
-## Main Agent Workflow
+If the user says "you are a researcher" or "experiment agent", follow the Researcher Role.
-1. Convert the user request into an experiment brief:
-   - Goal: the outcome to achieve.
-   - Acceptance criteria: exact tests, benchmarks, or review gates.
-   - Hard constraints: what must not change.
-   - Quality bar: what counts as meaningful progress versus noise.
-   - Shared resources: ports, GPUs, model caches, services, datasets, credentials, or external APIs.
-   - Stop conditions: when researchers may quit.
+## First Check: Can This Run Autonomously?
-2. Identify the production repository root:
-   - Use `git rev-parse --show-toplevel` when inside a git repo.
-   - If the project has nested repos, identify which repo owns the files under experiment.
-   - Do not assume the current working directory is the repo root.
+Before doing substantial work, verify that autonomy is actually possible.
-3. Create per-researcher worktrees inside the project folder, not beside it:
-   - Prefer a project-local directory such as `<project>/.research-worktrees/<slug>-<n>`.
-   - Keep worktree directories inside the main project folder so agents do not need extra filesystem permissions.
-   - Do not create sibling worktrees such as `../project_research_1` unless the user explicitly asks.
-   - Never point a researcher at the production checkout for edits.
-   - Use unique branches, for example `research/<slug>-1`, `research/<slug>-2`.
+If the environment will force routine approval prompts for normal work, stop and say so plainly. Do not pretend the experiment is autonomous if the manager or researchers will keep pausing for permission.
-4. Give every researcher the same goal and rules:
-   - Do not assign fixed technical roles unless the user explicitly asks.
-   - Let each researcher decide the approach and iterate independently.
-   - Include the exact worktree path, branch, allowed edit scope, forbidden files, verification commands, and report format.
-   - Save the canonical brief to the experiment folder before dispatching researchers.
+## Core Rules
-5. Verify centrally:
-   - Researchers may run local checks in their worktree, but the main agent owns authoritative acceptance verification.
-   - If benchmarks contend for scarce resources, run final benchmarks sequentially from the main agent.
-   - Merge nothing unless it passes acceptance criteria and clears the quality bar.
+- The manager defines the problem, constraints, and evaluation. It does not prescribe the answer.
+- Researchers must not redefine the goal, constraints, or evidence standard.
+- Researchers should not read each other's work while the experiment is running unless the user explicitly asks for collaboration.
+- The manager should not bias researchers toward one preferred solution path unless the user explicitly wants that.
+- The goal is independent exploration, not consensus-by-default.
-## Experiment Folder
+## Manager Role
-The main agent should create one experiment folder under the main project, for example:
+The manager owns setup quality. A bad setup poisons the whole experiment.
-```text
-.research-worktrees/<experiment-slug>/
-```
+If the researchers are not pushed toward deep questioning and independent exploration, the experiment will waste time and tokens on obvious, shallow paths.
-Inside it, create:
+The manager must keep querying the user until all critical setup fields are clear enough to run safely and usefully.
-- `EXPERIMENT.md`: the canonical brief, baseline, quality gate, constraints, assignments, commands, and report format.
-- `<slug>-experiment-1/`: worktree for researcher 1.
-- `<slug>-experiment-2/`: worktree for researcher 2.
-- `<slug>-experiment-3/`: worktree for researcher 3.
-- `<slug>-experiment-1/LOG.md`: progress log for researcher 1.
-- `<slug>-experiment-2/LOG.md`: progress log for researcher 2.
-- `<slug>-experiment-3/LOG.md`: progress log for researcher 3.
+Do not start researcher setup just because you have a rough idea. Start only when the experiment brief is decision-grade.
-Researcher prompts should tell agents to read `EXPERIMENT.md` first and then follow only their assigned workspace, branch, log file, and resource values.
+## Manager Intake Interview
-The main agent may check each researcher log during the experiment to monitor progress without interrupting researchers. Researcher agents should append concise entries after each meaningful experiment loop:
+Before creating the experiment package, gather or confirm:
-- Hypothesis tried.
-- Files changed.
-- Command run.
-- Result versus same-session baseline.
-- Keep/reject decision.
-- Current blocker, if any.
+- Objective: what are we trying to discover, optimize, compare, prove, design, or explain?
+- Decision to support: what user decision will this research inform?
+- Output type: what should the final answer look like?
+- Success criteria: what makes a useful result?
+- Evidence standard: benchmark, rubric, citations, reasoning, expert judgment, human review, or another standard.
+- Constraints: budget, time, ethics, legal boundaries, brand rules, forbidden actions, forbidden files, safety limits.
+- Allowed tools and sources: codebase only, web research, papers, datasets, APIs, interviews, etc.
+- Domain context: what background is necessary before work starts?
+- Researcher count: how many researchers should run?
+- Workspace model: code workspace, isolated git worktree, document workspace, or another isolated setup.
+- Stop conditions: when should a researcher conclude success, failure, or exhaustion?
-## Worktree Setup
+If any of those would materially change the experiment, keep asking.
-Use project-local worktrees. The worktree directory must live under the main project folder:
+## Pick The Research Mode
-```bash
-mkdir -p .research-worktrees
-mkdir -p .research-worktrees/<experiment-slug>
-git worktree add .research-worktrees/<experiment-slug>/<slug>-experiment-1 -b research/<slug>-1
-git worktree add .research-worktrees/<experiment-slug>/<slug>-experiment-2 -b research/<slug>-2
-git worktree add .research-worktrees/<experiment-slug>/<slug>-experiment-3 -b research/<slug>-3
+Choose the mode that matches the work:
+### 1. Code Research Mode
+Use when researchers will change code, run commands, benchmark, or inspect a repository.
+Preferred isolation:
+- isolated git worktrees inside the main project folder
+- or isolated copies if worktrees are unavailable
+### 2. Knowledge Research Mode
+Use when the work is primarily reading, comparing, synthesizing, writing, or ideating.
+Preferred isolation:
+- per-researcher document workspaces
+- private notes and findings files
+- source and citation tracking if external research is allowed
+Git worktrees are optional here, not required.
+## Standard Experiment Package
+Create one experiment folder under the project or working directory, for example:
+```text
+.research/<experiment-slug>/
 ```
-If a branch already exists, choose a new suffix. Do not delete or overwrite existing worktrees unless the user explicitly asks.
+Inside it, create:
+- `EXPERIMENT.md`: canonical experiment brief
+- `MANAGER.md`: manager notes, open questions, and round control
+- `SYNTHESIS.md`: manager's final synthesis target
+- `researcher-1/PROMPT.md`
+- `researcher-1/LOG.md`
+- `researcher-1/FINDINGS.md`
+- `researcher-1/IDEAS.md`
+- `researcher-2/...`
+- `researcher-3/...`
+Each researcher folder may also include a private notes or workspace folder if useful.
-When the target files live in a nested repo, run the worktree commands from that nested repo root but still put the worktree folders under the main project folder. Example:
+Use the scaffold script if helpful:
 ```bash
-cd path/to/nested/repo
-mkdir -p /absolute/path/to/main-project/.research-worktrees
-mkdir -p /absolute/path/to/main-project/.research-worktrees/<experiment-slug>
-git worktree add /absolute/path/to/main-project/.research-worktrees/<experiment-slug>/<slug>-experiment-1 -b research/<slug>-1
+node skills/research-experiment/scripts/init-research-layout.mjs .research/<experiment-slug> 3 knowledge
 ```
-## Baseline And Quality Gate
+Replace `knowledge` with `code` when preparing a code experiment.
-Before changing code, each researcher must run the exact benchmark or check command once from their assigned worktree and record it as their same-session baseline.
+## Workspace Rules
-Compare final results against:
+Every researcher must have an isolated assigned workspace.
-- The user/main-agent supplied baseline.
-- The researcher’s same-session baseline.
+Examples:
-The main agent/user defines the quality gate for each experiment. Examples:
+- Code mode: `.research-worktrees/<experiment-slug>/researcher-1/`
+- Knowledge mode: `.research/<experiment-slug>/researcher-1/`
-- Test suite must pass.
-- Benchmark score must not regress.
-- ASR must recover at least N words from generated TTS.
-- Human listening check required.
+For code mode:
-Do not invent or weaken the quality gate. If the gate is unclear, ask the main agent before accepting a result.
+- never point researchers at the production checkout for edits
+- prefer project-local worktrees rather than sibling folders
+- use unique branches such as `research/<slug>-1`, `research/<slug>-2`
-## Experiment Agent Role
+For knowledge mode:
-The experiment agent executes the fixed brief from the main agent. It uses this skill for discipline and workflow only.
+- give each researcher a private workspace for notes, drafts, and source tracking
+- keep their working files separate until the manager collects final findings
-Follow only your assigned researcher section from the experiment brief. Do not edit outside your assigned worktree.
-Create any temporary files, profiling scripts, generated outputs, scratch notes, and helper artifacts inside your assigned worktree. Do not use `/tmp`, `/var/tmp`, home-directory scratch folders, or sibling project folders unless the brief explicitly allows it.
+## Canonical Brief Requirements
-The experiment agent must not redefine:
+`EXPERIMENT.md` must include:
-- Goal.
-- Constraints.
-- Quality gate.
-- Benchmark commands.
-- Acceptance criteria.
-- Assigned workspace or branch.
+- objective
+- decision this research supports
+- mode: `code` or `knowledge`
+- success criteria
+- evidence standard
+- constraints
+- allowed tools and sources
+- domain context
+- shared resources
+- stop conditions
+- required final output format
+- researcher independence rules
-If any of those are missing or ambiguous, ask the main agent for clarification before accepting a result. Do not make up a weaker gate.
+The brief should define the problem clearly without embedding a preferred answer.
-## Experiment Agent Autonomy
+## Researcher Prompt Files
-DO NOT STOP EXPERIMENTING UNLESS YOU ACHIEVED THE GOALS OR ABSOLUTELY NECESSARY.
+The manager must write one self-contained `PROMPT.md` per researcher.
-- Do NOT ask the user if you should continue.
-- The user may be away from the computer and expects the experiment to continue until you achieve the experiment goals, so keep working until the task is naturally complete.
-- You are autonomous. If you are unsure how to proceed, re-read the skill, goals, context, think differently, try different innovative approaches, and continue.
-- Stop only if something out of your control blocks you from continuing. Otherwise continue experimenting until goals are achieved or the useful paths are exhausted.
+Each prompt file must be usable cold. The researcher should be able to open it in a fresh session and work without hidden context.
-## Researcher Prompt Template
+Prefer absolute paths in researcher prompt files whenever the user will launch researchers manually in separate sessions. Relative paths are acceptable only when the working directory is guaranteed.
-Include this block, or an equivalent adapted version, in every researcher prompt:
+Each prompt must include:
-```text
-You are an autonomous research agent for this experiment.
-Goal:
-<goal>
-Experiment rules and boundaries:
-<rules>
-Assigned workspace:
-<absolute path to your worktree>
-Canonical experiment brief:
-<absolute path to EXPERIMENT.md>
-Assigned experiment log:
-<absolute path to your LOG.md>
-You must work only inside your assigned worktree unless the brief explicitly allows another path. Do not edit, overwrite, or revert files in the production checkout. Do not change benchmark scoring, test fixtures, or acceptance criteria unless the brief explicitly asks for that.
-Read the canonical experiment brief before making changes. Follow only your assigned researcher section. Do not redefine the goal, constraints, quality gate, benchmark commands, acceptance criteria, workspace, branch, port, log path, or resource assignments. Do not edit outside your assigned worktree.
-If you need temp folders, profiling scripts, scratch files, benchmark outputs, or helper artifacts, create them inside your assigned worktree. Do not use `/tmp`, `/var/tmp`, home-directory scratch folders, or sibling project folders unless the brief explicitly allows it.
-DO NOT STOP EXPERIMENTING UNLESS YOU ACHIEVED THE GOALS OR ABSOLUTELY NECESSARY.
-- Do NOT ask the user if you should continue.
-- The user may be away from the computer and expects the experiment to continue until you achieve the experiment goals, so keep working until the task is naturally complete.
-- You are autonomous. If you are unsure how to proceed, re-read the skill, goals, context, think differently, try different innovative approaches, and continue.
-- Stop only if something out of your control blocks you from continuing. Otherwise continue experimenting until goals are achieved or useful paths are exhausted.
-Loop:
-1. Inspect the code and constraints.
-2. Form hypotheses.
-3. Try a small, explainable experiment.
-4. Run relevant verification.
-5. Keep, revise, or reject the attempt.
-6. Repeat until the goal is achieved, useful paths are exhausted, or an external blocker prevents progress.
-Report format:
-- Worktree path and branch.
-- Changed files.
-- Commands run and exact relevant output.
-- Results against acceptance criteria.
-- Failed attempts and why they were rejected.
-- Whether you recommend merging the patch.
-- Any cleanup needed: running servers, ports, PIDs, temp files.
-Also append concise progress entries to your assigned experiment log after each meaningful experiment loop.
-```
+- experiment goal
+- decision being supported
+- assigned workspace path
+- paths to `EXPERIMENT.md`, `LOG.md`, `FINDINGS.md`, and `IDEAS.md`
+- allowed tools and sources
+- forbidden actions
+- evidence standard
+- stop conditions
+- reporting requirements
+- the researcher operating rules needed to execute autonomously without opening this skill file
+Optional: assign a thinking style to increase diversity, such as:
+- first-principles
+- contrarian
+- evidence-first
+- systems-level
+- creative/wildcard
+Do not assign solution hints disguised as roles.
+## Manual Launch In Clideck
+When the user is launching researchers manually in Clideck, the manager should give explicit launch instructions.
+Recommended pattern:
+1. Open one new Clideck session per researcher.
+2. Set the session working directory to the researcher's assigned workspace.
+3. Tell the agent to read that researcher's `PROMPT.md`, or paste the prompt contents as the first message.
+4. Do not give the researcher extra steering beyond what is in `PROMPT.md` unless the experiment setup changes.
+The point is that `PROMPT.md` is the contract.
+## Anti-Bias Rules
+The manager should:
+- define the question, not the answer
+- avoid telling researchers which solution seems best
+- avoid sharing one researcher's interim ideas with another
+- use diversity prompts only to widen exploration, not to funnel conclusions
+- explicitly require an early question-generation round before action begins
+Researchers should:
+- think independently
+- explore at least one non-obvious path
+- not assume the manager knows the answer
+- not converge on the first plausible approach without testing alternatives
+## Manager Workflow
+1. Interview the user until setup is complete.
+2. Choose the research mode and isolation strategy.
+3. Create the experiment folder and researcher folders.
+4. Create isolated workspaces for each researcher.
+5. Write `EXPERIMENT.md`, `MANAGER.md`, `SYNTHESIS.md`, and every `PROMPT.md`.
+6. Verify that each researcher prompt is self-contained.
+7. Dispatch researchers manually or through the user's orchestration flow.
+8. Do not let researchers see each other's active work unless the user explicitly wants collaboration.
+9. When researchers finish, review all `FINDINGS.md` files.
+10. Write `SYNTHESIS.md` with convergences, divergences, confidence, and ranked recommendations.
+## Handoff Protocol
+When a researcher finishes, `FINDINGS.md` is the handoff artifact.
+The user, manager, or orchestration layer should deliver every completed `FINDINGS.md` back to the manager session.
+The manager should synthesize from researcher findings, not from half-finished logs unless a researcher failed before producing a final findings document.
-## Shared Resource Rules
+## Researcher Role
-For resources that can contaminate results or conflict across agents:
+The researcher executes the fixed brief from the manager.
-- Assign unique ports, output directories, cache directories, and branch names.
-- Do not run final GPU/MPS benchmarks concurrently across researchers.
-- Prefer researcher-local smoke tests and main-agent final benchmarks.
-- Require researchers to stop servers they started, or report any still-running process clearly.
+The researcher must not redefine:
-## Merge Rules
+- objective
+- decision being supported
+- evidence standard
+- constraints
+- assigned workspace
+- stop conditions
+If those are missing or materially ambiguous, ask the manager before accepting a conclusion.
+## Researcher Operating Rules
+- Do not ask "should I continue?"
+- Keep working until you reach a credible conclusion or a real blocker.
+- Do not wait for praise or confirmation between attempts.
+- Do not get trapped refining one weak direction forever.
+- Start with a question-generation round before picking your first approach.
+- Try multiple structurally different approaches when the task is open-ended.
+- Record meaningful progress in `LOG.md`.
+- Record deferred but promising paths in `IDEAS.md`.
+- Write your final conclusion in `FINDINGS.md`.
+## Evidence And Innovation
+For every meaningful claim, use the evidence standard defined in the brief.
+Examples:
+- Code mode: benchmarks, tests, traces, metrics, profiler output
+- Knowledge mode: citations, comparisons, reasoning chains, datasets, examples, scored rubrics
+Researchers should be innovative, but not sloppy.
+- Explore unconventional ideas.
+- Challenge assumptions.
+- Consider contrarian approaches.
+- Ask what the real limits are before optimizing locally.
+- Separate facts from hypotheses.
+- State confidence honestly.
+## Researcher Question Round
+Before the first experiment or line of inquiry, each researcher should generate a short list of high-value questions.
+The point is to widen the search space before committing to an approach.
+This question round should happen:
+- once at the start
+- again after major failed or inconclusive attempts
+Questions should be practical, not decorative. They should help uncover limits, assumptions, and overlooked paths.
+Examples:
+- what is actually expensive here?
+- what is actually creating the token bloat?
+- what is the current hard limit?
+- what happens if we remove this component entirely?
+- what happens if we compress, batch, defer, cache, or approximate this step?
+- what quality signal might break if we optimize too aggressively?
+- what is the theoretical lower bound?
+- what assumptions are probably wrong?
+- what would a contrarian approach try first?
+- if I had to get most of the gain with the smallest change, where would I look?
+- if the obvious path fails, what structurally different path is left?
+- what evidence would prove this direction is a dead end?
+## Researcher Loop
+Use a loop like this:
+1. Re-read the goal, constraints, and evidence standard.
+2. Inspect the workspace and relevant context.
+3. Generate high-value questions about limits, assumptions, and overlooked paths.
+4. Form multiple candidate approaches from those questions.
+5. Choose one explainable experiment or line of inquiry.
+6. Execute it and gather evidence.
+7. Record what happened in `LOG.md`.
+8. Keep, revise, or reject the approach.
+9. Re-question after major failures or surprises.
+10. Repeat until success, exhaustion, or a true blocker.
+For code mode, this may be code changes plus verification.
+For knowledge mode, this may be source review, comparison, synthesis, ideation, or rubric-based evaluation.
+## Stop Conditions
+A researcher may stop only when one of these is true:
+- The goal is achieved well enough to make a recommendation.
+- The useful paths are exhausted.
+- A real blocker outside the researcher's control prevents progress.
+"Useful paths are exhausted" should mean both:
+- at least several materially different approaches were explored
+- recent attempts are no longer producing meaningful new information
+## Required Final Findings Format
+Every researcher must complete `FINDINGS.md` with:
+- Executive summary
+- Recommendation
+- Confidence level
+- Approaches tried
+- Evidence collected
+- Failed or rejected paths
+- Remaining uncertainties
+- Suggested next steps
+For code mode also include:
+- workspace path
+- branch name if applicable
+- files changed
+- commands run
+- cleanup needed
+## Manager Synthesis
+After collecting all researcher findings, the manager should produce `SYNTHESIS.md` with:
+- experiment summary
+- researcher-by-researcher conclusions
+- convergences
+- divergences
+- strongest evidence
+- lowest-confidence areas
+- ranked recommendations
+- follow-up experiments or round-two ideas
+If useful, the manager may launch another research round with a refined brief. A new round may incorporate prior findings, but should still avoid biasing researchers toward one presumed answer.
+## Code Mode Guidance
+When using git worktrees, prefer commands like:
+```bash
+mkdir -p .research-worktrees/<experiment-slug>
+git worktree add .research-worktrees/<experiment-slug>/researcher-1 -b research/<slug>-1
+git worktree add .research-worktrees/<experiment-slug>/researcher-2 -b research/<slug>-2
+git worktree add .research-worktrees/<experiment-slug>/researcher-3 -b research/<slug>-3
+```
-The main agent must review researcher diffs before applying them to production.
+Do not delete or overwrite existing worktrees unless the user explicitly asks.
-Reject or keep separate any patch that:
+The scaffold script creates the experiment package and researcher folders only. It does not create git worktrees automatically. In code mode, the manager must create and verify the worktrees separately.
-- Touches production checkout files.
-- Changes tests, benchmarks, fixtures, sample inputs, or scoring without permission.
-- Passes only because the acceptance criteria were weakened.
-- Produces only noisy or marginal improvement below the declared quality bar.
-- Leaves unexplained background processes or shared state.
+## Hard Rules
-If a researcher accidentally edits the production checkout, preserve pre-existing user changes and move or recreate the experiment in an isolated worktree before continuing.
+- Do not start researchers before the manager brief is complete.
+- Do not use the production checkout as a researcher edit workspace.
+- Do not let researchers redefine the experiment.
+- Do not let active researchers read each other's interim work unless the user wants collaboration.
+- Do not confuse manager guidance with answer selection.
+- Do not stop because of ordinary uncertainty.
+- Do not pretend that vague findings are strong evidence.