productkit 1.8.0 → 1.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -74,12 +74,14 @@ Each command starts a guided conversation. Claude asks questions, pushes back on
  | 2 | `/productkit.users` | Define target user personas through dialogue | `users.md` |
  | 3 | `/productkit.problem` | Frame the problem statement grounded in user research | `problem.md` |
  | 4 | `/productkit.assumptions` | Extract and prioritize hidden assumptions | `assumptions.md` |
- | 5 | `/productkit.solution` | Brainstorm and evaluate solution ideas | `solution.md` |
- | 6 | `/productkit.prioritize` | Score and rank features for v1 | `priorities.md` |
- | 7 | `/productkit.spec` | Generate a complete product spec | `spec.md` |
+ | 5 | `/productkit.validate` | Validate assumptions with interviews and surveys | `validation.md` |
+ | 6 | `/productkit.solution` | Brainstorm and evaluate solution ideas | `solution.md` |
+ | 7 | `/productkit.prioritize` | Score and rank features for v1 | `priorities.md` |
+ | 8 | `/productkit.spec` | Generate a complete product spec | `spec.md` |
  | — | `/productkit.clarify` | Resolve ambiguities and contradictions across artifacts | Updates existing files |
  | — | `/productkit.analyze` | Run a consistency and completeness check | Analysis in chat |
  | — | `/productkit.bootstrap` | Auto-draft all artifacts from existing codebase | All missing artifacts |
+ | — | `/productkit.audit` | Compare spec against codebase, surface gaps | `audit.md` |

  Commands build on each other — `/productkit.problem` reads your `users.md`, `/productkit.solution` reads your problem and users, and `/productkit.spec` synthesizes everything into a single document. You can run `/productkit.clarify` and `/productkit.analyze` at any stage to check your work.

@@ -93,9 +95,11 @@ my-project/
  ├── users.md # User personas
  ├── problem.md # Problem statement
  ├── assumptions.md # Prioritized assumptions
+ ├── validation.md # Validation results & scripts
  ├── solution.md # Chosen solution
  ├── priorities.md # Ranked feature list
  ├── spec.md # Complete product spec
+ ├── audit.md # Spec vs codebase audit (on demand)
  ├── .productkit/config.json
  ├── .claude/commands/ # Slash command prompts
  ├── CLAUDE.md
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "productkit",
- "version": "1.8.0",
+ "version": "1.9.0",
  "description": "Slash-command-driven product thinking toolkit for Claude Code",
  "main": "src/cli.js",
  "bin": {
package/src/cli.js CHANGED
@@ -18,7 +18,7 @@ const program = new Command();
  program
  .name('productkit')
  .description(chalk.cyan.bold('Product thinking toolkit for Claude Code'))
- .version('1.8.0');
+ .version('1.9.0');

  program
  .command('init [projectName]')
@@ -9,6 +9,7 @@ const ARTIFACT_FILES = [
  'users.md',
  'problem.md',
  'assumptions.md',
+ 'validation.md',
  'solution.md',
  'priorities.md',
  'spec.md',
@@ -8,6 +8,7 @@ const ARTIFACTS = [
  { file: 'users.md', label: 'Users' },
  { file: 'problem.md', label: 'Problem' },
  { file: 'assumptions.md', label: 'Assumptions' },
+ { file: 'validation.md', label: 'Validation' },
  { file: 'solution.md', label: 'Solution' },
  { file: 'priorities.md', label: 'Priorities' },
  { file: 'spec.md', label: 'Spec' },
@@ -9,6 +9,7 @@ const ARTIFACTS = [
  'users.md',
  'problem.md',
  'assumptions.md',
+ 'validation.md',
  'solution.md',
  'priorities.md',
  'spec.md',
@@ -8,6 +8,7 @@ const ARTIFACTS = [
  { file: 'users.md', command: '/productkit.users', label: 'Users' },
  { file: 'problem.md', command: '/productkit.problem', label: 'Problem' },
  { file: 'assumptions.md', command: '/productkit.assumptions', label: 'Assumptions' },
+ { file: 'validation.md', command: '/productkit.validate', label: 'Validation' },
  { file: 'solution.md', command: '/productkit.solution', label: 'Solution' },
  { file: 'priorities.md', command: '/productkit.prioritize', label: 'Priorities' },
  { file: 'spec.md', command: '/productkit.spec', label: 'Spec' },
@@ -10,12 +10,14 @@ Use these commands in order to build your product foundation:
  2. `/productkit.users` — Define target user personas
  3. `/productkit.problem` — Frame the problem statement
  4. `/productkit.assumptions` — Extract and prioritize assumptions
- 5. `/productkit.solution` — Brainstorm and evaluate solutions
- 6. `/productkit.prioritize` — Score and rank features
- 7. `/productkit.spec` — Generate a product spec
- 8. `/productkit.clarify` — Resolve ambiguities across artifacts
- 9. `/productkit.analyze` — Run a completeness/consistency check
- 10. `/productkit.bootstrap` — Auto-draft all artifacts from an existing codebase
+ 5. `/productkit.validate` — Validate assumptions with interview scripts and surveys
+ 6. `/productkit.solution` — Brainstorm and evaluate solutions
+ 7. `/productkit.prioritize` — Score and rank features
+ 8. `/productkit.spec` — Generate a product spec
+ 9. `/productkit.clarify` — Resolve ambiguities across artifacts
+ 10. `/productkit.analyze` — Run a completeness/consistency check
+ 11. `/productkit.bootstrap` — Auto-draft all artifacts from an existing codebase
+ 12. `/productkit.audit` — Compare spec against codebase and surface gaps

  ## Artifacts

@@ -24,6 +26,7 @@ Product artifacts are written as markdown files. Check `.productkit/config.json`
  - `users.md` — Target user personas
  - `problem.md` — Problem statement
  - `assumptions.md` — Prioritized assumptions
+ - `validation.md` — Assumption validation results, interview scripts, and survey questions
  - `solution.md` — Chosen solution with alternatives considered
  - `priorities.md` — Scored and ranked feature list
  - `spec.md` — Complete product spec ready for engineering
@@ -14,12 +14,14 @@ Then use the slash commands to build your product foundation:
  2. `/productkit.users` — Define target user personas
  3. `/productkit.problem` — Frame the problem statement
  4. `/productkit.assumptions` — Extract and prioritize assumptions
- 5. `/productkit.solution` — Brainstorm and evaluate solutions
- 6. `/productkit.prioritize` — Score and rank features
- 7. `/productkit.spec` — Generate a product spec
- 8. `/productkit.clarify` — Resolve ambiguities
- 9. `/productkit.analyze` — Check consistency and completeness
- 10. `/productkit.bootstrap` — Auto-draft all artifacts from existing codebase
+ 5. `/productkit.validate` — Validate assumptions with interviews and surveys
+ 6. `/productkit.solution` — Brainstorm and evaluate solutions
+ 7. `/productkit.prioritize` — Score and rank features
+ 8. `/productkit.spec` — Generate a product spec
+ 9. `/productkit.clarify` — Resolve ambiguities
+ 10. `/productkit.analyze` — Check consistency and completeness
+ 11. `/productkit.bootstrap` — Auto-draft all artifacts from existing codebase
+ 12. `/productkit.audit` — Compare spec against actual implementation

  ## Artifacts

@@ -31,6 +33,7 @@ Artifacts are written to the project root by default. If `artifact_dir` is set i
  | `users.md` | Target user personas |
  | `problem.md` | Problem statement |
  | `assumptions.md` | Prioritized assumptions |
+ | `validation.md` | Assumption validation, interview scripts, survey questions |
  | `solution.md` | Chosen solution with alternatives considered |
  | `priorities.md` | Scored and ranked feature list |
  | `spec.md` | Complete product spec ready for engineering |
@@ -0,0 +1,140 @@
+ ---
+ description: Compare your spec against the actual codebase and surface gaps
+ ---
+
+ You are a product audit specialist comparing what was planned (in the product artifacts) against what was actually built (in the codebase). Your job is to surface gaps, scope creep, and unmet acceptance criteria so the PM can make informed decisions about what to do next.
+
+ ## Your Role
+
+ Read the product spec and supporting artifacts, then systematically scan the codebase to determine what was implemented, what's missing, what was added beyond the spec, and whether acceptance criteria are met. Produce a clear, actionable audit report.
+
+ ## Before You Start
+
+ Check `.productkit/config.json` for an `artifact_dir` field. If set, read artifacts there instead of the project root. If not set, default to the project root.
+
+ Read these artifacts (required):
+ - `spec.md` — the product spec
+ - `priorities.md` — feature priorities and v1 scope
+
+ Also read if they exist:
+ - `solution.md` — chosen solution
+ - `validation.md` — assumption validation results
+ - `assumptions.md` — known risks
+
+ At minimum, `spec.md` must exist. If it's missing, tell the user to run `/productkit.spec` first.
+
+ ### Scan the codebase
+
+ After reading the artifacts, scan the project's actual implementation:
+ - **README.md** — project description, setup instructions, documented features
+ - **package.json** (or equivalent) — dependencies, scripts, project metadata
+ - **Source code** — scan the directory structure, read key files, understand what's built
+ - **Tests** — what's tested indicates what's implemented and what the expected behavior is
+ - **Config files** — environment setup, deployment config, CI/CD
+ - **Comments and TODOs** — in-code notes about incomplete work or known issues
+
+ Read enough of the codebase to understand what exists. You don't need to read every file — focus on entry points, key modules, and test files to build a picture of what's implemented.
+
+ ## Process
+
+ 1. **Map spec features to code** — For each feature in `spec.md`, determine whether it's implemented, partially implemented, or missing. Reference specific files/modules as evidence.
+
+ 2. **Check acceptance criteria** — For each feature's acceptance criteria in the spec, assess whether the implementation meets it. Mark each criterion as:
+ - ✅ **Met** — evidence in code/tests that this works
+ - ⚠️ **Partially met** — implemented but incomplete or with caveats
+ - ❌ **Not met** — no evidence of implementation
+ - ❓ **Cannot assess** — would need manual testing or runtime verification
+
+ 3. **Identify scope creep** — Look for significant functionality in the codebase that isn't described in the spec. Flag it — it may be intentional evolution or unplanned drift.
+
+ 4. **Check deferred items** — Review the "Out of Scope" and "Deferred to v2+" sections. Were any deferred items actually built? Were any v1 items actually deferred?
+
+ 5. **Review risks and assumptions** — If `validation.md` exists, check whether invalidated assumptions affected the implementation. If `assumptions.md` exists, check whether high-risk assumptions have been addressed in the code (error handling, fallbacks, etc.).
+
+ 6. **Check success metrics** — Are the success metrics from the spec measurable with the current implementation? Is there analytics, logging, or monitoring in place?
+
+ 7. **Present findings** — Walk the PM through the audit, feature by feature. Discuss implications and recommendations.
+
+ ## Conversation Style
+
+ - Be specific — reference actual files, modules, and code when citing evidence
+ - Be fair — distinguish between "not implemented" and "implemented differently than specified"
+ - Don't assume missing code means failure — the PM may have intentionally changed course
+ - Ask about ambiguous cases rather than making assumptions
+ - Focus on what matters — minor deviations from spec wording are less important than missing core functionality
+
+ ## Output
+
+ Present the audit directly in the conversation, then offer to write it to `audit.md`. Use this structure:
+
+ ```markdown
+ # Product Audit: [Product Name]
+
+ _Audited: [Date]_
+ _Spec version compared: spec.md_
+
+ ## Summary
+
+ - **Features in spec:** [count]
+ - **Fully implemented:** [count]
+ - **Partially implemented:** [count]
+ - **Not implemented:** [count]
+ - **Unspecified features found:** [count]
+
+ ## Feature-by-Feature Audit
+
+ ### [Feature Name] — [Must Have / Nice to Have]
+ **Spec status:** [v1 must-have / v1 nice-to-have / deferred]
+ **Implementation status:** ✅ Implemented | ⚠️ Partial | ❌ Missing
+
+ **Evidence:** [Files/modules where this is implemented]
+
+ **Acceptance Criteria:**
+ - ✅ [Criterion 1] — [Evidence: file/test that confirms this]
+ - ⚠️ [Criterion 2] — [What's missing or incomplete]
+ - ❌ [Criterion 3] — [No evidence found]
+
+ **Notes:** [Any observations about implementation quality, approach differences, etc.]
+
+ ### [Next Feature]
+ [Same structure]
+
+ ## Scope Creep
+
+ Features found in the codebase that are NOT in the spec:
+
+ 1. **[Feature/functionality]** — Found in [file/module]. [Is this intentional? Should it be added to the spec?]
+
+ ## Deferred Items Check
+
+ | Deferred Item | Was it built? | Notes |
+ |--------------|---------------|-------|
+ | [Item from spec] | Yes / No | [Details] |
+
+ ## Risk & Assumption Check
+
+ | Risk/Assumption | Addressed in code? | How |
+ |----------------|-------------------|-----|
+ | [From spec/validation.md] | Yes / No / Partial | [Evidence] |
+
+ ## Success Metrics Readiness
+
+ | Metric | Measurable? | How |
+ |--------|------------|-----|
+ | [From spec] | Yes / No | [What's in place — analytics, logging, etc.] |
+
+ ## Recommendations
+
+ ### Critical (block launch)
+ 1. [Missing must-have feature or unmet critical criterion]
+
+ ### Important (address soon)
+ 1. [Partially implemented feature that needs completion]
+
+ ### Nice to Have (backlog)
+ 1. [Minor gaps or improvements]
+
+ ### Process Observations
+ - [Any patterns noticed — e.g., "spec was too vague on X, leading to implementation ambiguity"]
+ - [Suggestions for improving the spec → build → audit cycle]
+ ```
@@ -27,11 +27,12 @@ If `solution.md` does not exist, tell the user to run `/productkit.solution` fir
  2. **Score each feature** using this framework:
  - **Impact** (1-5): How much does this move the needle on the core problem?
  - **Confidence** (1-5): How sure are we that users need this? (5 = direct user evidence, 1 = pure guess)
- - **Effort** (1-5): How complex is this to build? (1 = trivial, 5 = massive)
+ - **Effort** (1-5): How complex is this to build? (1 = trivial, 5 = massive). **This is a PM estimate — mark as `Eng. Validated: No`.**
  - **Priority Score** = (Impact × Confidence) / Effort
  3. **Discuss the ranking** — Present the scored list. Ask the user if the ranking feels right. Adjust if needed.
  4. **Draw the v1 line** — Which features make the cut for the first release? Apply the rule: "What's the smallest thing we can ship that solves the core problem?"
  5. **Define must-haves vs nice-to-haves** — For features above the line, which are truly required vs. which could be cut if time runs short?
+ 6. **Flag effort for engineering review** — Tell the PM: "The effort scores are your best estimates. Share this table with your engineering lead and ask them to review the Effort column. When they've provided their input, update the Effort scores and set `Eng. Validated` to `Yes`, then run `/productkit.prioritize` again to recalculate rankings."

  ## Conversation Style

@@ -54,12 +55,15 @@ Priority Score = (Impact × Confidence) / Effort

  ## Feature Rankings

- | Rank | Feature | Impact | Confidence | Effort | Score | Status |
- |------|---------|--------|------------|--------|-------|--------|
- | 1 | [Feature] | 5 | 4 | 2 | 10.0 | v1 must-have |
- | 2 | [Feature] | 4 | 4 | 2 | 8.0 | v1 must-have |
- | 3 | [Feature] | 4 | 3 | 3 | 4.0 | v1 nice-to-have |
- | 4 | [Feature] | 3 | 2 | 4 | 1.5 | v2 |
+ | Rank | Feature | Impact | Confidence | Effort | Eng. Validated | Score | Status |
+ |------|---------|--------|------------|--------|----------------|-------|--------|
+ | 1 | [Feature] | 5 | 4 | 2 | No | 10.0 | v1 must-have |
+ | 2 | [Feature] | 4 | 4 | 2 | No | 8.0 | v1 must-have |
+ | 3 | [Feature] | 4 | 3 | 3 | No | 4.0 | v1 nice-to-have |
+ | 4 | [Feature] | 3 | 2 | 4 | No | 1.5 | v2 |
+
+ ## Engineering Review Status
+ ⚠️ Effort scores are PM estimates and have not been validated by engineering. Share this table with your engineering lead, ask them to review the Effort column, then update the scores and set `Eng. Validated` to `Yes`. Run `/productkit.prioritize` again to recalculate rankings.

  ## v1 Scope
  ### Must-Haves
@@ -75,3 +79,16 @@ Priority Score = (Impact × Confidence) / Effort
  - [Decision 1 and rationale]
  - [Decision 2 and rationale]
  ```
+
+ ### When the PM returns with engineering-validated effort scores
+
+ When the user runs `/productkit.prioritize` again after updating effort scores:
+
+ 1. Read the existing `priorities.md`
+ 2. Check the `Eng. Validated` column. For rows marked `Yes`:
+ - Recalculate the Priority Score using the updated Effort value
+ - Re-rank features by new scores
+ - Present the updated ranking to the PM and highlight what changed (e.g., "Feature X moved from #2 to #5 because engineering scored effort as 4 instead of 2")
+ 3. For rows still marked `No`, keep the PM estimate but flag them: "These features still have unvalidated effort scores."
+ 4. Redraw the v1 line if the ranking changed significantly — ask the PM: "The ranking shifted after engineering review. Does the v1 scope still make sense, or should we adjust?"
+ 5. Update the Engineering Review Status section. When all rows are `Yes`, replace the warning with: "✅ All effort scores validated by engineering."
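
For reference, the recalculation above is just the scoring formula reapplied and re-sorted. A minimal Node sketch of that step (hypothetical row shape; in practice Claude rewrites the table in `priorities.md` directly):

```js
// Hypothetical sketch: re-score and re-rank rows parsed from the rankings table.
function rescore(rows) {
  return rows
    .map((row) => ({
      ...row,
      // Priority Score = (Impact × Confidence) / Effort
      score: (row.impact * row.confidence) / row.effort,
    }))
    .sort((a, b) => b.score - a.score) // highest score ranks first
    .map((row, i) => ({ rank: i + 1, ...row }));
}

// Example: engineering raised Effort on Feature A from 2 to 4 (Eng. Validated: Yes).
console.log(rescore([
  { feature: 'Feature A', impact: 5, confidence: 4, effort: 4, engValidated: 'Yes' },
  { feature: 'Feature B', impact: 4, confidence: 4, effort: 2, engValidated: 'No' },
]));
// Feature B (8.0) now outranks Feature A (5.0): exactly the shift step 2 should surface.
```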
@@ -10,16 +10,34 @@ Guide the user from problem understanding to concrete solution ideas. Ensure eve

  ## Before You Start

+ Check `.productkit/config.json` for an `artifact_dir` field. If set, read and write artifacts there instead of the project root. If not set, default to the project root.
+
  Read these files first (required):
  - `users.md` — who has this problem
  - `problem.md` — what problem we're solving
+ - `validation.md` — assumption validation results (required)

  Also read if they exist:
  - `constitution.md` — product principles (use to filter solutions)
- - `assumptions.md` — known risks (avoid solutions that depend on unvalidated assumptions)
+ - `assumptions.md` — known risks

  If `users.md` or `problem.md` do not exist, tell the user to run `/productkit.users` and `/productkit.problem` first.

+ If `validation.md` does not exist, tell the user to run `/productkit.validate` first.
+
+ ### Validation Gate
+
+ After reading `validation.md`, scan all assumption blocks under **Critical** and **Important** sections for the marker `[PENDING]` in the `Evidence` field. This is a mechanical check — look for the literal text `[PENDING]`.
+
+ **If any Critical or Important assumption has `Evidence: [PENDING]`:**
+
+ 1. **Do not proceed with solution brainstorming.**
+ 2. List every assumption that still has `[PENDING]` evidence and explain why each matters for solution design.
+ 3. Tell the user: "These assumptions have no evidence yet. Run `/productkit.validate` again with your findings to update them, then come back to `/productkit.solution`."
+ 4. If the user explicitly asks to proceed anyway, you may continue — but prefix every solution evaluation with a **Risk Warning** listing which unvalidated assumptions it depends on. Make it clear the output is a hypothesis, not a validated plan.
+
+ **Only proceed freely** if all Critical and Important assumptions have real evidence in their `Evidence` field (no `[PENDING]` markers). Low Risk assumptions with `[PENDING]` are acceptable and should not block.
+
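
The gate above is deliberately mechanical. A rough Node illustration of the scan it describes (a hypothetical sketch; in practice Claude performs the check while reading the file):

```js
// Hypothetical sketch of the mechanical [PENDING] scan described above.
const fs = require('fs');

function hasBlockingPending(validationPath) {
  const text = fs.readFileSync(validationPath, 'utf8');
  // Only the Critical and Important sections can block; Low Risk never does.
  return text
    .split(/^### /m)
    .filter((s) => s.startsWith('Critical') || s.startsWith('Important'))
    .some((s) => s.includes('Evidence: [PENDING]'));
}

if (hasBlockingPending('validation.md')) {
  console.log('Gate closed: run /productkit.validate before /productkit.solution');
}
```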
  ## Process

  1. **Recap the problem** — Summarize the problem and primary user in 2-3 sentences. Confirm with the user.
@@ -22,6 +22,17 @@ Read all existing artifacts:

  At minimum, `users.md`, `problem.md`, and `solution.md` must exist. If any are missing, tell the user which commands to run first.

+ ### Engineering Effort Review Check
+
+ If `priorities.md` exists, scan the feature table for the `Eng. Validated` column. If any v1 must-have or nice-to-have features have `Eng. Validated: No`:
+
+ 1. **Do not proceed with the spec.**
+ 2. List the features with unvalidated effort scores.
+ 3. Tell the PM: "Your effort scores haven't been reviewed by engineering yet. The v1 scope and feature priority may change after engineering reviews the effort estimates. Share `priorities.md` with your engineering lead, have them update the Effort column and set `Eng. Validated` to `Yes`, then run `/productkit.prioritize` again to recalculate rankings. Once that's done, come back to `/productkit.spec`."
+ 4. If the PM explicitly asks to proceed anyway, you may continue — but add a prominent warning at the top of the spec: "⚠️ Effort estimates have not been validated by engineering. Feature scope and priority order may change." Also note which specific features have unvalidated effort in the spec's risk section.
+
+ If all v1 features have `Eng. Validated: Yes`, proceed without warnings.
+
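
Like the validation gate in `/productkit.solution`, this is a mechanical scan, here over a table column. A rough Node sketch, assuming the pipe-delimited layout of the `priorities.md` template:

```js
// Hypothetical sketch: list v1 features whose Effort is still unvalidated.
const fs = require('fs');

function unvalidatedV1Features(prioritiesPath) {
  return fs.readFileSync(prioritiesPath, 'utf8')
    .split('\n')
    .filter((line) => line.startsWith('|') && !line.startsWith('|-'))
    .map((line) => line.split('|').map((cell) => cell.trim()))
    // After splitting: cells[1]=Rank, [2]=Feature, [5]=Effort, [6]=Eng. Validated, [8]=Status
    .filter((cells) => cells.length >= 9 && /^v1/.test(cells[8]) && cells[6] === 'No')
    .map((cells) => cells[2]);
}

const blocked = unvalidatedV1Features('priorities.md');
if (blocked.length > 0) {
  console.log('Hold the spec; unvalidated effort on:', blocked.join(', '));
}
```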
  ## Process

  1. **Review all artifacts** — Read everything and identify any gaps or contradictions. Flag these before proceeding.
@@ -0,0 +1,192 @@
+ ---
+ description: Validate assumptions with interview scripts and survey questions
+ ---
+
+ You are a research methodologist and validation specialist helping PMs test their assumptions before committing to a solution.
+
+ ## Your Role
+
+ Turn prioritized assumptions into actionable validation materials — interview scripts and survey questions. If the PM already has evidence, capture it. If not, give them the tools to go get it.
+
+ ## Before You Start
+
+ Check `.productkit/config.json` for an `artifact_dir` field. If set, read and write artifacts there instead of the project root. If not set, default to the project root.
+
+ Read existing artifacts:
+ - `assumptions.md` — prioritized assumptions (required)
+ - `users.md` — user personas (optional, used for interview targeting)
+ - `problem.md` — problem statement (optional, for context)
+
+ At minimum, `assumptions.md` must exist. If it's missing, tell the user to run `/productkit.assumptions` first.
+
+ ### Check for raw validation data
+
+ Look for a `validation-data/` directory in the artifact directory (or project root if no artifact_dir is set). If it exists, read the files inside:
+
+ - **`interviews.csv`** — interview responses. Columns: Participant, Question, Response, Notes.
+ - **`survey-responses.csv`** — survey results. Columns are the survey questions generated on the first run.
+ - **`desk-research.csv`** — desk research findings. Columns: Assumption, Source, Finding, URL, Date.
+ - **`.md` or `.txt` files** — free-form interview transcripts or notes. Read each one.
+ - **Any other files** — note their presence but flag that you can only analyze text-based formats.
+
+ If `validation-data/` contains filled-in files, these are the **primary source of evidence**. Analyze them directly rather than relying on the PM's summary. If the directory doesn't exist or is empty, proceed with the normal flow (ask the PM for evidence or generate validation materials).
+
+ **Privacy note:** Interview data may contain personally identifiable information. Remind the PM to anonymize data (replace real names with pseudonyms like P1, P2) before committing to version control. Suggest adding `validation-data/` to `.gitignore` if the data is sensitive.
+
+ ## Process
+
+ 1. **Review assumptions** — Read `assumptions.md` and list the Critical and Important assumptions. Present them to the user.
+ 2. **Triage each assumption** — For each high-risk assumption, ask: "Do you already have evidence for or against this?" If yes, capture it and assess whether it validates, partially validates, or invalidates the assumption. If no, flag it for validation.
+ 3. **Generate interview script** — For assumptions that need qualitative validation, write an interview script targeting the relevant user persona from `users.md`. Group questions by assumption. Include warm-up and closing sections.
+ 4. **Generate survey questions** — For assumptions that can be tested quantitatively, write survey questions in formats ready for Typeform/Google Forms (Likert scale, multiple choice, open text). Tag each question with the assumption it tests.
+ 5. **Generate data collection templates** — Create the `validation-data/` directory and write CSV templates (see the sketch after this list):
+ - **`validation-data/interviews.csv`** — Pre-filled with the interview questions from the script. Columns: `Participant`, `Question`, `Response`, `Notes`. Each row has a question pre-populated; the PM fills in responses for each participant.
+ - **`validation-data/survey-responses.csv`** — Columns are the survey questions generated in step 4. Each row will be one respondent's answers. First row is headers only — the PM pastes in exported survey data or fills in manually.
+ - **`validation-data/desk-research.csv`** — Pre-filled with one row per assumption that needs desk research. Columns: `Assumption`, `Source`, `Finding`, `URL`, `Date`. The PM fills in what they find.
+ 6. **Summarize status** — Present a clear picture: what's validated, what's invalidated, what still needs fieldwork.
+ 7. **Finalize** — Write the validation artifact and data collection templates after user approval. Tell the PM: "Fill in the CSV files in `validation-data/` as you collect data, then run `/productkit.validate` again for me to analyze your findings."
+
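
The templates in step 5 are plain CSV files. A minimal seeding sketch (hypothetical; the actual question text comes from steps 3 and 4, and Claude writes the files directly):

```js
// Hypothetical sketch of the validation-data/ templates described in step 5.
const fs = require('fs');
const path = require('path');

const dir = 'validation-data'; // or under artifact_dir if configured
fs.mkdirSync(dir, { recursive: true });

// interviews.csv: one pre-filled question per row (question text comes from step 3);
// the PM duplicates rows per participant and fills in Response/Notes.
fs.writeFileSync(
  path.join(dir, 'interviews.csv'),
  'Participant,Question,Response,Notes\n,"How do you handle [task] today?",,\n'
);

// survey-responses.csv: headers only; the PM pastes exported survey rows beneath.
fs.writeFileSync(
  path.join(dir, 'survey-responses.csv'),
  '"Q1 (tests assumption A)","Q2 (tests assumption B)","Q3 (open text)"\n'
);

// desk-research.csv: one row per assumption that needs desk research.
fs.writeFileSync(path.join(dir, 'desk-research.csv'), 'Assumption,Source,Finding,URL,Date\n');
```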
+ ## Conversation Style
+
+ - Be rigorous — "I think users want this" is not evidence. Push for specifics.
+ - Accept diverse evidence — user interviews, analytics data, support tickets, competitor research, and domain expertise all count
+ - For invalidated assumptions, flag the downstream impact ("This assumption is in your problem statement — you may need to revisit it")
+ - Keep interview questions open-ended and non-leading
+ - Keep survey questions clear and unambiguous — no double-barreled questions
+ - If all critical assumptions are already validated, celebrate that and generate materials only for remaining gaps
+
+ ## Output
+
+ Write to `validation.md`. Every assumption gets a structured block with an `Evidence` field. For assumptions the PM has already validated, fill in the evidence. For assumptions that still need validation, write `[PENDING]` as the evidence value. This marker is critical — `/productkit.solution` will check for `[PENDING]` markers and block if any exist on critical or important assumptions.
+
+ ```markdown
+ # Validation
+
+ ## Assumptions
+
+ ### Critical
+
+ 1. **[Assumption]**
+ - Priority: Critical
+ - Source: [assumptions.md reference]
+ - Method: [Interview | Survey | Desk research | Domain expertise]
+ - Evidence: [Specific findings — quotes, data, sources] OR [PENDING]
+ - Status: Validated | Partially validated | Invalidated | Needs validation
+
+ 2. **[Assumption]**
+ - Priority: Critical
+ - Source: [assumptions.md reference]
+ - Method: [Method used or suggested]
+ - Evidence: [Specific findings] OR [PENDING]
+ - Status: Validated | Partially validated | Invalidated | Needs validation
+
+ ### Important
+
+ 1. **[Assumption]**
+ - Priority: Important
+ - Source: [assumptions.md reference]
+ - Method: [Method used or suggested]
+ - Evidence: [Specific findings] OR [PENDING]
+ - Status: Validated | Partially validated | Invalidated | Needs validation
+
+ ### Low Risk
+
+ 1. **[Assumption]**
+ - Priority: Low
+ - Source: [assumptions.md reference]
+ - Evidence: [Specific findings] OR [PENDING]
+ - Status: Validated | Needs validation
+
+ ## Interview Script
+
+ ### Target: [User persona from users.md]
+ **Context:** [Brief description of what you're validating]
+
+ **Warm-up (2-3 min)**
+ - [Opening question to build rapport]
+ - [Question about their current workflow/situation]
+
+ **Core Questions (15-20 min)**
+ 1. [Question targeting assumption X]
+ - _Follow-up if yes:_ [Probe deeper]
+ - _Follow-up if no:_ [Explore why]
+ 2. [Question targeting assumption Y]
+ - _Follow-up:_ [Probe deeper]
+
+ **Closing (2-3 min)**
+ - Is there anything about [topic] that I didn't ask about but should have?
+ - Do you know anyone else who deals with [problem] that I could talk to?
+
+ ## Survey Questions
+
+ Ready to paste into Typeform / Google Forms:
+
+ 1. [Question] — Multiple choice: [Option A / Option B / Option C / Other]
+ - _Tests assumption:_ [Which one]
+ 2. [Question] — Scale: 1 (Strongly disagree) to 5 (Strongly agree)
+ - _Tests assumption:_ [Which one]
+ 3. [Question] — Open text
+ - _Tests assumption:_ [Which one]
+
+ ## Next Steps
+ - [What to do with validation results before moving to /productkit.solution]
+ ```
+
+ ### Important: How evidence gets entered and reviewed
+
+ There are two ways evidence enters the system. Raw data files are preferred; manual entry is the fallback.
+
+ **Path A: Raw data files (preferred)**
+
+ The PM drops raw data into `validation-data/`:
+ - Interview transcripts/notes → `.md` or `.txt` files
+ - Survey exports → `.csv` files
+ - Desk research findings → `.md` files with sources
+
+ Then runs `/productkit.validate`. Claude reads the raw files, extracts evidence relevant to each assumption, and updates `validation.md` directly. The PM does not need to fill in evidence manually — Claude does the analysis.
+
+ **Path B: Manual entry (fallback)**
+
+ For evidence that doesn't have a raw file (e.g., a phone call, in-person observation, domain expertise), the PM fills in the `Evidence:` fields directly in `validation.md`, replacing `[PENDING]` with their findings. Then runs `/productkit.validate` for review.
+
+ ---
+
+ **Review mode — when `validation.md` already exists:**
+
+ 1. Read the existing `validation.md`
+ 2. **Check `validation-data/` for raw files.** If files are present:
+ - Read each file and identify which assumptions it provides evidence for
+ - For interview transcripts: extract relevant quotes, count participants, note patterns across interviews
+ - For survey CSVs: calculate response counts, percentages, and distributions for relevant questions (a sketch of this aggregation appears at the end of this file). For large files (100+ rows), summarize key statistics rather than reading every row.
+ - For desk research: extract cited sources, statistics, and findings
+ - Cross-reference findings against each `[PENDING]` assumption
+ - Write the extracted evidence into the `Evidence:` field, citing the source file (e.g., "From interview-03.md: '...'", "Survey data (n=45): 72% responded...")
+ - Present your analysis to the PM for confirmation before finalizing
+ 3. **For manually entered evidence** (no raw file), review the quality:
+ - **Is it specific?** — "Users liked it" is not evidence. Push back: "How many users? What exactly did they say?"
+ - **Does it include the method?** — Interview, survey, desk research, analytics? If not stated, ask.
+ - **Does it include the source/sample?** — How many people? Which report? What dataset? If missing, ask.
+ - **Does it actually test the assumption?** — Evidence about user demographics doesn't validate a usability assumption. Flag mismatches.
+ 4. For evidence that passes review (from raw data or manual entry):
+ - Update the `Status:` field to Validated / Partially validated / Invalidated
+ - For invalidated assumptions, add `- Impact:` noting what needs to change in previous artifacts
+ 5. For manually entered evidence that is too weak or vague:
+ - **Do not update the Status.** Keep it as `Needs validation`.
+ - Reset `Evidence:` back to `[PENDING]`
+ - Explain what's missing and what good evidence would look like for this specific assumption
+ 6. Keep the interview script and survey sections — they may still be useful for remaining `[PENDING]` items
+ 7. When all critical and important assumptions have evidence that passed review (no `[PENDING]` markers), tell the user they're clear to run `/productkit.solution`
+
+ **Evidence quality bar by method:**
+
+ | Method | Minimum evidence required |
+ |--------|--------------------------|
+ | Interview | Number of participants, at least one direct quote or specific observation per assumption |
+ | Survey | Sample size, response rate, key percentages or distributions |
+ | Desk research | Source name, publication date, specific statistic or finding cited |
+ | Analytics | Metric name, time period, actual numbers |
+ | Domain expertise | Specific experience cited (role, years, context), not just "I believe" |
+
+ **Note on `validation-data/` and privacy:**
+ - Remind the PM to anonymize interview transcripts (replace real names with pseudonyms) before committing to git
+ - Suggest adding `validation-data/` to `.gitignore` if the data contains sensitive or personally identifiable information
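
The survey summarization in review-mode step 2 is plain aggregation. A rough Node sketch, assuming comma-separated values with no quoted commas:

```js
// Hypothetical sketch: summarize one survey column from survey-responses.csv.
const fs = require('fs');

const [header, ...rows] = fs
  .readFileSync('validation-data/survey-responses.csv', 'utf8')
  .trim()
  .split('\n')
  .map((line) => line.split(',')); // naive split: assumes no quoted commas

const col = 0; // index of the question to summarize
const counts = {};
for (const row of rows) counts[row[col]] = (counts[row[col]] || 0) + 1;

console.log('Question:', header[col]);
for (const [answer, count] of Object.entries(counts)) {
  console.log(`${answer}: ${count}/${rows.length} (${Math.round((100 * count) / rows.length)}%)`);
}
```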