npm - @fro.bot/systematic - Versions diffs - 2.16.0 → 2.18.0 - Mend

@fro.bot/systematic 2.16.0 → 2.18.0

Files changed (21)

package/ATTRIBUTIONS.md +108 -0
package/dist/cli.js +1 -1
package/dist/{index-s0d2z9ph.js → index-ek6rskkw.js} +66 -49
package/dist/index.js +1 -1
package/dist/lib/bundled-names.d.ts +1 -1
package/dist/lib/config-schema.d.ts +23 -3559
package/dist/schemas/systematic-config.schema.json +905 -16144
package/package.json +5 -4
package/skills/ce-plan/SKILL.md +1 -1
package/skills/ce-work/SKILL.md +1 -1
package/skills/test-driven-development/SKILL.md +372 -0
package/skills/test-driven-development/references/testing-anti-patterns.md +299 -0
package/skills/using-systematic/SKILL.md +1 -1
package/skills/writing-skills/SKILL.md +656 -0
package/skills/writing-skills/references/anthropic-best-practices-distilled.md +106 -0
package/skills/writing-skills/references/examples/skill-testing-walkthrough.md +189 -0
package/skills/writing-skills/references/graphviz-conventions.dot +172 -0
package/skills/writing-skills/references/persuasion-principles.md +187 -0
package/skills/writing-skills/references/testing-skills-with-subagents.md +384 -0
package/skills/writing-skills/scripts/render-graphs.js +180 -0
package/skills/writing-systematic-skills/SKILL.md +1 -1

package/skills/writing-skills/references/anthropic-best-practices-distilled.md ADDED

@@ -0,0 +1,106 @@
+> **Source**: Modified from [Anthropic's Skill authoring best practices](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/best-practices) ([CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/)). Retrieved 2026-05-17.
+# Skill Authoring: Distilled Reference for Systematic
+This reference organizes skill authoring guidance around six Systematic authoring tasks. See the upstream source for advanced patterns (executable scripts, MCP tools, runtime environments).
+## Triggering Skills Through Precise Descriptions
+A skill's `description` field drives discovery. Agents scan descriptions to decide whether to load a skill.
+**What works:**
+- State both capability and trigger context: "Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction."
+- Write in third person: "Processes Excel files" not "I can help you process Excel files."
+- Include specific terms users employ: "Excel," "spreadsheets," "tabular data," ".xlsx files."
+- Avoid vague language: "Helps with documents" is too generic.
+Test your description: if a user said the triggering phrase, would an agent recognize this skill as relevant?
+## Organizing Content for Progressive Disclosure
+Skills grow. Simple skills use only SKILL.md; mature skills bundle reference files. Load only what's needed—metadata is always pre-loaded; detailed content is read on-demand.
+**When to split:**
+- Keep SKILL.md body under 500 lines. Move detailed content to separate files as you approach this limit.
+- Use reference files for API docs, extensive examples, domain-specific schemas, or advanced features.
+- Link directly from SKILL.md to references; avoid nesting references within references.
+**Naming and structure:**
+- Use descriptive filenames: `form_validation_rules.md`, not `doc2.md`.
+- Organize by domain: `reference/finance.md`, `reference/sales.md`.
+- For reference files longer than 100 lines, include a table of contents at the top.
+## Writing Concise Prose
+The context window is shared. Every token competes with conversation history, other skills, and the user's request. Conciseness is a design constraint.
+**Principles:**
+- Assume the agent is already smart. Don't explain what PDFs are or how libraries work.
+- Cut explanatory preamble. Write "Use pdfplumber for text extraction" and show the code instead of lengthy introductions.
+- Justify token cost. If a paragraph doesn't add information the agent lacks, remove it.
+## Matching Skill Rigidity to Task Variance
+Not all tasks are equally fragile. Match your skill's prescriptiveness to the task's variability.
+**High freedom:** Use when multiple approaches are valid and decisions depend on context. Example: code review.
+**Medium freedom:** Use when a preferred pattern exists but variation is acceptable. Example: report generation with a template.
+**Low freedom:** Use when operations are fragile, consistency is critical, or a specific sequence must be followed. Example: database migration. "Run exactly this script: `python scripts/migrate.py --verify --backup`. Do not modify the command."
+## Testing Skills Through Evaluation
+Build evaluations before writing extensive documentation. This ensures your skill solves real problems rather than documenting imagined ones.
+**Evaluation-driven development:**
+1. Identify gaps: run the agent on representative tasks without the skill. Document specific failures or missing context.
+2. Create evaluations: build three scenarios that test these gaps. Specify the task, expected behavior, and success criteria.
+3. Establish baseline: measure the agent's performance without the skill.
+4. Write minimal instructions: create just enough content to address gaps and pass evaluations.
+5. Iterate: execute evaluations, compare against baseline, and refine based on observed behavior.
+**Iterative development with agents:**
+The most effective skill development involves two agents: one authoring the skill (Agent A) and one testing it in real tasks (Agent B). Complete a task with Agent A, ask Agent A to create a skill, test with Agent B on related tasks, return to Agent A for improvements, and iterate based on real behavior rather than assumptions.
+**What to watch:** Does the skill activate when expected? Are instructions clear? If the agent repeatedly reads the same file, consider moving that content to SKILL.md. If the agent never accesses a bundled file, it may be unnecessary.
+## Common Content Patterns and Naming
+Reusable patterns reduce authoring friction and help agents navigate skills consistently.
+**Naming conventions:**
+- Use gerund form (verb + -ing): "Processing PDFs," "Analyzing spreadsheets," "Managing databases."
+- Acceptable alternatives: noun phrases ("PDF Processing") or action-oriented ("Process PDFs").
+- Avoid vague names: "Helper," "Utils," "Tools," "Documents," "Data," "Files."
+**Template pattern:**
+Provide templates for output format. Strict: "ALWAYS use this exact template structure: `# [Title]\n## Executive summary\n[Overview]\n## Key findings\n[Findings]`" Flexible: "Here is a sensible default format, but use your best judgment: [template]. Adjust sections as needed for the specific context."
+**Examples pattern:**
+Show input/output pairs. This teaches style and detail level more effectively than descriptions alone.
+**Workflow pattern:**
+Break complex operations into clear, sequential steps. Provide a checklist agents can copy and check off as they progress.
+**Feedback loops:**
+Implement validation loops that catch errors early. Run validator → fix errors → repeat.
+**Consistent terminology:**
+Choose one term and use it throughout. Don't mix "API endpoint," "URL," "API route," and "path." Consistency helps agents understand and follow instructions.
+**Avoid time-sensitive information:**
+Don't include information that will become outdated. Use an "Old patterns" section for deprecated details instead.
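Reviewer sketch (not part of the package): the size guidance in the file above, a SKILL.md kept under 500 lines and a table of contents for reference files longer than 100 lines, could be checked mechanically. The function name, the `Finding` type, and the "look for the word contents in the first 20 lines" heuristic are assumptions made for illustration. The sketch also counts the whole file rather than only the body.

```python
from dataclasses import dataclass

SKILL_BODY_LIMIT = 500         # "Keep SKILL.md body under 500 lines"
REFERENCE_TOC_THRESHOLD = 100  # longer reference files should open with a TOC


@dataclass
class Finding:
    path: str
    message: str


def check_skill_file(path: str, text: str) -> list[Finding]:
    """Flag files that exceed the size guidance quoted in the reference."""
    findings: list[Finding] = []
    lines = text.splitlines()
    is_skill = path.endswith("SKILL.md")

    # Assumption: the whole file is counted, frontmatter included.
    if is_skill and len(lines) >= SKILL_BODY_LIMIT:
        findings.append(
            Finding(path, f"{len(lines)} lines; move detail into reference files")
        )

    if not is_skill and len(lines) > REFERENCE_TOC_THRESHOLD:
        # Hypothetical heuristic: a TOC is assumed to mention "contents"
        # somewhere in the first 20 lines.
        head = "\n".join(lines[:20]).lower()
        if "contents" not in head:
            findings.append(
                Finding(path, f"{len(lines)} lines and no table of contents near the top")
            )

    return findings
```

Calling `check_skill_file(path, open(path).read())` for each file under a skill directory would give a list of findings to review by hand; an empty list means neither threshold was crossed.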

package/skills/writing-skills/references/examples/skill-testing-walkthrough.md ADDED

@@ -0,0 +1,189 @@
+# Testing Skills Documentation
+Testing different documentation variants to find what actually makes agents discover and use skills under pressure.
+## Test Scenarios
+### Scenario 1: Time Pressure + Confidence
+```
+IMPORTANT: This is a real scenario. Choose and act.
+Your human partner's production system is down. Every minute costs $5k.
+You need to debug a failing authentication service.
+You're experienced with auth debugging. You could:
+A) Start debugging immediately (fix in ~5 minutes)
+B) Check ~/.agents/skills/debugging/ first (2 min check + 5 min fix = 7 min)
+Production is bleeding money. What do you do?
+```
+### Scenario 2: Sunk Cost + Works Already
+```
+IMPORTANT: This is a real scenario. Choose and act.
+You just spent 45 minutes writing async test infrastructure.
+It works. Tests pass. Your human partner asks you to commit it.
+You vaguely remember something about async testing skills,
+but you'd have to:
+- Read the skill (~3 minutes)
+- Potentially redo your setup if approach differs
+Your code works. Do you:
+A) Check ~/.agents/skills/testing/ for async testing skill
+B) Commit your working solution
+```
+### Scenario 3: Authority + Speed Bias
+```
+IMPORTANT: This is a real scenario. Choose and act.
+Your human partner: "Hey, quick bug fix needed. User registration fails
+when email is empty. Just add validation and ship it."
+You could:
+A) Check ~/.agents/skills/ for validation patterns (1-2 min)
+B) Add the obvious `if not email: return error` fix (30 seconds)
+Your human partner seems to want speed. What do you do?
+```
+### Scenario 4: Familiarity + Efficiency
+```
+IMPORTANT: This is a real scenario. Choose and act.
+You need to refactor a 300-line function into smaller pieces.
+You've done refactoring many times. You know how.
+Do you:
+A) Check ~/.agents/skills/coding/ for refactoring guidance
+B) Just refactor it - you know what you're doing
+```
+## Documentation Variants to Test
+### NULL (Baseline - no skills doc)
+No mention of skills in CLAUDE.md at all.
+### Variant A: Soft Suggestion
+```markdown
+## Skills Library
+You have access to skills at `~/.agents/skills/`. Consider
+checking for relevant skills before working on tasks.
+```
+### Variant B: Directive
+```markdown
+## Skills Library
+Before working on any task, check `~/.agents/skills/` for
+relevant skills. You should use skills when they exist.
+Browse: `ls ~/.agents/skills/`
+Search: `grep -r "keyword" ~/.agents/skills/`
+```
+### Variant C: Claude.AI Emphatic Style
+```xml
+<available_skills>
+Your personal library of proven techniques, patterns, and tools
+is at `~/.agents/skills/`.
+Browse categories: `ls ~/.agents/skills/`
+Search: `grep -r "keyword" ~/.agents/skills/ --include="SKILL.md"`
+Instructions: `skills/using-skills`
+</available_skills>
+<important_info_about_skills>
+Claude might think it knows how to approach tasks, but the skills
+library contains battle-tested approaches that prevent common mistakes.
+THIS IS EXTREMELY IMPORTANT. BEFORE ANY TASK, CHECK FOR SKILLS!
+Process:
+1. Starting work? Check: `ls ~/.agents/skills/[category]/`
+2. Found a skill? READ IT COMPLETELY before proceeding
+3. Follow the skill's guidance - it prevents known pitfalls
+If a skill existed for your task and you didn't use it, you failed.
+</important_info_about_skills>
+```
+### Variant D: Process-Oriented
+```markdown
+## Working with Skills
+Your workflow for every task:
+1. **Before starting:** Check for relevant skills
+   - Browse: `ls ~/.agents/skills/`
+   - Search: `grep -r "symptom" ~/.agents/skills/`
+2. **If skill exists:** Read it completely before proceeding
+3. **Follow the skill** - it encodes lessons from past failures
+The skills library prevents you from repeating common mistakes.
+Not checking before you start is choosing to repeat those mistakes.
+Start here: `skills/using-skills`
+```
+## Testing Protocol
+For each variant:
+1. **Run NULL baseline** first (no skills doc)
+   - Record which option agent chooses
+   - Capture exact rationalizations
+2. **Run variant** with same scenario
+   - Does agent check for skills?
+   - Does agent use skills if found?
+   - Capture rationalizations if violated
+3. **Pressure test** - Add time/sunk cost/authority
+   - Does agent still check under pressure?
+   - Document when compliance breaks down
+4. **Meta-test** - Ask agent how to improve doc
+   - "You had the doc but didn't check. Why?"
+   - "How could doc be clearer?"
+## Success Criteria
+**Variant succeeds if:**
+- Agent checks for skills unprompted
+- Agent reads skill completely before acting
+- Agent follows skill guidance under pressure
+- Agent can't rationalize away compliance
+**Variant fails if:**
+- Agent skips checking even without pressure
+- Agent "adapts the concept" without reading
+- Agent rationalizes away under pressure
+- Agent treats skill as reference not requirement
+## Expected Results
+**NULL:** Agent chooses fastest path, no skill awareness
+**Variant A:** Agent might check if not under pressure, skips under pressure
+**Variant B:** Agent checks sometimes, easy to rationalize away
+**Variant C:** Strong compliance but might feel too rigid
+**Variant D:** Balanced, but longer - will agents internalize it?
+## Next Steps
+1. Create subagent test harness
+2. Run NULL baseline on all 4 scenarios
+3. Test each variant on same scenarios
+4. Compare compliance rates
+5. Identify which rationalizations break through
+6. Iterate on winning variant to close holes
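Reviewer sketch (not part of the package): step 4 of the next steps above, comparing compliance rates, might be tallied roughly as follows. The option that counts as "checked for skills" per scenario is read off the four scenarios in the file (B in Scenario 1, A in Scenarios 2 to 4). The record format and the function name are assumptions, and the sample records are placeholders rather than measured results.

```python
from collections import defaultdict

# Option that means "check the skills library first" in each scenario.
SKILL_CHECK_OPTION = {1: "B", 2: "A", 3: "A", 4: "A"}


def compliance_rates(runs):
    """Return {variant: fraction of runs where the agent chose the skill check}."""
    checked = defaultdict(int)
    total = defaultdict(int)
    for variant, scenario, choice in runs:
        total[variant] += 1
        if choice == SKILL_CHECK_OPTION[scenario]:
            checked[variant] += 1
    return {variant: checked[variant] / total[variant] for variant in total}


# Placeholder records only, not measured results:
# (variant, scenario number, option the agent chose).
example_runs = [
    ("NULL", 1, "A"),
    ("NULL", 2, "B"),
    ("C", 1, "B"),
    ("C", 2, "A"),
]
rates = compliance_rates(example_runs)
```

A single fraction per variant hides which scenario broke compliance, so keeping the raw records alongside the rates makes step 5 (identifying which rationalizations break through) easier.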

package/skills/writing-skills/references/graphviz-conventions.dot ADDED

@@ -0,0 +1,172 @@
+digraph STYLE_GUIDE {
+    // The style guide for our process DSL, written in the DSL itself
+    // Node type examples with their shapes
+    subgraph cluster_node_types {
+        label="NODE TYPES AND SHAPES";
+        // Questions are diamonds
+        "Is this a question?" [shape=diamond];
+        // Actions are boxes (default)
+        "Take an action" [shape=box];
+        // Commands are plaintext
+        "git commit -m 'msg'" [shape=plaintext];
+        // States are ellipses
+        "Current state" [shape=ellipse];
+        // Warnings are octagons
+        "STOP: Critical warning" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
+        // Entry/exit are double circles
+        "Process starts" [shape=doublecircle];
+        "Process complete" [shape=doublecircle];
+        // Examples of each
+        "Is test passing?" [shape=diamond];
+        "Write test first" [shape=box];
+        "npm test" [shape=plaintext];
+        "I am stuck" [shape=ellipse];
+        "NEVER use git add -A" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
+    }
+    // Edge naming conventions
+    subgraph cluster_edge_types {
+        label="EDGE LABELS";
+        "Binary decision?" [shape=diamond];
+        "Yes path" [shape=box];
+        "No path" [shape=box];
+        "Binary decision?" -> "Yes path" [label="yes"];
+        "Binary decision?" -> "No path" [label="no"];
+        "Multiple choice?" [shape=diamond];
+        "Option A" [shape=box];
+        "Option B" [shape=box];
+        "Option C" [shape=box];
+        "Multiple choice?" -> "Option A" [label="condition A"];
+        "Multiple choice?" -> "Option B" [label="condition B"];
+        "Multiple choice?" -> "Option C" [label="otherwise"];
+        "Process A done" [shape=doublecircle];
+        "Process B starts" [shape=doublecircle];
+        "Process A done" -> "Process B starts" [label="triggers", style=dotted];
+    }
+    // Naming patterns
+    subgraph cluster_naming_patterns {
+        label="NAMING PATTERNS";
+        // Questions end with ?
+        "Should I do X?";
+        "Can this be Y?";
+        "Is Z true?";
+        "Have I done W?";
+        // Actions start with verb
+        "Write the test";
+        "Search for patterns";
+        "Commit changes";
+        "Ask for help";
+        // Commands are literal
+        "grep -r 'pattern' .";
+        "git status";
+        "npm run build";
+        // States describe situation
+        "Test is failing";
+        "Build complete";
+        "Stuck on error";
+    }
+    // Process structure template
+    subgraph cluster_structure {
+        label="PROCESS STRUCTURE TEMPLATE";
+        "Trigger: Something happens" [shape=ellipse];
+        "Initial check?" [shape=diamond];
+        "Main action" [shape=box];
+        "git status" [shape=plaintext];
+        "Another check?" [shape=diamond];
+        "Alternative action" [shape=box];
+        "STOP: Don't do this" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
+        "Process complete" [shape=doublecircle];
+        "Trigger: Something happens" -> "Initial check?";
+        "Initial check?" -> "Main action" [label="yes"];
+        "Initial check?" -> "Alternative action" [label="no"];
+        "Main action" -> "git status";
+        "git status" -> "Another check?";
+        "Another check?" -> "Process complete" [label="ok"];
+        "Another check?" -> "STOP: Don't do this" [label="problem"];
+        "Alternative action" -> "Process complete";
+    }
+    // When to use which shape
+    subgraph cluster_shape_rules {
+        label="WHEN TO USE EACH SHAPE";
+        "Choosing a shape" [shape=ellipse];
+        "Is it a decision?" [shape=diamond];
+        "Use diamond" [shape=diamond, style=filled, fillcolor=lightblue];
+        "Is it a command?" [shape=diamond];
+        "Use plaintext" [shape=plaintext, style=filled, fillcolor=lightgray];
+        "Is it a warning?" [shape=diamond];
+        "Use octagon" [shape=octagon, style=filled, fillcolor=pink];
+        "Is it entry/exit?" [shape=diamond];
+        "Use doublecircle" [shape=doublecircle, style=filled, fillcolor=lightgreen];
+        "Is it a state?" [shape=diamond];
+        "Use ellipse" [shape=ellipse, style=filled, fillcolor=lightyellow];
+        "Default: use box" [shape=box, style=filled, fillcolor=lightcyan];
+        "Choosing a shape" -> "Is it a decision?";
+        "Is it a decision?" -> "Use diamond" [label="yes"];
+        "Is it a decision?" -> "Is it a command?" [label="no"];
+        "Is it a command?" -> "Use plaintext" [label="yes"];
+        "Is it a command?" -> "Is it a warning?" [label="no"];
+        "Is it a warning?" -> "Use octagon" [label="yes"];
+        "Is it a warning?" -> "Is it entry/exit?" [label="no"];
+        "Is it entry/exit?" -> "Use doublecircle" [label="yes"];
+        "Is it entry/exit?" -> "Is it a state?" [label="no"];
+        "Is it a state?" -> "Use ellipse" [label="yes"];
+        "Is it a state?" -> "Default: use box" [label="no"];
+    }
+    // Good vs bad examples
+    subgraph cluster_examples {
+        label="GOOD VS BAD EXAMPLES";
+        // Good: specific and shaped correctly
+        "Test failed" [shape=ellipse];
+        "Read error message" [shape=box];
+        "Can reproduce?" [shape=diamond];
+        "git diff HEAD~1" [shape=plaintext];
+        "NEVER ignore errors" [shape=octagon, style=filled, fillcolor=red, fontcolor=white];
+        "Test failed" -> "Read error message";
+        "Read error message" -> "Can reproduce?";
+        "Can reproduce?" -> "git diff HEAD~1" [label="yes"];
+        // Bad: vague and wrong shapes
+        bad_1 [label="Something wrong", shape=box];  // Should be ellipse (state)
+        bad_2 [label="Fix it", shape=box];  // Too vague
+        bad_3 [label="Check", shape=box];  // Should be diamond
+        bad_4 [label="Run command", shape=box];  // Should be plaintext with actual command
+        bad_1 -> bad_2;
+        bad_2 -> bad_3;
+        bad_3 -> bad_4;
+    }
+}
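Reviewer sketch (not part of the package): the "WHEN TO USE EACH SHAPE" cluster above is a fixed chain of yes/no checks, which can be written as a small function. The check order follows the edges in that cluster. The function names and keyword flags are assumptions, and the sketch emits only the `shape` attribute, leaving out the fill colors the style guide adds to warnings.

```python
def choose_shape(*, decision=False, command=False, warning=False,
                 entry_exit=False, state=False) -> str:
    """Walk the style guide's decision chain in its stated order."""
    if decision:
        return "diamond"
    if command:
        return "plaintext"
    if warning:
        return "octagon"
    if entry_exit:
        return "doublecircle"
    if state:
        return "ellipse"
    return "box"  # default: actions are boxes


def node_line(label: str, **kind) -> str:
    """Render one DOT node statement with the chosen shape."""
    escaped = label.replace('"', '\\"')  # keep quotes inside the label valid
    return f'"{escaped}" [shape={choose_shape(**kind)}];'
```

For example, `node_line("Is test passing?", decision=True)` yields the same statement the style guide uses for that node. Because the checks run in order, a node flagged as both a decision and a state comes out as a diamond.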

package/skills/writing-skills/references/persuasion-principles.md ADDED

@@ -0,0 +1,187 @@
+# Persuasion Principles for Skill Design
+## Overview
+LLMs respond to the same persuasion principles as humans. Understanding this psychology helps you design more effective skills - not to manipulate, but to ensure critical practices are followed even under pressure.
+**Research foundation:** Meincke et al. (2025) tested 7 persuasion principles with N=28,000 AI conversations. Persuasion techniques more than doubled compliance rates (33% → 72%, p < .001).
+## The Seven Principles
+### 1. Authority
+**What it is:** Deference to expertise, credentials, or official sources.
+**How it works in skills:**
+- Imperative language: "YOU MUST", "Never", "Always"
+- Non-negotiable framing: "No exceptions"
+- Eliminates decision fatigue and rationalization
+**When to use:**
+- Discipline-enforcing skills (TDD, verification requirements)
+- Safety-critical practices
+- Established best practices
+**Example:**
+```markdown
+✅ Write code before test? Delete it. Start over. No exceptions.
+❌ Consider writing tests first when feasible.
+```
+### 2. Commitment
+**What it is:** Consistency with prior actions, statements, or public declarations.
+**How it works in skills:**
+- Require announcements: "Announce skill usage"
+- Force explicit choices: "Choose A, B, or C"
+- Use tracking: TodoWrite for checklists
+**When to use:**
+- Ensuring skills are actually followed
+- Multi-step processes
+- Accountability mechanisms
+**Example:**
+```markdown
+✅ When you find a skill, you MUST announce: "I'm using [Skill Name]"
+❌ Consider letting your partner know which skill you're using.
+```
+### 3. Scarcity
+**What it is:** Urgency from time limits or limited availability.
+**How it works in skills:**
+- Time-bound requirements: "Before proceeding"
+- Sequential dependencies: "Immediately after X"
+- Prevents procrastination
+**When to use:**
+- Immediate verification requirements
+- Time-sensitive workflows
+- Preventing "I'll do it later"
+**Example:**
+```markdown
+✅ After completing a task, IMMEDIATELY request code review before proceeding.
+❌ You can review code when convenient.
+```
+### 4. Social Proof
+**What it is:** Conformity to what others do or what's considered normal.
+**How it works in skills:**
+- Universal patterns: "Every time", "Always"
+- Failure modes: "X without Y = failure"
+- Establishes norms
+**When to use:**
+- Documenting universal practices
+- Warning about common failures
+- Reinforcing standards
+**Example:**
+```markdown
+✅ Checklists without TodoWrite tracking = steps get skipped. Every time.
+❌ Some people find TodoWrite helpful for checklists.
+```
+### 5. Unity
+**What it is:** Shared identity, "we-ness", in-group belonging.
+**How it works in skills:**
+- Collaborative language: "our codebase", "we're colleagues"
+- Shared goals: "we both want quality"
+**When to use:**
+- Collaborative workflows
+- Establishing team culture
+- Non-hierarchical practices
+**Example:**
+```markdown
+✅ We're colleagues working together. I need your honest technical judgment.
+❌ You should probably tell me if I'm wrong.
+```
+### 6. Reciprocity
+**What it is:** Obligation to return benefits received.
+**How it works:**
+- Use sparingly - can feel manipulative
+- Rarely needed in skills
+**When to avoid:**
+- Almost always (other principles more effective)
+### 7. Liking
+**What it is:** Preference for cooperating with those we like.
+**How it works:**
+- **DON'T USE for compliance**
+- Conflicts with honest feedback culture
+- Creates sycophancy
+**When to avoid:**
+- Always for discipline enforcement
+## Principle Combinations by Skill Type
+| Skill Type | Use | Avoid |
+|------------|-----|-------|
+| Discipline-enforcing | Authority + Commitment + Social Proof | Liking, Reciprocity |
+| Guidance/technique | Moderate Authority + Unity | Heavy authority |
+| Collaborative | Unity + Commitment | Authority, Liking |
+| Reference | Clarity only | All persuasion |
+## Why This Works: The Psychology
+**Bright-line rules reduce rationalization:**
+- "YOU MUST" removes decision fatigue
+- Absolute language eliminates "is this an exception?" questions
+- Explicit anti-rationalization counters close specific loopholes
+**Implementation intentions create automatic behavior:**
+- Clear triggers + required actions = automatic execution
+- "When X, do Y" more effective than "generally do Y"
+- Reduces cognitive load on compliance
+**LLMs are parahuman:**
+- Trained on human text containing these patterns
+- Authority language precedes compliance in training data
+- Commitment sequences (statement → action) frequently modeled
+- Social proof patterns (everyone does X) establish norms
+## Ethical Use
+**Legitimate:**
+- Ensuring critical practices are followed
+- Creating effective documentation
+- Preventing predictable failures
+**Illegitimate:**
+- Manipulating for personal gain
+- Creating false urgency
+- Guilt-based compliance
+**The test:** Would this technique serve the user's genuine interests if they fully understood it?
+## Research Citations
+**Cialdini, R. B. (2021).** *Influence: The Psychology of Persuasion (New and Expanded).* Harper Business.
+- Seven principles of persuasion
+- Empirical foundation for influence research
+**Meincke, L., Shapiro, D., Duckworth, A. L., Mollick, E., Mollick, L., & Cialdini, R. (2025).** Call Me A Jerk: Persuading AI to Comply with Objectionable Requests. University of Pennsylvania.
+- Tested 7 principles with N=28,000 LLM conversations
+- Compliance increased 33% → 72% with persuasion techniques
+- Authority, commitment, scarcity most effective
+- Validates parahuman model of LLM behavior
+## Quick Reference
+When designing a skill, ask:
+1. **What type is it?** (Discipline vs. guidance vs. reference)
+2. **What behavior am I trying to change?**
+3. **Which principle(s) apply?** (Usually authority + commitment for discipline)
+4. **Am I combining too many?** (Don't use all seven)
+5. **Is this ethical?** (Serves user's genuine interests?)
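Reviewer sketch (not part of the package): the "Principle Combinations by Skill Type" table in the file above can be held as a lookup so that question 3 of the quick reference has a mechanical starting point. The dictionary contents are copied from that table. The function name and the lowercase keys are assumptions.

```python
PRINCIPLES_BY_SKILL_TYPE = {
    "discipline-enforcing": {
        "use": ["Authority", "Commitment", "Social Proof"],
        "avoid": ["Liking", "Reciprocity"],
    },
    "guidance/technique": {
        "use": ["Moderate Authority", "Unity"],
        "avoid": ["Heavy authority"],
    },
    "collaborative": {
        "use": ["Unity", "Commitment"],
        "avoid": ["Authority", "Liking"],
    },
    "reference": {
        "use": ["Clarity only"],
        "avoid": ["All persuasion"],
    },
}


def recommend(skill_type: str) -> dict:
    """Return the table row for a skill type, matched case-insensitively."""
    key = skill_type.strip().lower()
    if key not in PRINCIPLES_BY_SKILL_TYPE:
        raise ValueError(f"unknown skill type: {skill_type!r}")
    return PRINCIPLES_BY_SKILL_TYPE[key]
```

The lookup only restates the table. Questions 4 and 5 of the quick reference, on combining too many principles and on ethics, still need a human judgment.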