jettypod 3.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (122)
  1. package/.claude/PROTECT_SKILLS.md +28 -0
  2. package/.claude/settings.json +24 -0
  3. package/.claude/settings.local.json +16 -0
  4. package/.claude/skills/epic-discover/SKILL.md +262 -0
  5. package/.claude/skills/feature-discover/SKILL.md +393 -0
  6. package/.claude/skills/speed-mode/SKILL.md +364 -0
  7. package/.claude/skills/stable-mode/SKILL.md +591 -0
  8. package/.github/workflows/test-safety.yml +85 -0
  9. package/README.md +25 -0
  10. package/SPEED-STABLE-AUDIT.md +853 -0
  11. package/SYSTEM-BEHAVIOR.md +1241 -0
  12. package/TEST_SAFETY_AUDIT.md +314 -0
  13. package/TEST_SAFETY_IMPLEMENTATION.md +97 -0
  14. package/cucumber.js +8 -0
  15. package/docs/COMMAND_REFERENCE.md +903 -0
  16. package/docs/DECISIONS.md +68 -0
  17. package/docs/README.md +48 -0
  18. package/docs/STANDARDS-SYSTEM-DOCUMENTATION.md +374 -0
  19. package/docs/TEST-REWRITE-PLAN.md +261 -0
  20. package/docs/ai-test-writing-requirements.md +219 -0
  21. package/docs/claude-code-skills.md +607 -0
  22. package/docs/core-jettypod-methodology/comprehensive-jettypod-methodology.md +582 -0
  23. package/docs/core-jettypod-methodology/deprecated/jettypod-comprehensive-standards.md +1222 -0
  24. package/docs/core-jettypod-methodology/deprecated/jettypod-operating-guide.md +3399 -0
  25. package/docs/core-jettypod-methodology/deprecated/jettypod-technical-checklist.md +1325 -0
  26. package/docs/core-jettypod-methodology/deprecated/jettypod-vibe-coding-framework.md +1544 -0
  27. package/docs/core-jettypod-methodology/deprecated/prompt-engineering-guide.md +320 -0
  28. package/docs/core-jettypod-methodology/deprecated/vibe-coding-cheatsheet (1).md +516 -0
  29. package/docs/core-jettypod-methodology/deprecated/vibe-coding-framework.md +1544 -0
  30. package/docs/features/jettypod-standards-explained.md +543 -0
  31. package/docs/features/standards-inventory.md +257 -0
  32. package/docs/gap-analysis-current-vs-comprehensive-methodology.md +939 -0
  33. package/docs/jettypod-system-overview.md +409 -0
  34. package/features/auto-generate-production-chores.feature +14 -0
  35. package/features/claude-md-protection/steps.js +487 -0
  36. package/features/decisions/index.js +490 -0
  37. package/features/decisions/index.test.js +208 -0
  38. package/features/git-hooks/git-hooks.feature +30 -0
  39. package/features/git-hooks/index.js +93 -0
  40. package/features/git-hooks/index.test.js +137 -0
  41. package/features/git-hooks/post-commit +56 -0
  42. package/features/git-hooks/post-merge +47 -0
  43. package/features/git-hooks/pre-commit +28 -0
  44. package/features/git-hooks/simple-steps.js +53 -0
  45. package/features/git-hooks/simple-test.feature +10 -0
  46. package/features/git-hooks/steps.js +196 -0
  47. package/features/jettypod-update-command.feature +46 -0
  48. package/features/mode-prompts/index.js +95 -0
  49. package/features/mode-prompts/simple-steps.js +44 -0
  50. package/features/mode-prompts/simple-test.feature +9 -0
  51. package/features/mode-prompts/validation.test.js +120 -0
  52. package/features/refactor-mode/steps.js +217 -0
  53. package/features/refactor-mode.feature +49 -0
  54. package/features/skills-update/index.test.js +216 -0
  55. package/features/step_definitions/auto-generate-production-chores.steps.js +162 -0
  56. package/features/step_definitions/terminal-logo.steps.js +145 -0
  57. package/features/step_definitions/update-command.steps.js +183 -0
  58. package/features/terminal-logo/index.js +39 -0
  59. package/features/terminal-logo/terminal-logo.feature +30 -0
  60. package/features/update-command/index.js +181 -0
  61. package/features/update-command/index.test.js +225 -0
  62. package/features/work-commands/bug-workflow-display.feature +22 -0
  63. package/features/work-commands/index.js +311 -0
  64. package/features/work-commands/simple-steps.js +69 -0
  65. package/features/work-commands/stable-tests.feature +57 -0
  66. package/features/work-commands/steps.js +1120 -0
  67. package/features/work-commands/validation.test.js +88 -0
  68. package/features/work-commands/work-commands.feature +13 -0
  69. package/features/work-tracking/discovery-validation.test.js +228 -0
  70. package/features/work-tracking/index.js +1511 -0
  71. package/features/work-tracking/mode-required.feature +112 -0
  72. package/features/work-tracking/phase-tracking.test.js +482 -0
  73. package/features/work-tracking/prototype-tracking.test.js +485 -0
  74. package/features/work-tracking/tree-view.test.js +310 -0
  75. package/features/work-tracking/work-set-mode.feature +71 -0
  76. package/features/work-tracking/work-start-mode.feature +88 -0
  77. package/full-test.txt +0 -0
  78. package/install.sh +89 -0
  79. package/jettypod.js +1640 -0
  80. package/lib/bug-workflow.js +94 -0
  81. package/lib/bug-workflow.test.js +177 -0
  82. package/lib/claudemd.js +130 -0
  83. package/lib/claudemd.test.js +195 -0
  84. package/lib/comprehensive-standards-full.json +1778 -0
  85. package/lib/config.js +181 -0
  86. package/lib/config.test.js +511 -0
  87. package/lib/constants.js +107 -0
  88. package/lib/constants.test.js +164 -0
  89. package/lib/current-work.js +130 -0
  90. package/lib/current-work.test.js +146 -0
  91. package/lib/database-project-config.test.js +107 -0
  92. package/lib/database.js +256 -0
  93. package/lib/database.test.js +106 -0
  94. package/lib/decisions-generator.js +102 -0
  95. package/lib/decisions-generator.test.js +457 -0
  96. package/lib/decisions-helpers.js +119 -0
  97. package/lib/decisions-helpers.test.js +310 -0
  98. package/lib/discovery-checkpoint.js +83 -0
  99. package/lib/docs-generator.js +280 -0
  100. package/lib/external-checklist.js +177 -0
  101. package/lib/git.js +142 -0
  102. package/lib/git.test.js +145 -0
  103. package/lib/logo.js +3 -0
  104. package/lib/migrations/001-epic-to-parent.js +24 -0
  105. package/lib/migrations/002-default-work-item-modes.js +37 -0
  106. package/lib/migrations/002-default-work-item-modes.test.js +351 -0
  107. package/lib/migrations/003-epic-discovery-fields.js +52 -0
  108. package/lib/migrations/004-discovery-decisions-table.js +32 -0
  109. package/lib/migrations/005-migrate-decision-data.js +62 -0
  110. package/lib/migrations/006-feature-phase-field.js +61 -0
  111. package/lib/migrations/007-prototype-tracking.js +38 -0
  112. package/lib/migrations/008-scenario-file-field.js +24 -0
  113. package/lib/migrations/index.js +74 -0
  114. package/lib/production-helpers.js +69 -0
  115. package/lib/project-state.test.js +92 -0
  116. package/lib/test-helpers.js +184 -0
  117. package/lib/test-helpers.test.js +255 -0
  118. package/package.json +36 -0
  119. package/prototypes/test/index.html +1 -0
  120. package/setup-dist-repo.sh +68 -0
  121. package/test-safety-check.sh +80 -0
  122. package/work-item-tracking-plan.md +199 -0
package/docs/core-jettypod-methodology/deprecated/prompt-engineering-guide.md
@@ -0,0 +1,320 @@
# Prompt Engineering: A Comprehensive Research-Based Guide

## Executive Summary

Prompt engineering has emerged as a critical discipline for effectively leveraging Large Language Models (LLMs) in practical applications. This field encompasses techniques for developing and optimizing prompts to efficiently use language models for a wide variety of applications and research topics. Based on analysis of over 1,500 academic papers and research from leading institutions including OpenAI, Anthropic, MIT, Stanford, and Google, this guide synthesizes the most current and effective prompt engineering strategies for 2024-2025.

The key finding from recent research is clear: prompt engineering is far faster than other methods of model behavior control, such as fine-tuning, and can often yield leaps in performance in far less time. Moreover, it maintains model flexibility, requires minimal resources, and preserves the model's broad capabilities while allowing rapid iteration and experimentation.

## Part 1: Foundational Concepts and Principles

### What is Prompt Engineering?

Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models for a wide variety of applications. It goes beyond simple prompt design: it encompasses understanding model capabilities, limitations, and the cognitive patterns that influence model behavior.

### The Current State of Research

The field has experienced explosive growth. In 2022 there were merely 10 papers on RAG; in 2023, 93 were published. 2024 then saw an unprecedented surge, with 1,202 RAG-related papers published in a single year. This growth reflects the critical importance of prompt engineering as LLMs become more central to enterprise applications.

### Why Prompt Engineering Over Fine-Tuning?

Research from Anthropic provides compelling evidence for choosing prompt engineering:

**Advantages:**
- Resource efficiency: Fine-tuning requires high-end GPUs and large memory, while prompt engineering uses the base model
- Cost-effectiveness: For cloud-based AI services, fine-tuning incurs significant costs, while prompt engineering uses the base model, which is typically cheaper
- Time savings: Fine-tuning can take hours or even days; prompt engineering provides nearly instantaneous results
- Comprehension improvements: Prompt engineering is far more effective than fine-tuning at helping models understand and utilize external content such as retrieved documents

## Part 2: Core Prompting Techniques

### 1. Zero-Shot, One-Shot, and Few-Shot Prompting

These fundamental techniques form the basis of prompt engineering:

**Zero-Shot Prompting**
- No examples provided
- Relies entirely on the model's pre-trained knowledge
- Best for straightforward tasks with clear instructions

**One-Shot Prompting**
- A single example provided
- Helps clarify task format and expectations
- Useful when the task pattern is simple but needs demonstration

**Few-Shot Prompting**
- Multiple examples (typically 2-8) provided
- According to Touvron et al. (2023), few-shot capabilities first appeared when models were scaled to a sufficient size
- Multiple research papers point to major gains after 2 examples, followed by a plateau
- Critical finding: order matters. The right permutation of examples led to near state-of-the-art performance, while others fell to nearly chance levels

**Best Practices for Few-Shot Prompting** (see the sketch below):
- Place the most critical examples last
- Use diverse, representative examples
- Maintain consistent formatting across examples
- Adding more examples does not necessarily improve accuracy; in some cases it can actually reduce accuracy

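To make these practices concrete, here is a minimal prompt-assembly sketch in Python. The helper function and the sentiment data are illustrative only; they are not drawn from any of the cited studies.

```python
# Illustrative few-shot prompt builder (not from any cited paper): keeps
# formatting uniform across examples and places the most critical example
# last, per the ordering findings above.

def build_few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt with consistent example formatting."""
    parts = [task, ""]
    for inp, out in examples:  # order examples so the most critical comes last
        parts.append(f"Input: {inp}")
        parts.append(f"Output: {out}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    task="Classify the sentiment of each review as positive or negative.",
    examples=[
        ("The battery died within a week.", "negative"),
        ("Setup took five minutes and it just works.", "positive"),  # most critical last
    ],
    query="Great screen, terrible speakers, would not buy again.",
)
print(prompt)
```
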
### 2. Chain-of-Thought (CoT) Prompting

Chain-of-thought prompting enables models to decompose multi-step problems into intermediate steps. This technique has proven particularly powerful for complex reasoning tasks.

**Key Research Findings:**
- Chain-of-thought prompting is an emergent property of model scale: the benefits only materialize with a sufficient number of parameters (~100B)
- Below that scale, models wrote illogical chains of thought, which led to worse accuracy than standard prompting

**Implementation Strategies** (a minimal zero-shot CoT sketch follows this list):

1. **Explicit CoT Prompting**: Include step-by-step reasoning in examples
2. **Zero-Shot CoT**: Simply add "Let's think step by step" to prompts
3. **Auto-CoT**: Use an LLM with the "Let's think step by step" prompt to generate reasoning chains for demonstrations automatically

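A minimal zero-shot CoT sketch: the only change from standard prompting is the appended trigger phrase. The wrapper function and sample question are illustrative.

```python
# Minimal zero-shot CoT sketch: the sole change from standard prompting
# is the appended trigger phrase discussed above.

def zero_shot_cot(question: str) -> str:
    """Wrap a question with the zero-shot chain-of-thought trigger."""
    return f"{question}\n\nLet's think step by step."

print(zero_shot_cot(
    "A cafeteria had 23 apples. It used 20 and bought 6 more. How many now?"
))
# 23 - 20 + 6 = 9; a step-by-step answer should reach this explicitly.
```
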
### 3. Advanced Techniques from Leading Research

#### The "Think" Tool (Anthropic Research)

In Anthropic's airline-domain evaluation, the "think" tool achieved 0.570 on the pass^1 metric versus just 0.370 for the baseline, a 54% relative improvement. This approach involves (one possible tool definition is sketched below):
- Providing a dedicated reasoning space before generating final answers
- Using XML-like tags to structure the thinking process
- Combining with optimized prompting for complex domains

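For illustration, here is one plausible shape for such a tool in the Anthropic Messages API tool format. The description text and schema are an assumption rather than Anthropic's exact published definition; the tool deliberately does nothing, since its value is the dedicated reasoning space it gives the model.

```python
# One plausible "think" tool definition in the Anthropic Messages API tool
# format. The description and schema are an assumption, not Anthropic's
# exact published definition. The tool performs no action: its value is
# the dedicated reasoning space it gives the model mid-task.

think_tool = {
    "name": "think",
    "description": (
        "Use this tool to think about something. It will not obtain new "
        "information or change anything; it only records the thought."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {"type": "string", "description": "A thought to think about."}
        },
        "required": ["thought"],
    },
}
```
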
#### Structured Prompting with XML Tags

Anthropic research shows that Claude has been trained to recognize and respond to XML-style tags. These tags act like signposts, helping the model separate instructions, examples, and inputs more effectively.

Example structure:
```
<instructions>
[Clear task description]
</instructions>

<examples>
[Few-shot examples]
</examples>

<input>
[User query]
</input>
```

## Part 3: Retrieval-Augmented Generation (RAG)

### The RAG Revolution

RAG has become essential for production LLM applications. It extends the already powerful capabilities of LLMs to specific domains or an organization's internal knowledge base, all without the need to retrain the model.

### Core RAG Architecture

RAG combines an information retrieval component with a text generator model. RAG can be fine-tuned, and its internal knowledge can be modified efficiently without retraining the entire model.

**Key Components** (a toy end-to-end sketch follows this list):
1. **Retrieval System**: Searches relevant documents from the knowledge base
2. **Embedding Model**: Converts text to vector representations
3. **Vector Database**: Stores and indexes document embeddings
4. **Generation Model**: Produces the final output using retrieved context

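The sketch below wires these components together as a toy: word-overlap scoring stands in for the embedding model and vector database, and the final generation call is left as a comment. Every document and function name here is illustrative.

```python
# Toy RAG sketch: crude word-overlap scoring stands in for the embedding
# model and vector database; a real system would use dense embeddings and
# send the assembled prompt to the generation model.

def score(query: str, doc: str) -> float:
    """Crude relevance: fraction of query words also present in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring documents for the query."""
    return sorted(docs, key=lambda doc: score(query, doc), reverse=True)[:k]

docs = [
    "Refunds are processed within 5 business days of approval.",
    "Our office is closed on public holidays.",
    "Refunds require the original order number.",
]
query = "How long do refunds take?"
context = "\n".join(retrieve(query, docs))
prompt = (
    "Answer using only the context below.\n\n"
    f"<context>\n{context}\n</context>\n\n"
    f"Question: {query}"
)
print(prompt)  # the generation model would receive this prompt
```
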
### RAG Best Practices

Based on 2024 research findings:

1. **Start Wide, Then Narrow**: Search strategy should mirror expert human research: explore the landscape before drilling into specifics

2. **Context Window Optimization**: With models now supporting 1M+ tokens, careful selection of relevant context becomes critical

3. **Hybrid Approaches**: Combine RAG with fine-tuned models for optimal performance

4. **Multi-Modal RAG**: Extend beyond text to include images and structured data

## Part 4: Practical Implementation Guidelines

### 1. Clear and Specific Instructions

Claude performs best when instructions are specific, detailed, and unambiguous. Vague prompts leave room for misinterpretation, while clear directives improve output quality.

**Framework:**
- State the exact task and goal
- Define all technical terms
- Specify output format requirements
- Include edge case handling

### 2. Context Management

Claude supports a large context window of up to 100,000 tokens, or about 70,000 words. Best practices include (a truncation sketch follows this list):
- Front-load critical information
- Use hierarchical organization for long contexts
- Implement smart truncation strategies
- Consider context caching for repeated use

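A sketch of one smart-truncation strategy, assuming critical information has been front-loaded: keep the head and tail of the context and drop the middle. The character budget is a crude stand-in for a real tokenizer count, and the sample text is illustrative.

```python
# Sketch of front-loading plus middle truncation: keep the head (critical,
# front-loaded info) and the tail (most recent info), drop the middle.
# The character budget is a crude stand-in for a provider token count.

def truncate_middle(text: str, budget: int, marker: str = "\n[... truncated ...]\n") -> str:
    """Fit text into `budget` characters by cutting from the middle."""
    if len(text) <= budget:
        return text
    keep = budget - len(marker)
    head = int(keep * 0.7)  # bias the budget toward the front-loaded head
    tail = keep - head
    return text[:head] + marker + text[-tail:]

context = (
    "CRITICAL: deploy freeze starts Friday. "
    + ("filler sentence. " * 500)
    + "Latest status: tests green."
)
print(truncate_middle(context, budget=200))
```
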
### 3. Iterative Refinement Process

1. **Establish Success Criteria**: Define clear, measurable objectives
2. **Create Baseline**: Start with simple prompts
3. **Systematic Testing**: Use evaluation frameworks
4. **Incremental Improvement**: Apply techniques progressively
5. **Documentation**: Maintain prompt versioning

### 4. Template-Based Approaches

Recent MIT research suggests that the most powerful approach is not crafting the perfect one-off prompt but having a reliable arsenal of templates ready to deploy.

**Recommended Template Categories** (a small registry sketch follows this list):
- Task-specific templates
- Domain-specific templates
- Output format templates
- Error handling templates
- Chain-of-thought templates

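A sketch of a small template registry using Python's standard-library string.Template; the category keys mirror the list above, and the template text is illustrative.

```python
# Sketch of a reusable template registry; the keys mirror the categories
# above and the template text is illustrative.
from string import Template

TEMPLATES = {
    "summarize": Template(
        "Summarize the following $doc_type in $length sentences:\n\n$text"
    ),
    "extract_json": Template(
        "Extract $fields from the text below. Respond with JSON only.\n\n$text"
    ),
    "cot": Template(
        "$question\n\nLet's think step by step."
    ),
}

prompt = TEMPLATES["summarize"].substitute(
    doc_type="incident report", length="three", text="(report text here)"
)
print(prompt)
```
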
## Part 5: Evaluation and Optimization

### Measuring Prompt Effectiveness

Key metrics to track (a minimal harness sketch follows this list):
1. **Accuracy**: Correctness of outputs
2. **Consistency**: Reliability across similar inputs
3. **Latency**: Response time
4. **Cost**: Token usage efficiency
5. **User Satisfaction**: Subjective quality measures

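A minimal harness sketch for the first two metrics. The call_model stub stands in for a real API call; swap it for your provider's client. All names and test data are illustrative.

```python
# Minimal eval-harness sketch: `call_model` is a stub standing in for a
# real LLM call; accuracy and consistency follow the definitions above.
from collections import Counter

def call_model(prompt: str) -> str:
    return "4"  # stub: replace with a real API call

def evaluate(cases: list[tuple[str, str]], runs: int = 3) -> dict:
    correct, consistent = 0, 0
    for prompt, expected in cases:
        outputs = [call_model(prompt) for _ in range(runs)]
        majority, count = Counter(outputs).most_common(1)[0]
        correct += majority == expected   # accuracy of the majority answer
        consistent += count == runs       # all runs agreed exactly
    n = len(cases)
    return {"accuracy": correct / n, "consistency": consistent / n}

print(evaluate([("What is 2 + 2? Answer with a single digit.", "4")]))
```
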
### Optimization Strategies

#### 1. Prompt Compression
Work such as "LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models" shows that prompt length can be reduced substantially while maintaining effectiveness.

#### 2. Automated Optimization
Recent advances include:
- DSPy for systematic prompt optimization
- Meta-prompting techniques for self-improvement
- Claude 4 models, which can be excellent prompt engineers: given a prompt and a failure mode, they can diagnose why the agent is failing and suggest improvements

#### 3. A/B Testing Frameworks
- Systematic comparison of prompt variations
- Statistical significance testing (see the z-test sketch below)
- User preference studies

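A sketch of the statistical piece: a two-proportion z-test over success counts from two prompt variants. The counts are made up for illustration.

```python
# A/B test sketch: two-proportion z-test on success counts from two prompt
# variants; the counts below are made up for illustration.
from math import erf, sqrt

def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """Two-sided p-value for H0: both prompts succeed at the same rate."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))); two-sided tail probability:
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Variant A: 78/100 correct; Variant B: 64/100 correct (illustrative)
p = two_proportion_z(78, 100, 64, 100)
print(f"p-value = {p:.4f}")  # ~0.03 here, so the difference is significant at 0.05
```
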
## Part 6: Domain-Specific Applications

### Software Development

Claude Code gives engineers and researchers a more native way to integrate Claude into their coding workflows. Key strategies:
- Provide full project context when possible
- Use structured commands for repetitive tasks
- Leverage git history for context
- Implement error handling patterns

### Research and Analysis

For complex research tasks:
- Multi-agent systems can operate reliably at scale given careful engineering, comprehensive testing, and detail-oriented prompt and tool design
- Break down complex queries into subtasks
- Use parallel processing for efficiency
- Implement verification steps

### Creative Applications

- Balance structure with flexibility
- Use temperature and other parameters strategically
- Implement iterative refinement cycles
- Combine multiple generation approaches

## Part 7: Common Pitfalls and Solutions

### 1. Hallucination Mitigation

**Problem**: Models generate plausible but incorrect information

**Solutions** (a grounded-answer prompt sketch follows this list):
- Implement RAG for factual grounding
- Use chain-of-thought for transparency
- Add verification steps
- Cite sources explicitly

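A sketch of a grounded-answer prompt that combines several of these solutions; the exact wording and sample data are illustrative, not a tested canonical prompt.

```python
# Grounded-answer prompt sketch: constrains the model to the supplied
# sources and demands inline citations; the wording is illustrative.

def grounded_prompt(question: str, sources: list[str]) -> str:
    numbered = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))
    return (
        "Answer the question using ONLY the sources below. Cite each claim "
        "as [n]. If the sources do not contain the answer, reply exactly: "
        '"I don\'t know based on the provided sources."\n\n'
        f"Sources:\n{numbered}\n\nQuestion: {question}"
    )

print(grounded_prompt(
    "When was the refund policy last updated?",
    ["The refund policy was last updated in March 2024."],
))
```
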
### 2. Inconsistent Outputs

**Problem**: Same prompt produces varying results

**Solutions** (an SDK-call sketch follows this list):
- Use temperature=0 for deterministic outputs
- Implement structured output formats
- Add explicit constraints
- Use few-shot examples for consistency

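A sketch of these settings using the Anthropic Python SDK. The model ID is a placeholder, and an ANTHROPIC_API_KEY environment variable is assumed; note that temperature=0 makes outputs near-deterministic, not strictly guaranteed identical.

```python
# Sketch of deterministic-output settings with the Anthropic Python SDK;
# the model ID is a placeholder and ANTHROPIC_API_KEY must be set.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
review = "The screen is gorgeous."
message = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder: use a current model ID
    max_tokens=256,
    temperature=0,  # greedy decoding for (near-)deterministic outputs
    messages=[{
        "role": "user",
        "content": (
            'Classify the sentiment of this review. Respond with JSON only, '
            f'with a single key "sentiment".\n\nReview: {review}'
        ),
    }],
)
print(message.content[0].text)
```
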
### 3. Context Limitations

**Problem**: Important information lost in long contexts

**Solutions**:
- Hierarchical summarization
- Smart chunking strategies
- Context window management
- Key information repetition

## Part 8: Future Directions and Emerging Trends

### 1. Multi-Modal Prompting
Visual chain-of-thought prompting extends CoT to knowledge-based visual reasoning, where visual content and natural language interact.

### 2. Agent-Based Systems
- Multi-agent collaboration patterns
- Tool use and function calling
- Autonomous reasoning systems

### 3. Extended Thinking Models
Anthropic's Claude 4 introduces extended thinking capabilities, allowing deeper reasoning for complex problems.

### 4. Prompt Caching and Optimization
- Reducing latency through intelligent caching (see the sketch below)
- Dynamic prompt adaptation
- Cost optimization strategies

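A sketch of prompt caching with the Anthropic Python SDK, marking a large, stable system block as cacheable so repeated calls can reuse it. The model ID and the style_guide.md file are placeholders, and caching availability, syntax, and limits should be checked against current documentation.

```python
# Prompt caching sketch with the Anthropic SDK: mark a large, rarely
# changing system block as cacheable so repeated calls reuse the prefix.
# Model ID and style_guide.md are placeholders; verify cache_control
# support and limits in the current Anthropic docs.
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model ID
    max_tokens=512,
    system=[{
        "type": "text",
        "text": open("style_guide.md").read(),   # large, stable context (placeholder file)
        "cache_control": {"type": "ephemeral"},  # request caching of this prefix
    }],
    messages=[{
        "role": "user",
        "content": "Review this paragraph against the style guide: ...",
    }],
)
print(response.content[0].text)
```
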
## Best Practices Summary

### Essential Principles

1. **Start Simple**: Begin with basic prompts and add complexity gradually
2. **Be Specific**: Clarity trumps brevity
3. **Test Systematically**: Use consistent evaluation methods
4. **Document Everything**: Maintain prompt libraries and version control
5. **Iterate Rapidly**: Prompt engineering provides nearly instantaneous results, allowing for quick problem-solving

### The Golden Rules

1. **Understand Your Model**: Different models have different strengths
2. **Know Your Use Case**: Tailor techniques to specific applications
3. **Measure What Matters**: Define success criteria upfront
4. **Leverage Templates**: Build reusable prompt components
5. **Stay Current**: The field evolves rapidly

## Conclusion

Prompt engineering represents a paradigm shift in how we interact with AI systems. The research clearly demonstrates that effective prompt engineering can achieve performance improvements comparable to or exceeding fine-tuning, while maintaining flexibility and reducing costs.

As we move into 2025, the convergence of larger context windows, multi-modal capabilities, and sophisticated reasoning systems will create new opportunities for prompt engineers. The key to success lies not in mastering every technique, but in understanding the principles, maintaining systematic approaches, and continuously adapting to new capabilities.

The most important takeaway: prompt engineering is both an art and a science. While research provides frameworks and techniques, practical application requires creativity, experimentation, and deep understanding of both the technology and the problem domain.

---

## References and Further Reading

### Academic Papers
- "The Prompt Report: A Systematic Survey of Prompting Techniques" (Schulhoff et al., 2024)
- "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (Wei et al., 2022)
- "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (Lewis et al., 2020)

### Industry Resources
- Anthropic's Prompt Engineering Documentation
- OpenAI's Prompt Engineering Guide
- Google's Vertex AI Prompt Guidelines

### Tools and Frameworks
- LangChain for RAG implementation
- DSPy for prompt optimization
- Claude Code for development workflows

### Communities and Learning
- Prompt Engineering Guide (promptingguide.ai)
- Learn Prompting interactive courses
- Research paper repositories on arXiv

---

*This guide represents the state of prompt engineering as of 2025, based on peer-reviewed research and industry best practices. The field continues to evolve rapidly, and practitioners should stay current with the latest developments.*