PyPI - EvoScientist - Versions diffs - 0.0.1.dev2__py3-none-any.whl - Mend

EvoScientist 0.0.1.dev2__py3-none-any.whl

Files changed (107)

EvoScientist/skills/ml-paper-writing/references/writing-guide.md ADDED

@@ -0,0 +1,476 @@
+# ML Paper Writing Philosophy & Best Practices
+This reference compiles writing advice from prominent ML researchers, including Neel Nanda, Andrej Karpathy, Sebastian Farquhar, Zachary Lipton, Jacob Steinhardt, and Ethan Perez, together with George Gopen and Judith Swan's "The Science of Scientific Writing."
+---
+## Contents
+- [The Narrative Principle](#the-narrative-principle)
+- [Time Allocation](#time-allocation)
+- [Abstract Writing Formula](#abstract-writing-formula)
+- [Introduction Structure](#introduction-structure)
+- [Sentence-Level Clarity](#sentence-level-clarity)
+- [Micro-Level Writing Tips](#micro-level-writing-tips)
+- [Word Choice and Precision](#word-choice-and-precision)
+- [Mathematical Writing](#mathematical-writing)
+- [Figure Design](#figure-design)
+- [Common Mistakes to Avoid](#common-mistakes-to-avoid)
+- [Pre-Submission Checklist](#pre-submission-checklist)
+---
+## The Narrative Principle
+### From Neel Nanda
+"A paper is a short, rigorous, evidence-based technical story with a takeaway readers care about."
+The narrative rests on three pillars that must be crystal clear by the end of your introduction:
+**The "What"**: One to three specific novel claims fitting within a cohesive theme. Vague contributions like "we study X" fail immediately—reviewers need precise, falsifiable claims.
+**The "Why"**: Rigorous empirical evidence that convincingly supports those claims, including strong baselines honestly tuned and experiments that distinguish between competing hypotheses rather than merely showing "decent results."
+**The "So What"**: Why readers should care, connecting your contribution to problems the community recognizes as important.
+### From Andrej Karpathy
+"A paper is not a random collection of experiments you report on. The paper sells a single thing that was not obvious or present before. The entire paper is organized around this core contribution with surgical precision."
+This applies whether you're presenting a new architecture, a theoretical result, or improved understanding of existing methods—NeurIPS explicitly notes that "originality does not necessarily require an entirely new method."
+**Practical Implication**: If you cannot state your contribution in one sentence, you don't yet have a paper. Everything else—experiments, related work, discussion—exists only to support that core claim.
+---
+## Time Allocation
+### From Neel Nanda
+Spend approximately **the same amount of time** on each of:
+1. The abstract
+2. The introduction
+3. The figures
+4. Everything else combined
+This isn't hyperbole—most reviewers form preliminary judgments before reaching your methods section. Readers encounter your paper in a predictable pattern: **title → abstract → introduction → figures → maybe the rest.**
+### Reviewer Reading Patterns
+Rough rules of thumb for reviewer behavior:
+- Abstract is read 100% of the time
+- Introduction is skimmed by 90%+ of reviewers
+- Figures are examined before methods by most reviewers
+- Full methods are read only if interest is established
+**Implication**: Front-load your paper's value. Don't bury the contribution.
+---
+## Abstract Writing Formula
+### Sebastian Farquhar's 5-Sentence Formula
+1. **What you achieved**: "We introduce...", "We prove...", "We demonstrate..."
+2. **Why this is hard and important**
+3. **How you do it** (with specialist keywords for discoverability)
+4. **What evidence you have**
+5. **Your most remarkable number/result**
+### Example (Good Abstract)
+```
+We prove that gradient descent on overparameterized neural networks
+converges to global minima at a linear rate. [What]
+This resolves a fundamental question about why deep learning works
+despite non-convex optimization landscapes. [Why hard/important]
+Our proof relies on showing that the Neural Tangent Kernel remains
+approximately constant during training, reducing the problem to
+kernel regression. [How with keywords]
+We validate our theory on CIFAR-10 and ImageNet, showing that
+predicted convergence rates match experiments within 5%. [Evidence]
+This is the first polynomial-time convergence guarantee for
+networks with practical depth and width. [Remarkable result]
+```
+### What to Avoid
+From Zachary Lipton: "If the first sentence can be pre-pended to any ML paper, delete it."
+**Delete these openings**:
+- "Large language models have achieved remarkable success..."
+- "Deep learning has revolutionized..."
+- "In recent years, neural networks have..."
+**Start with your specific contribution instead.**
+---
+## Introduction Structure
+### Requirements
+- **1-1.5 pages maximum** (in two-column format)
+- **Methods should start by page 2-3**
+- Must include a **2-4 bullet contribution list** (max 1-2 lines each)
+### Structure Template
+```markdown
+1. Opening Hook (2-3 sentences)
+   - State the problem your paper addresses
+   - Why it matters RIGHT NOW
+2. Background/Challenge (1 paragraph)
+   - What makes this problem hard?
+   - What have others tried? Why is it insufficient?
+3. Your Approach (1 paragraph)
+   - What do you do differently?
+   - Key insight that enables your contribution
+4. Contribution Bullets (2-4 items)
+   - Be specific and falsifiable
+   - Each bullet: 1-2 lines maximum
+5. Results Preview (2-3 sentences)
+   - Most impressive numbers
+   - Scope of evaluation
+6. Paper Organization (optional, 1-2 sentences)
+   - "Section 2 presents... Section 3 describes..."
+```
+### Contribution Bullets: Good vs Bad
+**Good:**
+- We prove that X converges in O(n log n) time under assumption Y
+- We introduce Z, a 3-layer architecture that reduces memory by 40%
+- We demonstrate that A outperforms B by 15% on benchmark C
+**Bad:**
+- We study the problem of X (not a contribution)
+- We provide extensive experiments (too vague)
+- We make several contributions to the field (says nothing)
+---
+## Sentence-Level Clarity
+### From Gopen & Swan: "The Science of Scientific Writing"
+The seminal 1990 paper by George Gopen and Judith Swan establishes that **readers have structural expectations** about where information appears in prose. Violating these expectations forces readers to spend energy on structure rather than content.
+> "If the reader is to grasp what the writer means, the writer must understand what the reader needs."
+#### The 7 Principles of Reader Expectations
+**Principle 1: Subject-Verb Proximity**
+Keep the grammatical subject and its verb close together. Anything placed between them reads as an interruption of lesser importance.
+**Weak**: "The model, which was trained on 100M tokens and fine-tuned on domain-specific data using LoRA with rank 16, achieves state-of-the-art results"
+**Strong**: "The model achieves state-of-the-art results after training on 100M tokens and fine-tuning with LoRA (rank 16)"
+**Principle 2: Stress Position (Save the Best for Last)**
+Readers naturally emphasize the **last words of a sentence**. Place your most important information there.
+**Weak**: "Accuracy improves by 15% when using attention"
+**Strong**: "When using attention, accuracy improves by **15%**"
+**Principle 3: Topic Position (First Things First)**
+The beginning of a sentence establishes perspective. Put the "whose story" element first—readers expect the sentence to be about whoever shows up first.
+**Weak**: "A novel attention mechanism that computes alignment scores is introduced"
+**Strong**: "To address the alignment problem, we introduce a novel attention mechanism"
+**Principle 4: Old Information Before New**
+Put familiar information (old) in the topic position for backward linkage; put new information in the stress position for emphasis.
+**Weak**: "Sparse attention was introduced by Child et al. The quadratic complexity of standard attention motivates this work."
+**Strong**: "Standard attention has quadratic complexity. To address this, Child et al. introduced sparse attention."
+**Principle 5: One Unit, One Function**
+Each unit of discourse (sentence, paragraph, section) should serve a single function. If you have two points, use two units.
+**Principle 6: Articulate Action in the Verb**
+Express the action of each sentence in its verb, not in nominalized nouns.
+**Weak**: "We performed an analysis of the results" (nominalization)
+**Strong**: "We analyzed the results" (action in verb)
+**Principle 7: Context Before New Information**
+Provide context before asking the reader to consider anything new. This applies at all levels—sentence, paragraph, section.
+**Weak**: "Equation 3 shows that convergence is guaranteed when the learning rate satisfies..."
+**Strong**: "For convergence to be guaranteed, the learning rate must satisfy the condition in Equation 3..."
+#### Summary Table
+| Principle | Rule | Mnemonic |
+|-----------|------|----------|
+| Subject-Verb Proximity | Keep subject and verb close | "Don't interrupt yourself" |
+| Stress Position | Emphasis at sentence end | "Save the best for last" |
+| Topic Position | Context at sentence start | "First things first" |
+| Old Before New | Familiar → unfamiliar | "Build on known ground" |
+| One Unit, One Function | Each sentence, paragraph, or section = one function | "One idea per container" |
+| Action in Verb | Use verbs, not nominalizations | "Verbs do, nouns sit" |
+| Context Before New | Explain before presenting | "Set the stage first" |
+---
+## Micro-Level Writing Tips
+### From Ethan Perez (Anthropic)
+These practical micro-level tips improve clarity at the sentence and word level.
+#### Pronoun Management
+**Minimize pronouns** ("this," "it," "these," "that"). When a pronoun is necessary, attach it to a noun so the referent is explicit:
+**Weak**: "This shows that the model converges."
+**Strong**: "This result shows that the model converges."
+**Weak**: "It improves performance."
+**Strong**: "This modification improves performance."
+#### Verb Placement
+**Position verbs early** in sentences for better parsing:
+**Weak**: "The gradient, after being computed and normalized, updates the weights."
+**Strong**: "The gradient updates the weights after being computed and normalized."
+#### Apostrophe Unfolding
+Transform possessive constructions for clarity:
+**Original**: "X's Y" → **Unfolded**: "The Y of X"
+**Before**: "The model's accuracy on the test set"
+**After**: "The accuracy of the model on the test set"
+This isn't always better, but when sentences feel awkward, try unfolding.
+#### Words to Eliminate
+Delete these filler words in almost all cases:
+- "actually"
+- "a bit"
+- "fortunately" / "unfortunately"
+- "very" / "really"
+- "quite"
+- "basically"
+- "essentially"
+- Excessive connectives ("however," "moreover," "furthermore" when not needed)
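The list above can be applied mechanically as a first pass over a draft. The following is a minimal sketch, not a tool from any of the cited authors: the names `FILLER_WORDS` and `find_fillers` are hypothetical, and the word list simply mirrors the bullets above (connectives are left out because they need human judgment).

```python
import re

# Filler words taken from the list above; extend as needed.
FILLER_WORDS = [
    "actually", "a bit", "fortunately", "unfortunately",
    "very", "really", "quite", "basically", "essentially",
]

# One case-insensitive pattern with word boundaries. Longer entries come
# first so multi-word phrases such as "a bit" are matched as a unit.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(w) for w in sorted(FILLER_WORDS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_fillers(text):
    """Return (line_number, word) pairs for every filler word found in text."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((line_number, match.group(1).lower()))
    return hits


if __name__ == "__main__":
    draft = "Our method is actually quite simple.\nIt is very fast."
    for line_number, word in find_fillers(draft):
        print(f"line {line_number}: consider deleting '{word}'")
```

A hit is only a prompt to reread the sentence; some occurrences (for example "very" inside a quoted passage) are legitimate.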
+#### Sentence Construction Rules
+1. **One idea per sentence** - If an idea does not fit in one sentence, split it into two
+2. **No repeated sounds** - Avoid similar-sounding words in the same sentence
+3. **Every sentence adds information** - Delete sentences that merely restate
+4. **Active voice always** - Specify the actor ("We find..." not "It is found...")
+5. **Expand contractions** - "don't" → "do not" for formality
+#### Paragraph Architecture
+- **First sentence**: State the point clearly
+- **Middle sentences**: Support with evidence
+- **Last sentence**: Reinforce or transition
+Don't bury key information in the middle of paragraphs.
+---
+## Word Choice and Precision
+### From Zachary Lipton
+**Eliminate hedging** unless genuine uncertainty exists:
+- Delete "may" and "can" unless necessary
+**Avoid vacuous intensifiers**:
+- Delete: very, extremely, highly, significantly (unless reporting statistical significance)
+- "provides *very* tight approximation" drips with insecurity
+- "provides tight approximation" is confident
+- These words signal insecurity, not strength
+### From Jacob Steinhardt
+**Precision over brevity**: Replace vague terms with specific ones.
+| Vague | Specific |
+|-------|----------|
+| performance | accuracy, latency, throughput |
+| improves | increases accuracy by X%, reduces latency by Y |
+| large | 1B parameters, 100M tokens |
+| fast | 3x faster, 50ms latency |
+| good results | 92% accuracy, 0.85 F1 |
+**Consistent terminology**: Referring to the same concept with different terms creates confusion.
+**Choose one and stick with it**:
+- "model" vs "network" vs "architecture"
+- "training" vs "learning" vs "optimization"
+- "sample" vs "example" vs "instance"
+### Vocabulary Signaling
+**Avoid words signaling incremental work**:
+- Never: "combine," "modify," "expand," "extend"
+- Instead: "develop," "propose," "introduce"
+**Why**: "We combine X and Y" sounds like you stapled two existing ideas together. "We develop a method that leverages X for Y" sounds like a genuine contribution.
+---
+## Mathematical Writing
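+### From Ethan Perez
+**Unfold apostrophes** when a possessive is attached to a symbol or a named quantity (the same tip as Apostrophe Unfolding under Micro-Level Writing Tips):
+- Weak: "X's Y"
+- Strong: "The Y of X"
+Example: "the model's accuracy" → "the accuracy of the model"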
+### General Principles
+1. **State all assumptions formally** before theorems
+2. **Provide intuitive explanations** alongside proofs
+3. **Use consistent notation** throughout the paper
+4. **Define symbols at first use**
+### Notation Conventions
+```latex
+% Scalars: lowercase italic
+$x$, $y$, $\alpha$, $\beta$
+% Vectors: lowercase bold
+$\mathbf{x}$, $\mathbf{v}$
+% Matrices: uppercase bold
+$\mathbf{W}$, $\mathbf{X}$
+% Sets: uppercase calligraphic
+$\mathcal{X}$, $\mathcal{D}$
+% Functions: roman for named functions
+$\mathrm{softmax}$, $\mathrm{ReLU}$
+```
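To show the conventions above working together, here is a short worked example. The setup is an assumption made purely for illustration (a single linear layer with a softmax output, trained by gradient descent); the symbols $\mathbf{b}$, $\hat{\mathbf{y}}$, and $\ell$ are not part of the original guide. It assumes the `amsmath` and `amssymb` packages are loaded.

```latex
% Assumed setup (illustration only):
%   \mathbf{x} \in \mathbb{R}^{d}           input vector   (lowercase bold)
%   \mathbf{W} \in \mathbb{R}^{k \times d}  weight matrix  (uppercase bold)
%   \mathbf{b} \in \mathbb{R}^{k}           bias vector    (lowercase bold)
%   \alpha                                  learning rate  (lowercase italic scalar)
%   \mathcal{D}                             training set   (uppercase calligraphic)
\begin{align}
  \hat{\mathbf{y}} &= \mathrm{softmax}(\mathbf{W}\mathbf{x} + \mathbf{b}), \\
  \mathbf{W} &\leftarrow \mathbf{W} - \alpha \, \nabla_{\mathbf{W}}
      \frac{1}{|\mathcal{D}|} \sum_{(\mathbf{x}, y) \in \mathcal{D}}
      \ell(\hat{\mathbf{y}}, y).
\end{align}
```

Because each typeface carries a type, a reader can tell at a glance that $\mathbf{W}\mathbf{x}$ is a matrix-vector product and that $\alpha$ is a scalar, without looking up the definitions.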
+---
+## Figure Design
+### From Neel Nanda
+Figures should tell a coherent story even if the reader skips the text. Many readers DO skip the text initially.
+### Design Principles
+1. **Figure 1 is crucial**: Often the first thing readers examine after the abstract
+2. **Self-contained captions**: Reader should understand figure without main text
+3. **No title inside figure**: The caption serves this function (ICML/NeurIPS rule)
+4. **Vector graphics**: PDF/EPS for plots, PNG (600 DPI) only for photographs
+### Accessibility Requirements
+Roughly 8% of men (and about 0.5% of women) have a color vision deficiency. Your figures must remain readable for them.
+**Solutions**:
+- Use colorblind-safe palettes: Okabe-Ito or Paul Tol
+- Avoid red-green combinations
+- Verify figures work in grayscale
+- Use different line styles (solid, dashed, dotted) in addition to colors
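The grayscale check in the list above can be approximated numerically before rendering anything. The sketch below uses only the standard library: it converts each color of the Okabe-Ito palette to relative luminance (sRGB linearization with Rec. 709 weights) and flags neighbouring colors that would look alike in grayscale. The function names and the `min_gap=0.05` threshold are assumptions chosen for illustration, not an established standard.

```python
# Okabe-Ito palette (hex values as commonly published).
OKABE_ITO = {
    "orange": "#E69F00",
    "sky blue": "#56B4E9",
    "bluish green": "#009E73",
    "yellow": "#F0E442",
    "blue": "#0072B2",
    "vermillion": "#D55E00",
    "reddish purple": "#CC79A7",
    "black": "#000000",
}


def relative_luminance(hex_color):
    """Relative luminance (0 = black, 1 = white) of an sRGB hex color."""
    digits = hex_color.lstrip("#")
    channels = [int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB transfer curve, then apply Rec. 709 channel weights.
    linear = [
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def too_similar_in_grayscale(palette, min_gap=0.05):
    """Return pairs of colors, neighbouring in luminance order, that differ
    by less than min_gap (a hypothetical threshold)."""
    ranked = sorted((relative_luminance(h), name) for name, h in palette.items())
    return [
        (low[1], high[1])
        for low, high in zip(ranked, ranked[1:])
        if high[0] - low[0] < min_gap
    ]


if __name__ == "__main__":
    for pair in too_similar_in_grayscale(OKABE_ITO):
        print("hard to tell apart in grayscale:", pair)
```

Any pair this reports is a candidate for a different line style or marker. The same hex values can be passed to Matplotlib through the `color=` argument, each paired with its own `linestyle=`.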
+### Tools
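+```python
+# SciencePlots: publication-ready Matplotlib styles (pip install SciencePlots)
+import matplotlib.pyplot as plt
+import scienceplots  # registers the styles; required since SciencePlots 2.0
+# The 'science' style renders text with LaTeX; add 'no-latex' if LaTeX is not installed
+plt.style.use(['science', 'ieee'])
+# Or for Nature-style
+plt.style.use(['science', 'nature'])
+```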
+---
+## Common Mistakes to Avoid
+### Structure Mistakes
+| Mistake | Solution |
+|---------|----------|
+| Introduction too long (>1.5 pages) | Move background to Related Work |
+| Methods buried (after page 3) | Front-load contribution, cut intro |
+| Missing contribution bullets | Add 2-4 specific, falsifiable claims |
+| Experiments without explicit claims | State what each experiment tests |
+### Writing Mistakes
+| Mistake | Solution |
+|---------|----------|
+| Generic abstract opening | Start with your specific contribution |
+| Inconsistent terminology | Choose one term per concept |
+| Passive voice overuse | Use active voice: "We show" not "It is shown" |
+| Hedging everywhere | Be confident unless genuinely uncertain |
+### Figure Mistakes
+| Mistake | Solution |
+|---------|----------|
+| Raster graphics for plots | Use vector (PDF/EPS) |
+| Red-green color scheme | Use colorblind-safe palette |
+| Title inside figure | Put title in caption |
+| Captions require main text | Make captions self-contained |
+### Citation Mistakes
+| Mistake | Solution |
+|---------|----------|
+| Paper-by-paper Related Work | Organize methodologically |
+| Missing relevant citations | Reviewers may have authored related work; cite generously |
+| AI-generated citations | Verify every reference against a bibliographic API (e.g., Crossref, Semantic Scholar) |
+| Inconsistent citation format | Use BibLaTeX with consistent keys |
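The table above recommends verifying citations via APIs. One way to do so is sketched below, assuming the public Crossref REST API (`https://api.crossref.org/works` with the `query.bibliographic` parameter). The helper names and the 0.9 similarity threshold are hypothetical, and `verify_citation` needs network access.

```python
import json
import urllib.parse
import urllib.request
from difflib import SequenceMatcher

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_query_url(title, rows=1):
    """Build a Crossref search URL for a bibliographic title query."""
    params = urllib.parse.urlencode({"query.bibliographic": title, "rows": rows})
    return f"{CROSSREF_WORKS}?{params}"


def _normalize(title):
    """Lowercase a title and collapse runs of whitespace."""
    return " ".join(title.lower().split())


def titles_match(claimed, found, threshold=0.9):
    """True if two titles are nearly identical after normalization.
    The threshold is an assumption chosen for illustration."""
    ratio = SequenceMatcher(None, _normalize(claimed), _normalize(found)).ratio()
    return ratio >= threshold


def verify_citation(title, timeout=10):
    """Query Crossref and report whether the top hit matches the claimed title."""
    with urllib.request.urlopen(crossref_query_url(title), timeout=timeout) as response:
        items = json.load(response)["message"]["items"]
    if not items:
        return False
    return any(titles_match(title, found) for found in items[0].get("title", []))
```

A `False` result does not prove that a reference is fabricated, since not every venue or preprint is indexed; it only marks the entry for a manual check.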
+---
+## Pre-Submission Checklist
+Before submitting, verify:
+**Narrative**:
+- [ ] Can state contribution in one sentence
+- [ ] Three pillars (What/Why/So What) clear in intro
+- [ ] Every experiment supports a specific claim
+**Structure**:
+- [ ] Abstract follows 5-sentence formula
+- [ ] Introduction ≤1.5 pages
+- [ ] Methods start by page 2-3
+- [ ] 2-4 contribution bullets included
+- [ ] Limitations section present
+**Writing**:
+- [ ] Consistent terminology throughout
+- [ ] No generic opening sentences
+- [ ] Hedging removed unless necessary
+- [ ] All figures have self-contained captions
+**Technical**:
+- [ ] All citations verified via API
+- [ ] Error bars included, with the method used to compute them stated
+- [ ] Compute resources documented
+- [ ] Code/data availability stated
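For the error-bar item in the checklist above, one common choice is the mean plus or minus the standard error across random seeds. The sketch below uses only the standard library; the function name is hypothetical and the accuracies are made-up numbers for illustration, not results from any experiment.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for a list of per-seed scores."""
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.mean(values)
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Made-up accuracies from five hypothetical seeds, for illustration only.
accuracies = [0.912, 0.905, 0.918, 0.909, 0.914]
mean, sem = mean_and_standard_error(accuracies)
print(f"accuracy = {mean:.3f} ± {sem:.3f} (mean ± s.e.m., n = {len(accuracies)} seeds)")
```

Whatever statistic is used (standard deviation, standard error, or a confidence interval), the caption should name it and give the number of runs, since the three can differ by a large factor.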