flonat-research 0.1.0
- package/.claude/agents/domain-reviewer.md +336 -0
- package/.claude/agents/fixer.md +226 -0
- package/.claude/agents/paper-critic.md +370 -0
- package/.claude/agents/peer-reviewer.md +289 -0
- package/.claude/agents/proposal-reviewer.md +215 -0
- package/.claude/agents/referee2-reviewer.md +367 -0
- package/.claude/agents/references/journal-referee-profiles.md +354 -0
- package/.claude/agents/references/paper-critic/council-personas.md +77 -0
- package/.claude/agents/references/paper-critic/council-prompts.md +198 -0
- package/.claude/agents/references/peer-reviewer/report-template.md +199 -0
- package/.claude/agents/references/peer-reviewer/sa-prompts.md +260 -0
- package/.claude/agents/references/peer-reviewer/security-scan.md +188 -0
- package/.claude/agents/references/proposal-reviewer/report-template.md +144 -0
- package/.claude/agents/references/proposal-reviewer/sa-prompts.md +149 -0
- package/.claude/agents/references/referee-config.md +114 -0
- package/.claude/agents/references/referee2-reviewer/audit-checklists.md +287 -0
- package/.claude/agents/references/referee2-reviewer/report-template.md +334 -0
- package/.claude/rules/design-before-results.md +52 -0
- package/.claude/rules/ignore-agents-md.md +17 -0
- package/.claude/rules/ignore-gemini-md.md +17 -0
- package/.claude/rules/lean-claude-md.md +45 -0
- package/.claude/rules/learn-tags.md +99 -0
- package/.claude/rules/overleaf-separation.md +67 -0
- package/.claude/rules/plan-first.md +175 -0
- package/.claude/rules/read-docs-first.md +50 -0
- package/.claude/rules/scope-discipline.md +28 -0
- package/.claude/settings.json +125 -0
- package/.context/current-focus.md +33 -0
- package/.context/preferences/priorities.md +36 -0
- package/.context/preferences/task-naming.md +28 -0
- package/.context/profile.md +29 -0
- package/.context/projects/_index.md +41 -0
- package/.context/projects/papers/nudge-exp.md +22 -0
- package/.context/projects/papers/uncertainty.md +31 -0
- package/.context/resources/claude-scientific-writer-review.md +48 -0
- package/.context/resources/cunningham-multi-analyst-agents.md +104 -0
- package/.context/resources/cunningham-multilang-code-audit.md +62 -0
- package/.context/resources/google-ai-co-scientist-review.md +72 -0
- package/.context/resources/karpathy-llm-council-review.md +58 -0
- package/.context/resources/multi-coder-reliability-protocol.md +175 -0
- package/.context/resources/pedro-santanna-takeaways.md +96 -0
- package/.context/resources/venue-rankings/abs_ajg_2024.csv +1823 -0
- package/.context/resources/venue-rankings/abs_ajg_2024_econ.csv +356 -0
- package/.context/resources/venue-rankings/cabs_4_4star_theory.csv +40 -0
- package/.context/resources/venue-rankings/core_2026.csv +801 -0
- package/.context/resources/venue-rankings.md +147 -0
- package/.context/workflows/README.md +69 -0
- package/.context/workflows/daily-review.md +91 -0
- package/.context/workflows/meeting-actions.md +108 -0
- package/.context/workflows/replication-protocol.md +155 -0
- package/.context/workflows/weekly-review.md +113 -0
- package/.mcp-server-biblio/formatters.py +158 -0
- package/.mcp-server-biblio/pyproject.toml +11 -0
- package/.mcp-server-biblio/server.py +678 -0
- package/.mcp-server-biblio/sources/__init__.py +14 -0
- package/.mcp-server-biblio/sources/base.py +73 -0
- package/.mcp-server-biblio/sources/formatters.py +83 -0
- package/.mcp-server-biblio/sources/models.py +22 -0
- package/.mcp-server-biblio/sources/multi_source.py +243 -0
- package/.mcp-server-biblio/sources/openalex_source.py +183 -0
- package/.mcp-server-biblio/sources/scopus_source.py +309 -0
- package/.mcp-server-biblio/sources/wos_source.py +508 -0
- package/.mcp-server-biblio/uv.lock +896 -0
- package/.scripts/README.md +161 -0
- package/.scripts/ai_pattern_density.py +446 -0
- package/.scripts/conf +445 -0
- package/.scripts/config.py +122 -0
- package/.scripts/count_inventory.py +275 -0
- package/.scripts/daily_digest.py +288 -0
- package/.scripts/done +177 -0
- package/.scripts/extract_meeting_actions.py +223 -0
- package/.scripts/focus +176 -0
- package/.scripts/generate-codex-agents-md.py +217 -0
- package/.scripts/inbox +194 -0
- package/.scripts/notion_helpers.py +325 -0
- package/.scripts/openalex/query_helpers.py +306 -0
- package/.scripts/papers +227 -0
- package/.scripts/query +223 -0
- package/.scripts/session-history.py +201 -0
- package/.scripts/skill-health.py +516 -0
- package/.scripts/skill-log-miner.py +273 -0
- package/.scripts/sync-to-codex.sh +252 -0
- package/.scripts/task +213 -0
- package/.scripts/tasks +190 -0
- package/.scripts/week +206 -0
- package/CLAUDE.md +197 -0
- package/LICENSE +21 -0
- package/MEMORY.md +38 -0
- package/README.md +269 -0
- package/docs/agents.md +44 -0
- package/docs/bibliography-setup.md +55 -0
- package/docs/council-mode.md +36 -0
- package/docs/getting-started.md +245 -0
- package/docs/hooks.md +38 -0
- package/docs/mcp-servers.md +82 -0
- package/docs/notion-setup.md +109 -0
- package/docs/rules.md +33 -0
- package/docs/scripts.md +303 -0
- package/docs/setup-overview/setup-overview.pdf +0 -0
- package/docs/skills.md +70 -0
- package/docs/system.md +159 -0
- package/hooks/block-destructive-git.sh +66 -0
- package/hooks/context-monitor.py +114 -0
- package/hooks/postcompact-restore.py +157 -0
- package/hooks/precompact-autosave.py +181 -0
- package/hooks/promise-checker.sh +124 -0
- package/hooks/protect-source-files.sh +81 -0
- package/hooks/resume-context-loader.sh +53 -0
- package/hooks/startup-context-loader.sh +102 -0
- package/package.json +51 -0
- package/packages/cli-council/.github/workflows/claude-code-review.yml +44 -0
- package/packages/cli-council/.github/workflows/claude.yml +50 -0
- package/packages/cli-council/README.md +100 -0
- package/packages/cli-council/pyproject.toml +43 -0
- package/packages/cli-council/src/cli_council/__init__.py +19 -0
- package/packages/cli-council/src/cli_council/__main__.py +185 -0
- package/packages/cli-council/src/cli_council/backends/__init__.py +8 -0
- package/packages/cli-council/src/cli_council/backends/base.py +81 -0
- package/packages/cli-council/src/cli_council/backends/claude.py +25 -0
- package/packages/cli-council/src/cli_council/backends/codex.py +27 -0
- package/packages/cli-council/src/cli_council/backends/gemini.py +26 -0
- package/packages/cli-council/src/cli_council/checkpoint.py +212 -0
- package/packages/cli-council/src/cli_council/config.py +51 -0
- package/packages/cli-council/src/cli_council/council.py +391 -0
- package/packages/cli-council/src/cli_council/models.py +46 -0
- package/packages/llm-council/.github/workflows/claude-code-review.yml +44 -0
- package/packages/llm-council/.github/workflows/claude.yml +50 -0
- package/packages/llm-council/README.md +453 -0
- package/packages/llm-council/pyproject.toml +42 -0
- package/packages/llm-council/src/llm_council/__init__.py +23 -0
- package/packages/llm-council/src/llm_council/__main__.py +259 -0
- package/packages/llm-council/src/llm_council/checkpoint.py +193 -0
- package/packages/llm-council/src/llm_council/client.py +253 -0
- package/packages/llm-council/src/llm_council/config.py +232 -0
- package/packages/llm-council/src/llm_council/council.py +482 -0
- package/packages/llm-council/src/llm_council/models.py +46 -0
- package/packages/mcp-bibliography/MEMORY.md +31 -0
- package/packages/mcp-bibliography/_app.py +226 -0
- package/packages/mcp-bibliography/formatters.py +158 -0
- package/packages/mcp-bibliography/log/2026-03-13-2100.md +35 -0
- package/packages/mcp-bibliography/pyproject.toml +15 -0
- package/packages/mcp-bibliography/run.sh +20 -0
- package/packages/mcp-bibliography/scholarly_formatters.py +83 -0
- package/packages/mcp-bibliography/server.py +1857 -0
- package/packages/mcp-bibliography/tools/__init__.py +28 -0
- package/packages/mcp-bibliography/tools/_registry.py +19 -0
- package/packages/mcp-bibliography/tools/altmetric.py +107 -0
- package/packages/mcp-bibliography/tools/core.py +92 -0
- package/packages/mcp-bibliography/tools/dblp.py +52 -0
- package/packages/mcp-bibliography/tools/openalex.py +296 -0
- package/packages/mcp-bibliography/tools/opencitations.py +102 -0
- package/packages/mcp-bibliography/tools/openreview.py +179 -0
- package/packages/mcp-bibliography/tools/orcid.py +131 -0
- package/packages/mcp-bibliography/tools/scholarly.py +575 -0
- package/packages/mcp-bibliography/tools/unpaywall.py +63 -0
- package/packages/mcp-bibliography/tools/zenodo.py +123 -0
- package/packages/mcp-bibliography/uv.lock +711 -0
- package/scripts/setup.sh +143 -0
- package/skills/beamer-deck/SKILL.md +199 -0
- package/skills/beamer-deck/references/quality-rubric.md +54 -0
- package/skills/beamer-deck/references/review-prompts.md +106 -0
- package/skills/bib-validate/SKILL.md +261 -0
- package/skills/bib-validate/references/council-mode.md +34 -0
- package/skills/bib-validate/references/deep-verify.md +79 -0
- package/skills/bib-validate/references/fix-mode.md +36 -0
- package/skills/bib-validate/references/openalex-verification.md +45 -0
- package/skills/bib-validate/references/preprint-check.md +31 -0
- package/skills/bib-validate/references/ref-manager-crossref.md +41 -0
- package/skills/bib-validate/references/report-template.md +82 -0
- package/skills/code-archaeology/SKILL.md +141 -0
- package/skills/code-review/SKILL.md +265 -0
- package/skills/code-review/references/quality-rubric.md +67 -0
- package/skills/consolidate-memory/SKILL.md +208 -0
- package/skills/context-status/SKILL.md +126 -0
- package/skills/creation-guard/SKILL.md +230 -0
- package/skills/devils-advocate/SKILL.md +130 -0
- package/skills/devils-advocate/references/competing-hypotheses.md +83 -0
- package/skills/init-project/SKILL.md +115 -0
- package/skills/init-project-course/references/memory-and-settings.md +92 -0
- package/skills/init-project-course/references/organise-templates.md +94 -0
- package/skills/init-project-course/skill.md +147 -0
- package/skills/init-project-light/skill.md +139 -0
- package/skills/init-project-research/SKILL.md +368 -0
- package/skills/init-project-research/references/atlas-pipeline-sync.md +70 -0
- package/skills/init-project-research/references/atlas-schema.md +81 -0
- package/skills/init-project-research/references/confirmation-report.md +39 -0
- package/skills/init-project-research/references/domain-profile-template.md +104 -0
- package/skills/init-project-research/references/interview-round3.md +34 -0
- package/skills/init-project-research/references/literature-discovery.md +43 -0
- package/skills/init-project-research/references/scaffold-details.md +197 -0
- package/skills/init-project-research/templates/field-calibration.md +60 -0
- package/skills/init-project-research/templates/pipeline-manifest.md +63 -0
- package/skills/init-project-research/templates/run-all.sh +116 -0
- package/skills/init-project-research/templates/seed-files.md +337 -0
- package/skills/insights-deck/SKILL.md +151 -0
- package/skills/interview-me/SKILL.md +157 -0
- package/skills/latex/SKILL.md +141 -0
- package/skills/latex/references/latex-configs.md +183 -0
- package/skills/latex-autofix/SKILL.md +230 -0
- package/skills/latex-autofix/references/known-errors.md +183 -0
- package/skills/latex-autofix/references/quality-rubric.md +50 -0
- package/skills/latex-health-check/SKILL.md +161 -0
- package/skills/learn/SKILL.md +220 -0
- package/skills/learn/scripts/validate_skill.py +265 -0
- package/skills/lessons-learned/SKILL.md +201 -0
- package/skills/literature/SKILL.md +335 -0
- package/skills/literature/references/agent-templates.md +393 -0
- package/skills/literature/references/bibliometric-apis.md +44 -0
- package/skills/literature/references/cli-council-search.md +79 -0
- package/skills/literature/references/openalex-api-guide.md +371 -0
- package/skills/literature/references/openalex-common-queries.md +381 -0
- package/skills/literature/references/openalex-workflows.md +248 -0
- package/skills/literature/references/reference-manager-sync.md +36 -0
- package/skills/literature/references/scopus-api-guide.md +208 -0
- package/skills/literature/references/wos-api-guide.md +308 -0
- package/skills/multi-perspective/SKILL.md +311 -0
- package/skills/multi-perspective/references/computational-many-analysts.md +77 -0
- package/skills/pipeline-manifest/SKILL.md +226 -0
- package/skills/pre-submission-report/SKILL.md +153 -0
- package/skills/process-reviews/SKILL.md +244 -0
- package/skills/process-reviews/references/rr-routing.md +101 -0
- package/skills/project-deck/SKILL.md +87 -0
- package/skills/project-safety/SKILL.md +135 -0
- package/skills/proofread/SKILL.md +254 -0
- package/skills/proofread/references/quality-rubric.md +104 -0
- package/skills/python-env/SKILL.md +57 -0
- package/skills/quarto-deck/SKILL.md +226 -0
- package/skills/quarto-deck/references/markdown-format.md +143 -0
- package/skills/quarto-deck/references/quality-rubric.md +54 -0
- package/skills/save-context/SKILL.md +174 -0
- package/skills/session-log/SKILL.md +98 -0
- package/skills/shared/concept-validation-gate.md +161 -0
- package/skills/shared/council-protocol.md +265 -0
- package/skills/shared/distribution-diagnostics.md +164 -0
- package/skills/shared/engagement-stratified-sampling.md +218 -0
- package/skills/shared/escalation-protocol.md +74 -0
- package/skills/shared/external-audit-protocol.md +205 -0
- package/skills/shared/intercoder-reliability.md +256 -0
- package/skills/shared/mcp-degradation.md +81 -0
- package/skills/shared/method-probing-questions.md +163 -0
- package/skills/shared/multi-language-conventions.md +143 -0
- package/skills/shared/paid-api-safety.md +174 -0
- package/skills/shared/palettes.md +90 -0
- package/skills/shared/progressive-disclosure.md +92 -0
- package/skills/shared/project-documentation-content.md +443 -0
- package/skills/shared/project-documentation-format.md +281 -0
- package/skills/shared/project-documentation.md +100 -0
- package/skills/shared/publication-output.md +138 -0
- package/skills/shared/quality-scoring.md +70 -0
- package/skills/shared/reference-resolution.md +77 -0
- package/skills/shared/research-quality-rubric.md +165 -0
- package/skills/shared/rhetoric-principles.md +54 -0
- package/skills/shared/skill-design-patterns.md +272 -0
- package/skills/shared/skill-index.md +240 -0
- package/skills/shared/system-documentation.md +334 -0
- package/skills/shared/tikz-rules.md +402 -0
- package/skills/shared/validation-tiers.md +121 -0
- package/skills/shared/venue-guides/README.md +46 -0
- package/skills/shared/venue-guides/cell_press_style.md +483 -0
- package/skills/shared/venue-guides/conferences_formatting.md +564 -0
- package/skills/shared/venue-guides/cs_conference_style.md +463 -0
- package/skills/shared/venue-guides/examples/cell_summary_example.md +247 -0
- package/skills/shared/venue-guides/examples/medical_structured_abstract.md +313 -0
- package/skills/shared/venue-guides/examples/nature_abstract_examples.md +213 -0
- package/skills/shared/venue-guides/examples/neurips_introduction_example.md +245 -0
- package/skills/shared/venue-guides/journals_formatting.md +486 -0
- package/skills/shared/venue-guides/medical_journal_styles.md +535 -0
- package/skills/shared/venue-guides/ml_conference_style.md +556 -0
- package/skills/shared/venue-guides/nature_science_style.md +405 -0
- package/skills/shared/venue-guides/reviewer_expectations.md +417 -0
- package/skills/shared/venue-guides/venue_writing_styles.md +321 -0
- package/skills/split-pdf/SKILL.md +172 -0
- package/skills/split-pdf/methodology.md +48 -0
- package/skills/sync-notion/SKILL.md +93 -0
- package/skills/system-audit/SKILL.md +157 -0
- package/skills/system-audit/references/sub-agent-prompts.md +294 -0
- package/skills/task-management/SKILL.md +131 -0
- package/skills/update-focus/SKILL.md +204 -0
- package/skills/update-project-doc/SKILL.md +194 -0
- package/skills/validate-bib/SKILL.md +242 -0
- package/skills/validate-bib/references/council-mode.md +34 -0
- package/skills/validate-bib/references/deep-verify.md +71 -0
- package/skills/validate-bib/references/openalex-verification.md +45 -0
- package/skills/validate-bib/references/preprint-check.md +31 -0
- package/skills/validate-bib/references/report-template.md +62 -0
@@ -0,0 +1,321 @@

# Venue Writing Styles: Master Guide

This guide provides an overview of how writing style varies across publication venues. Understanding these differences is essential for crafting papers that read like authentic publications at each venue.

**Last Updated**: 2024

---

## The Style Spectrum

Scientific writing style exists on a spectrum from **broadly accessible** to **deeply technical**:

```
Accessible ◄─────────────────────────────────────────────► Technical

Nature/Science  PNAS        Cell        IEEE Trans    NeurIPS       Specialized
│               │           │           │             │             Journals
│               │           │           │             │             │
▼               ▼           ▼           ▼             ▼             ▼
General         Mixed       Deep        Field         Dense ML      Expert
audience        depth       biology     experts       researchers   only
```

## Quick Style Reference

| Venue Type | Audience | Tone | Voice | Abstract Style |
|------------|----------|------|-------|----------------|
| **Nature/Science** | Educated non-specialists | Accessible, engaging | Active, first-person OK | Flowing paragraphs, no jargon |
| **Cell Press** | Biologists | Mechanistic, precise | Mixed | Summary + eTOC blurb + Highlights |
| **Medical (NEJM/Lancet)** | Clinicians | Evidence-focused | Formal | Structured (Background/Methods/Results/Conclusions) |
| **PLOS/BMC** | Researchers | Standard academic | Neutral | IMRaD structured or flowing |
| **IEEE/ACM** | Engineers/CS | Technical | Passive common | Concise, technical |
| **ML Conferences** | ML researchers | Dense technical | Mixed | Numbers upfront, key results |
| **NLP Conferences** | NLP researchers | Technical | Varied | Task-focused, benchmarks |

---

## High-Impact Journals (Nature, Science, Cell)

### Core Philosophy

High-impact multidisciplinary journals prioritize **broad significance** over technical depth. The question is not "Is this technically sound?" but "Why should a scientist outside this field care?"

### Key Writing Principles

1. **Start with the big picture**: Open with why this matters to science/society
2. **Minimize jargon**: Define specialized terms; prefer common words
3. **Tell a story**: Results should flow as a narrative, not a data dump
4. **Emphasize implications**: What does this change about our understanding?
5. **Accessible figures**: Schematics and models over raw data plots

### Structural Differences

**Nature/Science** vs. **Specialized Journals**:

| Element | Nature/Science | Specialized Journal |
|---------|---------------|---------------------|
| Introduction | 3-4 paragraphs, broad → specific | Extensive literature review |
| Methods | Often in supplement or brief | Full detail in main text |
| Results | Organized by finding/story | Organized by experiment |
| Discussion | Implications first, then caveats | Detailed comparison to literature |
| Figures | Conceptual schematics valued | Raw data emphasized |

### Example: Same Finding, Different Styles

**Nature style**:
> "We discovered that protein X acts as a molecular switch controlling cell fate decisions during development, resolving a longstanding question about how stem cells choose their destiny."

**Specialized journal style**:
> "Using CRISPR-Cas9 knockout in murine embryonic stem cells (mESCs), we demonstrate that protein X (encoded by gene ABC1) regulates the expression of pluripotency factors Oct4, Sox2, and Nanog through direct promoter binding, as confirmed by ChIP-seq analysis (n=3 biological replicates, FDR < 0.05)."

---

## Medical Journals (NEJM, Lancet, JAMA, BMJ)

### Core Philosophy

Medical journals prioritize **clinical relevance** and **patient outcomes**. Every finding must connect to practice.

### Key Writing Principles

1. **Patient-centered language**: "Patients receiving treatment X" not "Treatment X subjects"
2. **Evidence strength**: Careful hedging based on study design
3. **Clinical actionability**: "So what?" for practicing physicians
4. **Absolute numbers**: Report absolute risk reduction, not just relative
5. **Structured abstracts**: Required with labeled sections
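
Principle 4 above (report absolute risk reduction, not just relative) comes down to simple arithmetic. The following is a minimal Python sketch, not part of this package: the `risk_summary` helper is hypothetical, and the event rates of 5% (control) and 2.5% (treatment) are assumed values chosen only for illustration.

```python
import math


def risk_summary(control_risk: float, treated_risk: float) -> dict:
    """Summarize a treatment effect from two event rates given as proportions."""
    if not (0 < control_risk <= 1 and 0 <= treated_risk <= 1):
        raise ValueError("risks must be proportions, with control_risk > 0")
    arr = control_risk - treated_risk  # absolute risk reduction (ARR)
    rrr = arr / control_risk           # relative risk reduction (RRR)
    # Number needed to treat: 1 / ARR, rounded up to a whole patient.
    # Rounding to 9 decimals first absorbs floating-point noise.
    nnt = math.ceil(round(1 / arr, 9)) if arr > 0 else None
    return {"arr": arr, "rrr": rrr, "nnt": nnt}


# Assumed, illustrative event rates: 5% in controls, 2.5% with treatment.
summary = risk_summary(control_risk=0.05, treated_risk=0.025)
print(
    f"{summary['rrr']:.0%} relative reduction "
    f"(ARR {summary['arr']:.1%}, NNT {summary['nnt']})"
)
```

With a control event rate of 0.5% instead of 5%, the same 50% relative reduction would correspond to an NNT of 400, which is why the absolute figures matter to clinicians.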

### Structured Abstract Format (Medical)

```
Background: [1-2 sentences on problem and rationale]

Methods: [Study design, setting, participants, intervention, outcomes, analysis]

Results: [Primary outcome with confidence intervals, secondary outcomes, adverse events]

Conclusions: [Clinical implications, limitations acknowledged]
```

### Evidence Language Conventions

| Study Design | Appropriate Language |
|-------------|---------------------|
| RCT | "Treatment X reduced mortality by..." |
| Observational | "Treatment X was associated with reduced mortality..." |
| Case series | "These findings suggest that treatment X may..." |
| Case report | "This case illustrates that treatment X can..." |

---

## ML/AI Conferences (NeurIPS, ICML, ICLR, CVPR)

### Core Philosophy

ML conferences value **novelty**, **rigorous experiments**, and **reproducibility**. The focus is on advancing the state of the art with empirical evidence.

### Key Writing Principles

1. **Contribution bullets**: Numbered list in introduction stating exactly what's new
2. **Baselines are critical**: Compare against strong, recent baselines
3. **Ablations expected**: Show what parts of your method matter
4. **Reproducibility**: Seeds, hyperparameters, compute requirements
5. **Limitations section**: Honest acknowledgment (increasingly required)

### Introduction Structure (ML Conferences)

```
[Paragraph 1: Problem motivation - why this matters]

[Paragraph 2: Limitations of existing approaches]

[Paragraph 3: Our approach at high level]

Our contributions are as follows:
• We propose [method name], a novel approach to [problem] that [key innovation].
• We provide theoretical analysis showing [guarantees/properties].
• We demonstrate state-of-the-art results on [benchmarks], improving over [baseline] by [X%].
• We release code and models at [anonymous URL for review].
```

### Abstract Style (ML Conferences)

ML abstracts are **dense and numbers-focused**:

> "We present TransformerX, a novel architecture for long-range sequence modeling that achieves O(n log n) complexity while maintaining expressivity. On the Long Range Arena benchmark, TransformerX achieves 86.2% average accuracy, outperforming Transformer (65.4%) and Performer (78.1%). On language modeling, TransformerX matches GPT-2 perplexity (18.4) using 40% fewer parameters. We provide theoretical analysis showing TransformerX can approximate any continuous sequence-to-sequence function."

### Experiment Section Expectations

1. **Datasets**: Standard benchmarks, dataset statistics
2. **Baselines**: Recent strong methods, fair comparisons
3. **Main results table**: Clear, comprehensive
4. **Ablation studies**: Remove/modify components systematically
5. **Analysis**: Error analysis, qualitative examples, failure cases
6. **Computational cost**: Training time, inference speed, memory

---

## CS Conferences (ACL, EMNLP, CHI, SIGKDD)

### ACL/EMNLP (NLP)

- **Task-focused**: Clear problem definition
- **Benchmark-heavy**: Standard datasets (GLUE, SQuAD, etc.)
- **Error analysis valued**: Where does it fail?
- **Human evaluation**: Often expected alongside automatic metrics
- **Ethical considerations**: Bias, fairness, environmental cost

### CHI (Human-Computer Interaction)

- **User-centered**: Focus on humans, not just technology
- **Study design details**: Participant recruitment, IRB approval
- **Qualitative accepted**: Interview studies, ethnography valid
- **Design implications**: Concrete takeaways for practitioners
- **Accessibility**: Consider diverse user populations

### SIGKDD (Data Mining)

- **Scalability emphasis**: Handle large data
- **Real-world applications**: Industry datasets valued
- **Efficiency metrics**: Time and space complexity
- **Novelty in methods or applications**: Both paths valid

---

## Adapting Between Venue Types

### Journal → ML Conference

When converting a journal paper to conference format:

1. **Condense introduction**: Remove extensive background
2. **Add contribution list**: Explicitly enumerate contributions
3. **Restructure results**: Organize as experiments, add ablations
4. **Remove separate discussion**: Integrate interpretation briefly
5. **Add reproducibility section**: Seeds, hyperparameters, code

### ML Conference → Journal

When expanding a conference paper to journal:

1. **Expand related work**: Comprehensive literature review
2. **Detailed methods**: Full algorithmic description
3. **More experiments**: Additional datasets, analyses
4. **Extended discussion**: Implications, limitations, future work
5. **Appendix → main text**: Move important details up

### Specialized → High-Impact Journal

When targeting Nature/Science/Cell from a specialized venue:

1. **Lead with significance**: Why does this matter broadly?
2. **Reduce jargon by 80%**: Replace technical terms
3. **Add conceptual figures**: Schematics, models, not just data
4. **Story-driven results**: Narrative flow, not experiment-by-experiment
5. **Broaden discussion**: Implications beyond the subfield

---

## Voice and Tone Guidelines

### Active vs. Passive Voice

| Venue | Preference | Example |
|-------|-----------|---------|
| Nature/Science | Active encouraged | "We discovered that..." |
| Cell | Mixed | "Our results demonstrate..." |
| Medical | Passive common | "Patients were randomized to..." |
| IEEE | Passive traditional | "The algorithm was implemented..." |
| ML Conferences | Active preferred | "We propose a method that..." |

### First Person Usage

| Venue | First Person | Example |
|-------|-------------|---------|
| Nature/Science | Yes (we) | "We show that..." |
| Cell | Yes (we) | "We found that..." |
| Medical | Sometimes | "We conducted a trial..." |
| IEEE | Less common | Prefer "This paper presents..." |
| ML Conferences | Yes (we) | "We introduce..." |

### Hedging and Certainty

| Claim Strength | Language |
|---------------|----------|
| Strong | "X causes Y" (only with causal evidence) |
| Moderate | "X is associated with Y" / "X leads to Y" |
| Tentative | "X may contribute to Y" / "X suggests that..." |
| Speculative | "It is possible that X..." / "One interpretation is..." |

---

## Common Style Errors by Venue

### Nature/Science Submissions

❌ Too technical: "We used CRISPR-Cas9 with sgRNAs targeting exon 3..."
✅ Accessible: "Using gene-editing technology, we disabled the gene..."

❌ Dry opening: "Protein X is involved in cellular signaling..."
✅ Engaging opening: "How do cells decide their fate? We discovered that..."

### ML Conference Submissions

❌ Vague contributions: "We present a new method for X"
✅ Specific contributions: "We propose Method Y that achieves Z% improvement on benchmark W"

❌ Missing ablations: Only showing full method results
✅ Complete: Table showing contribution of each component

### Medical Journal Submissions

❌ Missing absolute numbers: "50% reduction in risk"
✅ Complete: "50% relative reduction (ARR 2.5%, NNT 40)"

❌ Causal language for observational data: "Treatment caused improvement"
✅ Appropriate: "Treatment was associated with improvement"

---

## Quick Checklist Before Submission

### All Venues
- [ ] Abstract matches venue style (flowing vs. structured)
- [ ] Voice/tone appropriate for audience
- [ ] Jargon level appropriate
- [ ] Figures match venue expectations
- [ ] Citation style correct

### High-Impact Journals (Nature/Science/Cell)
- [ ] Broad significance clear in first paragraph
- [ ] Non-specialist can understand abstract
- [ ] Story-driven results narrative
- [ ] Conceptual figures included
- [ ] Implications emphasized

### ML Conferences
- [ ] Contribution list in introduction
- [ ] Strong baselines included
- [ ] Ablation studies present
- [ ] Reproducibility information complete
- [ ] Limitations acknowledged

### Medical Journals
- [ ] Structured abstract (if required)
- [ ] Patient-centered language
- [ ] Evidence strength appropriate
- [ ] Absolute numbers reported
- [ ] CONSORT/STROBE compliance

---
|
|
311
|
+
|
|
312
|
+
## See Also
|
|
313
|
+
|
|
314
|
+
- `nature_science_style.md` - Detailed Nature/Science writing guide
|
|
315
|
+
- `cell_press_style.md` - Cell family journal conventions
|
|
316
|
+
- `medical_journal_styles.md` - NEJM, Lancet, JAMA, BMJ guide
|
|
317
|
+
- `ml_conference_style.md` - NeurIPS, ICML, ICLR, CVPR conventions
|
|
318
|
+
- `cs_conference_style.md` - ACL, CHI, SIGKDD guide
|
|
319
|
+
- `reviewer_expectations.md` - What reviewers look for by venue
|
|
320
|
+
|
|
321
|
+
|
|
@@ -0,0 +1,172 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: split-pdf
|
|
3
|
+
description: "Use when you need to download, split, and deeply read an academic PDF."
|
|
4
|
+
allowed-tools: Bash(python*), Bash(uv*), Bash(curl*), Bash(wget*), Bash(mkdir*), Bash(ls*), Read, Write, Edit, WebSearch, WebFetch, mcp__refpile__parse_pdf_fulltext, mcp__refpile__parse_pdf_metadata
|
|
5
|
+
argument-hint: [pdf-path-or-search-query]
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# Split-PDF: Download, Split, and Deep-Read Academic Papers
|
|
9
|
+
|
|
10
|
+
**CRITICAL RULE: Never read a full PDF. Never.** Read only the 4-page split files, and at most 3 splits at a time (~12 pages). Reading a full PDF will either crash the session with an unrecoverable "prompt too long" error, destroying all context, or produce shallow, hallucinated output. The only exceptions are the cases listed under "When NOT to Split".
|
|
11
|
+
|
|
12
|
+
## When This Skill Is Invoked
|
|
13
|
+
|
|
14
|
+
The user wants you to read, review, or summarize an academic paper. The input is either:
|
|
15
|
+
- A file path to a local PDF (e.g., `./articles/smith_2024.pdf`)
|
|
16
|
+
- A search query or paper title (e.g., `"Gentzkow Shapiro Sinkinson 2014 competition newspapers"`)
|
|
17
|
+
|
|
18
|
+
**Important:** You cannot search for a paper you don't know exists. The user MUST provide either a file path or a specific search query — an author name, a title, keywords, a year, or some combination that identifies the paper. If the user invokes this skill without specifying what paper to read, ask them. Do not guess.
|
|
19
|
+
|
|
20
|
+
## Step 1: Acquire the PDF
|
|
21
|
+
|
|
22
|
+
**Determine the download directory:**
|
|
23
|
+
- **Inside a research project** (has `CLAUDE.md`, `data/`, `paper/`, etc.): use `./articles/` in the project directory (create if needed).
|
|
24
|
+
- **Outside a project** (e.g., ad-hoc reading from Task Management root or general context): use `to-sort/downloads/` in the Task Management folder. This ensures downloaded files persist across sessions.
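
The directory choice above can be sketched as a small helper. This is a minimal illustration, assuming that the presence of any one of the named markers (`CLAUDE.md`, `data/`, `paper/`) is enough to count as a project; the "etc." in the rule is not covered, and `download_dir` and its parameters are hypothetical names.

```python
from pathlib import Path

# Markers named in the rule above; any one of them is treated as sufficient (assumption)
PROJECT_MARKERS = ("CLAUDE.md", "data", "paper")


def download_dir(cwd: Path, task_management_root: Path) -> Path:
    """Return ./articles/ inside a project, else to-sort/downloads/ under the Task Management root."""
    in_project = any((cwd / marker).exists() for marker in PROJECT_MARKERS)
    if in_project:
        target = cwd / "articles"
    else:
        target = task_management_root / "to-sort" / "downloads"
    target.mkdir(parents=True, exist_ok=True)  # create if needed
    return target
```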
|
|
25
|
+
|
|
26
|
+
**If a local file path is provided:**
|
|
27
|
+
- Verify the file exists
|
|
28
|
+
- If the file is NOT already inside the download directory, copy it there (do not move — preserve the original location)
|
|
29
|
+
- Proceed to Step 2
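
The copy-without-moving rule can be sketched as follows. This is an illustration under stated assumptions, not the skill's own code: `stage_pdf` is a hypothetical helper, "already inside the download directory" is interpreted as "the file's parent is the download directory", and an existing file of the same name is left untouched rather than overwritten.

```python
import shutil
from pathlib import Path


def stage_pdf(source: str, download_dir: str) -> Path:
    """Copy a local PDF into the download directory, leaving the original in place."""
    src = Path(source).expanduser().resolve()
    if not src.is_file():
        raise FileNotFoundError(f"PDF not found: {src}")

    dest_dir = Path(download_dir).expanduser().resolve()
    dest_dir.mkdir(parents=True, exist_ok=True)

    if src.parent == dest_dir:
        return src                 # already in the download directory
    dest = dest_dir / src.name
    if dest.exists():
        return dest                # never overwrite an existing copy
    shutil.copy2(src, dest)        # copy, not move: the original stays where it was
    return dest
```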
|
|
30
|
+
|
|
31
|
+
**If a search query or paper title is provided:**
|
|
32
|
+
1. Use WebSearch to find the paper
|
|
33
|
+
2. If WebSearch doesn't yield a direct PDF link, try the bibliography MCP `scholarly_search` tool (cross-source search). If that also fails, fall back to the Python OpenAlex client:
|
|
34
|
+
   ```python
   import sys

   # Make the bundled OpenAlex client importable
   sys.path.insert(0, ".scripts/openalex")
   from openalex_client import OpenAlexClient

   client = OpenAlexClient(email="user@example.edu")
   results = client.search_works(search="paper title here", per_page=5)
   # Check open_access.oa_url in results for direct PDF links
   ```
|
|
42
|
+
3. Use WebFetch or Bash (curl/wget) to download the PDF
|
|
43
|
+
4. Save it to the download directory determined above
|
|
44
|
+
5. Proceed to Step 2
|
|
45
|
+
|
|
46
|
+
**CRITICAL: Always preserve the original PDF.** The downloaded or provided PDF in the download directory must NEVER be deleted, moved, or overwritten at any point in this workflow. The split files are derivatives — the original is the permanent artifact. Do not clean up, do not remove, do not tidy. The original stays.
|
|
47
|
+
|
|
48
|
+
## Step 2: Split the PDF
|
|
49
|
+
|
|
50
|
+
Create a subdirectory for the splits and run the splitting script:
|
|
51
|
+
|
|
52
|
+
```python
import os
import sys

from PyPDF2 import PdfReader, PdfWriter


def split_pdf(input_path, output_dir, pages_per_chunk=4):
    """Write consecutive chunks of `pages_per_chunk` pages from input_path into output_dir."""
    os.makedirs(output_dir, exist_ok=True)
    reader = PdfReader(input_path)  # the input file is only read, never written
    total = len(reader.pages)
    prefix = os.path.splitext(os.path.basename(input_path))[0]

    for start in range(0, total, pages_per_chunk):
        end = min(start + pages_per_chunk, total)
        writer = PdfWriter()
        for i in range(start, end):
            writer.add_page(reader.pages[i])

        # File names use 1-based page numbers, e.g. smith_2024_pp1-4.pdf
        out_name = f"{prefix}_pp{start+1}-{end}.pdf"
        out_path = os.path.join(output_dir, out_name)
        with open(out_path, "wb") as f:
            writer.write(f)

    # -(-a // b) is ceiling division: the number of chunks written
    print(f"Split {total} pages into {-(-total // pages_per_chunk)} chunks in {output_dir}")


if __name__ == "__main__":
    # Usage: python <this-script> <input.pdf> <output_dir>
    split_pdf(sys.argv[1], sys.argv[2])
```
|
|
75
|
+
|
|
76
|
+
**Directory convention (in-project):**
|
|
77
|
+
```
articles/
├── smith_2024.pdf                  # original PDF: NEVER DELETE THIS
└── split_smith_2024/               # split subdirectory
    ├── smith_2024_pp1-4.pdf
    ├── smith_2024_pp5-8.pdf
    ├── smith_2024_pp9-12.pdf
    └── ...
```
|
|
86
|
+
|
|
87
|
+
**Directory convention (ad-hoc / outside project):**
|
|
88
|
+
```
to-sort/downloads/
├── smith_2024.pdf                  # original PDF: NEVER DELETE THIS
└── split_smith_2024/               # split subdirectory
    └── ...
```
|
|
94
|
+
|
|
95
|
+
The original PDF remains in the download directory permanently. The splits are working copies. If anything goes wrong, you can always re-split from the original.
|
|
96
|
+
|
|
97
|
+
If PyPDF2 is not installed, install it: `uv pip install PyPDF2`
|
|
98
|
+
|
|
99
|
+
## Step 3: Read in Batches of 3 Splits
|
|
100
|
+
|
|
101
|
+
Read **3 split files at a time** (~12 pages; the final batch may contain fewer). After each batch:
|
|
102
|
+
|
|
103
|
+
1. **Read** the 3 split PDFs using the Read tool
|
|
104
|
+
2. **Update** the running notes file (`notes.md` in the split subdirectory)
|
|
105
|
+
3. **Pause** and tell the user:
|
|
106
|
+
|
|
107
|
+
> "I have finished reading splits [X-Y] and updated the notes. I have [N] more splits remaining. Would you like me to continue with the next 3?"
|
|
108
|
+
|
|
109
|
+
4. **Wait** for the user to confirm before reading the next batch
|
|
110
|
+
|
|
111
|
+
Do NOT read ahead. Do NOT read all splits at once. The pause-and-confirm protocol is mandatory.
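
Grouping the split files into batches of 3 can be sketched as below. One detail matters in practice: a plain alphabetical sort places `_pp13-16` before `_pp5-8`, so the sketch orders files by their numeric start page instead. It assumes the `<prefix>_pp<start>-<end>.pdf` naming produced in Step 2; the helper names are hypothetical.

```python
import re
from pathlib import Path

# Matches the page-range suffix written by the splitting script, e.g. "_pp5-8.pdf"
PAGE_RANGE = re.compile(r"_pp(\d+)-(\d+)\.pdf$")


def first_page(path: Path) -> int:
    """Return the 1-based start page encoded in a split file name."""
    match = PAGE_RANGE.search(path.name)
    if match is None:
        raise ValueError(f"Not a split file: {path.name}")
    return int(match.group(1))


def batches(split_files, batch_size=3):
    """Order split files by start page and group them into reading batches."""
    ordered = sorted(split_files, key=first_page)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Example with in-memory names; no files are read here
example = [Path(f"smith_2024_pp{s}-{s + 3}.pdf") for s in (13, 1, 9, 5)]
for number, batch in enumerate(batches(example), start=1):
    print(number, [p.name for p in batch])
```

Only one batch should be read per turn; the remaining batches wait for the user's confirmation.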
|
|
112
|
+
|
|
113
|
+
## Step 4: Structured Extraction
|
|
114
|
+
|
|
115
|
+
As you read, collect information along these dimensions and write them into `notes.md`:
|
|
116
|
+
|
|
117
|
+
1. **Research question** — What is the paper asking and why does it matter?
|
|
118
|
+
2. **Audience** — Which sub-community of researchers cares about this?
|
|
119
|
+
3. **Method** — How do they answer the question? What is the identification strategy?
|
|
120
|
+
4. **Data** — What data do they use? Where precisely did they find it? What is the unit of observation? Sample size? Time period?
|
|
121
|
+
5. **Statistical methods** — What econometric or statistical techniques do they use? What are the key specifications?
|
|
122
|
+
6. **Findings** — What are the main results? Key coefficient estimates and standard errors?
|
|
123
|
+
7. **Contributions** — What is learned from this exercise that we didn't know before?
|
|
124
|
+
8. **Replication feasibility** — Is the data publicly available? Is there a replication archive? A data appendix? URLs for the underlying data?
|
|
125
|
+
|
|
126
|
+
These questions extract what a researcher needs to **build on or replicate** the work — a structured extraction more detailed and specific than a typical summary.
|
|
127
|
+
|
|
128
|
+
## The Notes File
|
|
129
|
+
|
|
130
|
+
The output is `notes.md` in the split subdirectory:
|
|
131
|
+
|
|
132
|
+
```
articles/split_smith_2024/notes.md
```
|
|
135
|
+
|
|
136
|
+
This file is **updated incrementally** after each batch. Structure it with clear headers for each of the 8 dimensions. After each batch, update whichever dimensions have new information — do not rewrite from scratch.
|
|
137
|
+
|
|
138
|
+
By the time all splits are read, the notes should contain specific data sources, variable names, equation references, sample sizes, coefficient estimates, and standard errors. Not a summary — a structured extraction.
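
One way to start the notes file is with an empty skeleton containing one header per dimension, which later batches then fill in. The sketch below is an assumption about layout, not a required format: the header wording follows the 8 dimensions from Step 4, while the placeholder text and function names are illustrative.

```python
from pathlib import Path

# The 8 extraction dimensions from Step 4
DIMENSIONS = [
    "Research question",
    "Audience",
    "Method",
    "Data",
    "Statistical methods",
    "Findings",
    "Contributions",
    "Replication feasibility",
]


def notes_skeleton(paper_name: str) -> str:
    """Build an empty notes.md body with one section per dimension."""
    lines = [f"# Notes: {paper_name}", ""]
    for number, title in enumerate(DIMENSIONS, start=1):
        lines += [f"## {number}. {title}", "", "_No information yet._", ""]
    return "\n".join(lines)


def ensure_notes(split_dir: str, paper_name: str) -> Path:
    """Create notes.md if it is missing; existing notes are never rewritten from scratch."""
    path = Path(split_dir) / "notes.md"
    if not path.exists():
        path.write_text(notes_skeleton(paper_name), encoding="utf-8")
    return path
```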
|
|
139
|
+
|
|
140
|
+
## Structured Mode (GROBID)
|
|
141
|
+
|
|
142
|
+
If the paper is a **Zotero item** (user provides a key), try GROBID-powered extraction before splitting:
|
|
143
|
+
|
|
144
|
+
1. Call `mcp__refpile__parse_pdf_metadata(key=KEY)` — get title, authors, abstract, affiliations
|
|
145
|
+
2. Call `mcp__refpile__parse_pdf_fulltext(key=KEY)` — get sections with headings and paragraphs
|
|
146
|
+
|
|
147
|
+
If GROBID succeeds, you get **semantically structured sections** (Introduction, Methods, Results, etc.) instead of arbitrary 4-page chunks. This is better for reading notes because:
|
|
148
|
+
- Sections don't split mid-paragraph
|
|
149
|
+
- Headings give natural structure for `notes.md`
|
|
150
|
+
- Figures and tables are identified with captions
|
|
151
|
+
|
|
152
|
+
**Use structured mode when:** the paper is in Zotero and GROBID returns sections. Still write to `notes.md` with the same 8-dimension extraction.
|
|
153
|
+
|
|
154
|
+
**Fall back to page splits when:** GROBID is unavailable, returns an error, or the paper isn't in Zotero (local PDF only). The 4-page split workflow remains the default for local-only PDFs.
|
|
155
|
+
|
|
156
|
+
## When NOT to Split
|
|
157
|
+
|
|
158
|
+
- Papers shorter than ~15 pages: read directly (still use the Read tool, not Bash)
|
|
159
|
+
- Policy briefs or non-technical documents: a rough summary is fine
|
|
160
|
+
- Triage only: read just the first split (pages 1-4) for abstract and introduction
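
The page-count and triage cases above can be expressed as a small decision helper. This is a sketch only: the cutoff of 15 mirrors the approximate "~15 pages" guideline and is not a hard limit, the policy-brief case is left to judgment, and the function and mode names are hypothetical.

```python
def reading_plan(total_pages: int, triage: bool = False, short_paper_limit: int = 15) -> str:
    """Pick a reading mode from the page count and whether this is a triage read."""
    if triage:
        return "first-split-only"   # pages 1-4: abstract and introduction
    if total_pages < short_paper_limit:
        return "read-directly"      # short paper: no splitting needed
    return "split-and-batch"        # default workflow: 4-page splits, 3 per batch
```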
|
|
161
|
+
|
|
162
|
+
## Quick Reference
|
|
163
|
+
|
|
164
|
+
| Step | Action |
|------|--------|
| **Acquire** | Download to `./articles/` (in-project) or `to-sort/downloads/` (ad-hoc) |
| **Split** | 4-page chunks into `split_<name>/` inside the download directory |
| **Read** | 3 splits at a time, pause after each batch |
| **Write** | Update `notes.md` with structured extraction |
| **Confirm** | Ask user before continuing to next batch |
|
|
171
|
+
|
|
172
|
+
For detailed explanation of why this method works, see [methodology.md](methodology.md).
|
|
@@ -0,0 +1,48 @@
|
|
|
1
|
+
# Why PDF Splitting Works: The Methodology
|
|
2
|
+
|
|
3
|
+
This document explains the reasoning behind the split-pdf skill. It is reference material for humans — the SKILL.md file contains the actual instructions Claude follows.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## The Problem
|
|
8
|
+
|
|
9
|
+
Claude Code (currently Claude Opus 4.5, model ID `claude-opus-4-5-20251101`) can read PDFs, and it has a large context window (200k input tokens). In principle, a 40-page academic paper should fit comfortably. In practice, it does not work well: Claude Code regularly chokes when asked to read long PDFs, and this manifests in two distinct ways:
|
|
10
|
+
|
|
11
|
+
**Problem 1: Session-breaking "prompt too long" errors.** PDF rendering into tokens is expensive. PDFs are not plain text — they are containers for fonts, vector graphics, embedded images, multi-column layouts, mathematical notation, tables, and footnotes. When Claude Code ingests a PDF, it must convert this complex layout into a linear token stream. A long PDF can produce a token sequence that, combined with the rest of the conversation context, exceeds the model's input limit. When this happens, Claude Code returns a "prompt too long" error. There is no way to recover — the session is broken, and all context is lost unless it has been externalized to files.
|
|
12
|
+
|
|
13
|
+
**Problem 2: Shallow, unreliable reading.** Even when the PDF does not trigger a hard failure, comprehension degrades badly for long documents. The model's attention over many tokens from a single dense document is not uniform — it attends more carefully to the beginning and end, and less carefully to the middle. The result is that Claude "skims" rather than "reads." It catches the abstract and introduction, gets fuzzy on the methodology, and often misses or fabricates details from the results and appendix. It truncates content silently, hallucinates details it didn't actually parse, or produces shallow summaries that miss critical methodological details.
|
|
14
|
+
|
|
15
|
+
These are related but distinct problems. The first is a hard constraint — the session dies. The second is a soft degradation — the session continues but the output is unreliable. Splitting addresses both.
|
|
16
|
+
|
|
17
|
+
## Why Batched Reading Works
|
|
18
|
+
|
|
19
|
+
Reading 3 splits at a time (~12 pages) does several things:
|
|
20
|
+
|
|
21
|
+
1. **Forces the model to focus.** With only 12 pages of content, Claude cannot skim. It has to engage with the material at a granular level — the specific equations, the exact data sources, the precise variable definitions.
|
|
22
|
+
|
|
23
|
+
2. **Creates natural checkpoints.** After each batch, the model writes down what it has learned so far. This means its understanding is externalized into a markdown file. If it makes an error in batch 1, you can catch it before batch 2 builds on it.
|
|
24
|
+
|
|
25
|
+
3. **Accumulates rather than summarizes.** When you ask Claude to read a full paper at once, it produces a summary — a lossy compression. When you ask it to read in batches and update running notes, it accumulates detail. The final notes are richer than any one-shot summary could be.
|
|
26
|
+
|
|
27
|
+
4. **Controls for the "front-loading" problem.** Claude's attention is not uniform over a long document. It attends more carefully to the beginning and end. By splitting the paper, every section gets to be "the beginning" of some chunk.
|
|
28
|
+
|
|
29
|
+
## Why 4-Page Chunks
|
|
30
|
+
|
|
31
|
+
Four pages is a sweet spot:
|
|
32
|
+
- Small enough that Claude attends carefully to every detail
|
|
33
|
+
- Large enough that logical sections (a methodology subsection, a results table with discussion) stay together
|
|
34
|
+
- Produces a manageable number of chunks (a 40-page paper = 10 chunks = 4 rounds of reading)
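
The chunk and round counts follow from two ceiling divisions. A minimal sketch, assuming the defaults of 4 pages per chunk and 3 chunks per round used throughout this skill (the function name is illustrative):

```python
import math


def reading_rounds(total_pages: int, pages_per_chunk: int = 4, chunks_per_round: int = 3):
    """Return (number of chunks, number of reading rounds) for a paper."""
    chunks = math.ceil(total_pages / pages_per_chunk)
    rounds = math.ceil(chunks / chunks_per_round)
    return chunks, rounds


print(reading_rounds(40))  # 40 pages at the default settings
```

Because both divisions round up, a 37-page paper needs the same number of chunks and rounds as a 40-page one; the last chunk is simply shorter.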
|
|
35
|
+
|
|
36
|
+
## Why the Pause-and-Confirm Protocol
|
|
37
|
+
|
|
38
|
+
The human pause between batches serves multiple purposes:
|
|
39
|
+
- **Review intermediate output** — catch errors before they compound
|
|
40
|
+
- **Redirect the reading** — ask follow-up questions, skip sections, or change focus
|
|
41
|
+
- **Prevent context drift** — Claude doesn't lose track of where it is in a long session
|
|
42
|
+
- **Control pacing** — some papers require more careful reading in specific sections
|
|
43
|
+
|
|
44
|
+
## Limitations
|
|
45
|
+
|
|
46
|
+
- **It is slow.** A 37-page paper split into 10 chunks, read 3 at a time, requires 4 rounds. Each round involves the user confirming "yes, continue." This is a 10-15 minute process rather than a 1-minute process.
|
|
47
|
+
- **Notes can become repetitive** if the paper revisits themes. Some manual editing of the final notes may be useful.
|
|
48
|
+
- **Assumes the paper is worth reading carefully.** For triage — quickly deciding whether a paper is relevant — reading just the first split (pages 1-4, which usually contains the abstract and introduction) is sufficient.
|
|
@@ -0,0 +1,93 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: sync-notion
|
|
3
|
+
description: "Use when you need to sync the current project's state to the context library and Notion."
|
|
4
|
+
allowed-tools: Read, Edit, Write, Glob, Grep, Bash(ls*), mcp__claude_ai_Notion__notion-search, mcp__claude_ai_Notion__notion-update-page, mcp__claude_ai_Notion__notion-fetch
|
|
5
|
+
argument-hint: [optional: summary of what changed]
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# Sync Notion Skill
|
|
9
|
+
|
|
10
|
+
> Sync the current project's state to the central task management context library and Notion Research Pipeline.
|
|
11
|
+
|
|
12
|
+
## Purpose
|
|
13
|
+
|
|
14
|
+
After working on a research project, this skill propagates the current state outward:
|
|
15
|
+
- **Project CLAUDE.md** (source of truth) → **`.context/projects/_index.md`** (central registry)
|
|
16
|
+
- **Recent session logs** → **`.context/current-focus.md`** (working memory)
|
|
17
|
+
- **Project metadata** → **Notion Research Pipeline** (tracking database)
|
|
18
|
+
|
|
19
|
+
This keeps all systems in sync without manual updates.
|
|
20
|
+
|
|
21
|
+
## When to Use
|
|
22
|
+
|
|
23
|
+
- At the end of a work session (often paired with `/session-log`)
|
|
24
|
+
- After changing a project's target journal, status, or stage
|
|
25
|
+
- When the user says "update project doc", "sync project", or "update my project index"
|
|
26
|
+
|
|
27
|
+
## MCP Pre-Check
|
|
28
|
+
|
|
29
|
+
Before starting, probe Notion MCP availability with a lightweight search. If it is unavailable, skip Step 5 (the only step that calls Notion) and offer fallbacks per [`shared/mcp-degradation.md`](../shared/mcp-degradation.md).
|
|
30
|
+
|
|
31
|
+
## Workflow
|
|
32
|
+
|
|
33
|
+
### Step 1: Read the current project's CLAUDE.md
|
|
34
|
+
|
|
35
|
+
Extract key metadata:
|
|
36
|
+
- **Project name** (from header)
|
|
37
|
+
- **Target journal** (from `Target:` field)
|
|
38
|
+
- **Stage** (infer from content: Idea / Literature Review / Drafting / Data Collection / Analysis / Submitted / R&R / Published)
|
|
39
|
+
- **Co-authors** (if listed)
|
|
40
|
+
- **Working title**
|
|
41
|
+
- **Key next steps** (from Research Design or recent session log)
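
Pulling the project name and target journal out of the file can be sketched with two regular expressions. The exact layout of a project's CLAUDE.md is not specified here, so this assumes the project name is the first level-1 heading and that the target appears on its own line as `Target: ...`, optionally wrapped in bold markers; `extract_metadata` is a hypothetical helper.

```python
import re


def extract_metadata(claude_md: str) -> dict:
    """Extract the project name and target journal from CLAUDE.md text (None if absent)."""
    # First level-1 heading, e.g. "# Project Name"
    header = re.search(r"^#[ \t]+(.+)$", claude_md, flags=re.MULTILINE)
    # A "Target:" line, tolerating list or bold markup around the label
    target = re.search(r"^[\s>*_-]*Target:[\s*_]*(.+?)\s*$", claude_md, flags=re.MULTILINE)
    return {
        "project": header.group(1).strip() if header else None,
        "target_journal": target.group(1) if target else None,
    }
```

Stage, co-authors, and next steps usually need to be inferred from the surrounding prose rather than matched by pattern, so they are left out of this sketch.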
|
|
42
|
+
|
|
43
|
+
### Step 2: Read the most recent session log
|
|
44
|
+
|
|
45
|
+
Look in the project's `log/` directory for the latest `YYYY-MM-DD-HHMM.md` file. Extract:
|
|
46
|
+
- What was accomplished
|
|
47
|
+
- Current blockers
|
|
48
|
+
- Next steps
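
Finding the latest log can be sketched as below. Because the `YYYY-MM-DD-HHMM.md` format is zero-padded, alphabetical order equals chronological order, so taking the maximum name is enough. The helper name is hypothetical, and files that do not match the pattern are ignored.

```python
import re

# Session logs are named YYYY-MM-DD-HHMM.md
LOG_NAME = re.compile(r"^\d{4}-\d{2}-\d{2}-\d{4}\.md$")


def latest_log_name(names):
    """Return the most recent session-log file name, or None if there is none."""
    candidates = [name for name in names if LOG_NAME.match(name)]
    return max(candidates, default=None)


print(latest_log_name(["2025-01-03-0915.md", "2025-01-03-1730.md", "notes.md"]))
```

In use, the names would come from the project's `log/` directory, for example `latest_log_name(p.name for p in Path("log").iterdir())` with `pathlib.Path` imported.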
|
|
49
|
+
|
|
50
|
+
### Step 3: Update `.context/projects/_index.md`
|
|
51
|
+
|
|
52
|
+
Location: `$TM/.context/projects/_index.md`
|
|
53
|
+
|
|
54
|
+
Find the project's row in the "Papers in Progress" table. If it exists, update the Stage, Target Journal, and Status columns. If it doesn't exist, add a new row.
|
|
55
|
+
|
|
56
|
+
**Table format:**
|
|
57
|
+
```
| Project | Stage | Co-authors | Target Journal | Status |
```
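
The update-or-append logic for this table can be sketched in Python. This is a minimal illustration, assuming rows are plain pipe-delimited lines with exactly the five columns of the table format; `upsert_project_row` is a hypothetical helper, not part of the skill.

```python
def upsert_project_row(rows, project, stage, target, status, coauthors=""):
    """Update Stage, Target Journal, and Status for a project row, or append a new row."""
    result, found = [], False
    for row in rows:
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        if len(cells) == 5 and cells[0] == project:
            # Keep the Project and Co-authors cells; replace the other three
            cells[1], cells[3], cells[4] = stage, target, status
            row = "| " + " | ".join(cells) + " |"
            found = True
        result.append(row)
    if not found:
        result.append(f"| {project} | {stage} | {coauthors} | {target} | {status} |")
    return result
```

Reading the existing rows first and changing only the matching one preserves every other entry in the index.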
|
|
60
|
+
|
|
61
|
+
### Step 4: Update `.context/current-focus.md`
|
|
62
|
+
|
|
63
|
+
Location: `$TM/.context/current-focus.md`
|
|
64
|
+
|
|
65
|
+
Update the "Recent Context" section with a brief summary of the latest session. Add any new open loops.
|
|
66
|
+
|
|
67
|
+
> **Note:** For full session-level updates (session rotation, open loop management, mental state), defer to `/update-focus`. This skill adds a brief summary only.
|
|
68
|
+
|
|
69
|
+
### Step 5: Update Notion Research Pipeline
|
|
70
|
+
|
|
71
|
+
Search the Research Pipeline database (`collection://YOUR-PIPELINE-DATABASE-ID-HERE`) for the project name. If found, update:
|
|
72
|
+
- **Status** (match to: Idea, Literature Review, Drafting, Submitted, R&R, Published)
|
|
73
|
+
- **Target Journal**
|
|
74
|
+
- **Priority** (if changed)
|
|
75
|
+
|
|
76
|
+
If not found, inform the user (don't auto-create — they may want to set properties manually).
|
|
77
|
+
|
|
78
|
+
### Step 6: Confirm
|
|
79
|
+
|
|
80
|
+
Report what was updated:
|
|
81
|
+
```
Updated project docs for [Project Name]:
- .context/projects/_index.md: [what changed]
- .context/current-focus.md: [what changed]
- Notion Research Pipeline: [what changed, or "no entry found"]
```
|
|
87
|
+
|
|
88
|
+
## Important Notes
|
|
89
|
+
|
|
90
|
+
- This skill **reads** the project CLAUDE.md and session logs — it never modifies them.
|
|
91
|
+
- It **writes** to the central context files and Notion only.
|
|
92
|
+
- Always read before writing to preserve existing content.
|
|
93
|
+
- If the user provides a summary argument, use that instead of inferring from logs.
|