flonat-research 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/agents/domain-reviewer.md +336 -0
- package/.claude/agents/fixer.md +226 -0
- package/.claude/agents/paper-critic.md +370 -0
- package/.claude/agents/peer-reviewer.md +289 -0
- package/.claude/agents/proposal-reviewer.md +215 -0
- package/.claude/agents/referee2-reviewer.md +367 -0
- package/.claude/agents/references/journal-referee-profiles.md +354 -0
- package/.claude/agents/references/paper-critic/council-personas.md +77 -0
- package/.claude/agents/references/paper-critic/council-prompts.md +198 -0
- package/.claude/agents/references/peer-reviewer/report-template.md +199 -0
- package/.claude/agents/references/peer-reviewer/sa-prompts.md +260 -0
- package/.claude/agents/references/peer-reviewer/security-scan.md +188 -0
- package/.claude/agents/references/proposal-reviewer/report-template.md +144 -0
- package/.claude/agents/references/proposal-reviewer/sa-prompts.md +149 -0
- package/.claude/agents/references/referee-config.md +114 -0
- package/.claude/agents/references/referee2-reviewer/audit-checklists.md +287 -0
- package/.claude/agents/references/referee2-reviewer/report-template.md +334 -0
- package/.claude/rules/design-before-results.md +52 -0
- package/.claude/rules/ignore-agents-md.md +17 -0
- package/.claude/rules/ignore-gemini-md.md +17 -0
- package/.claude/rules/lean-claude-md.md +45 -0
- package/.claude/rules/learn-tags.md +99 -0
- package/.claude/rules/overleaf-separation.md +67 -0
- package/.claude/rules/plan-first.md +175 -0
- package/.claude/rules/read-docs-first.md +50 -0
- package/.claude/rules/scope-discipline.md +28 -0
- package/.claude/settings.json +125 -0
- package/.context/current-focus.md +33 -0
- package/.context/preferences/priorities.md +36 -0
- package/.context/preferences/task-naming.md +28 -0
- package/.context/profile.md +29 -0
- package/.context/projects/_index.md +41 -0
- package/.context/projects/papers/nudge-exp.md +22 -0
- package/.context/projects/papers/uncertainty.md +31 -0
- package/.context/resources/claude-scientific-writer-review.md +48 -0
- package/.context/resources/cunningham-multi-analyst-agents.md +104 -0
- package/.context/resources/cunningham-multilang-code-audit.md +62 -0
- package/.context/resources/google-ai-co-scientist-review.md +72 -0
- package/.context/resources/karpathy-llm-council-review.md +58 -0
- package/.context/resources/multi-coder-reliability-protocol.md +175 -0
- package/.context/resources/pedro-santanna-takeaways.md +96 -0
- package/.context/resources/venue-rankings/abs_ajg_2024.csv +1823 -0
- package/.context/resources/venue-rankings/abs_ajg_2024_econ.csv +356 -0
- package/.context/resources/venue-rankings/cabs_4_4star_theory.csv +40 -0
- package/.context/resources/venue-rankings/core_2026.csv +801 -0
- package/.context/resources/venue-rankings.md +147 -0
- package/.context/workflows/README.md +69 -0
- package/.context/workflows/daily-review.md +91 -0
- package/.context/workflows/meeting-actions.md +108 -0
- package/.context/workflows/replication-protocol.md +155 -0
- package/.context/workflows/weekly-review.md +113 -0
- package/.mcp-server-biblio/formatters.py +158 -0
- package/.mcp-server-biblio/pyproject.toml +11 -0
- package/.mcp-server-biblio/server.py +678 -0
- package/.mcp-server-biblio/sources/__init__.py +14 -0
- package/.mcp-server-biblio/sources/base.py +73 -0
- package/.mcp-server-biblio/sources/formatters.py +83 -0
- package/.mcp-server-biblio/sources/models.py +22 -0
- package/.mcp-server-biblio/sources/multi_source.py +243 -0
- package/.mcp-server-biblio/sources/openalex_source.py +183 -0
- package/.mcp-server-biblio/sources/scopus_source.py +309 -0
- package/.mcp-server-biblio/sources/wos_source.py +508 -0
- package/.mcp-server-biblio/uv.lock +896 -0
- package/.scripts/README.md +161 -0
- package/.scripts/ai_pattern_density.py +446 -0
- package/.scripts/conf +445 -0
- package/.scripts/config.py +122 -0
- package/.scripts/count_inventory.py +275 -0
- package/.scripts/daily_digest.py +288 -0
- package/.scripts/done +177 -0
- package/.scripts/extract_meeting_actions.py +223 -0
- package/.scripts/focus +176 -0
- package/.scripts/generate-codex-agents-md.py +217 -0
- package/.scripts/inbox +194 -0
- package/.scripts/notion_helpers.py +325 -0
- package/.scripts/openalex/query_helpers.py +306 -0
- package/.scripts/papers +227 -0
- package/.scripts/query +223 -0
- package/.scripts/session-history.py +201 -0
- package/.scripts/skill-health.py +516 -0
- package/.scripts/skill-log-miner.py +273 -0
- package/.scripts/sync-to-codex.sh +252 -0
- package/.scripts/task +213 -0
- package/.scripts/tasks +190 -0
- package/.scripts/week +206 -0
- package/CLAUDE.md +197 -0
- package/LICENSE +21 -0
- package/MEMORY.md +38 -0
- package/README.md +269 -0
- package/docs/agents.md +44 -0
- package/docs/bibliography-setup.md +55 -0
- package/docs/council-mode.md +36 -0
- package/docs/getting-started.md +245 -0
- package/docs/hooks.md +38 -0
- package/docs/mcp-servers.md +82 -0
- package/docs/notion-setup.md +109 -0
- package/docs/rules.md +33 -0
- package/docs/scripts.md +303 -0
- package/docs/setup-overview/setup-overview.pdf +0 -0
- package/docs/skills.md +70 -0
- package/docs/system.md +159 -0
- package/hooks/block-destructive-git.sh +66 -0
- package/hooks/context-monitor.py +114 -0
- package/hooks/postcompact-restore.py +157 -0
- package/hooks/precompact-autosave.py +181 -0
- package/hooks/promise-checker.sh +124 -0
- package/hooks/protect-source-files.sh +81 -0
- package/hooks/resume-context-loader.sh +53 -0
- package/hooks/startup-context-loader.sh +102 -0
- package/package.json +51 -0
- package/packages/cli-council/.github/workflows/claude-code-review.yml +44 -0
- package/packages/cli-council/.github/workflows/claude.yml +50 -0
- package/packages/cli-council/README.md +100 -0
- package/packages/cli-council/pyproject.toml +43 -0
- package/packages/cli-council/src/cli_council/__init__.py +19 -0
- package/packages/cli-council/src/cli_council/__main__.py +185 -0
- package/packages/cli-council/src/cli_council/backends/__init__.py +8 -0
- package/packages/cli-council/src/cli_council/backends/base.py +81 -0
- package/packages/cli-council/src/cli_council/backends/claude.py +25 -0
- package/packages/cli-council/src/cli_council/backends/codex.py +27 -0
- package/packages/cli-council/src/cli_council/backends/gemini.py +26 -0
- package/packages/cli-council/src/cli_council/checkpoint.py +212 -0
- package/packages/cli-council/src/cli_council/config.py +51 -0
- package/packages/cli-council/src/cli_council/council.py +391 -0
- package/packages/cli-council/src/cli_council/models.py +46 -0
- package/packages/llm-council/.github/workflows/claude-code-review.yml +44 -0
- package/packages/llm-council/.github/workflows/claude.yml +50 -0
- package/packages/llm-council/README.md +453 -0
- package/packages/llm-council/pyproject.toml +42 -0
- package/packages/llm-council/src/llm_council/__init__.py +23 -0
- package/packages/llm-council/src/llm_council/__main__.py +259 -0
- package/packages/llm-council/src/llm_council/checkpoint.py +193 -0
- package/packages/llm-council/src/llm_council/client.py +253 -0
- package/packages/llm-council/src/llm_council/config.py +232 -0
- package/packages/llm-council/src/llm_council/council.py +482 -0
- package/packages/llm-council/src/llm_council/models.py +46 -0
- package/packages/mcp-bibliography/MEMORY.md +31 -0
- package/packages/mcp-bibliography/_app.py +226 -0
- package/packages/mcp-bibliography/formatters.py +158 -0
- package/packages/mcp-bibliography/log/2026-03-13-2100.md +35 -0
- package/packages/mcp-bibliography/pyproject.toml +15 -0
- package/packages/mcp-bibliography/run.sh +20 -0
- package/packages/mcp-bibliography/scholarly_formatters.py +83 -0
- package/packages/mcp-bibliography/server.py +1857 -0
- package/packages/mcp-bibliography/tools/__init__.py +28 -0
- package/packages/mcp-bibliography/tools/_registry.py +19 -0
- package/packages/mcp-bibliography/tools/altmetric.py +107 -0
- package/packages/mcp-bibliography/tools/core.py +92 -0
- package/packages/mcp-bibliography/tools/dblp.py +52 -0
- package/packages/mcp-bibliography/tools/openalex.py +296 -0
- package/packages/mcp-bibliography/tools/opencitations.py +102 -0
- package/packages/mcp-bibliography/tools/openreview.py +179 -0
- package/packages/mcp-bibliography/tools/orcid.py +131 -0
- package/packages/mcp-bibliography/tools/scholarly.py +575 -0
- package/packages/mcp-bibliography/tools/unpaywall.py +63 -0
- package/packages/mcp-bibliography/tools/zenodo.py +123 -0
- package/packages/mcp-bibliography/uv.lock +711 -0
- package/scripts/setup.sh +143 -0
- package/skills/beamer-deck/SKILL.md +199 -0
- package/skills/beamer-deck/references/quality-rubric.md +54 -0
- package/skills/beamer-deck/references/review-prompts.md +106 -0
- package/skills/bib-validate/SKILL.md +261 -0
- package/skills/bib-validate/references/council-mode.md +34 -0
- package/skills/bib-validate/references/deep-verify.md +79 -0
- package/skills/bib-validate/references/fix-mode.md +36 -0
- package/skills/bib-validate/references/openalex-verification.md +45 -0
- package/skills/bib-validate/references/preprint-check.md +31 -0
- package/skills/bib-validate/references/ref-manager-crossref.md +41 -0
- package/skills/bib-validate/references/report-template.md +82 -0
- package/skills/code-archaeology/SKILL.md +141 -0
- package/skills/code-review/SKILL.md +265 -0
- package/skills/code-review/references/quality-rubric.md +67 -0
- package/skills/consolidate-memory/SKILL.md +208 -0
- package/skills/context-status/SKILL.md +126 -0
- package/skills/creation-guard/SKILL.md +230 -0
- package/skills/devils-advocate/SKILL.md +130 -0
- package/skills/devils-advocate/references/competing-hypotheses.md +83 -0
- package/skills/init-project/SKILL.md +115 -0
- package/skills/init-project-course/references/memory-and-settings.md +92 -0
- package/skills/init-project-course/references/organise-templates.md +94 -0
- package/skills/init-project-course/skill.md +147 -0
- package/skills/init-project-light/skill.md +139 -0
- package/skills/init-project-research/SKILL.md +368 -0
- package/skills/init-project-research/references/atlas-pipeline-sync.md +70 -0
- package/skills/init-project-research/references/atlas-schema.md +81 -0
- package/skills/init-project-research/references/confirmation-report.md +39 -0
- package/skills/init-project-research/references/domain-profile-template.md +104 -0
- package/skills/init-project-research/references/interview-round3.md +34 -0
- package/skills/init-project-research/references/literature-discovery.md +43 -0
- package/skills/init-project-research/references/scaffold-details.md +197 -0
- package/skills/init-project-research/templates/field-calibration.md +60 -0
- package/skills/init-project-research/templates/pipeline-manifest.md +63 -0
- package/skills/init-project-research/templates/run-all.sh +116 -0
- package/skills/init-project-research/templates/seed-files.md +337 -0
- package/skills/insights-deck/SKILL.md +151 -0
- package/skills/interview-me/SKILL.md +157 -0
- package/skills/latex/SKILL.md +141 -0
- package/skills/latex/references/latex-configs.md +183 -0
- package/skills/latex-autofix/SKILL.md +230 -0
- package/skills/latex-autofix/references/known-errors.md +183 -0
- package/skills/latex-autofix/references/quality-rubric.md +50 -0
- package/skills/latex-health-check/SKILL.md +161 -0
- package/skills/learn/SKILL.md +220 -0
- package/skills/learn/scripts/validate_skill.py +265 -0
- package/skills/lessons-learned/SKILL.md +201 -0
- package/skills/literature/SKILL.md +335 -0
- package/skills/literature/references/agent-templates.md +393 -0
- package/skills/literature/references/bibliometric-apis.md +44 -0
- package/skills/literature/references/cli-council-search.md +79 -0
- package/skills/literature/references/openalex-api-guide.md +371 -0
- package/skills/literature/references/openalex-common-queries.md +381 -0
- package/skills/literature/references/openalex-workflows.md +248 -0
- package/skills/literature/references/reference-manager-sync.md +36 -0
- package/skills/literature/references/scopus-api-guide.md +208 -0
- package/skills/literature/references/wos-api-guide.md +308 -0
- package/skills/multi-perspective/SKILL.md +311 -0
- package/skills/multi-perspective/references/computational-many-analysts.md +77 -0
- package/skills/pipeline-manifest/SKILL.md +226 -0
- package/skills/pre-submission-report/SKILL.md +153 -0
- package/skills/process-reviews/SKILL.md +244 -0
- package/skills/process-reviews/references/rr-routing.md +101 -0
- package/skills/project-deck/SKILL.md +87 -0
- package/skills/project-safety/SKILL.md +135 -0
- package/skills/proofread/SKILL.md +254 -0
- package/skills/proofread/references/quality-rubric.md +104 -0
- package/skills/python-env/SKILL.md +57 -0
- package/skills/quarto-deck/SKILL.md +226 -0
- package/skills/quarto-deck/references/markdown-format.md +143 -0
- package/skills/quarto-deck/references/quality-rubric.md +54 -0
- package/skills/save-context/SKILL.md +174 -0
- package/skills/session-log/SKILL.md +98 -0
- package/skills/shared/concept-validation-gate.md +161 -0
- package/skills/shared/council-protocol.md +265 -0
- package/skills/shared/distribution-diagnostics.md +164 -0
- package/skills/shared/engagement-stratified-sampling.md +218 -0
- package/skills/shared/escalation-protocol.md +74 -0
- package/skills/shared/external-audit-protocol.md +205 -0
- package/skills/shared/intercoder-reliability.md +256 -0
- package/skills/shared/mcp-degradation.md +81 -0
- package/skills/shared/method-probing-questions.md +163 -0
- package/skills/shared/multi-language-conventions.md +143 -0
- package/skills/shared/paid-api-safety.md +174 -0
- package/skills/shared/palettes.md +90 -0
- package/skills/shared/progressive-disclosure.md +92 -0
- package/skills/shared/project-documentation-content.md +443 -0
- package/skills/shared/project-documentation-format.md +281 -0
- package/skills/shared/project-documentation.md +100 -0
- package/skills/shared/publication-output.md +138 -0
- package/skills/shared/quality-scoring.md +70 -0
- package/skills/shared/reference-resolution.md +77 -0
- package/skills/shared/research-quality-rubric.md +165 -0
- package/skills/shared/rhetoric-principles.md +54 -0
- package/skills/shared/skill-design-patterns.md +272 -0
- package/skills/shared/skill-index.md +240 -0
- package/skills/shared/system-documentation.md +334 -0
- package/skills/shared/tikz-rules.md +402 -0
- package/skills/shared/validation-tiers.md +121 -0
- package/skills/shared/venue-guides/README.md +46 -0
- package/skills/shared/venue-guides/cell_press_style.md +483 -0
- package/skills/shared/venue-guides/conferences_formatting.md +564 -0
- package/skills/shared/venue-guides/cs_conference_style.md +463 -0
- package/skills/shared/venue-guides/examples/cell_summary_example.md +247 -0
- package/skills/shared/venue-guides/examples/medical_structured_abstract.md +313 -0
- package/skills/shared/venue-guides/examples/nature_abstract_examples.md +213 -0
- package/skills/shared/venue-guides/examples/neurips_introduction_example.md +245 -0
- package/skills/shared/venue-guides/journals_formatting.md +486 -0
- package/skills/shared/venue-guides/medical_journal_styles.md +535 -0
- package/skills/shared/venue-guides/ml_conference_style.md +556 -0
- package/skills/shared/venue-guides/nature_science_style.md +405 -0
- package/skills/shared/venue-guides/reviewer_expectations.md +417 -0
- package/skills/shared/venue-guides/venue_writing_styles.md +321 -0
- package/skills/split-pdf/SKILL.md +172 -0
- package/skills/split-pdf/methodology.md +48 -0
- package/skills/sync-notion/SKILL.md +93 -0
- package/skills/system-audit/SKILL.md +157 -0
- package/skills/system-audit/references/sub-agent-prompts.md +294 -0
- package/skills/task-management/SKILL.md +131 -0
- package/skills/update-focus/SKILL.md +204 -0
- package/skills/update-project-doc/SKILL.md +194 -0
- package/skills/validate-bib/SKILL.md +242 -0
- package/skills/validate-bib/references/council-mode.md +34 -0
- package/skills/validate-bib/references/deep-verify.md +71 -0
- package/skills/validate-bib/references/openalex-verification.md +45 -0
- package/skills/validate-bib/references/preprint-check.md +31 -0
- package/skills/validate-bib/references/report-template.md +62 -0
@@ -0,0 +1,287 @@

# The Six Audits

You perform **six distinct audits**, each producing findings that feed into your final referee report.

---

### Audit 1: Code Audit

**Purpose:** Identify coding errors, logic gaps, and implementation problems.

**Checklist:**

- [ ] **Missing value handling**: How are NAs/missing values treated in the cleaning stage? Are they dropped, imputed, or ignored? Is this documented and justified?
- [ ] **Merge diagnostics**: After any merge/join, are there checks for (a) expected row counts, (b) unmatched observations, (c) duplicates created?
- [ ] **Variable construction**: Do constructed variables (dummies, logs, interactions) match their intended definitions?
- [ ] **Loop/apply logic**: Are there off-by-one errors, incorrect indexing, or iteration over wrong dimensions?
- [ ] **Filter conditions**: Do `filter()`, `keep if`, or `[condition]` statements correctly implement the stated sample restrictions?
- [ ] **Package/function behavior**: Are functions being used correctly? (e.g., `lm()` vs `felm()` fixed effects handling)

**Action:** Document each issue with file path, line number (if applicable), and explanation of why it matters.

---
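The merge-diagnostics item above can be made mechanical rather than visual. A minimal Python/pandas sketch (the `checked_merge` helper and the toy data frames are illustrative, not part of this package's scripts):

```python
import pandas as pd

def checked_merge(left, right, on, how="left"):
    """Left merge with the three diagnostics the checklist asks for:
    (a) row counts, (b) unmatched observations, (c) duplicates."""
    n_before = len(left)
    # validate="m:1" makes pandas raise if `right` has duplicate keys (check c)
    merged = left.merge(right, on=on, how=how, indicator=True, validate="m:1")
    # (a) an m:1 left join must not change the row count
    assert len(merged) == n_before, "merge changed the row count"
    # (b) count unmatched rows explicitly instead of letting them pass silently
    n_unmatched = int((merged["_merge"] == "left_only").sum())
    return merged.drop(columns="_merge"), n_unmatched

counties = pd.DataFrame({"fips": [1, 2, 3], "y": [10.0, 11.0, 12.0]})
pop = pd.DataFrame({"fips": [1, 2], "pop": [5000, 7000]})
merged, n_unmatched = checked_merge(counties, pop, on="fips")
# one county has no population match; the script must decide explicitly
# whether to drop, impute, or investigate it
```

The analogous checks in R are `anti_join()` plus `stopifnot()` on `nrow()`; in Stata, inspecting the `_merge` variable created by `merge` plus `assert`.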

### Audit 2: Cross-Language Replication

**Purpose:** Exploit orthogonality of hallucination errors across languages to catch bugs through independent replication.

**Protocol:**

1. **Identify the primary language** of the analysis (R, Stata, or Python)
2. **Create replication scripts** in the other two languages:
   - If primary is **R** → create Stata and Python replication scripts
   - If primary is **Stata** → create R and Python replication scripts
   - If primary is **Python** → create R and Stata replication scripts
3. **Name replication scripts clearly:**
   ```
   code/replication/
   ├── referee2_replicate_main_results.do   # Stata replication
   ├── referee2_replicate_main_results.R    # R replication
   ├── referee2_replicate_main_results.py   # Python replication
   ├── referee2_replicate_event_study.do
   ├── referee2_replicate_event_study.R
   └── ...
   ```
4. **Run all three implementations** and compare results:
   - Point estimates must match to 6+ decimal places
   - Standard errors must match (accounting for degrees of freedom conventions)
   - Sample sizes must be identical
   - Any constructed variables (residuals, fitted values, etc.) must match

**What discrepancies reveal:**

- **Different point estimates**: Likely a coding error in one implementation
- **Different standard errors**: Check clustering, robust SE specifications, or DoF adjustments
- **Different sample sizes**: Check missing value handling, merge behavior, or filter conditions
- **Different significance levels**: Usually a standard error issue

**Deliverable:**

1. Named replication scripts saved to `code/replication/`
2. A comparison table showing results from all three languages, with discrepancies highlighted and diagnosed

---
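The three-way comparison in step 4 can itself be scripted. A sketch under the assumption that each language's replication script exports its headline numbers (all values below are toy illustrations, not real estimates):

```python
import math

# Toy results keyed by statistic; in practice these would be read from
# files exported by each language's replication script.
results = {
    "R":      {"beta": 0.4821379, "se": 0.1034562, "n": 12480},
    "Stata":  {"beta": 0.4821379, "se": 0.1034562, "n": 12480},
    "Python": {"beta": 0.4821380, "se": 0.1034562, "n": 12480},
}

def compare(results, tol=1e-6):
    """Return (stat, lang_a, lang_b, diff) for every pair that disagrees."""
    flags = []
    langs = sorted(results)
    for i, a in enumerate(langs):
        for b in langs[i + 1:]:
            for stat in results[a]:
                x, y = results[a][stat], results[b][stat]
                if stat == "n":
                    ok = x == y                      # sample sizes: exact match
                else:
                    ok = math.isclose(x, y, abs_tol=tol)  # 6+ decimal places
                if not ok:
                    flags.append((stat, a, b, abs(x - y)))
    return flags

discrepancies = compare(results)        # empty: implementations agree
results["Python"]["n"] = 12479          # simulate a sample-size discrepancy
flagged = compare(results)              # now flags Python against R and Stata
```

A non-empty flag list maps directly onto the "what discrepancies reveal" diagnosis table above.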

### Audit 3: Directory & Replication Package Audit

**Purpose:** Ensure the project is organized for eventual public release as a replication package.

**Checklist:**

- [ ] **Folder structure**: Is there clear separation between `/data/raw`, `/data/clean`, `/code`, `/output`, `/docs`?
- [ ] **Relative paths**: Are ALL file paths relative to the project root? Absolute paths (`C:\Users\...` or `/Users/scott/...`) are automatic failures.
- [ ] **Naming conventions**:
  - Variables: Are names informative? (`treatment_intensity` not `x1`)
  - Datasets: Do names reflect contents? (`county_panel_2000_2020.dta` not `data2.dta`)
  - Scripts: Is execution order clear? (`01_clean.R`, `02_merge.R`, `03_estimate.R`)
- [ ] **Master script**: Is there a single script that runs the entire pipeline from raw data to final output?
- [ ] **README**: Does `/code/README.md` explain how to run the replication?
- [ ] **Dependencies**: Are required packages/libraries documented with versions?
- [ ] **Seeds**: Are random seeds set for any stochastic procedures?

**Scoring:** Assign a replication readiness score (1-10) with specific deficiencies noted.

---
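The relative-path item lends itself to automation. A rough Python scan (the regex covers only the two absolute-path styles named above and will miss others; treat it as a first pass, not a proof of compliance):

```python
import re

# Quoted Windows drive paths and Unix home paths, the two styles the
# checklist names as automatic failures.
ABSOLUTE_PATH = re.compile(r"""["']([A-Za-z]:\\|/Users/|/home/)""")

def flag_absolute_paths(script_text):
    """Return (line_number, line) pairs containing hardcoded absolute paths."""
    return [
        (i, line.strip())
        for i, line in enumerate(script_text.splitlines(), start=1)
        if ABSOLUTE_PATH.search(line)
    ]

script = 'df <- read.csv("/Users/scott/project/data/raw/panel.csv")\n' \
         'out <- read.csv("data/raw/panel.csv")'
hits = flag_absolute_paths(script)   # flags line 1 only
```

Running this over every file in `/code` gives the referee report a concrete list of violations with line numbers.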

### Audit 4: Output Automation Audit

**Purpose:** Verify that tables and figures are programmatically generated, not manually created.

**Checklist:**

- [ ] **Tables**: Are regression tables generated by code (e.g., `stargazer`, `esttab`, `statsmodels`)? Or are they manually typed into LaTeX/Word?
- [ ] **Figures**: Are figures saved programmatically (e.g., `ggsave()`, `graph export`, `plt.savefig()`)? Or are they manually exported?
- [ ] **In-text numbers**: Are key statistics (N, means, coefficients mentioned in text) pulled programmatically or hardcoded?
- [ ] **Reproducibility test**: If you re-run the code, do you get *exactly* the same outputs (byte-identical files)?

**Deductions:**

- Manual table entry: Major concern
- Manual figure export: Minor concern
- Hardcoded in-text statistics: Major concern
- Non-reproducible outputs: Major concern

---
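The byte-identical test can be checked with file hashes instead of eyeballing outputs. A self-contained Python sketch (the `run_pipeline` stub stands in for the project's real pipeline):

```python
import hashlib
import os
import tempfile

def file_digest(path):
    """SHA-256 of a file's bytes; equal digests mean byte-identical outputs."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def run_pipeline(outdir, run):
    """Stand-in for the project's pipeline writing one output table."""
    path = os.path.join(outdir, f"table1_run{run}.tex")
    with open(path, "w") as f:
        f.write("beta & 0.482 \\\\\n")   # deterministic output
    return path

with tempfile.TemporaryDirectory() as d:
    digest1 = file_digest(run_pipeline(d, 1))
    digest2 = file_digest(run_pipeline(d, 2))
    byte_identical = digest1 == digest2
```

One caveat: formats that embed timestamps (PDFs often do) can fail byte-identity even when substantively identical; in that case compare the underlying `.tex`/`.csv` outputs instead.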

### Audit 5: Empirical Methods Audit

**Purpose:** Verify that the analytical methods — whatever they are — are coherent, correctly implemented, and properly interpreted. This audit adapts to the paper's methodology.

**Step 1: Identify the methodological paradigm.** Before applying any checklist, determine which approach(es) the paper uses:

| Paradigm | Examples |
|----------|----------|
| **Causal inference / Econometrics** | DiD, IV, RDD, RCT, synthetic control, matching |
| **Experiments (lab/online)** | Randomized experiments, A/B tests, within-subjects designs |
| **Computational modelling / Simulation** | Agent-based models, Monte Carlo, optimization, game theory |
| **Machine learning / NLP** | Classification, prediction, LLMs, embeddings, fine-tuning |
| **Survey / Psychometrics** | Likert scales, SEM, factor analysis, conjoint |
| **Qualitative / Mixed methods** | Interviews, case studies, thematic analysis, mixed designs |
| **MCDM / Decision analysis** | AHP, TOPSIS, PROMETHEE, multi-objective optimization |
| **Theoretical / Mathematical** | Proofs, axioms, mechanism design, formal models |

**Step 2: Apply the paradigm-specific checklist(s).** Most papers use one primary paradigm; some combine multiple. Apply ALL relevant checklists.

---

#### 5A. Causal Inference / Econometrics

- [ ] **Identification strategy**: Is the source of variation clearly stated? Is it plausible?
- [ ] **Estimating equation**: Does the code implement what the paper/documentation claims?
- [ ] **Standard errors**: Clustered at the appropriate level? Sufficient clusters (>50)? Heteroskedasticity addressed?
- [ ] **Fixed effects**: Correct? Collinear with treatment?
- [ ] **Controls**: Appropriate? Any "bad controls" (post-treatment variables)?
- [ ] **Sample definition**: Who is in and why? Restrictions justified?
- [ ] **Parallel trends** (if DiD): Pre-trends evidence? Pre-treatment tests?
- [ ] **First stage** (if IV): Shown? F-statistic reported?
- [ ] **Balance** (if RCT/RD): Balance tests shown?
- [ ] **Magnitude plausibility**: Effect size reasonable given priors?

#### 5B. Experiments (Lab / Online)

- [ ] **Randomisation**: Properly implemented? Stratified? Block-randomized?
- [ ] **Power analysis**: Was a pre-registered power analysis conducted? Is the sample large enough?
- [ ] **Pre-registration**: Is the experiment pre-registered? Do analyses match the pre-analysis plan?
- [ ] **Demand effects / Experimenter bias**: Could participants guess the hypothesis? Blinding?
- [ ] **Manipulation checks**: Do they verify the treatment worked as intended?
- [ ] **Attention checks**: Were inattentive participants filtered?
- [ ] **Multiple comparisons**: Are p-values corrected for multiple testing?
- [ ] **Effect sizes**: Reported alongside significance? Cohen's d or equivalent?
- [ ] **Ecological validity**: How well does the lab setting map to the real world?
- [ ] **Attrition**: Differential dropout between conditions?
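For the multiple-comparisons item, a small Python sketch of Holm's step-down correction, one common choice among several valid procedures (the p-values below are hypothetical):

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Holm's step-down correction: a reject/keep flag per hypothesis."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break   # once one test fails, all larger p-values fail too
    return reject

# Four outcome tests from a hypothetical experiment
pvals = [0.010, 0.040, 0.030, 0.005]
decisions = holm_bonferroni(pvals)
```

Note that 0.030 and 0.040 would each pass an uncorrected 0.05 threshold but fail after correction, which is exactly the case this checklist item is meant to catch.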

#### 5C. Computational Modelling / Simulation

- [ ] **Model specification**: Are assumptions clearly stated and justified?
- [ ] **Parameter calibration**: Where do parameter values come from? Empirically grounded or arbitrary?
- [ ] **Sensitivity analysis**: How sensitive are results to parameter choices?
- [ ] **Convergence**: Do simulations converge? How many iterations/runs?
- [ ] **Seed reproducibility**: Are random seeds set and reported?
- [ ] **Validation**: Is the model validated against known data or analytical solutions?
- [ ] **Boundary conditions**: Are edge cases and extreme parameter values tested?
- [ ] **Computational correctness**: Does the code implement the stated model?
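Seed reproducibility and validation against a known answer can be spot-checked together. A minimal Monte Carlo sketch in Python, estimating pi as a stand-in for the paper's actual simulation:

```python
import random

def estimate_pi(n_draws, seed):
    """Monte Carlo estimate of pi; the explicit seed makes the run repeatable."""
    rng = random.Random(seed)   # local generator, no global-state leakage
    inside = sum(
        1 for _ in range(n_draws)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4 * inside / n_draws

# Same seed, same result: the reproducibility property the checklist asks for;
# closeness to pi is the validation against a known analytical value.
a = estimate_pi(100_000, seed=42)
b = estimate_pi(100_000, seed=42)
```

If re-running with the reported seed does not reproduce the paper's numbers, either the seed is not actually controlling all randomness or the code has changed since the results were generated.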

#### 5D. Machine Learning / NLP

- [ ] **Train/test split**: Proper held-out test set? No data leakage?
- [ ] **Baselines**: Are appropriate baselines compared?
- [ ] **Metrics**: Are evaluation metrics appropriate for the task and data distribution?
- [ ] **Hyperparameter tuning**: Described? Separate validation set used?
- [ ] **Cross-validation**: K-fold or equivalent?
- [ ] **Statistical significance**: Are differences between models tested? Confidence intervals?
- [ ] **Ablation**: Are component contributions isolated?
- [ ] **Data contamination**: Could training data overlap with test data or evaluation benchmarks?
- [ ] **Prompt sensitivity** (if LLM): Are results robust to prompt variation?
- [ ] **Reproducibility**: Model weights, code, data shared?
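Leakage is often easiest to catch at the split itself. A Python sketch of a group-aware split with an explicit leakage assertion (the firm-year records are toy data; the helper is illustrative):

```python
import random

def split_by_group(records, group_key, test_frac=0.2, seed=0):
    """Group-aware split: all rows for one entity land on one side,
    so correlated rows from the same entity cannot leak across the split."""
    groups = sorted({r[group_key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, int(len(groups) * test_frac))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r[group_key] not in test_groups]
    test = [r for r in records if r[group_key] in test_groups]
    # Leakage check: no entity appears in both partitions
    assert not ({r[group_key] for r in train} & {r[group_key] for r in test})
    return train, test

# Toy panel: five firms observed in two years each
records = [{"firm": f, "year": y} for f in "ABCDE" for y in (2019, 2020)]
train, test = split_by_group(records, "firm")
```

A naive row-level random split on this panel would put a firm's 2019 row in train and its 2020 row in test, which is precisely the contamination the checklist asks about.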

#### 5E. Survey / Psychometrics

- [ ] **Construct validity**: Are scales validated? Cronbach's alpha or equivalent reported?
- [ ] **Sampling**: Probability vs convenience sample? Representativeness discussed?
- [ ] **Response rate**: Reported? Non-response bias addressed?
- [ ] **Common method bias**: Single vs multiple sources? Harman's test or marker variable?
- [ ] **Scale anchoring**: Are Likert scales appropriately anchored and labeled?
- [ ] **Factor structure**: Exploratory/confirmatory factor analysis conducted?
- [ ] **Social desirability**: Could responses be biased by self-presentation?
- [ ] **Missing data**: How handled? MCAR/MAR/MNAR assessment?
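The reliability half of the construct-validity item reduces to computing and judging a coefficient. A dependency-free Python sketch of Cronbach's alpha (the respondent scores are made up for illustration):

```python
def variance(xs):
    """Sample variance with the n-1 denominator."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: one list of respondent scores per scale item.
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Three Likert items answered by five hypothetical respondents
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 4, 2, 4, 3],
]
alpha = cronbach_alpha(items)
```

A conventional reading treats alpha above roughly 0.7 as acceptable for an established scale, but the referee should check that the paper reports it per scale, not just once overall.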

#### 5F. Qualitative / Mixed Methods

- [ ] **Research design**: Is the qualitative approach (grounded theory, case study, etc.) clearly stated?
- [ ] **Sampling logic**: Theoretical/purposive sampling justified?
- [ ] **Data saturation**: Addressed? How determined?
- [ ] **Coding process**: Inter-rater reliability? Codebook provided?
- [ ] **Reflexivity**: Researcher positionality discussed?
- [ ] **Triangulation**: Multiple data sources or methods?
- [ ] **Integration** (if mixed): How are qualitative and quantitative components integrated?
- [ ] **Transferability**: Is the scope of generalization appropriate?
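The inter-rater reliability part of the coding-process item can be quantified with Cohen's kappa, which corrects raw agreement for chance. A small Python sketch (the two coders and their labels are hypothetical):

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(
        (freq_a[c] / n) * (freq_b[c] / n)
        for c in set(codes_a) | set(codes_b)
    )
    return (observed - expected) / (1 - expected)

# Two hypothetical coders labelling ten interview segments
coder1 = ["barrier", "barrier", "enabler", "barrier", "enabler",
          "barrier", "enabler", "enabler", "barrier", "barrier"]
coder2 = ["barrier", "barrier", "enabler", "enabler", "enabler",
          "barrier", "enabler", "enabler", "barrier", "barrier"]
kappa = cohens_kappa(coder1, coder2)
```

Here raw agreement is 90% but kappa is 0.8, because half of the agreement would be expected by chance given the coders' label frequencies.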

#### 5G. MCDM / Decision Analysis

- [ ] **Criteria selection**: How were criteria chosen? Justified?
- [ ] **Weighting method**: Appropriate? Sensitivity to weight changes?
- [ ] **Normalization**: Method appropriate for the data type?
- [ ] **Rank reversal**: Tested? Method known to be susceptible?
- [ ] **Consistency**: (if AHP) Consistency ratio reported and acceptable?
- [ ] **Alternatives**: Is the set of alternatives appropriate and complete?
- [ ] **Stakeholder involvement**: Are decision-makers involved in the process?
- [ ] **Comparison**: Are results compared across multiple MCDM methods?
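The AHP consistency item can be recomputed independently of the authors' software. A dependency-free Python approximation using geometric-mean weights rather than the exact principal eigenvector (the random-index values are Saaty's, listed here only up to n = 5):

```python
import math

RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}  # Saaty's random index

def ahp_consistency_ratio(A):
    """Approximate consistency ratio of a pairwise-comparison matrix.
    Weights from normalized geometric row means; lambda_max from (A w)_i / w_i."""
    n = len(A)
    gm = [math.prod(row) ** (1 / n) for row in A]
    w = [g / sum(gm) for g in gm]
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam_max = sum(Aw[i] / w[i] for i in range(n)) / n
    ci = (lam_max - n) / (n - 1)
    return ci / RI[n] if RI[n] else 0.0

# A perfectly consistent 3x3 matrix (a_ij = w_i / w_j) should give CR near 0
A = [[1,   2,   4],
     [1/2, 1,   2],
     [1/4, 1/2, 1]]
cr = ahp_consistency_ratio(A)
```

The conventional acceptance threshold is CR below about 0.10; a paper reporting AHP results without a consistency ratio is a checklist failure on its own.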

#### 5H. Theoretical / Mathematical

- [ ] **Assumptions**: Clearly stated? Reasonable? Necessary?
- [ ] **Proof correctness**: Are proofs logically valid? Any gaps?
- [ ] **Generality**: How general are the results? What breaks if assumptions are relaxed?
- [ ] **Constructiveness**: Are existence proofs constructive? Can the results be computed?
- [ ] **Relation to existing results**: Are results properly compared to existing theorems?
- [ ] **Examples/counterexamples**: Are the results illustrated?

---

**Deliverable:** List of methodological concerns with severity ratings, organized by the relevant paradigm checklist(s).

---
|
|
221
|
+
|
|
222
|
+
### Audit 6: Novelty & Literature Assessment
|
|
223
|
+
|
|
224
|
+
**Purpose:** Independently verify that the paper's claimed contributions are genuinely novel and correctly positioned relative to the existing literature. This audit catches the case where the user's work unknowingly overlaps with or has been pre-empted by existing papers.
|
|
225
|
+
|
|
226
|
+
**Why this matters for your own work:** It is easy to be blind to competing work, especially recent preprints, concurrent submissions, or papers in adjacent fields. A hostile reviewer WILL find these. Better to find them yourself first.
|
|
227
|
+
|
|
228
|
+
**Protocol:**
|
|
229
|
+
|
|
230
|
+
1. **Extract claimed contributions**: Identify every explicit contribution claim in the paper (typically in the introduction and conclusion). Record the exact language and page references.
|
|
231
|
+
|
|
232
|
+
2. **Launch a Novelty & Literature sub-agent** using the Task tool with `subagent_type: general-purpose`. Provide it with:
|
|
233
|
+
- The paper's exact claimed contributions (with page references)
|
|
234
|
+
- The research question
|
|
235
|
+
- The key methods used
|
|
236
|
+
- The field/domain
|
|
237
|
+
- The papers the author cites as most closely related
|
|
238
|
+
|
|
239
|
+
The sub-agent's task is to independently search the literature for:
|
|
240
|
+
- Papers that have already made the **same** contribution (pre-empting)
|
|
241
|
+
- Papers that have made a **very similar** contribution in a different context
|
|
242
|
+
- Concurrent/simultaneous work making the same point
|
|
243
|
+
- Papers the author **should have cited** but didn't
|
|
244
|
+
- Entire literature streams the author may have overlooked
|
|
245
|
+
|
|
246
|
+
3. **Classify each contribution:**
|
|
247
|
+
|
|
248
|
+
| Level | Symbol | Meaning |
|
|
249
|
+
|-------|--------|---------|
|
|
250
|
+
| **Novel** | 🟢 | No prior work found that pre-empts this |
|
|
251
|
+
| **Incremental** | 🟡 | Prior work exists in a different context; this extends it |
|
|
252
|
+
| **Overlapping** | 🟠 | Substantial overlap with existing work; unclear what is truly new |
|
|
253
|
+
| **Pre-empted** | 🔴 | An existing paper has already made this contribution |

4. **Assess literature positioning:**
   - Is the literature review adequate for the target venue?
   - Are the most relevant competitors cited and clearly differentiated?
   - Are there important omissions that a reviewer would flag?
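When the audit tracks many contributions, the step-3 rubric can be kept as structured data so the deliverable table below is generated consistently. A minimal sketch; the `Contribution` type and its field names are illustrative, not part of any agent toolkit:

```python
from dataclasses import dataclass

# Rubric from the classification table above: level -> (symbol, meaning)
NOVELTY_LEVELS = {
    "novel":       ("🟢", "No prior work found that pre-empts this"),
    "incremental": ("🟡", "Prior work exists in a different context; this extends it"),
    "overlapping": ("🟠", "Substantial overlap with existing work; unclear what is truly new"),
    "pre-empted":  ("🔴", "An existing paper has already made this contribution"),
}

@dataclass
class Contribution:
    claim: str          # exact language from the paper
    page: str           # page reference
    level: str          # one of NOVELTY_LEVELS
    closest_prior: str  # closest prior paper found by the sub-agent
    gap: str            # what is different from that prior work

    def table_row(self) -> str:
        """Render one row of the deliverable's per-contribution table."""
        symbol, _ = NOVELTY_LEVELS[self.level]
        return f"| {self.claim} | {symbol} | {self.closest_prior} | {self.gap} |"
```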

**Red flags:**
- The author avoids citing the most directly relevant prior work
- The "contribution" is a methodological tweak with no new substantive insight
- The literature review cites only tangentially related work, not direct competitors
- The contribution boils down to "we did X but with different data" without theoretical justification for why the new context matters

**Deliverable:**

```markdown
### Novelty Assessment

**Overall verdict:** [Novel / Incremental / Overlapping / Pre-empted]

| Claimed Contribution | Novelty | Key Prior Work | Gap |
|---------------------|---------|---------------|-----|
| [Contribution 1] | 🟢/🟡/🟠/🔴 | [Closest paper] | [What's different] |

### Missing Citations
[Papers that should be cited but aren't]

### Literature Gaps
[Streams of literature the paper overlooks]

### Positioning Recommendation
[How to sharpen the contribution claim]
```

**Important:** If the sub-agent finds pre-empting work (🔴), this is a **major concern** and should be flagged prominently. It is far better to discover this during self-review than to have a referee point it out.
# Referee 2 Report Template & Filing

## Output Format: The Referee Report

Produce a formal referee report with this structure:

```
=================================================================
REFEREE REPORT
[Project Name] — Round [N]
Date: YYYY-MM-DD
=================================================================

## Summary

[2-3 sentences: What was audited? What is the overall assessment?]

---

## Audit 1: Code Audit

### Findings
[Numbered list of issues found]

### Missing Value Handling Assessment
[Specific assessment of how missing values are treated]

---

## Audit 2: Cross-Language Replication

### Replication Scripts Created
- `code/replication/referee2_replicate_[name].do`
- `code/replication/referee2_replicate_[name].R`
- `code/replication/referee2_replicate_[name].py`

### Comparison Table

| Specification | R | Stata | Python | Match? |
|--------------|---|-------|--------|--------|
| Main estimate | X.XXXXXX | X.XXXXXX | X.XXXXXX | Yes/No |
| SE | X.XXXXXX | X.XXXXXX | X.XXXXXX | Yes/No |
| N | X | X | X | Yes/No |

### Discrepancies Diagnosed
[If any mismatches, explain the likely cause and which implementation is correct]

---

## Audit 3: Directory & Replication Package

### Replication Readiness Score: X/10

### Deficiencies
[Numbered list]

---

## Audit 4: Output Automation

### Tables: [Automated / Manual / Mixed]
### Figures: [Automated / Manual / Mixed]
### In-text statistics: [Automated / Manual / Mixed]

### Deductions
[List any issues]

---

## Audit 5: Empirical Methods ([paradigm(s) identified])

### Method Assessment
[Is the approach appropriate and correctly implemented?]

### Specification / Design Issues
[Numbered list of concerns from the relevant paradigm checklist(s)]

---

## Audit 6: Novelty & Literature

### Overall Novelty Verdict: [Novel / Incremental / Overlapping / Pre-empted]

### Per-Contribution Assessment

| Claimed Contribution | Novelty | Key Prior Work | Gap |
|---------------------|---------|---------------|-----|
| [Contribution 1] | 🟢/🟡/🟠/🔴 | [Closest paper] | [What's different] |

### Missing Citations
[Papers that should be cited but aren't]

### Literature Gaps
[Streams of literature the paper overlooks]

### Positioning Recommendation
[How to sharpen the contribution claim]

---

## Research Quality Scorecard

Load `skills/shared/research-quality-rubric.md` and score all 8 dimensions (1-5). If the paper targets a specific venue, also read `skills/shared/venue-guides/reviewer_expectations.md` to calibrate your critique to that venue's reviewer priorities.

| Dimension | Weight | Score | Notes |
|-----------|--------|-------|-------|
| Problem Formulation | 15% | /5 | |
| Literature Review | 15% | /5 | |
| Methodology | 20% | /5 | |
| Data Collection | 10% | /5 | |
| Analysis | 15% | /5 | |
| Results | 10% | /5 | |
| Writing | 10% | /5 | |
| Citations | 5% | /5 | |
| **Weighted Total** | | **/5** | |

**Verdict:** [Exceptional / Strong / Good / Acceptable / Weak]

---

## Major Concerns
[Numbered list — MUST be addressed before acceptance]

1. **[Short title]**: [Detailed explanation and why it matters]

## Minor Concerns
[Numbered list — should be addressed]

1. **[Short title]**: [Explanation]

## Questions for Authors
[Things requiring clarification]

---

## Verdict

[ ] Accept
[ ] Minor Revisions
[ ] Major Revisions
[ ] Reject

**Justification:** [Brief explanation]

---

## Recommendations
[Prioritized list of what the author should do before resubmission]

=================================================================
END OF REFEREE REPORT
=================================================================
```
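The **Weighted Total** row in the scorecard is just the weight-by-score sum. A sketch of the arithmetic, using the weights from the template above (the function name is illustrative):

```python
# Weights from the Research Quality Scorecard (they sum to 100%)
WEIGHTS = {
    "Problem Formulation": 0.15,
    "Literature Review":   0.15,
    "Methodology":         0.20,
    "Data Collection":     0.10,
    "Analysis":            0.15,
    "Results":             0.10,
    "Writing":             0.10,
    "Citations":           0.05,
}

def weighted_total(scores: dict) -> float:
    """Weighted average of 1-5 dimension scores; every dimension must be scored."""
    assert set(scores) == set(WEIGHTS), "score every dimension"
    return round(sum(WEIGHTS[d] * s for d, s in scores.items()), 2)

# Example: all 4s gives a weighted total of 4.0
print(weighted_total({d: 4 for d in WEIGHTS}))  # 4.0
```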

---

## Filing the Referee Report

After completing your audit and replication, you produce **two deliverables**:

### 1. The Referee Report (Markdown)

**Location:** `[project_root]/reviews/referee2-reviewer/YYYY-MM-DD_round[N]_report.md`

The detailed written report with all findings, comparison tables, and recommendations.

### 2. The Referee Report Deck (Beamer/PDF)

**Location:** `[project_root]/reviews/referee2-reviewer/YYYY-MM-DD_round[N]_deck.tex` (and compiled `.pdf`)

A presentation deck that **visualizes** the audit findings. The markdown report provides the detailed written record; the deck helps the author **understand** the problems through tables and figures.

---

#### The Deck Follows the Rhetoric of Decks

This deck must follow the same principles as any good presentation:

1. **MB/MC Equivalence**: Every slide should have the same marginal benefit to marginal cost ratio. No slide should be cognitively overwhelming; no slide should be trivial filler.

2. **Beautiful Tables**: Cross-language comparison tables should be properly formatted with:
   - Clear headers
   - Aligned decimal points
   - Visual indicators (checkmark/cross or color) for match/mismatch
   - Consistent precision (6 decimal places for point estimates)

3. **Beautiful Figures**: Where appropriate, visualize findings:
   - Bar charts comparing estimates across languages
   - Heatmaps showing which specifications match/mismatch
   - Progress bars for scores (replication readiness, automation)
   - Coefficient plots if comparing multiple specifications

4. **Titles Are Assertions**: Slide titles should state the finding, not describe the content:
   - GOOD: "Python implementation differs by 0.003 on main specification"
   - BAD: "Cross-language comparison results"

5. **No Compilation Warnings**: Fix ALL overfull/underfull hbox warnings. The deck must compile cleanly.

6. **Check Positioning**: Verify that:
   - Table/figure labels are positioned correctly
   - TikZ coordinates are where you intend
   - Text doesn't overflow frames
   - Fonts are readable
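The table conventions in item 2 (six-decimal precision, one match flag per row) can be generated rather than typed. A sketch, assuming estimates are collected as plain floats; the equality tolerance is an assumption, not a prescribed value:

```python
def comparison_row(spec: str, r: float, stata: float, python: float,
                   tol: float = 1e-6) -> str:
    """Format one cross-language comparison row with a match indicator."""
    vals = (r, stata, python)
    match = max(vals) - min(vals) <= tol   # all three agree within tolerance
    flag = "Yes" if match else "No"
    cells = " | ".join(f"{v:.6f}" for v in vals)  # consistent 6-decimal precision
    return f"| {spec} | {cells} | {flag} |"

print(comparison_row("Main estimate", 0.042315, 0.042315, 0.042312))
# -> | Main estimate | 0.042315 | 0.042315 | 0.042312 | No |
```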

---

#### Deck Structure

| Slide | Content |
|-------|---------|
| 1 | **Title**: Project name, "Referee Report — Round N", date |
| 2 | **Executive Summary**: Verdict + 3-4 key findings in bullet form |
| 3-5 | **Cross-Language Replication**: Comparison tables showing R/Stata/Python results side-by-side. One slide per major specification. Highlight discrepancies. |
| 6 | **Replication Discrepancies Diagnosed**: If mismatches found, explain likely causes with evidence |
| 7 | **Replication Readiness Score**: Visual scorecard (X/10) with checklist |
| 8 | **Code Audit Findings**: Severity breakdown (N major, N minor) with top concerns listed |
| 9 | **Methods Assessment**: Key specification/design concerns from the relevant paradigm checklist |
| 10 | **Novelty & Literature**: Contribution novelty ratings, missing citations, positioning |
| 11 | **Output Automation**: Checklist of what's automated vs manual |
| 12 | **Recommendations**: Prioritized action items for resubmission |

Adjust slide count based on findings — more slides if more discrepancies to show, fewer if the audit is clean.

---

#### Compilation Requirements

Before filing the deck:

1. **Always compile to the `out/` subdirectory**: use `latexmk -pdf -outdir=out <file>.tex`
2. **Copy the final PDF back** to the source directory: `cp out/<file>.pdf .`
3. **Never leave build artifacts** (`.aux`, `.log`, `.fls`, `.fdb_latexmk`, `.nav`, `.snm`, `.toc`, `.out`) in the source directory — they belong in `out/`
4. **Compile with no errors**
5. **Fix ALL warnings** — overfull hbox, underfull hbox, font substitutions
6. **Visual inspection**: Open the PDF and verify:
   - Tables are centered and readable
   - Figures don't overflow
   - TikZ elements are positioned correctly
   - No text is cut off
7. **Re-compile** after any fixes (again to `out/`)
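Steps 1 and 2 of the checklist above can be scripted so every deck build is identical. A sketch assuming `latexmk` is on the PATH; `compile_deck` is an illustrative helper, not an existing tool:

```python
import shutil
import subprocess
from pathlib import Path

def latexmk_cmd(tex_name: str, outdir: str = "out") -> list[str]:
    """Build the compile command from step 1."""
    return ["latexmk", "-pdf", f"-outdir={outdir}", tex_name]

def compile_deck(tex: str) -> Path:
    """Compile into out/ (step 1) and copy the final PDF back (step 2)."""
    src = Path(tex)
    out = src.parent / "out"
    out.mkdir(exist_ok=True)
    # Build inside out/ so .aux/.log/.fls etc. never litter the source directory
    subprocess.run(latexmk_cmd(src.name), cwd=src.parent, check=True)
    pdf = out / src.with_suffix(".pdf").name
    return Path(shutil.copy(pdf, src.parent))
```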

---

#### Files Produced

- `reviews/referee2-reviewer/YYYY-MM-DD_round1_report.md` — Detailed written report
- `reviews/referee2-reviewer/YYYY-MM-DD_round1_deck.tex` — LaTeX source
- `reviews/referee2-reviewer/YYYY-MM-DD_round1_deck.pdf` — Compiled presentation

The markdown and deck go hand in hand: the markdown is the permanent written record; the deck is how the author reviews and understands the audit findings.

The report does NOT go into `CLAUDE.md`. It is a standalone document that the author will read and respond to.

---

## The Revise & Resubmit Process

### Round 1: Initial Submission

1. Author completes analysis in their main Claude session
2. The Referee 2 agent is launched (via the Task tool) to audit the project
3. Referee 2 performs all six audits, creates replication scripts, and files the referee report
4. Agent returns findings

### Author Response to Round 1

The author reads the referee report and must:

1. **For each Major Concern**: Either FIX it or JUSTIFY why not (with detailed reasoning)
2. **For each Minor Concern**: Either FIX it or ACKNOWLEDGE and explain deprioritization
3. **Answer all Questions for Authors**
4. **Describe code changes made** (what files, what changes)
5. **File response** at: `reviews/referee2-reviewer/YYYY-MM-DD_round1_response.md`

**Response format:**

```
=================================================================
AUTHOR RESPONSE TO REFEREE REPORT
Round 1 — Date: YYYY-MM-DD
=================================================================

## Response to Major Concerns

### Major Concern 1: [Title]
**Action taken:** [Fixed / Justified]
[Detailed explanation of fix OR justification for not fixing]

### Major Concern 2: [Title]
...

## Response to Minor Concerns

### Minor Concern 1: [Title]
**Action taken:** [Fixed / Acknowledged]
[Brief explanation]

...

## Answers to Questions

### Question 1
[Answer]

...

## Summary of Code Changes

| File | Change |
|------|--------|
| `code/01_clean.R` | Fixed missing value handling on line 47 |
| ... | ... |

=================================================================
```

### Round 2+: Revision Review

1. The Referee 2 agent is launched again with instructions to read:
   - The original referee report (`round1_report.md`)
   - The author response (`round1_response.md`)
   - The revised code
2. Referee 2 re-runs all six audits
3. Referee 2 assesses whether concerns were adequately addressed:
   - **Fixed**: Remove from concerns
   - **Justified**: Accept justification OR push back if unconvincing
   - **Ignored**: Flag and escalate
   - **New issues introduced**: Add to concerns
4. Referee 2 files the Round 2 report at `reviews/referee2-reviewer/YYYY-MM-DD_round2_report.md`

### Termination

The process continues until:
- The verdict is **Accept** or **Minor Revisions** (with minor revisions being addressable without re-review)
- OR Referee 2 recommends **Reject** with justification
# Rule: Design Before Results

## Principle

**Lock the research design before examining point estimates.** Specify the estimand, identification strategy, and analysis plan before looking at results. This prevents post-hoc rationalization and keeps the research credible.

## When This Applies

- Writing or reviewing estimation code (regression specifications, simulation parameters)
- Discussing identification strategy or research design
- Setting up robustness checks or sensitivity analyses
- Choosing between competing econometric/statistical approaches
- Reviewing meeting notes where research design was discussed

## When to Skip

- Read-only tasks (proofreading, code archaeology, literature search)
- Documentation and context updates
- Quick mode / exploratory data analysis (EDA)
- Descriptive statistics and data exploration (before the design phase)
- Tasks with no empirical component (pure theory, teaching prep)

## What This Means in Practice

**DO:**
- Specify the estimand before writing any estimation code
- Write down identifying assumptions before running the first regression
- Pre-commit to a main specification before examining coefficient estimates
- Define "success" criteria for simulations before running them
- Document the analysis plan in `.context/project-recap.md` or the paper's methodology section

**DON'T:**
- Run a regression and then decide what the estimand is based on which coefficients are significant
- Choose between OLS, IV, and DiD based on which gives the "best" results
- Add or drop control variables to get a desired p-value
- Modify simulation parameters after seeing initial results without documenting why
- Present robustness checks only for specifications that "work"

## The Falsifiability Test

Before running any analysis, ask:

> "If I specified my analysis plan before running it, would I change anything about what I'm about to do?"

If the answer is yes, stop and specify the plan first. If the answer is "I don't have a plan yet", write one before proceeding.

## How to Apply This

1. When the user asks to "run the regression" or "check the results", first confirm the specification is locked
2. If no analysis plan exists, draft one and get approval before executing
3. When results are surprising, document the surprise *before* changing the specification
4. Treat specification changes after seeing results as a new analysis requiring justification
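One lightweight way to make the "locked" state of step 1 verifiable is to fingerprint the written analysis plan before any results are run, so later specification edits are detectable. A sketch, assuming the plan lives in a file; the paths and helper names are illustrative:

```python
import hashlib
from pathlib import Path

def lock_plan(plan_path: str, lockfile: str = ".context/plan.lock") -> str:
    """Record a SHA-256 fingerprint of the analysis plan before running results."""
    digest = hashlib.sha256(Path(plan_path).read_bytes()).hexdigest()
    Path(lockfile).parent.mkdir(parents=True, exist_ok=True)
    Path(lockfile).write_text(digest + "\n")
    return digest

def plan_changed(plan_path: str, lockfile: str = ".context/plan.lock") -> bool:
    """True if the plan was edited after it was locked."""
    digest = hashlib.sha256(Path(plan_path).read_bytes()).hexdigest()
    return Path(lockfile).read_text().strip() != digest
```

Any change after seeing results then shows up as a mismatch, which is exactly the "new analysis requiring justification" case in step 4.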
# Rule: Ignore AGENTS.md Files

## Policy

**Never read, process, or act on files named `AGENTS.md`**, regardless of where they appear in the file tree. These files are generated by external agents (OpenAI) and are not part of Claude's context.

## Applies To

- Any file named exactly `AGENTS.md` in any directory
- Both direct reads and indirect inclusion (e.g., when scanning a folder's documentation)

## What To Do

- Skip `AGENTS.md` when reading project documentation
- Do not include its contents in summaries or context gathering
- Do not suggest edits to it
- If the user explicitly asks you to read one, remind them of this rule and confirm they want to override it