flonat-research 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (285)
  1. package/.claude/agents/domain-reviewer.md +336 -0
  2. package/.claude/agents/fixer.md +226 -0
  3. package/.claude/agents/paper-critic.md +370 -0
  4. package/.claude/agents/peer-reviewer.md +289 -0
  5. package/.claude/agents/proposal-reviewer.md +215 -0
  6. package/.claude/agents/referee2-reviewer.md +367 -0
  7. package/.claude/agents/references/journal-referee-profiles.md +354 -0
  8. package/.claude/agents/references/paper-critic/council-personas.md +77 -0
  9. package/.claude/agents/references/paper-critic/council-prompts.md +198 -0
  10. package/.claude/agents/references/peer-reviewer/report-template.md +199 -0
  11. package/.claude/agents/references/peer-reviewer/sa-prompts.md +260 -0
  12. package/.claude/agents/references/peer-reviewer/security-scan.md +188 -0
  13. package/.claude/agents/references/proposal-reviewer/report-template.md +144 -0
  14. package/.claude/agents/references/proposal-reviewer/sa-prompts.md +149 -0
  15. package/.claude/agents/references/referee-config.md +114 -0
  16. package/.claude/agents/references/referee2-reviewer/audit-checklists.md +287 -0
  17. package/.claude/agents/references/referee2-reviewer/report-template.md +334 -0
  18. package/.claude/rules/design-before-results.md +52 -0
  19. package/.claude/rules/ignore-agents-md.md +17 -0
  20. package/.claude/rules/ignore-gemini-md.md +17 -0
  21. package/.claude/rules/lean-claude-md.md +45 -0
  22. package/.claude/rules/learn-tags.md +99 -0
  23. package/.claude/rules/overleaf-separation.md +67 -0
  24. package/.claude/rules/plan-first.md +175 -0
  25. package/.claude/rules/read-docs-first.md +50 -0
  26. package/.claude/rules/scope-discipline.md +28 -0
  27. package/.claude/settings.json +125 -0
  28. package/.context/current-focus.md +33 -0
  29. package/.context/preferences/priorities.md +36 -0
  30. package/.context/preferences/task-naming.md +28 -0
  31. package/.context/profile.md +29 -0
  32. package/.context/projects/_index.md +41 -0
  33. package/.context/projects/papers/nudge-exp.md +22 -0
  34. package/.context/projects/papers/uncertainty.md +31 -0
  35. package/.context/resources/claude-scientific-writer-review.md +48 -0
  36. package/.context/resources/cunningham-multi-analyst-agents.md +104 -0
  37. package/.context/resources/cunningham-multilang-code-audit.md +62 -0
  38. package/.context/resources/google-ai-co-scientist-review.md +72 -0
  39. package/.context/resources/karpathy-llm-council-review.md +58 -0
  40. package/.context/resources/multi-coder-reliability-protocol.md +175 -0
  41. package/.context/resources/pedro-santanna-takeaways.md +96 -0
  42. package/.context/resources/venue-rankings/abs_ajg_2024.csv +1823 -0
  43. package/.context/resources/venue-rankings/abs_ajg_2024_econ.csv +356 -0
  44. package/.context/resources/venue-rankings/cabs_4_4star_theory.csv +40 -0
  45. package/.context/resources/venue-rankings/core_2026.csv +801 -0
  46. package/.context/resources/venue-rankings.md +147 -0
  47. package/.context/workflows/README.md +69 -0
  48. package/.context/workflows/daily-review.md +91 -0
  49. package/.context/workflows/meeting-actions.md +108 -0
  50. package/.context/workflows/replication-protocol.md +155 -0
  51. package/.context/workflows/weekly-review.md +113 -0
  52. package/.mcp-server-biblio/formatters.py +158 -0
  53. package/.mcp-server-biblio/pyproject.toml +11 -0
  54. package/.mcp-server-biblio/server.py +678 -0
  55. package/.mcp-server-biblio/sources/__init__.py +14 -0
  56. package/.mcp-server-biblio/sources/base.py +73 -0
  57. package/.mcp-server-biblio/sources/formatters.py +83 -0
  58. package/.mcp-server-biblio/sources/models.py +22 -0
  59. package/.mcp-server-biblio/sources/multi_source.py +243 -0
  60. package/.mcp-server-biblio/sources/openalex_source.py +183 -0
  61. package/.mcp-server-biblio/sources/scopus_source.py +309 -0
  62. package/.mcp-server-biblio/sources/wos_source.py +508 -0
  63. package/.mcp-server-biblio/uv.lock +896 -0
  64. package/.scripts/README.md +161 -0
  65. package/.scripts/ai_pattern_density.py +446 -0
  66. package/.scripts/conf +445 -0
  67. package/.scripts/config.py +122 -0
  68. package/.scripts/count_inventory.py +275 -0
  69. package/.scripts/daily_digest.py +288 -0
  70. package/.scripts/done +177 -0
  71. package/.scripts/extract_meeting_actions.py +223 -0
  72. package/.scripts/focus +176 -0
  73. package/.scripts/generate-codex-agents-md.py +217 -0
  74. package/.scripts/inbox +194 -0
  75. package/.scripts/notion_helpers.py +325 -0
  76. package/.scripts/openalex/query_helpers.py +306 -0
  77. package/.scripts/papers +227 -0
  78. package/.scripts/query +223 -0
  79. package/.scripts/session-history.py +201 -0
  80. package/.scripts/skill-health.py +516 -0
  81. package/.scripts/skill-log-miner.py +273 -0
  82. package/.scripts/sync-to-codex.sh +252 -0
  83. package/.scripts/task +213 -0
  84. package/.scripts/tasks +190 -0
  85. package/.scripts/week +206 -0
  86. package/CLAUDE.md +197 -0
  87. package/LICENSE +21 -0
  88. package/MEMORY.md +38 -0
  89. package/README.md +269 -0
  90. package/docs/agents.md +44 -0
  91. package/docs/bibliography-setup.md +55 -0
  92. package/docs/council-mode.md +36 -0
  93. package/docs/getting-started.md +245 -0
  94. package/docs/hooks.md +38 -0
  95. package/docs/mcp-servers.md +82 -0
  96. package/docs/notion-setup.md +109 -0
  97. package/docs/rules.md +33 -0
  98. package/docs/scripts.md +303 -0
  99. package/docs/setup-overview/setup-overview.pdf +0 -0
  100. package/docs/skills.md +70 -0
  101. package/docs/system.md +159 -0
  102. package/hooks/block-destructive-git.sh +66 -0
  103. package/hooks/context-monitor.py +114 -0
  104. package/hooks/postcompact-restore.py +157 -0
  105. package/hooks/precompact-autosave.py +181 -0
  106. package/hooks/promise-checker.sh +124 -0
  107. package/hooks/protect-source-files.sh +81 -0
  108. package/hooks/resume-context-loader.sh +53 -0
  109. package/hooks/startup-context-loader.sh +102 -0
  110. package/package.json +51 -0
  111. package/packages/cli-council/.github/workflows/claude-code-review.yml +44 -0
  112. package/packages/cli-council/.github/workflows/claude.yml +50 -0
  113. package/packages/cli-council/README.md +100 -0
  114. package/packages/cli-council/pyproject.toml +43 -0
  115. package/packages/cli-council/src/cli_council/__init__.py +19 -0
  116. package/packages/cli-council/src/cli_council/__main__.py +185 -0
  117. package/packages/cli-council/src/cli_council/backends/__init__.py +8 -0
  118. package/packages/cli-council/src/cli_council/backends/base.py +81 -0
  119. package/packages/cli-council/src/cli_council/backends/claude.py +25 -0
  120. package/packages/cli-council/src/cli_council/backends/codex.py +27 -0
  121. package/packages/cli-council/src/cli_council/backends/gemini.py +26 -0
  122. package/packages/cli-council/src/cli_council/checkpoint.py +212 -0
  123. package/packages/cli-council/src/cli_council/config.py +51 -0
  124. package/packages/cli-council/src/cli_council/council.py +391 -0
  125. package/packages/cli-council/src/cli_council/models.py +46 -0
  126. package/packages/llm-council/.github/workflows/claude-code-review.yml +44 -0
  127. package/packages/llm-council/.github/workflows/claude.yml +50 -0
  128. package/packages/llm-council/README.md +453 -0
  129. package/packages/llm-council/pyproject.toml +42 -0
  130. package/packages/llm-council/src/llm_council/__init__.py +23 -0
  131. package/packages/llm-council/src/llm_council/__main__.py +259 -0
  132. package/packages/llm-council/src/llm_council/checkpoint.py +193 -0
  133. package/packages/llm-council/src/llm_council/client.py +253 -0
  134. package/packages/llm-council/src/llm_council/config.py +232 -0
  135. package/packages/llm-council/src/llm_council/council.py +482 -0
  136. package/packages/llm-council/src/llm_council/models.py +46 -0
  137. package/packages/mcp-bibliography/MEMORY.md +31 -0
  138. package/packages/mcp-bibliography/_app.py +226 -0
  139. package/packages/mcp-bibliography/formatters.py +158 -0
  140. package/packages/mcp-bibliography/log/2026-03-13-2100.md +35 -0
  141. package/packages/mcp-bibliography/pyproject.toml +15 -0
  142. package/packages/mcp-bibliography/run.sh +20 -0
  143. package/packages/mcp-bibliography/scholarly_formatters.py +83 -0
  144. package/packages/mcp-bibliography/server.py +1857 -0
  145. package/packages/mcp-bibliography/tools/__init__.py +28 -0
  146. package/packages/mcp-bibliography/tools/_registry.py +19 -0
  147. package/packages/mcp-bibliography/tools/altmetric.py +107 -0
  148. package/packages/mcp-bibliography/tools/core.py +92 -0
  149. package/packages/mcp-bibliography/tools/dblp.py +52 -0
  150. package/packages/mcp-bibliography/tools/openalex.py +296 -0
  151. package/packages/mcp-bibliography/tools/opencitations.py +102 -0
  152. package/packages/mcp-bibliography/tools/openreview.py +179 -0
  153. package/packages/mcp-bibliography/tools/orcid.py +131 -0
  154. package/packages/mcp-bibliography/tools/scholarly.py +575 -0
  155. package/packages/mcp-bibliography/tools/unpaywall.py +63 -0
  156. package/packages/mcp-bibliography/tools/zenodo.py +123 -0
  157. package/packages/mcp-bibliography/uv.lock +711 -0
  158. package/scripts/setup.sh +143 -0
  159. package/skills/beamer-deck/SKILL.md +199 -0
  160. package/skills/beamer-deck/references/quality-rubric.md +54 -0
  161. package/skills/beamer-deck/references/review-prompts.md +106 -0
  162. package/skills/bib-validate/SKILL.md +261 -0
  163. package/skills/bib-validate/references/council-mode.md +34 -0
  164. package/skills/bib-validate/references/deep-verify.md +79 -0
  165. package/skills/bib-validate/references/fix-mode.md +36 -0
  166. package/skills/bib-validate/references/openalex-verification.md +45 -0
  167. package/skills/bib-validate/references/preprint-check.md +31 -0
  168. package/skills/bib-validate/references/ref-manager-crossref.md +41 -0
  169. package/skills/bib-validate/references/report-template.md +82 -0
  170. package/skills/code-archaeology/SKILL.md +141 -0
  171. package/skills/code-review/SKILL.md +265 -0
  172. package/skills/code-review/references/quality-rubric.md +67 -0
  173. package/skills/consolidate-memory/SKILL.md +208 -0
  174. package/skills/context-status/SKILL.md +126 -0
  175. package/skills/creation-guard/SKILL.md +230 -0
  176. package/skills/devils-advocate/SKILL.md +130 -0
  177. package/skills/devils-advocate/references/competing-hypotheses.md +83 -0
  178. package/skills/init-project/SKILL.md +115 -0
  179. package/skills/init-project-course/references/memory-and-settings.md +92 -0
  180. package/skills/init-project-course/references/organise-templates.md +94 -0
  181. package/skills/init-project-course/skill.md +147 -0
  182. package/skills/init-project-light/skill.md +139 -0
  183. package/skills/init-project-research/SKILL.md +368 -0
  184. package/skills/init-project-research/references/atlas-pipeline-sync.md +70 -0
  185. package/skills/init-project-research/references/atlas-schema.md +81 -0
  186. package/skills/init-project-research/references/confirmation-report.md +39 -0
  187. package/skills/init-project-research/references/domain-profile-template.md +104 -0
  188. package/skills/init-project-research/references/interview-round3.md +34 -0
  189. package/skills/init-project-research/references/literature-discovery.md +43 -0
  190. package/skills/init-project-research/references/scaffold-details.md +197 -0
  191. package/skills/init-project-research/templates/field-calibration.md +60 -0
  192. package/skills/init-project-research/templates/pipeline-manifest.md +63 -0
  193. package/skills/init-project-research/templates/run-all.sh +116 -0
  194. package/skills/init-project-research/templates/seed-files.md +337 -0
  195. package/skills/insights-deck/SKILL.md +151 -0
  196. package/skills/interview-me/SKILL.md +157 -0
  197. package/skills/latex/SKILL.md +141 -0
  198. package/skills/latex/references/latex-configs.md +183 -0
  199. package/skills/latex-autofix/SKILL.md +230 -0
  200. package/skills/latex-autofix/references/known-errors.md +183 -0
  201. package/skills/latex-autofix/references/quality-rubric.md +50 -0
  202. package/skills/latex-health-check/SKILL.md +161 -0
  203. package/skills/learn/SKILL.md +220 -0
  204. package/skills/learn/scripts/validate_skill.py +265 -0
  205. package/skills/lessons-learned/SKILL.md +201 -0
  206. package/skills/literature/SKILL.md +335 -0
  207. package/skills/literature/references/agent-templates.md +393 -0
  208. package/skills/literature/references/bibliometric-apis.md +44 -0
  209. package/skills/literature/references/cli-council-search.md +79 -0
  210. package/skills/literature/references/openalex-api-guide.md +371 -0
  211. package/skills/literature/references/openalex-common-queries.md +381 -0
  212. package/skills/literature/references/openalex-workflows.md +248 -0
  213. package/skills/literature/references/reference-manager-sync.md +36 -0
  214. package/skills/literature/references/scopus-api-guide.md +208 -0
  215. package/skills/literature/references/wos-api-guide.md +308 -0
  216. package/skills/multi-perspective/SKILL.md +311 -0
  217. package/skills/multi-perspective/references/computational-many-analysts.md +77 -0
  218. package/skills/pipeline-manifest/SKILL.md +226 -0
  219. package/skills/pre-submission-report/SKILL.md +153 -0
  220. package/skills/process-reviews/SKILL.md +244 -0
  221. package/skills/process-reviews/references/rr-routing.md +101 -0
  222. package/skills/project-deck/SKILL.md +87 -0
  223. package/skills/project-safety/SKILL.md +135 -0
  224. package/skills/proofread/SKILL.md +254 -0
  225. package/skills/proofread/references/quality-rubric.md +104 -0
  226. package/skills/python-env/SKILL.md +57 -0
  227. package/skills/quarto-deck/SKILL.md +226 -0
  228. package/skills/quarto-deck/references/markdown-format.md +143 -0
  229. package/skills/quarto-deck/references/quality-rubric.md +54 -0
  230. package/skills/save-context/SKILL.md +174 -0
  231. package/skills/session-log/SKILL.md +98 -0
  232. package/skills/shared/concept-validation-gate.md +161 -0
  233. package/skills/shared/council-protocol.md +265 -0
  234. package/skills/shared/distribution-diagnostics.md +164 -0
  235. package/skills/shared/engagement-stratified-sampling.md +218 -0
  236. package/skills/shared/escalation-protocol.md +74 -0
  237. package/skills/shared/external-audit-protocol.md +205 -0
  238. package/skills/shared/intercoder-reliability.md +256 -0
  239. package/skills/shared/mcp-degradation.md +81 -0
  240. package/skills/shared/method-probing-questions.md +163 -0
  241. package/skills/shared/multi-language-conventions.md +143 -0
  242. package/skills/shared/paid-api-safety.md +174 -0
  243. package/skills/shared/palettes.md +90 -0
  244. package/skills/shared/progressive-disclosure.md +92 -0
  245. package/skills/shared/project-documentation-content.md +443 -0
  246. package/skills/shared/project-documentation-format.md +281 -0
  247. package/skills/shared/project-documentation.md +100 -0
  248. package/skills/shared/publication-output.md +138 -0
  249. package/skills/shared/quality-scoring.md +70 -0
  250. package/skills/shared/reference-resolution.md +77 -0
  251. package/skills/shared/research-quality-rubric.md +165 -0
  252. package/skills/shared/rhetoric-principles.md +54 -0
  253. package/skills/shared/skill-design-patterns.md +272 -0
  254. package/skills/shared/skill-index.md +240 -0
  255. package/skills/shared/system-documentation.md +334 -0
  256. package/skills/shared/tikz-rules.md +402 -0
  257. package/skills/shared/validation-tiers.md +121 -0
  258. package/skills/shared/venue-guides/README.md +46 -0
  259. package/skills/shared/venue-guides/cell_press_style.md +483 -0
  260. package/skills/shared/venue-guides/conferences_formatting.md +564 -0
  261. package/skills/shared/venue-guides/cs_conference_style.md +463 -0
  262. package/skills/shared/venue-guides/examples/cell_summary_example.md +247 -0
  263. package/skills/shared/venue-guides/examples/medical_structured_abstract.md +313 -0
  264. package/skills/shared/venue-guides/examples/nature_abstract_examples.md +213 -0
  265. package/skills/shared/venue-guides/examples/neurips_introduction_example.md +245 -0
  266. package/skills/shared/venue-guides/journals_formatting.md +486 -0
  267. package/skills/shared/venue-guides/medical_journal_styles.md +535 -0
  268. package/skills/shared/venue-guides/ml_conference_style.md +556 -0
  269. package/skills/shared/venue-guides/nature_science_style.md +405 -0
  270. package/skills/shared/venue-guides/reviewer_expectations.md +417 -0
  271. package/skills/shared/venue-guides/venue_writing_styles.md +321 -0
  272. package/skills/split-pdf/SKILL.md +172 -0
  273. package/skills/split-pdf/methodology.md +48 -0
  274. package/skills/sync-notion/SKILL.md +93 -0
  275. package/skills/system-audit/SKILL.md +157 -0
  276. package/skills/system-audit/references/sub-agent-prompts.md +294 -0
  277. package/skills/task-management/SKILL.md +131 -0
  278. package/skills/update-focus/SKILL.md +204 -0
  279. package/skills/update-project-doc/SKILL.md +194 -0
  280. package/skills/validate-bib/SKILL.md +242 -0
  281. package/skills/validate-bib/references/council-mode.md +34 -0
  282. package/skills/validate-bib/references/deep-verify.md +71 -0
  283. package/skills/validate-bib/references/openalex-verification.md +45 -0
  284. package/skills/validate-bib/references/preprint-check.md +31 -0
  285. package/skills/validate-bib/references/report-template.md +62 -0
@@ -0,0 +1,256 @@
# Inter-Coder Reliability & LLM Annotation Validation

> Shared reference for content analysis, LLM annotation, and coding studies. Covers both human-coder and multi-model reliability assessment. Adapted from CommDAAF AgentAcademy protocol (Xu 2026).

## Principle

**Always report reliability per category, not just aggregate.** A global κ of 0.7 can hide the fact that one frame has κ = 0.3 — which means that frame's results are unreliable. Frame-specific (or category-specific) reliability is the minimum standard.

---

## When This Applies

- Any content analysis with 2+ coders (human or LLM)
- LLM annotation studies using multiple models
- Survey coding or qualitative coding with multiple raters
- Any study reporting inter-coder or inter-model agreement

---

## Metrics to Report

### For 2 coders

| Metric | When to use | Interpretation |
|--------|------------|----------------|
| **Cohen's κ** | Two coders, nominal categories | Chance-corrected agreement |
| **Weighted κ** | Two coders, ordinal categories | Accounts for degree of disagreement |
| **% Agreement** | Supplement only | Not chance-corrected; report alongside κ |

### For 3+ coders (including multi-model LLM)

| Metric | When to use | Interpretation |
|--------|------------|----------------|
| **Fleiss' κ** | 3+ coders, nominal categories | Multi-rater chance-corrected agreement |
| **Three-way agreement** | 3 models, quick check | Proportion where all 3 agree |
| **Majority agreement** | 3 models, voting | Proportion where ≥2/3 agree |
| **Pairwise κ** | 3+ coders, diagnostic | Which pairs agree/disagree |
| **Krippendorff's α** | Any number of coders, any scale | Most general; handles missing data |

---
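
In Python, the pairwise and multi-rater metrics above can be computed with scikit-learn and statsmodels (a minimal sketch with toy integer ratings; both libraries are assumed installed):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Toy data: rows = items, columns = coders, values = nominal category codes
ratings = np.array([
    [1, 1, 1],
    [2, 2, 1],
    [1, 1, 1],
    [2, 2, 2],
    [1, 2, 1],
])

# Cohen's kappa for one pair of coders
kappa_12 = cohen_kappa_score(ratings[:, 0], ratings[:, 1])

# Fleiss' kappa for all coders: convert to an item x category count table first
table, _ = aggregate_raters(ratings)
kappa_all = fleiss_kappa(table)
```

For Krippendorff's α, the standalone `krippendorff` PyPI package plays the same role (it also handles missing data).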

## Thresholds

| κ range | Interpretation | Action |
|---------|---------------|--------|
| < 0.20 | Poor | Do not use this category. Redefine or merge. |
| 0.20–0.40 | Fair | Flag. Consider retraining or revising codebook. |
| 0.40–0.60 | Moderate | Acceptable for exploratory. Report limitation. |
| 0.60–0.80 | Substantial | Acceptable for publication. |
| > 0.80 | Excellent | Strong agreement. |

**Publication threshold:** κ ≥ 0.7 (or α ≥ 0.7 for Krippendorff's α).

**LLM annotation minimum:** Human validation sample N ≥ 200, κ ≥ 0.7 between LLM and human.

---

## Category-Specific Reliability

### Why aggregate isn't enough

```
Aggregate Fleiss' κ = 0.72   ← looks fine

Per-category:
  SOLIDARITY:   κ = 0.89 ✅
  INJUSTICE:    κ = 0.81 ✅
  MOBILISATION: κ = 0.74 ✅
  HUMANITARIAN: κ = 0.31 ⚠️  ← this category's results are unreliable
  CULTURAL:     κ = 0.65 ⚠️  ← borderline
```

If HUMANITARIAN has κ = 0.31, any finding about that category should be treated as exploratory regardless of the aggregate score.

### Implementation

```python
from sklearn.metrics import cohen_kappa_score
import itertools
import math

def category_specific_reliability(coded_data, code_var='frame', coders=None):
    """
    Calculate reliability per category across all coders.

    coded_data: list of dicts, each with '{coder}_{code_var}' keys
    coders: list of coder names (e.g., ['claude', 'glm', 'kimi'] or ['coder1', 'coder2'])
    """
    # Collect all categories (ignore items a coder left uncoded)
    categories = set()
    for d in coded_data:
        for c in coders:
            categories.add(d.get(f'{c}_{code_var}'))
    categories.discard(None)

    results = {}
    for cat in categories:
        # For each item, create binary: did this coder assign this category?
        binary_codes = {c: [] for c in coders}
        for d in coded_data:
            for c in coders:
                binary_codes[c].append(1 if d.get(f'{c}_{code_var}') == cat else 0)

        # Ensure an entry exists even when the 3-coder branch below is skipped
        results[cat] = {}

        # Three-way agreement (for 3 coders)
        if len(coders) == 3:
            n_relevant = sum(1 for i in range(len(coded_data))
                             if any(binary_codes[c][i] for c in coders))
            if n_relevant < 5:
                results[cat] = {'n': n_relevant, 'kappa': None, 'flag': 'Too few cases'}
                continue

            three_way = sum(
                1 for i in range(len(coded_data))
                if all(binary_codes[c][i] == binary_codes[coders[0]][i] for c in coders)
                and any(binary_codes[c][i] for c in coders)
            ) / n_relevant

            results[cat] = {
                'n': n_relevant,
                'agreement': three_way,
                'flag': '✅' if three_way >= 0.6 else '⚠️ Low reliability'
            }

        # Pairwise kappa (skip degenerate pairs where kappa is undefined)
        kappas = []
        for c1, c2 in itertools.combinations(coders, 2):
            try:
                k = cohen_kappa_score(binary_codes[c1], binary_codes[c2])
            except ValueError:
                continue
            if not math.isnan(k):
                kappas.append(k)

        if kappas:
            results[cat]['mean_pairwise_kappa'] = sum(kappas) / len(kappas)

    return results
```

### R

```r
library(irr)

category_reliability <- function(coded_data, code_var, coders) {
  categories <- unique(unlist(coded_data[paste0(coders, "_", code_var)]))

  results <- list()
  for (cat in categories) {
    # Binary indicator per coder: 1 if this coder assigned cat to the item
    binary <- sapply(coders, function(c) {
      as.integer(coded_data[[paste0(c, "_", code_var)]] == cat)
    })

    n_relevant <- sum(rowSums(binary) > 0)
    if (n_relevant < 5) next

    k <- kappam.fleiss(binary)
    results[[cat]] <- list(
      n = n_relevant,
      fleiss_kappa = k$value,
      flag = ifelse(k$value >= 0.6, "OK", "Low")
    )
  }
  results
}
```

---

## LLM Annotation Protocol

When using LLMs as annotators (a growing practice):

### Multi-Model Design

| Requirement | Standard |
|-------------|----------|
| **Number of models** | ≥ 2 (ideally 3 from different providers) |
| **Human validation** | N ≥ 200, κ ≥ 0.7 between LLM majority vote and human gold standard |
| **Prompt documentation** | Full prompt text in appendix or replication package |
| **Disagreement analysis** | Report which categories models disagree on most |
| **Temperature** | 0 or near-0 for reproducibility |

### Majority Vote

For 3-model annotation, use majority vote (2/3 agreement):

```python
from collections import Counter

def majority_vote(codes_by_model):
    """codes_by_model: {'claude': 'X', 'gpt': 'Y', 'gemini': 'X'} → ('X', 2)"""
    counter = Counter(codes_by_model.values())
    majority, count = counter.most_common(1)[0]
    return majority, count  # count = 3 (unanimous), 2 (majority), or 1 (three-way split)
```

### When Models Disagree

| Agreement level | Action |
|----------------|--------|
| 3/3 agree | High confidence — use the label |
| 2/3 agree | Moderate confidence — use majority, flag for spot-check |
| 0/3 agree (three-way split) | Low confidence — requires human adjudication |

---
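
The adjudication rules in this table can be made mechanical. A sketch extending the majority-vote idea (the helper name and confidence labels are illustrative, not part of the protocol):

```python
from collections import Counter

def label_with_confidence(codes_by_model):
    """Map per-model codes to (label, confidence) following the table above."""
    counter = Counter(codes_by_model.values())
    label, count = counter.most_common(1)[0]
    if count == len(codes_by_model):
        return label, "high"               # all models agree: use the label
    if count > len(codes_by_model) / 2:
        return label, "moderate"           # majority label; flag for spot-check
    return None, "needs_adjudication"      # split: route to a human
```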

## Reporting Template

```markdown
## Inter-Coder Reliability

### Overall
- Fleiss' κ = X.XX (N = XXX items, K = X coders)
- Three-way agreement: XX.X%
- Majority agreement (≥2/3): XX.X%

### Per Category
| Category | N | Three-way | Fleiss' κ | Status |
|----------|---|-----------|-----------|--------|
| X | XX | XX.X% | X.XX | ✅/⚠️ |
| Y | XX | XX.X% | X.XX | ✅/⚠️ |

### Low-Reliability Categories
⚠️ [Category] had κ = X.XX (below 0.60 threshold).
Findings involving this category should be interpreted with caution.

### Human Validation (if LLM-annotated)
- Gold standard sample: N = XXX
- LLM majority vs. human: κ = X.XX
- Model-specific: Claude κ = X.XX, GPT κ = X.XX, Gemini κ = X.XX
```

---

## Integration

### In `/data-analysis`

When the dataset includes coded/annotated variables:
1. Check if reliability metrics are reported
2. If multi-model LLM annotation, verify human validation exists
3. Flag categories below κ = 0.6 threshold
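
The flagging step can be automated against the output of `category_specific_reliability` above (a sketch; assumes each category's entry may carry a `mean_pairwise_kappa` key):

```python
def flag_low_reliability(results, threshold=0.6):
    """List categories whose mean pairwise kappa falls below the threshold."""
    return [cat for cat, r in results.items()
            if r.get('mean_pairwise_kappa') is not None
            and r['mean_pairwise_kappa'] < threshold]
```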

### In review agents

Check whether content analysis papers:
- Report per-category reliability (not just aggregate)
- Flag low-reliability categories as limitations
- Include human validation for LLM annotations

Treat missing reliability as a Major issue; aggregate-only reporting as a Minor issue.

### Validation tier interaction

| Tier | Requirement |
|------|------------|
| 🟢 Exploratory | Multi-model agreement reported; human validation optional |
| 🟡 Pilot | Multi-model agreement + spot-check (N ≥ 50) |
| 🔴 Publication | Full reliability battery + human validation (N ≥ 200, κ ≥ 0.7) |
@@ -0,0 +1,81 @@
# MCP Degradation Pattern

> Shared reference for skills that depend on MCP servers. Gracefully degrade when a server is unreachable instead of failing silently or blocking the workflow.

## The 5-Step Pattern

### Step 1: Probe at Start

Before any MCP-dependent work, test connectivity:

```
Try a lightweight read operation (e.g., mcp__taskflow__search_tasks with a known term,
openalex_search_works with a trivial query). If it times out or errors,
mark that server as unavailable for the session.
```

Do this **once**, at the beginning of the skill — not before every call.

### Step 2: Report Availability

After probing, state clearly:

```
MCP status:
- Vault: ✓ available
- OpenAlex: ✗ unavailable (timeout)
```

This sets expectations before the user sees skipped steps.

### Step 3: Skip Dependent Phases

When a server is unavailable, skip phases that depend on it entirely — do not attempt them, do not retry. Mark skipped phases clearly in the output:

```
Step 5: Update vault research pipeline — SKIPPED (vault unavailable)
```

### Step 4: Offer Fallbacks

For each skipped phase, suggest what the user can do manually:

| Unavailable | Fallback |
|-------------|----------|
| taskflow MCP | "Run `vault sync (edit vault files directly)` later when vault is accessible" |
| OpenAlex MCP | "Use `/literature` with web search mode instead" |
| Scholarly MCP | "Use `/literature` with web search mode instead" |

### Step 5: Summarize at End

Close with a clean summary of what completed vs. what was skipped:

```
Completed: Steps 1-4 (local context updates)
Skipped: Step 5 (vault sync — server unavailable)
Action needed: Run `vault sync (edit vault files directly)` when vault is accessible
```
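
The probe-once logic of Steps 1, 2, and 5 can be sketched generically. The probe callables below are stand-ins for real MCP calls, which this sketch does not attempt to name:

```python
def probe_servers(probes):
    """Run each probe once at session start; record availability for the session."""
    status = {}
    for name, probe in probes.items():
        try:
            probe()                  # lightweight read operation
            status[name] = True
        except Exception:            # timeout or error: unavailable this session
            status[name] = False
    return status

def status_report(status):
    """Render the Step 2 availability report."""
    lines = ["MCP status:"]
    for name, ok in status.items():
        lines.append(f"- {name}: {'✓ available' if ok else '✗ unavailable'}")
    return "\n".join(lines)
```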

## MCP-Consuming Skills

These skills should reference this pattern:

| Skill | MCP Server | What depends on it |
|-------|------------|-------------------|
| `vault sync` | vault | Steps 3, 5 (search + update) |
| `task-management` | vault | Daily planning, task creation, pipeline queries |
| `init-project-research` | vault | Pipeline entry creation |
| `literature` | OpenAlex, Scholarly | Citation search, DOI verification |
| `atlas-review` | vault | Pipeline cross-reference |

## When to Apply

- Always when a skill's workflow includes MCP tool calls
- Especially in background agents (where MCP tools may auto-deny due to permission constraints)
- When network issues are suspected (slow responses, recent timeouts)

## When to Skip

- Skills that don't use MCP tools
- When the user explicitly says "skip vault" or "offline mode"
- Interactive sessions where the user can retry immediately
@@ -0,0 +1,163 @@
# Method-Specific Probing Questions

> Shared reference for analysis skills and review agents. Mandatory questions before running any empirical analysis. Adapted from CommDAAF (Xu 2026), generalised beyond communication research.

## Principle

**Never run a method with default parameters.** Before executing any analysis, ask the method-specific probing questions. Do NOT proceed without explicit answers. Vague answers trigger the [escalation protocol](escalation-protocol.md).

## Expert Fast-Track

Experienced researchers can bypass probing by providing complete specs upfront:

```
DiD: treatment = policy_change_2020, control = neighbouring_states,
parallel trends tested 2015-2019, Callaway-Sant'Anna estimator,
clustered SEs at state level, 3 pre-treatment periods
```

If the spec is complete, acknowledge and proceed. If anything is missing, probe only the gaps.

---

## Probing Questions by Method

### Regression / OLS / Panel

1. What is your **estimand**? (ATE? ATT? Conditional mean?)
2. What is the **unit of analysis**? (Individual? Firm? Country-year?)
3. What is the **identification strategy**? (Selection on observables? IV? DiD? RDD?)
4. What **controls** and why? (Justify each — no kitchen sink)
5. How do you handle **standard errors**? (Clustered? HAC? At what level and why?)
6. What are the **key threats to validity**? (Omitted variables, reverse causality, measurement error)

### Difference-in-Differences

1. What is the **treatment** and when does it occur?
2. What are **treatment and control groups**? How defined?
3. Is treatment timing **staggered**? (If yes: TWFE is biased — use CS, Sun-Abraham, or similar)
4. **Parallel trends** evidence? (Pre-treatment dynamics, placebo tests)
5. Any **anticipation effects**? (Treatment effects before official treatment date)
6. Are there **spillovers** between treated and control units?
7. What is the **relevant pre-treatment window**?
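
Question 3 (staggered timing) can be checked mechanically before choosing an estimator. A pandas sketch with toy panel data (the column names are assumptions, not a convention this document prescribes):

```python
import pandas as pd

df = pd.DataFrame({
    "state":   ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "year":    [2018, 2019, 2020] * 3,
    "treated": [0, 1, 1,  0, 0, 1,  0, 0, 0],
})

# First year each unit is treated; never-treated units drop out of the groupby
first_treated = df[df["treated"] == 1].groupby("state")["year"].min()

# More than one adoption cohort means staggered timing, so plain TWFE is suspect
staggered = first_treated.nunique() > 1
```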
43
+
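In the canonical two-group, two-period case, the DiD estimate is just a double difference of cell means. A sketch on simulated data (pure `numpy`/`pandas`; names hypothetical). Once timing is staggered, this simple comparison is exactly what heterogeneity-robust estimators replace:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000
treated = rng.integers(0, 2, n)   # group indicator
post = rng.integers(0, 2, n)      # period indicator
att = 1.5                         # true treatment effect

# Group difference (1.0) and common time trend (0.8) are both differenced away.
y = 0.5 + 1.0 * treated + 0.8 * post + att * treated * post + rng.normal(size=n)
df = pd.DataFrame({"y": y, "treated": treated, "post": post})

cell = df.groupby(["treated", "post"])["y"].mean()
did = (cell.loc[(1, 1)] - cell.loc[(1, 0)]) - (cell.loc[(0, 1)] - cell.loc[(0, 0)])
```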
### Instrumental Variables

1. What is the **instrument** and what is the **theoretical argument** for relevance?
2. What is the **exclusion restriction** argument? (Why does Z affect Y only through X?)
3. **Weak instrument** diagnostics? (First-stage F-statistic, Anderson-Rubin CIs)
4. Is the instrument **plausibly exogenous**? (What could violate this?)
5. Are there **multiple instruments**? (Over-identification tests)
6. What is the **LATE** you're estimating? (Who are the compliers?)

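For a single instrument, both the Wald/IV estimate and the first-stage F-statistic can be computed by hand, which makes question 3 cheap to answer before reaching for a full 2SLS routine. A simulated sketch in plain `numpy` (names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                 # instrument
u = rng.normal(size=n)                 # unobserved confounder
x = 0.5 * z + u + rng.normal(size=n)   # endogenous regressor
y = 2.0 * x + u + rng.normal(size=n)   # true causal effect: 2.0

# Just-identified IV (Wald estimator) vs naive OLS
beta_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
beta_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# First-stage F for one instrument = squared t-statistic of z in x ~ z
zc, xc = z - z.mean(), x - x.mean()
pi_hat = (zc @ xc) / (zc @ zc)
resid = xc - pi_hat * zc
se_pi = np.sqrt((resid @ resid) / (n - 2) / (zc @ zc))
first_stage_F = (pi_hat / se_pi) ** 2
```

Here OLS is biased upward by the confounder while IV recovers the true effect. An F well above the rule-of-thumb 10 is the minimum question 3 probes for; some recent weak-IV guidance argues for much higher thresholds.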
### Regression Discontinuity

1. What is the **running variable** and **cutoff**?
2. Is it **sharp or fuzzy**? (Compliance rate at cutoff)
3. **Bandwidth selection** method? (MSE-optimal? Cross-validation?)
4. Evidence against **manipulation** at cutoff? (McCrary test, density plot)
5. **Functional form**? (Local linear? Polynomial order? Sensitivity to choice)
6. Are there **multiple cutoffs** or **geographic boundaries**?

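Questions 3 and 5 interact: the estimate is a local regression on each side of the cutoff, so bandwidth and functional-form choices move it directly. A simulated sharp-RDD sketch in `numpy` with a fixed bandwidth (in practice use an MSE-optimal bandwidth, e.g. via `rdrobust`; names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
running = rng.uniform(-1, 1, n)          # running variable, cutoff at 0
treat = (running >= 0).astype(float)     # sharp design: treatment jumps at cutoff
tau = 0.7                                # true discontinuity
y = 1.0 + 0.5 * running + tau * treat + rng.normal(scale=0.3, size=n)

h = 0.25                                 # fixed bandwidth, for illustration only
win = np.abs(running) <= h
X = np.column_stack([
    np.ones(win.sum()),
    treat[win],
    running[win],
    running[win] * treat[win],           # separate slope on each side
])
coef, *_ = np.linalg.lstsq(X, y[win], rcond=None)
tau_hat = coef[1]                        # jump in the intercept at the cutoff
```

Rerunning with several values of `h` is the cheapest sensitivity check question 5 asks for.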
### Experiments / RCTs

1. What is the **randomisation unit**? (Individual? Cluster?)
2. **Power analysis**? (Effect size assumption, desired power, required N)
3. Was the experiment **pre-registered**? (Where? What deviations?)
4. How do you handle **attrition** and **non-compliance**?
5. **Multiple testing**? (How many outcomes? Correction method?)
6. **Demand effects**? (Can participants guess the hypothesis?)
7. Is there a **pre-analysis plan**?

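Question 2 should produce a number before data collection starts. A sketch using `statsmodels` (assuming a two-arm, individual-level design and a standardised effect size):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Required N per arm to detect d = 0.5 at 80% power, alpha = 0.05
n_per_arm = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                 alternative="two-sided")

# Minimum detectable effect if the budget caps you at 100 per arm
mde = analysis.solve_power(nobs1=100, alpha=0.05, power=0.8,
                           alternative="two-sided")
```

Cluster-randomised designs (question 1) need a design-effect adjustment on top of this; the per-arm N here assumes independent observations.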
### Survey / Psychometrics

1. What **scales** are you using? (Validated or ad hoc?)
2. **Common method variance** — single source, single method?
3. **Response rate** and non-response bias assessment?
4. **Sampling strategy**? (Probability? Convenience? Online panel?)
5. How do you handle **missing data**? (Listwise deletion? MI? FIML?)
6. **Measurement invariance** across groups? (If comparing subgroups)

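For question 1, an ad hoc scale should at least report internal consistency. A self-contained `numpy` sketch of Cronbach's alpha (simulated respondents; names hypothetical):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items score matrix."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

rng = np.random.default_rng(3)
latent = rng.normal(size=500)                                  # shared construct
scale_items = latent[:, None] + rng.normal(scale=0.8, size=(500, 4))
noise_items = rng.normal(size=(500, 4))                        # unrelated items

alpha_good = cronbach_alpha(scale_items)   # items share a latent factor
alpha_bad = cronbach_alpha(noise_items)    # items measure nothing in common
```

A high alpha is not evidence of unidimensionality; factor analysis and the invariance checks in question 6 still apply.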
### MCDM / Multi-Criteria Analysis

1. What **method** and **why this one**? (AHP? TOPSIS? PROMETHEE? ELECTRE?)
2. How are **criteria weights** determined? (Expert elicitation? Pairwise comparison? Equal?)
3. **Sensitivity analysis** on weights? (How much do rankings change?)
4. **Rank reversal** check? (Does adding/removing alternatives change the ranking?)
5. How many **decision-makers** and how are judgements aggregated?
6. Are criteria **independent**? (If not, how is correlation handled?)
7. **Normalisation method** and sensitivity to that choice?

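Questions 3 and 4 are cheap to check once the method is coded. A minimal TOPSIS sketch in `numpy` (toy decision matrix, hypothetical alternatives) in which the top-ranked alternative flips when the weights shift, which is precisely what a weight sensitivity analysis must report:

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """matrix: alternatives x criteria; benefit[j] True if higher is better."""
    norm = matrix / np.sqrt((matrix ** 2).sum(axis=0))      # vector normalisation
    v = norm * weights
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_pos = np.sqrt(((v - ideal) ** 2).sum(axis=1))
    d_neg = np.sqrt(((v - anti) ** 2).sum(axis=1))
    return d_neg / (d_pos + d_neg)                          # closeness in [0, 1]

# 3 alternatives x 3 criteria; the last criterion is a cost
m = np.array([[7.0, 9.0, 9.0],
              [8.0, 7.0, 8.0],
              [9.0, 6.0, 7.0]])
benefit = np.array([True, True, False])

base = topsis(m, np.array([1 / 3, 1 / 3, 1 / 3]), benefit)
shifted = topsis(m, np.array([0.5, 0.25, 0.25]), benefit)
base_rank = np.argsort(-base)        # best alternative first
shifted_rank = np.argsort(-shifted)
```

The normalisation choice (question 7) is another axis to vary: rerun with min-max normalisation and compare rankings.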
### Topic Modeling / NLP

1. Why topic modeling specifically? (Exploratory? No predetermined categories?)
2. How many topics (K) and **how will you select K**? (Coherence score? Held-out likelihood?)
3. What **preprocessing**? (Stopwords, stemming, frequency thresholds — justify each)
4. What counts as one **document**? (Post? Paragraph? Article?)
5. How will you **validate topics are meaningful**? (Read 20+ docs per topic)
6. Who will **name topics** and how?

### Machine Learning / Classification

1. What is the **ground truth** and how was it generated?
2. **Train/test split** strategy? (Random? Temporal? Stratified?)
3. How do you prevent **data leakage**?
4. What **baselines** are you comparing against? (Are they serious or strawmen?)
5. **Evaluation metrics** and why? (Accuracy? F1? AUC? — justify for your class balance)
6. **Human validation** sample? (Required for any LLM annotation: N≥200, κ≥0.7)

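The human-validation requirement in question 6 reduces to computing chance-corrected agreement between an annotated sample and the model's labels. A dependency-free Cohen's κ sketch (toy labels here; in practice N ≥ 200):

```python
import numpy as np

def cohens_kappa(a, b):
    a, b = np.asarray(a), np.asarray(b)
    categories = np.union1d(a, b)
    p_obs = (a == b).mean()                                       # raw agreement
    p_exp = sum((a == c).mean() * (b == c).mean() for c in categories)
    return (p_obs - p_exp) / (1 - p_exp)

human = np.array([1, 1, 0, 0, 1, 0, 1, 0, 1, 1])
model = np.array([1, 1, 0, 0, 1, 0, 1, 1, 1, 1])
kappa = cohens_kappa(human, model)
```

Raw agreement (0.9 here) always overstates reliability when classes are imbalanced, which is why the threshold is set on κ rather than accuracy.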
### Simulation / Agent-Based Models

1. What is the **purpose** of the simulation? (Theoretical exploration? Mechanism testing? Calibration?)
2. How are **parameters chosen**? (Empirically calibrated? Assumed? Swept?)
3. **Sensitivity analysis** plan? (Which parameters vary? Over what range?)
4. How many **replications** per parameter configuration?
5. Does the model **validate against empirical data**? (Or is it purely theoretical?)
6. What **simplifying assumptions** are made and what do they rule out?
7. **Convergence check** — how do you know the simulation has run long enough?

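Questions 4 and 7 have a mechanical answer: run many seeded replications and show that the Monte Carlo estimate has stabilised relative to its own simulation error. A toy `numpy` sketch (the stochastic process itself is a placeholder):

```python
import numpy as np

rng = np.random.default_rng(7)

def one_run(rng, steps=200):
    """Placeholder process: fraction of time a random walk spends above zero."""
    walk = np.cumsum(rng.choice([-1, 1], size=steps))
    return (walk > 0).mean()

reps = np.array([one_run(rng) for _ in range(2000)])
running_mean = np.cumsum(reps) / np.arange(1, len(reps) + 1)

# Convergence diagnostic: drift of the estimate over the second half of the
# replications, judged against the Monte Carlo standard error.
drift = abs(running_mean[len(reps) // 2 - 1] - running_mean[-1])
mc_se = reps.std(ddof=1) / np.sqrt(len(reps))
```

Report `mc_se` alongside the estimate; if `drift` is not small relative to `mc_se`, add replications rather than trusting the current run.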
### Content Analysis / Coding

1. Where is your **codebook**? (Must exist before coding)
2. How many **coders** and what is the reliability plan?
3. **Sampling strategy** for content to code?
4. **Training protocol** for coders?
5. How will you handle **ambiguous cases**?
6. **Inter-coder reliability** metric and threshold? (Fleiss' κ ≥ 0.7 for publication)

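With more than two coders, question 6 calls for Fleiss' κ, computed from an items × categories count matrix. A dependency-free sketch (toy counts: 3 coders, 2 categories):

```python
import numpy as np

def fleiss_kappa(counts):
    """counts[i, j]: number of coders assigning item i to category j."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)[0]                       # coders per item (constant)
    p_cat = counts.sum(axis=0) / counts.sum()       # overall category shares
    p_item = ((counts ** 2).sum(axis=1) - n) / (n * (n - 1))
    p_bar = p_item.mean()                           # mean observed agreement
    p_exp = (p_cat ** 2).sum()                      # chance agreement
    return (p_bar - p_exp) / (1 - p_exp)

counts = np.array([
    [3, 0], [0, 3], [3, 0], [0, 3],   # full agreement
    [2, 1], [1, 2],                   # split decisions
    [3, 0], [0, 3],
])
kappa = fleiss_kappa(counts)
```

Here two ambiguous items out of eight already drag κ to about 0.67, under the 0.7 bar, which is why the codebook and training protocol (questions 1 and 4) come first.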
---

## How Skills and Agents Use This

### In analysis skills (`/data-analysis`, `/causal-design`, `/experiment-design`)

1. Detect the method type from the user's request
2. Present the relevant probing questions
3. Wait for answers before proceeding
4. If answers are vague → [escalation protocol](escalation-protocol.md)
5. Expert fast-track: if the user provides a complete spec, acknowledge and proceed

### In review agents (`referee2-reviewer`, `domain-reviewer`)

1. Check whether the paper addresses these questions for its stated method
2. Flag unanswered questions as issues in the review report
3. Missing identification strategy → Critical; missing sensitivity analysis → Major

### In project agents (`data-engineer`, `econometrician`)

1. Before implementing any estimation, verify the probing questions are answered
2. Reference the answers from `.planning/state.md` or the paper's methodology section
3. If no answers exist, probe the user before writing code

---

## Adding New Methods

When a research project uses a method not covered above, create a method-specific question set following the pattern:

1. **What** — define the exact variant/specification
2. **Why** — justify the choice over alternatives
3. **How** — implementation details that affect validity
4. **Threats** — what could go wrong
5. **Validation** — how you'll know it's correct
6. **Sensitivity** — what happens when assumptions change
---

# Multi-Language Conventions

> Code style, packages, and output patterns for R, Python, Stata, and Julia.
> Referenced by `/data-analysis`, `/synthetic-data`, `/replication-package`.

## Language Detection

Detect from project context in this order:

1. Existing scripts in `code/` or `src/` → match language
2. `CLAUDE.md` or `MEMORY.md` mentions → follow stated preference
3. User request → explicit language choice
4. Default → R for econometrics/causal inference, Python for ML/simulation

## R Conventions

### Style
- Assignment: `<-` (never `=`)
- Pipe: `|>` (base R) preferred; `%>%` acceptable if tidyverse already loaded
- Naming: `snake_case` for variables and functions
- Line width: 80 characters

### Core Packages

| Task | Package |
|------|---------|
| Data wrangling | `dplyr`, `tidyr`, `data.table` |
| Estimation | `fixest`, `estimatr`, `lme4`, `survival` |
| Causal inference | `did`, `didimputation`, `synthdid`, `rdrobust`, `MatchIt` |
| Tables | `modelsummary`, `kableExtra`, `stargazer` |
| Figures | `ggplot2` + `ggthemes`, `patchwork` |
| Reporting | `rmarkdown`, `knitr` |
| Power analysis | `DeclareDesign`, `pwr`, `simr` |
| Survey | `survey`, `srvyr`, `qualtRics` |

### Output Pattern
```r
# Tables → LaTeX
modelsummary(models, output = "paper/tables/table1.tex",
             stars = c("*" = 0.10, "**" = 0.05, "***" = 0.01))

# Figures → PDF
ggsave("paper/figures/fig1.pdf", width = 6, height = 4)
```

## Python Conventions

### Style
- **Always use `uv`** — never bare `python` or `pip`
- Naming: `snake_case` for variables/functions, `PascalCase` for classes
- Type hints encouraged but not required for analysis scripts
- Line width: 88 characters (Black default)

### Core Packages

| Task | Package |
|------|---------|
| Data wrangling | `pandas`, `polars` |
| Estimation | `statsmodels`, `linearmodels`, `scikit-learn` |
| Causal inference | `econml`, `causalml`, `doubleml` |
| Tables | `stargazer` (Python port; wraps `statsmodels` results), custom `.tex` export |
| Figures | `matplotlib`, `seaborn`, `plotnine` |
| Power analysis | `statsmodels.stats.power`, `numpy` simulation |
| Survey | `pandas` + manual parsing |

### Output Pattern
```python
import pandas as pd

# Tables → LaTeX
df.to_latex("paper/tables/table1.tex", index=False,
            caption="Descriptive Statistics", label="tab:desc")

# Figures → PDF
fig.savefig("paper/figures/fig1.pdf", bbox_inches="tight", dpi=300)
```

## Stata Conventions

### Style
- Use `//` for inline comments, `/* */` for blocks
- Naming: lowercase with underscores
- Always set `version` at script top
- Use `capture log close` / `log using` pattern

### Core Commands

| Task | Command |
|------|---------|
| Data wrangling | `gen`, `replace`, `reshape`, `merge`, `collapse` |
| Estimation | `reg`, `ivregress`, `xtreg`, `areg`, `ppmlhdfe` |
| Causal inference | `did_multiplegt`, `csdid`, `rdrobust`, `eventstudyinteract` |
| Tables | `esttab`, `outreg2`, `estout` |
| Figures | `twoway`, `coefplot`, `graph export` |
| Power analysis | `power`, `simulate` |

### Output Pattern
```stata
// Tables → LaTeX
esttab m1 m2 m3 using "paper/tables/table1.tex", ///
    replace booktabs label se star(* 0.10 ** 0.05 *** 0.01)

// Figures → PDF
graph export "paper/figures/fig1.pdf", replace
```

## Julia Conventions

### Style
- Naming: `snake_case` for functions/variables, `PascalCase` for types
- Use `using` not `import` for standard packages
- Line width: 92 characters

### Core Packages

| Task | Package |
|------|---------|
| Data wrangling | `DataFrames.jl`, `CSV.jl` |
| Estimation | `GLM.jl`, `FixedEffectModels.jl`, `MixedModels.jl` |
| Causal inference | `CausalInference.jl` (limited — often custom) |
| Tables | `PrettyTables.jl` (LaTeX backend) |
| Figures | `Makie.jl`, `AlgebraOfGraphics.jl`, `Plots.jl` |
| Power analysis | Custom simulation with `Distributions.jl` |

### Output Pattern
```julia
using PrettyTables
# Tables → LaTeX
open("paper/tables/table1.tex", "w") do io
    pretty_table(io, df, backend=Val(:latex))
end

# Figures → PDF
save("paper/figures/fig1.pdf", fig)
```

## Cross-Language Rules

1. **Output routing:** All `.tex` table files and figure files go to `paper/tables/` or `paper/figures/` per the `overleaf-separation` rule. Scripts stay in `code/` or `src/`.
2. **Reproducibility header:** Every script should start with a comment block stating purpose, inputs, outputs, and dependencies.
3. **Seed setting:** Always set random seeds explicitly (`set.seed()`, `np.random.seed()`, `set seed`, `Random.seed!`).
4. **Path convention:** Use project-relative paths, never absolute paths. Assume scripts run from project root.
5. **Data flow:** `data/raw/` → script → `data/processed/` → script → `paper/` output. Never write to `data/raw/`.
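
Rules 2–4 can be rolled into a standard script skeleton. A Python sketch (the file names, paths, and `SEED` value are illustrative, not a prescribed layout):

```python
"""
Purpose : Clean the raw survey export into an analysis panel (illustrative).
Inputs  : data/raw/survey.csv
Outputs : data/processed/panel.csv
Depends : numpy (pandas for the real cleaning steps)
"""
from pathlib import Path

import numpy as np

SEED = 20240101
rng = np.random.default_rng(SEED)          # rule 3: explicit seed

ROOT = Path(".")                           # rule 4: project-relative, run from root
RAW_DIR = ROOT / "data" / "raw"            # read-only per rule 5
OUT_DIR = ROOT / "data" / "processed"

# Seeded draws are reproducible across runs with the same SEED.
draw_a = rng.normal(size=3)
draw_b = np.random.default_rng(SEED).normal(size=3)
```

The same skeleton translates directly to R (`set.seed()`, `here::here()`), Stata (`set seed`, relative `cd`), and Julia (`Random.seed!`, `joinpath`).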