flonat-research 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (285)
  1. package/.claude/agents/domain-reviewer.md +336 -0
  2. package/.claude/agents/fixer.md +226 -0
  3. package/.claude/agents/paper-critic.md +370 -0
  4. package/.claude/agents/peer-reviewer.md +289 -0
  5. package/.claude/agents/proposal-reviewer.md +215 -0
  6. package/.claude/agents/referee2-reviewer.md +367 -0
  7. package/.claude/agents/references/journal-referee-profiles.md +354 -0
  8. package/.claude/agents/references/paper-critic/council-personas.md +77 -0
  9. package/.claude/agents/references/paper-critic/council-prompts.md +198 -0
  10. package/.claude/agents/references/peer-reviewer/report-template.md +199 -0
  11. package/.claude/agents/references/peer-reviewer/sa-prompts.md +260 -0
  12. package/.claude/agents/references/peer-reviewer/security-scan.md +188 -0
  13. package/.claude/agents/references/proposal-reviewer/report-template.md +144 -0
  14. package/.claude/agents/references/proposal-reviewer/sa-prompts.md +149 -0
  15. package/.claude/agents/references/referee-config.md +114 -0
  16. package/.claude/agents/references/referee2-reviewer/audit-checklists.md +287 -0
  17. package/.claude/agents/references/referee2-reviewer/report-template.md +334 -0
  18. package/.claude/rules/design-before-results.md +52 -0
  19. package/.claude/rules/ignore-agents-md.md +17 -0
  20. package/.claude/rules/ignore-gemini-md.md +17 -0
  21. package/.claude/rules/lean-claude-md.md +45 -0
  22. package/.claude/rules/learn-tags.md +99 -0
  23. package/.claude/rules/overleaf-separation.md +67 -0
  24. package/.claude/rules/plan-first.md +175 -0
  25. package/.claude/rules/read-docs-first.md +50 -0
  26. package/.claude/rules/scope-discipline.md +28 -0
  27. package/.claude/settings.json +125 -0
  28. package/.context/current-focus.md +33 -0
  29. package/.context/preferences/priorities.md +36 -0
  30. package/.context/preferences/task-naming.md +28 -0
  31. package/.context/profile.md +29 -0
  32. package/.context/projects/_index.md +41 -0
  33. package/.context/projects/papers/nudge-exp.md +22 -0
  34. package/.context/projects/papers/uncertainty.md +31 -0
  35. package/.context/resources/claude-scientific-writer-review.md +48 -0
  36. package/.context/resources/cunningham-multi-analyst-agents.md +104 -0
  37. package/.context/resources/cunningham-multilang-code-audit.md +62 -0
  38. package/.context/resources/google-ai-co-scientist-review.md +72 -0
  39. package/.context/resources/karpathy-llm-council-review.md +58 -0
  40. package/.context/resources/multi-coder-reliability-protocol.md +175 -0
  41. package/.context/resources/pedro-santanna-takeaways.md +96 -0
  42. package/.context/resources/venue-rankings/abs_ajg_2024.csv +1823 -0
  43. package/.context/resources/venue-rankings/abs_ajg_2024_econ.csv +356 -0
  44. package/.context/resources/venue-rankings/cabs_4_4star_theory.csv +40 -0
  45. package/.context/resources/venue-rankings/core_2026.csv +801 -0
  46. package/.context/resources/venue-rankings.md +147 -0
  47. package/.context/workflows/README.md +69 -0
  48. package/.context/workflows/daily-review.md +91 -0
  49. package/.context/workflows/meeting-actions.md +108 -0
  50. package/.context/workflows/replication-protocol.md +155 -0
  51. package/.context/workflows/weekly-review.md +113 -0
  52. package/.mcp-server-biblio/formatters.py +158 -0
  53. package/.mcp-server-biblio/pyproject.toml +11 -0
  54. package/.mcp-server-biblio/server.py +678 -0
  55. package/.mcp-server-biblio/sources/__init__.py +14 -0
  56. package/.mcp-server-biblio/sources/base.py +73 -0
  57. package/.mcp-server-biblio/sources/formatters.py +83 -0
  58. package/.mcp-server-biblio/sources/models.py +22 -0
  59. package/.mcp-server-biblio/sources/multi_source.py +243 -0
  60. package/.mcp-server-biblio/sources/openalex_source.py +183 -0
  61. package/.mcp-server-biblio/sources/scopus_source.py +309 -0
  62. package/.mcp-server-biblio/sources/wos_source.py +508 -0
  63. package/.mcp-server-biblio/uv.lock +896 -0
  64. package/.scripts/README.md +161 -0
  65. package/.scripts/ai_pattern_density.py +446 -0
  66. package/.scripts/conf +445 -0
  67. package/.scripts/config.py +122 -0
  68. package/.scripts/count_inventory.py +275 -0
  69. package/.scripts/daily_digest.py +288 -0
  70. package/.scripts/done +177 -0
  71. package/.scripts/extract_meeting_actions.py +223 -0
  72. package/.scripts/focus +176 -0
  73. package/.scripts/generate-codex-agents-md.py +217 -0
  74. package/.scripts/inbox +194 -0
  75. package/.scripts/notion_helpers.py +325 -0
  76. package/.scripts/openalex/query_helpers.py +306 -0
  77. package/.scripts/papers +227 -0
  78. package/.scripts/query +223 -0
  79. package/.scripts/session-history.py +201 -0
  80. package/.scripts/skill-health.py +516 -0
  81. package/.scripts/skill-log-miner.py +273 -0
  82. package/.scripts/sync-to-codex.sh +252 -0
  83. package/.scripts/task +213 -0
  84. package/.scripts/tasks +190 -0
  85. package/.scripts/week +206 -0
  86. package/CLAUDE.md +197 -0
  87. package/LICENSE +21 -0
  88. package/MEMORY.md +38 -0
  89. package/README.md +269 -0
  90. package/docs/agents.md +44 -0
  91. package/docs/bibliography-setup.md +55 -0
  92. package/docs/council-mode.md +36 -0
  93. package/docs/getting-started.md +245 -0
  94. package/docs/hooks.md +38 -0
  95. package/docs/mcp-servers.md +82 -0
  96. package/docs/notion-setup.md +109 -0
  97. package/docs/rules.md +33 -0
  98. package/docs/scripts.md +303 -0
  99. package/docs/setup-overview/setup-overview.pdf +0 -0
  100. package/docs/skills.md +70 -0
  101. package/docs/system.md +159 -0
  102. package/hooks/block-destructive-git.sh +66 -0
  103. package/hooks/context-monitor.py +114 -0
  104. package/hooks/postcompact-restore.py +157 -0
  105. package/hooks/precompact-autosave.py +181 -0
  106. package/hooks/promise-checker.sh +124 -0
  107. package/hooks/protect-source-files.sh +81 -0
  108. package/hooks/resume-context-loader.sh +53 -0
  109. package/hooks/startup-context-loader.sh +102 -0
  110. package/package.json +51 -0
  111. package/packages/cli-council/.github/workflows/claude-code-review.yml +44 -0
  112. package/packages/cli-council/.github/workflows/claude.yml +50 -0
  113. package/packages/cli-council/README.md +100 -0
  114. package/packages/cli-council/pyproject.toml +43 -0
  115. package/packages/cli-council/src/cli_council/__init__.py +19 -0
  116. package/packages/cli-council/src/cli_council/__main__.py +185 -0
  117. package/packages/cli-council/src/cli_council/backends/__init__.py +8 -0
  118. package/packages/cli-council/src/cli_council/backends/base.py +81 -0
  119. package/packages/cli-council/src/cli_council/backends/claude.py +25 -0
  120. package/packages/cli-council/src/cli_council/backends/codex.py +27 -0
  121. package/packages/cli-council/src/cli_council/backends/gemini.py +26 -0
  122. package/packages/cli-council/src/cli_council/checkpoint.py +212 -0
  123. package/packages/cli-council/src/cli_council/config.py +51 -0
  124. package/packages/cli-council/src/cli_council/council.py +391 -0
  125. package/packages/cli-council/src/cli_council/models.py +46 -0
  126. package/packages/llm-council/.github/workflows/claude-code-review.yml +44 -0
  127. package/packages/llm-council/.github/workflows/claude.yml +50 -0
  128. package/packages/llm-council/README.md +453 -0
  129. package/packages/llm-council/pyproject.toml +42 -0
  130. package/packages/llm-council/src/llm_council/__init__.py +23 -0
  131. package/packages/llm-council/src/llm_council/__main__.py +259 -0
  132. package/packages/llm-council/src/llm_council/checkpoint.py +193 -0
  133. package/packages/llm-council/src/llm_council/client.py +253 -0
  134. package/packages/llm-council/src/llm_council/config.py +232 -0
  135. package/packages/llm-council/src/llm_council/council.py +482 -0
  136. package/packages/llm-council/src/llm_council/models.py +46 -0
  137. package/packages/mcp-bibliography/MEMORY.md +31 -0
  138. package/packages/mcp-bibliography/_app.py +226 -0
  139. package/packages/mcp-bibliography/formatters.py +158 -0
  140. package/packages/mcp-bibliography/log/2026-03-13-2100.md +35 -0
  141. package/packages/mcp-bibliography/pyproject.toml +15 -0
  142. package/packages/mcp-bibliography/run.sh +20 -0
  143. package/packages/mcp-bibliography/scholarly_formatters.py +83 -0
  144. package/packages/mcp-bibliography/server.py +1857 -0
  145. package/packages/mcp-bibliography/tools/__init__.py +28 -0
  146. package/packages/mcp-bibliography/tools/_registry.py +19 -0
  147. package/packages/mcp-bibliography/tools/altmetric.py +107 -0
  148. package/packages/mcp-bibliography/tools/core.py +92 -0
  149. package/packages/mcp-bibliography/tools/dblp.py +52 -0
  150. package/packages/mcp-bibliography/tools/openalex.py +296 -0
  151. package/packages/mcp-bibliography/tools/opencitations.py +102 -0
  152. package/packages/mcp-bibliography/tools/openreview.py +179 -0
  153. package/packages/mcp-bibliography/tools/orcid.py +131 -0
  154. package/packages/mcp-bibliography/tools/scholarly.py +575 -0
  155. package/packages/mcp-bibliography/tools/unpaywall.py +63 -0
  156. package/packages/mcp-bibliography/tools/zenodo.py +123 -0
  157. package/packages/mcp-bibliography/uv.lock +711 -0
  158. package/scripts/setup.sh +143 -0
  159. package/skills/beamer-deck/SKILL.md +199 -0
  160. package/skills/beamer-deck/references/quality-rubric.md +54 -0
  161. package/skills/beamer-deck/references/review-prompts.md +106 -0
  162. package/skills/bib-validate/SKILL.md +261 -0
  163. package/skills/bib-validate/references/council-mode.md +34 -0
  164. package/skills/bib-validate/references/deep-verify.md +79 -0
  165. package/skills/bib-validate/references/fix-mode.md +36 -0
  166. package/skills/bib-validate/references/openalex-verification.md +45 -0
  167. package/skills/bib-validate/references/preprint-check.md +31 -0
  168. package/skills/bib-validate/references/ref-manager-crossref.md +41 -0
  169. package/skills/bib-validate/references/report-template.md +82 -0
  170. package/skills/code-archaeology/SKILL.md +141 -0
  171. package/skills/code-review/SKILL.md +265 -0
  172. package/skills/code-review/references/quality-rubric.md +67 -0
  173. package/skills/consolidate-memory/SKILL.md +208 -0
  174. package/skills/context-status/SKILL.md +126 -0
  175. package/skills/creation-guard/SKILL.md +230 -0
  176. package/skills/devils-advocate/SKILL.md +130 -0
  177. package/skills/devils-advocate/references/competing-hypotheses.md +83 -0
  178. package/skills/init-project/SKILL.md +115 -0
  179. package/skills/init-project-course/references/memory-and-settings.md +92 -0
  180. package/skills/init-project-course/references/organise-templates.md +94 -0
  181. package/skills/init-project-course/skill.md +147 -0
  182. package/skills/init-project-light/skill.md +139 -0
  183. package/skills/init-project-research/SKILL.md +368 -0
  184. package/skills/init-project-research/references/atlas-pipeline-sync.md +70 -0
  185. package/skills/init-project-research/references/atlas-schema.md +81 -0
  186. package/skills/init-project-research/references/confirmation-report.md +39 -0
  187. package/skills/init-project-research/references/domain-profile-template.md +104 -0
  188. package/skills/init-project-research/references/interview-round3.md +34 -0
  189. package/skills/init-project-research/references/literature-discovery.md +43 -0
  190. package/skills/init-project-research/references/scaffold-details.md +197 -0
  191. package/skills/init-project-research/templates/field-calibration.md +60 -0
  192. package/skills/init-project-research/templates/pipeline-manifest.md +63 -0
  193. package/skills/init-project-research/templates/run-all.sh +116 -0
  194. package/skills/init-project-research/templates/seed-files.md +337 -0
  195. package/skills/insights-deck/SKILL.md +151 -0
  196. package/skills/interview-me/SKILL.md +157 -0
  197. package/skills/latex/SKILL.md +141 -0
  198. package/skills/latex/references/latex-configs.md +183 -0
  199. package/skills/latex-autofix/SKILL.md +230 -0
  200. package/skills/latex-autofix/references/known-errors.md +183 -0
  201. package/skills/latex-autofix/references/quality-rubric.md +50 -0
  202. package/skills/latex-health-check/SKILL.md +161 -0
  203. package/skills/learn/SKILL.md +220 -0
  204. package/skills/learn/scripts/validate_skill.py +265 -0
  205. package/skills/lessons-learned/SKILL.md +201 -0
  206. package/skills/literature/SKILL.md +335 -0
  207. package/skills/literature/references/agent-templates.md +393 -0
  208. package/skills/literature/references/bibliometric-apis.md +44 -0
  209. package/skills/literature/references/cli-council-search.md +79 -0
  210. package/skills/literature/references/openalex-api-guide.md +371 -0
  211. package/skills/literature/references/openalex-common-queries.md +381 -0
  212. package/skills/literature/references/openalex-workflows.md +248 -0
  213. package/skills/literature/references/reference-manager-sync.md +36 -0
  214. package/skills/literature/references/scopus-api-guide.md +208 -0
  215. package/skills/literature/references/wos-api-guide.md +308 -0
  216. package/skills/multi-perspective/SKILL.md +311 -0
  217. package/skills/multi-perspective/references/computational-many-analysts.md +77 -0
  218. package/skills/pipeline-manifest/SKILL.md +226 -0
  219. package/skills/pre-submission-report/SKILL.md +153 -0
  220. package/skills/process-reviews/SKILL.md +244 -0
  221. package/skills/process-reviews/references/rr-routing.md +101 -0
  222. package/skills/project-deck/SKILL.md +87 -0
  223. package/skills/project-safety/SKILL.md +135 -0
  224. package/skills/proofread/SKILL.md +254 -0
  225. package/skills/proofread/references/quality-rubric.md +104 -0
  226. package/skills/python-env/SKILL.md +57 -0
  227. package/skills/quarto-deck/SKILL.md +226 -0
  228. package/skills/quarto-deck/references/markdown-format.md +143 -0
  229. package/skills/quarto-deck/references/quality-rubric.md +54 -0
  230. package/skills/save-context/SKILL.md +174 -0
  231. package/skills/session-log/SKILL.md +98 -0
  232. package/skills/shared/concept-validation-gate.md +161 -0
  233. package/skills/shared/council-protocol.md +265 -0
  234. package/skills/shared/distribution-diagnostics.md +164 -0
  235. package/skills/shared/engagement-stratified-sampling.md +218 -0
  236. package/skills/shared/escalation-protocol.md +74 -0
  237. package/skills/shared/external-audit-protocol.md +205 -0
  238. package/skills/shared/intercoder-reliability.md +256 -0
  239. package/skills/shared/mcp-degradation.md +81 -0
  240. package/skills/shared/method-probing-questions.md +163 -0
  241. package/skills/shared/multi-language-conventions.md +143 -0
  242. package/skills/shared/paid-api-safety.md +174 -0
  243. package/skills/shared/palettes.md +90 -0
  244. package/skills/shared/progressive-disclosure.md +92 -0
  245. package/skills/shared/project-documentation-content.md +443 -0
  246. package/skills/shared/project-documentation-format.md +281 -0
  247. package/skills/shared/project-documentation.md +100 -0
  248. package/skills/shared/publication-output.md +138 -0
  249. package/skills/shared/quality-scoring.md +70 -0
  250. package/skills/shared/reference-resolution.md +77 -0
  251. package/skills/shared/research-quality-rubric.md +165 -0
  252. package/skills/shared/rhetoric-principles.md +54 -0
  253. package/skills/shared/skill-design-patterns.md +272 -0
  254. package/skills/shared/skill-index.md +240 -0
  255. package/skills/shared/system-documentation.md +334 -0
  256. package/skills/shared/tikz-rules.md +402 -0
  257. package/skills/shared/validation-tiers.md +121 -0
  258. package/skills/shared/venue-guides/README.md +46 -0
  259. package/skills/shared/venue-guides/cell_press_style.md +483 -0
  260. package/skills/shared/venue-guides/conferences_formatting.md +564 -0
  261. package/skills/shared/venue-guides/cs_conference_style.md +463 -0
  262. package/skills/shared/venue-guides/examples/cell_summary_example.md +247 -0
  263. package/skills/shared/venue-guides/examples/medical_structured_abstract.md +313 -0
  264. package/skills/shared/venue-guides/examples/nature_abstract_examples.md +213 -0
  265. package/skills/shared/venue-guides/examples/neurips_introduction_example.md +245 -0
  266. package/skills/shared/venue-guides/journals_formatting.md +486 -0
  267. package/skills/shared/venue-guides/medical_journal_styles.md +535 -0
  268. package/skills/shared/venue-guides/ml_conference_style.md +556 -0
  269. package/skills/shared/venue-guides/nature_science_style.md +405 -0
  270. package/skills/shared/venue-guides/reviewer_expectations.md +417 -0
  271. package/skills/shared/venue-guides/venue_writing_styles.md +321 -0
  272. package/skills/split-pdf/SKILL.md +172 -0
  273. package/skills/split-pdf/methodology.md +48 -0
  274. package/skills/sync-notion/SKILL.md +93 -0
  275. package/skills/system-audit/SKILL.md +157 -0
  276. package/skills/system-audit/references/sub-agent-prompts.md +294 -0
  277. package/skills/task-management/SKILL.md +131 -0
  278. package/skills/update-focus/SKILL.md +204 -0
  279. package/skills/update-project-doc/SKILL.md +194 -0
  280. package/skills/validate-bib/SKILL.md +242 -0
  281. package/skills/validate-bib/references/council-mode.md +34 -0
  282. package/skills/validate-bib/references/deep-verify.md +71 -0
  283. package/skills/validate-bib/references/openalex-verification.md +45 -0
  284. package/skills/validate-bib/references/preprint-check.md +31 -0
  285. package/skills/validate-bib/references/report-template.md +62 -0
@@ -0,0 +1,256 @@
# Inter-Coder Reliability & LLM Annotation Validation

> Shared reference for content analysis, LLM annotation, and coding studies. Covers both human-coder and multi-model reliability assessment. Adapted from CommDAAF AgentAcademy protocol (Xu 2026).

## Principle

**Always report reliability per category, not just aggregate.** A global κ of 0.7 can hide the fact that one frame has κ = 0.3 — which means that frame's results are unreliable. Frame-specific (or category-specific) reliability is the minimum standard.

---

## When This Applies

- Any content analysis with 2+ coders (human or LLM)
- LLM annotation studies using multiple models
- Survey coding or qualitative coding with multiple raters
- Any study reporting inter-coder or inter-model agreement

---

## Metrics to Report

### For 2 coders

| Metric | When to use | Interpretation |
|--------|------------|----------------|
| **Cohen's κ** | Two coders, nominal categories | Chance-corrected agreement |
| **Weighted κ** | Two coders, ordinal categories | Accounts for degree of disagreement |
| **% Agreement** | Supplement only | Not chance-corrected; report alongside κ |

### For 3+ coders (including multi-model LLM)

| Metric | When to use | Interpretation |
|--------|------------|----------------|
| **Fleiss' κ** | 3+ coders, nominal categories | Multi-rater chance-corrected agreement |
| **Three-way agreement** | 3 models, quick check | Proportion where all 3 agree |
| **Majority agreement** | 3 models, voting | Proportion where ≥2/3 agree |
| **Pairwise κ** | 3+ coders, diagnostic | Which pairs agree/disagree |
| **Krippendorff's α** | Any number of coders, any scale | Most general; handles missing data |

---
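
In Python, the pairwise and multi-rater metrics above can be computed with scikit-learn and statsmodels (a minimal sketch with toy integer ratings; both libraries are assumed installed):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Toy data: rows = items, columns = coders, values = nominal category codes
ratings = np.array([
    [1, 1, 1],
    [2, 2, 1],
    [1, 1, 1],
    [2, 2, 2],
    [1, 2, 1],
])

# Cohen's kappa for one pair of coders
kappa_12 = cohen_kappa_score(ratings[:, 0], ratings[:, 1])

# Fleiss' kappa for all coders: convert to an item x category count table first
table, _ = aggregate_raters(ratings)
kappa_all = fleiss_kappa(table)
```

For Krippendorff's α, the standalone `krippendorff` PyPI package plays the same role (it also handles missing data).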

## Thresholds

| κ range | Interpretation | Action |
|---------|---------------|--------|
| < 0.20 | Poor | Do not use this category. Redefine or merge. |
| 0.20–0.40 | Fair | Flag. Consider retraining or revising codebook. |
| 0.40–0.60 | Moderate | Acceptable for exploratory. Report limitation. |
| 0.60–0.80 | Substantial | Acceptable for publication. |
| > 0.80 | Excellent | Strong agreement. |

**Publication threshold:** κ ≥ 0.7 (or α ≥ 0.7 for Krippendorff's α).

**LLM annotation minimum:** Human validation sample N ≥ 200, κ ≥ 0.7 between LLM and human.

---

## Category-Specific Reliability

### Why aggregate isn't enough

```
Aggregate Fleiss' κ = 0.72   ← looks fine

Per-category:
  SOLIDARITY:   κ = 0.89 ✅
  INJUSTICE:    κ = 0.81 ✅
  MOBILISATION: κ = 0.74 ✅
  HUMANITARIAN: κ = 0.31 ⚠️  ← this category's results are unreliable
  CULTURAL:     κ = 0.65 ⚠️  ← borderline
```

If HUMANITARIAN has κ = 0.31, any finding about that category should be treated as exploratory regardless of the aggregate score.

### Implementation

```python
from sklearn.metrics import cohen_kappa_score
import itertools
import math

def category_specific_reliability(coded_data, code_var='frame', coders=None):
    """
    Calculate reliability per category across all coders.

    coded_data: list of dicts, each with '{coder}_{code_var}' keys
    coders: list of coder names (e.g., ['claude', 'glm', 'kimi'] or ['coder1', 'coder2'])
    """
    # Collect all categories (ignore items a coder left uncoded)
    categories = set()
    for d in coded_data:
        for c in coders:
            categories.add(d.get(f'{c}_{code_var}'))
    categories.discard(None)

    results = {}
    for cat in categories:
        # For each item, create binary: did this coder assign this category?
        binary_codes = {c: [] for c in coders}
        for d in coded_data:
            for c in coders:
                binary_codes[c].append(1 if d.get(f'{c}_{code_var}') == cat else 0)

        # Ensure an entry exists even when the 3-coder branch below is skipped
        results[cat] = {}

        # Three-way agreement (for 3 coders)
        if len(coders) == 3:
            n_relevant = sum(1 for i in range(len(coded_data))
                             if any(binary_codes[c][i] for c in coders))
            if n_relevant < 5:
                results[cat] = {'n': n_relevant, 'kappa': None, 'flag': 'Too few cases'}
                continue

            three_way = sum(
                1 for i in range(len(coded_data))
                if all(binary_codes[c][i] == binary_codes[coders[0]][i] for c in coders)
                and any(binary_codes[c][i] for c in coders)
            ) / n_relevant

            results[cat] = {
                'n': n_relevant,
                'agreement': three_way,
                'flag': '✅' if three_way >= 0.6 else '⚠️ Low reliability'
            }

        # Pairwise kappa (skip degenerate pairs where kappa is undefined)
        kappas = []
        for c1, c2 in itertools.combinations(coders, 2):
            try:
                k = cohen_kappa_score(binary_codes[c1], binary_codes[c2])
            except ValueError:
                continue
            if not math.isnan(k):
                kappas.append(k)

        if kappas:
            results[cat]['mean_pairwise_kappa'] = sum(kappas) / len(kappas)

    return results
```

### R

```r
library(irr)

category_reliability <- function(coded_data, code_var, coders) {
  categories <- unique(unlist(coded_data[paste0(coders, "_", code_var)]))

  results <- list()
  for (cat in categories) {
    # Binary indicator per coder: 1 if this coder assigned cat to the item
    binary <- sapply(coders, function(c) {
      as.integer(coded_data[[paste0(c, "_", code_var)]] == cat)
    })

    n_relevant <- sum(rowSums(binary) > 0)
    if (n_relevant < 5) next

    k <- kappam.fleiss(binary)
    results[[cat]] <- list(
      n = n_relevant,
      fleiss_kappa = k$value,
      flag = ifelse(k$value >= 0.6, "OK", "Low")
    )
  }
  results
}
```

---

## LLM Annotation Protocol

When using LLMs as annotators (a growing practice):

### Multi-Model Design

| Requirement | Standard |
|-------------|----------|
| **Number of models** | ≥ 2 (ideally 3 from different providers) |
| **Human validation** | N ≥ 200, κ ≥ 0.7 between LLM majority vote and human gold standard |
| **Prompt documentation** | Full prompt text in appendix or replication package |
| **Disagreement analysis** | Report which categories models disagree on most |
| **Temperature** | 0 or near-0 for reproducibility |

### Majority Vote

For 3-model annotation, use majority vote (2/3 agreement):

```python
from collections import Counter

def majority_vote(codes_by_model):
    """codes_by_model: {'claude': 'X', 'gpt': 'Y', 'gemini': 'X'} → ('X', 2)"""
    counter = Counter(codes_by_model.values())
    majority, count = counter.most_common(1)[0]
    return majority, count  # count = 3 (unanimous), 2 (majority), or 1 (three-way split)
```

### When Models Disagree

| Agreement level | Action |
|----------------|--------|
| 3/3 agree | High confidence — use the label |
| 2/3 agree | Moderate confidence — use majority, flag for spot-check |
| 0/3 agree (three-way split) | Low confidence — requires human adjudication |

---
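
The adjudication rules in this table can be made mechanical. A sketch extending the majority-vote idea (the helper name and confidence labels are illustrative, not part of the protocol):

```python
from collections import Counter

def label_with_confidence(codes_by_model):
    """Map per-model codes to (label, confidence) following the table above."""
    counter = Counter(codes_by_model.values())
    label, count = counter.most_common(1)[0]
    if count == len(codes_by_model):
        return label, "high"               # all models agree: use the label
    if count > len(codes_by_model) / 2:
        return label, "moderate"           # majority label; flag for spot-check
    return None, "needs_adjudication"      # split: route to a human
```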

## Reporting Template

```markdown
## Inter-Coder Reliability

### Overall
- Fleiss' κ = X.XX (N = XXX items, K = X coders)
- Three-way agreement: XX.X%
- Majority agreement (≥2/3): XX.X%

### Per Category
| Category | N | Three-way | Fleiss' κ | Status |
|----------|---|-----------|-----------|--------|
| X | XX | XX.X% | X.XX | ✅/⚠️ |
| Y | XX | XX.X% | X.XX | ✅/⚠️ |

### Low-Reliability Categories
⚠️ [Category] had κ = X.XX (below 0.60 threshold).
Findings involving this category should be interpreted with caution.

### Human Validation (if LLM-annotated)
- Gold standard sample: N = XXX
- LLM majority vs. human: κ = X.XX
- Model-specific: Claude κ = X.XX, GPT κ = X.XX, Gemini κ = X.XX
```

---

## Integration

### In `/data-analysis`

When the dataset includes coded/annotated variables:
1. Check if reliability metrics are reported
2. If multi-model LLM annotation, verify human validation exists
3. Flag categories below κ = 0.6 threshold
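
The flagging step can be automated against the output of `category_specific_reliability` above (a sketch; assumes each category's entry may carry a `mean_pairwise_kappa` key):

```python
def flag_low_reliability(results, threshold=0.6):
    """List categories whose mean pairwise kappa falls below the threshold."""
    return [cat for cat, r in results.items()
            if r.get('mean_pairwise_kappa') is not None
            and r['mean_pairwise_kappa'] < threshold]
```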

### In review agents

Check whether content analysis papers:
- Report per-category reliability (not just aggregate)
- Flag low-reliability categories as limitations
- Include human validation for LLM annotations

Treat missing reliability as a Major issue; aggregate-only reporting as a Minor issue.

### Validation tier interaction

| Tier | Requirement |
|------|------------|
| 🟢 Exploratory | Multi-model agreement reported; human validation optional |
| 🟡 Pilot | Multi-model agreement + spot-check (N ≥ 50) |
| 🔴 Publication | Full reliability battery + human validation (N ≥ 200, κ ≥ 0.7) |
@@ -0,0 +1,81 @@
# MCP Degradation Pattern

> Shared reference for skills that depend on MCP servers. Gracefully degrade when a server is unreachable instead of failing silently or blocking the workflow.

## The 5-Step Pattern

### Step 1: Probe at Start

Before any MCP-dependent work, test connectivity:

```
Try a lightweight read operation (e.g., mcp__taskflow__search_tasks with a known term,
openalex_search_works with a trivial query). If it times out or errors,
mark that server as unavailable for the session.
```

Do this **once**, at the beginning of the skill — not before every call.

### Step 2: Report Availability

After probing, state clearly:

```
MCP status:
- Vault: ✓ available
- OpenAlex: ✗ unavailable (timeout)
```

This sets expectations before the user sees skipped steps.

### Step 3: Skip Dependent Phases

When a server is unavailable, skip phases that depend on it entirely — do not attempt them, do not retry. Mark skipped phases clearly in the output:

```
Step 5: Update vault research pipeline — SKIPPED (vault unavailable)
```

### Step 4: Offer Fallbacks

For each skipped phase, suggest what the user can do manually:

| Unavailable | Fallback |
|-------------|----------|
| taskflow MCP | "Run `vault sync (edit vault files directly)` later when vault is accessible" |
| OpenAlex MCP | "Use `/literature` with web search mode instead" |
| Scholarly MCP | "Use `/literature` with web search mode instead" |

### Step 5: Summarize at End

Close with a clean summary of what completed vs. what was skipped:

```
Completed: Steps 1-4 (local context updates)
Skipped: Step 5 (vault sync — server unavailable)
Action needed: Run `vault sync (edit vault files directly)` when vault is accessible
```
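
The probe-once logic of Steps 1, 2, and 5 can be sketched generically. The probe callables below are stand-ins for real MCP calls, which this sketch does not attempt to name:

```python
def probe_servers(probes):
    """Run each probe once at session start; record availability for the session."""
    status = {}
    for name, probe in probes.items():
        try:
            probe()                  # lightweight read operation
            status[name] = True
        except Exception:            # timeout or error: unavailable this session
            status[name] = False
    return status

def status_report(status):
    """Render the Step 2 availability report."""
    lines = ["MCP status:"]
    for name, ok in status.items():
        lines.append(f"- {name}: {'✓ available' if ok else '✗ unavailable'}")
    return "\n".join(lines)
```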

## MCP-Consuming Skills

These skills should reference this pattern:

| Skill | MCP Server | What depends on it |
|-------|------------|-------------------|
| `vault sync` | vault | Steps 3, 5 (search + update) |
| `task-management` | vault | Daily planning, task creation, pipeline queries |
| `init-project-research` | vault | Pipeline entry creation |
| `literature` | OpenAlex, Scholarly | Citation search, DOI verification |
| `atlas-review` | vault | Pipeline cross-reference |

## When to Apply

- Always when a skill's workflow includes MCP tool calls
- Especially in background agents (where MCP tools may auto-deny due to permission constraints)
- When network issues are suspected (slow responses, recent timeouts)

## When to Skip

- Skills that don't use MCP tools
- When the user explicitly says "skip vault" or "offline mode"
- Interactive sessions where the user can retry immediately
@@ -0,0 +1,163 @@
# Method-Specific Probing Questions

> Shared reference for analysis skills and review agents. Mandatory questions before running any empirical analysis. Adapted from CommDAAF (Xu 2026), generalised beyond communication research.

## Principle

**Never run a method with default parameters.** Before executing any analysis, ask the method-specific probing questions. Do NOT proceed without explicit answers. Vague answers trigger the [escalation protocol](escalation-protocol.md).

## Expert Fast-Track

Experienced researchers can bypass probing by providing complete specs upfront:

```
DiD: treatment = policy_change_2020, control = neighbouring_states,
parallel trends tested 2015-2019, Callaway-Sant'Anna estimator,
clustered SEs at state level, 3 pre-treatment periods
```

If the spec is complete, acknowledge and proceed. If anything is missing, probe only the gaps.

---

## Probing Questions by Method

### Regression / OLS / Panel

1. What is your **estimand**? (ATE? ATT? Conditional mean?)
2. What is the **unit of analysis**? (Individual? Firm? Country-year?)
3. What is the **identification strategy**? (Selection on observables? IV? DiD? RDD?)
4. What **controls** and why? (Justify each — no kitchen sink)
5. How do you handle **standard errors**? (Clustered? HAC? At what level and why?)
6. What are the **key threats to validity**? (Omitted variables, reverse causality, measurement error)

### Difference-in-Differences

1. What is the **treatment** and when does it occur?
2. What are **treatment and control groups**? How defined?
3. Is treatment timing **staggered**? (If yes: TWFE is biased — use CS, Sun-Abraham, or similar)
4. **Parallel trends** evidence? (Pre-treatment dynamics, placebo tests)
5. Any **anticipation effects**? (Treatment effects before official treatment date)
6. Are there **spillovers** between treated and control units?
7. What is the **relevant pre-treatment window**?
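
Question 3 (staggered timing) can be checked mechanically before choosing an estimator. A pandas sketch with toy panel data (the column names are assumptions, not a convention this document prescribes):

```python
import pandas as pd

df = pd.DataFrame({
    "state":   ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "year":    [2018, 2019, 2020] * 3,
    "treated": [0, 1, 1,  0, 0, 1,  0, 0, 0],
})

# First year each unit is treated; never-treated units drop out of the groupby
first_treated = df[df["treated"] == 1].groupby("state")["year"].min()

# More than one adoption cohort means staggered timing, so plain TWFE is suspect
staggered = first_treated.nunique() > 1
```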
43
+
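In the canonical two-group, two-period case, the DiD estimate is just a double difference of cell means. A sketch on simulated data (pure `numpy`/`pandas`; names hypothetical). Once timing is staggered, this simple comparison is exactly what heterogeneity-robust estimators replace:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000
treated = rng.integers(0, 2, n)   # group indicator
post = rng.integers(0, 2, n)      # period indicator
att = 1.5                         # true treatment effect

# Group difference (1.0) and common time trend (0.8) are both differenced away.
y = 0.5 + 1.0 * treated + 0.8 * post + att * treated * post + rng.normal(size=n)
df = pd.DataFrame({"y": y, "treated": treated, "post": post})

cell = df.groupby(["treated", "post"])["y"].mean()
did = (cell.loc[(1, 1)] - cell.loc[(1, 0)]) - (cell.loc[(0, 1)] - cell.loc[(0, 0)])
```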
### Instrumental Variables

1. What is the **instrument** and what is the **theoretical argument** for relevance?
2. What is the **exclusion restriction** argument? (Why does Z affect Y only through X?)
3. **Weak instrument** diagnostics? (First-stage F-statistic, Anderson-Rubin CIs)
4. Is the instrument **plausibly exogenous**? (What could violate this?)
5. Are there **multiple instruments**? (Over-identification tests)
6. What is the **LATE** you're estimating? (Who are the compliers?)

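For a single instrument, both the Wald/IV estimate and the first-stage F-statistic can be computed by hand, which makes question 3 cheap to answer before reaching for a full 2SLS routine. A simulated sketch in plain `numpy` (names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                 # instrument
u = rng.normal(size=n)                 # unobserved confounder
x = 0.5 * z + u + rng.normal(size=n)   # endogenous regressor
y = 2.0 * x + u + rng.normal(size=n)   # true causal effect: 2.0

# Just-identified IV (Wald estimator) vs naive OLS
beta_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
beta_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# First-stage F for one instrument = squared t-statistic of z in x ~ z
zc, xc = z - z.mean(), x - x.mean()
pi_hat = (zc @ xc) / (zc @ zc)
resid = xc - pi_hat * zc
se_pi = np.sqrt((resid @ resid) / (n - 2) / (zc @ zc))
first_stage_F = (pi_hat / se_pi) ** 2
```

Here OLS is biased upward by the confounder while IV recovers the true effect. An F well above the rule-of-thumb 10 is the minimum question 3 probes for; some recent weak-IV guidance argues for much higher thresholds.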
### Regression Discontinuity

1. What is the **running variable** and **cutoff**?
2. Is it **sharp or fuzzy**? (Compliance rate at cutoff)
3. **Bandwidth selection** method? (MSE-optimal? Cross-validation?)
4. Evidence against **manipulation** at cutoff? (McCrary test, density plot)
5. **Functional form**? (Local linear? Polynomial order? Sensitivity to choice)
6. Are there **multiple cutoffs** or **geographic boundaries**?

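Questions 3 and 5 interact: the estimate is a local regression on each side of the cutoff, so bandwidth and functional-form choices move it directly. A simulated sharp-RDD sketch in `numpy` with a fixed bandwidth (in practice use an MSE-optimal bandwidth, e.g. via `rdrobust`; names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
running = rng.uniform(-1, 1, n)          # running variable, cutoff at 0
treat = (running >= 0).astype(float)     # sharp design: treatment jumps at cutoff
tau = 0.7                                # true discontinuity
y = 1.0 + 0.5 * running + tau * treat + rng.normal(scale=0.3, size=n)

h = 0.25                                 # fixed bandwidth, for illustration only
win = np.abs(running) <= h
X = np.column_stack([
    np.ones(win.sum()),
    treat[win],
    running[win],
    running[win] * treat[win],           # separate slope on each side
])
coef, *_ = np.linalg.lstsq(X, y[win], rcond=None)
tau_hat = coef[1]                        # jump in the intercept at the cutoff
```

Rerunning with several values of `h` is the cheapest sensitivity check question 5 asks for.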
### Experiments / RCTs

1. What is the **randomisation unit**? (Individual? Cluster?)
2. **Power analysis**? (Effect size assumption, desired power, required N)
3. Was the experiment **pre-registered**? (Where? What deviations?)
4. How do you handle **attrition** and **non-compliance**?
5. **Multiple testing**? (How many outcomes? Correction method?)
6. **Demand effects**? (Can participants guess the hypothesis?)
7. Is there a **pre-analysis plan**?

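Question 2 should produce a number before data collection starts. A sketch using `statsmodels` (assuming a two-arm, individual-level design and a standardised effect size):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Required N per arm to detect d = 0.5 at 80% power, alpha = 0.05
n_per_arm = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                 alternative="two-sided")

# Minimum detectable effect if the budget caps you at 100 per arm
mde = analysis.solve_power(nobs1=100, alpha=0.05, power=0.8,
                           alternative="two-sided")
```

Cluster-randomised designs (question 1) need a design-effect adjustment on top of this; the per-arm N here assumes independent observations.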
### Survey / Psychometrics

1. What **scales** are you using? (Validated or ad hoc?)
2. **Common method variance** — single source, single method?
3. **Response rate** and non-response bias assessment?
4. **Sampling strategy**? (Probability? Convenience? Online panel?)
5. How do you handle **missing data**? (Listwise deletion? MI? FIML?)
6. **Measurement invariance** across groups? (If comparing subgroups)

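For question 1, an ad hoc scale should at least report internal consistency. A self-contained `numpy` sketch of Cronbach's alpha (simulated respondents; names hypothetical):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items score matrix."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

rng = np.random.default_rng(3)
latent = rng.normal(size=500)                                  # shared construct
scale_items = latent[:, None] + rng.normal(scale=0.8, size=(500, 4))
noise_items = rng.normal(size=(500, 4))                        # unrelated items

alpha_good = cronbach_alpha(scale_items)   # items share a latent factor
alpha_bad = cronbach_alpha(noise_items)    # items measure nothing in common
```

A high alpha is not evidence of unidimensionality; factor analysis and the invariance checks in question 6 still apply.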
### MCDM / Multi-Criteria Analysis

1. What **method** and **why this one**? (AHP? TOPSIS? PROMETHEE? ELECTRE?)
2. How are **criteria weights** determined? (Expert elicitation? Pairwise comparison? Equal?)
3. **Sensitivity analysis** on weights? (How much do rankings change?)
4. **Rank reversal** check? (Does adding/removing alternatives change the ranking?)
5. How many **decision-makers** and how are judgements aggregated?
6. Are criteria **independent**? (If not, how is correlation handled?)
7. **Normalisation method** and sensitivity to that choice?

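Questions 3 and 4 are cheap to check once the method is coded. A minimal TOPSIS sketch in `numpy` (toy decision matrix, hypothetical alternatives) in which the top-ranked alternative flips when the weights shift, which is precisely what a weight sensitivity analysis must report:

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """matrix: alternatives x criteria; benefit[j] True if higher is better."""
    norm = matrix / np.sqrt((matrix ** 2).sum(axis=0))      # vector normalisation
    v = norm * weights
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_pos = np.sqrt(((v - ideal) ** 2).sum(axis=1))
    d_neg = np.sqrt(((v - anti) ** 2).sum(axis=1))
    return d_neg / (d_pos + d_neg)                          # closeness in [0, 1]

# 3 alternatives x 3 criteria; the last criterion is a cost
m = np.array([[7.0, 9.0, 9.0],
              [8.0, 7.0, 8.0],
              [9.0, 6.0, 7.0]])
benefit = np.array([True, True, False])

base = topsis(m, np.array([1 / 3, 1 / 3, 1 / 3]), benefit)
shifted = topsis(m, np.array([0.5, 0.25, 0.25]), benefit)
base_rank = np.argsort(-base)        # best alternative first
shifted_rank = np.argsort(-shifted)
```

The normalisation choice (question 7) is another axis to vary: rerun with min-max normalisation and compare rankings.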
### Topic Modeling / NLP

1. Why topic modeling specifically? (Exploratory? No predetermined categories?)
2. How many topics (K) and **how will you select K**? (Coherence score? Held-out likelihood?)
3. What **preprocessing**? (Stopwords, stemming, frequency thresholds — justify each)
4. What counts as one **document**? (Post? Paragraph? Article?)
5. How will you **validate topics are meaningful**? (Read 20+ docs per topic)
6. Who will **name topics** and how?

### Machine Learning / Classification

1. What is the **ground truth** and how was it generated?
2. **Train/test split** strategy? (Random? Temporal? Stratified?)
3. How do you prevent **data leakage**?
4. What **baselines** are you comparing against? (Are they serious or strawmen?)
5. **Evaluation metrics** and why? (Accuracy? F1? AUC? — justify for your class balance)
6. **Human validation** sample? (Required for any LLM annotation: N≥200, κ≥0.7)

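The human-validation requirement in question 6 reduces to computing chance-corrected agreement between an annotated sample and the model's labels. A dependency-free Cohen's κ sketch (toy labels here; in practice N ≥ 200):

```python
import numpy as np

def cohens_kappa(a, b):
    a, b = np.asarray(a), np.asarray(b)
    categories = np.union1d(a, b)
    p_obs = (a == b).mean()                                       # raw agreement
    p_exp = sum((a == c).mean() * (b == c).mean() for c in categories)
    return (p_obs - p_exp) / (1 - p_exp)

human = np.array([1, 1, 0, 0, 1, 0, 1, 0, 1, 1])
model = np.array([1, 1, 0, 0, 1, 0, 1, 1, 1, 1])
kappa = cohens_kappa(human, model)
```

Raw agreement (0.9 here) always overstates reliability when classes are imbalanced, which is why the threshold is set on κ rather than accuracy.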
### Simulation / Agent-Based Models

1. What is the **purpose** of the simulation? (Theoretical exploration? Mechanism testing? Calibration?)
2. How are **parameters chosen**? (Empirically calibrated? Assumed? Swept?)
3. **Sensitivity analysis** plan? (Which parameters vary? Over what range?)
4. How many **replications** per parameter configuration?
5. Does the model **validate against empirical data**? (Or is it purely theoretical?)
6. What **simplifying assumptions** are made and what do they rule out?
7. **Convergence check** — how do you know the simulation has run long enough?

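Questions 4 and 7 have a mechanical answer: run many seeded replications and show that the Monte Carlo estimate has stabilised relative to its own simulation error. A toy `numpy` sketch (the stochastic process itself is a placeholder):

```python
import numpy as np

rng = np.random.default_rng(7)

def one_run(rng, steps=200):
    """Placeholder process: fraction of time a random walk spends above zero."""
    walk = np.cumsum(rng.choice([-1, 1], size=steps))
    return (walk > 0).mean()

reps = np.array([one_run(rng) for _ in range(2000)])
running_mean = np.cumsum(reps) / np.arange(1, len(reps) + 1)

# Convergence diagnostic: drift of the estimate over the second half of the
# replications, judged against the Monte Carlo standard error.
drift = abs(running_mean[len(reps) // 2 - 1] - running_mean[-1])
mc_se = reps.std(ddof=1) / np.sqrt(len(reps))
```

Report `mc_se` alongside the estimate; if `drift` is not small relative to `mc_se`, add replications rather than trusting the current run.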
### Content Analysis / Coding

1. Where is your **codebook**? (Must exist before coding)
2. How many **coders** and what is the reliability plan?
3. **Sampling strategy** for content to code?
4. **Training protocol** for coders?
5. How will you handle **ambiguous cases**?
6. **Inter-coder reliability** metric and threshold? (Fleiss' κ ≥ 0.7 for publication)

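With more than two coders, question 6 calls for Fleiss' κ, computed from an items × categories count matrix. A dependency-free sketch (toy counts: 3 coders, 2 categories):

```python
import numpy as np

def fleiss_kappa(counts):
    """counts[i, j]: number of coders assigning item i to category j."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)[0]                       # coders per item (constant)
    p_cat = counts.sum(axis=0) / counts.sum()       # overall category shares
    p_item = ((counts ** 2).sum(axis=1) - n) / (n * (n - 1))
    p_bar = p_item.mean()                           # mean observed agreement
    p_exp = (p_cat ** 2).sum()                      # chance agreement
    return (p_bar - p_exp) / (1 - p_exp)

counts = np.array([
    [3, 0], [0, 3], [3, 0], [0, 3],   # full agreement
    [2, 1], [1, 2],                   # split decisions
    [3, 0], [0, 3],
])
kappa = fleiss_kappa(counts)
```

Here two ambiguous items out of eight already drag κ to about 0.67, under the 0.7 bar, which is why the codebook and training protocol (questions 1 and 4) come first.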
---

## How Skills and Agents Use This

### In analysis skills (`/data-analysis`, `/causal-design`, `/experiment-design`)

1. Detect the method type from the user's request
2. Present the relevant probing questions
3. Wait for answers before proceeding
4. If answers are vague → [escalation protocol](escalation-protocol.md)
5. Expert fast-track: if the user provides a complete spec, acknowledge and proceed

### In review agents (`referee2-reviewer`, `domain-reviewer`)

1. Check whether the paper addresses these questions for its stated method
2. Flag unanswered questions as issues in the review report
3. Missing identification strategy → Critical; missing sensitivity analysis → Major

### In project agents (`data-engineer`, `econometrician`)

1. Before implementing any estimation, verify the probing questions are answered
2. Reference the answers from `.planning/state.md` or the paper's methodology section
3. If no answers exist, probe the user before writing code

---

## Adding New Methods

When a research project uses a method not covered above, create a method-specific question set following the pattern:

1. **What** — define the exact variant/specification
2. **Why** — justify the choice over alternatives
3. **How** — implementation details that affect validity
4. **Threats** — what could go wrong
5. **Validation** — how you'll know it's correct
6. **Sensitivity** — what happens when assumptions change
---

# Multi-Language Conventions

> Code style, packages, and output patterns for R, Python, Stata, and Julia.
> Referenced by `/data-analysis`, `/synthetic-data`, `/replication-package`.

## Language Detection

Detect from project context in this order:

1. Existing scripts in `code/` or `src/` → match language
2. `CLAUDE.md` or `MEMORY.md` mentions → follow stated preference
3. User request → explicit language choice
4. Default → R for econometrics/causal inference, Python for ML/simulation

## R Conventions

### Style
- Assignment: `<-` (never `=`)
- Pipe: `|>` (base R) preferred; `%>%` acceptable if tidyverse already loaded
- Naming: `snake_case` for variables and functions
- Line width: 80 characters

### Core Packages

| Task | Package |
|------|---------|
| Data wrangling | `dplyr`, `tidyr`, `data.table` |
| Estimation | `fixest`, `estimatr`, `lme4`, `survival` |
| Causal inference | `did`, `didimputation`, `synthdid`, `rdrobust`, `MatchIt` |
| Tables | `modelsummary`, `kableExtra`, `stargazer` |
| Figures | `ggplot2` + `ggthemes`, `patchwork` |
| Reporting | `rmarkdown`, `knitr` |
| Power analysis | `DeclareDesign`, `pwr`, `simr` |
| Survey | `survey`, `srvyr`, `qualtRics` |

### Output Pattern
```r
# Tables → LaTeX
modelsummary(models, output = "paper/tables/table1.tex",
             stars = c("*" = 0.10, "**" = 0.05, "***" = 0.01))

# Figures → PDF
ggsave("paper/figures/fig1.pdf", width = 6, height = 4)
```

## Python Conventions

### Style
- **Always use `uv`** — never bare `python` or `pip`
- Naming: `snake_case` for variables/functions, `PascalCase` for classes
- Type hints encouraged but not required for analysis scripts
- Line width: 88 characters (Black default)

### Core Packages

| Task | Package |
|------|---------|
| Data wrangling | `pandas`, `polars` |
| Estimation | `statsmodels`, `linearmodels`, `scikit-learn` |
| Causal inference | `econml`, `causalml`, `doubleml` |
| Tables | `stargazer` (Python port; wraps `statsmodels` results), custom `.tex` export |
| Figures | `matplotlib`, `seaborn`, `plotnine` |
| Power analysis | `statsmodels.stats.power`, `numpy` simulation |
| Survey | `pandas` + manual parsing |

### Output Pattern
```python
import pandas as pd

# Tables → LaTeX
df.to_latex("paper/tables/table1.tex", index=False,
            caption="Descriptive Statistics", label="tab:desc")

# Figures → PDF
fig.savefig("paper/figures/fig1.pdf", bbox_inches="tight", dpi=300)
```

## Stata Conventions

### Style
- Use `//` for inline comments, `/* */` for blocks
- Naming: lowercase with underscores
- Always set `version` at script top
- Use `capture log close` / `log using` pattern

### Core Commands

| Task | Command |
|------|---------|
| Data wrangling | `gen`, `replace`, `reshape`, `merge`, `collapse` |
| Estimation | `reg`, `ivregress`, `xtreg`, `areg`, `ppmlhdfe` |
| Causal inference | `did_multiplegt`, `csdid`, `rdrobust`, `eventstudyinteract` |
| Tables | `esttab`, `outreg2`, `estout` |
| Figures | `twoway`, `coefplot`, `graph export` |
| Power analysis | `power`, `simulate` |

### Output Pattern
```stata
// Tables → LaTeX
esttab m1 m2 m3 using "paper/tables/table1.tex", ///
    replace booktabs label se star(* 0.10 ** 0.05 *** 0.01)

// Figures → PDF
graph export "paper/figures/fig1.pdf", replace
```

## Julia Conventions

### Style
- Naming: `snake_case` for functions/variables, `PascalCase` for types
- Use `using` not `import` for standard packages
- Line width: 92 characters

### Core Packages

| Task | Package |
|------|---------|
| Data wrangling | `DataFrames.jl`, `CSV.jl` |
| Estimation | `GLM.jl`, `FixedEffectModels.jl`, `MixedModels.jl` |
| Causal inference | `CausalInference.jl` (limited — often custom) |
| Tables | `PrettyTables.jl` (LaTeX backend) |
| Figures | `Makie.jl`, `AlgebraOfGraphics.jl`, `Plots.jl` |
| Power analysis | Custom simulation with `Distributions.jl` |

### Output Pattern
```julia
using PrettyTables
# Tables → LaTeX
open("paper/tables/table1.tex", "w") do io
    pretty_table(io, df, backend=Val(:latex))
end

# Figures → PDF
save("paper/figures/fig1.pdf", fig)
```

## Cross-Language Rules

1. **Output routing:** All `.tex` table files and figure files go to `paper/tables/` or `paper/figures/` per the `overleaf-separation` rule. Scripts stay in `code/` or `src/`.
2. **Reproducibility header:** Every script should start with a comment block stating purpose, inputs, outputs, and dependencies.
3. **Seed setting:** Always set random seeds explicitly (`set.seed()`, `np.random.seed()`, `set seed`, `Random.seed!`).
4. **Path convention:** Use project-relative paths, never absolute paths. Assume scripts run from project root.
5. **Data flow:** `data/raw/` → script → `data/processed/` → script → `paper/` output. Never write to `data/raw/`.
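
Rules 2–4 can be rolled into a standard script skeleton. A Python sketch (the file names, paths, and `SEED` value are illustrative, not a prescribed layout):

```python
"""
Purpose : Clean the raw survey export into an analysis panel (illustrative).
Inputs  : data/raw/survey.csv
Outputs : data/processed/panel.csv
Depends : numpy (pandas for the real cleaning steps)
"""
from pathlib import Path

import numpy as np

SEED = 20240101
rng = np.random.default_rng(SEED)          # rule 3: explicit seed

ROOT = Path(".")                           # rule 4: project-relative, run from root
RAW_DIR = ROOT / "data" / "raw"            # read-only per rule 5
OUT_DIR = ROOT / "data" / "processed"

# Seeded draws are reproducible across runs with the same SEED.
draw_a = rng.normal(size=3)
draw_b = np.random.default_rng(SEED).normal(size=3)
```

The same skeleton translates directly to R (`set.seed()`, `here::here()`), Stata (`set seed`, relative `cd`), and Julia (`Random.seed!`, `joinpath`).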