npm - flonat-research - Versions diffs - 0.1.0 - Mend

flonat-research 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (285)

package/.claude/agents/domain-reviewer.md +336 -0
package/.claude/agents/fixer.md +226 -0
package/.claude/agents/paper-critic.md +370 -0
package/.claude/agents/peer-reviewer.md +289 -0
package/.claude/agents/proposal-reviewer.md +215 -0
package/.claude/agents/referee2-reviewer.md +367 -0
package/.claude/agents/references/journal-referee-profiles.md +354 -0
package/.claude/agents/references/paper-critic/council-personas.md +77 -0
package/.claude/agents/references/paper-critic/council-prompts.md +198 -0
package/.claude/agents/references/peer-reviewer/report-template.md +199 -0
package/.claude/agents/references/peer-reviewer/sa-prompts.md +260 -0
package/.claude/agents/references/peer-reviewer/security-scan.md +188 -0
package/.claude/agents/references/proposal-reviewer/report-template.md +144 -0
package/.claude/agents/references/proposal-reviewer/sa-prompts.md +149 -0
package/.claude/agents/references/referee-config.md +114 -0
package/.claude/agents/references/referee2-reviewer/audit-checklists.md +287 -0
package/.claude/agents/references/referee2-reviewer/report-template.md +334 -0
package/.claude/rules/design-before-results.md +52 -0
package/.claude/rules/ignore-agents-md.md +17 -0
package/.claude/rules/ignore-gemini-md.md +17 -0
package/.claude/rules/lean-claude-md.md +45 -0
package/.claude/rules/learn-tags.md +99 -0
package/.claude/rules/overleaf-separation.md +67 -0
package/.claude/rules/plan-first.md +175 -0
package/.claude/rules/read-docs-first.md +50 -0
package/.claude/rules/scope-discipline.md +28 -0
package/.claude/settings.json +125 -0
package/.context/current-focus.md +33 -0
package/.context/preferences/priorities.md +36 -0
package/.context/preferences/task-naming.md +28 -0
package/.context/profile.md +29 -0
package/.context/projects/_index.md +41 -0
package/.context/projects/papers/nudge-exp.md +22 -0
package/.context/projects/papers/uncertainty.md +31 -0
package/.context/resources/claude-scientific-writer-review.md +48 -0
package/.context/resources/cunningham-multi-analyst-agents.md +104 -0
package/.context/resources/cunningham-multilang-code-audit.md +62 -0
package/.context/resources/google-ai-co-scientist-review.md +72 -0
package/.context/resources/karpathy-llm-council-review.md +58 -0
package/.context/resources/multi-coder-reliability-protocol.md +175 -0
package/.context/resources/pedro-santanna-takeaways.md +96 -0
package/.context/resources/venue-rankings/abs_ajg_2024.csv +1823 -0
package/.context/resources/venue-rankings/abs_ajg_2024_econ.csv +356 -0
package/.context/resources/venue-rankings/cabs_4_4star_theory.csv +40 -0
package/.context/resources/venue-rankings/core_2026.csv +801 -0
package/.context/resources/venue-rankings.md +147 -0
package/.context/workflows/README.md +69 -0
package/.context/workflows/daily-review.md +91 -0
package/.context/workflows/meeting-actions.md +108 -0
package/.context/workflows/replication-protocol.md +155 -0
package/.context/workflows/weekly-review.md +113 -0
package/.mcp-server-biblio/formatters.py +158 -0
package/.mcp-server-biblio/pyproject.toml +11 -0
package/.mcp-server-biblio/server.py +678 -0
package/.mcp-server-biblio/sources/__init__.py +14 -0
package/.mcp-server-biblio/sources/base.py +73 -0
package/.mcp-server-biblio/sources/formatters.py +83 -0
package/.mcp-server-biblio/sources/models.py +22 -0
package/.mcp-server-biblio/sources/multi_source.py +243 -0
package/.mcp-server-biblio/sources/openalex_source.py +183 -0
package/.mcp-server-biblio/sources/scopus_source.py +309 -0
package/.mcp-server-biblio/sources/wos_source.py +508 -0
package/.mcp-server-biblio/uv.lock +896 -0
package/.scripts/README.md +161 -0
package/.scripts/ai_pattern_density.py +446 -0
package/.scripts/conf +445 -0
package/.scripts/config.py +122 -0
package/.scripts/count_inventory.py +275 -0
package/.scripts/daily_digest.py +288 -0
package/.scripts/done +177 -0
package/.scripts/extract_meeting_actions.py +223 -0
package/.scripts/focus +176 -0
package/.scripts/generate-codex-agents-md.py +217 -0
package/.scripts/inbox +194 -0
package/.scripts/notion_helpers.py +325 -0
package/.scripts/openalex/query_helpers.py +306 -0
package/.scripts/papers +227 -0
package/.scripts/query +223 -0
package/.scripts/session-history.py +201 -0
package/.scripts/skill-health.py +516 -0
package/.scripts/skill-log-miner.py +273 -0
package/.scripts/sync-to-codex.sh +252 -0
package/.scripts/task +213 -0
package/.scripts/tasks +190 -0
package/.scripts/week +206 -0
package/CLAUDE.md +197 -0
package/LICENSE +21 -0
package/MEMORY.md +38 -0
package/README.md +269 -0
package/docs/agents.md +44 -0
package/docs/bibliography-setup.md +55 -0
package/docs/council-mode.md +36 -0
package/docs/getting-started.md +245 -0
package/docs/hooks.md +38 -0
package/docs/mcp-servers.md +82 -0
package/docs/notion-setup.md +109 -0
package/docs/rules.md +33 -0
package/docs/scripts.md +303 -0
package/docs/setup-overview/setup-overview.pdf +0 -0
package/docs/skills.md +70 -0
package/docs/system.md +159 -0
package/hooks/block-destructive-git.sh +66 -0
package/hooks/context-monitor.py +114 -0
package/hooks/postcompact-restore.py +157 -0
package/hooks/precompact-autosave.py +181 -0
package/hooks/promise-checker.sh +124 -0
package/hooks/protect-source-files.sh +81 -0
package/hooks/resume-context-loader.sh +53 -0
package/hooks/startup-context-loader.sh +102 -0
package/package.json +51 -0
package/packages/cli-council/.github/workflows/claude-code-review.yml +44 -0
package/packages/cli-council/.github/workflows/claude.yml +50 -0
package/packages/cli-council/README.md +100 -0
package/packages/cli-council/pyproject.toml +43 -0
package/packages/cli-council/src/cli_council/__init__.py +19 -0
package/packages/cli-council/src/cli_council/__main__.py +185 -0
package/packages/cli-council/src/cli_council/backends/__init__.py +8 -0
package/packages/cli-council/src/cli_council/backends/base.py +81 -0
package/packages/cli-council/src/cli_council/backends/claude.py +25 -0
package/packages/cli-council/src/cli_council/backends/codex.py +27 -0
package/packages/cli-council/src/cli_council/backends/gemini.py +26 -0
package/packages/cli-council/src/cli_council/checkpoint.py +212 -0
package/packages/cli-council/src/cli_council/config.py +51 -0
package/packages/cli-council/src/cli_council/council.py +391 -0
package/packages/cli-council/src/cli_council/models.py +46 -0
package/packages/llm-council/.github/workflows/claude-code-review.yml +44 -0
package/packages/llm-council/.github/workflows/claude.yml +50 -0
package/packages/llm-council/README.md +453 -0
package/packages/llm-council/pyproject.toml +42 -0
package/packages/llm-council/src/llm_council/__init__.py +23 -0
package/packages/llm-council/src/llm_council/__main__.py +259 -0
package/packages/llm-council/src/llm_council/checkpoint.py +193 -0
package/packages/llm-council/src/llm_council/client.py +253 -0
package/packages/llm-council/src/llm_council/config.py +232 -0
package/packages/llm-council/src/llm_council/council.py +482 -0
package/packages/llm-council/src/llm_council/models.py +46 -0
package/packages/mcp-bibliography/MEMORY.md +31 -0
package/packages/mcp-bibliography/_app.py +226 -0
package/packages/mcp-bibliography/formatters.py +158 -0
package/packages/mcp-bibliography/log/2026-03-13-2100.md +35 -0
package/packages/mcp-bibliography/pyproject.toml +15 -0
package/packages/mcp-bibliography/run.sh +20 -0
package/packages/mcp-bibliography/scholarly_formatters.py +83 -0
package/packages/mcp-bibliography/server.py +1857 -0
package/packages/mcp-bibliography/tools/__init__.py +28 -0
package/packages/mcp-bibliography/tools/_registry.py +19 -0
package/packages/mcp-bibliography/tools/altmetric.py +107 -0
package/packages/mcp-bibliography/tools/core.py +92 -0
package/packages/mcp-bibliography/tools/dblp.py +52 -0
package/packages/mcp-bibliography/tools/openalex.py +296 -0
package/packages/mcp-bibliography/tools/opencitations.py +102 -0
package/packages/mcp-bibliography/tools/openreview.py +179 -0
package/packages/mcp-bibliography/tools/orcid.py +131 -0
package/packages/mcp-bibliography/tools/scholarly.py +575 -0
package/packages/mcp-bibliography/tools/unpaywall.py +63 -0
package/packages/mcp-bibliography/tools/zenodo.py +123 -0
package/packages/mcp-bibliography/uv.lock +711 -0
package/scripts/setup.sh +143 -0
package/skills/beamer-deck/SKILL.md +199 -0
package/skills/beamer-deck/references/quality-rubric.md +54 -0
package/skills/beamer-deck/references/review-prompts.md +106 -0
package/skills/bib-validate/SKILL.md +261 -0
package/skills/bib-validate/references/council-mode.md +34 -0
package/skills/bib-validate/references/deep-verify.md +79 -0
package/skills/bib-validate/references/fix-mode.md +36 -0
package/skills/bib-validate/references/openalex-verification.md +45 -0
package/skills/bib-validate/references/preprint-check.md +31 -0
package/skills/bib-validate/references/ref-manager-crossref.md +41 -0
package/skills/bib-validate/references/report-template.md +82 -0
package/skills/code-archaeology/SKILL.md +141 -0
package/skills/code-review/SKILL.md +265 -0
package/skills/code-review/references/quality-rubric.md +67 -0
package/skills/consolidate-memory/SKILL.md +208 -0
package/skills/context-status/SKILL.md +126 -0
package/skills/creation-guard/SKILL.md +230 -0
package/skills/devils-advocate/SKILL.md +130 -0
package/skills/devils-advocate/references/competing-hypotheses.md +83 -0
package/skills/init-project/SKILL.md +115 -0
package/skills/init-project-course/references/memory-and-settings.md +92 -0
package/skills/init-project-course/references/organise-templates.md +94 -0
package/skills/init-project-course/skill.md +147 -0
package/skills/init-project-light/skill.md +139 -0
package/skills/init-project-research/SKILL.md +368 -0
package/skills/init-project-research/references/atlas-pipeline-sync.md +70 -0
package/skills/init-project-research/references/atlas-schema.md +81 -0
package/skills/init-project-research/references/confirmation-report.md +39 -0
package/skills/init-project-research/references/domain-profile-template.md +104 -0
package/skills/init-project-research/references/interview-round3.md +34 -0
package/skills/init-project-research/references/literature-discovery.md +43 -0
package/skills/init-project-research/references/scaffold-details.md +197 -0
package/skills/init-project-research/templates/field-calibration.md +60 -0
package/skills/init-project-research/templates/pipeline-manifest.md +63 -0
package/skills/init-project-research/templates/run-all.sh +116 -0
package/skills/init-project-research/templates/seed-files.md +337 -0
package/skills/insights-deck/SKILL.md +151 -0
package/skills/interview-me/SKILL.md +157 -0
package/skills/latex/SKILL.md +141 -0
package/skills/latex/references/latex-configs.md +183 -0
package/skills/latex-autofix/SKILL.md +230 -0
package/skills/latex-autofix/references/known-errors.md +183 -0
package/skills/latex-autofix/references/quality-rubric.md +50 -0
package/skills/latex-health-check/SKILL.md +161 -0
package/skills/learn/SKILL.md +220 -0
package/skills/learn/scripts/validate_skill.py +265 -0
package/skills/lessons-learned/SKILL.md +201 -0
package/skills/literature/SKILL.md +335 -0
package/skills/literature/references/agent-templates.md +393 -0
package/skills/literature/references/bibliometric-apis.md +44 -0
package/skills/literature/references/cli-council-search.md +79 -0
package/skills/literature/references/openalex-api-guide.md +371 -0
package/skills/literature/references/openalex-common-queries.md +381 -0
package/skills/literature/references/openalex-workflows.md +248 -0
package/skills/literature/references/reference-manager-sync.md +36 -0
package/skills/literature/references/scopus-api-guide.md +208 -0
package/skills/literature/references/wos-api-guide.md +308 -0
package/skills/multi-perspective/SKILL.md +311 -0
package/skills/multi-perspective/references/computational-many-analysts.md +77 -0
package/skills/pipeline-manifest/SKILL.md +226 -0
package/skills/pre-submission-report/SKILL.md +153 -0
package/skills/process-reviews/SKILL.md +244 -0
package/skills/process-reviews/references/rr-routing.md +101 -0
package/skills/project-deck/SKILL.md +87 -0
package/skills/project-safety/SKILL.md +135 -0
package/skills/proofread/SKILL.md +254 -0
package/skills/proofread/references/quality-rubric.md +104 -0
package/skills/python-env/SKILL.md +57 -0
package/skills/quarto-deck/SKILL.md +226 -0
package/skills/quarto-deck/references/markdown-format.md +143 -0
package/skills/quarto-deck/references/quality-rubric.md +54 -0
package/skills/save-context/SKILL.md +174 -0
package/skills/session-log/SKILL.md +98 -0
package/skills/shared/concept-validation-gate.md +161 -0
package/skills/shared/council-protocol.md +265 -0
package/skills/shared/distribution-diagnostics.md +164 -0
package/skills/shared/engagement-stratified-sampling.md +218 -0
package/skills/shared/escalation-protocol.md +74 -0
package/skills/shared/external-audit-protocol.md +205 -0
package/skills/shared/intercoder-reliability.md +256 -0
package/skills/shared/mcp-degradation.md +81 -0
package/skills/shared/method-probing-questions.md +163 -0
package/skills/shared/multi-language-conventions.md +143 -0
package/skills/shared/paid-api-safety.md +174 -0
package/skills/shared/palettes.md +90 -0
package/skills/shared/progressive-disclosure.md +92 -0
package/skills/shared/project-documentation-content.md +443 -0
package/skills/shared/project-documentation-format.md +281 -0
package/skills/shared/project-documentation.md +100 -0
package/skills/shared/publication-output.md +138 -0
package/skills/shared/quality-scoring.md +70 -0
package/skills/shared/reference-resolution.md +77 -0
package/skills/shared/research-quality-rubric.md +165 -0
package/skills/shared/rhetoric-principles.md +54 -0
package/skills/shared/skill-design-patterns.md +272 -0
package/skills/shared/skill-index.md +240 -0
package/skills/shared/system-documentation.md +334 -0
package/skills/shared/tikz-rules.md +402 -0
package/skills/shared/validation-tiers.md +121 -0
package/skills/shared/venue-guides/README.md +46 -0
package/skills/shared/venue-guides/cell_press_style.md +483 -0
package/skills/shared/venue-guides/conferences_formatting.md +564 -0
package/skills/shared/venue-guides/cs_conference_style.md +463 -0
package/skills/shared/venue-guides/examples/cell_summary_example.md +247 -0
package/skills/shared/venue-guides/examples/medical_structured_abstract.md +313 -0
package/skills/shared/venue-guides/examples/nature_abstract_examples.md +213 -0
package/skills/shared/venue-guides/examples/neurips_introduction_example.md +245 -0
package/skills/shared/venue-guides/journals_formatting.md +486 -0
package/skills/shared/venue-guides/medical_journal_styles.md +535 -0
package/skills/shared/venue-guides/ml_conference_style.md +556 -0
package/skills/shared/venue-guides/nature_science_style.md +405 -0
package/skills/shared/venue-guides/reviewer_expectations.md +417 -0
package/skills/shared/venue-guides/venue_writing_styles.md +321 -0
package/skills/split-pdf/SKILL.md +172 -0
package/skills/split-pdf/methodology.md +48 -0
package/skills/sync-notion/SKILL.md +93 -0
package/skills/system-audit/SKILL.md +157 -0
package/skills/system-audit/references/sub-agent-prompts.md +294 -0
package/skills/task-management/SKILL.md +131 -0
package/skills/update-focus/SKILL.md +204 -0
package/skills/update-project-doc/SKILL.md +194 -0
package/skills/validate-bib/SKILL.md +242 -0
package/skills/validate-bib/references/council-mode.md +34 -0
package/skills/validate-bib/references/deep-verify.md +71 -0
package/skills/validate-bib/references/openalex-verification.md +45 -0
package/skills/validate-bib/references/preprint-check.md +31 -0
package/skills/validate-bib/references/report-template.md +62 -0

package/skills/shared/engagement-stratified-sampling.md ADDED

@@ -0,0 +1,218 @@
+# Engagement-Stratified Sampling
+> Shared reference for social media and digital trace research. Ensures representative sampling across the engagement distribution. Prevents viral-content bias. Adapted from CommDAAF (Xu 2026).
+## Principle
+**Convenience samples over-represent viral content.** Most social media datasets are skewed — a small fraction of posts drive most engagement. Sampling without stratification produces findings that describe viral content, not typical content. Engagement-stratified sampling ensures coverage across the full distribution.
+---
+## Standard Engagement Tiers
+| Tier | Percentile | Purpose | Typical content |
+|------|------------|---------|-----------------|
+| **Viral** | Top 5% | What makes content break out | Influencer posts, breakthrough moments |
+| **High** | 75th–95th | Successful content | Engaged audiences, topical resonance |
+| **Medium** | 25th–75th | Baseline performance | Typical posts, average engagement |
+| **Low** | Bottom 25% | Why content fails / background noise | Low visibility, potential bot content |
+| **Zero** | Engagement = 0 | No-spread baseline | Posts that never circulated |
+---
+## Engagement Metric Construction
+### Standard composite (social media platforms)
+```python
+import numpy as np
+# Log-transform to handle skewness, +1 to handle zeros
+data['engagement'] = (
+    np.log(data['retweet_count'] + 1) +
+    np.log(data['like_count'] + 1) +
+    np.log(data['quote_count'].fillna(0) + 1)
+)
+```
+**Document any modification** to this formula. Platform-specific variants:
+| Platform | Available metrics | Notes |
+|----------|------------------|-------|
+| X/Twitter | Retweets, likes, quotes, replies | Quote count often missing in older data |
+| Reddit | Upvotes, comments, awards | Score = upvotes - downvotes |
+| YouTube | Views, likes, comments | Views dominate; consider likes/views ratio |
+| Bluesky | Likes, reposts, replies | Open API, no auth required |
+| Instagram | Likes, comments, shares, saves | Shares/saves often not available |
+### R equivalent
+```r
+library(tidyr)  # provides replace_na()
+# Log-transform to handle skewness, +1 to handle zeros
+data$engagement <- log(data$retweet_count + 1) +
+                   log(data$like_count + 1) +
+                   log(replace_na(data$quote_count, 0) + 1)
+```
+---
+## Stratified Sampling Implementation
+### Python
+```python
+import pandas as pd
+def engagement_stratified_sample(data, engagement_col='engagement',
+                                  n_per_tier=100, seed=42):
+    """Sample up to n_per_tier rows from each engagement tier."""
+    # Tier boundaries at the 25th, 75th and 95th percentiles
+    p95 = data[engagement_col].quantile(0.95)
+    p75 = data[engagement_col].quantile(0.75)
+    p25 = data[engagement_col].quantile(0.25)
+    def assign_tier(val):
+        if val >= p95: return 'viral'
+        elif val >= p75: return 'high'
+        elif val >= p25: return 'medium'
+        else: return 'low'
+    data = data.copy()
+    data['engagement_tier'] = data[engagement_col].apply(assign_tier)
+    # Sample each tier separately and concatenate, so the tier column is
+    # kept whatever groupby().apply() does with grouping columns
+    parts = [tier_rows.sample(min(len(tier_rows), n_per_tier), random_state=seed)
+             for _, tier_rows in data.groupby('engagement_tier')]
+    return pd.concat(parts).reset_index(drop=True)
+```
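The function above assigns four tiers, while the tier table also lists a Zero tier. A minimal sketch of five-tier assignment follows. It assumes that zero-engagement posts are set aside first and that the percentile cut-offs are then computed over the remaining posts; the source does not say which population the percentiles refer to. `quantile` and `assign_tiers` are illustrative names, not part of the package, and the sketch works on plain Python lists rather than a DataFrame.

```python
def quantile(sorted_vals, q):
    """Quantile with linear interpolation between neighbouring order statistics."""
    pos = (len(sorted_vals) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def assign_tiers(values):
    """Label each engagement value as zero, low, medium, high or viral."""
    # Assumption: cut-offs are computed over non-zero posts only
    nonzero = sorted(v for v in values if v > 0)
    if not nonzero:
        return ['zero'] * len(values)
    p25, p75, p95 = (quantile(nonzero, q) for q in (0.25, 0.75, 0.95))

    def tier(v):
        if v == 0:
            return 'zero'
        if v >= p95:
            return 'viral'
        if v >= p75:
            return 'high'
        if v >= p25:
            return 'medium'
        return 'low'

    return [tier(v) for v in values]
```

Called on `[0, 0] + list(range(1, 101))`, this labels the two zeros as `zero` and splits the remaining hundred values 25 / 50 / 20 / 5 across low, medium, high and viral.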
+### R
+```r
+library(dplyr)
+engagement_stratified_sample <- function(data, engagement_col = "engagement",
+                                          n_per_tier = 100, seed = 42) {
+  set.seed(seed)
+  q <- quantile(data[[engagement_col]], probs = c(0.25, 0.75, 0.95))
+  # right = FALSE makes each tier include its lower bound,
+  # matching the >= comparisons in the Python version
+  data$engagement_tier <- cut(data[[engagement_col]],
+    breaks = c(-Inf, q[1], q[2], q[3], Inf),
+    labels = c("low", "medium", "high", "viral"),
+    right = FALSE)
+  data %>%
+    group_by(engagement_tier) %>%
+    # slice_sample() truncates n to the group size, so small tiers return all rows
+    slice_sample(n = n_per_tier) %>%
+    ungroup()
+}
+```
+---
+## Multi-Criteria Sampling
+When engagement is one of several strata:
+```python
+def multi_criteria_sample(data, strata, total_n=500, seed=42):
+    """
+    strata = {
+        'engagement_tier': {'allocation': 'equal'},
+        'language': {'allocation': 'proportional'},
+        'date': {'allocation': 'coverage'}  # at least 1 per unique value
+    }
+    """
+    # Implementation depends on specific constraints
+    # Key principle: engagement strata are equal, others proportional or coverage
+    pass
+```
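The stub above leaves the implementation open. One possible shape is sketched here under narrow assumptions: rows are plain dicts, engagement tiers get equal allocation, and a single second stratum (language, by default) is allocated proportionally within each tier using largest-remainder rounding. Coverage allocation for dates is not handled. The function name and its parameters are hypothetical and do not come from the package.

```python
import random
from collections import defaultdict


def multi_criteria_sample_sketch(rows, tier_key='engagement_tier',
                                 prop_key='language', total_n=500, seed=42):
    """Equal allocation across tiers, proportional allocation within each tier."""
    if not rows:
        return []
    rng = random.Random(seed)
    by_tier = defaultdict(list)
    for row in rows:
        by_tier[row[tier_key]].append(row)

    per_tier = total_n // len(by_tier)  # equal share for every tier
    sample = []
    for tier in sorted(by_tier):
        tier_rows = by_tier[tier]
        target = min(per_tier, len(tier_rows))  # small tiers give what they have

        by_group = defaultdict(list)
        for row in tier_rows:
            by_group[row[prop_key]].append(row)

        # Proportional quotas, rounded with the largest-remainder method
        exact = {g: target * len(m) / len(tier_rows) for g, m in by_group.items()}
        quota = {g: int(x) for g, x in exact.items()}
        leftover = target - sum(quota.values())
        by_remainder = sorted(exact, key=lambda g: exact[g] - quota[g], reverse=True)
        for g in by_remainder[:leftover]:
            quota[g] += 1

        for g in sorted(by_group):
            sample.extend(rng.sample(by_group[g], quota[g]))
    return sample
```

Because a tier never contributes more rows than it holds, the returned sample can be smaller than `total_n`; the actual per-tier counts should be reported, in line with the last anti-pattern in the table further down.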
+**Common multi-criteria designs:**
+| Criterion | Allocation | Rationale |
+|-----------|-----------|-----------|
+| Engagement tier | Equal | Prevent viral bias |
+| Language | Proportional | Reflect population distribution |
+| Time period | Coverage (≥1 per day) | Prevent temporal blind spots |
+| Account type | Proportional or capped | Prevent influencer domination |
+| Verified status | Capped (≤30%) | Verified accounts are over-studied |
+---
+## Power Analysis for Stratified Designs
+| Design | Min n/group | Detects | Total n |
+|--------|------------|---------|-------------------|
+| 2-group comparison, d=0.5 | 64 | Medium effect | 128 |
+| 2-group comparison, d=0.3 | 175 | Small effect | 350 |
+| 7 frames × 4 tiers, d=0.3 | 50/cell | Small effect | 1,400 |
+| Regression, 5 predictors | 100 total | Medium R² | 100 |
+**Rule:** Calculate power before committing to sample size. Under-powered stratified samples are worse than well-powered random samples.
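A rough way to reproduce the two-group rows is the normal-approximation formula for comparing two means. The sketch below assumes a two-sided test at alpha = 0.05 and 80% power; the table does not state which settings it used. `n_per_group` is an illustrative helper, not part of the package.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


print(n_per_group(0.5))  # 63
print(n_per_group(0.3))  # 175
```

The approximation gives 63 per group for d = 0.5, one fewer than the 64 in the table. A gap of that size is consistent with the small-sample correction a t-based calculation adds, so a dedicated power package is the safer choice for a final figure.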
+---
+## Saturation Detection
+For qualitative or exploratory coding:
+```python
+def check_saturation(coded_data, code_var='frame', window=50, threshold=0.05):
+    """Check whether the last `window` coded items introduced no new codes.
+
+    `threshold` is kept for call compatibility but is not used.
+    """
+    seen_codes = set()
+    last_new = None  # index of the most recent item that introduced a new code
+    for i, item in enumerate(coded_data):
+        code = item[code_var]
+        if code not in seen_codes:
+            seen_codes.add(code)
+            last_new = i
+    if last_new is None:
+        # Nothing has been coded yet, so saturation cannot be claimed
+        return {'saturated': False, 'n': 0,
+                'last_new': None, 'items_since_new': 0}
+    # Items coded after the last new code (the new-code item itself is excluded)
+    items_since = len(coded_data) - last_new - 1
+    return {
+        'saturated': items_since >= window,
+        'n': len(coded_data),
+        'last_new': last_new,
+        'items_since_new': items_since
+    }
+```
+**Saturation guideline:** If the last 50 coded items produced no new codes/themes, consider stopping.
+---
+## Integration
+### In `/data-analysis` Phase 1
+When data includes engagement metrics (likes, shares, retweets, etc.), automatically:
+1. Compute composite engagement score
+2. Assign tiers
+3. Report tier distribution in EDA output
+4. Flag if analysis sample is not engagement-stratified
+### In `/experiment-design`
+When designing content analysis studies:
+1. Include engagement stratification in sampling plan
+2. Calculate power per tier
+3. Document tier boundaries in pre-analysis plan
+### In review agents
+Check whether social media studies:
+- Report their sampling strategy
+- Account for engagement distribution
+- Avoid over-representing viral content
+- Flag as Major if unstratified convenience sample is used for causal claims
+---
+## Anti-Patterns
+| Anti-Pattern | Problem | Fix |
+|-------------|---------|-----|
+| Sampling by keyword only | Over-represents viral posts with the keyword | Stratify by engagement after keyword filter |
+| Using "top tweets" API endpoint | Only returns high-engagement content | Use full archive search, then stratify |
+| Treating retweet count as continuous DV | Highly skewed, zero-inflated | Use engagement tiers as strata or log-transform |
+| Equal allocation when one tier is tiny | Viral tier (5%) may have < n_per_tier items | Sample min(available, target), report actual n |

package/skills/shared/escalation-protocol.md ADDED

@@ -0,0 +1,74 @@
+# Escalation Protocol: Methodological Pushback
+> Shared protocol for review agents and skills. Defines how to push back when methodological answers are vague, designs are unsound, or critical details are missing. Adapted from CommScribe/CommDAAF (Xu 2026).
+## Principle
+**Helping produce invalid research helps no one.** Review agents have permission — and obligation — to escalate when methodology is inadequate. Escalation is not hostility; it's rigour.
+## When to Escalate
+- Research design lacks a specified estimand or identification strategy
+- Analysis plan uses defaults without justification
+- Causal claims lack a credible identification argument
+- Sample or data limitations are hand-waved
+- Robustness checks are missing or post-hoc
+- Statistical methods are misapplied (e.g., TWFE with staggered treatment)
+- Results interpretation overstates what the evidence supports
+## The Four Levels
+| Level | Trigger | Response | Tone |
+|-------|---------|----------|------|
+| **1. Probe** | Vague or missing detail | Ask a specific clarifying question. "What is the estimand?" / "How was K selected?" | Neutral, curious |
+| **2. Explain stakes** | Still vague after probing | State why the detail matters. "Without this, the results are not interpretable because..." | Direct, educational |
+| **3. Challenge** | Pushback or deflection | Name the specific threat to validity. "This design cannot distinguish X from Y. The coefficient captures both effects." | Firm, specific |
+| **4. Flag and stop** | User insists on unsound approach | State clearly that the approach will produce invalid results. Mark the issue as a **Blocker** in the report. Do not proceed with the flawed analysis — offer alternatives instead. | Non-negotiable, constructive |
+## Level 4: Flag and Offer Alternatives
+Never just block — always offer a path forward. When an approach fails at Level 4:
+```
+BLOCKER: [What's wrong and why it invalidates results]
+ALTERNATIVES:
+1. [Modified design that addresses the threat]
+2. [Narrower claim that the current design can support]
+3. [Additional data or test that would resolve the issue]
+4. [Descriptive analysis as honest fallback]
+```
+## How Agents Use This
+### In scored reviews (paper-critic, referee2-reviewer, domain-reviewer)
+- Level 1-2 issues: flag in report, deduct per quality-scoring severity
+- Level 3 issues: flag as **Critical** (-15 to -25)
+- Level 4 issues: flag as **Blocker** (-100, automatic 0)
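The scoring rules above can be read as a small function. This is a sketch only: it assumes reviews are scored from 100 (implied by "-100, automatic 0" but not stated here), and it takes the Level 1-2 deduction from the caller because those values live in the separate quality-scoring reference. `score_review` is a hypothetical name.

```python
def score_review(issues, start_score=100):
    """Apply escalation-level deductions to a review score.

    issues: list of (level, deduction) pairs with a positive deduction.
    """
    score = start_score  # assumption: reviews are scored out of 100
    for level, deduction in issues:
        if level == 4:
            return 0  # a Blocker zeroes the score whatever else was found
        if level == 3 and not 15 <= deduction <= 25:
            raise ValueError("Critical issues deduct between 15 and 25 points")
        score -= deduction
    return max(score, 0)
```

For example, `score_review([(1, 3), (3, 20)])` returns 77, and any list that contains a Level 4 issue returns 0.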
+### In interactive work (causal-design, experiment-design, data-analysis)
+- Start at Level 1 for any underspecified element
+- Escalate through levels within the conversation
+- At Level 4, refuse to run the analysis and present alternatives
+- User can override with explicit acknowledgment: "I understand the limitation, proceed anyway" — in which case, proceed but add a prominently displayed caveat to any output
+- Flag claims that exceed what the methodology supports (Level 2-3)
+- Suggest hedging language that accurately reflects the evidence
+## What This Is NOT
+- **Not a licence to block all work.** Most research involves trade-offs. The protocol targets genuine threats to validity, not stylistic preferences or minor issues.
+- **Not adversarial for its own sake.** Each escalation level includes a constructive element (question, explanation, challenge with specifics, alternatives).
+- **Not a substitute for domain expertise.** When unsure whether something is methodologically unsound, say so: "I'm uncertain whether this is valid — here's my concern: [X]. Can you confirm?"
+## Integration with Existing Rules
+| Rule | How escalation interacts |
+|------|------------------------|
+| `design-before-results` | Level 1 probe: "Has the analysis plan been specified?" before running anything |
+| `severity-gradient` | Escalation levels map to severity tiers; Phase Detection determines how strictly to apply them |
+| `scope-discipline` | Escalation is in-scope when reviewing methodology; out-of-scope for unrelated issues |
+| `no-hardcoded-results` | Level 2: explain why hard-coded results undermine reproducibility |

package/skills/shared/external-audit-protocol.md ADDED

@@ -0,0 +1,205 @@
+# External Audit Protocol
+> Shared workflow for all `external-*-audit` skills. Each skill references this file and provides scope-specific configuration (CWD, checklist, report path).
+## What This Is
+Uses an external LLM CLI (Codex or Gemini) to get a **fresh, independent perspective** from a competing model on Claude Code infrastructure. This extends the agents-vs-skills principle: not just fresh Claude context, but an entirely different model reviewing the work.
+**Key difference from `/system-audit`:** System audit uses Claude sub-agents for mechanical checks (counts, symlinks, broken links). External audits are **qualitative** — architecture coherence, design quality, redundancy, missing capabilities, improvement suggestions.
+**Model selection:** Pass `--model codex` or `--model gemini` (default: gemini). Running both triangulates — different training and reasoning patterns produce more reliable findings than either alone.
+## Model Commands
+| Model | CLI | Execute command | Install |
+|-------|-----|----------------|---------|
+| Codex | `codex` | `codex exec --full-auto "{prompt}"` | `npm install -g @openai/codex` + `codex login` |
+| Gemini | `gemini` | `gemini -p "{prompt}" --yolo` | `npm install -g @google/gemini-cli` |
+## Pre-Flight
+Before anything else:
+1. **Verify the CLI is installed:**
+   ```bash
+   which {model} && {model} --version
+   ```
+   If not found, stop and tell the user with install instructions (see table above).
+2. **Confirm the target directory exists** (CWD from the skill config).
+3. **Check git status** in the target directory to establish a clean baseline:
+   ```bash
+   cd <CWD> && git status --short
+   ```
+   Note any pre-existing uncommitted changes so we can distinguish them from model modifications later.
+## Phase 1: Generate Manifest
+Claude generates a concise ecosystem manifest describing what the model will be auditing. This gives it the map before it explores the territory.
+Write to `/tmp/external-audit-manifest-{scope}.md`:
+```markdown
+# Ecosystem Manifest: {scope}
+## Root Directory
+{absolute path}
+## Key Paths
+{list of important directories and files, relative to root}
+## Counts
+{relevant metrics: number of skills, hooks, files, etc.}
+## Architecture Summary
+{2-3 sentence description of how the system is organized}
+## Tech Stack
+{languages, frameworks, key dependencies}
+```
+**Keep it under 200 lines.** The model needs context, not a full directory listing.
+## Phase 2: Build Prompt
+Combine the manifest with the skill-specific audit checklist into a single prompt file.
+Write to `/tmp/external-audit-prompt-{scope}.md`:
+```markdown
+# External Audit: {scope}
+## CRITICAL INSTRUCTIONS
+You are conducting a **READ-ONLY architecture audit**. You must:
+- **NEVER create, modify, or delete any files** — not even "cleanup" or "improvement" edits
+- **NEVER run git commands that change state** (no commit, add, push, checkout, rm)
+- **NEVER install packages or modify dependencies**
+- **NEVER append to, rename, or reorganize existing files**
+- Only read files and output your findings as markdown to stdout
+- If you feel compelled to fix something, describe the fix in your output instead — do NOT apply it
+## Ecosystem Manifest
+{contents of manifest file}
+## Audit Checklist
+{contents of skill-specific references/audit-checklist.md}
+## Output Format
+Structure your response as a markdown report with these sections:
+### Executive Summary
+2-3 sentence overall assessment.
+### Scored Sections
+For each checklist category, provide:
+- **Score:** A/B/C/D/F
+- **Strengths:** What's done well
+- **Issues:** Problems found (severity: Critical/Major/Minor)
+- **Recommendations:** Specific, actionable improvements
+### Architecture Diagram
+If helpful, describe the system architecture in text form.
+### Top 5 Recommendations
+Prioritized list of the most impactful improvements, with effort estimates (Quick/Medium/Large).
+### Fresh Eyes
+Things that seem odd, redundant, or unnecessarily complex to someone seeing this for the first time. This is the most valuable section — it's what insiders miss.
+```
+## Phase 3: Execute
+Run from the target CWD with a 10-minute timeout:
+**Codex:**
+```bash
+cd <CWD> && timeout 600 codex exec --full-auto "$(cat /tmp/external-audit-prompt-{scope}.md)" 2>&1
+```
+**Gemini:**
+```bash
+cd <CWD> && timeout 600 gemini -p "$(cat /tmp/external-audit-prompt-{scope}.md)" --yolo 2>&1
+```
+**Capture the full output** — both stdout and stderr.
+**If the model fails or times out:**
+- Log the error
+- Report to the user: "{Model} execution failed: {error}. The prompt is saved at `/tmp/external-audit-prompt-{scope}.md` if you want to retry manually."
+- Do not retry automatically
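The same execution step could be driven from Python, which sidesteps shell quoting of the prompt and does not rely on the `timeout` command (part of GNU coreutils, so it may be missing on some systems). This is a sketch: the CLI flags are copied from the Model Commands table, while `build_command`, `run_audit` and the returned dict layout are illustrative assumptions.

```python
import shutil
import subprocess


def build_command(model, prompt):
    """Return the argv list for one audit run (flags as in the Model Commands table)."""
    if model == "codex":
        return ["codex", "exec", "--full-auto", prompt]
    if model == "gemini":
        return ["gemini", "-p", prompt, "--yolo"]
    raise ValueError(f"unknown model: {model}")


def run_audit(model, prompt, cwd, timeout_s=600):
    """Run the CLI from cwd; report failures in the result instead of retrying."""
    if shutil.which(model) is None:
        return {"ok": False, "error": f"{model} CLI not found on PATH", "output": ""}
    try:
        proc = subprocess.run(
            build_command(model, prompt), cwd=cwd, text=True,
            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,  # like 2>&1
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        partial = exc.stdout or ""
        if isinstance(partial, bytes):  # partial output may arrive undecoded
            partial = partial.decode(errors="replace")
        return {"ok": False, "error": f"timed out after {timeout_s}s", "output": partial}
    if proc.returncode != 0:
        return {"ok": False, "error": f"exit code {proc.returncode}", "output": proc.stdout}
    return {"ok": True, "error": None, "output": proc.stdout}
```

Returning a result dict rather than raising keeps the "log, report, do not retry" behaviour in the caller's hands.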
+## Phase 4: Safety Check
+> **Critical warning:** Both Codex and Gemini are agentic coding tools designed to make changes. Despite read-only instructions, they may create, modify, or delete files. Gemini has a documented tendency to do this. **Always assume the model touched something** and verify thoroughly. This check is not optional.
+**Immediately after the model returns**, verify nothing was modified:
+```bash
+cd <CWD> && git diff --stat
+cd <CWD> && git status --short
+```
+Run **both** commands: `git diff` shows unstaged modifications and deletions of tracked files, while `git status` also shows new untracked files and staged changes. Compare against the baseline from pre-flight.
+If there are **new changes not present before**:
+1. **Alert the user immediately:** "{Model} modified files despite read-only instructions. Changed: {list}"
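+2. **Revert immediately:** `git checkout -- .` (for modifications/deletions). This also discards any uncommitted changes to tracked files that predate the audit, so if the pre-flight baseline was not clean, restore only the files the model touched (`git checkout -- <path>`). For new untracked files, list them and delete manually.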
+3. **Do not proceed** with report generation until the working tree matches the pre-flight baseline
+4. **Note the violation** in the report header (replace "No files modified" with a description of what was changed and reverted)
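Comparing against the pre-flight baseline amounts to a set difference over `git status --short` lines. A minimal sketch, assuming both outputs were captured as text; `new_changes` is an illustrative helper, not part of the package.

```python
def new_changes(baseline_status, current_status):
    """Return status lines present after the run that were absent at pre-flight."""
    baseline = {line.rstrip() for line in baseline_status.splitlines() if line.strip()}
    return [line.rstrip() for line in current_status.splitlines()
            if line.strip() and line.rstrip() not in baseline]


before = " M README.md\n?? notes.txt\n"
after = " M README.md\n?? notes.txt\n?? audits/tmp.md\n D hooks/x.sh\n"
print(new_changes(before, after))  # ['?? audits/tmp.md', ' D hooks/x.sh']
```

One limitation of this approach: a file that was already modified before the audit and is edited again keeps the same status line, so it is not reported. Capturing `git diff` output at pre-flight and comparing it afterwards would cover that case.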
+## Phase 5: Write Report
+Write the model output to the skill-specified report path (typically `audits/{model}-audit-{scope}-YYYY-MM-DD.md`).
+Add a header:
+```markdown
+# {Model} Audit: {Scope Title} — YYYY-MM-DD
+> Generated by {Model CLI description}.
+> Read-only audit | No files modified.
+---
+{Model output}
+```
+## Phase 6: Present
+Show the user:
+1. **Executive summary** from the report
+2. **Any Critical/Major issues** found
+3. **Top 5 recommendations** with effort estimates
+4. **Fresh Eyes section** — the outsider perspective
+5. Link to the full report
+Ask whether the user wants to address any findings now.
+## Error Handling
+| Situation | Action |
+|-----------|--------|
+| CLI not installed | Stop, show install instructions |
+| Target directory doesn't exist | Stop, ask the user for correct path |
+| Model times out (>10 min) | Save partial output if any, report failure |
+| Model modifies files | Alert, offer revert, do not continue |
+| Model returns empty output | Report failure, save prompt for manual retry |
+| Git not initialized in target | Skip git safety checks, warn the user |
+| Authentication error | Stop, tell the user to authenticate |
+## Integration with Other Skills
+| Skill | Relationship |
+|-------|-------------|
+| `/system-audit` | Mechanical checks (counts, symlinks) — complementary, not overlapping |
+| `/audit-project-research` | Per-project structural audit — external audits are cross-cutting |
+| `/lessons-learned` | Critical findings can feed into post-mortems |
+| `/ideas` | Recommendations can be captured as improvement ideas |