npm - @dogfood-lab/study-swarm - Versions diffs - 0.0.0 → 0.6.0 - Mend

@dogfood-lab/study-swarm 0.0.0 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

package/CHANGELOG.md ADDED

@@ -0,0 +1,26 @@
+# Changelog
+All notable changes to this project are documented here. The format is based on [Keep a Changelog](https://keepachangelog.com/), and this project adheres to [Semantic Versioning](https://semver.org/).
+## [0.6.0] — 2026-06-02
+### Added
+- **Thin CLI** (`study-swarm`) — zero runtime dependencies, ships in the package:
+  - `study-swarm protocol` — print the locked protocol.
+  - `study-swarm new <slug>` — scaffold a `<slug>.dispatch.md` to fill in.
+  - `study-swarm lint <file>` — deterministically check a dispatch's *Research grounding* against the sourcing standard (every finding needs author + year + a resolvable arXiv/DOI/URL; vague "studies show…" claims are rejected). Exit `1` on violations, so it gates CI.
+- `npm run verify` smoke test; CI smoke-tests the CLI before publishing.
+## [0.5.0] — 2026-06-02
+### Added
+- Initial public release of the **study-swarm** methodology.
+- `README.md` — the protocol in five steps, the family-different verification rationale (with citations), the proof (two decorrelated non-Claude families catching planted citation traps), and how it wires to prism-verify + role-os.
+- `PROTOCOL.md` — the locked execution shape: the two-stage citation check, the halt table, the sourcing standard, and the architecture the protocol enables.
+- `SECURITY.md`, MIT `LICENSE`, project logo.
+- Landing page + Starlight handbook at <https://dogfood-lab.github.io/study-swarm/>.
+[0.6.0]: https://github.com/dogfood-lab/study-swarm/releases/tag/v0.6.0
+[0.5.0]: https://github.com/dogfood-lab/study-swarm/releases/tag/v0.5.0

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 dogfood-lab
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/PROTOCOL.md ADDED

@@ -0,0 +1,108 @@
+# The study-swarm protocol — locked execution shape
+This is the executable reference. The narrative, the proof, and the research grounding are in [README.md](README.md).
+> **The one-line guard:** no finding reaches Step 5 unverified. If you cannot verify — verifier down, no different family reachable, retrieval oracle unreachable — you HALT and escalate; you do not proceed. The protocol never lets a model grade its own homework, including the one running it.
+## When to invoke
+Fire when ANY hold:
+- A decision introduces a **new product layer** (not a fix, scope extension, or operational tuning).
+- The decision is **qualitative** — "should we trust the model here," "explain or just do," "cap options," "retry or fall back."
+- You're about to recommend a **single-axis** answer (deterministic-only / LLM-only) where the real answer is multi-axis (deterministic floor + LLM ceiling + verifier).
+- An **adjacent domain** (compilers, SRE, databases, mixed-initiative HCI) has likely solved this.
+Does NOT fire for: pure fixes; scope extensions of already-grounded work; operational tuning ("what number," not "what shape").
+The cost of running it is one parallel dispatch and a few minutes of synthesis. The cost of skipping it is the failure this protocol exists to prevent: shrinking a design to a simpler shape out of an unexamined fear of AI advice, defending choices with "studies show…" and no studies named, and resting architecture on citations that don't exist.
+## Step 1 — Identify load-bearing design decisions
+List the specific questions where empirical evidence would change the answer. Aim for 3–5. **Fewer is fine** when the decision is genuinely substantial but only one or two questions would change the answer: run with 1–2 agents. Whether the decision warrants investigation governs invocation; the number of evidence-changing questions governs breadth. Do not manufacture questions to hit a count, and do not abort for being under three. More than ~6 → split into multiple passes.
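Step 1's breadth rule boils down to a small planning function. The sketch below is a hypothetical TypeScript helper and is not part of the `study-swarm` CLI. It assumes "more than ~6" means a hard cap of six questions per pass, and it balances the passes instead of filling them greedily.

```typescript
// Hypothetical helper (not part of study-swarm): split Step 1's
// evidence-changing questions into balanced dispatch passes.
// Assumption: "more than ~6" is read as a hard cap of 6 questions per pass.
const MAX_QUESTIONS_PER_PASS = 6;

function planPasses(questions: string[], maxPerPass: number = MAX_QUESTIONS_PER_PASS): string[][] {
  if (questions.length === 0) return []; // nothing evidence-changing: the protocol does not fire
  const passCount = Math.ceil(questions.length / maxPerPass);
  const size = Math.ceil(questions.length / passCount); // balance passes, e.g. 8 -> 4 + 4
  const passes: string[][] = [];
  for (let i = 0; i < questions.length; i += size) {
    passes.push(questions.slice(i, i + size));
  }
  return passes;
}

// One question is still a valid single pass; eight questions become two passes.
console.log(planPasses(["Should the model explain its advice?"]).length); // 1
console.log(planPasses(["q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8"]).map((p) => p.length).join("+")); // 4+4
```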
+## Step 2 — Dispatch parallel research agents
+One agent per question, dispatched **in parallel** (a single batch). Each agent's prompt MUST include:
+- the context — what's being built, why this question matters;
+- the question shape, scoped to **evidence**, not opinion;
+- a demand for SPECIFIC findings: paper titles, authors, years, URLs, a one-sentence key finding per source;
+- a word cap (typically 500–600);
+- "prefer specificity over breadth — 6–8 well-sourced findings beat 20 vague gestures";
+- a note to use web search / fetch.
+Typical agent count: 3–5.
+> Step 4 makes retrieval a **hard** requirement: a paper an agent "remembers" but cannot retrieve does not enter the dispatch. Existence is established by resolving the identifier, not by recall.
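Here is one way to sketch Step 2's prompt checklist and single-batch dispatch. `ResearchAgent`, `DispatchContext`, and the prompt wording are hypothetical stand-ins for whatever agent runner you use. Only the prompt ingredients themselves come from the protocol.

```typescript
// Sketch of Step 2: one evidence-scoped prompt per question, all agents
// dispatched in a single parallel batch.
// Assumption: ResearchAgent is a hypothetical wrapper around your model/tool runner.
type ResearchAgent = (prompt: string) => Promise<string>;

interface DispatchContext {
  building: string;     // what is being built
  whyItMatters: string; // why this question is load-bearing
}

function buildAgentPrompt(ctx: DispatchContext, question: string, wordCap: number = 600): string {
  return [
    `Context: we are building ${ctx.building}. ${ctx.whyItMatters}`,
    `Question (answer with evidence, not opinion): ${question}`,
    "Return SPECIFIC findings: paper title, authors, year, URL, and a one-sentence key finding per source.",
    `Stay under ${wordCap} words.`,
    "Prefer specificity over breadth: 6-8 well-sourced findings beat 20 vague gestures.",
    "Use web search / fetch; do not cite a paper you cannot retrieve.",
  ].join("\n");
}

async function dispatchAll(ctx: DispatchContext, questions: string[], agent: ResearchAgent): Promise<string[]> {
  // map() starts every agent before Promise.all awaits any of them: one batch.
  return Promise.all(questions.map((q) => agent(buildAgentPrompt(ctx, q))));
}
```

The agents' outputs are raw inputs to Step 3. Per the note above, nothing they return counts as verified.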
+## Step 3 — Synthesize into a "Research grounding" section
+A dedicated section near the top of the design doc, before the architectural decisions. Each finding follows one template:
+```
+N. **<one-sentence finding>.** <Authors> <year> (<paper title or arXiv:NNNN.NNNNN>). <Implication for the system being designed>.
+```
+Example:
+> 1. **Contrastive explanations with a predicted human foil improve independent decision-making.** Buçinca et al. 2024 (arXiv:2410.04253) — N=628 between-subjects. Implication: every recommendation carries a "you might think X; I'm recommending Y because…" frame.
+The format does three things at once: states the finding, cites the source so it can be verified, and names the design implication so the link evidence→choice is visible.
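The Step 3 template is mechanical enough to render from structured data. This sketch assumes a hypothetical `Finding` record. The field names are illustrative, and the package does not define them.

```typescript
// Sketch: render one Research-grounding entry in the Step 3 template
//   N. **<finding>.** <Authors> <year> (<title or identifier>). <Implication>.
// The Finding shape is hypothetical, not part of study-swarm.
interface Finding {
  n: number;
  claim: string;       // one-sentence finding, without the trailing period
  authors: string;     // first author + "et al." is fine
  year: number;
  source: string;      // paper title or canonical id, e.g. "arXiv:2410.04253"
  implication: string; // the design consequence, without the trailing period
}

function formatFinding(f: Finding): string {
  return `${f.n}. **${f.claim}.** ${f.authors} ${f.year} (${f.source}). ${f.implication}.`;
}

console.log(formatFinding({
  n: 1,
  claim: "Contrastive explanations with a predicted human foil improve independent decision-making",
  authors: "Buçinca et al.",
  year: 2024,
  source: "arXiv:2410.04253",
  implication: "Every recommendation carries a contrastive frame",
}));
// 1. **Contrastive explanations with a predicted human foil improve independent decision-making.** Buçinca et al. 2024 (arXiv:2410.04253). Every recommendation carries a contrastive frame.
```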
+## Step 4 — External verification gate (family-different, reasoning-stripped)
+Before any finding informs the design (Step 5), a verifier of a **different model family** from the synthesizing model, with the synthesizer's reasoning hidden, checks every citation. The Step 2 research agents are *inputs* — they produce citations; they are **not** verifiers of the synthesis. A separate family must check, or it's a model grading its own homework — the exact failure the protocol prescribes verifiers to prevent.
+**Non-circular by construction:** the verifier adjudicates via a deterministic retrieval oracle (existence) plus a different-family lens (groundedness). It does not re-run this protocol and does not rely on anyone's recall.
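Two of Step 4's constraints can be enforced before any verifier is called: the synthesizer's reasoning stays hidden, and a same-family verifier is refused. The following is a minimal sketch. It assumes a hypothetical citation record with an optional `rationale` field, and it treats model families as free-form names.

```typescript
// Sketch: strip the synthesizer's reasoning and refuse same-family verifiers.
// Field names and family strings are hypothetical.
interface SynthesizedCitation {
  finding: string;
  authors: string;
  year: number;
  identifier: string; // arXiv ID, DOI, or URL
  rationale?: string; // "why I included it": must never reach the verifier
}

type BareClaim = Omit<SynthesizedCitation, "rationale">;

function toBareClaim(c: SynthesizedCitation): BareClaim {
  // Destructure the rationale away; only the checkable claim survives.
  const { rationale: _rationale, ...bare } = c;
  return bare;
}

function assertFamilyDifferent(synthesizerFamily: string, verifierFamily: string): void {
  if (synthesizerFamily.toLowerCase() === verifierFamily.toLowerCase()) {
    // Never substitute same-family: halt and escalate instead.
    throw new Error("same-family verifier refused: halt and escalate");
  }
}
```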
+### Two-stage check, per citation
+1. **Existence / attribution — a retrieval oracle, not a parametric LLM.** Resolve the arXiv ID / DOI / URL and confirm the paper exists with the stated title, authors, and year. This stage **must retrieve** (fetch the source / arXiv / Crossref) and never rely on model memory, for two reasons: fabrication and misattribution rates in model-generated citations are too high to trust recall (Walters & Wilder 2023), and 2025–2026 papers postdate model training, so a parametric check will false-flag real work as fabricated (Onweller et al. 2026). If retrieval is unavailable, apply the halt-and-escalate rule below.
+2. **Groundedness — finding matches source.** Confirm the one-sentence finding describes what the source actually claims (an NLI-style support check). Even strong models fail to fully support their own citations roughly half the time, so this is a distinct, necessary axis — not implied by existence.
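Stage 1 can be sketched against Crossref's public REST API (`https://api.crossref.org/works/{doi}`), whose JSON `message` carries `author[].family` and `issued["date-parts"]`. Retrieval is injected as a hypothetical `fetchJson` callback, which means the check has no path back to model memory. Two parts of this sketch are assumptions: the matching rule (case-insensitive first-author family name plus exact year) and treating a 404 as `FABRICATED`. A stricter oracle might also search by title before concluding that a paper does not exist. Stage 2 (groundedness) is an LLM lens and is not shown.

```typescript
// Sketch of Stage 1 (existence / attribution) via Crossref. Verdict names
// follow the halt table; VERIFIED (the clean result) is this sketch's label.
type ExistenceVerdict = "VERIFIED" | "MISATTRIBUTED" | "FABRICATED" | "UNAVAILABLE";

interface CitationClaim {
  doi: string;               // e.g. "10.1038/s41598-023-41032-5"
  firstAuthorFamily: string; // e.g. "Walters"
  year: number;
}

// Injected retrieval (hypothetical): must hit the network, never a model.
type FetchJson = (url: string) => Promise<{ status: number; body: unknown }>;

// The subset of a Crossref /works response that this check reads.
interface CrossrefWork {
  message?: {
    author?: { family?: string }[];
    issued?: { "date-parts"?: number[][] };
  };
}

async function checkExistence(claim: CitationClaim, fetchJson: FetchJson): Promise<ExistenceVerdict> {
  const url = `https://api.crossref.org/works/${claim.doi}`;
  let res: { status: number; body: unknown } | undefined;
  try {
    res = await fetchJson(url);
  } catch {
    res = undefined;
  }
  // Oracle unreachable or erroring: halt and escalate; never "fine", never "fabricated".
  if (res === undefined || (res.status !== 200 && res.status !== 404)) return "UNAVAILABLE";
  // The identifier does not resolve (assumption: treated as fabricated).
  if (res.status === 404) return "FABRICATED";
  const work = (res.body as CrossrefWork | null)?.message;
  const family = work?.author?.[0]?.family ?? "";
  const year = work?.issued?.["date-parts"]?.[0]?.[0];
  const authorMatches = family.toLowerCase() === claim.firstAuthorFamily.toLowerCase();
  return authorMatches && year === claim.year ? "VERIFIED" : "MISATTRIBUTED";
}
```

In Node 18 or later, `fetchJson` could wrap the global `fetch`. Keeping it injected also lets you test the verdict logic offline.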
+### Running it
+The reference implementation is **`roleos verify-citations <dispatch>`** ([role-os](https://github.com/mcp-tool-shop-org/role-os)), which shells **[prism-verify](https://github.com/mcp-tool-shop-org/prism-verify)** (`prism verify --type citations`): family-different routing by construction, reasoning-stripped, a deterministic retrieval existence floor, a groundedness lens, and a signed receipt. By hand, the fallback is any non-same-family model run reasoning-stripped against the bare citation claims, plus resolving each identifier yourself.
+**Ensemble — ≥ 3 decorrelated lenses,** counting the **retrieval oracle as one mechanism-diverse lens**: retrieval oracle + ≥ 2 different-family LLM lenses. Diversity of lenses, not raw count, is the load-bearing variable (Rajan 2025; Kim et al. 2025).
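The ensemble rule can be checked before a verification run starts. This sketch assumes a hypothetical `Lens` shape and approximates "decorrelated" as "distinct model families". That is a simplification, because two families can still share error patterns.

```typescript
// Sketch: validate an ensemble against ">= 3 decorrelated lenses", counting the
// retrieval oracle as one mechanism-diverse lens. The Lens shape is hypothetical.
interface Lens {
  kind: "retrieval-oracle" | "llm";
  family?: string; // model family, for LLM lenses only
}

function ensembleProblems(lenses: Lens[], synthesizerFamily: string): string[] {
  const problems: string[] = [];
  if (!lenses.some((l) => l.kind === "retrieval-oracle")) {
    problems.push("no retrieval oracle: existence cannot be gated");
  }
  const llmFamilies = new Set<string>();
  for (const l of lenses) {
    if (l.kind !== "llm") continue;
    const fam = (l.family ?? "").toLowerCase();
    if (fam === synthesizerFamily.toLowerCase()) {
      problems.push(`same-family LLM lens (${fam}) is not allowed`);
    } else if (fam !== "") {
      llmFamilies.add(fam); // duplicates of one family count once
    }
  }
  if (llmFamilies.size < 2) {
    problems.push(`need >= 2 different-family LLM lenses, found ${llmFamilies.size}`);
  }
  return problems; // an empty array means the ensemble satisfies the rule
}
```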
+### Halt conditions (scope is per-finding — other verified findings proceed)
+| Verdict / condition | Action |
+|---|---|
+| **FABRICATED** | The finding is **dropped** — there is no real source to correct, so re-verification is not attempted. |
+| **MISATTRIBUTED** | Correct the attribution and re-verify **once**; a second non-clean verdict drops the finding. |
+| **CANNOT_CONFIRM** | The finding is **removed from the design connection AND surfaced to a human with a contrastive frame**: "you probably expected finding N to be citable; I left it out because the oracle couldn't confirm it; override with X." Never silently kept; reinstated only if a human confirms the source. |
+| **Verifier or oracle UNAVAILABLE** | The dispatch **HALTS and escalates to a human.** Unavailability is NEVER read as "citations are fine" and NEVER read as fabrication. Proceeding without a completed verification is forbidden. |
+| **No different family reachable** | The retrieval oracle (Stage 1) still runs — it is mechanism-diverse and needs no different family — and gates existence. The groundedness LLM lens (Stage 2) **halts-and-escalates** rather than running same-family. A same-family LLM is never substituted for the different-family check. |
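The halt table maps cleanly onto a total function over verdicts. In this sketch the verdict and action names are assumptions: `VERIFIED` stands in for a clean result, which the table does not name. The "no different family reachable" row is left out because it governs which lenses run, not what happens to an individual finding.

```typescript
// Sketch: the per-finding halt table as a total function. Names are hypothetical.
type HaltVerdict = "VERIFIED" | "FABRICATED" | "MISATTRIBUTED" | "CANNOT_CONFIRM" | "UNAVAILABLE";
type HaltAction = "keep" | "drop" | "correct-and-reverify" | "surface-to-human" | "halt-dispatch";

function haltAction(verdict: HaltVerdict, isReverification: boolean): HaltAction {
  if (verdict === "UNAVAILABLE") return "halt-dispatch"; // never read as fine or as fabricated
  if (verdict === "VERIFIED") return "keep";
  if (isReverification) return "drop"; // a second non-clean verdict drops the finding
  switch (verdict) {
    case "FABRICATED":
      return "drop"; // no real source to correct
    case "MISATTRIBUTED":
      return "correct-and-reverify"; // exactly one retry
    case "CANNOT_CONFIRM":
      return "surface-to-human"; // out of the design; reinstated only if a human confirms
    default: {
      const unreachable: never = verdict;
      throw new Error(`unhandled verdict: ${String(unreachable)}`);
    }
  }
}
```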
+## Step 5 — Connect findings to architecture, not just cite them
+The design's decision section references findings by number where they justify a choice; each load-bearing choice traces to ≥ 1 finding. Citations without connection are noise.
+Example: *"Retry uses a fresh prompt without the previous output. (sycophancy mitigation, Kim 2025.)"* — the choice is annotated with the source and the reason, so a reader knows why the rule exists, not just that it does.
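Step 5's traceability rule becomes checkable once each decision records the finding numbers it relies on. The following is a minimal sketch that assumes a hypothetical `Decision` record. Flagging findings that no decision uses is this sketch's reading of "citations without connection are noise."

```typescript
// Sketch: every load-bearing decision must trace to >= 1 verified finding,
// and every verified finding should support some decision. Shapes are hypothetical.
interface Decision {
  text: string;
  findingRefs: number[]; // finding numbers from the Research grounding section
}

function traceProblems(decisions: Decision[], verifiedFindings: Set<number>): string[] {
  const problems: string[] = [];
  const used = new Set<number>();
  for (const d of decisions) {
    // References to dropped or unverified findings do not count as support.
    const valid = d.findingRefs.filter((n) => verifiedFindings.has(n));
    valid.forEach((n) => used.add(n));
    if (valid.length === 0) problems.push(`untraced decision: ${d.text}`);
  }
  verifiedFindings.forEach((n) => {
    if (!used.has(n)) problems.push(`finding ${n} is cited but connected to no decision`);
  });
  return problems;
}
```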
+## Sourcing standard
+**A citation includes ALL of:** (1) author(s) — first author + "et al." inline is fine; (2) year; (3) paper title OR canonical identifier (arXiv:NNNN.NNNNN, DOI, RFC); (4) a direct URL to the source (not a summary or a social-media thread); (5) a one-sentence key finding in your own words.
+**Not allowed:** "studies show…" / "research suggests…" / "it's well-established…" without naming the source; titles without authors or years; citations the research step did not actually surface.
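A few regular expressions are enough to sketch a deterministic check in the spirit of `study-swarm lint`. This is **not** the CLI's implementation. The patterns, the messages, and the assumption that findings follow the Step 3 template are all illustrative, and a heuristic like this will miss some malformed entries.

```typescript
// Heuristic lint for the sourcing standard (assumptions throughout):
// a vague-appeal blacklist, "<Author ...> <year>" right after the bold finding,
// and an arXiv ID, DOI, or URL somewhere on the line.
const VAGUE = /\b(studies show|research suggests|it'?s well[- ]established)\b/i;
const AUTHOR_YEAR = /\*\*\s+[A-Z\u00C0-\u00DE][A-Za-z\u00C0-\u017F'&.,\s-]*?\s(?:19|20)\d{2}\b/;
const IDENTIFIER = /arXiv:\d{4}\.\d{4,5}|doi:\s*10\.\d{4,9}\/\S+|https?:\/\/\S+/i;
const FINDING_LINE = /^\s*(?:>\s*)?\d+\.\s+\*\*/; // numbered Step 3 entries, quoted or not

function lintFinding(line: string): string[] {
  const errors: string[] = [];
  if (VAGUE.test(line)) errors.push("vague appeal to unnamed research");
  if (!AUTHOR_YEAR.test(line)) errors.push("missing author + year after the finding");
  if (!IDENTIFIER.test(line)) errors.push("missing resolvable arXiv ID, DOI, or URL");
  return errors;
}

interface Violation { line: number; errors: string[] }

function lintGrounding(markdown: string): Violation[] {
  const violations: Violation[] = [];
  markdown.split("\n").forEach((text, i) => {
    if (!FINDING_LINE.test(text)) return; // skip headings and prose
    const errors = lintFinding(text);
    if (errors.length > 0) violations.push({ line: i + 1, errors });
  });
  return violations;
}

// Mirrors the exit contract in the changelog: exit 1 on any violation.
function exitCodeFor(violations: Violation[]): number {
  return violations.length > 0 ? 1 : 0;
}
```

A line like `2. **Explanations help.** Studies show users trust them.` fails all three rules. The Step 3 example entry for Buçinca et al. 2024 passes.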
+## The architecture this protocol enables
+Across the designs it has grounded, the same shape recurs:
+```
+System decides structure deterministically
+  ↓
+Model writes within that structure
+  ↓
+Verifier admits before output
+```
+- **Deterministic floor** — the system makes the law-defining call; the model never does.
+- **Model in the prose / prioritization role** — AI adds judgment, contextual explanation, and prioritization where it adds value.
+- **Verifier as admission gate** — the verifier checks the output against the structure before admitting it; retries use a fresh context to avoid sycophancy drift.
+Designs that touch model-facing behavior default to this shape unless evidence justifies a different one.
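The floor → writer → admission-gate shape, including the fresh-context retry, fits in one loop. All callbacks and `maxAttempts` are hypothetical. The point of the sketch is that each retry sees only the deterministic structure and never the rejected draft.

```typescript
// Sketch: system decides structure -> model writes within it -> verifier admits.
// Writer, Admit, and maxAttempts are hypothetical stand-ins.
interface Structure { sections: string[] } // decided deterministically, never by the model

type Writer = (structure: Structure) => Promise<string>;
type Admit = (structure: Structure, draft: string) => boolean;

async function generateAdmitted(
  structure: Structure,
  write: Writer,
  admit: Admit,
  maxAttempts: number = 3,
): Promise<string | null> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Fresh context every time: only the structure goes in, never the last draft,
    // so the writer cannot drift toward defending a rejected answer.
    const draft = await write(structure);
    if (admit(structure, draft)) return draft;
  }
  return null; // nothing admitted: escalate rather than ship an unverified draft
}
```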

package/README.es.md ADDED

@@ -0,0 +1,92 @@
+<p align="center">
+  <a href="README.ja.md">日本語</a> | <a href="README.zh.md">中文</a> | <a href="README.md">English</a> | <a href="README.fr.md">Français</a> | <a href="README.hi.md">हिन्दी</a> | <a href="README.it.md">Italiano</a> | <a href="README.pt-BR.md">Português (BR)</a>
+</p>
+<p align="center">
+  <img src="https://raw.githubusercontent.com/dogfood-lab/study-swarm/main/assets/study-swarm.png" alt="study-swarm" width="360">
+</p>
+<p align="center">
+  <a href="https://www.npmjs.com/package/@dogfood-lab/study-swarm"><img src="https://img.shields.io/npm/v/@dogfood-lab/study-swarm" alt="npm"></a>
+  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License"></a>
+  <a href="https://dogfood-lab.github.io/study-swarm/"><img src="https://img.shields.io/badge/handbook-live-purple" alt="Handbook"></a>
+  <img src="https://img.shields.io/badge/cited%20research-verified-1f6feb" alt="Cited research, verified">
+</p>
+**Ground design decisions in cited research, then verify the citations with a *different model* before anything becomes canon.**
+`study-swarm` is a protocol, not a tool. When you make a load-bearing design decision with an LLM (a new product layer, an architecture choice, a "should we trust the model here" call), improvising from first principles produces stale designs, and citing papers from memory produces designs that rest on sources that don't exist or don't say what you think. `study-swarm` replaces both: it dispatches parallel research agents, demands specific cited findings, and runs every citation through an **external verifier from a different model family** before it informs the design.
+It eats its own cooking. The protocol prescribes verifier-gated envelopes for the systems it helps design, so it applies one to itself. **No model grades its own homework, including the one running the protocol.**
+## The protocol in five steps
+1. **Identify** 3–5 load-bearing design questions where empirical evidence would change the answer.
+2. **Dispatch** one research agent per question, in parallel. Each must return paper titles + authors + years + URLs + a one-sentence finding; specificity beats breadth ("6–8 well-sourced findings beat 20 vague gestures").
+3. **Synthesize** the findings into a *Research grounding* section: `N. **<finding>.** <Authors> <year> (<arXiv/DOI>). <design implication>.`
+4. **Verify externally**: a *different model family*, reasoning-stripped, checks every citation in two stages. A **retrieval oracle** confirms the paper exists (never model memory), then a **groundedness** lens confirms the finding matches the source. **Halt** on fabrication or misattribution; **halt and escalate** if the verifier or the retrieval oracle is unavailable (never read absence as "citations are fine").
+5. **Connect** every architecture choice to a finding by number. Citations without a design implication are noise.
+The full executable detail (the halt table, the sourcing standard, the ensemble rule) lives in **[PROTOCOL.md](PROTOCOL.md)**.
+## Why a *different family*, reasoning-stripped?
+Because the failure modes are documented, not hypothetical:
+- **LLMs can't reliably verify their own output.** Huang et al. 2023 ([arXiv:2310.01798](https://arxiv.org/abs/2310.01798)); Kambhampati et al. 2024 ([arXiv:2402.01817](https://arxiv.org/abs/2402.01817), LLM-Modulo); Stechly et al. 2024 ([arXiv:2402.08115](https://arxiv.org/abs/2402.08115)): the external verifier supplies the gains; the self-critique content is inert.
+- **Same-family judges favor themselves.** Panickssery, Bowman & Feng 2024 ([arXiv:2404.13076](https://arxiv.org/abs/2404.13076)): self-recognition correlates *linearly* with self-preference, so partial blinding doesn't help. Verga et al. 2024 ([arXiv:2404.18796](https://arxiv.org/abs/2404.18796), PoLL): a panel drawn from different families is less biased, at roughly 7× lower cost.
+- **Citations are where LLMs lie.** Walters & Wilder 2023 ([doi:10.1038/s41598-023-41032-5](https://doi.org/10.1038/s41598-023-41032-5)): 55% of GPT-3.5 citations and 18% of GPT-4 citations are fabricated. Onweller et al. 2026 ([arXiv:2605.06635](https://arxiv.org/abs/2605.06635)): links resolve more than 94% of the time, but only 39–77% of the cited content actually supports the claim. So existence must be checked by **retrieval, not recall**.
+- **Hide the generator's reasoning.** Khalifa et al. 2026 ([arXiv:2601.14691](https://arxiv.org/abs/2601.14691), "Gaming the Judge"): manipulating the chain-of-thought alone inflates a judge's false positives by up to 90%, with actions held fixed. Turpin et al. 2023 ([arXiv:2305.04388](https://arxiv.org/abs/2305.04388)): chain-of-thought is *post hoc* rationalization. The verifier sees the bare citation claim, never the "why I included it".
+- **Diversity beats count.** Rajan 2025 ([arXiv:2511.16708](https://arxiv.org/abs/2511.16708)): four verifiers with pairwise correlation ρ ∈ [0.05, 0.25] beat any single one of them via submodular coverage. Kim et al. 2025 ([arXiv:2506.07962](https://arxiv.org/abs/2506.07962)): LLM errors are *correlated*, so the load-bearing variable is lens diversity, not raw count.
+## Does it actually work? (proof)
+As proof, the protocol was run on its own citations. Two decorrelated non-Claude families, **Mistral** (`mistral-small:24b`) and **IBM Granite** (`granite4.1:30b`), verified a citation set, reasoning-stripped, with two blind traps planted:
+| Planted trap | Mistral | IBM Granite | Ground truth |
+|---|---|---|---|
+| Chain-of-thought attributed to "Nakamura & Olsen" | missed | **caught** (misattributed → actually Wei et al. 2022) | misattributed |
+| A fabricated "98% of errors eliminated, no oracle needed" paper | **caught** (fabricated) | **caught** (fabricated) | fabricated |
+Neither family caught both traps alone, but their **union caught 2/2**. A single judge would have passed the misattribution. Separately, the retrieval oracle caught two *real* misattributions in our own design docs (papers cited under the wrong first author) that no parametric LLM could have flagged, and it correctly confirmed genuine 2026 papers that both LLMs falsely flagged as fabricated simply because the papers postdate their training. That last point is why the Step 4 existence check **must** be a retrieval oracle, never an LLM.
+That single run is the thesis in miniature: **decorrelated lenses + a retrieval oracle for existence beat any single clever judge.**
+## How it wires in
+You can run the protocol by hand: any different-family model, plus resolving the arXiv IDs/DOIs yourself, satisfies Step 4. Two sibling tools turn it into one command:
+- **[prism-verify](https://github.com/mcp-tool-shop-org/prism-verify)**: the runtime verifier, with family-different routing, reasoning stripping, multi-lens adjudication, a deterministic retrieval existence floor (arXiv → Crossref), and signed receipts.
+- **[role-os](https://github.com/mcp-tool-shop-org/role-os)**: provides `roleos verify-citations <dispatch>`, the runner that extracts a document's citations and routes them through prism.
+## CLI
+```bash
+npm i -g @dogfood-lab/study-swarm     # or run ad-hoc: npx @dogfood-lab/study-swarm <command>
+```
+| Command | What it does |
+|---|---|
+| `study-swarm protocol` | Prints the full protocol: the five steps, the halt table, the sourcing standard. |
+| `study-swarm new <slug>` | Scaffolds a `<slug>.dispatch.md` with the five-step skeleton to fill in. |
+| `study-swarm lint <file>` | Checks a dispatch's *Research grounding* against the sourcing standard: every finding needs an author, a year, and a resolvable identifier (arXiv / DOI / URL); vague "studies show…" claims are rejected. Exits `1` on violations, so it gates CI. |
+`lint` is deterministic (it makes no model calls), so it is safe in CI. It enforces the **Step 3 sourcing standard** locally; the model-based verification of **Step 4** still runs through [`roleos verify-citations`](https://github.com/mcp-tool-shop-org/role-os) → prism.
+## Why it works, in brief
+**Current:** the field moves fast; demanding specific, dated studies keeps designs from shipping 18 months stale. **Functional:** the evidence shows what *fails*, not just what works (explanations can increase over-reliance on a *wrong* AI; Bansal et al. 2021). **Safe:** the verifier-gated envelope is the architecture the evidence supports, and the protocol applies it to its own output. Sourcing isn't an academic exercise; it's the evidence trail.
+## Security
+`study-swarm` is a documentation repository: Markdown and a logo. It ships no executable code and installs nothing from this repo. It touches no data, needs no permissions, and collects no telemetry; there are no secrets or credentials in the source. The methodology *describes* a workflow that uses web retrieval and model-based verification, but this repository does not implement or run it. See [SECURITY.md](SECURITY.md).
+## Status
+A working protocol, externally verified by its own mechanism: a different model family checks its citations (see the proof above). This repo is the public reference; [PROTOCOL.md](PROTOCOL.md) is the executable shape. Part of the [dogfood-lab](https://github.com/dogfood-lab) family: methods and showcases for building in the AI era.
+MIT licensed.
+---
+<p align="center"><sub>Part of the <a href="https://github.com/dogfood-lab">dogfood-lab</a> family — methods &amp; showcases for building in the AI era. Built by <a href="https://mcp-tool-shop.github.io/">MCP Tool Shop</a>.</sub></p>

package/README.fr.md ADDED

@@ -0,0 +1,92 @@
+<p align="center">
+  <a href="README.ja.md">日本語</a> | <a href="README.zh.md">中文</a> | <a href="README.es.md">Español</a> | <a href="README.md">English</a> | <a href="README.hi.md">हिन्दी</a> | <a href="README.it.md">Italiano</a> | <a href="README.pt-BR.md">Português (BR)</a>
+</p>
+<p align="center">
+  <img src="https://raw.githubusercontent.com/dogfood-lab/study-swarm/main/assets/study-swarm.png" alt="study-swarm" width="360">
+</p>
+<p align="center">
+  <a href="https://www.npmjs.com/package/@dogfood-lab/study-swarm"><img src="https://img.shields.io/npm/v/@dogfood-lab/study-swarm" alt="npm"></a>
+  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License"></a>
+  <a href="https://dogfood-lab.github.io/study-swarm/"><img src="https://img.shields.io/badge/handbook-live-purple" alt="Handbook"></a>
+  <img src="https://img.shields.io/badge/cited%20research-verified-1f6feb" alt="Cited research, verified">
+</p>
+**Ground design decisions in cited research, then verify the citations with a *different model* before anything becomes canon.**
+`study-swarm` is a protocol, not a tool. When you make a load-bearing design decision with an LLM (a new product layer, an architecture choice, a "should we trust the model here" call), improvising from first principles produces stale designs, and citing papers from memory produces designs that rest on sources that don't exist or don't say what you think. `study-swarm` replaces both: it dispatches parallel research agents, demands specific cited findings, and runs every citation through an **external verifier from a different model family** before it informs the design.
+It eats its own cooking. The protocol prescribes verifier-gated envelopes for the systems it helps design, so it applies one to itself. **No model grades its own homework, including the one running the protocol.**
+## The protocol in five steps
+1. **Identify** 3–5 load-bearing design questions where empirical evidence would change the answer.
+2. **Dispatch** one research agent per question, in parallel. Each must return paper titles + authors + years + URLs + a one-sentence finding; specificity beats breadth ("6–8 well-sourced findings beat 20 vague gestures").
+3. **Synthesize** the findings into a *Research grounding* section: `N. **<finding>.** <Authors> <year> (<arXiv/DOI>). <design implication>.`
+4. **Verify externally**: a *different model family*, reasoning-stripped, checks every citation in two stages. A **retrieval oracle** confirms the paper exists (never model memory), then a **groundedness** lens confirms the finding matches the source. **Halt** on fabrication or misattribution; **halt and escalate** if the verifier or the retrieval oracle is unavailable (never read absence as "citations are fine").
+5. **Connect** every architecture choice to a finding by number. Citations without a design implication are noise.
+The full executable detail (the halt table, the sourcing standard, the ensemble rule) lives in **[PROTOCOL.md](PROTOCOL.md)**.
+## Why a *different family*, reasoning-stripped?
+Because the failure modes are documented, not hypothetical:
+- **LLMs can't reliably verify their own output.** Huang et al. 2023 ([arXiv:2310.01798](https://arxiv.org/abs/2310.01798)); Kambhampati et al. 2024 ([arXiv:2402.01817](https://arxiv.org/abs/2402.01817), LLM-Modulo); Stechly et al. 2024 ([arXiv:2402.08115](https://arxiv.org/abs/2402.08115)): the external verifier supplies the gains; the self-critique content is inert.
+- **Same-family judges favor themselves.** Panickssery, Bowman & Feng 2024 ([arXiv:2404.13076](https://arxiv.org/abs/2404.13076)): self-recognition correlates *linearly* with self-preference, so partial blinding doesn't help. Verga et al. 2024 ([arXiv:2404.18796](https://arxiv.org/abs/2404.18796), PoLL): a panel drawn from different families is less biased, at roughly 7× lower cost.
+- **Citations are where LLMs lie.** Walters & Wilder 2023 ([doi:10.1038/s41598-023-41032-5](https://doi.org/10.1038/s41598-023-41032-5)): 55% of GPT-3.5 citations and 18% of GPT-4 citations are fabricated. Onweller et al. 2026 ([arXiv:2605.06635](https://arxiv.org/abs/2605.06635)): links resolve more than 94% of the time, but only 39–77% of the cited content actually supports the claim. So existence must be checked by **retrieval, not recall**.
+- **Hide the generator's reasoning.** Khalifa et al. 2026 ([arXiv:2601.14691](https://arxiv.org/abs/2601.14691), "Gaming the Judge"): manipulating the chain-of-thought alone inflates a judge's false positives by up to 90%, with actions held fixed. Turpin et al. 2023 ([arXiv:2305.04388](https://arxiv.org/abs/2305.04388)): chain-of-thought is *post hoc* rationalization. The verifier sees the bare citation claim, never the "why I included it".
+- **Diversity beats count.** Rajan 2025 ([arXiv:2511.16708](https://arxiv.org/abs/2511.16708)): four verifiers with pairwise correlation ρ ∈ [0.05, 0.25] beat any single one of them via submodular coverage. Kim et al. 2025 ([arXiv:2506.07962](https://arxiv.org/abs/2506.07962)): LLM errors are *correlated*, so the load-bearing variable is lens diversity, not raw count.
+## Does it actually work? (proof)
+As proof, the protocol was run on its own citations. Two decorrelated non-Claude families, **Mistral** (`mistral-small:24b`) and **IBM Granite** (`granite4.1:30b`), verified a citation set, reasoning-stripped, with two blind traps planted:
+| Planted trap | Mistral | IBM Granite | Ground truth |
+|---|---|---|---|
+| Chain-of-thought attributed to "Nakamura & Olsen" | missed | **caught** (misattributed → actually Wei et al. 2022) | misattributed |
+| A fabricated "98% of errors eliminated, no oracle needed" paper | **caught** (fabricated) | **caught** (fabricated) | fabricated |
+Neither family caught both traps alone, but their **union caught 2/2**. A single judge would have passed the misattribution. Separately, the retrieval oracle caught two *real* misattributions in our own design docs (papers cited under the wrong first author) that no parametric LLM could have flagged, and it correctly confirmed genuine 2026 papers that both LLMs falsely flagged as fabricated simply because the papers postdate their training. That last point is why the Step 4 existence check **must** be a retrieval oracle, never an LLM.
+That single run is the thesis in miniature: **decorrelated lenses + a retrieval oracle for existence beat any single clever judge.**
+## Comment cela fonctionne
+Vous pouvez exécuter le protocole manuellement — tout modèle d’une famille différente, ainsi que la résolution de l’arXiv/DOI vous-même, satisfont aux exigences de l’étape 4. Deux outils frères permettent de le faire en une seule commande :
+- **[prism-verify](https://github.com/mcp-tool-shop-org/prism-verify)** — le vérificateur d’exécution : routage différencié par famille, suppression du raisonnement, adjudication multi-lentilles, seuil d’existence de récupération déterministe (arXiv → Crossref) et reçus signés.
+- **[role-os](https://github.com/mcp-tool-shop-org/role-os)** — fournit `roleos verify-citations <dispatch>`, l’exécutant qui extrait les citations d’un document et les soumet à prism pour vérification.
+## Interface en ligne de commande
+```bash
+npm i -g @dogfood-lab/study-swarm     # or run ad-hoc: npx @dogfood-lab/study-swarm <command>
+```
+| Command | What it does |
+|---|---|
+| `study-swarm protocol` | Print the full protocol: the five steps, the halt table, the sourcing standard. |
+| `study-swarm new <slug>` | Scaffold a `<slug>.dispatch.md` with the five-step skeleton to fill in. |
+| `study-swarm lint <file>` | Check a dispatch's *Research grounding* against the sourcing standard: every finding needs an author, a year, and a resolvable identifier (arXiv / DOI / URL); vague "studies show…" claims are rejected. Exits `1` on violations, so it can gate CI. |
+`lint` is deterministic (zero model calls), so it is safe to run in CI. It enforces the **step-3 sourcing standard** locally; the model-based **step-4** verification still goes through [`roleos verify-citations`](https://github.com/mcp-tool-shop-org/role-os) → prism.
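The checks `lint` describes are plain pattern matches, which is why they need no model. A minimal sketch of that idea, assuming findings are numbered lines in the step-3 format. It is not the package's implementation; the `lint_findings` name, the author-before-year heuristic, and the vague-phrase list are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Illustrative sourcing-standard check (not study-swarm's actual code).
# Reads Research-grounding lines on stdin. Each numbered finding must name an
# author right before a 4-digit year, carry an arXiv ID, DOI, or URL, and avoid
# vague "studies show" phrasing. Returns 1 if any finding violates the rules.
set -euo pipefail

lint_findings() {
  local line n=0 violations=0
  while IFS= read -r line || [[ -n "$line" ]]; do
    n=$((n + 1))
    # Only numbered findings ("1. ...", "2. ...") are checked.
    [[ "$line" =~ ^[0-9]+\. ]] || continue
    if ! grep -Eq '[A-Z][A-Za-z-]+( et al\.)? (19|20)[0-9]{2}' <<<"$line"; then
      echo "line $n: missing author + year" >&2; violations=$((violations + 1))
    fi
    if ! grep -Eqi 'arxiv:[0-9]{4}\.[0-9]{4,5}|doi:10\.[0-9]{4,}/|https?://' <<<"$line"; then
      echo "line $n: missing resolvable identifier" >&2; violations=$((violations + 1))
    fi
    if grep -Eqi 'studies (show|suggest)|research shows' <<<"$line"; then
      echo "line $n: vague claim" >&2; violations=$((violations + 1))
    fi
  done
  (( violations == 0 ))
}
```

For example, `lint_findings < grounding.md` succeeds on `1. **LLMs cannot reliably verify their own output.** Huang et al. 2023 (arXiv:2310.01798). Use an external verifier.` and fails, with three messages on stderr, on `2. **Explanations help.** Studies show this works.` In a real pipeline you would run `study-swarm lint <file>` itself.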
+## Why it works, in brief
+**Current**: the field moves fast, and demanding specific, dated studies keeps designs from shipping 18 months stale. **Functional**: the evidence shows what *fails*, not just what works (explanations can increase over-reliance on *wrong* AI; Bansal et al. 2021). **Safe**: the verifier-gated envelope is the architecture the evidence supports, and the protocol applies it to its own output. Sourcing isn't academic theater; it's the evidence trail.
+## Security
+`study-swarm` is a documentation repository: Markdown and a logo. It ships no executable code and installs nothing from this repository. It touches no data, requires no permissions, and collects no telemetry; there are no secrets or credentials in the source. The methodology *describes* a workflow that uses web retrieval and model-based verification, but this repository neither implements nor runs it. See [SECURITY.md](SECURITY.md).
+## Status
+A working protocol, externally checked by its own machinery: a different model family verifies its citations (see the proof above). This repository is the public reference; [PROTOCOL.md](PROTOCOL.md) is the executable form. Part of the [dogfood-lab](https://github.com/dogfood-lab) family: methods and showcases for building in the AI era.
+MIT licensed.
+---
+<p align="center"><sub>Part of the <a href="https://github.com/dogfood-lab">dogfood-lab</a> family — methods &amp; showcases for building in the AI era. Built by <a href="https://mcp-tool-shop.github.io/">MCP Tool Shop</a>.</sub></p>

package/README.hi.md ADDED

@@ -0,0 +1,92 @@
+<p align="center">
+  <a href="README.ja.md">日本語</a> | <a href="README.zh.md">中文</a> | <a href="README.es.md">Español</a> | <a href="README.fr.md">Français</a> | <a href="README.md">English</a> | <a href="README.it.md">Italiano</a> | <a href="README.pt-BR.md">Português (BR)</a>
+</p>
+<p align="center">
+  <img src="https://raw.githubusercontent.com/dogfood-lab/study-swarm/main/assets/study-swarm.png" alt="study-swarm" width="360">
+</p>
+<p align="center">
+  <a href="https://www.npmjs.com/package/@dogfood-lab/study-swarm"><img src="https://img.shields.io/npm/v/@dogfood-lab/study-swarm" alt="npm"></a>
+  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License"></a>
+  <a href="https://dogfood-lab.github.io/study-swarm/"><img src="https://img.shields.io/badge/handbook-live-purple" alt="Handbook"></a>
+  <img src="https://img.shields.io/badge/cited%20research-verified-1f6feb" alt="Cited research, verified">
+</p>
+**Ground design decisions in cited research, then verify the citations with a *different* model family before any of it becomes load-bearing.**
+`study-swarm` is a protocol, not a tool. When you're making a significant design decision with an LLM (a new product layer, an architecture choice, a "should we trust the model here"), improvising from first principles produces designs that are stale, and citing papers from memory produces designs that rest on sources that don't exist or don't say what you think. study-swarm replaces both: dispatch parallel research agents, demand specific cited findings, and run every citation through an **external verifier from a different model family** before it informs the design.
+It takes its own medicine. The protocol prescribes a verifier-gated envelope for the systems it helps design, so it runs one on itself too. **No model grades its own homework, including the model running the protocol.**
+## The protocol in five steps
+1. **Identify** 3-5 load-bearing design questions where empirical evidence would change the answer.
+2. **Dispatch** one research agent per question, in parallel. Each must return paper title + authors + year + URL + a one-sentence finding; specificity beats breadth ("6-8 well-cited findings beat 20 vague gestures").
+3. **Synthesize** the findings into a *Research grounding* section: `N. **<finding>.** <Authors> <year> (<arXiv/DOI>). <design implication>.`
+4. **Verify externally**: a *different model family*, reasoning-stripped, checks every citation in two stages. A **retrieval oracle** confirms the paper exists (never model memory), then a **groundedness** lens confirms the finding matches the source. **Halt on fabricated/misattributed citations; halt and escalate if the verifier or the retrieval oracle is unavailable** (never treat absence as "citations are fine").
+5. **Tie every architectural choice to a finding by number.** Citations without a design implication are noise.
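Step 3's line template is mechanical enough to script. A minimal sketch with a hypothetical `format_finding` helper that assembles one *Research grounding* line from its six parts; the sample values paraphrase the Huang et al. 2023 citation used later in this README, and the design implication is illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: render one step-3 finding in the format
#   N. **<finding>.** <Authors> <year> (<arXiv/DOI>). <design implication>.
set -euo pipefail

format_finding() {
  local n="$1" finding="$2" authors="$3" year="$4" id="$5" implication="$6"
  printf '%s. **%s.** %s %s (%s). %s.\n' \
    "$n" "$finding" "$authors" "$year" "$id" "$implication"
}

format_finding 1 \
  "LLMs cannot reliably verify their own output" \
  "Huang et al." 2023 "arXiv:2310.01798" \
  "Route every citation check to an external verifier"
```

This prints `1. **LLMs cannot reliably verify their own output.** Huang et al. 2023 (arXiv:2310.01798). Route every citation check to an external verifier.`, and the leading number is what step 5 uses to tie a design choice back to the finding.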
+The full executable detail (the halt table, the sourcing standard, the ensemble rule) lives in **[PROTOCOL.md](PROTOCOL.md)**.
+## Why a *different* family, reasoning-stripped?
+Because the failure modes are documented, not hypothetical:
+- **LLMs can't reliably verify their own output.** Huang et al. 2023 ([arXiv:2310.01798](https://arxiv.org/abs/2310.01798)); Kambhampati et al. 2024 ([arXiv:2402.01817](https://arxiv.org/abs/2402.01817), LLM-Modulo); Stechly et al. 2024 ([arXiv:2402.08115](https://arxiv.org/abs/2402.08115)). The external verifier delivers the gains; the self-critique content is inert.
+- **Same-family judges prefer themselves.** Panickssery, Bowman & Feng 2024 ([arXiv:2404.13076](https://arxiv.org/abs/2404.13076)): self-recognition correlates *linearly* with self-preference, so partial blinding doesn't help. Verga et al. 2024 ([arXiv:2404.18796](https://arxiv.org/abs/2404.18796), PoLL): a panel drawn from different families is less biased at roughly 7× lower cost.
+- **Citations are where LLMs lie.** Walters & Wilder 2023 ([doi:10.1038/s41598-023-41032-5](https://doi.org/10.1038/s41598-023-41032-5)): GPT-3.5 fabricates 55% of its citations and GPT-4 18%. Onweller et al. 2026 ([arXiv:2605.06635](https://arxiv.org/abs/2605.06635)): links resolve >94% of the time, yet only 39-77% of cited content actually supports the claim. So existence must be checked **by retrieval, not recall.**
+- **Hide the generator's reasoning.** Khalifa et al. 2026 ([arXiv:2601.14691](https://arxiv.org/abs/2601.14691), "Gaming the Judge"): manipulated chain-of-thought alone inflates a judge's false positives by up to 90% with the actions held fixed. Turpin et al. 2023 ([arXiv:2305.04388](https://arxiv.org/abs/2305.04388)): CoT is post-hoc rationalization. The verifier sees only the bare citation claim, never the "why I included it."
+- **Diversity beats count.** Rajan 2025 ([arXiv:2511.16708](https://arxiv.org/abs/2511.16708)): four verifiers at pairwise correlation ρ ∈ [0.05, 0.25] beat any single smart judge through submodular coverage. Kim et al. 2025 ([arXiv:2506.07962](https://arxiv.org/abs/2506.07962)): LLM errors are *correlated*, so the load-bearing variable is lens diversity, not raw count.
+## Does it actually work? (the proof)
+As a test, the protocol was run against its own citations. Two decorrelated non-Claude families, **Mistral** (`mistral-small:24b`) and **IBM Granite** (`granite4.1:30b`), checked a citation set, reasoning-stripped, with two blind traps planted:
+| Planted trap | Mistral | IBM Granite | Ground truth |
+|---|---|---|---|
+| Chain-of-thought prompting attributed to "Nakamura & Olsen" | missed | **caught** (misattributed → actually Wei et al. 2022) | misattributed |
+| a fabricated "98% of errors eliminated, no oracle needed" paper | **caught** (fabricated) | **caught** (fabricated) | fabricated |
+Neither family caught both traps alone, but their **union caught 2/2.** A single judge would have passed the misattribution. Separately, the retrieval oracle caught two *real* misattributions in our own design docs (papers cited under the wrong first author) that no parametric LLM could have flagged, and it correctly confirmed genuine 2026 papers that both LLMs had wrongly flagged as fabricated simply because the papers postdate their training. That last point is exactly why the existence check in step 4 **must be a retrieval oracle, never an LLM.**
+That single run is the thesis in miniature: **decorrelated lenses + a retrieval oracle for existence beat any single smart judge.**
+## How it's wired
+You can run the protocol by hand: any different-family model, plus resolving the arXiv/DOI yourself, satisfies step 4. Two sibling tools make it one command:
+- **[prism-verify](https://github.com/mcp-tool-shop-org/prism-verify)**: the runtime verifier, with family-different routing, reasoning stripping, multi-lens adjudication, a deterministic retrieval existence floor (arXiv → Crossref), and signed receipts.
+- **[role-os](https://github.com/mcp-tool-shop-org/role-os)**: provides `roleos verify-citations <dispatch>`, the runner that extracts a dispatch's citations and sends them through prism.
+## CLI
+```bash
+npm i -g @dogfood-lab/study-swarm     # or run ad-hoc: npx @dogfood-lab/study-swarm <command>
+```
+| Command | What it does |
+|---|---|
+| `study-swarm protocol` | Print the full protocol: the five steps, the halt table, the sourcing standard. |
+| `study-swarm new <slug>` | Scaffold a `<slug>.dispatch.md` with the five-step skeleton to fill in. |
+| `study-swarm lint <file>` | Check a dispatch's *Research grounding* against the sourcing standard: every finding needs an author, a year, and a resolvable identifier (arXiv / DOI / URL); vague "studies show…" claims are rejected. Exits `1` on violations, so it can gate CI. |
+`lint` is deterministic (zero model calls), so it is safe to run in CI. It enforces the **step-3 sourcing standard** locally; the model-based **step-4** verification still goes through [`roleos verify-citations`](https://github.com/mcp-tool-shop-org/role-os) → prism.
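Because `lint` reports violations through its exit code, gating a whole repository is just a loop. A minimal sketch, assuming dispatches sit in one directory as `*.dispatch.md` files (the `docs` directory in the comment is an assumption) and that the CLI is installed; `lint_all` and the `LINT_CMD` override are hypothetical names, the override existing only so the loop can be exercised without the CLI.

```bash
#!/usr/bin/env bash
# Sketch: lint every dispatch in a directory and fail if any file has
# violations. By default it calls `study-swarm lint <file>`; LINT_CMD is a
# hypothetical override for testing with a stand-in command.
set -euo pipefail

lint_all() {
  local dir="$1" failed=0 f
  shopt -s nullglob          # an empty directory yields zero iterations
  for f in "$dir"/*.dispatch.md; do
    # Unquoted on purpose so "study-swarm lint" splits into command + argument.
    if ${LINT_CMD:-study-swarm lint} "$f"; then
      echo "ok   $f"
    else
      echo "FAIL $f" >&2
      failed=1
    fi
  done
  return "$failed"
}

# In CI: lint_all docs
```

Linting every file before returning, rather than stopping at the first failure, means one CI run reports all offending dispatches at once.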
+## Why it works, in brief
+**Current**: the field moves fast, and demanding specific, dated studies keeps designs from shipping 18 months stale. **Functional**: the evidence shows what *fails*, not just what works (explanations can increase over-reliance on *wrong* AI; Bansal et al. 2021). **Safe**: the verifier-gated envelope is the architecture the evidence supports, and the protocol applies it to its own output. Sourcing isn't academic theater; it's the evidence trail.
+## Security
+`study-swarm` is a documentation repository: Markdown and a logo. It ships no executable code and installs nothing from this repository. It touches no data, requires no permissions, and collects no telemetry; there are no secrets or credentials in the source. The methodology *describes* a workflow that uses web retrieval and model-based verification, but this repository neither implements nor runs it. See [SECURITY.md](SECURITY.md).
+## Status
+A working protocol, externally checked by its own machinery: a different model family verifies its citations (see the proof above). This repository is the public reference; [PROTOCOL.md](PROTOCOL.md) is the executable form. Part of the [dogfood-lab](https://github.com/dogfood-lab) family: methods and showcases for building in the AI era.
+MIT licensed.
+---
+<p align="center"><sub>Part of the <a href="https://github.com/dogfood-lab">dogfood-lab</a> family — methods &amp; showcases for building in the AI era. Built by <a href="https://mcp-tool-shop.github.io/">MCP Tool Shop</a>.</sub></p>

package/README.it.md ADDED

@@ -0,0 +1,92 @@
+<p align="center">
+  <a href="README.ja.md">日本語</a> | <a href="README.zh.md">中文</a> | <a href="README.es.md">Español</a> | <a href="README.fr.md">Français</a> | <a href="README.hi.md">हिन्दी</a> | <a href="README.md">English</a> | <a href="README.pt-BR.md">Português (BR)</a>
+</p>
+<p align="center">
+  <img src="https://raw.githubusercontent.com/dogfood-lab/study-swarm/main/assets/study-swarm.png" alt="study-swarm" width="360">
+</p>
+<p align="center">
+  <a href="https://www.npmjs.com/package/@dogfood-lab/study-swarm"><img src="https://img.shields.io/npm/v/@dogfood-lab/study-swarm" alt="npm"></a>
+  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License"></a>
+  <a href="https://dogfood-lab.github.io/study-swarm/"><img src="https://img.shields.io/badge/handbook-live-purple" alt="Handbook"></a>
+  <img src="https://img.shields.io/badge/cited%20research-verified-1f6feb" alt="Cited research, verified">
+</p>
+**Ground design decisions in cited research, then verify the citations with a *different* model family before any of it becomes load-bearing.**
+`study-swarm` is a protocol, not a tool. When you're making a significant design decision with an LLM (a new product layer, an architecture choice, a "should we trust the model here"), improvising from first principles produces designs that are stale, and citing papers from memory produces designs that rest on sources that don't exist or don't say what you think. study-swarm replaces both: dispatch parallel research agents, demand specific cited findings, and run every citation through an **external verifier from a different model family** before it informs the design.
+It takes its own medicine. The protocol prescribes a verifier-gated envelope for the systems it helps design, so it runs one on itself too. **No model grades its own homework, including the model running the protocol.**
+## The protocol in five steps
+1. **Identify** 3-5 load-bearing design questions where empirical evidence would change the answer.
+2. **Dispatch** one research agent per question, in parallel. Each must return paper title + authors + year + URL + a one-sentence finding; specificity beats breadth ("6-8 well-cited findings beat 20 vague gestures").
+3. **Synthesize** the findings into a *Research grounding* section: `N. **<finding>.** <Authors> <year> (<arXiv/DOI>). <design implication>.`
+4. **Verify externally**: a *different model family*, reasoning-stripped, checks every citation in two stages. A **retrieval oracle** confirms the paper exists (never model memory), then a **groundedness** lens confirms the finding matches the source. **Halt on fabricated/misattributed citations; halt and escalate if the verifier or the retrieval oracle is unavailable** (never treat absence as "citations are fine").
+5. **Tie every architectural choice to a finding by number.** Citations without a design implication are noise.
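Step 4's halting rule reads naturally as a small decision function. A minimal sketch, assuming each citation comes back with one of a few hypothetical verdict strings (`verified`, `fabricated`, `misattributed`, or `unavailable`/empty when no verifier or oracle answered); the `gate` and `gate_all` names are illustrative, and PROTOCOL.md's halt table remains the authoritative version.

```bash
#!/usr/bin/env bash
# Sketch of the step-4 gate. Verdict strings are hypothetical; the rule taken
# from the protocol is that a missing verdict is never treated as a pass.
set -euo pipefail

# Map one citation's verdict to an action.
gate() {
  local verdict="${1:-}"
  case "$verdict" in
    verified)                 echo proceed ;;
    fabricated|misattributed) echo halt ;;
    ""|unavailable)           echo escalate ;;  # halt and escalate: no answer is not "fine"
    *)                        echo escalate ;;  # unknown verdicts fail closed
  esac
}

# A dispatch proceeds only if every citation does; otherwise print the first
# non-proceed action and return non-zero.
gate_all() {
  local v action
  for v in "$@"; do
    action=$(gate "$v")
    if [[ "$action" != proceed ]]; then
      echo "$action"
      return 1
    fi
  done
  echo proceed
}
```

The design choice worth keeping is the fail-closed default: an empty or unrecognized verdict escalates instead of passing, which is the "never treat absence as citations are fine" rule expressed in code.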
+The full executable detail (the halt table, the sourcing standard, the ensemble rule) lives in **[PROTOCOL.md](PROTOCOL.md)**.
+## Why a *different* family, reasoning-stripped?
+Because the failure modes are documented, not hypothetical:
+- **LLMs can't reliably verify their own output.** Huang et al. 2023 ([arXiv:2310.01798](https://arxiv.org/abs/2310.01798)); Kambhampati et al. 2024 ([arXiv:2402.01817](https://arxiv.org/abs/2402.01817), LLM-Modulo); Stechly et al. 2024 ([arXiv:2402.08115](https://arxiv.org/abs/2402.08115)). The external verifier delivers the gains; the self-critique content is inert.
+- **Same-family judges prefer themselves.** Panickssery, Bowman & Feng 2024 ([arXiv:2404.13076](https://arxiv.org/abs/2404.13076)): self-recognition correlates *linearly* with self-preference, so partial blinding doesn't help. Verga et al. 2024 ([arXiv:2404.18796](https://arxiv.org/abs/2404.18796), PoLL): a panel drawn from different families is less biased at roughly 7× lower cost.
+- **Citations are where LLMs lie.** Walters & Wilder 2023 ([doi:10.1038/s41598-023-41032-5](https://doi.org/10.1038/s41598-023-41032-5)): GPT-3.5 fabricates 55% of its citations and GPT-4 18%. Onweller et al. 2026 ([arXiv:2605.06635](https://arxiv.org/abs/2605.06635)): links resolve >94% of the time, yet only 39-77% of cited content actually supports the claim. So existence must be checked **by retrieval, not recall.**
+- **Hide the generator's reasoning.** Khalifa et al. 2026 ([arXiv:2601.14691](https://arxiv.org/abs/2601.14691), "Gaming the Judge"): manipulated chain-of-thought alone inflates a judge's false positives by up to 90% with the actions held fixed. Turpin et al. 2023 ([arXiv:2305.04388](https://arxiv.org/abs/2305.04388)): CoT is post-hoc rationalization. The verifier sees only the bare citation claim, never the "why I included it."
+- **Diversity beats count.** Rajan 2025 ([arXiv:2511.16708](https://arxiv.org/abs/2511.16708)): four verifiers at pairwise correlation ρ ∈ [0.05, 0.25] beat any single smart judge through submodular coverage. Kim et al. 2025 ([arXiv:2506.07962](https://arxiv.org/abs/2506.07962)): LLM errors are *correlated*, so the load-bearing variable is lens diversity, not raw count.
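The "errors are correlated" point can be measured on your own verifiers. A minimal sketch, assuming you have each verifier's per-citation miss record as a string of 0s and 1s (1 = that verifier got the citation wrong); the input format and the `error_correlation` name are hypothetical. It computes the Pearson correlation of the two miss vectors, which for binary data is the phi coefficient.

```bash
#!/usr/bin/env bash
# Sketch: estimate how correlated two verifiers' errors are.
# Args: two equal-length 0/1 strings, one character per citation.
# Prints the Pearson correlation (phi coefficient) to two decimals.
set -euo pipefail

error_correlation() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = length(a)
    if (n == 0 || n != length(b)) { print "length mismatch" > "/dev/stderr"; exit 1 }
    for (i = 1; i <= n; i++) {
      x = substr(a, i, 1) + 0; y = substr(b, i, 1) + 0
      sx += x; sy += y; sxy += x * y; sxx += x * x; syy += y * y
    }
    cov = sxy - sx * sy / n          # n * covariance
    vx = sxx - sx * sx / n           # n * variance of a
    vy = syy - sy * sy / n           # n * variance of b
    if (vx == 0 || vy == 0) { print "undefined: a verifier never or always errs" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", cov / sqrt(vx * vy)
  }'
}
```

For instance, `error_correlation 1100 1100` prints `1.00` (the second verifier repeats the first) and `error_correlation 1100 1010` prints `0.00`. The Rajan 2025 result above concerns verifiers whose pairwise correlation sits roughly between 0.05 and 0.25; a value near 1 means the extra lens adds little coverage.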
+## Does it actually work? (the proof)
+As a test, the protocol was run against its own citations. Two decorrelated non-Claude families, **Mistral** (`mistral-small:24b`) and **IBM Granite** (`granite4.1:30b`), checked a citation set, reasoning-stripped, with two blind traps planted:
+| Planted trap | Mistral | IBM Granite | Ground truth |
+|---|---|---|---|
+| Chain-of-thought prompting attributed to "Nakamura & Olsen" | missed | **caught** (misattributed → actually Wei et al. 2022) | misattributed |
+| a fabricated "98% of errors eliminated, no oracle needed" paper | **caught** (fabricated) | **caught** (fabricated) | fabricated |
+Neither family caught both traps alone, but their **union caught 2/2.** A single judge would have passed the misattribution. Separately, the retrieval oracle caught two *real* misattributions in our own design docs (papers cited under the wrong first author) that no parametric LLM could have flagged, and it correctly confirmed genuine 2026 papers that both LLMs had wrongly flagged as fabricated simply because the papers postdate their training. That last point is exactly why the existence check in step 4 **must be a retrieval oracle, never an LLM.**
+That single run is the thesis in miniature: **decorrelated lenses + a retrieval oracle for existence beat any single smart judge.**
+## How it's wired
+You can run the protocol by hand: any different-family model, plus resolving the arXiv/DOI yourself, satisfies step 4. Two sibling tools make it one command:
+- **[prism-verify](https://github.com/mcp-tool-shop-org/prism-verify)**: the runtime verifier, with family-different routing, reasoning stripping, multi-lens adjudication, a deterministic retrieval existence floor (arXiv → Crossref), and signed receipts.
+- **[role-os](https://github.com/mcp-tool-shop-org/role-os)**: provides `roleos verify-citations <dispatch>`, the runner that extracts a dispatch's citations and sends them through prism.
+## CLI
+```bash
+npm i -g @dogfood-lab/study-swarm     # or run ad-hoc: npx @dogfood-lab/study-swarm <command>
+```
+| Command | What it does |
+|---|---|
+| `study-swarm protocol` | Print the full protocol: the five steps, the halt table, the sourcing standard. |
+| `study-swarm new <slug>` | Scaffold a `<slug>.dispatch.md` with the five-step skeleton to fill in. |
+| `study-swarm lint <file>` | Check a dispatch's *Research grounding* against the sourcing standard: every finding needs an author, a year, and a resolvable identifier (arXiv / DOI / URL); vague "studies show…" claims are rejected. Exits `1` on violations, so it can gate CI. |
+`lint` is deterministic (zero model calls), so it is safe to run in CI. It enforces the **step-3 sourcing standard** locally; the model-based **step-4** verification still goes through [`roleos verify-citations`](https://github.com/mcp-tool-shop-org/role-os) → prism.
+## Why it works, in brief
+**Current**: the field moves fast, and demanding specific, dated studies keeps designs from shipping 18 months stale. **Functional**: the evidence shows what *fails*, not just what works (explanations can increase over-reliance on *wrong* AI; Bansal et al. 2021). **Safe**: the verifier-gated envelope is the architecture the evidence supports, and the protocol applies it to its own output. Sourcing isn't academic theater; it's the evidence trail.
+## Security
+`study-swarm` is a documentation repository: Markdown and a logo. It ships no executable code and installs nothing from this repository. It touches no data, requires no permissions, and collects no telemetry; there are no secrets or credentials in the source. The methodology *describes* a workflow that uses web retrieval and model-based verification, but this repository neither implements nor runs it. See [SECURITY.md](SECURITY.md).
+## Status
+A working protocol, externally checked by its own machinery: a different model family verifies its citations (see the proof above). This repository is the public reference; [PROTOCOL.md](PROTOCOL.md) is the executable form. Part of the [dogfood-lab](https://github.com/dogfood-lab) family: methods and showcases for building in the AI era.
+MIT licensed.
+---
+<p align="center"><sub>Part of the <a href="https://github.com/dogfood-lab">dogfood-lab</a> family — methods &amp; showcases for building in the AI era. Built by <a href="https://mcp-tool-shop.github.io/">MCP Tool Shop</a>.</sub></p>