npm - @stabgan/steelmind-mcp - Versions diffs - 2.0.0 - Mend

@stabgan/steelmind-mcp 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

package/LICENSE +21 -0
package/README.md +189 -0
package/dist/__tests__/descriptions.test.d.ts +1 -0
package/dist/__tests__/descriptions.test.js +98 -0
package/dist/__tests__/output-quality.test.d.ts +1 -0
package/dist/__tests__/output-quality.test.js +200 -0
package/dist/__tests__/server.test.d.ts +1 -0
package/dist/__tests__/server.test.js +279 -0
package/dist/cli.d.ts +2 -0
package/dist/cli.js +26 -0
package/dist/descriptions.d.ts +28 -0
package/dist/descriptions.js +81 -0
package/dist/index.d.ts +7 -0
package/dist/index.js +106 -0
package/package.json +71 -0

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025 stabgan
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,189 @@
+# Steelmind MCP — Structured Thinking & Verification for AI Agents
+[![npm version](https://img.shields.io/npm/v/@stabgan/steelmind-mcp)](https://www.npmjs.com/package/@stabgan/steelmind-mcp)
+[![Docker](https://img.shields.io/docker/v/stabgan/steelmind-mcp?label=docker)](https://hub.docker.com/r/stabgan/steelmind-mcp)
+[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+**The research-grounded reasoning MCP server for AI agents.** Combines step-by-step sequential thinking with steel-manning verification — backed by 43+ cognitive science and AI research papers.
+Steelmind gives your AI agent two tools:
+- **`think`** — Record structured reasoning steps with sequential decomposition. Embeds Socratic self-questioning and Polya's problem-solving method.
+- **`verify`** — Challenge conclusions with steel-manning before committing. Embeds dialectical evaluation from MetaCrit and SIEV research.
+The code is minimal; the descriptions do the heavy lifting. Per [Anthropic τ-bench research](https://www.anthropic.com/engineering/claude-think-tool), tool descriptions account for ~80% of the reasoning improvement.
+## Why Steelmind?
+| Feature                        | Think MCP | Sequential Thinking | **Steelmind** |
+| ------------------------------ | --------- | ------------------- | ------------- |
+| Step tracking                  | ✗         | ✓                   | ✓             |
+| Adjustable step count          | ✗         | ✓                   | ✓             |
+| Cognitive mode separation      | ✗         | ✗                   | ✓             |
+| Steel-manning verification     | ✗         | ✗                   | ✓             |
+| Socratic self-questioning      | ✗         | ✗                   | ✓             |
+| Research-grounded descriptions | ✗         | ✗                   | ✓             |
+| Verify nudge on completion     | ✗         | ✗                   | ✓             |
+| Tool count                     | 1         | 1                   | 2             |
+**Key research insight:** MetaCrit (arxiv 2507.15015) reports that separating reasoning generation from reasoning evaluation prevents self-bias and improves accuracy by up to 76%. Sequential Thinking uses one tool for both; Steelmind separates them.
+## Quick Start
+### npx (no install)
+```json
+{
+  "mcpServers": {
+    "steelmind": {
+      "command": "npx",
+      "args": ["-y", "@stabgan/steelmind-mcp"]
+    }
+  }
+}
+```
+### Docker
+```json
+{
+  "mcpServers": {
+    "steelmind": {
+      "command": "docker",
+      "args": ["run", "--rm", "-i", "stabgan/steelmind-mcp"]
+    }
+  }
+}
+```
+### npm global install
+```bash
+npm install -g @stabgan/steelmind-mcp
+```
+```json
+{
+  "mcpServers": {
+    "steelmind": {
+      "command": "steelmind-mcp"
+    }
+  }
+}
+```
+## How It Works
+### The `think` tool
+Records a structured reasoning step with sequential tracking.
+**Input:**
+```json
+{
+  "thought": "What are the dependencies? Need to check imports before refactoring.",
+  "thoughtNumber": 1,
+  "totalThoughts": 3,
+  "nextThoughtNeeded": true
+}
+```
+**Output (mid-sequence):**
+```
+[Thinking 1/3]
+What are the dependencies? Need to check imports before refactoring.
+```
+**Output (final step — includes verify nudge):**
+```
+[Thinking 3/3]
+My conclusion: use the adapter pattern for backward compatibility.
+---
+Thinking complete. Before acting on this conclusion, use the verify tool to challenge it.
+```
+The verify nudge appears in the tool result, not only in the description. Because a tool result arrives as fresh context at the moment the model picks its next action, the nudge is meant to make an actual `verify` call more likely than description text alone would.
+### The `verify` tool
+Challenges your reasoning with steel-manning before you commit.
+**Input:**
+```json
+{
+  "concern": "The adapter pattern adds complexity. Is the simpler approach actually better?"
+}
+```
+**Output:**
+```
+The adapter pattern adds complexity. Is the simpler approach actually better?
+```
+Pure identity function — returns your concern unchanged. The value is in the description, which prompts: _"Steel-man the opposition: What is the strongest argument that your conclusion is wrong?"_
+### The workflow
+```
+think(step 1/3) → think(step 2/3) → think(step 3/3) → [verify nudge] → verify → act
+                                          ↑
+                                  adjust totalThoughts if needed
+```
+## Research Foundation
+Steelmind's design is grounded in 43+ research papers. Key findings:
+| Paper                                        | Finding                                                  | How Steelmind Uses It                                    |
+| -------------------------------------------- | -------------------------------------------------------- | -------------------------------------------------------- |
+| **MetaCrit** (arxiv 2507.15015)              | Separating generation from evaluation prevents self-bias | Two separate tools: think (generate) + verify (evaluate) |
+| **Anthropic τ-bench**                        | Optimized tool descriptions yield 54% improvement        | Descriptions are the primary scaffold, not code          |
+| **Think2** (arxiv 2602.18806)                | Structured metacognition yields 3x self-correction       | Sequential step tracking + Socratic questioning          |
+| **SIEV** (ICML)                              | Models lose 40+ points under dialectical evaluation      | Steel-manning prompt in verify description               |
+| **Scaling TTC** (arxiv 2408.03314)           | Difficulty-adaptive compute improves efficiency 4x       | Adjustable totalThoughts                                 |
+| **EasyTool** (NAACL 2025)                    | Concise descriptions outperform verbose ones             | ~100-word descriptions (80-130 words, per tests)         |
+| **ToolACE**                                  | "When NOT to use" improves irrelevance detection 6→84%   | Negative guidance in both descriptions                   |
+| **Cognitive Foundations** (arxiv 2511.16660) | External scaffolding improves performance up to 72%      | Research-grounded cognitive frameworks                   |
+## Compatible Clients
+Works with any MCP-compatible client:
+- Claude Desktop / Claude Code
+- Cursor
+- Windsurf
+- Kiro
+- Cline
+- Any client supporting MCP stdio transport
+## Compatible Models
+Designed for frontier models but works across families:
+- Claude (Opus, Sonnet) — native MCP
+- GPT-5 / GPT-4o / o-series — via MCP adapters
+- Gemini — via MCP adapters
+- DeepSeek — via MCP adapters
+## Development
+```bash
+npm install          # Install dependencies
+npm run build        # Compile TypeScript
+npm test             # Run 90 tests
+npm run lint         # ESLint
+npm run format       # Prettier
+npm start            # Run the server
+```
+## License
+MIT
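
The README specifies the `think` and `verify` outputs only by example. Below is a minimal JavaScript sketch of formatting logic that reproduces those examples, assuming a single newline between the step prefix, the thought, the `---` separator, and the nudge; the exact whitespace used by `dist/index.js` (not shown in this diff) may differ. The names `formatThinkResult`, `formatVerifyResult`, and `toToolResult` are hypothetical, not the package's exports.

```javascript
// Hypothetical sketch reproducing the README's example outputs.
// Not the package's code: dist/index.js is not shown in this diff, and the
// exact separators it uses are an assumption here.

// Nudge text copied from the README's final-step example.
const VERIFY_NUDGE =
  'Thinking complete. Before acting on this conclusion, use the verify tool to challenge it.';

// Builds the think result text: "[Thinking N/M]" prefix, the thought verbatim,
// and the verify nudge only on the final step (nextThoughtNeeded === false).
function formatThinkResult({ thought, thoughtNumber, totalThoughts, nextThoughtNeeded }) {
  const parts = [`[Thinking ${thoughtNumber}/${totalThoughts}]`, thought];
  if (!nextThoughtNeeded) {
    parts.push('---', VERIFY_NUDGE);
  }
  return parts.join('\n');
}

// verify is an identity function: the concern comes back unchanged.
function formatVerifyResult({ concern }) {
  return concern;
}

// Wraps text in the single "text" content item that the tests expect.
function toToolResult(text) {
  return { content: [{ type: 'text', text }] };
}

// Walk through the README workflow, raising totalThoughts mid-sequence.
const steps = [
  { thought: 'What are the dependencies?', thoughtNumber: 1, totalThoughts: 2, nextThoughtNeeded: true },
  { thought: 'More complex than expected.', thoughtNumber: 2, totalThoughts: 3, nextThoughtNeeded: true },
  { thought: 'Use the adapter pattern.', thoughtNumber: 3, totalThoughts: 3, nextThoughtNeeded: false },
];
for (const step of steps) {
  console.log(toToolResult(formatThinkResult(step)).content[0].text);
}
console.log(formatVerifyResult({ concern: 'Is the simpler approach better?' }));
```

Only the third `think` step prints the nudge, because it is the only one with `nextThoughtNeeded: false`. Keeping both handlers as pure functions of their arguments is also consistent with the concurrency tests, which expect 20 parallel calls to each return their own input.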

package/dist/__tests__/descriptions.test.d.ts ADDED

@@ -0,0 +1 @@
+export {};

package/dist/__tests__/descriptions.test.js ADDED

@@ -0,0 +1,98 @@
+import { describe, it, expect } from 'vitest';
+import { THINK_DESCRIPTION, VERIFY_DESCRIPTION, SYSTEM_PROMPT } from '../descriptions.js';
+function wordCount(text) {
+    return text.split(/\s+/).filter(Boolean).length;
+}
+describe('THINK_DESCRIPTION', () => {
+    it('is between 80-130 words', () => {
+        const count = wordCount(THINK_DESCRIPTION);
+        expect(count).toBeGreaterThanOrEqual(80);
+        expect(count).toBeLessThanOrEqual(130);
+    });
+    it('leads with purpose (primacy effect)', () => {
+        expect(THINK_DESCRIPTION).toMatch(/^Use this tool to/);
+    });
+    it('includes "when NOT to use" guidance (ToolACE)', () => {
+        expect(THINK_DESCRIPTION).toMatch(/Do NOT use/i);
+    });
+    it('embeds Socratic self-questioning', () => {
+        expect(THINK_DESCRIPTION).toMatch(/What am I assuming/);
+        expect(THINK_DESCRIPTION).toMatch(/What evidence supports/);
+    });
+    it('does NOT instruct to "think harder" (DC-4)', () => {
+        expect(THINK_DESCRIPTION.toLowerCase()).not.toMatch(/think harder/);
+    });
+    it('frames as recording, not commanding', () => {
+        expect(THINK_DESCRIPTION).toMatch(/record/i);
+    });
+    it('mentions no state change', () => {
+        expect(THINK_DESCRIPTION).toMatch(/not.*change any state/i);
+    });
+    it('references adjustable totalThoughts', () => {
+        expect(THINK_DESCRIPTION).toMatch(/totalThoughts/);
+    });
+    it('cross-references verify tool', () => {
+        expect(THINK_DESCRIPTION).toMatch(/verify tool/i);
+    });
+    it('mentions nextThoughtNeeded transition', () => {
+        expect(THINK_DESCRIPTION).toMatch(/nextThoughtNeeded/);
+    });
+});
+describe('VERIFY_DESCRIPTION', () => {
+    it('is between 80-130 words', () => {
+        const count = wordCount(VERIFY_DESCRIPTION);
+        expect(count).toBeGreaterThanOrEqual(80);
+        expect(count).toBeLessThanOrEqual(130);
+    });
+    it('leads with purpose (primacy effect)', () => {
+        expect(VERIFY_DESCRIPTION).toMatch(/^Use this tool to/);
+    });
+    it('includes "when NOT to use" guidance (ToolACE)', () => {
+        expect(VERIFY_DESCRIPTION).toMatch(/Do NOT use/i);
+    });
+    it('embeds steel-manning prompt (SIEV, MetaCrit)', () => {
+        expect(VERIFY_DESCRIPTION).toMatch(/steel-man/i);
+        expect(VERIFY_DESCRIPTION).toMatch(/strongest argument/i);
+    });
+    it('does NOT instruct to "think harder" (DC-4)', () => {
+        expect(VERIFY_DESCRIPTION.toLowerCase()).not.toMatch(/think harder/);
+    });
+    it('mentions no state change', () => {
+        expect(VERIFY_DESCRIPTION).toMatch(/not.*change any state/i);
+    });
+    it('cross-references think tool', () => {
+        expect(VERIFY_DESCRIPTION).toMatch(/think tool/i);
+    });
+});
+describe('SYSTEM_PROMPT', () => {
+    it('is under 500 tokens (~375 words)', () => {
+        expect(wordCount(SYSTEM_PROMPT)).toBeLessThanOrEqual(375);
+    });
+    it('contains workflow guidance for both tools', () => {
+        expect(SYSTEM_PROMPT).toMatch(/think tool/i);
+        expect(SYSTEM_PROMPT).toMatch(/verify tool/i);
+    });
+    it('contains think, verify, and bad examples', () => {
+        expect(SYSTEM_PROMPT).toMatch(/think_example/);
+        expect(SYSTEM_PROMPT).toMatch(/verify_example/);
+        expect(SYSTEM_PROMPT).toMatch(/bad_example/);
+    });
+    it('bad example explains WHY it is bad', () => {
+        expect(SYSTEM_PROMPT).toMatch(/adds tokens without insight/i);
+    });
+    it('mentions totalThoughts adjustment', () => {
+        expect(SYSTEM_PROMPT).toMatch(/totalThoughts/i);
+    });
+    it('mentions nextThoughtNeeded transition to verify', () => {
+        expect(SYSTEM_PROMPT).toMatch(/nextThoughtNeeded.*false/i);
+    });
+});
+describe('description separation (DC-3)', () => {
+    it('think and verify descriptions are different strings', () => {
+        expect(THINK_DESCRIPTION).not.toBe(VERIFY_DESCRIPTION);
+    });
+    it('think uses "thought" terminology, verify uses "concern"', () => {
+        expect(THINK_DESCRIPTION).toMatch(/thought/i);
+        expect(VERIFY_DESCRIPTION).toMatch(/concern|self-assessment/i);
+    });
+});
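
The assertions in `descriptions.test.js` amount to a checklist for a tool description. A minimal sketch that gathers the checks shared by `THINK_DESCRIPTION` and `VERIFY_DESCRIPTION` into one reusable function might look like this; `checkToolDescription` is a hypothetical helper, not something the package exports, and the tool-specific checks (Socratic questions, steel-manning, cross-references) are left out.

```javascript
// Hypothetical lint helper mirroring the shared assertions in
// descriptions.test.js. Not part of the package.

// Same splitting rule as the test file: split on whitespace runs, drop empties.
function wordCount(text) {
  return text.split(/\s+/).filter(Boolean).length;
}

// Returns a list of human-readable problems; an empty list means the
// description passes every shared check.
function checkToolDescription(text, { minWords = 80, maxWords = 130 } = {}) {
  const problems = [];
  const count = wordCount(text);
  if (count < minWords || count > maxWords) {
    problems.push(`word count ${count} is outside ${minWords}-${maxWords}`);
  }
  if (!/^Use this tool to/.test(text)) {
    problems.push('does not lead with "Use this tool to" (primacy)');
  }
  if (!/Do NOT use/i.test(text)) {
    problems.push('missing "Do NOT use" guidance');
  }
  if (text.toLowerCase().includes('think harder')) {
    problems.push('instructs the model to "think harder"');
  }
  if (!/not.*change any state/i.test(text)) {
    problems.push('does not say the tool changes no state');
  }
  return problems;
}

console.log(checkToolDescription('Use this tool to record a thought.'));
```

The one-sentence draft in the usage line fails three checks: it is too short, lacks "Do NOT use" guidance, and never says the tool changes no state.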

package/dist/__tests__/output-quality.test.d.ts ADDED

@@ -0,0 +1 @@
+export {};

package/dist/__tests__/output-quality.test.js ADDED

@@ -0,0 +1,200 @@
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import { Client } from '@modelcontextprotocol/sdk/client/index.js';
+import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';
+import { createServer } from '../index.js';
+import { SYSTEM_PROMPT } from '../descriptions.js';
+describe('Output Quality', () => {
+    let client;
+    let cleanup;
+    async function connect() {
+        const server = createServer();
+        const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
+        await server.connect(serverTransport);
+        client = new Client({ name: 'test-client', version: '1.0.0' });
+        await client.connect(clientTransport);
+        cleanup = async () => {
+            await client.close();
+            await server.close();
+        };
+    }
+    beforeEach(async () => {
+        await connect();
+    });
+    afterEach(async () => {
+        await cleanup();
+    });
+    function extractText(result) {
+        return result.content[0].text;
+    }
+    function thinkArgs(thought, num = 1, total = 1, next = true) {
+        return { thought, thoughtNumber: num, totalThoughts: total, nextThoughtNeeded: next };
+    }
+    // ─── 1. Think Output Structure ─────────────────────────────────────
+    describe('think output structure', () => {
+        it('includes step prefix [Thinking N/M]', async () => {
+            const text = extractText(await client.callTool({ name: 'think', arguments: thinkArgs('test', 2, 5) }));
+            expect(text).toMatch(/^\[Thinking 2\/5\]/);
+        });
+        it('includes thought content after prefix', async () => {
+            const text = extractText(await client.callTool({ name: 'think', arguments: thinkArgs('my analysis') }));
+            expect(text).toContain('my analysis');
+        });
+        it('includes verify nudge when nextThoughtNeeded=false', async () => {
+            const text = extractText(await client.callTool({
+                name: 'think',
+                arguments: thinkArgs('final conclusion', 3, 3, false),
+            }));
+            expect(text).toMatch(/verify tool/i);
+            expect(text).toContain('final conclusion');
+        });
+        it('does NOT include verify nudge when nextThoughtNeeded=true', async () => {
+            const text = extractText(await client.callTool({ name: 'think', arguments: thinkArgs('still working', 1, 3) }));
+            expect(text).not.toMatch(/verify tool/i);
+        });
+        it('returns exactly one content item', async () => {
+            const result = await client.callTool({
+                name: 'think',
+                arguments: thinkArgs('test'),
+            });
+            expect(result.content).toHaveLength(1);
+        });
+        it('content item has type "text"', async () => {
+            const result = await client.callTool({
+                name: 'think',
+                arguments: thinkArgs('test'),
+            });
+            expect(result.content[0].type).toBe('text');
+        });
+    });
+    // ─── 2. Think Identity Preservation ────────────────────────────────
+    describe('think identity preservation', () => {
+        const cases = [
+            ['plain ASCII', 'The quick brown fox jumps over the lazy dog.'],
+            ['JSON string', '{"key": "value", "nested": {"arr": [1, 2, 3]}}'],
+            ['markdown', '# Heading\n\n- bullet\n- **bold**\n\n```code```'],
+            ['HTML/XML tags', '<div class="test">content &amp; more</div>'],
+            ['backslashes', 'path\\to\\file and regex \\d+\\.\\d+'],
+            ['quotes', 'she said "hello" and \'goodbye\''],
+            ['newlines and tabs', 'line1\nline2\n\ttabbed\r\nwindows-style'],
+            ['emoji sequences', '👨‍👩‍👧‍👦 family, 🏳️‍🌈 flag, 👋🏽 skin tone'],
+            ['CJK characters', '思考する verify 検証 확인하다 验证'],
+            ['Arabic RTL', 'التحقق من الاستدلال'],
+            ['combining diacritics', 'e\u0301 vs é — n\u0303 vs ñ'],
+            ['zero-width chars', 'zero\u200Bwidth\u200Cjoiner\u200Dhere\uFEFFbom'],
+            ['surrogate pairs', '𝕳𝖊𝖑𝖑𝖔 𝕎𝕠𝕣𝕝𝕕 — 𝄞 music'],
+            ['very long', 'a'.repeat(50_000)],
+            ['only whitespace', '   \t\t\n\n   '],
+            ['single character', 'x'],
+        ];
+        for (const [label, input] of cases) {
+            it(`preserves: ${label}`, async () => {
+                const text = extractText(await client.callTool({ name: 'think', arguments: thinkArgs(input) }));
+                expect(text).toContain(input);
+            });
+        }
+    });
+    // ─── 3. Verify Identity Preservation ───────────────────────────────
+    describe('verify identity preservation', () => {
+        const cases = [
+            ['plain ASCII', 'Is my assumption about thread safety correct?'],
+            ['JSON payload', '{"error": null, "data": [true, false]}'],
+            ['code snippet', 'if (x === null) throw new Error("unexpected null");'],
+            ['multiline', 'Line 1: assumption\nLine 2: evidence\nLine 3: conclusion'],
+            ['emoji + unicode', '⚠️ Edge case: 空の入力 → crash?'],
+            ['SQL injection', "'; DROP TABLE users; --"],
+            ['regex special chars', '^(?:foo|bar)\\b.*?\\d{3,}$'],
+            ['URL with params', 'https://example.com/api?key=val&other=123#fragment'],
+        ];
+        for (const [label, input] of cases) {
+            it(`preserves: ${label}`, async () => {
+                const text = extractText(await client.callTool({ name: 'verify', arguments: { concern: input } }));
+                expect(text).toBe(input);
+            });
+        }
+    });
+    // ─── 4. Verify Output Minimality ───────────────────────────────────
+    describe('verify output minimality', () => {
+        it('verify output is exactly the input string', async () => {
+            const input = 'My critical self-assessment';
+            const text = extractText(await client.callTool({ name: 'verify', arguments: { concern: input } }));
+            expect(text).toBe(input);
+            expect(text.length).toBe(input.length);
+        });
+        it('verify does not add timestamps or metadata', async () => {
+            const input = 'no metadata please';
+            const text = extractText(await client.callTool({ name: 'verify', arguments: { concern: input } }));
+            expect(text).toBe(input);
+        });
+    });
+    // ─── 5. Prompt Output Quality ──────────────────────────────────────
+    describe('prompt output quality', () => {
+        it('returns exactly one message with role "user"', async () => {
+            const result = await client.getPrompt({ name: 'steelmind' });
+            expect(result.messages).toHaveLength(1);
+            expect(result.messages[0].role).toBe('user');
+        });
+        it('message text matches SYSTEM_PROMPT exactly', async () => {
+            const result = await client.getPrompt({ name: 'steelmind' });
+            expect(result.messages[0].content.text).toBe(SYSTEM_PROMPT);
+        });
+        it('contains structured XML sections for compaction survivability', async () => {
+            const result = await client.getPrompt({ name: 'steelmind' });
+            const text = result.messages[0].content.text;
+            expect(text).toContain('<think_example>');
+            expect(text).toContain('</think_example>');
+            expect(text).toContain('<verify_example>');
+            expect(text).toContain('</verify_example>');
+            expect(text).toContain('<bad_example>');
+            expect(text).toContain('</bad_example>');
+        });
+        it('uses bullet points for compaction survivability', async () => {
+            const result = await client.getPrompt({ name: 'steelmind' });
+            const text = result.messages[0].content.text;
+            const bulletCount = (text.match(/^- /gm) || []).length;
+            expect(bulletCount).toBeGreaterThanOrEqual(4);
+        });
+    });
+    // ─── 6. Concurrent Correctness ─────────────────────────────────────
+    describe('concurrent correctness', () => {
+        it('parallel think calls return their own input', async () => {
+            const inputs = Array.from({ length: 20 }, (_, i) => `thought-${i}-${Math.random()}`);
+            const results = await Promise.all(inputs.map((thought, i) => client.callTool({
+                name: 'think',
+                arguments: thinkArgs(thought, i + 1, 20),
+            })));
+            for (let i = 0; i < inputs.length; i++) {
+                expect(extractText(results[i])).toContain(inputs[i]);
+            }
+        });
+        it('parallel verify calls return their own input', async () => {
+            const inputs = Array.from({ length: 20 }, (_, i) => `concern-${i}-${Math.random()}`);
+            const results = await Promise.all(inputs.map((concern) => client.callTool({ name: 'verify', arguments: { concern } })));
+            for (let i = 0; i < inputs.length; i++) {
+                expect(extractText(results[i])).toBe(inputs[i]);
+            }
+        });
+    });
+    // ─── 7. Tool Listing Output Quality ────────────────────────────────
+    describe('tool listing output quality', () => {
+        it('each tool has a non-empty description', async () => {
+            const { tools } = await client.listTools();
+            for (const tool of tools) {
+                expect(tool.description).toBeTruthy();
+                expect(tool.description.length).toBeGreaterThan(50);
+            }
+        });
+        it('think requires 4 fields, verify requires 1 (different cognitive modes)', async () => {
+            const { tools } = await client.listTools();
+            const think = tools.find((t) => t.name === 'think');
+            const verify = tools.find((t) => t.name === 'verify');
+            expect(think.inputSchema.required).toHaveLength(4);
+            expect(verify.inputSchema.required).toHaveLength(1);
+        });
+        it('no tool has outputSchema', async () => {
+            const { tools } = await client.listTools();
+            for (const tool of tools) {
+                expect(tool.outputSchema).toBeUndefined();
+            }
+        });
+    });
+});
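
The listing tests pin down each tool's `required` fields (four for `think`, one for `verify`) but not the property types. A minimal sketch of matching schemas and a tiny argument checker follows; the property types are inferred from the README's example input and are assumptions, and `THINK_INPUT_SCHEMA`, `VERIFY_INPUT_SCHEMA`, and `checkArgs` are hypothetical names. The checker handles only these flat schemas and is not a general JSON Schema validator.

```javascript
// Hypothetical input schemas matching the required-field lists asserted in
// the tests. Property types are assumptions inferred from the README.
const THINK_INPUT_SCHEMA = {
  type: 'object',
  properties: {
    thought: { type: 'string' },
    thoughtNumber: { type: 'integer' },
    totalThoughts: { type: 'integer' },
    nextThoughtNeeded: { type: 'boolean' },
  },
  required: ['thought', 'thoughtNumber', 'totalThoughts', 'nextThoughtNeeded'],
};

// verify takes a "concern", deliberately not a "thought" (separate cognitive mode).
const VERIFY_INPUT_SCHEMA = {
  type: 'object',
  properties: { concern: { type: 'string' } },
  required: ['concern'],
};

// Checks required fields and primitive types for a flat schema.
// Fields not listed in the schema are ignored in this sketch.
function checkArgs(schema, args) {
  const errors = [];
  for (const key of schema.required) {
    if (!Object.prototype.hasOwnProperty.call(args, key)) {
      errors.push(`missing required field: ${key}`);
    }
  }
  for (const [key, spec] of Object.entries(schema.properties)) {
    if (!Object.prototype.hasOwnProperty.call(args, key)) continue;
    const value = args[key];
    const ok = spec.type === 'integer' ? Number.isInteger(value) : typeof value === spec.type;
    if (!ok) errors.push(`${key} should be ${spec.type}`);
  }
  return errors;
}

console.log(checkArgs(VERIFY_INPUT_SCHEMA, { concern: '' }));
console.log(checkArgs(VERIFY_INPUT_SCHEMA, { thought: 'wrong field' }));
```

Accepting the empty string for `concern` matches the server test that expects `verify` to echo `''`, while passing `thought` to `verify` is reported as a missing `concern`.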

package/dist/__tests__/server.test.d.ts ADDED

@@ -0,0 +1 @@
+export {};

package/dist/__tests__/server.test.js ADDED

@@ -0,0 +1,279 @@
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import { Client } from '@modelcontextprotocol/sdk/client/index.js';
+import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';
+import { createServer } from '../index.js';
+import { THINK_DESCRIPTION, VERIFY_DESCRIPTION, SYSTEM_PROMPT } from '../descriptions.js';
+describe('Steelmind MCP Server', () => {
+    let client;
+    let cleanup;
+    async function connect() {
+        const server = createServer();
+        const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
+        await server.connect(serverTransport);
+        client = new Client({ name: 'test-client', version: '1.0.0' });
+        await client.connect(clientTransport);
+        cleanup = async () => {
+            await client.close();
+            await server.close();
+        };
+    }
+    beforeEach(async () => {
+        await connect();
+    });
+    afterEach(async () => {
+        await cleanup();
+    });
+    // ─── Server Identity ───────────────────────────────────────────────
+    describe('server identity', () => {
+        it('reports correct name and version', () => {
+            const info = client.getServerVersion();
+            expect(info?.name).toBe('steelmind-mcp');
+            expect(info?.version).toBe('2.0.0');
+        });
+        it('advertises tools and prompts capabilities', () => {
+            const caps = client.getServerCapabilities();
+            expect(caps?.tools).toBeDefined();
+            expect(caps?.prompts).toBeDefined();
+        });
+    });
+    // ─── Tool Listing ──────────────────────────────────────────────────
+    describe('listTools', () => {
+        it('exposes exactly 2 tools', async () => {
+            const { tools } = await client.listTools();
+            expect(tools).toHaveLength(2);
+        });
+        it('exposes think and verify by name', async () => {
+            const { tools } = await client.listTools();
+            const names = tools.map((t) => t.name);
+            expect(names).toContain('think');
+            expect(names).toContain('verify');
+        });
+        it('think tool has correct input schema with step tracking', async () => {
+            const { tools } = await client.listTools();
+            const think = tools.find((t) => t.name === 'think');
+            expect(think.inputSchema.type).toBe('object');
+            expect(think.inputSchema.required).toEqual([
+                'thought',
+                'thoughtNumber',
+                'totalThoughts',
+                'nextThoughtNeeded',
+            ]);
+            expect(think.inputSchema.properties).toHaveProperty('thought');
+            expect(think.inputSchema.properties).toHaveProperty('thoughtNumber');
+            expect(think.inputSchema.properties).toHaveProperty('totalThoughts');
+            expect(think.inputSchema.properties).toHaveProperty('nextThoughtNeeded');
+        });
+        it('verify tool has concern field (not thought)', async () => {
+            const { tools } = await client.listTools();
+            const verify = tools.find((t) => t.name === 'verify');
+            expect(verify.inputSchema.required).toEqual(['concern']);
+            expect(verify.inputSchema.properties).toHaveProperty('concern');
+            expect(verify.inputSchema.properties).not.toHaveProperty('thought');
+        });
+        it('think description matches descriptions.ts export', async () => {
+            const { tools } = await client.listTools();
+            const think = tools.find((t) => t.name === 'think');
+            expect(think.description).toBe(THINK_DESCRIPTION);
+        });
+        it('verify description matches descriptions.ts export', async () => {
+            const { tools } = await client.listTools();
+            const verify = tools.find((t) => t.name === 'verify');
+            expect(verify.description).toBe(VERIFY_DESCRIPTION);
+        });
+    });
+    // ─── Think Tool ────────────────────────────────────────────────────
+    describe('think tool (structured output)', () => {
+        it('returns thought with step prefix when nextThoughtNeeded=true', async () => {
+            const result = await client.callTool({
+                name: 'think',
+                arguments: {
+                    thought: 'What assumptions am I making here?',
+                    thoughtNumber: 1,
+                    totalThoughts: 3,
+                    nextThoughtNeeded: true,
+                },
+            });
+            const text = result.content[0].text;
+            expect(text).toContain('[Thinking 1/3]');
+            expect(text).toContain('What assumptions am I making here?');
+        });
+        it('includes verify nudge when nextThoughtNeeded=false', async () => {
+            const result = await client.callTool({
+                name: 'think',
+                arguments: {
+                    thought: 'My conclusion is X.',
+                    thoughtNumber: 3,
+                    totalThoughts: 3,
+                    nextThoughtNeeded: false,
+                },
+            });
+            const text = result.content[0].text;
+            expect(text).toContain('[Thinking 3/3]');
+            expect(text).toContain('My conclusion is X.');
+            expect(text).toMatch(/verify tool/i);
+        });
+        it('does NOT include verify nudge when nextThoughtNeeded=true', async () => {
+            const result = await client.callTool({
+                name: 'think',
+                arguments: {
+                    thought: 'Still thinking...',
+                    thoughtNumber: 1,
+                    totalThoughts: 5,
+                    nextThoughtNeeded: true,
+                },
+            });
+            const text = result.content[0].text;
+            expect(text).not.toMatch(/verify tool/i);
+        });
+        it('preserves unicode and special characters in thought', async () => {
+            const input = '思考: émojis 🧠 & "quotes" <tags> \n\tnewlines';
+            const result = await client.callTool({
+                name: 'think',
+                arguments: {
+                    thought: input,
+                    thoughtNumber: 1,
+                    totalThoughts: 1,
+                    nextThoughtNeeded: true,
+                },
+            });
+            const text = result.content[0].text;
+            expect(text).toContain(input);
+        });
+        it('handles very long input', async () => {
+            const longThought = 'x'.repeat(100_000);
+            const result = await client.callTool({
+                name: 'think',
+                arguments: {
+                    thought: longThought,
+                    thoughtNumber: 1,
+                    totalThoughts: 1,
+                    nextThoughtNeeded: true,
+                },
+            });
+            const text = result.content[0].text;
+            expect(text).toContain(longThought);
+        });
+        it('is not marked as an error', async () => {
+            const result = await client.callTool({
+                name: 'think',
+                arguments: {
+                    thought: 'test',
+                    thoughtNumber: 1,
+                    totalThoughts: 1,
+                    nextThoughtNeeded: false,
+                },
+            });
+            expect(result.isError).toBeFalsy();
+        });
+        it('supports adjustable totalThoughts', async () => {
+            const r1 = await client.callTool({
+                name: 'think',
+                arguments: {
+                    thought: 'initial plan',
+                    thoughtNumber: 1,
+                    totalThoughts: 3,
+                    nextThoughtNeeded: true,
+                },
+            });
+            expect(r1.content[0].text).toContain('[Thinking 1/3]');
+            const r2 = await client.callTool({
+                name: 'think',
+                arguments: {
+                    thought: 'this is more complex than expected',
+                    thoughtNumber: 2,
+                    totalThoughts: 7,
+                    nextThoughtNeeded: true,
+                },
+            });
+            expect(r2.content[0].text).toContain('[Thinking 2/7]');
+        });
+    });
+    // ─── Verify Tool ───────────────────────────────────────────────────
+    describe('verify tool (identity function)', () => {
+        it('returns the concern unchanged', async () => {
+            const result = await client.callTool({
+                name: 'verify',
+                arguments: { concern: 'Am I sure the recursive approach is safe?' },
+            });
+            expect(result.content).toEqual([
+                { type: 'text', text: 'Am I sure the recursive approach is safe?' },
+            ]);
+        });
+        it('handles empty string', async () => {
+            const result = await client.callTool({
+                name: 'verify',
+                arguments: { concern: '' },
+            });
+            expect(result.content).toEqual([{ type: 'text', text: '' }]);
+        });
+        it('preserves unicode and special characters', async () => {
+            const input = 'Vérification: 検証 🔍 & "edge cases" <xml/>';
+            const result = await client.callTool({
+                name: 'verify',
+                arguments: { concern: input },
+            });
+            expect(result.content).toEqual([{ type: 'text', text: input }]);
+        });
+        it('is not marked as an error', async () => {
+            const result = await client.callTool({
+                name: 'verify',
+                arguments: { concern: 'test' },
+            });
+            expect(result.isError).toBeFalsy();
+        });
+    });
+    // ─── Unknown Tool ──────────────────────────────────────────────────
+    describe('unknown tool', () => {
+        it('throws for unknown tool name', async () => {
+            await expect(client.callTool({ name: 'nonexistent', arguments: {} })).rejects.toThrow();
+        });
+    });
+    // ─── Prompts ───────────────────────────────────────────────────────
+    describe('prompts', () => {
+        it('exposes exactly 1 prompt named steelmind', async () => {
+            const { prompts } = await client.listPrompts();
+            expect(prompts).toHaveLength(1);
+            expect(prompts[0].name).toBe('steelmind');
+        });
+        it('returns the system prompt content', async () => {
+            const result = await client.getPrompt({ name: 'steelmind' });
+            expect(result.messages).toHaveLength(1);
+            expect(result.messages[0].role).toBe('user');
+            expect(result.messages[0].content).toEqual({ type: 'text', text: SYSTEM_PROMPT });
+        });
+        it('throws for unknown prompt name', async () => {
+            await expect(client.getPrompt({ name: 'nonexistent' })).rejects.toThrow();
+        });
+    });
+    // ─── Statelessness ─────────────────────────────────────────────────
+    describe('statelessness', () => {
+        it('interleaved think and verify calls are independent', async () => {
+            const t1 = await client.callTool({
+                name: 'think',
+                arguments: {
+                    thought: 'plan approach',
+                    thoughtNumber: 1,
+                    totalThoughts: 2,
+                    nextThoughtNeeded: true,
+                },
+            });
+            const v1 = await client.callTool({
+                name: 'verify',
+                arguments: { concern: 'is this safe?' },
+            });
+            const t2 = await client.callTool({
+                name: 'think',
+                arguments: {
+                    thought: 'revised plan',
+                    thoughtNumber: 2,
+                    totalThoughts: 2,
+                    nextThoughtNeeded: false,
+                },
+            });
+            expect(t1.content[0].text).toContain('plan approach');
+            expect(v1.content).toEqual([{ type: 'text', text: 'is this safe?' }]);
+            expect(t2.content[0].text).toContain('revised plan');
+            expect(t2.content[0].text).toMatch(/verify tool/i);
+        });
+    });
+});

package/dist/cli.d.ts ADDED

@@ -0,0 +1,2 @@
+#!/usr/bin/env node
+export {};

package/dist/cli.js ADDED

@@ -0,0 +1,26 @@
+#!/usr/bin/env node
+import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
+import { createServer } from './index.js';
+process.on('uncaughtException', (err) => {
+    console.error('[Fatal]', err);
+    process.exit(1);
+});
+process.on('unhandledRejection', (err) => {
+    console.error('[Fatal]', err);
+    process.exit(1);
+});
+const server = createServer();
+process.on('SIGINT', async () => {
+    await server.close();
+    process.exit(0);
+});
+const transport = new StdioServerTransport();
+server
+    .connect(transport)
+    .then(() => {
+    console.error('Steelmind MCP server running — think + verify');
+})
+    .catch((err) => {
+    console.error('[Fatal] Server failed to start:', err);
+    process.exit(1);
+});
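Two details of `cli.js` matter for a stdio server. First, every diagnostic goes through `console.error`, because stdout carries the JSON-RPC stream. Second, the `SIGINT` handler awaits `server.close()` before it exits. The sketch below isolates that shutdown wiring and runs it against a fake process object, so no SDK is needed. Several names here are assumptions for illustration and are not part of the package: `installShutdown`, `fakeProcess`, and the extra `SIGTERM` registration. The published `cli.js` handles only `SIGINT`.

```javascript
// Hypothetical helper: registers graceful-shutdown handlers on a process-like object.
// `proc` needs on(event, fn) and exit(code); `server` needs an async close().
function installShutdown(server, proc, signals = ['SIGINT', 'SIGTERM']) {
  for (const signal of signals) {
    proc.on(signal, async () => {
      try {
        await server.close(); // wait for close() to settle before exiting
        proc.exit(0);
      } catch (err) {
        console.error('[Fatal] close failed:', err); // stderr only: stdout is the protocol channel
        proc.exit(1);
      }
    });
  }
}

// Minimal fake process that records handlers and exit codes instead of exiting.
function fakeProcess() {
  const handlers = {};
  return {
    exitCodes: [],
    on(event, fn) { (handlers[event] ||= []).push(fn); },
    async emit(event) { for (const fn of handlers[event] || []) await fn(); },
    exit(code) { this.exitCodes.push(code); },
  };
}

const events = [];
const server = { close: async () => { events.push('closed'); } };
const proc = fakeProcess();
installShutdown(server, proc);

proc.emit('SIGTERM').then(() => {
  console.log(events, proc.exitCodes); // [ 'closed' ] [ 0 ]
});
```

Injecting `proc` rather than touching the global `process` keeps the handler testable without actually terminating the test runner.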

package/dist/descriptions.d.ts ADDED

@@ -0,0 +1,28 @@
+/**
+ * Steelmind MCP: Tool Descriptions and System Prompt
+ *
+ * These descriptions are the primary cognitive scaffold — they account for
+ * ~80% of the performance improvement (per Anthropic τ-bench research).
+ *
+ * Design principles applied:
+ * 1. Purpose first (primacy effect — ACL 2025)
+ * 2. When NOT to use (ToolACE: 6.99% → 83.81% irrelevance detection)
+ * 3. Socratic self-questioning in think (Chang 2023, Princeton SocraticAI)
+ * 4. Steel-manning in verify (SIEV ICML, MetaCrit)
+ * 5. Conciseness ~100 words (EasyTool: 70-97% token reduction → better perf)
+ * 6. No "think harder" instructions (OpenAI: degrades reasoning models)
+ * 7. Contrastive example in system prompt (Contrastive CoT, ACL 2024)
+ * 8. Sequential decomposition (Scaling TTC: 4x efficiency with adaptive compute)
+ * 9. Cognitive mode separation (MetaCrit: separating generation from evaluation)
+ *
+ * Key references:
+ * - Anthropic τ-bench: 54% improvement from optimized prompting
+ * - MetaCrit (2507.15015v3): separating generation from evaluation
+ * - Think2 (2602.18806v1): 3x self-correction with structured phases
+ * - SIEV (2510.18134, ICML): models lose 40+ pts under dialectical eval
+ * - Cognitive Foundations (2511.16660): scaffolding improves perf up to 72%
+ * - Scaling TTC (2408.03314): difficulty-adaptive compute improves efficiency >4x
+ */
+export declare const THINK_DESCRIPTION: string;
+export declare const VERIFY_DESCRIPTION: string;
+export declare const SYSTEM_PROMPT = "## Using the think and verify tools\n\nBefore taking any action or responding after receiving tool results, use the think tool as a scratchpad to:\n- List the specific rules or constraints that apply\n- Check if all required information is collected\n- Verify your planned action makes sense\n- Identify what could go wrong\n- Adjust totalThoughts as your understanding deepens\n\nAfter your final thinking step (nextThoughtNeeded: false), use the verify tool to:\n- Steel-man the opposing view: build the strongest case against your conclusion\n- Check assumptions you haven't validated\n- Look for edge cases you may have missed\n\n<think_example>\nUser wants to cancel order #1234\n- thoughtNumber: 1, totalThoughts: 3\n- Need to verify: order status, cancellation policy, refund eligibility\n- Check: is order already shipped? If so, different policy applies\n- Missing info: haven't confirmed the user's identity yet\n- Risk: cancelling a shipped order requires return logistics\n\u2192 Plan: verify identity first, then check order status, then apply correct policy\n</think_example>\n\n<verify_example>\nI concluded we should use recursive approach over iterative\n- Steel-man for iterative: simpler to debug, no stack overflow risk, same time complexity\n- My assumption that recursion is \"cleaner\" is subjective, not evidence-based\n- Edge case: what if input size exceeds stack depth? Recursion fails silently\n\u2192 Reconsider: iterative is safer for unknown input sizes\n</verify_example>\n\n<bad_example>\n\"Let me think about this. The user wants X. I should do Y. Yes, Y seems right.\"\nThis adds tokens without insight. Be specific: what rules apply? What could go wrong? What are you assuming?\n</bad_example>";

package/dist/descriptions.js ADDED

@@ -0,0 +1,81 @@
+/**
+ * Steelmind MCP: Tool Descriptions and System Prompt
+ *
+ * These descriptions are the primary cognitive scaffold — they account for
+ * ~80% of the performance improvement (per Anthropic τ-bench research).
+ *
+ * Design principles applied:
+ * 1. Purpose first (primacy effect — ACL 2025)
+ * 2. When NOT to use (ToolACE: 6.99% → 83.81% irrelevance detection)
+ * 3. Socratic self-questioning in think (Chang 2023, Princeton SocraticAI)
+ * 4. Steel-manning in verify (SIEV ICML, MetaCrit)
+ * 5. Conciseness ~100 words (EasyTool: 70-97% token reduction → better perf)
+ * 6. No "think harder" instructions (OpenAI: degrades reasoning models)
+ * 7. Contrastive example in system prompt (Contrastive CoT, ACL 2024)
+ * 8. Sequential decomposition (Scaling TTC: 4x efficiency with adaptive compute)
+ * 9. Cognitive mode separation (MetaCrit: separating generation from evaluation)
+ *
+ * Key references:
+ * - Anthropic τ-bench: 54% improvement from optimized prompting
+ * - MetaCrit (2507.15015v3): separating generation from evaluation
+ * - Think2 (2602.18806v1): 3x self-correction with structured phases
+ * - SIEV (2510.18134, ICML): models lose 40+ pts under dialectical eval
+ * - Cognitive Foundations (2511.16660): scaffolding improves perf up to 72%
+ * - Scaling TTC (2408.03314): difficulty-adaptive compute improves efficiency >4x
+ */
+export const THINK_DESCRIPTION = 'Use this tool to record a structured reasoning step. It will not obtain ' +
+    'new information or change any state — it appends your thought to the log. ' +
+    'Use it when you need to: process results from previous tool calls before ' +
+    'acting, plan your approach to a multi-step task, analyze a complex ' +
+    'situation before deciding, or navigate environments with detailed policies. ' +
+    'Do NOT use for simple single-step tasks or restating without analysis. ' +
+    'When thinking, ask yourself: What am I assuming? What evidence supports ' +
+    "this? What's my plan, and what could go wrong? " +
+    'You can adjust totalThoughts up or down as your understanding deepens. ' +
+    'When you set nextThoughtNeeded to false, use the verify tool to challenge ' +
+    'your conclusion before acting.';
+export const VERIFY_DESCRIPTION = 'Use this tool to challenge and evaluate your reasoning before committing ' +
+    'to an action. It will not obtain new information or change any state — it ' +
+    'logs your critical self-assessment. Use it when you need to: check if your ' +
+    'planned action complies with all requirements, validate reasoning before ' +
+    'committing, assess edge cases, or evaluate tool results for correctness. ' +
+    'Do NOT use to confirm what you are already confident about. ' +
+    'When verifying, steel-man the opposition: What is the strongest argument ' +
+    "that your conclusion is wrong? If you can't defeat it, reconsider. " +
+    'If your verification reveals a flaw, use the think tool to revise your approach.';
+export const SYSTEM_PROMPT = `## Using the think and verify tools
+
+Before taking any action or responding after receiving tool results, use the think tool as a scratchpad to:
+- List the specific rules or constraints that apply
+- Check if all required information is collected
+- Verify your planned action makes sense
+- Identify what could go wrong
+- Adjust totalThoughts as your understanding deepens
+
+After your final thinking step (nextThoughtNeeded: false), use the verify tool to:
+- Steel-man the opposing view: build the strongest case against your conclusion
+- Check assumptions you haven't validated
+- Look for edge cases you may have missed
+
+<think_example>
+User wants to cancel order #1234
+- thoughtNumber: 1, totalThoughts: 3
+- Need to verify: order status, cancellation policy, refund eligibility
+- Check: is order already shipped? If so, different policy applies
+- Missing info: haven't confirmed the user's identity yet
+- Risk: cancelling a shipped order requires return logistics
+→ Plan: verify identity first, then check order status, then apply correct policy
+</think_example>
+
+<verify_example>
+I concluded we should use recursive approach over iterative
+- Steel-man for iterative: simpler to debug, no stack overflow risk, same time complexity
+- My assumption that recursion is "cleaner" is subjective, not evidence-based
+- Edge case: what if input size exceeds stack depth? Recursion fails silently
+→ Reconsider: iterative is safer for unknown input sizes
+</verify_example>
+
+<bad_example>
+"Let me think about this. The user wants X. I should do Y. Yes, Y seems right."
+This adds tokens without insight. Be specific: what rules apply? What could go wrong? What are you assuming?
+</bad_example>`;
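`SYSTEM_PROMPT` marks off its examples with paired pseudo-XML tags: `<think_example>`, `<verify_example>` and `<bad_example>`. If someone edits the prompt, a small lint can confirm that each opening tag has a matching close, in the right order. The sketch below is a hypothetical check that the package does not ship. It only recognizes bare lowercase tags of the form `<name>` and `</name>`. The prompt text is replaced by a shortened stand-in that keeps the same tag layout.

```javascript
// Hypothetical lint: verifies that bare <name>...</name> tags are balanced and properly nested.
function checkExampleTags(prompt) {
  const tagPattern = /<(\/?)([a-z_]+)>/g;
  const stack = [];
  const seen = [];
  for (const match of prompt.matchAll(tagPattern)) {
    const [, slash, name] = match;
    if (!slash) {
      // Opening tag: remember it until its close appears.
      stack.push(name);
      seen.push(name);
    } else if (stack.pop() !== name) {
      // Closing tag that does not match the most recent open tag.
      return { ok: false, error: `unexpected </${name}>` };
    }
  }
  if (stack.length > 0) return { ok: false, error: `unclosed <${stack[stack.length - 1]}>` };
  return { ok: true, tags: seen };
}

// Shortened stand-in for SYSTEM_PROMPT with the same tag layout.
const sample = [
  '## Using the think and verify tools',
  '<think_example>\nUser wants to cancel order #1234\n</think_example>',
  '<verify_example>\nI concluded we should use recursive approach over iterative\n</verify_example>',
  '<bad_example>\n"Let me think about this."\n</bad_example>',
].join('\n\n');

const result = checkExampleTags(sample);
console.log(result.ok, result.tags.join(',')); // true think_example,verify_example,bad_example
const broken = checkExampleTags('<think_example>\nno close');
console.log(broken.ok, broken.error); // false unclosed <think_example>
```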

package/dist/index.d.ts ADDED

@@ -0,0 +1,7 @@
+#!/usr/bin/env node
+import { Server } from '@modelcontextprotocol/sdk/server/index.js';
+/**
+ * Creates and configures the Steelmind MCP server with all tool and prompt handlers.
+ * Exported for testing — the server is fully functional but not yet connected to a transport.
+ */
+export declare function createServer(): Server;

package/dist/index.js ADDED

@@ -0,0 +1,106 @@
+#!/usr/bin/env node
+import { Server } from '@modelcontextprotocol/sdk/server/index.js';
+import { CallToolRequestSchema, ListToolsRequestSchema, ListPromptsRequestSchema, GetPromptRequestSchema, } from '@modelcontextprotocol/sdk/types.js';
+import { THINK_DESCRIPTION, VERIFY_DESCRIPTION, SYSTEM_PROMPT } from './descriptions.js';
+/**
+ * Creates and configures the Steelmind MCP server with all tool and prompt handlers.
+ * Exported for testing — the server is fully functional but not yet connected to a transport.
+ */
+export function createServer() {
+    const server = new Server({ name: 'steelmind-mcp', version: '2.0.0' }, { capabilities: { tools: {}, prompts: {} } });
+    server.onerror = (error) => console.error('[MCP Error]', error);
+    // --- Tools ---
+    server.setRequestHandler(ListToolsRequestSchema, async () => ({
+        tools: [
+            {
+                name: 'think',
+                description: THINK_DESCRIPTION,
+                inputSchema: {
+                    type: 'object',
+                    properties: {
+                        thought: {
+                            type: 'string',
+                            description: 'Your current thinking step.',
+                        },
+                        thoughtNumber: {
+                            type: 'integer',
+                            description: 'Current thought number in the sequence.',
+                        },
+                        totalThoughts: {
+                            type: 'integer',
+                            description: 'Estimated total thoughts needed. Can be adjusted up or down as you progress.',
+                        },
+                        nextThoughtNeeded: {
+                            type: 'boolean',
+                            description: 'Whether another thinking step is needed after this one.',
+                        },
+                    },
+                    required: ['thought', 'thoughtNumber', 'totalThoughts', 'nextThoughtNeeded'],
+                },
+            },
+            {
+                name: 'verify',
+                description: VERIFY_DESCRIPTION,
+                inputSchema: {
+                    type: 'object',
+                    properties: {
+                        concern: {
+                            type: 'string',
+                            description: 'Your critical assessment or concern to verify.',
+                        },
+                    },
+                    required: ['concern'],
+                },
+            },
+        ],
+    }));
+    server.setRequestHandler(CallToolRequestSchema, async (request) => {
+        const { name, arguments: args } = request.params;
+        switch (name) {
+            case 'think': {
+                const { thought, thoughtNumber, totalThoughts, nextThoughtNeeded } = args;
+                const prefix = `[Thinking ${thoughtNumber}/${totalThoughts}]`;
+                if (!nextThoughtNeeded) {
+                    return {
+                        content: [
+                            {
+                                type: 'text',
+                                text: `${prefix}\n\n${thought}\n\n` +
+                                    '---\nThinking complete. Before acting on this conclusion, ' +
+                                    'use the verify tool to challenge it.',
+                            },
+                        ],
+                    };
+                }
+                return {
+                    content: [{ type: 'text', text: `${prefix}\n\n${thought}` }],
+                };
+            }
+            case 'verify':
+                return {
+                    content: [{ type: 'text', text: args.concern }],
+                };
+            default:
+                throw new Error(`Unknown tool: ${name}`);
+        }
+    });
+    // --- Prompts ---
+    server.setRequestHandler(ListPromptsRequestSchema, async () => ({
+        prompts: [
+            {
+                name: 'steelmind',
+                description: 'Metacognitive system prompt for structured thinking and steel-manning verification',
+            },
+        ],
+    }));
+    server.setRequestHandler(GetPromptRequestSchema, async (request) => {
+        if (request.params.name !== 'steelmind') {
+            throw new Error(`Unknown prompt: ${request.params.name}`);
+        }
+        return {
+            description: 'Metacognitive system prompt for structured thinking and steel-manning verification',
+            messages: [{ role: 'user', content: { type: 'text', text: SYSTEM_PROMPT } }],
+        };
+    });
+    return server;
+}
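Nothing in `createServer` checks `args` against the declared `inputSchema` before it destructures `args` or reads `args.concern`. A missing `arguments` object therefore throws a `TypeError`, and a mistyped field is echoed back unchanged. This assumes the SDK does not validate either, which is worth checking for the SDK version in use. The sketch below guards the `think` tool with a hypothetical `validateThinkArgs` helper. It follows the MCP convention for tool execution failures: it returns a result with `isError: true` instead of throwing. The requirement that thought numbers be positive (starting at 1) is an assumption; the schema only says `integer`.

```javascript
// Hypothetical validator for the `think` tool's arguments (not part of steelmind-mcp).
// Returns null when valid, otherwise an MCP-style tool result flagged with isError.
function validateThinkArgs(args) {
  const problems = [];
  if (args === null || typeof args !== 'object') {
    problems.push('arguments must be an object');
  } else {
    if (typeof args.thought !== 'string') problems.push('thought must be a string');
    // Assumption: thought numbering starts at 1.
    for (const key of ['thoughtNumber', 'totalThoughts']) {
      if (!Number.isInteger(args[key]) || args[key] < 1) {
        problems.push(`${key} must be a positive integer`);
      }
    }
    if (typeof args.nextThoughtNeeded !== 'boolean') {
      problems.push('nextThoughtNeeded must be a boolean');
    }
  }
  if (problems.length === 0) return null;
  return {
    content: [{ type: 'text', text: `Invalid think arguments: ${problems.join('; ')}` }],
    isError: true,
  };
}

console.log(validateThinkArgs({ thought: 'x', thoughtNumber: 1, totalThoughts: 3, nextThoughtNeeded: true })); // null
const bad = validateThinkArgs({ thought: 'x', thoughtNumber: 0, totalThoughts: 3 });
// Flags both the zero thoughtNumber and the missing nextThoughtNeeded.
console.log(bad.isError, bad.content[0].text);
```

Inside the `think` case, the handler could call `const invalid = validateThinkArgs(args); if (invalid) return invalid;` before it destructures. A similar `typeof args?.concern === 'string'` check would cover `verify`.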

package/package.json ADDED

@@ -0,0 +1,71 @@
+{
+  "name": "@stabgan/steelmind-mcp",
+  "version": "2.0.0",
+  "description": "Research-grounded metacognitive reasoning tools for AI agents. Structured step-by-step thinking with steel-manning verification via MCP. Combines sequential decomposition with cognitive science frameworks. Backed by 43+ papers.",
+  "type": "module",
+  "main": "dist/index.js",
+  "bin": {
+    "steelmind-mcp": "dist/cli.js"
+  },
+  "files": [
+    "dist",
+    "README.md"
+  ],
+  "scripts": {
+    "build": "tsc && shx chmod +x dist/*.js",
+    "prepare": "npm run build",
+    "start": "node dist/cli.js",
+    "lint": "eslint src",
+    "format": "prettier --write \"src/**/*.ts\" *.json \"*.md\"",
+    "format:check": "prettier --check \"src/**/*.ts\"",
+    "test": "node --experimental-vm-modules node_modules/.bin/vitest run"
+  },
+  "keywords": [
+    "mcp",
+    "steelmind",
+    "metacognition",
+    "reasoning",
+    "think",
+    "verify",
+    "steel-manning",
+    "dialectical",
+    "agentic",
+    "llm",
+    "claude",
+    "gpt",
+    "anthropic",
+    "sequential-thinking",
+    "chain-of-thought",
+    "model-context-protocol",
+    "ai-agent",
+    "cognitive-science",
+    "structured-reasoning"
+  ],
+  "author": "stabgan",
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/stabgan/steelmind-mcp"
+  },
+  "bugs": {
+    "url": "https://github.com/stabgan/steelmind-mcp/issues"
+  },
+  "homepage": "https://github.com/stabgan/steelmind-mcp#readme",
+  "license": "MIT",
+  "engines": {
+    "node": ">=18.0.0"
+  },
+  "dependencies": {
+    "@modelcontextprotocol/sdk": "^1.27.1"
+  },
+  "devDependencies": {
+    "@eslint/js": "^9.39.2",
+    "@types/node": "^22.13.14",
+    "eslint": "^9.39.2",
+    "eslint-config-prettier": "^10.1.8",
+    "prettier": "^3.7.4",
+    "shx": "^0.3.4",
+    "typescript": "^5.8.2",
+    "typescript-eslint": "^8.53.0",
+    "vitest": "^3.1.1"
+  }
+}
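`bin` maps `steelmind-mcp` to `dist/cli.js`, and the server speaks MCP over stdio. A client therefore typically launches it as a subprocess through `npx`. The snippet below builds a config entry in the `mcpServers` shape that several MCP clients use, for example Claude Desktop's `claude_desktop_config.json`. The server key `steelmind`, the `-y` flag and the pinned version are choices made for illustration. Other clients may expect a different file or shape.

```javascript
// Build an mcpServers entry that launches the package's bin over stdio via npx.
// The key name 'steelmind' is arbitrary; clients use it only as a label.
function steelmindConfig(version = '2.0.0') {
  return {
    mcpServers: {
      steelmind: {
        command: 'npx',
        // -y skips npx's install prompt; pinning the version keeps runs reproducible.
        args: ['-y', `@stabgan/steelmind-mcp@${version}`],
      },
    },
  };
}

console.log(JSON.stringify(steelmindConfig(), null, 2));
```

The machine running the client also needs Node.js 18 or newer, per the `engines` field.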