@houtini/voice-analyser 1.3.0 → 1.4.0

package/README.md CHANGED
@@ -1,265 +1,262 @@
- # Voice Analyser
-
- [![npm version](https://img.shields.io/npm/v/@houtini/voice-analyser)](https://www.npmjs.com/package/@houtini/voice-analyser)
- [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
-
- > **Experimental library** for extracting statistical voice models from your published writing. Generates immersive style guides that teach LLMs to replicate how you actually write - not through rules, but through examples and rhythm patterns.
-
- ## Why This Exists
-
- Traditional style guides list rules: "Use short sentences. Vary paragraph length. Include personal anecdotes."
-
- This doesn't work. Writers don't follow rules - they channel voice.
-
- This tool extracts the statistical fingerprint of your writing and presents it as immersive examples with annotations showing *what makes each passage feel human*. The goal is voice replication through pattern recognition, not rule compliance.
-
- **Status:** Experimental. The approach works but is under active development.
-
- ## Installation
-
- ### Claude Desktop
-
- Add to your `claude_desktop_config.json`:
-
- ```json
- {
-   "mcpServers": {
-     "voice-analysis": {
-       "command": "npx",
-       "args": ["-y", "@houtini/voice-analyser@latest"]
-     }
-   }
- }
- ```
-
- **Config locations:**
- - Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- - Linux: `~/.config/Claude/claude_desktop_config.json`
-
- Restart Claude Desktop after saving.
-
- ### Requirements
-
- - Node.js 20+
-
- ## Quick Start
-
- ### 1. Create an Output Directory
-
- First, create a directory where your corpus and analysis will be stored:
-
- ```
- C:\writing\voice-models\ (Windows)
- ~/writing/voice-models/ (Mac/Linux)
- ```
-
- This directory will contain:
- - Collected articles (markdown)
- - Analysis JSON files
- - Generated voice guides
-
- ### 2. Collect Your Writing
-
- ```
- Collect corpus from https://yoursite.com/post-sitemap.xml
- Save as "my-voice" in "C:\writing\voice-models"
- ```
-
- The tool needs:
- - `sitemap_url` - Your XML sitemap URL
- - `output_name` - A name for this corpus (e.g., "my-voice", "blog-posts")
- - `output_dir` - The directory you created above
-
- **Example with all parameters:**
- ```
- Collect corpus from https://example.com/post-sitemap.xml
- Output name: "technical-writing"
- Output directory: "C:\writing\voice-models"
- Maximum articles: 50
- ```
-
- ### 3. Analyse Patterns
-
- ```
- Analyse corpus "my-voice" in directory "C:\writing\voice-models"
- ```
-
- This runs 14 analysers covering vocabulary, sentence structure, voice markers, argument flow, and paragraph transitions.
-
- ### 4. Generate Voice Guide
-
- ```
- Generate narrative guide for "my-voice" in directory "C:\writing\voice-models"
- ```
-
- Creates an immersive style guide with annotated examples at:
- `C:\writing\voice-models\my-voice\writing_style_my-voice_narrative.md`
-
- ## Using the Voice Guide
-
- Once generated, the voice guide can be loaded into any LLM conversation to help it write in your voice.
-
- ### Loading the Guide
-
- ```
- Load the file C:\writing\voice-models\my-voice\writing_style_my-voice_narrative.md
- and use it as a reference for all writing in this conversation.
- ```
-
- ### Example Prompts for Content Generation
-
- **Blog post:**
- ```
- Using the voice guide as your reference, write a blog post about [topic].
-
- Key requirements:
- - Match the sentence rhythm patterns shown in the examples
- - Use the conversational devices naturally (not forced)
- - Include the micro-rhythms: mid-thought pivots, embedded uncertainty, present-tense immediacy
- - Vary sentence length as shown in the statistical targets
- - Use British/American spelling as indicated in the guide
- ```
-
- **Technical article:**
- ```
- Reference the voice guide and write a technical explanation of [concept].
-
- Channel the voice by:
- - Opening with the pattern shown in "Opening Moves" section
- - Using specific product/tool names, not generic references
- - Including admissions of complexity or uncertainty where authentic
- - Following the argument flow patterns from the guide
- - Matching the punctuation habits (especially dash usage)
- ```
-
- **Product review:**
- ```
- Using the loaded voice guide, write a review of [product].
-
- Capture the voice by:
- - Starting with personal context (why you tested this)
- - Blending technical specs with practical implications
- - Using the transition patterns between paragraphs
- - Including the "human tells" - parenthetical asides, mid-thought corrections
- - Ending with the closing patterns shown in examples
- ```
-
- **Email/communication:**
- ```
- Write an email about [subject] using the voice patterns from the guide.
-
- Focus on:
- - Conversational markers appearing naturally
- - Sentence length variation (some punchy, some complex)
- - The hedging/confidence balance shown in statistics
- - First-person usage matching the corpus frequency
- ```
-
- ### Validation After Writing
-
- The guide includes statistical targets. After writing, check:
-
- ```
- Review what you just wrote against the voice guide metrics:
- - Does sentence length variation match the target standard deviation?
- - Is first-person frequency within the expected range?
- - Are conversational markers present but not overused?
- - Does the rhythm feel like the extended examples?
- ```
-
- ## What Gets Analysed
-
- ### Core Voice Patterns
- - **Vocabulary** - Word choice, British/American markers, domain specificity
- - **Sentence structure** - Length distribution, openers, complexity patterns
- - **Voice markers** - First-person usage, hedging language, conversational markers
- - **Punctuation** - Dash types, comma density, parenthetical frequency
-
- ### Argument & Flow Patterns
- - **Argument flow** - How you open, build, and close arguments
- - **Paragraph transitions** - How ideas connect across paragraphs
- - **Conversational devices** - "look", "frankly", "actually" and when they appear
-
- ### Micro-Rhythm Detection
- The guide annotates examples with invisible patterns that make writing feel human:
- - Mid-thought pivots (comma before "and", "but", "so")
- - Present-tense immediacy ("Right now, it's...")
- - Embedded uncertainty ("I think", "probably")
- - Casual sentence starters ("So,", "And,", "But,")
- - Parenthetical asides
- - Punchy fragments contrasting with longer sentences
-
- ## Output Structure
-
- ```
- your-output-directory/
- └── corpus-name/
-     ├── articles/ # Collected markdown files
-     ├── corpus.json # Metadata
-     ├── analysis/ # JSON analysis files
-     │   ├── vocabulary.json
-     │   ├── sentence.json
-     │   ├── voice.json
-     │   ├── paragraph.json
-     │   ├── punctuation.json
-     │   ├── function-words.json
-     │   ├── argument-flow.json
-     │   └── paragraph-transitions.json
-     └── writing_style_[name]_narrative.md # The voice guide
- ```
-
- ## Minimum Corpus Size
-
- - **Minimum:** 15,000 words (~20 articles)
- - **Recommended:** 30,000 words
- - **Ideal:** 50,000+ words
-
- Below 15k words, statistical patterns become unreliable.
-
- ## Tools Reference
-
- ### collect_corpus
-
- | Parameter | Required | Description |
- |-----------|----------|-------------|
- | `sitemap_url` | Yes | XML sitemap URL |
- | `output_name` | Yes | Corpus identifier (e.g., "my-voice") |
- | `output_dir` | Yes | Directory to store corpus |
- | `max_articles` | No | Limit (default: 100) |
- | `article_pattern` | No | Regex filter for URLs |
-
- ### analyze_corpus
-
- | Parameter | Required | Description |
- |-----------|----------|-------------|
- | `corpus_name` | Yes | Name from collect_corpus |
- | `corpus_dir` | Yes | Directory containing corpus |
- | `analysis_type` | No | full, quick, vocabulary, syntax |
-
- ### generate_narrative_guide
-
- | Parameter | Required | Description |
- |-----------|----------|-------------|
- | `corpus_name` | Yes | Name from analyze_corpus |
- | `corpus_dir` | Yes | Directory containing corpus |
-
- ## Development
-
- ```bash
- git clone https://github.com/houtini-ai/mcp-server-voice-analysis.git
- cd mcp-server-voice-analysis
- npm install
- npm run build
- ```
-
- ## Limitations
-
- - Requires XML sitemap (RSS feeds not currently supported)
- - Works best with consistent single-author content
- - Mixed authorship or heavily edited content produces weaker signals
- - The approach is experimental - results vary by writing style
-
- ---
-
- Apache License 2.0 - [Houtini.ai](https://houtini.ai)
+ # Voice Analyser MCP
+
+ [![npm version](https://img.shields.io/npm/v/@houtini/voice-analyser)](https://www.npmjs.com/package/@houtini/voice-analyser)
+ [![Known Vulnerabilities](https://snyk.io/test/github/houtini-ai/voice-analyser-mcp/badge.svg)](https://snyk.io/test/github/houtini-ai/voice-analyser-mcp)
+ [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+
+ > MCP server that analyses your published writing and generates executable style guides for voice-matched content creation.
+
+ ## What This Does
+
+ I built this because traditional style guides don't work. They tell you "use short sentences" and "vary paragraph length" - rules that sound helpful but produce robotic output when you try to follow them.
+
+ This tool extracts statistical patterns from your published writing and generates a style guide that teaches through **zero tolerance rules, phrase libraries, and validation checklists** rather than vague principles.
+
+ Version 1.4.0 focuses on executable instructions: forbidden word lists with alternatives, 50+ actual phrases from your corpus, and checkbox validation that catches AI slop before you publish.
+
+ ## Installation
+
+ ### Claude Desktop
+
+ Add to your `claude_desktop_config.json`:
+
+ ```json
+ {
+   "mcpServers": {
+     "voice-analysis": {
+       "command": "npx",
+       "args": ["-y", "@houtini/voice-analyser@latest"]
+     }
+   }
+ }
+ ```
+
+ **Config file locations:**
+ - Windows: `%APPDATA%\Claude\claude_desktop_config.json`
+ - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
+ - Linux: `~/.config/Claude/claude_desktop_config.json`
+
+ Restart Claude Desktop after saving.
+
+ **Requirements:** Node.js 20+
+
+ ## Quick Start
+
+ ### 1. Create Output Directory
+
+ Pick a directory for corpus storage and analysis:
+
+ ```
+ C:\writing\voice-models\ (Windows)
+ ~/writing/voice-models/ (Mac/Linux)
+ ```
+
+ This holds collected articles, analysis JSON files, and generated style guides.
+
+ ### 2. Collect Writing Corpus
+
+ In Claude Desktop:
+ ```
+ Collect corpus from https://yoursite.com/post-sitemap.xml
+ Save as "my-voice" in "C:\writing\voice-models"
+ ```
+
+ Parameters:
+ - `sitemap_url` - XML sitemap URL
+ - `output_name` - Corpus identifier (e.g., "my-voice")
+ - `output_dir` - Directory you created above
+ - `max_articles` - Optional limit (default: 100)
+
+ The tool crawls your sitemap, extracts clean content, and saves markdown files.
+
+ ### 3. Run Analysis
+
+ ```
+ Analyse corpus "my-voice" in directory "C:\writing\voice-models"
+ ```
+
+ This runs 15+ analysers covering:
+ - Vocabulary tiers (AI slop detection, formality scoring)
+ - Phrase extraction (opening patterns, transitions, caveats)
+ - Sentence structure and rhythm
+ - Voice markers and conversational devices
+ - Punctuation habits
+
+ ### 4. Generate Style Guide v4
+
+ ```
+ Generate style guide for "my-voice" in directory "C:\writing\voice-models"
+ ```
+
+ Creates an example-first guide at:
+ `C:\writing\voice-models\my-voice\writing_style_my-voice.md`
+
+ ## What v1.4.0 Changed
+
+ Previous versions generated statistical analysis that was accurate but not useful for writing. v1.4.0 restructures the output:
+
+ **Before:** 60% statistics, 40% guidance
+ **After:** 70% examples, 30% statistics
+
+ ### New Style Guide Structure
+
+ **Part 1: Zero Tolerance Rules**
+ - Forbidden vocabulary (AI slop) with alternatives
+ - Formal words flagged with casual replacements
+ - Punctuation rules (em-dash detection)
+
+ **Part 2: Phrase Library (50+ Examples)**
+ - Opening patterns (personal story, direct action, protective warnings)
+ - Equipment references (possessive vs generic)
+ - Caveat phrases (honesty markers)
+ - Transition patterns
+
+ **Part 3: Sentence Patterns**
+ - Rhythm variation targets with corpus examples
+ - First-person usage frequency
+ - Natural sentence flow demonstrations
+
+ **Part 4: Validation Checklist**
+ - Critical rules (must pass)
+ - Voice match rules (should pass)
+ - Actionable checkbox format
+
+ **Part 5: Quick Reference**
+ - Top phrases by frequency
+ - Statistics summary
+
+ ## Using the Style Guide
+
+ Load the generated guide into Claude conversations:
+
+ ```
+ Load C:\writing\voice-models\my-voice\writing_style_my-voice.md
+ and use it to write [content type] about [topic]
+ ```
+
+ The guide includes validation checklists. After Claude writes, run:
+
+ ```
+ Check what you just wrote against the style guide validation checklist.
+ Report any violations.
+ ```
+
+ ### Critical Validation Rules
+
+ The guide flags these as must-pass:
+ - Zero AI slop words (delve, leverage, unlock, seamless, robust)
+ - Zero em-dashes if corpus doesn't use them
+ - British/American spelling consistency
+ - Equipment named specifically (not "the product")
+
+ ### Voice Match Validation
+
+ The guide checks these as should-pass:
+ - First-person frequency matches target (~0.8 per 100 words typical)
+ - Sentence length varies wildly (5-word to 40-word sentences)
+ - Honest caveats present ("It's not perfect", "I wish I'd...")
+ - Opening follows corpus patterns
+
+ ## Analysis Output
+
+ The tool generates these JSON files in `corpus-name/analysis/`:
+
+ **Core Analysis:**
+ - `vocabulary.json` - Word choice, domain terms, British/American markers
+ - `sentence.json` - Length distribution, complexity patterns
+ - `voice.json` - First-person usage, hedging language, conversational markers
+ - `paragraph.json` - Structure and transition patterns
+ - `punctuation.json` - Dash types, comma density, parenthetical frequency
+
+ **v1.4.0 Additions:**
+ - `vocabulary-tiers.json` - AI slop detection, formality scoring with alternatives
+ - `phrase-library.json` - 50+ extracted phrases organized by type
+
+ **Advanced Analysis:**
+ - `function-words.json` - Z-scores for style fingerprinting
+ - `anti-mechanical.json` - Naturalness scoring
+ - `argument-flow.json` - How arguments open, build, close
+ - `paragraph-transitions.json` - Cross-paragraph connection patterns
+ - `specificity-patterns.json` - Possessive vs generic references
+
+ ## Minimum Corpus Requirements
+
+ - **Minimum:** 15,000 words (~20 articles)
+ - **Recommended:** 30,000 words (~40 articles)
+ - **Ideal:** 50,000+ words
+
+ Below 15k words, statistical patterns become unreliable. The phrase library needs volume to find frequently-used patterns.
+
+ ## MCP Tools Reference
+
+ ### collect_corpus
+
+ Crawls sitemap and collects clean writing corpus.
+
+ **Parameters:**
+ - `sitemap_url` (required) - XML sitemap URL
+ - `output_name` (required) - Corpus identifier
+ - `output_dir` (required) - Storage directory
+ - `max_articles` (optional) - Limit, default 100
+ - `article_pattern` (optional) - Regex URL filter
+
+ ### analyze_corpus
+
+ Runs linguistic analysis on collected corpus.
+
+ **Parameters:**
+ - `corpus_name` (required) - Name from collect_corpus
+ - `corpus_dir` (required) - Directory containing corpus
+ - `analysis_type` (optional) - full, quick, vocabulary, syntax (default: full)
+
+ ### generate_style_guide
+
+ Generates v4 executable style guide.
+
+ **Parameters:**
+ - `corpus_name` (required) - Name from analyze_corpus
+ - `corpus_dir` (required) - Directory containing analysis
+
+ ## Development
+
+ ```bash
+ git clone https://github.com/houtini-ai/mcp-server-voice-analysis.git
+ cd mcp-server-voice-analysis
+ npm install
+ npm run build
+ ```
+
+ Local development mode in Claude Desktop config:
+
+ ```json
+ {
+   "mcpServers": {
+     "voice-analysis": {
+       "command": "node",
+       "args": ["C:\\path\\to\\mcp-server-voice-analysis\\dist\\index.js"]
+     }
+   }
+ }
+ ```
+
+ ## Known Limitations
+
+ - Needs XML sitemap (RSS feeds not supported)
+ - Works best with single-author content
+ - Mixed authorship weakens statistical signals
+ - Heavily edited content produces less distinct voice patterns
+ - Transition phrase detection currently returns sparse results (being improved)
+
+ ## What's Next
+
+ v1.5.0 planned features:
+ - Automated text validation against corpus
+ - Real-time writing feedback
+ - Custom forbidden vocabulary per corpus
+ - Improved transition phrase detection
+
+ ---
+
+ **License:** Apache 2.0
+ **Author:** [Houtini](https://houtini.ai)
+ **Version:** 1.4.0
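
Reviewer note: the new README describes must-pass and should-pass checks (no AI slop words, no em-dashes, first-person frequency per 100 words) but the diff does not show how the package evaluates them. The TypeScript below is a minimal sketch of what such checks could look like, not the package's implementation. The helper names `findSlopWords`, `hasEmDash` and `firstPersonPer100Words` are hypothetical, the word list is only the five examples the README names, and the pronoun set is an assumption.

```typescript
// Hypothetical helpers for illustration; not part of @houtini/voice-analyser's published API.

// The five example words the README lists as "AI slop".
const SLOP_WORDS: readonly string[] = ["delve", "leverage", "unlock", "seamless", "robust"];

/**
 * Returns each listed word that occurs in the text as a whole word (case-insensitive).
 * Assumes the list holds plain alphabetic words; inflected forms such as
 * "leverages" are not matched by this simple whole-word test.
 */
function findSlopWords(text: string, slopWords: readonly string[] = SLOP_WORDS): string[] {
  const lower = text.toLowerCase();
  return slopWords.filter((word) => new RegExp(`\\b${word}\\b`).test(lower));
}

/** True if the text contains an em-dash (U+2014). */
function hasEmDash(text: string): boolean {
  return text.includes("\u2014");
}

/**
 * First-person singular pronouns per 100 words.
 * The pronoun set is an assumption and only straight apostrophes are handled.
 */
function firstPersonPer100Words(text: string): number {
  const words = text.match(/[A-Za-z']+/g) ?? [];
  if (words.length === 0) return 0;
  const firstPerson = new Set(["i", "i'm", "i've", "i'd", "i'll", "me", "my", "mine", "myself"]);
  const hits = words.filter((w) => firstPerson.has(w.toLowerCase())).length;
  return (hits * 100) / words.length;
}
```

A draft would pass the two critical checks sketched here when `findSlopWords(draft)` is empty and `hasEmDash(draft)` is false. The first-person rate would then be compared against the target the generated guide reports for the corpus.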
@@ -0,0 +1,30 @@
+ /**
+  * Phrase Extraction Analysis
+  *
+  * Extracts frequently-used phrases for direct imitation.
+  * Focuses on opening patterns, transitions, equipment references, and caveats.
+  */
+ export interface PhraseExample {
+     phrase: string;
+     count: number;
+     context?: string;
+ }
+ export interface PhraseLibrary {
+     openingPatterns: {
+         personalStory: PhraseExample[];
+         directAction: PhraseExample[];
+         protectiveWarning: PhraseExample[];
+     };
+     transitionPhrases: PhraseExample[];
+     equipmentReferences: {
+         withPossessive: PhraseExample[];
+         generic: PhraseExample[];
+     };
+     caveatPhrases: PhraseExample[];
+     totalPhrases: number;
+ }
+ /**
+  * Main phrase extraction function
+  */
+ export declare function extractPhrases(text: string): PhraseLibrary;
+ //# sourceMappingURL=phrase-extraction.d.ts.map
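
Reviewer note: the declaration above exposes the `PhraseLibrary` shape, and the README mentions a "Top phrases by frequency" quick reference. The sketch below shows one way a consumer could derive such a list from a `PhraseLibrary` value. The interfaces are copied locally because the package's import path is not shown in this diff, and `allPhrases` and `topPhrases` are hypothetical helpers rather than functions the package exports.

```typescript
// Local copies of the declared shapes, so the sketch is self-contained.
interface PhraseExample {
  phrase: string;
  count: number;
  context?: string;
}

interface PhraseLibrary {
  openingPatterns: {
    personalStory: PhraseExample[];
    directAction: PhraseExample[];
    protectiveWarning: PhraseExample[];
  };
  transitionPhrases: PhraseExample[];
  equipmentReferences: {
    withPossessive: PhraseExample[];
    generic: PhraseExample[];
  };
  caveatPhrases: PhraseExample[];
  totalPhrases: number;
}

/** Flattens every category into one new array (hypothetical helper). */
function allPhrases(library: PhraseLibrary): PhraseExample[] {
  return [
    ...library.openingPatterns.personalStory,
    ...library.openingPatterns.directAction,
    ...library.openingPatterns.protectiveWarning,
    ...library.transitionPhrases,
    ...library.equipmentReferences.withPossessive,
    ...library.equipmentReferences.generic,
    ...library.caveatPhrases,
  ];
}

/** Returns the `limit` most frequent phrases, highest count first (hypothetical helper). */
function topPhrases(library: PhraseLibrary, limit = 10): PhraseExample[] {
  // allPhrases returns a fresh array, so sorting does not mutate the library.
  return allPhrases(library)
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}
```

In practice the input would come from `extractPhrases(text)` or from the `phrase-library.json` file the README lists, assuming that file serialises the same shape.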
@@ -0,0 +1 @@
+ {"version":3,"file":"phrase-extraction.d.ts","sourceRoot":"","sources":["../../src/analyzers/phrase-extraction.ts"],"names":[],"mappings":"AAAA;;;;;GAKG;AAIH,MAAM,WAAW,aAAa;IAC5B,MAAM,EAAE,MAAM,CAAC;IACf,KAAK,EAAE,MAAM,CAAC;IACd,OAAO,CAAC,EAAE,MAAM,CAAC;CAClB;AAED,MAAM,WAAW,aAAa;IAC5B,eAAe,EAAE;QACf,aAAa,EAAE,aAAa,EAAE,CAAC;QAC/B,YAAY,EAAE,aAAa,EAAE,CAAC;QAC9B,iBAAiB,EAAE,aAAa,EAAE,CAAC;KACpC,CAAC;IACF,iBAAiB,EAAE,aAAa,EAAE,CAAC;IACnC,mBAAmB,EAAE;QACnB,cAAc,EAAE,aAAa,EAAE,CAAC;QAChC,OAAO,EAAE,aAAa,EAAE,CAAC;KAC1B,CAAC;IACF,aAAa,EAAE,aAAa,EAAE,CAAC;IAC/B,YAAY,EAAE,MAAM,CAAC;CACtB;AAkLD;;GAEG;AACH,wBAAgB,cAAc,CAAC,IAAI,EAAE,MAAM,GAAG,aAAa,CAsB1D"}