@houtini/voice-analyser 1.2.1 → 1.3.1

package/README.md CHANGED
@@ -1,265 +1,265 @@
1
- # Voice Analyser
2
-
3
- [![npm version](https://img.shields.io/npm/v/@houtini/voice-analyser)](https://www.npmjs.com/package/@houtini/voice-analyser)
4
- [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
5
-
6
- > **Experimental library** for extracting statistical voice models from your published writing. Generates immersive style guides that teach LLMs to replicate how you actually write - not through rules, but through examples and rhythm patterns.
7
-
8
- ## Why This Exists
9
-
10
- Traditional style guides list rules: "Use short sentences. Vary paragraph length. Include personal anecdotes."
11
-
12
- This doesn't work. Writers don't follow rules - they channel voice.
13
-
14
- This tool extracts the statistical fingerprint of your writing and presents it as immersive examples with annotations showing *what makes each passage feel human*. The goal is voice replication through pattern recognition, not rule compliance.
15
-
16
- **Status:** Experimental. The approach works but is under active development.
17
-
18
- ## Installation
19
-
20
- ### Claude Desktop
21
-
22
- Add to your `claude_desktop_config.json`:
23
-
24
- ```json
25
- {
26
- "mcpServers": {
27
- "voice-analysis": {
28
- "command": "npx",
29
- "args": ["-y", "@houtini/voice-analyser@latest"]
30
- }
31
- }
32
- }
33
- ```
34
-
35
- **Config locations:**
36
- - Windows: `%APPDATA%\Claude\claude_desktop_config.json`
37
- - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
38
- - Linux: `~/.config/Claude/claude_desktop_config.json`
39
-
40
- Restart Claude Desktop after saving.
41
-
42
- ### Requirements
43
-
44
- - Node.js 20+
45
-
46
- ## Quick Start
47
-
48
- ### 1. Create an Output Directory
49
-
50
- First, create a directory where your corpus and analysis will be stored:
51
-
52
- ```
53
- C:\writing\voice-models\ (Windows)
54
- ~/writing/voice-models/ (Mac/Linux)
55
- ```
56
-
57
- This directory will contain:
58
- - Collected articles (markdown)
59
- - Analysis JSON files
60
- - Generated voice guides
61
-
62
- ### 2. Collect Your Writing
63
-
64
- ```
65
- Collect corpus from https://yoursite.com/post-sitemap.xml
66
- Save as "my-voice" in "C:\writing\voice-models"
67
- ```
68
-
69
- The tool needs:
70
- - `sitemap_url` - Your XML sitemap URL
71
- - `output_name` - A name for this corpus (e.g., "my-voice", "blog-posts")
72
- - `output_dir` - The directory you created above
73
-
74
- **Example with all parameters:**
75
- ```
76
- Collect corpus from https://example.com/post-sitemap.xml
77
- Output name: "technical-writing"
78
- Output directory: "C:\writing\voice-models"
79
- Maximum articles: 50
80
- ```
81
-
82
- ### 3. Analyse Patterns
83
-
84
- ```
85
- Analyse corpus "my-voice" in directory "C:\writing\voice-models"
86
- ```
87
-
88
- This runs 14 analysers covering vocabulary, sentence structure, voice markers, argument flow, and paragraph transitions.
89
-
90
- ### 4. Generate Voice Guide
91
-
92
- ```
93
- Generate narrative guide for "my-voice" in directory "C:\writing\voice-models"
94
- ```
95
-
96
- Creates an immersive style guide with annotated examples at:
97
- `C:\writing\voice-models\my-voice\writing_style_my-voice_narrative.md`
98
-
99
- ## Using the Voice Guide
100
-
101
- Once generated, the voice guide can be loaded into any LLM conversation to help it write in your voice.
102
-
103
- ### Loading the Guide
104
-
105
- ```
106
- Load the file C:\writing\voice-models\my-voice\writing_style_my-voice_narrative.md
107
- and use it as a reference for all writing in this conversation.
108
- ```
109
-
110
- ### Example Prompts for Content Generation
111
-
112
- **Blog post:**
113
- ```
114
- Using the voice guide as your reference, write a blog post about [topic].
115
-
116
- Key requirements:
117
- - Match the sentence rhythm patterns shown in the examples
118
- - Use the conversational devices naturally (not forced)
119
- - Include the micro-rhythms: mid-thought pivots, embedded uncertainty, present-tense immediacy
120
- - Vary sentence length as shown in the statistical targets
121
- - Use British/American spelling as indicated in the guide
122
- ```
123
-
124
- **Technical article:**
125
- ```
126
- Reference the voice guide and write a technical explanation of [concept].
127
-
128
- Channel the voice by:
129
- - Opening with the pattern shown in "Opening Moves" section
130
- - Using specific product/tool names, not generic references
131
- - Including admissions of complexity or uncertainty where authentic
132
- - Following the argument flow patterns from the guide
133
- - Matching the punctuation habits (especially dash usage)
134
- ```
135
-
136
- **Product review:**
137
- ```
138
- Using the loaded voice guide, write a review of [product].
139
-
140
- Capture the voice by:
141
- - Starting with personal context (why you tested this)
142
- - Blending technical specs with practical implications
143
- - Using the transition patterns between paragraphs
144
- - Including the "human tells" - parenthetical asides, mid-thought corrections
145
- - Ending with the closing patterns shown in examples
146
- ```
147
-
148
- **Email/communication:**
149
- ```
150
- Write an email about [subject] using the voice patterns from the guide.
151
-
152
- Focus on:
153
- - Conversational markers appearing naturally
154
- - Sentence length variation (some punchy, some complex)
155
- - The hedging/confidence balance shown in statistics
156
- - First-person usage matching the corpus frequency
157
- ```
158
-
159
- ### Validation After Writing
160
-
161
- The guide includes statistical targets. After writing, check:
162
-
163
- ```
164
- Review what you just wrote against the voice guide metrics:
165
- - Does sentence length variation match the target standard deviation?
166
- - Is first-person frequency within the expected range?
167
- - Are conversational markers present but not overused?
168
- - Does the rhythm feel like the extended examples?
169
- ```
170
-
171
- ## What Gets Analysed
172
-
173
- ### Core Voice Patterns
174
- - **Vocabulary** - Word choice, British/American markers, domain specificity
175
- - **Sentence structure** - Length distribution, openers, complexity patterns
176
- - **Voice markers** - First-person usage, hedging language, conversational markers
177
- - **Punctuation** - Dash types, comma density, parenthetical frequency
178
-
179
- ### Argument & Flow Patterns
180
- - **Argument flow** - How you open, build, and close arguments
181
- - **Paragraph transitions** - How ideas connect across paragraphs
182
- - **Conversational devices** - "look", "frankly", "actually" and when they appear
183
-
184
- ### Micro-Rhythm Detection
185
- The guide annotates examples with invisible patterns that make writing feel human:
186
- - Mid-thought pivots (comma before "and", "but", "so")
187
- - Present-tense immediacy ("Right now, it's...")
188
- - Embedded uncertainty ("I think", "probably")
189
- - Casual sentence starters ("So,", "And,", "But,")
190
- - Parenthetical asides
191
- - Punchy fragments contrasting with longer sentences
192
-
193
- ## Output Structure
194
-
195
- ```
196
- your-output-directory/
197
- └── corpus-name/
198
- ├── articles/ # Collected markdown files
199
- ├── corpus.json # Metadata
200
- ├── analysis/ # JSON analysis files
201
- │ ├── vocabulary.json
202
- │ ├── sentence.json
203
- │ ├── voice.json
204
- │ ├── paragraph.json
205
- │ ├── punctuation.json
206
- │ ├── function-words.json
207
- │ ├── argument-flow.json
208
- │ └── paragraph-transitions.json
209
- └── writing_style_[name]_narrative.md # The voice guide
210
- ```
211
-
212
- ## Minimum Corpus Size
213
-
214
- - **Minimum:** 15,000 words (~20 articles)
215
- - **Recommended:** 30,000 words
216
- - **Ideal:** 50,000+ words
217
-
218
- Below 15k words, statistical patterns become unreliable.
219
-
220
- ## Tools Reference
221
-
222
- ### collect_corpus
223
-
224
- | Parameter | Required | Description |
225
- |-----------|----------|-------------|
226
- | `sitemap_url` | Yes | XML sitemap URL |
227
- | `output_name` | Yes | Corpus identifier (e.g., "my-voice") |
228
- | `output_dir` | Yes | Directory to store corpus |
229
- | `max_articles` | No | Limit (default: 100) |
230
- | `article_pattern` | No | Regex filter for URLs |
231
-
232
- ### analyze_corpus
233
-
234
- | Parameter | Required | Description |
235
- |-----------|----------|-------------|
236
- | `corpus_name` | Yes | Name from collect_corpus |
237
- | `corpus_dir` | Yes | Directory containing corpus |
238
- | `analysis_type` | No | full, quick, vocabulary, syntax |
239
-
240
- ### generate_narrative_guide
241
-
242
- | Parameter | Required | Description |
243
- |-----------|----------|-------------|
244
- | `corpus_name` | Yes | Name from analyze_corpus |
245
- | `corpus_dir` | Yes | Directory containing corpus |
246
-
247
- ## Development
248
-
249
- ```bash
250
- git clone https://github.com/houtini-ai/mcp-server-voice-analysis.git
251
- cd mcp-server-voice-analysis
252
- npm install
253
- npm run build
254
- ```
255
-
256
- ## Limitations
257
-
258
- - Requires XML sitemap (RSS feeds not currently supported)
259
- - Works best with consistent single-author content
260
- - Mixed authorship or heavily edited content produces weaker signals
261
- - The approach is experimental - results vary by writing style
262
-
263
- ---
264
-
265
- Apache License 2.0 - [Houtini.ai](https://houtini.ai)
1
+ # Voice Analyser
2
+
3
+ [![npm version](https://img.shields.io/npm/v/@houtini/voice-analyser)](https://www.npmjs.com/package/@houtini/voice-analyser)
4
+ [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
5
+
6
+ > **Experimental library** for extracting statistical voice models from your published writing. Generates immersive style guides that teach LLMs to replicate how you actually write - not through rules, but through examples and rhythm patterns.
7
+
8
+ ## Why This Exists
9
+
10
+ Traditional style guides list rules: "Use short sentences. Vary paragraph length. Include personal anecdotes."
11
+
12
+ This doesn't work. Writers don't follow rules - they channel voice.
13
+
14
+ This tool extracts the statistical fingerprint of *your writing* and presents it as immersive examples with annotations showing *what makes each passage feel human*. The goal is voice replication through pattern recognition, not rule compliance.
15
+
16
+ **Status:** Experimental. The approach works but is under active development.
17
+
18
+ ## Installation
19
+
20
+ ### Claude Desktop
21
+
22
+ Add to your `claude_desktop_config.json`:
23
+
24
+ ```json
25
+ {
26
+ "mcpServers": {
27
+ "voice-analysis": {
28
+ "command": "npx",
29
+ "args": ["-y", "@houtini/voice-analyser@latest"]
30
+ }
31
+ }
32
+ }
33
+ ```
34
+
35
+ **Config locations:**
36
+ - Windows: `%APPDATA%\Claude\claude_desktop_config.json`
37
+ - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
38
+ - Linux: `~/.config/Claude/claude_desktop_config.json`
39
+
40
+ Restart Claude Desktop after saving.
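The configuration step above can also be scripted rather than edited by hand. The following is a minimal sketch under stated assumptions: the helper names `config_path` and `add_voice_analyser` are hypothetical and not part of the package, the three locations are the ones listed in the README, and the server entry is copied from the README's JSON snippet.

```python
from pathlib import PureWindowsPath

# Server entry copied from the README's JSON snippet.
SERVER_ENTRY = {"command": "npx", "args": ["-y", "@houtini/voice-analyser@latest"]}


def config_path(system: str, home: str, appdata: str = "") -> str:
    """Return the Claude Desktop config location for a platform name.

    `system` follows platform.system() values: "Windows", "Darwin", "Linux".
    `appdata` is the value of %APPDATA% and is only used on Windows.
    """
    if system == "Windows":
        return str(PureWindowsPath(appdata) / "Claude" / "claude_desktop_config.json")
    if system == "Darwin":
        return f"{home}/Library/Application Support/Claude/claude_desktop_config.json"
    return f"{home}/.config/Claude/claude_desktop_config.json"


def add_voice_analyser(config: dict) -> dict:
    """Return a copy of `config` with the "voice-analysis" server added.

    Other servers and top-level keys are preserved; an existing
    "voice-analysis" entry is overwritten.
    """
    servers = dict(config.get("mcpServers", {}))
    servers["voice-analysis"] = dict(SERVER_ENTRY)
    return {**config, "mcpServers": servers}
```

Merging into the existing `mcpServers` object, rather than replacing the file, keeps any other servers already configured. Writing the result back to disk is left out on purpose so the sketch has no side effects.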
41
+
42
+ ### Requirements
43
+
44
+ - Node.js 20+
45
+
46
+ ## Quick Start
47
+
48
+ ### 1. Create an Output Directory
49
+
50
+ First, create a directory where your corpus and analysis will be stored:
51
+
52
+ ```
53
+ C:\writing\voice-models\ (Windows)
54
+ ~/writing/voice-models/ (Mac/Linux)
55
+ ```
56
+
57
+ This directory will contain:
58
+ - Collected articles (markdown)
59
+ - Analysis JSON files
60
+ - Generated voice guides
61
+
62
+ ### 2. Collect Your Writing
63
+
64
+ ```
65
+ Collect corpus from https://yoursite.com/post-sitemap.xml
66
+ Save as "my-voice" in "C:\writing\voice-models"
67
+ ```
68
+
69
+ The tool needs:
70
+ - `sitemap_url` - Your XML sitemap URL
71
+ - `output_name` - A name for this corpus (e.g., "my-voice", "blog-posts")
72
+ - `output_dir` - The directory you created above
73
+
74
+ **Example with all parameters:**
75
+ ```
76
+ Collect corpus from https://example.com/post-sitemap.xml
77
+ Output name: "technical-writing"
78
+ Output directory: "C:\writing\voice-models"
79
+ Maximum articles: 50
80
+ ```
81
+
82
+ ### 3. Analyse Patterns
83
+
84
+ ```
85
+ Analyse corpus "my-voice" in directory "C:\writing\voice-models"
86
+ ```
87
+
88
+ This runs 14 analysers covering vocabulary, sentence structure, voice markers, argument flow, and paragraph transitions.
89
+
90
+ ### 4. Generate Voice Guide
91
+
92
+ ```
93
+ Generate narrative guide for "my-voice" in directory "C:\writing\voice-models"
94
+ ```
95
+
96
+ Creates an immersive style guide with annotated examples at:
97
+ `C:\writing\voice-models\my-voice\writing_style_my-voice_narrative.md`
98
+
99
+ ## Using the Voice Guide
100
+
101
+ Once generated, the voice guide can be loaded into any LLM conversation to help it write in your voice.
102
+
103
+ ### Loading the Guide
104
+
105
+ ```
106
+ Load the file C:\writing\voice-models\my-voice\writing_style_my-voice_narrative.md
107
+ and use it as a reference for all writing in this conversation.
108
+ ```
109
+
110
+ ### Example Prompts for Content Generation
111
+
112
+ **Blog post:**
113
+ ```
114
+ Using the voice guide as your reference, write a blog post about [topic].
115
+
116
+ Key requirements:
117
+ - Match the sentence rhythm patterns shown in the examples
118
+ - Use the conversational devices naturally (not forced)
119
+ - Include the micro-rhythms: mid-thought pivots, embedded uncertainty, present-tense immediacy
120
+ - Vary sentence length as shown in the statistical targets
121
+ - Use British/American spelling as indicated in the guide
122
+ ```
123
+
124
+ **Technical article:**
125
+ ```
126
+ Reference the voice guide and write a technical explanation of [concept].
127
+
128
+ Channel the voice by:
129
+ - Opening with the pattern shown in "Opening Moves" section
130
+ - Using specific product/tool names, not generic references
131
+ - Including admissions of complexity or uncertainty where authentic
132
+ - Following the argument flow patterns from the guide
133
+ - Matching the punctuation habits (especially dash usage)
134
+ ```
135
+
136
+ **Product review:**
137
+ ```
138
+ Using the loaded voice guide, write a review of [product].
139
+
140
+ Capture the voice by:
141
+ - Starting with personal context (why you tested this)
142
+ - Blending technical specs with practical implications
143
+ - Using the transition patterns between paragraphs
144
+ - Including the "human tells" - parenthetical asides, mid-thought corrections
145
+ - Ending with the closing patterns shown in examples
146
+ ```
147
+
148
+ **Email/communication:**
149
+ ```
150
+ Write an email about [subject] using the voice patterns from the guide.
151
+
152
+ Focus on:
153
+ - Conversational markers appearing naturally
154
+ - Sentence length variation (some punchy, some complex)
155
+ - The hedging/confidence balance shown in statistics
156
+ - First-person usage matching the corpus frequency
157
+ ```
158
+
159
+ ### Validation After Writing
160
+
161
+ The guide includes statistical targets. After writing, check:
162
+
163
+ ```
164
+ Review what you just wrote against the voice guide metrics:
165
+ - Does sentence length variation match the target standard deviation?
166
+ - Is first-person frequency within the expected range?
167
+ - Are conversational markers present but not overused?
168
+ - Does the rhythm feel like the extended examples?
169
+ ```
170
+
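The first two checks in the validation prompt above can be approximated numerically. This is a rough sketch, not the package's analyser: the naive sentence splitter, the first-person word list, and the function names are all assumptions made for illustration, and the targets to compare against come from your own generated guide.

```python
import re
from statistics import mean, pstdev

# Assumed first-person word list; the package may count a different set.
FIRST_PERSON = {"i", "i'm", "i've", "i'd", "i'll", "me", "my", "mine", "myself"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def draft_metrics(text: str) -> dict:
    """Compute sentence-length spread and first-person frequency for a draft."""
    sentences = split_sentences(text)
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[A-Za-z']+", text.lower())
    first_person = sum(1 for w in words if w in FIRST_PERSON)
    return {
        "sentences": len(sentences),
        "mean_length": mean(lengths) if lengths else 0.0,
        # Population standard deviation of words per sentence.
        "stdev_length": pstdev(lengths) if lengths else 0.0,
        "first_person_per_100_words": 100 * first_person / len(words) if words else 0.0,
    }
```

Abbreviations such as "e.g." will be split as sentence ends by this splitter, so treat the numbers as a sanity check rather than a match for the guide's statistics.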
171
+ ## What Gets Analysed
172
+
173
+ ### Core Voice Patterns
174
+ - **Vocabulary** - Word choice, British/American markers, domain specificity
175
+ - **Sentence structure** - Length distribution, openers, complexity patterns
176
+ - **Voice markers** - First-person usage, hedging language, conversational markers
177
+ - **Punctuation** - Dash types, comma density, parenthetical frequency
178
+
179
+ ### Argument & Flow Patterns
180
+ - **Argument flow** - How you open, build, and close arguments
181
+ - **Paragraph transitions** - How ideas connect across paragraphs
182
+ - **Conversational devices** - "look", "frankly", "actually" and when they appear
183
+
184
+ ### Micro-Rhythm Detection
185
+ The guide annotates examples with invisible patterns that make writing feel human:
186
+ - Mid-thought pivots (comma before "and", "but", "so")
187
+ - Present-tense immediacy ("Right now, it's...")
188
+ - Embedded uncertainty ("I think", "probably")
189
+ - Casual sentence starters ("So,", "And,", "But,")
190
+ - Parenthetical asides
191
+ - Punchy fragments contrasting with longer sentences
192
+
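Several of the micro-rhythms listed above are simple enough to approximate with regular expressions. The patterns below are illustrative assumptions based only on the examples in the list; the package's real detectors are not shown in this diff and may work differently.

```python
import re

# Illustrative patterns only, derived from the examples in the list above.
PATTERNS = {
    # Comma followed by "and", "but" or "so".
    "mid_thought_pivot": re.compile(r",\s+(?:and|but|so)\b", re.IGNORECASE),
    # The two uncertainty markers quoted in the list.
    "embedded_uncertainty": re.compile(r"\b(?:I think|probably)\b", re.IGNORECASE),
    # "So,", "And," or "But," at the start of a line or right after a sentence end.
    "casual_starter": re.compile(r"(?:^|(?<=[.!?]\s))(?:So|And|But),", re.MULTILINE),
    # Text in parentheses, without nesting.
    "parenthetical_aside": re.compile(r"\([^()]+\)"),
}


def count_micro_rhythms(text: str) -> dict[str, int]:
    """Count non-overlapping matches of each illustrative pattern in `text`."""
    return {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}
```

Present-tense immediacy and punchy fragments are left out because they need more than a pattern match to detect reliably.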
193
+ ## Output Structure
194
+
195
+ ```
196
+ your-output-directory/
197
+ └── corpus-name/
198
+ ├── articles/ # Collected markdown files
199
+ ├── corpus.json # Metadata
200
+ ├── analysis/ # JSON analysis files
201
+ │ ├── vocabulary.json
202
+ │ ├── sentence.json
203
+ │ ├── voice.json
204
+ │ ├── paragraph.json
205
+ │ ├── punctuation.json
206
+ │ ├── function-words.json
207
+ │ ├── argument-flow.json
208
+ │ └── paragraph-transitions.json
209
+ └── writing_style_[name]_narrative.md # The voice guide
210
+ ```
211
+
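To confirm that a run produced the layout shown above, a small check like the following may help. It is a sketch that assumes the tree is accurate and places the guide inside the corpus directory, as in the Quick Start path; the function name `missing_outputs` is hypothetical. The README mentions 14 analysers but the tree lists eight JSON files, so the list here may not be exhaustive.

```python
from pathlib import Path

# File names taken from the tree above.
EXPECTED_ANALYSIS_FILES = [
    "vocabulary.json",
    "sentence.json",
    "voice.json",
    "paragraph.json",
    "punctuation.json",
    "function-words.json",
    "argument-flow.json",
    "paragraph-transitions.json",
]


def missing_outputs(output_dir: str, corpus_name: str) -> list[str]:
    """Return expected paths (relative to the corpus directory) that do not exist."""
    root = Path(output_dir) / corpus_name
    expected = [root / "articles", root / "corpus.json"]
    expected += [root / "analysis" / name for name in EXPECTED_ANALYSIS_FILES]
    expected.append(root / f"writing_style_{corpus_name}_narrative.md")
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]
```

An empty list means every file named in the tree is present; anything else names the step that likely still needs to run.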
212
+ ## Minimum Corpus Size
213
+
214
+ - **Minimum:** 15,000 words (~20 articles)
215
+ - **Recommended:** 30,000 words
216
+ - **Ideal:** 50,000+ words
217
+
218
+ Below 15k words, statistical patterns become unreliable.
219
+
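The thresholds above translate directly into a quick pre-flight check. This sketch assumes a plain whitespace word count, which may differ from how the package counts words; both function names are hypothetical.

```python
def count_words(markdown_texts: list[str]) -> int:
    """Total whitespace-separated tokens across all collected articles."""
    return sum(len(text.split()) for text in markdown_texts)


def corpus_tier(word_count: int) -> str:
    """Map a word count onto the README's corpus-size tiers."""
    if word_count < 15_000:
        return "below minimum"  # statistical patterns become unreliable
    if word_count < 30_000:
        return "minimum"
    if word_count < 50_000:
        return "recommended"
    return "ideal"
```

For example, a corpus of 20 articles averaging 750 words each totals 15,000 words and lands exactly on the minimum tier.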
220
+ ## Tools Reference
221
+
222
+ ### collect_corpus
223
+
224
+ | Parameter | Required | Description |
225
+ |-----------|----------|-------------|
226
+ | `sitemap_url` | Yes | XML sitemap URL |
227
+ | `output_name` | Yes | Corpus identifier (e.g., "my-voice") |
228
+ | `output_dir` | Yes | Directory to store corpus |
229
+ | `max_articles` | No | Limit (default: 100) |
230
+ | `article_pattern` | No | Regex filter for URLs |
231
+
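To preview which URLs an `article_pattern` and `max_articles` combination might select, the parameters in the table above can be mimicked offline. This is a sketch under assumptions, not the package's implementation: it assumes the pattern is applied as a regex search against each URL, that the limit is applied after filtering, and that the sitemap uses the standard sitemaps.org namespace. The function name `select_article_urls` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Standard sitemap namespace from the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def select_article_urls(
    sitemap_xml: str,
    article_pattern: Optional[str] = None,
    max_articles: int = 100,  # default taken from the table above
) -> list[str]:
    """Return sitemap URLs matching `article_pattern`, capped at `max_articles`."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]
    if article_pattern:
        regex = re.compile(article_pattern)
        # Assumption: a match anywhere in the URL counts.
        urls = [u for u in urls if regex.search(u)]
    return urls[:max_articles]
```

Sitemap index files, which point to other sitemaps instead of pages, are not handled here.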
232
+ ### analyze_corpus
233
+
234
+ | Parameter | Required | Description |
235
+ |-----------|----------|-------------|
236
+ | `corpus_name` | Yes | Name from collect_corpus |
237
+ | `corpus_dir` | Yes | Directory containing corpus |
238
+ | `analysis_type` | No | full, quick, vocabulary, syntax |
239
+
240
+ ### generate_narrative_guide
241
+
242
+ | Parameter | Required | Description |
243
+ |-----------|----------|-------------|
244
+ | `corpus_name` | Yes | Name from analyze_corpus |
245
+ | `corpus_dir` | Yes | Directory containing corpus |
246
+
247
+ ## Development
248
+
249
+ ```bash
250
+ git clone https://github.com/houtini-ai/mcp-server-voice-analysis.git
251
+ cd mcp-server-voice-analysis
252
+ npm install
253
+ npm run build
254
+ ```
255
+
256
+ ## Limitations
257
+
258
+ - Requires XML sitemap (RSS feeds not currently supported)
259
+ - Works best with consistent single-author content
260
+ - Mixed authorship or heavily edited content produces weaker signals
261
+ - The approach is experimental - results vary by writing style
262
+
263
+ ---
264
+
265
+ Apache License 2.0 - [Houtini.ai](https://houtini.ai)