site2voice-0.2.1.tar.gz

@@ -0,0 +1,18 @@
1
+ # Changelog
2
+
3
+ ## 0.2.1
4
+
5
+ - Added the `pypi` GitHub environment claim to the Trusted Publishing workflow.
6
+
7
+ ## 0.2.0
8
+
9
+ - Added `site2voice bench` for before/after voice-alignment scoring.
10
+ - Added style fingerprints for heading shape, paragraph rhythm, CTA shape, CTA
11
+ verbs, and lexical variety.
12
+ - Added copy-safety and claim-boundary gates.
13
+ - Added synthetic editorial fixtures and benchmark report.
14
+ - Added source-candidate, benchmark, awesome-eligibility, and harness docs.
15
+
16
+ ## 0.1.0
17
+
18
+ - Initial CLI for generating `VOICE.md` from URL or local HTML copy.
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Sihyeon Jeon
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,3 @@
1
+ include CHANGELOG.md
2
+ recursive-include docs *.md
3
+ recursive-include examples *.html *.md
@@ -0,0 +1,134 @@
1
+ Metadata-Version: 2.4
2
+ Name: site2voice
3
+ Version: 0.2.1
4
+ Summary: Generate AI-agent VOICE.md files from website copy and CTAs.
5
+ Author: Sihyeon Jeon
6
+ License-Expression: MIT
7
+ Keywords: agents,voice,copywriting,markdown,branding
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: Environment :: Console
10
+ Classifier: Topic :: Text Processing :: Markup :: HTML
11
+ Classifier: Topic :: Software Development :: Documentation
12
+ Requires-Python: >=3.10
13
+ Description-Content-Type: text/markdown
14
+ License-File: LICENSE
15
+ Dynamic: license-file
16
+
17
+ # site2voice
18
+
19
+ **Generate `VOICE.md` from any website.**
20
+
21
+ `site2voice` reads website copy and writes a small Markdown brief that tells an
22
+ AI coding agent how the site sounds: headings, CTAs, navigation labels, sentence
23
+ shape, repeated vocabulary, and claim boundaries.
24
+
25
+ ```bash
26
+ pipx install site2voice
27
+
28
+ site2voice https://example.com --out VOICE.md
29
+ ```
30
+
31
+ From a repo clone, run the included benchmark fixture:
32
+
33
+ ```bash
34
+ site2voice examples/saas-home.html --format json
35
+ site2voice bench examples/editorial-home.html examples/before-copy.md examples/after-copy.md
36
+ ```
37
+
38
+ ## Why
39
+
40
+ `DESIGN.md` helps agents stop guessing visual style. `VOICE.md` helps them stop
41
+ guessing copy style.
42
+
43
+ Drop the generated file into a project and tell the agent:
44
+
45
+ ```text
46
+ Use @VOICE.md for landing-page copy, headings, CTAs, and UI microcopy.
47
+ ```
48
+
49
+ ## Output
50
+
51
+ ```md
52
+ # VOICE.md
53
+
54
+ ## Voice Summary
55
+
56
+ - Overall tone: explanatory, action-oriented, trust-forward.
57
+ - Sentence shape: about 20.4 words per sentence.
58
+ - Main vocabulary: `teams`, `security`, `pricing`, `launch`.
59
+ - Common CTAs: `Start free`, `Book a demo`, `See pricing`.
60
+
61
+ ## Agent Rules
62
+
63
+ - Start with a concrete user outcome before describing implementation details.
64
+ - Prefer short active sentences and visible verbs from the CTA list.
65
+ - Do not invent compliance, security, customer, or performance claims.
66
+ ```
67
+
68
+ The real output also includes a small style fingerprint for heading length,
69
+ paragraph rhythm, CTA shape, CTA verbs, and lexical variety.
70
+
71
+ ## What It Does
72
+
73
+ - Reads a URL or local HTML file.
74
+ - Extracts title, meta description, headings, links, buttons, and paragraphs.
75
+ - Finds CTA candidates from short action-led links/buttons.
76
+ - Measures average sentence length.
77
+ - Extracts a compact style fingerprint: heading shape, paragraph rhythm,
78
+ CTA shape, CTA verbs, and lexical variety.
79
+ - Builds a repeated-vocabulary lexicon.
80
+ - Writes Markdown or JSON.
81
+ - Benchmarks candidate copy against a source voice profile.
82
+ - Gates against unsupported claims and copied spans.
83
+ - Uses only the Python standard library.
84
+
85
+ ## Benchmark
86
+
87
+ `site2voice bench` compares candidate copy against measurable source signals:
88
+ sentence length, vocabulary overlap, CTA shape, tone labels, heading shape,
89
+ claim boundaries, and copy safety.
90
+
91
+ ```bash
92
+ site2voice bench examples/editorial-home.html \
93
+ examples/before-copy.md \
94
+ examples/after-copy.md \
95
+ --out examples/editorial-benchmark.md
96
+ ```
97
+
98
+ | Candidate | Result | Overall | Lexicon | Copy safety |
99
+ | --- | --- | ---: | ---: | ---: |
100
+ | `after-copy` | PASS | 83.8 | 70.0 | 93.2 |
101
+ | `before-copy` | FAIL | 36.6 | 0.0 | 100.0 |
102
+
103
+ The benchmark rewards measurable voice alignment without rewarding verbatim
104
+ copying.
105
+
106
+ ## What It Is Not
107
+
108
+ - Not an official brand guideline.
109
+ - Not a DESIGN.md visual-token extractor.
110
+ - Not a crawler for private pages or authenticated apps.
111
+ - Not an LLM prompt that copies a site's prose.
112
+
113
+ ## Develop
114
+
115
+ ```bash
116
+ python3 -m pip install -e .
117
+ make test
118
+ make bench
119
+ site2voice examples/saas-home.html --out examples/saas-VOICE.md
120
+ ```
121
+
122
+ ## Links
123
+
124
+ - [Research](docs/research.md)
125
+ - [Benchmark](docs/benchmark.md)
126
+ - [Voice patterns](docs/voice-patterns.md)
127
+ - [Source candidates](docs/source-candidates.md)
128
+ - [Awesome eligibility](docs/awesome-eligibility.md)
129
+ - [Harness](docs/harness.md)
130
+ - [Launch kit](docs/launch-kit.md)
131
+
132
+ ## License
133
+
134
+ MIT
@@ -0,0 +1,118 @@
1
+ # site2voice
2
+
3
+ **Generate `VOICE.md` from any website.**
4
+
5
+ `site2voice` reads website copy and writes a small Markdown brief that tells an
6
+ AI coding agent how the site sounds: headings, CTAs, navigation labels, sentence
7
+ shape, repeated vocabulary, and claim boundaries.
8
+
9
+ ```bash
10
+ pipx install site2voice
11
+
12
+ site2voice https://example.com --out VOICE.md
13
+ ```
14
+
15
+ From a repo clone, run the included benchmark fixture:
16
+
17
+ ```bash
18
+ site2voice examples/saas-home.html --format json
19
+ site2voice bench examples/editorial-home.html examples/before-copy.md examples/after-copy.md
20
+ ```
21
+
22
+ ## Why
23
+
24
+ `DESIGN.md` helps agents stop guessing visual style. `VOICE.md` helps them stop
25
+ guessing copy style.
26
+
27
+ Drop the generated file into a project and tell the agent:
28
+
29
+ ```text
30
+ Use @VOICE.md for landing-page copy, headings, CTAs, and UI microcopy.
31
+ ```
32
+
33
+ ## Output
34
+
35
+ ```md
36
+ # VOICE.md
37
+
38
+ ## Voice Summary
39
+
40
+ - Overall tone: explanatory, action-oriented, trust-forward.
41
+ - Sentence shape: about 20.4 words per sentence.
42
+ - Main vocabulary: `teams`, `security`, `pricing`, `launch`.
43
+ - Common CTAs: `Start free`, `Book a demo`, `See pricing`.
44
+
45
+ ## Agent Rules
46
+
47
+ - Start with a concrete user outcome before describing implementation details.
48
+ - Prefer short active sentences and visible verbs from the CTA list.
49
+ - Do not invent compliance, security, customer, or performance claims.
50
+ ```
51
+
52
+ The real output also includes a small style fingerprint for heading length,
53
+ paragraph rhythm, CTA shape, CTA verbs, and lexical variety.
54
+
55
+ ## What It Does
56
+
57
+ - Reads a URL or local HTML file.
58
+ - Extracts title, meta description, headings, links, buttons, and paragraphs.
59
+ - Finds CTA candidates from short action-led links/buttons.
60
+ - Measures average sentence length.
61
+ - Extracts a compact style fingerprint: heading shape, paragraph rhythm,
62
+ CTA shape, CTA verbs, and lexical variety.
63
+ - Builds a repeated-vocabulary lexicon.
64
+ - Writes Markdown or JSON.
65
+ - Benchmarks candidate copy against a source voice profile.
66
+ - Gates against unsupported claims and copied spans.
67
+ - Uses only the Python standard library.
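The extraction steps listed above can be sketched with the standard library's `html.parser`. This is a minimal illustration, not the package's actual implementation: the `CopyExtractor` class, the `cta_candidates` helper, the `ACTION_VERBS` set, and the four-word limit are all hypothetical names and values chosen for the example.

```python
from html.parser import HTMLParser

# Hypothetical verb list for illustration; the real CTA heuristics are not shown in the docs.
ACTION_VERBS = {"start", "book", "see", "read", "explore", "join", "get", "open", "view", "subscribe"}


class CopyExtractor(HTMLParser):
    """Collect heading, paragraph, and link/button text from an HTML document."""

    HEADINGS = {"h1", "h2", "h3"}
    TRACKED = HEADINGS | {"p", "a", "button"}

    def __init__(self):
        super().__init__()
        self.headings, self.paragraphs, self.actions = [], [], []
        self._tag = None      # outermost tracked tag currently open
        self._buffer = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        # Only the outermost tracked tag starts a capture; nested tags add to its text.
        if tag in self.TRACKED and self._tag is None:
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        self._tag = None
        if not text:
            return
        if tag in self.HEADINGS:
            self.headings.append(text)
        elif tag == "p":
            self.paragraphs.append(text)
        else:
            self.actions.append(text)


def cta_candidates(actions, max_words=4):
    """Keep short link/button labels whose first word is an action verb."""
    picked = []
    for label in actions:
        words = label.split()
        if words and len(words) <= max_words and words[0].lower() in ACTION_VERBS:
            picked.append(label)
    return picked
```

Feeding a page to `CopyExtractor().feed(html)` and passing its `actions` list to `cta_candidates` separates labels such as `Start free` from navigation links such as `About the company`.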
68
+
69
+ ## Benchmark
70
+
71
+ `site2voice bench` compares candidate copy against measurable source signals:
72
+ sentence length, vocabulary overlap, CTA shape, tone labels, heading shape,
73
+ claim boundaries, and copy safety.
74
+
75
+ ```bash
76
+ site2voice bench examples/editorial-home.html \
77
+ examples/before-copy.md \
78
+ examples/after-copy.md \
79
+ --out examples/editorial-benchmark.md
80
+ ```
81
+
82
+ | Candidate | Result | Overall | Lexicon | Copy safety |
83
+ | --- | --- | ---: | ---: | ---: |
84
+ | `after-copy` | PASS | 83.8 | 70.0 | 93.2 |
85
+ | `before-copy` | FAIL | 36.6 | 0.0 | 100.0 |
86
+
87
+ The benchmark rewards measurable voice alignment without rewarding verbatim
88
+ copying.
89
+
90
+ ## What It Is Not
91
+
92
+ - Not an official brand guideline.
93
+ - Not a DESIGN.md visual-token extractor.
94
+ - Not a crawler for private pages or authenticated apps.
95
+ - Not an LLM prompt that copies a site's prose.
96
+
97
+ ## Develop
98
+
99
+ ```bash
100
+ python3 -m pip install -e .
101
+ make test
102
+ make bench
103
+ site2voice examples/saas-home.html --out examples/saas-VOICE.md
104
+ ```
105
+
106
+ ## Links
107
+
108
+ - [Research](docs/research.md)
109
+ - [Benchmark](docs/benchmark.md)
110
+ - [Voice patterns](docs/voice-patterns.md)
111
+ - [Source candidates](docs/source-candidates.md)
112
+ - [Awesome eligibility](docs/awesome-eligibility.md)
113
+ - [Harness](docs/harness.md)
114
+ - [Launch kit](docs/launch-kit.md)
115
+
116
+ ## License
117
+
118
+ MIT
@@ -0,0 +1,51 @@
1
+ # Awesome Eligibility
2
+
3
+ Current status: **not ready to submit broadly**.
4
+
5
+ The project now has a public repo, a release, examples, tests, and a benchmark.
6
+ That is enough for lightweight launch posts, but not enough for high-signal
7
+ awesome-list submissions. Awesome maintainers usually expect visible usage,
8
+ clear novelty, and a stable public surface.
9
+
10
+ ## Best Fit
11
+
12
+ | List family | Fit | Why |
13
+ | --- | --- | --- |
14
+ | Vibe coding / context engineering | High (later) | `site2voice` creates agent context files and includes an eval path. |
15
+ | Prompt engineering tools | Medium (later) | Useful for prompt/context prep, but not a prompt library. |
16
+ | AI for design | Medium-low | Related to design handoff, but voice/copy is not visual design. |
17
+ | Claude Code tools | Not yet | Some lists require high star counts or mature adoption. |
18
+
19
+ ## Submit When
20
+
21
+ - v0.2+ release is public and installable.
22
+ - README shows one command, one result, and one before/after benchmark.
23
+ - At least one real-world demo or user workflow exists.
24
+ - No generated third-party prose is committed.
25
+ - The submission can be phrased as a context-engineering tool, not a brand
26
+ imitation tool.
27
+
28
+ ## Candidate Positioning
29
+
30
+ Use this description:
31
+
32
+ > Generate compact `VOICE.md` context files from website copy, then benchmark
33
+ > whether AI-written copy matches the source voice without copying spans or
34
+ > inventing claims.
35
+
36
+ Avoid:
37
+
38
+ - asking for stars;
39
+ - submitting to unrelated design lists;
40
+ - claiming official brand replication;
41
+ - using EYESMAG, Highsnobiety, Stripe, or Linear outputs as public examples
42
+ unless only derived metrics are shown.
43
+
44
+ ## References
45
+
46
+ - [no-fluff/awesome-vibe-coding](https://github.com/no-fluff/awesome-vibe-coding)
47
+ - [taskade/awesome-vibe-coding](https://github.com/taskade/awesome-vibe-coding)
48
+ - [subinium/awesome-claude-code](https://github.com/subinium/awesome-claude-code)
49
+ - [promptslab/awesome-prompt-engineering](https://github.com/promptslab/awesome-prompt-engineering)
50
+ - [allanjsx/awesome-ai-for-design](https://github.com/allanjsx/awesome-ai-for-design)
51
+
@@ -0,0 +1,47 @@
1
+ # Benchmark
2
+
3
+ `site2voice bench` checks whether candidate copy follows a source voice profile.
4
+ It is deterministic and uses only local text statistics.
5
+
6
+ ```bash
7
+ site2voice bench examples/editorial-home.html \
8
+ examples/before-copy.md \
9
+ examples/after-copy.md \
10
+ --out examples/editorial-benchmark.md
11
+ ```
12
+
13
+ ## Metrics
14
+
15
+ | Metric | What it checks |
16
+ | --- | --- |
17
+ | Sentence fit | Candidate average sentence length vs source average. |
18
+ | Lexicon fit | Whether candidate reuses source vocabulary without requiring exact prose. |
19
+ | CTA fit | Whether short action-led lines match observed CTA vocabulary. |
20
+ | Tone fit | Overlap between deterministic tone labels. |
21
+ | Heading fit | Whether heading length/shape is close to the source. |
22
+ | Claim safety | Penalizes unsupported security, performance, customer, pricing, and AI claims. |
23
+ | Copy safety | Penalizes long shared spans and high 5-gram overlap with source snippets. |
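The two copy-safety signals named in the table, long shared spans and 5-gram overlap, can be computed from plain token lists. The sketch below is one way to do that under stated assumptions: the tokenizer, the function names, and the choice to normalize overlap by the candidate's n-gram count are illustrative and are not taken from the package.

```python
import re


def tokens(text):
    """Lowercase word tokens; a simplified tokenizer assumed for this sketch."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(words, n=5):
    """Set of contiguous n-word tuples (empty when the text is shorter than n)."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(candidate, source, n=5):
    """Fraction of the candidate's n-grams that also appear in the source."""
    cand = ngrams(tokens(candidate), n)
    if not cand:
        return 0.0
    return len(cand & ngrams(tokens(source), n)) / len(cand)


def longest_shared_run(candidate, source):
    """Length in words of the longest contiguous sequence present in both texts."""
    a, b = tokens(candidate), tokens(source)
    best = 0
    prev = [0] * (len(b) + 1)
    # Classic longest-common-substring dynamic program over word tokens.
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

A scorer could turn these two numbers into a penalty, for example by subtracting points once the shared run exceeds a chosen length; the thresholds used by `site2voice` are not documented here.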
24
+
25
+ The reference profile also keeps a style fingerprint: average heading length,
26
+ average paragraph length, average CTA length, CTA verbs, lexical variety, and
27
+ punctuation counts.
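A fingerprint of this kind can be built from a few averages and counts. The following sketch assumes that "lexical variety" means a type-token ratio and that a CTA's verb is its first word; both are assumptions for illustration, and `style_fingerprint` and its dictionary keys are hypothetical names.

```python
import re
from collections import Counter

WORD = re.compile(r"[A-Za-z0-9']+")


def word_count(text):
    return len(WORD.findall(text))


def average_words(items):
    """Mean word count per item, rounded to one decimal place."""
    if not items:
        return 0.0
    return round(sum(word_count(item) for item in items) / len(items), 1)


def style_fingerprint(headings, paragraphs, ctas):
    """Summarize heading, paragraph, and CTA shape as plain numbers."""
    words = [w.lower() for p in paragraphs for w in WORD.findall(p)]
    punctuation = Counter(ch for p in paragraphs for ch in p if ch in ".,;:!?")
    return {
        "avg_heading_words": average_words(headings),
        "avg_paragraph_words": average_words(paragraphs),
        "avg_cta_words": average_words(ctas),
        # Assumption: the first word of a CTA is treated as its verb.
        "cta_verbs": sorted({c.split()[0].lower() for c in ctas if c.split()}),
        # Assumption: lexical variety is unique words divided by total words.
        "lexical_variety": round(len(set(words)) / len(words), 3) if words else 0.0,
        "punctuation": dict(punctuation),
    }
```

Because every field is a deterministic function of the input text, two runs over the same page produce the same fingerprint.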
28
+
29
+ ## Gates
30
+
31
+ A candidate passes when:
32
+
33
+ - overall score is at least 75;
34
+ - copy safety is at least 85;
35
+ - claim safety is at least 75.
36
+
37
+ These gates matter because directly copied text can look stylistically aligned
38
+ while still being the wrong behavior. Copy safety is a gate, not just a metric.
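The three thresholds above amount to a conjunction of minimum scores. The sketch below encodes them directly; the `Scores` dataclass, the `GATES` mapping, and the helper names are hypothetical and only mirror the documented numbers (75, 85, 75).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    overall: float
    copy_safety: float
    claim_safety: float


# Minimum values taken from the gate list above.
GATES = {"overall": 75.0, "copy_safety": 85.0, "claim_safety": 75.0}


def failed_gates(scores):
    """Names of the gates whose minimum the candidate does not reach."""
    return [name for name, minimum in GATES.items() if getattr(scores, name) < minimum]


def passes(scores):
    """A candidate passes only when every gate is met."""
    return not failed_gates(scores)
```

With the scores reported for the editorial fixture, `Scores(83.8, 93.2, 100.0)` passes, while `Scores(36.6, 100.0, 100.0)` fails on `overall` alone. A candidate with a high overall score but copy safety below 85 would still fail, which is the point of treating copy safety as a gate.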
39
+
40
+ ## Example
41
+
42
+ | Candidate | Result | Overall | Sentence | Lexicon | CTA | Tone | Heading | Claim safety | Copy safety |
43
+ | --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
44
+ | `after-copy` | PASS | 83.8 | 95.8 | 70.0 | 50.0 | 100.0 | 93.8 | 100.0 | 93.2 |
45
+ | `before-copy` | FAIL | 36.6 | 57.5 | 0.0 | 0.0 | 0.0 | 62.5 | 100.0 | 100.0 |
46
+
47
+ See [editorial-benchmark.md](../examples/editorial-benchmark.md).
@@ -0,0 +1,34 @@
1
+ # Harness
2
+
3
+ `site2voice` is built as a small agent-orchestration harness:
4
+
5
+ 1. **Scout** finds relevant public sources and decides whether they are safe as
6
+ live examples or should be represented by synthetic fixtures.
7
+ 2. **Profiler** runs `site2voice SOURCE --format json` and extracts measurable
8
+ voice signals: sentence shape, heading shape, CTA verbs, lexical variety,
9
+ and repeated vocabulary.
10
+ 3. **Writer** uses the generated `VOICE.md` as context to write new candidate
11
+ copy.
12
+ 4. **Evaluator** runs `site2voice bench SOURCE before.md after.md` and checks
13
+ alignment, claim boundaries, and copy safety.
14
+ 5. **Publisher** updates examples, docs, releases, and launch notes.
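The Profiler and Evaluator steps above map onto two CLI invocations, which a script could assemble as follows. This is a sketch under assumptions: the helper names are hypothetical, only flags that appear in these docs (`--format json`, `--out`) are used, and it is assumed rather than confirmed that the CLI writes its result to standard output when `--out` is omitted.

```python
import subprocess


def profile_cmd(source):
    """Argument list for the Profiler step: site2voice SOURCE --format json."""
    return ["site2voice", source, "--format", "json"]


def bench_cmd(source, before, after, out=None):
    """Argument list for the Evaluator step: site2voice bench SOURCE before.md after.md."""
    cmd = ["site2voice", "bench", source, before, after]
    if out:
        cmd += ["--out", out]
    return cmd


def run(cmd):
    """Run a command and return its stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Keeping command construction separate from execution makes the Scout, Writer, and Publisher steps easy to slot in around it, since those steps are performed by agents or people rather than by the CLI.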
15
+
16
+ ## Claude Operator Review
17
+
18
+ A Claude operator review agreed with the direction and recommended:
19
+
20
+ - deterministic style fingerprints instead of vague tone claims;
21
+ - a scorer before additional promotion;
22
+ - explicit copy-safety and unsupported-claim gates;
23
+ - source candidates as live examples, not committed third-party prose.
24
+
25
+ ## Current Team Findings
26
+
27
+ - Editorial/magazine sources are useful for tone, but checked-in examples should
28
+ be synthetic.
29
+ - `site2voice` should be submitted to awesome lists only after release/demo
30
+ evidence is stronger.
31
+ - Best awesome-list fit found so far: vibe-coding/context-engineering lists, not
32
+ visual design lists.
33
+
34
+ See [awesome-eligibility.md](awesome-eligibility.md) for submission gates.
@@ -0,0 +1,38 @@
1
+ # Launch Kit
2
+
3
+ ## One-Line Pitch
4
+
5
+ `site2voice` turns any website into a small `VOICE.md` file for AI coding
6
+ agents.
7
+
8
+ ## Short Post
9
+
10
+ I made `site2voice`, a tiny CLI that generates agent-readable `VOICE.md` files
11
+ from website copy.
12
+
13
+ It extracts headings, CTA text, navigation labels, sentence length, and repeated
14
+ vocabulary, then writes a concise voice brief for Claude Code, Cursor, Codex, or
15
+ any other coding agent.
16
+
17
+ ```bash
18
+ pipx install site2voice
19
+ site2voice https://example.com --out VOICE.md
20
+
21
+ # from a repo clone
22
+ site2voice bench examples/editorial-home.html examples/before-copy.md examples/after-copy.md
23
+ ```
24
+
25
+ No LLM dependency. No browser dependency. The goal is simple: give agents voice
26
+ and positioning context before they write new UI copy.
27
+
28
+ The benchmark path shows whether the generated copy actually moved closer to
29
+ the source voice, without rewarding copied spans.
30
+
31
+ Repo: https://github.com/SihyeonJeon/site2voice
32
+
33
+ ## Avoid
34
+
35
+ - Do not claim official brand approval.
36
+ - Do not paste long generated excerpts from third-party sites.
37
+ - Do not ask for stars.
38
+ - Do not spam unrelated design repositories.
@@ -0,0 +1,69 @@
1
+ # Research
2
+
3
+ ## What Exists
4
+
5
+ DESIGN.md-style repos and services give AI coding agents visual design context:
6
+ colors, typography, spacing, layout patterns, and component guidance.
7
+
8
+ Observed categories:
9
+
10
+ - DESIGN.md libraries for ready-made visual systems.
11
+ - URL-to-DESIGN.md generators for design tokens and visual patterns.
12
+ - Agent memory files such as `AGENTS.md`, `CLAUDE.md`, and `DESIGN.md`.
13
+
14
+ ## Gap
15
+
16
+ Visual style is only half of what an agent needs. When generating landing pages,
17
+ docs, onboarding flows, and UI microcopy, the agent also needs:
18
+
19
+ - message hierarchy;
20
+ - CTA language;
21
+ - navigation vocabulary;
22
+ - sentence length;
23
+ - claim boundaries;
24
+ - words to reuse and words to avoid.
25
+
26
+ `site2voice` targets that gap with a deterministic `VOICE.md` generator.
27
+
28
+ ## New Evidence Path
29
+
30
+ The project now includes `site2voice bench`, which measures whether candidate
31
+ copy follows a source voice profile. This turns `VOICE.md` from a descriptive
32
+ artifact into a testable workflow:
33
+
34
+ ```bash
35
+ site2voice SOURCE --out VOICE.md
36
+ site2voice bench SOURCE before.md after.md
37
+ ```
38
+
39
+ The benchmark scores sentence shape, vocabulary overlap, CTA shape, heading
40
+ shape, tone labels, claim boundaries, and copy safety.
41
+
42
+ ## Product Bet
43
+
44
+ The README should be short:
45
+
46
+ ```bash
47
+ site2voice https://example.com --out VOICE.md
48
+ ```
49
+
50
+ The output should answer:
51
+
52
+ - What does this site sound like?
53
+ - What CTAs does it use?
54
+ - What vocabulary should an agent reuse?
55
+ - What claims must the agent avoid inventing?
56
+
57
+ ## Boundaries
58
+
59
+ - No LLM dependency.
60
+ - No screenshot or brand asset copying.
61
+ - Do not paste long source paragraphs into the generated file.
62
+ - Treat the output as a writing/style brief, not legal brand guidance.
63
+
64
+ ## References
65
+
66
+ - [DESIGN.md library](https://designmd.app/)
67
+ - [Better Stack guide to DESIGN.md](https://betterstack.com/community/guides/ai/design-md-ai/)
68
+ - [DESIGN.MD by Parallect coverage](https://chatgate.ai/post/design-md-by-parallect)
69
+ - [DesignMD URL-to-DESIGN.md generator](https://www.designmd.cc/)
@@ -0,0 +1,22 @@
1
+ # Source Candidates
2
+
3
+ Use these as live examples for local analysis. Do not commit generated outputs
4
+ from third-party sites unless you remove source snippets and preserve only
5
+ derived metrics.
6
+
7
+ | Source | Why it is useful | Use safely |
8
+ | --- | --- | --- |
9
+ | [EYESMAG](https://www.eyesmag.com/) | Korean fashion/lifestyle web-magazine tone, mixed brand names, short trend labels. | Prefer live local runs or synthetic Korean fixtures. |
10
+ | [Highsnobiety](https://www.highsnobiety.com/) | Culture, fashion, commerce, and editorial headline cadence. | Use homepage/category pages; do not republish article body output. |
11
+ | [Hypebeast](https://hypebeast.com/) | Trend-driven streetwear/product vocabulary and title patterns. | Use as a live URL example; JS-heavy pages may need future browser support. |
12
+ | [Monocle](https://monocle.com/) | Polished global affairs, design, travel, and city-guide register. | Good live source for nav/category/CTA patterns. |
13
+ | [Wallpaper](https://www.wallpaper.com/) | Design, interiors, architecture, and product-focused editorial voice. | Use derived metrics, not copied article snippets. |
14
+ | [Dazed](https://www.dazeddigital.com/) | Youth culture, fashion, music, and art phrasing. | Prefer synthetic fixtures for public examples. |
15
+ | [Stripe](https://stripe.com/) | Developer/business SaaS claims, technical nouns, conversion CTAs. | Good contrast against editorial fixtures. |
16
+ | [Linear](https://linear.app/) | Modern product-team copy, short headings, system-oriented vocabulary. | Good live source, but homepage content changes often. |
17
+ | [Vercel](https://vercel.com/) | Developer platform voice, performance claims, product CTA density. | Good for claim-boundary tests. |
18
+
19
+ ## Fixture Policy
20
+
21
+ Checked-in examples should be synthetic. Live URL extraction is for local
22
+ analysis and should not redistribute third-party prose.
@@ -0,0 +1,56 @@
1
+ # Voice Patterns
2
+
3
+ `site2voice` does not try to copy a brand or magazine. It extracts measurable
4
+ signals that help an agent write new copy in a nearby register without pasting
5
+ source prose.
6
+
7
+ ## Editorial / Magazine
8
+
9
+ What to measure:
10
+
11
+ - short section labels mixed with longer descriptive paragraphs;
12
+ - named entities, places, seasons, materials, products, and collaborators;
13
+ - restrained adjectives before concrete nouns;
14
+ - headline length and whether titles read like indexes, blurbs, or news alerts;
15
+ - CTA words such as `read`, `explore`, `view`, `subscribe`, or `join`.
16
+
17
+ Good source families:
18
+
19
+ - EYESMAG-style Korean fashion and lifestyle indexing;
20
+ - Highsnobiety/Hypebeast-style culture and product trend coverage;
21
+ - Monocle/Wallpaper-style design, city, travel, and object language;
22
+ - Dazed/i-D-style youth culture and fashion framing.
23
+
24
+ Use synthetic fixtures for checked-in examples. Use live URLs only for local
25
+ analysis or private benchmark runs.
26
+
27
+ ## Product / Developer Sites
28
+
29
+ What to measure:
30
+
31
+ - outcome-first headings;
32
+ - short conversion CTAs;
33
+ - technical nouns and integration terms;
34
+ - proof language, customer claims, security claims, and performance claims;
35
+ - whether copy starts with user value or implementation detail.
36
+
37
+ Good source families:
38
+
39
+ - Stripe, Vercel, Linear, GitHub, and other public product pages.
40
+
41
+ These sources are useful contrast sets. They expose whether candidate copy is
42
+ accidentally drifting from editorial voice into SaaS launch-page voice.
43
+
44
+ ## Agent Handoff
45
+
46
+ Give the agent the generated `VOICE.md`, then score the result:
47
+
48
+ ```bash
49
+ site2voice https://example.com --out VOICE.md --max-snippets 0
50
+ site2voice bench https://example.com before.md after.md
51
+ ```
52
+
53
+ The first command gives the agent a voice brief. The second command verifies
54
+ whether the resulting copy moved closer to the source profile while staying
55
+ inside copy-safety and claim-safety gates.
56
+
@@ -0,0 +1,9 @@
1
+ # The slow return of the useful object
2
+
3
+ Current Index follows the small studios, shopkeepers, and collectors turning everyday materials into a quieter kind of record.
4
+
5
+ ## A guide shaped by rooms, streets, and habit
6
+
7
+ Read the feature
8
+ Explore rooms
9
+ Open the guide
@@ -0,0 +1,9 @@
1
+ # Powerful content for modern brands
2
+
3
+ Our platform helps teams create better marketing with seamless workflows and next-generation automation.
4
+
5
+ ## Everything you need
6
+
7
+ Use our all-in-one tools to boost productivity, improve collaboration, and unlock growth faster.
8
+
9
+ Get started today
@@ -0,0 +1,26 @@
1
+ # Voice Benchmark
2
+
3
+ Reference: `examples/editorial-home.html`
4
+
5
+ ## Reference Profile
6
+
7
+ - Tone: explanatory
8
+ - Average sentence length: 21.4 words
9
+ - Lexicon: `city`, `archive`, `design`, `style`, `join`, `list`, `quiet`, `return`, `studio`, `object`, `current`, `index`
10
+ - CTAs: `Join the list`, `Explore rooms`
11
+
12
+ ## Scores
13
+
14
+ | Candidate | Result | Overall | Sentence | Lexicon | CTA | Tone | Heading | Claim safety | Copy safety |
15
+ | --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
16
+ | `after-copy` | **PASS** | 83.8 | 95.8 | 70.0 | 50.0 | 100.0 | 93.8 | 100.0 | 93.2 |
17
+ | `before-copy` | **FAIL** | 36.6 | 57.5 | 0.0 | 0.0 | 0.0 | 62.5 | 100.0 | 100.0 |
18
+
19
+ ## Why This Is Useful
20
+
21
+ The score is deterministic. It checks measurable voice signals and gates against unsupported claims and copied spans.
22
+
23
+ ## Candidate Evidence
24
+
25
+ - `after-copy` reused source lexicon: `collectors`, `current`, `follows`, `index`, `object`, `return`, `shopkeepers`; longest shared run: 5 words.
26
+ - `before-copy` reused source lexicon: none; longest shared run: 1 word.
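The "reused source lexicon" evidence above can be approximated as a set intersection over content words. The sketch below is illustrative only: the stopword list, the minimum word length, and the function names are assumptions, and the package's real lexicon rules may differ.

```python
import re
from collections import Counter

# Small illustrative stopword list; an assumption, not the package's list.
STOPWORDS = {"the", "and", "for", "with", "into", "that", "this", "from", "our"}


def content_words(text):
    """Lowercase words longer than two letters that are not stopwords."""
    return [
        w for w in re.findall(r"[a-z']+", text.lower())
        if len(w) > 2 and w not in STOPWORDS
    ]


def repeated_lexicon(text, min_count=2):
    """Content words that occur at least min_count times, most frequent first."""
    counts = Counter(content_words(text))
    return [word for word, count in counts.most_common() if count >= min_count]


def reused_lexicon(candidate, source):
    """Sorted content words that the candidate shares with the source."""
    return sorted(set(content_words(candidate)) & set(content_words(source)))
```

An empty result from `reused_lexicon`, as in the `before-copy` row, is what drives a lexicon score of 0.0 under this kind of measure.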