site2voice 0.2.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- site2voice-0.2.1/CHANGELOG.md +18 -0
- site2voice-0.2.1/LICENSE +21 -0
- site2voice-0.2.1/MANIFEST.in +3 -0
- site2voice-0.2.1/PKG-INFO +134 -0
- site2voice-0.2.1/README.md +118 -0
- site2voice-0.2.1/docs/awesome-eligibility.md +51 -0
- site2voice-0.2.1/docs/benchmark.md +47 -0
- site2voice-0.2.1/docs/harness.md +34 -0
- site2voice-0.2.1/docs/launch-kit.md +38 -0
- site2voice-0.2.1/docs/research.md +69 -0
- site2voice-0.2.1/docs/source-candidates.md +22 -0
- site2voice-0.2.1/docs/voice-patterns.md +56 -0
- site2voice-0.2.1/examples/after-copy.md +9 -0
- site2voice-0.2.1/examples/before-copy.md +9 -0
- site2voice-0.2.1/examples/editorial-benchmark.md +26 -0
- site2voice-0.2.1/examples/editorial-home.html +38 -0
- site2voice-0.2.1/examples/saas-VOICE.md +52 -0
- site2voice-0.2.1/examples/saas-home.html +38 -0
- site2voice-0.2.1/pyproject.toml +25 -0
- site2voice-0.2.1/setup.cfg +4 -0
- site2voice-0.2.1/src/site2voice/__init__.py +3 -0
- site2voice-0.2.1/src/site2voice/benchmark.py +291 -0
- site2voice-0.2.1/src/site2voice/cli.py +80 -0
- site2voice-0.2.1/src/site2voice/extract.py +378 -0
- site2voice-0.2.1/src/site2voice.egg-info/PKG-INFO +134 -0
- site2voice-0.2.1/src/site2voice.egg-info/SOURCES.txt +28 -0
- site2voice-0.2.1/src/site2voice.egg-info/dependency_links.txt +1 -0
- site2voice-0.2.1/src/site2voice.egg-info/entry_points.txt +2 -0
- site2voice-0.2.1/src/site2voice.egg-info/top_level.txt +1 -0
- site2voice-0.2.1/tests/test_cli.py +64 -0

site2voice-0.2.1/CHANGELOG.md
ADDED

# Changelog

## 0.2.1

- Added the `pypi` GitHub environment claim to the Trusted Publishing workflow.

## 0.2.0

- Added `site2voice bench` for before/after voice-alignment scoring.
- Added style fingerprints for heading shape, paragraph rhythm, CTA shape, CTA
  verbs, and lexical variety.
- Added copy-safety and claim-boundary gates.
- Added synthetic editorial fixtures and benchmark report.
- Added source-candidate, benchmark, awesome-eligibility, and harness docs.

## 0.1.0

- Initial CLI for generating `VOICE.md` from URL or local HTML copy.

site2voice-0.2.1/LICENSE
ADDED

MIT License

Copyright (c) 2026 Sihyeon Jeon

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

site2voice-0.2.1/PKG-INFO
ADDED

Metadata-Version: 2.4
Name: site2voice
Version: 0.2.1
Summary: Generate AI-agent VOICE.md files from website copy and CTAs.
Author: Sihyeon Jeon
License-Expression: MIT
Keywords: agents,voice,copywriting,markdown,branding
Classifier: Programming Language :: Python :: 3
Classifier: Environment :: Console
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Software Development :: Documentation
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# site2voice

**Generate `VOICE.md` from any website.**

`site2voice` reads website copy and writes a small Markdown brief that tells an
AI coding agent how the site sounds: headings, CTAs, navigation labels, sentence
shape, repeated vocabulary, and claim boundaries.

```bash
pipx install site2voice

site2voice https://example.com --out VOICE.md
```

From a repo clone, profile the included SaaS fixture and run the editorial
benchmark fixture:

```bash
site2voice examples/saas-home.html --format json
site2voice bench examples/editorial-home.html examples/before-copy.md examples/after-copy.md
```

## Why

`DESIGN.md` helps agents stop guessing visual style. `VOICE.md` helps them stop
guessing copy style.

Drop the generated file into a project and tell the agent:

```text
Use @VOICE.md for landing-page copy, headings, CTAs, and UI microcopy.
```

## Output

```md
# VOICE.md

## Voice Summary

- Overall tone: explanatory, action-oriented, trust-forward.
- Sentence shape: about 20.4 words per sentence.
- Main vocabulary: `teams`, `security`, `pricing`, `launch`.
- Common CTAs: `Start free`, `Book a demo`, `See pricing`.

## Agent Rules

- Start with a concrete user outcome before describing implementation details.
- Prefer short active sentences and visible verbs from the CTA list.
- Do not invent compliance, security, customer, or performance claims.
```

The real output also includes a small style fingerprint for heading length,
paragraph rhythm, CTA shape, CTA verbs, and lexical variety.
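
For illustration, here is a minimal sketch of how a measured profile could be rendered into the Voice Summary block and into JSON. The `profile` dictionary keys and the `render_summary` function are hypothetical and are not `site2voice`'s internal API; only the bullet wording is copied from the sample output, and the style fingerprint and Agent Rules sections are left out.

```python
import json


def render_summary(profile: dict) -> str:
    """Render a hypothetical profile dict as a VOICE.md Voice Summary."""
    # Backtick each vocabulary word and CTA, matching the sample output.
    vocab = ", ".join(f"`{word}`" for word in profile["vocabulary"])
    ctas = ", ".join(f"`{cta}`" for cta in profile["ctas"])
    lines = [
        "# VOICE.md",
        "",
        "## Voice Summary",
        "",
        f"- Overall tone: {', '.join(profile['tone'])}.",
        f"- Sentence shape: about {profile['avg_sentence_words']:.1f} words per sentence.",
        f"- Main vocabulary: {vocab}.",
        f"- Common CTAs: {ctas}.",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical profile mirroring the sample output.
profile = {
    "tone": ["explanatory", "action-oriented", "trust-forward"],
    "avg_sentence_words": 20.4,
    "vocabulary": ["teams", "security", "pricing", "launch"],
    "ctas": ["Start free", "Book a demo", "See pricing"],
}

print(render_summary(profile))        # Markdown view
print(json.dumps(profile, indent=2))  # JSON view of the same data
```

Keeping one plain dictionary as the single source for both formats means the Markdown and JSON outputs cannot drift apart.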

## What It Does

- Reads a URL or local HTML file.
- Extracts title, meta description, headings, links, buttons, and paragraphs.
- Finds CTA candidates from short action-led links/buttons.
- Measures average sentence length.
- Extracts a compact style fingerprint: heading shape, paragraph rhythm,
  CTA shape, CTA verbs, and lexical variety.
- Builds a repeated-vocabulary lexicon.
- Writes Markdown or JSON.
- Benchmarks candidate copy against a source voice profile.
- Gates against unsupported claims and copied spans.
- Uses only the Python standard library.
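
The first few steps in this list (reading HTML, collecting copy, picking CTA candidates, measuring sentence length) can be sketched with the standard library alone. This is a sketch under assumptions, not the code in `extract.py`: the `CopyParser` class, the `ACTION_VERBS` set, the four-word CTA limit, and the regex sentence splitter are all hypothetical choices made for the example.

```python
import re
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
TRACKED_TAGS = HEADING_TAGS | {"title", "a", "button", "p"}
# Hypothetical verb list for spotting action-led CTAs.
ACTION_VERBS = {"start", "book", "see", "read", "explore", "open", "join",
                "get", "try", "view", "subscribe", "sign"}


class CopyParser(HTMLParser):
    """Collect title, meta description, headings, link/button text, and paragraphs."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self.headings: list[str] = []
        self.actions: list[str] = []     # text of <a> and <button> elements
        self.paragraphs: list[str] = []
        self._open: list[tuple[str, list[str]]] = []  # (tag, text parts)

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and attr_map.get("name") == "description":
            self.description = attr_map.get("content") or ""
        elif tag in TRACKED_TAGS:
            self._open.append((tag, []))

    def handle_data(self, data):
        # Text belongs to every open tracked element, so a link inside a
        # paragraph also counts toward the paragraph.
        for _, parts in self._open:
            parts.append(data)

    def handle_endtag(self, tag):
        if not self._open or self._open[-1][0] != tag:
            return  # this sketch ignores unbalanced markup
        _, parts = self._open.pop()
        text = " ".join("".join(parts).split())  # collapse whitespace
        if not text:
            return
        if tag == "title":
            self.title = text
        elif tag in HEADING_TAGS:
            self.headings.append(text)
        elif tag in {"a", "button"}:
            self.actions.append(text)
        else:
            self.paragraphs.append(text)


def cta_candidates(actions: list[str], max_words: int = 4) -> list[str]:
    """Short link/button labels whose first word is an action verb, deduplicated."""
    found: list[str] = []
    for label in actions:
        words = label.split()
        if 0 < len(words) <= max_words and words[0].lower() in ACTION_VERBS:
            if label not in found:
                found.append(label)
    return found


def average_sentence_length(paragraphs: list[str]) -> float:
    """Mean words per sentence, splitting after '.', '!' or '?'."""
    sentences = [s for p in paragraphs for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


html = """<html><head><title>Current Index</title>
<meta name="description" content="Notes on useful objects."></head>
<body><h1>The slow return of the useful object</h1>
<p>Studios and shopkeepers keep a quieter record. Read on.</p>
<a href="/feature">Read the feature</a> <button>Explore rooms</button>
<a href="/about">About</a></body></html>"""

parser = CopyParser()
parser.feed(html)
parser.close()
print(parser.headings)                             # ['The slow return of the useful object']
print(cta_candidates(parser.actions))              # ['Read the feature', 'Explore rooms']
print(average_sentence_length(parser.paragraphs))  # 4.5
```

The regex splitter miscounts abbreviations such as "e.g." and the verb set is English-only, so treat both as rough signals rather than a faithful copy of the tool's heuristics.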

## Benchmark

`site2voice bench` compares candidate copy against measurable source signals:
sentence length, vocabulary overlap, CTA shape, tone labels, heading shape,
claim boundaries, and copy safety.

```bash
site2voice bench examples/editorial-home.html \
  examples/before-copy.md \
  examples/after-copy.md \
  --out examples/editorial-benchmark.md
```

| Candidate | Result | Overall | Lexicon | Copy safety |
| --- | --- | ---: | ---: | ---: |
| `after-copy` | PASS | 83.8 | 70.0 | 93.2 |
| `before-copy` | FAIL | 36.6 | 0.0 | 100.0 |

The benchmark rewards measurable voice alignment without rewarding verbatim
copying.

## What It Is Not

- Not an official brand guideline.
- Not a DESIGN.md visual-token extractor.
- Not a crawler for private pages or authenticated apps.
- Not an LLM prompt that copies a site's prose.

## Develop

```bash
python3 -m pip install -e .
make test
make bench
site2voice examples/saas-home.html --out examples/saas-VOICE.md
```

## Links

- [Research](docs/research.md)
- [Benchmark](docs/benchmark.md)
- [Voice patterns](docs/voice-patterns.md)
- [Source candidates](docs/source-candidates.md)
- [Awesome eligibility](docs/awesome-eligibility.md)
- [Harness](docs/harness.md)
- [Launch kit](docs/launch-kit.md)

## License

MIT

site2voice-0.2.1/docs/awesome-eligibility.md
ADDED

# Awesome Eligibility

Current status: **not ready to submit broadly**.

The project now has a public repo, a release, examples, tests, and a benchmark.
That is enough for lightweight launch posts, but not enough for high-signal
awesome-list submissions. Awesome maintainers usually expect visible usage,
clear novelty, and a stable public surface.

## Best Fit

| List family | Fit | Why |
| --- | --- | --- |
| Vibe coding / context engineering | High later | `site2voice` creates agent context files and includes an eval path. |
| Prompt engineering tools | Medium later | Useful for prompt/context prep, but not a prompt library. |
| AI for design | Medium-low | Related to design handoff, but voice/copy is not visual design. |
| Claude Code tools | Not yet | Some lists require high star counts or mature adoption. |

## Submit When

- v0.2+ release is public and installable.
- README shows one command, one result, and one before/after benchmark.
- At least one real-world demo or user workflow exists.
- No generated third-party prose is committed.
- The submission can be phrased as a context-engineering tool, not a brand
  imitation tool.

## Candidate Positioning

Use this description:

> Generate compact `VOICE.md` context files from website copy, then benchmark
> whether AI-written copy matches the source voice without copying spans or
> inventing claims.

Avoid:

- asking for stars;
- submitting to unrelated design lists;
- claiming official brand replication;
- using EYESMAG, Highsnobiety, Stripe, or Linear outputs as public examples
  unless only derived metrics are shown.

## References

- [no-fluff/awesome-vibe-coding](https://github.com/no-fluff/awesome-vibe-coding)
- [taskade/awesome-vibe-coding](https://github.com/taskade/awesome-vibe-coding)
- [subinium/awesome-claude-code](https://github.com/subinium/awesome-claude-code)
- [promptslab/awesome-prompt-engineering](https://github.com/promptslab/awesome-prompt-engineering)
- [allanjsx/awesome-ai-for-design](https://github.com/allanjsx/awesome-ai-for-design)

site2voice-0.2.1/docs/benchmark.md
ADDED

# Benchmark

`site2voice bench` checks whether candidate copy follows a source voice profile.
It is deterministic and uses only local text statistics.

```bash
site2voice bench examples/editorial-home.html \
  examples/before-copy.md \
  examples/after-copy.md \
  --out examples/editorial-benchmark.md
```

## Metrics

| Metric | What it checks |
| --- | --- |
| Sentence fit | Candidate average sentence length vs source average. |
| Lexicon fit | Whether candidate reuses source vocabulary without requiring exact prose. |
| CTA fit | Whether short action-led lines match observed CTA vocabulary. |
| Tone fit | Overlap between deterministic tone labels. |
| Heading fit | Whether heading length/shape is close to the source. |
| Claim safety | Penalizes unsupported security, performance, customer, pricing, and AI claims. |
| Copy safety | Penalizes long shared spans and high 5-gram overlap with source snippets. |
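
The copy-safety row relies on two text statistics that are easy to sketch: the longest run of consecutive words shared with the source, and the share of the candidate's 5-grams that also appear in the source. The function names, the `\w+` tokenizer, and the example sentences are assumptions for illustration; how `site2voice` weights these numbers into its 0 to 100 score is not documented here, so the sketch stops at the raw statistics.

```python
import re


def words(text: str) -> list[str]:
    """Lowercased word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def longest_shared_run(candidate: str, source: str) -> int:
    """Length in words of the longest contiguous word sequence found in both texts."""
    a, b = words(candidate), words(source)
    best = 0
    prev = [0] * (len(b) + 1)
    # Longest-common-substring dynamic programming over word tokens:
    # cur[j] = length of the shared run ending at a[i-1] and b[j-1].
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def ngram_overlap(candidate: str, source: str, n: int = 5) -> float:
    """Fraction of the candidate's word n-grams that also occur in the source."""
    def grams(tokens: list[str]) -> set[tuple[str, ...]]:
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    cand = grams(words(candidate))
    if not cand:
        return 0.0  # candidate is shorter than n words
    return len(cand & grams(words(source))) / len(cand)


source = "Current Index follows the small studios, shopkeepers, and collectors."
candidate = "Each week, Current Index follows the small makers in town."
print(longest_shared_run(candidate, source))       # 5
print(round(ngram_overlap(candidate, source), 3))  # 0.167
```

The `longest_shared_run` value corresponds to the "longest shared run" figure that `examples/editorial-benchmark.md` reports per candidate, although the tokenization used there may differ.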

The reference profile also keeps a style fingerprint: average heading length,
average paragraph length, average CTA length, CTA verbs, lexical variety, and
punctuation counts.
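
A rough sketch of such a fingerprint, computed from copy that has already been split into headings, paragraphs, and CTAs. The `fingerprint` function, the use of a type-token ratio for lexical variety, and the punctuation set are assumptions, not the package's definitions; the sample input reuses the synthetic `after-copy` fixture text.

```python
import re
from collections import Counter
from statistics import mean


def fingerprint(headings: list[str], paragraphs: list[str], ctas: list[str]) -> dict:
    """Hypothetical style fingerprint built from already-extracted copy."""
    def avg_words(items: list[str]) -> float:
        return round(mean(len(s.split()) for s in items), 1) if items else 0.0

    tokens = re.findall(r"\w+", " ".join(headings + paragraphs + ctas).lower())
    body = " ".join(paragraphs)
    return {
        "avg_heading_words": avg_words(headings),
        "avg_paragraph_words": avg_words(paragraphs),
        "avg_cta_words": avg_words(ctas),
        # First word of each CTA, lowercased, deduplicated in first-seen order.
        "cta_verbs": list(dict.fromkeys(c.split()[0].lower() for c in ctas if c.strip())),
        # Type-token ratio (distinct words / all words) as a lexical-variety proxy.
        "lexical_variety": round(len(set(tokens)) / len(tokens), 2) if tokens else 0.0,
        # Punctuation counts in paragraph text only.
        "punctuation": dict(Counter(ch for ch in body if ch in ".,;:!?")),
    }


fp = fingerprint(
    headings=["The slow return of the useful object",
              "A guide shaped by rooms, streets, and habit"],
    paragraphs=["Current Index follows the small studios, shopkeepers, and collectors "
                "turning everyday materials into a quieter kind of record."],
    ctas=["Read the feature", "Explore rooms", "Open the guide"],
)
print(fp)
```

A type-token ratio is sensitive to text length, so comparing fingerprints is most meaningful between pages of similar size.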

## Gates

A candidate passes when:

- overall score is at least 75;
- copy safety is at least 85;
- claim safety is at least 75.

This matters because direct copying can look stylistically aligned while being
the wrong behavior. Copy safety is a gate, not just a metric.
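
The three thresholds translate directly into a pass/fail check. In this sketch the `Scores` dataclass and `gate` function are hypothetical names; the thresholds come from the list above, and the first two inputs are the `after-copy` and `before-copy` rows of the example table. The third input is an invented, heavily copied candidate that shows why copy safety has to be a gate: a high overall score does not rescue it.

```python
from dataclasses import dataclass

# Thresholds from the Gates list in docs/benchmark.md.
MIN_OVERALL = 75.0
MIN_COPY_SAFETY = 85.0
MIN_CLAIM_SAFETY = 75.0


@dataclass
class Scores:
    overall: float
    claim_safety: float
    copy_safety: float


def gate(scores: Scores) -> tuple[bool, list[str]]:
    """Return (passed, failures); each failed threshold is reported separately."""
    failures = []
    if scores.overall < MIN_OVERALL:
        failures.append(f"overall {scores.overall} < {MIN_OVERALL}")
    if scores.copy_safety < MIN_COPY_SAFETY:
        failures.append(f"copy safety {scores.copy_safety} < {MIN_COPY_SAFETY}")
    if scores.claim_safety < MIN_CLAIM_SAFETY:
        failures.append(f"claim safety {scores.claim_safety} < {MIN_CLAIM_SAFETY}")
    return (not failures, failures)


after_copy = Scores(overall=83.8, claim_safety=100.0, copy_safety=93.2)
before_copy = Scores(overall=36.6, claim_safety=100.0, copy_safety=100.0)
copied = Scores(overall=95.0, claim_safety=100.0, copy_safety=40.0)  # invented example

print(gate(after_copy))   # (True, [])
print(gate(before_copy))  # (False, ['overall 36.6 < 75.0'])
print(gate(copied))       # (False, ['copy safety 40.0 < 85.0'])
```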

## Example

| Candidate | Result | Overall | Sentence | Lexicon | CTA | Tone | Heading | Claim safety | Copy safety |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| `after-copy` | PASS | 83.8 | 95.8 | 70.0 | 50.0 | 100.0 | 93.8 | 100.0 | 93.2 |
| `before-copy` | FAIL | 36.6 | 57.5 | 0.0 | 0.0 | 0.0 | 62.5 | 100.0 | 100.0 |

See [editorial-benchmark.md](../examples/editorial-benchmark.md).

site2voice-0.2.1/docs/harness.md
ADDED

# Harness

`site2voice` is built as a small agent-orchestration harness:

1. **Scout** finds relevant public sources and decides whether they are safe as
   live examples or should be represented by synthetic fixtures.
2. **Profiler** runs `site2voice SOURCE --format json` and extracts measurable
   voice signals: sentence shape, heading shape, CTA verbs, lexical variety,
   and repeated vocabulary.
3. **Writer** uses the generated `VOICE.md` as context to write new candidate
   copy.
4. **Evaluator** runs `site2voice bench SOURCE before.md after.md` and checks
   alignment, claim boundaries, and copy safety.
5. **Publisher** updates examples, docs, releases, and launch notes.
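
The Profiler, Writer, and Evaluator steps can be wired together with `subprocess`. This is a hedged sketch: it only uses command forms that appear in these docs (`site2voice SOURCE --format json`, `site2voice SOURCE --out VOICE.md`, and `site2voice bench SOURCE before.md after.md`), the `writer` callback is a hypothetical stand-in for the coding agent, and Scout and Publisher stay manual. The `runner` parameter exists so the flow can be exercised without `site2voice` installed.

```python
import subprocess
from pathlib import Path
from typing import Callable


def harness_commands(source: str, before: Path, after: Path) -> dict[str, list[str]]:
    """Command lines for the Profiler and Evaluator steps, using documented CLI forms."""
    return {
        "profile": ["site2voice", source, "--format", "json"],
        "brief": ["site2voice", source, "--out", "VOICE.md"],
        "evaluate": ["site2voice", "bench", source, str(before), str(after)],
    }


def run_harness(
    source: str,
    before: Path,
    after: Path,
    writer: Callable[[Path, Path], None],
    runner: Callable = subprocess.run,
):
    """Profiler -> Writer -> Evaluator. Scout and Publisher remain manual."""
    cmds = harness_commands(source, before, after)
    runner(cmds["profile"], check=True)           # Profiler: voice signals as JSON
    runner(cmds["brief"], check=True)             # VOICE.md brief for the Writer
    writer(Path("VOICE.md"), after)               # Writer: agent drafts copy into `after`
    return runner(cmds["evaluate"], check=True)   # Evaluator: scores and gates
```

With the default `runner`, every call uses `check=True`, so a non-zero exit raises `subprocess.CalledProcessError`. Whether `site2voice bench` exits non-zero on a FAIL verdict is not stated in these docs, so confirm that before relying on the exception as a CI gate.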

## Claude Operator Review

Claude operator review agreed with the direction and recommended:

- deterministic style fingerprints instead of vague tone claims;
- a scorer before additional promotion;
- explicit copy-safety and unsupported-claim gates;
- source candidates as live examples, not committed third-party prose.

## Current Team Findings

- Editorial/magazine sources are useful for tone, but checked-in examples should
  be synthetic.
- `site2voice` should be submitted to awesome lists only after release/demo
  evidence is stronger.
- Best awesome-list fit found so far: vibe-coding/context-engineering lists, not
  visual design lists.

See [awesome-eligibility.md](awesome-eligibility.md) for submission gates.

site2voice-0.2.1/docs/launch-kit.md
ADDED

# Launch Kit

## One-Line Pitch

`site2voice` turns any website into a small `VOICE.md` file for AI coding
agents.

## Short Post

I made `site2voice`, a tiny CLI that generates agent-readable `VOICE.md` files
from website copy.

It extracts headings, CTA text, navigation labels, sentence length, and repeated
vocabulary, then writes a concise voice brief for Claude Code, Cursor, Codex, or
any other coding agent.

```bash
pipx install site2voice
site2voice https://example.com --out VOICE.md

# from a repo clone
site2voice bench examples/editorial-home.html examples/before-copy.md examples/after-copy.md
```

No LLM dependency. No browser dependency. The goal is simple: give agents voice
and positioning context before they write new UI copy.

The benchmark path shows whether the generated copy actually moved closer to
the source voice, without rewarding copied spans.

Repo: https://github.com/SihyeonJeon/site2voice

## Avoid

- Do not claim official brand approval.
- Do not paste long generated excerpts from third-party sites.
- Do not ask for stars.
- Do not spam unrelated design repositories.

site2voice-0.2.1/docs/research.md
ADDED

# Research

## What Exists

DESIGN.md-style repos and services give AI coding agents visual design context:
colors, typography, spacing, layout patterns, and component guidance.

Observed categories:

- DESIGN.md libraries for ready-made visual systems.
- URL-to-DESIGN.md generators for design tokens and visual patterns.
- Agent memory files such as `AGENTS.md`, `CLAUDE.md`, and `DESIGN.md`.

## Gap

Visual style is only half of what an agent needs. When generating landing pages,
docs, onboarding flows, and UI microcopy, the agent also needs:

- message hierarchy;
- CTA language;
- navigation vocabulary;
- sentence length;
- claim boundaries;
- words to reuse and words to avoid.

`site2voice` targets that gap with a deterministic `VOICE.md` generator.

## New Evidence Path

The project now includes `site2voice bench`, which measures whether candidate
copy follows a source voice profile. This turns `VOICE.md` from a descriptive
artifact into a testable workflow:

```bash
site2voice SOURCE --out VOICE.md
site2voice bench SOURCE before.md after.md
```

The benchmark scores sentence shape, vocabulary overlap, CTA shape, heading
shape, tone labels, claim boundaries, and copy safety.

## Product Bet

The README should be short:

```bash
site2voice https://example.com --out VOICE.md
```

The output should answer:

- What does this site sound like?
- What CTAs does it use?
- What vocabulary should an agent reuse?
- What claims must the agent avoid inventing?

## Boundaries

- No LLM dependency.
- No screenshot or brand asset copying.
- Do not paste long source paragraphs into the generated file.
- Treat the output as a writing/style brief, not legal brand guidance.

## References

- [DESIGN.md library](https://designmd.app/)
- [Better Stack guide to DESIGN.md](https://betterstack.com/community/guides/ai/design-md-ai/)
- [DESIGN.MD by Parallect coverage](https://chatgate.ai/post/design-md-by-parallect)
- [DesignMD URL-to-DESIGN.md generator](https://www.designmd.cc/)

site2voice-0.2.1/docs/source-candidates.md
ADDED

# Source Candidates

Use these as live examples for local analysis. Do not commit generated outputs
from third-party sites unless you remove source snippets and preserve only
derived metrics.

| Source | Why it is useful | Use safely |
| --- | --- | --- |
| [EYESMAG](https://www.eyesmag.com/) | Korean fashion/lifestyle web-magazine tone, mixed brand names, short trend labels. | Prefer live local runs or synthetic Korean fixtures. |
| [Highsnobiety](https://www.highsnobiety.com/) | Culture, fashion, commerce, and editorial headline cadence. | Use homepage/category pages; do not republish article body output. |
| [Hypebeast](https://hypebeast.com/) | Trend-driven streetwear/product vocabulary and title patterns. | Use as a live URL example; JS-heavy pages may need future browser support. |
| [Monocle](https://monocle.com/) | Polished global affairs, design, travel, and city-guide register. | Good live source for nav/category/CTA patterns. |
| [Wallpaper](https://www.wallpaper.com/) | Design, interiors, architecture, and product-focused editorial voice. | Use derived metrics, not copied article snippets. |
| [Dazed](https://www.dazeddigital.com/) | Youth culture, fashion, music, and art phrasing. | Prefer synthetic fixtures for public examples. |
| [Stripe](https://stripe.com/) | Developer/business SaaS claims, technical nouns, conversion CTAs. | Good contrast against editorial fixtures. |
| [Linear](https://linear.app/) | Modern product-team copy, short headings, system-oriented vocabulary. | Good live source, but homepage content changes often. |
| [Vercel](https://vercel.com/) | Developer platform voice, performance claims, product CTA density. | Good for claim-boundary tests. |

## Fixture Policy

Checked-in examples should be synthetic. Live URL extraction is for local
analysis and should not redistribute third-party prose.

site2voice-0.2.1/docs/voice-patterns.md
ADDED

# Voice Patterns

`site2voice` does not try to copy a brand or magazine. It extracts measurable
signals that help an agent write new copy in a nearby register without pasting
source prose.

## Editorial / Magazine

What to measure:

- short section labels mixed with longer descriptive paragraphs;
- named entities, places, seasons, materials, products, and collaborators;
- restrained adjectives before concrete nouns;
- headline length and whether titles read like indexes, blurbs, or news alerts;
- CTA words such as `read`, `explore`, `view`, `subscribe`, or `join`.

Good source families:

- EYESMAG-style Korean fashion and lifestyle indexing;
- Highsnobiety/Hypebeast-style culture and product trend coverage;
- Monocle/Wallpaper-style design, city, travel, and object language;
- Dazed/i-D-style youth culture and fashion framing.

Use synthetic fixtures for checked-in examples. Use live URLs only for local
analysis or private benchmark runs.

## Product / Developer Sites

What to measure:

- outcome-first headings;
- short conversion CTAs;
- technical nouns and integration terms;
- proof language, customer claims, security claims, and performance claims;
- whether copy starts with user value or implementation detail.

Good source families:

- Stripe, Vercel, Linear, GitHub, and other public product pages.

These sources are useful contrast sets. They expose whether candidate copy is
accidentally drifting from editorial voice into SaaS launch-page voice.
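
Claim-heavy product language is also what the benchmark's claim-safety check penalizes when a candidate introduces it without support in the source. A crude keyword screen illustrates the idea: flag a claim category when the candidate triggers it but the source never does. The category names follow the Claim safety row in `docs/benchmark.md`; the regular expressions and the `claim_categories` and `unsupported_claims` helpers are invented for this example and will miss paraphrased claims.

```python
import re

# Invented trigger patterns per claim category; a real list would be longer
# and tuned per language.
CLAIM_PATTERNS = {
    "security": r"\b(soc ?2|iso ?27001|gdpr|hipaa|encrypt\w*)\b",
    "performance": r"\b(\d+x faster|fastest|blazing[- ]fast|\d+% faster)",
    "customer": r"\b(trusted by|loved by|\d[\d,]* (teams|customers|companies))\b",
    "pricing": r"(\$\d+|free forever|no credit card)",
    "ai": r"\b(ai[- ]powered|machine learning|llm)\b",
}


def claim_categories(text: str) -> set[str]:
    """Claim categories whose trigger pattern appears anywhere in the text."""
    lowered = text.lower()
    return {name for name, pattern in CLAIM_PATTERNS.items() if re.search(pattern, lowered)}


def unsupported_claims(candidate: str, source: str) -> set[str]:
    """Categories the candidate asserts that the source never asserts."""
    return claim_categories(candidate) - claim_categories(source)


source = "Current Index follows small studios and collectors. Join the list."
candidate = "Trusted by 10,000 teams, our AI-powered index is 10x faster."
print(sorted(unsupported_claims(candidate, source)))  # ['ai', 'customer', 'performance']
```

Comparing categories rather than exact phrases lets a candidate restate a claim the source already makes (for example, encryption on a page that mentions SOC 2) without being flagged.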

## Agent Handoff

Give the agent the generated `VOICE.md`, then score the result:

```bash
site2voice https://example.com --out VOICE.md --max-snippets 0
site2voice bench https://example.com before.md after.md
```

The first command gives the agent a voice brief. The second command verifies
whether the resulting copy moved closer to the source profile while staying
inside copy-safety and claim-safety gates.

site2voice-0.2.1/examples/after-copy.md
ADDED

# The slow return of the useful object

Current Index follows the small studios, shopkeepers, and collectors turning everyday materials into a quieter kind of record.

## A guide shaped by rooms, streets, and habit

Read the feature
Explore rooms
Open the guide

site2voice-0.2.1/examples/before-copy.md
ADDED

# Powerful content for modern brands

Our platform helps teams create better marketing with seamless workflows and next-generation automation.

## Everything you need

Use our all-in-one tools to boost productivity, improve collaboration, and unlock growth faster.

Get started today

site2voice-0.2.1/examples/editorial-benchmark.md
ADDED

# Voice Benchmark

Reference: `examples/editorial-home.html`

## Reference Profile

- Tone: explanatory
- Average sentence length: 21.4 words
- Lexicon: `city`, `archive`, `design`, `style`, `join`, `list`, `quiet`, `return`, `studio`, `object`, `current`, `index`
- CTAs: `Join the list`, `Explore rooms`

## Scores

| Candidate | Result | Overall | Sentence | Lexicon | CTA | Tone | Heading | Claim safety | Copy safety |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| `after-copy` | **PASS** | 83.8 | 95.8 | 70.0 | 50.0 | 100.0 | 93.8 | 100.0 | 93.2 |
| `before-copy` | **FAIL** | 36.6 | 57.5 | 0.0 | 0.0 | 0.0 | 62.5 | 100.0 | 100.0 |

## Why This Is Useful

The score is deterministic. It checks measurable voice signals and gates against unsupported claims and copied spans.

## Candidate Evidence

- `after-copy` reused source lexicon: `collectors`, `current`, `follows`, `index`, `object`, `return`, `shopkeepers`; longest shared run: 5 words.
- `before-copy` reused source lexicon: none; longest shared run: 1 word.
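
The "reused source lexicon" evidence can be approximated as the overlap of content words between candidate and source. In this sketch the stopword list, the four-character minimum, and the `lexicon` and `reused_lexicon` helpers are assumptions, and the source sentence is a made-up stand-in because `examples/editorial-home.html` is not reproduced in this diff; the sketch does not attempt to reproduce the 70.0 lexicon score.

```python
import re

# Assumed stopword list; a real one would be longer and language-aware.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "into", "by", "for",
             "on", "with", "our", "your", "who", "that", "this"}


def lexicon(text: str, min_len: int = 4) -> set[str]:
    """Content words: lowercased tokens of at least `min_len` characters, minus stopwords."""
    return {t for t in re.findall(r"\w+", text.lower())
            if len(t) >= min_len and t not in STOPWORDS}


def reused_lexicon(candidate: str, source: str) -> list[str]:
    """Sorted content words that appear in both the candidate and the source."""
    return sorted(lexicon(candidate) & lexicon(source))


# Made-up stand-in for the source page copy.
source = "Current Index follows the studios and shopkeepers who keep an archive of the useful object."
after_copy = ("The slow return of the useful object. Current Index follows the small "
              "studios, shopkeepers, and collectors.")
before_copy = ("Our platform helps teams create better marketing with seamless workflows "
               "and next-generation automation.")

print(reused_lexicon(after_copy, source))
# ['current', 'follows', 'index', 'object', 'shopkeepers', 'studios', 'useful']
print(reused_lexicon(before_copy, source))  # []
```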