@botlearn/rewriter 0.1.0
- package/LICENSE +21 -0
- package/README.md +35 -0
- package/knowledge/anti-patterns.md +119 -0
- package/knowledge/best-practices.md +150 -0
- package/knowledge/domain.md +129 -0
- package/manifest.json +28 -0
- package/package.json +38 -0
- package/skill.md +44 -0
- package/strategies/main.md +95 -0
- package/tests/benchmark.json +476 -0
- package/tests/smoke.json +54 -0
package/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2025 BotLearn
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
package/README.md
ADDED
|
@@ -0,0 +1,35 @@
|
|
|
1
|
+
# @botlearn/rewriter
|
|
2
|
+
|
|
3
|
+
> Style transformation, audience adaptation, and natural content rewriting with factual accuracy preservation for OpenClaw Agent
|
|
4
|
+
|
|
5
|
+
## Installation
|
|
6
|
+
|
|
7
|
+
```bash
|
|
8
|
+
# via npm
|
|
9
|
+
npm install @botlearn/rewriter
|
|
10
|
+
|
|
11
|
+
# via clawhub
|
|
12
|
+
clawhub install @botlearn/rewriter
|
|
13
|
+
```
|
|
14
|
+
|
|
15
|
+
## Category
|
|
16
|
+
|
|
17
|
+
content-processing
|
|
18
|
+
|
|
19
|
+
## Dependencies
|
|
20
|
+
|
|
21
|
+
`@botlearn/summarizer`
|
|
22
|
+
|
|
23
|
+
## Files
|
|
24
|
+
|
|
25
|
+
| File | Description |
|
|
26
|
+
|------|-------------|
|
|
27
|
+
| `manifest.json` | Skill metadata and configuration |
|
|
28
|
+
| `skill.md` | Role definition and activation rules |
|
|
29
|
+
| `knowledge/` | Domain knowledge documents |
|
|
30
|
+
| `strategies/` | Behavioral strategy definitions |
|
|
31
|
+
| `tests/` | Smoke and benchmark tests |
|
|
32
|
+
|
|
33
|
+
## License
|
|
34
|
+
|
|
35
|
+
MIT
|
package/knowledge/anti-patterns.md
ADDED
|
@@ -0,0 +1,119 @@
|
|
|
1
|
+
---
|
|
2
|
+
domain: rewriter
|
|
3
|
+
topic: anti-patterns
|
|
4
|
+
priority: medium
|
|
5
|
+
ttl: 30d
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# Content Rewriting — Anti-Patterns
|
|
9
|
+
|
|
10
|
+
## Sentence-Level Anti-Patterns
|
|
11
|
+
|
|
12
|
+
### 1. Uniform Sentence Length
|
|
13
|
+
- **Problem**: AI-generated rewrites often produce sentences of near-identical length (15-20 words), creating a metronomic rhythm that human readers and detection tools both flag
|
|
14
|
+
- **Detection signal**: Standard deviation of sentence word counts < 4 across a paragraph
|
|
15
|
+
- **Fix**: Deliberately vary sentence length. Follow a 25-word sentence with a 6-word one. Break or combine sentences until the standard deviation exceeds 6. Short hits. Then stretch out with something more complex that winds through a subordinate clause before arriving at its point.
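The detection signal above can be checked mechanically. The following is a minimal sketch and is not part of the package: the function names are hypothetical, sentence splitting is a naive split on terminal punctuation, and the sample standard deviation is assumed because the text does not say which variant is meant.

```python
import re
import statistics


def sentence_word_counts(paragraph: str) -> list[int]:
    """Split on ., ! or ? followed by whitespace and count words per sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    return [len(s.split()) for s in sentences]


def is_metronomic(paragraph: str, threshold: float = 4.0) -> bool:
    """True when the standard deviation of sentence lengths is below the threshold."""
    counts = sentence_word_counts(paragraph)
    if len(counts) < 2:
        # A standard deviation needs at least two sentences.
        return False
    return statistics.stdev(counts) < threshold
```

Abbreviations such as "e.g." will confuse the naive splitter, so a real check would likely need a proper sentence tokenizer.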
|
|
16
|
+
|
|
17
|
+
### 2. Predictable Sentence Openers
|
|
18
|
+
- **Problem**: Starting 3+ consecutive sentences with "The," "This," "It," or a subject-verb pattern; AI text is notorious for "The [noun] [verb]" repetition
|
|
19
|
+
- **Detection signal**: More than 30% of sentences in a passage begin with the same word or part of speech
|
|
20
|
+
- **Fix**: Rotate openers across prepositional phrases, adverbs, participial phrases, dependent clauses, and inverted structures (see knowledge/best-practices.md, Section 6)
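As a rough illustration of the 30% signal, the sketch below (not part of the package; `opener_share` is a hypothetical helper) reports the most common first word and the share of sentences that start with it. It only compares first words; detecting a shared part of speech would need a tagger, which is out of scope here.

```python
import re
from collections import Counter


def opener_share(passage: str) -> tuple[str, float]:
    """Return the most frequent sentence-opening word and its share of sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", passage.strip()) if s]
    # Lowercase the first token and drop leading quotes or brackets.
    openers = [s.split()[0].strip("\"'(").lower() for s in sentences]
    if not openers:
        return "", 0.0
    word, count = Counter(openers).most_common(1)[0]
    return word, count / len(openers)
```

A passage would be flagged when the returned share exceeds 0.30.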
|
|
21
|
+
|
|
22
|
+
### 3. Parallel Construction Overuse
|
|
23
|
+
- **Problem**: Rewriting every item in a list or series using identical grammatical structure ("X enables Y. X facilitates Z. X promotes W."). Human writers use parallelism selectively, not exhaustively.
|
|
24
|
+
- **Detection signal**: Three or more consecutive sentences with identical syntactic templates
|
|
25
|
+
- **Fix**: Break the pattern — vary one item's structure, embed one as a subordinate clause, or merge two items into a compound sentence
|
|
26
|
+
|
|
27
|
+
## Word-Choice Anti-Patterns
|
|
28
|
+
|
|
29
|
+
### 4. AI-Tell Vocabulary
|
|
30
|
+
- **Problem**: Certain words and phrases appear with disproportionate frequency in AI-generated text, creating a recognizable statistical fingerprint
|
|
31
|
+
- **High-frequency AI tells to avoid or minimize**:
|
|
32
|
+
- "delve" / "delve into" — rarely used in natural human writing
|
|
33
|
+
- "landscape" (metaphorical: "the AI landscape") — overused as a framing device
|
|
34
|
+
- "navigate" (metaphorical: "navigate the complexities") — AI default for "deal with"
|
|
35
|
+
- "nuanced" / "nuances" — AI hedging word for acknowledging complexity
|
|
36
|
+
- "multifaceted" — AI synonym for "complex"
|
|
37
|
+
- "pivotal" — AI intensifier, rarely chosen by humans naturally
|
|
38
|
+
- "foster" (as in "foster innovation") — formal/corporate AI default
|
|
39
|
+
- "underscores" — AI attribution verb ("This underscores the importance of...")
|
|
40
|
+
- "realm" — AI metaphor for "area" or "field"
|
|
41
|
+
- "tapestry" — AI metaphor for "mix" or "combination"
|
|
42
|
+
- "leverage" (verb) — corporate AI default for "use"
|
|
43
|
+
- "robust" — AI adjective for "strong" or "thorough"
|
|
44
|
+
- "comprehensive" — AI filler adjective
|
|
45
|
+
- "cutting-edge" — AI default for "new" or "advanced"
|
|
46
|
+
- **Fix**: Replace with less predictable alternatives or restructure to eliminate the need for the word entirely. "Delve into the nuances" -> "look more closely at the details" or simply get specific about which details.
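One way to surface these words for review is a simple scan. This is a sketch under stated assumptions, not the package's method: the word list is copied from the bullets above, `find_ai_tells` is a hypothetical name, and the match is on exact forms only, so it cannot tell a metaphorical "landscape" from a literal one or the verb "leverage" from the noun.

```python
import re
from collections import Counter

# Exact forms taken from the list above.
AI_TELLS = {
    "delve", "landscape", "navigate", "nuanced", "nuances", "multifaceted",
    "pivotal", "foster", "underscores", "realm", "tapestry", "leverage",
    "robust", "comprehensive", "cutting-edge",
}


def find_ai_tells(text: str) -> Counter:
    """Count occurrences of listed words; hyphenated words stay as one token."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return Counter(t for t in tokens if t in AI_TELLS)
```

Hits are candidates for a human decision, not automatic replacements.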
|
|
47
|
+
|
|
48
|
+
### 5. Over-Hedging
|
|
49
|
+
- **Problem**: Inserting excessive qualification and hedging language that the source material does not warrant. AI text defaults to cautious phrasing: "It is important to note that," "It should be mentioned that," "One could argue that"
|
|
50
|
+
- **Detection signal**: More than 2 hedging phrases per paragraph in text that makes direct claims
|
|
51
|
+
- **Common over-hedges**:
|
|
52
|
+
- "It is important to note that..." — almost always deletable; just state the thing
|
|
53
|
+
- "It is worth mentioning that..." — if it is worth mentioning, mention it directly
|
|
54
|
+
- "It should be noted that..." — passive hedge; say it or don't
|
|
55
|
+
- "One might argue that..." — if you are making the argument, own it
|
|
56
|
+
- "There are various factors that..." — name the factors instead
|
|
57
|
+
- "In today's rapidly evolving world..." — content-free throat-clearing
|
|
58
|
+
- "This is a complex issue with many dimensions..." — either show the complexity or don't announce it
|
|
59
|
+
- **Fix**: Delete the hedging wrapper and state the content directly. Reserve hedging for genuinely uncertain claims where the source material itself hedges.
|
|
60
|
+
|
|
61
|
+
### 6. Hollow Intensifiers
|
|
62
|
+
- **Problem**: Padding text with intensifiers that add no information: "very," "extremely," "incredibly," "highly," "truly," "really"
|
|
63
|
+
- **Detection signal**: More than 1 hollow intensifier per 100 words
|
|
64
|
+
- **Fix**: Remove the intensifier and choose a more precise word. "Very large" -> "massive" or simply provide the number. "Extremely important" -> explain why it matters.
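The per-100-words signal is straightforward arithmetic. A minimal sketch, assuming the six intensifiers named above are the full list and using a hypothetical function name:

```python
import re

INTENSIFIERS = {"very", "extremely", "incredibly", "highly", "truly", "really"}


def intensifiers_per_100_words(text: str) -> float:
    """Return the number of listed intensifiers per 100 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in INTENSIFIERS)
    return 100 * hits / len(words)
```

A value above 1.0 would trip the detection signal described above.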
|
|
65
|
+
|
|
66
|
+
## Structural Anti-Patterns
|
|
67
|
+
|
|
68
|
+
### 7. Symmetric Paragraph Structure
|
|
69
|
+
- **Problem**: Every paragraph follows the same template: topic sentence, 2-3 supporting sentences, concluding sentence. Human writing varies paragraph structure based on rhetorical purpose.
|
|
70
|
+
- **Detection signal**: All paragraphs in a section have the same sentence count (plus or minus 1) and follow topic-evidence-conclusion order
|
|
71
|
+
- **Fix**: Mix paragraph types — some open with evidence and end with the claim; some are a single emphatic sentence; some pose a question and answer it in the following paragraph; some use a narrative micro-structure.
|
|
72
|
+
|
|
73
|
+
### 8. List-to-Prose Conversion Failure
|
|
74
|
+
- **Problem**: When rewriting lists or bullet points into prose, producing sentences that are thinly disguised list items connected by "Additionally," "Furthermore," "Moreover"
|
|
75
|
+
- **Detection signal**: Paragraph of 4+ sentences where each begins with an additive transition and presents a single independent fact
|
|
76
|
+
- **Fix**: Integrate items into genuine flowing prose — group related points, use subordinate clauses, vary the rhetorical relationship between sentences (contrast, cause-effect, example-generalization)
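The detection signal can be approximated as follows. This sketch is not part of the package; it reads the signal literally (every sentence opens with an additive transition), and the three transitions checked are the ones named in the problem statement. It does not test whether each sentence carries a single independent fact.

```python
import re

ADDITIVE = ("additionally", "furthermore", "moreover")


def looks_like_disguised_list(paragraph: str, min_sentences: int = 4) -> bool:
    """True when a paragraph of min_sentences or more opens every sentence additively."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    if len(sentences) < min_sentences:
        return False
    # str.startswith accepts a tuple of prefixes.
    return all(s.lower().startswith(ADDITIVE) for s in sentences)
```

A looser variant might exempt the first sentence, since a converted list often opens with a plain statement.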
|
|
77
|
+
|
|
78
|
+
### 9. Formulaic Introductions and Conclusions
|
|
79
|
+
- **Problem**: Opening with "In today's [adjective] world..." or "Throughout history..." and closing with "In conclusion, it is clear that..." These are AI defaults.
|
|
80
|
+
- **Common formulaic openers to avoid**:
|
|
81
|
+
- "In today's rapidly evolving world..."
|
|
82
|
+
- "In the ever-changing landscape of..."
|
|
83
|
+
- "Throughout history, humans have..."
|
|
84
|
+
- "As technology continues to advance..."
|
|
85
|
+
- "When it comes to [topic]..."
|
|
86
|
+
- **Common formulaic closers to avoid**:
|
|
87
|
+
- "In conclusion,"
|
|
88
|
+
- "To sum up,"
|
|
89
|
+
- "All in all,"
|
|
90
|
+
- "In summary, it is evident that..."
|
|
91
|
+
- "Moving forward, it will be important to..."
|
|
92
|
+
- **Fix**: Open with a specific detail, a surprising fact, a question, or jump directly into the argument. Close by circling back to a concrete image, posing a forward-looking question, or ending on the strongest piece of evidence.
|
|
93
|
+
|
|
94
|
+
### 10. Excessive Signposting
|
|
95
|
+
- **Problem**: Over-announcing what the text will do: "In this section, we will explore...", "The following paragraphs will discuss...", "Let us now turn to..."
|
|
96
|
+
- **Detection signal**: More than 1 meta-commentary sentence per section
|
|
97
|
+
- **Fix**: Delete the signpost and just deliver the content. Readers can follow well-organized prose without being told what is coming next. Use structural elements (headings, paragraph breaks) instead of verbal announcements.
|
|
98
|
+
|
|
99
|
+
## Semantic Anti-Patterns
|
|
100
|
+
|
|
101
|
+
### 11. Meaning Reversal Through Paraphrase
|
|
102
|
+
- **Problem**: In attempting to rephrase, accidentally inverting the meaning. "X does not significantly affect Y" becomes "X has a significant effect on Y" through careless restructuring.
|
|
103
|
+
- **Detection signal**: Post-rewrite meaning check reveals negation loss or causal reversal
|
|
104
|
+
- **Fix**: After every rewrite, verify that negations, comparatives, and causal relationships match the source exactly. Pay special attention to: "not," "less/fewer," "despite," "although," "except," "unless."
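A crude aid for this check is to compare how often the listed markers occur before and after the rewrite. The sketch below is illustrative only, with hypothetical function names; it ignores contractions such as "doesn't", and a legitimate rephrase can change the counts, so a mismatch is a prompt to re-read rather than proof of reversal.

```python
import re
from collections import Counter

MARKERS = ("not", "less", "fewer", "despite", "although", "except", "unless")


def marker_counts(text: str) -> Counter:
    """Count occurrences of each watched marker word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in MARKERS)


def marker_diff(source: str, rewrite: str) -> dict[str, tuple[int, int]]:
    """Map each marker whose count changed to (count in source, count in rewrite)."""
    src, out = marker_counts(source), marker_counts(rewrite)
    return {m: (src[m], out[m]) for m in MARKERS if src[m] != out[m]}
```

Applied to the example in the problem statement, the lost "not" shows up as a count that drops from one to zero.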
|
|
105
|
+
|
|
106
|
+
### 12. Generalization Creep
|
|
107
|
+
- **Problem**: Specific, qualified claims becoming sweeping generalizations through paraphrase. "A 2023 study of 200 participants found modest improvement" becomes "Research has shown clear benefits."
|
|
108
|
+
- **Detection signal**: Loss of specificity markers (dates, sample sizes, qualifiers like "modest," "preliminary," "in some cases")
|
|
109
|
+
- **Fix**: Preserve all quantifiers and qualifiers from the source. If simplifying for a general audience, maintain the qualifier even if you simplify the statistic ("early research suggests some improvement" rather than "research shows benefits").
|
|
110
|
+
|
|
111
|
+
### 13. Tone Contamination
|
|
112
|
+
- **Problem**: Elements of the source tone bleeding into a target style where they do not belong. Academic hedging appearing in marketing copy, or marketing urgency appearing in a technical rewrite.
|
|
113
|
+
- **Detection signal**: Register-inconsistent words or phrases (e.g., "game-changing" in academic prose, "notwithstanding" in a casual blog post)
|
|
114
|
+
- **Fix**: After rewriting, do a register-consistency pass: read the output aloud and flag any word or phrase that would surprise a native reader of the target style. Replace with register-appropriate alternatives.
|
|
115
|
+
|
|
116
|
+
### 14. Synonym Roulette
|
|
117
|
+
- **Problem**: Mechanically replacing every content word with a synonym, regardless of connotation or register fit. "Big" becomes "gargantuan," "said" becomes "opined," "house" becomes "domicile."
|
|
118
|
+
- **Detection signal**: Unusual or elevated vocabulary that does not match the target register; words that are technically synonymous but connotationally wrong
|
|
119
|
+
- **Fix**: Only replace words when the replacement is a natural fit for the target style and audience. Keep simple words simple when the context calls for it. "Said" is almost always better than "opined," "exclaimed," or "articulated."
|
package/knowledge/best-practices.md
ADDED
|
@@ -0,0 +1,150 @@
|
|
|
1
|
+
---
|
|
2
|
+
domain: rewriter
|
|
3
|
+
topic: style-adaptation-variation-naturalness
|
|
4
|
+
priority: high
|
|
5
|
+
ttl: 30d
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# Content Rewriting — Best Practices
|
|
9
|
+
|
|
10
|
+
## Style Adaptation Techniques
|
|
11
|
+
|
|
12
|
+
### 1. Semantic Core Extraction
|
|
13
|
+
Before rewriting, isolate the semantic core — the irreducible meaning that must survive transformation:
|
|
14
|
+
- **Factual claims**: Named entities, statistics, causal relationships, temporal sequences
|
|
15
|
+
- **Logical structure**: Argument flow, premise-conclusion chains, conditional relationships
|
|
16
|
+
- **Intent signals**: What the author wants the reader to think, feel, or do after reading
|
|
17
|
+
- Use @botlearn/summarizer to generate a structured summary as the rewrite anchor
|
|
18
|
+
|
|
19
|
+
### 2. Register Shifting
|
|
20
|
+
When moving between formality levels:
|
|
21
|
+
- **Upward shift** (casual -> formal): Replace contractions, eliminate slang, convert fragments to complete sentences, switch to third person, add explicit logical connectors
|
|
22
|
+
- **Downward shift** (formal -> casual): Introduce contractions, replace Latinate words with Anglo-Saxon equivalents, add direct address ("you"), break long sentences into conversational fragments, use rhetorical questions
|
|
23
|
+
- **Preserve one register marker** from the source to maintain continuity (e.g., keep a technical term even in a casual rewrite, defined parenthetically)
|
|
24
|
+
|
|
25
|
+
### 3. Vocabulary Calibration
|
|
26
|
+
- Match vocabulary to target audience reading level (see knowledge/domain.md audience types)
|
|
27
|
+
- Replace jargon with plain-language equivalents when writing for general audiences; add a brief parenthetical explanation if the term must remain
|
|
28
|
+
- For expert audiences, elevate vocabulary precision: prefer "ameliorate" over "improve" only when the nuance matters
|
|
29
|
+
- Maintain consistent vocabulary within a single rewrite — do not oscillate between registers
|
|
30
|
+
|
|
31
|
+
### 4. Sentence Architecture Transformation
|
|
32
|
+
- **Academic -> Journalistic**: Split compound-complex sentences; front-load the subject and verb; move qualifiers to follow-up sentences
|
|
33
|
+
- **Technical -> Conversational**: Convert passive constructions to active; replace "It should be noted that X" with "X" directly; use "you" as subject
|
|
34
|
+
- **Any -> Literary**: Vary sentence length deliberately (short for impact, long for immersion); embed sensory detail; replace abstract nouns with concrete images
|
|
35
|
+
|
|
36
|
+
## Variation Patterns for Naturalness
|
|
37
|
+
|
|
38
|
+
### 5. Sentence Length Variation
|
|
39
|
+
Natural human writing follows an irregular rhythm. Target these distributions:
|
|
40
|
+
|
|
41
|
+
| Sentence Type | Word Count | Frequency in Output |
|
|
42
|
+
|---------------|-----------|-------------------|
|
|
43
|
+
| Short (punchy) | 4-8 words | 15-25% |
|
|
44
|
+
| Medium | 9-18 words | 40-55% |
|
|
45
|
+
| Long (complex) | 19-35 words | 20-30% |
|
|
46
|
+
| Very long | 36+ words | 0-5% (use sparingly) |
|
|
47
|
+
|
|
48
|
+
- Never produce 3+ consecutive sentences of similar length
|
|
49
|
+
- After a long sentence, follow with a short one for rhythm
|
|
50
|
+
- Use single-sentence paragraphs occasionally for emphasis (but not predictably)
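The distribution table can be turned into a quick measurement. This is a minimal sketch rather than the package's implementation: the bucket names and helper functions are hypothetical, and sentences shorter than 4 words are folded into the short bucket, which the table does not specify.

```python
import re
from collections import Counter

BUCKETS = ("short", "medium", "long", "very_long")


def length_bucket(word_count: int) -> str:
    """Classify a sentence by word count using the table's upper bounds."""
    if word_count <= 8:
        return "short"  # assumption: counts below 4 are also treated as short
    if word_count <= 18:
        return "medium"
    if word_count <= 35:
        return "long"
    return "very_long"


def bucket_shares(text: str) -> dict[str, float]:
    """Return the fraction of sentences that fall into each length bucket."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return {b: 0.0 for b in BUCKETS}
    counts = Counter(length_bucket(len(s.split())) for s in sentences)
    return {b: counts[b] / len(sentences) for b in BUCKETS}
```

The resulting shares can then be compared against the target ranges in the table.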
|
|
51
|
+
|
|
52
|
+
### 6. Opening Word Diversity
|
|
53
|
+
Avoid starting consecutive sentences with the same word or pattern. Rotate through:
|
|
54
|
+
- Subject-first: "The researchers found..."
|
|
55
|
+
- Prepositional phrase: "In a 2024 study,..."
|
|
56
|
+
- Adverbial: "Surprisingly, the data showed..."
|
|
57
|
+
- Participial: "Building on earlier work,..."
|
|
58
|
+
- Temporal: "After three years of trials,..."
|
|
59
|
+
- Transitional: "Yet the implications extend..."
|
|
60
|
+
- Conditional: "If these trends continue,..."
|
|
61
|
+
- Demonstrative: "This finding challenges..."
|
|
62
|
+
|
|
63
|
+
Rule: No more than 2 of 10 consecutive sentences should start with the same part of speech.
|
|
64
|
+
|
|
65
|
+
### 7. Paragraph Structure Variation
|
|
66
|
+
- Vary paragraph length: mix 2-sentence, 3-4 sentence, and occasional 5-6 sentence paragraphs
|
|
67
|
+
- Do not follow a predictable pattern (e.g., always short-long-short)
|
|
68
|
+
- Occasionally use a single-sentence paragraph for rhetorical emphasis
|
|
69
|
+
- Vary paragraph openings: not every paragraph should start with a topic sentence; sometimes lead with evidence or a question
|
|
70
|
+
|
|
71
|
+
### 8. Lexical Variation
|
|
72
|
+
- Do not repeat the same word more than twice per paragraph (excluding articles, prepositions, and structural words)
|
|
73
|
+
- Use synonyms and near-synonyms, but ensure each synonym carries the correct connotation for context
|
|
74
|
+
- Mix Latinate and Anglo-Saxon vocabulary for texture: "begin/commence," "ask/inquire," "end/conclude"
|
|
75
|
+
- Vary transition words: do not overuse any single connector ("however," "moreover," "furthermore")
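The "no more than twice per paragraph" rule lends itself to a simple count. The sketch below is an illustration, not the package's logic; the `STRUCTURAL` set is a small assumed stand-in for "articles, prepositions, and structural words" and would need extending in practice.

```python
import re
from collections import Counter

# Assumed, deliberately small exclusion list.
STRUCTURAL = {
    "a", "an", "the", "of", "in", "on", "at", "to", "for", "with", "by",
    "from", "and", "or", "but", "is", "are", "was", "were", "be", "it",
    "this", "that", "as",
}


def overused_words(paragraph: str, limit: int = 2) -> dict[str, int]:
    """Return content words that appear more than `limit` times in the paragraph."""
    tokens = re.findall(r"[a-z']+", paragraph.lower())
    counts = Counter(t for t in tokens if t not in STRUCTURAL)
    return {word: n for word, n in counts.items() if n > limit}
```

The check treats inflected forms as different words, so "show" and "shows" are counted separately.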
|
|
76
|
+
|
|
77
|
+
### 9. Syntactic Variation
|
|
78
|
+
Rotate through syntactic patterns within each section:
|
|
79
|
+
- Simple sentence: Subject + Verb + Object
|
|
80
|
+
- Compound: Independent clause + coordinating conjunction + independent clause
|
|
81
|
+
- Complex: Dependent clause + independent clause (or reversed)
|
|
82
|
+
- Compound-complex: Combination
|
|
83
|
+
- Fragment (deliberate): For emphasis or conversational tone
|
|
84
|
+
- Inverted: Object or adverb fronted for emphasis ("Never before had the data been so clear.")
|
|
85
|
+
- Parenthetical insertion: "The results — and this surprised everyone — exceeded projections."
|
|
86
|
+
|
|
87
|
+
### 10. Discourse Marker Naturalness
|
|
88
|
+
Replace mechanical transition words with organic connectors:
|
|
89
|
+
|
|
90
|
+
| Avoid (mechanical) | Prefer (natural) |
|
|
91
|
+
|-------------------|-----------------|
|
|
92
|
+
| "Furthermore," | Weave the connection into the sentence logic |
|
|
93
|
+
| "In conclusion," | "What this adds up to is..." or simply state the conclusion |
|
|
94
|
+
| "It is worth noting that" | State the thing directly |
|
|
95
|
+
| "Additionally," | Use a colon, semicolon, or restructure to embed the addition |
|
|
96
|
+
| "On the other hand," | "But" (simple, effective), or restructure as contrast within the sentence |
|
|
97
|
+
|
|
98
|
+
## Factual Accuracy Preservation
|
|
99
|
+
|
|
100
|
+
### 11. Claim Inventory
|
|
101
|
+
Before rewriting, create a mental inventory of all verifiable claims:
|
|
102
|
+
- Named entities (people, organizations, places)
|
|
103
|
+
- Numeric data (statistics, dates, measurements, percentages)
|
|
104
|
+
- Causal claims ("X causes Y," "X leads to Y")
|
|
105
|
+
- Quotations and attributed statements
|
|
106
|
+
- Temporal relationships ("before," "after," "during")
|
|
107
|
+
|
|
108
|
+
### 12. Post-Rewrite Accuracy Check
|
|
109
|
+
After rewriting, verify each claim against the inventory:
|
|
110
|
+
- Every named entity must appear correctly (spelling, context)
|
|
111
|
+
- Every statistic must be numerically identical
|
|
112
|
+
- Causal direction must be preserved (do not reverse cause and effect)
|
|
113
|
+
- Quotations must remain verbatim (or clearly marked as paraphrased)
|
|
114
|
+
- No new claims should be introduced that were not in the source
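The "numerically identical" requirement is one part of this checklist that can be checked automatically. A minimal sketch with hypothetical names, assuming numbers are compared as written (so "200" and "2,000" differ, but so would "50%" and "50 percent"):

```python
import re
from collections import Counter

# Digits with optional thousands or decimal separators and an optional percent sign.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")


def numeric_tokens(text: str) -> Counter:
    """Collect every numeric token in the text, keeping duplicates."""
    return Counter(NUMBER.findall(text))


def number_mismatches(source: str, rewrite: str) -> tuple[list[str], list[str]]:
    """Return (numbers lost from the source, numbers that are new in the rewrite)."""
    src, out = numeric_tokens(source), numeric_tokens(rewrite)
    return sorted((src - out).keys()), sorted((out - src).keys())
```

Entities, quotations, and causal direction still need a manual pass; this only covers the numeric claims.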
|
|
115
|
+
|
|
116
|
+
### 13. Meaning Drift Detection
|
|
117
|
+
Watch for subtle meaning changes during rewriting:
|
|
118
|
+
- **Strength drift**: "may contribute to" becoming "causes" (hedging removed)
|
|
119
|
+
- **Scope drift**: "in some studies" becoming "research shows" (overgeneralization)
|
|
120
|
+
- **Attribution drift**: A specific researcher's claim becoming a general consensus
|
|
121
|
+
- **Temporal drift**: A historical finding presented as current without qualification
|
|
122
|
+
- IF any drift is detected THEN correct and re-verify
|
|
123
|
+
|
|
124
|
+
## AI-Detection Avoidance Techniques
|
|
125
|
+
|
|
126
|
+
### 14. Perplexity Management
|
|
127
|
+
AI-generated text tends toward low perplexity (highly predictable next-token sequences). Increase perplexity naturally by:
|
|
128
|
+
- Using unexpected but contextually valid word choices
|
|
129
|
+
- Employing idiomatic expressions and colloquialisms appropriate to the register
|
|
130
|
+
- Introducing deliberate imperfection: a mild digression, an aside, an incomplete thought that resolves later
|
|
131
|
+
- Varying information density: some sentences are content-dense, others are transitional or reflective
|
|
132
|
+
|
|
133
|
+
### 15. Burstiness
|
|
134
|
+
Human writing is "bursty" — it alternates between simple and complex passages. AI text tends toward uniform complexity.
|
|
135
|
+
- Follow a technical explanation with a plainly stated summary
|
|
136
|
+
- Mix analytical passages with anecdotal or illustrative ones
|
|
137
|
+
- Allow the "energy" of the writing to ebb and flow rather than maintaining a constant level
|
|
138
|
+
|
|
139
|
+
### 16. Personal Voice Injection (When Appropriate)
|
|
140
|
+
For non-formal registers, inject markers of personal voice:
|
|
141
|
+
- Parenthetical asides that reveal thought process
|
|
142
|
+
- Qualified opinions ("I'd argue that..." or "The more compelling reading is...")
|
|
143
|
+
- Specific, non-generic examples drawn from plausible real-world scenarios
|
|
144
|
+
- Occasional self-correction ("Or rather, the more precise way to frame this is...")
|
|
145
|
+
|
|
146
|
+
### 17. Structural Unpredictability
|
|
147
|
+
- Do not always follow the same macro-structure (intro-body-conclusion)
|
|
148
|
+
- Vary where the thesis or key point appears (first, middle, or end of section)
|
|
149
|
+
- Use different organizational patterns: chronological, spatial, problem-solution, comparison, cause-effect
|
|
150
|
+
- Occasionally break a "rule" of writing deliberately and purposefully (a one-word paragraph, a question left unanswered, a list with an odd number of items)
|
package/knowledge/domain.md
ADDED
|
@@ -0,0 +1,129 @@
|
|
|
1
|
+
---
|
|
2
|
+
domain: rewriter
|
|
3
|
+
topic: writing-styles-register-tone-audience
|
|
4
|
+
priority: high
|
|
5
|
+
ttl: 30d
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# Content Rewriting — Writing Style Taxonomy, Register, Tone & Audience Types
|
|
9
|
+
|
|
10
|
+
## Writing Style Taxonomy
|
|
11
|
+
|
|
12
|
+
### 1. Academic Style
|
|
13
|
+
- **Characteristics**: Third person, passive voice permitted, citation-heavy, precise terminology, hedged claims ("findings suggest"), complex sentence structures with subordinate clauses
|
|
14
|
+
- **Vocabulary**: Domain-specific jargon, Latin-derived terms, nominalized verbs (e.g., "the investigation of" instead of "investigating")
|
|
15
|
+
- **Structure**: Thesis-driven paragraphs, topic sentences, evidence-reasoning chains, explicit transitions ("Furthermore," "In contrast," "Consequently")
|
|
16
|
+
- **Use cases**: Research papers, literature reviews, dissertations, grant proposals
|
|
17
|
+
|
|
18
|
+
### 2. Journalistic Style
|
|
19
|
+
- **Characteristics**: Inverted pyramid (most important information first), active voice dominant, short paragraphs (1-3 sentences), attribution-heavy, objective framing
|
|
20
|
+
- **Vocabulary**: Concrete nouns, strong verbs, accessible language (8th-10th grade reading level), minimal adjectives
|
|
21
|
+
- **Structure**: Lead (who/what/when/where/why), nut graph (context), body (supporting detail), kicker (closing)
|
|
22
|
+
- **Use cases**: News articles, press releases, feature stories, investigative reports
|
|
23
|
+
|
|
24
|
+
### 3. Conversational Style
|
|
25
|
+
- **Characteristics**: First/second person, contractions, rhetorical questions, colloquialisms, sentence fragments for emphasis, direct address
|
|
26
|
+
- **Vocabulary**: Everyday language (6th-8th grade level), slang where appropriate, informal connectors ("so," "anyway," "here's the thing")
|
|
27
|
+
- **Structure**: Loose organization, anecdotal openings, short paragraphs, parenthetical asides
|
|
28
|
+
- **Use cases**: Blog posts, personal essays, social media, informal emails, podcasts
|
|
29
|
+
|
|
30
|
+
### 4. Technical Style
|
|
31
|
+
- **Characteristics**: Imperative mood for instructions, precise terminology, numbered steps, code examples, specification-oriented, minimal ambiguity
|
|
32
|
+
- **Vocabulary**: Domain-specific terminology defined on first use, acronyms with expansions, consistent naming conventions
|
|
33
|
+
- **Structure**: Hierarchical headings, numbered procedures, prerequisite lists, note/warning/tip callouts, tables for reference data
|
|
34
|
+
- **Use cases**: Documentation, API references, user guides, technical specifications, runbooks
|
|
35
|
+
|
|
36
|
+
### 5. Literary Style
|
|
37
|
+
- **Characteristics**: Figurative language (metaphor, simile, personification), varied rhythm, sensory detail, show-don't-tell, subtext
|
|
38
|
+
- **Vocabulary**: Rich and varied; mixes register for effect; archaic or invented words when purposeful; avoids cliche
|
|
39
|
+
- **Structure**: Scene-based, non-linear permitted, paragraph length varies dramatically for pacing, white space as punctuation
|
|
40
|
+
- **Use cases**: Fiction, creative nonfiction, essays, speeches, branded storytelling
|
|
41
|
+
|
|
42
|
+
### 6. Marketing/Persuasive Style
|
|
43
|
+
- **Characteristics**: Benefit-driven, social proof, urgency cues, power words, call-to-action, emotional triggers, problem-agitation-solution framework
|
|
44
|
+
- **Vocabulary**: Superlatives (used sparingly for credibility), action verbs, "you"-centric, sensory adjectives, numbers for specificity
|
|
45
|
+
- **Structure**: Hook headline, pain point identification, solution framing, proof/testimonials, CTA; short paragraphs, bullet points
|
|
46
|
+
- **Use cases**: Landing pages, ad copy, email campaigns, sales letters, product descriptions
|
|
47
|
+
|
|
48
|
+
## Register (Formality Levels)
|
|
49
|
+
|
|
50
|
+
| Register | Markers | Example Phrase |
|
|
51
|
+
|----------|---------|----------------|
|
|
52
|
+
| **Frozen** | Ritualized, unchanging, ceremonial | "We the People of the United States, in order to form a more perfect Union..." |
|
|
53
|
+
| **Formal** | Complete sentences, no contractions, third person, technical precision | "The committee recommends that the organization adopt the revised policy framework." |
|
|
54
|
+
| **Consultative** | Professional but accessible, some first person, structured | "We've analyzed the data and recommend moving forward with Option B." |
|
|
55
|
+
| **Casual** | Contractions, first/second person, ellipsis, slang-lite | "So we looked at the numbers and honestly, Option B's the way to go." |
|
|
56
|
+
| **Intimate** | In-group language, incomplete sentences, personal references, heavy context-dependence | "Same thing as last time — go with B, yeah?" |
|
|
57
|
+
|
|
58
|
+
## Tone Dimensions
|
|
59
|
+
|
|
60
|
+
Tone operates independently of style and register. A single piece can shift tone while maintaining consistent style.
|
|
61
|
+
|
|
62
|
+
### Primary Tone Axes
|
|
63
|
+
|
|
64
|
+
| Axis | Spectrum | Markers |
|
|
65
|
+
|------|----------|---------|
|
|
66
|
+
| **Warmth** | Cold -- Neutral -- Warm -- Empathetic | Personal pronouns, acknowledgment phrases, emotional vocabulary |
|
|
67
|
+
| **Authority** | Tentative -- Balanced -- Authoritative -- Commanding | Hedging ("perhaps") vs. declaratives ("This is"); imperative mood |
|
|
68
|
+
| **Energy** | Subdued -- Measured -- Energetic -- Urgent | Sentence length, exclamation, pacing, word intensity |
|
|
69
|
+
| **Objectivity** | Subjective -- Balanced -- Objective -- Clinical | Opinion markers, attribution, evidence citation, emotional distance |
|
|
70
|
+
|
|
71
|
+
### Tone Combinations
|
|
72
|
+
|
|
73
|
+
- **Warm + Authoritative**: Mentor voice (e.g., experienced teacher explaining a concept)
|
|
74
|
+
- **Cold + Authoritative**: Institutional voice (e.g., legal notices, policy documents)
|
|
75
|
+
- **Warm + Tentative**: Peer voice (e.g., collaborative brainstorming, supportive feedback)
- **Energetic + Subjective**: Advocate voice (e.g., opinion pieces, rallying speeches)
- **Measured + Objective**: Analyst voice (e.g., research summaries, data-driven reports)

## Audience Types & Adaptation Parameters

### Expert Audience
- **Assumed knowledge**: High domain fluency; no need to define standard terms
- **Vocabulary**: Full technical lexicon; abbreviations without expansion
- **Depth**: Deep analysis, edge cases, nuance, limitations
- **Evidence**: Primary sources, raw data, methodological detail
- **Reading level**: Graduate/post-graduate (Flesch-Kincaid 12+)

### Practitioner Audience
- **Assumed knowledge**: Working domain knowledge; familiar with common concepts but not cutting-edge research
- **Vocabulary**: Standard domain terms; define novel or ambiguous terms
- **Depth**: Practical implications, how-to guidance, decision frameworks
- **Evidence**: Applied research, case studies, benchmarks, best practices
- **Reading level**: Undergraduate (Flesch-Kincaid 10-12)

### General Audience
- **Assumed knowledge**: No domain-specific knowledge assumed
- **Vocabulary**: Plain language; analogies for complex concepts; minimal jargon
- **Depth**: Key takeaways, "why it matters," big-picture framing
- **Evidence**: Simplified statistics, relatable examples, visual aids
- **Reading level**: 8th-10th grade (Flesch-Kincaid 8-10)

### Youth/Student Audience
- **Assumed knowledge**: Basic literacy; learning context assumed
- **Vocabulary**: Simple, concrete; definitions provided inline; avoid abstractions
- **Depth**: Foundational concepts, step-by-step explanations, curiosity hooks
- **Evidence**: Everyday examples, experiments, stories
- **Reading level**: 5th-7th grade (Flesch-Kincaid 5-7)

### Executive/Decision-Maker Audience
- **Assumed knowledge**: Business fluency; limited technical depth
- **Vocabulary**: Business terminology; translate technical concepts to impact/ROI language
- **Depth**: Executive summary, strategic implications, risk/benefit, recommendation
- **Evidence**: KPIs, competitive benchmarks, financial projections, expert endorsements
- **Reading level**: Concise regardless of complexity (Flesch-Kincaid 10-12, short sentences)
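
The reading-level targets listed for the five audience types can be held in a small lookup table. The sketch below is one possible encoding, not part of the package: the `AudienceProfile` class, the `PROFILES` keys, and `in_target_range` are hypothetical names, and only the Flesch-Kincaid bounds come from the lists above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudienceProfile:
    """Hypothetical container for one audience type's reading-level target."""
    name: str
    fk_min: float            # lower bound of the target Flesch-Kincaid grade
    fk_max: Optional[float]  # upper bound; None means open-ended ("12+")


# Bounds copied from the audience descriptions; the dictionary keys are assumed.
PROFILES = {
    "expert": AudienceProfile("Expert", 12, None),
    "practitioner": AudienceProfile("Practitioner", 10, 12),
    "general": AudienceProfile("General", 8, 10),
    "youth": AudienceProfile("Youth/Student", 5, 7),
    "executive": AudienceProfile("Executive/Decision-Maker", 10, 12),
}


def in_target_range(audience: str, grade: float) -> bool:
    """Return True if an estimated grade level falls inside the audience's band."""
    profile = PROFILES[audience]
    if grade < profile.fk_min:
        return False
    return profile.fk_max is None or grade <= profile.fk_max
```

A rewrite aimed at a general audience that scores around grade 9 would pass this check; the same text aimed at a youth audience would not.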

## Style Transformation Matrix

When rewriting from Source Style to Target Style, these are the primary transformation axes:

| Transformation | Key Changes |
|----------------|-------------|
| Academic -> Journalistic | Remove hedging, front-load conclusions, shorten sentences, replace jargon with plain terms, add attribution |
| Academic -> Conversational | Switch to first/second person, use contractions, add rhetorical questions, simplify evidence to anecdotes |
| Technical -> General | Replace jargon with analogies, remove prerequisites, add "why" context, use narrative structure |
| Technical -> Executive | Lead with business impact, quantify outcomes, collapse detail into summary, add recommendation |
| Journalistic -> Academic | Add hedging, expand evidence, restructure to thesis-evidence, formalize vocabulary |
| Marketing -> Journalistic | Remove superlatives and urgency, add balanced perspective, cite sources, neutral framing |
| Literary -> Technical | Extract factual content, remove figurative language, impose hierarchical structure, add precision |
| Conversational -> Formal | Remove contractions and slang, restructure fragments into complete sentences, third person, formal transitions |
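
A source-to-target lookup over this matrix might be sketched as follows. The `STYLE_MATRIX` dictionary and `key_changes` function are hypothetical; only three rows are encoded, for brevity, and the package itself may not represent the matrix this way.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding of three rows from the matrix (subset for brevity).
STYLE_MATRIX = {
    ("academic", "journalistic"): [
        "Remove hedging",
        "Front-load conclusions",
        "Shorten sentences",
        "Replace jargon with plain terms",
        "Add attribution",
    ],
    ("technical", "executive"): [
        "Lead with business impact",
        "Quantify outcomes",
        "Collapse detail into summary",
        "Add recommendation",
    ],
    ("conversational", "formal"): [
        "Remove contractions and slang",
        "Restructure fragments into complete sentences",
        "Third person",
        "Formal transitions",
    ],
}


def key_changes(source: str, target: str) -> Optional[List[str]]:
    """Return the listed key changes for a style pair, or None if the pair is not encoded."""
    pair: Tuple[str, str] = (source.strip().lower(), target.strip().lower())
    return STYLE_MATRIX.get(pair)
```

Returning `None` for an unlisted pair makes it explicit that the transformation axes have to be worked out by hand in that case.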
package/manifest.json
ADDED
@@ -0,0 +1,28 @@
{
  "name": "@botlearn/rewriter",
  "version": "0.1.0",
  "description": "Style transformation, audience adaptation, and natural content rewriting with factual accuracy preservation for OpenClaw Agent",
  "category": "content-processing",
  "author": "BotLearn",
  "benchmarkDimension": "content-understanding",
  "expectedImprovement": 35,
  "dependencies": {
    "@botlearn/summarizer": "^1.0.0"
  },
  "compatibility": {
    "openclaw": ">=0.5.0"
  },
  "files": {
    "skill": "skill.md",
    "knowledge": [
      "knowledge/domain.md",
      "knowledge/best-practices.md",
      "knowledge/anti-patterns.md"
    ],
    "strategies": [
      "strategies/main.md"
    ],
    "smokeTest": "tests/smoke.json",
    "benchmark": "tests/benchmark.json"
  }
}
package/package.json
ADDED
@@ -0,0 +1,38 @@
{
  "name": "@botlearn/rewriter",
  "version": "0.1.0",
  "description": "Style transformation, audience adaptation, and natural content rewriting with factual accuracy preservation for OpenClaw Agent",
  "type": "module",
  "main": "manifest.json",
  "files": [
    "manifest.json",
    "skill.md",
    "knowledge/",
    "strategies/",
    "tests/",
    "README.md"
  ],
  "keywords": [
    "botlearn",
    "openclaw",
    "skill",
    "content-processing"
  ],
  "author": "BotLearn",
  "license": "MIT",
  "dependencies": {
    "@botlearn/summarizer": "0.1.0"
  },
  "repository": {
    "type": "git",
    "url": "https://github.com/readai-team/botlearn-awesome-skills.git",
    "directory": "packages/skills/rewriter"
  },
  "homepage": "https://github.com/readai-team/botlearn-awesome-skills/tree/main/packages/skills/rewriter",
  "bugs": {
    "url": "https://github.com/readai-team/botlearn-awesome-skills/issues"
  },
  "publishConfig": {
    "access": "public"
  }
}
package/skill.md
ADDED
@@ -0,0 +1,44 @@
---
name: rewriter
role: Content Rewriting Specialist
version: 1.0.0
triggers:
  - "rewrite"
  - "rephrase"
  - "paraphrase"
  - "reword"
  - "adapt for audience"
---

# Role

You are a Content Rewriting Specialist. When activated, you transform text across styles, registers, and audiences while preserving factual accuracy and producing output that reads as naturally human-written. You leverage the @botlearn/summarizer dependency to extract core meaning before rewriting, ensuring no semantic drift.

# Capabilities

1. Transform writing style (academic, journalistic, conversational, technical, literary, marketing) while preserving core meaning and factual claims
2. Adapt content for specific audiences by adjusting register, vocabulary complexity, assumed knowledge level, and cultural framing
3. Paraphrase and rephrase text with sufficient lexical and syntactic variation to avoid AI-detection patterns while maintaining readability
4. Preserve factual accuracy at >= 95% fidelity by cross-checking rewritten claims against the source material
5. Adjust tone (formal, neutral, casual, persuasive, empathetic) independently of style, allowing fine-grained control over voice

# Constraints

1. Never alter factual claims, statistics, proper nouns, or quoted material unless explicitly instructed to do so
2. Never produce rewritten text that reverses, contradicts, or materially changes the source meaning
3. Never use uniform sentence structure or predictable paragraph patterns; vary sentence length, opening words, and paragraph size
4. Never introduce hedging language ("It is important to note that...") unless the source material contains equivalent hedging
5. Always disclose when a rewrite significantly changes the emphasis or framing of the source, even if meaning is technically preserved
6. Always target an AI-detection rate below 20% by applying naturalness techniques from knowledge/best-practices.md

# Activation

WHEN the user requests a rewrite, rephrase, paraphrase, rewording, or audience adaptation:
1. Use @botlearn/summarizer to extract the core meaning, key claims, and structural intent of the source text
2. Analyze the source style, register, and audience using knowledge/domain.md
3. Profile the target audience and map the required style transformation
4. Execute the rewrite following strategies/main.md
5. Apply naturalness and variation patterns from knowledge/best-practices.md
6. Verify against knowledge/anti-patterns.md to avoid AI-detectable patterns
7. Perform factual accuracy verification against the source material
8. Output the rewritten text with a brief transformation summary (source style -> target style, audience, key adaptations made)
package/strategies/main.md
ADDED
@@ -0,0 +1,95 @@
---
strategy: rewriter
version: 1.0.0
steps: 6
---

# Content Rewriting Strategy

## Step 1: Source Analysis
- Parse the source text to identify: **style** (academic, journalistic, conversational, technical, literary, marketing), **register** (frozen/formal/consultative/casual/intimate), **tone** (warmth, authority, energy, objectivity axes), and **audience** (expert, practitioner, general, youth, executive)
- Use @botlearn/summarizer to extract the semantic core: key claims, factual data, logical structure, and authorial intent
- Build a **claim inventory**: list every verifiable fact, statistic, named entity, quotation, causal claim, and temporal relationship
- Measure source text characteristics: average sentence length, vocabulary complexity (Flesch-Kincaid estimate), paragraph structure patterns
- IF the user has not specified a target style or audience THEN infer from context or ask one clarifying question before proceeding
- Identify elements that must remain unchanged: proper nouns, direct quotations, technical terms with no equivalent, numerical data
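
The measurement bullet in Step 1 (average sentence length and a Flesch-Kincaid estimate) could be approximated as below. This is a rough sketch under stated assumptions: `count_syllables` and `text_stats` are hypothetical helpers, the syllable counter is a vowel-group heuristic rather than a dictionary lookup, and the naive sentence splitter will miscount text containing decimals or abbreviations. The grade formula is the standard Flesch-Kincaid grade level.

```python
import re


def count_syllables(word: str) -> int:
    """Heuristic syllable count: vowel groups, minus a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> dict:
    """Return sentence count, word count, average sentence length, and FK grade."""
    # Naive split on terminal punctuation; decimals such as "149.99" will confuse it.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Flesch-Kincaid grade level formula.
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return {
        "sentences": len(sentences),
        "words": len(words),
        "avg_sentence_length": words_per_sentence,
        "fk_grade": grade,
    }
```

The result is an estimate only; it is good enough to compare a source and its rewrite against each other, less so as an absolute grade.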

## Step 2: Audience Profiling
- Determine the target audience from the user's request (explicit) or from the target platform/context (implicit)
- Map the audience to the audience type taxonomy in knowledge/domain.md: Expert, Practitioner, General, Youth, Executive
- Define adaptation parameters:
  - **Reading level**: Target Flesch-Kincaid score for the audience
  - **Assumed knowledge**: What concepts can be referenced without explanation?
  - **Vocabulary ceiling**: Maximum complexity of word choice
  - **Evidence expectations**: What kind of proof does this audience value?
  - **Engagement style**: Does this audience prefer data, narrative, authority, or relatability?
- IF the source audience matches the target audience THEN focus adaptation on style and tone rather than knowledge-level adjustment
- IF the target audience has lower domain knowledge than the source THEN identify all jargon and technical terms requiring simplification or explanation

## Step 3: Style Mapping
- Identify the source-to-target transformation pair using the Style Transformation Matrix in knowledge/domain.md
- Define the transformation axes:
  - **Vocabulary changes**: Words to replace, register-appropriate alternatives, jargon handling
  - **Sentence structure changes**: Active/passive shifts, sentence length targets, complexity adjustments
  - **Paragraph structure changes**: Information order, paragraph length variation, use of topic sentences
  - **Tone shifts**: Adjustments on warmth, authority, energy, and objectivity axes
  - **Register adjustments**: Formality markers to add or remove (contractions, person, address mode)
- Plan the macro-structure of the rewrite:
  - IF the target style uses a specific structure (e.g., inverted pyramid for journalistic) THEN reorganize the content to match
  - IF the source structure is adequate for the target style THEN preserve it to minimize unnecessary changes
- Pre-select naturalness techniques from knowledge/best-practices.md to apply during rewriting:
  - Sentence length variation targets (Section 5)
  - Opening word diversity strategy (Section 6)
  - Paragraph structure variation plan (Section 7)

## Step 4: Rewrite with Variation
- Execute the rewrite section by section, applying the mapped transformations:
  - Transform vocabulary according to target register and audience reading level
  - Restructure sentences to match target style patterns
  - Adjust tone markers (hedging, directness, warmth, authority)
  - Reorganize information flow if the target style demands it
- APPLY variation patterns throughout (from knowledge/best-practices.md):
  - Vary sentence length: enforce the 4-8 / 9-18 / 19-35 / 36+ word distribution
  - Rotate opening words: no more than 2 of 10 consecutive sentences starting with the same part of speech
  - Mix syntactic patterns: simple, compound, complex, compound-complex, fragments, inversions
  - Vary paragraph length: mix 2-sentence, 3-4 sentence, and 5-6 sentence paragraphs
- AVOID AI-tell vocabulary from knowledge/anti-patterns.md (Section 4): do not use "delve," "landscape" (metaphorical), "multifaceted," "pivotal," "tapestry," "robust," "comprehensive," "cutting-edge," or other flagged terms
- AVOID over-hedging (Section 5): strip unnecessary qualifiers; reserve hedging for genuinely uncertain claims
- AVOID formulaic openings and closings (Section 9): no "In today's rapidly evolving world," no "In conclusion, it is clear that"
- IF the source contains lists or bullet points AND the target style is prose THEN integrate items into flowing sentences with varied rhetorical relationships, not additive transitions
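
To see how a draft falls across the 4-8 / 9-18 / 19-35 / 36+ word bands named in Step 4, sentence lengths can be tallied per band. The following is a minimal sketch: `bucket_for` and `length_distribution` are hypothetical names, the extra `<4` band for very short sentences or fragments is an assumption (the strategy does not name one), and the target proportions per band live in knowledge/best-practices.md, which is not reproduced here.

```python
import re
from collections import Counter

# Word-count bands taken from Step 4; (label, low, high) inclusive.
BANDS = [("4-8", 4, 8), ("9-18", 9, 18), ("19-35", 19, 35)]


def bucket_for(word_count: int) -> str:
    """Map a sentence's word count to its band label."""
    if word_count < 4:
        return "<4"  # assumed catch-all for fragments; not named in the strategy
    for label, low, high in BANDS:
        if low <= word_count <= high:
            return label
    return "36+"


def length_distribution(text: str) -> dict:
    """Count sentences per length band, splitting on terminal punctuation."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return dict(Counter(bucket_for(len(s.split())) for s in sentences))
```

A draft whose sentences all land in one band is a candidate for another variation pass.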

## Step 5: Naturalness Check
- Perform an AI-detection resistance review:
  - **Perplexity check**: Read through the output and identify any passages where the next word feels highly predictable; inject unexpected but valid word choices or restructure
  - **Burstiness check**: Verify that complexity alternates (analytical passages should be followed by simpler ones) and that the "energy" ebbs and flows
  - **Sentence length distribution**: Calculate approximate word counts per sentence; ensure the standard deviation of sentence length is > 6 words across paragraphs
  - **Opening word audit**: Verify no more than 2 of 10 consecutive sentences start with the same word or part of speech
  - **AI-tell scan**: Search the output for any words from the anti-patterns vocabulary list (knowledge/anti-patterns.md, Section 4); replace any found
  - **Hedge audit**: Count hedging phrases per paragraph; if more than 2, remove or rephrase
  - **Transition audit**: Check for mechanical transition words ("Furthermore," "Additionally," "Moreover"); replace with organic connectors or restructure
- IF any check fails THEN revise the specific passage and re-check
- Perform a **register consistency pass**: read the output in the voice of the target audience; flag any word or phrase that feels out of place for the target register
- Target: AI-detection rate < 20% as measured by standard detection tools
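
Several of the Step 5 checks are mechanical enough to automate. The sketch below covers the sentence-length standard deviation, the opening-word audit (same word only; part-of-speech matching would need a tagger), the transition audit, and a partial AI-tell scan. `naturalness_report` is a hypothetical helper, the word lists contain only the terms quoted in this strategy rather than the full list in knowledge/anti-patterns.md, and the use of population standard deviation is an assumption, since the strategy does not say which variant it means.

```python
import re
import statistics
from collections import Counter

# Only the terms quoted in the strategy text; the full lists live elsewhere.
MECHANICAL_TRANSITIONS = ("furthermore", "additionally", "moreover")
AI_TELLS = ("delve", "multifaceted", "pivotal", "tapestry",
            "robust", "comprehensive", "cutting-edge")


def split_sentences(text: str) -> list:
    """Split on whitespace that follows terminal punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def naturalness_report(text: str) -> dict:
    """Run the mechanical subset of the Step 5 checks on a draft."""
    sentences = split_sentences(text)
    lengths = [len(s.split()) for s in sentences]
    # Population standard deviation of sentence length, in words (assumed variant).
    sd = statistics.pstdev(lengths) if lengths else 0.0

    lowered = text.lower()
    transitions = [t for t in MECHANICAL_TRANSITIONS
                   if re.search(rf"\b{re.escape(t)}\b", lowered)]
    tells = [w for w in AI_TELLS
             if re.search(rf"\b{re.escape(w)}\b", lowered)]

    # Opening-word audit: flag any window of up to 10 consecutive sentences
    # in which more than 2 begin with the same word.
    openers = [s.split()[0].lower().strip(",.;:\"'") for s in sentences]
    overuse = False
    for start in range(max(1, len(openers) - 9)):
        window = openers[start:start + 10]
        if window and max(Counter(window).values()) > 2:
            overuse = True
            break

    return {
        "length_sd": sd,
        "length_sd_ok": sd > 6,
        "mechanical_transitions": transitions,
        "ai_tells": tells,
        "opener_overuse": overuse,
    }
```

The perplexity, burstiness, and register checks still depend on a read-through; nothing here measures them.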

## Step 6: Accuracy Verification
- Compare the rewritten text against the claim inventory from Step 1:
  - **Named entities**: Every proper noun must appear correctly (spelling, context, attribution)
  - **Numeric data**: Every statistic, date, measurement, and percentage must be numerically identical to the source
  - **Causal relationships**: Verify that cause-effect directionality is preserved (no reversal through paraphrase)
  - **Negations**: Verify that all negations survive the rewrite ("does not" must not become "does" through restructuring)
  - **Qualifiers**: Check that scope qualifiers are preserved ("some studies" must not become "studies" or "all studies")
  - **Quotations**: Direct quotes must remain verbatim; paraphrased quotes must be clearly marked as paraphrased
  - **Temporal relationships**: "Before," "after," "during," and "while" relationships must match the source
- IF any factual discrepancy is found THEN correct the rewritten passage and re-verify
- Estimate factual accuracy percentage: target >= 95%
- SELF-CHECK before output:
  - Does the rewrite match the requested target style and audience?
  - Is the tone consistent throughout and appropriate for the target?
  - Are all factual claims accurate and traceable to the source?
  - Does the text read naturally and avoid AI-detection patterns?
- IF any check fails THEN loop back to the relevant step (Step 4 for style/variation issues, Step 5 for naturalness issues)
- Output the rewritten text with a brief **transformation summary**:
  - Source style/register/audience -> Target style/register/audience
  - Key adaptations made (vocabulary level, structure changes, tone shifts)
  - Factual accuracy confidence (percentage of claims verified)
  - Any caveats (emphasis shifts, simplifications that reduced nuance, terms left untranslated)
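
The numeric-data check in Step 6 lends itself to a simple first pass: pull every number out of the source and confirm each one reappears in the rewrite. The sketch below is deliberately conservative and is only an illustration; `numeric_tokens` and `missing_numbers` are hypothetical names, and the comparison is textual, so a rewrite that spells a number out ("thirty") or reformats it ("1256" for "1,256") is flagged for manual review rather than accepted.

```python
import re
from collections import Counter

# Digits with optional internal separators: 1,256  149.99  14:32  2024
NUMBER = re.compile(r"\d+(?:[.,:]\d+)*")


def numeric_tokens(text: str) -> Counter:
    """Count each numeric token as it is written in the text."""
    return Counter(NUMBER.findall(text))


def missing_numbers(source: str, rewrite: str) -> list:
    """Return source numbers that do not appear (often enough) in the rewrite."""
    remaining = numeric_tokens(source) - numeric_tokens(rewrite)
    return sorted(remaining.elements())
```

For example, with the headphone description used in the benchmark as the source, a rewrite that says "thirty hours" instead of "30-hour" would yield `["30"]`, prompting a manual look. The other claim types (negations, causal direction, qualifiers) are not covered by this check.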
package/tests/benchmark.json
ADDED
@@ -0,0 +1,476 @@
{
  "version": "0.0.1",
  "dimension": "content-understanding",
  "tasks": [
    {
      "id": "bench-easy-01",
      "difficulty": "easy",
      "description": "Simple register shift from formal to casual",
      "input": "Rewrite this formal notice in a casual, friendly tone:\n\n\"We wish to inform all employees that the annual performance evaluation cycle will commence on March 15, 2025. All staff members are required to complete their self-assessment documentation prior to the scheduled review meetings. Failure to submit the requisite materials by the designated deadline may result in a delay of the evaluation process.\"",
      "rubric": [
        {
          "criterion": "Register Shift",
          "weight": 0.35,
          "scoring": {
            "5": "Fully casual: contractions, direct address, friendly language, no corporate stiffness; reads like a message from a colleague",
            "3": "Partially casual but retains some formal phrasing ('requisite', 'designated', 'commence')",
            "1": "Minor word changes but fundamentally still formal in structure and vocabulary",
            "0": "No register change"
          }
        },
        {
          "criterion": "Factual Accuracy",
          "weight": 0.35,
          "scoring": {
            "5": "All facts preserved: March 15 date, self-assessment requirement, deadline consequence",
            "3": "Key facts present but one detail altered or omitted",
            "1": "Multiple factual changes or omissions",
            "0": "Facts fabricated or reversed"
          }
        },
        {
          "criterion": "Naturalness",
          "weight": 0.3,
          "scoring": {
            "5": "Reads like genuine casual human writing; varied sentence structure, no AI-tell vocabulary",
            "3": "Mostly natural but some AI patterns detectable",
            "1": "Stilted casual tone; feels like AI attempting casual",
            "0": "Unnatural throughout"
          }
        }
      ],
      "expectedScoreWithout": 40,
      "expectedScoreWith": 80
    },
    {
      "id": "bench-easy-02",
      "difficulty": "easy",
      "description": "Basic paraphrase with synonym variation",
      "input": "Paraphrase the following paragraph using different words and sentence structures while keeping the same meaning:\n\n\"Remote work has become increasingly common since 2020. Many companies discovered that employees could be productive outside traditional office settings. However, some managers remain skeptical about long-term remote work, citing concerns about team collaboration and company culture.\"",
      "rubric": [
        {
          "criterion": "Lexical Variation",
          "weight": 0.35,
          "scoring": {
            "5": "Substantial word replacement with contextually appropriate synonyms; sentence structures meaningfully reorganized; not a word-swap exercise",
            "3": "Moderate synonym use but sentence structures largely mirror the original",
            "1": "Minimal changes; mostly the same words with minor swaps",
            "0": "Near-verbatim copy of source"
          }
        },
        {
          "criterion": "Meaning Preservation",
          "weight": 0.35,
          "scoring": {
            "5": "All three ideas intact: remote work growth since 2020, productivity discovery, manager skepticism with specific concerns (collaboration, culture)",
            "3": "Core meaning preserved but one nuance lost or altered",
            "1": "Significant meaning drift; key qualifications missing",
            "0": "Meaning changed or contradicted"
          }
        },
        {
          "criterion": "Naturalness",
          "weight": 0.3,
          "scoring": {
            "5": "Reads fluently as original human prose; no thesaurus-abuse artifacts; appropriate register maintained",
            "3": "Mostly fluent but occasional awkward synonym choices or stilted phrasing",
            "1": "Reads like mechanical synonym substitution",
            "0": "Incoherent or unnatural"
          }
        }
      ],
      "expectedScoreWithout": 45,
      "expectedScoreWith": 82
    },
    {
      "id": "bench-easy-03",
      "difficulty": "easy",
      "description": "Tone adjustment from neutral to persuasive",
      "input": "Rewrite this neutral product description to be persuasive and compelling, as if for a landing page:\n\n\"The XR-500 headphones have 40mm drivers, active noise cancellation, and a 30-hour battery life. They weigh 250 grams and connect via Bluetooth 5.3. The retail price is $149.99.\"",
      "rubric": [
        {
          "criterion": "Persuasive Transformation",
          "weight": 0.35,
          "scoring": {
            "5": "Compelling marketing copy: benefit-driven framing, emotional hooks, sensory language, clear value proposition; features translated to user benefits",
            "3": "Some persuasive elements but reads more like enhanced spec sheet than marketing copy",
            "1": "Minimal persuasive language; mostly restated specs with adjectives added",
            "0": "No persuasive transformation"
          }
        },
        {
          "criterion": "Factual Accuracy",
          "weight": 0.35,
          "scoring": {
            "5": "All specs accurate: 40mm drivers, ANC, 30-hour battery, 250g, Bluetooth 5.3, $149.99",
            "3": "Most specs correct but one rounded or approximated",
            "1": "Multiple spec errors",
            "0": "Specs fabricated or significantly wrong"
          }
        },
        {
          "criterion": "Naturalness",
          "weight": 0.3,
          "scoring": {
            "5": "Reads like professional human copywriting; varied rhythm, no AI-tell vocabulary, genuine enthusiasm without being generic",
            "3": "Decent copy but some AI patterns (overuse of 'experience', 'seamlessly', 'elevate')",
            "1": "Generic AI marketing tone; could describe any product",
            "0": "Stilted or incoherent"
          }
        }
      ],
      "expectedScoreWithout": 40,
      "expectedScoreWith": 78
    },
    {
      "id": "bench-med-01",
      "difficulty": "medium",
      "description": "Academic-to-journalistic style transformation with complex source material",
      "input": "Rewrite this academic abstract as a news article lead and body (3-4 paragraphs, journalistic style):\n\n\"This randomized controlled trial (N=1,256; 18-month follow-up) evaluated the efficacy of a combined cognitive-behavioral therapy and mindfulness-based intervention for treatment-resistant depression. Participants randomized to the intervention arm (n=628) demonstrated a clinically significant reduction in PHQ-9 scores (mean difference: -4.7, 95% CI: -5.9 to -3.5, p < 0.001) compared to the treatment-as-usual control group (n=628). Secondary outcomes included improved sleep quality (Pittsburgh Sleep Quality Index improvement of 2.3 points) and reduced antidepressant dosage in 34% of intervention participants. Adverse events were reported by 8% of intervention participants, primarily mild (headache, transient anxiety increase). The findings suggest that integrated CBT-mindfulness approaches may offer a viable augmentation strategy for patients who have not responded adequately to pharmacotherapy alone.\"",
      "rubric": [
        {
          "criterion": "Style Transformation",
          "weight": 0.25,
          "scoring": {
            "5": "Authentic journalistic style: inverted pyramid, lead with key finding, accessible language, attribution, short paragraphs; no academic register",
            "3": "Partially journalistic but retains academic phrasing or structure",
            "1": "Minor surface changes; still reads as academic text",
            "0": "No style transformation"
          }
        },
        {
          "criterion": "Factual Accuracy",
          "weight": 0.3,
          "scoring": {
            "5": "All key data preserved: 1256 participants, 18 months, PHQ-9 reduction of 4.7, sleep improvement, 34% dose reduction, 8% adverse events",
            "3": "Most data preserved but 1-2 statistics simplified or omitted",
            "1": "Several factual errors or major omissions",
            "0": "Facts altered or fabricated"
          }
        },
        {
          "criterion": "Audience Accessibility",
          "weight": 0.2,
          "scoring": {
            "5": "Clinical terms explained or translated (PHQ-9 explained, treatment-resistant defined, CBT spelled out); no unexplained jargon",
            "3": "Some jargon simplified but technical terms remain without context",
            "1": "Jargon largely retained; general reader would struggle",
            "0": "No accessibility improvement"
          }
        },
        {
          "criterion": "AI-Detection Resistance",
          "weight": 0.25,
          "scoring": {
            "5": "Varied sentence length, diverse openers, no AI-tell vocabulary, natural paragraph rhythm, genuine journalistic voice",
            "3": "Some variation but AI patterns visible in places",
            "1": "Uniform rhythm and predictable structure",
            "0": "Obvious AI-generated patterns"
          }
        }
      ],
      "expectedScoreWithout": 30,
      "expectedScoreWith": 72
    },
    {
      "id": "bench-med-02",
      "difficulty": "medium",
      "description": "Technical documentation to executive summary with audience shift",
      "input": "Rewrite this technical incident report as a 2-paragraph executive summary for the C-suite:\n\n\"At 14:32 UTC on 2024-11-15, the primary PostgreSQL database cluster (db-prod-01 through db-prod-03) experienced cascading connection pool exhaustion due to a long-running query originating from the analytics service (commit hash: a3f7b2c). The query executed a sequential scan on the `user_events` table (4.2B rows) without the expected index on `created_at`, triggered by an ORM migration that dropped the index during a schema update (migration #247). Connection pool saturation propagated to the API gateway within 3 minutes, resulting in HTTP 503 errors for 78% of requests. The incident affected approximately 340,000 active users across all regions. MTTR was 47 minutes, achieved by killing the offending query (14:41 UTC), restoring the index (14:52 UTC), and gradually draining the connection pool backlog (15:19 UTC). Root cause: insufficient migration review process; the index drop was not flagged during code review as the migration file did not include a performance impact annotation.\"",
      "rubric": [
        {
          "criterion": "Audience Adaptation",
          "weight": 0.3,
          "scoring": {
            "5": "Executive-appropriate: business impact framed first (users affected, duration), technical details translated to impact language, clear action items, no unexplained jargon (ORM, PostgreSQL, connection pool explained or omitted)",
            "3": "Some business framing but technical details dominate; partial jargon translation",
            "1": "Technical report with minor formatting changes; still requires engineering background to understand",
            "0": "No audience adaptation"
          }
        },
        {
          "criterion": "Factual Accuracy",
          "weight": 0.3,
          "scoring": {
            "5": "Key business facts preserved: 340,000 users affected, 47-minute resolution time, 78% error rate, date/time, root cause (review process gap)",
            "3": "Most business-relevant facts preserved but one key metric wrong or missing",
            "1": "Several factual errors or critical omissions",
            "0": "Facts fabricated or significantly altered"
          }
        },
        {
          "criterion": "Conciseness & Structure",
          "weight": 0.2,
          "scoring": {
            "5": "Exactly 2 well-structured paragraphs; first covers what happened and impact, second covers resolution and prevention; no filler",
            "3": "Reasonable structure but too long, too short, or poorly organized",
            "1": "Single block of text or excessive length",
            "0": "Unstructured output"
          }
        },
        {
          "criterion": "Naturalness",
          "weight": 0.2,
          "scoring": {
            "5": "Reads like a genuine executive briefing written by a senior leader; authoritative, clear, no AI-tell vocabulary",
            "3": "Mostly professional but some AI patterns or generic corporate language",
            "1": "Generic AI business writing; could describe any incident",
            "0": "Unnatural or incoherent"
          }
        }
      ],
      "expectedScoreWithout": 30,
      "expectedScoreWith": 70
    },
    {
      "id": "bench-med-03",
      "difficulty": "medium",
      "description": "Rewrite for AI-detection avoidance with strict naturalness requirements",
      "input": "Rewrite the following AI-generated text to read as naturally human-written as possible. The goal is to pass AI-detection tools while preserving the complete meaning:\n\n\"Artificial intelligence has revolutionized the healthcare landscape in numerous ways. It is important to note that machine learning algorithms can now analyze medical images with remarkable accuracy, often surpassing human radiologists in detecting certain conditions. Furthermore, AI-powered predictive models are being leveraged to identify patients at risk of developing chronic diseases, enabling early intervention strategies. Additionally, natural language processing tools are being utilized to extract valuable insights from unstructured clinical notes, thereby streamlining the documentation process. However, it is worth mentioning that significant challenges remain, including data privacy concerns, algorithmic bias, and the need for robust regulatory frameworks to ensure patient safety.\"",
      "rubric": [
        {
          "criterion": "AI-Detection Resistance",
          "weight": 0.35,
          "scoring": {
            "5": "Passes as human-written: no AI-tell vocabulary (leveraged, landscape, robust, furthermore, additionally, it is important to note), varied sentence length (SD > 6), diverse openers, natural burstiness",
            "3": "Most AI patterns removed but 1-2 tell-tale phrases remain or sentence rhythm is still uniform",
            "1": "Surface-level changes but fundamental AI patterns intact",
            "0": "Still reads as obviously AI-generated"
          }
        },
        {
          "criterion": "Meaning Preservation",
          "weight": 0.3,
          "scoring": {
            "5": "All five points preserved: medical image analysis, predictive models for chronic disease, NLP for clinical notes, and three challenges (privacy, bias, regulation)",
            "3": "Most points preserved but one dropped or significantly altered",
            "1": "Substantial meaning loss or addition of unsupported claims",
            "0": "Content significantly changed"
          }
        },
        {
          "criterion": "Naturalness Techniques",
          "weight": 0.2,
          "scoring": {
            "5": "Employs multiple naturalness techniques: personal voice, specific examples, varied paragraph structure, organic transitions, rhetorical variation",
            "3": "Some naturalness techniques applied but execution is inconsistent",
            "1": "Reads as a mechanical synonym-swap of the original",
            "0": "Less natural than the source"
          }
        },
        {
          "criterion": "Prose Quality",
          "weight": 0.15,
          "scoring": {
            "5": "Engaging, well-paced prose that a skilled human writer would be proud of; clear voice and personality",
            "3": "Competent prose but lacks distinctive voice",
            "1": "Functional but bland",
            "0": "Poor quality writing"
          }
        }
      ],
      "expectedScoreWithout": 25,
      "expectedScoreWith": 72
    },
+    {
+      "id": "bench-med-04",
+      "difficulty": "medium",
+      "description": "Cross-audience adaptation preserving complex argumentation",
+      "input": "Rewrite this expert-level policy analysis for a general audience blog post (800-1000 words equivalent). Preserve the argument structure but make it accessible to someone with no economics background:\n\n\"The countercyclical fiscal multiplier debate remains contentious. Auerbach & Gorodnichenko (2012) estimated multipliers of 1.5-2.0 during recessions versus 0-0.5 during expansions using a regime-switching SVAR model. However, Ramey & Zubairy (2018) challenged these findings, arguing that the local projection approach yields multipliers below unity even in slack states when controlling for the zero lower bound. The reconciliation may lie in the composition of fiscal stimulus: transfer payments exhibit higher multipliers than government consumption in liquidity-constrained environments (Galí et al., 2007), suggesting that the instrument matters as much as the timing. For policymakers, this implies that blanket fiscal expansion during downturns is insufficient; the design of stimulus packages — targeting constrained households versus general spending — materially affects macroeconomic outcomes.\"",
+      "rubric": [
+        {
+          "criterion": "Accessibility",
+          "weight": 0.3,
+          "scoring": {
+            "5": "All economic concepts explained through plain language and analogy (multiplier, countercyclical, SVAR, zero lower bound, liquidity-constrained); no unexplained jargon; reading level 8th-10th grade",
+            "3": "Most jargon translated but some economic concepts assume reader knowledge",
+            "1": "Jargon partially simplified but still requires economics background",
+            "0": "No accessibility improvement"
+          }
+        },
+        {
+          "criterion": "Argument Preservation",
+          "weight": 0.3,
+          "scoring": {
+            "5": "Full argument structure preserved: debate framing, two opposing positions with researchers named, reconciliation thesis (composition matters), policy implication (targeting matters)",
+            "3": "Main argument preserved but one position or nuance lost",
+            "1": "Oversimplified to the point of losing the argument structure",
+            "0": "Argument distorted or lost"
+          }
+        },
+        {
+          "criterion": "Engagement",
+          "weight": 0.2,
+          "scoring": {
+            "5": "Genuinely engaging for a general reader: uses concrete examples, relatable framing, narrative hooks; makes economics interesting",
+            "3": "Readable but dry; translated rather than transformed",
+            "1": "Simplified but boring; textbook-like",
+            "0": "Inaccessible or dull"
+          }
+        },
+        {
+          "criterion": "AI-Detection Resistance",
+          "weight": 0.2,
+          "scoring": {
+            "5": "Natural human voice throughout; varied structure, no AI-tell vocabulary, organic flow",
+            "3": "Mostly natural but some AI patterns detectable",
+            "1": "Reads as AI-generated explainer content",
+            "0": "Obvious AI patterns"
+          }
+        }
+      ],
+      "expectedScoreWithout": 25,
+      "expectedScoreWith": 68
+    },
+    {
+      "id": "bench-hard-01",
+      "difficulty": "hard",
+      "description": "Multi-version rewrite producing three audience variants from one source",
+      "input": "Produce three versions of the following text: (1) for a medical professional journal, (2) for a patient-facing health website, and (3) for a social media post (Twitter/X thread). Each version must preserve all factual claims.\n\nSource: \"A Phase III clinical trial (AURORA-3, NCT04821674) of sarilumab 200mg subcutaneous injection demonstrated superior efficacy compared to adalimumab in biologic-naive patients with moderate-to-severe rheumatoid arthritis. At week 24, ACR50 response rates were 45.2% (sarilumab) vs. 29.8% (adalimumab) (p < 0.001). DAS28-CRP remission rates were 26.1% vs. 13.7%. Serious adverse events occurred in 5.3% of the sarilumab group (primarily infections: 2.1%) versus 4.8% in the adalimumab group. The safety profile was consistent with previous sarilumab trials.\"",
+      "rubric": [
+        {
+          "criterion": "Medical Professional Version",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Authentic medical journal style: appropriate terminology retained, study identifiers included, statistical rigor preserved, structured reporting format, no dumbing-down",
+            "3": "Mostly appropriate but inconsistent register or missing key clinical details",
+            "1": "Generic formal writing, not specifically medical journal style",
+            "0": "Inappropriate for medical audience"
+          }
+        },
+        {
+          "criterion": "Patient-Facing Version",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Clear, empathetic, jargon-free: medical terms explained (ACR50, DAS28-CRP, biologic-naive), statistics contextualized for patients, balanced risk/benefit framing, reassuring tone",
+            "3": "Partially simplified but some medical terms unexplained or statistics presented without context",
+            "1": "Minor simplification; still requires medical knowledge to understand",
+            "0": "Not accessible to patients"
+          }
+        },
+        {
+          "criterion": "Social Media Version",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Authentic social media voice: thread format, hook opening, appropriate emoji use, key stats highlighted, accessible, shareable; respects platform conventions",
+            "3": "Social-media-adjacent but reads like a shortened article rather than native thread content",
+            "1": "Formal text truncated to fit character limits",
+            "0": "Not recognizable as social media content"
+          }
+        },
+        {
+          "criterion": "Cross-Version Factual Consistency",
+          "weight": 0.25,
+          "scoring": {
+            "5": "All three versions preserve identical facts: trial name, drug names/doses, 45.2% vs 29.8%, 26.1% vs 13.7%, safety data (5.3% vs 4.8%), infection rate 2.1%",
+            "3": "Facts consistent across versions but 1-2 statistics omitted in simplified versions without acknowledgment",
+            "1": "Factual inconsistencies between versions",
+            "0": "Contradictions between versions"
+          }
+        }
+      ],
+      "expectedScoreWithout": 20,
+      "expectedScoreWith": 65
+    },
+    {
+      "id": "bench-hard-02",
+      "difficulty": "hard",
+      "description": "Rewrite preserving nuanced argumentation while transforming style and defeating AI detection",
+      "input": "Rewrite this legal analysis as an engaging long-form magazine article (The Atlantic style). Preserve every legal argument and citation but make it compelling for an educated general reader. The output must pass AI-detection tools.\n\n\"In Gonzalez v. Google LLC (2023), the Supreme Court declined to address the scope of Section 230 immunity for algorithmic content recommendations, vacating the Ninth Circuit's decision on narrower grounds (the claims failed even without Section 230 protection). Justice Thomas's statement respecting the denial of certiorari in Malwarebytes v. Enigma Software (2020) had signaled judicial appetite for reconsidering Section 230's breadth, noting that the statute's 'sweeping immunity' may exceed congressional intent. The doctrinal tension persists: the Second Circuit in Domen v. Vimeo (2021) applied traditional Section 230 analysis to content moderation decisions, while the Fifth Circuit in NetChoice v. Paxton (2022) held that platforms' editorial discretion does not categorically constitute protected speech under the First Amendment. This circuit split, combined with emerging state legislation (Texas HB 20, Florida SB 7072), creates significant uncertainty for platform liability. The unresolved question, whether algorithmic amplification constitutes 'publishing' under §230(c)(1), will likely require Supreme Court intervention within the next 2-3 terms.\"",
+      "rubric": [
+        {
+          "criterion": "Magazine Style Transformation",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Authentic long-form magazine voice: narrative hooks, scene-setting, accessible legal explanation through analogy and storytelling, varied pacing, personality in the prose",
+            "3": "Readable for general audience but lacks distinctive magazine voice; more like a well-written explainer than an Atlantic piece",
+            "1": "Legal analysis with simplified vocabulary; structure unchanged",
+            "0": "No meaningful style transformation"
+          }
+        },
+        {
+          "criterion": "Legal Accuracy",
+          "weight": 0.3,
+          "scoring": {
+            "5": "All cases cited correctly (Gonzalez v. Google, Malwarebytes v. Enigma, Domen v. Vimeo, NetChoice v. Paxton), holdings accurately described, Section 230 analysis correct, circuit split framed properly, state laws identified correctly",
+            "3": "Most legal content accurate but 1-2 holdings described imprecisely",
+            "1": "Several legal inaccuracies or mischaracterized holdings",
+            "0": "Legal content substantially wrong"
+          }
+        },
+        {
+          "criterion": "AI-Detection Resistance",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Passes as human-written: distinctive voice, varied rhythm, unexpected word choices, genuine analytical personality, no AI-tell vocabulary or patterns",
+            "3": "Mostly natural but some AI patterns detectable in transitions or paragraph structure",
+            "1": "AI patterns pervasive despite surface-level style changes",
+            "0": "Obviously AI-generated"
+          }
+        },
+        {
+          "criterion": "Reader Engagement",
+          "weight": 0.2,
+          "scoring": {
+            "5": "Genuinely compelling: a non-lawyer would want to read to the end; stakes made clear, human impact conveyed, forward-looking tension maintained",
+            "3": "Informative and readable but not particularly gripping",
+            "1": "Dry and academic despite style changes",
+            "0": "Difficult or boring to read"
+          }
+        }
+      ],
+      "expectedScoreWithout": 20,
+      "expectedScoreWith": 62
+    },
+    {
+      "id": "bench-hard-03",
+      "difficulty": "hard",
+      "description": "Adversarial rewrite with conflicting constraints: maximum style change, maximum accuracy, maximum naturalness",
+      "input": "Rewrite this dense scientific methodology section as a first-person narrative blog post written by the lead researcher, as if explaining the experiment to a curious friend over coffee. You must preserve every methodological detail (exact numbers, procedures, controls) while making it completely conversational and natural. The output must contain zero AI-detectable patterns.\n\n\"We employed a double-blind, placebo-controlled crossover design with a 2-week washout period between conditions. Forty-eight participants (24F, 24M; mean age 31.4 ± 5.2 years; BMI 22.1 ± 2.8 kg/m²) completed both experimental arms. The intervention consisted of 400mg caffeine administered via standardized capsules at 08:00, with the placebo condition utilizing visually identical cellulose capsules. Cognitive performance was assessed using the Cambridge Neuropsychological Test Automated Battery (CANTAB) at four time points: baseline (07:30), +1h, +3h, and +6h post-administration. Primary outcomes were reaction time on the Rapid Visual Information Processing (RVP) task and spatial working memory (SWM) errors. Blood samples for plasma caffeine concentration were collected at each time point via antecubital venipuncture. Salivary cortisol was measured using enzyme-linked immunosorbent assay (ELISA) at baseline and +3h. Environmental conditions were controlled (22 ± 1°C, 45 ± 5% relative humidity, 300 lux ambient lighting). Participants abstained from caffeine for 72 hours prior to each session, verified by baseline plasma caffeine levels < 0.5 μg/mL.\"",
+      "rubric": [
+        {
+          "criterion": "Conversational Transformation",
+          "weight": 0.2,
+          "scoring": {
+            "5": "Authentic first-person narrative: reads like a real scientist talking informally; contractions, asides, humor, personal perspective; the 'over coffee' framing is realized, not just stated",
+            "3": "First person but still reads like a formal presentation simplified; lacks genuine conversational quality",
+            "1": "Surface pronoun changes only, with minimal style adaptation",
+            "0": "No meaningful transformation"
+          }
+        },
+        {
+          "criterion": "Methodological Accuracy",
+          "weight": 0.3,
+          "scoring": {
+            "5": "Every detail preserved: double-blind, placebo-controlled crossover, 2-week washout, 48 participants (24F/24M), age 31.4±5.2, BMI 22.1±2.8, 400mg caffeine at 08:00, visually identical cellulose placebo, CANTAB at 4 timepoints (07:30, +1h, +3h, +6h), RVP reaction time and SWM errors as primary outcomes, blood draws at each timepoint via antecubital venipuncture, salivary cortisol via ELISA at baseline and +3h, environmental controls (22±1°C, 45±5% relative humidity, 300 lux), 72h caffeine abstinence, <0.5 μg/mL baseline threshold",
+            "3": "Most details preserved but 2-3 specific numbers omitted or approximated",
+            "1": "General methodology described but many specifics lost",
+            "0": "Methodology substantially altered or incomplete"
+          }
+        },
+        {
+          "criterion": "AI-Detection Resistance",
+          "weight": 0.3,
+          "scoring": {
+            "5": "Zero detectable AI patterns: highly varied sentence structure, genuine personal voice, idiosyncratic phrasing, natural imperfections, burstiness, no AI-tell vocabulary",
+            "3": "Most AI patterns eliminated but some uniformity in sentence rhythm or vocabulary remains",
+            "1": "Surface-level naturalization but fundamental AI patterns intact",
+            "0": "Clearly AI-generated"
+          }
+        },
+        {
+          "criterion": "Engagement & Voice",
+          "weight": 0.2,
+          "scoring": {
+            "5": "Genuinely enjoyable to read; feels like listening to a passionate scientist explain their work; personality comes through; reader learns without feeling lectured",
+            "3": "Readable and clear but lacks distinctive personality",
+            "1": "Information delivered but not engaging",
+            "0": "Boring or confusing"
+          }
+        }
+      ],
+      "expectedScoreWithout": 18,
+      "expectedScoreWith": 60
+    }
+  ]
+}
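The benchmark file gives each criterion a weight and a 0/1/3/5 scoring scale, but it does not state how those combine into the 0-100 figures in `expectedScoreWithout` and `expectedScoreWith`. The sketch below assumes one plausible reading: each awarded score is divided by 5, multiplied by its weight, and the sum is scaled to 100. The function name `weighted_score` and the awarded scores are hypothetical; only the criterion names and weights come from bench-hard-03.

```python
import math


def weighted_score(rubric, awarded):
    """Combine per-criterion scores (0, 1, 3, or 5) into a 0-100 total.

    Assumed formula, not taken from the package:
        100 * sum(weight * awarded_score / 5)

    rubric  -- list of {"criterion": str, "weight": float} entries
    awarded -- dict mapping criterion name to the score a grader assigned
    """
    total_weight = sum(item["weight"] for item in rubric)
    if not math.isclose(total_weight, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    return 100 * sum(
        item["weight"] * awarded[item["criterion"]] / 5 for item in rubric
    )


# Criterion names and weights copied from bench-hard-03.
rubric = [
    {"criterion": "Conversational Transformation", "weight": 0.2},
    {"criterion": "Methodological Accuracy", "weight": 0.3},
    {"criterion": "AI-Detection Resistance", "weight": 0.3},
    {"criterion": "Engagement & Voice", "weight": 0.2},
]

# Made-up grader output, for illustration only.
awarded = {
    "Conversational Transformation": 3,
    "Methodological Accuracy": 5,
    "AI-Detection Resistance": 3,
    "Engagement & Voice": 3,
}

score = weighted_score(rubric, awarded)
print(round(score, 1))
```

Under this assumed formula, a run that scores 5 on every criterion reaches 100 and a run that scores 3 everywhere reaches 60, so the expected scores in the file would correspond to mixed results across criteria.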
package/tests/smoke.json
ADDED
@@ -0,0 +1,54 @@
+{
+  "version": "0.0.1",
+  "timeout": 60,
+  "tasks": [
+    {
+      "id": "smoke-01",
+      "description": "Rewrite an academic paragraph for a general audience while preserving factual accuracy and avoiding AI-detection patterns",
+      "input": "Rewrite the following academic text for a general audience blog post. Maintain all factual claims but make it engaging and accessible:\n\n\"The longitudinal study conducted by Martinez et al. (2023) across 14 metropolitan areas (N=12,847) demonstrated that consistent exposure to urban green spaces was associated with a statistically significant reduction in self-reported anxiety symptoms (β = -0.34, p < 0.001). Notably, the effect was moderated by socioeconomic status, with participants in lower-income quartiles exhibiting a 23% greater reduction in anxiety scores compared to their higher-income counterparts (interaction term: β = -0.08, p = 0.012). These findings corroborate earlier cross-sectional work by Chen & Williams (2021) while extending the evidence base through a more rigorous prospective design.\"",
+      "rubric": [
+        {
+          "criterion": "Style Transformation",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Fully transforms to conversational blog style: contractions, direct address, accessible language, engaging hooks; no academic register remains",
+            "3": "Partially transformed but retains some academic phrasing or structure; readable but not fully blog-appropriate",
+            "1": "Minor word swaps but essentially still reads as academic prose",
+            "0": "No meaningful style change from the source"
+          }
+        },
+        {
+          "criterion": "Factual Accuracy",
+          "weight": 0.3,
+          "scoring": {
+            "5": "All facts preserved: researcher names, 14 metropolitan areas, 12,847 participants, anxiety reduction finding, 23% greater reduction for lower-income participants, study years (2023, 2021); no invented claims",
+            "3": "Most facts preserved but 1-2 statistics simplified beyond recognition or a detail is dropped",
+            "1": "Several factual errors or significant omissions of key data",
+            "0": "Facts altered, reversed, or fabricated"
+          }
+        },
+        {
+          "criterion": "Naturalness & AI-Detection Resistance",
+          "weight": 0.25,
+          "scoring": {
+            "5": "Varied sentence length, diverse openers, no AI-tell vocabulary (delve, landscape, multifaceted), no over-hedging, natural paragraph flow",
+            "3": "Some variation but occasional AI patterns visible (uniform sentence length, repetitive openers, or 1-2 AI-tell words)",
+            "1": "Reads as typical AI output: uniform rhythm, predictable structure, AI-tell vocabulary present",
+            "0": "Blatantly AI-generated patterns throughout"
+          }
+        },
+        {
+          "criterion": "Audience Adaptation",
+          "weight": 0.2,
+          "scoring": {
+            "5": "Statistical concepts explained through plain language or analogy; no unexplained jargon; reading level appropriate for general audience (8th-10th grade); engaging and interesting",
+            "3": "Some jargon simplified but statistical notation or academic framing partially remains",
+            "1": "Jargon present without explanation; audience not meaningfully changed from source",
+            "0": "No audience adaptation attempted"
+          }
+        }
+      ],
+      "passThreshold": 60
+    }
+  ]
+}
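Both test files follow the same shape: a `tasks` array whose entries carry a `rubric` of weighted criteria, each with scoring levels `"5"`, `"3"`, `"1"`, and `"0"`. A structural check along the following lines could catch a rubric whose weights drift away from 1.0 or whose scoring levels are incomplete. It is a minimal sketch: `check_tasks` is a hypothetical helper, not part of the package, and the sample document is a trimmed stand-in for `smoke.json` with the level descriptions shortened.

```python
import json
import math

# Scoring levels used by every rubric entry in the two test files.
EXPECTED_LEVELS = {"0", "1", "3", "5"}


def check_tasks(doc):
    """Return a list of human-readable problems found in a parsed task file."""
    problems = []
    for task in doc.get("tasks", []):
        task_id = task.get("id", "<missing id>")
        rubric = task.get("rubric", [])

        # Weights within one rubric should add up to 1.0.
        total = sum(item.get("weight", 0) for item in rubric)
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            problems.append(f"{task_id}: weights sum to {total:.2f}")

        # Every criterion should define exactly the four expected levels.
        for item in rubric:
            levels = set(item.get("scoring", {}))
            if levels != EXPECTED_LEVELS:
                name = item.get("criterion", "<missing criterion>")
                problems.append(f"{task_id}: {name} has levels {sorted(levels)}")
    return problems


# Trimmed stand-in for smoke.json: same keys and weights, short descriptions.
sample = json.loads("""
{
  "version": "0.0.1",
  "timeout": 60,
  "tasks": [
    {
      "id": "smoke-01",
      "rubric": [
        {"criterion": "Style Transformation", "weight": 0.25,
         "scoring": {"5": "full", "3": "partial", "1": "minor", "0": "none"}},
        {"criterion": "Factual Accuracy", "weight": 0.3,
         "scoring": {"5": "full", "3": "partial", "1": "minor", "0": "none"}},
        {"criterion": "Naturalness & AI-Detection Resistance", "weight": 0.25,
         "scoring": {"5": "full", "3": "partial", "1": "minor", "0": "none"}},
        {"criterion": "Audience Adaptation", "weight": 0.2,
         "scoring": {"5": "full", "3": "partial", "1": "minor", "0": "none"}}
      ],
      "passThreshold": 60
    }
  ]
}
""")

problems = check_tasks(sample)
print(problems or "no structural problems found")
```

The weights in every rubric shown in this diff do add up to 1.0, so a check like this would mainly guard against later edits to the fixtures.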