diversify-text 0.1.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,32 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *.egg-info/
+ dist/
+ build/
+ *.egg
+
+ # Virtual environments
+ .venv/
+
+ # uv
+ uv.lock
+
+ # IDE
+ .idea/
+ .claude/
+
+ # Sphinx documentation
+ docs/_build/
+
+ # OS
+ .DS_Store
+
+ # Example scripts and data (kept locally for testing)
+ example_scripts/
+
+ # Legacy code (kept locally, not in repo)
+ legacy_code/
+
+ # Utility scripts
+ scripts/
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Anna Wegmann
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,272 @@
+ Metadata-Version: 2.4
+ Name: diversify-text
+ Version: 0.1.1
+ Summary: Generate stylistic paraphrases of texts using local transformer models.
+ Project-URL: Homepage, https://github.com/AnnaWegmann/diversify_text
+ Project-URL: Documentation, https://annawegmann.github.io/diversify_text/
+ Project-URL: Repository, https://github.com/AnnaWegmann/diversify_text
+ Project-URL: Issues, https://github.com/AnnaWegmann/diversify_text/issues
+ Author: Anna Wegmann
+ License-Expression: MIT
+ License-File: LICENSE
+ Keywords: augmentation,nlp,paraphrase,style-transfer,text-generation
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Classifier: Topic :: Text Processing :: Linguistic
+ Requires-Python: >=3.10
+ Requires-Dist: huggingface-hub
+ Requires-Dist: mutual-implication-score
+ Requires-Dist: protobuf
+ Requires-Dist: pysbd>=0.3.4
+ Requires-Dist: sentence-transformers
+ Requires-Dist: sentencepiece
+ Requires-Dist: tiktoken
+ Requires-Dist: torch
+ Requires-Dist: tqdm>=4.67.3
+ Requires-Dist: transformers
+ Description-Content-Type: text/markdown
+
+ # diversify-text
+
+ This package helps you generate stylistically diverse paraphrases of your own texts using Hugging Face transformer models that run locally.
+
+ ```bash
+ pip install diversify-text
+ ```
+
+ **[Full documentation](https://annawegmann.github.io/diversify_text/)**
+
+ ## Table of contents
+
+ - [Usage](#usage)
+   - [Single text](#single-text)
+   - [Control number of paraphrases](#control-number-of-paraphrases)
+   - [Using the class directly](#using-the-class-directly)
+   - [List of texts](#list-of-texts)
+   - [Customising the TinyStyler style bank](#customising-the-tinystyler-style-bank)
+   - [Creating a custom method](#creating-a-custom-method)
+ - [Install](#install)
+ - [Contributing](#contributing)
+   - [Development setup](#development-setup)
+   - [Running tests](#running-tests)
+   - [Working with uv](#working-with-uv)
+   - [Building docs locally](#building-docs-locally)
+
+ ## Usage
+
+ For file inputs (CSV, TSV, TXT), output options, punctuation splitting, and creating custom methods, see the [full usage guide](https://annawegmann.github.io/diversify_text/usage.html).
+
+ ### Single text
+
+ ```python
+ from diversify_text import diversify
+
+ results = diversify("The experiment was conducted in a controlled lab setting.")
+ ```
+
+ ```
+ [{
+     "original": "The experiment was conducted in a controlled lab setting.",
+     "paraphrases": [
+         "They ran the experiment in a controlled lab setting.",
+         "The experiment took place in a controlled lab.",
+         "A controlled lab was where the experiment was conducted.",
+         "In a controlled lab, the experiment was carried out.",
+         "The study was performed in a controlled lab environment.",
+     ]
+ }]
+ ```
+
+ ### Control number of paraphrases
+
+ ```python
+ results = diversify("Some text.", n_styles=3)
+ ```
+
+ ```
+ [{"original": "Some text.", "paraphrases": ["...", "...", "..."]}]
+ ```
+
+ ### Using the class directly
+
+ Recommended when you process texts over several calls: the model is loaded once and then reused.
+
+ ```python
+ from diversify_text import Diversifier
+
+ texts_1 = ["The experiment was conducted in a controlled lab setting."]
+ texts_2 = ["She graduated from MIT in 2019."]
+
+ # Assumes a CUDA-capable GPU is available.
+ div = Diversifier(device="cuda", methods=["tinystyler"])
+
+ batch_1 = div.diversify(texts_1, n_styles=5)
+ batch_2 = div.diversify(texts_2, n_styles=5)  # reuses the already-loaded model
+ ```
+
+ ### List of texts
+
+ ```python
+ results = diversify([
+     "The experiment was conducted in a controlled lab setting.",
+     "She graduated from MIT in 2019.",
+ ])
+ ```
+
+ ```
+ [
+     {"original": "The experiment ...", "paraphrases": ["...", "...", ...]},
+     {"original": "She graduated ...", "paraphrases": ["...", "...", ...]},
+ ]
+ ```
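The return value is a list with one dict per input text, each holding the `original` string and its `paraphrases`. A minimal sketch of flattening that structure into (original, paraphrase) pairs, for example to build an augmentation dataset. The `DiversifyResult` type and the `to_pairs` helper are hypothetical names, and the `results` literal is hand-written placeholder data that only mirrors the output shape shown above; no model is called.

```python
from typing import TypedDict


class DiversifyResult(TypedDict):
    """One entry of the list returned by diversify(), as shown in the examples above."""

    original: str
    paraphrases: list[str]


def to_pairs(results: list[DiversifyResult]) -> list[tuple[str, str]]:
    """Hypothetical helper: flatten results into (original, paraphrase) pairs."""
    return [
        (item["original"], paraphrase)
        for item in results
        for paraphrase in item["paraphrases"]
    ]


# Hand-written placeholder data with the documented shape; no model is called.
results: list[DiversifyResult] = [
    {"original": "She graduated from MIT in 2019.",
     "paraphrases": ["<paraphrase 1>", "<paraphrase 2>"]},
    {"original": "Some text.", "paraphrases": ["<paraphrase 3>"]},
]

pairs = to_pairs(results)
print(len(pairs))  # 3
```

Keeping the original next to each paraphrase makes it easy to deduplicate or filter pairs later without re-running generation.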
+
+ ### Customising the TinyStyler style bank
+
+ TinyStyler generates each paraphrase by conditioning on a *style example*: a short sentence that demonstrates the target writing style. The style bank is the collection of such examples that is cycled through when multiple paraphrases are produced.
+
+ The default bank is a dictionary mapping style labels to lists of example sentences (drawn from the CORE corpus). You can replace or extend it by passing a custom bank via `method_kwargs`.
+
+ A style bank can be a `dict[str, list[str]]` or a `list[list[str]]`:
+
+ ```python
+ from diversify_text import diversify
+
+ custom_bank = {
+     "academic": ["The results demonstrate a statistically significant effect."],
+     "enthusiastic": ["We found something really interesting — check this out!"],
+     "telegraphic": ["Key finding: effect confirmed. Details follow."],
+ }
+
+ results = diversify(
+     "The experiment was conducted in a controlled lab setting.",
+     method_kwargs={"tinystyler": {"style_bank": custom_bank}},
+ )
+ ```
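The bank can take either form, and its examples are cycled through when several paraphrases are requested. The sketch below shows one way such cycling could work. `normalize_bank` and `pick_style_examples` are hypothetical helpers; using the list position as a label for the list form, and taking styles in bank order with wrap-around, are assumptions rather than documented TinyStyler behavior.

```python
from itertools import cycle, islice
from typing import Union

# Both bank forms accepted according to the README.
StyleBank = Union[dict[str, list[str]], list[list[str]]]


def normalize_bank(bank: StyleBank) -> list[tuple[str, list[str]]]:
    """Hypothetical helper: turn either bank form into (label, examples) pairs."""
    if isinstance(bank, dict):
        return list(bank.items())
    # The list form has no labels, so the position stands in for one (assumption).
    return [(str(i), examples) for i, examples in enumerate(bank)]


def pick_style_examples(bank: StyleBank, n_styles: int) -> list[tuple[str, list[str]]]:
    """Hypothetical helper: take n_styles entries, wrapping around when the bank is shorter."""
    styles = normalize_bank(bank)
    if not styles:
        raise ValueError("style bank is empty")
    return list(islice(cycle(styles), n_styles))


# Placeholder example sentences; only the labels matter for the cycling order.
bank = {
    "academic": ["<academic example>"],
    "enthusiastic": ["<enthusiastic example>"],
    "telegraphic": ["<telegraphic example>"],
}
labels = [label for label, _ in pick_style_examples(bank, 5)]
print(labels)  # ['academic', 'enthusiastic', 'telegraphic', 'academic', 'enthusiastic']
```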
+
+ `DEFAULT_STYLE_BANK` is exported from `diversify_text.method.tinystyler`, so you can build on it:
+
+ ```python
+ from diversify_text import diversify
+ from diversify_text.method.tinystyler import DEFAULT_STYLE_BANK
+
+ extended_bank = {
+     **DEFAULT_STYLE_BANK,
+     "scientific": ["The data clearly indicate a statistically significant result."],
+ }
+
+ results = diversify(
+     "The experiment was conducted in a controlled lab setting.",
+     method_kwargs={"tinystyler": {"style_bank": extended_bank}},
+ )
+ ```
+
+ You can also select specific styles by key name with `styles` instead of cycling through the entire bank.
+ The number of paraphrases is determined by the number of selected styles:
+
+ ```python
+ results = diversify(
+     "The experiment was conducted in a controlled lab setting.",
+     method_kwargs={"tinystyler": {"styles": ["research_article", "personal_blog", "recipe"]}},
+ )
+ ```
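Selecting by key amounts to looking up each requested label in the bank, which is why the number of paraphrases follows the length of `styles`. A minimal sketch, assuming a dict-form bank. `select_styles` is a hypothetical helper, the placeholder bank (including the `"news"` label) is illustrative only, and raising `KeyError` for an unknown label is an assumed error-handling choice, not documented package behavior.

```python
def select_styles(bank: dict[str, list[str]], styles: list[str]) -> list[tuple[str, list[str]]]:
    """Hypothetical helper: look up each requested label, keeping the requested order."""
    unknown = [s for s in styles if s not in bank]
    if unknown:
        # Assumed behavior: fail loudly rather than silently skipping a label.
        raise KeyError(f"unknown style(s) {unknown}; available: {sorted(bank)}")
    return [(s, bank[s]) for s in styles]


# Placeholder bank; the real default bank's contents are not reproduced here.
bank = {
    "research_article": ["<example sentence>"],
    "personal_blog": ["<example sentence>"],
    "recipe": ["<example sentence>"],
    "news": ["<example sentence>"],
}

selected = select_styles(bank, ["research_article", "personal_blog", "recipe"])
print(len(selected))  # 3: one entry, and so one paraphrase, per requested style
```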
+
+ ### Creating a custom method
+
+ Subclass `DiversificationMethod`, give it a `name`, and implement `generate`, which receives a list of texts and returns, for each text, a list of `n_styles` paraphrases:
+
+ ```python
+ from diversify_text import Diversifier
+ from diversify_text.method import DiversificationMethod
+
+
+ class MyMethod(DiversificationMethod):
+     name = "my_method"
+
+     def generate(self, texts, *, n_styles, max_new_tokens, temperature, top_p, **kwargs):
+         # One list of n_styles variants per input text
+         return [[f"{text} :: variant {i}" for i in range(n_styles)] for text in texts]
+
+
+ results = Diversifier(methods=[MyMethod()]).diversify("Hello", n_styles=3)
+ ```
+
+ ```
+ [{"original": "Hello", "paraphrases": ["Hello :: variant 0", "Hello :: variant 1", "Hello :: variant 2"]}]
+ ```
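Judging from the example above, `generate` receives a list of texts and returns a list with one inner list per input text, each holding `n_styles` strings. The sketch below restates that contract without the package so it can be run in isolation. `rotate_words_generate` and `check_generate_output` are hypothetical names, and the contract is inferred from the example rather than taken from the `DiversificationMethod` documentation.

```python
def rotate_words_generate(texts: list[str], *, n_styles: int, **kwargs) -> list[list[str]]:
    """Toy generator: variant i rotates the words of each text by i positions.

    Extra keyword arguments (e.g. max_new_tokens, temperature, top_p) are accepted and ignored.
    """
    outputs = []
    for text in texts:
        words = text.split()
        variants = []
        for i in range(n_styles):
            k = i % len(words) if words else 0
            variants.append(" ".join(words[k:] + words[:k]))
        outputs.append(variants)
    return outputs


def check_generate_output(texts: list[str], outputs: list[list[str]], n_styles: int) -> None:
    """Raise if outputs do not hold one list of n_styles strings per input text."""
    if len(outputs) != len(texts):
        raise ValueError(f"expected {len(texts)} result lists, got {len(outputs)}")
    for text, variants in zip(texts, outputs):
        if len(variants) != n_styles or not all(isinstance(v, str) for v in variants):
            raise ValueError(f"bad variants for {text!r}: {variants!r}")


texts = ["the cat sat", "Hello"]
outputs = rotate_words_generate(texts, n_styles=3, temperature=0.7)
check_generate_output(texts, outputs, n_styles=3)
print(outputs[0])  # ['the cat sat', 'cat sat the', 'sat the cat']
```

A shape check like `check_generate_output` is a cheap way to catch an off-by-one in a custom method before wiring it into `Diversifier`.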
+
+ ## Install
+
+ ```bash
+ pip install diversify-text
+ ```
+
+ Requires Python 3.10+.
+
+ ## Contributing
+
+ ### Development setup
+
+ > [!NOTE]
+ > You must have **uv** installed.
+ > Full installation guide: <https://docs.astral.sh/uv/getting-started/installation/>
+
+ ```bash
+ git clone https://github.com/AnnaWegmann/diversify_text.git
+ cd diversify_text
+ uv sync --group dev
+ source .venv/bin/activate
+ ```
+
+ ### Running tests
+
+ ```bash
+ # Run all tests
+ pytest
+
+ # Run a specific test file
+ pytest tests/test_core.py
+
+ # Run a specific test class or method
+ pytest tests/test_core.py::TestDiversifier
+ pytest tests/test_core.py::TestDiversifier::test_single_text_returns_one_result
+ ```
+
+ Tests can also be run individually with PyCharm's built-in test runner (right-click any test class or method).
+
+ ### Working with uv
+
+ #### Adding packages with `uv add`
+
+ To add packages to the project, use `uv add` rather than `uv pip install`: `uv add` records the dependency in `pyproject.toml` and updates the lock file, whereas `uv pip install` only changes the current environment.
+
+ ```bash
+ uv add <package-name>
+ ```
+
+ #### Adding packages to the dev group
+
+ To add a package that is needed only for development:
+
+ ```bash
+ uv add --group dev <package-name>
+ ```
+
+ #### Switching between dev and standard mode
+
+ When you are done testing and want to return to standard mode, remove the dev-only packages:
+
+ ```bash
+ uv sync --no-group dev
+ ```
+
+ This syncs the environment without the `dev` group, uninstalling its packages so that only the main project dependencies remain.
+
+ #### Best practice: run `uv lock -U`
+
+ Whenever you upgrade, downgrade, or otherwise change package versions, it's good practice to run:
+
+ ```bash
+ uv lock -U
+ ```
+
+ This re-resolves the lock file, upgrading every locked package to the newest version that the constraints in `pyproject.toml` allow, so that all versions are consistent.
+
+ ### Building docs locally
+
+ ```bash
+ uv sync --group docs
+ sphinx-build -b html docs docs/_build/html
+ open docs/_build/html/index.html  # macOS; on Linux use xdg-open
+ ```
@@ -0,0 +1,24 @@
+ """diversify-text -- generate stylistic paraphrases of texts."""
+
+ import logging
+
+ from diversify_text.core import (
+     Diversifier,
+     diversify,
+ )
+
+ __all__ = [
+     "Diversifier",
+     "diversify",
+ ]
+
+ # Configure a clean handler for the diversify_text logger so INFO/WARNING messages
+ # are visible without requiring the user to set up logging themselves.
+ _logger = logging.getLogger("diversify_text")
+ _logger.setLevel(logging.INFO)
+ _handler = logging.StreamHandler()
+ _handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
+ _logger.addHandler(_handler)
+ # Prevent messages from bubbling up to the root logger (avoids duplicate output
+ # if the user has already configured logging globally).
+ _logger.propagate = False
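Because the package attaches its own `StreamHandler` at level `INFO` to the `diversify_text` logger and turns off propagation, calling `logging.basicConfig` in application code will not change how these messages appear. The standard-library `logging` calls below are one way to adjust that from your own code. This is a sketch, not a documented package setting: the two helper names are hypothetical, and the first lines only replicate the `__init__.py` setup shown above so the example runs without the package installed.

```python
import logging

# Stand-in for what diversify_text/__init__.py does at import time (copied from above).
pkg_logger = logging.getLogger("diversify_text")
pkg_logger.setLevel(logging.INFO)
pkg_handler = logging.StreamHandler()
pkg_handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
pkg_logger.addHandler(pkg_handler)
pkg_logger.propagate = False


def show_only_warnings(logger_name: str = "diversify_text") -> None:
    """Hypothetical helper: keep the package's handler but hide INFO messages."""
    logging.getLogger(logger_name).setLevel(logging.WARNING)


def route_to_application_logging(logger_name: str = "diversify_text") -> None:
    """Hypothetical helper: drop the package's handler so records reach your own root handlers."""
    logger = logging.getLogger(logger_name)
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
    logger.propagate = True
```

The two helpers are alternatives: the first only changes verbosity, while the second hands formatting and output entirely to whatever handlers the application has configured on the root logger.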