PyPI - diversify-text - Versions diffs - 0.1.1__tar.gz - Mend

diversify-text 0.1.1__tar.gz

Files changed (23)

diversify_text-0.1.1/.gitignore +32 -0
diversify_text-0.1.1/LICENSE +21 -0
diversify_text-0.1.1/PKG-INFO +272 -0
diversify_text-0.1.1/README.md +239 -0
diversify_text-0.1.1/diversify_text/__init__.py +24 -0
diversify_text-0.1.1/diversify_text/_input.py +267 -0
diversify_text-0.1.1/diversify_text/_output.py +234 -0
diversify_text-0.1.1/diversify_text/_postprocess.py +64 -0
diversify_text-0.1.1/diversify_text/_preprocess.py +76 -0
diversify_text-0.1.1/diversify_text/_utils.py +27 -0
diversify_text-0.1.1/diversify_text/core.py +335 -0
diversify_text-0.1.1/diversify_text/filter/__init__.py +5 -0
diversify_text-0.1.1/diversify_text/filter/mis.py +272 -0
diversify_text-0.1.1/diversify_text/method/__init__.py +13 -0
diversify_text-0.1.1/diversify_text/method/base.py +35 -0
diversify_text-0.1.1/diversify_text/method/echo.py +25 -0
diversify_text-0.1.1/diversify_text/method/registry.py +109 -0
diversify_text-0.1.1/diversify_text/method/tinystyler/__init__.py +6 -0
diversify_text-0.1.1/diversify_text/method/tinystyler/method.py +164 -0
diversify_text-0.1.1/diversify_text/method/tinystyler/model.py +113 -0
diversify_text-0.1.1/diversify_text/method/tinystyler/styles.py +359 -0
diversify_text-0.1.1/diversify_text/py.typed +0 -0
diversify_text-0.1.1/pyproject.toml +60 -0

diversify_text-0.1.1/.gitignore ADDED

@@ -0,0 +1,32 @@
+# Python
+__pycache__/
+*.py[cod]
+*.egg-info/
+dist/
+build/
+*.egg
+# Virtual environments
+.venv/
+# uv
+uv.lock
+# IDE
+.idea/
+.claude/
+# Sphinx documentation
+docs/_build/
+# OS
+.DS_Store
+# Example scripts and data (kept locally for testing)
+example_scripts/
+# Legacy code (kept locally, not in repo)
+legacy_code/
+# Utility scripts
+scripts/

diversify_text-0.1.1/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Anna Wegmann
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

diversify_text-0.1.1/PKG-INFO ADDED

@@ -0,0 +1,272 @@
+Metadata-Version: 2.4
+Name: diversify-text
+Version: 0.1.1
+Summary: Generate stylistic paraphrases of texts using local transformer models.
+Project-URL: Homepage, https://github.com/AnnaWegmann/diversify_text
+Project-URL: Documentation, https://annawegmann.github.io/diversify_text/
+Project-URL: Repository, https://github.com/AnnaWegmann/diversify_text
+Project-URL: Issues, https://github.com/AnnaWegmann/diversify_text/issues
+Author: Anna Wegmann
+License-Expression: MIT
+License-File: LICENSE
+Keywords: augmentation,nlp,paraphrase,style-transfer,text-generation
+Classifier: Development Status :: 3 - Alpha
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Topic :: Text Processing :: Linguistic
+Requires-Python: >=3.10
+Requires-Dist: huggingface-hub
+Requires-Dist: mutual-implication-score
+Requires-Dist: protobuf
+Requires-Dist: pysbd>=0.3.4
+Requires-Dist: sentence-transformers
+Requires-Dist: sentencepiece
+Requires-Dist: tiktoken
+Requires-Dist: torch
+Requires-Dist: tqdm>=4.67.3
+Requires-Dist: transformers
+Description-Content-Type: text/markdown
+# diversify-text
+This package helps you generate stylistically diverse paraphrases of your own texts using huggingface transformer models locally.
+```bash
+pip install diversify-text
+```
+**[Full documentation](https://annawegmann.github.io/diversify_text/)**
+## Table of contents
+- [Usage](#usage)
+  - [Single text](#single-text)
+  - [Control number of paraphrases](#control-number-of-paraphrases)
+  - [Using the class directly](#using-the-class-directly)
+  - [List of texts](#list-of-texts)
+  - [Customising the TinyStyler style bank](#customising-the-tinystyler-style-bank)
+- [Install](#install)
+- [Contributing](#contributing)
+  - [Development setup](#development-setup)
+  - [Running tests](#running-tests)
+  - [Working with uv](#working-with-uv)
+  - [Building docs locally](#building-docs-locally)
+## Usage
+For file inputs (CSV, TSV, TXT), output options, punctuation splitting, and creating custom methods, see the [full usage guide](https://annawegmann.github.io/diversify_text/usage.html).
+### Single text
+```python
+from diversify_text import diversify
+results = diversify("The experiment was conducted in a controlled lab setting.")
+```
+```
+[{
+    "original": "The experiment was conducted in a controlled lab setting.",
+    "paraphrases": [
+        "They ran the experiment in a controlled lab setting.",
+        "The experiment took place in a controlled lab.",
+        "A controlled lab was where the experiment was conducted.",
+        "In a controlled lab, the experiment was carried out.",
+        "The study was performed in a controlled lab environment.",
+    ]
+}]
+```
+### Control number of paraphrases
+```python
+results = diversify("Some text.", n_styles=3)
+```
+```
+[{"original": "Some text.", "paraphrases": ["...", "...", "..."]}]
+```
+### Using the class directly
+Recommended when processing texts across several calls — the model is loaded once and reused across calls.
+```python
+from diversify_text import Diversifier
+div = Diversifier(device="cuda", methods=["tinystyler"])
+batch_1 = div.diversify(texts_1, n_styles=5)
+batch_2 = div.diversify(texts_2, n_styles=5)
+```
+### List of texts
+```python
+results = diversify([
+    "The experiment was conducted in a controlled lab setting.",
+    "She graduated from MIT in 2019.",
+])
+```
+```
+[
+    {"original": "The experiment ...", "paraphrases": ["...", "...", ...]},
+    {"original": "She graduated ...", "paraphrases": ["...", "...", ...]},
+]
+```
+### Customising the TinyStyler style bank
+TinyStyler generates each paraphrase by conditioning on a *style example* — a short sentence that demonstrates the target writing style. The style bank is the list of such examples that get cycled through when producing multiple paraphrases.
+The default bank is a dictionary mapping style labels to lists of example sentences (drawn from the CORE corpus). You can replace or extend it by passing a custom bank via `method_kwargs`.
+A style bank can be a `dict[str, list[str]]` or a `list[list[str]]`:
+```python
+from diversify_text import diversify
+from diversify_text.method.tinystyler import DEFAULT_STYLE_BANK
+custom_bank = {
+    "academic": ["The results demonstrate a statistically significant effect."],
+    "enthusiastic": ["We found something really interesting — check this out!"],
+    "telegraphic": ["Key finding: effect confirmed. Details follow."],
+}
+results = diversify(
+    "The experiment was conducted in a controlled lab setting.",
+    method_kwargs={"tinystyler": {"style_bank": custom_bank}},
+)
+```
+`DEFAULT_STYLE_BANK` is exported from `diversify_text.method.tinystyler` so you can build on it:
+```python
+from diversify_text.method.tinystyler import DEFAULT_STYLE_BANK
+extended_bank = {
+    **DEFAULT_STYLE_BANK,
+    "scientific": ["The data clearly indicate a statistically significant result."],
+}
+```
+You can also select specific styles by key name with `styles`, instead of cycling through the entire bank.
+The number of paraphrases is determined by the number of selected styles:
+```python
+results = diversify(
+    "The experiment was conducted in a controlled lab setting.",
+    method_kwargs={"tinystyler": {"styles": ["research_article", "personal_blog", "recipe"]}},
+)
+```
+### Creating a custom method
+```python
+from diversify_text import Diversifier
+from diversify_text.method import DiversificationMethod
+class MyMethod(DiversificationMethod):
+    name = "my_method"
+    def generate(self, texts, *, n_styles, max_new_tokens, temperature, top_p, **kwargs):
+        return [[f"{text} :: variant {i}" for i in range(n_styles)] for text in texts]
+results = Diversifier(methods=[MyMethod()]).diversify("Hello", n_styles=3)
+```
+```
+[{"original": "Hello", "paraphrases": ["Hello :: variant 0", "Hello :: variant 1", "Hello :: variant 2"]}]
+```
+## Install
+```bash
+pip install diversify-text
+```
+Requires Python 3.10+.
+## Contributing
+### Development setup
+> [!NOTE]
+> You must have **uv** installed.
+> Full installation guide: <https://docs.astral.sh/uv/getting-started/installation/>
+```bash
+git clone https://github.com/AnnaWegmann/diversify_text.git
+cd diversify_text
+uv sync --group dev
+source .venv/bin/activate
+```
+### Running tests
+```bash
+# Run all tests
+pytest
+# Run a specific test file
+pytest tests/test_core.py
+# Run a specific test class or method
+pytest tests/test_core.py::TestDiversifier
+pytest tests/test_core.py::TestDiversifier::test_single_text_returns_one_result
+```
+Tests are also individually runnable via PyCharm's built-in test runner (right-click any test class or method).
+### Working with uv
+#### Adding packages with `uv add`
+To add packages to your project, always use `uv add` rather than `uv pip install`. This ensures that your dependencies are properly managed and recorded in your `pyproject.toml`.
+```bash
+uv add <package-name>
+```
+#### Adding packages to the dev group
+If you need to add a package specifically for your development environment:
+```bash
+uv add --group dev <package-name>
+```
+#### Switching between dev and standard mode
+After you are done with testing and want to go back to standard mode, you can remove the dev-only packages:
+```bash
+uv sync --no-group dev
+```
+This will disable all additional groups and just load your main project dependencies.
+#### Best practice: run `uv lock -U`
+Whenever you upgrade, downgrade, or change versions of packages, it's good practice to run:
+```bash
+uv lock -U
+```
+This updates your lock file to ensure all versions are consistent and everything is in sync.
+### Building docs locally
+```bash
+uv sync --group docs
+sphinx-build -b html docs docs/_build/html
+open docs/_build/html/index.html
+```
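
Note on the PKG-INFO hunk above: the file uses RFC 822-style headers, so the standard library's email parser can read it. The following is a minimal sketch, not part of the package; `PKG_INFO_EXCERPT` is an illustrative name holding a few header lines copied from the hunk.

```python
from email.parser import Parser

# A few header lines copied from the PKG-INFO hunk above (excerpt only).
PKG_INFO_EXCERPT = """\
Metadata-Version: 2.4
Name: diversify-text
Version: 0.1.1
Requires-Python: >=3.10
Requires-Dist: pysbd>=0.3.4
Requires-Dist: torch
Requires-Dist: tqdm>=4.67.3
"""

metadata = Parser().parsestr(PKG_INFO_EXCERPT)

name = metadata["Name"]
version = metadata["Version"]
# Repeated headers such as Requires-Dist are collected with get_all().
requirements = metadata.get_all("Requires-Dist")

print(name, version)
print(requirements)
```

For an installed distribution, `importlib.metadata.metadata("diversify-text")` should expose the same headers without parsing the file by hand.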

diversify_text-0.1.1/README.md ADDED

@@ -0,0 +1,239 @@
+# diversify-text
+This package helps you generate stylistically diverse paraphrases of your own texts using huggingface transformer models locally.
+```bash
+pip install diversify-text
+```
+**[Full documentation](https://annawegmann.github.io/diversify_text/)**
+## Table of contents
+- [Usage](#usage)
+  - [Single text](#single-text)
+  - [Control number of paraphrases](#control-number-of-paraphrases)
+  - [Using the class directly](#using-the-class-directly)
+  - [List of texts](#list-of-texts)
+  - [Customising the TinyStyler style bank](#customising-the-tinystyler-style-bank)
+- [Install](#install)
+- [Contributing](#contributing)
+  - [Development setup](#development-setup)
+  - [Running tests](#running-tests)
+  - [Working with uv](#working-with-uv)
+  - [Building docs locally](#building-docs-locally)
+## Usage
+For file inputs (CSV, TSV, TXT), output options, punctuation splitting, and creating custom methods, see the [full usage guide](https://annawegmann.github.io/diversify_text/usage.html).
+### Single text
+```python
+from diversify_text import diversify
+results = diversify("The experiment was conducted in a controlled lab setting.")
+```
+```
+[{
+    "original": "The experiment was conducted in a controlled lab setting.",
+    "paraphrases": [
+        "They ran the experiment in a controlled lab setting.",
+        "The experiment took place in a controlled lab.",
+        "A controlled lab was where the experiment was conducted.",
+        "In a controlled lab, the experiment was carried out.",
+        "The study was performed in a controlled lab environment.",
+    ]
+}]
+```
+### Control number of paraphrases
+```python
+results = diversify("Some text.", n_styles=3)
+```
+```
+[{"original": "Some text.", "paraphrases": ["...", "...", "..."]}]
+```
+### Using the class directly
+Recommended when processing texts across several calls — the model is loaded once and reused across calls.
+```python
+from diversify_text import Diversifier
+div = Diversifier(device="cuda", methods=["tinystyler"])
+batch_1 = div.diversify(texts_1, n_styles=5)
+batch_2 = div.diversify(texts_2, n_styles=5)
+```
+### List of texts
+```python
+results = diversify([
+    "The experiment was conducted in a controlled lab setting.",
+    "She graduated from MIT in 2019.",
+])
+```
+```
+[
+    {"original": "The experiment ...", "paraphrases": ["...", "...", ...]},
+    {"original": "She graduated ...", "paraphrases": ["...", "...", ...]},
+]
+```
+### Customising the TinyStyler style bank
+TinyStyler generates each paraphrase by conditioning on a *style example* — a short sentence that demonstrates the target writing style. The style bank is the list of such examples that get cycled through when producing multiple paraphrases.
+The default bank is a dictionary mapping style labels to lists of example sentences (drawn from the CORE corpus). You can replace or extend it by passing a custom bank via `method_kwargs`.
+A style bank can be a `dict[str, list[str]]` or a `list[list[str]]`:
+```python
+from diversify_text import diversify
+from diversify_text.method.tinystyler import DEFAULT_STYLE_BANK
+custom_bank = {
+    "academic": ["The results demonstrate a statistically significant effect."],
+    "enthusiastic": ["We found something really interesting — check this out!"],
+    "telegraphic": ["Key finding: effect confirmed. Details follow."],
+}
+results = diversify(
+    "The experiment was conducted in a controlled lab setting.",
+    method_kwargs={"tinystyler": {"style_bank": custom_bank}},
+)
+```
+`DEFAULT_STYLE_BANK` is exported from `diversify_text.method.tinystyler` so you can build on it:
+```python
+from diversify_text.method.tinystyler import DEFAULT_STYLE_BANK
+extended_bank = {
+    **DEFAULT_STYLE_BANK,
+    "scientific": ["The data clearly indicate a statistically significant result."],
+}
+```
+You can also select specific styles by key name with `styles`, instead of cycling through the entire bank.
+The number of paraphrases is determined by the number of selected styles:
+```python
+results = diversify(
+    "The experiment was conducted in a controlled lab setting.",
+    method_kwargs={"tinystyler": {"styles": ["research_article", "personal_blog", "recipe"]}},
+)
+```
+### Creating a custom method
+```python
+from diversify_text import Diversifier
+from diversify_text.method import DiversificationMethod
+class MyMethod(DiversificationMethod):
+    name = "my_method"
+    def generate(self, texts, *, n_styles, max_new_tokens, temperature, top_p, **kwargs):
+        return [[f"{text} :: variant {i}" for i in range(n_styles)] for text in texts]
+results = Diversifier(methods=[MyMethod()]).diversify("Hello", n_styles=3)
+```
+```
+[{"original": "Hello", "paraphrases": ["Hello :: variant 0", "Hello :: variant 1", "Hello :: variant 2"]}]
+```
+## Install
+```bash
+pip install diversify-text
+```
+Requires Python 3.10+.
+## Contributing
+### Development setup
+> [!NOTE]
+> You must have **uv** installed.
+> Full installation guide: <https://docs.astral.sh/uv/getting-started/installation/>
+```bash
+git clone https://github.com/AnnaWegmann/diversify_text.git
+cd diversify_text
+uv sync --group dev
+source .venv/bin/activate
+```
+### Running tests
+```bash
+# Run all tests
+pytest
+# Run a specific test file
+pytest tests/test_core.py
+# Run a specific test class or method
+pytest tests/test_core.py::TestDiversifier
+pytest tests/test_core.py::TestDiversifier::test_single_text_returns_one_result
+```
+Tests are also individually runnable via PyCharm's built-in test runner (right-click any test class or method).
+### Working with uv
+#### Adding packages with `uv add`
+To add packages to your project, always use `uv add` rather than `uv pip install`. This ensures that your dependencies are properly managed and recorded in your `pyproject.toml`.
+```bash
+uv add <package-name>
+```
+#### Adding packages to the dev group
+If you need to add a package specifically for your development environment:
+```bash
+uv add --group dev <package-name>
+```
+#### Switching between dev and standard mode
+After you are done with testing and want to go back to standard mode, you can remove the dev-only packages:
+```bash
+uv sync --no-group dev
+```
+This will disable all additional groups and just load your main project dependencies.
+#### Best practice: run `uv lock -U`
+Whenever you upgrade, downgrade, or change versions of packages, it's good practice to run:
+```bash
+uv lock -U
+```
+This updates your lock file to ensure all versions are consistent and everything is in sync.
+### Building docs locally
+```bash
+uv sync --group docs
+sphinx-build -b html docs docs/_build/html
+open docs/_build/html/index.html
+```
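
Note on the README hunk above: it describes the style bank as examples that are "cycled through when producing multiple paraphrases". The sketch below only illustrates that cycling idea in plain Python. `pick_style_examples` is a hypothetical helper, not part of diversify-text, and the package's actual selection logic under `method/tinystyler/` may differ.

```python
from itertools import cycle, islice


def pick_style_examples(style_bank, n_styles):
    """Return n_styles (label, examples) pairs, wrapping around the bank.

    Hypothetical helper for illustration; not the package's implementation.
    Accepts the two bank shapes named in the README:
    dict[str, list[str]] or list[list[str]].
    """
    if isinstance(style_bank, dict):
        entries = list(style_bank.items())
    else:
        # Unlabelled banks get positional labels "0", "1", ...
        entries = [(str(i), examples) for i, examples in enumerate(style_bank)]
    if not entries:
        raise ValueError("style bank is empty")
    # cycle() restarts from the first entry once the bank is exhausted.
    return list(islice(cycle(entries), n_styles))


custom_bank = {
    "academic": ["The results demonstrate a statistically significant effect."],
    "enthusiastic": ["This result is really exciting!"],
    "telegraphic": ["Key finding: effect confirmed. Details follow."],
}

picked = pick_style_examples(custom_bank, n_styles=5)
labels = [label for label, _ in picked]
print(labels)
```

With three styles and `n_styles=5`, the first two styles are reused, which is why a small bank can still yield more paraphrases than it has entries.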

diversify_text-0.1.1/diversify_text/__init__.py ADDED

@@ -0,0 +1,24 @@
+"""diversify-text -- generate stylistic paraphrases of texts."""
+import logging
+from diversify_text.core import (
+    Diversifier,
+    diversify,
+)
+__all__ = [
+    "Diversifier",
+    "diversify",
+]
+# Configure a clean handler for the diversify logger so INFO/WARNING messages
+# are visible without requiring the user to set up logging themselves.
+_logger = logging.getLogger("diversify_text")
+_logger.setLevel(logging.INFO)
+_handler = logging.StreamHandler()
+_handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
+_logger.addHandler(_handler)
+# Prevent messages from bubbling up to the root logger (avoids duplicate output
+# if the user has already configured logging globally).
+_logger.propagate = False
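
Note on the `__init__.py` hunk above: the package attaches its own handler to the `diversify_text` logger at INFO level and sets `propagate = False`, so a caller who wants quieter output would adjust that named logger through the standard `logging` module. A minimal sketch follows. To stay runnable without the package installed, it reproduces the configuration from the hunk instead of importing `diversify_text`, and it writes to an in-memory stream (an assumption made here so the output can be inspected).

```python
import io
import logging

# Reproduce the configuration from __init__.py above (stand-in for the import).
stream = io.StringIO()  # in-memory stream instead of the default stderr
logger = logging.getLogger("diversify_text")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)
logger.propagate = False

logger.info("loading model")  # emitted: the logger level is INFO

# Caller-side adjustment: raise the threshold to hide INFO messages.
logging.getLogger("diversify_text").setLevel(logging.WARNING)
logger.info("this is now hidden")
logger.warning("this still appears")

captured = stream.getvalue()
print(captured)
```

Because propagation is disabled, handlers on the root logger do not receive these messages; changing the level or handlers of the `diversify_text` logger itself is what takes effect.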