cleanmonkey (PyPI): 0.1.0.tar.gz → 0.2.0.tar.gz


Files changed (20)

{cleanmonkey-0.1.0/src/cleanmonkey.egg-info → cleanmonkey-0.2.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: cleanmonkey
-Version: 0.1.0
+Version: 0.2.0
 Summary: One-call text cleanup: invisible characters, smart quotes, whitespace normalization.
 Author-email: RexBytes <pythonic@rexbytes.com>
 License: MIT
@@ -22,6 +22,12 @@ Classifier: Typing :: Typed
 Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
+Provides-Extra: dev
+Requires-Dist: pytest; extra == "dev"
+Requires-Dist: pytest-cov; extra == "dev"
+Requires-Dist: hypothesis; extra == "dev"
+Requires-Dist: ruff; extra == "dev"
+Requires-Dist: mypy; extra == "dev"
 Dynamic: license-file
 # cleanmonkey
@@ -147,6 +153,28 @@ cleanmonkey --no-line-endings input.txt
 cleanmonkey is designed to work well as a tool for large language models. Invisible character cleanup is a constant source of silent bugs in LLM-driven data pipelines — non-breaking spaces break splits, zero-width characters corrupt comparisons, and smart quotes fail exact matches. Without cleanmonkey, LLMs end up generating repetitive `.replace()` chains that miss edge cases and waste tokens. A single `clean()` call handles all of it with a structured, idempotent result — no multi-step prompting or character-by-character debugging required. Fewer tokens in, clean data out.
+## A note on whitespace
+By default `clean()` strips each line and collapses runs of spaces, which is
+correct for prose but **destroys indentation** in structured text (YAML, Python,
+Markdown). Use `clean(text, strip=False, collapse_spaces=False)` or
+`profile="minimal"` for those. See [`LIMITATIONS.md`](LIMITATIONS.md) for this
+and other deliberate design tradeoffs.
+## Quality & review
+cleanmonkey is hardened with a competitive multi-model review methodology and a
+measurable release gate:
+- [`CONTRIBUTING.md`](CONTRIBUTING.md) — testing philosophy and the review-panel
+  process.
+- [`LIMITATIONS.md`](LIMITATIONS.md) — intentional tradeoffs reviewers should not
+  re-litigate.
+- [`RELEASE_READINESS.md`](RELEASE_READINESS.md) + `release_readiness.json` +
+  `scripts/readiness.py` — the release rubric and convergence metric
+  (`python scripts/readiness.py`).
+- [`REVIEW_HISTORY.md`](REVIEW_HISTORY.md) — how the library was hardened.
 ## License
 MIT

{cleanmonkey-0.1.0 → cleanmonkey-0.2.0}/README.md RENAMED

@@ -121,6 +121,28 @@ cleanmonkey --no-line-endings input.txt
 cleanmonkey is designed to work well as a tool for large language models. Invisible character cleanup is a constant source of silent bugs in LLM-driven data pipelines — non-breaking spaces break splits, zero-width characters corrupt comparisons, and smart quotes fail exact matches. Without cleanmonkey, LLMs end up generating repetitive `.replace()` chains that miss edge cases and waste tokens. A single `clean()` call handles all of it with a structured, idempotent result — no multi-step prompting or character-by-character debugging required. Fewer tokens in, clean data out.
+## A note on whitespace
+By default `clean()` strips each line and collapses runs of spaces, which is
+correct for prose but **destroys indentation** in structured text (YAML, Python,
+Markdown). Use `clean(text, strip=False, collapse_spaces=False)` or
+`profile="minimal"` for those. See [`LIMITATIONS.md`](LIMITATIONS.md) for this
+and other deliberate design tradeoffs.
+## Quality & review
+cleanmonkey is hardened with a competitive multi-model review methodology and a
+measurable release gate:
+- [`CONTRIBUTING.md`](CONTRIBUTING.md) — testing philosophy and the review-panel
+  process.
+- [`LIMITATIONS.md`](LIMITATIONS.md) — intentional tradeoffs reviewers should not
+  re-litigate.
+- [`RELEASE_READINESS.md`](RELEASE_READINESS.md) + `release_readiness.json` +
+  `scripts/readiness.py` — the release rubric and convergence metric
+  (`python scripts/readiness.py`).
+- [`REVIEW_HISTORY.md`](REVIEW_HISTORY.md) — how the library was hardened.
 ## License
 MIT
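
Reviewer note on the whitespace section: the sketch below shows why the default described above (strip each line, collapse runs of spaces) is unsafe for indentation-sensitive text. It uses only `re`. `naive_prose_clean` is a hypothetical stand-in written for illustration and is not cleanmonkey's implementation; the sample YAML is likewise invented.

```python
import re


def naive_prose_clean(text: str) -> str:
    """Hypothetical stand-in: strip each line, then collapse runs of spaces."""
    lines = text.split("\n")
    return "\n".join(re.sub(r" {2,}", " ", line.strip()) for line in lines)


# Indentation carries the nesting in YAML, so stripping each line flattens it.
yaml_doc = "server:\n  host: example\n  ports:\n    - 80\n"
flattened = naive_prose_clean(yaml_doc)
print(flattened)
```

After the call, `host` and `ports` sit at the same level as `server`, so the document no longer means the same thing. Per the README text above, `clean(text, strip=False, collapse_spaces=False)` or `profile="minimal"` is the documented way to avoid this.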

{cleanmonkey-0.1.0 → cleanmonkey-0.2.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "cleanmonkey"
-version = "0.1.0"
+version = "0.2.0"
 description = "One-call text cleanup: invisible characters, smart quotes, whitespace normalization."
 readme = "README.md"
 license = {text = "MIT"}
@@ -27,6 +27,9 @@ classifiers = [
     "Typing :: Typed",
 ]
+[project.optional-dependencies]
+dev = ["pytest", "pytest-cov", "hypothesis", "ruff", "mypy"]
 [project.scripts]
 cleanmonkey = "cleanmonkey.cli:_main_with_broken_pipe_handling"
@@ -41,3 +44,11 @@ where = ["src"]
 [tool.pytest.ini_options]
 testpaths = ["tests"]
 pythonpath = ["src"]
+[tool.ruff]
+target-version = "py310"
+src = ["src", "tests"]
+[tool.mypy]
+files = ["src"]
+python_version = "3.10"

{cleanmonkey-0.1.0 → cleanmonkey-0.2.0}/src/cleanmonkey/__init__.py RENAMED

@@ -3,5 +3,5 @@
 from cleanmonkey.core import MAX_DEPTH, clean, clean_column, clean_dict, inspect
 from cleanmonkey.profiles import PROFILES, Profile
-__version__ = "0.1.0"
+__version__ = "0.2.0"
 __all__ = ["MAX_DEPTH", "clean", "clean_column", "clean_dict", "inspect", "Profile", "PROFILES"]

{cleanmonkey-0.1.0 → cleanmonkey-0.2.0}/src/cleanmonkey/core.py RENAMED

@@ -4,7 +4,18 @@ from __future__ import annotations
 import re
 from dataclasses import dataclass, replace
-from typing import Any
+from typing import Any, cast
+from cleanmonkey.maps import (
+    CONTROL,
+    DASHES,
+    ELLIPSIS,
+    FULLWIDTH,
+    INVISIBLE,
+    SMART_QUOTES,
+    WHITESPACE,
+)
+from cleanmonkey.profiles import PROFILES, Profile
 _BOOL_OVERRIDE_NAMES_CLEAN = (
     "smart_quotes", "dashes", "ellipsis", "invisible", "whitespace",
@@ -27,18 +38,6 @@ def _validate_bool_overrides(overrides: dict[str, Any], allowed: tuple[str, ...]
                 f"{func_name}() override {name!r} must be bool or None, got {type(val).__name__}"
             )
-from cleanmonkey.maps import (
-    CONTROL,
-    DASHES,
-    ELLIPSIS,
-    FULLWIDTH,
-    INVISIBLE,
-    SMART_QUOTES,
-    WHITESPACE,
-)
-from cleanmonkey.profiles import PROFILES, Profile
 def _validate_profile_kwarg(kwargs: dict[str, Any], func_name: str) -> None:
     """Validate the 'profile' kwarg type and name if present, matching clean()'s contract."""
     if "profile" in kwargs:
@@ -63,7 +62,7 @@ MAX_DEPTH: int = 200
 def _build_table(profile: Profile) -> dict[int, str | int | None]:
     """Build a str.translate table from a profile."""
-    merged: dict[str, str] = {}
+    merged: dict[str, str | int | None] = {}
     if profile.invisible:
         merged.update(INVISIBLE)
     if profile.whitespace:
@@ -78,7 +77,7 @@ def _build_table(profile: Profile) -> dict[int, str | int | None]:
         merged.update(ELLIPSIS)
     if profile.fullwidth:
         merged.update(FULLWIDTH)
-    return str.maketrans({k: v for k, v in merged.items()})
+    return str.maketrans(merged)
 # Cache tables for default profiles
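
Reviewer note on the `_build_table` change: the removed comprehension `{k: v for k, v in merged.items()}` only copied the dict, so passing `merged` straight to `str.maketrans` is equivalent. The widened annotation presumably reflects that a translate table may map a character to `None` (delete) as well as to a replacement string. A minimal sketch, with an illustrative mapping that is an assumption and not the contents of `cleanmonkey.maps`:

```python
from __future__ import annotations

# Illustrative mapping only; the real tables live in cleanmonkey.maps.
merged: dict[str, str | int | None] = {
    "\u200b": None,  # zero-width space: delete
    "\u00a0": " ",   # non-breaking space: replace with a plain space
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}

# With a single dict argument, str.maketrans converts the one-character
# string keys to integer code points and keeps the values as given.
table = str.maketrans(merged)

cleaned = "\u201cAlice\u200b\u00a0Smith\u201d".translate(table)
print(cleaned)
```

Because `str.translate` applies the whole table in one pass, a single table built this way replaces a chain of `.replace()` calls.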
@@ -228,7 +227,13 @@ def _clean_value(
                 _clean_value(item, keys=keys, _seen=_seen, _depth=_depth + 1, **kwargs)
                 for item in v
             ]
-            result = type(v)(cleaned_items)
+            if isinstance(v, tuple) and hasattr(v, "_fields"):
+                # namedtuple: its constructor takes positional fields, not a
+                # single iterable, so type(v)(cleaned_items) would fail. Rebuild
+                # via the _make classmethod, which constructs from an iterable.
+                result = cast(Any, type(v))._make(cleaned_items)
+            else:
+                result = type(v)(cleaned_items)
             if isinstance(v, (set, frozenset)) and len(result) != len(v):
                 raise ValueError(
                     "Set member collision: cleaning produced duplicate members"
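
Reviewer note on the namedtuple branch: the sketch below reproduces the failure the new code comment describes, using only the standard library. The `Row` type and the zero-width-space replacement are illustrative assumptions that mirror the new tests; they are not taken from `core.py`.

```python
from collections import namedtuple

Row = namedtuple("Row", ["name", "city"])
row = Row("Alice\u200b", "NYC")

# Stand-in for the recursive cleaning step: drop zero-width spaces.
cleaned_items = [item.replace("\u200b", "") for item in row]

# A namedtuple constructor takes one positional argument per field, so
# passing a single list (what type(v)(cleaned_items) does) raises TypeError.
try:
    Row(cleaned_items)
    ctor_error = None
except TypeError as exc:
    ctor_error = exc

# _make is the classmethod that builds an instance from an iterable.
rebuilt = Row._make(cleaned_items)

# A plain tuple accepts a single iterable, which is why the old code path
# worked for tuple, list, set, and frozenset.
plain = tuple(cleaned_items)

print(type(ctor_error).__name__, rebuilt, plain)
```

The `hasattr(v, "_fields")` check in the diff is a duck-typing test: any tuple subclass exposing `_fields` is treated as a namedtuple, which also covers classes created with `typing.NamedTuple`.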

{cleanmonkey-0.1.0 → cleanmonkey-0.2.0/src/cleanmonkey.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: cleanmonkey
-Version: 0.1.0
+Version: 0.2.0
 Summary: One-call text cleanup: invisible characters, smart quotes, whitespace normalization.
 Author-email: RexBytes <pythonic@rexbytes.com>
 License: MIT
@@ -22,6 +22,12 @@ Classifier: Typing :: Typed
 Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
+Provides-Extra: dev
+Requires-Dist: pytest; extra == "dev"
+Requires-Dist: pytest-cov; extra == "dev"
+Requires-Dist: hypothesis; extra == "dev"
+Requires-Dist: ruff; extra == "dev"
+Requires-Dist: mypy; extra == "dev"
 Dynamic: license-file
 # cleanmonkey
@@ -147,6 +153,28 @@ cleanmonkey --no-line-endings input.txt
 cleanmonkey is designed to work well as a tool for large language models. Invisible character cleanup is a constant source of silent bugs in LLM-driven data pipelines — non-breaking spaces break splits, zero-width characters corrupt comparisons, and smart quotes fail exact matches. Without cleanmonkey, LLMs end up generating repetitive `.replace()` chains that miss edge cases and waste tokens. A single `clean()` call handles all of it with a structured, idempotent result — no multi-step prompting or character-by-character debugging required. Fewer tokens in, clean data out.
+## A note on whitespace
+By default `clean()` strips each line and collapses runs of spaces, which is
+correct for prose but **destroys indentation** in structured text (YAML, Python,
+Markdown). Use `clean(text, strip=False, collapse_spaces=False)` or
+`profile="minimal"` for those. See [`LIMITATIONS.md`](LIMITATIONS.md) for this
+and other deliberate design tradeoffs.
+## Quality & review
+cleanmonkey is hardened with a competitive multi-model review methodology and a
+measurable release gate:
+- [`CONTRIBUTING.md`](CONTRIBUTING.md) — testing philosophy and the review-panel
+  process.
+- [`LIMITATIONS.md`](LIMITATIONS.md) — intentional tradeoffs reviewers should not
+  re-litigate.
+- [`RELEASE_READINESS.md`](RELEASE_READINESS.md) + `release_readiness.json` +
+  `scripts/readiness.py` — the release rubric and convergence metric
+  (`python scripts/readiness.py`).
+- [`REVIEW_HISTORY.md`](REVIEW_HISTORY.md) — how the library was hardened.
 ## License
 MIT

{cleanmonkey-0.1.0 → cleanmonkey-0.2.0}/src/cleanmonkey.egg-info/SOURCES.txt RENAMED

@@ -12,6 +12,7 @@ src/cleanmonkey.egg-info/PKG-INFO
 src/cleanmonkey.egg-info/SOURCES.txt
 src/cleanmonkey.egg-info/dependency_links.txt
 src/cleanmonkey.egg-info/entry_points.txt
+src/cleanmonkey.egg-info/requires.txt
 src/cleanmonkey.egg-info/top_level.txt
 tests/test_cli.py
 tests/test_core.py

cleanmonkey-0.2.0/src/cleanmonkey.egg-info/requires.txt ADDED

@@ -0,0 +1,7 @@
+[dev]
+pytest
+pytest-cov
+hypothesis
+ruff
+mypy

{cleanmonkey-0.1.0 → cleanmonkey-0.2.0}/tests/test_cli.py RENAMED

@@ -6,7 +6,6 @@ import os
 import stat
 import subprocess
 import sys
-import tempfile
 from unittest import mock
 import pytest

{cleanmonkey-0.1.0 → cleanmonkey-0.2.0}/tests/test_core.py RENAMED

@@ -279,6 +279,46 @@ class TestTupleRecursion:
         assert isinstance(result["data"], tuple)
         assert result["data"] == (1, "ab", None)
+    def test_namedtuple_in_list_preserves_type(self):
+        """namedtuples (tuple subclass, positional ctor) must clean and keep type."""
+        from collections import namedtuple
+        Row = namedtuple("Row", ["name", "city"])
+        result = clean_column([Row("Alice\u200b", "NYC"), Row("Bob", "LA")])
+        assert result == [Row("Alice", "NYC"), Row("Bob", "LA")]
+        assert isinstance(result[0], Row)
+        # cleaned values are accessible by field name (not collapsed to a plain tuple)
+        assert result[0].name == "Alice"
+    def test_namedtuple_as_dict_value_preserves_type(self):
+        """namedtuple dict values must clean and remain the same namedtuple type."""
+        from collections import namedtuple
+        Row = namedtuple("Row", ["name", "city"])
+        result = clean_dict({"r": Row("x\u200b", "y\u00a0z")})
+        assert isinstance(result["r"], Row)
+        assert result["r"] == Row("x", "y z")
+    def test_dict_list_subclasses_normalize_to_base_type(self):
+        """Documented contract (LIMITATIONS.md): dict/list subclasses clean their
+        values but are returned as the base type, not the subclass."""
+        from collections import Counter, OrderedDict, defaultdict
+        od = clean_dict(OrderedDict([("a\u200b", "v\u200b")]), keys=True)
+        assert type(od) is dict and od == {"a": "v"}
+        dd = clean_dict(defaultdict(int, {"a\u200b": "v\u200b"}), keys=True)
+        assert type(dd) is dict and dd == {"a": "v"}
+        class MyList(list):
+            pass
+        ml = clean_column(MyList(["a\u200b", "b"]))
+        assert type(ml) is list and ml == ["a", "b"]
+        cnt = clean_column([Counter({"a\u200b": 2})], keys=True)
+        assert type(cnt[0]) is dict and cnt[0] == {"a": 2}
 class TestCycleDetection:
     """Cyclic structures must raise ValueError, not RecursionError."""