PyPI - tokensplit - Versions diffs - 0.1.0__tar.gz - Mend

tokensplit 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

tokensplit-0.1.0/LICENSE.txt +21 -0
tokensplit-0.1.0/PKG-INFO +154 -0
tokensplit-0.1.0/README.md +140 -0
tokensplit-0.1.0/pyproject.toml +18 -0
tokensplit-0.1.0/setup.cfg +4 -0
tokensplit-0.1.0/tests/test_tokensplit.py +168 -0
tokensplit-0.1.0/tokensplit/__init__.py +5 -0
tokensplit-0.1.0/tokensplit/reader.py +138 -0
tokensplit-0.1.0/tokensplit/writer.py +115 -0
tokensplit-0.1.0/tokensplit.egg-info/PKG-INFO +154 -0
tokensplit-0.1.0/tokensplit.egg-info/SOURCES.txt +11 -0
tokensplit-0.1.0/tokensplit.egg-info/dependency_links.txt +1 -0
tokensplit-0.1.0/tokensplit.egg-info/top_level.txt +1 -0

tokensplit-0.1.0/LICENSE.txt ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) [year] [fullname]
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

tokensplit-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,154 @@
+Metadata-Version: 2.4
+Name: tokensplit
+Version: 0.1.0
+Summary: String-separated values with user-defined multi-character delimiters
+Author-email: lost_0 <l05t_0@proton.me>
+License-Expression: MIT
+Classifier: Programming Language :: Python :: 3
+Classifier: Operating System :: OS Independent
+Classifier: Topic :: File Formats
+Requires-Python: >=3.7
+Description-Content-Type: text/markdown
+License-File: LICENSE.txt
+Dynamic: license-file
+# TokenSplit — Token-Separated Values
+A lightweight Python package for reading and writing `.toks` files: a plain-text tabular format where you choose your own multi-character delimiter string.
+---
+## Why?
+CSV uses a single character (`,`) as a separator, which means commas in your data need escaping or quoting.
+Toks lets you pick any string — `/---/`, `:::`, `<<SEP>>` — that you know won't appear in your data, keeping files simple and unambiguous without any escape sequences.
+---
+## File format
+```
+/---/
+Alice/---/30/---/Engineer/---/
+Bob/---/25/---/Designer/---/
+```
+- **Line 1** — the delimiter string (written automatically by the writer)
+- **Every other line** — values separated by the delimiter, with the line ending on `<delimiter><newline>`
+Newlines *inside* a value are preserved because rows end only on the `<delimiter><newline>` sequence, not on bare newlines.
+---
+## Installation
+```bash
+pip install tokensplit           # once published to PyPI
+# or, from source:
+pip install .
+```
+---
+## Quick start
+### Writing
+```python
+import tokensplit
+# Convenience function
+tokensplit.write("people.toks", [
+    ["name", "age", "role"],
+    ["Alice", "30", "Engineer"],
+    ["Bob",   "25", "Designer"],
+], delimiter="/---/")
+```
+```python
+# Streaming writer — useful for large files
+with open("people.toks", "w") as f:
+    writer = tokensplit.ToksWriter(f, delimiter="/---/")
+    writer.writerow(["name", "age", "role"])   # header
+    writer.writerow(["Alice", "30", "Engineer"])
+    writer.writerow(["Bob",   "25", "Designer"])
+```
+### Reading
+```python
+import tokensplit
+# Convenience function — returns list of rows
+rows = tokensplit.read("people.toks")
+# [["name", "age", "role"], ["Alice", "30", "Engineer"], ...]
+# Reader object: iterate row by row (in 0.1.0 the whole file is parsed when the reader is constructed)
+with open("people.toks") as f:
+    reader = tokensplit.ToksReader(f)
+    print("delimiter:", reader.delimiter)   # "/---/"
+    for row in reader:
+        print(row)
+```
+---
+## Choosing a delimiter
+Any non-empty string without a newline character works. Good choices:
+| Delimiter | Good when data contains… |
+|-----------|--------------------------|
+| `/---/`   | General text |
+| `\|\|\|`  | Paths, URLs |
+| `<<<>>>`  | Code snippets |
+| `,,,,`    | Numeric CSVs being converted |
+| `:::`     | Short labels / IDs |
+**Two rules enforced by the writer:**
+1. A value must not *contain* the delimiter string.
+2. A value must not end with a prefix of the delimiter in a way that creates an ambiguous sequence when written (e.g. value `"aa"` with delimiter `"aaa"` would produce `"aaaaa"` which embeds an extra delimiter). A `ValueError` is raised in both cases.
+---
+## API reference
+### `tokensplit.write(filepath, rows, delimiter)`
+Write `rows` (list of lists of strings) to `filepath`.
+### `tokensplit.read(filepath) → List[List[str]]`
+Read all rows from `filepath`. Returns a list of lists of strings.
+### `tokensplit.ToksWriter(file_obj, delimiter)`
+Streaming writer. Call `.writerow(row)` or `.writerows(rows)`.
+The delimiter is written to line 1 of the file on construction.
+### `tokensplit.ToksReader(file_obj)`
+Reader class. Iterate with `for row in reader`. In 0.1.0 the whole file is read and parsed on construction, so memory use grows with file size.
+`.delimiter` attribute exposes the delimiter read from line 1.
+---
+## Reading algorithm
+The reader uses a **forward-only sliding window** of exactly `len(delimiter)` characters:
+```
+content:   h e l l o / - - - / w o r l d / - - - / \n
+window:    [     5     ]
+                  → slides one character at a time
+                        match! → emit token, jump window past delimiter
+```
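+- **Time:** O(n·d) worst case, where d is the delimiter length: the window advances one character at a time and each comparison inspects up to d characters; one slice is emitted per match. For short delimiters this is effectively O(n).
+- **Extra space:** O(d) for the current window, on top of the content string and the parsed rows, which are both held in memory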
+- No regex, no `str.split`, no backtracking
+---
+## Running tests
+```bash
+python -m pytest tests/
+```

tokensplit-0.1.0/README.md ADDED

@@ -0,0 +1,140 @@
+# TokenSplit — Token-Separated Values
+A lightweight Python package for reading and writing `.toks` files: a plain-text tabular format where you choose your own multi-character delimiter string.
+---
+## Why?
+CSV uses a single character (`,`) as a separator, which means commas in your data need escaping or quoting.
+Toks lets you pick any string — `/---/`, `:::`, `<<SEP>>` — that you know won't appear in your data, keeping files simple and unambiguous without any escape sequences.
+---
+## File format
+```
+/---/
+Alice/---/30/---/Engineer/---/
+Bob/---/25/---/Designer/---/
+```
+- **Line 1** — the delimiter string (written automatically by the writer)
+- **Every other line** — values separated by the delimiter, with the line ending on `<delimiter><newline>`
+Newlines *inside* a value are preserved because rows end only on the `<delimiter><newline>` sequence, not on bare newlines.
+---
+## Installation
+```bash
+pip install tokensplit           # once published to PyPI
+# or, from source:
+pip install .
+```
+---
+## Quick start
+### Writing
+```python
+import tokensplit
+# Convenience function
+tokensplit.write("people.toks", [
+    ["name", "age", "role"],
+    ["Alice", "30", "Engineer"],
+    ["Bob",   "25", "Designer"],
+], delimiter="/---/")
+```
+```python
+# Streaming writer — useful for large files
+with open("people.toks", "w") as f:
+    writer = tokensplit.ToksWriter(f, delimiter="/---/")
+    writer.writerow(["name", "age", "role"])   # header
+    writer.writerow(["Alice", "30", "Engineer"])
+    writer.writerow(["Bob",   "25", "Designer"])
+```
+### Reading
+```python
+import tokensplit
+# Convenience function — returns list of rows
+rows = tokensplit.read("people.toks")
+# [["name", "age", "role"], ["Alice", "30", "Engineer"], ...]
+# Reader object: iterate row by row (in 0.1.0 the whole file is parsed when the reader is constructed)
+with open("people.toks") as f:
+    reader = tokensplit.ToksReader(f)
+    print("delimiter:", reader.delimiter)   # "/---/"
+    for row in reader:
+        print(row)
+```
+---
+## Choosing a delimiter
+Any non-empty string without a newline character works. Good choices:
+| Delimiter | Good when data contains… |
+|-----------|--------------------------|
+| `/---/`   | General text |
+| `\|\|\|`  | Paths, URLs |
+| `<<<>>>`  | Code snippets |
+| `,,,,`    | Numeric CSVs being converted |
+| `:::`     | Short labels / IDs |
+**Two rules enforced by the writer:**
+1. A value must not *contain* the delimiter string.
+2. A value must not end with a prefix of the delimiter in a way that creates an ambiguous sequence when written (e.g. value `"aa"` with delimiter `"aaa"` would produce `"aaaaa"` which embeds an extra delimiter). A `ValueError` is raised in both cases.
+---
+## API reference
+### `tokensplit.write(filepath, rows, delimiter)`
+Write `rows` (list of lists of strings) to `filepath`.
+### `tokensplit.read(filepath) → List[List[str]]`
+Read all rows from `filepath`. Returns a list of lists of strings.
+### `tokensplit.ToksWriter(file_obj, delimiter)`
+Streaming writer. Call `.writerow(row)` or `.writerows(rows)`.
+The delimiter is written to line 1 of the file on construction.
+### `tokensplit.ToksReader(file_obj)`
+Reader class. Iterate with `for row in reader`. In 0.1.0 the whole file is read and parsed on construction, so memory use grows with file size.
+`.delimiter` attribute exposes the delimiter read from line 1.
+---
+## Reading algorithm
+The reader uses a **forward-only sliding window** of exactly `len(delimiter)` characters:
+```
+content:   h e l l o / - - - / w o r l d / - - - / \n
+window:    [     5     ]
+                  → slides one character at a time
+                        match! → emit token, jump window past delimiter
+```
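+- **Time:** O(n·d) worst case, where d is the delimiter length: the window advances one character at a time and each comparison inspects up to d characters; one slice is emitted per match. For short delimiters this is effectively O(n).
+- **Extra space:** O(d) for the current window, on top of the content string and the parsed rows, which are both held in memory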
+- No regex, no `str.split`, no backtracking
+---
+## Running tests
+```bash
+python -m pytest tests/
+```
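
The layout described under "File format" in the README above can be reproduced with plain string operations. The following is a minimal sketch under that description, not the package's writer: it skips the validation that `writer.py` performs, and `render_toks` is a hypothetical helper name used only for illustration.

```python
from typing import List


def render_toks(rows: List[List[str]], delimiter: str) -> str:
    """Return the .toks text for rows (no validation; illustration only)."""
    lines = [delimiter + "\n"]  # line 1: the delimiter itself
    for row in rows:
        # Values are joined by the delimiter; the row ends on <delimiter><newline>.
        lines.append(delimiter.join(row) + delimiter + "\n")
    return "".join(lines)


text = render_toks(
    [["Alice", "30", "Engineer"], ["Bob", "25", "Designer"]],
    "/---/",
)
print(text, end="")
```

Every value, including the last one in a row, is followed by the delimiter, which is why a row can contain bare newlines without ending early.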

tokensplit-0.1.0/pyproject.toml ADDED

@@ -0,0 +1,18 @@
+[build-system]
+requires = ["setuptools>=61"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "tokensplit"
+version = "0.1.0"
+description = "String-separated values with user-defined multi-character delimiters"
+readme = "README.md"
+requires-python = ">=3.7"
+authors = [{name = "lost_0", email = "l05t_0@proton.me"}]
+license = "MIT"
+license-files = ["LICENSE.txt"]
+classifiers = [
+    "Programming Language :: Python :: 3",
+    "Operating System :: OS Independent",
+    "Topic :: File Formats",
+]

tokensplit-0.1.0/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

tokensplit-0.1.0/tests/test_tokensplit.py ADDED

@@ -0,0 +1,168 @@
+"""
+tests/test_tokensplit.py — tests for the tokensplit package.
+Run with:  python -m pytest tests/
+"""
+import io
+import os
+import sys
+import pytest
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
+from tokensplit.reader import ToksReader, read
+from tokensplit.writer import ToksWriter, write
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+def roundtrip(rows, delimiter):
+    """Write rows to an in-memory buffer, read them back, return result."""
+    buf = io.StringIO()
+    ToksWriter(buf, delimiter).writerows(rows)
+    buf.seek(0)
+    return list(ToksReader(buf))
+# ---------------------------------------------------------------------------
+# Basic round-trip tests
+# ---------------------------------------------------------------------------
+class TestRoundTrip:
+    def test_simple_three_char_delimiter(self):
+        rows = [["hello", "world"], ["foo", "bar", "baz"]]
+        assert roundtrip(rows, ",,,") == rows
+    def test_slash_delimiter(self):
+        rows = [["alpha", "beta"], ["gamma", "delta", "epsilon"]]
+        assert roundtrip(rows, "/---/") == rows
+    def test_single_char_delimiter(self):
+        rows = [["a", "b", "c"], ["d", "e"]]
+        assert roundtrip(rows, "|") == rows
+    def test_long_delimiter(self):
+        rows = [["x", "y"], ["z"]]
+        assert roundtrip(rows, "<<SPLIT>>") == rows
+    def test_single_row_single_value(self):
+        rows = [["only"]]
+        assert roundtrip(rows, "/---/") == rows
+    def test_empty_values(self):
+        rows = [["", "b", ""], ["", ""]]
+        assert roundtrip(rows, "/---/") == rows
+    def test_numeric_strings(self):
+        rows = [["1", "2", "3"], ["100", "200"]]
+        assert roundtrip(rows, "|||") == rows
+    def test_whitespace_values(self):
+        rows = [["  leading", "trailing  ", " both "]]
+        assert roundtrip(rows, "/---/") == rows
+    def test_newlines_inside_values(self):
+        # Newlines inside a value are preserved; rows end on <delim><newline>
+        rows = [["line1\nline2", "normal"]]
+        assert roundtrip(rows, "/---/") == rows
+    def test_many_rows(self):
+        rows = [[str(i), str(i * 2)] for i in range(200)]
+        assert roundtrip(rows, "---") == rows
+# ---------------------------------------------------------------------------
+# Delimiter detection edge cases
+# ---------------------------------------------------------------------------
+class TestDelimiterEdgeCases:
+    def test_value_starts_with_partial_delimiter(self):
+        # Value starts with part of the delimiter — must not be split early.
+        rows = [["/--hello", "world"]]       # delimiter is /---/
+        assert roundtrip(rows, "/---/") == rows
+    def test_value_ends_with_safe_partial_delimiter(self):
+        # "hello/--" ends with "/--" (3 chars).  Delimiter is "/---/" (5 chars).
+        # "hello/--" + "/---/" = "hello/--/---/", in which "/---/" occurs only at
+        # the intended boundary (index 8), so it is safe to write.
+        rows = [["hello/--", "world"]]
+        assert roundtrip(rows, "/---/") == rows
+# ---------------------------------------------------------------------------
+# Writer safety / validation
+# ---------------------------------------------------------------------------
+class TestWriterValidation:
+    def test_value_contains_delimiter_raises(self):
+        buf = io.StringIO()
+        writer = ToksWriter(buf, "/---/")
+        with pytest.raises(ValueError, match="delimiter"):
+            writer.writerow(["safe", "un/---/safe"])
+    def test_adjacency_collision_raises(self):
+        # "aa" + "aaa" = "aaaaa" which embeds "aaa" before the intended split.
+        buf = io.StringIO()
+        writer = ToksWriter(buf, "aaa")
+        with pytest.raises(ValueError):
+            writer.writerow(["aa", "b"])
+    def test_empty_delimiter_raises(self):
+        with pytest.raises(ValueError, match="empty"):
+            ToksWriter(io.StringIO(), "")
+    def test_delimiter_with_newline_raises(self):
+        with pytest.raises(ValueError, match="newline"):
+            ToksWriter(io.StringIO(), "/--\n/")
+# ---------------------------------------------------------------------------
+# Reader validation
+# ---------------------------------------------------------------------------
+class TestReaderValidation:
+    def test_empty_file_raises(self):
+        with pytest.raises(ValueError, match="empty"):
+            ToksReader(io.StringIO(""))
+# ---------------------------------------------------------------------------
+# File I/O convenience functions
+# ---------------------------------------------------------------------------
+class TestFileIO:
+    def test_write_and_read_file(self, tmp_path):
+        path = str(tmp_path / "data.toks")
+        rows = [["name", "age"], ["Alice", "30"], ["Bob", "25"]]
+        write(path, rows, delimiter="/---/")
+        assert read(path) == rows
+    def test_file_first_line_is_delimiter(self, tmp_path):
+        path = str(tmp_path / "data.toks")
+        write(path, [["a", "b"]], delimiter="###")
+        with open(path) as f:
+            first_line = f.readline().rstrip("\n")
+        assert first_line == "###"
+    def test_delimiter_accessible_on_reader(self, tmp_path):
+        path = str(tmp_path / "data.toks")
+        write(path, [["x"]], delimiter="::::")
+        with open(path) as f:
+            reader = ToksReader(f)
+            assert reader.delimiter == "::::"
+# ---------------------------------------------------------------------------
+# Streaming / large data
+# ---------------------------------------------------------------------------
+class TestStreaming:
+    def test_fifty_rows(self):
+        rows = [[f"r{i}c0", f"r{i}c1"] for i in range(50)]
+        result = roundtrip(rows, ",,,")
+        assert len(result) == 50
+        assert result[0] == ["r0c0", "r0c1"]
+        assert result[49] == ["r49c0", "r49c1"]

tokensplit-0.1.0/tokensplit/__init__.py ADDED

@@ -0,0 +1,5 @@
+from .reader import ToksReader, read
+from .writer import ToksWriter, write
+__all__ = ["ToksReader", "ToksWriter", "read", "write"]
+__version__ = "0.1.0"

tokensplit-0.1.0/tokensplit/reader.py ADDED

@@ -0,0 +1,138 @@
+"""
+tokensplit/reader.py — Token-Separated Values reader.
+File format
+-----------
+  Line 1  : the delimiter string (e.g.  /---/  or  ,,,)
+  Line 2+ : rows of values, each value separated by the delimiter string,
+            each row terminated by  <delimiter><newline>
+            Example with delimiter  /---/ :
+                /---/
+                hello/---/world/---/
+                foo/---/bar/---/baz/---/
+Reading algorithm: sliding window, O(n*d) worst-case time
+---------------------------------------------------------
+We read the post-header content as a single string, then scan it with a
+two-pointer window of exactly d = len(delimiter) characters.
+  end   is the exclusive right edge of the window; it advances one character
+        per iteration until the window matches.
+  start marks the beginning of the current token.
+  end - d  is the left edge of the current d-wide window.
+When the window matches the delimiter we:
+  1. Emit  content[start : end-d]  as the next value.
+  2. Set   start = end  (skip past the delimiter).
+  3. If content[start] == newline -> row terminator; close row, skip newline.
+The scan is forward-only and slices one token per delimiter hit. Each window
+comparison inspects up to d characters, so total work is O(n*d) in the worst
+case and close to O(n) for short delimiters. Partial-overlap cases
+(delimiter="ab", value="aa", written as "aaab") are handled naturally by the
+character-at-a-time slide.
+"""
+from typing import Iterator, List
+# ---------------------------------------------------------------------------
+# Internal helpers
+# ---------------------------------------------------------------------------
+def _read_delimiter(file_obj) -> str:
+    """Read line 1 and return the delimiter (without its trailing newline)."""
+    line = file_obj.readline()
+    if not line:
+        raise ValueError("File is empty — cannot read delimiter from first line.")
+    if line.endswith("\n"):
+        line = line[:-1]
+    if not line:
+        raise ValueError("Delimiter string on line 1 must not be empty.")
+    return line
+def _parse(content: str, delimiter: str) -> List[List[str]]:
+    """
+    Parse *content* (everything after the delimiter line) into a list of rows.
+    Row terminator  : <delimiter><newline>
+    Value separator : <delimiter>  (followed by more values on the same row)
+    """
+    d = len(delimiter)
+    n = len(content)
+    rows: List[List[str]] = []
+    current_row: List[str] = []
+    start = 0   # start of the current token
+    end = d     # right edge of the sliding window (exclusive)
+    if n < d:
+        # Content shorter than one delimiter — nothing to split.
+        if content:
+            current_row.append(content)
+            rows.append(current_row)
+        return rows
+    while end <= n:
+        if content[end - d : end] == delimiter:
+            # Emit the token that ends just before this window.
+            current_row.append(content[start : end - d])
+            start = end  # jump past the delimiter
+            # Peek: is the next character a newline (row terminator)?
+            if start < n and content[start] == "\n":
+                rows.append(current_row)
+                current_row = []
+                start += 1  # consume the newline
+            end = start + d  # position window at start of next potential match
+        else:
+            end += 1
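+    # Flush anything not closed by <delimiter><newline> (e.g. no final newline).
+    # Only a non-empty tail counts as a value; otherwise a file that ends on a
+    # bare delimiter would gain a spurious trailing "" value.
+    tail = content[start:]
+    if tail:
+        current_row.append(tail)
+    if current_row:
+        rows.append(current_row)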
+    return rows
+# ---------------------------------------------------------------------------
+# Public API
+# ---------------------------------------------------------------------------
+class ToksReader:
+    """
+    Read a .toks file row by row.
+    Usage
+    -----
+        with open("data.toks") as f:
+            reader = ToksReader(f)
+            for row in reader:
+                print(row)          # ['val1', 'val2', ...]
+    The delimiter is read automatically from line 1 of the file.
+    Inspect it via  reader.delimiter  after construction.
+    """
+    def __init__(self, file_obj):
+        self.delimiter: str = _read_delimiter(file_obj)
+        self._rows: List[List[str]] = _parse(file_obj.read(), self.delimiter)
+    def __iter__(self) -> Iterator[List[str]]:
+        return iter(self._rows)
+def read(filepath: str) -> List[List[str]]:
+    """
+    Convenience function — read an entire .toks file and return all rows.
+        rows = tokensplit.read("data.toks")
+    Returns a list of rows; each row is a list of string values.
+    """
+    with open(filepath, "r", encoding="utf-8", newline="") as f:
+        reader = ToksReader(f)
+        return list(reader)
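
The scan in `_parse` above can also be written as a generator that hands back each row as soon as its `<delimiter><newline>` terminator is seen. This is a sketch that mirrors the loop in the diff, not code from the package: `iter_rows` is a hypothetical name, it still takes the full post-header content as one string, and it keeps an unterminated tail only when that tail is non-empty.

```python
from typing import Iterator, List


def iter_rows(content: str, delimiter: str) -> Iterator[List[str]]:
    """Yield rows from post-header .toks content, one row at a time."""
    d = len(delimiter)
    n = len(content)
    row: List[str] = []
    start = 0  # start of the current token
    end = d    # exclusive right edge of the d-wide window
    while end <= n:
        if content[end - d:end] == delimiter:
            row.append(content[start:end - d])  # token before the window
            start = end                          # jump past the delimiter
            if start < n and content[start] == "\n":
                yield row                        # <delimiter><newline> ends a row
                row = []
                start += 1                       # consume the newline
            end = start + d                      # next possible match position
        else:
            end += 1                             # slide by one character
    # Unterminated trailing data: keep a non-empty tail as a final value.
    tail = content[start:]
    if tail:
        row.append(tail)
    if row:
        yield row


sample = "hello/---/world/---/\nline1\nline2/---/normal/---/\n"
for parsed_row in iter_rows(sample, "/---/"):
    print(parsed_row)
```

The second row in `sample` carries a newline inside its first value; it survives because only a newline that directly follows a delimiter closes a row.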

tokensplit-0.1.0/tokensplit/writer.py ADDED

@@ -0,0 +1,115 @@
+"""
+tokensplit/writer.py — Token-Separated Values writer.
+File format
+-----------
+  Line 1  : the delimiter string followed by a newline
+  Line 2+ : rows of values, each value separated by the delimiter,
+            each row terminated by  <delimiter><newline>
+            Example with delimiter  /---/ :
+                /---/
+                hello/---/world/---/
+                foo/---/bar/---/baz/---/
+Safety
+------
+Two kinds of collision are detected and rejected with a ValueError:
+  1. A value *contains* the delimiter  ("hel/---/lo" with delim "/---/")
+  2. A value's suffix + delimiter prefix would create a new delimiter when
+     written adjacently  (value "aa" + delim "aaa" = "aaaaa" which embeds
+     "aaa").
+In both cases the caller should choose a different delimiter.
+"""
+from typing import List
+# ---------------------------------------------------------------------------
+# Internal helpers
+# ---------------------------------------------------------------------------
+def _validate_value(value: str, delimiter: str):
+    """
+    Raise ValueError if writing *value* followed by *delimiter* would
+    produce a character sequence that embeds the delimiter at an unexpected position.
+    Two checks:
+      1. value itself contains the delimiter string.
+      2. (value + delimiter) contains the delimiter *before* the intended
+         boundary at index len(value), meaning the suffix of value and the
+         prefix of delimiter combine to form a spurious delimiter earlier.
+    """
+    if delimiter in value:
+        raise ValueError(
+            f"Value {value!r} contains the delimiter {delimiter!r}. "
+            "Choose a different delimiter or sanitise your data first."
+        )
+    combined = value + delimiter
+    idx = combined.find(delimiter)
+    if idx < len(value):
+        raise ValueError(
+            f"Value {value!r} ends with a prefix of the delimiter "
+            f"{delimiter!r}, creating an ambiguous sequence when written. "
+            "Choose a different delimiter."
+        )
+# ---------------------------------------------------------------------------
+# Public API
+# ---------------------------------------------------------------------------
+class ToksWriter:
+    """
+    Write rows to a .toks file.
+    Usage
+    -----
+        with open("data.toks", "w") as f:
+            writer = ToksWriter(f, delimiter="/---/")
+            writer.writerow(["hello", "world"])
+            writer.writerow(["foo", "bar", "baz"])
+    The delimiter is written automatically to line 1 on construction.
+    """
+    def __init__(self, file_obj, delimiter: str):
+        if not delimiter:
+            raise ValueError("Delimiter must not be empty.")
+        if "\n" in delimiter:
+            raise ValueError("Delimiter must not contain a newline character.")
+        self.delimiter: str = delimiter
+        self._file = file_obj
+        # Write the delimiter as the very first line.
+        file_obj.write(delimiter + "\n")
+    def writerow(self, row: List[str]):
+        """
+        Write a single row of values.
+        Raises ValueError if any value would corrupt the file (see module
+        docstring for details).
+        """
+        for value in row:
+            _validate_value(str(value), self.delimiter)
+        self._file.write(self.delimiter.join(str(v) for v in row))
+        self._file.write(self.delimiter + "\n")
+    def writerows(self, rows: List[List[str]]):
+        """Write multiple rows at once."""
+        for row in rows:
+            self.writerow(row)
+def write(filepath: str, rows: List[List[str]], delimiter: str):
+    """
+    Convenience function — write all rows to a .toks file in one call.
+        tokensplit.write("data.toks", [["a", "b"], ["c", "d"]], delimiter="/---/")
+    """
+    with open(filepath, "w", encoding="utf-8", newline="") as f:
+        writer = ToksWriter(f, delimiter=delimiter)
+        writer.writerows(rows)
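
Both checks in `_validate_value` above come down to one question: where does the delimiter first appear in `value + delimiter`? The sketch below expresses that as a standalone function; `find_unsafe` and its reason strings are hypothetical names, not part of the package. It also flags one case that the 0.1.0 writer in this diff does not reject: a value after the first in a row that begins with a newline is written directly after a delimiter, and the reader's step 3 treats `<delimiter><newline>` as a row terminator. That third check is an observation from reading the two modules, offered as an assumption rather than documented behaviour.

```python
from typing import List, Tuple


def find_unsafe(row: List[str], delimiter: str) -> List[Tuple[int, str]]:
    """Return (index, reason) pairs for values that would not round-trip."""
    problems: List[Tuple[int, str]] = []
    for i, value in enumerate(row):
        # First place the delimiter shows up once the value is written
        # followed by the delimiter; the only safe position is len(value).
        idx = (value + delimiter).find(delimiter)
        if delimiter in value:
            problems.append((i, "contains delimiter"))
        elif idx < len(value):
            problems.append((i, "suffix overlaps delimiter"))
        elif i > 0 and value.startswith("\n"):
            # Hypothetical extra rule: "<delimiter>\n" would read as a row end.
            problems.append((i, "leading newline after a delimiter"))
    return problems


print(find_unsafe(["safe", "un/---/safe"], "/---/"))
print(find_unsafe(["aa", "b"], "aaa"))
print(find_unsafe(["a", "\nb"], "/---/"))
```

Returning the offending positions instead of raising on the first one makes it easier to report every problem in a row at once; the package itself raises `ValueError` on the first failure.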

tokensplit-0.1.0/tokensplit.egg-info/PKG-INFO ADDED

@@ -0,0 +1,154 @@
+Metadata-Version: 2.4
+Name: tokensplit
+Version: 0.1.0
+Summary: String-separated values with user-defined multi-character delimiters
+Author-email: lost_0 <l05t_0@proton.me>
+License-Expression: MIT
+Classifier: Programming Language :: Python :: 3
+Classifier: Operating System :: OS Independent
+Classifier: Topic :: File Formats
+Requires-Python: >=3.7
+Description-Content-Type: text/markdown
+License-File: LICENSE.txt
+Dynamic: license-file
+# TokenSplit — Token-Separated Values
+A lightweight Python package for reading and writing `.toks` files: a plain-text tabular format where you choose your own multi-character delimiter string.
+---
+## Why?
+CSV uses a single character (`,`) as a separator, which means commas in your data need escaping or quoting.
+Toks lets you pick any string — `/---/`, `:::`, `<<SEP>>` — that you know won't appear in your data, keeping files simple and unambiguous without any escape sequences.
+---
+## File format
+```
+/---/
+Alice/---/30/---/Engineer/---/
+Bob/---/25/---/Designer/---/
+```
+- **Line 1** — the delimiter string (written automatically by the writer)
+- **Every other line** — values separated by the delimiter, with the line ending on `<delimiter><newline>`
+Newlines *inside* a value are preserved because rows end only on the `<delimiter><newline>` sequence, not on bare newlines.
+---
+## Installation
+```bash
+pip install tokensplit           # once published to PyPI
+# or, from source:
+pip install .
+```
+---
+## Quick start
+### Writing
+```python
+import tokensplit
+# Convenience function
+tokensplit.write("people.toks", [
+    ["name", "age", "role"],
+    ["Alice", "30", "Engineer"],
+    ["Bob",   "25", "Designer"],
+], delimiter="/---/")
+```
+```python
+# Streaming writer — useful for large files
+with open("people.toks", "w") as f:
+    writer = tokensplit.ToksWriter(f, delimiter="/---/")
+    writer.writerow(["name", "age", "role"])   # header
+    writer.writerow(["Alice", "30", "Engineer"])
+    writer.writerow(["Bob",   "25", "Designer"])
+```
+### Reading
+```python
+import tokensplit
+# Convenience function — returns list of rows
+rows = tokensplit.read("people.toks")
+# [["name", "age", "role"], ["Alice", "30", "Engineer"], ...]
+# Reader object: iterate row by row (in 0.1.0 the whole file is parsed when the reader is constructed)
+with open("people.toks") as f:
+    reader = tokensplit.ToksReader(f)
+    print("delimiter:", reader.delimiter)   # "/---/"
+    for row in reader:
+        print(row)
+```
+---
+## Choosing a delimiter
+Any non-empty string without a newline character works. Good choices:
+| Delimiter | Good when data contains… |
+|-----------|--------------------------|
+| `/---/`   | General text |
+| `\|\|\|`  | Paths, URLs |
+| `<<<>>>`  | Code snippets |
+| `,,,,`    | Numeric CSVs being converted |
+| `:::`     | Short labels / IDs |
+**Two rules enforced by the writer:**
+1. A value must not *contain* the delimiter string.
+2. A value must not end with a prefix of the delimiter in a way that creates an ambiguous sequence when written (e.g. value `"aa"` with delimiter `"aaa"` would produce `"aaaaa"` which embeds an extra delimiter). A `ValueError` is raised in both cases.
+---
+## API reference
+### `tokensplit.write(filepath, rows, delimiter)`
+Write `rows` (list of lists of strings) to `filepath`.
+### `tokensplit.read(filepath) → List[List[str]]`
+Read all rows from `filepath`. Returns a list of lists of strings.
+### `tokensplit.ToksWriter(file_obj, delimiter)`
+Streaming writer. Call `.writerow(row)` or `.writerows(rows)`.
+The delimiter is written to line 1 of the file on construction.
+### `tokensplit.ToksReader(file_obj)`
+Reader class. Iterate with `for row in reader`. In 0.1.0 the whole file is read and parsed on construction, so memory use grows with file size.
+`.delimiter` attribute exposes the delimiter read from line 1.
+---
+## Reading algorithm
+The reader uses a **forward-only sliding window** of exactly `len(delimiter)` characters:
+```
+content:   h e l l o / - - - / w o r l d / - - - / \n
+window:    [     5     ]
+                  → slides one character at a time
+                        match! → emit token, jump window past delimiter
+```
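+- **Time:** O(n·d) worst case, where d is the delimiter length: the window advances one character at a time and each comparison inspects up to d characters; one slice is emitted per match. For short delimiters this is effectively O(n).
+- **Extra space:** O(d) for the current window, on top of the content string and the parsed rows, which are both held in memory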
+- No regex, no `str.split`, no backtracking
+---
+## Running tests
+```bash
+python -m pytest tests/
+```

tokensplit-0.1.0/tokensplit.egg-info/SOURCES.txt ADDED

@@ -0,0 +1,11 @@
+LICENSE.txt
+README.md
+pyproject.toml
+tests/test_tokensplit.py
+tokensplit/__init__.py
+tokensplit/reader.py
+tokensplit/writer.py
+tokensplit.egg-info/PKG-INFO
+tokensplit.egg-info/SOURCES.txt
+tokensplit.egg-info/dependency_links.txt
+tokensplit.egg-info/top_level.txt

tokensplit-0.1.0/tokensplit.egg-info/dependency_links.txt ADDED

@@ -0,0 +1 @@
+

tokensplit-0.1.0/tokensplit.egg-info/top_level.txt ADDED

@@ -0,0 +1 @@
+tokensplit