PyPI - splurge-dsv - Versions diffs - 2025.1.0__tar.gz → 2025.1.2__tar.gz - Mend

splurge-dsv 2025.1.0__tar.gz → 2025.1.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

{splurge_dsv-2025.1.0/splurge_dsv.egg-info → splurge_dsv-2025.1.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: splurge-dsv
-Version: 2025.1.0
+Version: 2025.1.2
 Summary: A utility library for working with DSV (Delimited String Values) files
 Author: Jim Schilling
 License-Expression: MIT
@@ -243,6 +243,52 @@ The project follows strict coding standards:
 ## Changelog
+### 2025.1.2 (2025-09-02)
+#### 🧪 Comprehensive End-to-End Testing
+- **Complete E2E Test Suite**: Implemented 25 comprehensive end-to-end workflow tests covering all major CLI functionality
+- **Real CLI Execution**: Tests run actual `splurge-dsv` commands with real files, not just mocked components
+- **Workflow Coverage**: Tests cover CSV/TSV parsing, file operations, data processing, error handling, and performance scenarios
+- **Cross-Platform Compatibility**: Handles Windows-specific encoding issues and platform differences gracefully
+- **Performance Testing**: Large file processing tests (1,000+ and 10,000+ rows) with streaming and chunking validation
+#### 📊 Test Coverage Improvements
+- **CLI Coverage**: Increased from 64% to **95%** with comprehensive CLI workflow testing
+- **DSV Helper Coverage**: Improved from 75% to **93%** with real-world usage scenarios
+- **Overall Coverage**: Improved from 60% to **73%** across the entire codebase
+- **Integration Testing**: Added real file system operations and complete pipeline validation
+#### 🔄 Test Categories
+- **CLI Workflows**: 19 tests covering basic parsing, custom delimiters, header/footer skipping, streaming, and error scenarios
+- **Error Handling**: 3 tests for invalid arguments, missing parameters, and CLI error conditions
+- **Integration Scenarios**: 3 tests for data analysis, transformation, and multi-format workflows
+#### 📚 Documentation & Examples
+- **E2E Testing Guide**: Created comprehensive documentation (`docs/e2e_testing_coverage.md`) explaining test coverage and usage
+- **Real-World Examples**: Tests serve as practical examples of library usage patterns
+- **Error Scenario Coverage**: Comprehensive testing of edge cases and failure conditions
+### 2025.1.1 (2025-08-XX)
+#### 🔧 Code Quality Improvements
+- **Refactored Complex Regex Logic**: Extracted Windows drive letter validation logic from `_check_dangerous_characters` into a dedicated `_is_valid_windows_drive_pattern` helper method in `PathValidator` for better readability and maintainability
+- **Exception Handling Consistency**: Fixed inconsistency in `ResourceManager.acquire()` method to properly re-raise `NotImplementedError` without wrapping it in `SplurgeResourceAcquisitionError`
+- **Import Organization**: Moved all imports to the top of modules across the entire codebase for better code structure and PEP 8 compliance
+#### 🧪 Testing Enhancements
+- **Public API Focus**: Removed all tests that validated private implementation details, focusing exclusively on public API behavior validation
+- **Comprehensive Resource Manager Tests**: Added extensive test suite for `ResourceManager` module covering all public methods, edge cases, error scenarios, and context manager behavior
+- **Bookend Logic Clarification**: Updated and corrected all tests related to `StringTokenizer.remove_bookends` to properly reflect its single-character, symmetric bookend matching behavior
+- **Path Validation Test Clarity**: Clarified test expectations and comments for Windows drive-relative paths (e.g., "C:file.txt") to reflect the validator's intentionally strict security design
+#### 🐛 Bug Fixes
+- **Test Reliability**: Fixed failing tests in `ResourceManager` context manager scenarios by properly handling file truncation and line ending normalization
+- **Ruff Compliance**: Resolved all linting warnings including unused variables and imports
+#### 📚 Documentation Updates
+- **Method Documentation**: Updated `ResourceManager.acquire()` docstring to include `NotImplementedError` in the Raises section
+- **Test Comments**: Enhanced test documentation with clearer explanations of expected behaviors and edge cases
 ### 2025.1.0 (2025-08-25)
 #### 🎉 Major Features
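The 2025.1.1 entry says `ResourceManager.acquire()` now re-raises `NotImplementedError` instead of wrapping it in `SplurgeResourceAcquisitionError`. This diff does not include `resource_manager.py`, so the following is only a minimal sketch of that pattern under assumed names. `BaseResource`, `_acquire`, `FailingResource`, and `ResourceAcquisitionError` are hypothetical stand-ins, not the package's classes.

```python
class ResourceAcquisitionError(Exception):
    """Hypothetical stand-in for SplurgeResourceAcquisitionError."""


class BaseResource:
    """Hypothetical base class; subclasses are expected to implement _acquire()."""

    def _acquire(self) -> object:
        raise NotImplementedError("subclasses must implement _acquire()")

    def acquire(self) -> object:
        try:
            return self._acquire()
        except NotImplementedError:
            # A missing implementation is a programming error: propagate it unchanged.
            raise
        except Exception as exc:
            # Real runtime failures are wrapped, keeping the original as __cause__.
            raise ResourceAcquisitionError(f"failed to acquire resource: {exc}") from exc


class FailingResource(BaseResource):
    """Hypothetical subclass whose acquisition fails at runtime."""

    def _acquire(self) -> object:
        raise OSError("disk unavailable")
```

The `except NotImplementedError` clause has to come before the generic `except Exception`, because `NotImplementedError` subclasses `RuntimeError` and would otherwise be caught and wrapped.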

{splurge_dsv-2025.1.0 → splurge_dsv-2025.1.2}/README.md RENAMED

@@ -214,6 +214,52 @@ The project follows strict coding standards:
 ## Changelog
+### 2025.1.2 (2025-09-02)
+#### 🧪 Comprehensive End-to-End Testing
+- **Complete E2E Test Suite**: Implemented 25 comprehensive end-to-end workflow tests covering all major CLI functionality
+- **Real CLI Execution**: Tests run actual `splurge-dsv` commands with real files, not just mocked components
+- **Workflow Coverage**: Tests cover CSV/TSV parsing, file operations, data processing, error handling, and performance scenarios
+- **Cross-Platform Compatibility**: Handles Windows-specific encoding issues and platform differences gracefully
+- **Performance Testing**: Large file processing tests (1,000+ and 10,000+ rows) with streaming and chunking validation
+#### 📊 Test Coverage Improvements
+- **CLI Coverage**: Increased from 64% to **95%** with comprehensive CLI workflow testing
+- **DSV Helper Coverage**: Improved from 75% to **93%** with real-world usage scenarios
+- **Overall Coverage**: Improved from 60% to **73%** across the entire codebase
+- **Integration Testing**: Added real file system operations and complete pipeline validation
+#### 🔄 Test Categories
+- **CLI Workflows**: 19 tests covering basic parsing, custom delimiters, header/footer skipping, streaming, and error scenarios
+- **Error Handling**: 3 tests for invalid arguments, missing parameters, and CLI error conditions
+- **Integration Scenarios**: 3 tests for data analysis, transformation, and multi-format workflows
+#### 📚 Documentation & Examples
+- **E2E Testing Guide**: Created comprehensive documentation (`docs/e2e_testing_coverage.md`) explaining test coverage and usage
+- **Real-World Examples**: Tests serve as practical examples of library usage patterns
+- **Error Scenario Coverage**: Comprehensive testing of edge cases and failure conditions
+### 2025.1.1 (2025-08-XX)
+#### 🔧 Code Quality Improvements
+- **Refactored Complex Regex Logic**: Extracted Windows drive letter validation logic from `_check_dangerous_characters` into a dedicated `_is_valid_windows_drive_pattern` helper method in `PathValidator` for better readability and maintainability
+- **Exception Handling Consistency**: Fixed inconsistency in `ResourceManager.acquire()` method to properly re-raise `NotImplementedError` without wrapping it in `SplurgeResourceAcquisitionError`
+- **Import Organization**: Moved all imports to the top of modules across the entire codebase for better code structure and PEP 8 compliance
+#### 🧪 Testing Enhancements
+- **Public API Focus**: Removed all tests that validated private implementation details, focusing exclusively on public API behavior validation
+- **Comprehensive Resource Manager Tests**: Added extensive test suite for `ResourceManager` module covering all public methods, edge cases, error scenarios, and context manager behavior
+- **Bookend Logic Clarification**: Updated and corrected all tests related to `StringTokenizer.remove_bookends` to properly reflect its single-character, symmetric bookend matching behavior
+- **Path Validation Test Clarity**: Clarified test expectations and comments for Windows drive-relative paths (e.g., "C:file.txt") to reflect the validator's intentionally strict security design
+#### 🐛 Bug Fixes
+- **Test Reliability**: Fixed failing tests in `ResourceManager` context manager scenarios by properly handling file truncation and line ending normalization
+- **Ruff Compliance**: Resolved all linting warnings including unused variables and imports
+#### 📚 Documentation Updates
+- **Method Documentation**: Updated `ResourceManager.acquire()` docstring to include `NotImplementedError` in the Raises section
+- **Test Comments**: Enhanced test documentation with clearer explanations of expected behaviors and edge cases
 ### 2025.1.0 (2025-08-25)
 #### 🎉 Major Features
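The 2025.1.1 notes describe `StringTokenizer.remove_bookends` as doing single-character, symmetric bookend matching, and `DsvHelper.parse` calls it once per token with `bookend` and `strip`. The tokenizer's source is not part of this diff. The function below is a hypothetical sketch of one reading of that description (strip first if requested, then remove one bookend from each end only when both ends match), not the library's implementation.

```python
def remove_bookends_sketch(value: str, bookend: str, strip: bool = True) -> str:
    """Hypothetical illustration of symmetric bookend removal.

    Removes one `bookend` from each end, but only when the value both starts
    and ends with it and the two bookends do not overlap.
    """
    text = value.strip() if strip else value
    if (
        bookend  # an empty bookend would otherwise slice the whole value away
        and len(text) >= 2 * len(bookend)
        and text.startswith(bookend)
        and text.endswith(bookend)
    ):
        return text[len(bookend) : -len(bookend)]
    # Asymmetric or too-short values are returned unchanged.
    return text


print(remove_bookends_sketch('  "abc"  ', '"'))  # abc
print(remove_bookends_sketch('"abc', '"'))       # "abc
```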

{splurge_dsv-2025.1.0 → splurge_dsv-2025.1.2}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "splurge-dsv"
-version = "2025.1.0"
+version = "2025.1.2"
 description = "A utility library for working with DSV (Delimited String Values) files"
 readme = "README.md"
 requires-python = ">=3.10"
@@ -82,3 +82,29 @@ exclude_lines = [
 [tool.coverage.html]
 directory = "htmlcov"
+[tool.ruff]
+target-version = "py310"
+line-length = 120
+[tool.ruff.lint]
+select = [
+    "E",  # pycodestyle errors
+    "W",  # pycodestyle warnings
+    "F",  # pyflakes
+    "I",  # isort
+    "B",  # flake8-bugbear
+    "C4", # flake8-comprehensions
+    "UP", # pyupgrade
+]
+ignore = [
+    "E501",  # line too long, handled by line-length
+    "B008",  # do not perform function calls in argument defaults
+    "C901",  # too complex
+]
+[tool.ruff.format]
+quote-style = "double"
+indent-style = "space"
+skip-magic-trailing-comma = false
+line-ending = "auto"

splurge_dsv-2025.1.2/splurge_dsv/__init__.py ADDED

@@ -0,0 +1,84 @@
+"""
+Splurge DSV - A utility library for working with DSV (Delimited String Values) files.
+This package provides utilities for parsing, processing, and manipulating
+delimited string value files with support for various delimiters, text bookends,
+and streaming operations.
+Copyright (c) 2025 Jim Schilling
+This module is licensed under the MIT License.
+"""
+# Local imports
+from splurge_dsv.dsv_helper import DsvHelper
+from splurge_dsv.exceptions import (
+    SplurgeConfigurationError,
+    SplurgeDataProcessingError,
+    SplurgeDsvError,
+    SplurgeFileEncodingError,
+    SplurgeFileNotFoundError,
+    SplurgeFileOperationError,
+    SplurgeFilePermissionError,
+    SplurgeFormatError,
+    SplurgeParameterError,
+    SplurgeParsingError,
+    SplurgePathValidationError,
+    SplurgePerformanceWarning,
+    SplurgeRangeError,
+    SplurgeResourceAcquisitionError,
+    SplurgeResourceError,
+    SplurgeResourceReleaseError,
+    SplurgeStreamingError,
+    SplurgeTypeConversionError,
+    SplurgeValidationError,
+)
+from splurge_dsv.path_validator import PathValidator
+from splurge_dsv.resource_manager import (
+    FileResourceManager,
+    ResourceManager,
+    StreamResourceManager,
+    safe_file_operation,
+    safe_stream_operation,
+)
+from splurge_dsv.string_tokenizer import StringTokenizer
+from splurge_dsv.text_file_helper import TextFileHelper
+__version__ = "2025.1.2"
+__author__ = "Jim Schilling"
+__license__ = "MIT"
+__all__ = [
+    # Main helper class
+    "DsvHelper",
+    # Exceptions
+    "SplurgeDsvError",
+    "SplurgeValidationError",
+    "SplurgeFileOperationError",
+    "SplurgeFileNotFoundError",
+    "SplurgeFilePermissionError",
+    "SplurgeFileEncodingError",
+    "SplurgePathValidationError",
+    "SplurgeDataProcessingError",
+    "SplurgeParsingError",
+    "SplurgeTypeConversionError",
+    "SplurgeStreamingError",
+    "SplurgeConfigurationError",
+    "SplurgeResourceError",
+    "SplurgeResourceAcquisitionError",
+    "SplurgeResourceReleaseError",
+    "SplurgePerformanceWarning",
+    "SplurgeParameterError",
+    "SplurgeRangeError",
+    "SplurgeFormatError",
+    # Utility classes
+    "StringTokenizer",
+    "TextFileHelper",
+    "PathValidator",
+    "ResourceManager",
+    "FileResourceManager",
+    "StreamResourceManager",
+    # Context managers
+    "safe_file_operation",
+    "safe_stream_operation",
+]
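The `__all__` list repeats every name imported above it, so the two can drift apart as modules are added. A small check like the following could run in a test suite. The `missing_exports` and `duplicate_exports` helpers are hypothetical, not part of splurge-dsv, and `demo` is a synthetic module standing in for `splurge_dsv`.

```python
import types
from collections import Counter


def missing_exports(module: types.ModuleType) -> list[str]:
    """Names listed in module.__all__ that the module does not actually define."""
    return [name for name in getattr(module, "__all__", []) if not hasattr(module, name)]


def duplicate_exports(module: types.ModuleType) -> list[str]:
    """Names that appear more than once in module.__all__."""
    counts = Counter(getattr(module, "__all__", []))
    return [name for name, count in counts.items() if count > 1]


# Synthetic module standing in for splurge_dsv.
demo = types.ModuleType("demo")
demo.DsvHelper = type("DsvHelper", (), {})
demo.__all__ = ["DsvHelper", "TextFileHelper", "DsvHelper"]

print(missing_exports(demo))    # ['TextFileHelper']
print(duplicate_exports(demo))  # ['DsvHelper']
```

For the `__init__.py` shown in this diff, every name in `__all__` is imported and none repeats, so both checks would return `[]` against the installed package.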

splurge_dsv-2025.1.2/splurge_dsv/__main__.py ADDED

@@ -0,0 +1,15 @@
+"""
+Command-line interface entry point for splurge-dsv.
+This module serves as the entry point when running the package as a module.
+It imports and calls the main CLI function from the cli module.
+"""
+# Standard library imports
+import sys
+# Local imports
+from splurge_dsv.cli import main
+if __name__ == "__main__":
+    sys.exit(main())

splurge_dsv-2025.1.2/splurge_dsv/cli.py ADDED

@@ -0,0 +1,158 @@
+"""
+Command-line interface for splurge-dsv.
+This module provides a command-line interface for the splurge-dsv library,
+allowing users to parse DSV files from the command line.
+Usage:
+    python -m splurge_dsv <file_path> [options]
+    python -m splurge_dsv --help
+"""
+# Standard library imports
+import argparse
+import sys
+from pathlib import Path
+# Local imports
+from splurge_dsv.dsv_helper import DsvHelper
+from splurge_dsv.exceptions import SplurgeDsvError
+def parse_arguments() -> argparse.Namespace:
+    """Parse command line arguments."""
+    parser = argparse.ArgumentParser(
+        description="Parse DSV (Delimited String Values) files",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  python -m splurge_dsv data.csv --delimiter ,
+  python -m splurge_dsv data.tsv --delimiter "\\t"
+  python -m splurge_dsv data.txt --delimiter "|" --bookend '"'
+        """,
+    )
+    parser.add_argument("file_path", type=str, help="Path to the DSV file to parse")
+    parser.add_argument("--delimiter", "-d", type=str, required=True, help="Delimiter character to use for parsing")
+    parser.add_argument("--bookend", "-b", type=str, help="Bookend character for text fields (e.g., '\"')")
+    parser.add_argument("--no-strip", action="store_true", help="Don't strip whitespace from values")
+    parser.add_argument("--no-bookend-strip", action="store_true", help="Don't strip whitespace from bookends")
+    parser.add_argument("--encoding", "-e", type=str, default="utf-8", help="File encoding (default: utf-8)")
+    parser.add_argument("--skip-header", type=int, default=0, help="Number of header rows to skip (default: 0)")
+    parser.add_argument("--skip-footer", type=int, default=0, help="Number of footer rows to skip (default: 0)")
+    parser.add_argument(
+        "--stream", "-s", action="store_true", help="Stream the file in chunks instead of loading entirely into memory"
+    )
+    parser.add_argument("--chunk-size", type=int, default=500, help="Chunk size for streaming (default: 500)")
+    parser.add_argument("--version", action="version", version="%(prog)s 2025.1.2")
+    return parser.parse_args()
+def print_results(rows: list[list[str]], delimiter: str) -> None:
+    """Print parsed results in a formatted way."""
+    if not rows:
+        print("No data found.")
+        return
+    # Find the maximum width for each column
+    if rows:
+        max_widths = []
+        for col_idx in range(len(rows[0])):
+            max_width = max(len(str(row[col_idx])) for row in rows)
+            max_widths.append(max_width)
+        # Print header separator
+        print("-" * (sum(max_widths) + len(max_widths) * 3 - 1))
+        # Print each row
+        for row_idx, row in enumerate(rows):
+            formatted_row = []
+            for col_idx, value in enumerate(row):
+                formatted_value = str(value).ljust(max_widths[col_idx])
+                formatted_row.append(formatted_value)
+            print(f"| {' | '.join(formatted_row)} |")
+            # Print separator after header
+            if row_idx == 0:
+                print("-" * (sum(max_widths) + len(max_widths) * 3 - 1))
+def main() -> int:
+    """Main entry point for the command-line interface."""
+    try:
+        args = parse_arguments()
+        # Validate file path
+        file_path = Path(args.file_path)
+        if not file_path.exists():
+            print(f"Error: File '{args.file_path}' not found.", file=sys.stderr)
+            return 1
+        if not file_path.is_file():
+            print(f"Error: '{args.file_path}' is not a file.", file=sys.stderr)
+            return 1
+        # Parse the file
+        if args.stream:
+            print(f"Streaming file '{args.file_path}' with delimiter '{args.delimiter}'...")
+            chunk_count = 0
+            total_rows = 0
+            for chunk in DsvHelper.parse_stream(
+                file_path,
+                delimiter=args.delimiter,
+                strip=not args.no_strip,
+                bookend=args.bookend,
+                bookend_strip=not args.no_bookend_strip,
+                encoding=args.encoding,
+                skip_header_rows=args.skip_header,
+                skip_footer_rows=args.skip_footer,
+                chunk_size=args.chunk_size,
+            ):
+                chunk_count += 1
+                total_rows += len(chunk)
+                print(f"Chunk {chunk_count}: {len(chunk)} rows")
+                print_results(chunk, args.delimiter)
+                print()
+            print(f"Total: {total_rows} rows in {chunk_count} chunks")
+        else:
+            print(f"Parsing file '{args.file_path}' with delimiter '{args.delimiter}'...")
+            rows = DsvHelper.parse_file(
+                file_path,
+                delimiter=args.delimiter,
+                strip=not args.no_strip,
+                bookend=args.bookend,
+                bookend_strip=not args.no_bookend_strip,
+                encoding=args.encoding,
+                skip_header_rows=args.skip_header,
+                skip_footer_rows=args.skip_footer,
+            )
+            print(f"Parsed {len(rows)} rows")
+            print_results(rows, args.delimiter)
+        return 0
+    except KeyboardInterrupt:
+        print("\nOperation cancelled by user.", file=sys.stderr)
+        return 130
+    except SplurgeDsvError as e:
+        print(f"Error: {e.message}", file=sys.stderr)
+        if e.details:
+            print(f"Details: {e.details}", file=sys.stderr)
+        return 1
+    except Exception as e:
+        print(f"Unexpected error: {e}", file=sys.stderr)
+        return 1
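As written, `print_results` takes its column count from `rows[0]`. A later row with more fields raises `IndexError` at `max_widths[col_idx]`, and a row with fewer fields raises `IndexError` inside the `max(...)` width calculation. The `delimiter` parameter is unused. The separator is `sum(max_widths) + 3n - 1` characters wide, but each rendered row is `sum(max_widths) + 3n + 1`, so the separator is two characters short. The hedged alternative below returns the lines instead of printing them and pads short rows. `format_table` is a hypothetical name, not part of the package.

```python
def format_table(rows: list[list[str]]) -> list[str]:
    """Render rows as a pipe-delimited text table, padding short rows with empty cells."""
    if not rows:
        return ["No data found."]
    n_cols = max(len(row) for row in rows)
    padded = [list(row) + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(str(row[i])) for row in padded) for i in range(n_cols)]
    lines = [f"| {' | '.join(str(v).ljust(widths[i]) for i, v in enumerate(row))} |" for row in padded]
    # Size the separator from a rendered row so it spans the full table width.
    separator = "-" * len(lines[0])
    # Same layout as print_results: separator, header row, separator, data rows.
    return [separator, lines[0], separator, *lines[1:]]


for line in format_table([["name", "qty"], ["apple", "3"], ["fig"]]):
    print(line)
```

Returning a list of strings rather than printing also lets tests compare output directly, without capturing stdout.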

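The epilog suggests `--delimiter "\t"`, but shells such as bash pass `"\t"` through as two characters (a backslash and a `t`). `main()` forwards `args.delimiter` to `DsvHelper` unchanged. The diff does not show whether `DsvHelper` or `StringTokenizer` translates that sequence. In bash, `--delimiter $'\t'` passes a real tab. If translation at the CLI layer were wanted, an argparse `type=` converter is one option. `delimiter_arg` and `_DELIMITER_ESCAPES` below are hypothetical, not part of splurge-dsv.

```python
import argparse

# Hypothetical table of escape sequences a user might type at a shell prompt.
_DELIMITER_ESCAPES = {"\\t": "\t"}


def delimiter_arg(value: str) -> str:
    """argparse type= converter: map a typed backslash-t to a real tab character."""
    value = _DELIMITER_ESCAPES.get(value, value)
    if not value:
        raise argparse.ArgumentTypeError("delimiter must not be empty")
    return value


parser = argparse.ArgumentParser(prog="dsv-sketch")
parser.add_argument("--delimiter", "-d", type=delimiter_arg, required=True)

args = parser.parse_args(["--delimiter", "\\t"])
print(repr(args.delimiter))  # '\t'
```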
{splurge_dsv-2025.1.0 → splurge_dsv-2025.1.2}/splurge_dsv/dsv_helper.py RENAMED

@@ -8,12 +8,15 @@ Please preserve this header and all related material when sharing!
 This module is licensed under the MIT License.
 """
+# Standard library imports
+from collections.abc import Iterator
 from os import PathLike
-from typing import Iterator
+# Local imports
+from splurge_dsv.exceptions import SplurgeParameterError
 from splurge_dsv.string_tokenizer import StringTokenizer
 from splurge_dsv.text_file_helper import TextFileHelper
-from splurge_dsv.exceptions import SplurgeParameterError
 class DsvHelper:
     """
@@ -38,7 +41,7 @@ class DsvHelper:
         delimiter: str,
         strip: bool = DEFAULT_STRIP,
         bookend: str | None = None,
-        bookend_strip: bool = DEFAULT_BOOKEND_STRIP
+        bookend_strip: bool = DEFAULT_BOOKEND_STRIP,
     ) -> list[str]:
         """
         Parse a string into a list of strings.
@@ -68,10 +71,7 @@ class DsvHelper:
         tokens: list[str] = StringTokenizer.parse(content, delimiter=delimiter, strip=strip)
         if bookend:
-            tokens = [
-                StringTokenizer.remove_bookends(token, bookend=bookend, strip=bookend_strip)
-                for token in tokens
-            ]
+            tokens = [StringTokenizer.remove_bookends(token, bookend=bookend, strip=bookend_strip) for token in tokens]
         return tokens
@@ -83,7 +83,7 @@ class DsvHelper:
         delimiter: str,
         strip: bool = DEFAULT_STRIP,
         bookend: str | None = None,
-        bookend_strip: bool = DEFAULT_BOOKEND_STRIP
+        bookend_strip: bool = DEFAULT_BOOKEND_STRIP,
     ) -> list[list[str]]:
         """
         Parse a list of strings into a list of lists of strings.
@@ -108,7 +108,7 @@ class DsvHelper:
         """
         if not isinstance(content, list):
             raise SplurgeParameterError("content must be a list")
         if not all(isinstance(item, str) for item in content):
             raise SplurgeParameterError("content must be a list of strings")
@@ -128,7 +128,7 @@ class DsvHelper:
         bookend_strip: bool = DEFAULT_BOOKEND_STRIP,
         encoding: str = DEFAULT_ENCODING,
         skip_header_rows: int = DEFAULT_SKIP_HEADER_ROWS,
-        skip_footer_rows: int = DEFAULT_SKIP_FOOTER_ROWS
+        skip_footer_rows: int = DEFAULT_SKIP_FOOTER_ROWS,
     ) -> list[list[str]]:
         """
         Parse a file into a list of lists of strings.
@@ -157,19 +157,10 @@ class DsvHelper:
             [['header1', 'header2'], ['value1', 'value2']]
         """
         lines: list[str] = TextFileHelper.read(
-            file_path,
-            encoding=encoding,
-            skip_header_rows=skip_header_rows,
-            skip_footer_rows=skip_footer_rows
+            file_path, encoding=encoding, skip_header_rows=skip_header_rows, skip_footer_rows=skip_footer_rows
         )
-        return cls.parses(
-            lines,
-            delimiter=delimiter,
-            strip=strip,
-            bookend=bookend,
-            bookend_strip=bookend_strip
-        )
+        return cls.parses(lines, delimiter=delimiter, strip=strip, bookend=bookend, bookend_strip=bookend_strip)
     @classmethod
     def _process_stream_chunk(
@@ -179,28 +170,22 @@ class DsvHelper:
         delimiter: str,
         strip: bool = DEFAULT_STRIP,
         bookend: str | None = None,
-        bookend_strip: bool = DEFAULT_BOOKEND_STRIP
+        bookend_strip: bool = DEFAULT_BOOKEND_STRIP,
     ) -> list[list[str]]:
         """
         Process a chunk of lines from the stream.
         Args:
             chunk: List of lines to process
             delimiter: Delimiter to use for parsing
             strip: Whether to strip whitespace
             bookend: Bookend character for text fields
             bookend_strip: Whether to strip whitespace from bookends
         Returns:
             list[list[str]]: Parsed rows
         """
-        return cls.parses(
-            chunk,
-            delimiter=delimiter,
-            strip=strip,
-            bookend=bookend,
-            bookend_strip=bookend_strip
-        )
+        return cls.parses(chunk, delimiter=delimiter, strip=strip, bookend=bookend, bookend_strip=bookend_strip)
     @classmethod
     def parse_stream(
@@ -214,7 +199,7 @@ class DsvHelper:
         encoding: str = DEFAULT_ENCODING,
         skip_header_rows: int = DEFAULT_SKIP_HEADER_ROWS,
         skip_footer_rows: int = DEFAULT_SKIP_FOOTER_ROWS,
-        chunk_size: int = DEFAULT_CHUNK_SIZE
+        chunk_size: int = DEFAULT_CHUNK_SIZE,
     ) -> Iterator[list[list[str]]]:
         """
         Stream-parse a DSV file in chunks of lines.
@@ -247,17 +232,15 @@ class DsvHelper:
         skip_footer_rows = max(skip_footer_rows, cls.DEFAULT_SKIP_FOOTER_ROWS)
         # Use TextFileHelper.read_as_stream for consistent error handling
-        for chunk in TextFileHelper.read_as_stream(
-            file_path,
-            encoding=encoding,
-            skip_header_rows=skip_header_rows,
-            skip_footer_rows=skip_footer_rows,
-            chunk_size=chunk_size
-        ):
-            yield cls._process_stream_chunk(
-                chunk,
-                delimiter=delimiter,
-                strip=strip,
-                bookend=bookend,
-                bookend_strip=bookend_strip
-            )
+        yield from (
+            cls._process_stream_chunk(
+                chunk, delimiter=delimiter, strip=strip, bookend=bookend, bookend_strip=bookend_strip
+            )
+            for chunk in TextFileHelper.read_as_stream(
+                file_path,
+                encoding=encoding,
+                skip_header_rows=skip_header_rows,
+                skip_footer_rows=skip_footer_rows,
+                chunk_size=chunk_size,
+            )
+        )
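The 2025.1.2 refactor replaces the explicit `for ... yield` loop in `parse_stream` with `yield from` over a generator expression. Both forms are lazy: each chunk is read and parsed only when the caller asks for the next item. The self-contained sketch below uses hypothetical stand-ins for `TextFileHelper.read_as_stream` and `_process_stream_chunk` to show that the two shapes yield the same sequence.

```python
from collections.abc import Iterable, Iterator


def read_as_stream(lines: Iterable[str], chunk_size: int) -> Iterator[list[str]]:
    """Stand-in for TextFileHelper.read_as_stream: yield lists of up to chunk_size lines."""
    chunk: list[str] = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def process_chunk(chunk: list[str], delimiter: str) -> list[list[str]]:
    """Stand-in for DsvHelper._process_stream_chunk (plain split, no strip or bookends)."""
    return [line.split(delimiter) for line in chunk]


def parse_stream_loop(lines: Iterable[str], delimiter: str, chunk_size: int) -> Iterator[list[list[str]]]:
    # 2025.1.0 shape: explicit for loop with yield.
    for chunk in read_as_stream(lines, chunk_size):
        yield process_chunk(chunk, delimiter)


def parse_stream_yield_from(lines: Iterable[str], delimiter: str, chunk_size: int) -> Iterator[list[list[str]]]:
    # 2025.1.2 shape: delegate to a generator expression with yield from.
    yield from (process_chunk(chunk, delimiter) for chunk in read_as_stream(lines, chunk_size))


data = ["a,b", "c,d", "e,f"]
print(list(parse_stream_yield_from(data, ",", 2)))
# [[['a', 'b'], ['c', 'd']], [['e', 'f']]]
```

In both versions, `parse_stream` is a generator function. None of its body runs, including the `max(...)` normalization of `skip_footer_rows`, until the caller requests the first chunk.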

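`parse_stream` accepts `skip_footer_rows`. In a streaming read, that means the last N lines cannot be emitted until the end of the file is known. The diff does not show how `TextFileHelper.read_as_stream` handles this. One common technique is to hold back N lines in a bounded buffer. The sketch below skips header and footer lines from any iterable while still yielding fixed-size chunks. `stream_chunks` is hypothetical, not the package's implementation.

```python
from collections import deque
from collections.abc import Iterable, Iterator
from itertools import islice


def stream_chunks(
    lines: Iterable[str],
    chunk_size: int,
    skip_header_rows: int = 0,
    skip_footer_rows: int = 0,
) -> Iterator[list[str]]:
    """Yield lists of at most chunk_size lines, dropping header and footer lines.

    At most skip_footer_rows lines are held back at any time, so memory use
    does not grow with file size.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    footer = max(skip_footer_rows, 0)
    it = iter(lines)
    # Discard the header lines without materializing them.
    for _ in islice(it, max(skip_header_rows, 0)):
        pass
    held: deque[str] = deque()
    chunk: list[str] = []
    for line in it:
        held.append(line)
        # A line is safe to emit once `footer` newer lines have been seen after it.
        if len(held) > footer:
            chunk.append(held.popleft())
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
    # Lines still in `held` are the footer and are dropped.
    if chunk:
        yield chunk


lines = ["H", "1", "2", "3", "4", "5", "F1", "F2"]
print(list(stream_chunks(lines, 2, skip_header_rows=1, skip_footer_rows=2)))
# [['1', '2'], ['3', '4'], ['5']]
```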