tea-data-file-conversion 0.1.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

tea_data_file_conversion-0.1.1/LICENSE ADDED

@@ -0,0 +1,22 @@
+MIT License
+Copyright (c) 2025 Mark Moreno
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

tea_data_file_conversion-0.1.1/PKG-INFO ADDED

@@ -0,0 +1,109 @@
+Metadata-Version: 2.2
+Name: tea-data-file-conversion
+Version: 0.1.1
+Summary: Fixedwidth Processor is a Python package designed to transform fixed-width text files into CSVs using dynamic YAML schema configurations.
+Author-email: Mark Moreno <mamoreno@aldineisd.org>
+License: MIT
+Project-URL: Bug Tracker, https://github.com/markm-io/tea-data-file-conversion/issues
+Project-URL: Changelog, https://github.com/markm-io/tea-data-file-conversion/blob/main/CHANGELOG.md
+Project-URL: documentation, https://tea-data-file-conversion.readthedocs.io
+Project-URL: repository, https://github.com/markm-io/tea-data-file-conversion
+Classifier: Development Status :: 2 - Pre-Alpha
+Classifier: Intended Audience :: Developers
+Classifier: Natural Language :: English
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Software Development :: Libraries
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: importlib-resources>=6.5.2
+Requires-Dist: pandas>=2.2.3
+Requires-Dist: pyyaml>=6.0.2
+Requires-Dist: rich>=10
+Requires-Dist: typer<1,>=0.15
+# tea-data-file-conversion
+<p align="center">
+  <a href="https://github.com/markm-io/tea-data-file-conversion/actions/workflows/ci.yml?query=branch%3Amain">
+    <img src="https://img.shields.io/github/actions/workflow/status/markm-io/tea-data-file-conversion/ci.yml?branch=main&label=CI&logo=github&style=flat-square" alt="CI Status" >
+  </a>
+  <a href="https://tea-data-file-conversion.readthedocs.io">
+    <img src="https://img.shields.io/readthedocs/tea-data-file-conversion.svg?logo=read-the-docs&logoColor=fff&style=flat-square" alt="Documentation Status">
+  </a>
+  <a href="https://codecov.io/gh/markm-io/tea-data-file-conversion">
+    <img src="https://img.shields.io/codecov/c/github/markm-io/tea-data-file-conversion.svg?logo=codecov&logoColor=fff&style=flat-square" alt="Test coverage percentage">
+  </a>
+</p>
+<p align="center">
+  <a href="https://github.com/astral-sh/uv">
+    <img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json" alt="uv">
+  </a>
+  <a href="https://github.com/astral-sh/ruff">
+    <img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json" alt="Ruff">
+  </a>
+  <a href="https://github.com/pre-commit/pre-commit">
+    <img src="https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white&style=flat-square" alt="pre-commit">
+  </a>
+</p>
+<p align="center">
+  <a href="https://pypi.org/project/tea-data-file-conversion/">
+    <img src="https://img.shields.io/pypi/v/tea-data-file-conversion.svg?logo=python&logoColor=fff&style=flat-square" alt="PyPI Version">
+  </a>
+  <img src="https://img.shields.io/pypi/pyversions/tea-data-file-conversion.svg?style=flat-square&logo=python&amp;logoColor=fff" alt="Supported Python versions">
+  <img src="https://img.shields.io/pypi/l/tea-data-file-conversion.svg?style=flat-square" alt="License">
+</p>
+---
+**Documentation**: <a href="https://tea-data-file-conversion.readthedocs.io" target="_blank">https://tea-data-file-conversion.readthedocs.io </a>
+**Source Code**: <a href="https://github.com/markm-io/tea-data-file-conversion" target="_blank">https://github.com/markm-io/tea-data-file-conversion </a>
+---
+Fixedwidth Processor is a Python package designed to transform fixed-width text files into CSVs using dynamic YAML schema configurations.
+## Installation
+Install this via pip (or your favourite package manager):
+`pip install tea-data-file-conversion`
+## Contributors ✨
+Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
+<!-- prettier-ignore-start -->
+<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->
+<!-- prettier-ignore-start -->
+<!-- markdownlint-disable -->
+<table>
+  <tbody>
+    <tr>
+      <td align="center" valign="top" width="14.28%"><a href="https://github.com/markm-io"><img src="https://avatars.githubusercontent.com/u/45011486?v=4?s=80" width="80px;" alt="Mark Moreno"/><br /><sub><b>Mark Moreno</b></sub></a><br /><a href="https://github.com/markm-io/tea-data-file-conversion/commits?author=markm-io" title="Code">💻</a> <a href="#ideas-markm-io" title="Ideas, Planning, & Feedback">🤔</a> <a href="https://github.com/markm-io/tea-data-file-conversion/commits?author=markm-io" title="Documentation">📖</a></td>
+    </tr>
+  </tbody>
+</table>
+<!-- markdownlint-restore -->
+<!-- prettier-ignore-end -->
+<!-- ALL-CONTRIBUTORS-LIST:END -->
+<!-- prettier-ignore-end -->
+This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!
+## Credits
+[![Copier](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/copier-org/copier/master/img/badge/badge-grayscale-inverted-border-orange.json)](https://github.com/copier-org/copier)
+This package was created with
+[Copier](https://copier.readthedocs.io/) and the
+[browniebroke/pypackage-template](https://github.com/browniebroke/pypackage-template)
+project template.

tea_data_file_conversion-0.1.1/README.md ADDED

@@ -0,0 +1,80 @@
+# tea-data-file-conversion
+<p align="center">
+  <a href="https://github.com/markm-io/tea-data-file-conversion/actions/workflows/ci.yml?query=branch%3Amain">
+    <img src="https://img.shields.io/github/actions/workflow/status/markm-io/tea-data-file-conversion/ci.yml?branch=main&label=CI&logo=github&style=flat-square" alt="CI Status" >
+  </a>
+  <a href="https://tea-data-file-conversion.readthedocs.io">
+    <img src="https://img.shields.io/readthedocs/tea-data-file-conversion.svg?logo=read-the-docs&logoColor=fff&style=flat-square" alt="Documentation Status">
+  </a>
+  <a href="https://codecov.io/gh/markm-io/tea-data-file-conversion">
+    <img src="https://img.shields.io/codecov/c/github/markm-io/tea-data-file-conversion.svg?logo=codecov&logoColor=fff&style=flat-square" alt="Test coverage percentage">
+  </a>
+</p>
+<p align="center">
+  <a href="https://github.com/astral-sh/uv">
+    <img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json" alt="uv">
+  </a>
+  <a href="https://github.com/astral-sh/ruff">
+    <img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json" alt="Ruff">
+  </a>
+  <a href="https://github.com/pre-commit/pre-commit">
+    <img src="https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white&style=flat-square" alt="pre-commit">
+  </a>
+</p>
+<p align="center">
+  <a href="https://pypi.org/project/tea-data-file-conversion/">
+    <img src="https://img.shields.io/pypi/v/tea-data-file-conversion.svg?logo=python&logoColor=fff&style=flat-square" alt="PyPI Version">
+  </a>
+  <img src="https://img.shields.io/pypi/pyversions/tea-data-file-conversion.svg?style=flat-square&logo=python&amp;logoColor=fff" alt="Supported Python versions">
+  <img src="https://img.shields.io/pypi/l/tea-data-file-conversion.svg?style=flat-square" alt="License">
+</p>
+---
+**Documentation**: <a href="https://tea-data-file-conversion.readthedocs.io" target="_blank">https://tea-data-file-conversion.readthedocs.io </a>
+**Source Code**: <a href="https://github.com/markm-io/tea-data-file-conversion" target="_blank">https://github.com/markm-io/tea-data-file-conversion </a>
+---
+Fixedwidth Processor is a Python package designed to transform fixed-width text files into CSVs using dynamic YAML schema configurations.
+## Installation
+Install this via pip (or your favourite package manager):
+`pip install tea-data-file-conversion`
+## Contributors ✨
+Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
+<!-- prettier-ignore-start -->
+<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->
+<!-- prettier-ignore-start -->
+<!-- markdownlint-disable -->
+<table>
+  <tbody>
+    <tr>
+      <td align="center" valign="top" width="14.28%"><a href="https://github.com/markm-io"><img src="https://avatars.githubusercontent.com/u/45011486?v=4?s=80" width="80px;" alt="Mark Moreno"/><br /><sub><b>Mark Moreno</b></sub></a><br /><a href="https://github.com/markm-io/tea-data-file-conversion/commits?author=markm-io" title="Code">💻</a> <a href="#ideas-markm-io" title="Ideas, Planning, & Feedback">🤔</a> <a href="https://github.com/markm-io/tea-data-file-conversion/commits?author=markm-io" title="Documentation">📖</a></td>
+    </tr>
+  </tbody>
+</table>
+<!-- markdownlint-restore -->
+<!-- prettier-ignore-end -->
+<!-- ALL-CONTRIBUTORS-LIST:END -->
+<!-- prettier-ignore-end -->
+This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!
+## Credits
+[![Copier](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/copier-org/copier/master/img/badge/badge-grayscale-inverted-border-orange.json)](https://github.com/copier-org/copier)
+This package was created with
+[Copier](https://copier.readthedocs.io/) and the
+[browniebroke/pypackage-template](https://github.com/browniebroke/pypackage-template)
+project template.

tea_data_file_conversion-0.1.1/pyproject.toml ADDED

@@ -0,0 +1,170 @@
+[build-system]
+build-backend = "setuptools.build_meta"
+requires = [ "setuptools" ]
+[project]
+name = "tea-data-file-conversion"
+version = "0.1.1"
+description = "Fixedwidth Processor is a Python package designed to transform fixed-width text files into CSVs using dynamic YAML schema configurations."
+readme = "README.md"
+license = { text = "MIT" }
+authors = [
+  { name = "Mark Moreno", email = "mamoreno@aldineisd.org" },
+]
+requires-python = ">=3.9"
+classifiers = [
+  "Development Status :: 2 - Pre-Alpha",
+  "Intended Audience :: Developers",
+  "Natural Language :: English",
+  "Operating System :: OS Independent",
+  "Programming Language :: Python :: 3.9",
+  "Programming Language :: Python :: 3.10",
+  "Programming Language :: Python :: 3.11",
+  "Programming Language :: Python :: 3.12",
+  "Programming Language :: Python :: 3.13",
+  "Topic :: Software Development :: Libraries",
+]
+dependencies = [
+  "importlib-resources>=6.5.2",
+  "pandas>=2.2.3",
+  "pyyaml>=6.0.2",
+  "rich>=10",
+  "typer>=0.15,<1",
+]
+urls."Bug Tracker" = "https://github.com/markm-io/tea-data-file-conversion/issues"
+urls.Changelog = "https://github.com/markm-io/tea-data-file-conversion/blob/main/CHANGELOG.md"
+urls.documentation = "https://tea-data-file-conversion.readthedocs.io"
+urls.repository = "https://github.com/markm-io/tea-data-file-conversion"
+scripts.tea-data-file-conversion = "tea_data_file_conversion.cli:main"
+[dependency-groups]
+dev = [
+  "pytest>=8,<9",
+  "pytest-cov>=6,<7",
+]
+docs = [
+  "myst-parser>=0.16; python_version>='3.11'",
+  "sphinx>=4; python_version>='3.11'",
+  "sphinx-autobuild>=2024,<2025; python_version>='3.11'",
+  "sphinx-wagtail-theme>=6.5.0",
+]
+[tool.ruff]
+target-version = "py39"
+line-length = 120
+lint.select = [
+  "B",   # flake8-bugbear
+  "C4",  # flake8-comprehensions
+  "S",   # flake8-bandit
+  "F",   # pyflakes
+  "E",   # pycodestyle
+  "W",   # pycodestyle
+  "UP",  # pyupgrade
+  "I",   # isort
+  "RUF", # ruff specific
+]
+lint.ignore = [
+  "D203", # 1 blank line required before class docstring
+  "D212", # Multi-line docstring summary should start at the first line
+  "D100", # Missing docstring in public module
+  "D104", # Missing docstring in public package
+  "D107", # Missing docstring in `__init__`
+  "D401", # First line of docstring should be in imperative mood
+  "S324", # Use of insecure MD2, MD4, MD5, or SHA1 hash function
+]
+lint.per-file-ignores."conftest.py" = [ "D100" ]
+lint.per-file-ignores."docs/conf.py" = [ "D100" ]
+lint.per-file-ignores."setup.py" = [ "D100" ]
+lint.per-file-ignores."tests/**/*" = [
+  "D100",
+  "D101",
+  "D102",
+  "D103",
+  "D104",
+  "S101",
+]
+lint.isort.known-first-party = [ "tea_data_file_conversion", "tests" ]
+exclude = [
+  "docs/conf.py",
+]
+[tool.pytest.ini_options]
+addopts = """\
+    -v
+    -Wdefault
+    --cov=tea_data_file_conversion
+    --cov-report=term
+    --cov-report=xml
+    """
+pythonpath = [ "src" ]
+[tool.coverage.run]
+branch = true
+[tool.coverage.report]
+exclude_lines = [
+  "pragma: no cover",
+  "@overload",
+  "if TYPE_CHECKING",
+  "raise NotImplementedError",
+  'if __name__ == "__main__":',
+]
+[tool.mypy]
+check_untyped_defs = true
+disallow_any_generics = true
+disallow_incomplete_defs = true
+disallow_untyped_defs = true
+mypy_path = "src/"
+no_implicit_optional = true
+show_error_codes = true
+warn_unreachable = true
+warn_unused_ignores = true
+exclude = [
+  'docs/.*',
+  'setup.py',
+]
+[[tool.mypy.overrides]]
+module = "tests.*"
+allow_untyped_defs = true
+[[tool.mypy.overrides]]
+module = "docs.*"
+ignore_errors = true
+[tool.semantic_release]
+version_toml = [ "pyproject.toml:project.version" ]
+version_variables = [
+  "src/tea_data_file_conversion/__init__.py:__version__",
+  "docs/conf.py:release",
+]
+build_command = """
+pip install uv
+uv lock
+git add uv.lock
+uv build
+"""
+[tool.semantic_release.changelog]
+exclude_commit_patterns = [
+  '''chore(?:\([^)]*?\))?: .+''',
+  '''ci(?:\([^)]*?\))?: .+''',
+  '''refactor(?:\([^)]*?\))?: .+''',
+  '''style(?:\([^)]*?\))?: .+''',
+  '''test(?:\([^)]*?\))?: .+''',
+  '''build\((?!deps\): .+)''',
+  '''Merged? .*''',
+  '''Initial [Cc]ommit.*''', # codespell:ignore
+]
+[tool.semantic_release.changelog.environment]
+keep_trailing_newline = true
+[tool.semantic_release.branches.main]
+match = "main"
+[tool.semantic_release.branches.noop]
+match = "(?!main$)"
+prerelease = true
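
Note on the `exclude_commit_patterns` entries above: they are regular expressions used to drop commits from the generated changelog. The sketch below shows how three of them classify Conventional Commit subjects. It assumes the patterns are anchored at the start of the message (`re.match`); the matching code of python-semantic-release is not part of this diff, and the sample messages are invented for illustration.

```python
import re

# Patterns copied from [tool.semantic_release.changelog] in the hunk above.
EXCLUDE_PATTERNS = [
    r"chore(?:\([^)]*?\))?: .+",
    r"ci(?:\([^)]*?\))?: .+",
    r"build\((?!deps\): .+)",
]


def is_excluded(message: str) -> bool:
    """Return True if any pattern matches at the start of the message (assumed anchoring)."""
    return any(re.match(pattern, message) for pattern in EXCLUDE_PATTERNS)


if __name__ == "__main__":
    # Hypothetical commit subjects.
    for msg in ["chore(release): 0.1.1", "feat: add export", "build(deps): bump pandas", "build(ci): pin uv"]:
        print(f"{msg!r}: excluded={is_excluded(msg)}")
```

The negative lookahead in the `build` pattern is what keeps dependency bumps (`build(deps): ...`) in the changelog while other scoped build commits are dropped.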

tea_data_file_conversion-0.1.1/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

tea_data_file_conversion-0.1.1/setup.py ADDED

@@ -0,0 +1,9 @@
+#!/usr/bin/env python
+# This is a shim to allow GitHub to detect the package, build is done with uv
+# Taken from https://github.com/Textualize/rich
+import setuptools
+if __name__ == "__main__":
+    setuptools.setup(name="tea-data-file-conversion")

tea_data_file_conversion-0.1.1/src/tea_data_file_conversion/__init__.py ADDED

@@ -0,0 +1,7 @@
+__version__ = "0.1.1"
+# tea_data_file_conversion/__init__.py
+from .processor import export_templates, process_file, validate_yaml_config
+__all__ = ["export_templates", "process_file", "validate_yaml_config"]
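
Note on `validate_yaml_config`, exported above and defined in processor.py: it expects a mapping with a `fields` list whose entries carry integer `start` and `end`, a string `output_field`, and an optional boolean `keep`. The standalone sketch below restates those rules without importing the package. The schema contents are hypothetical, and unlike the package function, which raises `ValueError` on the first problem, this version collects every problem into a list.

```python
# Hypothetical schema: field names and positions are invented for illustration.
SCHEMA = {
    "fields": [
        {"start": 1, "end": 4, "output_field": "Administration Date", "keep": True},
        {"start": 5, "end": 13, "output_field": "Student ID"},
    ]
}

# Required keys and the types the validator checks for.
EXPECTED_TYPES = {"start": int, "end": int, "output_field": str}


def schema_problems(config):
    """Return a list of human-readable problems; an empty list means the schema passes."""
    if not isinstance(config, dict) or not isinstance(config.get("fields"), list):
        return ["top level must be a mapping with a 'fields' list"]
    problems = []
    for index, field in enumerate(config["fields"]):
        if not isinstance(field, dict):
            problems.append(f"field {index}: not a mapping")
            continue
        for key, expected in EXPECTED_TYPES.items():
            if key not in field:
                problems.append(f"field {index}: missing '{key}'")
            elif not isinstance(field[key], expected):
                problems.append(f"field {index}: '{key}' must be {expected.__name__}")
        if "keep" in field and not isinstance(field["keep"], bool):
            problems.append(f"field {index}: 'keep' must be bool")
    return problems


if __name__ == "__main__":
    print(schema_problems(SCHEMA))
    print(schema_problems({"fields": [{"start": 1}]}))
```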

tea_data_file_conversion-0.1.1/src/tea_data_file_conversion/cli.py ADDED

@@ -0,0 +1,63 @@
+# file: src/tea_data_file_conversion/cli.py
+r"""Command-line interface for fixed\-width file processing.
+This module provides an entry point to either process a fixed\-width file
+into CSV format using a dynamic YAML schema or export default YAML templates.
+"""
+import argparse
+from .processor import export_templates, process_file
+def main():
+    r"""Parse command\-line arguments and execute the corresponding action.
+    Options:
+      \- Process a fixed\-width file to CSV.
+      \- Export YAML template files if the --export_templates flag is set.
+    """
+    # Set up the argument parser.
+    parser = argparse.ArgumentParser(
+        description="Process a fixed-width file and output a CSV based on a dynamic YAML schema."
+    )
+    # Input file (required).
+    parser.add_argument("input_file", help="Path to the input fixed-width file.")
+    # Optional output file.
+    parser.add_argument(
+        "--output_file",
+        help=(
+            "Optional path for the output CSV file. "
+            "If not provided, defaults to the input file name with '_output.csv' appended."
+        ),
+        default=None,
+    )
+    # Optional schema folder location.
+    parser.add_argument(
+        "--schema_folder",
+        help="Path to the folder containing YAML schema files "
+        "(or where templates will be exported). Defaults to current directory.",
+        default=".",
+    )
+    # Flag to export templates.
+    parser.add_argument(
+        "--export_templates",
+        help="Export template YAML files from the built-in "
+        "default_schema folder to the specified schema_folder and exit.",
+        action="store_true",
+    )
+    # Parse the provided arguments.
+    args = parser.parse_args()
+    # If the export flag is set, export YAML templates; export_templates() exits the process itself.
+    if args.export_templates:
+        export_templates(args.schema_folder)
+        return
+    # Otherwise, process the file using the parsed arguments.
+    process_file(args.input_file, args.output_file, schema_folder=args.schema_folder)
+if __name__ == "__main__":
+    main()
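
Note on the argument parser in cli.py above: the sketch below rebuilds the same four arguments in isolation to show what `parse_args` yields for typical invocations. Help texts are shortened, `build_parser` is a hypothetical helper that does not exist in the package, and `scores.txt` and `schemas` are made-up paths.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Recreate the CLI arguments from cli.py (help texts abbreviated)."""
    parser = argparse.ArgumentParser(description="Process a fixed-width file into CSV.")
    parser.add_argument("input_file", help="Path to the input fixed-width file.")
    parser.add_argument("--output_file", default=None, help="Optional output CSV path.")
    parser.add_argument("--schema_folder", default=".", help="Folder with YAML schema files.")
    parser.add_argument("--export_templates", action="store_true", help="Export templates and exit.")
    return parser


if __name__ == "__main__":
    # Hypothetical invocation: scores.txt and schemas are placeholder names.
    args = build_parser().parse_args(["scores.txt", "--schema_folder", "schemas"])
    print(args.input_file, args.output_file, args.schema_folder, args.export_templates)
```

Because `input_file` is positional, argparse requires it even when `--export_templates` is passed, so a template export still needs a placeholder file argument with the parser as defined.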

tea_data_file_conversion-0.1.1/src/tea_data_file_conversion/processor.py ADDED

@@ -0,0 +1,340 @@
+# file: src/tea_data_file_conversion/processor.py
+r"""Processor module for fixed\-width file conversion.
+This module provides functions to:
+  \- Load and validate YAML schema configurations.
+  \- Process fixed\-width files into structured DataFrame objects.
+  \- Export template YAML schema files.
+  \- Convert CSV files into YAML schema files interactively.
+"""
+import os
+import shutil
+import sys
+import importlib_resources  # Used to locate package data.
+import pandas as pd
+import yaml
+def load_yaml_config(file_path):
+    """
+    Load a YAML configuration file for processing.
+    Parameters
+    ----------
+    file_path : str
+        The path to the YAML configuration file.
+    Returns
+    -------
+    dict
+        The parsed YAML configuration.
+    Raises
+    ------
+    ValueError
+        If there is an error parsing the YAML file.
+    """
+    try:
+        with open(file_path) as f:
+            config = yaml.safe_load(f)
+        return config
+    except yaml.YAMLError as ye:
+        # Raise an error with details of parsing issues.
+        raise ValueError(f"Error parsing YAML file {file_path}: {ye}") from ye
+def validate_yaml_config(config, file_path):
+    """
+    Validate the structure of the YAML configuration.
+    The configuration must be a dictionary containing a key 'fields' mapping to a list.
+    Each field in the list must contain 'start', 'end', and 'output_field' keys.
+    Parameters
+    ----------
+    config : dict
+        The YAML configuration dictionary.
+    file_path : str
+        File path used for reporting in error messages.
+    Raises
+    ------
+    ValueError
+        If the configuration does not adhere to the expected schema.
+    """
+    if not isinstance(config, dict):
+        raise ValueError(f"YAML file {file_path} should be a dictionary at the top level.")
+    if "fields" not in config:
+        raise ValueError(f"YAML file {file_path} is missing the required key 'fields'.")
+    if not isinstance(config["fields"], list):
+        raise ValueError(f"YAML file {file_path} key 'fields' should be a list.")
+    for index, field in enumerate(config["fields"]):
+        if not isinstance(field, dict):
+            raise ValueError(f"YAML file {file_path}, field at index {index} is not a dictionary.")
+        for key in ["start", "end", "output_field"]:
+            if key not in field:
+                raise ValueError(f"YAML file {file_path}, field at index {index} is missing required key '{key}'.")
+        if not isinstance(field["start"], int):
+            raise ValueError(f"YAML file {file_path}, field at index {index} key 'start' must be an integer.")
+        if not isinstance(field["end"], int):
+            raise ValueError(f"YAML file {file_path}, field at index {index} key 'end' must be an integer.")
+        if not isinstance(field["output_field"], str):
+            raise ValueError(f"YAML file {file_path}, field at index {index} key 'output_field' must be a string.")
+        if "keep" in field and not isinstance(field["keep"], bool):
+            raise ValueError(f"YAML file {file_path}, field at index {index} key 'keep' must be a boolean.")
+def process_fixed_width_file(input_file, schema_config, skip_header=False, filter_columns=False):
+    r"""
+    Process a fixed\-width file using the provided YAML schema configuration.
+    It determines column boundaries based on the schema, reads the file using pandas,
+    and applies optional filtering to only return columns marked to be kept.
+    Parameters
+    ----------
+    input_file : str
+        The path to the fixed\-width text file.
+    schema_config : dict
+        Schema configuration dictionary with field definitions.
+    skip_header : bool, optional
+        Skip the header row if True (default is False).
+    filter_columns : bool, optional
+        If True, return only DataFrame columns that are marked with "keep": true.
+    Returns
+    -------
+    pd.DataFrame
+        DataFrame with the processed data.
+    """
+    fields = schema_config["fields"]
+    colspecs = []  # List of tuples defining start and end positions for each field.
+    col_names = []  # List of column names derived from the schema.
+    keep_columns = []  # Track columns flagged to be retained.
+    for field in fields:
+        # Adjust the start position because the schema uses 1-based indexing.
+        start = field["start"] - 1
+        end = field["end"]
+        colspecs.append((start, end))
+        # Use 'mapped_field_name' when filtering columns if available.
+        if filter_columns:
+            col_name = (
+                field["mapped_field_name"] if not pd.isna(field.get("mapped_field_name")) else field["output_field"]
+            )
+        else:
+            col_name = field["output_field"]
+        col_names.append(col_name)
+        if field.get("keep", False):
+            keep_columns.append(col_name)
+    # Ensure each column name is unique by appending a counter if needed.
+    unique_col_names = []
+    for col_name in col_names:
+        if col_name in unique_col_names:
+            count = 1
+            new_col_name = f"{col_name}_{count}"
+            while new_col_name in unique_col_names:
+                count += 1
+                new_col_name = f"{col_name}_{count}"
+            unique_col_names.append(new_col_name)
+        else:
+            unique_col_names.append(col_name)
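+    # Read the fixed-width file into a DataFrame.
+    # NOTE: skip_header is accepted but not applied here; every line, including the first, is parsed as data.
+    df = pd.read_fwf(input_file, colspecs=colspecs, header=None, names=unique_col_names)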
+    if filter_columns:
+        df = df[keep_columns]
+    return df
+def process_file(input_file, output_file=None, schema_folder=None, filter_columns=False):
+    r"""
+    Process an input fixed\-width file and output a CSV file.
+    The function:
+      \- Determines the appropriate YAML schema based on header info.
+      \- Loads and validates the schema.
+      \- Processes the input file and writes the output DataFrame to CSV.
+    Parameters
+    ----------
+    input_file : str
+        The path to the fixed\-width input file.
+    output_file : str, optional
+        File path for the output CSV. Defaults to input file name with '_output.csv' appended.
+    schema_folder : str, optional
+        Folder where the YAML schema files are located; defaults to 'default_schema' relative to the working directory.
+    filter_columns : bool, optional
+        If True, only load columns flagged with "keep": true (default is False).
+    Returns
+    -------
+    pd.DataFrame
+        The processed DataFrame.
+    """
+    # Define the output CSV file name if not explicitly provided.
+    if output_file is None:
+        base, _ = os.path.splitext(input_file)
+        output_file = f"{base}_output.csv"
+    # Read and validate the header line.
+    with open(input_file) as f:
+        header_line = f.readline().strip()
+    if len(header_line) < 4:
+        raise ValueError("The header line must contain at least 4 characters.")
+    # Extract test month and abbreviated school year from header.
+    header = header_line[:4]
+    test_month = int(header[:2])
+    school_year_abbr = int(header[2:4])
+    full_school_year = 2000 + school_year_abbr
+    # Determine test type and adjust school year if necessary.
+    if test_month < 10:
+        test_name = "staar"
+    else:
+        test_name = "staar_eoc"
+        if test_month < 15:
+            full_school_year += 1
+    # Compose the path to the expected YAML schema file.
+    base_folder = schema_folder if schema_folder is not None else "default_schema"
+    schema_config_file = os.path.join(base_folder, test_name, f"{test_name}_{full_school_year}.yaml")
+    print(f"Loading schema config: {schema_config_file}")
+    # Load and validate the YAML configuration.
+    schema_config = load_yaml_config(schema_config_file)
+    try:
+        validate_yaml_config(schema_config, schema_config_file)
+    except ValueError as ve:
+        print(f"YAML validation error: {ve}")
+        sys.exit(1)
+    # Process the file using the loaded schema.
+    df = process_fixed_width_file(input_file, schema_config, skip_header=True, filter_columns=filter_columns)
+    # Write the processed data to a CSV file.
+    df.to_csv(output_file, index=False)
+    print(f"Data has been written to {output_file}")
+    return df
+def export_templates(schema_folder):
+    r"""
+    Export sample YAML template files to a specified folder.
+    The function copies files from the built\-in default_schema directory
+    (packaged with this module) into the target folder while preserving the
+    original directory structure.
+    Parameters
+    ----------
+    schema_folder : str
+        The destination folder for exporting the template YAML files.
+    Notes
+    -----
+    The function exits after exporting the template files.
+    """
+    # Locate the default_schema folder within the package.
+    default_schema = importlib_resources.files("tea_data_file_conversion") / "default_schema"
+    with importlib_resources.as_file(default_schema) as default_schema_path:
+        # Check if the default_schema_path is a valid directory.
+        if not os.path.isdir(str(default_schema_path)):
+            print("Default schema folder not found in package.")
+            sys.exit(1)
+        # Walk the directory using the string version of the path.
+        for root, _dirs, files in os.walk(str(default_schema_path)):
+            for file in files:
+                rel_path = os.path.relpath(os.path.join(root, file), str(default_schema_path))
+                target_file = os.path.join(schema_folder, rel_path)
+                os.makedirs(os.path.dirname(target_file), exist_ok=True)
+                shutil.copy(os.path.join(root, file), target_file)
+    print(f"Template YAML files exported to {schema_folder}.")
+    print(
+        "Please review and update the templates as needed, then run the script again using the --schema_folder option."
+    )
+    sys.exit(0)
+def csv_to_schema_yaml(csv_file, yaml_output_file=None):
+    r"""
+    Convert a CSV file into a YAML schema file for fixed\-width processing.
+    This function loads a CSV file, lists available columns, and interactively
+    prompts the user to select fields corresponding to start, end, and output
+    values, then writes out a YAML file with the chosen configuration.
+    Parameters
+    ----------
+    csv_file : str
+        Path to the input CSV file.
+    yaml_output_file : str, optional
+        Output file path for the YAML schema. If omitted, a default name is generated.
+    """
+    try:
+        df = pd.read_csv(csv_file)
+    except Exception as e:
+        print(f"Error loading CSV file: {e}")
+        return
+    # Display available CSV columns for user selection.
+    print("Available columns in the CSV:")
+    for col in df.columns:
+        print(f" - {col}")
+    # Request the user to enter the necessary columns.
+    start_col = input("Enter the name of the column representing the start value: ").strip()
+    end_col = input("Enter the name of the column representing the end value: ").strip()
+    output_field_col = input(
+        "Enter the name of the column representing the output field (e.g., 'Field Category - Field Title'): "
+    ).strip()
+    fields = []  # Prepare a list for schema field definitions.
+    for index, row in df.iterrows():
+        try:
+            start_value = int(row[start_col])
+        except (ValueError, TypeError):
+            print(f"Row {index}: Could not convert start value '{row[start_col]}' to int. Skipping this row.")
+            continue
+        try:
+            end_value = int(row[end_col])
+        except (ValueError, TypeError):
+            print(f"Row {index}: Could not convert end value '{row[end_col]}' to int. Skipping this row.")
+            continue
+        # Clean the output field by replacing special dash characters.
+        output_field_value = (
+            str(row[output_field_col]).replace("\u2010", "-").replace("\u2013", "-").replace("\n", "").replace("\r", "")
+        )
+        field_entry = {
+            "start": start_value,
+            "end": end_value,
+            "output_field": output_field_value,
+            "keep": row.get("keep", False),
+            "mapped_field_name": row.get("Mapped Field Title", output_field_value),
+        }
+        fields.append(field_entry)
+    data = {"fields": fields}
+    # Set default output YAML file name if none provided.
+    if yaml_output_file is None:
+        base, _ = os.path.splitext(csv_file)
+        yaml_output_file = f"{base}_schema.yaml"
+    try:
+        with open(yaml_output_file, "w") as f:
+            yaml.dump(data, f, sort_keys=False)
+        print(f"Schema YAML file successfully created: {yaml_output_file}")
+    except Exception as e:
+        print(f"Error writing YAML file: {e}")
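
Note on processor.py above: two details are easy to misread in the hunk, namely how `process_file` turns the first four header characters into a schema path, and how the 1-based inclusive `start`/`end` values become 0-based slices. The sketch below restates both with the standard library only. The function names, sample records, and field names are hypothetical, and plain string slicing stands in for `pd.read_fwf`, which may additionally infer column types.

```python
import os


def schema_path_from_header(header_line, base_folder="default_schema"):
    """Mirror the header-to-schema lookup in process_file (MMYY in the first four characters)."""
    header_line = header_line.strip()
    if len(header_line) < 4:
        raise ValueError("The header line must contain at least 4 characters.")
    test_month = int(header_line[:2])
    full_school_year = 2000 + int(header_line[2:4])
    if test_month < 10:
        test_name = "staar"
    else:
        test_name = "staar_eoc"
        if test_month < 15:  # same guard as the original code
            full_school_year += 1
    return os.path.join(base_folder, test_name, f"{test_name}_{full_school_year}.yaml")


def slice_record(line, fields):
    """Cut one fixed-width record using 1-based inclusive start/end positions."""
    return {f["output_field"]: line[f["start"] - 1 : f["end"]].strip() for f in fields}


if __name__ == "__main__":
    # Hypothetical records and field layout.
    print(schema_path_from_header("0524ABC", "schemas"))
    print(schema_path_from_header("1223XYZ", "schemas"))
    fields = [
        {"start": 1, "end": 4, "output_field": "admin"},
        {"start": 5, "end": 10, "output_field": "name"},
    ]
    print(slice_record("0524JANE  ", fields))
```

Subtracting 1 from `start` while leaving `end` unchanged converts an inclusive 1-based range into Python's half-open 0-based slice, which is the same adjustment the package applies when building `colspecs`.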

tea_data_file_conversion-0.1.1/src/tea_data_file_conversion/py.typed ADDED

File without changes

tea_data_file_conversion-0.1.1/src/tea_data_file_conversion.egg-info/PKG-INFO ADDED

@@ -0,0 +1,109 @@
+Metadata-Version: 2.2
+Name: tea-data-file-conversion
+Version: 0.1.1
+Summary: Fixedwidth Processor is a Python package designed to transform fixed-width text files into CSVs using dynamic YAML schema configurations.
+Author-email: Mark Moreno <mamoreno@aldineisd.org>
+License: MIT
+Project-URL: Bug Tracker, https://github.com/markm-io/tea-data-file-conversion/issues
+Project-URL: Changelog, https://github.com/markm-io/tea-data-file-conversion/blob/main/CHANGELOG.md
+Project-URL: documentation, https://tea-data-file-conversion.readthedocs.io
+Project-URL: repository, https://github.com/markm-io/tea-data-file-conversion
+Classifier: Development Status :: 2 - Pre-Alpha
+Classifier: Intended Audience :: Developers
+Classifier: Natural Language :: English
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Software Development :: Libraries
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: importlib-resources>=6.5.2
+Requires-Dist: pandas>=2.2.3
+Requires-Dist: pyyaml>=6.0.2
+Requires-Dist: rich>=10
+Requires-Dist: typer<1,>=0.15
+# tea-data-file-conversion
+<p align="center">
+  <a href="https://github.com/markm-io/tea-data-file-conversion/actions/workflows/ci.yml?query=branch%3Amain">
+    <img src="https://img.shields.io/github/actions/workflow/status/markm-io/tea-data-file-conversion/ci.yml?branch=main&label=CI&logo=github&style=flat-square" alt="CI Status" >
+  </a>
+  <a href="https://tea-data-file-conversion.readthedocs.io">
+    <img src="https://img.shields.io/readthedocs/tea-data-file-conversion.svg?logo=read-the-docs&logoColor=fff&style=flat-square" alt="Documentation Status">
+  </a>
+  <a href="https://codecov.io/gh/markm-io/tea-data-file-conversion">
+    <img src="https://img.shields.io/codecov/c/github/markm-io/tea-data-file-conversion.svg?logo=codecov&logoColor=fff&style=flat-square" alt="Test coverage percentage">
+  </a>
+</p>
+<p align="center">
+  <a href="https://github.com/astral-sh/uv">
+    <img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json" alt="uv">
+  </a>
+  <a href="https://github.com/astral-sh/ruff">
+    <img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json" alt="Ruff">
+  </a>
+  <a href="https://github.com/pre-commit/pre-commit">
+    <img src="https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white&style=flat-square" alt="pre-commit">
+  </a>
+</p>
+<p align="center">
+  <a href="https://pypi.org/project/tea-data-file-conversion/">
+    <img src="https://img.shields.io/pypi/v/tea-data-file-conversion.svg?logo=python&logoColor=fff&style=flat-square" alt="PyPI Version">
+  </a>
+  <img src="https://img.shields.io/pypi/pyversions/tea-data-file-conversion.svg?style=flat-square&logo=python&amp;logoColor=fff" alt="Supported Python versions">
+  <img src="https://img.shields.io/pypi/l/tea-data-file-conversion.svg?style=flat-square" alt="License">
+</p>
+---
+**Documentation**: <a href="https://tea-data-file-conversion.readthedocs.io" target="_blank">https://tea-data-file-conversion.readthedocs.io </a>
+**Source Code**: <a href="https://github.com/markm-io/tea-data-file-conversion" target="_blank">https://github.com/markm-io/tea-data-file-conversion </a>
+---
+Fixedwidth Processor is a Python package designed to transform fixed-width text files into CSVs using dynamic YAML schema configurations.
+## Installation
+Install this via pip (or your favourite package manager):
+`pip install tea-data-file-conversion`
+## Contributors ✨
+Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
+<!-- prettier-ignore-start -->
+<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->
+<!-- prettier-ignore-start -->
+<!-- markdownlint-disable -->
+<table>
+  <tbody>
+    <tr>
+      <td align="center" valign="top" width="14.28%"><a href="https://github.com/markm-io"><img src="https://avatars.githubusercontent.com/u/45011486?v=4?s=80" width="80px;" alt="Mark Moreno"/><br /><sub><b>Mark Moreno</b></sub></a><br /><a href="https://github.com/markm-io/tea-data-file-conversion/commits?author=markm-io" title="Code">💻</a> <a href="#ideas-markm-io" title="Ideas, Planning, & Feedback">🤔</a> <a href="https://github.com/markm-io/tea-data-file-conversion/commits?author=markm-io" title="Documentation">📖</a></td>
+    </tr>
+  </tbody>
+</table>
+<!-- markdownlint-restore -->
+<!-- prettier-ignore-end -->
+<!-- ALL-CONTRIBUTORS-LIST:END -->
+<!-- prettier-ignore-end -->
+This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!
+## Credits
+[![Copier](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/copier-org/copier/master/img/badge/badge-grayscale-inverted-border-orange.json)](https://github.com/copier-org/copier)
+This package was created with
+[Copier](https://copier.readthedocs.io/) and the
+[browniebroke/pypackage-template](https://github.com/browniebroke/pypackage-template)
+project template.

tea_data_file_conversion-0.1.1/src/tea_data_file_conversion.egg-info/SOURCES.txt ADDED

@@ -0,0 +1,16 @@
+LICENSE
+README.md
+pyproject.toml
+setup.py
+src/tea_data_file_conversion/__init__.py
+src/tea_data_file_conversion/cli.py
+src/tea_data_file_conversion/processor.py
+src/tea_data_file_conversion/py.typed
+src/tea_data_file_conversion.egg-info/PKG-INFO
+src/tea_data_file_conversion.egg-info/SOURCES.txt
+src/tea_data_file_conversion.egg-info/dependency_links.txt
+src/tea_data_file_conversion.egg-info/entry_points.txt
+src/tea_data_file_conversion.egg-info/requires.txt
+src/tea_data_file_conversion.egg-info/top_level.txt
+tests/test_cli.py
+tests/test_processor.py

tea_data_file_conversion-0.1.1/src/tea_data_file_conversion.egg-info/dependency_links.txt ADDED

@@ -0,0 +1 @@
+

tea_data_file_conversion-0.1.1/src/tea_data_file_conversion.egg-info/entry_points.txt ADDED

@@ -0,0 +1,2 @@
+[console_scripts]
+tea-data-file-conversion = tea_data_file_conversion.cli:app

tea_data_file_conversion-0.1.1/src/tea_data_file_conversion.egg-info/requires.txt ADDED

@@ -0,0 +1,5 @@
+importlib-resources>=6.5.2
+pandas>=2.2.3
+pyyaml>=6.0.2
+rich>=10
+typer<1,>=0.15

tea_data_file_conversion-0.1.1/src/tea_data_file_conversion.egg-info/top_level.txt ADDED

@@ -0,0 +1 @@
+tea_data_file_conversion

tea_data_file_conversion-0.1.1/tests/test_cli.py ADDED

@@ -0,0 +1,86 @@
+import sys
+import pytest
+# Import the cli module. Make sure your PYTHONPATH is set correctly.
+from tea_data_file_conversion import cli
+# Define dummy functions to replace export_templates and process_file.
+def dummy_export_templates(schema_folder):
+    dummy_export_templates.called = True
+    dummy_export_templates.schema_folder = schema_folder
+dummy_export_templates.called = False
+dummy_export_templates.schema_folder = None
+def dummy_process_file(input_file, output_file, schema_folder):
+    dummy_process_file.called = True
+    dummy_process_file.args = (input_file, output_file, schema_folder)
+dummy_process_file.called = False
+dummy_process_file.args = None
+# A fixture to patch the functions in the cli module before each test.
+@pytest.fixture(autouse=True)
+def patch_cli_functions(monkeypatch):
+    monkeypatch.setattr(cli, "export_templates", dummy_export_templates)
+    monkeypatch.setattr(cli, "process_file", dummy_process_file)
+    # Reset our dummy function flags
+    dummy_export_templates.called = False
+    dummy_export_templates.schema_folder = None
+    dummy_process_file.called = False
+    dummy_process_file.args = None
+# Test running the CLI without the --export_templates flag.
+def test_main_without_export_templates():
+    test_input = "dummy_input.txt"
+    test_output = "dummy_output.csv"
+    test_schema = "dummy_schema"
+    sys.argv = [
+        "cli.py",
+        test_input,
+        "--output_file",
+        test_output,
+        "--schema_folder",
+        test_schema,
+    ]
+    cli.main()
+    # export_templates should NOT be called.
+    assert not dummy_export_templates.called
+    # process_file should be called with the provided values.
+    assert dummy_process_file.called
+    assert dummy_process_file.args == (test_input, test_output, test_schema)
+# Test running the CLI when --export_templates flag is provided.
+def test_main_with_export_templates():
+    test_input = "dummy_input.txt"
+    test_schema = "dummy_schema"
+    sys.argv = [
+        "cli.py",
+        test_input,
+        "--schema_folder",
+        test_schema,
+        "--export_templates",
+    ]
+    cli.main()
+    # When --export_templates is provided, export_templates should be called.
+    assert dummy_export_templates.called
+    assert dummy_export_templates.schema_folder == test_schema
+    # process_file is always called after the if-statement.
+    assert dummy_process_file.called
+    # Here output_file is not provided so it defaults to None.
+    assert dummy_process_file.args == (test_input, None, test_schema)
+# Test that missing the required input_file argument causes a SystemExit.
+def test_main_missing_input_file():
+    sys.argv = ["cli.py"]
+    with pytest.raises(SystemExit):
+        cli.main()
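The three tests above pin down the command-line contract: a required positional `input_file`, optional `--output_file` and `--schema_folder` values, and an `--export_templates` switch that triggers an export before the file is processed. The following is a rough sketch of an `argparse` front end that would satisfy that contract. It is an illustration only, not the package's `cli.py`: `build_parser`, the `None` defaults, and the two placeholder functions are assumptions.

```python
import argparse


def export_templates(schema_folder):
    """Hypothetical stand-in for the real export function."""
    print(f"exporting templates to {schema_folder}")


def process_file(input_file, output_file, schema_folder):
    """Hypothetical stand-in for the real processing function."""
    print(f"processing {input_file} -> {output_file} using {schema_folder}")


def build_parser():
    """Build a parser with the arguments the tests pass on sys.argv."""
    parser = argparse.ArgumentParser(description="Convert a fixed-width file to CSV.")
    parser.add_argument("input_file")  # required positional argument
    parser.add_argument("--output_file", default=None)
    parser.add_argument("--schema_folder", default=None)  # default is an assumption
    parser.add_argument("--export_templates", action="store_true")
    return parser


def main(argv=None):
    """Parse arguments, optionally export templates, then always process the file."""
    args = build_parser().parse_args(argv)  # argv=None falls back to sys.argv[1:]
    if args.export_templates:
        export_templates(args.schema_folder)
    process_file(args.input_file, args.output_file, args.schema_folder)
    return args
```

Accepting an optional `argv` list lets a test call `main([...])` directly instead of overwriting `sys.argv`, while omitting the required positional argument still ends in `SystemExit`, as the last test expects.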

tea_data_file_conversion-0.1.1/tests/test_processor.py ADDED

@@ -0,0 +1,124 @@
+import os
+import pandas as pd
+import pytest
+from tea_data_file_conversion.processor import (
+    csv_to_schema_yaml,
+    load_yaml_config,
+    process_file,
+    process_fixed_width_file,
+    validate_yaml_config,
+)
+# Existing tests remain the same...
+def test_validate_yaml_config_valid():
+    valid_config = {"fields": [{"start": 1, "end": 5, "output_field": "field1", "keep": True}]}
+    validate_yaml_config(valid_config, "test.yaml")  # Should not raise
+def test_validate_yaml_config_invalid_cases():
+    cases = [
+        ({}, "missing fields key"),
+        ({"fields": {}}, "fields not a list"),
+        ({"fields": [{"invalid": "field"}]}, "missing required keys"),
+        ({"fields": [{"start": "1", "end": 5, "output_field": "field1"}]}, "start not int"),
+        ({"fields": [{"start": 1, "end": "5", "output_field": "field1"}]}, "end not int"),
+        ({"fields": [{"start": 1, "end": 5, "output_field": 1}]}, "output_field not str"),
+        ({"fields": [{"start": 1, "end": 5, "output_field": "field1", "keep": "true"}]}, "keep not bool"),
+    ]
+    for config, _ in cases:
+        with pytest.raises(ValueError):
+            validate_yaml_config(config, "test.yaml")
+def test_process_fixed_width_file(tmp_path):
+    # Create a test fixed-width file
+    input_data = "ABC123\nDEF456"
+    input_file = tmp_path / "test.txt"
+    input_file.write_text(input_data)
+    config = {
+        "fields": [
+            {
+                "start": 1,
+                "end": 3,
+                "output_field": "letters",
+                "keep": True,
+                "mapped_field_name": "letters_mapped",  # Added mapped field name
+            },
+            {
+                "start": 4,
+                "end": 6,
+                "output_field": "numbers",
+                "keep": False,
+                "mapped_field_name": "numbers_mapped",  # Added mapped field name
+            },
+        ]
+    }
+    # Test with filter_columns=True
+    df = process_fixed_width_file(str(input_file), config, filter_columns=True)
+    assert list(df.columns) == ["letters_mapped"]  # Updated assertion to use mapped name
+    # Test with filter_columns=False
+    df = process_fixed_width_file(str(input_file), config, filter_columns=False)
+    assert list(df.columns) == ["letters", "numbers"]
+def test_process_file_integration(tmp_path):
+    # Create test input file
+    input_data = "0224ABC123\nDEF456789"
+    input_file = tmp_path / "test.txt"
+    input_file.write_text(input_data)
+    # Create test schema folder and file
+    schema_folder = tmp_path / "schemas"
+    schema_folder.mkdir()
+    staar_folder = schema_folder / "staar"
+    staar_folder.mkdir()
+    schema_content = """
+    fields:
+      - start: 1
+        end: 3
+        output_field: "field1"
+        keep: true
+      - start: 4
+        end: 6
+        output_field: "field2"
+        keep: false
+    """
+    schema_file = staar_folder / "staar_2024.yaml"
+    schema_file.write_text(schema_content)
+    # Test processing
+    output_file = tmp_path / "output.csv"
+    df = process_file(str(input_file), str(output_file), schema_folder=str(schema_folder))
+    assert os.path.exists(output_file)
+    assert isinstance(df, pd.DataFrame)
+def test_csv_to_schema_yaml(tmp_path, monkeypatch):
+    # Create test CSV
+    csv_content = "start,end,field_name\n1,5,Field A\n6,10,Field B"
+    csv_file = tmp_path / "test.csv"
+    csv_file.write_text(csv_content)
+    # Mock input function
+    inputs = ["start", "end", "field_name"]
+    input_iter = iter(inputs)
+    monkeypatch.setattr("builtins.input", lambda _: next(input_iter))
+    # Test conversion
+    yaml_output = tmp_path / "output.yaml"
+    csv_to_schema_yaml(str(csv_file), str(yaml_output))
+    assert yaml_output.exists()
+    # Verify the generated YAML
+    config = load_yaml_config(str(yaml_output))
+    assert "fields" in config
+    assert len(config["fields"]) == 2
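The invalid cases listed in `test_validate_yaml_config_invalid_cases` earlier in this file read as a compact specification for schema validation. The following is a minimal sketch of a validator that rejects each of those cases. It is not the package's `validate_yaml_config`: the function name `validate_config`, the error messages, and the decision to reject booleans for `start` and `end` are assumptions.

```python
def validate_config(config, source_name):
    """Raise ValueError if `config` breaks any rule exercised by the tests."""
    if not isinstance(config, dict) or "fields" not in config:
        raise ValueError(f"{source_name}: missing 'fields' key")
    fields = config["fields"]
    if not isinstance(fields, list):
        raise ValueError(f"{source_name}: 'fields' must be a list")
    for i, field in enumerate(fields):
        if not isinstance(field, dict):
            raise ValueError(f"{source_name}: field {i} must be a mapping")
        for key in ("start", "end", "output_field"):
            if key not in field:
                raise ValueError(f"{source_name}: field {i} is missing '{key}'")
        for key in ("start", "end"):
            value = field[key]
            # bool is a subclass of int, so it is rejected explicitly (assumption).
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"{source_name}: field {i} '{key}' must be an int")
        if not isinstance(field["output_field"], str):
            raise ValueError(f"{source_name}: field {i} 'output_field' must be a str")
        # 'keep' is optional, but must be a real boolean when present.
        if "keep" in field and not isinstance(field["keep"], bool):
            raise ValueError(f"{source_name}: field {i} 'keep' must be a bool")
```

Including the source file name in every message makes a failure traceable when several schema files are loaded from one folder.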