rost-io 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- rost_io-0.1.0/PKG-INFO +111 -0
- rost_io-0.1.0/README.md +83 -0
- rost_io-0.1.0/pyproject.toml +35 -0
- rost_io-0.1.0/rost_io/__init__.py +42 -0
- rost_io-0.1.0/rost_io/adapters/__init__.py +0 -0
- rost_io-0.1.0/rost_io/adapters/csv_adapter.py +234 -0
- rost_io-0.1.0/rost_io/adapters/database_adapter.py +133 -0
- rost_io-0.1.0/rost_io/adapters/excel_adapter.py +215 -0
- rost_io-0.1.0/rost_io/adapters/json_adapter.py +112 -0
- rost_io-0.1.0/rost_io/adapters/pandas_adapter.py +225 -0
- rost_io-0.1.0/rost_io/adapters/parquet_adapter.py +162 -0
- rost_io-0.1.0/rost_io/base.py +58 -0
- rost_io-0.1.0/rost_io/validation.py +66 -0
- rost_io-0.1.0/rost_io.egg-info/PKG-INFO +111 -0
- rost_io-0.1.0/rost_io.egg-info/SOURCES.txt +17 -0
- rost_io-0.1.0/rost_io.egg-info/dependency_links.txt +1 -0
- rost_io-0.1.0/rost_io.egg-info/requires.txt +27 -0
- rost_io-0.1.0/rost_io.egg-info/top_level.txt +1 -0
- rost_io-0.1.0/setup.cfg +4 -0
rost_io-0.1.0/PKG-INFO
ADDED
|
@@ -0,0 +1,111 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: rost-io
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: Adapter layer for .rost — converts any data source to canonical JSON
|
|
5
|
+
License: MIT
|
|
6
|
+
Requires-Python: >=3.10
|
|
7
|
+
Description-Content-Type: text/markdown
|
|
8
|
+
Requires-Dist: jsonschema>=4.0
|
|
9
|
+
Provides-Extra: csv
|
|
10
|
+
Provides-Extra: parquet
|
|
11
|
+
Requires-Dist: pyarrow>=13.0; extra == "parquet"
|
|
12
|
+
Provides-Extra: db
|
|
13
|
+
Requires-Dist: sqlalchemy>=2.0; extra == "db"
|
|
14
|
+
Provides-Extra: excel
|
|
15
|
+
Requires-Dist: openpyxl>=3.1; extra == "excel"
|
|
16
|
+
Provides-Extra: pandas
|
|
17
|
+
Requires-Dist: pandas>=2.0; extra == "pandas"
|
|
18
|
+
Provides-Extra: llm
|
|
19
|
+
Requires-Dist: instructor>=1.0; extra == "llm"
|
|
20
|
+
Requires-Dist: openai>=1.0; extra == "llm"
|
|
21
|
+
Provides-Extra: all
|
|
22
|
+
Requires-Dist: pyarrow>=13.0; extra == "all"
|
|
23
|
+
Requires-Dist: sqlalchemy>=2.0; extra == "all"
|
|
24
|
+
Requires-Dist: openpyxl>=3.1; extra == "all"
|
|
25
|
+
Requires-Dist: pandas>=2.0; extra == "all"
|
|
26
|
+
Requires-Dist: instructor>=1.0; extra == "all"
|
|
27
|
+
Requires-Dist: openai>=1.0; extra == "all"
|
|
28
|
+
|
|
29
|
+
# rost-io — Adapter layer for `.rost`
|
|
30
|
+
|
|
31
|
+
`rost-io` converts messy real-world data sources into the canonical JSON
|
|
32
|
+
contracts that the `.rost` compiler and solver consume.
|
|
33
|
+
|
|
34
|
+
## Design principle
|
|
35
|
+
|
|
36
|
+
The `.rost` compiler only reads canonical JSON. It has no database drivers,
|
|
37
|
+
no Excel parser, no API calls. `rost-io` is the adapter layer that handles
|
|
38
|
+
all of that — exactly as described in `DATA_IO.md` (Hexagonal Architecture).
|
|
39
|
+
|
|
40
|
+
```
|
|
41
|
+
Excel / CSV / Parquet / PostgreSQL / MySQL / PDF
|
|
42
|
+
│
|
|
43
|
+
▼
|
|
44
|
+
rost-io adapter ← this package
|
|
45
|
+
│
|
|
46
|
+
│ staff.json (schema: rost/staff/v1)
|
|
47
|
+
│ leave.json (schema: rost/leave/v1)
|
|
48
|
+
│ calendar.json (schema: rost/calendar/v1)
|
|
49
|
+
▼
|
|
50
|
+
rostc compiler → solver → solution.json
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
## Installation
|
|
54
|
+
|
|
55
|
+
```bash
|
|
56
|
+
# Core only (CSV + JSON — no extra dependencies)
|
|
57
|
+
pip install rost-io
|
|
58
|
+
|
|
59
|
+
# With Parquet support (P1)
|
|
60
|
+
pip install "rost-io[parquet]"
|
|
61
|
+
|
|
62
|
+
# With database support — PostgreSQL + MySQL (P1)
|
|
63
|
+
pip install "rost-io[db]"
|
|
64
|
+
|
|
65
|
+
# With Excel support (P2)
|
|
66
|
+
pip install "rost-io[excel]"
|
|
67
|
+
|
|
68
|
+
# With pandas support (P2)
|
|
69
|
+
pip install "rost-io[pandas]"
|
|
70
|
+
|
|
71
|
+
# Everything
|
|
72
|
+
pip install "rost-io[all]"
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
## Quick start
|
|
76
|
+
|
|
77
|
+
```python
|
|
78
|
+
from rost_io import CsvAdapter, validate_staff
|
|
79
|
+
|
|
80
|
+
# Convert a CSV staff export to canonical staff.json
|
|
81
|
+
adapter = CsvAdapter("hr_export.csv", id_col="employee_id", tags_col="roles")
|
|
82
|
+
staff_json = adapter.to_staff_json()
|
|
83
|
+
|
|
84
|
+
# Validate against the canonical schema
|
|
85
|
+
validate_staff(staff_json) # raises jsonschema.ValidationError if invalid
|
|
86
|
+
|
|
87
|
+
# Write for the compiler
|
|
88
|
+
import json
|
|
89
|
+
with open("staff.json", "w") as f:
|
|
90
|
+
json.dump(staff_json, f, indent=2)
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
## Canonical JSON schemas
|
|
94
|
+
|
|
95
|
+
| File | Schema ID | Description |
|
|
96
|
+
|------------------|--------------------|-------------------------------------|
|
|
97
|
+
| `staff.json` | `rost/staff/v1` | People + tags |
|
|
98
|
+
| `leave.json` | `rost/leave/v1` | Leave/absence entries |
|
|
99
|
+
| `calendar.json` | `rost/calendar/v1` | Date range + public holidays |
|
|
100
|
+
| `solution.json` | `rost/solution/v1` | Solver output (read-only for rost-io)|
|
|
101
|
+
|
|
102
|
+
## Adapters
|
|
103
|
+
|
|
104
|
+
| Adapter | Priority | Extra dep |
|
|
105
|
+
|--------------------|----------|---------------------|
|
|
106
|
+
| `CsvAdapter` | P0 | none (stdlib) |
|
|
107
|
+
| `JsonAdapter` | P0 | none |
|
|
108
|
+
| `ParquetAdapter` | P1 | `pyarrow` |
|
|
109
|
+
| `DatabaseAdapter` | P1 | `sqlalchemy` |
|
|
110
|
+
| `ExcelAdapter` | P2 | `openpyxl` |
|
|
111
|
+
| `PandasAdapter` | P2 | `pandas` |
|
rost_io-0.1.0/README.md
ADDED
|
@@ -0,0 +1,83 @@
|
|
|
1
|
+
# rost-io — Adapter layer for `.rost`
|
|
2
|
+
|
|
3
|
+
`rost-io` converts messy real-world data sources into the canonical JSON
|
|
4
|
+
contracts that the `.rost` compiler and solver consume.
|
|
5
|
+
|
|
6
|
+
## Design principle
|
|
7
|
+
|
|
8
|
+
The `.rost` compiler only reads canonical JSON. It has no database drivers,
|
|
9
|
+
no Excel parser, no API calls. `rost-io` is the adapter layer that handles
|
|
10
|
+
all of that — exactly as described in `DATA_IO.md` (Hexagonal Architecture).
|
|
11
|
+
|
|
12
|
+
```
|
|
13
|
+
Excel / CSV / Parquet / PostgreSQL / MySQL / PDF
|
|
14
|
+
│
|
|
15
|
+
▼
|
|
16
|
+
rost-io adapter ← this package
|
|
17
|
+
│
|
|
18
|
+
│ staff.json (schema: rost/staff/v1)
|
|
19
|
+
│ leave.json (schema: rost/leave/v1)
|
|
20
|
+
│ calendar.json (schema: rost/calendar/v1)
|
|
21
|
+
▼
|
|
22
|
+
rostc compiler → solver → solution.json
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
## Installation
|
|
26
|
+
|
|
27
|
+
```bash
|
|
28
|
+
# Core only (CSV + JSON — no extra dependencies)
|
|
29
|
+
pip install rost-io
|
|
30
|
+
|
|
31
|
+
# With Parquet support (P1)
|
|
32
|
+
pip install "rost-io[parquet]"
|
|
33
|
+
|
|
34
|
+
# With database support — PostgreSQL + MySQL (P1)
|
|
35
|
+
pip install "rost-io[db]"
|
|
36
|
+
|
|
37
|
+
# With Excel support (P2)
|
|
38
|
+
pip install "rost-io[excel]"
|
|
39
|
+
|
|
40
|
+
# With pandas support (P2)
|
|
41
|
+
pip install "rost-io[pandas]"
|
|
42
|
+
|
|
43
|
+
# Everything
|
|
44
|
+
pip install "rost-io[all]"
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
## Quick start
|
|
48
|
+
|
|
49
|
+
```python
|
|
50
|
+
from rost_io import CsvAdapter, validate_staff
|
|
51
|
+
|
|
52
|
+
# Convert a CSV staff export to canonical staff.json
|
|
53
|
+
adapter = CsvAdapter("hr_export.csv", id_col="employee_id", tags_col="roles")
|
|
54
|
+
staff_json = adapter.to_staff_json()
|
|
55
|
+
|
|
56
|
+
# Validate against the canonical schema
|
|
57
|
+
validate_staff(staff_json) # raises jsonschema.ValidationError if invalid
|
|
58
|
+
|
|
59
|
+
# Write for the compiler
|
|
60
|
+
import json
|
|
61
|
+
with open("staff.json", "w") as f:
|
|
62
|
+
json.dump(staff_json, f, indent=2)
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
## Canonical JSON schemas
|
|
66
|
+
|
|
67
|
+
| File | Schema ID | Description |
|
|
68
|
+
|------------------|--------------------|-------------------------------------|
|
|
69
|
+
| `staff.json` | `rost/staff/v1` | People + tags |
|
|
70
|
+
| `leave.json` | `rost/leave/v1` | Leave/absence entries |
|
|
71
|
+
| `calendar.json` | `rost/calendar/v1` | Date range + public holidays |
|
|
72
|
+
| `solution.json` | `rost/solution/v1` | Solver output (read-only for rost-io)|
|
|
73
|
+
|
|
74
|
+
## Adapters
|
|
75
|
+
|
|
76
|
+
| Adapter | Priority | Extra dep |
|
|
77
|
+
|--------------------|----------|---------------------|
|
|
78
|
+
| `CsvAdapter` | P0 | none (stdlib) |
|
|
79
|
+
| `JsonAdapter` | P0 | none |
|
|
80
|
+
| `ParquetAdapter` | P1 | `pyarrow` |
|
|
81
|
+
| `DatabaseAdapter` | P1 | `sqlalchemy` |
|
|
82
|
+
| `ExcelAdapter` | P2 | `openpyxl` |
|
|
83
|
+
| `PandasAdapter` | P2 | `pandas` |
|
|
@@ -0,0 +1,35 @@
|
|
|
1
|
+
[build-system]
|
|
2
|
+
requires = ["setuptools>=67", "wheel"]
|
|
3
|
+
build-backend = "setuptools.build_meta"
|
|
4
|
+
|
|
5
|
+
[project]
|
|
6
|
+
name = "rost-io"
|
|
7
|
+
version = "0.1.0"
|
|
8
|
+
description = "Adapter layer for .rost — converts any data source to canonical JSON"
|
|
9
|
+
readme = "README.md"
|
|
10
|
+
requires-python = ">=3.10"
|
|
11
|
+
license = { text = "MIT" }
|
|
12
|
+
|
|
13
|
+
dependencies = [
|
|
14
|
+
"jsonschema>=4.0",
|
|
15
|
+
]
|
|
16
|
+
|
|
17
|
+
[project.optional-dependencies]
|
|
18
|
+
csv = [] # stdlib csv — no extra dep
|
|
19
|
+
parquet = ["pyarrow>=13.0"]
|
|
20
|
+
db = ["sqlalchemy>=2.0"]
|
|
21
|
+
excel = ["openpyxl>=3.1"]
|
|
22
|
+
pandas = ["pandas>=2.0"]
|
|
23
|
+
llm = ["instructor>=1.0", "openai>=1.0"]
|
|
24
|
+
all = [
|
|
25
|
+
"pyarrow>=13.0",
|
|
26
|
+
"sqlalchemy>=2.0",
|
|
27
|
+
"openpyxl>=3.1",
|
|
28
|
+
"pandas>=2.0",
|
|
29
|
+
"instructor>=1.0",
|
|
30
|
+
"openai>=1.0",
|
|
31
|
+
]
|
|
32
|
+
|
|
33
|
+
[tool.setuptools.packages.find]
|
|
34
|
+
where = ["."]
|
|
35
|
+
include = ["rost_io*"]
|
|
@@ -0,0 +1,42 @@
|
|
|
1
|
+
"""
|
|
2
|
+
rost-io — Adapter layer for .rost
|
|
3
|
+
Converts any data source to canonical JSON (rost/staff/v1, rost/leave/v1, rost/calendar/v1).
|
|
4
|
+
"""
|
|
5
|
+
|
|
6
|
+
from __future__ import annotations
|
|
7
|
+
|
|
8
|
+
from rost_io.base import RostAdapter
|
|
9
|
+
from rost_io.validation import validate_staff, validate_leave, validate_calendar, validate_solution
|
|
10
|
+
from rost_io.adapters.csv_adapter import CsvAdapter
|
|
11
|
+
from rost_io.adapters.json_adapter import JsonAdapter
|
|
12
|
+
|
|
13
|
+
__all__ = [
|
|
14
|
+
# Base class
|
|
15
|
+
"RostAdapter",
|
|
16
|
+
# Validation helpers
|
|
17
|
+
"validate_staff",
|
|
18
|
+
"validate_leave",
|
|
19
|
+
"validate_calendar",
|
|
20
|
+
"validate_solution",
|
|
21
|
+
# P0 adapters (no extra deps)
|
|
22
|
+
"CsvAdapter",
|
|
23
|
+
"JsonAdapter",
|
|
24
|
+
]
|
|
25
|
+
|
|
26
|
+
# P1/P2 adapters are imported lazily so missing optional deps don't break the
|
|
27
|
+
# core import. Users should install extras: pip install "rost-io[parquet]" etc.
|
|
28
|
+
|
|
29
|
+
def __getattr__(name: str):
|
|
30
|
+
if name == "ParquetAdapter":
|
|
31
|
+
from rost_io.adapters.parquet_adapter import ParquetAdapter
|
|
32
|
+
return ParquetAdapter
|
|
33
|
+
if name == "DatabaseAdapter":
|
|
34
|
+
from rost_io.adapters.database_adapter import DatabaseAdapter
|
|
35
|
+
return DatabaseAdapter
|
|
36
|
+
if name == "ExcelAdapter":
|
|
37
|
+
from rost_io.adapters.excel_adapter import ExcelAdapter
|
|
38
|
+
return ExcelAdapter
|
|
39
|
+
if name == "PandasAdapter":
|
|
40
|
+
from rost_io.adapters.pandas_adapter import PandasAdapter
|
|
41
|
+
return PandasAdapter
|
|
42
|
+
raise AttributeError(f"module 'rost_io' has no attribute {name!r}")
|
|
File without changes
|
|
@@ -0,0 +1,234 @@
|
|
|
1
|
+
"""
|
|
2
|
+
CsvAdapter — P0 adapter that reads CSV staff/leave exports.
|
|
3
|
+
|
|
4
|
+
No extra dependencies beyond the Python standard library.
|
|
5
|
+
|
|
6
|
+
Staff CSV format (minimum):
|
|
7
|
+
name,tags
|
|
8
|
+
Alice,trainee
|
|
9
|
+
Bob,ac
|
|
10
|
+
|
|
11
|
+
Staff CSV format (extended):
|
|
12
|
+
employee_id,full_name,role,department
|
|
13
|
+
alice,Alice Wong,trainee,ED
|
|
14
|
+
bob,Bob Smith,ac,ED
|
|
15
|
+
|
|
16
|
+
Leave CSV format (minimum):
|
|
17
|
+
person_id,start,end
|
|
18
|
+
alice,2026-05-10,2026-05-12
|
|
19
|
+
|
|
20
|
+
Leave CSV format (extended):
|
|
21
|
+
person_id,start,end,type,priority,approved
|
|
22
|
+
alice,2026-05-10,2026-05-12,annual,normal,true
|
|
23
|
+
"""
|
|
24
|
+
|
|
25
|
+
from __future__ import annotations
|
|
26
|
+
|
|
27
|
+
import csv
|
|
28
|
+
import re
|
|
29
|
+
from pathlib import Path
|
|
30
|
+
from typing import Any
|
|
31
|
+
|
|
32
|
+
from rost_io.base import RostAdapter
|
|
33
|
+
|
|
34
|
+
|
|
35
|
+
def _parse_bool(s: str) -> bool:
|
|
36
|
+
return s.strip().lower() in ("true", "yes", "1", "y")
|
|
37
|
+
|
|
38
|
+
|
|
39
|
+
def _normalise_id(s: str) -> str:
|
|
40
|
+
"""Lower-case, strip, replace spaces/special chars with underscores."""
|
|
41
|
+
return re.sub(r"[^a-z0-9_]", "_", s.strip().lower()).strip("_")
|
|
42
|
+
|
|
43
|
+
|
|
44
|
+
class CsvAdapter(RostAdapter):
    """
    Reads a CSV file and converts it to canonical rost/staff/v1 or
    rost/leave/v1 JSON.

    Args:
        path: Path to the CSV file.
        id_col: Column name for the person identifier.
            Defaults to first of: ``id``, ``employee_id``, ``name``.
        display_name_col: Column name for the display name (optional).
        tags_col: Column name for comma-separated tags (optional).
        extra_tag_cols: Additional columns to include as ``col:value`` tags.
        normalise_ids: Whether to normalise IDs (lower-case, underscore).
            Default True.
        encoding: CSV file encoding. Default ``utf-8``.
    """

    # Ordered list of fallback column names when id_col is not specified
    _ID_FALLBACKS = ("id", "employee_id", "name", "person_id")

    def __init__(
        self,
        path: str | Path,
        *,
        id_col: str | None = None,
        display_name_col: str | None = None,
        tags_col: str | None = None,
        extra_tag_cols: list[str] | None = None,
        normalise_ids: bool = True,
        encoding: str = "utf-8",
    ) -> None:
        self.path = Path(path)
        self.id_col = id_col
        self.display_name_col = display_name_col
        self.tags_col = tags_col
        self.extra_tag_cols = extra_tag_cols or []
        self.normalise_ids = normalise_ids
        self.encoding = encoding

    # ── staff ──────────────────────────────────────────────────────────────────

    def to_staff_json(self) -> dict:
        """
        Convert the CSV to canonical rost/staff/v1 JSON.

        Returns:
            dict with ``"schema": "rost/staff/v1"`` and ``"people"`` list.

        Raises:
            ValueError: if no id column can be resolved (see _resolve_col).
        """
        rows = self._read_csv()
        if not rows:
            return {"schema": "rost/staff/v1", "people": []}

        headers = list(rows[0].keys())
        id_col = self._resolve_col(headers, self.id_col, self._ID_FALLBACKS, "id/name")
        display_col = self._resolve_col(
            headers, self.display_name_col,
            ("display_name", "full_name", "name", "Display Name", "Full Name"),
            required=False,
        )
        tags_col = self._resolve_col(
            headers, self.tags_col, ("tags", "roles", "role", "tag"),
            required=False,
        )
        # Silently drop requested extra tag columns that aren't present.
        extra_cols = [c for c in self.extra_tag_cols if c in headers]

        people = []
        for row in rows:
            raw_id = self._cell(row, id_col)
            if not raw_id:
                continue  # row without an identifier is unusable
            person_id = _normalise_id(raw_id) if self.normalise_ids else raw_id
            # Fall back to the raw id when the display cell is blank or missing.
            display_name = self._cell(row, display_col) or raw_id

            tags: list[str] = []
            if tags_col:
                tags = [t.strip() for t in self._cell(row, tags_col).split(",") if t.strip()]
            for col in extra_cols:
                val = self._cell(row, col)
                if val:
                    tags.append(f"{col}:{val}")

            people.append({
                "id": person_id,
                "display_name": display_name,
                "tags": tags,
                "custom": {},
            })

        return {"schema": "rost/staff/v1", "people": people}

    # ── leave ──────────────────────────────────────────────────────────────────

    def to_leave_json(self) -> dict:
        """
        Convert the CSV to canonical rost/leave/v1 JSON.

        Returns:
            dict with ``"schema": "rost/leave/v1"`` and ``"entries"`` list.

        Raises:
            ValueError: if no person_id or start column can be resolved.
        """
        rows = self._read_csv()
        if not rows:
            return {"schema": "rost/leave/v1", "entries": []}

        headers = list(rows[0].keys())
        person_col = self._resolve_col(
            headers, None, ("person_id", "person", "employee_id", "name", "id"), "person_id"
        )
        start_col = self._resolve_col(headers, None, ("start", "start_date", "from"), "start")
        end_col = self._resolve_col(
            headers, None, ("end", "end_date", "to"), required=False
        )
        type_col = self._resolve_col(
            headers, None, ("type", "leave_type", "category"), required=False
        )
        priority_col = self._resolve_col(
            headers, None, ("priority",), required=False
        )
        approved_col = self._resolve_col(
            headers, None, ("approved", "is_approved"), required=False
        )

        entries = []
        for row in rows:
            person_id = self._cell(row, person_col)
            if not person_id:
                continue  # row without a person is unusable
            if self.normalise_ids:
                person_id = _normalise_id(person_id)

            start = self._cell(row, start_col)
            if not start:
                continue  # a leave entry without a start date is unusable

            entry: dict[str, Any] = {"person_id": person_id, "start": start}
            # Optional fields are only emitted when present and non-empty.
            end = self._cell(row, end_col)
            if end:
                entry["end"] = end
            leave_type = self._cell(row, type_col)
            if leave_type:
                entry["type"] = leave_type
            priority = self._cell(row, priority_col)
            if priority:
                entry["priority"] = priority
            approved = self._cell(row, approved_col)
            if approved:
                entry["approved"] = _parse_bool(approved)

            entries.append(entry)

        return {"schema": "rost/leave/v1", "entries": entries}

    # ── internal ───────────────────────────────────────────────────────────────

    def _read_csv(self) -> list[dict[str, str]]:
        """Read the whole CSV file into a list of header-keyed row dicts."""
        with open(self.path, newline="", encoding=self.encoding) as f:
            reader = csv.DictReader(f)
            return list(reader)

    @staticmethod
    def _cell(row: dict[str, str], col: str | None) -> str:
        """
        Return the stripped value of *col* in *row*, or "" when unavailable.

        csv.DictReader pads short rows with None (its default ``restval``),
        so ``row.get(col, "")`` can return None even though a default was
        supplied — the key exists with value None.  Calling ``.strip()`` on
        that would raise AttributeError; coerce None to "" first.
        """
        if col is None:
            return ""
        return (row.get(col) or "").strip()

    @staticmethod
    def _resolve_col(
        headers: list[str],
        explicit: str | None,
        fallbacks: tuple[str, ...],
        label: str = "",
        *,
        required: bool = True,
    ) -> str | None:
        """
        Pick the column name to use: *explicit* when given (must exist in
        *headers*), otherwise the first of *fallbacks* found in *headers*.

        Returns:
            The resolved column name, or None when optional and absent.

        Raises:
            ValueError: if *explicit* is not a header, or no fallback
                matches while *required* is True.
        """
        if explicit is not None:
            if explicit not in headers:
                raise ValueError(
                    f"Column '{explicit}' not found in CSV. "
                    f"Available: {', '.join(headers)}"
                )
            return explicit
        for fb in fallbacks:
            if fb in headers:
                return fb
        if required:
            raise ValueError(
                f"Cannot find a '{label}' column in CSV. "
                f"Available: {', '.join(headers)}. "
                f"Pass the correct column name explicitly."
            )
        return None
|
@@ -0,0 +1,133 @@
|
|
|
1
|
+
"""
|
|
2
|
+
DatabaseAdapter — P1 adapter for PostgreSQL and MySQL via SQLAlchemy.
|
|
3
|
+
|
|
4
|
+
Install: pip install "rost-io[db]"
|
|
5
|
+
For PostgreSQL also: pip install psycopg2-binary (or psycopg)
|
|
6
|
+
For MySQL also: pip install pymysql (or mysqlclient)
|
|
7
|
+
|
|
8
|
+
Usage:
|
|
9
|
+
|
|
10
|
+
adapter = DatabaseAdapter(
|
|
11
|
+
"postgresql+psycopg2://user:pass@host/dbname",
|
|
12
|
+
people_query="SELECT employee_id AS id, full_name AS display_name, roles AS tags FROM staff",
|
|
13
|
+
leave_query='''
|
|
14
|
+
SELECT employee_id AS person_id, leave_start AS start, leave_end AS end,
|
|
15
|
+
leave_type AS type, priority
|
|
16
|
+
FROM employee_leave WHERE approved = TRUE
|
|
17
|
+
''',
|
|
18
|
+
)
|
|
19
|
+
staff_json = adapter.to_staff_json()
|
|
20
|
+
leave_json = adapter.to_leave_json()
|
|
21
|
+
"""
|
|
22
|
+
|
|
23
|
+
from __future__ import annotations
|
|
24
|
+
|
|
25
|
+
from typing import Any
|
|
26
|
+
|
|
27
|
+
from rost_io.base import RostAdapter
|
|
28
|
+
|
|
29
|
+
|
|
30
|
+
class DatabaseAdapter(RostAdapter):
    """
    Reads from a SQL database (PostgreSQL or MySQL) via SQLAlchemy.

    Requires ``sqlalchemy``: pip install "rost-io[db]"
    Also requires a DB driver:
        PostgreSQL → psycopg2-binary or psycopg
        MySQL      → pymysql or mysqlclient

    Column aliases in your SQL queries must match the canonical field names:
        Staff: ``id``, ``display_name`` (optional), ``tags`` (optional)
        Leave: ``person_id``, ``start``, ``end`` (optional), ``type`` (optional),
               ``priority`` (optional), ``approved`` (optional)

    Args:
        connection_string: SQLAlchemy connection URL.
        people_query: SQL that returns the staff rows.
        leave_query: SQL that returns the leave rows (optional).
        tags_separator: Separator for tags when stored as a delimited string.
            Default ``","`` (comma).
    """

    def __init__(
        self,
        connection_string: str,
        *,
        people_query: str = "SELECT * FROM staff",
        leave_query: str | None = None,
        tags_separator: str = ",",
    ) -> None:
        self.connection_string = connection_string
        self.people_query = people_query
        self.leave_query = leave_query
        self.tags_separator = tags_separator
        # Lazily-created, cached SQLAlchemy engine (see _engine()).
        self._engine_cache = None

    def _engine(self):
        """
        Return a cached SQLAlchemy engine, creating it on first use.

        Caching avoids building a fresh engine (and its connection pool)
        for every single query, which the previous implementation did.

        Raises:
            ImportError: if sqlalchemy is not installed.
        """
        if self._engine_cache is None:
            try:
                from sqlalchemy import create_engine
            except ImportError as exc:
                raise ImportError(
                    "sqlalchemy is required for DatabaseAdapter: pip install 'rost-io[db]'"
                ) from exc
            self._engine_cache = create_engine(self.connection_string)
        return self._engine_cache

    def _query(self, sql: str) -> list[dict[str, Any]]:
        """Execute *sql* and return every row as a column-keyed dict."""
        # Resolve the engine first so a missing sqlalchemy raises the
        # friendly ImportError from _engine() rather than a bare one here.
        engine = self._engine()
        from sqlalchemy import text
        with engine.connect() as conn:
            result = conn.execute(text(sql))
            keys = list(result.keys())
            return [dict(zip(keys, row)) for row in result]

    # ── staff ──────────────────────────────────────────────────────────────────

    def to_staff_json(self) -> dict:
        """
        Run ``people_query`` and convert the rows to canonical
        rost/staff/v1 JSON.
        """
        rows = self._query(self.people_query)
        people = []
        for row in rows:
            person_id = str(row.get("id") or row.get("name") or "").strip()
            if not person_id:
                continue  # row without a usable identifier
            tags = self._parse_tags(row.get("tags"))
            # Use `or person_id` rather than a .get default: a SQL NULL
            # arrives as a present key with value None, so the .get default
            # never applies and str(None) would produce the literal "None".
            display_name = str(row.get("display_name") or person_id).strip()
            people.append({
                "id": person_id,
                "display_name": display_name,
                "tags": tags,
                "custom": {},
            })
        return {"schema": "rost/staff/v1", "people": people}

    # ── leave ──────────────────────────────────────────────────────────────────

    def to_leave_json(self) -> dict:
        """
        Run ``leave_query`` and convert the rows to canonical
        rost/leave/v1 JSON.

        Raises:
            ValueError: if no ``leave_query`` was configured.
        """
        if not self.leave_query:
            raise ValueError(
                "DatabaseAdapter: provide leave_query to use to_leave_json()"
            )
        rows = self._query(self.leave_query)
        entries = []
        for row in rows:
            person_id = str(row.get("person_id") or row.get("person") or "").strip()
            start = str(row.get("start") or row.get("start_date") or "").strip()
            if not person_id or not start:
                continue  # both person and start date are mandatory
            entry: dict[str, Any] = {"person_id": person_id, "start": start}
            # Optional fields are only emitted when present and truthy
            # (SQL NULL arrives as None and is skipped).
            if row.get("end") or row.get("end_date"):
                entry["end"] = str(row.get("end") or row.get("end_date")).strip()
            if row.get("type"):
                entry["type"] = str(row["type"]).strip()
            if row.get("priority"):
                entry["priority"] = str(row["priority"]).strip()
            if "approved" in row and row["approved"] is not None:
                entry["approved"] = bool(row["approved"])
            entries.append(entry)
        return {"schema": "rost/leave/v1", "entries": entries}

    # ── helpers ────────────────────────────────────────────────────────────────

    def _parse_tags(self, raw: Any) -> list[str]:
        """
        Normalise a tags value to a list of strings.

        Accepts None (→ []), a native list/array column, or a single
        delimited string split on ``tags_separator``.
        """
        if raw is None:
            return []
        if isinstance(raw, list):
            return [str(t) for t in raw if t]
        return [t.strip() for t in str(raw).split(self.tags_separator) if t.strip()]
|