PyPI - data-sitter - Versions diffs - 0.1.3__tar.gz → 0.1.6__tar.gz - Mend

Files changed (52)

data_sitter-0.1.6/PKG-INFO ADDED

@@ -0,0 +1,220 @@
+Metadata-Version: 2.4
+Name: data-sitter
+Version: 0.1.6
+Summary: A Python library that reads data contracts and generates Pydantic models for seamless data validation.
+Author-email: Lázaro Pereira Candea <lazaro@candea.es>
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+Requires-Dist: python-dotenv==1.0.1
+Requires-Dist: PyYAML==6.0.2
+Requires-Dist: parse_type==0.6.4
+Requires-Dist: pydantic==2.10.5
+Provides-Extra: dev
+Requires-Dist: pytest==8.3.5; extra == "dev"
+Requires-Dist: pytest-cov==6.0.0; extra == "dev"
+Requires-Dist: pytest-mock==3.14.0; extra == "dev"
+Requires-Dist: twine==6.1.0; extra == "dev"
+Requires-Dist: build==1.2.2.post1; extra == "dev"
+# Data-Sitter
+![Coverage](./coverage.svg)
+## Overview
+Data-Sitter is a Python library designed to simplify data validation by converting data contracts into Pydantic models. This allows for easy and efficient validation of structured data, ensuring compliance with predefined rules and constraints.
+## Features
+- Define structured data contracts in JSON format.
+- Generate Pydantic models automatically from contracts.
+- Enforce validation rules at the field level.
+- Support for rule references within the contract.
+## Installation
+```sh
+pip install data-sitter
+```
+## Development and Deployment
+### CI/CD Pipeline
+The project uses GitHub Actions for continuous integration and deployment:
+1. **Pull Request Checks**
+   - Automatically checks if the version has been bumped in `pyproject.toml`
+   - Fails if the version is the same as in the main branch
+   - Ensures every PR includes a version update
+2. **Automatic Releases**
+   - When code is merged to the main branch:
+     - Builds the package
+     - Publishes to PyPI automatically
+   - Uses PyPI API token for secure authentication
+To set up the CI/CD pipeline:
+1. Create a PyPI API token:
+   - Go to [PyPI Account Settings](https://pypi.org/manage/account/)
+   - Create a new API token with "Upload" scope
+   - Copy the token
+2. Add the token to GitHub:
+   - Go to your repository's Settings > Secrets and variables > Actions
+   - Create a new secret named `PYPI_API_TOKEN`
+   - Paste your PyPI API token
+### Setting Up Development Environment
+To set up a development environment with all the necessary tools, install the package with development dependencies:
+```sh
+pip install -e ".[dev]"
+```
+This will install:
+- The package in editable mode
+- Testing tools (pytest, pytest-cov, pytest-mock)
+- Build tools (build, twine)
+### Building the Package
+To build the package, run:
+```sh
+python -m build
+```
+This will create a `dist` directory containing both a source distribution (`.tar.gz`) and a wheel (`.whl`).
+### Deploying to PyPI
+To upload to PyPI:
+```sh
+twine upload dist/*
+```
+You'll be prompted for your PyPI username and password. For security, it's recommended to use an API token instead of your password.
+## Usage
+### Creating a Pydantic Model from a Contract
+To convert a data contract into a Pydantic model, follow these steps:
+```python
+from data_sitter import Contract
+contract_dict = {
+    "name": "test",
+    "fields": [
+        {
+            "name": "FID",
+            "type": "Integer",
+            "rules": ["Positive"]
+        },
+        {
+            "name": "SECCLASS",
+            "type": "String",
+            "rules": [
+                "Validate Not Null",
+                "Value In ['UNCLASSIFIED', 'CLASSIFIED']",
+            ]
+        }
+    ],
+}
+contract = Contract.from_dict(contract_dict)
+pydantic_contract = contract.pydantic_model
+```
+### Using Rule References
+Data-Sitter allows you to define reusable values in the `values` key and reference them in field rules using `$values.[key]`. For example:
+```json
+{
+    "name": "example_contract",
+    "fields": [
+        {
+            "name": "CATEGORY",
+            "type": "String",
+            "rules": ["Value In $values.categories"]
+        },
+        {
+            "name": "NAME",
+            "type": "String",
+            "rules": [
+                "Length Between $values.min_length and $values.max_length"
+            ]
+        }
+    ],
+    "values": {"categories": ["A", "B", "C"], "min_length": 5,"max_length": 50}
+}
+```
+## Available Rules
+The available validation rules can be retrieved programmatically:
+```python
+from data_sitter import RuleRegistry
+rules = RuleRegistry.get_rules_definition()
+print(rules)
+```
+### Rule Definitions
+Below are the available rules grouped by field type:
+#### Base
+- Is not null
+#### String - (Inherits from `Base`)
+- Is not empty
+- Starts with {prefix:String}
+- Ends with {suffix:String}
+- Is not one of {possible_values:Strings}
+- Is one of {possible_values:Strings}
+- Has length between {min_val:Integer} and {max_val:Integer}
+- Has maximum length {max_len:Integer}
+- Has minimum length {min_len:Integer}
+- Is uppercase
+- Is lowercase
+- Matches regex {pattern:String}
+- Is valid email
+- Is valid URL
+- Has no digits
+#### Numeric - (Inherits from `Base`)
+- Is not zero
+- Is positive
+- Is negative
+- Is at least {min_val:Number}
+- Is at most {max_val:Number}
+- Is greater than {threshold:Number}
+- Is less than {threshold:Number}
+- Is not between {min_val:Number} and {max_val:Number}
+- Is between {min_val:Number} and {max_val:Number}
+#### Integer  - (Inherits from `Numeric`)
+#### Float  - (Inherits from `Numeric`)
+- Has at most {decimal_places:Integer} decimal places
+## Contributing
+Contributions are welcome! Feel free to submit issues or pull requests in the [GitHub repository](https://github.com/lcandea/data-sitter).
+## License
+Data-Sitter is licensed under the MIT License.

data_sitter-0.1.6/README.md ADDED

@@ -0,0 +1,202 @@
+# Data-Sitter
+![Coverage](./coverage.svg)
+## Overview
+Data-Sitter is a Python library designed to simplify data validation by converting data contracts into Pydantic models. This allows for easy and efficient validation of structured data, ensuring compliance with predefined rules and constraints.
+## Features
+- Define structured data contracts in JSON format.
+- Generate Pydantic models automatically from contracts.
+- Enforce validation rules at the field level.
+- Support for rule references within the contract.
+## Installation
+```sh
+pip install data-sitter
+```
+## Development and Deployment
+### CI/CD Pipeline
+The project uses GitHub Actions for continuous integration and deployment:
+1. **Pull Request Checks**
+   - Automatically checks if the version has been bumped in `pyproject.toml`
+   - Fails if the version is the same as in the main branch
+   - Ensures every PR includes a version update
+2. **Automatic Releases**
+   - When code is merged to the main branch:
+     - Builds the package
+     - Publishes to PyPI automatically
+   - Uses PyPI API token for secure authentication
+To set up the CI/CD pipeline:
+1. Create a PyPI API token:
+   - Go to [PyPI Account Settings](https://pypi.org/manage/account/)
+   - Create a new API token with "Upload" scope
+   - Copy the token
+2. Add the token to GitHub:
+   - Go to your repository's Settings > Secrets and variables > Actions
+   - Create a new secret named `PYPI_API_TOKEN`
+   - Paste your PyPI API token
+### Setting Up Development Environment
+To set up a development environment with all the necessary tools, install the package with development dependencies:
+```sh
+pip install -e ".[dev]"
+```
+This will install:
+- The package in editable mode
+- Testing tools (pytest, pytest-cov, pytest-mock)
+- Build tools (build, twine)
+### Building the Package
+To build the package, run:
+```sh
+python -m build
+```
+This will create a `dist` directory containing both a source distribution (`.tar.gz`) and a wheel (`.whl`).
+### Deploying to PyPI
+To upload to PyPI:
+```sh
+twine upload dist/*
+```
+You'll be prompted for your PyPI username and password. For security, it's recommended to use an API token instead of your password.
+## Usage
+### Creating a Pydantic Model from a Contract
+To convert a data contract into a Pydantic model, follow these steps:
+```python
+from data_sitter import Contract
+contract_dict = {
+    "name": "test",
+    "fields": [
+        {
+            "name": "FID",
+            "type": "Integer",
+            "rules": ["Positive"]
+        },
+        {
+            "name": "SECCLASS",
+            "type": "String",
+            "rules": [
+                "Validate Not Null",
+                "Value In ['UNCLASSIFIED', 'CLASSIFIED']",
+            ]
+        }
+    ],
+}
+contract = Contract.from_dict(contract_dict)
+pydantic_contract = contract.pydantic_model
+```
+### Using Rule References
+Data-Sitter allows you to define reusable values in the `values` key and reference them in field rules using `$values.[key]`. For example:
+```json
+{
+    "name": "example_contract",
+    "fields": [
+        {
+            "name": "CATEGORY",
+            "type": "String",
+            "rules": ["Value In $values.categories"]
+        },
+        {
+            "name": "NAME",
+            "type": "String",
+            "rules": [
+                "Length Between $values.min_length and $values.max_length"
+            ]
+        }
+    ],
+    "values": {"categories": ["A", "B", "C"], "min_length": 5,"max_length": 50}
+}
+```
+## Available Rules
+The available validation rules can be retrieved programmatically:
+```python
+from data_sitter import RuleRegistry
+rules = RuleRegistry.get_rules_definition()
+print(rules)
+```
+### Rule Definitions
+Below are the available rules grouped by field type:
+#### Base
+- Is not null
+#### String - (Inherits from `Base`)
+- Is not empty
+- Starts with {prefix:String}
+- Ends with {suffix:String}
+- Is not one of {possible_values:Strings}
+- Is one of {possible_values:Strings}
+- Has length between {min_val:Integer} and {max_val:Integer}
+- Has maximum length {max_len:Integer}
+- Has minimum length {min_len:Integer}
+- Is uppercase
+- Is lowercase
+- Matches regex {pattern:String}
+- Is valid email
+- Is valid URL
+- Has no digits
+#### Numeric - (Inherits from `Base`)
+- Is not zero
+- Is positive
+- Is negative
+- Is at least {min_val:Number}
+- Is at most {max_val:Number}
+- Is greater than {threshold:Number}
+- Is less than {threshold:Number}
+- Is not between {min_val:Number} and {max_val:Number}
+- Is between {min_val:Number} and {max_val:Number}
+#### Integer  - (Inherits from `Numeric`)
+#### Float  - (Inherits from `Numeric`)
+- Has at most {decimal_places:Integer} decimal places
+## Contributing
+Contributions are welcome! Feel free to submit issues or pull requests in the [GitHub repository](https://github.com/lcandea/data-sitter).
+## License
+Data-Sitter is licensed under the MIT License.
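
Note on the "Using Rule References" section above: the README states that `$values.[key]` tokens in a rule refer to entries of the contract's `values` mapping, but the code that resolves them (`RuleParser`) is not shown in this part of the diff. The sketch below illustrates one plausible way such a substitution could work, using only the standard library. `resolve_references` and the `REFERENCE` pattern are hypothetical names, and the real parser may resolve references quite differently.

```python
import re
from typing import Any, Dict

# Matches tokens such as "$values.min_length" (assumed token syntax).
REFERENCE = re.compile(r"\$values\.(\w+)")


def resolve_references(rule: str, values: Dict[str, Any]) -> str:
    """Replace each $values.<key> token with the literal it refers to."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"Unknown reference: $values.{key}")
        return repr(values[key])

    return REFERENCE.sub(substitute, rule)


values = {"categories": ["A", "B", "C"], "min_length": 5, "max_length": 50}
print(resolve_references("Value In $values.categories", values))
print(resolve_references("Length Between $values.min_length and $values.max_length", values))
```

Raising on an unknown key, rather than leaving the token in place, makes a misspelled reference fail when the contract is loaded instead of at validation time.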

data_sitter-0.1.6/data_sitter/Contract.py ADDED

@@ -0,0 +1,129 @@
+import json
+import yaml
+from typing import Any, Dict, List, NamedTuple
+from functools import cached_property
+from pydantic import BaseModel
+from .Validation import Validation
+from .field_types import BaseField
+from .FieldResolver import FieldResolver
+from .rules import ProcessedRule, RuleRegistry, RuleParser
+class ContractWithoutFields(Exception):
+    pass
+class ContractWithoutName(Exception):
+    pass
+class Field(NamedTuple):
+    name: str
+    type: str
+    rules: List[str]
+class Contract:
+    name: str
+    fields: List[Field]
+    rule_parser: RuleParser
+    field_resolvers: Dict[str, FieldResolver]
+    def __init__(self, name: str, fields: List[Field], values: Dict[str, Any]) -> None:
+        self.name = name
+        self.fields = fields
+        self.rule_parser = RuleParser(values)
+        self.field_resolvers = {
+            _type: FieldResolver(RuleRegistry.get_type(_type), self.rule_parser)
+            for _type in list({field.type for field in self.fields})  # Unique types
+        }
+    @classmethod
+    def from_dict(cls, contract_dict: dict):
+        if "name" not in contract_dict:
+            raise ContractWithoutName()
+        if "fields" not in contract_dict:
+            raise ContractWithoutFields()
+        return cls(
+            name=contract_dict["name"],
+            fields=[Field(**field) for field in contract_dict["fields"]],
+            values=contract_dict.get("values", {}),
+        )
+    @classmethod
+    def from_json(cls, contract_json: str):
+        return cls.from_dict(json.loads(contract_json))
+    @classmethod
+    def from_yaml(cls, contract_yaml: str):
+        return cls.from_dict(yaml.load(contract_yaml, yaml.Loader))
+    @cached_property
+    def field_validators(self) -> Dict[str, BaseField]:
+        field_validators = {}
+        for field in self.fields:
+            field_resolver = self.field_resolvers[field.type]
+            field_validators[field.name] = field_resolver.get_field_validator(field.name, field.rules)
+        return field_validators
+    @cached_property
+    def rules(self) -> Dict[str, List[ProcessedRule]]:
+        rules = {}
+        for field in self.fields:
+            field_resolver = self.field_resolvers[field.type]
+            rules[field.name] = field_resolver.get_processed_rules(field.rules)
+        return rules
+    def validate(self, item: dict) -> Validation:
+        return Validation.validate(self.pydantic_model, item)
+    @cached_property
+    def pydantic_model(self) -> BaseModel:
+        return type(self.name, (BaseModel,), {
+            "__annotations__": {
+                name: field_validator.get_annotation()
+                for name, field_validator in self.field_validators.items()
+            }
+        })
+    @cached_property
+    def contract(self) -> dict:
+        return {
+            "name": self.name,
+            "fields": [
+                {
+                    "name": name,
+                    "type": field_validator.type_name.value,
+                    "rules": [rule.parsed_rule for rule in self.rules.get(name, [])]
+                }
+                for name, field_validator in self.field_validators.items()
+            ],
+            "values": self.rule_parser.values
+        }
+    def get_json_contract(self, indent: int=2) -> str:
+        return json.dumps(self.contract, indent=indent)
+    def get_yaml_contract(self, indent: int=2) -> str:
+        return yaml.dump(self.contract, Dumper=yaml.Dumper, indent=indent, sort_keys=False)
+    def get_front_end_contract(self) -> dict:
+        return {
+            "name": self.name,
+            "fields": [
+                {
+                    "name": name,
+                    "type": field_validator.type_name.value,
+                    "rules": [
+                        rule.get_front_end_repr()
+                        for rule in self.rules.get(name, [])
+                    ]
+                }
+                for name, field_validator in self.field_validators.items()
+            ],
+            "values": self.rule_parser.values
+        }
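
Note on `Contract.pydantic_model` above: the property builds the model with the three-argument form of `type()`, passing a namespace that contains only `__annotations__`. The sketch below reproduces that technique with a plain stand-in base class, so it runs without Pydantic installed. `StandInBase` and `build_model` are hypothetical names, and the field types are illustrative rather than the annotations that `get_annotation()` actually returns.

```python
from typing import Any, Dict


class StandInBase:
    """Stand-in for pydantic.BaseModel so the sketch needs only the standard library."""


def build_model(name: str, annotations: Dict[str, Any]) -> type:
    """Create a class at runtime; the return value is a class, not an instance."""
    return type(name, (StandInBase,), {"__annotations__": dict(annotations)})


Model = build_model("test", {"FID": int, "SECCLASS": str})
print(Model.__name__, Model.__annotations__)
```

Since the result is a class, a return annotation of `Type[BaseModel]` would describe the property more precisely than `BaseModel`. Separately, `from_yaml` calls `yaml.load` with `yaml.Loader`, which can construct arbitrary Python objects; `yaml.safe_load` is the usual choice when a contract may come from an untrusted source.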

data_sitter-0.1.6/data_sitter/FieldResolver.py ADDED

@@ -0,0 +1,62 @@
+from typing import  Dict, List, Type, Union
+from .field_types import BaseField
+from .rules import Rule, ProcessedRule, LogicalRule, MatchedRule, RuleRegistry, LogicalOperator
+from .rules.Parser import RuleParser
+class RuleNotFoundError(Exception):
+    """No matching rule found for the given parsed rule."""
+class MalformedLogicalRuleError(Exception):
+    """Logical rule structure not recognised."""
+class FieldResolver:
+    field_class: Type[BaseField]
+    rule_parser: RuleParser
+    rules: List[Rule]
+    _match_rule_cache: Dict[str, MatchedRule]
+    def __init__(self, field_class: Type[BaseField], rule_parser: RuleParser) -> None:
+        self.field_class = field_class
+        self.rule_parser = rule_parser
+        self.rules = RuleRegistry.get_rules_for(field_class)
+        self._match_rule_cache = {}
+    def get_field_validator(self, name: str, parsed_rules: List[Union[str, dict]]) -> BaseField:
+        field_validator = self.field_class(name)
+        processed_rules = self.get_processed_rules(parsed_rules)
+        validators = [pr.get_validator(field_validator) for pr in processed_rules]
+        field_validator.validators = validators
+        return field_validator
+    def get_processed_rules(self, parsed_rules: List[Union[str, dict]]) -> List[ProcessedRule]:
+        processed_rules = []
+        for parsed_rule in parsed_rules:
+            if isinstance(parsed_rule, dict):
+                if len(keys := tuple(parsed_rule)) != 1 or (operator := keys[0]) not in LogicalOperator:
+                    raise MalformedLogicalRuleError()
+                if operator == LogicalOperator.NOT and not isinstance(parsed_rule[operator], list):
+                    parsed_rule = {operator: [parsed_rule[operator]]}  # NOT operator can be a single rule
+                processed_rule = LogicalRule(operator, self.get_processed_rules(parsed_rule[operator]))
+            elif isinstance(parsed_rule, str):
+                processed_rule = self._match_rule(parsed_rule)
+                if not processed_rule:
+                    raise RuleNotFoundError(f"Rule not found for parsed rule: '{parsed_rule}'")
+            else:
+                raise TypeError(f'Parsed Rule type not recognised: {type(parsed_rule)}')
+            processed_rules.append(processed_rule)
+        return processed_rules
+    def _match_rule(self, parsed_rule: str) -> MatchedRule:
+        if parsed_rule in self._match_rule_cache:
+            return self._match_rule_cache[parsed_rule]
+        for rule in self.rules:
+            matched_rule = self.rule_parser.match(rule, parsed_rule)
+            if matched_rule:
+                self._match_rule_cache[parsed_rule] = matched_rule
+                return matched_rule
+        return None
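
Note on `FieldResolver.get_processed_rules` above: a rule is either a string or a dict with exactly one key, a logical operator, whose value is a list of nested rules; a bare operand under NOT is wrapped in a list. The sketch below mirrors that shape check and recursion with the standard library only. The operator names in `OPERATORS` are an assumption, because the members of `LogicalOperator` are not shown in this part of the diff. `normalise` is a hypothetical helper that returns plain data rather than `LogicalRule` objects.

```python
from typing import List, Union

RuleSpec = Union[str, dict]
OPERATORS = {"AND", "OR", "NOT"}  # assumed operator names, not confirmed by the diff


def normalise(rules: List[RuleSpec]) -> List[RuleSpec]:
    """Validate logical rules and wrap a bare NOT operand in a list."""
    result: List[RuleSpec] = []
    for rule in rules:
        if isinstance(rule, str):
            result.append(rule)  # plain rule text is kept as-is
        elif isinstance(rule, dict):
            keys = tuple(rule)
            if len(keys) != 1 or keys[0] not in OPERATORS:
                raise ValueError(f"Malformed logical rule: {rule!r}")
            operator = keys[0]
            operands = rule[operator]
            if operator == "NOT" and not isinstance(operands, list):
                operands = [operands]  # NOT may wrap a single rule
            result.append({operator: normalise(operands)})
        else:
            raise TypeError(f"Rule type not recognised: {type(rule)}")
    return result


rules = [
    "Is not empty",
    {"NOT": "Is uppercase"},
    {"OR": ["Is lowercase", {"NOT": ["Has no digits"]}]},
]
print(normalise(rules))
```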

data_sitter-0.1.6/data_sitter/Validation.py ADDED

@@ -0,0 +1,39 @@
+from collections import defaultdict
+from typing import Any, Dict, List, Type
+from pydantic import BaseModel, ValidationError
+class Validation():
+    item: Dict[str, Any]
+    errors: Dict[str, List[str]]
+    unknowns: Dict[str, Any]
+    def __init__(self, item: dict, errors: dict = None, unknowns: dict = None):
+        self.item = item
+        self.errors = errors if errors else None
+        self.unknowns = unknowns if unknowns else None
+    def to_dict(self) -> dict:
+        return {key: value for key in ["item", "errors", "unknowns"] if (value := getattr(self, key))}
+    @classmethod
+    def validate(cls, PydanticModel: Type[BaseModel], input_item: dict) -> "Validation":
+        model_keys = PydanticModel.model_json_schema()['properties'].keys()
+        item = {key: None for key in model_keys}  # Filling not present values with Nones
+        errors = defaultdict(list)
+        unknowns = {}
+        for key, value in input_item.items():
+            if key in item:
+                item[key] = value
+            else:
+                unknowns[key] = value
+        try:
+            validated = PydanticModel(**item).model_dump()
+        except ValidationError as e:
+            validated = item
+            for error in e.errors():
+                field = error['loc'][0]  # Extract the field name
+                msg = error['msg']
+                errors[field].append(msg)
+        return Validation(item=validated, errors=dict(errors), unknowns=unknowns)
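
Note on `Validation.validate` above: before the Pydantic model is instantiated, the input is split into keys the model knows (missing ones filled with `None`) and unknown keys that are reported separately. The sketch below isolates that partitioning step so it can be run without Pydantic. `partition_item` is a hypothetical helper, and the key list stands in for the model's `model_json_schema()['properties']` keys.

```python
from typing import Any, Dict, Iterable, Tuple


def partition_item(
    model_keys: Iterable[str], input_item: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split input into model fields (missing ones set to None) and unknown keys."""
    item: Dict[str, Any] = {key: None for key in model_keys}
    unknowns: Dict[str, Any] = {}
    for key, value in input_item.items():
        if key in item:
            item[key] = value
        else:
            unknowns[key] = value
    return item, unknowns


item, unknowns = partition_item(["FID", "SECCLASS"], {"FID": 7, "EXTRA": "x"})
print(item, unknowns)
```

Because `to_dict` in the diff keeps only truthy attributes, an empty `errors` or `unknowns` mapping is left out of the serialised result.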

{data_sitter-0.1.3 → data_sitter-0.1.6}/data_sitter/cli.py RENAMED

@@ -44,5 +44,5 @@ def main():
     print(f"The file {args.file} pass the contract {args.contract}")
-if __name__ == '__main__':
+if __name__ == '__main__':  # pragma: no cover
     main()
