PyPI - TestDataX - Versions diffs - 0.1.1__tar.gz → 0.2.0__tar.gz - Mend

Files changed (32)

{testdatax-0.1.1 → testdatax-0.2.0}/PKG-INFO RENAMED

@@ -1,8 +1,9 @@
-Metadata-Version: 2.3
+Metadata-Version: 2.4
 Name: TestDataX
-Version: 0.1.1
+Version: 0.2.0
 Summary: A flexible test data generation toolkit
 License: MIT
+License-File: LICENSE
 Author: JamesPBrett
 Requires-Python: >=3.11,<4.0
 Classifier: License :: OSI Approved :: MIT License
@@ -10,9 +11,10 @@ Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3.14
 Requires-Dist: faker (>=33.1.0,<34.0.0)
+Requires-Dist: mimesis (>=18.0.0,<19.0.0)
 Requires-Dist: mysql-connector-python (>=9.1.0,<10.0.0)
-Requires-Dist: orjson (>=3.10.12,<4.0.0)
 Requires-Dist: pandas (>=2.2.3,<3.0.0)
 Requires-Dist: pyarrow (>=18.1.0,<19.0.0)
 Requires-Dist: pydantic (>=2.10.4,<3.0.0)
@@ -21,14 +23,12 @@ Description-Content-Type: text/markdown
 # TestDataX
-# TestDataX
 ![Build Status](https://github.com/JamesPBrett/testdatax/actions/workflows/publish.yml/badge.svg)
 [![codecov](https://codecov.io/gh/JamesPBrett/testdatax/branch/main/graph/badge.svg?token=6VX62CI6U9)](https://codecov.io/gh/JamesPBrett/testdatax)
 ![Python Version](https://img.shields.io/badge/python-3.11%2B-blue)
 ![License](https://img.shields.io/badge/license-MIT-blue.svg)
-This command-line interface application enables quick and customizable test data generation across various formats. It leverages Faker for realistic data fields, offers flexible schema configurations, and simplifies output to multiple database dialects or file types. Users can define precise parameters for data volume, types, and constraints for each target data set.
+This command-line interface application enables quick and customizable test data generation across various formats. It supports multiple data providers (Mimesis and Faker) for realistic data generation, offers flexible schema configurations, and simplifies output to multiple database dialects or file types. Users can define precise parameters for data volume, types, and constraints for each target data set.
 ## Requirements
 - Python 3.11+
@@ -41,11 +41,11 @@ pip install testdatax
 # Generate sample data
 testdatax --rows 1000 --format json --output data.json
+```
 ## Features
-- Generate realistic test data using Data providers
+- Generate realistic test data using multiple data providers (Mimesis, Faker)
 - Support for multiple output formats (CSV, JSON, SQL, etc.)
 - Customizable schema definitions
 - Configurable data generation parameters
@@ -63,7 +63,7 @@ testdatax --rows 1000 --format json --output data.json
 ## CLI Usage
 ```bash
-testdatax -o <output_file> -f <format> -s <schema_file> -r <num_rows> [-d]
+testdatax -o <output_file> -f <format> -s <schema_file> -r <num_rows> -p <provider> [-d]
 ```
 Options:
@@ -71,6 +71,7 @@ Options:
 - `-f, --format`: Output format (csv, json, orc, parquet, mysql, mssql, oracle)
 - `-r, --rows`: Number of rows to generate (default: 10)
 - `-s, --schema`: Path to schema file
+- `-p, --provider`: Data provider (mimesis, faker) - default: mimesis
 - `-d, --debug`: Enable debug output
 ## Usage Examples
@@ -80,10 +81,20 @@ Generate 10 rows of CSV data:
 testdatax -o users.csv -f csv -s schema.json -r 10
 ```
+Generate 10 rows of CSV data using Faker provider:
+```bash
+testdatax -o users.csv -f csv -s schema.json -r 10 -p faker
+```
 Generate 1000 rows of Parquet data with debug output:
 ```bash
 testdatax -o large_dataset.parquet -f parquet -s users_schema.json -r 1000 -d
 ```
+Generate 1000 rows of Parquet data using Mimesis provider:
+```bash
+testdatax -o large_dataset.parquet -f parquet -s users_schema.json -r 1000 -p mimesis
+```
 Generate JSON data with default row count (10):
 ```bash
 testdatax -o data.json -f json -s schema.json
@@ -106,7 +117,7 @@ testdatax -o mstest.sql -f mssql -r 1000
 Generate Oracle with default row count (1000), table_name as 'oracle':
 ```bash
-datagen -o oracle.sql -f oracle -r 1000
+testdatax -o oracle.sql -f oracle -r 1000
 ```
 Each command consists of:
@@ -114,6 +125,7 @@ Each command consists of:
 - `-f, --format`: Output format (csv, json, orc, parquet, mysql, mssql, oracle)
 - `-s, --schema`: Path to your schema definition file
 - `-r, --rows`: Number of rows to generate (optional, defaults to 10)
+- `-p, --provider`: Data provider (mimesis, faker) - default: mimesis
 - `-d, --debug`: Enable debug logging (optional)
 ## Schema Example
@@ -122,7 +134,7 @@ Each command consists of:
 {
   "username": {
     "type": "string",
-    "faker": "name"
+    "provider_field": "name"
   },
   "date_joined": {
     "type": "datetime"
@@ -169,7 +181,7 @@ The schema file defines the structure and constraints of your generated data. Ea
     "type": "string",
     "min_length": 5,
     "max_length": 20,
-    "faker": "user_name"  // Use faker to generate realistic data
+    "provider_field": "user_name"  // Use provider-specific field to generate realistic data
   },
   "description": {
     "type": "text",
@@ -211,6 +223,12 @@ The schema file defines the structure and constraints of your generated data. Ea
 }
 ```
+> **Note:** `start_date`/`end_date` bound the generated range (inclusive). When
+> `format` is set, date/datetime values are rendered to a string with
+> `strftime`; for the SQL exporters this means the column receives a formatted
+> string literal rather than a native date, so `format` is best suited to the
+> CSV/JSON formats.
 #### Enum Fields
 ```json
 {
@@ -222,25 +240,25 @@ The schema file defines the structure and constraints of your generated data. Ea
 }
 ```
-#### Using Faker
-The generator supports Faker providers for generating realistic data:
+#### Using Data Providers
+Both Mimesis and Faker providers support the same schema format. You can specify provider-specific generators using the `provider_field` field (works with both providers):
 ```json
 {
   "name": {
     "type": "string",
-    "faker": "name"
+    "provider_field": "name"
   },
   "email": {
     "type": "string",
-    "faker": "email"
+    "provider_field": "email"
   },
   "address": {
     "type": "string",
-    "faker": "address"
+    "provider_field": "address"
   },
   "company": {
     "type": "string",
-    "faker": "company"
+    "provider_field": "company"
   }
 }
 ```
@@ -254,12 +272,12 @@ The generator supports Faker providers for generating realistic data:
   },
   "username": {
     "type": "string",
-    "faker": "user_name",
+    "provider_field": "user_name",
     "unique": true
   },
   "email": {
     "type": "string",
-    "faker": "email",
+    "provider_field": "email",
     "unique": true
   },
   "age": {
@@ -284,6 +302,37 @@ The generator supports Faker providers for generating realistic data:
 }
 ```
+## Data Providers
+TestDataX supports two powerful data providers for generating realistic test data:
+### Mimesis (Default)
+Mimesis is a high-performance Python library for generating synthetic data. It provides:
+- Fast data generation with excellent performance
+- Support for multiple locales and languages
+- Wide variety of data providers for different domains
+- Lightweight and efficient implementation
+### Faker
+Faker is a popular Python library for generating fake data. It offers:
+- Extensive provider ecosystem with community contributions
+- Rich set of localized providers
+- Well-established and widely used in the Python community
+- Comprehensive documentation and examples
+You can specify the provider using the `-p` or `--provider` option:
+```bash
+# Use Mimesis (default)
+testdatax -o data.csv -f csv -p mimesis
+# Use Faker
+testdatax -o data.csv -f csv -p faker
+```
+Both providers support the same schema format and generate compatible data types.
+**Note:** For backward compatibility, the legacy `faker` field name is still supported, but `provider_field` is recommended for new schemas.
 ## Supported Data Types
 - string

{testdatax-0.1.1 → testdatax-0.2.0}/README.md RENAMED

@@ -1,13 +1,11 @@
 # TestDataX
-# TestDataX
 ![Build Status](https://github.com/JamesPBrett/testdatax/actions/workflows/publish.yml/badge.svg)
 [![codecov](https://codecov.io/gh/JamesPBrett/testdatax/branch/main/graph/badge.svg?token=6VX62CI6U9)](https://codecov.io/gh/JamesPBrett/testdatax)
 ![Python Version](https://img.shields.io/badge/python-3.11%2B-blue)
 ![License](https://img.shields.io/badge/license-MIT-blue.svg)
-This command-line interface application enables quick and customizable test data generation across various formats. It leverages Faker for realistic data fields, offers flexible schema configurations, and simplifies output to multiple database dialects or file types. Users can define precise parameters for data volume, types, and constraints for each target data set.
+This command-line interface application enables quick and customizable test data generation across various formats. It supports multiple data providers (Mimesis and Faker) for realistic data generation, offers flexible schema configurations, and simplifies output to multiple database dialects or file types. Users can define precise parameters for data volume, types, and constraints for each target data set.
 ## Requirements
 - Python 3.11+
@@ -20,11 +18,11 @@ pip install testdatax
 # Generate sample data
 testdatax --rows 1000 --format json --output data.json
+```
 ## Features
-- Generate realistic test data using Data providers
+- Generate realistic test data using multiple data providers (Mimesis, Faker)
 - Support for multiple output formats (CSV, JSON, SQL, etc.)
 - Customizable schema definitions
 - Configurable data generation parameters
@@ -42,7 +40,7 @@ testdatax --rows 1000 --format json --output data.json
 ## CLI Usage
 ```bash
-testdatax -o <output_file> -f <format> -s <schema_file> -r <num_rows> [-d]
+testdatax -o <output_file> -f <format> -s <schema_file> -r <num_rows> -p <provider> [-d]
 ```
 Options:
@@ -50,6 +48,7 @@ Options:
 - `-f, --format`: Output format (csv, json, orc, parquet, mysql, mssql, oracle)
 - `-r, --rows`: Number of rows to generate (default: 10)
 - `-s, --schema`: Path to schema file
+- `-p, --provider`: Data provider (mimesis, faker) - default: mimesis
 - `-d, --debug`: Enable debug output
 ## Usage Examples
@@ -59,10 +58,20 @@ Generate 10 rows of CSV data:
 testdatax -o users.csv -f csv -s schema.json -r 10
 ```
+Generate 10 rows of CSV data using Faker provider:
+```bash
+testdatax -o users.csv -f csv -s schema.json -r 10 -p faker
+```
 Generate 1000 rows of Parquet data with debug output:
 ```bash
 testdatax -o large_dataset.parquet -f parquet -s users_schema.json -r 1000 -d
 ```
+Generate 1000 rows of Parquet data using Mimesis provider:
+```bash
+testdatax -o large_dataset.parquet -f parquet -s users_schema.json -r 1000 -p mimesis
+```
 Generate JSON data with default row count (10):
 ```bash
 testdatax -o data.json -f json -s schema.json
@@ -85,7 +94,7 @@ testdatax -o mstest.sql -f mssql -r 1000
 Generate Oracle with default row count (1000), table_name as 'oracle':
 ```bash
-datagen -o oracle.sql -f oracle -r 1000
+testdatax -o oracle.sql -f oracle -r 1000
 ```
 Each command consists of:
@@ -93,6 +102,7 @@ Each command consists of:
 - `-f, --format`: Output format (csv, json, orc, parquet, mysql, mssql, oracle)
 - `-s, --schema`: Path to your schema definition file
 - `-r, --rows`: Number of rows to generate (optional, defaults to 10)
+- `-p, --provider`: Data provider (mimesis, faker) - default: mimesis
 - `-d, --debug`: Enable debug logging (optional)
 ## Schema Example
@@ -101,7 +111,7 @@ Each command consists of:
 {
   "username": {
     "type": "string",
-    "faker": "name"
+    "provider_field": "name"
   },
   "date_joined": {
     "type": "datetime"
@@ -148,7 +158,7 @@ The schema file defines the structure and constraints of your generated data. Ea
     "type": "string",
     "min_length": 5,
     "max_length": 20,
-    "faker": "user_name"  // Use faker to generate realistic data
+    "provider_field": "user_name"  // Use provider-specific field to generate realistic data
   },
   "description": {
     "type": "text",
@@ -190,6 +200,12 @@ The schema file defines the structure and constraints of your generated data. Ea
 }
 ```
+> **Note:** `start_date`/`end_date` bound the generated range (inclusive). When
+> `format` is set, date/datetime values are rendered to a string with
+> `strftime`; for the SQL exporters this means the column receives a formatted
+> string literal rather than a native date, so `format` is best suited to the
+> CSV/JSON formats.
 #### Enum Fields
 ```json
 {
@@ -201,25 +217,25 @@ The schema file defines the structure and constraints of your generated data. Ea
 }
 ```
-#### Using Faker
-The generator supports Faker providers for generating realistic data:
+#### Using Data Providers
+Both Mimesis and Faker providers support the same schema format. You can specify provider-specific generators using the `provider_field` field (works with both providers):
 ```json
 {
   "name": {
     "type": "string",
-    "faker": "name"
+    "provider_field": "name"
   },
   "email": {
     "type": "string",
-    "faker": "email"
+    "provider_field": "email"
   },
   "address": {
     "type": "string",
-    "faker": "address"
+    "provider_field": "address"
   },
   "company": {
     "type": "string",
-    "faker": "company"
+    "provider_field": "company"
   }
 }
 ```
@@ -233,12 +249,12 @@ The generator supports Faker providers for generating realistic data:
   },
   "username": {
     "type": "string",
-    "faker": "user_name",
+    "provider_field": "user_name",
     "unique": true
   },
   "email": {
     "type": "string",
-    "faker": "email",
+    "provider_field": "email",
     "unique": true
   },
   "age": {
@@ -263,6 +279,37 @@ The generator supports Faker providers for generating realistic data:
 }
 ```
+## Data Providers
+TestDataX supports two powerful data providers for generating realistic test data:
+### Mimesis (Default)
+Mimesis is a high-performance Python library for generating synthetic data. It provides:
+- Fast data generation with excellent performance
+- Support for multiple locales and languages
+- Wide variety of data providers for different domains
+- Lightweight and efficient implementation
+### Faker
+Faker is a popular Python library for generating fake data. It offers:
+- Extensive provider ecosystem with community contributions
+- Rich set of localized providers
+- Well-established and widely used in the Python community
+- Comprehensive documentation and examples
+You can specify the provider using the `-p` or `--provider` option:
+```bash
+# Use Mimesis (default)
+testdatax -o data.csv -f csv -p mimesis
+# Use Faker
+testdatax -o data.csv -f csv -p faker
+```
+Both providers support the same schema format and generate compatible data types.
+**Note:** For backward compatibility, the legacy `faker` field name is still supported, but `provider_field` is recommended for new schemas.
 ## Supported Data Types
 - string

{testdatax-0.1.1 → testdatax-0.2.0}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "TestDataX"
-version = "0.1.1"
+version = "0.2.0"
 description = "A flexible test data generation toolkit"
 authors = ["JamesPBrett"]
 license = "MIT"
@@ -12,10 +12,10 @@ python = "^3.11"
 typer = "^0.15.1"
 faker = "^33.1.0"
 pydantic = "^2.10.4"
-orjson = "^3.10.12"
 pyarrow = "^18.1.0"
 pandas = "^2.2.3"
 mysql-connector-python = "^9.1.0"
+mimesis = "^18.0.0"
 [tool.poetry.group.dev.dependencies]
 pytest = "^8.3.4"
@@ -38,13 +38,11 @@ types-psutil = "^6.1.0.20241221"
 commitizen = "^3.13.0"
 python-semantic-release = "^9.17.0"
-[build-system]
-requires = ["poetry-core"]
-build-backend = "poetry.core.masonry.api"
 [tool.poetry.scripts]
 testdatax = "src.cli:app"
 [tool.ruff]
 # Same as Black
 line-length = 88
@@ -84,6 +82,7 @@ exclude = [
 [tool.ruff.lint.isort]
 known-first-party = ["src"]
 [tool.black]
 line-length = 88
 target-version = ['py311']
@@ -119,6 +118,15 @@ warn_unreachable = true
 strict_optional = true
 plugins = ["pydantic.mypy"]
+[[tool.mypy.overrides]]
+module = "mimesis.*"
+ignore_missing_imports = true
+[[tool.mypy.overrides]]
+module = "src.providers.mimesis_provider"
+warn_return_any = false
 [tool.coverage.run]
 source = ["src"]
 branch = true
@@ -132,15 +140,17 @@ exclude_lines = [
     "pass",
 ]
 [tool.pytest.ini_options]
 testpaths = ["tests"]
 python_files = ["test_*.py"]
 python_classes = ["Test*"]
 python_functions = ["test_*"]
 [tool.commitizen]
 name = "cz_conventional_commits"
-version = "0.1.0"
+version = "0.1.3"
 tag_format = "v$version"
 version_files = [
     "src/__init__.py:__version__",
@@ -214,3 +224,8 @@ allowed_tags = [
     "chore",    # Maintenance tasks
     "refactor", # Code changes without fixing bugs or adding features
 ]
+[build-system]
+requires = ["poetry-core"]
+build-backend = "poetry.core.masonry.api"

{testdatax-0.1.1 → testdatax-0.2.0}/src/__init__.py RENAMED

@@ -1,6 +1,6 @@
 """TestDataX package initialization."""
-__version__ = "0.1.1"
+__version__ = "0.2.0"
 from src.cli import app  # noqa

{testdatax-0.1.1 → testdatax-0.2.0}/src/cli.py RENAMED

@@ -9,6 +9,7 @@ from .exporters.base_exporter import BaseExporter
 from .exporters.utils.constants import DEFAULT_SCHEMA, EXPORT_FORMATS
 from .exporters.utils.exporter_config import EXPORTER_CLASSES
 from .generator import DataGenerator
+from .providers import FakerProvider, MimesisProvider
 from .schemas import DataType, FieldSchema, GeneratorConfig
@@ -42,6 +43,9 @@ FORMAT_OPTION = typer.Option(
 ROWS_OPTION = typer.Option(10, "--rows", "-r", help="Number of rows to generate")
 SCHEMA_PATH_OPTION = typer.Option(None, "--schema", "-s", help="Path to schema file")
 DEBUG_OPTION = typer.Option(False, "--debug", "-d", help="Enable debug output")
+PROVIDER_OPTION = typer.Option(
+    "mimesis", "--provider", "-p", help="Data provider (faker or mimesis)"
+)
 @app.command()
@@ -51,6 +55,7 @@ def generate(
     rows: int = ROWS_OPTION,
     schema_path: Path | None = SCHEMA_PATH_OPTION,
     debug: bool = DEBUG_OPTION,
+    provider: str = PROVIDER_OPTION,
 ) -> None:
     """Generate synthetic data based on the provided schema."""
     try:
@@ -97,15 +102,30 @@ def generate(
                             f"{min_value}, {max_value}"
                         )
+                # Accept "precision" as an alias for "right_digits"; use an
+                # explicit None check so an intentional 0 is not dropped.
+                right_digits = field_def.get("right_digits")
+                if right_digits is None:
+                    right_digits = field_def.get("precision")
                 field_schema = FieldSchema(
                     name=name,
                     type=field_type,
                     enum_values=field_def.get("values"),
                     min_value=min_value,
                     max_value=max_value,
-                    right_digits=field_def.get("right_digits"),
-                    value_provider=field_def.get("faker"),
+                    right_digits=right_digits,
+                    value_provider=field_def.get("provider_field")
+                    or field_def.get("faker"),
                     pattern=field_def.get("pattern"),
+                    nullable=field_def.get("nullable", False),
+                    unique=field_def.get("unique", False),
+                    weights=field_def.get("weights"),
+                    min_length=field_def.get("min_length"),
+                    max_length=field_def.get("max_length"),
+                    start_date=field_def.get("start_date"),
+                    end_date=field_def.get("end_date"),
+                    format=field_def.get("format"),
                 )
                 fields.append(field_schema.model_dump())
             else:
@@ -117,6 +137,10 @@ def generate(
         if format not in EXPORT_FORMATS:
             raise ValueError(f"Unsupported format: {format}")
+        # Validate provider
+        if provider.lower() not in ["faker", "mimesis"]:
+            raise ValueError(f"Unsupported provider: {provider}")
         # Create generator config
         if debug:
             typer.echo(f"Converted fields: {fields}", err=False)
@@ -127,7 +151,12 @@ def generate(
         # Generate data
         if debug:
             typer.echo(f"Generator config: {config}", err=False)
-        generator = DataGenerator()
+        # Select provider
+        data_provider = (
+            MimesisProvider() if provider.lower() == "mimesis" else FakerProvider()
+        )
+        generator = DataGenerator(provider=data_provider)
         data = generator.generate_data(config.fields, config.row_count)
         # Export data
@@ -148,9 +177,10 @@ def generate(
         raise typer.Exit(code=1) from e
     except Exception as e:
         typer.echo(f"Error: {str(e)}", err=True)
-        typer.echo(f"Exception type: {type(e).__name__}", err=True)
-        typer.echo(f"Exception args: {e.args}", err=True)
-        typer.echo(f"Traceback: {traceback.format_exc()}", err=True)
+        if debug:
+            typer.echo(f"Exception type: {type(e).__name__}", err=True)
+            typer.echo(f"Exception args: {e.args}", err=True)
+            typer.echo(f"Traceback: {traceback.format_exc()}", err=True)
         raise typer.Exit(code=1) from e
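
Note on the `FieldSchema` hunk in `src/cli.py`: the two key lookups use different fallback styles, and the difference matters. The sketch below isolates that logic so it can be run on its own. The helper name `resolve_field_options` and the sample dictionaries are hypothetical and do not appear in the package; only the two lookup expressions are taken from the diff.

```python
from typing import Any


def resolve_field_options(field_def: dict[str, Any]) -> dict[str, Any]:
    """Resolve aliased schema keys (hypothetical helper, for illustration only)."""
    # Explicit None check, as in the diff: "right_digits": 0 is a valid,
    # intentional value, so it must not fall through to "precision".
    right_digits = field_def.get("right_digits")
    if right_digits is None:
        right_digits = field_def.get("precision")

    # `or` fallback, as in the diff: prefer "provider_field", otherwise use
    # the legacy "faker" key. Any falsy value (None, "") falls through.
    value_provider = field_def.get("provider_field") or field_def.get("faker")

    return {"right_digits": right_digits, "value_provider": value_provider}


# Hypothetical field definitions exercising each branch.
examples = [
    {"right_digits": 0, "precision": 4},
    {"precision": 2, "faker": "email"},
    {"provider_field": "user_name", "faker": "name"},
]
for field_def in examples:
    print(resolve_field_options(field_def))
```

With `or`, an empty-string `provider_field` is treated the same as a missing one and the legacy `faker` value is used instead. That is harmless for provider names, whereas the same shortcut on `right_digits` would silently discard a zero.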

{testdatax-0.1.1 → testdatax-0.2.0}/src/exporters/csv_exporter.py RENAMED

@@ -77,10 +77,8 @@ class CsvExporter(BaseExporter):
                 fieldnames = list(data[0].keys())
             first_chunk = True
-            formatted_rows = []
             for chunk in self.chunker.chunk_data(data):
                 formatted_chunk = [self.formatter.format_row(row) for row in chunk]
-                formatted_rows.extend(formatted_chunk)
                 df = pd.DataFrame(formatted_chunk, columns=fieldnames)
                 # Write the data to CSV in chunks
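
Note on the `csv_exporter.py` hunk: the removed `formatted_rows` list accumulated every formatted row even though each chunk was already written as it was produced. A minimal sketch of the resulting write-and-discard pattern follows. It uses the standard-library `csv` module instead of pandas so that it runs without dependencies, and the function name `write_csv_in_chunks` and the sample chunks are assumptions for illustration, not the exporter's actual code.

```python
import csv
import io
from collections.abc import Iterable
from typing import Any, TextIO


def write_csv_in_chunks(
    chunks: Iterable[list[dict[str, Any]]],
    fieldnames: list[str],
    out: TextIO,
) -> None:
    """Write each chunk as soon as it arrives; nothing is accumulated."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    first_chunk = True
    for chunk in chunks:
        if first_chunk:
            # The header is written once, ahead of the first chunk's rows.
            writer.writeheader()
            first_chunk = False
        writer.writerows(chunk)
        # `chunk` goes out of scope on the next iteration, so memory use is
        # bounded by the chunk size rather than by the dataset size.


# Hypothetical pre-chunked rows standing in for the exporter's chunker output.
chunks = [
    [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    [{"id": 3, "name": "c"}],
]
buffer = io.StringIO()
write_csv_in_chunks(chunks, ["id", "name"], buffer)
print(buffer.getvalue())
```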

{testdatax-0.1.1 → testdatax-0.2.0}/src/exporters/json_exporter.py RENAMED

@@ -62,16 +62,22 @@ class JsonExporter(BaseExporter):
                         raise ValueError(
                             f"Field '{field}' in schema is not present in data."
                         )
-            # Format the data and write it in chunks to the output file
-            all_formatted_rows = []
-            for chunk in self.chunker.chunk_data(data):
-                formatted_chunk = [self.formatter.format_row(row) for row in chunk]
-                all_formatted_rows.extend(formatted_chunk)
-            # Write the complete file with proper formatting using json.dumps
+            # Stream a valid JSON array to disk one chunk at a time so the whole
+            # dataset is never held in memory at once.
             with open(output_path, "w", encoding="utf-8") as f:
-                json_str = json.dumps(all_formatted_rows, indent=4)
-                f.write(json_str)
+                f.write("[")
+                first = True
+                for chunk in self.chunker.chunk_data(data):
+                    for row in chunk:
+                        formatted = self.formatter.format_row(row)
+                        block = json.dumps(formatted, indent=4)
+                        indented = "\n".join(
+                            "    " + line for line in block.splitlines()
+                        )
+                        f.write(("\n" if first else ",\n") + indented)
+                        first = False
+                f.write("\n]" if not first else "]")
             logger.info(f"Successfully exported {len(data)} rows to {output_path}.")
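
Note on the `json_exporter.py` hunk: the new code writes the array brackets and separators by hand so that only one row is serialized at a time. The sketch below reproduces that loop against an in-memory buffer to show that the result is still a single valid JSON array. The names `chunked` and `stream_json_array` are hypothetical stand-ins for the exporter's chunker and export method, and the row formatter step is omitted.

```python
import io
import json
from collections.abc import Iterable, Iterator
from typing import Any, TextIO


def chunked(rows: list[dict[str, Any]], size: int) -> Iterator[list[dict[str, Any]]]:
    """Hypothetical stand-in for the exporter's chunker."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def stream_json_array(chunks: Iterable[list[dict[str, Any]]], out: TextIO) -> None:
    """Write rows as one indented JSON array without building the full list."""
    out.write("[")
    first = True
    for chunk in chunks:
        for row in chunk:
            block = json.dumps(row, indent=4)
            # Shift every line of the row by four spaces so it nests
            # inside the enclosing array.
            indented = "\n".join("    " + line for line in block.splitlines())
            # The first row follows "[" on a new line; later rows are
            # preceded by a comma separator.
            out.write(("\n" if first else ",\n") + indented)
            first = False
    # Empty input yields "[]"; otherwise the array closes on its own line.
    out.write("\n]" if not first else "]")


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
buffer = io.StringIO()
stream_json_array(chunked(rows, size=2), buffer)

# The streamed text parses back to the original rows.
assert json.loads(buffer.getvalue()) == rows
```

The trade-off is that the exporter now owns the array framing, so the empty-input case needs the explicit `"]"` branch seen in the last added line of the hunk.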
