PyPI - fauxdata-cli - Versions diffs - 0.1.1__tar.gz → 0.1.2__tar.gz - Mend

fauxdata-cli 0.1.1tar.gz → 0.1.2tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (37)

fauxdata_cli-0.1.2/.coverage ADDED

Binary file

fauxdata_cli-0.1.2/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Andrea Borruso <aborruso@gmail.com>
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

fauxdata_cli-0.1.2/LOG.md ADDED

@@ -0,0 +1,34 @@
+# Log
+## 2026-03-06 — v0.1.2
+- Bump to 0.1.2 and publish to PyPI
+## 2026-03-06 (feature)
+- `--version` / `-V` flag in the CLI (`fauxdata --version` → `fauxdata 0.1.1`)
+- Coverage threshold of 80% in the pytest config (`--cov-fail-under=80`); current: 83.76%
+- New `pattern` field in ColumnSchema: generates strings that match a regex via pointblank
+- New `null_probability` field in ColumnSchema: fine-grained control over nulls (0.0–1.0), validated at parse time
+- Removed the `faker` dependency (unused; pointblank handles everything)
+- Fix generator: `null_probability=None` is no longer passed to pointblank (it caused a TypeError)
+- Tests updated: 79/79 pass
+## 2026-03-06 (tests)
+- Add pytest test suite: 65 tests, 100% pass, 0.44s
+- `tests/test_schema.py`: unit tests for YAML schema parsing (valid/invalid cases)
+- `tests/test_output.py`: unit tests for export functions (all formats, stdout, errors)
+- `tests/test_generator.py`: integration tests for generation (types, seed, unique, presets)
+- `tests/test_validator.py`: integration tests for validation rules (pass/fail scenarios)
+- `tests/test_cli.py`: CLI smoke tests via `typer.testing.CliRunner`
+- Add `[dependency-groups] dev` in `pyproject.toml` (pytest, pytest-cov); config via `[tool.pytest.ini_options]`
+## 2026-03-06
+- Initial implementation of `fauxdata` CLI
+- Stack: pointblank 0.22 (native generation + validation), polars, typer, rich, pyfiglet, questionary
+- Commands: `init`, `generate`, `validate`, `preview`
+- Example schemas: `people.yml`, `orders.yml`, `events.yml`
+- All schemas generate and validate cleanly (all rules PASS)
+- `locale` field at schema level maps to pointblank `country=` param

{fauxdata_cli-0.1.1 → fauxdata_cli-0.1.2}/PKG-INFO RENAMED

@@ -1,12 +1,12 @@
 Metadata-Version: 2.4
 Name: fauxdata-cli
-Version: 0.1.1
+Version: 0.1.2
 Summary: CLI for generating and validating fake datasets
 Project-URL: Homepage, https://aborruso.github.io/fauxdata/
 Project-URL: Repository, https://github.com/aborruso/fauxdata
 Project-URL: Bug Tracker, https://github.com/aborruso/fauxdata/issues
+License-File: LICENSE
 Requires-Python: >=3.11
-Requires-Dist: faker>=26.0
 Requires-Dist: pointblank>=0.22
 Requires-Dist: polars>=1.0
 Requires-Dist: pyfiglet>=1.0

fauxdata_cli-0.1.2/docs/deployment.md ADDED

@@ -0,0 +1,90 @@
+# Deployment rules
+## Pre-release checklist (always)
+1. Run tests locally — must all pass:
+```bash
+uv run pytest
+```
+Coverage must stay above 80%. If it drops, fix before proceeding.
+2. Bump version in **both**:
+   - `src/fauxdata/__init__.py` → `__version__ = "X.Y.Z"`
+   - `pyproject.toml` → `version = "X.Y.Z"`
+3. Update `LOG.md` with a summary of changes under a new date heading.
+---
+## GitHub release (tag + release notes)
+```bash
+# Create and push annotated tag
+git tag -a vX.Y.Z -m "vX.Y.Z"
+git push origin vX.Y.Z
+```
+Then create a GitHub release via `gh`:
+```bash
+gh release create vX.Y.Z \
+  --title "vX.Y.Z" \
+  --notes "$(cat <<'EOF'
+## What's new
+- Short bullet list of user-facing changes
+- Include new fields, commands, bug fixes
+## Breaking changes
+- List any breaking changes here (or remove section if none)
+## Installation
+\`\`\`bash
+pip install fauxdata-cli==X.Y.Z
+\`\`\`
+Full changelog: https://github.com/aborruso/fauxdata/commits/vX.Y.Z
+EOF
+)"
+```
+Release notes style: **concise, nerd-friendly, technical**. List the actual changes with enough detail that a developer understands what changed and why.
+---
+## PyPI publish (via twine)
+```bash
+# Build
+uv build
+# Check the dist
+twine check dist/*
+# Publish
+twine upload dist/*
+```
+Requires `~/.pypirc` configured with PyPI token, or set `TWINE_USERNAME`/`TWINE_PASSWORD` env vars.
+---
+## Order of operations
+```
+uv run pytest          # must pass 100%
+bump version           # __init__.py + pyproject.toml
+update LOG.md
+git commit + git push
+git tag + git push tag
+gh release create      # with release notes
+uv build
+twine check dist/*
+twine upload dist/*
+```
+Never publish to PyPI without a corresponding GitHub release.

{fauxdata_cli-0.1.1 → fauxdata_cli-0.1.2}/docs/index.html RENAMED

@@ -5,8 +5,32 @@
 <meta name="viewport" content="width=device-width, initial-scale=1.0">
 <title>fauxdata — fake data, done right</title>
 <meta name="description" content="fauxdata is a CLI tool for generating and validating realistic fake datasets from YAML schemas. Locale-aware, pipeline-friendly, powered by pointblank.">
+<meta name="keywords" content="fake data, synthetic data, dataset generator, CLI, YAML schema, pointblank, data testing, fake dataset, CSV generator, Parquet">
+<meta name="author" content="Andrea Borruso">
+<meta name="robots" content="index, follow">
+<link rel="canonical" href="https://aborruso.github.io/fauxdata/">
+<!-- Open Graph -->
+<meta property="og:type" content="website">
+<meta property="og:url" content="https://aborruso.github.io/fauxdata/">
 <meta property="og:title" content="fauxdata — fake data, done right">
 <meta property="og:description" content="Generate and validate realistic fake datasets from YAML schemas. Because fake data can actually be better than real data.">
+<meta property="og:image" content="https://aborruso.github.io/fauxdata/share.png">
+<meta property="og:image:width" content="1200">
+<meta property="og:image:height" content="630">
+<meta property="og:image:alt" content="fauxdata — CLI tool for generating realistic fake datasets">
+<meta property="og:site_name" content="fauxdata">
+<meta property="og:locale" content="en_US">
+<!-- Twitter Card -->
+<meta name="twitter:card" content="summary_large_image">
+<meta name="twitter:url" content="https://aborruso.github.io/fauxdata/">
+<meta name="twitter:title" content="fauxdata — fake data, done right">
+<meta name="twitter:description" content="Generate and validate realistic fake datasets from YAML schemas. Because fake data can actually be better than real data.">
+<meta name="twitter:image" content="https://aborruso.github.io/fauxdata/share.png">
+<meta name="twitter:image:alt" content="fauxdata — CLI tool for generating realistic fake datasets">
+<meta name="twitter:creator" content="@aborruso">
 <link rel="preconnect" href="https://fonts.googleapis.com">
 <link href="https://fonts.googleapis.com/css2?family=IBM+Plex+Mono:ital,wght@0,300;0,400;0,600;0,700;1,400&family=VT323&display=swap" rel="stylesheet">
 <style>
@@ -876,11 +900,13 @@
         <span class="t-line dim" style="padding-left:2rem">| duckdb -c "SELECT status, COUNT(*) FROM '/dev/stdin' GROUP BY ALL"</span>
         <br>
         <span class="t-line t-out">┌───────────┬──────────┐</span>
-        <span class="t-line t-out">│ status    │ count(*) │</span>
-        <span class="t-line t-out">│ delivered │ 3124     │</span>
-        <span class="t-line t-out">│ shipped   │ 2891     │</span>
-        <span class="t-line t-out">│ pending   │ 2003     │</span>
-        <span class="t-line t-out">│ cancelled │ 1982     │</span>
+        <span class="t-line t-out">│  status   │ count(*) │</span>
+        <span class="t-line t-out">│  varchar  │  int64   │</span>
+        <span class="t-line t-out">├───────────┼──────────┤</span>
+        <span class="t-line t-out">│ delivered │     3124 │</span>
+        <span class="t-line t-out">│ shipped   │     2891 │</span>
+        <span class="t-line t-out">│ pending   │     2003 │</span>
+        <span class="t-line t-out">│ cancelled │     1982 │</span>
         <span class="t-line t-out">└───────────┴──────────┘</span>
       </div>
     </div>

fauxdata_cli-0.1.2/docs/share.png ADDED

Binary file

{fauxdata_cli-0.1.1 → fauxdata_cli-0.1.2}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "fauxdata-cli"
-version = "0.1.1"
+version = "0.1.2"
 description = "CLI for generating and validating fake datasets"
 readme = "README.md"
 requires-python = ">=3.11"
@@ -12,7 +12,6 @@ dependencies = [
     "questionary>=2.0",
     "polars>=1.0",
     "pyyaml>=6.0",
-    "faker>=26.0",
 ]
 [project.urls]
@@ -23,6 +22,16 @@ Repository = "https://github.com/aborruso/fauxdata"
 [project.scripts]
 fauxdata = "fauxdata.main:app"
+[dependency-groups]
+dev = [
+    "pytest>=8.0",
+    "pytest-cov>=5.0",
+]
+[tool.pytest.ini_options]
+testpaths = ["tests"]
+addopts = "--tb=short --cov=fauxdata --cov-report=term-missing --cov-fail-under=80"
 [build-system]
 requires = ["hatchling"]
 build-backend = "hatchling.build"

fauxdata_cli-0.1.2/share.png ADDED

Binary file

{fauxdata_cli-0.1.1 → fauxdata_cli-0.1.2}/src/fauxdata/__init__.py RENAMED

@@ -1,3 +1,3 @@
 """fauxdata - CLI for generating and validating fake datasets."""
-__version__ = "0.1.0"
+__version__ = "0.1.2"

{fauxdata_cli-0.1.1 → fauxdata_cli-0.1.2}/src/fauxdata/commands/generate.py RENAMED

@@ -61,13 +61,22 @@ def _print_schema_table(schema, n: int, seed):
     t = Table(title=f"Schema: {schema.name}", show_header=True, header_style="bold magenta")
     t.add_column("Column", style="cyan")
     t.add_column("Type")
-    t.add_column("Preset/Values")
+    t.add_column("Preset/Pattern/Values")
     t.add_column("Min")
     t.add_column("Max")
     t.add_column("Unique")
+    t.add_column("Null%")
     for col in schema.columns:
-        preset_val = col.preset or (str(col.values) if col.values else "-")
+        if col.pattern:
+            preset_val = f"pattern:{col.pattern}"
+        elif col.preset:
+            preset_val = col.preset
+        elif col.values:
+            preset_val = str(col.values)
+        else:
+            preset_val = "-"
+        null_pct = f"{int(col.null_probability * 100)}%" if col.null_probability else "-"
         t.add_row(
             col.name,
             col.col_type,
@@ -75,6 +84,7 @@ def _print_schema_table(schema, n: int, seed):
             str(col.min) if col.min is not None else "-",
             str(col.max) if col.max is not None else "-",
             "yes" if col.unique else "no",
+            null_pct,
         )
     console.print(t)
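
The loop in this hunk picks one value for the `Preset/Pattern/Values` cell and renders `null_probability` as a whole percentage. Both rules are sketched below as plain functions. The function names are hypothetical, and `null_cell_rounding` is an illustrative alternative rather than what the release ships:

```python
def preset_cell(pattern, preset, values) -> str:
    """Pick the cell text in the hunk's order: pattern, then preset, then values."""
    if pattern:
        return f"pattern:{pattern}"
    if preset:
        return preset
    if values:
        return str(values)
    return "-"


def null_cell_truncating(probability) -> str:
    """Same expression as the hunk: int() truncates toward zero."""
    return f"{int(probability * 100)}%" if probability else "-"


def null_cell_rounding(probability) -> str:
    """Hypothetical variant: round() to the nearest whole percent."""
    return f"{round(probability * 100)}%" if probability else "-"


print(preset_cell(None, "name", None))        # name
print(preset_cell(None, None, ["a", "b"]))    # ['a', 'b']
print(null_cell_truncating(None))             # -
print(null_cell_truncating(0.1))              # 10%
print(null_cell_truncating(0.29))             # 28%
print(null_cell_rounding(0.29))               # 29%
```

Because `0.29 * 100` evaluates to `28.999999999999996` in binary floating point, the truncating form can display one percent less than the value written in the schema. The display order also differs from the order `_col_to_field` uses in `generator.py` (values, then pattern, then preset), so a column that sets both `values` and `pattern` would be shown as a pattern but generated from `values`.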

{fauxdata_cli-0.1.1 → fauxdata_cli-0.1.2}/src/fauxdata/generator.py RENAMED

@@ -30,8 +30,10 @@ def _build_pb_schema(schema: SchemaConfig) -> pb.Schema:
 def _col_to_field(col: ColumnSchema):
     """Convert a ColumnSchema to a pointblank field spec."""
-    nullable = col.nullable
+    nullable = col.nullable or (col.null_probability is not None and col.null_probability > 0)
     unique = col.unique
+    # Build optional kwargs only when null_probability is explicitly set
+    np_kwargs = {"null_probability": col.null_probability} if col.null_probability is not None else {}
     if col.col_type == "int":
         return pb.int_field(
@@ -39,6 +41,7 @@ def _col_to_field(col: ColumnSchema):
             max_val=int(col.max) if col.max is not None else None,
             nullable=nullable,
             unique=unique,
+            **np_kwargs,
         )
     elif col.col_type == "float":
@@ -47,10 +50,11 @@ def _col_to_field(col: ColumnSchema):
             max_val=float(col.max) if col.max is not None else None,
             nullable=nullable,
             unique=unique,
+            **np_kwargs,
         )
     elif col.col_type == "bool":
-        return pb.bool_field(nullable=nullable)
+        return pb.bool_field(nullable=nullable, **np_kwargs)
     elif col.col_type == "date":
         return pb.date_field(
@@ -58,6 +62,7 @@ def _col_to_field(col: ColumnSchema):
             max_date=str(col.max) if col.max is not None else None,
             nullable=nullable,
             unique=unique,
+            **np_kwargs,
         )
     elif col.col_type == "datetime":
@@ -66,15 +71,18 @@ def _col_to_field(col: ColumnSchema):
             max_date=str(col.max) if col.max is not None else None,
             nullable=nullable,
             unique=unique,
+            **np_kwargs,
         )
     elif col.col_type == "string":
         if col.values:
-            return pb.string_field(allowed=col.values, nullable=nullable)
+            return pb.string_field(allowed=col.values, nullable=nullable, **np_kwargs)
+        elif col.pattern:
+            return pb.string_field(pattern=col.pattern, nullable=nullable, unique=unique, **np_kwargs)
         elif col.preset:
-            return pb.string_field(preset=col.preset, nullable=nullable, unique=unique)
+            return pb.string_field(preset=col.preset, nullable=nullable, unique=unique, **np_kwargs)
         else:
-            return pb.string_field(nullable=nullable, unique=unique)
+            return pb.string_field(nullable=nullable, unique=unique, **np_kwargs)
     else:
-        return pb.string_field(nullable=nullable)
+        return pb.string_field(nullable=nullable, **np_kwargs)

{fauxdata_cli-0.1.1 → fauxdata_cli-0.1.2}/src/fauxdata/main.py RENAMED

@@ -9,6 +9,8 @@ import typer
 from rich import print as rprint
 from rich.console import Console
+from fauxdata import __version__
 app = typer.Typer(
     name="fauxdata",
     help="Generate and validate fake datasets from YAML schemas.",
@@ -23,8 +25,20 @@ def _banner():
     rprint("[dim]Generate and validate realistic fake datasets[/dim]\n")
+def _version_callback(value: bool):
+    if value:
+        rprint(f"fauxdata {__version__}")
+        raise typer.Exit()
 @app.callback(invoke_without_command=True)
-def main(ctx: typer.Context):
+def main(
+    ctx: typer.Context,
+    version: Optional[bool] = typer.Option(
+        None, "--version", "-V", callback=_version_callback, is_eager=True,
+        help="Show version and exit.",
+    ),
+):
     if ctx.invoked_subcommand is None:
         _banner()
         rprint(ctx.get_help())

{fauxdata_cli-0.1.1 → fauxdata_cli-0.1.2}/src/fauxdata/schema.py RENAMED

@@ -57,6 +57,8 @@ class ColumnSchema:
     locale: str | None = None
     precision: int | None = None
     values: list | None = None  # for in_set
+    pattern: str | None = None  # regex pattern for string generation
+    null_probability: float | None = None  # e.g. 0.1 = 10% nulls
 @dataclass
@@ -142,6 +144,10 @@ def _parse_column(name: str, data: dict) -> ColumnSchema:
     if preset and preset not in STRING_PRESETS:
         raise ValueError(f"Column '{name}': unknown preset '{preset}'. Valid: {STRING_PRESETS}")
+    null_probability = data.get("null_probability", None)
+    if null_probability is not None and not (0.0 <= float(null_probability) <= 1.0):
+        raise ValueError(f"Column '{name}': null_probability must be between 0.0 and 1.0")
     return ColumnSchema(
         name=name,
         col_type=col_type,
@@ -153,6 +159,8 @@ def _parse_column(name: str, data: dict) -> ColumnSchema:
         locale=data.get("locale", None),
         precision=data.get("precision", None),
         values=data.get("values", None),
+        pattern=data.get("pattern", None),
+        null_probability=float(null_probability) if null_probability is not None else None,
     )

fauxdata_cli-0.1.2/tests/__init__.py ADDED

File without changes

fauxdata_cli-0.1.2/tests/conftest.py ADDED

@@ -0,0 +1,39 @@
+"""Shared fixtures for fauxdata tests."""
+import pytest
+import polars as pl
+from fauxdata.schema import SchemaConfig, ColumnSchema, ValidationRule
+@pytest.fixture
+def minimal_schema():
+    """A minimal SchemaConfig with one int and one string column."""
+    return SchemaConfig(
+        name="test",
+        rows=10,
+        seed=42,
+        locale="US",
+        output_format="csv",
+        columns=[
+            ColumnSchema(name="id", col_type="int", min=1, max=100, unique=True),
+            ColumnSchema(name="name", col_type="string", preset="name"),
+        ],
+    )
+@pytest.fixture
+def simple_df():
+    """A small deterministic DataFrame for validation tests."""
+    return pl.DataFrame({
+        "id": [1, 2, 3],
+        "age": [25, 40, 55],
+        "email": ["a@b.com", "c@d.com", "e@f.com"],
+    })
+@pytest.fixture
+def people_schema_path():
+    """Path to the existing people.yml schema."""
+    from pathlib import Path
+    return str(Path(__file__).parent.parent / "schemas" / "people.yml")

fauxdata_cli-0.1.2/tests/test_cli.py ADDED

@@ -0,0 +1,112 @@
+"""Smoke tests for the fauxdata CLI using typer's CliRunner."""
+import textwrap
+import pytest
+from typer.testing import CliRunner
+from fauxdata.main import app
+runner = CliRunner()
+def test_cli_no_args():
+    """Running fauxdata with no args should show help."""
+    result = runner.invoke(app, [])
+    assert result.exit_code == 0
+    assert "fauxdata" in result.output.lower() or "generate" in result.output.lower()
+def test_cli_help():
+    result = runner.invoke(app, ["--help"])
+    assert result.exit_code == 0
+    assert "generate" in result.output
+def test_cli_generate_help():
+    result = runner.invoke(app, ["generate", "--help"])
+    assert result.exit_code == 0
+    assert "--rows" in result.output
+    assert "--format" in result.output
+def test_cli_generate_csv(tmp_path, people_schema_path):
+    out = tmp_path / "out.csv"
+    result = runner.invoke(app, ["generate", people_schema_path, "--rows", "5",
+                                  "--out", str(out), "--format", "csv", "--seed", "1"])
+    assert result.exit_code == 0, result.output
+    assert out.exists()
+def test_cli_generate_json(tmp_path, people_schema_path):
+    out = tmp_path / "out.json"
+    result = runner.invoke(app, ["generate", people_schema_path, "--rows", "5",
+                                  "--out", str(out), "--format", "json", "--seed", "1"])
+    assert result.exit_code == 0, result.output
+    assert out.exists()
+def test_cli_generate_parquet(tmp_path, people_schema_path):
+    out = tmp_path / "out.parquet"
+    result = runner.invoke(app, ["generate", people_schema_path, "--rows", "5",
+                                  "--out", str(out), "--format", "parquet", "--seed", "1"])
+    assert result.exit_code == 0, result.output
+    assert out.exists()
+def test_cli_generate_stdout(people_schema_path, capsys):
+    result = runner.invoke(app, ["generate", people_schema_path, "--rows", "3",
+                                  "--out", "-", "--format", "csv", "--seed", "1"])
+    assert result.exit_code == 0, result.output
+def test_cli_generate_with_validate(tmp_path, people_schema_path):
+    out = tmp_path / "out.csv"
+    result = runner.invoke(app, ["generate", people_schema_path, "--rows", "10",
+                                  "--out", str(out), "--format", "csv",
+                                  "--seed", "42", "--validate"])
+    assert result.exit_code == 0, result.output
+def test_cli_generate_missing_schema(tmp_path):
+    result = runner.invoke(app, ["generate", "/nonexistent/schema.yml"])
+    assert result.exit_code != 0
+def test_cli_validate(tmp_path, people_schema_path):
+    """Generate a file then validate it."""
+    out = tmp_path / "people.csv"
+    runner.invoke(app, ["generate", people_schema_path, "--rows", "10",
+                         "--out", str(out), "--format", "csv", "--seed", "42"])
+    result = runner.invoke(app, ["validate", str(out), people_schema_path])
+    assert result.exit_code == 0, result.output
+def test_cli_preview(tmp_path, people_schema_path):
+    out = tmp_path / "people.csv"
+    runner.invoke(app, ["generate", people_schema_path, "--rows", "20",
+                         "--out", str(out), "--format", "csv", "--seed", "42"])
+    result = runner.invoke(app, ["preview", str(out), "--rows", "5"])
+    assert result.exit_code == 0, result.output
+def test_cli_generate_inline_schema(tmp_path):
+    """Test with a minimal inline schema written to a tmp file."""
+    schema_yaml = textwrap.dedent("""\
+        name: mini
+        rows: 5
+        columns:
+          id:
+            type: int
+            min: 1
+            max: 100
+          label:
+            type: string
+            values: ["a", "b"]
+    """)
+    schema_path = tmp_path / "mini.yml"
+    schema_path.write_text(schema_yaml)
+    out = tmp_path / "mini.csv"
+    result = runner.invoke(app, ["generate", str(schema_path),
+                                  "--out", str(out), "--format", "csv", "--seed", "1"])
+    assert result.exit_code == 0, result.output
+    assert out.exists()
