
mostlyai-mock 0.0.4 (tar.gz) → 0.0.6 (tar.gz)

This diff shows the content of publicly released package versions and reflects the changes between them as they appear in their public registries. It is provided for informational purposes only.

Files changed (9)

mostlyai_mock-0.0.6/PKG-INFO +186 -0
mostlyai_mock-0.0.6/README.md +153 -0
{mostlyai_mock-0.0.4 → mostlyai_mock-0.0.6}/mostlyai/mock/__init__.py +1 -1
{mostlyai_mock-0.0.4 → mostlyai_mock-0.0.6}/mostlyai/mock/core.py +118 -30
{mostlyai_mock-0.0.4 → mostlyai_mock-0.0.6}/pyproject.toml +18 -1
mostlyai_mock-0.0.4/PKG-INFO +0 -98
mostlyai_mock-0.0.4/README.md +0 -80
{mostlyai_mock-0.0.4 → mostlyai_mock-0.0.6}/.gitignore +0 -0
{mostlyai_mock-0.0.4 → mostlyai_mock-0.0.6}/LICENSE +0 -0

mostlyai_mock-0.0.6/PKG-INFO ADDED

@@ -0,0 +1,186 @@
+Metadata-Version: 2.4
+Name: mostlyai-mock
+Version: 0.0.6
+Summary: Synthetic Mock Data
+Project-URL: homepage, https://github.com/mostly-ai/mostlyai-mock
+Project-URL: repository, https://github.com/mostly-ai/mostlyai-mock
+Project-URL: documentation, https://mostly-ai.github.io/mostlyai-mock/
+Author-email: MOSTLY AI <dev@mostly.ai>
+License-Expression: Apache-2.0
+License-File: LICENSE
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Financial and Insurance Industry
+Classifier: Intended Audience :: Healthcare Industry
+Classifier: Intended Audience :: Information Technology
+Classifier: Intended Audience :: Science/Research
+Classifier: Intended Audience :: Telecommunications Industry
+Classifier: License :: OSI Approved :: Apache Software License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Software Development :: Libraries
+Classifier: Typing :: Typed
+Requires-Python: >=3.10
+Requires-Dist: litellm>=1.67.0
+Requires-Dist: numpy>=1.26.3
+Requires-Dist: pandas>=2.0.0
+Requires-Dist: pyarrow>=14.0.0
+Requires-Dist: pydantic<3.0.0,>=2.0.0
+Description-Content-Type: text/markdown
+# Synthetic Mock Data 🔮
+[![Documentation](https://img.shields.io/badge/docs-latest-green)](https://mostly-ai.github.io/mostlyai-mock/) [![stats](https://pepy.tech/badge/mostlyai-mock)](https://pypi.org/project/mostlyai-mock/) ![license](https://img.shields.io/github/license/mostly-ai/mostlyai-mock) ![GitHub Release](https://img.shields.io/github/v/release/mostly-ai/mostlyai-mock) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mostlyai-mock)
+Create data out of nothing. Prompt LLMs for Tabular Data.
+## Key Features
+* A light-weight python client for prompting LLMs for mixed-type tabular data
+* Select from a range of LLM endpoints, that provide structured output
+* Supports single-table as well as multi-table scenarios.
+* Supports variety of data types: `string`, `categorical`, `integer`, `float`, `boolean`, `date`, and `datetime`.
+* Specify context, distributions and rules via dataset-, table- or column-level prompts.
+* Tailor the diversity and realism of your generated data via temperature and top_p.
+## Getting Started
+1. Install the latest version of the `mostlyai-mock` python package.
+```bash
+pip install -U mostlyai-mock
+```
+2. Set the API key of your LLM endpoint (if not done yet)
+```python
+import os
+os.environ["OPENAI_API_KEY"] = "your-api-key"
+# os.environ["GEMINI_API_KEY"] = "your-api-key"
+# os.environ["GROQ_API_KEY"] = "your-api-key"
+```
+Note: You will need to obtain your API key directly from the LLM service provider (e.g. for Open AI from [here](https://platform.openai.com/api-keys)). The LLM endpoint will be determined by the chosen `model` when making calls to `mock.sample`.
+3. Create your first basic synthetic table from scratch
+```python
+from mostlyai import mock
+tables = {
+    "guests": {
+        "description": "Guests of an Alpine ski hotel in Austria",
+        "columns": {
+            "nationality": {"prompt": "2-letter code for the nationality", "dtype": "string"},
+            "name": {"prompt": "first name and last name of the guest", "dtype": "string"},
+            "gender": {"dtype": "category", "values": ["male", "female"]},
+            "age": {"prompt": "age in years; min: 18, max: 80; avg: 25", "dtype": "integer"},
+            "date_of_birth": {"prompt": "date of birth", "dtype": "date"},
+            "checkin_time": {"prompt": "the check in timestamp of the guest; may 2025", "dtype": "datetime"},
+            "is_vip": {"prompt": "is the guest a VIP", "dtype": "boolean"},
+            "price_per_night": {"prompt": "price paid per night, in EUR", "dtype": "float"},
+            "room_number": {"prompt": "room number", "dtype": "integer", "values": [101, 102, 103, 201, 202, 203, 204]}
+        },
+    }
+}
+df = mock.sample(
+    tables=tables,  # provide table and column definitions
+    sample_size=10,  # generate 10 records
+    model="openai/gpt-4.1-nano",  # select the LLM model (optional)
+)
+print(df)
+#   nationality            name  gender  age date_of_birth        checkin_time  is_vip  price_per_night  room_number
+# 0          AT     Anna Müller  female   29    1994-09-15 2025-01-05 14:30:00    True            350.0          101
+# 1          DE  Johann Schmidt    male   45    1978-11-20 2025-01-06 16:45:00   False            250.0          102
+# 2          CH      Lara Meier  female   32    1991-04-12 2025-01-05 12:00:00    True            400.0          103
+# 3          IT     Marco Rossi    male   38    1985-02-25 2025-01-07 09:15:00   False            280.0          201
+# 4          FR   Claire Dupont  female   24    2000-07-08 2025-01-07 11:20:00   False            220.0          202
+# 5          AT    Felix Gruber    male   52    1972-01-10 2025-01-06 17:50:00    True            375.0          203
+# 6          DE   Sophie Becker  female   27    1996-03-30 2025-01-08 08:30:00   False            230.0          204
+# 7          CH      Max Keller    male   31    1992-05-16 2025-01-09 14:10:00   False            290.0          101
+# 8          IT  Giulia Bianchi  female   36    1988-08-19 2025-01-05 15:55:00    True            410.0          102
+# 9          FR    Louis Martin    male   44    1980-12-05 2025-01-07 10:40:00   False            270.0          103
+```
+4. Create your first multi-table synthetic dataset
+```python
+from mostlyai import mock
+tables = {
+    "customers": {
+        "description": "Customers of a hardware store",
+        "columns": {
+            "customer_id": {"prompt": "the unique id of the customer", "dtype": "integer"},
+            "name": {"prompt": "first name and last name of the customer", "dtype": "string"},
+        },
+        "primary_key": "customer_id",
+    },
+    "orders": {
+        "description": "Orders of a Customer",
+        "columns": {
+            "customer_id": {"prompt": "the customer id for that order", "dtype": "integer"},
+            "order_id": {"prompt": "the unique id of the order", "dtype": "string"},
+            "text": {"prompt": "order text description", "dtype": "string"},
+            "amount": {"prompt": "order amount in USD", "dtype": "float"},
+        },
+        "primary_key": "order_id",
+        "foreign_keys": [
+            {
+                "column": "customer_id",
+                "referenced_table": "customers",
+                "description": "each customer has anywhere between 2 and 3 orders",
+            }
+        ],
+    },
+    "items": {
+        "description": "Items in an Order",
+        "columns": {
+            "item_id": {"prompt": "the unique id of the item", "dtype": "string"},
+            "order_id": {"prompt": "the order id for that item", "dtype": "string"},
+            "name": {"prompt": "the name of the item", "dtype": "string"},
+            "price": {"prompt": "the price of the item in USD", "dtype": "float"},
+        },
+        "foreign_keys": [
+            {
+                "column": "order_id",
+                "referenced_table": "orders",
+                "description": "each order has between 1 and 2 items",
+            }
+        ],
+    },
+}
+data = mock.sample(
+    tables=tables,
+    sample_size=2,
+    model="openai/gpt-4.1"
+)
+print(data["customers"])
+#    customer_id            name
+# 0            1  Michael Torres
+# 1            2      Elaine Kim
+print(data["orders"])
+#    customer_id        order_id                                               text  amount
+# 0            1  ORD20240612001        Home office desk and ergonomic chair bundle  412.95
+# 1            1  ORD20240517322               Wireless noise-cancelling headphones  226.49
+# 2            1  ORD20240430307         Smart LED desk lamp with USB charging port   69.99
+# 3            2  ORD20240614015            Eco-friendly bamboo kitchen utensil set   39.95
+# 4            2  ORD20240528356  Air fryer with digital touch screen, 5-quart c...  129.99
+# 5            2  ORD20240510078          Double-walled glass coffee mugs, set of 4    48.5
+print(data["items"])
+#         item_id        order_id                                       name   price
+# 0   ITEM100001A  ORD20240612001                Ergonomic Mesh Office Chair  179.99
+# 1   ITEM100001B  ORD20240612001                Adjustable Home Office Desk  232.96
+# 2   ITEM100002A  ORD20240517322       Wireless Noise-Cancelling Headphones  226.49
+# 3   ITEM100003A  ORD20240430307                        Smart LED Desk Lamp   59.99
+# 4   ITEM100003B  ORD20240430307  USB Charging Cable (Desk Lamp Compatible)    10.0
+# 5   ITEM100004A  ORD20240614015                       Bamboo Cooking Spoon   13.49
+# 6   ITEM100004B  ORD20240614015                      Bamboo Slotted Turner   12.99
+# 7   ITEM100005A  ORD20240528356         Digital Air Fryer (5-Quart, Black)  115.99
+# 8   ITEM100005B  ORD20240528356     Silicone Liner for Air Fryer (5-Quart)   13.99
+# 9   ITEM100006A  ORD20240510078      Double-Walled Glass Coffee Mug (12oz)   13.75
+# 10  ITEM100006B  ORD20240510078       Double-Walled Glass Coffee Mug (8oz)   11.25
+```

mostlyai_mock-0.0.6/README.md ADDED

@@ -0,0 +1,153 @@
+# Synthetic Mock Data 🔮
+[![Documentation](https://img.shields.io/badge/docs-latest-green)](https://mostly-ai.github.io/mostlyai-mock/) [![stats](https://pepy.tech/badge/mostlyai-mock)](https://pypi.org/project/mostlyai-mock/) ![license](https://img.shields.io/github/license/mostly-ai/mostlyai-mock) ![GitHub Release](https://img.shields.io/github/v/release/mostly-ai/mostlyai-mock) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mostlyai-mock)
+Create data out of nothing. Prompt LLMs for Tabular Data.
+## Key Features
+* A light-weight python client for prompting LLMs for mixed-type tabular data
+* Select from a range of LLM endpoints, that provide structured output
+* Supports single-table as well as multi-table scenarios.
+* Supports variety of data types: `string`, `categorical`, `integer`, `float`, `boolean`, `date`, and `datetime`.
+* Specify context, distributions and rules via dataset-, table- or column-level prompts.
+* Tailor the diversity and realism of your generated data via temperature and top_p.
+## Getting Started
+1. Install the latest version of the `mostlyai-mock` python package.
+```bash
+pip install -U mostlyai-mock
+```
+2. Set the API key of your LLM endpoint (if not done yet)
+```python
+import os
+os.environ["OPENAI_API_KEY"] = "your-api-key"
+# os.environ["GEMINI_API_KEY"] = "your-api-key"
+# os.environ["GROQ_API_KEY"] = "your-api-key"
+```
+Note: You will need to obtain your API key directly from the LLM service provider (e.g. for Open AI from [here](https://platform.openai.com/api-keys)). The LLM endpoint will be determined by the chosen `model` when making calls to `mock.sample`.
+3. Create your first basic synthetic table from scratch
+```python
+from mostlyai import mock
+tables = {
+    "guests": {
+        "description": "Guests of an Alpine ski hotel in Austria",
+        "columns": {
+            "nationality": {"prompt": "2-letter code for the nationality", "dtype": "string"},
+            "name": {"prompt": "first name and last name of the guest", "dtype": "string"},
+            "gender": {"dtype": "category", "values": ["male", "female"]},
+            "age": {"prompt": "age in years; min: 18, max: 80; avg: 25", "dtype": "integer"},
+            "date_of_birth": {"prompt": "date of birth", "dtype": "date"},
+            "checkin_time": {"prompt": "the check in timestamp of the guest; may 2025", "dtype": "datetime"},
+            "is_vip": {"prompt": "is the guest a VIP", "dtype": "boolean"},
+            "price_per_night": {"prompt": "price paid per night, in EUR", "dtype": "float"},
+            "room_number": {"prompt": "room number", "dtype": "integer", "values": [101, 102, 103, 201, 202, 203, 204]}
+        },
+    }
+}
+df = mock.sample(
+    tables=tables,  # provide table and column definitions
+    sample_size=10,  # generate 10 records
+    model="openai/gpt-4.1-nano",  # select the LLM model (optional)
+)
+print(df)
+#   nationality            name  gender  age date_of_birth        checkin_time  is_vip  price_per_night  room_number
+# 0          AT     Anna Müller  female   29    1994-09-15 2025-01-05 14:30:00    True            350.0          101
+# 1          DE  Johann Schmidt    male   45    1978-11-20 2025-01-06 16:45:00   False            250.0          102
+# 2          CH      Lara Meier  female   32    1991-04-12 2025-01-05 12:00:00    True            400.0          103
+# 3          IT     Marco Rossi    male   38    1985-02-25 2025-01-07 09:15:00   False            280.0          201
+# 4          FR   Claire Dupont  female   24    2000-07-08 2025-01-07 11:20:00   False            220.0          202
+# 5          AT    Felix Gruber    male   52    1972-01-10 2025-01-06 17:50:00    True            375.0          203
+# 6          DE   Sophie Becker  female   27    1996-03-30 2025-01-08 08:30:00   False            230.0          204
+# 7          CH      Max Keller    male   31    1992-05-16 2025-01-09 14:10:00   False            290.0          101
+# 8          IT  Giulia Bianchi  female   36    1988-08-19 2025-01-05 15:55:00    True            410.0          102
+# 9          FR    Louis Martin    male   44    1980-12-05 2025-01-07 10:40:00   False            270.0          103
+```
+4. Create your first multi-table synthetic dataset
+```python
+from mostlyai import mock
+tables = {
+    "customers": {
+        "description": "Customers of a hardware store",
+        "columns": {
+            "customer_id": {"prompt": "the unique id of the customer", "dtype": "integer"},
+            "name": {"prompt": "first name and last name of the customer", "dtype": "string"},
+        },
+        "primary_key": "customer_id",
+    },
+    "orders": {
+        "description": "Orders of a Customer",
+        "columns": {
+            "customer_id": {"prompt": "the customer id for that order", "dtype": "integer"},
+            "order_id": {"prompt": "the unique id of the order", "dtype": "string"},
+            "text": {"prompt": "order text description", "dtype": "string"},
+            "amount": {"prompt": "order amount in USD", "dtype": "float"},
+        },
+        "primary_key": "order_id",
+        "foreign_keys": [
+            {
+                "column": "customer_id",
+                "referenced_table": "customers",
+                "description": "each customer has anywhere between 2 and 3 orders",
+            }
+        ],
+    },
+    "items": {
+        "description": "Items in an Order",
+        "columns": {
+            "item_id": {"prompt": "the unique id of the item", "dtype": "string"},
+            "order_id": {"prompt": "the order id for that item", "dtype": "string"},
+            "name": {"prompt": "the name of the item", "dtype": "string"},
+            "price": {"prompt": "the price of the item in USD", "dtype": "float"},
+        },
+        "foreign_keys": [
+            {
+                "column": "order_id",
+                "referenced_table": "orders",
+                "description": "each order has between 1 and 2 items",
+            }
+        ],
+    },
+}
+data = mock.sample(
+    tables=tables,
+    sample_size=2,
+    model="openai/gpt-4.1"
+)
+print(data["customers"])
+#    customer_id            name
+# 0            1  Michael Torres
+# 1            2      Elaine Kim
+print(data["orders"])
+#    customer_id        order_id                                               text  amount
+# 0            1  ORD20240612001        Home office desk and ergonomic chair bundle  412.95
+# 1            1  ORD20240517322               Wireless noise-cancelling headphones  226.49
+# 2            1  ORD20240430307         Smart LED desk lamp with USB charging port   69.99
+# 3            2  ORD20240614015            Eco-friendly bamboo kitchen utensil set   39.95
+# 4            2  ORD20240528356  Air fryer with digital touch screen, 5-quart c...  129.99
+# 5            2  ORD20240510078          Double-walled glass coffee mugs, set of 4    48.5
+print(data["items"])
+#         item_id        order_id                                       name   price
+# 0   ITEM100001A  ORD20240612001                Ergonomic Mesh Office Chair  179.99
+# 1   ITEM100001B  ORD20240612001                Adjustable Home Office Desk  232.96
+# 2   ITEM100002A  ORD20240517322       Wireless Noise-Cancelling Headphones  226.49
+# 3   ITEM100003A  ORD20240430307                        Smart LED Desk Lamp   59.99
+# 4   ITEM100003B  ORD20240430307  USB Charging Cable (Desk Lamp Compatible)    10.0
+# 5   ITEM100004A  ORD20240614015                       Bamboo Cooking Spoon   13.49
+# 6   ITEM100004B  ORD20240614015                      Bamboo Slotted Turner   12.99
+# 7   ITEM100005A  ORD20240528356         Digital Air Fryer (5-Quart, Black)  115.99
+# 8   ITEM100005B  ORD20240528356     Silicone Liner for Air Fryer (5-Quart)   13.99
+# 9   ITEM100006A  ORD20240510078      Double-Walled Glass Coffee Mug (12oz)   13.75
+# 10  ITEM100006B  ORD20240510078       Double-Walled Glass Coffee Mug (8oz)   11.25
+```
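
A quick way to sanity-check a multi-table result like the one printed above is to confirm that every foreign-key value appears in the parent table. The following is a minimal sketch in plain Python and is not part of `mostlyai-mock`: `orphaned_keys` is a hypothetical helper, and the rows are copied by hand from the sample output (key columns only). With real DataFrames, `df.to_dict("records")` would yield rows in this shape.

```python
def orphaned_keys(child_rows, fk_column, parent_rows, pk_column):
    """Return foreign-key values in child_rows that match no parent row."""
    parent_keys = {row[pk_column] for row in parent_rows}
    child_keys = {row[fk_column] for row in child_rows}
    return sorted(child_keys - parent_keys)


# Key columns copied from the README sample output (hand-typed, for illustration).
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"customer_id": 1, "order_id": "ORD20240612001"},
    {"customer_id": 1, "order_id": "ORD20240517322"},
    {"customer_id": 1, "order_id": "ORD20240430307"},
    {"customer_id": 2, "order_id": "ORD20240614015"},
    {"customer_id": 2, "order_id": "ORD20240528356"},
    {"customer_id": 2, "order_id": "ORD20240510078"},
]
items = [
    {"order_id": order_id}
    for order_id in [
        "ORD20240612001", "ORD20240612001", "ORD20240517322",
        "ORD20240430307", "ORD20240430307", "ORD20240614015",
        "ORD20240614015", "ORD20240528356", "ORD20240528356",
        "ORD20240510078", "ORD20240510078",
    ]
]

# An empty list means every child row points at an existing parent row.
print(orphaned_keys(orders, "customer_id", customers, "customer_id"))
print(orphaned_keys(items, "order_id", orders, "order_id"))
```

Because the rows are generated by an LLM, a check along these lines may be worth running before the tables are joined downstream.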

{mostlyai_mock-0.0.4 → mostlyai_mock-0.0.6}/mostlyai/mock/__init__.py RENAMED

@@ -15,4 +15,4 @@
 from mostlyai.mock.core import sample
 __all__ = ["sample"]
-__version__ = "0.0.4"  # Do not set this manually. Use poetry version [params].
+__version__ = "0.0.6"  # Do not set this manually. Use poetry version [params].

{mostlyai_mock-0.0.4 → mostlyai_mock-0.0.6}/mostlyai/mock/core.py RENAMED

@@ -89,6 +89,31 @@ class MockConfig(RootModel[dict[str, "TableConfig"]]):
         return tables
+    @model_validator(mode="after")
+    def validate_no_circular_dependencies(self) -> MockConfig:
+        child_to_parents = {}
+        for table_name, table_config in self.root.items():
+            child_to_parents[table_name] = [fk.referenced_table for fk in table_config.foreign_keys]
+        visited = set()
+        def detect_cycle(table_name: str, path: list[str]) -> None:
+            if table_name in path:
+                cycle_start = path.index(table_name)
+                cycle = path[cycle_start:] + [table_name]
+                raise ValueError(f"Circular dependency detected: {' -> '.join(cycle)}")
+            if table_name in visited:
+                return
+            visited.add(table_name)
+            path.append(table_name)
+            for parent in child_to_parents[table_name]:
+                detect_cycle(parent, path)
+            path.pop()
+        for table_name in child_to_parents:
+            detect_cycle(table_name, [])
+        return self
 class TableConfig(BaseModel):
     description: str = ""
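
The validator added in this hunk walks from each table to the tables it references and raises as soon as a table reappears on the current path. Below is a standalone sketch of the same idea, assuming a plain dict in place of `MockConfig`. `find_cycle` is a hypothetical name, and unlike the validator it returns the cycle instead of raising `ValueError`.

```python
from typing import Optional


def find_cycle(child_to_parents: dict) -> Optional[list]:
    """Return one dependency cycle as a list of table names, or None."""
    visited = set()  # tables whose ancestors have already been explored

    def walk(table, path):
        # A table that is already on the current path closes a cycle.
        if table in path:
            return path[path.index(table):] + [table]
        if table in visited:
            return None
        visited.add(table)
        path.append(table)
        for parent in child_to_parents.get(table, []):
            cycle = walk(parent, path)
            if cycle is not None:
                return cycle
        path.pop()
        return None

    for table in child_to_parents:
        cycle = walk(table, [])
        if cycle is not None:
            return cycle
    return None


acyclic = {"customers": [], "orders": ["customers"], "items": ["orders"]}
cyclic = {"a": ["b"], "b": ["a"]}

print(find_cycle(acyclic))             # None
print(" -> ".join(find_cycle(cyclic)))  # a -> b -> a
```

As in the hunk, the on-path check comes before the `visited` check, so a table that was explored earlier in the same walk is still reported when it closes a cycle.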
@@ -234,7 +259,7 @@ def _create_table_prompt(
     # add previous rows as context to help the LLM generate consistent data
     if previous_rows:
         prompt += f"\n## Previous {len(previous_rows)} Rows:\n\n"
-        prompt += json.dumps(previous_rows, indent=2)
+        prompt += f"{json.dumps(previous_rows, indent=2)}\n\n"
     # add context table name, primary key and data
     if context_data is not None:
@@ -252,12 +277,14 @@ def _create_table_prompt(
         prompt += f"Generate {batch_size} rows for the `{table_name}` table.\n\n"
     else:
         prompt += (
-            f"Generate rows for the `{table_name}` table. "
-            f"The Foreign Key column may only contain values from Context Table Data.\n\n"
+            f"Generate data for the `{table_name}` table. "
+            f"The Foreign Key column may only contain values from Context Table Data. "
+            f"Pay attention to description of the Foreign Key column to understand the relationship.\n\n"
         )
     if previous_rows:
         prompt += (
             "Generate new rows that maintain consistency with the previous rows where appropriate. "
+            "Don't copy previous rows in the output. "
             "Don't pay attention to the number of previous rows; there might have been more generated than provided.\n\n"
         )
     prompt += f"Do not use code to generate the data.\n\n"
@@ -426,6 +453,44 @@ def _harmonize_sample_size(sample_size: int | dict[str, int], config: MockConfig
     return sample_size
+def _build_dependency_graph(config: MockConfig) -> tuple[dict[str, list[str]], dict[str, list[str]], list[str]]:
+    child_to_parents = {}
+    parent_to_children = {}
+    for table_name in config.root:
+        child_to_parents[table_name] = []
+        parent_to_children[table_name] = []
+    for table_name, table_config in config.root.items():
+        if table_config.foreign_keys:
+            for fk in table_config.foreign_keys:
+                referenced_table = fk.referenced_table
+                child_to_parents[table_name].append(referenced_table)
+                parent_to_children[referenced_table].append(table_name)
+    subject_tables = [table_name for table_name, deps in child_to_parents.items() if not deps]
+    return child_to_parents, parent_to_children, subject_tables
+def _build_execution_plan(parent_to_children: dict[str, list[str]], subject_tables: list[str]) -> list[str]:
+    execution_plan = []
+    bfs_queue = list(subject_tables)
+    processed = set()
+    while bfs_queue:
+        table_name = bfs_queue.pop(0)
+        if table_name in processed:
+            continue
+        execution_plan.append(table_name)
+        processed.add(table_name)
+        for child in parent_to_children[table_name]:
+            if child not in bfs_queue and child not in processed:
+                bfs_queue.append(child)
+    return execution_plan
 def sample(
     *,
     tables: dict[str, dict],
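
The two helpers added in this hunk replace the declaration-order loop of 0.0.4: tables without foreign keys ("subject tables") seed a breadth-first traversal, and each table's children are queued behind it. Below is a compact sketch of that ordering, assuming a plain mapping from table name to referenced tables rather than a `MockConfig`. `build_plan` is a hypothetical name, and `collections.deque` stands in for the list-based queue used in the diff.

```python
from collections import deque


def build_plan(child_to_parents: dict) -> list:
    """Order tables breadth-first, starting from tables with no parents."""
    # Invert the mapping: parent -> tables that reference it.
    parent_to_children = {name: [] for name in child_to_parents}
    for child, parents in child_to_parents.items():
        for parent in parents:
            parent_to_children[parent].append(child)

    # Subject tables have no foreign keys and seed the traversal.
    queue = deque(name for name, parents in child_to_parents.items() if not parents)
    seen = set(queue)  # tables that are queued or already planned
    plan = []
    while queue:
        table = queue.popleft()
        plan.append(table)
        for child in parent_to_children[table]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return plan


# Declaration order no longer matters: parents are planned before children.
tables = {"items": ["orders"], "orders": ["customers"], "customers": []}
print(build_plan(tables))  # ['customers', 'orders', 'items']
```

One caveat that follows from the traversal itself: breadth-first order alone does not guarantee that a table with several parents is scheduled after all of them. In the sketch, `{"a": [], "c": ["b", "a"], "b": ["a"]}` yields `a, c, b`, so `c` is planned before its parent `b`.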
@@ -491,34 +556,52 @@ def sample(
     from mostlyai import mock
     tables = {
-        "guests": {
-            "description": "Guests of an Alpine ski hotel in Austria",
+        "customers": {
+            "description": "Customers of a hardware store",
             "columns": {
-                "id": {"prompt": "the unique id of the guest", "dtype": "integer"},
-                "name": {"prompt": "first name and last name of the guest", "dtype": "string"},
+                "customer_id": {"prompt": "the unique id of the customer", "dtype": "integer"},
+                "name": {"prompt": "first name and last name of the customer", "dtype": "string"},
+            },
+            "primary_key": "customer_id",
+        },
+        "orders": {
+            "description": "Orders of a Customer",
+            "columns": {
+                "customer_id": {"prompt": "the customer id for that order", "dtype": "integer"},
+                "order_id": {"prompt": "the unique id of the order", "dtype": "string"},
+                "text": {"prompt": "order text description", "dtype": "string"},
+                "amount": {"prompt": "order amount in USD", "dtype": "float"},
             },
-            "primary_key": "id",
+            "primary_key": "order_id",
+            "foreign_keys": [
+                {
+                    "column": "customer_id",
+                    "referenced_table": "customers",
+                    "description": "each customer has anywhere between 2 and 3 orders",
+                }
+            ],
         },
-        "purchases": {
-            "description": "Purchases of a Guest during their stay",
+        "items": {
+            "description": "Items in an Order",
             "columns": {
-                "guest_id": {"prompt": "the guest id for that purchase", "dtype": "integer"},
-                "purchase_id": {"prompt": "the unique id of the purchase", "dtype": "string"},
-                "text": {"prompt": "purchase text description", "dtype": "string"},
-                "amount": {"prompt": "purchase amount in EUR", "dtype": "float"},
+                "item_id": {"prompt": "the unique id of the item", "dtype": "string"},
+                "order_id": {"prompt": "the order id for that item", "dtype": "string"},
+                "name": {"prompt": "the name of the item", "dtype": "string"},
+                "price": {"prompt": "the price of the item in USD", "dtype": "float"},
             },
             "foreign_keys": [
                 {
-                    "column": "guest_id",
-                    "referenced_table": "guests",
-                    "description": "each guest has anywhere between 1 and 10 purchases",
+                    "column": "order_id",
+                    "referenced_table": "orders",
+                    "description": "each order has between 1 and 2 items",
                 }
             ],
         },
     }
-    data = mock.sample(tables=tables, sample_size=5, model="openai/gpt-4.1-nano")
-    df_guests = data["guests"]
-    df_purchases = data["purchases"]
+    data = mock.sample(tables=tables, sample_size=2, model="openai/gpt-4.1")
+    df_customers = data["customers"]
+    df_orders = data["orders"]
+    df_items = data["items"]
     ```
     """
@@ -526,9 +609,15 @@ def sample(
     sample_size = _harmonize_sample_size(sample_size, config)
     primary_keys = {table_name: table_config.primary_key for table_name, table_config in config.root.items()}
-    dfs = {}
-    for table_name, table_config in config.root.items():
-        if len(dfs) == 0:
+    child_to_parents, parent_to_children, subject_tables = _build_dependency_graph(config)
+    execution_plan: list[str] = _build_execution_plan(parent_to_children, subject_tables)
+    results: dict[str, pd.DataFrame] = {}
+    for table_name in execution_plan:
+        table_config = config.root[table_name]
+        if not child_to_parents[table_name]:
             # subject table
             df = _sample_table(
                 table_name=table_name,
@@ -542,22 +631,21 @@ def sample(
                 previous_rows_size=5,
                 llm_config=LLMConfig(model=model, api_key=api_key),
             )
-        elif len(dfs) == 1:
-            # sequence table
+        else:
+            # sequencial table
+            referenced_table = table_config.foreign_keys[0].referenced_table
             df = _sample_table(
                 table_name=table_name,
                 table_config=table_config,
                 primary_keys=primary_keys,
                 sample_size=None,
-                context_data=next(iter(dfs.values())),
+                context_data=results[referenced_table],
                 temperature=temperature,
                 top_p=top_p,
                 batch_size=1,  # generate one sequence at a time
                 previous_rows_size=5,
                 llm_config=LLMConfig(model=model, api_key=api_key),
             )
-        else:
-            raise RuntimeError("Only 1 or 2 table setups are supported for now")
-        dfs[table_name] = df
+        results[table_name] = df
-    return dfs if len(dfs) > 1 else next(iter(dfs.values()))
+    return results if len(results) > 1 else next(iter(results.values()))

{mostlyai_mock-0.0.4 → mostlyai_mock-0.0.6}/pyproject.toml RENAMED

@@ -1,11 +1,28 @@
 [project]
 name = "mostlyai-mock"
-version = "0.0.4"
+version = "0.0.6"
 description = "Synthetic Mock Data"
 authors = [{ name = "MOSTLY AI", email = "dev@mostly.ai" }]
 requires-python = ">=3.10"
 readme = "README.md"
 license = "Apache-2.0"
+classifiers = [
+    "Development Status :: 4 - Beta",
+    "Intended Audience :: Developers",
+    "Intended Audience :: Science/Research",
+    "Intended Audience :: Information Technology",
+    "Intended Audience :: Financial and Insurance Industry",
+    "Intended Audience :: Healthcare Industry",
+    "Intended Audience :: Telecommunications Industry",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "License :: OSI Approved :: Apache Software License",
+    "Operating System :: OS Independent",
+    "Topic :: Software Development :: Libraries",
+    "Typing :: Typed",
+]
 dependencies = [
     "pydantic>=2.0.0,<3.0.0",
     "numpy>=1.26.3",

mostlyai_mock-0.0.4/PKG-INFO DELETED

@@ -1,98 +0,0 @@
-Metadata-Version: 2.4
-Name: mostlyai-mock
-Version: 0.0.4
-Summary: Synthetic Mock Data
-Project-URL: homepage, https://github.com/mostly-ai/mostlyai-mock
-Project-URL: repository, https://github.com/mostly-ai/mostlyai-mock
-Project-URL: documentation, https://mostly-ai.github.io/mostlyai-mock/
-Author-email: MOSTLY AI <dev@mostly.ai>
-License-Expression: Apache-2.0
-License-File: LICENSE
-Requires-Python: >=3.10
-Requires-Dist: litellm>=1.67.0
-Requires-Dist: numpy>=1.26.3
-Requires-Dist: pandas>=2.0.0
-Requires-Dist: pyarrow>=14.0.0
-Requires-Dist: pydantic<3.0.0,>=2.0.0
-Description-Content-Type: text/markdown
-# Synthetic Mock Data 🔮
-[![Documentation](https://img.shields.io/badge/docs-latest-green)](https://mostly-ai.github.io/mostlyai-mock/) [![stats](https://pepy.tech/badge/mostlyai-mock)](https://pypi.org/project/mostlyai-mock/) ![license](https://img.shields.io/github/license/mostly-ai/mostlyai-mock) ![GitHub Release](https://img.shields.io/github/v/release/mostly-ai/mostlyai-mock) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mostlyai-mock)
-Create data out of nothing. Prompt LLMs for Tabular Data.
-## Installation
-The latest release of `mostlyai-mock` can be installed via pip:
-```bash
-pip install -U mostlyai-mock
-```
-Note: An API key to a LLM endpoint, with structured response, is required. It is recommended to set such a key as an environment variable (e.g. `OPENAI_API_KEY`, `GEMINI_API_KEY`, etc.). Alternatively, the key needs to be passed to every call to the library iteself via the parameter `api_key`.
-## Quick Start
-### Single Table
-```python
-from mostlyai import mock
-tables = {
-    "guests": {
-        "description": "Guests of an Alpine ski hotel in Austria",
-        "columns": {
-            "nationality": {"prompt": "2-letter code for the nationality", "dtype": "string"},
-            "name": {"prompt": "first name and last name of the guest", "dtype": "string"},
-            "gender": {"dtype": "category", "values": ["male", "female"]},
-            "age": {"prompt": "age in years; min: 18, max: 80; avg: 25", "dtype": "integer"},
-            "date_of_birth": {"prompt": "date of birth", "dtype": "date"},
-            "checkin_time": {"prompt": "the check in timestamp of the guest; may 2025", "dtype": "datetime"},
-            "is_vip": {"prompt": "is the guest a VIP", "dtype": "boolean"},
-            "price_per_night": {"prompt": "price paid per night, in EUR", "dtype": "float"},
-            "room_number": {"prompt": "room number", "dtype": "integer", "values": [101, 102, 103, 201, 202, 203, 204]}
-        },
-    }
-}
-df = mock.sample(tables=tables, sample_size=10, model="openai/gpt-4.1-nano")
-print(df)
-```
-### Multiple Tables
-```python
-from mostlyai import mock
-tables = {
-    "guests": {
-        "description": "Guests of an Alpine ski hotel in Austria",
-        "columns": {
-            "id": {"prompt": "the unique id of the guest", "dtype": "integer"},
-            "name": {"prompt": "first name and last name of the guest", "dtype": "string"},
-        },
-        "primary_key": "id",
-    },
-    "purchases": {
-        "description": "Purchases of a Guest during their stay",
-        "columns": {
-            "guest_id": {"prompt": "the guest id for that purchase", "dtype": "integer"},
-            "purchase_id": {"prompt": "the unique id of the purchase", "dtype": "string"},
-            "text": {"prompt": "purchase text description", "dtype": "string"},
-            "amount": {"prompt": "purchase amount in EUR", "dtype": "float"},
-        },
-        "foreign_keys": [
-            {
-                "column": "guest_id",
-                "referenced_table": "guests",
-                "description": "each guest has anywhere between 1 and 10 purchases",
-            }
-        ],
-    },
-}
-data = mock.sample(tables=tables, sample_size=5, model="openai/gpt-4.1-nano")
-df_guests = data["guests"]
-df_purchases = data["purchases"]
-print(df_guests)
-print(df_purchases)
-```

mostlyai_mock-0.0.4/README.md DELETED

@@ -1,80 +0,0 @@
-# Synthetic Mock Data 🔮
-[![Documentation](https://img.shields.io/badge/docs-latest-green)](https://mostly-ai.github.io/mostlyai-mock/) [![stats](https://pepy.tech/badge/mostlyai-mock)](https://pypi.org/project/mostlyai-mock/) ![license](https://img.shields.io/github/license/mostly-ai/mostlyai-mock) ![GitHub Release](https://img.shields.io/github/v/release/mostly-ai/mostlyai-mock) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/mostlyai-mock)
-Create data out of nothing. Prompt LLMs for Tabular Data.
-## Installation
-The latest release of `mostlyai-mock` can be installed via pip:
-```bash
-pip install -U mostlyai-mock
-```
-Note: An API key to a LLM endpoint, with structured response, is required. It is recommended to set such a key as an environment variable (e.g. `OPENAI_API_KEY`, `GEMINI_API_KEY`, etc.). Alternatively, the key needs to be passed to every call to the library iteself via the parameter `api_key`.
-## Quick Start
-### Single Table
-```python
-from mostlyai import mock
-tables = {
-    "guests": {
-        "description": "Guests of an Alpine ski hotel in Austria",
-        "columns": {
-            "nationality": {"prompt": "2-letter code for the nationality", "dtype": "string"},
-            "name": {"prompt": "first name and last name of the guest", "dtype": "string"},
-            "gender": {"dtype": "category", "values": ["male", "female"]},
-            "age": {"prompt": "age in years; min: 18, max: 80; avg: 25", "dtype": "integer"},
-            "date_of_birth": {"prompt": "date of birth", "dtype": "date"},
-            "checkin_time": {"prompt": "the check in timestamp of the guest; may 2025", "dtype": "datetime"},
-            "is_vip": {"prompt": "is the guest a VIP", "dtype": "boolean"},
-            "price_per_night": {"prompt": "price paid per night, in EUR", "dtype": "float"},
-            "room_number": {"prompt": "room number", "dtype": "integer", "values": [101, 102, 103, 201, 202, 203, 204]}
-        },
-    }
-}
-df = mock.sample(tables=tables, sample_size=10, model="openai/gpt-4.1-nano")
-print(df)
-```
-### Multiple Tables
-```python
-from mostlyai import mock
-tables = {
-    "guests": {
-        "description": "Guests of an Alpine ski hotel in Austria",
-        "columns": {
-            "id": {"prompt": "the unique id of the guest", "dtype": "integer"},
-            "name": {"prompt": "first name and last name of the guest", "dtype": "string"},
-        },
-        "primary_key": "id",
-    },
-    "purchases": {
-        "description": "Purchases of a Guest during their stay",
-        "columns": {
-            "guest_id": {"prompt": "the guest id for that purchase", "dtype": "integer"},
-            "purchase_id": {"prompt": "the unique id of the purchase", "dtype": "string"},
-            "text": {"prompt": "purchase text description", "dtype": "string"},
-            "amount": {"prompt": "purchase amount in EUR", "dtype": "float"},
-        },
-        "foreign_keys": [
-            {
-                "column": "guest_id",
-                "referenced_table": "guests",
-                "description": "each guest has anywhere between 1 and 10 purchases",
-            }
-        ],
-    },
-}
-data = mock.sample(tables=tables, sample_size=5, model="openai/gpt-4.1-nano")
-df_guests = data["guests"]
-df_purchases = data["purchases"]
-print(df_guests)
-print(df_purchases)
-```
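
The multi-table example in the removed README links `purchases.guest_id` to `guests` through a `foreign_keys` entry. As a minimal sketch, independent of mostlyai-mock itself, such a config could be sanity-checked before it is passed to `mock.sample`. The helper name `check_foreign_keys` is hypothetical, and the rule that a referenced table should declare a `primary_key` is an assumption taken from the README example, not a documented requirement of the library.

```python
# Hypothetical helper, not part of mostlyai-mock: it only inspects the config dict.
def check_foreign_keys(tables: dict) -> list[str]:
    """Return a list of human-readable problems; empty if the config looks consistent."""
    problems = []
    for table_name, spec in tables.items():
        columns = spec.get("columns", {})
        for fk in spec.get("foreign_keys", []):
            column = fk.get("column")
            parent = fk.get("referenced_table")
            # The foreign key column should be one of the table's own columns.
            if column not in columns:
                problems.append(f"{table_name}: foreign key column {column!r} is not declared")
            # The referenced table should exist in the same config.
            if parent not in tables:
                problems.append(f"{table_name}: referenced table {parent!r} is not declared")
            # Assumption: a referenced table declares a primary key, as in the README.
            elif "primary_key" not in tables[parent]:
                problems.append(f"{table_name}: referenced table {parent!r} has no primary_key")
    return problems


# Trimmed version of the README's multi-table config.
tables = {
    "guests": {
        "columns": {"id": {"dtype": "integer"}, "name": {"dtype": "string"}},
        "primary_key": "id",
    },
    "purchases": {
        "columns": {"guest_id": {"dtype": "integer"}, "amount": {"dtype": "float"}},
        "foreign_keys": [{"column": "guest_id", "referenced_table": "guests"}],
    },
}

print(check_foreign_keys(tables))  # prints []
```

Running a check like this locally costs nothing, whereas a misconfigured relationship would otherwise only surface after an LLM call has been made.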

{mostlyai_mock-0.0.4 → mostlyai_mock-0.0.6}/.gitignore RENAMED

File without changes

{mostlyai_mock-0.0.4 → mostlyai_mock-0.0.6}/LICENSE RENAMED

File without changes
