fauxreal 0.1.0__tar.gz

fauxreal-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,455 @@
1
+ Metadata-Version: 2.4
2
+ Name: fauxreal
3
+ Version: 0.1.0
4
+ Summary: A powerful, declarative engine for generating dynamic, relational, and highly customizable fake datasets using a JSON schema.
5
+ Author-email: Author Name <author@example.com>
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/vatsa_codes/fauxreal
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Operating System :: OS Independent
11
+ Requires-Python: >=3.8
12
+ Description-Content-Type: text/markdown
13
+ License-File: LICENSE
14
+ Requires-Dist: pandas>=1.0.0
15
+ Requires-Dist: faker>=10.0.0
16
+ Requires-Dist: python-dateutil
17
+ Provides-Extra: parquet
18
+ Requires-Dist: pyarrow; extra == "parquet"
19
+ Requires-Dist: fastparquet; extra == "parquet"
20
+ Dynamic: license-file
21
+
22
+ # Fauxreal
23
+
24
+ A powerful, declarative engine for generating dynamic, relational, and highly customizable fake datasets using a JSON schema. It operates in distinct phases to weave static variables, dynamic fakes, transformations, and complex nested payloads seamlessly into Pandas DataFrames and JSON exports.
25
+
26
+ ## Features
27
+ - **Deterministic Generation**: Supports seeding, so the same seed and config produce identical mock datasets across CI/CD and test runs.
28
+ - **Relational Integrity**: Automatically resolves Foreign Keys. DataFrames can sample primary keys generated in upstream DataFrames to ensure valid relationships.
29
+ - **Cartesian Cross-Joins**: Generate every unique combination of values from the specified lists and inject them into DataFrames.
30
+ - **Deep Nesting**: Construct complex nested JSON payloads (Composites) by injecting previously generated variables into leaf nodes.
31
+ - **Conditional Logic**: Evaluate Python-style `eval` statements against current variables to conditionally generate values (e.g. `If amount > 80, result = REVIEW`).
32
+ - **Data Transforms**: Apply string transformations sequentially (padding, truncating, regex replacements) directly within the schema.
33
+
34
+ ## Requirements
35
+ - Python 3.8+
36
+ - `pandas`
37
+ - `faker`
38
+ - `pyarrow` or `fastparquet` (Optional, for parquet export support)
39
+
40
+ ---
41
+
42
+ ## Installation & Usage
43
+
44
+ Install the package directly:
45
+ ```bash
46
+ pip install fauxreal
47
+ ```
48
+
49
+ ### Python API
50
+ Import `generate` directly into your data pipeline or test suite:
51
+
52
+ ```python
53
+ from fauxreal import generate
54
+
55
+ # 1. Execute the pipeline and retrieve only the targets you need
56
+ results = generate(
57
+     config_path="fauxreal_config.json",
58
+     targets=["system_environment", "user_identity_payload", "transactions_df"]
59
+ )
60
+
61
+ # 2. Access Data
62
+ print(results["system_environment"])
63
+ print(results["user_identity_payload"])
64
+
65
+ # 3. Extract your Pandas DataFrame
66
+ df = results["transactions_df"]
67
+ ```
68
+ *Note: If you omit the `targets` parameter, the engine will return the legacy tuple containing all variables: `(fixed, dynamic, transformed, composites, dataframes)`.*
69
+
70
+ ### CLI Usage
71
+ You can run the engine directly from your terminal and dynamically override any fixed variables at runtime using the `--override` flag. This is extremely useful for CI/CD pipelines!
72
+
73
+ ```bash
74
+ fauxreal \
75
+ --config fauxreal_config.json \
76
+ --override system_environment=staging \
77
+ --override max_connections=999 \
78
+ --seed 42
79
+ ```
80
+
81
+ ---
82
+
83
+ ## Full Example Pipeline
84
+
85
+ Here is an example demonstrating how to construct a schema to generate users and their related transactions:
86
+
87
+ ### 1. Sample Config (`fauxreal_config.json`)
88
+ ```json
89
+ {
90
+ "variable_generation_config": {
91
+ "seed": 42,
92
+ "fixed_variables": [
93
+ { "name": "env", "type": "string", "value": "US" }
94
+ ],
95
+ "dynamic_variables": [
96
+ {
97
+ "name": "transaction_id",
98
+ "type": "string",
99
+ "generation_rules": { "format": "uuid" }
100
+ },
101
+ {
102
+ "name": "amount",
103
+ "type": "float",
104
+ "generation_rules": { "min": 10.0, "max": 100.0, "decimal_places": 2 }
105
+ }
106
+ ],
107
+ "dataframes": [
108
+ {
109
+ "name": "transactions_df",
110
+ "count": 5,
111
+ "columns": [
112
+ { "name": "env", "ref": "env" },
113
+ { "name": "id", "ref": "transaction_id" },
114
+ { "name": "amount", "ref": "amount" }
115
+ ]
116
+ }
117
+ ],
118
+ "exports": [
119
+ {
120
+ "type": "csv",
121
+ "ref": "transactions_df",
122
+ "filepath": "output_transactions.csv"
123
+ }
124
+ ]
125
+ }
126
+ }
127
+ ```
128
+
129
+ ### 2. Python Script
130
+ ```python
131
+ from fauxreal import generate
132
+
133
+ results = generate(config_path='fauxreal_config.json', targets=['transactions_df'])
134
+ df = results['transactions_df']
135
+ print(df)
136
+ ```
137
+
138
+ ---
139
+
140
+ ## Function Reference
141
+
142
+ ### `generate(config_path="fauxreal_config.json", overrides=None, seed=None, targets=None)`
143
+
144
+ The main entry point: runs the pipeline described by the config and returns the generated variables and DataFrames.
145
+
146
+ #### Parameters
147
+
148
+ * **`config_path`** `(str)`: *(Optional, Default: "fauxreal_config.json")*
149
+ The filepath to your JSON configuration schema.
150
+
151
+ * **`overrides`** `(dict)`: *(Optional)*
152
+ A dictionary of runtime overrides for fixed variables (e.g. `{"env": "prod"}`).
153
+
154
+ * **`seed`** `(int)`: *(Optional)*
155
+ Global seed for deterministic generation. Overrides the seed set inside the JSON config.
156
+
157
+ * **`targets`** `(list)`: *(Optional)*
158
+ A list of specific variable or DataFrame names to return exclusively. If provided, returns a `dict` mapping the target name to the generated object. If omitted, returns a 5-element tuple.
159
+
160
+ ---
161
+
162
+ ## Configuration Schema Details
163
+
164
+ This document outlines the structure of the `fauxreal_config.json` file. The configuration drives the `fauxreal` pipeline, which operates through these core mechanisms:
165
+
166
+ 1. **Fixed Variables:** Static mappings of keys to values.
167
+ 2. **Dynamic Variables:** Rule-based generation (random numbers, strings, UUIDs, dates, etc.).
168
+ 3. **Transformations:** Post-processing actions applied to generated variables (e.g., padding, replacing, truncating).
169
+ 4. **Composite Variables:** Nested schemas (dicts, lists) that reference other generated variables.
170
+ 5. **DataFrames:** Tabular structures that cross-join combinations and map columns to variables.
171
+ 6. **Exports:** Automatically save DataFrames to CSVs and Composite payloads to JSON files.
172
+ 7. **Command Line Interface (CLI):** Dynamically execute and override configurations via terminal.
173
+ 8. **Python Usage:** Programmatic API to extract exact payloads or DataFrames in external scripts.
174
+
175
+ ---
176
+
177
+ ### Top-Level Structure
178
+
179
+ ```json
180
+ {
181
+ "variable_generation_config": {
182
+ "description": "Optional description",
183
+ "seed": 42,
184
+ "fixed_variables": [],
185
+ "dynamic_variables": [],
186
+ "transformations": [],
187
+ "composite_variables": [],
188
+ "dataframes": [],
189
+ "exports": []
190
+ }
191
+ }
192
+ ```
193
+
194
+ ---
195
+
196
+ ### 1. Fixed Variables
197
+ Simple key-value pairs that are loaded directly into the state store.
198
+
199
+ | Field | Type | Description |
200
+ |-------|------|-------------|
201
+ | `name` | string | The variable reference name. |
202
+ | `type` | string | `string`, `int`, `float`, `boolean`, `list`, `dict`, `date`. |
203
+ | `value`| any | The static value to assign. |
204
+
205
+ ---
206
+
207
+ ### 2. Dynamic Variables
208
+ Variables generated dynamically at runtime based on specific rules.
209
+
210
+ #### Common Fields
211
+ - `name`: Reference name.
212
+ - `type`: Data type. Must be one of: `"int"`, `"float"`, `"string"`, `"list"`, `"dict"`, `"date"`, `"date_range"`, `"choice"`, `"conditional"`, `"foreign_key"`, `"template"`, `"faker"`.
213
+ - `generation_rules`: Object containing type-specific parameters.
214
+
215
+ #### Supported Variable Types Summary
216
+ - **`int` / `float`**: Uniform or Gaussian random numbers clamped between bounds.
217
+ - **`string`**: UUIDs, integers, or fully customized random strings.
218
+ - **`faker`**: Highly realistic semantic data (names, emails, addresses) powered by `Faker`.
219
+ - **`choice`**: Randomly selects an item from `options` based on `weights` (optional).
220
+ - **`conditional`**: Evaluates conditions using Python-like `eval` strings with `{{variable}}` string interpolation. If a condition evaluates to `True`, returns `result`. Fallback is `default`.
221
+ - **`date` / `date_range`**: Creates ISO date strings or arrays of dates using base anchors and offsets.
222
+ - **`template`**: Interpolates other variables into a string template using `{{variable_name}}` syntax.
223
+ - **`foreign_key`**: Randomly samples a value from a previously generated DataFrame's column (requires `dataframe` and `column` attributes).
224
+
225
+ #### Integer & Float Rules (`type: "int" | "float"`)
226
+ - `min`: Minimum value (number).
227
+ - `max`: Maximum value (number).
228
+ - `decimal_places`: (Float only) Number of decimal places (integer).
229
+ - `distribution`: (Optional) Setting this to `"normal"` switches from uniform random to Gaussian distribution.
230
+   - If `"normal"`, you must provide `"mean"` and `"std_dev"`.
231
+   - E.g., `{"distribution": "normal", "mean": 50, "std_dev": 15}` will group values closely around 50. Results are still strictly clamped between `min` and `max`.
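
The clamping rule can be sketched in plain Python. This is an illustrative sketch rather than fauxreal's actual implementation: `sample_number` is a hypothetical helper, and only the rule names (`min`, `max`, `decimal_places`, `distribution`, `mean`, `std_dev`) are taken from the schema above.

```python
import random


def sample_number(rules, rng):
    """Draw one float from a generation_rules-style dict (hypothetical helper)."""
    if rules.get("distribution") == "normal":
        value = rng.gauss(rules["mean"], rules["std_dev"])
    else:
        value = rng.uniform(rules["min"], rules["max"])
    # Clamp so the result never leaves [min, max], even for Gaussian outliers.
    value = max(rules["min"], min(rules["max"], value))
    return round(value, rules.get("decimal_places", 2))


rules = {"min": 10.0, "max": 100.0, "decimal_places": 2,
         "distribution": "normal", "mean": 50, "std_dev": 15}
rng = random.Random(42)  # a seeded generator makes the draws repeatable
samples = [sample_number(rules, rng) for _ in range(1000)]
```

In this sketch, outliers are pinned to exactly `min` or `max` rather than redrawn, so a large `std_dev` relative to the range will pile values up at the bounds.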
232
+
233
+ #### String Rules (`type: "string"`)
234
+ - `format`: `"uuid"` (generates standard v4 UUID).
235
+ - `source`: `"integer"` or `"random_string"`.
236
+   - If `"integer"`: Uses `min` and `max` limits.
237
+   - If `"random_string"`:
238
+     - `min_length`, `max_length`: String boundaries.
239
+     - `include_mixed_case`: boolean.
240
+     - `include_alphanumeric`: boolean.
241
+     - `include_special_characters`: boolean.
242
+     - `special_characters`: List of string chars, e.g., `["!", "@", "#"]`.
243
+
244
+ #### Choice Rules (`type: "choice"`)
245
+ Randomly selects one item from a list, with optional probability weighting.
246
+ - `options`: List of items to pick from.
247
+ - `weights`: (Optional) List of probabilities. Must be the same length as `options`. If omitted, uniform distribution is used.
248
+ - Example: `{"options": ["SUCCESS", "FAILED"], "weights": [0.9, 0.1]}`
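
Weighted selection of this kind can be reproduced with the standard library. The sketch below uses `random.choices` on the example rule above; whether fauxreal uses the same call internally is an assumption.

```python
import random
from collections import Counter

options = ["SUCCESS", "FAILED"]
weights = [0.9, 0.1]  # relative weights, one per option

rng = random.Random(42)
picks = rng.choices(options, weights=weights, k=10_000)
counts = Counter(picks)  # roughly nine SUCCESS for every FAILED
```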
249
+
250
+ #### Template Rules (`type: "template"`)
251
+ Dynamically constructs a string by injecting previously generated variables into placeholders.
252
+ - `template`: A string containing placeholders wrapped in double curly braces (e.g., `"{{var_name}}"`).
253
+ - Example: `{"template": "Receipt for {{customer_name}}: A transaction of ${{transaction_amount_usd}} was {{transaction_status}}."}`
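
Placeholder substitution of this shape can be sketched with a regular expression. `render_template` and the sample values are hypothetical; only the `{{variable_name}}` syntax comes from the schema.

```python
import re

# Matches {{name}}, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template, variables):
    """Replace each {{name}} with str(variables[name]); unknown names raise KeyError."""
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


variables = {
    "customer_name": "Ada",
    "transaction_amount_usd": 42.5,
    "transaction_status": "SUCCESS",
}
message = render_template(
    "Receipt for {{customer_name}}: A transaction of ${{transaction_amount_usd}} "
    "was {{transaction_status}}.",
    variables,
)
```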
254
+
255
+ #### Conditional Variable
256
+ ```json
257
+ {
258
+ "name": "transaction_status",
259
+ "type": "conditional",
260
+ "generation_rules": {
261
+ "conditions": [
262
+ {
263
+ "eval": "{{transaction_amount}} > 80",
264
+ "result": "REVIEW"
265
+ },
266
+ {
267
+ "eval": "{{transaction_amount}} > 50",
268
+ "result": "PENDING"
269
+ }
270
+ ],
271
+ "default": "SUCCESS"
272
+ }
273
+ }
274
+ ```
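
One way such a rule could be evaluated is sketched below. It assumes conditions are checked in order and the first match wins, which the example above suggests but does not state; `resolve_conditional` is a hypothetical helper, not fauxreal's code.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve_conditional(rules, variables):
    """Return the result of the first condition that evaluates truthy, else the default."""
    for condition in rules["conditions"]:
        # Substitute each {{name}} with the repr of its current value.
        expression = PLACEHOLDER.sub(
            lambda m: repr(variables[m.group(1)]), condition["eval"]
        )
        # Builtins are removed, but eval should still only see trusted configs.
        if eval(expression, {"__builtins__": {}}, {}):
            return condition["result"]
    return rules["default"]


rules = {
    "conditions": [
        {"eval": "{{transaction_amount}} > 80", "result": "REVIEW"},
        {"eval": "{{transaction_amount}} > 50", "result": "PENDING"},
    ],
    "default": "SUCCESS",
}
statuses = [resolve_conditional(rules, {"transaction_amount": a}) for a in (95, 60, 20)]
```

Under the first-match assumption, ordering matters: placing the `> 50` condition first would make `REVIEW` unreachable.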
275
+
276
+ #### Foreign Key Variable (`type: "foreign_key"`)
277
+ Randomly samples a value from a previously generated DataFrame's column (requires `dataframe` and `column` attributes).
278
+ ```json
279
+ {
280
+ "name": "transaction_fk",
281
+ "type": "foreign_key",
282
+ "generation_rules": {
283
+ "dataframe": "transactions_df",
284
+ "column": "id"
285
+ }
286
+ }
287
+ ```
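
The idea behind foreign-key sampling can be shown without pandas. In this sketch a list of row dicts stands in for the upstream DataFrame; the data and `sample_foreign_key` are hypothetical.

```python
import random

# Stand-in for an upstream DataFrame: a list of row dicts (hypothetical data).
transactions = [{"id": "t-001"}, {"id": "t-002"}, {"id": "t-003"}]


def sample_foreign_key(rows, column, rng):
    """Pick one existing value from `column`, so every sampled key is valid."""
    return rng.choice([row[column] for row in rows])


rng = random.Random(42)
foreign_keys = [sample_foreign_key(transactions, "id", rng) for _ in range(5)]
```

Because values are drawn only from the existing column, every generated key refers to a real upstream row.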
288
+
289
+ #### Faker / Semantic Rules (`type: "faker"`)
290
+ Uses `faker`, which is installed automatically as a dependency of `fauxreal`. Generates realistic mock data for most common field types.
291
+ - `provider`: The Faker provider method to call.
292
+ - Supported Providers include:
293
+   - `name`, `first_name`, `last_name`
294
+   - `email`, `company_email`, `domain_name`
295
+   - `address`, `city`, `state`, `country`, `zipcode`
296
+   - `company`, `job`, `catch_phrase`
297
+   - `phone_number`, `ssn`
298
+   - `text`, `sentence`, `paragraph`
299
+   - `credit_card_number`, `iban`, `bban`
300
+ - *Note:* Any valid Faker provider not requiring complex positional arguments is supported.
301
+
302
+ #### Date Rules (`type: "date"`)
303
+ Generates timestamps or standard dates.
304
+ - `anchor_date`: Base date to start calculations from. (`"today"`, `"2024-05-18"`, etc.)
305
+ - `date_offset`: Offset amount and unit (`"+7 days"`, `"-1 month"`, `"+1 years"`).
306
+ - `time_offset`: Sub-day offset (`"+5 hours"`, `"-30 minutes"`).
307
+ - `timezone`: Standard tz database name (e.g., `"UTC"`, `"EST"`, `"Europe/London"`).
308
+ - `format`: Desired output format string (e.g., `"YYYY-MM-DD"`, `"epoch"`).
309
+ - `include_timestamp`: boolean. If false and format is epoch, it zeroes out hours/minutes/seconds.
310
+ - `timestamp_format`: e.g., `"HH:mm:ss"` or literal strings like `"T12:00:00Z"`.
311
+
312
+ #### Date Range Rules (`type: "date_range"`)
313
+ Generates a *list* of dates based on start and end offsets.
314
+ - `start_anchor` / `end_anchor`: e.g., `"today"`.
315
+ - `start_offset` / `end_offset`: e.g., `"-5 days"`, `"+5 days"`.
316
+ - `step`: Interval between dates (e.g., `"1 days"`).
317
+ - `skip_weekends`: boolean. If true, Saturdays and Sundays will be excluded from the generated list.
318
+ - Uses standard formatting fields (`format`, `timezone`, `include_timestamp`).
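
The effect of `step` and `skip_weekends` can be sketched with `datetime`. The helper below is hypothetical and assumes both endpoints are inclusive, which the schema does not specify.

```python
from datetime import date, timedelta


def date_range(start, end, step_days=1, skip_weekends=False):
    """Return ISO date strings from start to end (inclusive in this sketch)."""
    dates = []
    current = start
    while current <= end:
        # weekday() returns 5 for Saturday and 6 for Sunday.
        if not (skip_weekends and current.weekday() >= 5):
            dates.append(current.isoformat())
        current += timedelta(days=step_days)
    return dates


anchor = date(2024, 5, 18)  # a Saturday
window = date_range(
    anchor - timedelta(days=5),  # start_offset of "-5 days"
    anchor + timedelta(days=5),  # end_offset of "+5 days"
    skip_weekends=True,
)
```

With these inputs the eleven-day window shrinks to nine entries, because the anchor Saturday and the following Sunday are dropped.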
319
+
320
+ ---
321
+
322
+ ### 3. Transformations
323
+ Transformations allow you to take a generated variable and apply sequential modifications.
324
+ - `name`: The new transformed variable's name.
325
+ - `ref`: The name of the previously generated variable to modify.
326
+ - `actions`: List of transformation objects.
327
+
328
+ #### Possible Actions:
329
+ - `{"action": "cast_to_string"}`
330
+ - `{"action": "pad_left", "pad_character": "0", "target_length": 9}`
331
+ - `{"action": "prepend", "value": "EMP-"}`
332
+ - `{"action": "append", "value": " USD"}`
333
+ - `{"action": "lowercase"}`
334
+ - `{"action": "truncate", "max_length": 10}`
335
+ - `{"action": "replace", "target": "[^a-z0-9]", "replacement": ".", "use_regex": true}`
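
Sequential application of these actions might look like the sketch below. `apply_actions` is a hypothetical helper written against the action names listed above; details such as padding an already-long string are assumptions.

```python
import re


def apply_actions(value, actions):
    """Apply each action in order, feeding the result into the next one."""
    for step in actions:
        action = step["action"]
        if action == "cast_to_string":
            value = str(value)
        elif action == "pad_left":
            value = str(value).rjust(step["target_length"], step["pad_character"])
        elif action == "prepend":
            value = step["value"] + value
        elif action == "append":
            value = value + step["value"]
        elif action == "lowercase":
            value = value.lower()
        elif action == "truncate":
            value = value[: step["max_length"]]
        elif action == "replace":
            if step.get("use_regex"):
                value = re.sub(step["target"], step["replacement"], value)
            else:
                value = value.replace(step["target"], step["replacement"])
        else:
            raise ValueError(f"unknown action: {action}")
    return value


employee_id = apply_actions(4821, [
    {"action": "cast_to_string"},
    {"action": "pad_left", "pad_character": "0", "target_length": 9},
    {"action": "prepend", "value": "EMP-"},
])
slug = apply_actions("Ada Lovelace", [
    {"action": "lowercase"},
    {"action": "replace", "target": "[^a-z0-9]", "replacement": ".", "use_regex": True},
])
```

Order matters here: prepending `EMP-` before padding would count the prefix toward `target_length`.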
336
+
337
+ ---
338
+
339
+ ### 4. Composite Variables
340
+ Composite variables allow you to construct rich nested JSON objects or lists containing references to other variables.
341
+ - `name`: Composite variable name.
342
+ - `type`: `"dict"` or `"list"`.
343
+ - `count`: How many to generate (e.g., `"count": 10`). Defaults to 1.
344
+ - `schema`: A nested structure defining the keys. Use `{"ref": "variable_name"}` to inject a generated value into the leaf nodes.
345
+
346
+ #### Dict Schema Example (with Deep Nesting)
347
+ You can nest dictionaries and lists to any depth. Schema resolution is fully recursive.
348
+ ```json
349
+ {
350
+ "name": "profile_payload",
351
+ "type": "dict",
352
+ "count": 5,
353
+ "schema": {
354
+ "user_id": { "ref": "random_user_id" },
355
+ "line_items": { "ref": "line_item_payload" },
356
+ "metadata": {
357
+ "timestamp": { "ref": "timestamp_iso" }
358
+ }
359
+ }
360
+ }
361
+ ```
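
Recursive resolution of `ref` leaves can be sketched as follows. `resolve_schema` and the sample values are hypothetical; the sketch treats any dict whose only key is `ref` as a leaf, which is an assumption about how the engine distinguishes leaves from nested objects.

```python
def resolve_schema(schema, variables):
    """Walk a schema and swap every {"ref": name} leaf for variables[name]."""
    if isinstance(schema, dict):
        # A dict whose only key is "ref" is a leaf pointing at a variable.
        if set(schema) == {"ref"}:
            return variables[schema["ref"]]
        return {key: resolve_schema(child, variables) for key, child in schema.items()}
    if isinstance(schema, list):
        return [resolve_schema(child, variables) for child in schema]
    return schema  # literal values pass through unchanged


variables = {
    "random_user_id": 7,
    "line_item_payload": [10.5, 20.0],
    "timestamp_iso": "2024-05-18T12:00:00Z",
}
schema = {
    "user_id": {"ref": "random_user_id"},
    "line_items": {"ref": "line_item_payload"},
    "metadata": {"timestamp": {"ref": "timestamp_iso"}},
}
payload = resolve_schema(schema, variables)
```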
362
+
363
+ #### List Schema Example
364
+ ```json
365
+ {
366
+ "name": "line_item_payload",
367
+ "type": "list",
368
+ "count": 5,
369
+ "schema": { "ref": "transaction_amount" }
370
+ }
371
+ ```
372
+
373
+ ---
374
+
375
+ ### 5. DataFrames
376
+ Generates Pandas DataFrames using combinations of the previously defined variables.
377
+
378
+ - `name`: DataFrame name.
379
+ - `count`: Number of rows to generate (fallback if `unique_combinations` is not used).
380
+ - `columns`: List of column mappings `[{"name": "col_1", "ref": "var_1"}]`.
381
+ - `unique_combinations`: (Optional) List of column names (e.g., `["env", "date"]`).
382
+   - *Note:* If provided, the referenced variables MUST be lists. The engine performs a Cartesian cross-join across all provided lists, so every unique combination appears exactly once. The resulting row count is the product of the lengths of those lists and overrides `count`.
383
+ - **Per-Row Dynamic Generation**: Any DataFrame columns pointing to Dynamic or Transformed variables that are *not* part of `unique_combinations` will be actively re-generated with fresh values for every single row in the DataFrame (e.g. generating unique random IDs and amounts per row).
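
The row count produced by `unique_combinations` follows directly from the Cartesian product. The sketch below uses `itertools.product` on two hypothetical list variables; it illustrates the arithmetic, not fauxreal's internals.

```python
from itertools import product

# Hypothetical list variables referenced by unique_combinations.
lists = {
    "env": ["dev", "staging", "prod"],
    "date": ["2024-05-17", "2024-05-18"],
}

names = list(lists)
# One row per combination: 3 envs x 2 dates = 6 rows.
rows = [dict(zip(names, combo)) for combo in product(*(lists[n] for n in names))]
```

Any other columns would then be filled in per row, which is where per-row dynamic generation comes in.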
384
+
385
+ ---
386
+
387
+ ### 6. Exports
388
+ Exports DataFrames or Composites to external files.
389
+
390
+ - `type`: `"csv"`, `"json"`, or `"parquet"` (Requires `pyarrow` or `fastparquet`).
391
+ - `ref`: The name of the DataFrame or Composite variable to export.
392
+ - `filepath`: Output file name.
393
+ - `indent`: (JSON only) Number of spaces for indentation (default: 4).
394
+
395
+ #### Example
396
+ ```json
397
+ "exports": [
398
+ {
399
+ "type": "csv",
400
+ "ref": "transactions_df",
401
+ "filepath": "output_transactions.csv"
402
+ },
403
+ {
404
+ "type": "json",
405
+ "ref": "user_identity_payload",
406
+ "filepath": "output_users.json",
407
+ "indent": 4
408
+ }
409
+ ]
410
+ ```
411
+
412
+ ---
413
+
414
+ ### 7. Command Line Interface (CLI)
415
+ You can run the engine directly from your terminal and dynamically override any fixed variables at runtime using the `--override` flag. This is extremely useful for CI/CD pipelines!
416
+
417
+ #### Command Line Arguments
418
+ - `--config`: Path to your JSON configuration file (defaults to `fauxreal_config.json`).
419
+ - `--override`: Supply as many times as you want in `key=value` format.
420
+   - The script will automatically infer strings, integers, floats, and booleans (`true`/`false`).
421
+ - `--seed`: An integer value to ensure exact reproducibility across multiple runs.
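
Type inference for `key=value` overrides could work along the lines of the sketch below. `parse_override` is a hypothetical helper; the exact rules fauxreal applies (for example, how it treats mixed-case booleans) are not documented here and are assumed.

```python
def parse_override(argument):
    """Split 'key=value' and infer bool, int, float, or str for the value."""
    key, _, raw = argument.partition("=")
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return key, lowered == "true"
    # Try the narrowest numeric type first, then fall back to a plain string.
    for cast in (int, float):
        try:
            return key, cast(raw)
        except ValueError:
            continue
    return key, raw


overrides = dict(
    parse_override(arg)
    for arg in ["system_environment=staging", "max_connections=999", "ratio=0.5", "debug=true"]
)
```

The resulting dict has the same shape as the `overrides` argument accepted by `generate()`.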
422
+
423
+ #### Example
424
+ ```bash
425
+ fauxreal \
426
+ --config my_config.json \
427
+ --override system_environment=staging \
428
+ --override max_connections=999 \
429
+ --seed 42
430
+ ```
431
+
432
+ ---
433
+
434
+ ### 8. Python Usage
435
+
436
+ When you import `fauxreal` from another Python file, call `generate()` to execute the pipeline. Pass `targets` to get back only the variables or DataFrames you need instead of the full tuple:
437
+
438
+ ```python
439
+ from fauxreal import generate
440
+
441
+ # Execute the pipeline and retrieve only the targets you need
442
+ results = generate(
443
+     config_path="fauxreal_config.json",
444
+     targets=["system_environment", "user_identity_payload", "transactions_df"]
445
+ )
446
+
447
+ # Access Data
448
+ print(results["system_environment"])
449
+ print(results["user_identity_payload"])
450
+
451
+ # Extract your Pandas DataFrame
452
+ df = results["transactions_df"]
453
+ ```
454
+
455
+ If you omit the `targets` parameter, the engine will return the legacy tuple containing all variables: `(fixed, dynamic, transformed, composites, dataframes)`.