PyPI - fakedata-python - Versions diffs - 2.0.0__py3-none-any.whl → 2.0.2__py3-none-any.whl - Mend

fakedata-python 2.0.0-py3-none-any.whl → 2.0.2-py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

fakedata/cli.py ADDED

@@ -0,0 +1,133 @@
+#!/usr/bin/env python3
+"""
+fakedata CLI - ML-Ready Synthetic Data Engine
+"""
+import argparse
+import json
+import sys
+import time
+import os
+def main():
+    parser = argparse.ArgumentParser(
+        prog='fakedata',
+        description='fakedata - ML-Ready Synthetic Data Engine (Python)',
+        formatter_class=argparse.RawTextHelpFormatter,
+        epilog="""
+EXAMPLES:
+  # Generate 1000 users to a CSV file
+  fakedata generate -n 1000 -f csv -o dataset.csv
+  # Generate 500 deterministic Indian users
+  fakedata generate -n 500 -l in --seed 42 -o india.json
+  # Generate fraud detection dataset with 5%% anomalies
+  fakedata generate -n 10000 -a 0.05 -f csv -o fraud_data.csv
+  # Preview a single user profile
+  fakedata preview
+  # Generate with time-series activity logs
+  fakedata generate -n 100 --timeseries --days 60 -o activity.json
+        """
+    )
+    subparsers = parser.add_subparsers(dest='command', help='Command to run')
+    # ─── preview ──────────────────────────────────────────────────────────────
+    subparsers.add_parser('preview', help='Print a single user profile to the console')
+    # ─── generate ─────────────────────────────────────────────────────────────
+    gen = subparsers.add_parser('generate', help='Generate synthetic user data')
+    gen.add_argument('-n', '--count', type=int, default=10,
+                     help='Number of users to generate (default: 10)')
+    gen.add_argument('-f', '--format', choices=['json', 'csv', 'flat'], default='json',
+                     help='Output format: json | csv | flat (default: json)')
+    gen.add_argument('-o', '--output', type=str, default=None,
+                     help='Output file path (default: stdout)')
+    gen.add_argument('-s', '--seed', type=int, default=None,
+                     help='Random seed for reproducibility')
+    gen.add_argument('-l', '--locale', type=str, default=None,
+                     help='Locale: en|in|jp|kr|de|br|ar|fr (default: en)')
+    gen.add_argument('-a', '--anomaly-rate', type=float, default=0.0,
+                     help='Fraction of anomalous users 0-1 (default: 0)')
+    gen.add_argument('-m', '--missing-rate', type=float, default=0.0,
+                     help='Fraction of null fields 0-1 (default: 0)')
+    gen.add_argument('-t', '--timeseries', action='store_true',
+                     help='Include time-series activity logs')
+    gen.add_argument('--days', type=int, default=30,
+                     help='Days of activity for time-series (default: 30)')
+    gen.add_argument('--events-per-day', type=int, default=8,
+                     help='Average events per day for time-series (default: 8)')
+    gen.add_argument('--pretty', action='store_true',
+                     help='Pretty-print JSON output')
+    args = parser.parse_args()
+    if args.command is None or args.command == 'help':
+        parser.print_help()
+        return
+    # ─── Import the engine (lazy, only when needed) ───────────────────────────
+    try:
+        import fakedata.modules.data as data
+    except ImportError:
+        print("ERROR: Could not import fakedata. Make sure it is installed: pip install fakedata-python", file=sys.stderr)
+        sys.exit(1)
+    # ─── Preview ─────────────────────────────────────────────────────────────
+    if args.command == 'preview':
+        u = data.user()
+        print(json.dumps(u, indent=2))
+        return
+    # ─── Generate ────────────────────────────────────────────────────────────
+    if args.command == 'generate':
+        options = {
+            'seed': args.seed,
+            'locale': args.locale,
+            'anomaly_rate': args.anomaly_rate,
+            'missing_rate': args.missing_rate,
+        }
+        # Remove None values so defaults are used inside the engine
+        options = {k: v for k, v in options.items() if v is not None and v != 0.0}
+        start = time.time()
+        if args.timeseries:
+            results = [
+                data.user_time_series({**options, 'days': args.days, 'events_per_day': args.events_per_day})
+                for _ in range(args.count)
+            ]
+            output = json.dumps(results, indent=2 if args.pretty else None)
+        elif args.format == 'csv':
+            output = data.users_to_csv(args.count, options if options else None)
+        elif args.format == 'flat':
+            rows = data.users_flat(args.count, options if options else None)
+            output = json.dumps(rows, indent=2 if args.pretty else None)
+        else:  # json
+            if args.pretty:
+                output = data.users_to_json(args.count, options if options else None)
+            else:
+                output = json.dumps(data.users(args.count, options if options else None))
+        elapsed = round(time.time() - start, 2)
+        if args.output:
+            out_path = os.path.abspath(args.output)
+            with open(out_path, 'w', encoding='utf-8') as f:
+                f.write(output)
+            size_kb = round(len(output.encode('utf-8')) / 1024, 1)
+            print(
+                f"✔ Done! Generated {args.count:,} users in {elapsed}s → {out_path} ({size_kb} KB)",
+                file=sys.stderr
+            )
+        else:
+            print(output)
+if __name__ == '__main__':
+    main()
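
The option filter in `cli.py` above drops every value that is `None` or equal to `0.0` before calling the engine. A minimal sketch of that filter as a standalone function follows; `build_engine_options` and `build_engine_options_strict` are hypothetical names used only for illustration and are not part of the package. Because `0 == 0.0` in Python, the filter as written also drops an integer seed of `0`, so `--seed 0` would appear to behave like an unseeded run.

```python
from typing import Any, Dict, Optional


def build_engine_options(seed: Optional[int], locale: Optional[str],
                         anomaly_rate: float, missing_rate: float) -> Dict[str, Any]:
    """Mirror the CLI's filter: drop None and anything equal to 0.0."""
    options = {
        "seed": seed,
        "locale": locale,
        "anomaly_rate": anomaly_rate,
        "missing_rate": missing_rate,
    }
    # Same comprehension as in the diff; note that the int 0 also equals 0.0.
    return {k: v for k, v in options.items() if v is not None and v != 0.0}


def build_engine_options_strict(seed: Optional[int], locale: Optional[str],
                                anomaly_rate: float, missing_rate: float) -> Dict[str, Any]:
    """Hypothetical alternative: drop only None values and zero-valued rates."""
    kept: Dict[str, Any] = {}
    if seed is not None:
        kept["seed"] = seed  # a seed of 0 is preserved here
    if locale is not None:
        kept["locale"] = locale
    for name, rate in (("anomaly_rate", anomaly_rate), ("missing_rate", missing_rate)):
        if rate:
            kept[name] = rate
    return kept


print(build_engine_options(42, None, 0.05, 0.0))
print(build_engine_options(0, "in", 0.0, 0.0))
print(build_engine_options_strict(0, "in", 0.0, 0.0))
```

The strict variant is one possible way to keep the "let the engine apply its defaults" intent while still passing through a zero seed; whether the engine treats a seed of `0` specially is not visible in this diff.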

fakedata/modules/data.py CHANGED

@@ -1,6 +1,7 @@
 import random
 import string
 import math
+import datetime
 from ..core import load_data
 # ─── Data Loading ───────────────────────────────────────────────────────────
@@ -32,6 +33,16 @@ healthcare_data = load_data('healthcare.json')
 # Phase 4 datasets
 locales_data = load_data('locales.json')
+# ─── Personas for Realistic Distribution ───────────────────────────────────
+PERSONAS = [
+    {"type": "Executive", "weight": 5, "income_mult": 3.5, "tech_bias": "Apple", "lifestyle": "luxury"},
+    {"type": "Tech Professional", "weight": 20, "income_mult": 2.2, "tech_bias": "High-End", "lifestyle": "modern"},
+    {"type": "Student", "weight": 15, "income_mult": 0.4, "tech_bias": "Mid-Range", "lifestyle": "frugal"},
+    {"type": "Manual Laborer", "weight": 25, "income_mult": 0.9, "tech_bias": "Budget", "lifestyle": "basic"},
+    {"type": "Service Worker", "weight": 25, "income_mult": 0.8, "tech_bias": "Budget", "lifestyle": "basic"},
+    {"type": "Freelancer", "weight": 10, "income_mult": 1.2, "tech_bias": "Mid-Range", "lifestyle": "flexible"}
+]
 # ─── Utility Functions ──────────────────────────────────────────────────────
 def get_random(arr):
@@ -97,10 +108,22 @@ EDUCATION_FIELDS = [
 ]
-def generate_education(age):
+def generate_education(age, persona):
     # Filter education levels by age eligibility
     eligible = [e for e in EDUCATION_LEVELS if age >= e["min_age"]]
-    weights = [e["weight"] for e in eligible]
+    # Adjust weights based on persona
+    weights = []
+    for e in eligible:
+        w = e["weight"]
+        if persona["type"] == "Executive" and e["level"] in ["Master's", "PhD"]:
+            w *= 3
+        if persona["type"] == "Manual Laborer" and e["level"] in ["High School", "Dropout"]:
+            w *= 2
+        if persona["type"] == "Student" and age < 25:
+            w *= 2
+        weights.append(w)
     selected_index = weighted_random(weights)
     selected = eligible[selected_index]
@@ -169,16 +192,20 @@ WORK_MODES = [
 ]
-def generate_employment(age, education):
+def generate_employment(age, education, persona):
     # Filter eligible statuses by age
     eligible = [s for s in EMPLOYMENT_STATUSES if age >= s["min_age"] and age <= s["max_age"]]
-    # Boost weights contextually
+    # Boost weights based on persona
     weights = []
     for s in eligible:
         w = s["weight"]
-        if s["status"] == "student" and age < 25:
+        if persona["type"] == "Executive" and s["status"] == "employed":
             w *= 2
+        if persona["type"] == "Freelancer" and s["status"] == "freelancer":
+            w *= 5
+        if persona["type"] == "Student" and s["status"] == "student":
+            w *= 4
         if s["status"] == "retired" and age >= 65:
             w *= 3
         weights.append(w)
@@ -250,7 +277,7 @@ TAX_BRACKETS = [
 ]
-def generate_financial(age, education, employment):
+def generate_financial(age, education, employment, persona):
     # Base income multipliers by education level
     income_multipliers = {
         "High School": 1.0, "Associate's": 1.3, "Bachelor's": 1.8,
@@ -277,7 +304,9 @@ def generate_financial(age, education, employment):
         sr = salary_ranges_data[role_key]
         base_salary_inr = normal_random(sr["median"], (sr["p75"] - sr["p25"]) / 2)
         base_salary_usd = round(clamp(base_salary_inr / 80, 15000, 500000))
-        annual_income = round(base_salary_usd * edu_multiplier * age_factor)
+        # Income influenced by Persona and Education
+        annual_income = round(base_salary_usd * edu_multiplier * age_factor * persona["income_mult"])
     elif employment["status"] == "retired":
         annual_income = round(random.uniform(20000, 60000))
     else:
@@ -994,14 +1023,17 @@ def generate_single_user(id_index=None, schema=None, locale=None):
     # Weighted age generation
     age = generate_age()
+    # Pick a Persona to drive statistical correlations
+    persona = PERSONAS[weighted_random([p["weight"] for p in PERSONAS])]
     # Correlated education
-    education = generate_education(age)
+    education = generate_education(age, persona)
     # Correlated employment
-    employment = generate_employment(age, education)
+    employment = generate_employment(age, education, persona)
     # Correlated financial profile
-    financial = generate_financial(age, education, employment)
+    financial = generate_financial(age, education, employment, persona)
     # Demographics
     demographics = generate_demographics(age)
@@ -1111,7 +1143,12 @@ def generate_single_user(id_index=None, schema=None, locale=None):
             "cardCvv": str(card_cvv),
         },
         "hobbies": user_hobbies,
-        "technology_profile": tech_profile
+        "technology_profile": tech_profile,
+        "persona": persona["type"],
+        "metadata": {
+            "version": "2.1.0",
+            "generation_timestamp": datetime.datetime.utcnow().isoformat() + "Z"
+        }
     }
     # Phase 4: Apply locale country override
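
The persona weights added in this file sum to 100, so each weight reads directly as a percentage. The sketch below illustrates the weighted pick and the resulting average income multiplier. It uses `random.choices` from the standard library as a stand-in for the package's `weighted_random` helper, whose body is not shown in the diff; `pick_persona` is a hypothetical name, and only the `type`, `weight` and `income_mult` fields are copied from the hunk.

```python
import random
from collections import Counter

# Subset of the PERSONAS table from the diff (type, weight, income_mult only).
PERSONAS = [
    {"type": "Executive", "weight": 5, "income_mult": 3.5},
    {"type": "Tech Professional", "weight": 20, "income_mult": 2.2},
    {"type": "Student", "weight": 15, "income_mult": 0.4},
    {"type": "Manual Laborer", "weight": 25, "income_mult": 0.9},
    {"type": "Service Worker", "weight": 25, "income_mult": 0.8},
    {"type": "Freelancer", "weight": 10, "income_mult": 1.2},
]


def pick_persona(rng: random.Random) -> dict:
    """Assumed equivalent of PERSONAS[weighted_random(weights)]."""
    weights = [p["weight"] for p in PERSONAS]
    return rng.choices(PERSONAS, weights=weights, k=1)[0]


total_weight = sum(p["weight"] for p in PERSONAS)
# Weight-averaged income multiplier applied to the salary-based income branch.
mean_mult = sum(p["weight"] * p["income_mult"] for p in PERSONAS) / total_weight

rng = random.Random(42)
samples = 10_000
counts = Counter(pick_persona(rng)["type"] for _ in range(samples))

print(f"total weight: {total_weight}, mean income multiplier: {mean_mult:.2f}")
for p in PERSONAS:
    expected = p["weight"] / total_weight
    observed = counts[p["type"]] / samples
    print(f"{p['type']:<18} expected {expected:.2%}  observed {observed:.2%}")
```

Under these assumptions the weight-averaged multiplier is 1.22, which suggests that incomes on the salary-based branch shift upward by roughly 22% on average relative to 2.0.0, all else being equal.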

fakedata_python-2.0.2.dist-info/METADATA ADDED

@@ -0,0 +1,434 @@
+Metadata-Version: 2.4
+Name: fakedata-python
+Version: 2.0.2
+Summary: The fakedata package generates realistic synthetic user profiles for machine learning, deep learning, data analysis, and data science workflows.
+Author-email: abhay557 <contact@abhaymourya.in>
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/abhay557/fakedata
+Classifier: Programming Language :: Python :: 3
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.7
+Description-Content-Type: text/markdown
+# fakedata
+[![NPM Version](https://img.shields.io/npm/v/@abhay557/fakedata?color=red&label=npm)](https://www.npmjs.com/package/@abhay557/fakedata)
+[![PyPI Version](https://img.shields.io/pypi/v/fakedata-python?color=blue&label=pypi)](https://pypi.org/project/fakedata-python/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/16N9x1YCOVVvIF8rl7IQxKRkK4en_g3Gi?usp=sharing)
+[![PyPI Downloads](https://static.pepy.tech/personalized-badge/fakedata-python?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads)](https://pepy.tech/projects/fakedata-python)
+A high-performance, **zero-dependency** synthetic data generation engine, available for both **Node.js** and **Python**. Designed specifically for machine learning, data science, and analytics workflows, providing 100% data parity across platforms.
+## Overview
+`fakedata` has been completely rebuilt from the ground up to serve as an **ML-ready synthetic data engine**. It generates deeply interconnected user profiles with **112 flat columns across 13 domains** (Health, Financial, Employment, Digital Footprint, etc.), making it the perfect tool for training models, benchmarking pipelines, or simulating realistic databases.
+###  Machine Learning Power Features:
+- **Behavioral Personas**: Orchestrate correlations through 6 distinct personas (e.g., Executive, Student, Tech Pro) to ensure realistic socio-economic patterns.
+- **Seed Reproducibility**: Generate byte-for-byte identical datasets across runs (and languages!) using `seed`.
+- **Schema Overrides**: Force specific distributions (e.g., age ranges, income brackets, genders) using `schema`.
+- **Locale-Aware Generation**: Support for 8 culture-specific name sets and phone formats (`en`, `in`, `jp`, `kr`, `de`, `br`, `ar`, `fr`).
+- **Missing Data Simulation**: Automatically inject realistic nulls using `missing_rate` to test your data imputation pipelines.
+- **Anomaly Injection**: Inject fraud/outlier profiles (e.g., impossible geography, credit fraud, income spikes) using `anomaly_rate`.
+- **Time-Series Data**: Generate chronological activity logs (logins, page views, purchases) per user for behavioral modeling.
+- **Pipeline Ready**: Export directly to CSV, JSON, or Flat objects (perfect for `pandas.DataFrame`).
+- **CLI Tool**: Generate and export datasets directly from your terminal — no scripting required.
+---
+##  Python Implementation
+### Installation
+```bash
+pip install fakedata-python
+```
+### Quick Start
+```python
+import fakedata.data as data
+import pandas as pd
+# Generate 10,000 highly correlated users deterministically
+users = data.users(10000, {"seed": 42})
+# Or export directly to a Pandas DataFrame
+df = pd.DataFrame(data.users_flat(10000, {"seed": 42}))
+print(df.head())
+# Create time-series activity data
+ts = data.user_time_series({"days": 30, "events_per_day": 8})
+print(f"Generated {len(ts['activity'])} events for {ts['user']['fullName']}")
+```
+##  Node.js / TypeScript Implementation
+### Installation
+```bash
+npm install @abhay557/fakedata
+```
+### Quick Start
+```javascript
+const { data } = require('@abhay557/fakedata');
+// Generate deterministic users with a 5% missing data rate (null injection)
+const users = data.users(1000, { seed: 42, missing_rate: 0.05 });
+// Export directly to CSV format
+const csvString = data.usersToCSV(1000, { seed: 42 });
+// Time-series activity data
+const ts = data.userTimeSeries({ days: 30, eventsPerDay: 8 });
+console.log(`Generated ${ts.activity.length} events for ${ts.user.fullName}`);
+```
+---
+##  CLI — Command Line Interface
+After installing, use `fakedata` directly from your terminal. No scripts needed!
+### Node.js (global install)
+```bash
+npm install -g @abhay557/fakedata
+```
+### Python (global install)
+```bash
+pip install fakedata-python
+```
+### CLI Commands
+| Command | Description |
+|:---|:---|
+| `fakedata generate` | Generate synthetic user data |
+| `fakedata preview` | Print a single user profile to the console |
+| `fakedata help` | Show all available options |
+### CLI Options
+| Flag | Default | Description |
+|:---|:---|:---|
+| `-n`, `--count` | `10` | Number of users to generate |
+| `-f`, `--format` | `json` | Output format: `json` \| `csv` \| `flat` |
+| `-o`, `--output` | stdout | Output file path |
+| `-s`, `--seed` | none | Random seed for reproducibility |
+| `-l`, `--locale` | `en` | Locale: `en` \| `in` \| `jp` \| `kr` \| `de` \| `br` \| `ar` \| `fr` |
+| `-a`, `--anomaly-rate` | `0` | Fraction of anomalous users (0–1) |
+| `-m`, `--missing-rate` | `0` | Fraction of null fields (0–1) |
+| `-t`, `--timeseries` | — | Include time-series activity logs |
+| `--days` | `30` | Days of activity for time-series |
+| `--pretty` | — | Pretty-print JSON output |
+### Examples
+```bash
+# Generate 1000 users and save as CSV
+fakedata generate -n 1000 -f csv -o dataset.csv
+# Generate 500 deterministic Indian users
+fakedata generate -n 500 -l in --seed 42 -o india.json
+# Fraud detection dataset with 5% anomalies
+fakedata generate -n 10000 -a 0.05 -f csv -o fraud_data.csv
+# Preview a single user in the console
+fakedata preview
+# Time-series activity logs for 100 users
+fakedata generate -n 100 --timeseries --days 60 -o activity.json
+```
+---
+### sample output - one user
+```fakedata.data.user()```
+```fakedata.data.user(n) // set n = 100```
+```json
+  "id": "4612",
+  "fullName": "Damaris Carlo Ebervale",
+  "firstName": "Damaris",
+  "lastName": "Ebervale",
+  "middleName": "Carlo",
+  "age": 31,
+  "gender": "non-binary",
+  "email": "damaris.ebervale@liberomail.com",
+  "phone": "+1 7469125114",
+  "username": "damaris_4612",
+  "password": "UQ!VZr0cLUD9",
+  "birthDate": "1995-07-19",
+  "bloodGroup": "+B",
+  "height": 185,
+  "weight": 60,
+  "domain": "damarisebervale.vg",
+  "ip": "48.50.80.113",
+  "macaddress": "33:2F:39:EE:3B:1E",
+  "address": {
+    "street": "3623 Chateau Lane",
+    "city": "Kilgore",
+    "state": "Texas",
+    "country": "Sierra Leone",
+    "countryCode": "SL",
+    "zipCode": 36434,
+    "coordinates": {
+      "latitude": "-68.324385",
+      "longitude": "55.859967"
+    }
+  },
+  "demographics": {
+    "ethnicity": "Hispanic",
+    "nationality": "South Korean",
+    "language": {
+      "primary": "Arabic",
+      "secondary": "Turkish"
+    },
+    "relationshipStatus": "dating"
+  },
+  "education": {
+    "level": "Bachelor's",
+    "field": "Computer Science",
+    "institution": "Agricultural University of Lublin",
+    "institutionCountry": "Poland",
+    "gpa": 2.79,
+    "graduationYear": 2017,
+    "studentDebt": 64117
+  },
+  "employment": {
+    "status": "self-employed",
+    "company": "China CITIC Bank",
+    "companySize": "enterprise",
+    "industry": "Banking",
+    "jobTitle": "\"ORACLE DBA\"",
+    "jobCategory": "Network Engineering",
+    "yearsExperience": 10,
+    "workMode": "onsite",
+    "workHoursPerWeek": 36,
+    "jobSatisfaction": 6
+  },
+  "financial": {
+    "annualIncome": 21600,
+    "creditScore": 464,
+    "savings": 1680,
+    "monthlyExpenses": 1309,
+    "debtToIncome": 3.12,
+    "taxBracket": "12%",
+    "investmentStyle": "moderate",
+    "homeOwnership": "own"
+  },
+  "health": {
+    "bmi": 17.5,
+    "bmiCategory": "underweight",
+    "bloodPressure": {
+      "systolic": 100,
+      "diastolic": 82
+    },
+    "exerciseFrequency": "3-4 times/week",
+    "smoking": "never",
+    "alcohol": "never",
+    "sleepHoursPerNight": 8.3,
+    "sleepQuality": "poor",
+    "diet": "mediterranean",
+    "medicalCondition": "None",
+    "insuranceProvider": "UnitedHealthcare",
+    "medications": [
+      "Lisinopril"
+    ],
+    "lastCheckupMonthsAgo": 11,
+    "hasDisability": false,
+    "mentalHealth": "poor",
+    "vaccination": "partially vaccinated"
+  },
+  "social": {
+    "socialMedia": {
+      "platforms": [
+        "Pinterest",
+        "Twitter/X",
+        "Reddit",
+        "Instagram"
+      ],
+      "screenTimeHoursPerDay": 3.8,
+      "preferredContent": "video"
+    },
+    "shopping": {
+      "frequency": "weekly",
+      "preferredCategories": [
+        "toys & games",
+        "books"
+      ],
+      "monthlyOnlineSpending": 175
+    },
+    "newsSource": "social media",
+    "travelFrequency": "weekly",
+    "volunteers": false,
+    "pet": "multiple"
+  },
+  "digitalFootprint": {
+    "accountCreatedAt": "2021-04-01T09:59:41.867116+00:00",
+    "lastLoginAt": "2026-04-24T09:59:41.867116+00:00",
+    "lastPasswordChangeAt": "2025-11-06T09:59:41.867116+00:00",
+    "userAgent": "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 Chrome/121.0.0.0 Mobile Safari/537.36",
+    "browser": "Chrome",
+    "os": "Windows 11",
+    "referrer": "facebook.com",
+    "avgSessionMinutes": 17.6,
+    "sessionsPerWeek": 10,
+    "totalSessions": 2666,
+    "twoFactorEnabled": false,
+    "preferredLanguage": "de",
+    "accountStatus": "inactive",
+    "verifiedEmail": false,
+    "verifiedPhone": true
+  },
+  "bank": {
+    "nameOnCard": "Damaris Carlo Ebervale",
+    "cardNumber": "2289970210128357",
+    "cardType": "Mastercard",
+    "cardExpiry": "5/29",
+    "cardCvv": "355"
+  },
+  "hobbies": [
+    "Knitting",
+    "Gardening",
+    "LARPing"
+  ],
+  "technology_profile": {
+    "devices": {
+      "additional_devices": [
+        "BlackBerry Bold 9790",
+        "Nokia N9"
+      ],
+      "smartphone": "Sony Ericsson Xperia X10"
+    },
+    "phone_preferences": {
+      "critical_features": [
+        "security features",
+        "reliability",
+        "5G connectivity"
+      ],
+      "primary_uses": [
+        "photography",
+        "education",
+        "organization"
+      ]
+    },
+    "interest": [
+      "Knitting",
+      "Gardening",
+      "LARPing"
+    ]
+  }
+}
+```
+---
+## Advanced Features Reference
+Both Python and JS/TS expose the same underlying engine options.
+### 1. Configuration Options
+Pass an `options` dictionary/object to `data.user(options)` or `data.users(n, options)`:
+```javascript
+const options = {
+    seed: 42,              // Number: Ensures deterministic, byte-for-byte identical output
+    missing_rate: 0.05,    // Float (0-1): 5% chance of any leaf field being null
+    locale: 'jp',          // String: 'en', 'in', 'jp', 'kr', 'de', 'br', 'ar', 'fr'
+    anomaly_rate: 0.05,    // Float (0-1): 5% of users will have injected fraud anomalies
+    days: 30,              // Number: Days of time-series activity to generate
+    eventsPerDay: 8,       // Number: Average events per day for time-series logs
+    // Schema Constraints (force specific data distributions)
+    schema: {
+        age: { min: 25, max: 40 },           // Can also use { exact: 30 }
+        gender: "female",                    // "male", "female", or "non-binary"
+        employment: { status: "employed" },
+        education: { level: "Master's" },
+        financial: { annualIncome: { min: 60000, max: 120000 } },
+        health: { medicalCondition: "Diabetes" },
+        address: { country: "Japan" },
+        height: { min: 160, max: 180 },
+        weight: { min: 50, max: 80 }
+    }
+}
+```
+### 2. Supported API Methods
+| Method (JS) | Method (Python) | Description |
+| :--- | :--- | :--- |
+| `data.user(opts?)` | `data.user(opts=None)` | Generate a single complex user profile. |
+| `data.users(n, opts?)` | `data.users(n, opts=None)` | Generate an array/list of `n` users. |
+| `data.userTimeSeries(opts)` | `data.user_time_series(opts)`| Returns `{ user, activity }` containing chronological event logs. |
+| `data.usersFlat(n, opts?)` | `data.users_flat(n, opts=None)`| Returns flat dicts/objects, perfect for `pandas.DataFrame` ingestion. |
+| `data.usersToCSV(n, opts?)` | `data.users_to_csv(n, opts=None)`| Returns a fully formatted CSV string (112 columns). |
+| `data.usersToJSON(n, opts?)`| `data.users_to_json(n, opts=None)`| Returns a pretty-printed JSON string. |
+### 3. Behavioral Personas (Statistical Modeling)
+To ensure the data is useful for **Clustering** and **Regression** analysis, `fakedata` uses a **Persona-driven engine**. Every user is assigned one of 6 personas that orchestrate their life outcomes:
+- **Executive**: High income, high education (Master's/PhD), premium Apple devices, luxury lifestyle.
+- **Tech Professional**: High income, high-end hardware, heavy social media use, remote work bias.
+- **Student**: Low income, high student debt, budget/mid-range tech, high social media footprint.
+- **Manual Laborer / Service Worker**: Budget-conscious, steady income, consistent employment patterns.
+- **Freelancer**: Flexible work modes, variable income ranges, mid-range tech profile.
+These personas ensure that an analyst looking at your synthetic data will find **statistically significant clusters** rather than just a uniform cloud of random values.
+---
+## Data Structure Highlights (112 Columns)
+### 3. Locale-Aware Name Generation
+Supports 8 locales with culturally accurate first names, last names, and country/phone codes:
+- `'in'`: Aarav Sharma, Priya Patel (+91, India)
+- `'jp'`: Haruto Tanaka, Sakura Sato (+81, Japan)
+- `'kr'`: Minjun Kim, Seo-yeon Park (+82, South Korea)
+- `'de'`: Lukas Müller, Mia Schmidt (+49, Germany)
+- `'br'`: Miguel Silva, Alice Santos (+55, Brazil)
+- `'ar'`: Mohammed Al-Ahmed, Fatima Khalil (+966, Saudi Arabia)
+- `'fr'`: Gabriel Martin, Emma Dubois (+33, France)
+- `'en'`: James Smith, Mary Johnson (+1, United States)
+### 4. Time-Series Activity Data
+Generate chronological behavioral logs for users. Event types include `login`, `page_view`, `purchase`, `search`, `click`, `logout`, `api_call`, `upload`, `download`, and `comment`.
+```javascript
+const ts = data.userTimeSeries({ seed: 42, days: 30, eventsPerDay: 8 });
+// ts.user → Full user profile
+// ts.activity → [{ timestamp, type, page, duration, device, ip, success, amount?, query? }]
+```
+### 5. Anomaly Injection Engine (Fraud Detection)
+When `anomaly_rate` is > 0, `fakedata` injects ML-detectable fraud patterns into the dataset. Affected users receive a special `_anomaly` flag object indicating the fraud type.
+| Anomaly Type | Effect |
+|:---|:---|
+| `income_spike` | Income multiplied 5-15x |
+| `credit_fraud` | Credit score = 100-200 or 850-999, DTI = 10-60 |
+| `session_anomaly` | Sessions/week = 200-700, avg session = 500-1500 min |
+| `age_outlier` | Age = 1, 2, 3, 115, 120, or 130 |
+| `geo_impossible` | Coordinates = (0,0), IP = 0.0.0.0 |
+| `velocity_attack` | Total sessions = 50k-150k, last login = now |
+| `data_mismatch` | Age=12 + employed + 30yr experience + $500k income |
+| `health_outlier` | BMI = 8-9 or 75-80, BP = extreme values |
+### 6. The User Profile Schema (109 Correlated Fields)
+Each generated user contains highly realistic, correlated data. For example, age determines education graduation year, which impacts employment salary, which impacts credit score, which impacts housing status and health/BMI metrics.
+```text
+identity(9) → personal(6) → network(3) → address(7) → demographics(5)
+→ education(7) → employment(10) → financial(8) → health(16)
+→ social(9) → digitalFootprint(15) → bank(5) → lifestyle(9)
+```
+---
+## License
+Distributed under the **MIT License**. See `LICENSE` for more information.
+**Maintainer**: [abhay557](https://github.com/abhay557)
+- Project Commit History - `https://github.com/abhay557/random-api.xyz`
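
The 2.0.2 README quotes 112 flat columns in its Overview, while the schema section heading still says 109 and lists per-domain counts. As a quick arithmetic check, the sketch below sums the counts exactly as they appear in the README's layout line; the dictionary name `DOMAIN_FIELD_COUNTS` is only illustrative.

```python
# Per-domain field counts copied from the README layout line in the hunk above.
DOMAIN_FIELD_COUNTS = {
    "identity": 9,
    "personal": 6,
    "network": 3,
    "address": 7,
    "demographics": 5,
    "education": 7,
    "employment": 10,
    "financial": 8,
    "health": 16,
    "social": 9,
    "digitalFootprint": 15,
    "bank": 5,
    "lifestyle": 9,
}

listed_total = sum(DOMAIN_FIELD_COUNTS.values())
overview_total = 112  # figure quoted in the 2.0.2 Overview paragraph

print(f"domains listed: {len(DOMAIN_FIELD_COUNTS)}")
print(f"sum of listed counts: {listed_total}")
print(f"difference from Overview figure: {overview_total - listed_total}")
```

The listed counts cover 13 domains and sum to 109, three short of the Overview figure. One plausible reading, which the diff does not confirm, is that the three extra columns are the new `persona`, `metadata.version` and `metadata.generation_timestamp` fields added in `data.py`.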

{fakedata_python-2.0.0.dist-info → fakedata_python-2.0.2.dist-info}/RECORD RENAMED

@@ -1,4 +1,5 @@
 fakedata/__init__.py,sha256=PXwXDWU2HFUfAF2zFMrxsJ7BvP5RSpTbF0GvxWCTt3g,93
+fakedata/cli.py,sha256=8gwV_PYyUJo0QwmLnabZ_PHBS6UUpS4HJLJeMX-Bi-w,5817
 fakedata/core.py,sha256=ZiZ51aZ3cAG7n02Giliq0XO5nN-bbjLJWM3pZQ6gWT4,437
 fakedata/test_python.py,sha256=UpfmArMkM7bcRkV_MTed0vZ1QXokOAZqazcfISlLmZA,3226
 fakedata/helpers/cardtype.json,sha256=3Ij5N_QPCO1Xg6g7jTp759yOVF9scXkbBDZvWYRaSAM,201
@@ -24,8 +25,9 @@ fakedata/helpers/states.json,sha256=1NLVCllDcRN8QXp3GTv9iiqeqA8rOOlx2v3_w729jfU,
 fakedata/helpers/street.json,sha256=Z-1cRr7uGMXBqlPoqoedPagfx_hLXqbWDNoylcnS8L0,305724
 fakedata/helpers/universities.json,sha256=7NHac5anNxnCVSj6kdFc87896S2A3J-zT58Yw4lzkaQ,230421
 fakedata/modules/__init__.py,sha256=buFp940xk9V39VnBFIca5ADTEtX8qsKz7_VQC3102tI,19
-fakedata/modules/data.py,sha256=UQxgdzjBzPD46p5mXoi6T4PPYKvduFtbOLl1SEcaBV4,50652
-fakedata_python-2.0.0.dist-info/METADATA,sha256=hvIvrcmlA-1PngIcWd9d50VbebolI7iE3FdEh9lqnTY,8381
-fakedata_python-2.0.0.dist-info/WHEEL,sha256=aeYiig01lYGDzBgS8HxWXOg3uV61G9ijOsup-k9o1sk,91
-fakedata_python-2.0.0.dist-info/top_level.txt,sha256=SHFa_6848yAE45QgW-PX_DHp_nakY64Zs_t2NobLcn0,9
-fakedata_python-2.0.0.dist-info/RECORD,,
+fakedata/modules/data.py,sha256=8MHRV5AtbzLOCBRKUP3E-dHc8nIQ3VuP-Xyk7c_-Eog,52542
+fakedata_python-2.0.2.dist-info/METADATA,sha256=WlEpLxJNeuYVUsasQCQcHHS_s3d8gBcjYpaiMRY-TqI,16306
+fakedata_python-2.0.2.dist-info/WHEEL,sha256=aeYiig01lYGDzBgS8HxWXOg3uV61G9ijOsup-k9o1sk,91
+fakedata_python-2.0.2.dist-info/entry_points.txt,sha256=qLOKT1Qujc8-qppTaDO2GUWcuoUQR9fSID3qvIaEAPo,47
+fakedata_python-2.0.2.dist-info/top_level.txt,sha256=SHFa_6848yAE45QgW-PX_DHp_nakY64Zs_t2NobLcn0,9
+fakedata_python-2.0.2.dist-info/RECORD,,
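
Each RECORD line above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe Base64 encoding of the SHA-256 hash with the trailing `=` padding removed, and the size is in bytes. The sketch below shows how such a line could be recomputed to check an unpacked file against the diff; `record_line` is a hypothetical helper, not something the package or pip exposes.

```python
import base64
import hashlib


def record_line(path: str, content: bytes) -> str:
    """Build a wheel RECORD-style line for the given file content."""
    digest = hashlib.sha256(content).digest()
    # RECORD uses URL-safe Base64 without '=' padding.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(content)}"


# Example with in-memory bytes; for a real check, read the file in binary mode
# from the unpacked wheel and compare the result with the RECORD entry.
line = record_line("example/module.py", b"print('hello')\n")
print(line)
```

A SHA-256 digest encoded this way is always 43 characters long, which matches the hashes listed in the hunk.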

fakedata_python-2.0.2.dist-info/entry_points.txt ADDED

@@ -0,0 +1,2 @@
+[console_scripts]
+fakedata = fakedata.cli:main
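
The `console_scripts` entry above tells the installer to create a `fakedata` command that imports `fakedata.cli` and calls its `main` function. A minimal sketch of how a `module:attribute` reference of this kind resolves to a callable is shown below. `resolve_entry_point` is a hypothetical helper for illustration, and the demonstration target is `json:dumps` from the standard library so that the sketch runs without the package installed.

```python
import importlib
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a 'package.module:attr' reference to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj: Any = importlib.import_module(module_name)
    if attr_path:
        # Attribute paths may be dotted, e.g. 'module:Class.method'.
        for part in attr_path.split("."):
            obj = getattr(obj, part)
    return obj


# Split the entry_points.txt line into the command name and its target.
name, _, target = "fakedata = fakedata.cli:main".partition("=")
name, target = name.strip(), target.strip()
print(f"command {name!r} -> {target!r}")

# Stand-in target from the standard library, resolved the same way.
dumps = resolve_entry_point("json:dumps")
print(dumps({"resolved": True}))
```

With the wheel installed, `resolve_entry_point(target)` would be expected to return the `main` function defined in `fakedata/cli.py`.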

fakedata_python-2.0.0.dist-info/METADATA DELETED

@@ -1,179 +0,0 @@
-Metadata-Version: 2.4
-Name: fakedata-python
-Version: 2.0.0
-Summary: The fakedata package generates realistic synthetic user profiles for machine learning, deep learning, data analysis, and data science workflows.
-Author-email: abhay557 <abhaycormourya@gmail.com>
-License-Expression: MIT
-Project-URL: Homepage, https://github.com/abhay557/fakedata
-Classifier: Programming Language :: Python :: 3
-Classifier: Operating System :: OS Independent
-Requires-Python: >=3.7
-Description-Content-Type: text/markdown
-# fakedata
-[![NPM Version](https://img.shields.io/npm/v/@abhay557/fakedata?color=red&label=npm)](https://www.npmjs.com/package/@abhay557/fakedata)
-[![PyPI Version](https://img.shields.io/pypi/v/fakedata-python?color=blue&label=pypi)](https://pypi.org/project/fakedata-python/)
-[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
-A high-performance, **zero-dependency** synthetic data generation engine, available for both **Node.js** and **Python**. Designed specifically for machine learning, data science, and analytics workflows, providing 100% data parity across platforms.
-## Overview
-`fakedata` has been completely rebuilt from the ground up to serve as an **ML-ready synthetic data engine**. It generates deeply interconnected user profiles with **109 flat columns across 13 domains** (Health, Financial, Employment, Digital Footprint, etc.), making it the perfect tool for training models, benchmarking pipelines, or simulating realistic databases.
-###  Machine Learning Power Features:
-- **Seed Reproducibility**: Generate byte-for-byte identical datasets across runs (and languages!) using `seed`.
-- **Schema Overrides**: Force specific distributions (e.g., age ranges, income brackets, genders) using `schema`.
-- **Locale-Aware Generation**: Support for 8 culture-specific name sets and phone formats (`en`, `in`, `jp`, `kr`, `de`, `br`, `ar`, `fr`).
-- **Missing Data Simulation**: Automatically inject realistic nulls using `missing_rate` to test your data imputation pipelines.
-- **Anomaly Injection**: Inject fraud/outlier profiles (e.g., impossible geography, credit fraud, income spikes) using `anomaly_rate`.
-- **Time-Series Data**: Generate chronological activity logs (logins, page views, purchases) per user for behavioral modeling.
-- **Pipeline Ready**: Export directly to CSV, JSON, or Flat objects (perfect for `pandas.DataFrame`).
----
-##  Node.js / TypeScript Implementation
-### Installation
-```bash
-npm install @abhay557/fakedata
-```
-### Quick Start
-```javascript
-const { data } = require('@abhay557/fakedata');
-// Generate deterministic users with a 5% missing data rate (null injection)
-const users = data.users(1000, { seed: 42, missing_rate: 0.05 });
-// Export directly to CSV format
-const csvString = data.usersToCSV(1000, { seed: 42 });
-// Time-series activity data
-const ts = data.userTimeSeries({ days: 30, eventsPerDay: 8 });
-console.log(`Generated ${ts.activity.length} events for ${ts.user.fullName}`);
-```
----
-##  Python Implementation
-### Installation
-```bash
-pip install fakedata-python
-```
-### Quick Start
-```python
-import fakedata.data as data
-import pandas as pd
-# Generate 10,000 highly correlated users deterministically
-users = data.users(10000, {"seed": 42})
-# Or export directly to a Pandas DataFrame
-df = pd.DataFrame(data.users_flat(10000, {"seed": 42}))
-print(df.head())
-# Create time-series activity data
-ts = data.user_time_series({"days": 30, "events_per_day": 8})
-print(f"Generated {len(ts['activity'])} events for {ts['user']['fullName']}")
-```
----
-## Advanced Features Reference
-Both Python and JS/TS expose the same underlying engine options.
-### 1. Configuration Options
-Pass an `options` dictionary/object to `data.user(options)` or `data.users(n, options)`:
-```javascript
-const options = {
-    seed: 42,              // Number: Ensures deterministic, byte-for-byte identical output
-    missing_rate: 0.05,    // Float (0-1): 5% chance of any leaf field being null
-    locale: 'jp',          // String: 'en', 'in', 'jp', 'kr', 'de', 'br', 'ar', 'fr'
-    anomaly_rate: 0.05,    // Float (0-1): 5% of users will have injected fraud anomalies
-    days: 30,              // Number: Days of time-series activity to generate
-    eventsPerDay: 8,       // Number: Average events per day for time-series logs
-    // Schema Constraints (force specific data distributions)
-    schema: {
-        age: { min: 25, max: 40 },           // Can also use { exact: 30 }
-        gender: "female",                    // "male", "female", or "non-binary"
-        employment: { status: "employed" },
-        education: { level: "Master's" },
-        financial: { annualIncome: { min: 60000, max: 120000 } },
-        health: { medicalCondition: "Diabetes" },
-        address: { country: "Japan" },
-        height: { min: 160, max: 180 },
-        weight: { min: 50, max: 80 }
-    }
-}
-```
-### 2. Supported API Methods
-| Method (JS) | Method (Python) | Description |
-| :--- | :--- | :--- |
-| `data.user(opts?)` | `data.user(opts=None)` | Generate a single complex user profile. |
-| `data.users(n, opts?)` | `data.users(n, opts=None)` | Generate an array/list of `n` users. |
-| `data.userTimeSeries(opts)` | `data.user_time_series(opts)`| Returns `{ user, activity }` containing chronological event logs. |
-| `data.usersFlat(n, opts?)` | `data.users_flat(n, opts=None)`| Returns flat dicts/objects, perfect for `pandas.DataFrame` ingestion. |
-| `data.usersToCSV(n, opts?)` | `data.users_to_csv(n, opts=None)`| Returns a fully formatted CSV string (109 columns). |
-| `data.usersToJSON(n, opts?)`| `data.users_to_json(n, opts=None)`| Returns a pretty-printed JSON string. |
-### 3. Locale-Aware Name Generation
-Supports 8 locales with culturally accurate first names, last names, and country/phone codes:
-- `'in'`: Aarav Sharma, Priya Patel (+91, India)
-- `'jp'`: Haruto Tanaka, Sakura Sato (+81, Japan)
-- `'kr'`: Minjun Kim, Seo-yeon Park (+82, South Korea)
-- `'de'`: Lukas Müller, Mia Schmidt (+49, Germany)
-- `'br'`: Miguel Silva, Alice Santos (+55, Brazil)
-- `'ar'`: Mohammed Al-Ahmed, Fatima Khalil (+966, Saudi Arabia)
-- `'fr'`: Gabriel Martin, Emma Dubois (+33, France)
-- `'en'`: James Smith, Mary Johnson (+1, United States)
-### 4. Time-Series Activity Data
-Generate chronological behavioral logs for users. Event types include `login`, `page_view`, `purchase`, `search`, `click`, `logout`, `api_call`, `upload`, `download`, and `comment`.
-```javascript
-const ts = data.userTimeSeries({ seed: 42, days: 30, eventsPerDay: 8 });
-// ts.user → Full user profile
-// ts.activity → [{ timestamp, type, page, duration, device, ip, success, amount?, query? }]
-```
-### 5. Anomaly Injection Engine (Fraud Detection)
-When `anomaly_rate` is > 0, `fakedata` injects ML-detectable fraud patterns into the dataset. Affected users receive a special `_anomaly` flag object indicating the fraud type.
-| Anomaly Type | Effect |
-|:---|:---|
-| `income_spike` | Income multiplied 5-15x |
-| `credit_fraud` | Credit score = 100-200 or 850-999, DTI = 10-60 |
-| `session_anomaly` | Sessions/week = 200-700, avg session = 500-1500 min |
-| `age_outlier` | Age = 1, 2, 3, 115, 120, or 130 |
-| `geo_impossible` | Coordinates = (0,0), IP = 0.0.0.0 |
-| `velocity_attack` | Total sessions = 50k-150k, last login = now |
-| `data_mismatch` | Age=12 + employed + 30yr experience + $500k income |
-| `health_outlier` | BMI = 8-9 or 75-80, BP = extreme values |
-### 6. The User Profile Schema (109 Correlated Fields)
-Each generated user contains highly realistic, correlated data. For example, age determines education graduation year, which impacts employment salary, which impacts credit score, which impacts housing status and health/BMI metrics.
-```text
-identity(9) → personal(6) → network(3) → address(7) → demographics(5)
-→ education(7) → employment(10) → financial(8) → health(16)
-→ social(9) → digitalFootprint(15) → bank(5) → lifestyle(9)
-```
----
-## License
-Distributed under the **MIT License**. See `LICENSE` for more information.
-**Maintainer**: [abhay557](https://github.com/abhay557)
-- Project Commit History - `https://github.com/abhay557/random-api.xyz`

{fakedata_python-2.0.0.dist-info → fakedata_python-2.0.2.dist-info}/WHEEL RENAMED

File without changes

{fakedata_python-2.0.0.dist-info → fakedata_python-2.0.2.dist-info}/top_level.txt RENAMED

File without changes
