fakedata-python 2.0.0__tar.gz → 2.0.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (35)
  1. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/PKG-INFO +223 -26
  2. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/README.md +364 -167
  3. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/modules/data.py +48 -11
  4. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata_python.egg-info/PKG-INFO +223 -26
  5. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/pyproject.toml +1 -1
  6. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/__init__.py +0 -0
  7. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/core.py +0 -0
  8. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/cardtype.json +0 -0
  9. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/companies.json +0 -0
  10. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/countries.json +0 -0
  11. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/devices.json +0 -0
  12. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/domain.json +0 -0
  13. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/email.json +0 -0
  14. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/first.json +0 -0
  15. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/healthcare.json +0 -0
  16. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/hobbies.json +0 -0
  17. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/industries.json +0 -0
  18. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/job_categories.json +0 -0
  19. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/job_titles.json +0 -0
  20. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/last.json +0 -0
  21. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/locales.json +0 -0
  22. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/middle.json +0 -0
  23. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/occupation.json +0 -0
  24. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/salary_ranges.json +0 -0
  25. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/shortformstate.json +0 -0
  26. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/state.json +0 -0
  27. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/states.json +0 -0
  28. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/street.json +0 -0
  29. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/helpers/universities.json +0 -0
  30. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/modules/__init__.py +0 -0
  31. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata/test_python.py +0 -0
  32. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata_python.egg-info/SOURCES.txt +0 -0
  33. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata_python.egg-info/dependency_links.txt +0 -0
  34. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/fakedata_python.egg-info/top_level.txt +0 -0
  35. {fakedata_python-2.0.0 → fakedata_python-2.0.1}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: fakedata-python
- Version: 2.0.0
+ Version: 2.0.1
  Summary: The fakedata package generates realistic synthetic user profiles for machine learning, deep learning, data analysis, and data science workflows.
  Author-email: abhay557 <abhaycormourya@gmail.com>
  License-Expression: MIT
@@ -15,14 +15,17 @@ Description-Content-Type: text/markdown
  [![NPM Version](https://img.shields.io/npm/v/@abhay557/fakedata?color=red&label=npm)](https://www.npmjs.com/package/@abhay557/fakedata)
  [![PyPI Version](https://img.shields.io/pypi/v/fakedata-python?color=blue&label=pypi)](https://pypi.org/project/fakedata-python/)
  [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/16N9x1YCOVVvIF8rl7IQxKRkK4en_g3Gi?usp=sharing)
+ [![PyPI Downloads](https://static.pepy.tech/personalized-badge/fakedata-python?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads)](https://pepy.tech/projects/fakedata-python)

  A high-performance, **zero-dependency** synthetic data generation engine, available for both **Node.js** and **Python**. Designed specifically for machine learning, data science, and analytics workflows, providing 100% data parity across platforms.

  ## Overview

- `fakedata` has been completely rebuilt from the ground up to serve as an **ML-ready synthetic data engine**. It generates deeply interconnected user profiles with **109 flat columns across 13 domains** (Health, Financial, Employment, Digital Footprint, etc.), making it the perfect tool for training models, benchmarking pipelines, or simulating realistic databases.
+ `fakedata` has been completely rebuilt from the ground up to serve as an **ML-ready synthetic data engine**. It generates deeply interconnected user profiles with **112 flat columns across 13 domains** (Health, Financial, Employment, Digital Footprint, etc.), making it the perfect tool for training models, benchmarking pipelines, or simulating realistic databases.

  ### Machine Learning Power Features:
+ - **Behavioral Personas**: Orchestrate correlations through 6 distinct personas (e.g., Executive, Student, Tech Pro) to ensure realistic socio-economic patterns.
  - **Seed Reproducibility**: Generate byte-for-byte identical datasets across runs (and languages!) using `seed`.
  - **Schema Overrides**: Force specific distributions (e.g., age ranges, income brackets, genders) using `schema`.
  - **Locale-Aware Generation**: Support for 8 culture-specific name sets and phone formats (`en`, `in`, `jp`, `kr`, `de`, `br`, `ar`, `fr`).
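The hunk above adds a persona bullet next to the existing `seed` bullet. The following is a minimal sketch of how the two ideas can combine, assuming nothing about fakedata's internals: the persona names come from the README, while the income ranges and the `sample_users` helper are hypothetical. A persona is drawn first and income is then sampled conditionally on it, which is what makes income cluster by persona; a private `random.Random(seed)` instance makes the whole draw sequence repeat exactly for the same seed.

```python
import random

# Hypothetical persona table: the names follow the README, but these
# income ranges are invented for illustration, not taken from fakedata.
PERSONAS = {
    "Executive": (150_000, 400_000),
    "Tech Professional": (90_000, 200_000),
    "Student": (0, 25_000),
    "Freelancer": (20_000, 120_000),
}


def sample_users(n, seed):
    """Hypothetical helper: draw n persona-conditioned users from a seeded RNG."""
    rng = random.Random(seed)  # private generator, independent of global random state
    names = list(PERSONAS)  # dicts keep insertion order, so this order is stable
    users = []
    for i in range(n):
        persona = rng.choice(names)  # step 1: pick the latent persona
        low, high = PERSONAS[persona]
        income = rng.randint(low, high)  # step 2: income depends on the persona
        users.append({"id": i, "persona": persona, "annualIncome": income})
    return users


first = sample_users(5, seed=42)
second = sample_users(5, seed=42)
assert first == second  # same seed, same draw order -> identical records
```

This only demonstrates repeatability within Python. Identical output across Node.js and Python, as the README claims, would also require both ports to implement the same PRNG algorithm and consume random numbers in the same order; Python's `random` module alone does not guarantee that.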
@@ -32,6 +35,30 @@ A high-performance, **zero-dependency** synthetic data generation engine, availa
  - **Pipeline Ready**: Export directly to CSV, JSON, or Flat objects (perfect for `pandas.DataFrame`).

  ---
+ ## Python Implementation
+
+ ### Installation
+ ```bash
+ pip install fakedata-python
+ ```
+
+ ### Quick Start
+ ```python
+ import fakedata.data as data
+ import pandas as pd
+
+ # Generate 10,000 highly correlated users deterministically
+ users = data.users(10000, {"seed": 42})
+
+ # Or export directly to a Pandas DataFrame
+ df = pd.DataFrame(data.users_flat(10000, {"seed": 42}))
+ print(df.head())
+
+ # Create time-series activity data
+ ts = data.user_time_series({"days": 30, "events_per_day": 8})
+ print(f"Generated {len(ts['activity'])} events for {ts['user']['fullName']}")
+ ```
+

  ## Node.js / TypeScript Implementation

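The hunk above moves the Python Quick Start ahead of the Node.js section, and in this release the README's stated column count changes from 109 to 112. One way to check the column count of an installed copy is to parse the CSV export and count the header fields. The `csv_shape` helper below is hypothetical and uses only the standard library; the commented call assumes the `fakedata.data` import path and the `users_to_csv(n, opts)` signature documented in the README. Using `csv.reader` rather than splitting on commas matters, because values such as the sample `jobTitle` (`"ORACLE DBA"`, with embedded quotes) may be quoted in the CSV.

```python
import csv
import io


def csv_shape(csv_text):
    """Hypothetical helper: return (data_row_count, column_count) for a CSV string with a header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first record is the header row
    rows = sum(1 for _ in reader)  # count remaining records; quoted newlines stay inside one record
    return rows, len(header)


# Assumed usage against an installed copy (module path as given in the README):
#   import fakedata.data as data
#   rows, cols = csv_shape(data.users_to_csv(100, {"seed": 42}))
#   print(rows, cols)

# Self-contained demo with a quoted field that contains both quotes and a comma.
sample = 'id,jobTitle,annualIncome\n4612,"""ORACLE DBA"", senior",21600\n'
print(csv_shape(sample))  # (1, 3)
```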
@@ -54,33 +81,188 @@ const csvString = data.usersToCSV(1000, { seed: 42 });
  const ts = data.userTimeSeries({ days: 30, eventsPerDay: 8 });
  console.log(`Generated ${ts.activity.length} events for ${ts.user.fullName}`);
  ```
-
  ---

- ## Python Implementation
-
- ### Installation
- ```bash
- pip install fakedata-python
- ```
-
- ### Quick Start
- ```python
- import fakedata.data as data
- import pandas as pd
-
- # Generate 10,000 highly correlated users deterministically
- users = data.users(10000, {"seed": 42})
-
- # Or export directly to a Pandas DataFrame
- df = pd.DataFrame(data.users_flat(10000, {"seed": 42}))
- print(df.head())
+ ### sample output - one user
+ ```fakedata.data.user()```
+ ```fakedata.data.user(n) // set n = 100```
+
+ ```json
+ "id": "4612",
+ "fullName": "Damaris Carlo Ebervale",
+ "firstName": "Damaris",
+ "lastName": "Ebervale",
+ "middleName": "Carlo",
+ "age": 31,
+ "gender": "non-binary",
+ "email": "damaris.ebervale@liberomail.com",
+ "phone": "+1 7469125114",
+ "username": "damaris_4612",
+ "password": "UQ!VZr0cLUD9",
+ "birthDate": "1995-07-19",
+ "bloodGroup": "+B",
+ "height": 185,
+ "weight": 60,
+ "domain": "damarisebervale.vg",
+ "ip": "48.50.80.113",
+ "macaddress": "33:2F:39:EE:3B:1E",
+ "address": {
+ "street": "3623 Chateau Lane",
+ "city": "Kilgore",
+ "state": "Texas",
+ "country": "Sierra Leone",
+ "countryCode": "SL",
+ "zipCode": 36434,
+ "coordinates": {
+ "latitude": "-68.324385",
+ "longitude": "55.859967"
+ }
+ },
+ "demographics": {
+ "ethnicity": "Hispanic",
+ "nationality": "South Korean",
+ "language": {
+ "primary": "Arabic",
+ "secondary": "Turkish"
+ },
+ "relationshipStatus": "dating"
+ },
+ "education": {
+ "level": "Bachelor's",
+ "field": "Computer Science",
+ "institution": "Agricultural University of Lublin",
+ "institutionCountry": "Poland",
+ "gpa": 2.79,
+ "graduationYear": 2017,
+ "studentDebt": 64117
+ },
+ "employment": {
+ "status": "self-employed",
+ "company": "China CITIC Bank",
+ "companySize": "enterprise",
+ "industry": "Banking",
+ "jobTitle": "\"ORACLE DBA\"",
+ "jobCategory": "Network Engineering",
+ "yearsExperience": 10,
+ "workMode": "onsite",
+ "workHoursPerWeek": 36,
+ "jobSatisfaction": 6
+ },
+ "financial": {
+ "annualIncome": 21600,
+ "creditScore": 464,
+ "savings": 1680,
+ "monthlyExpenses": 1309,
+ "debtToIncome": 3.12,
+ "taxBracket": "12%",
+ "investmentStyle": "moderate",
+ "homeOwnership": "own"
+ },
+ "health": {
+ "bmi": 17.5,
+ "bmiCategory": "underweight",
+ "bloodPressure": {
+ "systolic": 100,
+ "diastolic": 82
+ },
+ "exerciseFrequency": "3-4 times/week",
+ "smoking": "never",
+ "alcohol": "never",
+ "sleepHoursPerNight": 8.3,
+ "sleepQuality": "poor",
+ "diet": "mediterranean",
+ "medicalCondition": "None",
+ "insuranceProvider": "UnitedHealthcare",
+ "medications": [
+ "Lisinopril"
+ ],
+ "lastCheckupMonthsAgo": 11,
+ "hasDisability": false,
+ "mentalHealth": "poor",
+ "vaccination": "partially vaccinated"
+ },
+ "social": {
+ "socialMedia": {
+ "platforms": [
+ "Pinterest",
+ "Twitter/X",
+ "Reddit",
+ "Instagram"
+ ],
+ "screenTimeHoursPerDay": 3.8,
+ "preferredContent": "video"
+ },
+ "shopping": {
+ "frequency": "weekly",
+ "preferredCategories": [
+ "toys & games",
+ "books"
+ ],
+ "monthlyOnlineSpending": 175
+ },
+ "newsSource": "social media",
+ "travelFrequency": "weekly",
+ "volunteers": false,
+ "pet": "multiple"
+ },
+ "digitalFootprint": {
+ "accountCreatedAt": "2021-04-01T09:59:41.867116+00:00",
+ "lastLoginAt": "2026-04-24T09:59:41.867116+00:00",
+ "lastPasswordChangeAt": "2025-11-06T09:59:41.867116+00:00",
+ "userAgent": "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 Chrome/121.0.0.0 Mobile Safari/537.36",
+ "browser": "Chrome",
+ "os": "Windows 11",
+ "referrer": "facebook.com",
+ "avgSessionMinutes": 17.6,
+ "sessionsPerWeek": 10,
+ "totalSessions": 2666,
+ "twoFactorEnabled": false,
+ "preferredLanguage": "de",
+ "accountStatus": "inactive",
+ "verifiedEmail": false,
+ "verifiedPhone": true
+ },
+ "bank": {
+ "nameOnCard": "Damaris Carlo Ebervale",
+ "cardNumber": "2289970210128357",
+ "cardType": "Mastercard",
+ "cardExpiry": "5/29",
+ "cardCvv": "355"
+ },
+ "hobbies": [
+ "Knitting",
+ "Gardening",
+ "LARPing"
+ ],
+ "technology_profile": {
+ "devices": {
+ "additional_devices": [
+ "BlackBerry Bold 9790",
+ "Nokia N9"
+ ],
+ "smartphone": "Sony Ericsson Xperia X10"
+ },
+ "phone_preferences": {
+ "critical_features": [
+ "security features",
+ "reliability",
+ "5G connectivity"
+ ],
+ "primary_uses": [
+ "photography",
+ "education",
+ "organization"
+ ]
+ },
+ "interest": [
+ "Knitting",
+ "Gardening",
+ "LARPing"
+ ]
+ }
+ }

- # Create time-series activity data
- ts = data.user_time_series({"days": 30, "events_per_day": 8})
- print(f"Generated {len(ts['activity'])} events for {ts['user']['fullName']}")
  ```
-
  ---

  ## Advanced Features Reference
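The hunk above replaces the old Python section with a nested sample user, whereas `users_flat` is documented as returning flat dicts for `pandas.DataFrame`. One generic way to flatten such a record is sketched below. It is not fakedata's implementation: the `flatten` name, the `_` key separator, and joining lists with `|` are assumptions, so the resulting names may not match the package's actual 112 flat columns.

```python
def flatten(record, parent="", sep="_"):
    """Recursively flatten nested dicts into one level of keys joined by `sep`.

    Lists are joined into a single '|'-separated string so every value is a
    scalar suitable for a DataFrame column. Both conventions are assumptions.
    """
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))  # descend into sub-objects
        elif isinstance(value, list):
            flat[name] = "|".join(str(v) for v in value)  # e.g. platforms, hobbies
        else:
            flat[name] = value
    return flat


# A fragment of the README's sample user.
user = {
    "id": "4612",
    "address": {"city": "Kilgore", "coordinates": {"latitude": "-68.324385"}},
    "health": {"bloodPressure": {"systolic": 100, "diastolic": 82}},
    "hobbies": ["Knitting", "Gardening", "LARPing"],
}
print(flatten(user))
```

If pandas is already a dependency, `pandas.json_normalize(records, sep="_")` handles the nested-dict part of this; it leaves list values such as `hobbies` as Python lists rather than joining them.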
@@ -122,9 +304,24 @@ const options = {
  | `data.users(n, opts?)` | `data.users(n, opts=None)` | Generate an array/list of `n` users. |
  | `data.userTimeSeries(opts)` | `data.user_time_series(opts)`| Returns `{ user, activity }` containing chronological event logs. |
  | `data.usersFlat(n, opts?)` | `data.users_flat(n, opts=None)`| Returns flat dicts/objects, perfect for `pandas.DataFrame` ingestion. |
- | `data.usersToCSV(n, opts?)` | `data.users_to_csv(n, opts=None)`| Returns a fully formatted CSV string (109 columns). |
+ | `data.usersToCSV(n, opts?)` | `data.users_to_csv(n, opts=None)`| Returns a fully formatted CSV string (112 columns). |
  | `data.usersToJSON(n, opts?)`| `data.users_to_json(n, opts=None)`| Returns a pretty-printed JSON string. |

+ ### 3. Behavioral Personas (Statistical Modeling)
+ To ensure the data is useful for **Clustering** and **Regression** analysis, `fakedata` uses a **Persona-driven engine**. Every user is assigned one of 6 personas that orchestrate their life outcomes:
+
+ - **Executive**: High income, high education (Master's/PhD), premium Apple devices, luxury lifestyle.
+ - **Tech Professional**: High income, high-end hardware, heavy social media use, remote work bias.
+ - **Student**: Low income, high student debt, budget/mid-range tech, high social media footprint.
+ - **Manual Laborer / Service Worker**: Budget-conscious, steady income, consistent employment patterns.
+ - **Freelancer**: Flexible work modes, variable income ranges, mid-range tech profile.
+
+ These personas ensure that an analyst looking at your synthetic data will find **statistically significant clusters** rather than just a uniform cloud of random values.
+
+ ---
+
+ ## Data Structure Highlights (112 Columns)
+
  ### 3. Locale-Aware Name Generation
  Supports 8 locales with culturally accurate first names, last names, and country/phone codes:
  - `'in'`: Aarav Sharma, Priya Patel (+91, India)