fakedata-python 2.0.8__tar.gz → 2.1.0__tar.gz
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/MANIFEST.in +1 -0
- {fakedata_python-2.0.8/fakedata_python.egg-info → fakedata_python-2.1.0}/PKG-INFO +96 -236
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/README.md +95 -235
- fakedata_python-2.1.0/fakedata/__init__.py +21 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/cli.py +20 -2
- fakedata_python-2.1.0/fakedata/helpers/companies.json +1 -0
- fakedata_python-2.1.0/fakedata/helpers/healthcare_extended.json +973 -0
- fakedata_python-2.1.0/fakedata/helpers/job_skills.json +606 -0
- fakedata_python-2.1.0/fakedata/helpers/salary_distributions.json +101 -0
- fakedata_python-2.1.0/fakedata/helpers/universities.json +71570 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/modules/data.py +308 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0/fakedata_python.egg-info}/PKG-INFO +96 -236
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata_python.egg-info/SOURCES.txt +3 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/pyproject.toml +1 -1
- fakedata_python-2.0.8/fakedata/__init__.py +0 -7
- fakedata_python-2.0.8/fakedata/helpers/companies.json +0 -1
- fakedata_python-2.0.8/fakedata/helpers/universities.json +0 -1
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/LICENSE +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/core.py +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/cardtype.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/countries.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/devices.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/domain.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/email.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/first.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/healthcare.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/hobbies.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/industries.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/job_categories.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/job_titles.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/last.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/locales.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/middle.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/occupation.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/salary_ranges.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/shortformstate.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/state.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/states.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/helpers/street.json +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/modules/__init__.py +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata/test_python.py +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata_python.egg-info/dependency_links.txt +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata_python.egg-info/entry_points.txt +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/fakedata_python.egg-info/top_level.txt +0 -0
- {fakedata_python-2.0.8 → fakedata_python-2.1.0}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: fakedata-python
-Version: 2.0
+Version: 2.1.0
 Summary: The fakedata package generates realistic user profiles for machine learning, deep learning, data analysis, and data science workflows.
 Author-email: abhay557 <contact@abhaymourya.in>
 License-Expression: MIT
@@ -22,6 +22,7 @@ Dynamic: license-file
 
 A high-performance, **zero-dependency** synthetic data generation engine, available for both **Node.js** and **Python**. Designed specifically for machine learning, data science, and analytics workflows, providing 100% data parity across platforms.
 
+
 ## Overview
 
 `fakedata` has been completely rebuilt from the ground up to serve as an **ML-ready synthetic data engine**. It generates deeply interconnected user profiles with **112 flat columns across 13 domains** (Health, Financial, Employment, Digital Footprint, etc.), making it the perfect tool for training models, benchmarking pipelines, or simulating realistic databases.
@@ -37,50 +38,8 @@ A high-performance, **zero-dependency** synthetic data generation engine, availa
 - **Pipeline Ready**: Export directly to CSV, JSON, or Flat objects (perfect for `pandas.DataFrame`).
 - **CLI Tool**: Generate and export datasets directly from your terminal — no scripting required.
 - **Streaming Generation**: Files are written one record at a time — constant RAM usage regardless of dataset size. Generate 10M+ rows without running out of memory.
-
-
-
-## Node.js / TypeScript Implementation
-
-### Installation
-```bash
-npm install @abhay557/fakedata
-```
-
-### Quick Start
-```javascript
-const fakedata = require('@abhay557/fakedata');
-
-// Generate deterministic users with a 5% missing data rate (null injection)
-const users = fakedata.data.users(1000, { seed: 42, missing_rate: 0.05 });
-
-// Export directly to CSV format
-const csvString = fakedata.data.usersToCSV(1000, { seed: 42 });
-
-// Time-series activity data
-const ts = fakedata.userTimeSeries({ days: 30, eventsPerDay: 8 });
-console.log(`Generated ${ts.activity.length} events for ${ts.user.fullName}`);
-```
-
-### Streaming API & Custom Correlations
-Generate unlimited data directly to disk while keeping memory at O(1), and force mathematical relationships between fields using the Pearson Correlation API:
-
-```javascript
-const fs = require('fs');
-const fakedata = require('@abhay557/fakedata');
-
-// Create a stream that emits 1 million users as CSV
-const stream = fakedata.data.generateStream(1000000, {
-format: 'csv',
-correlations: [
-{ fieldA: 'education.level', fieldB: 'financial.annualIncome', pearson_coeff: 0.85 },
-{ fieldA: 'health.bmi', fieldB: 'health.bloodPressure.systolic', pearson_coeff: 0.60 }
-]
-});
-
-// Pipe directly to file (constant RAM usage)
-stream.pipe(fs.createWriteStream('1m_dataset.csv'));
-```
+- **Standalone Generators**: Generate modular, domain-specific data without full user profiles using `data.company()`, `data.job()`, `data.medicalRecord()`, `data.university()`, and `data.transaction()`.
+- **Enriched High-Fidelity Data**: Powered by aggregated datasets, user profiles now include structured `health.medicalHistory` arrays, `employment.companyDetails` with revenue and net income, and `employment.skills` arrays correlated to real job titles.
 
 ---
 
@@ -128,6 +87,48 @@ for user in stream:
 pass
 ```
 
+## Node.js / TypeScript Implementation
+
+### Installation
+```bash
+npm install @abhay557/fakedata
+```
+
+### Quick Start
+```javascript
+const fakedata = require('@abhay557/fakedata');
+
+// Generate deterministic users with a 5% missing data rate (null injection)
+const users = fakedata.data.users(1000, { seed: 42, missing_rate: 0.05 });
+
+// Export directly to CSV format
+const csvString = fakedata.data.usersToCSV(1000, { seed: 42 });
+
+// Time-series activity data
+const ts = fakedata.userTimeSeries({ days: 30, eventsPerDay: 8 });
+console.log(`Generated ${ts.activity.length} events for ${ts.user.fullName}`);
+```
+
+### Streaming API & Custom Correlations
+Generate unlimited data directly to disk while keeping memory at O(1), and force mathematical relationships between fields using the Pearson Correlation API:
+
+```javascript
+const fs = require('fs');
+const fakedata = require('@abhay557/fakedata');
+
+// Create a stream that emits 1 million users as CSV
+const stream = fakedata.data.generateStream(1000000, {
+format: 'csv',
+correlations: [
+{ fieldA: 'education.level', fieldB: 'financial.annualIncome', pearson_coeff: 0.85 },
+{ fieldA: 'health.bmi', fieldB: 'health.bloodPressure.systolic', pearson_coeff: 0.60 }
+]
+});
+
+// Pipe directly to file (constant RAM usage)
+stream.pipe(fs.createWriteStream('1m_dataset.csv'));
+```
+
 ---
 
 ## CLI — Command Line Interface
@@ -156,7 +157,8 @@ pip install fakedata-python
 
 | Flag | Default | Description |
 |:---|:---|:---|
-| `-
+| `-T`, `--type` | `users` | Type of data: `users` \| `companies` \| `jobs` \| `universities` \| `transactions` \| `medical_records` |
+| `-n`, `--count` | `10` | Number of records to generate |
 | `-f`, `--format` | `json` | Output format: `json` \| `csv` \| `flat` |
 | `-o`, `--output` | stdout | Output file path |
 | `-s`, `--seed` | none | Random seed for reproducibility |
@@ -173,6 +175,12 @@ pip install fakedata-python
 # Generate 1000 users and save as CSV
 fakedata generate -n 1000 -f csv -o dataset.csv
 
+# Generate 500 standalone company profiles (v2.1)
+fakedata generate --type companies -n 500 -o companies.json
+
+# Generate 100,000 medical records directly to a file (v2.1)
+fakedata generate -T medical_records -n 100000 -o hospitals.json
+
 # Generate 500 deterministic Indian users
 fakedata generate -n 500 -l in --seed 42 -o india.json
 
@@ -201,188 +209,6 @@ When writing to a file (`-o`), the CLI uses a **streaming write** strategy:
 
 This means you can generate **tens of millions of rows** without hitting Node.js heap limits or Python memory errors.
 
-
----
-### sample output - one user
-```fakedata.data.user()```
-```fakedata.data.user(n) // set n = 100```
-
-```json
-"id": "4612",
-"fullName": "Damaris Carlo Ebervale",
-"firstName": "Damaris",
-"lastName": "Ebervale",
-"middleName": "Carlo",
-"age": 31,
-"gender": "non-binary",
-"email": "damaris.ebervale@liberomail.com",
-"phone": "+1 7469125114",
-"username": "damaris_4612",
-"password": "UQ!VZr0cLUD9",
-"birthDate": "1995-07-19",
-"bloodGroup": "+B",
-"height": 185,
-"weight": 60,
-"domain": "damarisebervale.vg",
-"ip": "48.50.80.113",
-"macaddress": "33:2F:39:EE:3B:1E",
-"address": {
-"street": "3623 Chateau Lane",
-"city": "Kilgore",
-"state": "Texas",
-"country": "Sierra Leone",
-"countryCode": "SL",
-"zipCode": 36434,
-"coordinates": {
-"latitude": "-68.324385",
-"longitude": "55.859967"
-}
-},
-"demographics": {
-"ethnicity": "Hispanic",
-"nationality": "South Korean",
-"language": {
-"primary": "Arabic",
-"secondary": "Turkish"
-},
-"relationshipStatus": "dating"
-},
-"education": {
-"level": "Bachelor's",
-"field": "Computer Science",
-"institution": "Agricultural University of Lublin",
-"institutionCountry": "Poland",
-"gpa": 2.79,
-"graduationYear": 2017,
-"studentDebt": 64117
-},
-"employment": {
-"status": "self-employed",
-"company": "China CITIC Bank",
-"companySize": "enterprise",
-"industry": "Banking",
-"jobTitle": "\"ORACLE DBA\"",
-"jobCategory": "Network Engineering",
-"yearsExperience": 10,
-"workMode": "onsite",
-"workHoursPerWeek": 36,
-"jobSatisfaction": 6
-},
-"financial": {
-"annualIncome": 21600,
-"creditScore": 464,
-"savings": 1680,
-"monthlyExpenses": 1309,
-"debtToIncome": 3.12,
-"taxBracket": "12%",
-"investmentStyle": "moderate",
-"homeOwnership": "own"
-},
-"health": {
-"bmi": 17.5,
-"bmiCategory": "underweight",
-"bloodPressure": {
-"systolic": 100,
-"diastolic": 82
-},
-"exerciseFrequency": "3-4 times/week",
-"smoking": "never",
-"alcohol": "never",
-"sleepHoursPerNight": 8.3,
-"sleepQuality": "poor",
-"diet": "mediterranean",
-"medicalCondition": "None",
-"insuranceProvider": "UnitedHealthcare",
-"medications": [
-"Lisinopril"
-],
-"lastCheckupMonthsAgo": 11,
-"hasDisability": false,
-"mentalHealth": "poor",
-"vaccination": "partially vaccinated"
-},
-"social": {
-"socialMedia": {
-"platforms": [
-"Pinterest",
-"Twitter/X",
-"Reddit",
-"Instagram"
-],
-"screenTimeHoursPerDay": 3.8,
-"preferredContent": "video"
-},
-"shopping": {
-"frequency": "weekly",
-"preferredCategories": [
-"toys & games",
-"books"
-],
-"monthlyOnlineSpending": 175
-},
-"newsSource": "social media",
-"travelFrequency": "weekly",
-"volunteers": false,
-"pet": "multiple"
-},
-"digitalFootprint": {
-"accountCreatedAt": "2021-04-01T09:59:41.867116+00:00",
-"lastLoginAt": "2026-04-24T09:59:41.867116+00:00",
-"lastPasswordChangeAt": "2025-11-06T09:59:41.867116+00:00",
-"userAgent": "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 Chrome/121.0.0.0 Mobile Safari/537.36",
-"browser": "Chrome",
-"os": "Windows 11",
-"referrer": "facebook.com",
-"avgSessionMinutes": 17.6,
-"sessionsPerWeek": 10,
-"totalSessions": 2666,
-"twoFactorEnabled": false,
-"preferredLanguage": "de",
-"accountStatus": "inactive",
-"verifiedEmail": false,
-"verifiedPhone": true
-},
-"bank": {
-"nameOnCard": "Damaris Carlo Ebervale",
-"cardNumber": "2289970210128357",
-"cardType": "Mastercard",
-"cardExpiry": "5/29",
-"cardCvv": "355"
-},
-"hobbies": [
-"Knitting",
-"Gardening",
-"LARPing"
-],
-"technology_profile": {
-"devices": {
-"additional_devices": [
-"BlackBerry Bold 9790",
-"Nokia N9"
-],
-"smartphone": "Sony Ericsson Xperia X10"
-},
-"phone_preferences": {
-"critical_features": [
-"security features",
-"reliability",
-"5G connectivity"
-],
-"primary_uses": [
-"photography",
-"education",
-"organization"
-]
-},
-"interest": [
-"Knitting",
-"Gardening",
-"LARPing"
-]
-}
-}
-
-```
 ---
 
 ## Advanced Features Reference
@@ -442,7 +268,45 @@ These personas ensure that an analyst looking at your synthetic data will find *
 
 ## Data Structure Highlights (112 Columns)
 
-### 3.
+### 3. v2.1 High-Fidelity Data Injections
+Version 2.1 completely revamps the `user()` profile by injecting rich, deeply nested real-world data distributions for Employment, Health, and Education.
+
+```json
+{
+"employment": {
+"status": "employed",
+"jobTitle": "Data Scientist",
+"jobCategory": "Engineering",
+"skills": ["Python", "SQL", "Machine Learning", "PyTorch"],
+"companyDetails": {
+"country": "United States",
+"industry": "Technology",
+"yearFounded": 1998,
+"revenue": 182300000000,
+"netIncome": 46200000000
+}
+},
+"health": {
+"medicalHistory": [
+{
+"condition": "Hypertension",
+"hospital": "UCLA Medical Center",
+"admissionType": "Urgent",
+"billingAmount": 18560.50,
+"medication": "Lisinopril",
+"testResult": "Abnormal"
+}
+]
+},
+"education": {
+"institution": "Massachusetts Institute of Technology",
+"institutionDomain": "mit.edu",
+"institutionState": "Massachusetts"
+}
+}
+```
+
+### 4. Locale-Aware Name Generation
 Supports 8 locales with culturally accurate first names, last names, and country/phone codes:
 - `'in'`: Aarav Sharma, Priya Patel (+91, India)
 - `'jp'`: Haruto Tanaka, Sakura Sato (+81, Japan)
@@ -453,7 +317,7 @@ Supports 8 locales with culturally accurate first names, last names, and country
 - `'fr'`: Gabriel Martin, Emma Dubois (+33, France)
 - `'en'`: James Smith, Mary Johnson (+1, United States)
 
-###
+### 5. Time-Series Activity Data
 Generate chronological behavioral logs for users. Event types include `login`, `page_view`, `purchase`, `search`, `click`, `logout`, `api_call`, `upload`, `download`, and `comment`.
 
 ```javascript
@@ -462,7 +326,7 @@ const ts = data.userTimeSeries({ seed: 42, days: 30, eventsPerDay: 8 });
 // ts.activity → [{ timestamp, type, page, duration, device, ip, success, amount?, query? }]
 ```
 
-###
+### 6. Anomaly Injection Engine (Fraud Detection)
 When `anomaly_rate` is > 0, `fakedata` injects ML-detectable fraud patterns into the dataset. Affected users receive a special `_anomaly` flag object indicating the fraud type.
 
 | Anomaly Type | Effect |
@@ -476,7 +340,7 @@ When `anomaly_rate` is > 0, `fakedata` injects ML-detectable fraud patterns into
 | `data_mismatch` | Age=12 + employed + 30yr experience + $500k income |
 | `health_outlier` | BMI = 8-9 or 75-80, BP = extreme values |
 
-###
+### 7. The User Profile Schema (109 Correlated Fields)
 Each generated user contains highly realistic, correlated data. For example, age determines education graduation year, which impacts employment salary, which impacts credit score, which impacts housing status and health/BMI metrics.
 
 ```text
@@ -492,7 +356,3 @@ identity(9) → personal(6) → network(3) → address(7) → demographics(5)
 Distributed under the **MIT License**. See `LICENSE` for more information.
 
 **Maintainer**: [abhay557](https://github.com/abhay557)
-
-- Project Commit History - `https://github.com/abhay557/random-api.xyz`
-
----