fakedata-python 2.0.5__tar.gz → 2.1.0__tar.gz
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/MANIFEST.in +3 -0
- {fakedata_python-2.0.5/fakedata_python.egg-info → fakedata_python-2.1.0}/PKG-INFO +93 -197
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/README.md +92 -196
- fakedata_python-2.1.0/fakedata/__init__.py +21 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/cli.py +20 -2
- fakedata_python-2.1.0/fakedata/helpers/companies.json +1 -0
- fakedata_python-2.1.0/fakedata/helpers/healthcare_extended.json +973 -0
- fakedata_python-2.1.0/fakedata/helpers/job_skills.json +606 -0
- fakedata_python-2.1.0/fakedata/helpers/salary_distributions.json +101 -0
- fakedata_python-2.1.0/fakedata/helpers/universities.json +71570 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/modules/data.py +482 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0/fakedata_python.egg-info}/PKG-INFO +93 -197
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata_python.egg-info/SOURCES.txt +3 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/pyproject.toml +1 -1
- fakedata_python-2.0.5/fakedata/__init__.py +0 -6
- fakedata_python-2.0.5/fakedata/helpers/companies.json +0 -1
- fakedata_python-2.0.5/fakedata/helpers/universities.json +0 -1
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/LICENSE +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/core.py +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/cardtype.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/countries.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/devices.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/domain.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/email.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/first.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/healthcare.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/hobbies.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/industries.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/job_categories.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/job_titles.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/last.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/locales.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/middle.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/occupation.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/salary_ranges.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/shortformstate.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/state.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/states.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/helpers/street.json +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/modules/__init__.py +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata/test_python.py +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata_python.egg-info/dependency_links.txt +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata_python.egg-info/entry_points.txt +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/fakedata_python.egg-info/top_level.txt +0 -0
- {fakedata_python-2.0.5 → fakedata_python-2.1.0}/setup.cfg +0 -0
|
@@ -1,11 +1,14 @@
|
|
|
1
1
|
# Exclude development and Node.js files
|
|
2
2
|
prune .github
|
|
3
3
|
exclude CONTRIBUTING.md
|
|
4
|
+
exclude data.md
|
|
4
5
|
exclude CODE_OF_CONDUCT.md
|
|
5
6
|
exclude .npmignore
|
|
6
7
|
exclude test.js
|
|
7
8
|
exclude test_py.py
|
|
8
9
|
exclude test_python.py
|
|
10
|
+
exclude test_new_apis.py
|
|
11
|
+
exclude test_new_apis.js
|
|
9
12
|
|
|
10
13
|
# Exclude JS source code
|
|
11
14
|
prune src
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: fakedata-python
|
|
3
|
-
Version: 2.0
|
|
3
|
+
Version: 2.1.0
|
|
4
4
|
Summary: The fakedata package generates realistic user profiles for machine learning, deep learning, data analysis, and data science workflows.
|
|
5
5
|
Author-email: abhay557 <contact@abhaymourya.in>
|
|
6
6
|
License-Expression: MIT
|
|
@@ -22,6 +22,7 @@ Dynamic: license-file
|
|
|
22
22
|
|
|
23
23
|
A high-performance, **zero-dependency** synthetic data generation engine, available for both **Node.js** and **Python**. Designed specifically for machine learning, data science, and analytics workflows, providing 100% data parity across platforms.
|
|
24
24
|
|
|
25
|
+
|
|
25
26
|
## Overview
|
|
26
27
|
|
|
27
28
|
`fakedata` has been completely rebuilt from the ground up to serve as an **ML-ready synthetic data engine**. It generates deeply interconnected user profiles with **112 flat columns across 13 domains** (Health, Financial, Employment, Digital Footprint, etc.), making it the perfect tool for training models, benchmarking pipelines, or simulating realistic databases.
|
|
@@ -37,6 +38,8 @@ A high-performance, **zero-dependency** synthetic data generation engine, availa
|
|
|
37
38
|
- **Pipeline Ready**: Export directly to CSV, JSON, or Flat objects (perfect for `pandas.DataFrame`).
|
|
38
39
|
- **CLI Tool**: Generate and export datasets directly from your terminal — no scripting required.
|
|
39
40
|
- **Streaming Generation**: Files are written one record at a time — constant RAM usage regardless of dataset size. Generate 10M+ rows without running out of memory.
|
|
41
|
+
- **Standalone Generators**: Generate modular, domain-specific data without full user profiles using `data.company()`, `data.job()`, `data.medicalRecord()`, `data.university()`, and `data.transaction()`.
|
|
42
|
+
- **Enriched High-Fidelity Data**: Powered by aggregated datasets, user profiles now include structured `health.medicalHistory` arrays, `employment.companyDetails` with revenue and net income, and `employment.skills` arrays correlated to real job titles.
|
|
40
43
|
|
|
41
44
|
---
|
|
42
45
|
|
|
@@ -64,7 +67,25 @@ ts = fakedata.data.user_time_series({"days": 30, "events_per_day": 8})
|
|
|
64
67
|
print(f"Generated {len(ts['activity'])} events for {ts['user']['fullName']}")
|
|
65
68
|
```
|
|
66
69
|
|
|
67
|
-
|
|
70
|
+
### Streaming API & Custom Correlations
|
|
71
|
+
Generate unlimited data lazily, keeping memory footprint at O(1), and force mathematical relationships between fields using the Pearson Correlation API:
|
|
72
|
+
|
|
73
|
+
```python
|
|
74
|
+
import fakedata
|
|
75
|
+
|
|
76
|
+
# Create a lazy generator that yields 1 million users
|
|
77
|
+
stream = fakedata.generate_stream(1000000, {
|
|
78
|
+
"correlations": [
|
|
79
|
+
{"fieldA": "education.level", "fieldB": "financial.annualIncome", "pearson_coeff": 0.85},
|
|
80
|
+
{"fieldA": "health.bmi", "fieldB": "health.bloodPressure.systolic", "pearson_coeff": 0.60}
|
|
81
|
+
]
|
|
82
|
+
})
|
|
83
|
+
|
|
84
|
+
# Process users one by one without blowing up RAM
|
|
85
|
+
for user in stream:
|
|
86
|
+
# write to DB, serialize to file, or process
|
|
87
|
+
pass
|
|
88
|
+
```
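Reviewer note: the O(1) memory claim in the added section rests on lazy iteration. The following is a minimal sketch of that pattern in plain Python, not the package's implementation. `toy_stream` and `write_csv` are hypothetical names, and the three-field record is an illustrative stand-in for a real profile.

```python
import csv
import io
import random
from typing import Dict, Iterator


def toy_stream(count: int, seed: int = 0) -> Iterator[Dict[str, int]]:
    """Yield `count` toy records lazily (hypothetical stand-in for a real stream)."""
    rng = random.Random(seed)
    for i in range(count):
        # Only this one record is alive at a time; nothing is accumulated.
        yield {"id": i, "age": rng.randint(18, 90), "income": rng.randint(20_000, 200_000)}


def write_csv(records: Iterator[Dict[str, int]], handle) -> int:
    """Write records to an open text handle row by row and return the row count."""
    writer = None
    rows = 0
    for record in records:
        if writer is None:
            # Take the header from the first record, then reuse the writer.
            writer = csv.DictWriter(handle, fieldnames=list(record))
            writer.writeheader()
        writer.writerow(record)
        rows += 1
    return rows


# An in-memory buffer keeps the example self-contained; a real run would pass an open file.
buffer = io.StringIO()
rows_written = write_csv(toy_stream(1000, seed=42), buffer)
```

Because the consumer pulls one record at a time from the generator, peak memory depends on the size of a single record rather than on `count`.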
|
|
68
89
|
|
|
69
90
|
## Node.js / TypeScript Implementation
|
|
70
91
|
|
|
@@ -88,6 +109,26 @@ const ts = fakedata.userTimeSeries({ days: 30, eventsPerDay: 8 });
|
|
|
88
109
|
console.log(`Generated ${ts.activity.length} events for ${ts.user.fullName}`);
|
|
89
110
|
```
|
|
90
111
|
|
|
112
|
+
### Streaming API & Custom Correlations
|
|
113
|
+
Generate unlimited data directly to disk while keeping memory at O(1), and force mathematical relationships between fields using the Pearson Correlation API:
|
|
114
|
+
|
|
115
|
+
```javascript
|
|
116
|
+
const fs = require('fs');
|
|
117
|
+
const fakedata = require('@abhay557/fakedata');
|
|
118
|
+
|
|
119
|
+
// Create a stream that emits 1 million users as CSV
|
|
120
|
+
const stream = fakedata.data.generateStream(1000000, {
|
|
121
|
+
format: 'csv',
|
|
122
|
+
correlations: [
|
|
123
|
+
{ fieldA: 'education.level', fieldB: 'financial.annualIncome', pearson_coeff: 0.85 },
|
|
124
|
+
{ fieldA: 'health.bmi', fieldB: 'health.bloodPressure.systolic', pearson_coeff: 0.60 }
|
|
125
|
+
]
|
|
126
|
+
});
|
|
127
|
+
|
|
128
|
+
// Pipe directly to file (constant RAM usage)
|
|
129
|
+
stream.pipe(fs.createWriteStream('1m_dataset.csv'));
|
|
130
|
+
```
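Reviewer note: the diff does not show how `pearson_coeff` is enforced. One standard way to obtain a target Pearson correlation between two numeric fields is to mix a shared standard-normal draw with independent noise. The Python sketch below illustrates that technique only; `correlated_pairs` and `pearson` are hypothetical helpers, and how the package treats ordinal fields such as `education.level` is not specified in this diff.

```python
import math
import random
from typing import List, Tuple


def correlated_pairs(n: int, rho: float, seed: int = 0) -> List[Tuple[float, float]]:
    """Draw n pairs of standard-normal values whose expected Pearson correlation is rho."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # y = rho*x + sqrt(1 - rho^2)*noise has unit variance and corr(x, y) = rho.
        pairs.append((x, rho * x + scale * noise))
    return pairs


def pearson(pairs: List[Tuple[float, float]]) -> float:
    """Compute the sample Pearson correlation coefficient of the pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


sample = correlated_pairs(20_000, 0.85, seed=42)
observed = pearson(sample)  # close to 0.85, with small sampling error
```

The standard-normal values can then be rescaled to each field's own mean and spread; a positive linear rescaling leaves the Pearson coefficient unchanged.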
|
|
131
|
+
|
|
91
132
|
---
|
|
92
133
|
|
|
93
134
|
## CLI — Command Line Interface
|
|
@@ -116,7 +157,8 @@ pip install fakedata-python
|
|
|
116
157
|
|
|
117
158
|
| Flag | Default | Description |
|
|
118
159
|
|:---|:---|:---|
|
|
119
|
-
| `-
|
|
160
|
+
| `-T`, `--type` | `users` | Type of data: `users` \| `companies` \| `jobs` \| `universities` \| `transactions` \| `medical_records` |
|
|
161
|
+
| `-n`, `--count` | `10` | Number of records to generate |
|
|
120
162
|
| `-f`, `--format` | `json` | Output format: `json` \| `csv` \| `flat` |
|
|
121
163
|
| `-o`, `--output` | stdout | Output file path |
|
|
122
164
|
| `-s`, `--seed` | none | Random seed for reproducibility |
|
|
@@ -133,6 +175,12 @@ pip install fakedata-python
|
|
|
133
175
|
# Generate 1000 users and save as CSV
|
|
134
176
|
fakedata generate -n 1000 -f csv -o dataset.csv
|
|
135
177
|
|
|
178
|
+
# Generate 500 standalone company profiles (v2.1)
|
|
179
|
+
fakedata generate --type companies -n 500 -o companies.json
|
|
180
|
+
|
|
181
|
+
# Generate 100,000 medical records directly to a file (v2.1)
|
|
182
|
+
fakedata generate -T medical_records -n 100000 -o hospitals.json
|
|
183
|
+
|
|
136
184
|
# Generate 500 deterministic Indian users
|
|
137
185
|
fakedata generate -n 500 -l in --seed 42 -o india.json
|
|
138
186
|
|
|
@@ -161,192 +209,6 @@ When writing to a file (`-o`), the CLI uses a **streaming write** strategy:
|
|
|
161
209
|
|
|
162
210
|
This means you can generate **tens of millions of rows** without hitting Node.js heap limits or Python memory errors.
|
|
163
211
|
|
|
164
|
-
```
|
|
165
|
-
Before (old): generate ALL → hold in RAM → write to file ❌ OOM at ~500k rows
|
|
166
|
-
After (new): open file → generate 1 → write → discard → repeat ✅ unlimited
|
|
167
|
-
```
|
|
168
|
-
|
|
169
|
-
---
|
|
170
|
-
### sample output - one user
|
|
171
|
-
```fakedata.data.user()```
|
|
172
|
-
```fakedata.data.user(n) // set n = 100```
|
|
173
|
-
|
|
174
|
-
```json
|
|
175
|
-
"id": "4612",
|
|
176
|
-
"fullName": "Damaris Carlo Ebervale",
|
|
177
|
-
"firstName": "Damaris",
|
|
178
|
-
"lastName": "Ebervale",
|
|
179
|
-
"middleName": "Carlo",
|
|
180
|
-
"age": 31,
|
|
181
|
-
"gender": "non-binary",
|
|
182
|
-
"email": "damaris.ebervale@liberomail.com",
|
|
183
|
-
"phone": "+1 7469125114",
|
|
184
|
-
"username": "damaris_4612",
|
|
185
|
-
"password": "UQ!VZr0cLUD9",
|
|
186
|
-
"birthDate": "1995-07-19",
|
|
187
|
-
"bloodGroup": "+B",
|
|
188
|
-
"height": 185,
|
|
189
|
-
"weight": 60,
|
|
190
|
-
"domain": "damarisebervale.vg",
|
|
191
|
-
"ip": "48.50.80.113",
|
|
192
|
-
"macaddress": "33:2F:39:EE:3B:1E",
|
|
193
|
-
"address": {
|
|
194
|
-
"street": "3623 Chateau Lane",
|
|
195
|
-
"city": "Kilgore",
|
|
196
|
-
"state": "Texas",
|
|
197
|
-
"country": "Sierra Leone",
|
|
198
|
-
"countryCode": "SL",
|
|
199
|
-
"zipCode": 36434,
|
|
200
|
-
"coordinates": {
|
|
201
|
-
"latitude": "-68.324385",
|
|
202
|
-
"longitude": "55.859967"
|
|
203
|
-
}
|
|
204
|
-
},
|
|
205
|
-
"demographics": {
|
|
206
|
-
"ethnicity": "Hispanic",
|
|
207
|
-
"nationality": "South Korean",
|
|
208
|
-
"language": {
|
|
209
|
-
"primary": "Arabic",
|
|
210
|
-
"secondary": "Turkish"
|
|
211
|
-
},
|
|
212
|
-
"relationshipStatus": "dating"
|
|
213
|
-
},
|
|
214
|
-
"education": {
|
|
215
|
-
"level": "Bachelor's",
|
|
216
|
-
"field": "Computer Science",
|
|
217
|
-
"institution": "Agricultural University of Lublin",
|
|
218
|
-
"institutionCountry": "Poland",
|
|
219
|
-
"gpa": 2.79,
|
|
220
|
-
"graduationYear": 2017,
|
|
221
|
-
"studentDebt": 64117
|
|
222
|
-
},
|
|
223
|
-
"employment": {
|
|
224
|
-
"status": "self-employed",
|
|
225
|
-
"company": "China CITIC Bank",
|
|
226
|
-
"companySize": "enterprise",
|
|
227
|
-
"industry": "Banking",
|
|
228
|
-
"jobTitle": "\"ORACLE DBA\"",
|
|
229
|
-
"jobCategory": "Network Engineering",
|
|
230
|
-
"yearsExperience": 10,
|
|
231
|
-
"workMode": "onsite",
|
|
232
|
-
"workHoursPerWeek": 36,
|
|
233
|
-
"jobSatisfaction": 6
|
|
234
|
-
},
|
|
235
|
-
"financial": {
|
|
236
|
-
"annualIncome": 21600,
|
|
237
|
-
"creditScore": 464,
|
|
238
|
-
"savings": 1680,
|
|
239
|
-
"monthlyExpenses": 1309,
|
|
240
|
-
"debtToIncome": 3.12,
|
|
241
|
-
"taxBracket": "12%",
|
|
242
|
-
"investmentStyle": "moderate",
|
|
243
|
-
"homeOwnership": "own"
|
|
244
|
-
},
|
|
245
|
-
"health": {
|
|
246
|
-
"bmi": 17.5,
|
|
247
|
-
"bmiCategory": "underweight",
|
|
248
|
-
"bloodPressure": {
|
|
249
|
-
"systolic": 100,
|
|
250
|
-
"diastolic": 82
|
|
251
|
-
},
|
|
252
|
-
"exerciseFrequency": "3-4 times/week",
|
|
253
|
-
"smoking": "never",
|
|
254
|
-
"alcohol": "never",
|
|
255
|
-
"sleepHoursPerNight": 8.3,
|
|
256
|
-
"sleepQuality": "poor",
|
|
257
|
-
"diet": "mediterranean",
|
|
258
|
-
"medicalCondition": "None",
|
|
259
|
-
"insuranceProvider": "UnitedHealthcare",
|
|
260
|
-
"medications": [
|
|
261
|
-
"Lisinopril"
|
|
262
|
-
],
|
|
263
|
-
"lastCheckupMonthsAgo": 11,
|
|
264
|
-
"hasDisability": false,
|
|
265
|
-
"mentalHealth": "poor",
|
|
266
|
-
"vaccination": "partially vaccinated"
|
|
267
|
-
},
|
|
268
|
-
"social": {
|
|
269
|
-
"socialMedia": {
|
|
270
|
-
"platforms": [
|
|
271
|
-
"Pinterest",
|
|
272
|
-
"Twitter/X",
|
|
273
|
-
"Reddit",
|
|
274
|
-
"Instagram"
|
|
275
|
-
],
|
|
276
|
-
"screenTimeHoursPerDay": 3.8,
|
|
277
|
-
"preferredContent": "video"
|
|
278
|
-
},
|
|
279
|
-
"shopping": {
|
|
280
|
-
"frequency": "weekly",
|
|
281
|
-
"preferredCategories": [
|
|
282
|
-
"toys & games",
|
|
283
|
-
"books"
|
|
284
|
-
],
|
|
285
|
-
"monthlyOnlineSpending": 175
|
|
286
|
-
},
|
|
287
|
-
"newsSource": "social media",
|
|
288
|
-
"travelFrequency": "weekly",
|
|
289
|
-
"volunteers": false,
|
|
290
|
-
"pet": "multiple"
|
|
291
|
-
},
|
|
292
|
-
"digitalFootprint": {
|
|
293
|
-
"accountCreatedAt": "2021-04-01T09:59:41.867116+00:00",
|
|
294
|
-
"lastLoginAt": "2026-04-24T09:59:41.867116+00:00",
|
|
295
|
-
"lastPasswordChangeAt": "2025-11-06T09:59:41.867116+00:00",
|
|
296
|
-
"userAgent": "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 Chrome/121.0.0.0 Mobile Safari/537.36",
|
|
297
|
-
"browser": "Chrome",
|
|
298
|
-
"os": "Windows 11",
|
|
299
|
-
"referrer": "facebook.com",
|
|
300
|
-
"avgSessionMinutes": 17.6,
|
|
301
|
-
"sessionsPerWeek": 10,
|
|
302
|
-
"totalSessions": 2666,
|
|
303
|
-
"twoFactorEnabled": false,
|
|
304
|
-
"preferredLanguage": "de",
|
|
305
|
-
"accountStatus": "inactive",
|
|
306
|
-
"verifiedEmail": false,
|
|
307
|
-
"verifiedPhone": true
|
|
308
|
-
},
|
|
309
|
-
"bank": {
|
|
310
|
-
"nameOnCard": "Damaris Carlo Ebervale",
|
|
311
|
-
"cardNumber": "2289970210128357",
|
|
312
|
-
"cardType": "Mastercard",
|
|
313
|
-
"cardExpiry": "5/29",
|
|
314
|
-
"cardCvv": "355"
|
|
315
|
-
},
|
|
316
|
-
"hobbies": [
|
|
317
|
-
"Knitting",
|
|
318
|
-
"Gardening",
|
|
319
|
-
"LARPing"
|
|
320
|
-
],
|
|
321
|
-
"technology_profile": {
|
|
322
|
-
"devices": {
|
|
323
|
-
"additional_devices": [
|
|
324
|
-
"BlackBerry Bold 9790",
|
|
325
|
-
"Nokia N9"
|
|
326
|
-
],
|
|
327
|
-
"smartphone": "Sony Ericsson Xperia X10"
|
|
328
|
-
},
|
|
329
|
-
"phone_preferences": {
|
|
330
|
-
"critical_features": [
|
|
331
|
-
"security features",
|
|
332
|
-
"reliability",
|
|
333
|
-
"5G connectivity"
|
|
334
|
-
],
|
|
335
|
-
"primary_uses": [
|
|
336
|
-
"photography",
|
|
337
|
-
"education",
|
|
338
|
-
"organization"
|
|
339
|
-
]
|
|
340
|
-
},
|
|
341
|
-
"interest": [
|
|
342
|
-
"Knitting",
|
|
343
|
-
"Gardening",
|
|
344
|
-
"LARPing"
|
|
345
|
-
]
|
|
346
|
-
}
|
|
347
|
-
}
|
|
348
|
-
|
|
349
|
-
```
|
|
350
212
|
---
|
|
351
213
|
|
|
352
214
|
## Advanced Features Reference
|
|
@@ -406,7 +268,45 @@ These personas ensure that an analyst looking at your synthetic data will find *
|
|
|
406
268
|
|
|
407
269
|
## Data Structure Highlights (112 Columns)
|
|
408
270
|
|
|
409
|
-
### 3.
|
|
271
|
+
### 3. v2.1 High-Fidelity Data Injections
|
|
272
|
+
Version 2.1 completely revamps the `user()` profile by injecting rich, deeply nested real-world data distributions for Employment, Health, and Education.
|
|
273
|
+
|
|
274
|
+
```json
|
|
275
|
+
{
|
|
276
|
+
"employment": {
|
|
277
|
+
"status": "employed",
|
|
278
|
+
"jobTitle": "Data Scientist",
|
|
279
|
+
"jobCategory": "Engineering",
|
|
280
|
+
"skills": ["Python", "SQL", "Machine Learning", "PyTorch"],
|
|
281
|
+
"companyDetails": {
|
|
282
|
+
"country": "United States",
|
|
283
|
+
"industry": "Technology",
|
|
284
|
+
"yearFounded": 1998,
|
|
285
|
+
"revenue": 182300000000,
|
|
286
|
+
"netIncome": 46200000000
|
|
287
|
+
}
|
|
288
|
+
},
|
|
289
|
+
"health": {
|
|
290
|
+
"medicalHistory": [
|
|
291
|
+
{
|
|
292
|
+
"condition": "Hypertension",
|
|
293
|
+
"hospital": "UCLA Medical Center",
|
|
294
|
+
"admissionType": "Urgent",
|
|
295
|
+
"billingAmount": 18560.50,
|
|
296
|
+
"medication": "Lisinopril",
|
|
297
|
+
"testResult": "Abnormal"
|
|
298
|
+
}
|
|
299
|
+
]
|
|
300
|
+
},
|
|
301
|
+
"education": {
|
|
302
|
+
"institution": "Massachusetts Institute of Technology",
|
|
303
|
+
"institutionDomain": "mit.edu",
|
|
304
|
+
"institutionState": "Massachusetts"
|
|
305
|
+
}
|
|
306
|
+
}
|
|
307
|
+
```
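Reviewer note: nested objects such as `employment.companyDetails` need flattening before they fit a column-per-field table. The sketch below shows one way to do that in Python using dotted key paths. The `flatten` helper is hypothetical, and the package's own flat export may name columns differently or handle lists such as `skills` in another way.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with dots."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            # Lists (e.g. skills) are kept as single values in this sketch.
            flat[path] = value
    return flat


# Trimmed sample based on the JSON shown in the hunk above.
profile = {
    "employment": {
        "jobTitle": "Data Scientist",
        "skills": ["Python", "SQL"],
        "companyDetails": {"industry": "Technology", "yearFounded": 1998},
    },
    "education": {"institutionDomain": "mit.edu"},
}
flat_profile = flatten(profile)
```

Each key of `flat_profile` (for example `employment.companyDetails.yearFounded`) can serve directly as a column name.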
|
|
308
|
+
|
|
309
|
+
### 4. Locale-Aware Name Generation
|
|
410
310
|
Supports 8 locales with culturally accurate first names, last names, and country/phone codes:
|
|
411
311
|
- `'in'`: Aarav Sharma, Priya Patel (+91, India)
|
|
412
312
|
- `'jp'`: Haruto Tanaka, Sakura Sato (+81, Japan)
|
|
@@ -417,7 +317,7 @@ Supports 8 locales with culturally accurate first names, last names, and country
|
|
|
417
317
|
- `'fr'`: Gabriel Martin, Emma Dubois (+33, France)
|
|
418
318
|
- `'en'`: James Smith, Mary Johnson (+1, United States)
|
|
419
319
|
|
|
420
|
-
###
|
|
320
|
+
### 5. Time-Series Activity Data
|
|
421
321
|
Generate chronological behavioral logs for users. Event types include `login`, `page_view`, `purchase`, `search`, `click`, `logout`, `api_call`, `upload`, `download`, and `comment`.
|
|
422
322
|
|
|
423
323
|
```javascript
|
|
@@ -426,7 +326,7 @@ const ts = data.userTimeSeries({ seed: 42, days: 30, eventsPerDay: 8 });
|
|
|
426
326
|
// ts.activity → [{ timestamp, type, page, duration, device, ip, success, amount?, query? }]
|
|
427
327
|
```
|
|
428
328
|
|
|
429
|
-
###
|
|
329
|
+
### 6. Anomaly Injection Engine (Fraud Detection)
|
|
430
330
|
When `anomaly_rate` is > 0, `fakedata` injects ML-detectable fraud patterns into the dataset. Affected users receive a special `_anomaly` flag object indicating the fraud type.
|
|
431
331
|
|
|
432
332
|
| Anomaly Type | Effect |
|
|
@@ -440,7 +340,7 @@ When `anomaly_rate` is > 0, `fakedata` injects ML-detectable fraud patterns into
|
|
|
440
340
|
| `data_mismatch` | Age=12 + employed + 30yr experience + $500k income |
|
|
441
341
|
| `health_outlier` | BMI = 8-9 or 75-80, BP = extreme values |
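Reviewer note: the `data_mismatch` row can be pictured as an in-place overwrite applied to a random fraction of users. The Python sketch below is an assumption-laden illustration, not the package's code: `inject_data_mismatch` is a hypothetical helper, and the internal shape of the `_anomaly` object (here a single `type` key) is a guess.

```python
import random
from typing import Any, Dict, List


def inject_data_mismatch(users: List[Dict[str, Any]], anomaly_rate: float, seed: int = 0) -> int:
    """Overwrite roughly `anomaly_rate` of users with contradictory values; return how many."""
    rng = random.Random(seed)
    flagged = 0
    for user in users:
        if rng.random() < anomaly_rate:
            # Values taken from the data_mismatch row of the table above.
            user["age"] = 12
            user["employment"] = {"status": "employed", "yearsExperience": 30}
            user["financial"] = {"annualIncome": 500_000}
            # Assumed flag layout; the real `_anomaly` object may carry more fields.
            user["_anomaly"] = {"type": "data_mismatch"}
            flagged += 1
    return flagged


anomalous_users = [{"id": i, "age": 30} for i in range(100)]
flagged_all = inject_data_mismatch(anomalous_users, anomaly_rate=1.0)

clean_users = [{"id": i, "age": 30} for i in range(100)]
flagged_none = inject_data_mismatch(clean_users, anomaly_rate=0.0)
```

Keeping the flag on the record gives a ground-truth label for evaluating a fraud detector against the injected rows.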
|
|
442
342
|
|
|
443
|
-
###
|
|
343
|
+
### 7. The User Profile Schema (109 Correlated Fields)
|
|
444
344
|
Each generated user contains highly realistic, correlated data. For example, age determines education graduation year, which impacts employment salary, which impacts credit score, which impacts housing status and health/BMI metrics.
|
|
445
345
|
|
|
446
346
|
```text
|
|
@@ -456,7 +356,3 @@ identity(9) → personal(6) → network(3) → address(7) → demographics(5)
|
|
|
456
356
|
Distributed under the **MIT License**. See `LICENSE` for more information.
|
|
457
357
|
|
|
458
358
|
**Maintainer**: [abhay557](https://github.com/abhay557)
|
|
459
|
-
|
|
460
|
-
- Project Commit History - `https://github.com/abhay557/random-api.xyz`
|
|
461
|
-
|
|
462
|
-
---
|