datly 0.0.4 → 0.0.6

package/README.MD CHANGED
@@ -8,7 +8,7 @@ A comprehensive JavaScript library for data analysis, statistics, machine learni
  1. [Introduction](#introduction)
  2. [Installation](#installation)
  3. [Core Concepts](#core-concepts)
- 4. [Data Preparation](#data-preparation)
+ 4. [Dataframe Operations](#dataframe-operations)
  5. [Descriptive Statistics](#descriptive-statistics)
  6. [Exploratory Data Analysis](#exploratory-data-analysis)
  7. [Probability Distributions](#probability-distributions)
@@ -79,14 +79,18 @@ This format makes it easy to:
 
  ---
 
- ## Data Preparation
+ ## Dataframe Operations
 
- ### `dataframe_from_json(data)`
+ ### `df_from_csv(content, options = {})`
 
- Creates a dataframe summary from JSON data.
+ Creates a dataframe from CSV content.
 
  **Parameters:**
- - `data`: Array of objects or single object
+ - `content`: CSV string content
+ - `options`:
+   - `delimiter`: Column delimiter (default: ',')
+   - `header`: First row contains headers (default: true)
+   - `skipEmptyLines`: Skip empty lines (default: true)
 
  **Returns:**
  ```yaml
@@ -95,31 +99,751 @@ columns:
95
99
  - name
96
100
  - age
97
101
  - salary
98
- n_rows: 100
99
- n_cols: 3
100
- dtypes:
101
- - string
102
- - number
103
- - number
104
- preview:
102
+ data:
105
103
  - name: alice
106
104
  age: 30
107
105
  salary: 50000
108
106
  - name: bob
109
107
  age: 25
110
108
  salary: 45000
109
+ n_rows: 2
110
+ n_cols: 3
111
+ ```
112
+
113
+ **Example:**
114
+ ```javascript
115
+ const csvContent = `name,age,salary
116
+ Alice,30,50000
117
+ Bob,25,45000
118
+ Charlie,35,60000`;
119
+
120
+ const df = datly.df_from_csv(csvContent);
121
+ console.log(df);
122
+ ```
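
To make the three documented options concrete, here is a minimal sketch of how `delimiter`, `header`, and `skipEmptyLines` could interact. It is not datly's implementation: `parseCsvSketch` is a hypothetical helper, the `col_0`, `col_1` naming for headerless input is an assumption, and quoted fields are not handled.

```javascript
// Hypothetical helper, not part of datly: a naive CSV reader that honors the
// three documented options. Quoted fields are NOT supported.
function parseCsvSketch(content, options = {}) {
  const { delimiter = ',', header = true, skipEmptyLines = true } = options;

  let lines = content.split(/\r?\n/);
  if (skipEmptyLines) {
    lines = lines.filter(line => line.trim() !== '');
  }

  const rows = lines.map(line => line.split(delimiter).map(cell => cell.trim()));
  if (rows.length === 0) {
    return { type: 'dataframe', columns: [], data: [], n_rows: 0, n_cols: 0 };
  }

  // With header: false, synthesize column names (assumed naming scheme).
  const columns = header ? rows[0] : rows[0].map((_, i) => `col_${i}`);
  const body = header ? rows.slice(1) : rows;

  const data = body.map(cells => {
    const row = {};
    columns.forEach((name, i) => {
      const raw = cells[i] ?? '';
      // Convert numeric-looking cells to numbers; keep everything else as strings.
      row[name] = raw !== '' && !Number.isNaN(Number(raw)) ? Number(raw) : raw;
    });
    return row;
  });

  return { type: 'dataframe', columns, data, n_rows: data.length, n_cols: columns.length };
}

// Semicolon-delimited input with a blank line that gets skipped.
const sketchDf = parseCsvSketch('name;age\nAlice;30\n\nBob;25', { delimiter: ';' });
console.log(sketchDf.columns, sketchDf.n_rows); // [ 'name', 'age' ] 2
```

The returned object mirrors the shape shown under **Returns** (`type`, `columns`, `data`, `n_rows`, `n_cols`); whether datly converts numeric cells the same way is not stated in the README.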
123
+
124
+ ---
125
+
126
+ ### `df_from_json(input)`
127
+
128
+ Creates a dataframe from JSON data. Accepts multiple formats:
129
+ - Array of objects
130
+ - Single object (converted to single-row dataframe)
131
+ - Structured JSON with headers and data arrays
132
+ - String (parsed as JSON)
133
+
134
+ **Returns:**
135
+ ```yaml
136
+ type: dataframe
137
+ columns:
138
+ - name
139
+ - age
140
+ - department
141
+ data:
142
+ - name: alice
143
+ age: 30
144
+ department: engineering
145
+ - name: bob
146
+ age: 25
147
+ department: sales
148
+ n_rows: 2
149
+ n_cols: 3
111
150
  ```
112
151
 
113
152
  **Example:**
114
153
  ```javascript
154
+ // From array of objects
115
155
  const data = [
116
- { name: 'Alice', age: 30, salary: 50000 },
117
- { name: 'Bob', age: 25, salary: 45000 },
118
- { name: 'Charlie', age: 35, salary: 60000 }
156
+ { name: 'Alice', age: 30, department: 'Engineering' },
157
+ { name: 'Bob', age: 25, department: 'Sales' }
119
158
  ];
159
+ const df = datly.df_from_json(data);
120
160
 
121
- const df = datly.dataframe_from_json(data);
122
- console.log(df);
161
+ // From JSON string
162
+ const jsonString = '[{"name":"Alice","age":30},{"name":"Bob","age":25}]';
163
+ const df2 = datly.df_from_json(jsonString);
164
+
165
+ // From structured format
166
+ const structured = {
167
+ headers: ['name', 'age'],
168
+ data: [['Alice', 30], ['Bob', 25]]
169
+ };
170
+ const df3 = datly.df_from_json(structured);
171
+ ```
172
+
173
+ ---
174
+
175
+ ### `df_from_array(array)`
176
+
177
+ Creates a dataframe from an array of objects.
178
+
179
+ **Parameters:**
180
+ - `array`: Array of objects with consistent keys
181
+
182
+ **Returns:**
183
+ ```yaml
184
+ type: dataframe
185
+ columns:
186
+ - product
187
+ - price
188
+ - stock
189
+ data:
190
+ - product: laptop
191
+ price: 999
192
+ stock: 15
193
+ - product: mouse
194
+ price: 25
195
+ stock: 50
196
+ n_rows: 2
197
+ n_cols: 3
198
+ ```
199
+
200
+ **Example:**
201
+ ```javascript
202
+ const products = [
203
+ { product: 'Laptop', price: 999, stock: 15 },
204
+ { product: 'Mouse', price: 25, stock: 50 },
205
+ { product: 'Keyboard', price: 75, stock: 30 }
206
+ ];
207
+
208
+ const df = datly.df_from_array(products);
209
+ ```
210
+
211
+ ---
212
+
213
+ ### `df_from_object(object, options = {})`
214
+
215
+ Creates a dataframe from a single object. Can flatten nested structures.
216
+
217
+ **Parameters:**
218
+ - `object`: JavaScript object
219
+ - `options`:
220
+ - `flatten`: Flatten nested objects (default: true)
221
+ - `maxDepth`: Maximum depth for flattening (default: 10)
222
+
223
+ **Returns (flattened):**
224
+ ```yaml
225
+ type: dataframe
226
+ columns:
227
+ - user.name
228
+ - user.age
229
+ - user.address.city
230
+ - user.address.country
231
+ - orders
232
+ - orders.id
233
+ - orders.total
234
+ data:
235
+ - user.name: alice
236
+ user.age: 30
237
+ user.address.city: new york
238
+ user.address.country: usa
239
+ orders:
240
+ - id: 1
241
+ total: 150
242
+ - id: 2
243
+ total: 200
244
+ orders.id:
245
+ - 1
246
+ - 2
247
+ orders.total:
248
+ - 150
249
+ - 200
250
+ n_rows: 1
251
+ n_cols: 7
252
+ ```
253
+
254
+ **Example:**
255
+ ```javascript
256
+ // Flattened (default)
257
+ const user = {
258
+ name: 'Alice',
259
+ age: 30,
260
+ address: {
261
+ city: 'New York',
262
+ country: 'USA'
263
+ },
264
+ orders: [
265
+ { id: 1, total: 150 },
266
+ { id: 2, total: 200 }
267
+ ]
268
+ };
269
+
270
+ const df = datly.df_from_object(user);
271
+ // Flattened columns: name, age, address.city, address.country, etc.
272
+
273
+ // Non-flattened (key-value pairs)
274
+ const df2 = datly.df_from_object(user, { flatten: false });
275
+ ```
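
The `flatten` and `maxDepth` options can be pictured with a small recursive walk. The sketch below is an illustration only: `flattenSketch` is a hypothetical helper, and it assumes arrays are kept as single values (the **Returns** sample above additionally lists `orders.id` and `orders.total` columns, which this sketch does not produce).

```javascript
// Hypothetical helper, not datly's code: flatten nested plain objects into
// dot-separated keys, stopping at maxDepth. Arrays are kept as single values;
// Dates and other class instances are not handled.
function flattenSketch(obj, maxDepth = 10, prefix = '', depth = 0, out = {}) {
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    const isPlainObject =
      value !== null && typeof value === 'object' && !Array.isArray(value);

    if (isPlainObject && depth < maxDepth) {
      // Descend one level and carry the dotted path along.
      flattenSketch(value, maxDepth, path, depth + 1, out);
    } else {
      out[path] = value;
    }
  }
  return out;
}

const flatUser = flattenSketch({
  name: 'Alice',
  address: { city: 'New York', country: 'USA' },
  orders: [{ id: 1, total: 150 }]
});
console.log(Object.keys(flatUser));
// [ 'name', 'address.city', 'address.country', 'orders' ]
```

Passing `maxDepth = 0` leaves nested objects untouched, which is roughly what a depth limit is for: it bounds how far the walk descends.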
276
+
277
+ ---
278
+
279
+ ## Basic Operations
280
+
281
+ ### `df_get_column(dataframe, column)`
282
+
283
+ Extracts a single column as an array.
284
+
285
+ **Returns:**
286
+ ```javascript
287
+ [30, 25, 35] // Array of values
288
+ ```
289
+
290
+ **Example:**
291
+ ```javascript
292
+ const df = datly.df_from_json([
293
+ { name: 'Alice', age: 30 },
294
+ { name: 'Bob', age: 25 },
295
+ { name: 'Charlie', age: 35 }
296
+ ]);
297
+
298
+ const ages = datly.df_get_column(df, 'age');
299
+ console.log(ages); // [30, 25, 35]
300
+ ```
301
+
302
+ ---
303
+
304
+ ### `df_get_value(dataframe, column)`
305
+
306
+ Gets the first value from a column. Useful for single-row dataframes.
307
+
308
+ **Returns:**
309
+ ```javascript
310
+ 30 // Single value
311
+ ```
312
+
313
+ **Example:**
314
+ ```javascript
315
+ const userObj = { name: 'Alice', age: 30, city: 'NYC' };
316
+ const df = datly.df_from_object(userObj);
317
+
318
+ const age = datly.df_get_value(df, 'age');
319
+ console.log(age); // 30
320
+ ```
321
+
322
+ ---
323
+
324
+ ### `df_get_columns(dataframe, columns)`
325
+
326
+ Extracts multiple columns as an object of arrays.
327
+
328
+ **Returns:**
329
+ ```javascript
330
+ {
331
+ name: ['Alice', 'Bob', 'Charlie'],
332
+ age: [30, 25, 35]
333
+ }
334
+ ```
335
+
336
+ **Example:**
337
+ ```javascript
338
+ const df = datly.df_from_json([
339
+ { name: 'Alice', age: 30, salary: 50000 },
340
+ { name: 'Bob', age: 25, salary: 45000 }
341
+ ]);
342
+
343
+ const subset = datly.df_get_columns(df, ['name', 'age']);
344
+ console.log(subset);
345
+ ```
346
+
347
+ ---
348
+
349
+ ### `df_head(dataframe, n = 5)`
350
+
351
+ Returns the first n rows.
352
+
353
+ **Returns:**
354
+ ```yaml
355
+ type: dataframe
356
+ columns:
357
+ - name
358
+ - age
359
+ data:
360
+ - name: alice
361
+ age: 30
362
+ - name: bob
363
+ age: 25
364
+ n_rows: 2
365
+ n_cols: 2
366
+ ```
367
+
368
+ **Example:**
369
+ ```javascript
370
+ const df = datly.df_from_json([...largeDataset]);
371
+ const first3 = datly.df_head(df, 3);
372
+ ```
373
+
374
+ ---
375
+
376
+ ### `df_tail(dataframe, n = 5)`
377
+
378
+ Returns the last n rows.
379
+
380
+ **Example:**
381
+ ```javascript
382
+ const df = datly.df_from_json([...largeDataset]);
383
+ const last3 = datly.df_tail(df, 3);
384
+ ```
385
+
386
+ ---
387
+
388
+ ### `df_info(dataframe)`
389
+
390
+ Returns detailed information about the dataframe structure.
391
+
392
+ **Returns:**
393
+ ```yaml
394
+ n_rows: 100
395
+ n_cols: 5
396
+ columns:
397
+ - name
398
+ - age
399
+ - salary
400
+ - department
401
+ - active
402
+ types:
403
+ name: string
404
+ age: number
405
+ salary: number
406
+ department: string
407
+ active: boolean
408
+ null_counts:
409
+ name: 0
410
+ age: 2
411
+ salary: 1
412
+ unique_counts:
413
+ name: 95
414
+ age: 45
415
+ ```
416
+
417
+ **Example:**
418
+ ```javascript
419
+ const df = datly.df_from_json(employeeData);
420
+ const info = datly.df_info(df);
421
+ console.log(info);
422
+ ```
423
+
424
+ ---
425
+
426
+ ## Data Selection
427
+
428
+ ### `df_select(dataframe, columns)`
429
+
430
+ Selects specific columns.
431
+
432
+ **Returns:**
433
+ ```yaml
434
+ type: dataframe
435
+ columns:
436
+ - name
437
+ - salary
438
+ data:
439
+ - name: alice
440
+ salary: 50000
441
+ n_rows: 1
442
+ n_cols: 2
443
+ ```
444
+
445
+ **Example:**
446
+ ```javascript
447
+ const df = datly.df_from_json(employeeData);
448
+ const subset = datly.df_select(df, ['name', 'salary']);
449
+ ```
450
+
451
+ ---
452
+
453
+ ### `df_filter(dataframe, predicate)`
454
+
455
+ Filters rows based on a predicate function.
456
+
457
+ **Returns:**
458
+ ```yaml
459
+ type: dataframe
460
+ columns:
461
+ - name
462
+ - age
463
+ - salary
464
+ data:
465
+ - name: alice
466
+ age: 30
467
+ salary: 50000
468
+ - name: charlie
469
+ age: 35
470
+ salary: 60000
471
+ n_rows: 2
472
+ n_cols: 3
473
+ ```
474
+
475
+ **Example:**
476
+ ```javascript
477
+ const df = datly.df_from_json(employeeData);
478
+
479
+ // Filter employees older than 28
480
+ const filtered = datly.df_filter(df, row => row.age > 28);
481
+
482
+ // Multiple conditions
483
+ const highEarners = datly.df_filter(df, row =>
484
+ row.salary > 55000 && row.department === 'Engineering'
485
+ );
486
+ ```
487
+
488
+ ---
489
+
490
+ ### `df_sort(dataframe, column, order = 'asc')`
491
+
492
+ Sorts dataframe by a column.
493
+
494
+ **Example:**
495
+ ```javascript
496
+ const df = datly.df_from_json(employeeData);
497
+
498
+ // Sort ascending
499
+ const sortedAsc = datly.df_sort(df, 'age', 'asc');
500
+
501
+ // Sort descending
502
+ const sortedDesc = datly.df_sort(df, 'salary', 'desc');
503
+ ```
504
+
505
+ ---
506
+
507
+ ## Data Cleaning
508
+
509
+ ### `df_dropna(dataframe, subset = null)`
510
+
511
+ Removes rows with null/undefined values.
512
+
513
+ **Example:**
514
+ ```javascript
515
+ const df = datly.df_from_json([
516
+ { name: 'Alice', age: 30, email: 'alice@example.com' },
517
+ { name: 'Bob', age: null, email: 'bob@example.com' },
518
+ { name: 'Charlie', age: 35, email: null }
519
+ ]);
520
+
521
+ // Drop rows with any null values
522
+ const cleaned = datly.df_dropna(df);
523
+
524
+ // Drop rows with null in specific columns
525
+ const cleanedPartial = datly.df_dropna(df, ['age']);
526
+ ```
527
+
528
+ ---
529
+
530
+ ### `df_fillna(dataframe, value, subset = null)`
531
+
532
+ Fills null/undefined values with a specified value.
533
+
534
+ **Example:**
535
+ ```javascript
536
+ const df = datly.df_from_json([
537
+ { name: 'Alice', age: 30, score: 85 },
538
+ { name: 'Bob', age: null, score: 90 },
539
+ { name: 'Charlie', age: 35, score: null }
540
+ ]);
541
+
542
+ // Fill all nulls with 0
543
+ const filled = datly.df_fillna(df, 0);
544
+
545
+ // Fill specific columns
546
+ const filledPartial = datly.df_fillna(df, 0, ['score']);
547
+ ```
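
The `subset` parameter of `df_dropna` and `df_fillna` is easiest to see on plain row arrays. The following is a sketch under assumptions, not datly's code: `dropnaSketch` and `fillnaSketch` are hypothetical helpers that work on arrays of row objects rather than on datly dataframes.

```javascript
// Hypothetical helpers operating on plain arrays of row objects.
const isMissing = value => value === null || value === undefined;

// Keep only rows with no missing value in the checked columns.
function dropnaSketch(rows, subset = null) {
  return rows.filter(row =>
    (subset ?? Object.keys(row)).every(key => !isMissing(row[key]))
  );
}

// Return copies of the rows with missing values replaced in the checked columns.
function fillnaSketch(rows, value, subset = null) {
  return rows.map(row => {
    const filled = { ...row };
    for (const key of subset ?? Object.keys(row)) {
      if (isMissing(filled[key])) filled[key] = value;
    }
    return filled;
  });
}

const people = [
  { name: 'Alice', age: 30, score: 85 },
  { name: 'Bob', age: null, score: 90 },
  { name: 'Charlie', age: 35, score: null }
];

console.log(dropnaSketch(people).length);          // 1 (only Alice is complete)
console.log(dropnaSketch(people, ['age']).length); // 2 (Bob is dropped)
console.log(fillnaSketch(people, 0, ['score'])[2].score); // 0
```

Both helpers return new arrays and leave the input untouched; the README does not say whether datly mutates the dataframe in place.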
548
+
549
+ ---
550
+
551
+ ### `df_drop(dataframe, columns)`
552
+
553
+ Removes specified columns.
554
+
555
+ **Example:**
556
+ ```javascript
557
+ const df = datly.df_from_json(employeeData);
558
+
559
+ // Drop single column
560
+ const dropped = datly.df_drop(df, 'email');
561
+
562
+ // Drop multiple columns
563
+ const droppedMultiple = datly.df_drop(df, ['email', 'phone', 'address']);
564
+ ```
565
+
566
+ ---
567
+
568
+ ### `df_rename(dataframe, renameMap)`
569
+
570
+ Renames columns.
571
+
572
+ **Example:**
573
+ ```javascript
574
+ const df = datly.df_from_json([
575
+ { name: 'Alice', age: 30, salary: 50000 }
576
+ ]);
577
+
578
+ const renamed = datly.df_rename(df, {
579
+ name: 'employee_name',
580
+ age: 'employee_age',
581
+ salary: 'monthly_salary'
582
+ });
583
+ ```
584
+
585
+ ---
586
+
587
+ ## Advanced Operations
588
+
589
+ ### `df_concat(...dataframes)`
590
+
591
+ Concatenates multiple dataframes vertically.
592
+
593
+ **Example:**
594
+ ```javascript
595
+ const df1 = datly.df_from_json([
596
+ { name: 'Alice', age: 30 }
597
+ ]);
598
+
599
+ const df2 = datly.df_from_json([
600
+ { name: 'Bob', age: 25 }
601
+ ]);
602
+
603
+ const combined = datly.df_concat(df1, df2);
604
+ ```
605
+
606
+ ---
607
+
608
+ ### `df_merge(dataframe1, dataframe2, options)`
609
+
610
+ Merges two dataframes (SQL-style join).
611
+
612
+ **Parameters:**
613
+ - `options`:
614
+ - `on`: Column name(s) to join on
615
+ - `how`: 'inner', 'left', 'right', or 'outer'
616
+
617
+ **Example:**
618
+ ```javascript
619
+ const employees = datly.df_from_json([
620
+ { id: 1, name: 'Alice', dept: 'Engineering' },
621
+ { id: 2, name: 'Bob', dept: 'Sales' }
622
+ ]);
623
+
624
+ const salaries = datly.df_from_json([
625
+ { id: 1, salary: 50000 },
626
+ { id: 2, salary: 45000 }
627
+ ]);
628
+
629
+ // Inner join
630
+ const merged = datly.df_merge(employees, salaries, {
631
+ on: 'id',
632
+ how: 'inner'
633
+ });
634
+
635
+ // Multiple keys
636
+ const merged2 = datly.df_merge(df1, df2, {
637
+ on: ['id', 'year'],
638
+ how: 'left'
639
+ });
640
+ ```
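
For readers unfamiliar with SQL-style joins, the difference between `how: 'inner'` and `how: 'left'` can be sketched with a hash join over row arrays. This is an illustration, not datly's implementation: `mergeSketch` is a hypothetical helper, it covers only the inner and left cases, and the default of `'inner'` is an assumption.

```javascript
// Hypothetical helper, not datly's implementation: a hash join over arrays of
// row objects supporting how: 'inner' and how: 'left' only.
function mergeSketch(leftRows, rightRows, { on, how = 'inner' }) {
  const keys = Array.isArray(on) ? on : [on];
  const keyOf = row => JSON.stringify(keys.map(k => row[k]));

  // Index the right side by join key so each left row is a single lookup.
  const index = new Map();
  for (const row of rightRows) {
    const k = keyOf(row);
    if (!index.has(k)) index.set(k, []);
    index.get(k).push(row);
  }

  const result = [];
  for (const left of leftRows) {
    const matches = index.get(keyOf(left)) ?? [];
    if (matches.length > 0) {
      for (const right of matches) result.push({ ...left, ...right });
    } else if (how === 'left') {
      // Left join keeps unmatched left rows; right-side columns stay absent.
      result.push({ ...left });
    }
  }
  return result;
}

const employeeRows = [
  { id: 1, name: 'Alice' },
  { id: 2, name: 'Bob' },
  { id: 3, name: 'Charlie' }
];
const salaryRows = [
  { id: 1, salary: 50000 },
  { id: 2, salary: 45000 }
];

const innerRows = mergeSketch(employeeRows, salaryRows, { on: 'id', how: 'inner' });
const leftJoinRows = mergeSketch(employeeRows, salaryRows, { on: 'id', how: 'left' });
console.log(innerRows.length, leftJoinRows.length); // 2 3
```

Passing an array for `on` (for example `['id', 'year']`) joins on the combination of those columns, since the key is built from all listed values.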
641
+
642
+ ---
643
+
644
+ ### `df_groupby(dataframe, keys)`
645
+
646
+ Groups dataframe by columns.
647
+
648
+ **Returns:**
649
+ ```javascript
650
+ {
651
+ keys: ['department'],
652
+ groups: Map { ... }
653
+ }
654
+ ```
655
+
656
+ **Example:**
657
+ ```javascript
658
+ const df = datly.df_from_json([
659
+ { name: 'Alice', department: 'Engineering', salary: 50000 },
660
+ { name: 'Bob', department: 'Sales', salary: 45000 },
661
+ { name: 'Charlie', department: 'Engineering', salary: 60000 }
662
+ ]);
663
+
664
+ // Group by single column
665
+ const grouped = datly.df_groupby(df, 'department');
666
+
667
+ // Group by multiple columns
668
+ const multiGrouped = datly.df_groupby(df, ['department', 'level']);
669
+ ```
670
+
671
+ ---
672
+
673
+ ### `df_aggregate(grouped, aggMap)`
674
+
675
+ Applies aggregation functions to grouped data.
676
+
677
+ **Example:**
678
+ ```javascript
679
+ const df = datly.df_from_json(employeeData);
680
+ const grouped = datly.df_groupby(df, 'department');
681
+
682
+ // Average salary and age by department
683
+ const aggregated = datly.df_aggregate(grouped, {
684
+ salary: arr => arr.reduce((a, b) => a + b, 0) / arr.length,
685
+ age: arr => arr.reduce((a, b) => a + b, 0) / arr.length
686
+ });
687
+
688
+ // Custom aggregations
689
+ const customAgg = datly.df_aggregate(grouped, {
690
+ salary: arr => Math.max(...arr),
691
+ age: arr => Math.min(...arr)
692
+ });
693
+ ```
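
The group-then-aggregate flow of `df_groupby` and `df_aggregate` can be sketched on plain row arrays. The helpers below are hypothetical and only mirror the documented `{ keys, groups: Map }` shape; how datly keys its `Map` internally and what `df_aggregate` returns are not stated in the README, so both are assumptions here.

```javascript
// Hypothetical helpers, not datly's code.
// Group rows into a Map keyed by the JSON-encoded values of the key columns.
function groupbySketch(rows, keys) {
  const keyList = Array.isArray(keys) ? keys : [keys];
  const groups = new Map();
  for (const row of rows) {
    const groupKey = JSON.stringify(keyList.map(k => row[k]));
    if (!groups.has(groupKey)) groups.set(groupKey, []);
    groups.get(groupKey).push(row);
  }
  return { keys: keyList, groups };
}

// For each group, pass the column's values as an array to its aggregation function.
function aggregateSketch(grouped, aggMap) {
  const out = [];
  for (const rows of grouped.groups.values()) {
    const result = {};
    for (const k of grouped.keys) result[k] = rows[0][k];
    for (const [column, fn] of Object.entries(aggMap)) {
      result[column] = fn(rows.map(r => r[column]));
    }
    out.push(result);
  }
  return out;
}

const mean = arr => arr.reduce((a, b) => a + b, 0) / arr.length;

const byDepartment = aggregateSketch(
  groupbySketch(
    [
      { name: 'Alice', department: 'Engineering', salary: 50000 },
      { name: 'Bob', department: 'Sales', salary: 45000 },
      { name: 'Charlie', department: 'Engineering', salary: 60000 }
    ],
    'department'
  ),
  { salary: mean }
);
console.log(byDepartment);
// [ { department: 'Engineering', salary: 55000 },
//   { department: 'Sales', salary: 45000 } ]
```

Each aggregation function receives the full array of column values for one group, which matches the `arr => ...` callbacks used in the example above.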
694
+
695
+ ---
696
+
697
+ ## Utility Functions
698
+
699
+ ### `df_apply(dataframe, column, function)`
700
+
701
+ Applies a function to transform a column.
702
+
703
+ **Example:**
704
+ ```javascript
705
+ const df = datly.df_from_json([
706
+ { name: 'Alice', salary: 50000 },
707
+ { name: 'Bob', salary: 45000 }
708
+ ]);
709
+
710
+ // Increase all salaries by 10%
711
+ const increased = datly.df_apply(df, 'salary', val => val * 1.1);
712
+
713
+ // Access full row
714
+ const withBonus = datly.df_apply(df, 'salary', (val, row) => {
715
+ return row.name === 'Alice' ? val * 1.2 : val * 1.1;
716
+ });
717
+ ```
718
+
719
+ ---
720
+
721
+ ### `df_add_column(dataframe, columnName, function)`
722
+
723
+ Adds a new derived column.
724
+
725
+ **Example:**
726
+ ```javascript
727
+ const df = datly.df_from_json([
728
+ { name: 'Alice', salary: 50000, bonus: 5000 },
729
+ { name: 'Bob', salary: 45000, bonus: 3000 }
730
+ ]);
731
+
732
+ // Add total compensation
733
+ const withTotal = datly.df_add_column(df, 'total_comp',
734
+ row => row.salary + row.bonus
735
+ );
736
+
737
+ // Add calculated column
738
+ const withTax = datly.df_add_column(df, 'tax',
739
+ row => row.salary * 0.25
740
+ );
741
+ ```
742
+
743
+ ---
744
+
745
+ ### `df_unique(dataframe, column)`
746
+
747
+ Returns unique values from a column.
748
+
749
+ **Example:**
750
+ ```javascript
751
+ const df = datly.df_from_json(employeeData);
752
+ const departments = datly.df_unique(df, 'department');
753
+ console.log(departments); // ['Engineering', 'Sales', 'HR']
754
+ ```
755
+
756
+ ---
757
+
758
+ ### `df_sample(dataframe, n = 5, seed = null)`
759
+
760
+ Returns a random sample of rows.
761
+
762
+ **Example:**
763
+ ```javascript
764
+ const df = datly.df_from_json(largeDataset);
765
+
766
+ // Random sample
767
+ const sample = datly.df_sample(df, 10);
768
+
769
+ // Reproducible with seed
770
+ const reproducible = datly.df_sample(df, 10, 42);
771
+ ```
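
The README does not say how the `seed` argument makes sampling reproducible. One common approach, shown here purely as an assumption, is to swap `Math.random` for a small seeded generator (the widely used mulberry32 PRNG) and draw rows with a partial Fisher-Yates shuffle. `sampleSketch` is a hypothetical helper and samples without replacement.

```javascript
// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper, not datly's code: sample n rows without replacement.
function sampleSketch(rows, n = 5, seed = null) {
  const random = seed === null ? Math.random : mulberry32(seed);
  const pool = rows.slice(); // copy so the input is not reordered
  const count = Math.min(n, pool.length);

  // Partial Fisher-Yates shuffle: settle one output position per iteration.
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, count);
}

const numbers = Array.from({ length: 20 }, (_, i) => ({ id: i }));
const firstDraw = sampleSketch(numbers, 5, 42);
const secondDraw = sampleSketch(numbers, 5, 42);
console.log(JSON.stringify(firstDraw) === JSON.stringify(secondDraw)); // true
```

With the same seed, both calls build the same generator state and therefore pick the same rows; with `seed = null` each call may differ.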
772
+
773
+ ---
774
+
775
+ ### `df_to_csv(dataframe, delimiter = ',')`
776
+
777
+ Exports dataframe to CSV string.
778
+
779
+ **Returns:**
780
+ ```csv
781
+ name,age,salary
782
+ Alice,30,50000
783
+ Bob,25,45000
784
+ ```
785
+
786
+ **Example:**
787
+ ```javascript
788
+ const df = datly.df_from_json(employeeData);
789
+
790
+ // Export to CSV
791
+ const csv = datly.df_to_csv(df);
792
+
793
+ // Custom delimiter
794
+ const tsv = datly.df_to_csv(df, '\t');
795
+ ```
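
When exporting with a custom delimiter, values that contain the delimiter, a double quote, or a line break need quoting to stay parseable. Whether `df_to_csv` does this is not stated in the README; the sketch below shows the conventional RFC 4180-style escaping with a hypothetical helper, `toCsvSketch`, that takes rows and column names directly.

```javascript
// Hypothetical helper, not datly's code: serialize rows to delimited text,
// quoting fields that contain the delimiter, a double quote, or a line break.
function toCsvSketch(rows, columns, delimiter = ',') {
  const escapeField = value => {
    const text = value === null || value === undefined ? '' : String(value);
    const needsQuotes =
      text.includes(delimiter) || text.includes('"') || /[\r\n]/.test(text);
    // Inside a quoted field, a literal double quote is written twice.
    return needsQuotes ? `"${text.replace(/"/g, '""')}"` : text;
  };

  const lines = [columns.map(escapeField).join(delimiter)];
  for (const row of rows) {
    lines.push(columns.map(column => escapeField(row[column])).join(delimiter));
  }
  return lines.join('\n');
}

const csvText = toCsvSketch(
  [
    { name: 'Alice', note: 'likes a,b' },
    { name: 'Bob', note: null }
  ],
  ['name', 'note']
);
console.log(csvText);
// name,note
// Alice,"likes a,b"
// Bob,
```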
796
+
797
+ ---
798
+
799
+ ## Working with Nested Data
800
+
801
+ ### `df_explode(dataframe, column)`
802
+
803
+ Expands array values into multiple rows.
804
+
805
+ **Example:**
806
+ ```javascript
807
+ const df = datly.df_from_json([
808
+ { user: 'Alice', order_ids: [1, 2, 3] },
809
+ { user: 'Bob', order_ids: [4] }
810
+ ]);
811
+
812
+ // Explode order_ids
813
+ const exploded = datly.df_explode(df, 'order_ids');
814
+ // Alice appears 3 times (one per order)
815
+ ```
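
The row expansion described for `df_explode` can be sketched with `Array.prototype.flatMap`. `explodeSketch` is a hypothetical helper over plain row arrays, and its handling of edge cases (non-array values pass through unchanged, empty arrays produce no rows) is an assumption rather than documented datly behavior.

```javascript
// Hypothetical helper, not datly's code: emit one output row per array element,
// copying the other columns. Non-array values pass through as a single row.
function explodeSketch(rows, column) {
  return rows.flatMap(row =>
    Array.isArray(row[column])
      ? row[column].map(item => ({ ...row, [column]: item }))
      : [{ ...row }]
  );
}

const explodedRows = explodeSketch(
  [
    { user: 'Alice', order_ids: [1, 2, 3] },
    { user: 'Bob', order_ids: [4] }
  ],
  'order_ids'
);
console.log(explodedRows);
// [ { user: 'Alice', order_ids: 1 },
//   { user: 'Alice', order_ids: 2 },
//   { user: 'Alice', order_ids: 3 },
//   { user: 'Bob', order_ids: 4 } ]
```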
816
+
817
+ ---
818
+
819
+ ### `df_find_columns(dataframe, pattern)`
820
+
821
+ Searches for columns matching a pattern.
822
+
823
+ **Returns:**
824
+ ```yaml
825
+ pattern: user
826
+ matches_found: 3
827
+ columns:
828
+ - user.name
829
+ - user.age
830
+ - user.email
831
+ ```
832
+
833
+ **Example:**
834
+ ```javascript
835
+ const user = {
836
+ name: 'Alice',
837
+ address: {
838
+ street: '123 Main St',
839
+ city: 'NYC'
840
+ }
841
+ };
842
+
843
+ const df = datly.df_from_object(user);
844
+
845
+ // Find address columns
846
+ const addressCols = datly.df_find_columns(df, 'address');
123
847
  ```
124
848
 
125
849
  ---