lawkit-python 2.1.0 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
.gitignore (new file, 119 lines):

# Rust build artifacts
/target/
**/target/
Cargo.lock

# IDE files
.vscode/
.idea/
*.swp
*.swo
*.vim
.netrwhist

# Claude Code settings
.claude/settings.local.json

# OS files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# Temporary files
*.tmp
*.bak
*~
*.orig

# Test files and outputs
test_*
*.test
test_output/
test_manual/
test_threshold.csv

# Test data directories (generated by test scripts)
tests/integration/usage_test_data/
tests/integration/batch_test_data/
tests/output/

# Logs
*.log
logs/

# Documentation build
/book/
/docs/_build/

# Benchmark outputs
criterion/
bench_*.txt

# Coverage reports
tarpaulin-report.html
lcov.info
coverage/

# Release artifacts
*.tar.gz
*.zip
release/

# Environment files
.env
.env.local
.env.production
.env.staging

# Editor backups and swap files
*~
.#*
\#*#
.*.sw[a-z]
*.un~
Session.vim
.netrwhist

# Python packaging
__pycache__/
*.py[cod]
*$py.class
*.egg-info/
python-package/dist/
python-package/build/
python-package/lawkit.egg-info/
python-venv/
*.whl
*.tar.gz.sig

# Node.js packaging
node_modules/
npm-debug.log*
yarn-debug.log*
yarn-error.log*
.npm
.yarn-integrity
npm-package/bin/lawkit*
npm-package/bin/benf*
!npm-package/bin/lawkit
!npm-package/bin/benf
*.tgz

# Rust-specific
**/*.rs.bk
*.pdb

# Security
*.key
*.pem
*.crt
secrets.json

# Large test files
*.csv.large
*.xlsx.large
*.json.large
PKG-INFO (new file, 441 lines):

Metadata-Version: 2.4
Name: lawkit-python
Version: 2.1.0
Summary: Python wrapper for lawkit - Statistical law analysis toolkit for fraud detection and data quality assessment
Project-URL: Homepage, https://github.com/kako-jun/lawkit
Project-URL: Repository, https://github.com/kako-jun/lawkit
Project-URL: Issues, https://github.com/kako-jun/lawkit/issues
Project-URL: Documentation, https://github.com/kako-jun/lawkit/tree/main/docs
Author-email: kako-jun <kako.jun.42@gmail.com>
License-Expression: MIT
Keywords: anomaly-detection,audit,benford,compliance,data-quality,forensic-accounting,fraud-detection,normal,outlier-detection,pareto,poisson,statistical-analysis,statistics,zipf
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Requires-Python: >=3.8
Provides-Extra: dev
Requires-Dist: black; extra == 'dev'
Requires-Dist: isort; extra == 'dev'
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pytest-asyncio; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=6.0; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Requires-Dist: types-requests; extra == 'dev'
Description-Content-Type: text/markdown

# lawkit-python

Python wrapper for the `lawkit` CLI tool, a statistical law analysis toolkit for fraud detection and data quality assessment.

## Installation

```bash
pip install lawkit-python
```

This will automatically download the appropriate `lawkit` binary for your system from GitHub Releases.

## Quick Start

```python
import lawkit

# Analyze financial data with Benford's Law
result = lawkit.analyze_benford('financial_data.csv')
print(result)

# Get structured JSON output
json_result = lawkit.analyze_benford(
    'accounting.csv',
    lawkit.LawkitOptions(format='csv', output='json')
)
print(f"Risk level: {json_result.risk_level}")
print(f"P-value: {json_result.p_value}")

# Check if data follows the Pareto principle (80/20 rule)
pareto_result = lawkit.analyze_pareto(
    'sales_data.csv',
    lawkit.LawkitOptions(output='json', gini_coefficient=True)
)
print(f"Gini coefficient: {pareto_result.gini_coefficient}")
print(f"80/20 concentration: {pareto_result.concentration_80_20}")
```

## Features

### Statistical Laws Supported

- **Benford's Law**: Detect fraud and anomalies in numerical data
- **Pareto Principle**: Analyze 80/20 distributions and concentration
- **Zipf's Law**: Analyze word frequencies and power-law distributions
- **Normal Distribution**: Test for normality and detect outliers
- **Poisson Distribution**: Analyze rare events and count data

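As background for the Benford's Law checks above, the expected first-digit distribution is P(d) = log10(1 + 1/d). A short stdlib-only sketch (independent of lawkit itself):

```python
import math

# Benford's Law: expected frequency of leading digit d is log10(1 + 1/d).
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# The nine frequencies form a complete probability distribution, heavily
# skewed toward small digits: ~30.1% of values start with 1, only ~4.6%
# start with 9. Large deviations from this curve are what a Benford
# analysis flags as risk.
for digit, freq in benford.items():
    print(f"{digit}: {freq:.1%}")
```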
### Advanced Analysis

- **Multi-law Comparison**: Compare multiple statistical laws on the same data
- **Outlier Detection**: Advanced anomaly detection algorithms
- **Time Series Analysis**: Trend and seasonality detection
- **International Numbers**: Support for various number formats (Japanese, Chinese, etc.)
- **Memory Efficient**: Handle large datasets with streaming analysis

### File Format Support

- **CSV, JSON, YAML, TOML, XML**: Standard structured data formats
- **Excel Files**: `.xlsx` and `.xls` support
- **PDF Documents**: Extract and analyze numerical data from PDFs
- **Word Documents**: Analyze data from `.docx` files
- **PowerPoint**: Extract data from presentations

## Usage Examples

### Modern API (Recommended)

```python
import lawkit

# Analyze with Benford's Law
result = lawkit.analyze_benford('invoice_data.csv')
print(result)

# Get detailed JSON analysis
json_result = lawkit.analyze_benford(
    'financial_statements.xlsx',
    lawkit.LawkitOptions(
        format='excel',
        output='json',
        confidence=0.95,
        verbose=True
    )
)

if json_result.risk_level == "High":
    print("⚠️ High risk of fraud detected!")
    print(f"Chi-square: {json_result.chi_square}")
    print(f"P-value: {json_result.p_value}")
    print(f"MAD: {json_result.mad}%")

# Pareto analysis for business insights
pareto_result = lawkit.analyze_pareto(
    'customer_revenue.csv',
    lawkit.LawkitOptions(
        output='json',
        gini_coefficient=True,
        business_analysis=True,
        percentiles="70,80,90"
    )
)

print(f"Top 20% of customers generate {pareto_result.concentration_80_20:.1f}% of revenue")
print(f"Income inequality (Gini): {pareto_result.gini_coefficient:.3f}")

# Normal distribution analysis with outlier detection
normal_result = lawkit.analyze_normal(
    'quality_measurements.csv',
    lawkit.LawkitOptions(
        output='json',
        outlier_detection=True,
        test_type='shapiro'
    )
)

if normal_result.p_value < 0.05:
    print("Data does not follow a normal distribution")
if normal_result.outliers:
    print(f"Found {len(normal_result.outliers)} outliers")

# Multi-law comparison
comparison = lawkit.compare_laws(
    'complex_dataset.csv',
    lawkit.LawkitOptions(output='json')
)
print(f"Best fitting law: {comparison.data.get('best_law')}")
print(f"Overall risk level: {comparison.risk_level}")
```

### Generate Sample Data

```python
import lawkit

# Generate Benford's Law compliant data
benford_data = lawkit.generate_data('benf', samples=1000, seed=42)
print(benford_data)

# Generate normal distribution data
normal_data = lawkit.generate_data('normal', samples=500, mean=100, stddev=15)

# Generate Pareto distribution data
pareto_data = lawkit.generate_data('pareto', samples=1000, concentration=0.8)

# Test the pipeline: generate → analyze
data = lawkit.generate_data('benf', samples=10000, seed=42)
result = lawkit.analyze_string(data, 'benf', lawkit.LawkitOptions(output='json'))
print(f"Generated data risk level: {result.risk_level}")
```

### Analyze String Data Directly

```python
import lawkit

# Analyze CSV data from a string
csv_data = """amount
123.45
456.78
789.12
234.56
567.89"""

result = lawkit.analyze_string(
    csv_data,
    'benf',
    lawkit.LawkitOptions(format='csv', output='json')
)
print(f"Risk assessment: {result.risk_level}")

# Analyze JSON data
json_data = '{"values": [12, 23, 34, 45, 56, 67, 78, 89]}'
result = lawkit.analyze_string(
    json_data,
    'normal',
    lawkit.LawkitOptions(format='json', output='json')
)
print(f"Consistent with a normal distribution: {result.p_value > 0.05}")
```

### Advanced Options

```python
import lawkit

# High-performance analysis with optimization
result = lawkit.analyze_benford(
    'large_dataset.csv',
    lawkit.LawkitOptions(
        optimize=True,
        parallel=True,
        memory_efficient=True,
        min_count=50,
        threshold=0.001
    )
)

# International number support
result = lawkit.analyze_benford(
    'japanese_accounting.csv',
    lawkit.LawkitOptions(
        international=True,
        format='csv',
        output='json'
    )
)

# Time series analysis
result = lawkit.analyze_normal(
    'sensor_data.csv',
    lawkit.LawkitOptions(
        time_series=True,
        outlier_detection=True,
        output='json'
    )
)
```

### Legacy API (Backward Compatibility)

```python
from lawkit import run_lawkit

# Direct command execution
result = run_lawkit(["benf", "data.csv", "--format", "csv", "--output", "json"])

if result.returncode == 0:
    print("Analysis successful")
    print(result.stdout)
else:
    print("Analysis failed")
    print(result.stderr)

# Legacy analysis functions
from lawkit.compat import run_benford_analysis, run_pareto_analysis

benford_result = run_benford_analysis("financial.csv", format="csv", output="json")
pareto_result = run_pareto_analysis("sales.csv", gini_coefficient=True)
```
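
For intuition, a thin CLI wrapper in the style of `run_lawkit` can be approximated with `subprocess.run`. This is an illustrative sketch only (the real wrapper also locates the bundled `lawkit` binary); it invokes the Python interpreter instead of `lawkit` so it runs anywhere:

```python
import subprocess
import sys

def run_tool(args):
    # Run a command and capture stdout/stderr as text, exposing the same
    # returncode/stdout/stderr fields used in the example above.
    return subprocess.run(args, capture_output=True, text=True)

# Demo with the Python interpreter standing in for the lawkit binary.
result = run_tool([sys.executable, "-c", "print('ok')"])
print(result.returncode, result.stdout.strip())
```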

## Installation and Setup

### Automatic Installation (Recommended)

```bash
pip install lawkit-python
```

The package will automatically download the appropriate binary for your platform.

### Manual Binary Installation

If automatic download fails:

```bash
lawkit-download-binary
```

### Development Installation

```bash
git clone https://github.com/kako-jun/lawkit
cd lawkit/lawkit-python
pip install -e ".[dev]"
```

### Verify Installation

```python
import lawkit

# Check if lawkit is available
if lawkit.is_lawkit_available():
    print("✅ lawkit is installed and working")
    print(f"Version: {lawkit.get_version()}")
else:
    print("❌ lawkit is not available")

# Run self-test
if lawkit.selftest():
    print("✅ All tests passed")
else:
    print("❌ Self-test failed")
```

## Use Cases

### Financial Fraud Detection

```python
import lawkit

# Analyze invoice amounts for signs of fraud
result = lawkit.analyze_benford('invoices.csv',
                                lawkit.LawkitOptions(output='json'))

if result.risk_level in ['High', 'Critical']:
    print("🚨 Potential fraud detected in invoice data")
    print(f"Statistical significance: p={result.p_value:.6f}")
    print(f"Deviation from Benford's Law: {result.mad:.2f}%")
```
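
To make the MAD figure concrete: it is the mean absolute deviation, in percentage points, between the observed first-digit frequencies and Benford's expected ones. A stdlib-only sketch of that idea (the exact formula lawkit uses may differ):

```python
import math
from collections import Counter

def first_digit(x):
    # Leading significant digit of a nonzero number.
    s = str(abs(x)).lstrip("0.")
    return int(s[0])

def benford_mad(values):
    # Mean absolute deviation (percentage points) between observed and
    # expected first-digit frequencies across digits 1..9.
    expected = {d: 100 * math.log10(1 + 1 / d) for d in range(1, 10)}
    counts = Counter(first_digit(v) for v in values)
    observed = {d: 100 * counts.get(d, 0) / len(values) for d in range(1, 10)}
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9

# Every value starting with digit 1 is maximally non-Benford for digit 1:
print(round(benford_mad([1, 10, 100, 150]), 2))
```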

### Business Intelligence

```python
import lawkit

# Analyze customer revenue distribution
result = lawkit.analyze_pareto('customer_revenue.csv',
                               lawkit.LawkitOptions(
                                   output='json',
                                   business_analysis=True,
                                   gini_coefficient=True
                               ))

print(f"Revenue concentration: {result.concentration_80_20:.1f}%")
print(f"Market inequality: {result.gini_coefficient:.3f}")
```
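
Both numbers reported here are easy to reproduce by hand. A stdlib-only sketch of a Gini coefficient and a top-20% share (illustrative; lawkit's own estimators may differ in detail):

```python
def gini(values):
    # Gini coefficient via the sorted-index formula:
    # G = (2 * sum(i * x_i)) / (n * sum(x)) - (n + 1) / n, x ascending.
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n

def top_share(values, fraction=0.2):
    # Percentage of the total held by the top `fraction` of items.
    xs = sorted(values, reverse=True)
    k = max(1, int(len(xs) * fraction))
    return 100 * sum(xs[:k]) / sum(xs)

revenue = [80, 5, 5, 5, 5]
print(top_share(revenue))        # share held by the single largest customer
print(round(gini(revenue), 3))   # 0 = perfect equality, 1 = maximal inequality
```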

### Quality Control

```python
import lawkit

# Analyze manufacturing measurements
result = lawkit.analyze_normal('measurements.csv',
                               lawkit.LawkitOptions(
                                   output='json',
                                   outlier_detection=True,
                                   test_type='shapiro'
                               ))

if result.p_value < 0.05:
    print("⚠️ Process out of control - not following a normal distribution")
if result.outliers:
    print(f"Found {len(result.outliers)} outlying measurements")
```
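
Outlier detection of this kind can be approximated with a simple z-score rule; a stdlib-only sketch (lawkit's own algorithms are more sophisticated):

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    # Flag values more than `threshold` standard deviations from the mean.
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [v for v in values if abs(v - mean) / sd > threshold]

# Twenty identical measurements plus one gross error:
measurements = [10.0] * 20 + [100.0]
print(zscore_outliers(measurements))
```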

### Text Analysis

```python
import lawkit

# Analyze word frequency in documents
result = lawkit.analyze_zipf('document.txt',
                             lawkit.LawkitOptions(output='json'))

print(f"Consistent with Zipf's Law: {result.p_value > 0.05}")
print(f"Power-law exponent: {result.exponent:.3f}")
```
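
The power-law exponent reported above can be estimated as the negative slope of a least-squares fit of log-frequency against log-rank; a stdlib-only sketch of that idea (not lawkit's actual estimator):

```python
import math

def zipf_exponent(frequencies):
    # Fit f(r) ≈ C / r**s: regress log f on log r and return -slope.
    freqs = sorted(frequencies, reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope

# An exact 1/r frequency table recovers s = 1.
print(round(zipf_exponent([1000 / r for r in range(1, 51)]), 3))
```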

## API Reference

### Main Functions

- `analyze_benford(input_data, options)` - Benford's Law analysis
- `analyze_pareto(input_data, options)` - Pareto principle analysis
- `analyze_zipf(input_data, options)` - Zipf's Law analysis
- `analyze_normal(input_data, options)` - Normal distribution analysis
- `analyze_poisson(input_data, options)` - Poisson distribution analysis
- `compare_laws(input_data, options)` - Multi-law comparison
- `generate_data(law_type, samples, **kwargs)` - Generate sample data
- `analyze_string(content, law_type, options)` - Analyze string data directly

### Utility Functions

- `is_lawkit_available()` - Check if the lawkit CLI is available
- `get_version()` - Get the lawkit version
- `selftest()` - Run a self-test

### Classes

- `LawkitOptions` - Configuration options for analysis
- `LawkitResult` - Analysis results with structured access
- `LawkitError` - Exception class for lawkit errors

## Platform Support

- **Windows**: x86_64
- **macOS**: x86_64, ARM64 (Apple Silicon)
- **Linux**: x86_64, ARM64

## Requirements

- Python 3.8+
- No additional dependencies required

## License

This project is licensed under the MIT License.

## Support

- GitHub Issues: https://github.com/kako-jun/lawkit/issues
- Documentation: https://github.com/kako-jun/lawkit/tree/main/docs
- Examples: https://github.com/kako-jun/lawkit/tree/main/docs/user-guide/examples.md

## Contributing

Contributions are welcome! Please read the [Contributing Guide](https://github.com/kako-jun/lawkit/blob/main/CONTRIBUTING.md) for details.