muaddib-scanner 2.2.6 → 2.2.7
- package/bin/muaddib.js +10 -1
- package/datasets/benign/packages-npm.txt +576 -77
- package/datasets/benign/packages-pypi.txt +146 -31
- package/datasets/ground-truth/README.md +54 -0
- package/datasets/ground-truth/known-malware.json +622 -0
- package/package.json +1 -1
- package/src/commands/evaluate.js +191 -31
- package/tmp-summary.js +24 -0
- package/tmp-test-pack.js +66 -0
package/datasets/benign/packages-pypi.txt
@@ -1,49 +1,164 @@
+# MUAD'DIB Benign Dataset — 100+ popular PyPI packages
+# Used for FPR measurement on Python ecosystem
+
+# === Web frameworks (15) ===
 requests
-numpy
-pandas
 flask
 django
 fastapi
-
-
-
-
-
-
-
+starlette
+uvicorn
+gunicorn
+tornado
+bottle
+falcon
+sanic
+quart
 aiohttp
-
-
-
-
-
-
-
+httpx
+werkzeug
+
+# === Data science (15) ===
+numpy
+pandas
+scipy
 scikit-learn
 matplotlib
 seaborn
 plotly
-
-
-
-
-
-
-
-
+bokeh
+altair
+statsmodels
+xgboost
+lightgbm
+catboost
+polars
+vaex
+
+# === ML/AI (10) ===
+tensorflow
+torch
+keras
+transformers
+datasets
+tokenizers
+onnx
+onnxruntime
+jax
+flax
+
+# === Database (10) ===
+sqlalchemy
+psycopg2
+pymongo
+redis
+motor
+peewee
+databases
+asyncpg
+aiomysql
+cassandra-driver
+
+# === Testing (10) ===
+pytest
+pytest-cov
+pytest-asyncio
+pytest-mock
+coverage
+tox
+nox
+hypothesis
+faker
+factory-boy
+
+# === CLI/UX (10) ===
 click
 typer
 rich
 tqdm
+prompt-toolkit
+colorama
+termcolor
+tabulate
+alive-progress
+questionary
+
+# === Linting/formatting (8) ===
+black
+pylint
+mypy
+ruff
+isort
+flake8
+autopep8
+bandit
+
+# === Config/env (8) ===
+python-dotenv
+pydantic
+pydantic-settings
+dynaconf
+toml
+tomli
+configparser
+python-decouple
+
+# === Logging/monitoring (5) ===
 loguru
 structlog
 sentry-sdk
-
-
-
+watchtower
+python-json-logger
+
+# === Crypto/security (8) ===
+cryptography
+pyjwt
+passlib
+bcrypt
+paramiko
+certifi
+truststore
+pyopenssl
+
+# === File/IO (8) ===
+pillow
+opencv-python
+pyyaml
+lxml
+beautifulsoup4
+python-magic
+watchdog
+send2trash
+
+# === Async/concurrency (5) ===
+celery
+dramatiq
+asyncio
+trio
+anyio
+
+# === DevOps/CI (5) ===
+boto3
+docker
+fabric
+invoke
+ansible
+
+# === Networking (5) ===
+scrapy
+selenium
+playwright
+websockets
+grpcio
+
+# === Misc popular (10) ===
+arrow
+pendulum
+python-dateutil
 jinja2
 mako
-
-
-
-
+marshmallow
+attrs
+dataclasses-json
+orjson
+ujson
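The benign list above groups packages under `# === Name (N) ===` headers and is described as the input for FPR measurement. The following is a minimal sketch, not the scanner's own code, of how such a file could be parsed, how the declared section counts could be checked, and how a false positive rate could be derived from it. The helper names `parseBenignList`, `checkSectionCounts`, and `falsePositiveRate` are hypothetical, and the set of flagged packages is invented purely for illustration. How `muaddib evaluate` actually decides that a package is flagged is not shown in this diff.

```javascript
// Sketch under stated assumptions: all helper names are hypothetical and are
// not taken from muaddib-scanner's source.

// Parse a benign list: blank lines and plain "#" comments are skipped,
// "# === Name (N) ===" opens a section, every other line is a package name.
function parseBenignList(text) {
  const sections = [];
  let current = null;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === '') continue;
    const header = line.match(/^#\s*===\s*(.+?)\s*\((\d+)\)\s*===\s*$/);
    if (header) {
      current = { name: header[1], declared: Number(header[2]), packages: [] };
      sections.push(current);
      continue;
    }
    if (line.startsWith('#')) continue; // ordinary comment line
    if (current === null) {
      // Package listed before any section header.
      current = { name: '(no section)', declared: null, packages: [] };
      sections.push(current);
    }
    current.packages.push(line);
  }
  return sections;
}

// Report sections whose declared count differs from the entries found.
function checkSectionCounts(sections) {
  return sections
    .filter((s) => s.declared !== null && s.declared !== s.packages.length)
    .map((s) => `${s.name}: declared ${s.declared}, found ${s.packages.length}`);
}

// FPR on a benign set: flagged benign packages divided by all benign packages.
function falsePositiveRate(packages, flagged) {
  if (packages.length === 0) return 0;
  const hits = packages.filter((name) => flagged.has(name)).length;
  return hits / packages.length;
}

// Two sections copied from the list above.
const sample = [
  '# === Logging/monitoring (5) ===',
  'loguru',
  'structlog',
  'sentry-sdk',
  'watchtower',
  'python-json-logger',
  '',
  '# === Async/concurrency (5) ===',
  'celery',
  'dramatiq',
  'asyncio',
  'trio',
  'anyio',
].join('\n');

const sections = parseBenignList(sample);
const allPackages = sections.flatMap((s) => s.packages);

// Hypothetical scan outcome, invented for this example only.
const flaggedByScanner = new Set(['asyncio']);

console.log(checkSectionCounts(sections)); // empty array when counts agree
console.log(falsePositiveRate(allPackages, flaggedByScanner));
```

Keeping the declared count in each header makes the list self-checking: a package added without updating the header shows up as a mismatch before any scan is run.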
package/datasets/ground-truth/README.md
@@ -0,0 +1,54 @@
+# Ground Truth — Known Malicious Packages
+
+This directory documents real-world supply-chain malware for reference and validation purposes.
+
+## Files
+
+- **`known-malware.json`** — Database of 65 documented malicious packages (npm + PyPI) with metadata:
+  - `name`: package name or campaign name
+  - `ecosystem`: npm, pypi, or npm/pypi
+  - `version`: malicious version(s)
+  - `date`: discovery date (YYYY-MM)
+  - `source`: who discovered it
+  - `technique`: attack technique used
+  - `url`: link to advisory/report
+  - `severity`: critical, high, or medium
+
+## Why document but not store?
+
+Malicious packages are removed from registries shortly after discovery. We cannot redistribute them. Instead, we document:
+1. The techniques used (for rule development)
+2. The detection timeline (for lead time measurement)
+3. The sources (for IOC enrichment)
+
+## Separate from scanner ground truth
+
+The scanner's ground truth dataset (`tests/ground-truth/`) contains recreated fixtures of 5 real attacks (event-stream, ua-parser-js, coa, node-ipc, colors) with expected findings. That dataset is used by `muaddib evaluate` for TPR measurement.
+
+This directory (`datasets/ground-truth/`) is a broader reference database documenting the full landscape of known supply-chain malware for research and rule development.
+
+## Sources
+
+| Source | Coverage |
+|--------|----------|
+| Microsoft Security | Shai-Hulud 2.0 analysis |
+| Datadog Security Labs | Shai-Hulud, MUT-8694, targeted malware |
+| Socket.dev | Contagious Interview, typosquatting, Flashbots, WhatsApp |
+| Snyk | Nx/s1ngularity, chalk/debug, ngx-bootstrap, ESLint |
+| Sonatype | coa/rc, Bladeroid, PyPI crypto-stealers |
+| ReversingLabs | Roblox, crypto wallets, ML steganography, Solana |
+| Phylum | Lazarus APT, Django-log-tracker, npm campaigns |
+| JFrog | Discord token stealers |
+| CISA | ua-parser-js advisory |
+| PyPI Security | aiocpa analysis, 500+ package campaign |
+| Fortinet | PyPI malware statistics, open-source registry trends |
+| Zscaler ThreatLabz | NodeCordRAT, SilentSync RAT |
+| Kaspersky | LofyLife campaign |
+| Orca Security | protestware analysis |
+
+## Statistics
+
+- **65 entries** (45 npm, 18 PyPI, 2 cross-ecosystem)
+- **Date range**: 2018-2026
+- **Severity**: 47 critical, 16 high, 2 medium
+- **Campaigns**: Shai-Hulud (796+), Contagious Interview (338+), 287 typosquats, 500+ PyPI uploads
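The README above lists eight metadata fields per `known-malware.json` entry and reports totals by ecosystem and severity. As a rough sketch, assuming the file is a flat JSON array of entry objects (the diff does not show its actual top-level shape), entries could be checked against the documented fields and tallied as shown below. `validateEntry` and `tally` are hypothetical helpers, and the three sample entries are invented placeholders, not records from the real database.

```javascript
// Sketch under stated assumptions: field names and allowed values come from
// the README; the array layout, helper names, and sample data are illustrative.

const REQUIRED_FIELDS = [
  'name', 'ecosystem', 'version', 'date', 'source', 'technique', 'url', 'severity',
];
const ECOSYSTEMS = new Set(['npm', 'pypi', 'npm/pypi']);
const SEVERITIES = new Set(['critical', 'high', 'medium']);

const isMissing = (value) => value === undefined || value === null || value === '';

// Return a list of human-readable problems; an empty list means the entry
// matches the documented schema.
function validateEntry(entry) {
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    if (isMissing(entry[field])) problems.push(`missing field: ${field}`);
  }
  if (!isMissing(entry.ecosystem) && !ECOSYSTEMS.has(entry.ecosystem)) {
    problems.push(`unexpected ecosystem: ${entry.ecosystem}`);
  }
  if (!isMissing(entry.date) && !/^\d{4}-(0[1-9]|1[0-2])$/.test(String(entry.date))) {
    problems.push(`date is not YYYY-MM: ${entry.date}`);
  }
  if (!isMissing(entry.severity) && !SEVERITIES.has(entry.severity)) {
    problems.push(`unexpected severity: ${entry.severity}`);
  }
  return problems;
}

// Count entries per distinct value of one field (e.g. ecosystem, severity).
function tally(entries, field) {
  const counts = {};
  for (const entry of entries) {
    counts[entry[field]] = (counts[entry[field]] || 0) + 1;
  }
  return counts;
}

// Invented placeholder entries; the third is deliberately malformed.
const sampleEntries = [
  { name: 'example-package-a', ecosystem: 'npm', version: '1.0.1', date: '2024-03',
    source: 'Example Lab', technique: 'postinstall script',
    url: 'https://example.com/a', severity: 'critical' },
  { name: 'example-package-b', ecosystem: 'pypi', version: '0.2.0', date: '2023-11',
    source: 'Example Lab', technique: 'typosquatting',
    url: 'https://example.com/b', severity: 'high' },
  { name: 'example-package-c', ecosystem: 'cargo', version: '2.0.0', date: '2024',
    source: 'Example Lab', technique: 'typosquatting',
    url: 'https://example.com/c' },
];

const validEntries = sampleEntries.filter((e) => validateEntry(e).length === 0);

for (const entry of sampleEntries) {
  console.log(entry.name, validateEntry(entry));
}
console.log(tally(validEntries, 'ecosystem'));
console.log(tally(validEntries, 'severity'));
```

Run against the real file, tallies like these could be compared with the figures in the Statistics section (45 npm, 18 PyPI, 2 cross-ecosystem; 47 critical, 16 high, 2 medium) to catch drift between the README and the data.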