muaddib-scanner 2.2.6 → 2.2.7

@@ -1,49 +1,164 @@
+ # MUAD'DIB Benign Dataset — 100+ popular PyPI packages
+ # Used for FPR measurement on Python ecosystem
+
+ # === Web frameworks (15) ===
  requests
- numpy
- pandas
  flask
  django
  fastapi
- sqlalchemy
- pytest
- black
- pylint
- mypy
- pydantic
- httpx
+ starlette
+ uvicorn
+ gunicorn
+ tornado
+ bottle
+ falcon
+ sanic
+ quart
  aiohttp
- celery
- redis
- boto3
- pillow
- opencv-python
- tensorflow
- torch
+ httpx
+ werkzeug
+
+ # === Data science (15) ===
+ numpy
+ pandas
+ scipy
  scikit-learn
  matplotlib
  seaborn
  plotly
- beautifulsoup4
- scrapy
- selenium
- playwright
- lxml
- pyyaml
- toml
- python-dotenv
+ bokeh
+ altair
+ statsmodels
+ xgboost
+ lightgbm
+ catboost
+ polars
+ vaex
+
+ # === ML/AI (10) ===
+ tensorflow
+ torch
+ keras
+ transformers
+ datasets
+ tokenizers
+ onnx
+ onnxruntime
+ jax
+ flax
+
+ # === Database (10) ===
+ sqlalchemy
+ psycopg2
+ pymongo
+ redis
+ motor
+ peewee
+ databases
+ asyncpg
+ aiomysql
+ cassandra-driver
+
+ # === Testing (10) ===
+ pytest
+ pytest-cov
+ pytest-asyncio
+ pytest-mock
+ coverage
+ tox
+ nox
+ hypothesis
+ faker
+ factory-boy
+
+ # === CLI/UX (10) ===
  click
  typer
  rich
  tqdm
+ prompt-toolkit
+ colorama
+ termcolor
+ tabulate
+ alive-progress
+ questionary
+
+ # === Linting/formatting (8) ===
+ black
+ pylint
+ mypy
+ ruff
+ isort
+ flake8
+ autopep8
+ bandit
+
+ # === Config/env (8) ===
+ python-dotenv
+ pydantic
+ pydantic-settings
+ dynaconf
+ toml
+ tomli
+ configparser
+ python-decouple
+
+ # === Logging/monitoring (5) ===
  loguru
  structlog
  sentry-sdk
- gunicorn
- uvicorn
- starlette
+ watchtower
+ python-json-logger
+
+ # === Crypto/security (8) ===
+ cryptography
+ pyjwt
+ passlib
+ bcrypt
+ paramiko
+ certifi
+ truststore
+ pyopenssl
+
+ # === File/IO (8) ===
+ pillow
+ opencv-python
+ pyyaml
+ lxml
+ beautifulsoup4
+ python-magic
+ watchdog
+ send2trash
+
+ # === Async/concurrency (5) ===
+ celery
+ dramatiq
+ asyncio
+ trio
+ anyio
+
+ # === DevOps/CI (5) ===
+ boto3
+ docker
+ fabric
+ invoke
+ ansible
+
+ # === Networking (5) ===
+ scrapy
+ selenium
+ playwright
+ websockets
+ grpcio
+
+ # === Misc popular (10) ===
+ arrow
+ pendulum
+ python-dateutil
  jinja2
  mako
- alembic
- psycopg2
- pymongo
- motor
+ marshmallow
+ attrs
+ dataclasses-json
+ orjson
+ ujson
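
Reviewer sketch (not part of the package): the new list groups packages under `# === Name (N) ===` headers and is described as the input for FPR measurement. The snippet below is a minimal illustration of how such a file could be parsed, how the declared section counts could be checked, and how a false-positive rate could be computed from it. The function names, the header regex, and the `flagged` input are assumptions made for illustration; the diff does not show how `muaddib` itself reads this file.

```python
import re
from collections import OrderedDict

# Assumed header shape, copied from the list above: "# === Web frameworks (15) ==="
SECTION_RE = re.compile(r"^#\s*===\s*(?P<name>.+?)\s*\((?P<count>\d+)\)\s*===\s*$")


def parse_benign_list(text):
    """Return {section: (declared_count, [package names])} in file order.

    Names that appear before any section header are collected under
    "(unsectioned)" with a declared count of None.
    """
    sections = OrderedDict()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SECTION_RE.match(line)
        if match:
            current = match.group("name")
            sections[current] = (int(match.group("count")), [])
        elif line.startswith("#"):
            continue  # ordinary comment line
        else:
            if current is None:
                current = "(unsectioned)"
                sections[current] = (None, [])
            sections[current][1].append(line)
    return sections


def count_mismatches(sections):
    """Return {section: (declared, actual)} where the header count is wrong."""
    return {
        name: (declared, len(packages))
        for name, (declared, packages) in sections.items()
        if declared is not None and declared != len(packages)
    }


def false_positive_rate(benign, flagged):
    """Fraction of benign packages that a scanner flagged."""
    benign = set(benign)
    if not benign:
        raise ValueError("benign set is empty")
    return len(benign & set(flagged)) / len(benign)
```

Run against the 2.2.7 list, `count_mismatches` would be expected to return an empty dict if every header count matches its section; `false_positive_rate` only needs the flattened package names plus whatever set of names a scan reported.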
@@ -0,0 +1,54 @@
+ # Ground Truth — Known Malicious Packages
+
+ This directory documents real-world supply-chain malware for reference and validation purposes.
+
+ ## Files
+
+ - **`known-malware.json`** — Database of 65 documented malicious packages (npm + PyPI) with metadata:
+   - `name`: package name or campaign name
+   - `ecosystem`: npm, pypi, or npm/pypi
+   - `version`: malicious version(s)
+   - `date`: discovery date (YYYY-MM)
+   - `source`: who discovered it
+   - `technique`: attack technique used
+   - `url`: link to advisory/report
+   - `severity`: critical, high, or medium
+
+ ## Why document but not store?
+
+ Malicious packages are removed from registries shortly after discovery. We cannot redistribute them. Instead, we document:
+ 1. The techniques used (for rule development)
+ 2. The detection timeline (for lead time measurement)
+ 3. The sources (for IOC enrichment)
+
+ ## Separate from scanner ground truth
+
+ The scanner's ground truth dataset (`tests/ground-truth/`) contains recreated fixtures of 5 real attacks (event-stream, ua-parser-js, coa, node-ipc, colors) with expected findings. That dataset is used by `muaddib evaluate` for TPR measurement.
+
+ This directory (`datasets/ground-truth/`) is a broader reference database documenting the full landscape of known supply-chain malware for research and rule development.
+
+ ## Sources
+
+ | Source | Coverage |
+ |--------|----------|
+ | Microsoft Security | Shai-Hulud 2.0 analysis |
+ | Datadog Security Labs | Shai-Hulud, MUT-8694, targeted malware |
+ | Socket.dev | Contagious Interview, typosquatting, Flashbots, WhatsApp |
+ | Snyk | Nx/s1ngularity, chalk/debug, ngx-bootstrap, ESLint |
+ | Sonatype | coa/rc, Bladeroid, PyPI crypto-stealers |
+ | ReversingLabs | Roblox, crypto wallets, ML steganography, Solana |
+ | Phylum | Lazarus APT, Django-log-tracker, npm campaigns |
+ | JFrog | Discord token stealers |
+ | CISA | ua-parser-js advisory |
+ | PyPI Security | aiocpa analysis, 500+ package campaign |
+ | Fortinet | PyPI malware statistics, open-source registry trends |
+ | Zscaler ThreatLabz | NodeCordRAT, SilentSync RAT |
+ | Kaspersky | LofyLife campaign |
+ | Orca Security | protestware analysis |
+
+ ## Statistics
+
+ - **65 entries** (45 npm, 18 PyPI, 2 cross-ecosystem)
+ - **Date range**: 2018-2026
+ - **Severity**: 47 critical, 16 high, 2 medium
+ - **Campaigns**: Shai-Hulud (796+), Contagious Interview (338+), 287 typosquats, 500+ PyPI uploads
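
Reviewer sketch (not part of the package): the README lists eight metadata fields per entry and gives per-ecosystem and per-severity totals. The snippet below shows one way those claims could be checked mechanically. It operates on a plain list of dicts, because the top-level layout of `known-malware.json` is not visible in this diff; the function names and the error-message wording are hypothetical.

```python
import re
from collections import Counter

# Field names and allowed values are taken from the README text above.
REQUIRED_FIELDS = ("name", "ecosystem", "version", "date",
                   "source", "technique", "url", "severity")
ECOSYSTEMS = {"npm", "pypi", "npm/pypi"}
SEVERITIES = {"critical", "high", "medium"}
DATE_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])$")  # YYYY-MM


def validate_entry(entry):
    """Return a list of human-readable problems for one entry (empty if none)."""
    problems = [f"missing field: {field}"
                for field in REQUIRED_FIELDS if field not in entry]
    if "ecosystem" in entry and entry["ecosystem"] not in ECOSYSTEMS:
        problems.append(f"unknown ecosystem: {entry['ecosystem']!r}")
    if "severity" in entry and entry["severity"] not in SEVERITIES:
        problems.append(f"unknown severity: {entry['severity']!r}")
    if "date" in entry and not DATE_RE.match(str(entry["date"])):
        problems.append(f"bad date: {entry['date']!r}")
    return problems


def summarize(entries):
    """Tally entries the same way the Statistics section reports them."""
    return {
        "total": len(entries),
        "by_ecosystem": Counter(e.get("ecosystem") for e in entries),
        "by_severity": Counter(e.get("severity") for e in entries),
    }
```

Once the entries are loaded with the standard `json` module, the `summarize` output can be compared with the stated figures: 65 entries, split 45/18/2 by ecosystem and 47/16/2 by severity.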