typosquatting 0.2.0 → 0.4.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +9 -0
- data/Dockerfile +5 -0
- data/README.md +67 -0
- data/lib/typosquatting/cli.rb +74 -0
- data/lib/typosquatting/ecosystems/base.rb +4 -0
- data/lib/typosquatting/generator.rb +15 -13
- data/lib/typosquatting/lookup.rb +114 -0
- data/lib/typosquatting/version.rb +1 -1
- data/research/README.md +74 -0
- data/research/critical_packages.rb +282 -0
- metadata +4 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: fc18d0104e41766e5b0b8603d786cde1ff74f7b65360509a8d880732b1d5f6ca
+  data.tar.gz: fb48d984d14196d0ccdb2765ee14d986f614d9d1285c329fe6b57d8fe1cb35a6
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0b9103d71382bbdfb8af88843a4c73d0abf99a65da6b494027c21d3327eb31f8261545c95514dd0bd4b35b7f681c54d70e4a1abb819d9ac020fcceebd96784d5
+  data.tar.gz: 122e21bf9b46b6379ea2d925998797161020500e56d440f916a43f995faf4865bad5d60cd5077e2e2d33ebc6cd4aa9046af9e419ac02a39729a156fb016376cf
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,14 @@
 ## [Unreleased]

+## [0.4.0] - 2026-01-02
+
+- Skip intra-namespace typosquats for scoped packages (npm, composer, golang) since namespace owners control all packages under their namespace
+- Add Dockerfile for running without Ruby installed
+
+## [0.3.0] - 2025-12-17
+
+- Add `discover` command to find existing similar packages by edit distance using prefix/postfix API
+
 ## [0.2.0] - 2025-12-17

 - Add GitHub Actions ecosystem for CI/CD workflow typosquatting detection
data/Dockerfile
ADDED
data/README.md
CHANGED
@@ -31,6 +31,19 @@ Or add to your Gemfile:
 gem "typosquatting"
 ```

+Or run with Docker:
+
+```bash
+docker build -t typosquatting .
+docker run --rm typosquatting generate requests -e pypi
+```
+
+For commands that read local files (like `sbom`), mount your directory:
+
+```bash
+docker run --rm -v $PWD:/src typosquatting sbom /src/bom.json
+```
+
 ## CLI Usage

 ```bash
@@ -69,6 +82,12 @@ typosquatting check requests -e pypi -f json

 # List available algorithms
 typosquatting algorithms
+
+# Discover existing packages similar to a target (by edit distance)
+typosquatting discover requests -e pypi
+
+# Discover with generated variants check
+typosquatting discover requests -e pypi --with-variants
 ```

 ## Example Output
@@ -204,10 +223,58 @@ Package lookups use the [ecosyste.ms](https://packages.ecosyste.ms) API. Request

 Be mindful when checking many packages. The `--dry-run` flag shows what would be checked without making API calls.

+### packages.ecosyste.ms API
+
+The package_names endpoint can help identify potential typosquats by searching for packages with similar prefixes or postfixes to popular package names.
+
+```
+GET /api/v1/registries/{registry}/package_names
+```
+
+**Parameters:**
+- `prefix` - filter by package names starting with string (case insensitive)
+- `postfix` - filter by package names ending with string (case insensitive)
+- `page`, `per_page` - pagination
+- `sort`, `order` - sorting
+
+**Examples:**
+```
+# Find RubyGems packages ending in "ails" (potential "rails" typosquats)
+https://packages.ecosyste.ms/api/v1/registries/rubygems.org/package_names?postfix=ails
+
+# Find RubyGems packages starting with "rai" (potential "rails" typosquats)
+https://packages.ecosyste.ms/api/v1/registries/rubygems.org/package_names?prefix=rai
+
+# Find npm packages starting with "reac" (potential "react" typosquats)
+https://packages.ecosyste.ms/api/v1/registries/npmjs.org/package_names?prefix=reac
+```
+
+Full API documentation: [packages.ecosyste.ms/docs](https://packages.ecosyste.ms/docs)
+
 ## Dataset

 The [ecosyste-ms/typosquatting-dataset](https://github.com/ecosyste-ms/typosquatting-dataset) contains 143 confirmed typosquatting attacks from security research, mapping malicious packages to their targets with classification and source attribution. Useful for testing detection tools and understanding real attack patterns.

+## Research
+
+The `research/` directory contains a script to scan "critical" packages (high OpenSSF criticality score) for potential typosquats:
+
+```bash
+# Scan critical RubyGems packages
+ruby research/critical_packages.rb rubygems.org
+
+# Scan npm
+ruby research/critical_packages.rb npmjs.org
+
+# Include all algorithms (default is high-confidence only)
+ruby research/critical_packages.rb rubygems.org --all
+
+# Limit to first N packages for testing
+ruby research/critical_packages.rb rubygems.org --limit=100
+```
+
+The script generates variants using all library algorithms, checks which exist on the registry, and outputs a CSV with download counts, creation dates, repository URLs, and package status. It filters out packages that predate the target (can't be typosquats), packages with high download ratios (likely legitimate), and flags packages that have been removed (confirmed typosquats).
+
 ## Development

 ```bash
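The `package_names` endpoint documented in the README hunk above takes `prefix`, `postfix`, `page` and `per_page` as query parameters. The following is a minimal sketch of building such a URL, not the gem's own code: `API_BASE` and `package_names_url` are hypothetical names introduced here for illustration, and the base URL is taken from the README examples.

```ruby
require "uri"

# Hypothetical constant: base URL as it appears in the README examples.
API_BASE = "https://packages.ecosyste.ms/api/v1"

# Hypothetical helper (not part of the gem): builds a package_names URL
# from the query parameters listed in the README.
def package_names_url(registry, prefix: nil, postfix: nil, page: nil, per_page: nil)
  # Drop parameters that were not supplied.
  params = { prefix: prefix, postfix: postfix, page: page, per_page: per_page }.compact
  url = "#{API_BASE}/registries/#{URI.encode_www_form_component(registry)}/package_names"
  params.empty? ? url : "#{url}?#{URI.encode_www_form(params)}"
end

puts package_names_url("rubygems.org", postfix: "ails")
puts package_names_url("npmjs.org", prefix: "reac", per_page: 100)
```

The first call reproduces the "ails" example URL from the README. No request is sent, so the sketch can be run without touching the API.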
data/lib/typosquatting/cli.rb
CHANGED
@@ -16,6 +16,8 @@ module Typosquatting
         generate(args)
       when "check"
         check(args)
+      when "discover"
+        discover(args)
       when "confusion"
         confusion(args)
       when "sbom"
@@ -101,6 +103,39 @@ module Typosquatting
       output_check_results(results, options)
     end

+    def discover(args)
+      options = { format: "text", max_distance: 2 }
+      parser = OptionParser.new do |opts|
+        opts.banner = "Usage: typosquatting discover PACKAGE -e ECOSYSTEM [options]"
+        opts.on("-e", "--ecosystem ECOSYSTEM", "Package ecosystem (required)") { |v| options[:ecosystem] = v }
+        opts.on("-f", "--format FORMAT", "Output format (text, json)") { |v| options[:format] = v }
+        opts.on("-d", "--distance N", Integer, "Maximum edit distance (default: 2)") { |v| options[:max_distance] = v }
+        opts.on("--with-variants", "Also show which generated variants exist") { options[:with_variants] = true }
+      end
+      parser.parse!(args)
+
+      package = args.shift
+      unless package && options[:ecosystem]
+        $stderr.puts "Error: Package name and ecosystem required"
+        $stderr.puts parser
+        exit 1
+      end
+
+      lookup = Lookup.new(ecosystem: options[:ecosystem])
+
+      $stderr.puts "Discovering similar packages to #{package}..." if $stderr.tty?
+      results = lookup.discover(package, max_distance: options[:max_distance])
+
+      if options[:with_variants]
+        generator = Generator.new(ecosystem: options[:ecosystem])
+        variants = generator.generate(package)
+        variant_results = lookup.check_with_variants(package, variants)
+        existing_variants = variant_results.select(&:exists?)
+      end
+
+      output_discover_results(results, existing_variants, options)
+    end
+
     def confusion(args)
       options = { format: "text" }
       parser = OptionParser.new do |opts|
@@ -212,6 +247,7 @@ module Typosquatting
       puts "Commands:"
       puts " generate PACKAGE -e ECOSYSTEM Generate typosquat variants"
       puts " check PACKAGE -e ECOSYSTEM Check which variants exist"
+      puts " discover PACKAGE -e ECOSYSTEM Find similar packages by edit distance"
       puts " confusion PACKAGE -e ECOSYSTEM Check for dependency confusion"
       puts " sbom FILE Check SBOM for potential typosquats"
       puts " ecosystems List supported ecosystems"
@@ -222,6 +258,7 @@ module Typosquatting
       puts "Examples:"
       puts " typosquatting generate requests -e pypi"
       puts " typosquatting check requests -e pypi --existing-only"
+      puts " typosquatting discover rails -e gem --with-variants"
       puts " typosquatting confusion my-package -e maven"
       puts " typosquatting sbom bom.json"
     end
@@ -379,5 +416,42 @@ module Typosquatting
         puts "Found #{results.length} suspicious package(s)"
       end
     end
+
+    def output_discover_results(discovered, existing_variants, options)
+      case options[:format]
+      when "json"
+        data = {
+          discovered: discovered.map(&:to_h),
+          existing_variants: existing_variants&.map(&:to_h)
+        }.compact
+        puts JSON.pretty_generate(data)
+      else
+        if discovered.empty? && (existing_variants.nil? || existing_variants.empty?)
+          puts "No similar packages found"
+          return
+        end
+
+        if discovered.any?
+          puts "Similar packages found (by edit distance):"
+          puts ""
+          discovered.each do |result|
+            puts " #{result.name} (distance: #{result.distance})"
+          end
+          puts ""
+        end
+
+        if existing_variants&.any?
+          puts "Generated variants that exist:"
+          puts ""
+          existing_variants.each do |result|
+            puts " #{result.name}"
+          end
+          puts ""
+        end
+
+        puts "Found #{discovered.length} similar package(s)"
+        puts "Found #{existing_variants.length} existing variant(s)" if existing_variants&.any?
+      end
+    end
   end
 end
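In the JSON branch of `output_discover_results`, `existing_variants` is `nil` unless `--with-variants` was passed, and `Hash#compact` then drops that key from the output. A small sketch of that behaviour follows. `discover_payload` is a hypothetical helper name, the struct is a stand-in with the same fields as `DiscoveryResult` in `lookup.rb`, and the package names are made-up sample data.

```ruby
require "json"

# Stand-in with the same fields as DiscoveryResult in the lookup.rb hunk.
DiscoveryResult = Struct.new(:name, :target, :distance, keyword_init: true)

# Hypothetical helper: builds the same hash shape as the "json" branch.
def discover_payload(discovered, existing_variants)
  {
    discovered: discovered.map(&:to_h),
    # Safe navigation leaves nil in place; compact then removes the key.
    existing_variants: existing_variants&.map(&:to_h)
  }.compact
end

# Made-up sample data, for illustration only.
found = [DiscoveryResult.new(name: "request", target: "requests", distance: 1)]

puts JSON.pretty_generate(discover_payload(found, nil)) # only "discovered"
puts JSON.pretty_generate(discover_payload(found, []))  # also "existing_variants": []
```

An empty array survives `compact`, so consumers of the JSON can tell "variants not requested" (key absent) apart from "variants requested, none exist" (empty list).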
data/lib/typosquatting/generator.rb
CHANGED
@@ -85,19 +85,21 @@ module Typosquatting
         end
       end

-
-
-
-
-
-
-
-
-
-
-
-
-
+      unless ecosystem.namespace_controls_members?
+        name_algorithms.each do |algorithm|
+          name_variants = algorithm.generate(name)
+          name_variants.each do |name_variant|
+            full_name = rebuild_namespaced_name(namespace, name_variant)
+            next if full_name == package_name
+            next unless ecosystem.valid_name?(full_name)
+            next if same_after_normalisation?(package_name, full_name)
+
+            results << Variant.new(
+              name: full_name,
+              algorithm: algorithm.name,
+              original: package_name
+            )
+          end
         end
       end

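The guard added in this hunk skips name variants that stay inside the original namespace when the ecosystem reports `namespace_controls_members?`, matching the 0.4.0 changelog entry. The sketch below shows the idea only. `Ecosystem`, `omissions` and `name_variants` are hypothetical stand-ins (the gem's ecosystem classes and algorithms are not shown in this diff), and the flag values are assumptions chosen for illustration.

```ruby
# Hypothetical stand-in for an ecosystem object; the real classes are not
# part of this diff.
Ecosystem = Struct.new(:name, :namespace_controls_members, keyword_init: true) do
  def namespace_controls_members?
    namespace_controls_members
  end
end

# Toy "omission" algorithm: drop each character in turn.
def omissions(name)
  (0...name.length).map { |i| name[0...i] + name[(i + 1)..] }.uniq
end

# Variants of the name part, kept under the same namespace. Skipped entirely
# when the namespace owner controls every package beneath it.
def name_variants(ecosystem, namespace, name)
  return [] if ecosystem.namespace_controls_members?

  omissions(name).reject(&:empty?).map { |variant| "#{namespace}/#{variant}" }
end

scoped = Ecosystem.new(name: "npm", namespace_controls_members: true)
open_ns = Ecosystem.new(name: "example", namespace_controls_members: false)

p name_variants(scoped, "@babel", "core")  # nothing generated
p name_variants(open_ns, "@babel", "core") # variants under the same namespace
```

Only variants of the name part are suppressed here. Typos in the namespace itself are a separate case that this hunk does not touch.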
data/lib/typosquatting/lookup.rb
CHANGED
@@ -4,6 +4,7 @@ require "net/http"
 require "json"
 require "uri"
 require "purl"
+require "set"

 module Typosquatting
   class Lookup
@@ -51,6 +52,119 @@ module Typosquatting
       response&.map { |r| Registry.new(r) } || []
     end

+    def list_names(registry:, prefix: nil, postfix: nil, critical: nil, page: nil, per_page: nil)
+      params = []
+      params << "prefix=#{URI.encode_www_form_component(prefix)}" if prefix
+      params << "postfix=#{URI.encode_www_form_component(postfix)}" if postfix
+      params << "critical=true" if critical
+      params << "page=#{page}" if page
+      params << "per_page=#{per_page}" if per_page
+      query = params.empty? ? "" : "?#{params.join("&")}"
+
+      fetch("/registries/#{URI.encode_www_form_component(registry)}/package_names#{query}") || []
+    end
+
+    def list_all_names(registry:, prefix: nil, postfix: nil, critical: nil, per_page: 100)
+      all_names = []
+      page = 1
+
+      loop do
+        names = list_names(
+          registry: registry,
+          prefix: prefix,
+          postfix: postfix,
+          critical: critical,
+          page: page,
+          per_page: per_page
+        )
+        break if names.empty?
+
+        all_names.concat(names)
+        break if names.length < per_page
+
+        page += 1
+      end
+
+      all_names
+    end
+
+    def discover(package_name, max_distance: 2)
+      registry = registries.first
+      return [] unless registry
+
+      prefix = package_name[0, 3]
+      candidates = list_names(registry: registry.name, prefix: prefix)
+
+      candidates.filter_map do |candidate|
+        next if candidate == package_name
+
+        distance = levenshtein(package_name.downcase, candidate.downcase)
+        next if distance > max_distance || distance == 0
+
+        DiscoveryResult.new(
+          name: candidate,
+          target: package_name,
+          distance: distance
+        )
+      end.sort_by(&:distance)
+    end
+
+    def check_with_variants(package_name, variants)
+      registry = registries.first
+      return [] unless registry
+
+      prefix = package_name[0, 3]
+      existing = list_names(registry: registry.name, prefix: prefix)
+      existing_set = existing.map(&:downcase).to_set
+
+      variant_names = variants.map { |v| v.is_a?(String) ? v : v.name }
+
+      variant_names.filter_map do |variant|
+        exists = existing_set.include?(variant.downcase)
+        VariantCheckResult.new(
+          name: variant,
+          exists: exists
+        )
+      end
+    end
+
+    def levenshtein(s1, s2)
+      m, n = s1.length, s2.length
+      return n if m == 0
+      return m if n == 0
+
+      d = Array.new(m + 1) { |i| i }
+      x = nil
+
+      (1..n).each do |j|
+        d[0] = j
+        x = j - 1
+
+        (1..m).each do |i|
+          cost = s1[i - 1] == s2[j - 1] ? 0 : 1
+          x, d[i] = d[i], [d[i] + 1, d[i - 1] + 1, x + cost].min
+        end
+      end
+
+      d[m]
+    end
+
+    DiscoveryResult = Struct.new(:name, :target, :distance, keyword_init: true) do
+      def to_h
+        { name: name, target: target, distance: distance }
+      end
+    end
+
+    VariantCheckResult = Struct.new(:name, :exists, keyword_init: true) do
+      def exists?
+        exists
+      end
+
+      def to_h
+        { name: name, exists: exists }
+      end
+    end
+
     Result = Struct.new(:name, :purl, :packages, :ecosystem, keyword_init: true) do
       def exists?
         !packages.empty?
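The `levenshtein` method in this hunk keeps a single row of the edit-distance table (`d`) plus the diagonal value (`x`), and `discover` keeps candidates whose distance falls between 1 and `max_distance`. Below is a standalone sketch for experimenting with both without network access. The `levenshtein` body is copied from the hunk, `similar_names` is a hypothetical helper that mirrors the filter in `discover`, and the candidate list is made-up sample data rather than registry output.

```ruby
# Copied from the lookup.rb hunk so it can run outside the gem.
def levenshtein(s1, s2)
  m, n = s1.length, s2.length
  return n if m == 0
  return m if n == 0

  d = Array.new(m + 1) { |i| i } # distances from prefixes of s1 to ""
  x = nil

  (1..n).each do |j|
    d[0] = j
    x = j - 1 # diagonal neighbour: previous column, previous row

    (1..m).each do |i|
      cost = s1[i - 1] == s2[j - 1] ? 0 : 1
      # Save the old d[i] as the next diagonal while overwriting it in place.
      x, d[i] = d[i], [d[i] + 1, d[i - 1] + 1, x + cost].min
    end
  end

  d[m]
end

# Hypothetical helper mirroring the filter in Lookup#discover:
# keep names at distance 1..max_distance, nearest first.
def similar_names(target, candidates, max_distance: 2)
  candidates.filter_map do |candidate|
    distance = levenshtein(target.downcase, candidate.downcase)
    [candidate, distance] if distance.between?(1, max_distance)
  end.sort_by(&:last)
end

# Made-up candidate names, for illustration only.
p similar_names("requests", %w[requests request reqeusts requestium])
```

Because this is plain Levenshtein distance, a swap of two adjacent characters counts as two substitutions, so transposition typos only show up when `max_distance` is at least 2.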
data/research/README.md
ADDED
@@ -0,0 +1,74 @@
+# Typosquatting Research Tools
+
+Scripts for analyzing potential typosquats across package registries.
+
+## critical_packages.rb
+
+Scans critical packages (high OpenSSF criticality score) from a registry for potential typosquats using our detection algorithms. Results are written to a timestamped CSV file.
+
+```bash
+# Scan rubygems.org critical packages (high-confidence algorithms only)
+ruby research/critical_packages.rb rubygems.org
+
+# Include all algorithm matches
+ruby research/critical_packages.rb rubygems.org --all
+
+# Limit to first N packages (useful for testing)
+ruby research/critical_packages.rb rubygems.org --limit=100
+```
+
+Supported registries: rubygems.org, npmjs.org, pypi.org, crates.io, packagist.org, hex.pm, pub.dev, proxy.golang.org, repo1.maven.org, nuget.org
+
+## Algorithms
+
+By default, only high-confidence algorithms are used (less likely to produce false positives):
+
+- homoglyph - lookalike characters (l vs 1, O vs 0)
+- repetition - doubled characters (lodash vs llodash)
+- replacement - adjacent keyboard keys (lodash vs lodazh)
+- transposition - swapped adjacent characters (lodash vs lodasj)
+- omission - dropped characters (lodash vs lodas)
+
+Use `--all` to include all 17 algorithms.
+
+## Filters
+
+The script applies several filters to reduce false positives:
+
+- **Short names**: Packages under 5 characters are skipped (too many false positives)
+- **Higher downloads**: Packages with more downloads than the critical package are skipped (not typosquats)
+- **Popular packages**: Packages with >= 1% of the critical package's downloads are skipped (likely legitimate)
+- **Predates target**: Packages created before the critical package are skipped (can't be typosquats)
+
+## CSV Output
+
+Output files are named `{registry}_{timestamp}.csv` with these columns:
+
+| Column | Description |
+|--------|-------------|
+| critical_package | The critical package being checked |
+| critical_downloads | Total downloads of the critical package |
+| critical_created | First release date of the critical package |
+| critical_repo | Repository URL of the critical package |
+| potential_typosquat | A similarly named package that exists |
+| algorithm | Which detection algorithm matched |
+| squat_downloads | Total downloads of the potential typosquat |
+| download_ratio | Squat downloads as percentage of critical downloads |
+| squat_created | First release date of the potential typosquat |
+| squat_status | Package status (empty = active, "removed" = yanked) |
+| squat_repo | Repository URL of the potential typosquat |
+| squat_description | Package description |
+
+## Interpreting Results
+
+Signs of a real typosquat:
+- `squat_status` is "removed" (already yanked by registry)
+- No repository URL
+- Very low download ratio
+- Description is empty or generic
+- Created shortly after the critical package became popular
+
+Signs of a false positive:
+- Has a legitimate repository with real code
+- Description describes unrelated functionality
+- Reasonable download count for its purpose
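The four filters listed in the research README can be read as one predicate over a critical package and a candidate. The sketch below follows the README wording and the constants in `critical_packages.rb`. `skip_candidate?` and the hash keys are hypothetical, and the download counts and dates are made-up sample values.

```ruby
SHORT_NAME_THRESHOLD = 5      # critical names shorter than this are skipped
POPULAR_RATIO_THRESHOLD = 1.0 # percent of the critical package's downloads

# Hypothetical predicate combining the four README filters.
# Dates are ISO 8601 strings, which sort correctly as plain strings.
def skip_candidate?(critical, squat)
  return true if critical[:name].length < SHORT_NAME_THRESHOLD
  return true if squat[:downloads] > critical[:downloads]

  if critical[:downloads] > 0
    ratio = squat[:downloads].to_f / critical[:downloads] * 100
    return true if ratio >= POPULAR_RATIO_THRESHOLD
  end

  if squat[:created] && critical[:created]
    return true if squat[:created] < critical[:created]
  end

  false
end

# Made-up sample values, for illustration only.
critical = { name: "lodash", downloads: 1_000_000, created: "2012-04-23T00:00:00Z" }
obscure  = { downloads: 50,     created: "2020-01-01T00:00:00Z" }
popular  = { downloads: 20_000, created: "2020-01-01T00:00:00Z" }
older    = { downloads: 50,     created: "2010-01-01T00:00:00Z" }

p skip_candidate?(critical, obscure) # kept for review
p skip_candidate?(critical, popular) # skipped: 2% of critical downloads
p skip_candidate?(critical, older)   # skipped: predates the critical package
```

A candidate that passes every filter is only a lead for manual review, which is what the "Interpreting Results" section of the README covers.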
data/research/critical_packages.rb
ADDED
@@ -0,0 +1,282 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+
+require "bundler/setup"
+require "typosquatting"
+require "csv"
+
+class CriticalPackageScanner
+  SHORT_NAME_THRESHOLD = 5
+  POPULAR_RATIO_THRESHOLD = 1.0 # Skip squats with >= 1% of critical package downloads
+
+  REGISTRY_MAP = {
+    "rubygems.org" => "rubygems",
+    "npmjs.org" => "npm",
+    "pypi.org" => "pypi",
+    "crates.io" => "cargo",
+    "packagist.org" => "composer",
+    "hex.pm" => "hex",
+    "pub.dev" => "pub",
+    "proxy.golang.org" => "golang",
+    "repo1.maven.org" => "maven",
+    "nuget.org" => "nuget"
+  }.freeze
+
+  # High confidence algorithms that indicate likely intentional typosquatting
+  HIGH_CONFIDENCE_ALGORITHMS = %w[
+    homoglyph
+    repetition
+    replacement
+    transposition
+    omission
+  ].freeze
+
+  attr_reader :registry, :results, :errors, :high_confidence_only, :limit
+
+  def initialize(registry:, high_confidence_only: true, limit: nil)
+    @registry = registry
+    @high_confidence_only = high_confidence_only
+    @limit = limit
+    @results = []
+    @errors = []
+    @prefix_cache = {}
+  end
+
+  def run
+    packages = fetch_critical_packages
+    puts "Found #{packages.length} critical packages for #{registry}"
+    puts
+
+    packages.each_with_index do |package, index|
+      scan_package(package, index + 1, packages.length)
+    end
+
+    write_csv
+    print_summary
+  end
+
+  def fetch_critical_packages
+    packages = lookup.list_all_names(registry: registry, critical: true, per_page: 1000)
+    limit ? packages.first(limit) : packages
+  end
+
+  def scan_package(package_name, current, total)
+    print "\r[#{current}/#{total}] Scanning #{package_name.ljust(40)}"
+
+    # Skip short names - too many false positives
+    return if package_name.length < SHORT_NAME_THRESHOLD
+
+    # Generate typosquatting variants using our algorithms
+    variants = generator.generate(package_name)
+    return if variants.empty?
+
+    # Fetch details for the critical package first (needed for download/date comparison)
+    critical_details = fetch_package_details(package_name)
+    @current_critical_downloads = critical_details&.dig("downloads") || 0
+    @current_critical_created = critical_details&.dig("first_release_published_at")
+
+    # Check which variants exist on the registry
+    existing = check_variants_exist(package_name, variants)
+    return if existing.empty?
+
+    results << {
+      package: package_name,
+      critical_details: critical_details,
+      matches: existing
+    }
+  rescue Typosquatting::APIError => e
+    errors << { package: package_name, error: e.message }
+  rescue StandardError => e
+    errors << { package: package_name, error: e.message }
+  end
+
+  def check_variants_exist(package_name, variants)
+    # Filter to high-confidence algorithms if requested
+    if high_confidence_only
+      variants = variants.select { |v| HIGH_CONFIDENCE_ALGORITHMS.include?(v.algorithm) }
+    end
+
+    # Group variants by prefix for efficient lookup
+    variants_by_prefix = variants.group_by { |v| v.name[0, 3] }
+
+    existing = []
+    variants_by_prefix.each do |prefix, prefix_variants|
+      @prefix_cache[prefix] ||= lookup.list_names(registry: registry, prefix: prefix)
+      existing_set = @prefix_cache[prefix].map(&:downcase).to_set
+
+      prefix_variants.each do |variant|
+        if existing_set.include?(variant.name.downcase) && variant.name.downcase != package_name.downcase
+          # Fetch package details
+          details = fetch_package_details(variant.name)
+          squat_downloads = details&.dig("downloads") || 0
+          squat_created = details&.dig("first_release_published_at")
+
+          # Skip if squat has more downloads than critical package - not a squat
+          next if squat_downloads > @current_critical_downloads
+
+          # Skip if squat is too popular (likely legitimate)
+          if @current_critical_downloads > 0
+            ratio = squat_downloads.to_f / @current_critical_downloads * 100
+            next if ratio >= POPULAR_RATIO_THRESHOLD
+          end
+
+          # Skip if squat predates the critical package (can't be a typosquat)
+          if squat_created && @current_critical_created
+            next if squat_created < @current_critical_created
+          end
+
+          existing << {
+            variant: variant,
+            description: details&.dig("description"),
+            repository_url: details&.dig("repository_url"),
+            downloads: squat_downloads,
+            first_release: squat_created,
+            status: details&.dig("status")
+          }
+        end
+      end
+    end
+
+    existing
+  end
+
+  def fetch_package_details(package_name)
+    result = lookup.check(package_name)
+    result.packages.first
+  rescue StandardError
+    nil
+  end
+
+  def generator
+    @generator ||= Typosquatting::Generator.new(ecosystem: ecosystem_for_registry)
+  end
+
+  def lookup
+    @lookup ||= Typosquatting::Lookup.new(ecosystem: ecosystem_for_registry)
+  end
+
+  def ecosystem_for_registry
+    REGISTRY_MAP[registry] || "rubygems"
+  end
+
+  def output_filename
+    timestamp = Time.now.strftime("%Y%m%d_%H%M%S")
+    "#{registry.gsub(".", "_")}_#{timestamp}.csv"
+  end
+
+  def write_csv
+    return if results.empty?
+
+    filename = output_filename
+    filepath = File.join(__dir__, filename)
+
+    CSV.open(filepath, "w") do |csv|
+      csv << [
+        "critical_package", "critical_downloads", "critical_created", "critical_repo",
+        "potential_typosquat", "algorithm", "squat_downloads", "download_ratio", "squat_created", "squat_status", "squat_repo", "squat_description"
+      ]
+
+      results.each do |result|
+        critical = result[:critical_details]
+        critical_downloads = critical&.dig("downloads") || 0
+        result[:matches].each do |match|
+          squat_downloads = match[:downloads] || 0
+          ratio = critical_downloads > 0 ? (squat_downloads.to_f / critical_downloads * 100).round(4) : 0
+
+          csv << [
+            result[:package],
+            critical_downloads,
+            critical&.dig("first_release_published_at")&.split("T")&.first,
+            critical&.dig("repository_url"),
+            match[:variant].name,
+            match[:variant].algorithm,
+            squat_downloads,
+            "#{ratio}%",
+            match[:first_release]&.split("T")&.first,
+            match[:status],
+            match[:repository_url],
+            match[:description]&.gsub(/\s+/, " ")&.strip
+          ]
+        end
+      end
+    end
+
+    puts "\n\nResults written to #{filepath}"
+  end
+
+  def print_summary
+    puts "\n"
+    puts "=" * 60
+    puts "Results for #{registry}"
+    puts "=" * 60
+    puts
+
+    if results.empty?
+      puts "No potential typosquats found."
+    else
+      puts "Found #{results.length} critical packages with potential typosquats"
+      puts "Total potential typosquats: #{results.sum { |r| r[:matches].length }}"
+
+      # Algorithm breakdown
+      algo_counts = Hash.new(0)
+      results.each do |result|
+        result[:matches].each { |m| algo_counts[m[:variant].algorithm] += 1 }
+      end
+
+      puts "\nBy algorithm:"
+      algo_counts.sort_by { |_, count| -count }.each do |algo, count|
+        puts " #{algo}: #{count}"
+      end
+
+      # Flag suspicious packages (no repo, low downloads)
+      suspicious = []
+      results.each do |result|
+        result[:matches].each do |match|
+          if match[:repository_url].nil? || match[:repository_url].to_s.empty?
+            suspicious << "#{match[:variant].name} (no repo, #{match[:downloads] || 0} downloads)"
+          end
+        end
+      end
+
+      if suspicious.any?
+        puts "\nSuspicious (no repository):"
+        suspicious.first(10).each { |s| puts " #{s}" }
+        puts " ... and #{suspicious.length - 10} more" if suspicious.length > 10
+      end
+
+      # Flag removed/yanked packages (confirmed typosquats)
+      removed = []
+      results.each do |result|
+        result[:matches].each do |match|
+          if match[:status] == "removed"
+            removed << "#{match[:variant].name} (targeting #{result[:package]})"
+          end
+        end
+      end
+
+      if removed.any?
+        puts "\nConfirmed (already yanked):"
+        removed.first(10).each { |s| puts " #{s}" }
+        puts " ... and #{removed.length - 10} more" if removed.length > 10
+      end
+    end
+
+    return if errors.empty?
+
+    puts "\n" + "=" * 60
+    puts "Errors (#{errors.length}):"
+    puts "=" * 60
+    errors.each do |error|
+      puts " #{error[:package]}: #{error[:error]}"
+    end
+  end
+end
+
+if __FILE__ == $PROGRAM_NAME
+  registry = ARGV[0] || "rubygems.org"
+  high_confidence_only = !ARGV.include?("--all")
+  limit = ARGV.find { |a| a.start_with?("--limit=") }&.split("=")&.last&.to_i
+
+  scanner = CriticalPackageScanner.new(registry: registry, high_confidence_only: high_confidence_only, limit: limit)
+  scanner.run
+end
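The script reads its options straight from `ARGV` rather than through `OptionParser`: the first argument is the registry, `--all` switches off the high-confidence filter, and `--limit=N` is split on `=`. The sketch below wraps those three expressions in a hypothetical `parse_args` method so the behaviour can be checked without running a scan. The expressions themselves are copied from the script.

```ruby
# Hypothetical wrapper around the three ARGV expressions at the bottom of
# critical_packages.rb; `argv` stands in for ARGV.
def parse_args(argv)
  {
    registry: argv[0] || "rubygems.org",
    high_confidence_only: !argv.include?("--all"),
    limit: argv.find { |a| a.start_with?("--limit=") }&.split("=")&.last&.to_i
  }
end

p parse_args([])
p parse_args(["npmjs.org", "--limit=100"])
p parse_args(["rubygems.org", "--all"])
```

Because the registry is taken positionally from `argv[0]`, it has to come before any flags. Invoking the script with only `--all` would treat `--all` as the registry name.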
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: typosquatting
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.4.0
 platform: ruby
 authors:
 - Andrew Nesbitt
@@ -49,6 +49,7 @@ extra_rdoc_files: []
 files:
 - CHANGELOG.md
 - CODE_OF_CONDUCT.md
+- Dockerfile
 - LICENSE
 - README.md
 - Rakefile
@@ -89,6 +90,8 @@ files:
 - lib/typosquatting/lookup.rb
 - lib/typosquatting/sbom.rb
 - lib/typosquatting/version.rb
+- research/README.md
+- research/critical_packages.rb
 - sig/typosquatting.rbs
 homepage: https://github.com/andrew/typosquatting
 licenses: