typosquatting 0.1.0 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +14 -0
- data/README.md +49 -1
- data/lib/typosquatting/algorithms/adjacent_insertion.rb +23 -0
- data/lib/typosquatting/algorithms/base.rb +5 -1
- data/lib/typosquatting/algorithms/bitflip.rb +39 -0
- data/lib/typosquatting/algorithms/combosquatting.rb +40 -0
- data/lib/typosquatting/algorithms/double_hit.rb +27 -0
- data/lib/typosquatting/cli.rb +91 -14
- data/lib/typosquatting/ecosystems/github_actions.rb +84 -0
- data/lib/typosquatting/ecosystems/golang.rb +4 -0
- data/lib/typosquatting/ecosystems/npm.rb +8 -0
- data/lib/typosquatting/generator.rb +86 -4
- data/lib/typosquatting/lookup.rb +87 -0
- data/lib/typosquatting/version.rb +1 -1
- data/lib/typosquatting.rb +5 -0
- metadata +6 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 59d9c744171a8ac32d88c7218078d24ce9a27da2822b03c5127569a9cf91be46
|
|
4
|
+
data.tar.gz: eb928d00e9d2f3eb5c195628c9a7bc8eaf33e02bc04ce07eab98449b48e3f12c
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: ef4a6f706d3bd5a53d603c1c7d124fd317241ecff5ec898dea1df810ae5e65173986ab3d329ddb7300c89c2608c797b99369425eeb7910cb40f7574de99dfd00
|
|
7
|
+
data.tar.gz: 93643869152bc1c8ee092a0289c5c49d8615f69adb849f1f5343d8f11748b425f48b6a1e6028223432cf1eecb725fae24181e9d218297657a9fa61e1309675ef
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,19 @@
|
|
|
1
1
|
## [Unreleased]
|
|
2
2
|
|
|
3
|
+
## [0.3.0] - 2025-12-17
|
|
4
|
+
|
|
5
|
+
- Add `discover` command to find existing similar packages by edit distance using prefix/postfix API
|
|
6
|
+
|
|
7
|
+
## [0.2.0] - 2025-12-17
|
|
8
|
+
|
|
9
|
+
- Add GitHub Actions ecosystem for CI/CD workflow typosquatting detection
|
|
10
|
+
- Add namespace-aware variant generation for ecosystems with owner/vendor (Go, Composer, npm scoped packages)
|
|
11
|
+
- Add bitflip algorithm for bitsquatting attacks
|
|
12
|
+
- Add adjacent_insertion algorithm for inserting adjacent keyboard characters
|
|
13
|
+
- Add double_hit algorithm for replacing consecutive identical characters with adjacent keys
|
|
14
|
+
- Add length-aware algorithm filtering to reduce false positives for short package names (under 5 chars)
|
|
15
|
+
- Add combosquatting algorithm for common package suffixes (-js, -py, -cli, -lite, etc.)
|
|
16
|
+
|
|
3
17
|
## [0.1.0] - 2025-12-16
|
|
4
18
|
|
|
5
19
|
- Initial release
|
data/README.md
CHANGED
|
@@ -2,7 +2,7 @@
|
|
|
2
2
|
|
|
3
3
|
Detect potential typosquatting packages across package ecosystems. Generate typosquat variants of package names and check if they exist on package registries.
|
|
4
4
|
|
|
5
|
-
Supports PyPI, npm, RubyGems, Cargo, Go, Maven, NuGet, Composer, Hex, and
|
|
5
|
+
Supports PyPI, npm, RubyGems, Cargo, Go, Maven, NuGet, Composer, Hex, Pub, and GitHub Actions.
|
|
6
6
|
|
|
7
7
|
## When to use this
|
|
8
8
|
|
|
@@ -17,6 +17,8 @@ This tool helps you:
|
|
|
17
17
|
|
|
18
18
|
False positives are common. A package named `request` isn't necessarily a typosquat of `requests`. Use the output as a starting point for investigation, not as a definitive verdict.
|
|
19
19
|
|
|
20
|
+
Short package names (under 5 characters) produce more false positives because many legitimate short packages exist. By default, the generator uses only high-confidence algorithms (homoglyph, repetition, replacement, transposition) for short names. Use `--no-length-filter` to disable this and run all algorithms regardless of name length.
|
|
21
|
+
|
|
20
22
|
## Installation
|
|
21
23
|
|
|
22
24
|
```bash
|
|
@@ -53,6 +55,9 @@ typosquatting check requests -e pypi --dry-run
|
|
|
53
55
|
# Check for dependency confusion risks
|
|
54
56
|
typosquatting confusion com.company:internal-lib -e maven
|
|
55
57
|
|
|
58
|
+
# Check GitHub Actions for typosquats
|
|
59
|
+
typosquatting check actions/checkout -e github_actions
|
|
60
|
+
|
|
56
61
|
# Check multiple packages from a file
|
|
57
62
|
typosquatting confusion -e maven --file internal-packages.txt
|
|
58
63
|
|
|
@@ -64,6 +69,12 @@ typosquatting check requests -e pypi -f json
|
|
|
64
69
|
|
|
65
70
|
# List available algorithms
|
|
66
71
|
typosquatting algorithms
|
|
72
|
+
|
|
73
|
+
# Discover existing packages similar to a target (by edit distance)
|
|
74
|
+
typosquatting discover requests -e pypi
|
|
75
|
+
|
|
76
|
+
# Discover with generated variants check
|
|
77
|
+
typosquatting discover requests -e pypi --with-variants
|
|
67
78
|
```
|
|
68
79
|
|
|
69
80
|
## Example Output
|
|
@@ -158,6 +169,7 @@ Use these identifiers with the `-e` / `--ecosystem` flag:
|
|
|
158
169
|
| `composer` | Packagist | No | `-` `_` `.` | `vendor/package` format |
|
|
159
170
|
| `hex` | hex.pm | No | `_` | Underscore only, no hyphens |
|
|
160
171
|
| `pub` | pub.dev | No | `_` | Underscore only, 2-64 chars |
|
|
172
|
+
| `github_actions` | GitHub | No | `-` `_` `.` | `owner/repo` format, targets CI/CD workflows |
|
|
161
173
|
|
|
162
174
|
## Algorithms
|
|
163
175
|
|
|
@@ -177,6 +189,10 @@ Use these names with the `-a` / `--algorithms` flag (comma-separated):
|
|
|
177
189
|
| `plural` | Singularize/pluralize | `request` -> `requests` |
|
|
178
190
|
| `misspelling` | Common typos | `library` -> `libary` |
|
|
179
191
|
| `numeral` | Number/word swap | `lib2` -> `libtwo` |
|
|
192
|
+
| `bitflip` | Single-bit errors (bitsquatting) | `google` -> `coogle` |
|
|
193
|
+
| `adjacent_insertion` | Insert adjacent keyboard key | `google` -> `googhle` |
|
|
194
|
+
| `double_hit` | Replace double chars with adjacent | `google` -> `giigle` |
|
|
195
|
+
| `combosquatting` | Add common package suffixes | `lodash` -> `lodash-js` |
|
|
180
196
|
|
|
181
197
|
## SBOM Support
|
|
182
198
|
|
|
@@ -194,6 +210,38 @@ Package lookups use the [ecosyste.ms](https://packages.ecosyste.ms) API. Request
|
|
|
194
210
|
|
|
195
211
|
Be mindful when checking many packages. The `--dry-run` flag shows what would be checked without making API calls.
|
|
196
212
|
|
|
213
|
+
### packages.ecosyste.ms API
|
|
214
|
+
|
|
215
|
+
The package_names endpoint can help identify potential typosquats by searching for packages with similar prefixes or postfixes to popular package names.
|
|
216
|
+
|
|
217
|
+
```
|
|
218
|
+
GET /api/v1/registries/{registry}/package_names
|
|
219
|
+
```
|
|
220
|
+
|
|
221
|
+
**Parameters:**
|
|
222
|
+
- `prefix` - filter by package names starting with string (case insensitive)
|
|
223
|
+
- `postfix` - filter by package names ending with string (case insensitive)
|
|
224
|
+
- `page`, `per_page` - pagination
|
|
225
|
+
- `sort`, `order` - sorting
|
|
226
|
+
|
|
227
|
+
**Examples:**
|
|
228
|
+
```
|
|
229
|
+
# Find RubyGems packages ending in "ails" (potential "rails" typosquats)
|
|
230
|
+
https://packages.ecosyste.ms/api/v1/registries/rubygems.org/package_names?postfix=ails
|
|
231
|
+
|
|
232
|
+
# Find RubyGems packages starting with "rai" (potential "rails" typosquats)
|
|
233
|
+
https://packages.ecosyste.ms/api/v1/registries/rubygems.org/package_names?prefix=rai
|
|
234
|
+
|
|
235
|
+
# Find npm packages starting with "reac" (potential "react" typosquats)
|
|
236
|
+
https://packages.ecosyste.ms/api/v1/registries/npmjs.org/package_names?prefix=reac
|
|
237
|
+
```
|
|
238
|
+
|
|
239
|
+
Full API documentation: [packages.ecosyste.ms/docs](https://packages.ecosyste.ms/docs)
|
|
240
|
+
|
|
241
|
+
## Dataset
|
|
242
|
+
|
|
243
|
+
The [ecosyste-ms/typosquatting-dataset](https://github.com/ecosyste-ms/typosquatting-dataset) contains 143 confirmed typosquatting attacks from security research, mapping malicious packages to their targets with classification and source attribution. Useful for testing detection tools and understanding real attack patterns.
|
|
244
|
+
|
|
197
245
|
## Development
|
|
198
246
|
|
|
199
247
|
```bash
|
data/lib/typosquatting/algorithms/adjacent_insertion.rb
ADDED
|
@@ -0,0 +1,23 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module Typosquatting
|
|
4
|
+
module Algorithms
|
|
5
|
+
class AdjacentInsertion < Base
|
|
6
|
+
KEYBOARD_ADJACENT = Replacement::KEYBOARD_ADJACENT
|
|
7
|
+
|
|
8
|
+
def generate(package_name)
|
|
9
|
+
variants = []
|
|
10
|
+
|
|
11
|
+
package_name.each_char.with_index do |char, i|
|
|
12
|
+
adjacent = KEYBOARD_ADJACENT[char.downcase] || []
|
|
13
|
+
adjacent.each do |adj_char|
|
|
14
|
+
variants << package_name[0..i] + adj_char + package_name[(i + 1)..]
|
|
15
|
+
variants << package_name[0...i] + adj_char + package_name[i..]
|
|
16
|
+
end
|
|
17
|
+
end
|
|
18
|
+
|
|
19
|
+
variants.uniq
|
|
20
|
+
end
|
|
21
|
+
end
|
|
22
|
+
end
|
|
23
|
+
end
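The insertion logic in the `AdjacentInsertion` class above can be tried in isolation. The following is a minimal standalone sketch, not the gem's code: `ADJACENT_KEYS` is a hypothetical two-key subset standing in for `Replacement::KEYBOARD_ADJACENT`, whose real contents are not part of this diff, and `adjacent_insertions` is an illustrative helper name.

```ruby
# Hypothetical subset of a QWERTY adjacency map. The gem's real table
# (Replacement::KEYBOARD_ADJACENT) is not shown in this diff.
ADJACENT_KEYS = {
  "g" => %w[f h],
  "o" => %w[i p]
}.freeze

# For every character, insert each neighbouring key once after it and
# once before it, mirroring the two slices used in the class above.
def adjacent_insertions(name)
  variants = []
  name.each_char.with_index do |char, i|
    (ADJACENT_KEYS[char.downcase] || []).each do |adj|
      variants << name[0..i] + adj + name[(i + 1)..] # neighbour typed after char
      variants << name[0...i] + adj + name[i..]      # neighbour typed before char
    end
  end
  variants.uniq
end

puts adjacent_insertions("go").inspect
# With the map above, "google" yields "googhle" among others.
puts adjacent_insertions("google").include?("googhle")
```

Characters without an entry in the map contribute no variants, so the output size depends entirely on how complete the adjacency table is.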
|
data/lib/typosquatting/algorithms/bitflip.rb
ADDED
|
@@ -0,0 +1,39 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module Typosquatting
|
|
4
|
+
module Algorithms
|
|
5
|
+
class Bitflip < Base
|
|
6
|
+
VALID_CHARS = (("a".."z").to_a + ("0".."9").to_a + %w[- _]).freeze
|
|
7
|
+
|
|
8
|
+
def generate(package_name)
|
|
9
|
+
variants = []
|
|
10
|
+
|
|
11
|
+
package_name.each_char.with_index do |char, i|
|
|
12
|
+
flipped = bitflip_char(char)
|
|
13
|
+
flipped.each do |new_char|
|
|
14
|
+
next unless VALID_CHARS.include?(new_char)
|
|
15
|
+
|
|
16
|
+
variant = package_name[0...i] + new_char + package_name[(i + 1)..]
|
|
17
|
+
variants << variant
|
|
18
|
+
end
|
|
19
|
+
end
|
|
20
|
+
|
|
21
|
+
variants.uniq
|
|
22
|
+
end
|
|
23
|
+
|
|
24
|
+
def bitflip_char(char)
|
|
25
|
+
byte = char.ord
|
|
26
|
+
results = []
|
|
27
|
+
|
|
28
|
+
8.times do |bit|
|
|
29
|
+
flipped_byte = byte ^ (1 << bit)
|
|
30
|
+
next if flipped_byte > 127 || flipped_byte < 32
|
|
31
|
+
|
|
32
|
+
results << flipped_byte.chr
|
|
33
|
+
end
|
|
34
|
+
|
|
35
|
+
results.reject { |c| c == char }
|
|
36
|
+
end
|
|
37
|
+
end
|
|
38
|
+
end
|
|
39
|
+
end
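To see what the bit-flipping step in `Bitflip` produces for a single character, the sketch below recomputes it standalone. It assumes the same alphabet as `VALID_CHARS` and the same printable-ASCII bounds as the class above; `single_bit_neighbours` and `BITFLIP_ALPHABET` are illustrative names, not part of the gem.

```ruby
# Same alphabet as VALID_CHARS in the class above.
BITFLIP_ALPHABET = (("a".."z").to_a + ("0".."9").to_a + %w[- _]).freeze

# Characters reachable from +char+ by flipping exactly one of its 8 bits,
# restricted to the package-name alphabet.
def single_bit_neighbours(char)
  byte = char.ord
  8.times.map { |bit| byte ^ (1 << bit) }
   .select { |b| b.between?(32, 127) }
   .map(&:chr)
   .select { |c| BITFLIP_ALPHABET.include?(c) }
end

puts single_bit_neighbours("g").inspect
```

For `"g"` (0x67) the surviving neighbours are `f`, `e`, `c`, `o` and `w`; the flips that land on `G`, `'` or a byte above 127 are discarded. The `c` result is what gives the README example `google` -> `coogle`.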
|
data/lib/typosquatting/algorithms/combosquatting.rb
ADDED
|
@@ -0,0 +1,40 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module Typosquatting
|
|
4
|
+
module Algorithms
|
|
5
|
+
class Combosquatting < Base
|
|
6
|
+
SUFFIXES = %w[
|
|
7
|
+
js .js -js
|
|
8
|
+
py -py -python python
|
|
9
|
+
-node node- -npm npm-
|
|
10
|
+
-cli -api -core -utils -util -lib -pkg
|
|
11
|
+
-lite -dev -test -beta -alpha
|
|
12
|
+
-compat -legacy -next -new -v2
|
|
13
|
+
-simd -fast -async
|
|
14
|
+
s -s
|
|
15
|
+
].freeze
|
|
16
|
+
|
|
17
|
+
PREFIXES = %w[
|
|
18
|
+
py- python-
|
|
19
|
+
node- npm-
|
|
20
|
+
go-
|
|
21
|
+
js-
|
|
22
|
+
my- the- a-
|
|
23
|
+
].freeze
|
|
24
|
+
|
|
25
|
+
def generate(package_name)
|
|
26
|
+
variants = []
|
|
27
|
+
|
|
28
|
+
SUFFIXES.each do |suffix|
|
|
29
|
+
variants << "#{package_name}#{suffix}"
|
|
30
|
+
end
|
|
31
|
+
|
|
32
|
+
PREFIXES.each do |prefix|
|
|
33
|
+
variants << "#{prefix}#{package_name}"
|
|
34
|
+
end
|
|
35
|
+
|
|
36
|
+
variants.uniq
|
|
37
|
+
end
|
|
38
|
+
end
|
|
39
|
+
end
|
|
40
|
+
end
|
data/lib/typosquatting/algorithms/double_hit.rb
ADDED
|
@@ -0,0 +1,27 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module Typosquatting
|
|
4
|
+
module Algorithms
|
|
5
|
+
class DoubleHit < Base
|
|
6
|
+
KEYBOARD_ADJACENT = Replacement::KEYBOARD_ADJACENT
|
|
7
|
+
|
|
8
|
+
def generate(package_name)
|
|
9
|
+
variants = []
|
|
10
|
+
|
|
11
|
+
(package_name.length - 1).times do |i|
|
|
12
|
+
next unless package_name[i] == package_name[i + 1]
|
|
13
|
+
|
|
14
|
+
char = package_name[i].downcase
|
|
15
|
+
adjacent = KEYBOARD_ADJACENT[char] || []
|
|
16
|
+
|
|
17
|
+
adjacent.each do |adj_char|
|
|
18
|
+
variant = package_name[0...i] + adj_char + adj_char + package_name[(i + 2)..]
|
|
19
|
+
variants << variant
|
|
20
|
+
end
|
|
21
|
+
end
|
|
22
|
+
|
|
23
|
+
variants.uniq
|
|
24
|
+
end
|
|
25
|
+
end
|
|
26
|
+
end
|
|
27
|
+
end
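The `DoubleHit` class above only fires on consecutive identical characters. A minimal standalone sketch of that behaviour follows; `DOUBLE_HIT_KEYS` is a hypothetical one-key stand-in for `Replacement::KEYBOARD_ADJACENT` (not shown in this diff) and `double_hits` is an illustrative helper name.

```ruby
# Hypothetical adjacency entry; the gem's real table is not shown in this diff.
DOUBLE_HIT_KEYS = { "o" => %w[i p] }.freeze

# Replace each run of two identical characters with a doubled neighbouring key.
def double_hits(name)
  variants = []
  (name.length - 1).times do |i|
    next unless name[i] == name[i + 1]

    (DOUBLE_HIT_KEYS[name[i].downcase] || []).each do |adj|
      variants << name[0...i] + adj * 2 + name[(i + 2)..]
    end
  end
  variants.uniq
end

puts double_hits("google").inspect
puts double_hits("rails").inspect
```

Under this map, `google` gives `giigle` and `gppgle`, while a name without a doubled letter such as `rails` gives no variants at all.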
|
data/lib/typosquatting/cli.rb
CHANGED
|
@@ -16,6 +16,8 @@ module Typosquatting
|
|
|
16
16
|
generate(args)
|
|
17
17
|
when "check"
|
|
18
18
|
check(args)
|
|
19
|
+
when "discover"
|
|
20
|
+
discover(args)
|
|
19
21
|
when "confusion"
|
|
20
22
|
confusion(args)
|
|
21
23
|
when "sbom"
|
|
@@ -36,13 +38,14 @@ module Typosquatting
|
|
|
36
38
|
end
|
|
37
39
|
|
|
38
40
|
def generate(args)
|
|
39
|
-
options = { format: "text", verbose: false }
|
|
41
|
+
options = { format: "text", verbose: false, length_filtering: true }
|
|
40
42
|
parser = OptionParser.new do |opts|
|
|
41
43
|
opts.banner = "Usage: typosquatting generate PACKAGE -e ECOSYSTEM [options]"
|
|
42
44
|
opts.on("-e", "--ecosystem ECOSYSTEM", "Package ecosystem (required)") { |v| options[:ecosystem] = v }
|
|
43
45
|
opts.on("-f", "--format FORMAT", "Output format (text, json, csv)") { |v| options[:format] = v }
|
|
44
46
|
opts.on("-v", "--verbose", "Show algorithm for each variant") { options[:verbose] = true }
|
|
45
47
|
opts.on("-a", "--algorithms LIST", "Comma-separated list of algorithms to use") { |v| options[:algorithms] = v }
|
|
48
|
+
opts.on("--no-length-filter", "Disable length-based algorithm filtering for short names") { options[:length_filtering] = false }
|
|
46
49
|
end
|
|
47
50
|
parser.parse!(args)
|
|
48
51
|
|
|
@@ -55,14 +58,14 @@ module Typosquatting
|
|
|
55
58
|
|
|
56
59
|
ecosystem = Ecosystems::Base.get(options[:ecosystem])
|
|
57
60
|
algorithms = select_algorithms(options[:algorithms])
|
|
58
|
-
generator = Generator.new(ecosystem: ecosystem, algorithms: algorithms)
|
|
61
|
+
generator = Generator.new(ecosystem: ecosystem, algorithms: algorithms, length_filtering: options[:length_filtering])
|
|
59
62
|
variants = generator.generate(package)
|
|
60
63
|
|
|
61
64
|
output_variants(variants, options)
|
|
62
65
|
end
|
|
63
66
|
|
|
64
67
|
def check(args)
|
|
65
|
-
options = { format: "text", verbose: false, existing_only: false, dry_run: false }
|
|
68
|
+
options = { format: "text", verbose: false, existing_only: false, dry_run: false, length_filtering: true }
|
|
66
69
|
parser = OptionParser.new do |opts|
|
|
67
70
|
opts.banner = "Usage: typosquatting check PACKAGE -e ECOSYSTEM [options]"
|
|
68
71
|
opts.on("-e", "--ecosystem ECOSYSTEM", "Package ecosystem (required)") { |v| options[:ecosystem] = v }
|
|
@@ -71,6 +74,7 @@ module Typosquatting
|
|
|
71
74
|
opts.on("-a", "--algorithms LIST", "Comma-separated list of algorithms to use") { |v| options[:algorithms] = v }
|
|
72
75
|
opts.on("--existing-only", "Only show packages that exist") { options[:existing_only] = true }
|
|
73
76
|
opts.on("--dry-run", "Show variants without making API calls") { options[:dry_run] = true }
|
|
77
|
+
opts.on("--no-length-filter", "Disable length-based algorithm filtering for short names") { options[:length_filtering] = false }
|
|
74
78
|
end
|
|
75
79
|
parser.parse!(args)
|
|
76
80
|
|
|
@@ -83,7 +87,7 @@ module Typosquatting
|
|
|
83
87
|
|
|
84
88
|
ecosystem = Ecosystems::Base.get(options[:ecosystem])
|
|
85
89
|
algorithms = select_algorithms(options[:algorithms])
|
|
86
|
-
generator = Generator.new(ecosystem: ecosystem, algorithms: algorithms)
|
|
90
|
+
generator = Generator.new(ecosystem: ecosystem, algorithms: algorithms, length_filtering: options[:length_filtering])
|
|
87
91
|
variants = generator.generate(package)
|
|
88
92
|
|
|
89
93
|
if options[:dry_run]
|
|
@@ -99,6 +103,39 @@ module Typosquatting
|
|
|
99
103
|
output_check_results(results, options)
|
|
100
104
|
end
|
|
101
105
|
|
|
106
|
+
def discover(args)
|
|
107
|
+
options = { format: "text", max_distance: 2 }
|
|
108
|
+
parser = OptionParser.new do |opts|
|
|
109
|
+
opts.banner = "Usage: typosquatting discover PACKAGE -e ECOSYSTEM [options]"
|
|
110
|
+
opts.on("-e", "--ecosystem ECOSYSTEM", "Package ecosystem (required)") { |v| options[:ecosystem] = v }
|
|
111
|
+
opts.on("-f", "--format FORMAT", "Output format (text, json)") { |v| options[:format] = v }
|
|
112
|
+
opts.on("-d", "--distance N", Integer, "Maximum edit distance (default: 2)") { |v| options[:max_distance] = v }
|
|
113
|
+
opts.on("--with-variants", "Also show which generated variants exist") { options[:with_variants] = true }
|
|
114
|
+
end
|
|
115
|
+
parser.parse!(args)
|
|
116
|
+
|
|
117
|
+
package = args.shift
|
|
118
|
+
unless package && options[:ecosystem]
|
|
119
|
+
$stderr.puts "Error: Package name and ecosystem required"
|
|
120
|
+
$stderr.puts parser
|
|
121
|
+
exit 1
|
|
122
|
+
end
|
|
123
|
+
|
|
124
|
+
lookup = Lookup.new(ecosystem: options[:ecosystem])
|
|
125
|
+
|
|
126
|
+
$stderr.puts "Discovering similar packages to #{package}..." if $stderr.tty?
|
|
127
|
+
results = lookup.discover(package, max_distance: options[:max_distance])
|
|
128
|
+
|
|
129
|
+
if options[:with_variants]
|
|
130
|
+
generator = Generator.new(ecosystem: options[:ecosystem])
|
|
131
|
+
variants = generator.generate(package)
|
|
132
|
+
variant_results = lookup.check_with_variants(package, variants)
|
|
133
|
+
existing_variants = variant_results.select(&:exists?)
|
|
134
|
+
end
|
|
135
|
+
|
|
136
|
+
output_discover_results(results, existing_variants, options)
|
|
137
|
+
end
|
|
138
|
+
|
|
102
139
|
def confusion(args)
|
|
103
140
|
options = { format: "text" }
|
|
104
141
|
parser = OptionParser.new do |opts|
|
|
@@ -177,16 +214,17 @@ module Typosquatting
|
|
|
177
214
|
def ecosystems
|
|
178
215
|
puts "Supported ecosystems:"
|
|
179
216
|
puts ""
|
|
180
|
-
puts " pypi
|
|
181
|
-
puts " npm
|
|
182
|
-
puts " gem
|
|
183
|
-
puts " cargo
|
|
184
|
-
puts " golang
|
|
185
|
-
puts " maven
|
|
186
|
-
puts " nuget
|
|
187
|
-
puts " composer
|
|
188
|
-
puts " hex
|
|
189
|
-
puts " pub
|
|
217
|
+
puts " pypi - Python Package Index"
|
|
218
|
+
puts " npm - Node Package Manager"
|
|
219
|
+
puts " gem - RubyGems"
|
|
220
|
+
puts " cargo - Rust packages"
|
|
221
|
+
puts " golang - Go modules"
|
|
222
|
+
puts " maven - Java/JVM packages"
|
|
223
|
+
puts " nuget - .NET packages"
|
|
224
|
+
puts " composer - PHP packages"
|
|
225
|
+
puts " hex - Erlang/Elixir packages"
|
|
226
|
+
puts " pub - Dart packages"
|
|
227
|
+
puts " github_actions - GitHub Actions"
|
|
190
228
|
end
|
|
191
229
|
|
|
192
230
|
def algorithms
|
|
@@ -209,6 +247,7 @@ module Typosquatting
|
|
|
209
247
|
puts "Commands:"
|
|
210
248
|
puts " generate PACKAGE -e ECOSYSTEM Generate typosquat variants"
|
|
211
249
|
puts " check PACKAGE -e ECOSYSTEM Check which variants exist"
|
|
250
|
+
puts " discover PACKAGE -e ECOSYSTEM Find similar packages by edit distance"
|
|
212
251
|
puts " confusion PACKAGE -e ECOSYSTEM Check for dependency confusion"
|
|
213
252
|
puts " sbom FILE Check SBOM for potential typosquats"
|
|
214
253
|
puts " ecosystems List supported ecosystems"
|
|
@@ -219,6 +258,7 @@ module Typosquatting
|
|
|
219
258
|
puts "Examples:"
|
|
220
259
|
puts " typosquatting generate requests -e pypi"
|
|
221
260
|
puts " typosquatting check requests -e pypi --existing-only"
|
|
261
|
+
puts " typosquatting discover rails -e gem --with-variants"
|
|
222
262
|
puts " typosquatting confusion my-package -e maven"
|
|
223
263
|
puts " typosquatting sbom bom.json"
|
|
224
264
|
end
|
|
@@ -376,5 +416,42 @@ module Typosquatting
|
|
|
376
416
|
puts "Found #{results.length} suspicious package(s)"
|
|
377
417
|
end
|
|
378
418
|
end
|
|
419
|
+
|
|
420
|
+
def output_discover_results(discovered, existing_variants, options)
|
|
421
|
+
case options[:format]
|
|
422
|
+
when "json"
|
|
423
|
+
data = {
|
|
424
|
+
discovered: discovered.map(&:to_h),
|
|
425
|
+
existing_variants: existing_variants&.map(&:to_h)
|
|
426
|
+
}.compact
|
|
427
|
+
puts JSON.pretty_generate(data)
|
|
428
|
+
else
|
|
429
|
+
if discovered.empty? && (existing_variants.nil? || existing_variants.empty?)
|
|
430
|
+
puts "No similar packages found"
|
|
431
|
+
return
|
|
432
|
+
end
|
|
433
|
+
|
|
434
|
+
if discovered.any?
|
|
435
|
+
puts "Similar packages found (by edit distance):"
|
|
436
|
+
puts ""
|
|
437
|
+
discovered.each do |result|
|
|
438
|
+
puts " #{result.name} (distance: #{result.distance})"
|
|
439
|
+
end
|
|
440
|
+
puts ""
|
|
441
|
+
end
|
|
442
|
+
|
|
443
|
+
if existing_variants&.any?
|
|
444
|
+
puts "Generated variants that exist:"
|
|
445
|
+
puts ""
|
|
446
|
+
existing_variants.each do |result|
|
|
447
|
+
puts " #{result.name}"
|
|
448
|
+
end
|
|
449
|
+
puts ""
|
|
450
|
+
end
|
|
451
|
+
|
|
452
|
+
puts "Found #{discovered.length} similar package(s)"
|
|
453
|
+
puts "Found #{existing_variants.length} existing variant(s)" if existing_variants&.any?
|
|
454
|
+
end
|
|
455
|
+
end
|
|
379
456
|
end
|
|
380
457
|
end
|
data/lib/typosquatting/ecosystems/github_actions.rb
ADDED
|
@@ -0,0 +1,84 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module Typosquatting
|
|
4
|
+
module Ecosystems
|
|
5
|
+
class GithubActions < Base
|
|
6
|
+
def initialize
|
|
7
|
+
super
|
|
8
|
+
@name = "github_actions"
|
|
9
|
+
@purl_type = "github"
|
|
10
|
+
end
|
|
11
|
+
|
|
12
|
+
def name_pattern
|
|
13
|
+
/\A[a-zA-Z0-9][a-zA-Z0-9-]*\/[a-zA-Z0-9._-]+\z/
|
|
14
|
+
end
|
|
15
|
+
|
|
16
|
+
def allowed_characters
|
|
17
|
+
/[a-zA-Z0-9._-]/
|
|
18
|
+
end
|
|
19
|
+
|
|
20
|
+
def allowed_delimiters
|
|
21
|
+
%w[- _ .]
|
|
22
|
+
end
|
|
23
|
+
|
|
24
|
+
def case_sensitive?
|
|
25
|
+
false
|
|
26
|
+
end
|
|
27
|
+
|
|
28
|
+
def supports_namespaces?
|
|
29
|
+
true
|
|
30
|
+
end
|
|
31
|
+
|
|
32
|
+
def normalise(name)
|
|
33
|
+
name.downcase.sub(/@.*$/, "")
|
|
34
|
+
end
|
|
35
|
+
|
|
36
|
+
def parse_namespace(name)
|
|
37
|
+
clean_name = name.sub(/@.*$/, "")
|
|
38
|
+
parts = clean_name.split("/", 2)
|
|
39
|
+
if parts.length == 2
|
|
40
|
+
[parts[0], parts[1]]
|
|
41
|
+
else
|
|
42
|
+
[nil, name]
|
|
43
|
+
end
|
|
44
|
+
end
|
|
45
|
+
|
|
46
|
+
def valid_name?(name)
|
|
47
|
+
return false if name.nil? || name.empty?
|
|
48
|
+
|
|
49
|
+
clean_name = name.sub(/@.*$/, "")
|
|
50
|
+
owner, repo = parse_namespace(clean_name)
|
|
51
|
+
|
|
52
|
+
return false if owner.nil? || repo.nil?
|
|
53
|
+
return false if owner.empty? || repo.empty?
|
|
54
|
+
|
|
55
|
+
return false unless valid_owner?(owner)
|
|
56
|
+
return false unless valid_repo?(repo)
|
|
57
|
+
|
|
58
|
+
true
|
|
59
|
+
end
|
|
60
|
+
|
|
61
|
+
def format_name(owner, repo)
|
|
62
|
+
"#{owner}/#{repo}"
|
|
63
|
+
end
|
|
64
|
+
|
|
65
|
+
def valid_owner?(owner)
|
|
66
|
+
return false if owner.length > 39
|
|
67
|
+
return false if owner.start_with?("-")
|
|
68
|
+
return false if owner.end_with?("-")
|
|
69
|
+
return false if owner.include?("--")
|
|
70
|
+
|
|
71
|
+
!!(owner =~ /\A[a-zA-Z0-9][a-zA-Z0-9-]*\z/)
|
|
72
|
+
end
|
|
73
|
+
|
|
74
|
+
def valid_repo?(repo)
|
|
75
|
+
return false if repo.length > 100
|
|
76
|
+
return false if repo.start_with?(".")
|
|
77
|
+
|
|
78
|
+
!!(repo =~ /\A[a-zA-Z0-9._-]+\z/)
|
|
79
|
+
end
|
|
80
|
+
end
|
|
81
|
+
|
|
82
|
+
Base.register(GithubActions.new)
|
|
83
|
+
end
|
|
84
|
+
end
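The validation rules in `GithubActions` above can be condensed into a few standalone functions. This is a sketch that restates the checks visible in the diff (ref stripping, owner rules, repo rules) outside the `Ecosystems::Base` hierarchy; the method names `parse_action` and `valid_action?` are illustrative and not part of the gem.

```ruby
# Strip an optional "@ref" suffix and split "owner/repo" into its two parts.
# Returns [nil, name] when there is no slash, as parse_namespace does above.
def parse_action(name)
  owner, repo = name.sub(/@.*\z/, "").split("/", 2)
  repo ? [owner, repo] : [nil, name]
end

# Owner rules from the diff: at most 39 chars, alphanumeric start,
# hyphens allowed but not doubled and not at the end.
def valid_owner?(owner)
  owner.length <= 39 && !owner.include?("--") && !owner.end_with?("-") &&
    owner.match?(/\A[a-zA-Z0-9][a-zA-Z0-9-]*\z/)
end

# Repo rules from the diff: at most 100 chars, no leading dot,
# only letters, digits, ".", "_" and "-".
def valid_repo?(repo)
  repo.length <= 100 && !repo.start_with?(".") && repo.match?(/\A[a-zA-Z0-9._-]+\z/)
end

def valid_action?(name)
  owner, repo = parse_action(name)
  return false if owner.nil? || owner.empty? || repo.empty?

  valid_owner?(owner) && valid_repo?(repo)
end

puts valid_action?("actions/checkout@v4")
puts valid_action?("actions/cache/restore")
```

One consequence of splitting on the first slash only is that a sub-path reference such as `actions/cache/restore` ends up with the repo part `cache/restore`, which fails the repo pattern, so under these rules it is treated as invalid.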
|
data/lib/typosquatting/generator.rb
CHANGED
|
@@ -2,17 +2,47 @@
|
|
|
2
2
|
|
|
3
3
|
module Typosquatting
|
|
4
4
|
class Generator
|
|
5
|
-
|
|
5
|
+
SHORT_NAME_THRESHOLD = 5
|
|
6
6
|
|
|
7
|
-
|
|
7
|
+
HIGH_CONFIDENCE_ALGORITHMS = %w[
|
|
8
|
+
homoglyph
|
|
9
|
+
repetition
|
|
10
|
+
replacement
|
|
11
|
+
transposition
|
|
12
|
+
].freeze
|
|
13
|
+
|
|
14
|
+
attr_reader :ecosystem, :algorithms, :length_filtering
|
|
15
|
+
|
|
16
|
+
def initialize(ecosystem:, algorithms: nil, length_filtering: true)
|
|
8
17
|
@ecosystem = ecosystem.is_a?(String) ? Ecosystems::Base.get(ecosystem) : ecosystem
|
|
9
18
|
@algorithms = algorithms || Algorithms::Base.all
|
|
19
|
+
@length_filtering = length_filtering
|
|
10
20
|
end
|
|
11
21
|
|
|
12
22
|
def generate(package_name)
|
|
13
23
|
results = []
|
|
14
24
|
|
|
15
|
-
|
|
25
|
+
if ecosystem.supports_namespaces?
|
|
26
|
+
results.concat(generate_namespace_aware(package_name))
|
|
27
|
+
else
|
|
28
|
+
results.concat(generate_simple(package_name))
|
|
29
|
+
end
|
|
30
|
+
|
|
31
|
+
dedupe_by_normalised_name(results)
|
|
32
|
+
end
|
|
33
|
+
|
|
34
|
+
def algorithms_for_length(name_length)
|
|
35
|
+
return algorithms unless length_filtering
|
|
36
|
+
return algorithms if name_length >= SHORT_NAME_THRESHOLD
|
|
37
|
+
|
|
38
|
+
algorithms.select { |a| HIGH_CONFIDENCE_ALGORITHMS.include?(a.name) }
|
|
39
|
+
end
|
|
40
|
+
|
|
41
|
+
def generate_simple(package_name)
|
|
42
|
+
results = []
|
|
43
|
+
active_algorithms = algorithms_for_length(package_name.length)
|
|
44
|
+
|
|
45
|
+
active_algorithms.each do |algorithm|
|
|
16
46
|
variants = algorithm.generate(package_name)
|
|
17
47
|
variants.each do |variant|
|
|
18
48
|
next if variant == package_name
|
|
@@ -27,7 +57,59 @@ module Typosquatting
|
|
|
27
57
|
end
|
|
28
58
|
end
|
|
29
59
|
|
|
30
|
-
|
|
60
|
+
results
|
|
61
|
+
end
|
|
62
|
+
|
|
63
|
+
def generate_namespace_aware(package_name)
|
|
64
|
+
namespace, name = ecosystem.parse_namespace(package_name)
|
|
65
|
+
results = []
|
|
66
|
+
|
|
67
|
+
return generate_simple(package_name) if namespace.nil?
|
|
68
|
+
|
|
69
|
+
namespace_algorithms = algorithms_for_length(namespace.length)
|
|
70
|
+
name_algorithms = algorithms_for_length(name.length)
|
|
71
|
+
|
|
72
|
+
namespace_algorithms.each do |algorithm|
|
|
73
|
+
namespace_variants = algorithm.generate(namespace)
|
|
74
|
+
namespace_variants.each do |ns_variant|
|
|
75
|
+
full_name = rebuild_namespaced_name(ns_variant, name)
|
|
76
|
+
next if full_name == package_name
|
|
77
|
+
next unless ecosystem.valid_name?(full_name)
|
|
78
|
+
next if same_after_normalisation?(package_name, full_name)
|
|
79
|
+
|
|
80
|
+
results << Variant.new(
|
|
81
|
+
name: full_name,
|
|
82
|
+
algorithm: algorithm.name,
|
|
83
|
+
original: package_name
|
|
84
|
+
)
|
|
85
|
+
end
|
|
86
|
+
end
|
|
87
|
+
|
|
88
|
+
name_algorithms.each do |algorithm|
|
|
89
|
+
name_variants = algorithm.generate(name)
|
|
90
|
+
name_variants.each do |name_variant|
|
|
91
|
+
full_name = rebuild_namespaced_name(namespace, name_variant)
|
|
92
|
+
next if full_name == package_name
|
|
93
|
+
next unless ecosystem.valid_name?(full_name)
|
|
94
|
+
next if same_after_normalisation?(package_name, full_name)
|
|
95
|
+
|
|
96
|
+
results << Variant.new(
|
|
97
|
+
name: full_name,
|
|
98
|
+
algorithm: algorithm.name,
|
|
99
|
+
original: package_name
|
|
100
|
+
)
|
|
101
|
+
end
|
|
102
|
+
end
|
|
103
|
+
|
|
104
|
+
results
|
|
105
|
+
end
|
|
106
|
+
|
|
107
|
+
def rebuild_namespaced_name(namespace, name)
|
|
108
|
+
if ecosystem.respond_to?(:format_name)
|
|
109
|
+
ecosystem.format_name(namespace, name)
|
|
110
|
+
else
|
|
111
|
+
"#{namespace}/#{name}"
|
|
112
|
+
end
|
|
31
113
|
end
|
|
32
114
|
|
|
33
115
|
Variant = Struct.new(:name, :algorithm, :original, keyword_init: true) do
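The length-aware filtering added to `Generator` above reduces to a small selection function. The sketch below reproduces it standalone, assuming only that algorithm objects respond to `#name`; `Algo` is a hypothetical stand-in for the gem's algorithm classes and `algorithms_for` is an illustrative name for `algorithms_for_length`.

```ruby
SHORT_NAME_THRESHOLD = 5
HIGH_CONFIDENCE = %w[homoglyph repetition replacement transposition].freeze

# Hypothetical stand-in for the gem's algorithm objects; only #name is used.
Algo = Struct.new(:name)

# Names shorter than the threshold keep only the high-confidence algorithms,
# unless filtering is switched off (the CLI's --no-length-filter).
def algorithms_for(name, all, length_filtering: true)
  return all unless length_filtering
  return all if name.length >= SHORT_NAME_THRESHOLD

  all.select { |a| HIGH_CONFIDENCE.include?(a.name) }
end

all = %w[homoglyph bitflip combosquatting transposition].map { |n| Algo.new(n) }
puts algorithms_for("vue", all).map(&:name).inspect
puts algorithms_for("react", all).map(&:name).inspect
puts algorithms_for("vue", all, length_filtering: false).map(&:name).inspect
```

Note that in `generate_namespace_aware` the diff applies this check to the namespace and to the name separately, so a long owner with a short package name (or the reverse) can end up with different algorithm sets for each half.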
|
data/lib/typosquatting/lookup.rb
CHANGED
|
@@ -4,6 +4,7 @@ require "net/http"
|
|
|
4
4
|
require "json"
|
|
5
5
|
require "uri"
|
|
6
6
|
require "purl"
|
|
7
|
+
require "set"
|
|
7
8
|
|
|
8
9
|
module Typosquatting
|
|
9
10
|
class Lookup
|
|
@@ -51,6 +52,92 @@ module Typosquatting
|
|
|
51
52
|
response&.map { |r| Registry.new(r) } || []
|
|
52
53
|
end
|
|
53
54
|
|
|
55
|
+
def list_names(registry:, prefix: nil, postfix: nil)
|
|
56
|
+
params = []
|
|
57
|
+
params << "prefix=#{URI.encode_www_form_component(prefix)}" if prefix
|
|
58
|
+
params << "postfix=#{URI.encode_www_form_component(postfix)}" if postfix
|
|
59
|
+
query = params.empty? ? "" : "?#{params.join("&")}"
|
|
60
|
+
|
|
61
|
+
fetch("/registries/#{URI.encode_www_form_component(registry)}/package_names#{query}") || []
|
|
62
|
+
end
|
|
63
|
+
|
|
64
|
+
def discover(package_name, max_distance: 2)
|
|
65
|
+
registry = registries.first
|
|
66
|
+
return [] unless registry
|
|
67
|
+
|
|
68
|
+
prefix = package_name[0, 3]
|
|
69
|
+
candidates = list_names(registry: registry.name, prefix: prefix)
|
|
70
|
+
|
|
71
|
+
candidates.filter_map do |candidate|
|
|
72
|
+
next if candidate == package_name
|
|
73
|
+
|
|
74
|
+
distance = levenshtein(package_name.downcase, candidate.downcase)
|
|
75
|
+
next if distance > max_distance || distance == 0
|
|
76
|
+
|
|
77
|
+
DiscoveryResult.new(
|
|
78
|
+
name: candidate,
|
|
79
|
+
target: package_name,
|
|
80
|
+
distance: distance
|
|
81
|
+
)
|
|
82
|
+
end.sort_by(&:distance)
|
|
83
|
+
end
|
|
84
|
+
|
|
85
|
+
def check_with_variants(package_name, variants)
|
|
86
|
+
registry = registries.first
|
|
87
|
+
return [] unless registry
|
|
88
|
+
|
|
89
|
+
prefix = package_name[0, 3]
|
|
90
|
+
existing = list_names(registry: registry.name, prefix: prefix)
|
|
91
|
+
existing_set = existing.map(&:downcase).to_set
|
|
92
|
+
|
|
93
|
+
variant_names = variants.map { |v| v.is_a?(String) ? v : v.name }
|
|
94
|
+
|
|
95
|
+
variant_names.filter_map do |variant|
|
|
96
|
+
exists = existing_set.include?(variant.downcase)
|
|
97
|
+
VariantCheckResult.new(
|
|
98
|
+
name: variant,
|
|
99
|
+
exists: exists
|
|
100
|
+
)
|
|
101
|
+
end
|
|
102
|
+
end
|
|
103
|
+
|
|
104
|
+
def levenshtein(s1, s2)
|
|
105
|
+
m, n = s1.length, s2.length
|
|
106
|
+
return n if m == 0
|
|
107
|
+
return m if n == 0
|
|
108
|
+
|
|
109
|
+
d = Array.new(m + 1) { |i| i }
|
|
110
|
+
x = nil
|
|
111
|
+
|
|
112
|
+
(1..n).each do |j|
|
|
113
|
+
d[0] = j
|
|
114
|
+
x = j - 1
|
|
115
|
+
|
|
116
|
+
(1..m).each do |i|
|
|
117
|
+
cost = s1[i - 1] == s2[j - 1] ? 0 : 1
|
|
118
|
+
x, d[i] = d[i], [d[i] + 1, d[i - 1] + 1, x + cost].min
|
|
119
|
+
end
|
|
120
|
+
end
|
|
121
|
+
|
|
122
|
+
d[m]
|
|
123
|
+
end
|
|
124
|
+
|
|
125
|
+
DiscoveryResult = Struct.new(:name, :target, :distance, keyword_init: true) do
|
|
126
|
+
def to_h
|
|
127
|
+
{ name: name, target: target, distance: distance }
|
|
128
|
+
end
|
|
129
|
+
end
|
|
130
|
+
|
|
131
|
+
VariantCheckResult = Struct.new(:name, :exists, keyword_init: true) do
|
|
132
|
+
def exists?
|
|
133
|
+
exists
|
|
134
|
+
end
|
|
135
|
+
|
|
136
|
+
def to_h
|
|
137
|
+
{ name: name, exists: exists }
|
|
138
|
+
end
|
|
139
|
+
end
|
|
140
|
+
|
|
54
141
|
Result = Struct.new(:name, :purl, :packages, :ecosystem, keyword_init: true) do
|
|
55
142
|
def exists?
|
|
56
143
|
!packages.empty?
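The `discover` flow added to `Lookup` above combines a three-character prefix query with an edit-distance cutoff. The sketch below approximates that flow without any network access: the candidate list is made-up sample data rather than registry output, `similar_names` and `Match` are illustrative names, and the Levenshtein routine is a plain row-by-row implementation rather than the single-array version in the diff.

```ruby
# Classic dynamic-programming Levenshtein distance, one row at a time.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

Match = Struct.new(:name, :distance)

# Keep candidates that share the target's first three characters (standing in
# for the prefix query) and lie within max_distance edits of the target.
def similar_names(target, candidates, max_distance: 2)
  prefix = target[0, 3].downcase
  candidates
    .select { |c| c.downcase.start_with?(prefix) }
    .reject { |c| c == target }
    .map { |c| Match.new(c, levenshtein(target.downcase, c.downcase)) }
    .select { |m| m.distance.between?(1, max_distance) }
    .sort_by { |m| [m.distance, m.name] }
end

# Made-up sample names, not real registry data.
sample = %w[requests requestz reqeusts request requirements rquests]
similar_names("requests", sample).each { |m| puts "#{m.name} (distance: #{m.distance})" }
```

In this sketch `rquests` is one edit away from `requests` but is never considered, because it does not start with `req`. The same would appear to hold for any approach that fetches candidates by a fixed leading prefix, which is worth keeping in mind when interpreting an empty result.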
|
data/lib/typosquatting.rb
CHANGED
|
@@ -15,6 +15,10 @@ require_relative "typosquatting/algorithms/word_order"
|
|
|
15
15
|
require_relative "typosquatting/algorithms/plural"
|
|
16
16
|
require_relative "typosquatting/algorithms/misspelling"
|
|
17
17
|
require_relative "typosquatting/algorithms/numeral"
|
|
18
|
+
require_relative "typosquatting/algorithms/bitflip"
|
|
19
|
+
require_relative "typosquatting/algorithms/adjacent_insertion"
|
|
20
|
+
require_relative "typosquatting/algorithms/double_hit"
|
|
21
|
+
require_relative "typosquatting/algorithms/combosquatting"
|
|
18
22
|
|
|
19
23
|
require_relative "typosquatting/ecosystems/base"
|
|
20
24
|
require_relative "typosquatting/ecosystems/pypi"
|
|
@@ -27,6 +31,7 @@ require_relative "typosquatting/ecosystems/nuget"
|
|
|
27
31
|
require_relative "typosquatting/ecosystems/composer"
|
|
28
32
|
require_relative "typosquatting/ecosystems/hex"
|
|
29
33
|
require_relative "typosquatting/ecosystems/pub"
|
|
34
|
+
require_relative "typosquatting/ecosystems/github_actions"
|
|
30
35
|
|
|
31
36
|
require_relative "typosquatting/generator"
|
|
32
37
|
require_relative "typosquatting/lookup"
|
metadata
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: typosquatting
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.
|
|
4
|
+
version: 0.3.0
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Andrew Nesbitt
|
|
@@ -55,8 +55,12 @@ files:
|
|
|
55
55
|
- exe/typosquatting
|
|
56
56
|
- lib/typosquatting.rb
|
|
57
57
|
- lib/typosquatting/algorithms/addition.rb
|
|
58
|
+
- lib/typosquatting/algorithms/adjacent_insertion.rb
|
|
58
59
|
- lib/typosquatting/algorithms/base.rb
|
|
60
|
+
- lib/typosquatting/algorithms/bitflip.rb
|
|
61
|
+
- lib/typosquatting/algorithms/combosquatting.rb
|
|
59
62
|
- lib/typosquatting/algorithms/delimiter.rb
|
|
63
|
+
- lib/typosquatting/algorithms/double_hit.rb
|
|
60
64
|
- lib/typosquatting/algorithms/homoglyph.rb
|
|
61
65
|
- lib/typosquatting/algorithms/misspelling.rb
|
|
62
66
|
- lib/typosquatting/algorithms/numeral.rb
|
|
@@ -72,6 +76,7 @@ files:
|
|
|
72
76
|
- lib/typosquatting/ecosystems/base.rb
|
|
73
77
|
- lib/typosquatting/ecosystems/cargo.rb
|
|
74
78
|
- lib/typosquatting/ecosystems/composer.rb
|
|
79
|
+
- lib/typosquatting/ecosystems/github_actions.rb
|
|
75
80
|
- lib/typosquatting/ecosystems/golang.rb
|
|
76
81
|
- lib/typosquatting/ecosystems/hex.rb
|
|
77
82
|
- lib/typosquatting/ecosystems/maven.rb
|