typosquatting 0.1.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: e50bfd6b6ae458a3c588600cf0ae4e3fe7a551bf801d0133dca2c88d5df423d6
-   data.tar.gz: d9f0abe8dd964b970e0f760f807b4b76a2f71d3ad9a05b32dcf5b19ce1438f76
+   metadata.gz: 59d9c744171a8ac32d88c7218078d24ce9a27da2822b03c5127569a9cf91be46
+   data.tar.gz: eb928d00e9d2f3eb5c195628c9a7bc8eaf33e02bc04ce07eab98449b48e3f12c
  SHA512:
-   metadata.gz: 9cf712d35089a972dd4b9cd47139cb0b793a901f2665de3027629c0f6f39faa1216ef2df4092ee04bc21c89b5c9e2f44e4b26dcbf95598c9b64a151793b3f6be
-   data.tar.gz: 4bb644fb9af9173051c6de93b5d0b5e88f3dac01ff6873dba102dfb2a72d64e0acda77daf7116597a1a93945e050a04f9603c18a039f6faca0761be51e310142
+   metadata.gz: ef4a6f706d3bd5a53d603c1c7d124fd317241ecff5ec898dea1df810ae5e65173986ab3d329ddb7300c89c2608c797b99369425eeb7910cb40f7574de99dfd00
+   data.tar.gz: 93643869152bc1c8ee092a0289c5c49d8615f69adb849f1f5343d8f11748b425f48b6a1e6028223432cf1eecb725fae24181e9d218297657a9fa61e1309675ef
data/CHANGELOG.md CHANGED
@@ -1,5 +1,19 @@
  ## [Unreleased]

+ ## [0.3.0] - 2025-12-17
+
+ - Add `discover` command to find existing similar packages by edit distance using prefix/postfix API
+
+ ## [0.2.0] - 2025-12-17
+
+ - Add GitHub Actions ecosystem for CI/CD workflow typosquatting detection
+ - Add namespace-aware variant generation for ecosystems with owner/vendor (Go, Composer, npm scoped packages)
+ - Add bitflip algorithm for bitsquatting attacks
+ - Add adjacent_insertion algorithm for inserting adjacent keyboard characters
+ - Add double_hit algorithm for replacing consecutive identical characters with adjacent keys
+ - Add length-aware algorithm filtering to reduce false positives for short package names (under 5 chars)
+ - Add combosquatting algorithm for common package suffixes (-js, -py, -cli, -lite, etc.)
+
  ## [0.1.0] - 2025-12-16

  - Initial release
data/README.md CHANGED
@@ -2,7 +2,7 @@

  Detect potential typosquatting packages across package ecosystems. Generate typosquat variants of package names and check if they exist on package registries.

- Supports PyPI, npm, RubyGems, Cargo, Go, Maven, NuGet, Composer, Hex, and Pub.
+ Supports PyPI, npm, RubyGems, Cargo, Go, Maven, NuGet, Composer, Hex, Pub, and GitHub Actions.

  ## When to use this

@@ -17,6 +17,8 @@ This tool helps you:

  False positives are common. A package named `request` isn't necessarily a typosquat of `requests`. Use the output as a starting point for investigation, not as a definitive verdict.

+ Short package names (under 5 characters) produce more false positives because many legitimate short packages exist. By default, the generator uses only high-confidence algorithms (homoglyph, repetition, replacement, transposition) for short names. Use `--no-length-filter` to disable this and run all algorithms regardless of name length.
+
  ## Installation

  ```bash
@@ -53,6 +55,9 @@ typosquatting check requests -e pypi --dry-run
  # Check for dependency confusion risks
  typosquatting confusion com.company:internal-lib -e maven

+ # Check GitHub Actions for typosquats
+ typosquatting check actions/checkout -e github_actions
+
  # Check multiple packages from a file
  typosquatting confusion -e maven --file internal-packages.txt

@@ -64,6 +69,12 @@ typosquatting check requests -e pypi -f json

  # List available algorithms
  typosquatting algorithms
+
+ # Discover existing packages similar to a target (by edit distance)
+ typosquatting discover requests -e pypi
+
+ # Discover with generated variants check
+ typosquatting discover requests -e pypi --with-variants
  ```

  ## Example Output
@@ -158,6 +169,7 @@ Use these identifiers with the `-e` / `--ecosystem` flag:
  | `composer` | Packagist | No | `-` `_` `.` | `vendor/package` format |
  | `hex` | hex.pm | No | `_` | Underscore only, no hyphens |
  | `pub` | pub.dev | No | `_` | Underscore only, 2-64 chars |
+ | `github_actions` | GitHub | No | `-` `_` `.` | `owner/repo` format, targets CI/CD workflows |

  ## Algorithms

@@ -177,6 +189,10 @@ Use these names with the `-a` / `--algorithms` flag (comma-separated):
  | `plural` | Singularize/pluralize | `request` -> `requests` |
  | `misspelling` | Common typos | `library` -> `libary` |
  | `numeral` | Number/word swap | `lib2` -> `libtwo` |
+ | `bitflip` | Single-bit errors (bitsquatting) | `google` -> `coogle` |
+ | `adjacent_insertion` | Insert adjacent keyboard key | `google` -> `googhle` |
+ | `double_hit` | Replace double chars with adjacent | `google` -> `giigle` |
+ | `combosquatting` | Add common package suffixes | `lodash` -> `lodash-js` |

  ## SBOM Support

@@ -194,6 +210,38 @@ Package lookups use the [ecosyste.ms](https://packages.ecosyste.ms) API. Request

  Be mindful when checking many packages. The `--dry-run` flag shows what would be checked without making API calls.

+ ### packages.ecosyste.ms API
+
+ The package_names endpoint can help identify potential typosquats by searching for packages with similar prefixes or postfixes to popular package names.
+
+ ```
+ GET /api/v1/registries/{registry}/package_names
+ ```
+
+ **Parameters:**
+ - `prefix` - filter by package names starting with string (case insensitive)
+ - `postfix` - filter by package names ending with string (case insensitive)
+ - `page`, `per_page` - pagination
+ - `sort`, `order` - sorting
+
+ **Examples:**
+ ```
+ # Find RubyGems packages ending in "ails" (potential "rails" typosquats)
+ https://packages.ecosyste.ms/api/v1/registries/rubygems.org/package_names?postfix=ails
+
+ # Find RubyGems packages starting with "rai" (potential "rails" typosquats)
+ https://packages.ecosyste.ms/api/v1/registries/rubygems.org/package_names?prefix=rai
+
+ # Find npm packages starting with "reac" (potential "react" typosquats)
+ https://packages.ecosyste.ms/api/v1/registries/npmjs.org/package_names?prefix=reac
+ ```
+
+ Full API documentation: [packages.ecosyste.ms/docs](https://packages.ecosyste.ms/docs)
+
+ ## Dataset
+
+ The [ecosyste-ms/typosquatting-dataset](https://github.com/ecosyste-ms/typosquatting-dataset) contains 143 confirmed typosquatting attacks from security research, mapping malicious packages to their targets with classification and source attribution. Useful for testing detection tools and understanding real attack patterns.
+
  ## Development

  ```bash
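Reviewer note (not part of the diff): the `package_names` URLs in the README hunk above can be assembled with the standard library alone. The sketch below only builds the URL string and makes no request; the helper name `package_names_url` is hypothetical, and the host and path are taken from the README examples.

```ruby
require "uri"

# Hypothetical helper: builds a package_names URL from the documented
# `prefix` / `postfix` parameters. It does not perform any HTTP call.
def package_names_url(registry, prefix: nil, postfix: nil)
  base = "https://packages.ecosyste.ms/api/v1/registries/" \
         "#{URI.encode_www_form_component(registry)}/package_names"
  # Drop parameters that were not supplied, then form-encode the rest.
  query = URI.encode_www_form({ prefix: prefix, postfix: postfix }.compact)
  query.empty? ? base : "#{base}?#{query}"
end

puts package_names_url("rubygems.org", postfix: "ails")
puts package_names_url("npmjs.org", prefix: "reac")
```

Encoding both the registry and the filter values keeps names containing `/` or `@` from breaking the path or query string.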
data/lib/typosquatting/algorithms/adjacent_insertion.rb ADDED
@@ -0,0 +1,23 @@
+ # frozen_string_literal: true
+
+ module Typosquatting
+   module Algorithms
+     class AdjacentInsertion < Base
+       KEYBOARD_ADJACENT = Replacement::KEYBOARD_ADJACENT
+
+       def generate(package_name)
+         variants = []
+
+         package_name.each_char.with_index do |char, i|
+           adjacent = KEYBOARD_ADJACENT[char.downcase] || []
+           adjacent.each do |adj_char|
+             variants << package_name[0..i] + adj_char + package_name[(i + 1)..]
+             variants << package_name[0...i] + adj_char + package_name[i..]
+           end
+         end
+
+         variants.uniq
+       end
+     end
+   end
+ end
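Reviewer note (not part of the diff): a standalone sketch of the adjacent-insertion idea, useful for checking the README example `google` -> `googhle`. The two-entry `SAMPLE_ADJACENT` map is hypothetical; the gem's real table is `Replacement::KEYBOARD_ADJACENT`, which this diff does not show.

```ruby
# Hypothetical, deliberately tiny adjacency map for illustration only.
SAMPLE_ADJACENT = { "g" => %w[f h], "o" => %w[i p] }.freeze

# For every character, insert each neighbouring key once after it and
# once before it, mirroring the two `variants <<` lines in the hunk above.
def adjacent_insertions(name, adjacency = SAMPLE_ADJACENT)
  variants = []
  name.each_char.with_index do |char, i|
    (adjacency[char.downcase] || []).each do |adj|
      variants << name[0..i] + adj + name[(i + 1)..] # inserted after position i
      variants << name[0...i] + adj + name[i..]      # inserted before position i
    end
  end
  variants.uniq
end

p adjacent_insertions("go")
```

Each character with n neighbours contributes up to 2n candidates, so the output grows quickly for long names; the final `uniq` removes the overlaps between "after i" and "before i + 1".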
data/lib/typosquatting/algorithms/base.rb CHANGED
@@ -26,7 +26,11 @@ module Typosquatting
            WordOrder.new,
            Plural.new,
            Misspelling.new,
-           Numeral.new
+           Numeral.new,
+           Bitflip.new,
+           AdjacentInsertion.new,
+           DoubleHit.new,
+           Combosquatting.new
          ]
        end
      end
data/lib/typosquatting/algorithms/bitflip.rb ADDED
@@ -0,0 +1,39 @@
+ # frozen_string_literal: true
+
+ module Typosquatting
+   module Algorithms
+     class Bitflip < Base
+       VALID_CHARS = (("a".."z").to_a + ("0".."9").to_a + %w[- _]).freeze
+
+       def generate(package_name)
+         variants = []
+
+         package_name.each_char.with_index do |char, i|
+           flipped = bitflip_char(char)
+           flipped.each do |new_char|
+             next unless VALID_CHARS.include?(new_char)
+
+             variant = package_name[0...i] + new_char + package_name[(i + 1)..]
+             variants << variant
+           end
+         end
+
+         variants.uniq
+       end
+
+       def bitflip_char(char)
+         byte = char.ord
+         results = []
+
+         8.times do |bit|
+           flipped_byte = byte ^ (1 << bit)
+           next if flipped_byte > 127 || flipped_byte < 32
+
+           results << flipped_byte.chr
+         end
+
+         results.reject { |c| c == char }
+       end
+     end
+   end
+ end
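Reviewer note (not part of the diff): a small sketch of the single-bit-flip step, to show why the README lists `google` -> `coogle`. The helper name `single_bit_neighbours` is hypothetical; the character whitelist matches `VALID_CHARS` in the hunk above.

```ruby
VALID = (("a".."z").to_a + ("0".."9").to_a + %w[- _]).freeze

# XOR the character code with each of the 8 single-bit masks, keep only
# printable 7-bit results, then keep only characters allowed in names.
def single_bit_neighbours(char)
  (0...8).map { |bit| char.ord ^ (1 << bit) }
         .select { |byte| byte.between?(32, 127) }
         .map(&:chr)
         .select { |c| VALID.include?(c) }
end

# "g" is 0b1100111; flipping bit 2 gives 0b1100011, which is "c".
p single_bit_neighbours("g")
```

Flipping bit 5 of a lowercase letter yields its uppercase form, which the whitelist drops, so case-only variants are never produced by this step.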
data/lib/typosquatting/algorithms/combosquatting.rb ADDED
@@ -0,0 +1,40 @@
+ # frozen_string_literal: true
+
+ module Typosquatting
+   module Algorithms
+     class Combosquatting < Base
+       SUFFIXES = %w[
+         js .js -js
+         py -py -python python
+         -node node- -npm npm-
+         -cli -api -core -utils -util -lib -pkg
+         -lite -dev -test -beta -alpha
+         -compat -legacy -next -new -v2
+         -simd -fast -async
+         s -s
+       ].freeze
+
+       PREFIXES = %w[
+         py- python-
+         node- npm-
+         go-
+         js-
+         my- the- a-
+       ].freeze
+
+       def generate(package_name)
+         variants = []
+
+         SUFFIXES.each do |suffix|
+           variants << "#{package_name}#{suffix}"
+         end
+
+         PREFIXES.each do |prefix|
+           variants << "#{prefix}#{package_name}"
+         end
+
+         variants.uniq
+       end
+     end
+   end
+ end
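Reviewer note (not part of the diff): a reduced sketch of the affix expansion, using a hypothetical subset of the lists above. It also makes one detail visible: `SUFFIXES` contains entries such as `node-` that end in a hyphen, so appending them yields names like `lodashnode-`. Whether those survive depends on each ecosystem's name validation, which is not shown in this hunk.

```ruby
# Hypothetical subsets of the SUFFIXES / PREFIXES constants shown above.
SAMPLE_SUFFIXES = %w[-js -cli node-].freeze
SAMPLE_PREFIXES = %w[py- node-].freeze

# Append every suffix and prepend every prefix, as `generate` does.
def combo_variants(name, suffixes: SAMPLE_SUFFIXES, prefixes: SAMPLE_PREFIXES)
  (suffixes.map { |s| "#{name}#{s}" } + prefixes.map { |pre| "#{pre}#{name}" }).uniq
end

p combo_variants("lodash")
```

The number of variants is fixed by the list sizes rather than by the name length, which makes this algorithm cheap to run but dependent on how well the lists match real attack naming.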
data/lib/typosquatting/algorithms/double_hit.rb ADDED
@@ -0,0 +1,27 @@
+ # frozen_string_literal: true
+
+ module Typosquatting
+   module Algorithms
+     class DoubleHit < Base
+       KEYBOARD_ADJACENT = Replacement::KEYBOARD_ADJACENT
+
+       def generate(package_name)
+         variants = []
+
+         (package_name.length - 1).times do |i|
+           next unless package_name[i] == package_name[i + 1]
+
+           char = package_name[i].downcase
+           adjacent = KEYBOARD_ADJACENT[char] || []
+
+           adjacent.each do |adj_char|
+             variant = package_name[0...i] + adj_char + adj_char + package_name[(i + 2)..]
+             variants << variant
+           end
+         end
+
+         variants.uniq
+       end
+     end
+   end
+ end
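Reviewer note (not part of the diff): a compact sketch of the double-hit rule, where a doubled character is replaced by a doubled neighbouring key. The adjacency hash passed in is hypothetical sample data; the gem reads `Replacement::KEYBOARD_ADJACENT`, which is not part of this diff.

```ruby
# Find each pair of identical consecutive characters and replace the
# pair with two copies of every neighbouring key.
def double_hits(name, adjacency)
  (0...(name.length - 1)).flat_map do |i|
    next [] unless name[i] == name[i + 1]

    (adjacency[name[i].downcase] || []).map do |adj|
      name[0...i] + adj * 2 + name[(i + 2)..]
    end
  end.uniq
end

p double_hits("google", { "o" => %w[i p] })
```

Names without a doubled character produce no variants at all, so this algorithm only contributes for targets such as `google` or `tweet`.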
@@ -16,6 +16,8 @@ module Typosquatting
          generate(args)
        when "check"
          check(args)
+       when "discover"
+         discover(args)
        when "confusion"
          confusion(args)
        when "sbom"
@@ -36,13 +38,14 @@ module Typosquatting
      end

      def generate(args)
-       options = { format: "text", verbose: false }
+       options = { format: "text", verbose: false, length_filtering: true }
        parser = OptionParser.new do |opts|
          opts.banner = "Usage: typosquatting generate PACKAGE -e ECOSYSTEM [options]"
          opts.on("-e", "--ecosystem ECOSYSTEM", "Package ecosystem (required)") { |v| options[:ecosystem] = v }
          opts.on("-f", "--format FORMAT", "Output format (text, json, csv)") { |v| options[:format] = v }
          opts.on("-v", "--verbose", "Show algorithm for each variant") { options[:verbose] = true }
          opts.on("-a", "--algorithms LIST", "Comma-separated list of algorithms to use") { |v| options[:algorithms] = v }
+         opts.on("--no-length-filter", "Disable length-based algorithm filtering for short names") { options[:length_filtering] = false }
        end
        parser.parse!(args)

@@ -55,14 +58,14 @@ module Typosquatting

        ecosystem = Ecosystems::Base.get(options[:ecosystem])
        algorithms = select_algorithms(options[:algorithms])
-       generator = Generator.new(ecosystem: ecosystem, algorithms: algorithms)
+       generator = Generator.new(ecosystem: ecosystem, algorithms: algorithms, length_filtering: options[:length_filtering])
        variants = generator.generate(package)

        output_variants(variants, options)
      end

      def check(args)
-       options = { format: "text", verbose: false, existing_only: false, dry_run: false }
+       options = { format: "text", verbose: false, existing_only: false, dry_run: false, length_filtering: true }
        parser = OptionParser.new do |opts|
          opts.banner = "Usage: typosquatting check PACKAGE -e ECOSYSTEM [options]"
          opts.on("-e", "--ecosystem ECOSYSTEM", "Package ecosystem (required)") { |v| options[:ecosystem] = v }
@@ -71,6 +74,7 @@ module Typosquatting
          opts.on("-a", "--algorithms LIST", "Comma-separated list of algorithms to use") { |v| options[:algorithms] = v }
          opts.on("--existing-only", "Only show packages that exist") { options[:existing_only] = true }
          opts.on("--dry-run", "Show variants without making API calls") { options[:dry_run] = true }
+         opts.on("--no-length-filter", "Disable length-based algorithm filtering for short names") { options[:length_filtering] = false }
        end
        parser.parse!(args)

@@ -83,7 +87,7 @@ module Typosquatting

        ecosystem = Ecosystems::Base.get(options[:ecosystem])
        algorithms = select_algorithms(options[:algorithms])
-       generator = Generator.new(ecosystem: ecosystem, algorithms: algorithms)
+       generator = Generator.new(ecosystem: ecosystem, algorithms: algorithms, length_filtering: options[:length_filtering])
        variants = generator.generate(package)

        if options[:dry_run]
@@ -99,6 +103,39 @@ module Typosquatting
        output_check_results(results, options)
      end

+     def discover(args)
+       options = { format: "text", max_distance: 2 }
+       parser = OptionParser.new do |opts|
+         opts.banner = "Usage: typosquatting discover PACKAGE -e ECOSYSTEM [options]"
+         opts.on("-e", "--ecosystem ECOSYSTEM", "Package ecosystem (required)") { |v| options[:ecosystem] = v }
+         opts.on("-f", "--format FORMAT", "Output format (text, json)") { |v| options[:format] = v }
+         opts.on("-d", "--distance N", Integer, "Maximum edit distance (default: 2)") { |v| options[:max_distance] = v }
+         opts.on("--with-variants", "Also show which generated variants exist") { options[:with_variants] = true }
+       end
+       parser.parse!(args)
+
+       package = args.shift
+       unless package && options[:ecosystem]
+         $stderr.puts "Error: Package name and ecosystem required"
+         $stderr.puts parser
+         exit 1
+       end
+
+       lookup = Lookup.new(ecosystem: options[:ecosystem])
+
+       $stderr.puts "Discovering similar packages to #{package}..." if $stderr.tty?
+       results = lookup.discover(package, max_distance: options[:max_distance])
+
+       if options[:with_variants]
+         generator = Generator.new(ecosystem: options[:ecosystem])
+         variants = generator.generate(package)
+         variant_results = lookup.check_with_variants(package, variants)
+         existing_variants = variant_results.select(&:exists?)
+       end
+
+       output_discover_results(results, existing_variants, options)
+     end
+
      def confusion(args)
        options = { format: "text" }
        parser = OptionParser.new do |opts|
@@ -177,16 +214,17 @@ module Typosquatting
      def ecosystems
        puts "Supported ecosystems:"
        puts ""
-       puts " pypi - Python Package Index"
-       puts " npm - Node Package Manager"
-       puts " gem - RubyGems"
-       puts " cargo - Rust packages"
-       puts " golang - Go modules"
-       puts " maven - Java/JVM packages"
-       puts " nuget - .NET packages"
-       puts " composer - PHP packages"
-       puts " hex - Erlang/Elixir packages"
-       puts " pub - Dart packages"
+       puts " pypi - Python Package Index"
+       puts " npm - Node Package Manager"
+       puts " gem - RubyGems"
+       puts " cargo - Rust packages"
+       puts " golang - Go modules"
+       puts " maven - Java/JVM packages"
+       puts " nuget - .NET packages"
+       puts " composer - PHP packages"
+       puts " hex - Erlang/Elixir packages"
+       puts " pub - Dart packages"
+       puts " github_actions - GitHub Actions"
      end

      def algorithms
@@ -209,6 +247,7 @@ module Typosquatting
        puts "Commands:"
        puts " generate PACKAGE -e ECOSYSTEM Generate typosquat variants"
        puts " check PACKAGE -e ECOSYSTEM Check which variants exist"
+       puts " discover PACKAGE -e ECOSYSTEM Find similar packages by edit distance"
        puts " confusion PACKAGE -e ECOSYSTEM Check for dependency confusion"
        puts " sbom FILE Check SBOM for potential typosquats"
        puts " ecosystems List supported ecosystems"
@@ -219,6 +258,7 @@ module Typosquatting
        puts "Examples:"
        puts " typosquatting generate requests -e pypi"
        puts " typosquatting check requests -e pypi --existing-only"
+       puts " typosquatting discover rails -e gem --with-variants"
        puts " typosquatting confusion my-package -e maven"
        puts " typosquatting sbom bom.json"
      end
@@ -376,5 +416,42 @@ module Typosquatting
          puts "Found #{results.length} suspicious package(s)"
        end
      end
+
+     def output_discover_results(discovered, existing_variants, options)
+       case options[:format]
+       when "json"
+         data = {
+           discovered: discovered.map(&:to_h),
+           existing_variants: existing_variants&.map(&:to_h)
+         }.compact
+         puts JSON.pretty_generate(data)
+       else
+         if discovered.empty? && (existing_variants.nil? || existing_variants.empty?)
+           puts "No similar packages found"
+           return
+         end
+
+         if discovered.any?
+           puts "Similar packages found (by edit distance):"
+           puts ""
+           discovered.each do |result|
+             puts " #{result.name} (distance: #{result.distance})"
+           end
+           puts ""
+         end
+
+         if existing_variants&.any?
+           puts "Generated variants that exist:"
+           puts ""
+           existing_variants.each do |result|
+             puts " #{result.name}"
+           end
+           puts ""
+         end
+
+         puts "Found #{discovered.length} similar package(s)"
+         puts "Found #{existing_variants.length} existing variant(s)" if existing_variants&.any?
+       end
+     end
    end
  end
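Reviewer note (not part of the diff): `discover` assigns `existing_variants` only inside the `--with-variants` branch, yet passes it to `output_discover_results` unconditionally. That works because Ruby defines a local variable as soon as the parser sees an assignment to it, so it evaluates to `nil` when the branch is skipped, and `.compact` then drops the key from the JSON output. The sketch below reproduces both behaviours with hypothetical helper names and sample data.

```ruby
require "json"

# Same shape as `discover`: the local is assigned only inside the `if`.
def existing_variants_for(with_variants)
  if with_variants
    existing_variants = [{ name: "reqeusts", exists: true }] # hypothetical data
  end
  existing_variants # nil when the branch above did not run
end

# Same shape as the JSON branch: nil values are removed by `compact`.
def discover_payload(discovered, existing_variants)
  { discovered: discovered, existing_variants: existing_variants }.compact
end

puts JSON.pretty_generate(discover_payload([], existing_variants_for(false)))
puts JSON.pretty_generate(discover_payload([], existing_variants_for(true)))
```

A consumer of the JSON output therefore has to treat `existing_variants` as optional: the key is absent without `--with-variants` and is an array (possibly empty) with it.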
data/lib/typosquatting/ecosystems/github_actions.rb ADDED
@@ -0,0 +1,84 @@
+ # frozen_string_literal: true
+
+ module Typosquatting
+   module Ecosystems
+     class GithubActions < Base
+       def initialize
+         super
+         @name = "github_actions"
+         @purl_type = "github"
+       end
+
+       def name_pattern
+         /\A[a-zA-Z0-9][a-zA-Z0-9-]*\/[a-zA-Z0-9._-]+\z/
+       end
+
+       def allowed_characters
+         /[a-zA-Z0-9._-]/
+       end
+
+       def allowed_delimiters
+         %w[- _ .]
+       end
+
+       def case_sensitive?
+         false
+       end
+
+       def supports_namespaces?
+         true
+       end
+
+       def normalise(name)
+         name.downcase.sub(/@.*$/, "")
+       end
+
+       def parse_namespace(name)
+         clean_name = name.sub(/@.*$/, "")
+         parts = clean_name.split("/", 2)
+         if parts.length == 2
+           [parts[0], parts[1]]
+         else
+           [nil, name]
+         end
+       end
+
+       def valid_name?(name)
+         return false if name.nil? || name.empty?
+
+         clean_name = name.sub(/@.*$/, "")
+         owner, repo = parse_namespace(clean_name)
+
+         return false if owner.nil? || repo.nil?
+         return false if owner.empty? || repo.empty?
+
+         return false unless valid_owner?(owner)
+         return false unless valid_repo?(repo)
+
+         true
+       end
+
+       def format_name(owner, repo)
+         "#{owner}/#{repo}"
+       end
+
+       def valid_owner?(owner)
+         return false if owner.length > 39
+         return false if owner.start_with?("-")
+         return false if owner.end_with?("-")
+         return false if owner.include?("--")
+
+         !!(owner =~ /\A[a-zA-Z0-9][a-zA-Z0-9-]*\z/)
+       end
+
+       def valid_repo?(repo)
+         return false if repo.length > 100
+         return false if repo.start_with?(".")
+
+         !!(repo =~ /\A[a-zA-Z0-9._-]+\z/)
+       end
+     end
+
+     Base.register(GithubActions.new)
+   end
+ end
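Reviewer note (not part of the diff): a condensed sketch of the `owner/repo` checks above, convenient for trying individual names. The function name `action_name_valid?` is hypothetical; the limits (39 and 100 characters) and patterns are copied from the hunk. One consequence worth knowing: a reference to an action in a subdirectory (`owner/repo/path`) is rejected, because everything after the first `/` is treated as the repository name and `/` is not an allowed character there.

```ruby
OWNER_RE = /\A[a-zA-Z0-9][a-zA-Z0-9-]*\z/
REPO_RE  = /\A[a-zA-Z0-9._-]+\z/

# Strip an optional @ref, split once on "/", then apply the owner and
# repo rules from valid_owner? / valid_repo? in a single pass.
def action_name_valid?(name)
  owner, repo = name.to_s.sub(/@.*$/, "").split("/", 2)
  return false if owner.nil? || repo.nil? || owner.empty? || repo.empty?
  return false if owner.length > 39 || owner.end_with?("-") || owner.include?("--")
  return false if repo.length > 100 || repo.start_with?(".")

  OWNER_RE.match?(owner) && REPO_RE.match?(repo)
end

%w[actions/checkout@v4 actions a--b/repo owner/.hidden actions/checkout/sub].each do |n|
  puts "#{n}: #{action_name_valid?(n)}"
end
```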
data/lib/typosquatting/ecosystems/golang.rb CHANGED
@@ -49,6 +49,10 @@ module Typosquatting

          !!(name =~ name_pattern)
        end
+
+       def format_name(namespace, name)
+         "#{namespace}/#{name}"
+       end
      end

      Base.register(Golang.new)
data/lib/typosquatting/ecosystems/npm.rb CHANGED
@@ -59,6 +59,14 @@ module Typosquatting

          true
        end
+
+       def format_name(namespace, name)
+         if namespace
+           "#{namespace}/#{name}"
+         else
+           name
+         end
+       end
      end

      Base.register(Npm.new)
data/lib/typosquatting/generator.rb CHANGED
@@ -2,17 +2,47 @@

  module Typosquatting
    class Generator
-     attr_reader :ecosystem, :algorithms
+     SHORT_NAME_THRESHOLD = 5

-     def initialize(ecosystem:, algorithms: nil)
+     HIGH_CONFIDENCE_ALGORITHMS = %w[
+       homoglyph
+       repetition
+       replacement
+       transposition
+     ].freeze
+
+     attr_reader :ecosystem, :algorithms, :length_filtering
+
+     def initialize(ecosystem:, algorithms: nil, length_filtering: true)
        @ecosystem = ecosystem.is_a?(String) ? Ecosystems::Base.get(ecosystem) : ecosystem
        @algorithms = algorithms || Algorithms::Base.all
+       @length_filtering = length_filtering
      end

      def generate(package_name)
        results = []

-       algorithms.each do |algorithm|
+       if ecosystem.supports_namespaces?
+         results.concat(generate_namespace_aware(package_name))
+       else
+         results.concat(generate_simple(package_name))
+       end
+
+       dedupe_by_normalised_name(results)
+     end
+
+     def algorithms_for_length(name_length)
+       return algorithms unless length_filtering
+       return algorithms if name_length >= SHORT_NAME_THRESHOLD
+
+       algorithms.select { |a| HIGH_CONFIDENCE_ALGORITHMS.include?(a.name) }
+     end
+
+     def generate_simple(package_name)
+       results = []
+       active_algorithms = algorithms_for_length(package_name.length)
+
+       active_algorithms.each do |algorithm|
          variants = algorithm.generate(package_name)
          variants.each do |variant|
            next if variant == package_name
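Reviewer note (not part of the diff): the length filter in `algorithms_for_length` can be exercised in isolation. The sketch below works on algorithm names (plain strings) rather than algorithm objects, which is a simplification; the threshold and the high-confidence list are copied from the hunk above, and the helper name is hypothetical.

```ruby
SHORT_NAME_THRESHOLD = 5
HIGH_CONFIDENCE = %w[homoglyph repetition replacement transposition].freeze

# Names shorter than the threshold only get the high-confidence
# algorithms, unless filtering is switched off (--no-length-filter).
def algorithm_names_for(name, all_names, length_filtering: true)
  return all_names unless length_filtering
  return all_names if name.length >= SHORT_NAME_THRESHOLD

  all_names.select { |n| HIGH_CONFIDENCE.include?(n) }
end

available = %w[homoglyph plural bitflip combosquatting]
p algorithm_names_for("rake", available)                          # 4 chars: filtered
p algorithm_names_for("rails", available)                         # 5 chars: unfiltered
p algorithm_names_for("rake", available, length_filtering: false) # filter disabled
```

The comparison is `>=`, so a name of exactly five characters already runs every algorithm; only names of four characters or fewer are restricted, which matches the README wording "under 5 characters".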
@@ -27,7 +57,59 @@ module Typosquatting
          end
        end

-       dedupe_by_normalised_name(results)
+       results
+     end
+
+     def generate_namespace_aware(package_name)
+       namespace, name = ecosystem.parse_namespace(package_name)
+       results = []
+
+       return generate_simple(package_name) if namespace.nil?
+
+       namespace_algorithms = algorithms_for_length(namespace.length)
+       name_algorithms = algorithms_for_length(name.length)
+
+       namespace_algorithms.each do |algorithm|
+         namespace_variants = algorithm.generate(namespace)
+         namespace_variants.each do |ns_variant|
+           full_name = rebuild_namespaced_name(ns_variant, name)
+           next if full_name == package_name
+           next unless ecosystem.valid_name?(full_name)
+           next if same_after_normalisation?(package_name, full_name)
+
+           results << Variant.new(
+             name: full_name,
+             algorithm: algorithm.name,
+             original: package_name
+           )
+         end
+       end
+
+       name_algorithms.each do |algorithm|
+         name_variants = algorithm.generate(name)
+         name_variants.each do |name_variant|
+           full_name = rebuild_namespaced_name(namespace, name_variant)
+           next if full_name == package_name
+           next unless ecosystem.valid_name?(full_name)
+           next if same_after_normalisation?(package_name, full_name)
+
+           results << Variant.new(
+             name: full_name,
+             algorithm: algorithm.name,
+             original: package_name
+           )
+         end
+       end
+
+       results
+     end
+
+     def rebuild_namespaced_name(namespace, name)
+       if ecosystem.respond_to?(:format_name)
+         ecosystem.format_name(namespace, name)
+       else
+         "#{namespace}/#{name}"
+       end
      end

      Variant = Struct.new(:name, :algorithm, :original, keyword_init: true) do
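Reviewer note (not part of the diff): the namespace-aware path mutates the owner and the package name separately and never both at once. The sketch below shows that shape with a hypothetical helper and a toy mutation passed as a block; it splits on the first `/` directly, whereas the gem delegates to each ecosystem's `parse_namespace` and `format_name`.

```ruby
# Apply the mutation block to the namespace and to the name in turn,
# recombining each result with the untouched other half.
def namespace_variants(full_name)
  namespace, name = full_name.split("/", 2)
  if name.nil?
    plain = yield(full_name) # no namespace: mutate the whole name
    return plain.uniq - [full_name]
  end

  owner_side = yield(namespace).map { |ns| "#{ns}/#{name}" }
  name_side  = yield(name).map { |n| "#{namespace}/#{n}" }
  (owner_side + name_side).uniq - [full_name]
end

# Toy mutation for illustration: append an "s".
p namespace_variants("actions/checkout") { |part| ["#{part}s"] }
p namespace_variants("requests") { |part| ["#{part}s"] }
```

Because the two halves are processed independently, the length filter in the hunk above is also applied per half: a short owner such as `aws` is restricted to high-confidence algorithms even when the repository name is long.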
data/lib/typosquatting/lookup.rb CHANGED
@@ -4,6 +4,7 @@ require "net/http"
  require "json"
  require "uri"
  require "purl"
+ require "set"

  module Typosquatting
    class Lookup
@@ -51,6 +52,92 @@ module Typosquatting
        response&.map { |r| Registry.new(r) } || []
      end

+     def list_names(registry:, prefix: nil, postfix: nil)
+       params = []
+       params << "prefix=#{URI.encode_www_form_component(prefix)}" if prefix
+       params << "postfix=#{URI.encode_www_form_component(postfix)}" if postfix
+       query = params.empty? ? "" : "?#{params.join("&")}"
+
+       fetch("/registries/#{URI.encode_www_form_component(registry)}/package_names#{query}") || []
+     end
+
+     def discover(package_name, max_distance: 2)
+       registry = registries.first
+       return [] unless registry
+
+       prefix = package_name[0, 3]
+       candidates = list_names(registry: registry.name, prefix: prefix)
+
+       candidates.filter_map do |candidate|
+         next if candidate == package_name
+
+         distance = levenshtein(package_name.downcase, candidate.downcase)
+         next if distance > max_distance || distance == 0
+
+         DiscoveryResult.new(
+           name: candidate,
+           target: package_name,
+           distance: distance
+         )
+       end.sort_by(&:distance)
+     end
+
+     def check_with_variants(package_name, variants)
+       registry = registries.first
+       return [] unless registry
+
+       prefix = package_name[0, 3]
+       existing = list_names(registry: registry.name, prefix: prefix)
+       existing_set = existing.map(&:downcase).to_set
+
+       variant_names = variants.map { |v| v.is_a?(String) ? v : v.name }
+
+       variant_names.filter_map do |variant|
+         exists = existing_set.include?(variant.downcase)
+         VariantCheckResult.new(
+           name: variant,
+           exists: exists
+         )
+       end
+     end
+
+     def levenshtein(s1, s2)
+       m, n = s1.length, s2.length
+       return n if m == 0
+       return m if n == 0
+
+       d = Array.new(m + 1) { |i| i }
+       x = nil
+
+       (1..n).each do |j|
+         d[0] = j
+         x = j - 1
+
+         (1..m).each do |i|
+           cost = s1[i - 1] == s2[j - 1] ? 0 : 1
+           x, d[i] = d[i], [d[i] + 1, d[i - 1] + 1, x + cost].min
+         end
+       end
+
+       d[m]
+     end
+
+     DiscoveryResult = Struct.new(:name, :target, :distance, keyword_init: true) do
+       def to_h
+         { name: name, target: target, distance: distance }
+       end
+     end
+
+     VariantCheckResult = Struct.new(:name, :exists, keyword_init: true) do
+       def exists?
+         exists
+       end
+
+       def to_h
+         { name: name, exists: exists }
+       end
+     end
+
      Result = Struct.new(:name, :purl, :packages, :ecosystem, keyword_init: true) do
        def exists?
          !packages.empty?
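Reviewer note (not part of the diff): a more verbose sketch of the single-row Levenshtein computation used by `discover`, with explicit names for the "above" and "diagonal" cells. The function name `edit_distance` is hypothetical. The last lines illustrate a property of the lookup as written: candidates are fetched by the first three characters of the target (`package_name[0, 3]`), so a name whose typo falls inside those three characters is never retrieved, even when it is within `max_distance`.

```ruby
# Classic Levenshtein distance using one reusable row of length a.length + 1.
def edit_distance(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  row = (0..a.length).to_a
  (1..b.length).each do |j|
    diagonal = row[0] # value of the cell up-left of (1, j)
    row[0] = j
    (1..a.length).each do |i|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      above = row[i] # still holds the previous row's value
      row[i] = [above + 1, row[i - 1] + 1, diagonal + cost].min
      diagonal = above
    end
  end
  row[a.length]
end

puts edit_distance("requests", "request")  # one deletion
puts edit_distance("rails", "rials")       # a transposition costs two edits

target = "requests"
candidate = "erquests"
puts edit_distance(target, candidate)          # within the default distance of 2
puts candidate.start_with?(target[0, 3])       # but not matched by the "req" prefix
```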
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Typosquatting
-   VERSION = "0.1.0"
+   VERSION = "0.3.0"
  end
data/lib/typosquatting.rb CHANGED
@@ -15,6 +15,10 @@ require_relative "typosquatting/algorithms/word_order"
  require_relative "typosquatting/algorithms/plural"
  require_relative "typosquatting/algorithms/misspelling"
  require_relative "typosquatting/algorithms/numeral"
+ require_relative "typosquatting/algorithms/bitflip"
+ require_relative "typosquatting/algorithms/adjacent_insertion"
+ require_relative "typosquatting/algorithms/double_hit"
+ require_relative "typosquatting/algorithms/combosquatting"

  require_relative "typosquatting/ecosystems/base"
  require_relative "typosquatting/ecosystems/pypi"
@@ -27,6 +31,7 @@ require_relative "typosquatting/ecosystems/nuget"
  require_relative "typosquatting/ecosystems/composer"
  require_relative "typosquatting/ecosystems/hex"
  require_relative "typosquatting/ecosystems/pub"
+ require_relative "typosquatting/ecosystems/github_actions"

  require_relative "typosquatting/generator"
  require_relative "typosquatting/lookup"
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: typosquatting
  version: !ruby/object:Gem::Version
-   version: 0.1.0
+   version: 0.3.0
  platform: ruby
  authors:
  - Andrew Nesbitt
@@ -55,8 +55,12 @@ files:
  - exe/typosquatting
  - lib/typosquatting.rb
  - lib/typosquatting/algorithms/addition.rb
+ - lib/typosquatting/algorithms/adjacent_insertion.rb
  - lib/typosquatting/algorithms/base.rb
+ - lib/typosquatting/algorithms/bitflip.rb
+ - lib/typosquatting/algorithms/combosquatting.rb
  - lib/typosquatting/algorithms/delimiter.rb
+ - lib/typosquatting/algorithms/double_hit.rb
  - lib/typosquatting/algorithms/homoglyph.rb
  - lib/typosquatting/algorithms/misspelling.rb
  - lib/typosquatting/algorithms/numeral.rb
@@ -72,6 +76,7 @@ files:
  - lib/typosquatting/ecosystems/base.rb
  - lib/typosquatting/ecosystems/cargo.rb
  - lib/typosquatting/ecosystems/composer.rb
+ - lib/typosquatting/ecosystems/github_actions.rb
  - lib/typosquatting/ecosystems/golang.rb
  - lib/typosquatting/ecosystems/hex.rb
  - lib/typosquatting/ecosystems/maven.rb