typosquatting 0.2.0 → 0.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 57ce19f59014bac56c5922b53d59c794ef8b55c3ff13cde363db2bef133aff23
-   data.tar.gz: facd26dd6b71803eadfb0c2395f7a0a1249d25992c38d5cf07a15a255de91d69
+   metadata.gz: fc18d0104e41766e5b0b8603d786cde1ff74f7b65360509a8d880732b1d5f6ca
+   data.tar.gz: fb48d984d14196d0ccdb2765ee14d986f614d9d1285c329fe6b57d8fe1cb35a6
  SHA512:
-   metadata.gz: f1de5348e69ee48a5eadcd7fccc310b5f7224f888e84a69bac5ba5fd6bd7d01a64638650834e39ad9f7c55083ec6dd9b3613ed83aefe6019724325a5fcf3b07d
-   data.tar.gz: 40fe04d1f03d7917bac5663a5a22f90b0bfef24307f9656d9401cb77dbff710f332de4c15c7165a564f81ca6a701ea783fd26ceea335b09309260d1526dc87ef
+   metadata.gz: 0b9103d71382bbdfb8af88843a4c73d0abf99a65da6b494027c21d3327eb31f8261545c95514dd0bd4b35b7f681c54d70e4a1abb819d9ac020fcceebd96784d5
+   data.tar.gz: 122e21bf9b46b6379ea2d925998797161020500e56d440f916a43f995faf4865bad5d60cd5077e2e2d33ebc6cd4aa9046af9e419ac02a39729a156fb016376cf
data/CHANGELOG.md CHANGED
@@ -1,5 +1,14 @@
  ## [Unreleased]

+ ## [0.4.0] - 2026-01-02
+
+ - Skip intra-namespace typosquats for scoped packages (npm, composer, golang) since namespace owners control all packages under their namespace
+ - Add Dockerfile for running without Ruby installed
+
+ ## [0.3.0] - 2025-12-17
+
+ - Add `discover` command to find existing similar packages by edit distance using prefix/postfix API
+
  ## [0.2.0] - 2025-12-17

  - Add GitHub Actions ecosystem for CI/CD workflow typosquatting detection
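The 0.3.0 entry refers to the prefix/postfix lookup on packages.ecosyste.ms, whose `package_names` endpoint is documented in the README changes in this diff. As a rough sketch of how such a query URL could be assembled, assuming only the endpoint path and the `prefix`/`postfix` parameters shown there (`package_names_url` and `BASE_URL` are hypothetical names, not part of the gem):

```ruby
require "uri"

# Hypothetical helper, not part of the typosquatting gem.
BASE_URL = "https://packages.ecosyste.ms/api/v1"

# Builds a package_names URL for a registry, optionally filtered by
# name prefix and/or postfix. Values are form-encoded so that names
# containing reserved characters stay valid in the query string.
def package_names_url(registry, prefix: nil, postfix: nil)
  params = {}
  params[:prefix] = prefix if prefix
  params[:postfix] = postfix if postfix

  path = "#{BASE_URL}/registries/#{URI.encode_www_form_component(registry)}/package_names"
  params.empty? ? path : "#{path}?#{URI.encode_www_form(params)}"
end

puts package_names_url("rubygems.org", postfix: "ails")
puts package_names_url("npmjs.org", prefix: "reac")
```

The gem's own `list_names` method builds the query by hand in a similar way; `URI.encode_www_form` is used here only to keep the sketch short.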
data/Dockerfile ADDED
@@ -0,0 +1,5 @@
+ FROM ruby:3.4-alpine
+
+ RUN gem install typosquatting
+
+ ENTRYPOINT ["typosquatting"]
data/README.md CHANGED
@@ -31,6 +31,19 @@ Or add to your Gemfile:
  gem "typosquatting"
  ```

+ Or run with Docker:
+
+ ```bash
+ docker build -t typosquatting .
+ docker run --rm typosquatting generate requests -e pypi
+ ```
+
+ For commands that read local files (like `sbom`), mount your directory:
+
+ ```bash
+ docker run --rm -v $PWD:/src typosquatting sbom /src/bom.json
+ ```
+
  ## CLI Usage

  ```bash
@@ -69,6 +82,12 @@ typosquatting check requests -e pypi -f json

  # List available algorithms
  typosquatting algorithms
+
+ # Discover existing packages similar to a target (by edit distance)
+ typosquatting discover requests -e pypi
+
+ # Discover with generated variants check
+ typosquatting discover requests -e pypi --with-variants
  ```

  ## Example Output
@@ -204,10 +223,58 @@ Package lookups use the [ecosyste.ms](https://packages.ecosyste.ms) API. Request

  Be mindful when checking many packages. The `--dry-run` flag shows what would be checked without making API calls.

+ ### packages.ecosyste.ms API
+
+ The package_names endpoint can help identify potential typosquats by searching for packages with similar prefixes or postfixes to popular package names.
+
+ ```
+ GET /api/v1/registries/{registry}/package_names
+ ```
+
+ **Parameters:**
+ - `prefix` - filter by package names starting with string (case insensitive)
+ - `postfix` - filter by package names ending with string (case insensitive)
+ - `page`, `per_page` - pagination
+ - `sort`, `order` - sorting
+
+ **Examples:**
+ ```
+ # Find RubyGems packages ending in "ails" (potential "rails" typosquats)
+ https://packages.ecosyste.ms/api/v1/registries/rubygems.org/package_names?postfix=ails
+
+ # Find RubyGems packages starting with "rai" (potential "rails" typosquats)
+ https://packages.ecosyste.ms/api/v1/registries/rubygems.org/package_names?prefix=rai
+
+ # Find npm packages starting with "reac" (potential "react" typosquats)
+ https://packages.ecosyste.ms/api/v1/registries/npmjs.org/package_names?prefix=reac
+ ```
+
+ Full API documentation: [packages.ecosyste.ms/docs](https://packages.ecosyste.ms/docs)
+
  ## Dataset

  The [ecosyste-ms/typosquatting-dataset](https://github.com/ecosyste-ms/typosquatting-dataset) contains 143 confirmed typosquatting attacks from security research, mapping malicious packages to their targets with classification and source attribution. Useful for testing detection tools and understanding real attack patterns.

+ ## Research
+
+ The `research/` directory contains a script to scan "critical" packages (high OpenSSF criticality score) for potential typosquats:
+
+ ```bash
+ # Scan critical RubyGems packages
+ ruby research/critical_packages.rb rubygems.org
+
+ # Scan npm
+ ruby research/critical_packages.rb npmjs.org
+
+ # Include all algorithms (default is high-confidence only)
+ ruby research/critical_packages.rb rubygems.org --all
+
+ # Limit to first N packages for testing
+ ruby research/critical_packages.rb rubygems.org --limit=100
+ ```
+
+ The script generates variants using all library algorithms, checks which exist on the registry, and outputs a CSV with download counts, creation dates, repository URLs, and package status. It filters out packages that predate the target (they can't be typosquats) and packages with high download ratios (likely legitimate), and it flags packages that have been removed (confirmed typosquats).
+
  ## Development

  ```bash
@@ -16,6 +16,8 @@ module Typosquatting
          generate(args)
        when "check"
          check(args)
+       when "discover"
+         discover(args)
        when "confusion"
          confusion(args)
        when "sbom"
@@ -101,6 +103,39 @@ module Typosquatting
        output_check_results(results, options)
      end

+     def discover(args)
+       options = { format: "text", max_distance: 2 }
+       parser = OptionParser.new do |opts|
+         opts.banner = "Usage: typosquatting discover PACKAGE -e ECOSYSTEM [options]"
+         opts.on("-e", "--ecosystem ECOSYSTEM", "Package ecosystem (required)") { |v| options[:ecosystem] = v }
+         opts.on("-f", "--format FORMAT", "Output format (text, json)") { |v| options[:format] = v }
+         opts.on("-d", "--distance N", Integer, "Maximum edit distance (default: 2)") { |v| options[:max_distance] = v }
+         opts.on("--with-variants", "Also show which generated variants exist") { options[:with_variants] = true }
+       end
+       parser.parse!(args)
+
+       package = args.shift
+       unless package && options[:ecosystem]
+         $stderr.puts "Error: Package name and ecosystem required"
+         $stderr.puts parser
+         exit 1
+       end
+
+       lookup = Lookup.new(ecosystem: options[:ecosystem])
+
+       $stderr.puts "Discovering similar packages to #{package}..." if $stderr.tty?
+       results = lookup.discover(package, max_distance: options[:max_distance])
+
+       if options[:with_variants]
+         generator = Generator.new(ecosystem: options[:ecosystem])
+         variants = generator.generate(package)
+         variant_results = lookup.check_with_variants(package, variants)
+         existing_variants = variant_results.select(&:exists?)
+       end
+
+       output_discover_results(results, existing_variants, options)
+     end
+
      def confusion(args)
        options = { format: "text" }
        parser = OptionParser.new do |opts|
@@ -212,6 +247,7 @@ module Typosquatting
        puts "Commands:"
        puts " generate PACKAGE -e ECOSYSTEM Generate typosquat variants"
        puts " check PACKAGE -e ECOSYSTEM Check which variants exist"
+       puts " discover PACKAGE -e ECOSYSTEM Find similar packages by edit distance"
        puts " confusion PACKAGE -e ECOSYSTEM Check for dependency confusion"
        puts " sbom FILE Check SBOM for potential typosquats"
        puts " ecosystems List supported ecosystems"
@@ -222,6 +258,7 @@ module Typosquatting
        puts "Examples:"
        puts " typosquatting generate requests -e pypi"
        puts " typosquatting check requests -e pypi --existing-only"
+       puts " typosquatting discover rails -e gem --with-variants"
        puts " typosquatting confusion my-package -e maven"
        puts " typosquatting sbom bom.json"
      end
@@ -379,5 +416,42 @@ module Typosquatting
          puts "Found #{results.length} suspicious package(s)"
        end
      end
+
+     def output_discover_results(discovered, existing_variants, options)
+       case options[:format]
+       when "json"
+         data = {
+           discovered: discovered.map(&:to_h),
+           existing_variants: existing_variants&.map(&:to_h)
+         }.compact
+         puts JSON.pretty_generate(data)
+       else
+         if discovered.empty? && (existing_variants.nil? || existing_variants.empty?)
+           puts "No similar packages found"
+           return
+         end
+
+         if discovered.any?
+           puts "Similar packages found (by edit distance):"
+           puts ""
+           discovered.each do |result|
+             puts " #{result.name} (distance: #{result.distance})"
+           end
+           puts ""
+         end
+
+         if existing_variants&.any?
+           puts "Generated variants that exist:"
+           puts ""
+           existing_variants.each do |result|
+             puts " #{result.name}"
+           end
+           puts ""
+         end
+
+         puts "Found #{discovered.length} similar package(s)"
+         puts "Found #{existing_variants.length} existing variant(s)" if existing_variants&.any?
+       end
+     end
    end
  end
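The JSON branch of `output_discover_results` relies on `Hash#compact` to drop the `existing_variants` key when `--with-variants` was not given, because in that case the value is `nil`. A minimal sketch of that behaviour, using a stand-in struct and a made-up package name rather than the gem's own classes (`Found` and `discover_payload` are hypothetical names):

```ruby
require "json"

# Stand-in for the gem's DiscoveryResult struct (illustrative only).
Found = Struct.new(:name, :target, :distance, keyword_init: true)

# Mirrors the hash built in the JSON branch: a nil existing_variants
# value is removed by compact, while an empty array is kept.
def discover_payload(discovered, existing_variants)
  {
    discovered: discovered.map(&:to_h),
    existing_variants: existing_variants&.map(&:to_h)
  }.compact
end

# "reqeusts" is an invented example name, not a real lookup result.
hits = [Found.new(name: "reqeusts", target: "requests", distance: 2)]

puts JSON.pretty_generate(discover_payload(hits, nil)) # without --with-variants
puts JSON.pretty_generate(discover_payload(hits, []))  # with --with-variants, nothing found
```

Consumers of the JSON output can therefore tell "variants were not checked" (key absent) apart from "variants were checked and none exist" (empty array).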
@@ -40,6 +40,10 @@ module Typosquatting
        false
      end

+     def namespace_controls_members?
+       true
+     end
+
      def parse_namespace(name)
        [nil, name]
      end
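The new `namespace_controls_members?` predicate is what the generator change that follows consults before producing name variants inside an existing namespace. The idea can be sketched as below, assuming a toy ecosystem object and a toy "omission" algorithm; `ScopedEcosystem`, `omissions`, and `scoped_variants` are hypothetical names, and the gem's real classes are richer than this:

```ruby
# Hypothetical stand-in for an ecosystem class (illustrative only).
ScopedEcosystem = Struct.new(:namespace_controls_members) do
  def namespace_controls_members?
    namespace_controls_members
  end
end

# Toy "omission" algorithm: drop each character in turn.
def omissions(name)
  (0...name.length).map { |i| name[0...i] + name[(i + 1)..] }.uniq
end

# Generates name variants inside the same namespace, unless the
# namespace owner controls every package under it. In that case a
# squatter cannot publish there, so nothing is generated.
def scoped_variants(ecosystem, namespace, name)
  return [] if ecosystem.namespace_controls_members?

  omissions(name).reject(&:empty?).map { |variant| "#{namespace}/#{variant}" }
end

owner_controlled = ScopedEcosystem.new(true)
open_namespace = ScopedEcosystem.new(false)

p scoped_variants(owner_controlled, "@babel", "core")
p scoped_variants(open_namespace, "@babel", "core")
```

Variants of the namespace itself are a separate concern and are not covered by this sketch.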
@@ -85,19 +85,21 @@ module Typosquatting
          end
        end

-       name_algorithms.each do |algorithm|
-         name_variants = algorithm.generate(name)
-         name_variants.each do |name_variant|
-           full_name = rebuild_namespaced_name(namespace, name_variant)
-           next if full_name == package_name
-           next unless ecosystem.valid_name?(full_name)
-           next if same_after_normalisation?(package_name, full_name)
-
-           results << Variant.new(
-             name: full_name,
-             algorithm: algorithm.name,
-             original: package_name
-           )
+       unless ecosystem.namespace_controls_members?
+         name_algorithms.each do |algorithm|
+           name_variants = algorithm.generate(name)
+           name_variants.each do |name_variant|
+             full_name = rebuild_namespaced_name(namespace, name_variant)
+             next if full_name == package_name
+             next unless ecosystem.valid_name?(full_name)
+             next if same_after_normalisation?(package_name, full_name)
+
+             results << Variant.new(
+               name: full_name,
+               algorithm: algorithm.name,
+               original: package_name
+             )
+           end
          end
        end

@@ -4,6 +4,7 @@ require "net/http"
  require "json"
  require "uri"
  require "purl"
+ require "set"

  module Typosquatting
    class Lookup
@@ -51,6 +52,119 @@ module Typosquatting
        response&.map { |r| Registry.new(r) } || []
      end

+     def list_names(registry:, prefix: nil, postfix: nil, critical: nil, page: nil, per_page: nil)
+       params = []
+       params << "prefix=#{URI.encode_www_form_component(prefix)}" if prefix
+       params << "postfix=#{URI.encode_www_form_component(postfix)}" if postfix
+       params << "critical=true" if critical
+       params << "page=#{page}" if page
+       params << "per_page=#{per_page}" if per_page
+       query = params.empty? ? "" : "?#{params.join("&")}"
+
+       fetch("/registries/#{URI.encode_www_form_component(registry)}/package_names#{query}") || []
+     end
+
+     def list_all_names(registry:, prefix: nil, postfix: nil, critical: nil, per_page: 100)
+       all_names = []
+       page = 1
+
+       loop do
+         names = list_names(
+           registry: registry,
+           prefix: prefix,
+           postfix: postfix,
+           critical: critical,
+           page: page,
+           per_page: per_page
+         )
+         break if names.empty?
+
+         all_names.concat(names)
+         break if names.length < per_page
+
+         page += 1
+       end
+
+       all_names
+     end
+
+     def discover(package_name, max_distance: 2)
+       registry = registries.first
+       return [] unless registry
+
+       prefix = package_name[0, 3]
+       candidates = list_names(registry: registry.name, prefix: prefix)
+
+       candidates.filter_map do |candidate|
+         next if candidate == package_name
+
+         distance = levenshtein(package_name.downcase, candidate.downcase)
+         next if distance > max_distance || distance == 0
+
+         DiscoveryResult.new(
+           name: candidate,
+           target: package_name,
+           distance: distance
+         )
+       end.sort_by(&:distance)
+     end
+
+     def check_with_variants(package_name, variants)
+       registry = registries.first
+       return [] unless registry
+
+       prefix = package_name[0, 3]
+       existing = list_names(registry: registry.name, prefix: prefix)
+       existing_set = existing.map(&:downcase).to_set
+
+       variant_names = variants.map { |v| v.is_a?(String) ? v : v.name }
+
+       variant_names.filter_map do |variant|
+         exists = existing_set.include?(variant.downcase)
+         VariantCheckResult.new(
+           name: variant,
+           exists: exists
+         )
+       end
+     end
+
+     def levenshtein(s1, s2)
+       m, n = s1.length, s2.length
+       return n if m == 0
+       return m if n == 0
+
+       d = Array.new(m + 1) { |i| i }
+       x = nil
+
+       (1..n).each do |j|
+         d[0] = j
+         x = j - 1
+
+         (1..m).each do |i|
+           cost = s1[i - 1] == s2[j - 1] ? 0 : 1
+           x, d[i] = d[i], [d[i] + 1, d[i - 1] + 1, x + cost].min
+         end
+       end
+
+       d[m]
+     end
+
+     DiscoveryResult = Struct.new(:name, :target, :distance, keyword_init: true) do
+       def to_h
+         { name: name, target: target, distance: distance }
+       end
+     end
+
+     VariantCheckResult = Struct.new(:name, :exists, keyword_init: true) do
+       def exists?
+         exists
+       end
+
+       def to_h
+         { name: name, exists: exists }
+       end
+     end
+
      Result = Struct.new(:name, :purl, :packages, :ecosystem, keyword_init: true) do
        def exists?
          !packages.empty?
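The `discover` method above narrows candidates to names sharing the target's first three characters and then keeps those within `max_distance` edits. A standalone sketch of that filter follows, assuming a hard-coded candidate list in place of the API call. `edit_distance` and `similar_names` are hypothetical names, the candidate names are invented, and the distance function is a conventional single-row Levenshtein written for readability rather than a copy of the gem's method:

```ruby
# Single-row Levenshtein distance (insertions, deletions, substitutions).
def edit_distance(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  row = (0..a.length).to_a
  (1..b.length).each do |j|
    diagonal = row[0] # value to the upper-left of the cell being filled
    row[0] = j
    (1..a.length).each do |i|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      above = row[i]
      row[i] = [row[i] + 1, row[i - 1] + 1, diagonal + cost].min
      diagonal = above
    end
  end
  row[a.length]
end

# Keeps candidates within max_distance of the target, excluding exact
# (case-insensitive) matches, and orders them by distance.
def similar_names(target, candidates, max_distance: 2)
  candidates.filter_map do |candidate|
    distance = edit_distance(target.downcase, candidate.downcase)
    next if distance.zero? || distance > max_distance

    [candidate, distance]
  end.sort_by(&:last)
end

# Invented candidates, all sharing the "rai" prefix.
candidates = %w[rails railz raills rails-api rainbow]
similar_names("rails", candidates).each do |name, distance|
  puts "#{name} (distance: #{distance})"
end
```

One consequence of the three-character prefix is that typos in the first three characters of a name are not found this way; the README's `postfix` parameter is the complementary query for those.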
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Typosquatting
-   VERSION = "0.2.0"
+   VERSION = "0.4.0"
  end
@@ -0,0 +1,74 @@
+ # Typosquatting Research Tools
+
+ Scripts for analyzing potential typosquats across package registries.
+
+ ## critical_packages.rb
+
+ Scans critical packages (high OpenSSF criticality score) from a registry for potential typosquats using our detection algorithms. Results are written to a timestamped CSV file.
+
+ ```bash
+ # Scan rubygems.org critical packages (high-confidence algorithms only)
+ ruby research/critical_packages.rb rubygems.org
+
+ # Include all algorithm matches
+ ruby research/critical_packages.rb rubygems.org --all
+
+ # Limit to first N packages (useful for testing)
+ ruby research/critical_packages.rb rubygems.org --limit=100
+ ```
+
+ Supported registries: rubygems.org, npmjs.org, pypi.org, crates.io, packagist.org, hex.pm, pub.dev, proxy.golang.org, repo1.maven.org, nuget.org
+
+ ## Algorithms
+
+ By default, only high-confidence algorithms are used (less likely to produce false positives):
+
+ - homoglyph - lookalike characters (l vs 1, O vs 0)
+ - repetition - doubled characters (lodash vs llodash)
+ - replacement - adjacent keyboard keys (lodash vs lodazh)
+ - transposition - swapped adjacent characters (lodash vs lodahs)
+ - omission - dropped characters (lodash vs lodas)
+
+ Use `--all` to include all 17 algorithms.
+
+ ## Filters
+
+ The script applies several filters to reduce false positives:
+
+ - **Short names**: Packages under 5 characters are skipped (too many false positives)
+ - **Higher downloads**: Packages with more downloads than the critical package are skipped (not typosquats)
+ - **Popular packages**: Packages with >= 1% of the critical package's downloads are skipped (likely legitimate)
+ - **Predates target**: Packages created before the critical package are skipped (can't be typosquats)
+
+ ## CSV Output
+
+ Output files are named `{registry}_{timestamp}.csv` with these columns:
+
+ | Column | Description |
+ |--------|-------------|
+ | critical_package | The critical package being checked |
+ | critical_downloads | Total downloads of the critical package |
+ | critical_created | First release date of the critical package |
+ | critical_repo | Repository URL of the critical package |
+ | potential_typosquat | A similarly named package that exists |
+ | algorithm | Which detection algorithm matched |
+ | squat_downloads | Total downloads of the potential typosquat |
+ | download_ratio | Squat downloads as percentage of critical downloads |
+ | squat_created | First release date of the potential typosquat |
+ | squat_status | Package status (empty = active, "removed" = yanked) |
+ | squat_repo | Repository URL of the potential typosquat |
+ | squat_description | Package description |
+
+ ## Interpreting Results
+
+ Signs of a real typosquat:
+ - `squat_status` is "removed" (already yanked by registry)
+ - No repository URL
+ - Very low download ratio
+ - Description is empty or generic
+ - Created shortly after the critical package became popular
+
+ Signs of a false positive:
+ - Has a legitimate repository with real code
+ - Description describes unrelated functionality
+ - Reasonable download count for its purpose
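The four filters listed in this README can be read as a single decision function. The sketch below is one way to express them, assuming plain hashes with `:name`, `:downloads`, and `:created` keys and invented download figures; `rejection_reason` is a hypothetical helper, not code from the script. Creation timestamps are compared as strings, which works when both are ISO 8601 values in the same format:

```ruby
SHORT_NAME_THRESHOLD = 5      # target names shorter than this are skipped
POPULAR_RATIO_PERCENT = 1.0   # candidates at or above this share are skipped

# Returns the name of the filter that rejects the candidate,
# or nil when the candidate should be kept for review.
def rejection_reason(target:, candidate:)
  return :short_name if target[:name].length < SHORT_NAME_THRESHOLD
  return :higher_downloads if candidate[:downloads] > target[:downloads]

  if target[:downloads].positive?
    ratio = candidate[:downloads].to_f / target[:downloads] * 100
    return :popular if ratio >= POPULAR_RATIO_PERCENT
  end

  if candidate[:created] && target[:created] && candidate[:created] < target[:created]
    return :predates_target
  end

  nil
end

# Invented figures, for illustration only.
target = { name: "lodash", downloads: 1_000_000, created: "2012-04-23T00:00:00Z" }

p rejection_reason(target: target, candidate: { downloads: 50, created: "2020-01-01T00:00:00Z" })
p rejection_reason(target: target, candidate: { downloads: 20_000, created: "2020-01-01T00:00:00Z" })
p rejection_reason(target: target, candidate: { downloads: 50, created: "2010-01-01T00:00:00Z" })
```

Returning a reason rather than a boolean makes it easier to see which filter is responsible when a known typosquat is unexpectedly dropped.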
@@ -0,0 +1,282 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+
+ require "bundler/setup"
+ require "typosquatting"
+ require "csv"
+
+ class CriticalPackageScanner
+   SHORT_NAME_THRESHOLD = 5
+   POPULAR_RATIO_THRESHOLD = 1.0 # Skip squats with >= 1% of critical package downloads
+
+   REGISTRY_MAP = {
+     "rubygems.org" => "rubygems",
+     "npmjs.org" => "npm",
+     "pypi.org" => "pypi",
+     "crates.io" => "cargo",
+     "packagist.org" => "composer",
+     "hex.pm" => "hex",
+     "pub.dev" => "pub",
+     "proxy.golang.org" => "golang",
+     "repo1.maven.org" => "maven",
+     "nuget.org" => "nuget"
+   }.freeze
+
+   # High confidence algorithms that indicate likely intentional typosquatting
+   HIGH_CONFIDENCE_ALGORITHMS = %w[
+     homoglyph
+     repetition
+     replacement
+     transposition
+     omission
+   ].freeze
+
+   attr_reader :registry, :results, :errors, :high_confidence_only, :limit
+
+   def initialize(registry:, high_confidence_only: true, limit: nil)
+     @registry = registry
+     @high_confidence_only = high_confidence_only
+     @limit = limit
+     @results = []
+     @errors = []
+     @prefix_cache = {}
+   end
+
+   def run
+     packages = fetch_critical_packages
+     puts "Found #{packages.length} critical packages for #{registry}"
+     puts
+
+     packages.each_with_index do |package, index|
+       scan_package(package, index + 1, packages.length)
+     end
+
+     write_csv
+     print_summary
+   end
+
+   def fetch_critical_packages
+     packages = lookup.list_all_names(registry: registry, critical: true, per_page: 1000)
+     limit ? packages.first(limit) : packages
+   end
+
+   def scan_package(package_name, current, total)
+     print "\r[#{current}/#{total}] Scanning #{package_name.ljust(40)}"
+
+     # Skip short names - too many false positives
+     return if package_name.length < SHORT_NAME_THRESHOLD
+
+     # Generate typosquatting variants using our algorithms
+     variants = generator.generate(package_name)
+     return if variants.empty?
+
+     # Fetch details for the critical package first (needed for download/date comparison)
+     critical_details = fetch_package_details(package_name)
+     @current_critical_downloads = critical_details&.dig("downloads") || 0
+     @current_critical_created = critical_details&.dig("first_release_published_at")
+
+     # Check which variants exist on the registry
+     existing = check_variants_exist(package_name, variants)
+     return if existing.empty?
+
+     results << {
+       package: package_name,
+       critical_details: critical_details,
+       matches: existing
+     }
+   rescue Typosquatting::APIError => e
+     errors << { package: package_name, error: e.message }
+   rescue StandardError => e
+     errors << { package: package_name, error: e.message }
+   end
+
+   def check_variants_exist(package_name, variants)
+     # Filter to high-confidence algorithms if requested
+     if high_confidence_only
+       variants = variants.select { |v| HIGH_CONFIDENCE_ALGORITHMS.include?(v.algorithm) }
+     end
+
+     # Group variants by prefix for efficient lookup
+     variants_by_prefix = variants.group_by { |v| v.name[0, 3] }
+
+     existing = []
+     variants_by_prefix.each do |prefix, prefix_variants|
+       @prefix_cache[prefix] ||= lookup.list_names(registry: registry, prefix: prefix)
+       existing_set = @prefix_cache[prefix].map(&:downcase).to_set
+
+       prefix_variants.each do |variant|
+         if existing_set.include?(variant.name.downcase) && variant.name.downcase != package_name.downcase
+           # Fetch package details
+           details = fetch_package_details(variant.name)
+           squat_downloads = details&.dig("downloads") || 0
+           squat_created = details&.dig("first_release_published_at")
+
+           # Skip if squat has more downloads than critical package - not a squat
+           next if squat_downloads > @current_critical_downloads
+
+           # Skip if squat is too popular (likely legitimate)
+           if @current_critical_downloads > 0
+             ratio = squat_downloads.to_f / @current_critical_downloads * 100
+             next if ratio >= POPULAR_RATIO_THRESHOLD
+           end
+
+           # Skip if squat predates the critical package (can't be a typosquat)
+           if squat_created && @current_critical_created
+             next if squat_created < @current_critical_created
+           end
+
+           existing << {
+             variant: variant,
+             description: details&.dig("description"),
+             repository_url: details&.dig("repository_url"),
+             downloads: squat_downloads,
+             first_release: squat_created,
+             status: details&.dig("status")
+           }
+         end
+       end
+     end
+
+     existing
+   end
+
+   def fetch_package_details(package_name)
+     result = lookup.check(package_name)
+     result.packages.first
+   rescue StandardError
+     nil
+   end
+
+   def generator
+     @generator ||= Typosquatting::Generator.new(ecosystem: ecosystem_for_registry)
+   end
+
+   def lookup
+     @lookup ||= Typosquatting::Lookup.new(ecosystem: ecosystem_for_registry)
+   end
+
+   def ecosystem_for_registry
+     REGISTRY_MAP[registry] || "rubygems"
+   end
+
+   def output_filename
+     timestamp = Time.now.strftime("%Y%m%d_%H%M%S")
+     "#{registry.gsub(".", "_")}_#{timestamp}.csv"
+   end
+
+   def write_csv
+     return if results.empty?
+
+     filename = output_filename
+     filepath = File.join(__dir__, filename)
+
+     CSV.open(filepath, "w") do |csv|
+       csv << [
+         "critical_package", "critical_downloads", "critical_created", "critical_repo",
+         "potential_typosquat", "algorithm", "squat_downloads", "download_ratio", "squat_created", "squat_status", "squat_repo", "squat_description"
+       ]
+
+       results.each do |result|
+         critical = result[:critical_details]
+         critical_downloads = critical&.dig("downloads") || 0
+         result[:matches].each do |match|
+           squat_downloads = match[:downloads] || 0
+           ratio = critical_downloads > 0 ? (squat_downloads.to_f / critical_downloads * 100).round(4) : 0
+
+           csv << [
+             result[:package],
+             critical_downloads,
+             critical&.dig("first_release_published_at")&.split("T")&.first,
+             critical&.dig("repository_url"),
+             match[:variant].name,
+             match[:variant].algorithm,
+             squat_downloads,
+             "#{ratio}%",
+             match[:first_release]&.split("T")&.first,
+             match[:status],
+             match[:repository_url],
+             match[:description]&.gsub(/\s+/, " ")&.strip
+           ]
+         end
+       end
+     end
+
+     puts "\n\nResults written to #{filepath}"
+   end
+
+   def print_summary
+     puts "\n"
+     puts "=" * 60
+     puts "Results for #{registry}"
+     puts "=" * 60
+     puts
+
+     if results.empty?
+       puts "No potential typosquats found."
+     else
+       puts "Found #{results.length} critical packages with potential typosquats"
+       puts "Total potential typosquats: #{results.sum { |r| r[:matches].length }}"
+
+       # Algorithm breakdown
+       algo_counts = Hash.new(0)
+       results.each do |result|
+         result[:matches].each { |m| algo_counts[m[:variant].algorithm] += 1 }
+       end
+
+       puts "\nBy algorithm:"
+       algo_counts.sort_by { |_, count| -count }.each do |algo, count|
+         puts " #{algo}: #{count}"
+       end
+
+       # Flag suspicious packages (no repo, low downloads)
+       suspicious = []
+       results.each do |result|
+         result[:matches].each do |match|
+           if match[:repository_url].nil? || match[:repository_url].to_s.empty?
+             suspicious << "#{match[:variant].name} (no repo, #{match[:downloads] || 0} downloads)"
+           end
+         end
+       end
+
+       if suspicious.any?
+         puts "\nSuspicious (no repository):"
+         suspicious.first(10).each { |s| puts " #{s}" }
+         puts " ... and #{suspicious.length - 10} more" if suspicious.length > 10
+       end
+
+       # Flag removed/yanked packages (confirmed typosquats)
+       removed = []
+       results.each do |result|
+         result[:matches].each do |match|
+           if match[:status] == "removed"
+             removed << "#{match[:variant].name} (targeting #{result[:package]})"
+           end
+         end
+       end
+
+       if removed.any?
+         puts "\nConfirmed (already yanked):"
+         removed.first(10).each { |s| puts " #{s}" }
+         puts " ... and #{removed.length - 10} more" if removed.length > 10
+       end
+     end
+
+     return if errors.empty?
+
+     puts "\n" + "=" * 60
+     puts "Errors (#{errors.length}):"
+     puts "=" * 60
+     errors.each do |error|
+       puts " #{error[:package]}: #{error[:error]}"
+     end
+   end
+ end
+
+ if __FILE__ == $PROGRAM_NAME
+   registry = ARGV[0] || "rubygems.org"
+   high_confidence_only = !ARGV.include?("--all")
+   limit = ARGV.find { |a| a.start_with?("--limit=") }&.split("=")&.last&.to_i
+
+   scanner = CriticalPackageScanner.new(registry: registry, high_confidence_only: high_confidence_only, limit: limit)
+   scanner.run
+ end
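`check_variants_exist` in the script above groups variants by their first three characters and caches one `list_names` response per prefix, so many variants cost only a few API calls. A self-contained sketch of that pattern follows, assuming an injected fetcher block in place of the real lookup and a made-up in-memory registry; `PrefixIndex` is a hypothetical class. Unlike the script, it caches the downcased set rather than the raw list:

```ruby
require "set"

# Hypothetical helper: answers "which of these names exist?" with at
# most one fetch per three-character prefix.
class PrefixIndex
  attr_reader :calls

  # The block plays the role of lookup.list_names(prefix: ...) and
  # must return an array of package names for the given prefix.
  def initialize(&fetcher)
    @fetcher = fetcher
    @cache = {}
    @calls = 0
  end

  def existing(names)
    names.group_by { |name| name[0, 3] }.flat_map do |prefix, group|
      listed = (@cache[prefix] ||= fetch(prefix))
      group.select { |name| listed.include?(name.downcase) }
    end
  end

  private

  def fetch(prefix)
    @calls += 1
    @fetcher.call(prefix).map(&:downcase).to_set
  end
end

# Made-up registry contents, for illustration only.
registry = { "lod" => %w[lodash lodahs], "ldo" => [] }
index = PrefixIndex.new { |prefix| registry.fetch(prefix, []) }

found = index.existing(%w[lodahs lodas ldoash])
puts "existing: #{found.inspect}, fetches: #{index.calls}"
```

Because the cache lives on the object, later calls that reuse a prefix trigger no further fetches, which matters when scanning many critical packages with overlapping prefixes.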
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: typosquatting
  version: !ruby/object:Gem::Version
-   version: 0.2.0
+   version: 0.4.0
  platform: ruby
  authors:
  - Andrew Nesbitt
@@ -49,6 +49,7 @@ extra_rdoc_files: []
  files:
  - CHANGELOG.md
  - CODE_OF_CONDUCT.md
+ - Dockerfile
  - LICENSE
  - README.md
  - Rakefile
@@ -89,6 +90,8 @@ files:
  - lib/typosquatting/lookup.rb
  - lib/typosquatting/sbom.rb
  - lib/typosquatting/version.rb
+ - research/README.md
+ - research/critical_packages.rb
  - sig/typosquatting.rbs
  homepage: https://github.com/andrew/typosquatting
  licenses: