UrlCategorise 0.0.3 → 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 926043e28097f20035b4dbc943534c63a5c3c0a429745e4bb42e7dd0701295c1
-   data.tar.gz: 3ef850d4c43266a6ec15a7653c64b97720efc2e46298ed36be07d48d70ada772
+   metadata.gz: f9e0158a598c4ce31320e56da6cfa74eaf795b6961cf432dc36bfc806b291a80
+   data.tar.gz: cfc035a4f344ef9d70f3336259fec31e76c6f2f9367934dd79a3fff932872040
  SHA512:
-   metadata.gz: c6504dd8ec0a5f284bc78dfcbb7e45b9e1752c50f6f32a380c52128c19364b976232bd4af93c943d73f0b7cbb6e2ac3e44574e90a0fde38da87f5585b92fc3c0
-   data.tar.gz: c2de16323a1cfa085ac15590f7601b5307cc7c660243f48b6a5db10df25809041b119eb3f48da382effbe120150c2967ae4224213508fca896c45889bad6bce1
+   metadata.gz: c4f140c5b7eaafe8d556c5db8d8e8ed17925c9c34999205bdecb521283b9b9f8f7124e43f421000dd78a441ea57516fe31226710018e32a1db1c038f103f5465
+   data.tar.gz: b033b13f7143399ff908449ee0f5c932f8fed1dc892295256802b3aca5b5114e07b20d81403484299b22dd664b6b6691e68b6a344fed0ef91571b68ca39cefe1
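The entries in checksums.yaml are hex digests of the gem's `metadata.gz` and `data.tar.gz` members. As a rough sketch of how one might check an unpacked member against a recorded value, the helper below uses Ruby's standard `Digest` library; the method name `digest_matches?` and its parameters are hypothetical and are not part of UrlCategorise or RubyGems.

```ruby
require "digest"

# Hypothetical helper (not part of the gem): returns true when the file at
# `path` hashes to `expected_hex` under the given Digest class.
def digest_matches?(path, expected_hex, algorithm: Digest::SHA256)
  # Digest::SHA256.file streams the file from disk instead of loading it whole.
  actual = algorithm.file(path).hexdigest
  actual == expected_hex.strip.downcase
end
```

Passing `algorithm: Digest::SHA512` would compare against the SHA512 entries instead; the file path and the expected digest are supplied by the caller.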
@@ -0,0 +1,13 @@
+ {
+   "permissions": {
+     "allow": [
+       "Bash(bundle exec rake:*)",
+       "Bash(bundle install:*)",
+       "Bash(ruby:*)",
+       "Bash(bundle exec ruby:*)",
+       "Bash(find:*)",
+       "Bash(grep:*)"
+     ],
+     "deny": []
+   }
+ }
@@ -0,0 +1,57 @@
+ name: CI
+
+ on:
+   push:
+     branches: [ main, develop ]
+   pull_request:
+     branches: [ main, develop ]
+
+ jobs:
+   test:
+     runs-on: ubuntu-latest
+     strategy:
+       matrix:
+         ruby-version: ['3.0', '3.1', '3.2', '3.3', '3.4']
+
+     steps:
+     - uses: actions/checkout@v4
+
+     - name: Set up Ruby ${{ matrix.ruby-version }}
+       uses: ruby/setup-ruby@v1
+       with:
+         ruby-version: ${{ matrix.ruby-version }}
+         bundler-cache: true
+
+     - name: Install dependencies
+       run: bundle install
+
+     - name: Run tests
+       run: bundle exec rake test
+
+     - name: Run linter (if available)
+       run: bundle exec rubocop || true
+       continue-on-error: true
+
+   coverage:
+     runs-on: ubuntu-latest
+     steps:
+     - uses: actions/checkout@v4
+
+     - name: Set up Ruby
+       uses: ruby/setup-ruby@v1
+       with:
+         ruby-version: '3.4'
+         bundler-cache: true
+
+     - name: Install dependencies
+       run: bundle install
+
+     - name: Run tests with coverage
+       run: bundle exec rake test
+       env:
+         COVERAGE: true
+
+     - name: Upload coverage to Codecov
+       uses: codecov/codecov-action@v4
+       with:
+         fail_ci_if_error: false
data/.ruby-version ADDED
@@ -0,0 +1 @@
+ 3.4.5
data/CLAUDE.md ADDED
@@ -0,0 +1,134 @@
+ # UrlCategorise Development Guidelines
+
+ ## Overview
+ UrlCategorise is a Ruby gem for categorizing URLs and domains based on various security and content blocklists. It downloads and processes multiple types of lists to provide comprehensive domain categorization.
+
+ ## Development Requirements
+
+ ### Testing Standards
+ - **ALL new changes MUST include new tests**
+ - **Test coverage MUST be 90% or higher**
+ - **NEVER delete, skip, or environment-check tests to make them pass**
+ - **Tests MUST pass because the underlying code works correctly**
+ - Use minitest for all testing
+ - Use WebMock for HTTP request stubbing in tests
+ - Run tests with: `bundle exec rake test`
+ - SimpleCov integration is mandatory for coverage tracking
+
+ ### Dependencies and Rails Support
+ - **MUST use the latest stable versions of gems**
+ - Ruby >= 3.0.0 (currently using 3.4+)
+ - **MUST use minitest and rake** for testing and build automation
+ - **Rails compatibility MUST support Rails 8.x** and current stable versions
+ - Dependencies are managed via Gemfile and gemspec
+ - ActiveRecord integration must be optional and backward compatible
+
+ #### Rails 8 Integration
+ - ActiveRecord models use `coder: JSON` for serialization (Rails 8 compatible)
+ - Migration version set to `ActiveRecord::Migration[8.0]`
+ - Optional database integration with automatic fallback to memory-based categorization
+ - Installation: Generate migration with `UrlCategorise::Models.generate_migration`
+ - Usage: Use `UrlCategorise::ActiveRecordClient` instead of `UrlCategorise::Client`
+
+ ### Code Quality
+ - Follow Ruby best practices and conventions
+ - Use meaningful variable and method names
+ - Add appropriate error handling
+ - Ensure thread safety where applicable
+
+ ### Supported List Formats
+ The gem supports multiple blocklist formats:
+ - Standard hosts files (0.0.0.0 domain.com)
+ - pfSense format
+ - AdSense lists
+ - uBlock Origin files
+ - dnsmasq format
+ - Plain text domain lists
+
+ ### Category Management Guidelines
+ - **Category names MUST be human-readable and intuitive**
+ - **NEVER add combined/meta lists as categories** (e.g., hagezi_light, stevenblack_all)
+ - **First try to add new lists to existing categories** before creating new ones
+ - **Use descriptive names instead of provider prefixes**:
+   - ❌ Bad: `abuse_ch_feodo`, `dshield_block_list`, `botnet_c2`, `doh_vpn_proxy_bypass`
+   - ✅ Good: `banking_trojans`, `suspicious_domains`, `botnet_command_control`, `dns_over_https_bypass`
+ - **Logical category organization**:
+   - Security threats: `malware`, `phishing`, `threat_indicators`, `cryptojacking`, `phishing_extended`
+   - Content filtering: `advertising`, `gambling`, `pornography`, `social_media`
+   - Network security: `suspicious_domains`, `threat_intelligence`, `dns_over_https_bypass`
+   - Geographic/specialized: `sanctions_ips`, `newly_registered_domains`, `chinese_ad_hosts`, `korean_ad_hosts`
+   - IP-based security: `compromised_ips`, `tor_exit_nodes`, `open_proxy_ips`, `top_attack_sources`
+   - Content categories: `news`, `fakenews` (remaining active categories)
+   - Mobile/TV: `mobile_ads`, `smart_tv_ads`
+
+ ### URL Health Monitoring and Cleanup
+ The gem includes automatic monitoring and cleanup of broken URLs:
+ - **Automatic removal of broken URLs**: Categories with URLs returning 403, 404, or persistent errors are commented out
+ - **Health checking tools**: Use `bin/check_lists` to verify all URLs in constants
+ - **Programmatic checking**: The `Client#check_all_lists` method provides detailed health reports
+ - **Recently removed categories**: Categories like `botnet_command_control` (403 Forbidden), `blogs`, `forums`, `educational`, `health`, `finance`, `streaming`, `shopping`, `business`, `technology`, `government` (404 Not Found) have been commented out until working URLs are found
+
+ ### Core Features
+ - Domain/URL categorization
+ - Multiple list format parsing
+ - Hash-based file update detection
+ - Optional local file caching
+ - IP sanctions list checking
+ - DNS resolution for domain-to-IP mapping
+ - ActiveRecord/Rails integration (optional)
+ - URL health monitoring and reporting
+ - Automatic cleanup of broken blocklist sources
+
+ ### Architecture
+ - `Client` class: Main interface for categorization
+ - `Constants` module: Contains default list URLs and categories
+ - Modular design allows extending with new list sources
+ - Support for custom list directories and caching
+
+ ### List Sources
+ Primary sources include:
+ - The Block List Project
+ - hagezi/dns-blocklists
+ - StevenBlack/hosts
+ - Various specialized security lists
+
+ ### Testing Guidelines
+ - Mock all HTTP requests using WebMock
+ - Test both success and failure scenarios
+ - Verify proper parsing of different list formats
+ - Test edge cases (empty responses, malformed data)
+ - Include integration tests for the full categorization flow
+
+ ### Performance Considerations
+ - Implement efficient parsing for large lists
+ - Use appropriate data structures for fast lookups
+ - Consider memory usage with large datasets
+ - Provide options for selective list loading
+
+ ### Configuration
+ - Allow custom list URLs
+ - Support for local file directories
+ - Configurable DNS servers for IP resolution
+ - Optional caching parameters
+
+ ## Build and Release Process
+ 1. Update version number in `lib/url_categorise/version.rb`
+ 2. Update CHANGELOG.md with new features
+ 3. Run full test suite: `bundle exec rake test`
+ 4. Update documentation as needed
+ 5. Build gem: `gem build url_categorise.gemspec`
+ 6. Release: `gem push url_categorise-x.x.x.gem`
+
+ ## Contributing
+ - Fork the repository
+ - Create a feature branch
+ - Add comprehensive tests for new functionality
+ - Ensure all tests pass
+ - Update documentation
+ - Submit a pull request
+
+ ## CI/CD
+ - GitHub Actions workflow runs tests on multiple Ruby versions
+ - All tests must pass before merging
+ - Coverage reporting with Codecov integration
+ - Automated dependency updates where appropriate
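The guidelines in CLAUDE.md name the standard hosts format (`0.0.0.0 domain.com`) and call for data structures with fast lookups. The following is only an illustrative sketch of that idea, not code from the gem: `parse_hosts` and `categories_for` are hypothetical names, and the real `UrlCategorise::Client` may parse and store lists quite differently.

```ruby
require "set"

# Hypothetical parser (not the gem's implementation): turn hosts-file text
# such as "0.0.0.0 ads.example.com" into a Set of lower-cased host names.
def parse_hosts(text)
  text.each_line.each_with_object(Set.new) do |line, domains|
    entry = line.sub(/#.*/, "").strip   # drop comments and surrounding whitespace
    next if entry.empty?

    # The first field is the sink address; every later field is a host name.
    _address, *hosts = entry.split(/\s+/)
    hosts.each { |host| domains << host.downcase }
  end
end

# Hypothetical lookup: return the category names whose Set contains the host.
# `lists` maps a category name to the Set produced by parse_hosts.
def categories_for(host, lists)
  lists.select { |_category, domains| domains.include?(host.downcase) }.keys
end
```

A `Set` gives constant-time membership checks on average, which is one way to meet the "fast lookups" guideline. Lines that hold only a bare domain (the plain-text list format) are skipped by this sketch, since it assumes the address-then-host layout.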
data/Gemfile.lock CHANGED
@@ -1,112 +1,172 @@
  PATH
    remote: .
    specs:
-     UrlCategorise (0.0.3)
-       api_pattern (~> 0.0.4)
+     UrlCategorise (0.1.1)
+       api_pattern (>= 0.0.6, < 1.0)
+       csv (>= 3.3.0, < 4.0)
+       digest (>= 3.1.0, < 4.0)
+       fileutils (>= 1.7.0, < 2.0)
+       httparty (>= 0.22.0, < 1.0)
+       nokogiri (>= 1.16.0, < 2.0)
+       resolv (>= 0.4.0, < 1.0)

  GEM
    remote: https://rubygems.org/
    specs:
-     actionpack (7.0.4.3)
-       actionview (= 7.0.4.3)
-       activesupport (= 7.0.4.3)
-       rack (~> 2.0, >= 2.2.0)
+     actionpack (8.0.2.1)
+       actionview (= 8.0.2.1)
+       activesupport (= 8.0.2.1)
+       nokogiri (>= 1.8.5)
+       rack (>= 2.2.4)
+       rack-session (>= 1.0.1)
        rack-test (>= 0.6.3)
-       rails-dom-testing (~> 2.0)
-       rails-html-sanitizer (~> 1.0, >= 1.2.0)
-     actionview (7.0.4.3)
-       activesupport (= 7.0.4.3)
+       rails-dom-testing (~> 2.2)
+       rails-html-sanitizer (~> 1.6)
+       useragent (~> 0.16)
+     actionview (8.0.2.1)
+       activesupport (= 8.0.2.1)
        builder (~> 3.1)
-       erubi (~> 1.4)
-       rails-dom-testing (~> 2.0)
-       rails-html-sanitizer (~> 1.1, >= 1.2.0)
-     active_attr (0.15.4)
-       actionpack (>= 3.0.2, < 7.1)
-       activemodel (>= 3.0.2, < 7.1)
-       activesupport (>= 3.0.2, < 7.1)
-     activemodel (7.0.4.3)
-       activesupport (= 7.0.4.3)
-     activesupport (7.0.4.3)
-       concurrent-ruby (~> 1.0, >= 1.0.2)
+       erubi (~> 1.11)
+       rails-dom-testing (~> 2.2)
+       rails-html-sanitizer (~> 1.6)
+     active_attr (0.17.1)
+       actionpack (>= 3.0.2, < 8.1)
+       activemodel (>= 3.0.2, < 8.1)
+       activesupport (>= 3.0.2, < 8.1)
+     activemodel (8.0.2.1)
+       activesupport (= 8.0.2.1)
+     activerecord (8.0.2.1)
+       activemodel (= 8.0.2.1)
+       activesupport (= 8.0.2.1)
+       timeout (>= 0.4.0)
+     activesupport (8.0.2.1)
+       base64
+       benchmark (>= 0.3)
+       bigdecimal
+       concurrent-ruby (~> 1.0, >= 1.3.1)
+       connection_pool (>= 2.2.5)
+       drb
        i18n (>= 1.6, < 2)
+       logger (>= 1.4.2)
        minitest (>= 5.1)
-       tzinfo (~> 2.0)
-     addressable (2.8.4)
-       public_suffix (>= 2.0.2, < 6.0)
+       securerandom (>= 0.3)
+       tzinfo (~> 2.0, >= 2.0.5)
+       uri (>= 0.13.1)
+     addressable (2.8.7)
+       public_suffix (>= 2.0.2, < 7.0)
      ansi (1.5.0)
-     api_pattern (0.0.4)
-       active_attr (~> 0.15.4)
-       httparty (~> 0.21.0)
-       nokogiri (~> 1.14.3)
-     builder (3.2.4)
+     api_pattern (0.0.6)
+       active_attr (>= 0.15.4)
+       csv (>= 3.3.0)
+       httparty (>= 0.22.0)
+       nokogiri (>= 1.16.0)
+     base64 (0.3.0)
+     benchmark (0.4.1)
+     bigdecimal (3.2.2)
+     builder (3.3.0)
      coderay (1.1.3)
-     concurrent-ruby (1.2.2)
-     crack (0.4.5)
+     concurrent-ruby (1.3.5)
+     connection_pool (2.5.3)
+     crack (1.0.0)
+       bigdecimal
        rexml
      crass (1.0.6)
-     erubi (1.12.0)
-     hashdiff (1.0.1)
-     httparty (0.21.0)
+     csv (3.3.5)
+     digest (3.2.0)
+     docile (1.4.1)
+     drb (2.2.3)
+     erubi (1.13.1)
+     fileutils (1.7.3)
+     hashdiff (1.2.0)
+     httparty (0.22.0)
+       csv
        mini_mime (>= 1.0.0)
        multi_xml (>= 0.5.2)
-     i18n (1.13.0)
+     i18n (1.14.7)
        concurrent-ruby (~> 1.0)
-     loofah (2.21.3)
+     logger (1.7.0)
+     loofah (2.24.1)
        crass (~> 1.0.2)
        nokogiri (>= 1.12.0)
-     method_source (1.0.0)
-     mini_mime (1.1.2)
-     minitest (5.18.0)
-     minitest-focus (1.3.1)
+     method_source (1.1.0)
+     mini_mime (1.1.5)
+     mini_portile2 (2.8.9)
+     minitest (5.25.5)
+     minitest-focus (1.4.0)
        minitest (>= 4, < 6)
-     minitest-reporters (1.6.0)
+     minitest-reporters (1.7.1)
        ansi
        builder
        minitest (>= 5.0)
        ruby-progressbar
-     mocha (2.0.2)
+     mocha (2.4.5)
        ruby2_keywords (>= 0.0.5)
-     multi_xml (0.6.0)
-     nokogiri (1.14.4-arm64-darwin)
+     multi_xml (0.7.2)
+       bigdecimal (~> 3.1)
+     nokogiri (1.16.8)
+       mini_portile2 (~> 2.8.2)
        racc (~> 1.4)
-     pry (0.14.2)
+     pry (0.15.2)
        coderay (~> 1.1)
        method_source (~> 1.0)
-     public_suffix (5.0.1)
-     racc (1.6.2)
-     rack (2.2.7)
-     rack-test (2.1.0)
+     public_suffix (6.0.2)
+     racc (1.8.1)
+     rack (2.2.17)
+     rack-session (1.0.2)
+       rack (< 3)
+     rack-test (2.2.0)
        rack (>= 1.3)
-     rails-dom-testing (2.0.3)
-       activesupport (>= 4.2.0)
+     rails-dom-testing (2.3.0)
+       activesupport (>= 5.0.0)
+       minitest
        nokogiri (>= 1.6)
-     rails-html-sanitizer (1.5.0)
-       loofah (~> 2.19, >= 2.19.1)
-     rake (13.0.6)
-     rexml (3.2.5)
+     rails-html-sanitizer (1.6.2)
+       loofah (~> 2.21)
+       nokogiri (>= 1.15.7, != 1.16.7, != 1.16.6, != 1.16.5, != 1.16.4, != 1.16.3, != 1.16.2, != 1.16.1, != 1.16.0.rc1, != 1.16.0)
+     rake (13.3.0)
+     resolv (0.6.2)
+     rexml (3.4.1)
      ruby-progressbar (1.13.0)
      ruby2_keywords (0.0.5)
-     timecop (0.9.6)
+     securerandom (0.4.1)
+     simplecov (0.22.0)
+       docile (~> 1.1)
+       simplecov-html (~> 0.11)
+       simplecov_json_formatter (~> 0.1)
+     simplecov-html (0.13.2)
+     simplecov_json_formatter (0.1.4)
+     sqlite3 (2.7.3)
+       mini_portile2 (~> 2.8.0)
+     sqlite3 (2.7.3-arm64-darwin)
+     timecop (0.9.10)
+     timeout (0.4.3)
      tzinfo (2.0.6)
        concurrent-ruby (~> 1.0)
-     webmock (3.18.1)
+     uri (1.0.3)
+     useragent (0.16.11)
+     webmock (3.24.0)
        addressable (>= 2.8.0)
        crack (>= 0.3.2)
        hashdiff (>= 0.4.0, < 2.0.0)

  PLATFORMS
-   arm64-darwin-20
+   arm64-darwin-24
+   ruby

  DEPENDENCIES
    UrlCategorise!
-   minitest (~> 5.18.0)
-   minitest-focus (~> 1.3.1)
-   minitest-reporters (~> 1.6.0)
-   mocha (~> 2.0.2)
-   pry (~> 0.14.2)
-   rake (~> 13.0.6)
-   timecop (~> 0.9.6)
-   webmock (~> 3.18.1)
+   activerecord (>= 8.0)
+   logger
+   minitest (~> 5.25.5)
+   minitest-focus (~> 1.4.0)
+   minitest-reporters (~> 1.7.1)
+   mocha (~> 2.4.5)
+   pry (~> 0.15.2)
+   rake (~> 13.3.0)
+   simplecov (~> 0.22.0)
+   sqlite3 (>= 2.7)
+   timecop (~> 0.9.10)
+   webmock (~> 3.24.0)

  BUNDLED WITH
-    2.3.16
+    2.7.1