ronin-web-spider 0.1.0.beta1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: f4a08a150f4301148cc86877702ea981f2083d173c3101ee6047d44cfc5f076c
4
+ data.tar.gz: 8fb751613e0cee2161d08b338cce434b252ba83f62971275b5a33d0060cec567
5
+ SHA512:
6
+ metadata.gz: 5af53f0fbf55f13d8ccd2ff9fd13d7cfba7d6ce7c8aae90e5b44a467617212eb77247f9e343da118bdc8a369a9cbb7d240f8513d58373ab36b210d95b7f51e2c
7
+ data.tar.gz: d456194ef7e1c45a22a4419699e3f3cdcb82f09d987842ab15bcd3de009a2557cf1aa416da3d1ce0a8404fd3579d14103ca24e6b2495fa7684a421e9db6cf7b0
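A `.gem` file is a tar archive that holds `metadata.gz`, `data.tar.gz`, and `checksums.yaml.gz`, so the digests listed above can be re-checked locally. The following is a minimal Ruby sketch, assuming the YAML text and the raw bytes of each archive member have already been read into memory. `DIGESTS` and `checksum_mismatches` are hypothetical names chosen for illustration, not part of RubyGems.

```ruby
require 'digest'
require 'yaml'

# Maps the algorithm names used in checksums.yaml to Ruby digest classes.
DIGESTS = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }.freeze

# Hypothetical helper: compares each expected hex digest against the digest
# of the supplied bytes and returns "ALGORITHM filename" for every mismatch.
def checksum_mismatches(checksums_yaml, contents)
  expected   = YAML.safe_load(checksums_yaml)
  mismatches = []

  expected.each do |algorithm, files|
    digest = DIGESTS.fetch(algorithm)

    files.each do |name, hex|
      actual = digest.hexdigest(contents.fetch(name))
      mismatches << "#{algorithm} #{name}" unless actual == hex
    end
  end

  mismatches
end
```

An empty result means every listed digest matched the bytes that were supplied.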
data/.document ADDED
@@ -0,0 +1,5 @@
1
+ lib/**/*.rb
2
+ -
3
+ ChangeLog.md
4
+ COPYING.txt
5
+ man/*.md
data/.github/workflows/ruby.yml ADDED
@@ -0,0 +1,31 @@
1
+ name: CI
2
+
3
+ on: [ push, pull_request ]
4
+
5
+ jobs:
6
+ tests:
7
+ runs-on: ubuntu-latest
8
+ strategy:
9
+ fail-fast: false
10
+ matrix:
11
+ ruby:
12
+ - '3.0'
13
+ - '3.1'
14
+ - '3.2'
15
+ - jruby
16
+ - truffleruby
17
+ name: Ruby ${{ matrix.ruby }}
18
+ steps:
19
+ - uses: actions/checkout@v2
20
+ - name: Set up Ruby
21
+ uses: ruby/setup-ruby@v1
22
+ with:
23
+ ruby-version: ${{ matrix.ruby }}
24
+ - name: Install libsqlite3
25
+ run: |
26
+ sudo apt update -y && \
27
+ sudo apt install -y --no-install-recommends --no-install-suggests libsqlite3-dev
28
+ - name: Install dependencies
29
+ run: bundle install --jobs 4 --retry 3
30
+ - name: Run tests
31
+ run: bundle exec rake test
data/.gitignore ADDED
@@ -0,0 +1,13 @@
1
+ /coverage
2
+ /doc
3
+ /pkg
4
+ /man/*.[1-9]
5
+ /vendor/bundle
6
+ /Gemfile.lock
7
+ /.bundle
8
+ /.yardoc
9
+ .DS_Store
10
+ *.db
11
+ *.log
12
+ *.swp
13
+ *~
data/.rspec ADDED
@@ -0,0 +1 @@
1
+ --colour --format documentation
data/.ruby-version ADDED
@@ -0,0 +1 @@
1
+ ruby-3.1
data/.yardopts ADDED
@@ -0,0 +1 @@
1
+ --markup markdown --title 'Ronin::Web::Spider Documentation' --protected
data/COPYING.txt ADDED
@@ -0,0 +1,165 @@
1
+ GNU LESSER GENERAL PUBLIC LICENSE
2
+ Version 3, 29 June 2007
3
+
4
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
5
+ Everyone is permitted to copy and distribute verbatim copies
6
+ of this license document, but changing it is not allowed.
7
+
8
+
9
+ This version of the GNU Lesser General Public License incorporates
10
+ the terms and conditions of version 3 of the GNU General Public
11
+ License, supplemented by the additional permissions listed below.
12
+
13
+ 0. Additional Definitions.
14
+
15
+ As used herein, "this License" refers to version 3 of the GNU Lesser
16
+ General Public License, and the "GNU GPL" refers to version 3 of the GNU
17
+ General Public License.
18
+
19
+ "The Library" refers to a covered work governed by this License,
20
+ other than an Application or a Combined Work as defined below.
21
+
22
+ An "Application" is any work that makes use of an interface provided
23
+ by the Library, but which is not otherwise based on the Library.
24
+ Defining a subclass of a class defined by the Library is deemed a mode
25
+ of using an interface provided by the Library.
26
+
27
+ A "Combined Work" is a work produced by combining or linking an
28
+ Application with the Library. The particular version of the Library
29
+ with which the Combined Work was made is also called the "Linked
30
+ Version".
31
+
32
+ The "Minimal Corresponding Source" for a Combined Work means the
33
+ Corresponding Source for the Combined Work, excluding any source code
34
+ for portions of the Combined Work that, considered in isolation, are
35
+ based on the Application, and not on the Linked Version.
36
+
37
+ The "Corresponding Application Code" for a Combined Work means the
38
+ object code and/or source code for the Application, including any data
39
+ and utility programs needed for reproducing the Combined Work from the
40
+ Application, but excluding the System Libraries of the Combined Work.
41
+
42
+ 1. Exception to Section 3 of the GNU GPL.
43
+
44
+ You may convey a covered work under sections 3 and 4 of this License
45
+ without being bound by section 3 of the GNU GPL.
46
+
47
+ 2. Conveying Modified Versions.
48
+
49
+ If you modify a copy of the Library, and, in your modifications, a
50
+ facility refers to a function or data to be supplied by an Application
51
+ that uses the facility (other than as an argument passed when the
52
+ facility is invoked), then you may convey a copy of the modified
53
+ version:
54
+
55
+ a) under this License, provided that you make a good faith effort to
56
+ ensure that, in the event an Application does not supply the
57
+ function or data, the facility still operates, and performs
58
+ whatever part of its purpose remains meaningful, or
59
+
60
+ b) under the GNU GPL, with none of the additional permissions of
61
+ this License applicable to that copy.
62
+
63
+ 3. Object Code Incorporating Material from Library Header Files.
64
+
65
+ The object code form of an Application may incorporate material from
66
+ a header file that is part of the Library. You may convey such object
67
+ code under terms of your choice, provided that, if the incorporated
68
+ material is not limited to numerical parameters, data structure
69
+ layouts and accessors, or small macros, inline functions and templates
70
+ (ten or fewer lines in length), you do both of the following:
71
+
72
+ a) Give prominent notice with each copy of the object code that the
73
+ Library is used in it and that the Library and its use are
74
+ covered by this License.
75
+
76
+ b) Accompany the object code with a copy of the GNU GPL and this license
77
+ document.
78
+
79
+ 4. Combined Works.
80
+
81
+ You may convey a Combined Work under terms of your choice that,
82
+ taken together, effectively do not restrict modification of the
83
+ portions of the Library contained in the Combined Work and reverse
84
+ engineering for debugging such modifications, if you also do each of
85
+ the following:
86
+
87
+ a) Give prominent notice with each copy of the Combined Work that
88
+ the Library is used in it and that the Library and its use are
89
+ covered by this License.
90
+
91
+ b) Accompany the Combined Work with a copy of the GNU GPL and this license
92
+ document.
93
+
94
+ c) For a Combined Work that displays copyright notices during
95
+ execution, include the copyright notice for the Library among
96
+ these notices, as well as a reference directing the user to the
97
+ copies of the GNU GPL and this license document.
98
+
99
+ d) Do one of the following:
100
+
101
+ 0) Convey the Minimal Corresponding Source under the terms of this
102
+ License, and the Corresponding Application Code in a form
103
+ suitable for, and under terms that permit, the user to
104
+ recombine or relink the Application with a modified version of
105
+ the Linked Version to produce a modified Combined Work, in the
106
+ manner specified by section 6 of the GNU GPL for conveying
107
+ Corresponding Source.
108
+
109
+ 1) Use a suitable shared library mechanism for linking with the
110
+ Library. A suitable mechanism is one that (a) uses at run time
111
+ a copy of the Library already present on the user's computer
112
+ system, and (b) will operate properly with a modified version
113
+ of the Library that is interface-compatible with the Linked
114
+ Version.
115
+
116
+ e) Provide Installation Information, but only if you would otherwise
117
+ be required to provide such information under section 6 of the
118
+ GNU GPL, and only to the extent that such information is
119
+ necessary to install and execute a modified version of the
120
+ Combined Work produced by recombining or relinking the
121
+ Application with a modified version of the Linked Version. (If
122
+ you use option 4d0, the Installation Information must accompany
123
+ the Minimal Corresponding Source and Corresponding Application
124
+ Code. If you use option 4d1, you must provide the Installation
125
+ Information in the manner specified by section 6 of the GNU GPL
126
+ for conveying Corresponding Source.)
127
+
128
+ 5. Combined Libraries.
129
+
130
+ You may place library facilities that are a work based on the
131
+ Library side by side in a single library together with other library
132
+ facilities that are not Applications and are not covered by this
133
+ License, and convey such a combined library under terms of your
134
+ choice, if you do both of the following:
135
+
136
+ a) Accompany the combined library with a copy of the same work based
137
+ on the Library, uncombined with any other library facilities,
138
+ conveyed under the terms of this License.
139
+
140
+ b) Give prominent notice with the combined library that part of it
141
+ is a work based on the Library, and explaining where to find the
142
+ accompanying uncombined form of the same work.
143
+
144
+ 6. Revised Versions of the GNU Lesser General Public License.
145
+
146
+ The Free Software Foundation may publish revised and/or new versions
147
+ of the GNU Lesser General Public License from time to time. Such new
148
+ versions will be similar in spirit to the present version, but may
149
+ differ in detail to address new problems or concerns.
150
+
151
+ Each version is given a distinguishing version number. If the
152
+ Library as you received it specifies that a certain numbered version
153
+ of the GNU Lesser General Public License "or any later version"
154
+ applies to it, you have the option of following the terms and
155
+ conditions either of that published version or of any later version
156
+ published by the Free Software Foundation. If the Library as you
157
+ received it does not specify a version number of the GNU Lesser
158
+ General Public License, you may choose any version of the GNU Lesser
159
+ General Public License ever published by the Free Software Foundation.
160
+
161
+ If the Library as you received it specifies that a proxy can decide
162
+ whether future versions of the GNU Lesser General Public License shall
163
+ apply, that proxy's public statement of acceptance of any version is
164
+ permanent authorization for you to choose that version for the
165
+ Library.
data/ChangeLog.md ADDED
@@ -0,0 +1,19 @@
1
+ ### 0.1.0 / 2023-XX-XX
2
+
3
+ * Initial release:
4
+ * Built on top of the battle-tested and versatile [spidr] gem.
5
+ * Provides additional callback methods:
6
+ * `every_host` - yields every unique host name that's spidered.
7
+ * `every_cert` - yields every unique SSL/TLS certificate encountered while
8
+ spidering.
9
+ * `every_favicon` - yields every favicon file that's encountered while
10
+ spidering.
11
+ * `every_html_comment` - yields every HTML comment.
12
+ * `every_javascript` - yields all JavaScript source code from either inline
13
+ `<script>` tags or `.js` files.
14
+ * `every_javascript_string` - yields every single-quoted or double-quoted
15
+ String literal from all JavaScript source code.
16
+ * `every_javascript_comment` - yields every JavaScript comment.
17
+ * `every_comment` - yields every HTML or JavaScript comment.
18
+ * Supports archiving spidered pages to a directory or git repository.
19
+
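To make the `every_javascript_string` entry above concrete, here is a rough Ruby sketch of pulling single-quoted and double-quoted string literals out of JavaScript source with a regular expression. It is an illustration only and not the gem's implementation: `JS_STRING` and `javascript_strings` are hypothetical names, escape sequences are left undecoded, and template literals and quotes inside comments are not handled.

```ruby
# Matches a double-quoted or single-quoted literal, allowing backslash escapes.
JS_STRING = /"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'/

# Hypothetical helper: returns the contents of every quoted string literal
# found in the given JavaScript source, with the surrounding quotes removed.
def javascript_strings(source)
  source.scan(JS_STRING).map { |literal| literal[1..-2] }
end
```

A real tokenizer would be needed to skip comments and regular expression literals reliably. The regex approach is only a first approximation.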
data/Gemfile ADDED
@@ -0,0 +1,31 @@
1
+ source 'https://rubygems.org'
2
+
3
+ gemspec
4
+
5
+ platform :jruby do
6
+ gem 'jruby-openssl', '~> 0.7'
7
+ end
8
+
9
+ # gem 'spidr', '~> 0.7', github: 'postmodern/spidr'
10
+
11
+ gem 'ronin-support', '~> 1.0', github: "ronin-rb/ronin-support",
12
+ branch: '1.0.0'
13
+
14
+ group :development do
15
+ gem 'rake'
16
+ gem 'rubygems-tasks', '~> 0.2'
17
+
18
+ gem 'rspec', '~> 3.0'
19
+ gem 'webmock', '~> 2.0'
20
+ gem 'sinatra', '~> 1.0'
21
+ gem 'simplecov', '~> 0.20'
22
+
23
+ gem 'kramdown', '~> 2.0'
24
+ gem 'redcarpet', platform: :mri
25
+ gem 'yard', '~> 0.9'
26
+ gem 'yard-spellcheck', require: false
27
+
28
+ gem 'dead_end', require: false
29
+ gem 'sord', require: false, platform: :mri
30
+ gem 'stackprof', require: false, platform: :mri
31
+ end
data/README.md ADDED
@@ -0,0 +1,139 @@
1
+ # ronin-web-spider
2
+
3
+ [![CI](https://github.com/ronin-rb/ronin-web-spider/actions/workflows/ruby.yml/badge.svg)](https://github.com/ronin-rb/ronin-web-spider/actions/workflows/ruby.yml)
4
+ [![Code Climate](https://codeclimate.com/github/ronin-rb/ronin-web-spider.svg)](https://codeclimate.com/github/ronin-rb/ronin-web-spider)
5
+
6
+ * [Website](https://ronin-rb.dev/)
7
+ * [Source](https://github.com/ronin-rb/ronin-web-spider)
8
+ * [Issues](https://github.com/ronin-rb/ronin-web-spider/issues)
9
+ * [Documentation](https://ronin-rb.dev/docs/ronin-web-spider/frames)
10
+ * [Discord](https://discord.gg/6WAb3PsVX9) |
11
+ [Twitter](https://twitter.com/ronin_rb) |
12
+ [Mastodon](https://infosec.exchange/@ronin_rb)
13
+
14
+ ## Description
15
+
16
+ ronin-web-spider is a collection of common web spidering routines using the
17
+ [spidr] gem.
18
+
19
+ ## Features
20
+
21
+ * Built on top of the battle-tested and versatile [spidr] gem.
22
+ * Provides additional callback methods:
23
+ * `every_host` - yields every unique host name that's spidered.
24
+ * `every_cert` - yields every unique SSL/TLS certificate encountered while
25
+ spidering.
26
+ * `every_favicon` - yields every favicon file that's encountered while
27
+ spidering.
28
+ * `every_html_comment` - yields every HTML comment.
29
+ * `every_javascript` - yields all JavaScript source code from either inline
30
+ `<script>` tags or `.js` files.
31
+ * `every_javascript_string` - yields every single-quoted or double-quoted
32
+ String literal from all JavaScript source code.
33
+ * `every_javascript_comment` - yields every JavaScript comment.
34
+ * `every_comment` - yields every HTML or JavaScript comment.
35
+ * Supports archiving spidered pages to a directory or git repository.
36
+ * Has 94% documentation coverage.
37
+ * Has 94% test coverage.
38
+
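The "unique" in `every_host` implies some form of de-duplication while spidering. A minimal sketch of that idea, using only the Ruby standard library and assuming a plain list of URL strings as input. `each_unique_host` is a hypothetical helper, not a method provided by the gem.

```ruby
require 'set'
require 'uri'

# Hypothetical helper: yields each host name the first time it is seen.
def each_unique_host(urls)
  return enum_for(__method__, urls) unless block_given?

  seen = Set.new

  urls.each do |url|
    host = URI(url).host
    # Set#add? returns nil when the host was already recorded.
    yield host if host && seen.add?(host)
  end
end
```

Called without a block it returns an Enumerator, so the hosts can be collected with `to_a` or chained with other Enumerable methods.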
39
+ ## Examples
40
+
41
+ Spider a host:
42
+
43
+ ```ruby
44
+ require 'ronin/web/spider'
45
+
46
+ Ronin::Web::Spider.host('www.example.com') do |agent|
47
+ agent.every_url do |url|
48
+ # ...
49
+ end
50
+
51
+ agent.every_url_like(/.../) do |url|
52
+ # ...
53
+ end
54
+
55
+ agent.every_page do |page|
56
+ # ...
57
+ end
58
+ end
59
+ ```
60
+
61
+ See [Spidr::Agent] documentation for more agent methods.
62
+
63
+ [Spidr::Agent]: https://rubydoc.info/gems/spidr/Spidr/Agent
64
+
65
+ Spider a domain:
66
+
67
+ ```ruby
68
+ Ronin::Web::Spider.domain('example.com') do |agent|
69
+ agent.every_page do |page|
70
+ # ...
71
+ end
72
+ end
73
+ ```
74
+
75
+ Spider a website:
76
+
77
+ ```ruby
78
+ Ronin::Web::Spider.site('https://www.example.com/index.html') do |agent|
79
+ agent.every_page do |page|
80
+ # ...
81
+ end
82
+ end
83
+ ```
84
+
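The three entry points above differ in which URLs count as in scope. The sketch below assumes, based on the spidr documentation, that `host` stays on one exact host name while `domain` also follows sub-domains. `site` is assumed to behave like `host`, taking the host name from the starting URL. The predicates `same_host?` and `within_domain?` are hypothetical and only illustrate that distinction.

```ruby
require 'uri'

# Hypothetical predicate: true when the URL is on exactly the given host.
def same_host?(url, host)
  URI(url).host == host
end

# Hypothetical predicate: true when the URL is on the domain itself
# or on any of its sub-domains.
def within_domain?(url, domain)
  host = URI(url).host
  return false unless host

  host == domain || host.end_with?(".#{domain}")
end
```

Under these assumptions, `https://blog.example.com/` would be visited when spidering the domain `example.com`, but not when spidering the host `www.example.com`.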
85
+ ## Requirements
86
+
87
+ * [Ruby] >= 3.0.0
88
+ * [spidr] ~> 0.7
89
+ * [ronin-support] ~> 1.0
90
+
91
+ ## Install
92
+
93
+ ```shell
94
+ $ gem install ronin-web-spider
95
+ ```
96
+
97
+ ### Gemfile
98
+
99
+ ```ruby
100
+ gem 'ronin-web-spider', '~> 0.1'
101
+ ```
102
+
103
+ ### gemspec
104
+
105
+ ```ruby
106
+ gem.add_dependency 'ronin-web-spider', '~> 0.1'
107
+ ```
108
+
109
+ ## Development
110
+
111
+ 1. [Fork It!](https://github.com/ronin-rb/ronin-web-spider/fork)
112
+ 2. Clone It!
113
+ 3. `cd ronin-web-spider/`
114
+ 4. `bundle install`
115
+ 5. `git checkout -b my_feature`
116
+ 6. Code It!
117
+ 7. `bundle exec rake spec`
118
+ 8. `git push origin my_feature`
119
+
120
+ ## License
121
+
122
+ Copyright (c) 2006-2022 Hal Brodigan (postmodern.mod3 at gmail.com)
123
+
124
+ ronin-web-spider is free software: you can redistribute it and/or modify
125
+ it under the terms of the GNU Lesser General Public License as published
126
+ by the Free Software Foundation, either version 3 of the License, or
127
+ (at your option) any later version.
128
+
129
+ ronin-web-spider is distributed in the hope that it will be useful,
130
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
131
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
132
+ GNU Lesser General Public License for more details.
133
+
134
+ You should have received a copy of the GNU Lesser General Public License
135
+ along with ronin-web-spider. If not, see <https://www.gnu.org/licenses/>.
136
+
137
+ [Ruby]: https://www.ruby-lang.org
138
+ [spidr]: https://github.com/postmodern/spidr#readme
139
+ [ronin-support]: https://github.com/ronin-rb/ronin-support#readme
data/Rakefile ADDED
@@ -0,0 +1,31 @@
1
+ require 'rubygems'
2
+
3
+ begin
4
+ require 'bundler'
5
+ rescue LoadError => e
6
+ warn e.message
7
+ warn "Run `gem install bundler` to install Bundler"
8
+ exit -1
9
+ end
10
+
11
+ begin
12
+ Bundler.setup(:development)
13
+ rescue Bundler::BundlerError => e
14
+ warn e.message
15
+ warn "Run `bundle install` to install missing gems"
16
+ exit e.status_code
17
+ end
18
+
19
+ require 'rake'
20
+
21
+ require 'rubygems/tasks'
22
+ Gem::Tasks.new(sign: {checksum: true, pgp: true})
23
+
24
+ require 'rspec/core/rake_task'
25
+ RSpec::Core::RakeTask.new
26
+ task :test => :spec
27
+ task :default => :spec
28
+
29
+ require 'yard'
30
+ YARD::Rake::YardocTask.new
31
+ task :docs => :yard
data/gemspec.yml ADDED
@@ -0,0 +1,27 @@
1
+ name: ronin-web-spider
2
+ summary: collection of common web spidering routines
3
+ description:
4
+ ronin-web-spider is a collection of common web spidering routines using the
5
+ spidr gem.
6
+
7
+ license: LGPL-3.0
8
+ authors: Postmodern
9
+ email: postmodern.mod3@gmail.com
10
+ homepage: https://ronin-rb.dev/
11
+ has_yard: true
12
+
13
+ metadata:
14
+ documentation_uri: https://rubydoc.info/gems/ronin-web-spider
15
+ source_code_uri: https://github.com/ronin-rb/ronin-web-spider
16
+ bug_tracker_uri: https://github.com/ronin-rb/ronin-web-spider/issues
17
+ changelog_uri: https://github.com/ronin-rb/ronin-web-spider/blob/master/ChangeLog.md
18
+ rubygems_mfa_required: 'true'
19
+
20
+ required_ruby_version: ">= 3.0.0"
21
+
22
+ dependencies:
23
+ spidr: ~> 0.7
24
+ ronin-support: ~> 1.0
25
+
26
+ development_dependencies:
27
+ bundler: ~> 2.0
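A sketch of how a `gemspec.yml` like the one above could be turned into a `Gem::Specification`. This is not the gem's actual `.gemspec`, which this diff does not show. `gemspec_from_yaml` is a hypothetical helper, and the version is passed in separately because the YAML has no `version` key.

```ruby
require 'rubygems'
require 'yaml'

# Hypothetical helper: builds a Gem::Specification from gemspec.yml-style text.
def gemspec_from_yaml(yaml, version)
  data = YAML.safe_load(yaml)

  Gem::Specification.new do |gem|
    gem.name    = data.fetch('name')
    gem.version = version
    gem.summary = data['summary']
    gem.license = data['license']
    gem.authors = Array(data['authors'])

    if (ruby_version = data['required_ruby_version'])
      gem.required_ruby_version = ruby_version
    end

    # Each entry maps a gem name to a version requirement such as "~> 0.7".
    data.fetch('dependencies', {}).each do |name, requirement|
      gem.add_dependency(name, requirement)
    end

    data.fetch('development_dependencies', {}).each do |name, requirement|
      gem.add_development_dependency(name, requirement)
    end
  end
end
```

Keeping the metadata in YAML means tools can read it without evaluating Ruby code, at the cost of one extra loading step like this.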