mkwebook 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 91b2c6fce12ddd620f497fc5a902c000525d8581e34ede7107ac55b2771871e4
4
+ data.tar.gz: 11ca2232b8c30848b352737eb9afebafbfda49e0a974b93586627390a524d977
5
+ SHA512:
6
+ metadata.gz: 7aaafd73130c773b6f2b5a942ab525ee95fb84a4a7b01e4ee890edabc1554563e9f3c6dc3fbd36b3321212a50969006e1a9bfd5efb0e0028a92b18bae4df319d
7
+ data.tar.gz: c8578b37ba25133d81e487f5486c0eba9c16712d048b64e0bf2336ef28a0310ce5b93994bc43634c692e671d043a1b97a70124d0ddf616e9330ebf669e0c32ea
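The digests in `checksums.yaml` can be checked against the packaged files. A minimal sketch using Ruby's standard `Digest` library, assuming `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` archive. The helper name `checksum_matches?` is hypothetical and not part of the gem:

```ruby
require 'digest'

# Hypothetical helper: compare the SHA256 digest of some bytes with an
# expected hex string such as the ones listed in checksums.yaml.
def checksum_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.strip.downcase
end

# Demonstration with a well-known value: the SHA256 digest of the empty string.
EMPTY_SHA256 = 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
puts checksum_matches?('', EMPTY_SHA256)
```

For a real check, pass `File.binread('data.tar.gz')` together with the `data.tar.gz` value from the SHA256 section.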
data/.gitignore ADDED
@@ -0,0 +1,10 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /doc/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/
9
+ /.idea/
10
+ /.log/
data/.solargraph.yml ADDED
@@ -0,0 +1,11 @@
1
+ require:
2
+ - actioncable
3
+ - actionmailer
4
+ - actionpack
5
+ - actionview
6
+ - activejob
7
+ - activemodel
8
+ - activerecord
9
+ - activestorage
10
+ - activesupport
11
+ - caxlsx
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at liuxiang@ktjr.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [https://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: https://contributor-covenant.org
74
+ [version]: https://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source "https://rubygems.org"
2
+
3
+ # Specify your gem's dependencies in mkwebook.gemspec
4
+ gemspec
5
+
6
+ gem "rake", "~> 12.0"
data/Gemfile.lock ADDED
@@ -0,0 +1,64 @@
1
+ PATH
2
+ remote: .
3
+ specs:
4
+ mkwebook (0.1.0)
5
+ activesupport (>= 6.1.5)
6
+ concurrent-ruby
7
+ ferrum (>= 0.13)
8
+ thor (>= 1.2.1)
9
+
10
+ GEM
11
+ remote: https://rubygems.org/
12
+ specs:
13
+ activesupport (7.0.4)
14
+ concurrent-ruby (~> 1.0, >= 1.0.2)
15
+ i18n (>= 1.6, < 2)
16
+ minitest (>= 5.1)
17
+ tzinfo (~> 2.0)
18
+ addressable (2.8.1)
19
+ public_suffix (>= 2.0.2, < 6.0)
20
+ byebug (11.1.3)
21
+ coderay (1.1.3)
22
+ concurrent-ruby (1.1.10)
23
+ ferrum (0.13)
24
+ addressable (~> 2.5)
25
+ concurrent-ruby (~> 1.1)
26
+ webrick (~> 1.7)
27
+ websocket-driver (>= 0.6, < 0.8)
28
+ i18n (1.12.0)
29
+ concurrent-ruby (~> 1.0)
30
+ method_source (1.0.0)
31
+ minitest (5.16.3)
32
+ pry (0.13.1)
33
+ coderay (~> 1.1)
34
+ method_source (~> 1.0)
35
+ pry-byebug (3.9.0)
36
+ byebug (~> 11.0)
37
+ pry (~> 0.13.0)
38
+ pry-doc (1.3.0)
39
+ pry (~> 0.11)
40
+ yard (~> 0.9.11)
41
+ public_suffix (5.0.0)
42
+ rake (12.3.3)
43
+ thor (1.2.1)
44
+ tzinfo (2.0.5)
45
+ concurrent-ruby (~> 1.0)
46
+ webrick (1.7.0)
47
+ websocket-driver (0.7.5)
48
+ websocket-extensions (>= 0.1.0)
49
+ websocket-extensions (0.1.5)
50
+ yard (0.9.28)
51
+ webrick (~> 1.7.0)
52
+
53
+ PLATFORMS
54
+ ruby
55
+
56
+ DEPENDENCIES
57
+ mkwebook!
58
+ pry
59
+ pry-byebug
60
+ pry-doc
61
+ rake (~> 12.0)
62
+
63
+ BUNDLED WITH
64
+ 2.3.3
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2020 Liu Xiang
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
File without changes
data/Rakefile ADDED
@@ -0,0 +1,2 @@
1
+ require "bundler/gem_tasks"
2
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "mkwebook"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
data/exe/mkwebook ADDED
@@ -0,0 +1,5 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "mkwebook"
4
+
5
+ Mkwebook::Cli.start
@@ -0,0 +1,179 @@
1
+ require 'fileutils'
2
+ require 'mkwebook/config'
3
+ require 'ferrum'
4
+ require 'pry-byebug'
5
+ require 'concurrent'
6
+
7
+ module Mkwebook
8
+ class App
9
+ attr_accessor :config, :browser, :browser_context, :cli_options
10
+
11
+ def initialize(cli_options)
12
+ if cli_options[:work_dir]
13
+ FileUtils.mkdir_p(cli_options[:work_dir]) unless File.directory?(cli_options[:work_dir])
14
+ Dir.chdir(cli_options[:work_dir])
15
+ end
16
+ @cli_options = cli_options
17
+ @config = Mkwebook::Config.new(@cli_options[:pause] || @cli_options[:pause_on_error] || @cli_options[:single_thread])
18
+ end
19
+
20
+ def create_config
21
+ FileUtils.cp(template_config_file, 'mkwebook.yml', verbose: true)
22
+ end
23
+
24
+ def template_config_file
25
+ File.join(Mkwebook::GEM_ROOT, 'template', 'mkwebook.yml')
26
+ end
27
+
28
+ def make
29
+ make_index
30
+ make_pages
31
+ end
32
+
33
+ def prepare_browser
34
+ @browser = Ferrum::Browser.new(browser_options)
35
+ @browser_context = browser.contexts.create
36
+ end
37
+
38
+ def make_index
39
+ prepare_browser
40
+ index_page = @browser_context.create_page
41
+ index_page.go_to(@config[:index_page][:url])
42
+ index_page.network.wait_for_idle(timeout: 10) rescue nil
43
+ modifier = @config[:index_page][:modifier]
44
+ if modifier && File.file?(modifier)
45
+ index_page.execute(File.read(modifier))
46
+ elsif modifier.present?
47
+ index_page.execute(modifier)
48
+ end
49
+ index_elements = index_page.css(@config[:index_page][:selector])
50
+
51
+ @page_urls = index_elements.flat_map do |element|
52
+ url = element.css(@config[:index_page][:link_selector]).map { |a| a.evaluate('this.href') }
53
+ element.css(@config[:index_page][:link_selector]).each do |a|
54
+ u = a.evaluate('this.href').normalize_uri('.html').relative_path_from(@config[:index_page][:output])
55
+ a.evaluate("this.href = '#{u}'")
56
+ end
57
+ url
58
+ end.uniq
59
+
60
+ @page_urls.select! do |url|
61
+ @config[:pages].any? { |page| url =~ Regexp.new(page[:url_pattern]) }
62
+ end
63
+
64
+ @page_urls = @page_urls[0, @cli_options[:limit]] if @cli_options[:limit]
65
+
66
+
67
+ @config[:index_page][:title].try do |title|
68
+ index_page.execute("document.title = '#{title}'")
69
+ end
70
+
71
+ index_page.execute <<-JS
72
+ for (var e of document.querySelectorAll('[integrity]')) {
73
+ e.removeAttribute('integrity');
74
+ }
75
+ JS
76
+
77
+ binding.pry if @cli_options[:pause]
78
+ download_assets(index_page, @config[:index_page][:assets] || [], @config[:index_page][:output])
79
+
80
+ index_elements.map do |element|
81
+ element.evaluate('this.outerHTML')
82
+ end.join("\n").tap do |html|
83
+ File.write(@config[:index_page][:output], html)
84
+ end
85
+ rescue Ferrum::Error => e
86
+ binding.pry
87
+ end
88
+
89
+ def make_pages
90
+
91
+ pool = Concurrent::FixedThreadPool.new(@config[:concurrency])
92
+
93
+ @page_urls.each do |url|
94
+ page_config = @config[:pages].find { |page| url =~ Regexp.new(page[:url_pattern]) }
95
+ next unless page_config
96
+
97
+ pool.post do
98
+ page = @browser_context.create_page
99
+
100
+ begin
101
+ output = url.normalize_file_path('.html')
102
+ page.go_to(url)
103
+ page.network.wait_for_idle(timeout: 10) rescue nil
104
+ modifier = page_config[:modifier]
105
+ if modifier && File.file?(modifier)
106
+ page.execute(File.read(modifier))
107
+ elsif modifier.present?
108
+ page.execute(modifier)
109
+ end
110
+ page_elements = page.css(page_config[:selector])
111
+
112
+ @config[:index_page][:title].try do |title|
113
+ page.execute("document.title = '#{title}'")
114
+ end
115
+
116
+ page.execute <<-JS
117
+ for (var e of document.querySelectorAll('[integrity]')) {
118
+ e.removeAttribute('integrity');
119
+ }
120
+ JS
121
+
122
+ binding.pry if @cli_options[:pause]
123
+ download_assets(page, page_config[:assets] || [])
124
+
125
+ page_elements.map do |element|
126
+ element.css('a').each do |a|
127
+ u = a.evaluate('this.href')
128
+ next unless @page_urls.include?(u)
129
+
130
+ u = u.normalize_uri('.html').relative_path_from(url.normalize_uri('.html'))
131
+ a.evaluate("this.href = '#{u}'")
132
+ end
133
+ element.evaluate('this.outerHTML')
134
+ end.join("\n").tap do |html|
135
+ FileUtils.mkdir_p(File.dirname(output))
136
+ File.write(output, html)
137
+ end
138
+ rescue Ferrum::Error => e
139
+ $stderr.puts e.message
140
+ $stderr.puts e.backtrace
141
+ binding.pry if @cli_options[:pause_on_error]
142
+ ensure
143
+ page.close
144
+ end
145
+ end
146
+
147
+ end
148
+
149
+ pool.shutdown
150
+ pool.wait_for_termination
151
+ end
152
+
153
+ def download_assets(page, assets_config, page_uri = nil)
154
+ assets_config.each do |asset_config|
155
+ asset_attr = asset_config[:attr]
156
+ asset_selector = asset_config[:selector]
157
+ page.css(asset_selector).each do |element|
158
+ asset_url = element.evaluate("this.#{asset_attr}")
159
+ next if asset_url.start_with?('data:')
160
+ asset_file = asset_url.normalize_file_path
161
+ FileUtils.mkdir_p(File.dirname(asset_file))
162
+ page.network.traffic.find { |t| t.url == asset_url }.try do |traffic|
163
+ traffic&.response&.body.try do |body|
164
+ File.write(asset_file, body)
165
+ end
166
+ end
167
+ u = asset_url.normalize_uri.relative_path_from((page_uri || page.url.normalize_uri))
168
+ element.evaluate("this.#{asset_attr} = '#{u}'")
169
+ end
170
+ end
171
+ end
172
+
173
+ private
174
+
175
+ def browser_options
176
+ @config[:browser]
177
+ end
178
+ end
179
+ end
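`make_pages` hands each URL to a `Concurrent::FixedThreadPool` and then waits for the pool to drain. The same fan-out pattern can be sketched with only the Ruby standard library. This is a stand-in for illustration, not the gem's implementation, and the name `process_concurrently` is hypothetical:

```ruby
# Stdlib-only stand-in for a fixed-size thread pool (assumption: the real
# code uses Concurrent::FixedThreadPool from concurrent-ruby instead).
def process_concurrently(items, workers:)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # a closed, empty queue makes #pop return nil

  results = Queue.new
  threads = Array.new(workers) do
    Thread.new do
      while (item = queue.pop)
        begin
          results << [item, yield(item)]
        rescue StandardError => e
          # Report the failure and keep the worker alive for the next item.
          warn "#{item}: #{e.message}"
        end
      end
    end
  end

  threads.each(&:join) # comparable to shutdown + wait_for_termination
  Array.new(results.size) { results.pop }.to_h
end

pages = process_concurrently(%w[a b c], workers: 2) { |name| name.upcase }
puts pages.sort.inspect
```

A fixed number of workers bounds how many browser pages would be open at once, which is the role `concurrency` plays in the config.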
@@ -0,0 +1,41 @@
1
+ require 'thor'
2
+
3
+ module Mkwebook
4
+ class Cli < ::Thor
5
+ class << self
6
+ def main(args)
7
+ start(args)
8
+ end
9
+ end
10
+
11
+ class_option :work_dir, :type => :string, :aliases => '-d', :default => '.', :desc => 'Working directory'
12
+ class_option :pause_on_error, :type => :boolean, :aliases => '-e', :default => false, :desc => 'Pause on error; this option forces concurrency off'
13
+ desc 'init', 'Create config file in current directory'
14
+ def init
15
+ Mkwebook::App.new(options).create_config
16
+ end
17
+
18
+ option :pause, :type => :boolean, :aliases => '-p', :desc => 'Pause after processing index page'
19
+ desc 'make_index', 'Download and process index page'
20
+ def make_index
21
+ Mkwebook::App.new(options).make_index
22
+ end
23
+
24
+ option :limit, :type => :numeric, :aliases => '-l', :desc => 'Limit the number of pages; especially useful for debugging'
25
+ option :pause, :type => :boolean, :aliases => '-P', :desc => 'Pause before quitting; this option forces concurrency off'
26
+ option :pause_on_index, :type => :boolean, :aliases => '-p', :desc => 'Pause after processing index page'
27
+ option :single_thread, :type => :boolean, :aliases => '-s', :desc => 'Force concurrency off'
28
+ desc 'make', 'Download and process html files'
29
+ def make
30
+ Mkwebook::App.new(options).make
31
+ end
32
+
33
+ desc 'version', 'Print version'
34
+ def version
35
+ puts Mkwebook::VERSION
36
+ end
37
+
38
+ no_commands do
39
+ end
40
+ end
41
+ end
@@ -0,0 +1,4 @@
1
+ require 'mkwebook/commands/info'
2
+
3
+ module Mkwebook::Commands
4
+ end
@@ -0,0 +1,244 @@
1
+ require 'active_support/concern'
2
+
3
+ module Mkwebook
4
+ module Concerns
5
+ module GlobalDataDefinition
6
+ extend ActiveSupport::Concern
7
+
8
+ included do
9
+
10
+ # Example:
11
+ #
12
+ # create_table :post, id: false, primary_key: :id do |t|
13
+ # t.column :id, :bigint, precision: 19, comment: 'ID'
14
+ # t.column :name, :string, comment: 'Name'
15
+ # t.column :gmt_created, :datetime, comment: 'Creation time'
16
+ # t.column :gmt_modified, :datetime, comment: 'Last modified time'
17
+ # end
18
+ #
19
+ # Creates a new table with the name +table_name+. +table_name+ may either
20
+ # be a String or a Symbol.
21
+ #
22
+ # There are two ways to work with #create_table. You can use the block
23
+ # form or the regular form, like this:
24
+ #
25
+ # === Block form
26
+ #
27
+ # # create_table() passes a TableDefinition object to the block.
28
+ # # This form will not only create the table, but also columns for the
29
+ # # table.
30
+ #
31
+ # create_table(:suppliers) do |t|
32
+ # t.column :name, :string, limit: 60
33
+ # # Other fields here
34
+ # end
35
+ #
36
+ # === Block form, with shorthand
37
+ #
38
+ # # You can also use the column types as method calls, rather than calling the column method.
39
+ # create_table(:suppliers) do |t|
40
+ # t.string :name, limit: 60
41
+ # # Other fields here
42
+ # end
43
+ #
44
+ # === Regular form
45
+ #
46
+ # # Creates a table called 'suppliers' with no columns.
47
+ # create_table(:suppliers)
48
+ # # Add a column to 'suppliers'.
49
+ # add_column(:suppliers, :name, :string, {limit: 60})
50
+ #
51
+ # The +options+ hash can include the following keys:
52
+ # [<tt>:id</tt>]
53
+ # Whether to automatically add a primary key column. Defaults to true.
54
+ # Join tables for {ActiveRecord::Base.has_and_belongs_to_many}[rdoc-ref:Associations::ClassMethods#has_and_belongs_to_many] should set it to false.
55
+ #
56
+ # A Symbol can be used to specify the type of the generated primary key column.
57
+ # [<tt>:primary_key</tt>]
58
+ # The name of the primary key, if one is to be added automatically.
59
+ # Defaults to +id+. If <tt>:id</tt> is false, then this option is ignored.
60
+ #
61
+ # If an array is passed, a composite primary key will be created.
62
+ #
63
+ # Note that Active Record models will automatically detect their
64
+ # primary key. This can be avoided by using
65
+ # {self.primary_key=}[rdoc-ref:AttributeMethods::PrimaryKey::ClassMethods#primary_key=] on the model
66
+ # to define the key explicitly.
67
+ #
68
+ # [<tt>:options</tt>]
69
+ # Any extra options you want appended to the table definition.
70
+ # [<tt>:temporary</tt>]
71
+ # Make a temporary table.
72
+ # [<tt>:force</tt>]
73
+ # Set to true to drop the table before creating it.
74
+ # Set to +:cascade+ to drop dependent objects as well.
75
+ # Defaults to false.
76
+ # [<tt>:if_not_exists</tt>]
77
+ # Set to true to avoid raising an error when the table already exists.
78
+ # Defaults to false.
79
+ # [<tt>:as</tt>]
80
+ # SQL to use to generate the table. When this option is used, the block is
81
+ # ignored, as are the <tt>:id</tt> and <tt>:primary_key</tt> options.
82
+ #
83
+ # ====== Add a backend specific option to the generated SQL (MySQL)
84
+ #
85
+ # create_table(:suppliers, options: 'ENGINE=InnoDB DEFAULT CHARSET=utf8mb4')
86
+ #
87
+ # generates:
88
+ #
89
+ # CREATE TABLE suppliers (
90
+ # id bigint auto_increment PRIMARY KEY
91
+ # ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
92
+ #
93
+ # ====== Rename the primary key column
94
+ #
95
+ # create_table(:objects, primary_key: 'guid') do |t|
96
+ # t.column :name, :string, limit: 80
97
+ # end
98
+ #
99
+ # generates:
100
+ #
101
+ # CREATE TABLE objects (
102
+ # guid bigint auto_increment PRIMARY KEY,
103
+ # name varchar(80)
104
+ # )
105
+ #
106
+ # ====== Change the primary key column type
107
+ #
108
+ # create_table(:tags, id: :string) do |t|
109
+ # t.column :label, :string
110
+ # end
111
+ #
112
+ # generates:
113
+ #
114
+ # CREATE TABLE tags (
115
+ # id varchar PRIMARY KEY,
116
+ # label varchar
117
+ # )
118
+ #
119
+ # ====== Create a composite primary key
120
+ #
121
+ # create_table(:orders, primary_key: [:product_id, :client_id]) do |t|
122
+ # t.belongs_to :product
123
+ # t.belongs_to :client
124
+ # end
125
+ #
126
+ # generates:
127
+ #
128
+ # CREATE TABLE orders (
129
+ # product_id bigint NOT NULL,
130
+ # client_id bigint NOT NULL
131
+ # );
132
+ #
133
+ # ALTER TABLE ONLY "orders"
134
+ # ADD CONSTRAINT orders_pkey PRIMARY KEY (product_id, client_id);
135
+ #
136
+ # ====== Do not add a primary key column
137
+ #
138
+ # create_table(:categories_suppliers, id: false) do |t|
139
+ # t.column :category_id, :bigint
140
+ # t.column :supplier_id, :bigint
141
+ # end
142
+ #
143
+ # generates:
144
+ #
145
+ # CREATE TABLE categories_suppliers (
146
+ # category_id bigint,
147
+ # supplier_id bigint
148
+ # )
149
+ #
150
+ # ====== Create a temporary table based on a query
151
+ #
152
+ # create_table(:long_query, temporary: true,
153
+ # as: "SELECT * FROM orders INNER JOIN line_items ON order_id=orders.id")
154
+ #
155
+ # generates:
156
+ #
157
+ # CREATE TEMPORARY TABLE long_query AS
158
+ # SELECT * FROM orders INNER JOIN line_items ON order_id=orders.id
159
+ #
160
+ # See also TableDefinition#column for details on how to create columns.
161
+ def create_table(table_name, **options, &blk)
162
+ ActiveRecord::Base.connection.create_table(table_name, **options, &blk)
163
+ end
164
+
165
+ # Creates a new join table with the name created using the lexical order of the first two
166
+ # arguments. These arguments can be a String or a Symbol.
167
+ #
168
+ # # Creates a table called 'assemblies_parts' with no id.
169
+ # create_join_table(:assemblies, :parts)
170
+ #
171
+ # You can pass an +options+ hash which can include the following keys:
172
+ # [<tt>:table_name</tt>]
173
+ # Sets the table name, overriding the default.
174
+ # [<tt>:column_options</tt>]
175
+ # Any extra options you want appended to the columns definition.
176
+ # [<tt>:options</tt>]
177
+ # Any extra options you want appended to the table definition.
178
+ # [<tt>:temporary</tt>]
179
+ # Make a temporary table.
180
+ # [<tt>:force</tt>]
181
+ # Set to true to drop the table before creating it.
182
+ # Defaults to false.
183
+ #
184
+ # Note that #create_join_table does not create any indices by default; you can use
185
+ # its block form to do so yourself:
186
+ #
187
+ # create_join_table :products, :categories do |t|
188
+ # t.index :product_id
189
+ # t.index :category_id
190
+ # end
191
+ #
192
+ # ====== Add a backend specific option to the generated SQL (MySQL)
193
+ #
194
+ # create_join_table(:assemblies, :parts, options: 'ENGINE=InnoDB DEFAULT CHARSET=utf8')
195
+ #
196
+ # generates:
197
+ #
198
+ # CREATE TABLE assemblies_parts (
199
+ # assembly_id bigint NOT NULL,
200
+ # part_id bigint NOT NULL
201
+ # ) ENGINE=InnoDB DEFAULT CHARSET=utf8
202
+ #
203
+ def create_join_table(table_1, table_2, column_options: {}, **options)
204
+ ActiveRecord::Base.connection.create_join_table(table_1, table_2, column_options: column_options, **options)
205
+ end
206
+
207
+ # Drops a table from the database.
208
+ #
209
+ # [<tt>:force</tt>]
210
+ # Set to +:cascade+ to drop dependent objects as well.
211
+ # Defaults to false.
212
+ # [<tt>:if_exists</tt>]
213
+ # Set to +true+ to only drop the table if it exists.
214
+ # Defaults to false.
215
+ #
216
+ # Although this command ignores most +options+ and the block if one is given,
217
+ # it can be helpful to provide these in a migration's +change+ method so it can be reverted.
218
+ # In that case, +options+ and the block will be used by #create_table.
219
+ def drop_table(table_name, **options)
220
+ ActiveRecord::Base.connection.drop_table(table_name, **options)
221
+ end
222
+
223
+ # Drops the join table specified by the given arguments.
224
+ # See #create_join_table for details.
225
+ #
226
+ # Although this command ignores the block if one is given, it can be helpful
227
+ # to provide one in a migration's +change+ method so it can be reverted.
228
+ # In that case, the block will be used by #create_join_table.
229
+ def drop_join_table(table_1, table_2, **options)
230
+ ActiveRecord::Base.connection.drop_join_table(table_1, table_2, **options)
231
+ end
232
+
233
+ # Renames a table.
234
+ #
235
+ # rename_table('octopuses', 'octopi')
236
+ #
237
+ def rename_table(table_name, new_name)
238
+ ActiveRecord::Base.connection.rename_table(table_name, new_name)
239
+ end
240
+
241
+ end
242
+ end
243
+ end
244
+ end
@@ -0,0 +1 @@
1
+ require 'mkwebook/concerns/global_data_definition'
@@ -0,0 +1,49 @@
1
+ require 'delegate'
2
+ require 'etc'
3
+
4
+ module Mkwebook
5
+ class Config < SimpleDelegator
6
+ attr_accessor :file, :config
7
+
8
+ def initialize(force_concurrency_off)
9
+ super(nil)
10
+ @file = find_mkwebook_yaml
11
+ if @file && File.exist?(@file)
12
+ @config = load(@file, force_concurrency_off)
13
+ __setobj__(@config)
14
+ else
15
+ __setobj__(self)
16
+ end
17
+ end
18
+
19
+ def load(config_file, force_concurrency_off)
20
+ default_config = {
21
+ 'browser' => {
22
+ 'headless' => true
23
+ },
24
+ 'concurrency' => 1
25
+ }
26
+ config = YAML.load_file(config_file)
27
+ config = default_config.deep_merge(config).deep_transform_keys! { |k| k.to_s.underscore.to_sym }
28
+ config[:concurrency] = 1 if force_concurrency_off
29
+ config
30
+ end
31
+
32
+ def concurrent?
33
+ config[:concurrency].present?
34
+ end
35
+
36
+ def find_mkwebook_yaml
37
+ dir = Dir.pwd
38
+ while dir != '/'
39
+ file = File.join(dir, 'mkwebook.yaml')
40
+ return file if File.exist?(file)
41
+
42
+ file = File.join(dir, 'mkwebook.yml')
43
+ return file if File.exist?(file)
44
+
45
+ dir = File.dirname(dir)
46
+ end
47
+ end
48
+ end
49
+ end
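`Config#load` merges the YAML file over a set of defaults and then normalizes keys such as `index-page` to symbols such as `:index_page`, relying on ActiveSupport's `deep_merge` and `deep_transform_keys!`. A minimal stdlib-only sketch of the same idea follows; the helper names `deep_merge`, `symbolize`, and `load_config` are hypothetical, and the dash-to-underscore step only approximates ActiveSupport's `underscore`:

```ruby
require 'yaml'

DEFAULTS = { 'browser' => { 'headless' => true }, 'concurrency' => 1 }.freeze

# Recursively merge two hashes; values from `override` win.
def deep_merge(base, override)
  base.merge(override) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

# Turn keys such as "index-page" into :index_page, descending into
# nested hashes and arrays.
def symbolize(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, val), out|
      out[key.to_s.tr('-', '_').to_sym] = symbolize(val)
    end
  when Array
    value.map { |val| symbolize(val) }
  else
    value
  end
end

def load_config(yaml_text, force_concurrency_off: false)
  config = symbolize(deep_merge(DEFAULTS, YAML.safe_load(yaml_text) || {}))
  config[:concurrency] = 1 if force_concurrency_off
  config
end

SAMPLE_YAML = <<~YAML
  browser:
    timeout: 30
  concurrency: 16
  index-page:
    link-selector: a
  pages:
    - url-pattern: '.*'
YAML

p load_config(SAMPLE_YAML)
```

Keeping every default under a string key before the merge avoids ending up with both `'concurrency'` and `:concurrency` in the same hash.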
@@ -0,0 +1,44 @@
1
+ class String
2
+ def p
3
+ puts self
4
+ end
5
+
6
+ def expa
7
+ File.expand_path(self)
8
+ end
9
+
10
+ def f
11
+ expa
12
+ end
13
+
14
+ def normalize_file_path(force_extname = nil)
15
+ uri = URI.parse(self)
16
+ file_path = uri.path[1..]
17
+ extname = File.extname(file_path)
18
+ basename = File.basename(file_path, extname)
19
+ origin = "#{uri.scheme.try { |s| s + '_' }}#{uri.host}#{uri.port.try { |p| '_' + p.to_s }}"
20
+ basename += "_#{Digest::MD5.hexdigest(uri.query)}" if uri.query.present?
21
+ extname = force_extname if force_extname && extname.empty?
22
+ File.join(origin, File.dirname(file_path), basename + extname)
23
+ end
24
+
25
+ def normalize_uri(force_extname = nil)
26
+ uri = URI.parse(self)
27
+ file_path = uri.path[1..]
28
+ extname = File.extname(file_path)
29
+ basename = File.basename(file_path, extname)
30
+ basename += "_#{Digest::MD5.hexdigest(uri.query)}" if uri.query.present?
31
+ origin = "#{uri.scheme.try { |s| s + '_' }}#{uri.host}#{uri.port.try { |p| '_' + p.to_s }}"
32
+ extname = force_extname if force_extname && extname.empty?
33
+ file_path = File.join(origin, File.dirname(file_path), basename + extname)
34
+ if uri.fragment.present?
35
+ file_path += "##{uri.fragment}"
36
+ else
37
+ file_path
38
+ end
39
+ end
40
+
41
+ def relative_path_from(base)
42
+ Pathname.new(self).relative_path_from(Pathname.new(base)).to_s.gsub(%r{^\.\./}, '')
43
+ end
44
+ end
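The two `normalize_*` methods map a URL to a local file path, and `relative_path_from` turns that path into a link that works from another saved file. The sketch below restates both steps as standalone functions using only the standard library, since the originals depend on ActiveSupport's `try` and `present?`. The names `local_path_for` and `relative_link` are hypothetical:

```ruby
require 'uri'
require 'digest'
require 'pathname'

# Map a URL to "<scheme>_<host>_<port>/<path>", appending an MD5 of the
# query string to the basename so distinct queries get distinct files.
def local_path_for(url, force_extname = nil)
  uri = URI.parse(url)
  file_path = uri.path[1..].to_s # drop the leading "/"; tolerate an empty path
  extname = File.extname(file_path)
  basename = File.basename(file_path, extname)
  origin = [uri.scheme, uri.host, uri.port].compact.join('_')
  basename += "_#{Digest::MD5.hexdigest(uri.query)}" if uri.query && !uri.query.empty?
  extname = force_extname if force_extname && extname.empty?
  File.join(origin, File.dirname(file_path), basename + extname)
end

# Pathname treats `from_file` as a directory, so the result carries one
# extra leading "../"; removing it yields a link relative to the file.
def relative_link(target_path, from_file)
  Pathname.new(target_path)
          .relative_path_from(Pathname.new(from_file))
          .to_s
          .sub(%r{\A\.\./}, '')
end

page = local_path_for('https://clojure.org/guides/repl/introduction', '.html')
puts page
puts relative_link(page, 'index.html')
puts relative_link('site/css/main.css', 'site/guides/b.html')
```

Because `URI#port` returns the scheme's default port when none is given, the origin directory always includes a port, for example `443` for `https` URLs.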
@@ -0,0 +1 @@
1
+ require 'mkwebook/ext/string'
@@ -0,0 +1,3 @@
1
+ module Mkwebook
2
+ VERSION = "0.1.0"
3
+ end
data/lib/mkwebook.rb ADDED
@@ -0,0 +1,9 @@
1
+ require "mkwebook/version"
2
+ require 'mkwebook/ext'
3
+ require "mkwebook/app"
4
+ require "mkwebook/cli"
5
+ require 'active_support/all'
6
+
7
+ module Mkwebook
8
+ GEM_ROOT = __dir__
9
+ end
@@ -0,0 +1,42 @@
1
+ browser:
2
+ headless: false
3
+ window_size: [1440, 1024]
4
+ timeout: 30
5
+
6
+ concurrency: 16
7
+
8
+ index-page:
9
+ url: https://clojure.org/guides/repl/introduction
10
+ modifier: |
11
+ document.body.innerHTML = document.querySelector('.clj-section-nav-container').outerHTML;
12
+ document.querySelector('.clj-section-nav-container').style.width = '100%';
13
+ document.body.style.backgroundColor = 'white';
14
+
15
+ selector: "html"
16
+ output: "index.html"
17
+ link-selector: "a:not([href='../guides'])"
18
+ assets:
19
+ - selector: "link[rel=stylesheet]"
20
+ attr: href
21
+ - selector: "script[src]"
22
+ attr: src
23
+
24
+
25
+ pages:
26
+ - url-pattern: '.*'
27
+ modifier: |
28
+ document.body.innerHTML = document.querySelector('.clj-content-container').outerHTML;
29
+ document.querySelector('.clj-content-container').style.width = '100%';
30
+ document.body.style.backgroundColor = 'white';
31
+ var style = document.createElement('style');
32
+ style.innerHTML = '.clj-content-container { margin-left: 0; }';
33
+ document.body.appendChild(style);
34
+ selector: html
35
+ assets:
36
+ - selector: img
37
+ attr: src
38
+ - selector: "link[rel=stylesheet]"
39
+ attr: href
40
+ - selector: "script[src]"
41
+ attr: src
42
+
@@ -0,0 +1,99 @@
1
+ browser:
2
+ headless: false
3
+ window_size: [1440, 1024]
4
+ timeout: 30
5
+
6
+ concurrency: 16
7
+
8
+ index-page:
9
+ url: https://python-poetry.org/docs
10
+ title: Poetry Documentation
11
+ modifier: |
12
+ document.querySelector('button[data-controller="mode-switch"]').click();
13
+ document.querySelector('#TableOfContents').remove();
14
+ document.body.innerHTML = document.querySelector('#documentation-menu').outerHTML;
15
+ document.querySelector('#documentation-menu').style.width = '100%';
16
+
17
+ selector: "html"
18
+ output: "index.html"
19
+ link-selector: a
20
+ assets:
21
+ - selector: "link[rel=stylesheet]"
22
+ attr: href
23
+ - selector: "script[src]"
24
+ attr: src
25
+
26
+
27
+ pages:
28
+ - url-pattern: '.*'
29
+ modifier: |
30
+ for (var e of document.querySelectorAll('#content > div > div > div')) {
31
+ e.remove();
32
+ }
33
+ document.querySelector('#content > div').classList.remove('max-w-7xl');
34
+ document.querySelector('#content > div').classList.remove('mt-48');
35
+ document.querySelector('#content > div').classList.remove('md:mt-64');
36
+ document.querySelector('[data-controller="menu"]').remove();
37
+ document.querySelector('footer').remove();
38
+ document.querySelector('#docs').classList.remove('lg:px-12');
39
+ for (var e of document.querySelectorAll('.clipboard-button')) {e.remove();}
40
+ var style = document.createElement('style');
41
+ style.innerHTML = `
42
+ #content main .highlight { width:100%;}
43
+ #content main .admonition {margin-left: 80px; width: 100%;}
44
+ .highlight pre {
45
+ --tw-bg-opacity: 1;
46
+ background-color: rgb(255 255 255/ var(--tw-bg-opacity));
47
+ border-radius: .5rem;
48
+ font-family: Jetbrains Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,monospace;
49
+ line-height: 1.5rem;
50
+ overflow: auto;
51
+ padding: 1.75rem 2rem
52
+ }
53
+ .browser-window.dark .highlight pre code,.highlight pre code {
54
+ --tw-text-opacity: 1!important;
55
+ background-color: transparent!important;
56
+ color: rgb(0 0 0/var(--tw-text-opacity))!important;
57
+ }
58
+ body {
59
+ color: black!important;
60
+ }
61
+ #content main .admonition .content p>code, #content main .admonition .content>code, #content main li>code, #content main p>a>code, #content main p>code, #content main table:not(.lntable) td code {
62
+ color: #000!important;
63
+ }
64
+
65
+ #content main .admonition.note:before {
66
+ content: "";
67
+ }
68
+
69
+ #content main .admonition {
70
+ margin-left: auto;
71
+ width: 100%;
72
+ }
73
+
74
+ `;
75
+
76
+ for (var e of document.querySelectorAll('pre > code > span')) {
77
+ e.style.color = 'black';
78
+ }
79
+
80
+ document.body.appendChild(style);
81
+ var script = document.createElement('script');
82
+ script.type = 'text/javascript';
83
+ script.innerHTML = `
84
+ document.addEventListener('DOMContentLoaded', function() {
85
+ document.querySelector('html').removeAttribute('class');
86
+ document.querySelector('html').setAttribute('class', 'light');
87
+ });
88
+ `
89
+ document.body.appendChild(script);
90
+
91
+ selector: html
92
+ assets:
93
+ - selector: img
94
+ attr: src
95
+ - selector: "link[rel=stylesheet]"
96
+ attr: href
97
+ - selector: "script[src]"
98
+ attr: src
99
+
@@ -0,0 +1,44 @@
1
+ browser: # browser settings
2
+ headless: false # headless mode
3
+ window_size: [1440, 1024] # browser window size
4
+ timeout: 30 # timeout for waiting for page loading
5
+ # Any options accepted by Ferrum::Browser.new are allowed here
6
+
7
+ concurrency: 16 # number of concurrent threads; defaults to 1 (no concurrency)
8
+
9
+ index-page: # index page settings
10
+ url: https://clojure.org/guides/repl/introduction # URL of index page
11
+ title: Clojure Guides # title for the book, use page's title if not set
12
+ modifier: | # JavaScript code to modify the page
13
+ document.body.innerHTML = document.querySelector('.clj-section-nav-container').outerHTML;
14
+ document.querySelector('.clj-section-nav-container').style.width = '100%';
15
+ document.body.style.backgroundColor = 'white';
16
+
17
+ selector: "html" # CSS selector for the content to be saved
18
+ output: "index.html" # output file name
19
+ link-selector: "a:not([href='../guides'])" # CSS selector for links of content pages
20
+ assets: # assets to be downloaded
21
+ - selector: "link[rel=stylesheet]" # CSS selector for assets
22
+ attr: href # attribute name for the asset URL
23
+ - selector: "script[src]"
24
+ attr: src
25
+
26
+
27
+ pages: # settings for content pages
28
+ - url-pattern: '.*' # URL pattern for content pages; only pages whose URL matches this pattern are processed
29
+ modifier: | # JavaScript code to modify the page
30
+ document.body.innerHTML = document.querySelector('.clj-content-container').outerHTML;
31
+ document.querySelector('.clj-content-container').style.width = '100%';
32
+ document.body.style.backgroundColor = 'white';
33
+ var style = document.createElement('style');
34
+ style.innerHTML = '.clj-content-container { margin-left: 0; }';
35
+ document.body.appendChild(style);
36
+ selector: html # CSS selector for the content to be saved
37
+ assets: # assets to be downloaded
38
+ - selector: img # CSS selector for assets
39
+ attr: src # attribute name for the asset URL
40
+ - selector: "link[rel=stylesheet]"
41
+ attr: href
42
+ - selector: "script[src]"
43
+ attr: src
44
+
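The `pages` list in the configuration above pairs each `url-pattern` with its own `modifier`, `selector`, and `assets`. As a rough illustration of how such a list could be consulted, the sketch below picks the first entry whose pattern matches a URL. It is not the gem's loader: the helper `page_settings_for`, the first-match-wins rule, and the treatment of patterns as Ruby regular expressions are all assumptions made for the example, and the YAML excerpt is hypothetical.

```ruby
require 'yaml'

# Hypothetical excerpt in the same shape as the configuration above.
config_yaml = <<~YAML
  concurrency: 16
  pages:
    - url-pattern: '/guides/.*'
      selector: html
    - url-pattern: '.*'
      selector: body
YAML

# Return the first `pages` entry whose url-pattern matches the URL, or nil.
# Assumption: patterns are Ruby regular expressions and the first match wins;
# the gem's real matching rules may differ.
def page_settings_for(config, url)
  config.fetch('pages', []).find do |page|
    Regexp.new(page['url-pattern']).match?(url)
  end
end

config = YAML.safe_load(config_yaml)
settings = page_settings_for(config, 'https://clojure.org/guides/repl/introduction')
puts settings['selector'] # prints "html"
```

Under this reading, a catch-all `'.*'` entry placed last acts as a fallback for any page not claimed by a more specific pattern.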
data/mkwebook.gemspec ADDED
@@ -0,0 +1,32 @@
1
+ require_relative 'lib/mkwebook/version'
2
+
3
+ Gem::Specification.new do |spec|
4
+ spec.name = 'mkwebook'
5
+ spec.version = Mkwebook::VERSION
6
+ spec.authors = ['Liu Xiang']
7
+ spec.email = ['liuxiang921@gmail.com']
8
+
9
+ spec.summary = %(A tool to download web pages and convert them to Calibre ready.)
10
+ spec.description = %(A tool to download web pages and convert them to Calibre ready.)
11
+ spec.homepage = 'https://github.com/lululau/mkwebook'
12
+ spec.license = 'MIT'
13
+ spec.required_ruby_version = Gem::Requirement.new('>= 2.6.0')
14
+
15
+ # Specify which files should be added to the gem when it is released.
16
+ # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
17
+ spec.files = Dir.chdir(File.expand_path(__dir__)) do
18
+ `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
19
+ end
20
+ spec.bindir = 'exe'
21
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
22
+ spec.require_paths = ['lib']
23
+
24
+ spec.add_dependency 'activesupport', '>= 6.1.5'
25
+ spec.add_dependency 'concurrent-ruby'
26
+ spec.add_dependency 'ferrum', '>= 0.13'
27
+ spec.add_dependency 'thor', '>= 1.2.1'
28
+
29
+ spec.add_development_dependency 'pry'
30
+ spec.add_development_dependency 'pry-byebug'
31
+ spec.add_development_dependency 'pry-doc'
32
+ end
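The runtime dependencies in this gemspec use open-ended `>=` constraints, which accept any later release, including new major versions. The sketch below uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes to show how `'>= 0.13'` behaves next to a pessimistic `'~> 0.13'` constraint. The comparison and the sample version numbers are illustrative only; the gemspec itself declares only the `>=` form.

```ruby
require 'rubygems'

# The constraint declared for ferrum in the gemspec.
open_ended = Gem::Requirement.new('>= 0.13')
# A pessimistic alternative, shown only for contrast (not used by the gem).
pessimistic = Gem::Requirement.new('~> 0.13')

# Sample version strings chosen for illustration.
%w[0.12.1 0.13 0.14.0 1.0].each do |v|
  version = Gem::Version.new(v)
  puts format('%-7s >= 0.13: %-5s  ~> 0.13: %s',
              v,
              open_ended.satisfied_by?(version),
              pessimistic.satisfied_by?(version))
end
```

The two constraints agree on every sample except `1.0`, which `>= 0.13` accepts and `~> 0.13` rejects.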
data/mkwebook.yml ADDED
@@ -0,0 +1,19 @@
1
+ browser:
2
+ headless: false
3
+
4
+ concurrency: 1
5
+
6
+
7
+ index-page:
8
+ url: https://clojure.org/guides/repl/introduction
9
+ extractor:
10
+ code: ./index_extract.js
11
+
12
+
13
+ page-extractors:
14
+ - url-pattern:
15
+ code:
16
+ assets:
17
+ - css-selector:
18
+ download-attr:
19
+
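The top-level keys in this file do not all match those of the commented configuration earlier in the diff (`page-extractors` here versus `pages` there, for example). A YAML loader accepts any key without complaint, so an unrecognized or misspelled key is typically ignored rather than reported. The sketch below shows one way such keys could be flagged. The `known_keys` list is an assumption taken from the commented configuration, the helper `unknown_keys` is hypothetical, and nothing in the diff indicates that the gem performs this check.

```ruby
require 'yaml'

# Top-level keys seen in the commented configuration; assumed to be the full set.
known_keys = %w[browser concurrency index-page pages]

# Return the top-level keys of a parsed config that are not in the known list.
def unknown_keys(config, known_keys)
  config.keys - known_keys
end

# Hypothetical config containing a misspelled key.
sample = YAML.safe_load(<<~YAML)
  browser:
    headless: false
  concurency: 4
YAML

puts unknown_keys(sample, known_keys).inspect # prints ["concurency"]
```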
metadata ADDED
@@ -0,0 +1,168 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: mkwebook
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Liu Xiang
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2022-12-09 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: activesupport
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: 6.1.5
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: 6.1.5
27
+ - !ruby/object:Gem::Dependency
28
+ name: concurrent-ruby
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: ferrum
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0.13'
48
+ type: :runtime
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0.13'
55
+ - !ruby/object:Gem::Dependency
56
+ name: thor
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: 1.2.1
62
+ type: :runtime
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: 1.2.1
69
+ - !ruby/object:Gem::Dependency
70
+ name: pry
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - ">="
74
+ - !ruby/object:Gem::Version
75
+ version: '0'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - ">="
81
+ - !ruby/object:Gem::Version
82
+ version: '0'
83
+ - !ruby/object:Gem::Dependency
84
+ name: pry-byebug
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - ">="
88
+ - !ruby/object:Gem::Version
89
+ version: '0'
90
+ type: :development
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - ">="
95
+ - !ruby/object:Gem::Version
96
+ version: '0'
97
+ - !ruby/object:Gem::Dependency
98
+ name: pry-doc
99
+ requirement: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - ">="
102
+ - !ruby/object:Gem::Version
103
+ version: '0'
104
+ type: :development
105
+ prerelease: false
106
+ version_requirements: !ruby/object:Gem::Requirement
107
+ requirements:
108
+ - - ">="
109
+ - !ruby/object:Gem::Version
110
+ version: '0'
111
+ description: A tool to download web pages and convert them to Calibre ready.
112
+ email:
113
+ - liuxiang921@gmail.com
114
+ executables:
115
+ - mkwebook
116
+ extensions: []
117
+ extra_rdoc_files: []
118
+ files:
119
+ - ".gitignore"
120
+ - ".solargraph.yml"
121
+ - CODE_OF_CONDUCT.md
122
+ - Gemfile
123
+ - Gemfile.lock
124
+ - LICENSE.txt
125
+ - README.md
126
+ - Rakefile
127
+ - bin/console
128
+ - bin/setup
129
+ - exe/mkwebook
130
+ - lib/mkwebook.rb
131
+ - lib/mkwebook/app.rb
132
+ - lib/mkwebook/cli.rb
133
+ - lib/mkwebook/commands.rb
134
+ - lib/mkwebook/concerns.rb
135
+ - lib/mkwebook/concerns/global_data_definition.rb
136
+ - lib/mkwebook/config.rb
137
+ - lib/mkwebook/ext.rb
138
+ - lib/mkwebook/ext/string.rb
139
+ - lib/mkwebook/version.rb
140
+ - lib/template/mkwebook.clojure.yml
141
+ - lib/template/mkwebook.poetry.yml
142
+ - lib/template/mkwebook.yml
143
+ - mkwebook.gemspec
144
+ - mkwebook.yml
145
+ homepage: https://github.com/lululau/mkwebook
146
+ licenses:
147
+ - MIT
148
+ metadata: {}
149
+ post_install_message:
150
+ rdoc_options: []
151
+ require_paths:
152
+ - lib
153
+ required_ruby_version: !ruby/object:Gem::Requirement
154
+ requirements:
155
+ - - ">="
156
+ - !ruby/object:Gem::Version
157
+ version: 2.6.0
158
+ required_rubygems_version: !ruby/object:Gem::Requirement
159
+ requirements:
160
+ - - ">="
161
+ - !ruby/object:Gem::Version
162
+ version: '0'
163
+ requirements: []
164
+ rubygems_version: 3.3.3
165
+ signing_key:
166
+ specification_version: 4
167
+ summary: A tool to download web pages and convert them to Calibre ready.
168
+ test_files: []