mkwebook 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 91b2c6fce12ddd620f497fc5a902c000525d8581e34ede7107ac55b2771871e4
4
+ data.tar.gz: 11ca2232b8c30848b352737eb9afebafbfda49e0a974b93586627390a524d977
5
+ SHA512:
6
+ metadata.gz: 7aaafd73130c773b6f2b5a942ab525ee95fb84a4a7b01e4ee890edabc1554563e9f3c6dc3fbd36b3321212a50969006e1a9bfd5efb0e0028a92b18bae4df319d
7
+ data.tar.gz: c8578b37ba25133d81e487f5486c0eba9c16712d048b64e0bf2336ef28a0310ce5b93994bc43634c692e671d043a1b97a70124d0ddf616e9330ebf669e0c32ea
data/.gitignore ADDED
@@ -0,0 +1,10 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /doc/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/
9
+ /.idea/
10
+ /.log/
data/.solargraph.yml ADDED
@@ -0,0 +1,11 @@
1
+ require:
2
+ - actioncable
3
+ - actionmailer
4
+ - actionpack
5
+ - actionview
6
+ - activejob
7
+ - activemodel
8
+ - activerecord
9
+ - activestorage
10
+ - activesupport
11
+ - caxlsx
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at liuxiang@ktjr.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [https://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: https://contributor-covenant.org
74
+ [version]: https://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source "https://rubygems.org"
2
+
3
+ # Specify your gem's dependencies in mkwebook.gemspec
4
+ gemspec
5
+
6
+ gem "rake", "~> 12.0"
data/Gemfile.lock ADDED
@@ -0,0 +1,64 @@
1
+ PATH
2
+ remote: .
3
+ specs:
4
+ mkwebook (0.1.0)
5
+ activesupport (>= 6.1.5)
6
+ concurrent-ruby
7
+ ferrum (>= 0.13)
8
+ thor (>= 1.2.1)
9
+
10
+ GEM
11
+ remote: https://rubygems.org/
12
+ specs:
13
+ activesupport (7.0.4)
14
+ concurrent-ruby (~> 1.0, >= 1.0.2)
15
+ i18n (>= 1.6, < 2)
16
+ minitest (>= 5.1)
17
+ tzinfo (~> 2.0)
18
+ addressable (2.8.1)
19
+ public_suffix (>= 2.0.2, < 6.0)
20
+ byebug (11.1.3)
21
+ coderay (1.1.3)
22
+ concurrent-ruby (1.1.10)
23
+ ferrum (0.13)
24
+ addressable (~> 2.5)
25
+ concurrent-ruby (~> 1.1)
26
+ webrick (~> 1.7)
27
+ websocket-driver (>= 0.6, < 0.8)
28
+ i18n (1.12.0)
29
+ concurrent-ruby (~> 1.0)
30
+ method_source (1.0.0)
31
+ minitest (5.16.3)
32
+ pry (0.13.1)
33
+ coderay (~> 1.1)
34
+ method_source (~> 1.0)
35
+ pry-byebug (3.9.0)
36
+ byebug (~> 11.0)
37
+ pry (~> 0.13.0)
38
+ pry-doc (1.3.0)
39
+ pry (~> 0.11)
40
+ yard (~> 0.9.11)
41
+ public_suffix (5.0.0)
42
+ rake (12.3.3)
43
+ thor (1.2.1)
44
+ tzinfo (2.0.5)
45
+ concurrent-ruby (~> 1.0)
46
+ webrick (1.7.0)
47
+ websocket-driver (0.7.5)
48
+ websocket-extensions (>= 0.1.0)
49
+ websocket-extensions (0.1.5)
50
+ yard (0.9.28)
51
+ webrick (~> 1.7.0)
52
+
53
+ PLATFORMS
54
+ ruby
55
+
56
+ DEPENDENCIES
57
+ mkwebook!
58
+ pry
59
+ pry-byebug
60
+ pry-doc
61
+ rake (~> 12.0)
62
+
63
+ BUNDLED WITH
64
+ 2.3.3
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2020 Liu Xiang
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
File without changes
data/Rakefile ADDED
@@ -0,0 +1,2 @@
1
+ require "bundler/gem_tasks"
2
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "mkwebook"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
data/exe/mkwebook ADDED
@@ -0,0 +1,5 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "mkwebook"
4
+
5
+ Mkwebook::Cli.start
@@ -0,0 +1,179 @@
1
+ require 'fileutils'
2
+ require 'mkwebook/config'
3
+ require 'ferrum'
4
+ require 'pry-byebug'
5
+ require 'concurrent'
6
+
7
+ module Mkwebook
8
+ class App
9
+ attr_accessor :config, :browser, :browser_context, :cli_options
10
+
11
+ def initialize(cli_options)
12
+ if cli_options[:work_dir]
13
+ FileUtils.mkdir_p(cli_options[:work_dir]) unless File.directory?(cli_options[:work_dir])
14
+ Dir.chdir(cli_options[:work_dir])
15
+ end
16
+ @cli_options = cli_options
17
+ @config = Mkwebook::Config.new(@cli_options[:pause] || @cli_options[:pause_on_error] || @cli_options[:single_thread])
18
+ end
19
+
20
+ def create_config
21
+ FileUtils.cp(template_config_file, 'mkwebook.yml', verbose: true)
22
+ end
23
+
24
+ def template_config_file
25
+ File.join(Mkwebook::GEM_ROOT, 'template', 'mkwebook.yml')
26
+ end
27
+
28
+ def make
29
+ make_index
30
+ make_pages
31
+ end
32
+
33
+ def prepare_browser
34
+ @browser = Ferrum::Browser.new(browser_options)
35
+ @browser_context = browser.contexts.create
36
+ end
37
+
38
+ def make_index
39
+ prepare_browser
40
+ index_page = @browser_context.create_page
41
+ index_page.go_to(@config[:index_page][:url])
42
+ index_page.network.wait_for_idle(timeout: 10) rescue nil
43
+ modifier = @config[:index_page][:modifier]
44
+ if modifier && File.file?(modifier)
45
+ index_page.execute(File.read(modifier))
46
+ elsif modifier.present?
47
+ index_page.execute(modifier)
48
+ end
49
+ index_elements = index_page.css(@config[:index_page][:selector])
50
+
51
+ @page_urls = index_elements.flat_map do |element|
52
+ url = element.css(@config[:index_page][:link_selector]).map { |a| a.evaluate('this.href') }
53
+ element.css(@config[:index_page][:link_selector]).each do |a|
54
+ u = a.evaluate('this.href').normalize_uri('.html').relative_path_from(@config[:index_page][:output])
55
+ a.evaluate("this.href = '#{u}'")
56
+ end
57
+ url
58
+ end.uniq
59
+
60
+ @page_urls.select! do |url|
61
+ @config[:pages].any? { |page| url =~ Regexp.new(page[:url_pattern]) }
62
+ end
63
+
64
+ @page_urls = @page_urls[0, @cli_options[:limit]] if @cli_options[:limit]
65
+
66
+
67
+ @config[:index_page][:title].try do |title|
68
+ index_page.execute("document.title = '#{title}'")
69
+ end
70
+
71
+ index_page.execute <<-JS
72
+ for (var e of document.querySelectorAll('[integrity]')) {
73
+ e.removeAttribute('integrity');
74
+ }
75
+ JS
76
+
77
+ binding.pry if @cli_options[:pause] || @cli_options[:pause_on_index]
78
+ download_assets(index_page, @config[:index_page][:assets] || [], @config[:index_page][:output])
79
+
80
+ index_elements.map do |element|
81
+ element.evaluate('this.outerHTML')
82
+ end.join("\n").tap do |html|
83
+ File.write(@config[:index_page][:output], html)
84
+ end
85
+ rescue Ferrum::Error => e
86
+ binding.pry
87
+ end
88
+
89
+ def make_pages
90
+
91
+ pool = Concurrent::FixedThreadPool.new(@config[:concurrency])
92
+
93
+ @page_urls.each do |url|
94
+ page_config = @config[:pages].find { |page| url =~ Regexp.new(page[:url_pattern]) }
95
+ next unless page_config
96
+
97
+ pool.post do
98
+ page = @browser_context.create_page
99
+
100
+ begin
101
+ output = url.normalize_file_path('.html')
102
+ page.go_to(url)
103
+ page.network.wait_for_idle(timeout: 10) rescue nil
104
+ modifier = page_config[:modifier]
105
+ if modifier && File.file?(modifier)
106
+ page.execute(File.read(modifier))
107
+ elsif modifier.present?
108
+ page.execute(modifier)
109
+ end
110
+ page_elements = page.css(page_config[:selector])
111
+
112
+ @config[:index_page][:title].try do |title|
113
+ page.execute("document.title = '#{title}'")
114
+ end
115
+
116
+ page.execute <<-JS
117
+ for (var e of document.querySelectorAll('[integrity]')) {
118
+ e.removeAttribute('integrity');
119
+ }
120
+ JS
121
+
122
+ binding.pry if @cli_options[:pause]
123
+ download_assets(page, page_config[:assets] || [])
124
+
125
+ page_elements.map do |element|
126
+ element.css('a').each do |a|
127
+ u = a.evaluate('this.href')
128
+ next unless @page_urls.include?(u)
129
+
130
+ u = u.normalize_uri('.html').relative_path_from(url.normalize_uri('.html'))
131
+ a.evaluate("this.href = '#{u}'")
132
+ end
133
+ element.evaluate('this.outerHTML')
134
+ end.join("\n").tap do |html|
135
+ FileUtils.mkdir_p(File.dirname(output))
136
+ File.write(output, html)
137
+ end
138
+ rescue Ferrum::Error => e
139
+ $stderr.puts e.message
140
+ $stderr.puts e.backtrace
141
+ binding.pry if @cli_options[:pause_on_error]
142
+ ensure
143
+ page.close
144
+ end
145
+ end
146
+
147
+ end
148
+
149
+ pool.shutdown
150
+ pool.wait_for_termination
151
+ end
152
+
153
+ def download_assets(page, assets_config, page_uri = nil)
154
+ assets_config.each do |asset_config|
155
+ asset_attr = asset_config[:attr]
156
+ asset_selector = asset_config[:selector]
157
+ page.css(asset_selector).each do |element|
158
+ asset_url = element.evaluate("this.#{asset_attr}")
159
+ next if asset_url.start_with?('data:')
160
+ asset_file = asset_url.normalize_file_path
161
+ FileUtils.mkdir_p(File.dirname(asset_file))
162
+ page.network.traffic.find { |t| t.url == asset_url }.try do |traffic|
163
+ traffic&.response&.body.try do |body|
164
+ File.write(asset_file, body)
165
+ end
166
+ end
167
+ u = asset_url.normalize_uri.relative_path_from((page_uri || page.url.normalize_uri))
168
+ element.evaluate("this.#{asset_attr} = '#{u}'")
169
+ end
170
+ end
171
+ end
172
+
173
+ private
174
+
175
+ def browser_options
176
+ @config[:browser]
177
+ end
178
+ end
179
+ end
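`make_index` keeps only the collected links that match some `url-pattern` (then applies `--limit`), and `make_pages` fans the surviving URLs out over a `Concurrent::FixedThreadPool`, using the first matching `pages` entry for each URL. The following is a stdlib-only sketch of that flow, assuming Ruby 2.3+ for `Queue#close`. It swaps the concurrent-ruby pool for a closed `Queue` drained by plain threads. `select_page_urls`, `process_pages` and the handler block (a stand-in for the Ferrum page work) are hypothetical names, not part of the gem.

```ruby
# Hypothetical page configs, shaped like the parsed `pages:` entries.
PAGE_CONFIGS = [
  { url_pattern: '/guides/', selector: 'html' },
  { url_pattern: '.*', selector: 'body' }
].freeze

# Keep the unique URLs that match at least one url_pattern, then apply an
# optional limit (the select!/limit steps at the end of make_index).
def select_page_urls(urls, page_configs, limit = nil)
  selected = urls.uniq.select do |url|
    page_configs.any? { |page| url =~ Regexp.new(page[:url_pattern]) }
  end
  limit ? selected.first(limit) : selected
end

# Run `handler` for every URL on a fixed number of threads. The first matching
# page config is used, and an exception is recorded per URL instead of
# stopping the whole run (like the per-page rescue in make_pages).
def process_pages(urls, page_configs, concurrency, &handler)
  jobs = Queue.new
  urls.each { |url| jobs << url }
  jobs.close # pop returns nil once the queue is closed and drained

  results = {}
  lock = Mutex.new
  workers = Array.new(concurrency) do
    Thread.new do
      while (url = jobs.pop)
        config = page_configs.find { |page| url =~ Regexp.new(page[:url_pattern]) }
        next unless config

        outcome =
          begin
            handler.call(url, config)
          rescue StandardError => e
            e
          end
        lock.synchronize { results[url] = outcome }
      end
    end
  end
  workers.each(&:join)
  results
end
```

Closing the queue lets every worker exit on its own once the work runs out, which plays the role of `pool.shutdown` plus `pool.wait_for_termination`. Because the lookup uses the first match, more specific `url-pattern` entries should come before a catch-all `'.*'`.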
@@ -0,0 +1,41 @@
1
+ require 'thor'
2
+
3
+ module Mkwebook
4
+ class Cli < ::Thor
5
+ class << self
6
+ def main(args)
7
+ start(args)
8
+ end
9
+ end
10
+
11
+ class_option :work_dir, :type => :string, :aliases => '-d', :default => '.', :desc => 'Working directory'
12
+ class_option :pause_on_error, :type => :boolean, :aliases => '-e', :default => false, :desc => 'Pause on error, this option will force concurrency off'
13
+ desc 'init', 'Create config file in current directory'
14
+ def init
15
+ Mkwebook::App.new(options).create_config
16
+ end
17
+
18
+ option :pause, :type => :boolean, :aliases => '-p', :desc => 'Pause after processing index page'
19
+ desc 'make_index', 'Download and process index page'
20
+ def make_index
21
+ Mkwebook::App.new(options).make_index
22
+ end
23
+
24
+ option :limit, :type => :numeric, :aliases => '-l', :desc => 'Limit the number of pages (useful for debugging)'
25
+ option :pause, :type => :boolean, :aliases => '-P', :desc => 'Pause on each page (index included) before its assets are downloaded, this option will force concurrency off'
26
+ option :pause_on_index, :type => :boolean, :aliases => '-p', :desc => 'Pause after processing index page'
27
+ option :single_thread, :type => :boolean, :aliases => '-s', :desc => 'Force concurrency off'
28
+ desc 'make', 'Download and process html files'
29
+ def make
30
+ Mkwebook::App.new(options).make
31
+ end
32
+
33
+ desc 'version', 'Print version'
34
+ def version
35
+ puts Mkwebook::VERSION
36
+ end
37
+
38
+ no_commands do
39
+ end
40
+ end
41
+ end
@@ -0,0 +1,4 @@
1
+ require 'mkwebook/commands/info'
2
+
3
+ module Mkwebook::Commands
4
+ end
@@ -0,0 +1,244 @@
1
+ require 'active_support/concern'
2
+
3
+ module Mkwebook
4
+ module Concerns
5
+ module GlobalDataDefinition
6
+ extend ActiveSupport::Concern
7
+
8
+ included do
9
+
10
+ # Example:
11
+ #
12
+ # create_table :post, id: false, primary_key: :id do |t|
13
+ # t.column :id, :bigint, precison: 19, comment: 'ID'
14
+ # t.column :name, :string, comment: '名称'
15
+ # t.column :gmt_created, :datetime, comment: '创建时间'
16
+ # t.column :gmt_modified, :datetime, comment: '最后修改时间'
17
+ # end
18
+ #
19
+ # Creates a new table with the name +table_name+. +table_name+ may either
20
+ # be a String or a Symbol.
21
+ #
22
+ # There are two ways to work with #create_table. You can use the block
23
+ # form or the regular form, like this:
24
+ #
25
+ # === Block form
26
+ #
27
+ # # create_table() passes a TableDefinition object to the block.
28
+ # # This form will not only create the table, but also columns for the
29
+ # # table.
30
+ #
31
+ # create_table(:suppliers) do |t|
32
+ # t.column :name, :string, limit: 60
33
+ # # Other fields here
34
+ # end
35
+ #
36
+ # === Block form, with shorthand
37
+ #
38
+ # # You can also use the column types as method calls, rather than calling the column method.
39
+ # create_table(:suppliers) do |t|
40
+ # t.string :name, limit: 60
41
+ # # Other fields here
42
+ # end
43
+ #
44
+ # === Regular form
45
+ #
46
+ # # Creates a table called 'suppliers' with no columns.
47
+ # create_table(:suppliers)
48
+ # # Add a column to 'suppliers'.
49
+ # add_column(:suppliers, :name, :string, {limit: 60})
50
+ #
51
+ # The +options+ hash can include the following keys:
52
+ # [<tt>:id</tt>]
53
+ # Whether to automatically add a primary key column. Defaults to true.
54
+ # Join tables for {ActiveRecord::Base.has_and_belongs_to_many}[rdoc-ref:Associations::ClassMethods#has_and_belongs_to_many] should set it to false.
55
+ #
56
+ # A Symbol can be used to specify the type of the generated primary key column.
57
+ # [<tt>:primary_key</tt>]
58
+ # The name of the primary key, if one is to be added automatically.
59
+ # Defaults to +id+. If <tt>:id</tt> is false, then this option is ignored.
60
+ #
61
+ # If an array is passed, a composite primary key will be created.
62
+ #
63
+ # Note that Active Record models will automatically detect their
64
+ # primary key. This can be avoided by using
65
+ # {self.primary_key=}[rdoc-ref:AttributeMethods::PrimaryKey::ClassMethods#primary_key=] on the model
66
+ # to define the key explicitly.
67
+ #
68
+ # [<tt>:options</tt>]
69
+ # Any extra options you want appended to the table definition.
70
+ # [<tt>:temporary</tt>]
71
+ # Make a temporary table.
72
+ # [<tt>:force</tt>]
73
+ # Set to true to drop the table before creating it.
74
+ # Set to +:cascade+ to drop dependent objects as well.
75
+ # Defaults to false.
76
+ # [<tt>:if_not_exists</tt>]
77
+ # Set to true to avoid raising an error when the table already exists.
78
+ # Defaults to false.
79
+ # [<tt>:as</tt>]
80
+ # SQL to use to generate the table. When this option is used, the block is
81
+ # ignored, as are the <tt>:id</tt> and <tt>:primary_key</tt> options.
82
+ #
83
+ # ====== Add a backend specific option to the generated SQL (MySQL)
84
+ #
85
+ # create_table(:suppliers, options: 'ENGINE=InnoDB DEFAULT CHARSET=utf8mb4')
86
+ #
87
+ # generates:
88
+ #
89
+ # CREATE TABLE suppliers (
90
+ # id bigint auto_increment PRIMARY KEY
91
+ # ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
92
+ #
93
+ # ====== Rename the primary key column
94
+ #
95
+ # create_table(:objects, primary_key: 'guid') do |t|
96
+ # t.column :name, :string, limit: 80
97
+ # end
98
+ #
99
+ # generates:
100
+ #
101
+ # CREATE TABLE objects (
102
+ # guid bigint auto_increment PRIMARY KEY,
103
+ # name varchar(80)
104
+ # )
105
+ #
106
+ # ====== Change the primary key column type
107
+ #
108
+ # create_table(:tags, id: :string) do |t|
109
+ # t.column :label, :string
110
+ # end
111
+ #
112
+ # generates:
113
+ #
114
+ # CREATE TABLE tags (
115
+ # id varchar PRIMARY KEY,
116
+ # label varchar
117
+ # )
118
+ #
119
+ # ====== Create a composite primary key
120
+ #
121
+ # create_table(:orders, primary_key: [:product_id, :client_id]) do |t|
122
+ # t.belongs_to :product
123
+ # t.belongs_to :client
124
+ # end
125
+ #
126
+ # generates:
127
+ #
128
+ # CREATE TABLE orders (
129
+ # product_id bigint NOT NULL,
130
+ # client_id bigint NOT NULL
131
+ # );
132
+ #
133
+ # ALTER TABLE ONLY "orders"
134
+ # ADD CONSTRAINT orders_pkey PRIMARY KEY (product_id, client_id);
135
+ #
136
+ # ====== Do not add a primary key column
137
+ #
138
+ # create_table(:categories_suppliers, id: false) do |t|
139
+ # t.column :category_id, :bigint
140
+ # t.column :supplier_id, :bigint
141
+ # end
142
+ #
143
+ # generates:
144
+ #
145
+ # CREATE TABLE categories_suppliers (
146
+ # category_id bigint,
147
+ # supplier_id bigint
148
+ # )
149
+ #
150
+ # ====== Create a temporary table based on a query
151
+ #
152
+ # create_table(:long_query, temporary: true,
153
+ # as: "SELECT * FROM orders INNER JOIN line_items ON order_id=orders.id")
154
+ #
155
+ # generates:
156
+ #
157
+ # CREATE TEMPORARY TABLE long_query AS
158
+ # SELECT * FROM orders INNER JOIN line_items ON order_id=orders.id
159
+ #
160
+ # See also TableDefinition#column for details on how to create columns.
161
+ def create_table(table_name, **options, &blk)
162
+ ActiveRecord::Base.connection.create_table(table_name, **options, &blk)
163
+ end
164
+
165
+ # Creates a new join table with the name created using the lexical order of the first two
166
+ # arguments. These arguments can be a String or a Symbol.
167
+ #
168
+ # # Creates a table called 'assemblies_parts' with no id.
169
+ # create_join_table(:assemblies, :parts)
170
+ #
171
+ # You can pass an +options+ hash which can include the following keys:
172
+ # [<tt>:table_name</tt>]
173
+ # Sets the table name, overriding the default.
174
+ # [<tt>:column_options</tt>]
175
+ # Any extra options you want appended to the columns definition.
176
+ # [<tt>:options</tt>]
177
+ # Any extra options you want appended to the table definition.
178
+ # [<tt>:temporary</tt>]
179
+ # Make a temporary table.
180
+ # [<tt>:force</tt>]
181
+ # Set to true to drop the table before creating it.
182
+ # Defaults to false.
183
+ #
184
+ # Note that #create_join_table does not create any indices by default; you can use
185
+ # its block form to do so yourself:
186
+ #
187
+ # create_join_table :products, :categories do |t|
188
+ # t.index :product_id
189
+ # t.index :category_id
190
+ # end
191
+ #
192
+ # ====== Add a backend specific option to the generated SQL (MySQL)
193
+ #
194
+ # create_join_table(:assemblies, :parts, options: 'ENGINE=InnoDB DEFAULT CHARSET=utf8')
195
+ #
196
+ # generates:
197
+ #
198
+ # CREATE TABLE assemblies_parts (
199
+ # assembly_id bigint NOT NULL,
200
+ # part_id bigint NOT NULL,
201
+ # ) ENGINE=InnoDB DEFAULT CHARSET=utf8
202
+ #
203
+ def create_join_table(table_1, table_2, column_options: {}, **options, &blk)
204
+ ActiveRecord::Base.connection.create_join_table(table_1, table_2, column_options: column_options, **options, &blk)
205
+ end
206
+
207
+ # Drops a table from the database.
208
+ #
209
+ # [<tt>:force</tt>]
210
+ # Set to +:cascade+ to drop dependent objects as well.
211
+ # Defaults to false.
212
+ # [<tt>:if_exists</tt>]
213
+ # Set to +true+ to only drop the table if it exists.
214
+ # Defaults to false.
215
+ #
216
+ # Although this command ignores most +options+ and the block if one is given,
217
+ # it can be helpful to provide these in a migration's +change+ method so it can be reverted.
218
+ # In that case, +options+ and the block will be used by #create_table.
219
+ def drop_table(table_name, **options)
220
+ ActiveRecord::Base.connection.drop_table(table_name, **options)
221
+ end
222
+
223
+ # Drops the join table specified by the given arguments.
224
+ # See #create_join_table for details.
225
+ #
226
+ # Although this command ignores the block if one is given, it can be helpful
227
+ # to provide one in a migration's +change+ method so it can be reverted.
228
+ # In that case, the block will be used by #create_join_table.
229
+ def drop_join_table(table_1, table_2, **options)
230
+ ActiveRecord::Base.connection.drop_join_table(table_1, table_2, **options)
231
+ end
232
+
233
+ # Renames a table.
234
+ #
235
+ # rename_table('octopuses', 'octopi')
236
+ #
237
+ def rename_table(table_name, new_name)
238
+ ActiveRecord::Base.connection.rename_table(table_name, new_name)
239
+ end
240
+
241
+ end
242
+ end
243
+ end
244
+ end
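Active Record's `connection.create_join_table` takes `column_options:` as a keyword argument, and Ruby 3 no longer converts a trailing positional hash into keywords, so a wrapper has to forward it by name (and pass `&blk` through for the block form described above). Below is a self-contained sketch. `FakeConnection` is a hypothetical stand-in for `ActiveRecord::Base.connection`, and the two wrapper names are illustrative only.

```ruby
# Hypothetical stand-in for ActiveRecord::Base.connection: it takes
# column_options as a keyword argument and accepts an optional block.
class FakeConnection
  def create_join_table(table_1, table_2, column_options: {}, **options, &blk)
    { tables: [table_1, table_2], column_options: column_options,
      options: options, block_given: !blk.nil? }
  end
end

CONNECTION = FakeConnection.new

# Passes the hash positionally: on Ruby 3 this raises ArgumentError (given 3, expected 2).
def create_join_table_positional(table_1, table_2, column_options: {}, **options)
  CONNECTION.create_join_table(table_1, table_2, column_options, **options)
end

# Forwards column_options by name and passes the block through.
def create_join_table_forwarding(table_1, table_2, column_options: {}, **options, &blk)
  CONNECTION.create_join_table(table_1, table_2, column_options: column_options, **options, &blk)
end
```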
@@ -0,0 +1 @@
1
+ require 'mkwebook/concerns/global_data_definition'
@@ -0,0 +1,49 @@
1
+ require 'delegate'
2
+ require 'etc'
3
+
4
+ module Mkwebook
5
+ class Config < SimpleDelegator
6
+ attr_accessor :file, :config
7
+
8
+ def initialize(force_concurrency_off)
9
+ super(nil)
10
+ @file = find_mkwebook_yaml
11
+ if @file && File.exist?(@file)
12
+ @config = load(@file, force_concurrency_off)
13
+ __setobj__(@config)
14
+ else
15
+ __setobj__(self)
16
+ end
17
+ end
18
+
19
+ def load(config_file, force_concurrency_off)
20
+ default_config = {
21
+ 'browser' => {
22
+ 'headless' => true
23
+ },
24
+ 'concurrency' => 1
25
+ }
26
+ config = YAML.load_file(config_file)
27
+ config = default_config.deep_merge(config).deep_transform_keys! { |k| k.to_s.underscore.to_sym }
28
+ config[:concurrency] = 1 if force_concurrency_off
29
+ config
30
+ end
31
+
32
+ def concurrent?
33
+ config[:concurrency].to_i > 1
34
+ end
35
+
36
+ def find_mkwebook_yaml
37
+ dir = Dir.pwd
38
+ until dir == File.dirname(dir)
39
+ file = File.join(dir, 'mkwebook.yaml')
40
+ return file if File.exist?(file)
41
+
42
+ file = File.join(dir, 'mkwebook.yml')
43
+ return file if File.exist?(file)
44
+
45
+ dir = File.dirname(dir)
46
+ end
47
+ end
48
+ end
49
+ end
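`Config#load` merges the YAML file over a small default hash and normalizes keys with ActiveSupport (`'index-page'` becomes `:index_page`), while `find_mkwebook_yaml` searches the working directory and then its parents. The sketch below covers both steps using only the standard library. It assumes `YAML.safe_load` on a string in place of `YAML.load_file`, a hand-written `deep_merge`, and key normalization that only handles hyphens (ActiveSupport's `underscore` also rewrites CamelCase). `load_config` and `find_config` are hypothetical names. The defaults use string keys on purpose: the literal `'concurrency': 1` produces the symbol `:concurrency`, which `deep_merge` would keep as a key separate from the YAML's `'concurrency'`.

```ruby
require 'yaml'
require 'pathname'

# String keys, so they merge with the string keys YAML produces.
DEFAULT_CONFIG = { 'browser' => { 'headless' => true }, 'concurrency' => 1 }.freeze

# Recursive merge: nested hashes are merged, any other value from `override` wins.
def deep_merge(base, override)
  base.merge(override) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

# 'index-page' -> :index_page, 'url-pattern' -> :url_pattern, including inside arrays.
def normalize_keys(value)
  case value
  when Hash then value.to_h { |k, v| [k.to_s.tr('-', '_').to_sym, normalize_keys(v)] }
  when Array then value.map { |v| normalize_keys(v) }
  else value
  end
end

def load_config(yaml_text, force_concurrency_off: false)
  config = normalize_keys(deep_merge(DEFAULT_CONFIG, YAML.safe_load(yaml_text) || {}))
  config[:concurrency] = 1 if force_concurrency_off
  config
end

# Look for mkwebook.yaml, then mkwebook.yml, in start_dir and each parent.
# Pathname#ascend ends at the filesystem root on every platform.
def find_config(start_dir = Dir.pwd)
  Pathname.new(start_dir).expand_path.ascend do |dir|
    %w[mkwebook.yaml mkwebook.yml].each do |name|
      candidate = dir.join(name)
      return candidate.to_s if candidate.file?
    end
  end
  nil
end
```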
@@ -0,0 +1,44 @@
1
+ class String
2
+ def p
3
+ puts self
4
+ end
5
+
6
+ def expa
7
+ File.expand_path(self)
8
+ end
9
+
10
+ def f
11
+ expa
12
+ end
13
+
14
+ def normalize_file_path(force_extname = nil)
15
+ uri = URI.parse(self)
16
+ file_path = uri.path[1..]
17
+ extname = File.extname(file_path)
18
+ basename = File.basename(file_path, extname)
19
+ origin = "#{uri.scheme.try { |s| s + '_' }}#{uri.host}#{uri.port.try { |p| '_' + p.to_s }}"
20
+ basename += "_#{Digest::MD5.hexdigest(uri.query)}" if uri.query.present?
21
+ extname = force_extname if force_extname && extname.empty?
22
+ File.join(origin, File.dirname(file_path), basename + extname)
23
+ end
24
+
25
+ def normalize_uri(force_extname = nil)
26
+ uri = URI.parse(self)
27
+ file_path = uri.path[1..]
28
+ extname = File.extname(file_path)
29
+ basename = File.basename(file_path, extname)
30
+ basename += "_#{Digest::MD5.hexdigest(uri.query)}" if uri.query.present?
31
+ origin = "#{uri.scheme.try { |s| s + '_' }}#{uri.host}#{uri.port.try { |p| '_' + p.to_s }}"
32
+ extname = force_extname if force_extname && extname.empty?
33
+ file_path = File.join(origin, File.dirname(file_path), basename + extname)
34
+ if uri.fragment.present?
35
+ file_path += "##{uri.fragment}"
36
+ else
37
+ file_path
38
+ end
39
+ end
40
+
41
+ def relative_path_from(base)
42
+ Pathname.new(self).relative_path_from(Pathname.new(base)).to_s.gsub(%r{^\.\./}, '')
43
+ end
44
+ end
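These String extensions map each URL to a file path under a `<scheme>_<host>_<port>` directory, folding any query string into the file name as an MD5 suffix, and they compute relative links between the generated files. Below is a stdlib-only sketch of the same mapping, written as plain methods instead of String monkey patches and without ActiveSupport's `try`/`present?`. The names `url_to_file_path`, `url_to_link_path` and `relative_link` are hypothetical, and the output is assumed to match the extensions only for ordinary http(s) URLs with a non-empty path.

```ruby
require 'uri'
require 'pathname'
require 'digest/md5'

# Map a URL to "<scheme>_<host>_<port>/<dir>/<name><ext>". A query string is
# folded into the name as an MD5 suffix; force_extname is applied only when
# the URL path has no extension of its own.
def url_to_file_path(url, force_extname = nil)
  uri = URI.parse(url)
  path = uri.path.sub(%r{\A/}, '')
  extname = File.extname(path)
  basename = File.basename(path, extname)
  origin = [uri.scheme, uri.host, uri.port].compact.join('_')
  basename += "_#{Digest::MD5.hexdigest(uri.query)}" if uri.query && !uri.query.empty?
  extname = force_extname if force_extname && extname.empty?
  File.join(origin, File.dirname(path), basename + extname)
end

# Same as url_to_file_path, but keeps the #fragment so in-page anchors survive.
def url_to_link_path(url, force_extname = nil)
  fragment = URI.parse(url).fragment
  path = url_to_file_path(url, force_extname)
  fragment && !fragment.empty? ? "#{path}##{fragment}" : path
end

# Relative link from one generated file to another. Pathname treats the base
# file as a directory, so exactly one leading "../" is dropped.
def relative_link(target_path, base_file)
  Pathname.new(target_path).relative_path_from(Pathname.new(base_file)).to_s.sub(%r{\A\.\./}, '')
end
```

For example, `https://clojure.org/guides/repl/introduction` with `'.html'` maps to `https_clojure.org_443/guides/repl/introduction.html`, and a sibling page such as `launching.html` is then linked from it simply as `launching.html`.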
@@ -0,0 +1 @@
1
+ require 'mkwebook/ext/string'
@@ -0,0 +1,3 @@
1
+ module Mkwebook
2
+ VERSION = "0.1.0"
3
+ end
data/lib/mkwebook.rb ADDED
@@ -0,0 +1,9 @@
1
+ require "mkwebook/version"
2
+ require 'mkwebook/ext'
3
+ require "mkwebook/app"
4
+ require "mkwebook/cli"
5
+ require 'active_support/all'
6
+
7
+ module Mkwebook
8
+ GEM_ROOT = __dir__
9
+ end
@@ -0,0 +1,42 @@
1
+ browser:
2
+ headless: false
3
+ window_size: [1440, 1024]
4
+ timeout: 30
5
+
6
+ concurrency: 16
7
+
8
+ index-page:
9
+ url: https://clojure.org/guides/repl/introduction
10
+ modifier: |
11
+ document.body.innerHTML = document.querySelector('.clj-section-nav-container').outerHTML;
12
+ document.querySelector('.clj-section-nav-container').style.width = '100%';
13
+ document.body.style.backgroundColor = 'white';
14
+
15
+ selector: "html"
16
+ output: "index.html"
17
+ link-selector: "a:not([href='../guides'])"
18
+ assets:
19
+ - selector: "link[rel=stylesheet]"
20
+ attr: href
21
+ - selector: "script[src]"
22
+ attr: src
23
+
24
+
25
+ pages:
26
+ - url-pattern: '.*'
27
+ modifier: |
28
+ document.body.innerHTML = document.querySelector('.clj-content-container').outerHTML;
29
+ document.querySelector('.clj-content-container').style.width = '100%';
30
+ document.body.style.backgroundColor = 'white';
31
+ var style = document.createElement('style');
32
+ style.innerHTML = '.clj-content-container { margin-left: 0; }';
33
+ document.body.appendChild(style);
34
+ selector: html
35
+ assets:
36
+ - selector: img
37
+ attr: src
38
+ - selector: "link[rel=stylesheet]"
39
+ attr: href
40
+ - selector: "script[src]"
41
+ attr: src
42
+
@@ -0,0 +1,99 @@
1
+ browser:
2
+ headless: false
3
+ window_size: [1440, 1024]
4
+ timeout: 30
5
+
6
+ concurrency: 16
7
+
8
+ index-page:
9
+ url: https://python-poetry.org/docs
10
+ title: Poetry Documentation
11
+ modifier: |
12
+ document.querySelector('button[data-controller="mode-switch"]').click();
13
+ document.querySelector('#TableOfContents').remove();
14
+ document.body.innerHTML = document.querySelector('#documentation-menu').outerHTML;
15
+ document.querySelector('#documentation-menu').style.width = '100%';
16
+
17
+ selector: "html"
18
+ output: "index.html"
19
+ link-selector: a
20
+ assets:
21
+ - selector: "link[rel=stylesheet]"
22
+ attr: href
23
+ - selector: "script[src]"
24
+ attr: src
25
+
26
+
27
+ pages:
28
+ - url-pattern: '.*'
29
+ modifier: |
30
+ for (var e of document.querySelectorAll('#content > div > div > div')) {
31
+ e.remove();
32
+ }
33
+ document.querySelector('#content > div').classList.remove('max-w-7xl');
34
+ document.querySelector('#content > div').classList.remove('mt-48');
35
+ document.querySelector('#content > div').classList.remove('md:mt-64');
36
+ document.querySelector('[data-controller="menu"]').remove();
37
+ document.querySelector('footer').remove();
38
+ document.querySelector('#docs').classList.remove('lg:px-12');
39
+ for (var e of document.querySelectorAll('.clipboard-button')) {e.remove();}
40
+ var style = document.createElement('style');
41
+ style.innerHTML = `
42
+ #content main .highlight { width:100%;}
43
+ #content main .admonition {margin-left: 80px; width: 100%;}
44
+ .highlight pre {
45
+ --tw-bg-opacity: 1;
46
+ background-color: rgb(255 255 255/ var(--tw-bg-opacity));
47
+ border-radius: .5rem;
48
+ font-family: Jetbrains Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,monospace;
49
+ line-height: 1.5rem;
50
+ overflow: auto;
51
+ padding: 1.75rem 2rem
52
+ }
53
+ .browser-window.dark .highlight pre code,.highlight pre code {
54
+ --tw-text-opacity: 1!important;
55
+ background-color: transparent!important;
56
+ color: rgb(0 0 0/var(--tw-text-opacity))!important;
57
+ }
58
+ body {
59
+ color: black!important;
60
+ }
61
+ #content main .admonition .content p>code, #content main .admonition .content>code, #content main li>code, #content main p>a>code, #content main p>code, #content main table:not(.lntable) td code {
62
+ color: #000!important;
63
+ }
64
+
65
+ #content main .admonition.note:before {
66
+ content: "";
67
+ }
68
+
69
+ #content main .admonition {
70
+ margin-left: auto;
71
+ width: 100%;
72
+ }
73
+
74
+ `;
75
+
76
+ for (var e of document.querySelectorAll('pre > code > span')) {
77
+ e.style.color = 'black';
78
+ }
79
+
80
+ document.body.appendChild(style);
81
+ var script = document.createElement('script');
82
+ script.type = 'text/javascript';
83
+ script.innerHTML = `
84
+ document.addEventListener('DOMContentLoaded', function() {
85
+ document.querySelector('html').removeAttribute('class');
86
+ document.querySelector('html').setAttribute('class', 'light');
87
+ });
88
+ `
89
+ document.body.appendChild(script);
90
+
91
+ selector: html
92
+ assets:
93
+ - selector: img
94
+ attr: src
95
+ - selector: "link[rel=stylesheet]"
96
+ attr: href
97
+ - selector: "script[src]"
98
+ attr: src
99
+
@@ -0,0 +1,44 @@
1
+ browser: # browser settings
2
+ headless: false # headless mode
3
+ window_size: [1440, 1024] # browser window size
4
+ timeout: 30 # timeout for waiting for page loading
5
+ # Any options accepted by Ferrum::Browser.new are allowed here
6
+
7
+ concurrency: 16 # number of concurrent threads; no concurrency by default
8
+
9
+ index-page: # index page settings
10
+ url: https://clojure.org/guides/repl/introduction # URL of index page
11
+ title: Clojure Guides # title of the book; the page's title is used if not set
12
+ modifier: | # JavaScript code to modify the page
13
+ document.body.innerHTML = document.querySelector('.clj-section-nav-container').outerHTML;
14
+ document.querySelector('.clj-section-nav-container').style.width = '100%';
15
+ document.body.style.backgroundColor = 'white';
16
+
17
+ selector: "html" # CSS selector for the content to be saved
18
+ output: "index.html" # output file name
19
+ link-selector: "a:not([href='../guides'])" # CSS selector for links to content pages
20
+ assets: # assets to be downloaded
21
+ - selector: "link[rel=stylesheet]" # CSS selector for assets
22
+ attr: href # attribute name for the asset URL
23
+ - selector: "script[src]"
24
+ attr: src
25
+
26
+
27
+ pages: # settings for content pages
28
+ - url-pattern: '.*' # URL pattern for content pages; only pages whose URL matches this pattern are processed
29
+ modifier: | # JavaScript code to modify the page
30
+ document.body.innerHTML = document.querySelector('.clj-content-container').outerHTML;
31
+ document.querySelector('.clj-content-container').style.width = '100%';
32
+ document.body.style.backgroundColor = 'white';
33
+ var style = document.createElement('style');
34
+ style.innerHTML = '.clj-content-container { margin-left: 0; }';
35
+ document.body.appendChild(style);
36
+ selector: html # CSS selector for the content to be saved
37
+ assets: # assets to be downloaded
38
+ - selector: img # CSS selector for assets
39
+ attr: src # attribute name for the asset URL
40
+ - selector: "link[rel=stylesheet]"
41
+ attr: href
42
+ - selector: "script[src]"
43
+ attr: src
44
+
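Under `pages`, each entry's `url-pattern` acts as a filter: only pages whose URL matches it are processed. The Ruby sketch below shows that selection step. It assumes the pattern is interpreted as a Ruby regular expression and that the first matching entry wins. The template states neither, so both are assumptions, and `page_settings_for` is a hypothetical helper.

```ruby
require 'yaml'

# A trimmed-down config in the same shape as the `pages` section above.
CONFIG = <<~YAML
  pages:
    - url-pattern: '/guides/repl/'
      selector: html
    - url-pattern: '.*'
      selector: body
YAML

# Hypothetical helper: return the first `pages` entry whose `url-pattern`
# matches the URL, or nil if none does (the page would then be skipped).
def page_settings_for(config, url)
  Array(config['pages']).find do |page|
    Regexp.new(page.fetch('url-pattern')).match?(url)
  end
end

config = YAML.safe_load(CONFIG)
puts page_settings_for(config, 'https://clojure.org/guides/repl/introduction')['selector']
```

Putting a catch-all `'.*'` entry last, as the template does, makes every page fall back to a default set of settings.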
data/mkwebook.gemspec ADDED
@@ -0,0 +1,32 @@
1
+ require_relative 'lib/mkwebook/version'
2
+
3
+ Gem::Specification.new do |spec|
4
+ spec.name = 'mkwebook'
5
+ spec.version = Mkwebook::VERSION
6
+ spec.authors = ['Liu Xiang']
7
+ spec.email = ['liuxiang921@gmail.com']
8
+
9
+ spec.summary = %(A tool to download web pages and make them Calibre-ready.)
10
+ spec.description = %(A tool to download web pages and make them Calibre-ready.)
11
+ spec.homepage = 'https://github.com/lululau/mkwebook'
12
+ spec.license = 'MIT'
13
+ spec.required_ruby_version = Gem::Requirement.new('>= 2.6.0')
14
+
15
+ # Specify which files should be added to the gem when it is released.
16
+ # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
17
+ spec.files = Dir.chdir(File.expand_path(__dir__)) do
18
+ `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
19
+ end
20
+ spec.bindir = 'exe'
21
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
22
+ spec.require_paths = ['lib']
23
+
24
+ spec.add_dependency 'activesupport', '>= 6.1.5'
25
+ spec.add_dependency 'concurrent-ruby'
26
+ spec.add_dependency 'ferrum', '>= 0.13'
27
+ spec.add_dependency 'thor', '>= 1.2.1'
28
+
29
+ spec.add_development_dependency 'pry'
30
+ spec.add_development_dependency 'pry-byebug'
31
+ spec.add_development_dependency 'pry-doc'
32
+ end
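The `spec.files` and `spec.executables` lines in the gemspec filter the output of `git ls-files -z`. They split the NUL-separated list, drop anything under `test/`, `spec/` or `features/`, and treat every remaining file under `exe/` as an executable. The sketch below applies the same two expressions to a hard-coded sample list. The list is hypothetical and stands in for real `git` output.

```ruby
# Stand-in for the output of `git ls-files -z`: paths separated by NUL bytes.
git_output = ['exe/mkwebook', 'lib/mkwebook.rb', 'spec/app_spec.rb',
              'test/helper.rb', 'README.md'].join("\x0")

# Same filter as the gemspec: drop the test, spec and features directories.
files = git_output.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }

# Same as spec.executables: basenames of everything under exe/.
executables = files.grep(%r{^exe/}) { |f| File.basename(f) }

p files        # => ["exe/mkwebook", "lib/mkwebook.rb", "README.md"]
p executables  # => ["mkwebook"]
```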
data/mkwebook.yml ADDED
@@ -0,0 +1,19 @@
1
+ browser:
2
+ headless: false
3
+
4
+ concurrency: 1
5
+
6
+
7
+ index-page:
8
+ url: https://clojure.org/guides/repl/introduction
9
+ extractor:
10
+ code: ./index_extract.js
11
+
12
+
13
+ page-extractors:
14
+ - url-pattern:
15
+ code:
16
+ assets:
17
+ - css-selector:
18
+ download-attr:
19
+
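The `concurrency` setting caps how many pages are processed at the same time. The gem declares a dependency on `concurrent-ruby`. The sketch below swaps that in for the standard library's `Thread` and `Queue` to show the same idea of a fixed-size worker pool. `process_pages` is a hypothetical helper, and the block stands in for the real per-page work of loading, modifying and saving a page.

```ruby
# Hypothetical helper: call the block once per URL using at most
# `concurrency` worker threads (nil or values below 1 mean one worker).
def process_pages(urls, concurrency)
  queue = Queue.new
  urls.each { |url| queue << url }
  queue.close # pop returns nil once the closed queue is drained

  results = Queue.new # thread-safe collection of [url, result] pairs
  workers = Array.new([concurrency.to_i, 1].max) do
    Thread.new do
      while (url = queue.pop)
        results << [url, yield(url)]
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }.to_h
end

pages = %w[https://a.example/ https://b.example/page]
upcased = process_pages(pages, 2) { |url| url.upcase }
p upcased
```

Closing the queue up front lets each worker exit cleanly when there is no work left, without sentinel values or non-blocking pops.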
metadata ADDED
@@ -0,0 +1,168 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: mkwebook
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Liu Xiang
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2022-12-09 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: activesupport
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: 6.1.5
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: 6.1.5
27
+ - !ruby/object:Gem::Dependency
28
+ name: concurrent-ruby
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: ferrum
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0.13'
48
+ type: :runtime
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0.13'
55
+ - !ruby/object:Gem::Dependency
56
+ name: thor
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: 1.2.1
62
+ type: :runtime
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: 1.2.1
69
+ - !ruby/object:Gem::Dependency
70
+ name: pry
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - ">="
74
+ - !ruby/object:Gem::Version
75
+ version: '0'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - ">="
81
+ - !ruby/object:Gem::Version
82
+ version: '0'
83
+ - !ruby/object:Gem::Dependency
84
+ name: pry-byebug
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - ">="
88
+ - !ruby/object:Gem::Version
89
+ version: '0'
90
+ type: :development
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - ">="
95
+ - !ruby/object:Gem::Version
96
+ version: '0'
97
+ - !ruby/object:Gem::Dependency
98
+ name: pry-doc
99
+ requirement: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - ">="
102
+ - !ruby/object:Gem::Version
103
+ version: '0'
104
+ type: :development
105
+ prerelease: false
106
+ version_requirements: !ruby/object:Gem::Requirement
107
+ requirements:
108
+ - - ">="
109
+ - !ruby/object:Gem::Version
110
+ version: '0'
111
+ description: A tool to download web pages and make them Calibre-ready.
112
+ email:
113
+ - liuxiang921@gmail.com
114
+ executables:
115
+ - mkwebook
116
+ extensions: []
117
+ extra_rdoc_files: []
118
+ files:
119
+ - ".gitignore"
120
+ - ".solargraph.yml"
121
+ - CODE_OF_CONDUCT.md
122
+ - Gemfile
123
+ - Gemfile.lock
124
+ - LICENSE.txt
125
+ - README.md
126
+ - Rakefile
127
+ - bin/console
128
+ - bin/setup
129
+ - exe/mkwebook
130
+ - lib/mkwebook.rb
131
+ - lib/mkwebook/app.rb
132
+ - lib/mkwebook/cli.rb
133
+ - lib/mkwebook/commands.rb
134
+ - lib/mkwebook/concerns.rb
135
+ - lib/mkwebook/concerns/global_data_definition.rb
136
+ - lib/mkwebook/config.rb
137
+ - lib/mkwebook/ext.rb
138
+ - lib/mkwebook/ext/string.rb
139
+ - lib/mkwebook/version.rb
140
+ - lib/template/mkwebook.clojure.yml
141
+ - lib/template/mkwebook.poetry.yml
142
+ - lib/template/mkwebook.yml
143
+ - mkwebook.gemspec
144
+ - mkwebook.yml
145
+ homepage: https://github.com/lululau/mkwebook
146
+ licenses:
147
+ - MIT
148
+ metadata: {}
149
+ post_install_message:
150
+ rdoc_options: []
151
+ require_paths:
152
+ - lib
153
+ required_ruby_version: !ruby/object:Gem::Requirement
154
+ requirements:
155
+ - - ">="
156
+ - !ruby/object:Gem::Version
157
+ version: 2.6.0
158
+ required_rubygems_version: !ruby/object:Gem::Requirement
159
+ requirements:
160
+ - - ">="
161
+ - !ruby/object:Gem::Version
162
+ version: '0'
163
+ requirements: []
164
+ rubygems_version: 3.3.3
165
+ signing_key:
166
+ specification_version: 4
167
+ summary: A tool to download web pages and make them Calibre-ready.
168
+ test_files: []
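The metadata records every runtime dependency as a `>=` requirement: `activesupport >= 6.1.5`, `ferrum >= 0.13`, `thor >= 1.2.1`, and `>= 0` for `concurrent-ruby`, which accepts any version. You can check what such a constraint admits with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The versions tested below are arbitrary examples.

```ruby
require 'rubygems'

# Runtime requirements copied from the gem metadata above.
REQUIREMENTS = {
  'activesupport' => '>= 6.1.5',
  'concurrent-ruby' => '>= 0',
  'ferrum' => '>= 0.13',
  'thor' => '>= 1.2.1'
}.freeze

# True if `version` satisfies the recorded requirement for gem `name`.
def satisfies?(name, version)
  Gem::Requirement.new(REQUIREMENTS.fetch(name)).satisfied_by?(Gem::Version.new(version))
end

p satisfies?('ferrum', '0.12')         # => false
p satisfies?('ferrum', '0.13')         # => true
p satisfies?('activesupport', '7.0.4') # => true
```

Because none of these requirements has an upper bound, any future major release of a dependency also satisfies them.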