html2index 1.2.1
- checksums.yaml +7 -0
- data/bin/html2index +32 -0
- data/doc/html/html2index.html +638 -0
- data/doc/man/html2index.1.gz +0 -0
- data/doc/pdf/html2index.pdf +0 -0
- data/doc/rst/html2index.rst +265 -0
- data/html2index.gemspec +21 -0
- data/lib/argparser.rb +111 -0
- data/lib/configuration.rb +183 -0
- data/lib/constants.rb +55 -0
- data/lib/definition.rb +43 -0
- data/lib/dictionary.rb +46 -0
- data/lib/file_checking.rb +103 -0
- data/lib/html2index.rb +277 -0
- data/lib/log.conf +56 -0
- data/lib/logging.rb +206 -0
- data/lib/template.rb +134 -0
- data/lib/translating.rb +89 -0
- data/lib/translations +0 -0
- data/lib/user_input.rb +45 -0
- data/lib/version.rb +12 -0
- metadata +104 -0
data/doc/rst/html2index.rst
ADDED
@@ -0,0 +1,265 @@
==========================
HTML2Index
==========================
--------------------------------------------------------------------
Creates an index or glossary of marked expressions in an HTML file
--------------------------------------------------------------------

SYNOPSIS
=========
**html2index -s input.html [-o output.html] [-t template.html] [-c config] [-d]**

**html2index -v**

**html2index -h**

DESCRIPTION
============

The program identifies all expressions in an HTML source that need to be copied
to the generated index, and searches the configured dictionaries on the Web for
an explanation of each expression.

The resulting glossary is written to a new HTML file or, if no output file is
given, its HTML code is printed to STDOUT.

**NOTE**: The default dictionaries are French: Larousse and JargonF. Non-French speakers MUST define their own dictionaries by editing the configuration file ~/.config/HTML2Index/config, as described under Configuration below.


OPTIONS
============================

:-d, --debug: Be verbose

:-s, --source=SOURCE: The original HTML file, which contains the marked
    expressions (see **Preparations**, below).

:-o, --out=GLOSSARY: The generated glossary file, in HTML format.

:-t, --template=TEMPLATE: An HTML file containing placeholders for the list of
    dictionaries used and for the generated glossary. The placeholders are
    currently defined as %=dict_list=% and %=glossary=%. You can set
    different field delimiters and names in the configuration file. See
    *EXAMPLE-Template* below for a rudimentary example.

:-c, --config=CONFIG: Configuration file. Command-line arguments override the
    settings in this file. After the first program run, a working
    configuration can be found in *~/.config/HTML2Index*. The file is
    commented and can immediately be adapted to your needs.

Common Options
----------------

**-h, --help** Show this message

**-v, --version** Show program version

EXAMPLE Usage
============================

Take, as an example, an HTML page in French (or any other language) with
instructions on how to enable and disable a touchpad using the xinput
command (any other HTML file would do as well):

**touchpad_fr.html**

Execution
--------------------

Executing html2index with the -s option and the HTML file as its value, like this:

::

    :~$ html2index -s /[path]/touchpad_fr.html

will produce output like the following, with expressions from the HTML file
explained in French:

::

    vi
    (JargonF): 1. [Unix]. « Visual Interface » (littéralement, « interface
    visuelle », ça ne s'invente pas !) éditeur de texte du pléistocène codé par
    Bill Joy, aussi fondateur de Sun. Des aficionados d'Unix s'en servent
    encore, même s'il est très concurrencé par Emacs. Son principal avantage
    est que quel que soit l'état de votre système (par exemple complètement
    déglingué ou allégé) il a de fortes chances de fonctionner encore
    correctement.
    La version la plus répandue est vim.

    2. [nom de domaine]. Nom de domaine de premier niveau des îles Vierges étasuniennes.
    ------------------------------------------------------------
    xinput
    (JargonF): commande. [X11] Utilitaire facilitant la gestion des
    périphériques X Window d'entrée. Il peut en fournir la liste, détailler
    leurs propriétés et modifier celles qui peuvent l'être.
    http://www.souris-libre.fr/savoir_faire/touchpad/touchpad_fr.html Exemple
    d'utilisation: désactivation et activation rapide du pavé tactile.

If you name an output file with the -o option, html2index writes its output,
in HTML format, to that file.

Preparations
-------------------------

...................
Mark catchwords
...................

In the source code of the original HTML page, expressions destined for the
glossary are marked by means of

* a tag
* an attribute of this tag
* the value of the attribute.

By default, the *span* tag with the attribute *lang="fy"* is used, 'fy'
standing for Frisian, a language which, I venture, is rarely used on the Web.


*Example*:
::

    <span lang="fy" xml:lang="fy">pavé tactile</span>

You can, however, define your own tag, attribute and attribute value if you
prefer to mark expressions in your original HTML file differently, as in

*Example*:
::

    <em class="expression">Tripane</em>

Remember that you can combine CSS classes, and thus economize on HTML
elements, if you already use classes to style your HTML content. This
complicates the task for html2index only slightly, as we will see further
below.
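The marking rule described above can be illustrated with a short Ruby sketch. It is not html2index's own code: the gem depends on nokogiri, while this sketch uses REXML from Ruby's standard distribution on a small, made-up and well-formed snippet, and the helper `marked_expressions` is hypothetical. A value counts as a match when it is one of the whitespace-separated tokens of the attribute, which is what lets combined CSS classes such as `class="expression highlight"` still mark an expression.

```ruby
require 'rexml/document'

# Hypothetical helper: returns the text of every <tag> whose attribute
# contains value as one of its whitespace-separated tokens.
def marked_expressions(doc, tag, attribute, value)
  REXML::XPath.match(doc, "//#{tag}[@#{attribute}]")
              .select { |el| el.attributes[attribute].split.include?(value) }
              .map(&:text)
end

# Made-up source snippet; REXML needs well-formed markup, real pages may not be.
SOURCE = <<~HTML
  <html><body>
    <p>Enable the <span lang="fy">pavé tactile</span> with
    <em class="expression highlight">xinput</em>, then start <em>vi</em>.</p>
  </body></html>
HTML

doc = REXML::Document.new(SOURCE)
# Default marking: <span lang="fy">
p marked_expressions(doc, 'span', 'lang', 'fy')
# Custom marking with combined classes: <em class="expression ...">
p marked_expressions(doc, 'em', 'class', 'expression')
```

The unmarked `<em>vi</em>` is ignored because it carries no `class` attribute at all.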

.....................
Configuration
.....................

Apart from the way expressions are marked in the original HTML, a few
settings influence the behaviour of HTML2Index. Command-line options
override the values stored in the configuration file.

A default configuration is stored in the file *~/.config/HTML2Index/config*
the first time you run html2index. It is commented well enough to let you
understand and alter any value in the file.

Nevertheless, each of the available variables is explained below:

:debug: Does the same as the command-line options '-d' or '--debug'.
    Accepts the values true or false, or can be left empty.
    If set to true, html2index becomes very verbose.
    Usually, you do not need to change the default value of this
    variable, which is *false*.

:dictionaries: Here you **HAVE** to define the online dictionaries to
    consult if you do not want to stick with the defaults: Larousse and
    JargonF, two French-language sites which provide explanations in
    French only.

    Each dictionary is defined by four variables: *name, url, xpath, color*.
    Each dictionary definition must start with a dash, followed by a space,
    then the first variable. Each variable name must be enclosed in colons
    (see the comments in the configuration file).

    :name: The name of the dictionary, as it will be referred to in the
        glossary, for example 'Merriam-Webster'.

    :url: The part of the URL of a search result in the chosen dictionary
        which precedes the searched expression. To determine this string,
        run a search in the online dictionary, then copy and paste the URL
        as it is displayed in your browser. Rearrange any request
        parameters (following '?') so that the searched word or
        expression is the very last item in the URL. Remove only the
        searched expression and use the remainder as the value of the
        variable *url*.

    :xpath: The XPath which identifies the HTML element, in a search
        result, that contains the explanation of an expression. Many
        resources on the Web explain how to compose an XPath. Be as
        specific as possible to avoid a misinterpretation of the XPath
        expression: use the HTML attributes applied to a container tag.
        The *id*, if present, but also CSS classes can help to identify a
        tag unambiguously.

    :color: A hexadecimal RGB color value in single quotes, assigned to
        each dictionary so that readers can tell which dictionary provided
        a given explanation in the glossary. Example values are
        *'800000'* or *'500050'*. Without the quotes, the configuration
        file would read a value such as 800000 as a number. Take care to
        choose colors which harmonize with the background of your template
        file, if you use one.

:template: An HTML file containing placeholders. At the time of writing,
    two placeholders are needed: one for the list of dictionaries used to
    look up definitions, and one marking the spot where the new glossary
    will be written. See *EXAMPLE-Template* below for a rudimentary
    example. The default template is defined internally.

:fdelim: A character sequence which marks placeholders in the HTML
    template file. The default is '*%-*': a percent sign followed by a
    dash marks the beginning of a placeholder, and the reversed sequence,
    a dash followed by a percent sign, marks its end, as in
    **%-dict_list-%** for the placeholder named 'dict_list'.

:placeholders: A list of placeholder names. Currently, only two
    placeholders are recognized by HTML2Index: *dict_list* and
    *glossary*. As the value of each of these two variables, enter the
    name that you chose for the placeholder in your HTML template. The
    defaults are *dict_list* for *dict_list* and *index* for *glossary*.

:html_tag: The tag which encloses marked expressions in the original
    HTML page (the source file). Default is *span*.

:html_attribute: An attribute of the html_tag which encloses marked
    expressions in the original HTML page (the source file). Default is
    *lang*.

:html_value: The value of an attribute of the html_tag which encloses
    marked expressions in the original HTML page (the source file).
    Default is *fy*.
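A configuration in the format described above is read as YAML (configuration.rb loads it with the `YAML` module). The sketch below parses a hypothetical dictionary entry, whose name, URL and XPath are invented for illustration, and shows three details: keys written as `:name:` come back as Ruby symbols, the URL ends where the search term will be appended, and the color keeps its quotes because an unquoted `800000` would be read as a number. `YAML.safe_load` is only allowed to create those symbols because `Symbol` is listed in `permitted_classes`.

```ruby
require 'yaml'

# Hypothetical dictionary entry; name, url and xpath are invented examples.
CONFIG_YAML = <<~YAML
  :dictionaries:
  - :name: Example Dictionary
    :url: https://dictionary.example.org/define?lang=en&q=
    :xpath: ".//div[@id='definition']"
    :color: '800000'
YAML

# Keys written as :name: are symbols, which safe_load must be allowed to create.
config = YAML.safe_load(CONFIG_YAML, permitted_classes: [Symbol])
dict = config[:dictionaries].first

puts dict[:name]              # the label shown in the glossary
puts dict[:url] + 'touchpad'  # the search term is appended at the very end
p dict[:color]                # a String, thanks to the quotes
p YAML.safe_load(':color: 800000', permitted_classes: [Symbol])[:color] # an Integer
```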

EXAMPLE-Template
============================
Assuming that the defaults are used, the following could be a working
HTML template to use with HTML2Index:

::

    <html>
    <head><title>Glossary</title></head>
    <body>
    <h1>Glossary</h1>
    <h2>Dictionaries used to produce this glossary</h2>
    <!-- will be replaced by an unordered list <ul><li> ... </li></ul> -->
    %-dict_list-%
    <h2>Definitions</h2>
    <!-- will be replaced by a definition list <dl><dt>...</dt><dd>...</dd></dl> -->
    %-glossary-%
    </body>
    </html>
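The replacement such a template relies on might look like the following sketch. This is an assumption about the mechanism, not the code of the gem's `Template` class, which is not shown here: a placeholder is taken to be the field delimiter, the placeholder name and the reversed delimiter (configuration.rb reports the closing delimiter as `@fdelim.reverse`), and it is replaced by generated HTML. The helper `fill_template` and the sample contents are hypothetical.

```ruby
# Hypothetical helper: replaces every "<fdelim><name><reversed fdelim>" marker.
# placeholders maps each role (:dict_list, :glossary) to the name used in the template.
def fill_template(template, fdelim, placeholders, contents)
  closing = fdelim.reverse # '%-' is closed by '-%'
  placeholders.reduce(template) do |html, (role, name)|
    # Block form, so that backslashes in the generated HTML are inserted literally.
    html.gsub("#{fdelim}#{name}#{closing}") { contents.fetch(role) }
  end
end

template = <<~HTML
  <h2>Dictionaries used to produce this glossary</h2>
  %-dict_list-%
  <h2>Definitions</h2>
  %-glossary-%
HTML

placeholders = { dict_list: 'dict_list', glossary: 'glossary' }
contents = {
  dict_list: '<ul><li>Larousse</li><li>JargonF</li></ul>',
  glossary: '<dl><dt>vi</dt><dd>Visual Interface</dd></dl>'
}
puts fill_template(template, '%-', placeholders, contents)
```

Because the closing marker is derived by reversing the opening one, a delimiter such as `%=` yields placeholders of the form `%=glossary=%`.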

ERRORS and WARNINGS
============================

If the output file already exists, html2index warns you and asks whether you
want to replace it with a new version.
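Such a confirmation step can be sketched as below. The method name, the prompt text and the accepted answers are assumptions for illustration (the gem's `user_input.rb` is not shown here); passing the input and output streams as keyword arguments keeps the sketch testable without a terminal.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical prompt: asks before an existing output file is replaced.
# Returns true when writing may proceed.
def confirm_overwrite?(path, input: $stdin, output: $stdout)
  return true unless File.exist?(path)

  output.print "#{path} exists. Replace it with a new version? [y/N] "
  answer = input.gets.to_s.strip.downcase
  %w[y yes].include?(answer)
end

Tempfile.create('glossary') do |f|
  silent = StringIO.new # collects the prompt instead of printing it
  p confirm_overwrite?(f.path, input: StringIO.new("y\n"), output: silent) # true
  p confirm_overwrite?(f.path, input: StringIO.new("\n"), output: silent)  # false
end
p confirm_overwrite?('/nonexistent/glossary.html') # true, no prompt needed
```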

The program also tries to determine the file type of the input (HTML) file and
issues a warning if the file is considered unsuitable.

Each time an expression cannot be found in one of the targeted dictionaries,
a warning is given. All these problematic expressions are listed in a
temporary file, whose name is given when html2index has terminated.

SOURCE CODE and DEVELOPMENT
============================
html2index is written in Ruby and can be installed as a Ruby gem. As Ruby is
an interpreted language, the source code of the installed version is always
accessible. You can also unpack the gem file to take a look at the code.

:AUTHOR: Michael Uplawski <michael[dot]uplawski[at]uplawski[dot]eu>
data/html2index.gemspec
ADDED
@@ -0,0 +1,21 @@
require_relative "lib/version"
# require_relative "lib/constants"
require 'date'

Gem::Specification.new do |s|
  s.version = VERSION
  s.name = File.basename(__FILE__, '.gemspec')
  s.date = Date.today.strftime('%F')
  s.summary = "updated dependencies, updated use of the URI module."
  s.description = "creates a glossary from HTML"
  s.authors = ["Michael Uplawski"]
  s.email = 'michael.uplawski@uplawski.eu'
  s.files = %w~html2index~.collect{|f| 'bin/' << f} + %w~version.rb argparser.rb configuration.rb constants.rb dictionary.rb html2index.rb logging.rb translating.rb user_input.rb definition.rb file_checking.rb log.conf template.rb translations~.collect{|f| 'lib/' << f} + %w~html2index.gemspec~.collect{|f|f} + %w~html/html2index.html man/html2index.1.gz pdf/html2index.pdf rst/html2index.rst~.collect{|f| 'doc/' << f}
  s.homepage = 'http://www.souris-libre.fr'
  s.requirements = 'nokogiri, ruby-filemagic'
  s.add_runtime_dependency 'nokogiri', '~> 1.10', '>= 1.10.9'
  s.add_runtime_dependency 'ruby-filemagic', '~> 0.7', '>= 0.7.2'
  s.executables = 'html2index'
  s.license = 'GPL-3.0'
  s.required_ruby_version = '>= 2.7.1'
end
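Both runtime dependencies combine a pessimistic constraint (`~>`) with a lower bound. RubyGems' `Gem::Requirement` class can show which versions satisfy such a pair; the version numbers below are arbitrary test values, not a statement about which releases exist.

```ruby
# RubyGems is loaded by default; the require only makes the dependency explicit.
require 'rubygems'

# The same constraints as in the gemspec above.
nokogiri  = Gem::Requirement.new('~> 1.10', '>= 1.10.9')
filemagic = Gem::Requirement.new('~> 0.7', '>= 0.7.2')

%w[1.10.8 1.10.9 1.16.0 2.0.0].each do |v|
  # '~> 1.10' allows 1.10 up to, but excluding, 2.0; '>= 1.10.9' raises the floor.
  puts "nokogiri #{v}: #{nokogiri.satisfied_by?(Gem::Version.new(v))}"
end
puts "ruby-filemagic 0.7.3: #{filemagic.satisfied_by?(Gem::Version.new('0.7.3'))}"
```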
data/lib/argparser.rb
ADDED
@@ -0,0 +1,111 @@
#encoding: UTF-8
=begin
/***************************************************************************
* ©2016-2017 Michael Uplawski <michael.uplawski@uplawski.eu> *
* *
* This program is free software; you can redistribute it and/or modify *
* it under the terms of the GNU General Public License as published by *
* the Free Software Foundation; either version 3 of the License, or *
* (at your option) any later version. *
* *
* This program is distributed in the hope that it will be useful, *
* but WITHOUT ANY WARRANTY; without even the implied warranty of *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
* GNU General Public License for more details. *
* *
* You should have received a copy of the GNU General Public License *
* along with this program; if not, write to the *
* Free Software Foundation, Inc., *
* 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. *
***************************************************************************/
=end


require 'optparse'
require 'optparse/time'
require 'ostruct'
require_relative 'logging'
require_relative 'version'
# require_relative 'translating'
require_relative 'constants'

class ArgParser
  # Class level logger. This is a static class.
  self.extend(Logging)
  # self.extend(Translating)
  @@log = init_logger()

  # Returns a structure describing the options.
  #
  def self.parse(args)
    if args.empty?
      puts usage
      exit true
    end
    # The options specified on the command line will be collected in
    # <b>options</b>. No defaults. Most options are optional and do not
    # have to be set at all.
    # The others must be named for each transformation or be set in the
    # configuration-file.
    options = OpenStruct.new
    options.target = nil

    op = OptionParser.new do |opts|
      opts.banner = usage

      opts.on("-d", "--debug", 'Be verbose') do
        $log_level = Logger::DEBUG
        @@log.level = $log_level
      end

      opts.on("-sOURCE", "--source=SOURCE", 'Source document (html)') do |so|
        options.source = so
      end

      opts.on("-oUT", "--out=GLOSSARY", 'Glossary-file (html)') do |ta|
        options.target = ta
      end

      opts.on("-tEMPLATE", "--template=TEMPLATE", 'Template (html)') do |tpl|
        options.template = tpl
      end

      opts.on("-cONFIG", "--config=CONFIG", 'Configuration-file') do |cfg|
        options.config = cfg
      end

      opts.on("-h", "--help", 'Show this message') do
        puts opts
        exit true
      end

      opts.on("-v", "--version", 'Show program version') do
        puts APPNAME.dup << ", version " << VERSION
        exit true
      end
    end
    begin
      op.parse!(args)
    rescue OptionParser::ParseError => er
      msg = "ERROR! Unsuitable or incomplete program-arguments" << ": %s" %er.message
      puts msg
      puts "Start this program with parameter -h or --help to see the usage-message."
      exit false
    end
    @@log.debug('options are ' << options.to_s)

    options
  end # parse()

=begin
Shows the usage-message
=end
  def self::usage
    msg = "\n\tUsage: html2index -s input.html [-o output.html] [-c config-file] [-t template.html] [-d]"
    msg << "\n\n\t* Will print to stdout, if the output-file is not provided."
    msg << "\n\t* Adapt ~/.config/HTML2Index/config to your needs.\n\n"
  end


end
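`ArgParser` declares its short switches as `-sOURCE`, `-oUT`, and so on. For Ruby's `OptionParser`, a short switch followed by further characters declares a switch with a required argument, so users may type `-s file`, `-sfile` or `--source=file`. The stand-alone sketch below reproduces that pattern, with a plain Hash in place of the gem's `OpenStruct` and without its logger or `exit` calls; `parse_args` is a hypothetical name.

```ruby
require 'optparse'

# Hypothetical stand-alone parser modelled on ArgParser.parse (no logging, no exit).
def parse_args(args)
  options = { target: nil }
  OptionParser.new do |opts|
    opts.banner = 'Usage: html2index -s input.html [-o output.html] [-c config-file] [-t template.html] [-d]'
    # "-sOURCE": short switch -s with a required argument.
    opts.on('-sOURCE', '--source=SOURCE', 'Source document (html)') { |v| options[:source] = v }
    opts.on('-oUT', '--out=GLOSSARY', 'Glossary-file (html)') { |v| options[:target] = v }
    opts.on('-tEMPLATE', '--template=TEMPLATE', 'Template (html)') { |v| options[:template] = v }
    opts.on('-cONFIG', '--config=CONFIG', 'Configuration-file') { |v| options[:config] = v }
    opts.on('-d', '--debug', 'Be verbose') { options[:debug] = true }
  end.parse!(args)
  options
end

p parse_args(%w[-s touchpad_fr.html -o glossary.html -d])
p parse_args(%w[--source=touchpad_fr.html])
```

A missing argument, as in `html2index -s`, raises `OptionParser::MissingArgument`, a subclass of the `OptionParser::ParseError` that `ArgParser.parse` rescues.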
data/lib/configuration.rb
ADDED
@@ -0,0 +1,183 @@
#encoding: UTF-8
=begin
/***************************************************************************
* ©2016-2017 Michael Uplawski <michael.uplawski@uplawski.eu> *
* *
* This program is free software; you can redistribute it and/or modify *
* it under the terms of the GNU General Public License as published by *
* the Free Software Foundation; either version 3 of the License, or *
* (at your option) any later version. *
* *
* This program is distributed in the hope that it will be useful, *
* but WITHOUT ANY WARRANTY; without even the implied warranty of *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
* GNU General Public License for more details. *
* *
* You should have received a copy of the GNU General Public License *
* along with this program; if not, write to the *
* Free Software Foundation, Inc., *
* 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. *
***************************************************************************/
=end
require 'yaml'
require 'singleton'
require 'ostruct'
require_relative 'constants'
require_relative 'file_checking'
require_relative 'logging'
require_relative 'translating'

class Configuration
  include File_Checking
  include Translating
  include Logging

  # default configuration file
  @@config_file = File::dirname(__FILE__) << File::Separator << 'config'

  # do initializations
  def initialize(options)
    init_logger(STDOUT)
    @log.level = $log_level
    set(options)
    @log.debug('config-file is ' << @@config_file)
  end

  def user_conf
    confdir = ENV['HOME'].dup << File::Separator << '.config'
    Dir.mkdir(confdir) if !Dir.exist?(confdir)
    confdir = confdir << File::Separator << APPNAME
    Dir.mkdir(confdir) if !Dir.exist?(confdir)
    config = confdir << File::Separator << 'config'
    if(!File.exist?(config ) )
      begin
        File.open(config, 'w') {|co| co.write(File.read(@@config_file))}
        @log.info("Created user-version of the configuration-file in\n\t" << config)
      rescue Exception => ex
        @log.error('Cannot create the configuration: ' << ex.message)
        give_up
      end
    end
    return config
  end

  attr_reader :dicts, :template, :fields, :placeholders, :fdelim

  # return any value stored in @config
  def method_missing(msg, *args)
    ms = msg.to_sym
    # Exception-handling is not a control-structure.
    # This is.
    if @config[ms]
      return @config[ms]
    else
      return nil
    end
  end

  private

  # Configure with the command-line arguments.
  def set(options)
    @log.debug('merging options ' << options.to_s)
    # User-provided configuration-file?
    if(options['config'])
      cf = options['config']
      @log.debug('config should be ' << cf.to_s)
      msg = file_check(cf, :file, :readable)
      if(!msg)
        @@config_file = cf
      else
        msg = ("The file %s " << msg.split[1,100].join(' ')) %msg.split[0]
        @log.error(("ERROR! Unsuitable file") << ' ' << msg)
        give_up
      end
    else
      @@config_file = user_conf
    end

    @log.debug('config-file is ' << @@config_file)

    # read defaults from configuration-file
    co = OpenStruct.new(YAML::load_file(@@config_file))

    # merge and overwrite with the command-line arguments
    @config = co.to_h.update(options.to_h)
    if(! @config[:source] )
      msg = ('missing argument %s') %'source'
      @log.error msg
      @log.error(("Start this program with parameter -h or --help to see the usage-message.") )
      give_up
    end

    # ----- define the template html ----
    warn = false
    # set template
    if @config[:template]
      @template = @config[:template]
    else
      @log.warn 'Using default-template!'
      warn ||= true
    end
    # fields in the template file
    if @config[:placeholders] && @config[:template]
      @placeholders = @config[:placeholders]
      @log.debug('placeholders from config: ' << @placeholders.to_s)
    else
      @placeholders = Template.default(:placeholders)
      if @config[:placeholders]
        @log.warn 'Placeholders are defined, but no template-file is given.'
      elsif @config[:template]
        @log.warn 'Template is given, but placeholders are not defined.'
      end
      @log.warn 'Using default placeholders ' << @placeholders.to_a.collect{|p|p.join(': ')}.join(', ')
      warn = true
    end
    @fields = [@placeholders[:dict_list], @placeholders[:glossary]]
    # the field-delimiter
    if @config[:fdelim] && @config[:template]
      @fdelim = @config[:fdelim]
    else
      @fdelim = Template.default(:fdelim)
      if @config[:fdelim]
        @log.warn 'Field delimiters are defined but no template is given.'
      elsif @config[:template]
        @log.warn 'Template is given, but field delimiters are not defined.'
      end
      @log.warn 'Using default delimiters ' << @fdelim << ', ' << @fdelim.reverse
      warn ||= true
    end

    # ----------- template is defined --------
    dictionaries = @config[:dictionaries]
    @dicts = Array.new
    if(dictionaries)
      dictionaries.each do |d|
        @dicts << Dictionary.new(d[:name], d[:url], d[:xpath], d[:color])
      end
      @log.debug('dicts are from config' << @dicts.to_s)
    else
      @log.warn( %~NO DICTIONARIES have been set in the configuration!
Will use the defaults, which is probably NOT what you want!
Defaults are: %s~ %[URL_DICT1.dup << ', ' << URL_DICT2.dup])
      warn ||= true
      @dicts << Dictionary.new(NAME_DICT1, URL_DICT1,XPATH_DICT1, DICT_COLORS[0])
      @dicts << Dictionary.new(NAME_DICT2, URL_DICT2, XPATH_DICT2, DICT_COLORS[1])
      @log.debug('dicts are from constants' << @dicts.to_s)
    end
    @log.warn "HINT: Adapt #{@@config_file} to avoid warnings in the future." if warn
  end

  # exit on error
  def give_up
    @log.error("\t" << ("Aborting. Bye!"))
    exit false
  end
end

#------- TEST -----------
if __FILE__ == $0
  require_relative 'template' # Template is not required by this file itself
  conf = Configuration.new(OpenStruct.new(source: ARGV[0])) # initialize() calls the private set()
end
#eof
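`Configuration#set` merges the file settings with the command-line options via `co.to_h.update(options.to_h)`, so any key present on both sides takes the command-line value, even a `nil` one. Because `ArgParser.parse` always sets `options.target = nil`, a `target` entry in the configuration file would be overridden even when `-o` is not used. The sketch below, with made-up values, shows that precedence and how `Hash#compact` (Ruby 2.4 and later) would let unset options fall back to the file; it illustrates Hash semantics, not a change made by the gem.

```ruby
# Made-up values standing in for the configuration file and the command line.
file_settings = { template: 'my_template.html', target: 'glossary.html', debug: false }
cli_options   = { source: 'touchpad_fr.html', target: nil }

# Same precedence as Hash#update in Configuration#set: command-line keys win, even nil ones.
merged = file_settings.merge(cli_options)
p merged[:target] # nil

# Dropping nil entries first lets unset options keep the value from the file.
merged_compact = file_settings.merge(cli_options.compact)
p merged_compact[:target] # "glossary.html"
```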
data/lib/constants.rb
ADDED
@@ -0,0 +1,55 @@
#encoding: UTF-8
=begin
/***************************************************************************
* ©2015-2017 Michael Uplawski <michael.uplawski@uplawski.eu> *
* *
* This program is free software; you can redistribute it and/or modify *
* it under the terms of the GNU General Public License as published by *
* the Free Software Foundation; either version 3 of the License, or *
* (at your option) any later version. *
* *
* This program is distributed in the hope that it will be useful, *
* but WITHOUT ANY WARRANTY; without even the implied warranty of *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
* GNU General Public License for more details. *
* *
* You should have received a copy of the GNU General Public License *
* along with this program; if not, write to the *
* Free Software Foundation, Inc., *
* 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. *
***************************************************************************/
=end
require_relative 'dictionary'
require 'logger'
require 'date'

APPNAME = 'HTML2Index'

# URL and xpath for the definitions.
#
# The expression is added to the URL. Example:
# Given a dictionary url like
# URL_MY_DICT = "http://my_technical_dictionary.somewhere.com/words/"
# the url to search an expression will be
# URL_MY_DICT = "http://my_technical_dictionary.somewhere.com/words/expression"
# In other words: Note here the part of the url *before* the expression
# ---
# The xpath must identify the HTML elements containing the definitions.
#
NAME_DICT1 ||= 'Larousse'
URL_DICT1 ||= "http://www.larousse.com/fr/dictionnaires/francais/"
XPATH_DICT1 ||= ".//li[@class='DivisionDefinition']"

NAME_DICT2 ||= 'JargonF'
URL_DICT2 ||= "http://jargonf.org/wiki/"
XPATH_DICT2 ||= ".//div[@id='mw-content-text']/*"

# one color per dictionary
DICT_COLORS ||= ['a000a0', '00a000']

# definitions which cause problems are logged.
PROBLEM_LOG ||= 'html2index_problems.txt'
$log_level = Logger::INFO

# meta-tag for the html-output
GeneratorMeta = "<meta name=\"generator\" content=\"HTML2Index ©2015-#{Date.today.strftime('%Y')} michael.uplawski@uplawski.eu\" />"
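Following the comment above, a lookup is the dictionary URL with the expression appended, after which the XPath selects the definition elements in the returned page. The sketch below walks through both steps under stated assumptions: the percent-encoding helper `lookup_url` is hypothetical (the gemspec summary mentions the URI module, but the gem's own calls are not shown here), no network request is made, and a canned, well-formed page stands in for a JargonF result so that REXML can apply `XPATH_DICT2` in place of nokogiri.

```ruby
require 'rexml/document'

# Values copied from constants.rb above.
URL_DICT2   = 'http://jargonf.org/wiki/'
XPATH_DICT2 = ".//div[@id='mw-content-text']/*"

# Hypothetical helper: percent-encode every byte outside the RFC 3986
# unreserved set and append the result to the dictionary URL.
def lookup_url(base_url, expression)
  encoded = expression.b.gsub(/[^A-Za-z0-9\-._~]/n) { |c| format('%%%02X', c.ord) }
  base_url + encoded.force_encoding(Encoding::UTF_8)
end

# Canned stand-in for a search result; a real page would be fetched over HTTP.
PAGE = <<~HTML
  <html><body>
    <div id="column-one"><p>Navigation</p></div>
    <div id="mw-content-text">
      <p>First part of the definition.</p>
      <p>Second part of the definition.</p>
    </div>
  </body></html>
HTML

puts lookup_url(URL_DICT2, 'pavé tactile') # http://jargonf.org/wiki/pav%C3%A9%20tactile
doc = REXML::Document.new(PAGE)
# Only the children of the element with id 'mw-content-text' are selected.
definitions = REXML::XPath.match(doc, XPATH_DICT2).map { |el| el.text.strip }
puts definitions
```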
data/lib/definition.rb
ADDED
@@ -0,0 +1,43 @@
#encoding: UTF-8
=begin
/***************************************************************************
* ©2015-2017 Michael Uplawski <michael.uplawski@uplawski.eu> *
* *
* This program is free software; you can redistribute it and/or modify *
* it under the terms of the GNU General Public License as published by *
* the Free Software Foundation; either version 3 of the License, or *
* (at your option) any later version. *
* *
* This program is distributed in the hope that it will be useful, *
* but WITHOUT ANY WARRANTY; without even the implied warranty of *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
* GNU General Public License for more details. *
* *
* You should have received a copy of the GNU General Public License *
* along with this program; if not, write to the *
* Free Software Foundation, Inc., *
* 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. *
***************************************************************************/
=end
require_relative 'constants'
require_relative 'translating'

class Definition
  include Comparable

  attr_reader :origin, :expression, :definition
  attr_accessor :color

  def initialize(origin, expression, definition)
    @origin = origin
    @expression = expression
    @definition = definition
    @color = nil
  end

  def <=>(other_def)
    return @expression <=> other_def.expression
  end
end
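`Definition#<=>` compares expressions with `String#<=>`, which orders strings byte by byte. If definitions are sorted with it, capitalized expressions come before lowercase ones and accented initials come after `z`. The sketch below uses a minimal stand-in class (hypothetical; only the comparison is copied from `Definition`) to show the effect, plus one possible alternative sort key built with `String#unicode_normalize`. Neither is claimed to be what the gem does when it writes the glossary, and full locale-aware collation would need a dedicated library.

```ruby
# Minimal stand-in for Definition: same Comparable-based ordering by expression.
class Entry
  include Comparable
  attr_reader :expression

  def initialize(expression)
    @expression = expression
  end

  def <=>(other)
    expression <=> other.expression
  end
end

words = ['xinput', 'vi', 'Émulateur', 'touchpad', 'Xorg']

# Byte order: 'X' (0x58) sorts before lowercase letters, 'É' (0xC3 0x89) after 'z'.
byte_order = words.map { |w| Entry.new(w) }.sort.map(&:expression)
p byte_order

# Alternative key: decompose accents (É becomes E plus a combining accent), then ignore case.
folded = words.sort_by { |w| w.unicode_normalize(:nfd).downcase }
p folded
```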