html2index 1.2.1

@@ -0,0 +1,265 @@
1
+ ==========================
2
+ HTML2Index
3
+ ==========================
4
+ --------------------------------------------------------------------
5
+ Creates an index or glossary of marked expressions in an HTML-file
6
+ --------------------------------------------------------------------
7
+
8
+ SYNOPSIS
9
+ =========
10
+ **html2index -s input.html [-o output.html] [-t template.html] [-c config] [-d]**
11
+
12
+ **html2index -v**
13
+
14
+ **html2index -h**
15
+
16
+ DESCRIPTION
17
+ ============
18
+
19
+ The program identifies in an HTML source all expressions which need to be copied
20
+ to the generated index and searches given dictionaries on the Web for an
21
+ explanation of each expression.
22
+
23
+ The resulting glossary is written to a new HTML-file or its HTML-code printed to STDOUT.
24
+
25
+ **NOTE**: The default dictionaries are French: Larousse and JargonF. Non-French speakers MUST define the dictionary by editing the configuration file ~/.config/HTML2Index/config as described under Configuration below.
26
+
27
+ OPTIONS
28
+ ============================
29
+
30
+ :-d, --debug: Be verbose
31
+
32
+ :-s, --source=SOURCE: Source is the original HTML-file
33
+ which contains marked expressions (see
34
+ **Preparations**, below).
35
+
36
+ :-o, --out=GLOSSARY: Glossary is the generated file in HTML-format.
37
+
38
+ :-t, --template=TEMPLATE: An HTML file containing placeholders for the references to the dictionaries used and
39
+ the generated glossary. The placeholders are
40
+ currently defined as %-dict_list-% and
40
+ %-glossary-%. You can set different
42
+ field-delimiters and names in the
43
+ configuration-file. See below under
44
+ *EXAMPLE-Template* for a rudimentary example.
45
+
46
+ :-c, --config=CONFIG: Configuration-file. Command-line arguments override the settings in this file. You find a functional
47
+ configuration after the first program-execution in *~/.config/HTML2Index*. The file is commented and
48
+ can immediately be adapted to your needs.
49
+
50
+ Common Options
51
+ ----------------
52
+
53
+ **-h, --help** Show this message
54
+
55
+ **-v, --version** Show program version
56
+
57
+ EXAMPLE Usage
58
+ ============================
59
+
60
+ Here is an HTML page containing instructions on how to enable and disable a
61
+ touchpad using the xinput command (or any other HTML-file) in the French (or
62
+ any other) language.
63
+
64
+ **touchpad_fr.html**
65
+
66
+ Execution
67
+ --------------------
68
+
69
+ Executing HTML2Index with the -s argument and the HTML-file as its value, like this:
70
+
71
+ ::
72
+
73
+ :~$ html2index -s /[path]/touchpad_fr.html
74
+
75
+ will produce output like this with expressions from the HTML-file explained in
76
+ the French language:
77
+
78
+ ::
79
+
80
+ vi
81
+ (JargonF): 1.  [Unix]. « Visual Interface » (littéralement, « interface
82
+ visuelle », ça ne s'invente pas !) éditeur de texte du pléistocène codé par
83
+ Bill Joy, aussi fondateur de Sun. Des aficionados d'Unix s'en servent
84
+ encore, même s'il est très concurrencé par Emacs. Son principal avantage
85
+ est que quel que soit l'état de votre système (par exemple complètement
86
+ déglingué ou allégé) il a de fortes chances de fonctionner encore
87
+ correctement.
88
+ La version la plus répandue est vim.
89
+
90
+ 2.  [nom de domaine]. Nom de domaine de premier niveau des îles Vierges étasuniennes.
91
+ ------------------------------------------------------------
92
+ xinput
93
+ (JargonF): commande.  [X11] Utilitaire facilitant la gestion des
94
+ périphériques X Window d'entrée. Il peut en fournir la liste, détailler
95
+ leurs propriétés et modifier celles qui peuvent l'être.
96
+ http://www.souris-libre.fr/savoir_faire/touchpad/touchpad_fr.html Exemple
97
+ d'utilisation: désactivation et activation rapide du pavé tactile.
98
+
99
+ If you name an output file with the -o option, html2index will direct its
100
+ output in HTML-format to this file.
101
+
102
+ Preparations
103
+ -------------------------
104
+
105
+ ...................
106
+ Mark catchwords
107
+ ...................
108
+
109
+ In the source-code of the original HTML page, expressions for the future
110
+ glossary are marked by means of
111
+
112
+ * a tag
113
+ * an attribute of this tag
114
+ * the value of the attribute.
115
+
116
+ By default, the *span*-tag with an attribute *lang="fy"* is used, 'fy' meaning
117
+ Frisian, a language which is rarely used on the Web, I venture.
118
+
119
+
120
+ *Example*:
121
+ ::
122
+
123
+ <span lang="fy" xml:lang="fy">pavé tactile</span>
124
+
125
+ You can, though, define your own tag, attribute and attribute-value, if you prefer
126
+ to mark expressions in your original html-file differently, like in
127
+
128
+ *Example*:
129
+ ::
130
+
131
+ <em class="expression">Tripane</em>
132
+
133
+ Remember that you can combine css classes and thus economize on html-elements, if
134
+ you use them anyway to style your html-content. This would complicate the task for
135
+ html2index only a little bit, as we will see further below.
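The marking scheme (tag, attribute, attribute-value) can be illustrated with a few lines of Ruby. This is only a minimal sketch using a regular expression from the standard library; it is not the way html2index itself reads the source (the gem depends on nokogiri), and the helper name ``marked_expressions`` is made up for this example. It does not handle nested tags of the same name or combined css classes.

```ruby
# Hypothetical helper, not part of html2index: returns the text of every
# element <tag attribute="value">...</tag> found in the given HTML string.
def marked_expressions(html, tag: 'span', attribute: 'lang', value: 'fy')
  t = Regexp.escape(tag)
  a = Regexp.escape(attribute)
  v = Regexp.escape(value)
  # The attribute must be preceded by white-space, so that xml:lang is not
  # mistaken for lang.
  pattern = /<#{t}\b[^>]*\s#{a}\s*=\s*["']#{v}["'][^>]*>(.*?)<\/#{t}>/mi
  html.scan(pattern).flatten.map(&:strip).uniq
end

html = <<~HTML
  <p>Le <span lang="fy" xml:lang="fy">pavé tactile</span> se règle avec
  <em class="expression">xinput</em>.</p>
HTML

# Default marking: span with lang="fy"
p marked_expressions(html)
# User-defined marking: em with class="expression"
p marked_expressions(html, tag: 'em', attribute: 'class', value: 'expression')
```

The keyword arguments mirror the three settings *html_tag*, *html_attribute* and *html_value* of the configuration-file.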
136
+
137
+ .....................
138
+ Configuration
139
+ .....................
140
+
141
+ Apart from the way that expressions are marked in the original html, you can prepare
142
+ a few settings for HTML2Index, which influence its behaviour. Command-line
143
+ options override the values stored in the configuration-file.
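The precedence of command-line options over the configuration-file can be pictured as a merge of two hashes (lib/configuration.rb updates the settings read from the file with the command-line options). The following is a minimal sketch with made-up values; the helper name ``effective_settings`` is hypothetical.

```ruby
# Hypothetical helper: combines both sources of settings. Hash#merge keeps
# every key; where both hashes define one, the second hash wins.
def effective_settings(file_settings, cli_options)
  file_settings.merge(cli_options)
end

# Illustrative values only.
file_settings = { debug: false, template: 'default_template.html', html_tag: 'span' }
cli_options   = { source: 'touchpad_fr.html', template: 'my_template.html' }

settings = effective_settings(file_settings, cli_options)
puts settings[:template]  # value from the command line
puts settings[:html_tag]  # only set in the file, so it is kept
```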
144
+
145
+ A default configuration will be stored in the file *~/.config/HTML2Index/config* the
146
+ first time that you run html2index. It should be sufficiently commented to allow you
147
+ to comprehend and alter any values in the file.
148
+
149
+ However, an explanation of each one of the available variables follows:
150
+
151
+ :debug: Does the same as the command-line options '-d' or '--debug'.
152
+ Accepts the values false or true or can be left empty.
153
+ If set to true, this setting causes html2index to be very verbose.
154
+ Usually, you do not need to change the default value of this
155
+ variable, which is *false*.
156
+
157
+ :dictionaries: Here, you **HAVE** to define the online-dictionaries to
158
+ consult, if you do not want to stick with the defaults, which are
159
+ Larousse and JargonF, two French speaking sites, which also provide
160
+ explanations in the French language only.
161
+
162
+ The dictionaries are defined with four variables, each: *name, url, xpath, color*.
163
+ Each dictionary-definition must start with a dash, followed by a white-space, then the
164
+ first variable. Each variable-name must be enclosed by colons (see comments in the config-file).
165
+
166
+ :name: The name of the dictionary, as it will be referred to in the glossary. An example could be 'Merriam-Webster'.
167
+
168
+ :url: Note here the part from the url to a search-result in
169
+ the chosen dictionary, which precedes the searched
170
+ expression. You determine this string by doing a
171
+ search in the online-dictionary, then copy and paste the
172
+ url as it is displayed in your browser. Rearrange
173
+ possible request-parameters (following '?') to ensure
174
+ that the searched word or expression is the very last
175
+ item in the url. Remove only the searched expression
176
+ and note the remainder as the value to the variable
177
+ *url*.
178
+
179
+ :xpath: This is the xpath which identifies the HTML-element
180
+ in a search-result which contains the explanation of
181
+ an expression. Many resources on the Web explain how
182
+ to compose an xpath. Be as specific as possible, to
183
+ avoid a misinterpretation of the xpath-expression;
184
+ use html-attributes which may be applied to an HTML
185
+ container-tag. Especially *id*, if present but also
186
+ css-classes can help to identify a tag unambiguously.
187
+
188
+ :color: A hexadecimal rgb color value in single quotes is
189
+ attributed to each dictionary to facilitate the
190
+ identification of the dictionary which provides a
191
+ specific explanation in the glossary. Exemplary
192
+ colors are *'800000'* or *'500050'*. Take care to
193
+ choose colors which harmonize with the background in
194
+ your template-file, if you use one.
195
+
196
+ :template: An HTML-file which contains placeholders. Two placeholders are needed
197
+ at the time of this writing, one to name the dictionaries which are
198
+ used to look-up definitions, another one to locate the spot where
199
+ the new glossary will be written. See below under *EXAMPLE-Template*
200
+ for a rudimentary example. The default template is internally defined.
201
+
202
+ :fdelim: A character sequence which is used to mark placeholders in the
203
+ HTML-template file. The default is '*%-*', meaning that a
204
+ percent-symbol followed by a dash marks the beginning, a dash
205
+ followed by a percent-symbol the end of a placeholder, like in
206
+ **%-dict-list-%** for the placeholder named 'dict-list'.
207
+
208
+ :placeholders: A list of placeholder names. Currently, there are only two
209
+ placeholders recognized by Html2Index: *dict_list* and
210
+ *glossary*. As the value to these two variables, note the name
211
+ that you chose for the placeholders in your HTML-template. The
212
+ defaults are *dict_list* for *dict_list* and *index* for
213
+ *glossary*.
214
+
215
+ :html_tag: This is the tag which encloses marked expressions in the original
216
+ HTML-page (the source-file). Default is *span*
217
+
218
+ :html_attribute: An attribute of the html_tag which encloses marked expressions
219
+ in the original HTML-page (the source-file). Default is *lang*.
220
+
221
+ :html_value: The value of an attribute of the html_tag which encloses marked
222
+ expressions in the original HTML-page (the source-file). Default is
223
+ *fy*.
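To make the variables above more tangible, here is a sketch of what a configuration could look like and how the *url* of a dictionary is combined with an expression. The layout is an assumption derived from the description above (variable-names enclosed by colons, one dash per dictionary); the commented file in *~/.config/HTML2Index/config* remains the reference. The dictionary name, url and xpath are placeholders, and the helper names ``load_config`` and ``lookup_url`` are made up for this example.

```ruby
require 'yaml'
require 'uri'

# Illustrative configuration text. The dictionary entry is a placeholder,
# not a value shipped with html2index.
EXAMPLE_CONFIG = <<~YAML
  :debug: false
  :html_tag: span
  :html_attribute: lang
  :html_value: fy
  :fdelim: '%-'
  :dictionaries:
    - :name: Example Dictionary
      :url: https://dictionary.example.org/define/
      :xpath: ".//div[@id='definition']"
      :color: '800000'
YAML

# Names enclosed by colons are read as Ruby symbols, so the class Symbol
# has to be permitted explicitly.
def load_config(text)
  YAML.safe_load(text, permitted_classes: [Symbol])
end

# The searched expression is the very last item of the url. It is escaped
# first, so that white-space and accented characters survive.
def lookup_url(dictionary, expression)
  escaped = URI.encode_www_form_component(expression).gsub('+', '%20')
  dictionary[:url] + escaped
end

config = load_config(EXAMPLE_CONFIG)
dictionary = config[:dictionaries].first
puts dictionary[:name]
puts lookup_url(dictionary, 'pavé tactile')
```

Whether a given dictionary expects white-space as ``%20``, ``+`` or something else depends on the site; check the url that your browser displays after a search.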
224
+
225
+ EXAMPLE-Template
226
+ ============================
227
+ Assuming that the defaults are used, the following could be a working
228
+ HTML-template to use with HTML2Index:
229
+
230
+ ::
231
+
232
+ <html>
233
+ <head><title>Glossary</title></head>
234
+ <body>
235
+ <h1>Glossary</h1>
236
+ <h2>Dictionaries used to produce this glossary</h2>
237
+ <!-- will be replaced by an unnumbered list <ul><li> ... </li></ul> -->
238
+ %-dict_list-%
239
+ <h2>Definitions</h2>
240
+ <!-- will be replaced by a definition list <dl><dt><dd>... </dd></dt></dl> -->
241
+ %-glossary-%
242
+ </body>
243
+ </html>
244
+
245
+ ERRORS and WARNINGS
246
+ ============================
247
+
248
+ html2index warns you if the output-file exists and asks you if you want
249
+ to replace it with a new version.
250
+
251
+ The program also tries to determine the file-type of the input (HTML) file and
252
+ gives out a warning if the file is considered unsuitable.
253
+
254
+ Each time an expression cannot be found in one of the targeted dictionaries,
255
+ a warning is given. All these problematic expressions will be listed in a
256
+ temporary file, whose name is reported after html2index has terminated.
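One way to picture this list of problematic expressions is the following sketch. It is an assumption about the general technique, not the code of html2index; the helper name ``write_problem_log`` is made up and the file-name prefix merely echoes the constant PROBLEM_LOG from lib/constants.rb.

```ruby
require 'tempfile'

# Hypothetical helper, not taken from html2index: writes the expressions
# that no dictionary could explain to a temporary file and returns its path.
def write_problem_log(unresolved)
  # Without a block, Tempfile.create returns an ordinary File which is
  # not removed automatically.
  log = Tempfile.create(['html2index_problems', '.txt'])
  unresolved.each { |expression| log.puts(expression) }
  log.close
  log.path
end

path = write_problem_log(['Tripane', 'xinput'])
puts "Unresolved expressions were written to #{path}"
```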
257
+
258
+ SOURCE CODE and DEVELOPMENT
259
+ ============================
260
+ html2index is developed in Ruby and can be installed as a Ruby-Gem. As Ruby is
261
+ an interpreted language, the source-code of the installed version is always
262
+ accessible. You can also decompress the gem-file to take a look at the code.
263
+
264
+ :AUTHOR: Michael Uplawski <michael[dot]uplawski[at]uplawski[dot]eu>
265
+
@@ -0,0 +1,21 @@
1
+ require_relative "lib/version"
2
+ # require_relative "lib/constants"
3
+ require 'date'
4
+
5
+ Gem::Specification.new do |s|
6
+ s.version = VERSION
7
+ s.name = File.basename(__FILE__, '.gemspec')
8
+ s.date = Date.today.strftime('%F')
9
+ s.summary = "updated dependencies, updated use of the URI module."
10
+ s.description = "creates a glossary from HTML"
11
+ s.authors = ["Michael Uplawski"]
12
+ s.email = 'michael.uplawski@uplawski.eu'
13
+ s.files = %w~html2index~.collect{|f| 'bin/' << f} + %w~version.rb argparser.rb configuration.rb constants.rb dictionary.rb html2index.rb logging.rb translating.rb user_input.rb definition.rb file_checking.rb log.conf template.rb translations~.collect{|f| 'lib/' << f} + %w~html2index.gemspec~.collect{|f|f} + %w~html/html2index.html man/html2index.1.gz pdf/html2index.pdf rst/html2index.rst~.collect{|f| 'doc/' << f}
14
+ s.homepage = 'http://www.souris-libre.fr'
15
+ s.requirements = 'nokogiri, ruby-filemagic'
16
+ s.add_runtime_dependency 'nokogiri', '~> 1.10', '>= 1.10.9'
17
+ s.add_runtime_dependency 'ruby-filemagic', '~> 0.7', '>= 0.7.2'
18
+ s.executables = 'html2index'
19
+ s.license = 'GPL-3.0'
20
+ s.required_ruby_version = '>= 2.7.1'
21
+ end
data/lib/argparser.rb ADDED
@@ -0,0 +1,111 @@
1
+ #encoding: UTF-8
2
+ =begin
3
+ /***************************************************************************
4
+ * ©2016-2017 Michael Uplawski <michael.uplawski@uplawski.eu> *
5
+ * *
6
+ * This program is free software; you can redistribute it and/or modify *
7
+ * it under the terms of the GNU General Public License as published by *
8
+ * the Free Software Foundation; either version 3 of the License, or *
9
+ * (at your option) any later version. *
10
+ * *
11
+ * This program is distributed in the hope that it will be useful, *
12
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of *
13
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
14
+ * GNU General Public License for more details. *
15
+ * *
16
+ * You should have received a copy of the GNU General Public License *
17
+ * along with this program; if not, write to the *
18
+ * Free Software Foundation, Inc., *
19
+ * 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. *
20
+ ***************************************************************************/
21
+ =end
22
+
23
+
24
+ require 'optparse'
25
+ require 'optparse/time'
26
+ require 'ostruct'
27
+ require_relative 'logging'
28
+ require_relative 'version'
29
+ # require_relative 'translating'
30
+ require_relative 'constants'
31
+
32
+ class ArgParser
33
+ # Class level logger. This is a static class.
34
+ self.extend(Logging)
35
+ # self.extend(Translating)
36
+ @@log = init_logger()
37
+
38
+ # Returns a structure describing the options.
39
+ #
40
+ def self.parse(args)
41
+ if args.empty?
42
+ puts usage
43
+ exit true
44
+ end
45
+ # The options specified on the command line will be collected in
46
+ # <b>options</b>. No defaults. Most options are optional and do not
47
+ # have to be set at all.
48
+ # The others must be named for each transformation or be set in the
49
+ # configuration-file.
50
+ options = OpenStruct.new
51
+ options.target = nil
52
+
53
+ op = OptionParser.new do |opts|
54
+ opts.banner = usage
55
+
56
+ opts.on("-d", "--debug", 'Be verbose') do
57
+ $log_level = Logger::DEBUG
58
+ @@log.level = $log_level
59
+ end
60
+
61
+ opts.on("-sOURCE", "--source=SOURCE", 'Source document (html)') do |so|
62
+ options.source = so
63
+ end
64
+
65
+ opts.on("-oUT", "--out=GLOSSARY", 'Glossary-file (html)') do |ta|
66
+ options.target = ta
67
+ end
68
+
69
+ opts.on("-tEMPLATE", "--template=TEMPLATE", 'Template (html)') do |tpl|
70
+ options.template = tpl
71
+ end
72
+
73
+ opts.on("-cONFIG", "--config=CONFIG", 'Configuration-file') do |cfg|
74
+ options.config = cfg
75
+ end
76
+
77
+ opts.on("-h", "--help", 'Show this message') do
78
+ puts opts
79
+ exit true
80
+ end
81
+
82
+ opts.on("-v", "--version", 'Show program version') do
83
+ puts APPNAME.dup << ", version " << VERSION
84
+ exit true
85
+ end
86
+ end
87
+ begin
88
+ op.parse!(args)
89
+ rescue OptionParser::ParseError => er
90
+ msg = "ERROR! Unsuitable or incomplete program-arguments" << ": %s" %er.message
91
+ puts msg
92
+ puts "Start this program with parameter -h or --help to see the usage-message."
93
+ exit false
94
+ end
95
+ @@log.debug('options are ' << options.to_s)
96
+
97
+ options
98
+ end # parse()
99
+
100
+ =begin
101
+ Shows the usage-message
102
+ =end
103
+ def self::usage
104
+ msg = "\n\tUsage: html2index -s input.html [-o output.html] [-c config-file] [-t template.html] [-d]"
105
+ msg << "\n\n\t* Will print to stdout, if the output-file is not provided."
106
+ msg << "\n\t* Adapt ~/.config/HTML2Index/config to your needs.\n\n"
107
+ end
108
+
109
+
110
+ end
111
+
@@ -0,0 +1,183 @@
1
+ #encoding: UTF-8
2
+ =begin
3
+ /***************************************************************************
4
+ * ©2016-2017 Michael Uplawski <michael.uplawski@uplawski.eu> *
5
+ * *
6
+ * This program is free software; you can redistribute it and/or modify *
7
+ * it under the terms of the GNU General Public License as published by *
8
+ * the Free Software Foundation; either version 3 of the License, or *
9
+ * (at your option) any later version. *
10
+ * *
11
+ * This program is distributed in the hope that it will be useful, *
12
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of *
13
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
14
+ * GNU General Public License for more details. *
15
+ * *
16
+ * You should have received a copy of the GNU General Public License *
17
+ * along with this program; if not, write to the *
18
+ * Free Software Foundation, Inc., *
19
+ * 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. *
20
+ ***************************************************************************/
21
+ =end
22
+ require 'yaml'
23
+ require 'singleton'
24
+ require 'ostruct'
25
+ require_relative 'constants'
26
+ require_relative 'file_checking'
27
+ require_relative 'logging'
28
+ require_relative 'translating'
29
+
30
+ class Configuration
31
+ include File_Checking
32
+ include Translating
33
+ include Logging
34
+
35
+ # default configuration file
36
+ @@config_file = File::dirname(__FILE__) << File::Separator << 'config'
37
+
38
+ # do initializations
39
+ def initialize(options)
40
+ init_logger(STDOUT)
41
+ @log.level = $log_level
42
+ set(options)
43
+ @log.debug('config-file is ' << @@config_file)
44
+ end
45
+
46
+ def user_conf
47
+ confdir = ENV['HOME'].dup << File::Separator << '.config'
48
+ Dir.mkdir(confdir) if !Dir.exist?(confdir)
49
+ confdir = confdir << File::Separator << APPNAME
50
+ Dir.mkdir(confdir) if !Dir.exist?(confdir)
51
+ config = confdir << File::Separator << 'config'
52
+ if(!File.exist?(config ) )
53
+ begin
54
+ File.open(config, 'w') {|co| co.write(File.read(@@config_file))}
55
+ @log.info("Created user-version of the configuration-file in\n\t" << config)
56
+ rescue StandardError => ex
57
+ @log.error('Cannot create the configuration: ' << ex.message)
58
+ give_up
59
+ end
60
+ end
61
+ return config
62
+ end
63
+
64
+ attr_reader :dicts, :template, :fields, :placeholders, :fdelim
65
+
66
+ # return any value stored in @config
67
+ def method_missing(msg, *args)
68
+ ms = msg.to_sym
69
+ # Exception-handling is not a control-structure.
70
+ # This is.
71
+ if @config[ms]
72
+ return @config[ms]
73
+ else
74
+ return nil
75
+ end
76
+ end
77
+
78
+ private
79
+
80
+ # Configure with the command-line arguments.
81
+ def set(options)
82
+ @log.debug('merging options ' << options.to_s)
83
+ # User-provided configuration-file?
84
+ if(options['config'])
85
+ cf = options['config']
86
+ @log.debug('config should be ' << cf.to_s)
87
+ msg = file_check(cf, :file, :readable)
88
+ if(!msg)
89
+ @@config_file = cf
90
+ else
91
+ msg = ("The file %s " << msg.split[1,100].join(' ')) %msg.split[0]
92
+ @log.error(("ERROR! Unsuitable file") << ' ' << msg)
93
+ give_up
94
+ end
95
+ else
96
+ @@config_file = user_conf
97
+ end
98
+
99
+ @log.debug('config-file is ' << @@config_file)
100
+
101
+ # read defaults from configuration-file
102
+ co = OpenStruct.new(YAML::load_file(@@config_file))
103
+
104
+ # merge and overwrite with the command-line arguments
105
+ @config = co.to_h.update(options.to_h)
106
+ if(! @config[:source] )
107
+ msg = ('missing argument %s') %'source'
108
+ @log.error msg
109
+ @log.error(("Start this program with parameter -h or --help to see the usage-message.") )
110
+ give_up
111
+ end
112
+
113
+ # ----- define the template html ----
114
+ warn = false
115
+ # set template
116
+ if @config[:template]
117
+ @template = @config[:template]
118
+ else
119
+ @log.warn 'Using default-template!'
120
+ warn ||= true
121
+ end
122
+ # fields in the template file
123
+ if @config[:placeholders] && @config[:template]
124
+ @placeholders = @config[:placeholders]
125
+ @log.debug('placeholders from config: ' << @placeholders.to_s)
126
+ else
127
+ @placeholders = Template.default(:placeholders)
128
+ if @config[:placeholders]
129
+ @log.warn 'Placeholders are defined, but no template-file is given.'
130
+ else
131
+ @log.warn 'Template is given, but placeholders are not defined.'
132
+ end
133
+ @log.warn 'Using default placeholders ' << @placeholders.to_a.collect{|p|p.join(': ')}.join(', ')
134
+ warn = true
135
+ end
136
+ @fields = [@placeholders[:dict_list], @placeholders[:glossary]]
137
+ # the field-delimiter
138
+ if @config[:fdelim] && @config[:template]
139
+ @fdelim = @config[:fdelim]
140
+ else
141
+ @fdelim = Template.default(:fdelim)
142
+ if @config[:template]
143
+ @log.warn 'Template is given, but field delimiters are not defined.'
144
+ else
145
+ @log.warn 'Field delimiters are defined but no template is given.'
146
+ end
147
+ @log.warn 'Using default delimiters ' << @fdelim << ', ' << @fdelim.reverse
148
+ warn ||= true
149
+ end
150
+
151
+ # ----------- template is defined --------
152
+ dictionaries = @config[:dictionaries]
153
+ @dicts = Array.new
154
+ if(dictionaries)
155
+ dictionaries.each do |d|
156
+ @dicts << Dictionary.new(d[:name], d[:url], d[:xpath], d[:color])
157
+ end
158
+ @log.debug('dicts are from config' << @dicts.to_s)
159
+ else
160
+ @log.warn( %~NO DICTIONARIES have been set in the configuration!
161
+ Will use the defaults, which is probably NOT what you want!
162
+ Defaults are: %s~ %[URL_DICT1.dup << ', ' << URL_DICT2.dup])
163
+ warn ||= true
164
+ @dicts << Dictionary.new(NAME_DICT1, URL_DICT1,XPATH_DICT1, DICT_COLORS[0])
165
+ @dicts << Dictionary.new(NAME_DICT2, URL_DICT2, XPATH_DICT2, DICT_COLORS[1])
166
+ @log.debug('dicts are from constants' << @dicts.to_s)
167
+ end
168
+ @log.warn "HINT: Adapt #{@@config_file} to avoid warnings in the future." if warn
169
+ end
170
+
171
+ # exit on error
172
+ def give_up
173
+ @log.error("\t" << ("Aborting. Bye!"))
174
+ exit false
175
+ end
176
+ end
177
+
178
+ #------- TEST -----------
179
+ if __FILE__ == $0
180
+ conf = Configuration.new(OpenStruct.new)
181
+ # initialize() already calls the private method set()
182
+ end
183
+ #eof
data/lib/constants.rb ADDED
@@ -0,0 +1,55 @@
1
+ #encoding: UTF-8
2
+ =begin
3
+ /***************************************************************************
4
+ * ©2015-2017 Michael Uplawski <michael.uplawski@uplawski.eu> *
5
+ * *
6
+ * This program is free software; you can redistribute it and/or modify *
7
+ * it under the terms of the GNU General Public License as published by *
8
+ * the Free Software Foundation; either version 3 of the License, or *
9
+ * (at your option) any later version. *
10
+ * *
11
+ * This program is distributed in the hope that it will be useful, *
12
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of *
13
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
14
+ * GNU General Public License for more details. *
15
+ * *
16
+ * You should have received a copy of the GNU General Public License *
17
+ * along with this program; if not, write to the *
18
+ * Free Software Foundation, Inc., *
19
+ * 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. *
20
+ ***************************************************************************/
21
+ =end
22
+ require_relative 'dictionary'
23
+ require 'logger'
24
+ require 'date'
25
+
26
+ APPNAME = 'HTML2Index'
27
+
28
+ # URL and xpath for the definitions.
29
+ #
30
+ # The expression is added to the URL. Example:
31
+ # Given a dictionary url like
32
+ # URL_MY_DICT = "http://my_technical_dictionary.somewhere.com/words/"
33
+ # the url to search for an expression will be
34
+ # "http://my_technical_dictionary.somewhere.com/words/expression"
35
+ # In other words: Note here the part of the url *before* the expression
36
+ # ---
37
+ # The xpath must identify the HTML-elements which contain definitions.
38
+ #
39
+ NAME_DICT1 ||= 'Larousse'
40
+ URL_DICT1 ||= "http://www.larousse.com/fr/dictionnaires/francais/"
41
+ XPATH_DICT1 ||= ".//li[@class='DivisionDefinition']"
42
+
43
+ NAME_DICT2 ||= 'JargonF'
44
+ URL_DICT2 ||= "http://jargonf.org/wiki/"
45
+ XPATH_DICT2 ||= ".//div[@id='mw-content-text']/*"
46
+
47
+ # colors which are connected to one dictionary, each
48
+ DICT_COLORS ||= ['a000a0', '00a000']
49
+
50
+ # definitions which cause problems are logged.
51
+ PROBLEM_LOG ||= 'html2index_problems.txt'
52
+ $log_level = Logger::INFO
53
+
54
+ # meta-tag for the html-output
55
+ GeneratorMeta = "<meta name=\"generator\" content=\"HTML2Index ©2015-#{Date.today.strftime('%Y')} michael.uplawski@uplawski.eu\" />"
data/lib/definition.rb ADDED
@@ -0,0 +1,43 @@
1
+ #encoding: UTF-8
2
+ =begin
3
+ /***************************************************************************
4
+ * ©2015-2017 Michael Uplawski <michael.uplawski@uplawski.eu> *
5
+ * *
6
+ * This program is free software; you can redistribute it and/or modify *
7
+ * it under the terms of the GNU General Public License as published by *
8
+ * the Free Software Foundation; either version 3 of the License, or *
9
+ * (at your option) any later version. *
10
+ * *
11
+ * This program is distributed in the hope that it will be useful, *
12
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of *
13
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
14
+ * GNU General Public License for more details. *
15
+ * *
16
+ * You should have received a copy of the GNU General Public License *
17
+ * along with this program; if not, write to the *
18
+ * Free Software Foundation, Inc., *
19
+ * 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. *
20
+ ***************************************************************************/
21
+ =end
22
+ require_relative 'constants'
23
+ require_relative 'translating'
24
+
25
+ class Definition
26
+ include Comparable
27
+
28
+ attr_reader :origin, :expression, :definition
29
+ attr_accessor :color
30
+
31
+ def initialize(origin, expression, definition)
32
+ @origin = origin
33
+ @expression = expression
34
+ @definition = definition
35
+ @color = nil
36
+ end
37
+
38
+ def <=>(other_def)
39
+ return @expression <=> other_def.expression
40
+ end
41
+ end
42
+
43
+