html2index 1.2.1

@@ -0,0 +1,265 @@
+ ==========================
+ HTML2Index
+ ==========================
+ --------------------------------------------------------------------
+ Creates an index or glossary of marked expressions in an HTML-file
+ --------------------------------------------------------------------
+
+ SYNOPSIS
+ =========
+ **html2index -s input.html [-o output.html] [-t template.html] [-c config] [-d]**
+
+ **html2index -v**
+
+ **html2index -h**
+
+ DESCRIPTION
+ ============
+
+ The program identifies all expressions in an HTML source which are marked for
+ inclusion in the generated index, and looks up an explanation of each
+ expression in the configured dictionaries on the Web.
+
+ The resulting glossary is written to a new HTML-file, or its HTML-code is printed to STDOUT.
+
+ **NOTE**: The default dictionaries are French: Larousse and JargonF. Non-French speakers MUST define their own dictionaries by editing the configuration file ~/.config/HTML2Index/config as described under Configuration below.
+
+ OPTIONS
+ ============================
+
+ :-d, --debug: Be verbose
+
+ :-s, --source=SOURCE: Source is the original HTML-file
+     which contains marked expressions (see
+     **Preparations**, below).
+
+ :-o, --out=GLOSSARY: Glossary is the generated file in HTML-format.
+
+ :-t, --template=TEMPLATE: An HTML-file containing placeholders for the references to the dictionaries used and
+     for the generated glossary. With the default field-delimiter, the
+     placeholders are written %-dict_list-% and
+     %-glossary-%. You can set different
+     field-delimiters and names in the
+     configuration-file. See below under
+     *EXAMPLE-Template* for a rudimentary example.
+
+ :-c, --config=CONFIG: Configuration-file. Command-line arguments override the settings in this file. You will find a functional
+     configuration in *~/.config/HTML2Index* after the first program-execution. The file is commented and
+     can be adapted to your needs immediately.
+
+ Common Options
+ ----------------
+
+ **-h, --help** Show this message
+
+ **-v, --version** Show program version
+
+ EXAMPLE Usage
+ ============================
+
+ Take an HTML-page containing instructions on how to enable and disable a
+ touchpad using the xinput command, written in French (any other HTML-file in
+ any other language will do):
+
+ **touchpad_fr.html**
+
+ Execution
+ --------------------
+
+ Executing HTML2Index with the -s argument and the HTML-file as its value, like this:
+
+ ::
+
+     :~$ html2index -s /[path]/touchpad_fr.html
+
+ will produce output like the following, with expressions from the HTML-file
+ explained in French:
+
+ ::
+
+     vi
+     (JargonF): 1.  [Unix]. « Visual Interface » (littéralement, « interface
+     visuelle », ça ne s'invente pas !) éditeur de texte du pléistocène codé par
+     Bill Joy, aussi fondateur de Sun. Des aficionados d'Unix s'en servent
+     encore, même s'il est très concurrencé par Emacs. Son principal avantage
+     est que quel que soit l'état de votre système (par exemple complètement
+     déglingué ou allégé) il a de fortes chances de fonctionner encore
+     correctement.
+     La version la plus répandue est vim.
+
+     2.  [nom de domaine]. Nom de domaine de premier niveau des îles Vierges étasuniennes.
+     ------------------------------------------------------------
+     xinput
+     (JargonF): commande.  [X11] Utilitaire facilitant la gestion des
+     périphériques X Window d'entrée. Il peut en fournir la liste, détailler
+     leurs propriétés et modifier celles qui peuvent l'être.
+     http://www.souris-libre.fr/savoir_faire/touchpad/touchpad_fr.html Exemple
+     d'utilisation: désactivation et activation rapide du pavé tactile.
+
+ If you name an output file with the -o option, html2index will write its
+ output in HTML-format to this file.
+
+ Preparations
+ -------------------------
+
+ ...................
+ Mark catchwords
+ ...................
+
+ In the source-code of the original HTML-page, expressions for the future
+ glossary are marked by means of
+
+ * a tag
+ * an attribute of this tag
+ * the value of the attribute.
+
+ By default, the *span*-tag with the attribute *lang="fy"* is used, 'fy' meaning
+ Frisian, a language which, I venture, is rarely used on the Web.
+
+
+ *Example*:
+ ::
+
+     <span lang="fy" xml:lang="fy">pavé tactile</span>
+
+ You can, however, define your own tag, attribute and attribute-value if you prefer
+ to mark expressions in your original HTML-file differently, as in this
+
+ *Example*:
+ ::
+
+     <em class="expression">Tripane</em>
+
+ Remember that you can combine CSS classes and thus economize on HTML-elements if
+ you use them anyway to style your HTML-content. This complicates the task for
+ html2index only a little, as we will see further below.
+
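The marking scheme described above comes down to an XPath assembled from the three configured values. The following is a minimal sketch under assumptions: `marker_xpath` is a hypothetical helper, not a function of html2index, and the class variant uses the common token-matching idiom so that an element carrying several CSS classes still matches.

```ruby
# Hypothetical helper, not taken from html2index: builds an XPath which
# selects the elements that mark glossary expressions.
def marker_xpath(tag, attribute, value)
  if attribute == 'class'
    # Match one class token among several, e.g. class="expression bold".
    ".//#{tag}[contains(concat(' ', normalize-space(@class), ' '), ' #{value} ')]"
  else
    ".//#{tag}[@#{attribute}='#{value}']"
  end
end

puts marker_xpath('span', 'lang', 'fy')
puts marker_xpath('em', 'class', 'expression')
```

With Nokogiri, which the gem lists as a dependency, an expression of this kind could be passed to `Nokogiri::HTML(html).xpath(...)` to collect the marked elements; whether html2index builds its query this way is not shown in the source.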
+ .....................
+ Configuration
+ .....................
+
+ Apart from the way that expressions are marked in the original HTML, you can prepare
+ a few settings for HTML2Index which influence its behaviour. Command-line
+ options override the values stored in the configuration-file.
+
+ A default configuration is stored in the file *~/.config/HTML2Index/config* the
+ first time that you run html2index. It should be sufficiently commented to allow you
+ to understand and alter any value in the file.
+
+ Nevertheless, an explanation of each of the available variables follows:
+
+ :debug: Does the same as the command-line options '-d' or '--debug'.
+     Accepts the values false or true, or can be left empty.
+     If set to true, this setting causes html2index to be very verbose.
+     Usually, you do not need to change the default value of this
+     variable, which is *false*.
+
+ :dictionaries: Here, you **HAVE** to define the online-dictionaries to
+     consult if you do not want to stick with the defaults, which are
+     Larousse and JargonF, two French-language sites which provide
+     explanations in French only.
+
+     The dictionaries are defined with four variables each: *name, url, xpath, color*.
+     Each dictionary-definition must start with a dash, followed by a white-space, then the
+     first variable. Each variable-name must be enclosed by colons (see comments in the config-file).
+
+     :name: The name of the dictionary, as it will be referred to in the glossary. An example could be 'Merriam-Webster'.
+
+     :url: Note here the part of the url to a search-result in
+         the chosen dictionary which precedes the searched
+         expression. You determine this string by doing a
+         search in the online-dictionary, then copying the
+         url as it is displayed in your browser. Rearrange
+         possible request-parameters (following '?') to ensure
+         that the searched word or expression is the very last
+         item in the url. Remove only the searched expression
+         and note the remainder as the value of the variable
+         *url*.
+
+     :xpath: This is the xpath which identifies the HTML-element
+         in a search-result which contains the explanation of
+         an expression. Many resources on the Web explain how
+         to compose an xpath. Be as specific as possible to
+         avoid a misinterpretation of the xpath-expression:
+         use HTML-attributes which may be applied to an HTML
+         container-tag. Especially *id*, if present, but also
+         CSS-classes can help to identify a tag unambiguously.
+
+     :color: A hexadecimal rgb color value in single quotes is
+         assigned to each dictionary to facilitate the
+         identification of the dictionary which provides a
+         specific explanation in the glossary. Example
+         colors are *'800000'* or *'500050'*. Take care to
+         choose colors which harmonize with the background in
+         your template-file, if you use one.
+
+ :template: An HTML-file which contains placeholders. Two placeholders are needed
+     at the time of this writing: one to name the dictionaries which are
+     used to look up definitions, another one to mark the spot where
+     the new glossary will be written. See below under *EXAMPLE-Template*
+     for a rudimentary example. The default template is internally defined.
+
+ :fdelim: A character sequence which is used to mark placeholders in the
+     HTML-template file. The default is '*%-*', meaning that a
+     percent-symbol followed by a dash marks the beginning, and a dash
+     followed by a percent-symbol the end of a placeholder, as in
+     **%-dict_list-%** for the placeholder named 'dict_list'.
+
+ :placeholders: A list of placeholder names. Currently, only two
+     placeholders are recognized by HTML2Index: *dict_list* and
+     *glossary*. As the value of these two variables, note the name
+     that you chose for the placeholders in your HTML-template. The
+     defaults are *dict_list* for *dict_list* and *index* for
+     *glossary*.
+
+ :html_tag: This is the tag which encloses marked expressions in the original
+     HTML-page (the source-file). Default is *span*.
+
+ :html_attribute: An attribute of the html_tag which encloses marked expressions
+     in the original HTML-page (the source-file). Default is *lang*.
+
+ :html_value: The value of an attribute of the html_tag which encloses marked
+     expressions in the original HTML-page (the source-file). Default is
+     *fy*.
+
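The variables described above can be pictured as a small YAML document with colon-enclosed keys. The sketch below is an assumption-laden illustration, not the file shipped with the gem: the dictionary name, URL and XPath are placeholders, `lookup_url` is a hypothetical helper, and percent-encoding the expression with `ERB::Util.url_encode` is one plausible choice rather than the program's documented behaviour.

```ruby
require 'yaml'
require 'erb'

# Hypothetical configuration fragment; name, url and xpath are placeholders,
# not values shipped with html2index.
CONFIG_YAML = <<~YAML
  :debug: false
  :dictionaries:
    - :name: 'Example Dictionary'
      :url: 'https://dictionary.example.org/define/'
      :xpath: ".//div[@id='definition']"
      :color: '800000'
  :html_tag: span
  :html_attribute: lang
  :html_value: fy
YAML

# Keys enclosed by colons are read as Ruby symbols.
config = YAML.safe_load(CONFIG_YAML, permitted_classes: [Symbol])

# The searched expression is appended to the configured url.
def lookup_url(base_url, expression)
  base_url + ERB::Util.url_encode(expression)
end

dict = config[:dictionaries].first
puts "#{dict[:name]}: #{lookup_url(dict[:url], 'pavé tactile')}"
```

Because the expression is simply appended, the configured url has to end exactly where the search term begins, which is why request-parameters may need rearranging.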
+ EXAMPLE-Template
+ ============================
+ Assuming that the defaults are used, the following could be a working
+ HTML-template to use with HTML2Index:
+
+ ::
+
+     <html>
+       <head><title>Glossary</title></head>
+       <body>
+         <h1>Glossary</h1>
+         <h2>Dictionaries used to produce this glossary</h2>
+         <!-- will be replaced by an unnumbered list <ul><li> ... </li></ul> -->
+         %-dict_list-%
+         <h2>Definitions</h2>
+         <!-- will be replaced by a definition list <dl><dt><dd>... </dd></dt></dl> -->
+         %-glossary-%
+       </body>
+     </html>
+
+ ERRORS and WARNINGS
+ ============================
+
+ html2index warns you if the output-file exists and asks you whether you want
+ to replace it with a new version.
+
+ The program also tries to determine the file-type of the input (HTML) file and
+ issues a warning if the file is considered unsuitable.
+
+ Each time that an expression cannot be found in one of the targeted dictionaries,
+ a warning is given. All these problematic expressions are listed in a
+ temporary file, the name of which is reported once html2index has terminated.
+
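The handling of problematic expressions could look roughly like the following sketch. It is an assumption about the mechanism, not the gem's code: `write_problem_log` is a hypothetical helper, and the file prefix merely echoes the `PROBLEM_LOG` constant defined in constants.rb.

```ruby
require 'tempfile'

# Hypothetical sketch, not the gem's code: write the expressions for which no
# definition was found to a temporary file and return its path.
def write_problem_log(expressions)
  return nil if expressions.empty?

  # Without a block, Tempfile.create returns a File which is not removed
  # automatically, so it can still be inspected after the program has ended.
  log = Tempfile.create(['html2index_problems', '.txt'])
  expressions.each { |expression| log.puts(expression) }
  log.close
  log.path
end

path = write_problem_log(['Tripane', 'pavé tactile'])
puts "Expressions without a definition are listed in #{path}" if path
```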
+ SOURCE CODE and DEVELOPMENT
+ ============================
+ html2index is developed in Ruby and can be installed as a Ruby-Gem. As Ruby is
+ an interpreted language, the source-code of the installed version is always
+ accessible. You can also unpack the gem-file to take a look at the code.
+
+ :AUTHOR: Michael Uplawski <michael[dot]uplawski[at]uplawski[dot]eu>
+
@@ -0,0 +1,21 @@
+ require_relative "lib/version"
+ # require_relative "lib/constants"
+ require 'date'
+
+ Gem::Specification.new do |s|
+   s.version = VERSION
+   s.name = File.basename(__FILE__, '.gemspec')
+   s.date = Date.today.strftime('%F')
+   s.summary = "updated dependencies, updated use of the URI module."
+   s.description = "creates a glossary from HTML"
+   s.authors = ["Michael Uplawski"]
+   s.email = 'michael.uplawski@uplawski.eu'
+   s.files = %w~html2index~.collect{|f| 'bin/' << f} + %w~version.rb argparser.rb configuration.rb constants.rb dictionary.rb html2index.rb logging.rb translating.rb user_input.rb definition.rb file_checking.rb log.conf template.rb translations~.collect{|f| 'lib/' << f} + %w~html2index.gemspec~.collect{|f|f} + %w~html/html2index.html man/html2index.1.gz pdf/html2index.pdf rst/html2index.rst~.collect{|f| 'doc/' << f}
+   s.homepage = 'http://www.souris-libre.fr'
+   s.requirements = 'nokogiri, ruby-filemagic'
+   s.add_runtime_dependency 'nokogiri', '~> 1.10', '>= 1.10.9'
+   s.add_runtime_dependency 'ruby-filemagic', '~> 0.7', '>= 0.7.2'
+   s.executables = 'html2index'
+   s.license = 'GPL-3.0'
+   s.required_ruby_version = '>= 2.7.1'
+ end
data/lib/argparser.rb ADDED
@@ -0,0 +1,111 @@
+ #encoding: UTF-8
+ =begin
+ /***************************************************************************
+ * ©2016-2017 Michael Uplawski <michael.uplawski@uplawski.eu> *
+ * *
+ * This program is free software; you can redistribute it and/or modify *
+ * it under the terms of the GNU General Public License as published by *
+ * the Free Software Foundation; either version 3 of the License, or *
+ * (at your option) any later version. *
+ * *
+ * This program is distributed in the hope that it will be useful, *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
+ * GNU General Public License for more details. *
+ * *
+ * You should have received a copy of the GNU General Public License *
+ * along with this program; if not, write to the *
+ * Free Software Foundation, Inc., *
+ * 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. *
+ ***************************************************************************/
+ =end
+
+
+ require 'optparse'
+ require 'optparse/time'
+ require 'ostruct'
+ require_relative 'logging'
+ require_relative 'version'
+ # require_relative 'translating'
+ require_relative 'constants'
+
+ class ArgParser
+   # Class level logger. This is a static class.
+   self.extend(Logging)
+   # self.extend(Translating)
+   @@log = init_logger()
+
+   # Returns a structure describing the options.
+   #
+   def self.parse(args)
+     if args.empty?
+       puts usage
+       exit true
+     end
+     # The options specified on the command line will be collected in
+     # <b>options</b>. No defaults. Most options are optional and do not
+     # have to be set at all.
+     # The others must be named for each transformation or be set in the
+     # configuration-file.
+     options = OpenStruct.new
+     options.target = nil
+
+     op = OptionParser.new do |opts|
+       opts.banner = usage
+
+       opts.on("-d", "--debug", 'Be verbose') do
+         $log_level = Logger::DEBUG
+         @@log.level = $log_level
+       end
+
+       opts.on("-sOURCE", "--source=SOURCE", 'Source document (html)') do |so|
+         options.source = so
+       end
+
+       opts.on("-oUT", "--out=GLOSSARY", 'Glossary-file (html)') do |ta|
+         options.target = ta
+       end
+
+       opts.on("-tEMPLATE", "--template=TEMPLATE", 'Template (html)') do |tpl|
+         options.template = tpl
+       end
+
+       opts.on("-cONFIG", "--config=CONFIG", 'Configuration-file') do |cfg|
+         options.config = cfg
+       end
+
+       opts.on("-h", "--help", 'Show this message') do
+         puts opts
+         exit true
+       end
+
+       opts.on("-v", "--version", 'Show program version') do
+         puts APPNAME.dup << ", version " << VERSION
+         exit true
+       end
+     end
+     begin
+       op.parse!(args)
+     rescue OptionParser::ParseError => er
+       msg = "ERROR! Unsuitable or incomplete program-arguments" << ": %s" %er.message
+       puts msg
+       puts "Start this program with parameter -h or --help to see the usage-message."
+       exit false
+     end
+     @@log.debug('options are ' << options.to_s)
+
+     options
+   end # parse()
+
+ =begin
+ Shows the usage-message
+ =end
+   def self::usage
+     msg = "\n\tUsage: html2index -s input.html [-o output.html] [-c config-file] [-t template.html] [-d]"
+     msg << "\n\n\t* Will print to stdout, if the output-file is not provided."
+     msg << "\n\t* Adapt ~/.config/HTML2Index/config to your needs.\n\n"
+   end
+
+
+ end
+
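The option style used in ArgParser can be tried in isolation. The sketch below is a reduced illustration, not a replacement for the class above: `parse_cli` is a hypothetical name, only three of the options are declared, and a plain Hash stands in for the OpenStruct.

```ruby
require 'optparse'

# Reduced sketch of the option style used above; parse_cli is a hypothetical
# name and returns a plain Hash instead of an OpenStruct.
def parse_cli(argv)
  options = {}
  OptionParser.new do |opts|
    # A short option followed directly by a name ("-sSOURCE") declares a
    # mandatory argument, just like "--source=SOURCE" does for the long form.
    opts.on('-sSOURCE', '--source=SOURCE', 'Source document (html)') { |v| options[:source] = v }
    opts.on('-oGLOSSARY', '--out=GLOSSARY', 'Glossary-file (html)') { |v| options[:target] = v }
    opts.on('-d', '--debug', 'Be verbose') { options[:debug] = true }
  end.parse!(argv.dup)
  options
end

p parse_cli(%w[-s input.html --out=glossary.html -d])
```

Options which are not given simply do not appear in the result, so a later merge with the configuration-file cannot overwrite a stored value by accident.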
@@ -0,0 +1,183 @@
+ #encoding: UTF-8
+ =begin
+ /***************************************************************************
+ * ©2016-2017 Michael Uplawski <michael.uplawski@uplawski.eu> *
+ * *
+ * This program is free software; you can redistribute it and/or modify *
+ * it under the terms of the GNU General Public License as published by *
+ * the Free Software Foundation; either version 3 of the License, or *
+ * (at your option) any later version. *
+ * *
+ * This program is distributed in the hope that it will be useful, *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
+ * GNU General Public License for more details. *
+ * *
+ * You should have received a copy of the GNU General Public License *
+ * along with this program; if not, write to the *
+ * Free Software Foundation, Inc., *
+ * 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. *
+ ***************************************************************************/
+ =end
+ require 'yaml'
+ require 'singleton'
+ require 'ostruct'
+ require_relative 'constants'
+ require_relative 'file_checking'
+ require_relative 'logging'
+ require_relative 'translating'
+
+ class Configuration
+   include File_Checking
+   include Translating
+   include Logging
+
+   # default configuration file
+   @@config_file = File::dirname(__FILE__) << File::Separator << 'config'
+
+   # do initializations
+   def initialize(options)
+     init_logger(STDOUT)
+     @log.level = $log_level
+     set(options)
+     @log.debug('config-file is ' << @@config_file)
+   end
+
+   # Returns the path to the user's configuration-file. On first use, the
+   # file is created as a copy of the default configuration.
+   def user_conf
+     confdir = ENV['HOME'].dup << File::Separator << '.config'
+     Dir.mkdir(confdir) if !Dir.exist?(confdir)
+     confdir = confdir << File::Separator << APPNAME
+     Dir.mkdir(confdir) if !Dir.exist?(confdir)
+     config = confdir << File::Separator << 'config'
+     if(!File.exist?(config ) )
+       begin
+         File.open(config, 'w') {|co| co.write(File.read(@@config_file))}
+         @log.info("Created user-version of the configuration-file in\n\t" << config)
+       rescue StandardError => ex
+         @log.error('Cannot create the configuration: ' << ex.message)
+         give_up
+       end
+     end
+     return config
+   end
+
+   attr_reader :dicts, :template, :fields, :placeholders, :fdelim
+
+   # return any value stored in @config
+   def method_missing(msg, *args)
+     ms = msg.to_sym
+     # Exception-handling is not a control-structure.
+     # This is.
+     if @config[ms]
+       return @config[ms]
+     else
+       return nil
+     end
+   end
+
+   private
+
+   # Configure with the command-line arguments.
+   def set(options)
+     @log.debug('merging options ' << options.to_s)
+     # User-provided configuration-file?
+     if(options['config'])
+       cf = options['config']
+       @log.debug('config should be ' << cf.to_s)
+       msg = file_check(cf, :file, :readable)
+       if(!msg)
+         @@config_file = cf
+       else
+         msg = ("The file %s " << msg.split[1,100].join(' ')) %msg.split[0]
+         @log.error(("ERROR! Unsuitable file") << ' ' << msg)
+         give_up
+       end
+     else
+       @@config_file = user_conf
+     end
+
+     @log.debug('config-file is ' << @@config_file)
+
+     # read defaults from configuration-file
+     co = OpenStruct.new(YAML::load_file(@@config_file))
+
+     # merge and overwrite with the command-line arguments
+     @config = co.to_h.update(options.to_h)
+     if(! @config[:source] )
+       msg = ('missing argument %s') %'source'
+       @log.error msg
+       @log.error(("Start this program with parameter -h or --help to see the usage-message.") )
+       give_up
+     end
+
+     # ----- define the template html ----
+     warn = false
+     # set template
+     if @config[:template]
+       @template = @config[:template]
+     else
+       @log.warn 'Using default-template!'
+       warn ||= true
+     end
+     # fields in the template file
+     if @config[:placeholders] && @config[:template]
+       @placeholders = @config[:placeholders]
+       @log.debug('placeholders from config: ' << @placeholders.to_s)
+     else
+       @placeholders = Template.default(:placeholders)
+       # Name the actual mismatch; stay silent if neither value is set.
+       if @config[:placeholders]
+         @log.warn 'Placeholders are defined, but no template-file is given.'
+       elsif @config[:template]
+         @log.warn 'Template is given, but placeholders are not defined.'
+       end
+       @log.warn 'Using default placeholders ' << @placeholders.to_a.collect{|p|p.join(': ')}.join(', ')
+       warn = true
+     end
+     @fields = [@placeholders[:dict_list], @placeholders[:glossary]]
+     # the field-delimiter
+     if @config[:fdelim] && @config[:template]
+       @fdelim = @config[:fdelim]
+     else
+       @fdelim = Template.default(:fdelim)
+       # Name the actual mismatch; stay silent if neither value is set.
+       if @config[:template]
+         @log.warn 'Template is given, but field delimiters are not defined.'
+       elsif @config[:fdelim]
+         @log.warn 'Field delimiters are defined but no template is given.'
+       end
+       @log.warn 'Using default delimiters ' << @fdelim << ', ' << @fdelim.reverse
+       warn ||= true
+     end
+
+     # ----------- template is defined --------
+     dictionaries = @config[:dictionaries]
+     @dicts = Array.new
+     if(dictionaries)
+       dictionaries.each do |d|
+         @dicts << Dictionary.new(d[:name], d[:url], d[:xpath], d[:color])
+       end
+       @log.debug('dicts are from config' << @dicts.to_s)
+     else
+       @log.warn( %~NO DICTIONARIES have been set in the configuration!
+ Will use the defaults, which is probably NOT what you want!
+ Defaults are: %s~ %[URL_DICT1.dup << ', ' << URL_DICT2.dup])
+       warn ||= true
+       @dicts << Dictionary.new(NAME_DICT1, URL_DICT1,XPATH_DICT1, DICT_COLORS[0])
+       @dicts << Dictionary.new(NAME_DICT2, URL_DICT2, XPATH_DICT2, DICT_COLORS[1])
+       @log.debug('dicts are from constants' << @dicts.to_s)
+     end
+     @log.warn "HINT: Adapt #{@@config_file} to avoid warnings in the future." if warn
+   end
+
+   # exit on error
+   def give_up
+     @log.error("\t" << ("Aborting. Bye!"))
+     exit false
+   end
+ end
+
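+ #------- TEST -----------
+ # $0 is compared as a variable; the string "$0" would never match.
+ if __FILE__ == $0
+   # "set" is private and is called by "initialize", which needs options.
+   # The source-file named here is only a placeholder.
+   conf = Configuration.new(OpenStruct.new(source: 'input.html'))
+ end
+ #eof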
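The override rule applied in `set` can be illustrated with plain hashes. This is a sketch, not the class's code: `merge_settings` is a hypothetical helper and all keys and values are invented for the example. It also shows one subtlety: ArgParser initializes `target` with nil, so a plain `Hash#update` would replace a `target` value from the configuration-file with nil, if such a value were stored there; dropping nil values first avoids that.

```ruby
# Hypothetical helper: values from the configuration-file act as defaults,
# command-line values win. Options which were never given (nil) are dropped
# before merging, so that they cannot erase a stored value.
def merge_settings(file_settings, cli_options)
  file_settings.merge(cli_options.compact)
end

file_settings = { template: 'my_template.html', html_tag: 'span', target: 'glossary.html' }
cli_options   = { source: 'touchpad_fr.html', html_tag: 'em', target: nil }

p merge_settings(file_settings, cli_options)
```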
data/lib/constants.rb ADDED
@@ -0,0 +1,55 @@
+ #encoding: UTF-8
+ =begin
+ /***************************************************************************
+ * ©2015-2017 Michael Uplawski <michael.uplawski@uplawski.eu> *
+ * *
+ * This program is free software; you can redistribute it and/or modify *
+ * it under the terms of the GNU General Public License as published by *
+ * the Free Software Foundation; either version 3 of the License, or *
+ * (at your option) any later version. *
+ * *
+ * This program is distributed in the hope that it will be useful, *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
+ * GNU General Public License for more details. *
+ * *
+ * You should have received a copy of the GNU General Public License *
+ * along with this program; if not, write to the *
+ * Free Software Foundation, Inc., *
+ * 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. *
+ ***************************************************************************/
+ =end
+ require_relative 'dictionary'
+ require 'logger'
+ require 'date'
+
+ APPNAME = 'HTML2Index'
+
+ # URL and xpath for the definitions.
+ #
+ # The expression is appended to the URL. Example:
+ # Given a dictionary url like
+ # URL_MY_DICT = "http://my_technical_dictionary.somewhere.com/words/"
+ # the url to search an expression will be
+ # "http://my_technical_dictionary.somewhere.com/words/expression"
+ # In other words: Note here the part of the url *before* the expression
+ # ---
+ # The xpath must identify all HTML-elements which contain definitions.
+ #
+ NAME_DICT1 ||= 'Larousse'
+ URL_DICT1 ||= "http://www.larousse.com/fr/dictionnaires/francais/"
+ XPATH_DICT1 ||= ".//li[@class='DivisionDefinition']"
+
+ NAME_DICT2 ||= 'JargonF'
+ URL_DICT2 ||= "http://jargonf.org/wiki/"
+ XPATH_DICT2 ||= ".//div[@id='mw-content-text']/*"
+
+ # colors which are connected to one dictionary, each
+ DICT_COLORS ||= ['a000a0', '00a000']
+
+ # definitions which cause problems are logged.
+ PROBLEM_LOG ||= 'html2index_problems.txt'
+ $log_level = Logger::INFO
+
+ # meta-tag for the html-output
+ GeneratorMeta = "<meta name=\"generator\" content=\"HTML2Index ©2015-#{Date.today.strftime('%Y')} michael.uplawski@uplawski.eu\" />"
data/lib/definition.rb ADDED
@@ -0,0 +1,43 @@
+ #encoding: UTF-8
+ =begin
+ /***************************************************************************
+ * ©2015-2017 Michael Uplawski <michael.uplawski@uplawski.eu> *
+ * *
+ * This program is free software; you can redistribute it and/or modify *
+ * it under the terms of the GNU General Public License as published by *
+ * the Free Software Foundation; either version 3 of the License, or *
+ * (at your option) any later version. *
+ * *
+ * This program is distributed in the hope that it will be useful, *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
+ * GNU General Public License for more details. *
+ * *
+ * You should have received a copy of the GNU General Public License *
+ * along with this program; if not, write to the *
+ * Free Software Foundation, Inc., *
+ * 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. *
+ ***************************************************************************/
+ =end
+ require_relative 'constants'
+ require_relative 'translating'
+
+ # One explanation of an expression, as delivered by one dictionary.
+ class Definition
+   include Comparable
+
+   attr_reader :origin, :expression, :definition
+   attr_accessor :color
+
+   def initialize(origin, expression, definition)
+     @origin = origin
+     @expression = expression
+     @definition = definition
+     @color = nil
+   end
+
+   # Definitions are ordered by their expression.
+   def <=>(other_def)
+     return @expression <=> other_def.expression
+   end
+ end
+
+
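Including Comparable and defining `<=>` is what lets a list of definitions be sorted for the glossary. The sketch below uses a stand-in class, `SketchDefinition`, so that it runs without the gem's other files; the sample values are invented.

```ruby
# Stand-in for the Definition class above, reduced to what sorting needs.
class SketchDefinition
  include Comparable
  attr_reader :origin, :expression, :definition

  def initialize(origin, expression, definition)
    @origin = origin
    @expression = expression
    @definition = definition
  end

  # Order definitions by their expression, as Definition does.
  def <=>(other)
    expression <=> other.expression
  end
end

defs = [
  SketchDefinition.new('JargonF', 'xinput', '...'),
  SketchDefinition.new('JargonF', 'vi', '...')
]
puts defs.sort.map(&:expression).join(', ')
```

Comparable also supplies `<`, `>`, `==` and `between?` based on the same ordering, so two definitions of the same expression from different dictionaries compare as equal here.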