linkify-it-rb 0.1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+ metadata.gz: 8899eb2fafae9f6ce4105221dc2342c93de1ca5c
+ data.tar.gz: 493bb38504a90c6f6c9dfa870364758370ee548e
+ SHA512:
+ metadata.gz: 01ebcaaaa3238631990a3212f5058d69bd58b7f336576e82ac101745f25ddf7dbd948b81013eda800a388d845ab99fa23f75d2d6d90db35368d90a9b92e5f6b2
+ data.tar.gz: 0b9c9fbfe357d9b3c78c2def0bd9f51f08cb0b39335e85c8bb791d483c4e589bf3a5d860357584591e3fe8277f6226310dff04a5235ed0c783472d7ac7803663
@@ -0,0 +1,170 @@
+ # linkify-it-rb
+
+ Link recognition library with full Unicode support. Focused on high-quality link-pattern detection in plain text. For use with both Ruby and RubyMotion.
+
+ This gem is a port of the [linkify-it javascript package](https://github.com/markdown-it/linkify-it) by Vitaly Puzrin, which is used by the [markdown-it](https://github.com/markdown-it/markdown-it) package.
+
+ __[Javascript Demo](http://markdown-it.github.io/linkify-it/)__
+
+ _Note:_ This gem is still in progress - some of the Unicode support is still being worked on.
+
+
+ ## To be updated: Original Javascript package documentation
+
+ Why it's awesome:
+
+ - Full unicode support, _with astral characters_!
+ - International domains support.
+ - Allows rules extension & custom normalizers.
+
+
+ Install
+ -------
+
+ ```bash
+ npm install linkify-it --save
+ ```
+
+ Browserification is also supported.
+
+
+ Usage examples
+ --------------
+
+ ##### Example 1
+
+ ```js
+ var linkify = require('linkify-it')();
+
+ // Reload full tlds list & add unofficial `.onion` domain.
+ linkify
+   .tlds(require('tlds'))   // Reload with full tlds list
+   .tlds('.onion', true)    // Add unofficial `.onion` domain
+   .add('git:', 'http:')    // Add `git:` protocol as "alias"
+   .add('ftp:', null);      // Disable `ftp:` protocol
+
+ console.log(linkify.test('Site github.com!')); // true
+
+ console.log(linkify.match('Site github.com!')); // [ {
+ //   schema: "",
+ //   index: 5,
+ //   lastIndex: 15,
+ //   raw: "github.com",
+ //   text: "github.com",
+ //   url: "http://github.com",
+ // } ]
+ ```
+
+ ##### Example 2. Add twitter mentions handler
+
+ ```js
+ linkify.add('@', {
+   validate: function (text, pos, self) {
+     var tail = text.slice(pos);
+
+     if (!self.re.twitter) {
+       self.re.twitter = new RegExp(
+         '^([a-zA-Z0-9_]){1,15}(?!_)(?=$|' + self.re.src_ZPCcCf + ')'
+       );
+     }
+     if (self.re.twitter.test(tail)) {
+       // Linkifier allows punctuation chars before prefix,
+       // but we additionally disable `@` ("@@mention" is invalid)
+       if (pos >= 2 && text[pos - 2] === '@') {
+         return false;
+       }
+       return tail.match(self.re.twitter)[0].length;
+     }
+     return 0;
+   },
+   normalize: function (match) {
+     match.url = 'https://twitter.com/' + match.url.replace(/^@/, '');
+   }
+ });
+ ```
+
+
+ API
+ ---
+
+ __[API documentation](http://markdown-it.github.io/linkify-it/doc)__
+
+ ### new LinkifyIt(schemas)
+
+ Creates new linkifier instance with optional additional schemas.
+ Can be called without `new` keyword for convenience.
+
+ By default understands:
+
+ - `http(s)://...` , `ftp://...`, `mailto:...` & `//...` links
+ - "fuzzy" links and emails (google.com, foo@bar.com).
+
+ `schemas` is an object, where each key/value describes protocol/rule:
+
+ - __key__ - link prefix (usually, protocol name with `:` at the end, `skype:`
+   for example). `linkify-it` makes sure that the prefix is not preceded by an
+   alphanumeric char.
+ - __value__ - rule to check the tail after the link prefix
+   - _String_ - just an alias to an existing rule
+   - _Object_
+     - _validate_ - validator function (should return matched length on success),
+       or `RegExp`.
+     - _normalize_ - optional function to normalize text & url of matched result
+       (for example, for twitter mentions).
+
+
+ ### .test(text)
+
+ Searches for a linkifiable pattern and returns `true` on success or `false` on fail.
+
+
+ ### .pretest(text)
+
+ Quick check whether a link may exist. Can be used to optimize more expensive
+ `.test()` calls. Returns `false` if a link cannot be found, and `true` if a
+ `.test()` call is needed to know for sure.
+
+
+ ### .testSchemaAt(text, name, offset)
+
+ Similar to `.test()` but checks only a specific protocol tail exactly at the
+ given position. Returns the length of the found pattern (0 on fail).
+
+
+ ### .match(text)
+
+ Returns an `Array` of found link matches, or `null` if nothing is found.
+
+ Each match has:
+
+ - __schema__ - link schema; can be empty for fuzzy links, or `//` for
+   protocol-neutral links.
+ - __index__ - offset of matched text
+ - __lastIndex__ - index of the next char after the match end
+ - __raw__ - matched text
+ - __text__ - normalized text
+ - __url__ - link generated from matched text
+
+
+ ### .tlds(list[, keepOld])
+
+ Load (or merge) a new tlds list. These are used for fuzzy links (without a
+ prefix) to avoid false positives. By default this algorithm is used:
+
+ - hostnames with any 2-letter root zone are ok.
+ - biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф
+   are ok.
+ - encoded (`xn--...`) root zones are ok.
+
+ If the list is replaced, then an exact match for 2-char root zones will be checked.
+
+
+ ### .add(schema, definition)
+
+ Add a new rule with the `schema` prefix. For definition details see the
+ constructor description. To disable an existing rule use `.add(name, null)`.
+
+
+ ## License
+
+ [MIT](https://github.com/markdown-it/linkify-it/blob/master/LICENSE)
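The fuzzy-link rules in the README above (known root zones plus any 2-letter zone) can be illustrated with a small self-contained Ruby sketch. This is a simplification for intuition only, not the gem's actual matcher, which composes far richer patterns (see `lib/linkify-it-rb/re.rb` below):

```ruby
# Simplified "fuzzy" host detection: a dotted hostname whose root zone is
# either a known TLD or any 2-letter zone. Illustrative only.
TLDS = %w[biz com edu gov net org pro web xxx aero asia coop info museum name shop]
FUZZY_HOST = /(?:[a-z0-9-]+\.)+(?:#{TLDS.join('|')}|[a-z]{2})\b/i

def fuzzy_link?(text)
  !(text =~ FUZZY_HOST).nil?
end

puts fuzzy_link?('Site github.com!')  # true
puts fuzzy_link?('no links here')     # false
```

The real library additionally guards against false positives with host terminators and Unicode character classes; this sketch only captures the TLD-whitelist idea.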
@@ -0,0 +1,18 @@
+ # encoding: utf-8
+
+ if defined?(Motion::Project::Config)
+
+   lib_dir_path = File.dirname(File.expand_path(__FILE__))
+   Motion::Project::App.setup do |app|
+     app.files.unshift(Dir.glob(File.join(lib_dir_path, "linkify-it-rb/**/*.rb")))
+   end
+
+   require 'uc.micro-rb'
+
+ else
+
+   require 'uc.micro-rb'
+   require 'linkify-it-rb/re'
+   require 'linkify-it-rb/index'
+
+ end
@@ -0,0 +1,503 @@
+ class Linkify
+   include ::LinkifyRe
+
+   attr_accessor :__index__, :__last_index__, :__text_cache__, :__schema__, :__compiled__
+   attr_accessor :re, :bypass_normalizer
+
+   # DON'T try to make PRs with changes. Extend TLDs with LinkifyIt.tlds() instead
+   TLDS_DEFAULT = 'biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф'.split('|')
+
+   DEFAULT_SCHEMAS = {
+     'http:' => {
+       validate: lambda do |text, pos, obj|
+         tail = text.slice(pos..-1)
+
+         if (!obj.re[:http])
+           # compile lazily, because "host"-containing variables can change on tlds update.
+           obj.re[:http] = Regexp.new('^\\/\\/' + LinkifyRe::SRC_AUTH + LinkifyRe::SRC_HOST_PORT_STRICT + LinkifyRe::SRC_PATH, Regexp::IGNORECASE)
+         end
+         if obj.re[:http] =~ tail
+           return tail.match(obj.re[:http])[0].length
+         end
+         return 0
+       end
+     },
+     'https:' => 'http:',
+     'ftp:' => 'http:',
+     '//' => {
+       validate: lambda do |text, pos, obj|
+         tail = text.slice(pos..-1)
+
+         if (!obj.re[:no_http])
+           # compile lazily, because "host"-containing variables can change on tlds update.
+           obj.re[:no_http] = Regexp.new('^' + LinkifyRe::SRC_AUTH + LinkifyRe::SRC_HOST_PORT_STRICT + LinkifyRe::SRC_PATH, Regexp::IGNORECASE)
+         end
+
+         if (obj.re[:no_http] =~ tail)
+           # should not be `://`; that protects from errors in the protocol name
+           return 0 if (pos >= 3 && text[pos - 3] == ':')
+           return tail.match(obj.re[:no_http])[0].length
+         end
+         return 0
+       end
+     },
+     'mailto:' => {
+       validate: lambda do |text, pos, obj|
+         tail = text.slice(pos..-1)
+
+         if (!obj.re[:mailto])
+           obj.re[:mailto] = Regexp.new('^' + LinkifyRe::SRC_EMAIL_NAME + '@' + LinkifyRe::SRC_HOST_STRICT, Regexp::IGNORECASE)
+         end
+         if (obj.re[:mailto] =~ tail)
+           return tail.match(obj.re[:mailto])[0].length
+         end
+         return 0
+       end
+     }
+   }
+
+   #------------------------------------------------------------------------------
+   def escapeRE(str)
+     # NOTE: the JS original used replace(..., '\\$&'); in Ruby the matched
+     # text is available in the block, so prepend the backslash there.
+     return str.gsub(/[\.\?\*\+\^\$\[\]\\\(\)\{\}\|\-]/) { |m| "\\#{m}" }
+   end
+
+   #------------------------------------------------------------------------------
+   def resetScanCache
+     @__index__ = -1
+     @__text_cache__ = ''
+   end
+
+   #------------------------------------------------------------------------------
+   def createValidator(re)
+     return lambda do |text, pos, obj|
+       tail = text.slice(pos..-1)
+
+       (re =~ tail) ? tail.match(re)[0].length : 0
+     end
+   end
+
+   #------------------------------------------------------------------------------
+   def createNormalizer()
+     return lambda do |match, obj|
+       obj.normalize(match)
+     end
+   end
+
+   # Schemas compiler. Build regexps.
+   #
+   #------------------------------------------------------------------------------
+   def compile
+
+     # Load & clone RE patterns.
+     re = @re = {}
+
+     # Define dynamic patterns
+     tlds = @__tlds__.dup
+
+     if (!@__tlds_replaced__)
+       tlds.push('[a-z]{2}')
+     end
+     tlds.push(LinkifyRe::SRC_XN) # `re` was just reset, so use the module constant
+
+     re[:src_tlds] = tlds.join('|')
+
+     untpl = lambda { |tpl| tpl.gsub('%TLDS%', re[:src_tlds]) }
+
+     re[:email_fuzzy]     = Regexp.new(untpl.call(LinkifyRe::TPL_EMAIL_FUZZY), Regexp::IGNORECASE)
+     re[:link_fuzzy]      = Regexp.new(untpl.call(LinkifyRe::TPL_LINK_FUZZY), Regexp::IGNORECASE)
+     re[:host_fuzzy_test] = Regexp.new(untpl.call(LinkifyRe::TPL_HOST_FUZZY_TEST), Regexp::IGNORECASE)
+
+     #
+     # Compile each schema
+     #
+
+     aliases = []
+
+     @__compiled__ = {} # Reset compiled data
+
+     schemaError = lambda do |name, val|
+       raise StandardError, '(LinkifyIt) Invalid schema "' + name + '": ' + val.to_s
+     end
+
+     @__schemas__.each do |name, val|
+
+       # skip disabled schemas
+       next if (val == nil)
+
+       compiled = { validate: nil, link: nil }
+
+       @__compiled__[name] = compiled
+
+       if (val.is_a? Hash)
+         if (val[:validate].is_a? Regexp)
+           compiled[:validate] = createValidator(val[:validate])
+         elsif (val[:validate].is_a? Proc)
+           compiled[:validate] = val[:validate]
+         else
+           schemaError.call(name, val)
+         end
+
+         if (val[:normalize].is_a? Proc)
+           compiled[:normalize] = val[:normalize]
+         elsif (!val[:normalize])
+           compiled[:normalize] = createNormalizer()
+         else
+           schemaError.call(name, val)
+         end
+         next
+       end
+
+       if (val.is_a? String)
+         aliases.push(name)
+         next
+       end
+
+       schemaError.call(name, val)
+     end
+
+     #
+     # Compile postponed aliases
+     #
+
+     aliases.each do |an_alias|
+       if (!@__compiled__[@__schemas__[an_alias]])
+         # Silently fail on missed schemas to avoid errors on disable.
+         # schemaError.call(an_alias, @__schemas__[an_alias])
+       else
+         @__compiled__[an_alias][:validate] = @__compiled__[@__schemas__[an_alias]][:validate]
+         @__compiled__[an_alias][:normalize] = @__compiled__[@__schemas__[an_alias]][:normalize]
+       end
+     end
+
+     #
+     # Fake record for guessed links
+     #
+     @__compiled__[''] = { validate: nil, normalize: createNormalizer }
+
+     #
+     # Build schema condition, and filter disabled & fake schemas
+     #
+     slist = @__compiled__.select { |name, val| name.length > 0 && !val.nil? }.keys.map { |str| escapeRE(str) }.join('|')
+
+     # (?!_) causes a 1.5x slowdown
+     @re[:schema_test]   = Regexp.new('(^|(?!_)(?:>|' + LinkifyRe::SRC_Z_P_CC_CF + '))(' + slist + ')', Regexp::IGNORECASE)
+     @re[:schema_search] = Regexp.new('(^|(?!_)(?:>|' + LinkifyRe::SRC_Z_P_CC_CF + '))(' + slist + ')', Regexp::IGNORECASE)
+
+     @re[:pretest] = Regexp.new(
+       '(' + @re[:schema_test].source + ')|' +
+       '(' + @re[:host_fuzzy_test].source + ')|' + '@', Regexp::IGNORECASE)
+
+     #
+     # Cleanup
+     #
+
+     resetScanCache
+   end
+
+   # Match result. Single element of the array returned by [[LinkifyIt#match]]
+   #------------------------------------------------------------------------------
+   class Match
+     attr_accessor :schema, :index, :lastIndex, :raw, :text, :url
+
+     def initialize(obj, shift)
+       start = obj.__index__
+       endt  = obj.__last_index__
+       text  = obj.__text_cache__.slice(start...endt)
+
+       # Match#schema -> String
+       #
+       # Prefix (protocol) for the matched string.
+       @schema = obj.__schema__.downcase
+
+       # Match#index -> Number
+       #
+       # First position of the matched string.
+       @index = start + shift
+
+       # Match#lastIndex -> Number
+       #
+       # Next position after the matched string.
+       @lastIndex = endt + shift
+
+       # Match#raw -> String
+       #
+       # Matched string.
+       @raw = text
+
+       # Match#text -> String
+       #
+       # Normalized text of the matched string.
+       @text = text
+
+       # Match#url -> String
+       #
+       # Normalized url of the matched string.
+       @url = text
+     end
+
+     #------------------------------------------------------------------------------
+     def self.createMatch(obj, shift)
+       match = Match.new(obj, shift)
+       obj.__compiled__[match.schema][:normalize].call(match, obj)
+       return match
+     end
+   end
+
+
+
+   # new LinkifyIt(schemas)
+   # - schemas (Object): Optional. Additional schemas to validate (prefix/validator)
+   #
+   # Creates new linkifier instance with optional additional schemas.
+   # Can be called without `new` keyword for convenience.
+   #
+   # By default understands:
+   #
+   # - `http(s)://...` , `ftp://...`, `mailto:...` & `//...` links
+   # - "fuzzy" links and emails (example.com, foo@bar.com).
+   #
+   # `schemas` is an object, where each key/value describes protocol/rule:
+   #
+   # - __key__ - link prefix (usually, protocol name with `:` at the end, `skype:`
+   #   for example). `linkify-it` makes sure that the prefix is not preceded by an
+   #   alphanumeric char or symbol. Only whitespace and punctuation allowed.
+   # - __value__ - rule to check the tail after the link prefix
+   #   - _String_ - just an alias to an existing rule
+   #   - _Object_
+   #     - _validate_ - validator function (should return matched length on success),
+   #       or `RegExp`.
+   #     - _normalize_ - optional function to normalize text & url of matched result
+   #       (for example, for @twitter mentions).
+   #------------------------------------------------------------------------------
+   def initialize(schemas = {})
+     # Cache last tested result. Used to skip repeating steps on next `match` call.
+     @__index__      = -1
+     @__last_index__ = -1 # Next scan position
+     @__schema__     = ''
+     @__text_cache__ = ''
+
+     @__schemas__  = {}.merge!(DEFAULT_SCHEMAS).merge!(schemas)
+     @__compiled__ = {}
+
+     @__tlds__          = TLDS_DEFAULT
+     @__tlds_replaced__ = false
+
+     @re = {}
+
+     @bypass_normalizer = false # only used in testing scenarios
+
+     compile
+   end
+
+
+   # chainable
+   # LinkifyIt#add(schema, definition)
+   # - schema (String): rule name (fixed pattern prefix)
+   # - definition (String|RegExp|Object): schema definition
+   #
+   # Add new rule definition. See constructor description for details.
+   #------------------------------------------------------------------------------
+   def add(schema, definition)
+     @__schemas__[schema] = definition
+     compile
+     return self
+   end
+
+
+   # LinkifyIt#test(text) -> Boolean
+   #
+   # Searches for a linkifiable pattern and returns `true` on success or `false` on fail.
+   #------------------------------------------------------------------------------
+   def test(text)
+     # Reset scan cache
+     @__text_cache__ = text
+     @__index__      = -1
+
+     return false if text.empty?
+
+     # try to scan for a link with schema - that's the simplest rule
+     if @re[:schema_test] =~ text
+       re = @re[:schema_search]
+       pos = 0
+       # advance through successive schema hits (the JS original used the
+       # regexp's lastIndex; here we track the scan position explicitly)
+       while ((m = re.match(text, pos)) != nil)
+         pos = m.end(0)
+         len = testSchemaAt(text, m[2], m.end(0))
+         if (len > 0)
+           @__schema__     = m[2]
+           @__index__      = m.begin(0) + m[1].length
+           @__last_index__ = m.begin(0) + m[0].length + len
+           break
+         end
+       end
+     end
+
+     if (@__compiled__['http:'])
+       # guess schemaless links
+
+       tld_pos = text.index(@re[:host_fuzzy_test])
+       if !tld_pos.nil?
+         # if tld is located after the found link - no need to check the fuzzy pattern
+         if (@__index__ < 0 || tld_pos < @__index__)
+           if ((ml = text.match(@re[:link_fuzzy])) != nil)
+
+             shift = ml.begin(0) + ml[1].length
+
+             if (@__index__ < 0 || shift < @__index__)
+               @__schema__     = ''
+               @__index__      = shift
+               @__last_index__ = ml.begin(0) + ml[0].length
+             end
+           end
+         end
+       end
+     end
+
+     if (@__compiled__['mailto:'])
+       # guess schemaless emails
+       at_pos = text.index('@')
+       if !at_pos.nil?
+         # We can't skip this check, because cases like these are possible:
+         # 192.168.1.1@gmail.com, my.in@example.com
+         if ((me = text.match(@re[:email_fuzzy])) != nil)
+
+           shift = me.begin(0) + me[1].length
+           nextc = me.begin(0) + me[0].length
+
+           if (@__index__ < 0 || shift < @__index__ ||
+               (shift == @__index__ && nextc > @__last_index__))
+             @__schema__     = 'mailto:'
+             @__index__      = shift
+             @__last_index__ = nextc
+           end
+         end
+       end
+     end
+
+     return @__index__ >= 0
+   end
+
+
+   # LinkifyIt#pretest(text) -> Boolean
+   #
+   # Very quick check that can give false positives. Returns `true` if a link
+   # may exist. Can be used for speed optimization when you need to check that
+   # a link does NOT exist.
+   #------------------------------------------------------------------------------
+   def pretest(text)
+     return !(@re[:pretest] =~ text).nil?
+   end
+
+
+   # LinkifyIt#testSchemaAt(text, name, position) -> Number
+   # - text (String): text to scan
+   # - name (String): rule (schema) name
+   # - position (Number): text offset to check from
+   #
+   # Similar to [[LinkifyIt#test]] but checks only a specific protocol tail exactly
+   # at the given position. Returns length of the found pattern (0 on fail).
+   #------------------------------------------------------------------------------
+   def testSchemaAt(text, schema, pos)
+     # If an unsupported schema check is requested - terminate
+     if (!@__compiled__[schema.downcase])
+       return 0
+     end
+     return @__compiled__[schema.downcase][:validate].call(text, pos, self)
+   end
+
+
+   # LinkifyIt#match(text) -> Array|nil
+   #
+   # Returns an array of found link descriptions or `nil` on fail. We strongly
+   # recommend using [[LinkifyIt#test]] first, for best speed.
+   #
+   # ##### Result match description
+   #
+   # - __schema__ - link schema; can be empty for fuzzy links, or `//` for
+   #   protocol-neutral links.
+   # - __index__ - offset of matched text
+   # - __lastIndex__ - index of the next char after the match end
+   # - __raw__ - matched text
+   # - __text__ - normalized text
+   # - __url__ - link generated from matched text
+   #------------------------------------------------------------------------------
+   def match(text)
+     shift  = 0
+     result = []
+
+     # Try to take the previous element from cache, if .test() was called before
+     if (@__index__ >= 0 && @__text_cache__ == text)
+       result.push(Match.createMatch(self, shift))
+       shift = @__last_index__
+     end
+
+     # Cut head if cache was used
+     tail = shift > 0 ? text.slice(shift..-1) : text
+
+     # Scan string until end reached
+     while (self.test(tail))
+       result.push(Match.createMatch(self, shift))
+
+       tail = tail.slice(@__last_index__..-1)
+       shift += @__last_index__
+     end
+
+     if (result.length > 0)
+       return result
+     end
+
+     return nil
+   end
+
+
+   # chainable
+   # LinkifyIt#tlds(list [, keepOld]) -> this
+   # - list (Array): list of tlds
+   # - keepOld (Boolean): merge with current list if `true` (`false` by default)
+   #
+   # Load (or merge) a new tlds list. These are used for fuzzy links (without a
+   # prefix) to avoid false positives. By default this algorithm is used:
+   #
+   # - hostnames with any 2-letter root zone are ok.
+   # - biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф
+   #   are ok.
+   # - encoded (`xn--...`) root zones are ok.
+   #
+   # If the list is replaced, then an exact match for 2-char root zones will be checked.
+   #------------------------------------------------------------------------------
+   def tlds(list, keepOld = false)
+     list = list.is_a?(Array) ? list : [ list ]
+
+     if (!keepOld)
+       @__tlds__ = list.dup
+       @__tlds_replaced__ = true
+       compile
+       return self
+     end
+
+     @__tlds__ = @__tlds__.concat(list).sort.uniq.reverse
+
+     compile
+     return self
+   end
+
+   # LinkifyIt#normalize(match)
+   #
+   # Default normalizer (used if a schema does not define its own).
+   #------------------------------------------------------------------------------
+   def normalize(match)
+     return if @bypass_normalizer
+
+     # Do minimal possible changes by default. Need to collect feedback prior
+     # to moving forward https://github.com/markdown-it/linkify-it/issues/1
+
+     # fuzzy links have an empty schema ('' is truthy in Ruby, unlike JS)
+     match.url = 'http://' + match.url if !match.schema || match.schema.empty?
+
+     if (match.schema == 'mailto:' && !(/^mailto\:/i =~ match.url))
+       match.url = 'mailto:' + match.url
+     end
+   end
+
+ end
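The `match` method above repeatedly calls `test` on a shrinking tail while accumulating a global `shift`, so that reported offsets stay relative to the original string. The same scanning pattern, reduced to a plain regexp for illustration (the `scan_all` helper is a sketch, not part of the gem):

```ruby
# Scan-with-shift: collect all matches by cutting off the consumed head each
# iteration and tracking the cumulative offset, as Linkify#match does.
def scan_all(text, re)
  shift = 0
  found = []
  while (m = re.match(text))
    found << [m.begin(0) + shift, m[0]]  # position in the *original* string
    text   = text[m.end(0)..-1]          # cut head, like tail.slice(...)
    shift += m.end(0)
  end
  found
end

p scan_all('a.com and b.org', /[a-z]+\.(?:com|org)/)
# => [[0, "a.com"], [10, "b.org"]]
```

Cutting the head keeps each `match` call anchored near the string start, which is why the class caches `__last_index__` rather than re-scanning from zero.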
@@ -0,0 +1,111 @@
+ module LinkifyRe
+
+   # Use direct extract instead of `regenerate` to reduce size
+   SRC_ANY = UCMicro::Properties::Any::REGEX
+   SRC_CC  = UCMicro::Categories::Cc::REGEX
+   SRC_CF  = UCMicro::Categories::Cf::REGEX
+   SRC_Z   = UCMicro::Categories::Z::REGEX
+   SRC_P   = UCMicro::Categories::P::REGEX
+
+   # \p{Z}|\p{P}|\p{Cc}|\p{Cf} (white space + punctuation + control + format)
+   SRC_Z_P_CC_CF = [ SRC_Z, SRC_P, SRC_CC, SRC_CF ].join('|')
+
+   # \p{Z}|\p{Cc}|\p{Cf} (white space + control + format)
+   SRC_Z_CC_CF = [ SRC_Z, SRC_CC, SRC_CF ].join('|')
+
+   # All possible word characters (everything without punctuation, spaces & controls)
+   # Defined via punctuation & spaces to save space
+   # Should be something like \p{L}\p{N}\p{S}\p{M} (\w but without `_`)
+   SRC_PSEUDO_LETTER = '(?:(?!' + SRC_Z_P_CC_CF + ')' + SRC_ANY.source + ')'
+   # The same as above, but without [0-9]
+   SRC_PSEUDO_LETTER_NON_D = '(?:(?![0-9]|' + SRC_Z_P_CC_CF + ')' + SRC_ANY.source + ')'
+
+   #------------------------------------------------------------------------------
+
+   SRC_IP4 = '(?:(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
+   SRC_AUTH = '(?:(?:(?!' + SRC_Z_CC_CF + ').)+@)?'
+
+   SRC_PORT = '(?::(?:6(?:[0-4]\\d{3}|5(?:[0-4]\\d{2}|5(?:[0-2]\\d|3[0-5])))|[1-5]?\\d{1,4}))?'
+
+   SRC_HOST_TERMINATOR = '(?=$|' + SRC_Z_P_CC_CF + ')(?!-|_|:\\d|\\.-|\\.(?!$|' + SRC_Z_P_CC_CF + '))'
+
+   SRC_PATH =
+     '(?:' +
+       '[/?#]' +
+       '(?:' +
+         '(?!' + SRC_Z_CC_CF + '|[()\\[\\]{}.,"\'?!\\-]).|' +
+         '\\[(?:(?!' + SRC_Z_CC_CF + '|\\]).)*\\]|' +
+         '\\((?:(?!' + SRC_Z_CC_CF + '|[)]).)*\\)|' +
+         '\\{(?:(?!' + SRC_Z_CC_CF + '|[}]).)*\\}|' +
+         '\\"(?:(?!' + SRC_Z_CC_CF + '|["]).)+\\"|' +
+         "\\'(?:(?!" + SRC_Z_CC_CF + "|[']).)+\\'|" +
+         "\\'(?=" + SRC_PSEUDO_LETTER + ').|' + # allow `I'm_king` if no pair found
+         '\\.{2,3}[a-zA-Z0-9%]|' + # github has ... in commit range links. Restricted to
+                                   # english & percent-encoded only, until more examples found.
+         '\\.(?!' + SRC_Z_CC_CF + '|[.]).|' +
+         '\\-(?!' + SRC_Z_CC_CF + '|--(?:[^-]|$))(?:[-]+|.)|' + # `---` => long dash, terminate
+         '\\,(?!' + SRC_Z_CC_CF + ').|' + # allow `,,,` in paths
+         '\\!(?!' + SRC_Z_CC_CF + '|[!]).|' +
+         '\\?(?!' + SRC_Z_CC_CF + '|[?]).' +
+       ')+' +
+     '|\\/' +
+     ')?'
+
+   SRC_EMAIL_NAME = '[\\-;:&=\\+\\$,\\"\\.a-zA-Z0-9_]+'
+   SRC_XN = 'xn--[a-z0-9\\-]{1,59}'
+
+   # More to read about domain names
+   # http://serverfault.com/questions/638260/
+
+   SRC_DOMAIN_ROOT =
+     # Can't have digits and dashes
+     '(?:' +
+       SRC_XN +
+       '|' +
+       SRC_PSEUDO_LETTER_NON_D + '{1,63}' +
+     ')'
+
+   SRC_DOMAIN =
+     '(?:' +
+       SRC_XN +
+       '|' +
+       '(?:' + SRC_PSEUDO_LETTER + ')' +
+       '|' +
+       # don't allow `--` in domain names, because:
+       # - it can conflict with markdown &mdash; / &ndash;
+       # - nobody uses those anyway
+       '(?:' + SRC_PSEUDO_LETTER + '(?:-(?!-)|' + SRC_PSEUDO_LETTER + '){0,61}' + SRC_PSEUDO_LETTER + ')' +
+     ')'
+
+   SRC_HOST =
+     '(?:' +
+       SRC_IP4 +
+       '|' +
+       '(?:(?:(?:' + SRC_DOMAIN + ')\\.)*' + SRC_DOMAIN_ROOT + ')' +
+     ')'
+
+   TPL_HOST_FUZZY =
+     '(?:' +
+       SRC_IP4 +
+       '|' +
+       '(?:(?:(?:' + SRC_DOMAIN + ')\\.)+(?:%TLDS%))' +
+     ')'
+
+   SRC_HOST_STRICT            = SRC_HOST + SRC_HOST_TERMINATOR
+   TPL_HOST_FUZZY_STRICT      = TPL_HOST_FUZZY + SRC_HOST_TERMINATOR
+   SRC_HOST_PORT_STRICT       = SRC_HOST + SRC_PORT + SRC_HOST_TERMINATOR
+   TPL_HOST_PORT_FUZZY_STRICT = TPL_HOST_FUZZY + SRC_PORT + SRC_HOST_TERMINATOR
+
+   #------------------------------------------------------------------------------
+   # Main rules
+
+   # Rude test for fuzzy links by host, for quick deny
+   TPL_HOST_FUZZY_TEST = 'localhost|\\.\\d{1,3}\\.|(?:\\.(?:%TLDS%)(?:' + SRC_Z_P_CC_CF + '|$))'
+   TPL_EMAIL_FUZZY = '(^|>|' + SRC_Z_CC_CF + ')(' + SRC_EMAIL_NAME + '@' + TPL_HOST_FUZZY_STRICT + ')'
+   TPL_LINK_FUZZY =
+     # Fuzzy link can't be prepended with .:/\- and non-punctuation,
+     # but can start with > (markdown blockquote)
+     '(^|(?![.:/\\-_@])(?:[$+<=>^`|]|' + SRC_Z_P_CC_CF + '))' +
+     '((?![$+<=>^`|])' + TPL_HOST_PORT_FUZZY_STRICT + SRC_PATH + ')'
+
+ end
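`SRC_IP4` and `SRC_PORT` above are pattern source strings meant to be composed into larger regexps. Anchored in isolation (patterns copied verbatim from the module), they can be exercised directly:

```ruby
# SRC_IP4 / SRC_PORT from the module above, anchored for standalone testing.
src_ip4  = '(?:(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
src_port = '(?::(?:6(?:[0-4]\d{3}|5(?:[0-4]\d{2}|5(?:[0-2]\d|3[0-5])))|[1-5]?\d{1,4}))?'

ip4  = /\A#{src_ip4}\z/
port = /\A#{src_port}\z/

puts ip4.match?('192.168.1.1')   # true
puts ip4.match?('999.1.1.1')     # false (999 is not a valid octet)
puts port.match?(':65535')       # true
puts port.match?(':65536')       # false (max port is 65535)
```

Within the library these fragments are never anchored on their own; they are concatenated with `SRC_HOST` and `SRC_HOST_TERMINATOR`, which is why they are kept as strings rather than `Regexp` objects.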
@@ -0,0 +1,5 @@
+ module LinkifyIt
+
+   VERSION = '0.1.0.0'
+
+ end
@@ -0,0 +1,234 @@
1
+ #------------------------------------------------------------------------------
2
+ describe 'links' do
3
+
4
+ # TODO tests which can't seem to get passing at the moment, so skip them
5
+ failing_test = [
6
+ 95, # GOOGLE.COM. unable to get final . to be removed
7
+ 214 # xn--d1abbgf6aiiy.xn--p1ai
8
+ ]
9
+
10
+ l = Linkify.new
11
+ l.bypass_normalizer = true # kill the normalizer
12
+
13
+ skipNext = false
14
+ linkfile = File.join(File.dirname(__FILE__), 'fixtures/links.txt')
15
+ lines = File.read(linkfile).split(/\r?\n/)
16
+ lines.each_with_index do |line, idx|
17
+ if skipNext
18
+ skipNext = false
19
+ next
20
+ end
21
+
22
+ line = line.sub(/^%.*/, '')
23
+ next_line = (lines[idx + 1] || '').sub(/^%.*/, '')
24
+
25
+ next if line.strip.empty?
26
+
27
+ unless failing_test.include?(idx + 1)
28
+ if !next_line.strip.empty?
29
+
30
+ it "line #{idx + 1}" do
31
+ expect(l.pretest(line)).to eq true # "(pretest failed in `#{line}`)"
32
+ expect(l.test("\n#{line}\n")).to eq true # "(link not found in `\n#{line}\n`)"
33
+ expect(l.test(line)).to eq true # "(link not found in `#{line}`)"
34
+ expect(l.match(line)[0].url).to eq next_line
35
+ end
36
+
37
+ skipNext = true
38
+
39
+ else
40
+
41
+ it "line #{idx + 1}" do
42
+ expect(l.pretest(line)).to eq true # "(pretest failed in `#{line}`)"
43
+ expect(l.test("\n#{line}\n")).to eq true # "(link not found in `\n#{line}\n`)"
44
+ expect(l.test(line)).to eq true # "(link not found in `#{line}`)"
45
+ expect(l.match(line)[0].url).to eq line
46
+ end
47
+ end
48
+ end
49
+ end
50
+
51
+ end
52
+
53
+
54
+ #------------------------------------------------------------------------------
55
+ describe 'not links' do
56
+
57
+ # TODO tests which can't seem to get passing at the moment, so skip them
58
+ failing_test = [ 6, 7, 8, 12, 16, 19, 22, 23, 24, 25, 26, 27, 28, 29, 48 ]
59
+
60
+ l = Linkify.new
61
+ l.bypass_normalizer = true # kill the normalizer
62
+
63
+ linkfile = File.join(File.dirname(__FILE__), 'fixtures/not_links.txt')
64
+ lines = File.read(linkfile).split(/\r?\n/)
65
+ lines.each_with_index do |line, idx|
66
+ line = line.sub(/^%.*/, '')
67
+
68
+ next if line.strip.empty?
69
+
70
+ unless failing_test.include?(idx + 1)
71
+ it "line #{idx + 1}" do
72
+ # assert.notOk(l.test(line),
73
+ # '(should not find link in `' + line + '`, but found `' +
74
+ # JSON.stringify((l.match(line) || [])[0]) + '`)');
75
+ expect(l.test(line)).not_to eq true
76
+ end
77
+ end
78
+ end
79
+
80
+ end
81
+
82
+ #------------------------------------------------------------------------------
83
+ describe 'API' do
84
+
85
+ #------------------------------------------------------------------------------
86
+ it 'extend tlds' do
87
+ l = Linkify.new
88
+
89
+ expect(l.test('google.myroot')).to_not eq true
90
+
91
+ l.tlds('myroot', true)
92
+
93
+ expect(l.test('google.myroot')).to eq true
94
+ expect(l.test('google.xyz')).to_not eq true
95
+
96
+ # this is some other package of tlds which we don't have
97
+ # l.tlds(require('tlds'));
98
+ # assert.ok(l.test('google.xyz'));
99
+ # assert.notOk(l.test('google.myroot'));
100
+ end
101
+
102
+
103
+ # TODO Tests not passing
104
+ #------------------------------------------------------------------------------
105
+ # it 'add rule as regexp, with default normalizer' do
106
+ # l = Linkify.new.add('my:', {validate: /^\/\/[a-z]+/} )
107
+ #
108
+ # match = l.match('google.com. my:// my://asdf!')
109
+ #
110
+ # expect(match[0].text).to eq 'google.com'
111
+ # expect(match[1].text).to eq 'my://asdf'
112
+ # end
113
+
114
+ # TODO Tests not passing
115
+ #------------------------------------------------------------------------------
116
+ # it 'add rule with normalizer'
117
+ # l = Linkify.new.add('my:', {
118
+ # validate: /^\/\/[a-z]+/,
119
+ # normalize: lambda {|m|
120
+ # m.text = m.text.sub(/^my:\/\//, '').upcase
121
+ # m.url = m.url.upcase
122
+ # }
123
+ # })
124
+ #
125
+ # match = l.match('google.com. my:// my://asdf!')
126
+ #
127
+ # expect(match[1].text).to eq 'ASDF'
128
+ # expect(match[1].url).to eq 'MY://ASDF'
129
+ # end
130
+
+ # it('disable rule', function () {
+ # var l = linkify();
+ #
+ # assert.ok(l.test('http://google.com'));
+ # assert.ok(l.test('foo@bar.com'));
+ # l.add('http:', null);
+ # l.add('mailto:', null);
+ # assert.notOk(l.test('http://google.com'));
+ # assert.notOk(l.test('foo@bar.com'));
+ # });
+ #
+ #
+ # it('add bad definition', function () {
+ # var l;
+ #
+ # l = linkify();
+ #
+ # assert.throw(function () {
+ # l.add('test:', []);
+ # });
+ #
+ # l = linkify();
+ #
+ # assert.throw(function () {
+ # l.add('test:', { validate: [] });
+ # });
+ #
+ # l = linkify();
+ #
+ # assert.throw(function () {
+ # l.add('test:', {
+ # validate: function () { return false; },
+ # normalize: 'bad'
+ # });
+ # });
+ # });
+ #
+ #
+ # it('test at position', function () {
+ # var l = linkify();
+ #
+ # assert.ok(l.testSchemaAt('http://google.com', 'http:', 5));
+ # assert.ok(l.testSchemaAt('http://google.com', 'HTTP:', 5));
+ # assert.notOk(l.testSchemaAt('http://google.com', 'http:', 6));
+ #
+ # assert.notOk(l.testSchemaAt('http://google.com', 'bad_schema:', 6));
+ # });
+ #
+ #
+ # it('correct cache value', function () {
+ # var l = linkify();
+ #
+ # var match = l.match('.com. http://google.com google.com ftp://google.com');
+ #
+ # assert.equal(match[0].text, 'http://google.com');
+ # assert.equal(match[1].text, 'google.com');
+ # assert.equal(match[2].text, 'ftp://google.com');
+ # });
+ #
+ # it('normalize', function () {
+ # var l = linkify(), m;
+ #
+ # m = l.match('mailto:foo@bar.com')[0];
+ #
+ # // assert.equal(m.text, 'foo@bar.com');
+ # assert.equal(m.url, 'mailto:foo@bar.com');
+ #
+ # m = l.match('foo@bar.com')[0];
+ #
+ # // assert.equal(m.text, 'foo@bar.com');
+ # assert.equal(m.url, 'mailto:foo@bar.com');
+ # });
+ #
+ #
+ # it('test @twitter rule', function () {
+ # var l = linkify().add('@', {
+ # validate: function (text, pos, self) {
+ # var tail = text.slice(pos);
+ #
+ # if (!self.re.twitter) {
+ # self.re.twitter = new RegExp(
+ # '^([a-zA-Z0-9_]){1,15}(?!_)(?=$|' + self.re.src_ZPCcCf + ')'
+ # );
+ # }
+ # if (self.re.twitter.test(tail)) {
+ # if (pos >= 2 && tail[pos - 2] === '@') {
+ # return false;
+ # }
+ # return tail.match(self.re.twitter)[0].length;
+ # }
+ # return 0;
+ # },
+ # normalize: function (m) {
+ # m.url = 'https://twitter.com/' + m.url.replace(/^@/, '');
+ # }
+ # });
+ #
+ # assert.equal(l.match('hello, @gamajoba_!')[0].text, '@gamajoba_');
+ # assert.equal(l.match(':@givi')[0].text, '@givi');
+ # assert.equal(l.match(':@givi')[0].url, 'https://twitter.com/givi');
+ # assert.notOk(l.test('@@invalid'));
+ # });
+
+ end
@@ -0,0 +1,2 @@
+ require 'byebug'
+ require 'linkify-it-rb'
metadata ADDED
@@ -0,0 +1,67 @@
+ --- !ruby/object:Gem::Specification
+ name: linkify-it-rb
+ version: !ruby/object:Gem::Version
+   version: 0.1.0.0
+ platform: ruby
+ authors:
+ - Brett Walker
+ - Vitaly Puzrin
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2015-03-26 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: uc.micro-rb
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.0'
+ description: Ruby version of linkify-it for motion-markdown-it, for Ruby and RubyMotion
+ email: github@digitalmoksha.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - README.md
+ - lib/linkify-it-rb.rb
+ - lib/linkify-it-rb/index.rb
+ - lib/linkify-it-rb/re.rb
+ - lib/linkify-it-rb/version.rb
+ - spec/linkify-it-rb/test_spec.rb
+ - spec/spec_helper.rb
+ homepage: https://github.com/digitalmoksha/linkify-it-rb
+ licenses:
+ - MIT
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.4.5
+ signing_key:
+ specification_version: 4
+ summary: linkify-it for motion-markdown-it in Ruby
+ test_files:
+ - spec/linkify-it-rb/test_spec.rb
+ - spec/spec_helper.rb