linkify-it-rb 0.1.0.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 8899eb2fafae9f6ce4105221dc2342c93de1ca5c
4
+ data.tar.gz: 493bb38504a90c6f6c9dfa870364758370ee548e
5
+ SHA512:
6
+ metadata.gz: 01ebcaaaa3238631990a3212f5058d69bd58b7f336576e82ac101745f25ddf7dbd948b81013eda800a388d845ab99fa23f75d2d6d90db35368d90a9b92e5f6b2
7
+ data.tar.gz: 0b9c9fbfe357d9b3c78c2def0bd9f51f08cb0b39335e85c8bb791d483c4e589bf3a5d860357584591e3fe8277f6226310dff04a5235ed0c783472d7ac7803663
README.md ADDED
@@ -0,0 +1,170 @@
1
+ # linkify-it-rb
2
+
3
+ Link recognition library with full Unicode support, focused on high-quality link pattern detection in plain text. For use with both Ruby and RubyMotion.
4
+
5
+ This gem is a port of the [linkify-it javascript package](https://github.com/markdown-it/linkify-it) by Vitaly Puzrin, that is used for the [markdown-it](https://github.com/markdown-it/markdown-it) package.
6
+
7
+ __[Javascript Demo](http://markdown-it.github.io/linkify-it/)__
8
+
9
+ _Note:_ This gem is still a work in progress; some of the Unicode support is still being implemented.
10
+
11
+
12
+ ## To be updated: Original Javascript package documentation
13
+
14
+ Why it's awesome:
15
+
16
+ - Full unicode support, _with astral characters_!
17
+ - International domains support.
18
+ - Allows rules extension & custom normalizers.
19
+
20
+
21
+ Install
22
+ -------
23
+
24
+ ```bash
25
+ npm install linkify-it --save
26
+ ```
27
+
28
+ Browserification is also supported.
29
+
30
+
31
+ Usage examples
32
+ --------------
33
+
34
+ ##### Example 1
35
+
36
+ ```js
37
+ var linkify = require('linkify-it')();
38
+
39
+ // Reload full tlds list & add unofficial `.onion` domain.
40
+ linkify
41
+ .tlds(require('tlds')) // Reload with full tlds list
42
+ .tlds('.onion', true); // Add unofficial `.onion` domain
43
+ linkify.add('git:', 'http:'); // Add `git:` protocol as "alias"
44
+ linkify.add('ftp:', null); // Disable `ftp:` protocol
45
+
46
+ console.log(linkify.test('Site github.com!')); // true
47
+
48
+ console.log(linkify.match('Site github.com!')); // [ {
49
+ // schema: "",
50
+ // index: 5,
51
+ // lastIndex: 15,
52
+ // raw: "github.com",
53
+ // text: "github.com",
54
+ // url: "http://github.com",
55
+ // } ]
56
+ ```
57
+
58
+ ##### Example 2. Add a Twitter mentions handler
59
+
60
+ ```js
61
+ linkify.add('@', {
62
+ validate: function (text, pos, self) {
63
+ var tail = text.slice(pos);
64
+
65
+ if (!self.re.twitter) {
66
+ self.re.twitter = new RegExp(
67
+ '^([a-zA-Z0-9_]){1,15}(?!_)(?=$|' + self.re.src_ZPCcCf + ')'
68
+ );
69
+ }
70
+ if (self.re.twitter.test(tail)) {
71
+ // Linkifier allows punctuation chars before prefix,
72
+ // but we additionally disable `@` ("@@mention" is invalid)
73
+ if (pos >= 2 && tail[pos - 2] === '@') {
74
+ return false;
75
+ }
76
+ return tail.match(self.re.twitter)[0].length;
77
+ }
78
+ return 0;
79
+ },
80
+ normalize: function (match) {
81
+ match.url = 'https://twitter.com/' + match.url.replace(/^@/, '');
82
+ }
83
+ });
84
+ ```
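In the Ruby port, the same handler could presumably be written with lambdas, following the shape of `DEFAULT_SCHEMAS` in this gem's `index.rb`. This is a simplified, self-contained sketch: the Unicode boundary class `src_ZPCcCf` from the JS version is replaced with a plain word-character lookahead, and the rule hash here is illustrative, not part of the gem.

```ruby
# Hypothetical '@' mention rule: the validator receives (text, pos, obj) and
# returns the matched tail length after the prefix, or 0 on failure.
TWITTER_RE = /\A[a-zA-Z0-9_]{1,15}(?![a-zA-Z0-9_])/

MENTION_RULE = {
  validate: lambda do |text, pos, _obj|
    # "@@mention" is invalid: reject when the char before the '@' is another '@'
    return 0 if pos >= 2 && text[pos - 2] == '@'
    m = TWITTER_RE.match(text[pos..-1])
    m ? m[0].length : 0
  end,
  normalize: lambda do |match, _obj|
    match[:url] = 'https://twitter.com/' + match[:url].sub(/\A@/, '')
  end
}
```

A rule like this would be registered with `add('@', MENTION_RULE)` on a linkifier instance.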
85
+
86
+
87
+ API
88
+ ---
89
+
90
+ __[API documentation](http://markdown-it.github.io/linkify-it/doc)__
91
+
92
+ ### new LinkifyIt(schemas)
93
+
94
+ Creates new linkifier instance with optional additional schemas.
95
+ Can be called without `new` keyword for convenience.
96
+
97
+ By default understands:
98
+
99
+ - `http(s)://...` , `ftp://...`, `mailto:...` & `//...` links
100
+ - "fuzzy" links and emails (google.com, foo@bar.com).
101
+
102
+ `schemas` is an object, where each key/value describes protocol/rule:
103
+
104
+ - __key__ - link prefix (usually, protocol name with `:` at the end, `skype:`
105
+ for example). `linkify-it` makes sure that the prefix is not preceded with an
106
+ alphanumeric char.
107
+ - __value__ - rule to check tail after link prefix
108
+ - _String_ - just alias to existing rule
109
+ - _Object_
110
+ - _validate_ - validator function (should return matched length on success),
111
+ or `RegExp`.
112
+ - _normalize_ - optional function to normalize text & url of matched result
113
+ (for example, for twitter mentions).
114
+
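In the Ruby port, a custom `schemas` entry would presumably take the same shape as the entries in `DEFAULT_SCHEMAS` in this gem's `index.rb`. The `geo:` rule name and the coordinate regexp below are illustrative assumptions, not part of the gem:

```ruby
# Hypothetical 'geo:' rule: the validator returns the length of the matched
# tail after the prefix, or 0 when nothing matches.
GEO_RULE = {
  validate: lambda do |text, pos, _obj|
    m = /\A-?\d{1,2}\.\d+,-?\d{1,3}\.\d+/.match(text[pos..-1])
    m ? m[0].length : 0
  end
}
# Would presumably be passed as: Linkify.new('geo:' => GEO_RULE)
```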
115
+
116
+ ### .test(text)
117
+
118
+ Searches for a linkifiable pattern and returns `true` on success, or `false` on failure.
119
+
120
+
121
+ ### .pretest(text)
122
+
123
+ Quick check for whether a link may exist. Can be used to optimize more expensive
124
+ `.test()` calls. Returns `false` if a link cannot be found, and `true` if a
125
+ `.test()` call is needed to know for sure.
126
+
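The idea behind a pretest can be sketched in plain Ruby (the names here are illustrative, not the gem's API): a cheap character check rules out most strings before the expensive full scan runs.

```ruby
# Cheap pre-filter: any linkifiable text must contain ':', '@', or '.' somewhere.
PRETEST_RE = /[:@.]/

def may_contain_link?(text)
  !(PRETEST_RE =~ text).nil?
end
```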
127
+
128
+ ### .testSchemaAt(text, name, offset)
129
+
130
+ Similar to `.test()` but checks only specific protocol tail exactly at given
131
+ position. Returns length of found pattern (0 on fail).
132
+
133
+
134
+ ### .match(text)
135
+
136
+ Returns an `Array` of found link matches, or `null` if nothing is found.
137
+
138
+ Each match has:
139
+
140
+ - __schema__ - link schema, can be empty for fuzzy links, or `//` for
141
+ protocol-neutral links.
142
+ - __index__ - offset of matched text
143
+ - __lastIndex__ - index of next char after match end
144
+ - __raw__ - matched text
145
+ - __text__ - normalized text
146
+ - __url__ - link, generated from matched text
147
+
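Internally, matching repeatedly re-tests the remaining tail and shifts past each hit. The loop can be sketched in standalone Ruby (a simplified illustration of the scanning strategy, not the gem's code):

```ruby
# Collect successive non-overlapping matches by advancing a shift past each hit.
def scan_links(text, re)
  result = []
  shift = 0
  while (m = re.match(text, shift))
    result << { index: m.begin(0), lastIndex: m.end(0), raw: m[0] }
    shift = m.end(0) # continue scanning after the previous match
  end
  result
end
```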
148
+
149
+ ### .tlds(list[, keepOld])
150
+
151
+ Load (or merge) a new tlds list. These are used for fuzzy links (without a prefix)
152
+ to avoid false positives. By default this algorithm is used:
153
+
154
+ - hostnames with any 2-letter root zone are ok.
155
+ - biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф
156
+ are ok.
157
+ - encoded (`xn--...`) root zones are ok.
158
+
159
+ If the list is replaced, then an exact match is required even for 2-letter root zones.
160
+
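The effect of the TLD list can be sketched by building the alternation by hand. This is a simplified illustration (the real pattern also handles `xn--` zones, ports, and Unicode hosts):

```ruby
# Build a fuzzy-host pattern from a TLD list; unknown roots are rejected,
# but any two-letter root zone is accepted by default.
TLDS = %w[com org net] + ['[a-z]{2}']
HOST_RE = /\A(?:[a-z0-9-]+\.)+(?:#{TLDS.join('|')})\z/i

HOST_RE =~ 'example.com'      # matches a listed root zone
HOST_RE =~ 'example.de'       # matches via the two-letter fallback
HOST_RE =~ 'example.invalid'  # no match
```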
161
+
162
+ ### .add(schema, definition)
163
+
164
+ Add a new rule with the `schema` prefix. For definition details see the constructor
165
+ description. To disable an existing rule, use `.add(name, null)`.
166
+
167
+
168
+ ## License
169
+
170
+ [MIT](https://github.com/markdown-it/linkify-it/blob/master/LICENSE)
lib/linkify-it-rb.rb ADDED
@@ -0,0 +1,18 @@
1
+ # encoding: utf-8
2
+
3
+ if defined?(Motion::Project::Config)
4
+
5
+ lib_dir_path = File.dirname(File.expand_path(__FILE__))
6
+ Motion::Project::App.setup do |app|
7
+ app.files.unshift(Dir.glob(File.join(lib_dir_path, "linkify-it-rb/**/*.rb")))
8
+ end
9
+
10
+ require 'uc.micro-rb'
11
+
12
+ else
13
+
14
+ require 'uc.micro-rb'
15
+ require 'linkify-it-rb/re'
16
+ require 'linkify-it-rb/index'
17
+
18
+ end
lib/linkify-it-rb/index.rb ADDED
@@ -0,0 +1,503 @@
1
+ class Linkify
2
+ include ::LinkifyRe
3
+
4
+ attr_accessor :__index__, :__last_index__, :__text_cache__, :__schema__, :__compiled__
5
+ attr_accessor :re, :bypass_normalizer
6
+
7
+ # DON'T try to make PRs with changes. Extend TLDs with LinkifyIt.tlds() instead
8
+ TLDS_DEFAULT = 'biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф'.split('|')
9
+
10
+ DEFAULT_SCHEMAS = {
11
+ 'http:' => {
12
+ validate: lambda do |text, pos, obj|
13
+ tail = text.slice(pos..-1)
14
+
15
+ if (!obj.re[:http])
16
+ # compile lazily, because "host"-containing variables can change on tlds update.
17
+ obj.re[:http] = Regexp.new('^\\/\\/' + LinkifyRe::SRC_AUTH + LinkifyRe::SRC_HOST_PORT_STRICT + LinkifyRe::SRC_PATH, Regexp::IGNORECASE)
18
+ end
19
+ if obj.re[:http] =~ tail
20
+ return tail.match(obj.re[:http])[0].length
21
+ end
22
+ return 0
23
+ end
24
+ },
25
+ 'https:' => 'http:',
26
+ 'ftp:' => 'http:',
27
+ '//' => {
28
+ validate: lambda do |text, pos, obj|
29
+ tail = text.slice(pos..-1)
30
+
31
+ if (!obj.re[:no_http])
32
+ # compile lazily, because "host"-containing variables can change on tlds update.
33
+ obj.re[:no_http] = Regexp.new('^' + LinkifyRe::SRC_AUTH + LinkifyRe::SRC_HOST_PORT_STRICT + LinkifyRe::SRC_PATH, Regexp::IGNORECASE)
34
+ end
35
+
36
+ if (obj.re[:no_http] =~ tail)
37
+ # should not be `://`, that protects from errors in protocol name
38
+ return 0 if (pos >= 3 && text[pos - 3] == ':')
39
+ return tail.match(obj.re[:no_http])[0].length
40
+ end
41
+ return 0
42
+ end
43
+ },
44
+ 'mailto:' => {
45
+ validate: lambda do |text, pos, obj|
46
+ tail = text.slice(pos..-1)
47
+
48
+ if (!obj.re[:mailto])
49
+ obj.re[:mailto] = Regexp.new('^' + LinkifyRe::SRC_EMAIL_NAME + '@' + LinkifyRe::SRC_HOST_STRICT, Regexp::IGNORECASE)
50
+ end
51
+ if (obj.re[:mailto] =~ tail)
52
+ return tail.match(obj.re[:mailto])[0].length
53
+ end
54
+ return 0
55
+ end
56
+ }
57
+ }
58
+
59
+ #------------------------------------------------------------------------------
60
+ def escapeRE(str)
61
+ return str.gsub(/[\.\?\*\+\^\$\[\]\\\(\)\{\}\|\-]/) { |m| "\\" + m } # block form: "\\$&" is not a backreference in Ruby
62
+ end
63
+
64
+ #------------------------------------------------------------------------------
65
+ def resetScanCache
66
+ @__index__ = -1
67
+ @__text_cache__ = ''
68
+ end
69
+
70
+ #------------------------------------------------------------------------------
71
+ def createValidator(re)
72
+ return lambda do |text, pos, obj|
73
+ tail = text.slice(pos..-1)
74
+
75
+ (re =~ tail) ? tail.match(re)[0].length : 0
76
+ end
77
+ end
78
+
79
+ #------------------------------------------------------------------------------
80
+ def createNormalizer()
81
+ return lambda do |match, obj|
82
+ obj.normalize(match)
83
+ end
84
+ end
85
+
86
+ # Schemas compiler. Build regexps.
87
+ #
88
+ #------------------------------------------------------------------------------
89
+ def compile
90
+
91
+ # Load & clone RE patterns.
92
+ re = @re = {} #.merge!(require('./lib/re'))
93
+
94
+ # Define dynamic patterns
95
+ tlds = @__tlds__.dup
96
+
97
+ if (!@__tlds_replaced__)
98
+ tlds.push('[a-z]{2}')
99
+ end
100
+ tlds.push(LinkifyRe::SRC_XN) # @re was just reset above, so reference the constant directly
101
+
102
+ re[:src_tlds] = tlds.join('|')
103
+
104
+ untpl = lambda { |tpl| tpl.gsub('%TLDS%', re[:src_tlds]) }
105
+
106
+ re[:email_fuzzy] = Regexp.new(untpl.call(LinkifyRe::TPL_EMAIL_FUZZY), Regexp::IGNORECASE)
107
+ re[:link_fuzzy] = Regexp.new(untpl.call(LinkifyRe::TPL_LINK_FUZZY), Regexp::IGNORECASE)
108
+ re[:host_fuzzy_test] = Regexp.new(untpl.call(LinkifyRe::TPL_HOST_FUZZY_TEST), Regexp::IGNORECASE)
109
+
110
+ #
111
+ # Compile each schema
112
+ #
113
+
114
+ aliases = []
115
+
116
+ @__compiled__ = {} # Reset compiled data
117
+
118
+ schemaError = lambda do |name, val|
119
+ raise Error, ('(LinkifyIt) Invalid schema "' + name + '": ' + val)
120
+ end
121
+
122
+ @__schemas__.each do |name, val|
123
+
124
+ # skip disabled methods
125
+ next if (val == nil)
126
+
127
+ compiled = { validate: nil, link: nil }
128
+
129
+ @__compiled__[name] = compiled
130
+
131
+ if (val.is_a? Hash)
132
+ if (val[:validate].is_a? Regexp)
133
+ compiled[:validate] = createValidator(val[:validate])
134
+ elsif (val[:validate].is_a? Proc)
135
+ compiled[:validate] = val[:validate]
136
+ else
137
+ schemaError(name, val)
138
+ end
139
+
140
+ if (val[:normalize].is_a? Proc)
141
+ compiled[:normalize] = val[:normalize]
142
+ elsif (!val[:normalize])
143
+ compiled[:normalize] = createNormalizer()
144
+ else
145
+ schemaError(name, val)
146
+ end
147
+ next
148
+ end
149
+
150
+ if (val.is_a? String)
151
+ aliases.push(name)
152
+ next
153
+ end
154
+
155
+ schemaError(name, val)
156
+ end
157
+
158
+ #
159
+ # Compile postponed aliases
160
+ #
161
+
162
+ aliases.each do |an_alias|
163
+ if (!@__compiled__[@__schemas__[an_alias]])
164
+ # Silently fail on missing schemas to avoid errors when a schema is disabled.
165
+ # schemaError(an_alias, self.__schemas__[an_alias]);
166
+ else
167
+ @__compiled__[an_alias][:validate] = @__compiled__[@__schemas__[an_alias]][:validate]
168
+ @__compiled__[an_alias][:normalize] = @__compiled__[@__schemas__[an_alias]][:normalize]
169
+ end
170
+ end
171
+
172
+ #
173
+ # Fake record for guessed links
174
+ #
175
+ @__compiled__[''] = { validate: nil, normalize: createNormalizer }
176
+
177
+ #
178
+ # Build schema condition, and filter disabled & fake schemas
179
+ #
180
+ slist = @__compiled__.select {|name, val| name.length > 0 && !val.nil? }.keys.map {|str| escapeRE(str)}.join('|')
181
+
182
+ # (?!_) cause 1.5x slowdown
183
+ @re[:schema_test] = Regexp.new('(^|(?!_)(?:>|' + LinkifyRe::SRC_Z_P_CC_CF + '))(' + slist + ')', Regexp::IGNORECASE)
184
+ @re[:schema_search] = Regexp.new('(^|(?!_)(?:>|' + LinkifyRe::SRC_Z_P_CC_CF + '))(' + slist + ')', Regexp::IGNORECASE) # Ruby has no 'g' flag; scanning advances manually
185
+
186
+ @re[:pretest] = Regexp.new(
187
+ '(' + @re[:schema_test].source + ')|' +
188
+ '(' + @re[:host_fuzzy_test].source + ')|' + '@', 'i')
189
+
190
+ #
191
+ # Cleanup
192
+ #
193
+
194
+ resetScanCache
195
+ end
196
+
197
+ # Match result. Single element of array, returned by [[LinkifyIt#match]]
198
+ #------------------------------------------------------------------------------
199
+ class Match
200
+ attr_accessor :schema, :index, :lastIndex, :raw, :text, :url
201
+
202
+ def initialize(obj, shift)
203
+ start = obj.__index__
204
+ endt = obj.__last_index__
205
+ text = obj.__text_cache__.slice(start...endt)
206
+
207
+ # Match#schema -> String
208
+ #
209
+ # Prefix (protocol) for matched string.
210
+ @schema = obj.__schema__.downcase
211
+
212
+ # Match#index -> Number
213
+ #
214
+ # First position of matched string.
215
+ @index = start + shift
216
+
217
+ # Match#lastIndex -> Number
218
+ #
219
+ # Next position after matched string.
220
+ @lastIndex = endt + shift
221
+
222
+ # Match#raw -> String
223
+ #
224
+ # Matched string.
225
+ @raw = text
226
+
227
+ # Match#text -> String
228
+ #
229
+ # Normalized text of matched string.
230
+ @text = text
231
+
232
+ # Match#url -> String
233
+ #
234
+ # Normalized url of matched string.
235
+ @url = text
236
+ end
237
+
238
+ #------------------------------------------------------------------------------
239
+ def self.createMatch(obj, shift)
240
+ match = Match.new(obj, shift)
241
+ obj.__compiled__[match.schema][:normalize].call(match, obj)
242
+ return match
243
+ end
244
+ end
245
+
246
+
247
+
248
+ # new LinkifyIt(schemas)
249
+ # - schemas (Object): Optional. Additional schemas to validate (prefix/validator)
250
+ #
251
+ # Creates new linkifier instance with optional additional schemas.
252
+ # Can be called without `new` keyword for convenience.
253
+ #
254
+ # By default understands:
255
+ #
256
+ # - `http(s)://...` , `ftp://...`, `mailto:...` & `//...` links
257
+ # - "fuzzy" links and emails (example.com, foo@bar.com).
258
+ #
259
+ # `schemas` is an object, where each key/value describes protocol/rule:
260
+ #
261
+ # - __key__ - link prefix (usually, protocol name with `:` at the end, `skype:`
262
+ # for example). `linkify-it` makes sure that the prefix is not preceded with an
263
+ # alphanumeric char or symbol. Only whitespace and punctuation are allowed.
264
+ # - __value__ - rule to check tail after link prefix
265
+ # - _String_ - just alias to existing rule
266
+ # - _Object_
267
+ # - _validate_ - validator function (should return matched length on success),
268
+ # or `RegExp`.
269
+ # - _normalize_ - optional function to normalize text & url of matched result
270
+ # (for example, for @twitter mentions).
271
+ #------------------------------------------------------------------------------
272
+ def initialize(schemas = {})
273
+ # if (!(this instanceof LinkifyIt)) {
274
+ # return new LinkifyIt(schemas);
275
+ # }
276
+
277
+ # Cache last tested result. Used to skip repeating steps on next `match` call.
278
+ @__index__ = -1
279
+ @__last_index__ = -1 # Next scan position
280
+ @__schema__ = ''
281
+ @__text_cache__ = ''
282
+
283
+ @__schemas__ = {}.merge!(DEFAULT_SCHEMAS).merge!(schemas)
284
+ @__compiled__ = {}
285
+
286
+ @__tlds__ = TLDS_DEFAULT
287
+ @__tlds_replaced__ = false
288
+
289
+ @re = {}
290
+
291
+ @bypass_normalizer = false # only used in testing scenarios
292
+
293
+ compile
294
+ end
295
+
296
+
297
+ # chainable
298
+ # LinkifyIt#add(schema, definition)
299
+ # - schema (String): rule name (fixed pattern prefix)
300
+ # - definition (String|RegExp|Object): schema definition
301
+ #
302
+ # Add new rule definition. See constructor description for details.
303
+ #------------------------------------------------------------------------------
304
+ def add(schema, definition)
305
+ @__schemas__[schema] = definition
306
+ compile
307
+ return self
308
+ end
309
+
310
+
311
+ # LinkifyIt#test(text) -> Boolean
312
+ #
313
+ # Searches linkifiable pattern and returns `true` on success or `false` on fail.
314
+ #------------------------------------------------------------------------------
315
+ def test(text)
316
+ # Reset scan cache
317
+ @__text_cache__ = text
318
+ @__index__ = -1
319
+
320
+ return false if text.empty? # NB: 0 is truthy in Ruby, so !text.length would never be true
321
+
322
+ # try to scan for link with schema - that's the most simple rule
323
+ if @re[:schema_test] =~ text
324
+ re = @re[:schema_search]
325
+ pos = 0 # scan position (Ruby regexps have no sticky lastIndex)
326
+ while ((m = re.match(text, pos)) != nil)
327
+ len = testSchemaAt(text, m[2], pos = m.end(0)) # also advances the scan position past this candidate
328
+ if (len > 0) # NB: 0 is truthy in Ruby, so compare explicitly
329
+ @__schema__ = m[2]
330
+ @__index__ = m.begin(0) + m[1].length
331
+ @__last_index__ = m.begin(0) + m[0].length + len
332
+ break
333
+ end
334
+ end
335
+ end
336
+
337
+ if (@__compiled__['http:'])
338
+ # guess schemaless links
339
+
340
+ tld_pos = text.index(@re[:host_fuzzy_test])
341
+ if !tld_pos.nil?
342
+ # if tld is located after found link - no need to check fuzzy pattern
343
+ if (@__index__ < 0 || tld_pos < @__index__)
344
+ if ((ml = text.match(@re[:link_fuzzy])) != nil)
345
+
346
+ shift = ml.begin(0) + ml[1].length
347
+
348
+ if (@__index__ < 0 || shift < @__index__)
349
+ @__schema__ = ''
350
+ @__index__ = shift
351
+ @__last_index__ = ml.begin(0) + ml[0].length
352
+ end
353
+ end
354
+ end
355
+ end
356
+ end
357
+
358
+ if (@__compiled__['mailto:'])
359
+ # guess schemaless emails
360
+ at_pos = text.index('@')
361
+ if !at_pos.nil?
362
+ # We can't skip this check, because these cases are possible:
363
+ # 192.168.1.1@gmail.com, my.in@example.com
364
+ if ((me = text.match(@re[:email_fuzzy])) != nil)
365
+
366
+ shift = me.begin(0) + me[1].length
367
+ nextc = me.begin(0) + me[0].length
368
+
369
+ if (@__index__ < 0 || shift < @__index__ ||
370
+ (shift == @__index__ && nextc > @__last_index__))
371
+ @__schema__ = 'mailto:'
372
+ @__index__ = shift
373
+ @__last_index__ = nextc
374
+ end
375
+ end
376
+ end
377
+ end
378
+
379
+ return @__index__ >= 0
380
+ end
381
+
382
+
383
+ # LinkifyIt#pretest(text) -> Boolean
384
+ #
385
+ # Very quick check that can give false positives. Returns `true` if a link may
386
+ # exist. Can be used as a speed optimization when you need to check that a
387
+ # link does NOT exist.
388
+ #------------------------------------------------------------------------------
389
+ def pretest(text)
390
+ return !(@re[:pretest] =~ text).nil?
391
+ end
392
+
393
+
394
+ # LinkifyIt#testSchemaAt(text, name, position) -> Number
395
+ # - text (String): text to scan
396
+ # - name (String): rule (schema) name
397
+ # - position (Number): text offset to check from
398
+ #
399
+ # Similar to [[LinkifyIt#test]] but checks only specific protocol tail exactly
400
+ # at given position. Returns length of found pattern (0 on fail).
401
+ #------------------------------------------------------------------------------
402
+ def testSchemaAt(text, schema, pos)
403
+ # If not supported schema check requested - terminate
404
+ if (!@__compiled__[schema.downcase])
405
+ return 0
406
+ end
407
+ return @__compiled__[schema.downcase][:validate].call(text, pos, self)
408
+ end
409
+
410
+
411
+ # LinkifyIt#match(text) -> Array|null
412
+ #
413
+ # Returns an array of found link descriptions, or `nil` on fail. We strongly
414
+ # recommend using [[LinkifyIt#test]] first, for best speed.
415
+ #
416
+ # ##### Result match description
417
+ #
418
+ # - __schema__ - link schema, can be empty for fuzzy links, or `//` for
419
+ # protocol-neutral links.
420
+ # - __index__ - offset of matched text
421
+ # - __lastIndex__ - index of next char after match end
422
+ # - __raw__ - matched text
423
+ # - __text__ - normalized text
424
+ # - __url__ - link, generated from matched text
425
+ #------------------------------------------------------------------------------
426
+ def match(text)
427
+ shift = 0
428
+ result = []
429
+
430
+ # Try to take previous element from cache, if .test() called before
431
+ if (@__index__ >= 0 && @__text_cache__ == text)
432
+ result.push(Match.createMatch(self, shift))
433
+ shift = @__last_index__
434
+ end
435
+
436
+ # Cut head if cache was used
437
+ tail = shift > 0 ? text.slice(shift..-1) : text # NB: 0 is truthy in Ruby, so compare explicitly
438
+
439
+ # Scan string until end reached
440
+ while (self.test(tail))
441
+ result.push(Match.createMatch(self, shift))
442
+
443
+ tail = tail.slice(@__last_index__..-1)
444
+ shift += @__last_index__
445
+ end
446
+
447
+ if (result.length > 0) # NB: 0 is truthy in Ruby
448
+ return result
449
+ end
450
+
451
+ return nil
452
+ end
453
+
454
+
455
+ # chainable
456
+ # LinkifyIt#tlds(list [, keepOld]) -> this
457
+ # - list (Array): list of tlds
458
+ # - keepOld (Boolean): merge with current list if `true` (`false` by default)
459
+ #
460
+ # Load (or merge) a new tlds list. These are used for fuzzy links (without a prefix)
461
+ # to avoid false positives. By default this algorithm is used:
462
+ #
463
+ # - hostname with any 2-letter root zones are ok.
464
+ # - biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф
465
+ # are ok.
466
+ # - encoded (`xn--...`) root zones are ok.
467
+ #
468
+ # If list is replaced, then exact match for 2-chars root zones will be checked.
469
+ #------------------------------------------------------------------------------
470
+ def tlds(list, keepOld = false) # keepOld is optional, matching the documented `tlds(list [, keepOld])`
471
+ list = list.is_a?(Array) ? list : [ list ]
472
+
473
+ if (!keepOld)
474
+ @__tlds__ = list.dup
475
+ @__tlds_replaced__ = true
476
+ compile
477
+ return self
478
+ end
479
+
480
+ @__tlds__ = @__tlds__.concat(list).sort.uniq.reverse
481
+
482
+ compile
483
+ return self
484
+ end
485
+
486
+ # LinkifyIt#normalize(match)
487
+ #
488
+ # Default normalizer (if the schema does not define its own).
489
+ #------------------------------------------------------------------------------
490
+ def normalize(match)
491
+ return if @bypass_normalizer
492
+
493
+ # Do minimal possible changes by default. Need to collect feedback prior
494
+ # to move forward https://github.com/markdown-it/linkify-it/issues/1
495
+
496
+ match.url = 'http://' + match.url if !match.schema
497
+
498
+ if (match.schema == 'mailto:' && !(/^mailto\:/i =~ match.url))
499
+ match.url = 'mailto:' + match.url
500
+ end
501
+ end
502
+
503
+ end
lib/linkify-it-rb/re.rb ADDED
@@ -0,0 +1,111 @@
1
+ module LinkifyRe
2
+
3
+ # Use direct extract instead of `regenerate` to reduce size
4
+ SRC_ANY = UCMicro::Properties::Any::REGEX
5
+ SRC_CC = UCMicro::Categories::Cc::REGEX
6
+ SRC_CF = UCMicro::Categories::Cf::REGEX
7
+ SRC_Z = UCMicro::Categories::Z::REGEX
8
+ SRC_P = UCMicro::Categories::P::REGEX
9
+
10
+ # \p{\Z\P\Cc\CF} (white spaces + control + format + punctuation)
11
+ SRC_Z_P_CC_CF = [ SRC_Z, SRC_P, SRC_CC, SRC_CF ].join('|')
12
+
13
+ # \p{\Z\Cc\CF} (white spaces + control + format)
14
+ SRC_Z_CC_CF = [ SRC_Z, SRC_CC, SRC_CF ].join('|')
15
+
16
+ # All possible word characters (everything without punctuation, spaces & controls)
17
+ # Defined via punctuation & spaces to save space
18
+ # Should be something like \p{\L\N\S\M} (\w but without `_`)
19
+ SRC_PSEUDO_LETTER = '(?:(?!' + SRC_Z_P_CC_CF + ')' + SRC_ANY.source + ')'
20
+ # The same as above but without [0-9]
21
+ SRC_PSEUDO_LETTER_NON_D = '(?:(?![0-9]|' + SRC_Z_P_CC_CF + ')' + SRC_ANY.source + ')'
22
+
23
+ #------------------------------------------------------------------------------
24
+
25
+ SRC_IP4 = '(?:(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
26
+ SRC_AUTH = '(?:(?:(?!' + SRC_Z_CC_CF + ').)+@)?'
27
+
28
+ SRC_PORT = '(?::(?:6(?:[0-4]\\d{3}|5(?:[0-4]\\d{2}|5(?:[0-2]\\d|3[0-5])))|[1-5]?\\d{1,4}))?'
29
+
30
+ SRC_HOST_TERMINATOR = '(?=$|' + SRC_Z_P_CC_CF + ')(?!-|_|:\\d|\\.-|\\.(?!$|' + SRC_Z_P_CC_CF + '))'
31
+
32
+ SRC_PATH =
33
+ '(?:' +
34
+ '[/?#]' +
35
+ '(?:' +
36
+ '(?!' + SRC_Z_CC_CF + '|[()\\[\\]{}.,"\'?!\\-]).|' +
37
+ '\\[(?:(?!' + SRC_Z_CC_CF + '|\\]).)*\\]|' +
38
+ '\\((?:(?!' + SRC_Z_CC_CF + '|[)]).)*\\)|' +
39
+ '\\{(?:(?!' + SRC_Z_CC_CF + '|[}]).)*\\}|' +
40
+ '\\"(?:(?!' + SRC_Z_CC_CF + '|["]).)+\\"|' +
41
+ "\\'(?:(?!" + SRC_Z_CC_CF + "|[']).)+\\'|" +
42
+ "\\'(?=" + SRC_PSEUDO_LETTER + ').|' + # allow `I'm_king` if no pair found
43
+ '\\.{2,3}[a-zA-Z0-9%]|' + # github has ... in commit range links. Restrict to
44
+ # english & percent-encoded only, until more examples found.
45
+ '\\.(?!' + SRC_Z_CC_CF + '|[.]).|' +
46
+ '\\-(?!' + SRC_Z_CC_CF + '|--(?:[^-]|$))(?:[-]+|.)|' + # `---` => long dash, terminate
47
+ '\\,(?!' + SRC_Z_CC_CF + ').|' + # allow `,,,` in paths
48
+ '\\!(?!' + SRC_Z_CC_CF + '|[!]).|' +
49
+ '\\?(?!' + SRC_Z_CC_CF + '|[?]).' +
50
+ ')+' +
51
+ '|\\/' +
52
+ ')?'
53
+
54
+ SRC_EMAIL_NAME = '[\\-;:&=\\+\\$,\\"\\.a-zA-Z0-9_]+'
55
+ SRC_XN = 'xn--[a-z0-9\\-]{1,59}'
56
+
57
+ # More to read about domain names
58
+ # http://serverfault.com/questions/638260/
59
+
60
+ SRC_DOMAIN_ROOT =
61
+ # Can't have digits and dashes
62
+ '(?:' +
63
+ SRC_XN +
64
+ '|' +
65
+ SRC_PSEUDO_LETTER_NON_D + '{1,63}' +
66
+ ')'
67
+
68
+ SRC_DOMAIN =
69
+ '(?:' +
70
+ SRC_XN +
71
+ '|' +
72
+ '(?:' + SRC_PSEUDO_LETTER + ')' +
73
+ '|' +
74
+ # don't allow `--` in domain names, because:
75
+ # - that can conflict with markdown &mdash; / &ndash;
76
+ # - nobody use those anyway
77
+ '(?:' + SRC_PSEUDO_LETTER + '(?:-(?!-)|' + SRC_PSEUDO_LETTER + '){0,61}' + SRC_PSEUDO_LETTER + ')' +
78
+ ')'
79
+
80
+ SRC_HOST =
81
+ '(?:' +
82
+ SRC_IP4 +
83
+ '|' +
84
+ '(?:(?:(?:' + SRC_DOMAIN + ')\\.)*' + SRC_DOMAIN_ROOT + ')' +
85
+ ')'
86
+
87
+ TPL_HOST_FUZZY =
88
+ '(?:' +
89
+ SRC_IP4 +
90
+ '|' +
91
+ '(?:(?:(?:' + SRC_DOMAIN + ')\\.)+(?:%TLDS%))' +
92
+ ')'
93
+
94
+ SRC_HOST_STRICT = SRC_HOST + SRC_HOST_TERMINATOR
95
+ TPL_HOST_FUZZY_STRICT = TPL_HOST_FUZZY + SRC_HOST_TERMINATOR
96
+ SRC_HOST_PORT_STRICT = SRC_HOST + SRC_PORT + SRC_HOST_TERMINATOR
97
+ TPL_HOST_PORT_FUZZY_STRICT = TPL_HOST_FUZZY + SRC_PORT + SRC_HOST_TERMINATOR
98
+
99
+ #------------------------------------------------------------------------------
100
+ # Main rules
101
+
102
+ # Rude test fuzzy links by host, for quick deny
103
+ TPL_HOST_FUZZY_TEST = 'localhost|\\.\\d{1,3}\\.|(?:\\.(?:%TLDS%)(?:' + SRC_Z_P_CC_CF + '|$))'
104
+ TPL_EMAIL_FUZZY = '(^|>|' + SRC_Z_CC_CF + ')(' + SRC_EMAIL_NAME + '@' + TPL_HOST_FUZZY_STRICT + ')'
105
+ TPL_LINK_FUZZY =
106
+ # Fuzzy link can't be prepended with .:/\- and non punctuation.
107
+ # but can start with > (markdown blockquote)
108
+ '(^|(?![.:/\\-_@])(?:[$+<=>^`|]|' + SRC_Z_P_CC_CF + '))' +
109
+ '((?![$+<=>^`|])' + TPL_HOST_PORT_FUZZY_STRICT + SRC_PATH + ')'
110
+
111
+ end
lib/linkify-it-rb/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ module LinkifyIt
2
+
3
+ VERSION = '0.1.0.0'
4
+
5
+ end
spec/linkify-it-rb/test_spec.rb ADDED
@@ -0,0 +1,234 @@
1
+ #------------------------------------------------------------------------------
2
+ describe 'links' do
3
+
4
+ # TODO tests which can't seem to get passing at the moment, so skip them
5
+ failing_test = [
6
+ 95, # GOOGLE.COM. unable to get final . to be removed
7
+ 214 # xn--d1abbgf6aiiy.xn--p1ai
8
+ ]
9
+
10
+ l = Linkify.new
11
+ l.bypass_normalizer = true # kill the normalizer
12
+
13
+ skipNext = false
14
+ linkfile = File.join(File.dirname(__FILE__), 'fixtures/links.txt')
15
+ lines = File.read(linkfile).split(/\r?\n/)
16
+ lines.each_with_index do |line, idx|
17
+ if skipNext
18
+ skipNext = false
19
+ next
20
+ end
21
+
22
+ line = line.sub(/^%.*/, '')
23
+ next_line = (lines[idx + 1] || '').sub(/^%.*/, '')
24
+
25
+ next if line.strip.empty?
26
+
27
+ unless failing_test.include?(idx + 1)
28
+ if !next_line.strip.empty?
29
+
30
+ it "line #{idx + 1}" do
31
+ expect(l.pretest(line)).to eq true # "(pretest failed in `#{line}`)"
32
+ expect(l.test("\n#{line}\n")).to eq true # "(link not found in `\n#{line}\n`)"
33
+ expect(l.test(line)).to eq true # "(link not found in `#{line}`)"
34
+ expect(l.match(line)[0].url).to eq next_line
35
+ end
36
+
37
+ skipNext = true
38
+
39
+ else
40
+
41
+ it "line #{idx + 1}" do
42
+ expect(l.pretest(line)).to eq true # "(pretest failed in `#{line}`)"
43
+ expect(l.test("\n#{line}\n")).to eq true # "(link not found in `\n#{line}\n`)"
44
+ expect(l.test(line)).to eq true # "(link not found in `#{line}`)"
45
+ expect(l.match(line)[0].url).to eq line
46
+ end
47
+ end
48
+ end
49
+ end
50
+
51
+ end
52
+
53
+
54
+ #------------------------------------------------------------------------------
55
+ describe 'not links' do
56
+
57
+ # TODO tests which can't seem to get passing at the moment, so skip them
58
+ failing_test = [ 6, 7, 8, 12, 16, 19, 22, 23, 24, 25, 26, 27, 28, 29, 48 ]
59
+
60
+ l = Linkify.new
61
+ l.bypass_normalizer = true # kill the normalizer
62
+
63
+ linkfile = File.join(File.dirname(__FILE__), 'fixtures/not_links.txt')
64
+ lines = File.read(linkfile).split(/\r?\n/)
65
+ lines.each_with_index do |line, idx|
66
+ line = line.sub(/^%.*/, '')
67
+
68
+ next if line.strip.empty?
69
+
70
+ unless failing_test.include?(idx + 1)
71
+ it "line #{idx + 1}" do
72
+ # assert.notOk(l.test(line),
73
+ # '(should not find link in `' + line + '`, but found `' +
74
+ # JSON.stringify((l.match(line) || [])[0]) + '`)');
75
+ expect(l.test(line)).not_to eq true
76
+ end
77
+ end
78
+ end
79
+
80
+ end
81
+
82
+ #------------------------------------------------------------------------------
83
+ describe 'API' do
84
+
85
+ #------------------------------------------------------------------------------
86
+ it 'extend tlds' do
87
+ l = Linkify.new
88
+
89
+ expect(l.test('google.myroot')).to_not eq true
90
+
91
+ l.tlds('myroot', true)
92
+
93
+ expect(l.test('google.myroot')).to eq true
94
+ expect(l.test('google.xyz')).to_not eq true
95
+
96
+ # this is some other package of tlds which we don't have
97
+ # l.tlds(require('tlds'));
98
+ # assert.ok(l.test('google.xyz'));
99
+ # assert.notOk(l.test('google.myroot'));
100
+ end
101
+
102
+
103
+ # TODO Tests not passing
104
+ #------------------------------------------------------------------------------
105
+ # it 'add rule as regexp, with default normalizer' do
106
+ # l = Linkify.new.add('my:', {validate: /^\/\/[a-z]+/} )
107
+ #
108
+ # match = l.match('google.com. my:// my://asdf!')
109
+ #
110
+ # expect(match[0].text).to eq 'google.com'
111
+ # expect(match[1].text).to eq 'my://asdf'
112
+ # end
113
+
114
+ # TODO Tests not passing
115
+ #------------------------------------------------------------------------------
116
+ # it 'add rule with normalizer'
117
+ # l = Linkify.new.add('my:', {
118
+ # validate: /^\/\/[a-z]+/,
119
+ # normalize: lambda {|m|
120
+ # m.text = m.text.sub(/^my:\/\//, '').upcase
121
+ # m.url = m.url.upcase
122
+ # }
123
+ # })
124
+ #
125
+ # match = l.match('google.com. my:// my://asdf!')
126
+ #
127
+ # expect(match[1].text).to eq 'ASDF'
128
+ # expect(match[1].url).to eq 'MY://ASDF'
129
+ # end
130
+
131
+ # it('disable rule', function () {
132
+ # var l = linkify();
133
+ #
134
+ # assert.ok(l.test('http://google.com'));
135
+ # assert.ok(l.test('foo@bar.com'));
136
+ # l.add('http:', null);
137
+ # l.add('mailto:', null);
138
+ # assert.notOk(l.test('http://google.com'));
139
+ # assert.notOk(l.test('foo@bar.com'));
140
+ # });
141
+ #
142
+ #
143
+ # it('add bad definition', function () {
144
+ # var l;
145
+ #
146
+ # l = linkify();
147
+ #
148
+ # assert.throw(function () {
149
+ # l.add('test:', []);
150
+ # });
151
+ #
152
+ # l = linkify();
153
+ #
154
+ # assert.throw(function () {
155
+ # l.add('test:', { validate: [] });
156
+ # });
157
+ #
158
+ # l = linkify();
159
+ #
160
+ # assert.throw(function () {
161
+ # l.add('test:', {
162
+ # validate: function () { return false; },
163
+ # normalize: 'bad'
164
+ # });
165
+ # });
166
+ # });
167
+ #
168
+ #
169
+ # it('test at position', function () {
170
+ # var l = linkify();
171
+ #
172
+ # assert.ok(l.testSchemaAt('http://google.com', 'http:', 5));
173
+ # assert.ok(l.testSchemaAt('http://google.com', 'HTTP:', 5));
174
+ # assert.notOk(l.testSchemaAt('http://google.com', 'http:', 6));
175
+ #
176
+ # assert.notOk(l.testSchemaAt('http://google.com', 'bad_schema:', 6));
177
+ # });
178
+ #
179
+ #
180
+ # it('correct cache value', function () {
181
+ # var l = linkify();
182
+ #
183
+ # var match = l.match('.com. http://google.com google.com ftp://google.com');
184
+ #
185
+ # assert.equal(match[0].text, 'http://google.com');
186
+ # assert.equal(match[1].text, 'google.com');
187
+ # assert.equal(match[2].text, 'ftp://google.com');
188
+ # });
189
+ #
190
+ # it('normalize', function () {
191
+ # var l = linkify(), m;
192
+ #
193
+ # m = l.match('mailto:foo@bar.com')[0];
194
+ #
195
+ # // assert.equal(m.text, 'foo@bar.com');
196
+ # assert.equal(m.url, 'mailto:foo@bar.com');
197
+ #
198
+ # m = l.match('foo@bar.com')[0];
199
+ #
200
+ # // assert.equal(m.text, 'foo@bar.com');
201
+ # assert.equal(m.url, 'mailto:foo@bar.com');
202
+ # });
203
+ #
204
+ #
205
+ # it('test @twitter rule', function () {
206
+ # var l = linkify().add('@', {
207
+ # validate: function (text, pos, self) {
208
+ # var tail = text.slice(pos);
209
+ #
210
+ # if (!self.re.twitter) {
211
+ # self.re.twitter = new RegExp(
212
+ # '^([a-zA-Z0-9_]){1,15}(?!_)(?=$|' + self.re.src_ZPCcCf + ')'
213
+ # );
214
+ # }
215
+ # if (self.re.twitter.test(tail)) {
216
+ # if (pos >= 2 && tail[pos - 2] === '@') {
217
+ # return false;
218
+ # }
219
+ # return tail.match(self.re.twitter)[0].length;
220
+ # }
221
+ # return 0;
222
+ # },
223
+ # normalize: function (m) {
224
+ # m.url = 'https://twitter.com/' + m.url.replace(/^@/, '');
225
+ # }
226
+ # });
227
+ #
228
+ # assert.equal(l.match('hello, @gamajoba_!')[0].text, '@gamajoba_');
229
+ # assert.equal(l.match(':@givi')[0].text, '@givi');
230
+ # assert.equal(l.match(':@givi')[0].url, 'https://twitter.com/givi');
231
+ # assert.notOk(l.test('@@invalid'));
232
+ # });
233
+
234
+ end
spec/spec_helper.rb ADDED
@@ -0,0 +1,2 @@
1
+ require 'byebug'
2
+ require 'linkify-it-rb'
metadata ADDED
@@ -0,0 +1,67 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: linkify-it-rb
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Brett Walker
8
+ - Vitaly Puzrin
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2015-03-26 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: uc.micro-rb
16
+ requirement: !ruby/object:Gem::Requirement
17
+ requirements:
18
+ - - "~>"
19
+ - !ruby/object:Gem::Version
20
+ version: '1.0'
21
+ type: :runtime
22
+ prerelease: false
23
+ version_requirements: !ruby/object:Gem::Requirement
24
+ requirements:
25
+ - - "~>"
26
+ - !ruby/object:Gem::Version
27
+ version: '1.0'
28
+ description: Ruby version of linkify-it for motion-markdown-it, for Ruby and RubyMotion
29
+ email: github@digitalmoksha.com
30
+ executables: []
31
+ extensions: []
32
+ extra_rdoc_files: []
33
+ files:
34
+ - README.md
35
+ - lib/linkify-it-rb.rb
36
+ - lib/linkify-it-rb/index.rb
37
+ - lib/linkify-it-rb/re.rb
38
+ - lib/linkify-it-rb/version.rb
39
+ - spec/linkify-it-rb/test_spec.rb
40
+ - spec/spec_helper.rb
41
+ homepage: https://github.com/digitalmoksha/linkify-it-rb
42
+ licenses:
43
+ - MIT
44
+ metadata: {}
45
+ post_install_message:
46
+ rdoc_options: []
47
+ require_paths:
48
+ - lib
49
+ required_ruby_version: !ruby/object:Gem::Requirement
50
+ requirements:
51
+ - - ">="
52
+ - !ruby/object:Gem::Version
53
+ version: '0'
54
+ required_rubygems_version: !ruby/object:Gem::Requirement
55
+ requirements:
56
+ - - ">="
57
+ - !ruby/object:Gem::Version
58
+ version: '0'
59
+ requirements: []
60
+ rubyforge_project:
61
+ rubygems_version: 2.4.5
62
+ signing_key:
63
+ specification_version: 4
64
+ summary: linkify-it for motion-markdown-it in Ruby
65
+ test_files:
66
+ - spec/linkify-it-rb/test_spec.rb
67
+ - spec/spec_helper.rb