linkify-it-rb 0.1.0.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 8899eb2fafae9f6ce4105221dc2342c93de1ca5c
4
+ data.tar.gz: 493bb38504a90c6f6c9dfa870364758370ee548e
5
+ SHA512:
6
+ metadata.gz: 01ebcaaaa3238631990a3212f5058d69bd58b7f336576e82ac101745f25ddf7dbd948b81013eda800a388d845ab99fa23f75d2d6d90db35368d90a9b92e5f6b2
7
+ data.tar.gz: 0b9c9fbfe357d9b3c78c2def0bd9f51f08cb0b39335e85c8bb791d483c4e589bf3a5d860357584591e3fe8277f6226310dff04a5235ed0c783472d7ac7803663
README.md ADDED
@@ -0,0 +1,170 @@
1
+ # linkify-it-rb
2
+
3
+ Link recognition library with full Unicode support, focused on high-quality link pattern detection in plain text. For use with both Ruby and RubyMotion.
4
+
5
+ This gem is a port of the [linkify-it javascript package](https://github.com/markdown-it/linkify-it) by Vitaly Puzrin, that is used for the [markdown-it](https://github.com/markdown-it/markdown-it) package.
6
+
7
+ __[Javascript Demo](http://markdown-it.github.io/linkify-it/)__
8
+
9
+ _Note:_ This gem is still a work in progress; some of the Unicode support is still being implemented.
10
+
11
+
12
+ ## To be updated: Original Javascript package documentation
13
+
14
+ Why it's awesome:
15
+
16
+ - Full unicode support, _with astral characters_!
17
+ - International domains support.
18
+ - Allows rules extension & custom normalizers.
19
+
20
+
21
+ Install
22
+ -------
23
+
24
+ ```bash
25
+ npm install linkify-it --save
26
+ ```
27
+
28
+ Browserification is also supported.
29
+
30
+
31
+ Usage examples
32
+ --------------
33
+
34
+ ##### Example 1
35
+
36
+ ```js
37
+ var linkify = require('linkify-it')();
38
+
39
+ // Reload full tlds list & add unofficial `.onion` domain.
40
+ linkify
41
+ .tlds(require('tlds')) // Reload with full tlds list
42
+ .tlds('.onion', true); // Add unofficial `.onion` domain
43
+ linkify.add('git:', 'http:'); // Add `git:` protocol as "alias"
44
+ linkify.add('ftp:', null); // Disable `ftp:` protocol
45
+
46
+ console.log(linkify.test('Site github.com!')); // true
47
+
48
+ console.log(linkify.match('Site github.com!')); // [ {
49
+ // schema: "",
50
+ // index: 5,
51
+ // lastIndex: 15,
52
+ // raw: "github.com",
53
+ // text: "github.com",
54
+ // url: "http://github.com",
55
+ // } ]
56
+ ```
57
+
58
+ ##### Example 2. Add a Twitter mentions handler
59
+
60
+ ```js
61
+ linkify.add('@', {
62
+ validate: function (text, pos, self) {
63
+ var tail = text.slice(pos);
64
+
65
+ if (!self.re.twitter) {
66
+ self.re.twitter = new RegExp(
67
+ '^([a-zA-Z0-9_]){1,15}(?!_)(?=$|' + self.re.src_ZPCcCf + ')'
68
+ );
69
+ }
70
+ if (self.re.twitter.test(tail)) {
71
+ // Linkifier allows punctuation chars before prefix,
72
+ // but we additionally disable `@` ("@@mention" is invalid)
73
+ if (pos >= 2 && tail[pos - 2] === '@') {
74
+ return false;
75
+ }
76
+ return tail.match(self.re.twitter)[0].length;
77
+ }
78
+ return 0;
79
+ },
80
+ normalize: function (match) {
81
+ match.url = 'https://twitter.com/' + match.url.replace(/^@/, '');
82
+ }
83
+ });
84
+ ```
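In the Ruby port, the same handler could presumably be written with lambdas, following the shape of `DEFAULT_SCHEMAS` in this gem's `index.rb`. This is a simplified, self-contained sketch: the Unicode boundary class `src_ZPCcCf` from the JS version is replaced with a plain word-character lookahead, and the rule hash here is illustrative, not part of the gem.

```ruby
# Hypothetical '@' mention rule: the validator receives (text, pos, obj) and
# returns the matched tail length after the prefix, or 0 on failure.
TWITTER_RE = /\A[a-zA-Z0-9_]{1,15}(?![a-zA-Z0-9_])/

MENTION_RULE = {
  validate: lambda do |text, pos, _obj|
    # "@@mention" is invalid: reject when the char before the '@' is another '@'
    return 0 if pos >= 2 && text[pos - 2] == '@'
    m = TWITTER_RE.match(text[pos..-1])
    m ? m[0].length : 0
  end,
  normalize: lambda do |match, _obj|
    match[:url] = 'https://twitter.com/' + match[:url].sub(/\A@/, '')
  end
}
```

A rule like this would be registered with `add('@', MENTION_RULE)` on a linkifier instance.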
85
+
86
+
87
+ API
88
+ ---
89
+
90
+ __[API documentation](http://markdown-it.github.io/linkify-it/doc)__
91
+
92
+ ### new LinkifyIt(schemas)
93
+
94
+ Creates new linkifier instance with optional additional schemas.
95
+ Can be called without `new` keyword for convenience.
96
+
97
+ By default understands:
98
+
99
+ - `http(s)://...` , `ftp://...`, `mailto:...` & `//...` links
100
+ - "fuzzy" links and emails (google.com, foo@bar.com).
101
+
102
+ `schemas` is an object, where each key/value describes protocol/rule:
103
+
104
+ - __key__ - link prefix (usually, protocol name with `:` at the end, `skype:`
105
+ for example). `linkify-it` makes sure that the prefix is not preceded with an
106
+ alphanumeric char.
107
+ - __value__ - rule to check tail after link prefix
108
+ - _String_ - just alias to existing rule
109
+ - _Object_
110
+ - _validate_ - validator function (should return matched length on success),
111
+ or `RegExp`.
112
+ - _normalize_ - optional function to normalize text & url of matched result
113
+ (for example, for twitter mentions).
114
+
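In the Ruby port, a custom `schemas` entry would presumably take the same shape as the entries in `DEFAULT_SCHEMAS` in this gem's `index.rb`. The `geo:` rule name and the coordinate regexp below are illustrative assumptions, not part of the gem:

```ruby
# Hypothetical 'geo:' rule: the validator returns the length of the matched
# tail after the prefix, or 0 when nothing matches.
GEO_RULE = {
  validate: lambda do |text, pos, _obj|
    m = /\A-?\d{1,2}\.\d+,-?\d{1,3}\.\d+/.match(text[pos..-1])
    m ? m[0].length : 0
  end
}
# Would presumably be passed as: Linkify.new('geo:' => GEO_RULE)
```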
115
+
116
+ ### .test(text)
117
+
118
+ Searches for a linkifiable pattern and returns `true` on success, or `false` on failure.
119
+
120
+
121
+ ### .pretest(text)
122
+
123
+ Quick check for whether a link may exist. Can be used to optimize more expensive
124
+ `.test()` calls. Returns `false` if a link cannot be found, and `true` if a
125
+ `.test()` call is needed to know for sure.
126
+
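The idea behind a pretest can be sketched in plain Ruby (the names here are illustrative, not the gem's API): a cheap character check rules out most strings before the expensive full scan runs.

```ruby
# Cheap pre-filter: any linkifiable text must contain ':', '@', or '.' somewhere.
PRETEST_RE = /[:@.]/

def may_contain_link?(text)
  !(PRETEST_RE =~ text).nil?
end
```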
127
+
128
+ ### .testSchemaAt(text, name, offset)
129
+
130
+ Similar to `.test()` but checks only specific protocol tail exactly at given
131
+ position. Returns length of found pattern (0 on fail).
132
+
133
+
134
+ ### .match(text)
135
+
136
+ Returns an `Array` of found link matches, or `null` if nothing is found.
137
+
138
+ Each match has:
139
+
140
+ - __schema__ - link schema, can be empty for fuzzy links, or `//` for
141
+ protocol-neutral links.
142
+ - __index__ - offset of matched text
143
+ - __lastIndex__ - index of next char after match end
144
+ - __raw__ - matched text
145
+ - __text__ - normalized text
146
+ - __url__ - link, generated from matched text
147
+
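Internally, matching repeatedly re-tests the remaining tail and shifts past each hit. The loop can be sketched in standalone Ruby (a simplified illustration of the scanning strategy, not the gem's code):

```ruby
# Collect successive non-overlapping matches by advancing a shift past each hit.
def scan_links(text, re)
  result = []
  shift = 0
  while (m = re.match(text, shift))
    result << { index: m.begin(0), lastIndex: m.end(0), raw: m[0] }
    shift = m.end(0) # continue scanning after the previous match
  end
  result
end
```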
148
+
149
+ ### .tlds(list[, keepOld])
150
+
151
+ Load (or merge) a new tlds list. These are used for fuzzy links (without a prefix)
152
+ to avoid false positives. By default this algorithm is used:
153
+
154
+ - hostnames with any 2-letter root zone are ok.
155
+ - biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф
156
+ are ok.
157
+ - encoded (`xn--...`) root zones are ok.
158
+
159
+ If the list is replaced, then an exact match is required even for 2-letter root zones.
160
+
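The effect of the TLD list can be sketched by building the alternation by hand. This is a simplified illustration (the real pattern also handles `xn--` zones, ports, and Unicode hosts):

```ruby
# Build a fuzzy-host pattern from a TLD list; unknown roots are rejected,
# but any two-letter root zone is accepted by default.
TLDS = %w[com org net] + ['[a-z]{2}']
HOST_RE = /\A(?:[a-z0-9-]+\.)+(?:#{TLDS.join('|')})\z/i

HOST_RE =~ 'example.com'      # matches a listed root zone
HOST_RE =~ 'example.de'       # matches via the two-letter fallback
HOST_RE =~ 'example.invalid'  # no match
```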
161
+
162
+ ### .add(schema, definition)
163
+
164
+ Add a new rule with the `schema` prefix. For definition details see the constructor
165
+ description. To disable an existing rule, use `.add(name, null)`.
166
+
167
+
168
+ ## License
169
+
170
+ [MIT](https://github.com/markdown-it/linkify-it/blob/master/LICENSE)
lib/linkify-it-rb.rb ADDED
@@ -0,0 +1,18 @@
1
+ # encoding: utf-8
2
+
3
+ if defined?(Motion::Project::Config)
4
+
5
+ lib_dir_path = File.dirname(File.expand_path(__FILE__))
6
+ Motion::Project::App.setup do |app|
7
+ app.files.unshift(Dir.glob(File.join(lib_dir_path, "linkify-it-rb/**/*.rb")))
8
+ end
9
+
10
+ require 'uc.micro-rb'
11
+
12
+ else
13
+
14
+ require 'uc.micro-rb'
15
+ require 'linkify-it-rb/re'
16
+ require 'linkify-it-rb/index'
17
+
18
+ end
lib/linkify-it-rb/index.rb ADDED
@@ -0,0 +1,503 @@
1
+ class Linkify
2
+ include ::LinkifyRe
3
+
4
+ attr_accessor :__index__, :__last_index__, :__text_cache__, :__schema__, :__compiled__
5
+ attr_accessor :re, :bypass_normalizer
6
+
7
+ # DON'T try to make PRs with changes. Extend TLDs with LinkifyIt.tlds() instead
8
+ TLDS_DEFAULT = 'biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф'.split('|')
9
+
10
+ DEFAULT_SCHEMAS = {
11
+ 'http:' => {
12
+ validate: lambda do |text, pos, obj|
13
+ tail = text.slice(pos..-1)
14
+
15
+ if (!obj.re[:http])
16
+ # compile lazily, because "host"-containing variables can change on tlds update.
17
+ obj.re[:http] = Regexp.new('^\\/\\/' + LinkifyRe::SRC_AUTH + LinkifyRe::SRC_HOST_PORT_STRICT + LinkifyRe::SRC_PATH, Regexp::IGNORECASE)
18
+ end
19
+ if obj.re[:http] =~ tail
20
+ return tail.match(obj.re[:http])[0].length
21
+ end
22
+ return 0
23
+ end
24
+ },
25
+ 'https:' => 'http:',
26
+ 'ftp:' => 'http:',
27
+ '//' => {
28
+ validate: lambda do |text, pos, obj|
29
+ tail = text.slice(pos..-1)
30
+
31
+ if (!obj.re[:no_http])
32
+ # compile lazily, because "host"-containing variables can change on tlds update.
33
+ obj.re[:no_http] = Regexp.new('^' + LinkifyRe::SRC_AUTH + LinkifyRe::SRC_HOST_PORT_STRICT + LinkifyRe::SRC_PATH, Regexp::IGNORECASE)
34
+ end
35
+
36
+ if (obj.re[:no_http] =~ tail)
37
+ # should not be `://`, that protects from errors in protocol name
38
+ return 0 if (pos >= 3 && text[pos - 3] == ':')
39
+ return tail.match(obj.re[:no_http])[0].length
40
+ end
41
+ return 0
42
+ end
43
+ },
44
+ 'mailto:' => {
45
+ validate: lambda do |text, pos, obj|
46
+ tail = text.slice(pos..-1)
47
+
48
+ if (!obj.re[:mailto])
49
+ obj.re[:mailto] = Regexp.new('^' + LinkifyRe::SRC_EMAIL_NAME + '@' + LinkifyRe::SRC_HOST_STRICT, Regexp::IGNORECASE)
50
+ end
51
+ if (obj.re[:mailto] =~ tail)
52
+ return tail.match(obj.re[:mailto])[0].length
53
+ end
54
+ return 0
55
+ end
56
+ }
57
+ }
58
+
59
+ #------------------------------------------------------------------------------
60
+ def escapeRE(str)
61
+ return str.gsub(/[\.\?\*\+\^\$\[\]\\\(\)\{\}\|\-]/) { |m| "\\" + m } # block form: "\\$&" is not a backreference in Ruby
62
+ end
63
+
64
+ #------------------------------------------------------------------------------
65
+ def resetScanCache
66
+ @__index__ = -1
67
+ @__text_cache__ = ''
68
+ end
69
+
70
+ #------------------------------------------------------------------------------
71
+ def createValidator(re)
72
+ return lambda do |text, pos, obj|
73
+ tail = text.slice(pos..-1)
74
+
75
+ (re =~ tail) ? tail.match(re)[0].length : 0
76
+ end
77
+ end
78
+
79
+ #------------------------------------------------------------------------------
80
+ def createNormalizer()
81
+ return lambda do |match, obj|
82
+ obj.normalize(match)
83
+ end
84
+ end
85
+
86
+ # Schemas compiler. Build regexps.
87
+ #
88
+ #------------------------------------------------------------------------------
89
+ def compile
90
+
91
+ # Load & clone RE patterns.
92
+ re = @re = {} #.merge!(require('./lib/re'))
93
+
94
+ # Define dynamic patterns
95
+ tlds = @__tlds__.dup
96
+
97
+ if (!@__tlds_replaced__)
98
+ tlds.push('[a-z]{2}')
99
+ end
100
+ tlds.push(LinkifyRe::SRC_XN) # @re was just reset above, so reference the constant directly
101
+
102
+ re[:src_tlds] = tlds.join('|')
103
+
104
+ untpl = lambda { |tpl| tpl.gsub('%TLDS%', re[:src_tlds]) }
105
+
106
+ re[:email_fuzzy] = Regexp.new(untpl.call(LinkifyRe::TPL_EMAIL_FUZZY), Regexp::IGNORECASE)
107
+ re[:link_fuzzy] = Regexp.new(untpl.call(LinkifyRe::TPL_LINK_FUZZY), Regexp::IGNORECASE)
108
+ re[:host_fuzzy_test] = Regexp.new(untpl.call(LinkifyRe::TPL_HOST_FUZZY_TEST), Regexp::IGNORECASE)
109
+
110
+ #
111
+ # Compile each schema
112
+ #
113
+
114
+ aliases = []
115
+
116
+ @__compiled__ = {} # Reset compiled data
117
+
118
+ schemaError = lambda do |name, val|
119
+ raise Error, ('(LinkifyIt) Invalid schema "' + name + '": ' + val)
120
+ end
121
+
122
+ @__schemas__.each do |name, val|
123
+
124
+ # skip disabled methods
125
+ next if (val == nil)
126
+
127
+ compiled = { validate: nil, link: nil }
128
+
129
+ @__compiled__[name] = compiled
130
+
131
+ if (val.is_a? Hash)
132
+ if (val[:validate].is_a? Regexp)
133
+ compiled[:validate] = createValidator(val[:validate])
134
+ elsif (val[:validate].is_a? Proc)
135
+ compiled[:validate] = val[:validate]
136
+ else
137
+ schemaError(name, val)
138
+ end
139
+
140
+ if (val[:normalize].is_a? Proc)
141
+ compiled[:normalize] = val[:normalize]
142
+ elsif (!val[:normalize])
143
+ compiled[:normalize] = createNormalizer()
144
+ else
145
+ schemaError(name, val)
146
+ end
147
+ next
148
+ end
149
+
150
+ if (val.is_a? String)
151
+ aliases.push(name)
152
+ next
153
+ end
154
+
155
+ schemaError(name, val)
156
+ end
157
+
158
+ #
159
+ # Compile postponed aliases
160
+ #
161
+
162
+ aliases.each do |an_alias|
163
+ if (!@__compiled__[@__schemas__[an_alias]])
164
+ # Silently fail on missing schemas to avoid errors when a schema is disabled.
165
+ # schemaError(an_alias, self.__schemas__[an_alias]);
166
+ else
167
+ @__compiled__[an_alias][:validate] = @__compiled__[@__schemas__[an_alias]][:validate]
168
+ @__compiled__[an_alias][:normalize] = @__compiled__[@__schemas__[an_alias]][:normalize]
169
+ end
170
+ end
171
+
172
+ #
173
+ # Fake record for guessed links
174
+ #
175
+ @__compiled__[''] = { validate: nil, normalize: createNormalizer }
176
+
177
+ #
178
+ # Build schema condition, and filter disabled & fake schemas
179
+ #
180
+ slist = @__compiled__.select {|name, val| name.length > 0 && !val.nil? }.keys.map {|str| escapeRE(str)}.join('|')
181
+
182
+ # (?!_) cause 1.5x slowdown
183
+ @re[:schema_test] = Regexp.new('(^|(?!_)(?:>|' + LinkifyRe::SRC_Z_P_CC_CF + '))(' + slist + ')', Regexp::IGNORECASE)
184
+ @re[:schema_search] = Regexp.new('(^|(?!_)(?:>|' + LinkifyRe::SRC_Z_P_CC_CF + '))(' + slist + ')', Regexp::IGNORECASE) # Ruby has no 'g' flag; scanning advances manually
185
+
186
+ @re[:pretest] = Regexp.new(
187
+ '(' + @re[:schema_test].source + ')|' +
188
+ '(' + @re[:host_fuzzy_test].source + ')|' + '@', 'i')
189
+
190
+ #
191
+ # Cleanup
192
+ #
193
+
194
+ resetScanCache
195
+ end
196
+
197
+ # Match result. Single element of array, returned by [[LinkifyIt#match]]
198
+ #------------------------------------------------------------------------------
199
+ class Match
200
+ attr_accessor :schema, :index, :lastIndex, :raw, :text, :url
201
+
202
+ def initialize(obj, shift)
203
+ start = obj.__index__
204
+ endt = obj.__last_index__
205
+ text = obj.__text_cache__.slice(start...endt)
206
+
207
+ # Match#schema -> String
208
+ #
209
+ # Prefix (protocol) for matched string.
210
+ @schema = obj.__schema__.downcase
211
+
212
+ # Match#index -> Number
213
+ #
214
+ # First position of matched string.
215
+ @index = start + shift
216
+
217
+ # Match#lastIndex -> Number
218
+ #
219
+ # Next position after matched string.
220
+ @lastIndex = endt + shift
221
+
222
+ # Match#raw -> String
223
+ #
224
+ # Matched string.
225
+ @raw = text
226
+
227
+ # Match#text -> String
228
+ #
229
+ # Normalized text of matched string.
230
+ @text = text
231
+
232
+ # Match#url -> String
233
+ #
234
+ # Normalized url of matched string.
235
+ @url = text
236
+ end
237
+
238
+ #------------------------------------------------------------------------------
239
+ def self.createMatch(obj, shift)
240
+ match = Match.new(obj, shift)
241
+ obj.__compiled__[match.schema][:normalize].call(match, obj)
242
+ return match
243
+ end
244
+ end
245
+
246
+
247
+
248
+ # new LinkifyIt(schemas)
249
+ # - schemas (Object): Optional. Additional schemas to validate (prefix/validator)
250
+ #
251
+ # Creates new linkifier instance with optional additional schemas.
252
+ # Can be called without `new` keyword for convenience.
253
+ #
254
+ # By default understands:
255
+ #
256
+ # - `http(s)://...` , `ftp://...`, `mailto:...` & `//...` links
257
+ # - "fuzzy" links and emails (example.com, foo@bar.com).
258
+ #
259
+ # `schemas` is an object, where each key/value describes protocol/rule:
260
+ #
261
+ # - __key__ - link prefix (usually, protocol name with `:` at the end, `skype:`
262
+ # for example). `linkify-it` makes sure that the prefix is not preceded with an
263
+ # alphanumeric char or symbol. Only whitespace and punctuation are allowed.
264
+ # - __value__ - rule to check tail after link prefix
265
+ # - _String_ - just alias to existing rule
266
+ # - _Object_
267
+ # - _validate_ - validator function (should return matched length on success),
268
+ # or `RegExp`.
269
+ # - _normalize_ - optional function to normalize text & url of matched result
270
+ # (for example, for @twitter mentions).
271
+ #------------------------------------------------------------------------------
272
+ def initialize(schemas = {})
273
+ # if (!(this instanceof LinkifyIt)) {
274
+ # return new LinkifyIt(schemas);
275
+ # }
276
+
277
+ # Cache last tested result. Used to skip repeating steps on next `match` call.
278
+ @__index__ = -1
279
+ @__last_index__ = -1 # Next scan position
280
+ @__schema__ = ''
281
+ @__text_cache__ = ''
282
+
283
+ @__schemas__ = {}.merge!(DEFAULT_SCHEMAS).merge!(schemas)
284
+ @__compiled__ = {}
285
+
286
+ @__tlds__ = TLDS_DEFAULT
287
+ @__tlds_replaced__ = false
288
+
289
+ @re = {}
290
+
291
+ @bypass_normalizer = false # only used in testing scenarios
292
+
293
+ compile
294
+ end
295
+
296
+
297
+ # chainable
298
+ # LinkifyIt#add(schema, definition)
299
+ # - schema (String): rule name (fixed pattern prefix)
300
+ # - definition (String|RegExp|Object): schema definition
301
+ #
302
+ # Add new rule definition. See constructor description for details.
303
+ #------------------------------------------------------------------------------
304
+ def add(schema, definition)
305
+ @__schemas__[schema] = definition
306
+ compile
307
+ return self
308
+ end
309
+
310
+
311
+ # LinkifyIt#test(text) -> Boolean
312
+ #
313
+ # Searches linkifiable pattern and returns `true` on success or `false` on fail.
314
+ #------------------------------------------------------------------------------
315
+ def test(text)
316
+ # Reset scan cache
317
+ @__text_cache__ = text
318
+ @__index__ = -1
319
+
320
+ return false if text.empty? # NB: 0 is truthy in Ruby, so !text.length would never be true
321
+
322
+ # try to scan for link with schema - that's the most simple rule
323
+ if @re[:schema_test] =~ text
324
+ re = @re[:schema_search]
325
+ pos = 0 # scan position (Ruby regexps have no sticky lastIndex)
326
+ while ((m = re.match(text, pos)) != nil)
327
+ len = testSchemaAt(text, m[2], pos = m.end(0)) # also advances the scan position past this candidate
328
+ if (len > 0) # NB: 0 is truthy in Ruby, so compare explicitly
329
+ @__schema__ = m[2]
330
+ @__index__ = m.begin(0) + m[1].length
331
+ @__last_index__ = m.begin(0) + m[0].length + len
332
+ break
333
+ end
334
+ end
335
+ end
336
+
337
+ if (@__compiled__['http:'])
338
+ # guess schemaless links
339
+
340
+ tld_pos = text.index(@re[:host_fuzzy_test])
341
+ if !tld_pos.nil?
342
+ # if tld is located after found link - no need to check fuzzy pattern
343
+ if (@__index__ < 0 || tld_pos < @__index__)
344
+ if ((ml = text.match(@re[:link_fuzzy])) != nil)
345
+
346
+ shift = ml.begin(0) + ml[1].length
347
+
348
+ if (@__index__ < 0 || shift < @__index__)
349
+ @__schema__ = ''
350
+ @__index__ = shift
351
+ @__last_index__ = ml.begin(0) + ml[0].length
352
+ end
353
+ end
354
+ end
355
+ end
356
+ end
357
+
358
+ if (@__compiled__['mailto:'])
359
+ # guess schemaless emails
360
+ at_pos = text.index('@')
361
+ if !at_pos.nil?
362
+ # We can't skip this check, because these cases are possible:
363
+ # 192.168.1.1@gmail.com, my.in@example.com
364
+ if ((me = text.match(@re[:email_fuzzy])) != nil)
365
+
366
+ shift = me.begin(0) + me[1].length
367
+ nextc = me.begin(0) + me[0].length
368
+
369
+ if (@__index__ < 0 || shift < @__index__ ||
370
+ (shift == @__index__ && nextc > @__last_index__))
371
+ @__schema__ = 'mailto:'
372
+ @__index__ = shift
373
+ @__last_index__ = nextc
374
+ end
375
+ end
376
+ end
377
+ end
378
+
379
+ return @__index__ >= 0
380
+ end
381
+
382
+
383
+ # LinkifyIt#pretest(text) -> Boolean
384
+ #
385
+ # Very quick check that can give false positives. Returns `true` if a link may
386
+ # exist. Can be used as a speed optimization when you need to check that a
387
+ # link does NOT exist.
388
+ #------------------------------------------------------------------------------
389
+ def pretest(text)
390
+ return !(@re[:pretest] =~ text).nil?
391
+ end
392
+
393
+
394
+ # LinkifyIt#testSchemaAt(text, name, position) -> Number
395
+ # - text (String): text to scan
396
+ # - name (String): rule (schema) name
397
+ # - position (Number): text offset to check from
398
+ #
399
+ # Similar to [[LinkifyIt#test]] but checks only specific protocol tail exactly
400
+ # at given position. Returns length of found pattern (0 on fail).
401
+ #------------------------------------------------------------------------------
402
+ def testSchemaAt(text, schema, pos)
403
+ # If not supported schema check requested - terminate
404
+ if (!@__compiled__[schema.downcase])
405
+ return 0
406
+ end
407
+ return @__compiled__[schema.downcase][:validate].call(text, pos, self)
408
+ end
409
+
410
+
411
+ # LinkifyIt#match(text) -> Array|null
412
+ #
413
+ # Returns an array of found link descriptions, or `nil` on fail. We strongly
414
+ # recommend using [[LinkifyIt#test]] first, for best speed.
415
+ #
416
+ # ##### Result match description
417
+ #
418
+ # - __schema__ - link schema, can be empty for fuzzy links, or `//` for
419
+ # protocol-neutral links.
420
+ # - __index__ - offset of matched text
421
+ # - __lastIndex__ - index of next char after match end
422
+ # - __raw__ - matched text
423
+ # - __text__ - normalized text
424
+ # - __url__ - link, generated from matched text
425
+ #------------------------------------------------------------------------------
426
+ def match(text)
427
+ shift = 0
428
+ result = []
429
+
430
+ # Try to take previous element from cache, if .test() called before
431
+ if (@__index__ >= 0 && @__text_cache__ == text)
432
+ result.push(Match.createMatch(self, shift))
433
+ shift = @__last_index__
434
+ end
435
+
436
+ # Cut head if cache was used
437
+ tail = shift > 0 ? text.slice(shift..-1) : text # NB: 0 is truthy in Ruby, so compare explicitly
438
+
439
+ # Scan string until end reached
440
+ while (self.test(tail))
441
+ result.push(Match.createMatch(self, shift))
442
+
443
+ tail = tail.slice(@__last_index__..-1)
444
+ shift += @__last_index__
445
+ end
446
+
447
+ if (result.length > 0) # NB: 0 is truthy in Ruby
448
+ return result
449
+ end
450
+
451
+ return nil
452
+ end
453
+
454
+
455
+ # chainable
456
+ # LinkifyIt#tlds(list [, keepOld]) -> this
457
+ # - list (Array): list of tlds
458
+ # - keepOld (Boolean): merge with current list if `true` (`false` by default)
459
+ #
460
+ # Load (or merge) a new tlds list. These are used for fuzzy links (without a prefix)
461
+ # to avoid false positives. By default this algorithm is used:
462
+ #
463
+ # - hostname with any 2-letter root zones are ok.
464
+ # - biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф
465
+ # are ok.
466
+ # - encoded (`xn--...`) root zones are ok.
467
+ #
468
+ # If list is replaced, then exact match for 2-chars root zones will be checked.
469
+ #------------------------------------------------------------------------------
470
+ def tlds(list, keepOld = false) # keepOld is optional, matching the documented `tlds(list [, keepOld])`
471
+ list = list.is_a?(Array) ? list : [ list ]
472
+
473
+ if (!keepOld)
474
+ @__tlds__ = list.dup
475
+ @__tlds_replaced__ = true
476
+ compile
477
+ return self
478
+ end
479
+
480
+ @__tlds__ = @__tlds__.concat(list).sort.uniq.reverse
481
+
482
+ compile
483
+ return self
484
+ end
485
+
486
+ # LinkifyIt#normalize(match)
487
+ #
488
+ # Default normalizer (if the schema does not define its own).
489
+ #------------------------------------------------------------------------------
490
+ def normalize(match)
491
+ return if @bypass_normalizer
492
+
493
+ # Do minimal possible changes by default. Need to collect feedback prior
494
+ # to move forward https://github.com/markdown-it/linkify-it/issues/1
495
+
496
+ match.url = 'http://' + match.url if !match.schema
497
+
498
+ if (match.schema == 'mailto:' && !(/^mailto\:/i =~ match.url))
499
+ match.url = 'mailto:' + match.url
500
+ end
501
+ end
502
+
503
+ end
lib/linkify-it-rb/re.rb ADDED
@@ -0,0 +1,111 @@
1
+ module LinkifyRe
2
+
3
+ # Use direct extract instead of `regenerate` to reduce size
4
+ SRC_ANY = UCMicro::Properties::Any::REGEX
5
+ SRC_CC = UCMicro::Categories::Cc::REGEX
6
+ SRC_CF = UCMicro::Categories::Cf::REGEX
7
+ SRC_Z = UCMicro::Categories::Z::REGEX
8
+ SRC_P = UCMicro::Categories::P::REGEX
9
+
10
+ # \p{\Z\P\Cc\CF} (white spaces + control + format + punctuation)
11
+ SRC_Z_P_CC_CF = [ SRC_Z, SRC_P, SRC_CC, SRC_CF ].join('|')
12
+
13
+ # \p{\Z\Cc\CF} (white spaces + control + format)
14
+ SRC_Z_CC_CF = [ SRC_Z, SRC_CC, SRC_CF ].join('|')
15
+
16
+ # All possible word characters (everything without punctuation, spaces & controls)
17
+ # Defined via punctuation & spaces to save space
18
+ # Should be something like \p{\L\N\S\M} (\w but without `_`)
19
+ SRC_PSEUDO_LETTER = '(?:(?!' + SRC_Z_P_CC_CF + ')' + SRC_ANY.source + ')'
20
+ # The same as above but without [0-9]
21
+ SRC_PSEUDO_LETTER_NON_D = '(?:(?![0-9]|' + SRC_Z_P_CC_CF + ')' + SRC_ANY.source + ')'
22
+
23
+ #------------------------------------------------------------------------------
24
+
25
+ SRC_IP4 = '(?:(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
26
+ SRC_AUTH = '(?:(?:(?!' + SRC_Z_CC_CF + ').)+@)?'
27
+
28
+ SRC_PORT = '(?::(?:6(?:[0-4]\\d{3}|5(?:[0-4]\\d{2}|5(?:[0-2]\\d|3[0-5])))|[1-5]?\\d{1,4}))?'
29
+
30
+ SRC_HOST_TERMINATOR = '(?=$|' + SRC_Z_P_CC_CF + ')(?!-|_|:\\d|\\.-|\\.(?!$|' + SRC_Z_P_CC_CF + '))'
31
+
32
+ SRC_PATH =
33
+ '(?:' +
34
+ '[/?#]' +
35
+ '(?:' +
36
+ '(?!' + SRC_Z_CC_CF + '|[()\\[\\]{}.,"\'?!\\-]).|' +
37
+ '\\[(?:(?!' + SRC_Z_CC_CF + '|\\]).)*\\]|' +
38
+ '\\((?:(?!' + SRC_Z_CC_CF + '|[)]).)*\\)|' +
39
+ '\\{(?:(?!' + SRC_Z_CC_CF + '|[}]).)*\\}|' +
40
+ '\\"(?:(?!' + SRC_Z_CC_CF + '|["]).)+\\"|' +
41
+ "\\'(?:(?!" + SRC_Z_CC_CF + "|[']).)+\\'|" +
42
+ "\\'(?=" + SRC_PSEUDO_LETTER + ').|' + # allow `I'm_king` if no pair found
43
+ '\\.{2,3}[a-zA-Z0-9%]|' + # github has ... in commit range links. Restrict to
44
+ # english & percent-encoded only, until more examples found.
45
+ '\\.(?!' + SRC_Z_CC_CF + '|[.]).|' +
46
+ '\\-(?!' + SRC_Z_CC_CF + '|--(?:[^-]|$))(?:[-]+|.)|' + # `---` => long dash, terminate
47
+ '\\,(?!' + SRC_Z_CC_CF + ').|' + # allow `,,,` in paths
48
+ '\\!(?!' + SRC_Z_CC_CF + '|[!]).|' +
49
+ '\\?(?!' + SRC_Z_CC_CF + '|[?]).' +
50
+ ')+' +
51
+ '|\\/' +
52
+ ')?'
53
+
54
+ SRC_EMAIL_NAME = '[\\-;:&=\\+\\$,\\"\\.a-zA-Z0-9_]+'
55
+ SRC_XN = 'xn--[a-z0-9\\-]{1,59}'
56
+
57
+ # More to read about domain names
58
+ # http://serverfault.com/questions/638260/
59
+
60
+ SRC_DOMAIN_ROOT =
61
+ # Can't have digits and dashes
62
+ '(?:' +
63
+ SRC_XN +
64
+ '|' +
65
+ SRC_PSEUDO_LETTER_NON_D + '{1,63}' +
66
+ ')'
67
+
68
+ SRC_DOMAIN =
69
+ '(?:' +
70
+ SRC_XN +
71
+ '|' +
72
+ '(?:' + SRC_PSEUDO_LETTER + ')' +
73
+ '|' +
74
+ # don't allow `--` in domain names, because:
75
+ # - that can conflict with markdown &mdash; / &ndash;
76
+ # - nobody use those anyway
77
+ '(?:' + SRC_PSEUDO_LETTER + '(?:-(?!-)|' + SRC_PSEUDO_LETTER + '){0,61}' + SRC_PSEUDO_LETTER + ')' +
78
+ ')'
79
+
80
+ SRC_HOST =
81
+ '(?:' +
82
+ SRC_IP4 +
83
+ '|' +
84
+ '(?:(?:(?:' + SRC_DOMAIN + ')\\.)*' + SRC_DOMAIN_ROOT + ')' +
85
+ ')'
86
+
87
+ TPL_HOST_FUZZY =
88
+ '(?:' +
89
+ SRC_IP4 +
90
+ '|' +
91
+ '(?:(?:(?:' + SRC_DOMAIN + ')\\.)+(?:%TLDS%))' +
92
+ ')'
93
+
94
+ SRC_HOST_STRICT = SRC_HOST + SRC_HOST_TERMINATOR
95
+ TPL_HOST_FUZZY_STRICT = TPL_HOST_FUZZY + SRC_HOST_TERMINATOR
96
+ SRC_HOST_PORT_STRICT = SRC_HOST + SRC_PORT + SRC_HOST_TERMINATOR
97
+ TPL_HOST_PORT_FUZZY_STRICT = TPL_HOST_FUZZY + SRC_PORT + SRC_HOST_TERMINATOR
98
+
99
+ #------------------------------------------------------------------------------
100
+ # Main rules
101
+
102
+ # Rude test fuzzy links by host, for quick deny
103
+ TPL_HOST_FUZZY_TEST = 'localhost|\\.\\d{1,3}\\.|(?:\\.(?:%TLDS%)(?:' + SRC_Z_P_CC_CF + '|$))'
104
+ TPL_EMAIL_FUZZY = '(^|>|' + SRC_Z_CC_CF + ')(' + SRC_EMAIL_NAME + '@' + TPL_HOST_FUZZY_STRICT + ')'
105
+ TPL_LINK_FUZZY =
106
+ # Fuzzy link can't be prepended with .:/\- and non punctuation.
107
+ # but can start with > (markdown blockquote)
108
+ '(^|(?![.:/\\-_@])(?:[$+<=>^`|]|' + SRC_Z_P_CC_CF + '))' +
109
+ '((?![$+<=>^`|])' + TPL_HOST_PORT_FUZZY_STRICT + SRC_PATH + ')'
110
+
111
+ end
lib/linkify-it-rb/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ module LinkifyIt
2
+
3
+ VERSION = '0.1.0.0'
4
+
5
+ end
spec/linkify-it-rb/test_spec.rb ADDED
@@ -0,0 +1,234 @@
1
+ #------------------------------------------------------------------------------
2
+ describe 'links' do
3
+
4
+ # TODO tests which can't seem to get passing at the moment, so skip them
5
+ failing_test = [
6
+ 95, # GOOGLE.COM. unable to get final . to be removed
7
+ 214 # xn--d1abbgf6aiiy.xn--p1ai
8
+ ]
9
+
10
+ l = Linkify.new
11
+ l.bypass_normalizer = true # kill the normalizer
12
+
13
+ skipNext = false
14
+ linkfile = File.join(File.dirname(__FILE__), 'fixtures/links.txt')
15
+ lines = File.read(linkfile).split(/\r?\n/)
16
+ lines.each_with_index do |line, idx|
17
+ if skipNext
18
+ skipNext = false
19
+ next
20
+ end
21
+
22
+ line = line.sub(/^%.*/, '')
23
+ next_line = (lines[idx + 1] || '').sub(/^%.*/, '')
24
+
25
+ next if line.strip.empty?
26
+
27
+ unless failing_test.include?(idx + 1)
28
+ if !next_line.strip.empty?
29
+
30
+ it "line #{idx + 1}" do
31
+ expect(l.pretest(line)).to eq true # "(pretest failed in `#{line}`)"
32
+ expect(l.test("\n#{line}\n")).to eq true # "(link not found in `\n#{line}\n`)"
33
+ expect(l.test(line)).to eq true # "(link not found in `#{line}`)"
34
+ expect(l.match(line)[0].url).to eq next_line
35
+ end
36
+
37
+ skipNext = true
38
+
39
+ else
40
+
41
+ it "line #{idx + 1}" do
42
+ expect(l.pretest(line)).to eq true # "(pretest failed in `#{line}`)"
43
+ expect(l.test("\n#{line}\n")).to eq true # "(link not found in `\n#{line}\n`)"
44
+ expect(l.test(line)).to eq true # "(link not found in `#{line}`)"
45
+ expect(l.match(line)[0].url).to eq line
46
+ end
47
+ end
48
+ end
49
+ end
50
+
51
+ end
52
+
53
+
54
+ #------------------------------------------------------------------------------
55
+ describe 'not links' do
56
+
57
+ # TODO tests which can't seem to get passing at the moment, so skip them
58
+ failing_test = [ 6, 7, 8, 12, 16, 19, 22, 23, 24, 25, 26, 27, 28, 29, 48 ]
59
+
60
+ l = Linkify.new
61
+ l.bypass_normalizer = true # kill the normalizer
62
+
63
+ linkfile = File.join(File.dirname(__FILE__), 'fixtures/not_links.txt')
64
+ lines = File.read(linkfile).split(/\r?\n/)
65
+ lines.each_with_index do |line, idx|
66
+ line = line.sub(/^%.*/, '')
67
+
68
+ next if line.strip.empty?
69
+
70
+ unless failing_test.include?(idx + 1)
71
+ it "line #{idx + 1}" do
72
+ # assert.notOk(l.test(line),
73
+ # '(should not find link in `' + line + '`, but found `' +
74
+ # JSON.stringify((l.match(line) || [])[0]) + '`)');
75
+ expect(l.test(line)).not_to eq true
76
+ end
77
+ end
78
+ end
79
+
80
+ end
81
+
82
+ #------------------------------------------------------------------------------
83
+ describe 'API' do
84
+
85
+ #------------------------------------------------------------------------------
86
+ it 'extend tlds' do
87
+ l = Linkify.new
88
+
89
+ expect(l.test('google.myroot')).to_not eq true
90
+
91
+ l.tlds('myroot', true)
92
+
93
+ expect(l.test('google.myroot')).to eq true
94
+ expect(l.test('google.xyz')).to_not eq true
95
+
96
+ # this is some other package of tlds which we don't have
97
+ # l.tlds(require('tlds'));
98
+ # assert.ok(l.test('google.xyz'));
99
+ # assert.notOk(l.test('google.myroot'));
100
+ end
101
+
102
+
103
+ # TODO Tests not passing
104
+ #------------------------------------------------------------------------------
105
+ # it 'add rule as regexp, with default normalizer' do
106
+ # l = Linkify.new.add('my:', {validate: /^\/\/[a-z]+/} )
107
+ #
108
+ # match = l.match('google.com. my:// my://asdf!')
109
+ #
110
+ # expect(match[0].text).to eq 'google.com'
111
+ # expect(match[1].text).to eq 'my://asdf'
112
+ # end
113
+
114
+ # TODO Tests not passing
115
+ #------------------------------------------------------------------------------
116
+ # it 'add rule with normalizer'
117
+ # l = Linkify.new.add('my:', {
118
+ # validate: /^\/\/[a-z]+/,
119
+ # normalize: lambda {|m|
120
+ # m.text = m.text.sub(/^my:\/\//, '').upcase
121
+ # m.url = m.url.upcase
122
+ # }
123
+ # })
124
+ #
125
+ # match = l.match('google.com. my:// my://asdf!')
126
+ #
127
+ # expect(match[1].text).to eq 'ASDF'
128
+ # expect(match[1].url).to eq 'MY://ASDF'
129
+ # end
130
+
131
+ # it('disable rule', function () {
132
+ # var l = linkify();
133
+ #
134
+ # assert.ok(l.test('http://google.com'));
135
+ # assert.ok(l.test('foo@bar.com'));
136
+ # l.add('http:', null);
137
+ # l.add('mailto:', null);
138
+ # assert.notOk(l.test('http://google.com'));
139
+ # assert.notOk(l.test('foo@bar.com'));
140
+ # });
141
+ #
142
+ #
143
+ # it('add bad definition', function () {
144
+ # var l;
145
+ #
146
+ # l = linkify();
147
+ #
148
+ # assert.throw(function () {
149
+ # l.add('test:', []);
150
+ # });
151
+ #
152
+ # l = linkify();
153
+ #
154
+ # assert.throw(function () {
155
+ # l.add('test:', { validate: [] });
156
+ # });
157
+ #
158
+ # l = linkify();
159
+ #
160
+ # assert.throw(function () {
161
+ # l.add('test:', {
162
+ # validate: function () { return false; },
163
+ # normalize: 'bad'
164
+ # });
165
+ # });
166
+ # });
167
+ #
168
+ #
169
+ # it('test at position', function () {
170
+ # var l = linkify();
171
+ #
172
+ # assert.ok(l.testSchemaAt('http://google.com', 'http:', 5));
173
+ # assert.ok(l.testSchemaAt('http://google.com', 'HTTP:', 5));
174
+ # assert.notOk(l.testSchemaAt('http://google.com', 'http:', 6));
175
+ #
176
+ # assert.notOk(l.testSchemaAt('http://google.com', 'bad_schema:', 6));
177
+ # });
178
+ #
179
+ #
180
+ # it('correct cache value', function () {
181
+ # var l = linkify();
182
+ #
183
+ # var match = l.match('.com. http://google.com google.com ftp://google.com');
184
+ #
185
+ # assert.equal(match[0].text, 'http://google.com');
186
+ # assert.equal(match[1].text, 'google.com');
187
+ # assert.equal(match[2].text, 'ftp://google.com');
188
+ # });
189
+ #
190
+ # it('normalize', function () {
191
+ # var l = linkify(), m;
192
+ #
193
+ # m = l.match('mailto:foo@bar.com')[0];
194
+ #
195
+ # // assert.equal(m.text, 'foo@bar.com');
196
+ # assert.equal(m.url, 'mailto:foo@bar.com');
197
+ #
198
+ # m = l.match('foo@bar.com')[0];
199
+ #
200
+ # // assert.equal(m.text, 'foo@bar.com');
201
+ # assert.equal(m.url, 'mailto:foo@bar.com');
202
+ # });
203
+ #
204
+ #
205
+ # it('test @twitter rule', function () {
206
+ # var l = linkify().add('@', {
207
+ # validate: function (text, pos, self) {
208
+ # var tail = text.slice(pos);
209
+ #
210
+ # if (!self.re.twitter) {
211
+ # self.re.twitter = new RegExp(
212
+ # '^([a-zA-Z0-9_]){1,15}(?!_)(?=$|' + self.re.src_ZPCcCf + ')'
213
+ # );
214
+ # }
215
+ # if (self.re.twitter.test(tail)) {
216
+ # if (pos >= 2 && tail[pos - 2] === '@') {
217
+ # return false;
218
+ # }
219
+ # return tail.match(self.re.twitter)[0].length;
220
+ # }
221
+ # return 0;
222
+ # },
223
+ # normalize: function (m) {
224
+ # m.url = 'https://twitter.com/' + m.url.replace(/^@/, '');
225
+ # }
226
+ # });
227
+ #
228
+ # assert.equal(l.match('hello, @gamajoba_!')[0].text, '@gamajoba_');
229
+ # assert.equal(l.match(':@givi')[0].text, '@givi');
230
+ # assert.equal(l.match(':@givi')[0].url, 'https://twitter.com/givi');
231
+ # assert.notOk(l.test('@@invalid'));
232
+ # });
233
+
234
+ end
spec/spec_helper.rb ADDED
@@ -0,0 +1,2 @@
1
+ require 'byebug'
2
+ require 'linkify-it-rb'
metadata ADDED
@@ -0,0 +1,67 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: linkify-it-rb
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Brett Walker
8
+ - Vitaly Puzrin
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2015-03-26 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: uc.micro-rb
16
+ requirement: !ruby/object:Gem::Requirement
17
+ requirements:
18
+ - - "~>"
19
+ - !ruby/object:Gem::Version
20
+ version: '1.0'
21
+ type: :runtime
22
+ prerelease: false
23
+ version_requirements: !ruby/object:Gem::Requirement
24
+ requirements:
25
+ - - "~>"
26
+ - !ruby/object:Gem::Version
27
+ version: '1.0'
28
+ description: Ruby version of linkify-it for motion-markdown-it, for Ruby and RubyMotion
29
+ email: github@digitalmoksha.com
30
+ executables: []
31
+ extensions: []
32
+ extra_rdoc_files: []
33
+ files:
34
+ - README.md
35
+ - lib/linkify-it-rb.rb
36
+ - lib/linkify-it-rb/index.rb
37
+ - lib/linkify-it-rb/re.rb
38
+ - lib/linkify-it-rb/version.rb
39
+ - spec/linkify-it-rb/test_spec.rb
40
+ - spec/spec_helper.rb
41
+ homepage: https://github.com/digitalmoksha/linkify-it-rb
42
+ licenses:
43
+ - MIT
44
+ metadata: {}
45
+ post_install_message:
46
+ rdoc_options: []
47
+ require_paths:
48
+ - lib
49
+ required_ruby_version: !ruby/object:Gem::Requirement
50
+ requirements:
51
+ - - ">="
52
+ - !ruby/object:Gem::Version
53
+ version: '0'
54
+ required_rubygems_version: !ruby/object:Gem::Requirement
55
+ requirements:
56
+ - - ">="
57
+ - !ruby/object:Gem::Version
58
+ version: '0'
59
+ requirements: []
60
+ rubyforge_project:
61
+ rubygems_version: 2.4.5
62
+ signing_key:
63
+ specification_version: 4
64
+ summary: linkify-it for motion-markdown-it in Ruby
65
+ test_files:
66
+ - spec/linkify-it-rb/test_spec.rb
67
+ - spec/spec_helper.rb