linkify-it-rb 0.1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+ metadata.gz: 8899eb2fafae9f6ce4105221dc2342c93de1ca5c
+ data.tar.gz: 493bb38504a90c6f6c9dfa870364758370ee548e
+ SHA512:
+ metadata.gz: 01ebcaaaa3238631990a3212f5058d69bd58b7f336576e82ac101745f25ddf7dbd948b81013eda800a388d845ab99fa23f75d2d6d90db35368d90a9b92e5f6b2
+ data.tar.gz: 0b9c9fbfe357d9b3c78c2def0bd9f51f08cb0b39335e85c8bb791d483c4e589bf3a5d860357584591e3fe8277f6226310dff04a5235ed0c783472d7ac7803663
@@ -0,0 +1,170 @@
+ # linkify-it-rb
+
+ Link recognition library with full Unicode support. Focused on high-quality link-pattern detection in plain text. For use with both Ruby and RubyMotion.
+
+ This gem is a port of the [linkify-it javascript package](https://github.com/markdown-it/linkify-it) by Vitaly Puzrin, which is used by the [markdown-it](https://github.com/markdown-it/markdown-it) package.
+
+ __[Javascript Demo](http://markdown-it.github.io/linkify-it/)__
+
+ _Note:_ This gem is still in progress - some of the Unicode support is still being worked on.
+
+
+ ## To be updated: Original Javascript package documentation
+
+ Why it's awesome:
+
+ - Full unicode support, _with astral characters_!
+ - International domains support.
+ - Allows rules extension & custom normalizers.
+
+
+ Install
+ -------
+
+ ```bash
+ npm install linkify-it --save
+ ```
+
+ Browserification is also supported.
+
+
+ Usage examples
+ --------------
+
+ ##### Example 1
+
+ ```js
+ var linkify = require('linkify-it')();
+
+ // Reload full tlds list & add unofficial `.onion` domain.
+ linkify
+   .tlds(require('tlds'))   // Reload with full tlds list
+   .tlds('.onion', true)    // Add unofficial `.onion` domain
+   .add('git:', 'http:')    // Add `git:` protocol as "alias"
+   .add('ftp:', null);      // Disable `ftp:` protocol
+
+ console.log(linkify.test('Site github.com!')); // true
+
+ console.log(linkify.match('Site github.com!')); // [ {
+ //   schema: "",
+ //   index: 5,
+ //   lastIndex: 15,
+ //   raw: "github.com",
+ //   text: "github.com",
+ //   url: "http://github.com",
+ // } ]
+ ```
+
+ ##### Example 2. Add twitter mentions handler
+
+ ```js
+ linkify.add('@', {
+   validate: function (text, pos, self) {
+     var tail = text.slice(pos);
+
+     if (!self.re.twitter) {
+       self.re.twitter = new RegExp(
+         '^([a-zA-Z0-9_]){1,15}(?!_)(?=$|' + self.re.src_ZPCcCf + ')'
+       );
+     }
+     if (self.re.twitter.test(tail)) {
+       // Linkifier allows punctuation chars before prefix,
+       // but we additionally disable `@` ("@@mention" is invalid)
+       if (pos >= 2 && text[pos - 2] === '@') {
+         return false;
+       }
+       return tail.match(self.re.twitter)[0].length;
+     }
+     return 0;
+   },
+   normalize: function (match) {
+     match.url = 'https://twitter.com/' + match.url.replace(/^@/, '');
+   }
+ });
+ ```
+
+
+ API
+ ---
+
+ __[API documentation](http://markdown-it.github.io/linkify-it/doc)__
+
+ ### new LinkifyIt(schemas)
+
+ Creates new linkifier instance with optional additional schemas.
+ Can be called without `new` keyword for convenience.
+
+ By default understands:
+
+ - `http(s)://...` , `ftp://...`, `mailto:...` & `//...` links
+ - "fuzzy" links and emails (google.com, foo@bar.com).
+
+ `schemas` is an object, where each key/value describes protocol/rule:
+
+ - __key__ - link prefix (usually, protocol name with `:` at the end, `skype:`
+   for example). `linkify-it` makes sure that the prefix is not preceded by an
+   alphanumeric char.
+ - __value__ - rule to check the tail after the link prefix
+   - _String_ - just an alias to an existing rule
+   - _Object_
+     - _validate_ - validator function (should return matched length on success),
+       or `RegExp`.
+     - _normalize_ - optional function to normalize text & url of matched result
+       (for example, for twitter mentions).
+
+
+ ### .test(text)
+
+ Searches for a linkifiable pattern and returns `true` on success or `false` on fail.
+
+
+ ### .pretest(text)
+
+ Quick check whether a link may exist. Can be used to optimize more expensive
+ `.test()` calls. Returns `false` if a link cannot be found, and `true` if a
+ `.test()` call is needed to know for sure.
+
+
+ ### .testSchemaAt(text, name, offset)
+
+ Similar to `.test()` but checks only a specific protocol tail exactly at the
+ given position. Returns the length of the found pattern (0 on fail).
+
+
+ ### .match(text)
+
+ Returns an `Array` of found link matches, or `null` if nothing is found.
+
+ Each match has:
+
+ - __schema__ - link schema; can be empty for fuzzy links, or `//` for
+   protocol-neutral links.
+ - __index__ - offset of matched text
+ - __lastIndex__ - index of the next char after the match end
+ - __raw__ - matched text
+ - __text__ - normalized text
+ - __url__ - link generated from matched text
+
+
+ ### .tlds(list[, keepOld])
+
+ Load (or merge) a new tlds list. These are used for fuzzy links (without a
+ prefix) to avoid false positives. By default this algorithm is used:
+
+ - hostnames with any 2-letter root zone are ok.
+ - biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф
+   are ok.
+ - encoded (`xn--...`) root zones are ok.
+
+ If the list is replaced, then an exact match for 2-char root zones will be checked.
+
+
+ ### .add(schema, definition)
+
+ Add a new rule with the `schema` prefix. For definition details see the
+ constructor description. To disable an existing rule use `.add(name, null)`.
+
+
+ ## License
+
+ [MIT](https://github.com/markdown-it/linkify-it/blob/master/LICENSE)
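The fuzzy-link rules in the README above (known root zones plus any 2-letter zone) can be illustrated with a small self-contained Ruby sketch. This is a simplification for intuition only, not the gem's actual matcher, which composes far richer patterns (see `lib/linkify-it-rb/re.rb` below):

```ruby
# Simplified "fuzzy" host detection: a dotted hostname whose root zone is
# either a known TLD or any 2-letter zone. Illustrative only.
TLDS = %w[biz com edu gov net org pro web xxx aero asia coop info museum name shop]
FUZZY_HOST = /(?:[a-z0-9-]+\.)+(?:#{TLDS.join('|')}|[a-z]{2})\b/i

def fuzzy_link?(text)
  !(text =~ FUZZY_HOST).nil?
end

puts fuzzy_link?('Site github.com!')  # true
puts fuzzy_link?('no links here')     # false
```

The real library additionally guards against false positives with host terminators and Unicode character classes; this sketch only captures the TLD-whitelist idea.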
@@ -0,0 +1,18 @@
+ # encoding: utf-8
+
+ if defined?(Motion::Project::Config)
+
+   lib_dir_path = File.dirname(File.expand_path(__FILE__))
+   Motion::Project::App.setup do |app|
+     app.files.unshift(Dir.glob(File.join(lib_dir_path, "linkify-it-rb/**/*.rb")))
+   end
+
+   require 'uc.micro-rb'
+
+ else
+
+   require 'uc.micro-rb'
+   require 'linkify-it-rb/re'
+   require 'linkify-it-rb/index'
+
+ end
@@ -0,0 +1,503 @@
+ class Linkify
+   include ::LinkifyRe
+
+   attr_accessor :__index__, :__last_index__, :__text_cache__, :__schema__, :__compiled__
+   attr_accessor :re, :bypass_normalizer
+
+   # DON'T try to make PRs with changes. Extend TLDs with LinkifyIt.tlds() instead
+   TLDS_DEFAULT = 'biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф'.split('|')
+
+   DEFAULT_SCHEMAS = {
+     'http:' => {
+       validate: lambda do |text, pos, obj|
+         tail = text.slice(pos..-1)
+
+         if (!obj.re[:http])
+           # compile lazily, because "host"-containing variables can change on tlds update.
+           obj.re[:http] = Regexp.new('^\\/\\/' + LinkifyRe::SRC_AUTH + LinkifyRe::SRC_HOST_PORT_STRICT + LinkifyRe::SRC_PATH, Regexp::IGNORECASE)
+         end
+         if obj.re[:http] =~ tail
+           return tail.match(obj.re[:http])[0].length
+         end
+         return 0
+       end
+     },
+     'https:' => 'http:',
+     'ftp:' => 'http:',
+     '//' => {
+       validate: lambda do |text, pos, obj|
+         tail = text.slice(pos..-1)
+
+         if (!obj.re[:no_http])
+           # compile lazily, because "host"-containing variables can change on tlds update.
+           obj.re[:no_http] = Regexp.new('^' + LinkifyRe::SRC_AUTH + LinkifyRe::SRC_HOST_PORT_STRICT + LinkifyRe::SRC_PATH, Regexp::IGNORECASE)
+         end
+
+         if (obj.re[:no_http] =~ tail)
+           # should not be `://`; that protects from errors in the protocol name
+           return 0 if (pos >= 3 && text[pos - 3] == ':')
+           return tail.match(obj.re[:no_http])[0].length
+         end
+         return 0
+       end
+     },
+     'mailto:' => {
+       validate: lambda do |text, pos, obj|
+         tail = text.slice(pos..-1)
+
+         if (!obj.re[:mailto])
+           obj.re[:mailto] = Regexp.new('^' + LinkifyRe::SRC_EMAIL_NAME + '@' + LinkifyRe::SRC_HOST_STRICT, Regexp::IGNORECASE)
+         end
+         if (obj.re[:mailto] =~ tail)
+           return tail.match(obj.re[:mailto])[0].length
+         end
+         return 0
+       end
+     }
+   }
+
+   #------------------------------------------------------------------------------
+   def escapeRE(str)
+     # NOTE: the JS original used replace(..., '\\$&'); in Ruby the matched
+     # text is available in the block, so prepend the backslash there.
+     return str.gsub(/[\.\?\*\+\^\$\[\]\\\(\)\{\}\|\-]/) { |m| "\\#{m}" }
+   end
+
+   #------------------------------------------------------------------------------
+   def resetScanCache
+     @__index__ = -1
+     @__text_cache__ = ''
+   end
+
+   #------------------------------------------------------------------------------
+   def createValidator(re)
+     return lambda do |text, pos, obj|
+       tail = text.slice(pos..-1)
+
+       (re =~ tail) ? tail.match(re)[0].length : 0
+     end
+   end
+
+   #------------------------------------------------------------------------------
+   def createNormalizer()
+     return lambda do |match, obj|
+       obj.normalize(match)
+     end
+   end
+
+   # Schemas compiler. Build regexps.
+   #
+   #------------------------------------------------------------------------------
+   def compile
+
+     # Load & clone RE patterns.
+     re = @re = {}
+
+     # Define dynamic patterns
+     tlds = @__tlds__.dup
+
+     if (!@__tlds_replaced__)
+       tlds.push('[a-z]{2}')
+     end
+     tlds.push(LinkifyRe::SRC_XN) # `re` was just reset, so use the module constant
+
+     re[:src_tlds] = tlds.join('|')
+
+     untpl = lambda { |tpl| tpl.gsub('%TLDS%', re[:src_tlds]) }
+
+     re[:email_fuzzy]     = Regexp.new(untpl.call(LinkifyRe::TPL_EMAIL_FUZZY), Regexp::IGNORECASE)
+     re[:link_fuzzy]      = Regexp.new(untpl.call(LinkifyRe::TPL_LINK_FUZZY), Regexp::IGNORECASE)
+     re[:host_fuzzy_test] = Regexp.new(untpl.call(LinkifyRe::TPL_HOST_FUZZY_TEST), Regexp::IGNORECASE)
+
+     #
+     # Compile each schema
+     #
+
+     aliases = []
+
+     @__compiled__ = {} # Reset compiled data
+
+     schemaError = lambda do |name, val|
+       raise StandardError, '(LinkifyIt) Invalid schema "' + name + '": ' + val.to_s
+     end
+
+     @__schemas__.each do |name, val|
+
+       # skip disabled schemas
+       next if (val == nil)
+
+       compiled = { validate: nil, link: nil }
+
+       @__compiled__[name] = compiled
+
+       if (val.is_a? Hash)
+         if (val[:validate].is_a? Regexp)
+           compiled[:validate] = createValidator(val[:validate])
+         elsif (val[:validate].is_a? Proc)
+           compiled[:validate] = val[:validate]
+         else
+           schemaError.call(name, val)
+         end
+
+         if (val[:normalize].is_a? Proc)
+           compiled[:normalize] = val[:normalize]
+         elsif (!val[:normalize])
+           compiled[:normalize] = createNormalizer()
+         else
+           schemaError.call(name, val)
+         end
+         next
+       end
+
+       if (val.is_a? String)
+         aliases.push(name)
+         next
+       end
+
+       schemaError.call(name, val)
+     end
+
+     #
+     # Compile postponed aliases
+     #
+
+     aliases.each do |an_alias|
+       if (!@__compiled__[@__schemas__[an_alias]])
+         # Silently fail on missed schemas to avoid errors on disable.
+         # schemaError.call(an_alias, @__schemas__[an_alias])
+       else
+         @__compiled__[an_alias][:validate] = @__compiled__[@__schemas__[an_alias]][:validate]
+         @__compiled__[an_alias][:normalize] = @__compiled__[@__schemas__[an_alias]][:normalize]
+       end
+     end
+
+     #
+     # Fake record for guessed links
+     #
+     @__compiled__[''] = { validate: nil, normalize: createNormalizer }
+
+     #
+     # Build schema condition, and filter disabled & fake schemas
+     #
+     slist = @__compiled__.select { |name, val| name.length > 0 && !val.nil? }.keys.map { |str| escapeRE(str) }.join('|')
+
+     # (?!_) causes a 1.5x slowdown
+     @re[:schema_test]   = Regexp.new('(^|(?!_)(?:>|' + LinkifyRe::SRC_Z_P_CC_CF + '))(' + slist + ')', Regexp::IGNORECASE)
+     @re[:schema_search] = Regexp.new('(^|(?!_)(?:>|' + LinkifyRe::SRC_Z_P_CC_CF + '))(' + slist + ')', Regexp::IGNORECASE)
+
+     @re[:pretest] = Regexp.new(
+       '(' + @re[:schema_test].source + ')|' +
+       '(' + @re[:host_fuzzy_test].source + ')|' + '@', Regexp::IGNORECASE)
+
+     #
+     # Cleanup
+     #
+
+     resetScanCache
+   end
+
+   # Match result. Single element of the array returned by [[LinkifyIt#match]]
+   #------------------------------------------------------------------------------
+   class Match
+     attr_accessor :schema, :index, :lastIndex, :raw, :text, :url
+
+     def initialize(obj, shift)
+       start = obj.__index__
+       endt  = obj.__last_index__
+       text  = obj.__text_cache__.slice(start...endt)
+
+       # Match#schema -> String
+       #
+       # Prefix (protocol) for the matched string.
+       @schema = obj.__schema__.downcase
+
+       # Match#index -> Number
+       #
+       # First position of the matched string.
+       @index = start + shift
+
+       # Match#lastIndex -> Number
+       #
+       # Next position after the matched string.
+       @lastIndex = endt + shift
+
+       # Match#raw -> String
+       #
+       # Matched string.
+       @raw = text
+
+       # Match#text -> String
+       #
+       # Normalized text of the matched string.
+       @text = text
+
+       # Match#url -> String
+       #
+       # Normalized url of the matched string.
+       @url = text
+     end
+
+     #------------------------------------------------------------------------------
+     def self.createMatch(obj, shift)
+       match = Match.new(obj, shift)
+       obj.__compiled__[match.schema][:normalize].call(match, obj)
+       return match
+     end
+   end
+
+
+
+   # new LinkifyIt(schemas)
+   # - schemas (Object): Optional. Additional schemas to validate (prefix/validator)
+   #
+   # Creates new linkifier instance with optional additional schemas.
+   # Can be called without `new` keyword for convenience.
+   #
+   # By default understands:
+   #
+   # - `http(s)://...` , `ftp://...`, `mailto:...` & `//...` links
+   # - "fuzzy" links and emails (example.com, foo@bar.com).
+   #
+   # `schemas` is an object, where each key/value describes protocol/rule:
+   #
+   # - __key__ - link prefix (usually, protocol name with `:` at the end, `skype:`
+   #   for example). `linkify-it` makes sure that the prefix is not preceded by an
+   #   alphanumeric char or symbol. Only whitespace and punctuation allowed.
+   # - __value__ - rule to check the tail after the link prefix
+   #   - _String_ - just an alias to an existing rule
+   #   - _Object_
+   #     - _validate_ - validator function (should return matched length on success),
+   #       or `RegExp`.
+   #     - _normalize_ - optional function to normalize text & url of matched result
+   #       (for example, for @twitter mentions).
+   #------------------------------------------------------------------------------
+   def initialize(schemas = {})
+     # Cache last tested result. Used to skip repeating steps on next `match` call.
+     @__index__      = -1
+     @__last_index__ = -1 # Next scan position
+     @__schema__     = ''
+     @__text_cache__ = ''
+
+     @__schemas__  = {}.merge!(DEFAULT_SCHEMAS).merge!(schemas)
+     @__compiled__ = {}
+
+     @__tlds__          = TLDS_DEFAULT
+     @__tlds_replaced__ = false
+
+     @re = {}
+
+     @bypass_normalizer = false # only used in testing scenarios
+
+     compile
+   end
+
+
+   # chainable
+   # LinkifyIt#add(schema, definition)
+   # - schema (String): rule name (fixed pattern prefix)
+   # - definition (String|RegExp|Object): schema definition
+   #
+   # Add new rule definition. See constructor description for details.
+   #------------------------------------------------------------------------------
+   def add(schema, definition)
+     @__schemas__[schema] = definition
+     compile
+     return self
+   end
+
+
+   # LinkifyIt#test(text) -> Boolean
+   #
+   # Searches for a linkifiable pattern and returns `true` on success or `false` on fail.
+   #------------------------------------------------------------------------------
+   def test(text)
+     # Reset scan cache
+     @__text_cache__ = text
+     @__index__      = -1
+
+     return false if text.empty?
+
+     # try to scan for a link with schema - that's the simplest rule
+     if @re[:schema_test] =~ text
+       re = @re[:schema_search]
+       pos = 0
+       # advance through successive schema hits (the JS original used the
+       # regexp's lastIndex; here we track the scan position explicitly)
+       while ((m = re.match(text, pos)) != nil)
+         pos = m.end(0)
+         len = testSchemaAt(text, m[2], m.end(0))
+         if (len > 0)
+           @__schema__     = m[2]
+           @__index__      = m.begin(0) + m[1].length
+           @__last_index__ = m.begin(0) + m[0].length + len
+           break
+         end
+       end
+     end
+
+     if (@__compiled__['http:'])
+       # guess schemaless links
+
+       tld_pos = text.index(@re[:host_fuzzy_test])
+       if !tld_pos.nil?
+         # if tld is located after the found link - no need to check the fuzzy pattern
+         if (@__index__ < 0 || tld_pos < @__index__)
+           if ((ml = text.match(@re[:link_fuzzy])) != nil)
+
+             shift = ml.begin(0) + ml[1].length
+
+             if (@__index__ < 0 || shift < @__index__)
+               @__schema__     = ''
+               @__index__      = shift
+               @__last_index__ = ml.begin(0) + ml[0].length
+             end
+           end
+         end
+       end
+     end
+
+     if (@__compiled__['mailto:'])
+       # guess schemaless emails
+       at_pos = text.index('@')
+       if !at_pos.nil?
+         # We can't skip this check, because cases like these are possible:
+         # 192.168.1.1@gmail.com, my.in@example.com
+         if ((me = text.match(@re[:email_fuzzy])) != nil)
+
+           shift = me.begin(0) + me[1].length
+           nextc = me.begin(0) + me[0].length
+
+           if (@__index__ < 0 || shift < @__index__ ||
+               (shift == @__index__ && nextc > @__last_index__))
+             @__schema__     = 'mailto:'
+             @__index__      = shift
+             @__last_index__ = nextc
+           end
+         end
+       end
+     end
+
+     return @__index__ >= 0
+   end
+
+
+   # LinkifyIt#pretest(text) -> Boolean
+   #
+   # Very quick check that can give false positives. Returns `true` if a link
+   # may exist. Can be used for speed optimization when you need to check that
+   # a link does NOT exist.
+   #------------------------------------------------------------------------------
+   def pretest(text)
+     return !(@re[:pretest] =~ text).nil?
+   end
+
+
+   # LinkifyIt#testSchemaAt(text, name, position) -> Number
+   # - text (String): text to scan
+   # - name (String): rule (schema) name
+   # - position (Number): text offset to check from
+   #
+   # Similar to [[LinkifyIt#test]] but checks only a specific protocol tail exactly
+   # at the given position. Returns length of the found pattern (0 on fail).
+   #------------------------------------------------------------------------------
+   def testSchemaAt(text, schema, pos)
+     # If an unsupported schema check is requested - terminate
+     if (!@__compiled__[schema.downcase])
+       return 0
+     end
+     return @__compiled__[schema.downcase][:validate].call(text, pos, self)
+   end
+
+
+   # LinkifyIt#match(text) -> Array|nil
+   #
+   # Returns an array of found link descriptions or `nil` on fail. We strongly
+   # recommend using [[LinkifyIt#test]] first, for best speed.
+   #
+   # ##### Result match description
+   #
+   # - __schema__ - link schema; can be empty for fuzzy links, or `//` for
+   #   protocol-neutral links.
+   # - __index__ - offset of matched text
+   # - __lastIndex__ - index of the next char after the match end
+   # - __raw__ - matched text
+   # - __text__ - normalized text
+   # - __url__ - link generated from matched text
+   #------------------------------------------------------------------------------
+   def match(text)
+     shift  = 0
+     result = []
+
+     # Try to take the previous element from cache, if .test() was called before
+     if (@__index__ >= 0 && @__text_cache__ == text)
+       result.push(Match.createMatch(self, shift))
+       shift = @__last_index__
+     end
+
+     # Cut head if cache was used
+     tail = shift > 0 ? text.slice(shift..-1) : text
+
+     # Scan string until end reached
+     while (self.test(tail))
+       result.push(Match.createMatch(self, shift))
+
+       tail = tail.slice(@__last_index__..-1)
+       shift += @__last_index__
+     end
+
+     if (result.length > 0)
+       return result
+     end
+
+     return nil
+   end
+
+
+   # chainable
+   # LinkifyIt#tlds(list [, keepOld]) -> this
+   # - list (Array): list of tlds
+   # - keepOld (Boolean): merge with current list if `true` (`false` by default)
+   #
+   # Load (or merge) a new tlds list. These are used for fuzzy links (without a
+   # prefix) to avoid false positives. By default this algorithm is used:
+   #
+   # - hostnames with any 2-letter root zone are ok.
+   # - biz|com|edu|gov|net|org|pro|web|xxx|aero|asia|coop|info|museum|name|shop|рф
+   #   are ok.
+   # - encoded (`xn--...`) root zones are ok.
+   #
+   # If the list is replaced, then an exact match for 2-char root zones will be checked.
+   #------------------------------------------------------------------------------
+   def tlds(list, keepOld = false)
+     list = list.is_a?(Array) ? list : [ list ]
+
+     if (!keepOld)
+       @__tlds__ = list.dup
+       @__tlds_replaced__ = true
+       compile
+       return self
+     end
+
+     @__tlds__ = @__tlds__.concat(list).sort.uniq.reverse
+
+     compile
+     return self
+   end
+
+   # LinkifyIt#normalize(match)
+   #
+   # Default normalizer (used if a schema does not define its own).
+   #------------------------------------------------------------------------------
+   def normalize(match)
+     return if @bypass_normalizer
+
+     # Do minimal possible changes by default. Need to collect feedback prior
+     # to moving forward https://github.com/markdown-it/linkify-it/issues/1
+
+     # fuzzy links have an empty schema ('' is truthy in Ruby, unlike JS)
+     match.url = 'http://' + match.url if !match.schema || match.schema.empty?
+
+     if (match.schema == 'mailto:' && !(/^mailto\:/i =~ match.url))
+       match.url = 'mailto:' + match.url
+     end
+   end
+
+ end
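The `match` method above repeatedly calls `test` on a shrinking tail while accumulating a global `shift`, so that reported offsets stay relative to the original string. The same scanning pattern, reduced to a plain regexp for illustration (the `scan_all` helper is a sketch, not part of the gem):

```ruby
# Scan-with-shift: collect all matches by cutting off the consumed head each
# iteration and tracking the cumulative offset, as Linkify#match does.
def scan_all(text, re)
  shift = 0
  found = []
  while (m = re.match(text))
    found << [m.begin(0) + shift, m[0]]  # position in the *original* string
    text   = text[m.end(0)..-1]          # cut head, like tail.slice(...)
    shift += m.end(0)
  end
  found
end

p scan_all('a.com and b.org', /[a-z]+\.(?:com|org)/)
# => [[0, "a.com"], [10, "b.org"]]
```

Cutting the head keeps each `match` call anchored near the string start, which is why the class caches `__last_index__` rather than re-scanning from zero.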
@@ -0,0 +1,111 @@
+ module LinkifyRe
+
+   # Use direct extract instead of `regenerate` to reduce size
+   SRC_ANY = UCMicro::Properties::Any::REGEX
+   SRC_CC  = UCMicro::Categories::Cc::REGEX
+   SRC_CF  = UCMicro::Categories::Cf::REGEX
+   SRC_Z   = UCMicro::Categories::Z::REGEX
+   SRC_P   = UCMicro::Categories::P::REGEX
+
+   # \p{Z}|\p{P}|\p{Cc}|\p{Cf} (white space + punctuation + control + format)
+   SRC_Z_P_CC_CF = [ SRC_Z, SRC_P, SRC_CC, SRC_CF ].join('|')
+
+   # \p{Z}|\p{Cc}|\p{Cf} (white space + control + format)
+   SRC_Z_CC_CF = [ SRC_Z, SRC_CC, SRC_CF ].join('|')
+
+   # All possible word characters (everything without punctuation, spaces & controls)
+   # Defined via punctuation & spaces to save space
+   # Should be something like \p{L}\p{N}\p{S}\p{M} (\w but without `_`)
+   SRC_PSEUDO_LETTER = '(?:(?!' + SRC_Z_P_CC_CF + ')' + SRC_ANY.source + ')'
+   # The same as above, but without [0-9]
+   SRC_PSEUDO_LETTER_NON_D = '(?:(?![0-9]|' + SRC_Z_P_CC_CF + ')' + SRC_ANY.source + ')'
+
+   #------------------------------------------------------------------------------
+
+   SRC_IP4 = '(?:(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
+   SRC_AUTH = '(?:(?:(?!' + SRC_Z_CC_CF + ').)+@)?'
+
+   SRC_PORT = '(?::(?:6(?:[0-4]\\d{3}|5(?:[0-4]\\d{2}|5(?:[0-2]\\d|3[0-5])))|[1-5]?\\d{1,4}))?'
+
+   SRC_HOST_TERMINATOR = '(?=$|' + SRC_Z_P_CC_CF + ')(?!-|_|:\\d|\\.-|\\.(?!$|' + SRC_Z_P_CC_CF + '))'
+
+   SRC_PATH =
+     '(?:' +
+       '[/?#]' +
+       '(?:' +
+         '(?!' + SRC_Z_CC_CF + '|[()\\[\\]{}.,"\'?!\\-]).|' +
+         '\\[(?:(?!' + SRC_Z_CC_CF + '|\\]).)*\\]|' +
+         '\\((?:(?!' + SRC_Z_CC_CF + '|[)]).)*\\)|' +
+         '\\{(?:(?!' + SRC_Z_CC_CF + '|[}]).)*\\}|' +
+         '\\"(?:(?!' + SRC_Z_CC_CF + '|["]).)+\\"|' +
+         "\\'(?:(?!" + SRC_Z_CC_CF + "|[']).)+\\'|" +
+         "\\'(?=" + SRC_PSEUDO_LETTER + ').|' + # allow `I'm_king` if no pair found
+         '\\.{2,3}[a-zA-Z0-9%]|' + # github has ... in commit range links. Restricted to
+                                   # english & percent-encoded only, until more examples found.
+         '\\.(?!' + SRC_Z_CC_CF + '|[.]).|' +
+         '\\-(?!' + SRC_Z_CC_CF + '|--(?:[^-]|$))(?:[-]+|.)|' + # `---` => long dash, terminate
+         '\\,(?!' + SRC_Z_CC_CF + ').|' + # allow `,,,` in paths
+         '\\!(?!' + SRC_Z_CC_CF + '|[!]).|' +
+         '\\?(?!' + SRC_Z_CC_CF + '|[?]).' +
+       ')+' +
+     '|\\/' +
+     ')?'
+
+   SRC_EMAIL_NAME = '[\\-;:&=\\+\\$,\\"\\.a-zA-Z0-9_]+'
+   SRC_XN = 'xn--[a-z0-9\\-]{1,59}'
+
+   # More to read about domain names
+   # http://serverfault.com/questions/638260/
+
+   SRC_DOMAIN_ROOT =
+     # Can't have digits and dashes
+     '(?:' +
+       SRC_XN +
+       '|' +
+       SRC_PSEUDO_LETTER_NON_D + '{1,63}' +
+     ')'
+
+   SRC_DOMAIN =
+     '(?:' +
+       SRC_XN +
+       '|' +
+       '(?:' + SRC_PSEUDO_LETTER + ')' +
+       '|' +
+       # don't allow `--` in domain names, because:
+       # - it can conflict with markdown &mdash; / &ndash;
+       # - nobody uses those anyway
+       '(?:' + SRC_PSEUDO_LETTER + '(?:-(?!-)|' + SRC_PSEUDO_LETTER + '){0,61}' + SRC_PSEUDO_LETTER + ')' +
+     ')'
+
+   SRC_HOST =
+     '(?:' +
+       SRC_IP4 +
+       '|' +
+       '(?:(?:(?:' + SRC_DOMAIN + ')\\.)*' + SRC_DOMAIN_ROOT + ')' +
+     ')'
+
+   TPL_HOST_FUZZY =
+     '(?:' +
+       SRC_IP4 +
+       '|' +
+       '(?:(?:(?:' + SRC_DOMAIN + ')\\.)+(?:%TLDS%))' +
+     ')'
+
+   SRC_HOST_STRICT            = SRC_HOST + SRC_HOST_TERMINATOR
+   TPL_HOST_FUZZY_STRICT      = TPL_HOST_FUZZY + SRC_HOST_TERMINATOR
+   SRC_HOST_PORT_STRICT       = SRC_HOST + SRC_PORT + SRC_HOST_TERMINATOR
+   TPL_HOST_PORT_FUZZY_STRICT = TPL_HOST_FUZZY + SRC_PORT + SRC_HOST_TERMINATOR
+
+   #------------------------------------------------------------------------------
+   # Main rules
+
+   # Rude test for fuzzy links by host, for quick deny
+   TPL_HOST_FUZZY_TEST = 'localhost|\\.\\d{1,3}\\.|(?:\\.(?:%TLDS%)(?:' + SRC_Z_P_CC_CF + '|$))'
+   TPL_EMAIL_FUZZY = '(^|>|' + SRC_Z_CC_CF + ')(' + SRC_EMAIL_NAME + '@' + TPL_HOST_FUZZY_STRICT + ')'
+   TPL_LINK_FUZZY =
+     # Fuzzy link can't be prepended with .:/\- and non-punctuation,
+     # but can start with > (markdown blockquote)
+     '(^|(?![.:/\\-_@])(?:[$+<=>^`|]|' + SRC_Z_P_CC_CF + '))' +
+     '((?![$+<=>^`|])' + TPL_HOST_PORT_FUZZY_STRICT + SRC_PATH + ')'
+
+ end
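`SRC_IP4` and `SRC_PORT` above are pattern source strings meant to be composed into larger regexps. Anchored in isolation (patterns copied verbatim from the module), they can be exercised directly:

```ruby
# SRC_IP4 / SRC_PORT from the module above, anchored for standalone testing.
src_ip4  = '(?:(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
src_port = '(?::(?:6(?:[0-4]\d{3}|5(?:[0-4]\d{2}|5(?:[0-2]\d|3[0-5])))|[1-5]?\d{1,4}))?'

ip4  = /\A#{src_ip4}\z/
port = /\A#{src_port}\z/

puts ip4.match?('192.168.1.1')   # true
puts ip4.match?('999.1.1.1')     # false (999 is not a valid octet)
puts port.match?(':65535')       # true
puts port.match?(':65536')       # false (max port is 65535)
```

Within the library these fragments are never anchored on their own; they are concatenated with `SRC_HOST` and `SRC_HOST_TERMINATOR`, which is why they are kept as strings rather than `Regexp` objects.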
@@ -0,0 +1,5 @@
+ module LinkifyIt
+
+   VERSION = '0.1.0.0'
+
+ end
@@ -0,0 +1,234 @@
1
+ #------------------------------------------------------------------------------
2
+ describe 'links' do
3
+
4
+ # TODO tests which can't seem to get passing at the moment, so skip them
5
+ failing_test = [
6
+ 95, # GOOGLE.COM. unable to get final . to be removed
7
+ 214 # xn--d1abbgf6aiiy.xn--p1ai
8
+ ]
9
+
10
+ l = Linkify.new
11
+ l.bypass_normalizer = true # kill the normalizer
12
+
13
+ skipNext = false
14
+ linkfile = File.join(File.dirname(__FILE__), 'fixtures/links.txt')
15
+ lines = File.read(linkfile).split(/\r?\n/)
16
+ lines.each_with_index do |line, idx|
17
+ if skipNext
18
+ skipNext = false
19
+ next
20
+ end
21
+
22
+ line = line.sub(/^%.*/, '')
23
+ next_line = (lines[idx + 1] || '').sub(/^%.*/, '')
24
+
25
+ next if line.strip.empty?
26
+
27
+ unless failing_test.include?(idx + 1)
28
+ if !next_line.strip.empty?
29
+
30
+ it "line #{idx + 1}" do
31
+ expect(l.pretest(line)).to eq true # "(pretest failed in `#{line}`)"
32
+ expect(l.test("\n#{line}\n")).to eq true # "(link not found in `\n#{line}\n`)"
33
+ expect(l.test(line)).to eq true # "(link not found in `#{line}`)"
34
+ expect(l.match(line)[0].url).to eq next_line
35
+ end
36
+
37
+ skipNext = true
38
+
39
+ else
40
+
41
+ it "line #{idx + 1}" do
42
+ expect(l.pretest(line)).to eq true # "(pretest failed in `#{line}`)"
43
+ expect(l.test("\n#{line}\n")).to eq true # "(link not found in `\n#{line}\n`)"
44
+ expect(l.test(line)).to eq true # "(link not found in `#{line}`)"
45
+ expect(l.match(line)[0].url).to eq line
46
+ end
47
+ end
48
+ end
49
+ end
50
+
51
+ end
52
+
53
+
54
+ #------------------------------------------------------------------------------
55
+ describe 'not links' do
56
+
57
+ # TODO tests which can't seem to get passing at the moment, so skip them
58
+ failing_test = [ 6, 7, 8, 12, 16, 19, 22, 23, 24, 25, 26, 27, 28, 29, 48 ]
59
+
60
+ l = Linkify.new
61
+ l.bypass_normalizer = true # kill the normalizer
62
+
63
+ linkfile = File.join(File.dirname(__FILE__), 'fixtures/not_links.txt')
64
+ lines = File.read(linkfile).split(/\r?\n/)
65
+ lines.each_with_index do |line, idx|
66
+ line = line.sub(/^%.*/, '')
67
+
68
+ next if line.strip.empty?
69
+
70
+ unless failing_test.include?(idx + 1)
71
+ it "line #{idx + 1}" do
72
+ # assert.notOk(l.test(line),
73
+ # '(should not find link in `' + line + '`, but found `' +
74
+ # JSON.stringify((l.match(line) || [])[0]) + '`)');
75
+ expect(l.test(line)).not_to eq true
76
+ end
77
+ end
78
+ end
79
+
80
+ end
81
+
82
+ #------------------------------------------------------------------------------
83
+ describe 'API' do
84
+
85
+ #------------------------------------------------------------------------------
86
+ it 'extend tlds' do
87
+ l = Linkify.new
88
+
89
+ expect(l.test('google.myroot')).to_not eq true
90
+
91
+ l.tlds('myroot', true)
92
+
93
+ expect(l.test('google.myroot')).to eq true
94
+ expect(l.test('google.xyz')).to_not eq true
95
+
96
+ # this is some other package of tlds which we don't have
97
+ # l.tlds(require('tlds'));
98
+ # assert.ok(l.test('google.xyz'));
99
+ # assert.notOk(l.test('google.myroot'));
100
+ end
101
+
102
+
103
+ # TODO Tests not passing
104
+ #------------------------------------------------------------------------------
105
+ # it 'add rule as regexp, with default normalizer' do
106
+ # l = Linkify.new.add('my:', {validate: /^\/\/[a-z]+/} )
107
+ #
108
+ # match = l.match('google.com. my:// my://asdf!')
109
+ #
110
+ # expect(match[0].text).to eq 'google.com'
111
+ # expect(match[1].text).to eq 'my://asdf'
112
+ # end
113
+
114
+ # TODO Tests not passing
115
+ #------------------------------------------------------------------------------
116
+ # it 'add rule with normalizer'
117
+ # l = Linkify.new.add('my:', {
118
+ # validate: /^\/\/[a-z]+/,
119
+ # normalize: lambda {|m|
120
+ # m.text = m.text.sub(/^my:\/\//, '').upcase
121
+ # m.url = m.url.upcase
122
+ # }
123
+ # })
124
+ #
125
+ # match = l.match('google.com. my:// my://asdf!')
126
+ #
127
+ # expect(match[1].text).to eq 'ASDF'
128
+ # expect(match[1].url).to eq 'MY://ASDF'
129
+ # end
130
+
+ # it('disable rule', function () {
+ # var l = linkify();
+ #
+ # assert.ok(l.test('http://google.com'));
+ # assert.ok(l.test('foo@bar.com'));
+ # l.add('http:', null);
+ # l.add('mailto:', null);
+ # assert.notOk(l.test('http://google.com'));
+ # assert.notOk(l.test('foo@bar.com'));
+ # });
+ #
+ #
+ # it('add bad definition', function () {
+ # var l;
+ #
+ # l = linkify();
+ #
+ # assert.throw(function () {
+ # l.add('test:', []);
+ # });
+ #
+ # l = linkify();
+ #
+ # assert.throw(function () {
+ # l.add('test:', { validate: [] });
+ # });
+ #
+ # l = linkify();
+ #
+ # assert.throw(function () {
+ # l.add('test:', {
+ # validate: function () { return false; },
+ # normalize: 'bad'
+ # });
+ # });
+ # });
+ #
+ #
+ # it('test at position', function () {
+ # var l = linkify();
+ #
+ # assert.ok(l.testSchemaAt('http://google.com', 'http:', 5));
+ # assert.ok(l.testSchemaAt('http://google.com', 'HTTP:', 5));
+ # assert.notOk(l.testSchemaAt('http://google.com', 'http:', 6));
+ #
+ # assert.notOk(l.testSchemaAt('http://google.com', 'bad_schema:', 6));
+ # });
+ #
+ #
+ # it('correct cache value', function () {
+ # var l = linkify();
+ #
+ # var match = l.match('.com. http://google.com google.com ftp://google.com');
+ #
+ # assert.equal(match[0].text, 'http://google.com');
+ # assert.equal(match[1].text, 'google.com');
+ # assert.equal(match[2].text, 'ftp://google.com');
+ # });
+ #
+ # it('normalize', function () {
+ # var l = linkify(), m;
+ #
+ # m = l.match('mailto:foo@bar.com')[0];
+ #
+ # // assert.equal(m.text, 'foo@bar.com');
+ # assert.equal(m.url, 'mailto:foo@bar.com');
+ #
+ # m = l.match('foo@bar.com')[0];
+ #
+ # // assert.equal(m.text, 'foo@bar.com');
+ # assert.equal(m.url, 'mailto:foo@bar.com');
+ # });
+ #
+ #
+ # it('test @twitter rule', function () {
+ # var l = linkify().add('@', {
+ # validate: function (text, pos, self) {
+ # var tail = text.slice(pos);
+ #
+ # if (!self.re.twitter) {
+ # self.re.twitter = new RegExp(
+ # '^([a-zA-Z0-9_]){1,15}(?!_)(?=$|' + self.re.src_ZPCcCf + ')'
+ # );
+ # }
+ # if (self.re.twitter.test(tail)) {
+ # if (pos >= 2 && tail[pos - 2] === '@') {
+ # return false;
+ # }
+ # return tail.match(self.re.twitter)[0].length;
+ # }
+ # return 0;
+ # },
+ # normalize: function (m) {
+ # m.url = 'https://twitter.com/' + m.url.replace(/^@/, '');
+ # }
+ # });
+ #
+ # assert.equal(l.match('hello, @gamajoba_!')[0].text, '@gamajoba_');
+ # assert.equal(l.match(':@givi')[0].text, '@givi');
+ # assert.equal(l.match(':@givi')[0].url, 'https://twitter.com/givi');
+ # assert.notOk(l.test('@@invalid'));
+ # });
+
+ end
@@ -0,0 +1,2 @@
+ require 'byebug'
+ require 'linkify-it-rb'
metadata ADDED
@@ -0,0 +1,67 @@
+ --- !ruby/object:Gem::Specification
+ name: linkify-it-rb
+ version: !ruby/object:Gem::Version
+   version: 0.1.0.0
+ platform: ruby
+ authors:
+ - Brett Walker
+ - Vitaly Puzrin
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2015-03-26 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: uc.micro-rb
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.0'
+ description: Ruby version of linkify-it for motion-markdown-it, for Ruby and RubyMotion
+ email: github@digitalmoksha.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - README.md
+ - lib/linkify-it-rb.rb
+ - lib/linkify-it-rb/index.rb
+ - lib/linkify-it-rb/re.rb
+ - lib/linkify-it-rb/version.rb
+ - spec/linkify-it-rb/test_spec.rb
+ - spec/spec_helper.rb
+ homepage: https://github.com/digitalmoksha/linkify-it-rb
+ licenses:
+ - MIT
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.4.5
+ signing_key:
+ specification_version: 4
+ summary: linkify-it for motion-markdown-it in Ruby
+ test_files:
+ - spec/linkify-it-rb/test_spec.rb
+ - spec/spec_helper.rb