quesadilla 0.1.0

Files changed (29)

checksums.yaml +7 -0
data/.gitignore +17 -0
data/.travis.yml +6 -0
data/Contributing.markdown +19 -0
data/Gemfile +19 -0
data/LICENSE +22 -0
data/Rakefile +21 -0
data/Readme.markdown +100 -0
data/lib/quesadilla/core_ext/string.rb +28 -0
data/lib/quesadilla/extractor/autolinks.rb +28 -0
data/lib/quesadilla/extractor/emoji.rb +43 -0
data/lib/quesadilla/extractor/hashtags.rb +28 -0
data/lib/quesadilla/extractor/html.rb +103 -0
data/lib/quesadilla/extractor/markdown.rb +187 -0
data/lib/quesadilla/extractor.rb +140 -0
data/lib/quesadilla/html_renderer.rb +57 -0
data/lib/quesadilla/version.rb +4 -0
data/lib/quesadilla.rb +45 -0
data/quesadilla.gemspec +28 -0
data/test/quesadilla/autolink_test.rb +84 -0
data/test/quesadilla/emoji_test.rb +103 -0
data/test/quesadilla/hashtags_test.rb +50 -0
data/test/quesadilla/html_test.rb +21 -0
data/test/quesadilla/markdown_test.rb +235 -0
data/test/quesadilla/multi_test.rb +64 -0
data/test/quesadilla_test.rb +9 -0
data/test/support/extractor_macros.rb +5 -0
data/test/test_helper.rb +18 -0
metadata +109 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 17f8cfec03e4c86278585c52d2ae57995a984d3c
+  data.tar.gz: e145d4496a0bc034e278f6a2c4738d48006d7a8b
+SHA512:
+  metadata.gz: aaf8418d063286d7666e0b666b5badbcf63e2e54cceb225d35e69bc238fb3f0973544c547cbc806dde042e5d8f7ae37fa4accb1325c210ef5d9eb8a989f6330a
+  data.tar.gz: 39f070f60d71278bceeab4e939944ebc921609e4c6eb90b26f0e09ac693639b4dd80bd3a423e33dccd32f641fe6218213b665c3ebf43282e2a203134ef7454fb
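
`checksums.yaml` records SHA1 and SHA512 digests of the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). Below is a minimal sketch of checking one of those digests with Ruby's standard `digest` and `yaml` libraries. It assumes the archive has already been unpacked to a local path. The `verify_sha512` helper name is hypothetical and is not a RubyGems API.

```ruby
require 'digest'
require 'yaml'

# Hypothetical helper (not a RubyGems API): true when the SHA512 of the file
# at `path` equals the value listed for `name` in a checksums.yaml document.
def verify_sha512(checksums_yaml, name, path)
  expected = YAML.safe_load(checksums_yaml).fetch('SHA512').fetch(name)
  Digest::SHA512.file(path).hexdigest == expected
end
```

A `.gem` is a plain tar archive. After extracting it and gunzipping `checksums.yaml.gz`, a call would look like `verify_sha512(File.read('checksums.yaml'), 'data.tar.gz', 'data.tar.gz')`.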

data/.gitignore ADDED

@@ -0,0 +1,17 @@
+*.gem
+*.rbc
+.bundle
+.config
+.yardoc
+Gemfile.lock
+InstalledFiles
+_yardoc
+coverage
+doc/
+lib/bundler/man
+pkg
+rdoc
+spec/reports
+test/tmp
+test/version_tmp
+tmp

data/.travis.yml ADDED

@@ -0,0 +1,6 @@
+language: ruby
+bundler_args: --without development
+rvm:
+  - 1.9.3
+  - 2.0.0
+  - jruby-19mode

data/Contributing.markdown ADDED

@@ -0,0 +1,19 @@
+## Submitting a Pull Request
+1. [Fork the repository.][fork]
+2. [Create a topic branch.][branch]
+3. Add tests for your unimplemented feature or bug fix.
+4. Run `bundle exec rake`. If your tests pass, return to step 3.
+5. Implement your feature or bug fix.
+6. Run `bundle exec rake`. If your tests fail, return to step 5.
+7. Run `open coverage/index.html`. If your changes are not completely covered
+   by your tests, return to step 3.
+8. Add documentation for your feature or bug fix.
+9. Run `bundle exec rake doc`. If your changes are not 100% documented, go
+   back to step 8.
+10. Add, commit, and push your changes.
+11. [Submit a pull request.][pr]
+[fork]: http://help.github.com/fork-a-repo/
+[branch]: http://learn.github.com/p/branching.html
+[pr]: http://help.github.com/send-pull-requests/

data/Gemfile ADDED

@@ -0,0 +1,19 @@
+source 'https://rubygems.org'
+# Gem dependencies
+gemspec
+gem 'rake', group: [:development, :test]
+# Development dependencies
+group :development do
+  gem 'yard'
+  gem 'redcarpet', platform: 'ruby'
+end
+# Testing dependencies
+group :test do
+  gem 'minitest'
+  gem 'minitest-wscolor'
+  gem 'simplecov', require: false
+end

data/LICENSE ADDED

@@ -0,0 +1,22 @@
+Copyright (c) 2013 Sam Soffes
+MIT License
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/Rakefile ADDED

@@ -0,0 +1,21 @@
+require 'bundler'
+Bundler::GemHelper.install_tasks
+require 'rake/testtask'
+Rake::TestTask.new(:test) do |t|
+  t.libs << 'test'
+  t.pattern = 'test/**/*_test.rb'
+end
+task default: :test
+begin
+  require 'yard'
+  YARD::Rake::YardocTask.new(:doc) do |task|
+    task.files   = ['Readme.markdown', 'LICENSE', 'lib/**/*.rb']
+    task.options = [
+      '--output-dir', 'doc',
+      '--markup', 'markdown',
+    ]
+  end
+rescue LoadError
+end

data/Readme.markdown ADDED

@@ -0,0 +1,100 @@
+# Quesadilla
+Entity-style text parsing. Quesadilla was extracted from [Cheddar](https://cheddarapp.com).
+See the [Cheddar text guide](https://cheddarapp.com/text) for more information about how to type entities.
+## Installation
+Add this line to your application's Gemfile:
+``` ruby
+gem 'quesadilla'
+```
+And then execute:
+    $ bundle
+Or install it yourself as:
+    $ gem install quesadilla
+## Usage
+To extract entities from text, simply call `extract`:
+``` ruby
+Quesadilla.extract('Some #awesome text')
+# => {
+#   display_text: "Some #awesome text",
+#   display_html: "Some <a href=\"#hashtag-awesome\" class=\"tag\">#awesome</a> text",
+#   entities: [
+#     {
+#       type: "hashtag",
+#       text: "#awesome",
+#       display_text: "#awesome",
+#       indices: [5, 13],
+#       hashtag: "awesome",
+#       display_indices: [5, 13]
+#     }
+#   ]
+# }
+```
+### Configuring
+Quesadilla supports extracting various span-level Markdown features as well as automatically detecting links and GitHub-style named emoji. Here is the list of options you can pass when extracting:
+Option                      | Description
+----------------------------|-----------------------------------------------------------------
+`:markdown`                 | All Markdown parsing
+`:markdown_code`            | Markdown code tags
+`:markdown_links`           | Markdown links (including `<http://soff.es>` style links)
+`:markdown_triple_emphasis` | Markdown bold italic
+`:markdown_double_emphasis` | Markdown bold
+`:markdown_emphasis`        | Markdown italic
+`:markdown_strikethrough`   | Markdown Extra strikethrough
+`:hashtags`                 | Hashtags
+`:autolinks`                | Automatically detect links
+`:emoji`                    | GitHub-style named emoji
+`:html`                     | Generate HTML representations for entities and the entire string
+Everything is enabled by default. If you don't want to extract Markdown, call the extractor like this:
+``` ruby
+Quesadilla.extract('Some text', markdown: false)
+```
+You can also just disable strikethrough and still extract the rest of the Markdown entities if you want:
+``` ruby
+Quesadilla.extract('Some text', markdown_strikethrough: false)
+```
+### Customizing HTML
+If you want to change the generated HTML, you can create a custom renderer:
+``` ruby
+class CustomRenderer < Quesadilla::HTMLRenderer
+  def hashtag(display_text, hashtag)
+    %Q{<a href="http://example.com/tags/#{hashtag}" class="tag">#{display_text}</a>}
+  end
+end
+extraction = Quesadilla.extract('Some #awesome text', html_renderer: CustomRenderer)
+extraction[:display_html] #=> 'Some <a href="http://example.com/tags/awesome" class="tag">#awesome</a> text'
+```
+Take a look at [Quesadilla::HTMLRenderer](lib/quesadilla/html_renderer.html) for more details on creating a custom renderer.
+## Supported Ruby Versions
+Quesadilla is tested under 1.9.3, 2.0.0, and JRuby (1.9 mode).
+[![Build Status](https://travis-ci.org/soffes/quesadilla.png?branch=master)](https://travis-ci.org/soffes/quesadilla)
+## Contributing
+See the [contributing guide](Contributing.markdown).
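
The Configuring section describes two levels of switches. `:markdown` turns off every Markdown extractor at once, and each `:markdown_*` key turns off a single feature. This section does not include the body of `extractor.rb`, so it does not show how `Quesadilla::Extractor` merges these options. Below is a hypothetical sketch of one way to resolve them. It assumes every option defaults to `true`, and the names `resolve_options` and `DEFAULT_OPTIONS` are illustrative only.

```ruby
# Hypothetical option resolution; extractor.rb is not shown in this section.
MARKDOWN_KEYS = %i[
  markdown_code markdown_links markdown_triple_emphasis
  markdown_double_emphasis markdown_emphasis markdown_strikethrough
].freeze

# Every feature starts enabled ("Everything is enabled by default").
DEFAULT_OPTIONS = (%i[markdown hashtags autolinks emoji html] + MARKDOWN_KEYS)
  .map { |key| [key, true] }.to_h.freeze

def resolve_options(overrides = {})
  options = DEFAULT_OPTIONS.merge(overrides)
  # The :markdown master switch overrides every individual Markdown key.
  MARKDOWN_KEYS.each { |key| options[key] = false } unless options[:markdown]
  options
end
```

With this design, `resolve_options(markdown: false)` turns every `:markdown_*` key off. `resolve_options(markdown_strikethrough: false)` turns off only strikethrough and leaves the other Markdown extractors on.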

data/lib/quesadilla/core_ext/string.rb ADDED

@@ -0,0 +1,28 @@
+# String additions
+class String
+  # Truncate method from ActiveSupport.
+  # @param truncate_at [Fixnum] number of characters to truncate after
+  # @param options [Hash] optional options hash
+  # @option options separator [String] truncate text only at a certain separator string
+  # @option options omission [String] string to add at the end to indicate truncated text. Defaults to '...'
+  # @return [String] truncated string
+  def q_truncate(truncate_at, options = {})
+    return dup unless length > truncate_at
+    # Default omission to '...'
+    options[:omission] ||= '...'
+    # Account for the omission string in the truncated length
+    truncate_at -= options[:omission].length
+    # Calculate end index
+    stop = if options[:separator]
+      rindex(options[:separator], truncate_at) || truncate_at
+    else
+      truncate_at
+    end
+    # Return the truncated string plus the omission string
+    self[0...stop] + options[:omission]
+  end
+end
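
`q_truncate` follows ActiveSupport's `truncate`. It first reserves room for the omission marker inside the requested length. If a separator is given, it then backs up to the last occurrence of that separator. Note that `options[:omission] ||= '...'` writes into whatever hash the caller passed in. Below is a standalone sketch of the same arithmetic that uses keyword arguments instead, which avoids that side effect. `truncate_text` is a hypothetical name and is not part of the gem.

```ruby
# Hypothetical standalone counterpart to String#q_truncate.
def truncate_text(text, truncate_at, omission: '...', separator: nil)
  return text.dup unless text.length > truncate_at
  # Reserve room for the omission marker; clamped at 0 here (the gem does not clamp).
  stop = [truncate_at - omission.length, 0].max
  # Back up to the last separator at or before `stop`, if one exists.
  stop = (text.rindex(separator, stop) || stop) if separator
  text[0...stop] + omission
end
```

For example, `truncate_text('Hello world', 8)` keeps 5 characters and appends `'...'`, so the result, including the marker, is no longer than 8 characters.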

data/lib/quesadilla/extractor/autolinks.rb ADDED

@@ -0,0 +1,28 @@
+# encoding: UTF-8
+module Quesadilla
+  class Extractor
+    # Extract plain links.
+    #
+    # This module has no public methods.
+    module Autolinks
+    private
+      require 'twitter-text'
+      def extract_autolinks
+        Twitter::Extractor::extract_urls_with_indices(@working_text).each do |entity|
+          entity_text = entity[:url]
+          @entities << {
+            type: ENTITY_TYPE_LINK,
+            text: entity_text,
+            display_text: display_url(entity[:url]),
+            url: quality_url(entity[:url]),
+            indices: entity[:indices]
+          }
+          @working_text.sub!(entity_text, REPLACE_TOKEN * entity_text.length)
+        end
+      end
+    end
+  end
+end
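
Each extractor follows the same pattern. It records an entity with its `indices`, then overwrites the matched text in `@working_text` with `REPLACE_TOKEN` repeated to the same length. Because the length stays fixed, the indices of later matches still line up with the original string, and no later extractor can pick up the masked span again. Below is a minimal sketch of that pattern with three assumptions. The gem's `REPLACE_TOKEN` is defined outside this section, so the sketch assumes a one-character mask. A deliberately naive URL regex stands in for twitter-text's `extract_urls_with_indices`. The `'link'` type label is also assumed.

```ruby
# Assumed one-character mask; the gem's REPLACE_TOKEN is defined elsewhere.
MASK = "\uf000"
# Naive stand-in for twitter-text's URL detection.
URL_REGEX = %r{https?://\S+}

def extract_links(text)
  working = text.dup
  entities = []
  while (match = working.match(URL_REGEX))
    start, stop = match.begin(0), match.end(0)
    # Entity type label is an assumption (ENTITY_TYPE_LINK's value is not shown).
    entities << { type: 'link', text: match[0], url: match[0], indices: [start, stop] }
    # Mask the span in place so its length, and every later index, is unchanged.
    working[start...stop] = MASK * (stop - start)
  end
  [entities, working]
end
```

The sketch slices by `match.begin(0)` rather than calling `sub!` with the matched text. `sub!` replaces the first occurrence, which could differ from the matched position if the same text appeared earlier.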

data/lib/quesadilla/extractor/emoji.rb ADDED

@@ -0,0 +1,43 @@
+# encoding: UTF-8
+module Quesadilla
+  class Extractor
+    # Extract named emoji.
+    #
+    # This module has no public methods.
+    module Emoji
+    private
+      require 'named_emoji'
+      # Emoji colon-syntax regex
+      EMOJI_COLON_REGEX = %r{:([a-zA-Z0-9_\-\+]+):}.freeze
+      def replace_emoji
+        codes = {}
+        # Replace codes with shas
+        i = 0
+        while match = @original_text.match(Markdown::CODE_REGEX)
+          original = match[0]
+          key = Digest::SHA1.hexdigest("#{original}-#{i}")
+          codes[key] = original
+          @original_text.sub!(original, key)
+          i += 1
+        end
+        # Replace emojis
+        while match = @original_text.match(EMOJI_COLON_REGEX)
+          sym = match[1].downcase.to_sym
+          next unless NamedEmoji.emojis.keys.include?(sym)
+          @original_text.sub!(match[0], NamedEmoji.emojis[sym])
+        end
+        # Unreplace codes
+        codes.each do |key, value|
+          @original_text.sub!(key, value)
+        end
+      end
+    end
+  end
+end
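
`replace_emoji` works in three steps. It swaps each Markdown code span for a SHA1 placeholder so that `:names:` inside backticks are left alone. It then substitutes emoji, and finally restores the code spans. As written, the emoji loop appears unable to terminate when it meets an unknown name. `next` jumps back to the `while` condition, `@original_text` has not changed, and the same match is found again. Below is a sketch of the same three steps using `gsub` with a block, which leaves unknown names untouched instead of looping. The two-entry `EMOJI` table is a hypothetical stand-in for the `named_emoji` gem.

```ruby
require 'digest/sha1'

# Hypothetical two-entry table standing in for NamedEmoji.emojis.
EMOJI = { heart: "\u2764", smile: "\u{1F604}" }.freeze
CODE_REGEX = %r{ (`+) (.+?) (?<!`) \1 (?!`) }x
EMOJI_COLON_REGEX = /:([a-zA-Z0-9_\-\+]+):/

def replace_named_emoji(text)
  codes = {}
  index = 0
  # 1. Hide code spans behind SHA1 keys so emoji names inside them survive.
  hidden = text.gsub(CODE_REGEX) do |span|
    key = Digest::SHA1.hexdigest("#{span}-#{index}")
    index += 1
    codes[key] = span
    key
  end
  # 2. Swap known names for emoji; unknown names are returned unchanged.
  replaced = hidden.gsub(EMOJI_COLON_REGEX) do |original|
    EMOJI.fetch($1.downcase.to_sym, original)
  end
  # 3. Put the code spans back (block form avoids backslash interpretation).
  codes.each { |key, span| replaced = replaced.sub(key) { span } }
  replaced
end
```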

data/lib/quesadilla/extractor/hashtags.rb ADDED

@@ -0,0 +1,28 @@
+# encoding: UTF-8
+module Quesadilla
+  class Extractor
+    # Extract hashtags.
+    #
+    # This module has no public methods.
+    module Hashtags
+    private
+      require 'twitter-text'
+      def extract_hashtags
+        Twitter::Extractor::extract_hashtags_with_indices(@working_text).each do |entity|
+          entity_text = "##{entity[:hashtag]}"
+          @entities << {
+            type: ENTITY_TYPE_HASHTAG,
+            text: entity_text,
+            display_text: entity_text,
+            indices: entity[:indices],
+            hashtag: entity[:hashtag].downcase
+          }
+          @working_text.sub!(entity_text, REPLACE_TOKEN * entity_text.length)
+        end
+      end
+    end
+  end
+end
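
The hashtag entity keeps the author's capitalization in `text` and `display_text` but stores a lowercased `hashtag`. Presumably this lets `#Awesome` and `#awesome` resolve to the same tag. Below is a rough sketch of that entity shape. A simple `#\w+` regex stands in for twitter-text's much stricter `extract_hashtags_with_indices`, so the sketch does not handle edge cases such as hashtags inside URLs. The helper name and regex are illustrative assumptions.

```ruby
# Naive stand-in for twitter-text's hashtag rules.
HASHTAG_REGEX = /#(\w+)/

def extract_hashtags(text)
  entities = []
  pos = 0
  while (match = HASHTAG_REGEX.match(text, pos))
    entities << {
      type: 'hashtag',
      text: match[0],
      display_text: match[0],
      indices: [match.begin(0), match.end(0)],
      # Normalized key; the displayed text keeps its original case.
      hashtag: match[1].downcase
    }
    pos = match.end(0)
  end
  entities
end
```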

data/lib/quesadilla/extractor/html.rb ADDED

@@ -0,0 +1,103 @@
+# encoding: UTF-8
+module Quesadilla
+  class Extractor
+    # Convert entities and entire string to HTML.
+    #
+    # This module has no public methods.
+    module HTML
+    private
+      HTML_ESCAPE_MAP = [
+        {
+           pattern: '&',
+           text: '&amp;',
+           placeholder: "\uf050",
+        },
+        {
+           pattern: '<',
+           text: '&lt;',
+           placeholder: "\uf051",
+        },
+        {
+           pattern: '>',
+           text: '&gt;',
+           placeholder: "\uf052",
+        },
+        {
+           pattern: '"',
+           text: '&quot;',
+           placeholder: "\uf053",
+        },
+        {
+           pattern: '\'',
+           text: '&#x27;',
+           placeholder: "\uf054",
+        },
+        {
+           pattern: '/',
+           text: '&#x2F;',
+           placeholder: "\uf055",
+        }
+      ].freeze
+      def display_html(display_text, entities)
+         return html_escape(display_text) unless entities and entities.length > 0
+         # Replace entities
+         html = sub_entities(display_text, entities, true) do |entity|
+           html_entity(entity)
+        end
+         # Return
+         html_un_pre_escape(html)
+      end
+      def html_entity(entity)
+        display_text = html_pre_escape(entity[:display_text])
+        case entity[:type]
+        when ENTITY_TYPE_EMPHASIS
+          @renderer.emphasis(display_text)
+        when ENTITY_TYPE_DOUBLE_EMPHASIS
+          @renderer.double_emphasis(display_text)
+        when ENTITY_TYPE_TRIPLE_EMPHASIS
+          @renderer.triple_emphasis(display_text)
+        when ENTITY_TYPE_STRIKETHROUGH
+          @renderer.strikethrough(display_text)
+        when ENTITY_TYPE_CODE
+          @renderer.code(display_text)
+        when ENTITY_TYPE_HASHTAG
+          @renderer.hashtag(display_text, html_pre_escape(entity[:hashtag]))
+        when ENTITY_TYPE_LINK
+          @renderer.link(display_text, entity[:url], html_pre_escape(entity[:title]))
+        else
+          # Catchall
+          html_pre_escape(entity[:text])
+        end
+      end
+      # Pre-escape. Convert bad characters to high UTF-8 characters
+      # We do this dance so we don't throw off the indexes so the entities get inserted correctly.
+      def html_pre_escape(string)
+         return '' unless string
+         HTML_ESCAPE_MAP.each do |escape|
+           string = string.gsub(escape[:pattern], escape[:placeholder])
+         end
+         string
+      end
+      # Convert bad characters (now, high UTF-8 characters) to HTML escaped ones
+      def html_un_pre_escape(string)
+         HTML_ESCAPE_MAP.each do |escape|
+           string = string.gsub(escape[:placeholder], escape[:text])
+        end
+        string
+      end
+      def html_escape(string)
+        return '' unless string
+        string.gsub(/&/, '&amp;').gsub(/</, '&lt;').gsub(/>/, '&gt;').gsub(/"/, '&quot;').gsub(/'/, '&#x27;').gsub(/\//, '&#x2F;')
+      end
+    end
+  end
+end
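
The placeholder dance in `html_pre_escape` works because each unsafe character is swapped for exactly one private-use character (`\uf050` to `\uf055`). As a result, entity indices computed on the plain text still point at the right characters. Renderer output can then be spliced in as raw HTML, and `html_un_pre_escape` expands only the placeholders, leaving the renderer's own tags alone. Below is a compact sketch of that idea using the same placeholder code points. The `render_with_entity` helper and its hard-coded `<em>` wrapper are hypothetical stand-ins for `sub_entities` and the renderer, neither of which is shown here.

```ruby
# Same characters, placeholders and entities as HTML_ESCAPE_MAP above.
ESCAPE_MAP = [
  ['&', "\uf050", '&amp;'],
  ['<', "\uf051", '&lt;'],
  ['>', "\uf052", '&gt;'],
  ['"', "\uf053", '&quot;'],
  ["'", "\uf054", '&#x27;'],
  ['/', "\uf055", '&#x2F;']
].freeze
TO_PLACEHOLDER = ESCAPE_MAP.map { |char, ph, _| [char, ph] }.to_h.freeze
TO_ENTITY = ESCAPE_MAP.map { |_, ph, entity| [ph, entity] }.to_h.freeze

# One character in, one character out: string length is preserved.
def pre_escape(string)
  string.gsub(/[&<>"'\/]/, TO_PLACEHOLDER)
end

# Expand placeholders only; raw tags inserted later are not touched.
def un_pre_escape(string)
  string.gsub(/[\uf050-\uf055]/, TO_ENTITY)
end

# Hypothetical: wrap text[start...stop] in <em>, rendering `inner` inside it.
def render_with_entity(text, start, stop, inner)
  escaped = pre_escape(text)
  html = escaped[0...start] + "<em>#{pre_escape(inner)}</em>" + escaped[stop..-1]
  un_pre_escape(html)
end
```

For `'1 < 2 *yes*'` with the emphasis entity at `[6, 11]`, the `<` in the text becomes `&lt;`, while the inserted `<em>` tags pass through as markup.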

data/lib/quesadilla/extractor/markdown.rb ADDED

@@ -0,0 +1,187 @@
+# encoding: UTF-8
+module Quesadilla
+  class Extractor
+    # Extract Markdown
+    #
+    # This module has no public methods.
+    module Markdown
+    private
+      # Gruber's regex is recursive, but I can't figure out how to do it in Ruby without the `g` option.
+      # Maybe I should use StringScanner instead. For now, I think it's fine. Everything appears to work
+      # as expected.
+      NESTED_BRACKETS_REGEX = %r{
+        (?>
+           [^\[\]]+
+        )*
+      }x.freeze
+      # 2 = Text, 3 = URL, 6 = Title
+      LINK_REGEX = %r{
+        (
+          \[
+            (#{NESTED_BRACKETS_REGEX})
+          \]
+          \(
+            [ \t]*
+          <?(.*?)>?
+            [ \t]*
+          (
+            (['"])
+            (.*?)
+            \5
+          )?
+          \)
+        )
+      }x.freeze
+      # 1 = URL
+      AUTOLINK_LINK_REGEX = /<((?:https?|ftp):[^'">\s]+)>/i.freeze
+      # 1 = Email
+      AUTOLINK_EMAIL_REGEX = %r{
+        <
+            (?:mailto:)?
+        (
+          [-.\w]+
+          \@
+          [-a-z0-9]+(?:\.[-a-z0-9]+)*\.[a-z]+
+        )
+        >
+      }xi.freeze
+      # 1 = Delimiter, 2 = Text
+      BOLD_ITALIC_REGEX = %r{ (\*\*\*|___) (?=\S) (.+?[*_]*) (?<=\S) \1 }x.freeze
+      # 1 = Delimiter, 2 = Text
+      BOLD_REGEX = %r{ (\*\*|__) (?=\S) (.+?[*_]*) (?<=\S) \1 }x.freeze
+      # 1 = Delimiter, 2 = Text
+      ITALIC_REGEX = %r{ (\*|_) (?=\S) (.+?) (?<=\S) \1 }x.freeze
+      # 1 = Delimiter, 2 = Text
+      STRIKETHROUGH_REGEX = %r{ (~~) (?=\S) (.+?[~]*) (?<=\S) \1 }x.freeze
+      # 1 = Delimiter, 2 = Text
+      CODE_REGEX = %r{ (`+) (.+?) (?<!`) \1 (?!`) }x.freeze
+      def extract_markdown
+        extract_markdown_code if @options[:markdown_code]
+        if @options[:markdown_links]
+          extract_markdown_autolink_links
+          extract_markdown_autolink_email
+          extract_markdown_links
+        end
+        extract_markdown_span(BOLD_ITALIC_REGEX, ENTITY_TYPE_TRIPLE_EMPHASIS) if @options[:markdown_triple_emphasis]
+        extract_markdown_span(BOLD_REGEX, ENTITY_TYPE_DOUBLE_EMPHASIS) if @options[:markdown_double_emphasis]
+        extract_markdown_span(ITALIC_REGEX, ENTITY_TYPE_EMPHASIS) if @options[:markdown_emphasis]
+        extract_markdown_span(STRIKETHROUGH_REGEX, ENTITY_TYPE_STRIKETHROUGH) if @options[:markdown_strikethrough]
+      end
+    private
+      def extract_markdown_span(regex, type)
+        # Match until there's no results
+        while match = @working_text.match(regex)
+          original = match[0]
+          length = original.length
+          # Find the start position of the original
+          start = @working_text.index(original)
+          # Create the entity
+          entity = {
+            type: type,
+            text: original,
+            display_text: match[2],
+            indices: [
+              start,
+              start + length
+            ]
+          }
+          # Let block modify
+          entity = yield(entity, match) if block_given?
+          # Add the entity
+          @entities << entity
+          # Remove from the working text
+          @working_text.sub!(original, REPLACE_TOKEN * length)
+        end
+      end
+      def extract_markdown_code
+        extract_markdown_span(CODE_REGEX, 'code') do |entity, match|
+          # Strip tabs from the display text
+          display = match[2]
+          display.gsub!(/^[ \t]*/, '')
+          display.gsub!(/[ \t]*$/, '')
+          entity[:display_text] = display
+          entity
+        end
+      end
+      def extract_markdown_autolink(regex)
+        # Match until there's no results
+        while match = @working_text.match(regex)
+          original = match[0]
+          length = original.length
+          # Find the start position of the original
+          start = @working_text.index(original)
+          # Create the entity
+          entity = {
+            type: ENTITY_TYPE_LINK,
+            text: original,
+            indices: [
+              start,
+              start + length
+            ]
+          }
+          # Let block modify
+          entity = yield(entity, match) if block_given?
+          # Add the entity
+          @entities << entity
+          # Remove from the working text
+          @working_text.sub!(original, REPLACE_TOKEN * length)
+        end
+      end
+      def extract_markdown_autolink_links
+        extract_markdown_autolink AUTOLINK_LINK_REGEX do |entity, match|
+          entity[:url] = match[1]
+          entity[:display_text] = display_url(match[1])
+          entity
+        end
+      end
+      def extract_markdown_autolink_email
+        extract_markdown_autolink AUTOLINK_EMAIL_REGEX do |entity, match|
+          email = match[1]
+          entity[:url] = "mailto:#{email}"
+          entity[:display_text] = email
+          entity
+        end
+      end
+      def extract_markdown_links
+        extract_markdown_span(LINK_REGEX, ENTITY_TYPE_LINK) do |entity, match|
+          # Add the URL
+          entity[:url] = match[3]
+          # Add the title
+          entity[:title] = match[6] if match[6]
+          entity
+        end
+      end
+    end
+  end
+end
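
`extract_markdown_span` loops until its regex stops matching. For each match it records the start and end, then masks the match in `@working_text`. The masking is what lets the extractors run in sequence: once `**b**` is masked, `ITALIC_REGEX` can no longer mistake its asterisks for single emphasis. Below is a condensed sketch of that loop with two of the regexes copied from above. It assumes a one-character mask, since the gem's `REPLACE_TOKEN` is defined elsewhere, and uses hypothetical entity type strings. It also reads the start from `match.begin(0)` rather than re-searching with `index`, which is a small deviation from the original.

```ruby
MASK = "\uf000" # assumed one-character stand-in for REPLACE_TOKEN

# Copied from Quesadilla::Extractor::Markdown; group 2 is the display text.
BOLD_REGEX = %r{ (\*\*|__) (?=\S) (.+?[*_]*) (?<=\S) \1 }x
ITALIC_REGEX = %r{ (\*|_) (?=\S) (.+?) (?<=\S) \1 }x

def extract_spans(working, regex, type, entities)
  while (match = working.match(regex))
    start, stop = match.begin(0), match.end(0)
    entities << { type: type, text: match[0], display_text: match[2], indices: [start, stop] }
    # Same-length mask: later indices stay valid and the span can't re-match.
    working[start...stop] = MASK * (stop - start)
  end
end

def extract_emphasis(text)
  working = text.dup
  entities = []
  # Order matters: double emphasis first, as in extract_markdown.
  extract_spans(working, BOLD_REGEX, 'double_emphasis', entities)
  extract_spans(working, ITALIC_REGEX, 'emphasis', entities)
  entities
end
```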