RubyGems - quesadilla - Versions diffs - 0.1.0 - Mend

quesadilla 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (29)

checksums.yaml +7 -0
data/.gitignore +17 -0
data/.travis.yml +6 -0
data/Contributing.markdown +19 -0
data/Gemfile +19 -0
data/LICENSE +22 -0
data/Rakefile +21 -0
data/Readme.markdown +100 -0
data/lib/quesadilla/core_ext/string.rb +28 -0
data/lib/quesadilla/extractor/autolinks.rb +28 -0
data/lib/quesadilla/extractor/emoji.rb +43 -0
data/lib/quesadilla/extractor/hashtags.rb +28 -0
data/lib/quesadilla/extractor/html.rb +103 -0
data/lib/quesadilla/extractor/markdown.rb +187 -0
data/lib/quesadilla/extractor.rb +140 -0
data/lib/quesadilla/html_renderer.rb +57 -0
data/lib/quesadilla/version.rb +4 -0
data/lib/quesadilla.rb +45 -0
data/quesadilla.gemspec +28 -0
data/test/quesadilla/autolink_test.rb +84 -0
data/test/quesadilla/emoji_test.rb +103 -0
data/test/quesadilla/hashtags_test.rb +50 -0
data/test/quesadilla/html_test.rb +21 -0
data/test/quesadilla/markdown_test.rb +235 -0
data/test/quesadilla/multi_test.rb +64 -0
data/test/quesadilla_test.rb +9 -0
data/test/support/extractor_macros.rb +5 -0
data/test/test_helper.rb +18 -0
metadata +109 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 17f8cfec03e4c86278585c52d2ae57995a984d3c
+  data.tar.gz: e145d4496a0bc034e278f6a2c4738d48006d7a8b
+SHA512:
+  metadata.gz: aaf8418d063286d7666e0b666b5badbcf63e2e54cceb225d35e69bc238fb3f0973544c547cbc806dde042e5d8f7ae37fa4accb1325c210ef5d9eb8a989f6330a
+  data.tar.gz: 39f070f60d71278bceeab4e939944ebc921609e4c6eb90b26f0e09ac693639b4dd80bd3a423e33dccd32f641fe6218213b665c3ebf43282e2a203134ef7454fb

data/.gitignore ADDED

@@ -0,0 +1,17 @@
+*.gem
+*.rbc
+.bundle
+.config
+.yardoc
+Gemfile.lock
+InstalledFiles
+_yardoc
+coverage
+doc/
+lib/bundler/man
+pkg
+rdoc
+spec/reports
+test/tmp
+test/version_tmp
+tmp

data/.travis.yml ADDED

@@ -0,0 +1,6 @@
+language: ruby
+bundler_args: --without development
+rvm:
+  - 1.9.3
+  - 2.0.0
+  - jruby-19mode

data/Contributing.markdown ADDED

@@ -0,0 +1,19 @@
+## Submitting a Pull Request
+1. [Fork the repository.][fork]
+2. [Create a topic branch.][branch]
+3. Add tests for your unimplemented feature or bug fix.
+4. Run `bundle exec rake`. If your tests pass, return to step 3.
+5. Implement your feature or bug fix.
+6. Run `bundle exec rake`. If your tests fail, return to step 5.
+7. Run `open coverage/index.html`. If your changes are not completely covered
+   by your tests, return to step 3.
+8. Add documentation for your feature or bug fix.
+9. Run `bundle exec rake doc`. If your changes are not 100% documented, go
+   back to step 8.
+10. Add, commit, and push your changes.
+11. [Submit a pull request.][pr]
+[fork]: http://help.github.com/fork-a-repo/
+[branch]: http://learn.github.com/p/branching.html
+[pr]: http://help.github.com/send-pull-requests/

data/Gemfile ADDED

@@ -0,0 +1,19 @@
+source 'https://rubygems.org'
+# Gem dependencies
+gemspec
+gem 'rake', group: [:development, :test]
+# Development dependencies
+group :development do
+  gem 'yard'
+  gem 'redcarpet', platform: 'ruby'
+end
+# Testing dependencies
+group :test do
+  gem 'minitest'
+  gem 'minitest-wscolor'
+  gem 'simplecov', require: false
+end

data/LICENSE ADDED

@@ -0,0 +1,22 @@
+Copyright (c) 2013 Sam Soffes
+MIT License
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/Rakefile ADDED

@@ -0,0 +1,21 @@
+require 'bundler'
+Bundler::GemHelper.install_tasks
+require 'rake/testtask'
+Rake::TestTask.new(:test) do |t|
+  t.libs << 'test'
+  t.pattern = 'test/**/*_test.rb'
+end
+task default: :test
+begin
+  require 'yard'
+  YARD::Rake::YardocTask.new(:doc) do |task|
+    task.files   = ['Readme.markdown', 'LICENSE', 'lib/**/*.rb']
+    task.options = [
+      '--output-dir', 'doc',
+      '--markup', 'markdown',
+    ]
+  end
+rescue LoadError
+end

data/Readme.markdown ADDED

@@ -0,0 +1,100 @@
+# Quesadilla
+Entity-style text parsing. Quesadilla was extracted from [Cheddar](https://cheddarapp.com).
+See the [Cheddar text guide](https://cheddarapp.com/text) for more information about how to type entities.
+## Installation
+Add this line to your application's Gemfile:
+``` ruby
+gem 'quesadilla'
+```
+And then execute:
+    $ bundle
+Or install it yourself as:
+    $ gem install quesadilla
+## Usage
+To extract entities from text, simply call extract:
+``` ruby
+Quesadilla.extract('Some #awesome text')
+# => {
+#   display_text: "Some #awesome text",
+#   display_html: "Some <a href=\"#hashtag-awesome\" class=\"tag\">#awesome</a> text",
+#   entities: [
+#     {
+#       type: "hashtag",
+#       text: "#awesome",
+#       display_text: "#awesome",
+#       indices: [5, 13],
+#       hashtag: "awesome",
+#       display_indices: [5, 13]
+#     }
+#   ]
+# }
+```
+### Configuring
+Quesadilla supports extracting various span-level Markdown features as well as automatically detecting links and GitHub-style named emoji. Here is the list of options you can pass when extracting:
+Option                      | Description
+----------------------------|-----------------------------------------------------------------
+`:markdown`                 | All Markdown parsing
+`:markdown_code`            | Markdown code tags
+`:markdown_links`           | Markdown links (including `<http://soff.es>` style links)
+`:markdown_triple_emphasis` | Markdown bold italic
+`:markdown_double_emphasis` | Markdown bold
+`:markdown_emphasis`        | Markdown italic
+`:markdown_strikethrough`   | Markdown Extra strikethrough
+`:hashtags`                 | Hashtags
+`:autolinks`                | Automatically detect links
+`:emoji`                    | GitHub-style named emoji
+`:html`                     | Generate HTML representations for entities and the entire string
+Everything is enabled by default. If you don't want to extract Markdown, you should call the extractor like this:
+``` ruby
+Quesadilla.extract('Some text', markdown: false)
+```
+You can also just disable strikethrough and still extract the rest of the Markdown entities if you want:
+``` ruby
+Quesadilla.extract('Some text', markdown_strikethrough: false)
+```
+### Customizing HTML
+If you want to change the generated HTML, you can create a custom renderer:
+``` ruby
+class CustomRenderer < Quesadilla::HTMLRenderer
+  def hashtag(display_text, hashtag)
+    %Q{<a href="http://example.com/tags/#{hashtag}" class="tag">#{display_text}</a>}
+  end
+end
+extraction = Quesadilla.extract('Some #awesome text', html_renderer: CustomRenderer)
+extraction[:display_html] #=> 'Some <a href="http://example.com/tags/awesome" class="tag">#awesome</a> text'
+```
+Take a look at [Quesadilla::HTMLRenderer](lib/quesadilla/html_renderer.html) for more details on creating a custom renderer.
+## Supported Ruby Versions
+Quesadilla is tested under 1.9.3, 2.0.0, and JRuby (1.9 mode).
+[![Build Status](https://travis-ci.org/soffes/quesadilla.png?branch=master)](https://travis-ci.org/soffes/quesadilla)
+## Contributing
+See the [contributing guide](Contributing.markdown).

data/lib/quesadilla/core_ext/string.rb ADDED

@@ -0,0 +1,28 @@
+# String additions
+class String
+  # Truncate method from ActiveSupport.
+  # @param truncate_at [Fixnum] number of characters to truncate after
+  # @param options [Hash] optional options hash
+  # @option options separator [String] truncate text only at a certain separator string
+  # @option options omission [String] string to add at the end to indicate truncated text. Defaults to '...'
+  # @return [String] truncated string
+  def q_truncate(truncate_at, options = {})
+    return dup unless length > truncate_at
+    # Default omission to '...'
+    options[:omission] ||= '...'
+    # Account for the omission string in the truncated length
+    truncate_at -= options[:omission].length
+    # Calculate end index
+    stop = if options[:separator]
+      rindex(options[:separator], truncate_at) || truncate_at
+    else
+      truncate_at
+    end
+    # Return the truncated string plus the omission string
+    self[0...stop] + options[:omission]
+  end
+end
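
One way to check the truncation arithmetic in `q_truncate` is to run the same steps outside of `String`. The sketch below is a standalone approximation, not the gem's code: `truncate_text` and its keyword arguments are hypothetical names, and unlike the method in the diff it does not write the default omission back into a caller-supplied options hash.

```ruby
# Standalone approximation of the truncation steps shown in the diff.
# `truncate_text` is a hypothetical helper name, not part of the gem.
def truncate_text(text, truncate_at, omission: '...', separator: nil)
  # Short strings are returned as a copy, untouched
  return text.dup unless text.length > truncate_at

  # Reserve room for the omission string inside the requested length
  limit = truncate_at - omission.length

  # Cut at the last separator at or before the limit, if there is one
  stop = separator ? (text.rindex(separator, limit) || limit) : limit

  text[0...stop] + omission
end

sentence = 'Once upon a time in a world far far away'
truncate_text(sentence, 27)                 # => "Once upon a time in a wo..."
truncate_text(sentence, 27, separator: ' ') # => "Once upon a time in a..."
```

The result is never longer than `truncate_at`, because the omission length is subtracted before the cut point is chosen.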

data/lib/quesadilla/extractor/autolinks.rb ADDED

@@ -0,0 +1,28 @@
+# encoding: UTF-8
+module Quesadilla
+  class Extractor
+    # Extract plain links.
+    #
+    # This module has no public methods.
+    module Autolinks
+    private
+      require 'twitter-text'
+      def extract_autolinks
+        Twitter::Extractor::extract_urls_with_indices(@working_text).each do |entity|
+          entity_text = entity[:url]
+          @entities << {
+            type: ENTITY_TYPE_LINK,
+            text: entity_text,
+            display_text: display_url(entity[:url]),
+            url: quality_url(entity[:url]),
+            indices: entity[:indices]
+          }
+          @working_text.sub!(entity_text, REPLACE_TOKEN * entity_text.length)
+        end
+      end
+    end
+  end
+end

data/lib/quesadilla/extractor/emoji.rb ADDED

@@ -0,0 +1,43 @@
+# encoding: UTF-8
+module Quesadilla
+  class Extractor
+    # Extract named emoji.
+    #
+    # This module has no public methods.
+    module Emoji
+    private
+      require 'named_emoji'
+      # Emoji colon-syntax regex
+      EMOJI_COLON_REGEX = %r{:([a-zA-Z0-9_\-\+]+):}.freeze
+      def replace_emoji
+        codes = {}
+        # Replace codes with shas
+        i = 0
+        while match = @original_text.match(Markdown::CODE_REGEX)
+          original = match[0]
+          key = Digest::SHA1.hexdigest("#{original}-#{i}")
+          codes[key] = original
+          @original_text.sub!(original, key)
+          i += 1
+        end
+        # Replace emojis
+        while match = @original_text.match(EMOJI_COLON_REGEX)
+          sym = match[1].downcase.to_sym
+          next unless NamedEmoji.emojis.keys.include?(sym)
+          @original_text.sub!(match[0], NamedEmoji.emojis[sym])
+        end
+        # Unreplace codes
+        codes.each do |key, value|
+          @original_text.sub!(key, value)
+        end
+      end
+    end
+  end
+end
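
`replace_emoji` works in three passes: hide code spans behind SHA1 keys, swap `:name:` codes for emoji, then restore the code spans. The sketch below reproduces that flow under stated assumptions: the `EMOJI` table is a tiny stand-in for the `named_emoji` lookup, and `replace_named_emoji` is a hypothetical name. It also uses `gsub` with a block instead of the `while`/`next` loop, since that loop appears to re-match the same unknown name on every pass.

```ruby
require 'digest'

# Tiny stand-in table (assumption); the gem looks names up via named_emoji.
EMOJI = { 'smile' => "\u{1F604}", 'heart' => "\u2764" }.freeze

# Same shape as the CODE_REGEX and EMOJI_COLON_REGEX in the diff
CODE_SPAN  = /(`+)(.+?)(?<!`)\1(?!`)/
EMOJI_NAME = /:([a-zA-Z0-9_\-+]+):/

def replace_named_emoji(text)
  stash = {}

  # 1. Swap each code span for a digest key so names inside code are left alone
  hidden = text.gsub(CODE_SPAN) do |span|
    key = Digest::SHA1.hexdigest("#{span}-#{stash.size}")
    stash[key] = span
    key
  end

  # 2. Replace known names; unknown names are returned unchanged
  replaced = hidden.gsub(EMOJI_NAME) { |code| EMOJI.fetch($1.downcase, code) }

  # 3. Put the code spans back (block form avoids backslash interpretation)
  stash.each { |key, span| replaced = replaced.sub(key) { span } }
  replaced
end

replace_named_emoji('Use `:smile:` for :smile:') # => "Use `:smile:` for 😄"
replace_named_emoji(':nope: stays as typed')     # => ":nope: stays as typed"
```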

data/lib/quesadilla/extractor/hashtags.rb ADDED

@@ -0,0 +1,28 @@
+# encoding: UTF-8
+module Quesadilla
+  class Extractor
+    # Extract hashtags.
+    #
+    # This module has no public methods.
+    module Hashtags
+    private
+      require 'twitter-text'
+      def extract_hashtags
+        Twitter::Extractor::extract_hashtags_with_indices(@working_text).each do |entity|
+          entity_text = "##{entity[:hashtag]}"
+          @entities << {
+            type: ENTITY_TYPE_HASHTAG,
+            text: entity_text,
+            display_text: entity_text,
+            indices: entity[:indices],
+            hashtag: entity[:hashtag].downcase
+          }
+          @working_text.sub!(entity_text, REPLACE_TOKEN * entity_text.length)
+        end
+      end
+    end
+  end
+end
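
Both `extract_autolinks` and `extract_hashtags` overwrite each extracted entity in `@working_text` with `REPLACE_TOKEN` repeated to the entity's length, so the text keeps its length and earlier indices stay valid. A minimal sketch of that masking idea follows. It assumes a NUL character as the placeholder (the gem's actual `REPLACE_TOKEN` value is not shown in this part of the diff) and a simplified regex in place of the twitter-text extractor; `extract_and_mask` is a hypothetical name.

```ruby
MASK    = "\u0000"          # assumed placeholder, not the gem's REPLACE_TOKEN
HASHTAG = /#[[:alnum:]_]+/  # simplified stand-in for twitter-text

def extract_and_mask(text, regex, type)
  working  = text.dup
  entities = []

  while (match = working.match(regex))
    start  = match.begin(0)
    length = match[0].length
    entities << { type: type, text: match[0], indices: [start, start + length] }

    # Overwrite the match in place so it cannot be matched again,
    # while every other character keeps its original position
    working[start, length] = MASK * length
  end

  [working, entities]
end

working, entities = extract_and_mask('Some #awesome text', HASHTAG, 'hashtag')
entities.first[:indices] # => [5, 13]
working.length           # => 18
```

The sketch masks by match offset rather than calling `sub!` with the entity text, so the span that gets masked is always the one that was just recorded.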

data/lib/quesadilla/extractor/html.rb ADDED

@@ -0,0 +1,103 @@
+# encoding: UTF-8
+module Quesadilla
+  class Extractor
+    # Convert entities and entire string to HTML.
+    #
+    # This module has no public methods.
+    module HTML
+    private
+      HTML_ESCAPE_MAP = [
+        {
+           pattern: '&',
+           text: '&amp;',
+           placeholder: "\uf050",
+        },
+        {
+           pattern: '<',
+           text: '&lt;',
+           placeholder: "\uf051",
+        },
+        {
+           pattern: '>',
+           text: '&gt;',
+           placeholder: "\uf052",
+        },
+        {
+           pattern: '"',
+           text: '&quot;',
+           placeholder: "\uf053",
+        },
+        {
+           pattern: '\'',
+           text: '&#x27;',
+           placeholder: "\uf054",
+        },
+        {
+           pattern: '/',
+           text: '&#x2F;',
+           placeholder: "\uf055",
+        }
+      ].freeze
+      def display_html(display_text, entities)
+         return html_escape(display_text) unless entities and entities.length > 0
+         # Replace entities
+         html = sub_entities(display_text, entities, true) do |entity|
+           html_entity(entity)
+        end
+         # Return
+         html_un_pre_escape(html)
+      end
+      def html_entity(entity)
+        display_text = html_pre_escape(entity[:display_text])
+        case entity[:type]
+        when ENTITY_TYPE_EMPHASIS
+          @renderer.emphasis(display_text)
+        when ENTITY_TYPE_DOUBLE_EMPHASIS
+          @renderer.double_emphasis(display_text)
+        when ENTITY_TYPE_TRIPLE_EMPHASIS
+          @renderer.triple_emphasis(display_text)
+        when ENTITY_TYPE_STRIKETHROUGH
+          @renderer.strikethrough(display_text)
+        when ENTITY_TYPE_CODE
+          @renderer.code(display_text)
+        when ENTITY_TYPE_HASHTAG
+          @renderer.hashtag(display_text, html_pre_escape(entity[:hashtag]))
+        when ENTITY_TYPE_LINK
+          @renderer.link(display_text, entity[:url], html_pre_escape(entity[:title]))
+        else
+          # Catchall
+          html_pre_escape(entity[:text])
+        end
+      end
+      # Pre-escape. Convert bad characters to high UTF-8 characters
+      # We do this dance so we don't throw off the indexes so the entities get inserted correctly.
+      def html_pre_escape(string)
+         return '' unless string
+         HTML_ESCAPE_MAP.each do |escape|
+           string = string.gsub(escape[:pattern], escape[:placeholder])
+         end
+         string
+      end
+      # Convert bad characters (now, high UTF-8 characters) to HTML escaped ones
+      def html_un_pre_escape(string)
+         HTML_ESCAPE_MAP.each do |escape|
+           string = string.gsub(escape[:placeholder], escape[:text])
+        end
+        string
+      end
+      def html_escape(string)
+        return '' unless string
+        string.gsub(/&/, '&amp;').gsub(/</, '&lt;').gsub(/>/, '&gt;').gsub(/"/, '&quot;').gsub(/'/, '&#x27;').gsub(/\//, '&#x2F;')
+      end
+    end
+  end
+end
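
The pre-escape step in this file swaps each unsafe character for a single private-use placeholder, so the text keeps its length while entity markup is inserted, and only afterwards expands the placeholders into HTML escapes. The sketch below illustrates that two-phase idea with a reduced map. `ESCAPES`, `pre_escape`, and `un_pre_escape` are hypothetical names, and the `<em>` wrapper stands in for whatever the renderer would emit.

```ruby
# char => [single-character placeholder, final HTML escape]
ESCAPES = {
  '&' => ["\uf050", '&amp;'],
  '<' => ["\uf051", '&lt;'],
  '>' => ["\uf052", '&gt;'],
  '"' => ["\uf053", '&quot;']
}.freeze

# Phase 1: one character in, one character out, so indices do not shift
def pre_escape(string)
  ESCAPES.each { |char, (placeholder, _html)| string = string.gsub(char, placeholder) }
  string
end

# Phase 2: expand placeholders once all markup has been inserted
def un_pre_escape(string)
  ESCAPES.each_value { |(placeholder, html)| string = string.gsub(placeholder, html) }
  string
end

user_text = pre_escape('a < b & c')
user_text.length                        # => 9, same as the input
un_pre_escape("<em>#{user_text}</em>")  # => "<em>a &lt; b &amp; c</em>"
```

Markup added between the two phases is left alone, because only the placeholders are expanded in the second phase.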

data/lib/quesadilla/extractor/markdown.rb ADDED

@@ -0,0 +1,187 @@
+# encoding: UTF-8
+module Quesadilla
+  class Extractor
+    # Extract Markdown
+    #
+    # This module has no public methods.
+    module Markdown
+    private
+      # Gruber's regex is recursive, but I can't figure out how to do it in Ruby without the `g` option.
+      # Maybe I should use StringScanner instead. For now, I think it's fine. Everything appears to work
+      # as expected.
+      NESTED_BRACKETS_REGEX = %r{
+        (?>
+           [^\[\]]+
+        )*
+      }x.freeze
+      # 2 = Text, 3 = URL, 6 = Title
+      LINK_REGEX = %r{
+        (
+          \[
+            (#{NESTED_BRACKETS_REGEX})
+          \]
+          \(
+            [ \t]*
+          <?(.*?)>?
+            [ \t]*
+          (
+            (['"])
+            (.*?)
+            \5
+          )?
+          \)
+        )
+      }x.freeze
+      # 1 = URL
+      AUTOLINK_LINK_REGEX = /<((?:https?|ftp):[^'">\s]+)>/i.freeze
+      # 1 = Email
+      AUTOLINK_EMAIL_REGEX = %r{
+        <
+            (?:mailto:)?
+        (
+          [-.\w]+
+          \@
+          [-a-z0-9]+(?:\.[-a-z0-9]+)*\.[a-z]+
+        )
+        >
+      }xi.freeze
+      # 1 = Delimiter, 2 = Text
+      BOLD_ITALIC_REGEX = %r{ (\*\*\*|___) (?=\S) (.+?[*_]*) (?<=\S) \1 }x.freeze
+      # 1 = Delimiter, 2 = Text
+      BOLD_REGEX = %r{ (\*\*|__) (?=\S) (.+?[*_]*) (?<=\S) \1 }x.freeze
+      # 1 = Delimiter, 2 = Text
+      ITALIC_REGEX = %r{ (\*|_) (?=\S) (.+?) (?<=\S) \1 }x.freeze
+      # 1 = Delimiter, 2 = Text
+      STRIKETHROUGH_REGEX = %r{ (~~) (?=\S) (.+?[~]*) (?<=\S) \1 }x.freeze
+      # 1 = Delimiter, 2 = Text
+      CODE_REGEX = %r{ (`+) (.+?) (?<!`) \1 (?!`) }x.freeze
+      def extract_markdown
+        extract_markdown_code if @options[:markdown_code]
+        if @options[:markdown_links]
+          extract_markdown_autolink_links
+          extract_markdown_autolink_email
+          extract_markdown_links
+        end
+        extract_markdown_span(BOLD_ITALIC_REGEX, ENTITY_TYPE_TRIPLE_EMPHASIS) if @options[:markdown_triple_emphasis]
+        extract_markdown_span(BOLD_REGEX, ENTITY_TYPE_DOUBLE_EMPHASIS) if @options[:markdown_double_emphasis]
+        extract_markdown_span(ITALIC_REGEX, ENTITY_TYPE_EMPHASIS) if @options[:markdown_emphasis]
+        extract_markdown_span(STRIKETHROUGH_REGEX, ENTITY_TYPE_STRIKETHROUGH) if @options[:markdown_strikethrough]
+      end
+    private
+      def extract_markdown_span(regex, type)
+        # Match until there are no more results
+        while match = @working_text.match(regex)
+          original = match[0]
+          length = original.length
+          # Find the start position of the original
+          start = @working_text.index(original)
+          # Create the entity
+          entity = {
+            type: type,
+            text: original,
+            display_text: match[2],
+            indices: [
+              start,
+              start + length
+            ]
+          }
+          # Let block modify
+          entity = yield(entity, match) if block_given?
+          # Add the entity
+          @entities << entity
+          # Remove from the working text
+          @working_text.sub!(original, REPLACE_TOKEN * length)
+        end
+      end
+      def extract_markdown_code
+        extract_markdown_span(CODE_REGEX, 'code') do |entity, match|
+          # Strip leading and trailing spaces and tabs from the display text
+          display = match[2]
+          display.gsub!(/^[ \t]*/, '')
+          display.gsub!(/[ \t]*$/, '')
+          entity[:display_text] = display
+          entity
+        end
+      end
+      def extract_markdown_autolink(regex)
+        # Match until there are no more results
+        while match = @working_text.match(regex)
+          original = match[0]
+          length = original.length
+          # Find the start position of the original
+          start = @working_text.index(original)
+          # Create the entity
+          entity = {
+            type: ENTITY_TYPE_LINK,
+            text: original,
+            indices: [
+              start,
+              start + length
+            ]
+          }
+          # Let block modify
+          entity = yield(entity, match) if block_given?
+          # Add the entity
+          @entities << entity
+          # Remove from the working text
+          @working_text.sub!(original, REPLACE_TOKEN * length)
+        end
+      end
+      def extract_markdown_autolink_links
+        extract_markdown_autolink AUTOLINK_LINK_REGEX do |entity, match|
+          entity[:url] = match[1]
+          entity[:display_text] = display_url(match[1])
+          entity
+        end
+      end
+      def extract_markdown_autolink_email
+        extract_markdown_autolink AUTOLINK_EMAIL_REGEX do |entity, match|
+          email = match[1]
+          entity[:url] = "mailto:#{email}"
+          entity[:display_text] = email
+          entity
+        end
+      end
+      def extract_markdown_links
+        extract_markdown_span(LINK_REGEX, ENTITY_TYPE_LINK) do |entity, match|
+          # Add the URL
+          entity[:url] = match[3]
+          # Add the title
+          entity[:title] = match[6] if match[6]
+          entity
+        end
+      end
+    end
+  end
+end
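
`extract_markdown` runs the emphasis patterns from widest delimiter to narrowest. The sketch below copies three of the patterns from the diff under shorter, hypothetical constant names (`BOLD`, `ITALIC`, `CODE`) to show what each capture group holds, and why the single-delimiter pattern would mis-read bold text if it ran before the bold span had been masked.

```ruby
# Copies of BOLD_REGEX, ITALIC_REGEX and CODE_REGEX from the diff
BOLD   = %r{ (\*\*|__) (?=\S) (.+?[*_]*) (?<=\S) \1 }x
ITALIC = %r{ (\*|_) (?=\S) (.+?) (?<=\S) \1 }x
CODE   = %r{ (`+) (.+?) (?<!`) \1 (?!`) }x

text = 'Some **bold** text'

bold = text.match(BOLD)
bold[0]       # => "**bold**"  (whole entity, used for :text)
bold[1]       # => "**"        (delimiter)
bold[2]       # => "bold"      (used for :display_text)
bold.begin(0) # => 5           (start index)

# Run against unmasked text, the italic pattern grabs part of the bold span
text.match(ITALIC)[0] # => "**bold*"

code = 'Run `rake test` now'.match(CODE)
code[2] # => "rake test"
```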