RubyGems - archieml - Versions diffs - 0.1.0 - Mend

archieml 0.1.0


Files changed (13)

checksums.yaml +7 -0
data/.gitignore +3 -0
data/Gemfile +4 -0
data/LICENSE +13 -0
data/README.md +172 -0
data/archieml.gemspec +15 -0
data/examples/google_drive.rb +103 -0
data/lib/archieml.rb +14 -0
data/lib/archieml/loader.rb +189 -0
data/lib/archieml/version.rb +3 -0
data/spec/lib/archieml/loader_spec.rb +522 -0
data/spec/spec_helper.rb +2 -0
metadata +57 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 38f795c262135209607462aaf552bad7daec3bc0
+  data.tar.gz: 44fe6ad8971f1232b65dfb34caded74fd947f8bb
+SHA512:
+  metadata.gz: e3895fd7157bca39da81dab19a4998b31a9ac6f6aa3b4c5e792fe6b9e68f3a22a41b6b8745ec94dddc77db48fccc35a081652bbbcf822ac6ad1321c9749b2390
+  data.tar.gz: d0a2d979cb5cd1600f5e82b851e38c3fb39adee415641d757fad44ab8a38ce04cbb883cab26c6e5b290c6b2f74297cd159009c3cad79f827612bb79c22e64a61

data/.gitignore ADDED

@@ -0,0 +1,3 @@
+Gemfile.lock
+.ruby-version
+.ruby-gemset

data/Gemfile ADDED

@@ -0,0 +1,4 @@
+source "https://rubygems.org"
+gem "rspec", group: :development
+gem "rspec-mocks", group: :development

data/LICENSE ADDED

@@ -0,0 +1,13 @@
+Copyright (c) 2015 The New York Times Company
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this library except in compliance with the License.
+You may obtain a copy of the License at
+    http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.

data/README.md ADDED

@@ -0,0 +1,172 @@
+# Archieml
+Parse Archie Markup Language (ArchieML) documents into Ruby Hashes.
+Read about the ArchieML specification at [archieml.org](http://archieml.org).
+The current version is `v0.1.0`.
+## Installation
+`gem install archieml`
+## Usage
+```
+require 'archieml'
+Archieml.load("key: value")
+=> {"key"=>"value"}
+File.write("text.aml", "key: value")
+Archieml.load_file("text.aml")
+=> {"key"=>"value"}
+```
+### Using with Google Documents
+We use `archieml` at The New York Times to parse Google Documents containing AML. This requires a little upfront work to download the document and convert it into text that `archieml` can load.
+The first step is authenticating with the Google Drive API, and accessing the document. For this, you will need a user account that is authorized to view the document you wish to download.
+For this example, I'm going to use the official `google-api-client` Ruby gem, but you can use another library or authentication method if you like. Whichever mechanism you choose, you'll need to be able to export the document as either text or HTML; from that point on, the instructions are identical.
+The full example is at [`examples/google_drive.rb`](https://github.com/newsdev/archieml-ruby/blob/master/examples/google_drive.rb).
+First, install the gem directly, or using a Gemfile:
+```
+$ gem install google-api-client
+```
+Next, open up `irb` and run the following code to authorize a user and initialize an OAuth client. Note that if you want to use this on a server, you'll have to set up a more reusable way of authorizing users.
+```
+require 'google/api_client'
+require 'google/api_client/client_secrets'
+require 'google/api_client/auth/installed_app'
+client = Google::APIClient.new(:application_name => 'Ruby Drive sample', :application_version => '1.0.0')
+client_secrets = Google::APIClient::ClientSecrets.load
+flow = Google::APIClient::InstalledAppFlow.new(
+  :client_id => client_secrets.client_id,
+  :client_secret => client_secrets.client_secret,
+  :scope => ['https://www.googleapis.com/auth/drive']
+)
+client.authorization = flow.authorize
+```
+Log into your Google account and authorize the application to access your Google Drive files.
+Now that you have an authenticated `client`, you can make an API call to a document saved in Drive. Create a document with some basic AML inside (such as "key: value"), save it, and note the long string of characters at the end of the URL:
+`https://docs.google.com/a/nytimes.com/document/d/[FILE_ID]/edit`
+`FILE_ID` defaults to a public test file.
+```
+FILE_ID = "1JjYD90DyoaBuRYNxa4_nqrHKkgZf1HrUj30i3rTWX1s"
+drive = client.discovered_api('drive', 'v2')
+result = client.execute(
+  :api_method => drive.files.get,
+  :parameters => { 'fileId' => FILE_ID })
+```
+If the request succeeds, `result` should now hold the file's metadata. The next step is to download the body of the file. The metadata has a property called `exportLinks`, which gives you URLs for the different formats the document can be exported as. Let's start with `text/plain`.
+```
+text_url = result.data['exportLinks']['text/plain']
+text_aml = client.execute(uri: text_url).body
+```
+`text_aml` should now contain your document in plain text! You're all set to run the text through the ArchieML parser.
+```
+require 'archieml'
+parsed = Archieml.load(text_aml)
+```
+Inspect `parsed` and make sure it contains the data you entered into the document.
+There are a few extra steps that we do to make working with Google Documents more useful. With a little more prep, we generally process the documents to:
+* Include links that users enter in the Google document as HTML `<a>` tags
+* Remove smart quotes inside tag brackets `<>` (which Google loves to add for you)
+* Ensure that list bullet points are turned into `*`s
+Unfortunately, Google strips out links when you export as `text/plain`, so if you want to preserve them, you have to export the document in a different format, `text/html`.
+```
+html_url = result.data['exportLinks']['text/html']
+html_data = client.execute(uri: html_url).body
+```
+At the other extreme, `html_data` now contains far too *much* data - there's a whole DOM represented in that text! We want to turn that HTML body back into plain text so that ArchieML can load it, and we want to preserve any links that we find.
+This is a lightweight DOM traverser which requires using the `nokogiri` gem: `gem install nokogiri`. It moves through the HTML document and constructs a simple text representation of the document, without things like images or tables that would be ignored by AML anyway.
+```
+require 'cgi' # CGI.parse is used by the 'a' handler below
+require 'nokogiri'
+def convert(node)
+  str = ''
+  node.children.each do |child|
+    if func = @node_types[child.name || child.type]
+      str += func.call(child)
+    end
+  end
+  return str
+end
+@node_types = {
+  'text' => lambda { |node| return node.content },
+  'span' => lambda { |node| convert(node) },
+  'p'    => lambda { |node| return convert(node) + "\n" },
+  'li'   => lambda { |node| return '* ' + convert(node) + "\n" },
+  'a'    => lambda { |node|
+    return convert(node) unless node.attributes['href'] && node.attributes['href'].value
+    # Google changes all links to be served from a google domain.
+    # We need to strip off the real url, which has been moved to the
+    # "q" querystring parameter.
+    href = node.attributes['href'].value
+    if !href.index('?').nil? && parsed_url = CGI.parse(href.split('?')[1])
+      # CGI.parse returns an empty array for missing keys, so test for a value
+      href = parsed_url['q'][0] unless parsed_url['q'].empty?
+    end
+    str = "<a href=\"#{href}\">"
+    str += convert(node)
+    str += "</a>"
+    return str
+  }
+}
+%w(ul ol).each { |tag| @node_types[tag] = @node_types['span'] }
+%w(h1 h2 h3 h4 h5 h6 br hr).each { |tag| @node_types[tag] = @node_types['p'] }
+html_doc = Nokogiri::HTML(html_data)
+html_aml = convert(html_doc.children[1].children[1])
+require 'archieml'
+aml = Archieml.load(html_aml)
+```
+`aml` should now have your document with links included, and bullet points should continue to work (we transformed each `<li>` element into a separate line beginning with a `*`).
+One additional step we perform is removing smart quotes. You can run `html_aml` through this before calling `Archieml.load`:
+```
+html_aml.gsub!(/<[^<>]*>/) do |match|
+  match.gsub("‘", "'")
+       .gsub("’", "'")
+       .gsub("“", '"')
+       .gsub("”", '"')
+end
+aml = Archieml.load(html_aml)
+```
+## Changelog
+* `0.1.0` - Initial release supporting the first version of the ArchieML spec, published [2015-03-06](http://archieml.org/spec/1.0/CR-20150306.html).
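
The README's `'a'` handler recovers the real link target from the `q` query-string parameter of Google's redirect URLs. A minimal standalone sketch of that step follows. The helper name `unwrap_google_redirect` is hypothetical and not part of the gem, and the sketch swaps in `URI.decode_www_form` from the standard library where the README uses `CGI.parse`.

```ruby
require 'uri'

# Hypothetical helper (not part of archieml): returns the value of the "q"
# parameter when the link has one, otherwise the href unchanged.
def unwrap_google_redirect(href)
  query = href.split('?', 2)[1]
  return href if query.nil?

  # decode_www_form yields [key, value] pairs; assoc finds the first "q" pair.
  pair = URI.decode_www_form(query).assoc('q')
  (pair && !pair[1].empty?) ? pair[1] : href
end

unwrap_google_redirect('https://www.google.com/url?q=http%3A%2F%2Farchieml.org%2F&sa=D')
# => "http://archieml.org/"
unwrap_google_redirect('http://example.com/?x=1')
# => "http://example.com/?x=1"
```

Returning the original `href` when no `q` parameter is present keeps ordinary links intact.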

data/archieml.gemspec ADDED

@@ -0,0 +1,15 @@
+require File.join(File.dirname(__FILE__), "lib", "archieml", "version")
+Gem::Specification.new do |gem|
+  gem.name          = "archieml"
+  gem.version       = Archieml::VERSION
+  gem.authors       = ["Michael Strickland"]
+  gem.email         = ["michael.strickland@nytimes.com"]
+  gem.description   = %q{Parse Archie Markup Language documents}
+  gem.summary       = %q{Archieml is a Ruby parser for the Archie Markup Language (ArchieML)}
+  gem.homepage      = "http://archieml.org"
+  gem.license       = "Apache License 2.0"
+  gem.files         = `git ls-files`.split($\)
+  gem.test_files    = gem.files.grep(%r{^spec/})
+  gem.require_paths = ["lib"]
+end

data/examples/google_drive.rb ADDED

@@ -0,0 +1,103 @@
+#!/usr/bin/env ruby
+# Adapted from example code Copyright 2012 by Google Inc.,
+# Licensed under the Apache License, Version 2.0.
+# https://github.com/google/google-api-ruby-client-samples/blob/master/drive/drive.rb
+FILE_ID = "1JjYD90DyoaBuRYNxa4_nqrHKkgZf1HrUj30i3rTWX1s"
+require 'archieml'
+require 'google/api_client'
+require 'google/api_client/client_secrets'
+require 'google/api_client/auth/file_storage'
+require 'google/api_client/auth/installed_app'
+require 'cgi' # CGI.parse is used by the 'a' handler below
+require 'nokogiri'
+require 'pp'
+client = Google::APIClient.new(:application_name => 'Ruby Drive sample', :application_version => '1.0.0')
+CREDENTIAL_STORE_FILE = "oauth2.json"
+# FileStorage stores auth credentials in a file, so they survive multiple runs
+# of the application. This avoids prompting the user for authorization every
+# time the access token expires, by remembering the refresh token.
+# Note: FileStorage is not suitable for multi-user applications.
+file_storage = Google::APIClient::FileStorage.new(CREDENTIAL_STORE_FILE)
+if file_storage.authorization.nil?
+  client_secrets = Google::APIClient::ClientSecrets.load
+  flow = Google::APIClient::InstalledAppFlow.new(
+    :client_id => client_secrets.client_id,
+    :client_secret => client_secrets.client_secret,
+    :scope => ['https://www.googleapis.com/auth/drive']
+  )
+  client.authorization = flow.authorize(file_storage)
+else
+  client.authorization = file_storage.authorization
+end
+drive = client.discovered_api('drive', 'v2')
+result = client.execute(
+  :api_method => drive.files.get,
+  :parameters => { 'fileId' => FILE_ID })
+# Text version
+text_url = result.data['exportLinks']['text/plain']
+text_aml = client.execute(uri: text_url).body
+parsed = Archieml.load(text_aml)
+# HTML version
+html_url = result.data['exportLinks']['text/html']
+html_data = client.execute(uri: html_url).body
+def convert(node)
+  str = ''
+  node.children.each do |child|
+    if func = @node_types[child.name || child.type]
+      str += func.call(child)
+    end
+  end
+  return str
+end
+@node_types = {
+  'text' => lambda { |node| return node.content },
+  'span' => lambda { |node| convert(node) },
+  'p'    => lambda { |node| return convert(node) + "\n" },
+  'li'   => lambda { |node| return '* ' + convert(node) + "\n" },
+  'a'    => lambda { |node|
+    return convert(node) unless node.attributes['href'] && node.attributes['href'].value
+    # Google changes all links to be served from a google domain.
+    # We need to strip off the real url, which has been moved to the
+    # "q" querystring parameter.
+    href = node.attributes['href'].value
+    if !href.index('?').nil? && parsed_url = CGI.parse(href.split('?')[1])
+      # CGI.parse returns an empty array for missing keys, so test for a value
+      href = parsed_url['q'][0] unless parsed_url['q'].empty?
+    end
+    str = "<a href=\"#{href}\">"
+    str += convert(node)
+    str += "</a>"
+    return str
+  }
+}
+%w(ul ol).each { |tag| @node_types[tag] = @node_types['span'] }
+%w(h1 h2 h3 h4 h5 h6 br hr).each { |tag| @node_types[tag] = @node_types['p'] }
+html_doc = Nokogiri::HTML(html_data)
+html_aml = convert(html_doc.children[1].children[1])
+html_aml.gsub!(/<[^<>]*>/) do |match|
+  match.gsub("‘", "'")
+       .gsub("’", "'")
+       .gsub("“", '"')
+       .gsub("”", '"')
+end
+aml = Archieml.load(html_aml)
+pp aml
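
The final `gsub!` in the script only touches text between `<` and `>`, so smart quotes in the visible copy survive while those inside tags are straightened. A small worked example, assuming a hypothetical helper name `straighten_quotes_in_tags` and using `String#tr` in place of the chained `gsub` calls:

```ruby
# Hypothetical helper (not part of the script above): straightens curly
# quotes only inside <...> tag brackets.
def straighten_quotes_in_tags(text)
  text.gsub(/<[^<>]*>/) do |tag|
    # \u2018/\u2019 are curly single quotes, \u201C/\u201D curly double quotes.
    tag.tr("\u2018\u2019", "'").tr("\u201C\u201D", '"')
  end
end

sample = "<a href=\u201Chttp://archieml.org\u201D>\u201Cquoted\u201D</a>"
puts straighten_quotes_in_tags(sample)
```

The attribute value ends up in straight double quotes, while the curly quotes around `quoted` are left as typed.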

data/lib/archieml.rb ADDED

@@ -0,0 +1,14 @@
+require 'archieml/loader'
+module Archieml
+  def self.load(aml)
+    loader = Archieml::Loader.new()
+    loader.load(aml)
+  end
+  def self.load_file(filename)
+    loader = Archieml::Loader.new()
+    stream = File.open(filename)
+    loader.load(stream)
+  end
+end
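
Both entry points hand their argument to `Loader#load`, which only calls `each_line` on it, so a `String` and an open `File` behave the same way. Note that `load_file` as written never closes the handle it opens. The sketch below uses a stand-in method, `collect_lines` (hypothetical, not part of the gem), to show the duck typing and the block form of `File.open`, which closes the file when the block returns.

```ruby
require 'stringio'
require 'tempfile'

# Stand-in for Loader#load: like the loader, it relies only on #each_line.
def collect_lines(stream)
  lines = []
  stream.each_line { |line| lines << line }
  lines
end

source = "key: value\nother: thing\n"

from_string = collect_lines(source)
from_io     = collect_lines(StringIO.new(source))

# Block-form File.open closes the handle as soon as the block returns.
from_file = Tempfile.create('sample.aml') do |tmp|
  tmp.write(source)
  tmp.flush
  File.open(tmp.path) { |file| collect_lines(file) }
end

puts from_string == from_io && from_io == from_file
```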

data/lib/archieml/loader.rb ADDED

@@ -0,0 +1,189 @@
+module Archieml
+  class Loader
+    NEXT_LINE     = /.*((\r|\n)+)/
+    START_KEY     = /^\s*([A-Za-z0-9\-_\.]+)[ \t\r]*:[ \t\r]*(.*(?:\n|\r|$))/
+    COMMAND_KEY   = /^\s*:[ \t\r]*(endskip|ignore|skip|end)/i
+    ARRAY_ELEMENT = /^\s*\*[ \t\r]*(.*(?:\n|\r|$))/
+    SCOPE_PATTERN = /^\s*(\[|\{)[ \t\r]*([A-Za-z0-9\-_\.]*)[ \t\r]*(?:\]|\})[ \t\r]*.*?(\n|\r|$)/
+    def initialize
+      @data = @scope = {}
+      @buffer_scope  = @buffer_key = nil
+      @buffer_string = ''
+      @is_skipping  = false
+      @done_parsing = false
+      self.flush_scope!
+    end
+    def load(stream)
+      stream.each_line do |line|
+        return @data if @done_parsing
+        if match = line.match(COMMAND_KEY)
+          self.parse_command_key(match[1].downcase)
+        elsif !@is_skipping && (match = line.match(START_KEY)) && (!@array || @array_type != 'simple')
+          self.parse_start_key(match[1], match[2] || '')
+        elsif !@is_skipping && (match = line.match(ARRAY_ELEMENT)) && @array && @array_type != 'complex'
+          self.parse_array_element(match[1])
+        elsif !@is_skipping && match = line.match(SCOPE_PATTERN)
+          self.parse_scope(match[1], match[2])
+        else
+          @buffer_string += line
+        end
+      end
+      self.flush_buffer!
+      return @data
+    end
+    def parse_start_key(key, rest_of_line)
+      self.flush_buffer!
+      if @array
+        @array_type ||= 'complex'
+        # Ignore complex keys inside simple arrays
+        return if @array_type == 'simple'
+        if [nil, key].include?(@array_first_key)
+          @array << (@scope = {})
+        end
+        @array_first_key ||= key
+      end
+      @buffer_key = key
+      @buffer_string = rest_of_line
+      self.flush_buffer_into(key, replace: true)
+    end
+    def parse_array_element(value)
+      self.flush_buffer!
+      @array_type ||= 'simple'
+      # Ignore simple array elements inside complex arrays
+      return if @array_type == 'complex'
+      @array << ''
+      @buffer_key = @array
+      @buffer_string = value
+      self.flush_buffer_into(@array, replace: true)
+    end
+    def parse_command_key(command)
+      if @is_skipping && !%w(endskip ignore).include?(command)
+        return self.flush_buffer!
+      end
+      case command
+      when "end"
+        self.flush_buffer_into(@buffer_key, replace: false) if @buffer_key
+        return
+      when "ignore"
+        return @done_parsing = true
+      when "skip"
+        @is_skipping = true
+      when "endskip"
+        @is_skipping = false
+      end
+      self.flush_buffer!
+    end
+    def parse_scope(scope_type, scope_key)
+      self.flush_buffer!
+      self.flush_scope!
+      if scope_key == ''
+        @scope = @data
+      elsif %w([ {).include?(scope_type)
+        key_scope = @data
+        key_bits  = scope_key.split('.')
+        key_bits[0...-1].each do |bit|
+          key_scope = key_scope[bit] ||= {}
+        end
+        if scope_type == '['
+          @array = key_scope[key_bits.last] ||= []
+          if @array.length > 0
+            @array_type = @array.first.class == String ? 'simple' : 'complex'
+          end
+        elsif scope_type == '{'
+          @scope = key_scope[key_bits.last] ||= {}
+        end
+      end
+    end
+    def flush_buffer!
+      result = @buffer_string.dup
+      @buffer_string = ''
+      return result
+    end
+    def flush_buffer_into(key, options = {})
+      value = self.flush_buffer!
+      if options[:replace]
+        value = self.format_value(value, :replace).sub(/^\s*/, '')
+        @buffer_string = value.match(/\s*\Z/)[0]
+      else
+        value = self.format_value(value, :append)
+      end
+      if key.class == Array
+        key[key.length - 1] = '' if options[:replace]
+        key[key.length - 1] += value.sub(/\s*\Z/, '')
+      else
+        key_bits = key.split('.')
+        @buffer_scope = @scope
+        key_bits[0...-1].each do |bit|
+          @buffer_scope[bit] = {} if @buffer_scope[bit].class == String # reset
+          @buffer_scope = @buffer_scope[bit] ||= {}
+        end
+        @buffer_scope[key_bits.last] = '' if options[:replace]
+        @buffer_scope[key_bits.last] += value.sub(/\s*\Z/, '')
+      end
+    end
+    def flush_scope!
+      @array = @array_type = @array_first_key = nil
+    end
+    # type can be either :replace or :append.
+    # If it's :replace, the string is assumed to be the first line of a
+    # value, and no unescaping takes place.
+    # If it's :append (a multi-line value), a single leading backslash is
+    # stripped from each line; authors use it to escape special punctuation
+    # (:, [, {, *, \) at the start of a line.
+    def format_value(value, type)
+      value.gsub!(/(?:^\\)?\[[^\[\]\n\r]*\](?!\])/, '') # remove comments
+      value.gsub!(/\[\[([^\[\]\n\r]*)\]\]/, '[\1]') # [[]] => []
+      if type == :append
+        value.gsub!(/^(\s*)\\/, '\1')
+      end
+      value
+    end
+  end
+end
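
To make the key handling concrete, here is a small sketch that reuses the `START_KEY` pattern from the loader and a simplified version of the dot-notation walk in `flush_buffer_into`. The helper `assign_dotted` is hypothetical, and the sketch ignores buffering, scopes, arrays and commands, so it illustrates the idea rather than reproducing the loader.

```ruby
# Copied from the loader above.
START_KEY = /^\s*([A-Za-z0-9\-_\.]+)[ \t\r]*:[ \t\r]*(.*(?:\n|\r|$))/

# Hypothetical helper: store value under a dotted key, creating nested
# Hashes on the way down and replacing any non-Hash value that blocks it.
def assign_dotted(data, dotted_key, value)
  bits  = dotted_key.split('.')
  scope = data
  bits[0...-1].each do |bit|
    scope[bit] = {} unless scope[bit].is_a?(Hash)
    scope = scope[bit]
  end
  scope[bits.last] = value
  data
end

data = {}
"scope.key: value\nscope.other: thing\nplain text\n".each_line do |line|
  match = line.match(START_KEY)
  next unless match # lines without a key are skipped in this sketch
  assign_dotted(data, match[1], match[2].strip)
end

p data
# prints {"scope"=>{"key"=>"value", "other"=>"thing"}} (exact Hash#inspect formatting varies by Ruby version)
```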

data/lib/archieml/version.rb ADDED

@@ -0,0 +1,3 @@
+module Archieml
+  VERSION = '0.1.0'
+end

data/spec/lib/archieml/loader_spec.rb ADDED

@@ -0,0 +1,522 @@
+require 'spec_helper'
+describe Archieml::Loader do
+  before(:each) do
+    @loader = Archieml::Loader.new()
+    allow(@loader).to receive(:parse_scope).with(any_args).and_call_original
+    allow(@loader).to receive(:parse_start_key).with(any_args).and_call_original
+    allow(@loader).to receive(:parse_command_key).with(any_args).and_call_original
+  end
+  describe "parsing values" do
+    it "parses key value pairs" do
+      @loader.load("key:value")['key'].should == 'value'
+    end
+    it "ignores spaces on either side of the key" do
+      @loader.load("  key  :value")['key'].should == 'value'
+    end
+    it "ignores tabs on either side of the key" do
+      @loader.load("\t\tkey\t\t:value")['key'].should == 'value'
+    end
+    it "ignores spaces on either side of the value" do
+      @loader.load("key:  value  ")['key'].should == 'value'
+    end
+    it "ignores tabs on either side of the value" do
+      @loader.load("key:\t\tvalue\t\t")['key'].should == 'value'
+    end
+    it "duplicate keys are assigned the last given value" do
+      @loader.load("key:value\nkey:newvalue")['key'].should == 'newvalue'
+    end
+    it "allows non-letter characters at the start of values" do
+      @loader.load("key::value")['key'].should == ':value'
+    end
+    it "keys are case sensitive" do
+      @loader.load("key:value\nKey:Value").keys.should == ['key', 'Key']
+    end
+    it "non-keys don't affect parsing" do
+      @loader.load("other stuff\nkey:value\nother stuff")['key'].should == 'value'
+    end
+  end
+  describe "valid keys" do
+    it "letters, numbers, dashes and underscores are valid key components" do
+      @loader.load("a-_1:value")['a-_1'].should == 'value'
+    end
+    it "spaces are not allowed in keys" do
+      @loader.load("k ey:value").keys.length.should == 0
+    end
+    it "symbols are not allowed in keys" do
+      @loader.load("k&ey:value").keys.length.should == 0
+    end
+    it "keys can be nested using dot-notation" do
+      @loader.load("scope.key:value")['scope']['key'].should == 'value'
+    end
+    it "earlier keys within scopes aren't deleted when using dot-notation" do
+      @loader.load("scope.key:value\nscope.otherkey:value")['scope']['key'].should == 'value'
+      @loader.load("scope.key:value\nscope.otherkey:value")['scope']['otherkey'].should == 'value'
+    end
+    it "the value of a key that used to be a string should be replaced with an object if necessary" do
+      @loader.load("scope.level:value\nscope.level.level:value")['scope']['level']['level'].should == 'value'
+    end
+    it "the value of a key that used to be a parent object should be replaced with a string if necessary" do
+      @loader.load("scope.level.level:value\nscope.level:value")['scope']['level'].should == 'value'
+    end
+  end
+  describe "valid values" do
+    it "HTML is allowed" do
+      @loader.load("key:<strong>value</strong>")['key'].should == '<strong>value</strong>'
+    end
+  end
+  describe "skip" do
+    it "ignores spaces on either side of :skip" do
+      expect(@loader).to receive(:parse_command_key).with('skip').once
+      @loader.load("  :skip  \nkey:value\n:endskip").keys.length.should == 0
+    end
+    it "ignores tabs on either side of :skip" do
+      expect(@loader).to receive(:parse_command_key).with('skip').once
+      @loader.load("\t\t:skip\t\t\nkey:value\n:endskip").keys.length.should == 0
+    end
+    it "ignores spaces on either side of :endskip" do
+      expect(@loader).to receive(:parse_command_key).with('endskip').once
+      @loader.load(":skip\nkey:value\n  :endskip  ").keys.length.should == 0
+    end
+    it "ignores tabs on either side of :endskip" do
+      expect(@loader).to receive(:parse_command_key).with('endskip').once
+      @loader.load(":skip\nkey:value\n\t\t:endskip\t\t").keys.length.should == 0
+    end
+    it "starts parsing again after :endskip" do
+      expect(@loader).to receive(:parse_start_key).with('key', 'value').once
+      @loader.load(":skip\n:endskip\nkey:value").keys.length.should == 1
+    end
+    it ":skip and :endskip are case insensitive" do
+      expect(@loader).to receive(:parse_command_key).with('skip').once
+      expect(@loader).to receive(:parse_command_key).with('endskip').once
+      @loader.load(":sKiP\nkey:value\n:eNdSkIp").keys.length.should == 0
+    end
+    it "parse :skip as a special command even if more is appended to word" do
+      expect(@loader).to receive(:parse_command_key).with('skip')
+      @loader.load(":skipthis\nkey:value\n:endskip").keys.length.should == 0
+    end
+    it "ignores all content on line after :skip + space" do
+      expect(@loader).to receive(:parse_command_key).with('skip').once
+      expect(@loader).to_not receive(:parse_start_key).with('key', 'value')
+      @loader.load(":skip this text  \nkey:value\n:endskip").keys.length.should == 0
+    end
+    it "ignores all content on line after :skip + tab" do
+      expect(@loader).to receive(:parse_command_key).with('skip').once
+      expect(@loader).to_not receive(:parse_start_key).with('key', 'value')
+      @loader.load(":skip\tthis text\t\t\nkey:value\n:endskip").keys.length.should == 0
+    end
+    it "parse :endskip as a special command even if more is appended to word" do
+      expect(@loader).to receive(:parse_command_key).with('endskip')
+      @loader.load(":skip\n:endskiptheabove\nkey:value").keys.length.should == 1
+    end
+    it "ignores all content on line after :endskip + space" do
+      expect(@loader).to receive(:parse_command_key).with('endskip').once
+      expect(@loader).to receive(:parse_start_key).with('key', 'value').once
+      @loader.load(":skip\n:endskip the above\nkey:value").keys.length.should == 1
+    end
+    it "ignores all content on line after :endskip + tab" do
+      expect(@loader).to receive(:parse_command_key).with('endskip').once
+      expect(@loader).to receive(:parse_start_key).with('key', 'value').once
+      @loader.load(":skip\n:endskip\tthe above\nkey:value").keys.length.should == 1
+    end
+    it "does not parse :end as an :endskip" do
+      expect(@loader).to_not receive(:parse_command_key).with('endskip')
+      @loader.load(":skip\n:end\tthe above\nkey:value").keys.length.should == 0
+    end
+    it "ignores keys within a skip block" do
+      expect(@loader).to_not receive(:parse_start_key).with('other', 'value')
+      @loader.load("key1:value1\n:skip\nother:value\n\n:endskip\n\nkey2:value2").keys.should == ['key1', 'key2']
+    end
+  end
+  describe "ignore" do
+    it "text before ':ignore' should be included" do
+      @loader.load("key:value\n:ignore")['key'].should == 'value'
+    end
+    it "text after ':ignore' should be ignored" do
+      expect(@loader).to_not receive(:parse_start_key)
+      @loader.load(":ignore\nkey:value").keys.length.should == 0
+    end
+    it "':ignore' is case insensitive" do
+      expect(@loader).to receive(:parse_command_key).with('ignore').once
+      @loader.load(":iGnOrE\nkey:value").keys.length.should == 0
+    end
+    it "ignores spaces on either side of :ignore" do
+      expect(@loader).to receive(:parse_command_key).with('ignore').once
+      @loader.load(":iGnOrE\nkey:value").keys.length.should == 0
+      @loader.load("  :ignore  \nkey:value")
+    end
+    it "ignores tabs on either side of :ignore" do
+      expect(@loader).to receive(:parse_command_key).with('ignore').once
+      @loader.load(":iGnOrE\nkey:value").keys.length.should == 0
+      @loader.load("\t\t:ignore\t\t\nkey:value")
+    end
+    it "parses :ignore as a special command even if more is appended to word" do
+      expect(@loader).to receive(:parse_command_key).with('ignore')
+      @loader.load(":ignorethis\nkey:value").keys.length.should == 0
+    end
+    it "ignores all content on line after :ignore + space" do
+      expect(@loader).to receive(:parse_command_key).with('ignore').once
+      @loader.load(":iGnOrE\nkey:value").keys.length.should == 0
+      @loader.load(":ignore the below\nkey:value")
+    end
+    it "ignores all content on line after :ignore + tab" do
+      expect(@loader).to receive(:parse_command_key).with('ignore').once
+      @loader.load(":iGnOrE\nkey:value").keys.length.should == 0
+      @loader.load(":ignore\tthe below\nkey:value")
+    end
+  end
+  describe "multi line values" do
+    it "adds additional lines to value if followed by an ':end'" do
+      @loader.load("key:value\nextra\n:end")['key'].should == "value\nextra"
+    end
+    it "':end' is case insensitive" do
+      expect(@loader).to receive(:parse_command_key).with('end').once
+      @loader.load("key:value\nextra\n:EnD")
+    end
+    it "preserves blank lines and whitespace lines in the middle of content" do
+      @loader.load("key:value\n\n\t \nextra\n:end")['key'].should == "value\n\n\t \nextra"
+    end
+    it "doesn't preserve whitespace at the end of the key" do
+      @loader.load("key:value\nextra\t \n:end")['key'].should == "value\nextra"
+    end
+    it "preserves whitespace at the end of the original line" do
+      @loader.load("key:value\t \nextra\n:end")['key'].should == "value\t \nextra"
+    end
+    it "ignores whitespace and newlines before the ':end'" do
+      @loader.load("key:value\nextra\n \n\t\n:end")['key'].should == "value\nextra"
+    end
+    it "ignores spaces on either side of :end" do
+      expect(@loader).to receive(:parse_command_key).with('end').once
+      @loader.load("key:value\nextra\n  :end  ")
+    end
+    it "ignores tabs on either side of :end" do
+      expect(@loader).to receive(:parse_command_key).with('end').once
+      @loader.load("key:value\nextra\n\t\t:end\t\t")
+    end
+    it "parses :end as a special command even if more is appended to word" do
+      expect(@loader).to receive(:parse_command_key).with('end')
+      @loader.load("key:value\nextra\n:endthis")['key'].should == "value\nextra"
+    end
+    it "does not parse :endskip as an :end" do
+      expect(@loader).to_not receive(:parse_command_key).with('end')
+      @loader.load("key:value\nextra\n:endskip")['key'].should == "value"
+    end
+    it "ordinary text that starts with a colon is included" do
+      @loader.load("key:value\n:notacommand\n:end")['key'].should == "value\n:notacommand"
+    end
+    it "ignores all content on line after :end + space" do
+      expect(@loader).to receive(:parse_command_key).with('end').once
+      @loader.load("key:value\nextra\n:end this")['key'].should == "value\nextra"
+    end
+    it "ignores all content on line after :end + tab" do
+      expect(@loader).to receive(:parse_command_key).with('end').once
+      @loader.load("key:value\nextra\n:end\tthis")['key'].should == "value\nextra"
+    end
+    it "doesn't escape colons on first line" do
+      @loader.load("key::value\n:end")['key'].should == ":value"
+      @loader.load("key:\\:value\n:end")['key'].should == "\\:value"
+    end
+    it "does not allow escaping keys" do
+      @loader.load("key:value\nkey2\\:value\n:end")['key'].should == "value\nkey2\\:value"
+    end
+    it "allows escaping key lines with a leading backslash" do
+      @loader.load("key:value\n\\key2:value\n:end")['key'].should == "value\nkey2:value"
+    end
+    it "allows escaping commands at the beginning of lines" do
+      @loader.load("key:value\n\\:end\n:end")['key'].should == "value\n:end"
+    end
+    it "allows escaping commands with extra text at the beginning of lines" do
+      @loader.load("key:value\n\\:endthis\n:end")['key'].should == "value\n:endthis"
+    end
+    it "allows escaping of non-commands at the beginning of lines" do
+      @loader.load("key:value\n\\:notacommand\n:end")['key'].should == "value\n:notacommand"
+    end
+    it "allows simple array style lines" do
+      @loader.load("key:value\n* value\n:end")['key'].should == "value\n* value"
+    end
+    it "escapes '*' within multi-line values when not in a simple array" do
+      @loader.load("key:value\n\\* value\n:end")['key'].should == "value\n* value"
+    end
+    it "allows escaping scope keys at the beginning of lines" do
+      @loader.load("key:value\n\\{scope}\n:end")['key'].should == "value\n{scope}"
+      @loader.load("key:value\n\\[comment]\n:end")['key'].should == "value"
+      @loader.load("key:value\n\\[[array]]\n:end")['key'].should == "value\n[array]"
+    end
+    it "allows escaping initial backslash at the beginning of lines" do
+      @loader.load("key:value\n\\\\:end\n:end")['key'].should == "value\n\\:end"
+    end
+    it "escapes only one initial backslash" do
+      @loader.load("key:value\n\\\\\\:end\n:end")['key'].should == "value\n\\\\:end"
+    end
+    it "doesn't escape colons after beginning of lines" do
+      @loader.load("key:value\nLorem key2\\:value\n:end")['key'].should == "value\nLorem key2\\:value"
+    end
+  end
+  describe "scopes" do
+    it "{scope} creates an empty object at 'scope'" do
+      @loader.load("{scope}")['scope'].class.should == Hash
+    end
+    it "ignores spaces on either side of {scope}" do
+      expect(@loader).to receive(:parse_scope).with('{', 'scope').once
+      @loader.load("  {scope}  ")
+    end
+    it "ignores tabs on either side of {scope}" do
+      expect(@loader).to receive(:parse_scope).with('{', 'scope').once
+      @loader.load("\t\t{scope}\t\t")['scope'].should == {}
+    end
+    it "ignores text after {scope}" do
+      expect(@loader).to receive(:parse_scope).with('{', 'scope').once
+      @loader.load("{scope}a")['scope'].should == {}
+    end
+    it "ignores spaces on either side of {scope} variable name" do
+      expect(@loader).to receive(:parse_scope).with('{', 'scope').once
+      @loader.load("{  scope  }")['scope'].should == {}
+    end
+    it "ignores tabs on either side of {scope} variable name" do
+      expect(@loader).to receive(:parse_scope).with('{', 'scope').once
+      @loader.load("{\t\tscope\t\t}")['scope'].should == {}
+    end
+    it "items before a {scope} are not namespaced" do
+      @loader.load("key:value\n{scope}")['key'].should == 'value'
+    end
+    it "items after a {scope} are namespaced" do
+      @loader.load("{scope}\nkey:value")['key'].should == nil
+      @loader.load("{scope}\nkey:value")['scope']['key'].should == 'value'
+    end
+    it "scopes can be nested using dot-notation" do
+      @loader.load("{scope.scope}\nkey:value")['scope']['scope']['key'].should == 'value'
+    end
+    it "scopes can be reopened" do
+      @loader.load("{scope}\nkey:value\n{}\n{scope}\nother:value")['scope'].keys.should =~ ["key", "other"]
+    end
+    it "scopes do not overwrite existing values" do
+      @loader.load("{scope.scope}\nkey:value\n{scope.otherscope}key:value")['scope']['scope']['key'].should == 'value'
+    end
+    it "{} resets to the global scope" do
+      expect(@loader).to receive(:parse_scope).with('{', '').once
+      @loader.load("{scope}\n{}\nkey:value")['key'].should == 'value'
+    end
+    it "ignore spaces inside {}" do
+      expect(@loader).to receive(:parse_scope).with('{', '').once
+      @loader.load("{scope}\n{  }\nkey:value")['key'].should == 'value'
+    end
+    it "ignore tabs inside {}" do
+      expect(@loader).to receive(:parse_scope).with('{', '').once
+      @loader.load("{scope}\n{\t\t}\nkey:value")['key'].should == 'value'
+    end
+    it "ignore spaces on either side of {}" do
+      expect(@loader).to receive(:parse_scope).with('{', '').once
+      @loader.load("{scope}\n  {}  \nkey:value")['key'].should == 'value'
+    end
+    it "ignore tabs on either side of {}" do
+      expect(@loader).to receive(:parse_scope).with('{', '').once
+      @loader.load("{scope}\n\t\t{}\t\t\nkey:value")['key'].should == 'value'
+    end
+  end
+  describe "arrays" do
+    it "[array] creates an empty array at 'array'" do
+      @loader.load("[array]")['array'].should == []
+    end
+    it "ignores spaces on either side of [array]" do
+      expect(@loader).to receive(:parse_scope).with('[', 'array').once
+      @loader.load("  [array]  ")
+    end
+    it "ignores tabs on either side of [array]" do
+      expect(@loader).to receive(:parse_scope).with('[', 'array').once
+      @loader.load("\t\t[array]\t\t")
+    end
+    it "ignores text after [array]" do
+      expect(@loader).to receive(:parse_scope).with('[', 'array').once
+      @loader.load("[array]a")['array'].should == []
+    end
+    it "ignores spaces on either side of [array] variable name" do
+      expect(@loader).to receive(:parse_scope).with('[', 'array').once
+      @loader.load("[  array  ]")
+    end
+    it "ignores tabs on either side of [array] variable name" do
+      expect(@loader).to receive(:parse_scope).with('[', 'array').once
+      @loader.load("[\t\tarray\t\t]")
+    end
+    it "arrays can be nested using dot-notation" do
+      @loader.load("[scope.array]")['scope']['array'].should == []
+    end
+    it "array values can be nested using dot-notation" do
+      @loader.load("[array]\nscope.key: value\nscope.key: value")['array'].should == [{'scope' => {'key' => 'value'}}, {'scope' => {'key' => 'value'}}]
+    end
+    it "[] resets to the global scope" do
+      @loader.load("[array]\n[]\nkey:value")['key'].should == 'value'
+    end
+    it "ignore spaces inside []" do
+      expect(@loader).to receive(:parse_scope).with('[', '').once
+      @loader.load("[array]\n[  ]\nkey:value")['key'].should == 'value'
+    end
+    it "ignore tabs inside []" do
+      expect(@loader).to receive(:parse_scope).with('[', '').once
+      @loader.load("[array]\n[\t\t]\nkey:value")['key'].should == 'value'
+    end
+    it "ignore spaces on either side of []" do
+      expect(@loader).to receive(:parse_scope).with('[', '').once
+      @loader.load("[array]\n  []  \nkey:value")['key'].should == 'value'
+    end
+    it "ignore tabs on either side of []" do
+      expect(@loader).to receive(:parse_scope).with('[', '').once
+      @loader.load("[array]\n\t\t[]\t\t\nkey:value")['key'].should == 'value'
+    end
+  end
+  describe "simple arrays" do
+    it "creates a simple array when an '*' is encountered first" do
+      @loader.load("[array]\n*Value")['array'].first.should == 'Value'
+    end
+    it "ignores spaces on either side of '*'" do
+      @loader.load("[array]\n  *  Value")['array'].first.should == 'Value'
+    end
+    it "ignores tabs on either side of '*'" do
+      @loader.load("[array]\n\t\t*\t\tValue")['array'].first.should == 'Value'
+    end
+    it "adds multiple elements" do
+      @loader.load("[array]\n*Value1\n*Value2")['array'].should == ['Value1', 'Value2']
+    end
+    it "ignores all other text between elements" do
+      @loader.load("[array]\n*Value1\nNon-element\n*Value2")['array'].should == ['Value1', 'Value2']
+    end
+    it "ignores key:value pairs between elements" do
+      @loader.load("[array]\n*Value1\nkey:value\n*Value2")['array'].should == ['Value1', 'Value2']
+    end
+    it "parses key:values normally after an end-array" do
+      @loader.load("[array]\n*Value1\n[]\nkey:value")['key'].should == 'value'
+    end
+    it "multi-line values are allowed" do
+      @loader.load("[array]\n*Value1\nextra\n:end")['array'].first.should == "Value1\nextra"
+    end
+    it "allows escaping of '*' within multi-line values in simple arrays" do
+      @loader.load("[array]\n*Value\n\\* extra\n:end")['array'].first.should == "Value\n* extra"
+    end
+    it "allows escaping of command keys within multi-line values" do
+      @loader.load("[array]\n*Value\n\\:end\n:end")['array'].first.should == "Value\n:end"
+    end
+    it "does not allow escaping of keys within multi-line values" do
+      @loader.load("[array]\n*Value\nkey\\:value\n:end")['array'].first.should == "Value\nkey\\:value"
+    end
+    it "allows escaping key lines with a leading backslash" do
+      @loader.load("[array]\n*Value\n\\key:value\n:end")['array'].first.should == "Value\nkey:value"
+    end
+    it "does not allow escaping of colons not at the beginning of lines" do
+      @loader.load("[array]\n*Value\nword key\\:value\n:end")['array'].first.should == "Value\nword key\\:value"
+    end
+    it "arrays that are reopened add to existing array" do
+      @loader.load("[array]\n*Value\n[]\n[array]\n*Value")['array'].should == ['Value', 'Value']
+    end
+    it "simple arrays that are reopened remain simple" do
+      @loader.load("[array]\n*Value\n[]\n[array]\nkey:value")['array'].should == ['Value']
+    end
+  end
+  describe "complex arrays" do
+    it "keys after an [array] are included as items in the array" do
+      @loader.load("[array]\nkey:value")['array'].first.should == {'key' => 'value' }
+    end
+    it "array items can have multiple keys" do
+      @loader.load("[array]\nkey:value\nsecond:value")['array'].first.keys.should =~ ['key', 'second']
+    end
+    it "when a duplicate key is encountered, a new item in the array is started" do
+      @loader.load("[array]\nkey:value\nsecond:value\nkey:value")['array'].length.should == 2
+      @loader.load("[array]\nkey:first\nkey:second")['array'].last.should == {'key' => 'second'}
+      @loader.load("[array]\nscope.key:first\nscope.key:second")['array'].last.should == {'scope' => {'key' => 'second'}}
+    end
+    it "duplicate keys must match on dot-notation scope" do
+      @loader.load("[array]\nkey:value\nscope.key:value")['array'].length.should == 1
+    end
+    it "duplicate keys must match on dot-notation scope" do
+      @loader.load("[array]\nscope.key:value\nkey:value\notherscope.key:value")['array'].length.should == 1
+    end
+    it "arrays that are reopened add to existing array" do
+      @loader.load("[array]\nkey:value\n[]\n[array]\nkey:value")['array'].length.should == 2
+    end
+    it "complex arrays that are reopened remain complex" do
+      @loader.load("[array]\nkey:value\n[]\n[array]\n*Value")['array'].should == [{'key' => 'value'}]
+    end
+  end
+  describe "inline comments" do
+    it "ignore comments inside of [single brackets]" do
+      @loader.load("key:value [inline comments] value")['key'].should == "value  value"
+    end
+    it "supports multiple inline comments on a single line" do
+      @loader.load("key:value [inline comments] value [inline comments] value")['key'].should == "value  value  value"
+    end
+    it "supports adjacent comments" do
+      @loader.load("key:value [inline comments] [inline comments] value")['key'].should == "value   value"
+    end
+    it "supports no-space adjacent comments" do
+      @loader.load("key:value [inline comments][inline comments] value")['key'].should == "value  value"
+    end
+    it "supports comments at beginning of string" do
+      @loader.load("key:[inline comments] value")['key'].should == "value"
+    end
+    it "supports comments at end of string" do
+      @loader.load("key:value [inline comments]")['key'].should == "value"
+    end
+    it "whitespace before a comment that appears at end of line is ignored" do
+      @loader.load("key:value [inline comments] value [inline comments]")['key'].should == "value  value"
+    end
+    it "unmatched single brackets are preserved" do
+      @loader.load("key:value ][ value")['key'].should == "value ][ value"
+    end
+    it "inline comments are supported on the first of multi-line values" do
+      @loader.load("key:value [inline comments] on\nmultiline\n:end")['key'].should == "value  on\nmultiline"
+    end
+    it "inline comments are supported on subsequent lines of multi-line values" do
+      @loader.load("key:value\nmultiline [inline comments]\n:end")['key'].should == "value\nmultiline"
+    end
+    it "whitespace around comments is preserved, except at the beginning and end of a value" do
+      @loader.load("key: [] value [] \n multiline [] \n:end")['key'].should == "value  \n multiline"
+    end
+    it "inline comments cannot span multiple lines" do
+      @loader.load("key:value [inline\ncomments] value\n:end")['key'].should == "value [inline\ncomments] value"
+      @loader.load("key:value \n[inline\ncomments] value\n:end")['key'].should == "value \n[inline\ncomments] value"
+    end
+    it "text inside [[double brackets]] is included as [single brackets]" do
+      @loader.load("key:value [[brackets]] value")['key'].should == "value [brackets] value"
+    end
+    it "unmatched double brackets are preserved" do
+      @loader.load("key:value ]][[ value")['key'].should == "value ]][[ value"
+    end
+    it "comments work in simple arrays" do
+      @loader.load("[array]\n*Val[comment]ue")['array'].first.should == "Value"
+    end
+    it "double brackets work in simple arrays" do
+      @loader.load("[array]\n*Val[[real]]ue")['array'].first.should == "Val[real]ue"
+    end
+  end
+end

data/spec/spec_helper.rb ADDED

@@ -0,0 +1,2 @@
+require 'archieml'
+require 'rspec'

metadata ADDED

@@ -0,0 +1,57 @@
+--- !ruby/object:Gem::Specification
+name: archieml
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Michael Strickland
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2015-03-05 00:00:00.000000000 Z
+dependencies: []
+description: Parse Archie Markup Language documents
+email:
+- michael.strickland@nytimes.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- .gitignore
+- Gemfile
+- LICENSE
+- README.md
+- archieml.gemspec
+- examples/google_drive.rb
+- lib/archieml.rb
+- lib/archieml/loader.rb
+- lib/archieml/version.rb
+- spec/lib/archieml/loader_spec.rb
+- spec/spec_helper.rb
+homepage: http://archieml.org
+licenses:
+- Apache License 2.0
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.4.5
+signing_key:
+specification_version: 4
+summary: Archieml is a Ruby parser for the Archie Markup Language (ArchieML)
+test_files:
+- spec/lib/archieml/loader_spec.rb
+- spec/spec_helper.rb