RubyGems - pacer-xml - Versions diffs - 0.2.1-java → 0.2.2-java - Mend

pacer-xml 0.2.1-java → 0.2.2-java

Files changed (13)

data/.gitignore +5 -0
data/Gemfile +6 -0
data/Rakefile +2 -0
data/Readme.markdown +172 -0
data/lib/pacer-xml/build_graph.rb +216 -0
data/lib/pacer-xml/nokogiri_node.rb +148 -0
data/lib/pacer-xml/sample.rb +107 -0
data/lib/pacer-xml/string_route.rb +50 -0
data/lib/pacer-xml/version.rb +7 -0
data/lib/pacer-xml/xml_route.rb +129 -0
data/lib/pacer-xml.rb +48 -0
data/pacer-xml.gemspec +24 -0
metadata +15 -3

data/.gitignore ADDED

@@ -0,0 +1,5 @@
+*.graph
+*.lock
+*.xml
+pkg
+*.graphml

data/Gemfile ADDED

@@ -0,0 +1,6 @@
+source "http://rubygems.org"
+# Specify your gem's dependencies in pacer-xml.gemspec
+gemspec
+gem 'pacer', path: '~/xn/pacer'

data/Rakefile ADDED

@@ -0,0 +1,2 @@
+require 'bundler'
+Bundler::GemHelper.install_tasks

data/Readme.markdown ADDED

@@ -0,0 +1,172 @@
+pacer-xml
+=========
+This Pacer plugin is designed to make it dead-simple to import any
+arbitrary XML file (no matter how bizarre) into any graph database
+supported by Pacer.
+This library evolved out of my need to be able to easily pull in sample
+data when demoing Pacer. GraphML is pretty rare and what I've been able
+to find is mostly pretty lame anyway, but raw XML seems to be everywhere
+(just check out [DATA.GOV](http://www.data.gov/)).
+Usage
+-----
+I suggest looking at the implementation of the below sample to see how
+I've used pacer-xml there.
+There are 2 key methods:
+`Pacer.xml(file, start_section = nil, end_section = nil)`
+```
+file:          String | IO
+  String           path to an xml file to read
+  IO               an open resource that responds to #each_line
+start_section: String | Symbol | Regex | Proc  (optional)
+  String | Symbol  name of xml tag to use as the root node of each
+                   section of xml. The end_section will automatically be
+                   set to the closing tag.  This uses very simple regex
+                   matching.
+  Regex            If it matches, start the section from this line
+  Proc             proc { |line| }
+                   If it results in a truthy value, starts collecting
+                   lines for the next section of xml.
+end_section:   Regex | Proc  (optional)
+  Regex            If it matches, end the section including this line
+  Proc             proc { |line, lines| }
+                   - If it results in a truthy value to indicate that the
+                     current line is the last line in a section.
+                   - if it results in an Array, pass the result of
+                     joining the array to Nokogiri for the next section.
+```
+If the parser is building a section when it gets to the end of the file,
+it will call `end_section.call(nil, lines)`. To prevent the final
+section from being processed, return `[]`.
+Returns a Pacer Route to a series of Nokogiri::XML::Elements. Each
+element is the root element of its document. By default, chunks are
+delimited by the presence of `<?xml`.
+`xml_route.import(graph, opts = {})`
+```
+graph: PacerGraph   The graph to load the data into.
+opts:  Hash
+  :cache  false | Hash
+    false              disable caching
+    stats: true        enable occasional dump of cache info
+  :rename Hash         map of { 'old-name' => 'new-name' }
+  :html   Array        set of tag names to treat as containing HTML
+  :skip   Array        set of tag or attribute names to skip
+```
+Baked-in Sample
+---------------
+This library started out with me tackling a chunk of [Patent Grants](https://explore.data.gov/Business-Enterprise/Patent-Grant-Bibliographic-Text-1976-Present-/8du5-jxih)
+data, and my first attempt at importing it was with a hand-crafted set
+of rules that walked the XML, creating graph elements along the way.
+That was fairly painful and turned out to be very slow as well. My
+second attempt evolved into this tool. The cool thing is that by the
+end, everything specific to the patent grants data set was just a few
+lines of configuration on top of a very powerful streaming XML parsing
+tool.
+I encourage you to check out the sample data. Simply install this gem,
+start up IRB, then:
+```ruby
+require 'pacer-xml'
+graph = PacerXml::Sample.load_100
+```
+That will download and extract a 100M xml file full of 2 weeks of patent
+grants data, then create a graph with the first 100 patents, including
+every piece of data in the file.
+I encourage you to take a look at [how it was done](https://github.com/xnlogic/pacer-xml/blob/master/lib/pacer-xml/sample.rb).
+Once you've created a graph from the data, it may be useful for you to
+check out how it's structured. Pacer's got a handy tool built in to do
+that, `Pacer::Utils::GraphAnalysis.structure graph`, but let's go one
+step further and visually analyze the graph. If we run the command
+below, we'll see the same results as the GraphAnalysis, but it will
+export a graphml file that we can load into yEd, an excellent free graph
+visualization tool:
+```ruby
+PacerXml::Sample.structure! graph
+# ... lots of output ...
+#=> #<PacerGraph tinkergraph[vertices:90 edges:112]>
+```
+The new file in your working directory is called
+`patent-structure.graphml`. Open that file in yEd. You'll see a single
+box... Fortunately, laying it out is fairly simple:
+1. Tools / Fit Node To Label
+1. OK
+1. Layout / Hierarchical...
+1. Labelling Tab / set Edge Labelling to Hierarchic
+1. OK
+Cool!
+Contextual Help
+---------------
+Back to Pacer itself: there's lots to learn. The best way to do
+that is to use Pacer's own inline help:
+* Use `Pacer.help` for general help
+* Get into a general section with `Pacer.help :section`
+* Get contextual help with `graph.v.map.help`
+* Get more contextual help with `graph.v.map.help :section`
+Contextual help was only added recently so it's not complete yet but
+it's developing quickly and contributions are very welcome!
+More
+-----
+To play with the xml tools themselves, try out the following commands:
+```ruby
+xml_route = PacerXml::Sample.xml(nil, start_rule, end_rule)
+importer = PacerXml::Sample.importer
+```
+Performance Notes
+-----------------
+This section uses the `PacerXml::Sample.load_all` method. The `load_100`
+method runs in just a couple of seconds.
+The default sample file contains 3019840 lines representing 4479
+documents. Running under the simple `bundle exec irb` command on a MBP
+2.3 GHz i7, here are some quick timings (in seconds) for operations on
+the entire file:
+```
+=> 8.36    iterate through 3019840 lines
+=> 28.534  reduce the lines to 4479 arrays of lines
+=> 29.753  join each array of lines into a string
+=> 34.788  parse each string into a Nokogiri XML document
+=> 812.732 create a graph, producing 494659 vertices and 629690 edges
+```
+Starting up with `bundle exec jruby --server -J-Xmx2048m -S irb`
+slightly improves performance of the import but does not appear to
+affect Pacer or Nokogiri's performance:
+```
+=> 34.857  parsed XML documents
+=> 780.828 created graph
+```
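
The Readme above describes sections selected by a start tag name, with the end of the section set to the matching closing tag. A minimal plain-Ruby sketch of that line-based idea follows. It does not use Pacer or Nokogiri, and `xml_sections` and `SAMPLE_XML` are hypothetical names introduced here for illustration, not part of pacer-xml.

```ruby
require 'stringio'

# Hypothetical helper (not part of pacer-xml): scan an IO line by line and
# collect the lines from an opening tag through its closing tag.
def xml_sections(io, tag)
  enter = /<#{Regexp.escape(tag)}\b/    # opening tag, e.g. <grant ...>
  leave = /<\/#{Regexp.escape(tag)}\b/  # closing tag, e.g. </grant>
  sections = []
  lines = nil
  io.each_line do |line|
    lines = [] if lines.nil? && line =~ enter
    next unless lines          # ignore lines outside any section
    lines << line
    if line =~ leave           # the closing line is part of the section
      sections << lines.join
      lines = nil
    end
  end
  sections
end

SAMPLE_XML = <<~XML
  <?xml version="1.0"?>
  <grants>
  <grant id="1">
    <title>First</title>
  </grant>
  <grant id="2">
    <title>Second</title>
  </grant>
  </grants>
XML

sections = xml_sections(StringIO.new(SAMPLE_XML), 'grant')
puts sections.length
```

Because the match is a simple regex with a trailing `\b`, `<grants>` does not start a section here, but a tag such as `<grant-info>` would, since `-` is not a word character. This is the kind of limitation the Readme hints at with "very simple regex matching".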

data/lib/pacer-xml/build_graph.rb ADDED

@@ -0,0 +1,216 @@
+require 'set'
+module PacerXml
+  class GraphVisitor
+    class << self
+      def build_rename(custom = {})
+        h = Hash.new { |h, k| h[k] = k.to_s }
+        h['id'] = 'identifier'
+        h.merge! custom if custom
+        h
+      end
+    end
+    attr_reader :graph
+    attr_accessor :depth, :documents
+    attr_reader :rename, :html, :skip
+    def initialize(graph, opts = {})
+      @documents = 0
+      @graph = graph
+      # treat tag as a property containing html
+      @html = (opts[:html] || []).map(&:to_s).to_set
+      # skip property or tag
+      @skip = (opts[:skip] || []).map(&:to_s).to_set
+      # rename type or property
+      @rename = self.class.build_rename(opts[:rename])
+    end
+    def build(doc)
+      self.documents += 1
+      self.depth = 0
+      if doc.is_a? Nokogiri::XML::Document
+        visit_element doc.first_element_child
+      elsif doc.element?
+        visit_element doc
+      elsif doc.is_a? Enumerable
+        doc.select(&:element?).each { |e| visit_element e }
+      else
+        fail "Don't know what you want to do"
+      end
+    end
+    def visit_vertex_fields(e)
+      h = e.fields
+      h['type'] = rename[h['type']]
+      rename.each do |from, to|
+        if h.key? from
+          h[to] = h.delete from
+        end
+      end
+      html.each do |name|
+        name = rename[name]
+        child = e.at_xpath(name)
+        h[name] = child.inner_html if child
+      end
+      skip.each do |name|
+        h.delete name
+      end
+      h
+    end
+    def visit_edge_fields(e)
+      h = visit_vertex_fields(e)
+      h.delete 'type'
+      h
+    end
+    def tell(x)
+      print('  ' * depth) if depth
+      if x.is_a? Hash or x.is_a? Array
+        p x
+      else
+        puts x
+      end
+    end
+    def skip?(e)
+      skip.include? e.name or html.include? e.name
+    end
+    def level
+      self.depth += 1
+      yield
+    ensure
+      self.depth -= 1
+    end
+  end
+  class BuildGraph < GraphVisitor
+    def visit_element(e)
+      return nil if skip? e
+      level do
+        vertex = graph.create_vertex visit_vertex_fields(e)
+        e.one_rels.each do |rel|
+          visit_one_rel e, vertex, rel
+        end
+        e.many_rels.each do |rel|
+          visit_many_rels e, vertex, rel
+        end
+        if block_given?
+          yield vertex
+        else
+          vertex
+        end
+      end
+    end
+    def visit_one_rel(e, from, rel)
+      to = visit_element(rel)
+      if from and to
+        graph.create_edge nil, from, to, rename[rel.name]
+      end
+    end
+    def visit_many_rels(from_e, from, rel)
+      return nil if skip? rel
+      level do
+        attrs = visit_edge_fields rel
+        attrs.delete :type
+        rel.contained_rels.map do |to_e|
+          visit_many_rel(from_e, from, rel, to_e, attrs)
+        end
+      end
+    end
+    def visit_many_rel(from_e, from, rel, to_e, attrs)
+      to = visit_element(to_e)
+      if from and to
+        graph.create_edge nil, from, to, rename[rel.name], attrs
+      end
+    end
+  end
+  class BuildGraphCached < BuildGraph
+    class << self
+      def empty_cache
+        cache = Hash.new { |h, k| h[k] = {} }
+        cache[:hits] = Hash.new 0
+        cache[:size] = 0
+        cache[:kill] = nil
+        cache[:skip] = Set[]
+        cache
+      end
+    end
+    attr_reader :cache
+    attr_accessor :fields
+    def initialize(graph, opts = {})
+      if opts[:cache]
+        @cache = self.class.empty_cache.merge! opts[:cache]
+      else
+        @cache = self.class.empty_cache
+      end
+      super
+    end
+    def build(doc)
+      result = super
+      #tell "CACHE size #{ cache[:size] },  hits:"
+      if cache[:stats] and documents % 100 == 99
+        tell '-----------------'
+        cache.each do |k, adds|
+          next unless k.is_a? String
+          adds = adds.length
+          hits = cache[:hits][k]
+          tell("%40s: %6s / %6s = %5.4f" % [k, hits, adds, (hits/adds.to_f)])
+        end
+      end
+      result
+    end
+    def cacheable?(e)
+      not cache[:skip].include?(rename[e.name]) and not visit_vertex_fields(e).empty?
+    end
+    def get_cached(e)
+      if cacheable?(e)
+        id = cache[rename[e.name]][visit_vertex_fields(e).hash]
+        #tell "cache hit: #{ e.description }" if el
+        if id
+          cache[:hits][rename[e.name]] += 1
+          graph.vertex(id)
+        end
+      end
+    end
+    def set_cached(e, el)
+      return unless el
+      if cacheable?(e)
+        ct = cache[rename[e.name]]
+        kill = cache[:kill]
+        if kill and cache[:hits][rename[e.name]] == 0 and ct.length > kill
+          tell "cache kill #{ e.description }"
+          cache[:skip] << rename[e.name]
+          cache[:size] -= ct.length
+          cache[rename[e.name]] = []
+        else
+          ct[visit_vertex_fields(e).hash] = el.element_id
+          cache[:size] += 1
+        end
+      end
+      el
+    end
+    def visit_vertex_fields(e)
+      self.fields ||= super
+    end
+    def visit_element(e)
+      self.fields = nil
+      get_cached(e) || set_cached(e, super)
+    end
+  end
+end
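
To see what the `:rename` option does to a vertex's fields, the renaming step can be tried in isolation. In the sketch below, `build_rename` mirrors the class method in the diff above, while `rename_fields` is a hypothetical stand-alone version of the first part of `visit_vertex_fields`. The `:html` and `:skip` handling is left out.

```ruby
# Same construction as GraphVisitor.build_rename: unknown names map to
# themselves, 'id' maps to 'identifier', and custom entries are merged on top.
def build_rename(custom = {})
  h = Hash.new { |hash, key| hash[key] = key.to_s }
  h['id'] = 'identifier'
  h.merge! custom if custom
  h
end

# Hypothetical helper: rename the 'type' value first, then rename any field
# whose key appears in the rename map.
def rename_fields(fields, rename)
  h = fields.dup
  h['type'] = rename[h['type']]
  # Iterate over a snapshot, since the lookup above may have added a key.
  rename.to_a.each do |from, to|
    h[to] = h.delete(from) if h.key?(from)
  end
  h
end

rename = build_rename('addressbook' => 'entity')
fields = { 'type' => 'addressbook', 'id' => '7', 'city' => 'Ottawa' }
renamed = rename_fields(fields, rename)
p renamed
```

Note that the same map serves both type names and property names, so a rename such as `'id' => 'identifier'` applies to every element that has an `id` attribute or property.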

data/lib/pacer-xml/nokogiri_node.rb ADDED

@@ -0,0 +1,148 @@
+class Nokogiri::XML::Text
+  def tree(_ = nil)
+    text unless text =~ /\A\s*\Z/
+  end
+  def inspect
+    if text =~ /\A\s*\Z/
+      "#<(whitespace)>"
+    else
+      "#<Text #{ text }>"
+    end
+  end
+end
+class Nokogiri::XML::Node
+  def tree(key_map = {})
+    c = elements.map { |x| x.tree(key_map) }.compact
+    if c.empty?
+      key_map.fetch(name, name)
+    else
+      ct = {}
+      texts = []
+      attrs = {}
+      if respond_to? :attributes
+        attrs = Hash[attributes.map { |k, a|
+          k = key_map.fetch(k, k)
+          [k, a.value] if k
+        }.compact]
+      end
+      c.each do |h|
+        if h.is_a? String
+          texts << h
+          next
+        end
+        h.each do |name, value|
+          if ct.key? name
+            if ct[name].is_a? Array
+              ct[name] << value unless ct[name].include? value
+            elsif ct[name] != value
+              ct[name] = [ct[name], value]
+            end
+          else
+            ct[name] = value
+          end
+        end
+      end
+      ct.merge! attrs
+      key = key_map.fetch(name, name)
+      if key
+        if ct.empty?
+          if texts.count < 2
+            { key => texts.first }
+          else
+            { key => texts.uniq }
+          end
+        elsif texts.any?
+          { key => ct }
+        else
+          { key => ct }
+        end
+      end
+    end
+  end
+  def inspect
+    if children.all? &:text?
+      "#<Property #{ name }>"
+    else
+      "#<Element #{ name } [#{ elements.map(&:name).uniq.join(', ') }]>"
+    end
+  end
+  def description
+    s = if property?
+      "property"
+    elsif container?
+      'container'
+    elsif vertex?
+      'vertex'
+    else
+      'other'
+    end
+    "#{ s } #{ name }"
+  end
+  def property?
+    children.all? &:text?
+  end
+  def container?
+    not property? and
+      elements.map(&:name).uniq.length == 1 and
+      elements.all? { |e| e.vertex? or e.container? }
+  end
+  def vertex?
+    not property? and not container?
+  end
+  def properties
+    elements.select(&:property?)
+  end
+  def attrs
+    if respond_to? :attributes
+      attributes
+    else
+      {}
+    end
+  end
+  def fields
+    result = {}
+    attrs.each do |name, attr|
+      result[name] = attr.value
+    end
+    properties.each do |e|
+      result[e.name] = e.text
+    end
+    result['type'] = name
+    result
+  end
+  def one_rels
+    elements.select &:vertex?
+  end
+  def contained_rels
+    if container?
+      elements.select(&:vertex?) +
+        elements.select(&:container?).flat_map(&:contained_rels)
+    else
+      []
+    end
+  end
+  def many_rels
+    elements.select &:container?
+  end
+  def rels_hash
+    result = Hash.new { |h, k| h[k] = [] }
+    one_rels.each  { |e| result[e.name] << e }
+    many_rels.each { |e| result[e.name] += e.contained_rels }
+    result
+  end
+end
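
The `property?` / `container?` / `vertex?` predicates above decide how each XML element ends up in the graph. The sketch below replays the same rules on a hypothetical `Node` struct rather than on Nokogiri nodes, so it runs without any gems. It assumes that an element with no child elements counts as a property, which approximates the `children.all? &:text?` test in the diff.

```ruby
# Hypothetical stand-in for a Nokogiri element: a name plus child elements.
Node = Struct.new(:name, :elements) do
  # A leaf (text-only) element becomes a property of its parent.
  def property?
    elements.empty?
  end

  # A container holds children that all share one tag name and are
  # themselves vertices or containers.
  def container?
    !property? &&
      elements.map(&:name).uniq.length == 1 &&
      elements.all? { |e| e.vertex? || e.container? }
  end

  # Everything else becomes a vertex.
  def vertex?
    !property? && !container?
  end

  def kind
    if property? then 'property'
    elsif container? then 'container'
    else 'vertex'
    end
  end
end

def n(name, *children)
  Node.new(name, children)
end

inventor = n('inventor', n('first'), n('last'))
inventors = n('inventors', inventor, n('inventor', n('first'), n('last')))
assignee = n('assignee', n('name'))
patent = n('patent', n('title'), inventors, assignee)

[patent, inventors, inventor, assignee, patent.elements.first].each do |node|
  puts "#{node.kind} #{node.name}"
end
```

In this example `assignee` has a single child, but that child is a property, so `assignee` is classified as a vertex rather than a container.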

data/lib/pacer-xml/sample.rb ADDED

@@ -0,0 +1,107 @@
+require 'set'
+module PacerXml
+  module Sample
+    class << self
+      # Will actually load 101. To avoid this side-effect of
+      # prefetching, the route should be defined as:
+      # xml_route.limit(100).import(...)
+      def load_100(*args)
+        i = importer(*args).limit(100)
+        i.run!
+        i.graph
+      end
+      # Uses a Neo4j graph because the data is too big to fit in memory
+      # without configuring the JVM to use more than its small default
+      # footprint.
+      #
+      # Alternatively, to start the JVM with more memory, try:
+      # bundle exec jruby -J-Xmx2048m -S irb
+      def load_all(graph = nil, *args)
+        require 'pacer-neo4j'
+        n = Time.now.to_i % 1000000
+        graph ||= Pacer.neo4j "sample.#{n}.graph"
+        i = importer(graph, *args)
+        i.run!
+        i.graph
+      end
+      def structure(g)
+        Pacer::Utils::GraphAnalysis.structure g
+      end
+      def structure!(g, fn = 'patent-structure.graphml')
+        s = structure g
+        if fn
+          e = Pacer::Utils::YFilesExport.new
+          e.vertex_label = s.vertex_name
+          e.edge_label = s.edge_name
+          e.export s, fn
+          puts
+          puts "Wrote #{ fn }"
+        end
+        s
+      end
+      # Sample of using the xml import function with some advanced options to
+      # clean up the resulting graph.
+      #
+      # Import can successfully be run with no options specified, but this patent
+      # xml is particularly hairy.
+      def importer(graph = nil, fn = nil, start_rule = nil, end_rule = nil)
+        html = [:abstract]
+        rename = {
+          'classification-national' => 'classification',
+          'assistant-examiner' => 'examiner',
+          'primary-examiner' => 'examiner',
+          'us-term-of-grant' => 'term',
+          'addressbook' => 'entity',
+          'document-id' => 'document',
+          'us-related-documents' => 'related-document',
+          'us-patent-grant' => 'patent-version',
+          'us-bibliographic-data-grant' => 'patent'
+        }
+        cache = { stats: true }
+        graph ||= Pacer.tg
+        graph.create_key_index :type, :vertex
+        xml_route = xml(fn, start_rule, end_rule)
+        xml_route.
+          process { print '.' }.
+          import(graph, html: html, rename: rename, cache: cache)
+      end
+      def xml(fn = nil, *args)
+        fn ||= a_week
+        path = download_patent_grant fn
+        Pacer.xml path, *args
+      end
+      def cleanup(fn = nil)
+        fn ||= a_week
+        name, week = fn.split '_'
+        Dir["/tmp/#{name}*"].each { |f| File.delete f }
+      end
+      private
+      def a_week
+        'ipgb20120103_wk01'
+      end
+      def download_patent_grant(fn)
+        puts "Downloading a sample xml file from"
+        puts "http://www.google.com/googlebooks/uspto-patents-grants-biblio.html"
+        name, week = fn.split '_'
+        result = "/tmp/#{name}.xml"
+        Dir.chdir '/tmp' do
+          unless File.exist? result
+            system "curl http://storage.googleapis.com/patents/grantbib/2012/#{fn}.zip > #{fn}.zip"
+            system "unzip #{fn}.zip"
+          end
+        end
+        result
+      end
+    end
+  end
+end

data/lib/pacer-xml/string_route.rb ADDED

@@ -0,0 +1,50 @@
+module Pacer
+  module Core
+    module StringRoute
+      def xml_stream(enter = nil, leave = nil)
+        enter ||= /<\?xml/
+        leave ||= enter
+        enter = build_rule :enter, enter
+        leave = build_rule :leave, leave
+        r = reducer(element_type: :array, enter: enter, leave: leave) do |s, lines|
+          lines << s
+        end.route
+        joined = r.map(element_type: :string, info: 'join', &:join).route
+        joined.xml
+      end
+      def xml
+        map(element_type: :xml) do |s|
+          Nokogiri::XML(s).first_element_child
+        end
+      end
+      private
+      def build_rule(type, rule)
+        rule = rule.to_s if rule.is_a? Symbol
+        if rule.is_a? String
+          if type == :leave
+            rule = "/#{rule}"
+            add_close_tag = true
+          end
+          rule = /<#{rule}\b/
+        end
+        if rule.is_a? Proc
+          rule
+        elsif add_close_tag
+          proc do |line, lines, set_value|
+            if line.nil? or rule =~ line
+              set_value.call(lines << line)
+              true
+            end
+          end
+        else
+          proc do |line|
+            [] if line.nil? or rule =~ line
+          end
+        end
+      end
+    end
+  end
+end
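
When no rules are given, `xml_stream` falls back to `/<\?xml/` for both `enter` and `leave`, so a file made of several concatenated XML documents is cut at each XML declaration. The plain-Ruby sketch below approximates that default with `Enumerable#slice_before`. It does not reproduce Pacer's `reducer`, and `split_documents` and `STREAM` are hypothetical names used only for this illustration.

```ruby
require 'stringio'

STREAM = <<~XML
  <?xml version="1.0"?>
  <doc>one</doc>
  <?xml version="1.0"?>
  <doc>two</doc>
XML

# Hypothetical helper: start a new chunk at every line containing an XML
# declaration, then join each chunk of lines back into one string.
def split_documents(io)
  io.each_line.slice_before { |line| line =~ /<\?xml/ }.map(&:join)
end

docs = split_documents(StringIO.new(STREAM))
puts docs.length
```

Each resulting string is a complete document that could be handed to an XML parser on its own.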

data/lib/pacer-xml/version.rb ADDED

@@ -0,0 +1,7 @@
+module PacerXml
+  unless const_defined? :VERSION
+    START_TIME = Time.now
+    VERSION = '0.2.2'
+    PACER_VERSION = '>= 1.1.1'
+  end
+end

data/lib/pacer-xml/xml_route.rb ADDED

@@ -0,0 +1,129 @@
+module PacerXml
+  module XmlRoute
+    def help(section = nil)
+      case section
+      when nil
+        puts <<HELP
+This is included via the pacer-xml gem plugin.
+pacer-xml uses Nokogiri for its xml parsing. Each element in an xml route
+is the first child element of the Nokogiri::XML::Document element. To get at
+the document element, simply call #parent on the element.
+An xml route can be created, transformed, filtered and otherwise
+processed by all standard Pacer routes. For instance, if a graph element
+has a property with xml data in it, we could process it as follows:
+  g.v.map(element_type: :xml) { |v| Nokogiri(v[:xml]) }
+Method help sections:
+  :xml
+  :import
+HELP
+      when :xml
+        puts <<HELP
+Turn an xml file into a stream of xml nodes. Scans the xml file
+line-by-line and uses arguments defined in start_section and end_section
+to extract sections from the file.
+Pacer.xml(file, start_section = nil, end_section = nil)
+  file:          String | IO
+    String           path to an xml file to read
+    IO               an open resource that responds to #each_line
+  start_section: String | Symbol | Regex | Proc  (optional)
+    String | Symbol  name of xml tag to use as the root node of each
+                     section of xml. The end_section will automatically be
+                     set to the closing tag.  This uses very simple regex
+                     matching.
+    Regex            If it matches, start the section from this line
+    Proc             proc { |line| }
+                     If it results in a truthy value, starts collecting
+                     lines for the next section of xml.
+  end_section:   Regex | Proc  (optional)
+    Regex            If it matches, end the section including this line
+    Proc             proc { |line, lines| }
+                     - If it results in a truthy value to indicate that the
+                       current line is the last line in a section.
+                     - if it results in an Array, pass the result of
+                       joining the array to Nokogiri for the next section.
+HELP
+      when :import
+        puts <<HELP
+Turn the tree of xml in each node in the stream into graph elements.
+xml_route.import(graph, opts = {})
+  graph: PacerGraph   The graph to load the data into.
+  opts:  Hash
+    :cache  false | Hash
+      false              disable caching
+      stats: true        enable occasional dump of cache info
+    :rename Hash         map of { 'old-name' => 'new-name' }
+    :html   Array        set of tag names to treat as containing HTML
+    :skip   Array        set of tag or attribute names to skip
+Produces a vertex route where each vertex is the root vertex for each xml tree.
+Look at the source of lib/pacer-xml/sample.rb for a good example.
+HELP
+      else
+        super
+      end
+      description
+    end
+    def children
+      flat_map(element_type: :xml) { |x| x.children.to_a }
+    end
+    def names
+      map element_type: :string, &:name
+    end
+    def text_nodes
+      select &:text?
+    end
+    def elements
+      select &:element?
+    end
+    def fields
+      elements.map element_type: :hash, &:fields
+    end
+    def import(graph, opts = {})
+      if opts[:cache] == false
+        builder = BuildGraph.new(graph, opts)
+      else
+        builder = BuildGraphCached.new(graph, opts)
+      end
+      graph.vertex_name ||= proc { |v| v[:type] }
+      to_route.map(route_name: 'import', graph: graph, element_type: :vertex, modules: [ImportHelp]) do |node|
+        graph.transaction do
+          builder.build(node)
+        end
+      end.route
+    end
+    module ImportHelp
+      def help(section = nil)
+        case section
+        when nil
+          back.help :import
+        else
+          super
+        end
+        description
+      end
+    end
+  end
+  Pacer::RouteBuilder.current.element_types[:xml] = [XmlRoute]
+end
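
`import` uses `BuildGraphCached` unless `cache: false` is passed. That builder looks up each element by type and by the hash of its fields before creating a vertex. The sketch below shows the lookup idea with plain hashes. `VertexCache` and `vertex_id_for` are hypothetical names, the ids are simple counters standing in for graph element ids, and the `:kill` and `:skip` handling from the diff is omitted.

```ruby
# Hypothetical, simplified version of the per-type cache in BuildGraphCached.
class VertexCache
  attr_reader :hits, :created

  def initialize
    @tables = Hash.new { |h, type| h[type] = {} }  # one table per type
    @hits = Hash.new(0)                            # cache hits per type
    @created = 0                                   # stands in for vertex creation
  end

  # Return the id of an existing vertex with identical fields, or record a
  # new one. Keying on fields.hash means a hash collision would merge two
  # distinct elements; the sketch accepts that trade-off for brevity.
  def vertex_id_for(fields)
    table = @tables[fields['type']]
    key = fields.hash
    if table.key?(key)
      @hits[fields['type']] += 1
      table[key]
    else
      @created += 1
      table[key] = @created
    end
  end
end

cache = VertexCache.new
a = cache.vertex_id_for('type' => 'entity', 'city' => 'Ottawa')
b = cache.vertex_id_for('type' => 'entity', 'city' => 'Ottawa')
c = cache.vertex_id_for('type' => 'entity', 'city' => 'Toronto')
puts cache.hits['entity']
```

The effect on the graph is that repeated, identical elements (for example the same examiner appearing in many patents) collapse into one shared vertex instead of many duplicates.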

data/lib/pacer-xml.rb ADDED

@@ -0,0 +1,48 @@
+require_relative 'pacer-xml/version'
+require 'nokogiri'
+require 'pacer'
+module PacerXml
+  class << self
+    # Returns the time pacer-xml was last reloaded (or when it was started).
+    def reload_time
+      if defined? @reload_time
+        @reload_time
+      else
+        START_TIME
+      end
+    end
+    # Reload all modified Ruby files in the pacer-xml library. Useful for debugging
+    # in the console. Does not do any of the fancy stuff that Rails reloading
+    # does. Certain types of changes will still require restarting the session.
+    def reload!
+      require 'pathname'
+      Pathname.new(File.expand_path(__FILE__)).parent.find do |path|
+        if path.extname == '.rb' and path.mtime > reload_time
+          puts path.to_s
+          load path.to_s
+        end
+      end
+      @reload_time = Time.now
+    end
+  end
+end
+require_relative 'pacer-xml/build_graph'
+require_relative 'pacer-xml/nokogiri_node'
+require_relative 'pacer-xml/xml_route'
+require_relative 'pacer-xml/string_route'
+require_relative 'pacer-xml/sample'
+module Pacer
+  class << self
+    def xml(file, enter = nil, leave = nil)
+      if file.is_a? String
+        file = File.open file
+      end
+      lines = file.each_line.to_route(element_type: :string, info: 'lines').route
+      lines.xml_stream(enter, leave).route
+    end
+  end
+end
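
The Readme documents the `file` argument of `Pacer.xml` as either a path string or an IO that responds to `#each_line`. A minimal sketch of that dispatch, without Pacer, might look like the following. `line_source` is a hypothetical name, and the sketch leaves the opened file for the caller to close.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: accept a path or an IO-like object and return an
# enumerator over its lines.
def line_source(file)
  file = File.open(file) if file.is_a?(String)  # treat a String as a path
  file.each_line
end

# An IO-like object is used as-is.
from_io = line_source(StringIO.new("<a/>\n<b/>\n")).to_a

# A String is opened as a file path.
tmp = Tempfile.new(['sample', '.xml'])
tmp.write("<a/>\n<b/>\n")
tmp.close
from_path = line_source(tmp.path).to_a
tmp.unlink

puts from_io == from_path
```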

data/pacer-xml.gemspec ADDED

@@ -0,0 +1,24 @@
+# -*- encoding: utf-8 -*-
+$:.push File.expand_path("../lib", __FILE__)
+require "pacer-xml/version"
+Gem::Specification.new do |s|
+  s.name        = "pacer-xml"
+  s.version     = PacerXml::VERSION
+  s.platform    = 'java'
+  s.authors     = ["Darrick Wiebe"]
+  s.email       = ["dw@xnlogic.com"]
+  s.homepage    = "http://xnlogic.com"
+  s.summary     = %q{XML streaming and graph import for Pacer}
+  s.description = s.summary
+  s.add_dependency 'pacer', PacerXml::PACER_VERSION
+  s.add_dependency 'pacer-neo4j', ">= 2.1"
+  s.add_dependency 'nokogiri'
+  s.rubyforge_project = "pacer-xml"
+  s.files = `git ls-files`.split("\n")
+  s.test_files    = `git ls-files -- {test,spec,features}/*`.split("\n")
+  s.require_paths = ["lib"]
+end

metadata CHANGED

@@ -2,14 +2,14 @@
 name: pacer-xml
 version: !ruby/object:Gem::Version
   prerelease:
-  version: 0.2.1
+  version: 0.2.2
 platform: java
 authors:
 - Darrick Wiebe
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-10-27 00:00:00.000000000 Z
+date: 2012-10-31 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: pacer
@@ -67,7 +67,19 @@ email:
 executables: []
 extensions: []
 extra_rdoc_files: []
-files: []
+files:
+- .gitignore
+- Gemfile
+- Rakefile
+- Readme.markdown
+- lib/pacer-xml.rb
+- lib/pacer-xml/build_graph.rb
+- lib/pacer-xml/nokogiri_node.rb
+- lib/pacer-xml/sample.rb
+- lib/pacer-xml/string_route.rb
+- lib/pacer-xml/version.rb
+- lib/pacer-xml/xml_route.rb
+- pacer-xml.gemspec
 homepage: http://xnlogic.com
 licenses: []
 post_install_message: