pupa 0.1.10 → 0.1.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 5ff448a08241852d405068e030d4b689b45373e0
- data.tar.gz: 7b1327e204e05de6659155659ac6b3b7c1f3d853
+ metadata.gz: 08051be1ed0ea215d0e1c258d9b01bd189215dc2
+ data.tar.gz: dd163a78b33959069adbb5d971781a17a582a15b
  SHA512:
- metadata.gz: 45805c371d13c20c22a600197df6d7a57ff70d6e1323ece52c51b108c7973d379ba2523d91afa7bf39e977a0a7c91538e437ada33be24aecdc83e8535dbe865d
- data.tar.gz: 66fa473a639735566a13c2c2ad7c782a037865c3f5f38c7da4524f8865f25b0a3fb64c8a0d0687823968a0eadab1c908e3e6eeecd336a1b9c4e5a10bd123c2e6
+ metadata.gz: 597d6a420ae3c8ba04336b9ed38f4073925e90720d51895fbd83a9c77232f8c588834f66d5106fd9906c402bba752f4805c1014919812427b119aa870ead1cda
+ data.tar.gz: 98e65c5af6ea3f1a63b54c7c0c2091d0664660705d19e77bf22b2d5e7dfa911d56155e8cb6fa69ab11c22191e74444f1e60981f3523239e6a8873bc0d3c92efd
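Each `checksums.yaml` entry is a digest of one of the two archives packed inside the `.gem` file, which is itself a plain tar archive. A minimal sketch for checking an extracted archive against these values, assuming you have unpacked the gem first (for example with `tar -xf pupa-0.1.11.gem`); `checksum_matches?` is a hypothetical helper built on Ruby's standard `digest` library:

```ruby
require 'digest/sha1'
require 'digest/sha2'

# checksums.yaml records SHA1 and SHA512 digests; map each to its Digest class.
DIGEST_CLASSES = {
  sha1: Digest::SHA1,
  sha512: Digest::SHA512
}.freeze

# Hypothetical helper: true if the file at +path+ has the expected hex digest.
def checksum_matches?(path, expected_hex, algorithm: :sha512)
  digest_class = DIGEST_CLASSES.fetch(algorithm) # KeyError for any other algorithm
  digest_class.file(path).hexdigest == expected_hex.downcase
end

# Usage, assuming data.tar.gz was extracted from pupa-0.1.11.gem into the
# current directory:
#   checksum_matches?('data.tar.gz', 'dd163a78b33959069adbb5d971781a17a582a15b', algorithm: :sha1)
```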
data/.travis.yml CHANGED
@@ -2,9 +2,6 @@ language: ruby
  rvm:
  - 2.0.0
  - 2.1.0
- env:
- - MODE=default
- - MODE=compat
  services:
  - memcached
  - mongodb
data/README.md CHANGED
@@ -6,13 +6,13 @@
  [![Coverage Status](https://coveralls.io/repos/opennorth/pupa-ruby/badge.png?branch=master)](https://coveralls.io/r/opennorth/pupa-ruby)
  [![Code Climate](https://codeclimate.com/github/opennorth/pupa-ruby.png)](https://codeclimate.com/github/opennorth/pupa-ruby)
 
- Pupa.rb is a Ruby 2.x fork of Sunlight Labs' [Pupa](https://github.com/opencivicdata/pupa). It implements an Extract, Transform and Load (ETL) process to scrape data from online sources, transform it, and write it to a database.
+ Pupa.rb is a Ruby 2.x fork of Python [Pupa](https://github.com/opencivicdata/pupa). It implements an Extract, Transform and Load (ETL) process to scrape data from online sources, transform it, and write it to a database.
 
  gem install pupa
 
  ## Usage
 
- You can scrape any sort of data with Pupa.rb using your own models. You can also use Pupa.rb to scrape people, organizations, memberships and posts according to the [Popolo](http://www.popoloproject.com/) open government data specification. This gem is up-to-date with Popolo's 2014-05-09 version, but omits the Motion, VoteEvent, Count, Vote and Area classes.
+ You can scrape any sort of data with Pupa.rb using your own models. You can also use Pupa.rb to scrape people, organizations, memberships and posts according to the [Popolo](http://www.popoloproject.com/) open government data specification. This gem is up-to-date with Popolo's 2014-10-28 version.
 
  The [cat.rb](http://opennorth.github.io/pupa-ruby/docs/cat.html) example shows you how to:
 
@@ -52,14 +52,6 @@ The [organization.rb](http://opennorth.github.io/pupa-ruby/docs/organization.htm
 
  JSON parsing is enabled by default. To enable automatic parsing of HTML and XML, require the `nokogiri` and `multi_xml` gems.
 
- ### Automatic response decompression
-
- Until [Faraday Middleware](https://github.com/lostisland/faraday_middleware) releases its next version (> 0.9.1), you must use the gem from its git repository to automatically decompress responses:
-
- ```ruby
- gem 'faraday_middleware', git: 'https://github.com/lostisland/faraday_middleware.git'
- ```
-
  ## Performance
 
  Pupa.rb offers several ways to significantly improve performance. [Read the documentation.](https://github.com/opennorth/pupa-ruby/blob/master/PERFORMANCE.md#readme)
@@ -90,22 +82,15 @@ In short, Pupa.rb lets you spend more time on the tasks that are unique to your
 
  * Logging, to make debugging and monitoring a scraper easier
  * [Automatic response parsing](#automatic-response-parsing) of JSON, XML and HTML
+ * Automatic response decompression
  * [Option parsing](http://opennorth.github.io/pupa-ruby/docs/legislator.html#section-9), to control your scraper from the command-line
  * [Object validation](http://opennorth.github.io/pupa-ruby/docs/cat.html#section-4), using [JSON Schema](http://json-schema.org/)
 
  Pupa.rb is extensible, so that you can add your own models, parsers, helpers, actions, etc. It also offers several ways to [improve your scraper's performance](#performance).
 
- ## [OpenCivicData](http://opencivicdata.org/) compatibility
-
- Both Pupa.rb and Sunlight Labs' [Pupa](https://github.com/opencivicdata/pupa) implement models for people, organizations and memberships from the [Popolo](http://www.popoloproject.com/) open government data specification. Pupa.rb lets you use your own classes, but Pupa only supports a fixed set of classes. A consequence of Pupa.rb's flexibility is that the value of the `_type` property for `Person`, `Organization` and `Membership` objects differs between Pupa.rb and Pupa. Pupa.rb has namespaced types like `pupa/person` – to allow Ruby to load the `Person` class in the `Pupa` module – whereas Pupa has unnamespaced types like `person`.
-
- To save objects to MongoDB with unnamespaced types like Sunlight Labs' Pupa – in order to benefit from other tools in the [OpenCivicData](http://opencivicdata.org/) stack – add this line to the top of your script:
-
- ```ruby
- require 'pupa/refinements/opencivicdata'
- ```
+ ## Python [Pupa](https://github.com/opencivicdata/pupa) differences
 
- It is not currently possible to run the `scrape` action with one of Pupa.rb and Pupa, and to then run the `import` action with the other. Both actions must be run by the same library.
+ Both Pupa.rb and Python [Pupa](https://github.com/opencivicdata/pupa) implement models from the [Popolo](http://www.popoloproject.com/) open government data specifications, but Pupa.rb also lets you use your own classes. Pupa.rb stores data in either MongoDB (default) or PostgreSQL (experimental); Python Pupa stores data in PostgreSQL. The PostgreSQL schema of Pupa.rb and Python Pupa differ.
 
  ## Testing
 
data/lib/pupa/runner.rb CHANGED
@@ -13,9 +13,9 @@ module Pupa
  @options = OpenStruct.new({
  actions: [],
  tasks: [],
- output_dir: File.expand_path('scraped_data', Dir.pwd),
+ output_dir: File.expand_path('_data', Dir.pwd),
  pipelined: false,
- cache_dir: File.expand_path('web_cache', Dir.pwd),
+ cache_dir: File.expand_path('_cache', Dir.pwd),
  expires_in: 86400, # 1 day
  value_max_bytes: 1048576, # 1 MB
  memcached_username: nil,
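In this release the runner's default output and cache directories are renamed from `scraped_data` and `web_cache` to `_data` and `_cache`, still resolved against `Dir.pwd`, so a scraper upgraded from 0.1.10 that relies on the defaults writes to, and caches in, new directories. A minimal sketch for locating both sets of defaults from a project directory, assuming the scraper is launched from that directory; `DEFAULT_DIR_NAMES`, `default_dirs` and the `/srv/scraper` path are hypothetical:

```ruby
# Hypothetical table of the runner's default directory names before and after
# this release (the names are taken from data/lib/pupa/runner.rb).
DEFAULT_DIR_NAMES = {
  '0.1.10' => { output_dir: 'scraped_data', cache_dir: 'web_cache' },
  '0.1.11' => { output_dir: '_data', cache_dir: '_cache' }
}.freeze

DefaultDirs = Struct.new(:output_dir, :cache_dir)

# Hypothetical helper: expand the defaults the same way the runner does,
# i.e. with File.expand_path relative to a working directory.
def default_dirs(version, pwd = Dir.pwd)
  names = DEFAULT_DIR_NAMES.fetch(version)
  DefaultDirs.new(
    File.expand_path(names[:output_dir], pwd),
    File.expand_path(names[:cache_dir], pwd)
  )
end

before = default_dirs('0.1.10', '/srv/scraper')
after = default_dirs('0.1.11', '/srv/scraper')
# On a POSIX system this prints:
#   /srv/scraper/scraped_data -> /srv/scraper/_data
#   /srv/scraper/web_cache -> /srv/scraper/_cache
puts "#{before.output_dir} -> #{after.output_dir}"
puts "#{before.cache_dir} -> #{after.cache_dir}"
```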
data/lib/pupa/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Pupa
- VERSION = "0.1.10"
+ VERSION = "0.1.11"
  end
data/pupa.gemspec CHANGED
@@ -16,7 +16,7 @@ Gem::Specification.new do |s|
  s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
  s.require_paths = ["lib"]
 
- s.add_runtime_dependency('activesupport', '~> 4.0.0')
+ s.add_runtime_dependency('activesupport', '~> 4.0')
  s.add_runtime_dependency('colored', '~> 1.2')
  s.add_runtime_dependency('faraday_middleware', '~> 0.9.0')
  s.add_runtime_dependency('json-schema', '~> 2.1.3')
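This release loosens the `activesupport` requirement from `~> 4.0.0` to `~> 4.0`. With RubyGems' pessimistic operator, `~> 4.0.0` means `>= 4.0.0, < 4.1` while `~> 4.0` means `>= 4.0, < 5.0`. A short sketch comparing the two with RubyGems' own `Gem::Requirement#satisfied_by?`; the candidate versions are illustrative:

```ruby
require 'rubygems'

# The old and new pessimistic constraints from pupa.gemspec.
OLD_REQUIREMENT = Gem::Requirement.new('~> 4.0.0') # >= 4.0.0, < 4.1
NEW_REQUIREMENT = Gem::Requirement.new('~> 4.0')   # >= 4.0, < 5.0

%w[4.0.0 4.0.13 4.1.8 4.2.0 5.0.0].each do |candidate|
  version = Gem::Version.new(candidate)
  printf("%-7s old: %-5s new: %s\n",
         candidate,
         OLD_REQUIREMENT.satisfied_by?(version),
         NEW_REQUIREMENT.satisfied_by?(version))
end
```

Under the new constraint, ActiveSupport 4.1.x and 4.2.x are accepted while 5.0.0 still is not; under the old one, only 4.0.x releases were.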
@@ -2,11 +2,7 @@ require File.expand_path(File.dirname(__FILE__) + '/../../spec_helper')
 
  describe Pupa::Processor::Connection::MongoDBAdapter do
  def _type
- if testing_python_compatibility?
- 'person'
- else
- 'pupa/person'
- end
+ 'pupa/person'
  end
 
  def connection
@@ -2,11 +2,7 @@ require File.expand_path(File.dirname(__FILE__) + '/../../spec_helper')
 
  describe Pupa::Processor::Connection::PostgreSQLAdapter do
  def _type
- if testing_python_compatibility?
- 'person'
- else
- 'pupa/person'
- end
+ 'pupa/person'
  end
 
  def connection
@@ -102,11 +102,7 @@ describe Pupa::Processor do
  end
 
  let :_type do
- if testing_python_compatibility?
- 'organization'
- else
- 'pupa/organization'
- end
+ 'pupa/organization'
  end
 
  let :graphable do
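With the `compat` mode removed, the specs always expect namespaced `_type` values such as `pupa/person` and `pupa/organization`. The README text removed in this release explains that the namespace is what allows Ruby to load the `Person` class in the `Pupa` module. A stdlib-only sketch of that mapping, assuming plain CamelCase class names; `type_for` and `class_for` are hypothetical helpers approximating ActiveSupport inflections, not Pupa.rb's actual implementation, and the `Pupa` classes here are stand-ins:

```ruby
# Stand-in classes for illustration only; the real ones live in the pupa gem.
module Pupa
  class Person; end
  class Organization; end
end

# Approximates ActiveSupport's String#underscore for simple class names:
# Pupa::Organization -> "pupa/organization".
def type_for(klass)
  klass.name.gsub('::', '/').gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Approximates String#camelize followed by a constant lookup:
# "pupa/person" -> Pupa::Person.
def class_for(type)
  const_path = type.split('/').map { |segment| segment.split('_').map(&:capitalize).join }.join('::')
  Object.const_get(const_path)
end

puts type_for(Pupa::Organization) # pupa/organization
puts class_for('pupa/person')      # Pupa::Person

# Without the namespace, the lookup targets a top-level Person constant,
# which does not exist here.
begin
  class_for('person')
rescue NameError => e
  puts "cannot resolve 'person': #{e.class}"
end
```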
data/spec/spec_helper.rb CHANGED
@@ -8,15 +8,3 @@ require 'nokogiri'
  require 'redis-store'
  require 'rspec'
  require File.dirname(__FILE__) + '/../lib/pupa'
-
- def testing_python_compatibility?
- ENV['MODE'] == 'compat'
- end
-
- if testing_python_compatibility?
- require File.dirname(__FILE__) + '/../lib/pupa/refinements/opencivicdata'
- end
-
- RSpec.configure do |c|
- c.filter_run_excluding :testing_python_compatibility => true unless testing_python_compatibility?
- end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: pupa
  version: !ruby/object:Gem::Version
- version: 0.1.10
+ version: 0.1.11
  platform: ruby
  authors:
  - Open North
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-10-18 00:00:00.000000000 Z
+ date: 2015-01-07 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: activesupport
@@ -16,14 +16,14 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 4.0.0
+ version: '4.0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 4.0.0
+ version: '4.0'
  - !ruby/object:Gem::Dependency
  name: colored
  requirement: !ruby/object:Gem::Requirement
@@ -331,7 +331,6 @@ files:
  - lib/pupa/refinements/faraday.rb
  - lib/pupa/refinements/faraday_middleware.rb
  - lib/pupa/refinements/json-schema.rb
- - lib/pupa/refinements/opencivicdata.rb
  - lib/pupa/runner.rb
  - lib/pupa/version.rb
  - pupa.gemspec
@@ -385,7 +384,6 @@ files:
  - spec/processor/yielder_spec.rb
  - spec/processor_spec.rb
  - spec/refinements/json-schema_spec.rb
- - spec/refinements/opencivicdata_spec.rb
  - spec/runner_spec.rb
  - spec/spec_helper.rb
  homepage: http://github.com/opennorth/pupa-ruby
@@ -449,7 +447,5 @@ test_files:
  - spec/processor/yielder_spec.rb
  - spec/processor_spec.rb
  - spec/refinements/json-schema_spec.rb
- - spec/refinements/opencivicdata_spec.rb
  - spec/runner_spec.rb
  - spec/spec_helper.rb
- has_rdoc:
@@ -1,42 +0,0 @@
- # @see https://github.com/opennorth/pupa-ruby#opencivicdata-compatibility
-
- module Pupa::Model
- # This unfortunately won't cause the behavior of any model that has already
- # included `Pupa::Model` to change.
- class << self
- def append_features(base)
- if base.instance_variable_defined?("@_dependencies")
- base.instance_variable_get("@_dependencies") << self
- return false
- else
- return false if base < self
- @_dependencies.each { |dep| base.send(:include, dep) }
- super
- base.extend const_get("ClassMethods") if const_defined?("ClassMethods")
- base.class_eval(&@_included_block) if instance_variable_defined?("@_included_block")
- base.class_eval do # XXX
- set_callback(:save, :before) do |object|
- object._type = object._type.camelize.demodulize.underscore
- end
- end
- end
- end
- end
- end
-
- # `set_callback` is called by `class_eval` in `ActiveSupport::Concern`. Without
- # monkey-patching `ActiveSupport::Concern`, we can either iterate `ObjectSpace`,
- # implement something like ActiveSupport's `DescendantsTracker` for inclusion
- # instead of inheritance, or go back to `Pupa::Model` being a superclass instead
- # of a mixin to take advantage of `DescendantsTracker` itself.
- #
- # Instead of adding a callback, we can override `to_h` when `persist` is `true`.
- ObjectSpace.each_object(Class) do |base|
- if base != Sequel::Model && base.include?(Pupa::Model) # Sequel::Model will error on #include?
- base.class_eval do
- set_callback(:save, :before) do |object|
- object._type = object._type.camelize.demodulize.underscore
- end
- end
- end
- end
@@ -1,35 +0,0 @@
- require File.expand_path(File.dirname(__FILE__) + '/../spec_helper')
-
- describe Pupa::Refinements, testing_python_compatibility: true do
- module Music
- class Band
- include Pupa::Model
-
- def save
- run_callbacks(:save) do
- end
- end
- end
- end
-
- module Pupa
- class Committee < Organization
- def save
- run_callbacks(:save) do
- end
- end
- end
- end
-
- it 'should demodulize the type of new models' do
- object = Music::Band.new
- object.save
- object._type.should == 'band'
- end
-
- it 'should demodulize the type of existing models' do
- object = Pupa::Committee.new
- object.save
- object._type.should == 'committee'
- end
- end
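The deleted `opencivicdata` refinement rewrote `_type` in a `before` save callback using ActiveSupport's `camelize.demodulize.underscore`, which is why its spec expects `Music::Band` and `Pupa::Committee` objects to end up with the types `band` and `committee`. A stdlib-only sketch of that string transformation, assuming snake_case types and no custom acronym inflections (under which these helpers behave like the ActiveSupport methods they are named after); all helper names and the `music/record_label` input are hypothetical:

```ruby
# Approximates ActiveSupport's String#camelize: "music/record_label" -> "Music::RecordLabel".
def camelize(type)
  type.split('/').map { |segment| segment.split('_').map(&:capitalize).join }.join('::')
end

# Approximates String#demodulize: "Music::RecordLabel" -> "RecordLabel".
def demodulize(class_name)
  class_name.split('::').last
end

# Approximates String#underscore: "RecordLabel" -> "record_label".
def underscore(class_name)
  class_name.gsub('::', '/').gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# The chain the removed callback applied to object._type before saving.
def demodulize_type(type)
  underscore(demodulize(camelize(type)))
end

%w[pupa/person pupa/committee music/band music/record_label].each do |type|
  puts "#{type} -> #{demodulize_type(type)}"
end
# pupa/person -> person
# pupa/committee -> committee
# music/band -> band
# music/record_label -> record_label
```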