pupa 0.1.10 → 0.1.11

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 5ff448a08241852d405068e030d4b689b45373e0
- data.tar.gz: 7b1327e204e05de6659155659ac6b3b7c1f3d853
+ metadata.gz: 08051be1ed0ea215d0e1c258d9b01bd189215dc2
+ data.tar.gz: dd163a78b33959069adbb5d971781a17a582a15b
  SHA512:
- metadata.gz: 45805c371d13c20c22a600197df6d7a57ff70d6e1323ece52c51b108c7973d379ba2523d91afa7bf39e977a0a7c91538e437ada33be24aecdc83e8535dbe865d
- data.tar.gz: 66fa473a639735566a13c2c2ad7c782a037865c3f5f38c7da4524f8865f25b0a3fb64c8a0d0687823968a0eadab1c908e3e6eeecd336a1b9c4e5a10bd123c2e6
+ metadata.gz: 597d6a420ae3c8ba04336b9ed38f4073925e90720d51895fbd83a9c77232f8c588834f66d5106fd9906c402bba752f4805c1014919812427b119aa870ead1cda
+ data.tar.gz: 98e65c5af6ea3f1a63b54c7c0c2091d0664660705d19e77bf22b2d5e7dfa911d56155e8cb6fa69ab11c22191e74444f1e60981f3523239e6a8873bc0d3c92efd
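A `.gem` archive holds `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`; the hashes above are hex digests of the first two entries, so every new release changes them. As a minimal sketch, assuming you have already extracted the raw bytes of one entry, the recorded digests can be checked with Ruby's standard `digest` and `yaml` libraries. The `verify_checksums` helper is hypothetical and not part of the RubyGems API.

```ruby
require 'digest'
require 'yaml'

# Digest classes for the algorithm keys that may appear in checksums.yaml.
DIGESTS = {
  'SHA1'   => Digest::SHA1,
  'SHA256' => Digest::SHA256,
  'SHA512' => Digest::SHA512,
}.freeze

# Hypothetical helper (not a RubyGems API): compares every digest recorded for
# `entry_name` in a checksums.yaml document with the digest of `bytes`.
# Returns a hash such as { 'SHA1' => true, 'SHA512' => true }.
def verify_checksums(checksums_yaml, entry_name, bytes)
  YAML.safe_load(checksums_yaml).each_with_object({}) do |(algorithm, entries), result|
    digest_class = DIGESTS.fetch(algorithm) # KeyError for an unknown algorithm
    result[algorithm] = digest_class.hexdigest(bytes) == entries.fetch(entry_name).to_s
  end
end

# Usage: build a checksums document for some sample bytes, then verify it.
bytes = 'example bytes'
checksums = <<~YAML
  ---
  SHA1:
    metadata.gz: #{Digest::SHA1.hexdigest(bytes)}
  SHA512:
    metadata.gz: #{Digest::SHA512.hexdigest(bytes)}
YAML
p verify_checksums(checksums, 'metadata.gz', bytes)
```

Any change to the bytes of an entry, including a version bump in `metadata.gz`, makes every recorded digest mismatch.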
data/.travis.yml CHANGED
@@ -2,9 +2,6 @@ language: ruby
  rvm:
  - 2.0.0
  - 2.1.0
- env:
- - MODE=default
- - MODE=compat
  services:
  - memcached
  - mongodb
data/README.md CHANGED
@@ -6,13 +6,13 @@
  [![Coverage Status](https://coveralls.io/repos/opennorth/pupa-ruby/badge.png?branch=master)](https://coveralls.io/r/opennorth/pupa-ruby)
  [![Code Climate](https://codeclimate.com/github/opennorth/pupa-ruby.png)](https://codeclimate.com/github/opennorth/pupa-ruby)
 
- Pupa.rb is a Ruby 2.x fork of Sunlight Labs' [Pupa](https://github.com/opencivicdata/pupa). It implements an Extract, Transform and Load (ETL) process to scrape data from online sources, transform it, and write it to a database.
+ Pupa.rb is a Ruby 2.x fork of Python [Pupa](https://github.com/opencivicdata/pupa). It implements an Extract, Transform and Load (ETL) process to scrape data from online sources, transform it, and write it to a database.
 
  gem install pupa
 
  ## Usage
 
- You can scrape any sort of data with Pupa.rb using your own models. You can also use Pupa.rb to scrape people, organizations, memberships and posts according to the [Popolo](http://www.popoloproject.com/) open government data specification. This gem is up-to-date with Popolo's 2014-05-09 version, but omits the Motion, VoteEvent, Count, Vote and Area classes.
+ You can scrape any sort of data with Pupa.rb using your own models. You can also use Pupa.rb to scrape people, organizations, memberships and posts according to the [Popolo](http://www.popoloproject.com/) open government data specification. This gem is up-to-date with Popolo's 2014-10-28 version.
 
  The [cat.rb](http://opennorth.github.io/pupa-ruby/docs/cat.html) example shows you how to:
 
@@ -52,14 +52,6 @@ The [organization.rb](http://opennorth.github.io/pupa-ruby/docs/organization.htm
 
  JSON parsing is enabled by default. To enable automatic parsing of HTML and XML, require the `nokogiri` and `multi_xml` gems.
 
- ### Automatic response decompression
-
- Until [Faraday Middleware](https://github.com/lostisland/faraday_middleware) releases its next version (> 0.9.1), you must use the gem from its git repository to automatically decompress responses:
-
- ```ruby
- gem 'faraday_middleware', git: 'https://github.com/lostisland/faraday_middleware.git'
- ```
-
  ## Performance
 
  Pupa.rb offers several ways to significantly improve performance. [Read the documentation.](https://github.com/opennorth/pupa-ruby/blob/master/PERFORMANCE.md#readme)
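The removed README section tied automatic response decompression to an unreleased faraday_middleware; the 0.1.11 README instead lists decompression among the built-in features. To illustrate what decompressing a response means at the HTTP level, here is a minimal sketch using only Ruby's standard `zlib`, assuming the body and its `Content-Encoding` header value are already in hand. The `decompress_body` helper is hypothetical; this is not how Pupa.rb or faraday_middleware implement it.

```ruby
require 'stringio'
require 'zlib'

# Hypothetical helper, not part of Pupa.rb or faraday_middleware: decode an
# HTTP response body according to its Content-Encoding header value.
def decompress_body(body, content_encoding)
  case content_encoding.to_s.strip.downcase
  when 'gzip', 'x-gzip'
    # GzipReader.wrap closes the reader once the block returns.
    Zlib::GzipReader.wrap(StringIO.new(body)) { |gz| gz.read }
  when 'deflate'
    # Assumes the zlib-wrapped format; a raw DEFLATE stream would instead
    # need Zlib::Inflate.new(-Zlib::MAX_WBITS).
    Zlib::Inflate.inflate(body)
  else
    body # identity or unknown encoding: leave the body untouched
  end
end
```

Passing unknown encodings through unchanged keeps the helper safe to call on every response; a stricter version might raise instead.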
@@ -90,22 +82,15 @@ In short, Pupa.rb lets you spend more time on the tasks that are unique to your
 
  * Logging, to make debugging and monitoring a scraper easier
  * [Automatic response parsing](#automatic-response-parsing) of JSON, XML and HTML
+ * Automatic response decompression
  * [Option parsing](http://opennorth.github.io/pupa-ruby/docs/legislator.html#section-9), to control your scraper from the command-line
  * [Object validation](http://opennorth.github.io/pupa-ruby/docs/cat.html#section-4), using [JSON Schema](http://json-schema.org/)
 
  Pupa.rb is extensible, so that you can add your own models, parsers, helpers, actions, etc. It also offers several ways to [improve your scraper's performance](#performance).
 
- ## [OpenCivicData](http://opencivicdata.org/) compatibility
-
- Both Pupa.rb and Sunlight Labs' [Pupa](https://github.com/opencivicdata/pupa) implement models for people, organizations and memberships from the [Popolo](http://www.popoloproject.com/) open government data specification. Pupa.rb lets you use your own classes, but Pupa only supports a fixed set of classes. A consequence of Pupa.rb's flexibility is that the value of the `_type` property for `Person`, `Organization` and `Membership` objects differs between Pupa.rb and Pupa. Pupa.rb has namespaced types like `pupa/person` – to allow Ruby to load the `Person` class in the `Pupa` module – whereas Pupa has unnamespaced types like `person`.
-
- To save objects to MongoDB with unnamespaced types like Sunlight Labs' Pupa – in order to benefit from other tools in the [OpenCivicData](http://opencivicdata.org/) stack – add this line to the top of your script:
-
- ```ruby
- require 'pupa/refinements/opencivicdata'
- ```
+ ## Python [Pupa](https://github.com/opencivicdata/pupa) differences
 
- It is not currently possible to run the `scrape` action with one of Pupa.rb and Pupa, and to then run the `import` action with the other. Both actions must be run by the same library.
+ Both Pupa.rb and Python [Pupa](https://github.com/opencivicdata/pupa) implement models from the [Popolo](http://www.popoloproject.com/) open government data specifications, but Pupa.rb also lets you use your own classes. Pupa.rb stores data in either MongoDB (default) or PostgreSQL (experimental); Python Pupa stores data in PostgreSQL. The PostgreSQL schema of Pupa.rb and Python Pupa differ.
 
  ## Testing
 
data/lib/pupa/runner.rb CHANGED
@@ -13,9 +13,9 @@ module Pupa
  @options = OpenStruct.new({
  actions: [],
  tasks: [],
- output_dir: File.expand_path('scraped_data', Dir.pwd),
+ output_dir: File.expand_path('_data', Dir.pwd),
  pipelined: false,
- cache_dir: File.expand_path('web_cache', Dir.pwd),
+ cache_dir: File.expand_path('_cache', Dir.pwd),
  expires_in: 86400, # 1 day
  value_max_bytes: 1048576, # 1 MB
  memcached_username: nil,
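The default `output_dir` and `cache_dir` move from `scraped_data` and `web_cache` to `_data` and `_cache`, both still resolved against `Dir.pwd`, so a scraper that relies on the defaults will look in new directories after upgrading. One way to carry existing data over is a one-off rename, sketched below under two assumptions: you never overrode these options, and nothing else depends on the old paths. The `migrate_default_dirs` helper is hypothetical and not part of Pupa.rb.

```ruby
require 'fileutils'

# Old default directory name => new default directory name (pupa 0.1.10 => 0.1.11).
RENAMED_DEFAULTS = {
  'scraped_data' => '_data',  # output_dir
  'web_cache'    => '_cache', # cache_dir
}.freeze

# Hypothetical upgrade helper: renames each old default directory under `root`
# unless the new one already exists. Returns the [old, new] pairs it moved.
def migrate_default_dirs(root = Dir.pwd)
  RENAMED_DEFAULTS.each_with_object([]) do |(old_name, new_name), moved|
    old_path = File.expand_path(old_name, root)
    new_path = File.expand_path(new_name, root)
    # Skip if there is nothing to move or the new location is already in use.
    next unless File.directory?(old_path) && !File.exist?(new_path)

    FileUtils.mv(old_path, new_path)
    moved << [old_name, new_name]
  end
end
```

Refusing to overwrite an existing `_data` or `_cache` avoids mixing output from two versions; if both directories exist, merging them is left to you.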
data/lib/pupa/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Pupa
- VERSION = "0.1.10"
+ VERSION = "0.1.11"
  end
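Version strings like these should not be compared as plain strings: once a segment reaches two digits, `'0.1.10' < '0.1.9'` holds lexicographically. RubyGems' `Gem::Version` compares segment by segment. A small sketch; the `newer?` helper is hypothetical.

```ruby
require 'rubygems'

# Hypothetical helper: true if version string `a` is a later release than `b`.
def newer?(a, b)
  Gem::Version.new(a) > Gem::Version.new(b)
end

puts newer?('0.1.11', '0.1.10') # prints true
puts newer?('0.1.10', '0.1.9')  # prints true
puts '0.1.10' > '0.1.9'         # prints false: plain string comparison
```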
data/pupa.gemspec CHANGED
@@ -16,7 +16,7 @@ Gem::Specification.new do |s|
  s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
  s.require_paths = ["lib"]
 
- s.add_runtime_dependency('activesupport', '~> 4.0.0')
+ s.add_runtime_dependency('activesupport', '~> 4.0')
  s.add_runtime_dependency('colored', '~> 1.2')
  s.add_runtime_dependency('faraday_middleware', '~> 0.9.0')
  s.add_runtime_dependency('json-schema', '~> 2.1.3')
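The relaxed ActiveSupport constraint widens the accepted range: `~> 4.0.0` means `>= 4.0.0, < 4.1`, while `~> 4.0` means `>= 4.0, < 5.0`. The sketch below checks this with `Gem::Requirement#satisfied_by?`. The candidate versions are an illustrative sample only; they say nothing about which releases Pupa.rb was tested against.

```ruby
require 'rubygems'

OLD_REQUIREMENT = Gem::Requirement.new('~> 4.0.0') # pupa 0.1.10
NEW_REQUIREMENT = Gem::Requirement.new('~> 4.0')   # pupa 0.1.11

# Illustrative sample of version numbers (not a release list).
CANDIDATES = %w[4.0.0 4.0.13 4.1.8 4.2.0 5.0.0].freeze

# Returns the candidates that satisfy the given requirement.
def allowed_versions(requirement, candidates = CANDIDATES)
  candidates.select { |v| requirement.satisfied_by?(Gem::Version.new(v)) }
end

puts "~> 4.0.0 allows: #{allowed_versions(OLD_REQUIREMENT).join(', ')}"
puts "~> 4.0   allows: #{allowed_versions(NEW_REQUIREMENT).join(', ')}"
```

With the new constraint, Bundler may resolve ActiveSupport 4.1.x or 4.2.x alongside Pupa.rb, but never 5.x.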
@@ -2,11 +2,7 @@ require File.expand_path(File.dirname(__FILE__) + '/../../spec_helper')
 
  describe Pupa::Processor::Connection::MongoDBAdapter do
  def _type
- if testing_python_compatibility?
- 'person'
- else
- 'pupa/person'
- end
+ 'pupa/person'
  end
 
  def connection
@@ -2,11 +2,7 @@ require File.expand_path(File.dirname(__FILE__) + '/../../spec_helper')
 
  describe Pupa::Processor::Connection::PostgreSQLAdapter do
  def _type
- if testing_python_compatibility?
- 'person'
- else
- 'pupa/person'
- end
+ 'pupa/person'
  end
 
  def connection
@@ -102,11 +102,7 @@ describe Pupa::Processor do
  end
 
  let :_type do
- if testing_python_compatibility?
- 'organization'
- else
- 'pupa/organization'
- end
+ 'pupa/organization'
  end
 
  let :graphable do
data/spec/spec_helper.rb CHANGED
@@ -8,15 +8,3 @@ require 'nokogiri'
  require 'redis-store'
  require 'rspec'
  require File.dirname(__FILE__) + '/../lib/pupa'
-
- def testing_python_compatibility?
- ENV['MODE'] == 'compat'
- end
-
- if testing_python_compatibility?
- require File.dirname(__FILE__) + '/../lib/pupa/refinements/opencivicdata'
- end
-
- RSpec.configure do |c|
- c.filter_run_excluding :testing_python_compatibility => true unless testing_python_compatibility?
- end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: pupa
  version: !ruby/object:Gem::Version
- version: 0.1.10
+ version: 0.1.11
  platform: ruby
  authors:
  - Open North
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-10-18 00:00:00.000000000 Z
+ date: 2015-01-07 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: activesupport
@@ -16,14 +16,14 @@ dependencies:
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 4.0.0
+ version: '4.0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 4.0.0
+ version: '4.0'
  - !ruby/object:Gem::Dependency
  name: colored
  requirement: !ruby/object:Gem::Requirement
@@ -331,7 +331,6 @@ files:
  - lib/pupa/refinements/faraday.rb
  - lib/pupa/refinements/faraday_middleware.rb
  - lib/pupa/refinements/json-schema.rb
- - lib/pupa/refinements/opencivicdata.rb
  - lib/pupa/runner.rb
  - lib/pupa/version.rb
  - pupa.gemspec
@@ -385,7 +384,6 @@ files:
  - spec/processor/yielder_spec.rb
  - spec/processor_spec.rb
  - spec/refinements/json-schema_spec.rb
- - spec/refinements/opencivicdata_spec.rb
  - spec/runner_spec.rb
  - spec/spec_helper.rb
  homepage: http://github.com/opennorth/pupa-ruby
@@ -449,7 +447,5 @@ test_files:
  - spec/processor/yielder_spec.rb
  - spec/processor_spec.rb
  - spec/refinements/json-schema_spec.rb
- - spec/refinements/opencivicdata_spec.rb
  - spec/runner_spec.rb
  - spec/spec_helper.rb
- has_rdoc:
data/lib/pupa/refinements/opencivicdata.rb DELETED
@@ -1,42 +0,0 @@
- # @see https://github.com/opennorth/pupa-ruby#opencivicdata-compatibility
-
- module Pupa::Model
- # This unfortunately won't cause the behavior of any model that has already
- # included `Pupa::Model` to change.
- class << self
- def append_features(base)
- if base.instance_variable_defined?("@_dependencies")
- base.instance_variable_get("@_dependencies") << self
- return false
- else
- return false if base < self
- @_dependencies.each { |dep| base.send(:include, dep) }
- super
- base.extend const_get("ClassMethods") if const_defined?("ClassMethods")
- base.class_eval(&@_included_block) if instance_variable_defined?("@_included_block")
- base.class_eval do # XXX
- set_callback(:save, :before) do |object|
- object._type = object._type.camelize.demodulize.underscore
- end
- end
- end
- end
- end
- end
-
- # `set_callback` is called by `class_eval` in `ActiveSupport::Concern`. Without
- # monkey-patching `ActiveSupport::Concern`, we can either iterate `ObjectSpace`,
- # implement something like ActiveSupport's `DescendantsTracker` for inclusion
- # instead of inheritance, or go back to `Pupa::Model` being a superclass instead
- # of a mixin to take advantage of `DescendantsTracker` itself.
- #
- # Instead of adding a callback, we can override `to_h` when `persist` is `true`.
- ObjectSpace.each_object(Class) do |base|
- if base != Sequel::Model && base.include?(Pupa::Model) # Sequel::Model will error on #include?
- base.class_eval do
- set_callback(:save, :before) do |object|
- object._type = object._type.camelize.demodulize.underscore
- end
- end
- end
- end
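The deleted refinement rewrote each model's namespaced `_type` into the unnamespaced form used by Python Pupa, through ActiveSupport's `camelize.demodulize.underscore`, so `'pupa/person'` became `'person'`. Without ActiveSupport at hand, a rough pure-Ruby approximation for lowercase, slash-separated type strings is to keep the last path segment. This is an assumption, not the removed code: it can diverge from ActiveSupport for mixed-case input or custom inflection rules, and `unnamespaced_type` is a hypothetical name.

```ruby
# Hypothetical approximation of `_type.camelize.demodulize.underscore` for
# lowercase, slash-separated type strings such as 'pupa/person'.
def unnamespaced_type(type)
  type.to_s.split('/').last.to_s
end

%w[pupa/person pupa/organization music/band].each do |type|
  puts "#{type} -> #{unnamespaced_type(type)}"
end
```

Since 0.1.11 drops this conversion, stored documents keep the namespaced `_type` (for example `pupa/person`), as the updated specs expect.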
data/spec/refinements/opencivicdata_spec.rb DELETED
@@ -1,35 +0,0 @@
- require File.expand_path(File.dirname(__FILE__) + '/../spec_helper')
-
- describe Pupa::Refinements, testing_python_compatibility: true do
- module Music
- class Band
- include Pupa::Model
-
- def save
- run_callbacks(:save) do
- end
- end
- end
- end
-
- module Pupa
- class Committee < Organization
- def save
- run_callbacks(:save) do
- end
- end
- end
- end
-
- it 'should demodulize the type of new models' do
- object = Music::Band.new
- object.save
- object._type.should == 'band'
- end
-
- it 'should demodulize the type of existing models' do
- object = Pupa::Committee.new
- object.save
- object._type.should == 'committee'
- end
- end