pupa 0.0.11 → 0.0.12
- checksums.yaml +4 -4
- data/.travis.yml +4 -0
- data/README.md +30 -4
- data/lib/pupa.rb +32 -0
- data/lib/pupa/errors.rb +4 -0
- data/lib/pupa/models/concerns/indifferent_access.rb +35 -0
- data/lib/pupa/models/concerns/timestamps.rb +1 -0
- data/lib/pupa/models/foreign_object.rb +28 -0
- data/lib/pupa/models/model.rb +5 -30
- data/lib/pupa/processor.rb +56 -34
- data/lib/pupa/processor/document_store/file_store.rb +11 -9
- data/lib/pupa/processor/document_store/redis_store.rb +5 -5
- data/lib/pupa/processor/middleware/parse_json.rb +2 -2
- data/lib/pupa/processor/persistence.rb +3 -2
- data/lib/pupa/refinements/faraday_middleware.rb +1 -1
- data/lib/pupa/refinements/json-schema.rb +2 -3
- data/lib/pupa/refinements/opencivicdata.rb +42 -0
- data/lib/pupa/runner.rb +1 -1
- data/lib/pupa/version.rb +1 -1
- data/pupa.gemspec +1 -0
- data/spec/models/model_spec.rb +2 -2
- data/spec/processor/document_store/file_store_spec.rb +12 -6
- data/spec/processor/document_store/redis_store_spec.rb +7 -7
- data/spec/processor/persistence_spec.rb +12 -4
- data/spec/processor_spec.rb +84 -15
- data/spec/refinements/opencivicdata_spec.rb +35 -0
- data/spec/spec_helper.rb +12 -0
- metadata +21 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c894758e0375a999ec78a825feb05ff7ec9da66a
+  data.tar.gz: 4f2633cc09888ef1f236cb2c35afdff9681e7924
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7e83f57b09cf99dab424032634193aa17055c22721b5ee558362f5fc07face6d6129b71d431e01f0b2b663aa771f752014ce7129e963d44d47beaa154f770c44
+  data.tar.gz: a51e3cee53727013f60623576af75b7c0ce7c047085be577a5f745fed04babc62ffd64a3c59abf234b909ffca47bc73250645f39eaae0364a89ad131f2593426
data/.travis.yml
CHANGED
data/README.md
CHANGED
@@ -7,6 +7,24 @@

 Pupa.rb is a Ruby 2.0 fork of Sunlight Labs' [Pupa](https://github.com/opencivicdata/pupa). It implements an Extract, Transform and Load (ETL) process to scrape data from online sources, transform it, and write it to a database.

+## What it tries to solve
+
+Pupa.rb's goal is to make scraping less painful by solving common problems:
+
+* If you are updating a database by scraping a website, you can either delete and recreate records, or you can merge the scraped records with the saved records. Pupa.rb offers a simple way to merge records, by using an object's stable properties for identification.
+* If you are scraping a source that references other sources – for example, a committee that references its members – you may want to link the source to its references with foreign keys. Pupa.rb will use whatever identifying information you scrape – for example, the members' names – to fill in the foreign keys for you.
+* Data sources may use different formats in different contexts. Pupa.rb makes it easy to [select scraping methods](https://github.com/opennorth/pupa-ruby#scraping-method-selection) according to criteria, like the year of publication for example.
+* By splitting the scrape (extract) and import (load) steps, it's easier for you and volunteers to start a scraper without any interaction with a database.
+
+In short, Pupa.rb lets you spend more time on the tasks that are unique to your use case, and less time on common tasks like caching, merging and storing data. It also provides helpful features like:
+
+* Logging, to make debugging and monitoring a scraper easier
+* [Automatic response parsing](https://github.com/opennorth/pupa-ruby#automatic-response-parsing) of JSON, XML and HTML
+* Option parsing, to control your scraper from the command-line
+* Object validation, using [JSON Schema](http://json-schema.org/)
+
+Pupa.rb is extensible, so that you can add your own models, parsers, helpers, actions, etc. It also offers several ways to [improve your scraper's performance](https://github.com/opennorth/pupa-ruby#performance).
+
 ## Usage

 You can use Pupa.rb to author scrapers that create people, organizations, memberships and posts according to the [Popolo](http://popoloproject.com/) open government data specification. If you need to scrape other types of data, you can also use your own models with Pupa.rb.
@@ -49,6 +67,18 @@ The [organization.rb](http://opennorth.github.io/pupa-ruby/docs/organization.htm

 JSON parsing is enabled by default. To enable automatic parsing of HTML and XML, require the `nokogiri` and `multi_xml` gems.

+### [OpenCivicData](http://opencivicdata.org/) compatibility
+
+Both Pupa.rb and Sunlight Labs' [Pupa](https://github.com/opencivicdata/pupa) implement models for people, organizations and memberships from the [Popolo](http://popoloproject.com/) open government data specification. Pupa.rb lets you use your own classes, but Pupa only supports a fixed set of classes. A consequence of Pupa.rb's flexibility is that the value of the `_type` property for `Person`, `Organization` and `Membership` objects differs between Pupa.rb and Pupa. Pupa.rb has namespaced types like `pupa/person` – to allow Ruby to load the `Person` class in the `Pupa` module – whereas Pupa has unnamespaced types like `person`.
+
+To save objects to MongoDB with unnamespaced types like Sunlight Labs' Pupa – in order to benefit from other tools in the [OpenCivicData](http://opencivicdata.org/) stack – add this line to the top of your script:
+
+```ruby
+require 'pupa/refinements/opencivicdata'
+```
+
+It is not currently possible to run the `scrape` action with one of Pupa.rb and Pupa, and to then run the `import` action with the other. Both actions must be run by the same library.
+
 ## Performance

 Pupa.rb offers several ways to significantly improve performance.
@@ -157,10 +187,6 @@ The `json-schema` gem is slow compared to, for example, [JSV](https://github.com

 The [pupa-validate](https://npmjs.org/package/pupa-validate) npm package can be used to validate JSON documents using the faster JSV. In an example case, using JSV instead of the `json-schema` gem reduced by half the time to validate 10,000 documents.

-### Parsing JSON
-
-If the rest of your scraper is fast, you may see an improvement by using the `oj` gem. Just `require 'oj'` and Pupa.rb will automatically pick it up, since it uses [MultiJson](https://github.com/intridea/multi_json).
-
 ### Profiling

 You can profile your code using [perftools.rb](https://github.com/tmm1/perftools.rb). First, install the gem:
data/lib/pupa.rb
CHANGED
@@ -1,5 +1,6 @@
 require 'fileutils'
 require 'forwardable'
+require 'json'

 require 'active_support/concern'
 require 'active_support/core_ext/class/attribute'
@@ -13,6 +14,7 @@ require 'pupa/logger'
 require 'pupa/processor'
 require 'pupa/runner'

+require 'pupa/models/concerns/indifferent_access'
 require 'pupa/models/concerns/contactable'
 require 'pupa/models/concerns/identifiable'
 require 'pupa/models/concerns/linkable'
@@ -20,6 +22,7 @@ require 'pupa/models/concerns/nameable'
 require 'pupa/models/concerns/sourceable'
 require 'pupa/models/concerns/timestamps'

+require 'pupa/models/foreign_object'
 require 'pupa/models/model'
 require 'pupa/models/contact_detail_list'
 require 'pupa/models/identifier_list'
@@ -33,3 +36,32 @@ module Pupa
     attr_accessor :session
   end
 end
+
+# ActiveSupport's String methods become bottlenecks once:
+#
+# - HTTP responses are cached in Memcached
+# - JSON documents are dumped to Redis
+# - Redis is pipelined
+# - Validation is skipped
+# - The runner is quiet
+#
+# With these optimizations, in sample scripts, garbage collection and gem
+# requiring take up two-thirds of the running time.
+class String
+  # Alternatively, check if `inflections.acronym_regex` is equal to `/(?=a)b/`.
+  # If so, to skip the substitution, which is guaranteed to fail.
+  #
+  # @see http://api.rubyonrails.org/classes/ActiveSupport/Inflector.html#method-i-underscore
+  def underscore
+    word = gsub('::', '/')
+    # word.gsub!(/(?:([A-Za-z\d])|^)(#{inflections.acronym_regex})(?=\b|[^a-z])/) { "#{$1}#{$1 && '_'}#{$2.downcase}" }
+    word.gsub!(/([A-Z\d]+)([A-Z][a-z])/,'\1_\2')
+    word.gsub!(/([a-z\d])([A-Z])/,'\1_\2')
+    word.tr!("-", "_")
+    word.downcase!
+    word
+  end
+
+  # @see http://api.rubyonrails.org/classes/String.html#method-i-blank-3F
+  alias_method :blank?, :empty?
+end
|
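The simplified `underscore` in this hunk can be tried out without reopening `String`. The following is a minimal sketch, assuming a hypothetical helper named `simple_underscore` (it is not part of the gem) that applies the same substitutions in the same order and, like the patched method, skips the acronym handling:

```ruby
# Hypothetical helper, not part of pupa-ruby: applies the same substitutions
# as the patched String#underscore, but as a plain method.
def simple_underscore(string)
  word = string.gsub('::', '/')                   # namespace separator to path separator
  word.gsub!(/([A-Z\d]+)([A-Z][a-z])/, '\1_\2')   # split an acronym from the word that follows it
  word.gsub!(/([a-z\d])([A-Z])/, '\1_\2')         # split at lowercase-to-uppercase boundaries
  word.tr!('-', '_')
  word.downcase!
  word                                            # the bang methods may return nil, so return word
end

puts simple_underscore('Pupa::Person')
```

Because the bang methods return `nil` when nothing changes, the method returns `word` explicitly instead of relying on the last expression.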
data/lib/pupa/errors.rb
CHANGED
@@ -3,6 +3,10 @@ module Pupa
     # An abstract class from which all Pupa errors inherit.
     class Error < StandardError; end

+    # This error is raised when loading a scraped object from disk if a type is
+    # not set.
+    class MissingObjectTypeError < Error; end
+
     # This error is raised when saving an object to a database if a foreign key
     # cannot be resolved.
     class MissingDatabaseIdError < Error; end
data/lib/pupa/models/concerns/indifferent_access.rb
@@ -0,0 +1,35 @@
+module Pupa
+  module Concerns
+    # Adds private methods for changing hash keys to strings or symbols.
+    module IndifferentAccess
+      extend ActiveSupport::Concern
+
+      private
+
+      def transform_keys(object, meth)
+        case object
+        when Hash
+          {}.tap do |hash|
+            object.each do |key,value|
+              hash[key.send(meth)] = transform_keys(value, meth)
+            end
+          end
+        when Array
+          object.map do |value|
+            transform_keys(value, meth)
+          end
+        else
+          object
+        end
+      end
+
+      def symbolize_keys(object)
+        transform_keys(object, :to_sym)
+      end
+
+      def stringify_keys(object)
+        transform_keys(object, :to_s)
+      end
+    end
+  end
+end
|
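The recursive key transformation that this release moves into the `IndifferentAccess` concern can be illustrated on its own. This is a sketch only, assuming a hypothetical `KeyTransform` module (not in the gem) so that it runs without ActiveSupport:

```ruby
# Hypothetical standalone module, not part of pupa-ruby: walks nested hashes
# and arrays, converting every hash key with the given method name.
module KeyTransform
  module_function

  def transform_keys(object, meth)
    case object
    when Hash
      object.each_with_object({}) do |(key, value), hash|
        hash[key.send(meth)] = transform_keys(value, meth)
      end
    when Array
      object.map { |value| transform_keys(value, meth) }
    else
      object # scalars are returned unchanged
    end
  end
end

p KeyTransform.transform_keys({'name' => 'new', 'links' => [{'url' => 'http://example.com'}]}, :to_sym)
```

Values that are neither hashes nor arrays pass through untouched, so only keys change.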
data/lib/pupa/models/concerns/timestamps.rb
CHANGED
@@ -16,6 +16,7 @@ module Pupa

         set_callback(:save, :before) do |object|
           # The object may not set created_at.
+          # @see https://github.com/opennorth/pupa-ruby/issues/17
           object.created_at = object.document['created_at'] if object.document
           object.updated_at = Time.now.utc
         end
data/lib/pupa/models/foreign_object.rb
@@ -0,0 +1,28 @@
+module Pupa
+  # A minimal model for a foreign object.
+  class ForeignObject
+    extend Forwardable
+    include Concerns::IndifferentAccess
+
+    attr_reader :attributes, :foreign_keys
+
+    def_delegators :@attributes, :[], :[]=
+
+    def initialize(properties = {})
+      hash = symbolize_keys(properties)
+      value = hash.delete(:foreign_keys) || {}
+      @attributes = hash.merge(value)
+      @foreign_keys = value.keys
+    end
+
+    def to_h
+      {}.tap do |hash|
+        attributes.each do |property,value|
+          if value == false || value.present?
+            hash[property] = value
+          end
+        end
+      end
+    end
+  end
+end
|
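To see how `ForeignObject` folds its nested `foreign_keys` into ordinary attributes, here is a rough stand-in written in plain Ruby. The class name `ForeignObjectSketch` and the sample property names are hypothetical, and the emptiness check only approximates ActiveSupport's `present?`:

```ruby
# Hypothetical stand-in for Pupa::ForeignObject, using only plain Ruby.
class ForeignObjectSketch
  attr_reader :attributes, :foreign_keys

  def initialize(properties = {})
    hash = symbolize(properties)
    nested = symbolize(hash.delete(:foreign_keys) || {})
    @attributes = hash.merge(nested) # foreign keys become regular attributes
    @foreign_keys = nested.keys      # but their names are remembered
  end

  # Approximation of the original `to_h`: drops nil and empty values, keeps false.
  def to_h
    attributes.reject do |_, value|
      value.nil? || (value.respond_to?(:empty?) && value.empty?)
    end
  end

  private

  # Shallow key symbolization; the gem's version is recursive.
  def symbolize(hash)
    hash.each_with_object({}) { |(key, value), memo| memo[key.to_sym] = value }
  end
end

object = ForeignObjectSketch.new('name' => 'Finance', 'foreign_keys' => {'parent_id' => 'abc'})
p object.foreign_keys
p object.to_h
```

Keeping the key names in `foreign_keys` is what lets the processor run the same foreign-key resolution on a foreign object as on a scraped object.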
data/lib/pupa/models/model.rb
CHANGED
@@ -17,6 +17,8 @@ module Pupa

     included do
       include ActiveSupport::Callbacks
+      include Concerns::IndifferentAccess
+
       define_callbacks :create, :save

       class_attribute :json_schema
@@ -77,13 +79,14 @@ module Pupa
       # Sets the class' schema.
       #
       # @param [Hash,String] value a hash or a relative or absolute path
+      # @note `JSON::Validator#initialize_schema` runs fastest if given a hash.
       def schema=(value)
         self.json_schema = if Hash === value
           value
         elsif Pathname.new(value).absolute?
-          File.read(value)
+          JSON.load(File.read(value))
         else
-          File.read(File.expand_path(File.join('..', '..', '..', 'schemas', "#{value}.json"), __dir__))
+          JSON.load(File.read(File.expand_path(File.join('..', '..', '..', 'schemas', "#{value}.json"), __dir__)))
         end
       end
     end
@@ -166,7 +169,6 @@ module Pupa
     # @raises [JSON::Schema::ValidationError] if the object is invalid
     def validate!
       if self.class.json_schema
-        # JSON::Validator#initialize_schema runs fastest if given a hash.
         JSON::Validator.validate!(self.class.json_schema, stringify_keys(to_h(persist: true)))
       end
     end
@@ -200,32 +202,5 @@ module Pupa
       b.delete(:_id)
       a == b
     end
-
-  private
-
-    def transform_keys(object, meth)
-      case object
-      when Hash
-        {}.tap do |hash|
-          object.each do |key,value|
-            hash[key.send(meth)] = transform_keys(value, meth)
-          end
-        end
-      when Array
-        object.map do |value|
-          transform_keys(value, meth)
-        end
-      else
-        object
-      end
-    end
-
-    def symbolize_keys(object)
-      transform_keys(object, :to_sym)
-    end
-
-    def stringify_keys(object)
-      transform_keys(object, :to_s)
-    end
   end
 end
data/lib/pupa/processor.rb
CHANGED
@@ -151,7 +151,10 @@ module Pupa
           object = objects[id]
           resolve_foreign_keys(object, object_id_to_database_id)
           # The dependency graph strategy only works if there are no foreign objects.
-
+
+          database_id = import_object(object)
+          object_id_to_database_id[id] = database_id
+          object_id_to_database_id[database_id] = database_id
         end
       else
         size = objects.size
@@ -167,23 +170,16 @@ module Pupa
           progress_made = false

           objects.delete_if do |id,object|
-
-
-            resolvable &= object.foreign_keys.all? do |property|
-              value = object[property]
-              value.nil? || object_id_to_database_id.key?(value)
-            end
-
-            resolvable &= object.foreign_objects.all? do |property|
-              selector = object[property]
-              selector.blank? || Persistence.find(selector)
-            end
-
-            if resolvable
-              progress_made = true
+            begin
               resolve_foreign_keys(object, object_id_to_database_id)
-              resolve_foreign_objects(object)
-
+              resolve_foreign_objects(object, object_id_to_database_id)
+              progress_made = true
+
+              database_id = import_object(object)
+              object_id_to_database_id[id] = database_id
+              object_id_to_database_id[database_id] = database_id
+            rescue Pupa::Errors::MissingDatabaseIdError
+              false
             end
           end

@@ -191,14 +187,16 @@ module Pupa
         end

         unless objects.empty?
-          raise Errors::UnprocessableEntity, "couldn't resolve #{objects.size}/#{size} objects:\n #{objects.values.map{|object|
+          raise Errors::UnprocessableEntity, "couldn't resolve #{objects.size}/#{size} objects:\n #{objects.values.map{|object| JSON.dump(object.foreign_properties)}.join("\n ")}"
         end
       end

       # Ensure that fingerprints uniquely identified objects.
       counts = {}
       object_id_to_database_id.each do |object_id,database_id|
-
+        unless object_id == database_id
+          (counts[database_id] ||= []) << object_id
+        end
       end
       duplicates = counts.select do |_,object_ids|
         object_ids.size > 1
@@ -251,13 +249,28 @@ module Pupa
     # @return [Hash] a hash of scraped objects keyed by ID
     def load_scraped_objects
       {}.tap do |objects|
-        @store.read_multi(@store.entries).each do |
-          object =
+        @store.read_multi(@store.entries).each do |properties|
+          object = load_scraped_object(properties)
           objects[object._id] = object
         end
       end
     end

+    # Loads a scraped object from its properties.
+    #
+    # @param [Hash] properties the object's properties
+    # @return [Object] a scraped object
+    # @raises [Pupa::Errors::MissingObjectTypeError] if the scraped object is
+    #   missing a `_type` property.
+    def load_scraped_object(properties)
+      type = properties['_type'] || properties[:_type]
+      if type
+        type.camelize.constantize.new(properties)
+      else
+        raise Errors::MissingObjectTypeError, "missing _type: #{JSON.dump(properties)}"
+      end
+    end
+
     # Removes all duplicate objects and re-assigns any foreign keys.
     #
     # @param [Hash] objects a hash of scraped objects keyed by ID
@@ -341,17 +354,17 @@ module Pupa
     #
     # @param [Object] an object
     # @param [Hash] a map from object ID to database ID
-    # @raises [Pupa::Errors::MissingDatabaseIdError]
+    # @raises [Pupa::Errors::MissingDatabaseIdError] if a foreign key cannot be
+    #   resolved
     def resolve_foreign_keys(object, map)
       object.foreign_keys.each do |property|
         value = object[property]
         if value
-
-
-
-
-
-          object[property] = map[value]
+          if map.key?(value)
+            object[property] = map[value]
+          else
+            raise Errors::MissingDatabaseIdError, "couldn't resolve foreign key: #{property} #{value}"
+          end
         end
       end
     end
@@ -359,13 +372,22 @@ module Pupa
     # Resolves an object's foreign objects to database IDs.
     #
     # @param [Object] object an object
-    # @
-
+    # @param [Hash] a map from object ID to database ID
+    # @raises [Pupa::Errors::MissingDatabaseIdError] if a foreign object cannot
+    #   be resolved
+    def resolve_foreign_objects(object, map)
       object.foreign_objects.each do |property|
-
-        if
-
-
+        value = object[property]
+        if value.present?
+          foreign_object = ForeignObject.new(value)
+          resolve_foreign_keys(foreign_object, map)
+          document = Persistence.find(foreign_object.to_h)
+
+          if document
+            object["#{property}_id"] = document['_id']
+          else
+            raise Errors::MissingDatabaseIdError, "couldn't resolve foreign object: #{property} #{value}"
+          end
         end
       end
     end
|
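The fallback import strategy in this file retries unresolved objects in passes and gives up once a pass makes no progress. A minimal sketch of that loop follows, assuming a hypothetical `import_in_passes` method and a simplified input where each object is represented only by the IDs it depends on. Database IDs are faked as strings, and `ArgumentError` stands in for the gem's `UnprocessableEntity`:

```ruby
# Hypothetical sketch, not the gem's implementation: `objects` maps an object
# ID to the list of object IDs it references.
def import_in_passes(objects)
  imported = {}         # object ID => (fake) database ID
  pending = objects.dup

  loop do
    progress_made = false

    pending.delete_if do |id, dependencies|
      if dependencies.all? { |dependency| imported.key?(dependency) }
        imported[id] = "db-#{id}" # stand-in for saving to the database
        progress_made = true      # truthy, so the entry is deleted
      else
        false                     # keep the entry for the next pass
      end
    end

    break if pending.empty? || !progress_made
  end

  unless pending.empty?
    raise ArgumentError, "couldn't resolve #{pending.size}/#{objects.size} objects"
  end

  imported
end

p import_in_passes('membership' => ['person', 'organization'], 'person' => [], 'organization' => [])
```

Stopping when a pass resolves nothing is what prevents an infinite loop on circular or dangling references.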
data/lib/pupa/processor/document_store/file_store.rb
CHANGED
@@ -16,7 +16,7 @@ module Pupa
         # @param [String] name a key
         # @return [Boolean] whether the store contains an entry for the given key
         def exist?(name)
-          File.exist?(
+          File.exist?(path(name))
         end

         # Returns all file names in the storage directory.
@@ -33,8 +33,8 @@ module Pupa
         # @param [String] name a key
         # @return [Hash] the value of the given key
         def read(name)
-          File.open(
-
+          File.open(path(name)) do |f|
+            Oj.load(f)
           end
         end

@@ -53,8 +53,8 @@ module Pupa
         # @param [String] name a key
         # @param [Hash] value a value
         def write(name, value)
-          File.open(
-            f.write(
+          File.open(path(name), 'w') do |f|
+            f.write(Oj.dump(value, mode: :compat, time_format: :ruby))
           end
         end

@@ -83,7 +83,7 @@ module Pupa
         #
         # @param [String] name a key
         def delete(name)
-          File.delete(
+          File.delete(path(name))
         end

         # Deletes all files in the storage directory.
@@ -98,9 +98,11 @@ module Pupa
           yield
         end

-
-
-
+        # Returns the path to the file with the given name.
+        #
+        # @param [String] name a key
+        # @param [String] a path
+        def path(name)
           File.join(@output_dir, name)
         end
       end
data/lib/pupa/processor/document_store/redis_store.rb
CHANGED
@@ -42,7 +42,7 @@ module Pupa
         # @param [String] name a key
         # @return [Hash] the value of the given key
         def read(name)
-
+          Oj.load(@redis.get(name))
         end

         # Returns, as JSON, the values of the given keys.
@@ -50,7 +50,7 @@ module Pupa
         # @param [String] names keys
         # @return [Array<Hash>] the values of the given keys
         def read_multi(names)
-          @redis.mget(*names).map{|value|
+          @redis.mget(*names).map{|value| Oj.load(value)}
         end

         # Writes, as JSON, the value to a key.
@@ -58,7 +58,7 @@ module Pupa
         # @param [String] name a key
         # @param [Hash] value a value
         def write(name, value)
-          @redis.set(name,
+          @redis.set(name, Oj.dump(value, mode: :compat, time_format: :ruby))
         end

         # Writes, as JSON, the value to a key, unless the key exists.
@@ -67,7 +67,7 @@ module Pupa
         # @param [Hash] value a value
         # @return [Boolean] whether the key was set
         def write_unless_exists(name, value)
-          @redis.setnx(name,
+          @redis.setnx(name, Oj.dump(value, mode: :compat, time_format: :ruby))
         end

         # Writes, as JSON, the values to keys.
@@ -77,7 +77,7 @@ module Pupa
           args = []
           pairs.each do |key,value|
             args << key
-            args <<
+            args << Oj.dump(value, mode: :compat, time_format: :ruby)
           end
           @redis.mset(*args)
         end
data/lib/pupa/processor/middleware/parse_json.rb
CHANGED
@@ -5,10 +5,10 @@ module Pupa
       #
       # @see https://github.com/lostisland/faraday_middleware/issues/30#issuecomment-4706892
       class ParseJson < FaradayMiddleware::ResponseMiddleware
-        dependency '
+        dependency 'oj'

         define_parser do |body|
-
+          Oj.load(body) unless body.strip.empty?
         end
       end
     end
data/lib/pupa/processor/persistence.rb
CHANGED
@@ -24,7 +24,7 @@ module Pupa
         when 1
           query.first
         else
-          raise Errors::TooManyMatches, "selector matches multiple documents during find: #{collection_name} #{
+          raise Errors::TooManyMatches, "selector matches multiple documents during find: #{collection_name} #{JSON.dump(selector)}"
         end
       end

@@ -47,13 +47,14 @@ module Pupa
           end
         when 1
           # Make the document available to the callbacks.
+          # @see https://github.com/opennorth/pupa-ruby/issues/17
           @object.document = query.first
           @object.run_callbacks(:save) do
             query.update(@object.to_h(persist: true).except(:_id))
             [false, @object.document['_id'].to_s]
           end
         else
-          raise Errors::TooManyMatches, "selector matches multiple documents during save: #{collection_name} #{
+          raise Errors::TooManyMatches, "selector matches multiple documents during save: #{collection_name} #{JSON.dump(selector)} for #{@object._id}"
         end
       end

data/lib/pupa/refinements/json-schema.rb
CHANGED
@@ -2,9 +2,8 @@ require 'mail'

 module Pupa
   module Refinements
-    #
-    #
-    # be used with `prepend`.
+    # Validates "email" and "uri" formats. Using Ruby's refinements doesn't seem
+    # to work, possibly because `refine` can't be used with `prepend`.
     module FormatAttribute
       # @see http://my.rails-royce.org/2010/07/21/email-validation-in-ruby-on-rails-without-regexp/
       def validate(current_schema, data, fragments, processor, validator, options = {})
data/lib/pupa/refinements/opencivicdata.rb
@@ -0,0 +1,42 @@
+# @see https://github.com/opennorth/pupa-ruby#opencivicdata-compatibility
+
+module Pupa::Model
+  # This unfortunately won't cause the behavior of any model that has already
+  # included `Pupa::Model` to change.
+  class << self
+    def append_features(base)
+      if base.instance_variable_defined?("@_dependencies")
+        base.instance_variable_get("@_dependencies") << self
+        return false
+      else
+        return false if base < self
+        @_dependencies.each { |dep| base.send(:include, dep) }
+        super
+        base.extend const_get("ClassMethods") if const_defined?("ClassMethods")
+        base.class_eval(&@_included_block) if instance_variable_defined?("@_included_block")
+        base.class_eval do # XXX
+          set_callback(:save, :before) do |object|
+            object._type = object._type.camelize.demodulize.underscore
+          end
+        end
+      end
+    end
+  end
+end
+
+# `set_callback` is called by `class_eval` in `ActiveSupport::Concern`. Without
+# monkey-patching `ActiveSupport::Concern`, we can either iterate `ObjectSpace`,
+# implement something like ActiveSupport's `DescendantsTracker` for inclusion
+# instead of inheritance, or go back to `Pupa::Model` being a superclass instead
+# of a mixin to take advantage of `DescendantsTracker` itself.
+#
+# Instead of adding a callback, we can override `to_h` when `persist` is `true`.
+ObjectSpace.each_object(Class) do |base|
+  if base.include?(Pupa::Model)
+    base.class_eval do
+      set_callback(:save, :before) do |object|
+        object._type = object._type.camelize.demodulize.underscore
+      end
+    end
+  end
+end
|
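The before-save callback added by this refinement rewrites a namespaced `_type` such as `pupa/person` to the unnamespaced `person`. The sketch below approximates that effect in plain Ruby, without ActiveSupport. The helper name `unnamespaced_type` and the sample document are hypothetical, and the helper only handles types that are already in underscored path form:

```ruby
# Hypothetical helper, not part of pupa-ruby: keeps only the last path segment,
# which matches `camelize.demodulize.underscore` for values like 'pupa/person'.
def unnamespaced_type(type)
  type.rpartition('/').last
end

# A made-up document, as it might look just before being saved.
document = {'_type' => 'pupa/person', 'name' => 'Jane Doe'}
document['_type'] = unnamespaced_type(document['_type'])

p document
```

A type with no namespace is returned unchanged, so applying the helper twice is harmless.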
data/lib/pupa/runner.rb
CHANGED
data/lib/pupa/version.rb
CHANGED
data/pupa.gemspec
CHANGED
@@ -22,6 +22,7 @@ Gem::Specification.new do |s|
   s.add_runtime_dependency('json-schema', '~> 2.1.3')
   s.add_runtime_dependency('mail')
   s.add_runtime_dependency('moped', '~> 1.5.1')
+  s.add_runtime_dependency('oj', '~> 2.1')

   s.add_development_dependency('coveralls')
   s.add_development_dependency('dalli')
data/spec/models/model_spec.rb
CHANGED
@@ -91,12 +91,12 @@ describe Pupa::Model do

     it 'should accept an absolute path' do
       File.should_receive(:read).and_return('{}')
-      klass_with_absolute_path.json_schema.should ==
+      klass_with_absolute_path.json_schema.should == {}
     end

     it 'should accept a relative path' do
       File.should_receive(:read).and_return('{}')
-      klass_with_relative_path.json_schema.should ==
+      klass_with_relative_path.json_schema.should == {}
     end
   end

data/spec/processor/document_store/file_store_spec.rb
CHANGED
@@ -36,7 +36,7 @@ describe Pupa::Processor::DocumentStore::FileStore do
   describe '#write' do
     it 'should write an entry with the given value for the given key' do
       store.exist?('new.json').should == false
-      store.write('new.json', {
+      store.write('new.json', {name: 'new'})
       store.read('new.json').should == {'name' => 'new'}
       store.delete('new.json') # cleanup
     end
@@ -45,13 +45,13 @@ describe Pupa::Processor::DocumentStore::FileStore do
   describe '#write_unless_exists' do
     it 'should write an entry with the given value for the given key' do
       store.exist?('new.json').should == false
-      store.write_unless_exists('new.json', {
+      store.write_unless_exists('new.json', {name: 'new'}).should == true
       store.read('new.json').should == {'name' => 'new'}
       store.delete('new.json') # cleanup
     end

     it 'should not write an entry with the given value for the given key if the key exists' do
-      store.write_unless_exists('foo.json', {
+      store.write_unless_exists('foo.json', {name: 'new'}).should == false
       store.read('foo.json').should == {'name' => 'foo'}
     end
   end
@@ -60,7 +60,7 @@ describe Pupa::Processor::DocumentStore::FileStore do
     it 'should write entries with the given values for the given keys' do
       pairs = {}
       %w(new1 new2).each do |name|
-        pairs["#{name}.json"] = {
+        pairs["#{name}.json"] = {name: name}
       end

       pairs.keys.each do |name|
@@ -76,7 +76,7 @@ describe Pupa::Processor::DocumentStore::FileStore do

   describe '#delete' do
     it 'should delete an entry with the given key from the store' do
-      store.write('new.json', {
+      store.write('new.json', {name: 'new'})
       store.exist?('new.json').should == true
       store.delete('new.json')
       store.exist?('new.json').should == false
@@ -90,8 +90,14 @@ describe Pupa::Processor::DocumentStore::FileStore do
       store.entries.should == []

       %w(bar baz foo).each do |name| # cleanup
-        store.write("#{name}.json", {
+        store.write("#{name}.json", {name: name})
       end
     end
   end
+
+  describe '#path' do
+    it 'should return the file path to the entry' do
+      store.path('foo').should == File.expand_path(File.join('..', '..', 'fixtures', 'foo'), __dir__)
+    end
+  end
 end
@@ -8,7 +8,7 @@ describe Pupa::Processor::DocumentStore::RedisStore do
|
|
8
8
|
before :all do
|
9
9
|
store.clear
|
10
10
|
%w(foo bar baz).each do |name|
|
11
|
-
store.write("#{name}.json", {
|
11
|
+
store.write("#{name}.json", {name: name})
|
12
12
|
end
|
13
13
|
end
|
14
14
|
|
@@ -43,7 +43,7 @@ describe Pupa::Processor::DocumentStore::RedisStore do
|
|
43
43
|
describe '#write' do
|
44
44
|
it 'should write an entry with the given value for the given key' do
|
45
45
|
store.exist?('new.json').should == false
|
46
|
-
store.write('new.json', {
|
46
|
+
store.write('new.json', {name: 'new'})
|
47
47
|
store.read('new.json').should == {'name' => 'new'}
|
48
48
|
store.delete('new.json') # cleanup
|
49
49
|
end
|
@@ -52,13 +52,13 @@ describe Pupa::Processor::DocumentStore::RedisStore do
|
|
52
52
|
describe '#write_unless_exists' do
|
53
53
|
it 'should write an entry with the given value for the given key' do
|
54
54
|
store.exist?('new.json').should == false
|
55
|
-
store.write_unless_exists('new.json', {
|
55
|
+
store.write_unless_exists('new.json', {name: 'new'}).should == true
|
56
56
|
store.read('new.json').should == {'name' => 'new'}
|
57
57
|
store.delete('new.json') # cleanup
|
58
58
|
end
|
59
59
|
|
60
60
|
it 'should not write an entry with the given value for the given key if the key exists' do
|
61
|
-
store.write_unless_exists('foo.json', {
|
61
|
+
store.write_unless_exists('foo.json', {name: 'new'}).should == false
|
62
62
|
store.read('foo.json').should == {'name' => 'foo'}
|
63
63
|
end
|
64
64
|
end
|
@@ -67,7 +67,7 @@ describe Pupa::Processor::DocumentStore::RedisStore do
|
|
67
67
|
it 'should write entries with the given values for the given keys' do
|
68
68
|
pairs = {}
|
69
69
|
%w(new1 new2).each do |name|
|
70
|
-
pairs["#{name}.json"] = {
|
70
|
+
pairs["#{name}.json"] = {name: name}
|
71
71
|
end
|
72
72
|
|
73
73
|
pairs.keys.each do |name|
|
@@ -83,7 +83,7 @@ describe Pupa::Processor::DocumentStore::RedisStore do

   describe '#delete' do
     it 'should delete an entry with the given key from the store' do
-      store.write('new.json', {
+      store.write('new.json', {name: 'new'})
       store.exist?('new.json').should == true
       store.delete('new.json')
       store.exist?('new.json').should == false
@@ -97,7 +97,7 @@ describe Pupa::Processor::DocumentStore::RedisStore do
       store.entries.should == []

       %w(bar baz foo).each do |name| # cleanup
-        store.write("#{name}.json", {
+        store.write("#{name}.json", {name: name})
       end
     end
   end
data/spec/processor/persistence_spec.rb
CHANGED
@@ -1,6 +1,14 @@
 require File.expand_path(File.dirname(__FILE__) + '/../spec_helper')

 describe Pupa::Processor::Persistence do
+  def _type
+    if testing_python_compatibility?
+      'person'
+    else
+      'pupa/person'
+    end
+  end
+
   before :all do
     Pupa.session = Moped::Session.new(['localhost:27017'], database: 'pupa_test')
     Pupa.session.collections.each(&:drop)
@@ -13,11 +21,11 @@ describe Pupa::Processor::Persistence do

   describe '.find' do
     it 'should return nil if no matches' do
-      Pupa::Processor::Persistence.find(_type:
+      Pupa::Processor::Persistence.find(_type: _type, name: 'nonexistent').should == nil
     end

     it 'should return a document if one match' do
-      Pupa::Processor::Persistence.find(_type:
+      Pupa::Processor::Persistence.find(_type: _type, name: 'existing').should be_a(Hash)
     end

     it 'should raise an error if many matches' do
@@ -28,12 +36,12 @@ describe Pupa::Processor::Persistence do
   describe '#save' do
     it 'should insert a document if no matches' do
       Pupa::Processor::Persistence.new(Pupa::Person.new(_id: 'new', name: 'new', email: 'new@example.com')).save.should == [true, 'new']
-      Pupa::Processor::Persistence.find(_type:
+      Pupa::Processor::Persistence.find(_type: _type, name: 'new')['email'].should == 'new@example.com'
     end

     it 'should update a document if one match' do
       Pupa::Processor::Persistence.new(Pupa::Person.new(_id: 'changed', name: 'existing', email: 'changed@example.com')).save.should == [false, 'existing']
-      Pupa::Processor::Persistence.find(_type:
+      Pupa::Processor::Persistence.find(_type: _type, name: 'existing')['email'].should == 'changed@example.com'
     end

     it 'should raise an error if many matches' do
data/spec/processor_spec.rb
CHANGED
@@ -102,6 +102,14 @@ describe Pupa::Processor do
       Pupa.session.collections.each(&:drop)
     end

+    let :_type do
+      if testing_python_compatibility?
+        'organization'
+      else
+        'pupa/organization'
+      end
+    end
+
     let :graphable do
       {
         '1' => Pupa::Organization.new({
@@ -125,7 +133,7 @@ describe Pupa::Processor do
         '4' => Pupa::Organization.new({
           _id: '4',
           name: 'Child',
-          parent: {_type:
+          parent: {_type: _type, name: 'Parent'},
         }),
         '5' => Pupa::Organization.new({
           _id: '5',
@@ -138,6 +146,25 @@ describe Pupa::Processor do
       }
     end

+    let :foreign_keys_on_foreign_objects do
+      {
+        '7' => Pupa::Organization.new({
+          _id: '7',
+          name: 'Child',
+          parent: {_type: _type, name: 'Parent'},
+        }),
+        '8' => Pupa::Organization.new({
+          _id: '8',
+          name: 'Grandchild',
+          parent: {_type: _type, foreign_keys: {parent_id: '9'}}
+        }),
+        '9' => Pupa::Organization.new({
+          _id: '9',
+          name: 'Parent',
+        }),
+      }
+    end
+
     it 'should use a dependency graph if possible' do
       processor.should_receive(:load_scraped_objects).and_return(graphable)

@@ -145,6 +172,16 @@ describe Pupa::Processor do
       processor.import
     end

+    it 'should remove duplicate objects and re-assign foreign keys' do
+      processor.should_receive(:load_scraped_objects).and_return(graphable)
+
+      processor.import
+      documents = Pupa.session[:organizations].find.entries
+      documents.size.should == 2
+      documents[0].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '2', '_type' => _type, 'name' => 'Parent'}
+      documents[1].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '1', '_type' => _type, 'name' => 'Child', 'parent_id' => '2'}
+    end
+
     it 'should not use a dependency graph if not possible' do
       processor.should_receive(:load_scraped_objects).and_return(ungraphable)

@@ -152,24 +189,25 @@ describe Pupa::Processor do
       processor.import
     end

-    it 'should remove duplicate objects and
-      processor.should_receive(:load_scraped_objects).and_return(
+    it 'should remove duplicate objects and resolve foreign objects' do
+      processor.should_receive(:load_scraped_objects).and_return(ungraphable)

       processor.import
       documents = Pupa.session[:organizations].find.entries
       documents.size.should == 2
-      documents[0].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '
-      documents[1].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '
+      documents[0].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '5', '_type' => _type, 'name' => 'Parent'}
+      documents[1].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '4', '_type' => _type, 'name' => 'Child', 'parent_id' => '5'}
     end

-    it 'should resolve foreign objects' do
-      processor.should_receive(:load_scraped_objects).and_return(
+    it 'should resolve foreign keys on foreign objects' do
+      processor.should_receive(:load_scraped_objects).and_return(foreign_keys_on_foreign_objects)

       processor.import
       documents = Pupa.session[:organizations].find.entries
-      documents.size.should ==
-      documents[0].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '
-      documents[1].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '
+      documents.size.should == 3
+      documents[0].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '9', '_type' => _type, 'name' => 'Parent'}
+      documents[1].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '7', '_type' => _type, 'name' => 'Child', 'parent_id' => '9'}
+      documents[2].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '8', '_type' => _type, 'name' => 'Grandchild', 'parent_id' => '7'}
     end

     context 'with existing documents' do
@@ -196,12 +234,13 @@ describe Pupa::Processor do
         }
       end

+      # Use a foreign object to not use a dependency graph.
       let :unresolvable_foreign_key do
         {
           'a' => Pupa::Organization.new({
             _id: 'a',
             name: 'Child',
-            parent: {_type:
+            parent: {_type: _type, name: 'Parent'},
           }),
           'b' => Pupa::Organization.new({
             _id: 'b',
@@ -220,7 +259,7 @@ describe Pupa::Processor do
           'a' => Pupa::Organization.new({
             _id: 'a',
             name: 'Child',
-            parent: {_type:
+            parent: {_type: _type, name: 'Nonexistent'},
           }),
           'b' => Pupa::Organization.new({
             _id: 'b',
@@ -239,7 +278,7 @@ describe Pupa::Processor do
           'a' => Pupa::Organization.new({
             _id: 'a',
             name: 'Child',
-            parent: {_type:
+            parent: {_type: _type, name: 'Parent'},
           }),
           'b' => Pupa::Organization.new({
             _id: 'b',
@@ -253,14 +292,33 @@ describe Pupa::Processor do
         }
       end

+      let :resolvable_foreign_keys_on_foreign_objects do
+        {
+          'a' => Pupa::Organization.new({
+            _id: 'a',
+            name: 'Child',
+            parent: {_type: _type, name: 'Parent'},
+          }),
+          'b' => Pupa::Organization.new({
+            _id: 'b',
+            name: 'Grandchild',
+            parent: {_type: _type, foreign_keys: {parent_id: 'c'}}
+          }),
+          'c' => Pupa::Organization.new({
+            _id: 'c',
+            name: 'Parent',
+          }),
+        }
+      end
+
       it 'should resolve foreign keys' do
         processor.should_receive(:load_scraped_objects).and_return(resolvable_foreign_key)

         processor.import
         documents = Pupa.session[:organizations].find.entries
         documents.size.should == 2
-        documents[0].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '2', '_type' =>
-        documents[1].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '1', '_type' =>
+        documents[0].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '2', '_type' => _type, 'name' => 'Parent'}
+        documents[1].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '1', '_type' => _type, 'name' => 'Child', 'parent_id' => '2'}
       end

       it 'should raise an error if a foreign key cannot be resolved' do
@@ -277,6 +335,17 @@ describe Pupa::Processor do
         processor.should_receive(:load_scraped_objects).and_return(duplicate_documents)
         expect{processor.import}.to raise_error(Pupa::Errors::DuplicateDocumentError)
       end
+
+      it 'should resolve foreign keys on foreign objects' do
+        processor.should_receive(:load_scraped_objects).and_return(resolvable_foreign_keys_on_foreign_objects)
+
+        processor.import
+        documents = Pupa.session[:organizations].find.entries
+        documents.size.should == 3
+        documents[0].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '2', '_type' => _type, 'name' => 'Parent'}
+        documents[1].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => '1', '_type' => _type, 'name' => 'Child', 'parent_id' => '2'}
+        documents[2].slice('_id', '_type', 'name', 'parent_id').should == {'_id' => 'b', '_type' => _type, 'name' => 'Grandchild', 'parent_id' => '1'}
+      end
     end
   end
 end
data/spec/refinements/opencivicdata_spec.rb
ADDED
@@ -0,0 +1,35 @@
+require File.expand_path(File.dirname(__FILE__) + '/../spec_helper')
+
+describe Pupa::Refinements, testing_python_compatibility: true do
+  module Music
+    class Band
+      include Pupa::Model
+
+      def save
+        run_callbacks(:save) do
+        end
+      end
+    end
+  end
+
+  module Pupa
+    class Committee < Organization
+      def save
+        run_callbacks(:save) do
+        end
+      end
+    end
+  end
+
+  it 'should demodulize the type of new models' do
+    object = Music::Band.new
+    object.save
+    object._type.should == 'band'
+  end
+
+  it 'should demodulize the type of existing models' do
+    object = Pupa::Committee.new
+    object.save
+    object._type.should == 'committee'
+  end
+end
|
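The spec above expects `_type` to lose its module prefix ('band', 'committee') when the OpenCivicData refinement is loaded, while the other specs in this diff expect 'pupa/person' and 'pupa/organization' otherwise. A minimal sketch of that naming rule in plain Ruby might look like the following. `type_for` and its `demodulize:` flag are hypothetical names used only for illustration, and the underscore-and-slash default is inferred from the strings in the specs, not taken from the library's source.

```ruby
# Hypothetical helper, not part of Pupa: derives a type string from a class name.
# By default every module segment is kept and joined with '/'; with
# demodulize: true only the last segment (the class itself) is used.
def type_for(class_name, demodulize: false)
  parts = class_name.split('::')
  parts = [parts.last] if demodulize
  # Insert an underscore at each lower-to-upper boundary, then downcase.
  parts.map { |part| part.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase }.join('/')
end

puts type_for('Pupa::Organization')
puts type_for('Pupa::Organization', demodulize: true)
puts type_for('Music::Band', demodulize: true)
```

The two calls on `Pupa::Organization` correspond to the two branches of the `_type` helpers added to the processor specs.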
data/spec/spec_helper.rb
CHANGED
@@ -18,3 +18,15 @@ VCR.configure do |c|
     VCR.use_cassette(Digest::SHA1.hexdigest(request.uri + request.body + request.headers.to_s), &request)
   end
 end
+
+def testing_python_compatibility?
+  ENV['MODE'] == 'compat'
+end
+
+if testing_python_compatibility?
+  require File.dirname(__FILE__) + '/../lib/pupa/refinements/opencivicdata'
+end
+
+RSpec.configure do |c|
+  c.filter_run_excluding :testing_python_compatibility => true unless testing_python_compatibility?
+end
|
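The helper added to `spec/spec_helper.rb` switches the whole suite on the `MODE` environment variable, and the `_type` helpers in the persistence and processor specs branch on it. The sketch below shows that pattern in isolation. `compat_mode?` and `expected_type` are hypothetical names, and passing the environment as an argument is an assumption made here so the behaviour can be exercised without changing the real `ENV`.

```ruby
# Mirrors the check in testing_python_compatibility?, but reads from a
# hash-like argument instead of the global ENV (an assumption of this sketch).
def compat_mode?(env = ENV)
  env['MODE'] == 'compat'
end

# Hypothetical helper: the `_type` value a spec should expect for a model name.
def expected_type(model, env = ENV)
  compat_mode?(env) ? model : "pupa/#{model}"
end

puts expected_type('person', {'MODE' => 'compat'})
puts expected_type('person', {})
```

Keeping the branch in one helper means each expectation can be written once as `'_type' => _type` instead of being duplicated per mode.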
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: pupa
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.12
 platform: ruby
 authors:
 - Open North
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-
+date: 2013-12-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activesupport
@@ -94,6 +94,20 @@ dependencies:
     - - ~>
       - !ruby/object:Gem::Version
         version: 1.5.1
+- !ruby/object:Gem::Dependency
+  name: oj
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: '2.1'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: '2.1'
 - !ruby/object:Gem::Dependency
   name: coveralls
   requirement: !ruby/object:Gem::Requirement
@@ -268,11 +282,13 @@ files:
 - lib/pupa/logger.rb
 - lib/pupa/models/concerns/contactable.rb
 - lib/pupa/models/concerns/identifiable.rb
+- lib/pupa/models/concerns/indifferent_access.rb
 - lib/pupa/models/concerns/linkable.rb
 - lib/pupa/models/concerns/nameable.rb
 - lib/pupa/models/concerns/sourceable.rb
 - lib/pupa/models/concerns/timestamps.rb
 - lib/pupa/models/contact_detail_list.rb
+- lib/pupa/models/foreign_object.rb
 - lib/pupa/models/identifier_list.rb
 - lib/pupa/models/membership.rb
 - lib/pupa/models/model.rb
@@ -294,6 +310,7 @@ files:
 - lib/pupa/processor/yielder.rb
 - lib/pupa/refinements/faraday_middleware.rb
 - lib/pupa/refinements/json-schema.rb
+- lib/pupa/refinements/opencivicdata.rb
 - lib/pupa/runner.rb
 - lib/pupa/version.rb
 - pupa.gemspec
@@ -340,6 +357,7 @@ files:
 - spec/processor/persistence_spec.rb
 - spec/processor/yielder_spec.rb
 - spec/processor_spec.rb
+- spec/refinements/opencivicdata_spec.rb
 - spec/runner_spec.rb
 - spec/spec_helper.rb
 homepage: http://github.com/opennorth/pupa-ruby
@@ -402,5 +420,6 @@ test_files:
 - spec/processor/persistence_spec.rb
 - spec/processor/yielder_spec.rb
 - spec/processor_spec.rb
+- spec/refinements/opencivicdata_spec.rb
 - spec/runner_spec.rb
 - spec/spec_helper.rb