extraloop-redis-storage 0.0.5 → 0.0.6

Sign up to get free protection for your applications and to get access to all the features.
@@ -1,3 +1,10 @@
1
+ == 0.0.6 / 2012-26-02
2
+
3
+ * Added CSV export
4
+
5
+ == 0.0.5 / 2012-26-02
6
+ * added command line interface for listing, deleting and exporing stored records.
7
+
1
8
  == 0.0.2 / 2012-20-02
2
9
  * Fixed scope bug
3
10
 
@@ -3,8 +3,9 @@
3
3
  == Description
4
4
 
5
5
  Persistence layer for the {ExtraLoop}[https://github.com/afiore/extraloop] data extraction toolkit.
6
- The module is implemented as a small wrapper built on top of {Ohm}[http://ohm.keyvalue.org], an object-hash mapping library which
7
- makes storing and managing Ruby objects in Redis easy and convenient.
6
+ This module is implemented as a wrapper around {Ohm}[http://ohm.keyvalue.org], an object-hash mapping library which
7
+ makes easy storing structured data into Redis. It comes with a convinent command line tool, which allows to
8
+ list, filter, delete, and export harvested datasets.
8
9
 
9
10
  == Installation
10
11
 
@@ -44,11 +45,34 @@ session instance:
44
45
  reviews = scraper.session.records AmazonReview
45
46
 
46
47
 
47
- == The #set_storage method
48
+ === The #set_storage method
48
49
 
49
50
  The +set_storage+ method can be called with the following arguments:
50
51
 
51
52
  * _model_ A Ruby constant specifying the model to be used for storing the extracted data .
52
53
  * _session_title_ A human readable title for the extracted dataset (optional).
53
54
 
55
+ == Command line interface
54
56
 
57
+ Once installed, the gem will also add to your system path the +extraloop+ executable, a command line interface to the datasets harvested through extraloop.
58
+ A list of datasets can be obtained by running:
59
+
60
+ extraloop datastore list:
61
+
62
+ This will generate a table like the following one:
63
+ <code>
64
+ id | title | model | records
65
+ --------------------------------------------------------------------
66
+ 48 | 1330106699 GoogleNewsStory Dataset | GoogleNewsStory | 110
67
+ 49 | 1330106948 AmazonReview Dataset | AmazonReview | 0
68
+ 51 | 1330107087 GoogleNewsStory Dataset | GoogleNewsStory | 110
69
+ 52 | 1330111630 AmazonReview Dataset | AmazonReview | 10
70
+ </code>
71
+
72
+ Datasets can by removed using the +delete+ subcommand:
73
+
74
+ extraloop datastore delete [id]
75
+
76
+ Where +id+ is either a single scraping session id, or a session id range (e.g. 48..52).
77
+ Finally, the +export+ subcommand allows to export one or several datasets into a JSON or CSV documents.
78
+ Please refer to the executable inline help (<code>extraloop datastore help [command]</code>) for more usage information.
@@ -34,7 +34,7 @@ class DataStoreCommand < Thor
34
34
 
35
35
  desc "export [sessions]", "Export datasets by session id or session id range"
36
36
 
37
- method_option :format, :default => 'json', :type => 'string'
37
+ method_option :format, :default => 'json', :type => 'string', :aliases => "-f"
38
38
  method_option :directory, :default => "#{Time.now.to_i}-export", :aliases => "-o"
39
39
  method_option :require, :type => 'array', :aliases => "-r"
40
40
 
@@ -43,8 +43,8 @@ class DataStoreCommand < Thor
43
43
  require_files paths if paths.any?
44
44
  format = options[:format]
45
45
  dir = options[:directory]
46
-
47
- raise FormatNotImplemented.new unless %w[json].include? format
46
+ exception = DataStoreCommand::Exceptions::FormatNotImplemented.new "Format not supported #{format}"
47
+ raise exception unless %w[json csv].include? format
48
48
  FileUtils.mkdir(dir) unless File.exists? dir
49
49
 
50
50
  (filter sessions).each do |session|
@@ -25,6 +25,8 @@ module ExtraLoop
25
25
  end
26
26
  end
27
27
 
28
+ autoload :CSV, 'csv'
29
+ autoload :Iconv, 'iconv'
28
30
  ExtraLoop::Storage.autoload :Record, "#{base_path}/redis-storage/record.rb"
29
31
  ExtraLoop::Storage.autoload :ScrapingSession, "#{base_path}/redis-storage/scraping_session.rb"
30
32
  ExtraLoop::Storage.autoload :Model, "#{base_path}/redis-storage/model.rb"
@@ -1,11 +1,15 @@
1
1
  class ExtraLoop::Storage::ScrapingSession < Ohm::Model
2
2
 
3
+ BOM = "\377\376" #Byte Order Mark
4
+
3
5
  include Ohm::Boundaries
4
6
  include Ohm::Timestamping
5
7
  include Ohm::Callbacks
6
8
 
7
9
  attribute :title
8
10
  reference :model, ExtraLoop::Storage::Model
11
+
12
+
9
13
 
10
14
  def records(params={})
11
15
  klass = if Object.const_defined?(model.name)
@@ -47,5 +51,9 @@ class ExtraLoop::Storage::ScrapingSession < Ohm::Model
47
51
  end
48
52
 
49
53
  def to_csv
54
+ _records = Array records.all.map &:to_hash
55
+ header = _records.first && _records.first.keys.map(&:to_s)
56
+ data = [header].concat _records.map(&:values)
57
+ output = data.map { |cells| CSV.generate_line cells }.join
50
58
  end
51
59
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: extraloop-redis-storage
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.5
4
+ version: 0.0.6
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -13,7 +13,7 @@ date: 2012-02-26 00:00:00.000000000Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: extraloop
16
- requirement: &16543420 !ruby/object:Gem::Requirement
16
+ requirement: &10773840 !ruby/object:Gem::Requirement
17
17
  none: false
18
18
  requirements:
19
19
  - - ~>
@@ -21,10 +21,10 @@ dependencies:
21
21
  version: 0.0.3
22
22
  type: :runtime
23
23
  prerelease: false
24
- version_requirements: *16543420
24
+ version_requirements: *10773840
25
25
  - !ruby/object:Gem::Dependency
26
26
  name: ohm
27
- requirement: &16542960 !ruby/object:Gem::Requirement
27
+ requirement: &10773380 !ruby/object:Gem::Requirement
28
28
  none: false
29
29
  requirements:
30
30
  - - ~>
@@ -32,10 +32,10 @@ dependencies:
32
32
  version: 0.1.3
33
33
  type: :runtime
34
34
  prerelease: false
35
- version_requirements: *16542960
35
+ version_requirements: *10773380
36
36
  - !ruby/object:Gem::Dependency
37
37
  name: ohm-contrib
38
- requirement: &16542460 !ruby/object:Gem::Requirement
38
+ requirement: &10772860 !ruby/object:Gem::Requirement
39
39
  none: false
40
40
  requirements:
41
41
  - - ~>
@@ -43,10 +43,10 @@ dependencies:
43
43
  version: 0.1.2
44
44
  type: :runtime
45
45
  prerelease: false
46
- version_requirements: *16542460
46
+ version_requirements: *10772860
47
47
  - !ruby/object:Gem::Dependency
48
48
  name: thor
49
- requirement: &16541860 !ruby/object:Gem::Requirement
49
+ requirement: &10772260 !ruby/object:Gem::Requirement
50
50
  none: false
51
51
  requirements:
52
52
  - - =
@@ -54,10 +54,10 @@ dependencies:
54
54
  version: 0.14.6
55
55
  type: :runtime
56
56
  prerelease: false
57
- version_requirements: *16541860
57
+ version_requirements: *10772260
58
58
  - !ruby/object:Gem::Dependency
59
59
  name: rspec
60
- requirement: &16541180 !ruby/object:Gem::Requirement
60
+ requirement: &10771600 !ruby/object:Gem::Requirement
61
61
  none: false
62
62
  requirements:
63
63
  - - ~>
@@ -65,10 +65,10 @@ dependencies:
65
65
  version: 2.7.0
66
66
  type: :development
67
67
  prerelease: false
68
- version_requirements: *16541180
68
+ version_requirements: *10771600
69
69
  - !ruby/object:Gem::Dependency
70
70
  name: rr
71
- requirement: &16540580 !ruby/object:Gem::Requirement
71
+ requirement: &10771000 !ruby/object:Gem::Requirement
72
72
  none: false
73
73
  requirements:
74
74
  - - ~>
@@ -76,10 +76,10 @@ dependencies:
76
76
  version: 1.0.4
77
77
  type: :development
78
78
  prerelease: false
79
- version_requirements: *16540580
79
+ version_requirements: *10771000
80
80
  - !ruby/object:Gem::Dependency
81
81
  name: pry
82
- requirement: &16540100 !ruby/object:Gem::Requirement
82
+ requirement: &10770520 !ruby/object:Gem::Requirement
83
83
  none: false
84
84
  requirements:
85
85
  - - ~>
@@ -87,7 +87,7 @@ dependencies:
87
87
  version: 0.9.7.4
88
88
  type: :development
89
89
  prerelease: false
90
- version_requirements: *16540100
90
+ version_requirements: *10770520
91
91
  description: Redis+Ohm based storage for data sets extracted using the ExtraLoop toolkit.
92
92
  email: andrea.giulio.fiore@googlemail.com
93
93
  executables: