postgresql_cursor 0.2.0

@@ -0,0 +1,5 @@
1
+ README.rdoc
2
+ lib/**/*.rb
3
+ bin/*
4
+ features/**/*.feature
5
+ LICENSE
@@ -0,0 +1,21 @@
1
+ ## MAC OS
2
+ .DS_Store
3
+
4
+ ## TEXTMATE
5
+ *.tmproj
6
+ tmtags
7
+
8
+ ## EMACS
9
+ *~
10
+ \#*
11
+ .\#*
12
+
13
+ ## VIM
14
+ *.swp
15
+
16
+ ## PROJECT::GENERAL
17
+ coverage
18
+ rdoc
19
+ pkg
20
+
21
+ ## PROJECT::SPECIFIC
data/LICENSE ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2009 Allen Fair
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,78 @@
1
+ = PostgreSQLCursor
2
+
3
+ PostgreSQL Cursor is an extension to the ActiveRecord PostgreSQLAdapter for very large result sets.
4
+ It provides a cursor open/fetch/close interface to access data without loading all rows into memory,
5
+ and instead loads the result rows in "chunks" (default of 10_000 rows), buffers them, and returns the
6
+ rows one at a time.
7
+
8
+ For web pages, an application would not want to process a large amount of data, usually employing a
9
+ Pagination scheme to present it to the users. Background processes sometimes need to generate a large
10
+ amount of data, and the ActiveRecord approach to load all data into memory is not the best fit here.
11
+
12
+ Previous solutions employ pagination to fetch each block,
13
+ then re-running the query for the next "page". This gem avoids re-executing
14
+ the query by using the PostgreSQL cursors.
15
+
16
+ Like the #find_by_sql method, #find_with_cursor returns each row as a hash instead of an instantiated
17
+ model class. The rationale for this is performance, though an option to return instances is available.
18
+ Julian's benchmarks showed that returning instances was a factor of 4 slower than returning the hash.
19
+
20
+ ==Installation
21
+ [sudo] gem install postgresql_cursor
22
+
23
+ This does not require Rails to work, just ActiveRecord < 3.0.0 and the 'pg' gem.
24
+
25
+ ==Usage
26
+
27
+ A Rails/ActiveRecord plugin for the PostgreSQL database adapter that will add
28
+ a cursor-backed find_with_cursor() method to process very large result sets.
29
+
30
+ The *find_with_cursor* method uses cursors to pull in one data block (of x records) at a time, and
31
+ return each record as a Hash to a procedural block. When each data block is
32
+ exhausted, it will fetch the next one.
33
+
34
+ *find_by_sql_with_cursor* takes a custom SQL statement and returns each row.
35
+
36
+ ==Examples
37
+
38
+ Account.find_with_cursor(:conditions=>["status = ?", 'good']).each do |row|
39
+ puts row.to_json
40
+ end
41
+
42
+ Account.find_by_sql_with_cursor("select ...", :buffer_size=>1000).each do |row|
43
+ row.process
44
+ end
45
+
46
+ Account.transaction do
47
+ cursor = Account.find_with_cursor(...) { |record| record.symbolize_keys }
48
+ while record = cursor.next do
49
+ record.process # => {:column=>value, ...}
50
+ cursor.close if cursor.count > 1000 # Halts loop after 1000 records
51
+ end
52
+ end
53
+
54
+ Account.find_with_cursor(...) { |record| record.symbolize_keys }.each do |row|
55
+ row.process
56
+ end
57
+
58
+ ==Authors
59
+ Allen Fair, allen.fair@gmail.com, http://github.com/afair
60
+
61
+ Thank you to:
62
+ * Iulian Dogariu, http://github.com/iulianu (Fixes)
63
+ * Julian Mehnle, http://www.mehnle.net (Suggestions)
64
+
65
+
66
+ == Note on Patches/Pull Requests
67
+
68
+ * Fork the project.
69
+ * Make your feature addition or bug fix.
70
+ * Add tests for it. This is important so I don't break it in a
71
+ future version unintentionally.
72
+ * Commit, do not mess with rakefile, version, or history.
73
+ (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
74
+ * Send me a pull request. Bonus points for topic branches.
75
+
76
+ == Copyright
77
+
78
+ Copyright (c) 2010 Allen Fair. See LICENSE for details.
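The README's "load rows in chunks, buffer them, hand them out one at a time" idea can be illustrated without a database. The following is only a minimal sketch, not the gem's implementation: `ChunkedReader` and its `chunk_reads` counter are hypothetical names, and an in-memory Array stands in for the PostgreSQL cursor.

```ruby
# Hypothetical stand-in for a cursor: hands out rows in chunks of a given size.
class ChunkedReader
  include Enumerable

  attr_reader :chunk_reads

  # fetch_chunk: a block taking a row count and returning an Array with
  # up to that many rows (an empty Array signals end of data).
  def initialize(buffer_size = 10_000, &fetch_chunk)
    @buffer_size = buffer_size
    @fetch_chunk = fetch_chunk
    @chunk_reads = 0
  end

  # Yields rows one at a time, asking for a new chunk only when the
  # previous one has been fully consumed.
  def each
    return enum_for(:each) unless block_given?
    loop do
      buffer = @fetch_chunk.call(@buffer_size)
      @chunk_reads += 1
      break if buffer.empty?
      buffer.each { |row| yield row }
    end
  end
end

# Simulated result set of 25 rows. In the gem this role is played by
# a "fetch n from cursor_name" statement sent to PostgreSQL.
rows = (1..25).map { |i| { "id" => i.to_s } }
reader = ChunkedReader.new(10) { |n| rows.shift(n) }
ids = reader.map { |row| row["id"].to_i }
```

With a buffer size of 10, the 25 simulated rows arrive in chunks of 10, 10, and 5, followed by one empty chunk that signals the end of data, so at most 10 rows are held in the buffer at any time.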
@@ -0,0 +1,54 @@
1
+ require 'rubygems'
2
+ require 'rake'
3
+
4
+ begin
5
+ require 'jeweler'
6
+ Jeweler::Tasks.new do |gem|
7
+ gem.name = "postgresql_cursor"
8
+ gem.summary = %Q{ActiveRecord PostgreSQL Adapter extension for using a cursor to return a large result set}
9
+ gem.description = %Q{PostgreSQL Cursor is an extension to the ActiveRecord PostgreSQLAdapter for very large result sets. It provides a cursor open/fetch/close interface to access data without loading all rows into memory, and instead loads the result rows in "chunks" (default of 10_000 rows), buffers them, and returns the rows one at a time.}
10
+ gem.email = "allen.fair@gmail.com"
11
+ gem.homepage = "http://github.com/afair/postgresql_cursor"
12
+ gem.authors = ["Allen Fair"]
13
+ gem.add_dependency 'activerecord', '<=2.3.5'
14
+ gem.add_dependency 'pg'
15
+ # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
16
+ end
17
+ Jeweler::GemcutterTasks.new
18
+ rescue LoadError
19
+ puts "Jeweler (or a dependency) not available. Install it with: gem install jeweler"
20
+ end
21
+
22
+ require 'rake/testtask'
23
+ Rake::TestTask.new(:test) do |test|
24
+ test.libs << 'lib' << 'test'
25
+ test.pattern = 'test/**/test_*.rb'
26
+ test.verbose = true
27
+ end
28
+
29
+ begin
30
+ require 'rcov/rcovtask'
31
+ Rcov::RcovTask.new do |test|
32
+ test.libs << 'test'
33
+ test.pattern = 'test/**/test_*.rb'
34
+ test.verbose = true
35
+ end
36
+ rescue LoadError
37
+ task :rcov do
38
+ abort "RCov is not available. In order to run rcov, you must: sudo gem install spicycode-rcov"
39
+ end
40
+ end
41
+
42
+ task :test => :check_dependencies
43
+
44
+ task :default => :test
45
+
46
+ require 'rake/rdoctask'
47
+ Rake::RDocTask.new do |rdoc|
48
+ version = File.exist?('VERSION') ? File.read('VERSION') : ""
49
+
50
+ rdoc.rdoc_dir = 'rdoc'
51
+ rdoc.title = "postgresql_cursor #{version}"
52
+ rdoc.rdoc_files.include('README*')
53
+ rdoc.rdoc_files.include('lib/**/*.rb')
54
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.2.0
@@ -0,0 +1,121 @@
1
+ gem 'activerecord', '<=2.3.5'
2
+ require 'active_record'
3
+
4
+ # Class to operate a PostgreSQL cursor to buffer a set of rows, and return single rows for processing.
5
+ # Use this class when processing a very large number of records, which would otherwise all be instantiated
6
+ # in memory by find(). This also adds helpers to ActiveRecord::Base for *find_with_cursor()* and
7
+ # *find_by_sql_with_cursor()* to return instances of the cursor ready to fetch.
8
+ #
9
+ # Use each() with a block to accept an instance of the Model (or whatever you define with a block on
10
+ # initialize()). It will open, buffer, yield each row, then close the cursor.
11
+ #
12
+ # PostgreSQL requires that a cursor is executed within a transaction block, which you must provide unless
13
+ # you use each() to run through the result set.
14
+ class PostgreSQLCursor
15
+ attr_reader :count, :buffer_reads
16
+ @@counter=0
17
+
18
+ # Define a new cursor, with a SQL statement, as a string with parameters already replaced, and options for
19
+ # the cursor
20
+ # * :buffer_size=>number of records to buffer, default 10000.
21
+ Pass an optional block which takes a Hash of "column"=>"value", and returns an object to be yielded for each row.
22
+ def initialize(sql,*args, &block)
23
+ @options = args.last.is_a?(Hash) ? args.pop : {}
24
+ @@counter += 1
25
+ @instantiator = block || lambda {|r| r }
26
+ @sql = sql
27
+ @name = "pgcursor_#{@@counter}"
28
+ @connection = ActiveRecord::Base.connection
29
+ @buffer_size = @options[:buffer_size] || 10_000
30
+ @count = 0
31
+ @state = :ready
32
+ end
33
+
34
+ # Iterates through the rows, yields them to the block. It wraps the processing in a transaction
35
+ # (required by PostgreSQL), opens the cursor, buffers the results, returns each row, and closes the cursor.
36
+ def each
37
+ @connection.transaction do
38
+ @result = open
39
+ while (row = fetch ) do
40
+ yield row
41
+ end
42
+ close
43
+ @count
44
+ end
45
+ end
46
+
47
+ # Starts buffered result set processing for a given SQL statement. The DB
48
+ def open
49
+ raise "Open Cursor state not ready" unless @state == :ready
50
+ @result = @connection.execute("declare #{@name} cursor for #{@sql}")
51
+ @state = :empty
52
+ @buffer_reads = 0
53
+ @buffer = nil
54
+ end
55
+
56
+ # Returns a string of the current status
57
+ def status #:nodoc:
58
+ "row=#{@count} buffer=#{@buffer.size} state=#{@state} buffer_size=#{@buffer_size} reads=#{@buffer_reads}"
59
+ end
60
+
61
+ # Fetches the next block of rows into memory
62
+ def fetch_buffer #:nodoc:
63
+ return unless @state == :empty
64
+ @result = @connection.execute("fetch #{@buffer_size} from #{@name}")
65
+ @buffer = @result.collect {|row| row }
66
+ @state = @buffer.size > 0 ? :buffered : :eof
67
+ @buffer_reads += 1
68
+ @buffer
69
+ end
70
+
71
+ # Returns the next row from the cursor, or nil when end of data.
72
+ # The row returned is a hash[:colname]
73
+ def fetch
74
+ open if @state == :ready
75
+ fetch_buffer if @state == :empty
76
+ return nil if @state == :eof || @state == :closed
77
+ @state = :empty if @buffer.size <= 1
78
+ @count+= 1
79
+ row = @buffer.shift
80
+ @instantiator.call(row)
81
+ end
82
+
83
+ alias_method :next, :fetch
84
+
85
+ # Closes the cursor to clean up resources. Call this method during process of each() to
86
+ # exit the loop
87
+ def close
88
+ pg_result = @connection.execute("close #{@name}")
89
+ @state = :closed
90
+ end
91
+
92
+ end
93
+
94
+ class ActiveRecord::Base
95
+ class <<self
96
+
97
+ # Returns a PostgreSQLCursor instance to access the results, on which you are able to call
98
+ # each (though the cursor is not Enumerable and no other methods are available).
99
+ # No :all argument is needed, and other find() options can be specified.
100
+ # Specify the :cursor=>{...} option to override options for the cursor such as :buffer_size=>n.
101
+ # Pass an optional block that takes a Hash of the record and returns what you want to return.
102
+ # For example, return the Hash back to process a Hash instead of a table instance for better speed.
103
+ def find_with_cursor(*args, &block)
104
+ find_options = args.last.is_a?(Hash) ? args.pop : {}
105
+ options = find_options.delete(:cursor) || {}
106
+ validate_find_options(find_options)
107
+ set_readonly_option!(find_options)
108
+ sql = construct_finder_sql(find_options)
109
+ PostgreSQLCursor.new(sql, options) { |r| block_given? ? yield(r) : instantiate(r) }
110
+ end
111
+
112
+ # Returns a PostgreSQLCursor instance to access the results of the sql
113
+ # Specify the :cursor=>{...} option to override options for the cursor such as :buffer_size=>n.
114
+ # Pass an optional block that takes a Hash of the record and returns what you want to return.
115
+ # For example, return the Hash back to process a Hash instead of a table instance for better speed.
116
+ def find_by_sql_with_cursor(sql, options={})
117
+ PostgreSQLCursor.new(sql, options) { |r| block_given? ? yield(r) : instantiate(r) }
118
+ end
119
+
120
+ end
121
+ end
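The class above drives the cursor with three statements (`declare ... cursor for`, `fetch n from`, `close`) inside a transaction. The sketch below replays that sequence against a fake connection so the order of statements is visible. `RecordingConnection` is a hypothetical test double, not part of ActiveRecord or the gem, and the `"begin"`/`"commit"` log entries merely mark where a real adapter would open and end the transaction.

```ruby
# Hypothetical fake connection: records SQL and serves canned rows for FETCH.
class RecordingConnection
  attr_reader :log

  def initialize(rows)
    @rows = rows
    @log = []
  end

  # Marks the transaction boundaries in the log and returns the block's value.
  def transaction
    @log << "begin"
    result = yield
    @log << "commit"
    result
  end

  # Logs every statement; only "fetch n from ..." returns rows.
  def execute(sql)
    @log << sql
    if sql =~ /\Afetch (\d+) from/
      @rows.shift($1.to_i)
    else
      []
    end
  end
end

conn = RecordingConnection.new([{ "id" => "1" }, { "id" => "2" }, { "id" => "3" }])
seen = []
conn.transaction do
  conn.execute("declare pgcursor_1 cursor for select * from records")
  loop do
    buffer = conn.execute("fetch 2 from pgcursor_1")
    break if buffer.empty? # an empty fetch means end of data
    seen.concat(buffer)
  end
  conn.execute("close pgcursor_1")
end
```

For three rows and a buffer size of 2, the log holds one declare, three fetches (the last one empty), and one close, all between the transaction markers.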
@@ -0,0 +1,57 @@
1
+ # Generated by jeweler
2
+ # DO NOT EDIT THIS FILE DIRECTLY
3
+ # Instead, edit Jeweler::Tasks in Rakefile, and run the gemspec command
4
+ # -*- encoding: utf-8 -*-
5
+
6
+ Gem::Specification.new do |s|
7
+ s.name = %q{postgresql_cursor}
8
+ s.version = "0.2.0"
9
+
10
+ s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
11
+ s.authors = ["Allen Fair"]
12
+ s.date = %q{2010-05-17}
13
+ s.description = %q{PostgreSQL Cursor is an extension to the ActiveRecord PostgreSQLAdapter for very large result sets. It provides a cursor open/fetch/close interface to access data without loading all rows into memory, and instead loads the result rows in "chunks" (default of 10_000 rows), buffers them, and returns the rows one at a time.}
14
+ s.email = %q{allen.fair@gmail.com}
15
+ s.extra_rdoc_files = [
16
+ "LICENSE",
17
+ "README.rdoc"
18
+ ]
19
+ s.files = [
20
+ ".document",
21
+ ".gitignore",
22
+ "LICENSE",
23
+ "README.rdoc",
24
+ "Rakefile",
25
+ "VERSION",
26
+ "lib/postgresql_cursor.rb",
27
+ "postgresql_cursor.gemspec",
28
+ "test/helper.rb",
29
+ "test/test_postgresql_cursor.rb"
30
+ ]
31
+ s.homepage = %q{http://github.com/afair/postgresql_cursor}
32
+ s.rdoc_options = ["--charset=UTF-8"]
33
+ s.require_paths = ["lib"]
34
+ s.rubygems_version = %q{1.3.6}
35
+ s.summary = %q{ActiveRecord PostgreSQL Adapter extension for using a cursor to return a large result set}
36
+ s.test_files = [
37
+ "test/helper.rb",
38
+ "test/test_postgresql_cursor.rb"
39
+ ]
40
+
41
+ if s.respond_to? :specification_version then
42
+ current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
43
+ s.specification_version = 3
44
+
45
+ if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
46
+ s.add_runtime_dependency(%q<activerecord>, ["<= 2.3.5"])
47
+ s.add_runtime_dependency(%q<pg>, [">= 0"])
48
+ else
49
+ s.add_dependency(%q<activerecord>, ["<= 2.3.5"])
50
+ s.add_dependency(%q<pg>, [">= 0"])
51
+ end
52
+ else
53
+ s.add_dependency(%q<activerecord>, ["<= 2.3.5"])
54
+ s.add_dependency(%q<pg>, [">= 0"])
55
+ end
56
+ end
57
+
@@ -0,0 +1,23 @@
1
+ require 'rubygems'
2
+ require 'test/unit'
3
+
4
+ $LOAD_PATH.unshift(File.dirname(__FILE__))
5
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
6
+ require 'postgresql_cursor'
7
+
8
+ ActiveRecord::Base.establish_connection :database=>'allen_test', :adapter=>'postgresql', :username=>'allen'
9
+ class Model < ActiveRecord::Base
10
+ set_table_name "records"
11
+
12
+ # create table records (id serial primary key);
13
+ def self.generate(max=1_000_000)
14
+ max.times do
15
+ connection.execute("insert into records values (nextval('records_id_seq'::regclass))")
16
+ end
17
+ end
18
+ end
19
+
20
+ Model.generate(1000) if Model.count == 0
21
+
22
+ class Test::Unit::TestCase
23
+ end
@@ -0,0 +1,44 @@
1
+ require 'helper'
2
+
3
+ class TestPostgresqlCursor < Test::Unit::TestCase
4
+
5
+ def test_cursor
6
+ c = Model.find_with_cursor(:conditions=>["id>?",0], :cursor=>{:buffer_size=>10})
7
+ mycount=0
8
+ count = c.each { |r| mycount += 1 }
9
+ assert_equal mycount, count
10
+ end
11
+
12
+ def test_empty_set
13
+ c = Model.find_with_cursor(:conditions=>["id<?",0])
14
+ count = c.each { |r| puts r.class }
15
+ assert_equal count, 0
16
+ end
17
+
18
+ def test_block
19
+ Model.transaction do
20
+ c = Model.find_with_cursor(:conditions=>["id<?",10]) { |r| r }
21
+ r = c.next
22
+ assert_equal r.class, Hash
23
+ end
24
+ end
25
+
26
+ def test_sql
27
+ c = Model.find_by_sql_with_cursor("select * from #{Model.table_name}")
28
+ mycount=0
29
+ count = c.each { |r| mycount += 1 }
30
+ assert_equal mycount, count
31
+ end
32
+
33
+ def test_loop
34
+ Model.transaction do
35
+ cursor = Model.find_with_cursor() { |record| record.symbolize_keys }
36
+ while record = cursor.next do
37
+ assert record[:id].class, Fixnum
38
+ cursor.close if cursor.count >= 10
39
+ end
40
+ assert_equal cursor.count, 10
41
+ end
42
+ end
43
+
44
+ end
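The `test_loop` case relies on two behaviours: the block given at construction converts each raw row, and calling `close` mid-loop makes the following `next` return nil. Here is a minimal in-memory sketch of that pattern. `TinyCursor` is a hypothetical class, and plain Ruby's `Hash#transform_keys` (Ruby 2.5+) is used here in place of ActiveSupport's `symbolize_keys`.

```ruby
# Hypothetical in-memory cursor showing the instantiator block and early close.
class TinyCursor
  attr_reader :count

  def initialize(rows, &instantiator)
    @rows = rows.dup
    @instantiator = instantiator || lambda { |r| r } # default: return the raw row
    @count = 0
    @closed = false
  end

  # Returns the next converted row, or nil once closed or out of data.
  def next
    return nil if @closed || @rows.empty?
    @count += 1
    @instantiator.call(@rows.shift)
  end

  def close
    @closed = true
  end
end

rows = (1..50).map { |i| { "id" => i } }
cursor = TinyCursor.new(rows) { |r| r.transform_keys(&:to_sym) }
last = nil
while (record = cursor.next)
  last = record
  cursor.close if cursor.count >= 10 # stop after ten records
end
```

Because `close` is called while handling the tenth record, the loop condition fails on the next call and `count` stays at 10 even though 40 rows remain unread.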
metadata ADDED
@@ -0,0 +1,98 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: postgresql_cursor
3
+ version: !ruby/object:Gem::Version
4
+ prerelease: false
5
+ segments:
6
+ - 0
7
+ - 2
8
+ - 0
9
+ version: 0.2.0
10
+ platform: ruby
11
+ authors:
12
+ - Allen Fair
13
+ autorequire:
14
+ bindir: bin
15
+ cert_chain: []
16
+
17
+ date: 2010-05-17 00:00:00 -04:00
18
+ default_executable:
19
+ dependencies:
20
+ - !ruby/object:Gem::Dependency
21
+ name: activerecord
22
+ prerelease: false
23
+ requirement: &id001 !ruby/object:Gem::Requirement
24
+ requirements:
25
+ - - <=
26
+ - !ruby/object:Gem::Version
27
+ segments:
28
+ - 2
29
+ - 3
30
+ - 5
31
+ version: 2.3.5
32
+ type: :runtime
33
+ version_requirements: *id001
34
+ - !ruby/object:Gem::Dependency
35
+ name: pg
36
+ prerelease: false
37
+ requirement: &id002 !ruby/object:Gem::Requirement
38
+ requirements:
39
+ - - ">="
40
+ - !ruby/object:Gem::Version
41
+ segments:
42
+ - 0
43
+ version: "0"
44
+ type: :runtime
45
+ version_requirements: *id002
46
+ description: PostgreSQL Cursor is an extension to the ActiveRecord PostgreSQLAdapter for very large result sets. It provides a cursor open/fetch/close interface to access data without loading all rows into memory, and instead loads the result rows in "chunks" (default of 10_000 rows), buffers them, and returns the rows one at a time.
47
+ email: allen.fair@gmail.com
48
+ executables: []
49
+
50
+ extensions: []
51
+
52
+ extra_rdoc_files:
53
+ - LICENSE
54
+ - README.rdoc
55
+ files:
56
+ - .document
57
+ - .gitignore
58
+ - LICENSE
59
+ - README.rdoc
60
+ - Rakefile
61
+ - VERSION
62
+ - lib/postgresql_cursor.rb
63
+ - postgresql_cursor.gemspec
64
+ - test/helper.rb
65
+ - test/test_postgresql_cursor.rb
66
+ has_rdoc: true
67
+ homepage: http://github.com/afair/postgresql_cursor
68
+ licenses: []
69
+
70
+ post_install_message:
71
+ rdoc_options:
72
+ - --charset=UTF-8
73
+ require_paths:
74
+ - lib
75
+ required_ruby_version: !ruby/object:Gem::Requirement
76
+ requirements:
77
+ - - ">="
78
+ - !ruby/object:Gem::Version
79
+ segments:
80
+ - 0
81
+ version: "0"
82
+ required_rubygems_version: !ruby/object:Gem::Requirement
83
+ requirements:
84
+ - - ">="
85
+ - !ruby/object:Gem::Version
86
+ segments:
87
+ - 0
88
+ version: "0"
89
+ requirements: []
90
+
91
+ rubyforge_project:
92
+ rubygems_version: 1.3.6
93
+ signing_key:
94
+ specification_version: 3
95
+ summary: ActiveRecord PostgreSQL Adapter extension for using a cursor to return a large result set
96
+ test_files:
97
+ - test/helper.rb
98
+ - test/test_postgresql_cursor.rb