postgresql_cursor 0.2.0

data/.document ADDED
@@ -0,0 +1,5 @@
1
+ README.rdoc
2
+ lib/**/*.rb
3
+ bin/*
4
+ features/**/*.feature
5
+ LICENSE
data/.gitignore ADDED
@@ -0,0 +1,21 @@
1
+ ## MAC OS
2
+ .DS_Store
3
+
4
+ ## TEXTMATE
5
+ *.tmproj
6
+ tmtags
7
+
8
+ ## EMACS
9
+ *~
10
+ \#*
11
+ .\#*
12
+
13
+ ## VIM
14
+ *.swp
15
+
16
+ ## PROJECT::GENERAL
17
+ coverage
18
+ rdoc
19
+ pkg
20
+
21
+ ## PROJECT::SPECIFIC
data/LICENSE ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2009 Allen Fair
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,78 @@
1
+ = PostgreSQLCursor
2
+
3
+ PostgreSQL Cursor is an extension to the ActiveRecord PostgreSQLAdapter for very large result sets.
4
+ It provides a cursor open/fetch/close interface to access data without loading all rows into memory,
5
+ and instead loads the result rows in "chunks" (default of 10_000 rows), buffers them, and returns the
6
+ rows one at a time.
7
+
8
+ For web pages, an application would not want to process a large amount of data, usually employing a
9
+ Pagination scheme to present it to the users. Background processes sometimes need to generate a large
10
+ amount of data, and the ActiveRecord approach to load all data into memory is not the best fit here.
11
+
12
+ Previous solutions employ pagination to fetch each block,
13
+ then re-running the query for the next "page". This gem avoids re-executing
14
+ the query by using the PostgreSQL cursors.
15
+
16
+ Like the #find_by_sql method, #find_with_cursor returns each row as a hash instead of an instantiated
17
+ model class. The rationale for this is performance, though an option to return instances is available.
18
+ Julian's benchmarks showed that returning instances was about four times slower than returning the hash.
19
+
20
+ ==Installation
21
+ [sudo] gem install postgresql_cursor
22
+
23
+ This does not require Rails to work, just ActiveRecord <= 2.3.5 (the constraint declared in the gemspec) and the 'pg' gem.
24
+
25
+ ==Usage
26
+
27
+ This Rails/ActiveRecord plugin for the PostgreSQL database adapter adds a cursor-based
28
+ find_with_cursor() method for processing very large result sets.
29
+
30
+ The *find_with_cursor* method uses cursors to pull in one data block (of x records) at a time, and
31
+ return each record as a Hash to a procedural block. When each data block is
32
+ exhausted, it will fetch the next one.
33
+
34
+ *find_by_sql_with_cursor* takes a custom SQL statement and returns each row.
35
+
36
+ ==Examples
37
+
38
+ Account.find_with_cursor(:conditions=>["status = ?", 'good']).each do |row|
39
+ puts row.to_json
40
+ end
41
+
42
+ Account.find_by_sql_with_cursor("select ...", :buffer_size=>1000).each do |row|
43
+ row.process
44
+ end
45
+
46
+ Account.transaction do
47
+ cursor = Account.find_with_cursor(...) { |record| record.symbolize_keys }
48
+ while record = cursor.next do
49
+ record.process # => {:column=>value, ...}
50
+ cursor.close if cursor.count >= 1000 # Halts loop after 1000 records
51
+ end
52
+ end
53
+
54
+ Account.find_with_cursor(...) { |record| record.symbolize_keys }.each do |row|
55
+ row.process
56
+ end
57
+
58
+ ==Authors
59
+ Allen Fair, allen.fair@gmail.com, http://github.com/afair
60
+
61
+ Thank you to:
62
+ * Iulian Dogariu, http://github.com/iulianu (Fixes)
63
+ * Julian Mehnle, http://www.mehnle.net (Suggestions)
64
+
65
+
66
+ == Note on Patches/Pull Requests
67
+
68
+ * Fork the project.
69
+ * Make your feature addition or bug fix.
70
+ * Add tests for it. This is important so I don't break it in a
71
+ future version unintentionally.
72
+ * Commit, do not mess with rakefile, version, or history.
73
+ (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
74
+ * Send me a pull request. Bonus points for topic branches.
75
+
76
+ == Copyright
77
+
78
+ Copyright (c) 2010 Allen Fair. See LICENSE for details.
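Reviewer note: the README describes loading rows in "chunks", buffering them, and returning them one at a time. The following is a minimal sketch of that idea in plain Ruby, not the gem's implementation. `ChunkSource` and `BufferedReader` are hypothetical names introduced only for illustration; `ChunkSource#fetch` stands in for the database round trip that a real cursor would perform.

```ruby
# Hypothetical stand-in for a server-side cursor: hands out rows in chunks.
class ChunkSource
  attr_reader :fetches

  def initialize(rows)
    @rows = rows.dup
    @fetches = 0
  end

  # Mimics fetching n rows from a cursor: returns up to n rows, [] when exhausted.
  def fetch(n)
    @fetches += 1
    @rows.shift(n)
  end
end

# Holds one chunk in memory at a time and hands rows out individually.
class BufferedReader
  def initialize(source, buffer_size = 10_000)
    @source = source
    @buffer_size = buffer_size
    @buffer = []
    @eof = false
  end

  # Returns the next row, or nil at end of data.
  def next_row
    if @buffer.empty? && !@eof
      @buffer = @source.fetch(@buffer_size)
      @eof = @buffer.empty?
    end
    @buffer.shift
  end

  # Yields every row in order, refilling the buffer as needed.
  def each
    while (row = next_row)
      yield row
    end
  end
end

source = ChunkSource.new((1..25).map { |i| { "id" => i } })
reader = BufferedReader.new(source, 10)
seen = []
reader.each { |row| seen << row["id"] }
```

With 25 rows and a buffer of 10, the reader never holds more than 10 rows at once. It makes four fetch calls in total, the last of which returns an empty chunk and signals end of data.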
data/Rakefile ADDED
@@ -0,0 +1,54 @@
1
+ require 'rubygems'
2
+ require 'rake'
3
+
4
+ begin
5
+ require 'jeweler'
6
+ Jeweler::Tasks.new do |gem|
7
+ gem.name = "postgresql_cursor"
8
+ gem.summary = %Q{ActiveRecord PostgreSQL Adapter extension for using a cursor to return a large result set}
9
+ gem.description = %Q{PostgreSQL Cursor is an extension to the ActiveRecord PostgreSQLAdapter for very large result sets. It provides a cursor open/fetch/close interface to access data without loading all rows into memory, and instead loads the result rows in "chunks" (default of 10_000 rows), buffers them, and returns the rows one at a time.}
10
+ gem.email = "allen.fair@gmail.com"
11
+ gem.homepage = "http://github.com/afair/postgresql_cursor"
12
+ gem.authors = ["Allen Fair"]
13
+ gem.add_dependency 'activerecord', '<=2.3.5'
14
+ gem.add_dependency 'pg'
15
+ # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
16
+ end
17
+ Jeweler::GemcutterTasks.new
18
+ rescue LoadError
19
+ puts "Jeweler (or a dependency) not available. Install it with: gem install jeweler"
20
+ end
21
+
22
+ require 'rake/testtask'
23
+ Rake::TestTask.new(:test) do |test|
24
+ test.libs << 'lib' << 'test'
25
+ test.pattern = 'test/**/test_*.rb'
26
+ test.verbose = true
27
+ end
28
+
29
+ begin
30
+ require 'rcov/rcovtask'
31
+ Rcov::RcovTask.new do |test|
32
+ test.libs << 'test'
33
+ test.pattern = 'test/**/test_*.rb'
34
+ test.verbose = true
35
+ end
36
+ rescue LoadError
37
+ task :rcov do
38
+ abort "RCov is not available. In order to run rcov, you must: sudo gem install spicycode-rcov"
39
+ end
40
+ end
41
+
42
+ task :test => :check_dependencies
43
+
44
+ task :default => :test
45
+
46
+ require 'rake/rdoctask'
47
+ Rake::RDocTask.new do |rdoc|
48
+ version = File.exist?('VERSION') ? File.read('VERSION') : ""
49
+
50
+ rdoc.rdoc_dir = 'rdoc'
51
+ rdoc.title = "postgresql_cursor #{version}"
52
+ rdoc.rdoc_files.include('README*')
53
+ rdoc.rdoc_files.include('lib/**/*.rb')
54
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.2.0
data/lib/postgresql_cursor.rb ADDED
@@ -0,0 +1,121 @@
1
+ gem 'activerecord', '<=2.3.5'
2
+ require 'active_record'
3
+
4
+ # Class to operate a PostgreSQL cursor to buffer a set of rows, and return single rows for processing.
5
+ # Use this class when processing a very large number of records, which would otherwise all be instantiated
6
+ # in memory by find(). This also adds helpers to ActiveRecord::Base for *find_with_cursor()* and
7
+ # *find_by_sql_with_cursor()* to return instances of the cursor ready to fetch.
8
+ #
9
+ # Use each() with a block to accept an instance of the Model (or whatever you define with a block on
10
+ # initialize()). It will open, buffer, yield each row, then close the cursor.
11
+ #
12
+ # PostgreSQL requires that a cursor is executed within a transaction block, which you must provide unless
13
+ # you use each() to run through the result set.
14
+ class PostgreSQLCursor
15
+ attr_reader :count, :buffer_reads
16
+ @@counter=0
17
+
18
+ # Define a new cursor, with a SQL statement, as a string with parameters already replaced, and options for
19
+ # the cursor
20
+ # * :buffer_size=>number of records to buffer, default 10000.
21
+ # Pass an optional block which takes a Hash of "column"=>"value", and returns an object to be yielded for each row.
22
+ def initialize(sql,*args, &block)
23
+ @options = args.last.is_a?(Hash) ? args.pop : {}
24
+ @@counter += 1
25
+ @instantiator = block || lambda {|r| r }
26
+ @sql = sql
27
+ @name = "pgcursor_#{@@counter}"
28
+ @connection = ActiveRecord::Base.connection
29
+ @buffer_size = @options[:buffer_size] || 10_000
30
+ @count = 0
31
+ @state = :ready
32
+ end
33
+
34
+ # Iterates through the rows, yields them to the block. It wraps the processing in a transaction
35
+ # (required by PostgreSQL), opens the cursor, buffers the results, returns each row, and closes the cursor.
36
+ def each
37
+ @connection.transaction do
38
+ @result = open
39
+ while (row = fetch ) do
40
+ yield row
41
+ end
42
+ close unless @state == :closed # close() may already have been called inside the block to exit early
43
+ @count
44
+ end
45
+ end
46
+
47
+ # Starts buffered result set processing for a given SQL statement. The DB
48
+ def open
49
+ raise "Open Cursor state not ready" unless @state == :ready
50
+ @result = @connection.execute("declare #{@name} cursor for #{@sql}")
51
+ @state = :empty
52
+ @buffer_reads = 0
53
+ @buffer = nil
54
+ end
55
+
56
+ # Returns a string of the current status
57
+ def status #:nodoc:
58
+ "row=#{@count} buffer=#{@buffer ? @buffer.size : 0} state=#{@state} buffer_size=#{@buffer_size} reads=#{@buffer_reads}"
59
+ end
60
+
61
+ # Fetches the next block of rows into memory
62
+ def fetch_buffer #:nodoc:
63
+ return unless @state == :empty
64
+ @result = @connection.execute("fetch #{@buffer_size} from #{@name}")
65
+ @buffer = @result.collect {|row| row }
66
+ @state = @buffer.size > 0 ? :buffered : :eof
67
+ @buffer_reads += 1
68
+ @buffer
69
+ end
70
+
71
+ # Returns the next row from the cursor, or nil when end of data.
72
+ # The raw row is a Hash of "column"=>"value"; it is passed through the block given to initialize() before being returned.
73
+ def fetch
74
+ open if @state == :ready
75
+ fetch_buffer if @state == :empty
76
+ return nil if @state == :eof || @state == :closed
77
+ @state = :empty if @buffer.size <= 1
78
+ @count+= 1
79
+ row = @buffer.shift
80
+ @instantiator.call(row)
81
+ end
82
+
83
+ alias_method :next, :fetch
84
+
85
+ # Closes the cursor to clean up resources. Call this method during process of each() to
86
+ # exit the loop
87
+ def close
88
+ pg_result = @connection.execute("close #{@name}")
89
+ @state = :closed
90
+ end
91
+
92
+ end
93
+
94
+ class ActiveRecord::Base
95
+ class <<self
96
+
97
+ # Returns a PostgreSQLCursor instance to access the results, on which you are able to call
98
+ # each (though the cursor is not Enumerable and no other methods are available).
99
+ # No :all argument is needed, and other find() options can be specified.
100
+ # Specify the :cursor=>{...} option to override options for the cursor such as :buffer_size=>n.
101
+ # Pass an optional block that takes a Hash of the record and returns what you want to return.
102
+ # For example, return the Hash back to process a Hash instead of a table instance for better speed.
103
+ def find_with_cursor(*args, &block)
104
+ find_options = args.last.is_a?(Hash) ? args.pop : {}
105
+ options = find_options.delete(:cursor) || {}
106
+ validate_find_options(find_options)
107
+ set_readonly_option!(find_options)
108
+ sql = construct_finder_sql(find_options)
109
+ PostgreSQLCursor.new(sql, options) { |r| block_given? ? yield(r) : instantiate(r) }
110
+ end
111
+
112
+ # Returns a PostgreSQLCursor instance to access the results of the sql
113
+ # Pass cursor options such as :buffer_size=>n directly in the options hash.
114
+ # Pass an optional block that takes a Hash of the record and returns what you want to return.
115
+ # For example, return the Hash back to process a Hash instead of a table instance for better speed.
116
+ def find_by_sql_with_cursor(sql, options={})
117
+ PostgreSQLCursor.new(sql, options) { |r| block_given? ? yield(r) : instantiate(r) }
118
+ end
119
+
120
+ end
121
+ end
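Reviewer note: the class above issues a `declare`, repeated `fetch`, and a `close` statement, all inside a transaction. The sketch below replays that statement sequence against a recording stand-in so the order is easy to inspect. `RecordingConnection` and `run_cursor` are hypothetical helpers written for this note; they are not part of ActiveRecord or of this gem, and no database is contacted.

```ruby
# Hypothetical connection that records statements instead of talking to PostgreSQL.
class RecordingConnection
  attr_reader :log

  def initialize(rows)
    @rows = rows.dup
    @log = []
  end

  # Records the transaction boundaries around the block.
  def transaction
    @log << "begin"
    result = yield
    @log << "commit"
    result
  end

  # Records the statement; only "fetch N ..." statements return rows.
  def execute(sql)
    @log << sql
    sql =~ /\Afetch (\d+)/ ? @rows.shift($1.to_i) : []
  end
end

# Declares a cursor, drains it in chunks of buffer_size, closes it,
# and returns the number of rows yielded.
def run_cursor(conn, name, sql, buffer_size)
  count = 0
  conn.transaction do
    conn.execute("declare #{name} cursor for #{sql}")
    loop do
      rows = conn.execute("fetch #{buffer_size} from #{name}")
      break if rows.empty?
      rows.each do |row|
        yield row
        count += 1
      end
    end
    conn.execute("close #{name}")
  end
  count
end

conn = RecordingConnection.new([{ "id" => "1" }, { "id" => "2" }, { "id" => "3" }])
ids = []
total = run_cursor(conn, "pgcursor_1", "select * from records", 2) { |row| ids << row["id"] }
```

With three rows and a buffer of two, the log holds one `declare`, three `fetch` statements (the last returns nothing), and one `close`, bracketed by the transaction markers.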
data/postgresql_cursor.gemspec ADDED
@@ -0,0 +1,57 @@
1
+ # Generated by jeweler
2
+ # DO NOT EDIT THIS FILE DIRECTLY
3
+ # Instead, edit Jeweler::Tasks in Rakefile, and run the gemspec command
4
+ # -*- encoding: utf-8 -*-
5
+
6
+ Gem::Specification.new do |s|
7
+ s.name = %q{postgresql_cursor}
8
+ s.version = "0.2.0"
9
+
10
+ s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
11
+ s.authors = ["Allen Fair"]
12
+ s.date = %q{2010-05-17}
13
+ s.description = %q{PostgreSQL Cursor is an extension to the ActiveRecord PostgreSQLAdapter for very large result sets. It provides a cursor open/fetch/close interface to access data without loading all rows into memory, and instead loads the result rows in "chunks" (default of 10_000 rows), buffers them, and returns the rows one at a time.}
14
+ s.email = %q{allen.fair@gmail.com}
15
+ s.extra_rdoc_files = [
16
+ "LICENSE",
17
+ "README.rdoc"
18
+ ]
19
+ s.files = [
20
+ ".document",
21
+ ".gitignore",
22
+ "LICENSE",
23
+ "README.rdoc",
24
+ "Rakefile",
25
+ "VERSION",
26
+ "lib/postgresql_cursor.rb",
27
+ "postgresql_cursor.gemspec",
28
+ "test/helper.rb",
29
+ "test/test_postgresql_cursor.rb"
30
+ ]
31
+ s.homepage = %q{http://github.com/afair/postgresql_cursor}
32
+ s.rdoc_options = ["--charset=UTF-8"]
33
+ s.require_paths = ["lib"]
34
+ s.rubygems_version = %q{1.3.6}
35
+ s.summary = %q{ActiveRecord PostgreSQL Adapter extension for using a cursor to return a large result set}
36
+ s.test_files = [
37
+ "test/helper.rb",
38
+ "test/test_postgresql_cursor.rb"
39
+ ]
40
+
41
+ if s.respond_to? :specification_version then
42
+ current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
43
+ s.specification_version = 3
44
+
45
+ if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
46
+ s.add_runtime_dependency(%q<activerecord>, ["<= 2.3.5"])
47
+ s.add_runtime_dependency(%q<pg>, [">= 0"])
48
+ else
49
+ s.add_dependency(%q<activerecord>, ["<= 2.3.5"])
50
+ s.add_dependency(%q<pg>, [">= 0"])
51
+ end
52
+ else
53
+ s.add_dependency(%q<activerecord>, ["<= 2.3.5"])
54
+ s.add_dependency(%q<pg>, [">= 0"])
55
+ end
56
+ end
57
+
data/test/helper.rb ADDED
@@ -0,0 +1,23 @@
1
+ require 'rubygems'
2
+ require 'test/unit'
3
+
4
+ $LOAD_PATH.unshift(File.dirname(__FILE__))
5
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
6
+ require 'postgresql_cursor'
7
+
8
+ ActiveRecord::Base.establish_connection :database=>'allen_test', :adapter=>'postgresql', :username=>'allen'
9
+ class Model < ActiveRecord::Base
10
+ set_table_name "records"
11
+
12
+ # create table records (id serial primary key);
13
+ def self.generate(max=1_000_000)
14
+ max.times do
15
+ connection.execute("insert into records values (nextval('records_id_seq'::regclass))")
16
+ end
17
+ end
18
+ end
19
+
20
+ Model.generate(1000) if Model.count == 0
21
+
22
+ class Test::Unit::TestCase
23
+ end
data/test/test_postgresql_cursor.rb ADDED
@@ -0,0 +1,44 @@
1
+ require 'helper'
2
+
3
+ class TestPostgresqlCursor < Test::Unit::TestCase
4
+
5
+ def test_cursor
6
+ c = Model.find_with_cursor(:conditions=>["id>?",0], :cursor=>{:buffer_size=>10})
7
+ mycount=0
8
+ count = c.each { |r| mycount += 1 }
9
+ assert_equal mycount, count
10
+ end
11
+
12
+ def test_empty_set
13
+ c = Model.find_with_cursor(:conditions=>["id<?",0])
14
+ count = c.each { |r| puts r.class }
15
+ assert_equal count, 0
16
+ end
17
+
18
+ def test_block
19
+ Model.transaction do
20
+ c = Model.find_with_cursor(:conditions=>["id<?",10]) { |r| r }
21
+ r = c.next
22
+ assert_equal r.class, Hash
23
+ end
24
+ end
25
+
26
+ def test_sql
27
+ c = Model.find_by_sql_with_cursor("select * from #{Model.table_name}")
28
+ mycount=0
29
+ count = c.each { |r| mycount += 1 }
30
+ assert_equal mycount, count
31
+ end
32
+
33
+ def test_loop
34
+ Model.transaction do
35
+ cursor = Model.find_with_cursor() { |record| record.symbolize_keys }
36
+ while record = cursor.next do
37
+ assert record[:id].class, Fixnum
38
+ cursor.close if cursor.count >= 10
39
+ end
40
+ assert_equal cursor.count, 10
41
+ end
42
+ end
43
+
44
+ end
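Reviewer note: test_loop relies on `close` ending a `while record = cursor.next` loop early. Below is a minimal, database-free sketch of that behavior, assuming a simplified state model of `:empty`, `:eof`, and `:closed`. `TinyCursor` is a hypothetical class written for this note and is not the gem's PostgreSQLCursor.

```ruby
# Hypothetical in-memory cursor that mirrors the early-close behavior.
class TinyCursor
  attr_reader :count, :state

  def initialize(rows, buffer_size)
    @rows = rows.dup
    @buffer_size = buffer_size
    @buffer = []
    @count = 0
    @state = :empty
  end

  # Returns the next row, or nil once the data is exhausted or the cursor is closed.
  def next
    return nil if @state == :closed || @state == :eof
    if @buffer.empty?
      @buffer = @rows.shift(@buffer_size)
      if @buffer.empty?
        @state = :eof
        return nil
      end
    end
    @count += 1
    @buffer.shift
  end

  # Marks the cursor closed so that later calls to next return nil.
  def close
    @state = :closed
  end
end

cursor = TinyCursor.new((1..100).to_a, 25)
while (row = cursor.next)
  cursor.close if cursor.count >= 10
end
```

The loop stops after the tenth row because `next` returns nil as soon as the state is `:closed`, which is why the comparison uses `>=` rather than `>`.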
metadata ADDED
@@ -0,0 +1,98 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: postgresql_cursor
3
+ version: !ruby/object:Gem::Version
4
+ prerelease: false
5
+ segments:
6
+ - 0
7
+ - 2
8
+ - 0
9
+ version: 0.2.0
10
+ platform: ruby
11
+ authors:
12
+ - Allen Fair
13
+ autorequire:
14
+ bindir: bin
15
+ cert_chain: []
16
+
17
+ date: 2010-05-17 00:00:00 -04:00
18
+ default_executable:
19
+ dependencies:
20
+ - !ruby/object:Gem::Dependency
21
+ name: activerecord
22
+ prerelease: false
23
+ requirement: &id001 !ruby/object:Gem::Requirement
24
+ requirements:
25
+ - - <=
26
+ - !ruby/object:Gem::Version
27
+ segments:
28
+ - 2
29
+ - 3
30
+ - 5
31
+ version: 2.3.5
32
+ type: :runtime
33
+ version_requirements: *id001
34
+ - !ruby/object:Gem::Dependency
35
+ name: pg
36
+ prerelease: false
37
+ requirement: &id002 !ruby/object:Gem::Requirement
38
+ requirements:
39
+ - - ">="
40
+ - !ruby/object:Gem::Version
41
+ segments:
42
+ - 0
43
+ version: "0"
44
+ type: :runtime
45
+ version_requirements: *id002
46
+ description: PostgreSQL Cursor is an extension to the ActiveRecord PostgreSQLAdapter for very large result sets. It provides a cursor open/fetch/close interface to access data without loading all rows into memory, and instead loads the result rows in "chunks" (default of 10_000 rows), buffers them, and returns the rows one at a time.
47
+ email: allen.fair@gmail.com
48
+ executables: []
49
+
50
+ extensions: []
51
+
52
+ extra_rdoc_files:
53
+ - LICENSE
54
+ - README.rdoc
55
+ files:
56
+ - .document
57
+ - .gitignore
58
+ - LICENSE
59
+ - README.rdoc
60
+ - Rakefile
61
+ - VERSION
62
+ - lib/postgresql_cursor.rb
63
+ - postgresql_cursor.gemspec
64
+ - test/helper.rb
65
+ - test/test_postgresql_cursor.rb
66
+ has_rdoc: true
67
+ homepage: http://github.com/afair/postgresql_cursor
68
+ licenses: []
69
+
70
+ post_install_message:
71
+ rdoc_options:
72
+ - --charset=UTF-8
73
+ require_paths:
74
+ - lib
75
+ required_ruby_version: !ruby/object:Gem::Requirement
76
+ requirements:
77
+ - - ">="
78
+ - !ruby/object:Gem::Version
79
+ segments:
80
+ - 0
81
+ version: "0"
82
+ required_rubygems_version: !ruby/object:Gem::Requirement
83
+ requirements:
84
+ - - ">="
85
+ - !ruby/object:Gem::Version
86
+ segments:
87
+ - 0
88
+ version: "0"
89
+ requirements: []
90
+
91
+ rubyforge_project:
92
+ rubygems_version: 1.3.6
93
+ signing_key:
94
+ specification_version: 3
95
+ summary: ActiveRecord PostgreSQL Adapter extension for using a cursor to return a large result set
96
+ test_files:
97
+ - test/helper.rb
98
+ - test/test_postgresql_cursor.rb