RubyGems - partitioned - Versions diffs - 0.8.0 → 1.0.1 - Mend

partitioned 0.8.0 → 1.0.1

Files changed (29)

data/README +85 -36
data/Rakefile +3 -0
data/examples/README +46 -18
data/lib/monkey_patch_activerecord.rb +14 -8
data/lib/monkey_patch_postgres.rb +46 -13
data/lib/partitioned/active_record_overrides.rb +13 -5
data/lib/partitioned/bulk_methods_mixin.rb +91 -146
data/lib/partitioned/by_created_at.rb +3 -1
data/lib/partitioned/by_foreign_key.rb +5 -0
data/lib/partitioned/by_id.rb +10 -4
data/lib/partitioned/by_integer_field.rb +9 -0
data/lib/partitioned/by_monthly_time_field.rb +8 -1
data/lib/partitioned/by_time_field.rb +16 -8
data/lib/partitioned/by_weekly_time_field.rb +6 -3
data/lib/partitioned/multi_level/configurator/data.rb +1 -0
data/lib/partitioned/multi_level/configurator/dsl.rb +11 -0
data/lib/partitioned/multi_level/configurator/reader.rb +18 -0
data/lib/partitioned/multi_level/partition_manager.rb +13 -4
data/lib/partitioned/multi_level.rb +3 -1
data/lib/partitioned/partitioned_base/configurator/data.rb +10 -1
data/lib/partitioned/partitioned_base/configurator/dsl.rb +20 -15
data/lib/partitioned/partitioned_base/configurator/reader.rb +3 -0
data/lib/partitioned/partitioned_base/configurator.rb +4 -0
data/lib/partitioned/partitioned_base/partition_manager.rb +17 -15
data/lib/partitioned/partitioned_base/sql_adapter.rb +25 -23
data/lib/partitioned/partitioned_base.rb +112 -41
data/lib/partitioned/version.rb +2 -1
data/partitioned.gemspec +3 -2
metadata +68 -73

data/README CHANGED

@@ -1,29 +1,30 @@
 Partitioned
 ===========
-Partitioned adds assistance to ActiveRecord for manipulating
-(reading, creating, updating) an activerecord model that represents
-data that may be in one of many database tables (determined by the Models data).
+Partitioned adds assistance to ActiveRecord for manipulating (reading,
+creating, updating) an ActiveRecord model that represents data that
+may be in one of many database tables (determined by the model's data).
-It also has features that support the creation and deleting of child tables and
-partitioning support infrastructure.
+It also has features that support the creation and deletion of child
+tables and partitioning support infrastructure.
-It supports Postgres partitioning and has specific features to overcome basic
-failings of Postgres's implementation of partitioning.
+It supports Postgres partitioning and has specific features to
+overcome basic failings of Postgres's implementation of partitioning.
 Basics:
-A parent table can be inherited by many child tables that inherit most of the
-attributes of the parent table including its columns.  child tables typically
-(and for the uses of this plugin must) have a unique check constraint the
-defines which data should be located in that specific child table.
+A parent table can be inherited by many child tables that inherit most
+of the attributes of the parent table, including its columns. Child
+tables typically (and for the uses of this plugin must) have a unique
+check constraint that defines which data should be located in that
+specific child table.
-Such a constraint allows for the SQL planner to ignore most child tables and target
-the (hopefully) one child table that contains the records interested.  This splits
-data, and meta-data (indexes) which provides streamlined targeted access to the
-desired data.
+Such a constraint allows the SQL planner to ignore most child tables
+and target the (hopefully) one child table that contains the records
+of interest. This splits data and metadata (indexes), which provides
+streamlined, targeted access to the desired data.
-Support for bulk inserts and bulk updates is also provided via Partitioned::Base.create_many and
-Partitioned::Base.update_many.
+Support for bulk inserts and bulk updates is also provided via
+Partitioned::Base.create_many and Partitioned::Base.update_many.
 Example
 =======
@@ -33,7 +34,21 @@ Given the following models:
   class Company < ActiveRecord::Base
   end
-  class Employee < Partitioned::ByCompanyId
+  class ByCompanyId < Partitioned::ByForeignKey
+    self.abstract_class = true
+    belongs_to :company
+    def self.partition_foreign_key
+      return :company_id
+    end
+    partitioned do |partition|
+      partition.index :id, :unique => true
+    end
+  end
+  class Employee < ByCompanyId
   end
 and the following tables:
@@ -47,6 +62,10 @@ and the following tables:
       name             text null
   );
+  -- add some companies
+  insert into companies (name) values
+    ('company 1'),('company 2'),('company 3');
   -- this is the parent table
   create table employees
   (
@@ -57,18 +76,45 @@ and the following tables:
       company_id       integer not null references companies
   );
+We now need to create some infrastructure for partitioned tables; in
+particular, a schema to hold the child partition tables of employees:
+  Employee.create_infrastructure
+This creates the employees_partitions schema using the following SQL:
   create schema employees_partitions;
-  create table companies (name) values ('company 1'),('company 2'),('company 2');
+NOTE: We also install protections on the employees table so it isn't
+used as a data table (this SQL is not presented for simplicity but is
+a part of the create_infrastructure call).
+To add child tables we use the create_new_partition_tables method:
+  company_ids = Company.all.map(&:id)
+  Employee.create_new_partition_tables(company_ids)
+which results in the following SQL:
-  -- some children
-  create table employees_partitions.p1 ( CHECK ( company_id = 1 ) ) INHERITS (employees);
-  create table employees_partitions.p2 ( CHECK ( company_id = 2 ) ) INHERITS (employees);
-  create table employees_partitions.p3 ( CHECK ( company_id = 3 ) ) INHERITS (employees);
+  create table employees_partitions.p1
+    ( CHECK ( company_id = 1 ) ) INHERITS (employees);
+  create table employees_partitions.p2
+    ( CHECK ( company_id = 2 ) ) INHERITS (employees);
+  create table employees_partitions.p3
+    ( CHECK ( company_id = 3 ) ) INHERITS (employees);
-since database records exist in a specific child table dependant on the field "company_id"
-we need to have creates that turn into database inserts of the EMPLOYEES table redirect
-the record insert into the specific child table determined by the value of COMPANY_ID
+NOTE: Some other SQL is generated in the above example: the foreign
+key reference to the companies table must be created explicitly on
+each Postgres child table (child tables do not inherit foreign keys),
+and the unique index on 'id' is created. These are not shown for
+simplicity.
+Now we can do operations involving the child partitions.
+Since each database record lives in a specific child table determined
+by the field "company_id", a create that would normally become an
+insert into the EMPLOYEES table must instead redirect the insert into
+the specific child table determined by the value of COMPANY_ID,
 e.g.:
   employee = Employee.create(:name => 'Keith', :company_id => 1)
@@ -79,12 +125,12 @@ this would normally produce the following:
 but with Partitioned we see:
   INSERT INTO employees_partitions.p1 (name, company_id) values ('Keith', 1);
-reads of such a table need some assistance to find the specific child table the
-record exists in.
+Reads of such a table need some assistance to find the specific child
+table the record exists in.
-Since we are partitioned by company_id the programmer needs to provide that information
-when fetching data, or the database will need to search all child table for the
-specific record we are looking for.
+Since we are partitioned by company_id, the programmer needs to provide
+that information when fetching data, or the database will need to
+search all child tables for the specific record we are looking for.
 This is no longer valid (well, doesn't perform well):
@@ -93,11 +139,14 @@ This is no longer valid (well, doesn't perform well):
 instead, do one of the following:
   employee = Employee.from_partition(1).find(1)
-  employee = Employee.find(:first, :conditions => {:name => 'Keith', :company_id => 1})
-  employee = Employee.find(:first, :conditions => {:id => 1, :company_id => 1})
-an update (employee.save where the record already exists in the database) will take
-advantage of knowing which child table the record exists in so it can do some optimization.
+  employee = Employee.find(:first,
+                           :conditions => {:name => 'Keith', :company_id => 1})
+  employee = Employee.find(:first,
+                           :conditions => {:id => 1, :company_id => 1})
+An update (employee.save where the record already exists in the
+database) will take advantage of knowing which child table the record
+exists in, so the UPDATE is issued against that child table directly.
 so, the following works as expected:

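The README's examples reduce to one idea: the value of company_id picks the child table, and every statement is aimed at that table. The following is a minimal pure-Ruby sketch (no database needed) that generates the child-table DDL, a routed INSERT, and a partition-targeted read in the style of the README. The helper names (partition_table_name, create_partition_table_sql, sql_literal, insert_sql, find_sql) are hypothetical and not part of the partitioned gem's API; the naming follows the README's employees_partitions.pN convention.

```ruby
# Hypothetical helpers (NOT the partitioned gem's API) that mimic the SQL the
# README describes for employees partitioned by company_id.
PARENT_TABLE     = "employees"
PARTITION_SCHEMA = "#{PARENT_TABLE}_partitions"

# Child table holding rows for one company_id, e.g. "employees_partitions.p1".
def partition_table_name(company_id)
  "#{PARTITION_SCHEMA}.p#{Integer(company_id)}"
end

# DDL for one child table: a CHECK constraint on the key plus INHERITS.
def create_partition_table_sql(company_id)
  id = Integer(company_id)
  "create table #{partition_table_name(id)} " \
    "( CHECK ( company_id = #{id} ) ) INHERITS (#{PARENT_TABLE});"
end

# Render a value as a SQL literal (illustration only: strings are quoted with
# doubled single quotes, everything else must be an integer).
def sql_literal(value)
  value.is_a?(String) ? "'#{value.gsub("'", "''")}'" : Integer(value).to_s
end

# INSERT routed to the child table chosen by the row's company_id.
def insert_sql(attributes)
  table   = partition_table_name(attributes.fetch(:company_id))
  columns = attributes.keys.join(", ")
  values  = attributes.values.map { |v| sql_literal(v) }.join(", ")
  "INSERT INTO #{table} (#{columns}) VALUES (#{values});"
end

# A read can target a single child table only when company_id is supplied.
def find_sql(id:, company_id:)
  "SELECT * FROM #{partition_table_name(company_id)} WHERE id = #{Integer(id)};"
end

puts create_partition_table_sql(1)
puts insert_sql(name: "Keith", company_id: 1)
puts find_sql(id: 1, company_id: 1)
```

String interpolation keeps the sketch short; real code should rely on the adapter's quoting or bound parameters rather than hand-built literals.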
data/Rakefile CHANGED

@@ -4,6 +4,9 @@ begin
 rescue LoadError
   puts 'You must `gem install bundler` and `bundle install` to run rake tasks'
 end
+task :default => :spec
 begin
   require 'rdoc/task'
 rescue LoadError

data/examples/README CHANGED

@@ -1,23 +1,51 @@
 The directory holds examples of how to use the partitioned gem.
+These Rails scripts will create and populate partitioned tables. The
+scripts accept the following parameters:
+ --?                   list available options
+ --force               delete tables before starting
+                         default: false
+ --cleanup             delete tables and exit
+                         default: false
+ --create-many         how many objects to create via create_many
+                         default: 3000
+ --create-individual   how many objects to create via create
+                         default: 1000
+ --new-individual      how many objects to create via new.save
+                         default: 1000
+ --update-individual   how many objects to update individually
+                         default: 1000
+ --update-many         how many objects to update via update_many
+                         default: 1000
+The scripts are:
+company_id.rb: table 'employees' partitioned by company_id.
+company_id_and_created_at.rb: table 'employees' has multi-level
+ partitioning by company_id then created_at; created_at is grouped by
+ week where weeks start on Monday.
+created_at.rb: table 'employees' partitioned by created_at;
+ created_at is grouped by week where weeks start on Monday.
+created_at_referencing_awards.rb: table 'employees' partitioned by
+ created_at and table 'awards' partitioned by created_at, with a
+ reference to the specific child table of employees covering the same
+ created_at range.
+id.rb: partitioned on 'id', grouping each 10 records into separate
+ child tables.
+start_date.rb: grouped by column start_date, which is a date grouped
+ by month.
 The lib directory contains:
-    by_company_id.rb  - a partitioned model where the partition's key is the column company_id that references companies.
-    company.rb        - an ActiveRecord model for the table companies.
-    roman.rb          - some helper routines for generating roman numerals.
-This directory holds executable rails scripts that create and populate partitioned tables.  The scripts accept the following
-parameters:
-  --force		delete used tables before starting
-  --cleanup		delete used tables and exit
+by_company_id.rb: a partitioned model where the partition's key is
+ the column company_id that references companies.
+company.rb: an ActiveRecord model for the table companies.
+roman.rb: some helper routines for generating roman numerals.
-The scripts are:
-    company_id.rb			- table 'employees' partitioned by company_id
-    company_id_and_created_at.rb	- table 'employees' has multi-level partitioning by company_id then created_at
-    					  created_at is grouped by week where weeks start on Monday.
-    created_at.rb			- table 'employees' partitioned by created_at
-    					  created_at is grouped by week where weeks start on Monday.
-    created_at_referencing_awards.rb	- table 'employees' partitioned by created_at and table 'awards' is partitioned
-    					  by created_at which a reference to specific child table of employees with the
-					  created_at range.
-    id.rb				- partitioned on 'id' grouping each 10 records into separate child tables.
-    start_date.rb			- grouped by column start_date which is a date grouped by month.
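The README lists the flags but not how the scripts parse them. A minimal sketch, assuming Ruby's standard-library OptionParser (the actual scripts may parse ARGV differently), that accepts the documented switches with the documented defaults. `parse_example_options`, `DEFAULTS` and `COUNT_OPTIONS` are hypothetical names. The `--?` switch is not reproduced; OptionParser already answers `--help` on its own.

```ruby
require "optparse"

# Defaults copied from the examples README.
DEFAULTS = {
  force: false, cleanup: false,
  create_many: 3000, create_individual: 1000, new_individual: 1000,
  update_individual: 1000, update_many: 1000
}.freeze

# Numeric count options: key => [switch, description].
COUNT_OPTIONS = {
  create_many:       ["--create-many N",       "how many objects to create via create_many"],
  create_individual: ["--create-individual N", "how many objects to create via create"],
  new_individual:    ["--new-individual N",    "how many objects to create via new.save"],
  update_individual: ["--update-individual N", "how many objects to update individually"],
  update_many:       ["--update-many N",       "how many objects to update via update_many"]
}.freeze

# Hypothetical parser for the example scripts' command line.
def parse_example_options(argv)
  options = DEFAULTS.dup # dup of a frozen hash is mutable
  parser = OptionParser.new do |opts|
    opts.on("--force", "delete tables before starting") { options[:force] = true }
    opts.on("--cleanup", "delete tables and exit") { options[:cleanup] = true }
    COUNT_OPTIONS.each do |key, (switch, description)|
      # Integer makes OptionParser convert (and validate) the argument.
      opts.on(switch, Integer, description) { |n| options[key] = n }
    end
  end
  parser.parse(argv) # non-destructive: argv itself is left untouched
  options
end

p parse_example_options(%w[--force --create-many 50])
```

Keeping the count switches in one table means adding another benchmark knob is a one-line change.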

data/lib/monkey_patch_activerecord.rb CHANGED

@@ -5,11 +5,17 @@ require 'active_record/relation.rb'
 require 'active_record/persistence.rb'
 #
-# patching activerecord to allow specifying the table name as a function of
-# attributes
+# Patching {ActiveRecord} to allow specifying the table name as a function of
+# attributes.
 #
 module ActiveRecord
+  #
+  # Patches for Persistence to allow certain partitioning (partitioning related to the primary key) to work.
+  #
+  module Persistence
+    #
+    # Patch the create method to prefetch the primary key if needed.
+    #
     def create
       if self.id.nil? && self.class.respond_to?(:prefetch_primary_key?) && self.class.prefetch_primary_key?
         self.id = connection.next_sequence_value(self.class.sequence_name)
@@ -27,17 +33,17 @@ module ActiveRecord
     end
   end
   #
-  # patches for relation to allow back hooks into the activerecord
-  # requesting name of table as a function of attributes
+  # Patches for Relation to allow hooks back into the {ActiveRecord} model,
+  # requesting the name of the table as a function of attributes.
   #
   class Relation
     #
-    # patches activerecord's building of an insert statement to request
+    # Patches {ActiveRecord}'s building of an insert statement to request
     # of the model a table name with respect to attribute values being
-    # inserted
+    # inserted.
     #
-    # the differences between this and the original code are small and marked
-    # with PARTITIONED comment
+    # The differences between this and the original code are small and marked
+    # with a PARTITIONED comment.
     def insert(values)
       primary_key_value = nil
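The patched Persistence#create assigns `self.id` from `connection.next_sequence_value` before the INSERT whenever the class reports `prefetch_primary_key?`, so an id-partitioned model knows its child table up front. The sketch below replays that decision in plain Ruby. `FakeSequence`, `PartitionedById`, `prepare_create` and the `foos_partitions.pN` table naming are hypothetical; the grouping of 10 ids per child table is borrowed from the examples README's description of id.rb.

```ruby
# Stand-in for a database sequence (hypothetical; the real patch calls
# connection.next_sequence_value(sequence_name)).
class FakeSequence
  def initialize(start = 1)
    @next = start
  end

  def next_value
    value = @next
    @next += 1
    value
  end
end

# Hypothetical id-partitioned model: the id must be known before the INSERT.
class PartitionedById
  PARTITION_SIZE = 10 # examples/id.rb groups each 10 records per child table

  attr_accessor :id, :name

  def initialize(name)
    @name = name
  end

  def self.prefetch_primary_key?
    true
  end

  # Child table for an id range, e.g. ids 10..19 -> "foos_partitions.p10" (assumed naming).
  def self.partition_table_name(id)
    "foos_partitions.p#{(id / PARTITION_SIZE) * PARTITION_SIZE}"
  end
end

# Mirrors the patched create: prefetch the id only when it is missing and the
# class asks for it, then resolve the child table from that id.
def prepare_create(record, sequence)
  klass = record.class
  if record.id.nil? && klass.respond_to?(:prefetch_primary_key?) && klass.prefetch_primary_key?
    record.id = sequence.next_value
  end
  klass.partition_table_name(record.id)
end

seq = FakeSequence.new(9)
puts prepare_create(PartitionedById.new("a"), seq) # receives id 9
puts prepare_create(PartitionedById.new("b"), seq) # receives id 10
```

Without the prefetch, the id would only exist after the INSERT, which is too late to choose the child table.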

data/lib/monkey_patch_postgres.rb CHANGED

@@ -2,47 +2,73 @@ require 'active_record'
 require 'active_record/base'
 require 'active_record/connection_adapters/abstract_adapter'
+#
+# Patching {ActiveRecord::ConnectionAdapters::TableDefinition} and
+# {ActiveRecord::ConnectionAdapters::PostgreSQLAdapter} to add functionality
+# needed to abstract partition specific SQL statements.
+#
 module ActiveRecord::ConnectionAdapters
+  #
+  # Patches associated with building check constraints.
+  #
   class TableDefinition
+    #
+    # Builds a SQL check constraint
+    #
+    # @param [String] constraint a SQL constraint
     def check_constraint(constraint)
       @columns << Struct.new(:to_sql).new("CHECK (#{constraint})")
     end
   end
+  #
+  # Patches extending the postgres adapter with new operations for managing
+  # sequences (and sets of sequence values), schemas and foreign keys.
+  # These should go into AbstractAdapter allowing any database adapter
+  # to take advantage of these SQL builders.
+  #
   class PostgreSQLAdapter < AbstractAdapter
     #
-    # get the next value in a sequence.  used on INSERT operation for
+    # Get the next value in a sequence. Used on INSERT operations for
     # partitioning like by_id because the ID is required before the insert
     # so that the specific child table is known ahead of time.
     #
+    # @param [String] sequence_name the name of the sequence to fetch the next value from
+    # @return [Integer] the value from the sequence
     def next_sequence_value(sequence_name)
       return execute("select nextval('#{sequence_name}')").field_values("nextval").first
     end
     #
-    # get the some next values in a sequence.
-    # batch_size - count of values
+    # Get the next several values in a sequence.
     #
+    # @param [String] sequence_name the name of the sequence to fetch the next values from
+    # @param [Integer] batch_size count of values.
+    # @return [Array<Integer>] an array of values from the sequence
     def next_sequence_values(sequence_name, batch_size)
       result = execute("select nextval('#{sequence_name}') from generate_series(1, #{batch_size})")
       return result.field_values("nextval").map(&:to_i)
     end
     #
-    # causes active resource to fetch the primary key for the table (using next_sequence_value())
-    # just before an insert.  We need the prefetch to happen but we don't have enough information
+    # Causes ActiveRecord to fetch the primary key for the table (using next_sequence_value())
+    # just before an insert. We need the prefetch to happen but we don't have enough information
     # here to determine if it should happen, so Relation::insert has been modified to request of
     # the ActiveRecord::Base derived class if it requires a prefetch.
     #
+    # @param [String] table_name the table name to query
+    # @return [Boolean] returns true if the table should have its primary key prefetched.
     def prefetch_primary_key?(table_name)
       return false
     end
     #
-    # creates a schema given a name.
-    # options:
-    #   :unless_exists - check if schema exists.
+    # Creates a schema given a name.
     #
+    # @param [String] name the name of the schema.
+    # @param [Hash] options ({}) options for creating a schema
+    # @option options [Boolean] :unless_exists (false) do nothing if the schema already exists.
+    # @return [optional] undefined
     def create_schema(name, options = {})
       if options[:unless_exists]
         return if execute("select count(*) from pg_namespace where nspname = '#{name}'").getvalue(0,0).to_i > 0
@@ -51,11 +77,13 @@ module ActiveRecord::ConnectionAdapters
     end
     #
-    # drop a schema given a name.
-    # options:
-    #   :if_exists - check if schema exists.
-    #   :cascade - cascade drop to dependant objects
+    # Drop a schema given a name.
     #
+    # @param [String] name the name of the schema.
+    # @param [Hash] options ({}) options for dropping a schema
+    # @option options [Boolean] :if_exists (false) only drop the schema if it exists.
+    # @option options [Boolean] :cascade (false) drop dependent objects
+    # @return [optional] undefined
     def drop_schema(name, options = {})
       if options[:if_exists]
         return if execute("select count(*) from pg_namespace where nspname = '#{name}'").getvalue(0,0).to_i == 0
@@ -64,8 +92,13 @@ module ActiveRecord::ConnectionAdapters
     end
     #
-    # add foreign key constraint to table.
+    # Add foreign key constraint to table.
     #
+    # @param [String] referencing_table_name the name of the table containing the foreign key
+    # @param [String] referencing_field_name the name of foreign key column
+    # @param [String] referenced_table_name the name of the table referenced by the foreign key
+    # @param [String, Symbol] referenced_field_name (:id) the name of the column referenced by the foreign key
+    # @return [optional] undefined
     def add_foreign_key(referencing_table_name, referencing_field_name, referenced_table_name, referenced_field_name = :id)
       execute("ALTER TABLE #{referencing_table_name} add foreign key (#{referencing_field_name}) references #{referenced_table_name}(#{referenced_field_name})")
     end
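The adapter patches build plain SQL strings. To exercise the `:unless_exists` branch of `create_schema` without a database, the sketch below uses a hypothetical `RecordingAdapter` that logs each statement and answers the `pg_namespace` count query from an in-memory list. The count query, the `generate_series` batch query and the `add_foreign_key` statement mirror the patch; the `create schema` statement is an assumption, since that line falls outside the hunk shown (it matches the SQL the README attributes to `create_infrastructure`).

```ruby
# Hypothetical stand-in for the patched PostgreSQLAdapter: records SQL instead
# of sending it to Postgres.
class RecordingAdapter
  attr_reader :executed

  def initialize(existing_schemas = [])
    @schemas  = existing_schemas.dup
    @executed = []
  end

  # Logs the statement; for the pg_namespace count query, returns 1 or 0
  # depending on whether the schema is known, otherwise nil.
  def execute(sql)
    @executed << sql
    match = sql.match(/nspname = '([^']*)'/)
    match && (@schemas.include?(match[1]) ? 1 : 0)
  end

  # Same :unless_exists check as the patch; "create schema" is assumed.
  def create_schema(name, options = {})
    if options[:unless_exists]
      return if execute("select count(*) from pg_namespace where nspname = '#{name}'") > 0
    end
    execute("create schema #{name}")
    @schemas << name
  end

  # Query next_sequence_values runs to fetch batch_size values in one round trip.
  def next_sequence_values_sql(sequence_name, batch_size)
    "select nextval('#{sequence_name}') from generate_series(1, #{Integer(batch_size)})"
  end

  # Statement add_foreign_key executes (text copied from the patch).
  def add_foreign_key_sql(referencing_table, referencing_field, referenced_table, referenced_field = :id)
    "ALTER TABLE #{referencing_table} add foreign key (#{referencing_field}) references #{referenced_table}(#{referenced_field})"
  end
end

adapter = RecordingAdapter.new
2.times { adapter.create_schema("employees_partitions", unless_exists: true) }
puts adapter.executed # the second call stops after the count query
```

The foreign key builder matters here because Postgres child tables do not inherit foreign key constraints from their parent, so each child needs its own.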

data/lib/partitioned/active_record_overrides.rb CHANGED

@@ -1,19 +1,26 @@
 #
-# these are things our base class must fix in ActiveRecord::Base
+# These are things our base class must fix in ActiveRecord::Base
 #
-# no need to monkey patch these, just override them.
+# No need to monkey patch these, just override them.
 #
 module Partitioned
+  #
+  # Methods that need to be overridden in an ActiveRecord::Base derived class so that we can support partitioning.
+  #
   module ActiveRecordOverrides
     #
     # arel_attributes_values needs to return attributes (and their values) associated with the dynamic_arel_table instead of the
     # static arel_table provided by ActiveRecord.
     #
-    # the standard release of this function gathers a collection of attributes and creates a wrapper function around them
+    # The standard release of this function gathers a collection of attributes and creates a wrapper function around them
     # that names the table they are associated with. that naming is incorrect for partitioned tables.
     #
-    # we call the standard release's method then retrofit our partitioned table into the hash that is returned.
+    # We call the standard release's method then retrofit our partitioned table into the hash that is returned.
     #
+    # @param [Boolean] include_primary_key (true)
+    # @param [Boolean] include_readonly_attributes (true)
+    # @param [Array<String>] attribute_names (@attributes.keys)
+    # @return [Hash] hash of key value pairs associated with persistent attributes
     def arel_attributes_values(include_primary_key = true, include_readonly_attributes = true, attribute_names = @attributes.keys)
       attrs = super
       actual_arel_table = dynamic_arel_table(self.class.table_name)
@@ -21,8 +28,9 @@ module Partitioned
     end
     #
-    # delete just needs a wrapper around it to specify the specific partition.
+    # Delete just needs a wrapper around it to specify the specific partition.
     #
+    # @return [optional] undefined
     def delete
       if persisted?
         self.class.from_partition(*self.class.partition_key_values(attributes)).delete(id)
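The Delete override forwards to `self.class.from_partition(*self.class.partition_key_values(attributes)).delete(id)`, so the DELETE is aimed at the child table rather than the parent. A plain-Ruby sketch of that routing, assuming a model partitioned by company_id. `EmployeeSketch`, `delete_sql` and this particular `partition_key_values` implementation are hypothetical stand-ins, not the gem's code.

```ruby
# Hypothetical model illustrating the Delete override: resolve the child table
# from the record's partition key values, then delete by id in that table.
class EmployeeSketch
  def initialize(attributes, persisted: true)
    @attributes = attributes
    @persisted  = persisted
  end

  def persisted?
    @persisted
  end

  # Assumed shape: the partition key values as an array (here just company_id).
  def self.partition_key_values(attributes)
    [attributes.fetch("company_id")]
  end

  def self.partition_table_name(company_id)
    "employees_partitions.p#{Integer(company_id)}"
  end

  # Returns the DELETE that would be issued, or nil for an unsaved record.
  def delete_sql
    return nil unless persisted?
    table = self.class.partition_table_name(*self.class.partition_key_values(@attributes))
    "DELETE FROM #{table} WHERE id = #{Integer(@attributes.fetch("id"))}"
  end
end

puts EmployeeSketch.new({ "id" => 7, "company_id" => 2 }).delete_sql
```

Splatting the key values keeps the same call working for models whose partition key spans more than one column.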