RubyGems - partitioned - Versions diffs - 0.8.0 → 1.0.1 - Mend

partitioned 0.8.0 → 1.0.1


Files changed (29)

data/README +85 -36
data/Rakefile +3 -0
data/examples/README +46 -18
data/lib/monkey_patch_activerecord.rb +14 -8
data/lib/monkey_patch_postgres.rb +46 -13
data/lib/partitioned/active_record_overrides.rb +13 -5
data/lib/partitioned/bulk_methods_mixin.rb +91 -146
data/lib/partitioned/by_created_at.rb +3 -1
data/lib/partitioned/by_foreign_key.rb +5 -0
data/lib/partitioned/by_id.rb +10 -4
data/lib/partitioned/by_integer_field.rb +9 -0
data/lib/partitioned/by_monthly_time_field.rb +8 -1
data/lib/partitioned/by_time_field.rb +16 -8
data/lib/partitioned/by_weekly_time_field.rb +6 -3
data/lib/partitioned/multi_level/configurator/data.rb +1 -0
data/lib/partitioned/multi_level/configurator/dsl.rb +11 -0
data/lib/partitioned/multi_level/configurator/reader.rb +18 -0
data/lib/partitioned/multi_level/partition_manager.rb +13 -4
data/lib/partitioned/multi_level.rb +3 -1
data/lib/partitioned/partitioned_base/configurator/data.rb +10 -1
data/lib/partitioned/partitioned_base/configurator/dsl.rb +20 -15
data/lib/partitioned/partitioned_base/configurator/reader.rb +3 -0
data/lib/partitioned/partitioned_base/configurator.rb +4 -0
data/lib/partitioned/partitioned_base/partition_manager.rb +17 -15
data/lib/partitioned/partitioned_base/sql_adapter.rb +25 -23
data/lib/partitioned/partitioned_base.rb +112 -41
data/lib/partitioned/version.rb +2 -1
data/partitioned.gemspec +3 -2
metadata +68 -73

data/README CHANGED

@@ -1,29 +1,30 @@
 Partitioned
 ===========
-Partitioned adds assistance to ActiveRecord for manipulating
-(reading, creating, updating) an activerecord model that represents
-data that may be in one of many database tables (determined by the Models data).
+Partitioned adds assistance to ActiveRecord for manipulating (reading,
+creating, updating) an activerecord model that represents data that
+may be in one of many database tables (determined by the Models data).
-It also has features that support the creation and deleting of child tables and
-partitioning support infrastructure.
+It also has features that support the creation and deleting of child
+tables and partitioning support infrastructure.
-It supports Postgres partitioning and has specific features to overcome basic
-failings of Postgres's implementation of partitioning.
+It supports Postgres partitioning and has specific features to
+overcome basic failings of Postgres's implementation of partitioning.
 Basics:
-A parent table can be inherited by many child tables that inherit most of the
-attributes of the parent table including its columns.  child tables typically
-(and for the uses of this plugin must) have a unique check constraint the
-defines which data should be located in that specific child table.
+A parent table can be inherited by many child tables that inherit most
+of the attributes of the parent table including its columns.  child
+tables typically (and for the uses of this plugin must) have a unique
+check constraint the defines which data should be located in that
+specific child table.
-Such a constraint allows for the SQL planner to ignore most child tables and target
-the (hopefully) one child table that contains the records interested.  This splits
-data, and meta-data (indexes) which provides streamlined targeted access to the
-desired data.
+Such a constraint allows for the SQL planner to ignore most child
+tables and target the (hopefully) one child table that contains the
+records interested.  This splits data, and meta-data (indexes) which
+provides streamlined targeted access to the desired data.
-Support for bulk inserts and bulk updates is also provided via Partitioned::Base.create_many and
-Partitioned::Base.update_many.
+Support for bulk inserts and bulk updates is also provided via
+Partitioned::Base.create_many and Partitioned::Base.update_many.
 Example
 =======
@@ -33,7 +34,21 @@ Given the following models:
   class Company < ActiveRecord::Base
   end
-  class Employee < Partitioned::ByCompanyId
+  class ByCompanyId < Partitioned::ByForeignKey
+    self.abstract_class = true
+    belongs_to :company
+    def self.partition_foreign_key
+      return :company_id
+    end
+    partitioned do |partition|
+      partition.index :id, :unique => true
+    end
+  end
+  class Employee < ByCompanyId
   end
 and the following tables:
@@ -47,6 +62,10 @@ and the following tables:
       name             text null
   );
+  -- add some companies
+  insert into table companies (name) values
+    ('company 1'),('company 2'),('company 2');
   -- this is the parent table
   create table employees
   (
@@ -57,18 +76,45 @@ and the following tables:
       company_id       integer not null references companies
   );
+We now need to create some infrastructure for partitioned tables,
+in particular, we create a schema to hold the child partition
+tables of employees.
+  Employee.create_infrastructure
+Which creates the employees_partitions schema using the following SQL:
   create schema employees_partitions;
-  create table companies (name) values ('company 1'),('company 2'),('company 2');
+NOTE: We also install protections on the employees table so it isn't
+used as a data table (this SQL is not presented for simplicity but is
+apart of the create_infrastructure call).
+To add child tables we use the create_new_partitions_tables method:
+  company_ids = Company.all.map(&:id)
+  Employee.create_new_partition_tables(company_ids)
+which results in the following SQL:
-  -- some children
-  create table employees_partitions.p1 ( CHECK ( company_id = 1 ) ) INHERITS (employees);
-  create table employees_partitions.p2 ( CHECK ( company_id = 2 ) ) INHERITS (employees);
-  create table employees_partitions.p3 ( CHECK ( company_id = 3 ) ) INHERITS (employees);
+  create table employees_partitions.p1
+    ( CHECK ( company_id = 1 ) ) INHERITS (employees);
+  create table employees_partitions.p2
+    ( CHECK ( company_id = 2 ) ) INHERITS (employees);
+  create table employees_partitions.p3
+    ( CHECK ( company_id = 3 ) ) INHERITS (employees);
-since database records exist in a specific child table dependant on the field "company_id"
-we need to have creates that turn into database inserts of the EMPLOYEES table redirect
-the record insert into the specific child table determined by the value of COMPANY_ID
+NOTE: Some other SQL is generated in the above example, specifically
+the reference to the companies table needs to be explicitly created
+for postgres child tables AND the unique index on 'id' is created.
+These are not shown for simplicity.
+Now we can do operations involving the child partitions.
+Since database records exist in a specific child table dependant on
+the field "company_id" we need to have creates that turn into database
+inserts of the EMPLOYEES table redirect the record insert into the
+specific child table determined by the value of COMPANY_ID
 eg:
   employee = Employee.create(:name => 'Keith', :company_id => 1)
@@ -79,12 +125,12 @@ this would normally produce the following:
 but with Partitioned we see:
   INSERT INTO employees_partitions.p1 ('name', company_id) values ('Keith', 1);
-reads of such a table need some assistance to find the specific child table the
-record exists in.
+reads of such a table need some assistance to find the specific child
+table the record exists in.
-Since we are partitioned by company_id the programmer needs to provide that information
-when fetching data, or the database will need to search all child table for the
-specific record we are looking for.
+Since we are partitioned by company_id the programmer needs to provide
+that information when fetching data, or the database will need to
+search all child table for the specific record we are looking for.
 This is no longer valid (well, doesn't perform well):
@@ -93,11 +139,14 @@ This is no longer valid (well, doesn't perform well):
 instead, do one of the following:
   employee = Employee.from_partition(1).find(1)
-  employee = Employee.find(:first, :conditions => {:name => 'Keith', :company_id => 1})
-  employee = Employee.find(:first, :conditions => {:id => 1, :company_id => 1})
-an update (employee.save where the record already exists in the database) will take
-advantage of knowing which child table the record exists in so it can do some optimization.
+  employee = Employee.find(:first,
+                           :conditions => {:name => 'Keith', :company_id => 1})
+  employee = Employee.find(:first,
+                           :conditions => {:id => 1, :company_id => 1})
+an update (employee.save where the record already exists in the
+database) will take advantage of knowing which child table the record
+exists in so it can do some optimization.
 so, the following works as expected:

data/Rakefile CHANGED

@@ -4,6 +4,9 @@ begin
 rescue LoadError
   puts 'You must `gem install bundler` and `bundle install` to run rake tasks'
 end
+task :default => :spec
 begin
   require 'rdoc/task'
 rescue LoadError

data/examples/README CHANGED

@@ -1,23 +1,51 @@
 The directory holds examples of how to use the partitioned gem.
+These rails scripts will create and populate partitioned tables. The
+scripts accept the following parameters:
+ --?                   list available options
+ --force	       delete tables before starting
+                         default: false
+ --cleanup	       delete tables and exit
+                         default: false
+ --create-many         how many objects to create via create_many
+                         default: 3000
+ --create-individual   how many objects to create via create
+                         default: 1000
+ --new-individual      how many objects to create via new.save
+                         default: 1000
+ --update-individual   how many objects to update individually
+                         default: 1000
+ --update-many         how many objects to update via update_many
+                         default: 1000
+The scripts are:
+company_id.rb: table 'employees' partitioned by company_id
+company_id_and_created_at.rb: table 'employees' has multi-level
+ partitioning by company_id then created_at created_at is grouped by
+ week where weeks start on Monday.
+created_at.rb: table 'employees' partitioned by created_at
+ created_at is grouped by week where weeks start on Monday.
+created_at_referencing_awards.rb: table 'employees' partitioned by
+ created_at and table 'awards' is partitioned by created_at which a
+ reference to specific child table of employees with the created_at
+ range.
+id.rb: partitioned on 'id' grouping each 10 records into separate
+ child tables.
+start_date.rb: grouped by column start_date which is a date grouped
+ by month.
 The lib directory contains:
-    by_company_id.rb  - a partitioned model where the partition's key is the column company_id that references companies.
-    company.rb        - an ActiveRecord model for the table companies.
-    roman.rb          - some helper routines for generating roman numerals.
-This directory holds executable rails scripts that create and populate partitioned tables.  The scripts accept the following
-parameters:
-  --force		delete used tables before starting
-  --cleanup		delete used tables and exit
+by_company_id.rb: a partitioned model where the partition's key is
+ the column company_id that references companies.
+company.rb: an ActiveRecord model for the table companies.
+roman.rb: some helper routines for generating roman numerals.
-The scripts are:
-    company_id.rb			- table 'employees' partitioned by company_id
-    company_id_and_created_at.rb	- table 'employees' has multi-level partitioning by company_id then created_at
-    					  created_at is grouped by week where weeks start on Monday.
-    created_at.rb			- table 'employees' partitioned by created_at
-    					  created_at is grouped by week where weeks start on Monday.
-    created_at_referencing_awards.rb	- table 'employees' partitioned by created_at and table 'awards' is partitioned
-    					  by created_at which a reference to specific child table of employees with the
-					  created_at range.
-    id.rb				- partitioned on 'id' grouping each 10 records into separate child tables.
-    start_date.rb			- grouped by column start_date which is a date grouped by month.
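
Editorial sketch (not part of the gem): several of the example scripts group created_at "by week where weeks start on Monday". One way to compute such a week boundary in plain Ruby is shown below. The method names week_start and week_range are hypothetical, and the gem's own implementation may differ.

```ruby
require 'date'

# Monday on or before the given date. Date#wday is 0 for Sunday, so
# (wday + 6) % 7 is the number of days elapsed since the last Monday.
def week_start(date)
  date - ((date.wday + 6) % 7)
end

# Half-open range [Monday, next Monday) that a weekly partition would cover.
def week_range(date)
  start = week_start(date)
  [start, start + 7]
end

first_day, next_week = week_range(Date.new(2012, 3, 7))
puts "#{first_day} <= created_at < #{next_week}"
```

A half-open range keeps adjacent weekly partitions from overlapping, which matters when each child table carries a CHECK constraint on the date range.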

data/lib/monkey_patch_activerecord.rb CHANGED

@@ -5,11 +5,17 @@ require 'active_record/relation.rb'
 require 'active_record/persistence.rb'
 #
-# patching activerecord to allow specifying the table name as a function of
-# attributes
+# Patching {ActiveRecord} to allow specifying the table name as a function of
+# attributes.
 #
 module ActiveRecord
+  #
+  # Patches for Persistence to allow certain partitioning (that related to the primary key) to work.
+  #
   module Persistence
+    #
+    # patch the create method to prefetch the primary key if needed
+    #
     def create
       if self.id.nil? && self.class.respond_to?(:prefetch_primary_key?) && self.class.prefetch_primary_key?
         self.id = connection.next_sequence_value(self.class.sequence_name)
@@ -27,17 +33,17 @@ module ActiveRecord
     end
   end
   #
-  # patches for relation to allow back hooks into the activerecord
-  # requesting name of table as a function of attributes
+  # Patches for relation to allow back hooks into the {ActiveRecord}
+  # requesting name of table as a function of attributes.
   #
   class Relation
     #
-    # patches activerecord's building of an insert statement to request
+    # Patches {ActiveRecord}'s building of an insert statement to request
     # of the model a table name with respect to attribute values being
-    # inserted
+    # inserted.
     #
-    # the differences between this and the original code are small and marked
-    # with PARTITIONED comment
+    # The differences between this and the original code are small and marked
+    # with PARTITIONED comment.
     def insert(values)
       primary_key_value = nil
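
Editorial sketch (not part of the gem): the patched create method above fetches the primary key before the insert, because id-based partitioning has to know the id to choose a child table. The stand-alone outline below illustrates that ordering. FakeSequence, child_table_for and prepare_insert are hypothetical names, the group size of 10 is an assumption, and naming each child table after the lowest id in its group is also an assumption rather than something this diff confirms.

```ruby
# Hypothetical stand-in for a database sequence.
class FakeSequence
  def initialize(start = 1)
    @next = start
  end

  # Return the current value and advance the counter.
  def next_value
    value = @next
    @next += 1
    value
  end
end

PARTITION_SIZE = 10 # assumed group size

# Assumed naming: the child table is named after the lowest id in its group.
def child_table_for(id)
  "employees_partitions.p#{id / PARTITION_SIZE * PARTITION_SIZE}"
end

# Fetch the id first when it is missing, then pick the child table from it.
def prepare_insert(attributes, sequence)
  attributes = attributes.merge(id: sequence.next_value) if attributes[:id].nil?
  [child_table_for(attributes[:id]), attributes]
end

sequence = FakeSequence.new(9)
table, row = prepare_insert({ name: 'Keith' }, sequence)
puts "insert #{row.inspect} into #{table}"
```

A row that already carries an id skips the sequence entirely, which matches the self.id.nil? guard in the patched method.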

data/lib/monkey_patch_postgres.rb CHANGED

@@ -2,47 +2,73 @@ require 'active_record'
 require 'active_record/base'
 require 'active_record/connection_adapters/abstract_adapter'
+#
+# Patching {ActiveRecord::ConnectionAdapters::TableDefinition} and
+# {ActiveRecord::ConnectionAdapters::PostgreSQLAdapter} to add functionality
+# needed to abstract partition specific SQL statements.
+#
 module ActiveRecord::ConnectionAdapters
+  #
+  # Patches associated with building check constraints.
+  #
   class TableDefinition
+    #
+    # Builds a SQL check constraint
+    #
+    # @param [String] constraint a SQL constraint
     def check_constraint(constraint)
       @columns << Struct.new(:to_sql).new("CHECK (#{constraint})")
     end
   end
+  #
+  # Patches extending the postgres adapter with new operations for managing
+  # sequences (and sets of sequence values), schemas and foreign keys.
+  # These should go into AbstractAdapter allowing any database adapter
+  # to take advantage of these SQL builders.
+  #
   class PostgreSQLAdapter < AbstractAdapter
     #
-    # get the next value in a sequence.  used on INSERT operation for
+    # Get the next value in a sequence. Used on INSERT operation for
     # partitioning like by_id because the ID is required before the insert
     # so that the specific child table is known ahead of time.
     #
+    # @param [String] sequence_name the name of the sequence to fetch the next value from
+    # @return [Integer] the value from the sequence
     def next_sequence_value(sequence_name)
       return execute("select nextval('#{sequence_name}')").field_values("nextval").first
     end
     #
-    # get the some next values in a sequence.
-    # batch_size - count of values
+    # Get the some next values in a sequence.
     #
+    # @param [String] sequence_name the name of the sequence to fetch the next values from
+    # @param [Integer] batch_size count of values.
+    # @return [Array<Integer>] an array of values from the sequence
     def next_sequence_values(sequence_name, batch_size)
       result = execute("select nextval('#{sequence_name}') from generate_series(1, #{batch_size})")
       return result.field_values("nextval").map(&:to_i)
     end
     #
-    # causes active resource to fetch the primary key for the table (using next_sequence_value())
-    # just before an insert.  We need the prefetch to happen but we don't have enough information
+    # Causes active resource to fetch the primary key for the table (using next_sequence_value())
+    # just before an insert. We need the prefetch to happen but we don't have enough information
     # here to determine if it should happen, so Relation::insert has been modified to request of
     # the ActiveRecord::Base derived class if it requires a prefetch.
     #
+    # @param [String] table_name the table name to query
+    # @return [Boolean] returns true if the table should have its primary key prefetched.
     def prefetch_primary_key?(table_name)
       return false
     end
     #
-    # creates a schema given a name.
-    # options:
-    #   :unless_exists - check if schema exists.
+    # Creates a schema given a name.
     #
+    # @param [String] name the name of the schema.
+    # @param [Hash] options ({}) options for creating a schema
+    # @option options [Boolean] :unless_exists (false) check if schema exists.
+    # @return [optional] undefined
     def create_schema(name, options = {})
       if options[:unless_exists]
         return if execute("select count(*) from pg_namespace where nspname = '#{name}'").getvalue(0,0).to_i > 0
@@ -51,11 +77,13 @@ module ActiveRecord::ConnectionAdapters
     end
     #
-    # drop a schema given a name.
-    # options:
-    #   :if_exists - check if schema exists.
-    #   :cascade - cascade drop to dependant objects
+    # Drop a schema given a name.
     #
+    # @param [String] name the name of the schema.
+    # @param [Hash] options ({}) options for dropping a schema
+    # @option options [Boolean] :if_exists (false) check if schema exists.
+    # @option options [Boolean] :cascade (false) drop dependant objects
+    # @return [optional] undefined
     def drop_schema(name, options = {})
       if options[:if_exists]
         return if execute("select count(*) from pg_namespace where nspname = '#{name}'").getvalue(0,0).to_i == 0
@@ -64,8 +92,13 @@ module ActiveRecord::ConnectionAdapters
     end
     #
-    # add foreign key constraint to table.
+    # Add foreign key constraint to table.
     #
+    # @param [String] referencing_table_name the name of the table containing the foreign key
+    # @param [String] referencing_field_name the name of foreign key column
+    # @param [String] referenced_table_name the name of the table referenced by the foreign key
+    # @param [String] referenced_field_name (:id) the name of the column referenced by the foreign key
+    # @return [optional] undefined
     def add_foreign_key(referencing_table_name, referencing_field_name, referenced_table_name, referenced_field_name = :id)
       execute("ALTER TABLE #{referencing_table_name} add foreign key (#{referencing_field_name}) references #{referenced_table_name}(#{referenced_field_name})")
     end
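
Editorial sketch (not part of the gem): the adapter methods above build their SQL by string interpolation and pass it to execute. The outline below reproduces two of those strings as plain functions so the generated SQL can be inspected without a database. The *_sql method names are hypothetical. Because the values are interpolated unquoted, as in the original methods, they would need to come from trusted identifiers.

```ruby
# Text of the query issued by next_sequence_values: one nextval per
# generated row, so batch_size values come back from a single round trip.
def next_sequence_values_sql(sequence_name, batch_size)
  "select nextval('#{sequence_name}') from generate_series(1, #{Integer(batch_size)})"
end

# Text of the statement issued by add_foreign_key; the referenced column
# defaults to :id, as in the patched adapter method.
def add_foreign_key_sql(referencing_table_name, referencing_field_name,
                        referenced_table_name, referenced_field_name = :id)
  "ALTER TABLE #{referencing_table_name} add foreign key " \
    "(#{referencing_field_name}) references " \
    "#{referenced_table_name}(#{referenced_field_name})"
end

puts next_sequence_values_sql('employees_id_seq', 3)
puts add_foreign_key_sql('employees_partitions.p1', :company_id, :companies)
```

The second call corresponds to the README's note that the reference to the companies table has to be created explicitly for each Postgres child table.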

data/lib/partitioned/active_record_overrides.rb CHANGED

@@ -1,19 +1,26 @@
 #
-# these are things our base class must fix in ActiveRecord::Base
+# These are things our base class must fix in ActiveRecord::Base
 #
-# no need to monkey patch these, just override them.
+# No need to monkey patch these, just override them.
 #
 module Partitioned
+  #
+  # methods that need to be override in an ActiveRecord::Base derived class so that we can support partitioning
+  #
   module ActiveRecordOverrides
     #
     # arel_attribute_values needs to return attributes (and their values) associated with the dynamic_arel_table instead of the
     # static arel_table provided by ActiveRecord.
     #
-    # the standard release of this function gathers a collection of attributes and creates a wrapper function around them
+    # The standard release of this function gathers a collection of attributes and creates a wrapper function around them
     # that names the table they are associated with. that naming is incorrect for partitioned tables.
     #
-    # we call the standard release's method then retrofit our partitioned table into the hash that is returned.
+    # We call the standard releases method then retrofit our partitioned table into the hash that is returned.
     #
+    # @param [Boolean] include_primary_key (true)
+    # @param [Boolean] include_readonly_attributes (true)
+    # @param [Boolean] attribute_names (@attributes.keys)
+    # @return [Hash] hash of key value pairs associated with persistent attributes
     def arel_attributes_values(include_primary_key = true, include_readonly_attributes = true, attribute_names = @attributes.keys)
       attrs = super
       actual_arel_table = dynamic_arel_table(self.class.table_name)
@@ -21,8 +28,9 @@ module Partitioned
     end
     #
-    # delete just needs a wrapper around it to specify the specific partition.
+    # Delete just needs a wrapper around it to specify the specific partition.
     #
+    # @return [optional] undefined
     def delete
       if persisted?
         self.class.from_partition(*self.class.partition_key_values(attributes)).delete(id)