canvas_sync 0.26.1 → 0.27.1.beta2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +117 -20
- data/app/controllers/canvas_sync/api/v1/live_events_controller.rb +1 -0
- data/lib/canvas_sync/config.rb +1 -1
- data/lib/canvas_sync/importers/bulk_importer.rb +2 -0
- data/lib/canvas_sync/jobs/begin_sync_chain_job.rb +1 -1
- data/lib/canvas_sync/jobs/beta_cleanup/create_temp_tables_job.rb +30 -0
- data/lib/canvas_sync/jobs/beta_cleanup/delete_related_records_job.rb +125 -0
- data/lib/canvas_sync/jobs/beta_cleanup/delete_temp_tables_job.rb +16 -0
- data/lib/canvas_sync/jobs/report_starter.rb +33 -46
- data/lib/canvas_sync/jobs/report_sync_task.rb +273 -0
- data/lib/canvas_sync/jobs/sync_accounts_job.rb +10 -7
- data/lib/canvas_sync/jobs/sync_assignment_groups_job.rb +2 -15
- data/lib/canvas_sync/jobs/sync_assignment_overrides_job.rb +26 -14
- data/lib/canvas_sync/jobs/sync_assignments_job.rb +2 -15
- data/lib/canvas_sync/jobs/sync_content_migrations_job.rb +2 -15
- data/lib/canvas_sync/jobs/sync_context_module_items_job.rb +2 -15
- data/lib/canvas_sync/jobs/sync_context_modules_job.rb +2 -15
- data/lib/canvas_sync/jobs/sync_course_progresses_job.rb +2 -16
- data/lib/canvas_sync/jobs/sync_provisioning_report_job.rb +135 -14
- data/lib/canvas_sync/jobs/sync_rubric_assessments_job.rb +2 -10
- data/lib/canvas_sync/jobs/sync_rubric_associations_job.rb +2 -10
- data/lib/canvas_sync/jobs/sync_rubrics_job.rb +2 -10
- data/lib/canvas_sync/jobs/sync_scores_job.rb +2 -13
- data/lib/canvas_sync/jobs/sync_submissions_job.rb +9 -18
- data/lib/canvas_sync/jobs/term_batches_job.rb +4 -2
- data/lib/canvas_sync/version.rb +1 -1
- data/lib/canvas_sync.rb +31 -4
- data/spec/canvas_sync/canvas_sync_spec.rb +62 -22
- data/spec/canvas_sync/jobs/report_starter_spec.rb +102 -55
- data/spec/canvas_sync/jobs/report_sync_task_spec.rb +367 -0
- data/spec/canvas_sync/jobs/sync_provisioning_report_job_spec.rb +24 -35
- data/spec/canvas_sync/processors/assignment_groups_processor_spec.rb +3 -4
- data/spec/canvas_sync/processors/assignment_overrides_processor_spec.rb +7 -4
- data/spec/canvas_sync/processors/assignments_processor_spec.rb +3 -4
- data/spec/canvas_sync/processors/content_migrations_processor_spec.rb +3 -4
- data/spec/canvas_sync/processors/context_module_items_processor_spec.rb +4 -5
- data/spec/canvas_sync/processors/context_modules_processor_spec.rb +3 -4
- data/spec/canvas_sync/processors/course_completion_report_processor_spec.rb +7 -4
- data/spec/canvas_sync/processors/provisioning_report_processor_spec.rb +46 -24
- data/spec/canvas_sync/processors/rubric_assessments_spec.rb +3 -4
- data/spec/canvas_sync/processors/rubric_associations_spec.rb +3 -4
- data/spec/canvas_sync/processors/rubrics_processor_spec.rb +3 -4
- data/spec/canvas_sync/processors/submissions_processor_spec.rb +3 -4
- data/spec/factories/account_factory.rb +1 -1
- metadata +7 -33
- data/lib/canvas_sync/jobs/report_checker.rb +0 -108
- data/lib/canvas_sync/jobs/report_processor_job.rb +0 -35
- data/lib/canvas_sync/processors/assignment_groups_processor.rb +0 -19
- data/lib/canvas_sync/processors/assignment_overrides_processor.rb +0 -41
- data/lib/canvas_sync/processors/assignments_processor.rb +0 -19
- data/lib/canvas_sync/processors/content_migrations_processor.rb +0 -19
- data/lib/canvas_sync/processors/context_module_items_processor.rb +0 -19
- data/lib/canvas_sync/processors/context_modules_processor.rb +0 -19
- data/lib/canvas_sync/processors/course_completion_report_processor.rb +0 -20
- data/lib/canvas_sync/processors/provisioning_report_processor.rb +0 -149
- data/lib/canvas_sync/processors/rubric_assessments_processor.rb +0 -19
- data/lib/canvas_sync/processors/rubric_associations_processor.rb +0 -19
- data/lib/canvas_sync/processors/rubrics_processor.rb +0 -19
- data/lib/canvas_sync/processors/submissions_processor.rb +0 -19
- data/spec/canvas_sync/jobs/report_checker_spec.rb +0 -57
- data/spec/canvas_sync/jobs/report_processor_job_spec.rb +0 -25
- data/spec/canvas_sync/jobs/sync_assignment_groups_job_spec.rb +0 -18
- data/spec/canvas_sync/jobs/sync_assignments_job_spec.rb +0 -30
- data/spec/canvas_sync/jobs/sync_content_migrations_job_spec.rb +0 -30
- data/spec/canvas_sync/jobs/sync_context_module_items_job_spec.rb +0 -30
- data/spec/canvas_sync/jobs/sync_context_modules_job_spec.rb +0 -30
- data/spec/canvas_sync/jobs/sync_scores_job_spec.rb +0 -15
- data/spec/canvas_sync/jobs/sync_submissions_job_spec.rb +0 -23
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 50549b542e0425d35fc1effb9f505ce4a1d75e1517eb1e32b2a8bbffb00ee7de
|
|
4
|
+
data.tar.gz: 6442ed0b393e7e234984eb78dced6ef82aa388767ffba09f483de78b411e4ff5
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 7c5c55806050f7fb686d7331d39b611def2783c0f9db57c6437cea99df30d69038671b3fc9b7b1bc9e4c4065e4a1ce8eb164db92cf4a40399c823e2d1e0f28a7
|
|
7
|
+
data.tar.gz: f743ab5316387d8a0519e29d8675e2246ace7cefdb8d9acff9a5d2ed42e3cc0fce55287a2c779105f7b51502331aac3cb75bb900a38096247626b57274c92c47
|
data/README.md
CHANGED
|
@@ -125,9 +125,9 @@ These jobs can also be generated from template using `bin/rails generate canvas_
|
|
|
125
125
|
|
|
126
126
|
This gem also helps with syncing and processing other reports if needed. In order to do so, you must:
|
|
127
127
|
|
|
128
|
-
-
|
|
129
|
-
-
|
|
130
|
-
-
|
|
128
|
+
- Create a job class that extends `CanvasSync::Jobs::ReportSyncTask`
|
|
129
|
+
- Define the report name and implement a `process` method for handling the results
|
|
130
|
+
- Optionally override `report_parameters` for custom report parameters
|
|
131
131
|
|
|
132
132
|
### `updated_after`
|
|
133
133
|
An `updated_after` param may be passed when triggering a provision or making a chain:
|
|
@@ -183,38 +183,73 @@ chain.insert({ job: SomeOtherJob }, after: 'CanvasSync::Jobs::SyncCoursesJob') #
|
|
|
183
183
|
chain.get_sub_chain('CanvasSync::Jobs::SyncTermsJob')
|
|
184
184
|
```
|
|
185
185
|
|
|
186
|
-
###
|
|
186
|
+
### Custom Report Sync Jobs
|
|
187
187
|
|
|
188
|
-
|
|
188
|
+
To create a custom report sync job, extend `CanvasSync::Jobs::ReportSyncTask` and define your report. The gem automatically handles:
|
|
189
|
+
- Starting the Canvas report
|
|
190
|
+
- Polling for completion
|
|
191
|
+
- Downloading the report file
|
|
192
|
+
- Error handling and retries
|
|
193
|
+
- Timeout management
|
|
194
|
+
|
|
195
|
+
Let's say we have a custom Canvas report called "my_really_cool_report_csv". Here's how to create a sync job:
|
|
189
196
|
|
|
190
197
|
```ruby
|
|
191
|
-
class
|
|
192
|
-
|
|
193
|
-
|
|
198
|
+
class MyReallyCoolReportJob < CanvasSync::Jobs::ReportSyncTask
|
|
199
|
+
# Define the Canvas report name
|
|
200
|
+
report_name "my_really_cool_report_csv"
|
|
201
|
+
|
|
202
|
+
# Optional: Override report parameters
|
|
203
|
+
def report_parameters
|
|
204
|
+
super.merge(
|
|
205
|
+
"course_ids" => [1,2,3],
|
|
206
|
+
"param2" => "value"
|
|
207
|
+
)
|
|
208
|
+
end
|
|
209
|
+
|
|
210
|
+
# Implement the process method to handle the downloaded report
|
|
211
|
+
def process(file)
|
|
212
|
+
# file is the path to the downloaded report
|
|
213
|
+
puts "I downloaded a report to #{file}! Isn't that neat!"
|
|
214
|
+
# Add your processing logic here, e.g.:
|
|
215
|
+
# do_bulk_import(file, MyModel)
|
|
194
216
|
end
|
|
195
217
|
end
|
|
196
218
|
```
|
|
197
219
|
|
|
198
|
-
|
|
220
|
+
#### Examples
|
|
199
221
|
|
|
200
|
-
|
|
222
|
+
For simple CSV reports (single model):
|
|
223
|
+
```ruby
|
|
224
|
+
class SyncRubricAssessmentsJob < CanvasSync::Jobs::ReportSyncTask
|
|
225
|
+
report_name "rubric_assessments_csv"
|
|
201
226
|
|
|
202
|
-
|
|
227
|
+
def process(file)
|
|
228
|
+
do_bulk_import(file, RubricAssessment)
|
|
229
|
+
end
|
|
230
|
+
end
|
|
231
|
+
```
|
|
203
232
|
|
|
233
|
+
For ZIP reports (multiple models like provisioning reports):
|
|
204
234
|
```ruby
|
|
205
|
-
class
|
|
206
|
-
|
|
207
|
-
|
|
208
|
-
|
|
209
|
-
|
|
210
|
-
|
|
211
|
-
|
|
212
|
-
|
|
235
|
+
class SyncProvisioningReportJob < CanvasSync::Jobs::ReportSyncTask
|
|
236
|
+
report_name "provisioning_csv"
|
|
237
|
+
|
|
238
|
+
def report_parameters
|
|
239
|
+
...
|
|
240
|
+
end
|
|
241
|
+
|
|
242
|
+
def process(file)
|
|
243
|
+
# Handle ZIP extraction and process each model
|
|
244
|
+
# See lib/canvas_sync/jobs/sync_provisioning_report_job.rb for full example
|
|
213
245
|
end
|
|
214
246
|
end
|
|
215
247
|
```
|
|
216
248
|
|
|
217
|
-
You can also see examples in
|
|
249
|
+
You can also see more examples in:
|
|
250
|
+
- `lib/canvas_sync/jobs/sync_provisioning_report_job.rb` (ZIP report with multiple models)
|
|
251
|
+
- `lib/canvas_sync/jobs/sync_rubric_assessments_job.rb` (Simple CSV report)
|
|
252
|
+
- `lib/canvas_sync/jobs/sync_course_progresses_job.rb` (Simple CSV report)
|
|
218
253
|
|
|
219
254
|
### Batching
|
|
220
255
|
|
|
@@ -322,6 +357,68 @@ end
|
|
|
322
357
|
|
|
323
358
|
`before_jit_sync` is provided as well, but its use case is niche. It can `throw :jit_found, record` to abort the rest of the JIT process and instead return a specific record. Again, should be quite niche.
|
|
324
359
|
|
|
360
|
+
### UserViaPseudonym
|
|
361
|
+
|
|
362
|
+
CanvasSync provides the `CanvasSync::Concerns::UserViaPseudonym` concern for models that need to associate with Canvas users through their pseudonym (login) records. This is particularly useful so that records will always be linked to the correct user, regardless of whether users are merged in Canvas.
|
|
363
|
+
|
|
364
|
+
#### Basic Usage
|
|
365
|
+
|
|
366
|
+
Include the concern in your model and use the `belongs_to_user` class method:
|
|
367
|
+
|
|
368
|
+
```ruby
|
|
369
|
+
class MyModel < ApplicationRecord
|
|
370
|
+
include CanvasSync::Concerns::UserViaPseudonym
|
|
371
|
+
|
|
372
|
+
# Associate with user using Canvas ID
|
|
373
|
+
belongs_to_user :teacher
|
|
374
|
+
|
|
375
|
+
# Or associate using SIS ID
|
|
376
|
+
belongs_to_user :student, using_sis_id: true
|
|
377
|
+
end
|
|
378
|
+
```
|
|
379
|
+
|
|
380
|
+
This translates largely into:
|
|
381
|
+
```ruby
|
|
382
|
+
belongs_to :teacher_pseudonym, class_name: "Pseudonym"
|
|
383
|
+
has_one :teacher, through: :pseudonym, class_name: "User"
|
|
384
|
+
|
|
385
|
+
def user=(u)
|
|
386
|
+
self.pseudonym = u.pseudonyms.active.last || u.pseudonyms.last
|
|
387
|
+
end
|
|
388
|
+
```
|
|
389
|
+
|
|
390
|
+
#### Cached User IDs (Performance Optimization)
|
|
391
|
+
|
|
392
|
+
For better performance, you can add a cached user ID column to your table. This eliminates the need for joins when you just need the user ID:
|
|
393
|
+
|
|
394
|
+
```ruby
|
|
395
|
+
# In your migration:
|
|
396
|
+
add_column :my_models, :cached_teacher_user_id, :bigint
|
|
397
|
+
add_index :my_models, :cached_teacher_user_id
|
|
398
|
+
|
|
399
|
+
# In your model:
|
|
400
|
+
class MyModel < ApplicationRecord
|
|
401
|
+
include CanvasSync::Concerns::UserViaPseudonym
|
|
402
|
+
|
|
403
|
+
belongs_to_user :teacher, cache_column: :cached_teacher_user_id
|
|
404
|
+
end
|
|
405
|
+
```
|
|
406
|
+
|
|
407
|
+
##### Refreshing Cached User IDs
|
|
408
|
+
|
|
409
|
+
When using cached columns, you can refresh them in bulk:
|
|
410
|
+
|
|
411
|
+
```ruby
|
|
412
|
+
# Refresh all records
|
|
413
|
+
MyModel.update_cached_user_ids!
|
|
414
|
+
|
|
415
|
+
# Or refresh specific records
|
|
416
|
+
MyModel.where(some_condition: true).update_cached_user_ids!
|
|
417
|
+
```
|
|
418
|
+
|
|
419
|
+
CanvasSync will also do this automatically after syncing.
|
|
420
|
+
|
|
421
|
+
|
|
325
422
|
### Job Batching
|
|
326
423
|
CanvasSync adds a `CanvasSync::JobBatches` module. It adds Sidekiq/sidekiq-batch like support for Job Batches.
|
|
327
424
|
It integrates automatically with both Sidekiq and ActiveJob. The API is highly similar to the Sidekiq-batch implementation,
|
|
@@ -111,6 +111,7 @@ module CanvasSync::Api
|
|
|
111
111
|
end
|
|
112
112
|
|
|
113
113
|
def switch_tenant(&block)
|
|
114
|
+
# TODO Move detection of this param into the PandaPal Apartment Elevator
|
|
114
115
|
if defined?(PandaPal) && (org = params[:organization] || params[:org]).present?
|
|
115
116
|
org = PandaPal::Organization.find(org)
|
|
116
117
|
if org.respond_to?(:switch_tenant)
|
data/lib/canvas_sync/config.rb
CHANGED
|
@@ -2,7 +2,7 @@ module CanvasSync
|
|
|
2
2
|
class Config
|
|
3
3
|
include ActiveSupport::Configurable
|
|
4
4
|
|
|
5
|
-
config_accessor(:classes_to_only_log_errors_on) { ["CanvasSync::Jobs::ReportChecker"] }
|
|
5
|
+
config_accessor(:classes_to_only_log_errors_on) { ["CanvasSync::Jobs::ReportChecker", "CanvasSync::Jobs::ReportSyncTask::CheckerJob"] }
|
|
6
6
|
config_accessor(:redis_key_prefix) { "cs" }
|
|
7
7
|
end
|
|
8
8
|
end
|
|
@@ -45,7 +45,7 @@ module CanvasSync
|
|
|
45
45
|
b.on(:success, "#{self.class.to_s}.batch_completed", sync_batch_id: sync_batch.id)
|
|
46
46
|
b.context = globals
|
|
47
47
|
b.jobs do
|
|
48
|
-
JobBatches::SerialBatchJob.perform_now(chain_definition)
|
|
48
|
+
JobBatches::SerialBatchJob.perform_now(chain_definition, description: "Top Serial Batch")
|
|
49
49
|
end
|
|
50
50
|
sync_batch.update(batch_bid: b.bid)
|
|
51
51
|
end
|
|
@@ -0,0 +1,30 @@
|
|
|
1
|
+
module CanvasSync
|
|
2
|
+
module Jobs::BetaCleanup
|
|
3
|
+
# This job creates "temporary" tables that holds the records that were updated since a given date
|
|
4
|
+
# and deletes those records from the main tables.
|
|
5
|
+
# The tables are actual tables and not the temporary tables in the DB sense
|
|
6
|
+
# because they need to be reused in other contexts.
|
|
7
|
+
# They are deleted in a different job at a later point
|
|
8
|
+
class CreateTempTablesJob < CanvasSync::Job
|
|
9
|
+
def perform(options = {})
|
|
10
|
+
canvas_tables = options[:models]
|
|
11
|
+
|
|
12
|
+
return if canvas_tables.empty?
|
|
13
|
+
|
|
14
|
+
canvas_tables.each do |table_name|
|
|
15
|
+
model = table_name.singularize.camelize.constantize
|
|
16
|
+
|
|
17
|
+
updated_after = options[:updated_after] || 2.weeks.ago
|
|
18
|
+
ActiveRecord::Base.connection.drop_table("beta_#{table_name}", if_exists: true)
|
|
19
|
+
ActiveRecord::Base.connection.exec_query(<<~SQL
|
|
20
|
+
CREATE TABLE beta_#{table_name} AS
|
|
21
|
+
SELECT * FROM #{model.quoted_table_name} WHERE updated_at >= '#{updated_after}';
|
|
22
|
+
SQL
|
|
23
|
+
)
|
|
24
|
+
model.where("updated_at >= '#{updated_after}'").delete_all
|
|
25
|
+
end
|
|
26
|
+
end
|
|
27
|
+
end
|
|
28
|
+
end
|
|
29
|
+
end
|
|
30
|
+
|
|
@@ -0,0 +1,125 @@
|
|
|
1
|
+
module CanvasSync
|
|
2
|
+
module Jobs::BetaCleanup
|
|
3
|
+
class DeleteRelatedRecordsJob < CanvasSync::Job
|
|
4
|
+
# This method assumes that the main tables have been dumped into temporary tables (until provided date range, e.g 2 weeks)
|
|
5
|
+
# and that the main tables have been re-synced
|
|
6
|
+
# Considering these two criteria, assuming we have the main table now with
|
|
7
|
+
# records [1, 2, 3] and the temp table with records [1, 2, 3, 4], this means that
|
|
8
|
+
# record '4' needs to be taken out of related tables.
|
|
9
|
+
# This could for example be a record with a 'canvas_user_id = 4' in a table that is not part of CanvasSync models
|
|
10
|
+
# When this is implemented in CanvasSync, we could store the cleanup id and prefix the table name with it
|
|
11
|
+
# e.g: beta_cleanup_1_users
|
|
12
|
+
def delete_matching_records_between_main_and_temp
|
|
13
|
+
@canvas_tables.each do |table_name|
|
|
14
|
+
model = table_name.singularize.camelize.constantize
|
|
15
|
+
|
|
16
|
+
# here we can safely use the id to find the records we want - the primary keys are preserved in the temporary table
|
|
17
|
+
ActiveRecord::Base.connection.exec_query(<<~SQL
|
|
18
|
+
DELETE FROM beta_#{table_name} WHERE canvas_id IN (SELECT canvas_id FROM #{model.quoted_table_name});
|
|
19
|
+
SQL
|
|
20
|
+
)
|
|
21
|
+
end
|
|
22
|
+
end
|
|
23
|
+
|
|
24
|
+
# possible foreign keys per table. Some LTI tables may or may not always use the Canvas ID
|
|
25
|
+
#
|
|
26
|
+
# Some tables like user_observers have multiple user_id in them (e.g 'observed_user_id' and 'observing_user_id')
|
|
27
|
+
# Not sure if this matters here, to investigate
|
|
28
|
+
def foreign_keys
|
|
29
|
+
@foreign_keys ||= @canvas_tables.map do |table_name|
|
|
30
|
+
model_name = table_name.singularize
|
|
31
|
+
fks = ["#{model_name}_id", "canvas_#{model_name}_id"]
|
|
32
|
+
fks.each { |fk| foreign_key_to_table[fk] = table_name }
|
|
33
|
+
fks
|
|
34
|
+
end.flatten
|
|
35
|
+
end
|
|
36
|
+
|
|
37
|
+
# Find better names for these next two methods
|
|
38
|
+
# fk => table_name (1:1)
|
|
39
|
+
def foreign_key_to_table
|
|
40
|
+
@foreign_key_to_table ||= {}.with_indifferent_access
|
|
41
|
+
end
|
|
42
|
+
|
|
43
|
+
# fk => table_name[] (1:many)
|
|
44
|
+
def foreign_key_to_tables
|
|
45
|
+
h = {}.with_indifferent_access
|
|
46
|
+
ActiveRecord::Base.connection.tables.map do |lti_table|
|
|
47
|
+
# ignore the models we just synced or the beta tables
|
|
48
|
+
# This may not be super reliable in case someone calls a table 'beta_xyz'
|
|
49
|
+
next if @canvas_tables.include?(lti_table.gsub(/^beta_/, ''))
|
|
50
|
+
|
|
51
|
+
columns = ActiveRecord::Base.connection.columns(lti_table).map(&:name)
|
|
52
|
+
cols_in_table = foreign_keys & columns
|
|
53
|
+
|
|
54
|
+
next if cols_in_table.empty?
|
|
55
|
+
|
|
56
|
+
cols_in_table.map do |col|
|
|
57
|
+
h[col] ||= []
|
|
58
|
+
h[col] << lti_table
|
|
59
|
+
end
|
|
60
|
+
end
|
|
61
|
+
@foreign_key_to_tables ||= h
|
|
62
|
+
end
|
|
63
|
+
|
|
64
|
+
def active_tables
|
|
65
|
+
# check the count of each "beta_" table
|
|
66
|
+
active_tables_sql = @canvas_tables.map do |table|
|
|
67
|
+
"SELECT 'beta_#{table}' AS table_name FROM beta_#{table} GROUP BY table_name HAVING COUNT(*) > 0"
|
|
68
|
+
end.join(' UNION ALL ')
|
|
69
|
+
|
|
70
|
+
# active tables are tables with at least one record
|
|
71
|
+
# If a beta table has no rows, we can ignore it in the next step
|
|
72
|
+
active_tables = ActiveRecord::Base.connection.exec_query(active_tables_sql).rows.flatten
|
|
73
|
+
end
|
|
74
|
+
|
|
75
|
+
def perform(options = {})
|
|
76
|
+
canvas_tables = options[:models] || []
|
|
77
|
+
@canvas_tables = canvas_tables
|
|
78
|
+
|
|
79
|
+
return [] if canvas_tables.empty?
|
|
80
|
+
|
|
81
|
+
delete_matching_records_between_main_and_temp
|
|
82
|
+
|
|
83
|
+
# every table is cleaned by foreign key to be idempotent in case of job failures (instead of cleaned by table)
|
|
84
|
+
# we can not clean by table because multiple tables may be using the same 'user_id' foreign key for example
|
|
85
|
+
foreign_key_to_tables.each do |fk, table_names|
|
|
86
|
+
temp_related_table = "beta_#{foreign_key_to_table[fk]}"
|
|
87
|
+
next unless active_tables.include?(temp_related_table)
|
|
88
|
+
|
|
89
|
+
pk = fk.include?('canvas') ? 'canvas_id' : 'id'
|
|
90
|
+
|
|
91
|
+
# Ideally this commented CTE is used so this process is idempotent but there is an issue when there are two tables with, for example user_id & canvas_user_id respectively
|
|
92
|
+
# The user_id foreign key will be processed and will delete the temporary records, which means the second foreign key, canvas_user_id, will not be able to retrieve any records from it
|
|
93
|
+
# I don't think idempotency is such a big deal here because:
|
|
94
|
+
# 1) this runs only in beta instances
|
|
95
|
+
# 2) it runs for data within (most likely) the last 2 weeks, which means not that many records
|
|
96
|
+
# In the meantime, we can use the CTE that does not delete records
|
|
97
|
+
=begin
|
|
98
|
+
sql = <<~SQL
|
|
99
|
+
WITH stale_records AS (
|
|
100
|
+
DELETE FROM #{temp_related_table}
|
|
101
|
+
WHERE #{pk} IN (
|
|
102
|
+
SELECT #{pk} FROM #{temp_related_table} ORDER BY #{pk} LIMIT 1000
|
|
103
|
+
)
|
|
104
|
+
RETURNING #{pk}
|
|
105
|
+
),
|
|
106
|
+
SQL
|
|
107
|
+
=end
|
|
108
|
+
sql = <<~SQL
|
|
109
|
+
WITH stale_records AS (
|
|
110
|
+
SELECT #{pk} FROM #{temp_related_table} ORDER BY #{pk} LIMIT 1000
|
|
111
|
+
),
|
|
112
|
+
SQL
|
|
113
|
+
|
|
114
|
+
# using CTEs for multiple deletion statements in one query
|
|
115
|
+
# This allows using the same 'stale_records' CTE for all of the delete statements
|
|
116
|
+
sql << table_names.map { |table_name| "#{table_name}_del AS ( DELETE FROM #{table_name} WHERE #{fk} IN (SELECT #{pk} FROM stale_records) )" }.join(",\n")
|
|
117
|
+
sql << 'SELECT count(*) AS deleted_count FROM stale_records'
|
|
118
|
+
ActiveRecord::Base.transaction do # transaction to use the CTE multiple times
|
|
119
|
+
ActiveRecord::Base.connection.exec_query(sql)
|
|
120
|
+
end
|
|
121
|
+
end
|
|
122
|
+
end
|
|
123
|
+
end
|
|
124
|
+
end
|
|
125
|
+
end
|
|
@@ -0,0 +1,16 @@
|
|
|
1
|
+
module CanvasSync
|
|
2
|
+
module Jobs::BetaCleanup
|
|
3
|
+
class DeleteTempTablesJob < CanvasSync::Job
|
|
4
|
+
|
|
5
|
+
def perform(options = {})
|
|
6
|
+
tables = options[:models] || []
|
|
7
|
+
return if tables.empty?
|
|
8
|
+
|
|
9
|
+
tables.each do |table_name|
|
|
10
|
+
ActiveRecord::Base.connection.drop_table(table_name, if_exists: true)
|
|
11
|
+
end
|
|
12
|
+
end
|
|
13
|
+
end
|
|
14
|
+
end
|
|
15
|
+
end
|
|
16
|
+
|
|
@@ -1,8 +1,12 @@
|
|
|
1
|
+
require_relative "./report_sync_task"
|
|
2
|
+
|
|
1
3
|
module CanvasSync
|
|
2
4
|
module Jobs
|
|
3
|
-
#
|
|
5
|
+
# @deprecated Use ReportSyncTask instead
|
|
6
|
+
# This class is now a shim that delegates to LegacyReportShimTask for backwards compatibility.
|
|
7
|
+
# ReportChecker and ReportProcessorJob are no longer used when going through this shim.
|
|
4
8
|
class ReportStarter < CanvasSync::Job
|
|
5
|
-
# @param report_name [
|
|
9
|
+
# @param report_name [String] e.g., 'provisioning_csv'
|
|
6
10
|
# @param report_params [Hash] The Canvas report parameters
|
|
7
11
|
# @param processor [String] a stringified report processor class name
|
|
8
12
|
# @param options [Hash] hash of options that will be passed to the job processor
|
|
@@ -10,27 +14,17 @@ module CanvasSync
|
|
|
10
14
|
# so that any later jobs in the chain will use the same generated report
|
|
11
15
|
# @return [nil]
|
|
12
16
|
def perform(report_name, report_params, processor, options, allow_redownloads: false)
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
# start_report(account_id, report_name, report_params)
|
|
22
|
-
# end
|
|
17
|
+
# Merge the legacy configuration into options that will be passed through batch_context
|
|
18
|
+
merged_options = options.merge({
|
|
19
|
+
legacy_report_starter: {
|
|
20
|
+
report: report_name,
|
|
21
|
+
params: report_params,
|
|
22
|
+
processor: processor,
|
|
23
|
+
},
|
|
24
|
+
})
|
|
23
25
|
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
batch.jobs do
|
|
27
|
-
CanvasSync::Jobs::ReportChecker.set(wait: report_checker_wait_time).perform_later(
|
|
28
|
-
report_name,
|
|
29
|
-
report_id,
|
|
30
|
-
processor.to_s,
|
|
31
|
-
options
|
|
32
|
-
)
|
|
33
|
-
end
|
|
26
|
+
# Call perform_later on the shim task class
|
|
27
|
+
LegacyReportShimTask.perform_later(merged_options)
|
|
34
28
|
end
|
|
35
29
|
|
|
36
30
|
protected
|
|
@@ -40,35 +34,28 @@ module CanvasSync
|
|
|
40
34
|
# merge_report_params(options, params, {}) is used. That doesn't work in Ruby 3.
|
|
41
35
|
# In order to maintain compatibility with 2 and with any apps, this oddness is needed
|
|
42
36
|
def merge_report_params(options, params={}, _kw_placeholder=nil, term_scope: true)
|
|
43
|
-
|
|
44
|
-
if term_scope.present?
|
|
45
|
-
params[:enrollment_term_id] = term_scope
|
|
46
|
-
end
|
|
47
|
-
if (updated_after = batch_context[:updated_after]).present?
|
|
48
|
-
params[:updated_after] = updated_after
|
|
49
|
-
end
|
|
50
|
-
params.merge!(options[:report_params]) if options[:report_params].present?
|
|
51
|
-
params.merge!(options[:report_parameters]) if options[:report_parameters].present?
|
|
52
|
-
{ parameters: params }
|
|
37
|
+
LegacyReportShimTask.merge_report_params(options, params, term_scope: term_scope)
|
|
53
38
|
end
|
|
39
|
+
end
|
|
54
40
|
|
|
55
|
-
|
|
41
|
+
# Internal shim task that reads configuration from batch_context to support the old ReportStarter API
|
|
42
|
+
class LegacyReportShimTask < CanvasSync::Jobs::ReportSyncTask
|
|
43
|
+
def report_name
|
|
44
|
+
options[:legacy_report_starter][:report]
|
|
45
|
+
end
|
|
56
46
|
|
|
57
|
-
def
|
|
58
|
-
|
|
59
|
-
if job_chain[:global_options][report_name].present?
|
|
60
|
-
job_chain[:global_options][report_name]
|
|
61
|
-
else
|
|
62
|
-
report_id = start_report(job_chain, account_id, report_name, report_params)
|
|
63
|
-
job_chain[:global_options][report_name] = report_id
|
|
64
|
-
report_id
|
|
65
|
-
end
|
|
47
|
+
def report_parameters
|
|
48
|
+
options[:legacy_report_starter][:params]
|
|
66
49
|
end
|
|
67
50
|
|
|
68
|
-
def
|
|
69
|
-
|
|
70
|
-
|
|
71
|
-
|
|
51
|
+
def process(file)
|
|
52
|
+
processor_class_name = options[:legacy_report_starter][:processor]
|
|
53
|
+
processor_class = processor_class_name.constantize
|
|
54
|
+
account_id = options[:account_id] || batch_context[:account_id] || "self"
|
|
55
|
+
|
|
56
|
+
# The old processor signature: process(file_path, options, report_id)
|
|
57
|
+
# Note: The third param was report_id in ReportProcessorJob but was account_id in practice
|
|
58
|
+
processor_class.process(file, options, account_id)
|
|
72
59
|
end
|
|
73
60
|
end
|
|
74
61
|
end
|