RubyGems: json_data_extractor 0.1.04 → 0.1.05

Files changed (9)

checksums.yaml +4 -4
data/CHANGELOG.md +20 -0
data/README.md +113 -0
data/json_data_extractor.gemspec +1 -1
data/lib/json_data_extractor/extractor.rb +60 -1
data/lib/json_data_extractor/schema_cache.rb +30 -0
data/lib/json_data_extractor/version.rb +1 -1
data/lib/json_data_extractor.rb +10 -1
metadata +6 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 78d2adf9786c1444ad0307cbc8b5a613be04a8b4ace56c5b1973f619fed58177
-  data.tar.gz: 78bdd81ae9c8b6bb68742b64c7d389fa4844fad20b498cd927ca3dd85a03e9a6
+  metadata.gz: 1a833b6de5421f4871a5a36814d6f304a495c16b65b8d27d60f1839317bd26ac
+  data.tar.gz: 2e1a98f860ad8f52fd81aa9003b1e2a0fb079bbfdeaef51380625d7e25e83b2c
 SHA512:
-  metadata.gz: ac6bc721be1214813aecefc887cfddb7cfb6b55730506dec85fd853d35a01cd403fd1726f93bbabe7af8ed1ad25763df42e8b41ca8794e3a602f87aff040165e
-  data.tar.gz: 7c1ba2814904cba8d1041f652d313eb097c2c02e59ccd386abd95cbc64b32414ddfff8e2cf7dfb57c98045b1bb50631490ebde0adaede4b14b4455d1fe9b878e
+  metadata.gz: 1c408a29566f5999e7ccb251860442b4b7c1cbcee391a72173753bf09c30e9f6e0d78ae18af28c95cd8bc2708dbb7f180493f038221a365745f7f587da08c2a8
+  data.tar.gz: 8c936a46b176ebe7dbc365b79d4a267fb3217b16f5f88280a040f3b951b5a8f65a70384c9f025853ec90d8de5c41e0ac1b3545de2b0ed2ba5c64bc8405240fc8

data/CHANGELOG.md ADDED

@@ -0,0 +1,20 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.1.05] -  2025-05-13
+### Added
+- Added schema reuse functionality for improved performance when processing multiple data objects with the same schema
+  - New `JsonDataExtractor.with_schema` class method to create an extractor with a pre-processed schema
+  - New `SchemaCache` class to store and reuse schema information
+  - New `extract_from` method to extract data using a cached schema
+- Performance improvements by pre-compiling JsonPath objects and caching schema elements
+## [0.1.04] - 2025-04-26
+- Use Oj for json dump
+- Use json path caching

data/README.md CHANGED

@@ -1,5 +1,7 @@
 # JsonDataExtractor
+[![Gem Version](https://badge.fury.io/rb/json_data_extractor.svg)](https://badge.fury.io/rb/json_data_extractor)
 Transform JSON data structures with the help of a simple schema and JsonPath expressions.
 Use the JsonDataExtractor gem to extract and modify data from complex JSON structures using a
 straightforward syntax
@@ -325,6 +327,117 @@ E.g. this is a valid real-life schema with nested data:
 Nested schema can be also applied to objects, not arrays. See specs for more examples.
+### Schema Reuse for Performance
+When processing multiple data objects with the same schema, JsonDataExtractor provides an optimized approach that avoids redundant schema processing. This is particularly useful for batch processing scenarios where you need to apply the same transformation to multiple data objects.
+#### Using `with_schema` and `extract_from`
+Instead of creating a new extractor for each data object:
+```ruby
+data_objects.map do |data|
+  extractor = JsonDataExtractor.new(data)
+  extractor.extract(schema)
+end
+```
+You can create a single extractor with a pre-processed schema and reuse it:
+```ruby
+extractor = JsonDataExtractor.with_schema(schema)
+data_objects.map do |data|
+  extractor.extract_from(data)
+end
+```
+This approach offers significant performance improvements for large datasets by:
+1. Pre-processing the schema only once
+2. Pre-compiling JsonPath objects
+3. Caching schema elements
+4. Avoiding redundant schema validation
+#### Comparison with Nested Schema Approach
+It's worth noting that similar functionality could be achieved using the existing nested schema approach when your data is already in an array format:
+```ruby
+# Process an array of locations with the nested schema approach
+locations_array = [location1, location2, location3]
+schema = {
+  all_locations: {
+    path: "[*]",
+    type: "array",
+    schema: { code: ".iataCode", city: ".city", name: ".name" }
+  }
+}
+result = JsonDataExtractor.new(locations_array).extract(schema)
+# Result: { all_locations: [{code: "...", city: "...", name: "..."}, {...}, {...}] }
+```
+**When to use which approach:**
+1. **Use nested schema when:**
+   - Your data is already structured as an array
+   - You want to preserve the array structure in your result
+   - You need to process the entire array at once
+2. **Use schema reuse when:**
+   - You receive data objects individually (e.g., from multiple API calls)
+   - You need to process each object separately
+   - You want to transform each object independently
+   - You need direct access to individual results without unwrapping them from an array
+The schema reuse approach is specifically optimized for scenarios where you process similar objects multiple times in sequence, rather than all at once in an array.
+#### Real-world Example
+Here's a practical example of extracting location data from multiple sources:
+```ruby
+# Location data from an API
+locations = [
+  {
+    "iataCode" => "JFK",
+    "countryCode" => "US",
+    "city" => "New York",
+    "name" => "John F. Kennedy International Airport"
+  },
+  {
+    "iataCode" => "LHR",
+    "countryCode" => "GB",
+    "city" => "London",
+    "name" => "Heathrow Airport"
+  }
+]
+# Define schema once
+schema = {
+  code: "$.iataCode",
+  city: "$.city",
+  name: "$.name",
+  country: "$.countryCode"
+}
+# Create an extractor with the schema
+jde = JsonDataExtractor.with_schema(schema)
+# Process each location efficiently
+processed_locations = locations.map do |data|
+  jde.extract_from(data)
+end
+# Result:
+# [
+#   {code: "JFK", city: "New York", name: "John F. Kennedy International Airport", country: "US"},
+#   {code: "LHR", city: "London", name: "Heathrow Airport", country: "GB"}
+# ]
+```
+This pattern is especially beneficial when:
+- Processing data in batches that arrive separately
+- Working with large datasets where you need to process one item at a time
+- Applying the same schema to multiple API responses
+- Parsing large collections of similar objects that aren't already in an array structure
 ## Configuration Options
 The JsonDataExtractor gem provides a configuration option to control the behavior when encountering
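
The README's argument for schema reuse (pre-process once, then apply many times) can be illustrated without the gem. The sketch below is a minimal stand-in under stated assumptions: `ToyPath`, `extract_each_time` and `extract_with_reuse` are hypothetical names invented here, `ToyPath` only understands the `$.key` form, and nothing in it reflects how JsonDataExtractor or JsonPath is actually implemented. It only counts how often a path gets "compiled" in each style.

```ruby
# Hypothetical stand-in for a compiled path object. It is NOT the gem's
# JsonPath class and only understands the "$.key" form from the README example.
class ToyPath
  @compile_count = 0

  class << self
    attr_accessor :compile_count
  end

  def initialize(expression)
    self.class.compile_count += 1 # count how often "compilation" happens
    @key = expression.delete_prefix('$.')
  end

  # Looks the key up in a plain Hash.
  def on(data)
    data[@key]
  end
end

# Per-object style: every object triggers a fresh compile of every path.
def extract_each_time(objects, schema)
  objects.map do |data|
    schema.to_h { |key, expression| [key, ToyPath.new(expression).on(data)] }
  end
end

# Reuse style: compile the schema once, then apply it to each object.
def extract_with_reuse(objects, schema)
  compiled = schema.transform_values { |expression| ToyPath.new(expression) }
  objects.map do |data|
    compiled.transform_values { |path| path.on(data) }
  end
end

locations = [
  { 'iataCode' => 'JFK', 'city' => 'New York' },
  { 'iataCode' => 'LHR', 'city' => 'London' }
]
schema = { code: '$.iataCode', city: '$.city' }

ToyPath.compile_count = 0
extract_each_time(locations, schema)
puts "per-object compiles: #{ToyPath.compile_count}" # 2 objects x 2 paths

ToyPath.compile_count = 0
extract_with_reuse(locations, schema)
puts "reuse compiles: #{ToyPath.compile_count}" # 2 paths, compiled once
```

In this toy model the per-object style compiles once per object per path, while the reuse style compiles once per path regardless of how many objects follow. Whether the real gain is significant depends on how expensive schema processing is relative to extraction, which this sketch does not measure.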

data/json_data_extractor.gemspec CHANGED

@@ -29,7 +29,7 @@ transformations. The schema is defined as a simple Ruby hash that maps keys to p
   spec.add_development_dependency 'amazing_print'
   spec.add_development_dependency 'bundler'
   spec.add_development_dependency 'pry'
-  spec.add_development_dependency 'rake', '~> 10.0'
+  spec.add_development_dependency 'rake', '~> 12.3.3'
   spec.add_development_dependency 'rspec', '~> 3.0'
   spec.add_development_dependency 'rubocop'
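
The rake constraint changes from `~> 10.0` to `~> 12.3.3`. With RubyGems' pessimistic operator the number of segments matters: `~> 10.0` allows any 10.x release, while `~> 12.3.3` only allows 12.3.x releases from 12.3.3 upward. A quick way to check this is with `Gem::Requirement`, which ships with Ruby. The helper name `allowed?` is made up for this sketch, and the version strings are illustrative inputs that need not correspond to actual rake releases.

```ruby
require 'rubygems'

# Hypothetical helper: does the version string satisfy the requirement string?
def allowed?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

# Old constraint: two segments, so the minor version may float.
puts allowed?('~> 10.0', '10.5.0')    # true
puts allowed?('~> 10.0', '11.0.0')    # false

# New constraint: three segments, so only the patch version may float.
puts allowed?('~> 12.3.3', '12.3.3')  # true
puts allowed?('~> 12.3.3', '12.3.9')  # true
puts allowed?('~> 12.3.3', '12.4.0')  # false
puts allowed?('~> 12.3.3', '13.0.0')  # false
```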

data/lib/json_data_extractor/extractor.rb CHANGED

@@ -3,7 +3,7 @@
 module JsonDataExtractor
   # does the main job of the gem
   class Extractor
-    attr_reader :data, :modifiers
+    attr_reader :data, :modifiers, :schema_cache
     # @param json_data [Hash,String]
     # @param modifiers [Hash]
@@ -14,6 +14,35 @@ module JsonDataExtractor
       @path_cache = {}
     end
+    # Creates a new extractor with a pre-processed schema
+    # @param schema [Hash] schema of the expected data mapping
+    # @param modifiers [Hash] modifiers to apply to the extracted data
+    # @return [Extractor] an extractor initialized with the schema
+    def self.with_schema(schema, modifiers = {})
+      extractor = new({}, modifiers)
+      extractor.instance_variable_set(:@schema_cache, SchemaCache.new(schema))
+      extractor
+    end
+    # Extracts data from the provided json_data using the cached schema
+    # @param json_data [Hash,String] the data to extract from
+    # @return [Hash] the extracted data
+    def extract_from(json_data)
+      # Ensure we have a schema cache
+      raise ArgumentError, 'No schema cache available. Use Extractor.with_schema first.' unless @schema_cache
+      # Reset results
+      @results = {}
+      # Update data
+      @data = json_data.is_a?(Hash) ? Oj.dump(json_data, mode: :compat) : json_data
+      # Extract data using cached schema
+      extract_using_cache
+      @results
+    end
     # @param modifier_name [String, Symbol]
     # @param callable [#call, nil] Optional callable object
     def add_modifier(modifier_name, callable = nil, &block)
@@ -58,6 +87,36 @@ module JsonDataExtractor
     private
+    # Extracts data using the cached schema
+    def extract_using_cache
+      schema_cache.schema.each do |key, _|
+        element = schema_cache.schema_elements[key]
+        path = element.path
+        # Use cached JsonPath object
+        json_path = path ? schema_cache.path_cache[path] : nil
+        extracted_data = json_path&.on(@data)
+        if extracted_data.nil? || extracted_data.empty?
+          # we either got nothing or the `path` was initially nil
+          @results[key] = element.fetch_default_value
+          next
+        end
+        # check for nils and apply defaults if applicable
+        extracted_data.map! { |item| item.nil? ? element.fetch_default_value : item }
+        # apply modifiers if present
+        extracted_data = apply_modifiers(extracted_data, element.modifiers) if element.modifiers.any?
+        # apply maps if present
+        @results[key] = element.maps.any? ? apply_maps(extracted_data, element.maps) : extracted_data
+        @results[key] = resolve_result_structure(@results[key], element)
+      end
+    end
     def resolve_result_structure(result, element)
       if element.nested
         # Process nested data

data/lib/json_data_extractor/schema_cache.rb ADDED

@@ -0,0 +1,30 @@
+# frozen_string_literal: true
+module JsonDataExtractor
+  # Caches schema elements to avoid re-processing the schema for each data extraction
+  class SchemaCache
+    attr_reader :schema, :schema_elements, :path_cache
+    def initialize(schema)
+      @schema = schema
+      @schema_elements = {}
+      @path_cache = {}
+      # Pre-process the schema to create SchemaElement objects
+      process_schema
+    end
+    private
+    def process_schema
+      schema.each do |key, val|
+        # Store the SchemaElement for each key in the schema
+        @schema_elements[key] = JsonDataExtractor::SchemaElement.new(val.is_a?(Hash) ? val : { path: val })
+        # Pre-compile JsonPath objects for each path
+        path = @schema_elements[key].path
+        @path_cache[path] = JsonPath.new(path) if path
+      end
+    end
+  end
+end
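
Two details of `process_schema` are easy to miss: a bare string value is treated as shorthand for `{ path: value }`, and the path cache is keyed by the path string rather than by the schema key, so two keys that share a path end up sharing one cache entry. The following sketch reproduces both with plain hashes. `normalize_schema`, `build_path_cache` and the `compiler` lambda are hypothetical names used only here, and the lambda is a placeholder for whatever `JsonPath.new` really does.

```ruby
# Expands the string shorthand into the hash form, as the diff does before
# handing each value to SchemaElement.
def normalize_schema(schema)
  schema.transform_values { |val| val.is_a?(Hash) ? val : { path: val } }
end

# Builds a cache keyed by path string. `compiler` is any callable that turns a
# path string into a reusable object (a placeholder for JsonPath.new).
def build_path_cache(normalized, compiler)
  normalized.each_with_object({}) do |(_key, element), cache|
    path = element[:path]
    cache[path] = compiler.call(path) if path
  end
end

schema = {
  code: '$.iataCode',
  iata: '$.iataCode',                      # same path as :code
  city: { path: '$.city', default: '?' },  # already in hash form
  note: { default: 'none' }                # no path at all
}

compiler = ->(path) { path.split('.') }    # hypothetical stand-in compiler
normalized = normalize_schema(schema)
cache = build_path_cache(normalized, compiler)

p normalized[:code]  # shorthand expanded to hash form
p cache.keys         # one entry per distinct path; :note contributes nothing
```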

data/lib/json_data_extractor/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module JsonDataExtractor
-  VERSION = '0.1.04'
+  VERSION = '0.1.05'
 end
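
The version strings carry a leading zero in the patch segment (`0.1.04`, `0.1.05`). RubyGems parses numeric segments as integers, so the bump still orders as expected, and `0.1.05` compares equal to `0.1.5`. Note that Semantic Versioning 2.0.0, which the changelog cites, does not permit leading zeros in numeric identifiers. The check below uses only `Gem::Version` from Ruby's bundled RubyGems; the helper name `version_segments` is made up for this sketch.

```ruby
require 'rubygems'

# Hypothetical helper: the integer/string segments RubyGems derives from a version string.
def version_segments(string)
  Gem::Version.new(string).segments
end

p version_segments('0.1.05')                                   # numeric parts become integers
puts Gem::Version.new('0.1.05') > Gem::Version.new('0.1.04')   # the bump sorts upward
puts (Gem::Version.new('0.1.05') <=> Gem::Version.new('0.1.5')).zero?
```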

data/lib/json_data_extractor.rb CHANGED

@@ -5,8 +5,9 @@ require 'multi_json'
 require 'oj'
 require_relative 'json_data_extractor/version'
 require_relative 'json_data_extractor/configuration'
-require_relative 'json_data_extractor/extractor'
 require_relative 'json_data_extractor/schema_element'
+require_relative 'json_data_extractor/schema_cache'
+require_relative 'json_data_extractor/extractor'
 # Set MultiJson to use Oj for performance
 MultiJson.use(:oj)
@@ -22,6 +23,14 @@ module JsonDataExtractor
       Extractor.new(*args)
     end
+    # Creates a new extractor with a pre-processed schema
+    # @param schema [Hash] schema of the expected data mapping
+    # @param modifiers [Hash] modifiers to apply to the extracted data
+    # @return [Extractor] an extractor initialized with the schema
+    def with_schema(schema, modifiers = {})
+      Extractor.with_schema(schema, modifiers)
+    end
     def configuration
       @configuration ||= Configuration.new
     end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: json_data_extractor
 version: !ruby/object:Gem::Version
-  version: 0.1.04
+  version: 0.1.05
 platform: ruby
 authors:
 - Max Buslaev
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2025-04-10 00:00:00.000000000 Z
+date: 2025-05-13 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: amazing_print
@@ -58,14 +58,14 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '10.0'
+        version: 12.3.3
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '10.0'
+        version: 12.3.3
 - !ruby/object:Gem::Dependency
   name: rspec
   requirement: !ruby/object:Gem::Requirement
@@ -135,6 +135,7 @@ files:
 - ".gitignore"
 - ".rspec"
 - ".travis.yml"
+- CHANGELOG.md
 - CODE_OF_CONDUCT.md
 - Gemfile
 - LICENSE.txt
@@ -146,6 +147,7 @@ files:
 - lib/json_data_extractor.rb
 - lib/json_data_extractor/configuration.rb
 - lib/json_data_extractor/extractor.rb
+- lib/json_data_extractor/schema_cache.rb
 - lib/json_data_extractor/schema_element.rb
 - lib/json_data_extractor/version.rb
 homepage: https://github.com/austerlitz/json_data_extractor