json_data_extractor 0.1.04 → 0.1.05

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 78d2adf9786c1444ad0307cbc8b5a613be04a8b4ace56c5b1973f619fed58177
-   data.tar.gz: 78bdd81ae9c8b6bb68742b64c7d389fa4844fad20b498cd927ca3dd85a03e9a6
+   metadata.gz: 1a833b6de5421f4871a5a36814d6f304a495c16b65b8d27d60f1839317bd26ac
+   data.tar.gz: 2e1a98f860ad8f52fd81aa9003b1e2a0fb079bbfdeaef51380625d7e25e83b2c
  SHA512:
-   metadata.gz: ac6bc721be1214813aecefc887cfddb7cfb6b55730506dec85fd853d35a01cd403fd1726f93bbabe7af8ed1ad25763df42e8b41ca8794e3a602f87aff040165e
-   data.tar.gz: 7c1ba2814904cba8d1041f652d313eb097c2c02e59ccd386abd95cbc64b32414ddfff8e2cf7dfb57c98045b1bb50631490ebde0adaede4b14b4455d1fe9b878e
+   metadata.gz: 1c408a29566f5999e7ccb251860442b4b7c1cbcee391a72173753bf09c30e9f6e0d78ae18af28c95cd8bc2708dbb7f180493f038221a365745f7f587da08c2a8
+   data.tar.gz: 8c936a46b176ebe7dbc365b79d4a267fb3217b16f5f88280a040f3b951b5a8f65a70384c9f025853ec90d8de5c41e0ac1b3545de2b0ed2ba5c64bc8405240fc8
data/CHANGELOG.md ADDED
@@ -0,0 +1,20 @@
+ # Changelog
+
+ All notable changes to this project will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [0.1.05] - 2025-05-13
+
+ ### Added
+ - Added schema reuse functionality for improved performance when processing multiple data objects with the same schema
+ - New `JsonDataExtractor.with_schema` class method to create an extractor with a pre-processed schema
+ - New `SchemaCache` class to store and reuse schema information
+ - New `extract_from` method to extract data using a cached schema
+ - Performance improvements by pre-compiling JsonPath objects and caching schema elements
+
+ ## [0.1.04] - 2025-04-26
+
+ - Use Oj for json dump
+ - Use json path caching
data/README.md CHANGED
@@ -1,5 +1,7 @@
  # JsonDataExtractor
 
+ [![Gem Version](https://badge.fury.io/rb/json_data_extractor.svg)](https://badge.fury.io/rb/json_data_extractor)
+
  Transform JSON data structures with the help of a simple schema and JsonPath expressions.
  Use the JsonDataExtractor gem to extract and modify data from complex JSON structures using a
  straightforward syntax
@@ -325,6 +327,117 @@ E.g. this is a valid real-life schema with nested data:
 
  Nested schema can be also applied to objects, not arrays. See specs for more examples.
 
+ ### Schema Reuse for Performance
+
+ When processing multiple data objects with the same schema, JsonDataExtractor provides an optimized approach that avoids redundant schema processing. This is particularly useful for batch processing scenarios where you need to apply the same transformation to multiple data objects.
+
+ #### Using `with_schema` and `extract_from`
+
+ Instead of creating a new extractor for each data object:
+
+ ```ruby
+ data_objects.map do |data|
+   extractor = JsonDataExtractor.new(data)
+   extractor.extract(schema)
+ end
+ ```
+
+ You can create a single extractor with a pre-processed schema and reuse it:
+ ```ruby
+ extractor = JsonDataExtractor.with_schema(schema)
+ data_objects.map do |data|
+   extractor.extract_from(data)
+ end
+ ```
+
+ This approach offers significant performance improvements for large datasets by:
+ 1. Pre-processing the schema only once
+ 2. Pre-compiling JsonPath objects
+ 3. Caching schema elements
+ 4. Avoiding redundant schema validation
+
+
+ #### Comparison with Nested Schema Approach
+
+ It's worth noting that similar functionality could be achieved using the existing nested schema approach when your data is already in an array format:
+
+ ```ruby
+ # Process an array of locations with the nested schema approach
+ locations_array = [location1, location2, location3]
+ schema = {
+   all_locations: {
+     path: "[*]",
+     type: "array",
+     schema: { code: ".iataCode", city: ".city", name: ".name" }
+   }
+ }
+ result = JsonDataExtractor.new(locations_array).extract(schema)
+ # Result: { all_locations: [{code: "...", city: "...", name: "..."}, {...}, {...}] }
+ ```
+
+ **When to use which approach:**
+
+ 1. **Use nested schema when:**
+ - Your data is already structured as an array
+ - You want to preserve the array structure in your result
+ - You need to process the entire array at once
+
+ 2. **Use schema reuse when:**
+ - You receive data objects individually (e.g., from multiple API calls)
+ - You need to process each object separately
+ - You want to transform each object independently
+ - You need direct access to individual results without unwrapping them from an array
+
+ The schema reuse approach is specifically optimized for scenarios where you process similar objects multiple times in sequence, rather than all at once in an array.
+
+
+ #### Real-world Example
+ Here's a practical example of extracting location data from multiple sources:
+ ```ruby
+ # Location data from an API
+ locations = [
+   {
+     "iataCode" => "JFK",
+     "countryCode" => "US",
+     "city" => "New York",
+     "name" => "John F. Kennedy International Airport"
+   },
+   {
+     "iataCode" => "LHR",
+     "countryCode" => "GB",
+     "city" => "London",
+     "name" => "Heathrow Airport"
+   }
+ ]
+
+ # Define schema once
+ schema = {
+   code: "$.iataCode",
+   city: "$.city",
+   name: "$.name",
+   country: "$.countryCode"
+ }
+
+ # Create an extractor with the schema
+ jde = JsonDataExtractor.with_schema(schema)
+
+ # Process each location efficiently
+ processed_locations = locations.map do |data|
+   jde.extract_from(data)
+ end
+
+ # Result:
+ # [
+ #   {code: "JFK", city: "New York", name: "John F. Kennedy International Airport", country: "US"},
+ #   {code: "LHR", city: "London", name: "Heathrow Airport", country: "GB"}
+ # ]
+ ```
+ This pattern is especially beneficial when:
+ - Processing data in batches that arrive separately
+ - Working with large datasets where you need to process one item at a time
+ - Applying the same schema to multiple API responses
+ - Parsing large collections of similar objects that aren't already in an array structure
+
  ## Configuration Options
 
  The JsonDataExtractor gem provides a configuration option to control the behavior when encountering
data/json_data_extractor.gemspec CHANGED
@@ -29,7 +29,7 @@ transformations. The schema is defined as a simple Ruby hash that maps keys to p
    spec.add_development_dependency 'amazing_print'
    spec.add_development_dependency 'bundler'
    spec.add_development_dependency 'pry'
-   spec.add_development_dependency 'rake', '~> 10.0'
+   spec.add_development_dependency 'rake', '~> 12.3.3'
    spec.add_development_dependency 'rspec', '~> 3.0'
    spec.add_development_dependency 'rubocop'
 
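A note on reading the `rake` constraint change: `~>` is RubyGems' pessimistic operator, so `'~> 10.0'` accepted any 10.x release, while `'~> 12.3.3'` accepts only 12.3.x releases from 12.3.3 upward. The check below uses RubyGems' own `Gem::Requirement`; the helper `allows?` and the constant names are illustrative and not part of the gem.

```ruby
require 'rubygems'

OLD_RAKE = Gem::Requirement.new('~> 10.0')
NEW_RAKE = Gem::Requirement.new('~> 12.3.3')

# Illustrative helper: does the requirement accept this version string?
def allows?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

allows?(OLD_RAKE, '10.5.0')  # => true  (any 10.x)
allows?(OLD_RAKE, '12.3.3')  # => false
allows?(NEW_RAKE, '12.3.3')  # => true
allows?(NEW_RAKE, '12.3.9')  # => true  (patch releases within 12.3)
allows?(NEW_RAKE, '12.4.0')  # => false (the last written segment may grow, 12.3 may not)
allows?(NEW_RAKE, '13.0.6')  # => false
```

Since `rake` is a development dependency, this affects people working on the gem, not applications that install it.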
data/lib/json_data_extractor/extractor.rb CHANGED
@@ -3,7 +3,7 @@
  module JsonDataExtractor
    # does the main job of the gem
    class Extractor
-     attr_reader :data, :modifiers
+     attr_reader :data, :modifiers, :schema_cache
 
      # @param json_data [Hash,String]
      # @param modifiers [Hash]
@@ -14,6 +14,35 @@ module JsonDataExtractor
        @path_cache = {}
      end
 
+     # Creates a new extractor with a pre-processed schema
+     # @param schema [Hash] schema of the expected data mapping
+     # @param modifiers [Hash] modifiers to apply to the extracted data
+     # @return [Extractor] an extractor initialized with the schema
+     def self.with_schema(schema, modifiers = {})
+       extractor = new({}, modifiers)
+       extractor.instance_variable_set(:@schema_cache, SchemaCache.new(schema))
+       extractor
+     end
+
+     # Extracts data from the provided json_data using the cached schema
+     # @param json_data [Hash,String] the data to extract from
+     # @return [Hash] the extracted data
+     def extract_from(json_data)
+       # Ensure we have a schema cache
+       raise ArgumentError, 'No schema cache available. Use Extractor.with_schema first.' unless @schema_cache
+
+       # Reset results
+       @results = {}
+
+       # Update data
+       @data = json_data.is_a?(Hash) ? Oj.dump(json_data, mode: :compat) : json_data
+
+       # Extract data using cached schema
+       extract_using_cache
+
+       @results
+     end
+
      # @param modifier_name [String, Symbol]
      # @param callable [#call, nil] Optional callable object
      def add_modifier(modifier_name, callable = nil, &block)
@@ -58,6 +87,36 @@ module JsonDataExtractor
 
      private
 
+     # Extracts data using the cached schema
+     def extract_using_cache
+       schema_cache.schema.each do |key, _|
+         element = schema_cache.schema_elements[key]
+         path = element.path
+
+         # Use cached JsonPath object
+         json_path = path ? schema_cache.path_cache[path] : nil
+
+         extracted_data = json_path&.on(@data)
+
+         if extracted_data.nil? || extracted_data.empty?
+           # we either got nothing or the `path` was initially nil
+           @results[key] = element.fetch_default_value
+           next
+         end
+
+         # check for nils and apply defaults if applicable
+         extracted_data.map! { |item| item.nil? ? element.fetch_default_value : item }
+
+         # apply modifiers if present
+         extracted_data = apply_modifiers(extracted_data, element.modifiers) if element.modifiers.any?
+
+         # apply maps if present
+         @results[key] = element.maps.any? ? apply_maps(extracted_data, element.maps) : extracted_data
+
+         @results[key] = resolve_result_structure(@results[key], element)
+       end
+     end
+
      def resolve_result_structure(result, element)
        if element.nested
          # Process nested data
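The per-key branching in `extract_using_cache` (use the precompiled path, fall back to the default when nothing matches, replace `nil` items with the default) can be sketched without the gem. Everything here is a stand-in: `Element`, `extract_key`, and the lambda "finder" are hypothetical names, and the sketch stops before modifiers, maps, and `resolve_result_structure` are applied.

```ruby
# Hypothetical stand-in for a schema element: a precompiled finder plus a default.
Element = Struct.new(:finder, :default)

# Mirrors the branch order shown in the diff for a single key:
# nil or empty matches yield the default; nil items are replaced by it.
def extract_key(element, data)
  matches = element.finder&.call(data)
  return element.default if matches.nil? || matches.empty?

  matches.map { |item| item.nil? ? element.default : item }
end

# A finder that returns an array of matches, as a JsonPath lookup would.
NAME_ELEMENT = Element.new(->(d) { d.key?('name') ? [d['name']] : [] }, 'unknown')
# An element with no path at all: it always resolves to its default.
CONST_ELEMENT = Element.new(nil, 'n/a')

extract_key(NAME_ELEMENT, { 'name' => 'Heathrow Airport' }) # => ["Heathrow Airport"]
extract_key(NAME_ELEMENT, { 'name' => nil })                # => ["unknown"]
extract_key(NAME_ELEMENT, {})                               # => "unknown"
extract_key(CONST_ELEMENT, { 'name' => 'x' })               # => "n/a"
```

One property of the real method that the sketch does not model: `extract_from` stores its input and output in `@data` and `@results` on the extractor itself, so an extractor shared between threads would presumably need one instance per thread or external locking.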
data/lib/json_data_extractor/schema_cache.rb ADDED
@@ -0,0 +1,30 @@
+ # frozen_string_literal: true
+
+ module JsonDataExtractor
+   # Caches schema elements to avoid re-processing the schema for each data extraction
+   class SchemaCache
+     attr_reader :schema, :schema_elements, :path_cache
+
+     def initialize(schema)
+       @schema = schema
+       @schema_elements = {}
+       @path_cache = {}
+
+       # Pre-process the schema to create SchemaElement objects
+       process_schema
+     end
+
+     private
+
+     def process_schema
+       schema.each do |key, val|
+         # Store the SchemaElement for each key in the schema
+         @schema_elements[key] = JsonDataExtractor::SchemaElement.new(val.is_a?(Hash) ? val : { path: val })
+
+         # Pre-compile JsonPath objects for each path
+         path = @schema_elements[key].path
+         @path_cache[path] = JsonPath.new(path) if path
+       end
+     end
+   end
+ end
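The idea behind `SchemaCache` (normalise each schema entry once, compile each path once, then reuse both for every record) can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions, not the gem's implementation: `TinySchemaCache` and `compile_path` are hypothetical, the "compiled path" is a lambda over `Hash#dig` that understands only `$.key.subkey` paths, and `||=` is used so a path shared by several keys is compiled once (the diff assigns with `=`).

```ruby
# Hypothetical stand-in for JsonPath: handles only "$.key.subkey" paths on Hashes.
def compile_path(path)
  keys = path.delete_prefix('$.').split('.')
  ->(data) { data.dig(*keys) }
end

class TinySchemaCache
  attr_reader :elements, :path_cache

  def initialize(schema)
    @elements = {}
    @path_cache = {}
    schema.each do |key, val|
      # Normalise "path string" and { path: ... } entries into one shape.
      element = val.is_a?(Hash) ? val : { path: val }
      @elements[key] = element
      path = element[:path]
      # Compile each distinct path once, however many records are processed later.
      @path_cache[path] ||= compile_path(path) if path
    end
  end

  # Applies the pre-processed schema to one record; no schema work happens here.
  def extract_from(data)
    @elements.to_h do |key, element|
      value = @path_cache[element[:path]]&.call(data)
      [key, value.nil? ? element[:default] : value]
    end
  end
end

TINY_CACHE = TinySchemaCache.new({ code: '$.iataCode', country: { path: '$.geo.country', default: 'n/a' } })

TINY_CACHE.extract_from({ 'iataCode' => 'JFK', 'geo' => { 'country' => 'US' } })
# => {code: "JFK", country: "US"}
TINY_CACHE.extract_from({ 'iataCode' => 'LHR' })
# => {code: "LHR", country: "n/a"}
```

The design point is that all per-schema cost sits in `initialize`, so the per-record method only performs lookups.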
data/lib/json_data_extractor/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module JsonDataExtractor
-   VERSION = '0.1.04'
+   VERSION = '0.1.05'
  end
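A side note on the version string: RubyGems parses each dot-separated segment as an integer, so the leading zero in `0.1.05` does not affect ordering. The check below uses only `Gem::Version`; the constant names are illustrative.

```ruby
require 'rubygems'

OLD_VERSION = Gem::Version.new('0.1.04')
NEW_VERSION = Gem::Version.new('0.1.05')

NEW_VERSION > OLD_VERSION                       # => true
NEW_VERSION.segments                            # => [0, 1, 5]
(NEW_VERSION <=> Gem::Version.new('0.1.5')) == 0 # => true, same position in the ordering
```

Semantic Versioning 2.0.0 does not allow leading zeroes in numeric identifiers, which matters only if the version is handed to a strict SemVer parser outside RubyGems.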
data/lib/json_data_extractor.rb CHANGED
@@ -5,8 +5,9 @@ require 'multi_json'
  require 'oj'
  require_relative 'json_data_extractor/version'
  require_relative 'json_data_extractor/configuration'
- require_relative 'json_data_extractor/extractor'
  require_relative 'json_data_extractor/schema_element'
+ require_relative 'json_data_extractor/schema_cache'
+ require_relative 'json_data_extractor/extractor'
 
  # Set MultiJson to use Oj for performance
  MultiJson.use(:oj)
@@ -22,6 +23,14 @@ module JsonDataExtractor
        Extractor.new(*args)
      end
 
+     # Creates a new extractor with a pre-processed schema
+     # @param schema [Hash] schema of the expected data mapping
+     # @param modifiers [Hash] modifiers to apply to the extracted data
+     # @return [Extractor] an extractor initialized with the schema
+     def with_schema(schema, modifiers = {})
+       Extractor.with_schema(schema, modifiers)
+     end
+
      def configuration
        @configuration ||= Configuration.new
      end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: json_data_extractor
  version: !ruby/object:Gem::Version
-   version: 0.1.04
+   version: 0.1.05
  platform: ruby
  authors:
  - Max Buslaev
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2025-04-10 00:00:00.000000000 Z
+ date: 2025-05-13 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: amazing_print
@@ -58,14 +58,14 @@ dependencies:
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '10.0'
+         version: 12.3.3
    type: :development
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - "~>"
        - !ruby/object:Gem::Version
-         version: '10.0'
+         version: 12.3.3
  - !ruby/object:Gem::Dependency
    name: rspec
    requirement: !ruby/object:Gem::Requirement
@@ -135,6 +135,7 @@ files:
  - ".gitignore"
  - ".rspec"
  - ".travis.yml"
+ - CHANGELOG.md
  - CODE_OF_CONDUCT.md
  - Gemfile
  - LICENSE.txt
@@ -146,6 +147,7 @@ files:
  - lib/json_data_extractor.rb
  - lib/json_data_extractor/configuration.rb
  - lib/json_data_extractor/extractor.rb
+ - lib/json_data_extractor/schema_cache.rb
  - lib/json_data_extractor/schema_element.rb
  - lib/json_data_extractor/version.rb
  homepage: https://github.com/austerlitz/json_data_extractor