data_collector 0.17.0 → 0.18.0
- checksums.yaml +4 -4
- data/README.md +105 -58
- data/data_collector.gemspec +5 -2
- data/examples/marc.rb +27 -0
- data/lib/data_collector/core.rb +16 -0
- data/lib/data_collector/input/dir.rb +28 -0
- data/lib/data_collector/input/generic.rb +77 -0
- data/lib/data_collector/input/queue.rb +60 -0
- data/lib/data_collector/input.rb +21 -2
- data/lib/data_collector/output.rb +4 -3
- data/lib/data_collector/pipeline.rb +91 -0
- data/lib/data_collector/rules.rb +5 -126
- data/lib/data_collector/rules.rb.depricated +130 -0
- data/lib/data_collector/rules_ng.rb +25 -7
- data/lib/data_collector/runner.rb +0 -1
- data/lib/data_collector/version.rb +1 -1
- data/lib/data_collector.rb +1 -0
- metadata +55 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 9b2dda800a0c468ee0db8c4a4546f98c8baa005f0cff2df603e613d404021315
|
4
|
+
data.tar.gz: c505eb5354999645eb5ea9fbb5200ce100a37d9f3e0eac85bf9416d21cd3514a
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: a44557a687028b74b495236a47b4d802a4a6e130526a639ddf63b7b6a8a07b090f5197c23a36b2b4c9628bcfa33a0d38e2451c1a3224a45fa63d388f6922624e
|
7
|
+
data.tar.gz: b98a223f063f24b8f78e1358faeb02e33e365edd77b0fba2d28649fa0ad17d79f386ff216326040f3ec87390cb595f41382733ea042c5357c9cf48a23481d8c7
|
data/README.md
CHANGED
@@ -1,39 +1,91 @@
|
|
1
1
|
# DataCollector
|
2
|
-
Convenience module to Extract, Transform and Load your data.
|
3
|
-
|
4
|
-
Support objects like CONFIG, LOG, RULES
|
2
|
+
Convenience module to Extract, Transform and Load your data in a Pipeline.
|
3
|
+
The 'INPUT', 'OUTPUT' and 'FILTER' objects help you read, transform and output your data.
|
4
|
+
Support objects such as CONFIG, LOG, ERROR and RULES help you write manageable rules to transform and log your data.
|
5
|
+
Including the DataCollector::Core module in your application gives you access to these objects.
|
6
|
+
```ruby
|
7
|
+
include DataCollector::Core
|
8
|
+
```
|
5
9
|
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
10
|
+
Every object can be used on its own.
|
11
|
+
|
12
|
+
|
13
|
+
#### Pipeline
|
14
|
+
Allows you to create a simple pipeline of operations to process data. With a data pipeline, you can collect, process, and transform data, and then transfer it to various systems and applications.
|
15
|
+
|
16
|
+
You can set a schedule for pipelines that are triggered by new data, specifying how often the pipeline should be
|
17
|
+
executed in the [ISO8601 duration format](https://www.digi.com/resources/documentation/digidocs//90001488-13/reference/r_iso_8601_duration_format.htm). The processing logic is then executed.
|
18
|
+
###### methods:
|
19
|
+
- .new(options): options can be `schedule`, given in [ISO8601 duration format](https://www.digi.com/resources/documentation/digidocs//90001488-13/reference/r_iso_8601_duration_format.htm), and `name`
|
20
|
+
- .run: start the pipeline. blocking if a schedule is supplied
|
21
|
+
- .stop: stop the pipeline
|
22
|
+
- .pause: pause the pipeline. Restart using .run
|
23
|
+
- .running?: is pipeline running
|
24
|
+
- .stopped?: is pipeline not running
|
25
|
+
- .paused?: is pipeline paused
|
26
|
+
- .name: name of the pipe
|
27
|
+
- .run_count: number of times the pipe has run
|
28
|
+
- .on_message: handler to run every time a trigger event happens
|
29
|
+
###### example:
|
30
|
+
```ruby
|
31
|
+
# create a pipeline scheduled to run every 10 minutes
|
32
|
+
pipeline = DataCollector::Pipeline.new(schedule: 'PT10M')
|
12
33
|
|
34
|
+
pipeline.on_message do |input, output|
|
35
|
+
# logic
|
36
|
+
end
|
13
37
|
|
14
|
-
|
15
|
-
|
38
|
+
pipeline.run
|
39
|
+
```
|
40
|
+
|
41
|
+
#### input
|
42
|
+
The input component is part of the processing logic. All data is converted into a Hash, Array, ... accessible using plain Ruby or JSONPath using the filter object.
|
43
|
+
The input component can fetch data from various URIs, such as files, URLs, directories, queues, ...
|
44
|
+
For a push input component, a listener is created with a processing logic block that is executed whenever new data is available.
|
45
|
+
A push happens when new data is created in a directory, message queue, ...
|
16
46
|
|
17
|
-
**Public methods**
|
18
47
|
```ruby
|
19
48
|
from_uri(source, options = {:raw, :content_type})
|
20
49
|
```
|
21
|
-
- source: an uri with a scheme of http, https, file
|
50
|
+
- source: an uri with a scheme of http, https, file, amqp
|
22
51
|
- options:
|
23
52
|
- raw: _boolean_ do not parse
|
24
53
|
- content_type: _string_ force a content_type if the 'Content-Type' returned by the http server is incorrect
|
25
54
|
|
26
|
-
example:
|
55
|
+
###### example:
|
27
56
|
```ruby
|
57
|
+
# read from an http endpoint
|
28
58
|
input.from_uri("http://www.libis.be")
|
29
59
|
input.from_uri("file://hello.txt")
|
30
60
|
input.from_uri("http://www.libis.be/record.jsonld", content_type: 'application/ld+json')
|
31
|
-
```
|
32
61
|
|
62
|
+
# read data from a RabbitMQ queue
|
63
|
+
listener = input.from_uri('amqp://user:password@localhost?channel=hello')
|
64
|
+
listener.on_message do |input, output, message|
|
65
|
+
puts message
|
66
|
+
end
|
67
|
+
listener.run
|
68
|
+
|
69
|
+
# read data from a directory
|
70
|
+
listener = input.from_uri('file://this/is/directory')
|
71
|
+
listener.on_message do |input, output, filename|
|
72
|
+
puts filename
|
73
|
+
end
|
74
|
+
listener.run
|
75
|
+
```
|
33
76
|
|
77
|
+
Inputs can be JSON, XML, CSV, or XML in a TAR.GZ file
|
34
78
|
|
79
|
+
###### listener from input.from_uri(directory|message queue)
|
80
|
+
When a listener is triggered by a PUSH event, such as a message arriving on a queue or a file being written to a directory, these extra methods are available.
|
35
81
|
|
36
|
-
|
82
|
+
- .run(should_block = false): start the listener. Blocks while the listener is running when should_block is true
|
83
|
+
- .stop: stop the listener
|
84
|
+
- .pause: pause the listener. Restart using .run
|
85
|
+
- .running?: is listener running
|
86
|
+
- .stopped?: is listener not running
|
87
|
+
- .paused?: is listener paused
|
88
|
+
- .on_message: handler to run every time a trigger event happens
|
37
89
|
|
38
90
|
### output
|
39
91
|
Output is an object you can store key/value pairs that needs to be written to an output stream.
|
@@ -45,7 +97,7 @@ Output is an object you can store key/value pairs that needs to be written to an
|
|
45
97
|
Write output to a file or string, using an ERB file as a template
|
46
98
|
example:
|
47
99
|
___test.erb___
|
48
|
-
```
|
100
|
+
```erb
|
49
101
|
<names>
|
50
102
|
<combined><%= data[:name] %> <%= data[:last_name] %></combined>
|
51
103
|
<%= print data, :name, :first_name %>
|
@@ -53,7 +105,7 @@ ___test.erb___
|
|
53
105
|
</names>
|
54
106
|
```
|
55
107
|
will produce
|
56
|
-
```
|
108
|
+
```html
|
57
109
|
<names>
|
58
110
|
<combined>John Doe</combined>
|
59
111
|
<first_name>John</first_name>
|
@@ -97,41 +149,11 @@ filter data from a hash using [JSONPath](http://goessner.net/articles/JsonPath/i
|
|
97
149
|
filtered_data = filter(data, "$..metadata.record")
|
98
150
|
```
|
99
151
|
|
100
|
-
#### rules
|
101
|
-
|
102
|
-
|
103
|
-
|
104
|
-
|
105
|
-
~~Available convert methods are: time, map, each, call, suffix, text~~
|
106
|
-
~~- time: parses a given time/date string into a Time object~~
|
107
|
-
~~- map: applies a mapping to a filter~~
|
108
|
-
~~- suffix: adds a suffix to a result~~
|
109
|
-
~~- call: executes a lambda on the filter~~
|
110
|
-
~~- each: runs a lambda on each row of a filter~~
|
111
|
-
~~- text: passthrough method. Returns value unchanged~~
|
112
|
-
|
113
|
-
~~example:~~
|
114
|
-
```ruby
|
115
|
-
my_rules = {
|
116
|
-
'identifier' => {"filter" => '$..id'},
|
117
|
-
'language' => {'filter' => '$..lang',
|
118
|
-
'options' => {'convert' => 'map',
|
119
|
-
'map' => {'nl' => 'dut', 'fr' => 'fre', 'de' => 'ger', 'en' => 'eng'}
|
120
|
-
}
|
121
|
-
},
|
122
|
-
'subject' => {'filter' => '$..keywords',
|
123
|
-
options' => {'convert' => 'each',
|
124
|
-
'lambda' => lambda {|d| d.split(',')}
|
125
|
-
}
|
126
|
-
},
|
127
|
-
'creationdate' => {'filter' => '$..published_date', 'convert' => 'time'}
|
128
|
-
}
|
129
|
-
|
130
|
-
rules.run(my_rules, record, output)
|
131
|
-
```
|
132
|
-
|
133
|
-
#### rules_ng
|
134
|
-
!!! not compatible with RULES object
|
152
|
+
#### rules
|
153
|
+
The RULES object is built on a simple concept. A rule consists of 3 components:
|
154
|
+
- a destination tag
|
155
|
+
- a jsonpath filter to get the data
|
156
|
+
- a lambda to execute on every filter hit
|
135
157
|
|
136
158
|
TODO: work in progress see test for examples on how to use
|
137
159
|
|
@@ -202,15 +224,15 @@ Here you find different rule combination that are possible
|
|
202
224
|
}
|
203
225
|
```
|
204
226
|
|
205
|
-
Here is an example on how to call last RULESET "rs_hash_with_json_filter_and_option".
|
206
227
|
|
207
|
-
***
|
228
|
+
***rules.run*** can have 4 parameters. First 3 are mandatory. The last one ***options*** can hold data static to a rule set or engine directives.
|
208
229
|
|
209
|
-
List of engine directives:
|
230
|
+
##### List of engine directives:
|
210
231
|
- _no_array_with_one_element: defaults to false. if the result is an array with 1 element just return the element.
|
211
232
|
|
212
|
-
|
233
|
+
###### example:
|
213
234
|
```ruby
|
235
|
+
# apply RULESET "rs_hash_with_json_filter_and_option" to data
|
214
236
|
include DataCollector::Core
|
215
237
|
output.clear
|
216
238
|
data = {'subject' => ['water', 'thermodynamics']}
|
@@ -315,7 +337,32 @@ Or install it yourself as:
|
|
315
337
|
|
316
338
|
## Usage
|
317
339
|
|
318
|
-
|
340
|
+
```ruby
|
341
|
+
require 'data_collector'
|
342
|
+
|
343
|
+
include DataCollector::Core
|
344
|
+
# including core gives you a pipeline, input, output, filter, config, log, error object to work with
|
345
|
+
RULES = {
|
346
|
+
'title' => '$..vertitle'
|
347
|
+
}
|
348
|
+
#create a PULL pipeline and schedule it to run every 5 seconds
|
349
|
+
pipeline = DataCollector::Pipeline.new(schedule: 'PT5S')
|
350
|
+
|
351
|
+
pipeline.on_message do |input, output|
|
352
|
+
data = input.from_uri('https://services3.libis.be/primo_artefact/lirias3611609')
|
353
|
+
rules.run(RULES, data, output)
|
354
|
+
#puts JSON.pretty_generate(input.raw)
|
355
|
+
puts JSON.pretty_generate(output.raw)
|
356
|
+
output.clear
|
357
|
+
|
358
|
+
if pipeline.run_count > 2
|
359
|
+
log('stopping pipeline after three runs')
|
360
|
+
pipeline.stop
|
361
|
+
end
|
362
|
+
end
|
363
|
+
pipeline.run
|
364
|
+
|
365
|
+
```
|
319
366
|
|
320
367
|
## Development
|
321
368
|
|
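The README changes above describe pipeline schedules as ISO 8601 durations such as 'PT10M' and 'PT5S'. As a rough illustration of what those strings mean, here is a minimal sketch that converts a simple duration into seconds with a regular expression. `duration_to_seconds` and `ISO8601_SIMPLE` are hypothetical names that are not part of data_collector; the gem itself relies on the iso8601 gem for parsing, which covers far more of the standard than this sketch does.

```ruby
# Hypothetical helper, NOT part of data_collector: converts a simple
# ISO 8601 duration (days, hours, minutes, seconds only) into seconds.
ISO8601_SIMPLE = /\AP(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?\z/

def duration_to_seconds(text)
  match = ISO8601_SIMPLE.match(text)
  # Reject strings that do not match or that carry no component at all ("P", "PT").
  if match.nil? || match.captures.compact.empty?
    raise ArgumentError, "unsupported duration: #{text.inspect}"
  end

  # Missing components are nil, and nil.to_i is 0.
  days, hours, minutes, seconds = match.captures.map(&:to_i)
  (days * 86_400) + (hours * 3_600) + (minutes * 60) + seconds
end

puts duration_to_seconds('PT10M') # prints 600, the schedule of the Pipeline example
puts duration_to_seconds('PT5S')  # prints 5, the schedule of the Usage example
```

Weeks, months and years are left out on purpose, since their length in seconds depends on a calendar anchor.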
data/data_collector.gemspec
CHANGED
@@ -43,11 +43,14 @@ Gem::Specification.new do |spec|
|
|
43
43
|
spec.add_runtime_dependency 'jsonpath', '~> 1.1'
|
44
44
|
spec.add_runtime_dependency 'mime-types', '~> 3.4'
|
45
45
|
spec.add_runtime_dependency 'minitar', '= 0.9'
|
46
|
-
spec.add_runtime_dependency 'nokogiri', '~> 1.
|
46
|
+
spec.add_runtime_dependency 'nokogiri', '~> 1.14'
|
47
47
|
spec.add_runtime_dependency 'nori', '~> 2.6'
|
48
|
+
spec.add_runtime_dependency 'iso8601', '~> 0.13'
|
49
|
+
spec.add_runtime_dependency 'listen', '~> 3.8'
|
50
|
+
spec.add_runtime_dependency 'bunny', '~> 2.20'
|
48
51
|
|
49
52
|
spec.add_development_dependency 'bundler', '~> 2.3'
|
50
|
-
spec.add_development_dependency 'minitest', '~> 5.
|
53
|
+
spec.add_development_dependency 'minitest', '~> 5.18'
|
51
54
|
spec.add_development_dependency 'rake', '~> 13.0'
|
52
55
|
spec.add_development_dependency 'webmock', '~> 3.18'
|
53
56
|
end
|
data/examples/marc.rb
ADDED
@@ -0,0 +1,27 @@
|
|
1
|
+
$LOAD_PATH << '../lib'
|
2
|
+
require 'data_collector'
|
3
|
+
|
4
|
+
# include module gives us a pipeline, input, output, filter, log and error object to work with
|
5
|
+
include DataCollector::Core
|
6
|
+
|
7
|
+
RULES = {
|
8
|
+
"title" => {'$.record.datafield[?(@._tag == "245")]' => lambda do |d, o|
|
9
|
+
subfields = d['subfield']
|
10
|
+
subfields = [subfields] unless subfields.is_a?(Array)
|
11
|
+
subfields.map{|m| m["$text"]}.join(' ')
|
12
|
+
end
|
13
|
+
},
|
14
|
+
"author" => {'$..datafield[?(@._tag == "100")]' => lambda do |d, o|
|
15
|
+
subfields = d['subfield']
|
16
|
+
subfields = [subfields] unless subfields.is_a?(Array)
|
17
|
+
subfields.map{|m| m["$text"]}.join(' ')
|
18
|
+
end
|
19
|
+
}
|
20
|
+
}
|
21
|
+
|
22
|
+
# read remote record with logging enabled
|
23
|
+
data = input.from_uri('https://gist.githubusercontent.com/kefo/796b39925e234fb6d912/raw/3df2ce329a947864ae8555f214253f956d679605/sample-marc-with-xsd.xml', {logging: true})
|
24
|
+
# apply rules to data and if result contains only 1 entry do not return an array
|
25
|
+
rules.run(RULES, data, output, {_no_array_with_one_element: true})
|
26
|
+
# print result
|
27
|
+
puts JSON.pretty_generate(output.raw)
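Both rules in examples/marc.rb use the same lambda body: wrap a single subfield in an array, then join the `$text` values. The sketch below runs that body on hand-written hashes so it can be tried without the gem or network access. The constant `JOIN_SUBFIELDS` and the sample data are assumptions made for illustration; the real input shape depends on how the XML is parsed.

```ruby
require 'json'

# Same logic as the lambdas in examples/marc.rb. The second argument
# (rule options) is accepted but unused in this sketch.
JOIN_SUBFIELDS = lambda do |datafield, _options = {}|
  subfields = datafield['subfield']
  # A datafield with one subfield is parsed as a Hash, several as an Array.
  subfields = [subfields] unless subfields.is_a?(Array)
  subfields.map { |subfield| subfield['$text'] }.join(' ')
end

# Hypothetical sample records standing in for the parsed MARC XML.
title_field = {
  '_tag' => '245',
  'subfield' => [
    { '_code' => 'a', '$text' => 'Sample title /' },
    { '_code' => 'c', '$text' => 'Sample Author' }
  ]
}
author_field = { '_tag' => '100', 'subfield' => { '_code' => 'a', '$text' => 'Author, Sample' } }

puts JSON.pretty_generate(
  'title' => JOIN_SUBFIELDS.call(title_field),
  'author' => JOIN_SUBFIELDS.call(author_field)
)
```

Normalising to an array first keeps the lambda indifferent to whether a field carries one subfield or many.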
|
data/lib/data_collector/core.rb
CHANGED
@@ -10,6 +10,14 @@ require_relative 'config_file'
|
|
10
10
|
|
11
11
|
module DataCollector
|
12
12
|
module Core
|
13
|
+
# Pipeline for your data pipeline
|
14
|
+
# example: pipeline.on_message do |input, output|
|
15
|
+
# ** processing logic here **
|
16
|
+
# end
|
17
|
+
def pipeline
|
18
|
+
@pipeline ||= DataCollector::Pipeline.new
|
19
|
+
end
|
20
|
+
module_function :pipeline
|
13
21
|
# Read input from an URI
|
14
22
|
# example: input.from_uri("http://www.libis.be")
|
15
23
|
# input.from_uri("file://hello.txt")
|
@@ -79,6 +87,8 @@ module DataCollector
|
|
79
87
|
# }
|
80
88
|
# rules.run(my_rules, input, output)
|
81
89
|
def rules
|
90
|
+
#DataCollector::Core.log('RULES deprecated using RULESNG')
|
91
|
+
#rules_ng
|
82
92
|
@rules ||= Rules.new
|
83
93
|
end
|
84
94
|
module_function :rules
|
@@ -121,6 +131,12 @@ module DataCollector
|
|
121
131
|
end
|
122
132
|
module_function :log
|
123
133
|
|
134
|
+
def error(message)
|
135
|
+
@logger ||= Logger.new(STDOUT)
|
136
|
+
@logger.error(message)
|
137
|
+
end
|
138
|
+
module_function :error
|
139
|
+
|
124
140
|
end
|
125
141
|
|
126
142
|
end
|
data/lib/data_collector/input/dir.rb
ADDED
@@ -0,0 +1,28 @@
|
|
1
|
+
require_relative 'generic'
|
2
|
+
require 'listen'
|
3
|
+
|
4
|
+
module DataCollector
|
5
|
+
class Input
|
6
|
+
class Dir < Generic
|
7
|
+
def initialize(uri, options)
|
8
|
+
super
|
9
|
+
end
|
10
|
+
|
11
|
+
def running?
|
12
|
+
@listener.processing?
|
13
|
+
end
|
14
|
+
|
15
|
+
private
|
16
|
+
|
17
|
+
def create_listener
|
18
|
+
@listener ||= Listen.to("#{@uri.host}/#{@uri.path}", @options) do |modified, added, _|
|
19
|
+
files = added | modified
|
20
|
+
files.each do |filename|
|
21
|
+
handle_on_message(input, output, filename)
|
22
|
+
end
|
23
|
+
end
|
24
|
+
end
|
25
|
+
|
26
|
+
end
|
27
|
+
end
|
28
|
+
end
|
data/lib/data_collector/input/generic.rb
ADDED
@@ -0,0 +1,77 @@
|
|
1
|
+
require 'listen'
|
2
|
+
|
3
|
+
module DataCollector
|
4
|
+
class Input
|
5
|
+
class Generic
|
6
|
+
def initialize(uri, options)
|
7
|
+
@uri = uri
|
8
|
+
@options = options
|
9
|
+
|
10
|
+
@input = DataCollector::Input.new
|
11
|
+
@output = DataCollector::Output.new
|
12
|
+
|
13
|
+
@listener = create_listener
|
14
|
+
end
|
15
|
+
|
16
|
+
def run(should_block = false, &block)
|
17
|
+
raise DataCollector::Error, 'Please supply an on_message block' if @on_message_callback.nil?
|
18
|
+
@listener.start
|
19
|
+
|
20
|
+
if should_block
|
21
|
+
while running?
|
22
|
+
yield block if block_given?
|
23
|
+
sleep 2
|
24
|
+
end
|
25
|
+
else
|
26
|
+
yield block if block_given?
|
27
|
+
end
|
28
|
+
|
29
|
+
end
|
30
|
+
|
31
|
+
def stop
|
32
|
+
@listener.stop
|
33
|
+
end
|
34
|
+
|
35
|
+
def pause
|
36
|
+
@listener.pause
|
37
|
+
end
|
38
|
+
|
39
|
+
def running?
|
40
|
+
@listener.running?
|
41
|
+
end
|
42
|
+
|
43
|
+
def stopped?
|
44
|
+
@listener.stopped?
|
45
|
+
end
|
46
|
+
|
47
|
+
def paused?
|
48
|
+
@listener.paused?
|
49
|
+
end
|
50
|
+
|
51
|
+
def on_message(&block)
|
52
|
+
@on_message_callback = block
|
53
|
+
end
|
54
|
+
|
55
|
+
private
|
56
|
+
|
57
|
+
def create_listener
|
58
|
+
raise DataCollector::Error, 'Please implement a listener'
|
59
|
+
end
|
60
|
+
|
61
|
+
def handle_on_message(input, output, data)
|
62
|
+
if (callback = @on_message_callback)
|
63
|
+
timing = Time.now
|
64
|
+
begin
|
65
|
+
callback.call(input, output, data)
|
66
|
+
rescue StandardError => e
|
67
|
+
DataCollector::Core.error("INPUT #{e.message}")
|
68
|
+
puts e.backtrace.join("\n")
|
69
|
+
ensure
|
70
|
+
DataCollector::Core.log("INPUT ran for #{((Time.now.to_f - timing.to_f).to_f * 1000.0).to_i}ms")
|
71
|
+
end
|
72
|
+
end
|
73
|
+
end
|
74
|
+
|
75
|
+
end
|
76
|
+
end
|
77
|
+
end
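The `handle_on_message` method in generic.rb wraps the user callback so that a failing handler is logged instead of killing the listener, and the elapsed time is always reported. A minimal standalone sketch of that pattern follows. `call_with_timing` is a hypothetical name, and the sketch swaps `Time.now` for the monotonic clock, which is a substitution on my part rather than what the gem does.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for Generic#handle_on_message: run the callback,
# log any StandardError, and always log how long the call took.
def call_with_timing(logger, callback, *args)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  callback.call(*args)
rescue StandardError => e
  logger.error("INPUT #{e.message}")
  nil # a failed callback yields nil instead of propagating the error
ensure
  elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).to_i
  logger.info("INPUT ran for #{elapsed_ms}ms")
end

log_io = StringIO.new
logger = Logger.new(log_io)

puts call_with_timing(logger, ->(name) { "processed #{name}" }, 'a.xml')
call_with_timing(logger, ->(_name) { raise 'boom' }, 'b.xml')
puts log_io.string
```

Swallowing the error keeps a long-running listener alive, at the cost that callers only learn about failures from the log.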
|
data/lib/data_collector/input/queue.rb
ADDED
@@ -0,0 +1,60 @@
|
|
1
|
+
require_relative 'generic'
|
2
|
+
require 'bunny'
|
3
|
+
require 'active_support/core_ext/hash'
|
4
|
+
|
5
|
+
module DataCollector
|
6
|
+
class Input
|
7
|
+
class Queue < Generic
|
8
|
+
def initialize(uri, options)
|
9
|
+
super
|
10
|
+
|
11
|
+
if running?
|
12
|
+
create_channel unless @channel
|
13
|
+
create_queue unless @queue
|
14
|
+
end
|
15
|
+
end
|
16
|
+
|
17
|
+
def running?
|
18
|
+
@listener.open?
|
19
|
+
end
|
20
|
+
|
21
|
+
def send(message)
|
22
|
+
if running?
|
23
|
+
@queue.publish(message)
|
24
|
+
end
|
25
|
+
end
|
26
|
+
|
27
|
+
private
|
28
|
+
|
29
|
+
def create_listener
|
30
|
+
@listener ||= begin
|
31
|
+
connection = Bunny.new(@uri.to_s)
|
32
|
+
connection.start
|
33
|
+
|
34
|
+
connection
|
35
|
+
rescue StandardError => e
|
36
|
+
raise DataCollector::Error, "Unable to connect to RabbitMQ. #{e.message}"
|
37
|
+
end
|
38
|
+
end
|
39
|
+
|
40
|
+
def create_channel
|
41
|
+
raise DataCollector::Error, 'Connection to RabbitMQ is closed' if @listener.closed?
|
42
|
+
@channel ||= @listener.create_channel
|
43
|
+
end
|
44
|
+
|
45
|
+
def create_queue
|
46
|
+
@queue ||= begin
|
47
|
+
options = CGI.parse(@uri.query).with_indifferent_access
|
48
|
+
raise DataCollector::Error, '"channel" query parameter missing from uri.' unless options.include?(:channel)
|
49
|
+
queue = @channel.queue(options[:channel].first)
|
50
|
+
|
51
|
+
queue.subscribe do |delivery_info, metadata, payload|
|
52
|
+
handle_on_message(input, output, payload)
|
53
|
+
end if queue
|
54
|
+
|
55
|
+
queue
|
56
|
+
end
|
57
|
+
end
|
58
|
+
end
|
59
|
+
end
|
60
|
+
end
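In queue.rb the queue name comes from the `channel` query parameter of the amqp URI, and a missing parameter raises an error. The sketch below shows that lookup using only the standard library. `queue_name_from` is a hypothetical helper, and it uses `URI.decode_www_form` where the gem calls `CGI.parse`, so treat it as an approximation of the behaviour rather than the gem's code.

```ruby
require 'uri'

# Hypothetical helper: extract the queue name from the "channel" query parameter.
def queue_name_from(uri_string)
  uri = URI.parse(uri_string)
  # uri.query is nil when the URI has no query string, hence the to_s guard.
  params = URI.decode_www_form(uri.query.to_s).to_h
  params.fetch('channel') do
    raise ArgumentError, '"channel" query parameter missing from uri.'
  end
end

puts queue_name_from('amqp://user:password@localhost?channel=hello') # prints hello
```

No connection is opened here; the sketch only covers the URI handling that happens before a queue is declared.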
|
data/lib/data_collector/input.rb
CHANGED
@@ -12,6 +12,8 @@ require 'active_support/core_ext/hash'
|
|
12
12
|
require 'zlib'
|
13
13
|
require 'minitar'
|
14
14
|
require 'csv'
|
15
|
+
require_relative 'input/dir'
|
16
|
+
require_relative 'input/queue'
|
15
17
|
|
16
18
|
#require_relative 'ext/xml_utility_node'
|
17
19
|
module DataCollector
|
@@ -34,7 +36,13 @@ module DataCollector
|
|
34
36
|
when 'https'
|
35
37
|
data = from_https(uri, options)
|
36
38
|
when 'file'
|
37
|
-
|
39
|
+
if File.directory?("#{uri.host}/#{uri.path}")
|
40
|
+
return from_dir(uri, options)
|
41
|
+
else
|
42
|
+
data = from_file(uri, options)
|
43
|
+
end
|
44
|
+
when 'amqp'
|
45
|
+
data = from_queue(uri,options)
|
38
46
|
else
|
39
47
|
raise "Do not know how to process #{source}"
|
40
48
|
end
|
@@ -61,7 +69,10 @@ module DataCollector
|
|
61
69
|
|
62
70
|
def from_https(uri, options = {})
|
63
71
|
data = nil
|
64
|
-
|
72
|
+
if options.with_indifferent_access.include?(:logging) && options.with_indifferent_access[:logging]
|
73
|
+
HTTP.default_options = HTTP::Options.new(features: { logging: { logger: @logger } })
|
74
|
+
end
|
75
|
+
|
65
76
|
http = HTTP
|
66
77
|
|
67
78
|
#http.use(logging: {logger: @logger})
|
@@ -157,6 +168,14 @@ module DataCollector
|
|
157
168
|
data
|
158
169
|
end
|
159
170
|
|
171
|
+
def from_dir(uri, options = {})
|
172
|
+
DataCollector::Input::Dir.new(uri, options)
|
173
|
+
end
|
174
|
+
|
175
|
+
def from_queue(uri, options = {})
|
176
|
+
DataCollector::Input::Queue.new(uri, options)
|
177
|
+
end
|
178
|
+
|
160
179
|
def xml_to_hash(data)
|
161
180
|
#gsub('<\/', '< /') outherwise wrong XML-parsing (see records lirias1729192 )
|
162
181
|
data = data.gsub /</, '< /'
|
data/lib/data_collector/output.rb
CHANGED
@@ -38,8 +38,10 @@ module DataCollector
|
|
38
38
|
data[k] << v
|
39
39
|
end
|
40
40
|
else
|
41
|
-
|
42
|
-
|
41
|
+
data[k] = v
|
42
|
+
# HELP: why am I creating an array here?
|
43
|
+
# t = data[k]
|
44
|
+
# data[k] = Array.new([t, v])
|
43
45
|
end
|
44
46
|
else
|
45
47
|
data[k] = v
|
@@ -152,7 +154,6 @@ module DataCollector
|
|
152
154
|
result
|
153
155
|
rescue Exception => e
|
154
156
|
raise "unable to transform to text: #{e.message}"
|
155
|
-
""
|
156
157
|
end
|
157
158
|
|
158
159
|
def to_tmp_file(erb_file, records_dir)
|
data/lib/data_collector/pipeline.rb
ADDED
@@ -0,0 +1,91 @@
|
|
1
|
+
require 'iso8601'
|
2
|
+
|
3
|
+
module DataCollector
|
4
|
+
class Pipeline
|
5
|
+
attr_reader :run_count, :name
|
6
|
+
def initialize(options = {})
|
7
|
+
@running = false
|
8
|
+
@paused = false
|
9
|
+
|
10
|
+
@input = DataCollector::Input.new
|
11
|
+
@output = DataCollector::Output.new
|
12
|
+
@run_count = 0
|
13
|
+
|
14
|
+
@schedule = options[:schedule] || {}
|
15
|
+
@name = options[:name] || "#{Time.now.to_i}-#{rand(10000)}"
|
16
|
+
end
|
17
|
+
|
18
|
+
def on_message(&block)
|
19
|
+
@on_message_callback = block
|
20
|
+
end
|
21
|
+
|
22
|
+
def run
|
23
|
+
if paused? && @running
|
24
|
+
@paused = false
|
25
|
+
end
|
26
|
+
|
27
|
+
@running = true
|
28
|
+
if @schedule && !@schedule.empty?
|
29
|
+
while running?
|
30
|
+
@run_count += 1
|
31
|
+
start_time = ISO8601::DateTime.new(Time.now.to_datetime.to_s)
|
32
|
+
begin
|
33
|
+
duration = ISO8601::Duration.new(@schedule)
|
34
|
+
rescue StandardError => e
|
35
|
+
raise DataCollector::Error, "PIPELINE - bad schedule: #{e.message}"
|
36
|
+
end
|
37
|
+
interval = ISO8601::TimeInterval.from_duration(start_time, duration)
|
38
|
+
|
39
|
+
DataCollector::Core.log("PIPELINE running in #{interval.size} seconds")
|
40
|
+
sleep interval.size
|
41
|
+
handle_on_message(@input, @output) unless paused?
|
42
|
+
end
|
43
|
+
else # run once
|
44
|
+
@run_count += 1
|
45
|
+
DataCollector::Core.log("PIPELINE running once")
|
46
|
+
handle_on_message(@input, @output)
|
47
|
+
end
|
48
|
+
rescue StandardError => e
|
49
|
+
DataCollector::Core.error("PIPELINE run failed: #{e.message}")
|
50
|
+
raise e
|
51
|
+
#puts e.backtrace.join("\n")
|
52
|
+
end
|
53
|
+
|
54
|
+
def stop
|
55
|
+
@running = false
|
56
|
+
@paused = false
|
57
|
+
end
|
58
|
+
|
59
|
+
def pause
|
60
|
+
@paused = !@paused if @running
|
61
|
+
end
|
62
|
+
|
63
|
+
def running?
|
64
|
+
@running
|
65
|
+
end
|
66
|
+
|
67
|
+
def stopped?
|
68
|
+
!@running
|
69
|
+
end
|
70
|
+
|
71
|
+
def paused?
|
72
|
+
@paused
|
73
|
+
end
|
74
|
+
|
75
|
+
private
|
76
|
+
|
77
|
+
def handle_on_message(input, output)
|
78
|
+
if (callback = @on_message_callback)
|
79
|
+
timing = Time.now
|
80
|
+
begin
|
81
|
+
callback.call(input, output)
|
82
|
+
rescue StandardError => e
|
83
|
+
DataCollector::Core.error("PIPELINE #{e.message}")
|
84
|
+
ensure
|
85
|
+
DataCollector::Core.log("PIPELINE ran for #{((Time.now.to_f - timing.to_f).to_f * 1000.0).to_i}ms")
|
86
|
+
end
|
87
|
+
end
|
88
|
+
end
|
89
|
+
|
90
|
+
end
|
91
|
+
end
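The scheduled branch of `Pipeline#run` loops while the pipeline is running: it counts the run, sleeps for the schedule interval, then invokes the callback unless paused. Calling `stop` from inside the callback is what ends the loop. The sketch below is a simplified stand-in, not the gem's class: `TinyPipeline` is a hypothetical name, the interval is given in plain seconds instead of an ISO 8601 string, the callback receives the pipeline instead of input and output, and the sleep call is injectable so the example finishes immediately.

```ruby
# Simplified, hypothetical stand-in for DataCollector::Pipeline.
class TinyPipeline
  attr_reader :run_count

  def initialize(interval: nil, sleeper: ->(seconds) { sleep(seconds) })
    @interval = interval
    @sleeper = sleeper
    @running = false
    @paused = false
    @run_count = 0
  end

  def on_message(&block)
    @callback = block
  end

  def run
    @paused = false
    @running = true

    if @interval.nil? # no schedule: run once
      @run_count += 1
      @callback&.call(self)
      return
    end

    while @running
      @run_count += 1
      @sleeper.call(@interval)
      @callback&.call(self) unless @paused
    end
  end

  def stop
    @running = false
    @paused = false
  end

  def pause
    @paused = !@paused if @running
  end

  def running?
    @running
  end
end

pipeline = TinyPipeline.new(interval: 5, sleeper: ->(_seconds) {})
pipeline.on_message { |p| p.stop if p.run_count > 2 }
pipeline.run
puts pipeline.run_count # prints 3
```

As in the source, the counter increases on every loop iteration, including iterations skipped because the pipeline is paused.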
|
data/lib/data_collector/rules.rb
CHANGED
@@ -1,130 +1,9 @@
|
|
1
|
-
|
1
|
+
require_relative 'rules_ng'
|
2
2
|
|
3
3
|
module DataCollector
|
4
|
-
class Rules
|
5
|
-
def initialize()
|
6
|
-
|
4
|
+
class Rules < RulesNg
|
5
|
+
def initialize(logger = Logger.new(STDOUT))
|
6
|
+
super
|
7
7
|
end
|
8
|
-
|
9
|
-
def run(rule_map, from_record, to_record, options = {})
|
10
|
-
rule_map.each do |map_to_key, rule|
|
11
|
-
if rule.is_a?(Array)
|
12
|
-
rule.each do |sub_rule|
|
13
|
-
apply_rule(map_to_key, sub_rule, from_record, to_record, options)
|
14
|
-
end
|
15
|
-
else
|
16
|
-
apply_rule(map_to_key, rule, from_record, to_record, options)
|
17
|
-
end
|
18
|
-
end
|
19
|
-
|
20
|
-
to_record.each do |element|
|
21
|
-
element = element.delete_if do |k, v|
|
22
|
-
v != false && (v.nil?)
|
23
|
-
end
|
24
|
-
end
|
25
|
-
end
|
26
|
-
|
27
|
-
private
|
28
|
-
|
29
|
-
def apply_rule(map_to_key, rule, from_record, to_record, options = {})
|
30
|
-
if rule.has_key?('text')
|
31
|
-
suffix = (rule && rule.key?('options') && rule['options'].key?('suffix')) ? rule['options']['suffix'] : ''
|
32
|
-
to_record << { map_to_key.to_sym => add_suffix(rule['text'], suffix) }
|
33
|
-
elsif rule.has_key?('options') && rule['options'].has_key?('convert') && rule['options']['convert'].eql?('each')
|
34
|
-
result = get_value_for(map_to_key, rule['filter'], from_record, rule['options'], options)
|
35
|
-
|
36
|
-
if result.is_a?(Array)
|
37
|
-
result.each do |m|
|
38
|
-
to_record << {map_to_key.to_sym => m}
|
39
|
-
end
|
40
|
-
else
|
41
|
-
to_record << {map_to_key.to_sym => result}
|
42
|
-
end
|
43
|
-
else
|
44
|
-
result = get_value_for(map_to_key, rule['filter'], from_record, rule['options'], options)
|
45
|
-
return if result && result.empty?
|
46
|
-
|
47
|
-
to_record << {map_to_key.to_sym => result}
|
48
|
-
end
|
49
|
-
end
|
50
|
-
|
51
|
-
def get_value_for(tag_key, filter_path, record, rule_options = {}, options = {})
|
52
|
-
data = nil
|
53
|
-
if record
|
54
|
-
if filter_path.is_a?(Array) && !record.is_a?(Array)
|
55
|
-
record = [record]
|
56
|
-
end
|
57
|
-
|
58
|
-
data = Core::filter(record, filter_path)
|
59
|
-
|
60
|
-
if data && rule_options
|
61
|
-
if rule_options.key?('convert')
|
62
|
-
case rule_options['convert']
|
63
|
-
when 'time'
|
64
|
-
result = []
|
65
|
-
data = [data] unless data.is_a?(Array)
|
66
|
-
data.each do |d|
|
67
|
-
result << Time.parse(d)
|
68
|
-
end
|
69
|
-
data = result
|
70
|
-
when 'map'
|
71
|
-
if data.is_a?(Array)
|
72
|
-
data = data.map do |r|
|
73
|
-
rule_options['map'][r] if rule_options['map'].key?(r)
|
74
|
-
end
|
75
|
-
|
76
|
-
data.compact!
|
77
|
-
data.flatten! if rule_options.key?('flatten') && rule_options['flatten']
|
78
|
-
else
|
79
|
-
return rule_options['map'][data] if rule_options['map'].key?(data)
|
80
|
-
end
|
81
|
-
when 'each'
|
82
|
-
data = [data] unless data.is_a?(Array)
|
83
|
-
if options.empty?
|
84
|
-
data = data.map { |d| rule_options['lambda'].call(d) }
|
85
|
-
else
|
86
|
-
data = data.map { |d| rule_options['lambda'].call(d, options) }
|
87
|
-
end
|
88
|
-
data.flatten! if rule_options.key?('flatten') && rule_options['flatten']
|
89
|
-
when 'call'
|
90
|
-
if options.empty?
|
91
|
-
data = rule_options['lambda'].call(data)
|
92
|
-
else
|
93
|
-
data = rule_options['lambda'].call(data, options)
|
94
|
-
end
|
95
|
-
return data
|
96
|
-
end
|
97
|
-
end
|
98
|
-
|
99
|
-
if rule_options.key?('suffix')
|
100
|
-
data = add_suffix(data, rule_options['suffix'])
|
101
|
-
end
|
102
|
-
|
103
|
-
end
|
104
|
-
|
105
|
-
end
|
106
|
-
|
107
|
-
return data
|
108
|
-
end
|
109
|
-
|
110
|
-
def add_suffix(data, suffix)
|
111
|
-
case data.class.name
|
112
|
-
when 'Array'
|
113
|
-
result = []
|
114
|
-
data.each do |d|
|
115
|
-
result << add_suffix(d, suffix)
|
116
|
-
end
|
117
|
-
data = result
|
118
|
-
when 'Hash'
|
119
|
-
data.each do |k, v|
|
120
|
-
data[k] = add_suffix(v, suffix)
|
121
|
-
end
|
122
|
-
else
|
123
|
-
data = data.to_s
|
124
|
-
data += suffix
|
125
|
-
end
|
126
|
-
data
|
127
|
-
end
|
128
|
-
|
129
8
|
end
|
130
|
-
end
|
9
|
+
end
|
data/lib/data_collector/rules.rb.depricated
ADDED
@@ -0,0 +1,130 @@
|
|
1
|
+
require 'logger'
|
2
|
+
|
3
|
+
module DataCollector
|
4
|
+
class Rules
|
5
|
+
def initialize()
|
6
|
+
@logger = Logger.new(STDOUT)
|
7
|
+
end
|
8
|
+
|
9
|
+
def run(rule_map, from_record, to_record, options = {})
|
10
|
+
rule_map.each do |map_to_key, rule|
|
11
|
+
if rule.is_a?(Array)
|
12
|
+
rule.each do |sub_rule|
|
13
|
+
apply_rule(map_to_key, sub_rule, from_record, to_record, options)
|
14
|
+
end
|
15
|
+
else
|
16
|
+
apply_rule(map_to_key, rule, from_record, to_record, options)
|
17
|
+
end
|
18
|
+
end
|
19
|
+
|
20
|
+
to_record.each do |element|
|
21
|
+
element = element.delete_if do |k, v|
|
22
|
+
v != false && (v.nil?)
|
23
|
+
end
|
24
|
+
end
|
25
|
+
end
|
26
|
+
|
27
|
+
private
|
28
|
+
|
29
|
+
def apply_rule(map_to_key, rule, from_record, to_record, options = {})
|
30
|
+
if rule.has_key?('text')
|
31
|
+
suffix = (rule && rule.key?('options') && rule['options'].key?('suffix')) ? rule['options']['suffix'] : ''
|
32
|
+
to_record << { map_to_key.to_sym => add_suffix(rule['text'], suffix) }
|
33
|
+
elsif rule.has_key?('options') && rule['options'].has_key?('convert') && rule['options']['convert'].eql?('each')
|
34
|
+
result = get_value_for(map_to_key, rule['filter'], from_record, rule['options'], options)
|
35
|
+
|
36
|
+
if result.is_a?(Array)
|
37
|
+
result.each do |m|
|
38
|
+
to_record << {map_to_key.to_sym => m}
|
39
|
+
end
|
40
|
+
else
|
41
|
+
to_record << {map_to_key.to_sym => result}
|
42
|
+
end
|
43
|
+
else
|
44
|
+
result = get_value_for(map_to_key, rule['filter'], from_record, rule['options'], options)
|
45
|
+
return if result && result.empty?
|
46
|
+
|
47
|
+
to_record << {map_to_key.to_sym => result}
|
48
|
+
end
|
49
|
+
end
|
50
|
+
|
51
|
+
def get_value_for(tag_key, filter_path, record, rule_options = {}, options = {})
|
52
|
+
data = nil
|
53
|
+
if record
|
54
|
+
if filter_path.is_a?(Array) && !record.is_a?(Array)
|
55
|
+
record = [record]
|
56
|
+
end
|
57
|
+
|
58
|
+
data = Core::filter(record, filter_path)
|
59
|
+
|
60
|
+
if data && rule_options
|
61
|
+
if rule_options.key?('convert')
|
62
|
+
case rule_options['convert']
|
63
|
+
when 'time'
|
64
|
+
result = []
|
65
|
+
data = [data] unless data.is_a?(Array)
|
66
|
+
data.each do |d|
|
67
|
+
result << Time.parse(d)
|
68
|
+
end
|
69
|
+
data = result
|
70
|
+
when 'map'
|
71
|
+
if data.is_a?(Array)
|
72
|
+
data = data.map do |r|
|
73
|
+
rule_options['map'][r] if rule_options['map'].key?(r)
|
74
|
+
end
|
75
|
+
|
76
|
+
data.compact!
|
77
|
+
data.flatten! if rule_options.key?('flatten') && rule_options['flatten']
|
78
|
+
else
|
79
|
+
return rule_options['map'][data] if rule_options['map'].key?(data)
|
80
|
+
end
|
81
|
+
when 'each'
|
82
|
+
data = [data] unless data.is_a?(Array)
|
83
|
+
if options.empty?
|
84
|
+
data = data.map { |d| rule_options['lambda'].call(d) }
|
85
|
+
else
|
86
|
+
data = data.map { |d| rule_options['lambda'].call(d, options) }
|
87
|
+
end
|
88
|
+
data.flatten! if rule_options.key?('flatten') && rule_options['flatten']
|
89
|
+
when 'call'
|
90
|
+
if options.empty?
|
91
|
+
data = rule_options['lambda'].call(data)
|
92
|
+
else
|
93
|
+
data = rule_options['lambda'].call(data, options)
|
94
|
+
end
|
95
|
+
return data
|
96
|
+
end
|
97
|
+
end
|
98
|
+
|
99
|
+
if rule_options.key?('suffix')
|
100
|
+
data = add_suffix(data, rule_options['suffix'])
|
101
|
+
end
|
102
|
+
|
103
|
+
end
|
104
|
+
|
105
|
+
end
|
106
|
+
|
107
|
+
return data
|
108
|
+
end
|
109
|
+
|
110
|
+
def add_suffix(data, suffix)
|
111
|
+
case data.class.name
|
112
|
+
when 'Array'
|
113
|
+
result = []
|
114
|
+
data.each do |d|
|
115
|
+
result << add_suffix(d, suffix)
|
116
|
+
end
|
117
|
+
data = result
|
118
|
+
when 'Hash'
|
119
|
+
data.each do |k, v|
|
120
|
+
data[k] = add_suffix(v, suffix)
|
121
|
+
end
|
122
|
+
else
|
123
|
+
data = data.to_s
|
124
|
+
data += suffix
|
125
|
+
end
|
126
|
+
data
|
127
|
+
end
|
128
|
+
|
129
|
+
end
|
130
|
+
end
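The legacy `add_suffix` helper in the deprecated Rules class recurses through arrays and hashes and appends the suffix to every leaf value after converting it to a string. Below is a compact sketch of the same idea. Unlike the original, which rewrites hash values in place, this variant returns new objects and leaves its input untouched; that difference is a choice made for the sketch.

```ruby
# Non-mutating variant of the legacy add_suffix helper.
def add_suffix(data, suffix)
  case data
  when Array then data.map { |item| add_suffix(item, suffix) }
  when Hash  then data.transform_values { |value| add_suffix(value, suffix) }
  else "#{data}#{suffix}" # leaf values are converted to String first
  end
end

p add_suffix(['water', { 'en' => 'heat' }], '-subject')
```

Matching on the class with `case data` avoids the string comparison on `data.class.name` used in the original.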
|
data/lib/data_collector/rules_ng.rb
CHANGED
@@ -53,28 +53,38 @@ module DataCollector
 
       output_data << {tag.to_sym => data} unless data.nil? || (data.is_a?(Array) && data.empty?)
     rescue StandardError => e
-      puts "error running rule '#{tag}'\n\t#{e.message}"
-      puts e.backtrace.join("\n")
+      # puts "error running rule '#{tag}'\n\t#{e.message}"
+      # puts e.backtrace.join("\n")
+      raise DataCollector::Error, "error running rule '#{tag}'\n\t#{e.message}"
     end
 
     def apply_filtered_data_on_payload(input_data, payload, options = {})
       return nil if input_data.nil?
 
+      normalized_options = options.select{|k,v| k !~ /^_/ }.with_indifferent_access
       output_data = nil
       case payload.class.name
       when 'Proc'
         data = input_data.is_a?(Array) ? input_data : [input_data]
-        output_data = if
+        output_data = if normalized_options.empty?
                         data.map { |d| payload.call(d) }
                       else
-                        data.map { |d| payload.call(d,
+                        data.map { |d| payload.call(d, normalized_options) }
                       end
       when 'Hash'
         input_data = [input_data] unless input_data.is_a?(Array)
         if input_data.is_a?(Array)
           output_data = input_data.map do |m|
             if payload.key?('suffix')
-
+              if (m.is_a?(Hash))
+                m.transform_values{|v| v.is_a?(String) ? "#{v}#{payload['suffix']}" : v}
+              elsif m.is_a?(Array)
+                m.map{|n| n.is_a?(String) ? "#{n}#{payload['suffix']}": n}
+              elsif m.methods.include?(:to_s)
+                "#{m}#{payload['suffix']}"
+              else
+                m
+              end
             else
               payload[m]
             end
@@ -83,7 +93,7 @@ module DataCollector
       when 'Array'
         output_data = input_data
         payload.each do |p|
-          output_data = apply_filtered_data_on_payload(output_data, p,
+          output_data = apply_filtered_data_on_payload(output_data, p, normalized_options)
         end
       else
         output_data = [input_data]
@@ -97,12 +107,16 @@ module DataCollector
         output_data = output_data.first
       end
 
-      if options.key?('_no_array_with_one_element') && options['_no_array_with_one_element'] &&
+      if options.with_indifferent_access.key?('_no_array_with_one_element') && options.with_indifferent_access['_no_array_with_one_element'] &&
          output_data.is_a?(Array) && output_data.size == 1
         output_data = output_data.first
       end
 
       output_data
+    rescue StandardError => e
+      # puts "error applying filtered data on payload'#{payload.to_json}'\n\t#{e.message}"
+      # puts e.backtrace.join("\n")
+      raise DataCollector::Error, "error applying filtered data on payload'#{payload.to_json}'\n\t#{e.message}"
     end
 
     def json_path_filter(filter, input_data)
@@ -111,6 +125,10 @@ module DataCollector
       return input_data if input_data.is_a?(String)
 
       Core.filter(input_data, filter)
+    rescue StandardError => e
+      puts "error running filter '#{filter}'\n\t#{e.message}"
+      puts e.backtrace.join("\n")
+      raise DataCollector::Error, "error running filter '#{filter}'\n\t#{e.message}"
     end
   end
 end
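Two behaviours introduced by the hunks above can be tried in isolation: options whose key starts with `_` are dropped before the payload lambda is called, and any `StandardError` is re-raised as a library-specific error instead of being printed. The sketch below is a rough approximation under stated assumptions: `Sketch`, `Sketch::Error`, `normalize_options` and `apply` are hypothetical names, and `transform_keys(&:to_s)` is a dependency-free stand-in for ActiveSupport's `with_indifferent_access`, which the gem itself uses.

```ruby
# Hypothetical module; it only imitates the pattern shown in the diff.
module Sketch
  class Error < StandardError; end

  # Drop "private" options (keys starting with "_") and stringify the rest.
  # Stand-in for options.select { ... }.with_indifferent_access in the gem.
  def self.normalize_options(options)
    options.reject { |k, _| k.to_s.start_with?('_') }.transform_keys(&:to_s)
  end

  # Call the payload once per input element; pass options only when
  # at least one public option remains after normalization.
  def self.apply(input, payload, options = {})
    opts = normalize_options(options)
    data = input.is_a?(Array) ? input : [input]
    if opts.empty?
      data.map { |d| payload.call(d) }
    else
      data.map { |d| payload.call(d, opts) }
    end
  rescue StandardError => e
    # Wrap the low-level failure so callers can rescue a single error class.
    raise Error, "error applying payload\n\t#{e.message}"
  end
end

upcase = ->(d) { d.upcase }
scale  = ->(d, o) { d * o['factor'] }

# The underscore-prefixed option is filtered out, so the one-argument lambda works.
shouted = Sketch.apply('a', upcase, '_no_array_with_one_element' => true)
# A public option is passed through, reachable by its string key.
scaled = Sketch.apply(2, scale, factor: 3)
```

With this shape a caller rescues `Sketch::Error` rather than scanning standard output, which matches the move from `puts` to `raise DataCollector::Error` in the diff.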
data/lib/data_collector.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: data_collector
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.18.0
 platform: ruby
 authors:
 - Mehmet Celik
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2023-
+date: 2023-04-18 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activesupport
@@ -114,14 +114,14 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.
+        version: '1.14'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.
+        version: '1.14'
 - !ruby/object:Gem::Dependency
   name: nori
   requirement: !ruby/object:Gem::Requirement
@@ -136,6 +136,48 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '2.6'
+- !ruby/object:Gem::Dependency
+  name: iso8601
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.13'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.13'
+- !ruby/object:Gem::Dependency
+  name: listen
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.8'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.8'
+- !ruby/object:Gem::Dependency
+  name: bunny
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.20'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.20'
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
@@ -156,14 +198,14 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '5.
+        version: '5.18'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '5.
+        version: '5.18'
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
@@ -208,13 +250,19 @@ files:
 - bin/console
 - bin/setup
 - data_collector.gemspec
+- examples/marc.rb
 - lib/data_collector.rb
 - lib/data_collector/config_file.rb
 - lib/data_collector/core.rb
 - lib/data_collector/ext/xml_utility_node.rb
 - lib/data_collector/input.rb
+- lib/data_collector/input/dir.rb
+- lib/data_collector/input/generic.rb
+- lib/data_collector/input/queue.rb
 - lib/data_collector/output.rb
+- lib/data_collector/pipeline.rb
 - lib/data_collector/rules.rb
+- lib/data_collector/rules.rb.depricated
 - lib/data_collector/rules_ng.rb
 - lib/data_collector/runner.rb
 - lib/data_collector/version.rb
@@ -240,7 +288,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.4.10
 signing_key:
 specification_version: 4
 summary: ETL helper library
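The three runtime dependencies added in the metadata above (iso8601, listen, bunny) all use the pessimistic `~>` operator. As a quick way to see which versions such a constraint admits, the following sketch uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes; the `allowed?` helper and the `NEW_RUNTIME_DEPS` constant are hypothetical names introduced only for this illustration.

```ruby
require 'rubygems'

# Constraints copied from the dependencies added in this release.
NEW_RUNTIME_DEPS = {
  'iso8601' => '~> 0.13',
  'listen'  => '~> 3.8',
  'bunny'   => '~> 2.20'
}.freeze

# Hypothetical helper: does `version` satisfy `constraint`?
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# "~> 3.8" admits any 3.x release from 3.8 upward, but not 4.0.
listen_minor_bump = allowed?(NEW_RUNTIME_DEPS['listen'], '3.9.1')
listen_major_bump = allowed?(NEW_RUNTIME_DEPS['listen'], '4.0.0')
listen_too_old    = allowed?(NEW_RUNTIME_DEPS['listen'], '3.7.9')
```

Because the constraints carry only two version segments, each one leaves the minor and patch levels free to move while excluding the next major release.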