data_collector 0.17.0 → 0.19.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +110 -60
- data/data_collector.gemspec +5 -2
- data/examples/marc.rb +27 -0
- data/lib/data_collector/core.rb +16 -0
- data/lib/data_collector/input/dir.rb +28 -0
- data/lib/data_collector/input/generic.rb +77 -0
- data/lib/data_collector/input/queue.rb +60 -0
- data/lib/data_collector/input.rb +23 -2
- data/lib/data_collector/output.rb +4 -3
- data/lib/data_collector/pipeline.rb +116 -0
- data/lib/data_collector/rules.rb +5 -126
- data/lib/data_collector/rules.rb.depricated +130 -0
- data/lib/data_collector/rules_ng.rb +40 -11
- data/lib/data_collector/runner.rb +0 -1
- data/lib/data_collector/version.rb +1 -1
- data/lib/data_collector.rb +1 -0
- metadata +55 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 35d57ff2998ab1343a4d6e906bcd76bd67951a0eae9a6db69387e4de7dbba285
+  data.tar.gz: 702a6447c28533d2dcdce237cc209417963ea2827eaed4f4ad0ab56a62c42783
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 80e487e0d8bfa19cec43a607b3c58698c37e23fd6385be3102d3ca87584348d585241f1794f848c460c02751f29f3a28d1365c472b8e3a532a922f9104fb2e06
+  data.tar.gz: 0366f4350e54e1bf985f68d3d0532b6fb00394aad23a3641b07fd65594e7e0b16ddf5da90574251c75b83c63fe09455aa988b7395706fe76e995b17fc79cb2fe
data/README.md
CHANGED
@@ -1,39 +1,91 @@
 # DataCollector
-Convenience module to Extract, Transform and Load your data.
-
-Support objects like CONFIG, LOG, RULES
+Convenience module to Extract, Transform and Load your data in a Pipeline.
+The 'INPUT', 'OUTPUT' and 'FILTER' object will help you to read, transform and output your data.
+Support objects like CONFIG, LOG, ERROR, RULES. Will help you to write manageable rules to transform and log your data.
+Include the DataCollector::Core module into your application gives you access to these objects.
+```ruby
+include DataCollector::Core
+```
 
-
-
-
-
-
-
+Every object can be used on its own.
+
+
+#### Pipeline
+Allows you to create a simple pipeline of operations to process data. With a data pipeline, you can collect, process, and transform data, and then transfer it to various systems and applications.
+
+You can set a schedule for pipelines that are triggered by new data, specifying how often the pipeline should be
+executed in the [ISO8601 duration format](https://www.digi.com/resources/documentation/digidocs//90001488-13/reference/r_iso_8601_duration_format.htm). The processing logic is then executed.
+###### methods:
+- .new(options): options can be schedule in [ISO8601 duration format](https://www.digi.com/resources/documentation/digidocs//90001488-13/reference/r_iso_8601_duration_format.htm) and name
+- .run: start the pipeline. blocking if a schedule is supplied
+- .stop: stop the pipeline
+- .pause: pause the pipeline. Restart using .run
+- .running?: is pipeline running
+- .stopped?: is pipeline not running
+- .paused?: is pipeline paused
+- .name: name of the pipe
+- .run_count: number of times the pipe has ran
+- .on_message: handle to run every time a trigger event happens
+###### example:
+```ruby
+#create a pipline scheduled to run every 10 minutes
+pipeline = Pipeline.new(schedule: 'PT10M')
+
+pipeline.on_message do |input, output|
+  # logic
+end
 
+pipeline.run
+```
 
-#### input
-
+
+#### input
+The input component is part of the processing logic. All data is converted into a Hash, Array, ... accessible using plain Ruby or JSONPath using the filter object.
+The input component can fetch data from various URIs, such as files, URLs, directories, queues, ...
+For a push input component, a listener is created with a processing logic block that is executed whenever new data is available.
+A push happens when new data is created in a directory, message queue, ...
 
-**Public methods**
 ```ruby
 from_uri(source, options = {:raw, :content_type})
 ```
-- source: an uri with a scheme of http, https, file
+- source: an uri with a scheme of http, https, file, amqp
 - options:
   - raw: _boolean_ do not parse
   - content_type: _string_ force a content_type if the 'Content-Type' returned by the http server is incorrect
 
-example:
+###### example:
 ```ruby
+# read from an http endpoint
 input.from_uri("http://www.libis.be")
 input.from_uri("file://hello.txt")
 input.from_uri("http://www.libis.be/record.jsonld", content_type: 'application/ld+json')
-```
 
+# read data from a RabbitMQ queue
+listener = input.from_uri('amqp://user:password@localhost?channel=hello')
+listener.on_message do |input, output, message|
+  puts message
+end
+listener.start
+
+# read data from a directory
+listener = input.from_uri('file://this/is/directory')
+listener.on_message do |input, output, filename|
+  puts filename
+end
+listener.start
+```
 
+Inputs can be JSON, XML or CSV or XML in a TAR.GZ file
 
+###### listener from input.from_uri(directory|message queue)
+When a listener is defined that is triggered by an event(PUSH) like a message queue or files written to a directory you have these extra methods.
 
-
+- .run: start the listener. blocking if a schedule is supplied
+- .stop: stop the listener
+- .pause: pause the listener. Restart using .run
+- .running?: is listener running
+- .stopped?: is listener not running
+- .paused?: is listener paused
+- .on_message: handle to run every time a trigger event happens
 
 ### output
 Output is an object you can store key/value pairs that needs to be written to an output stream.
@@ -45,7 +97,7 @@ Output is an object you can store key/value pairs that needs to be written to an
 Write output to a file, string use an ERB file as a template
 example:
 ___test.erb___
-```
+```erbruby
 <names>
   <combined><%= data[:name] %> <%= data[:last_name] %></combined>
   <%= print data, :name, :first_name %>
@@ -53,7 +105,7 @@ ___test.erb___
 </names>
 ```
 will produce
-```
+```html
 <names>
   <combined>John Doe</combined>
   <first_name>John</first_name>
@@ -97,41 +149,11 @@ filter data from a hash using [JSONPath](http://goessner.net/articles/JsonPath/index.html)
 filtered_data = filter(data, "$..metadata.record")
 ```
 
-#### rules
-
-
-
-
-~~Available convert methods are: time, map, each, call, suffix, text~~
-~~- time: parses a given time/date string into a Time object~~
-~~- map: applies a mapping to a filter~~
-~~- suffix: adds a suffix to a result~~
-~~- call: executes a lambda on the filter~~
-~~- each: runs a lambda on each row of a filter~~
-~~- text: passthrough method. Returns value unchanged~~
-
-~~example:~~
-```ruby
-my_rules = {
-  'identifier' => {"filter" => '$..id'},
-  'language' => {'filter' => '$..lang',
-    'options' => {'convert' => 'map',
-      'map' => {'nl' => 'dut', 'fr' => 'fre', 'de' => 'ger', 'en' => 'eng'}
-    }
-  },
-  'subject' => {'filter' => '$..keywords',
-    options' => {'convert' => 'each',
-      'lambda' => lambda {|d| d.split(',')}
-    }
-  },
-  'creationdate' => {'filter' => '$..published_date', 'convert' => 'time'}
-}
-
-rules.run(my_rules, record, output)
-```
-
-#### rules_ng
-!!! not compatible with RULES object
+#### rules
+The RULES objects have a simple concept. Rules exist of 3 components:
+- a destination tag
+- a jsonpath filter to get the data
+- a lambda to execute on every filter hit
 
 TODO: work in progress see test for examples on how to use
 
@@ -202,15 +224,15 @@ Here you find different rule combination that are possible
 }
 ```
 
-Here is an example on how to call last RULESET "rs_hash_with_json_filter_and_option".
 
-***
+***rules.run*** can have 4 parameters. First 3 are mandatory. The last one ***options*** can hold data static to a rule set or engine directives.
 
-List of engine directives:
+##### List of engine directives:
 - _no_array_with_one_element: defaults to false. if the result is an array with 1 element just return the element.
 
-
+###### example:
 ```ruby
+# apply RULESET "rs_hash_with_json_filter_and_option" to data
 include DataCollector::Core
 output.clear
 data = {'subject' => ['water', 'thermodynamics']}
@@ -247,8 +269,11 @@ Log to stdout
 ```ruby
 log("hello world")
 ```
-
-
+#### error
+Log an error
+```ruby
+error("if you have an issue take a tissue")
+```
 ## Example
 Input data ___test.csv___
 ```csv
@@ -315,7 +340,32 @@ Or install it yourself as:
 
 ## Usage
 
-
+```ruby
+require 'data_collector'
+
+include DataCollector::Core
+# including core gives you a pipeline, input, output, filter, config, log, error object to work with
+RULES = {
+  'title' => '$..vertitle'
+}
+#create a PULL pipeline and schedule it to run every 5 seconds
+pipeline = DataCollector::Pipeline.new(schedule: 'PT5S')
+
+pipeline.on_message do |input, output|
+  data = input.from_uri('https://services3.libis.be/primo_artefact/lirias3611609')
+  rules.run(RULES, data, output)
+  #puts JSON.pretty_generate(input.raw)
+  puts JSON.pretty_generate(output.raw)
+  output.clear
+
+  if pipeline.run_count > 2
+    log('stopping pipeline after one run')
+    pipeline.stop
+  end
+end
+pipeline.run
+
+```
 
 ## Development
 
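The rules concept the README introduces above (a destination tag, a JSONPath filter, a lambda per filter hit) can be sketched without the gem itself. The `deep_find` and `run_rules` names below are illustrative stand-ins, not the gem's API; `deep_find` hand-rolls a `$..key` style recursive lookup in place of the `jsonpath` dependency.

```ruby
# Minimal sketch of the RULES idea: tag => [filter key, lambda].
# deep_find stands in for a '$..key' JSONPath query; it is NOT the gem's API.
def deep_find(node, key, hits = [])
  case node
  when Hash
    hits << node[key] if node.key?(key)
    node.each_value { |v| deep_find(v, key, hits) }
  when Array
    node.each { |v| deep_find(v, key, hits) }
  end
  hits
end

def run_rules(rules, record, output)
  rules.each do |tag, (key, lam)|
    deep_find(record, key).each { |hit| (output[tag] ||= []) << lam.call(hit) }
  end
  output
end

rules  = { 'title' => ['vertitle', ->(v) { v.to_s.strip }] }
record = { 'metadata' => { 'vertitle' => '  Hello  ' } }
p run_rules(rules, record, {})
```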
data/data_collector.gemspec
CHANGED
@@ -43,11 +43,14 @@ Gem::Specification.new do |spec|
   spec.add_runtime_dependency 'jsonpath', '~> 1.1'
   spec.add_runtime_dependency 'mime-types', '~> 3.4'
   spec.add_runtime_dependency 'minitar', '= 0.9'
-  spec.add_runtime_dependency 'nokogiri', '~> 1.
+  spec.add_runtime_dependency 'nokogiri', '~> 1.14'
   spec.add_runtime_dependency 'nori', '~> 2.6'
+  spec.add_runtime_dependency 'iso8601', '~> 0.13'
+  spec.add_runtime_dependency 'listen', '~> 3.8'
+  spec.add_runtime_dependency 'bunny', '~> 2.20'
 
   spec.add_development_dependency 'bundler', '~> 2.3'
-  spec.add_development_dependency 'minitest', '~> 5.
+  spec.add_development_dependency 'minitest', '~> 5.18'
  spec.add_development_dependency 'rake', '~> 13.0'
  spec.add_development_dependency 'webmock', '~> 3.18'
 end
data/examples/marc.rb
ADDED
@@ -0,0 +1,27 @@
+$LOAD_PATH << '../lib'
+require 'data_collector'
+
+# include module gives us an pipeline, input, output, filter, log and error object to work with
+include DataCollector::Core
+
+RULES = {
+  "title" => {'$.record.datafield[?(@._tag == "245")]' => lambda do |d, o|
+    subfields = d['subfield']
+    subfields = [subfields] unless subfields.is_a?(Array)
+    subfields.map{|m| m["$text"]}.join(' ')
+  end
+  },
+  "author" => {'$..datafield[?(@._tag == "100")]' => lambda do |d, o|
+    subfields = d['subfield']
+    subfields = [subfields] unless subfields.is_a?(Array)
+    subfields.map{|m| m["$text"]}.join(' ')
+  end
+  }
+}
+
+#read remote record enable logging
+data = input.from_uri('https://gist.githubusercontent.com/kefo/796b39925e234fb6d912/raw/3df2ce329a947864ae8555f214253f956d679605/sample-marc-with-xsd.xml', {logging: true})
+# apply rules to data and if result contains only 1 entry do not return an array
+rules.run(RULES, data, output, {_no_array_with_one_element: true})
+# print result
+puts JSON.pretty_generate(output.raw)
data/lib/data_collector/core.rb
CHANGED
@@ -10,6 +10,14 @@ require_relative 'config_file'
 
 module DataCollector
   module Core
+    # Pipeline for your data pipeline
+    # example: pipeline.on_message do |input, output|
+    #            ** processing logic here **
+    #          end
+    def pipeline
+      @input ||= DataCollector::Pipeline.new
+    end
+    module_function :pipeline
     # Read input from an URI
     # example: input.from_uri("http://www.libis.be")
     #          input.from_uri("file://hello.txt")
@@ -79,6 +87,8 @@ module DataCollector
     # }
     # rules.run(my_rules, input, output)
     def rules
+      #DataCollector::Core.log('RULES depricated using RULESNG')
+      #rules_ng
       @rules ||= Rules.new
     end
     module_function :rules
@@ -121,6 +131,12 @@ module DataCollector
     end
     module_function :log
 
+    def error(message)
+      @logger ||= Logger.new(STDOUT)
+      @logger.error(message)
+    end
+    module_function :error
+
   end
 
 end
data/lib/data_collector/input/dir.rb
ADDED
@@ -0,0 +1,28 @@
+require_relative 'generic'
+require 'listen'
+
+module DataCollector
+  class Input
+    class Dir < Generic
+      def initialize(uri, options)
+        super
+      end
+
+      def running?
+        @listener.processing?
+      end
+
+      private
+
+      def create_listener
+        @listener ||= Listen.to("#{@uri.host}/#{@uri.path}", @options) do |modified, added, _|
+          files = added | modified
+          files.each do |filename|
+            handle_on_message(input, output, filename)
+          end
+        end
+      end
+
+    end
+  end
+end
data/lib/data_collector/input/generic.rb
ADDED
@@ -0,0 +1,77 @@
+require 'listen'
+
+module DataCollector
+  class Input
+    class Generic
+      def initialize(uri, options)
+        @uri = uri
+        @options = options
+
+        @input = DataCollector::Input.new
+        @output = DataCollector::Output.new
+
+        @listener = create_listener
+      end
+
+      def run(should_block = false, &block)
+        raise DataCollector::Error, 'Please supply a on_message block' if @on_message_callback.nil?
+        @listener.start
+
+        if should_block
+          while running?
+            yield block if block_given?
+            sleep 2
+          end
+        else
+          yield block if block_given?
+        end
+
+      end
+
+      def stop
+        @listener.stop
+      end
+
+      def pause
+        @listener.pause
+      end
+
+      def running?
+        @listener.running?
+      end
+
+      def stopped?
+        @listener.stopped?
+      end
+
+      def paused?
+        @listener.paused?
+      end
+
+      def on_message(&block)
+        @on_message_callback = block
+      end
+
+      private
+
+      def create_listener
+        raise DataCollector::Error, 'Please implement a listener'
+      end
+
+      def handle_on_message(input, output, data)
+        if (callback = @on_message_callback)
+          timing = Time.now
+          begin
+            callback.call(input, output, data)
+          rescue StandardError => e
+            DataCollector::Core.error("INPUT #{e.message}")
+            puts e.backtrace.join("\n")
+          ensure
+            DataCollector::Core.log("INPUT ran for #{((Time.now.to_f - timing.to_f).to_f * 1000.0).to_i}ms")
+          end
+        end
+      end
+
+    end
+  end
+end
data/lib/data_collector/input/queue.rb
ADDED
@@ -0,0 +1,60 @@
+require_relative 'generic'
+require 'bunny'
+require 'active_support/core_ext/hash'
+
+module DataCollector
+  class Input
+    class Queue < Generic
+      def initialize(uri, options)
+        super
+
+        if running?
+          create_channel unless @channel
+          create_queue unless @queue
+        end
+      end
+
+      def running?
+        @listener.open?
+      end
+
+      def send(message)
+        if running?
+          @queue.publish(message)
+        end
+      end
+
+      private
+
+      def create_listener
+        @listener ||= begin
+          connection = Bunny.new(@uri.to_s)
+          connection.start
+
+          connection
+        rescue StandardError => e
+          raise DataCollector::Error, "Unable to connect to RabbitMQ. #{e.message}"
+        end
+      end
+
+      def create_channel
+        raise DataCollector::Error, 'Connection to RabbitMQ is closed' if @listener.closed?
+        @channel ||= @listener.create_channel
+      end
+
+      def create_queue
+        @queue ||= begin
+          options = CGI.parse(@uri.query).with_indifferent_access
+          raise DataCollector::Error, '"channel" query parameter missing from uri.' unless options.include?(:channel)
+          queue = @channel.queue(options[:channel].first)
+
+          queue.subscribe do |delivery_info, metadata, payload|
+            handle_on_message(input, output, payload)
+          end if queue
+
+          queue
+        end
+      end
+    end
+  end
+end
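`create_queue` above derives the queue name from the amqp URI's query string with `CGI.parse`. That lookup can be tried on its own with just the Ruby standard library; the `.with_indifferent_access` step in the gem comes from ActiveSupport and is replaced here by plain string keys:

```ruby
require 'cgi'
require 'uri'

# Mirror of the query handling in Input::Queue#create_queue, stdlib only.
# CGI.parse returns every query parameter as an Array of values.
uri = URI.parse('amqp://user:password@localhost?channel=hello')
params = CGI.parse(uri.query)

raise ArgumentError, '"channel" query parameter missing from uri.' unless params.key?('channel')
channel = params['channel'].first
puts channel  # prints "hello"
```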
data/lib/data_collector/input.rb
CHANGED
@@ -12,6 +12,8 @@ require 'active_support/core_ext/hash'
 require 'zlib'
 require 'minitar'
 require 'csv'
+require_relative 'input/dir'
+require_relative 'input/queue'
 
 #require_relative 'ext/xml_utility_node'
 module DataCollector
@@ -34,7 +36,15 @@ module DataCollector
       when 'https'
         data = from_https(uri, options)
       when 'file'
-
+        if File.directory?("#{uri.host}/#{uri.path}")
+          raise DataCollector::Error, "#{uri.host}/#{uri.path} not found" unless File.exist?("#{uri.host}/#{uri.path}")
+          return from_dir(uri, options)
+        else
+          raise DataCollector::Error, "#{uri.host}/#{uri.path} not found" unless File.exist?("#{uri.host}/#{uri.path}")
+          data = from_file(uri, options)
+        end
+      when 'amqp'
+        data = from_queue(uri,options)
       else
         raise "Do not know how to process #{source}"
       end
@@ -61,7 +71,10 @@ module DataCollector
 
     def from_https(uri, options = {})
       data = nil
-
+      if options.with_indifferent_access.include?(:logging) && options.with_indifferent_access[:logging]
+        HTTP.default_options = HTTP::Options.new(features: { logging: { logger: @logger } })
+      end
+
       http = HTTP
 
       #http.use(logging: {logger: @logger})
@@ -157,6 +170,14 @@ module DataCollector
       data
     end
 
+    def from_dir(uri, options = {})
+      DataCollector::Input::Dir.new(uri, options)
+    end
+
+    def from_queue(uri, options = {})
+      DataCollector::Input::Queue.new(uri, options)
+    end
+
     def xml_to_hash(data)
       #gsub('<\/', '< /') outherwise wrong XML-parsing (see records lirias1729192 )
       data = data.gsub /</, '< /'
data/lib/data_collector/output.rb
CHANGED
@@ -38,8 +38,10 @@ module DataCollector
           data[k] << v
         end
       else
-
-
+        data[k] = v
+        # HELP: why am I creating an array here?
+        # t = data[k]
+        # data[k] = Array.new([t, v])
       end
     else
       data[k] = v
@@ -152,7 +154,6 @@ module DataCollector
       result
     rescue Exception => e
       raise "unable to transform to text: #{e.message}"
-      ""
     end
 
     def to_tmp_file(erb_file, records_dir)
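The hunk above touches Output's key accumulation: when a key already holds an Array the new value is appended, otherwise it is plainly assigned (the previously commented-out array-wrapping branch is the "HELP" question in the diff). A self-contained sketch of that shape, with a `store` helper of our own naming rather than the gem's actual `[]=`/`<<` methods:

```ruby
# Sketch of the accumulation logic in the output.rb hunk: Array values
# collect repeated writes, non-Array slots are simply (re)assigned.
def store(data, k, v)
  if data[k].is_a?(Array)
    data[k] << v
  else
    data[k] = v
  end
  data
end

data = {}
store(data, :subject, 'water')  # first write: plain assignment
store(data, :tags, [])          # seed an Array so later writes accumulate
store(data, :tags, 'a')
store(data, :tags, 'b')
p data
```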
data/lib/data_collector/pipeline.rb
ADDED
@@ -0,0 +1,116 @@
+require 'iso8601'
+
+module DataCollector
+  class Pipeline
+    attr_reader :run_count, :name
+    def initialize(options = {})
+      @running = false
+      @paused = false
+
+      @input = DataCollector::Input.new
+      @output = DataCollector::Output.new
+      @run_count = 0
+
+      @schedule = options[:schedule] || {}
+      @name = options[:name] || "#{Time.now.to_i}-#{rand(10000)}"
+      @options = options
+      @listeners = []
+    end
+
+    def on_message(&block)
+      @on_message_callback = block
+    end
+
+    def run
+      if paused? && @running
+        @paused = false
+        @listeners.each do |listener|
+          listener.run if listener.paused?
+        end
+      end
+
+      @running = true
+      if @schedule && !@schedule.empty?
+        while running?
+          @run_count += 1
+          start_time = ISO8601::DateTime.new(Time.now.to_datetime.to_s)
+          begin
+            duration = ISO8601::Duration.new(@schedule)
+          rescue StandardError => e
+            raise DataCollector::Error, "PIPELINE - bad schedule: #{e.message}"
+          end
+          interval = ISO8601::TimeInterval.from_duration(start_time, duration)
+
+          DataCollector::Core.log("PIPELINE running in #{interval.size} seconds")
+          sleep interval.size
+          handle_on_message(@input, @output) unless paused?
+        end
+      else # run once
+        @run_count += 1
+        if @options.key?(:uri)
+          listener = Input.new.from_uri(@options[:uri], @options)
+          listener.on_message do |input, output, filename|
+            DataCollector::Core.log("PIPELINE triggered by #{filename}")
+            handle_on_message(@input, @output, filename)
+          end
+          @listeners << listener
+
+          listener.run(true)
+
+        else
+          DataCollector::Core.log("PIPELINE running once")
+          handle_on_message(@input, @output)
+        end
+      end
+    rescue StandardError => e
+      DataCollector::Core.error("PIPELINE run failed: #{e.message}")
+      raise e
+      #puts e.backtrace.join("\n")
+    end
+
+    def stop
+      @running = false
+      @paused = false
+      @listeners.each do |listener|
+        listener.stop if listener.running?
+      end
+    end
+
+    def pause
+      if @running
+        @paused = !@paused
+        @listeners.each do |listener|
+          listener.pause if listener.running?
+        end
+      end
+    end
+
+    def running?
+      @running
+    end
+
+    def stopped?
+      !@running
+    end
+
+    def paused?
+      @paused
+    end
+
+    private
+
+    def handle_on_message(input, output, filename = nil)
+      if (callback = @on_message_callback)
+        timing = Time.now
+        begin
+          callback.call(input, output, filename)
+        rescue StandardError => e
+          DataCollector::Core.error("PIPELINE #{e.message}")
+        ensure
+          DataCollector::Core.log("PIPELINE ran for #{((Time.now.to_f - timing.to_f).to_f * 1000.0).to_i}ms")
+        end
+      end
+    end
+
+  end
+end
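`Pipeline#run` above turns a schedule such as `'PT10M'` into a sleep interval through the `iso8601` gem (`ISO8601::Duration` plus `ISO8601::TimeInterval#size`). A stdlib-only approximation for the simple day/time designators, skipping months and years, which need calendar context (the `duration_to_seconds` name is ours, not the gem's):

```ruby
# Rough stand-in for ISO8601::Duration -> seconds, handling PnD and T nH nM nS.
def duration_to_seconds(schedule)
  m = /\AP(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?\z/.match(schedule)
  raise ArgumentError, "bad schedule: #{schedule}" unless m && m[1..4].any?
  days, hours, mins, secs = m[1..4].map(&:to_i)
  ((days * 24 + hours) * 60 + mins) * 60 + secs
end

puts duration_to_seconds('PT10M')  # prints 600
puts duration_to_seconds('PT5S')   # prints 5
```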
data/lib/data_collector/rules.rb
CHANGED
@@ -1,130 +1,9 @@
-
+require_relative 'rules_ng'
 
 module DataCollector
-  class Rules
-    def initialize()
-
+  class Rules < RulesNg
+    def initialize(logger = Logger.new(STDOUT))
+      super
     end
-
-    def run(rule_map, from_record, to_record, options = {})
-      rule_map.each do |map_to_key, rule|
-        if rule.is_a?(Array)
-          rule.each do |sub_rule|
-            apply_rule(map_to_key, sub_rule, from_record, to_record, options)
-          end
-        else
-          apply_rule(map_to_key, rule, from_record, to_record, options)
-        end
-      end
-
-      to_record.each do |element|
-        element = element.delete_if do |k, v|
-          v != false && (v.nil?)
-        end
-      end
-    end
-
-    private
-
-    def apply_rule(map_to_key, rule, from_record, to_record, options = {})
-      if rule.has_key?('text')
-        suffix = (rule && rule.key?('options') && rule['options'].key?('suffix')) ? rule['options']['suffix'] : ''
-        to_record << { map_to_key.to_sym => add_suffix(rule['text'], suffix) }
-      elsif rule.has_key?('options') && rule['options'].has_key?('convert') && rule['options']['convert'].eql?('each')
-        result = get_value_for(map_to_key, rule['filter'], from_record, rule['options'], options)
-
-        if result.is_a?(Array)
-          result.each do |m|
-            to_record << {map_to_key.to_sym => m}
-          end
-        else
-          to_record << {map_to_key.to_sym => result}
-        end
-      else
-        result = get_value_for(map_to_key, rule['filter'], from_record, rule['options'], options)
-        return if result && result.empty?
-
-        to_record << {map_to_key.to_sym => result}
-      end
-    end
-
-    def get_value_for(tag_key, filter_path, record, rule_options = {}, options = {})
-      data = nil
-      if record
-        if filter_path.is_a?(Array) && !record.is_a?(Array)
-          record = [record]
-        end
-
-        data = Core::filter(record, filter_path)
-
-        if data && rule_options
-          if rule_options.key?('convert')
-            case rule_options['convert']
-            when 'time'
-              result = []
-              data = [data] unless data.is_a?(Array)
-              data.each do |d|
-                result << Time.parse(d)
-              end
-              data = result
-            when 'map'
-              if data.is_a?(Array)
-                data = data.map do |r|
-                  rule_options['map'][r] if rule_options['map'].key?(r)
-                end
-
-                data.compact!
-                data.flatten! if rule_options.key?('flatten') && rule_options['flatten']
-              else
-                return rule_options['map'][data] if rule_options['map'].key?(data)
-              end
-            when 'each'
-              data = [data] unless data.is_a?(Array)
-              if options.empty?
-                data = data.map { |d| rule_options['lambda'].call(d) }
-              else
-                data = data.map { |d| rule_options['lambda'].call(d, options) }
-              end
-              data.flatten! if rule_options.key?('flatten') && rule_options['flatten']
-            when 'call'
-              if options.empty?
-                data = rule_options['lambda'].call(data)
-              else
-                data = rule_options['lambda'].call(data, options)
-              end
-              return data
-            end
-          end
-
-          if rule_options.key?('suffix')
-            data = add_suffix(data, rule_options['suffix'])
-          end
-
-        end
-
-      end
-
-      return data
-    end
-
-    def add_suffix(data, suffix)
-      case data.class.name
-      when 'Array'
-        result = []
-        data.each do |d|
-          result << add_suffix(d, suffix)
-        end
-        data = result
-      when 'Hash'
-        data.each do |k, v|
-          data[k] = add_suffix(v, suffix)
-        end
-      else
-        data = data.to_s
-        data += suffix
-      end
-      data
-    end
-
-
   end
-end
+end
@@ -0,0 +1,130 @@
|
|
1
|
+
require 'logger'
|
2
|
+
|
3
|
+
module DataCollector
|
4
|
+
class Rules
|
5
|
+
def initialize()
|
6
|
+
@logger = Logger.new(STDOUT)
|
7
|
+
end
|
8
|
+
|
9
|
+
def run(rule_map, from_record, to_record, options = {})
|
10
|
+
rule_map.each do |map_to_key, rule|
|
11
|
+
if rule.is_a?(Array)
|
12
|
+
rule.each do |sub_rule|
|
13
|
+
apply_rule(map_to_key, sub_rule, from_record, to_record, options)
|
14
|
+
end
|
15
|
+
else
|
16
|
+
apply_rule(map_to_key, rule, from_record, to_record, options)
|
17
|
+
end
|
18
|
+
end
|
19
|
+
|
20
|
+
to_record.each do |element|
|
21
|
+
element = element.delete_if do |k, v|
|
22
|
+
v != false && (v.nil?)
|
23
|
+
end
|
24
|
+
end
|
25
|
+
end
|
26
|
+
|
27
|
+
private
|
28
|
+
|
29
|
+
def apply_rule(map_to_key, rule, from_record, to_record, options = {})
|
30
|
+
if rule.has_key?('text')
|
31
|
+
suffix = (rule && rule.key?('options') && rule['options'].key?('suffix')) ? rule['options']['suffix'] : ''
|
32
|
+
to_record << { map_to_key.to_sym => add_suffix(rule['text'], suffix) }
|
33
|
+
+      elsif rule.has_key?('options') && rule['options'].has_key?('convert') && rule['options']['convert'].eql?('each')
+        result = get_value_for(map_to_key, rule['filter'], from_record, rule['options'], options)
+
+        if result.is_a?(Array)
+          result.each do |m|
+            to_record << {map_to_key.to_sym => m}
+          end
+        else
+          to_record << {map_to_key.to_sym => result}
+        end
+      else
+        result = get_value_for(map_to_key, rule['filter'], from_record, rule['options'], options)
+        return if result && result.empty?
+
+        to_record << {map_to_key.to_sym => result}
+      end
+    end
+
+    def get_value_for(tag_key, filter_path, record, rule_options = {}, options = {})
+      data = nil
+      if record
+        if filter_path.is_a?(Array) && !record.is_a?(Array)
+          record = [record]
+        end
+
+        data = Core::filter(record, filter_path)
+
+        if data && rule_options
+          if rule_options.key?('convert')
+            case rule_options['convert']
+            when 'time'
+              result = []
+              data = [data] unless data.is_a?(Array)
+              data.each do |d|
+                result << Time.parse(d)
+              end
+              data = result
+            when 'map'
+              if data.is_a?(Array)
+                data = data.map do |r|
+                  rule_options['map'][r] if rule_options['map'].key?(r)
+                end
+
+                data.compact!
+                data.flatten! if rule_options.key?('flatten') && rule_options['flatten']
+              else
+                return rule_options['map'][data] if rule_options['map'].key?(data)
+              end
+            when 'each'
+              data = [data] unless data.is_a?(Array)
+              if options.empty?
+                data = data.map { |d| rule_options['lambda'].call(d) }
+              else
+                data = data.map { |d| rule_options['lambda'].call(d, options) }
+              end
+              data.flatten! if rule_options.key?('flatten') && rule_options['flatten']
+            when 'call'
+              if options.empty?
+                data = rule_options['lambda'].call(data)
+              else
+                data = rule_options['lambda'].call(data, options)
+              end
+              return data
+            end
+          end
+
+          if rule_options.key?('suffix')
+            data = add_suffix(data, rule_options['suffix'])
+          end
+
+        end
+
+      end
+
+      return data
+    end
+
+    def add_suffix(data, suffix)
+      case data.class.name
+      when 'Array'
+        result = []
+        data.each do |d|
+          result << add_suffix(d, suffix)
+        end
+        data = result
+      when 'Hash'
+        data.each do |k, v|
+          data[k] = add_suffix(v, suffix)
+        end
+      else
+        data = data.to_s
+        data += suffix
+      end
+      data
+    end
+
+  end
+end
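The recursive `add_suffix` helper above can be exercised on its own. Below is a minimal standalone sketch (the method body mirrors the diff; the sample data and the top-level harness are illustrative, not part of the gem):

```ruby
# Standalone sketch of the add_suffix helper: it walks Arrays and Hashes
# recursively and appends the suffix to every scalar value, converting
# the scalar to a String first.
def add_suffix(data, suffix)
  case data.class.name
  when 'Array'
    data.map { |d| add_suffix(d, suffix) }
  when 'Hash'
    data.each { |k, v| data[k] = add_suffix(v, suffix) }
  else
    data.to_s + suffix
  end
end

p add_suffix({ 'title' => 'ETL', 'sizes' => [10, 20] }, ' cm')
```

Note that scalars are coerced with `to_s`, so numeric values come back as strings once a suffix rule applies.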
data/lib/data_collector/rules_ng.rb CHANGED

@@ -51,30 +51,51 @@ module DataCollector
 
         data = apply_filtered_data_on_payload(data, rule_payload, options)
 
-        output_data << {tag.to_sym => data} unless data.nil? || (data.is_a?(Array) && data.empty?)
+        output_data << { tag.to_sym => data } unless data.nil? || (data.is_a?(Array) && data.empty?)
       rescue StandardError => e
-        puts "error running rule '#{tag}'\n\t#{e.message}"
-        puts e.backtrace.join("\n")
+        # puts "error running rule '#{tag}'\n\t#{e.message}"
+        # puts e.backtrace.join("\n")
+        raise DataCollector::Error, "error running rule '#{tag}'\n\t#{e.message}"
       end
 
       def apply_filtered_data_on_payload(input_data, payload, options = {})
         return nil if input_data.nil?
 
+        normalized_options = options.select { |k, v| k !~ /^_/ }.with_indifferent_access
         output_data = nil
         case payload.class.name
         when 'Proc'
           data = input_data.is_a?(Array) ? input_data : [input_data]
-          output_data = if
-            data.map { |d|
+          output_data = if normalized_options.empty?
+                          # data.map { |d| payload.curry.call(d).call(d) }
+                          data.map { |d|
+                            loop do
+                              payload_result = payload.curry.call(d)
+                              break payload_result unless payload_result.is_a?(Proc)
+                            end
+                          }
           else
-            data.map { |d|
+                          data.map { |d|
+                            loop do
+                              payload_result = payload.curry.call(d, normalized_options)
+                              break payload_result unless payload_result.is_a?(Proc)
+                            end
+                          }
           end
         when 'Hash'
           input_data = [input_data] unless input_data.is_a?(Array)
           if input_data.is_a?(Array)
             output_data = input_data.map do |m|
               if payload.key?('suffix')
-
+                if (m.is_a?(Hash))
+                  m.transform_values { |v| v.is_a?(String) ? "#{v}#{payload['suffix']}" : v }
+                elsif m.is_a?(Array)
+                  m.map { |n| n.is_a?(String) ? "#{n}#{payload['suffix']}" : n }
+                elsif m.methods.include?(:to_s)
+                  "#{m}#{payload['suffix']}"
+                else
+                  m
+                end
               else
                 payload[m]
               end

@@ -83,7 +104,7 @@ module DataCollector
         when 'Array'
           output_data = input_data
           payload.each do |p|
-            output_data = apply_filtered_data_on_payload(output_data, p,
+            output_data = apply_filtered_data_on_payload(output_data, p, normalized_options)
           end
         else
           output_data = [input_data]

@@ -92,17 +113,21 @@ module DataCollector
         output_data.compact! if output_data.is_a?(Array)
         output_data.flatten! if output_data.is_a?(Array)
         if output_data.is_a?(Array) &&
-
-
+           output_data.size == 1 &&
+           (output_data.first.is_a?(Array) || output_data.first.is_a?(Hash))
           output_data = output_data.first
         end
 
-        if options.key?('_no_array_with_one_element') && options['_no_array_with_one_element'] &&
+        if options.with_indifferent_access.key?('_no_array_with_one_element') && options.with_indifferent_access['_no_array_with_one_element'] &&
           output_data.is_a?(Array) && output_data.size == 1
           output_data = output_data.first
         end
 
         output_data
+      rescue StandardError => e
+        # puts "error applying filtered data on payload'#{payload.to_json}'\n\t#{e.message}"
+        # puts e.backtrace.join("\n")
+        raise DataCollector::Error, "error applying filtered data on payload'#{payload.to_json}'\n\t#{e.message}"
       end
 
       def json_path_filter(filter, input_data)

@@ -111,6 +136,10 @@ module DataCollector
         return input_data if input_data.is_a?(String)
 
         Core.filter(input_data, filter)
+      rescue StandardError => e
+        puts "error running filter '#{filter}'\n\t#{e.message}"
+        puts e.backtrace.join("\n")
+        raise DataCollector::Error, "error running filter '#{filter}'\n\t#{e.message}"
       end
     end
   end
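The `is_a?(Proc)` check added around `payload.curry.call` is what lets a rule lambda declare more parameters than the caller supplies: currying a lambda and passing fewer arguments than its arity returns another `Proc` instead of a value, so the result type tells the caller whether more input is still expected. A minimal sketch of that mechanism (the `rule` lambda and its arguments are illustrative, not part of the gem):

```ruby
# A rule lambda with arity 2, as a rule with options would be written.
rule = ->(record, options) { "#{record}-#{options[:lang]}" }

# Supplying only the record to the curried lambda yields a Proc,
# signalling that the lambda still wants the options argument.
partial = rule.curry.call('abc')

# Supplying the remaining argument produces the final value.
value = partial.call({ lang: 'en' })

p partial.is_a?(Proc)
p value
```

This is why the new code loops until the call stops returning a `Proc` before treating the result as data.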
data/lib/data_collector.rb CHANGED

metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: data_collector
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.19.0
 platform: ruby
 authors:
 - Mehmet Celik
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2023-
+date: 2023-05-08 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activesupport

@@ -114,14 +114,14 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.
+        version: '1.14'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '1.
+        version: '1.14'
 - !ruby/object:Gem::Dependency
   name: nori
   requirement: !ruby/object:Gem::Requirement

@@ -136,6 +136,48 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '2.6'
+- !ruby/object:Gem::Dependency
+  name: iso8601
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.13'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.13'
+- !ruby/object:Gem::Dependency
+  name: listen
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.8'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.8'
+- !ruby/object:Gem::Dependency
+  name: bunny
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.20'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.20'
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement

@@ -156,14 +198,14 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '5.
+        version: '5.18'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '5.
+        version: '5.18'
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement

@@ -208,13 +250,19 @@ files:
 - bin/console
 - bin/setup
 - data_collector.gemspec
+- examples/marc.rb
 - lib/data_collector.rb
 - lib/data_collector/config_file.rb
 - lib/data_collector/core.rb
 - lib/data_collector/ext/xml_utility_node.rb
 - lib/data_collector/input.rb
+- lib/data_collector/input/dir.rb
+- lib/data_collector/input/generic.rb
+- lib/data_collector/input/queue.rb
 - lib/data_collector/output.rb
+- lib/data_collector/pipeline.rb
 - lib/data_collector/rules.rb
+- lib/data_collector/rules.rb.depricated
 - lib/data_collector/rules_ng.rb
 - lib/data_collector/runner.rb
 - lib/data_collector/version.rb

@@ -240,7 +288,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
   version: '0'
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.4.10
 signing_key:
 specification_version: 4
 summary: ETL helper library