oai_schedules 0.1.0 → 0.2.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -2
- data/README.md +108 -9
- data/examples/example_01.rb +16 -2
- data/examples/example_02.rb +13 -2
- data/lib/oai_schedules/manager.rb +5 -1
- data/lib/oai_schedules/version.rb +1 -1
- metadata +4 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6b67e6f8703a12140e9f0ddb981a30a5311e80d27e5146bd9848601d27272db9
+  data.tar.gz: '09864239d4e38142c63a503a1535296f0a504bd4e9c0c84bfb87b024e640d104'
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: cbe2109c9a0b04a3e743ef6c577a38320bc84d411bff334b9b5e5e6bfc536aa9946c536b4a4dd3f56f320977b734de16e183247ac64edbb8d5d0051c7690ae91
+  data.tar.gz: e747187b4375075a4e7f06831c67b6f59e7e2062578255732330884ea60bd4607b360054b2db504fcee7ec853d09b89ed8adf7f34d7f2f0a3d8ba49822a24b88
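Note on checksums.yaml: the file records SHA256 and SHA512 digests for the two archives inside the gem. As a minimal sketch of how such a digest could be checked locally, the snippet below uses Ruby's standard `digest` library. The helper name `sha256_matches?` is hypothetical and is not part of the gem or of RubyGems.

```ruby
require "digest"

# Hypothetical helper (an assumption for illustration): returns true when the
# SHA256 hex digest of the given bytes equals the expected hex string.
def sha256_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end
```

For an unpacked gem, the bytes would come from something like `File.binread("data.tar.gz")`, and the expected value from the `SHA256` entry above.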
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -1,28 +1,127 @@
-#
+# Description
 
-
-
-Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/oai_schedules`. To experiment with that code, run `bin/console` for an interactive prompt.
+This gem runs concurrent harvesting schedules for records from OAI-PMH repositories, with custom-provided
+record digestion logic.
 
 ## Installation
 
-TODO: Replace `UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG` with your gem name right after releasing it to RubyGems.org. Please do not do it earlier due to security reasons. Alternatively, replace this section with instructions to install your gem from git if you don't plan to release to RubyGems.org.
-
 Install the gem and add to the application's Gemfile by executing:
 
 ```bash
-bundle add
+bundle add oai_schedules
 ```
 
 If bundler is not being used to manage dependencies, install the gem by executing:
 
 ```bash
-gem install
+gem install oai_schedules
 ```
 
 ## Usage
 
-
+```ruby
+require 'oai_schedules/manager'
+
+f_show = lambda do |records, done|
+  # ... do your stuff with records ...
+  if done
+    puts "done full harvesting"
+  end
+end
+
+manager = OAISchedules::Manager.new(
+  path_dir_state: "./dir_state",
+  f_digest: f_show
+)
+content_schedule = {
+  "interval" => "PT2S",
+  "active" => true,
+  "repository" => {
+    "uri" => "https://eudml.org/oai/OAIHandler"
+  },
+  "format" => "oai_dc",
+  "set" => "CEDRAM"
+}
+manager.add_schedule("my_sample_schedule", content_schedule)
+sleep
+```
+
+The demo app above instantiates a schedules manager.
+It saves each schedule's internal *state* file (e.g. the OAI resumption token) in the folder `./dir_state`.
+Each schedule state file is a JSON file named *state_\<name-schedule\>.json*.
+If the schedules manager crashes, the state file is used to resume harvesting from the last saved point.
+A schedule is then added to the manager.
+The schedule named `my_sample_schedule` will get partial records from set `CEDRAM` in `oai_dc` (Dublin Core) format,
+from the repository whose URI is `https://eudml.org/oai/OAIHandler`, every 2 seconds.
+At every iteration, every 2 seconds, it will get the *partial* records by using the previous OAI *resumption token*
+(for iterations after the very first one),
+and it will write the new one to the state file, until no token is provided (end of the harvesting).
+The code does this by querying `https://eudml.org/oai/OAIHandler?verb=ListRecords&...` and adding the necessary
+query parameters.
+The custom function provided as `f_digest` is then called at each iteration.
+It receives the partial list of `records` as a hash, and a `done` flag (full harvesting complete).
+As soon as the schedule is added, it is executed.
+It is possible to add all schedules in advance, then call `sleep` for an infinite event loop.
+Or it is also possible to do the following:
+
+```ruby
+# add schedule
+manager.add_schedule("my_sample_schedule", content_schedule)
+
+# ... do your own things here ...
+
+# modify schedule
+content_schedule["active"] = false  # e.g. pause schedule
+manager.modify_schedule("my_sample_schedule", content_schedule)
+
+# .... do other things ...
+
+# modify schedule
+content_schedule["active"] = true  # e.g. resume schedule ...
+content_schedule["interval"] = "PT5S"  # ... but slower
+manager.modify_schedule("my_sample_schedule", content_schedule)
+
+# finally remove schedule
+manager.remove_schedule("my_sample_schedule")
+```
+
+It is also possible to listen to a folder containing *schedule* JSON files.
+These files must have this format: *schedule_\<name-schedule\>.json*.
+The manager will extract the schedule name from the file name.
+See the following:
+
+```ruby
+manager = OAISchedules::Manager.new(
+  # ...
+  path_dir_schedules: "./dir_schedules",
+  # ...
+)
+# # alternative:
+# manager = OAISchedules::Manager.new()
+# manager.set_listener_dir_schedules("./dir_schedules")
+manager.run_listener_dir_schedules(block: true)
+```
+
+The above app will listen for file additions, modifications and deletions in the folder `./dir_schedules`.
+The listener is started in blocking mode.
+
+You can add as many schedules as you want (with different names); they will all run concurrently.
+
+A *schedule definition*, either provided programmatically or from file, must have this structure:
+
+```json5
+{
+  "interval": "PT2S",  // (required) schedule interval in ISO8601 format.
+  "active": true,  // (required) is active (resumed) or not (paused)
+  "repository": {
+    "uri": "https://eudml.org/oai/OAIHandler"  // (required) OAI-PMH repository url
+  },
+  "format": "oai_dc",  // (required) metadata prefix to use
+  "set": "CEDRAM",  // (optional) set to collect. NOTE: on some repositories, this is necessary
+  "from": "2025-03-23T00:00:00Z",  // (optional) start datetime to collect from
+  "until": "2025-03-23T00:00:00Z",  // (optional) end datetime to collect until
+}
+```
 
 ## Development
 
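Note on the schedule definition in README.md: the README marks `interval`, `active`, `repository.uri` and `format` as required, and gives the interval as an ISO 8601 duration such as `PT2S`. The sketch below shows one way a caller might pre-check a definition before passing it to `add_schedule`. Both helper names are hypothetical, the duration parser handles only the time part (`PTnHnMnS`), and none of this is the gem's own validation code, which does not appear in this diff.

```ruby
# Assumption: only the time part of an ISO 8601 duration (PTnHnMnS) is handled.
ISO8601_TIME_DURATION = /\APT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?\z/

# Hypothetical helper: converts e.g. "PT2S" or "PT1M30S" to a number of seconds.
def duration_seconds(text)
  match = ISO8601_TIME_DURATION.match(text)
  if match.nil? || match.captures.all?(&:nil?)
    raise ArgumentError, "unsupported duration: #{text.inspect}"
  end
  hours, minutes, seconds = match.captures
  hours.to_i * 3600 + minutes.to_i * 60 + seconds.to_f
end

# Top-level keys the README marks as required.
REQUIRED_KEYS = %w[interval active format].freeze

# Hypothetical helper: lists the required entries missing from a schedule hash.
def missing_schedule_keys(schedule)
  missing = REQUIRED_KEYS.reject { |key| schedule.key?(key) }
  missing << "repository.uri" unless schedule.dig("repository", "uri")
  missing
end
```

An empty array from `missing_schedule_keys` means every required entry is present; it says nothing about whether the repository accepts the given `format` or `set`.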
data/examples/example_01.rb
CHANGED
@@ -4,8 +4,22 @@ require 'oai_schedules/manager'
 
 # usage with folder listener
 
-
+f_show = lambda do |records, done|
+  # ... do your stuff with records ...
+  if done
+    puts "done full harvesting"
+  end
+end
+
+manager = OAISchedules::Manager.new(
+  path_dir_schedules: "./dir_schedules",
+  path_dir_state: "./dir_state",
+  f_digest: f_show
+)
 # # alternative:
-# manager = OAISchedules::Manager.new(
+# manager = OAISchedules::Manager.new(
+# path_dir_state: "./dir_state",
+# f_digest: f_show
+# )
 # manager.set_listener_dir_schedules("./dir_schedules")
 manager.run_listener_dir_schedules(block: true)
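Note on the folder listener: according to the README, schedule files are named `schedule_<name-schedule>.json`, state files are named `state_<name-schedule>.json`, and the manager derives the schedule name from the file name. The sketch below illustrates that naming convention only. Both helpers are hypothetical and are not taken from the gem's source.

```ruby
# Hypothetical helper: returns the schedule name encoded in a schedule file
# path, or nil when the file name does not follow schedule_<name>.json.
def schedule_name_from_path(path)
  match = /\Aschedule_(.+)\.json\z/.match(File.basename(path))
  match && match[1]
end

# Hypothetical helper: builds the state file path for a schedule name.
def state_file_path(path_dir_state, name)
  File.join(path_dir_state, "state_#{name}.json")
end
```

With this convention, a file `./dir_schedules/schedule_my_sample_schedule.json` maps to the schedule name `my_sample_schedule` and to the state file `./dir_state/state_my_sample_schedule.json`.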
data/examples/example_02.rb
CHANGED
@@ -4,14 +4,25 @@ require 'oai_schedules/manager'
 
 # usage with programmatic schedules addition / modify / remove
 
-
+f_show = lambda do |records, done|
+  # ... do your stuff with records ...
+  if done
+    puts "done full harvesting"
+  end
+end
+
+manager = OAISchedules::Manager.new(
+  path_dir_state: "./dir_state",
+  f_digest: f_show
+)
 content_schedule = {
   "interval" => "PT2S",
   "active" => true,
   "repository" => {
     "uri" => "https://eudml.org/oai/OAIHandler"
   },
-  "format" => "oai_dc"
+  "format" => "oai_dc",
+  "set" => "CEDRAM"
 }
 manager.add_schedule("my_sample_schedule", content_schedule)
 sleep
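Note on the request this schedule produces: the README says each iteration queries the repository with `verb=ListRecords` plus the necessary query parameters, reusing the resumption token after the first request. The sketch below shows how such a URL could be assembled from the schedule hash with Ruby's standard `uri` library. The function `list_records_url` is hypothetical, and the gem's actual request-building code is not part of this diff. In OAI-PMH, `resumptionToken` is an exclusive argument, so the sketch sends it alone with the verb.

```ruby
require "uri"

# Hypothetical helper: builds an OAI-PMH ListRecords URL from a schedule hash.
# When a resumption token is given, it is the only argument sent besides the verb.
def list_records_url(schedule, resumption_token = nil)
  params = { "verb" => "ListRecords" }
  if resumption_token
    params["resumptionToken"] = resumption_token
  else
    params["metadataPrefix"] = schedule["format"]
    # Optional selective-harvesting arguments, included only when present.
    %w[set from until].each do |key|
      params[key] = schedule[key] if schedule[key]
    end
  end
  uri = URI.parse(schedule.dig("repository", "uri"))
  uri.query = URI.encode_www_form(params)
  uri.to_s
end
```

For the `content_schedule` above, the first request would target `https://eudml.org/oai/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=CEDRAM`.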
data/lib/oai_schedules/manager.rb
CHANGED
@@ -119,7 +119,7 @@ module OAISchedules
 
     attr_accessor :path_dir_state
 
-    def initialize(path_dir_schedules: nil, path_dir_state: nil)
+    def initialize(path_dir_schedules: nil, path_dir_state: nil, f_digest: nil)
       @logger = Logger.new(STDOUT)
       @path_dir_state = path_dir_state
       @schedules = {}
@@ -128,6 +128,7 @@ module OAISchedules
       unless path_dir_schedules.nil?
         set_listener_dir_schedules(path_dir_schedules)
       end
+      @f_digest = f_digest
     end
 
 
@@ -354,9 +355,12 @@ module OAISchedules
             write_state_file(path_file_state, state)
             if !data["resumptionToken"].nil?
               state_machine.add_event(EventHarvesting::DONE_HARVEST)
+              done = false
             else
               state_machine.add_event(EventHarvesting::DONE_FULL_HARVEST)
+              done = true
             end
+            @f_digest&.call(data, done)
             break
           when StateHarvesting::COMPLETE
             @logger.warn("#{name}: full harvesting complete")
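Note on the `f_digest` hook: the hunks above store an optional callable and invoke it with `@f_digest&.call(data, done)`, where `done` is true only when the response carries no resumption token. The sketch below isolates that pattern in a small stand-in class so the behaviour can be tried without a repository. `DigestDemo` and `handle_page` are hypothetical names, not part of the gem.

```ruby
# Hypothetical stand-in for the harvesting step; not the gem's Manager class.
class DigestDemo
  def initialize(f_digest: nil)
    @f_digest = f_digest
  end

  # Mirrors the branch in the diff: a resumption token in the response
  # means more pages remain, so done is false.
  def handle_page(data)
    done = data["resumptionToken"].nil?
    # Safe navigation: nothing happens when no callback was supplied.
    @f_digest&.call(data, done)
    done
  end
end
```

Because of the safe-navigation operator, constructing the object without `f_digest` remains valid, which matches the `f_digest: nil` default in the new `initialize` signature.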
metadata
CHANGED
@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: oai_schedules
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.2.0
 platform: ruby
 authors:
 - Davide Monari
 bindir: exe
 cert_chain: []
-date: 2025-03-
+date: 2025-03-26 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: concurrent-ruby
@@ -65,7 +65,7 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 3.9.0
-description: gem to run concurrent OAI-
+description: gem to run concurrent OAI-PMH harvesting schedules
 email:
 - davide.monari@kuleuven.be
 executables: []
@@ -110,5 +110,5 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 requirements: []
 rubygems_version: 3.6.2
 specification_version: 4
-summary: gem to run concurrent OAI-
+summary: gem to run concurrent OAI-PMH harvesting schedules
 test_files: []