oai_schedules 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 1fe0345feb8fe60475f4668b9dfabadfe525b9251d60778cabcbb27f7c20d0b1
-   data.tar.gz: 4407a7a41d10b7a8dc30e8261a99eb74ea9cfdd5bc35e8592cbf27f76c114def
+   metadata.gz: 6b67e6f8703a12140e9f0ddb981a30a5311e80d27e5146bd9848601d27272db9
+   data.tar.gz: '09864239d4e38142c63a503a1535296f0a504bd4e9c0c84bfb87b024e640d104'
  SHA512:
-   metadata.gz: 29278455d3eceff2ce5b53bbb65f4074dd87b16a4389258fb77e4567b0d20343214d864ab166fcc8a7c2978ff2f8a20659946ee5fa990a15b1e1422f3d7bfcb1
-   data.tar.gz: 6d39fee175e2e2a8b872529ca79fac3b0c1b22f53f8502011e514bf1d769c99bacd37c066fb5682505a88c7d72284275876e52c66a5b2afa163674c4c0a0a42e
+   metadata.gz: cbe2109c9a0b04a3e743ef6c577a38320bc84d411bff334b9b5e5e6bfc536aa9946c536b4a4dd3f56f320977b734de16e183247ac64edbb8d5d0051c7690ae91
+   data.tar.gz: e747187b4375075a4e7f06831c67b6f59e7e2062578255732330884ea60bd4607b360054b2db504fcee7ec853d09b89ed8adf7f34d7f2f0a3d8ba49822a24b88
data/CHANGELOG.md CHANGED
@@ -1,5 +1,7 @@
- ## [Unreleased]
+ ## [0.2.0] - 2025-03-26
 
- ## [0.1.0] - 2025-03-20
+ - Added custom records digestion function
+
+ ## [0.1.0] - 2025-03-25
 
  - Initial release
data/README.md CHANGED
@@ -1,28 +1,127 @@
- # OaiSchedules
+ # Description
 
- TODO: Delete this and the text below, and describe your gem
-
- Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/oai_schedules`. To experiment with that code, run `bin/console` for an interactive prompt.
+ This gem schedules concurrent harvesting of records from OAI-PMH repositories, with custom, user-provided
+ record digestion logic.
 
  ## Installation
 
- TODO: Replace `UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG` with your gem name right after releasing it to RubyGems.org. Please do not do it earlier due to security reasons. Alternatively, replace this section with instructions to install your gem from git if you don't plan to release to RubyGems.org.
-
  Install the gem and add to the application's Gemfile by executing:
 
  ```bash
- bundle add UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG
+ bundle add oai_schedules
  ```
 
  If bundler is not being used to manage dependencies, install the gem by executing:
 
  ```bash
- gem install UPDATE_WITH_YOUR_GEM_NAME_IMMEDIATELY_AFTER_RELEASE_TO_RUBYGEMS_ORG
+ gem install oai_schedules
  ```
 
  ## Usage
 
- TODO: Write usage instructions here
+ ```ruby
+ require 'oai_schedules/manager'
+
+ f_show = lambda do |records, done|
+   # ... do your stuff with records ...
+   if done
+     puts "done full harvesting"
+   end
+ end
+
+ manager = OAISchedules::Manager.new(
+   path_dir_state: "./dir_state",
+   f_digest: f_show
+ )
+ content_schedule = {
+   "interval" => "PT2S",
+   "active" => true,
+   "repository" => {
+     "uri" => "https://eudml.org/oai/OAIHandler"
+   },
+   "format" => "oai_dc",
+   "set" => "CEDRAM"
+ }
+ manager.add_schedule("my_sample_schedule", content_schedule)
+ sleep
+ ```
+
+ The demo app above instantiates a schedules manager.
+ It saves each schedule's internal *state* file (e.g. the OAI resumption token) in the folder `./dir_state`.
+ Each schedule state file is a JSON file named *state_\<name-schedule\>.json*.
+ If the schedules manager crashes, the state file is used to resume harvesting from the last saved point.
+ A schedule is then added to the manager.
+ The schedule named `my_sample_schedule` fetches partial records from the set `CEDRAM` in `oai_dc` (Dublin Core) format,
+ from the repository whose URI is `https://eudml.org/oai/OAIHandler`, every 2 seconds.
+ At every iteration it fetches the next batch of *partial* records, using the previous OAI *resumption token*
+ (for iterations after the very first one), and writes the new token to the state file,
+ until no token is returned (end of the harvesting).
+ The code does this by querying `https://eudml.org/oai/OAIHandler?verb=ListRecords&...` and adding the necessary
+ query parameters.
+ The custom function provided as `f_digest` is then called at each iteration.
+ It receives the partial list of `records` as a hash, and a `done` flag (`true` once the full harvesting is complete).
+ A schedule starts executing as soon as it is added.
+ It is possible to add all schedules in advance, then call `sleep` to keep the process running indefinitely.
+ Or it is also possible to do the following:
+
+ ```ruby
+ # add schedule
+ manager.add_schedule("my_sample_schedule", content_schedule)
+
+ # ... do your own things here ...
+
+ # modify schedule
+ content_schedule["active"] = false # e.g. pause schedule
+ manager.modify_schedule("my_sample_schedule", content_schedule)
+
+ # ... do other things ...
+
+ # modify schedule
+ content_schedule["active"] = true # e.g. resume schedule ...
+ content_schedule["interval"] = "PT5S" # ... but slower
+ manager.modify_schedule("my_sample_schedule", content_schedule)
+
+ # finally remove schedule
+ manager.remove_schedule("my_sample_schedule")
+ ```
+
+ It is also possible to listen to a folder containing *schedule* JSON files.
+ These files must be named *schedule_\<name-schedule\>.json*.
+ The manager extracts the schedule name from the file name.
+ See the following:
+
+ ```ruby
+ manager = OAISchedules::Manager.new(
+   # ...
+   path_dir_schedules: "./dir_schedules",
+   # ...
+ )
+ # # alternative:
+ # manager = OAISchedules::Manager.new()
+ # manager.set_listener_dir_schedules("./dir_schedules")
+ manager.run_listener_dir_schedules(block: true)
+ ```
+
+ The above app listens for file additions, modifications and deletions in the folder `./dir_schedules`.
+ The listener is started in blocking mode.
+
+ You can add as many schedules as you want (with different names); they will all run concurrently.
+
+ A *schedule definition*, either provided programmatically or from a file, must have this structure:
+
+ ```json5
+ {
+   "interval": "PT2S", // (required) schedule interval as an ISO 8601 duration
+   "active": true, // (required) is active (resumed) or not (paused)
+   "repository": {
+     "uri": "https://eudml.org/oai/OAIHandler" // (required) OAI-PMH repository url
+   },
+   "format": "oai_dc", // (required) metadata prefix to use
+   "set": "CEDRAM", // (optional) set to collect. NOTE: on some repositories, this is necessary
+   "from": "2025-03-23T00:00:00Z", // (optional) start datetime to collect from
+   "until": "2025-03-23T00:00:00Z", // (optional) end datetime to collect until
+ }
+ ```
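The required and optional fields of the schedule definition above can be checked before a definition is handed to the manager. The following is a minimal sketch and not part of the gem: the helper name `validate_schedule` is hypothetical, and the duration pattern is a deliberate simplification that accepts only time-based ISO 8601 durations such as `PT2S` or `PT1H30M`. Whether the gem performs any comparable validation is not visible in this diff.

```ruby
# Hypothetical helper, not part of oai_schedules: returns a list of problems
# found in a schedule definition hash (an empty list means nothing was flagged).
def validate_schedule(content)
  required_keys = %w[interval active repository format]
  # Simplified pattern: only time-based durations such as "PT2S" or "PT1H30M".
  # The lookahead rejects a bare "PT" with no component.
  duration_pattern = /\APT(?=\d)(\d+H)?(\d+M)?(\d+S)?\z/

  missing = required_keys.reject { |key| content.key?(key) }
  errors = missing.map { |key| "missing required key: #{key}" }

  interval = content["interval"]
  if interval && !duration_pattern.match?(interval.to_s)
    errors << "interval is not a supported ISO 8601 duration: #{interval}"
  end

  repository = content["repository"]
  if repository && !(repository.is_a?(Hash) && repository.key?("uri"))
    errors << "repository must contain a uri"
  end

  errors
end

content_schedule = {
  "interval" => "PT2S",
  "active" => true,
  "repository" => { "uri" => "https://eudml.org/oai/OAIHandler" },
  "format" => "oai_dc"
}
puts validate_schedule(content_schedule).inspect
puts validate_schedule({ "interval" => "2s" }).inspect
```

Using `key?` rather than a truthiness check matters for `active`, since `false` (a paused schedule) is a legitimate value.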
 
  ## Development
 
@@ -4,8 +4,22 @@ require 'oai_schedules/manager'
 
  # usage with folder listener
 
- manager = OAISchedules::Manager.new(path_dir_schedules: "./dir_schedules", path_dir_state: "./dir_state")
+ f_show = lambda do |records, done|
+   # ... do your stuff with records ...
+   if done
+     puts "done full harvesting"
+   end
+ end
+
+ manager = OAISchedules::Manager.new(
+   path_dir_schedules: "./dir_schedules",
+   path_dir_state: "./dir_state",
+   f_digest: f_show
+ )
  # # alternative:
- # manager = OAISchedules::Manager.new()
+ # manager = OAISchedules::Manager.new(
+ #   path_dir_state: "./dir_state",
+ #   f_digest: f_show
+ # )
  # manager.set_listener_dir_schedules("./dir_schedules")
  manager.run_listener_dir_schedules(block: true)
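The README in this release states that schedule files in the listened folder are named `schedule_<name-schedule>.json` and that the manager derives the schedule name from the file name. How the gem does this internally is not shown in this diff. The sketch below is one possible way to apply that convention, using a hypothetical helper `schedule_name_from_path`.

```ruby
# Hypothetical helper, not taken from the gem's source: derives a schedule
# name from a path such as "./dir_schedules/schedule_my_sample_schedule.json".
# Returns nil when the file name does not follow the convention.
def schedule_name_from_path(path)
  match = File.basename(path).match(/\Aschedule_(.+)\.json\z/)
  match && match[1]
end

puts schedule_name_from_path("./dir_schedules/schedule_my_sample_schedule.json")
puts schedule_name_from_path("./dir_schedules/notes.txt").inspect
```

Returning `nil` for non-matching files lets a caller skip unrelated files dropped into the same folder.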
@@ -4,14 +4,25 @@ require 'oai_schedules/manager'
 
  # usage with programmatic schedules addition / modify / remove
 
- manager = OAISchedules::Manager.new(path_dir_state: "./dir_state")
+ f_show = lambda do |records, done|
+   # ... do your stuff with records ...
+   if done
+     puts "done full harvesting"
+   end
+ end
+
+ manager = OAISchedules::Manager.new(
+   path_dir_state: "./dir_state",
+   f_digest: f_show
+ )
  content_schedule = {
    "interval" => "PT2S",
    "active" => true,
    "repository" => {
      "uri" => "https://eudml.org/oai/OAIHandler"
    },
-   "format" => "oai_dc"
+   "format" => "oai_dc",
+   "set" => "CEDRAM"
  }
  manager.add_schedule("my_sample_schedule", content_schedule)
  sleep
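Because `f_digest` is an ordinary callable, it can close over local state, for example to accumulate batches across iterations and summarise them once `done` is true. The sketch below calls the lambda by hand to stand in for the manager. The hashes passed in are made up: only the `"resumptionToken"` key appears in this diff, and the full shape of `records` is not shown. The `Mutex` is a cautious assumption, on the grounds that schedules run concurrently and a shared callback might be invoked from more than one thread.

```ruby
# Collects every batch passed to the callback and builds a summary at the end.
batches = []
summary = nil
lock = Mutex.new # assumption: the callback may be called from several threads

f_collect = lambda do |records, done|
  lock.synchronize do
    batches << records
    summary = "harvested #{batches.size} batch(es)" if done
  end
end

# Stand-in for the manager: one partial batch, then a final one without a token.
f_collect.call({ "resumptionToken" => "token-1" }, false)
f_collect.call({ "resumptionToken" => nil }, true)
puts summary
```

Such a lambda would be passed as `f_digest: f_collect` in place of `f_show` in the example above.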
@@ -119,7 +119,7 @@ module OAISchedules
 
  attr_accessor :path_dir_state
 
- def initialize(path_dir_schedules: nil, path_dir_state: nil)
+ def initialize(path_dir_schedules: nil, path_dir_state: nil, f_digest: nil)
    @logger = Logger.new(STDOUT)
    @path_dir_state = path_dir_state
    @schedules = {}
@@ -128,6 +128,7 @@ module OAISchedules
    unless path_dir_schedules.nil?
      set_listener_dir_schedules(path_dir_schedules)
    end
+   @f_digest = f_digest
  end
 
 
@@ -354,9 +355,12 @@ module OAISchedules
    write_state_file(path_file_state, state)
    if !data["resumptionToken"].nil?
      state_machine.add_event(EventHarvesting::DONE_HARVEST)
+     done = false
    else
      state_machine.add_event(EventHarvesting::DONE_FULL_HARVEST)
+     done = true
    end
+   @f_digest&.call(data, done)
    break
  when StateHarvesting::COMPLETE
    @logger.warn("#{name}: full harvesting complete")
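The hunk above derives `done` from the absence of a resumption token and calls the callback through the safe-navigation operator, so a manager created without `f_digest` (the default is `nil`) simply skips it. A standalone sketch of that pattern follows. The `notify` method is a hypothetical stand-in for the surrounding state-machine code, which is not reproduced here.

```ruby
# Hypothetical stand-in for the dispatch step shown in the hunk above.
def notify(f_digest, data)
  # done is true when the response carries no resumption token.
  done = data["resumptionToken"].nil?
  # &. makes this a no-op (returning nil) when no callback was configured.
  f_digest&.call(data, done)
end

seen = []
callback = ->(_data, done) { seen << done }

notify(callback, { "resumptionToken" => "token-1" }) # more batches to come
notify(callback, {})                                  # no token: harvest finished
notify(nil, { "resumptionToken" => "token-1" })       # no callback: nothing happens
puts seen.inspect
```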
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module OaiSchedules
-   VERSION = "0.1.0"
+   VERSION = "0.2.0"
  end
metadata CHANGED
@@ -1,13 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: oai_schedules
  version: !ruby/object:Gem::Version
-   version: 0.1.0
+   version: 0.2.0
  platform: ruby
  authors:
  - Davide Monari
  bindir: exe
  cert_chain: []
- date: 2025-03-25 00:00:00.000000000 Z
+ date: 2025-03-26 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: concurrent-ruby
@@ -65,7 +65,7 @@ dependencies:
      - - "~>"
        - !ruby/object:Gem::Version
          version: 3.9.0
- description: gem to run concurrent OAI-PHM harvesting schedules
+ description: gem to run concurrent OAI-PMH harvesting schedules
  email:
  - davide.monari@kuleuven.be
  executables: []
@@ -110,5 +110,5 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  requirements: []
  rubygems_version: 3.6.2
  specification_version: 4
- summary: gem to run concurrent OAI-PHM harvesting schedules
+ summary: gem to run concurrent OAI-PMH harvesting schedules
  test_files: []