med_pipe 0.1.1 → 0.2.0
- checksums.yaml +4 -4
- data/README.md +36 -19
- data/app/models/med_pipe/pipeline_plan.rb +4 -6
- data/lib/med_pipe/version.rb +1 -1
- metadata +4 -10
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e033daa9e892bde3d031d927cf70e7ea1164edcef333eea69c0cdb71d17124d2
+  data.tar.gz: 1253ceea7ed2d9021c2e620a52addfe813fd80bda18f2a55920e18b9f7134623
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: bbf20fedd6d3d99d1789da72fd6e7555357882a4acf17f6f18e97d23aa739e4b028beecbf149a9ac28017cefb1a92079ec7bb3c9f78e39a4cb8311496a051feb
+  data.tar.gz: bcd4881d759ccb04b2c1f9bac09f758534098b5524ff266952d0ccd5642c0c96536e94a3a8e95c5246f724bc9f9a3ae65d42b69550828116eb3c2c3e8655e219
data/README.md
CHANGED
@@ -1,39 +1,56 @@
-# MedPipe
-
+# MedPipe
+
+
+A Rails engine that provides mechanisms for processing datasets ranging from 1 million to several billion records.
 
 ## Concept
+
+
+
 ### MedPipe::Pipeline
-apply
+Register PipelineTask through 'apply' method and execute them sequentially using 'run'.
 
 ### MedPipe::PipelineTask
-
-DB
-
-call
+This is the basic unit of processing registered in the pipeline.
+Tasks are divided into specific operations such as reading from DB or uploading to S3.
+When handling large datasets, Enumerable::Lazy can be used to process data in chunks.
+You need to implement the 'call' method:
 
-
+```ruby
 @param context [Hash] Stores data during pipeline execution
 @param prev_result [Object] The result of the previous task
 def call(context, prev_result)
-  yield
+  yield "data_to_pass_to_next_task"
 end
 ```
 
 ### MedPipe::PipelinePlan
-
-
+A model for storing pipeline state, options, and results.
+There are two ways to pass options for tasks: either retrieve from PipelinePlan or propagate through context.
 
 ### MedPipe::PipelineGroup
-
-
+A model for grouping plans.
+Execution can be interrupted by setting parallel_limit to 0 during runtime.
 
 ## Usage
 
-1. Reader, Uploader
-2. PipelineRunner
-3. Pipeline
-4. PipelinePlan
-5.
+1. Create PipelineTask such as Reader, Uploader, etc. [Samples](https://github.com/medpeer-dev/med_pipe/tree/main/spec/dummy/app/models/pipeline_task)
+2. Create PipelineRunner [Sample](https://github.com/medpeer-dev/med_pipe/blob/main/spec/dummy/app/models/sample_pipeline_runner.rb)
+3. Create a job for parallel Pipeline execution [Sample](https://github.com/medpeer-dev/med_pipe/blob/main/spec/dummy/app/jobs/sample_execute_pipeline_job.rb)
+4. Write code to register PipelinePlan
+5. Execute like this:
+
+```ruby
+# add plan
+pipeline_group = MedPipe::PipelineGroup.create!(parallel_limit: 10)
+date_range = Date.new(2024, 6, 1)..Date.new(2024, 6, 30)
+date_range.each do |date|
+  pipeline_group.pipeline_plans.status_waiting.create!(name: 'point_events', output_unit: :daily, target_date: date)
+end
+
+# execute
+ExecutePipelineJob.perform_later(pipeline_group.id)
+```
 
 ## Installation
 Add this line to your application's Gemfile:
@@ -42,7 +59,7 @@ Add this line to your application's Gemfile:
 gem "med_pipe"
 ```
 
-### migration
+### Adding migration files
 
 ```shell
 $ rails med_pipe:install:migrations
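Note on the README changes: the new Pipeline and PipelineTask descriptions (register tasks with 'apply', run them in order with 'run', hand data forward with `yield`) can be illustrated with a minimal, framework-free sketch. `SketchPipeline`, `NumberReader`, and `ChunkSummer` are hypothetical names, and the recursive chaining in `run_from` is an assumption about how such a pipeline might work, not the gem's actual implementation.

```ruby
# Hypothetical stand-in for the pipeline described in the README.
class SketchPipeline
  def initialize
    @tasks = []
  end

  # Register a task; returns self so calls can be chained.
  def apply(task)
    @tasks << task
    self
  end

  # Run tasks in registration order and return the shared context.
  def run(context = {})
    run_from(0, context, nil)
    context
  end

  private

  # Each task yields its result, which becomes prev_result of the next task.
  def run_from(index, context, prev_result)
    return if index >= @tasks.size

    @tasks[index].call(context, prev_result) do |result|
      run_from(index + 1, context, result)
    end
  end
end

# Hypothetical reader task: yields a lazy enumerator instead of a full array.
class NumberReader
  def call(context, _prev_result)
    rows = (1..context.fetch(:limit)).lazy.map { |n| n * 2 }
    yield rows
  end
end

# Hypothetical consumer task: pulls the lazy rows three at a time.
class ChunkSummer
  def call(context, prev_result)
    context[:chunk_sums] = prev_result.each_slice(3).map(&:sum).to_a
    yield context[:chunk_sums]
  end
end

context = SketchPipeline.new
                        .apply(NumberReader.new)
                        .apply(ChunkSummer.new)
                        .run({ limit: 7 })
```

Because the reader yields an `Enumerator::Lazy`, no row is computed until the downstream task iterates, which is the property the README relies on for chunked processing of large datasets.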
data/app/models/med_pipe/pipeline_plan.rb
CHANGED
@@ -9,18 +9,16 @@ class MedPipe::PipelinePlan < MedPipe::ApplicationRecord
   validates :output_unit, presence: true
   validates :status, presence: true
 
-
-  # https://zenn.dev/kanazawa/articles/8bc1fcbba3ef1d#enum%E3%81%AE%E5%AE%9A%E7%BE%A9%E6%96%B9%E6%B3%95%E3%81%8C%E5%A4%89%E3%82%8F%E3%82%8B
-  enum status: {
+  enum :status, {
     waiting: "waiting",
     enqueued: "enqueued",
     running: "running",
     finished: "finished",
     failed: "failed"
-  },
+  }, prefix: true, default: :waiting
 
-  enum output_unit
+  enum :output_unit, {
     daily: "daily",
     all: "all"
-  },
+  }, prefix: true
 end
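Note on the enum change: with `prefix: true`, Rails names the generated helpers after the attribute, which is why the README calls `pipeline_plans.status_waiting`. The sketch below imitates only that naming convention in plain Ruby so it runs without Rails. `PrefixedEnumSketch`, `prefixed_enum`, and `PlanSketch` are hypothetical, and the real `enum` macro also defines scopes and validations that are not modelled here.

```ruby
# Hypothetical imitation of the predicate and bang method names produced by
# `enum :status, {...}, prefix: true, default: :waiting`. Not Rails code.
module PrefixedEnumSketch
  def prefixed_enum(attribute, mapping, default: nil)
    attr_writer attribute

    # Reader falls back to the stored value of the default label, if any.
    define_method(attribute) do
      instance_variable_get("@#{attribute}") || (default && mapping.fetch(default))
    end

    mapping.each do |label, stored|
      # e.g. status_waiting? and status_waiting!
      define_method("#{attribute}_#{label}?") { public_send(attribute) == stored }
      define_method("#{attribute}_#{label}!") { public_send("#{attribute}=", stored) }
    end
  end
end

class PlanSketch
  extend PrefixedEnumSketch

  prefixed_enum :status, {
    waiting: "waiting",
    enqueued: "enqueued",
    running: "running",
    finished: "finished",
    failed: "failed"
  }, default: :waiting

  prefixed_enum :output_unit, {
    daily: "daily",
    all: "all"
  }
end

plan = PlanSketch.new
initially_waiting = plan.status_waiting?
plan.status_running!
plan.output_unit_all!
```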
data/lib/med_pipe/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: med_pipe
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.2.0
 platform: ruby
 authors:
 - mpg-taichi-sato
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-11-
+date: 2024-11-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rails
@@ -16,20 +16,14 @@ dependencies:
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version:
-    - - "<"
-      - !ruby/object:Gem::Version
-        version: '8.0'
+        version: 7.2.0
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version:
-    - - "<"
-      - !ruby/object:Gem::Version
-        version: '8.0'
+        version: 7.2.0
 description: Provides a system for processing data ranging from 1 million to several
   billion records
 email:
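Note on the dependency change: 0.2.0 requires rails `>= 7.2.0` and drops the `< 8.0` upper bound. The practical effect can be checked with RubyGems' own `Gem::Requirement`. The Rails version numbers in the list below are illustrative inputs chosen for this sketch, not values taken from the diff.

```ruby
require "rubygems"

# Constraint recorded in the 0.2.0 metadata.
new_requirement = Gem::Requirement.new(">= 7.2.0")
# Upper bound present in the 0.1.1 metadata and removed in 0.2.0.
dropped_upper_bound = Gem::Requirement.new("< 8.0")

# Illustrative Rails versions to test against both constraints.
rails_versions = %w[7.1.5 7.2.0 8.0.0].map { |v| Gem::Version.new(v) }

accepted_now = rails_versions.select { |v| new_requirement.satisfied_by?(v) }.map(&:to_s)
blocked_before = rails_versions.reject { |v| dropped_upper_bound.satisfied_by?(v) }.map(&:to_s)

puts "accepted by 0.2.0: #{accepted_now.join(', ')}"
puts "rejected by the old upper bound: #{blocked_before.join(', ')}"
```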