med_pipe 0.1.1 → 0.2.0
- checksums.yaml +4 -4
- data/README.md +36 -19
- data/app/models/med_pipe/pipeline_plan.rb +4 -6
- data/lib/med_pipe/version.rb +1 -1
- metadata +4 -10
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e033daa9e892bde3d031d927cf70e7ea1164edcef333eea69c0cdb71d17124d2
+  data.tar.gz: 1253ceea7ed2d9021c2e620a52addfe813fd80bda18f2a55920e18b9f7134623
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: bbf20fedd6d3d99d1789da72fd6e7555357882a4acf17f6f18e97d23aa739e4b028beecbf149a9ac28017cefb1a92079ec7bb3c9f78e39a4cb8311496a051feb
+  data.tar.gz: bcd4881d759ccb04b2c1f9bac09f758534098b5524ff266952d0ccd5642c0c96536e94a3a8e95c5246f724bc9f9a3ae65d42b69550828116eb3c2c3e8655e219
data/README.md
CHANGED
@@ -1,39 +1,56 @@
-# MedPipe
-
+# MedPipe
+![test_badge](https://github.com/medpeer-dev/med_pipe/actions/workflows/test.yml/badge.svg)
+
+A Rails engine that provides mechanisms for processing datasets ranging from 1 million to several billion records.
 
 ## Concept
+
+![MedPipeConcept](https://github.com/user-attachments/assets/69ef986b-33cc-478c-830f-78d24ff6c9f4)
+
 ### MedPipe::Pipeline
-apply
+Register PipelineTask through 'apply' method and execute them sequentially using 'run'.
 
 ### MedPipe::PipelineTask
-
-DB
-
-call
+This is the basic unit of processing registered in the pipeline.
+Tasks are divided into specific operations such as reading from DB or uploading to S3.
+When handling large datasets, Enumerable::Lazy can be used to process data in chunks.
+You need to implement the 'call' method:
 
-
+```ruby
 @param context [Hash] Stores data during pipeline execution
 @param prev_result [Object] The result of the previous task
 def call(context, prev_result)
-  yield
+  yield "data_to_pass_to_next_task"
 end
 ```
 
 ### MedPipe::PipelinePlan
-
-
+A model for storing pipeline state, options, and results.
+There are two ways to pass options for tasks: either retrieve from PipelinePlan or propagate through context.
 
 ### MedPipe::PipelineGroup
-
-
+A model for grouping plans.
+Execution can be interrupted by setting parallel_limit to 0 during runtime.
 
 ## Usage
 
-1. Reader, Uploader
-2. PipelineRunner
-3. Pipeline
-4. PipelinePlan
-5.
+1. Create PipelineTask such as Reader, Uploader, etc. [Samples](https://github.com/medpeer-dev/med_pipe/tree/main/spec/dummy/app/models/pipeline_task)
+2. Create PipelineRunner [Sample](https://github.com/medpeer-dev/med_pipe/blob/main/spec/dummy/app/models/sample_pipeline_runner.rb)
+3. Create a job for parallel Pipeline execution [Sample](https://github.com/medpeer-dev/med_pipe/blob/main/spec/dummy/app/jobs/sample_execute_pipeline_job.rb)
+4. Write code to register PipelinePlan
+5. Execute like this:
+
+```ruby
+# add plan
+pipeline_group = MedPipe::PipelineGroup.create!(parallel_limit: 10)
+date_range = Date.new(2024, 6, 1)..Date.new(2024, 6, 30)
+date_range.each do |date|
+  pipeline_group.pipeline_plans.status_waiting.create!(name: 'point_events', output_unit: :daily, target_date: date)
+end
+
+# execute
+ExecutePipelineJob.perform_later(pipeline_group.id)
+```
 
 ## Installation
 Add this line to your application's Gemfile:
@@ -42,7 +59,7 @@ Add this line to your application's Gemfile:
 gem "med_pipe"
 ```
 
-### migration
+### Adding migration files
 
 ```shell
 $ rails med_pipe:install:migrations
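The Concept section in the README diff above describes tasks that are registered with 'apply', executed in order with 'run', and that hand their result to the next task by yielding from 'call'. The following is a minimal plain-Ruby sketch of that flow, not med_pipe's implementation: `SketchPipeline`, `ChunkedReader`, `Doubler`, and `Collector` are hypothetical names, and only the `call(context, prev_result)` signature and the use of a lazy enumerator are taken from the README.

```ruby
# Hypothetical stand-in for a pipeline: tasks are registered with #apply
# and executed in registration order by #run.
class SketchPipeline
  def initialize
    @tasks = []
  end

  def apply(task)
    @tasks << task
    self
  end

  def run(context = {})
    run_from(0, context, nil)
  end

  private

  # Each task yields its result; the block starts the next task with it.
  def run_from(index, context, prev_result)
    return if index >= @tasks.size

    @tasks[index].call(context, prev_result) do |result|
      run_from(index + 1, context, result)
    end
  end
end

# Hypothetical reader: yields a lazy enumerator of fixed-size chunks.
class ChunkedReader
  def call(context, _prev_result)
    yield((1..context[:total]).each_slice(context[:chunk_size]).lazy)
  end
end

# Hypothetical transformer: stays lazy, so no chunk is computed yet.
class Doubler
  def call(_context, prev_result)
    yield(prev_result.map { |chunk| chunk.map { |n| n * 2 } })
  end
end

# Hypothetical sink: iterating here is what pulls chunks through the chain.
class Collector
  def call(context, prev_result)
    prev_result.each { |chunk| context[:output].concat(chunk) }
    yield context[:output].size
  end
end

context = { total: 10, chunk_size: 3, output: [] }
SketchPipeline.new
              .apply(ChunkedReader.new)
              .apply(Doubler.new)
              .apply(Collector.new)
              .run(context)
```

Because the reader and the transformer only build lazy enumerators, the chunks are produced one at a time when the final task iterates, which is the property the README points to for large datasets.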
data/app/models/med_pipe/pipeline_plan.rb
CHANGED
@@ -9,18 +9,16 @@ class MedPipe::PipelinePlan < MedPipe::ApplicationRecord
   validates :output_unit, presence: true
   validates :status, presence: true
 
-
-  # https://zenn.dev/kanazawa/articles/8bc1fcbba3ef1d#enum%E3%81%AE%E5%AE%9A%E7%BE%A9%E6%96%B9%E6%B3%95%E3%81%8C%E5%A4%89%E3%82%8F%E3%82%8B
-  enum status: {
+  enum :status, {
     waiting: "waiting",
     enqueued: "enqueued",
     running: "running",
     finished: "finished",
     failed: "failed"
-  },
+  }, prefix: true, default: :waiting
 
-  enum output_unit
+  enum :output_unit, {
     daily: "daily",
     all: "all"
-  },
+  }, prefix: true
 end
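The pipeline_plan.rb hunk above switches to the positional `enum :status, { ... }` form with `prefix: true`. With that option Rails prefixes the generated predicates, bang methods, and scopes with the attribute name, which matches the `pipeline_plans.status_waiting` call in the README usage example. The sketch below imitates only that naming convention in plain Ruby so it runs without Rails. `PrefixedEnumSketch`, `sketch_enum`, and `PlanSketch` are hypothetical names, and nothing here is ActiveRecord's implementation (scopes and persistence are left out).

```ruby
# Hypothetical plain-Ruby imitation of the method names that a prefixed
# enum produces. Not ActiveRecord code.
module PrefixedEnumSketch
  def sketch_enum(name, values, prefix: false, default: nil)
    attr_writer name

    # Reader falls back to the stored value of the default key, if any.
    define_method(name) do
      instance_variable_get("@#{name}") || values[default]
    end

    values.each do |key, stored|
      method_base = prefix ? "#{name}_#{key}" : key.to_s
      define_method("#{method_base}?") { public_send(name) == stored }
      define_method("#{method_base}!") { public_send("#{name}=", stored) }
    end
  end
end

class PlanSketch
  extend PrefixedEnumSketch

  sketch_enum :status, {
    waiting: "waiting",
    enqueued: "enqueued",
    running: "running",
    finished: "finished",
    failed: "failed"
  }, prefix: true, default: :waiting

  sketch_enum :output_unit, {
    daily: "daily",
    all: "all"
  }, prefix: true
end

plan = PlanSketch.new
plan.status_waiting?      # true, via the default
plan.status_running!      # sets status to "running"
plan.output_unit_daily!   # sets output_unit to "daily"
```

One plausible reason for prefixing `output_unit` is that an unprefixed `all` value would collide with the `all` class method that ActiveRecord models already define.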
data/lib/med_pipe/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: med_pipe
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.2.0
 platform: ruby
 authors:
 - mpg-taichi-sato
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-11-
+date: 2024-11-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rails
@@ -16,20 +16,14 @@ dependencies:
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version:
-    - - "<"
-      - !ruby/object:Gem::Version
-        version: '8.0'
+        version: 7.2.0
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version:
-    - - "<"
-      - !ruby/object:Gem::Version
-        version: '8.0'
+        version: 7.2.0
 description: Provides a system for processing data ranging from 1 million to several
   billion records
 email: