med_pipe 0.1.1 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 254b6d0a324f400418ad4ead68cc2741f779e77a1a18f01b0e865737d783358b
-   data.tar.gz: ab801a44a97d8ff5a9f6bba5ca023bd0dc1712707d20b7d3754981f6c7528228
+   metadata.gz: e033daa9e892bde3d031d927cf70e7ea1164edcef333eea69c0cdb71d17124d2
+   data.tar.gz: 1253ceea7ed2d9021c2e620a52addfe813fd80bda18f2a55920e18b9f7134623
  SHA512:
-   metadata.gz: 8d94037e54f43df01d53f95057e21597660afa3c4eedb24d107ed2ffeb5d7d5c823f4ed9f0ea28a9267bd26d58be6fc22e7dbc6eb918ddb059bbead9db85f418
-   data.tar.gz: 0c744341829cb8f970acbd26da6e3b6708534b18e2ad0a34cc8641b7ba48d313a6f788de486827c37b7ffd0238d33227f76130e40ddc5753159b53f12af3dee9
+   metadata.gz: bbf20fedd6d3d99d1789da72fd6e7555357882a4acf17f6f18e97d23aa739e4b028beecbf149a9ac28017cefb1a92079ec7bb3c9f78e39a4cb8311496a051feb
+   data.tar.gz: bcd4881d759ccb04b2c1f9bac09f758534098b5524ff266952d0ccd5642c0c96536e94a3a8e95c5246f724bc9f9a3ae65d42b69550828116eb3c2c3e8655e219
data/README.md CHANGED
@@ -1,39 +1,56 @@
- # MedPipe <sup>BETA</sup>
- A Rails engine that provides a mechanism for processing roughly 1 million to several billion records.
+ # MedPipe
+ ![test_badge](https://github.com/medpeer-dev/med_pipe/actions/workflows/test.yml/badge.svg)
+
+ A Rails engine that provides mechanisms for processing datasets ranging from 1 million to several billion records.

  ## Concept
+
+ ![MedPipeConcept](https://github.com/user-attachments/assets/69ef986b-33cc-478c-830f-78d24ff6c9f4)
+
  ### MedPipe::Pipeline
- Register the PipelineTasks described below with apply, then execute them in order with run.
+ Register PipelineTask through 'apply' method and execute them sequentially using 'run'.

  ### MedPipe::PipelineTask
- The unit of processing registered in a Pipeline.
- Split the work, such as reading from the DB or uploading to S3, into separate tasks.
- When handling large amounts of data, you can use Enumerable::Lazy to process it in pieces.
- You need to implement call
+ This is the basic unit of processing registered in the pipeline.
+ Tasks are divided into specific operations such as reading from DB or uploading to S3.
+ When handling large datasets, Enumerable::Lazy can be used to process data in chunks.
+ You need to implement the 'call' method:

- ```.rb
+ ```ruby
  @param context [Hash] Stores data during pipeline execution
  @param prev_result [Object] The result of the previous task
  def call(context, prev_result)
-   yield data to pass to the next Task
+   yield "data_to_pass_to_next_task"
  end
  ```

  ### MedPipe::PipelinePlan
- A model for storing the Pipeline's state, options, and results.
- There are two ways to pass options to a Task: retrieve them from the PipelinePlan or propagate them through context.
+ A model for storing pipeline state, options, and results.
+ There are two ways to pass options for tasks: either retrieve from PipelinePlan or propagate through context.

  ### MedPipe::PipelineGroup
- A model for grouping the Plans executed in a single job.
- Execution can be interrupted by setting parallel_limit to 0 while it is running.
+ A model for grouping plans.
+ Execution can be interrupted by setting parallel_limit to 0 during runtime.

  ## Usage

- 1. Create PipelineTasks such as a Reader and an Uploader [Samples](https://github.com/medpeer-dev/med_pipe/tree/main/spec/dummy/app/models/pipeline_task)
- 2. Create a PipelineRunner [Sample](https://github.com/medpeer-dev/med_pipe/blob/main/spec/dummy/app/models/sample_pipeline_runner.rb)
- 3. Create a job to run Pipelines in parallel [Sample](https://github.com/medpeer-dev/med_pipe/blob/main/spec/dummy/app/jobs/sample_execute_pipeline_job.rb)
- 4. Write the code that registers PipelinePlans
- 5. Run
+ 1. Create PipelineTask such as Reader, Uploader, etc. [Samples](https://github.com/medpeer-dev/med_pipe/tree/main/spec/dummy/app/models/pipeline_task)
+ 2. Create PipelineRunner [Sample](https://github.com/medpeer-dev/med_pipe/blob/main/spec/dummy/app/models/sample_pipeline_runner.rb)
+ 3. Create a job for parallel Pipeline execution [Sample](https://github.com/medpeer-dev/med_pipe/blob/main/spec/dummy/app/jobs/sample_execute_pipeline_job.rb)
+ 4. Write code to register PipelinePlan
+ 5. Execute like this:
+
+ ```ruby
+ # add plan
+ pipeline_group = MedPipe::PipelineGroup.create!(parallel_limit: 10)
+ date_range = Date.new(2024, 6, 1)..Date.new(2024, 6, 30)
+ date_range.each do |date|
+   pipeline_group.pipeline_plans.status_waiting.create!(name: 'point_events', output_unit: :daily, target_date: date)
+ end
+
+ # execute
+ ExecutePipelineJob.perform_later(pipeline_group.id)
+ ```

  ## Installation
  Add this line to your application's Gemfile:
@@ -42,7 +59,7 @@ Add this line to your application's Gemfile:
  gem "med_pipe"
  ```

- ### Adding the migration files
+ ### Adding migration files

  ```shell
  $ rails med_pipe:install:migrations
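
The README text in this diff describes tasks whose `call` yields data to the next task, registered with `apply` and executed with `run`. The following is a minimal sketch of how such chaining could work lazily in plain Ruby; it is not the gem's implementation. `SketchPipeline`, `RangeReader`, `ChunkSummer`, and the `:limit` and `:chunk_size` context keys are hypothetical names introduced for illustration. In Ruby the lazy type is `Enumerator::Lazy`, obtained through `Enumerable#lazy`.

```ruby
# Hypothetical stand-in for a pipeline; NOT the gem's MedPipe::Pipeline.
class SketchPipeline
  def initialize
    @tasks = []
  end

  # Register a task and return self so calls can be chained.
  def apply(task)
    @tasks << task
    self
  end

  # Wrap each task's `call` in a lazy enumerator whose elements are the
  # values the task yields, and hand it to the next task as prev_result.
  def run(context = {})
    last = @tasks.reduce(nil) do |prev_result, task|
      Enumerator.new do |yielder|
        task.call(context, prev_result) { |item| yielder << item }
      end.lazy
    end
    last.to_a # nothing is read until the final result is forced here
  end
end

# Hypothetical reader task: yields the numbers 1..limit in chunks.
class RangeReader
  def call(context, _prev_result)
    (1..context[:limit]).each_slice(context[:chunk_size]) { |chunk| yield chunk }
  end
end

# Hypothetical downstream task: yields the sum of each incoming chunk.
class ChunkSummer
  def call(_context, prev_result)
    prev_result.each { |chunk| yield chunk.sum }
  end
end

sums = SketchPipeline.new
                     .apply(RangeReader.new)
                     .apply(ChunkSummer.new)
                     .run({ limit: 10, chunk_size: 4 })
p sums # => [10, 26, 19]
```

In this sketch each chunk flows through both tasks before the reader produces the next one, so only one chunk is in flight at a time.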
@@ -9,18 +9,16 @@ class MedPipe::PipelinePlan < MedPipe::ApplicationRecord
    validates :output_unit, presence: true
    validates :status, presence: true

-   # TODO: This uses the Rails 6 syntax; change how it is defined when upgrading to Rails 8
-   # https://zenn.dev/kanazawa/articles/8bc1fcbba3ef1d#enum%E3%81%AE%E5%AE%9A%E7%BE%A9%E6%96%B9%E6%B3%95%E3%81%8C%E5%A4%89%E3%82%8F%E3%82%8B
-   enum status: {
+   enum :status, {
      waiting: "waiting",
      enqueued: "enqueued",
      running: "running",
      finished: "finished",
      failed: "failed"
-   }, _prefix: true
+   }, prefix: true, default: :waiting

-   enum output_unit: {
+   enum :output_unit, {
      daily: "daily",
      all: "all"
-   }, _prefix: true
+   }, prefix: true
  end
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module MedPipe
-   VERSION = "0.1.1"
+   VERSION = "0.2.0"
  end
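
The metadata diff for this release replaces the Rails constraint `>= 6.1.7, < 8.0` with `>= 7.2.0`. As a rough illustration of what that means for host applications, the sketch below evaluates both constraints with RubyGems' `Gem::Requirement`. The helper name `rails_compatibility` and the sample version strings are assumptions chosen for illustration; they do not come from the gem.

```ruby
require "rubygems"

# Hypothetical helper: reports, for each Rails version string, whether it
# satisfies the old (0.1.1) and new (0.2.0) constraints from the metadata diff.
def rails_compatibility(versions)
  old_requirement = Gem::Requirement.new(">= 6.1.7", "< 8.0")
  new_requirement = Gem::Requirement.new(">= 7.2.0")

  versions.to_h do |v|
    version = Gem::Version.new(v)
    [v, { old: old_requirement.satisfied_by?(version),
          new: new_requirement.satisfied_by?(version) }]
  end
end

# Sample Rails versions, picked only to show both sides of each boundary.
rails_compatibility(%w[6.1.7 7.1.5 7.2.0 8.0.0]).each do |v, ok|
  puts "Rails #{v}: 0.1.1=#{ok[:old]} 0.2.0=#{ok[:new]}"
end
```

Under these constraints, an application on a Rails version below 7.2.0 can resolve 0.1.1 but not 0.2.0, while Rails 8.0 and later is only admitted by 0.2.0.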
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: med_pipe
  version: !ruby/object:Gem::Version
-   version: 0.1.1
+   version: 0.2.0
  platform: ruby
  authors:
  - mpg-taichi-sato
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2024-11-28 00:00:00.000000000 Z
+ date: 2024-11-29 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rails
@@ -16,20 +16,14 @@ dependencies:
      requirements:
      - - ">="
        - !ruby/object:Gem::Version
-         version: 6.1.7
-     - - "<"
-       - !ruby/object:Gem::Version
-         version: '8.0'
+         version: 7.2.0
    type: :runtime
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - ">="
        - !ruby/object:Gem::Version
-         version: 6.1.7
-     - - "<"
-       - !ruby/object:Gem::Version
-         version: '8.0'
+         version: 7.2.0
  description: Provides a system for processing data ranging from 1 million to several
    billion records
  email: