med_pipe 0.1.1 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 254b6d0a324f400418ad4ead68cc2741f779e77a1a18f01b0e865737d783358b
- data.tar.gz: ab801a44a97d8ff5a9f6bba5ca023bd0dc1712707d20b7d3754981f6c7528228
+ metadata.gz: e033daa9e892bde3d031d927cf70e7ea1164edcef333eea69c0cdb71d17124d2
+ data.tar.gz: 1253ceea7ed2d9021c2e620a52addfe813fd80bda18f2a55920e18b9f7134623
  SHA512:
- metadata.gz: 8d94037e54f43df01d53f95057e21597660afa3c4eedb24d107ed2ffeb5d7d5c823f4ed9f0ea28a9267bd26d58be6fc22e7dbc6eb918ddb059bbead9db85f418
- data.tar.gz: 0c744341829cb8f970acbd26da6e3b6708534b18e2ad0a34cc8641b7ba48d313a6f788de486827c37b7ffd0238d33227f76130e40ddc5753159b53f12af3dee9
+ metadata.gz: bbf20fedd6d3d99d1789da72fd6e7555357882a4acf17f6f18e97d23aa739e4b028beecbf149a9ac28017cefb1a92079ec7bb3c9f78e39a4cb8311496a051feb
+ data.tar.gz: bcd4881d759ccb04b2c1f9bac09f758534098b5524ff266952d0ccd5642c0c96536e94a3a8e95c5246f724bc9f9a3ae65d42b69550828116eb3c2c3e8655e219
data/README.md CHANGED
@@ -1,39 +1,56 @@
- # MedPipe <sup>BETA</sup>
- 100万 ~ 数10億程度のデータを処理するための仕組みを提供する Rails エンジンです。
+ # MedPipe
+ ![test_badge](https://github.com/medpeer-dev/med_pipe/actions/workflows/test.yml/badge.svg)
+
+ A Rails engine that provides mechanisms for processing datasets ranging from 1 million to several billion records.

  ## Concept
+
+ ![MedPipeConcept](https://github.com/user-attachments/assets/69ef986b-33cc-478c-830f-78d24ff6c9f4)
+
  ### MedPipe::Pipeline
- apply で後述する PipelineTask を登録し、run で順番に実行します。
+ Register PipelineTask through 'apply' method and execute them sequentially using 'run'.

  ### MedPipe::PipelineTask
- Pipeline に登録する処理の単位です。
- DB からの読み込みや、S3 へのアップロード等やることを分割してタスク化します。
- 大量データを扱う際には Enumerable::Lazy を使うことで分割して処理をすることができます。
- call を実装する必要があります
+ This is the basic unit of processing registered in the pipeline.
+ Tasks are divided into specific operations such as reading from DB or uploading to S3.
+ When handling large datasets, Enumerable::Lazy can be used to process data in chunks.
+ You need to implement the 'call' method:

- ```.rb
+ ```ruby
  @param context [Hash] Stores data during pipeline execution
  @param prev_result [Object] The result of the previous task
  def call(context, prev_result)
- yield 次のTaskに渡すデータ
+ yield "data_to_pass_to_next_task"
  end
  ```

  ### MedPipe::PipelinePlan
- Pipeline の状態、オプション、結果を保存するためのモデルです。
- Task で使うためのオプションを渡す方法は PipelinePlan から取得するか、contextで伝搬するかの二択です。
+ A model for storing pipeline state, options, and results.
+ There are two ways to pass options for tasks: either retrieve from PipelinePlan or propagate through context.

  ### MedPipe::PipelineGroup
- 一つのジョブで実行する Plan をまとめるためのモデルです。
- 実行中に parallel_limit 0 にすることで中断することができます。
+ A model for grouping plans.
+ Execution can be interrupted by setting parallel_limit to 0 during runtime.

  ## Usage

- 1. Reader, Uploader 等の PipelineTask を作成 [Samples](https://github.com/medpeer-dev/med_pipe/tree/main/spec/dummy/app/models/pipeline_task)
- 2. PipelineRunner を作成 [Sample](https://github.com/medpeer-dev/med_pipe/blob/main/spec/dummy/app/models/sample_pipeline_runner.rb)
- 3. Pipeline を並列実行するためのジョブを作成 [Sample](https://github.com/medpeer-dev/med_pipe/blob/main/spec/dummy/app/jobs/sample_execute_pipeline_job.rb)
- 4. PipelinePlan を登録するコードを記述
- 5. 実行
+ 1. Create PipelineTask such as Reader, Uploader, etc. [Samples](https://github.com/medpeer-dev/med_pipe/tree/main/spec/dummy/app/models/pipeline_task)
+ 2. Create PipelineRunner [Sample](https://github.com/medpeer-dev/med_pipe/blob/main/spec/dummy/app/models/sample_pipeline_runner.rb)
+ 3. Create a job for parallel Pipeline execution [Sample](https://github.com/medpeer-dev/med_pipe/blob/main/spec/dummy/app/jobs/sample_execute_pipeline_job.rb)
+ 4. Write code to register PipelinePlan
+ 5. Execute like this:
+
+ ```ruby
+ # add plan
+ pipeline_group = MedPipe::PipelineGroup.create!(parallel_limit: 10)
+ date_range = Date.new(2024, 6, 1)..Date.new(2024, 6, 30)
+ date_range.each do |date|
+ pipeline_group.pipeline_plans.status_waiting.create!(name: 'point_events', output_unit: :daily, target_date: date)
+ end
+
+ # execute
+ ExecutePipelineJob.perform_later(pipeline_group.id)
+ ```

  ## Installation
  Add this line to your application's Gemfile:
@@ -42,7 +59,7 @@ Add this line to your application's Gemfile:
  gem "med_pipe"
  ```

- ### migrationファイルの追加
+ ### Adding migration files

  ```shell
  $ rails med_pipe:install:migrations
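
Note on the README hunks above: the new text describes tasks that implement `call(context, prev_result)`, yield a value to the next task, and may use Enumerable::Lazy for large datasets. The following is a minimal plain-Ruby sketch of that contract. `TinyPipeline`, `RangeReader`, `Doubler`, and `BatchCollector` are hypothetical names invented for illustration, and the way `run` chains the yielded values is an assumption; the diff does not show how MedPipe::Pipeline implements it.

```ruby
# Hypothetical stand-ins for illustration; none of these classes come from med_pipe.

# Produces a lazy sequence so later tasks pull records on demand.
class RangeReader
  def call(context, _prev_result)
    yield (1..context[:limit]).lazy
  end
end

# Transforms each record; still lazy, nothing is computed yet.
class Doubler
  def call(_context, prev_result)
    yield prev_result.map { |n| n * 2 }
  end
end

# Groups records into fixed-size batches and forces evaluation.
class BatchCollector
  def call(context, prev_result)
    yield prev_result.each_slice(context[:batch_size]).to_a
  end
end

# Assumed chaining strategy: each task's yielded value becomes the
# prev_result of the next task.
class TinyPipeline
  def initialize
    @tasks = []
  end

  def apply(task)
    @tasks << task
    self
  end

  def run(context = {})
    run_from(0, context, nil)
  end

  private

  def run_from(index, context, prev_result)
    return prev_result if index == @tasks.size

    final = nil
    @tasks[index].call(context, prev_result) do |result|
      final = run_from(index + 1, context, result)
    end
    final
  end
end

batches = TinyPipeline.new
                      .apply(RangeReader.new)
                      .apply(Doubler.new)
                      .apply(BatchCollector.new)
                      .run({ limit: 5, batch_size: 2 })
p batches # => [[2, 4], [6, 8], [10]]
```

Because the reader hands over a lazy enumerator, no record is materialized until the last task calls `to_a`, which is the chunked-processing property the README attributes to Enumerable::Lazy.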
@@ -9,18 +9,16 @@ class MedPipe::PipelinePlan < MedPipe::ApplicationRecord
  validates :output_unit, presence: true
  validates :status, presence: true

- # TODO: Rails6記法のため、Rails8に上げる際に定義の仕方を変える
- # https://zenn.dev/kanazawa/articles/8bc1fcbba3ef1d#enum%E3%81%AE%E5%AE%9A%E7%BE%A9%E6%96%B9%E6%B3%95%E3%81%8C%E5%A4%89%E3%82%8F%E3%82%8B
- enum status: {
+ enum :status, {
  waiting: "waiting",
  enqueued: "enqueued",
  running: "running",
  finished: "finished",
  failed: "failed"
- }, _prefix: true
+ }, prefix: true, default: :waiting

- enum output_unit: {
+ enum :output_unit, {
  daily: "daily",
  all: "all"
- }, _prefix: true
+ }, prefix: true
  end
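
Note on the hunk above: the enum declarations move from the keyword form (`enum status: {...}, _prefix: true`) to the positional form (`enum :status, {...}, prefix: true`), and `status` gains `default: :waiting`. The sketch below imitates, in plain Ruby, what a prefixed enum with a default provides: prefixed predicate and bang methods, and a fallback value. `TinyEnum` and `Plan` are hypothetical stand-ins, not ActiveRecord, and the sketch omits scopes and persistence entirely.

```ruby
# TinyEnum is a hypothetical stand-in for illustration; it is not ActiveRecord.
module TinyEnum
  def enum(name, values, prefix: false, default: nil)
    attr_writer name

    # Reader falls back to the stored value of the declared default.
    define_method(name) do
      instance_variable_get("@#{name}") || values[default]
    end

    values.each do |key, stored|
      method_prefix = prefix ? "#{name}_" : ""
      # e.g. status_waiting? checks the current value
      define_method("#{method_prefix}#{key}?") { public_send(name) == stored }
      # e.g. status_running! assigns the value
      define_method("#{method_prefix}#{key}!") { public_send("#{name}=", stored) }
    end
  end
end

class Plan
  extend TinyEnum

  enum :status, { waiting: "waiting", running: "running", finished: "finished" },
       prefix: true, default: :waiting
  enum :output_unit, { daily: "daily", all: "all" }, prefix: true
end

plan = Plan.new
p plan.status           # => "waiting"
p plan.status_waiting?  # => true
plan.status_running!
p plan.status_running?  # => true
```

With the prefix, the generated names are `status_waiting?` rather than `waiting?`, which matches the `status_waiting` call used in the README example earlier in this diff.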
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module MedPipe
- VERSION = "0.1.1"
+ VERSION = "0.2.0"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: med_pipe
  version: !ruby/object:Gem::Version
- version: 0.1.1
+ version: 0.2.0
  platform: ruby
  authors:
  - mpg-taichi-sato
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2024-11-28 00:00:00.000000000 Z
+ date: 2024-11-29 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rails
@@ -16,20 +16,14 @@ dependencies:
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 6.1.7
- - - "<"
- - !ruby/object:Gem::Version
- version: '8.0'
+ version: 7.2.0
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 6.1.7
- - - "<"
- - !ruby/object:Gem::Version
- version: '8.0'
+ version: 7.2.0
  description: Provides a system for processing data ranging from 1 million to several
  billion records
  email:
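
Note on the hunk above: the rails dependency changes from `>= 6.1.7`, `< 8.0` to `>= 7.2.0` with no upper bound. A short sketch using RubyGems' own `Gem::Requirement` shows which Rails versions each constraint accepts. The four sample versions are arbitrary picks for illustration, not versions named in the diff.

```ruby
require "rubygems"

# Constraints copied from the removed and added lines of the metadata hunk.
old_requirement = Gem::Requirement.new(">= 6.1.7", "< 8.0")
new_requirement = Gem::Requirement.new(">= 7.2.0")

# Sample Rails versions chosen for illustration only.
sample_versions = %w[6.1.7 7.1.0 7.2.0 8.0.0]

compatibility = sample_versions.to_h do |v|
  version = Gem::Version.new(v)
  [v, { old: old_requirement.satisfied_by?(version),
        new: new_requirement.satisfied_by?(version) }]
end

compatibility.each do |v, result|
  puts format("rails %-6s  0.1.1: %-5s  0.2.0: %s", v, result[:old], result[:new])
end
```

Under these constraints, applications on Rails older than 7.2.0 can no longer resolve med_pipe 0.2.0, while Rails 8.0 and later become acceptable.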