pipen-cli-gbatch 0.1.5__tar.gz

@@ -0,0 +1,246 @@
+ Metadata-Version: 2.4
+ Name: pipen-cli-gbatch
+ Version: 0.1.5
+ Summary: A pipen cli plugin to run command via Google Cloud Batch
+ License: MIT
+ Author: pwwang
+ Author-email: pwwang@pwwang.com
+ Requires-Python: >=3.9,<4.0
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Programming Language :: Python :: 3.14
+ Requires-Dist: google-cloud-storage (>=3.0.0,<4.0.0)
+ Requires-Dist: pipen (>=0.17.19,<0.18.0)
+ Requires-Dist: pipen-poplog (>=0.3.6,<0.4.0)
+ Project-URL: Homepage, https://github.com/pwwang/pipen-cli-gbatch
+ Project-URL: Repository, https://github.com/pwwang/pipen-cli-gbatch
+ Description-Content-Type: text/markdown
+
+ # pipen-cli-gbatch
+
+ A pipen CLI plugin to run commands via Google Cloud Batch.
+
+ The idea is to submit the command using xqute and use the gbatch scheduler to run it on Google Cloud Batch.
+
+ ## Installation
+
+ ```bash
+ pip install pipen-cli-gbatch
+ ```
+
+ ## Usage
+
+ ### Basic Command Execution
+
+ To run a command like:
+
+ ```bash
+ python myscript.py --input input.txt --output output.txt
+ ```
+
+ You can run it with:
+
+ ```bash
+ pipen gbatch -- python myscript.py --input input.txt --output output.txt
+ ```
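
The `--` separator marks where the plugin's own options end and the wrapped command begins. The convention can be sketched as follows. The helper `split_at_double_dash` is hypothetical and only illustrates the idea; it is not part of the package, and how the plugin parses its arguments internally is not shown in this README.

```python
from typing import List, Tuple


def split_at_double_dash(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Split argv into (plugin options, wrapped command) at the first `--`.

    Hypothetical helper for illustration only.
    """
    if "--" not in argv:
        # No separator: treat everything as plugin options, no command.
        return list(argv), []
    idx = argv.index("--")
    return argv[:idx], argv[idx + 1:]


options, command = split_at_double_dash(
    ["--nowait", "--", "python", "myscript.py", "--input", "input.txt"]
)
print("plugin options:", options)
print("wrapped command:", command)
```

Because only the first `--` is used as the split point, a later `--` inside the wrapped command stays part of that command.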
+
+ ### With Configuration File
+
+ To configure the run as you would a normal pipen pipeline, pass a config file; its `[pipen-cli-gbatch]` section will be used:
+
+ ```bash
+ pipen gbatch @config.toml -- \
+     python myscript.py --input input.txt --output output.txt
+ ```
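
A small sketch of what such a config file might contain and how its `[pipen-cli-gbatch]` section can be read. The key names `workdir`, `project`, and `location` are assumptions that mirror the CLI flags listed under All Options, and all values are placeholders. The snippet uses Python's standard `tomllib` (3.11+, with the `tomli` backport as a fallback) only to show the structure, not how the plugin itself loads the file.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:  # Python 3.9 / 3.10
    import tomli as tomllib  # third-party backport with the same API

# Hypothetical contents of config.toml; key names are assumed to mirror
# the CLI flags and are not confirmed by this README.
CONFIG_TOML = """
name = "MyPipeline"

[pipen-cli-gbatch]
workdir = "gs://my-bucket/workdir"
project = "my-project"
location = "us-central1"
"""

config = tomllib.loads(CONFIG_TOML)

# Only this section is relevant to `pipen gbatch`.
gbatch_section = config.get("pipen-cli-gbatch", {})

for key, value in gbatch_section.items():
    print(f"{key} = {value}")
```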
+
+ ### Detached Mode
+
+ Use the `--nowait` option to run the command in detached mode:
+
+ ```bash
+ pipen gbatch --nowait -- \
+     python myscript.py --input input.txt --output output.txt
+ ```
+
+ By default, `pipen gbatch` waits for the command to complete:
+
+ ```bash
+ pipen gbatch -- \
+     python myscript.py --input input.txt --output output.txt
+ ```
+
+ While waiting, it pulls the logs of the running job and shows them in the terminal.
+
+ ### View Logs
+
+ After starting a command in detached mode, you can pull its logs later with:
+
+ ```bash
+ pipen gbatch --view-logs all -- \
+     python myscript.py --input input.txt --output output.txt
+
+ # or just provide the workdir
+ pipen gbatch --view-logs all --workdir gs://my-bucket/workdir
+ ```
+
+ ## Configuration
+
+ Because the daemon pipeline runs on Google Cloud Batch, the workdir must be a Google Cloud Storage bucket path, for example `gs://my-bucket/workdir`.
+
+ A unique job ID is generated from the name (`--name`) and the workdir. If the same command is run again with the same name and workdir, no new job is started; the plugin attaches to the existing job and pulls its logs.
+
+ If `--name` is not given on the command line, the plugin looks for a `--name` among the arguments after `--`, and otherwise uses `name` from the root section of the configuration file, with a "GbatchDaemon" suffix. If neither is found, the default name "PipenGbatchDaemon" is used.
+
+ The meta information is then stored under `{workdir}/<daemon pipeline name>/`.
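
The name fallback chain and the resulting meta directory can be sketched as below. This is an illustration under stated assumptions, not the plugin's code: the function names are hypothetical, and the sketch assumes the "GbatchDaemon" suffix is appended to a name taken from either the wrapped command or the configuration file, which the text above leaves open.

```python
from typing import Dict, List, Optional

DEFAULT_NAME = "PipenGbatchDaemon"  # default quoted in the text above
SUFFIX = "GbatchDaemon"


def name_from_command(command: List[str]) -> Optional[str]:
    """Return the value of `--name` in the wrapped command, if present."""
    for i, arg in enumerate(command):
        if arg == "--name" and i + 1 < len(command):
            return command[i + 1]
        if arg.startswith("--name="):
            return arg.split("=", 1)[1]
    return None


def resolve_daemon_name(
    cli_name: Optional[str], command: List[str], config: Dict[str, str]
) -> str:
    """Apply the documented precedence: CLI, wrapped command, config, default."""
    if cli_name:
        return cli_name
    # Assumption: the suffix applies to both fallback sources.
    base = name_from_command(command) or config.get("name")
    if base:
        return f"{base}{SUFFIX}"
    return DEFAULT_NAME


def meta_dir(workdir: str, daemon_name: str) -> str:
    """Build `{workdir}/<daemon pipeline name>/` without doubling slashes."""
    return f"{workdir.rstrip('/')}/{daemon_name}/"


name = resolve_daemon_name(None, ["python", "run.py", "--name", "Foo"], {})
print(name)
print(meta_dir("gs://my-bucket/workdir", name))
```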
+
+ When `--profile` is given, the scheduler options (`scheduler_opts`) of that profile in `~/.pipen.toml` and `./.pipen.toml` are used as defaults.
+
+ ## All Options
+
+ ```bash
+ > pipen gbatch --help
+ Usage: pipen gbatch [-h] [--nowait | --view-logs {all,stdout,stderr}] [--workdir WORKDIR]
+                     [--error-strategy {retry,halt}] [--num-retries NUM_RETRIES] [--prescript PRESCRIPT]
+                     [--postscript POSTSCRIPT] [--jobname-prefix JOBNAME_PREFIX] [--recheck-interval RECHECK_INTERVAL]
+                     [--cwd CWD] [--project PROJECT] [--location LOCATION] [--mount MOUNT]
+                     [--service-account SERVICE_ACCOUNT] [--network NETWORK] [--subnetwork SUBNETWORK]
+                     [--no-external-ip-address] [--machine-type MACHINE_TYPE] [--provisioning-model {STANDARD,SPOT}]
+                     [--image-uri IMAGE_URI] [--entrypoint ENTRYPOINT] [--commands COMMANDS] [--runnables RUNNABLES]
+                     [--allocationPolicy ALLOCATIONPOLICY] [--taskGroups TASKGROUPS] [--labels LABELS] [--gcloud GCLOUD]
+                     [--name NAME] [--profile PROFILE] [--version]
+                     [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,debug,info,warning,error,critical}]
+                     ...
+
+ Simplify running commands via Google Cloud Batch.
+
+ Key Options:
+   The key options to run the command.
+
+   --workdir WORKDIR     The workdir (a Google Storage Bucket path is required) to store the meta information of the
+                         daemon pipeline.
+                         If not provided, the one from the command will be used.
+   command               The command passed after `--` to run, with all its arguments. Note that the command should be
+                         provided after `--`.
+
+ Scheduler Options:
+   The options to configure the gbatch scheduler.
+
+   --error-strategy {retry,halt}
+                         The strategy to apply when an error occurs [default: halt]
+   --num-retries NUM_RETRIES
+                         The number of retries when an error occurs. Only valid when --error-strategy is 'retry'.
+                         [default: 0]
+   --prescript PRESCRIPT
+                         The prescript to run before the main command.
+   --postscript POSTSCRIPT
+                         The postscript to run after the main command.
+   --jobname-prefix JOBNAME_PREFIX
+                         The prefix of the name of the daemon job.
+                         If not provided, try to generate one from the command to run.
+                         If the command is also not provided, use 'pipen-gbatch-daemon' as the prefix.
+   --recheck-interval RECHECK_INTERVAL
+                         The interval to recheck the job status, in units of about 0.1 seconds each. [default: 600]
+   --cwd CWD             The working directory to run the command. If not provided, the current directory is used. You
+                         can pass either a mounted path (inside the VM) or a Google Storage Bucket path (gs://...). If a
+                         Google Storage Bucket path is provided, the mounted path will be inferred from the mounted paths
+                         of the VM.
+   --project PROJECT     The Google Cloud project to run the job.
+   --location LOCATION   The location to run the job.
+   --mount MOUNT         The list of mounts to mount to the VM, each in the format of SOURCE:TARGET, where SOURCE must be
+                         a Google Storage Bucket path (gs://...). [default: []]
+   --service-account SERVICE_ACCOUNT
+                         The service account to run the job.
+   --network NETWORK     The network to run the job.
+   --subnetwork SUBNETWORK
+                         The subnetwork to run the job.
+   --no-external-ip-address
+                         Whether to disable external IP address for the VM.
+   --machine-type MACHINE_TYPE
+                         The machine type of the VM.
+   --provisioning-model {STANDARD,SPOT}
+                         The provisioning model of the VM.
+   --image-uri IMAGE_URI
+                         The custom image URI of the VM.
+   --entrypoint ENTRYPOINT
+                         The entry point of the container to run the command.
+   --commands COMMANDS   The list of commands to run in the container, each as a separate string. [default: []]
+   --runnables RUNNABLES
+                         The JSON string of extra settings of runnables to add to the job.json.
+                         Refer to https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs#Runnable
+                         for details.
+                         You can have an extra key 'order' for each runnable, where negative values mean to run before
+                         the main command,
+                         and positive values mean to run after the main command.
+   --allocationPolicy ALLOCATIONPOLICY
+                         The JSON string of extra settings of allocationPolicy to add to the job.json. Refer to
+                         https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs#AllocationPolicy
+                         for details. [default: {}]
+   --taskGroups TASKGROUPS
+                         The JSON string of extra settings of taskGroups to add to the job.json. Refer to
+                         https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs#TaskGroup for
+                         details. [default: []]
+   --labels LABELS       The JSON string of labels to add to the job. Refer to
+                         https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs#Job.FIELDS.labels
+                         for details. [default: {}]
+   --gcloud GCLOUD       The path to the gcloud command. [default: gcloud]
+
+ Options:
+   -h, --help            show this help message and exit
+   --nowait              Run the command in a detached mode without waiting for its completion. [default: False]
+   --view-logs {all,stdout,stderr}
+                         View the logs of a job.
+   --name NAME           The name of the daemon pipeline.
+                         If not provided, try to generate one from the command to run.
+                         If the command is also not provided, use 'PipenCliGbatchDaemon' as the name.
+   --profile PROFILE     Use the `scheduler_opts` as the Scheduler Options of a given profile from pipen configuration
+                         files,
+                         including ~/.pipen.toml and ./.pipen.toml.
+                         Note that if not provided, nothing will be loaded from the configuration files.
+   --version             Show the version of the pipen-cli-gbatch package. [default: False]
+   --loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,debug,info,warning,error,critical}
+                         Set the logging level for the daemon process. [default: INFO]
+
+ Examples:
+
+   # Run a command and wait for it to complete
+   > pipen gbatch --workdir gs://my-bucket/workdir -- \
+     python myscript.py --input input.txt --output output.txt
+
+   # Use named mounts
+   > pipen gbatch --workdir gs://my-bucket/workdir --mount INFILE=gs://bucket/path/to/file \
+     --mount OUTDIR=gs://bucket/path/to/outdir -- \
+     cat $INFILE > $OUTDIR/output.txt
+
+   # Run a command in a detached mode
+   > pipen gbatch --nowait --project $PROJECT --location $LOCATION \
+     --workdir gs://my-bucket/workdir -- \
+     python myscript.py --input input.txt --output output.txt
+
+   # If you have a profile defined in ~/.pipen.toml or ./.pipen.toml
+   > pipen gbatch --profile myprofile -- \
+     python myscript.py --input input.txt --output output.txt
+
+   # View the logs of a previously run command
+   > pipen gbatch --view-logs all --name my-daemon-name \
+     --workdir gs://my-bucket/workdir
+ ```
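
The help text for `--runnables` says each extra runnable may carry an `order` key, with negative values running before the main command and positive values after it. A minimal sketch of that ordering rule follows. The function `order_runnables` is hypothetical, and two details are assumptions the help text does not settle: a missing or zero `order` is treated as "after", and the `order` key is removed from the final list. The `{"script": {"text": ...}}` shape follows the Runnable reference linked in the help text.

```python
import json
from typing import Any, Dict, List

Runnable = Dict[str, Any]


def order_runnables(main: Runnable, extras_json: str) -> List[Runnable]:
    """Place extra runnables around the main one according to 'order'."""
    extras: List[Runnable] = json.loads(extras_json)

    def rank(runnable: Runnable) -> int:
        # Assumption: a runnable without 'order' behaves like order 0.
        return runnable.get("order", 0)

    def without_order(runnable: Runnable) -> Runnable:
        # Assumption: 'order' is a helper key and is not kept in the job spec.
        return {k: v for k, v in runnable.items() if k != "order"}

    before = sorted((r for r in extras if rank(r) < 0), key=rank)
    after = sorted((r for r in extras if rank(r) >= 0), key=rank)
    return (
        [without_order(r) for r in before]
        + [main]
        + [without_order(r) for r in after]
    )


main_runnable = {"script": {"text": "python myscript.py"}}
extras = json.dumps(
    [
        {"order": 1, "script": {"text": "echo done"}},
        {"order": -1, "script": {"text": "echo setup"}},
    ]
)
ordered = order_runnables(main_runnable, extras)
for runnable in ordered:
    print(runnable["script"]["text"])
```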
+
+ ## API
+
+ The API can also be used to run commands programmatically:
+
+ ```python
+ import asyncio
+ from pipen_cli_gbatch import CliGbatchDaemon
+
+ # `config_for_daemon` and `command` are placeholders that the caller must define:
+ # the configuration for the daemon and the command to run, respectively.
+ pipe = CliGbatchDaemon(config_for_daemon, command)
+ asyncio.run(pipe.run())
+ ```
+
+ Note that the daemon pipeline always runs without caching, so the command is executed every time the pipeline is run.
+