cloudos-cli 2.32.1__tar.gz → 2.33.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (34)
  1. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/PKG-INFO +93 -1
  2. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/README.md +92 -0
  3. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/__main__.py +367 -3
  4. cloudos_cli-2.33.0/cloudos_cli/_version.py +1 -0
  5. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/jobs/job.py +206 -2
  6. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/PKG-INFO +93 -1
  7. cloudos_cli-2.32.1/cloudos_cli/_version.py +0 -1
  8. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/LICENSE +0 -0
  9. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/__init__.py +0 -0
  10. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/clos.py +0 -0
  11. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/configure/__init__.py +0 -0
  12. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/configure/configure.py +0 -0
  13. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/datasets/__init__.py +0 -0
  14. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/datasets/datasets.py +0 -0
  15. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/import_wf/__init__.py +0 -0
  16. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/import_wf/import_wf.py +0 -0
  17. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/jobs/__init__.py +0 -0
  18. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/queue/__init__.py +0 -0
  19. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/queue/queue.py +0 -0
  20. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/utils/__init__.py +0 -0
  21. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/utils/cloud.py +0 -0
  22. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/utils/details.py +0 -0
  23. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/utils/errors.py +0 -0
  24. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/utils/requests.py +0 -0
  25. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/utils/resources.py +0 -0
  26. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/SOURCES.txt +0 -0
  27. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/dependency_links.txt +0 -0
  28. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/entry_points.txt +0 -0
  29. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/requires.txt +0 -0
  30. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/top_level.txt +0 -0
  31. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/setup.cfg +0 -0
  32. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/setup.py +0 -0
  33. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/tests/__init__.py +0 -0
  34. {cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/tests/functions_for_pytest.py +0 -0
{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: cloudos_cli
-Version: 2.32.1
+Version: 2.33.0
 Summary: Python package for interacting with CloudOS
 Home-page: https://github.com/lifebit-ai/cloudos-cli
 Author: David Piñeyro
@@ -420,6 +420,98 @@ command.
 Other options like `--wait-completion` are also available and work in the same way as for the `cloudos job run` command.
 Check `cloudos bash job --help` for more details.
 
+#### Send a bash array-job to CloudOS (parallel sample processing)
+
+When running a bash array job with `cloudos bash array-job`, the following options customize its behavior:
+
+##### Array File
+- **`--array-file`**: Path to the array file, a delimited text file whose header row names the columns used by the bash array job. This option is **required** when using the `bash array-job` command.
+
+##### Separator
+- **`--separator`**: Defines the separator used in the array file. Supported separators:
+- `,` (comma)
+- `;` (semicolon)
+- `tab`
+- `space`
+- `|` (pipe)
+This option is **required** when using the `bash array-job` command.
+
+##### List Columns
+- **`--list-columns`**: Lists the columns available in the array file, which is useful for inspecting its structure. With this flag, no job is sent; the CLI only prints the column names, one per line:
+
+```console
+Columns:
+- column1
+- column2
+- column3
+```
+
+##### Array File Project
+- **`--array-file-project`**: Name of the project in which the array file is stored, if it differs from the project given with `--project-name`.
+
+##### Disable Column Check
+- **`--disable-column-check`**: Disables validation of the array file columns, so the values given with `--array-parameter` are not checked against the header of the `--array-file`. For example, without `--disable-column-check`, `--array-parameter --bar=foo` requires the array file header to contain a column named 'foo'; if it does not, the CLI throws an error. With `--disable-column-check`, the check is skipped and the bash array job is sent to the platform.
+
+> [!NOTE]
+> With `--disable-column-check` the CLI command runs without errors, but if a column referenced with `--array-parameter` does not exist in the array file, errors may appear later, when the job runs on the platform.
+
+##### Array Parameter
+- **`-a` / `--array-parameter`**: Maps a parameter name to a column in the header of the array file. Each value must be in the format `array_parameter_name=array_file_column`. For example:
+- `-a --test=value` or
+- `--array-parameter -test=value`
+both specify a column named 'value' in the array file header. Adding array parameters that are not present in the header causes an error (unless `--disable-column-check` is set). This option can be repeated to include as many array parameters as needed. It is similar to `-p, --parameter`: both kinds of parameter can be interpolated in the bash array job command (given either with `--command` or `--custom-script-path`), but an array parameter can only name a column present in the header of the array file.
+
+For example, suppose the array file has the following content:
+
+```console
+id,bgen,csv
+1,s3://data/adipose.bgen,s3://data/adipose.csv
+2,s3://data/blood.bgen,s3://data/blood.csv
+3,s3://data/brain.bgen,s3://data/brain.csv
+...
+```
+and the command needs to iterate over the `bgen` column. This can be specified as `--array-parameter file=bgen`, which refers to the `bgen` column in the header.
+
+##### Custom Script Path
+- **`--custom-script-path`**: Path to a custom script to run in the bash array job instead of a command. When this option is set, `--command` is ignored. To ensure the script runs successfully, you must either:
+
+1. Use a shebang line at the top of the script
+
+The shebang (`#!`) tells the system which interpreter to use to run the script. The path must match the absolute path of the Python (or other) interpreter installed inside the Docker container.
+
+Examples:
+- `#!/usr/bin/python3` for Python scripts
+- `#!/usr/bin/Rscript` for R scripts
+- `#!/bin/bash` for Bash scripts
+
+Example Python script:
+
+```python
+#!/usr/bin/python3
+print("Hello world")
+```
+
+2. Or use an interpreter command in the executable field
+
+If your script doesn’t have a shebang line, you can execute it by explicitly specifying the interpreter in the executable command:
+
+```console
+python my_script.py
+Rscript my_script.R
+bash my_script.sh
+```
+This assumes the interpreter is available on the container’s `$PATH`. If it is not, use the full absolute path instead:
+
+```console
+/usr/bin/python3 my_script.py
+/usr/local/bin/Rscript my_script.R
+```
+
+##### Custom Script Project
+- **`--custom-script-project`**: Name of the project in which the custom script is stored, if it differs from the project given with `--project-name`.
+
+These options provide flexibility for configuring and running bash array jobs, allowing you to tailor the execution to specific requirements.
+
 #### Get path to logs of job from CloudOS
 
 Get the path to "Nextflow logs", "Nextflow standard output", and "trace" files. It can be used only on your user's jobs, with any status.
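The column check and `--list-columns` behaviour documented above can be approximated locally with Python's standard `csv` module. This is a minimal sketch, not the cloudos-cli implementation. The CLI obtains the header through the CloudOS API, and `read_header` and `check_array_parameters` are hypothetical helpers written only for this illustration. The sketch takes the column to be everything after the first `=`, which matches how `run_bash_array_job` splits each `--array-parameter` value.

```python
import csv
import io

# Example array file copied from the documentation above (comma-separated).
ARRAY_FILE = """id,bgen,csv
1,s3://data/adipose.bgen,s3://data/adipose.csv
2,s3://data/blood.bgen,s3://data/blood.csv
3,s3://data/brain.bgen,s3://data/brain.csv
"""


def read_header(text, delimiter=","):
    """Return the column names found in the first row of a delimited file."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return next(reader)


def check_array_parameters(array_parameters, header):
    """Map each 'name=column' string to its column, rejecting unknown columns.

    The column is everything after the first '=', so '--test=value',
    '-test=value' and 'test=value' all refer to the column 'value'.
    """
    mapping = {}
    for ap in array_parameters:
        name, _, column = ap.partition("=")
        if column not in header:
            raise ValueError(f"Column '{column}' not found in the array file. "
                             f"Columns in array-file: {', '.join(header)}")
        mapping[name] = column
    return mapping


if __name__ == "__main__":
    header = read_header(ARRAY_FILE)
    # Roughly what --list-columns reports, one column per line.
    print("Columns:")
    for col in header:
        print(f"- {col}")
    # --array-parameter file=bgen passes the check; file=vcf would raise.
    print(check_array_parameters(["file=bgen"], header))
```

Run as a script, this prints the three column names followed by `{'file': 'bgen'}`. Passing `--disable-column-check` corresponds to skipping `check_array_parameters` entirely.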
{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/README.md
@@ -385,6 +385,98 @@ command.
 Other options like `--wait-completion` are also available and work in the same way as for the `cloudos job run` command.
 Check `cloudos bash job --help` for more details.
 
+#### Send a bash array-job to CloudOS (parallel sample processing)
+
+When running a bash array job with `cloudos bash array-job`, the following options customize its behavior:
+
+##### Array File
+- **`--array-file`**: Path to the array file, a delimited text file whose header row names the columns used by the bash array job. This option is **required** when using the `bash array-job` command.
+
+##### Separator
+- **`--separator`**: Defines the separator used in the array file. Supported separators:
+- `,` (comma)
+- `;` (semicolon)
+- `tab`
+- `space`
+- `|` (pipe)
+This option is **required** when using the `bash array-job` command.
+
+##### List Columns
+- **`--list-columns`**: Lists the columns available in the array file, which is useful for inspecting its structure. With this flag, no job is sent; the CLI only prints the column names, one per line:
+
+```console
+Columns:
+- column1
+- column2
+- column3
+```
+
+##### Array File Project
+- **`--array-file-project`**: Name of the project in which the array file is stored, if it differs from the project given with `--project-name`.
+
+##### Disable Column Check
+- **`--disable-column-check`**: Disables validation of the array file columns, so the values given with `--array-parameter` are not checked against the header of the `--array-file`. For example, without `--disable-column-check`, `--array-parameter --bar=foo` requires the array file header to contain a column named 'foo'; if it does not, the CLI throws an error. With `--disable-column-check`, the check is skipped and the bash array job is sent to the platform.
+
+> [!NOTE]
+> With `--disable-column-check` the CLI command runs without errors, but if a column referenced with `--array-parameter` does not exist in the array file, errors may appear later, when the job runs on the platform.
+
+##### Array Parameter
+- **`-a` / `--array-parameter`**: Maps a parameter name to a column in the header of the array file. Each value must be in the format `array_parameter_name=array_file_column`. For example:
+- `-a --test=value` or
+- `--array-parameter -test=value`
+both specify a column named 'value' in the array file header. Adding array parameters that are not present in the header causes an error (unless `--disable-column-check` is set). This option can be repeated to include as many array parameters as needed. It is similar to `-p, --parameter`: both kinds of parameter can be interpolated in the bash array job command (given either with `--command` or `--custom-script-path`), but an array parameter can only name a column present in the header of the array file.
+
+For example, suppose the array file has the following content:
+
+```console
+id,bgen,csv
+1,s3://data/adipose.bgen,s3://data/adipose.csv
+2,s3://data/blood.bgen,s3://data/blood.csv
+3,s3://data/brain.bgen,s3://data/brain.csv
+...
+```
+and the command needs to iterate over the `bgen` column. This can be specified as `--array-parameter file=bgen`, which refers to the `bgen` column in the header.
+
+##### Custom Script Path
+- **`--custom-script-path`**: Path to a custom script to run in the bash array job instead of a command. When this option is set, `--command` is ignored. To ensure the script runs successfully, you must either:
+
+1. Use a shebang line at the top of the script
+
+The shebang (`#!`) tells the system which interpreter to use to run the script. The path must match the absolute path of the Python (or other) interpreter installed inside the Docker container.
+
+Examples:
+- `#!/usr/bin/python3` for Python scripts
+- `#!/usr/bin/Rscript` for R scripts
+- `#!/bin/bash` for Bash scripts
+
+Example Python script:
+
+```python
+#!/usr/bin/python3
+print("Hello world")
+```
+
+2. Or use an interpreter command in the executable field
+
+If your script doesn’t have a shebang line, you can execute it by explicitly specifying the interpreter in the executable command:
+
+```console
+python my_script.py
+Rscript my_script.R
+bash my_script.sh
+```
+This assumes the interpreter is available on the container’s `$PATH`. If it is not, use the full absolute path instead:
+
+```console
+/usr/bin/python3 my_script.py
+/usr/local/bin/Rscript my_script.R
+```
+
+##### Custom Script Project
+- **`--custom-script-project`**: Name of the project in which the custom script is stored, if it differs from the project given with `--project-name`.
+
+These options provide flexibility for configuring and running bash array jobs, allowing you to tailor the execution to specific requirements.
+
 #### Get path to logs of job from CloudOS
 
 Get the path to "Nextflow logs", "Nextflow standard output", and "trace" files. It can be used only on your user's jobs, with any status.
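The two ways of running a custom script described above differ only in where the interpreter comes from. One source is the script's own shebang line. The other is an interpreter named explicitly in front of the script. The hypothetical helper below sketches that decision. It is not part of cloudos-cli, and the extension-to-interpreter table is an assumption based on the documentation's examples. The sketch builds an explicit command line from the shebang instead of relying on the kernel, so it only approximates how a container would execute the script.

```python
from pathlib import Path

# Hypothetical extension-to-interpreter table, based on the examples above.
INTERPRETERS = {".py": "python", ".R": "Rscript", ".sh": "bash"}


def shebang_interpreter(script_text):
    """Return the interpreter named on a '#!' first line, or None if absent."""
    lines = script_text.splitlines()
    if lines and lines[0].startswith("#!"):
        return lines[0][2:].strip()
    return None


def build_command(script_path, script_text):
    """Return a command line that runs the script with an explicit interpreter.

    Option 1: use the interpreter from the shebang line, if there is one.
    Option 2: otherwise pick an interpreter from the file extension, which
    must then be available on the container's $PATH.
    """
    interpreter = shebang_interpreter(script_text) or INTERPRETERS.get(Path(script_path).suffix)
    if interpreter is None:
        raise ValueError(f"No shebang line and no known interpreter for {script_path!r}")
    return f"{interpreter} {script_path}"


if __name__ == "__main__":
    print(build_command("my_script.py", '#!/usr/bin/python3\nprint("Hello world")\n'))
    print(build_command("my_script.R", 'cat("Hello world\\n")\n'))
```

The first call resolves to `/usr/bin/python3 my_script.py`, the absolute-path form shown above. The second falls back to `Rscript my_script.R`.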
{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/__main__.py
@@ -16,7 +16,7 @@ from rich.table import Table
 from cloudos_cli.datasets import Datasets
 from cloudos_cli.utils.resources import ssl_selector, format_bytes
 from rich.style import Style
-from cloudos_cli.utils.details import get_path
+from cloudos_cli.utils.details import get_path
 
 
 # GLOBAL VARS
@@ -82,7 +82,8 @@ def run_cloudos_cli(ctx):
             'list': shared_config
         },
         'bash': {
-            'job': shared_config
+            'job': shared_config,
+            'array-job': shared_config
         },
         'datasets': {
             'ls': shared_config,
@@ -128,7 +129,8 @@ def run_cloudos_cli(ctx):
             'list': shared_config
         },
         'bash': {
-            'job': shared_config
+            'job': shared_config,
+            'array-job': shared_config
         },
         'datasets': {
             'ls': shared_config,
@@ -2185,6 +2187,368 @@ def run_bash_job(ctx,
               f'\t\t--job-id {j_id}\n')
 
 
+@bash.command('array-job')
+@click.option('-k',
+              '--apikey',
+              help='Your CloudOS API key',
+              required=True)
+@click.option('--command',
+              help='The command to run in the bash job.')
+@click.option('-c',
+              '--cloudos-url',
+              help=(f'The CloudOS url you are trying to access. Default={CLOUDOS_URL}.'),
+              default=CLOUDOS_URL)
+@click.option('--workspace-id',
+              help='The specific CloudOS workspace id.',
+              required=True)
+@click.option('--project-name',
+              help='The name of a CloudOS project.',
+              required=True)
+@click.option('--workflow-name',
+              help='The name of a CloudOS workflow or pipeline.',
+              required=True)
+@click.option('-p',
+              '--parameter',
+              multiple=True,
+              help=('A single parameter to pass to the job call. It should be in the ' +
+                    'following form: parameter_name=parameter_value. E.g.: ' +
+                    '-p --test=value or -p -test=value or -p test=value. You can use this option as many ' +
+                    'times as parameters you want to include.'))
+@click.option('--job-name',
+              help='The name of the job. Default=new_job.',
+              default='new_job')
+@click.option('--do-not-save-logs',
+              help=('Avoids process log saving. If you select this option, your job process ' +
+                    'logs will not be stored.'),
+              is_flag=True)
+@click.option('--job-queue',
+              help='Name of the job queue to use with a batch job.')
+@click.option('--instance-type',
+              help=('The type of compute instance to use as master node. ' +
+                    'Default=c5.xlarge(aws)|Standard_D4as_v4(azure).'),
+              default='NONE_SELECTED')
+@click.option('--instance-disk',
+              help='The disk space of the master node instance, in GB. Default=500.',
+              type=int,
+              default=500)
+@click.option('--cpus',
+              help='The number of CPUs to use for the task\'s master node. Default=1.',
+              type=int,
+              default=1)
+@click.option('--memory',
+              help='The amount of memory, in GB, to use for the task\'s master node. Default=4.',
+              type=int,
+              default=4)
+@click.option('--storage-mode',
+              help=('Either \'lustre\' or \'regular\'. Indicates if the user wants to select ' +
+                    'regular or lustre storage. Default=regular.'),
+              default='regular')
+@click.option('--lustre-size',
+              help=('The lustre storage to be used when --storage-mode=lustre, in GB. It should ' +
+                    'be 1200 or a multiple of it. Default=1200.'),
+              type=int,
+              default=1200)
+@click.option('--wait-completion',
+              help=('Whether to wait for job completion and report final ' +
+                    'job status.'),
+              is_flag=True)
+@click.option('--wait-time',
+              help=('Max time to wait (in seconds) for job completion. ' +
+                    'Default=3600.'),
+              default=3600)
+@click.option('--repository-platform', type=click.Choice(["github", "gitlab", "bitbucketServer"]),
+              help='Name of the repository platform of the workflow. Default=github.',
+              default='github')
+@click.option('--execution-platform',
+              help='Name of the execution platform implemented in your CloudOS. Default=aws.',
+              type=click.Choice(['aws', 'azure', 'hpc']),
+              default='aws')
+@click.option('--cost-limit',
+              help='Add a cost limit to your job. Default=30.0 (For no cost limit please use -1).',
+              type=float,
+              default=30.0)
+@click.option('--request-interval',
+              help=('Time interval to request (in seconds) the job status. ' +
+                    'For large jobs it is important to use a high number to ' +
+                    'make fewer requests so that it is not considered spamming by the API. ' +
+                    'Default=30.'),
+              default=30)
+@click.option('--disable-ssl-verification',
+              help=('Disable SSL certificate verification. Please, remember that this option is ' +
+                    'not generally recommended for security reasons.'),
+              is_flag=True)
+@click.option('--ssl-cert',
+              help='Path to your SSL certificate file.')
+@click.option('--profile', help='Profile to use from the config file', default=None)
+@click.option('--array-file',
+              help=('Path to a delimited file with a header row, containing the columns used by the bash array job.'),
+              default=None,
+              required=True)
+@click.option('--separator',
+              help=('Separator to use in the array file. Default=",".'),
+              type=click.Choice([',', ';', 'tab', 'space', '|']),
+              default=",",
+              required=True)
+@click.option('--list-columns',
+              help=('List columns present in the array file. ' +
+                    'This option will not run any job.'),
+              is_flag=True)
+@click.option('--array-file-project',
+              help=('Name of the project in which the array file is placed, if different from --project-name.'),
+              default=None)
+@click.option('--disable-column-check',
+              help=('Disable the check that every --array-parameter column exists ' +
+                    'in the array file header.'),
+              is_flag=True)
+@click.option('-a', '--array-parameter',
+              multiple=True,
+              help=('A single array parameter, mapping a job parameter to a column of the array file. It should be in the ' +
+                    'following form: parameter_name=array_file_column. E.g.: ' +
+                    '-a --test=value or -a -test=value or -a test=value. You can use this option as many ' +
+                    'times as array parameters you want to include.'))
+@click.option('--custom-script-path',
+              help=('Path of a custom script to run in the bash array job instead of a command.'),
+              default=None)
+@click.option('--custom-script-project',
+              help=('Name of the project to use when running the custom command script, if ' +
+                    'different than --project-name.'),
+              default=None)
+@click.pass_context
+def run_bash_array_job(ctx,
+                       apikey,
+                       command,
+                       cloudos_url,
+                       workspace_id,
+                       project_name,
+                       workflow_name,
+                       parameter,
+                       job_name,
+                       do_not_save_logs,
+                       job_queue,
+                       instance_type,
+                       instance_disk,
+                       cpus,
+                       memory,
+                       storage_mode,
+                       lustre_size,
+                       wait_completion,
+                       wait_time,
+                       repository_platform,
+                       execution_platform,
+                       cost_limit,
+                       request_interval,
+                       disable_ssl_verification,
+                       ssl_cert,
+                       profile,
+                       array_file,
+                       separator,
+                       list_columns,
+                       array_file_project,
+                       disable_column_check,
+                       array_parameter,
+                       custom_script_path,
+                       custom_script_project):
+    """Run a bash array job in CloudOS."""
+    profile = profile or ctx.default_map['bash']['array-job']['profile']
+
+    # Create a dictionary with required and non-required params
+    required_dict = {
+        'apikey': True,
+        'workspace_id': True,
+        'workflow_name': True,
+        'project_name': True
+    }
+
+    # determine if the user provided all required parameters
+    config_manager = ConfigurationProfile()
+    apikey, cloudos_url, workspace_id, workflow_name, repository_platform, execution_platform, project_name = (
+        config_manager.load_profile_and_validate_data(
+            ctx,
+            INIT_PROFILE,
+            CLOUDOS_URL,
+            profile=profile,
+            required_dict=required_dict,
+            apikey=apikey,
+            cloudos_url=cloudos_url,
+            workspace_id=workspace_id,
+            workflow_name=workflow_name,
+            repository_platform=repository_platform,
+            execution_platform=execution_platform,
+            project_name=project_name
+        )
+    )
+    verify_ssl = ssl_selector(disable_ssl_verification, ssl_cert)
+
+    if not list_columns and not (command or custom_script_path):
+        raise click.UsageError("Must provide --command or --custom-script-path if --list-columns is not set.")
+
+    # when not set, use the global project name
+    if array_file_project is None:
+        array_file_project = project_name
+
+    # this needs to be in another call to datasets, by default it uses the global project name
+    if custom_script_project is None:
+        custom_script_project = project_name
+
+    # setup separators for API and array file (they're different)
+    separators = {
+        ",": {"api": ",", "file": ","},
+        ";": {"api": "%3B", "file": ";"},
+        "space": {"api": "+", "file": " "},
+        "tab": {"api": "tab", "file": "tab"},
+        "|": {"api": "%7C", "file": "|"}
+    }
+
+    # Setup datasets
+    try:
+        ds = Datasets(
+            cloudos_url=cloudos_url,
+            apikey=apikey,
+            workspace_id=workspace_id,
+            project_name=array_file_project,
+            verify=verify_ssl,
+            cromwell_token=None
+        )
+        if custom_script_project is not None:
+            # If a custom script project is specified, create a new Datasets object for it
+            # This allows the user to run custom scripts in a different project
+            ds_custom = Datasets(
+                cloudos_url=cloudos_url,
+                apikey=apikey,
+                workspace_id=workspace_id,
+                project_name=custom_script_project,
+                verify=verify_ssl,
+                cromwell_token=None
+            )
+    except BadRequestException as e:
+        if 'Forbidden' in str(e):
+            print('[Error] It seems your call is not authorised. Please check if ' +
+                  'your workspace is restricted by Airlock and if your API key is valid.')
+            sys.exit(1)
+        else:
+            raise e
+
+    # setup important options for the job
+    if do_not_save_logs:
+        save_logs = False
+    else:
+        save_logs = True
+
+    if instance_type == 'NONE_SELECTED':
+        if execution_platform == 'aws':
+            instance_type = 'c5.xlarge'
+        elif execution_platform == 'azure':
+            instance_type = 'Standard_D4as_v4'
+        else:
+            instance_type = None
+
+    j = jb.Job(cloudos_url, apikey, None, workspace_id, project_name, workflow_name,
+               mainfile=None, importsfile=None,
+               repository_platform=repository_platform, verify=verify_ssl)
+
+    # retrieve columns
+    r = j.retrieve_cols_from_array_file(array_file, ds, separators[separator]['api'], verify_ssl)
+
+    if not disable_column_check:
+        columns = json.loads(r.content).get("headers", None)
+        # pass this to the SEND JOB API call
+        # b'{"headers":[{"index":0,"name":"id"},{"index":1,"name":"title"},{"index":2,"name":"filename"},{"index":3,"name":"file2name"}]}'
+        if columns is None:
+            raise ValueError("No columns found in the array file metadata.")
+        if list_columns:
+            print("Columns: ")
+            for col in columns:
+                print(f"\t- {col['name']}")
+            return
+    else:
+        columns = []
+
+    # setup parameters for the job
+    cmd = j.setup_params_array_file(custom_script_path, ds_custom, command, separators[separator]['file'])
+
+    # check columns in the array file vs parameters added
+    if not disable_column_check and array_parameter:
+        print("\nChecking columns in the array file vs parameters added...\n")
+        for ap in array_parameter:
+            ap_split = ap.split('=')
+            ap_value = '='.join(ap_split[1:])
+            for col in columns:
+                if col['name'] == ap_value:
+                    print(f"Found column '{ap_value}' in the array file.")
+                    break
+            else:
+                raise ValueError(f"Column '{ap_value}' not found in the array file. " +
+                                 "Columns in array-file: " + ", ".join([col['name'] for col in columns]))
+
+    if job_queue is not None:
+        batch = True
+        queue = Queue(cloudos_url=cloudos_url, apikey=apikey, cromwell_token=None,
+                      workspace_id=workspace_id, verify=verify_ssl)
+        # I have to add 'nextflow', otherwise the job queue id is not found
+        job_queue_id = queue.fetch_job_queue_id(workflow_type='nextflow', batch=batch,
+                                                job_queue=job_queue)
+    else:
+        job_queue_id = None
+        batch = False
+
+    # send job
+    j_id = j.send_job(job_config=None,
+                      parameter=parameter,
+                      array_parameter=array_parameter,
+                      array_file_header=columns,
+                      git_commit=None,
+                      git_tag=None,
+                      git_branch=None,
+                      job_name=job_name,
+                      resumable=False,
+                      save_logs=save_logs,
+                      batch=batch,
+                      job_queue_id=job_queue_id,
+                      workflow_type='docker',
+                      nextflow_profile=None,
+                      nextflow_version=None,
+                      instance_type=instance_type,
+                      instance_disk=instance_disk,
+                      storage_mode=storage_mode,
+                      lustre_size=lustre_size,
+                      execution_platform=execution_platform,
+                      hpc_id=None,
+                      cost_limit=cost_limit,
+                      verify=verify_ssl,
+                      command=cmd,
+                      cpus=cpus,
+                      memory=memory)
+
+    print(f'\tYour assigned job id is: {j_id}\n')
+    j_url = f'{cloudos_url}/app/advanced-analytics/analyses/{j_id}'
+    if wait_completion:
+        print('\tPlease, wait until job completion (max wait time of ' +
+              f'{wait_time} seconds).\n')
+        j_status = j.wait_job_completion(job_id=j_id,
+                                         wait_time=wait_time,
+                                         request_interval=request_interval,
+                                         verbose=False,
+                                         verify=verify_ssl)
+        j_name = j_status['name']
+        j_final_s = j_status['status']
+        if j_final_s == JOB_COMPLETED:
+            print(f'\nJob status for job "{j_name}" (ID: {j_id}): {j_final_s}')
+            sys.exit(0)
+        else:
+            print(f'\nJob status for job "{j_name}" (ID: {j_id}): {j_final_s}')
+            sys.exit(1)
+    else:
+        j_status = j.get_job_status(j_id, verify_ssl)
+        j_status_h = json.loads(j_status.content)["status"]
+        print(f'\tYour current job status is: {j_status_h}')
+        print('\tTo further check your job status you can either go to ' +
+              f'{j_url} or use the following command:\n' +
+              '\tcloudos job status \\\n' +
+              '\t\t--apikey $MY_API_KEY \\\n' +
+              f'\t\t--cloudos-url {cloudos_url} \\\n' +
+              f'\t\t--job-id {j_id}\n')
+
+
 @datasets.command(name="ls")
 @click.argument("path", required=False, nargs=1)
 @click.option('-k',
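The `separators` table in `run_bash_array_job` stores two spellings for each `--separator` choice. A quick check with the standard-library `urllib.parse.quote_plus` suggests that the `api` values for `;`, `|` and `space` are the URL form-encodings of those characters. By contrast, `,` is sent unencoded and `tab` is sent by name. This reading is an inference from the values themselves, not documented CloudOS API behaviour.

```python
from urllib.parse import quote_plus

# Copy of the `separators` table in run_bash_array_job: "api" is the value
# sent in the request, "file" is the delimiter used for the array file.
SEPARATORS = {
    ",": {"api": ",", "file": ","},
    ";": {"api": "%3B", "file": ";"},
    "space": {"api": "+", "file": " "},
    "tab": {"api": "tab", "file": "tab"},
    "|": {"api": "%7C", "file": "|"},
}

if __name__ == "__main__":
    for choice, forms in SEPARATORS.items():
        # quote_plus gives the URL form-encoding of the delimiter character.
        print(f"{choice!r:8} file={forms['file']!r:6} api={forms['api']!r:7} "
              f"quote_plus={quote_plus(forms['file'])!r}")
```

This also explains why a user-facing message should not be joined with `f"{separator}"`. That expression gives the CLI choice name (`tab`, `space`), not the delimiter character.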
cloudos_cli-2.33.0/cloudos_cli/_version.py
@@ -0,0 +1 @@
+__version__ = '2.33.0'
{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/jobs/job.py
@@ -7,7 +7,9 @@ from typing import Union
 import json
 from cloudos_cli.clos import Cloudos
 from cloudos_cli.utils.errors import BadRequestException
-from cloudos_cli.utils.requests import retry_requests_post
+from cloudos_cli.utils.requests import retry_requests_post, retry_requests_get
+from pathlib import Path
+import base64
 
 
 @dataclass
@@ -174,6 +176,8 @@ class Job(Cloudos):
     def convert_nextflow_to_json(self,
                                  job_config,
                                  parameter,
+                                 array_parameter,
+                                 array_file_header,
                                  is_module,
                                  example_parameters,
                                  git_commit,
@@ -214,6 +218,15 @@ class Job(Cloudos):
         parameter : tuple
             Tuple of strings indicating the parameters to pass to the pipeline call.
             They are in the following form: ('param1=param1val', 'param2=param2val', ...)
+        array_parameter : tuple
+            Tuple of strings mapping pipeline parameters to array file columns for array jobs.
+            They are in the following form: ('param1=column1', 'param2=column2', ...)
+        array_file_header : list of dict
+            The header of the array file, as a list of dicts with 'index' and 'name' keys.
+            It is used to add the necessary column index for array file columns.
+        is_module : bool
+            Whether the job is a module or not. If True, the job will be
+            submitted as a module.
         example_parameters : list
             A list of dicts, with the parameters required for the API request in JSON format.
             It is typically used to run curated pipelines using the already available
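`array_file_header` receives the `headers` list returned by the array-file metadata call. Judging by the sample response quoted in a comment in `run_bash_array_job`, this is a list of dicts with `index` and `name` keys. The sketch below builds a name-to-index lookup of the kind the docstring alludes to. `column_index` is a hypothetical helper, and this diff does not show how `split_array_file_params` actually uses the header.

```python
import json

# Sample body copied from the comment in run_bash_array_job.
RESPONSE = (b'{"headers":[{"index":0,"name":"id"},{"index":1,"name":"title"},'
            b'{"index":2,"name":"filename"},{"index":3,"name":"file2name"}]}')


def column_index(headers):
    """Build a {column name: column index} lookup from a 'headers' list."""
    return {col["name"]: col["index"] for col in headers}


if __name__ == "__main__":
    # json.loads accepts bytes directly, as in run_bash_array_job.
    headers = json.loads(RESPONSE).get("headers", [])
    print(column_index(headers))
```

With the sample response this prints `{'id': 0, 'title': 1, 'filename': 2, 'file2name': 3}`. With `--disable-column-check` the CLI passes an empty list instead, which yields an empty lookup.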
@@ -353,6 +366,13 @@ class Job(Cloudos):
         if len(workflow_params) == 0:
             raise ValueError(f'The {job_config} file did not contain any ' +
                              'valid parameter')
+
+        # array file specific parameters (from --array-parameter)
+        if array_parameter is not None and len(array_parameter) > 0:
+            ap_param = Job.split_array_file_params(array_parameter, workflow_type, array_file_header)
+            workflow_params.append(ap_param)
+
+        # general parameters (from --parameter)
         if len(parameter) > 0:
             for p in parameter:
                 p_split = p.split('=')
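Both `-p/--parameter` and `-a/--array-parameter` accept `--test=value`, `-test=value` and `test=value` according to their help text. The sketch below normalises those spellings, assuming that leading dashes are simply dropped from the name. The body of `split_array_file_params` is not part of this diff, so this is not its implementation. The sketch splits at the first `=` so that a value may itself contain `=`, which is the same rule as `'='.join(ap_split[1:])` in `run_bash_array_job`.

```python
def split_param(param):
    """Split 'name=value' at the first '=' and drop leading dashes from the name.

    Hypothetical helper: accepts the three spellings listed in the
    --parameter / --array-parameter help text.
    """
    name, sep, value = param.partition("=")
    if not sep:
        raise ValueError(f"Expected name=value, got {param!r}")
    return name.lstrip("-"), value


if __name__ == "__main__":
    for spelling in ("--test=value", "-test=value", "test=value"):
        print(spelling, "->", split_param(spelling))
```

All three spellings print `('test', 'value')`.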
@@ -439,7 +459,7 @@ class Job(Cloudos):
                     "diskSizeInGb": azure_worker_instance_disk
                 }
         if workflow_type == 'docker':
-            params['command'] = command
+            params = params | command  # add command to params as dict (python 3.9+)
             params["resourceRequirements"] = {
                 "cpu": cpus,
                 "ram": memory
@@ -465,6 +485,8 @@ class Job(Cloudos):
465
485
  def send_job(self,
466
486
  job_config=None,
467
487
  parameter=(),
488
+ array_parameter=(),
489
+ array_file_header=None,
468
490
  is_module=False,
469
491
  example_parameters=[],
470
492
  git_commit=None,
@@ -504,6 +526,12 @@ class Job(Cloudos):
504
526
  parameter : tuple
505
527
  Tuple of strings indicating the parameters to pass to the pipeline call.
506
528
  They are in the following form: ('param1=param1val', 'param2=param2val', ...)
529
+ array_parameter : tuple
530
+ Tuple of strings indicating the parameters to pass to the pipeline call
531
+ for array jobs, in the form ('param1=column1', 'param2=column2', ...), where each value names a column of the array file.
532
+ array_file_header : list of dict
533
+ The header of the file containing the array parameters. It is used to
534
+ add the necessary column index for array file columns.
507
535
  example_parameters : list
508
536
  A list of dicts, with the parameters required for the API request in JSON format.
509
537
  It is typically used to run curated pipelines using the already available
@@ -590,6 +618,8 @@ class Job(Cloudos):
590
618
  }
591
619
  params = self.convert_nextflow_to_json(job_config,
592
620
  parameter,
621
+ array_parameter,
622
+ array_file_header,
593
623
  is_module,
594
624
  example_parameters,
595
625
  git_commit,
@@ -630,3 +660,177 @@ class Job(Cloudos):
630
660
  print('\tJob successfully launched to CloudOS, please check the ' +
631
661
  f'following link: {cloudos_url}/app/advanced-analytics/analyses/{j_id}')
632
662
  return j_id
663
+
664
+ def retrieve_cols_from_array_file(self, array_file, ds, separator, verify_ssl):
665
+ """
666
+ Retrieve metadata for columns from an array file stored in a directory.
667
+
668
+ This method fetches the metadata of an array file by interacting with a directory service
669
+ and making an API call to retrieve the file's metadata.
670
+
671
+ Parameters
672
+ ----------
673
+ array_file : str
674
+ The path to the array file whose metadata is to be retrieved.
675
+ ds : object
676
+ The directory service object used to list folder content.
677
+ separator : str
678
+ The separator used in the array file.
679
+ verify_ssl : bool
680
+ Whether to verify SSL certificates during the API request.
681
+
682
+ Raises
683
+ ------
684
+ ValueError
685
+ If the specified file is not found in the directory.
686
+ BadRequestException
687
+ If the API request to retrieve metadata fails with a status code >= 400.
688
+
689
+ Returns
690
+ -------
691
+ Response
692
+ The HTTP response object containing the metadata of the array file.
693
+ """
694
+ # Split the array_file path to get the directory and file name
695
+ p = Path(array_file)
696
+ directory = str(p.parent)
697
+ file_name = p.name
698
+
699
+ # fetch the content of the directory
700
+ result = ds.list_folder_content(directory)
701
+
702
+ # retrieve the S3 bucket name and object key for the specified file
703
+ for file in result['files']:
704
+ if file.get("name") == file_name:
705
+ self.array_file_id = file.get("_id")
706
+ s3_bucket_name = file.get("s3BucketName")
707
+ s3_object_key = file.get("s3ObjectKey")
708
+ s3_object_key_b64 = base64.b64encode(s3_object_key.encode()).decode()
709
+ break
710
+ else:
711
+ raise ValueError(f'File "{file_name}" not found in the "{directory}" folder of the project "{self.project_name}".')
712
+
713
+ # retrieve the metadata of the array file
714
+ headers = {
715
+ "Content-type": "application/json",
716
+ "apikey": self.apikey
717
+ }
718
+ url = (
719
+ f"{self.cloudos_url}/api/v1/jobs/array-file/metadata"
720
+ f"?separator={separator}"
721
+ f"&s3BucketName={s3_bucket_name}"
722
+ f"&s3ObjectKey={s3_object_key_b64}"
723
+ f"&teamId={self.workspace_id}"
724
+ )
725
+ r = retry_requests_get(url, headers=headers, verify=verify_ssl)
726
+ if r.status_code >= 400:
727
+ raise BadRequestException(r)
728
+
729
+ return r
730
+
731
+ def setup_params_array_file(self, custom_script_path, ds_custom, command, separator):
732
+ """
733
+ Sets up a dictionary representing command parameters, including support for custom scripts
734
+ and array files, to be used in job execution.
735
+
736
+ Parameters
737
+ ----------
738
+ custom_script_path : str
739
+ Path to the custom script file. If None, the command is treated as text.
740
+ ds_custom : object
741
+ An object providing access to folder content listing functionality.
742
+ command : str
743
+ The command to be executed, either as text or the name of a custom script.
744
+ separator : str
745
+ The separator to be used for the array file.
746
+
747
+ Returns
748
+ -------
749
+ dict
750
+ A dictionary containing the command parameters, including:
751
+ - "command": The command name or text.
752
+ - "customScriptFile" (optional): Details of the custom script file if provided.
753
+ - "arrayFile": Details of the array file and its separator.
754
+ """
755
+ if custom_script_path is not None:
756
+ command_path = Path(custom_script_path)
757
+ command_dir = str(command_path.parent)
758
+ command_name = command_path.name
759
+ result_script = ds_custom.list_folder_content(command_dir)
760
+ for file in result_script['files']:
761
+ if file.get("name") == command_name:
762
+ custom_script_item = file.get("_id")
763
+ break
764
+ # use this in case the command is in a custom script
765
+ cmd = {
766
+ "command": f"{command_name}",
767
+ "customScriptFile": {
768
+ "dataItem": {
769
+ "kind": "File",
770
+ "item": f"{custom_script_item}"
771
+ }
772
+ }
773
+ }
774
+ else:
775
+ # use this for text commands
776
+ cmd = {"command": command}
777
+
778
+ # add array-file
779
+ cmd = cmd | {
780
+ "arrayFile": {
781
+ "dataItem": {"kind": "File", "item": f"{self.array_file_id}"},
782
+ "separator": f"{separator}"
783
+ }
784
+ }
785
+
786
+ return cmd
787
+
788
+ @staticmethod
789
+ def split_array_file_params(array_parameter, workflow_type, array_file_header):
790
+ """
791
+ Splits and processes array parameters for a given workflow type and array file header.
792
+
793
+ Parameters
794
+ ----------
795
+ array_parameter : list
796
+ A list of strings representing array parameters in the format "key=value".
797
+ workflow_type : str
798
+ The type of workflow, e.g., 'docker'.
799
+ array_file_header : list
800
+ A list of dictionaries representing the header of the array file.
801
+ Each dictionary should contain "name" and "index" keys.
802
+
803
+ Returns
804
+ -------
805
+ dict
806
+ A dictionary containing processed parameter details, including:
807
+ - prefix (str): The prefix for the parameter (e.g., "--" or "-").
808
+ - name (str): The name of the parameter with leading dashes stripped.
809
+ - parameterKind (str): The kind of parameter, set to "arrayFileColumn".
810
+ - columnName (str): The name of the column derived from the parameter value.
811
+ - columnIndex (int): The index of the column in the array file header.
812
+
813
+ Raises
814
+ ------
815
+ ValueError
816
+ If an array parameter does not contain a '=' character or is improperly formatted.
817
+ """
818
+ ap_param = dict()
819
+ for ap in array_parameter:
820
+ ap_split = ap.split('=')
821
+ if len(ap_split) < 2:
822
+ raise ValueError('Please, specify -a / --array-parameter using a single \'=\' ' +
823
+ 'as spacer. E.g: input=value')
824
+ ap_name = ap_split[0]
825
+ ap_value = '='.join(ap_split[1:])
826
+ if workflow_type == 'docker':
827
+ ap_prefix = "--" if ap_name.startswith('--') else ("-" if ap_name.startswith('-') else '')
828
+ ap_param = {
829
+ "prefix": ap_prefix,
830
+ "name": ap_name.lstrip('-'),
831
+ "parameterKind": "arrayFileColumn",
832
+ "columnName": ap_value,
833
+ "columnIndex": next((item["index"] for item in array_file_header if item["name"] == ap_value), 0)
834
+ }
835
+
836
+ return ap_param
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: cloudos_cli
3
- Version: 2.32.1
3
+ Version: 2.33.0
4
4
  Summary: Python package for interacting with CloudOS
5
5
  Home-page: https://github.com/lifebit-ai/cloudos-cli
6
6
  Author: David Piñeyro
@@ -420,6 +420,98 @@ command.
420
420
  Other options like `--wait-completion` are also available and work in the same way as for the `cloudos job run` command.
421
421
  Check `cloudos bash job --help` for more details.
422
422
 
423
+ #### Send a bash array-job to CloudOS (parallel sample processing)
424
+
425
+ When running a bash array job, the following options are available to customize the behavior:
426
+
427
+ ##### Array File
428
+ - **`--array-file`**: Specifies the path to the array file, a delimited file whose header row names the columns that can be referenced from the bash job (see `--array-parameter`). This option is **required** when using the command `bash array-job`.
429
+
430
+ ##### Separator
431
+ - **`--separator`**: Defines the separator to use in the array file. Supported separators include:
432
+ - `,` (comma)
433
+ - `;` (semicolon)
434
+ - `tab`
435
+ - `space`
436
+ - `|` (pipe)
437
+ This option is **required** when using the command `bash array-job`.
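To check locally which column names a file will expose with a given separator, its header can be read with Python's standard `csv` module. This is a minimal sketch assuming a local copy of the array file; `SEPARATORS` and `read_header` are hypothetical names, not part of cloudos-cli, and the mapping of `tab` and `space` to characters is an assumption.

```python
import csv
import io
from typing import List

# Assumed mapping from the --separator choices listed above to the characters they denote.
SEPARATORS = {",": ",", ";": ";", "tab": "\t", "space": " ", "|": "|"}


def read_header(text: str, separator: str) -> List[str]:
    """Return the column names in the first row of an array file's contents."""
    if separator not in SEPARATORS:
        raise ValueError(f"Unsupported separator: {separator!r}")
    # csv.reader requires a single-character delimiter, which every entry above is.
    reader = csv.reader(io.StringIO(text), delimiter=SEPARATORS[separator])
    return next(reader)


sample = "id\tbgen\tcsv\n1\ts3://data/adipose.bgen\ts3://data/adipose.csv\n"
print(read_header(sample, "tab"))  # ['id', 'bgen', 'csv']
```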
438
+
439
+ ##### List Columns
440
+ - **`--list-columns`**: Lists the columns available in the array file, which is useful for inspecting the structure of the file. When this flag is set, the job is not sent; the CLI only prints the column list, one column per line:
441
+
442
+ ```console
443
+ Columns:
444
+ - column1
445
+ - column2
446
+ - column3
447
+ ```
448
+
449
+ ##### Array File Project
450
+ - **`--array-file-project`**: Specifies the name of the project in which the array file is placed, if it is different from the project specified by `--project-name`.
451
+
452
+ ##### Disable Column Check
453
+ - **`--disable-column-check`**: Disables the validation of array file columns. By default, each `--array-parameter` value is checked against the header of the `--array-file`: for example, `--array-parameter --bar=foo` expects the array file header to contain a column named `foo`, and the CLI throws an error if it is missing. With `--disable-column-check`, this check is skipped and the bash array job is sent to the platform.
454
+
455
+ > [!NOTE]
456
+ > With `--disable-column-check`, the CLI command runs without errors even if a column referenced by `--array-parameter` does not exist in the array file; in that case, errors may only appear later, when the job runs on the platform.
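The column check described above can be pictured as follows. This is a sketch of the documented behaviour, not the CLI's own code; `check_array_parameters` is a hypothetical name, and the header is assumed to have been read already (for example, as printed by `--list-columns`). Like the CLI's parameter parsing, it splits each value on the first `=` only.

```python
from typing import Iterable, List


def check_array_parameters(array_parameters: Iterable[str], header: List[str],
                           disable_column_check: bool = False) -> None:
    """Raise ValueError if an array parameter names a column missing from the header."""
    for ap in array_parameters:
        # Split on the first '='; anything after it is the column name.
        name, sep, column = ap.partition("=")
        if not sep:
            raise ValueError(f"Expected name=column, got {ap!r}")
        if not disable_column_check and column not in header:
            raise ValueError(f"Column {column!r} (from {name!r}) not found in header {header}")


header = ["id", "bgen", "csv"]
check_array_parameters(["--bar=bgen"], header)                            # column exists: passes
check_array_parameters(["--bar=foo"], header, disable_column_check=True)  # check skipped
```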
457
+
458
+ ##### Array Parameter
459
+ - **`-a` / `--array-parameter`**: Maps a parameter name to a column present in the header of the array file. Each parameter should be in the format `array_parameter_name=array_file_column`. For example:
460
+ - `-a --test=value` or
461
+ - `--array-parameter -test=value`
462
+ both specify a column named `value` in the array file header. Array parameters that refer to columns not present in the header cause an error. This option can be used multiple times to include as many array parameters as needed. It is similar to `-p, --parameter`: both kinds of parameter can be interpolated in the bash array job command (given either with `--command` or `--custom-script-path`), but an array parameter can only name a column present in the header of the array file.
463
+
464
+ For example, suppose the array file has the following content:
465
+
466
+ ```console
467
+ id,bgen,csv
468
+ 1,s3://data/adipose.bgen,s3://data/adipose.csv
469
+ 2,s3://data/blood.bgen,s3://data/blood.csv
470
+ 3,s3://data/brain.bgen,s3://data/brain.csv
471
+ ...
472
+ ```
473
+ To iterate over the `bgen` column in the command, specify `--array-parameter file=bgen`, which refers to that column in the header.
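For illustration, the entry that an array parameter turns into can be sketched as below. The field names (`prefix`, `name`, `parameterKind`, `columnName`, `columnIndex`) are taken from `split_array_file_params` in this release; the function name is hypothetical, and computing `columnIndex` as the 0-based position in a plain list of column names is an assumption (the CLI takes the index from the header metadata returned by the API). Falling back to `0` for an unknown column mirrors the `next(..., 0)` default in that function.

```python
from typing import Dict, List


def array_parameter_entry(ap: str, header: List[str]) -> Dict[str, object]:
    """Build an arrayFileColumn entry for one 'name=column' array parameter."""
    name, sep, column = ap.partition("=")
    if not sep:
        raise ValueError(f"Expected name=column, got {ap!r}")
    # Keep any leading dashes as the prefix and strip them from the name.
    prefix = "--" if name.startswith("--") else ("-" if name.startswith("-") else "")
    return {
        "prefix": prefix,
        "name": name.lstrip("-"),
        "parameterKind": "arrayFileColumn",
        "columnName": column,
        # Assumption: 0-based position of the column in the header, 0 if absent.
        "columnIndex": header.index(column) if column in header else 0,
    }


header = ["id", "bgen", "csv"]
print(array_parameter_entry("file=bgen", header))
# {'prefix': '', 'name': 'file', 'parameterKind': 'arrayFileColumn', 'columnName': 'bgen', 'columnIndex': 1}
```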
474
+
475
+ ##### Custom Script Path
476
+ - **`--custom-script-path`**: Specifies the path to a custom script to run in the bash array job instead of a command. When this option is used, `--command` is ignored. To ensure the script runs successfully, you must either:
477
+
478
+ 1. Use a Shebang Line at the Top of the Script
479
+
480
+ The shebang (`#!`) tells the system which interpreter to use to run the script. The path must be the absolute path to Python or another interpreter installed inside the Docker container.
481
+
482
+ Examples:
483
+ - `#!/usr/bin/python3` for Python scripts
484
+ - `#!/usr/bin/Rscript` for R scripts
486
+ - `#!/bin/bash` for Bash scripts
486
+
487
+ Example Python Script:
488
+
489
+ ```python
490
+ #!/usr/bin/python3
491
+ print("Hello world")
492
+ ```
493
+
494
+ 2. Or use an interpreter command in the executable field
495
+
496
+ If your script doesn’t have a shebang line, you can execute it by explicitly specifying the interpreter in the executable command:
497
+
498
+ ```console
499
+ python my_script.py
500
+ Rscript my_script.R
501
+ bash my_script.sh
502
+ ```
503
+ This assumes the interpreter is available on the container’s `$PATH`. If it is not, use the full absolute path instead:
504
+
505
+ ```console
506
+ /usr/bin/python3 my_script.py
507
+ /usr/local/bin/Rscript my_script.R
508
+ ```
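Before uploading a script, a quick local check can show which of the two options above applies. This sketch reads the first line of a script and returns the interpreter named on its shebang line, or `None` if there is none; `shebang_interpreter` is a hypothetical helper, not part of cloudos-cli, and it cannot confirm that the interpreter actually exists inside the container.

```python
import tempfile
from pathlib import Path
from typing import Optional


def shebang_interpreter(script: Path) -> Optional[str]:
    """Return the interpreter named on the script's shebang line, or None."""
    with script.open("r", encoding="utf-8") as fh:
        first = fh.readline().rstrip("\n")
    if first.startswith("#!"):
        return first[2:].strip()
    return None


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "my_script.py"
    script.write_text('#!/usr/bin/python3\nprint("Hello world")\n', encoding="utf-8")
    interpreter = shebang_interpreter(script)

print(interpreter)  # /usr/bin/python3
# The shebang path should be absolute for the first option to work.
print(interpreter is not None and Path(interpreter.split()[0]).is_absolute())  # True
```

A `None` result means the script relies on the second option, where the interpreter is named explicitly in the command.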
509
+
510
+ ##### Custom Script Project
511
+ - **`--custom-script-project`**: Specifies the name of the project in which the custom script is placed, if it is different from the project specified by `--project-name`.
512
+
513
+ These options provide flexibility for configuring and running bash array jobs, allowing you to tailor the execution to specific requirements.
514
+
423
515
  #### Get path to logs of job from CloudOS
424
516
 
425
517
  Get the path to "Nextflow logs", "Nextflow standard output", and "trace" files. It can be used only on your user's jobs, with any status.
@@ -1 +0,0 @@
1
- __version__ = '2.32.1'
File without changes
File without changes
File without changes