cloudos-cli 2.32.0__tar.gz → 2.33.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (34)
  1. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/PKG-INFO +93 -1
  2. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/README.md +92 -0
  3. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/__main__.py +395 -23
  4. cloudos_cli-2.33.0/cloudos_cli/_version.py +1 -0
  5. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/jobs/job.py +206 -2
  6. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/utils/__init__.py +2 -1
  7. cloudos_cli-2.33.0/cloudos_cli/utils/details.py +66 -0
  8. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/PKG-INFO +93 -1
  9. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/SOURCES.txt +1 -0
  10. cloudos_cli-2.32.0/cloudos_cli/_version.py +0 -1
  11. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/LICENSE +0 -0
  12. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/__init__.py +0 -0
  13. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/clos.py +0 -0
  14. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/configure/__init__.py +0 -0
  15. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/configure/configure.py +0 -0
  16. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/datasets/__init__.py +0 -0
  17. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/datasets/datasets.py +0 -0
  18. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/import_wf/__init__.py +0 -0
  19. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/import_wf/import_wf.py +0 -0
  20. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/jobs/__init__.py +0 -0
  21. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/queue/__init__.py +0 -0
  22. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/queue/queue.py +0 -0
  23. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/utils/cloud.py +0 -0
  24. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/utils/errors.py +0 -0
  25. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/utils/requests.py +0 -0
  26. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli/utils/resources.py +0 -0
  27. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/dependency_links.txt +0 -0
  28. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/entry_points.txt +0 -0
  29. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/requires.txt +0 -0
  30. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/top_level.txt +0 -0
  31. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/setup.cfg +0 -0
  32. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/setup.py +0 -0
  33. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/tests/__init__.py +0 -0
  34. {cloudos_cli-2.32.0 → cloudos_cli-2.33.0}/tests/functions_for_pytest.py +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: cloudos_cli
-Version: 2.32.0
+Version: 2.33.0
 Summary: Python package for interacting with CloudOS
 Home-page: https://github.com/lifebit-ai/cloudos-cli
 Author: David Piñeyro
@@ -420,6 +420,98 @@ command.
 Other options like `--wait-completion` are also available and work in the same way as for the `cloudos job run` command.
 Check `cloudos bash job --help` for more details.
 
+#### Send a bash array-job to CloudOS (parallel sample processing)
+
+When running a bash array job, the following options are available to customize its behavior:
+
+##### Array File
+- **`--array-file`**: Specifies the path to a file whose columns provide the values used by the bash array job. This option is **required** when using the `bash array-job` command.
+
+##### Separator
+- **`--separator`**: Defines the separator used in the array file. Supported separators are:
+  - `,` (comma)
+  - `;` (semicolon)
+  - `tab`
+  - `space`
+  - `|` (pipe)
+  This option is **required** when using the `bash array-job` command.
+
+##### List Columns
+- **`--list-columns`**: Lists the columns available in the array file, which is useful for inspecting the file's structure. When this flag is set, no job is sent; the CLI only prints the column list, one per line:
+
+```console
+Columns:
+- column1
+- column2
+- column3
+```
+
+##### Array File Project
+- **`--array-file-project`**: Specifies the name of the project in which the array file is placed, if it is different from the project specified by `--project-name`.
+
+##### Disable Column Check
+- **`--disable-column-check`**: Disables the validation of columns in the array file, so the `--array-parameter` values are not checked against the header of the `--array-file`. For example, without `--disable-column-check`, `--array-parameter --bar=foo` expects the array file header to contain a column named 'foo'; if it does not, the CLI throws an error. With `--disable-column-check`, the column check is skipped and the bash array job is sent to the platform.
+
+> [!NOTE]
+> With `--disable-column-check` the CLI command runs without errors, but errors may appear later when checking the job in the platform if the columns referenced with `--array-parameter` do not exist in the array file.
+
+##### Array Parameter
+- **`-a` / `--array-parameter`**: Specifies a column name present in the header of the array file. Each parameter should be in the format `array_parameter_name=array_file_column`. For example:
+  - `-a --test=value` or
+  - `--array-parameter -test=value`
+  both refer to a column named 'value' in the array file header. Adding array parameters that are not present in the header causes an error. This option can be used multiple times to include as many array parameters as needed. It is similar to `-p, --parameter`: both kinds of parameters can be interpolated in the bash array job command (given either with `--command` or `--custom-script-path`), but `--array-parameter` can only be used to name a column present in the header of the array file.
+
+For example, suppose the array file has the following header and rows:
+
+```console
+id,bgen,csv
+1,s3://data/adipose.bgen,s3://data/adipose.csv
+2,s3://data/blood.bgen,s3://data/blood.csv
+3,s3://data/brain.bgen,s3://data/brain.csv
+...
+```
+If the command needs to iterate over the `bgen` column, this can be specified as `--array-parameter file=bgen`, referring to the column in the header.
+
+##### Custom Script Path
+- **`--custom-script-path`**: Specifies the path to a custom script to run in the bash array job instead of a command. When this option is used, `--command` is ignored. To ensure the script runs successfully, you must either:
+
+1. Use a shebang line at the top of the script
+
+The shebang (`#!`) tells the system which interpreter to use to run the script. The path must be the absolute path to Python or to another interpreter installed inside the Docker container.
+
+Examples:
+- `#!/usr/bin/python3`: for Python scripts
+- `#!/usr/bin/Rscript`: for R scripts
+- `#!/bin/bash`: for Bash scripts
+
+Example Python script:
+
+```python
+#!/usr/bin/python3
+print("Hello world")
+```
+
+2. Or use an interpreter command in the executable field
+
+If your script doesn’t have a shebang line, you can execute it by explicitly specifying the interpreter in the executable command:
+
+```console
+python my_script.py
+Rscript my_script.R
+bash my_script.sh
+```
+This assumes the interpreter is available on the container’s `$PATH`. If not, you can use the full absolute path instead:
+
+```console
+/usr/bin/python3 my_script.py
+/usr/local/bin/Rscript my_script.R
+```
+
+##### Custom Script Project
+- **`--custom-script-project`**: Specifies the name of the project in which the custom script is placed, if it is different from the project specified by `--project-name`.
+
+These options provide flexibility when configuring and running bash array jobs, allowing you to tailor execution to specific requirements.
+
 #### Get path to logs of job from CloudOS
 
 Get the path to "Nextflow logs", "Nextflow standard output", and "trace" files. It can be used only on your user's jobs, with any status.
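The default column check described above can be mimicked locally before submitting a job. This is a minimal sketch under stated assumptions, not the CLI's code: the real command obtains the header from the CloudOS API through `retrieve_cols_from_array_file`, whereas the hypothetical helpers `read_header` and `check_array_parameters` below read the first row of a local file with Python's standard `csv` module. The mapping of `tab` and `space` to actual characters is also an assumption.

```python
import csv
import io

# Assumed mapping from the documented --separator choices to delimiter characters.
DELIMITERS = {",": ",", ";": ";", "|": "|", "tab": "\t", "space": " "}


def read_header(text, separator):
    """Return the column names in the first row of an array file (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text), delimiter=DELIMITERS[separator])
    return next(reader)


def check_array_parameters(array_parameters, columns):
    """Check that every 'name=column' array parameter names an existing column."""
    for ap in array_parameters:
        # Keep everything after the first '=', so '--file=bgen' -> 'bgen'.
        _, _, column = ap.partition("=")
        if column not in columns:
            raise ValueError(
                f"Column '{column}' not found in the array file. "
                f"Columns in array-file: {', '.join(columns)}"
            )
    return True


array_file = (
    "id,bgen,csv\n"
    "1,s3://data/adipose.bgen,s3://data/adipose.csv\n"
    "2,s3://data/blood.bgen,s3://data/blood.csv\n"
)

columns = read_header(array_file, ",")

# Roughly what --list-columns prints: one column per line.
print("Columns:")
for name in columns:
    print(f"\t- {name}")

check_array_parameters(["file=bgen"], columns)
```

Passing `["file=missing"]` instead raises `ValueError`, which corresponds to the error the CLI reports unless `--disable-column-check` is set.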
@@ -385,6 +385,98 @@ command.
 Other options like `--wait-completion` are also available and work in the same way as for the `cloudos job run` command.
 Check `cloudos bash job --help` for more details.
 
+#### Send a bash array-job to CloudOS (parallel sample processing)
+
+When running a bash array job, the following options are available to customize its behavior:
+
+##### Array File
+- **`--array-file`**: Specifies the path to a file whose columns provide the values used by the bash array job. This option is **required** when using the `bash array-job` command.
+
+##### Separator
+- **`--separator`**: Defines the separator used in the array file. Supported separators are:
+  - `,` (comma)
+  - `;` (semicolon)
+  - `tab`
+  - `space`
+  - `|` (pipe)
+  This option is **required** when using the `bash array-job` command.
+
+##### List Columns
+- **`--list-columns`**: Lists the columns available in the array file, which is useful for inspecting the file's structure. When this flag is set, no job is sent; the CLI only prints the column list, one per line:
+
+```console
+Columns:
+- column1
+- column2
+- column3
+```
+
+##### Array File Project
+- **`--array-file-project`**: Specifies the name of the project in which the array file is placed, if it is different from the project specified by `--project-name`.
+
+##### Disable Column Check
+- **`--disable-column-check`**: Disables the validation of columns in the array file, so the `--array-parameter` values are not checked against the header of the `--array-file`. For example, without `--disable-column-check`, `--array-parameter --bar=foo` expects the array file header to contain a column named 'foo'; if it does not, the CLI throws an error. With `--disable-column-check`, the column check is skipped and the bash array job is sent to the platform.
+
+> [!NOTE]
+> With `--disable-column-check` the CLI command runs without errors, but errors may appear later when checking the job in the platform if the columns referenced with `--array-parameter` do not exist in the array file.
+
+##### Array Parameter
+- **`-a` / `--array-parameter`**: Specifies a column name present in the header of the array file. Each parameter should be in the format `array_parameter_name=array_file_column`. For example:
+  - `-a --test=value` or
+  - `--array-parameter -test=value`
+  both refer to a column named 'value' in the array file header. Adding array parameters that are not present in the header causes an error. This option can be used multiple times to include as many array parameters as needed. It is similar to `-p, --parameter`: both kinds of parameters can be interpolated in the bash array job command (given either with `--command` or `--custom-script-path`), but `--array-parameter` can only be used to name a column present in the header of the array file.
+
+For example, suppose the array file has the following header and rows:
+
+```console
+id,bgen,csv
+1,s3://data/adipose.bgen,s3://data/adipose.csv
+2,s3://data/blood.bgen,s3://data/blood.csv
+3,s3://data/brain.bgen,s3://data/brain.csv
+...
+```
+If the command needs to iterate over the `bgen` column, this can be specified as `--array-parameter file=bgen`, referring to the column in the header.
+
+##### Custom Script Path
+- **`--custom-script-path`**: Specifies the path to a custom script to run in the bash array job instead of a command. When this option is used, `--command` is ignored. To ensure the script runs successfully, you must either:
+
+1. Use a shebang line at the top of the script
+
+The shebang (`#!`) tells the system which interpreter to use to run the script. The path must be the absolute path to Python or to another interpreter installed inside the Docker container.
+
+Examples:
+- `#!/usr/bin/python3`: for Python scripts
+- `#!/usr/bin/Rscript`: for R scripts
+- `#!/bin/bash`: for Bash scripts
+
+Example Python script:
+
+```python
+#!/usr/bin/python3
+print("Hello world")
+```
+
+2. Or use an interpreter command in the executable field
+
+If your script doesn’t have a shebang line, you can execute it by explicitly specifying the interpreter in the executable command:
+
+```console
+python my_script.py
+Rscript my_script.R
+bash my_script.sh
+```
+This assumes the interpreter is available on the container’s `$PATH`. If not, you can use the full absolute path instead:
+
+```console
+/usr/bin/python3 my_script.py
+/usr/local/bin/Rscript my_script.R
+```
+
+##### Custom Script Project
+- **`--custom-script-project`**: Specifies the name of the project in which the custom script is placed, if it is different from the project specified by `--project-name`.
+
+These options provide flexibility when configuring and running bash array jobs, allowing you to tailor execution to specific requirements.
+
 #### Get path to logs of job from CloudOS
 
 Get the path to "Nextflow logs", "Nextflow standard output", and "trace" files. It can be used only on your user's jobs, with any status.
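The two ways of making a custom script runnable (a shebang line, or an explicit interpreter in the command) can be expressed as a small decision rule. The sketch below is hypothetical and not part of cloudos-cli: `build_command` and its extension-to-interpreter table are assumptions, the interpreters are assumed to be on the container's `$PATH`, and running a script directly through its shebang also assumes the file has execute permission.

```python
# Hypothetical extension-to-interpreter table; adjust it to the interpreters
# actually installed in the container image.
INTERPRETERS = {".py": "python", ".R": "Rscript", ".sh": "bash"}


def build_command(script_name, script_text):
    """Return a command line that runs a custom script.

    Option 1: a script starting with a shebang ('#!') can be run directly.
    Option 2: otherwise, prefix the interpreter explicitly.
    """
    if script_text.startswith("#!"):
        return f"./{script_name}"
    for extension, interpreter in INTERPRETERS.items():
        if script_name.endswith(extension):
            return f"{interpreter} {script_name}"
    raise ValueError(f"No shebang and no known interpreter for {script_name!r}")


print(build_command("hello.py", '#!/usr/bin/python3\nprint("Hello world")\n'))
print(build_command("my_script.R", 'cat("Hello world\\n")\n'))
```

Checking for the shebang first mirrors the order of the two options in the text: the explicit interpreter is only needed when the script cannot name its own.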
@@ -16,6 +16,7 @@ from rich.table import Table
 from cloudos_cli.datasets import Datasets
 from cloudos_cli.utils.resources import ssl_selector, format_bytes
 from rich.style import Style
+from cloudos_cli.utils.details import get_path
 
 
 # GLOBAL VARS
@@ -81,7 +82,8 @@ def run_cloudos_cli(ctx):
             'list': shared_config
         },
         'bash': {
-            'job': shared_config
+            'job': shared_config,
+            'array-job': shared_config
         },
         'datasets': {
             'ls': shared_config,
@@ -127,7 +129,8 @@ def run_cloudos_cli(ctx):
             'list': shared_config
         },
         'bash': {
-            'job': shared_config
+            'job': shared_config,
+            'array-job': shared_config
         },
         'datasets': {
             'ls': shared_config,
@@ -944,6 +947,19 @@ def job_details(ctx,
         sys.exit(1)
     j_details_h = json.loads(j_details.content)
 
+    # Determine the execution platform based on jobType
+    executors = {
+        'nextflowAWS': 'Batch AWS',
+        'nextflowAzure': 'Batch Azure',
+        'nextflowGcp': 'GCP',
+        'nextflowHpc': 'HPC',
+        'nextflowKubernetes': 'Kubernetes',
+        'dockerAWS': 'Batch AWS',
+        'cromwellAWS': 'Batch AWS'
+    }
+    execution_platform = executors.get(j_details_h["jobType"], "None")
+    storage_provider = "s3://" if execution_platform == "Batch AWS" else "az://"
+
     # Check if the job details contain parameters
     if j_details_h["parameters"] != []:
         param_kind_map = {
@@ -951,18 +967,15 @@ def job_details(ctx,
             'arrayFileColumn': 'columnName',
             'globPattern': 'globPattern',
             'lustreFileSystem': 'fileSystem',
+            'dataItem': 'dataItem'
         }
         # there are different types of parameters, arrayFileColumn, globPattern, lustreFileSystem
         # get first the type of parameter, then the value based on the parameter kind
         concats = []
         for param in j_details_h["parameters"]:
-            if param['parameterKind'] == 'dataItem':
-                # For dataItem, we need to use specific nested keys
-                concats.append(f"{param['prefix']}{param['name']}={param['dataItem']['item']['name']}")
-            else:
-                # For other parameter kinds, we use the appropriate key from param_kind_map
-                concats.append(f"{param['prefix']}{param['name']}={param[param_kind_map[param['parameterKind']]]}")
+            concats.append(f"{param['prefix']}{param['name']}={get_path(param, param_kind_map, execution_platform, storage_provider, 'asis')}")
         concat_string = '\n'.join(concats)
+
         # If the user requested to save the parameters in a config file
         if parameters:
             # Create a config file with the parameters
@@ -970,7 +983,7 @@ def job_details(ctx,
             with open(config_filename, 'w') as config_file:
                 config_file.write("params {\n")
                 for param in j_details_h["parameters"]:
-                    config_file.write(f"\t{param['name']} = {param['textValue']}\n")
+                    config_file.write(f"\t{param['name']} = {get_path(param, param_kind_map, execution_platform, storage_provider)}\n")
                 config_file.write("}\n")
             print(f"\tJob parameters have been saved to '{config_filename}'")
     else:
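When the user asks for the parameters to be saved, the code above writes them as a Nextflow `params { ... }` block with one tab-indented `name = value` line per parameter. A minimal standalone sketch of that output format follows; `write_params_config` is a hypothetical name, and its values are taken as already-resolved strings because `get_path` (in `utils/details.py`) is not shown in this diff. Values are written exactly as given, so in this sketch any quoting Nextflow needs for string values must already be part of the value.

```python
import io


def write_params_config(params, out):
    """Write a Nextflow-style params block; values are written exactly as given."""
    out.write("params {\n")
    for name, value in params.items():
        out.write(f"\t{name} = {value}\n")
    out.write("}\n")


buffer = io.StringIO()
write_params_config({"input": "'s3://data/adipose.bgen'", "threads": 4}, buffer)
print(buffer.getvalue(), end="")
```

In the CLI the same lines go to the file opened with `open(config_filename, 'w')`; `io.StringIO` is used here only so the result can be inspected.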
@@ -978,18 +991,6 @@ def job_details(ctx,
         if parameters:
             print("\tNo parameters found in the job details, no config file will be created.")
 
-    # Determine the execution platform based on jobType
-    executors = {
-        'nextflowAWS': 'Batch AWS',
-        'nextflowAzure': 'Batch Azure',
-        'nextflowGcp': 'GCP',
-        'nextflowHpc': 'HPC',
-        'nextflowKubernetes': 'Kubernetes',
-        'dockerAWS': 'Batch AWS',
-        'cromwellAWS': 'Batch AWS'
-    }
-    execution_platform = executors.get(j_details_h["jobType"], "None")
-
     # revision
     if j_details_h["jobType"] == "dockerAWS":
         revision = j_details_h["revision"]["digest"]
@@ -1012,7 +1013,11 @@ def job_details(ctx,
         table.add_row("Nextflow Version", str(j_details_h.get("nextflowVersion", "None")))
         table.add_row("Execution Platform", execution_platform)
         table.add_row("Profile", str(j_details_h.get("profile", "None")))
-        table.add_row("Master Instance", str(j_details_h["masterInstance"]["usedInstance"]["type"]))
+        # when the job is just running this value might not be present
+        master_instance = j_details_h.get("masterInstance", {})
+        used_instance = master_instance.get("usedInstance", {})
+        instance_type = used_instance.get("type", "N/A")
+        table.add_row("Master Instance", str(instance_type))
         if j_details_h["jobType"] == "nextflowAzure":
             try:
                 table.add_row("Worker Node", str(j_details_h["azureBatch"]["vmType"]))
@@ -1039,7 +1044,6 @@ def job_details(ctx,
             "Nextflow Version": str(j_details_h.get("nextflowVersion", "None")),
             "Execution Platform": execution_platform,
             "Profile": str(j_details_h.get("profile", "None")),
-            "Master Instance": str(j_details_h["masterInstance"]["usedInstance"]["type"]),
             "Storage": str(j_details_h["storageSizeInGb"]) + " GB",
             "Accelerated File Staging": str(j_details_h.get("usesFusionFileSystem", "None")),
             "Task Resources": f"{str(j_details_h['resourceRequirements']['cpu'])} CPUs, " +
@@ -1047,6 +1051,12 @@ def job_details(ctx,
 
         }
 
+        # when the job is just running this value might not be present
+        master_instance = j_details_h.get("masterInstance", {})
+        used_instance = master_instance.get("usedInstance", {})
+        instance_type = used_instance.get("type", "N/A")
+        job_details_json["Master Instance"] = str(instance_type)
+
         # Conditionally add the "Command" key if the jobType is "dockerAWS"
         if j_details_h["jobType"] == "dockerAWS":
             job_details_json["Command"] = str(j_details_h["command"])
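The `Master Instance` change replaces direct indexing with chained `dict.get(key, {})` calls, so a job that does not yet report an instance shows `N/A` instead of raising `KeyError`. The same defensive lookup can be written as a reusable helper; `dig` is a hypothetical name, not part of cloudos-cli. One design difference: chained `.get()` still fails with `AttributeError` if a level is present but `null` in the JSON (for example `"masterInstance": null`), whereas this helper returns the default in that case too.

```python
def dig(data, *keys, default="N/A"):
    """Follow nested dictionary keys, returning `default` if any level is missing."""
    current = data
    for key in keys:
        # Stop at the first missing key or non-dict value (e.g. None from JSON null).
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


job_without_instance = {"jobType": "nextflowAWS"}  # no instance reported yet
job_with_instance = {"masterInstance": {"usedInstance": {"type": "c5.xlarge"}}}

print(dig(job_without_instance, "masterInstance", "usedInstance", "type"))
print(dig(job_with_instance, "masterInstance", "usedInstance", "type"))
```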
@@ -2177,6 +2187,368 @@ def run_bash_job(ctx,
2177
2187
  f'\t\t--job-id {j_id}\n')
2178
2188
 
2179
2189
 
2190
+ @bash.command('array-job')
2191
+ @click.option('-k',
2192
+ '--apikey',
2193
+ help='Your CloudOS API key',
2194
+ required=True)
2195
+ @click.option('--command',
2196
+ help='The command to run in the bash job.')
2197
+ @click.option('-c',
2198
+ '--cloudos-url',
2199
+ help=(f'The CloudOS url you are trying to access to. Default={CLOUDOS_URL}.'),
2200
+ default=CLOUDOS_URL)
2201
+ @click.option('--workspace-id',
2202
+ help='The specific CloudOS workspace id.',
2203
+ required=True)
2204
+ @click.option('--project-name',
2205
+ help='The name of a CloudOS project.',
2206
+ required=True)
2207
+ @click.option('--workflow-name',
2208
+ help='The name of a CloudOS workflow or pipeline.',
2209
+ required=True)
2210
+ @click.option('-p',
2211
+ '--parameter',
2212
+ multiple=True,
2213
+ help=('A single parameter to pass to the job call. It should be in the ' +
2214
+ 'following form: parameter_name=parameter_value. E.g.: ' +
2215
+ '-p --test=value or -p -test=value or -p test=value. You can use this option as many ' +
2216
+ 'times as parameters you want to include.'))
2217
+ @click.option('--job-name',
2218
+ help='The name of the job. Default=new_job.',
2219
+ default='new_job')
2220
+ @click.option('--do-not-save-logs',
2221
+ help=('Avoids process log saving. If you select this option, your job process ' +
2222
+ 'logs will not be stored.'),
2223
+ is_flag=True)
2224
+ @click.option('--job-queue',
2225
+ help='Name of the job queue to use with a batch job.')
2226
+ @click.option('--instance-type',
2227
+ help=('The type of compute instance to use as master node. ' +
2228
+ 'Default=c5.xlarge(aws)|Standard_D4as_v4(azure).'),
2229
+ default='NONE_SELECTED')
2230
+ @click.option('--instance-disk',
2231
+ help='The disk space of the master node instance, in GB. Default=500.',
2232
+ type=int,
2233
+ default=500)
2234
+ @click.option('--cpus',
2235
+ help='The number of CPUs to use for the task\'s master node. Default=1.',
2236
+ type=int,
2237
+ default=1)
2238
+ @click.option('--memory',
2239
+ help='The amount of memory, in GB, to use for the task\'s master node. Default=4.',
2240
+ type=int,
2241
+ default=4)
2242
+ @click.option('--storage-mode',
2243
+ help=('Either \'lustre\' or \'regular\'. Indicates if the user wants to select ' +
2244
+ 'regular or lustre storage. Default=regular.'),
2245
+ default='regular')
2246
+ @click.option('--lustre-size',
2247
+ help=('The lustre storage to be used when --storage-mode=lustre, in GB. It should ' +
2248
+ 'be 1200 or a multiple of it. Default=1200.'),
2249
+ type=int,
2250
+ default=1200)
2251
+ @click.option('--wait-completion',
2252
+ help=('Whether to wait to job completion and report final ' +
2253
+ 'job status.'),
2254
+ is_flag=True)
2255
+ @click.option('--wait-time',
2256
+ help=('Max time to wait (in seconds) to job completion. ' +
2257
+ 'Default=3600.'),
2258
+ default=3600)
2259
+ @click.option('--repository-platform', type=click.Choice(["github", "gitlab", "bitbucketServer"]),
2260
+ help='Name of the repository platform of the workflow. Default=github.',
2261
+ default='github')
2262
+ @click.option('--execution-platform',
2263
+ help='Name of the execution platform implemented in your CloudOS. Default=aws.',
2264
+ type=click.Choice(['aws', 'azure', 'hpc']),
2265
+ default='aws')
2266
+ @click.option('--cost-limit',
2267
+ help='Add a cost limit to your job. Default=30.0 (For no cost limit please use -1).',
2268
+ type=float,
2269
+ default=30.0)
2270
+ @click.option('--request-interval',
2271
+ help=('Time interval to request (in seconds) the job status. ' +
2272
+ 'For large jobs is important to use a high number to ' +
2273
+ 'make fewer requests so that is not considered spamming by the API. ' +
2274
+ 'Default=30.'),
2275
+ default=30)
2276
+ @click.option('--disable-ssl-verification',
2277
+ help=('Disable SSL certificate verification. Please, remember that this option is ' +
2278
+ 'not generally recommended for security reasons.'),
2279
+ is_flag=True)
2280
+ @click.option('--ssl-cert',
2281
+ help='Path to your SSL certificate file.')
2282
+ @click.option('--profile', help='Profile to use from the config file', default=None)
2283
+ @click.option('--array-file',
2284
+ help=('Path to a file containing an array of commands to run in the bash job.'),
2285
+ default=None,
2286
+ required=True)
2287
+ @click.option('--separator',
2288
+ help=('Separator to use in the array file. Default=",".'),
2289
+ type=click.Choice([',', ';', 'tab', 'space', '|']),
2290
+ default=",",
2291
+ required=True)
2292
+ @click.option('--list-columns',
2293
+ help=('List columns present in the array file. ' +
2294
+ 'This option will not run any job.'),
2295
+ is_flag=True)
2296
+ @click.option('--array-file-project',
2297
+ help=('Name of the project in which the array file is placed, if different from --project-name.'),
2298
+ default=None)
2299
+ @click.option('--disable-column-check',
2300
+ help=('Disable the check for the columns in the array file. ' +
2301
+ 'This option is only used when --array-file is provided.'),
2302
+ is_flag=True)
2303
+ @click.option('-a', '--array-parameter',
2304
+ multiple=True,
2305
+ help=('A single parameter to pass to the job call only for specifying array parameter. It should be in the ' +
2306
+ 'following form: parameter_name=parameter_value. E.g.: ' +
2307
+ '-a --test=value or -a -test=value or -a test=value. You can use this option as many ' +
2308
+ 'times as parameters you want to include.'))
2309
+ @click.option('--custom-script-path',
2310
+ help=('Path of a custom script to run in the bash array job instead of a command.'),
2311
+ default=None)
2312
+ @click.option('--custom-script-project',
2313
+ help=('Name of the project to use when running the custom command script, if ' +
2314
+ 'different than --project-name.'),
2315
+ default=None)
2316
+ @click.pass_context
2317
+ def run_bash_array_job(ctx,
2318
+ apikey,
2319
+ command,
2320
+ cloudos_url,
2321
+ workspace_id,
2322
+ project_name,
2323
+ workflow_name,
2324
+ parameter,
2325
+ job_name,
2326
+ do_not_save_logs,
2327
+ job_queue,
2328
+ instance_type,
2329
+ instance_disk,
2330
+ cpus,
2331
+ memory,
2332
+ storage_mode,
2333
+ lustre_size,
2334
+ wait_completion,
2335
+ wait_time,
2336
+ repository_platform,
2337
+ execution_platform,
2338
+ cost_limit,
2339
+ request_interval,
2340
+ disable_ssl_verification,
2341
+ ssl_cert,
2342
+ profile,
2343
+ array_file,
2344
+ separator,
2345
+ list_columns,
2346
+ array_file_project,
2347
+ disable_column_check,
2348
+ array_parameter,
2349
+ custom_script_path,
2350
+ custom_script_project):
2351
+ """Run a bash array job in CloudOS."""
2352
+ profile = profile or ctx.default_map['bash']['array-job']['profile']
2353
+
2354
+ # Create a dictionary with required and non-required params
2355
+ required_dict = {
2356
+ 'apikey': True,
2357
+ 'workspace_id': True,
2358
+ 'workflow_name': True,
2359
+ 'project_name': True
2360
+ }
2361
+
2362
+ # determine if the user provided all required parameters
2363
+ config_manager = ConfigurationProfile()
2364
+ apikey, cloudos_url, workspace_id, workflow_name, repository_platform, execution_platform, project_name = (
2365
+ config_manager.load_profile_and_validate_data(
2366
+ ctx,
2367
+ INIT_PROFILE,
2368
+ CLOUDOS_URL,
2369
+ profile=profile,
2370
+ required_dict=required_dict,
2371
+ apikey=apikey,
2372
+ cloudos_url=cloudos_url,
2373
+ workspace_id=workspace_id,
2374
+ workflow_name=workflow_name,
2375
+ repository_platform=repository_platform,
2376
+ execution_platform=execution_platform,
2377
+ project_name=project_name
2378
+ )
2379
+ )
2380
+ verify_ssl = ssl_selector(disable_ssl_verification, ssl_cert)
2381
+
2382
+ if not list_columns and not (command or custom_script_path):
2383
+ raise click.UsageError("Must provide --command or --custom-script-path if --list-columns is not set.")
2384
+
2385
+ # when not set, use the global project name
2386
+ if array_file_project is None:
2387
+ array_file_project = project_name
2388
+
2389
+ # this needs to be in another call to datasets, by default it uses the global project name
2390
+ if custom_script_project is None:
2391
+ custom_script_project = project_name
2392
+
2393
+ # setup separators for API and array file (the're different)
2394
+ separators = {
2395
+ ",": { "api": ",", "file": "," },
2396
+ ";": { "api": "%3B", "file": ";" },
2397
+ "space": { "api": "+", "file": " " },
2398
+ "tab": { "api": "tab", "file": "tab" },
2399
+ "|": { "api": "%7C", "file": "|" }
2400
+ }
2401
+
2402
+ # Setup datasets
2403
+ try:
2404
+ ds = Datasets(
2405
+ cloudos_url=cloudos_url,
2406
+ apikey=apikey,
2407
+ workspace_id=workspace_id,
2408
+ project_name=array_file_project,
2409
+ verify=verify_ssl,
2410
+ cromwell_token=None
2411
+ )
2412
+ if custom_script_project is not None:
2413
+ # If a custom script project is specified, create a new Datasets object for it
2414
+ # This allows the user to run custom scripts in a different project
2415
+ ds_custom = Datasets(
2416
+ cloudos_url=cloudos_url,
2417
+ apikey=apikey,
2418
+ workspace_id=workspace_id,
2419
+ project_name=custom_script_project,
2420
+ verify=verify_ssl,
2421
+ cromwell_token=None
2422
+ )
2423
+ except BadRequestException as e:
2424
+ if 'Forbidden' in str(e):
2425
+ print('[Error] It seems your call is not authorised. Please check if ' +
2426
+ 'your workspace is restricted by Airlock and if your API key is valid.')
2427
+ sys.exit(1)
2428
+ else:
2429
+ raise e
2430
+
2431
+ # setup important options for the job
2432
+ if do_not_save_logs:
2433
+ save_logs = False
2434
+ else:
2435
+ save_logs = True
2436
+
2437
+ if instance_type == 'NONE_SELECTED':
2438
+ if execution_platform == 'aws':
2439
+ instance_type = 'c5.xlarge'
2440
+ elif execution_platform == 'azure':
2441
+ instance_type = 'Standard_D4as_v4'
2442
+ else:
2443
+ instance_type = None
2444
+
2445
+ j = jb.Job(cloudos_url, apikey, None, workspace_id, project_name, workflow_name,
2446
+ mainfile=None, importsfile=None,
2447
+ repository_platform=repository_platform, verify=verify_ssl)
2448
+
2449
+ # retrieve columns
2450
+ r = j.retrieve_cols_from_array_file(array_file, ds, separators[separator]['api'], verify_ssl)
2451
+
2452
+ if not disable_column_check:
2453
+ columns = json.loads(r.content).get("headers", None)
2454
+ # pass this to the SEND JOB API call
2455
+ # b'{"headers":[{"index":0,"name":"id"},{"index":1,"name":"title"},{"index":2,"name":"filename"},{"index":3,"name":"file2name"}]}'
2456
+ if columns is None:
2457
+ raise ValueError("No columns found in the array file metadata.")
2458
+ if list_columns:
2459
+ print("Columns: ")
2460
+ for col in columns:
2461
+ print(f"\t- {col['name']}")
2462
+ return
2463
+ else:
2464
+ columns = []
2465
+
2466
+ # setup parameters for the job
2467
+ cmd = j.setup_params_array_file(custom_script_path, ds_custom, command, separators[separator]['file'])
2468
+
2469
+ # check columns in the array file vs parameters added
2470
+ if not disable_column_check and array_parameter:
2471
+ print("\nChecking columns in the array file vs parameters added...\n")
2472
+ for ap in array_parameter:
2473
+ ap_split = ap.split('=')
2474
+ ap_value = '='.join(ap_split[1:])
2475
+ for col in columns:
2476
+ if col['name'] == ap_value:
2477
+ print(f"Found column '{ap_value}' in the array file.")
2478
+ break
2479
+ else:
2480
+ raise ValueError(f"Column '{ap_value}' not found in the array file. " +
2481
+ "Columns in array-file: ", f"{separator}".join([col['name'] for col in columns]))
2482
+
2483
+ if job_queue is not None:
2484
+ batch = True
2485
+ queue = Queue(cloudos_url=cloudos_url, apikey=apikey, cromwell_token=None,
2486
+ workspace_id=workspace_id, verify=verify_ssl)
2487
+ # workflow_type must be 'nextflow'; otherwise the job queue ID is not found
2488
+ job_queue_id = queue.fetch_job_queue_id(workflow_type='nextflow', batch=batch,
2489
+ job_queue=job_queue)
2490
+ else:
2491
+ job_queue_id = None
2492
+ batch = False
2493
+
2494
+ # send job
2495
+ j_id = j.send_job(job_config=None,
2496
+ parameter=parameter,
2497
+ array_parameter=array_parameter,
2498
+ array_file_header=columns,
2499
+ git_commit=None,
2500
+ git_tag=None,
2501
+ git_branch=None,
2502
+ job_name=job_name,
2503
+ resumable=False,
2504
+ save_logs=save_logs,
2505
+ batch=batch,
2506
+ job_queue_id=job_queue_id,
2507
+ workflow_type='docker',
2508
+ nextflow_profile=None,
2509
+ nextflow_version=None,
2510
+ instance_type=instance_type,
2511
+ instance_disk=instance_disk,
2512
+ storage_mode=storage_mode,
2513
+ lustre_size=lustre_size,
2514
+ execution_platform=execution_platform,
2515
+ hpc_id=None,
2516
+ cost_limit=cost_limit,
2517
+ verify=verify_ssl,
2518
+ command=cmd,
2519
+ cpus=cpus,
2520
+ memory=memory)
2521
+
2522
+ print(f'\tYour assigned job id is: {j_id}\n')
2523
+ j_url = f'{cloudos_url}/app/advanced-analytics/analyses/{j_id}'
2524
+ if wait_completion:
2525
+ print('\tPlease, wait until job completion (max wait time of ' +
2526
+ f'{wait_time} seconds).\n')
2527
+ j_status = j.wait_job_completion(job_id=j_id,
2528
+ wait_time=wait_time,
2529
+ request_interval=request_interval,
2530
+ verbose=False,
2531
+ verify=verify_ssl)
2532
+ j_name = j_status['name']
2533
+ j_final_s = j_status['status']
2534
+ if j_final_s == JOB_COMPLETED:
2535
+ print(f'\nJob status for job "{j_name}" (ID: {j_id}): {j_final_s}')
2536
+ sys.exit(0)
2537
+ else:
2538
+ print(f'\nJob status for job "{j_name}" (ID: {j_id}): {j_final_s}')
2539
+ sys.exit(1)
2540
+ else:
2541
+ j_status = j.get_job_status(j_id, verify_ssl)
2542
+ j_status_h = json.loads(j_status.content)["status"]
2543
+ print(f'\tYour current job status is: {j_status_h}')
2544
+ print('\tTo further check your job status you can either go to ' +
2545
+ f'{j_url} or use the following command:\n' +
2546
+ '\tcloudos job status \\\n' +
2547
+ '\t\t--apikey $MY_API_KEY \\\n' +
2548
+ f'\t\t--cloudos-url {cloudos_url} \\\n' +
2549
+ f'\t\t--job-id {j_id}\n')
2550
+
2551
+
2180
2552
  @datasets.command(name="ls")
2181
2553
  @click.argument("path", required=False, nargs=1)
2182
2554
  @click.option('-k',
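The column check in `bash array-job` reads the `headers` list from the array-file metadata response (a sample payload is quoted in a comment in the hunk above) and compares each `--array-parameter` value against the header column names. A minimal standalone sketch of that check, assuming the response shape shown in that comment; `list_columns` and `check_array_parameters` are hypothetical helpers, not part of cloudos-cli:

```python
import json

# Sample metadata response, copied from the comment in the diff above.
SAMPLE_RESPONSE = (b'{"headers":[{"index":0,"name":"id"},{"index":1,"name":"title"},'
                   b'{"index":2,"name":"filename"},{"index":3,"name":"file2name"}]}')


def list_columns(content: bytes) -> list[str]:
    """Return the column names from an array-file metadata response."""
    headers = json.loads(content).get("headers")
    if headers is None:
        raise ValueError("No columns found in the array file metadata.")
    return [col["name"] for col in headers]


def check_array_parameters(array_parameter, columns):
    """Raise ValueError if any 'name=column' value is not a header column.

    Hypothetical helper: like the CLI, it splits on the first '=' so that
    column values containing '=' are kept whole.
    """
    for ap in array_parameter:
        _, _, column = ap.partition("=")
        if column not in columns:
            raise ValueError(f"Column '{column}' not found in the array file. "
                             f"Columns in array-file: {', '.join(columns)}")


columns = list_columns(SAMPLE_RESPONSE)
print(columns)  # ['id', 'title', 'filename', 'file2name']
check_array_parameters(("--input=filename",), columns)  # passes silently
```

Building the error message as a single string keeps the column list readable in the exception text.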
@@ -0,0 +1 @@
1
+ __version__ = '2.33.0'
@@ -7,7 +7,9 @@ from typing import Union
7
7
  import json
8
8
  from cloudos_cli.clos import Cloudos
9
9
  from cloudos_cli.utils.errors import BadRequestException
10
- from cloudos_cli.utils.requests import retry_requests_post
10
+ from cloudos_cli.utils.requests import retry_requests_post, retry_requests_get
11
+ from pathlib import Path
12
+ import base64
11
13
 
12
14
 
13
15
  @dataclass
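The newly imported `Path` and `base64` support `retrieve_cols_from_array_file` (added later in this file), which splits the array-file path into folder and file name and sends the file's S3 key base64-encoded in the metadata query string. Standard base64 output can contain `+`, `/` and `=`, which are significant in URLs. A sketch that percent-encodes the same query fields with `urllib.parse.urlencode` might look like this; whether the CloudOS endpoint expects this encoding is an assumption, and the URL, bucket, key, and team ID are placeholders:

```python
import base64
from pathlib import PurePosixPath
from urllib.parse import parse_qs, urlencode, urlsplit


def metadata_url(cloudos_url, separator, bucket, object_key, team_id):
    """Build the array-file metadata URL with a percent-encoded query string."""
    # Same fields as retrieve_cols_from_array_file; the key is sent base64-encoded.
    key_b64 = base64.b64encode(object_key.encode()).decode()
    query = urlencode({
        "separator": separator,
        "s3BucketName": bucket,
        "s3ObjectKey": key_b64,
        "teamId": team_id,
    })
    return f"{cloudos_url}/api/v1/jobs/array-file/metadata?{query}"


# Split a project path into folder and file name, as the method does with Path.
p = PurePosixPath("Data/arrays/samples.csv")
print(p.parent, p.name)  # Data/arrays samples.csv

url = metadata_url("https://cloudos.example.com", ",", "my-bucket",
                   "projects/abc/samples.csv", "team-123")
# Round trip: decoding the query gives back the original object key.
query = parse_qs(urlsplit(url).query)
print(base64.b64decode(query["s3ObjectKey"][0]).decode())  # projects/abc/samples.csv
```

`PurePosixPath` is used here so the folder string keeps forward slashes on any operating system; `Path` yields backslash-separated strings on Windows.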
@@ -174,6 +176,8 @@ class Job(Cloudos):
174
176
  def convert_nextflow_to_json(self,
175
177
  job_config,
176
178
  parameter,
179
+ array_parameter,
180
+ array_file_header,
177
181
  is_module,
178
182
  example_parameters,
179
183
  git_commit,
@@ -214,6 +218,15 @@ class Job(Cloudos):
214
218
  parameter : tuple
215
219
  Tuple of strings indicating the parameters to pass to the pipeline call.
216
220
  They are in the following form: ('param1=param1val', 'param2=param2val', ...)
221
+ array_parameter : tuple
222
+ Tuple of strings indicating the parameters to pass to the pipeline call
223
+ for array jobs. They are in the following form: ('param1=param1val', 'param2=param2val', ...)
224
+ array_file_header : list
225
+ The header of the array file, as a list of dicts with "name" and "index" keys. It is used to
226
+ add the necessary column index for array file columns.
227
+ is_module : bool
228
+ Whether the job is a module or not. If True, the job will be
229
+ submitted as a module.
217
230
  example_parameters : list
218
231
  A list of dicts, with the parameters required for the API request in JSON format.
219
232
  It is typically used to run curated pipelines using the already available
@@ -353,6 +366,13 @@ class Job(Cloudos):
353
366
  if len(workflow_params) == 0:
354
367
  raise ValueError(f'The {job_config} file did not contain any ' +
355
368
  'valid parameter')
369
+
370
+ # array file specific parameters (from --array-parameter)
371
+ if array_parameter is not None and len(array_parameter) > 0:
372
+ ap_param = Job.split_array_file_params(array_parameter, workflow_type, array_file_header)
373
+ workflow_params.append(ap_param)
374
+
375
+ # general parameters (from --parameter)
356
376
  if len(parameter) > 0:
357
377
  for p in parameter:
358
378
  p_split = p.split('=')
@@ -439,7 +459,7 @@ class Job(Cloudos):
439
459
  "diskSizeInGb": azure_worker_instance_disk
440
460
  }
441
461
  if workflow_type == 'docker':
442
- params['command'] = command
462
+ params = params | command # add command to params as dict (python 3.9+)
443
463
  params["resourceRequirements"] = {
444
464
  "cpu": cpus,
445
465
  "ram": memory
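For Docker jobs, `command` is now a dict (built by `setup_params_array_file`) that is merged into `params` with the dict union operator `|`, which requires Python 3.9 or later. The short sketch below shows the semantics with illustrative placeholder values: on a key clash the right-hand operand wins, the left operand is not mutated, and `{**a, **b}` gives the same result on older interpreters.

```python
# Illustrative values only.
params = {"name": "my-job", "command": "echo old"}
command = {
    "command": "bash run.sh",
    "arrayFile": {"dataItem": {"kind": "File", "item": "file-id"}, "separator": ","},
}

merged = params | command       # Python 3.9+: on key clashes the right operand wins
legacy = {**params, **command}  # equivalent spelling for Python < 3.9

print(merged["command"])  # bash run.sh
print(list(merged))       # ['name', 'command', 'arrayFile']
```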
@@ -465,6 +485,8 @@ class Job(Cloudos):
465
485
  def send_job(self,
466
486
  job_config=None,
467
487
  parameter=(),
488
+ array_parameter=(),
489
+ array_file_header=None,
468
490
  is_module=False,
469
491
  example_parameters=[],
470
492
  git_commit=None,
@@ -504,6 +526,12 @@ class Job(Cloudos):
504
526
  parameter : tuple
505
527
  Tuple of strings indicating the parameters to pass to the pipeline call.
506
528
  They are in the following form: ('param1=param1val', 'param2=param2val', ...)
529
+ array_parameter : tuple
530
+ Tuple of strings indicating the parameters to pass to the pipeline call
531
+ for array jobs. They are in the following form: ('param1=param1val', 'param2=param2val', ...)
532
+ array_file_header : list
533
+ The header of the array file, as a list of dicts with "name" and "index" keys. It is used to
534
+ add the necessary column index for array file columns.
507
535
  example_parameters : list
508
536
  A list of dicts, with the parameters required for the API request in JSON format.
509
537
  It is typically used to run curated pipelines using the already available
@@ -590,6 +618,8 @@ class Job(Cloudos):
590
618
  }
591
619
  params = self.convert_nextflow_to_json(job_config,
592
620
  parameter,
621
+ array_parameter,
622
+ array_file_header,
593
623
  is_module,
594
624
  example_parameters,
595
625
  git_commit,
@@ -630,3 +660,177 @@ class Job(Cloudos):
630
660
  print('\tJob successfully launched to CloudOS, please check the ' +
631
661
  f'following link: {cloudos_url}/app/advanced-analytics/analyses/{j_id}')
632
662
  return j_id
663
+
664
+ def retrieve_cols_from_array_file(self, array_file, ds, separator, verify_ssl):
665
+ """
666
+ Retrieve metadata for columns from an array file stored in a directory.
667
+
668
+ This method fetches the metadata of an array file by interacting with a directory service
669
+ and making an API call to retrieve the file's metadata.
670
+
671
+ Parameters
672
+ ----------
673
+ array_file : str
674
+ The path to the array file whose metadata is to be retrieved.
675
+ ds : object
676
+ The directory service object used to list folder content.
677
+ separator : str
678
+ The separator used in the array file.
679
+ verify_ssl : bool
680
+ Whether to verify SSL certificates during the API request.
681
+
682
+ Raises
683
+ ------
684
+ ValueError
685
+ If the specified file is not found in the directory.
686
+ BadRequestException
687
+ If the API request to retrieve metadata fails with a status code >= 400.
688
+
689
+ Returns
690
+ -------
691
+ Response
692
+ The HTTP response object containing the metadata of the array file.
693
+ """
694
+ # Split the array_file path to get the directory and file name
695
+ p = Path(array_file)
696
+ directory = str(p.parent)
697
+ file_name = p.name
698
+
699
+ # fetch the content of the directory
700
+ result = ds.list_folder_content(directory)
701
+
702
+ # retrieve the S3 bucket name and object key for the specified file
703
+ for file in result['files']:
704
+ if file.get("name") == file_name:
705
+ self.array_file_id = file.get("_id")
706
+ s3_bucket_name = file.get("s3BucketName")
707
+ s3_object_key = file.get("s3ObjectKey")
708
+ s3_object_key_b64 = base64.b64encode(s3_object_key.encode()).decode()
709
+ break
710
+ else:
711
+ raise ValueError(f'File "{file_name}" not found in the "{directory}" folder of the project "{self.project_name}".')
712
+
713
+ # retrieve the metadata of the array file
714
+ headers = {
715
+ "Content-type": "application/json",
716
+ "apikey": self.apikey
717
+ }
718
+ url = (
719
+ f"{self.cloudos_url}/api/v1/jobs/array-file/metadata"
720
+ f"?separator={separator}"
721
+ f"&s3BucketName={s3_bucket_name}"
722
+ f"&s3ObjectKey={s3_object_key_b64}"
723
+ f"&teamId={self.workspace_id}"
724
+ )
725
+ r = retry_requests_get(url, headers=headers, verify=verify_ssl)
726
+ if r.status_code >= 400:
727
+ raise BadRequestException(r)
728
+
729
+ return r
730
+
731
+ def setup_params_array_file(self, custom_script_path, ds_custom, command, separator):
732
+ """
733
+ Sets up a dictionary representing command parameters, including support for custom scripts
734
+ and array files, to be used in job execution.
735
+
736
+ Parameters
737
+ ----------
738
+ custom_script_path : str
739
+ Path to the custom script file. If None, the command is treated as text.
740
+ ds_custom : object
741
+ An object providing access to folder content listing functionality.
742
+ command : str
743
+ The command to be executed, either as text or the name of a custom script.
744
+ separator : str
745
+ The separator to be used for the array file.
746
+
747
+ Returns
748
+ -------
749
+ dict
750
+ A dictionary containing the command parameters, including:
751
+ - "command": The command name or text.
752
+ - "customScriptFile" (optional): Details of the custom script file if provided.
753
+ - "arrayFile": Details of the array file and its separator.
754
+ """
755
+ if custom_script_path is not None:
756
+ command_path = Path(custom_script_path)
757
+ command_dir = str(command_path.parent)
758
+ command_name = command_path.name
759
+ result_script = ds_custom.list_folder_content(command_dir)
760
+ for file in result_script['files']:
761
+ if file.get("name") == command_name:
762
+ custom_script_item = file.get("_id")
763
+ break
764
+ # use this in case the command is in a custom script
765
+ cmd = {
766
+ "command": f"{command_name}",
767
+ "customScriptFile": {
768
+ "dataItem": {
769
+ "kind": "File",
770
+ "item": f"{custom_script_item}"
771
+ }
772
+ }
773
+ }
774
+ else:
775
+ # use this for text commands
776
+ cmd = {"command": command}
777
+
778
+ # add array-file
779
+ cmd = cmd | {
780
+ "arrayFile": {
781
+ "dataItem": {"kind": "File", "item": f"{self.array_file_id}"},
782
+ "separator": f"{separator}"
783
+ }
784
+ }
785
+
786
+ return cmd
787
+
788
+ @staticmethod
789
+ def split_array_file_params(array_parameter, workflow_type, array_file_header):
790
+ """
791
+ Splits and processes array parameters for a given workflow type and array file header.
792
+
793
+ Parameters
794
+ ----------
795
+ array_parameter : list
796
+ A list of strings representing array parameters in the format "key=value".
797
+ workflow_type : str
798
+ The type of workflow, e.g., 'docker'.
799
+ array_file_header : list
800
+ A list of dictionaries representing the header of the array file.
801
+ Each dictionary should contain "name" and "index" keys.
802
+
803
+ Returns
804
+ -------
805
+ dict
806
+ A dictionary containing processed parameter details, including:
807
+ - prefix (str): The prefix for the parameter (e.g., "--" or "-").
808
+ - name (str): The name of the parameter with leading dashes stripped.
809
+ - parameterKind (str): The kind of parameter, set to "arrayFileColumn".
810
+ - columnName (str): The name of the column derived from the parameter value.
811
+ - columnIndex (int): The index of the column in the array file header.
812
+
813
+ Raises
814
+ ------
815
+ ValueError
816
+ If an array parameter does not contain a '=' character or is improperly formatted.
817
+ """
818
+ ap_param = dict()
819
+ for ap in array_parameter:
820
+ ap_split = ap.split('=')
821
+ if len(ap_split) < 2:
822
+ raise ValueError('Please, specify -a / --array-parameter using a single \'=\' ' +
823
+ 'as spacer. E.g: input=value')
824
+ ap_name = ap_split[0]
825
+ ap_value = '='.join(ap_split[1:])
826
+ if workflow_type == 'docker':
827
+ ap_prefix = "--" if ap_name.startswith('--') else ("-" if ap_name.startswith('-') else '')
828
+ ap_param = {
829
+ "prefix": ap_prefix,
830
+ "name": ap_name.lstrip('-'),
831
+ "parameterKind": "arrayFileColumn",
832
+ "columnName": ap_value,
833
+ "columnIndex": next((item["index"] for item in (array_file_header or []) if item["name"] == ap_value), 0)
834
+ }
835
+
836
+ return ap_param
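As written, `split_array_file_params` reassigns `ap_param` on every loop iteration and `convert_nextflow_to_json` appends a single dict, so when `-a/--array-parameter` is given more than once only the last entry reaches `workflow_params`. A sketch of a variant that returns one entry per array parameter and looks up each `columnIndex` by that entry's own column name; `split_array_file_params_all` is a hypothetical name, and a caller would need `workflow_params.extend(...)` rather than `append(...)`:

```python
def split_array_file_params_all(array_parameter, workflow_type, array_file_header):
    """Return one arrayFileColumn entry per 'name=column' array parameter."""
    # Map each header column name to its index, e.g. {"id": 0, "bgen": 1}.
    index_by_name = {col["name"]: col["index"] for col in (array_file_header or [])}
    entries = []
    for ap in array_parameter:
        name, sep, column = ap.partition("=")
        if not sep:
            raise ValueError("Please, specify -a / --array-parameter using '=' "
                             "as separator. E.g: input=value")
        if workflow_type != "docker":
            continue  # the original only builds entries for 'docker' workflows
        prefix = "--" if name.startswith("--") else ("-" if name.startswith("-") else "")
        entries.append({
            "prefix": prefix,
            "name": name.lstrip("-"),
            "parameterKind": "arrayFileColumn",
            "columnName": column,
            # Fall back to 0 when no header is available (e.g. column check disabled).
            "columnIndex": index_by_name.get(column, 0),
        })
    return entries


header = [{"index": 0, "name": "id"}, {"index": 1, "name": "bgen"}, {"index": 2, "name": "csv"}]
params = split_array_file_params_all(("--file=bgen", "-t=csv"), "docker", header)
print([(p["prefix"], p["name"], p["columnIndex"]) for p in params])
# [('--', 'file', 1), ('-', 't', 2)]
```

Building the name-to-index dict once avoids rescanning the header for every parameter.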
@@ -7,5 +7,6 @@ from .requests import retry_requests_get, retry_requests_post, retry_requests_pu
7
7
  from .resources import format_bytes, ssl_selector
8
8
  from .cloud import find_cloud
9
9
  from .cloud import find_cloud
10
+ from .details import get_path
10
11
 
11
- __all__ = ['errors', 'requests', 'resources', 'cloud']
12
+ __all__ = ['errors', 'requests', 'resources', 'cloud', 'details']
@@ -0,0 +1,66 @@
1
+ def get_path(param, param_kind_map, execution_platform, storage_provider, mode="parameters"):
2
+ """
3
+ Constructs a storage path based on the parameter kind and execution platform.
4
+
5
+ Parameters
6
+ ----------
7
+ param : dict
8
+ A dictionary containing parameter details. Expected keys include:
9
+ - 'parameterKind': Specifies the kind of parameter (e.g., 'dataItem', 'globPattern').
10
+ - For 'dataItem': Contains nested keys such as 'item', which includes:
11
+ - 's3BucketName', 's3ObjectKey', 's3Prefix' (for AWS Batch).
12
+ - 'blobStorageAccountName', 'blobContainerName', 'blobName' (for other platforms).
13
+ - For 'globPattern': Contains nested keys such as 'folder', which includes:
14
+ - 's3BucketName', 's3Prefix' (for AWS Batch).
15
+ - 'blobStorageAccountName', 'blobContainerName', 'blobPrefix' (for other platforms).
16
+ param_kind_map : dict
17
+ A mapping of parameter kinds to their corresponding keys in the `param` dictionary.
18
+ execution_platform : str
19
+ The platform on which the execution is taking place.
20
+ Expected values include "Batch AWS" or other non-AWS platforms.
21
+ storage_provider : str
22
+ Either s3:// or az://
23
+ mode : str
24
+ For "parameters" is creating the '*.config' file and it adds the complete path, for "asis"
25
+ leaves the constructed path as generated from the API
26
+
27
+ Returns
28
+ -------
29
+ str: A constructed storage path based on the parameter kind and execution platform.
30
+ - For 'dataItem' on AWS Batch: "s3BucketName/s3ObjectKey" or "s3BucketName/s3Prefix".
31
+ - For 'dataItem' on other platforms: "blobStorageAccountName/blobContainerName/blobName".
32
+ - For 'globPattern' on AWS Batch: "s3BucketName/s3Prefix/globPattern".
33
+ - For 'globPattern' on other platforms: "blobStorageAccountName/blobContainerName/blobPrefix/globPattern".
34
+ """
35
+ value = param[param_kind_map[param['parameterKind']]]
36
+ if param['parameterKind'] == 'dataItem':
37
+ if execution_platform == "Batch AWS":
38
+ s3_object_key = value['item'].get('s3ObjectKey', None) if value['item'].get('s3Prefix', None) is None else value['item'].get('s3Prefix', None)
39
+ if mode == "parameters":
40
+ value = storage_provider + value['item']['s3BucketName'] + '/' + s3_object_key
41
+ else:
42
+ value = value['item']['s3BucketName'] + '/' + s3_object_key
43
+ else:
44
+ account_name = value['item']['blobStorageAccountName'] + ".blob.core.windows.net"
45
+ container_name = value['item']['blobContainerName']
46
+ blob_name = value['item']['blobName']
47
+ if mode == "parameters":
48
+ value = storage_provider + account_name + '/' + container_name + '/' + blob_name
49
+ else:
50
+ value = value['item']['blobStorageAccountName'] + '/' + container_name + '/' + blob_name
51
+ elif param['parameterKind'] == 'globPattern':
52
+ if execution_platform == "Batch AWS":
53
+ if mode == "parameters":
54
+ value = storage_provider + param['folder']['s3BucketName'] + '/' + param['folder']['s3Prefix'] + '/' + param['globPattern']
55
+ else:
56
+ value = param['folder']['s3BucketName'] + '/' + param['folder']['s3Prefix'] + '/' + param['globPattern']
57
+ else:
58
+ account_name = param['folder']['blobStorageAccountName'] + ".blob.core.windows.net"
59
+ container_name = param['folder']['blobContainerName']
60
+ blob_name = param['folder']['blobPrefix']
61
+ if mode == "parameters":
62
+ value = storage_provider + account_name + '/' + container_name + '/' + blob_name + '/' + param['globPattern']
63
+ else:
64
+ value = param['folder']['blobStorageAccountName'] + '/' + container_name + '/' + blob_name + '/' + param['globPattern']
65
+
66
+ return value
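To make the return values of `get_path` concrete, the sketch below reproduces only its `dataItem` branch with hypothetical inputs. It assumes `param_kind_map` maps `'dataItem'` to the `'dataItem'` key and that `storage_provider` is `'s3://'` or `'az://'`, as the docstring states. Any platform other than `"Batch AWS"` takes the Azure branch; `"Azure"` below is a placeholder, and the bucket, key, and account names are made up.

```python
def data_item_path(param, execution_platform, storage_provider, mode="parameters"):
    """Simplified copy of get_path's 'dataItem' branch, for illustration only."""
    item = param["dataItem"]["item"]  # assumes param_kind_map["dataItem"] == "dataItem"
    if execution_platform == "Batch AWS":
        prefix = item.get("s3Prefix")
        # A folder prefix, when present, is used instead of the object key.
        key = item.get("s3ObjectKey") if prefix is None else prefix
        path = f"{item['s3BucketName']}/{key}"
        return storage_provider + path if mode == "parameters" else path
    account = item["blobStorageAccountName"]
    container, blob = item["blobContainerName"], item["blobName"]
    if mode == "parameters":
        return f"{storage_provider}{account}.blob.core.windows.net/{container}/{blob}"
    return f"{account}/{container}/{blob}"


aws = {"parameterKind": "dataItem",
       "dataItem": {"item": {"s3BucketName": "my-bucket", "s3ObjectKey": "data/sample.bam"}}}
azure = {"parameterKind": "dataItem",
         "dataItem": {"item": {"blobStorageAccountName": "myaccount",
                               "blobContainerName": "data", "blobName": "sample.bam"}}}

print(data_item_path(aws, "Batch AWS", "s3://"))             # s3://my-bucket/data/sample.bam
print(data_item_path(azure, "Azure", "az://"))               # az://myaccount.blob.core.windows.net/data/sample.bam
print(data_item_path(azure, "Azure", "az://", mode="asis"))  # myaccount/data/sample.bam
```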
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: cloudos_cli
3
- Version: 2.32.0
3
+ Version: 2.33.0
4
4
  Summary: Python package for interacting with CloudOS
5
5
  Home-page: https://github.com/lifebit-ai/cloudos-cli
6
6
  Author: David Piñeyro
@@ -420,6 +420,98 @@ command.
420
420
  Other options like `--wait-completion` are also available and work in the same way as for the `cloudos job run` command.
421
421
  Check `cloudos bash job --help` for more details.
422
422
 
423
+ #### Send a bash array-job to CloudOS (parallel sample processing)
424
+
425
+ When running a bash array job, the following options are available to customize the behavior:
426
+
427
+ ##### Array File
428
+ - **`--array-file`**: Specifies the path to the array file, a delimited file whose header names the columns that can be referenced in the bash array job. This option is **required** when using the command `bash array-job`.
429
+
430
+ ##### Separator
431
+ - **`--separator`**: Defines the separator to use in the array file. Supported separators include:
432
+ - `,` (comma)
433
+ - `;` (semicolon)
434
+ - `tab`
435
+ - `space`
436
+ - `|` (pipe)
437
+ This option is **required** when using the command `bash array-job`.
438
+
439
+ ##### List Columns
440
+ - **`--list-columns`**: Lists the columns available in the array file, which is useful for inspecting the structure of the file. This flag disables sending the job: it only prints the column list, one per line:
441
+
442
+ ```console
443
+ Columns:
444
+ - column1
445
+ - column2
446
+ - column3
447
+ ```
448
+
449
+ ##### Array File Project
450
+ - **`--array-file-project`**: Specifies the name of the project in which the array file is placed, if it is different from the project specified by `--project-name`.
451
+
452
+ ##### Disable Column Check
453
+ - **`--disable-column-check`**: Disables the validation of columns in the array file, so `--array-parameter` values are not checked against the header of the `--array-file`. For example, without `--disable-column-check`, `--array-parameter --bar=foo` requires the array file header to contain a column named 'foo'; if that column is missing, the CLI raises an error. With `--disable-column-check`, the check is skipped and the bash array job is sent to the platform.
454
+
455
+ > [!NOTE]
456
+ > Adding `--disable-column-check` lets the CLI command complete without errors, but errors might still appear when the job runs on the platform if the columns named with `--array-parameter` do not exist in the array file.
457
+
458
+ ##### Array Parameter
459
+ - **`-a` / `--array-parameter`**: Specifies a column name present in the header of the array file. Each parameter should be in the format `array_parameter_name=array_file_column`. For example:
460
+ - `-a --test=value` or
461
+ - `--array-parameter -test=value`
462
+ Both examples refer to a column named 'value' in the array file header. Array parameters that name a column missing from the header cause an error unless `--disable-column-check` is used. This option can be used multiple times to include as many array parameters as needed. This type of parameter is similar to `-p, --parameter`: both can be interpolated in the bash array job command (either with `--command` or `--custom-script-path`), but `--array-parameter` can only name a column present in the header of the array file.
463
+
464
+ For example, suppose the array file has the following header and rows:
465
+
466
+ ```console
467
+ id,bgen,csv
468
+ 1,s3://data/adipose.bgen,s3://data/adipose.csv
469
+ 2,s3://data/blood.bgen,s3://data/blood.csv
470
+ 3,s3://data/brain.bgen,s3://data/brain.csv
471
+ ...
472
+ ```
473
+ If the command needs to iterate over the `bgen` column, specify `--array-parameter file=bgen`, which refers to that column in the header.
474
+
475
+ ##### Custom Script Path
476
+ - **`--custom-script-path`**: Specifies the path to a custom script to run in the bash array job instead of a command. When this option is used, `--command` is ignored. To ensure the script runs successfully, you must either:
477
+
478
+ 1. Use a Shebang Line at the Top of the Script
479
+
480
+ The shebang (`#!`) tells the system which interpreter to use to run the script. The path must match the absolute path of the Python (or other) interpreter installed inside the Docker container.
481
+
482
+ Examples:
483
+ - `#!/usr/bin/python3` for Python scripts
484
+ - `#!/usr/bin/Rscript` for R scripts
485
+ - `#!/bin/bash` for Bash scripts
486
+
487
+ Example Python Script:
488
+
489
+ ```python
490
+ #!/usr/bin/python3
491
+ print("Hello world")
492
+ ```
493
+
494
+ 2. Or invoke the script through its interpreter in the executable command
495
+
496
+ If your script doesn’t have a shebang line, you can execute it by explicitly specifying the interpreter in the executable command:
497
+
498
+ ```console
499
+ python my_script.py
500
+ Rscript my_script.R
501
+ bash my_script.sh
502
+ ```
503
+ This assumes the interpreter is available on the container’s `$PATH`. If not, you can use its full absolute path instead:
504
+
505
+ ```console
506
+ /usr/bin/python3 my_script.py
507
+ /usr/local/bin/Rscript my_script.R
508
+ ```
509
+
510
+ ##### Custom Script Project
511
+ - **`--custom-script-project`**: Specifies the name of the project in which the custom script is placed, if it is different from the project specified by `--project-name`.
512
+
513
+ These options provide flexibility for configuring and running bash array jobs, allowing you to tailor their execution to specific requirements.
514
+
423
515
  #### Get path to logs of job from CloudOS
424
516
 
425
517
  Get the path to "Nextflow logs", "Nextflow standard output", and "trace" files. It can be used only on your user's jobs, with any status.
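Before submitting a `bash array-job`, it can help to inspect a local copy of the array file the way `--list-columns` does remotely. A minimal sketch using Python's `csv` module, assuming the separator names documented for `--separator` map to the delimiter characters below (the CLI's own `separators` mapping is not shown in this diff); the file content is the header example from the README section above:

```python
import csv
import io

# Assumed mapping from --separator names to delimiter characters.
SEPARATORS = {",": ",", ";": ";", "tab": "\t", "space": " ", "|": "|"}

ARRAY_FILE = """id,bgen,csv
1,s3://data/adipose.bgen,s3://data/adipose.csv
2,s3://data/blood.bgen,s3://data/blood.csv
3,s3://data/brain.bgen,s3://data/brain.csv
"""


def read_header(text, separator):
    """Return the header row of an array file using the named separator."""
    reader = csv.reader(io.StringIO(text), delimiter=SEPARATORS[separator])
    return next(reader)


columns = read_header(ARRAY_FILE, ",")
print("Columns: ")
for col in columns:
    print(f"\t- {col}")

# `--array-parameter file=bgen` refers to the 'bgen' column.
assert "bgen" in columns
```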
@@ -23,6 +23,7 @@ cloudos_cli/queue/__init__.py
23
23
  cloudos_cli/queue/queue.py
24
24
  cloudos_cli/utils/__init__.py
25
25
  cloudos_cli/utils/cloud.py
26
+ cloudos_cli/utils/details.py
26
27
  cloudos_cli/utils/errors.py
27
28
  cloudos_cli/utils/requests.py
28
29
  cloudos_cli/utils/resources.py
@@ -1 +0,0 @@
1
- __version__ = '2.32.0'
File without changes
File without changes
File without changes