cloudos-cli 2.32.1.tar.gz → 2.33.0.tar.gz

Files changed (34)

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: cloudos_cli
-Version: 2.32.1
+Version: 2.33.0
 Summary: Python package for interacting with CloudOS
 Home-page: https://github.com/lifebit-ai/cloudos-cli
 Author: David Piñeyro
@@ -420,6 +420,98 @@ command.
 Other options like `--wait-completion` are also available and work in the same way as for the `cloudos job run` command.
 Check `cloudos bash job --help` for more details.
+#### Send a bash array-job to CloudOS (parallel sample processing)
+When running a bash array job, the following options are available to customize the behavior:
+##### Array File
+- **`--array-file`**: Specifies the path to a file containing a set of columns useful in running the bash job. This option is **required** when using the command `bash array-job`.
+##### Separator
+- **`--separator`**: Defines the separator to use in the array file. Supported separators include:
+    - `,` (comma)
+    - `;` (semicolon)
+    - `tab`
+    - `space`
+    - `|` (pipe)
+This option is **required** when using the command `bash array-job`.
+##### List Columns
+- **`--list-columns`**: Lists the columns available in the array file. This is useful for inspecting the structure of the file. This flag disables sending the job; it only prints the column list, one per line:
+```console
+Columns:
+    - column1
+    - column2
+    - column3
+```
+##### Array File Project
+- **`--array-file-project`**: Specifies the name of the project in which the array file is placed, if it is different from the project specified by `--project-name`.
+##### Disable Column Check
+- **`--disable-column-check`**: Disables the validation of columns in the array file. This means that each `--array-parameter` value is not checked against the header of the `--array-file`. For example, without `--disable-column-check`, `--array-parameter --bar=foo` expects the array file to have a column named 'foo' in its header. If the column is not present, the CLI will throw an error. When the `--disable-column-check` flag is added, the column check is not performed and the bash array job is sent to the platform.
+> [!NOTE]
+> Adding `--disable-column-check` will make the CLI command run without errors, but errors might still appear when checking the job in the platform if the columns named by `--array-parameter` do not exist in the array file.
+##### Array Parameter
+- **`-a` / `--array-parameter`**: Specifies a column name present in the header of the array file. Each parameter should be in the format `array_parameter_name=array_file_column`. For example:
+    - `-a --test=value` or
+    - `--array-parameter -test=value`
+specify a column named 'value' in the array file header. Adding array parameters not present in the header will cause an error. This option can be used multiple times to include as many array parameters as needed. This type of parameter is similar to `-p, --parameter`: both can be interpolated in the bash array job command (either with `--command` or `--custom-script-path`), but this one can only be used to name a column present in the header of the array file.
+For example, suppose the array file has the following content:
+```console
+id,bgen,csv
+1,s3://data/adipose.bgen,s3://data/adipose.csv
+2,s3://data/blood.bgen,s3://data/blood.csv
+3,s3://data/brain.bgen,s3://data/brain.csv
+...
+```
+If the command needs to iterate over the `bgen` column, specify `--array-parameter file=bgen`, which refers to that column in the header.
+##### Custom Script Path
+- **`--custom-script-path`**: Specifies the path to a custom script to run in the bash array job instead of a command. When this option is used, the `--command` parameter is ignored. To ensure the script runs successfully, you must either:
+1. Use a Shebang Line at the Top of the Script
+The shebang (`#!`) tells the system which interpreter to use to run the script. The path should match the absolute path to Python or another interpreter installed inside the Docker container.
+Examples:
+`#!/usr/bin/python3` --> for Python scripts
+`#!/usr/bin/Rscript` --> for R scripts
+`#!/bin/bash`        --> for Bash scripts
+Example Python Script:
+```python
+#!/usr/bin/python3
+print("Hello world")
+```
+2. Or use an interpreter command in the executable field
+If your script doesn’t have a shebang line, you can execute it by explicitly specifying the interpreter in the executable command:
+```console
+python my_script.py
+Rscript my_script.R
+bash my_script.sh
+```
+This assumes the interpreter is available on the container’s $PATH. If not, you can use the full absolute path instead:
+```console
+/usr/bin/python3 my_script.py
+/usr/local/bin/Rscript my_script.R
+```
+##### Custom Script Project
+- **`--custom-script-project`**: Specifies the name of the project in which the custom script is placed, if it is different from the project specified by `--project-name`.
+These options provide flexibility for configuring and running bash array jobs, allowing you to tailor the execution to specific requirements.
 #### Get path to logs of job from CloudOS
 Get the path to "Nextflow logs", "Nextflow standard output", and "trace" files. It can be used only on your user's jobs, with any status.
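
As an illustration of what `--list-columns` reports, here is a minimal local sketch in Python. It is not the CLI's implementation: according to the diff, the CLI retrieves the header from the platform API rather than parsing the file itself. The `SEPARATOR_CHARS` mapping and both helper functions are hypothetical names introduced only for this example.

```python
import csv
import io

# Hypothetical mapping from the CLI separator names to the literal
# characters found in a local file (an assumption for this sketch).
SEPARATOR_CHARS = {",": ",", ";": ";", "tab": "\t", "space": " ", "|": "|"}


def list_columns(text, separator=","):
    """Return the column names found in the first line of an array file."""
    reader = csv.reader(io.StringIO(text), delimiter=SEPARATOR_CHARS[separator])
    return next(reader)


def format_columns(columns):
    """Format column names in the layout shown under 'List Columns'."""
    lines = ["Columns:"]
    lines += [f"    - {name}" for name in columns]
    return "\n".join(lines)


sample = (
    "id,bgen,csv\n"
    "1,s3://data/adipose.bgen,s3://data/adipose.csv\n"
)
print(format_columns(list_columns(sample)))
```

Running a check like this locally before submitting can catch a wrong `--separator` early, since a mismatched separator typically yields a single column containing the whole header line.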

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/README.md RENAMED

@@ -385,6 +385,98 @@ command.
 Other options like `--wait-completion` are also available and work in the same way as for the `cloudos job run` command.
 Check `cloudos bash job --help` for more details.
+#### Send a bash array-job to CloudOS (parallel sample processing)
+When running a bash array job, the following options are available to customize the behavior:
+##### Array File
+- **`--array-file`**: Specifies the path to a file containing a set of columns useful in running the bash job. This option is **required** when using the command `bash array-job`.
+##### Separator
+- **`--separator`**: Defines the separator to use in the array file. Supported separators include:
+    - `,` (comma)
+    - `;` (semicolon)
+    - `tab`
+    - `space`
+    - `|` (pipe)
+This option is **required** when using the command `bash array-job`.
+##### List Columns
+- **`--list-columns`**: Lists the columns available in the array file. This is useful for inspecting the structure of the file. This flag disables sending the job; it only prints the column list, one per line:
+```console
+Columns:
+    - column1
+    - column2
+    - column3
+```
+##### Array File Project
+- **`--array-file-project`**: Specifies the name of the project in which the array file is placed, if it is different from the project specified by `--project-name`.
+##### Disable Column Check
+- **`--disable-column-check`**: Disables the validation of columns in the array file. This means that each `--array-parameter` value is not checked against the header of the `--array-file`. For example, without `--disable-column-check`, `--array-parameter --bar=foo` expects the array file to have a column named 'foo' in its header. If the column is not present, the CLI will throw an error. When the `--disable-column-check` flag is added, the column check is not performed and the bash array job is sent to the platform.
+> [!NOTE]
+> Adding `--disable-column-check` will make the CLI command run without errors, but errors might still appear when checking the job in the platform if the columns named by `--array-parameter` do not exist in the array file.
+##### Array Parameter
+- **`-a` / `--array-parameter`**: Specifies a column name present in the header of the array file. Each parameter should be in the format `array_parameter_name=array_file_column`. For example:
+    - `-a --test=value` or
+    - `--array-parameter -test=value`
+specify a column named 'value' in the array file header. Adding array parameters not present in the header will cause an error. This option can be used multiple times to include as many array parameters as needed. This type of parameter is similar to `-p, --parameter`: both can be interpolated in the bash array job command (either with `--command` or `--custom-script-path`), but this one can only be used to name a column present in the header of the array file.
+For example, suppose the array file has the following content:
+```console
+id,bgen,csv
+1,s3://data/adipose.bgen,s3://data/adipose.csv
+2,s3://data/blood.bgen,s3://data/blood.csv
+3,s3://data/brain.bgen,s3://data/brain.csv
+...
+```
+If the command needs to iterate over the `bgen` column, specify `--array-parameter file=bgen`, which refers to that column in the header.
+##### Custom Script Path
+- **`--custom-script-path`**: Specifies the path to a custom script to run in the bash array job instead of a command. When this option is used, the `--command` parameter is ignored. To ensure the script runs successfully, you must either:
+1. Use a Shebang Line at the Top of the Script
+The shebang (`#!`) tells the system which interpreter to use to run the script. The path should match the absolute path to Python or another interpreter installed inside the Docker container.
+Examples:
+`#!/usr/bin/python3` --> for Python scripts
+`#!/usr/bin/Rscript` --> for R scripts
+`#!/bin/bash`        --> for Bash scripts
+Example Python Script:
+```python
+#!/usr/bin/python3
+print("Hello world")
+```
+2. Or use an interpreter command in the executable field
+If your script doesn’t have a shebang line, you can execute it by explicitly specifying the interpreter in the executable command:
+```console
+python my_script.py
+Rscript my_script.R
+bash my_script.sh
+```
+This assumes the interpreter is available on the container’s $PATH. If not, you can use the full absolute path instead:
+```console
+/usr/bin/python3 my_script.py
+/usr/local/bin/Rscript my_script.R
+```
+##### Custom Script Project
+- **`--custom-script-project`**: Specifies the name of the project in which the custom script is placed, if it is different from the project specified by `--project-name`.
+These options provide flexibility for configuring and running bash array jobs, allowing you to tailor the execution to specific requirements.
 #### Get path to logs of job from CloudOS
 Get the path to "Nextflow logs", "Nextflow standard output", and "trace" files. It can be used only on your user's jobs, with any status.
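
The column check described under "Disable Column Check" and "Array Parameter" can be sketched as follows. This is a simplified stand-alone illustration, not the package's code: `check_array_parameters` is a hypothetical helper, and the header is passed in as a plain list of column names instead of being fetched from the platform.

```python
def check_array_parameters(array_parameters, header_columns):
    """Map each array parameter name to its column, failing on unknown columns.

    array_parameters: strings of the form 'name=column', as given to -a.
    header_columns:   column names from the array file header.
    """
    mapping = {}
    for ap in array_parameters:
        # Split on the first '=' only, so a column name may itself contain '='.
        name, _, column = ap.partition("=")
        if column not in header_columns:
            raise ValueError(
                f"Column '{column}' not found in the array file. "
                f"Columns in array-file: {', '.join(header_columns)}"
            )
        mapping[name] = column
    return mapping


header = ["id", "bgen", "csv"]
print(check_array_parameters(["file=bgen"], header))
```

With the sample file above, `-a file=bgen` passes the check, while something like `-a file=vcf` would be rejected unless `--disable-column-check` is set.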

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/__main__.py RENAMED

@@ -16,7 +16,7 @@ from rich.table import Table
 from cloudos_cli.datasets import Datasets
 from cloudos_cli.utils.resources import ssl_selector, format_bytes
 from rich.style import Style
-from cloudos_cli.utils.details import get_path
+from cloudos_cli.utils.details import get_path
 # GLOBAL VARS
@@ -82,7 +82,8 @@ def run_cloudos_cli(ctx):
                 'list': shared_config
             },
             'bash': {
-                'job': shared_config
+                'job': shared_config,
+                'array-job': shared_config
             },
             'datasets': {
                 'ls': shared_config,
@@ -128,7 +129,8 @@ def run_cloudos_cli(ctx):
                 'list': shared_config
             },
             'bash': {
-                'job': shared_config
+                'job': shared_config,
+                'array-job': shared_config
             },
             'datasets': {
                 'ls': shared_config,
@@ -2185,6 +2187,368 @@ def run_bash_job(ctx,
               f'\t\t--job-id {j_id}\n')
+@bash.command('array-job')
+@click.option('-k',
+              '--apikey',
+              help='Your CloudOS API key',
+              required=True)
+@click.option('--command',
+              help='The command to run in the bash job.')
+@click.option('-c',
+              '--cloudos-url',
+              help=(f'The CloudOS url you are trying to access to. Default={CLOUDOS_URL}.'),
+              default=CLOUDOS_URL)
+@click.option('--workspace-id',
+              help='The specific CloudOS workspace id.',
+              required=True)
+@click.option('--project-name',
+              help='The name of a CloudOS project.',
+              required=True)
+@click.option('--workflow-name',
+              help='The name of a CloudOS workflow or pipeline.',
+              required=True)
+@click.option('-p',
+              '--parameter',
+              multiple=True,
+              help=('A single parameter to pass to the job call. It should be in the ' +
+                    'following form: parameter_name=parameter_value. E.g.: ' +
+                    '-p --test=value or -p -test=value or -p test=value. You can use this option as many ' +
+                    'times as parameters you want to include.'))
+@click.option('--job-name',
+              help='The name of the job. Default=new_job.',
+              default='new_job')
+@click.option('--do-not-save-logs',
+              help=('Avoids process log saving. If you select this option, your job process ' +
+                    'logs will not be stored.'),
+              is_flag=True)
+@click.option('--job-queue',
+              help='Name of the job queue to use with a batch job.')
+@click.option('--instance-type',
+              help=('The type of compute instance to use as master node. ' +
+                    'Default=c5.xlarge(aws)|Standard_D4as_v4(azure).'),
+              default='NONE_SELECTED')
+@click.option('--instance-disk',
+              help='The disk space of the master node instance, in GB. Default=500.',
+              type=int,
+              default=500)
+@click.option('--cpus',
+              help='The number of CPUs to use for the task\'s master node. Default=1.',
+              type=int,
+              default=1)
+@click.option('--memory',
+              help='The amount of memory, in GB, to use for the task\'s master node. Default=4.',
+              type=int,
+              default=4)
+@click.option('--storage-mode',
+              help=('Either \'lustre\' or \'regular\'. Indicates if the user wants to select ' +
+                    'regular or lustre storage. Default=regular.'),
+              default='regular')
+@click.option('--lustre-size',
+              help=('The lustre storage to be used when --storage-mode=lustre, in GB. It should ' +
+                    'be 1200 or a multiple of it. Default=1200.'),
+              type=int,
+              default=1200)
+@click.option('--wait-completion',
+              help=('Whether to wait to job completion and report final ' +
+                    'job status.'),
+              is_flag=True)
+@click.option('--wait-time',
+              help=('Max time to wait (in seconds) to job completion. ' +
+                    'Default=3600.'),
+              default=3600)
+@click.option('--repository-platform', type=click.Choice(["github", "gitlab", "bitbucketServer"]),
+              help='Name of the repository platform of the workflow. Default=github.',
+              default='github')
+@click.option('--execution-platform',
+              help='Name of the execution platform implemented in your CloudOS. Default=aws.',
+              type=click.Choice(['aws', 'azure', 'hpc']),
+              default='aws')
+@click.option('--cost-limit',
+              help='Add a cost limit to your job. Default=30.0 (For no cost limit please use -1).',
+              type=float,
+              default=30.0)
+@click.option('--request-interval',
+              help=('Time interval to request (in seconds) the job status. ' +
+                    'For large jobs is important to use a high number to ' +
+                    'make fewer requests so that is not considered spamming by the API. ' +
+                    'Default=30.'),
+              default=30)
+@click.option('--disable-ssl-verification',
+              help=('Disable SSL certificate verification. Please, remember that this option is ' +
+                    'not generally recommended for security reasons.'),
+              is_flag=True)
+@click.option('--ssl-cert',
+              help='Path to your SSL certificate file.')
+@click.option('--profile', help='Profile to use from the config file', default=None)
+@click.option('--array-file',
+              help=('Path to a file containing an array of commands to run in the bash job.'),
+              default=None,
+              required=True)
+@click.option('--separator',
+              help=('Separator to use in the array file. Default=",".'),
+              type=click.Choice([',', ';', 'tab', 'space', '|']),
+              default=",",
+              required=True)
+@click.option('--list-columns',
+              help=('List columns present in the array file. ' +
+                    'This option will not run any job.'),
+              is_flag=True)
+@click.option('--array-file-project',
+              help=('Name of the project in which the array file is placed, if different from --project-name.'),
+              default=None)
+@click.option('--disable-column-check',
+              help=('Disable the check for the columns in the array file. ' +
+                    'This option is only used when --array-file is provided.'),
+              is_flag=True)
+@click.option('-a', '--array-parameter',
+              multiple=True,
+              help=('A single parameter to pass to the job call only for specifying array parameter. It should be in the ' +
+                    'following form: parameter_name=parameter_value. E.g.: ' +
+                    '-a --test=value or -a -test=value or -a test=value. You can use this option as many ' +
+                    'times as parameters you want to include.'))
+@click.option('--custom-script-path',
+              help=('Path of a custom script to run in the bash array job instead of a command.'),
+              default=None)
+@click.option('--custom-script-project',
+              help=('Name of the project to use when running the custom command script, if ' +
+                    'different than --project-name.'),
+              default=None)
+@click.pass_context
+def run_bash_array_job(ctx,
+                       apikey,
+                       command,
+                       cloudos_url,
+                       workspace_id,
+                       project_name,
+                       workflow_name,
+                       parameter,
+                       job_name,
+                       do_not_save_logs,
+                       job_queue,
+                       instance_type,
+                       instance_disk,
+                       cpus,
+                       memory,
+                       storage_mode,
+                       lustre_size,
+                       wait_completion,
+                       wait_time,
+                       repository_platform,
+                       execution_platform,
+                       cost_limit,
+                       request_interval,
+                       disable_ssl_verification,
+                       ssl_cert,
+                       profile,
+                       array_file,
+                       separator,
+                       list_columns,
+                       array_file_project,
+                       disable_column_check,
+                       array_parameter,
+                       custom_script_path,
+                       custom_script_project):
+    """Run a bash array job in CloudOS."""
+    profile = profile or ctx.default_map['bash']['array-job']['profile']
+    # Create a dictionary with required and non-required params
+    required_dict = {
+        'apikey': True,
+        'workspace_id': True,
+        'workflow_name': True,
+        'project_name': True
+    }
+    # determine if the user provided all required parameters
+    config_manager = ConfigurationProfile()
+    apikey, cloudos_url, workspace_id, workflow_name, repository_platform, execution_platform, project_name = (
+        config_manager.load_profile_and_validate_data(
+            ctx,
+            INIT_PROFILE,
+            CLOUDOS_URL,
+            profile=profile,
+            required_dict=required_dict,
+            apikey=apikey,
+            cloudos_url=cloudos_url,
+            workspace_id=workspace_id,
+            workflow_name=workflow_name,
+            repository_platform=repository_platform,
+            execution_platform=execution_platform,
+            project_name=project_name
+        )
+    )
+    verify_ssl = ssl_selector(disable_ssl_verification, ssl_cert)
+    if not list_columns and not (command or custom_script_path):
+        raise click.UsageError("Must provide --command or --custom-script-path if --list-columns is not set.")
+    # when not set, use the global project name
+    if array_file_project is None:
+        array_file_project = project_name
+    # this needs to be in another call to datasets, by default it uses the global project name
+    if custom_script_project is None:
+        custom_script_project = project_name
+    # setup separators for API and array file (they're different)
+    separators = {
+        ",": { "api": ",", "file": "," },
+        ";": { "api": "%3B", "file": ";" },
+        "space": { "api": "+", "file": " " },
+        "tab": { "api": "tab", "file": "tab" },
+        "|": { "api": "%7C", "file": "|" }
+    }
+    # Setup datasets
+    try:
+        ds = Datasets(
+            cloudos_url=cloudos_url,
+            apikey=apikey,
+            workspace_id=workspace_id,
+            project_name=array_file_project,
+            verify=verify_ssl,
+            cromwell_token=None
+        )
+        if custom_script_project is not None:
+            # If a custom script project is specified, create a new Datasets object for it
+            # This allows the user to run custom scripts in a different project
+            ds_custom = Datasets(
+                cloudos_url=cloudos_url,
+                apikey=apikey,
+                workspace_id=workspace_id,
+                project_name=custom_script_project,
+                verify=verify_ssl,
+                cromwell_token=None
+            )
+    except BadRequestException as e:
+        if 'Forbidden' in str(e):
+            print('[Error] It seems your call is not authorised. Please check if ' +
+                  'your workspace is restricted by Airlock and if your API key is valid.')
+            sys.exit(1)
+        else:
+            raise e
+    # setup important options for the job
+    if do_not_save_logs:
+        save_logs = False
+    else:
+        save_logs = True
+    if instance_type == 'NONE_SELECTED':
+        if execution_platform == 'aws':
+            instance_type = 'c5.xlarge'
+        elif execution_platform == 'azure':
+            instance_type = 'Standard_D4as_v4'
+        else:
+            instance_type = None
+    j = jb.Job(cloudos_url, apikey, None, workspace_id, project_name, workflow_name,
+               mainfile=None, importsfile=None,
+               repository_platform=repository_platform, verify=verify_ssl)
+    # retrieve columns
+    r = j.retrieve_cols_from_array_file(array_file, ds, separators[separator]['api'], verify_ssl)
+    if not disable_column_check:
+        columns = json.loads(r.content).get("headers", None)
+        # pass this to the SEND JOB API call
+        # b'{"headers":[{"index":0,"name":"id"},{"index":1,"name":"title"},{"index":2,"name":"filename"},{"index":3,"name":"file2name"}]}'
+        if columns is None:
+            raise ValueError("No columns found in the array file metadata.")
+        if list_columns:
+            print("Columns: ")
+            for col in columns:
+                print(f"\t- {col['name']}")
+            return
+    else:
+        columns = []
+    # setup parameters for the job
+    cmd = j.setup_params_array_file(custom_script_path, ds_custom, command, separators[separator]['file'])
+    # check columns in the array file vs parameters added
+    if not disable_column_check and array_parameter:
+        print("\nChecking columns in the array file vs parameters added...\n")
+        for ap in array_parameter:
+            ap_split = ap.split('=')
+            ap_value = '='.join(ap_split[1:])
+            for col in columns:
+                if col['name'] == ap_value:
+                    print(f"Found column '{ap_value}' in the array file.")
+                    break
+            else:
+                raise ValueError(f"Column '{ap_value}' not found in the array file. " +
+                                 "Columns in array-file: " + f"{separator}".join([col['name'] for col in columns]))
+    if job_queue is not None:
+        batch = True
+        queue = Queue(cloudos_url=cloudos_url, apikey=apikey, cromwell_token=None,
+                      workspace_id=workspace_id, verify=verify_ssl)
+        # I have to add 'nextflow', otherwise the job queue id is not found
+        job_queue_id = queue.fetch_job_queue_id(workflow_type='nextflow', batch=batch,
+                                                job_queue=job_queue)
+    else:
+        job_queue_id = None
+        batch = False
+    # send job
+    j_id = j.send_job(job_config=None,
+                      parameter=parameter,
+                      array_parameter=array_parameter,
+                      array_file_header=columns,
+                      git_commit=None,
+                      git_tag=None,
+                      git_branch=None,
+                      job_name=job_name,
+                      resumable=False,
+                      save_logs=save_logs,
+                      batch=batch,
+                      job_queue_id=job_queue_id,
+                      workflow_type='docker',
+                      nextflow_profile=None,
+                      nextflow_version=None,
+                      instance_type=instance_type,
+                      instance_disk=instance_disk,
+                      storage_mode=storage_mode,
+                      lustre_size=lustre_size,
+                      execution_platform=execution_platform,
+                      hpc_id=None,
+                      cost_limit=cost_limit,
+                      verify=verify_ssl,
+                      command=cmd,
+                      cpus=cpus,
+                      memory=memory)
+    print(f'\tYour assigned job id is: {j_id}\n')
+    j_url = f'{cloudos_url}/app/advanced-analytics/analyses/{j_id}'
+    if wait_completion:
+        print('\tPlease, wait until job completion (max wait time of ' +
+              f'{wait_time} seconds).\n')
+        j_status = j.wait_job_completion(job_id=j_id,
+                                         wait_time=wait_time,
+                                         request_interval=request_interval,
+                                         verbose=False,
+                                         verify=verify_ssl)
+        j_name = j_status['name']
+        j_final_s = j_status['status']
+        if j_final_s == JOB_COMPLETED:
+            print(f'\nJob status for job "{j_name}" (ID: {j_id}): {j_final_s}')
+            sys.exit(0)
+        else:
+            print(f'\nJob status for job "{j_name}" (ID: {j_id}): {j_final_s}')
+            sys.exit(1)
+    else:
+        j_status = j.get_job_status(j_id, verify_ssl)
+        j_status_h = json.loads(j_status.content)["status"]
+        print(f'\tYour current job status is: {j_status_h}')
+        print('\tTo further check your job status you can either go to ' +
+              f'{j_url} or use the following command:\n' +
+              '\tcloudos job status \\\n' +
+              '\t\t--apikey $MY_API_KEY \\\n' +
+              f'\t\t--cloudos-url {cloudos_url} \\\n' +
+              f'\t\t--job-id {j_id}\n')
 @datasets.command(name="ls")
 @click.argument("path", required=False, nargs=1)
 @click.option('-k',
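
The `api` values in the `separators` dictionary added to `run_bash_array_job` match the URL-encoded forms of the corresponding characters. The sketch below reproduces them with Python's standard `urllib.parse`. The diff hardcodes these values, so deriving them this way is only an assumption made for illustration, and `api_separator` is a hypothetical helper name.

```python
from urllib.parse import quote, quote_plus


def api_separator(sep):
    """Return a query-string-safe form of a separator option value."""
    if sep == "tab":
        # The diff passes the literal word 'tab' through unchanged.
        return "tab"
    if sep == "space":
        # quote_plus encodes a space as '+'.
        return quote_plus(" ")
    # Keep ',' unescaped, percent-encode everything else that needs it.
    return quote(sep, safe=",")


for option in [",", ";", "space", "tab", "|"]:
    print(f"{option!r} -> {api_separator(option)!r}")
```

This also explains why the dictionary keeps separate `api` and `file` entries: the first is used in the request that retrieves the header, the second is embedded in the job command.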

cloudos_cli-2.33.0/cloudos_cli/_version.py ADDED

@@ -0,0 +1 @@
+__version__ = '2.33.0'

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/jobs/job.py RENAMED

@@ -7,7 +7,9 @@ from typing import Union
 import json
 from cloudos_cli.clos import Cloudos
 from cloudos_cli.utils.errors import BadRequestException
-from cloudos_cli.utils.requests import retry_requests_post
+from cloudos_cli.utils.requests import retry_requests_post, retry_requests_get
+from pathlib import Path
+import base64
 @dataclass
@@ -174,6 +176,8 @@ class Job(Cloudos):
     def convert_nextflow_to_json(self,
                                  job_config,
                                  parameter,
+                                 array_parameter,
+                                 array_file_header,
                                  is_module,
                                  example_parameters,
                                  git_commit,
@@ -214,6 +218,15 @@ class Job(Cloudos):
         parameter : tuple
             Tuple of strings indicating the parameters to pass to the pipeline call.
             They are in the following form: ('param1=param1val', 'param2=param2val', ...)
+        array_parameter : tuple
+            Tuple of strings indicating the parameters to pass to the pipeline call
+            for array jobs. They are in the following form: ('param1=param1val', 'param2=param2val', ...)
+        array_file_header : list
+            The header of the array file, as a list of dicts with the 'index' and
+            'name' of each column. It is used to add the necessary column index
+            for array file columns.
+        is_module : bool
+            Whether the job is a module or not. If True, the job will be
+            submitted as a module.
         example_parameters : list
             A list of dicts, with the parameters required for the API request in JSON format.
             It is typically used to run curated pipelines using the already available
@@ -353,6 +366,13 @@ class Job(Cloudos):
             if len(workflow_params) == 0:
                 raise ValueError(f'The {job_config} file did not contain any ' +
                                  'valid parameter')
+        # array file specific parameters (from --array-parameter)
+        if array_parameter is not None and len(array_parameter) > 0:
+            ap_param = Job.split_array_file_params(array_parameter, workflow_type, array_file_header)
+            workflow_params.append(ap_param)
+        # general parameters (from --parameter)
         if len(parameter) > 0:
             for p in parameter:
                 p_split = p.split('=')
@@ -439,7 +459,7 @@ class Job(Cloudos):
                 "diskSizeInGb": azure_worker_instance_disk
             }
         if workflow_type == 'docker':
-            params['command'] = command
+            params = params | command  # add command to params as dict (python 3.9+)
             params["resourceRequirements"] = {
                 "cpu": cpus,
                 "ram": memory
@@ -465,6 +485,8 @@ class Job(Cloudos):
     def send_job(self,
                  job_config=None,
                  parameter=(),
+                 array_parameter=(),
+                 array_file_header=None,
                  is_module=False,
                  example_parameters=[],
                  git_commit=None,
@@ -504,6 +526,12 @@ class Job(Cloudos):
         parameter : tuple
             Tuple of strings indicating the parameters to pass to the pipeline call.
             They are in the following form: ('param1=param1val', 'param2=param2val', ...)
+        array_parameter : tuple
+            Tuple of strings indicating the parameters to pass to the pipeline call
+            for array jobs. They are in the following form: ('param1=param1val', 'param2=param2val', ...)
+        array_file_header : string
+            The header of the file containing the array parameters. It is used to
+            add the necessary column index for array file columns.
         example_parameters : list
             A list of dicts, with the parameters required for the API request in JSON format.
             It is typically used to run curated pipelines using the already available
@@ -590,6 +618,8 @@ class Job(Cloudos):
         }
         params = self.convert_nextflow_to_json(job_config,
                                                parameter,
+                                               array_parameter,
+                                               array_file_header,
                                                is_module,
                                                example_parameters,
                                                git_commit,
@@ -630,3 +660,177 @@ class Job(Cloudos):
         print('\tJob successfully launched to CloudOS, please check the ' +
               f'following link: {cloudos_url}/app/advanced-analytics/analyses/{j_id}')
         return j_id
+    def retrieve_cols_from_array_file(self, array_file, ds, separator, verify_ssl):
+        """
+        Retrieve metadata for columns from an array file stored in a directory.
+        This method fetches the metadata of an array file by interacting with a directory service
+        and making an API call to retrieve the file's metadata.
+        Parameters
+        ----------
+        array_file : str
+            The path to the array file whose metadata is to be retrieved.
+        ds : object
+            The directory service object used to list folder content.
+        separator : str
+            The separator used in the array file.
+        verify_ssl : bool
+            Whether to verify SSL certificates during the API request.
+        Raises
+        ------
+        ValueError
+            If the specified file is not found in the directory.
+        BadRequestException
+            If the API request to retrieve metadata fails with a status code >= 400.
+        Returns
+        -------
+        Response
+            The HTTP response object containing the metadata of the array file.
+        """
+        # Split the array_file path to get the directory and file name
+        p = Path(array_file)
+        directory = str(p.parent)
+        file_name = p.name
+        # fetch the content of the directory
+        result = ds.list_folder_content(directory)
+        # retrieve the S3 bucket name and object key for the specified file
+        for file in result['files']:
+            if file.get("name") == file_name:
+                self.array_file_id = file.get("_id")
+                s3_bucket_name = file.get("s3BucketName")
+                s3_object_key = file.get("s3ObjectKey")
+                s3_object_key_b64 = base64.b64encode(s3_object_key.encode()).decode()
+                break
+        else:
+            raise ValueError(f'File "{file_name}" not found in the "{directory}" folder of the project "{self.project_name}".')
+        # retrieve the metadata of the array file
+        headers = {
+            "Content-type": "application/json",
+            "apikey": self.apikey
+        }
+        url = (
+            f"{self.cloudos_url}/api/v1/jobs/array-file/metadata"
+            f"?separator={separator}"
+            f"&s3BucketName={s3_bucket_name}"
+            f"&s3ObjectKey={s3_object_key_b64}"
+            f"&teamId={self.workspace_id}"
+        )
+        r = retry_requests_get(url, headers=headers, verify=verify_ssl)
+        if r.status_code >= 400:
+            raise BadRequestException(r)
+        return r
+    def setup_params_array_file(self, custom_script_path, ds_custom, command, separator):
+        """
+        Sets up a dictionary representing command parameters, including support for custom scripts
+        and array files, to be used in job execution.
+        Parameters
+        ----------
+        custom_script_path : str
+            Path to the custom script file. If None, the command is treated as text.
+        ds_custom : object
+            An object providing access to folder content listing functionality.
+        command : str
+            The command to be executed, either as text or the name of a custom script.
+        separator : str
+            The separator to be used for the array file.
+        Returns
+        -------
+        dict
+            A dictionary containing the command parameters, including:
+                - "command": The command name or text.
+                - "customScriptFile" (optional): Details of the custom script file if provided.
+                - "arrayFile": Details of the array file and its separator.
+        """
+        if custom_script_path is not None:
+            command_path = Path(custom_script_path)
+            command_dir = str(command_path.parent)
+            command_name = command_path.name
+            result_script = ds_custom.list_folder_content(command_dir)
+            for file in result_script['files']:
+                if file.get("name") == command_name:
+                    custom_script_item = file.get("_id")
+                    break
+            # use this in case the command is in a custom script
+            cmd = {
+                "command": f"{command_name}",
+                "customScriptFile": {
+                    "dataItem": {
+                        "kind": "File",
+                        "item": f"{custom_script_item}"
+                    }
+                }
+            }
+        else:
+            # use this for text commands
+            cmd = {"command": command}
+        # add array-file
+        cmd = cmd | {
+            "arrayFile": {
+                "dataItem": {"kind": "File", "item": f"{self.array_file_id}"},
+                "separator": f"{separator}"
+            }
+        }
+        return cmd
+    @staticmethod
+    def split_array_file_params(array_parameter, workflow_type, array_file_header):
+        """
+        Splits and processes array parameters for a given workflow type and array file header.
+        Parameters
+        ----------
+        array_parameter :   list
+            A list of strings representing array parameters in the format "key=value".
+        workflow_type : str
+            The type of workflow, e.g., 'docker'.
+        array_file_header : list
+            A list of dictionaries representing the header of the array file.
+            Each dictionary should contain "name" and "index" keys.
+        Returns
+        -------
+        dict
+            A dictionary containing processed parameter details, including:
+                - prefix (str): The prefix for the parameter (e.g., "--" or "-").
+                - name (str): The name of the parameter with leading dashes stripped.
+                - parameterKind (str): The kind of parameter, set to "arrayFileColumn".
+                - columnName (str): The name of the column derived from the parameter value.
+                - columnIndex (int): The index of the column in the array file header.
+        Raises
+        ------
+        ValueError
+            If an array parameter does not contain a '=' character or is improperly formatted.
+        """
+        ap_param = dict()
+        for ap in array_parameter:
+            ap_split = ap.split('=')
+            if len(ap_split) < 2:
+                raise ValueError('Please, specify -a / --array-parameter using a single \'=\' ' +
+                                'as spacer. E.g: input=value')
+            ap_name = ap_split[0]
+            ap_value = '='.join(ap_split[1:])
+            if workflow_type == 'docker':
+                ap_prefix = "--" if ap_name.startswith('--') else ("-" if ap_name.startswith('-') else '')
+                ap_param = {
+                    "prefix": ap_prefix,
+                    "name": ap_name.lstrip('-'),
+                    "parameterKind": "arrayFileColumn",
+                    "columnName": ap_value,
+                    "columnIndex": next((item["index"] for item in array_file_header if item["name"] == "id"), 0)
+                }
+        return ap_param
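
As a reading aid for the hunk above, here is a minimal standalone sketch of how a string such as `--file=bgen` could be turned into an `arrayFileColumn` payload. The name `parse_array_parameter` is hypothetical and not part of cloudos-cli. Note two deliberate differences from the diffed code, both of which are assumptions rather than the package's behaviour: the diffed code looks up the index of the column named "id" and returns only the last parameter processed, whereas this sketch looks up the index of the referenced column and returns one dict per parameter.

```python
def parse_array_parameter(array_parameter, array_file_header):
    """Hypothetical helper: map 'name=column' strings to arrayFileColumn dicts."""
    parsed = []
    for ap in array_parameter:
        # Split on the first '=' only, so the column part may itself contain '='.
        name, sep, column = ap.partition('=')
        if not sep or not name or not column:
            raise ValueError(f"Expected 'name=column', got: {ap!r}")
        if name.startswith('--'):
            prefix = '--'
        elif name.startswith('-'):
            prefix = '-'
        else:
            prefix = ''
        # Assumption: use the index of the referenced column, defaulting to 0.
        index = next((h["index"] for h in array_file_header if h["name"] == column), 0)
        parsed.append({
            "prefix": prefix,
            "name": name.lstrip('-'),
            "parameterKind": "arrayFileColumn",
            "columnName": column,
            "columnIndex": index,
        })
    return parsed


# Header shaped as the docstring describes: dicts with "name" and "index" keys.
header = [{"name": "id", "index": 0}, {"name": "bgen", "index": 1}, {"name": "csv", "index": 2}]
result = parse_array_parameter(("--file=bgen", "-c=csv"), header)
print(result)
```

Returning a list keeps every `--array-parameter` occurrence, which matches the README statement that the option can be used multiple times.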

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: cloudos_cli
-Version: 2.32.1
+Version: 2.33.0
 Summary: Python package for interacting with CloudOS
 Home-page: https://github.com/lifebit-ai/cloudos-cli
 Author: David Piñeyro
@@ -420,6 +420,98 @@ command.
 Other options like `--wait-completion` are also available and work in the same way as for the `cloudos job run` command.
 Check `cloudos bash job --help` for more details.
+#### Send a bash array-job to CloudOS (parallel sample processing)
+When running a bash array job, the following options are available to customize the behavior:
+##### Array File
+- **`--array-file`**: Specifies the path to a file containing a set of columns useful in running the bash job. This option is **required** when using the command `bash array-job`.
+##### Separator
+- **`--separator`**: Defines the separator to use in the array file. Supported separators include:
+    - `,` (comma)
+    - `;` (semicolon)
+    - `tab`
+    - `space`
+    - `|` (pipe)
+This option is **required** when using the command `bash array-job`.
+##### List Columns
+- **`--list-columns`**: Lists the columns available in the array file. This is useful for inspecting the structure of the file. This flag disables sending the job; it only prints the column list, one per line:
+```console
+Columns:
+    - column1
+    - column2
+    - column3
+```
+##### Array File Project
+- **`--array-file-project`**: Specifies the name of the project in which the array file is placed, if it is different from the project specified by `--project-name`.
+##### Disable Column Check
+- **`--disable-column-check`**: Disables the validation of columns in the array file. This implies that each `--array-parameter` value is not checked against the header of the `--array-file`. For example, `--array-parameter --bar=foo`, without `--disable-column-check`, expects the array file to have column 'foo' inside the file header. If the column is not present, the CLI will throw an error. When `--disable-column-check` flag is added, the column check is not performed and the bash array job is sent to the platform.
+> [!NOTE]
+> Adding `--disable-column-check` will make the CLI command run without errors, but errors might still appear when checking the job in the platform if the columns specified with `--array-parameter` do not exist in the array file.
+##### Array Parameter
+- **`-a` / `--array-parameter`**: Allows specifying the column name present in the header of the array file. Each parameter should be in the format `array_parameter_name=array_file_column`. For example:
+    - `-a --test=value` or
+    - `--array-parameter -test=value`
+specify a column named 'value' in the array file header. Adding array parameters not present in the header will cause an error. This option can be used multiple times to include as many array parameters as needed. This type of parameter is similar to `-p, --parameter`: both can be interpolated in the bash array job command (either with `--command` or `--custom-script-path`), but an array parameter can only be used to name a column present in the header of the array file.
+For example, the array file has the following header:
+```console
+id,bgen,csv
+1,s3://data/adipose.bgen,s3://data/adipose.csv
+2,s3://data/blood.bgen,s3://data/blood.csv
+3,s3://data/brain.bgen,s3://data/brain.csv
+...
+```
+and the command needs to go over the `bgen` column, this can be specified as `--array-parameter file=bgen`, referring to the column in the header.
+##### Custom Script Path
+- **`--custom-script-path`**: Specifies the path to a custom script to run in the bash array job instead of a command. When this option is used, `--command` is ignored. To ensure the script runs successfully, you must either:
+1. Use a Shebang Line at the Top of the Script
+The shebang (#!) tells the system which interpreter to use to run the script. The path should match the absolute path to Python or another interpreter installed inside the Docker container.
+Examples:
+`#!/usr/bin/python3` --> for Python scripts
+`#!/usr/bin/Rscript` --> for R scripts
+`#!/bin/bash`        --> for Bash scripts
+Example Python Script:
+```python
+#!/usr/bin/python3
+print("Hello world")
+```
+2. Or use an interpreter command in the executable field
+If your script doesn’t have a shebang line, you can execute it by explicitly specifying the interpreter in the executable command:
+```console
+python my_script.py
+Rscript my_script.R
+bash my_script.sh
+```
+This assumes the interpreter is available on the container’s $PATH. If not, you can use the full absolute path instead:
+```console
+/usr/bin/python3 my_script.py
+/usr/local/bin/Rscript my_script.R
+```
+##### Custom Script Project
+- **`--custom-script-project`**: Specifies the name of the project in which the custom script is placed, if it is different from the project specified by `--project-name`.
+These options provide flexibility for configuring and running bash array jobs, allowing you to tailor the execution to specific requirements.
 #### Get path to logs of job from CloudOS
 Get the path to "Nextflow logs", "Nextflow standard output", and "trace" files. It can be used only on your user's jobs, with any status.
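
The separator names, the `--list-columns` output and the column check described in the README hunk above can be illustrated locally. This is only a sketch under stated assumptions, not the CLI's implementation: in the package the column metadata is fetched from the platform rather than parsed on the client. The names `SEPARATORS`, `list_columns` and `check_columns` are hypothetical, as is the mapping of `tab` and `space` to literal characters.

```python
import csv
import io

# Assumption: the 'tab' and 'space' separator names map to these characters.
SEPARATORS = {",": ",", ";": ";", "tab": "\t", "space": " ", "|": "|"}


def list_columns(array_text, separator):
    """Return the column names found in the first row of the array file text."""
    reader = csv.reader(io.StringIO(array_text), delimiter=SEPARATORS[separator])
    return next(reader)


def check_columns(array_parameters, columns):
    """Raise ValueError if a 'name=column' parameter names a missing column."""
    for ap in array_parameters:
        _, _, column = ap.partition("=")
        if column not in columns:
            raise ValueError(f"Column {column!r} not found in header {columns}")


array_text = "id,bgen,csv\n1,s3://data/adipose.bgen,s3://data/adipose.csv\n"
columns = list_columns(array_text, ",")

# Mimic the layout of the --list-columns output shown in the README.
print("Columns:")
for name in columns:
    print(f"    - {name}")

# Passes silently; a parameter such as "--bar=foo" would raise ValueError here.
check_columns(["file=bgen"], columns)
```

Skipping the `check_columns` call corresponds to what `--disable-column-check` is described as doing: the job is sent without validating the column names first.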

cloudos_cli-2.32.1/cloudos_cli/_version.py DELETED

@@ -1 +0,0 @@
-__version__ = '2.32.1'

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/LICENSE RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/__init__.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/clos.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/configure/__init__.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/configure/configure.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/datasets/__init__.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/datasets/datasets.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/import_wf/__init__.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/import_wf/import_wf.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/jobs/__init__.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/queue/__init__.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/queue/queue.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/utils/__init__.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/utils/cloud.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/utils/details.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/utils/errors.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/utils/requests.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli/utils/resources.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/SOURCES.txt RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/dependency_links.txt RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/entry_points.txt RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/requires.txt RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/cloudos_cli.egg-info/top_level.txt RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/setup.cfg RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/setup.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/tests/__init__.py RENAMED

File without changes

{cloudos_cli-2.32.1 → cloudos_cli-2.33.0}/tests/functions_for_pytest.py RENAMED

File without changes
