data-transfer-cli 0.3.2__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data_transfer_cli-0.3.2.dist-info/METADATA ADDED
@@ -0,0 +1,235 @@
1
+ Metadata-Version: 2.3
2
+ Name: data-transfer-cli
3
+ Version: 0.3.2
4
+ Summary: HiDALGO Data Transfer CLI provides commands to transfer data between different data providers and consumers using NIFI pipelines
5
+ License: APL-2.0
6
+ Author: Jesús Gorroñogoitia
7
+ Author-email: jesus.gorronogoitia@eviden.com
8
+ Requires-Python: >=3.11, <4.0
9
+ Classifier: License :: Other/Proprietary License
10
+ Classifier: Programming Language :: Python :: 3
11
+ Classifier: Programming Language :: Python :: 3.11
12
+ Classifier: Programming Language :: Python :: 3.12
13
+ Classifier: Programming Language :: Python :: 3.13
14
+ Requires-Dist: hid_data_transfer_lib (>=0.3.2)
15
+ Requires-Dist: paramiko (>=3.3.1)
16
+ Requires-Dist: pyyaml (>=6.0.2,<7.0.0)
17
+ Requires-Dist: requests (>=2.31.0)
18
+ Description-Content-Type: text/markdown
19
+
20
+ # Hidalgo2 Data Transfer Tool
21
+ This repository contains the implementation of the Hidalgo2 data transfer tool. It uses [Apache NIFI](https://nifi.apache.org/) to transfer data from different data sources to specified targets.
22
+
23
+ ## Features
24
+ This tool is planned to support the following features:
25
+ - transfer datasets from Cloud Providers to HDFS
26
+ - transfer datasets from Cloud Providers to CKAN
27
+ - transfer datasets from/to Hadoop HDFS to/from HPC
28
+ - transfer datasets from/to Hadoop HDFS to/from CKAN
29
+ - transfer datasets from/to a CKAN to/from HPC
30
+ - transfer datasets from/to local filesystem to/from CKAN
31
+
32
+ ## Prototype
33
+ The current prototype supports the following features:
34
+ - transfer datasets from/to Hadoop HDFS to/from HPC
35
+ - transfer datasets from/to Hadoop HDFS to/from CKAN
36
+ - transfer datasets from/to a CKAN to/from HPC
37
+ - transfer datasets from/to local filesystem to/from CKAN
38
+
39
+
40
+ ## Implementation
41
+ The current implementation is based on Python. It is implemented as a CLI that executes a transfer command by creating a NIFI process group out of the workflow definition registered in the NIFI registry. It uses the parameters given in the CLI command invocation to populate a NIFI parameter context that is associated with the created process group. Then, the process group processors are executed once (or until their incoming flowfile queues are empty), one after another, following the group sequence flow, until the flow is completed. To check the status of a transfer command, the CLI offers a check-status command. The Data Transfer CLI tool sends requests to NIFI through its REST API.
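+
+ As an illustration of this flow, the sketch below shows the kind of NIFI REST calls involved. It is hypothetical, not code from this package: the endpoint, credentials and process group id are placeholders, and a plain NIFI login is assumed.
+ ```
+ import requests
+
+ NIFI_API = "https://localhost:8443/nifi-api"  # placeholder endpoint
+
+ # obtain a bearer token from the NIFI access endpoint
+ token = requests.post(
+     f"{NIFI_API}/access/token",
+     data={"username": "<nifi_login>", "password": "<nifi_password>"},
+ ).text
+
+ # query the status of a process group to follow the transfer
+ status = requests.get(
+     f"{NIFI_API}/flow/process-groups/root/status",
+     headers={"Authorization": f"Bearer {token}"},
+ ).json()
+ ```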
42
+
43
+ ## Requirements
44
+ To use the Data Transfer CLI tool, the following are required:
45
+ - **Python3** execution environment
46
+ - **Poetry** python package management tool
47
+ - **NIFI** instance, with either a NIFI service account or a KEYCLOAK account, plus a NIFI server account (for key transfers)
48
+ - **HDFS** instance, with a user Kerberos token (i.e. authenticated Kerberos principal) if required
49
+ - **CKAN** instance, with a user API key
50
+
51
+ Python3 and Poetry must be installed on the computer where the Data Transfer CLI tool will be used.
52
+ To install Poetry, follow [these instructions](https://python-poetry.org/docs/#installing-with-the-official-installer).
53
+
54
+ For a quick download, setup, configuration and execution of the DT CLI, go to the section [Quick Deployment, setup, configuration and execution](#quick-deployment-setup-configuration-and-execution).
55
+
56
+ ## CLI configuration
57
+ ### Configuration file
58
+ Before using the Data Transfer CLI tool, you should configure it to point at the target NIFI. The configuration file is located at *src/dtcli.cfg*.
59
+
60
+ ```
61
+ [Nifi]
62
+ nifi_endpoint=http://localhost:8443
63
+ nifi_upload_folder=/opt/nifi/data/upload
64
+ nifi_download_folder=/opt/nifi/data/download
65
+ nifi_secure_connection=True
66
+
67
+ [Keycloak]
68
+ keycloak_endpoint=https://idm.hidalgo2.eu
69
+ keycloak_client_id=nifi
70
+ keycloak_client_secret=Tkt8BQmTfUkSceknml6HDSbmGyNRik9V
71
+ ```
72
+
73
+ Under the NIFI section:
74
+ - We define the URL of the NIFI service (*nifi_endpoint*)
75
+ - We also specify a folder (*nifi_upload_folder*) on the NIFI server where files are uploaded
76
+ - And another folder (*nifi_download_folder*) from which files are downloaded. These folders must be accessible by the NIFI service (ask the NIFI administrator for details).
77
+ - Additionally, you can set whether the NIFI server listens on a secure HTTPS connection (*nifi_secure_connection*=True) or on non-secure HTTP (*nifi_secure_connection*=False)
78
+
79
+ Under the Keycloak section, you can configure the Keycloak instance integrated with NIFI (see the sketch after this list), specifying:
80
+ - The Keycloak service endpoint (*keycloak_endpoint*)
81
+ - The NIFI client id in Keycloak (*keycloak_client_id*)
82
+ - The NIFI client secret in Keycloak (*keycloak_client_secret*)
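+
+ As a minimal sketch (illustrative only, not the package's actual loading code), this configuration can be read with Python's standard library, using the section and key names shown above:
+ ```
+ import configparser
+
+ cfg = configparser.ConfigParser()
+ cfg.read("src/dtcli.cfg")
+
+ nifi_endpoint = cfg["Nifi"]["nifi_endpoint"]
+ secure = cfg["Nifi"].getboolean("nifi_secure_connection")
+ keycloak_endpoint = cfg["Keycloak"]["keycloak_endpoint"]
+ ```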
83
+
84
+ ### NIFI and Keycloak credentials in environment variables
85
+ We must also specify a user account (username, private key) that grants permission to upload/download files to/from the NIFI server (as required to upload temporary HPC keys or to support local file transfers). This account is provided by the Hidalgo2 infrastructure provider and is user-specific. It is set up in the following environment variables:
86
+ - NIFI_SERVER_USERNAME: `export NIFI_SERVER_USERNAME=<nifi_server_username>`
87
+ - NIFI_SERVER_PRIVATE_KEY: `export NIFI_SERVER_PRIVATE_KEY=<path_to_private_key>`
88
+
89
+ Additionally, a user account granted access to the NIFI service must be specified, using either of the following:
90
+
91
+ #### A) NIFI User Account
92
+ The NIFI account must be configured in the following environment variables:
93
+ - NIFI_LOGIN: `export NIFI_LOGIN=<nifi_login>`
94
+ - NIFI_PASSWORD: `export NIFI_PASSWORD=<nifi_password>`
95
+
96
+ This NIFI account is provided by the NIFI administrator.
97
+
98
+ #### B) Keycloak Account with access to NIFI
99
+ The Keycloak account must be configured in the following environment variables:
100
+ - KEYCLOAK_LOGIN: `export KEYCLOAK_LOGIN=<keycloak_login>`
101
+ - KEYCLOAK_PASSWORD: `export KEYCLOAK_PASSWORD=<keycloak_password>`
102
+
103
+ This Keycloak account is provided by the Keycloak administrator.
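+
+ For illustration, a hypothetical sketch (not the package's actual logic) of resolving these environment variables, preferring the NIFI account (A) and falling back to the Keycloak account (B):
+ ```
+ import os
+
+ if os.environ.get("NIFI_LOGIN") and os.environ.get("NIFI_PASSWORD"):
+     # case A: a NIFI user account
+     login, password = os.environ["NIFI_LOGIN"], os.environ["NIFI_PASSWORD"]
+ elif os.environ.get("KEYCLOAK_LOGIN") and os.environ.get("KEYCLOAK_PASSWORD"):
+     # case B: a Keycloak account with access to NIFI
+     login, password = os.environ["KEYCLOAK_LOGIN"], os.environ["KEYCLOAK_PASSWORD"]
+ else:
+     raise RuntimeError("no NIFI or Keycloak credentials configured")
+ ```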
104
+
105
+ ## Quick Deployment, setup, configuration and execution
106
+ ### From GitLab repository
107
+ 1. Clone Data Transfer CLI repository.
108
+ 2. Set up the hid_data_transfer_lib project.
109
+ Go to folder *hid-data-management/data-transfer/nifi/hid_data_transfer_lib*.
110
+ At the prompt, run: `poetry install && poetry build`
111
+ Note: this is only required while this tool is under development, as data-transfer-cli references the local hid_data_transfer_lib and not the one published on PyPI.
112
+ 3. Set up the data-transfer-cli project with Poetry.
113
+ Go to folder *hid-data-management/data-transfer/nifi/data-transfer-cli*.
114
+ At the prompt, run `./setup.sh`
115
+ 4. Configure your NIFI and Keycloak services by modifying the default DT CLI configuration (preset for the HiDALGO2 NIFI and KEYCLOAK) located at *src/dtcli.cfg*.
116
+ 5. Edit *setenv.sh*. Provide your accounts for KEYCLOAK and the NIFI server. Contact the HiDALGO2 administrator to request them.
117
+ ```
118
+ export NIFI_SERVER_USERNAME="<username>"
119
+ export NIFI_SERVER_PRIVATE_KEY="<relative_path_ssh_private_key>"
120
+ export KEYCLOAK_LOGIN="<username>"
121
+ export KEYCLOAK_PASSWORD="<password>"
122
+ ```
123
+ 6. Run the Data Transfer CLI tool. In this example, we ask it for help: `dtcli -h`
124
+
125
+ ### From PyPI installation
126
+ To be done
127
+
128
+ ## Usage
129
+ The Data Transfer CLI tool can be executed by invoking the command `dtcli`. Add the command's location to your path: either the *data_transfer_cli* folder (when cloned from GitLab) or its installation location (when installed with pip from PyPI):
130
+
131
+ `./dtcli command <arguments>`
132
+
133
+ To get help execute:
134
+
135
+ `./dtcli -h`
136
+
137
+ obtaining:
138
+
139
+ ```
140
+ usage: ['-h'] [-h]
141
+ {check-status,hdfs2hpc,hpc2hdfs,ckan2hdfs,hdfs2ckan,ckan2hpc,hpc2ckan,local2ckan,ckan2local}
142
+ ...
143
+
144
+ positional arguments:
145
+ {check-status,hdfs2hpc,hpc2hdfs,ckan2hdfs,hdfs2ckan,ckan2hpc,hpc2ckan,local2ckan,ckan2local}
146
+ supported commands to transfer data
147
+ check-status check the status of a command
148
+ hdfs2hpc transfer data from HDFS to target HPC
149
+ hpc2hdfs transfer data from HPC to target HDFS
150
+ ckan2hdfs transfer data from CKAN to target HDFS
151
+ hdfs2ckan transfer data from HDFS to a target CKAN
152
+ ckan2hpc transfer data from CKAN to target HPC
153
+ hpc2ckan transfer data from HPC to a target CKAN
154
+ local2ckan transfer data from a local filesystem to a target CKAN
155
+ ckan2local transfer data from CKAN to a local filesystem
156
+
157
+ options:
158
+ -h, --help show this help message and exit
159
+ ```
160
+
161
+ To get help of a particular command:
162
+
163
+ `./dtcli hdfs2hpc -h`
164
+
165
+ obtaining:
166
+
167
+ ```
168
+ usage: ['hdfs2hpc', '-h'] hdfs2hpc [-h] -s DATA_SOURCE [-t DATA_TARGET] [-kpr KERBEROS_PRINCIPAL] [-kp KERBEROS_PASSWORD] -H HPC_HOST [-z HPC_PORT] -u HPC_USERNAME [-p HPC_PASSWORD] [-k HPC_SECRET_KEY] [-P HPC_SECRET_KEY_PASSWORD]
169
+
170
+ options:
171
+ -h, --help show this help message and exit
172
+ -s DATA_SOURCE, --data-source DATA_SOURCE
173
+ HDFS file path
174
+ -t DATA_TARGET, --data-target DATA_TARGET
175
+ [Optional] HPC folder
176
+ -kpr KERBEROS_PRINCIPAL, --kerberos-principal KERBEROS_PRINCIPAL
177
+ [Optional] Kerberos principal (mandatory for a Kerberized HDFS)
178
+ -kp KERBEROS_PASSWORD, --kerberos-password KERBEROS_PASSWORD
179
+ [Optional] Kerberos principal password (mandatory for a Kerberized HDFS)
180
+ -H HPC_HOST, --hpc-host HPC_HOST
181
+ Target HPC ssh host
182
+ -z HPC_PORT, --hpc-port HPC_PORT
183
+ [Optional] Target HPC ssh port
184
+ -u HPC_USERNAME, --hpc-username HPC_USERNAME
185
+ Username for HPC account
186
+ -p HPC_PASSWORD, --hpc-password HPC_PASSWORD
187
+ [Optional] Password for HPC account. Either password or secret key is required
188
+ -k HPC_SECRET_KEY, --hpc-secret-key HPC_SECRET_KEY
189
+ [Optional] Path to HPC secret key. Either password or secret key is required
190
+ -P HPC_SECRET_KEY_PASSWORD, --hpc-secret-key-password HPC_SECRET_KEY_PASSWORD
191
+ [Optional] Password for HPC secret key
192
+ -2fa, --two-factor-authentication
193
+ [Optional] HPC requires 2FA authentication
194
+ ```
195
+
196
+ A common command flow (e.g. transferring data from HDFS to HPC), illustrated by the example after this list, looks like this:
197
+
198
+ - execute the *hdfs2hpc* CLI command to transfer data from an HDFS location (e.g. /users/yosu/data/genome-tags.csv) to a remote HPC (e.g. LUMI, in the $HOME/data folder)
199
+ - check the status of the *hdfs2hpc* transfer (and possible warnings/errors) with the *check-status* CLI command
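+
+ For instance (host, account and paths are illustrative):
+ ```
+ # 1) launch the transfer; dtcli reports a command id (uuid)
+ ./dtcli hdfs2hpc -s /users/yosu/data/genome-tags.csv \
+     -H lumi.csc.fi -u <hpc_username> -k ~/.ssh/<secret_key> -t data
+
+ # 2) check the transfer with the reported uuid
+ ./dtcli check-status -i <command_id>
+ ```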
200
+
201
+ ## Support for HPC clusters that require a 2FA token
202
+ The Data Transfer CLI tool's commands support transferring data to/from HPC clusters that require a 2FA token. These commands offer an optional flag *-2fa* (*--two-factor-authentication*). If set, the command prompts the user (on standard input) for the token when required.
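+
+ For example (all values illustrative), a transfer to a 2FA-protected cluster:
+ ```
+ ./dtcli hpc2hdfs -2fa -s ~/results/output.csv -t /users/<user>/data \
+     -H <hpc_host> -u <hpc_username> -k ~/.ssh/<secret_key>
+ # dtcli prompts on standard input for the 2FA token when needed
+ ```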
203
+
204
+ ## Predefined profiles for data hosts
205
+ To avoid feeding the Data Transfer CLI tool many inputs describing the hosts of the source and target data providers/consumers, the user can define them in the `~/.dtcli/config` YAML file, as shown in the following YAML snippet:
206
+ ```
207
+ # Meluxina
208
+ login.lxp.lu:
209
+ username: u102309
210
+ port: 8822
211
+ secret-key: ~/.ssh/<secret_key>
212
+ secret-key-password: <password>
213
+
214
+ # CKAN
215
+ ckan.hidalgo2.eu:
216
+ api-key: <api-key>
217
+ organization: atos
218
+ dataset: test-dataset
219
+ ```
220
+
221
+ where details for the Meluxina HPC and CKAN are given. For an HPC cluster, provide the HPC host as the key, followed by a colon, and below it, indented, any of the hpc parameters described in the Data Transfer CLI tool help, without the *hpc_* prefix. For instance, if the Data Transfer CLI tool help mentions:
222
+ ```
223
+ -u HPC_USERNAME, --hpc-username HPC_USERNAME
224
+ Username for HPC account
225
+ ```
226
+ that is, *--hpc-username* as the parameter, use *username* as the nested property in the HPC profile's description in the YAML config file, as shown in the example above. Proceed similarly for the other HPC parameters, such as *port*, *password*, *secret-key*, etc.
227
+ The same procedure can be adopted to describe the CKAN host's parameters.
228
+
229
+ Note: the Hidalgo2 HPDA configuration is bundled with the Data Transfer CLI tool implementation and does not need to be included in this config file.
230
+
231
+ Then, when you launch a Data Transfer CLI tool command, any parameter not given on the command line will be retrieved from the config file if the corresponding host entry exists. After that, if the command line is complete (i.e. all required parameters are provided), the command is executed; otherwise the corresponding error is raised.
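+
+ For example, with the profiles above stored in `~/.dtcli/config`, a command can omit the parameters they provide (the data source path is illustrative):
+ ```
+ # username, port and secret key come from the login.lxp.lu entry;
+ # api-key, organization and dataset come from the ckan.hidalgo2.eu entry
+ ./dtcli hpc2ckan -H login.lxp.lu -c ckan.hidalgo2.eu -s results/output.csv
+ ```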
232
+
233
+
234
+
235
+
data_transfer_cli-0.3.2.dist-info/RECORD ADDED
@@ -0,0 +1,12 @@
1
+ src/.env,sha256=vGRFD3D5PPwoMBhtPvqN_1alGOHLsNAOJB_3r34BoFA,16
2
+ src/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
3
+ src/conf/cli.cfg,sha256=Djxan_HjTkqVJOJZWuXt3DsypAgGVPYMj40YWEtjGFs,206
4
+ src/data_transfer_cli.py,sha256=abM6Dpik9-LHpOQT4_GUqZRiXImICJu0YpcYUrCL7sQ,5646
5
+ src/data_transfer_proxy.py,sha256=4dN5RNT_zJyNUaKGxjvyUw-AMdNVnr3uBpzGXgupko4,13877
6
+ src/dtcli.cfg,sha256=4eWBXiKB5WtG9I6jq6mh1u8ZDb73owmzqpfHAv-2K7s,367
7
+ src/parser/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
8
+ src/parser/cli_parser.py,sha256=IxbtaQxs_zqkVBDmLMngJdNR-l8ER5DvO5KtFpowczU,14775
9
+ data_transfer_cli-0.3.2.dist-info/METADATA,sha256=v6hcAsBxa9nk114VDo1MjTczZtozf-_RT1PmfEExwVk,12362
10
+ data_transfer_cli-0.3.2.dist-info/WHEEL,sha256=XbeZDeTWKc1w7CSIyre5aMDU_-PohRwTQceYnisIYYY,88
11
+ data_transfer_cli-0.3.2.dist-info/entry_points.txt,sha256=vQK-69hamV_JVD9AMg1e0HquJCZtjeYxj92ULtYSBso,46
12
+ data_transfer_cli-0.3.2.dist-info/RECORD,,
data_transfer_cli-0.3.2.dist-info/WHEEL ADDED
@@ -0,0 +1,4 @@
1
+ Wheel-Version: 1.0
2
+ Generator: poetry-core 2.1.1
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
data_transfer_cli-0.3.2.dist-info/entry_points.txt ADDED
@@ -0,0 +1,3 @@
1
+ [console_scripts]
2
+ cli=data_transfer_cli:main
3
+
src/.env ADDED
@@ -0,0 +1 @@
1
+ PYTHONPATH=./src
src/__init__.py ADDED
File without changes
src/conf/cli.cfg ADDED
@@ -0,0 +1,8 @@
1
+ [Nifi]
2
+ nifi_endpoint=https://nifi.hidalgo2.eu:9443
3
+
4
+ nifi_server_user_name=yosu
5
+ nifi_server_private_key=~/.ssh/hidalgo2
6
+ nifi_upload_folder=/opt/nifi/data/upload
7
+ nifi_download_folder=/opt/nifi/data/download
8
+
src/data_transfer_cli.py ADDED
@@ -0,0 +1,171 @@
1
+ '''
2
+ Copyright 2024 Eviden
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an 'AS IS' BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
14
+
15
+ CLI tool for data transfer based on Apache NIFI
16
+ Initial PoC features:
17
+ - hdfs2hpc: transfer data from hdfs to target hpc, using sftp processor:
18
+ - inputs:
19
+ - hpc_host: hpc frontend hostname
20
+ - hpc_username: user account name
21
+ - hpc_secret_key_path: user's secret key location
22
+ - data-source: HDFS file path
23
+ - data-target: HPC remote folder
24
+
25
+ - hpc2hdfs: transfer data file from hpc folder to target hdfs folder,
26
+ using sftp processor:
27
+ - inputs:
28
+ - hpc_host: hpc frontend hostname
29
+ - hpc_username: user account name
30
+ - hpc_secret_key_path: user's secret key location
31
+ - data-source: HPC file path
32
+ - data-target: HDFS remote folder
33
+
34
+ - ckan2hpc: transfer data from ckan to target hpc,
35
+ using ckan and sftp processors:
36
+ - inputs:
37
+ - ckan_host: CKAN host endpoint
38
+ - ckan_api_key: CKAN API key
39
+ - ckan_organization: CKAN organization
40
+ - ckan_dataset: CKAN dataset
41
+ - ckan_resource: CKAN resource
42
+ - hpc_host: hpc frontend hostname
43
+ - hpc_username: user account name
44
+ - hpc_secret_key_path: user's secret key location
45
+ - data-target: HPC remote folder
46
+
47
+ - hpc2ckan: transfer data from hpc to target ckan,
48
+ using ckan and sftp processors:
49
+ - inputs:
50
+ - ckan_host: CKAN host endpoint
51
+ - ckan_api_key: CKAN API key
52
+ - ckan_organization: CKAN organization
53
+ - ckan_dataset: CKAN dataset
54
+ - hpc_host: hpc frontend hostname
55
+ - hpc_username: user account name
56
+ - hpc_secret_key_path: user's secret key location
57
+ - data_source: HPC file path
58
+
59
+ - local2ckan: transfer data from a local filesystem to a target ckan
60
+ - inputs:
61
+ - ckan_host: CKAN host endpoint
62
+ - ckan_api_key: CKAN API key
63
+ - ckan_organization: CKAN organization
64
+ - ckan_dataset: CKAN dataset
65
+ - ckan_resource: CKAN resource to receive the data
66
+ - data_source: local file path to the data to transfer
67
+
68
+ - ckan2local: transfer data from ckan to a local filesystem
69
+ - inputs:
70
+ - ckan_host: CKAN host endpoint
71
+ - ckan_api_key: CKAN API key
72
+ - ckan_organization: CKAN organization
73
+ - ckan_dataset: CKAN dataset
74
+ - ckan_resource: CKAN resource to transfer
75
+ - data_target: local target directory where to transfer the resource
76
+
77
+ - check-status: check the execution state of a command
78
+ - inputs:
79
+ - command_id: uuid of command executed
80
+ (uuid is reported after command execution)
81
+
82
+
83
+ This CLI uses a NIFI account to get an access token.
84
+ It uses the NIFI REST API to send requests.
85
+ It uses a predefined and installed HDFS2HPC process group template
86
+ with an associated parameter context.
87
+ '''
88
+
89
+ import sys
90
+ import os
91
+ import threading
92
+ import traceback
93
+ import warnings
94
+
95
+ from hid_data_transfer_lib.exceptions.hid_dt_exceptions import HidDataTransferException
96
+ from hid_data_transfer_lib.conf.hid_dt_configuration import HidDataTransferConfiguration
97
+
98
+ from src.data_transfer_proxy import DataTransferProxy
99
+ from src.parser.cli_parser import CLIParser
100
+
101
+
102
+ warnings.filterwarnings("ignore")
103
+
104
+
105
+ # Get CLI configuration
106
+ os.environ["HID_DT_CONFIG_FILE"] = \
107
+ str(os.path.dirname(os.path.realpath(__file__))) + "/dtcli.cfg"
108
+ config = HidDataTransferConfiguration()
109
+
110
+ # Data Transfer proxy to the library
111
+ dt_proxy = DataTransferProxy(config, True)
112
+
113
+
114
+ class ThreadRaisingExceptions(threading.Thread):
115
+ """Thread class that raises exceptions in the main thread
116
+ when the thread finishes with an exception"""
117
+
118
+ def __init__(self, *args, **kwargs):
119
+ self._exception = None
120
+ self._process_group_id = None
121
+ super().__init__(*args, **kwargs)
122
+
123
+ def run(self):
124
+ try:
125
+ self._process_group_id = self._target(*self._args, **self._kwargs)
126
+ except HidDataTransferException as e:
127
+ self._exception = e
128
+ raise e
129
+
130
+ def join(self, *args, **kwargs):
131
+ super().join(*args, **kwargs)
132
+ if self._exception:
133
+ raise self._exception
134
+ return self._process_group_id
135
+
136
+
137
+ def main(args=None):
138
+ """Main entry point for the Data Transfer CLI"""
139
+ if not args:
140
+ args = sys.argv[1:]
141
+ # Parse arguments
142
+ cli_parser = CLIParser(args)
143
+
144
+ try:
145
+ if len(args) == 0:
146
+ cli_parser.print_help()
147
+ sys.exit(1)
148
+
149
+ # Read user's config file to complete missing arguments with default ones
150
+ args = cli_parser.fill_missing_args_from_config(args)
151
+ args = cli_parser.parse_arguments(args, dt_proxy)
152
+
153
+ # execute the associated command on the data transfer proxy
154
+ thread = ThreadRaisingExceptions(target=args.func, args=(args,))
155
+ thread.start()
156
+ thread.join()
157
+ except HidDataTransferException as e:
158
+ if e.process_group_id():
159
+ sys.stderr.write(
160
+ (
161
+ f"Got error {e} when executing process group "
162
+ f"with id {e.process_group_id()}"
163
+ )
164
+ )
165
+ else:
166
+ traceback.print_exc(file=sys.stderr)
167
+ raise e
168
+
169
+
170
+ if __name__ == "__main__":
171
+ main()
src/data_transfer_proxy.py ADDED
@@ -0,0 +1,331 @@
1
+ """
2
+ Copyright 2024 Eviden
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an 'AS IS' BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
14
+
15
+
16
+ This module defines the NIFI API client class.
17
+ It provides methods to interface the NIFI server to in instantiate templates,
18
+ and run processors in a process group.
19
+ """
20
+
21
+ from hid_data_transfer_lib.exceptions.hid_dt_exceptions import HidDataTransferException
22
+ from hid_data_transfer_lib.conf.hid_dt_configuration import HidDataTransferConfiguration
23
+ from hid_data_transfer_lib.hid_dt_lib import HIDDataTransfer
24
+
25
+
26
+ class DataTransferProxy:
27
+ """interface to the hid_data_transfer_lib to run data transfer commands"""
28
+
29
+ def __init__(
30
+ self, conf: HidDataTransferConfiguration, secure: bool = False
31
+ ) -> None:
32
+ """constructs a data transfer client,"""
33
+ self.__conf = conf
34
+ self.dt_client = HIDDataTransfer(conf=conf, secure=secure)
35
+ self.__logger = self.__conf.logger("nifi.v2.client")
36
+
37
+ def format_args_to_string(self, args):
38
+ """Format dtcli command arguments to string for logging"""
39
+ return " ".join(
40
+ [
41
+ f"ckan_host={args.ckan_host}," if hasattr(args, "ckan_host") else "",
42
+ (
43
+ f"ckan_organization={args.ckan_organization},"
44
+ if hasattr(args, "ckan_organization")
45
+ else ""
46
+ ),
47
+ (
48
+ f"ckan_dataset={args.ckan_dataset},"
49
+ if hasattr(args, "ckan_dataset")
50
+ else ""
51
+ ),
52
+ (
53
+ f"ckan_resource={args.ckan_resource},"
54
+ if hasattr(args, "ckan_resource")
55
+ else ""
56
+ ),
57
+ (
58
+ f"data_source={args.data_source},"
59
+ if hasattr(args, "data_source")
60
+ else ""
61
+ ),
62
+ (
63
+ f"data_target={args.data_target},"
64
+ if hasattr(args, "data_target")
65
+ else ""
66
+ ),
67
+ f"hpc_host={args.hpc_host}," if hasattr(args, "hpc_host") else "",
68
+ f"hpc_port={args.hpc_port}," if hasattr(args, "hpc_port") else "",
69
+ (
70
+ f"hpc_username={args.hpc_username},"
71
+ if hasattr(args, "hpc_username")
72
+ else ""
73
+ ),
74
+ (
75
+ f"hpc_secret_key={args.hpc_secret_key},"
76
+ if hasattr(args, "hpc_secret_key")
77
+ else ""
78
+ ),
79
+ f"command_id={args.command_id}," if hasattr(args, "command_id") else "",
80
+ ]
81
+ )
82
+
83
+ # MAIN CLI commands
84
+
85
+ def hdfs2hpc(self, args) -> str:
86
+ """transfer data from HDFS to hpc using SFTP"""
87
+ self.__logger.info(
88
+ "executing hdfs2hpc command with args: %s", self.format_args_to_string(args)
89
+ )
90
+ try:
91
+ # Check if 2FA is enabled
92
+ if args.two_factor_authentication:
93
+ return self.dt_client.hdfs2hpc_2fa(
94
+ hpc_host=args.hpc_host,
95
+ hpc_port=args.hpc_port,
96
+ hpc_username=args.hpc_username,
97
+ hpc_secret_key_path=args.hpc_secret_key,
98
+ hpc_secret_key_password=args.hpc_secret_key_password,
99
+ data_source=args.data_source,
100
+ data_target=args.data_target,
101
+ kerberos_principal=args.kerberos_principal,
102
+ kerberos_password=args.kerberos_password,
103
+ )
104
+ return self.dt_client.hdfs2hpc(
105
+ hpc_host=args.hpc_host,
106
+ hpc_port=args.hpc_port,
107
+ hpc_username=args.hpc_username,
108
+ hpc_password=args.hpc_password,
109
+ hpc_secret_key_path=args.hpc_secret_key,
110
+ hpc_secret_key_password=args.hpc_secret_key_password,
111
+ data_source=args.data_source,
112
+ data_target=args.data_target,
113
+ kerberos_principal=args.kerberos_principal,
114
+ kerberos_password=args.kerberos_password,
115
+ )
116
+
117
+ except Exception as ex:
118
+ raise HidDataTransferException(ex) from ex
119
+
120
+ def hpc2hdfs(self, args) -> str:
121
+ """transfer data from HPC to hdfs using SFTP"""
122
+ self.__logger.info(
123
+ "executing hpc2hdfs command with args: %s", self.format_args_to_string(args)
124
+ )
125
+ try:
126
+ # Check if 2FA is enabled
127
+ if args.two_factor_authentication:
128
+ return self.dt_client.hpc2hdfs_2fa(
129
+ hpc_host=args.hpc_host,
130
+ hpc_port=args.hpc_port,
131
+ hpc_username=args.hpc_username,
132
+ hpc_secret_key_path=args.hpc_secret_key,
133
+ hpc_secret_key_password=args.hpc_secret_key_password,
134
+ data_source=args.data_source,
135
+ data_target=args.data_target,
136
+ kerberos_principal=args.kerberos_principal,
137
+ kerberos_password=args.kerberos_password,
138
+ )
139
+ return self.dt_client.hpc2hdfs(
140
+ hpc_host=args.hpc_host,
141
+ hpc_port=args.hpc_port,
142
+ hpc_username=args.hpc_username,
143
+ hpc_password=args.hpc_password,
144
+ hpc_secret_key_path=args.hpc_secret_key,
145
+ hpc_secret_key_password=args.hpc_secret_key_password,
146
+ data_source=args.data_source,
147
+ data_target=args.data_target,
148
+ kerberos_principal=args.kerberos_principal,
149
+ kerberos_password=args.kerberos_password,
150
+ )
151
+
152
+ except Exception as ex:
153
+ raise HidDataTransferException(ex) from ex
154
+
155
+ def hdfs2ckan(self, args) -> str:
156
+ """transfer data from HDFS to CKAN using SFTP"""
157
+ self.__logger.info(
158
+ "executing hpc2ckan command with args: %s", self.format_args_to_string(args)
159
+ )
160
+ try:
161
+ return self.dt_client.hdfs2ckan(
162
+ ckan_host=args.ckan_host
163
+ if args.ckan_host.startswith("https") else f"https://{args.ckan_host}",
164
+ ckan_api_key=args.ckan_api_key,
165
+ ckan_organization=args.ckan_organization,
166
+ ckan_dataset=args.ckan_dataset,
167
+ data_source=args.data_source,
168
+ kerberos_principal=args.kerberos_principal,
169
+ kerberos_password=args.kerberos_password,
170
+ )
171
+
172
+ except Exception as ex:
173
+ raise HidDataTransferException(ex) from ex
174
+
175
+ def ckan2hdfs(self, args) -> str:
176
+ """transfer data from CKAN to HPC using SFTP"""
177
+ self.__logger.info(
178
+ "executing ckan2hpc command with args: %s", self.format_args_to_string(args)
179
+ )
180
+ try:
181
+ return self.dt_client.ckan2hdfs(
182
+ ckan_host=args.ckan_host
183
+ if args.ckan_host.startswith("https") else f"https://{args.ckan_host}",
184
+ ckan_api_key=args.ckan_api_key,
185
+ ckan_organization=args.ckan_organization,
186
+ ckan_dataset=args.ckan_dataset,
187
+ ckan_resource=args.ckan_resource,
188
+ data_target=args.data_target,
189
+ kerberos_principal=args.kerberos_principal,
190
+ kerberos_password=args.kerberos_password,
191
+ )
192
+
193
+ except Exception as ex:
194
+ raise HidDataTransferException(ex) from ex
195
+
196
+ def hpc2ckan(self, args) -> str:
197
+ """transfer data from hpc to CKAN using SFTP"""
198
+ self.__logger.info(
199
+ "executing hpc2ckan command with args: %s", self.format_args_to_string(args)
200
+ )
201
+ try:
202
+ # Check if 2FA is enabled
203
+ if args.two_factor_authentication:
204
+ return self.dt_client.hpc2ckan_2fa(
205
+ ckan_host=args.ckan_host
206
+ if args.ckan_host.startswith("https")
207
+ else f"https://{args.ckan_host}",
208
+ ckan_api_key=args.ckan_api_key,
209
+ ckan_organization=args.ckan_organization,
210
+ ckan_dataset=args.ckan_dataset,
211
+ hpc_host=args.hpc_host,
212
+ hpc_port=args.hpc_port,
213
+ hpc_username=args.hpc_username,
214
+ hpc_secret_key_path=args.hpc_secret_key,
215
+ hpc_secret_key_password=args.hpc_secret_key_password,
216
+ data_source=args.data_source,
217
+ )
218
+ return self.dt_client.hpc2ckan(
219
+ ckan_host=args.ckan_host
220
+ if args.ckan_host.startswith("https") else f"https://{args.ckan_host}",
221
+ ckan_api_key=args.ckan_api_key,
222
+ ckan_organization=args.ckan_organization,
223
+ ckan_dataset=args.ckan_dataset,
224
+ hpc_host=args.hpc_host,
225
+ hpc_port=args.hpc_port,
226
+ hpc_username=args.hpc_username,
227
+ hpc_password=args.hpc_password,
228
+ hpc_secret_key_path=args.hpc_secret_key,
229
+ hpc_secret_key_password=args.hpc_secret_key_password,
230
+ data_source=args.data_source,
231
+ )
232
+
233
+ except Exception as ex:
234
+ raise HidDataTransferException(ex) from ex
235
+
236
+ def ckan2hpc(self, args) -> str:
237
+ """transfer data from CKAN to hpc using SFTP"""
238
+ self.__logger.info(
239
+ "executing ckan2hpc command with args: %s", self.format_args_to_string(args)
240
+ )
241
+ try:
242
+ # Check if 2FA is enabled
243
+ if args.two_factor_authentication:
244
+ return self.dt_client.ckan2hpc_2fa(
245
+ ckan_host=args.ckan_host
246
+ if args.ckan_host.startswith("https")
247
+ else f"https://{args.ckan_host}",
248
+ ckan_api_key=args.ckan_api_key,
249
+ ckan_organization=args.ckan_organization,
250
+ ckan_dataset=args.ckan_dataset,
251
+ ckan_resource=args.ckan_resource,
252
+ hpc_host=args.hpc_host,
253
+ hpc_port=args.hpc_port,
254
+ hpc_username=args.hpc_username,
255
+ hpc_secret_key_path=args.hpc_secret_key,
256
+ hpc_secret_key_password=args.hpc_secret_key_password,
257
+ data_target=args.data_target,
258
+ )
259
+ return self.dt_client.ckan2hpc(
260
+ ckan_host=args.ckan_host
261
+ if args.ckan_host.startswith("https") else f"https://{args.ckan_host}",
262
+ ckan_api_key=args.ckan_api_key,
263
+ ckan_organization=args.ckan_organization,
264
+ ckan_dataset=args.ckan_dataset,
265
+ ckan_resource=args.ckan_resource,
266
+ hpc_host=args.hpc_host,
267
+ hpc_port=args.hpc_port,
268
+ hpc_username=args.hpc_username,
269
+ hpc_password=args.hpc_password,
270
+ hpc_secret_key_path=args.hpc_secret_key,
271
+ hpc_secret_key_password=args.hpc_secret_key_password,
272
+ data_target=args.data_target,
273
+ )
274
+
275
+ except Exception as ex:
276
+ raise HidDataTransferException(ex) from ex
277
+
278
+ def local2ckan(self, args) -> str:
279
+ """transfer data from local filesystem to CKAN using SFTP"""
280
+ self.__logger.info(
281
+ "executing local2ckan command with args: %s",
282
+ self.format_args_to_string(args),
283
+ )
284
+
285
+ try:
286
+ return self.dt_client.local2ckan(
287
+ ckan_host=args.ckan_host
288
+ if args.ckan_host.startswith("https") else f"https://{args.ckan_host}",
289
+ ckan_api_key=args.ckan_api_key,
290
+ ckan_organization=args.ckan_organization,
291
+ ckan_dataset=args.ckan_dataset,
292
+ ckan_resource=args.ckan_resource,
293
+ data_source=args.data_source,
294
+ )
295
+
296
+ except Exception as ex:
297
+ raise HidDataTransferException(ex) from ex
298
+
299
+ def ckan2local(self, args) -> str:
300
+ """transfer data from CKAN to the local filesystem using SFTP"""
301
+ self.__logger.info(
302
+ "executing ckan2local command with args: %s",
303
+ self.format_args_to_string(args),
304
+ )
305
+ try:
306
+ return self.dt_client.ckan2local(
307
+ ckan_host=args.ckan_host
308
+ if args.ckan_host.startswith("https") else f"https://{args.ckan_host}",
309
+ ckan_api_key=args.ckan_api_key,
310
+ ckan_organization=args.ckan_organization,
311
+ ckan_dataset=args.ckan_dataset,
312
+ ckan_resource=args.ckan_resource,
313
+ data_target=args.data_target,
314
+ )
315
+
316
+ except Exception as ex:
317
+ raise HidDataTransferException(ex) from ex
318
+
319
+ def check_command_status(self, args):
320
+ """Checks the status of a CLI command by ID"""
321
+ # Check process group state by id
322
+ # This implies checking the execution state of the last processor in the group
323
+ self.__logger.info(
324
+ "executing check_command_status command with args: %s",
325
+ self.format_args_to_string(args),
326
+ )
327
+ try:
328
+ return self.dt_client.check_command_status(args.command_id)
329
+
330
+ except Exception as ex:
331
+ raise HidDataTransferException(ex) from ex
src/dtcli.cfg ADDED
@@ -0,0 +1,16 @@
1
+ [Nifi]
2
+ nifi_endpoint=https://nifi.hidalgo2.eu:9443
3
+ nifi_upload_folder=/opt/nifi/data/upload
4
+ nifi_download_folder=/opt/nifi/data/download
5
+ nifi_secure_connection=True
6
+
7
+ [Keycloak]
8
+ keycloak_endpoint=https://idm.hidalgo2.eu
9
+ keycloak_client_id=nifi
10
+ keycloak_client_secret=Tkt8BQmTfUkSceknml6HDSbmGyNRik9V
11
+
12
+ [Logging]
13
+ logging_level=INFO
14
+
15
+ [Network]
16
+ check_status_sleep_lapse=1
src/parser/__init__.py ADDED
File without changes
src/parser/cli_parser.py ADDED
@@ -0,0 +1,370 @@
1
+ """
2
+ Copyright 2024 Eviden
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an 'AS IS' BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
14
+
15
+
16
+ This module provides a parser for the command line arguments.
17
+ It defines the command line arguments that the user can pass to the CLI
18
+ """
19
+
20
+ import argparse
21
+ import os
22
+
23
+ import yaml
24
+
25
+
26
+ class CLIParser(argparse.ArgumentParser):
27
+ """Parser of the command line arguments"""
28
+
29
+ def fill_missing_args_from_config(self, args):
30
+ """Fill missing arguments from the config file"""
31
+ # Read default dtcli YAML config file from ~/.dtcli/config if exists
32
+ dtcli_config_file = os.path.expanduser("~/.dtcli/config")
33
+ if not os.path.exists(dtcli_config_file):
34
+ return args
35
+ with open(dtcli_config_file, "r", encoding="utf8") as f:
36
+ config = yaml.safe_load(f)
37
+
38
+ # For each host in the command, find it in the config file,
39
+ # then complete the arguments collected from the config file
40
+ # that are not already set on the command line
41
+ for host in config.keys():
42
+ if host in args:
43
+ host_index = args.index(host)
44
+ if (args[host_index-1] == "-H" or
45
+ args[host_index-1] == "--hpc-host"):
46
+ self.process_hpc_args(args, config[host])
47
+ elif (args[host_index-1] == "-c" or
48
+ args[host_index-1] == "--ckan-host"):
49
+ self.process_ckan_args(args, config[host])
50
+ # Kerberos
51
+ if host.upper() == "KERBEROS" and 'hdfs' in args[0]:
52
+ kerberos_config = config[host]
53
+ self.process_kerberos_args(args, kerberos_config)
54
+ return args
55
+
56
+ def process_kerberos_args(self, args, kerberos_config):
57
+ """Process the Kerberos arguments from the config file"""
58
+ if (("-kpr" not in args and "--kerberos-principal" not in args)
59
+ and "principal" in kerberos_config):
60
+ args.append("-kpr")
61
+ args.append(str(kerberos_config["principal"]))
62
+ if (("-kp" not in args and "--kerberos-password" not in args)
63
+ and "password" in kerberos_config):
64
+ args.append("-kp")
65
+ args.append(str(kerberos_config["password"]))
66
+
67
+ def process_hpc_args(self, args, hpc_config):
68
+ """Process the HPC arguments from the config file"""
69
+ if (("-z" not in args and "--hpc-port" not in args)
70
+ and "port" in hpc_config):
71
+ args.append("-z")
72
+ args.append(str(hpc_config["port"]))
73
+ if (("-u" not in args and "--hpc-username" not in args)
74
+ and "username" in hpc_config):
75
+ args.append("-u")
76
+ args.append(hpc_config["username"])
77
+ if (("-p" not in args and "--hpc-password" not in args)
78
+ and "password" in hpc_config):
79
+ args.append("-p")
80
+ args.append(hpc_config["password"])
81
+ if (("-k" not in args and "--hpc-secret-key" not in args)
82
+ and "secret-key" in hpc_config):
83
+ args.append("-k")
84
+ args.append(hpc_config["secret-key"])
85
+ if (("-P" not in args and "--hpc-secret-key-password" not in args)
86
+ and "secret-key-password" in hpc_config):
87
+ args.append("-P")
88
+ args.append(hpc_config["secret-key-password"])
89
+
90
+ def process_ckan_args(self, args, ckan_config):
91
+ """Process the CKAN arguments from the config file"""
92
+ if (("-a" not in args and "--ckan-api-key" not in args)
93
+ and "api-key" in ckan_config):
94
+ args.append("-a")
95
+ args.append(ckan_config["api-key"])
96
+ if (("-o" not in args and "--ckan-organization" not in args)
97
+ and "organization" in ckan_config):
98
+ args.append("-o")
99
+ args.append(ckan_config["organization"])
100
+ if (("-d" not in args and "--ckan-dataset" not in args)
101
+ and "dataset" in ckan_config):
102
+ args.append("-d")
103
+ args.append(ckan_config["dataset"])
104
+
105
+ def add_default_hpc_arguments(self, parser):
106
+ """Add default HPC arguments to the parser"""
107
+ parser.add_argument(
108
+ "-H", "--hpc-host", required=True, help="Target HPC ssh host"
109
+ )
110
+ parser.add_argument(
111
+ "-z", "--hpc-port", required=False, help="[Optional] Target HPC ssh port"
112
+ )
113
+ parser.add_argument(
114
+ "-u", "--hpc-username", required=True, help="Username for HPC account"
115
+ )
116
+ parser.add_argument(
117
+ "-p",
118
+ "--hpc-password",
119
+ required=False,
120
+ help="[Optional] Password for HPC account. "
121
+ "Either password or secret key is required",
122
+ )
123
+ parser.add_argument(
124
+ "-k",
125
+ "--hpc-secret-key",
126
+ required=False,
127
+ help="[Optional] Path to HPC secret key. "
128
+ "Either password or secret key is required",
129
+ )
130
+ parser.add_argument(
131
+ "-P",
132
+ "--hpc-secret-key-password",
133
+ required=False,
134
+ help="[Optional] Password for HPC secret key",
135
+ )
136
+ return parser
137
+
138
+ def add_default_ckan_arguments(self, parser):
139
+ """Add default CKAN arguments to the parser"""
140
+ parser.add_argument(
141
+ "-c", "--ckan-host", required=True, help="CKAN host endpoint"
142
+ )
143
+ parser.add_argument(
144
+ "-a",
145
+ "--ckan-api-key",
146
+ required=True,
147
+ help="CKAN API key",
148
+ )
149
+ parser.add_argument(
150
+ "-o",
151
+ "--ckan-organization",
152
+ required=True,
153
+ help="Identifier of the CKAN organization that hosts \
154
+ the dataset resource to transfer. \
155
+ Could be identified by the organization id, name or title",
156
+ )
157
+ parser.add_argument(
158
+ "-d",
159
+ "--ckan-dataset",
160
+ required=True,
161
+ help="Identifier of the CKAN dataset that hosts the resource to transfer. \
162
+ Could be identified by the dataset id, name or title",
163
+ )
164
+ return parser
165
+
166
+ def add_default_kerberos_arguments(self, parser):
167
+ """Add default Kerberos arguments to the parser"""
168
+ parser.add_argument(
169
+ "-kpr", "--kerberos-principal",
170
+ required=False,
171
+ help="[Optional] Kerberos principal (mandatory for a Kerberized HDFS)"
172
+ )
173
+ parser.add_argument(
174
+ "-kp", "--kerberos-password",
175
+ required=False,
176
+ help="[Optional] Kerberos principal password "
177
+ "(mandatory for a Kerberized HDFS)"
178
+ )
179
+ return parser
180
+
181
+ def parse_arguments(self, args, target):
182
+ """parse the command line arguments
183
+
184
+ Args:
185
+ args (list): list of command line arguments
186
+ target (object): target object to execute the command
187
+
188
+ Returns:
189
+ argparse.Namespace: the parsed arguments
190
+ """
191
+
192
+ # commands
193
+ commands_parsers = self.add_subparsers(
194
+ help="supported commands to transfer data"
195
+ )
196
+ # check status
197
+ check_status_parser = commands_parsers.add_parser(
198
+ "check-status", help="check the status of a command"
199
+ )
200
+ check_status_parser.add_argument(
201
+ "-i",
202
+ "--command_id",
203
+ required=True,
204
+ help="id of command to check status",
205
+ )
206
+ check_status_parser.set_defaults(func=target.check_command_status)
207
+
208
+ # hdfs2hpc
209
+ hdfs2hpc_parser = commands_parsers.add_parser(
210
+ "hdfs2hpc", help="transfer data from HDFS to target HPC"
211
+ )
212
+ hdfs2hpc_parser.add_argument(
213
+ "-s", "--data-source",
214
+ required=True, help="HDFS file path"
215
+ )
216
+ hdfs2hpc_parser.add_argument(
217
+ "-t", "--data-target",
218
+ required=False, help="[Optional] HPC folder"
219
+ )
220
+ hdfs2hpc_parser = self.add_default_kerberos_arguments(hdfs2hpc_parser)
221
+ hdfs2hpc_parser = self.add_default_hpc_arguments(hdfs2hpc_parser)
222
+ hdfs2hpc_parser.add_argument(
223
+ "-2fa", "--two-factor-authentication",
224
+ required=False, action="store_true", default=False,
225
+ help="[Optional] HPC requires 2FA authentication"
226
+ )
227
+ hdfs2hpc_parser.set_defaults(func=target.hdfs2hpc)
228
+
229
+ # hpc2hdfs
230
+ hpc2hdfs_parser = commands_parsers.add_parser(
231
+ "hpc2hdfs", help="transfer data from HPC to target HDFS"
232
+ )
233
+
234
+ hpc2hdfs_parser.add_argument(
235
+ "-s", "--data-source", required=True, help="HPC file path"
236
+ )
237
+ hpc2hdfs_parser.add_argument(
238
+ "-t", "--data-target", required=True, help="HDFS folder"
239
+ )
240
+ hpc2hdfs_parser = self.add_default_kerberos_arguments(hpc2hdfs_parser)
241
+ hpc2hdfs_parser = self.add_default_hpc_arguments(hpc2hdfs_parser)
242
+ hpc2hdfs_parser.add_argument(
243
+ "-2fa", "--two-factor-authentication",
244
+ required=False, action="store_true", default=False,
245
+ help="[Optional] HPC requires 2FA authentication"
246
+ )
247
+ hpc2hdfs_parser.set_defaults(func=target.hpc2hdfs)
248
+
249
+ # ckan2hdfs
250
+ ckan2hdfs_parser = commands_parsers.add_parser(
251
+ "ckan2hdfs", help="transfer data from CKAN to target HDFS"
252
+ )
253
+ ckan2hdfs_parser = self.add_default_ckan_arguments(ckan2hdfs_parser)
254
+ ckan2hdfs_parser.add_argument(
255
+ "-r",
256
+ "--ckan-resource",
257
+ required=False,
258
+ help="[Optional] CKAN resource to transfer. \
259
+ Could be identified by the dataset id or name. \
260
+ If empty, all resources in the dataset will be transferred. \
261
+ A regex is also accepted to filter the resources to transfer",
262
+ )
263
+ ckan2hdfs_parser.add_argument(
264
+ "-t", "--data-target", required=False, help="[Optional] target HDFS folder"
265
+ )
266
+ ckan2hdfs_parser = self.add_default_kerberos_arguments(ckan2hdfs_parser)
267
+ ckan2hdfs_parser.set_defaults(func=target.ckan2hdfs)
268
+
269
+ # hdfs2ckan
270
+ hdfs2ckan_parser = commands_parsers.add_parser(
271
+ "hdfs2ckan", help="transfer data from HDFS to a target CKAN"
272
+ )
273
+ hdfs2ckan_parser = self.add_default_ckan_arguments(hdfs2ckan_parser)
274
+ hdfs2ckan_parser = self.add_default_kerberos_arguments(hdfs2ckan_parser)
275
+ hdfs2ckan_parser.add_argument(
276
+ "-s",
277
+ "--data-source",
278
+ required=True,
279
+ help="File path to HDFS file or directory to transfer",
280
+ )
281
+ hdfs2ckan_parser.set_defaults(func=target.hdfs2ckan)
282
+
283
+ # ckan2hpc
284
+ ckan2hpc_parser = commands_parsers.add_parser(
285
+ "ckan2hpc", help="transfer data from CKAN to target HPC"
286
+ )
287
+ ckan2hpc_parser = self.add_default_ckan_arguments(ckan2hpc_parser)
288
+ ckan2hpc_parser.add_argument(
289
+ "-r",
290
+ "--ckan-resource",
291
+ required=False,
292
+ help="[Optional] CKAN resource to transfer. \
293
+ Could be identified by the dataset id or name. \
294
+ If empty, all resources in the dataset will be transferred. \
295
+ A regex is also accepted to filter the resources to transfer",
296
+ )
297
+ ckan2hpc_parser.add_argument(
298
+ "-t", "--data-target", required=False, help="[Optional] target HPC folder"
299
+ )
300
+ ckan2hpc_parser = self.add_default_hpc_arguments(ckan2hpc_parser)
301
+ ckan2hpc_parser.add_argument(
302
+ "-2fa", "--two-factor-authentication",
303
+ required=False, action="store_true", default=False,
304
+ help="[Optional] HPC requires 2FA authentication"
305
+ )
306
+ ckan2hpc_parser.set_defaults(func=target.ckan2hpc)
307
+
308
+ # hpc2ckan
309
+ hpc2ckan_parser = commands_parsers.add_parser(
310
+ "hpc2ckan", help="transfer data from HPC to a target CKAN"
311
+ )
312
+ hpc2ckan_parser = self.add_default_ckan_arguments(hpc2ckan_parser)
313
+ hpc2ckan_parser = self.add_default_hpc_arguments(hpc2ckan_parser)
314
+ hpc2ckan_parser.add_argument(
315
+ "-2fa", "--two-factor-authentication",
316
+ required=False, action="store_true", default=False,
317
+ help="[Optional] HPC requires 2FA authentication"
318
+ )
319
+ hpc2ckan_parser.add_argument(
320
+ "-s",
321
+ "--data-source",
322
+ required=True,
323
+ help="File path to HPC file or directory to transfer",
324
+ )
325
+ hpc2ckan_parser.set_defaults(func=target.hpc2ckan)
326
+
327
+ # local2ckan
328
+ local2ckan_parser = commands_parsers.add_parser(
329
+ "local2ckan", help="transfer data from a local filesystem to a target CKAN"
330
+ )
331
+ local2ckan_parser = self.add_default_ckan_arguments(local2ckan_parser)
332
+ local2ckan_parser.add_argument(
333
+ "-r",
334
+ "--ckan-resource",
335
+ required=False,
336
+ help="[Optional] CKAN resource to create from transferred sources. \
337
+ If omitted, target resource name will adopt the source file or folder name",
338
+ )
339
+
340
+ local2ckan_parser.add_argument(
341
+ "-s",
342
+ "--data-source",
343
+ required=True,
344
+ help="File path to local file or directory to transfer",
345
+ )
346
+ local2ckan_parser.set_defaults(func=target.local2ckan)
347
+
348
+ # ckan2local
349
+ ckan2local_parser = commands_parsers.add_parser(
350
+ "ckan2local", help="transfer data from CKAN to a local filesystem"
351
+ )
352
+ ckan2local_parser = self.add_default_ckan_arguments(ckan2local_parser)
353
+ ckan2local_parser.add_argument(
354
+ "-r",
355
+ "--ckan-resource",
356
+ required=False,
357
+ help="[Optional] CKAN resource to transfer. \
358
+ If omitted, all resources in the dataset will be transferred",
359
+ )
360
+
361
+ ckan2local_parser.add_argument(
362
+ "-t",
363
+ "--data-target",
364
+ required=False,
365
+ help="Local directory where to transfer the data. \
366
+ If omitted, data will be transferred to the current directory",
367
+ )
368
+ ckan2local_parser.set_defaults(func=target.ckan2local)
369
+
370
+ return self.parse_args(args)