data-transfer-cli 0.3.2__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data_transfer_cli-0.3.2.dist-info/METADATA +235 -0
- data_transfer_cli-0.3.2.dist-info/RECORD +12 -0
- data_transfer_cli-0.3.2.dist-info/WHEEL +4 -0
- data_transfer_cli-0.3.2.dist-info/entry_points.txt +3 -0
- src/.env +1 -0
- src/__init__.py +0 -0
- src/conf/cli.cfg +8 -0
- src/data_transfer_cli.py +171 -0
- src/data_transfer_proxy.py +331 -0
- src/dtcli.cfg +16 -0
- src/parser/__init__.py +0 -0
- src/parser/cli_parser.py +370 -0

data_transfer_cli-0.3.2.dist-info/METADATA
ADDED

@@ -0,0 +1,235 @@
Metadata-Version: 2.3
Name: data-transfer-cli
Version: 0.3.2
Summary: HiDALGO Data Transfer CLI provides commands to transfer data between different data providers and consumers using NIFI pipelines
License: APL-2.0
Author: Jesús Gorroñogoitia
Author-email: jesus.gorronogoitia@eviden.com
Requires-Python: >=3.11, <4.0
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: hid_data_transfer_lib (>=0.3.2)
Requires-Dist: paramiko (>=3.3.1)
Requires-Dist: pyyaml (>=6.0.2,<7.0.0)
Requires-Dist: requests (>=2.31.0)
Description-Content-Type: text/markdown

# Hidalgo2 Data Transfer Tool
This repository contains the implementation of the Hidalgo2 data transfer tool. It uses [Apache NIFI](https://nifi.apache.org/) to transfer data from different data sources to specified targets.

## Features
This tool plans to support the following features:
- transfer datasets from Cloud Providers to HDFS
- transfer datasets from Cloud Providers to CKAN
- transfer datasets from/to Hadoop HDFS to/from HPC
- transfer datasets from/to Hadoop HDFS to/from CKAN
- transfer datasets from/to CKAN to/from HPC
- transfer datasets from/to the local filesystem to/from CKAN

## Prototype
The current prototype supports the following features:
- transfer datasets from/to Hadoop HDFS to/from HPC
- transfer datasets from/to Hadoop HDFS to/from CKAN
- transfer datasets from/to CKAN to/from HPC
- transfer datasets from/to the local filesystem to/from CKAN


## Implementation
The current implementation is based on Python. It is implemented as a CLI that executes a transfer command by creating a NIFI process group out of the workflow definition registered in the NIFI registry. It uses the parameters given in the CLI command invocation to populate a NIFI parameter context that is associated with the created process group. Then, the process group processors are executed once (or until the incoming flowfile queue is empty), one after another, following the group sequence flow, until the flow is completed. To check the status of a transfer command, the CLI offers a check-status command. The Data Transfer CLI tool sends requests to NIFI through its REST API.
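
Although the CLI hides these details, the following sketch illustrates the kind of NIFI REST calls involved. The endpoint paths come from the public NiFi REST API; the host, credentials and process group id are illustrative placeholders:

```
# obtain a NIFI access token for a NIFI user account
TOKEN=$(curl -sk -X POST "https://nifi.example.org:9443/nifi-api/access/token" \
  -d "username=<nifi_login>&password=<nifi_password>")

# poll the status of a process group by its id (what check-status does, conceptually)
curl -sk -H "Authorization: Bearer $TOKEN" \
  "https://nifi.example.org:9443/nifi-api/flow/process-groups/<process_group_id>/status"
```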

## Requirements
To use the Data Transfer CLI tool, the following are required:
- **Python3** execution environment
- **Poetry** Python package management tool
- **NIFI** instance, with either a NIFI service account or a KEYCLOAK account, plus a NIFI server account (for key transfer)
- **HDFS** instance, with a user Kerberos token (i.e. an authenticated Kerberos principal) if required
- **CKAN** instance, with a user API key

Python3 and Poetry should be installed on the computer where the Data Transfer CLI tool will be used.
To install Poetry, follow [these instructions](https://python-poetry.org/docs/#installing-with-the-official-installer).

For a quick download, setup, configuration and execution of the DTCLI go to section [Quick Deployment, setup, configuration and execution](#quick-deployment-setup-configuration-and-execution).

## CLI configuration
### Configuration file
Before using the Data Transfer CLI tool, you should configure it to point at the target NIFI. The configuration file is located at *src/dtcli.cfg*.

```
[Nifi]
nifi_endpoint=http://localhost:8443
nifi_upload_folder=/opt/nifi/data/upload
nifi_download_folder=/opt/nifi/data/download
nifi_secure_connection=True

[Keycloak]
keycloak_endpoint=https://idm.hidalgo2.eu
keycloak_client_id=nifi
keycloak_client_secret=Tkt8BQmTfUkSceknml6HDSbmGyNRik9V
```

Under the NIFI section:
- We define the URL of the NIFI service (*nifi_endpoint*),
- We also specify a folder (*nifi_upload_folder*) on the NIFI server where files are uploaded,
- And another folder (*nifi_download_folder*) from which files are downloaded. These folders must be accessible by the NIFI service (ask the NIFI administrator for details).
- Additionally, you can set whether the NIFI server listens on a secure HTTPS connection (*nifi_secure_connection*=True) or on non-secure HTTP (*nifi_secure_connection*=False).

Under the Keycloak section, you can configure the Keycloak instance integrated with NIFI, specifying:
- The Keycloak service endpoint (*keycloak_endpoint*)
- The NIFI client id in Keycloak (*keycloak_client_id*)
- The NIFI client secret in Keycloak (*keycloak_client_secret*)

### NIFI and Keycloak credentials in environment variables
We must also specify a user account (username, private key) that is allowed to upload/download files to/from the NIFI server (required to upload temporary HPC keys or to support local file transfer). This account is provided by the Hidalgo2 infrastructure provider and is user-specific. It is set up in the following environment variables:
- NIFI_SERVER_USERNAME: `export NIFI_SERVER_USERNAME=<nifi_server_username>`
- NIFI_SERVER_PRIVATE_KEY: `export NIFI_SERVER_PRIVATE_KEY=<path_to_private_key>`

Additionally, a user account granted access to the NIFI service must be specified: either a NIFI user account (A) or a Keycloak account with access to NIFI (B).

#### A) NIFI User Account
The NIFI account must be configured in the following environment variables:
- NIFI_LOGIN: `export NIFI_LOGIN=<nifi_login>`
- NIFI_PASSWORD: `export NIFI_PASSWORD=<nifi_password>`

This NIFI account is provided by the NIFI administrator.

#### B) Keycloak Account with access to NIFI
The Keycloak account must be configured in the following environment variables:
- KEYCLOAK_LOGIN: `export KEYCLOAK_LOGIN=<keycloak_login>`
- KEYCLOAK_PASSWORD: `export KEYCLOAK_PASSWORD=<keycloak_password>`

This Keycloak account is provided by the Keycloak administrator.

## Quick Deployment, setup, configuration and execution
### From GitLab repository
1. Clone the Data Transfer CLI repository.
2. Set up the hid_data_transfer_lib project.
   Go to folder *hid-data-management/data-transfer/nifi/hid_data_transfer_lib*.
   On the prompt, run: `poetry install && poetry build`
   Note: this is only required while this tool is under development, as data-transfer-cli references the local hid_data_transfer_lib and not the one published on PyPI.
3. Set up the data-transfer-cli project with poetry.
   Go to folder *hid-data-management/data-transfer/nifi/data-transfer-cli*.
   On the prompt, run `./setup.sh`
4. Configure your NIFI and Keycloak services by modifying the default DT CLI configuration (preconfigured for the HiDALGO2 NIFI and KEYCLOAK) located at *src/dtcli.cfg*.
5. Edit *setenv.sh*. Provide your accounts for KEYCLOAK and the NIFI server. Contact the HiDALGO2 administrator to request them.
```
export NIFI_SERVER_USERNAME="<username>"
export NIFI_SERVER_PRIVATE_KEY="<relative_path_ssh_private_key>"
export KEYCLOAK_LOGIN="<username>"
export KEYCLOAK_PASSWORD="<password>"
```
6. Run the Data Transfer CLI tool. In this example, we ask it for help: `dtcli -h`

### From PyPI installation
To be done

## Usage
The Data Transfer CLI tool can be executed by invoking the command `dtcli`. Add this command's location to your path: either the *data_transfer_cli* folder (when cloned from GitLab) or its installation location (when installed with pip from PyPI). For instance (illustrative path):
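
```
# assuming the repository was cloned under $HOME (illustrative path)
export PATH="$PATH:$HOME/hid-data-management/data-transfer/nifi/data-transfer-cli"
```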

Commands are then invoked as:

`./dtcli command <arguments>`

To get help execute:

`./dtcli -h`

obtaining:

```
usage: ['-h'] [-h]
       {check-status,hdfs2hpc,hpc2hdfs,ckan2hdfs,hdfs2ckan,ckan2hpc,hpc2ckan,local2ckan,ckan2local}
       ...

positional arguments:
  {check-status,hdfs2hpc,hpc2hdfs,ckan2hdfs,hdfs2ckan,ckan2hpc,hpc2ckan,local2ckan,ckan2local}
                        supported commands to transfer data
    check-status        check the status of a command
    hdfs2hpc            transfer data from HDFS to target HPC
    hpc2hdfs            transfer data from HPC to target HDFS
    ckan2hdfs           transfer data from CKAN to target HDFS
    hdfs2ckan           transfer data from HDFS to a target CKAN
    ckan2hpc            transfer data from CKAN to target HPC
    hpc2ckan            transfer data from HPC to a target CKAN
    local2ckan          transfer data from a local filesystem to a target CKAN
    ckan2local          transfer data from CKAN to a local filesystem

options:
  -h, --help            show this help message and exit
```

To get help for a particular command:

`./dtcli hdfs2hpc -h`

obtaining:

```
usage: ['hdfs2hpc', '-h'] hdfs2hpc [-h] -s DATA_SOURCE [-t DATA_TARGET] [-kpr KERBEROS_PRINCIPAL] [-kp KERBEROS_PASSWORD] -H HPC_HOST [-z HPC_PORT] -u HPC_USERNAME [-p HPC_PASSWORD] [-k HPC_SECRET_KEY] [-P HPC_SECRET_KEY_PASSWORD]

options:
  -h, --help            show this help message and exit
  -s DATA_SOURCE, --data-source DATA_SOURCE
                        HDFS file path
  -t DATA_TARGET, --data-target DATA_TARGET
                        [Optional] HPC folder
  -kpr KERBEROS_PRINCIPAL, --kerberos-principal KERBEROS_PRINCIPAL
                        [Optional] Kerberos principal (mandatory for a Kerberized HDFS)
  -kp KERBEROS_PASSWORD, --kerberos-password KERBEROS_PASSWORD
                        [Optional] Kerberos principal password (mandatory for a Kerberized HDFS)
  -H HPC_HOST, --hpc-host HPC_HOST
                        Target HPC ssh host
  -z HPC_PORT, --hpc-port HPC_PORT
                        [Optional] Target HPC ssh port
  -u HPC_USERNAME, --hpc-username HPC_USERNAME
                        Username for HPC account
  -p HPC_PASSWORD, --hpc-password HPC_PASSWORD
                        [Optional] Password for HPC account. Either password or secret key is required
  -k HPC_SECRET_KEY, --hpc-secret-key HPC_SECRET_KEY
                        [Optional] Path to HPC secret key. Either password or secret key is required
  -P HPC_SECRET_KEY_PASSWORD, --hpc-secret-key-password HPC_SECRET_KEY_PASSWORD
                        [Optional] Password for HPC secret key
  -2fa, --two-factor-authentication
                        [Optional] HPC requires 2FA authentication
```

A common command flow (e.g. transferring data from HDFS to HPC) would be:

- execute the *hdfs2hpc* CLI command to transfer data from an HDFS location (e.g. /users/yosu/data/genome-tags.csv) to a remote HPC (e.g. LUMI, at the $HOME/data folder)
- check the status of the *hdfs2hpc* transfer (and possible warnings/errors) with the *check-status* CLI command, as sketched below
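
A minimal session along these lines might look as follows; the hostname, key and folder are illustrative, and the UUID reported by the first command is passed to the second:

```
./dtcli hdfs2hpc -s /users/yosu/data/genome-tags.csv \
  -H lumi.csc.fi -u <hpc_username> -k ~/.ssh/<secret_key> -t data
./dtcli check-status -i <command_uuid>
```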

## Support for HPC clusters that require a 2FA token
The Data Transfer CLI tool's commands support transferring data to/from HPC clusters that require a 2FA token. These commands offer an optional flag *-2fa*. If set by the user, the command prompts the user (on standard input) for the token when required.
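
For example, running the transfer above against a 2FA-protected cluster just adds the flag (illustrative values); the tool then prompts for the token:

```
./dtcli hdfs2hpc -s /users/yosu/data/genome-tags.csv \
  -H <hpc_host> -u <hpc_username> -k ~/.ssh/<secret_key> -2fa
```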

## Predefined profiles for data hosts
To avoid feeding the Data Transfer CLI tool with many inputs describing the hosts of the source and target data providers/consumers, the user can define them in the `~/.dtcli/config` YAML file, as shown in the following YAML code snippet:
```
# Meluxina
login.lxp.lu:
  username: u102309
  port: 8822
  secret-key: ~/.ssh/<secret_key>
  secret-key-password: <password>

# CKAN
ckan.hidalgo2.eu:
  api-key: <api-key>
  organization: atos
  dataset: test-dataset
```

where details for the Meluxina HPC and CKAN are given. For an HPC cluster, provide the HPC host as the key, followed by a colon and, below it, indented, any of the hpc parameters described by the Data Transfer CLI tool help, without the *hpc_* prefix. For instance, if the Data Transfer CLI tool help mentions:
```
-u HPC_USERNAME, --hpc-username HPC_USERNAME
                      Username for HPC account
```
that is, *--hpc-username* as parameter, use *username* as the nested property in the HPC profile's description in the YAML config file, as shown in the example above. Proceed similarly for the other HPC parameters, such as *port*, *password*, *secret-key*, etc.
The same procedure can be adopted to describe the CKAN host's parameters.

Note: the Hidalgo2 HPDA configuration is included in the Data Transfer CLI tool implementation and does not need to be included in this config file.

Then, when you launch a Data Transfer CLI tool command, any parameter not included in the command line will be retrieved from the config file if the corresponding host entry is included. After that, if the command line is complete (i.e. all required parameters are provided), the command will be executed; otherwise the corresponding error will be raised.
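
For example, given the Meluxina and CKAN profiles above in *~/.dtcli/config*, a transfer from CKAN to Meluxina could omit the port, username, key, API key, organization and dataset arguments, since they are completed from the profiles (illustrative invocation):

```
./dtcli ckan2hpc -c ckan.hidalgo2.eu -H login.lxp.lu -t data
```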

data_transfer_cli-0.3.2.dist-info/RECORD
ADDED

@@ -0,0 +1,12 @@
src/.env,sha256=vGRFD3D5PPwoMBhtPvqN_1alGOHLsNAOJB_3r34BoFA,16
src/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
src/conf/cli.cfg,sha256=Djxan_HjTkqVJOJZWuXt3DsypAgGVPYMj40YWEtjGFs,206
src/data_transfer_cli.py,sha256=abM6Dpik9-LHpOQT4_GUqZRiXImICJu0YpcYUrCL7sQ,5646
src/data_transfer_proxy.py,sha256=4dN5RNT_zJyNUaKGxjvyUw-AMdNVnr3uBpzGXgupko4,13877
src/dtcli.cfg,sha256=4eWBXiKB5WtG9I6jq6mh1u8ZDb73owmzqpfHAv-2K7s,367
src/parser/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
src/parser/cli_parser.py,sha256=IxbtaQxs_zqkVBDmLMngJdNR-l8ER5DvO5KtFpowczU,14775
data_transfer_cli-0.3.2.dist-info/METADATA,sha256=v6hcAsBxa9nk114VDo1MjTczZtozf-_RT1PmfEExwVk,12362
data_transfer_cli-0.3.2.dist-info/WHEEL,sha256=XbeZDeTWKc1w7CSIyre5aMDU_-PohRwTQceYnisIYYY,88
data_transfer_cli-0.3.2.dist-info/entry_points.txt,sha256=vQK-69hamV_JVD9AMg1e0HquJCZtjeYxj92ULtYSBso,46
data_transfer_cli-0.3.2.dist-info/RECORD,,

src/__init__.py
ADDED

File without changes

src/conf/cli.cfg
ADDED

src/data_transfer_cli.py
ADDED

@@ -0,0 +1,171 @@
'''
Copyright 2024 Eviden
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an 'AS IS' BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

CLI tool for data transfer based on Apache NIFI
Initial PoC features
- hdfs2hpc: transfer data from hdfs to target hpc, using sftp processor:
    - inputs:
        - hpc_host: hpc frontend hostname
        - hpc_username: user account name
        - hpc_secret_key_path: user's secret key location
        - data-source: HDFS file path
        - data-target: HPC remote folder

- hpc2hdfs: transfer data file from hpc folder to target hdfs folder,
  using sftp processor:
    - inputs:
        - hpc_host: hpc frontend hostname
        - hpc_username: user account name
        - hpc_secret_key_path: user's secret key location
        - data-source: HPC file path
        - data-target: HDFS remote folder

- ckan2hpc: transfer data from ckan to target hpc,
  using ckan and sftp processors:
    - inputs:
        - ckan_host: CKAN host endpoint
        - ckan_api_key: CKAN API key
        - ckan_organization: CKAN organization
        - ckan_dataset: CKAN dataset
        - ckan_resource: CKAN resource
        - hpc_host: hpc frontend hostname
        - hpc_username: user account name
        - hpc_secret_key_path: user's secret key location
        - data-target: HPC remote folder

- hpc2ckan: transfer data from hpc to target ckan,
  using ckan and sftp processors:
    - inputs:
        - ckan_host: CKAN host endpoint
        - ckan_api_key: CKAN API key
        - ckan_organization: CKAN organization
        - ckan_dataset: CKAN dataset
        - hpc_host: hpc frontend hostname
        - hpc_username: user account name
        - hpc_secret_key_path: user's secret key location
        - data_source: HPC file path

- local2ckan: transfer data from a local filesystem to a target ckan
    - inputs:
        - ckan_host: CKAN host endpoint
        - ckan_api_key: CKAN API key
        - ckan_organization: CKAN organization
        - ckan_dataset: CKAN dataset
        - ckan_resource: CKAN resource to receive the data
        - data_source: local file path to the data to transfer

- ckan2local: transfer data from ckan to a local filesystem
    - inputs:
        - ckan_host: CKAN host endpoint
        - ckan_api_key: CKAN API key
        - ckan_organization: CKAN organization
        - ckan_dataset: CKAN dataset
        - ckan_resource: CKAN resource to transfer
        - data_target: local target directory where to transfer the resource

- check-status: check the execution state of a command
    - inputs:
        - command_id: uuid of the executed command
          (the uuid is reported after command execution)


This CLI uses a NIFI account to get an access token.
It uses the NIFI REST API to send requests.
It uses a predefined and installed HDFS2HPC process group template
with an associated parameter context
'''

import sys
import os
import threading
import traceback
import warnings

from hid_data_transfer_lib.exceptions.hid_dt_exceptions import HidDataTransferException
from hid_data_transfer_lib.conf.hid_dt_configuration import HidDataTransferConfiguration

from src.data_transfer_proxy import DataTransferProxy
from src.parser.cli_parser import CLIParser


warnings.filterwarnings("ignore")


# Get CLI configuration
os.environ["HID_DT_CONFIG_FILE"] = \
    str(os.path.dirname(os.path.realpath(__file__))) + "/dtcli.cfg"
config = HidDataTransferConfiguration()

# Data Transfer proxy to the library
dt_proxy = DataTransferProxy(config, True)


class ThreadRaisingExceptions(threading.Thread):
    """Thread class that raises exceptions in the main thread
    when the thread finishes with an exception"""

    def __init__(self, *args, **kwargs):
        self._exception = None
        self._process_group_id = None
        super().__init__(*args, **kwargs)

    def run(self):
        try:
            self._process_group_id = self._target(*self._args, **self._kwargs)
        except HidDataTransferException as e:
            self._exception = e
            raise e

    def join(self, *args, **kwargs):
        super().join(*args, **kwargs)
        if self._exception:
            raise self._exception
        return self._process_group_id


def main(args=None):
    """Main entry point for the Data Transfer CLI"""
    if not args:
        args = sys.argv[1:]
    # Parse arguments
    cli_parser = CLIParser(args)

    try:
        if len(args) == 0:
            cli_parser.print_help()
            sys.exit(1)

        # Read user's config file to complete missing arguments with default ones
        args = cli_parser.fill_missing_args_from_config(args)
        args = cli_parser.parse_arguments(args, dt_proxy)

        # executes associated command in data_transfer_cli module
        thread = ThreadRaisingExceptions(target=args.func, args=(args,))
        thread.start()
        thread.join()
    except HidDataTransferException as e:
        if e.process_group_id():
            sys.stderr.write(
                (
                    f"Got error {e} when executing process group "
                    f"with id {e.process_group_id()}"
                )
            )
        else:
            traceback.print_exc(file=sys.stderr)
        raise e


if __name__ == "__main__":
    main()

@@ -0,0 +1,331 @@
"""
Copyright 2024 Eviden
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an 'AS IS' BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


This module defines the NIFI API client class.
It provides methods to interface with the NIFI server to instantiate
templates and run processors in a process group.
"""

from hid_data_transfer_lib.exceptions.hid_dt_exceptions import HidDataTransferException
from hid_data_transfer_lib.conf.hid_dt_configuration import HidDataTransferConfiguration
from hid_data_transfer_lib.hid_dt_lib import HIDDataTransfer


class DataTransferProxy:
    """interface to the hid_data_transfer_lib to run data transfer commands"""

    def __init__(
        self, conf: HidDataTransferConfiguration, secure: bool = False
    ) -> None:
        """constructs a data transfer client"""
        self.__conf = conf
        self.dt_client = HIDDataTransfer(conf=conf, secure=secure)
        self.__logger = self.__conf.logger("nifi.v2.client")

    def format_args_to_string(self, args):
        """Format dtcli command arguments to string for logging"""
        return " ".join(
            [
                f"ckan_host={args.ckan_host}," if hasattr(args, "ckan_host") else "",
                (
                    f"ckan_organization={args.ckan_organization},"
                    if hasattr(args, "ckan_organization")
                    else ""
                ),
                (
                    f"ckan_dataset={args.ckan_dataset},"
                    if hasattr(args, "ckan_dataset")
                    else ""
                ),
                (
                    f"ckan_resource={args.ckan_resource},"
                    if hasattr(args, "ckan_resource")
                    else ""
                ),
                (
                    f"data_source={args.data_source},"
                    if hasattr(args, "data_source")
                    else ""
                ),
                (
                    f"data_target={args.data_target},"
                    if hasattr(args, "data_target")
                    else ""
                ),
                f"hpc_host={args.hpc_host}," if hasattr(args, "hpc_host") else "",
                f"hpc_port={args.hpc_port}," if hasattr(args, "hpc_port") else "",
                (
                    f"hpc_username={args.hpc_username},"
                    if hasattr(args, "hpc_username")
                    else ""
                ),
                (
                    f"hpc_secret_key={args.hpc_secret_key},"
                    if hasattr(args, "hpc_secret_key")
                    else ""
                ),
                f"command_id={args.command_id}," if hasattr(args, "command_id") else "",
            ]
        )

    # MAIN CLI commands

    def hdfs2hpc(self, args) -> str:
        """transfer data from HDFS to hpc using SFTP"""
        self.__logger.info(
            "executing hdfs2hpc command with args: %s", self.format_args_to_string(args)
        )
        try:
            # Check if 2FA is enabled
            if args.two_factor_authentication:
                return self.dt_client.hdfs2hpc_2fa(
                    hpc_host=args.hpc_host,
                    hpc_port=args.hpc_port,
                    hpc_username=args.hpc_username,
                    hpc_secret_key_path=args.hpc_secret_key,
                    hpc_secret_key_password=args.hpc_secret_key_password,
                    data_source=args.data_source,
                    data_target=args.data_target,
                    kerberos_principal=args.kerberos_principal,
                    kerberos_password=args.kerberos_password,
                )
            return self.dt_client.hdfs2hpc(
                hpc_host=args.hpc_host,
                hpc_port=args.hpc_port,
                hpc_username=args.hpc_username,
                hpc_password=args.hpc_password,
                hpc_secret_key_path=args.hpc_secret_key,
                hpc_secret_key_password=args.hpc_secret_key_password,
                data_source=args.data_source,
                data_target=args.data_target,
                kerberos_principal=args.kerberos_principal,
                kerberos_password=args.kerberos_password,
            )

        except Exception as ex:
            raise HidDataTransferException(ex) from ex

    def hpc2hdfs(self, args) -> str:
        """transfer data from HPC to hdfs using SFTP"""
        self.__logger.info(
            "executing hpc2hdfs command with args: %s", self.format_args_to_string(args)
        )
        try:
            # Check if 2FA is enabled
            if args.two_factor_authentication:
                return self.dt_client.hpc2hdfs_2fa(
                    hpc_host=args.hpc_host,
                    hpc_port=args.hpc_port,
                    hpc_username=args.hpc_username,
                    hpc_secret_key_path=args.hpc_secret_key,
                    hpc_secret_key_password=args.hpc_secret_key_password,
                    data_source=args.data_source,
                    data_target=args.data_target,
                    kerberos_principal=args.kerberos_principal,
                    kerberos_password=args.kerberos_password,
                )
            return self.dt_client.hpc2hdfs(
                hpc_host=args.hpc_host,
                hpc_port=args.hpc_port,
                hpc_username=args.hpc_username,
                hpc_password=args.hpc_password,
                hpc_secret_key_path=args.hpc_secret_key,
                hpc_secret_key_password=args.hpc_secret_key_password,
                data_source=args.data_source,
                data_target=args.data_target,
                kerberos_principal=args.kerberos_principal,
                kerberos_password=args.kerberos_password,
            )

        except Exception as ex:
            raise HidDataTransferException(ex) from ex

    def hdfs2ckan(self, args) -> str:
        """transfer data from HDFS to CKAN using SFTP"""
        self.__logger.info(
            "executing hdfs2ckan command with args: %s", self.format_args_to_string(args)
        )
        try:
            return self.dt_client.hdfs2ckan(
                ckan_host=args.ckan_host
                if args.ckan_host.startswith("https")
                else f"https://{args.ckan_host}",
                ckan_api_key=args.ckan_api_key,
                ckan_organization=args.ckan_organization,
                ckan_dataset=args.ckan_dataset,
                data_source=args.data_source,
                kerberos_principal=args.kerberos_principal,
                kerberos_password=args.kerberos_password,
            )

        except Exception as ex:
            raise HidDataTransferException(ex) from ex

    def ckan2hdfs(self, args) -> str:
        """transfer data from CKAN to HDFS using SFTP"""
        self.__logger.info(
            "executing ckan2hdfs command with args: %s", self.format_args_to_string(args)
        )
        try:
            return self.dt_client.ckan2hdfs(
                ckan_host=args.ckan_host
                if args.ckan_host.startswith("https")
                else f"https://{args.ckan_host}",
                ckan_api_key=args.ckan_api_key,
                ckan_organization=args.ckan_organization,
                ckan_dataset=args.ckan_dataset,
                ckan_resource=args.ckan_resource,
                data_target=args.data_target,
                kerberos_principal=args.kerberos_principal,
                kerberos_password=args.kerberos_password,
            )

        except Exception as ex:
            raise HidDataTransferException(ex) from ex

    def hpc2ckan(self, args) -> str:
        """transfer data from hpc to CKAN using SFTP"""
        self.__logger.info(
            "executing hpc2ckan command with args: %s", self.format_args_to_string(args)
        )
        try:
            # Check if 2FA is enabled
            if args.two_factor_authentication:
                return self.dt_client.hpc2ckan_2fa(
                    ckan_host=args.ckan_host
                    if args.ckan_host.startswith("https")
                    else f"https://{args.ckan_host}",
                    ckan_api_key=args.ckan_api_key,
                    ckan_organization=args.ckan_organization,
                    ckan_dataset=args.ckan_dataset,
                    hpc_host=args.hpc_host,
                    hpc_port=args.hpc_port,
                    hpc_username=args.hpc_username,
                    hpc_secret_key_path=args.hpc_secret_key,
                    hpc_secret_key_password=args.hpc_secret_key_password,
                    data_source=args.data_source,
                )
            return self.dt_client.hpc2ckan(
                ckan_host=args.ckan_host
                if args.ckan_host.startswith("https")
                else f"https://{args.ckan_host}",
                ckan_api_key=args.ckan_api_key,
                ckan_organization=args.ckan_organization,
                ckan_dataset=args.ckan_dataset,
                hpc_host=args.hpc_host,
                hpc_port=args.hpc_port,
                hpc_username=args.hpc_username,
                hpc_password=args.hpc_password,
                hpc_secret_key_path=args.hpc_secret_key,
                hpc_secret_key_password=args.hpc_secret_key_password,
                data_source=args.data_source,
            )

        except Exception as ex:
            raise HidDataTransferException(ex) from ex

    def ckan2hpc(self, args) -> str:
        """transfer data from CKAN to hpc using SFTP"""
        self.__logger.info(
            "executing ckan2hpc command with args: %s", self.format_args_to_string(args)
        )
        try:
            # Check if 2FA is enabled
            if args.two_factor_authentication:
                return self.dt_client.ckan2hpc_2fa(
                    ckan_host=args.ckan_host
                    if args.ckan_host.startswith("https")
                    else f"https://{args.ckan_host}",
                    ckan_api_key=args.ckan_api_key,
                    ckan_organization=args.ckan_organization,
                    ckan_dataset=args.ckan_dataset,
                    ckan_resource=args.ckan_resource,
                    hpc_host=args.hpc_host,
                    hpc_port=args.hpc_port,
                    hpc_username=args.hpc_username,
                    hpc_secret_key_path=args.hpc_secret_key,
                    hpc_secret_key_password=args.hpc_secret_key_password,
                    data_target=args.data_target,
                )
            return self.dt_client.ckan2hpc(
                ckan_host=args.ckan_host
                if args.ckan_host.startswith("https")
                else f"https://{args.ckan_host}",
                ckan_api_key=args.ckan_api_key,
                ckan_organization=args.ckan_organization,
                ckan_dataset=args.ckan_dataset,
                ckan_resource=args.ckan_resource,
                hpc_host=args.hpc_host,
                hpc_port=args.hpc_port,
                hpc_username=args.hpc_username,
                hpc_password=args.hpc_password,
                hpc_secret_key_path=args.hpc_secret_key,
                hpc_secret_key_password=args.hpc_secret_key_password,
                data_target=args.data_target,
            )

        except Exception as ex:
            raise HidDataTransferException(ex) from ex

    def local2ckan(self, args) -> str:
        """transfer data from local filesystem to CKAN using SFTP"""
        self.__logger.info(
            "executing local2ckan command with args: %s",
            self.format_args_to_string(args),
        )

        try:
            return self.dt_client.local2ckan(
                ckan_host=args.ckan_host
                if args.ckan_host.startswith("https")
                else f"https://{args.ckan_host}",
                ckan_api_key=args.ckan_api_key,
                ckan_organization=args.ckan_organization,
                ckan_dataset=args.ckan_dataset,
                ckan_resource=args.ckan_resource,
                data_source=args.data_source,
            )

        except Exception as ex:
            raise HidDataTransferException(ex) from ex

    def ckan2local(self, args) -> str:
        """transfer data from CKAN to the local filesystem using SFTP"""
        self.__logger.info(
            "executing ckan2local command with args: %s",
            self.format_args_to_string(args),
        )
        try:
            return self.dt_client.ckan2local(
                ckan_host=args.ckan_host
                if args.ckan_host.startswith("https")
                else f"https://{args.ckan_host}",
                ckan_api_key=args.ckan_api_key,
                ckan_organization=args.ckan_organization,
                ckan_dataset=args.ckan_dataset,
                ckan_resource=args.ckan_resource,
                data_target=args.data_target,
            )

        except Exception as ex:
            raise HidDataTransferException(ex) from ex

    def check_command_status(self, args):
        """Checks the status of a CLI command by ID"""
        # Check process group state by id
        # This implies checking the execution state of the last processor in the group
        self.__logger.info(
            "executing check_command_status command with args: %s",
            self.format_args_to_string(args),
        )
        try:
            return self.dt_client.check_command_status(args.command_id)

        except Exception as ex:
            raise HidDataTransferException(ex) from ex
src/dtcli.cfg
ADDED

@@ -0,0 +1,16 @@
[Nifi]
nifi_endpoint=https://nifi.hidalgo2.eu:9443
nifi_upload_folder=/opt/nifi/data/upload
nifi_download_folder=/opt/nifi/data/download
nifi_secure_connection=True

[Keycloak]
keycloak_endpoint=https://idm.hidalgo2.eu
keycloak_client_id=nifi
keycloak_client_secret=Tkt8BQmTfUkSceknml6HDSbmGyNRik9V

[Logging]
logging_level=INFO

[Network]
check_status_sleep_lapse=1
src/parser/__init__.py
ADDED

File without changes

src/parser/cli_parser.py
ADDED

@@ -0,0 +1,370 @@
"""
Copyright 2024 Eviden
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an 'AS IS' BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


This module provides a parser for the command line arguments.
It defines the command line arguments that the user can pass to the CLI.
"""

import argparse
import os

import yaml


class CLIParser(argparse.ArgumentParser):
    """Parser of the command line arguments"""

    def fill_missing_args_from_config(self, args):
        """Fill missing arguments from the config file"""
        # Read default dtcli YAML config file from ~/.dtcli/config if it exists
        dtcli_config_file = os.path.expanduser("~/.dtcli/config")
        if not os.path.exists(dtcli_config_file):
            return args
        with open(dtcli_config_file, "r", encoding="utf8") as f:
            config = yaml.safe_load(f)

        # For each host in the command, find it in the config file and
        # complete the arguments collected from the config file
        # that are not already set in the command line
        for host in config.keys():
            if host in args:
                host_index = args.index(host)
                if (args[host_index-1] == "-H" or
                        args[host_index-1] == "--hpc-host"):
                    self.process_hpc_args(args, config[host])
                elif (args[host_index-1] == "-c" or
                        args[host_index-1] == "--ckan-host"):
                    self.process_ckan_args(args, config[host])
            # Kerberos
            if host.upper() == "KERBEROS" and 'hdfs' in args[0]:
                kerberos_config = config[host]
                self.process_kerberos_args(args, kerberos_config)
        return args

    def process_kerberos_args(self, args, kerberos_config):
        """Process the Kerberos arguments from the config file"""
        if (("-kpr" not in args and "--kerberos-principal" not in args)
                and "principal" in kerberos_config):
            args.append("-kpr")
            args.append(str(kerberos_config["principal"]))
        if (("-kp" not in args and "--kerberos-password" not in args)
                and "password" in kerberos_config):
            args.append("-kp")
            args.append(str(kerberos_config["password"]))

    def process_hpc_args(self, args, hpc_config):
        """Process the HPC arguments from the config file"""
        if (("-z" not in args and "--hpc-port" not in args)
                and "port" in hpc_config):
            args.append("-z")
            args.append(str(hpc_config["port"]))
        if (("-u" not in args and "--hpc-username" not in args)
                and "username" in hpc_config):
            args.append("-u")
            args.append(hpc_config["username"])
        if (("-p" not in args and "--hpc-password" not in args)
                and "password" in hpc_config):
            args.append("-p")
            args.append(hpc_config["password"])
        if (("-k" not in args and "--hpc-secret-key" not in args)
                and "secret-key" in hpc_config):
            args.append("-k")
            args.append(hpc_config["secret-key"])
        if (("-P" not in args and "--hpc-secret-key-password" not in args)
                and "secret-key-password" in hpc_config):
            args.append("-P")
            args.append(hpc_config["secret-key-password"])

    def process_ckan_args(self, args, ckan_config):
        """Process the CKAN arguments from the config file"""
        if (("-a" not in args and "--ckan-api-key" not in args)
                and "api-key" in ckan_config):
            args.append("-a")
            args.append(ckan_config["api-key"])
        if (("-o" not in args and "--ckan-organization" not in args)
                and "organization" in ckan_config):
            args.append("-o")
            args.append(ckan_config["organization"])
        if (("-d" not in args and "--ckan-dataset" not in args)
                and "dataset" in ckan_config):
            args.append("-d")
            args.append(ckan_config["dataset"])

    def add_default_hpc_arguments(self, parser):
        """Add default HPC arguments to the parser"""
        parser.add_argument(
            "-H", "--hpc-host", required=True, help="Target HPC ssh host"
        )
        parser.add_argument(
            "-z", "--hpc-port", required=False, help="[Optional] Target HPC ssh port"
        )
        parser.add_argument(
            "-u", "--hpc-username", required=True, help="Username for HPC account"
        )
        parser.add_argument(
            "-p",
            "--hpc-password",
            required=False,
            help="[Optional] Password for HPC account. "
            "Either password or secret key is required",
        )
        parser.add_argument(
            "-k",
            "--hpc-secret-key",
            required=False,
            help="[Optional] Path to HPC secret key. "
            "Either password or secret key is required",
        )
        parser.add_argument(
            "-P",
            "--hpc-secret-key-password",
            required=False,
            help="[Optional] Password for HPC secret key",
        )
        return parser

    def add_default_ckan_arguments(self, parser):
        """Add default CKAN arguments to the parser"""
        parser.add_argument(
            "-c", "--ckan-host", required=True, help="CKAN host endpoint"
        )
        parser.add_argument(
            "-a",
            "--ckan-api-key",
            required=True,
            help="CKAN API key",
        )
        parser.add_argument(
            "-o",
            "--ckan-organization",
            required=True,
            help="Identifier of the CKAN organization that hosts \
                the dataset resource to transfer. \
                Could be identified by the organization id, name or title",
        )
        parser.add_argument(
            "-d",
            "--ckan-dataset",
            required=True,
            help="Identifier of the CKAN dataset that hosts the resource to transfer. \
                Could be identified by the dataset id, name or title",
        )
        return parser

    def add_default_kerberos_arguments(self, parser):
        """Add default Kerberos arguments to the parser"""
        parser.add_argument(
            "-kpr", "--kerberos-principal",
            required=False,
            help="[Optional] Kerberos principal (mandatory for a Kerberized HDFS)"
        )
        parser.add_argument(
            "-kp", "--kerberos-password",
            required=False,
            help="[Optional] Kerberos principal password "
            "(mandatory for a Kerberized HDFS)"
        )
        return parser

    def parse_arguments(self, args, target):
        """parse the command line arguments

        Args:
            args (list): list of command line arguments
            target (object): target object to execute the command

        Returns:
            argparse.Namespace: the parsed arguments
        """

        # commands
        commands_parsers = self.add_subparsers(
            help="supported commands to transfer data"
        )
        # check status
        check_status_parser = commands_parsers.add_parser(
            "check-status", help="check the status of a command"
        )
        check_status_parser.add_argument(
            "-i",
            "--command_id",
            required=True,
            help="id of command to check status",
        )
        check_status_parser.set_defaults(func=target.check_command_status)

        # hdfs2hpc
        hdfs2hpc_parser = commands_parsers.add_parser(
            "hdfs2hpc", help="transfer data from HDFS to target HPC"
        )
        hdfs2hpc_parser.add_argument(
            "-s", "--data-source",
            required=True, help="HDFS file path"
        )
        hdfs2hpc_parser.add_argument(
            "-t", "--data-target",
            required=False, help="[Optional] HPC folder"
        )
        hdfs2hpc_parser = self.add_default_kerberos_arguments(hdfs2hpc_parser)
        hdfs2hpc_parser = self.add_default_hpc_arguments(hdfs2hpc_parser)
        hdfs2hpc_parser.add_argument(
            "-2fa", "--two-factor-authentication",
            required=False, action="store_true", default=False,
            help="[Optional] HPC requires 2FA authentication"
        )
        hdfs2hpc_parser.set_defaults(func=target.hdfs2hpc)

        # hpc2hdfs
        hpc2hdfs_parser = commands_parsers.add_parser(
            "hpc2hdfs", help="transfer data from HPC to target HDFS"
        )

        hpc2hdfs_parser.add_argument(
            "-s", "--data-source", required=True, help="HPC file path"
        )
        hpc2hdfs_parser.add_argument(
            "-t", "--data-target", required=True, help="HDFS folder"
        )
        hpc2hdfs_parser = self.add_default_kerberos_arguments(hpc2hdfs_parser)
        hpc2hdfs_parser = self.add_default_hpc_arguments(hpc2hdfs_parser)
        hpc2hdfs_parser.add_argument(
            "-2fa", "--two-factor-authentication",
            required=False, action="store_true", default=False,
            help="[Optional] HPC requires 2FA authentication"
        )
        hpc2hdfs_parser.set_defaults(func=target.hpc2hdfs)

        # ckan2hdfs
        ckan2hdfs_parser = commands_parsers.add_parser(
            "ckan2hdfs", help="transfer data from CKAN to target HDFS"
        )
        ckan2hdfs_parser = self.add_default_ckan_arguments(ckan2hdfs_parser)
        ckan2hdfs_parser.add_argument(
            "-r",
            "--ckan-resource",
            required=False,
            help="[Optional] CKAN resource to transfer. \
                Could be identified by the resource id or name. \
                If empty, all resources in the dataset will be transferred. \
                A regex is also accepted to filter the resources to transfer",
        )
        ckan2hdfs_parser.add_argument(
            "-t", "--data-target", required=False, help="[Optional] target HDFS folder"
        )
        ckan2hdfs_parser = self.add_default_kerberos_arguments(ckan2hdfs_parser)
        ckan2hdfs_parser.set_defaults(func=target.ckan2hdfs)

        # hdfs2ckan
        hdfs2ckan_parser = commands_parsers.add_parser(
            "hdfs2ckan", help="transfer data from HDFS to a target CKAN"
        )
        hdfs2ckan_parser = self.add_default_ckan_arguments(hdfs2ckan_parser)
        hdfs2ckan_parser = self.add_default_kerberos_arguments(hdfs2ckan_parser)
        hdfs2ckan_parser.add_argument(
            "-s",
            "--data-source",
            required=True,
            help="File path to HDFS file or directory to transfer",
        )
        hdfs2ckan_parser.set_defaults(func=target.hdfs2ckan)

        # ckan2hpc
        ckan2hpc_parser = commands_parsers.add_parser(
            "ckan2hpc", help="transfer data from CKAN to target HPC"
        )
        ckan2hpc_parser = self.add_default_ckan_arguments(ckan2hpc_parser)
        ckan2hpc_parser.add_argument(
            "-r",
            "--ckan-resource",
            required=False,
            help="[Optional] CKAN resource to transfer. \
                Could be identified by the resource id or name. \
                If empty, all resources in the dataset will be transferred. \
                A regex is also accepted to filter the resources to transfer",
        )
        ckan2hpc_parser.add_argument(
            "-t", "--data-target", required=False, help="[Optional] target HPC folder"
        )
        ckan2hpc_parser = self.add_default_hpc_arguments(ckan2hpc_parser)
        ckan2hpc_parser.add_argument(
            "-2fa", "--two-factor-authentication",
            required=False, action="store_true", default=False,
            help="[Optional] HPC requires 2FA authentication"
        )
        ckan2hpc_parser.set_defaults(func=target.ckan2hpc)

        # hpc2ckan
        hpc2ckan_parser = commands_parsers.add_parser(
            "hpc2ckan", help="transfer data from HPC to a target CKAN"
        )
        hpc2ckan_parser = self.add_default_ckan_arguments(hpc2ckan_parser)
        hpc2ckan_parser = self.add_default_hpc_arguments(hpc2ckan_parser)
        hpc2ckan_parser.add_argument(
            "-2fa", "--two-factor-authentication",
            required=False, action="store_true", default=False,
            help="[Optional] HPC requires 2FA authentication"
        )
        hpc2ckan_parser.add_argument(
            "-s",
            "--data-source",
            required=True,
            help="File path to HPC file or directory to transfer",
        )
        hpc2ckan_parser.set_defaults(func=target.hpc2ckan)

        # local2ckan
        local2ckan_parser = commands_parsers.add_parser(
            "local2ckan", help="transfer data from a local filesystem to a target CKAN"
        )
        local2ckan_parser = self.add_default_ckan_arguments(local2ckan_parser)
        local2ckan_parser.add_argument(
            "-r",
            "--ckan-resource",
            required=False,
            help="[Optional] CKAN resource to create from transferred sources. \
                If omitted, the target resource name will adopt the source file or folder name",
        )

        local2ckan_parser.add_argument(
            "-s",
            "--data-source",
            required=True,
            help="File path to local file or directory to transfer",
        )
        local2ckan_parser.set_defaults(func=target.local2ckan)

        # ckan2local
        ckan2local_parser = commands_parsers.add_parser(
            "ckan2local", help="transfer data from CKAN to a local filesystem"
        )
        ckan2local_parser = self.add_default_ckan_arguments(ckan2local_parser)
        ckan2local_parser.add_argument(
            "-r",
            "--ckan-resource",
            required=False,
            help="[Optional] CKAN resource to transfer. \
                If omitted, all resources in the dataset will be transferred",
        )

        ckan2local_parser.add_argument(
            "-t",
            "--data-target",
            required=False,
            help="Local directory where to transfer the data. \
                If omitted, data will be transferred to the current directory",
        )
        ckan2local_parser.set_defaults(func=target.ckan2local)

        return self.parse_args(args)