teradataml 20.0.0.0__py3-none-any.whl → 20.0.0.1__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of teradataml might be problematic.
- teradataml/LICENSE-3RD-PARTY.pdf +0 -0
- teradataml/LICENSE.pdf +0 -0
- teradataml/README.md +71 -0
- teradataml/_version.py +2 -2
- teradataml/analytics/analytic_function_executor.py +51 -24
- teradataml/analytics/json_parser/utils.py +11 -17
- teradataml/automl/__init__.py +103 -48
- teradataml/automl/data_preparation.py +55 -37
- teradataml/automl/data_transformation.py +131 -69
- teradataml/automl/feature_engineering.py +117 -185
- teradataml/automl/feature_exploration.py +9 -2
- teradataml/automl/model_evaluation.py +13 -25
- teradataml/automl/model_training.py +214 -75
- teradataml/catalog/model_cataloging_utils.py +1 -1
- teradataml/clients/auth_client.py +133 -0
- teradataml/common/aed_utils.py +3 -2
- teradataml/common/constants.py +11 -6
- teradataml/common/garbagecollector.py +5 -0
- teradataml/common/messagecodes.py +3 -1
- teradataml/common/messages.py +2 -1
- teradataml/common/utils.py +6 -0
- teradataml/context/context.py +49 -29
- teradataml/data/advertising.csv +201 -0
- teradataml/data/bank_marketing.csv +11163 -0
- teradataml/data/bike_sharing.csv +732 -0
- teradataml/data/boston2cols.csv +721 -0
- teradataml/data/breast_cancer.csv +570 -0
- teradataml/data/customer_segmentation_test.csv +2628 -0
- teradataml/data/customer_segmentation_train.csv +8069 -0
- teradataml/data/docs/sqle/docs_17_10/OneHotEncodingFit.py +3 -1
- teradataml/data/docs/sqle/docs_17_10/OneHotEncodingTransform.py +6 -0
- teradataml/data/docs/sqle/docs_17_10/OutlierFilterTransform.py +5 -1
- teradataml/data/docs/sqle/docs_17_20/ANOVA.py +61 -1
- teradataml/data/docs/sqle/docs_17_20/ColumnTransformer.py +2 -0
- teradataml/data/docs/sqle/docs_17_20/FTest.py +105 -26
- teradataml/data/docs/sqle/docs_17_20/GLM.py +162 -1
- teradataml/data/docs/sqle/docs_17_20/GetFutileColumns.py +5 -3
- teradataml/data/docs/sqle/docs_17_20/KMeans.py +48 -1
- teradataml/data/docs/sqle/docs_17_20/NonLinearCombineFit.py +3 -2
- teradataml/data/docs/sqle/docs_17_20/OneHotEncodingFit.py +5 -0
- teradataml/data/docs/sqle/docs_17_20/OneHotEncodingTransform.py +6 -0
- teradataml/data/docs/sqle/docs_17_20/ROC.py +3 -2
- teradataml/data/docs/sqle/docs_17_20/SVMPredict.py +13 -2
- teradataml/data/docs/sqle/docs_17_20/ScaleFit.py +119 -1
- teradataml/data/docs/sqle/docs_17_20/ScaleTransform.py +93 -1
- teradataml/data/docs/sqle/docs_17_20/TDGLMPredict.py +163 -1
- teradataml/data/docs/sqle/docs_17_20/XGBoost.py +12 -4
- teradataml/data/docs/sqle/docs_17_20/XGBoostPredict.py +7 -1
- teradataml/data/docs/sqle/docs_17_20/ZTest.py +72 -7
- teradataml/data/glm_example.json +28 -1
- teradataml/data/housing_train_segment.csv +201 -0
- teradataml/data/insect2Cols.csv +61 -0
- teradataml/data/jsons/sqle/17.20/TD_ANOVA.json +99 -27
- teradataml/data/jsons/sqle/17.20/TD_FTest.json +166 -83
- teradataml/data/jsons/sqle/17.20/TD_GLM.json +90 -14
- teradataml/data/jsons/sqle/17.20/TD_GLMPREDICT.json +48 -5
- teradataml/data/jsons/sqle/17.20/TD_GetFutileColumns.json +5 -3
- teradataml/data/jsons/sqle/17.20/TD_KMeans.json +31 -11
- teradataml/data/jsons/sqle/17.20/TD_NonLinearCombineFit.json +3 -2
- teradataml/data/jsons/sqle/17.20/TD_ROC.json +2 -1
- teradataml/data/jsons/sqle/17.20/TD_SVM.json +16 -16
- teradataml/data/jsons/sqle/17.20/TD_SVMPredict.json +19 -1
- teradataml/data/jsons/sqle/17.20/TD_ScaleFit.json +168 -15
- teradataml/data/jsons/sqle/17.20/TD_ScaleTransform.json +50 -1
- teradataml/data/jsons/sqle/17.20/TD_XGBoost.json +25 -7
- teradataml/data/jsons/sqle/17.20/TD_XGBoostPredict.json +17 -4
- teradataml/data/jsons/sqle/17.20/TD_ZTest.json +157 -80
- teradataml/data/kmeans_example.json +5 -0
- teradataml/data/kmeans_table.csv +10 -0
- teradataml/data/onehot_encoder_train.csv +4 -0
- teradataml/data/openml_example.json +29 -0
- teradataml/data/scale_attributes.csv +3 -0
- teradataml/data/scale_example.json +52 -1
- teradataml/data/scale_input_part_sparse.csv +31 -0
- teradataml/data/scale_input_partitioned.csv +16 -0
- teradataml/data/scale_input_sparse.csv +11 -0
- teradataml/data/scale_parameters.csv +3 -0
- teradataml/data/scripts/deploy_script.py +20 -1
- teradataml/data/scripts/sklearn/sklearn_fit.py +23 -27
- teradataml/data/scripts/sklearn/sklearn_fit_predict.py +20 -28
- teradataml/data/scripts/sklearn/sklearn_function.template +13 -18
- teradataml/data/scripts/sklearn/sklearn_model_selection_split.py +23 -33
- teradataml/data/scripts/sklearn/sklearn_neighbors.py +18 -27
- teradataml/data/scripts/sklearn/sklearn_score.py +20 -29
- teradataml/data/scripts/sklearn/sklearn_transform.py +30 -38
- teradataml/data/teradataml_example.json +77 -0
- teradataml/data/ztest_example.json +16 -0
- teradataml/dataframe/copy_to.py +8 -3
- teradataml/dataframe/data_transfer.py +120 -61
- teradataml/dataframe/dataframe.py +102 -17
- teradataml/dataframe/dataframe_utils.py +47 -9
- teradataml/dataframe/fastload.py +272 -89
- teradataml/dataframe/sql.py +84 -0
- teradataml/dbutils/dbutils.py +2 -2
- teradataml/lib/aed_0_1.dll +0 -0
- teradataml/opensource/sklearn/_sklearn_wrapper.py +102 -55
- teradataml/options/__init__.py +13 -4
- teradataml/options/configure.py +27 -6
- teradataml/scriptmgmt/UserEnv.py +19 -16
- teradataml/scriptmgmt/lls_utils.py +117 -14
- teradataml/table_operators/Script.py +2 -3
- teradataml/table_operators/TableOperator.py +58 -10
- teradataml/utils/validators.py +40 -2
- {teradataml-20.0.0.0.dist-info → teradataml-20.0.0.1.dist-info}/METADATA +78 -6
- {teradataml-20.0.0.0.dist-info → teradataml-20.0.0.1.dist-info}/RECORD +108 -90
- {teradataml-20.0.0.0.dist-info → teradataml-20.0.0.1.dist-info}/WHEEL +0 -0
- {teradataml-20.0.0.0.dist-info → teradataml-20.0.0.1.dist-info}/top_level.txt +0 -0
- {teradataml-20.0.0.0.dist-info → teradataml-20.0.0.1.dist-info}/zip-safe +0 -0
teradataml/scriptmgmt/lls_utils.py
CHANGED
@@ -21,13 +21,14 @@ import requests
 
 from json.decoder import JSONDecodeError
 from teradataml import configure
-from teradataml.context.context import _get_user
+from teradataml.context.context import _get_user, get_connection
 from teradataml.common.constants import HTTPRequest, AsyncStatusColumns
 from teradataml.common.exceptions import TeradataMlException
 from teradataml.common.messages import Messages
 from teradataml.common.messagecodes import MessageCodes
 from teradataml.common.utils import UtilFuncs
 from teradataml.clients.pkce_client import _DAWorkflow
+from teradataml.clients.auth_client import _AuthWorkflow
 from teradataml.utils.internal_buffer import _InternalBuffer
 from teradataml.scriptmgmt.UserEnv import UserEnv, _get_auth_token, \
     _process_ues_response, _get_ues_url, _AuthToken
@@ -1548,15 +1549,17 @@ def get_user_env():
 
 
 @collect_queryband(queryband="StAthTkn")
-def set_auth_token(ues_url, client_id=None):
+def set_auth_token(ues_url, client_id=None, pat_token=None, pem_file=None, **kwargs):
     """
     DESCRIPTION:
         Function to set the Authentication token to connect to User Environment Service
         in VantageCloud Lake.
         Note:
-            User must have a
+            User must have a privilege to login with a NULL password to use set_auth_token().
             Please refer to GRANT LOGON section in Teradata Documentation for more details.
-
+            If ues_url and client_id are specified, then authentication is through OAuth.
+            If ues_url, pat_token and pem_file are specified, then authentication is through PAT.
+            Refresh token still works, but only for OAuth authentication.
 
     PARAMETERS:
         ues_url:
@@ -1570,6 +1573,32 @@ def set_auth_token(ues_url, client_id=None):
             VantageCloud Lake.
             Types: str
 
+        pat_token:
+            Required, if PAT authentication is to be used, optional otherwise.
+            Specifies the PAT token generated from VantageCloud Lake Console.
+            Types: str
+
+        pem_file:
+            Required, if PAT authentication is to be used, optional otherwise.
+            Specifies the path to private key file which is generated from VantageCloud Lake Console.
+            Types: str
+
+        **kwargs:
+            username:
+                Specifies the user for which authentication is to be requested.
+                If not specified, then user associated with current connection is used.
+                Note:
+                    1. Use this option only if name of the database username has lower case letters.
+                    2. This option is used only for PAT and not for OAuth.
+                Types: str
+
+            expiration_time:
+                Specifies the expiration time of the token in seconds. After expiry time JWT token
+                expires and UserEnv methods do not work; user should regenerate the token.
+                Note:
+                    This option is used only for PAT and not for OAuth.
+                Default Value: 31536000
+                Types: int
 
     RETURNS:
         True, if the operation is successful.
@@ -1586,31 +1615,105 @@ def set_auth_token(ues_url, client_id=None):
         # Example 2: Set the Authentication token by specifying the client_id.
         >>> set_auth_token(ues_url=getpass.getpass("ues_url : "),
         ...                client_id=getpass.getpass("client_id : "))
+
+        # Example 3: Set the Authentication token by specifying the "pem_file" and "pat_token"
+        # without specifying "username".
+        >>> import getpass
+        >>> set_auth_token(ues_url=getpass.getpass("ues_url : "),
+        ...                pat_token=getpass.getpass("pat_token : "),
+        ...                pem_file=getpass.getpass("pem_file : "))
+        True
+
+        # Example 4: Set the Authentication token by specifying the "pem_file" and "pat_token"
+        # and "username".
+        >>> import getpass
+        >>> set_auth_token(ues_url=getpass.getpass("ues_url : "),
+        ...                pat_token=getpass.getpass("pat_token : "),
+        ...                pem_file=getpass.getpass("pem_file : "),
+        ...                username="alice")
+        True
     """
+    # Deriving global connection using get_connection().
+    con = get_connection()
+    if con is None:
+        raise TeradataMlException(Messages.get_message(MessageCodes.INVALID_CONTEXT_CONNECTION),
+                                  MessageCodes.INVALID_CONTEXT_CONNECTION)
+
     __arg_info_matrix = []
     __arg_info_matrix.append(["ues_url", ues_url, False, (str), True])
     __arg_info_matrix.append(["client_id", client_id, True, (str), True])
+    __arg_info_matrix.append(["pat_token", pat_token, True, (str), True])
+    __arg_info_matrix.append(["pem_file", pem_file, True, (str), True])
+
+    username = kwargs.get("username", None)
+    __arg_info_matrix.append(["username", username, True, (str), True])
+
+    expiration_time = kwargs.get("expiration_time", 31536000)
+    __arg_info_matrix.append(["expiration_time", expiration_time, True, (int), True])
 
     # Validate arguments.
     _Validators._validate_function_arguments(__arg_info_matrix)
 
+    if client_id and any([pat_token, pem_file]):
+        message = Messages.get_message(MessageCodes.EITHER_THIS_OR_THAT_ARGUMENT,
+                                       "client_id", "pat_token' and 'pem_file")
+        raise TeradataMlException(message, MessageCodes.EITHER_THIS_OR_THAT_ARGUMENT)
+
+    if client_id is None:
+        if (pat_token and pem_file is None) or (pem_file and pat_token is None):
+            message = Messages.get_message(MessageCodes.MUST_PASS_ARGUMENT,
+                                           "pat_token", "pem_file")
+            raise TeradataMlException(message, MessageCodes.MUST_PASS_ARGUMENT)
+
+    # Check if pem file exists.
+    if pem_file is not None:
+        _Validators._validate_file_exists(pem_file)
+
     # Extract the base URL from "ues_url".
     url_parser = urlparse(ues_url)
     base_url = "{}://{}".format(url_parser.scheme, url_parser.netloc)
+    netloc = url_parser.netloc.split('.')[0]
 
-    if
-
-
+    # Check if the authentication is PAT based or OAuth.
+    if all(arg is None for arg in [pat_token, pem_file]):
+        configure._oauth = True
+        client_id = "{}-oaf-device".format(netloc) if client_id is None else client_id
+        da_wf = _DAWorkflow(base_url, client_id)
+        token_data = da_wf._get_token_data()
+
+        # Set Open AF parameters.
+        configure._oauth_client_id = client_id
+        configure._oauth_end_point = da_wf.device_auth_end_point
+        configure._auth_token_expiry_time = time() + token_data["expires_in"] - 15
+
+        # Store the jwt token in internal class attribute.
+        _InternalBuffer.add(auth_token=_AuthToken(token=token_data["access_token"]))
+
+    else:
+        configure._oauth = False
+
+        if username is None:
+            # If username is not specified then the database username associated with
+            # the current context will be considered.
+            username = _get_user()
+
+        org_id = netloc
+
+        # Construct a dictionary to be passed to _AuthWorkflow().
+        state_dict = {}
+        state_dict["base_url"] = base_url
+        state_dict["org_id"] = org_id
+        state_dict["pat_token"] = pat_token
+        state_dict["pem_file"] = pem_file
+        state_dict["username"] = username
+        state_dict["expiration_time"] = expiration_time
 
-
-
+        auth_wf = _AuthWorkflow(state_dict)
+        token_data = auth_wf._proxy_jwt()
+        # Store the jwt token in internal class attribute.
+        _InternalBuffer.add(auth_token=_AuthToken(token=token_data))
 
     # Set Open AF parameters.
-    configure._oauth_client_id = client_id
     configure.ues_url = ues_url
-    configure._oauth_end_point = da_wf.device_auth_end_point
-    configure._auth_token_expiry_time = time() + token_data["expires_in"] - 15
-    # Store the jwt token in internal class attribute.
-    _InternalBuffer.add(auth_token=_AuthToken(token=token_data["access_token"]))
 
     return True
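The hunk above gives `set_auth_token()` two mutually exclusive paths: the OAuth device flow (the pre-existing `client_id` route) and the new PAT route (`pat_token` plus `pem_file`). A minimal usage sketch of both paths follows; it assumes an existing context, and every host, URL and credential value is a placeholder, not a real endpoint or secret.

    # Sketch of the two authentication paths added above.
    # All host/URL/credential values are placeholders.
    from teradataml import create_context, set_auth_token

    # set_auth_token() now requires an active connection; it raises
    # INVALID_CONTEXT_CONNECTION when get_connection() returns None.
    create_context(host="<host>", username="alice", password="<password>")

    # OAuth device flow: pat_token and pem_file stay None, and a default
    # client_id of "<org>-oaf-device" is derived from the UES URL.
    set_auth_token(ues_url="https://<org>.<domain>/open-analytics")

    # PAT flow: pat_token and pem_file must be passed together, otherwise
    # MUST_PASS_ARGUMENT is raised; combining them with client_id raises
    # EITHER_THIS_OR_THAT_ARGUMENT. expiration_time defaults to 31536000 s.
    set_auth_token(ues_url="https://<org>.<domain>/open-analytics",
                   pat_token="<pat>",
                   pem_file="/path/to/private_key.pem",
                   username="alice",
                   expiration_time=3600)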
teradataml/table_operators/Script.py
CHANGED
@@ -728,7 +728,7 @@ class Script(TableOperator):
            5          1            1
 
            # Example 2 -
-            #
+            # Input data is barrier_new and script is executed on Vantage.
            # use set_data() to reset arguments.
            # Create teradataml DataFrame objects.
            >>> load_example_data("Script", ["barrier_new"])
@@ -751,7 +751,7 @@ class Script(TableOperator):
            ...               sort_ascending=False,
            ...               charset='latin',
            ...               returns=OrderedDict([("word", VARCHAR(15)),("count_input", VARCHAR(2))]))
-            # Script is
+            # Script is executed on Vantage.
            >>> sto.execute_script()
            ############ STDOUT Output ############
                    word  count_input
@@ -786,7 +786,6 @@ class Script(TableOperator):
            5          1            1
 
            # Example 3
-            # Script is tested using test_script and executed on Vantage.
            # In order to run the script with same dataset but different data related
            # arguments, use set_data() to reset arguments.
            # Note:
teradataml/table_operators/TableOperator.py
CHANGED
@@ -1015,8 +1015,10 @@ class TableOperator:
    def deploy(self, model_column, partition_columns=None, model_file_prefix=None):
        """
        DESCRIPTION:
-            Function deploys the model generated after `execute_script()` in database
-            environment in
+            Function deploys the model generated after running `execute_script()` in database in
+            VantageCloud Enterprise or in user environment in VantageCloud Lake.
+            If deployed files are not needed, these files can be removed using `remove_file()` in
+            database or `<user_env>.remove_file()` in lake.
 
        PARAMETERS:
            model_column:
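The cleanup path the new description points at differs per platform. A short sketch under stated assumptions: the deployed file identifier "my_prefix__0_10" and the Lake environment object `user_env` are both hypothetical.

    # Hypothetical cleanup of a deployed model file on each platform.
    from teradataml import remove_file

    # VantageCloud Enterprise: files were installed in the database
    # via install_file(), so remove them with the global remove_file().
    remove_file(file_identifier="my_prefix__0_10", force_remove=True)

    # VantageCloud Lake: files live in a user environment, so remove
    # them through that environment's own remove_file() method.
    user_env.remove_file("my_prefix__0_10")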
@@ -1050,12 +1052,14 @@ class TableOperator:
                Types: str
 
        RETURNS:
-            List of generated file names.
+            List of generated file identifiers in database or file names in lake.
 
        RAISES:
            TeradatamlException
 
        EXAMPLES:
+            >>> import teradataml
+            >>> from teradataml import load_example_data
            >>> load_example_data("openml", "multi_model_classification")
 
            >>> df = DataFrame("multi_model_classification")
@@ -1073,12 +1077,16 @@ class TableOperator:
            -0.615226 -0.546472  0.017496 -0.488720      0                 12                  0         10
             0.579671 -0.573365  0.160603  0.014404      0                  9                  1         10
 
+            ## Run in VantageCloud Enterprise using Script object.
            # Install Script file.
            >>> file_location = os.path.join(os.path.dirname(teradataml.__file__), "data", "scripts", "deploy_script.py")
            >>> install_file("deploy_script", file_location, replace=True)
 
+            >>> execute_sql("SET SESSION SEARCHUIFDBPATH = <db_name>;")
+
            # Variables needed for Script execution.
-            >>>
+            >>> from teradataml import configure
+            >>> script_command = f'{configure.indb_install_location} ./<db_name>/deploy_script.py enterprise'
            >>> partition_columns = ["partition_column_1", "partition_column_2"]
            >>> columns = ["col1", "col2", "col3", "col4", "label",
                           "partition_column_1", "partition_column_2"]
@@ -1104,10 +1112,10 @@ class TableOperator:
            # is auto generated.
            >>> obj.deploy(model_column="model",
                           partition_columns=["partition_column_1", "partition_column_2"])
-
-
-
-
+            ['model_file_1710436227163427__0_10',
+             'model_file_1710436227163427__1_10',
+             'model_file_1710436227163427__0_11',
+             'model_file_1710436227163427__1_11']
 
            # Example 2: Provide only "model_file_prefix" argument. Here, filenames are suffixed
            # with 1, 2, 3, ... for multiple models.
@@ -1132,6 +1140,43 @@ class TableOperator:
             'my_prefix_new__1_10',
             'my_prefix_new__1_11']
 
+            ## Run in VantageCloud Lake using Apply object.
+            # Let's assume a user environment named "user_env" already exists in VantageCloud Lake,
+            # which will be used for the examples below.
+
+            # ApplyTableOperator returns BLOB type for model column as per deploy_script.py.
+            >>> returns = OrderedDict([("partition_column_1", INTEGER()),
+                                       ("partition_column_2", INTEGER()),
+                                       ("model", BLOB())])
+
+            # Install the script file which returns model and partition columns.
+            >>> user_env.install_file(file_location)
+
+            >>> script_command = 'python3 deploy_script.py lake'
+            >>> obj = Apply(data=df.select(columns),
+                            script_command=script_command,
+                            data_partition_column=partition_columns,
+                            returns=returns,
+                            env_name="user_env"
+                            )
+
+            >>> opt = obj.execute_script()
+            >>> opt
+               partition_column_1  partition_column_2               model
+                                0                  10  b'gAejc1.....drIr'
+                                0                  11  b'gANjcw.....qWIu'
+                                1                  10  b'abdwcd.....dWIz'
+                                1                  11  b'gA4jc4.....agfu'
+
+            # Example 5: Provide both "partition_columns" and "model_file_prefix" arguments.
+            >>> obj.deploy(model_column="model", model_file_prefix="my_prefix_",
+                           partition_columns=["partition_column_1", "partition_column_2"])
+            ['my_prefix__0_10',
+             'my_prefix__0_11',
+             'my_prefix__1_10',
+             'my_prefix__1_11']
+
+            # Other examples are similar to the examples provided for VantageCloud Enterprise.
        """
 
        arg_info_matrix = []
@@ -1169,6 +1214,9 @@ class TableOperator:
        n_models = len(vals)
        all_files = []
 
+        # Default location for .teradataml is user's home directory if configure.local_storage is not set.
+        tempdir = GarbageCollector._get_temp_dir_name()
+
        for i, row in enumerate(vals):
            model = row[0]
            partition_values = ""
@@ -1178,7 +1226,7 @@ class TableOperator:
                partition_values = str(i+1)
 
            model_file = f"{model_file_prefix}_{partition_values}"
-            model_file_path = os.path.join(
+            model_file_path = os.path.join(tempdir, model_file)
 
            if model_column_type == "CLOB":
                import base64
@@ -1198,7 +1246,7 @@ class TableOperator:
                install_file(file_identifier=model_file, file_path=model_file_path,
                             is_binary=True, suppress_output=True)
            elif self.__class__.__name__ == "Apply":
-                self.env.install_file(
+                self.env.install_file(file_path=model_file_path)
 
            all_files.append(model_file)
 
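The loop above is also where the returned identifiers come from: the docstring examples suggest that partition values are joined with "_" and appended to the prefix (or a 1-based row index is used when no partition columns are given). A standalone sketch with hypothetical partition values reproduces the names seen in the examples:

    # Standalone sketch of deploy()'s file-naming logic; the rows and
    # prefix are hypothetical, and no Vantage connection is involved.
    model_file_prefix = "my_prefix_"
    rows = [("<model-blob>", 0, 10), ("<model-blob>", 0, 11),
            ("<model-blob>", 1, 10), ("<model-blob>", 1, 11)]

    all_files = []
    for i, row in enumerate(rows):
        # With partition_columns given, the suffix is the partition values
        # joined by "_"; without them it would be str(i + 1).
        partition_values = "_".join(str(v) for v in row[1:])
        all_files.append(f"{model_file_prefix}_{partition_values}")

    print(all_files)
    # ['my_prefix__0_10', 'my_prefix__0_11', 'my_prefix__1_10', 'my_prefix__1_11']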
teradataml/utils/validators.py
CHANGED
@@ -170,7 +170,7 @@ class _Validators:
                Required Argument.
                Specifies the name or list of names of columns to be validated
                for existence.
-                Types: str or List of strings
+                Types: str or List of strings or ColumnExpression or list of ColumnExpression
 
            arg_name:
                Required Argument.
@@ -204,7 +204,15 @@ class _Validators:
        df_columns = UtilFuncs._all_df_columns(column_expression)
 
        # Let's validate existence of each column one by one.
-        for column_name in columns:
+        columns_ = []
+        for column in columns:
+            if isinstance(column, str):
+                columns_.append(column)
+            else:
+                columns_ = columns_ + UtilFuncs._all_df_columns(column)
+
+        # Let's validate existence of each column one by one.
+        for column_name in columns_:
            # If column name does not exist in DataFrame of a column, raise the exception.
            if column_name not in df_columns:
                message = "{}. Check the argument '{}'".format(sorted(df_columns), arg_name)
@@ -2237,3 +2245,33 @@ class _Validators:
            raise TeradataMlException(message,
                                      MessageCodes.IMPORT_PYTHON_PACKAGE)
        return True
+
+
+    @staticmethod
+    @skip_validation()
+    def _validate_ipaddress(ip_address):
+        """
+        DESCRIPTION:
+            Check if ipaddress is valid.
+        PARAMETERS:
+            ip_address:
+                Required Argument.
+                Specifies the ip address to be validated.
+                Types: str
+        RETURNS:
+            None.
+        RAISES:
+            TeradataMlException
+        EXAMPLES:
+            _Validators._validate_ipaddress("190.132.12.15")
+        """
+        import ipaddress
+
+        try:
+            ipaddress.ip_address(ip_address)
+        except Exception as err:
+            raise ValueError(Messages.get_message(
+                MessageCodes.INVALID_ARG_VALUE).format(ip_address, "ip_address",
+                'of four numbers (each between 0 and 255) separated by periods'))
+
+        return True
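The new `_validate_ipaddress()` defers the actual parsing to the standard library: `ipaddress.ip_address()` raises `ValueError` for anything that is not a valid IPv4 or IPv6 address, and the validator converts that into teradataml's INVALID_ARG_VALUE message. The underlying behaviour it relies on, shown standalone:

    # Standalone illustration of the stdlib call the new validator wraps.
    import ipaddress

    print(ipaddress.ip_address("190.132.12.15"))   # valid IPv4 parses cleanly
    print(ipaddress.ip_address("2001:db8::1"))     # IPv6 is accepted as well

    try:
        ipaddress.ip_address("190.132.12.256")     # 256 is out of range
    except ValueError as err:
        print(err)  # the validator re-raises this as INVALID_ARG_VALUE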
{teradataml-20.0.0.0.dist-info → teradataml-20.0.0.1.dist-info}/METADATA
CHANGED
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: teradataml
-Version: 20.0.0.0
+Version: 20.0.0.1
 Summary: Teradata Vantage Python package for Advanced Analytics
 Home-page: http://www.teradata.com/
 Author: Teradata Corporation
@@ -8,24 +8,25 @@ License: Teradata License Agreement
 Keywords: Teradata
 Platform: MacOS X, Windows, Linux
 Classifier: Programming Language :: Python :: 3 :: Only
-Classifier: Programming Language :: Python :: 3.
-Classifier: Programming Language :: Python :: 3.
-Classifier: Programming Language :: Python :: 3.7
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
 Classifier: Operating System :: Microsoft :: Windows
 Classifier: Operating System :: MacOS :: MacOS X
 Classifier: Operating System :: POSIX :: Linux
 Classifier: Topic :: Database :: Front-Ends
 Classifier: License :: Other/Proprietary License
-Requires-Python: >=3.
+Requires-Python: >=3.8
 Description-Content-Type: text/markdown
 Requires-Dist: teradatasql (>=17.10.0.11)
-Requires-Dist: teradatasqlalchemy (>=20.0.0.
+Requires-Dist: teradatasqlalchemy (>=20.0.0.1)
 Requires-Dist: pandas (>=0.22)
 Requires-Dist: psutil
 Requires-Dist: requests (>=2.25.1)
 Requires-Dist: scikit-learn (>=0.24.2)
 Requires-Dist: IPython (>=8.10.0)
 Requires-Dist: imbalanced-learn (>=0.8.0)
+Requires-Dist: pyjwt (>=2.8.0)
+Requires-Dist: cryptography (>=42.0.5)
 
 ## Teradata Python package for Advanced Analytics.
 
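The two new dependencies line up with the PAT feature: `cryptography` can parse the PEM private key and `pyjwt` can sign JWTs with it. The exact claims teradataml's `_AuthWorkflow` sends are not visible in this diff, so the following is only a generic sketch of the signing pattern these packages enable; the key path and claims are hypothetical.

    # Generic RS256 JWT-signing sketch enabled by the new pyjwt and
    # cryptography dependencies; key path and claims are hypothetical.
    import time
    import jwt  # from the pyjwt package

    with open("/path/to/private_key.pem", "rb") as f:
        private_key = f.read()  # PEM parsing is handled by cryptography

    token = jwt.encode({"sub": "alice", "exp": int(time.time()) + 3600},
                       private_key,
                       algorithm="RS256")
    print(token)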
@@ -45,6 +46,77 @@ Copyright 2024, Teradata. All Rights Reserved.
 * [License](#license)
 
 ## Release Notes:
+#### teradataml 20.00.00.01
+* teradataml no longer supports Python versions less than 3.8.
+
+* ##### New Features/Functionality
+  * ##### Personal Access Token (PAT) support in teradataml
+    * `set_auth_token()` - teradataml now supports authentication via PAT in addition to
+      OAuth 2.0 Device Authorization Grant (formerly known as the Device Flow).
+      * It accepts the UES URL, a Personal Access Token (PAT) and a Private Key file generated from VantageCloud Lake Console,
+        plus the optional arguments `username` and `expiration_time` in seconds.
+
+* ##### Updates
+  * ##### teradataml: SQLE Engine Analytic Functions
+    * `ANOVA()`
+      * New arguments added: `group_name_column`, `group_value_name`, `group_names`, `num_groups` for data containing group values and group names.
+    * `FTest()`
+      * New arguments added: `sample_name_column`, `sample_name_value`, `first_sample_name`, `second_sample_name`.
+    * `GLM()`
+      * Supports stepwise regression and accepts new arguments `stepwise_direction`, `max_steps_num` and `initial_stepwise_columns`.
+      * New arguments added: `attribute_data`, `parameter_data`, `iteration_mode` and `partition_column`.
+    * `GetFutileColumns()`
+      * Arguments `category_summary_column` and `threshold_value` are now optional.
+    * `KMeans()`
+      * New argument added: `initialcentroids_method`.
+    * `NonLinearCombineFit()`
+      * Argument `result_column` is now optional.
+    * `ROC()`
+      * Argument `positive_class` is now optional.
+    * `SVMPredict()`
+      * New argument added: `model_type`.
+    * `ScaleFit()`
+      * New arguments added: `ignoreinvalid_locationscale`, `unused_attributes`, `attribute_name_column`, `attribute_value_column`.
+      * Arguments `attribute_name_column`, `attribute_value_column` and `target_attributes` are supported for sparse input.
+      * Arguments `attribute_data`, `parameter_data` and `partition_column` are supported for partitioning.
+    * `ScaleTransform()`
+      * New arguments added: `attribute_name_column` and `attribute_value_column` support for sparse input.
+    * `TDGLMPredict()`
+      * New arguments added: `family` and `partition_column`.
+    * `XGBoost()`
+      * New argument `base_score` is added for initial prediction value for all data points.
+    * `XGBoostPredict()`
+      * New argument `detailed` is added for detailed information of each prediction.
+    * `ZTest()`
+      * New arguments added: `sample_name_column`, `sample_value_column`, `first_sample_name` and `second_sample_name`.
+  * ##### teradataml: AutoML
+    * `AutoML()`, `AutoRegressor()` and `AutoClassifier()`
+      * New argument `max_models` is added as an early stopping criterion to limit the maximum number of models to be trained.
+  * ##### teradataml: DataFrame functions
+    * `DataFrame.agg()`
+      * Accepts ColumnExpressions and lists of ColumnExpressions as arguments.
+  * ##### teradataml: General Functions
+    * Data Transfer Utility
+      * `fastload()` - Improved error and warning table handling with the new arguments listed below.
+        * `err_staging_db`
+        * `err_tbl_name`
+        * `warn_tbl_name`
+        * `err_tbl_1_suffix`
+        * `err_tbl_2_suffix`
+      * `fastload()` - Change in behaviour of the `save_errors` argument.
+        When `save_errors` is set to `True`, error information is available in two persistent tables, `ERR_1` and `ERR_2`.
+        When `save_errors` is set to `False`, error information is available in a single pandas DataFrame.
+      * Garbage collector location is now configurable.
+        Users can set `configure.local_storage` to a desired location.
+
+* ##### Bug Fixes
+  * UAF functions now work if the database name has special characters.
+  * OpensourceML can now read and process NULL/nan values.
+  * Boolean output values are now returned as a VARBYTE column with 0 or 1 values in OpensourceML.
+  * Fixed bug for `Apply`'s `deploy()`.
+  * Issue with volatile table creation is fixed: volatile tables are now created in the right database, i.e., the user's spool space, regardless of the temp database specified.
+  * `ColumnTransformer` function now processes its arguments in the order they are passed.
+
 #### teradataml 20.00.00.00
 * ##### New Features/Functionality
   * ###### teradataml OpenML: Run Opensource packages through Teradata Vantage
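
To make the `fastload()` notes above concrete, here is a sketch of the new error-handling arguments under the changed `save_errors` behaviour; the DataFrame contents and all table/database names are illustrative only, and the exact semantics should be verified against the 20.00.00.01 documentation.

    # Illustrative sketch of fastload()'s new error/warning table
    # arguments; names are placeholders, semantics per the notes above.
    import pandas as pd
    from teradataml import create_context, fastload

    create_context(host="<host>", username="<user>", password="<password>")
    pdf = pd.DataFrame({"id": [1, 2, 3], "val": ["a", "b", "c"]})

    # With save_errors=True, error information persists in the two
    # ERR_1/ERR_2 tables; the new arguments control their naming and the
    # staging database. With save_errors=False, errors would instead
    # surface in a single pandas DataFrame.
    fastload(df=pdf,
             table_name="sales_stage",
             save_errors=True,
             err_staging_db="stage_db",
             err_tbl_name="sales_err",
             warn_tbl_name="sales_warn",
             err_tbl_1_suffix="_e1",
             err_tbl_2_suffix="_e2")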
|