wedata-pre-code 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,192 @@
+ Metadata-Version: 2.4
+ Name: wedata-pre-code
+ Version: 1.0.0
+ Summary: Pre-execution code library for the WeData platform, providing deep MLflow integration for machine learning experiments
+ Author-email: WeData Team <wedata@tencent.com>
+ License: MIT
+ Project-URL: Homepage, https://wedata.tencent.com
+ Project-URL: Documentation, https://wedata.tencent.com/docs
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ Requires-Dist: mlflow>=2.0.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=6.0.0; extra == "dev"
+ Requires-Dist: black>=21.0.0; extra == "dev"
+ Requires-Dist: flake8>=3.9.0; extra == "dev"
+
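+ # WeData Pre-Code Library
+
+ A pre-execution code library for the WeData platform. It provides deep MLflow integration for machine learning experiments, along with WeData-specific enhancements.
+
+ ## Overview
+
+ This project provides two versions of the WeData client. When you run machine learning experiments on the WeData platform, they provide the following:
+
+ - **Enhanced MLflow integration**: automatically injects WeData-specific tags and filter conditions
+ - **Access control**: permission checks scoped to a project or workspace
+ - **URL generation**: automatically generates links for viewing experiments and runs
+ - **Environment configuration**: automatically sets the runtime environment variables
+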
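+ ## Versions
+
+ ### Wedata2PreCodeClient (WeData 2.0)
+
+ Client for the WeData 2.0 platform. Key features:
+
+ - Access control based on the project ID
+ - URL templates for both the China site and the international site
+ - Automatic injection of the project tag and the machine learning type tag
+ - A full set of decorators for the MLflow client
+
+ ### Wedata3PreCodeClient (WeData 3.0)
+
+ Client for the WeData 3.0 platform. Key features:
+
+ - Access control based on the workspace ID
+ - More flexible configuration options
+ - Enhanced tag injection and validation
+ - Support for two experiment types: machine learning and deep learning
+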
+ ## Installation and Usage
+
+ ### Install dependencies
+
+ ```bash
+ pip install mlflow
+ ```
+
+ ### Using Wedata2PreCodeClient
+
+ ```python
+ from wedata_pre_code.wedata2.client import Wedata2PreCodeClient
+
+ # Initialize the client
+ client = Wedata2PreCodeClient(
+     wedata_project_id="your_project_id",
+     wedata_notebook_engine="your_engine",
+     qcloud_uin="your_uin",
+     qcloud_subuin="your_subuin",
+     wedata_default_feature_store_database="your_db",
+     wedata_feature_store_databases="your_dbs",
+     qcloud_region="your_region",
+     mlflow_tracking_uri="your_tracking_uri",
+     kernel_task_name="task_name",
+     kernel_task_id="task_id",
+     kernel_submit_form_workflow="your_workflow_id",  # required by the constructor
+     kernel_region="region",
+     kernel_is_international=False
+ )
+
+ # The MLflow client can now be used; the WeData enhancements are applied automatically
+ import mlflow
+ mlflow.start_run()
+ # ... your experiment code
+ ```
+
+ ### Using Wedata3PreCodeClient
+
+ ```python
+ from wedata_pre_code.wedata3.client import Wedata3PreCodeClient
+
+ # Initialize the client
+ client = Wedata3PreCodeClient(
+     workspace_id="your_workspace_id",
+     mlflow_tracking_uri="your_tracking_uri",
+     base_url="your_base_url",
+     region="your_region",
+     run_context_data="your_context_data"
+ )
+
+ # The MLflow client can now be used; the WeData enhancements are applied automatically
+ import mlflow
+ mlflow.start_run()
+ # ... your experiment code
+ ```
+
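+ ## Features
+
+ ### Automatic tag injection
+
+ - Automatically injects WeData platform tags into experiments, runs, and models
+ - Tags include the project ID, workspace ID, machine learning type, and related information
+ - Keeps data traceable on the platform
+
+ ### Permission validation
+
+ - Validates permissions before sensitive operations
+ - Prevents unauthorized operations across projects or workspaces
+ - Protects built-in tags from modification
+
+ ### URL generation
+
+ - Automatically generates URLs for viewing experiments and runs
+ - Prints the access links when a run terminates
+ - Gives users quick access to experiment results
+
+ ### Environment configuration
+
+ - Automatically sets the MLflow tracking URI
+ - Configures the run-context environment variables
+ - Supports separate configurations for the international and China sites
+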
+ ## Project Structure
+
+ ```
+ pre-execute/
+ ├── src/
+ │   └── wedata_pre_code/
+ │       ├── __init__.py
+ │       ├── client.py              # Main client entry point
+ │       ├── common/
+ │       │   ├── __init__.py
+ │       │   └── base_client.py     # Base client class
+ │       ├── wedata2/
+ │       │   ├── __init__.py
+ │       │   └── client.py          # WeData 2.0 client
+ │       └── wedata3/
+ │           ├── __init__.py
+ │           └── client.py          # WeData 3.0 client
+ ├── docs/                          # Documentation
+ ├── pyproject.toml                 # Project configuration
+ ├── requirement.txt                # Dependencies
+ └── README.md                      # Project README
+ ```
+
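+ ## Development Guide
+
+ ### Adding a new decorator
+
+ To add a decorator for another MLflow client method, follow the existing pattern:
+
+ 1. Define the decorator function in the appropriate client class
+ 2. Use `@wraps` to preserve the original function's attributes
+ 3. Implement the specific logic inside the decorator
+ 4. Apply the decorator to the target MLflow method
+
+ ### Testing
+
+ After changing the code, make sure to test the following scenarios:
+
+ - Creating experiments and runs normally
+ - Permission validation
+ - Correctness of tag injection
+ - Accuracy of URL generation
+
+ ## Notes
+
+ - Make sure the MLflow server is configured correctly
+ - Verify that all environment variables are set
+ - Mind the parameter differences between the client versions
+ - Test thoroughly before using in production
+
+ ## Support and Feedback
+
+ For questions or suggestions, contact the WeData platform technical support team.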
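The four steps listed under "adding a new decorator" in the README can be sketched as follows. This is a minimal illustration, not the library's code: `FakeClient`, `inject_tag`, and the `demo.owner` tag are hypothetical stand-ins so the example runs without an MLflow server. In the package itself the same pattern is applied to `MlflowClient` methods.

```python
from functools import wraps


class FakeClient:
    """Hypothetical stand-in for MlflowClient so the sketch runs offline."""

    def create_experiment(self, name, tags=None):
        return {"name": name, "tags": tags or {}}


def inject_tag(func):
    """Step 1: define the decorator."""

    @wraps(func)  # Step 2: keep the wrapped function's __name__ and docstring
    def wrapper(self, *args, **kwargs):
        # Step 3: decorator-specific logic; copy so the caller's dict is untouched
        tags = dict(kwargs.get("tags") or {})
        tags["demo.owner"] = "alice"  # hypothetical tag, for illustration only
        kwargs["tags"] = tags
        return func(self, *args, **kwargs)

    return wrapper


# Step 4: apply the decorator by rebinding the method on the class
FakeClient.create_experiment = inject_tag(FakeClient.create_experiment)

experiment = FakeClient().create_experiment("exp-1", tags={"team": "ml"})
```

Because the method is rebound on the class, every instance created afterwards (and every existing one) goes through the wrapper, which is why the package can patch MLflow once at initialization.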
@@ -0,0 +1,43 @@
+ [build-system]
+ requires = ["setuptools>=61", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "wedata-pre-code"
+ version = "1.0.0"
+ description = "Pre-execution code library for the WeData platform, providing deep MLflow integration for machine learning experiments"
+ authors = [
+     {name = "WeData Team", email = "wedata@tencent.com"}
+ ]
+ readme = "README.md"
+ license = {text = "MIT"}
+ classifiers = [
+     "Development Status :: 4 - Beta",
+     "Intended Audience :: Developers",
+     "License :: OSI Approved :: MIT License",
+     "Programming Language :: Python :: 3",
+     "Programming Language :: Python :: 3.8",
+     "Programming Language :: Python :: 3.9",
+     "Programming Language :: Python :: 3.10",
+     "Programming Language :: Python :: 3.11",
+     "Topic :: Scientific/Engineering :: Artificial Intelligence",
+ ]
+ requires-python = ">=3.8"
+ dependencies = [
+     "mlflow>=2.0.0",
+ ]
+
+ [project.optional-dependencies]
+ dev = [
+     "pytest>=6.0.0",
+     "black>=21.0.0",
+     "flake8>=3.9.0",
+ ]
+
+ [project.urls]
+ Homepage = "https://wedata.tencent.com"
+ Documentation = "https://wedata.tencent.com/docs"
+
+ [tool.setuptools.packages.find]
+ where = ["src"]
+ include = ["wedata_pre_code*"]
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
@@ -0,0 +1,12 @@
+ """WeData Pre-Code Library
+
+ Pre-execution code library for the WeData platform, providing deep MLflow integration for machine learning experiments plus WeData-specific enhancements.
+ """
+
+ from .client import PreCodeClient
+
+ __all__ = [
+     "PreCodeClient"
+ ]
+
+ __version__ = "1.0.0"
@@ -0,0 +1,11 @@
+
+
+ class PreCodeClient:
+     """Factory for the version-specific pre-code clients."""
+
+     def init_wedata2_pre_code(self, **kwargs):
+         # Imported inside the method so only the requested client module is loaded
+         from wedata_pre_code.wedata2.client import Wedata2PreCodeClient
+         return Wedata2PreCodeClient(**kwargs)
+
+     def init_wedata3_pre_code(self, **kwargs):
+         from wedata_pre_code.wedata3.client import Wedata3PreCodeClient
+         return Wedata3PreCodeClient(**kwargs)
@@ -0,0 +1,18 @@
+ class BaseClient:
+     """Shared helpers for setting and validating client attributes."""
+
+     def set_properties(self, **kwargs):
+         for key, value in kwargs.items():
+             setattr(self, key, value)
+
+     def check_properties(self, *args):
+         # Only requires that each attribute exists; None is accepted
+         for arg in args:
+             if not hasattr(self, arg):
+                 raise AttributeError(f"Missing required property: {arg}")
+
+     def check_required_properties(self, *args):
+         # Requires that each attribute exists and is not None
+         for arg in args:
+             if not hasattr(self, arg):
+                 raise AttributeError(f"Missing required property: {arg}")
+             if getattr(self, arg) is None:
+                 raise AttributeError(f"Required property cannot be None: {arg}")
+
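The difference between `check_properties` and `check_required_properties` is easy to miss: the first only requires the attribute to exist, while the second also rejects `None`. The sketch below repeats a trimmed copy of `BaseClient` so that it runs on its own; `DemoClient` and its single attribute are hypothetical and exist only for illustration.

```python
class BaseClient:
    """Trimmed copy of the class in the hunk above, repeated so the sketch is self-contained."""

    def check_properties(self, *args):
        for arg in args:
            if not hasattr(self, arg):
                raise AttributeError(f"Missing required property: {arg}")

    def check_required_properties(self, *args):
        for arg in args:
            if not hasattr(self, arg):
                raise AttributeError(f"Missing required property: {arg}")
            if getattr(self, arg) is None:
                raise AttributeError(f"Required property cannot be None: {arg}")


class DemoClient(BaseClient):  # hypothetical subclass, for illustration only
    def __init__(self, workspace_id=None):
        self.workspace_id = workspace_id


client = DemoClient()                    # workspace_id exists but is None
client.check_properties("workspace_id")  # passes: the attribute exists

try:
    client.check_required_properties("workspace_id")
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)
```

Both version-specific clients call `check_required_properties` in their constructors, so leaving any listed argument at its `None` default raises `AttributeError` before MLflow is touched.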
@@ -0,0 +1,334 @@
+ from wedata_pre_code.common.base_client import BaseClient
+
+
+ class Wedata2PreCodeClient(BaseClient):
+     def __init__(self, wedata_project_id: str = None, wedata_notebook_engine: str = None, qcloud_uin: str = None,
+                  qcloud_subuin: str = None, wedata_default_feature_store_database: str = None,
+                  wedata_feature_store_databases: str = None, qcloud_region: str = None,
+                  mlflow_tracking_uri: str = None, kernel_task_name: str = None, kernel_task_id: str = None,
+                  kernel_submit_form_workflow: str = None, kernel_region: str = None, kernel_is_international: str = None):
+
+         self.wedata_project_id = wedata_project_id
+         self.wedata_notebook_engine = wedata_notebook_engine
+         self.qcloud_uin = qcloud_uin
+         self.qcloud_subuin = qcloud_subuin
+         self.wedata_default_feature_store_database = wedata_default_feature_store_database
+         self.wedata_feature_store_databases = wedata_feature_store_databases
+         self.qcloud_region = qcloud_region
+         self.mlflow_tracking_uri = mlflow_tracking_uri
+         self.kernel_task_name = kernel_task_name
+         self.kernel_task_id = kernel_task_id
+         self.kernel_submit_form_workflow = kernel_submit_form_workflow
+         self.kernel_region = kernel_region
+         self.kernel_is_international = kernel_is_international
+         self.check_required_properties("wedata_project_id", "wedata_notebook_engine", "qcloud_uin", "qcloud_subuin",
+                                        "wedata_default_feature_store_database", "wedata_feature_store_databases",
+                                        "qcloud_region", "mlflow_tracking_uri", "kernel_task_name", "kernel_task_id",
+                                        "kernel_submit_form_workflow", "kernel_region")
+         self.init_pre_code()
+
+     def init_pre_code(self):
+         import os
+         from mlflow.tracking._tracking_service.client import TrackingServiceClient
+         from mlflow.tracking import MlflowClient
+         from functools import wraps
+         import json
+
+         os.environ['WEDATA_PROJECT_ID'] = self.wedata_project_id
+         os.environ['WEDATA_NOTEBOOK_ENGINE'] = self.wedata_notebook_engine
+         os.environ['QCLOUD_UIN'] = self.qcloud_uin
+         os.environ['QCLOUD_SUBUIN'] = self.qcloud_subuin
+         os.environ['WEDATA_DEFAULT_FEATURE_STORE_DATABASE'] = self.wedata_default_feature_store_database
+         os.environ['WEDATA_FEATURE_STORE_DATABASES'] = self.wedata_feature_store_databases
+         os.environ['QCLOUD_REGION'] = self.qcloud_region
+         os.environ["MLFLOW_TRACKING_URI"] = self.mlflow_tracking_uri
+
+         user_name = self.qcloud_uin
+         task_name = self.kernel_task_name
+         task_id = self.kernel_task_id
+         # __init__ already rejects None for this attribute, so no fallback is needed
+         workflow_id = self.kernel_submit_form_workflow
+
+         project_id = self.wedata_project_id
+         os.environ["KERNEL_SUBMIT_FORM_WORKFLOW"] = workflow_id
+
+         region = self.kernel_region
+         is_international = self.kernel_is_international
+
+         template = (
+             "https://{region}.wedata.tencentcloud.com"  # International site
+             if is_international
+             else "https://{region}.wedata.cloud.tencent.com"  # China site
+         )
+         base_url = template.format(region=region)
+
+         run_context_data = {
+             "mlflow.source.name": task_name,
+             "mlflow.user": user_name,
+             "wedata.taskId": task_id,
+             "wedata.workflowId": workflow_id,
+             "wedata.datascience.type": "MACHINE_LEARNING",
+             "wedata.project": project_id
+         }
+         run_context_value = json.dumps(run_context_data, indent=None)
+
+         os.environ["MLFLOW_RUN_CONTEXT"] = run_context_value
+
+         def log_after_terminated(func):
+             @wraps(func)
+             def wrapper(self, run_id, *args, **kwargs):
+                 print("wedata log_after_terminated wrapper")
+                 # Call the original set_terminated
+                 result = func(self, run_id, *args, **kwargs)
+                 # Look up the run to get its name and experiment_id
+                 run_info = self.store.get_run(run_id).info
+                 run_name = run_info.run_name
+                 experiment_id = run_info.experiment_id
+                 experiment_url = f"{base_url}/datascience/experiments-single/{experiment_id}?ProjectId={project_id}"
+                 run_url = f"{base_url}/datascience/experiments/task-detail-learn/{run_id}?ProjectId={project_id}"
+                 print(f"View run {run_name} at: {run_url}")
+                 print(f"View experiment at: {experiment_url}")
+                 return result
+
+             return wrapper
+
+         from mlflow.models.model import Model
+         def inject_model_version_tag(func):
+             @wraps(func)
+             def wrapper(*args, **kwargs):
+                 print("wedata inject_model_version_tag wrapper")
+                 registered_model_name = kwargs.get("registered_model_name")
+                 if registered_model_name is None:
+                     # If it was passed positionally, recover it by binding args to func's signature
+                     import inspect
+                     bound = inspect.signature(func).bind_partial(*args, **kwargs)
+                     registered_model_name = bound.arguments.get("registered_model_name")
+                 result = func(*args, **kwargs)
+                 model_version = result.registered_model_version
+                 # Add the tags, using the values captured from the client
+                 if registered_model_name and model_version:
+                     from mlflow import MlflowClient
+                     MlflowClient().set_model_version_tag(registered_model_name, model_version, "mlflow.user",
+                                                          user_name)
+                     MlflowClient().set_model_version_tag(registered_model_name, model_version, "wedata.project",
+                                                          project_id)
+                     MlflowClient().set_model_version_tag(registered_model_name, model_version,
+                                                          "wedata.datascience.type", "MACHINE_LEARNING")
+                 return result
+
+             return wrapper
+
+         Model.log = inject_model_version_tag(Model.log)
+
+         def inject_project_filter(func):
+             @wraps(func)
+             def wrapper(*args, **kwargs):
+                 # Read the project from the environment
+                 project = os.getenv("WEDATA_PROJECT_ID")
+                 if project:
+                     # Get the caller's filter string
+                     filter_str = kwargs.get("filter_string", "")
+                     # Append the project condition (assumes the project is stored in a tag)
+                     new_filter = f"tags.wedata.project = '{project}' and tags.wedata.datascience.type='MACHINE_LEARNING'"
+                     if filter_str:
+                         new_filter = f"({filter_str}) and ({new_filter})"
+                     kwargs["filter_string"] = new_filter
+                 return func(*args, **kwargs)
+
+             return wrapper
+
+         def inject_project_tag(func):
+             @wraps(func)
+             def wrapper(self, *args, **kwargs):
+                 project = os.getenv("WEDATA_PROJECT_ID")
+                 workflow_id = os.getenv("KERNEL_SUBMIT_FORM_WORKFLOW")
+                 args_list = list(args)
+                 if project:
+                     if 'tags' in kwargs:
+                         tags = kwargs['tags'] or {}
+                         tags = tags.copy()
+                         tags["wedata.project"] = project
+                         tags["wedata.datascience.type"] = "MACHINE_LEARNING"
+                         tags["wedata.workflowId"] = workflow_id
+                         kwargs['tags'] = tags
+                     else:
+                         # Position of `tags` among the positional args (self excluded),
+                         # following the MlflowClient method signatures
+                         method_name = func.__name__
+                         tags_idx = None
+                         if method_name in ('create_experiment', 'create_run'):
+                             tags_idx = 2
+                         elif method_name == 'create_registered_model':
+                             tags_idx = 1
+                         elif method_name == 'create_model_version':
+                             tags_idx = 3
+                         positional = tags_idx is not None and len(args_list) > tags_idx
+                         current_tags = args_list[tags_idx] if positional else None
+                         if current_tags is None:
+                             current_tags = {}
+                         else:
+                             current_tags = current_tags.copy()  # avoid mutating the caller's dict
+                         current_tags["wedata.project"] = project
+                         current_tags["wedata.datascience.type"] = "MACHINE_LEARNING"
+                         current_tags["mlflow.user"] = user_name
+                         # Write the tags back where they came from so `tags` is not passed twice
+                         if positional:
+                             args_list[tags_idx] = current_tags
+                         else:
+                             kwargs["tags"] = current_tags
+                 return func(self, *args_list, **kwargs)
+
+             return wrapper
+
+         def validate_wedata_tag(func):
+             @wraps(func)
+             def wrapper(*args, **kwargs):
+                 project = os.getenv("WEDATA_PROJECT_ID")
+                 # Call the original method to fetch the object
+                 obj = func(*args, **kwargs)
+
+                 # If the object does not exist, return it unchanged
+                 if obj is None:
+                     # print("object is not exists")
+                     return obj
+
+                 project_tag = None
+                 datascience_type_tag = None
+                 method_name = func.__name__
+                 obj_name = 'object'
+                 if 'run' in method_name:
+                     project_tag = obj.data.tags.get("wedata.project")
+                     datascience_type_tag = obj.data.tags.get("wedata.datascience.type")
+                     obj_name = 'run'
+                 elif 'experiment' in method_name:
+                     obj_name = 'experiment'
+                     project_tag = obj.tags.get("wedata.project")
+                     datascience_type_tag = obj.tags.get("wedata.datascience.type")
+                 elif 'model' in method_name:
+                     obj_name = 'model'
+                     project_tag = obj.tags.get("wedata.project")
+                     datascience_type_tag = obj.tags.get("wedata.datascience.type")
+                 # Check that the tags exist and hold the expected values
+                 if project and project_tag != project:
+                     print(f"this project:{project},has no {obj_name}")
+                     return None
+                 if datascience_type_tag != 'MACHINE_LEARNING':
+                     print("Only MACHINE_LEARNING experiment/run/model can be operated in the notebook")
+                     return None
+                 return obj
+
+             return wrapper
+
+         def validate_wedata_before_operation(func):
+             @wraps(func)
+             def wrapper(self, *args, **kwargs):
+                 project = os.getenv("WEDATA_PROJECT_ID")
+                 # If the environment variable is not set, run the original operation directly
+                 if not project:
+                     return func(self, *args, **kwargs)
+                 method_name = func.__name__
+
+                 id_name = None
+                 res = None
+                 project_tag = None
+                 data_science_type = None
+                 # The environment variable is set, so validate the tags.
+                 # Fetch the target object (experiment, model, or run)
+                 if 'experiment' in method_name:
+                     id_name = kwargs.get("experiment_id") or (args[0] if args else None)
+                     res = self.get_experiment(id_name)
+                     if not res:
+                         print(f"Experiment: '{id_name}' not exist or does not have permission to operate")
+                         return
+                     project_tag = res.tags.get("wedata.project")
+                     data_science_type = res.tags.get("wedata.datascience.type")
+                 elif 'model' in method_name:
+                     id_name = kwargs.get("name") or (args[0] if args else None)
+                     res = self.get_registered_model(id_name)
+                     if not res:
+                         print(f"Model '{id_name}' not exist or does not have permission to operate")
+                         return
+                     project_tag = res.tags.get("wedata.project")
+                     data_science_type = res.tags.get("wedata.datascience.type")
+                 else:
+                     id_name = kwargs.get("run_id") or (args[0] if args else None)
+                     res = self.get_run(id_name)
+                     if not res:
+                         print(f"run: '{id_name}' not exist or does not have permission to operate")
+                         return
+                     project_tag = res.data.tags.get("wedata.project")
+                     data_science_type = res.data.tags.get("wedata.datascience.type")
+                 # print(f"query result:{res}")
+                 # Check that the tags match
+                 if project_tag != project or data_science_type != 'MACHINE_LEARNING':
+                     print(f"Unauthorized operation:{method_name} ({id_name})")
+                     return  # do not run the operation
+
+                 # print(method_name)
+                 # Tag operations must not touch the built-in wedata.project tag
+                 if method_name in ("update_tag", "delete_tags",
+                                    "set_registered_model_tag",
+                                    "delete_registered_model_tag",
+                                    "delete_model_version_tag", "set_experiment_tag"):
+                     # Read the `key` argument; for model-version tags it follows `version`
+                     key_idx = 2 if "model_version" in method_name else 1
+                     key_value = kwargs.get("key") or (args[key_idx] if len(args) > key_idx else None)
+                     print(key_value)
+                     if key_value == "wedata.project":
+                         print(f"No permission to operate protected tags: {key_value}")
+                         return
+                 # Tags match; run the operation
+                 return func(self, *args, **kwargs)
+
+             return wrapper
+
+         # 1. Apply the decorators that add the tag conditions to filter_string and inject tags on creation
+         MlflowClient.search_experiments = inject_project_filter(MlflowClient.search_experiments)
+         MlflowClient.search_runs = inject_project_filter(MlflowClient.search_runs)
+         MlflowClient.search_registered_models = inject_project_filter(MlflowClient.search_registered_models)
+         MlflowClient.search_model_versions = inject_project_filter(MlflowClient.search_model_versions)
+         MlflowClient.create_experiment = inject_project_tag(MlflowClient.create_experiment)
+         MlflowClient.create_registered_model = inject_project_tag(MlflowClient.create_registered_model)
+         MlflowClient.create_model_version = inject_project_tag(MlflowClient.create_model_version)
+         # 2. Filter returned objects by the wedata.project tag
+         MlflowClient.get_experiment = validate_wedata_tag(MlflowClient.get_experiment)
+         MlflowClient.get_experiment_by_name = validate_wedata_tag(MlflowClient.get_experiment_by_name)
+         MlflowClient.get_run = validate_wedata_tag(MlflowClient.get_run)
+         MlflowClient.get_parent_run = validate_wedata_tag(MlflowClient.get_parent_run)
+         MlflowClient.get_registered_model = validate_wedata_tag(MlflowClient.get_registered_model)
+         TrackingServiceClient.set_terminated = log_after_terminated(TrackingServiceClient.set_terminated)
+         # MlflowClient.get_model_version = validate_wedata_tag(MlflowClient.get_model_version)
+         # MlflowClient.get_model_version_download_uri = validate_wedata_tag(MlflowClient.get_model_version_download_uri)
+         # MlflowClient.get_latest_versions = validate_wedata_tag(MlflowClient.get_latest_versions)
+         # 3. Validate before the operation (argument: experiment_id)
+         MlflowClient.delete_experiment = validate_wedata_before_operation(MlflowClient.delete_experiment)
+         MlflowClient.restore_experiment = validate_wedata_before_operation(MlflowClient.restore_experiment)
+         MlflowClient.rename_experiment = validate_wedata_before_operation(MlflowClient.rename_experiment)
+         MlflowClient.set_experiment_tag = validate_wedata_before_operation(MlflowClient.set_experiment_tag)
+         # Validate before the operation (argument: run_id)
+         MlflowClient.set_tag = validate_wedata_before_operation(MlflowClient.set_tag)
+         MlflowClient.delete_tag = validate_wedata_before_operation(MlflowClient.delete_tag)
+         MlflowClient.update_run = validate_wedata_before_operation(MlflowClient.update_run)
+         MlflowClient.download_artifacts = validate_wedata_before_operation(MlflowClient.download_artifacts)
+         MlflowClient.list_artifacts = validate_wedata_before_operation(MlflowClient.list_artifacts)
+         MlflowClient.delete_run = validate_wedata_before_operation(MlflowClient.delete_run)
+         MlflowClient.restore_run = validate_wedata_before_operation(MlflowClient.restore_run)
+         # Validate before the operation (argument: name)
+         MlflowClient.rename_registered_model = validate_wedata_before_operation(MlflowClient.rename_registered_model)
+         MlflowClient.update_registered_model = validate_wedata_before_operation(MlflowClient.update_registered_model)
+         MlflowClient.delete_registered_model = validate_wedata_before_operation(MlflowClient.delete_registered_model)
+         MlflowClient.update_model_version = validate_wedata_before_operation(MlflowClient.update_model_version)
+         MlflowClient.delete_model_version = validate_wedata_before_operation(MlflowClient.delete_model_version)
+         MlflowClient.set_model_version_tag = validate_wedata_before_operation(MlflowClient.set_model_version_tag)
+         MlflowClient.delete_model_version_tag = validate_wedata_before_operation(MlflowClient.delete_model_version_tag)
+         MlflowClient.set_registered_model_alias = validate_wedata_before_operation(
+             MlflowClient.set_registered_model_alias)
+         MlflowClient.delete_registered_model_alias = validate_wedata_before_operation(
+             MlflowClient.delete_registered_model_alias)
+
+         # TODO: tag setters need to check whether the key being set is wedata.project
+         MlflowClient.set_registered_model_tag = validate_wedata_before_operation(MlflowClient.set_registered_model_tag)
+         MlflowClient.delete_registered_model_tag = validate_wedata_before_operation(
+             MlflowClient.delete_registered_model_tag)
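The filter rewriting done by `inject_project_filter` in the hunk above reduces to string composition. A minimal sketch, assuming a helper named `scoped_filter` (hypothetical; the package builds the string inline inside the wrapper) and made-up project and filter values:

```python
def scoped_filter(filter_string: str, project: str) -> str:
    """Restrict an MLflow search filter to one project's MACHINE_LEARNING items."""
    scope = (
        f"tags.wedata.project = '{project}' "
        "and tags.wedata.datascience.type='MACHINE_LEARNING'"
    )
    if filter_string:
        # Mirror the wrapper: parenthesize both sides, then join them with `and`
        return f"({filter_string}) and ({scope})"
    return scope


no_filter = scoped_filter("", "p-123")                      # hypothetical project ID
with_filter = scoped_filter("metrics.acc > 0.9", "p-123")   # hypothetical caller filter
```

Keeping the composition in a pure function like this makes the scoping rule testable without an MLflow server. Whether the tracking backend accepts the parenthesized form is not something this sketch verifies.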
@@ -0,0 +1,241 @@
+ from wedata_pre_code.common.base_client import BaseClient
+ import types
+
+
+ class Wedata3PreCodeClient(BaseClient):
+     def __init__(self, workspace_id: str = None, mlflow_tracking_uri: str = None, base_url: str = None,
+                  region: str = None, run_context_data: str = None):
+         self.workspace_id = workspace_id
+         self.mlflow_tracking_uri = mlflow_tracking_uri
+         self.base_url = base_url
+         self.region = region
+         self.run_context_data = run_context_data
+         self.check_required_properties("workspace_id", "mlflow_tracking_uri", "base_url", "run_context_data")
+         self.init_pre_code()
+
+     def init_pre_code(self):
+         import os
+         import json
+         import mlflow
+         from mlflow.tracking._tracking_service.client import TrackingServiceClient
+         from mlflow.tracking import MlflowClient
+         from mlflow.store.tracking.rest_store import RestStore
+         import logging
+         from functools import wraps
+         import inspect
+         from mlflow.models.model import Model
+         os.environ["MLFLOW_RUN_CONTEXT"] = self.run_context_data
+         os.environ["WEDATA_WORKSPACE_ID"] = self.workspace_id
+
+         mlflow.set_tracking_uri(getattr(self, "mlflow_tracking_uri", "http://127.0.0.1:5000"))
+         if self.region:
+             # Logging decorator
+             base_url = self.base_url
+             workspace_id = self.workspace_id
+
+             def log_after_terminated(func):
+                 @wraps(func)
+                 def wrapper(self, run_id, *args, **kwargs):
+                     print("wedata log_after_terminated wrapper")
+                     result = func(self, run_id, *args, **kwargs)
+                     run_info = self.store.get_run(run_id).info
+                     run_name = run_info.run_name
+                     experiment_id = run_info.experiment_id
+                     experiment_url = f"{base_url}/datascience/experiments/experiments-single/{experiment_id}?o={workspace_id}"
+                     run_url = f"{base_url}/datascience/experiments/task-detail-learn/{run_id}?o={workspace_id}"
+                     print(f"View run {run_name} at: {run_url}")
+                     print(f"View experiment at: {experiment_url}")
+                     return result
+
+                 return wrapper
+
+             TrackingServiceClient.set_terminated = log_after_terminated(TrackingServiceClient.set_terminated)
+
+         # Model version tag injection decorator
+         def inject_model_version_tag(func):
+             @wraps(func)
+             def wrapper(*args, **kwargs):
+                 print("wedata inject_model_version_tag wrapper")
+                 registered_model_name = kwargs.get("registered_model_name")
+                 if registered_model_name is None:
+                     # If it was passed positionally, recover it by binding args to func's signature
+                     bound = inspect.signature(func).bind_partial(*args, **kwargs)
+                     registered_model_name = bound.arguments.get("registered_model_name")
+                 result = func(*args, **kwargs)
+                 model_version = result.registered_model_version
+                 if registered_model_name and model_version:
+                     from mlflow import MlflowClient
+                     # NOTE: "${uin}" is written verbatim; this client receives no uin to substitute
+                     MlflowClient().set_model_version_tag(registered_model_name, model_version, "mlflow.user", "${uin}")
+                     MlflowClient().set_model_version_tag(registered_model_name, model_version, "wedata.workspace",
+                                                          os.getenv("WEDATA_WORKSPACE_ID"))
+                     MlflowClient().set_model_version_tag(registered_model_name, model_version,
+                                                          "wedata.datascience.type", "MACHINE_LEARNING")
+                 return result
+
+             return wrapper
+
+         Model.log = inject_model_version_tag(Model.log)
+
+         # Workspace tag injection decorator
+         def inject_workspace_tag(func):
+             @wraps(func)
+             def wrapper(self, *args, **kwargs):
+                 workspace = os.getenv("WEDATA_WORKSPACE_ID")
+                 args_list = list(args)
+                 if workspace:
+                     if 'tags' in kwargs:
+                         tags = kwargs['tags'] or {}
+                         tags = tags.copy()
+                         # Do not inject wedata.workspace or wedata.datascience.type if the caller supplied them
+                         if "wedata.workspace" not in tags:
+                             tags["wedata.workspace"] = workspace
+                         if "wedata.datascience.type" not in tags:
+                             tags["wedata.datascience.type"] = "MACHINE_LEARNING"
+                         kwargs['tags'] = tags
+                     else:
+                         # Position of `tags` among the positional args (self excluded),
+                         # following the MlflowClient method signatures
+                         method_name = func.__name__
+                         tags_idx = None
+                         if method_name in ('create_experiment', 'create_run'):
+                             tags_idx = 2
+                         elif method_name == 'create_registered_model':
+                             tags_idx = 1
+                         elif method_name == 'create_model_version':
+                             tags_idx = 3
+                         positional = tags_idx is not None and len(args_list) > tags_idx
+                         current_tags = args_list[tags_idx] if positional else None
+                         if current_tags is None:
+                             current_tags = {}
+                         else:
+                             current_tags = current_tags.copy()
+                         current_tags["wedata.workspace"] = workspace
+                         current_tags["wedata.datascience.type"] = "MACHINE_LEARNING"
+                         # NOTE: "${uin}" is written verbatim; this client receives no uin to substitute
+                         current_tags["mlflow.user"] = "${uin}"
+                         # Write the tags back where they came from so `tags` is not passed twice
+                         if positional:
+                             args_list[tags_idx] = current_tags
+                         else:
+                             kwargs["tags"] = current_tags
+                 return func(self, *args_list, **kwargs)
+
+             return wrapper
+
+         # Tag validation decorator
+         def validate_wedata_tag(func):
+             @wraps(func)
+             def wrapper(*args, **kwargs):
+                 workspace = os.getenv("WEDATA_WORKSPACE_ID")
+                 obj = func(*args, **kwargs)
+                 if obj is None:
+                     return obj
+                 workspace_tag = None
+                 datascience_type_tag = None
+                 method_name = func.__name__
+                 obj_name = 'object'
+                 if 'run' in method_name:
+                     workspace_tag = obj.data.tags.get("wedata.workspace")
+                     datascience_type_tag = obj.data.tags.get("wedata.datascience.type")
+                     obj_name = 'run'
+                 elif 'experiment' in method_name:
+                     obj_name = 'experiment'
+                     workspace_tag = obj.tags.get("wedata.workspace")
+                     datascience_type_tag = obj.tags.get("wedata.datascience.type")
+                 elif 'model' in method_name:
+                     obj_name = 'model'
+                     workspace_tag = obj.tags.get("wedata.workspace")
+                     datascience_type_tag = obj.tags.get("wedata.datascience.type")
+                 if workspace and workspace_tag != workspace:
+                     print(f"this workspace:{workspace},has no {obj_name}")
+                     return None
+                 if datascience_type_tag not in ('MACHINE_LEARNING', 'DEEP_LEARNING'):
+                     print(
+                         "Only MACHINE_LEARNING and DEEP_LEARNING experiment/run/model can be operated in the notebook")
+                     return None
+                 return obj
+
+             return wrapper
+
+         # Pre-operation validation decorator
+         def validate_wedata_before_operation(func):
+             @wraps(func)
+             def wrapper(self, *args, **kwargs):
+                 workspace = os.getenv("WEDATA_WORKSPACE_ID")
+                 if not workspace:
+                     return func(self, *args, **kwargs)
+                 method_name = func.__name__
+                 id_name = None
+                 res = None
+                 workspace_tag = None
+                 data_science_type = None
+                 if 'experiment' in method_name:
+                     id_name = kwargs.get("experiment_id") or (args[0] if args else None)
+                     res = self.get_experiment(id_name)
+                     if not res:
+                         print(f"Experiment '{id_name}' does not exist or you do not have permission to operate on it")
+                         return
+                     workspace_tag = res.tags.get("wedata.workspace")
+                     data_science_type = res.tags.get("wedata.datascience.type")
+                 elif 'model' in method_name:
+                     id_name = kwargs.get("name") or (args[0] if args else None)
+                     res = self.get_registered_model(id_name)
+                     if not res:
+                         print(f"Model '{id_name}' does not exist or you do not have permission to operate on it")
+                         return
+                     workspace_tag = res.tags.get("wedata.workspace")
+                     data_science_type = res.tags.get("wedata.datascience.type")
+                 else:
+                     id_name = kwargs.get("run_id") or (args[0] if args else None)
+                     res = self.get_run(id_name)
+                     if not res:
+                         print(f"Run '{id_name}' does not exist or you do not have permission to operate on it")
+                         return
+                     workspace_tag = res.data.tags.get("wedata.workspace")
+                     data_science_type = res.data.tags.get("wedata.datascience.type")
+                 if workspace_tag != workspace or data_science_type not in ('MACHINE_LEARNING', 'DEEP_LEARNING'):
+                     print(f"Unauthorized operation: {method_name} ({id_name})")
+                     return
+                 # These names must match the MlflowClient methods that get wrapped.
+                 if method_name in (
+                         "set_tag", "delete_tag", "set_registered_model_tag", "delete_registered_model_tag",
+                         "delete_model_version_tag", "set_experiment_tag"):
+                     key_value = kwargs.get("key") or (args[1] if len(args) > 1 else None)
+                     if key_value == "wedata.workspace":
+                         print(f"No permission to operate protected tags: {key_value}")
+                         return
+                 return func(self, *args, **kwargs)
+
+             return wrapper
+
+         # Apply the decorators
+         MlflowClient.create_experiment = inject_workspace_tag(MlflowClient.create_experiment)
+         MlflowClient.create_registered_model = inject_workspace_tag(MlflowClient.create_registered_model)
+         MlflowClient.create_model_version = inject_workspace_tag(MlflowClient.create_model_version)
+         MlflowClient.get_experiment = validate_wedata_tag(MlflowClient.get_experiment)
+         MlflowClient.get_experiment_by_name = validate_wedata_tag(MlflowClient.get_experiment_by_name)
+         MlflowClient.get_run = validate_wedata_tag(MlflowClient.get_run)
+         MlflowClient.get_parent_run = validate_wedata_tag(MlflowClient.get_parent_run)
+         MlflowClient.get_registered_model = validate_wedata_tag(MlflowClient.get_registered_model)
+         MlflowClient.delete_experiment = validate_wedata_before_operation(MlflowClient.delete_experiment)
+         MlflowClient.restore_experiment = validate_wedata_before_operation(MlflowClient.restore_experiment)
+         MlflowClient.rename_experiment = validate_wedata_before_operation(MlflowClient.rename_experiment)
+         MlflowClient.set_experiment_tag = validate_wedata_before_operation(MlflowClient.set_experiment_tag)
+         MlflowClient.set_tag = validate_wedata_before_operation(MlflowClient.set_tag)
+         MlflowClient.delete_tag = validate_wedata_before_operation(MlflowClient.delete_tag)
+         MlflowClient.update_run = validate_wedata_before_operation(MlflowClient.update_run)
+         MlflowClient.download_artifacts = validate_wedata_before_operation(MlflowClient.download_artifacts)
+         MlflowClient.list_artifacts = validate_wedata_before_operation(MlflowClient.list_artifacts)
+         MlflowClient.delete_run = validate_wedata_before_operation(MlflowClient.delete_run)
+         MlflowClient.restore_run = validate_wedata_before_operation(MlflowClient.restore_run)
+         MlflowClient.rename_registered_model = validate_wedata_before_operation(MlflowClient.rename_registered_model)
+         MlflowClient.update_registered_model = validate_wedata_before_operation(MlflowClient.update_registered_model)
+         MlflowClient.delete_registered_model = validate_wedata_before_operation(MlflowClient.delete_registered_model)
+         MlflowClient.update_model_version = validate_wedata_before_operation(MlflowClient.update_model_version)
+         MlflowClient.delete_model_version = validate_wedata_before_operation(MlflowClient.delete_model_version)
+         MlflowClient.set_model_version_tag = validate_wedata_before_operation(MlflowClient.set_model_version_tag)
+         MlflowClient.delete_model_version_tag = validate_wedata_before_operation(MlflowClient.delete_model_version_tag)
+         MlflowClient.set_registered_model_alias = validate_wedata_before_operation(
+             MlflowClient.set_registered_model_alias)
+         MlflowClient.delete_registered_model_alias = validate_wedata_before_operation(
+             MlflowClient.delete_registered_model_alias)
+         MlflowClient.set_registered_model_tag = validate_wedata_before_operation(MlflowClient.set_registered_model_tag)
+         MlflowClient.delete_registered_model_tag = validate_wedata_before_operation(
+             MlflowClient.delete_registered_model_tag)
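The protected-tag check near the end of `validate_wedata_before_operation` reads the tag key from either `kwargs` or the positional arguments. The following is a minimal standalone sketch of that lookup, not the package's implementation. It assumes a hypothetical `protect_tags` decorator, a hypothetical `key_index` parameter giving the position of `key` after `self`, and a `FakeClient` stand-in so that it runs without MLflow or a tracking server.

```python
from functools import wraps

# Assumed set of protected keys; the diff above only protects "wedata.workspace".
PROTECTED_TAGS = {"wedata.workspace"}


def protect_tags(key_index=1):
    """Block calls that target a protected tag key.

    key_index is the position of `key` among the positional arguments
    after `self` (a hypothetical parameter, not part of the package).
    """
    def decorator(func):
        @wraps(func)
        def wrapper(self, *args, **kwargs):
            key = kwargs.get("key")
            # Only index into args when the tuple is long enough.
            if key is None and len(args) > key_index:
                key = args[key_index]
            if key in PROTECTED_TAGS:
                print(f"No permission to operate protected tags: {key}")
                return None
            return func(self, *args, **kwargs)
        return wrapper
    return decorator


class FakeClient:
    """Stand-in for MlflowClient (hypothetical, for illustration only)."""

    def __init__(self):
        self.tags = {}

    @protect_tags(key_index=1)
    def set_tag(self, run_id, key, value):
        self.tags[(run_id, key)] = value
        return True


client = FakeClient()
blocked = client.set_tag("run-1", "wedata.workspace", "other")   # positional key
allowed = client.set_tag("run-1", "owner", "alice")              # positional key
by_keyword = client.set_tag(run_id="run-1", key="stage", value="dev")
```

Making the key position a parameter is one way to cover methods whose `key` is not the second argument, such as the model-version tag methods, where it follows `name` and `version`.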
@@ -0,0 +1,192 @@
+ Metadata-Version: 2.4
+ Name: wedata-pre-code
+ Version: 1.0.0
+ Summary: Pre-execution code library for the WeData platform, providing deep MLflow integration for machine learning experiments
+ Author-email: WeData Team <wedata@tencent.com>
+ License: MIT
+ Project-URL: Homepage, https://wedata.tencent.com
+ Project-URL: Documentation, https://wedata.tencent.com/docs
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ Requires-Dist: mlflow>=2.0.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=6.0.0; extra == "dev"
+ Requires-Dist: black>=21.0.0; extra == "dev"
+ Requires-Dist: flake8>=3.9.0; extra == "dev"
+
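+ # WeData Pre-Code Library
+
+ A pre-execution code library for the WeData platform that provides deep MLflow integration for machine learning experiments, along with WeData-specific enhancements.
+
+ ## Project Overview
+
+ This project ships two versions of the WeData client. When machine learning experiments run on the WeData platform, the clients provide:
+
+ - **Enhanced MLflow integration**: automatically injects WeData-specific tags and filter conditions
+ - **Access control**: permission checks based on the project or workspace
+ - **URL generation**: automatically generates links for viewing experiments and runs
+ - **Environment configuration**: automatically sets runtime environment variables
+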
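+ ## Versions
+
+ ### Wedata2PreCodeClient (WeData 2.0)
+
+ Client for the WeData 2.0 platform. Key features:
+
+ - Access control based on the project ID
+ - URL templates for both the mainland China site and the international site
+ - Automatic injection of project tags and the machine learning type tag
+ - A complete set of MLflow client decorators
+
+ ### Wedata3PreCodeClient (WeData 3.0)
+
+ Client for the WeData 3.0 platform. Key features:
+
+ - Access control based on the workspace ID
+ - More flexible configuration options
+ - Enhanced tag injection and validation
+ - Support for two experiment types: machine learning and deep learning
+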
+ ## Installation and Usage
+
+ ### Install dependencies
+
+ ```bash
+ pip install mlflow
+ ```
+
+ ### Using Wedata2PreCodeClient
+
+ ```python
+ from wedata_pre_code.wedata2.client import Wedata2PreCodeClient
+
+ # Initialize the client
+ client = Wedata2PreCodeClient(
+     wedata_project_id="your_project_id",
+     wedata_notebook_engine="your_engine",
+     qcloud_uin="your_uin",
+     qcloud_subuin="your_subuin",
+     wedata_default_feature_store_database="your_db",
+     wedata_feature_store_databases="your_dbs",
+     qcloud_region="your_region",
+     mlflow_tracking_uri="your_tracking_uri",
+     kernel_task_name="task_name",
+     kernel_task_id="task_id",
+     kernel_region="region",
+     kernel_is_international=False
+ )
+
+ # The MLflow client can now be used; the WeData enhancements are applied automatically
+ import mlflow
+ mlflow.start_run()
+ # ... your experiment code
+ ```
+
+ ### Using Wedata3PreCodeClient
+
+ ```python
+ from wedata_pre_code.wedata3.client import Wedata3PreCodeClient
+
+ # Initialize the client
+ client = Wedata3PreCodeClient(
+     workspace_id="your_workspace_id",
+     mlflow_tracking_uri="your_tracking_uri",
+     base_url="your_base_url",
+     region="your_region",
+     run_context_data="your_context_data"
+ )
+
+ # The MLflow client can now be used; the WeData enhancements are applied automatically
+ import mlflow
+ mlflow.start_run()
+ # ... your experiment code
+ ```
+
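+ ## Features
+
+ ### Automatic tag injection
+
+ - Automatically injects WeData platform tags into experiments, runs, and models
+ - Tags include the project ID, workspace ID, machine learning type, and related information
+ - Keeps data traceable on the platform
+
+ ### Permission validation
+
+ - Validates permissions before sensitive operations are executed
+ - Prevents unauthorized operations across projects or workspaces
+ - Protects built-in tags from modification
+
+ ### URL generation
+
+ - Automatically generates view URLs for experiments and runs
+ - Displays the access link when a run terminates
+ - Gives users quick access to experiment results
+
+ ### Environment configuration
+
+ - Automatically sets the MLflow tracking URI
+ - Configures the run-context environment variables
+ - Supports separate configurations for the international site and the mainland China site
+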
+ ## Project Structure
+
+ ```
+ pre-execute/
+ ├── src/
+ │   └── wedata_pre_code/
+ │       ├── __init__.py
+ │       ├── client.py              # Main client entry point
+ │       ├── common/
+ │       │   ├── __init__.py
+ │       │   └── base_client.py     # Base client class
+ │       ├── wedata2/
+ │       │   ├── __init__.py
+ │       │   └── client.py          # WeData 2.0 client
+ │       └── wedata3/
+ │           ├── __init__.py
+ │           └── client.py          # WeData 3.0 client
+ ├── docs/                          # Documentation
+ ├── pyproject.toml                 # Project configuration
+ ├── requirement.txt                # Dependency list
+ └── README.md                      # Project overview
+ ```
+
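+ ## Development Guide
+
+ ### Adding a new decorator
+
+ To add a decorator for another MLflow client method, follow the existing implementation pattern:
+
+ 1. Define the decorator function in the corresponding client class
+ 2. Use `@wraps` to preserve the attributes of the original function
+ 3. Implement the decorator-specific logic inside the decorator
+ 4. Apply the decorator to the target MLflow method
+
+ ### Testing
+
+ After modifying the code, make sure to test the following scenarios:
+
+ - Creating experiments and runs normally
+ - Permission validation
+ - Correctness of tag injection
+ - Accuracy of generated URLs
+
+ ## Notes
+
+ - Make sure the MLflow server is configured correctly
+ - Verify that all required environment variables are set
+ - Be aware of the parameter differences between the two client versions
+ - Test thoroughly before using in production
+
+ ## Support and Feedback
+
+ For questions or suggestions, contact the WeData platform technical support team.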
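The four steps the README lists for adding a new decorator can be sketched as follows. This is an illustrative example under stated assumptions, not code from the package: `DemoClient`, `inject_demo_tag`, and the `demo.workspace` tag are hypothetical names, and a plain class stands in for the MLflow client so the snippet runs on its own.

```python
from functools import wraps


class DemoClient:
    """Hypothetical stand-in for an MLflow client class."""

    def create_experiment(self, name, tags=None):
        return {"name": name, "tags": tags or {}}


def inject_demo_tag(func):                      # step 1: define the decorator
    @wraps(func)                                # step 2: keep __name__ and __doc__
    def wrapper(self, *args, **kwargs):
        # step 3: decorator-specific logic. Copy the dict so the caller's
        # object is not mutated. This sketch only handles keyword `tags`.
        tags = dict(kwargs.get("tags") or {})
        tags.setdefault("demo.workspace", "ws-123")
        kwargs["tags"] = tags
        return func(self, *args, **kwargs)
    return wrapper


# step 4: apply the decorator to the target method
DemoClient.create_experiment = inject_demo_tag(DemoClient.create_experiment)

experiment = DemoClient().create_experiment("baseline", tags={"team": "ml"})
```

Because the wrapper always sets `kwargs["tags"]`, a caller that passes `tags` positionally would hit a duplicate-argument `TypeError`; a fuller version would need to remove the positional value first.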
@@ -0,0 +1,15 @@
+ README.md
+ pyproject.toml
+ src/wedata_pre_code/__init__.py
+ src/wedata_pre_code/client.py
+ src/wedata_pre_code.egg-info/PKG-INFO
+ src/wedata_pre_code.egg-info/SOURCES.txt
+ src/wedata_pre_code.egg-info/dependency_links.txt
+ src/wedata_pre_code.egg-info/requires.txt
+ src/wedata_pre_code.egg-info/top_level.txt
+ src/wedata_pre_code/common/__init__.py
+ src/wedata_pre_code/common/base_client.py
+ src/wedata_pre_code/wedata2/__init__.py
+ src/wedata_pre_code/wedata2/client.py
+ src/wedata_pre_code/wedata3/__init__.py
+ src/wedata_pre_code/wedata3/client.py
@@ -0,0 +1,6 @@
+ mlflow>=2.0.0
+
+ [dev]
+ pytest>=6.0.0
+ black>=21.0.0
+ flake8>=3.9.0
@@ -0,0 +1 @@
+ wedata_pre_code