expr-codegen 0.10.16__tar.gz → 0.12.0__tar.gz
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/PKG-INFO +40 -10
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/README.md +38 -9
- expr_codegen-0.12.0/expr_codegen/_version.py +1 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/codes.py +29 -8
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/expr.py +7 -9
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/model.py +16 -10
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/pandas/code.py +6 -3
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/pandas/template.py.j2 +2 -2
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/polars_group/code.py +6 -3
- {expr_codegen-0.10.16/expr_codegen/polars_over → expr_codegen-0.12.0/expr_codegen/polars_group}/template.py.j2 +2 -2
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/polars_over/code.py +14 -5
- {expr_codegen-0.10.16/expr_codegen/polars_group → expr_codegen-0.12.0/expr_codegen/polars_over}/template.py.j2 +2 -2
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/tool.py +57 -42
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen.egg-info/PKG-INFO +40 -10
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen.egg-info/requires.txt +1 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/pyproject.toml +1 -0
- expr_codegen-0.10.16/expr_codegen/_version.py +0 -1
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/LICENSE +0 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/__init__.py +0 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/dag.py +0 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/latex/__init__.py +0 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/latex/printer.py +0 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/pandas/__init__.py +0 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/pandas/helper.py +0 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/pandas/printer.py +0 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/pandas/ta.py +0 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/polars_group/__init__.py +0 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/polars_group/printer.py +0 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/polars_over/__init__.py +0 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen/polars_over/printer.py +0 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen.egg-info/SOURCES.txt +0 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen.egg-info/dependency_links.txt +0 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/expr_codegen.egg-info/top_level.txt +0 -0
- {expr_codegen-0.10.16 → expr_codegen-0.12.0}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: expr_codegen
-Version: 0.
+Version: 0.12.0
 Summary: symbol expression to polars expression tool
 Author-email: wukan <wu-kan@163.com>
 License: BSD 3-Clause License
@@ -43,6 +43,7 @@ Requires-Dist: Jinja2
 Requires-Dist: networkx
 Requires-Dist: loguru
 Requires-Dist: sympy
+Requires-Dist: ast-comments
 Provides-Extra: streamlit
 Requires-Dist: streamlit; extra == "streamlit"
 Requires-Dist: streamlit-ace; extra == "streamlit"
@@ -211,6 +212,33 @@ X3 = (ts_returns(CLOSE, 3)).over(_ASSET_, order_by=_DATE_),
 2. `over_null='order_by'`. Rows stay in a single partition, with `null` values sorted first
 3. `over_null=None`. No special handling; the function is called directly, which is faster. Recommended if you are sure no `null` values appear mid-series
 
+`codegen_exec(over_null='partition_by')` uses `partition_by` globally. But `null`-related functions such as `ts_count_nulls`
+must use `over_null=None`, so this tool also adds a comment feature for setting parameters on a single expression line
+
+1. `# --over_null partition_by`. Sets `over_null='partition_by'` for that line
+2. `# --over_null=order_by`. Sets `over_null='order_by'` for that line
+3. `# --over_null`. Sets `over_null=None` for that line
+4. `# `. Uses the `over_null` value passed to `codegen_exec`
+
+Notes:
+
+1. A `# --over_null` parameter comment must follow an expression on the same line; on a line of its own it is ignored
+2. With several `#`, as in `# --over_null # --over_null=order_by`, only the first one takes effect
+3. It only applies to the outermost `ts` function. If the `ts` function is not outermost, extract it by hand. For example:
+```python
+X1 = cs_rank(ts_mean(CLOSE, 3)) # --over_null=order_by # applied to cs_rank, which is meaningless
+X2 = ts_rank(ts_mean(CLOSE, 3), 5) # --over_null=order_by # intended for ts_rank(ts_mean), but because ts_mean is a common subexpression it is actually applied to ts_rank(_x_0)
+```
+
+This should be written as
+
+```python
+_x_0 = ts_mean(CLOSE, 3) # --over_null=order_by
+X1 = cs_rank(_x_0)
+X2 = ts_rank(_x_0, 5)
+```
+4. Because this is easy to get wrong, it is strongly recommended to generate an `output_file` and check that the generated code is correct.
+
 ## Limitations of `expr_codegen`
 
 1. The `DAG` can only add columns, not delete them. When a column is added, an existing column with the same name is overwritten
@@ -220,7 +248,8 @@ X3 = (ts_returns(CLOSE, 3)).over(_ASSET_, order_by=_DATE_),
 
 ## Special syntax
 
-1. Supports the `C?T:F` ternary expression (usable only inside strings). Internally it is first converted to `C or True if( T )else F`, then corrected to `T if C else F
+1. Supports the `C?T:F` ternary expression (usable only inside strings). Internally it is first converted to `C or True if( T )else F`, then corrected to `T if C else F`
+, and finally converted to `if_else(C,T,F)`. It can be mixed with `if else`
 2. `(A<B)*-1` is converted internally to `int_(A<B)*-1`
 3. To stop `sympy` from replacing `A==B` with `False`, it is rewritten internally as `Eq(A,B)`
 4. The meaning of `A^B` depends on the `convert_xor` parameter: with `convert_xor=True` it is converted internally to `Pow(A,B)`, otherwise to `Xor(A,B)`. The default is `False`, with `**` used for exponentiation
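@@ -230,19 +259,20 @@ X3 = (ts_returns(CLOSE, 3)).over(_ASSET_, order_by=_DATE_),
 8. Supports `~A`, converted internally to `Not(A)`
 9. Functions prefixed with `gp_` all map to the corresponding `cs_` function. For example, `gp_func(A,B,C)` is replaced with `cs_func(B,C)`, where `A` is used in `groupby([date, A])`
 10. Supports tuple unpacking such as `A,B,C=MACD()`, which is replaced internally with
-
-
-
-
-
-
-
+```python
+_x_0 = MACD()
+A = unpack(_x_0, 0)
+B = unpack(_x_0, 1)
+C = unpack(_x_0, 2)
+```
+11. Single-line comments can carry parameters, e.g. `# --over_null`, `# --over_null=order_by`, `# --over_null=partition_by`
 
 ## Variables starting with an underscore
 
 1. In the output data, all columns starting with `_` are deleted automatically at the end, so variables you need to keep must not start with `_`
 2. To reduce repeated computation, intermediate variables starting with `_x_` (such as `_x_0`, `_x_1`) are added automatically. They are deleted automatically at the end
-3. If a single-line expression is too long or repeats a computation, it can be split into several lines using intermediate variables. If an intermediate variable uses a leading `_
+3. If a single-line expression is too long or repeats a computation, it can be split into several lines using intermediate variables. If an intermediate variable uses a leading `_`
+, a numeric suffix is added automatically to form distinct variables, e.g. `_A` is replaced with `_A_0_`, `_A_1_`, and so on. Use cases:
 1. The same variable name is reused; these are really different variables
 2. Cyclic assignment, although the `DAG` does not allow cycles. The same-named variables on the left and right of `=` are actually different variables
 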
@@ -160,6 +160,33 @@ X3 = (ts_returns(CLOSE, 3)).over(_ASSET_, order_by=_DATE_),
 2. `over_null='order_by'`. Rows stay in a single partition, with `null` values sorted first
 3. `over_null=None`. No special handling; the function is called directly, which is faster. Recommended if you are sure no `null` values appear mid-series
 
+`codegen_exec(over_null='partition_by')` uses `partition_by` globally. But `null`-related functions such as `ts_count_nulls`
+must use `over_null=None`, so this tool also adds a comment feature for setting parameters on a single expression line
+
+1. `# --over_null partition_by`. Sets `over_null='partition_by'` for that line
+2. `# --over_null=order_by`. Sets `over_null='order_by'` for that line
+3. `# --over_null`. Sets `over_null=None` for that line
+4. `# `. Uses the `over_null` value passed to `codegen_exec`
+
+Notes:
+
+1. A `# --over_null` parameter comment must follow an expression on the same line; on a line of its own it is ignored
+2. With several `#`, as in `# --over_null # --over_null=order_by`, only the first one takes effect
+3. It only applies to the outermost `ts` function. If the `ts` function is not outermost, extract it by hand. For example:
+```python
+X1 = cs_rank(ts_mean(CLOSE, 3)) # --over_null=order_by # applied to cs_rank, which is meaningless
+X2 = ts_rank(ts_mean(CLOSE, 3), 5) # --over_null=order_by # intended for ts_rank(ts_mean), but because ts_mean is a common subexpression it is actually applied to ts_rank(_x_0)
+```
+
+This should be written as
+
+```python
+_x_0 = ts_mean(CLOSE, 3) # --over_null=order_by
+X1 = cs_rank(_x_0)
+X2 = ts_rank(_x_0, 5)
+```
+4. Because this is easy to get wrong, it is strongly recommended to generate an `output_file` and check that the generated code is correct.
+
 ## Limitations of `expr_codegen`
 
 1. The `DAG` can only add columns, not delete them. When a column is added, an existing column with the same name is overwritten
@@ -169,7 +196,8 @@ X3 = (ts_returns(CLOSE, 3)).over(_ASSET_, order_by=_DATE_),
 
 ## Special syntax
 
-1. Supports the `C?T:F` ternary expression (usable only inside strings). Internally it is first converted to `C or True if( T )else F`, then corrected to `T if C else F
+1. Supports the `C?T:F` ternary expression (usable only inside strings). Internally it is first converted to `C or True if( T )else F`, then corrected to `T if C else F`
+, and finally converted to `if_else(C,T,F)`. It can be mixed with `if else`
 2. `(A<B)*-1` is converted internally to `int_(A<B)*-1`
 3. To stop `sympy` from replacing `A==B` with `False`, it is rewritten internally as `Eq(A,B)`
 4. The meaning of `A^B` depends on the `convert_xor` parameter: with `convert_xor=True` it is converted internally to `Pow(A,B)`, otherwise to `Xor(A,B)`. The default is `False`, with `**` used for exponentiation
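@@ -179,19 +207,20 @@ X3 = (ts_returns(CLOSE, 3)).over(_ASSET_, order_by=_DATE_),
 8. Supports `~A`, converted internally to `Not(A)`
 9. Functions prefixed with `gp_` all map to the corresponding `cs_` function. For example, `gp_func(A,B,C)` is replaced with `cs_func(B,C)`, where `A` is used in `groupby([date, A])`
 10. Supports tuple unpacking such as `A,B,C=MACD()`, which is replaced internally with
-
-
-
-
-
-
-
+```python
+_x_0 = MACD()
+A = unpack(_x_0, 0)
+B = unpack(_x_0, 1)
+C = unpack(_x_0, 2)
+```
+11. Single-line comments can carry parameters, e.g. `# --over_null`, `# --over_null=order_by`, `# --over_null=partition_by`
 
 ## Variables starting with an underscore
 
 1. In the output data, all columns starting with `_` are deleted automatically at the end, so variables you need to keep must not start with `_`
 2. To reduce repeated computation, intermediate variables starting with `_x_` (such as `_x_0`, `_x_1`) are added automatically. They are deleted automatically at the end
-3. If a single-line expression is too long or repeats a computation, it can be split into several lines using intermediate variables. If an intermediate variable uses a leading `_
+3. If a single-line expression is too long or repeats a computation, it can be split into several lines using intermediate variables. If an intermediate variable uses a leading `_`
+, a numeric suffix is added automatically to form distinct variables, e.g. `_A` is replaced with `_A_0_`, `_A_1_`, and so on. Use cases:
 1. The same variable name is reused; these are really different variables
 2. Cyclic assignment, although the `DAG` does not allow cycles. The same-named variables on the left and right of `=` are actually different variables
 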
@@ -0,0 +1 @@
+__version__ = "0.12.0"
@@ -2,10 +2,11 @@ import ast
 import re
 from ast import expr
 
+import ast_comments
 from black import Mode, format_str
 from sympy import Add, Mul, Pow, Eq, Not, Xor
 
-from expr_codegen.expr import register_symbols,
+from expr_codegen.expr import register_symbols, list_to_exprs
 
 
 class SyntaxTransformer(ast.NodeTransformer):
@@ -108,7 +109,8 @@ class SyntaxTransformer(ast.NodeTransformer):
     def visit_Subscript(self, node):
         if isinstance(node.slice, ast.Constant) and node.slice.value == 0:
             node = node.value
-        elif isinstance(node.slice, ast.UnaryOp) and isinstance(node.slice.operand,
+        elif isinstance(node.slice, ast.UnaryOp) and isinstance(node.slice.operand,
+                ast.Constant) and node.slice.operand.value == 0:
             node = node.value
         else:
             node = ast.Call(
@@ -328,6 +330,21 @@ def assigns_to_dict(assigns):
     return {ast.unparse(a.targets): ast.unparse(a.value) for a in assigns}
 
 
+def assigns_to_list(assigns):
+    """Convert assignment statements to a list"""
+    outputs = []
+    for i, a in enumerate(assigns):
+        comment = "#"
+        if i + 1 < len(assigns):
+            b = assigns[i + 1]
+            if isinstance(b, ast_comments.Comment):
+                # comment = ast_comments.unparse(b)
+                comment = b.value
+        if isinstance(a, ast.Assign):
+            outputs.append((ast.unparse(a.targets), ast.unparse(a.value), comment))
+    return outputs
+
+
 def raw_to_code(raw):
     """Convert import statements to strings"""
     return '\n'.join([ast.unparse(a) for a in raw])
|
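The pairing that `assigns_to_list` performs in the hunk above (each assignment together with its inline comment, falling back to `"#"`) can be illustrated without the `ast-comments` dependency. The following is only a sketch under stated assumptions, not the package's code: it swaps in the standard-library `tokenize` module to find comments that share a line with code, and the helper names `inline_comments` and `assigns_with_comments` are hypothetical.

```python
import ast
import io
import tokenize

# Token types that do not count as "code" on a line.
_NON_CODE = {tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
             tokenize.DEDENT, tokenize.ENDMARKER}


def inline_comments(source):
    """Map line number -> comment text, for comments that follow code on the same line."""
    found = {}
    code_lines = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            # A comment is inline only if a code token was already seen on its line.
            if tok.start[0] in code_lines:
                found[tok.start[0]] = tok.string
        elif tok.type not in _NON_CODE:
            code_lines.add(tok.start[0])
    return found


def assigns_with_comments(source):
    """Return (target, value, comment) triples; the comment defaults to "#"."""
    comments = inline_comments(source)
    outputs = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            outputs.append((ast.unparse(node.targets[0]),
                            ast.unparse(node.value),
                            comments.get(node.end_lineno, "#")))
    return outputs


src = (
    "X1 = ts_mean(CLOSE, 3)  # --over_null=order_by\n"
    "# a standalone comment is ignored\n"
    "X2 = cs_rank(X1)\n"
)
triples = assigns_with_comments(src)
```

The default `"#"` matters downstream: the `polars_over` generator splits the comment on `#` and takes element 1, so an empty comment still needs one `#` to split on.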
@@ -338,7 +355,7 @@ def sources_to_asts(*sources, convert_xor: bool):
 
     def _source_to_asts(source):
         """Source code"""
-        tree =
+        tree = ast_comments.parse(source_replace(source))
 
         if isinstance(tree.body[0], ast.FunctionDef):
             body = tree.body[0].body
@@ -347,7 +364,7 @@ def sources_to_asts(*sources, convert_xor: bool):
 
         return body
 
-    tree =
+    tree = ast_comments.parse("")
     for arg in sources:
         tree.body.extend(_source_to_asts(arg))
 
@@ -359,16 +376,21 @@ def sources_to_asts(*sources, convert_xor: bool):
     raw = []
     assigns = []
 
-    for node in tree.body:
+    for i, node in enumerate(tree.body):
         # Nodes that need special handling
         if isinstance(node, ast.Assign):
             assigns.append(node)
             continue
+        if isinstance(node, ast_comments.Comment):
+            # Add the comment
+            if node.inline and isinstance(assigns[-1], ast.Assign):
+                assigns.append(node)
+            continue
         # TODO Should other statements be included too? Are there security concerns?
         if isinstance(node, (ast.Import, ast.ImportFrom)):
             raw.append(node)
             continue
-    return raw_to_code(raw),
+    return raw_to_code(raw), assigns_to_list(assigns), t.funcs_new, t.args_new, t.targets_new
 
 
 def _add_default_type(globals_):
@@ -394,5 +416,4 @@ def sources_to_exprs(globals_, *sources, convert_xor: bool):
     register_symbols(funcs_new, globals_, is_function=True)
     register_symbols(args_new, globals_, is_function=False)
     register_symbols(targets_new, globals_, is_function=False)
-
-    return raw, exprs_dict
+    return raw, list_to_exprs(assigns, globals_)
@@ -46,9 +46,8 @@ def register_symbols(syms, globals_, is_function: bool):
     return globals_
 
 
-def
-
-    return exprs_src
+def list_to_exprs(exprs_src, globals_):
+    return [(k, sympify(v, globals_, evaluate=False), c) for k, v, c in exprs_src]
 
 
 def append_node(node, output_exprs):
@@ -290,15 +289,15 @@ def get_key(children):
 def replace_exprs(exprs):
     """Simplify expressions by substitution"""
     # Alpha101 has many ts_sum(x, 10)/10; convert to ts_mean(x, 10)
-    exprs =
+    exprs = [(k, _replace__ts_sum__to__ts_mean(v), c) for k, v, c in exprs]
     # alpha_031 has many cs_rank(cs_rank(x)); convert to cs_rank(x)
-    exprs =
+    exprs = [(k, _replace__repeat(v), c) for k, v, c in exprs]
     # Convert 1.0*VWAP to VWAP
-    exprs =
+    exprs = [(k, _replace__one_mul(v), c) for k, v, c in exprs]
     # Simplify some ts functions whose parameter is 1
-    exprs =
+    exprs = [(k, _replace__ts_xxx_1(v), c) for k, v, c in exprs]
     # Convert ts_delay to ts_delta
-    exprs =
+    exprs = [(k, _replace__ts_delay__to__ts_delta(v), c) for k, v, c in exprs]
 
     return exprs
 
@@ -441,7 +440,6 @@ def _replace__ts_delay__to__ts_delta(e):
             e = e.xreplace({node: replacement})
     return e
 
-
 # def is_meaningless(e):
 #     if _meaningless__ts_xxx_1(e):
 #         return True
@@ -4,7 +4,7 @@ from itertools import product
 import networkx as nx
 from sympy import symbols
 
-from expr_codegen.dag import zero_indegree, hierarchy_pos, remove_paths_by_zero_outdegree
+from expr_codegen.dag import zero_indegree, hierarchy_pos, remove_paths_by_zero_outdegree, zero_outdegree
 from expr_codegen.expr import CL, get_symbols, get_children, get_key, is_simple_expr
 
 _RESERVED_WORD_ = {'_NONE_', '_TRUE_', '_FALSE_'}
@@ -85,12 +85,13 @@ class ListDictList:
                 last_v = v
                 last_k = k
 
-    def optimize(self):
+    def optimize(self, merge: bool):
         """Merge multiple groupby groups according to rules, to reduce run time"""
         # Chaining. The number of groupby calls is unchanged, but chaining them head to tail keeps the data tidy
         self._list = chain_create(self._list)
-
-
+        if merge:
+            # Head and tail match, so join them
+            self.back_merge()
         # Empty rows have appeared; remove them
         self.filter_empty()
 
@@ -196,15 +197,15 @@ def create_dag_exprs(exprs):
     # Create the directed acyclic graph
     G = nx.DiGraph()
 
-    for symbol, expr in exprs
+    for symbol, expr, comment in exprs:
         # if symbol.name == 'GP_0':
         #     test = 1
         if expr.is_Symbol:
-            G.add_node(symbol.name, symbol=symbol, expr=expr)
+            G.add_node(symbol.name, symbol=symbol, expr=expr, comment=comment)
             G.add_edge(expr.name, symbol.name)
         else:
             # Add an intermediate node
-            G.add_node(symbol.name, symbol=symbol, expr=expr)
+            G.add_node(symbol.name, symbol=symbol, expr=expr, comment=comment)
             syms = get_symbols(expr, return_str=True)
             for sym in syms:
                 # Because of the edges, some source nodes are created implicitly here
@@ -221,6 +222,10 @@ def create_dag_exprs(exprs):
         s = symbols(node)
         G.nodes[node]['symbol'] = s
         G.nodes[node]['expr'] = s
+        G.nodes[node]['comment'] = "#"
+    #
+    # for node in zero_outdegree(G):
+    #     print(11, G.nodes[node]['comment'])
     return G
 
 
@@ -380,9 +385,9 @@ def skip_expr_node(G: nx.DiGraph, node, keep_nodes):
     return G
 
 
-def dag_start(
+def dag_start(exprs_list, func, func_kwargs, date, asset):
     """Initial DAG generation"""
-    G = create_dag_exprs(
+    G = create_dag_exprs(exprs_list)
     G = init_dag_exprs(G, func, func_kwargs, date, asset)
 
     # Output layer by layer
@@ -413,11 +418,12 @@ def dag_end(G):
         for node in generation:
             key = G.nodes[node]['key']
             expr = G.nodes[node]['expr']
+            comment = G.nodes[node]['comment']
             symbols = G.nodes[node]['symbols']
             # These special names do not count as field names
             symbols = list(set(symbols) - _RESERVED_WORD_)
 
-            exprs_ldl.append(key, (node, expr, symbols))
+            exprs_ldl.append(key, (node, expr, symbols, comment))
 
     exprs_ldl._list = exprs_ldl.values()[1:]
 
@@ -35,12 +35,15 @@ def symbols_to_code(syms, alias):
 
 
 def codegen(exprs_ldl: ListDictList, exprs_src, syms_dst,
-            filename
+            filename,
             date='date', asset='asset',
             alias: Dict[str, str] = {},
             extra_codes: Sequence[str] = (),
             **kwargs):
     """Template-based code generation"""
+    if filename is None:
+        filename = 'template.py.j2'
+
     # Print pandas-style code
     p = PandasStrPrinter()
 
@@ -67,9 +70,9 @@ def codegen(exprs_ldl: ListDictList, exprs_src, syms_dst,
                     func_code.append(f" # " + '=' * 40)
                     exprs_dst.append(f"#" + '=' * 40 + func_name)
                 else:
-                    va, ex, sym = kv
+                    va, ex, sym, comment = kv
                     func_code.append(f" # {va} = {ex}\n g[{va}] = {p.doprint(ex)}")
-                    exprs_dst.append(f"{va} = {ex}")
+                    exprs_dst.append(f"{va} = {ex} {comment}")
                     if va not in syms_dst:
                         syms_out.append(va)
 
@@ -36,12 +36,15 @@ def symbols_to_code(syms, alias):
 
 
 def codegen(exprs_ldl: ListDictList, exprs_src, syms_dst,
-            filename
+            filename,
             date='date', asset='asset',
             alias: Dict[str, str] = {},
             extra_codes: Sequence[str] = (),
             **kwargs):
     """Template-based code generation"""
+    if filename is None:
+        filename = 'template.py.j2'
+
     # Print Polars-style code
     p = PolarsStrPrinter()
 
@@ -70,7 +73,7 @@ def codegen(exprs_ldl: ListDictList, exprs_src, syms_dst,
                     func_code.append(f" df = df.with_columns(")
                     exprs_dst.append(f"#" + '=' * 40 + func_name)
                 else:
-                    va, ex, sym = kv
+                    va, ex, sym, comment = kv
                     s1 = str(ex)
                     s2 = p.doprint(ex)
                     if s1 != s2:
@@ -78,7 +81,7 @@ def codegen(exprs_ldl: ListDictList, exprs_src, syms_dst,
                         func_code.append(f"# {va} = {s1}")
 
                     func_code.append(f"{va}={s2},")
-                    exprs_dst.append(f"{va} = {s1}")
+                    exprs_dst.append(f"{va} = {s1} {comment}")
                     if va not in syms_dst:
                         syms_out.append(va)
             func_code.append(f" )")
@@ -1,3 +1,4 @@
+import argparse
 import os
 from typing import Sequence, Dict, Literal
 
@@ -36,13 +37,19 @@ def symbols_to_code(syms, alias):
 
 
 def codegen(exprs_ldl: ListDictList, exprs_src, syms_dst,
-            filename
+            filename,
             date='date', asset='asset',
             alias: Dict[str, str] = {},
             extra_codes: Sequence[str] = (),
             over_null: Literal['order_by', 'partition_by', None] = 'partition_by',
             **kwargs):
     """Template-based code generation"""
+    if filename is None:
+        filename = 'template.py.j2'
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--over_null", type=str, nargs="?", default=over_null)
+
     # Print Polars-style code
     p = PolarsStrPrinter()
 
@@ -71,7 +78,9 @@ def codegen(exprs_ldl: ListDictList, exprs_src, syms_dst,
                     func_code.append(f" df = df.with_columns(")
                     exprs_dst.append(f"#" + '=' * 40 + func_name)
                 else:
-                    va, ex, sym = kv
+                    va, ex, sym, comment = kv
+                    # With several '#', only the arguments after the first '#' are used
+                    args, argv = parser.parse_known_args(args=comment.split("#")[1].split(" "))
                     s1 = str(ex)
                     s2 = p.doprint(ex)
                     if s1 != s2:
@@ -84,9 +93,9 @@ def codegen(exprs_ldl: ListDictList, exprs_src, syms_dst,
                             _sym = f"pl.all_horizontal({','.join(_sym)})"
                         else:
                             _sym = ','.join(_sym)
-                        if over_null == 'partition_by':
+                        if args.over_null == 'partition_by':
                             func_code.append(f"{va}=({s2}).over({_sym}, _ASSET_, order_by=_DATE_),")
-                        elif over_null == 'order_by':
+                        elif args.over_null == 'order_by':
                             func_code.append(f"{va}=({s2}).over(_ASSET_, order_by=[{_sym}, _DATE_]),")
                         else:
                             func_code.append(f"{va}=({s2}).over(_ASSET_, order_by=_DATE_),")
@@ -96,7 +105,7 @@ def codegen(exprs_ldl: ListDictList, exprs_src, syms_dst,
                         func_code.append(f"{va}=({s2}).over(_DATE_, '{k[2]}'),")
                     else:
                         func_code.append(f"{va}={s2},")
-                    exprs_dst.append(f"{va} = {s1}")
+                    exprs_dst.append(f"{va} = {s1} {comment}")
                     if va not in syms_dst:
                         syms_out.append(va)
             func_code.append(f" )")
|
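The per-line override in the `polars_over` hunks above relies on `argparse` with `nargs="?"`: the comment is split on `#`, the text after the first `#` is split on spaces, and the tokens are parsed with `parse_known_args`. The sketch below isolates that step so the four comment forms can be checked directly. The wrapper name `parse_over_null` is hypothetical; the `add_argument` and `parse_known_args` calls mirror the lines added in the diff.

```python
import argparse


def parse_over_null(comment, default="partition_by"):
    """Resolve over_null for one expression line from its trailing comment."""
    parser = argparse.ArgumentParser()
    # nargs="?" lets a bare `--over_null` (no value) resolve to const, which is None.
    parser.add_argument("--over_null", type=str, nargs="?", default=default)
    # Only the text after the first '#' is parsed; unknown tokens are ignored.
    tokens = comment.split("#")[1].split(" ")
    args, _unknown = parser.parse_known_args(args=tokens)
    return args.over_null


for text in ("#", "# --over_null", "# --over_null=order_by", "# --over_null partition_by"):
    print(repr(text), "->", repr(parse_over_null(text, default="order_by")))
```

Because the default passed to `add_argument` is the global `over_null`, a plain `#` comment falls back to whatever `codegen_exec` was called with, while a bare `# --over_null` yields `None` and so selects the unmodified `.over(_ASSET_, order_by=_DATE_)` branch.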
@@ -61,7 +61,7 @@ class ExprTool:
     def __init__(self):
         self.get_current_func = get_current_by_prefix
         self.get_current_func_kwargs = {}
-        self.
+        self.exprs_list = {}
         self.exprs_names = []
         self.globals_ = {}
 
@@ -92,7 +92,7 @@ class ExprTool:
         # print(exprs)
         return exprs, syms
 
-    def merge(self, date, asset,
+    def merge(self, date, asset, args):
         """Merge multiple expressions
 
         1. First extract and split out the sub-formulas
@@ -100,28 +100,31 @@ class ExprTool:
 
         Parameters
         ----------
-
-
+        args
+            List of expressions
 
         Returns
         -------
         List of expressions
         """
         # Simplify before extracting
-
+        args = [(k, simplify2(v), c) for k, v, c in args]
 
-
+        # Comment information is preserved
+        exprs_syms = [(self.extract(v, date, asset), c) for k, v, c in args]
         exprs = []
         syms = []
-        for e, s in exprs_syms:
-            exprs.extend(e)
+        for (e, s), c in exprs_syms:
             syms.extend(s)
+            for _ in e:
+                # Attach the comment to each extracted expression
+                exprs.append((_, c))
 
         syms = sorted(set(syms), key=syms.index)
         # If the targets contain duplicate expressions, this gets confused
         exprs = sorted(set(exprs), key=exprs.index)
         # Simplified and unsimplified expressions must not be mixed here, or cse fails; merge the simplified expressions
-        exprs = exprs +
+        exprs = exprs + [(v, c) for k, v, c in args]
 
         # print(exprs)
         syms = [str(s) for s in syms]
|
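The `merge` hunk above de-duplicates with `sorted(set(exprs), key=exprs.index)`, and in 0.12.0 each element is an `(expression, comment)` pair rather than a bare expression. A small sketch of what that idiom does, using plain strings as stand-ins for the `sympy` expressions (an assumption made only to keep the example dependency-free; `dedup_keep_order` is a hypothetical name):

```python
def dedup_keep_order(items):
    """Drop duplicates while keeping each item at its first position."""
    # set() removes duplicates; sorting by first index restores the original order.
    return sorted(set(items), key=items.index)


exprs = [
    ("ts_mean(CLOSE, 3)", "#"),
    ("cs_rank(_x_0)", "#"),
    ("ts_mean(CLOSE, 3)", "#"),                       # exact duplicate, dropped
    ("ts_mean(CLOSE, 3)", "# --over_null=order_by"),  # same expression, different comment, kept
]
unique = dedup_keep_order(exprs)
```

One consequence worth noting: because the comment is part of the tuple, the same expression carrying two different comments survives as two entries. This is consistent with the README advice in this release to inspect the generated `output_file`.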
@@ -130,18 +133,18 @@ class ExprTool:
     def reduce(self, repl, redu):
         """Reduce the number of intermediate variables, which helps lower memory usage"""
 
-
+        exprs_list = []
 
         # Simplify once before cse; do not simplify again after cse
         # (~open at limit-up & previous close at limit-up) | (~close at limit-up & high at limit-up)
-        for
-
-        for
-
+        for k, v in repl:
+            exprs_list.append((k, v, "#"))
+        for k, v, c in redu:
+            exprs_list.append((k, v, c))
 
-        return
+        return exprs_list
 
-    def cse(self, exprs, symbols_repl=None,
+    def cse(self, exprs, symbols_repl=None, exprs_src=None):
         """Multiple sub-formulas plus long formulas: extract the common formulas
 
         Parameters
@@ -150,7 +153,7 @@ class ExprTool:
             List of expressions
         symbols_repl
             Iterator of intermediate field names
-
+        exprs_src
             List of final field names
 
         Returns
@@ -163,34 +166,38 @@ class ExprTool:
             Expressions
 
         """
-        self.exprs_names =
+        self.exprs_names = [k for k, v, c in exprs_src]
+        # Includes comment information
+        _exprs = [k for k, v in exprs]
 
-
-
+        # Note: expressions with the same right-hand side but different left-hand sides are treated as one
+        repl, redu = cse(_exprs, symbols_repl, optimizations="basic")
+        outputs_len = len(exprs_src)
 
         new_redu = []
-        symbols_redu = iter(
+        symbols_redu = iter(exprs_src)
         for expr in redu[-outputs_len:]:
             # Some expressions may appear only earlier and never be used later, e.g. ts_rank(ts_decay_linear(x_147, 11.4157), 6.72611)
             variable = next(symbols_redu)
-
-            new_redu.append((
+            a = symbols(variable[0])
+            new_redu.append((a, expr, variable[2]))
 
-        self.
+        self.exprs_list = self.reduce(repl, new_redu)
 
         # with open("exprs.pickle", "wb") as file:
         #     pickle.dump(exprs_dict, file)
 
-        return self.
+        return self.exprs_list
 
     def dag(self, merge: bool, date, asset):
         """Generate the DAG"""
-        G = dag_start(self.
+        G = dag_start(self.exprs_list, self.get_current_func, self.get_current_func_kwargs, date, asset)
         if merge:
             G = dag_middle(G, self.exprs_names, self.get_current_func, self.get_current_func_kwargs, date, asset)
         return dag_end(G)
 
-    def all(self, exprs_src, style: Literal['pandas', 'polars_group', 'polars_over'
+    def all(self, exprs_src, style: Literal['pandas', 'polars_group', 'polars_over', 'sql'] = 'polars_over',
+            template_file: Optional[str] = None,
             replace: bool = True, regroup: bool = False, format: bool = True,
             date='date', asset='asset',
             alias: Dict[str, str] = {},
@@ -200,10 +207,10 @@ class ExprTool:
 
         Parameters
         ----------
-        exprs_src:
-
+        exprs_src: list
+            List of expressions
         style: str
-            Code style. Options: ('polars_group', 'polars_over', 'pandas')
+            Code style. Options: ('polars_group', 'polars_over', 'pandas', 'sql')
         template_file: str
             The template can be customized as needed
         replace:bool
@@ -226,29 +233,34 @@ class ExprTool:
         Code string
 
         """
-        assert style in ('pandas', 'polars_group', 'polars_over')
+        assert style in ('pandas', 'polars_group', 'polars_over', 'sql')
 
         if replace:
             exprs_src = replace_exprs(exprs_src)
 
         # Sub-expressions first, original expressions last
-        exprs_dst, syms_dst = self.merge(date, asset,
+        exprs_dst, syms_dst = self.merge(date, asset, exprs_src)
         syms_dst = list(set(syms_dst) - _RESERVED_WORD_)
 
         # Extract common subexpressions
-        self.cse(exprs_dst, symbols_repl=numbered_symbols('_x_'),
+        self.cse(exprs_dst, symbols_repl=numbered_symbols('_x_'), exprs_src=exprs_src)
         # Run through the directed acyclic graph
         exprs_ldl, G = self.dag(True, date, asset)
 
         if regroup:
-            exprs_ldl.optimize()
+            exprs_ldl.optimize(merge=style != 'sql')
 
         if style == 'polars_group':
             from expr_codegen.polars_group.code import codegen
         elif style == 'polars_over':
             from expr_codegen.polars_over.code import codegen
-
+        elif style == 'pandas':
             from expr_codegen.pandas.code import codegen
+        elif style == 'sql':
+            from expr_codegen.sql.code import codegen
+            format = False
+        else:
+            raise ValueError(f'unknown style {style}')
 
         extra_codes = [c if isinstance(c, str) else inspect.getsource(c) for c in extra_codes]
 
@@ -272,14 +284,15 @@ class ExprTool:
             extra_codes: str,
             output_file: str,
             convert_xor: bool,
-            style: Literal['pandas', 'polars_group', 'polars_over'
+            style: Literal['pandas', 'polars_group', 'polars_over', 'sql'] = 'polars_over',
+            template_file: Optional[str] = None,
             date: str = 'date', asset: str = 'asset',
             **kwargs) -> str:
         """Generate code from strings. Cached, so repeated calls do not regenerate"""
-        raw,
+        raw, exprs_list = sources_to_exprs(self.globals_, source, *more_sources, convert_xor=convert_xor)
 
         # Generate the code
-        code, G = _TOOL_.all(
+        code, G = _TOOL_.all(exprs_list, style=style, template_file=template_file,
                              replace=True, regroup=True, format=True,
                              date=date, asset=asset,
                              # Copies the functions that are needed, and also the original expressions
@@ -333,14 +346,15 @@ _TOOL_ = ExprTool()
 
 def codegen_exec(df: Optional[DataFrame],
                  *codes,
+                 over_null: Literal['partition_by', 'order_by', None],
                  extra_codes: str = r'CS_SW_L1 = r"^sw_l1_\d+$"',
                  output_file: Union[str, TextIOBase, None] = None,
                  run_file: Union[bool, str] = False,
                  convert_xor: bool = False,
-                 style: Literal['pandas', 'polars_group', 'polars_over'] = 'polars_over',
-                 template_file: str =
+                 style: Literal['pandas', 'polars_group', 'polars_over', 'sql'] = 'polars_over',
+                 template_file: Optional[str] = None,
                  date: str = 'date', asset: str = 'asset',
-
+
                  **kwargs) -> Optional[DataFrame]:
     """Quickly convert source code and execute it
 
@@ -363,9 +377,10 @@ def codegen_exec(df: Optional[DataFrame],
|
|
|
363
377
|
convert_xor: bool
|
|
364
378
|
Whether ^ is converted to XOR or to exponentiation
|
|
365
379
|
style: str
|
|
366
|
-
Code style. One of 'pandas', 'polars_group', 'polars_over'
|
|
380
|
+
Code style. One of 'pandas', 'polars_group', 'polars_over', 'sql'
|
|
367
381
|
- polars_group: Lazy is not supported
|
|
368
382
|
- pandas: struct is not supported
|
|
383
|
+
- sql: only generates the SQL statement; does not execute it
|
|
369
384
|
template_file: str
|
|
370
385
|
Code template
|
|
371
386
|
date: str
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: expr_codegen
|
|
3
|
-
Version: 0.
|
|
3
|
+
Version: 0.12.0
|
|
4
4
|
Summary: symbol expression to polars expression tool
|
|
5
5
|
Author-email: wukan <wu-kan@163.com>
|
|
6
6
|
License: BSD 3-Clause License
|
|
@@ -43,6 +43,7 @@ Requires-Dist: Jinja2
|
|
|
43
43
|
Requires-Dist: networkx
|
|
44
44
|
Requires-Dist: loguru
|
|
45
45
|
Requires-Dist: sympy
|
|
46
|
+
Requires-Dist: ast-comments
|
|
46
47
|
Provides-Extra: streamlit
|
|
47
48
|
Requires-Dist: streamlit; extra == "streamlit"
|
|
48
49
|
Requires-Dist: streamlit-ace; extra == "streamlit"
|
|
@@ -211,6 +212,33 @@ X3 = (ts_returns(CLOSE, 3)).over(_ASSET_, order_by=_DATE_),
|
|
|
211
212
|
2. `over_null='order_by'`. Everything goes into one partition, with `null` sorted first
|
|
212
213
|
3. `over_null=None`. No handling; the function is called directly, which is faster. Recommended if you are sure no `null` appears in the middle of a series
|
|
213
214
|
|
|
215
|
+
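`codegen_exec(over_null='partition_by')` applies `partition_by` globally. However, `null`-related functions such as `ts_count_nulls`
|
|
216
|
+
require `over_null=None`, so this tool also adds a comment feature for setting the parameter on a single-line expression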
|
|
217
|
+
|
|
218
|
+
1. `# --over_null partition_by`. Sets `over_null='partition_by'` for that line
|
|
219
|
+
2. `# --over_null=order_by`. Sets `over_null='order_by'` for that line
|
|
220
|
+
3. `# --over_null`. Sets `over_null=None` for that line
|
|
221
|
+
4. `# `. Uses the `over_null` value passed to `codegen_exec`
|
|
222
|
+
|
|
223
|
+
Notes:
|
|
224
|
+
|
|
225
|
+
1. A `# --over_null` parameter comment must follow a single-line expression on the same line; on a line of its own it is ignored
|
|
226
|
+
2. With multiple `#`, as in `# --over_null # --over_null=order_by`, only the first one takes effect
|
|
227
|
+
3. It only applies to the outermost `ts` function. If the `ts` function is not outermost, extract it by hand. For example:
|
|
228
|
+
```python
|
|
229
|
+
X1 = cs_rank(ts_mean(CLOSE, 3)) # --over_null=order_by # applied to cs_rank, which is meaningless
|
|
230
|
+
X2 = ts_rank(ts_mean(CLOSE, 3), 5) # --over_null=order_by # expected to apply to ts_rank(ts_mean), but because ts_mean is a shared subexpression it actually applies to ts_rank(_x_0)
|
|
231
|
+
```
|
|
232
|
+
|
|
233
|
+
This needs to be written as
|
|
234
|
+
|
|
235
|
+
```python
|
|
236
|
+
_x_0 = ts_mean(CLOSE, 3) # --over_null=order_by
|
|
237
|
+
X1 = cs_rank(_x_0)
|
|
238
|
+
X2 = ts_rank(_x_0, 5)
|
|
239
|
+
```
|
|
240
|
+
4. This is easy to get wrong, so generating `output_file` and checking that the generated code is correct is strongly recommended.
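
The comment rules above can be illustrated with a minimal sketch. It is not the implementation used by `expr_codegen` (the diff only shows that `ast-comments` was added as a dependency); `parse_over_null` and its regular expression are hypothetical names introduced here, and the sketch assumes that `#` never appears inside a string literal on the line.

```python
import re
from typing import Optional

# Hypothetical helper, NOT part of expr_codegen: it only mirrors the README rules.
# Accepts `--over_null`, `--over_null=value` and `--over_null value`.
_OVER_NULL_RE = re.compile(r"^--over_null(?:(?:=|\s+)(\w+))?\s*$")


def parse_over_null(line: str, default: Optional[str]) -> Optional[str]:
    """Return the over_null value implied by the trailing comment of one line."""
    code, sep, comment = line.partition("#")
    if not sep or not code.strip():
        # No comment at all, or a comment on a line of its own: use the global value.
        return default
    # Only the first `#` segment counts; anything after a second `#` is ignored.
    first = comment.split("#", 1)[0].strip()
    match = _OVER_NULL_RE.match(first)
    if match is None:
        # An empty or unrelated comment falls back to the global value.
        return default
    value = match.group(1)
    if value is None:
        # A bare `--over_null` flag means over_null=None.
        return None
    if value not in ("partition_by", "order_by"):
        raise ValueError(f"unknown over_null value {value!r}")
    return value
```

For example, `parse_over_null("X = ts_mean(CLOSE, 3) # --over_null", "partition_by")` returns `None`, while the same line without a comment returns the global `"partition_by"`.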
|
|
241
|
+
|
|
214
242
|
## Limitations of `expr_codegen`
|
|
215
243
|
|
|
216
244
|
1. The `DAG` can only add columns, not delete them. When a column is added, an existing column with the same name is overwritten
|
|
@@ -220,7 +248,8 @@ X3 = (ts_returns(CLOSE, 3)).over(_ASSET_, order_by=_DATE_),
|
|
|
220
248
|
|
|
221
249
|
## Special syntax
|
|
222
250
|
|
|
223
|
-
1. Supports the `C?T:F` ternary expression (usable only inside strings); internally it is first converted to `C or True if( T )else F`, then corrected to `T if C else F
|
|
251
|
+
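1. Supports the `C?T:F` ternary expression (usable only inside strings); internally it is first converted to `C or True if( T )else F`, then corrected to `T if C else F`
|
|
252
|
+
, and finally converted to `if_else(C,T,F)`. It can be mixed with `if else`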
|
|
224
253
|
2. `(A<B)*-1` is converted internally to `int_(A<B)*-1`
|
|
225
254
|
3. To keep `sympy` from replacing `A==B` with `False`, it is converted internally to `Eq(A,B)`
|
|
226
255
|
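4. The meaning of `A^B` depends on the `convert_xor` parameter: with `convert_xor=True` it is converted internally to `Pow(A,B)`, otherwise to `Xor(A,B)`. The default is `False`, in which case `**` is used for exponentiation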
|
|
@@ -230,19 +259,20 @@ X3 = (ts_returns(CLOSE, 3)).over(_ASSET_, order_by=_DATE_),
|
|
|
230
259
|
8. Supports `~A`, which is converted internally to `Not(A)`
|
|
231
260
|
9. Every function starting with `gp_` returns the corresponding `cs_` function. For example, `gp_func(A,B,C)` is replaced with `cs_func(B,C)`, where `A` is used in `groupby([date, A])`
|
|
232
261
|
10. Supports tuple unpacking such as `A,B,C=MACD()`, which is replaced internally with
|
|
233
|
-
|
|
234
|
-
|
|
235
|
-
|
|
236
|
-
|
|
237
|
-
|
|
238
|
-
|
|
239
|
-
|
|
262
|
+
```python
|
|
263
|
+
_x_0 = MACD()
|
|
264
|
+
A = unpack(_x_0, 0)
|
|
265
|
+
B = unpack(_x_0, 1)
|
|
266
|
+
C = unpack(_x_0, 2)
|
|
267
|
+
```
|
|
268
|
+
11. Single-line comments accept parameters, e.g. `# --over_null`, `# --over_null=order_by`, `# --over_null=partition_by`
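
The tuple-unpacking rewrite in item 10 can be sketched with the standard `ast` module. This is only an illustration of the transformation, not the code `expr_codegen` uses; `expand_tuple_unpack` is a hypothetical name, the `_x_N` numbering is assumed to start at 0, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


def expand_tuple_unpack(source: str) -> str:
    """Rewrite `A, B, C = f()` into a temporary assignment plus unpack() calls.

    Hypothetical sketch: only top-level assignments with a single tuple
    target are rewritten; everything else is passed through unchanged.
    """
    tree = ast.parse(source)
    new_body = []
    counter = 0
    for node in tree.body:
        is_tuple_assign = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Tuple)
        )
        if not is_tuple_assign:
            new_body.append(node)
            continue
        # Store the right-hand side once in a temporary `_x_N` variable.
        tmp = f"_x_{counter}"
        counter += 1
        new_body.append(
            ast.Assign(targets=[ast.Name(id=tmp, ctx=ast.Store())], value=node.value)
        )
        # Each tuple element becomes `name = unpack(_x_N, index)`.
        for index, target in enumerate(node.targets[0].elts):
            call = ast.Call(
                func=ast.Name(id="unpack", ctx=ast.Load()),
                args=[ast.Name(id=tmp, ctx=ast.Load()), ast.Constant(value=index)],
                keywords=[],
            )
            new_body.append(ast.Assign(targets=[target], value=call))
    tree.body = new_body
    return ast.unparse(ast.fix_missing_locations(tree))
```

Calling `expand_tuple_unpack("A, B, C = MACD()")` yields the four assignments listed under item 10.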
|
|
240
269
|
|
|
241
270
|
## Variables starting with an underscore
|
|
242
271
|
|
|
243
272
|
1. In the output data, every column starting with `_` is deleted automatically at the end, so variables you need to keep must not start with `_`
|
|
244
273
|
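2. To reduce repeated computation, intermediate variables starting with `_x_` (such as `_x_0`, `_x_1`) are added automatically. They are deleted automatically at the end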
|
|
245
|
-
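3. If a single-line expression is too long or repeats a computation, intermediate variables can be used to split it into multiple lines. If an intermediate variable starts with `_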
|
|
274
|
+
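3. If a single-line expression is too long or repeats a computation, intermediate variables can be used to split it into multiple lines. If an intermediate variable starts with `_`,
|
|
275
|
+
a numeric suffix is added automatically to form distinct variables, e.g. `_A` is replaced with `_A_0_`, `_A_1_`, and so on. Use cases: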
|
|
246
276
|
1. The same variable name is reused; these are really different variables
|
|
247
277
|
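2. Cyclic assignment, although the `DAG` does not allow cycles. Same-named variables on the left and right of `=` are actually different variables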
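
The suffixing of underscore-prefixed variables described above can be sketched as follows. This is an assumption-laden illustration rather than the tool's own code: `version_underscore_vars` is a hypothetical name, the counter is assumed to start at 0 per variable, and only top-level assignments are handled (Python 3.9 or later for `ast.unparse`).

```python
import ast


class _RenameLoads(ast.NodeTransformer):
    """Replace reads of a versioned variable with its latest versioned name."""

    def __init__(self, current):
        self.current = current

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load) and node.id in self.current:
            renamed = ast.Name(id=self.current[node.id], ctx=ast.Load())
            return ast.copy_location(renamed, node)
        return node


def version_underscore_vars(source: str) -> str:
    """Give every assignment to `_name` a fresh `_name_N_` variable.

    Hypothetical sketch: statements other than assignments are left as-is.
    """
    tree = ast.parse(source)
    current = {}  # original name -> latest versioned name
    counts = {}   # original name -> number of assignments seen so far
    for stmt in tree.body:
        if not isinstance(stmt, ast.Assign):
            continue
        # The right-hand side reads the previous version, which breaks the cycle.
        stmt.value = _RenameLoads(current).visit(stmt.value)
        for target in stmt.targets:
            if isinstance(target, ast.Name) and target.id.startswith("_"):
                name = target.id
                index = counts.get(name, 0)
                counts[name] = index + 1
                current[name] = f"{name}_{index}_"
                target.id = current[name]
    return ast.unparse(tree)
```

With this sketch, `_A = _A + 1` after an earlier `_A = ...` becomes `_A_1_ = _A_0_ + 1`, so the left and right sides of `=` refer to different variables and the graph stays acyclic.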
|
|
248
278
|
|
|
@@ -1 +0,0 @@
|
|
|
1
|
-
__version__ = "0.10.16"
|
|
File without changes
|
|