
jupyter-data-fetch 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

jupyter_data_fetch-0.1.0/.gitignore +10 -0
jupyter_data_fetch-0.1.0/LICENSE +21 -0
jupyter_data_fetch-0.1.0/PKG-INFO +90 -0
jupyter_data_fetch-0.1.0/README.md +79 -0
jupyter_data_fetch-0.1.0/jupyter_data_fetch/__init__.py +1 -0
jupyter_data_fetch-0.1.0/jupyter_data_fetch/_version.py +1 -0
jupyter_data_fetch-0.1.0/jupyter_data_fetch/codec.py +191 -0
jupyter_data_fetch-0.1.0/jupyter_data_fetch/wraps/__init__.py +34 -0
jupyter_data_fetch-0.1.0/jupyter_data_fetch/wraps/jqdatasdk.py +62 -0
jupyter_data_fetch-0.1.0/pyproject.toml +25 -0

jupyter_data_fetch-0.1.0/.gitignore ADDED

@@ -0,0 +1,10 @@
+# Python-generated files
+__pycache__/
+*.py[oc]
+build/
+dist/
+wheels/
+*.egg-info
+# Virtual environments
+.venv

jupyter_data_fetch-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 伍侃
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

jupyter_data_fetch-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,90 @@
+Metadata-Version: 2.4
+Name: jupyter-data-fetch
+Version: 0.1.0
+Summary: fetch data from jupyter notebook
+License-File: LICENSE
+Requires-Python: >=3.9
+Requires-Dist: jupyter-kernel-client
+Requires-Dist: pandas<3.0
+Requires-Dist: pillow
+Description-Content-Type: text/markdown
+# jupyter-data-fetch
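+An example of fetching data from `jupyterlab` and `jupyter notebook`. **The code is a single file and too simple to warrant packaging and release.**
+## Advantages
+1. Compared with `ksrpc`, it is more general and should in theory work on any platform
+2. No relay server is needed; if the web page opens, it works
+## Disadvantages
+1. `ksrpc` transfers binary data, whereas this project encodes to `base85/base64`, which is slower
+2. It uses more bandwidth: `base64` adds about 33% and `base85` adds about 25%
+## Installation
+1. Copy `codec.py` into your own project
+2. Run `uv pip install -r requirements.txt`; the key dependency is installed with `uv pip install jupyter_kernel_client`
+## Usage
+1. Examples are provided under `examples`
+2. Taking `joinquant` as an example: open a browser, log in to the research environment, and press `F12` to open the developer tools
+3. Search for `kernels`, then copy the `Request URL` and the `Cookie`
+   ![devtool.png](docs/devtool.png)
+4. Replace `Cookie` and `server_url` in the example
+   ![ide.png](docs/ide.png)
+5. Note: copy only part of the URL as `server_url`, but copy the `Cookie` in full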
+## Minimal example
+```python
+from jupyter_kernel_client import KernelClient
+from jupyter_data_fetch.codec import JupyterTextCodec
+# ... some code omitted; see examples/joinquant.py for more
+with KernelClient(server_url="https://www.joinquant.com/user/12345678901", token=None, headers=headers) as kernel:
+    # make sure the indentation is correct
+    code = """
+df = get_fundamentals(query(
+        valuation, income
+    ).filter(
+        # the `in` operator cannot be used here; use the in_() function
+        valuation.code.in_(['000001.XSHE', '600000.XSHG'])
+    ), date='2015-10-15')
+"""
+    reply = kernel.execute(JupyterTextCodec.generate_code(code, var_name='df'))
+    # print(reply)
+    obj = JupyterTextCodec.extract_decode(reply)
+    print(obj)
+```
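+## Advanced functions
+1. Because `code` is a string, passing a `list/dict` in dynamically is cumbersome, so `auto_execute` is also provided
+2. Wrap a function with the `auto_execute` decorator in advance and it can be called directly
+3. See [examples/jqresearch.py](examples/jqresearch.py)
+## Complete example with automatic login and data fetching
+See [examples/playwright/joinquant.py](examples/playwright/joinquant.py)
+## Core code
+1. `JupyterTextCodec`: currently uses a `base85` codec and transfers data as a string, which is compact. When the string gets truncated, `JupyterImageCodec` must be used instead
+2. `JupyterImageCodec`: an image codec that transfers data as an image; its `base64` encoding is less compact
+3. `generate_code` generates a code string that can run in a `Notebook` cell; the name of the variable to fetch, `var_name`, must be specified
+4. `kernel.execute` executes the code string on the server side and returns a `json` object
+5. `extract_decode` extracts the data from the `json` and decodes it into an object
+## Notes
+1. Due to platform restrictions, code generated by `generate_code` may fail to run; copy it into a `Notebook` to test it
+2. `python3.6` has too many problems; open an `ipynb` file and switch the kernel to the latest version from the menu
+3. You can connect to a kernel that is already running by providing the `kernel_id` parameter. See the `ricequant.py` example
+4. A `Notebook` can import `py` files from its current directory, but this project's kernel uses `/` as the current directory, so such imports fail; specifying `kernel_id` solves this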
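
The bandwidth figures quoted in the README (about 33% extra for `base64`, 25% for `base85`) can be checked with the standard library alone. The sketch below is not part of the package; the 3000-byte payload is an arbitrary choice whose length divides evenly by both 3 and 4, so neither encoding adds padding.

```python
import base64
import os

# Arbitrary binary payload (an assumption for illustration): 3000 bytes is a
# multiple of both 3 and 4, so neither encoding needs padding.
payload = os.urandom(3000)

b64 = base64.b64encode(payload)  # 4 output characters per 3 input bytes
b85 = base64.b85encode(payload)  # 5 output characters per 4 input bytes

b64_overhead = len(b64) / len(payload) - 1
b85_overhead = len(b85) / len(payload) - 1

print(f"base64: {len(b64)} bytes (+{b64_overhead:.0%})")
print(f"base85: {len(b85)} bytes (+{b85_overhead:.0%})")
```

The overhead applies to the gzip-compressed pickle, so the total transfer size still depends mostly on how well the data compresses.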

jupyter_data_fetch-0.1.0/README.md ADDED

@@ -0,0 +1,79 @@
+# jupyter-data-fetch
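+An example of fetching data from `jupyterlab` and `jupyter notebook`. **The code is a single file and too simple to warrant packaging and release.**
+## Advantages
+1. Compared with `ksrpc`, it is more general and should in theory work on any platform
+2. No relay server is needed; if the web page opens, it works
+## Disadvantages
+1. `ksrpc` transfers binary data, whereas this project encodes to `base85/base64`, which is slower
+2. It uses more bandwidth: `base64` adds about 33% and `base85` adds about 25%
+## Installation
+1. Copy `codec.py` into your own project
+2. Run `uv pip install -r requirements.txt`; the key dependency is installed with `uv pip install jupyter_kernel_client`
+## Usage
+1. Examples are provided under `examples`
+2. Taking `joinquant` as an example: open a browser, log in to the research environment, and press `F12` to open the developer tools
+3. Search for `kernels`, then copy the `Request URL` and the `Cookie`
+   ![devtool.png](docs/devtool.png)
+4. Replace `Cookie` and `server_url` in the example
+   ![ide.png](docs/ide.png)
+5. Note: copy only part of the URL as `server_url`, but copy the `Cookie` in full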
+## Minimal example
+```python
+from jupyter_kernel_client import KernelClient
+from jupyter_data_fetch.codec import JupyterTextCodec
+# ... some code omitted; see examples/joinquant.py for more
+with KernelClient(server_url="https://www.joinquant.com/user/12345678901", token=None, headers=headers) as kernel:
+    # make sure the indentation is correct
+    code = """
+df = get_fundamentals(query(
+        valuation, income
+    ).filter(
+        # the `in` operator cannot be used here; use the in_() function
+        valuation.code.in_(['000001.XSHE', '600000.XSHG'])
+    ), date='2015-10-15')
+"""
+    reply = kernel.execute(JupyterTextCodec.generate_code(code, var_name='df'))
+    # print(reply)
+    obj = JupyterTextCodec.extract_decode(reply)
+    print(obj)
+```
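+## Advanced functions
+1. Because `code` is a string, passing a `list/dict` in dynamically is cumbersome, so `auto_execute` is also provided
+2. Wrap a function with the `auto_execute` decorator in advance and it can be called directly
+3. See [examples/jqresearch.py](examples/jqresearch.py)
+## Complete example with automatic login and data fetching
+See [examples/playwright/joinquant.py](examples/playwright/joinquant.py)
+## Core code
+1. `JupyterTextCodec`: currently uses a `base85` codec and transfers data as a string, which is compact. When the string gets truncated, `JupyterImageCodec` must be used instead
+2. `JupyterImageCodec`: an image codec that transfers data as an image; its `base64` encoding is less compact
+3. `generate_code` generates a code string that can run in a `Notebook` cell; the name of the variable to fetch, `var_name`, must be specified
+4. `kernel.execute` executes the code string on the server side and returns a `json` object
+5. `extract_decode` extracts the data from the `json` and decodes it into an object
+## Notes
+1. Due to platform restrictions, code generated by `generate_code` may fail to run; copy it into a `Notebook` to test it
+2. `python3.6` has too many problems; open an `ipynb` file and switch the kernel to the latest version from the menu
+3. You can connect to a kernel that is already running by providing the `kernel_id` parameter. See the `ricequant.py` example
+4. A `Notebook` can import `py` files from its current directory, but this project's kernel uses `/` as the current directory, so such imports fail; specifying `kernel_id` solves this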
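
The encode and decode flow that the README describes for `JupyterTextCodec` can be sketched locally without a kernel. This is a simplified stand-in, not the package's code: it uses the standard `pickle` module where `codec.py` uses `pd.to_pickle` and `pd.read_pickle`, and `encode_text`, `decode_text`, and `fake_reply` are hypothetical names. The reply dictionary only mimics the keys that `extract_from_reply` reads.

```python
import base64
import gzip
import pickle


def encode_text(obj):
    """Kernel side: pickle -> gzip -> base85, then repr() as a cell result would be shown."""
    compressed = gzip.compress(pickle.dumps(obj))
    # A notebook displays the last expression of a cell via repr(), so the
    # text/plain output of a str arrives wrapped in quotes.
    return repr(base64.b85encode(compressed).decode("ascii"))


def decode_text(text):
    """Client side: strip the surrounding quotes, then reverse each step."""
    return pickle.loads(gzip.decompress(base64.b85decode(text[1:-1])))


# Hypothetical reply shaped like the dict that extract_from_reply indexes into.
fake_reply = {
    "status": "ok",
    "outputs": [{"data": {"text/plain": encode_text({"a": [1, 2, 3]})}}],
}

restored = decode_text(fake_reply["outputs"][0]["data"]["text/plain"])
print(restored)
```

The `text[1:-1]` slice is what the `decode` method in `codec.py` does as well. It is safe because the base85 alphabet contains no quote or backslash characters, so `repr()` never escapes anything inside the string.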

jupyter_data_fetch-0.1.0/jupyter_data_fetch/__init__.py ADDED

@@ -0,0 +1 @@
+from jupyter_data_fetch._version import __version__

jupyter_data_fetch-0.1.0/jupyter_data_fetch/_version.py ADDED

@@ -0,0 +1 @@
+__version__ = "0.1.0"

jupyter_data_fetch-0.1.0/jupyter_data_fetch/codec.py ADDED

@@ -0,0 +1,191 @@
+import base64
+from io import BytesIO
+from types import SimpleNamespace
+import pandas as pd
+BASE_CODE = """
+# python3.6 is not recommended
+from io import BytesIO
+import pandas as pd
+try:
+    buf = BytesIO()
+    pd.to_pickle({0}, buf, compression='gzip') # OSError: [Errno 9] write() on read-only GzipFile object
+    buf.seek(0)
+    compressed = buf.read()
+except OSError:
+    import gzip
+    buf = BytesIO()
+    pd.to_pickle({0}, buf)
+    buf.seek(0) # ValueError: I/O operation on closed file.
+    compressed = gzip.compress(buf.getvalue())
+"""
+TO_DICT_CODE = """
+# Convert classes that the client cannot unpickle into a dict
+def object_to_dict(obj, exclude=None):
+    exclude = set(exclude or [])
+    return {
+        attr: getattr(obj, attr)
+        for attr in dir(obj)
+        if not attr.startswith('_')
+        and attr not in exclude
+        and not callable(getattr(obj, attr))
+    }
+"""
+def dict_to_object(d, exclude=None):
+    """Restore a namespace object from a dict"""
+    exclude = set(exclude or [])
+    filtered = {k: v for k, v in d.items() if k not in exclude}
+    return SimpleNamespace(**filtered)
+def extract_from_reply(reply):
+    """Output from both print and ! goes through this path"""
+    if reply['status'] == 'error':
+        error_msg = '\n'.join(reply['outputs'][0]['traceback'])
+        raise RuntimeError(f"Jupyter execution error:\n{error_msg}")
+    else:
+        return reply['outputs'][0]['text']
+class JupyterTextCodec:
+    """
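+    ## Encoding
+    1. Serialize the data with pickle first
+    2. Encode with base85, which is more compact than base64
+    3. Output via print. Some platforms limit print length; the image codec works around this
+    ## Decoding
+    1. Extract the base85 string from the json
+    2. Deserialize with pickle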
+    """
+    @staticmethod
+    def generate_code(*codes, var_name='df'):
+        codes_str = '\n'.join(codes)
+        return f"""
+{codes_str}
+{BASE_CODE.format(var_name)}
+import base64
+base64.b85encode(compressed).decode('ascii')
+"""
+    @staticmethod
+    def extract_from_reply(reply):
+        if reply['status'] == 'error':
+            error_msg = '\n'.join(reply['outputs'][0]['traceback'])
+            raise RuntimeError(f"Jupyter execution error:\n{error_msg}")
+        else:
+            # return reply['outputs'][0]['text']
+            return reply['outputs'][0]['data']['text/plain']
+    @staticmethod
+    def decode(text):
+        text = text[1:-1]
+        return pd.read_pickle(BytesIO(base64.b85decode(text)), compression='gzip')
+    @staticmethod
+    def extract_decode(reply):
+        text = JupyterTextCodec.extract_from_reply(reply)
+        return JupyterTextCodec.decode(text)
+class JupyterImageCodec:
+    """
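+    ## Encoding
+    1. Serialize the data with pickle first
+    2. Convert it to a grayscale image
+    3. Display the image in the Notebook, which implicitly base64-encodes it
+    ## Decoding
+    1. Extract the base64 image from the json
+    2. Base64-decode it and open it as an image
+    3. Extract the image's pixel data
+    4. Deserialize with pickle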
+    """
+    @staticmethod
+    def generate_code(*codes, var_name='df'):
+        codes_str = '\n'.join(codes)
+        return f"""
+{codes_str}
+{BASE_CODE.format(var_name)}
+import numpy as np
+from PIL import Image
+side = int(np.ceil(np.sqrt(len(compressed))))
+padded_data = np.pad(np.frombuffer(compressed, dtype=np.uint8),(0, side * side - len(compressed)),mode='constant')
+img_array = padded_data.reshape(side, side)
+img = Image.fromarray(img_array, 'L')
+img
+"""
+    @staticmethod
+    def extract_from_reply(reply):
+        if reply['status'] == 'error':
+            error_msg = '\n'.join(reply['outputs'][0]['traceback'])
+            raise RuntimeError(f"Jupyter execution error:\n{error_msg}")
+        else:
+            return reply['outputs'][0]['data']['image/png']
+    @staticmethod
+    def decode(b64_string):
+        import numpy as np
+        from PIL import Image
+        img_array = np.array(Image.open(BytesIO(base64.b64decode(b64_string))))
+        return pd.read_pickle(BytesIO(img_array), compression='gzip')
+    @staticmethod
+    def extract_decode(reply):
+        b64_string = JupyterImageCodec.extract_from_reply(reply)
+        return JupyterImageCodec.decode(b64_string)  # if b64_string else None
+# ======================================================================================
+from enum import Enum
+class CodecType(Enum):
+    TEXT = JupyterTextCodec
+    IMAGE = JupyterImageCodec
+class LazyKernel:
+    _kernel = None
+    _codec_type = CodecType.TEXT
+    @classmethod
+    def set_kernel(cls, kernel_obj):
+        cls._kernel = kernel_obj
+    @classmethod
+    def get_kernel(cls):
+        if cls._kernel is None:
+            raise RuntimeError("kernel has not been initialized")
+        return cls._kernel
+    @classmethod
+    def set_codec(cls, codec_type: CodecType):
+        cls._codec_type = codec_type
+    @classmethod
+    def get_codec(cls):
+        if cls._codec_type == CodecType.IMAGE:
+            return JupyterImageCodec
+        return JupyterTextCodec
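
`JupyterImageCodec` packs the compressed bytes into a square grayscale image, zero-padding the tail to fill the last row. The sketch below reproduces only that padding arithmetic, using plain `bytes` where `codec.py` uses a NumPy array and a Pillow image; `pad_to_square` is a hypothetical helper, not part of the package. It also shows why the decoder does not need to know the original length: Python's `gzip` reader skips trailing zero bytes after the compressed stream.

```python
import gzip
import math
import pickle


def pad_to_square(compressed):
    """Return (side, padded) where len(padded) == side * side, padded with zero bytes."""
    side = math.isqrt(len(compressed))
    if side * side < len(compressed):
        side += 1  # round up, i.e. ceil(sqrt(n))
    padded = compressed + b"\x00" * (side * side - len(compressed))
    return side, padded


# Arbitrary sample object (an assumption for illustration).
compressed = gzip.compress(pickle.dumps(list(range(100))))
side, padded = pad_to_square(compressed)

# The zero padding after the gzip stream is ignored on decompression.
restored = pickle.loads(gzip.decompress(padded))
print(side, len(compressed), len(padded))
```

In the real codec each byte becomes one pixel of an `L` mode image, and the PNG that the notebook emits is lossless, which is what makes the round trip exact.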

jupyter_data_fetch-0.1.0/jupyter_data_fetch/wraps/__init__.py ADDED

@@ -0,0 +1,34 @@
+# This directory only demonstrates how to wrap an API
+import inspect
+from functools import wraps
+from jupyter_data_fetch.codec import LazyKernel
+def auto_execute(func):
+    """
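+    1. Use `help()` in the Notebook to get the function signature, then apply the decorator
+    2. The call must be on a line by itself
+    3. Intended for temporary use only; prefer the full wrappers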
+    """
+    @wraps(func)
+    def wrapper(*args, **kwargs):
+        kernel = LazyKernel.get_kernel()
+        codec = LazyKernel.get_codec()
+        frame = inspect.currentframe().f_back
+        call_line = inspect.getframeinfo(frame).code_context[0].strip()
+        code = f"""
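+# When called from outside, the call must be on a single line by itself
+# If a call comes out incomplete after being spliced in, fall back to writing the code string by hand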
+_ = {call_line}
+"""
+        # print(code)
+        reply = kernel.execute(codec.generate_code(code, var_name='_'))
+        return codec.extract_decode(reply)
+    return wrapper
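
`auto_execute` reads one source line of the caller and pastes it after `_ = ` in the code sent to the kernel, which is why the docstring requires the call to sit on a line by itself. The sketch below imitates only that string splice, with no kernel or frame inspection involved; `build_remote_snippet` and `is_valid_python` are hypothetical helpers, and the `get_price` call lines are example inputs.

```python
import ast


def build_remote_snippet(call_line):
    """Mimic the f-string in auto_execute: paste the captured line after `_ = `."""
    return f"_ = {call_line.strip()}\n"


def is_valid_python(source):
    """Return True if the source parses; nothing is executed."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# A call written on one line becomes a valid chained assignment.
single_line = "df = get_price('000001.XSHE', count=5)"

# Only one line is captured, so a call split across lines loses its arguments.
first_line_of_multiline = "df = get_price("

print(is_valid_python(build_remote_snippet(single_line)))
print(is_valid_python(build_remote_snippet(first_line_of_multiline)))
```

The same reasoning suggests that local variables used as arguments would not resolve on the kernel side, since only the text of the line is sent, not the values.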

jupyter_data_fetch-0.1.0/jupyter_data_fetch/wraps/jqdatasdk.py ADDED

@@ -0,0 +1,62 @@
+# In a Notebook, help() or ?? shows a function's signature
+from jupyter_data_fetch.codec import LazyKernel, TO_DICT_CODE, dict_to_object
+def get_all_securities(types=[], date=None):
+    kernel = LazyKernel.get_kernel()
+    codec = LazyKernel.get_codec()
+    code = f"""_ = get_all_securities({repr(types)}, {repr(date)})"""
+    # print(code)
+    return codec.extract_decode(kernel.execute(codec.generate_code(code, var_name='_')))
+def get_price(security, start_date=None, end_date=None, frequency='daily', fields=None, skip_paused=False, fq='pre', count=None, panel=True, fill_paused=True, round=True):
+    kernel = LazyKernel.get_kernel()
+    codec = LazyKernel.get_codec()
+    code = f"""_ = get_price({repr(security)}, {repr(start_date)}, {repr(end_date)}, {repr(frequency)}, {repr(fields)}, {repr(skip_paused)}, {repr(fq)}, {repr(count)}, {repr(panel)}, {repr(fill_paused)}, {repr(round)})"""
+    # print(code)
+    return codec.extract_decode(kernel.execute(codec.generate_code(code, var_name='_')))
+def get_security_info(code, date=None):
+    kernel = LazyKernel.get_kernel()
+    codec = LazyKernel.get_codec()
+    code = f"""
+{TO_DICT_CODE}
+_ = get_security_info({repr(code)}, {repr(date)}) # ModuleNotFoundError: No module named 'jqdata'
+_ = object_to_dict(_)
+"""
+    _ = codec.extract_decode(kernel.execute(codec.generate_code(code, var_name='_')))
+    return dict_to_object(_)  # restore the dict to an object
+def get_fundamentals(query_object: str, date=None, statDate=None):
+    """Note: the original function takes a query_object, but here a str stands in for the object, so repr must not be applied"""
+    kernel = LazyKernel.get_kernel()
+    codec = LazyKernel.get_codec()
+    code = f"""_ = get_fundamentals({query_object}, {repr(date)}, {repr(statDate)})"""
+    # print(code)
+    return codec.extract_decode(kernel.execute(codec.generate_code(code, var_name='_')))
+def get_index_weights(index_id, date=None):
+    kernel = LazyKernel.get_kernel()
+    codec = LazyKernel.get_codec()
+    code = f"""_ = get_index_weights({repr(index_id)}, {repr(date)})"""
+    return codec.extract_decode(kernel.execute(codec.generate_code(code, var_name='_')))
+def get_extras(info, security_list, start_date=None, end_date='2015-12-31', df=True, count=None):
+    kernel = LazyKernel.get_kernel()
+    codec = LazyKernel.get_codec()
+    code = f"""_ = get_extras({repr(info)}, {repr(security_list)}, {repr(start_date)}, {repr(end_date)}, {repr(df)}, {repr(count)})"""
+    return codec.extract_decode(kernel.execute(codec.generate_code(code, var_name='_')))
+def get_industry(security, date=None):
+    kernel = LazyKernel.get_kernel()
+    codec = LazyKernel.get_codec()
+    code = f"""_ = get_industry({repr(security)}, {repr(date)})"""
+    return codec.extract_decode(kernel.execute(codec.generate_code(code, var_name='_')))
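
Every wrapper in `jqdatasdk.py` builds its remote call by formatting each argument with `repr()`. The sketch below isolates that step and checks that the generated text parses back to the original values. `build_call` is a hypothetical helper that is not in the package, and the arguments are example values.

```python
import ast


def build_call(func_name, *args):
    """Render a call statement the way the wrappers do: each argument via repr()."""
    rendered = ", ".join(repr(arg) for arg in args)
    return f"_ = {func_name}({rendered})"


code = build_call("get_industry", ["000001.XSHE", "600000.XSHG"], "2015-10-15")
print(code)
# _ = get_industry(['000001.XSHE', '600000.XSHG'], '2015-10-15')

# Parse the generated statement and recover the literal arguments.
call_node = ast.parse(code).body[0].value
recovered = [ast.literal_eval(arg) for arg in call_node.args]
print(recovered)
```

This works for values whose `repr()` is a Python literal (strings, numbers, lists, dicts, `None`, booleans). A value such as `datetime.date(2015, 10, 15)` renders as a constructor call, so the kernel would need `datetime` imported for the generated code to run.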

jupyter_data_fetch-0.1.0/pyproject.toml ADDED

@@ -0,0 +1,25 @@
+[project]
+name = "jupyter-data-fetch"
+description = "fetch data from jupyter notebook"
+readme = "README.md"
+requires-python = ">=3.9"
+dynamic = ["version"]
+dependencies = [
+    'jupyter_kernel_client',
+    'pandas<3.0',
+    "pillow", # only needed by JupyterImageCodec
+]
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+[tool.hatch.version]
+path = "jupyter_data_fetch/_version.py"
+[tool.hatch.build.targets.wheel]
+packages = ["jupyter_data_fetch"]
+include-package-data = true
+[tool.hatch.build.targets.sdist]
+include = ["jupyter_data_fetch*"]