PyPI - xparse-client - Versions diffs - 0.2.9__tar.gz → 0.2.11__tar.gz - Mend

xparse-client: 0.2.9 (tar.gz) → 0.2.11 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19)

{xparse_client-0.2.9 → xparse_client-0.2.11}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: xparse-client
-Version: 0.2.9
+Version: 0.2.11
 Summary: 面向Agent和RAG的新一代文档处理 AI Infra
 License-Expression: MIT
 Project-URL: Homepage, https://gitlab.intsig.net/xparse1/xparse-pipeline
@@ -10,9 +10,11 @@ Requires-Python: >=3.8
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: boto3
-Requires-Dist: pymilvus[milvus_lite]
+Requires-Dist: pymilvus
+Requires-Dist: milvus-lite
 Requires-Dist: requests
 Requires-Dist: pysmb
+Requires-Dist: qdrant-client
 Dynamic: license-file
 # xParse
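@@ -24,7 +26,7 @@ xParse's synchronous pipeline implementation, supporting multiple data sources and outputs.
 ## 🌟 Features
 - **Flexible data sources**: S3-compatible object storage, the local file system, and FTP/SMB file systems
-- **Flexible outputs**: Milvus/Zilliz vector databases, S3-compatible object storage, and the local file system
+- **Flexible outputs**: Milvus/Zilliz/Qdrant vector databases, S3-compatible object storage, and the local file system
 - **Unified Pipeline API**: `/api/xparse/pipeline` runs the full parse → chunk → embed flow in a single call
 - **Configurable processing**: parse, chunk, and embed parameters can be configured flexibly
 - **Detailed statistics**: returns processing statistics for each stage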
@@ -51,7 +53,7 @@ xParse's synchronous pipeline implementation, supporting multiple data sources and outputs.
                │ [embeddings + stats]
                ▼
        ┌──────────────┐
-       │ Destination  │  Output target (Milvus/Zilliz/local)
+       │ Destination  │  Output target (Milvus/Zilliz/Qdrant/local)
        └──────────────┘
 ```
@@ -69,7 +71,7 @@ pip install --upgrade xparse-client
 #### Code configuration
 ```python
-from xparse_client import ParseConfig, ChunkConfig, EmbedConfig, Stage, Pipeline, S3Source, MilvusDestination
+from xparse_client import ParseConfig, ChunkConfig, EmbedConfig, Stage, Pipeline, S3Source, MilvusDestination, QdrantDestination
 # Create the configuration using the new stages format
 stages = [
@@ -354,6 +356,28 @@ destination = MilvusDestination(
 )
 ```
+#### Qdrant vector storage
+```python
+destination = QdrantDestination(
+    url='http://localhost:6333',  # Qdrant service URL (local or cloud)
+    collection_name='my_collection',  # Collection name
+    dimension=1024,  # Vector dimension; must match what the embed API returns
+    api_key='your-api-key',  # Optional: Qdrant Cloud API key
+    prefer_grpc=False  # Optional: whether to prefer gRPC (default False)
+)
+```
+**Qdrant Cloud example:**
+```python
+destination = QdrantDestination(
+    url='https://xxxxxxx.us-east-1-0.aws.cloud.qdrant.io',
+    collection_name='my_collection',
+    dimension=1024,
+    api_key='your-api-key'
+)
+```
 #### Local file system destination
 Writes `json` files to the configured local path.

xparse_client-0.2.9/xparse_client.egg-info/PKG-INFO → xparse_client-0.2.11/README.md RENAMED

@@ -1,20 +1,3 @@
-Metadata-Version: 2.4
-Name: xparse-client
-Version: 0.2.9
-Summary: 面向Agent和RAG的新一代文档处理 AI Infra
-License-Expression: MIT
-Project-URL: Homepage, https://gitlab.intsig.net/xparse1/xparse-pipeline
-Project-URL: Repository, https://gitlab.intsig.net/xparse1/xparse-pipeline
-Keywords: xparse,pipeline,rag
-Requires-Python: >=3.8
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: boto3
-Requires-Dist: pymilvus[milvus_lite]
-Requires-Dist: requests
-Requires-Dist: pysmb
-Dynamic: license-file
 # xParse
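 A next-generation document-processing AI infrastructure for Agents and RAG.
@@ -24,7 +7,7 @@ xParse's synchronous pipeline implementation, supporting multiple data sources and outputs.
 ## 🌟 Features
 - **Flexible data sources**: S3-compatible object storage, the local file system, and FTP/SMB file systems
-- **Flexible outputs**: Milvus/Zilliz vector databases, S3-compatible object storage, and the local file system
+- **Flexible outputs**: Milvus/Zilliz/Qdrant vector databases, S3-compatible object storage, and the local file system
 - **Unified Pipeline API**: `/api/xparse/pipeline` runs the full parse → chunk → embed flow in a single call
 - **Configurable processing**: parse, chunk, and embed parameters can be configured flexibly
 - **Detailed statistics**: returns processing statistics for each stage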
@@ -51,7 +34,7 @@ xParse's synchronous pipeline implementation, supporting multiple data sources and outputs.
                │ [embeddings + stats]
                ▼
        ┌──────────────┐
-       │ Destination  │  Output target (Milvus/Zilliz/local)
+       │ Destination  │  Output target (Milvus/Zilliz/Qdrant/local)
        └──────────────┘
 ```
@@ -69,7 +52,7 @@ pip install --upgrade xparse-client
 #### Code configuration
 ```python
-from xparse_client import ParseConfig, ChunkConfig, EmbedConfig, Stage, Pipeline, S3Source, MilvusDestination
+from xparse_client import ParseConfig, ChunkConfig, EmbedConfig, Stage, Pipeline, S3Source, MilvusDestination, QdrantDestination
 # Create the configuration using the new stages format
 stages = [
@@ -354,6 +337,28 @@ destination = MilvusDestination(
 )
 ```
+#### Qdrant vector storage
+```python
+destination = QdrantDestination(
+    url='http://localhost:6333',  # Qdrant service URL (local or cloud)
+    collection_name='my_collection',  # Collection name
+    dimension=1024,  # Vector dimension; must match what the embed API returns
+    api_key='your-api-key',  # Optional: Qdrant Cloud API key
+    prefer_grpc=False  # Optional: whether to prefer gRPC (default False)
+)
+```
+**Qdrant Cloud example:**
+```python
+destination = QdrantDestination(
+    url='https://xxxxxxx.us-east-1-0.aws.cloud.qdrant.io',
+    collection_name='my_collection',
+    dimension=1024,
+    api_key='your-api-key'
+)
+```
 #### Local file system destination
 Writes `json` files to the configured local path.

{xparse_client-0.2.9 → xparse_client-0.2.11}/example/run_pipeline.py RENAMED

@@ -84,7 +84,7 @@ def run_with_config():
 def run_with_manual_setup():
     """Manually create the Source, Destination, and Pipeline"""
-    from xparse_client import ChunkConfig, EmbedConfig, ParseConfig, Stage, PipelineConfig, LocalDestination
+    from xparse_client import ChunkConfig, EmbedConfig, ParseConfig, Stage, PipelineConfig, LocalDestination, QdrantDestination
     # Create the S3 data source
     # source = S3Source(
@@ -101,8 +101,15 @@ def run_with_manual_setup():
     #     secret_key='JFIIaTGiXelv7DgBYNIBSStofF0S98',
     #     bucket='textin',
     #     prefix='',
-    #     region='cn-shanghai',
-    #     pattern='*.png'
+    #     region='cn-shanghai'
+    # )
+    # source=S3Source(
+    #     endpoint='https://S3.oss-cn-shanghai.aliyuncs.com',
+    #     access_key='LTAI5t6ZnqTra8oLmJEfvcr7',
+    #     secret_key='SEbz4oJ4KNJIOTMfphuVGOWmRpGGUG',
+    #     bucket='textin-test-aliyun',
+    #     prefix='',
+    #     region='cn-shanghai'
     # )
     # source = S3Source(
     #     endpoint='https://cos.ap-shanghai.myqcloud.com',
@@ -132,7 +139,7 @@ def run_with_manual_setup():
     #     endpoint='https://s3.us-east-1.amazonaws.com',
     #     access_key='AKIA6QUE3TVZADUWA4PO',
     #     secret_key='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',
-    #     bucket='textin-xparse',
+    #     bucket='textin-test',
     #     prefix='',
     #     region='us-east-1'
     # )
@@ -160,9 +167,18 @@ def run_with_manual_setup():
     # )
     source = LocalSource(
         directory='/Users/ke_wang/Documents/doc',
-        recursive=False,
-        pattern=['*']  # Wildcards are supported: *.pdf, *.docx, **/*.txt
+        pattern=['*.pdf'],
+        recursive=True,
     )
+    # source=S3Source(
+    #     endpoint='https://obs.cn-north-4.myhuaweicloud.com',
+    #     access_key='HPUAFT3D1Q6O6UUN1RWQ',
+    #     secret_key='4zIk8x37nZiDS9P585BTFCWsOSo5G7ok1yRWtEA1',
+    #     bucket='textin-test-ywj',
+    #     prefix='',
+    #     region='cn-north-4'
+    # )  # Huawei Cloud
     # Create the Milvus destination
     # destination = MilvusDestination(
@@ -171,17 +187,17 @@ def run_with_manual_setup():
     #     dimension=1024
     # )
-    # destination = LocalDestination(
-    #     output_dir='./result'
-    # )
-    destination = MilvusDestination(
-        db_path='https://in03-5388093d0db1707.serverless.ali-cn-hangzhou.cloud.zilliz.com.cn', # Zilliz connection address
-        collection_name='textin_test_3_copy', # Database collection name
-        dimension=1024,  # Vector dimension; must match what the embed API returns
-        api_key='872c3f5b3f3995c80dcda5c3d34f1f608815aef7671b6ee391ab37e40e79c892ce56d9c8c6565a03a3fd66da7e11b67f384c5c46'  # Zilliz Cloud API Key
+    destination = LocalDestination(
+        output_dir='./result'
     )
+    # destination = MilvusDestination(
+    #     db_path='https://in03-5388093d0db1707.serverless.ali-cn-hangzhou.cloud.zilliz.com.cn', # Zilliz connection address
+    #     collection_name='textin_test_3_copy', # Database collection name
+    #     dimension=1024,  # Vector dimension; must match what the embed API returns
+    #     api_key='872c3f5b3f3995c80dcda5c3d34f1f608815aef7671b6ee391ab37e40e79c892ce56d9c8c6565a03a3fd66da7e11b67f384c5c46'  # Zilliz Cloud API Key
+    # )
     # destination = S3Destination(
     #     endpoint='https://cos.ap-shanghai.myqcloud.com',
     #     access_key='',
@@ -190,12 +206,19 @@ def run_with_manual_setup():
     #     prefix='result',
     #     region='ap-shanghai'
     # )
+    # destination = QdrantDestination(
+    #     url='https://1325db22-7dd8-4fc9-930b-f969d4963b3d.us-east-1-1.aws.cloud.qdrant.io:6333',
+    #     collection_name='textin1',
+    #     dimension=1024,
+    #     api_key='eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3MiOiJtIn0.TGnFB1pAD7c7IqSOvTpgCPpHXSnnoKhWEQ5pQ8DrBnI',
+    # )
     # Create the configuration using the new stages format
     stages = [
         Stage(
             type='parse',
-            config=ParseConfig(provider='textin')
+            config=ParseConfig(provider='textin', page_ranges='3')
         ),
         Stage(
             type='chunk',

{xparse_client-0.2.9 → xparse_client-0.2.11}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "xparse-client"
-version = "0.2.9"
+version = "0.2.11"
 description = "面向Agent和RAG的新一代文档处理 AI Infra"
 readme = "README.md"
 license = "MIT"
@@ -14,9 +14,11 @@ keywords = ["xparse", "pipeline", "rag"]
 requires-python = ">=3.8"
 dependencies = [
     "boto3",
-    "pymilvus[milvus_lite]",
+    "pymilvus",
+    "milvus-lite",
     "requests",
-    "pysmb"
+    "pysmb",
+    "qdrant-client"
 ]
 [project.urls]

{xparse_client-0.2.9 → xparse_client-0.2.11}/xparse_client/__init__.py RENAMED

@@ -10,7 +10,7 @@ logging.basicConfig(
 from .pipeline.config import ParseConfig, ChunkConfig, EmbedConfig, Stage, PipelineStats, PipelineConfig
 from .pipeline.sources import Source, S3Source, LocalSource, FtpSource, SmbSource
-from .pipeline.destinations import Destination, MilvusDestination, LocalDestination, S3Destination
+from .pipeline.destinations import Destination, MilvusDestination, QdrantDestination, LocalDestination, S3Destination
 from .pipeline.pipeline import Pipeline, create_pipeline_from_config
 __all__ = [
@@ -27,6 +27,7 @@ __all__ = [
     'SmbSource',
     'Destination',
     'MilvusDestination',
+    'QdrantDestination',
     'LocalDestination',
     'S3Destination',
     'Pipeline',

{xparse_client-0.2.9 → xparse_client-0.2.11}/xparse_client/pipeline/destinations.py RENAMED

@@ -13,7 +13,8 @@ from typing import List, Dict, Any
 from botocore.config import Config
 from pymilvus import MilvusClient
+from qdrant_client import QdrantClient
+from qdrant_client.models import Distance, VectorParams, PointStruct, PayloadSchemaType
 logger = logging.getLogger(__name__)
@@ -127,7 +128,7 @@ class MilvusDestination(Destination):
                         print(f"  ✓ 删除现有记录: record_id={record_id}, 删除 {deleted_count} 条")
                         logger.info(f"删除 Milvus 现有记录: record_id={record_id}, 删除 {deleted_count} 条")
                     else:
-                        print(f"  → 未找到现有记录: record_id={record_id}")
+                        print(f"  → 准备写入记录: record_id={record_id}")
                 except Exception as e:
                     print(f"  ! 删除现有记录失败: {str(e)}")
                     logger.warning(f"删除 Milvus 现有记录失败: record_id={record_id}, {str(e)}")
@@ -296,9 +297,190 @@ class S3Destination(Destination):
             return False
+class QdrantDestination(Destination):
+    """Qdrant vector database destination"""
+    def __init__(self, url: str, collection_name: str, dimension: int, api_key: str = None, prefer_grpc: bool = False):
+        """Initialize the Qdrant destination
+        Args:
+            url: Qdrant service URL (e.g. 'http://localhost:6333' or 'https://xxx.qdrant.io')
+            collection_name: Collection name
+            dimension: Vector dimension
+            api_key: API key (optional, for Qdrant Cloud)
+            prefer_grpc: Whether to prefer gRPC (default False, which uses HTTP)
+        """
+        self.url = url
+        self.collection_name = collection_name
+        self.dimension = dimension
+        client_kwargs = {'url': url}
+        if api_key:
+            client_kwargs['api_key'] = api_key
+        if prefer_grpc:
+            client_kwargs['prefer_grpc'] = True
+        self.client = QdrantClient(**client_kwargs)
+        # Check for the collection, creating it if missing
+        try:
+            collections = self.client.get_collections()
+            collection_exists = any(col.name == collection_name for col in collections.collections)
+            if not collection_exists:
+                self.client.create_collection(
+                    collection_name=collection_name,
+                    vectors_config=VectorParams(
+                        size=dimension,
+                        distance=Distance.COSINE
+                    )
+                )
+                # Create an index on record_id for filtered queries
+                try:
+                    self.client.create_payload_index(
+                        collection_name=collection_name,
+                        field_name="record_id",
+                        field_schema=PayloadSchemaType.KEYWORD
+                    )
+                    print(f"✓ Qdrant Collection 创建: {collection_name} (维度: {dimension})")
+                except Exception as e:
+                    logger.warning(f"创建 record_id 索引失败（可能已存在）: {str(e)}")
+                    print(f"✓ Qdrant Collection 创建: {collection_name} (维度: {dimension})")
+            else:
+                print(f"✓ Qdrant Collection 存在: {collection_name}")
+                # Ensure the record_id index exists (create it if missing)
+                try:
+                    self.client.create_payload_index(
+                        collection_name=collection_name,
+                        field_name="record_id",
+                        field_schema=PayloadSchemaType.KEYWORD
+                    )
+                except Exception as e:
+                    # The index may already exist; ignore the error
+                    logger.debug(f"record_id 索引可能已存在: {str(e)}")
+            logger.info(f"Qdrant 连接成功: {url}/{collection_name}")
+        except Exception as e:
+            print(f"✗ Qdrant 连接失败: {str(e)}")
+            logger.error(f"Qdrant 连接失败: {str(e)}")
+            raise
+    def write(self, data: List[Dict[str, Any]], metadata: Dict[str, Any]) -> bool:
+        try:
+            # If metadata contains a record_id, first delete existing records with the same record_id
+            record_id = metadata.get('record_id')
+            if record_id:
+                try:
+                    # Query and delete all records with the same record_id
+                    # Use a dict-style filter (better compatibility)
+                    scroll_result = self.client.scroll(
+                        collection_name=self.collection_name,
+                        scroll_filter={
+                            "must": [
+                                {
+                                    "key": "record_id",
+                                    "match": {"value": record_id}
+                                }
+                            ]
+                        },
+                        limit=10000  # Assumes at most 10000 records are deleted per call
+                    )
+                    if scroll_result[0]:  # Records found
+                        point_ids = [point.id for point in scroll_result[0]]
+                        self.client.delete(
+                            collection_name=self.collection_name,
+                            points_selector=point_ids
+                        )
+                        print(f"  ✓ 删除现有记录: record_id={record_id}, 删除 {len(point_ids)} 条")
+                        logger.info(f"删除 Qdrant 现有记录: record_id={record_id}, 删除 {len(point_ids)} 条")
+                    else:
+                        print(f"  → 准备写入记录: record_id={record_id}")
+                except Exception as e:
+                    print(f"  ! 删除现有记录失败: {str(e)}")
+                    logger.warning(f"删除 Qdrant 现有记录失败: record_id={record_id}, {str(e)}")
+                    # Continue with the write; a failed delete does not abort it
+            else:
+                print(f"  → 没有 record_id")
+                logger.warning(f"没有 record_id")
+                return False
+            points = []
+            for item in data:
+                # Get the element-level metadata
+                element_metadata = item.get('metadata', {})
+                if 'embeddings' in item and item['embeddings']:
+                    element_id = item.get('element_id') or item.get('id') or str(uuid.uuid4())
+                    # Build the payload (metadata)
+                    payload = {
+                        'text': item.get('text', ''),
+                        'record_id': record_id,
+                    }
+                    # Merge the file-level and element-level metadata
+                    # File-level metadata takes precedence
+                    merged_metadata = {**element_metadata, **metadata}
+                    # Add the metadata fields to the payload
+                    # Skip the fixed fields that already exist, to avoid conflicts
+                    fixed_fields = {'embeddings', 'text', 'element_id', 'record_id', 'created_at', 'metadata'}
+                    for key, value in merged_metadata.items():
+                        if key not in fixed_fields:
+                            # Special-case data_source: flatten it recursively if it is a dict
+                            if key == 'data_source' and isinstance(value, dict):
+                                # Recursively flatten the data_source dict, including nested dicts
+                                flattened = _flatten_dict(value, 'data_source', fixed_fields)
+                                payload.update(flattened)
+                            elif key == 'coordinates' and isinstance(value, list):
+                                payload[key] = value
+                            elif isinstance(value, (dict, list)):
+                                # Qdrant supports JSON payloads
+                                payload[key] = value
+                            else:
+                                payload[key] = value
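+                    # Create the Point (an id is required)
+                    # A Qdrant point id can be an integer or a UUID string
+                    # If element_id is a UUID, use it directly; otherwise convert it to a UUID5 (a stable UUID derived from element_id)
+                    try:
+                        # Try to parse element_id as a UUID
+                        point_id = str(uuid.UUID(element_id))
+                    except (ValueError, TypeError):
+                        # Not a valid UUID: derive a stable UUID5 from element_id
+                        point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, str(element_id)))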
+                        point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, str(element_id)))
+                    point = PointStruct(
+                        id=point_id,
+                        vector=item['embeddings'],
+                        payload=payload
+                    )
+                    points.append(point)
+            if not points:
+                print(f"  ! 警告: 没有有效的向量数据")
+                return False
+            # Batch insert
+            self.client.upsert(
+                collection_name=self.collection_name,
+                points=points
+            )
+            print(f"  ✓ 写入 Qdrant: {len(points)} 条")
+            logger.info(f"写入 Qdrant 成功: {len(points)} 条")
+            return True
+        except Exception as e:
+            print(f"  ✗ 写入 Qdrant 失败: {str(e)}")
+            logger.error(f"写入 Qdrant 失败: {str(e)}")
+            return False
 __all__ = [
     'Destination',
     'MilvusDestination',
+    'QdrantDestination',
     'LocalDestination',
     'S3Destination',
 ]
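
The point-ID fallback added in `QdrantDestination.write` can be exercised on its own. The sketch below is a standalone approximation, not code from the package: `to_point_id` is a hypothetical helper name, and catching `AttributeError` is an added assumption, since CPython's `uuid.UUID()` calls `.replace()` on its argument and a non-string `element_id` (an integer, say) would not be caught by the `(ValueError, TypeError)` handler shown in the diff.

```python
import uuid


def to_point_id(element_id) -> str:
    """Map an arbitrary element id to a UUID string usable as a point id.

    Hypothetical helper mirroring the try/except in the hunk above.
    """
    try:
        # A valid UUID string is kept (normalized to lowercase, hyphenated form).
        return str(uuid.UUID(element_id))
    except (ValueError, TypeError, AttributeError):
        # AttributeError is an assumption added here for non-string ids.
        # uuid5 is deterministic: the same element id always yields the same UUID.
        return str(uuid.uuid5(uuid.NAMESPACE_URL, str(element_id)))


existing = to_point_id("12345678-1234-5678-1234-567812345678")
derived = to_point_id("chunk-0001")
```

Because the derived ID is deterministic, writing the same element twice should target the same point rather than create a second one, which fits the `upsert` call used for the batch write.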

{xparse_client-0.2.9 → xparse_client-0.2.11}/xparse_client/pipeline/pipeline.py RENAMED

@@ -12,7 +12,7 @@ import requests
 from .config import ParseConfig, ChunkConfig, EmbedConfig, Stage, PipelineStats, PipelineConfig
 from .sources import Source, S3Source, LocalSource, FtpSource, SmbSource
-from .destinations import Destination, MilvusDestination, LocalDestination, S3Destination
+from .destinations import Destination, MilvusDestination, QdrantDestination, LocalDestination, S3Destination
 logger = logging.getLogger(__name__)
@@ -145,6 +145,14 @@ class Pipeline:
                 'dimension': self.destination.dimension
             })
             # api_key and token are not stored on the object, so they cannot be restored
+        elif isinstance(self.destination, QdrantDestination):
+            config['destination'].update({
+                'url': self.destination.url,
+                'collection_name': self.destination.collection_name,
+                'dimension': self.destination.dimension,
+                'prefer_grpc': getattr(self.destination, 'prefer_grpc', False)
+            })
+            # api_key is not stored on the object, so it cannot be restored
         elif isinstance(self.destination, LocalDestination):
             config['destination'].update({
                 'output_dir': str(self.destination.output_dir)
@@ -503,6 +511,14 @@ def create_pipeline_from_config(config: Dict[str, Any]) -> Pipeline:
             api_key=dest_config.get('api_key'),
             token=dest_config.get('token')
         )
+    elif dest_config['type'] == 'qdrant':
+        destination = QdrantDestination(
+            url=dest_config['url'],
+            collection_name=dest_config['collection_name'],
+            dimension=dest_config['dimension'],
+            api_key=dest_config.get('api_key'),
+            prefer_grpc=dest_config.get('prefer_grpc', False)
+        )
     elif dest_config['type'] == 'local':
         destination = LocalDestination(
             output_dir=dest_config['output_dir']
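
The new `'qdrant'` branch of `create_pipeline_from_config` reads five keys from the destination config. The sketch below restates that mapping as a standalone function so it runs without the package installed. `qdrant_destination_kwargs` and `REQUIRED_QDRANT_KEYS` are hypothetical names, not part of xparse-client, and the sample values are placeholders borrowed from the README hunk.

```python
from typing import Any, Dict

# Keys the 'qdrant' branch indexes directly (a missing one raises KeyError there too).
REQUIRED_QDRANT_KEYS = ("url", "collection_name", "dimension")


def qdrant_destination_kwargs(dest_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the keyword arguments the 'qdrant' branch passes to QdrantDestination."""
    if dest_config.get("type") != "qdrant":
        raise ValueError("not a qdrant destination config")
    missing = [key for key in REQUIRED_QDRANT_KEYS if key not in dest_config]
    if missing:
        raise KeyError(f"missing required keys: {missing}")
    return {
        "url": dest_config["url"],
        "collection_name": dest_config["collection_name"],
        "dimension": dest_config["dimension"],
        # Optional keys fall back to the same defaults as in the diff.
        "api_key": dest_config.get("api_key"),
        "prefer_grpc": dest_config.get("prefer_grpc", False),
    }


example = {
    "type": "qdrant",
    "url": "http://localhost:6333",
    "collection_name": "my_collection",
    "dimension": 1024,
}
kwargs = qdrant_destination_kwargs(example)
```

The export side in `Pipeline` does not write `api_key` back out, per the comment in the hunk, so a config produced that way needs the key supplied again before it can reach Qdrant Cloud. As far as this diff shows, `QdrantDestination.__init__` also does not assign `self.prefer_grpc`, so the exported value would come from the `getattr(..., False)` fallback.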

{xparse_client-0.2.9 → xparse_client-0.2.11}/xparse_client/pipeline/sources.py RENAMED

@@ -121,6 +121,8 @@ class S3Source(Source):
         if self.endpoint == 'https://textin-minio-api.ai.intsig.net':
             config = Config(signature_version='s3v4')
+        elif self.endpoint.endswith('aliyuncs.com'):
+            config = Config(signature_version='s3', s3={'addressing_style': 'virtual'})
         else:
             config = Config(signature_version='s3v4', s3={'addressing_style': 'virtual'})
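
The endpoint-based choice of signing settings in `S3Source` can be checked without a network connection. The sketch below mirrors the three branches but returns plain dictionaries instead of `botocore.config.Config` objects, so it runs without botocore installed; `s3_config_kwargs` is a hypothetical name, and the dictionaries are meant to be what one would pass as `Config(**kwargs)`.

```python
def s3_config_kwargs(endpoint: str) -> dict:
    """Return the Config keyword arguments chosen for a given S3-compatible endpoint."""
    if endpoint == 'https://textin-minio-api.ai.intsig.net':
        # MinIO endpoint: SigV4, default addressing style.
        return {'signature_version': 's3v4'}
    elif endpoint.endswith('aliyuncs.com'):
        # Branch added in 0.2.11: Aliyun OSS endpoints use the older 's3' signature.
        return {'signature_version': 's3', 's3': {'addressing_style': 'virtual'}}
    else:
        # Everything else: SigV4 with virtual-hosted-style addressing.
        return {'signature_version': 's3v4', 's3': {'addressing_style': 'virtual'}}


aliyun = s3_config_kwargs('https://S3.oss-cn-shanghai.aliyuncs.com')
aws = s3_config_kwargs('https://s3.us-east-1.amazonaws.com')
```

One consequence of the `endswith` test is that an Aliyun endpoint written with a trailing slash or a path would fall through to the default SigV4 branch.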

xparse_client-0.2.9/README.md → xparse_client-0.2.11/xparse_client.egg-info/PKG-INFO RENAMED

@@ -1,3 +1,22 @@
+Metadata-Version: 2.4
+Name: xparse-client
+Version: 0.2.11
+Summary: 面向Agent和RAG的新一代文档处理 AI Infra
+License-Expression: MIT
+Project-URL: Homepage, https://gitlab.intsig.net/xparse1/xparse-pipeline
+Project-URL: Repository, https://gitlab.intsig.net/xparse1/xparse-pipeline
+Keywords: xparse,pipeline,rag
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: boto3
+Requires-Dist: pymilvus
+Requires-Dist: milvus-lite
+Requires-Dist: requests
+Requires-Dist: pysmb
+Requires-Dist: qdrant-client
+Dynamic: license-file
 # xParse
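 A next-generation document-processing AI infrastructure for Agents and RAG.
@@ -7,7 +26,7 @@ xParse's synchronous pipeline implementation, supporting multiple data sources and outputs.
 ## 🌟 Features
 - **Flexible data sources**: S3-compatible object storage, the local file system, and FTP/SMB file systems
-- **Flexible outputs**: Milvus/Zilliz vector databases, S3-compatible object storage, and the local file system
+- **Flexible outputs**: Milvus/Zilliz/Qdrant vector databases, S3-compatible object storage, and the local file system
 - **Unified Pipeline API**: `/api/xparse/pipeline` runs the full parse → chunk → embed flow in a single call
 - **Configurable processing**: parse, chunk, and embed parameters can be configured flexibly
 - **Detailed statistics**: returns processing statistics for each stage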
@@ -34,7 +53,7 @@ xParse's synchronous pipeline implementation, supporting multiple data sources and outputs.
                │ [embeddings + stats]
                ▼
        ┌──────────────┐
-       │ Destination  │  Output target (Milvus/Zilliz/local)
+       │ Destination  │  Output target (Milvus/Zilliz/Qdrant/local)
        └──────────────┘
 ```
@@ -52,7 +71,7 @@ pip install --upgrade xparse-client
 #### Code configuration
 ```python
-from xparse_client import ParseConfig, ChunkConfig, EmbedConfig, Stage, Pipeline, S3Source, MilvusDestination
+from xparse_client import ParseConfig, ChunkConfig, EmbedConfig, Stage, Pipeline, S3Source, MilvusDestination, QdrantDestination
 # Create the configuration using the new stages format
 stages = [
@@ -337,6 +356,28 @@ destination = MilvusDestination(
 )
 ```
+#### Qdrant vector storage
+```python
+destination = QdrantDestination(
+    url='http://localhost:6333',  # Qdrant service URL (local or cloud)
+    collection_name='my_collection',  # Collection name
+    dimension=1024,  # Vector dimension; must match what the embed API returns
+    api_key='your-api-key',  # Optional: Qdrant Cloud API key
+    prefer_grpc=False  # Optional: whether to prefer gRPC (default False)
+)
+```
+**Qdrant Cloud example:**
+```python
+destination = QdrantDestination(
+    url='https://xxxxxxx.us-east-1-0.aws.cloud.qdrant.io',
+    collection_name='my_collection',
+    dimension=1024,
+    api_key='your-api-key'
+)
+```
 #### Local file system destination
 Writes `json` files to the configured local path.