PyPI - vectordb-bench - Versions diffs - 0.0.18__tar.gz → 0.0.20__tar.gz - Mend

vectordb-bench 0.0.18.tar.gz → 0.0.20.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (176)

{vectordb_bench-0.0.18 → vectordb_bench-0.0.20}/.github/workflows/pull_request.yml RENAMED

@@ -31,6 +31,10 @@ jobs:
           python -m pip install --upgrade pip
           pip install -e ".[test]"
+      - name: Run coding checks
+        run: |
+          make lint
       - name: Test with pytest
         run: |
           make unittest

vectordb_bench-0.0.20/Makefile ADDED

@@ -0,0 +1,10 @@
+unittest:
+	PYTHONPATH=`pwd` python3 -m pytest tests/test_dataset.py::TestDataSet::test_download_small -svv
+format:
+	PYTHONPATH=`pwd` python3 -m black vectordb_bench
+	PYTHONPATH=`pwd` python3 -m ruff check vectordb_bench --fix
+lint:
+	PYTHONPATH=`pwd` python3 -m black vectordb_bench --check
+	PYTHONPATH=`pwd` python3 -m ruff check vectordb_bench
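
The `lint` and `format` targets run the same two tools and differ only in whether those tools may rewrite files. The Python sketch below spells out that difference. `make_target_commands` is a hypothetical helper, not something shipped in the package, and it only builds argument lists rather than executing them.

```python
def make_target_commands(target: str, package: str = "vectordb_bench") -> list[list[str]]:
    """Return the argv lists a Makefile target would run (hypothetical helper).

    The PYTHONPATH=`pwd` prefix from the Makefile is omitted here; a caller
    would pass it through the subprocess environment instead.
    """
    black = ["python3", "-m", "black", package]
    ruff = ["python3", "-m", "ruff", "check", package]
    if target == "lint":
        # Read-only: black only reports files it would change, ruff only reports violations.
        return [black + ["--check"], ruff]
    if target == "format":
        # Mutating: black rewrites files, ruff applies the fixes it knows how to make.
        return [black, ruff + ["--fix"]]
    raise ValueError(f"unknown target: {target}")


lint_cmds = make_target_commands("lint")
format_cmds = make_target_commands("format")
```

Running `lint` in CI and keeping `format` for local use means the pull-request workflow never modifies the checked-out tree.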

{vectordb_bench-0.0.18 → vectordb_bench-0.0.20}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
-Metadata-Version: 2.1
+Metadata-Version: 2.2
 Name: vectordb-bench
-Version: 0.0.18
+Version: 0.0.20
 Summary: VectorDBBench is not just an offering of benchmark results for mainstream vector databases and cloud services, it's your go-to tool for the ultimate performance and cost-effectiveness comparison. Designed with ease-of-use in mind, VectorDBBench is devised to help users, even non-professionals, reproduce results or test new systems, making the hunt for the optimal choice amongst a plethora of cloud services and open-source vector databases a breeze.
 Author-email: XuanYang-cn <xuan.yang@zilliz.com>
 Project-URL: repository, https://github.com/zilliztech/VectorDBBench
@@ -26,6 +26,7 @@ Requires-Dist: pydantic<v2
 Requires-Dist: scikit-learn
 Requires-Dist: pymilvus
 Provides-Extra: test
+Requires-Dist: black; extra == "test"
 Requires-Dist: ruff; extra == "test"
 Requires-Dist: pytest; extra == "test"
 Provides-Extra: all
@@ -35,15 +36,18 @@ Requires-Dist: qdrant-client; extra == "all"
 Requires-Dist: pinecone-client; extra == "all"
 Requires-Dist: weaviate-client; extra == "all"
 Requires-Dist: elasticsearch; extra == "all"
-Requires-Dist: pgvector; extra == "all"
-Requires-Dist: pgvecto_rs[psycopg3]>=0.2.2; extra == "all"
 Requires-Dist: sqlalchemy; extra == "all"
 Requires-Dist: redis; extra == "all"
 Requires-Dist: chromadb; extra == "all"
+Requires-Dist: pgvector; extra == "all"
 Requires-Dist: psycopg; extra == "all"
 Requires-Dist: psycopg-binary; extra == "all"
-Requires-Dist: opensearch-dsl==2.1.0; extra == "all"
-Requires-Dist: opensearch-py==2.6.0; extra == "all"
+Requires-Dist: pgvecto_rs[psycopg3]>=0.2.2; extra == "all"
+Requires-Dist: opensearch-dsl; extra == "all"
+Requires-Dist: opensearch-py; extra == "all"
+Requires-Dist: memorydb; extra == "all"
+Requires-Dist: alibabacloud_ha3engine_vector; extra == "all"
+Requires-Dist: alibabacloud_searchengine20211025; extra == "all"
 Provides-Extra: qdrant
 Requires-Dist: qdrant-client; extra == "qdrant"
 Provides-Extra: pinecone
@@ -56,18 +60,6 @@ Provides-Extra: pgvector
 Requires-Dist: psycopg; extra == "pgvector"
 Requires-Dist: psycopg-binary; extra == "pgvector"
 Requires-Dist: pgvector; extra == "pgvector"
-Provides-Extra: pgvectorscale
-Requires-Dist: psycopg; extra == "pgvectorscale"
-Requires-Dist: psycopg-binary; extra == "pgvectorscale"
-Requires-Dist: pgvector; extra == "pgvectorscale"
-Provides-Extra: pgdiskann
-Requires-Dist: psycopg; extra == "pgdiskann"
-Requires-Dist: psycopg-binary; extra == "pgdiskann"
-Requires-Dist: pgvector; extra == "pgdiskann"
-Provides-Extra: alloydb
-Requires-Dist: psycopg; extra == "alloydb"
-Requires-Dist: psycopg-binary; extra == "alloydb"
-Requires-Dist: pgvector; extra == "alloydb"
 Provides-Extra: pgvecto-rs
 Requires-Dist: pgvecto_rs[psycopg3]>=0.2.2; extra == "pgvecto-rs"
 Provides-Extra: redis
@@ -76,15 +68,27 @@ Provides-Extra: memorydb
 Requires-Dist: memorydb; extra == "memorydb"
 Provides-Extra: chromadb
 Requires-Dist: chromadb; extra == "chromadb"
-Provides-Extra: awsopensearch
-Requires-Dist: awsopensearch; extra == "awsopensearch"
-Provides-Extra: zilliz-cloud
+Provides-Extra: opensearch
+Requires-Dist: opensearch-py; extra == "opensearch"
+Provides-Extra: aliyun-opensearch
+Requires-Dist: alibabacloud_ha3engine_vector; extra == "aliyun-opensearch"
+Requires-Dist: alibabacloud_searchengine20211025; extra == "aliyun-opensearch"
 # VectorDBBench: A Benchmark Tool for VectorDB
 [![version](https://img.shields.io/pypi/v/vectordb-bench.svg?color=blue)](https://pypi.org/project/vectordb-bench/)
 [![Downloads](https://pepy.tech/badge/vectordb-bench)](https://pepy.tech/project/vectordb-bench)
+## What is VectorDBBench
+VectorDBBench is not just an offering of benchmark results for mainstream vector databases and cloud services, it's your go-to tool for the ultimate performance and cost-effectiveness comparison. Designed with ease-of-use in mind, VectorDBBench is devised to help users, even non-professionals, reproduce results or test new systems, making the hunt for the optimal choice amongst a plethora of cloud services and open-source vector databases a breeze.
+Understanding the importance of user experience, we provide an intuitive visual interface. This not only empowers users to initiate benchmarks at ease, but also to view comparative result reports, thereby reproducing benchmark results effortlessly.
+To add more relevance and practicality, we provide cost-effectiveness reports particularly for cloud services. This allows for a more realistic and applicable benchmarking process.
+Closely mimicking real-world production environments, we've set up diverse testing scenarios including insertion, searching, and filtered searching. To provide you with credible and reliable data, we've included public datasets from actual production scenarios, such as [SIFT](http://corpus-texmex.irisa.fr/), [GIST](http://corpus-texmex.irisa.fr/), [Cohere](https://huggingface.co/datasets/Cohere/wikipedia-22-12/tree/main/en), and a dataset generated by OpenAI from an opensource [raw dataset](https://huggingface.co/datasets/allenai/c4). It's fascinating to discover how a relatively unknown open-source database might excel in certain circumstances!
+Prepare to delve into the world of VectorDBBench, and let it guide you in uncovering your perfect vector database match.
 **Leaderboard:** https://zilliz.com/benchmark
 ## Quick Start
 ### Prerequirement
@@ -111,21 +115,19 @@ All the database client supported
 | Optional database client | install command                             |
 |--------------------------|---------------------------------------------|
-| pymilvus(*default*)      | `pip install vectordb-bench`                |
-| all                      | `pip install vectordb-bench[all]`           |
+| pymilvus, zilliz_cloud (*default*)     | `pip install vectordb-bench`                |
+| all (*clients requirements might be conflict with each other*) | `pip install vectordb-bench[all]`           |
 | qdrant                   | `pip install vectordb-bench[qdrant]`        |
 | pinecone                 | `pip install vectordb-bench[pinecone]`      |
 | weaviate                 | `pip install vectordb-bench[weaviate]`      |
-| elastic                  | `pip install vectordb-bench[elastic]`       |
-| pgvector                 | `pip install vectordb-bench[pgvector]`      |
+| elastic, aliyun_elasticsearch| `pip install vectordb-bench[elastic]`       |
+| pgvector, pgvectorscale, pgdiskann, alloydb | `pip install vectordb-bench[pgvector]`      |
 | pgvecto.rs               | `pip install vectordb-bench[pgvecto_rs]`    |
-| pgvectorscale            | `pip install vectordb-bench[pgvectorscale]` |
-| pgdiskann                | `pip install vectordb-bench[pgdiskann]`     |
 | redis                    | `pip install vectordb-bench[redis]`         |
 | memorydb                 | `pip install vectordb-bench[memorydb]`      |
 | chromadb                 | `pip install vectordb-bench[chromadb]`      |
-| awsopensearch            | `pip install vectordb-bench[awsopensearch]` |
-| alloydb                  | `pip install vectordb-bench[alloydb]`       |
+| awsopensearch            | `pip install vectordb-bench[opensearch]` |
+| aliyun_opensearch        | `pip install vectordb-bench[aliyun_opensearch]` |
 ### Run
@@ -264,16 +266,6 @@ milvushnsw:
 > - Options passed on the command line will override the configuration file*
 > - Parameter names use an _ not -
-## What is VectorDBBench
-VectorDBBench is not just an offering of benchmark results for mainstream vector databases and cloud services, it's your go-to tool for the ultimate performance and cost-effectiveness comparison. Designed with ease-of-use in mind, VectorDBBench is devised to help users, even non-professionals, reproduce results or test new systems, making the hunt for the optimal choice amongst a plethora of cloud services and open-source vector databases a breeze.
-Understanding the importance of user experience, we provide an intuitive visual interface. This not only empowers users to initiate benchmarks at ease, but also to view comparative result reports, thereby reproducing benchmark results effortlessly.
-To add more relevance and practicality, we provide cost-effectiveness reports particularly for cloud services. This allows for a more realistic and applicable benchmarking process.
-Closely mimicking real-world production environments, we've set up diverse testing scenarios including insertion, searching, and filtered searching. To provide you with credible and reliable data, we've included public datasets from actual production scenarios, such as [SIFT](http://corpus-texmex.irisa.fr/), [GIST](http://corpus-texmex.irisa.fr/), [Cohere](https://huggingface.co/datasets/Cohere/wikipedia-22-12/tree/main/en), and a dataset generated by OpenAI from an opensource [raw dataset](https://huggingface.co/datasets/allenai/c4). It's fascinating to discover how a relatively unknown open-source database might excel in certain circumstances!
-Prepare to delve into the world of VectorDBBench, and let it guide you in uncovering your perfect vector database match.
 ## Leaderboard
 ### Introduction
 To facilitate the presentation of test results and provide a comprehensive performance analysis report, we offer a [leaderboard page](https://zilliz.com/benchmark). It allows us to choose from QPS, QP$, and latency metrics, and provides a comprehensive assessment of a system's performance based on the test results of various cases and a set of scoring mechanisms (to be introduced later). On this leaderboard, we can select the systems and models to be compared, and filter out cases we do not want to consider. Comprehensive scores are always ranked from best to worst, and the specific test results of each query will be presented in the list below.
@@ -324,13 +316,13 @@ After reopen the repository in container, run `python -m vectordb_bench` in the
 ### Check coding styles
 ```shell
-$ ruff check vectordb_bench
+$ make lint
 ```
-Add `--fix` if you want to fix the coding styles automatically
+To fix the coding styles automatically
 ```shell
-$ ruff check vectordb_bench --fix
+$ make format
 ```
 ## How does it work?

{vectordb_bench-0.0.18 → vectordb_bench-0.0.20}/README.md RENAMED

@@ -3,6 +3,16 @@
 [![version](https://img.shields.io/pypi/v/vectordb-bench.svg?color=blue)](https://pypi.org/project/vectordb-bench/)
 [![Downloads](https://pepy.tech/badge/vectordb-bench)](https://pepy.tech/project/vectordb-bench)
+## What is VectorDBBench
+VectorDBBench is not just an offering of benchmark results for mainstream vector databases and cloud services, it's your go-to tool for the ultimate performance and cost-effectiveness comparison. Designed with ease-of-use in mind, VectorDBBench is devised to help users, even non-professionals, reproduce results or test new systems, making the hunt for the optimal choice amongst a plethora of cloud services and open-source vector databases a breeze.
+Understanding the importance of user experience, we provide an intuitive visual interface. This not only empowers users to initiate benchmarks at ease, but also to view comparative result reports, thereby reproducing benchmark results effortlessly.
+To add more relevance and practicality, we provide cost-effectiveness reports particularly for cloud services. This allows for a more realistic and applicable benchmarking process.
+Closely mimicking real-world production environments, we've set up diverse testing scenarios including insertion, searching, and filtered searching. To provide you with credible and reliable data, we've included public datasets from actual production scenarios, such as [SIFT](http://corpus-texmex.irisa.fr/), [GIST](http://corpus-texmex.irisa.fr/), [Cohere](https://huggingface.co/datasets/Cohere/wikipedia-22-12/tree/main/en), and a dataset generated by OpenAI from an opensource [raw dataset](https://huggingface.co/datasets/allenai/c4). It's fascinating to discover how a relatively unknown open-source database might excel in certain circumstances!
+Prepare to delve into the world of VectorDBBench, and let it guide you in uncovering your perfect vector database match.
 **Leaderboard:** https://zilliz.com/benchmark
 ## Quick Start
 ### Prerequirement
@@ -29,21 +39,19 @@ All the database client supported
 | Optional database client | install command                             |
 |--------------------------|---------------------------------------------|
-| pymilvus(*default*)      | `pip install vectordb-bench`                |
-| all                      | `pip install vectordb-bench[all]`           |
+| pymilvus, zilliz_cloud (*default*)     | `pip install vectordb-bench`                |
+| all (*clients requirements might be conflict with each other*) | `pip install vectordb-bench[all]`           |
 | qdrant                   | `pip install vectordb-bench[qdrant]`        |
 | pinecone                 | `pip install vectordb-bench[pinecone]`      |
 | weaviate                 | `pip install vectordb-bench[weaviate]`      |
-| elastic                  | `pip install vectordb-bench[elastic]`       |
-| pgvector                 | `pip install vectordb-bench[pgvector]`      |
+| elastic, aliyun_elasticsearch| `pip install vectordb-bench[elastic]`       |
+| pgvector, pgvectorscale, pgdiskann, alloydb | `pip install vectordb-bench[pgvector]`      |
 | pgvecto.rs               | `pip install vectordb-bench[pgvecto_rs]`    |
-| pgvectorscale            | `pip install vectordb-bench[pgvectorscale]` |
-| pgdiskann                | `pip install vectordb-bench[pgdiskann]`     |
 | redis                    | `pip install vectordb-bench[redis]`         |
 | memorydb                 | `pip install vectordb-bench[memorydb]`      |
 | chromadb                 | `pip install vectordb-bench[chromadb]`      |
-| awsopensearch            | `pip install vectordb-bench[awsopensearch]` |
-| alloydb                  | `pip install vectordb-bench[alloydb]`       |
+| awsopensearch            | `pip install vectordb-bench[opensearch]` |
+| aliyun_opensearch        | `pip install vectordb-bench[aliyun_opensearch]` |
 ### Run
@@ -182,16 +190,6 @@ milvushnsw:
 > - Options passed on the command line will override the configuration file*
 > - Parameter names use an _ not -
-## What is VectorDBBench
-VectorDBBench is not just an offering of benchmark results for mainstream vector databases and cloud services, it's your go-to tool for the ultimate performance and cost-effectiveness comparison. Designed with ease-of-use in mind, VectorDBBench is devised to help users, even non-professionals, reproduce results or test new systems, making the hunt for the optimal choice amongst a plethora of cloud services and open-source vector databases a breeze.
-Understanding the importance of user experience, we provide an intuitive visual interface. This not only empowers users to initiate benchmarks at ease, but also to view comparative result reports, thereby reproducing benchmark results effortlessly.
-To add more relevance and practicality, we provide cost-effectiveness reports particularly for cloud services. This allows for a more realistic and applicable benchmarking process.
-Closely mimicking real-world production environments, we've set up diverse testing scenarios including insertion, searching, and filtered searching. To provide you with credible and reliable data, we've included public datasets from actual production scenarios, such as [SIFT](http://corpus-texmex.irisa.fr/), [GIST](http://corpus-texmex.irisa.fr/), [Cohere](https://huggingface.co/datasets/Cohere/wikipedia-22-12/tree/main/en), and a dataset generated by OpenAI from an opensource [raw dataset](https://huggingface.co/datasets/allenai/c4). It's fascinating to discover how a relatively unknown open-source database might excel in certain circumstances!
-Prepare to delve into the world of VectorDBBench, and let it guide you in uncovering your perfect vector database match.
 ## Leaderboard
 ### Introduction
 To facilitate the presentation of test results and provide a comprehensive performance analysis report, we offer a [leaderboard page](https://zilliz.com/benchmark). It allows us to choose from QPS, QP$, and latency metrics, and provides a comprehensive assessment of a system's performance based on the test results of various cases and a set of scoring mechanisms (to be introduced later). On this leaderboard, we can select the systems and models to be compared, and filter out cases we do not want to consider. Comprehensive scores are always ranked from best to worst, and the specific test results of each query will be presented in the list below.
@@ -242,13 +240,13 @@ After reopen the repository in container, run `python -m vectordb_bench` in the
 ### Check coding styles
 ```shell
-$ ruff check vectordb_bench
+$ make lint
 ```
-Add `--fix` if you want to fix the coding styles automatically
+To fix the coding styles automatically
 ```shell
-$ ruff check vectordb_bench --fix
+$ make format
 ```
 ## How does it work?

vectordb_bench-0.0.20/pyproject.toml ADDED

@@ -0,0 +1,209 @@
+[build-system]
+requires = ["setuptools>=67.0", "wheel", "setuptools_scm[toml]>=6.2"]
+build-backend = "setuptools.build_meta"
+[tool.setuptools.package-data]
+"vectordb_bench.results" = ["*.json"]
+[tool.setuptools.packages.find]
+where = ["."]
+include = ["vectordb_bench", "vectordb_bench.cli"]
+[project]
+name = "vectordb-bench"
+authors = [
+  {name="XuanYang-cn", email="xuan.yang@zilliz.com"},
+]
+description = "VectorDBBench is not just an offering of benchmark results for mainstream vector databases and cloud services, it's your go-to tool for the ultimate performance and cost-effectiveness comparison. Designed with ease-of-use in mind, VectorDBBench is devised to help users, even non-professionals, reproduce results or test new systems, making the hunt for the optimal choice amongst a plethora of cloud services and open-source vector databases a breeze."
+readme = "README.md"
+requires-python = ">=3.11"
+classifiers = [
+    "Programming Language :: Python :: 3",
+    "License :: OSI Approved :: MIT License",
+    "Operating System :: OS Independent",
+]
+dependencies = [
+    "click",
+    "pytz",
+    "streamlit-autorefresh",
+    "streamlit!=1.34.0",
+    "streamlit_extras",
+    "tqdm",
+    "s3fs",
+    "oss2",
+    "psutil",
+    "polars",
+    "plotly",
+    "environs",
+    "pydantic<v2",
+    "scikit-learn",
+    "pymilvus", # with pandas, numpy, ujson
+]
+dynamic = ["version"]
+[project.optional-dependencies]
+test = [
+    "black",
+    "ruff",
+    "pytest",
+]
+all = [
+    "grpcio==1.53.0", # for qdrant-client and pymilvus
+    "grpcio-tools==1.53.0", # for qdrant-client and pymilvus
+    "qdrant-client",
+    "pinecone-client",
+    "weaviate-client",
+    "elasticsearch",
+    "sqlalchemy",
+    "redis",
+    "chromadb",
+    "pgvector",
+    "psycopg",
+    "psycopg-binary",
+    "pgvecto_rs[psycopg3]>=0.2.2",
+    "opensearch-dsl",
+    "opensearch-py",
+    "memorydb",
+    "alibabacloud_ha3engine_vector",
+    "alibabacloud_searchengine20211025",
+]
+qdrant          = [ "qdrant-client" ]
+pinecone        = [ "pinecone-client" ]
+weaviate        = [ "weaviate-client" ]
+elastic         = [ "elasticsearch" ]
+# For elastic and aliyun_elasticsearch
+pgvector        = [ "psycopg", "psycopg-binary", "pgvector" ]
+# for pgvector, pgvectorscale, pgdiskann, and, alloydb
+pgvecto_rs      = [ "pgvecto_rs[psycopg3]>=0.2.2" ]
+redis           = [ "redis" ]
+memorydb        = [ "memorydb" ]
+chromadb        = [ "chromadb" ]
+opensearch      = [ "opensearch-py" ]
+aliyun_opensearch = [ "alibabacloud_ha3engine_vector", "alibabacloud_searchengine20211025"]
+[project.urls]
+"repository" = "https://github.com/zilliztech/VectorDBBench"
+[project.scripts]
+init_bench = "vectordb_bench.__main__:main"
+vectordbbench = "vectordb_bench.cli.vectordbbench:cli"
+[tool.setuptools_scm]
+[tool.black]
+line-length = 120
+target-version = ['py311']
+include = '\.pyi?$'
+[tool.ruff]
+lint.select = [
+    "E",
+    "F",
+    "C90",
+    "I",
+    "N",
+    "B", "C", "G",
+    "A",
+    "ANN001",
+    "S", "T", "W", "ARG", "BLE", "COM", "DJ", "EM", "ERA", "EXE", "FBT", "ICN", "INP", "ISC", "NPY", "PD", "PGH", "PIE", "PL", "PT", "PTH", "PYI", "RET", "RSE", "RUF", "SIM", "SLF", "TCH", "TID", "TRY", "UP", "YTT"
+]
+lint.ignore = [
+    "BLE001", # blind-except (BLE001)
+    "SLF001", # SLF001 Private member accessed [E]
+    "TRY003", # [ruff] TRY003 Avoid specifying long messages outside the exception class [E]
+    "FBT001", "FBT002", "FBT003",
+    "G004", # [ruff] G004 Logging statement uses f-string [E]
+    "UP031",
+    "RUF012",
+    "EM101",
+    "N805",
+    "ARG002",
+    "ARG003",
+    "PIE796", # https://github.com/zilliztech/VectorDBBench/issues/438
+    "INP001", # TODO
+    "TID252", # TODO
+    "N801", "N802", "N815",
+    "S101", "S108", "S603", "S311",
+    "PLR2004",
+    "RUF017",
+    "C416",
+    "PLW0603",
+]
+# Allow autofix for all enabled rules (when `--fix`) is provided.
+lint.fixable = [
+    "A", "B", "C", "D", "E", "F", "G", "I", "N", "Q", "S", "T", "W",
+    "ANN", "ARG", "BLE", "COM", "DJ", "DTZ", "EM", "ERA", "EXE", "FBT",
+    "ICN", "INP", "ISC", "NPY", "PD", "PGH", "PIE", "PL", "PT", "PTH",
+    "PYI", "RET", "RSE", "RUF", "SIM", "SLF", "TCH", "TID", "TRY", "UP",
+    "YTT",
+]
+lint.unfixable = []
+show-fixes = true
+# Exclude a variety of commonly ignored directories.
+exclude = [
+    ".bzr",
+    ".direnv",
+    ".eggs",
+    ".git",
+    ".git-rewrite",
+    ".hg",
+    ".mypy_cache",
+    ".nox",
+    ".pants.d",
+    ".pytype",
+    ".ruff_cache",
+    ".svn",
+    ".tox",
+    ".venv",
+    "__pypackages__",
+    "_build",
+    "buck-out",
+    "build",
+    "dist",
+    "node_modules",
+    "venv",
+    "grpc_gen",
+    "__pycache__",
+    "frontend", # TODO
+    "tests",
+]
+# Same as Black.
+line-length = 120
+# Allow unused variables when underscore-prefixed.
+lint.dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"
+# Assume Python 3.11
+target-version = "py311"
+[tool.ruff.lint.mccabe]
+# Unlike Flake8, default to a complexity level of 10.
+max-complexity = 18
+[tool.ruff.lint.pycodestyle]
+max-line-length = 120
+max-doc-length = 120
+[tool.ruff.lint.pylint]
+max-args = 20
+max-branches = 15
+[tool.ruff.lint.flake8-builtins]
+builtins-ignorelist = [
+    # "format",
+    # "next",
+    # "object", # TODO
+    # "id",
+    # "dict", # TODO
+    # "filter",
+]
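
The `lint.dummy-variable-rgx` setting above decides which names Ruff treats as intentionally unused. As a rough check of what the pattern accepts, it can be tried with Python's `re` module. This is a stand-in, since Ruff evaluates the pattern with its own regex engine, but the pattern uses only basic syntax that should behave the same way here. The sample names are made up for illustration.

```python
import re

# Pattern copied verbatim from the pyproject.toml hunk.
DUMMY_VARIABLE_RGX = re.compile(r"^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$")


def is_dummy(name: str) -> bool:
    """True when the name matches the dummy-variable pattern."""
    return DUMMY_VARIABLE_RGX.match(name) is not None


# Illustrative names: underscore-only and underscore-prefixed names match,
# unprefixed names and names ending in an underscore do not.
classified = {name: is_dummy(name) for name in ["_", "__", "_unused", "_tmp_1", "unused", "_trailing_"]}
```

Under this pattern a name must start with an underscore and, if it contains anything else, end with a letter or digit.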

{vectordb_bench-0.0.18 → vectordb_bench-0.0.20}/tests/test_rate_runner.py RENAMED

@@ -52,9 +52,9 @@ def test_read_write_runner(db, insert_rate, conc: list, search_stage: Iterable[f
 def get_db(db: str, config: dict) -> VectorDB:
     if db == DB.Milvus.name:
-        return DB.Milvus.init_cls(dim=768, db_config=config, db_case_config=FLATConfig(metric_type="COSINE"), drop_old=True, pre_load=True)
+        return DB.Milvus.init_cls(dim=768, db_config=config, db_case_config=FLATConfig(metric_type="COSINE"), drop_old=True)
     elif db == DB.ZillizCloud.name:
-        return DB.ZillizCloud.init_cls(dim=768, db_config=config, db_case_config=AutoIndexConfig(metric_type="COSINE"), drop_old=True, pre_load=True)
+        return DB.ZillizCloud.init_cls(dim=768, db_config=config, db_case_config=AutoIndexConfig(metric_type="COSINE"), drop_old=True)
     else:
         raise ValueError(f"unknown db: {db}")
@@ -76,7 +76,7 @@ if __name__ == "__main__":
     }
     conc = (1, 15, 50)
-    search_stage = (0.5, 0.6, 0.7, 0.8, 0.9, 1.0)
+    search_stage = (0.5, 0.6, 0.7, 0.8, 0.9)
     db = get_db(flags.db, config)
     test_read_write_runner(

vectordb_bench-0.0.20/vectordb_bench/__init__.py ADDED

@@ -0,0 +1,92 @@
+import inspect
+import pathlib
+import environs
+from . import log_util
+env = environs.Env()
+env.read_env(".env", False)
+class config:
+    ALIYUN_OSS_URL = "assets.zilliz.com.cn/benchmark/"
+    AWS_S3_URL = "assets.zilliz.com/benchmark/"
+    LOG_LEVEL = env.str("LOG_LEVEL", "INFO")
+    DEFAULT_DATASET_URL = env.str("DEFAULT_DATASET_URL", AWS_S3_URL)
+    DATASET_LOCAL_DIR = env.path("DATASET_LOCAL_DIR", "/tmp/vectordb_bench/dataset")
+    NUM_PER_BATCH = env.int("NUM_PER_BATCH", 100)
+    DROP_OLD = env.bool("DROP_OLD", True)
+    USE_SHUFFLED_DATA = env.bool("USE_SHUFFLED_DATA", True)
+    NUM_CONCURRENCY = env.list(
+        "NUM_CONCURRENCY",
+        [
+            1,
+            5,
+            10,
+            15,
+            20,
+            25,
+            30,
+            35,
+            40,
+            45,
+            50,
+            55,
+            60,
+            65,
+            70,
+            75,
+            80,
+            85,
+            90,
+            95,
+            100,
+        ],
+        subcast=int,
+    )
+    CONCURRENCY_DURATION = 30
+    RESULTS_LOCAL_DIR = env.path(
+        "RESULTS_LOCAL_DIR",
+        pathlib.Path(__file__).parent.joinpath("results"),
+    )
+    CONFIG_LOCAL_DIR = env.path(
+        "CONFIG_LOCAL_DIR",
+        pathlib.Path(__file__).parent.joinpath("config-files"),
+    )
+    K_DEFAULT = 100  # default return top k nearest neighbors during search
+    CUSTOM_CONFIG_DIR = pathlib.Path(__file__).parent.joinpath("custom/custom_case.json")
+    CAPACITY_TIMEOUT_IN_SECONDS = 24 * 3600  # 24h
+    LOAD_TIMEOUT_DEFAULT = 24 * 3600  # 24h
+    LOAD_TIMEOUT_768D_1M = 24 * 3600  # 24h
+    LOAD_TIMEOUT_768D_10M = 240 * 3600  # 10d
+    LOAD_TIMEOUT_768D_100M = 2400 * 3600  # 100d
+    LOAD_TIMEOUT_1536D_500K = 24 * 3600  # 24h
+    LOAD_TIMEOUT_1536D_5M = 240 * 3600  # 10d
+    OPTIMIZE_TIMEOUT_DEFAULT = 24 * 3600  # 24h
+    OPTIMIZE_TIMEOUT_768D_1M = 24 * 3600  # 24h
+    OPTIMIZE_TIMEOUT_768D_10M = 240 * 3600  # 10d
+    OPTIMIZE_TIMEOUT_768D_100M = 2400 * 3600  # 100d
+    OPTIMIZE_TIMEOUT_1536D_500K = 24 * 3600  # 24h
+    OPTIMIZE_TIMEOUT_1536D_5M = 240 * 3600  # 10d
+    def display(self) -> str:
+        return [
+            i
+            for i in inspect.getmembers(self)
+            if not inspect.ismethod(i[1]) and not i[0].startswith("_") and "TIMEOUT" not in i[0]
+        ]
+log_util.init(config.LOG_LEVEL)
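
The `display` method above filters `inspect.getmembers` down to public, non-method, non-timeout attributes. The standalone sketch below reproduces that filter on a small stand-in class. `Config` and its three attributes are placeholders, and plain class attributes are used so the example does not depend on `environs`. Note that the method in the hunk is annotated `-> str` although it returns a list of `(name, value)` pairs; the sketch annotates the return type as a list.

```python
import inspect


class Config:
    """Stand-in for the package's config class (placeholder attributes)."""

    LOG_LEVEL = "INFO"
    NUM_PER_BATCH = 100
    LOAD_TIMEOUT_DEFAULT = 24 * 3600  # excluded below because the name contains TIMEOUT

    def display(self) -> list[tuple[str, object]]:
        # getmembers returns (name, value) pairs sorted by name; drop bound
        # methods, private/dunder names, and every *TIMEOUT* setting.
        return [
            (name, value)
            for name, value in inspect.getmembers(self)
            if not inspect.ismethod(value) and not name.startswith("_") and "TIMEOUT" not in name
        ]


shown = Config().display()
```

Because the method takes `self`, it has to be called on an instance, as in `Config().display()`, rather than on the class itself.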

{vectordb_bench-0.0.18 → vectordb_bench-0.0.20}/vectordb_bench/__main__.py RENAMED

@@ -1,7 +1,8 @@
-import traceback
 import logging
+import pathlib
 import subprocess
-import os
+import traceback
 from . import config
 log = logging.getLogger("vectordb_bench")
@@ -16,7 +17,7 @@ def run_streamlit():
     cmd = [
         "streamlit",
         "run",
-        f"{os.path.dirname(__file__)}/frontend/vdb_benchmark.py",
+        f"{pathlib.Path(__file__).parent}/frontend/vdb_benchmark.py",
         "--logger.level",
         "info",
         "--theme.base",

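The `__main__.py` hunk replaces `os.path.dirname(__file__)` with `pathlib.Path(__file__).parent` inside an f-string. For an absolute module path the two give the same string. They differ when the path has no directory component. The sketch below uses `posixpath` and `PurePosixPath` with made-up paths so the result does not depend on the host OS or on `__file__`.

```python
import pathlib
import posixpath


def frontend_path_old(module_file: str) -> str:
    """0.0.18 style: dirname returns '' for a bare filename."""
    return f"{posixpath.dirname(module_file)}/frontend/vdb_benchmark.py"


def frontend_path_new(module_file: str) -> str:
    """0.0.20 style: .parent returns '.' for a bare filename."""
    return f"{pathlib.PurePosixPath(module_file).parent}/frontend/vdb_benchmark.py"


# Hypothetical install location, used only for illustration.
absolute = "/opt/app/vectordb_bench/__main__.py"
same_for_absolute = frontend_path_old(absolute) == frontend_path_new(absolute)

# With a bare filename, the old form yields a path rooted at '/',
# while the new form stays relative to the current directory.
old_bare = frontend_path_old("__main__.py")
new_bare = frontend_path_new("__main__.py")
```

In normal use `__file__` for an installed package is a full path, so the change reads as a style cleanup rather than a behavior change.
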
{vectordb_bench-0.0.18 → vectordb_bench-0.0.20}/vectordb_bench/backend/assembler.py RENAMED

@@ -1,24 +1,25 @@
-from .cases import CaseLabel
-from .task_runner import CaseRunner, RunningStatus, TaskRunner
-from ..models import TaskConfig
-from ..backend.clients import EmptyDBCaseConfig
-from ..backend.data_source  import DatasetSource
 import logging
+from vectordb_bench.backend.clients import EmptyDBCaseConfig
+from vectordb_bench.backend.data_source import DatasetSource
+from vectordb_bench.models import TaskConfig
+from .cases import CaseLabel
+from .task_runner import CaseRunner, RunningStatus, TaskRunner
 log = logging.getLogger(__name__)
 class Assembler:
     @classmethod
-    def assemble(cls, run_id , task: TaskConfig, source: DatasetSource) -> CaseRunner:
+    def assemble(cls, run_id: str, task: TaskConfig, source: DatasetSource) -> CaseRunner:
         c_cls = task.case_config.case_id.case_cls
         c = c_cls(task.case_config.custom_case)
-        if type(task.db_case_config) != EmptyDBCaseConfig:
+        if type(task.db_case_config) is not EmptyDBCaseConfig:
             task.db_case_config.metric_type = c.dataset.data.metric_type
-        runner = CaseRunner(
+        return CaseRunner(
             run_id=run_id,
             config=task,
             ca=c,
@@ -26,8 +27,6 @@ class Assembler:
             dataset_source=source,
         )
-        return runner
     @classmethod
     def assemble_all(
         cls,
@@ -50,12 +49,12 @@ class Assembler:
             db2runner[db].append(r)
         # check dbclient installed
-        for k in db2runner.keys():
+        for k in db2runner:
             _ = k.init_cls
         # sort by dataset size
-        for k in db2runner.keys():
-            db2runner[k].sort(key=lambda x:x.ca.dataset.data.size)
+        for k, _ in db2runner:
+            db2runner[k].sort(key=lambda x: x.ca.dataset.data.size)
         all_runners = []
         all_runners.extend(load_runners)
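
One line in the last hunk deserves a second look: `for k, _ in db2runner:` iterates over the dictionary's keys and tries to unpack each key into two names. Iterating a dict yields keys only, so unless the keys are two-element iterables this raises at runtime. The sketch below assumes the keys are plain `Enum` members. The `DB` class and the integer "runners" are stand-ins, not the package's real types.

```python
from enum import Enum


class DB(Enum):
    """Stand-in for the package's DB enum (assumed to be a plain Enum)."""

    Milvus = "Milvus"
    Qdrant = "Qdrant"


# Integers stand in for runner objects; their value plays the role of dataset size.
db2runner = {DB.Milvus: [3, 1, 2], DB.Qdrant: [9, 7]}

unpack_error = None
try:
    for k, _ in db2runner:  # same shape as the loop in the 0.0.20 hunk
        pass
except TypeError as exc:
    unpack_error = exc

# Sorting each list in place only needs the values, so iterate over them directly.
for runners in db2runner.values():
    runners.sort(key=lambda size: size)
```

Iterating `db2runner.values()`, or `db2runner.items()` when the key is also needed, keeps the intent of the original `.keys()` loop without the unpacking step.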
