cocoindex 0.1.41__tar.gz → 0.1.43__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {cocoindex-0.1.41 → cocoindex-0.1.43}/Cargo.lock +1 -1
- {cocoindex-0.1.41 → cocoindex-0.1.43}/Cargo.toml +1 -1
- {cocoindex-0.1.41 → cocoindex-0.1.43}/PKG-INFO +3 -2
- {cocoindex-0.1.41 → cocoindex-0.1.43}/README.md +1 -1
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/core/basics.md +1 -1
- cocoindex-0.1.43/docs/docs/core/cli.mdx +75 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/core/data_types.mdx +1 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/core/flow_def.mdx +1 -1
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/core/flow_methods.mdx +5 -15
- cocoindex-0.1.43/docs/docs/core/settings.mdx +116 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/getting_started/quickstart.md +11 -31
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/ops/storages.md +2 -2
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docusaurus.config.ts +11 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/package.json +1 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/sidebars.ts +1 -1
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/yarn.lock +15 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/amazon_s3_embedding/README.md +3 -3
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/amazon_s3_embedding/main.py +4 -4
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/amazon_s3_embedding/pyproject.toml +1 -1
- cocoindex-0.1.43/examples/code_embedding/README.md +71 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/code_embedding/main.py +30 -16
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/code_embedding/pyproject.toml +1 -1
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/docs_to_knowledge_graph/README.md +3 -3
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/docs_to_knowledge_graph/main.py +0 -9
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/docs_to_knowledge_graph/pyproject.toml +1 -1
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/fastapi_server_docker/dockerfile +1 -1
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/fastapi_server_docker/main.py +3 -6
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/fastapi_server_docker/requirements.txt +1 -1
- cocoindex-0.1.43/examples/gdrive_text_embedding/README.md +83 -0
- cocoindex-0.1.43/examples/gdrive_text_embedding/main.py +93 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/gdrive_text_embedding/pyproject.toml +1 -1
- {cocoindex-0.1.41/examples/image_search_example → cocoindex-0.1.43/examples/image_search}/README.md +1 -1
- {cocoindex-0.1.41/examples/image_search_example → cocoindex-0.1.43/examples/image_search}/main.py +2 -13
- {cocoindex-0.1.41/examples/image_search_example → cocoindex-0.1.43/examples/image_search}/requirements.txt +1 -1
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/manuals_llm_extraction/README.md +3 -3
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/manuals_llm_extraction/main.py +1 -10
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/manuals_llm_extraction/pyproject.toml +1 -5
- cocoindex-0.1.43/examples/pdf_embedding/README.md +61 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/pdf_embedding/main.py +55 -25
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/pdf_embedding/pyproject.toml +1 -1
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/product_recommendation/README.md +3 -3
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/product_recommendation/main.py +1 -10
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/product_recommendation/pyproject.toml +1 -1
- cocoindex-0.1.43/examples/text_embedding/README.md +63 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/text_embedding/Text_Embedding.ipynb +2 -2
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/text_embedding/main.py +15 -21
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/text_embedding/pyproject.toml +1 -1
- cocoindex-0.1.43/examples/text_embedding_qdrant/README.md +87 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/text_embedding_qdrant/main.py +32 -25
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/text_embedding_qdrant/pyproject.toml +5 -1
- {cocoindex-0.1.41 → cocoindex-0.1.43}/pyproject.toml +4 -1
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/__init__.py +1 -1
- cocoindex-0.1.43/python/cocoindex/cli.py +439 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/flow.py +2 -2
- cocoindex-0.1.43/python/cocoindex/lib.py +71 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/typing.py +2 -0
- cocoindex-0.1.43/src/base/duration.rs +674 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/base/json_schema.rs +11 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/base/mod.rs +1 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/base/schema.rs +4 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/base/value.rs +16 -1
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/execution/query.rs +2 -1
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/storages/neo4j.rs +14 -4
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/storages/postgres.rs +12 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/storages/qdrant.rs +9 -2
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/py/convert.rs +6 -2
- cocoindex-0.1.41/docs/docs/core/cli.mdx +0 -76
- cocoindex-0.1.41/docs/docs/core/initialization.mdx +0 -134
- cocoindex-0.1.41/examples/code_embedding/README.md +0 -52
- cocoindex-0.1.41/examples/gdrive_text_embedding/README.md +0 -65
- cocoindex-0.1.41/examples/gdrive_text_embedding/main.py +0 -76
- cocoindex-0.1.41/examples/pdf_embedding/README.md +0 -41
- cocoindex-0.1.41/examples/text_embedding/README.md +0 -46
- cocoindex-0.1.41/examples/text_embedding_qdrant/README.md +0 -69
- cocoindex-0.1.41/python/cocoindex/cli.py +0 -238
- cocoindex-0.1.41/python/cocoindex/lib.py +0 -78
- {cocoindex-0.1.41 → cocoindex-0.1.43}/.cargo/config.toml +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/.env.lib_debug +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/.github/ISSUE_TEMPLATE/🐛-bug-report.md +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/.github/ISSUE_TEMPLATE/💡-feature-request.md +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/.github/scripts/update_version.sh +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/.github/workflows/CI.yml +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/.github/workflows/_test.yml +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/.github/workflows/docs.yml +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/.github/workflows/release.yml +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/.gitignore +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/.vscode/settings.json +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/CODE_OF_CONDUCT.md +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/CONTRIBUTING.md +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/LICENSE +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/dev/neo4j.yaml +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/dev/postgres.yaml +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/.gitignore +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/README.md +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/about/community.md +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/about/contributing.md +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/ai/llm.mdx +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/core/custom_function.mdx +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/core/data_example.svg +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/core/flow_example.svg +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/getting_started/installation.md +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/getting_started/markdown_files.zip +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/getting_started/overview.md +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/ops/functions.md +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/ops/sources.md +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/docs/query.mdx +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/src/components/HomepageFeatures/index.tsx +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/src/components/HomepageFeatures/styles.module.css +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/src/css/custom.css +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/src/theme/Root.js +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/static/.nojekyll +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/static/img/docusaurus.png +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/static/img/favicon.ico +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/static/img/icon.svg +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/static/robots.txt +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/docs/tsconfig.json +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/amazon_s3_embedding/.env.example +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/amazon_s3_embedding/.gitignore +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/code_embedding/.env +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/docs_to_knowledge_graph/.env +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/fastapi_server_docker/.dockerignore +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/fastapi_server_docker/.env +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/fastapi_server_docker/README.md +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/fastapi_server_docker/compose.yaml +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/fastapi_server_docker/sample_code/main.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/fastapi_server_docker/src/cocoindex_funs.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/gdrive_text_embedding/.env.example +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/gdrive_text_embedding/.gitignore +0 -0
- {cocoindex-0.1.41/examples/image_search_example → cocoindex-0.1.43/examples/image_search}/.env +0 -0
- {cocoindex-0.1.41/examples/image_search_example → cocoindex-0.1.43/examples/image_search}/frontend/.gitignore +0 -0
- {cocoindex-0.1.41/examples/image_search_example → cocoindex-0.1.43/examples/image_search}/frontend/index.html +0 -0
- {cocoindex-0.1.41/examples/image_search_example → cocoindex-0.1.43/examples/image_search}/frontend/package-lock.json +0 -0
- {cocoindex-0.1.41/examples/image_search_example → cocoindex-0.1.43/examples/image_search}/frontend/package.json +0 -0
- {cocoindex-0.1.41/examples/image_search_example → cocoindex-0.1.43/examples/image_search}/frontend/src/App.jsx +0 -0
- {cocoindex-0.1.41/examples/image_search_example → cocoindex-0.1.43/examples/image_search}/frontend/src/main.jsx +0 -0
- {cocoindex-0.1.41/examples/image_search_example → cocoindex-0.1.43/examples/image_search}/frontend/src/style.css +0 -0
- {cocoindex-0.1.41/examples/image_search_example → cocoindex-0.1.43/examples/image_search}/frontend/vite.config.js +0 -0
- {cocoindex-0.1.41/examples/image_search_example → cocoindex-0.1.43/examples/image_search}/img/cat1.jpeg +0 -0
- {cocoindex-0.1.41/examples/image_search_example → cocoindex-0.1.43/examples/image_search}/img/dog1.jpeg +0 -0
- {cocoindex-0.1.41/examples/image_search_example → cocoindex-0.1.43/examples/image_search}/img/elephant1.jpg +0 -0
- {cocoindex-0.1.41/examples/image_search_example → cocoindex-0.1.43/examples/image_search}/img/giraffe.jpg +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/manuals_llm_extraction/.env +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/manuals_llm_extraction/manuals/array.pdf +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/manuals_llm_extraction/manuals/base64.pdf +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/manuals_llm_extraction/manuals/copy.pdf +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/manuals_llm_extraction/manuals/glob.pdf +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/pdf_embedding/.env +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/pdf_embedding/pdf_files/1706.03762v7.pdf +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/pdf_embedding/pdf_files/1810.04805v2.pdf +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/pdf_embedding/pdf_files/rfc8259.pdf +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/product_recommendation/.env +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/product_recommendation/img/cocoinsight.png +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/product_recommendation/img/neo4j.png +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/product_recommendation/products/p1.json +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/product_recommendation/products/p2.json +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/product_recommendation/products/p3.json +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/product_recommendation/products/p4.json +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/product_recommendation/products/p5.json +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/product_recommendation/products/p6.json +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/product_recommendation/products/p7.json +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/product_recommendation/products/p8.json +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/product_recommendation/products/p9.json +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/text_embedding/.env +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/text_embedding/markdown_files/1706.03762v7.md +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/text_embedding/markdown_files/1810.04805v2.md +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/text_embedding/markdown_files/rfc8259.md +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/text_embedding_qdrant/.env +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/examples/text_embedding_qdrant/markdown_files/rfc8259.md +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/auth_registry.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/convert.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/functions.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/index.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/llm.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/op.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/py.typed +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/query.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/runtime.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/setting.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/setup.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/sources.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/storages.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/tests/__init__.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/tests/test_convert.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/python/cocoindex/utils.py +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/base/field_attrs.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/base/spec.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/builder/analyzed_flow.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/builder/analyzer.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/builder/flow_builder.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/builder/mod.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/builder/plan.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/execution/db_tracking.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/execution/db_tracking_setup.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/execution/dumper.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/execution/evaluator.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/execution/indexing_status.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/execution/live_updater.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/execution/memoization.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/execution/mod.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/execution/row_indexer.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/execution/source_indexer.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/execution/stats.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/lib.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/lib_context.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/llm/anthropic.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/llm/gemini.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/llm/mod.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/llm/ollama.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/llm/openai.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/factory_bases.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/functions/extract_by_llm.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/functions/mod.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/functions/parse_json.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/functions/split_recursively.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/interface.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/mod.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/py_factory.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/registration.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/registry.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/sdk.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/sources/amazon_s3.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/sources/google_drive.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/sources/local_file.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/sources/mod.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/storages/mod.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/ops/storages/spec.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/prelude.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/py/mod.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/server.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/service/error.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/service/flows.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/service/mod.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/service/search.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/settings.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/setup/auth_registry.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/setup/components.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/setup/db_metadata.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/setup/driver.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/setup/mod.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/setup/states.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/utils/db.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/utils/fingerprint.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/utils/immutable.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/utils/mod.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/utils/retryable.rs +0 -0
- {cocoindex-0.1.41 → cocoindex-0.1.43}/src/utils/yaml_ser.rs +0 -0
@@ -1,9 +1,10 @@
 Metadata-Version: 2.4
 Name: cocoindex
-Version: 0.1.
+Version: 0.1.43
 Requires-Dist: sentence-transformers>=3.3.1
 Requires-Dist: click>=8.1.8
 Requires-Dist: rich>=14.0.0
+Requires-Dist: python-dotenv>=1.1.0
 Requires-Dist: pytest ; extra == 'test'
 Provides-Extra: test
 License-File: LICENSE
@@ -154,7 +155,7 @@ It defines an index flow like this:
 | [Embeddings to Qdrant](examples/text_embedding_qdrant) | Index documents in a Qdrant collection for semantic search |
 | [FastAPI Server with Docker](examples/fastapi_server_docker) | Run the semantic search server in a Dockerized FastAPI setup |
 | [Product Recommendation](examples/product_recommendation) | Build real-time product recommendations with LLM and graph database|
-| [Image Search with Vision API](examples/
+| [Image Search with Vision API](examples/image_search) | Generates detailed captions for images using a vision model, embeds them, enables live-updating semantic search via FastAPI and served on a React frontend|
 
 More coming and stay tuned 👀!
 
@@ -138,7 +138,7 @@ It defines an index flow like this:
 | [Embeddings to Qdrant](examples/text_embedding_qdrant) | Index documents in a Qdrant collection for semantic search |
 | [FastAPI Server with Docker](examples/fastapi_server_docker) | Run the semantic search server in a Dockerized FastAPI setup |
 | [Product Recommendation](examples/product_recommendation) | Build real-time product recommendations with LLM and graph database|
-| [Image Search with Vision API](examples/
+| [Image Search with Vision API](examples/image_search) | Generates detailed captions for images using a vision model, embeds them, enables live-updating semantic search via FastAPI and served on a React frontend|
 
 More coming and stay tuned 👀!
 
@@ -101,4 +101,4 @@ As an indexing flow is long-lived, it needs to store intermediate data to keep t
 CocoIndex uses internal storage for this purpose.
 
 Currently, CocoIndex uses Postgres database as the internal storage.
-See [
+See [Settings](settings#databaseconnectionspec) for configuring its location, and `cocoindex setup` CLI command (see [CocoIndex CLI](cli)) creates tables for the internal storage.
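The hunk above points to the `DatabaseConnectionSpec` settings for the internal Postgres location. The Settings page added later in this diff notes that every value embedded in `url` must be URL-encoded when it contains special characters, and recommends the separate `user` and `password` fields for that reason. For cases where credentials do end up inside the URL, the encoding step might look like the following minimal sketch. It uses only the Python standard library; `build_postgres_url` and the sample password are hypothetical and are not part of CocoIndex.

```python
from urllib.parse import quote, unquote, urlsplit


def build_postgres_url(user: str, password: str, host: str, database: str) -> str:
    """Assemble a postgres:// URL with percent-encoded credentials.

    Hypothetical helper for illustration only; not a CocoIndex API.
    """
    # safe="" makes quote() encode "/" as well, which it leaves alone by default.
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return f"postgres://{encoded_user}:{encoded_password}@{host}/{database}"


# A made-up password containing characters that would otherwise break the URL.
url = build_postgres_url("cocoindex", "p@ss/w:rd", "localhost", "cocoindex")
print(url)

# Round trip: the parsed password is still percent-encoded until unquoted.
parts = urlsplit(url)
print(parts.hostname, unquote(parts.password))
```

Passing `user` and `password` separately, as the Settings page suggests, avoids this step entirely.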
@@ -0,0 +1,75 @@
+---
+title: CLI
+description: CocoIndex CLI
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# CocoIndex CLI
+
+CocoIndex CLI is a standalone tool for easily managing and inspecting your flows and indexes.
+
+## Invoke the CLI
+
+Once CocoIndex is installed, you can invoke the CLI directly using the `cocoindex` command. Most commands require an `APP_TARGET` argument, which tells the CLI where your flow definitions are located.
+
+### APP_TARGET Format
+
+The `APP_TARGET` can be:
+1. A **path to a Python file** defining your flows (e.g., `main.py`, `path/to/my_flows.py`).
+2. An **installed Python module name** that contains your flow definitions (e.g., `my_package.flows`).
+3. For commands that operate on a *specific flow* (like `show`, `update`, `evaluate`), you can combine the application reference with a flow name:
+* `path/to/my_flows.py:MyFlow`
+* `my_package.flows:MyFlow`
+
+### Environment Variables
+
+Environment variables are needed as CocoIndex library settings, as described in [CocoIndex Settings](settings#list-of-environment-variables).
+
+You can set environment variables in an environment file.
+
+* By default, the `cocoindex` CLI searches upward from the current directory for a `.env` file.
+* You can use `--env-file <path>` to specify one explicitly:
+
+```sh
+cocoindex --env-file path/to/custom.env <COMMAND> ...
+```
+
+Loaded variables do *NOT* override existing system ones.
+If no file is found, only existing system environment variables are used.
+
+### Global Options
+
+CocoIndex CLI supports the following global options:
+
+* `--env-file <path>`: Load environment variables from a specified `.env` file. If not provided, `.env` in the current directory is loaded if it exists.
+* `--version`: Show the CocoIndex version and exit.
+* `--help`: Show the main help message and exit.
+
+:::caution Deprecated Usage
+
+The old method of invoking the CLI using `python main.py cocoindex ...` via the `@cocoindex.main_fn()` decorator is now deprecated. Please remove `@cocoindex.main_fn()` from your scripts and use the standalone cocoindex command as described.
+
+:::
+
+## Subcommands
+
+The following subcommands are available:
+
+| Subcommand | Description |
+| ---------- | ----------- |
+| `ls` | List all flows present in the given file/module. Or list all persisted flows under the current app namespace if no file/module specified. |
+| `show` | Show the spec and schema for a specific flow. |
+| `setup` | Check and apply backend setup changes for flows, including the internal and target storage (to export). |
+| `drop` | Drop the backend setup for specified flows. |
+| `update` | Update the index defined by the flow. |
+| `evaluate` | Evaluate the flow and dump flow outputs to files. Instead of updating the index, it dumps what should be indexed to files. Mainly used for evaluation purpose. |
+| `server` | Start a HTTP server providing REST APIs. It will allow tools like CocoInsight to access the server. |
+
+Use `--help` to see the full list of subcommands, and `subcommand --help` to see the usage of a specific one.
+
+```sh
+cocoindex --help # Show all subcommands
+cocoindex show --help # Show usage of "show" subcommand
+```
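The new CLI page describes two environment-file behaviours: the CLI searches upward from the current directory for a `.env` file, and loaded variables do not override ones already set. The sketch below illustrates those two rules with the standard library only. It is not the code in `cli.py`, which is not shown in this excerpt. `find_env_file` and `apply_env_file` are hypothetical names, and the `KEY=VALUE` parser is deliberately simplified, with no `export` prefix, interpolation, or multi-line values.

```python
import os
from pathlib import Path
from typing import List, MutableMapping, Optional


def find_env_file(start: Path, name: str = ".env") -> Optional[Path]:
    """Return the nearest `name` in `start` or one of its ancestors, else None."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def apply_env_file(
    path: Path, environ: MutableMapping[str, str] = os.environ
) -> List[str]:
    """Copy KEY=VALUE pairs from `path` into `environ` without overriding.

    Returns the keys that were actually added. Simplified parser (assumption):
    blank lines, comments, and lines without "=" are skipped.
    """
    added: List[str] = []
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("\"'")  # drop simple surrounding quotes
        if key not in environ:  # existing variables win
            environ[key] = value
            added.append(key)
    return added
```

A caller would typically run `find_env_file(Path.cwd())` and pass the result to `apply_env_file` only when it is not `None`, which matches the documented fallback to the existing system environment. The `python-dotenv` package that this release adds as a dependency provides `find_dotenv` and `load_dotenv` (whose `override` parameter defaults to `False`) for the same purpose.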
@@ -35,6 +35,7 @@ This is the list of all basic types supported by CocoIndex:
 | Time | | `datetime.time` | `datetime.time` |
 | LocalDatetime | Date and time without timezone | `cocoindex.LocalDateTime` | `datetime.datetime` |
 | OffsetDatetime | Date and time with a timezone offset | `cocoindex.OffsetDateTime` | `datetime.datetime` |
+| TimeDelta | A duration of time | `datetime.timedelta` | `datetime.timedelta` |
 | Vector[*T*, *Dim*?] | *T* must be basic type. *Dim* is a positive integer and optional. |`cocoindex.Vector[T]` or `cocoindex.Vector[T, Dim]` | `list[T]` |
 | Json | | `cocoindex.Json` | Any data convertible to JSON by `json` package |
 
@@ -313,7 +313,7 @@ Following metrics are supported:
 
 ### Getting App Namespace
 
-You can use the [`app_namespace` setting](
+You can use the [`app_namespace` setting](settings#app-namespace) or `COCOINDEX_APP_NAMESPACE` environment variable to specify the app namespace,
 to organize flows across different environments (e.g., dev, staging, production), team members, etc.
 
 In the code, You can call `flow.get_app_namespace()` to get the app namespace, and use it to name certain backends. It takes the following arguments:
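The Settings page in this diff gives a concrete naming rule for the app namespace: with namespace `Staging`, a flow declared as `Flow1` has the full name `Staging.Flow1`, and with no namespace set the flow stays in a default unnamed namespace. The sketch below illustrates only that rule. `full_flow_name` is a hypothetical helper, not a CocoIndex function, and reading `COCOINDEX_APP_NAMESPACE` directly from the environment stands in for whatever the library does internally.

```python
import os


def full_flow_name(flow_name: str, app_namespace: str = "") -> str:
    """Prefix `flow_name` with the namespace, if one is set.

    Hypothetical helper mirroring the documented `Staging.Flow1` example.
    """
    return f"{app_namespace}.{flow_name}" if app_namespace else flow_name


# Fall back to the unnamed namespace when the variable is absent.
namespace = os.environ.get("COCOINDEX_APP_NAMESPACE", "")
print(full_flow_name("Flow1", namespace))
```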
@@ -1,5 +1,5 @@
 ---
-title: Flow
+title: Run a Flow
 toc_max_heading_level: 4
 description: Run a CocoIndex Flow, including build / update data in the target storage and evaluate the flow without changing the target storage.
 ---
@@ -7,7 +7,7 @@ description: Run a CocoIndex Flow, including build / update data in the target s
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 
-#
+# Run a CocoIndex Flow
 
 After a flow is defined as discussed in [Flow Definition](/docs/core/flow_def), you can start to transform data with it.
 
@@ -30,17 +30,7 @@ def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataSco
 ```
 
 It creates a `demo_flow` object in `cocoindex.Flow` type.
-To enable CLI, you also need to make sure you have a main function decorated with `@cocoindex.main_fn()`:
 
-
-```python title="main.py"
-@cocoindex.main_fn()
-def main():
-    ...
-
-if __name__ == "__main__":
-    main()
-```
 </TabItem>
 </Tabs>
 
@@ -78,7 +68,7 @@ The `cocoindex update` subcommand creates/updates data in the target storage.
 Once it's done, the target data is fresh up to the moment when the function is called.
 
 ```sh
-
+cocoindex update main.py
 ```
 
 #### Library API
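The updated command above passes `main.py` as the `APP_TARGET`. According to the CLI page in this diff, a target is either a Python file path or an installed module name, optionally followed by `:FlowName` for commands that act on one flow. The following sketch shows one way such a string could be split. It is not the parser in `cli.py`; `AppTarget`, `split_app_target`, and `looks_like_path` are hypothetical names, and the sketch assumes flow names are valid Python identifiers.

```python
from typing import NamedTuple, Optional


class AppTarget(NamedTuple):
    app_ref: str              # file path or module name
    flow_name: Optional[str]  # None when no ":FlowName" suffix is present


def split_app_target(target: str) -> AppTarget:
    """Split 'app_ref[:FlowName]' at the last colon.

    Assumption: a flow name is a Python identifier, so a colon that is
    followed by anything else (e.g. a Windows drive path) is left alone.
    """
    app_ref, sep, flow_name = target.rpartition(":")
    if sep and app_ref and flow_name.isidentifier():
        return AppTarget(app_ref, flow_name)
    return AppTarget(target, None)


def looks_like_path(app_ref: str) -> bool:
    """Heuristic (assumption): treat '.py' suffixes or path separators as files."""
    return app_ref.endswith(".py") or "/" in app_ref or "\\" in app_ref


for example in ("main.py", "path/to/my_flows.py:MyFlow", "my_package.flows:MyFlow"):
    parsed = split_app_target(example)
    kind = "file" if looks_like_path(parsed.app_ref) else "module"
    print(example, "->", kind, parsed)
```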
@@ -115,7 +105,7 @@ Change capture mechanisms enable CocoIndex to continuously capture changes from
 To perform live update, run the `cocoindex update` subcommand with `-L` option:
 
 ```sh
-
+cocoindex update main.py -L
 ```
 
 If there's at least one data source with change capture mechanism enabled, it will keep running until the aborted (e.g. by `Ctrl-C`).
@@ -232,7 +222,7 @@ It takes the following options:
 Example:
 
 ```sh
-
+cocoindex evaluate main.py --output-dir ./eval_output
 ```
 
 ### Library API
@@ -0,0 +1,116 @@
|
|
1
|
+
---
|
2
|
+
title: CocoIndex Settings
|
3
|
+
description: Provide settings for CocoIndex, e.g. database connection, app namespace, etc.
|
4
|
+
---
|
5
|
+
|
6
|
+
import Tabs from '@theme/Tabs';
|
7
|
+
import TabItem from '@theme/TabItem';
|
8
|
+
|
9
|
+
# CocoIndex Settings
|
10
|
+
|
11
|
+
Certain settings need to be provided for CocoIndex to work, e.g. database connections, app namespace, etc.
|
12
|
+
|
13
|
+
## Launch CocoIndex
|
14
|
+
|
15
|
+
You have two ways to launch CocoIndex:
|
16
|
+
|
17
|
+
* Use [Cocoindex CLI](cli). It's handy for most routine indexing building and management tasks.
|
18
|
+
It will load settings from environment variables, either already set in your environment, or specified in `.env` file.
|
19
|
+
See [CLI](cli#environment-variables) for more details.
|
20
|
+
|
21
|
+
* Call CocoIndex functionality from your own Python application or library.
|
22
|
+
It's needed when you want to leverage CocoIndex support for query, or have your custom logic to trigger indexing, etc.
|
23
|
+
|
24
|
+
<Tabs>
|
25
|
+
<TabItem value="python" label="Python" default>
|
26
|
+
|
27
|
+
You need to explicitly call `cocoindex.init()` before doing anything with CocoIndex, and settings will be loaded during the call.
|
28
|
+
|
29
|
+
* If it's called without any argument, it will load settings from environment variables.
|
30
|
+
Only existing environment variables already set in your environment will be used.
|
31
|
+
If you want to load environment variables from a specific `.env` file, consider call `load_dotenv()` provided by the [`python-dotenv`](https://github.com/theskumar/python-dotenv) package.
|
32
|
+
|
33
|
+
```py
|
34
|
+
from dotenv import load_dotenv
|
35
|
+
import cocoindex
|
36
|
+
|
37
|
+
load_dotenv()
|
38
|
+
cocoindex.init()
|
39
|
+
```
|
40
|
+
|
41
|
+
* It takes an optional `cocoindex.Settings` dataclass object as argument, so you can also construct settings explicitly and pass to it:
|
42
|
+
|
43
|
+
```py
|
44
|
+
import cocoindex
|
45
|
+
|
46
|
+
cocoindex.init(
|
47
|
+
cocoindex.Settings(
|
48
|
+
database=cocoindex.DatabaseConnectionSpec(
|
49
|
+
url="postgres://cocoindex:cocoindex@localhost/cocoindex"
|
50
|
+
)
|
51
|
+
)
|
52
|
+
)
|
53
|
+
```
|
54
|
+
</TabItem>
|
55
|
+
</Tabs>
|
56
|
+
|
57
|
+
## List of Settings
|
58
|
+
|
59
|
+
`cocoindex.Settings` is a dataclass that contains the following fields:
|
60
|
+
|
61
|
+
* `app_namespace` (type: `str`, required): The namespace of the application.
|
62
|
+
* `database` (type: `DatabaseConnectionSpec`, required): The connection to the Postgres database.
|
63
|
+
|
64
|
+
### App Namespace
|
65
|
+
|
66
|
+
The `app_namespace` field helps organize flows across different environments (e.g., dev, staging, production), team members, etc. When set, it prefixes flow names with the namespace.
|
67
|
+
|
68
|
+
For example, if the namespace is `Staging`, for a flow with name specified as `Flow1` in code, the full name of the flow will be `Staging.Flow1`.
|
69
|
+
You can also get the current app namespace by calling `cocoindex.get_app_namespace()` (see [Getting App Namespace](flow_def#getting-app-namespace) for more details).
|
70
|
+
|
71
|
+
If not set, all flows are in a default unnamed namespace.
|
72
|
+
|
73
|
+
*Environment variable*: `COCOINDEX_APP_NAMESPACE`
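
The naming rule described in this section can be sketched as follows. `full_flow_name` is a hypothetical helper written only for illustration and is not part of the CocoIndex API. It assumes the `Namespace.FlowName` form shown above, and that a flow in the default unnamed namespace keeps its plain name.

```python
def full_flow_name(flow_name: str, app_namespace: str = "") -> str:
    """Illustrative only: prefix the flow name with the namespace when one is set."""
    if not app_namespace:
        # Assumption: with no namespace, the flow name is used unchanged.
        return flow_name
    return f"{app_namespace}.{flow_name}"


print(full_flow_name("Flow1", "Staging"))
print(full_flow_name("Flow1"))
```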
|
74
|
+
|
75
|
+
### DatabaseConnectionSpec
|
76
|
+
|
77
|
+
`DatabaseConnectionSpec` configures the connection to a database. Only Postgres is supported for now. It has the following fields:
|
78
|
+
|
79
|
+
* `url` (type: `str`, required): The URL of the Postgres database to use as the internal storage, e.g. `postgres://cocoindex:cocoindex@localhost/cocoindex`.
|
80
|
+
|
81
|
+
*Environment variable* for `Settings.database.url`: `COCOINDEX_DATABASE_URL`
|
82
|
+
|
83
|
+
* `user` (type: `str`, optional): The username for the Postgres database. If not provided, username will come from `url`.
|
84
|
+
|
85
|
+
*Environment variable* for `Settings.database.user`: `COCOINDEX_DATABASE_USER`
|
86
|
+
|
87
|
+
* `password` (type: `str`, optional): The password for the Postgres database. If not provided, password will come from `url`.
|
88
|
+
|
89
|
+
*Environment variable* for `Settings.database.password`: `COCOINDEX_DATABASE_PASSWORD`
|
90
|
+
|
91
|
+
:::tip
|
92
|
+
|
93
|
+
Note that all values in `url` need to be URL-encoded if they contain special characters.
|
94
|
+
For this reason, prefer the separate `user` and `password` fields for the username and password.
|
95
|
+
|
96
|
+
:::
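
If you do need to embed credentials in `url`, the sketch below shows one way to percent-encode them with the Python standard library. `build_postgres_url` is a hypothetical helper, not something CocoIndex provides, and the host, port, and database values are placeholders.

```python
from urllib.parse import quote


def build_postgres_url(user: str, password: str, host: str, database: str, port: int = 5432) -> str:
    """Illustrative only: percent-encode the credentials before placing them in the URL."""
    # safe="" makes quote() also encode "/", which it leaves untouched by default.
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return f"postgres://{encoded_user}:{encoded_password}@{host}:{port}/{database}"


print(build_postgres_url("cocoindex", "p@ss:w/rd", "localhost", "cocoindex"))
```

Passing the raw values through the separate `user` and `password` fields avoids this step entirely.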
|
97
|
+
|
98
|
+
:::info
|
99
|
+
|
100
|
+
If you use the Postgres database hosted by [Supabase](https://supabase.com/), please click **Connect** on your project dashboard and find the following URL:
|
101
|
+
|
102
|
+
* If you're on an IPv6 network, use the URL under **Direct connection**. You can visit [IPv6 test](https://test-ipv6.com/) to see if you have an IPv6 Internet connection.
|
103
|
+
* Otherwise, use the URL under **Session pooler**.
|
104
|
+
|
105
|
+
:::
|
106
|
+
|
107
|
+
## List of Environment Variables
|
108
|
+
|
109
|
+
This is the list of environment variables, each of which has a corresponding field in `Settings`:
|
110
|
+
|
111
|
+
| environment variable | corresponding field in `Settings` | required? |
|
112
|
+
|---------------------|-------------------|----------|
|
113
|
+
| `COCOINDEX_DATABASE_URL` | `database.url` | Yes |
|
114
|
+
| `COCOINDEX_DATABASE_USER` | `database.user` | No |
|
115
|
+
| `COCOINDEX_DATABASE_PASSWORD` | `database.password` | No |
|
116
|
+
| `COCOINDEX_APP_NAMESPACE` | `app_namespace` | No |
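
As a rough illustration of the mapping in this table, the sketch below collects the variables into a dictionary keyed by the corresponding `Settings` field. It is not how CocoIndex loads its settings internally; `ENV_TO_FIELD` and `read_settings_from_env` are hypothetical names used only here.

```python
import os
from typing import Mapping

# Environment variable -> (field in Settings, required?), copied from the table above.
ENV_TO_FIELD = {
    "COCOINDEX_DATABASE_URL": ("database.url", True),
    "COCOINDEX_DATABASE_USER": ("database.user", False),
    "COCOINDEX_DATABASE_PASSWORD": ("database.password", False),
    "COCOINDEX_APP_NAMESPACE": ("app_namespace", False),
}


def read_settings_from_env(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Illustrative only: gather the variables that are set, keyed by Settings field."""
    settings: dict[str, str] = {}
    for var, (field, required) in ENV_TO_FIELD.items():
        value = env.get(var)
        if value is None:
            if required:
                raise KeyError(f"{var} is required but not set")
            continue
        settings[field] = value
    return settings
```

Calling `read_settings_from_env()` with no argument reads the current process environment; passing a plain dictionary is convenient for testing.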
|
@@ -46,7 +46,7 @@ We'll need to install a bunch of dependencies for this project.
|
|
46
46
|
2. Prepare input files for the index. Put them in a directory, e.g. `markdown_files`.
|
47
47
|
If you don't have any files at hand, you may download the example [markdown_files.zip](markdown_files.zip) and unzip it in the current directory.
|
48
48
|
|
49
|
-
## Step 2:
|
49
|
+
## Step 2: Define the indexing flow
|
50
50
|
|
51
51
|
Create a new file `quickstart.py` and import the `cocoindex` library:
|
52
52
|
|
@@ -54,11 +54,7 @@ Create a new file `quickstart.py` and import the `cocoindex` library:
|
|
54
54
|
import cocoindex
|
55
55
|
```
|
56
56
|
|
57
|
-
Then we'll create the indexing flow.
|
58
|
-
|
59
|
-
### Step 2.1: Define the indexing flow
|
60
|
-
|
61
|
-
Starting from the indexing flow:
|
57
|
+
Then we'll create the indexing flow as follows.
|
62
58
|
|
63
59
|
```python title="quickstart.py"
|
64
60
|
@cocoindex.flow_def(name="TextEmbedding")
|
@@ -117,24 +113,6 @@ Notes:
|
|
117
113
|
|
118
114
|
6. In CocoIndex, a *collector* collects multiple entries of data together. In this example, the `doc_embeddings` collector collects data from all `chunk`s across all `doc`s, and uses the collected data to build a vector index `"doc_embeddings"`, using `Postgres`.
|
119
115
|
|
120
|
-
### Step 2.2: Define the main function
|
121
|
-
|
122
|
-
We can provide an empty main function for now, with a `@cocoindex.main_fn()` decorator:
|
123
|
-
|
124
|
-
```python title="quickstart.py"
|
125
|
-
@cocoindex.main_fn()
|
126
|
-
def _main():
|
127
|
-
pass
|
128
|
-
|
129
|
-
if __name__ == "__main__":
|
130
|
-
_main()
|
131
|
-
```
|
132
|
-
|
133
|
-
The `@cocoindex.main_fn` declares a function as the main function for an indexing application. This achieves the following effects:
|
134
|
-
|
135
|
-
* Initialize the CocoIndex library states. Settings (e.g. database URL) are loaded from environment variables by default.
|
136
|
-
* When the CLI is invoked with `cocoindex` subcommand, `cocoindex CLI` takes over the control, which provides convenient ways to manage the index. See the next step for more details.
|
137
|
-
|
138
116
|
## Step 3: Run the indexing pipeline and queries
|
139
117
|
|
140
118
|
Specify the database URL by environment variable:
|
@@ -148,7 +126,7 @@ export COCOINDEX_DATABASE_URL="postgresql://cocoindex:cocoindex@localhost:5432/c
|
|
148
126
|
We need to set up the index:
|
149
127
|
|
150
128
|
```bash
|
151
|
-
|
129
|
+
cocoindex setup quickstart.py
|
152
130
|
```
|
153
131
|
|
154
132
|
Enter `yes` and it will automatically create a few tables in the database.
|
@@ -160,7 +138,7 @@ Now we have tables needed by this CocoIndex flow.
|
|
160
138
|
Now we're ready to build the index:
|
161
139
|
|
162
140
|
```bash
|
163
|
-
|
141
|
+
cocoindex update quickstart.py
|
164
142
|
```
|
165
143
|
|
166
144
|
It will run for a few seconds and output the following statistics:
|
@@ -260,13 +238,15 @@ There're two CocoIndex-specific logic:
|
|
260
238
|
It's done by the `eval()` method of the transform flow `text_to_embedding`.
|
261
239
|
The return type of this method is `list[float]` as declared in the `text_to_embedding()` function (`cocoindex.DataSlice[list[float]]`).
|
262
240
|
|
263
|
-
### Step 4.3:
|
241
|
+
### Step 4.3: Add the main script logic
|
264
242
|
|
265
|
-
Now we can
|
243
|
+
Now we can add the main logic to the program. It uses the query function we just defined:
|
266
244
|
|
267
245
|
```python title="quickstart.py"
|
268
|
-
|
269
|
-
|
246
|
+
if __name__ == "__main__":
|
247
|
+
# Initialize CocoIndex library states
|
248
|
+
cocoindex.init()
|
249
|
+
|
270
250
|
# Initialize the database connection pool.
|
271
251
|
pool = ConnectionPool(os.getenv("COCOINDEX_DATABASE_URL"))
|
272
252
|
# Run queries in a loop to demonstrate the query capabilities.
|
@@ -291,7 +271,7 @@ It interacts with users and search the database by calling the `search()` method
|
|
291
271
|
|
292
272
|
### Step 4.4: Run queries against the index
|
293
273
|
|
294
|
-
Now we can run the same Python file, which will run the new main
|
274
|
+
Now we can run the same Python file, which will run the newly added main logic:
|
295
275
|
|
296
276
|
```bash
|
297
277
|
python quickstart.py
|
@@ -37,7 +37,7 @@ It should be a unique table, meaning that no other export target should export t
|
|
37
37
|
The spec takes the following fields:
|
38
38
|
|
39
39
|
* `database` (type: [auth reference](../core/flow_def#auth-registry) to `DatabaseConnectionSpec`, optional): The connection to the Postgres database.
|
40
|
-
See [DatabaseConnectionSpec](../core/
|
40
|
+
See [DatabaseConnectionSpec](../core/settings#databaseconnectionspec) for its specific fields.
|
41
41
|
If not provided, will use the same database as the [internal storage](/docs/core/basics#internal-storage).
|
42
42
|
|
43
43
|
* `table_name` (type: `str`, optional): The name of the table to store to. If unspecified, will use the table name `[${AppNamespace}__]${FlowName}__${TargetName}`, e.g. `DemoFlow__doc_embeddings` or `Staging__DemoFlow__doc_embeddings`.
|
@@ -419,7 +419,7 @@ The `Neo4j` storage exports each row as a relationship to Neo4j Knowledge Graph.
|
|
419
419
|
Neo4j also provides a declaration spec `Neo4jDeclaration`, to configure indexing options for nodes only referenced by relationships. It has the following fields:
|
420
420
|
|
421
421
|
* `connection` (type: auth reference to `Neo4jConnectionSpec`)
|
422
|
-
* Fields for [nodes to declare](#
|
422
|
+
* Fields for [nodes to declare](#declare-extra-node-labels), including
|
423
423
|
* `nodes_label` (required)
|
424
424
|
* `primary_key_fields` (required)
|
425
425
|
* `vector_indexes` (optional)
|
@@ -1604,6 +1604,21 @@
|
|
1604
1604
|
react-helmet-async "npm:@slorber/react-helmet-async@*"
|
1605
1605
|
react-loadable "npm:@docusaurus/react-loadable@6.0.0"
|
1606
1606
|
|
1607
|
+
"@docusaurus/plugin-client-redirects@^3.7.0":
|
1608
|
+
version "3.7.0"
|
1609
|
+
resolved "https://registry.yarnpkg.com/@docusaurus/plugin-client-redirects/-/plugin-client-redirects-3.7.0.tgz#b5cf92529768c457c01ad350bfc50862c6149463"
|
1610
|
+
integrity sha512-6B4XAtE5ZVKOyhPgpgMkb7LwCkN+Hgd4vOnlbwR8nCdTQhLjz8MHbGlwwvZ/cay2SPNRX5KssqKAlcHVZP2m8g==
|
1611
|
+
dependencies:
|
1612
|
+
"@docusaurus/core" "3.7.0"
|
1613
|
+
"@docusaurus/logger" "3.7.0"
|
1614
|
+
"@docusaurus/utils" "3.7.0"
|
1615
|
+
"@docusaurus/utils-common" "3.7.0"
|
1616
|
+
"@docusaurus/utils-validation" "3.7.0"
|
1617
|
+
eta "^2.2.0"
|
1618
|
+
fs-extra "^11.1.1"
|
1619
|
+
lodash "^4.17.21"
|
1620
|
+
tslib "^2.6.0"
|
1621
|
+
|
1607
1622
|
"@docusaurus/plugin-content-blog@3.7.0":
|
1608
1623
|
version "3.7.0"
|
1609
1624
|
resolved "https://registry.yarnpkg.com/@docusaurus/plugin-content-blog/-/plugin-content-blog-3.7.0.tgz#7bd69de87a1f3adb652e1473ef5b7ccc9468f47e"
|
@@ -40,7 +40,7 @@ pip install -e .
|
|
40
40
|
Setup:
|
41
41
|
|
42
42
|
```sh
|
43
|
-
|
43
|
+
cocoindex setup main.py
|
44
44
|
```
|
45
45
|
|
46
46
|
Run:
|
@@ -59,13 +59,13 @@ CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3 minute vi
|
|
59
59
|
Run CocoInsight to understand your RAG data pipeline:
|
60
60
|
|
61
61
|
```sh
|
62
|
-
|
62
|
+
cocoindex server -ci main.py
|
63
63
|
```
|
64
64
|
|
65
65
|
You can also add a `-L` flag to make the server keep updating the index to reflect source changes at the same time:
|
66
66
|
|
67
67
|
```sh
|
68
|
-
|
68
|
+
cocoindex server -ci -L main.py
|
69
69
|
```
|
70
70
|
|
71
71
|
Then open the CocoInsight UI at [https://cocoindex.io/cocoinsight](https://cocoindex.io/cocoinsight).
|
@@ -52,8 +52,7 @@ query_handler = cocoindex.query.SimpleSemanticsQueryHandler(
|
|
52
52
|
model="sentence-transformers/all-MiniLM-L6-v2")),
|
53
53
|
default_similarity_metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)
|
54
54
|
|
55
|
-
|
56
|
-
def _run():
|
55
|
+
def _main():
|
57
56
|
# Use a `FlowLiveUpdater` to keep the flow data updated.
|
58
57
|
with cocoindex.FlowLiveUpdater(amazon_s3_text_embedding_flow):
|
59
58
|
# Run queries in a loop to demonstrate the query capabilities.
|
@@ -73,5 +72,6 @@ def _run():
|
|
73
72
|
break
|
74
73
|
|
75
74
|
if __name__ == "__main__":
|
76
|
-
load_dotenv(
|
77
|
-
|
75
|
+
load_dotenv()
|
76
|
+
cocoindex.init()
|
77
|
+
_main()
|
@@ -3,7 +3,7 @@ name = "amazon-s3-text-embedding"
|
|
3
3
|
version = "0.1.0"
|
4
4
|
description = "Simple example for cocoindex: build embedding index based on Amazon S3 files."
|
5
5
|
requires-python = ">=3.11"
|
6
|
-
dependencies = ["cocoindex>=0.1.
|
6
|
+
dependencies = ["cocoindex>=0.1.42", "python-dotenv>=1.0.1"]
|
7
7
|
|
8
8
|
[tool.setuptools]
|
9
9
|
packages = []
|
@@ -0,0 +1,71 @@
|
|
1
|
+
# Build real-time index for codebase
|
2
|
+
[](https://github.com/cocoindex-io/cocoindex)
|
3
|
+
|
4
|
+
CocoIndex provides built-in support for codebase chunking, using Tree-sitter to respect syntax boundaries. In this example, we will build a real-time index for a codebase using CocoIndex.
|
5
|
+
|
6
|
+
We appreciate a star ⭐ at [CocoIndex Github](https://github.com/cocoindex-io/cocoindex) if this is helpful.
|
7
|
+
|
8
|
+

|
9
|
+
|
10
|
+
[Tree-sitter](https://en.wikipedia.org/wiki/Tree-sitter_%28parser_generator%29) is a parser generator tool and an incremental parsing library. It is available in Rust 🦀 - [GitHub](https://github.com/tree-sitter/tree-sitter). CocoIndex has built-in Rust integration with Tree-sitter to efficiently parse code and extract syntax trees for various programming languages. Check out the list of supported languages [here](https://cocoindex.io/docs/ops/functions#splitrecursively) - in the `language` section.
|
11
|
+
|
12
|
+
|
13
|
+
## Tutorials
|
14
|
+
- Step by step tutorial - Check out the [blog](https://cocoindex.io/blogs/index-code-base-for-rag).
|
15
|
+
- Video tutorial - [Youtube](https://youtu.be/G3WstvhHO24?si=Bnxu67Ax5Lv8b-J2).
|
16
|
+
|
17
|
+
## Steps
|
18
|
+
|
19
|
+
### Indexing Flow
|
20
|
+
<p align='center'>
|
21
|
+
<img width="434" alt="Screenshot 2025-05-19 at 10 14 36 PM" src="https://github.com/user-attachments/assets/3a506034-698f-480a-b653-22184dae4e14" />
|
22
|
+
</p>
|
23
|
+
|
24
|
+
1. We will ingest the CocoIndex codebase.
|
25
|
+
2. For each file, perform chunking (Tree-sitter) and then embedding.
|
26
|
+
3. We will save the embeddings and the metadata in Postgres with PGVector.
|
27
|
+
|
28
|
+
### Query:
|
29
|
+
We will match against user-provided text by a SQL query, reusing the embedding operation in the indexing flow.
|
30
|
+
|
31
|
+
|
32
|
+
## Prerequisite
|
33
|
+
[Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.
|
34
|
+
|
35
|
+
## Run
|
36
|
+
|
37
|
+
- Install dependencies:
|
38
|
+
```bash
|
39
|
+
pip install -e .
|
40
|
+
```
|
41
|
+
|
42
|
+
- Setup:
|
43
|
+
|
44
|
+
```bash
|
45
|
+
cocoindex setup main.py
|
46
|
+
```
|
47
|
+
|
48
|
+
- Update index:
|
49
|
+
|
50
|
+
```bash
|
51
|
+
cocoindex update main.py
|
52
|
+
```
|
53
|
+
|
54
|
+
- Run:
|
55
|
+
|
56
|
+
```bash
|
57
|
+
python main.py
|
58
|
+
```
|
59
|
+
|
60
|
+
## CocoInsight
|
61
|
+
I used CocoInsight (Free beta now) to troubleshoot the index generation and understand the data lineage of the pipeline.
|
62
|
+
It just connects to your local CocoIndex server, with zero pipeline data retention. Run the following command to start CocoInsight:
|
63
|
+
|
64
|
+
```
|
65
|
+
cocoindex server -ci main.py
|
66
|
+
```
|
67
|
+
|
68
|
+
Then open the CocoInsight UI at [https://cocoindex.io/cocoinsight](https://cocoindex.io/cocoinsight).
|
69
|
+
|
70
|
+
<img width="1305" alt="Chunking Visualization" src="https://github.com/user-attachments/assets/8e83b9a4-2bed-456b-83e5-b5381b28b84a" />
|
71
|
+
|