cocoindex 0.1.82__tar.gz → 0.2.0__tar.gz
- {cocoindex-0.1.82 → cocoindex-0.2.0}/Cargo.lock +1 -1
- {cocoindex-0.1.82 → cocoindex-0.2.0}/Cargo.toml +1 -1
- {cocoindex-0.1.82 → cocoindex-0.2.0}/PKG-INFO +1 -1
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/core/basics.md +1 -1
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/core/data_types.mdx +16 -10
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/examples/examples/academic_papers_index.md +61 -55
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/examples/examples/codebase_index.md +71 -49
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/examples/examples/custom_targets.md +17 -21
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/examples/examples/docs_to_knowledge_graph.md +62 -26
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/examples/examples/image_search.md +1 -1
- cocoindex-0.2.0/docs/docs/examples/examples/manual_extraction.md +249 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/examples/examples/multi_format_index.md +1 -1
- cocoindex-0.2.0/docs/docs/examples/examples/patient_form_extraction.md +296 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/examples/examples/photo_search.md +37 -16
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/examples/examples/product_recommendation.md +88 -67
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/examples/examples/simple_vector_index.md +45 -24
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/getting_started/quickstart.md +1 -1
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/ops/sources.md +43 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/src/components/GitHubButton/index.tsx +26 -6
- cocoindex-0.2.0/docs/static/img/examples/academic_papers_index/abstract_chunks.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/academic_papers_index/basic_info.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/academic_papers_index/chunk_embedding.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/academic_papers_index/cover.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/academic_papers_index/first_page.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/academic_papers_index/flow.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/academic_papers_index/metadata.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/codebase_index/chunk.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/codebase_index/cover.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/codebase_index/flow.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/codebase_index/usecase.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/custom_targets/convert.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/custom_targets/cover.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/docs_to_knowledge_graph/cover.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/docs_to_knowledge_graph/dedupe.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/docs_to_knowledge_graph/export_document.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/docs_to_knowledge_graph/export_relationship.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/docs_to_knowledge_graph/extract_relationship.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/docs_to_knowledge_graph/flow.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/docs_to_knowledge_graph/relationship.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/docs_to_knowledge_graph/summary.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/image_search/cover.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/manual_extraction/cover.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/manual_extraction/extraction.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/manual_extraction/flow.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/manual_extraction/summary.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/multi_format_index/cover.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/patient_form_extraction/cover.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/patient_form_extraction/extraction.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/patient_form_extraction/fields.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/patient_form_extraction/flow.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/patient_form_extraction/tomarkdown.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/photo_search/cover.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/photo_search/extraction.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/photo_search/flow.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/product_recommendation/cover.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/product_recommendation/dedupe.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/product_recommendation/export_all.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/product_recommendation/export_product.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/product_recommendation/export_taxonomy.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/product_recommendation/extract_product.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/product_recommendation/extract_taxonomy.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/product_recommendation/neo4j.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/product_recommendation/parse_json.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/product_recommendation/taxonomy.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/simple_vector_index/chunk.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/simple_vector_index/cover.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/simple_vector_index/embed.png +0 -0
- cocoindex-0.2.0/docs/static/img/examples/simple_vector_index/flow.png +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/custom_output_files/README.md +1 -2
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/manuals_llm_extraction/pyproject.toml +1 -1
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/paper_metadata/README.md +1 -1
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/paper_metadata/pyproject.toml +2 -2
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/pdf_embedding/pyproject.toml +1 -1
- cocoindex-0.2.0/examples/postgres_source/.env +7 -0
- cocoindex-0.2.0/examples/postgres_source/.env.example +22 -0
- cocoindex-0.2.0/examples/postgres_source/README.md +60 -0
- cocoindex-0.2.0/examples/postgres_source/main.py +134 -0
- cocoindex-0.2.0/examples/postgres_source/prepare_source_data.sql +94 -0
- cocoindex-0.2.0/examples/postgres_source/pyproject.toml +9 -0
- cocoindex-0.1.82/examples/product_recommendation/.env → cocoindex-0.2.0/examples/product_recommendation/.env.example +2 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/convert.py +63 -46
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/flow.py +14 -6
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/op.py +10 -24
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/sources.py +20 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/typing.py +37 -22
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/base/schema.rs +35 -37
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/base/spec.rs +0 -10
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/base/value.rs +221 -77
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/builder/analyzer.rs +22 -42
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/builder/exec_ctx.rs +44 -19
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/builder/flow_builder.rs +32 -3
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/builder/plan.rs +2 -1
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/execution/db_tracking.rs +55 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/execution/db_tracking_setup.rs +36 -15
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/execution/dumper.rs +17 -14
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/execution/evaluator.rs +16 -13
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/execution/row_indexer.rs +337 -364
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/execution/source_indexer.rs +22 -11
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/execution/stats.rs +1 -1
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/lib_context.rs +1 -1
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/functions/split_recursively.rs +16 -10
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/interface.rs +7 -7
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/mod.rs +1 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/registration.rs +38 -5
- cocoindex-0.2.0/src/ops/registry.rs +87 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/sdk.rs +0 -1
- cocoindex-0.2.0/src/ops/shared/mod.rs +1 -0
- cocoindex-0.2.0/src/ops/shared/postgres.rs +74 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/sources/amazon_s3.rs +12 -11
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/sources/azure_blob.rs +11 -11
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/sources/google_drive.rs +12 -12
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/sources/local_file.rs +13 -15
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/sources/mod.rs +1 -0
- cocoindex-0.2.0/src/ops/sources/postgres.rs +558 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/targets/kuzu.rs +10 -11
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/targets/neo4j.rs +2 -3
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/targets/postgres.rs +7 -73
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/targets/shared/property_graph.rs +1 -1
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/py/convert.rs +32 -21
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/service/flows.rs +17 -22
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/setup/driver.rs +128 -83
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/setup/states.rs +32 -10
- cocoindex-0.2.0/src/utils/db.rs +16 -0
- cocoindex-0.1.82/docs/docs/examples/examples/manual_extraction.md +0 -274
- cocoindex-0.1.82/docs/docs/examples/examples/patient_form_extraction.md +0 -271
- cocoindex-0.1.82/docs/static/img/examples/academic_papers_index.png +0 -0
- cocoindex-0.1.82/docs/static/img/examples/codebase_index.png +0 -0
- cocoindex-0.1.82/docs/static/img/examples/custom_targets.png +0 -0
- cocoindex-0.1.82/docs/static/img/examples/docs_to_knowledge_graph.png +0 -0
- cocoindex-0.1.82/docs/static/img/examples/image_search.png +0 -0
- cocoindex-0.1.82/docs/static/img/examples/manual_extraction.png +0 -0
- cocoindex-0.1.82/docs/static/img/examples/multi_format_index.png +0 -0
- cocoindex-0.1.82/docs/static/img/examples/patient_form_extraction.png +0 -0
- cocoindex-0.1.82/docs/static/img/examples/photo_search.png +0 -0
- cocoindex-0.1.82/docs/static/img/examples/product_recommendation.png +0 -0
- cocoindex-0.1.82/docs/static/img/examples/simple_vector_index.png +0 -0
- cocoindex-0.1.82/src/ops/registry.rs +0 -38
- cocoindex-0.1.82/src/utils/db.rs +0 -45
- {cocoindex-0.1.82 → cocoindex-0.2.0}/.cargo/config.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/.env.lib_debug +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/.github/ISSUE_TEMPLATE/🐛-bug-report.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/.github/ISSUE_TEMPLATE/💡-feature-request.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/.github/scripts/update_version.sh +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/.github/workflows/CI.yml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/.github/workflows/_doc_release.yml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/.github/workflows/_test.yml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/.github/workflows/docs.yml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/.github/workflows/format.yml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/.github/workflows/release.yml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/.gitignore +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/.pre-commit-config.yaml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/CODE_OF_CONDUCT.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/CONTRIBUTING.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/LICENSE +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/dev/neo4j.yaml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/dev/postgres.yaml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/.gitignore +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/about/community.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/ai/llm.mdx +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/contributing/guide.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/contributing/new_built_in_target.mdx +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/contributing/setup_dev_environment.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/core/cli.mdx +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/core/data_example.svg +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/core/flow_def.mdx +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/core/flow_example.svg +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/core/flow_methods.mdx +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/core/settings.mdx +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/custom_ops/custom_functions.mdx +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/custom_ops/custom_targets.mdx +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/examples/index.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/getting_started/installation.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/getting_started/markdown_files.zip +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/getting_started/overview.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/ops/functions.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/ops/targets.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/query.mdx +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/tutorials/live_updates.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docs/tutorials/manage_flow_dynamically.mdx +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/docusaurus.config.ts +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/package.json +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/sidebars.ts +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/src/css/custom.css +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/src/theme/DocCard/index.tsx +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/src/theme/DocCard/styles.module.css +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/src/theme/DocCardList/index.tsx +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/src/theme/DocCardList/styles.module.css +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/src/theme/Root.js +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/static/.nojekyll +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/static/img/docusaurus.png +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/static/img/favicon.ico +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/static/img/icon.svg +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/static/img/incremental-etl.gif +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/static/robots.txt +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/tsconfig.json +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/docs/yarn.lock +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/amazon_s3_embedding/.env.example +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/amazon_s3_embedding/.gitignore +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/amazon_s3_embedding/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/amazon_s3_embedding/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/amazon_s3_embedding/pyproject.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/azure_blob_embedding/.env.example +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/azure_blob_embedding/.gitignore +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/azure_blob_embedding/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/azure_blob_embedding/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/azure_blob_embedding/pyproject.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/code_embedding/.env +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/code_embedding/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/code_embedding/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/code_embedding/pyproject.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/custom_output_files/.env +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/custom_output_files/.gitignore +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/custom_output_files/data/bizarre_animals.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/custom_output_files/data/chunk_norris.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/custom_output_files/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/custom_output_files/pyproject.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/docs_to_knowledge_graph/.env +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/docs_to_knowledge_graph/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/docs_to_knowledge_graph/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/docs_to_knowledge_graph/pyproject.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/face_recognition/.env +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/face_recognition/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/face_recognition/images/Carter_welcomes_Reagan.jpg +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/face_recognition/images/Solvay_conference_1927.jpg +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/face_recognition/images/Steve_Jobs_and_Bill_Gates_(522695099).jpg +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/face_recognition/images/einplanck3.jpg +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/face_recognition/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/face_recognition/pyproject.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/fastapi_server_docker/.dockerignore +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/fastapi_server_docker/.env +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/fastapi_server_docker/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/fastapi_server_docker/compose.yaml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/fastapi_server_docker/dockerfile +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/fastapi_server_docker/files/1810.04805v2.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/fastapi_server_docker/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/fastapi_server_docker/requirements.txt +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/gdrive_text_embedding/.env.example +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/gdrive_text_embedding/.gitignore +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/gdrive_text_embedding/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/gdrive_text_embedding/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/gdrive_text_embedding/pyproject.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/.env +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/colpali_main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/frontend/.gitignore +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/frontend/index.html +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/frontend/package-lock.json +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/frontend/package.json +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/frontend/src/App.jsx +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/frontend/src/main.jsx +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/frontend/src/style.css +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/frontend/vite.config.js +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/img/cat1.jpeg +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/img/dog1.jpeg +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/img/elephant1.jpg +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/img/giraffe.jpg +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/image_search/pyproject.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/live_updates/.env +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/live_updates/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/live_updates/data/bizarre_animals.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/live_updates/data/chunk_norris.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/live_updates/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/live_updates/pyproject.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/manuals_llm_extraction/.env +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/manuals_llm_extraction/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/manuals_llm_extraction/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/manuals_llm_extraction/manuals/array.pdf +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/manuals_llm_extraction/manuals/base64.pdf +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/manuals_llm_extraction/manuals/copy.pdf +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/manuals_llm_extraction/manuals/glob.pdf +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/multi_format_indexing/.env +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/multi_format_indexing/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/multi_format_indexing/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/multi_format_indexing/pyproject.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/multi_format_indexing/source_files/1706.03762v7.pdf +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/multi_format_indexing/source_files/1810.04805v2.pdf +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/multi_format_indexing/source_files/2502.06786v3.pdf +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/multi_format_indexing/source_files/healthcare_industry_test_p101.jpg +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/multi_format_indexing/source_files/healthcare_industry_test_p86.jpg +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/multi_format_indexing/source_files/healthcare_industry_test_p9.jpg +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/multi_format_indexing/source_files/restaurant_brands_international_2023.jpg +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/multi_format_indexing/source_files/sweetgreen_2023.jpg +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/paper_metadata/.env.example +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/paper_metadata/.gitignore +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/paper_metadata/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/paper_metadata/papers/1706.03762v7.pdf +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/paper_metadata/papers/1810.04805v2.pdf +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/paper_metadata/papers/2502.06786v3.pdf +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/paper_metadata/papers/2502.20346v1.pdf +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/patient_intake_extraction/.env.example +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/patient_intake_extraction/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/patient_intake_extraction/data/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/patient_intake_extraction/data/patient_forms/Patient_Intake_Form_David_Artificial.docx +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/patient_intake_extraction/data/patient_forms/Patient_Intake_Form_Emily_Artificial.pdf +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/patient_intake_extraction/data/patient_forms/Patient_Intake_Form_Joe_Artificial.pdf +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/patient_intake_extraction/data/patient_forms/Patient_Intake_From_Jane_Artificial.docx +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/patient_intake_extraction/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/patient_intake_extraction/pyproject.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/pdf_embedding/.env +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/pdf_embedding/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/pdf_embedding/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/pdf_embedding/pdf_files/1706.03762v7.pdf +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/pdf_embedding/pdf_files/1810.04805v2.pdf +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/pdf_embedding/pdf_files/rfc8259.pdf +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/product_recommendation/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/product_recommendation/img/cocoinsight.png +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/product_recommendation/img/neo4j.png +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/product_recommendation/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/product_recommendation/products/p1.json +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/product_recommendation/products/p2.json +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/product_recommendation/products/p3.json +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/product_recommendation/products/p4.json +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/product_recommendation/products/p5.json +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/product_recommendation/products/p6.json +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/product_recommendation/products/p7.json +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/product_recommendation/products/p8.json +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/product_recommendation/products/p9.json +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/product_recommendation/pyproject.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/text_embedding/.env +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/text_embedding/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/text_embedding/Text_Embedding.ipynb +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/text_embedding/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/text_embedding/markdown_files/1706.03762v7.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/text_embedding/markdown_files/1810.04805v2.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/text_embedding/markdown_files/rfc8259.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/text_embedding/pyproject.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/text_embedding_qdrant/.env +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/text_embedding_qdrant/README.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/text_embedding_qdrant/main.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/text_embedding_qdrant/markdown_files/rfc8259.md +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/examples/text_embedding_qdrant/pyproject.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/pyproject.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/__init__.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/auth_registry.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/cli.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/functions.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/index.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/lib.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/llm.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/py.typed +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/runtime.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/setting.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/setup.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/subprocess_exec.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/targets.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/tests/__init__.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/tests/test_convert.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/tests/test_optional_database.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/tests/test_transform_flow.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/tests/test_typing.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/tests/test_validation.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/user_app_loader.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/utils.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/python/cocoindex/validation.py +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/ruff.toml +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/base/duration.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/base/field_attrs.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/base/json_schema.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/base/mod.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/builder/analyzed_flow.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/builder/mod.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/execution/indexing_status.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/execution/live_updater.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/execution/memoization.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/execution/mod.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/lib.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/llm/anthropic.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/llm/gemini.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/llm/litellm.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/llm/mod.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/llm/ollama.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/llm/openai.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/llm/openrouter.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/llm/vllm.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/llm/voyage.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/factory_bases.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/functions/embed_text.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/functions/extract_by_llm.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/functions/mod.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/functions/parse_json.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/functions/test_utils.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/py_factory.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/sources/shared/mod.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/sources/shared/pattern_matcher.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/targets/mod.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/targets/qdrant.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/targets/shared/mod.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/ops/targets/shared/table_columns.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/prelude.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/py/mod.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/server.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/service/error.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/service/mod.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/settings.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/setup/auth_registry.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/setup/components.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/setup/db_metadata.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/setup/flow_features.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/setup/mod.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/utils/concur_control.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/utils/fingerprint.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/utils/immutable.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/utils/mod.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/utils/retryable.rs +0 -0
- {cocoindex-0.1.82 → cocoindex-0.2.0}/src/utils/yaml_ser.rs +0 -0
@@ -2,7 +2,7 @@
|
|
2
2
|
name = "cocoindex"
|
3
3
|
# Version used for local development is always higher than others to take precedence.
|
4
4
|
# Will be overridden for specific release versions.
|
5
|
-
version = "0.
|
5
|
+
version = "0.2.0"
|
6
6
|
edition = "2024"
|
7
7
|
rust-version = "1.88"
|
8
8
|
readme = "README.md"
|
@@ -23,7 +23,7 @@ Each piece of data has a **data type**, falling into one of the following catego
|
|
23
23
|
|
24
24
|
* *Basic type*.
|
25
25
|
* *Struct type*: a collection of **fields**, each with a name and a type.
|
26
|
-
* *Table type*: a collection of **rows**, each of which is a struct with specified schema. A table type can be a *KTable* (
|
26
|
+
* *Table type*: a collection of **rows**, each of which is a struct with specified schema. A table type can be a *KTable* (with key columns that uniquely identify each row) or an *LTable* (rows are ordered but without keys).
|
27
27
|
|
28
28
|
An indexing flow always has a top-level struct, containing all data within and managed by the flow.
|
29
29
|
|
@@ -148,21 +148,27 @@ We have two specific types of *Table* types: *KTable* and *LTable*.
|
|
148
148
|
|
149
149
|
#### KTable
|
150
150
|
|
151
|
-
*KTable* is a *Table* type whose
|
151
|
+
*KTable* is a *Table* type whose one or more columns together serve as the key.
|
152
152
|
The row order of a *KTable* is not preserved.
|
153
|
-
|
153
|
+
Each key column must be a [key type](#key-types). When multiple key columns are present, they form a composite key.
|
154
154
|
|
155
|
-
In Python, a *KTable* type is represented by `dict[K, V]`.
|
156
|
-
|
157
|
-
|
158
|
-
|
159
|
-
|
155
|
+
In Python, a *KTable* type is represented by `dict[K, V]`.
|
156
|
+
`K` represents the key and `V` represents the value for each row:
|
157
|
+
|
158
|
+
- `K` can be a Struct type (either a frozen dataclass or a `NamedTuple`) that contains all key parts as fields. This is the general way to model multi-part keys.
|
159
|
+
- When there is only a single key part and it is a basic type (e.g. `str`, `int`), you may use that basic type directly as the dictionary key instead of wrapping it in a Struct.
|
160
|
+
- `V` should be the type bound to a *Struct* representing the non-key value fields of each row.
|
161
|
+
|
162
|
+
When a specific type annotation is not provided:
|
163
|
+
- For composite keys (multiple key parts), the key binds to a Python tuple of the key parts, e.g. `tuple[str, str]`.
|
164
|
+
- For a single basic key part, the key binds to that basic Python type.
|
165
|
+
- The value binds to `dict[str, Any]`.
|
160
166
|
|
161
167
|
|
162
168
|
For example, you can use `dict[str, Person]` or `dict[str, PersonTuple]` to represent a *KTable*, with 4 columns: key (*Str*), `first_name` (*Str*), `last_name` (*Str*), `dob` (*Date*).
|
163
169
|
It's bound to `dict[str, dict[str, Any]]` if you don't annotate the function argument with a specific type.
|
164
170
|
|
165
|
-
Note that
|
171
|
+
Note that when using a Struct as the key, it must be immutable in Python. For a dataclass, annotate it with `@dataclass(frozen=True)`. For `NamedTuple`, immutability is built-in. For example:
|
166
172
|
|
167
173
|
```python
|
168
174
|
@dataclass(frozen=True)
|
@@ -175,8 +181,8 @@ class PersonKeyTuple(NamedTuple):
|
|
175
181
|
id: str
|
176
182
|
```
|
177
183
|
|
178
|
-
Then you can use `dict[PersonKey, Person]` or `dict[PersonKeyTuple, PersonTuple]` to represent a KTable keyed by `
|
179
|
-
|
184
|
+
Then you can use `dict[PersonKey, Person]` or `dict[PersonKeyTuple, PersonTuple]` to represent a KTable keyed by both `id_kind` and `id`.
|
185
|
+
If you don't annotate the function argument with a specific type, it's bound to `dict[tuple[str, str], dict[str, Any]]`.
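The immutability requirement follows from how Python dictionaries work: a key must be hashable. The following is a minimal plain-Python sketch (no CocoIndex imports); the `Person` value fields and the sample row are assumptions made up for illustration, and only the key fields `id_kind` and `id` come from the text above.

```python
from dataclasses import dataclass
from typing import NamedTuple


@dataclass(frozen=True)
class PersonKey:
    id_kind: str
    id: str


class PersonKeyTuple(NamedTuple):
    id_kind: str
    id: str


@dataclass
class Person:
    # Hypothetical non-key value fields, for illustration only.
    first_name: str
    last_name: str


# frozen=True makes the dataclass hashable, so it can serve as a dict key.
people: dict[PersonKey, Person] = {
    PersonKey("passport", "X123"): Person("Ada", "Lovelace"),
}
same_key = PersonKey("passport", "X123")
found = people[same_key]

# A NamedTuple key compares and hashes like the plain tuple of its parts.
tuple_keyed = {PersonKeyTuple("passport", "X123"): "row"}
tuple_lookup = tuple_keyed[("passport", "X123")]
```

Two separately constructed keys with equal fields address the same row, which is the behavior a composite key needs.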
|
180
186
|
|
181
187
|
|
182
188
|
#### LTable
|
@@ -5,16 +5,15 @@ sidebar_class_name: hidden
|
|
5
5
|
slug: /examples/academic_papers_index
|
6
6
|
canonicalUrl: '/examples/academic_papers_index'
|
7
7
|
sidebar_custom_props:
|
8
|
-
image: /img/examples/academic_papers_index.png
|
8
|
+
image: /img/examples/academic_papers_index/cover.png
|
9
9
|
tags: [vector-index, metadata]
|
10
10
|
tags: [vector-index, metadata]
|
11
11
|
---
|
12
12
|
|
13
|
-
import { GitHubButton, YouTubeButton } from '../../../src/components/GitHubButton';
|
13
|
+
import { GitHubButton, YouTubeButton, DocumentationButton } from '../../../src/components/GitHubButton';
|
14
14
|
|
15
15
|
<GitHubButton url="https://github.com/cocoindex-io/cocoindex/tree/main/examples/paper_metadata"/>
|
16
16
|
|
17
|
-
|
18
17
|
## What we will achieve
|
19
18
|
|
20
19
|
1. Extract the paper metadata, including file name, title, author information, abstract, and number of pages.
|
@@ -27,18 +26,8 @@ to answer questions like "Give me all the papers by Jeff Dean."
|
|
27
26
|
|
28
27
|
4. If you want to perform full PDF embedding for the paper, you can extend the flow.
|
29
28
|
|
30
|
-
##
|
31
|
-
|
32
|
-
- [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres).
|
33
|
-
CocoIndex uses PostgreSQL internally for incremental processing.
|
34
|
-
- [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai).
|
35
|
-
Alternatively, we have native support for Gemini, Ollama, LiteLLM. Check out the [guide](https://cocoindex.io/docs/ai/llm#ollama).
|
36
|
-
You can choose your favorite LLM provider and work completely on-premises.
|
37
|
-
|
38
|
-
## Define Indexing Flow
|
39
|
-
|
40
|
-
To better help you navigate what we will walk through, here is a flow diagram:
|
41
|
-
|
29
|
+
## Flow Overview
|
30
|
+

|
42
31
|
1. Import a list of papers in PDF.
|
43
32
|
2. For each file:
|
44
33
|
- Extract the first page of the paper.
|
@@ -50,9 +39,15 @@ To better help you navigate what we will walk through, here is a flow diagram:
|
|
50
39
|
- Author-to-paper mapping, for author-based query.
|
51
40
|
- Embeddings for titles and abstract chunks, for semantic search.
|
52
41
|
|
53
|
-
|
42
|
+
## Setup
|
43
|
+
|
44
|
+
- [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres).
|
45
|
+
CocoIndex uses PostgreSQL internally for incremental processing.
|
46
|
+
- [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). Alternatively, we have native support for Gemini, Ollama, LiteLLM. You can choose your favorite LLM provider and work completely on-premises.
|
54
47
|
|
55
|
-
|
48
|
+
<DocumentationButton href="https://cocoindex.io/docs/ai/llm" text="LLM" margin="0 0 16px 0" />
|
49
|
+
|
50
|
+
## Import the Papers
|
56
51
|
|
57
52
|
```python
|
58
53
|
@cocoindex.flow_def(name="PaperMetadata")
|
@@ -65,12 +60,12 @@ def paper_metadata_flow(
|
|
65
60
|
)
|
66
61
|
```
|
67
62
|
|
68
|
-
`flow_builder.add_source` will create a table with sub fields (`filename`, `content`)
|
69
|
-
|
63
|
+
`flow_builder.add_source` will create a table with sub fields (`filename`, `content`).
|
64
|
+
<DocumentationButton href="https://cocoindex.io/docs/ops/sources" text="Sources" margin="0 0 16px 0" />
|
70
65
|
|
71
|
-
|
66
|
+
## Extract and collect metadata
|
72
67
|
|
73
|
-
|
68
|
+
### Extract first page for basic info
|
74
69
|
|
75
70
|
Define a custom function to extract the first page and number of pages of the PDF.
|
76
71
|
|
@@ -96,20 +91,19 @@ def extract_basic_info(content: bytes) -> PaperBasicInfo:
|
|
96
91
|
|
97
92
|
```
|
98
93
|
|
99
|
-
Now
|
100
|
-
We extract metadata from the first page to minimize processing cost, since the entire PDF can be very large.
|
94
|
+
Now plug this into the flow. We extract metadata from the first page to minimize processing cost, since the entire PDF can be very large.
|
101
95
|
|
102
96
|
```python
|
103
97
|
with data_scope["documents"].row() as doc:
|
104
98
|
doc["basic_info"] = doc["content"].transform(extract_basic_info)
|
105
99
|
```
|
100
|
+

|
106
101
|
|
107
|
-
After this step,
|
102
|
+
After this step, we should have the basic info of each paper.
|
108
103
|
|
109
104
|
### Parse basic info
|
110
105
|
|
111
|
-
We will convert the first page to Markdown using Marker.
|
112
|
-
Alternatively, you can easily plug in your favorite PDF parser, such as Docling.
|
106
|
+
We will convert the first page to Markdown using Marker. Alternatively, you can easily plug in any PDF parser, such as Docling, using CocoIndex's [custom function](https://cocoindex.io/docs/custom_ops/custom_functions).
|
113
107
|
|
114
108
|
Define a marker converter function and cache it, since its initialization is resource-intensive.
|
115
109
|
This ensures that the same converter instance is reused for different input files.
|
@@ -140,18 +134,20 @@ def pdf_to_markdown(content: bytes) -> str:
|
|
140
134
|
Pass it to your transform
|
141
135
|
|
142
136
|
```python
|
143
|
-
with data_scope["documents"].row() as doc:
|
137
|
+
with data_scope["documents"].row() as doc:
|
138
|
+
# ... process
|
144
139
|
doc["first_page_md"] = doc["basic_info"]["first_page"].transform(
|
145
140
|
pdf_to_markdown
|
146
141
|
)
|
147
142
|
```
|
143
|
+

|
148
144
|
|
149
145
|
After this step, you should have the first page of each paper in Markdown format.
|
150
146
|
|
151
|
-
|
147
|
+
### Extract basic info with LLM
|
152
148
|
|
153
149
|
Define a schema for LLM extraction. CocoIndex natively supports LLM-structured extraction with complex and nested schemas.
|
154
|
-
If you are interested in learning more about nested schemas, refer to [this
|
150
|
+
If you are interested in learning more about nested schemas, refer to [this example](https://cocoindex.io/docs/examples/patient_form_extraction).
|
155
151
|
|
156
152
|
```python
|
157
153
|
@dataclasses.dataclass
|
@@ -163,7 +159,6 @@ class PaperMetadata:
|
|
163
159
|
title: str
|
164
160
|
authors: list[Author]
|
165
161
|
abstract: str
|
166
|
-
|
167
162
|
```
|
168
163
|
|
169
164
|
Plug it into the `ExtractByLlm` function. With a dataclass defined, CocoIndex will automatically parse the LLM response into the dataclass.
|
@@ -181,26 +176,27 @@ doc["metadata"] = doc["first_page_md"].transform(
|
|
181
176
|
```
|
182
177
|
|
183
178
|
After this step, you should have the metadata of each paper.
|
179
|
+

|
184
180
|
|
185
|
-
|
181
|
+
### Collect paper metadata
|
186
182
|
|
187
183
|
```python
|
188
|
-
|
189
|
-
|
190
|
-
|
191
|
-
|
192
|
-
|
193
|
-
|
194
|
-
|
195
|
-
|
196
|
-
|
197
|
-
|
198
|
-
|
184
|
+
paper_metadata = data_scope.add_collector()
|
185
|
+
with data_scope["documents"].row() as doc:
|
186
|
+
# ... process
|
187
|
+
# Collect metadata
|
188
|
+
paper_metadata.collect(
|
189
|
+
filename=doc["filename"],
|
190
|
+
title=doc["metadata"]["title"],
|
191
|
+
authors=doc["metadata"]["authors"],
|
192
|
+
abstract=doc["metadata"]["abstract"],
|
193
|
+
num_pages=doc["basic_info"]["num_pages"],
|
194
|
+
)
|
199
195
|
```
|
200
196
|
|
201
197
|
Just collect anything you need :)
|
202
198
|
|
203
|
-
|
199
|
+
### Collect `author` to `filename` information
|
204
200
|
We’ve already extracted the author list. Here we want to collect Author → Papers in a separate table to build a lookup capability.
|
205
201
|
Simply collect by author.
|
206
202
|
|
@@ -216,9 +212,9 @@ with data_scope["documents"].row() as doc:
|
|
216
212
|
```
|
217
213
|
|
218
214
|
|
219
|
-
|
215
|
+
## Compute and collect embeddings
|
220
216
|
|
221
|
-
|
217
|
+
### Title
|
222
218
|
|
223
219
|
```python
|
224
220
|
doc["title_embedding"] = doc["metadata"]["title"].transform(
|
@@ -228,7 +224,7 @@ doc["title_embedding"] = doc["metadata"]["title"].transform(
|
|
228
224
|
)
|
229
225
|
```
|
230
226
|
|
231
|
-
|
227
|
+
### Abstract
|
232
228
|
|
233
229
|
Split abstract into chunks, embed each chunk and collect their embeddings.
|
234
230
|
Sometimes the abstract could be very long.
|
@@ -252,6 +248,8 @@ doc["abstract_chunks"] = doc["metadata"]["abstract"].transform(
|
|
252
248
|
|
253
249
|
After this step, you should have the abstract chunks of each paper.
|
254
250
|
|
251
|
+

|
252
|
+
|
255
253
|
Embed each chunk and collect their embeddings.
|
256
254
|
|
257
255
|
```python
|
@@ -265,7 +263,9 @@ with doc["abstract_chunks"].row() as chunk:
|
|
265
263
|
|
266
264
|
After this step, you should have the embeddings of the abstract chunks of each paper.
|
267
265
|
|
268
|
-
|
266
|
+

|
267
|
+
|
268
|
+
### Collect embeddings
|
269
269
|
|
270
270
|
```python
|
271
271
|
metadata_embeddings = data_scope.add_collector()
|
@@ -292,7 +292,7 @@ with data_scope["documents"].row() as doc:
|
|
292
292
|
)
|
293
293
|
```
|
294
294
|
|
295
|
-
|
295
|
+
## Export
|
296
296
|
Finally, we export the data to Postgres.
|
297
297
|
|
298
298
|
```python
|
@@ -319,14 +319,9 @@ metadata_embeddings.export(
|
|
319
319
|
)
|
320
320
|
```
|
321
321
|
|
322
|
-
In this example we use PGVector as embedding
|
323
|
-
With CocoIndex, you can do one line switch on other supported Vector databases like Qdrant, see this [guide](https://cocoindex.io/docs/ops/targets#entry-oriented-targets) for more details.
|
324
|
-
We aim to standardize interfaces and make it like assembling building blocks.
|
322
|
+
In this example we use PGVector as the embedding store. With CocoIndex, you can switch to other supported vector databases with a one-line change.
|
325
323
|
|
326
|
-
|
327
|
-
|
328
|
-
You can walk through the project step by step in [CocoInsight](https://www.youtube.com/watch?v=MMrpUfUcZPk) to see
|
329
|
-
exactly how each field is constructed and what happens behind the scenes.
|
324
|
+
<DocumentationButton href="https://cocoindex.io/docs/ops/targets#entry-oriented-targets" text="Entry Oriented Targets" margin="0 0 16px 0" />
|
330
325
|
|
331
326
|
## Query the index
|
332
327
|
|
@@ -338,3 +333,14 @@ For now CocoIndex doesn't provide additional query interface. We can write SQL o
|
|
338
333
|
- The query space has excellent solutions for querying, reranking, and other search-related functionality.
|
339
334
|
|
340
335
|
If you need assistance with writing the query, please feel free to reach out to us on [Discord](https://discord.com/invite/zpA9S2DR7s).
|
336
|
+
|
337
|
+
## CocoInsight
|
338
|
+
|
339
|
+
You can walk through the project step by step in [CocoInsight](https://www.youtube.com/watch?v=MMrpUfUcZPk) to see exactly how each field is constructed and what happens behind the scenes.
|
340
|
+
|
341
|
+
|
342
|
+
```sh
|
343
|
+
cocoindex server -ci main.py
|
344
|
+
```
|
345
|
+
|
346
|
+
Open the URL `https://cocoindex.io/cocoinsight`. It connects to your local CocoIndex server, with zero pipeline data retention.
|
@@ -5,30 +5,57 @@ sidebar_class_name: hidden
|
|
5
5
|
slug: /examples/code_index
|
6
6
|
canonicalUrl: '/examples/code_index'
|
7
7
|
sidebar_custom_props:
|
8
|
-
image: /img/examples/codebase_index.png
|
8
|
+
image: /img/examples/codebase_index/cover.png
|
9
9
|
tags: [vector-index, codebase]
|
10
10
|
tags: [vector-index, codebase]
|
11
11
|
---
|
12
12
|
|
13
|
-
import { GitHubButton, YouTubeButton } from '../../../src/components/GitHubButton';
|
13
|
+
import { GitHubButton, YouTubeButton, DocumentationButton } from '../../../src/components/GitHubButton';
|
14
14
|
|
15
15
|
<GitHubButton url="https://github.com/cocoindex-io/cocoindex/tree/main/examples/code_embedding"/>
|
16
16
|
<YouTubeButton url="https://youtu.be/G3WstvhHO24?si=ndYfM0XRs03_hVPR" />
|
17
17
|
|
18
|
-
##
|
18
|
+
## Overview
|
19
|
+
In this tutorial, we will build a codebase index. [CocoIndex](https://github.com/cocoindex-io/cocoindex) provides built-in support for codebase chunking, with native Tree-sitter support. It works with large codebases and can be updated in near real-time with incremental processing, reprocessing only what has changed.
|
19
20
|
|
20
|
-
|
21
|
+
## Use Cases
|
22
|
+
A wide range of applications can be built with an effective codebase index that is always up-to-date. Some examples include:
|
21
23
|
|
22
|
-
|
24
|
+

|
25
|
+
|
26
|
+
- Semantic code context for AI coding agents like Claude, Codex, Gemini CLI.
|
27
|
+
- MCP for code editors such as Cursor, Windsurf, and VSCode.
|
28
|
+
- Context-aware code search applications—semantic code search, natural language code retrieval.
|
29
|
+
- Context for code review agents—AI code review, automated code analysis, code quality checks, pull request summarization.
|
30
|
+
- Automated code refactoring, large-scale code migration.
|
31
|
+
- Enhance SRE workflows: enable rapid root cause analysis, incident response, and change impact assessment by indexing infrastructure-as-code, deployment scripts, and config files for semantic search and lineage tracking.
|
32
|
+
- Automatically generate design documentation from code—keep design docs up-to-date.
|
33
|
+
|
34
|
+
## Flow Overview
|
35
|
+
|
36
|
+

|
37
|
+
|
38
|
+
The flow is composed of the following steps:
|
39
|
+
|
40
|
+
- Read code files from the local filesystem
|
41
|
+
- Extract file extensions, to get the language of the code for Tree-sitter to parse
|
42
|
+
- Split code into semantic chunks using Tree-sitter
|
43
|
+
- Generate embeddings for each chunk
|
44
|
+
- Store in a vector database for retrieval
|
23
45
|
|
24
|
-
|
46
|
+
## Setup
|
47
|
+
- Install Postgres, follow [installation guide](https://cocoindex.io/docs/getting_started/installation#-install-postgres).
|
48
|
+
- Install CocoIndex
|
49
|
+
```bash
|
50
|
+
pip install -U cocoindex
|
51
|
+
```
|
52
|
+
|
53
|
+
## Add the codebase as a source
|
54
|
+
We will index the CocoIndex codebase. Here we use the `LocalFile` source to ingest files from the CocoIndex codebase root directory.
|
25
55
|
|
26
56
|
```python
|
27
57
|
@cocoindex.flow_def(name="CodeEmbedding")
|
28
58
|
def code_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
|
29
|
-
"""
|
30
|
-
Define an example flow that embeds files into a vector database.
|
31
|
-
"""
|
32
59
|
data_scope["files"] = flow_builder.add_source(
|
33
60
|
cocoindex.sources.LocalFile(path="../..",
|
34
61
|
included_patterns=["*.py", "*.rs", "*.toml", "*.md", "*.mdx"],
|
@@ -40,16 +67,15 @@ def code_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoind
|
|
40
67
|
- Exclude files and directories starting with `.`, `target` in the root, and `node_modules` under any directory.
|
41
68
|
|
42
69
|
`flow_builder.add_source` will create a table with sub fields (`filename`, `content`).
|
43
|
-
|
70
|
+
<DocumentationButton href="https://cocoindex.io/docs/ops/sources" text="Sources" />
|
44
71
|
|
45
72
|
|
46
|
-
## Process each file and collect the information
|
73
|
+
## Process each file and collect the information
|
47
74
|
|
48
|
-
###
|
75
|
+
### Extract the extension of a filename
|
49
76
|
|
50
77
|
We need to pass the language (or extension) to Tree-sitter to parse the code.
|
51
78
|
Let's define a function to extract the extension of a filename while processing each file.
|
52
|
-
You can find the documentation for custom function [here](https://cocoindex.io/docs/core/custom_function).
|
53
79
|
|
54
80
|
```python
|
55
81
|
@cocoindex.op.function()
|
@@ -58,52 +84,43 @@ def extract_extension(filename: str) -> str:
|
|
58
84
|
return os.path.splitext(filename)[1]
|
59
85
|
```
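For reference, `os.path.splitext` returns the extension with its leading dot, and an empty string when the name has no extension. A small plain-Python sketch of that behavior, without the CocoIndex decorator; the helper name `extension_of` and the file names are made up for illustration:

```python
import os


def extension_of(filename: str) -> str:
    # Same expression as in extract_extension above.
    return os.path.splitext(filename)[1]


py_ext = extension_of("src/main.py")         # ".py"
no_ext = extension_of("Makefile")            # ""
double_ext = extension_of("archive.tar.gz")  # ".gz" (only the last suffix)
hidden = extension_of(".gitignore")          # "" (a leading dot is not an extension)
```

Files without an extension therefore reach the next step with an empty string as their language hint.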
|
60
86
|
|
61
|
-
|
62
|
-
|
63
|
-
```python
|
64
|
-
with data_scope["files"].row() as file:
|
65
|
-
file["extension"] = file["filename"].transform(extract_extension)
|
66
|
-
```
|
67
|
-
|
68
|
-
Here we extract the extension of the filename and store it in the `extension` field.
|
69
|
-
|
87
|
+
<DocumentationButton href="https://cocoindex.io/docs/custom_ops/custom_functions" text="Custom Function" margin="0 0 16px 0" />
|
70
88
|
|
71
89
|
### Split the file into chunks
|
72
|
-
|
73
|
-
We will chunk the code with Tree-sitter.
|
74
|
-
We use the `SplitRecursively` function to split the file into chunks.
|
75
|
-
It is integrated with Tree-sitter, so you can pass in the language to the `language` parameter.
|
76
|
-
To see all supported language names and extensions, see the documentation [here](https://cocoindex.io/docs/ops/functions#splitrecursively). All the major languages are supported, e.g., Python, Rust, JavaScript, TypeScript, Java, C++, etc. If it's unspecified or the specified language is not supported, it will be treated as plain text.
|
90
|
+
We use the `SplitRecursively` function to split the file into chunks. `SplitRecursively` is a CocoIndex building block with native Tree-sitter integration. When processing code, pass the language via the `language` parameter.
|
77
91
|
|
78
92
|
```python
|
79
93
|
with data_scope["files"].row() as file:
|
94
|
+
# Extract the extension of the filename.
|
95
|
+
file["extension"] = file["filename"].transform(extract_extension)
|
80
96
|
file["chunks"] = file["content"].transform(
|
81
97
|
cocoindex.functions.SplitRecursively(),
|
82
98
|
language=file["extension"], chunk_size=1000, chunk_overlap=300)
|
83
99
|
```
|
100
|
+
<DocumentationButton href="https://cocoindex.io/docs/ops/functions#splitrecursively" text="SplitRecursively" margin="0 0 16px 0" />
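To build intuition for how `chunk_size` and `chunk_overlap` interact, here is a deliberately simplified sketch. It is not the algorithm `SplitRecursively` uses (that one is syntax-aware via Tree-sitter); it only cuts fixed-size windows, and it assumes both parameters are measured in characters. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
```

In this simplified model, `chunk_size=1000` with `chunk_overlap=300` would advance the window by 700 characters per chunk, so text near a chunk boundary appears in two neighboring chunks.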
|
84
101
|
|
102
|
+

|
85
103
|
|
86
104
|
### Embed the chunks
|
87
|
-
|
88
105
|
We use `SentenceTransformerEmbed` to embed the chunks.
|
89
|
-
You can refer to the documentation [here](https://cocoindex.io/docs/ops/functions#sentencetransformerembed).
|
90
106
|
|
91
107
|
```python
|
92
108
|
@cocoindex.transform_flow()
|
93
109
|
def code_to_embedding(text: cocoindex.DataSlice[str]) -> cocoindex.DataSlice[list[float]]:
|
94
|
-
"""
|
95
|
-
Embed the text using a SentenceTransformer model.
|
96
|
-
"""
|
97
110
|
return text.transform(
|
98
111
|
cocoindex.functions.SentenceTransformerEmbed(
|
99
112
|
model="sentence-transformers/all-MiniLM-L6-v2"))
|
100
113
|
```
|
101
114
|
|
102
|
-
|
115
|
+
<DocumentationButton href="https://cocoindex.io/docs/ops/functions#sentencetransformerembed" text="SentenceTransformerEmbed" margin="0 0 16px 0" />
|
116
|
+
|
117
|
+
:::tip
|
118
|
+
`@cocoindex.transform_flow()` is needed to share the transformation across indexing and query. When building a vector index and querying against it, the embedding computation must remain consistent between indexing and querying.
|
119
|
+
:::
|
103
120
|
|
104
|
-
|
105
|
-
the embedding computation needs to be consistent between indexing and querying. See [documentation](https://cocoindex.io/docs/query#transform-flow) for more details.
|
121
|
+
<DocumentationButton href="https://cocoindex.io/docs/query#transform-flow" text="Transform Flow" margin="0 0 16px 0" />
|
106
122
|
|
123
|
+
Then for each chunk, we will embed it using the `code_to_embedding` function, and collect the embeddings to the `code_embeddings` collector.
|
107
124
|
|
108
125
|
```python
|
109
126
|
with data_scope["files"].row() as file:
|
@@ -113,10 +130,7 @@ with data_scope["files"].row() as file:
|
|
113
130
|
code=chunk["text"], embedding=chunk["embedding"])
|
114
131
|
```
|
115
132
|
|
116
|
-
|
117
|
-
### 2.4 Collect the embeddings
|
118
|
-
|
119
|
-
Export the embeddings to a table.
|
133
|
+
### Export the embeddings
|
120
134
|
|
121
135
|
```python
|
122
136
|
code_embeddings.export(
|
@@ -126,8 +140,7 @@ code_embeddings.export(
|
|
126
140
|
vector_indexes=[cocoindex.VectorIndex("embedding", cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
|
127
141
|
```
|
128
142
|
|
129
|
-
We use
|
130
|
-
To learn more about Consine Similarity, see [Wiki](https://en.wikipedia.org/wiki/Cosine_similarity).
|
143
|
+
We use [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity) to measure the similarity between the query and the indexed data.
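As a reminder of what the metric computes, cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal plain-Python sketch follows; the function name and sample vectors are illustrative, and in the example above the database computes this, not application code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])      # 0
```

Because the result depends only on direction, vectors that differ only in magnitude score as identical.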
|
131
144
|
|
132
145
|
## Query the index
|
133
146
|
We match against user-provided text by a SQL query, reusing the embedding operation in the indexing flow.
|
@@ -180,13 +193,16 @@ if __name__ == "__main__":
|
|
180
193
|
|
181
194
|
## Run the index setup & update
|
182
195
|
|
183
|
-
|
196
|
+
- Install dependencies
|
197
|
+
```bash
|
198
|
+
pip install -e .
|
199
|
+
```
|
184
200
|
|
185
|
-
|
186
|
-
```sh
|
187
|
-
cocoindex update --setup main.py
|
188
|
-
```
|
189
|
-
You'll see the index updates state in the terminal
|
201
|
+
- Setup and update the index
|
202
|
+
```sh
|
203
|
+
cocoindex update --setup main.py
|
204
|
+
```
|
205
|
+
You'll see the index updates state in the terminal
|
190
206
|
|
191
207
|
|
192
208
|
## Test the query
|
@@ -197,7 +213,13 @@ python main.py
|
|
197
213
|
```
|
198
214
|
|
199
215
|
When you see the prompt, you can enter your search query, for example: spec.
|
216
|
+
Each returned entry contains the score (cosine similarity), the filename, and the matched code snippet.
|
200
217
|
|
201
|
-
|
218
|
+
## CocoInsight
|
219
|
+
To get a better understanding of the indexing flow, you can use CocoInsight to help the development step by step.
|
220
|
+
Spinning it up takes a single command:
|
202
221
|
|
203
|
-
|
222
|
+
```
|
223
|
+
cocoindex server main.py -ci
|
224
|
+
```
|
225
|
+
Follow the URL from the terminal, `https://cocoindex.io/cocoinsight`, to access CocoInsight.
|
@@ -5,23 +5,17 @@ sidebar_class_name: hidden
|
|
5
5
|
slug: /examples/custom_targets
|
6
6
|
canonicalUrl: '/examples/custom_targets'
|
7
7
|
sidebar_custom_props:
|
8
|
-
image: /img/examples/custom_targets.png
|
8
|
+
image: /img/examples/custom_targets/cover.png
|
9
9
|
tags: [custom-building-blocks]
|
10
10
|
tags: [custom-building-blocks]
|
11
11
|
---
|
12
|
-
import { GitHubButton, YouTubeButton } from '../../../src/components/GitHubButton';
|
12
|
+
import { GitHubButton, YouTubeButton, DocumentationButton } from '../../../src/components/GitHubButton';
|
13
13
|
|
14
14
|
<GitHubButton url="https://github.com/cocoindex-io/cocoindex/tree/main/examples/custom_output_files"/>
|
15
15
|
|
16
16
|
## Overview
|
17
17
|
|
18
|
-
Let’s walk through a simple example—exporting `.md` files as `.html` using a custom file-based target. This project monitors folder changes and continuously converts markdown to HTML incrementally.
|
19
|
-
Check out the full [source code](https://github.com/cocoindex-io/cocoindex/tree/main/examples/custom_output_files).
|
20
|
-
|
21
|
-
The overall flow is simple:
|
22
|
-
This example focuses on
|
23
|
-
- how to configure your custom target
|
24
|
-
- the flow effortless picks up the changes in the source, recomputes only what's changed and export to the target
|
18
|
+
Let’s walk through a simple example—exporting `.md` files as `.html` using a custom file-based target. This project monitors folder changes and continuously converts markdown to HTML incrementally. The overall flow is simple and primarily focuses on how to configure your custom target.
|
25
19
|
|
26
20
|
|
27
21
|
## Ingest files
|
@@ -33,16 +27,13 @@ Ingest a list of markdown files:
|
|
33
27
|
def custom_output_files(
|
34
28
|
flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
|
35
29
|
) -> None:
|
36
|
-
"""
|
37
|
-
Define an example flow that exports markdown files to HTML files.
|
38
|
-
"""
|
39
30
|
data_scope["documents"] = flow_builder.add_source(
|
40
31
|
cocoindex.sources.LocalFile(path="data", included_patterns=["*.md"]),
|
41
32
|
refresh_interval=timedelta(seconds=5),
|
42
33
|
)
|
43
34
|
```
|
44
35
|
This ingestion creates a table with `filename` and `content` fields.
|
45
|
-
|
36
|
+
<DocumentationButton href="https://cocoindex.io/docs/ops/sources" text="Sources" />
|
46
37
|
|
47
38
|
## Process each file and collect
|
48
39
|
|
@@ -50,11 +41,12 @@ Define custom function that converts markdown to HTML
|
|
50
41
|
|
51
42
|
```python
|
52
43
|
@cocoindex.op.function()
|
53
|
-
|
54
44
|
def markdown_to_html(text: str) -> str:
|
55
45
|
return _markdown_it.render(text)
|
56
46
|
```
|
57
47
|
|
48
|
+
<DocumentationButton href="https://cocoindex.io/docs/custom_ops/custom_functions" text="Custom Function" margin="0 0 16px 0" />
|
49
|
+
|
58
50
|
Define a data collector and transform each document to HTML.
|
59
51
|
|
60
52
|
```python
|
@@ -63,12 +55,15 @@ with data_scope["documents"].row() as doc:
|
|
63
55
|
doc["html"] = doc["content"].transform(markdown_to_html)
|
64
56
|
output_html.collect(filename=doc["filename"], html=doc["html"])
|
65
57
|
```
|
58
|
+

|
66
59
|
|
67
60
|
|
68
61
|
## Define the custom target
|
69
62
|
|
70
63
|
### Define the target spec
|
71
64
|
|
65
|
+
<DocumentationButton href="https://cocoindex.io/docs/custom_ops/custom_targets#target-spec" text="Target Spec" margin="0 0 16px 0" />
|
66
|
+
|
72
67
|
The target spec contains a directory for output files:
|
73
68
|
|
74
69
|
```python
|
@@ -76,8 +71,11 @@ class LocalFileTarget(cocoindex.op.TargetSpec):
|
|
76
71
|
directory: str
|
77
72
|
```
|
78
73
|
|
74
|
+
|
79
75
|
### Implement the connector
|
80
76
|
|
77
|
+
<DocumentationButton href="https://cocoindex.io/docs/custom_ops/custom_targets#target-connector" text="Target Connector" margin="0 0 16px 0" />
|
78
|
+
|
81
79
|
`get_persistent_key()` defines the persistent key,
|
82
80
|
which uniquely identifies the target for change tracking and incremental updates. Here, we simply use the target directory as the key (e.g., `./data/output`).
|
83
81
|
|
@@ -180,17 +178,15 @@ def mutate(
|
|
180
178
|
### Use it in the Flow
|
181
179
|
|
182
180
|
```python
|
183
|
-
|
184
|
-
|
185
|
-
|
186
|
-
|
187
|
-
|
181
|
+
output_html.export(
|
182
|
+
"OutputHtml",
|
183
|
+
LocalFileTarget(directory="output_html"),
|
184
|
+
primary_key_fields=["filename"],
|
185
|
+
)
|
188
186
|
```
|
189
187
|
|
190
188
|
## Run the example
|
191
189
|
|
192
|
-
Once your pipeline is set up, keeping your knowledge graph updated is simple:
|
193
|
-
|
194
190
|
```bash
|
195
191
|
pip install -e .
|
196
192
|
cocoindex update --setup main.py
|