cocoindex 0.1.83__tar.gz → 0.2.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {cocoindex-0.1.83 → cocoindex-0.2.1}/Cargo.lock +1 -1
- {cocoindex-0.1.83 → cocoindex-0.2.1}/Cargo.toml +1 -1
- {cocoindex-0.1.83 → cocoindex-0.2.1}/PKG-INFO +1 -1
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/core/basics.md +1 -1
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/core/data_types.mdx +16 -10
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/core/settings.mdx +4 -4
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/academic_papers_index.md +60 -54
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/docs_to_knowledge_graph.md +4 -2
- cocoindex-0.2.1/docs/docs/examples/examples/manual_extraction.md +249 -0
- cocoindex-0.2.1/docs/docs/examples/examples/patient_form_extraction.md +296 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/photo_search.md +36 -15
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/product_recommendation.md +87 -66
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/simple_vector_index.md +44 -23
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/getting_started/quickstart.md +1 -1
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/ops/sources.md +43 -0
- cocoindex-0.2.1/docs/static/img/examples/academic_papers_index/abstract_chunks.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/academic_papers_index/basic_info.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/academic_papers_index/chunk_embedding.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/academic_papers_index/cover.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/academic_papers_index/first_page.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/academic_papers_index/flow.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/academic_papers_index/metadata.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/manual_extraction/cover.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/manual_extraction/extraction.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/manual_extraction/flow.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/manual_extraction/summary.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/patient_form_extraction/cover.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/patient_form_extraction/extraction.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/patient_form_extraction/fields.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/patient_form_extraction/flow.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/patient_form_extraction/tomarkdown.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/photo_search/cover.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/photo_search/extraction.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/photo_search/flow.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/product_recommendation/cover.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/product_recommendation/dedupe.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/product_recommendation/export_all.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/product_recommendation/export_product.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/product_recommendation/export_taxonomy.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/product_recommendation/extract_product.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/product_recommendation/extract_taxonomy.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/product_recommendation/neo4j.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/product_recommendation/parse_json.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/product_recommendation/taxonomy.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/simple_vector_index/chunk.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/simple_vector_index/cover.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/simple_vector_index/embed.png +0 -0
- cocoindex-0.2.1/docs/static/img/examples/simple_vector_index/flow.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/manuals_llm_extraction/pyproject.toml +1 -1
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/pyproject.toml +2 -2
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/pdf_embedding/pyproject.toml +1 -1
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/postgres_source/main.py +4 -4
- cocoindex-0.1.83/examples/product_recommendation/.env → cocoindex-0.2.1/examples/product_recommendation/.env.example +2 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/convert.py +63 -46
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/setting.py +2 -2
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/subprocess_exec.py +15 -1
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/typing.py +37 -22
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/base/schema.rs +35 -37
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/base/value.rs +221 -77
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/builder/analyzer.rs +2 -7
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/builder/exec_ctx.rs +38 -7
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/builder/plan.rs +2 -1
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/dumper.rs +10 -10
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/evaluator.rs +16 -13
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/row_indexer.rs +7 -8
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/source_indexer.rs +8 -9
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/stats.rs +17 -1
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/lib_context.rs +3 -3
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/functions/split_recursively.rs +16 -10
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/interface.rs +3 -3
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sources/amazon_s3.rs +5 -5
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sources/azure_blob.rs +4 -4
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sources/google_drive.rs +5 -5
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sources/local_file.rs +6 -8
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sources/postgres.rs +23 -74
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/targets/kuzu.rs +10 -11
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/targets/neo4j.rs +2 -3
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/targets/shared/property_graph.rs +1 -1
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/py/convert.rs +32 -21
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/service/flows.rs +10 -18
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/setup/states.rs +7 -1
- cocoindex-0.1.83/docs/docs/examples/examples/manual_extraction.md +0 -274
- cocoindex-0.1.83/docs/docs/examples/examples/patient_form_extraction.md +0 -271
- cocoindex-0.1.83/docs/static/img/examples/academic_papers_index/cover.png +0 -0
- cocoindex-0.1.83/docs/static/img/examples/manual_extraction/cover.png +0 -0
- cocoindex-0.1.83/docs/static/img/examples/patient_form_extraction/cover.png +0 -0
- cocoindex-0.1.83/docs/static/img/examples/photo_search/cover.png +0 -0
- cocoindex-0.1.83/docs/static/img/examples/product_recommendation/cover.png +0 -0
- cocoindex-0.1.83/docs/static/img/examples/simple_vector_index/cover.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/.cargo/config.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/.env.lib_debug +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/ISSUE_TEMPLATE/🐛-bug-report.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/ISSUE_TEMPLATE/💡-feature-request.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/scripts/update_version.sh +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/workflows/CI.yml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/workflows/_doc_release.yml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/workflows/_test.yml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/workflows/docs.yml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/workflows/format.yml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/workflows/release.yml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/.gitignore +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/.pre-commit-config.yaml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/CODE_OF_CONDUCT.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/CONTRIBUTING.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/LICENSE +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/dev/neo4j.yaml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/dev/postgres.yaml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/.gitignore +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/about/community.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/ai/llm.mdx +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/contributing/guide.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/contributing/new_built_in_target.mdx +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/contributing/setup_dev_environment.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/core/cli.mdx +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/core/data_example.svg +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/core/flow_def.mdx +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/core/flow_example.svg +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/core/flow_methods.mdx +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/custom_ops/custom_functions.mdx +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/custom_ops/custom_targets.mdx +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/codebase_index.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/custom_targets.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/image_search.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/multi_format_index.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/index.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/getting_started/installation.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/getting_started/markdown_files.zip +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/getting_started/overview.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/ops/functions.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/ops/targets.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/query.mdx +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/tutorials/live_updates.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/tutorials/manage_flow_dynamically.mdx +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docusaurus.config.ts +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/package.json +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/sidebars.ts +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/src/components/GitHubButton/index.tsx +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/src/css/custom.css +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/src/theme/DocCard/index.tsx +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/src/theme/DocCard/styles.module.css +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/src/theme/DocCardList/index.tsx +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/src/theme/DocCardList/styles.module.css +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/src/theme/Root.js +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/.nojekyll +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/docusaurus.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/codebase_index/chunk.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/codebase_index/cover.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/codebase_index/flow.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/codebase_index/usecase.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/custom_targets/convert.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/custom_targets/cover.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/docs_to_knowledge_graph/cover.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/docs_to_knowledge_graph/dedupe.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/docs_to_knowledge_graph/export_document.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/docs_to_knowledge_graph/export_relationship.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/docs_to_knowledge_graph/extract_relationship.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/docs_to_knowledge_graph/flow.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/docs_to_knowledge_graph/relationship.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/docs_to_knowledge_graph/summary.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/image_search/cover.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/multi_format_index/cover.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/favicon.ico +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/icon.svg +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/incremental-etl.gif +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/robots.txt +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/tsconfig.json +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/yarn.lock +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/amazon_s3_embedding/.env.example +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/amazon_s3_embedding/.gitignore +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/amazon_s3_embedding/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/amazon_s3_embedding/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/amazon_s3_embedding/pyproject.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/azure_blob_embedding/.env.example +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/azure_blob_embedding/.gitignore +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/azure_blob_embedding/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/azure_blob_embedding/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/azure_blob_embedding/pyproject.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/code_embedding/.env +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/code_embedding/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/code_embedding/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/code_embedding/pyproject.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/custom_output_files/.env +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/custom_output_files/.gitignore +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/custom_output_files/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/custom_output_files/data/bizarre_animals.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/custom_output_files/data/chunk_norris.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/custom_output_files/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/custom_output_files/pyproject.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/docs_to_knowledge_graph/.env +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/docs_to_knowledge_graph/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/docs_to_knowledge_graph/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/docs_to_knowledge_graph/pyproject.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/face_recognition/.env +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/face_recognition/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/face_recognition/images/Carter_welcomes_Reagan.jpg +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/face_recognition/images/Solvay_conference_1927.jpg +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/face_recognition/images/Steve_Jobs_and_Bill_Gates_(522695099).jpg +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/face_recognition/images/einplanck3.jpg +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/face_recognition/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/face_recognition/pyproject.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/fastapi_server_docker/.dockerignore +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/fastapi_server_docker/.env +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/fastapi_server_docker/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/fastapi_server_docker/compose.yaml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/fastapi_server_docker/dockerfile +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/fastapi_server_docker/files/1810.04805v2.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/fastapi_server_docker/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/fastapi_server_docker/requirements.txt +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/gdrive_text_embedding/.env.example +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/gdrive_text_embedding/.gitignore +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/gdrive_text_embedding/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/gdrive_text_embedding/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/gdrive_text_embedding/pyproject.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/.env +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/colpali_main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/frontend/.gitignore +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/frontend/index.html +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/frontend/package-lock.json +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/frontend/package.json +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/frontend/src/App.jsx +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/frontend/src/main.jsx +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/frontend/src/style.css +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/frontend/vite.config.js +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/img/cat1.jpeg +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/img/dog1.jpeg +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/img/elephant1.jpg +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/img/giraffe.jpg +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/pyproject.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/live_updates/.env +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/live_updates/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/live_updates/data/bizarre_animals.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/live_updates/data/chunk_norris.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/live_updates/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/live_updates/pyproject.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/manuals_llm_extraction/.env +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/manuals_llm_extraction/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/manuals_llm_extraction/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/manuals_llm_extraction/manuals/array.pdf +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/manuals_llm_extraction/manuals/base64.pdf +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/manuals_llm_extraction/manuals/copy.pdf +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/manuals_llm_extraction/manuals/glob.pdf +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/.env +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/pyproject.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/source_files/1706.03762v7.pdf +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/source_files/1810.04805v2.pdf +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/source_files/2502.06786v3.pdf +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/source_files/healthcare_industry_test_p101.jpg +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/source_files/healthcare_industry_test_p86.jpg +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/source_files/healthcare_industry_test_p9.jpg +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/source_files/restaurant_brands_international_2023.jpg +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/source_files/sweetgreen_2023.jpg +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/.env.example +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/.gitignore +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/papers/1706.03762v7.pdf +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/papers/1810.04805v2.pdf +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/papers/2502.06786v3.pdf +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/papers/2502.20346v1.pdf +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/.env.example +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/data/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/data/patient_forms/Patient_Intake_Form_David_Artificial.docx +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/data/patient_forms/Patient_Intake_Form_Emily_Artificial.pdf +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/data/patient_forms/Patient_Intake_Form_Joe_Artificial.pdf +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/data/patient_forms/Patient_Intake_From_Jane_Artificial.docx +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/pyproject.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/pdf_embedding/.env +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/pdf_embedding/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/pdf_embedding/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/pdf_embedding/pdf_files/1706.03762v7.pdf +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/pdf_embedding/pdf_files/1810.04805v2.pdf +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/pdf_embedding/pdf_files/rfc8259.pdf +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/postgres_source/.env +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/postgres_source/.env.example +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/postgres_source/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/postgres_source/prepare_source_data.sql +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/postgres_source/pyproject.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/img/cocoinsight.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/img/neo4j.png +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p1.json +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p2.json +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p3.json +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p4.json +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p5.json +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p6.json +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p7.json +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p8.json +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p9.json +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/pyproject.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding/.env +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding/Text_Embedding.ipynb +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding/markdown_files/1706.03762v7.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding/markdown_files/1810.04805v2.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding/markdown_files/rfc8259.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding/pyproject.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding_qdrant/.env +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding_qdrant/README.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding_qdrant/main.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding_qdrant/markdown_files/rfc8259.md +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding_qdrant/pyproject.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/pyproject.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/__init__.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/auth_registry.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/cli.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/flow.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/functions.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/index.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/lib.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/llm.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/op.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/py.typed +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/runtime.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/setup.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/sources.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/targets.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/tests/__init__.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/tests/test_convert.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/tests/test_optional_database.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/tests/test_transform_flow.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/tests/test_typing.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/tests/test_validation.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/user_app_loader.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/utils.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/validation.py +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/ruff.toml +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/base/duration.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/base/field_attrs.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/base/json_schema.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/base/mod.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/base/spec.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/builder/analyzed_flow.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/builder/flow_builder.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/builder/mod.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/db_tracking.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/db_tracking_setup.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/indexing_status.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/live_updater.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/memoization.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/mod.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/lib.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/anthropic.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/gemini.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/litellm.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/mod.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/ollama.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/openai.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/openrouter.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/vllm.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/voyage.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/factory_bases.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/functions/embed_text.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/functions/extract_by_llm.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/functions/mod.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/functions/parse_json.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/functions/test_utils.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/mod.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/py_factory.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/registration.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/registry.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sdk.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/shared/mod.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/shared/postgres.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sources/mod.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sources/shared/mod.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sources/shared/pattern_matcher.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/targets/mod.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/targets/postgres.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/targets/qdrant.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/targets/shared/mod.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/targets/shared/table_columns.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/prelude.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/py/mod.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/server.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/service/error.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/service/mod.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/settings.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/setup/auth_registry.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/setup/components.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/setup/db_metadata.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/setup/driver.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/setup/flow_features.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/setup/mod.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/utils/concur_control.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/utils/db.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/utils/fingerprint.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/utils/immutable.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/utils/mod.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/utils/retryable.rs +0 -0
- {cocoindex-0.1.83 → cocoindex-0.2.1}/src/utils/yaml_ser.rs +0 -0
@@ -2,7 +2,7 @@
 name = "cocoindex"
 # Version used for local development is always higher than others to take precedence.
 # Will be overridden for specific release versions.
-version = "0.1
+version = "0.2.1"
 edition = "2024"
 rust-version = "1.88"
 readme = "README.md"
@@ -23,7 +23,7 @@ Each piece of data has a **data type**, falling into one of the following catego
 
 * *Basic type*.
 * *Struct type*: a collection of **fields**, each with a name and a type.
-* *Table type*: a collection of **rows**, each of which is a struct with specified schema. A table type can be a *KTable* (
+* *Table type*: a collection of **rows**, each of which is a struct with specified schema. A table type can be a *KTable* (with key columns that uniquely identify each row) or a *LTable* (rows are ordered but without keys).
 
 An indexing flow always has a top-level struct, containing all data within and managed by the flow.
 
@@ -148,21 +148,27 @@ We have two specific types of *Table* types: *KTable* and *LTable*.
 
 #### KTable
 
-*KTable* is a *Table* type whose
+*KTable* is a *Table* type whose one or more columns together serve as the key.
 The row order of a *KTable* is not preserved.
-
+Each key column must be a [key type](#key-types). When multiple key columns are present, they form a composite key.
 
-In Python, a *KTable* type is represented by `dict[K, V]`.
-
-
-
-
+In Python, a *KTable* type is represented by `dict[K, V]`.
+`K` represents the key and `V` represents the value for each row:
+
+- `K` can be a Struct type (either a frozen dataclass or a `NamedTuple`) that contains all key parts as fields. This is the general way to model multi-part keys.
+- When there is only a single key part and it is a basic type (e.g. `str`, `int`), you may use that basic type directly as the dictionary key instead of wrapping it in a Struct.
+- `V` should be the type bound to a *Struct* representing the non-key value fields of each row.
+
+When a specific type annotation is not provided:
+- For composite keys (multiple key parts), the key binds to a Python tuple of the key parts, e.g. `tuple[str, str]`.
+- For a single basic key part, the key binds to that basic Python type.
+- The value binds to `dict[str, Any]`.
 
 
 For example, you can use `dict[str, Person]` or `dict[str, PersonTuple]` to represent a *KTable*, with 4 columns: key (*Str*), `first_name` (*Str*), `last_name` (*Str*), `dob` (*Date*).
 It's bound to `dict[str, dict[str, Any]]` if you don't annotate the function argument with a specific type.
 
-Note that
+Note that when using a Struct as the key, it must be immutable in Python. For a dataclass, annotate it with `@dataclass(frozen=True)`. For `NamedTuple`, immutability is built-in. For example:
 
 ```python
 @dataclass(frozen=True)
@@ -175,8 +181,8 @@ class PersonKeyTuple(NamedTuple):
     id: str
 ```
 
-Then you can use `dict[PersonKey, Person]` or `dict[PersonKeyTuple, PersonTuple]` to represent a KTable keyed by `
-
+Then you can use `dict[PersonKey, Person]` or `dict[PersonKeyTuple, PersonTuple]` to represent a KTable keyed by both `id_kind` and `id`.
+If you don't annotate the function argument with a specific type, it's bound to `dict[tuple[str, str], dict[str, Any]]`.
 
 
 #### LTable
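The KTable binding rules above rest on ordinary Python semantics. A struct used as a `dict` key must be hashable, which is why the docs require a frozen dataclass or a `NamedTuple`. The sketch below is plain Python with no CocoIndex calls. `PersonKey`, `Person`, `MutableKey`, `is_valid_dict_key`, and the sample rows are hypothetical, modeled on the `id_kind`/`id` example in the hunk.

```python
import datetime
from dataclasses import dataclass
from typing import NamedTuple


@dataclass(frozen=True)
class PersonKey:
    """Composite key: id_kind and id together identify a row (hypothetical)."""
    id_kind: str
    id: str


@dataclass
class Person:
    """Non-key value fields of each row (hypothetical)."""
    first_name: str
    last_name: str
    dob: datetime.date


class PersonKeyTuple(NamedTuple):
    """NamedTuple alternative: immutable, hence hashable."""
    id_kind: str
    id: str


# A KTable-shaped dict keyed by both id_kind and id.
people: dict[PersonKey, Person] = {
    PersonKey("employee", "E001"): Person("Ada", "Lovelace", datetime.date(1815, 12, 10)),
}

# Without annotations, a composite key binds to a plain tuple of key parts.
untyped_people: dict[tuple[str, str], dict[str, object]] = {
    ("employee", "E001"): {"first_name": "Ada", "last_name": "Lovelace"},
}


@dataclass
class MutableKey:
    """A non-frozen dataclass sets __hash__ to None, so it cannot be a dict key."""
    id_kind: str
    id: str


def is_valid_dict_key(key: object) -> bool:
    """Return True if `key` is hashable and can therefore be used as a dict key."""
    try:
        hash(key)
    except TypeError:
        return False
    return True
```

This explains why the docs insist on `frozen=True`. A plain `@dataclass` with the default `eq=True` is unhashable, so Python rejects it as a dictionary key before CocoIndex is ever involved.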
@@ -96,11 +96,11 @@ If not set, all flows are in a default unnamed namespace.
 
 :::
 
-* `max_connections` (type: `int`, default: `
+* `max_connections` (type: `int`, default: `25`): The maximum number of connections to keep in the pool.
 
   *Environment variable* for `Settings.database.max_connections`: `COCOINDEX_DATABASE_MAX_CONNECTIONS`
 
-* `min_connections` (type: `int`, default: `
+* `min_connections` (type: `int`, default: `5`): The minimum number of connections to keep in the pool.
 
   *Environment variable* for `Settings.database.min_connections`: `COCOINDEX_DATABASE_MIN_CONNECTIONS`
 
@@ -134,7 +134,7 @@ This is the list of environment variables, each of which has a corresponding fie
 | `COCOINDEX_DATABASE_URL` | `database.url` | Yes |
 | `COCOINDEX_DATABASE_USER` | `database.user` | No |
 | `COCOINDEX_DATABASE_PASSWORD` | `database.password` | No |
-| `COCOINDEX_DATABASE_MAX_CONNECTIONS` | `database.max_connections` | No (default: `
-| `COCOINDEX_DATABASE_MIN_CONNECTIONS` | `database.min_connections` | No (default: `
+| `COCOINDEX_DATABASE_MAX_CONNECTIONS` | `database.max_connections` | No (default: `25`) |
+| `COCOINDEX_DATABASE_MIN_CONNECTIONS` | `database.min_connections` | No (default: `5`) |
 | `COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS` | `global_execution_options.source_max_inflight_rows` | No (default: `1024`) |
 | `COCOINDEX_SOURCE_MAX_INFLIGHT_BYTES` | `global_execution_options.source_max_inflight_bytes` | No |
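The settings hunks change the documented pool defaults to 25 (max) and 5 (min), and map each setting to an environment variable. The sketch below shows how those variables resolve to values. `pool_limits_from_env` is a hypothetical helper, not CocoIndex's own settings loader. It only mirrors the documented variable names and defaults. The `min <= max` sanity check is an added assumption, not something the docs state.

```python
import os
from typing import Mapping

# Defaults as documented in the 0.2.1 settings table.
DEFAULT_MAX_CONNECTIONS = 25
DEFAULT_MIN_CONNECTIONS = 5


def pool_limits_from_env(env: Mapping[str, str] = os.environ) -> tuple[int, int]:
    """Hypothetical helper: resolve (max, min) pool sizes from environment variables.

    Not CocoIndex's implementation; it only mirrors the documented names/defaults.
    """
    max_conn = int(env.get("COCOINDEX_DATABASE_MAX_CONNECTIONS", DEFAULT_MAX_CONNECTIONS))
    min_conn = int(env.get("COCOINDEX_DATABASE_MIN_CONNECTIONS", DEFAULT_MIN_CONNECTIONS))
    # Assumed sanity check: a pool floor above its ceiling makes no sense.
    if min_conn > max_conn:
        raise ValueError(f"min_connections ({min_conn}) exceeds max_connections ({max_conn})")
    return max_conn, min_conn
```

Passing an explicit mapping instead of `os.environ` makes the resolution easy to test without touching the process environment.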
@@ -10,11 +10,10 @@ sidebar_custom_props:
   tags: [vector-index, metadata]
 ---
 
-import { GitHubButton, YouTubeButton } from '../../../src/components/GitHubButton';
+import { GitHubButton, YouTubeButton, DocumentationButton } from '../../../src/components/GitHubButton';
 
 <GitHubButton url="https://github.com/cocoindex-io/cocoindex/tree/main/examples/paper_metadata"/>
 
-
 ## What we will achieve
 
 1. Extract the paper metadata, including file name, title, author information, abstract, and number of pages.
@@ -27,18 +26,8 @@ to answer questions like "Give me all the papers by Jeff Dean."
 
 4. If you want to perform full PDF embedding for the paper, you can extend the flow.
 
-##
-
-- [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres).
-CocoIndex uses PostgreSQL internally for incremental processing.
-- [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai).
-Alternatively, we have native support for Gemini, Ollama, LiteLLM. Check out the [guide](https://cocoindex.io/docs/ai/llm#ollama).
-You can choose your favorite LLM provider and work completely on-premises.
-
-## Define Indexing Flow
-
-To better help you navigate what we will walk through, here is a flow diagram:
-
+## Flow Overview
+
 1. Import a list of papers in PDF.
 2. For each file:
 - Extract the first page of the paper.
@@ -50,9 +39,15 @@ To better help you navigate what we will walk through, here is a flow diagram:
 - Author-to-paper mapping, for author-based query.
 - Embeddings for titles and abstract chunks, for semantic search.
 
-
+## Setup
+
+- [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres).
+CocoIndex uses PostgreSQL internally for incremental processing.
+- [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). Alternatively, CocoIndex natively supports Gemini, Ollama, and LiteLLM. You can choose your favorite LLM provider and work completely on-premises.
 
-
+<DocumentationButton href="https://cocoindex.io/docs/ai/llm" text="LLM" margin="0 0 16px 0" />
+
+## Import the Papers
 
 ```python
 @cocoindex.flow_def(name="PaperMetadata")
@@ -65,12 +60,12 @@ def paper_metadata_flow(
     )
 ```
 
-`flow_builder.add_source` will create a table with sub fields (`filename`, `content`)
-
+`flow_builder.add_source` will create a table with sub fields (`filename`, `content`).
+<DocumentationButton href="https://cocoindex.io/docs/ops/sources" text="Sources" margin="0 0 16px 0" />
 
-
+## Extract and collect metadata
 
-
+### Extract first page for basic info
 
 Define a custom function to extract the first page and number of pages of the PDF.
 
@@ -96,20 +91,19 @@ def extract_basic_info(content: bytes) -> PaperBasicInfo:
 
 ```
 
-Now
-We extract metadata from the first page to minimize processing cost, since the entire PDF can be very large.
+Now plug this into the flow. We extract metadata from the first page to minimize processing cost, since the entire PDF can be very large.
 
 ```python
 with data_scope["documents"].row() as doc:
     doc["basic_info"] = doc["content"].transform(extract_basic_info)
 ```
+
 
-After this step,
+After this step, we should have the basic info of each paper.
 
 ### Parse basic info
 
-We will convert the first page to Markdown using Marker.
-Alternatively, you can easily plug in your favorite PDF parser, such as Docling.
+We will convert the first page to Markdown using Marker. Alternatively, you can easily plug in any PDF parser, such as Docling, using a CocoIndex [custom function](https://cocoindex.io/docs/custom_ops/custom_functions).
 
 Define a marker converter function and cache it, since its initialization is resource-intensive.
 This ensures that the same converter instance is reused for different input files.
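The hunk explains that the Marker converter is cached so that its expensive initialization happens once and the instance is reused across files. The caching code itself is not shown in this diff. The sketch below shows the same build-once pattern with `functools.cache`. `ExpensiveConverter`, `get_converter`, and `to_markdown_sketch` are hypothetical stand-ins, not Marker's or CocoIndex's API, and the "conversion" is a placeholder decode.

```python
import functools


class ExpensiveConverter:
    """Hypothetical stand-in for a heavyweight PDF-to-Markdown converter."""

    instances_created = 0  # counts constructions so the caching is observable

    def __init__(self) -> None:
        # A real converter would load models here; we only count constructions.
        ExpensiveConverter.instances_created += 1

    def convert(self, content: bytes) -> str:
        # Placeholder "conversion": decode bytes as UTF-8 text.
        return content.decode("utf-8", errors="replace")


@functools.cache
def get_converter() -> ExpensiveConverter:
    """Build the converter on first call; later calls return the same instance."""
    return ExpensiveConverter()


def to_markdown_sketch(content: bytes) -> str:
    """Per-file work reuses the cached converter instead of rebuilding it."""
    return get_converter().convert(content)
```

Because `get_converter` takes no arguments, `functools.cache` holds exactly one entry, so every input file shares a single converter instance.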
@@ -140,18 +134,20 @@ def pdf_to_markdown(content: bytes) -> str:
 Pass it to your transform
 
 ```python
-with data_scope["documents"].row() as doc:
+with data_scope["documents"].row() as doc:
+    # ... process
     doc["first_page_md"] = doc["basic_info"]["first_page"].transform(
         pdf_to_markdown
     )
 ```
+
 
 After this step, you should have the first page of each paper in Markdown format.
 
-
+### Extract basic info with LLM
 
 Define a schema for LLM extraction. CocoIndex natively supports LLM-structured extraction with complex and nested schemas.
-If you are interested in learning more about nested schemas, refer to [this
+If you are interested in learning more about nested schemas, refer to [this example](https://cocoindex.io/docs/examples/patient_form_extraction).
 
 ```python
 @dataclasses.dataclass
@@ -163,7 +159,6 @@ class PaperMetadata:
     title: str
     authors: list[Author]
     abstract: str
-
 ```
 
 Plug it into the `ExtractByLlm` function. With a dataclass defined, CocoIndex will automatically parse the LLM response into the dataclass.
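`ExtractByLlm` automatically parses the LLM response into the `PaperMetadata` dataclass. The sketch below maps a JSON object onto the nested dataclasses in plain Python to show what that parsing amounts to. It is not CocoIndex's implementation. `Author`'s fields are not shown in this hunk, so the single `name` field is an assumption. The helper `parse_paper_metadata` and the sample JSON are hypothetical.

```python
import dataclasses
import json


@dataclasses.dataclass
class Author:
    name: str  # assumed field; the Author definition is not shown in this hunk


@dataclasses.dataclass
class PaperMetadata:
    title: str
    authors: list[Author]
    abstract: str


def parse_paper_metadata(raw: str) -> PaperMetadata:
    """Hypothetical helper: turn an LLM's JSON output into nested dataclasses."""
    data = json.loads(raw)
    return PaperMetadata(
        title=data["title"],
        # Nested objects become Author instances rather than plain dicts.
        authors=[Author(**a) for a in data.get("authors", [])],
        abstract=data["abstract"],
    )
```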
@@ -181,26 +176,27 @@ doc["metadata"] = doc["first_page_md"].transform(
 ```
 
 After this step, you should have the metadata of each paper.
+
 
-
+### Collect paper metadata
 
 ```python
-
-
-
-
-
-
-
-
-
-
-
+paper_metadata = data_scope.add_collector()
+with data_scope["documents"].row() as doc:
+    # ... process
+    # Collect metadata
+    paper_metadata.collect(
+        filename=doc["filename"],
+        title=doc["metadata"]["title"],
+        authors=doc["metadata"]["authors"],
+        abstract=doc["metadata"]["abstract"],
+        num_pages=doc["basic_info"]["num_pages"],
+    )
 ```
 
 Just collect anything you need :)
 
-
+### Collect `author` to `filename` information
 We’ve already extracted author list. Here we want to collect Author → Papers in a separate table to build a look up functionality.
 Simply collect by author.
 
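Collecting one row per (author, filename) pair in a separate table builds an inverted index from authors to papers. The plain-Python sketch below shows the lookup that table enables, for example "all papers by Jeff Dean". The input shape (filename mapped to extracted author names) and `build_author_index` are hypothetical. In the real flow, CocoIndex collects and exports these rows to Postgres.

```python
from collections import defaultdict


def build_author_index(papers: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each author name to the sorted filenames of their papers.

    `papers` maps a filename to its extracted author names (hypothetical shape).
    """
    index: defaultdict[str, set[str]] = defaultdict(set)
    for filename, authors in papers.items():
        for author in authors:
            index[author].add(filename)  # a set de-duplicates repeated author entries
    return {author: sorted(files) for author, files in index.items()}
```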
@@ -216,9 +212,9 @@ with data_scope["documents"].row() as doc:
 ```
 
 
-
+## Compute and collect embeddings
 
-
+### Title
 
 ```python
 doc["title_embedding"] = doc["metadata"]["title"].transform(
@@ -228,7 +224,7 @@ doc["title_embedding"] = doc["metadata"]["title"].transform(
 )
 ```
 
-
+### Abstract
 
 Split abstract into chunks, embed each chunk and collect their embeddings.
 Sometimes the abstract could be very long.
@@ -252,6 +248,8 @@ doc["abstract_chunks"] = doc["metadata"]["abstract"].transform(
 
 After this step, you should have the abstract chunks of each paper.
 
+
+
 Embed each chunk and collect their embeddings.
 
 ```python
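The abstract is split into chunks before embedding because it can be long. The splitting function the flow actually uses is not shown in these hunks. For intuition only, here is a naive fixed-size character splitter with overlap. `split_with_overlap` and its default `chunk_size`/`overlap` values are hypothetical. This is not CocoIndex's splitter, which is more structure-aware.

```python
def split_with_overlap(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Naive character-window splitter (illustration only, not CocoIndex's splitter).

    Consecutive chunks share `overlap` characters so that context spanning a
    boundary is not lost entirely.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

For example, `split_with_overlap("abcdefghij", chunk_size=4, overlap=1)` yields `["abcd", "defg", "ghij"]`.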
@@ -265,7 +263,9 @@ with doc["abstract_chunks"].row() as chunk:
 
 After this step, you should have the embeddings of the abstract chunks of each paper.
 
-
+
+
+### Collect embeddings
 
 ```python
 metadata_embeddings = data_scope.add_collector()
@@ -292,7 +292,7 @@ with data_scope["documents"].row() as doc:
 )
 ```
 
-
+## Export
 Finally, we export the data to Postgres.
 
 ```python
@@ -319,14 +319,9 @@ metadata_embeddings.export(
 )
 ```
 
-In this example we use PGVector as embedding
-With CocoIndex, you can do one line switch on other supported Vector databases like Qdrant, see this [guide](https://cocoindex.io/docs/ops/targets#entry-oriented-targets) for more details.
-We aim to standardize interfaces and make it like assembling building blocks.
+In this example, we use PGVector as the embedding store. With CocoIndex, switching to another supported vector database is a one-line change.
 
-
-
-You can walk through the project step by step in [CocoInsight](https://www.youtube.com/watch?v=MMrpUfUcZPk) to see
-exactly how each field is constructed and what happens behind the scenes.
+<DocumentationButton href="https://cocoindex.io/docs/ops/targets#entry-oriented-targets" text="Entry Oriented Targets" margin="0 0 16px 0" />
 
 ## Query the index
 
@@ -338,3 +333,14 @@ For now CocoIndex doesn't provide additional query interface. We can write SQL o
 - The query space has excellent solutions for querying, reranking, and other search-related functionality.
 
 If you need assist with writing the query, please feel free to reach out to us at [Discord](https://discord.com/invite/zpA9S2DR7s).
+
+## CocoInsight
+
+You can walk through the project step by step in [CocoInsight](https://www.youtube.com/watch?v=MMrpUfUcZPk) to see exactly how each field is constructed and what happens behind the scenes.
+
+
+```sh
+cocoindex server -ci main.py
+```
+
+Open the URL `https://cocoindex.io/cocoinsight`. It connects to your local CocoIndex server, with zero pipeline data retention.
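The "Query the index" section leaves retrieval to SQL against the exported Postgres table. The plain-Python sketch below ranks stored embeddings by cosine similarity to make concrete what such a vector query computes. pgvector's `<=>` operator returns cosine distance, which is 1 minus cosine similarity, so ordering by ascending distance gives the same ranking. The helper names, the 3-dimensional vectors, and the filenames are all made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], rows: list[tuple[str, list[float]]], k: int = 3) -> list[tuple[str, float]]:
    """Return (filename, similarity) pairs for the k rows most similar to `query`."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```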
@@ -35,8 +35,10 @@ and then build a knowledge graph.
 ## Setup
 * [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres). CocoIndex uses PostgreSQL internally for incremental processing.
 * [Install Neo4j](https://cocoindex.io/docs/ops/targets#neo4j-dev-instance), a graph database.
-* [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai).
-
+* [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). Alternatively, CocoIndex natively supports Gemini, Ollama, and LiteLLM. You can choose your favorite LLM provider and work completely on-premises.
+
+<DocumentationButton href="https://cocoindex.io/docs/ai/llm" text="LLM" margin="0 0 16px 0" />
+
 
 ## Documentation
 <DocumentationButton href="https://cocoindex.io/docs/ops/targets#property-graph-targets" text="Property Graph Targets" margin="0 0 16px 0" />
@@ -0,0 +1,249 @@
+---
+title: Extract Structured Data from Python Manual markdowns with Ollama
+description: Extract structured data from markdowns (Python Manual)
+sidebar_class_name: hidden
+slug: /examples/manual_extraction
+canonicalUrl: '/examples/manual_extraction'
+sidebar_custom_props:
+  image: /img/examples/manual_extraction/cover.png
+  tags: [structured-data-extraction, data-mapping]
+tags: [structured-data-extraction, data-mapping]
+---
+
+import { GitHubButton, YouTubeButton, DocumentationButton } from '../../../src/components/GitHubButton';
+
+<GitHubButton url="https://github.com/cocoindex-io/cocoindex/tree/main/examples/manuals_llm_extraction"/>
+
+## Overview
+This example shows how to extract structured data from Python Manuals using Ollama.
+
+## Flow Overview
+
+
+- For each PDF file:
+- Parse to markdown.
+- Extract structured data from the markdown using LLM.
+- Add summary to the module info.
+- Collect the data.
+- Export the data to a table.
+
+
+## Prerequisites
+- If you don't have Postgres installed, please refer to the [installation guide](https://cocoindex.io/docs/getting_started/installation).
+
+- [Download](https://ollama.com/download) and install Ollama. Pull your favorite LLM models by:
+```sh
+ollama pull llama3.2
+```
+
+<DocumentationButton href="https://cocoindex.io/docs/ai/llm#ollama" text="Ollama" margin="0 0 16px 0" />
+
+Alternatively, CocoIndex has native support for Gemini, Ollama, and LiteLLM. You can choose your favorite LLM provider and work completely on-premises.
+
+<DocumentationButton href="https://cocoindex.io/docs/ai/llm" text="LLM" margin="0 0 16px 0" />
+
+## Add Source
+Let's add Python docs as a source.
+
+```python
+@cocoindex.flow_def(name="ManualExtraction")
+def manual_extraction_flow(
+    flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
+):
+    """
+    Define an example flow that extracts manual information from a Markdown.
+    """
+    data_scope["documents"] = flow_builder.add_source(
+        cocoindex.sources.LocalFile(path="manuals", binary=True)
+    )
+
+    modules_index = data_scope.add_collector()
+```
+
+`flow_builder.add_source` will create a table with the following sub fields:
+- `filename` (key, type: `str`): the filename of the file, e.g. `dir1/file1.md`
+- `content` (type: `str` if `binary` is `False`, otherwise `bytes`): the content of the file
+
+<DocumentationButton href="https://cocoindex.io/docs/ops/sources" text="LocalFile" margin="0 0 16px 0" />
+
+## Parse Markdown
+
+To do this, we can plug in a custom function that converts PDF to Markdown. Many commercial and open-source parsers are available, so you can bring your own parser here.
+
+```python
+class PdfToMarkdown(cocoindex.op.FunctionSpec):
+    """Convert a PDF to markdown."""
+
+
+@cocoindex.op.executor_class(gpu=True, cache=True, behavior_version=1)
+class PdfToMarkdownExecutor:
+    """Executor for PdfToMarkdown."""
+
+    spec: PdfToMarkdown
+    _converter: PdfConverter
+
+    def prepare(self):
+        config_parser = ConfigParser({})
+        self._converter = PdfConverter(
+            create_model_dict(), config=config_parser.generate_config_dict()
+        )
+
+    def __call__(self, content: bytes) -> str:
+        with tempfile.NamedTemporaryFile(delete=True, suffix=".pdf") as temp_file:
+            temp_file.write(content)
+            temp_file.flush()
+            text, _, _ = text_from_rendered(self._converter(temp_file.name))
+            return text
+```
+You may wonder why we define a spec + executor here instead of a standalone function. The main reason is that some heavy preparation work (initializing the parser) needs to be done before the function is ready to process real data.
+
+<DocumentationButton href="https://cocoindex.io/docs/custom_ops/custom_functions" text="Custom Function" margin="0 0 16px 0" />
+
+Plug the function into the flow.
+
+```python
+with data_scope["documents"].row() as doc:
+    doc["markdown"] = doc["content"].transform(PdfToMarkdown())
+```
+
+It transforms each document to markdown.
+
+
+## Extract Structured Data from Markdown files
+### Define schema
+Let's define the schema `ModuleInfo` using Python dataclasses, and we can pass it to the LLM to extract the structured data. It's easy to do this with CocoIndex.
+
+```python
+@dataclasses.dataclass
+class ArgInfo:
+    """Information about an argument of a method."""
+    name: str
+    description: str
+
+@dataclasses.dataclass
+class MethodInfo:
+    """Information about a method."""
+    name: str
+    args: cocoindex.typing.List[ArgInfo]
+    description: str
+
+@dataclasses.dataclass
+class ClassInfo:
+    """Information about a class."""
+    name: str
+    description: str
+    methods: cocoindex.typing.List[MethodInfo]
+
+@dataclasses.dataclass
+class ModuleInfo:
+    """Information about a Python module."""
+    title: str
+    description: str
+    classes: cocoindex.typing.List[ClassInfo]
+    methods: cocoindex.typing.List[MethodInfo]
+```
+
+### Extract structured data
+
+CocoIndex provides built-in functions (e.g. `ExtractByLlm`) that process data using an LLM. This example uses Ollama.
+
+```python
+with data_scope["documents"].row() as doc:
+    doc["module_info"] = doc["markdown"].transform(
+        cocoindex.functions.ExtractByLlm(
+            llm_spec=cocoindex.LlmSpec(
+                api_type=cocoindex.LlmApiType.OLLAMA,
+                # See the full list of models: https://ollama.com/library
+                model="llama3.2"
+            ),
+            output_type=ModuleInfo,
+            instruction="Please extract Python module information from the manual."))
+```
+
+<DocumentationButton href="https://cocoindex.io/docs/core/functions#extractbyllm" text="ExtractByLlm" margin="0 0 16px 0" />
+
+
+
+## Add summarization to module info
+Using CocoIndex as a framework, you can easily add any transformation on the data and collect it as part of the data index. Let's add a simple summary to each module, such as the number of classes and methods, using a plain Python function.
+
+### Define Schema
+```python
+@dataclasses.dataclass
+class ModuleSummary:
+    """Summary info about a Python module."""
+    num_classes: int
+    num_methods: int
+```
+
+### A simple custom function to summarize the data
+```python
+@cocoindex.op.function()
+def summarize_module(module_info: ModuleInfo) -> ModuleSummary:
+    """Summarize a Python module."""
+    return ModuleSummary(
+        num_classes=len(module_info.classes),
+        num_methods=len(module_info.methods),
+    )
+```
+
+### Plug the function into the flow
+```python
+with data_scope["documents"].row() as doc:
+    # ... after the extraction
+    doc["module_summary"] = doc["module_info"].transform(summarize_module)
+```
+
+<DocumentationButton href="https://cocoindex.io/docs/custom_ops/custom_functions" text="Custom Function" margin="0 0 16px 0" />
+
+
+
+## Collect the data
+
+
+After the extraction, we cherry-pick whatever we need from the output using the `collect` function of the collector defined above on the data scope.
+
+```python
+modules_index.collect(
+    filename=doc["filename"],
+    module_info=doc["module_info"],
+)
+```
+
+Finally, let's export the extracted data to a table.
+
+```python
+modules_index.export(
+    "modules",
+    cocoindex.storages.Postgres(table_name="modules_info"),
+    primary_key_fields=["filename"],
+)
+```
+
+## Query and test your index
+Run the following command to set up and update the index.
+```sh
+cocoindex update -L main.py
+```
+You'll see the index update status in the terminal.
+
+After the index is built, you have a table with the name `modules_info`. You can query it at any time, e.g., start a Postgres shell:
+
+```bash
+psql postgres://cocoindex:cocoindex@localhost/cocoindex
+```
+
+And run the SQL query:
+
+```sql
+SELECT filename, module_info->'title' AS title, module_summary FROM modules_info;
+```
+
+## CocoInsight
+[CocoInsight](https://www.youtube.com/watch?v=ZnmyoHslBSc) is a really cool tool to help you understand your data pipeline and data index. It is in Early Access now (Free).
+
+```sh
+cocoindex server -ci main.py
+```
+The CocoInsight dashboard is at `https://cocoindex.io/cocoinsight`. It connects to your local CocoIndex server with zero data retention.
+
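`summarize_module` only counts list lengths on the extracted `ModuleInfo`. That logic can be checked without running an LLM or Ollama. The standalone version below uses plain dataclasses with built-in `list[...]` in place of `cocoindex.typing.List`, and omits the `@cocoindex.op.function()` decorator so it runs without CocoIndex installed. The sample `json` module data in the test is made up.

```python
import dataclasses


@dataclasses.dataclass
class ArgInfo:
    name: str
    description: str


@dataclasses.dataclass
class MethodInfo:
    name: str
    args: list[ArgInfo]
    description: str


@dataclasses.dataclass
class ClassInfo:
    name: str
    description: str
    methods: list[MethodInfo]


@dataclasses.dataclass
class ModuleInfo:
    title: str
    description: str
    classes: list[ClassInfo]
    methods: list[MethodInfo]


@dataclasses.dataclass
class ModuleSummary:
    num_classes: int
    num_methods: int


def summarize_module(module_info: ModuleInfo) -> ModuleSummary:
    # Same logic as the example: num_methods counts module-level methods only,
    # not the methods nested inside each class.
    return ModuleSummary(
        num_classes=len(module_info.classes),
        num_methods=len(module_info.methods),
    )
```

Note the design choice the example inherits: methods belonging to classes are not included in `num_methods`. If a total across classes is wanted, it would need an explicit sum over `module_info.classes`.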