cocoindex 0.1.83__tar.gz → 0.2.1__tar.gz

Files changed (401)
  1. {cocoindex-0.1.83 → cocoindex-0.2.1}/Cargo.lock +1 -1
  2. {cocoindex-0.1.83 → cocoindex-0.2.1}/Cargo.toml +1 -1
  3. {cocoindex-0.1.83 → cocoindex-0.2.1}/PKG-INFO +1 -1
  4. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/core/basics.md +1 -1
  5. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/core/data_types.mdx +16 -10
  6. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/core/settings.mdx +4 -4
  7. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/academic_papers_index.md +60 -54
  8. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/docs_to_knowledge_graph.md +4 -2
  9. cocoindex-0.2.1/docs/docs/examples/examples/manual_extraction.md +249 -0
  10. cocoindex-0.2.1/docs/docs/examples/examples/patient_form_extraction.md +296 -0
  11. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/photo_search.md +36 -15
  12. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/product_recommendation.md +87 -66
  13. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/simple_vector_index.md +44 -23
  14. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/getting_started/quickstart.md +1 -1
  15. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/ops/sources.md +43 -0
  16. cocoindex-0.2.1/docs/static/img/examples/academic_papers_index/abstract_chunks.png +0 -0
  17. cocoindex-0.2.1/docs/static/img/examples/academic_papers_index/basic_info.png +0 -0
  18. cocoindex-0.2.1/docs/static/img/examples/academic_papers_index/chunk_embedding.png +0 -0
  19. cocoindex-0.2.1/docs/static/img/examples/academic_papers_index/cover.png +0 -0
  20. cocoindex-0.2.1/docs/static/img/examples/academic_papers_index/first_page.png +0 -0
  21. cocoindex-0.2.1/docs/static/img/examples/academic_papers_index/flow.png +0 -0
  22. cocoindex-0.2.1/docs/static/img/examples/academic_papers_index/metadata.png +0 -0
  23. cocoindex-0.2.1/docs/static/img/examples/manual_extraction/cover.png +0 -0
  24. cocoindex-0.2.1/docs/static/img/examples/manual_extraction/extraction.png +0 -0
  25. cocoindex-0.2.1/docs/static/img/examples/manual_extraction/flow.png +0 -0
  26. cocoindex-0.2.1/docs/static/img/examples/manual_extraction/summary.png +0 -0
  27. cocoindex-0.2.1/docs/static/img/examples/patient_form_extraction/cover.png +0 -0
  28. cocoindex-0.2.1/docs/static/img/examples/patient_form_extraction/extraction.png +0 -0
  29. cocoindex-0.2.1/docs/static/img/examples/patient_form_extraction/fields.png +0 -0
  30. cocoindex-0.2.1/docs/static/img/examples/patient_form_extraction/flow.png +0 -0
  31. cocoindex-0.2.1/docs/static/img/examples/patient_form_extraction/tomarkdown.png +0 -0
  32. cocoindex-0.2.1/docs/static/img/examples/photo_search/cover.png +0 -0
  33. cocoindex-0.2.1/docs/static/img/examples/photo_search/extraction.png +0 -0
  34. cocoindex-0.2.1/docs/static/img/examples/photo_search/flow.png +0 -0
  35. cocoindex-0.2.1/docs/static/img/examples/product_recommendation/cover.png +0 -0
  36. cocoindex-0.2.1/docs/static/img/examples/product_recommendation/dedupe.png +0 -0
  37. cocoindex-0.2.1/docs/static/img/examples/product_recommendation/export_all.png +0 -0
  38. cocoindex-0.2.1/docs/static/img/examples/product_recommendation/export_product.png +0 -0
  39. cocoindex-0.2.1/docs/static/img/examples/product_recommendation/export_taxonomy.png +0 -0
  40. cocoindex-0.2.1/docs/static/img/examples/product_recommendation/extract_product.png +0 -0
  41. cocoindex-0.2.1/docs/static/img/examples/product_recommendation/extract_taxonomy.png +0 -0
  42. cocoindex-0.2.1/docs/static/img/examples/product_recommendation/neo4j.png +0 -0
  43. cocoindex-0.2.1/docs/static/img/examples/product_recommendation/parse_json.png +0 -0
  44. cocoindex-0.2.1/docs/static/img/examples/product_recommendation/taxonomy.png +0 -0
  45. cocoindex-0.2.1/docs/static/img/examples/simple_vector_index/chunk.png +0 -0
  46. cocoindex-0.2.1/docs/static/img/examples/simple_vector_index/cover.png +0 -0
  47. cocoindex-0.2.1/docs/static/img/examples/simple_vector_index/embed.png +0 -0
  48. cocoindex-0.2.1/docs/static/img/examples/simple_vector_index/flow.png +0 -0
  49. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/manuals_llm_extraction/pyproject.toml +1 -1
  50. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/pyproject.toml +2 -2
  51. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/pdf_embedding/pyproject.toml +1 -1
  52. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/postgres_source/main.py +4 -4
  53. cocoindex-0.1.83/examples/product_recommendation/.env → cocoindex-0.2.1/examples/product_recommendation/.env.example +2 -0
  54. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/convert.py +63 -46
  55. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/setting.py +2 -2
  56. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/subprocess_exec.py +15 -1
  57. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/typing.py +37 -22
  58. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/base/schema.rs +35 -37
  59. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/base/value.rs +221 -77
  60. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/builder/analyzer.rs +2 -7
  61. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/builder/exec_ctx.rs +38 -7
  62. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/builder/plan.rs +2 -1
  63. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/dumper.rs +10 -10
  64. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/evaluator.rs +16 -13
  65. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/row_indexer.rs +7 -8
  66. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/source_indexer.rs +8 -9
  67. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/stats.rs +17 -1
  68. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/lib_context.rs +3 -3
  69. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/functions/split_recursively.rs +16 -10
  70. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/interface.rs +3 -3
  71. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sources/amazon_s3.rs +5 -5
  72. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sources/azure_blob.rs +4 -4
  73. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sources/google_drive.rs +5 -5
  74. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sources/local_file.rs +6 -8
  75. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sources/postgres.rs +23 -74
  76. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/targets/kuzu.rs +10 -11
  77. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/targets/neo4j.rs +2 -3
  78. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/targets/shared/property_graph.rs +1 -1
  79. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/py/convert.rs +32 -21
  80. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/service/flows.rs +10 -18
  81. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/setup/states.rs +7 -1
  82. cocoindex-0.1.83/docs/docs/examples/examples/manual_extraction.md +0 -274
  83. cocoindex-0.1.83/docs/docs/examples/examples/patient_form_extraction.md +0 -271
  84. cocoindex-0.1.83/docs/static/img/examples/academic_papers_index/cover.png +0 -0
  85. cocoindex-0.1.83/docs/static/img/examples/manual_extraction/cover.png +0 -0
  86. cocoindex-0.1.83/docs/static/img/examples/patient_form_extraction/cover.png +0 -0
  87. cocoindex-0.1.83/docs/static/img/examples/photo_search/cover.png +0 -0
  88. cocoindex-0.1.83/docs/static/img/examples/product_recommendation/cover.png +0 -0
  89. cocoindex-0.1.83/docs/static/img/examples/simple_vector_index/cover.png +0 -0
  90. {cocoindex-0.1.83 → cocoindex-0.2.1}/.cargo/config.toml +0 -0
  91. {cocoindex-0.1.83 → cocoindex-0.2.1}/.env.lib_debug +0 -0
  92. {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/ISSUE_TEMPLATE/🐛-bug-report.md +0 -0
  93. {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/ISSUE_TEMPLATE/💡-feature-request.md +0 -0
  94. {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/scripts/update_version.sh +0 -0
  95. {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/workflows/CI.yml +0 -0
  96. {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/workflows/_doc_release.yml +0 -0
  97. {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/workflows/_test.yml +0 -0
  98. {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/workflows/docs.yml +0 -0
  99. {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/workflows/format.yml +0 -0
  100. {cocoindex-0.1.83 → cocoindex-0.2.1}/.github/workflows/release.yml +0 -0
  101. {cocoindex-0.1.83 → cocoindex-0.2.1}/.gitignore +0 -0
  102. {cocoindex-0.1.83 → cocoindex-0.2.1}/.pre-commit-config.yaml +0 -0
  103. {cocoindex-0.1.83 → cocoindex-0.2.1}/CODE_OF_CONDUCT.md +0 -0
  104. {cocoindex-0.1.83 → cocoindex-0.2.1}/CONTRIBUTING.md +0 -0
  105. {cocoindex-0.1.83 → cocoindex-0.2.1}/LICENSE +0 -0
  106. {cocoindex-0.1.83 → cocoindex-0.2.1}/README.md +0 -0
  107. {cocoindex-0.1.83 → cocoindex-0.2.1}/dev/neo4j.yaml +0 -0
  108. {cocoindex-0.1.83 → cocoindex-0.2.1}/dev/postgres.yaml +0 -0
  109. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/.gitignore +0 -0
  110. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/README.md +0 -0
  111. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/about/community.md +0 -0
  112. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/ai/llm.mdx +0 -0
  113. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/contributing/guide.md +0 -0
  114. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/contributing/new_built_in_target.mdx +0 -0
  115. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/contributing/setup_dev_environment.md +0 -0
  116. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/core/cli.mdx +0 -0
  117. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/core/data_example.svg +0 -0
  118. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/core/flow_def.mdx +0 -0
  119. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/core/flow_example.svg +0 -0
  120. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/core/flow_methods.mdx +0 -0
  121. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/custom_ops/custom_functions.mdx +0 -0
  122. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/custom_ops/custom_targets.mdx +0 -0
  123. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/codebase_index.md +0 -0
  124. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/custom_targets.md +0 -0
  125. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/image_search.md +0 -0
  126. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/examples/multi_format_index.md +0 -0
  127. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/examples/index.md +0 -0
  128. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/getting_started/installation.md +0 -0
  129. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/getting_started/markdown_files.zip +0 -0
  130. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/getting_started/overview.md +0 -0
  131. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/ops/functions.md +0 -0
  132. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/ops/targets.md +0 -0
  133. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/query.mdx +0 -0
  134. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/tutorials/live_updates.md +0 -0
  135. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docs/tutorials/manage_flow_dynamically.mdx +0 -0
  136. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/docusaurus.config.ts +0 -0
  137. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/package.json +0 -0
  138. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/sidebars.ts +0 -0
  139. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/src/components/GitHubButton/index.tsx +0 -0
  140. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/src/css/custom.css +0 -0
  141. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/src/theme/DocCard/index.tsx +0 -0
  142. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/src/theme/DocCard/styles.module.css +0 -0
  143. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/src/theme/DocCardList/index.tsx +0 -0
  144. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/src/theme/DocCardList/styles.module.css +0 -0
  145. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/src/theme/Root.js +0 -0
  146. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/.nojekyll +0 -0
  147. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/docusaurus.png +0 -0
  148. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/codebase_index/chunk.png +0 -0
  149. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/codebase_index/cover.png +0 -0
  150. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/codebase_index/flow.png +0 -0
  151. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/codebase_index/usecase.png +0 -0
  152. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/custom_targets/convert.png +0 -0
  153. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/custom_targets/cover.png +0 -0
  154. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/docs_to_knowledge_graph/cover.png +0 -0
  155. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/docs_to_knowledge_graph/dedupe.png +0 -0
  156. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/docs_to_knowledge_graph/export_document.png +0 -0
  157. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/docs_to_knowledge_graph/export_relationship.png +0 -0
  158. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/docs_to_knowledge_graph/extract_relationship.png +0 -0
  159. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/docs_to_knowledge_graph/flow.png +0 -0
  160. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/docs_to_knowledge_graph/relationship.png +0 -0
  161. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/docs_to_knowledge_graph/summary.png +0 -0
  162. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/image_search/cover.png +0 -0
  163. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/examples/multi_format_index/cover.png +0 -0
  164. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/favicon.ico +0 -0
  165. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/icon.svg +0 -0
  166. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/img/incremental-etl.gif +0 -0
  167. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/static/robots.txt +0 -0
  168. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/tsconfig.json +0 -0
  169. {cocoindex-0.1.83 → cocoindex-0.2.1}/docs/yarn.lock +0 -0
  170. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/amazon_s3_embedding/.env.example +0 -0
  171. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/amazon_s3_embedding/.gitignore +0 -0
  172. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/amazon_s3_embedding/README.md +0 -0
  173. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/amazon_s3_embedding/main.py +0 -0
  174. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/amazon_s3_embedding/pyproject.toml +0 -0
  175. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/azure_blob_embedding/.env.example +0 -0
  176. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/azure_blob_embedding/.gitignore +0 -0
  177. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/azure_blob_embedding/README.md +0 -0
  178. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/azure_blob_embedding/main.py +0 -0
  179. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/azure_blob_embedding/pyproject.toml +0 -0
  180. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/code_embedding/.env +0 -0
  181. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/code_embedding/README.md +0 -0
  182. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/code_embedding/main.py +0 -0
  183. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/code_embedding/pyproject.toml +0 -0
  184. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/custom_output_files/.env +0 -0
  185. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/custom_output_files/.gitignore +0 -0
  186. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/custom_output_files/README.md +0 -0
  187. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/custom_output_files/data/bizarre_animals.md +0 -0
  188. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/custom_output_files/data/chunk_norris.md +0 -0
  189. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/custom_output_files/main.py +0 -0
  190. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/custom_output_files/pyproject.toml +0 -0
  191. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/docs_to_knowledge_graph/.env +0 -0
  192. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/docs_to_knowledge_graph/README.md +0 -0
  193. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/docs_to_knowledge_graph/main.py +0 -0
  194. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/docs_to_knowledge_graph/pyproject.toml +0 -0
  195. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/face_recognition/.env +0 -0
  196. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/face_recognition/README.md +0 -0
  197. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/face_recognition/images/Carter_welcomes_Reagan.jpg +0 -0
  198. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/face_recognition/images/Solvay_conference_1927.jpg +0 -0
  199. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/face_recognition/images/Steve_Jobs_and_Bill_Gates_(522695099).jpg +0 -0
  200. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/face_recognition/images/einplanck3.jpg +0 -0
  201. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/face_recognition/main.py +0 -0
  202. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/face_recognition/pyproject.toml +0 -0
  203. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/fastapi_server_docker/.dockerignore +0 -0
  204. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/fastapi_server_docker/.env +0 -0
  205. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/fastapi_server_docker/README.md +0 -0
  206. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/fastapi_server_docker/compose.yaml +0 -0
  207. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/fastapi_server_docker/dockerfile +0 -0
  208. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/fastapi_server_docker/files/1810.04805v2.md +0 -0
  209. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/fastapi_server_docker/main.py +0 -0
  210. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/fastapi_server_docker/requirements.txt +0 -0
  211. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/gdrive_text_embedding/.env.example +0 -0
  212. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/gdrive_text_embedding/.gitignore +0 -0
  213. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/gdrive_text_embedding/README.md +0 -0
  214. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/gdrive_text_embedding/main.py +0 -0
  215. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/gdrive_text_embedding/pyproject.toml +0 -0
  216. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/.env +0 -0
  217. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/README.md +0 -0
  218. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/colpali_main.py +0 -0
  219. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/frontend/.gitignore +0 -0
  220. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/frontend/index.html +0 -0
  221. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/frontend/package-lock.json +0 -0
  222. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/frontend/package.json +0 -0
  223. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/frontend/src/App.jsx +0 -0
  224. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/frontend/src/main.jsx +0 -0
  225. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/frontend/src/style.css +0 -0
  226. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/frontend/vite.config.js +0 -0
  227. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/img/cat1.jpeg +0 -0
  228. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/img/dog1.jpeg +0 -0
  229. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/img/elephant1.jpg +0 -0
  230. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/img/giraffe.jpg +0 -0
  231. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/main.py +0 -0
  232. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/image_search/pyproject.toml +0 -0
  233. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/live_updates/.env +0 -0
  234. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/live_updates/README.md +0 -0
  235. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/live_updates/data/bizarre_animals.md +0 -0
  236. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/live_updates/data/chunk_norris.md +0 -0
  237. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/live_updates/main.py +0 -0
  238. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/live_updates/pyproject.toml +0 -0
  239. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/manuals_llm_extraction/.env +0 -0
  240. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/manuals_llm_extraction/README.md +0 -0
  241. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/manuals_llm_extraction/main.py +0 -0
  242. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/manuals_llm_extraction/manuals/array.pdf +0 -0
  243. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/manuals_llm_extraction/manuals/base64.pdf +0 -0
  244. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/manuals_llm_extraction/manuals/copy.pdf +0 -0
  245. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/manuals_llm_extraction/manuals/glob.pdf +0 -0
  246. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/.env +0 -0
  247. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/README.md +0 -0
  248. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/main.py +0 -0
  249. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/pyproject.toml +0 -0
  250. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/source_files/1706.03762v7.pdf +0 -0
  251. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/source_files/1810.04805v2.pdf +0 -0
  252. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/source_files/2502.06786v3.pdf +0 -0
  253. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/source_files/healthcare_industry_test_p101.jpg +0 -0
  254. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/source_files/healthcare_industry_test_p86.jpg +0 -0
  255. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/source_files/healthcare_industry_test_p9.jpg +0 -0
  256. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/source_files/restaurant_brands_international_2023.jpg +0 -0
  257. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/multi_format_indexing/source_files/sweetgreen_2023.jpg +0 -0
  258. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/.env.example +0 -0
  259. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/.gitignore +0 -0
  260. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/README.md +0 -0
  261. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/main.py +0 -0
  262. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/papers/1706.03762v7.pdf +0 -0
  263. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/papers/1810.04805v2.pdf +0 -0
  264. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/papers/2502.06786v3.pdf +0 -0
  265. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/paper_metadata/papers/2502.20346v1.pdf +0 -0
  266. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/.env.example +0 -0
  267. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/README.md +0 -0
  268. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/data/README.md +0 -0
  269. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/data/patient_forms/Patient_Intake_Form_David_Artificial.docx +0 -0
  270. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/data/patient_forms/Patient_Intake_Form_Emily_Artificial.pdf +0 -0
  271. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/data/patient_forms/Patient_Intake_Form_Joe_Artificial.pdf +0 -0
  272. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/data/patient_forms/Patient_Intake_From_Jane_Artificial.docx +0 -0
  273. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/main.py +0 -0
  274. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/patient_intake_extraction/pyproject.toml +0 -0
  275. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/pdf_embedding/.env +0 -0
  276. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/pdf_embedding/README.md +0 -0
  277. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/pdf_embedding/main.py +0 -0
  278. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/pdf_embedding/pdf_files/1706.03762v7.pdf +0 -0
  279. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/pdf_embedding/pdf_files/1810.04805v2.pdf +0 -0
  280. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/pdf_embedding/pdf_files/rfc8259.pdf +0 -0
  281. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/postgres_source/.env +0 -0
  282. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/postgres_source/.env.example +0 -0
  283. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/postgres_source/README.md +0 -0
  284. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/postgres_source/prepare_source_data.sql +0 -0
  285. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/postgres_source/pyproject.toml +0 -0
  286. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/README.md +0 -0
  287. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/img/cocoinsight.png +0 -0
  288. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/img/neo4j.png +0 -0
  289. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/main.py +0 -0
  290. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p1.json +0 -0
  291. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p2.json +0 -0
  292. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p3.json +0 -0
  293. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p4.json +0 -0
  294. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p5.json +0 -0
  295. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p6.json +0 -0
  296. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p7.json +0 -0
  297. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p8.json +0 -0
  298. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/products/p9.json +0 -0
  299. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/product_recommendation/pyproject.toml +0 -0
  300. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding/.env +0 -0
  301. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding/README.md +0 -0
  302. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding/Text_Embedding.ipynb +0 -0
  303. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding/main.py +0 -0
  304. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding/markdown_files/1706.03762v7.md +0 -0
  305. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding/markdown_files/1810.04805v2.md +0 -0
  306. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding/markdown_files/rfc8259.md +0 -0
  307. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding/pyproject.toml +0 -0
  308. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding_qdrant/.env +0 -0
  309. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding_qdrant/README.md +0 -0
  310. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding_qdrant/main.py +0 -0
  311. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding_qdrant/markdown_files/rfc8259.md +0 -0
  312. {cocoindex-0.1.83 → cocoindex-0.2.1}/examples/text_embedding_qdrant/pyproject.toml +0 -0
  313. {cocoindex-0.1.83 → cocoindex-0.2.1}/pyproject.toml +0 -0
  314. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/__init__.py +0 -0
  315. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/auth_registry.py +0 -0
  316. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/cli.py +0 -0
  317. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/flow.py +0 -0
  318. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/functions.py +0 -0
  319. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/index.py +0 -0
  320. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/lib.py +0 -0
  321. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/llm.py +0 -0
  322. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/op.py +0 -0
  323. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/py.typed +0 -0
  324. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/runtime.py +0 -0
  325. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/setup.py +0 -0
  326. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/sources.py +0 -0
  327. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/targets.py +0 -0
  328. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/tests/__init__.py +0 -0
  329. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/tests/test_convert.py +0 -0
  330. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/tests/test_optional_database.py +0 -0
  331. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/tests/test_transform_flow.py +0 -0
  332. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/tests/test_typing.py +0 -0
  333. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/tests/test_validation.py +0 -0
  334. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/user_app_loader.py +0 -0
  335. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/utils.py +0 -0
  336. {cocoindex-0.1.83 → cocoindex-0.2.1}/python/cocoindex/validation.py +0 -0
  337. {cocoindex-0.1.83 → cocoindex-0.2.1}/ruff.toml +0 -0
  338. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/base/duration.rs +0 -0
  339. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/base/field_attrs.rs +0 -0
  340. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/base/json_schema.rs +0 -0
  341. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/base/mod.rs +0 -0
  342. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/base/spec.rs +0 -0
  343. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/builder/analyzed_flow.rs +0 -0
  344. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/builder/flow_builder.rs +0 -0
  345. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/builder/mod.rs +0 -0
  346. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/db_tracking.rs +0 -0
  347. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/db_tracking_setup.rs +0 -0
  348. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/indexing_status.rs +0 -0
  349. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/live_updater.rs +0 -0
  350. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/memoization.rs +0 -0
  351. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/execution/mod.rs +0 -0
  352. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/lib.rs +0 -0
  353. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/anthropic.rs +0 -0
  354. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/gemini.rs +0 -0
  355. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/litellm.rs +0 -0
  356. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/mod.rs +0 -0
  357. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/ollama.rs +0 -0
  358. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/openai.rs +0 -0
  359. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/openrouter.rs +0 -0
  360. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/vllm.rs +0 -0
  361. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/llm/voyage.rs +0 -0
  362. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/factory_bases.rs +0 -0
  363. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/functions/embed_text.rs +0 -0
  364. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/functions/extract_by_llm.rs +0 -0
  365. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/functions/mod.rs +0 -0
  366. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/functions/parse_json.rs +0 -0
  367. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/functions/test_utils.rs +0 -0
  368. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/mod.rs +0 -0
  369. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/py_factory.rs +0 -0
  370. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/registration.rs +0 -0
  371. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/registry.rs +0 -0
  372. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sdk.rs +0 -0
  373. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/shared/mod.rs +0 -0
  374. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/shared/postgres.rs +0 -0
  375. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sources/mod.rs +0 -0
  376. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sources/shared/mod.rs +0 -0
  377. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/sources/shared/pattern_matcher.rs +0 -0
  378. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/targets/mod.rs +0 -0
  379. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/targets/postgres.rs +0 -0
  380. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/targets/qdrant.rs +0 -0
  381. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/targets/shared/mod.rs +0 -0
  382. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/ops/targets/shared/table_columns.rs +0 -0
  383. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/prelude.rs +0 -0
  384. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/py/mod.rs +0 -0
  385. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/server.rs +0 -0
  386. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/service/error.rs +0 -0
  387. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/service/mod.rs +0 -0
  388. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/settings.rs +0 -0
  389. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/setup/auth_registry.rs +0 -0
  390. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/setup/components.rs +0 -0
  391. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/setup/db_metadata.rs +0 -0
  392. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/setup/driver.rs +0 -0
  393. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/setup/flow_features.rs +0 -0
  394. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/setup/mod.rs +0 -0
  395. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/utils/concur_control.rs +0 -0
  396. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/utils/db.rs +0 -0
  397. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/utils/fingerprint.rs +0 -0
  398. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/utils/immutable.rs +0 -0
  399. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/utils/mod.rs +0 -0
  400. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/utils/retryable.rs +0 -0
  401. {cocoindex-0.1.83 → cocoindex-0.2.1}/src/utils/yaml_ser.rs +0 -0
@@ -1283,7 +1283,7 @@ dependencies = [
1283
1283
 
1284
1284
  [[package]]
1285
1285
  name = "cocoindex"
1286
- version = "0.1.83"
1286
+ version = "0.2.1"
1287
1287
  dependencies = [
1288
1288
  "anyhow",
1289
1289
  "async-openai",
@@ -2,7 +2,7 @@
2
2
  name = "cocoindex"
3
3
  # Version used for local development is always higher than others to take precedence.
4
4
  # Will be overridden for specific release versions.
5
- version = "0.1.83"
5
+ version = "0.2.1"
6
6
  edition = "2024"
7
7
  rust-version = "1.88"
8
8
  readme = "README.md"
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: cocoindex
3
- Version: 0.1.83
3
+ Version: 0.2.1
4
4
  Requires-Dist: click>=8.1.8
5
5
  Requires-Dist: rich>=14.0.0
6
6
  Requires-Dist: python-dotenv>=1.1.0
@@ -23,7 +23,7 @@ Each piece of data has a **data type**, falling into one of the following catego
23
23
 
24
24
  * *Basic type*.
25
25
  * *Struct type*: a collection of **fields**, each with a name and a type.
26
- * *Table type*: a collection of **rows**, each of which is a struct with specified schema. A table type can be a *KTable* (which has a key field) or a *LTable* (ordered but without key field).
26
+ * *Table type*: a collection of **rows**, each of which is a struct with specified schema. A table type can be a *KTable* (with key columns that uniquely identify each row) or a *LTable* (rows are ordered but without keys).
27
27
 
28
28
  An indexing flow always has a top-level struct, containing all data within and managed by the flow.
29
29
 
@@ -148,21 +148,27 @@ We have two specific types of *Table* types: *KTable* and *LTable*.
148
148
 
149
149
  #### KTable
150
150
 
151
- *KTable* is a *Table* type whose first column serves as the key.
151
+ *KTable* is a *Table* type in which one or more columns together serve as the key.
152
152
  The row order of a *KTable* is not preserved.
153
- Type of the first column (key column) must be a [key type](#key-types).
153
+ Each key column must be a [key type](#key-types). When multiple key columns are present, they form a composite key.
154
154
 
155
- In Python, a *KTable* type is represented by `dict[K, V]`.
156
- The `K` should be the type binding to a key type,
157
- and the `V` should be the type binding to a *Struct* type representing the value fields of each row.
158
- When the specific type annotation is not provided,
159
- the key type is bound to a tuple with its key parts when it's a *Struct* type, the value type is bound to `dict[str, Any]`.
155
+ In Python, a *KTable* type is represented by `dict[K, V]`.
156
+ `K` represents the key and `V` represents the value for each row:
157
+
158
+ - `K` can be a Struct type (either a frozen dataclass or a `NamedTuple`) that contains all key parts as fields. This is the general way to model multi-part keys.
159
+ - When there is only a single key part and it is a basic type (e.g. `str`, `int`), you may use that basic type directly as the dictionary key instead of wrapping it in a Struct.
160
+ - `V` should be the type bound to a *Struct* representing the non-key value fields of each row.
161
+
162
+ When a specific type annotation is not provided:
163
+ - For composite keys (multiple key parts), the key binds to a Python tuple of the key parts, e.g. `tuple[str, str]`.
164
+ - For a single basic key part, the key binds to that basic Python type.
165
+ - The value binds to `dict[str, Any]`.
160
166
 
161
167
 
162
168
  For example, you can use `dict[str, Person]` or `dict[str, PersonTuple]` to represent a *KTable*, with 4 columns: key (*Str*), `first_name` (*Str*), `last_name` (*Str*), `dob` (*Date*).
163
169
  It's bound to `dict[str, dict[str, Any]]` if you don't annotate the function argument with a specific type.
164
170
 
165
- Note that if you want to use a *Struct* as the key, you need to ensure its value in Python is immutable. For `dataclass`, annotate it with `@dataclass(frozen=True)`. For `NamedTuple`, immutability is built-in. For example:
171
+ Note that when using a Struct as the key, it must be immutable in Python. For a dataclass, annotate it with `@dataclass(frozen=True)`. For `NamedTuple`, immutability is built-in. For example:
166
172
 
167
173
  ```python
168
174
  @dataclass(frozen=True)
@@ -175,8 +181,8 @@ class PersonKeyTuple(NamedTuple):
175
181
  id: str
176
182
  ```
177
183
 
178
- Then you can use `dict[PersonKey, Person]` or `dict[PersonKeyTuple, PersonTuple]` to represent a KTable keyed by `PersonKey` or `PersonKeyTuple`.
179
- It's bound to `dict[(str, str), dict[str, Any]]` if you don't annotate the function argument with a specific type.
184
+ Then you can use `dict[PersonKey, Person]` or `dict[PersonKeyTuple, PersonTuple]` to represent a KTable keyed by both `id_kind` and `id`.
185
+ If you don't annotate the function argument with a specific type, it's bound to `dict[tuple[str, str], dict[str, Any]]`.
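
The key rules above can be checked with a minimal standalone sketch in plain Python (no CocoIndex involved). The sample values and the `people`, `found`, `unannotated`, and `mutated` names are hypothetical; only the `PersonKey`/`PersonKeyTuple` shapes and the `Person` fields come from this page.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import date
from typing import Any, NamedTuple


@dataclass(frozen=True)  # frozen makes instances hashable, so usable as dict keys
class PersonKey:
    id_kind: str
    id: str


class PersonKeyTuple(NamedTuple):  # NamedTuple is immutable by construction
    id_kind: str
    id: str


@dataclass
class Person:
    first_name: str
    last_name: str
    dob: date


# Annotated form: dict[PersonKey, Person]
people: dict[PersonKey, Person] = {
    PersonKey("passport", "X123"): Person("Ada", "Example", date(1990, 1, 1)),
}
found = people[PersonKey("passport", "X123")]  # equal keys hash equally

# Unannotated form: composite key as a plain tuple, value as dict[str, Any]
unannotated: dict[tuple[str, str], dict[str, Any]] = {
    ("passport", "X123"): {
        "first_name": "Ada",
        "last_name": "Example",
        "dob": date(1990, 1, 1),
    },
}

# Mutating a frozen key is rejected at runtime.
try:
    PersonKey("passport", "X123").id = "Y999"  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because a `NamedTuple` is a tuple subclass, a `PersonKeyTuple` compares and hashes like the plain tuple used in the unannotated form.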
180
186
 
181
187
 
182
188
  #### LTable
@@ -96,11 +96,11 @@ If not set, all flows are in a default unnamed namespace.
96
96
 
97
97
  :::
98
98
 
99
- * `max_connections` (type: `int`, default: `64`): The maximum number of connections to keep in the pool.
99
+ * `max_connections` (type: `int`, default: `25`): The maximum number of connections to keep in the pool.
100
100
 
101
101
  *Environment variable* for `Settings.database.max_connections`: `COCOINDEX_DATABASE_MAX_CONNECTIONS`
102
102
 
103
- * `min_connections` (type: `int`, default: `16`): The minimum number of connections to keep in the pool.
103
+ * `min_connections` (type: `int`, default: `5`): The minimum number of connections to keep in the pool.
104
104
 
105
105
  *Environment variable* for `Settings.database.min_connections`: `COCOINDEX_DATABASE_MIN_CONNECTIONS`
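
As an illustration of how the two pool-size variables and their new defaults fit together, here is a minimal sketch. It is not CocoIndex's own settings loader: the `pool_limits` helper and the min/max sanity check are assumptions added for this example, and only the variable names and the defaults (`25` and `5`) come from the text above.

```python
import os
from collections.abc import Mapping


def pool_limits(env: Mapping[str, str] = os.environ) -> tuple[int, int]:
    """Return (min_connections, max_connections) from an environment mapping.

    Hypothetical helper: falls back to the documented 0.2.1 defaults.
    """
    max_conn = int(env.get("COCOINDEX_DATABASE_MAX_CONNECTIONS", "25"))
    min_conn = int(env.get("COCOINDEX_DATABASE_MIN_CONNECTIONS", "5"))
    if min_conn > max_conn:  # assumption: sanity check added for illustration
        raise ValueError("min_connections must not exceed max_connections")
    return min_conn, max_conn
```

Passing a plain dict instead of `os.environ` makes the helper easy to try out, e.g. `pool_limits({"COCOINDEX_DATABASE_MAX_CONNECTIONS": "10"})`.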
106
106
 
@@ -134,7 +134,7 @@ This is the list of environment variables, each of which has a corresponding fie
134
134
  | `COCOINDEX_DATABASE_URL` | `database.url` | Yes |
135
135
  | `COCOINDEX_DATABASE_USER` | `database.user` | No |
136
136
  | `COCOINDEX_DATABASE_PASSWORD` | `database.password` | No |
137
- | `COCOINDEX_DATABASE_MAX_CONNECTIONS` | `database.max_connections` | No (default: `64`) |
138
- | `COCOINDEX_DATABASE_MIN_CONNECTIONS` | `database.min_connections` | No (default: `16`) |
137
+ | `COCOINDEX_DATABASE_MAX_CONNECTIONS` | `database.max_connections` | No (default: `25`) |
138
+ | `COCOINDEX_DATABASE_MIN_CONNECTIONS` | `database.min_connections` | No (default: `5`) |
139
139
  | `COCOINDEX_SOURCE_MAX_INFLIGHT_ROWS` | `global_execution_options.source_max_inflight_rows` | No (default: `1024`) |
140
140
  | `COCOINDEX_SOURCE_MAX_INFLIGHT_BYTES` | `global_execution_options.source_max_inflight_bytes` | No |
@@ -10,11 +10,10 @@ sidebar_custom_props:
10
10
  tags: [vector-index, metadata]
11
11
  ---
12
12
 
13
- import { GitHubButton, YouTubeButton } from '../../../src/components/GitHubButton';
13
+ import { GitHubButton, YouTubeButton, DocumentationButton } from '../../../src/components/GitHubButton';
14
14
 
15
15
  <GitHubButton url="https://github.com/cocoindex-io/cocoindex/tree/main/examples/paper_metadata"/>
16
16
 
17
-
18
17
  ## What we will achieve
19
18
 
20
19
  1. Extract the paper metadata, including file name, title, author information, abstract, and number of pages.
@@ -27,18 +26,8 @@ to answer questions like "Give me all the papers by Jeff Dean."
27
26
 
28
27
  4. If you want to perform full PDF embedding for the paper, you can extend the flow.
29
28
 
30
- ## Setup
31
-
32
- - [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres).
33
- CocoIndex uses PostgreSQL internally for incremental processing.
34
- - [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai).
35
- Alternatively, we have native support for Gemini, Ollama, LiteLLM. Check out the [guide](https://cocoindex.io/docs/ai/llm#ollama).
36
- You can choose your favorite LLM provider and work completely on-premises.
37
-
38
- ## Define Indexing Flow
39
-
40
- To better help you navigate what we will walk through, here is a flow diagram:
41
-
29
+ ## Flow Overview
30
+ ![Flow Overview](/img/examples/academic_papers_index/flow.png)
42
31
  1. Import a list of papers in PDF.
43
32
  2. For each file:
44
33
  - Extract the first page of the paper.
@@ -50,9 +39,15 @@ To better help you navigate what we will walk through, here is a flow diagram:
50
39
  - Author-to-paper mapping, for author-based query.
51
40
  - Embeddings for titles and abstract chunks, for semantic search.
52
41
 
53
- Let’s zoom in on the steps.
42
+ ## Setup
43
+
44
+ - [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres).
45
+ CocoIndex uses PostgreSQL internally for incremental processing.
46
+ - [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). Alternatively, CocoIndex has native support for Gemini, Ollama, and LiteLLM. You can choose your favorite LLM provider and, with a local one such as Ollama, work completely on-premises.
54
47
 
55
- ### Import the Papers
48
+ <DocumentationButton href="https://cocoindex.io/docs/ai/llm" text="LLM" margin="0 0 16px 0" />
49
+
50
+ ## Import the Papers
56
51
 
57
52
  ```python
58
53
  @cocoindex.flow_def(name="PaperMetadata")
@@ -65,12 +60,12 @@ def paper_metadata_flow(
65
60
  )
66
61
  ```
67
62
 
68
- `flow_builder.add_source` will create a table with sub fields (`filename`, `content`),
69
- we can refer to the [documentation](https://cocoindex.io/docs/ops/sources) for more details.
63
+ `flow_builder.add_source` will create a table with sub fields (`filename`, `content`).
64
+ <DocumentationButton href="https://cocoindex.io/docs/ops/sources" text="Sources" margin="0 0 16px 0" />
70
65
 
71
- ### Extract and collect metadata
66
+ ## Extract and collect metadata
72
67
 
73
- #### Extract first page for basic info
68
+ ### Extract first page for basic info
74
69
 
75
70
  Define a custom function to extract the first page and number of pages of the PDF.
76
71
 
@@ -96,20 +91,19 @@ def extract_basic_info(content: bytes) -> PaperBasicInfo:
96
91
 
97
92
  ```
98
93
 
99
- Now, plug this into your flow.
100
- We extract metadata from the first page to minimize processing cost, since the entire PDF can be very large.
94
+ Now plug this into the flow. We extract metadata from the first page to minimize processing cost, since the entire PDF can be very large.
101
95
 
102
96
  ```python
103
97
  with data_scope["documents"].row() as doc:
104
98
  doc["basic_info"] = doc["content"].transform(extract_basic_info)
105
99
  ```
100
+ ![Extract basic info](/img/examples/academic_papers_index/basic_info.png)
106
101
 
107
- After this step, you should have the basic info of each paper.
102
+ After this step, we should have the basic info of each paper.
108
103
 
109
104
  ### Parse basic info
110
105
 
111
- We will convert the first page to Markdown using Marker.
112
- Alternatively, you can easily plug in your favorite PDF parser, such as Docling.
106
+ We will convert the first page to Markdown using Marker. Alternatively, you can plug in any other PDF parser, such as Docling, as a CocoIndex [custom function](https://cocoindex.io/docs/custom_ops/custom_functions).
113
107
 
114
108
  Define a marker converter function and cache it, since its initialization is resource-intensive.
115
109
  This ensures that the same converter instance is reused for different input files.
@@ -140,18 +134,20 @@ def pdf_to_markdown(content: bytes) -> str:
140
134
  Pass it to your transform
141
135
 
142
136
  ```python
143
- with data_scope["documents"].row() as doc:
137
+ with data_scope["documents"].row() as doc:
138
+ # ... process
144
139
  doc["first_page_md"] = doc["basic_info"]["first_page"].transform(
145
140
  pdf_to_markdown
146
141
  )
147
142
  ```
143
+ ![First page in Markdown](/img/examples/academic_papers_index/first_page.png)
148
144
 
149
145
  After this step, you should have the first page of each paper in Markdown format.
150
146
 
151
- #### Extract basic info with LLM
147
+ ### Extract basic info with LLM
152
148
 
153
149
  Define a schema for LLM extraction. CocoIndex natively supports LLM-structured extraction with complex and nested schemas.
154
- If you are interested in learning more about nested schemas, refer to [this article](https://cocoindex.io/blogs/patient-intake-form-extraction-with-llm).
150
+ If you are interested in learning more about nested schemas, refer to [this example](https://cocoindex.io/docs/examples/patient_form_extraction).
155
151
 
156
152
  ```python
157
153
  @dataclasses.dataclass
@@ -163,7 +159,6 @@ class PaperMetadata:
163
159
  title: str
164
160
  authors: list[Author]
165
161
  abstract: str
166
-
167
162
  ```
168
163
 
169
164
  Plug it into the `ExtractByLlm` function. With a dataclass defined, CocoIndex will automatically parse the LLM response into the dataclass.
@@ -181,26 +176,27 @@ doc["metadata"] = doc["first_page_md"].transform(
181
176
  ```
182
177
 
183
178
  After this step, you should have the metadata of each paper.
179
+ ![Metadata](/img/examples/academic_papers_index/metadata.png)
184
180
 
185
- #### Collect paper metadata
181
+ ### Collect paper metadata
186
182
 
187
183
  ```python
188
- paper_metadata = data_scope.add_collector()
189
- with data_scope["documents"].row() as doc:
190
- # ... process
191
- # Collect metadata
192
- paper_metadata.collect(
193
- filename=doc["filename"],
194
- title=doc["metadata"]["title"],
195
- authors=doc["metadata"]["authors"],
196
- abstract=doc["metadata"]["abstract"],
197
- num_pages=doc["basic_info"]["num_pages"],
198
- )
184
+ paper_metadata = data_scope.add_collector()
185
+ with data_scope["documents"].row() as doc:
186
+ # ... process
187
+ # Collect metadata
188
+ paper_metadata.collect(
189
+ filename=doc["filename"],
190
+ title=doc["metadata"]["title"],
191
+ authors=doc["metadata"]["authors"],
192
+ abstract=doc["metadata"]["abstract"],
193
+ num_pages=doc["basic_info"]["num_pages"],
194
+ )
199
195
  ```
200
196
 
201
197
  Just collect anything you need :)
202
198
 
203
- #### Collect `author` to `filename` information
199
+ ### Collect `author` to `filename` information
204
200
  We’ve already extracted the author list. Here we collect the Author → Papers mapping in a separate table to support lookups by author.
205
201
  Simply collect by author.
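
Conceptually, collecting by author flattens each paper's author list into one row per (author, paper) pair. The sketch below shows that shape in plain Python, outside any flow. The sample papers, the `author_name` field name, and the `by_author` lookup are hypothetical and only illustrate the resulting table.

```python
# Hypothetical extracted metadata: one entry per paper.
papers = [
    {"filename": "paper_a.pdf", "authors": ["Author One", "Author Two"]},
    {"filename": "paper_b.pdf", "authors": ["Author Two"]},
]

# One row per (author, paper) pair, i.e. the author-to-paper table.
author_papers = [
    {"author_name": author, "filename": paper["filename"]}
    for paper in papers
    for author in paper["authors"]
]

# A lookup built from those rows answers "all papers by this author".
by_author: dict[str, list[str]] = {}
for row in author_papers:
    by_author.setdefault(row["author_name"], []).append(row["filename"])
```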
206
202
 
@@ -216,9 +212,9 @@ with data_scope["documents"].row() as doc:
216
212
  ```
217
213
 
218
214
 
219
- ### Compute and collect embeddings
215
+ ## Compute and collect embeddings
220
216
 
221
- #### Title
217
+ ### Title
222
218
 
223
219
  ```python
224
220
  doc["title_embedding"] = doc["metadata"]["title"].transform(
@@ -228,7 +224,7 @@ doc["title_embedding"] = doc["metadata"]["title"].transform(
228
224
  )
229
225
  ```
230
226
 
231
- #### Abstract
227
+ ### Abstract
232
228
 
233
229
  Split abstract into chunks, embed each chunk and collect their embeddings.
234
230
  Sometimes the abstract could be very long.
@@ -252,6 +248,8 @@ doc["abstract_chunks"] = doc["metadata"]["abstract"].transform(
252
248
 
253
249
  After this step, you should have the abstract chunks of each paper.
254
250
 
251
+ ![Abstract chunks](/img/examples/academic_papers_index/abstract_chunks.png)
252
+
255
253
  Embed each chunk and collect their embeddings.
256
254
 
257
255
  ```python
@@ -265,7 +263,9 @@ with doc["abstract_chunks"].row() as chunk:
265
263
 
266
264
  After this step, you should have the embeddings of the abstract chunks of each paper.
267
265
 
268
- #### Collect embeddings
266
+ ![Abstract chunks embeddings](/img/examples/academic_papers_index/chunk_embedding.png)
267
+
268
+ ### Collect embeddings
269
269
 
270
270
  ```python
271
271
  metadata_embeddings = data_scope.add_collector()
@@ -292,7 +292,7 @@ with data_scope["documents"].row() as doc:
292
292
  )
293
293
  ```
294
294
 
295
- ### Export
295
+ ## Export
296
296
  Finally, we export the data to Postgres.
297
297
 
298
298
  ```python
@@ -319,14 +319,9 @@ metadata_embeddings.export(
319
319
  )
320
320
  ```
321
321
 
322
- In this example we use PGVector as embedding stores/
323
- With CocoIndex, you can do one line switch on other supported Vector databases like Qdrant, see this [guide](https://cocoindex.io/docs/ops/targets#entry-oriented-targets) for more details.
324
- We aim to standardize interfaces and make it like assembling building blocks.
322
+ In this example we use PGVector as the embedding store. With CocoIndex, you can switch to another supported vector database with a one-line change.
325
323
 
326
- ## View in CocoInsight step by step
327
-
328
- You can walk through the project step by step in [CocoInsight](https://www.youtube.com/watch?v=MMrpUfUcZPk) to see
329
- exactly how each field is constructed and what happens behind the scenes.
324
+ <DocumentationButton href="https://cocoindex.io/docs/ops/targets#entry-oriented-targets" text="Entry Oriented Targets" margin="0 0 16px 0" />
330
325
 
331
326
  ## Query the index
332
327
 
@@ -338,3 +333,14 @@ For now CocoIndex doesn't provide additional query interface. We can write SQL o
338
333
  - The query space has excellent solutions for querying, reranking, and other search-related functionality.
339
334
 
340
335
  If you need assistance with writing the query, please feel free to reach out to us on [Discord](https://discord.com/invite/zpA9S2DR7s).
336
+
337
+ ## CocoInsight
338
+
339
+ You can walk through the project step by step in [CocoInsight](https://www.youtube.com/watch?v=MMrpUfUcZPk) to see exactly how each field is constructed and what happens behind the scenes.
340
+
341
+
342
+ ```sh
343
+ cocoindex server -ci main.py
344
+ ```
345
+
346
+ Then open `https://cocoindex.io/cocoinsight`. It connects to your local CocoIndex server, with zero pipeline data retention.
@@ -35,8 +35,10 @@ and then build a knowledge graph.
35
35
  ## Setup
36
36
  * [Install PostgreSQL](https://cocoindex.io/docs/getting_started/installation#-install-postgres). CocoIndex uses PostgreSQL internally for incremental processing.
37
37
  * [Install Neo4j](https://cocoindex.io/docs/ops/targets#neo4j-dev-instance), a graph database.
38
- * [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). Alternatively, you can switch to Ollama, which runs LLM models locally.
39
- <DocumentationButton href="https://cocoindex.io/docs/ai/llm#ollama" text="Ollama" margin="0 0 16px 0" />
38
+ * [Configure your OpenAI API key](https://cocoindex.io/docs/ai/llm#openai). Alternatively, CocoIndex has native support for Gemini, Ollama, and LiteLLM. You can choose your favorite LLM provider and, with a local one such as Ollama, work completely on-premises.
39
+
40
+ <DocumentationButton href="https://cocoindex.io/docs/ai/llm" text="LLM" margin="0 0 16px 0" />
41
+
40
42
 
41
43
  ## Documentation
42
44
  <DocumentationButton href="https://cocoindex.io/docs/ops/targets#property-graph-targets" text="Property Graph Targets" margin="0 0 16px 0" />
@@ -0,0 +1,249 @@
1
+ ---
2
+ title: Extract Structured Data from Python Manual markdowns with Ollama
3
+ description: Extract structured data from markdowns (Python Manual)
4
+ sidebar_class_name: hidden
5
+ slug: /examples/manual_extraction
6
+ canonicalUrl: '/examples/manual_extraction'
7
+ sidebar_custom_props:
8
+ image: /img/examples/manual_extraction/cover.png
9
+ tags: [structured-data-extraction, data-mapping]
11
+ ---
12
+
13
+ import { GitHubButton, YouTubeButton, DocumentationButton } from '../../../src/components/GitHubButton';
14
+
15
+ <GitHubButton url="https://github.com/cocoindex-io/cocoindex/tree/main/examples/manuals_llm_extraction"/>
16
+
17
+ ## Overview
18
+ This example shows how to extract structured data from Python Manuals using Ollama.
19
+
20
+ ## Flow Overview
21
+ ![Flow Overview](/img/examples/manual_extraction/flow.png)
22
+
23
+ - For each PDF file:
24
+ - Parse to markdown.
25
+ - Extract structured data from the markdown using LLM.
26
+ - Add summary to the module info.
27
+ - Collect the data.
28
+ - Export the data to a table.
29
+
30
+
31
+ ## Prerequisites
32
+ - If you don't have Postgres installed, please refer to the [installation guide](https://cocoindex.io/docs/getting_started/installation).
33
+
34
+ - [Download](https://ollama.com/download) and install Ollama. Then pull the model you want to use, for example:
35
+ ```sh
36
+ ollama pull llama3.2
37
+ ```
38
+
39
+ <DocumentationButton href="https://cocoindex.io/docs/ai/llm#ollama" text="Ollama" margin="0 0 16px 0" />
40
+
41
+ Alternatively, CocoIndex has native support for other LLM providers such as Gemini and LiteLLM. You can choose your favorite provider; a local one such as Ollama lets you work completely on-premises.
42
+
43
+ <DocumentationButton href="https://cocoindex.io/docs/ai/llm" text="LLM" margin="0 0 16px 0" />
44
+
45
+ ## Add Source
46
+ Let's add Python docs as a source.
47
+
48
+ ```python
49
+ @cocoindex.flow_def(name="ManualExtraction")
50
+ def manual_extraction_flow(
51
+ flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
52
+ ):
53
+ """
54
+ Define an example flow that extracts manual information from a Markdown.
55
+ """
56
+ data_scope["documents"] = flow_builder.add_source(
57
+ cocoindex.sources.LocalFile(path="manuals", binary=True)
58
+ )
59
+
60
+ modules_index = data_scope.add_collector()
61
+ ```
62
+
63
+ `flow_builder.add_source` will create a table with the following sub fields:
64
+ - `filename` (key, type: `str`): the filename of the file, e.g. `dir1/file1.md`
65
+ - `content` (type: `str` if `binary` is `False`, otherwise `bytes`): the content of the file
66
+
67
+ <DocumentationButton href="https://cocoindex.io/docs/ops/sources" text="LocalFile" margin="0 0 16px 0" />
68
+
69
+ ## Parse Markdown
70
+
71
+ To do this, we can plug in a custom function that converts PDF to Markdown. Many parsers are available, both commercial and open source, and you can bring your own here.
72
+
73
+ ```python
74
+ class PdfToMarkdown(cocoindex.op.FunctionSpec):
75
+ """Convert a PDF to markdown."""
76
+
77
+
78
+ @cocoindex.op.executor_class(gpu=True, cache=True, behavior_version=1)
79
+ class PdfToMarkdownExecutor:
80
+ """Executor for PdfToMarkdown."""
81
+
82
+ spec: PdfToMarkdown
83
+ _converter: PdfConverter
84
+
85
+ def prepare(self):
86
+ config_parser = ConfigParser({})
87
+ self._converter = PdfConverter(
88
+ create_model_dict(), config=config_parser.generate_config_dict()
89
+ )
90
+
91
+ def __call__(self, content: bytes) -> str:
92
+ with tempfile.NamedTemporaryFile(delete=True, suffix=".pdf") as temp_file:
93
+ temp_file.write(content)
94
+ temp_file.flush()
95
+ text, _, _ = text_from_rendered(self._converter(temp_file.name))
96
+ return text
97
+ ```
98
+ You may wonder why we define a spec plus an executor here instead of a standalone function. The main reason is that some heavy preparation work (initializing the parser) must be done before the executor is ready to process real data.
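
The prepare-once, call-many idea behind the executor can be sketched in plain Python without CocoIndex. Everything here is hypothetical: `UppercaseExecutor` stands in for `PdfToMarkdownExecutor`, the translation table stands in for the expensive parser, and the explicit `prepare()` call mimics what the framework is assumed to do before processing data.

```python
class UppercaseExecutor:
    """Hypothetical stand-in for an executor with expensive setup."""

    prepare_calls = 0  # class-level counter, only to make the behavior observable

    def prepare(self) -> None:
        # Stand-in for heavy work such as loading parser models.
        UppercaseExecutor.prepare_calls += 1
        self._table = str.maketrans("abc", "ABC")

    def __call__(self, content: str) -> str:
        # Reuses the resource built in prepare() for every input.
        return content.translate(self._table)


executor = UppercaseExecutor()
executor.prepare()  # assumed to happen once, before any data is processed
outputs = [executor(doc) for doc in ["abc", "cab", "xyz"]]
```

A standalone function would have to rebuild or separately cache the heavy resource itself; keeping it on the executor instance ties its lifetime to the operation.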
99
+
100
+ <DocumentationButton href="https://cocoindex.io/docs/custom_ops/custom_functions" text="Custom Function" margin="0 0 16px 0" />
101
+
102
+ Plug the function into the flow.
103
+
104
+ ```python
105
+ with data_scope["documents"].row() as doc:
106
+ doc["markdown"] = doc["content"].transform(PdfToMarkdown())
107
+ ```
108
+
109
+ It transforms each document to markdown.
110
+
111
+
112
+ ## Extract Structured Data from Markdown files
113
+ ### Define schema
114
+ Let's define the schema `ModuleInfo` using Python dataclasses, and we can pass it to the LLM to extract the structured data. It's easy to do this with CocoIndex.
115
+
116
+ ``` python
117
+ @dataclasses.dataclass
118
+ class ArgInfo:
119
+ """Information about an argument of a method."""
120
+ name: str
121
+ description: str
122
+
123
+ @dataclasses.dataclass
124
+ class MethodInfo:
125
+ """Information about a method."""
126
+ name: str
127
+ args: cocoindex.typing.List[ArgInfo]
128
+ description: str
129
+
130
+ @dataclasses.dataclass
131
+ class ClassInfo:
132
+ """Information about a class."""
133
+ name: str
134
+ description: str
135
+ methods: cocoindex.typing.List[MethodInfo]
136
+
137
+ @dataclasses.dataclass
138
+ class ModuleInfo:
139
+ """Information about a Python module."""
140
+ title: str
141
+ description: str
142
+ classes: cocoindex.typing.List[ClassInfo]
143
+ methods: cocoindex.typing.List[MethodInfo]
144
+ ```
145
+
146
+ ### Extract structured data
147
+
148
+ CocoIndex provides built-in functions (e.g. `ExtractByLlm`) that process data using an LLM. This example uses Ollama.
149
+
150
+ ```python
151
+ with data_scope["documents"].row() as doc:
152
+ doc["module_info"] = doc["markdown"].transform(
153
+ cocoindex.functions.ExtractByLlm(
154
+ llm_spec=cocoindex.LlmSpec(
155
+ api_type=cocoindex.LlmApiType.OLLAMA,
156
+ # See the full list of models: https://ollama.com/library
157
+ model="llama3.2"
158
+ ),
159
+ output_type=ModuleInfo,
160
+ instruction="Please extract Python module information from the manual."))
161
+ ```
162
+
163
+ <DocumentationButton href="https://cocoindex.io/docs/core/functions#extractbyllm" text="ExtractByLlm" margin="0 0 16px 0" />
164
+
165
+ ![ExtractByLlm](/img/examples/manual_extraction/extraction.png)
166
+
167
+ ## Add summarization to module info
168
+ With CocoIndex as the framework, you can easily add any transformation on the data and collect the result as part of the data index. Let's add a simple summary to each module, such as the number of classes and methods, using a plain Python function.
169
+
170
+ ### Define Schema
171
+ ``` python
172
+ @dataclasses.dataclass
173
+ class ModuleSummary:
174
+ """Summary info about a Python module."""
175
+ num_classes: int
176
+ num_methods: int
177
+ ```
178
+
179
+ ### A simple custom function to summarize the data
180
+ ```python
181
+ @cocoindex.op.function()
182
+ def summarize_module(module_info: ModuleInfo) -> ModuleSummary:
183
+ """Summarize a Python module."""
184
+ return ModuleSummary(
185
+ num_classes=len(module_info.classes),
186
+ num_methods=len(module_info.methods),
187
+ )
188
+ ```
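
To check the counting logic outside a flow, here is a plain-Python sketch. It is a simplification under stated assumptions: the `@cocoindex.op.function()` decorator is omitted, built-in `list[...]` replaces `cocoindex.typing.List`, `MethodInfo` drops its `args` field, and the `sample` data is hypothetical.

```python
import dataclasses


@dataclasses.dataclass
class MethodInfo:
    name: str
    description: str  # `args` omitted in this simplified sketch


@dataclasses.dataclass
class ClassInfo:
    name: str
    description: str
    methods: list[MethodInfo]


@dataclasses.dataclass
class ModuleInfo:
    title: str
    description: str
    classes: list[ClassInfo]
    methods: list[MethodInfo]


@dataclasses.dataclass
class ModuleSummary:
    num_classes: int
    num_methods: int


def summarize_module(module_info: ModuleInfo) -> ModuleSummary:
    """Count classes and module-level methods (undecorated copy for testing)."""
    return ModuleSummary(
        num_classes=len(module_info.classes),
        num_methods=len(module_info.methods),
    )


sample = ModuleInfo(
    title="example",
    description="A hypothetical module.",
    classes=[ClassInfo("Widget", "A class.", [MethodInfo("render", "Draws it.")])],
    methods=[MethodInfo("load", "Loads data."), MethodInfo("dump", "Saves data.")],
)
summary = summarize_module(sample)
```

Note that `num_methods` counts only module-level methods; methods nested inside classes (such as `render` here) are not included.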
189
+
190
+ ### Plug the function into the flow
191
+ ```python
192
+ with data_scope["documents"].row() as doc:
193
+ # ... after the extraction
194
+ doc["module_summary"] = doc["module_info"].transform(summarize_module)
195
+ ```
196
+
197
+ <DocumentationButton href="https://cocoindex.io/docs/custom_ops/custom_functions" text="Custom Function" margin="0 0 16px 0" />
198
+
199
+ ![Summarize Module](/img/examples/manual_extraction/summary.png)
200
+
201
+ ## Collect the data
202
+
203
+
204
+ After the extraction, we cherry-pick whatever we need from the output by calling `collect` on the `modules_index` collector defined above.
205
+
206
+ ```python
207
+ modules_index.collect(
208
+ filename=doc["filename"],
209
+ module_info=doc["module_info"],
210
+ )
211
+ ```
212
+
213
+ Finally, let's export the extracted data to a table.
214
+
215
+ ```python
216
+ modules_index.export(
217
+ "modules",
218
+ cocoindex.storages.Postgres(table_name="modules_info"),
219
+ primary_key_fields=["filename"],
220
+ )
221
+ ```
222
+
223
+ ## Query and test your index
224
+ Run the following command to set up and update the index.
225
+ ```sh
226
+ cocoindex update -L main.py
227
+ ```
228
+ You'll see the index update status in the terminal.
229
+
230
+ After the index is built, you have a table with the name `modules_info`. You can query it at any time, e.g., start a Postgres shell:
231
+
232
+ ```bash
233
+ psql postgres://cocoindex:cocoindex@localhost/cocoindex
234
+ ```
235
+
236
+ And run the SQL query:
237
+
238
+ ```sql
239
+ SELECT filename, module_info->'title' AS title, module_summary FROM modules_info;
240
+ ```
241
+
242
+ ## CocoInsight
243
+ [CocoInsight](https://www.youtube.com/watch?v=ZnmyoHslBSc) is a really cool tool to help you understand your data pipeline and data index. It is in Early Access now (Free).
244
+
245
+ ```sh
246
+ cocoindex server -ci main.py
247
+ ```
248
+ The CocoInsight dashboard is at `https://cocoindex.io/cocoinsight`. It connects to your local CocoIndex server with zero data retention.
249
+