sdg-hub 0.6.0__tar.gz → 0.7.0__tar.gz

Files changed (215)
  1. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/actionlint.dockerfile +1 -1
  2. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/docs.yml +1 -1
  3. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/integration-test.yml +1 -1
  4. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/packer.yml +1 -1
  5. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/pypi.yaml +4 -4
  6. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/PKG-INFO +4 -2
  7. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/README.md +3 -1
  8. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/README.md +5 -5
  9. sdg_hub-0.7.0/docs/_coverpage.md +13 -0
  10. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/_sidebar.md +1 -1
  11. sdg_hub-0.7.0/docs/assets/logo.png +0 -0
  12. sdg_hub-0.7.0/docs/assets/sdg-hub-cover.png +0 -0
  13. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/blocks/custom-blocks.md +42 -5
  14. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/blocks/llm-blocks.md +198 -2
  15. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/blocks/overview.md +4 -4
  16. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/concepts.md +5 -7
  17. sdg_hub-0.7.0/docs/flows/available-flows.md +974 -0
  18. sdg_hub-0.7.0/docs/flows/custom-flows.md +62 -0
  19. sdg_hub-0.7.0/docs/flows/discovery.md +472 -0
  20. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/index.html +13 -21
  21. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/enhanced_summary_knowledge_tuning/knowledge_mixing.ipynb +2 -0
  22. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/document_pre_processing.ipynb +14 -12
  23. sdg_hub-0.7.0/examples/knowledge_tuning/instructlab/knowledge_generation_ja.ipynb +458 -0
  24. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/knowledge_utils.py +19 -18
  25. sdg_hub-0.7.0/examples/rag_evaluation/ibm-annual-report-2024.pdf +0 -0
  26. sdg_hub-0.7.0/examples/rag_evaluation/rag_evaluation_dataset_generation.ipynb +505 -0
  27. sdg_hub-0.7.0/scripts/snyk_notebook_scan.sh +196 -0
  28. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/_version.py +3 -3
  29. sdg_hub-0.7.0/src/sdg_hub/flows/evaluation/rag/answer_generation.yaml +21 -0
  30. sdg_hub-0.7.0/src/sdg_hub/flows/evaluation/rag/conceptual_qa_generation.yaml +25 -0
  31. sdg_hub-0.7.0/src/sdg_hub/flows/evaluation/rag/context_extraction.yaml +23 -0
  32. sdg_hub-0.7.0/src/sdg_hub/flows/evaluation/rag/flow.yaml +201 -0
  33. sdg_hub-0.7.0/src/sdg_hub/flows/evaluation/rag/groundedness_critic.yaml +24 -0
  34. sdg_hub-0.7.0/src/sdg_hub/flows/evaluation/rag/question_evolution.yaml +18 -0
  35. sdg_hub-0.7.0/src/sdg_hub/flows/evaluation/rag/topic_generation.yaml +12 -0
  36. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/multilingual/japanese/flow.yaml +1 -0
  37. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub.egg-info/PKG-INFO +4 -2
  38. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub.egg-info/SOURCES.txt +16 -0
  39. sdg_hub-0.7.0/tests/__init__.py +0 -0
  40. sdg_hub-0.6.0/docs/_coverpage.md +0 -14
  41. sdg_hub-0.6.0/docs/flows/discovery.md +0 -206
  42. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/actionlint.yaml +0 -0
  43. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/actions/free-disk-space/action.yml +0 -0
  44. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/dependabot.yml +0 -0
  45. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/mergify.yml +0 -0
  46. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/actionlint.yml +0 -0
  47. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/lint.yml +0 -0
  48. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/matchers/actionlint.json +0 -0
  49. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/matchers/pylint.json +0 -0
  50. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/test.yml +0 -0
  51. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.gitignore +0 -0
  52. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.isort.cfg +0 -0
  53. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.markdownlint-cli2.yaml +0 -0
  54. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.pre-commit-config.yaml +0 -0
  55. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.pylintrc +0 -0
  56. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/CLAUDE.md +0 -0
  57. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/CONTRIBUTING.md +0 -0
  58. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/LICENSE +0 -0
  59. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/Makefile +0 -0
  60. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/.nojekyll +0 -0
  61. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/_navbar.md +0 -0
  62. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/api-reference.md +0 -0
  63. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/blocks/filtering-blocks.md +0 -0
  64. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/blocks/transform-blocks.md +0 -0
  65. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/development.md +0 -0
  66. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/flows/overview.md +0 -0
  67. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/installation.md +0 -0
  68. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/quick-start.md +0 -0
  69. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/enhanced_summary_knowledge_tuning/.env.example +0 -0
  70. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/enhanced_summary_knowledge_tuning/README.md +0 -0
  71. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/enhanced_summary_knowledge_tuning/document_pre_processing.ipynb +0 -0
  72. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/enhanced_summary_knowledge_tuning/knowledge_generation.ipynb +0 -0
  73. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/enhanced_summary_knowledge_tuning/knowledge_mixing_utils.py +0 -0
  74. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/enhanced_summary_knowledge_tuning/raft_builder.py +0 -0
  75. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/.gitignore +0 -0
  76. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/README.md +0 -0
  77. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/assets/imgs/instructlab-banner.png +0 -0
  78. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/docling_v2_config.yaml +0 -0
  79. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/docparser.py +0 -0
  80. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/docparser_v2.py +0 -0
  81. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/document_collection/ibm-annual-report/ibm-annual-report-2024.json +0 -0
  82. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/document_collection/ibm-annual-report/ibm-annual-report-2024.md +0 -0
  83. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/document_collection/ibm-annual-report/ibm-annual-report-2024.pdf +0 -0
  84. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/document_collection/ibm-annual-report/qna.yaml +0 -0
  85. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/knowledge_generation_and_mixing.ipynb +0 -0
  86. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/logger_config.py +0 -0
  87. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/text_analysis/README.md +0 -0
  88. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/text_analysis/extract_stock_tickers.yaml +0 -0
  89. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/text_analysis/structured_insights_demo.ipynb +0 -0
  90. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/pyproject.toml +0 -0
  91. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/scripts/packer/centos.pkr.hcl +0 -0
  92. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/scripts/packer/setup-centos.sh +0 -0
  93. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/scripts/ruff.sh +0 -0
  94. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/setup.cfg +0 -0
  95. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/__init__.py +0 -0
  96. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/__init__.py +0 -0
  97. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/__init__.py +0 -0
  98. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/base.py +0 -0
  99. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/filtering/__init__.py +0 -0
  100. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/filtering/column_value_filter.py +0 -0
  101. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/llm/__init__.py +0 -0
  102. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/llm/error_handler.py +0 -0
  103. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/llm/llm_chat_block.py +0 -0
  104. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/llm/llm_parser_block.py +0 -0
  105. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/llm/prompt_builder_block.py +0 -0
  106. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/llm/text_parser_block.py +0 -0
  107. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/registry.py +0 -0
  108. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/transform/__init__.py +0 -0
  109. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/transform/duplicate_columns.py +0 -0
  110. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/transform/index_based_mapper.py +0 -0
  111. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/transform/json_structure_block.py +0 -0
  112. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/transform/melt_columns.py +0 -0
  113. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/transform/rename_columns.py +0 -0
  114. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/transform/text_concat.py +0 -0
  115. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/transform/uniform_col_val_setter.py +0 -0
  116. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/flow/__init__.py +0 -0
  117. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/flow/base.py +0 -0
  118. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/flow/checkpointer.py +0 -0
  119. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/flow/metadata.py +0 -0
  120. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/flow/registry.py +0 -0
  121. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/flow/validation.py +0 -0
  122. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/__init__.py +0 -0
  123. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/datautils.py +0 -0
  124. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/error_handling.py +0 -0
  125. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/flow_id_words.yaml +0 -0
  126. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/flow_identifier.py +0 -0
  127. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/flow_metrics.py +0 -0
  128. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/logger_config.py +0 -0
  129. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/path_resolution.py +0 -0
  130. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/time_estimator.py +0 -0
  131. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/yaml_utils.py +0 -0
  132. {sdg_hub-0.6.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa → sdg_hub-0.7.0/src/sdg_hub/flows/evaluation/rag}/__init__.py +0 -0
  133. {sdg_hub-0.6.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/detailed_summary → sdg_hub-0.7.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa}/__init__.py +0 -0
  134. {sdg_hub-0.6.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/doc_direct_qa → sdg_hub-0.7.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/detailed_summary}/__init__.py +0 -0
  135. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/detailed_summary/detailed_summary.yaml +0 -0
  136. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/detailed_summary/flow.yaml +0 -0
  137. {sdg_hub-0.6.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/extractive_summary → sdg_hub-0.7.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/doc_direct_qa}/__init__.py +0 -0
  138. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/doc_direct_qa/flow.yaml +0 -0
  139. {sdg_hub-0.6.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/key_facts → sdg_hub-0.7.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/extractive_summary}/__init__.py +0 -0
  140. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/extractive_summary/extractive_summary.yaml +0 -0
  141. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/extractive_summary/flow.yaml +0 -0
  142. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/generate_answers.yaml +0 -0
  143. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/generate_multiple_qa.yaml +0 -0
  144. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/generate_question_list.yaml +0 -0
  145. {sdg_hub-0.6.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab → sdg_hub-0.7.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/key_facts}/__init__.py +0 -0
  146. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/key_facts/flow.yaml +0 -0
  147. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/key_facts/key_facts_summary.yaml +0 -0
  148. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/README.md +0 -0
  149. {sdg_hub-0.6.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/multilingual/japanese → sdg_hub-0.7.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab}/__init__.py +0 -0
  150. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/atomic_facts.yaml +0 -0
  151. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/detailed_summary.yaml +0 -0
  152. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/evaluate_faithfulness.yaml +0 -0
  153. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/evaluate_question.yaml +0 -0
  154. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/evaluate_relevancy.yaml +0 -0
  155. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/extractive_summary.yaml +0 -0
  156. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/flow.yaml +0 -0
  157. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/generate_questions_responses.yaml +0 -0
  158. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/multilingual/japanese/README.md +0 -0
  159. {sdg_hub-0.6.0/tests → sdg_hub-0.7.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/multilingual/japanese}/__init__.py +0 -0
  160. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/multilingual/japanese/atomic_facts_ja.yaml +0 -0
  161. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/multilingual/japanese/detailed_summary_ja.yaml +0 -0
  162. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/multilingual/japanese/extractive_summary_ja.yaml +0 -0
  163. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/multilingual/japanese/generate_questions_responses_ja.yaml +0 -0
  164. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/text_analysis/__init__.py +0 -0
  165. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/text_analysis/structured_insights/__init__.py +0 -0
  166. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/text_analysis/structured_insights/analyze_sentiment.yaml +0 -0
  167. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/text_analysis/structured_insights/extract_entities.yaml +0 -0
  168. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/text_analysis/structured_insights/extract_keywords.yaml +0 -0
  169. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/text_analysis/structured_insights/flow.yaml +0 -0
  170. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/text_analysis/structured_insights/summarize.yaml +0 -0
  171. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/py.typed +0 -0
  172. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub.egg-info/dependency_links.txt +0 -0
  173. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub.egg-info/requires.txt +0 -0
  174. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub.egg-info/top_level.txt +0 -0
  175. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/filtering/test_columnvaluefilter.py +0 -0
  176. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/llm/test_llm_chat_block.py +0 -0
  177. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/llm/test_llm_parser_block.py +0 -0
  178. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/llm/test_promptbuilderblock.py +0 -0
  179. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/llm/test_textparserblock.py +0 -0
  180. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/test_base_block.py +0 -0
  181. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/test_registry.py +0 -0
  182. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/testdata/test_config.yaml +0 -0
  183. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/testdata/test_prompt_format_config.yaml +0 -0
  184. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/testdata/test_prompt_format_no_system.yaml +0 -0
  185. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/testdata/test_prompt_format_strict.yaml +0 -0
  186. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/testdata/test_prompt_invalid_final_role.yaml +0 -0
  187. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/testdata/test_prompt_no_user_messages.yaml +0 -0
  188. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/transform/test_index_based_mapper.py +0 -0
  189. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/transform/test_json_structure_block.py +0 -0
  190. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/transform/test_melt_columns.py +0 -0
  191. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/transform/test_rename_columns.py +0 -0
  192. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/transform/test_text_concat.py +0 -0
  193. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/transform/test_uniform_col_val_setter.py +0 -0
  194. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/__init__.py +0 -0
  195. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/conftest.py +0 -0
  196. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/test_base.py +0 -0
  197. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/test_checkpointer.py +0 -0
  198. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/test_dataset_requirements.py +0 -0
  199. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/test_integration.py +0 -0
  200. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/test_metadata.py +0 -0
  201. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/test_registry.py +0 -0
  202. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/test_time_estimation.py +0 -0
  203. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/test_validation.py +0 -0
  204. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/integration/README.md +0 -0
  205. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/integration/__init__.py +0 -0
  206. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/integration/knowledge_tuning/enhanced_summary_knowledge_tuning/README.md +0 -0
  207. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/integration/knowledge_tuning/enhanced_summary_knowledge_tuning/__init__.py +0 -0
  208. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/integration/knowledge_tuning/enhanced_summary_knowledge_tuning/conftest.py +0 -0
  209. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/integration/knowledge_tuning/enhanced_summary_knowledge_tuning/test_data/test_seed_data.jsonl +0 -0
  210. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/integration/knowledge_tuning/enhanced_summary_knowledge_tuning/test_functional.py +0 -0
  211. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/utils/test_datautils.py +0 -0
  212. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/utils/test_error_handling.py +0 -0
  213. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/utils/test_flow_metrics.py +0 -0
  214. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/utils/test_path_resolution.py +0 -0
  215. {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tox.ini +0 -0
@@ -1,3 +1,3 @@
  # Since dependabot cannot update workflows using docker,
  # we use this indirection since dependabot can update this file.
- FROM rhysd/actionlint:1.7.7@sha256:887a259a5a534f3c4f36cb02dca341673c6089431057242cdc931e9f133147e9
+ FROM rhysd/actionlint:1.7.9@sha256:a0383f60d92601e2694e24b24d37df7b6a40bed7cedbc447611c50009bf02d94
@@ -39,6 +39,6 @@ jobs:
  - name: "Checkout"
    uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4.1.6
  - name: "Check Markdown documents"
-   uses: DavidAnson/markdownlint-cli2-action@992badcdf24e3b8eb7e87ff9287fe931bcb00c6e # v20.0.0
+   uses: DavidAnson/markdownlint-cli2-action@30a0e04f1870d58f8d717450cc6134995f993c63 # v21.0.0
    with:
      globs: '**/*.md'
@@ -139,7 +139,7 @@ jobs:
      flags: integration

  - name: Upload integration test artifacts
-   uses: actions/upload-artifact@v4
+   uses: actions/upload-artifact@v5
    if: always()
    with:
      name: integration-test-results-${{ matrix.python }}-${{ matrix.platform }}
@@ -15,7 +15,7 @@ jobs:
    uses: actions/checkout@v4

  - name: Configure AWS Credentials
-   uses: aws-actions/configure-aws-credentials@ff717079ee2060e4bcee96c4779b553acc87447c
+   uses: aws-actions/configure-aws-credentials@61815dcd50bd041e203e49132bacad1fd04d2708
    with:
      role-to-assume: arn:aws:iam::851725220677:role/github-actions-packer-role
      aws-region: us-east-2
@@ -49,7 +49,7 @@ jobs:
      fetch-depth: 0

  - name: "Build and Inspect"
-   uses: hynek/build-and-inspect-python-package@c52c3a4710070b50470d903818a7b25115dcd076 # v2.13.0
+   uses: hynek/build-and-inspect-python-package@efb823f52190ad02594531168b7a2d5790e66516 # v2.14.0

  # push to Test PyPI on
  # - a new GitHub release is published
@@ -72,7 +72,7 @@ jobs:
      egress-policy: audit # TODO: change to 'egress-policy: block' after couple of runs

  - name: "Download build artifacts"
-   uses: actions/download-artifact@634f93cb2916e3fdff6788551b99b062d0335ce0 # v5.0.0
+   uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
    with:
      name: Packages
      path: dist
@@ -104,13 +104,13 @@ jobs:
      egress-policy: audit # TODO: change to 'egress-policy: block' after couple of runs

  - name: "Download build artifacts"
-   uses: actions/download-artifact@634f93cb2916e3fdff6788551b99b062d0335ce0 # v5.0.0
+   uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
    with:
      name: Packages
      path: dist

  - name: "Sigstore sign package"
-   uses: sigstore/gh-action-sigstore-python@f7ad0af51a5648d09a20d00370f0a91c3bdf8f84 # v3.0.1
+   uses: sigstore/gh-action-sigstore-python@f832326173235dcb00dd5d92cd3f353de3188e6c # v3.1.0
    with:
      inputs: |
        ./dist/*.tar.gz
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: sdg_hub
- Version: 0.6.0
+ Version: 0.7.0
  Summary: Synthetic Data Generation
  Author-email: Red Hat AI Innovation <abhandwa@redhat.com>
  License: Apache-2.0
@@ -70,7 +70,9 @@ Dynamic: license-file
  [![Tests](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/actions/workflows/test.yml/badge.svg)](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/actions/workflows/test.yml)
  [![codecov](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/sdg_hub/graph/badge.svg?token=SP75BCXWO2)](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/sdg_hub)

-
+ <p align="center">
+ <img src="docs/assets/sdg-hub-cover.png" alt="SDG Hub Cover" width="400">
+ </p>

  A modular Python framework for building synthetic data generation pipelines using composable blocks and flows. Transform datasets through **building-block composition** - mix and match LLM-powered and traditional processing blocks to create sophisticated data generation workflows.

@@ -6,7 +6,9 @@
  [![Tests](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/actions/workflows/test.yml/badge.svg)](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/actions/workflows/test.yml)
  [![codecov](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/sdg_hub/graph/badge.svg?token=SP75BCXWO2)](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/sdg_hub)

-
+ <p align="center">
+ <img src="docs/assets/sdg-hub-cover.png" alt="SDG Hub Cover" width="400">
+ </p>

  A modular Python framework for building synthetic data generation pipelines using composable blocks and flows. Transform datasets through **building-block composition** - mix and match LLM-powered and traditional processing blocks to create sophisticated data generation workflows.

@@ -6,7 +6,7 @@
  [![Tests](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/actions/workflows/test.yml/badge.svg)](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/actions/workflows/test.yml)
  [![codecov](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/sdg_hub/graph/badge.svg?token=SP75BCXWO2)](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/sdg_hub)

- A modular Python framework for building synthetic data generation pipelines using composable blocks and flows. Transform datasets through **building-block composition** - mix and match LLM-powered and traditional processing blocks to create sophisticated data generation workflows.
+ A modular Python framework for building synthetic data generation pipelines using composable blocks and flows

  ## 🧱 Core Philosophy

@@ -52,11 +52,11 @@ Learn about the modular block architecture that powers SDG Hub:
  - **[Custom Blocks](blocks/custom-blocks.md)** - Building your own processing blocks

  ### Flow System
- Master the orchestration system for building complete pipelines:
- - **[Flow Overview](flows/overview.md)** - Understanding flow orchestration
- - **[YAML Configuration](flows/yaml-configuration.md)** - Structure and parameters
+ Master the orchestration system for building complete flows:
+ - **[Flow Overview](flows/overview.md)** - Understanding flow orchestration and YAML structure
  - **[Flow Discovery](flows/discovery.md)** - Registry and auto-discovery system
- - **[Custom Flows](flows/custom-flows.md)** - Building custom pipeline flows
+ - **[Custom Flows](flows/custom-flows.md)** - Building custom flows
+ - **[Available Flows](flows/available-flows.md)** - Pre-built flows in the ecosystem

  ### Advanced Topics
  - **[API Reference](api-reference.md)** - Complete API documentation
@@ -0,0 +1,13 @@
+ <!-- ![logo](https://your-logo-url.png)
+
+ # SDG Hub
+
+ > A modular Python framework for building synthetic data generation pipelines using composable blocks and flows.
+
+ - Mix and match LLM-powered and traditional processing blocks like Lego pieces.
+ - High-performance async execution with built-in error handling and retry logic.
+ - Type-safe configurations with Pydantic validation throughout.
+ - Zero-config auto-discovery of blocks and flows.
+
+ [GitHub](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub)
+ [Get Started](quick-start.md) -->
@@ -13,8 +13,8 @@

  * **Flow System**
    * [Overview](flows/overview.md)
-   * [YAML Configuration](flows/yaml-configuration.md)
    * [Flow Discovery](flows/discovery.md)
+   * [Available Flows](flows/available-flows.md)
    * [Custom Flows](flows/custom-flows.md)

  * **Advanced**
Binary file
@@ -9,10 +9,15 @@ Learn how to create your own custom blocks to extend SDG Hub's functionality. Cu
  All custom blocks must inherit from `BaseBlock` and implement the `generate()` method:

  ```python
+ # Standard library imports
+ from typing import Any
+
+ # Third-party imports
+ import pandas as pd
+
+ # Local imports
  from sdg_hub.core.blocks.base import BaseBlock
  from sdg_hub.core.blocks.registry import BlockRegistry
- from datasets import Dataset
- from typing import Any

  @BlockRegistry.register(
      "MyCustomBlock", # Block name for discovery
@@ -22,9 +27,41 @@ from typing import Any
  class MyCustomBlock(BaseBlock):
      """Custom block that performs specific processing."""

-     def generate(self, samples: Dataset, **kwargs: Any) -> Dataset:
-         """Implement your custom processing logic here."""
-         #TODO: Add Custom block boilerplate code here
+     def generate(self, samples: pd.DataFrame, **kwargs: Any) -> pd.DataFrame:
+         """Implement your custom processing logic here.
+
+         Parameters
+         ----------
+         samples : pd.DataFrame
+             Input dataset to process.
+         **kwargs : Any
+             Additional runtime parameters.
+
+         Returns
+         -------
+         pd.DataFrame
+             Processed dataset with new columns added.
+         """
+         # Validate required columns exist (optional - BaseBlock already does this)
+         for col in self.input_cols:
+             if col not in samples.columns:
+                 raise ValueError(f"Required column '{col}' not found in dataset")
+
+         # Create a copy to avoid modifying the input
+         result = samples.copy()
+
+         # Process each row (example: transform input column to output column)
+         processed_data = []
+         for idx, row in result.iterrows():
+             # Your custom processing logic here
+             input_value = row[self.input_cols[0]]
+             processed_value = f"Processed: {input_value}"
+             processed_data.append(processed_value)
+
+         # Add the processed data as a new column
+         result[self.output_cols[0]] = processed_data
+
+         return result
  ```

  ### Block Configuration
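Note on the hunk above: the `@BlockRegistry.register(...)` decorator attaches a "block name for discovery" to the class, and the new `generate()` body copies its input before adding an output column. The sketch below illustrates both ideas in plain Python, under loud assumptions. `_TOY_REGISTRY`, `register`, and the list-of-dicts rows are hypothetical stand-ins chosen for illustration; they are not sdg_hub's `BlockRegistry`, `BaseBlock`, or its pandas-based data model, and the real library may work differently.

```python
from typing import Any, Callable

# Toy stand-in for a name-based block registry. This is NOT sdg_hub's
# BlockRegistry; every name in this sketch is hypothetical.
_TOY_REGISTRY: dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Return a class decorator that stores the class under `name`."""

    def decorator(cls: type) -> type:
        if name in _TOY_REGISTRY:
            raise ValueError(f"Block '{name}' is already registered")
        _TOY_REGISTRY[name] = cls
        return cls  # hand the class back unchanged

    return decorator


@register("MyCustomBlock")
class MyCustomBlock:
    """Toy block: reads one input column and writes one output column."""

    def __init__(self, input_col: str, output_col: str) -> None:
        self.input_col = input_col
        self.output_col = output_col

    def generate(self, rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
        # Build new row dicts so the caller's input is left untouched,
        # mirroring the "copy, then add a column" shape of the doc example.
        return [
            {**row, self.output_col: f"Processed: {row[self.input_col]}"}
            for row in rows
        ]


# Look the class up by its registered name, as a discovery step might.
block_cls = _TOY_REGISTRY["MyCustomBlock"]
block = block_cls(input_col="text", output_col="processed")
rows = [{"text": "hello"}, {"text": "world"}]
result = block.generate(rows)
```

Returning new row dictionaries rather than mutating `rows` keeps the input reusable by later steps, which is the same reason the documented example calls `samples.copy()` first.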
@@ -222,11 +222,207 @@ dataset = Dataset.from_dict({

  ## 🏗️ PromptBuilderBlock

- Constructs prompts from templates and data with validation and formatting support.
+ Constructs prompts from templates and data with validation and formatting support. Uses Jinja2 templating to dynamically render messages from dataset columns into structured chat format or plain text.

  ### Basic Template Usage

- #TODO: Add prompt builder block example
+ Create a YAML configuration file defining your prompt template:
+
+ ```yaml
+ # qa_prompt.yaml
+ - role: system
+   content: "You are an expert {{domain}} assistant with deep knowledge in the field."
+
+ - role: user
+   content: |
+     Please answer the following question based on the context provided.
+
+     Context: {{context}}
+     Question: {{question}}
+
+     Provide a clear and accurate answer.
+ ```
+
+ Use the template with PromptBuilderBlock:
+
+ ```python
+ from sdg_hub.core.blocks import PromptBuilderBlock
+ import pandas as pd
+
+ # Create the prompt builder block
+ prompt_builder = PromptBuilderBlock(
+     block_name="qa_prompter",
+     input_cols=["domain", "context", "question"],
+     output_cols="messages",
+     prompt_config_path="qa_prompt.yaml",
+     format_as_messages=True  # Output as chat messages (default)
+ )
+
+ # Create dataset with your data
+ dataset = pd.DataFrame([
+     {
+         "domain": "physics",
+         "context": "Newton's laws describe the relationship between forces and motion.",
+         "question": "What is Newton's first law?"
+     },
+     {
+         "domain": "biology",
+         "context": "DNA contains the genetic instructions for living organisms.",
+         "question": "What is the role of DNA?"
+     }
+ ])
+
+ # Generate formatted prompts
+ result = prompt_builder.generate(dataset)
+
+ # Result contains messages in OpenAI chat format
+ print(result["messages"][0])
+ # [
+ #   {"role": "system", "content": "You are an expert physics assistant..."},
+ #   {"role": "user", "content": "Please answer the following question..."}
+ # ]
+ ```
+
+ ### Column Mapping with Dictionary
+
+ Map dataset column names to different template variable names:
+
+ ```python
+ # When dataset columns don't match template variable names
+ prompt_builder = PromptBuilderBlock(
+     block_name="mapped_prompter",
+     input_cols={
+         "article_text": "context",  # Maps article_text column to {{context}}
+         "user_query": "question",  # Maps user_query column to {{question}}
+         "subject": "domain"  # Maps subject column to {{domain}}
+     },
+     output_cols="messages",
+     prompt_config_path="qa_prompt.yaml"
+ )
+
+ dataset = pd.DataFrame([{
+     "article_text": "Einstein's theory of relativity...",
+     "user_query": "What is time dilation?",
+     "subject": "physics"
+ }])
+
+ result = prompt_builder.generate(dataset)
+ ```
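Reviewer note (not part of the diffed file): the column-to-variable mapping described above can be illustrated without sdg_hub or Jinja2. The sketch below is a standard-library stand-in, not the library's implementation. The helper name `render_row` is hypothetical, the regex substitution only handles plain `{{name}}` placeholders, and leaving unknown placeholders untouched is an assumption made here for illustration.

```python
import re


def render_row(template, row, input_cols):
    """Fill {{variable}} placeholders in `template` from one dataset row.

    `input_cols` may be a str, a list of column names, or a dict mapping
    column name -> template variable name (hypothetical helper, stdlib only).
    """
    # Normalize the three accepted shapes into {column_name: variable_name}
    if isinstance(input_cols, str):
        mapping = {input_cols: input_cols}
    elif isinstance(input_cols, dict):
        mapping = dict(input_cols)
    else:
        mapping = {col: col for col in input_cols}

    # Build the variables the template sees, keyed by template variable name
    variables = {var: row[col] for col, var in mapping.items()}

    def substitute(match):
        name = match.group(1)
        # Assumption: unknown placeholders are left as-is
        return str(variables.get(name, match.group(0)))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


row = {
    "article_text": "Einstein's theory of relativity...",
    "user_query": "What is time dilation?",
    "subject": "physics",
}
mapping = {"article_text": "context", "user_query": "question", "subject": "domain"}
print(render_row("Context: {{context}}\nQuestion: {{question}}", row, mapping))
```

The dict form keeps the template reusable: the same `qa_prompt.yaml` variables can be fed from differently named columns by changing only the mapping.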
+
+ ### Plain Text Format
+
+ Generate formatted text instead of structured messages:
+
+ ```python
+ # evaluation_prompt.yaml
+ # - role: system
+ #   content: "You are an evaluator assessing response quality."
+ # - role: user
+ #   content: |
+ #     Document: {{document}}
+ #     Response: {{response}}
+ #
+ #     Is the response faithful to the document? Answer YES or NO.
+
+ prompt_builder = PromptBuilderBlock(
+     block_name="eval_prompter",
+     input_cols=["document", "response"],
+     output_cols="formatted_prompt",
+     prompt_config_path="evaluation_prompt.yaml",
+     format_as_messages=False  # Output as plain text
+ )
+
+ dataset = pd.DataFrame([{
+     "document": "The capital of France is Paris.",
+     "response": "Paris is the capital of France."
+ }])
+
+ result = prompt_builder.generate(dataset)
+
+ print(result["formatted_prompt"][0])
+ # system: You are an evaluator assessing response quality.
+ #
+ # user: Document: The capital of France is Paris.
+ # Response: Paris is the capital of France.
+ #
+ # Is the response faithful to the document? Answer YES or NO.
+ ```
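Reviewer note (not part of the diffed file): the plain-text output shown above looks like each message rendered as `role: content` with a blank line between messages. A minimal sketch of that flattening follows. The function name `messages_to_text`, the blank-line separator, and the trailing-whitespace stripping are inferred from the printed example, not taken from the sdg_hub source.

```python
def messages_to_text(messages):
    """Flatten chat messages into one string with role prefixes.

    Assumption: messages are joined by a blank line and trailing
    whitespace in each content string is dropped.
    """
    return "\n\n".join(
        f"{message['role']}: {message['content'].strip()}" for message in messages
    )


messages = [
    {"role": "system", "content": "You are an evaluator assessing response quality."},
    {
        "role": "user",
        "content": (
            "Document: The capital of France is Paris.\n"
            "Response: Paris is the capital of France.\n"
            "\n"
            "Is the response faithful to the document? Answer YES or NO.\n"
        ),
    },
]
print(messages_to_text(messages))
```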
+
+ ### Practical Example: Question Generation Pipeline
+
+ Complete example showing PromptBuilderBlock with LLMChatBlock:
+
+ ```python
+ from sdg_hub.core.blocks import PromptBuilderBlock, LLMChatBlock
+ import pandas as pd
+
+ # Step 1: Create template for question generation
+ # question_gen_prompt.yaml:
+ # - role: system
+ #   content: "You are a question generation assistant."
+ # - role: user
+ #   content: |
+ #     Generate 3 questions based on this text:
+ #     {{text}}
+ #
+ #     Format: Return questions separated by newlines.
+
+ # Step 2: Configure prompt builder
+ prompt_builder = PromptBuilderBlock(
+     block_name="question_prompter",
+     input_cols="text",
+     output_cols="messages",
+     prompt_config_path="question_gen_prompt.yaml"
+ )
+
+ # Step 3: Configure LLM chat block
+ chat_block = LLMChatBlock(
+     block_name="question_generator",
+     model="openai/gpt-4o",
+     api_key="your-api-key",
+     input_cols="messages",
+     output_cols="llm_response",
+     temperature=0.7
+ )
+
+ # Step 4: Process dataset
+ dataset = pd.DataFrame([{
+     "text": "Machine learning is a subset of AI that enables systems to learn from data."
+ }])
+
+ # Execute pipeline
+ result = prompt_builder.generate(dataset)
+ result = chat_block.generate(result)
+
+ print(result["llm_response"][0])
+ # Generated questions based on the text
+ ```
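Reviewer note (not part of the diffed file): the two `generate` calls above chain by hand, with each block's output becoming the next block's input. The sketch below shows the same pattern generalized to any number of blocks. `UppercaseBlock`, `LengthBlock`, and `run_blocks` are hypothetical stand-ins that operate on plain lists of dicts rather than DataFrames, so the snippet runs without sdg_hub or an API key.

```python
from functools import reduce


class UppercaseBlock:
    """Hypothetical block: rewrites the 'text' column in upper case."""

    def generate(self, samples, **kwargs):
        return [{**row, "text": row["text"].upper()} for row in samples]


class LengthBlock:
    """Hypothetical block: adds a 'length' column computed from 'text'."""

    def generate(self, samples, **kwargs):
        return [{**row, "length": len(row["text"])} for row in samples]


def run_blocks(blocks, samples):
    """Feed `samples` through each block in order, left to right."""
    return reduce(lambda data, block: block.generate(data), blocks, samples)


result = run_blocks([UppercaseBlock(), LengthBlock()], [{"text": "abc"}])
print(result)
```

Because every block exposes the same `generate(samples)` shape, the order of the list is the order of execution, which is the property a flow definition relies on.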
+
+ ### Configuration Reference
+
+ **Required Parameters:**
+ - `block_name` - Unique identifier for the block
+ - `input_cols` - Column specification (str, list, or dict for mapping)
+ - `output_cols` - Single output column name (must be exactly one)
+ - `prompt_config_path` - Path to YAML template file
+
+ **Optional Parameters:**
+ - `format_as_messages` - Output format (default: `True`)
+   - `True`: List of dicts with 'role' and 'content' keys
+   - `False`: Concatenated string with role prefixes
+
+ **Template Requirements:**
+ - Must be a YAML list of message objects
+ - Each message requires 'role' and 'content' fields
+ - Valid roles: 'system', 'user', 'assistant', 'tool'
+ - Must contain at least one 'user' message
+ - Final message must have role='user' for chat completion
+ - Content supports Jinja2 templating syntax
+
+ **Template Variable Resolution:**
+ - Variables in `{{...}}` are replaced with dataset column values
+ - Use `input_cols` dict to map column names to template variables
+ - Missing variables are logged as warnings
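Reviewer note (not part of the diffed file): the template requirements listed above can be checked before a template is handed to a block. The sketch below applies those rules to an already-parsed template (a list of dicts, as a YAML loader would return). `validate_messages` and `VALID_ROLES` are hypothetical names, and returning a list of problems instead of raising is a design choice made here, not the library's behavior.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}


def validate_messages(messages):
    """Return a list of problems; an empty list means the template passes."""
    if not isinstance(messages, list) or not messages:
        return ["template must be a non-empty list of message objects"]

    problems = []
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {index}: not a mapping")
            continue
        # Each message requires both fields
        for field in ("role", "content"):
            if field not in message:
                problems.append(f"message {index}: missing '{field}'")
        role = message.get("role")
        if role is not None and role not in VALID_ROLES:
            problems.append(f"message {index}: invalid role '{role}'")

    roles = [m.get("role") for m in messages if isinstance(m, dict)]
    if "user" not in roles:
        problems.append("template must contain at least one 'user' message")

    last = messages[-1]
    if not (isinstance(last, dict) and last.get("role") == "user"):
        problems.append("final message must have role 'user'")
    return problems


good = [
    {"role": "system", "content": "You are an expert {{domain}} assistant."},
    {"role": "user", "content": "Question: {{question}}"},
]
bad = [{"role": "user", "content": "hi"}, {"role": "narrator"}]

print(validate_messages(good))
print(validate_messages(bad))
```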

  ## 🔍 TextParserBlock

@@ -141,7 +141,7 @@ result = block.generate(dataset) # ❌ Error!

  Ready to dive deeper? Explore specific block categories:

- - **[LLM Blocks](llm-blocks.md)** - AI-powered language model operations
- - **[Transform Blocks](transform-blocks.md)** - Data manipulation and reshaping
- - **[Filtering Blocks](filtering-blocks.md)** - Quality control and validation
- - **[Custom Blocks](custom-blocks.md)** - Build your own processing blocks
+ - **[LLM Blocks](blocks/llm-blocks.md)** - AI-powered language model operations
+ - **[Transform Blocks](blocks/transform-blocks.md)** - Data manipulation and reshaping
+ - **[Filtering Blocks](blocks/filtering-blocks.md)** - Quality control and validation
+ - **[Custom Blocks](blocks/custom-blocks.md)** - Build your own processing blocks
@@ -26,16 +26,12 @@ SDG Hub organizes blocks into logical categories:
  | **Filtering** | Quality control | Value-based filtering, threshold checks |
  | **Evaluation** | Quality assessment | Faithfulness scoring, relevancy evaluation |

- ### Block Example
- #TODO: Add block example
+ For detailed block examples and usage patterns, see [Block System Overview](blocks/overview.md).

  ## 🌊 Flows: Orchestrating Pipelines

  **Flows** are YAML-defined pipelines that orchestrate multiple blocks into complete data processing workflows.

- ### Flow Structure
- #TODO: Add flow structure
-
  ### Flow Execution Model

  Flows execute blocks sequentially:
@@ -58,6 +54,8 @@ Each block:
  - **🛡️ Validation** - Built-in checks for configuration and data compatibility
  - **📊 Monitoring** - Execution tracking and performance metrics

+ For detailed flow structure and YAML configuration examples, see [Flow System Overview](flows/overview.md).
+
  ## 🔍 Auto-Discovery System

  SDG Hub automatically discovers and registers components with zero configuration.
@@ -187,6 +185,6 @@ print(f"Output columns: {result['final_dataset']['columns']}")
  Now that you understand the core concepts:

  1. **[Explore Block Types](blocks/overview.md)** - Learn about specific block categories
- 2. **[Master Flow Configuration](flows/yaml-configuration.md)** - Deep dive into YAML structure
+ 2. **[Understand Flow System](flows/overview.md)** - Chain blocks into complete flows
  3. **[Build Custom Components](blocks/custom-blocks.md)** - Create your own blocks
- 4. **[Advanced Patterns](flows/custom-flows.md)** - Build sophisticated pipelines
+ 4. **[Advanced Patterns](flows/custom-flows.md)** - Build sophisticated flows