sdg-hub 0.6.0__tar.gz → 0.7.0__tar.gz
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/actionlint.dockerfile +1 -1
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/docs.yml +1 -1
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/integration-test.yml +1 -1
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/packer.yml +1 -1
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/pypi.yaml +4 -4
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/PKG-INFO +4 -2
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/README.md +3 -1
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/README.md +5 -5
- sdg_hub-0.7.0/docs/_coverpage.md +13 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/_sidebar.md +1 -1
- sdg_hub-0.7.0/docs/assets/logo.png +0 -0
- sdg_hub-0.7.0/docs/assets/sdg-hub-cover.png +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/blocks/custom-blocks.md +42 -5
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/blocks/llm-blocks.md +198 -2
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/blocks/overview.md +4 -4
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/concepts.md +5 -7
- sdg_hub-0.7.0/docs/flows/available-flows.md +974 -0
- sdg_hub-0.7.0/docs/flows/custom-flows.md +62 -0
- sdg_hub-0.7.0/docs/flows/discovery.md +472 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/index.html +13 -21
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/enhanced_summary_knowledge_tuning/knowledge_mixing.ipynb +2 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/document_pre_processing.ipynb +14 -12
- sdg_hub-0.7.0/examples/knowledge_tuning/instructlab/knowledge_generation_ja.ipynb +458 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/knowledge_utils.py +19 -18
- sdg_hub-0.7.0/examples/rag_evaluation/ibm-annual-report-2024.pdf +0 -0
- sdg_hub-0.7.0/examples/rag_evaluation/rag_evaluation_dataset_generation.ipynb +505 -0
- sdg_hub-0.7.0/scripts/snyk_notebook_scan.sh +196 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/_version.py +3 -3
- sdg_hub-0.7.0/src/sdg_hub/flows/evaluation/rag/answer_generation.yaml +21 -0
- sdg_hub-0.7.0/src/sdg_hub/flows/evaluation/rag/conceptual_qa_generation.yaml +25 -0
- sdg_hub-0.7.0/src/sdg_hub/flows/evaluation/rag/context_extraction.yaml +23 -0
- sdg_hub-0.7.0/src/sdg_hub/flows/evaluation/rag/flow.yaml +201 -0
- sdg_hub-0.7.0/src/sdg_hub/flows/evaluation/rag/groundedness_critic.yaml +24 -0
- sdg_hub-0.7.0/src/sdg_hub/flows/evaluation/rag/question_evolution.yaml +18 -0
- sdg_hub-0.7.0/src/sdg_hub/flows/evaluation/rag/topic_generation.yaml +12 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/multilingual/japanese/flow.yaml +1 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub.egg-info/PKG-INFO +4 -2
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub.egg-info/SOURCES.txt +16 -0
- sdg_hub-0.7.0/tests/__init__.py +0 -0
- sdg_hub-0.6.0/docs/_coverpage.md +0 -14
- sdg_hub-0.6.0/docs/flows/discovery.md +0 -206
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/actionlint.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/actions/free-disk-space/action.yml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/dependabot.yml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/mergify.yml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/actionlint.yml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/lint.yml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/matchers/actionlint.json +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/matchers/pylint.json +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/test.yml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.gitignore +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.isort.cfg +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.markdownlint-cli2.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.pre-commit-config.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/.pylintrc +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/CLAUDE.md +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/CONTRIBUTING.md +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/LICENSE +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/Makefile +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/.nojekyll +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/_navbar.md +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/api-reference.md +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/blocks/filtering-blocks.md +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/blocks/transform-blocks.md +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/development.md +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/flows/overview.md +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/installation.md +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/quick-start.md +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/enhanced_summary_knowledge_tuning/.env.example +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/enhanced_summary_knowledge_tuning/README.md +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/enhanced_summary_knowledge_tuning/document_pre_processing.ipynb +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/enhanced_summary_knowledge_tuning/knowledge_generation.ipynb +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/enhanced_summary_knowledge_tuning/knowledge_mixing_utils.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/enhanced_summary_knowledge_tuning/raft_builder.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/.gitignore +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/README.md +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/assets/imgs/instructlab-banner.png +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/docling_v2_config.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/docparser.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/docparser_v2.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/document_collection/ibm-annual-report/ibm-annual-report-2024.json +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/document_collection/ibm-annual-report/ibm-annual-report-2024.md +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/document_collection/ibm-annual-report/ibm-annual-report-2024.pdf +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/document_collection/ibm-annual-report/qna.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/knowledge_generation_and_mixing.ipynb +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/knowledge_tuning/instructlab/logger_config.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/text_analysis/README.md +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/text_analysis/extract_stock_tickers.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/examples/text_analysis/structured_insights_demo.ipynb +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/pyproject.toml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/scripts/packer/centos.pkr.hcl +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/scripts/packer/setup-centos.sh +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/scripts/ruff.sh +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/setup.cfg +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/base.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/filtering/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/filtering/column_value_filter.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/llm/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/llm/error_handler.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/llm/llm_chat_block.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/llm/llm_parser_block.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/llm/prompt_builder_block.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/llm/text_parser_block.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/registry.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/transform/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/transform/duplicate_columns.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/transform/index_based_mapper.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/transform/json_structure_block.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/transform/melt_columns.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/transform/rename_columns.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/transform/text_concat.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/blocks/transform/uniform_col_val_setter.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/flow/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/flow/base.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/flow/checkpointer.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/flow/metadata.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/flow/registry.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/flow/validation.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/datautils.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/error_handling.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/flow_id_words.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/flow_identifier.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/flow_metrics.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/logger_config.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/path_resolution.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/time_estimator.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/core/utils/yaml_utils.py +0 -0
- {sdg_hub-0.6.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa → sdg_hub-0.7.0/src/sdg_hub/flows/evaluation/rag}/__init__.py +0 -0
- {sdg_hub-0.6.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/detailed_summary → sdg_hub-0.7.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa}/__init__.py +0 -0
- {sdg_hub-0.6.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/doc_direct_qa → sdg_hub-0.7.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/detailed_summary}/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/detailed_summary/detailed_summary.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/detailed_summary/flow.yaml +0 -0
- {sdg_hub-0.6.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/extractive_summary → sdg_hub-0.7.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/doc_direct_qa}/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/doc_direct_qa/flow.yaml +0 -0
- {sdg_hub-0.6.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/key_facts → sdg_hub-0.7.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/extractive_summary}/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/extractive_summary/extractive_summary.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/extractive_summary/flow.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/generate_answers.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/generate_multiple_qa.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/generate_question_list.yaml +0 -0
- {sdg_hub-0.6.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab → sdg_hub-0.7.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/key_facts}/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/key_facts/flow.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/enhanced_multi_summary_qa/key_facts/key_facts_summary.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/README.md +0 -0
- {sdg_hub-0.6.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/multilingual/japanese → sdg_hub-0.7.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab}/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/atomic_facts.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/detailed_summary.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/evaluate_faithfulness.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/evaluate_question.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/evaluate_relevancy.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/extractive_summary.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/flow.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/generate_questions_responses.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/multilingual/japanese/README.md +0 -0
- {sdg_hub-0.6.0/tests → sdg_hub-0.7.0/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/multilingual/japanese}/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/multilingual/japanese/atomic_facts_ja.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/multilingual/japanese/detailed_summary_ja.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/multilingual/japanese/extractive_summary_ja.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/qa_generation/document_grounded_qa/multi_summary_qa/multilingual/japanese/generate_questions_responses_ja.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/text_analysis/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/text_analysis/structured_insights/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/text_analysis/structured_insights/analyze_sentiment.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/text_analysis/structured_insights/extract_entities.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/text_analysis/structured_insights/extract_keywords.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/text_analysis/structured_insights/flow.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/flows/text_analysis/structured_insights/summarize.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub/py.typed +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub.egg-info/dependency_links.txt +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub.egg-info/requires.txt +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/src/sdg_hub.egg-info/top_level.txt +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/filtering/test_columnvaluefilter.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/llm/test_llm_chat_block.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/llm/test_llm_parser_block.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/llm/test_promptbuilderblock.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/llm/test_textparserblock.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/test_base_block.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/test_registry.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/testdata/test_config.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/testdata/test_prompt_format_config.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/testdata/test_prompt_format_no_system.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/testdata/test_prompt_format_strict.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/testdata/test_prompt_invalid_final_role.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/testdata/test_prompt_no_user_messages.yaml +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/transform/test_index_based_mapper.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/transform/test_json_structure_block.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/transform/test_melt_columns.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/transform/test_rename_columns.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/transform/test_text_concat.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/blocks/transform/test_uniform_col_val_setter.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/conftest.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/test_base.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/test_checkpointer.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/test_dataset_requirements.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/test_integration.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/test_metadata.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/test_registry.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/test_time_estimation.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/flow/test_validation.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/integration/README.md +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/integration/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/integration/knowledge_tuning/enhanced_summary_knowledge_tuning/README.md +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/integration/knowledge_tuning/enhanced_summary_knowledge_tuning/__init__.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/integration/knowledge_tuning/enhanced_summary_knowledge_tuning/conftest.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/integration/knowledge_tuning/enhanced_summary_knowledge_tuning/test_data/test_seed_data.jsonl +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/integration/knowledge_tuning/enhanced_summary_knowledge_tuning/test_functional.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/utils/test_datautils.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/utils/test_error_handling.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/utils/test_flow_metrics.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tests/utils/test_path_resolution.py +0 -0
- {sdg_hub-0.6.0 → sdg_hub-0.7.0}/tox.ini +0 -0
@@ -1,3 +1,3 @@
 # Since dependabot cannot update workflows using docker,
 # we use this indirection since dependabot can update this file.
-FROM rhysd/actionlint:1.7.
+FROM rhysd/actionlint:1.7.9@sha256:a0383f60d92601e2694e24b24d37df7b6a40bed7cedbc447611c50009bf02d94
@@ -39,6 +39,6 @@ jobs:
 - name: "Checkout"
   uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4.1.6
 - name: "Check Markdown documents"
-  uses: DavidAnson/markdownlint-cli2-action@
+  uses: DavidAnson/markdownlint-cli2-action@30a0e04f1870d58f8d717450cc6134995f993c63 # v21.0.0
   with:
     globs: '**/*.md'
@@ -139,7 +139,7 @@ jobs:
     flags: integration
 
 - name: Upload integration test artifacts
-  uses: actions/upload-artifact@
+  uses: actions/upload-artifact@v5
   if: always()
   with:
     name: integration-test-results-${{ matrix.python }}-${{ matrix.platform }}
@@ -15,7 +15,7 @@ jobs:
   uses: actions/checkout@v4
 
 - name: Configure AWS Credentials
-  uses: aws-actions/configure-aws-credentials@
+  uses: aws-actions/configure-aws-credentials@61815dcd50bd041e203e49132bacad1fd04d2708
   with:
     role-to-assume: arn:aws:iam::851725220677:role/github-actions-packer-role
     aws-region: us-east-2
@@ -49,7 +49,7 @@ jobs:
     fetch-depth: 0
 
 - name: "Build and Inspect"
-  uses: hynek/build-and-inspect-python-package@
+  uses: hynek/build-and-inspect-python-package@efb823f52190ad02594531168b7a2d5790e66516 # v2.14.0
 
 # push to Test PyPI on
 # - a new GitHub release is published
@@ -72,7 +72,7 @@ jobs:
     egress-policy: audit # TODO: change to 'egress-policy: block' after couple of runs
 
 - name: "Download build artifacts"
-  uses: actions/download-artifact@
+  uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
   with:
     name: Packages
     path: dist
@@ -104,13 +104,13 @@ jobs:
     egress-policy: audit # TODO: change to 'egress-policy: block' after couple of runs
 
 - name: "Download build artifacts"
-  uses: actions/download-artifact@
+  uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
   with:
     name: Packages
     path: dist
 
 - name: "Sigstore sign package"
-  uses: sigstore/gh-action-sigstore-python@
+  uses: sigstore/gh-action-sigstore-python@f832326173235dcb00dd5d92cd3f353de3188e6c # v3.1.0
   with:
     inputs: |
       ./dist/*.tar.gz
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: sdg_hub
-Version: 0.
+Version: 0.7.0
 Summary: Synthetic Data Generation
 Author-email: Red Hat AI Innovation <abhandwa@redhat.com>
 License: Apache-2.0
@@ -70,7 +70,9 @@ Dynamic: license-file
 [](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/actions/workflows/test.yml)
 [](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/sdg_hub)
 
-
+<p align="center">
+  <img src="docs/assets/sdg-hub-cover.png" alt="SDG Hub Cover" width="400">
+</p>
 
 A modular Python framework for building synthetic data generation pipelines using composable blocks and flows. Transform datasets through **building-block composition** - mix and match LLM-powered and traditional processing blocks to create sophisticated data generation workflows.
 
@@ -6,7 +6,9 @@
 [](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/actions/workflows/test.yml)
 [](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/sdg_hub)
 
-
+<p align="center">
+  <img src="docs/assets/sdg-hub-cover.png" alt="SDG Hub Cover" width="400">
+</p>
 
 A modular Python framework for building synthetic data generation pipelines using composable blocks and flows. Transform datasets through **building-block composition** - mix and match LLM-powered and traditional processing blocks to create sophisticated data generation workflows.
 
@@ -6,7 +6,7 @@
 [](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/actions/workflows/test.yml)
 [](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/sdg_hub)
 
-A modular Python framework for building synthetic data generation pipelines using composable blocks and flows
+A modular Python framework for building synthetic data generation pipelines using composable blocks and flows
 
 ## 🧱 Core Philosophy
 
@@ -52,11 +52,11 @@ Learn about the modular block architecture that powers SDG Hub:
 - **[Custom Blocks](blocks/custom-blocks.md)** - Building your own processing blocks
 
 ### Flow System
-Master the orchestration system for building complete
-- **[Flow Overview](flows/overview.md)** - Understanding flow orchestration
-- **[YAML Configuration](flows/yaml-configuration.md)** - Structure and parameters
+Master the orchestration system for building complete flows:
+- **[Flow Overview](flows/overview.md)** - Understanding flow orchestration and YAML structure
 - **[Flow Discovery](flows/discovery.md)** - Registry and auto-discovery system
-- **[Custom Flows](flows/custom-flows.md)** - Building custom
+- **[Custom Flows](flows/custom-flows.md)** - Building custom flows
+- **[Available Flows](flows/available-flows.md)** - Pre-built flows in the ecosystem
 
 ### Advanced Topics
 - **[API Reference](api-reference.md)** - Complete API documentation
@@ -0,0 +1,13 @@
+<!-- 
+
+# SDG Hub
+
+> A modular Python framework for building synthetic data generation pipelines using composable blocks and flows.
+
+- Mix and match LLM-powered and traditional processing blocks like Lego pieces.
+- High-performance async execution with built-in error handling and retry logic.
+- Type-safe configurations with Pydantic validation throughout.
+- Zero-config auto-discovery of blocks and flows.
+
+[GitHub](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub)
+[Get Started](quick-start.md) -->
@@ -13,8 +13,8 @@
 
 * **Flow System**
   * [Overview](flows/overview.md)
-  * [YAML Configuration](flows/yaml-configuration.md)
   * [Flow Discovery](flows/discovery.md)
+  * [Available Flows](flows/available-flows.md)
   * [Custom Flows](flows/custom-flows.md)
 
 * **Advanced**
Binary file
Binary file
@@ -9,10 +9,15 @@ Learn how to create your own custom blocks to extend SDG Hub's functionality. Cu
 All custom blocks must inherit from `BaseBlock` and implement the `generate()` method:
 
 ```python
+# Standard library imports
+from typing import Any
+
+# Third-party imports
+import pandas as pd
+
+# Local imports
 from sdg_hub.core.blocks.base import BaseBlock
 from sdg_hub.core.blocks.registry import BlockRegistry
-from datasets import Dataset
-from typing import Any
 
 @BlockRegistry.register(
     "MyCustomBlock",  # Block name for discovery
@@ -22,9 +27,41 @@ from typing import Any
 class MyCustomBlock(BaseBlock):
     """Custom block that performs specific processing."""
 
-    def generate(self, samples:
-        """Implement your custom processing logic here.
-
+    def generate(self, samples: pd.DataFrame, **kwargs: Any) -> pd.DataFrame:
+        """Implement your custom processing logic here.
+
+        Parameters
+        ----------
+        samples : pd.DataFrame
+            Input dataset to process.
+        **kwargs : Any
+            Additional runtime parameters.
+
+        Returns
+        -------
+        pd.DataFrame
+            Processed dataset with new columns added.
+        """
+        # Validate required columns exist (optional - BaseBlock already does this)
+        for col in self.input_cols:
+            if col not in samples.columns:
+                raise ValueError(f"Required column '{col}' not found in dataset")
+
+        # Create a copy to avoid modifying the input
+        result = samples.copy()
+
+        # Process each row (example: transform input column to output column)
+        processed_data = []
+        for idx, row in result.iterrows():
+            # Your custom processing logic here
+            input_value = row[self.input_cols[0]]
+            processed_value = f"Processed: {input_value}"
+            processed_data.append(processed_value)
+
+        # Add the processed data as a new column
+        result[self.output_cols[0]] = processed_data
+
+        return result
 ```
 
 ### Block Configuration
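Review note on the hunk above: the new `generate()` body validates the required columns, copies the input, and appends one derived column. The validate, copy, derive pattern can be sketched without sdg_hub or pandas. This is a plain-Python stand-in that uses a list of dicts in place of a DataFrame; `process_rows`, `input_col`, and `output_col` are hypothetical names and are not part of the package.

```python
from typing import Any


def process_rows(
    rows: list[dict[str, Any]], input_col: str, output_col: str
) -> list[dict[str, Any]]:
    """Return new rows with `output_col` derived from `input_col`."""
    # Validate that every row carries the required input column.
    for row in rows:
        if input_col not in row:
            raise ValueError(f"Required column '{input_col}' not found in dataset")

    # Build fresh dicts so the caller's rows are never mutated.
    return [{**row, output_col: f"Processed: {row[input_col]}"} for row in rows]


rows = [{"text": "alpha"}, {"text": "beta"}]
processed = process_rows(rows, "text", "processed")
print(processed)
```

The input list is left untouched, which mirrors the `samples.copy()` step in the documented example.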
@@ -222,11 +222,207 @@ dataset = Dataset.from_dict({
 
 ## 🏗️ PromptBuilderBlock
 
-Constructs prompts from templates and data with validation and formatting support.
+Constructs prompts from templates and data with validation and formatting support. Uses Jinja2 templating to dynamically render messages from dataset columns into structured chat format or plain text.
 
 ### Basic Template Usage
 
-
+Create a YAML configuration file defining your prompt template:
+
+```yaml
+# qa_prompt.yaml
+- role: system
+  content: "You are an expert {{domain}} assistant with deep knowledge in the field."
+
+- role: user
+  content: |
+    Please answer the following question based on the context provided.
+
+    Context: {{context}}
+    Question: {{question}}
+
+    Provide a clear and accurate answer.
+```
+
+Use the template with PromptBuilderBlock:
+
+```python
+from sdg_hub.core.blocks import PromptBuilderBlock
+import pandas as pd
+
+# Create the prompt builder block
+prompt_builder = PromptBuilderBlock(
+    block_name="qa_prompter",
+    input_cols=["domain", "context", "question"],
+    output_cols="messages",
+    prompt_config_path="qa_prompt.yaml",
+    format_as_messages=True  # Output as chat messages (default)
+)
+
+# Create dataset with your data
+dataset = pd.DataFrame([
+    {
+        "domain": "physics",
+        "context": "Newton's laws describe the relationship between forces and motion.",
+        "question": "What is Newton's first law?"
+    },
+    {
+        "domain": "biology",
+        "context": "DNA contains the genetic instructions for living organisms.",
+        "question": "What is the role of DNA?"
+    }
+])
+
+# Generate formatted prompts
+result = prompt_builder.generate(dataset)
+
+# Result contains messages in OpenAI chat format
+print(result["messages"][0])
+# [
+#   {"role": "system", "content": "You are an expert physics assistant..."},
+#   {"role": "user", "content": "Please answer the following question..."}
+# ]
+```
|
|
285
|
+
|
|
286
|
+
### Column Mapping with Dictionary
|
|
287
|
+
|
|
288
|
+
Map dataset column names to different template variable names:
|
|
289
|
+
|
|
290
|
+
```python
|
|
291
|
+
# When dataset columns don't match template variable names
|
|
292
|
+
prompt_builder = PromptBuilderBlock(
|
|
293
|
+
block_name="mapped_prompter",
|
|
294
|
+
input_cols={
|
|
295
|
+
"article_text": "context", # Maps article_text column to {{context}}
|
|
296
|
+
"user_query": "question", # Maps user_query column to {{question}}
|
|
297
|
+
"subject": "domain" # Maps subject column to {{domain}}
|
|
298
|
+
},
|
|
299
|
+
output_cols="messages",
|
|
300
|
+
prompt_config_path="qa_prompt.yaml"
|
|
301
|
+
)
|
|
302
|
+
|
|
303
|
+
dataset = pd.DataFrame([{
|
|
304
|
+
"article_text": "Einstein's theory of relativity...",
|
|
305
|
+
"user_query": "What is time dilation?",
|
|
306
|
+
"subject": "physics"
|
|
307
|
+
}])
|
|
308
|
+
|
|
309
|
+
result = prompt_builder.generate(dataset)
|
|
310
|
+
```
|
|
311
|
+
|
|
312
|
+
### Plain Text Format
|
|
313
|
+
|
|
314
|
+
Generate formatted text instead of structured messages:
|
|
315
|
+
|
|
316
|
+
```python
|
|
317
|
+
# evaluation_prompt.yaml
|
|
318
|
+
# - role: system
|
|
319
|
+
# content: "You are an evaluator assessing response quality."
|
|
320
|
+
# - role: user
|
|
321
|
+
# content: |
|
|
322
|
+
# Document: {{document}}
|
|
323
|
+
# Response: {{response}}
|
|
324
|
+
#
|
|
325
|
+
# Is the response faithful to the document? Answer YES or NO.
|
|
326
|
+
|
|
327
|
+
prompt_builder = PromptBuilderBlock(
|
|
328
|
+
block_name="eval_prompter",
|
|
329
|
+
input_cols=["document", "response"],
|
|
330
|
+
output_cols="formatted_prompt",
|
|
331
|
+
prompt_config_path="evaluation_prompt.yaml",
|
|
332
|
+
format_as_messages=False # Output as plain text
|
|
333
|
+
)
|
|
334
|
+
|
|
335
|
+
dataset = pd.DataFrame([{
|
|
336
|
+
"document": "The capital of France is Paris.",
|
|
337
|
+
"response": "Paris is the capital of France."
|
|
338
|
+
}])
|
|
339
|
+
|
|
340
|
+
result = prompt_builder.generate(dataset)
|
|
341
|
+
|
|
342
|
+
print(result["formatted_prompt"][0])
|
|
343
|
+
# system: You are an evaluator assessing response quality.
|
|
344
|
+
#
|
|
345
|
+
# user: Document: The capital of France is Paris.
|
|
346
|
+
# Response: Paris is the capital of France.
|
|
347
|
+
#
|
|
348
|
+
# Is the response faithful to the document? Answer YES or NO.
|
|
349
|
+
```
|
|
350
|
+
|
|
351
|
+
### Practical Example: Question Generation Pipeline
|
|
352
|
+
|
|
353
|
+
Complete example showing PromptBuilderBlock with LLMChatBlock:
|
|
354
|
+
|
|
355
|
+
```python
|
|
356
|
+
from sdg_hub.core.blocks import PromptBuilderBlock, LLMChatBlock
|
|
357
|
+
import pandas as pd
|
|
358
|
+
|
|
359
|
+
# Step 1: Create template for question generation
|
|
360
|
+
# question_gen_prompt.yaml:
|
|
361
|
+
# - role: system
|
|
362
|
+
# content: "You are a question generation assistant."
|
|
363
|
+
# - role: user
|
|
364
|
+
# content: |
|
|
365
|
+
# Generate 3 questions based on this text:
|
|
366
|
+
# {{text}}
|
|
367
|
+
#
|
|
368
|
+
# Format: Return questions separated by newlines.
|
|
369
|
+
|
|
370
|
+
# Step 2: Configure prompt builder
|
|
371
|
+
prompt_builder = PromptBuilderBlock(
|
|
372
|
+
block_name="question_prompter",
|
|
373
|
+
input_cols="text",
|
|
374
|
+
output_cols="messages",
|
|
375
|
+
prompt_config_path="question_gen_prompt.yaml"
|
|
376
|
+
)
|
|
377
|
+
|
|
378
|
+
# Step 3: Configure LLM chat block
|
|
379
|
+
chat_block = LLMChatBlock(
|
|
380
|
+
block_name="question_generator",
|
|
381
|
+
model="openai/gpt-4o",
|
|
382
|
+
api_key="your-api-key",
|
|
383
|
+
input_cols="messages",
|
|
384
|
+
output_cols="llm_response",
|
|
385
|
+
temperature=0.7
|
|
386
|
+
)
|
|
387
|
+
|
|
388
|
+
# Step 4: Process dataset
|
|
389
|
+
dataset = pd.DataFrame([{
|
|
390
|
+
"text": "Machine learning is a subset of AI that enables systems to learn from data."
|
|
391
|
+
}])
|
|
392
|
+
|
|
393
|
+
# Execute pipeline
|
|
394
|
+
result = prompt_builder.generate(dataset)
|
|
395
|
+
result = chat_block.generate(result)
|
|
396
|
+
|
|
397
|
+
print(result["llm_response"][0])
|
|
398
|
+
# Generated questions based on the text
|
|
399
|
+
```
|
|
400
|
+
|
|
401
|
+
### Configuration Reference
|
|
402
|
+
|
|
403
|
+
**Required Parameters:**
|
|
404
|
+
- `block_name` - Unique identifier for the block
|
|
405
|
+
- `input_cols` - Column specification (str, list, or dict for mapping)
|
|
406
|
+
- `output_cols` - Single output column name (must be exactly one)
|
|
407
|
+
- `prompt_config_path` - Path to YAML template file
|
|
408
|
+
|
|
409
|
+
**Optional Parameters:**
|
|
410
|
+
- `format_as_messages` - Output format (default: `True`)
|
|
411
|
+
- `True`: List of dicts with 'role' and 'content' keys
|
|
412
|
+
- `False`: Concatenated string with role prefixes
|
|
413
|
+
|
|
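The two `format_as_messages` output shapes can be compared with a short sketch. Judging from the plain-text example output earlier in this section, `format_as_messages=False` appears to emit each message as `role: content` with a blank line between messages. The helper `flatten_messages` below is hypothetical, and the exact separator is an assumption inferred from that example output, not taken from the library source.

```python
def flatten_messages(messages):
    """Hypothetical helper: join chat messages into one role-prefixed string.

    Assumption: messages are separated by a blank line, as the plain-text
    example output in this section suggests.
    """
    return "\n\n".join(f"{m['role']}: {m['content']}" for m in messages)


# Structured form (what format_as_messages=True is documented to produce)
messages = [
    {"role": "system", "content": "You are an evaluator assessing response quality."},
    {"role": "user", "content": "Document: ...\nResponse: ..."},
]

# Flattened form (a guess at what format_as_messages=False produces)
text = flatten_messages(messages)
print(text)
```

The structured form is the one to pass on to a chat block; the flattened form is mainly useful when a downstream step expects a single string column.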
414
|
+
**Template Requirements:**
|
|
415
|
+
- Must be a YAML list of message objects
|
|
416
|
+
- Each message requires 'role' and 'content' fields
|
|
417
|
+
- Valid roles: 'system', 'user', 'assistant', 'tool'
|
|
418
|
+
- Must contain at least one 'user' message
|
|
419
|
+
- Final message must have role='user' for chat completion
|
|
420
|
+
- Content supports Jinja2 templating syntax
|
|
421
|
+
|
|
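The template requirements above can be expressed as a standalone check. This is a minimal sketch, not sdg_hub's actual validation code: `validate_prompt_template` and its messages are hypothetical, and it works on an already-parsed list of message dicts, as a YAML loader would return for the template files shown earlier.

```python
VALID_ROLES = ("system", "user", "assistant", "tool")


def validate_prompt_template(messages):
    """Hypothetical helper: return a list of problems (empty if the template passes)."""
    if not isinstance(messages, list) or not messages:
        return ["template must be a non-empty list of message objects"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not a mapping")
            continue
        # Each message requires both 'role' and 'content'
        for field in ("role", "content"):
            if field not in msg:
                problems.append(f"message {i} is missing '{field}'")
        role = msg.get("role")
        if role is not None and role not in VALID_ROLES:
            problems.append(f"message {i} has invalid role {role!r}")

    # At least one user message, and the final message must be a user message
    roles = [m.get("role") for m in messages if isinstance(m, dict)]
    if "user" not in roles:
        problems.append("template needs at least one 'user' message")
    last = messages[-1]
    if not (isinstance(last, dict) and last.get("role") == "user"):
        problems.append("final message must have role 'user'")
    return problems


template = [
    {"role": "system", "content": "You are a question generation assistant."},
    {"role": "user", "content": "Generate 3 questions based on this text:\n{{text}}"},
]
print(validate_prompt_template(template))
```

Returning a list of problems rather than raising on the first one is a design choice for this sketch; the library may well raise an exception instead.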
422
|
+
**Template Variable Resolution:**
|
|
423
|
+
- Variables in `{{...}}` are replaced with dataset column values
|
|
424
|
+
- Use `input_cols` dict to map column names to template variables
|
|
425
|
+
- Missing variables are logged as warnings
|
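The resolution rules above can be illustrated without the library. The sketch below is a stand-in, not the block's implementation: it uses a regular expression in place of Jinja2 and so handles only plain `{{name}}` placeholders, and the helper `render_row` is hypothetical. Rendering an unresolved variable as empty text is also an assumption (it matches Jinja2's default behavior for undefined names, but the block may behave differently).

```python
import re

# Matches simple {{ name }} placeholders only; real Jinja2 supports far more.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_row(template, row, input_cols):
    """Hypothetical helper: fill a template from one dataset row.

    input_cols maps dataset column names to template variable names,
    mirroring the dict form of input_cols described above.
    """
    variables = {var: row[col] for col, var in input_cols.items() if col in row}
    missing = []

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            missing.append(name)  # a real block would log a warning here
            return ""  # assumption: unresolved variables render as empty text
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template), missing


row = {
    "article_text": "Einstein's theory of relativity...",
    "user_query": "What is time dilation?",
}
mapping = {"article_text": "context", "user_query": "question", "subject": "domain"}

rendered, missing = render_row(
    "Context: {{context}}\nQuestion: {{question}}", row, mapping
)
print(rendered)
```

Here the `subject` column is absent from the row, which is harmless because the template never references `{{domain}}`; a template that did reference it would report `domain` in `missing`.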
|
230
426
|
|
|
231
427
|
## 🔍 TextParserBlock
|
|
232
428
|
|
|
@@ -141,7 +141,7 @@ result = block.generate(dataset) # ❌ Error!
|
|
|
141
141
|
|
|
142
142
|
Ready to dive deeper? Explore specific block categories:
|
|
143
143
|
|
|
144
|
-
- **[LLM Blocks](llm-blocks.md)** - AI-powered language model operations
|
|
145
|
-
- **[Transform Blocks](transform-blocks.md)** - Data manipulation and reshaping
|
|
146
|
-
- **[Filtering Blocks](filtering-blocks.md)** - Quality control and validation
|
|
147
|
-
- **[Custom Blocks](custom-blocks.md)** - Build your own processing blocks
|
|
144
|
+
- **[LLM Blocks](blocks/llm-blocks.md)** - AI-powered language model operations
|
|
145
|
+
- **[Transform Blocks](blocks/transform-blocks.md)** - Data manipulation and reshaping
|
|
146
|
+
- **[Filtering Blocks](blocks/filtering-blocks.md)** - Quality control and validation
|
|
147
|
+
- **[Custom Blocks](blocks/custom-blocks.md)** - Build your own processing blocks
|
|
@@ -26,16 +26,12 @@ SDG Hub organizes blocks into logical categories:
|
|
|
26
26
|
| **Filtering** | Quality control | Value-based filtering, threshold checks |
|
|
27
27
|
| **Evaluation** | Quality assessment | Faithfulness scoring, relevancy evaluation |
|
|
28
28
|
|
|
29
|
-
|
|
30
|
-
#TODO: Add block example
|
|
29
|
+
For detailed block examples and usage patterns, see [Block System Overview](blocks/overview.md).
|
|
31
30
|
|
|
32
31
|
## 🌊 Flows: Orchestrating Pipelines
|
|
33
32
|
|
|
34
33
|
**Flows** are YAML-defined pipelines that orchestrate multiple blocks into complete data processing workflows.
|
|
35
34
|
|
|
36
|
-
### Flow Structure
|
|
37
|
-
#TODO: Add flow structure
|
|
38
|
-
|
|
39
35
|
### Flow Execution Model
|
|
40
36
|
|
|
41
37
|
Flows execute blocks sequentially:
|
|
@@ -58,6 +54,8 @@ Each block:
|
|
|
58
54
|
- **🛡️ Validation** - Built-in checks for configuration and data compatibility
|
|
59
55
|
- **📊 Monitoring** - Execution tracking and performance metrics
|
|
60
56
|
|
|
57
|
+
For detailed flow structure and YAML configuration examples, see [Flow System Overview](flows/overview.md).
|
|
58
|
+
|
|
61
59
|
## 🔍 Auto-Discovery System
|
|
62
60
|
|
|
63
61
|
SDG Hub automatically discovers and registers components with zero configuration.
|
|
@@ -187,6 +185,6 @@ print(f"Output columns: {result['final_dataset']['columns']}")
|
|
|
187
185
|
Now that you understand the core concepts:
|
|
188
186
|
|
|
189
187
|
1. **[Explore Block Types](blocks/overview.md)** - Learn about specific block categories
|
|
190
|
-
2. **[
|
|
188
|
+
2. **[Understand Flow System](flows/overview.md)** - Chain blocks into complete flows
|
|
191
189
|
3. **[Build Custom Components](blocks/custom-blocks.md)** - Create your own blocks
|
|
192
|
-
4. **[Advanced Patterns](flows/custom-flows.md)** - Build sophisticated
|
|
190
|
+
4. **[Advanced Patterns](flows/custom-flows.md)** - Build sophisticated flows
|