PyPI - sdg-hub - Versions diffs - 0.6.0__tar.gz → 0.7.0__tar.gz - Mend

sdg-hub 0.6.0__tar.gz → 0.7.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (215)

{sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/actionlint.dockerfile RENAMED

@@ -1,3 +1,3 @@
 # Since dependabot cannot update workflows using docker,
 # we use this indirection since dependabot can update this file.
-FROM rhysd/actionlint:1.7.7@sha256:887a259a5a534f3c4f36cb02dca341673c6089431057242cdc931e9f133147e9
+FROM rhysd/actionlint:1.7.9@sha256:a0383f60d92601e2694e24b24d37df7b6a40bed7cedbc447611c50009bf02d94

{sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/docs.yml RENAMED

@@ -39,6 +39,6 @@ jobs:
       - name: "Checkout"
         uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4.1.6
       - name: "Check Markdown documents"
-        uses: DavidAnson/markdownlint-cli2-action@992badcdf24e3b8eb7e87ff9287fe931bcb00c6e # v20.0.0
+        uses: DavidAnson/markdownlint-cli2-action@30a0e04f1870d58f8d717450cc6134995f993c63 # v21.0.0
         with:
           globs: '**/*.md'

{sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/integration-test.yml RENAMED

@@ -139,7 +139,7 @@ jobs:
           flags: integration
       - name: Upload integration test artifacts
-        uses: actions/upload-artifact@v4
+        uses: actions/upload-artifact@v5
         if: always()
         with:
           name: integration-test-results-${{ matrix.python }}-${{ matrix.platform }}

{sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/packer.yml RENAMED

@@ -15,7 +15,7 @@ jobs:
         uses: actions/checkout@v4
       - name: Configure AWS Credentials
-        uses: aws-actions/configure-aws-credentials@ff717079ee2060e4bcee96c4779b553acc87447c
+        uses: aws-actions/configure-aws-credentials@61815dcd50bd041e203e49132bacad1fd04d2708
         with:
           role-to-assume: arn:aws:iam::851725220677:role/github-actions-packer-role
           aws-region: us-east-2

{sdg_hub-0.6.0 → sdg_hub-0.7.0}/.github/workflows/pypi.yaml RENAMED

@@ -49,7 +49,7 @@ jobs:
                   fetch-depth: 0
             - name: "Build and Inspect"
-              uses: hynek/build-and-inspect-python-package@c52c3a4710070b50470d903818a7b25115dcd076 # v2.13.0
+              uses: hynek/build-and-inspect-python-package@efb823f52190ad02594531168b7a2d5790e66516 # v2.14.0
     # push to Test PyPI on
     # - a new GitHub release is published
@@ -72,7 +72,7 @@ jobs:
                   egress-policy: audit # TODO: change to 'egress-policy: block' after couple of runs
             - name: "Download build artifacts"
-              uses: actions/download-artifact@634f93cb2916e3fdff6788551b99b062d0335ce0 # v5.0.0
+              uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
               with:
                   name: Packages
                   path: dist
@@ -104,13 +104,13 @@ jobs:
                   egress-policy: audit # TODO: change to 'egress-policy: block' after couple of runs
             - name: "Download build artifacts"
-              uses: actions/download-artifact@634f93cb2916e3fdff6788551b99b062d0335ce0 # v5.0.0
+              uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
               with:
                   name: Packages
                   path: dist
             - name: "Sigstore sign package"
-              uses: sigstore/gh-action-sigstore-python@f7ad0af51a5648d09a20d00370f0a91c3bdf8f84 # v3.0.1
+              uses: sigstore/gh-action-sigstore-python@f832326173235dcb00dd5d92cd3f353de3188e6c # v3.1.0
               with:
                   inputs: |
                       ./dist/*.tar.gz

{sdg_hub-0.6.0 → sdg_hub-0.7.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: sdg_hub
-Version: 0.6.0
+Version: 0.7.0
 Summary: Synthetic Data Generation
 Author-email: Red Hat AI Innovation <abhandwa@redhat.com>
 License: Apache-2.0
@@ -70,7 +70,9 @@ Dynamic: license-file
 [![Tests](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/actions/workflows/test.yml/badge.svg)](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/actions/workflows/test.yml)
 [![codecov](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/sdg_hub/graph/badge.svg?token=SP75BCXWO2)](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/sdg_hub)
+<p align="center">
+  <img src="docs/assets/sdg-hub-cover.png" alt="SDG Hub Cover" width="400">
+</p>
 A modular Python framework for building synthetic data generation pipelines using composable blocks and flows. Transform datasets through **building-block composition** - mix and match LLM-powered and traditional processing blocks to create sophisticated data generation workflows.

{sdg_hub-0.6.0 → sdg_hub-0.7.0}/README.md RENAMED

@@ -6,7 +6,9 @@
 [![Tests](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/actions/workflows/test.yml/badge.svg)](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/actions/workflows/test.yml)
 [![codecov](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/sdg_hub/graph/badge.svg?token=SP75BCXWO2)](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/sdg_hub)
+<p align="center">
+  <img src="docs/assets/sdg-hub-cover.png" alt="SDG Hub Cover" width="400">
+</p>
 A modular Python framework for building synthetic data generation pipelines using composable blocks and flows. Transform datasets through **building-block composition** - mix and match LLM-powered and traditional processing blocks to create sophisticated data generation workflows.

{sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/README.md RENAMED

@@ -6,7 +6,7 @@
 [![Tests](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/actions/workflows/test.yml/badge.svg)](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub/actions/workflows/test.yml)
 [![codecov](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/sdg_hub/graph/badge.svg?token=SP75BCXWO2)](https://codecov.io/gh/Red-Hat-AI-Innovation-Team/sdg_hub)
-A modular Python framework for building synthetic data generation pipelines using composable blocks and flows. Transform datasets through **building-block composition** - mix and match LLM-powered and traditional processing blocks to create sophisticated data generation workflows.
+A modular Python framework for building synthetic data generation pipelines using composable blocks and flows
 ## 🧱 Core Philosophy
@@ -52,11 +52,11 @@ Learn about the modular block architecture that powers SDG Hub:
 - **[Custom Blocks](blocks/custom-blocks.md)** - Building your own processing blocks
 ### Flow System
-Master the orchestration system for building complete pipelines:
-- **[Flow Overview](flows/overview.md)** - Understanding flow orchestration
-- **[YAML Configuration](flows/yaml-configuration.md)** - Structure and parameters
+Master the orchestration system for building complete flows:
+- **[Flow Overview](flows/overview.md)** - Understanding flow orchestration and YAML structure
 - **[Flow Discovery](flows/discovery.md)** - Registry and auto-discovery system
-- **[Custom Flows](flows/custom-flows.md)** - Building custom pipeline flows
+- **[Custom Flows](flows/custom-flows.md)** - Building custom flows
+- **[Available Flows](flows/available-flows.md)** - Pre-built flows in the ecosystem
 ### Advanced Topics
 - **[API Reference](api-reference.md)** - Complete API documentation

sdg_hub-0.7.0/docs/_coverpage.md ADDED

@@ -0,0 +1,13 @@
+<!-- ![logo](https://your-logo-url.png)
+# SDG Hub
+> A modular Python framework for building synthetic data generation pipelines using composable blocks and flows.
+- Mix and match LLM-powered and traditional processing blocks like Lego pieces.
+- High-performance async execution with built-in error handling and retry logic.
+- Type-safe configurations with Pydantic validation throughout.
+- Zero-config auto-discovery of blocks and flows.
+[GitHub](https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub)
+[Get Started](quick-start.md) -->

{sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/_sidebar.md RENAMED

@@ -13,8 +13,8 @@
 * **Flow System**
   * [Overview](flows/overview.md)
-  * [YAML Configuration](flows/yaml-configuration.md)
   * [Flow Discovery](flows/discovery.md)
+  * [Available Flows](flows/available-flows.md)
   * [Custom Flows](flows/custom-flows.md)
 * **Advanced**

sdg_hub-0.7.0/docs/assets/logo.png ADDED

Binary file

sdg_hub-0.7.0/docs/assets/sdg-hub-cover.png ADDED

Binary file

{sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/blocks/custom-blocks.md RENAMED

@@ -9,10 +9,15 @@ Learn how to create your own custom blocks to extend SDG Hub's functionality. Cu
 All custom blocks must inherit from `BaseBlock` and implement the `generate()` method:
 ```python
+# Standard library imports
+from typing import Any
+# Third-party imports
+import pandas as pd
+# Local imports
 from sdg_hub.core.blocks.base import BaseBlock
 from sdg_hub.core.blocks.registry import BlockRegistry
-from datasets import Dataset
-from typing import Any
 @BlockRegistry.register(
     "MyCustomBlock",           # Block name for discovery
@@ -22,9 +27,41 @@ from typing import Any
 class MyCustomBlock(BaseBlock):
     """Custom block that performs specific processing."""
-    def generate(self, samples: Dataset, **kwargs: Any) -> Dataset:
-        """Implement your custom processing logic here."""
-        #TODO: Add Custom block boilerplate code here
+    def generate(self, samples: pd.DataFrame, **kwargs: Any) -> pd.DataFrame:
+        """Implement your custom processing logic here.
+        Parameters
+        ----------
+        samples : pd.DataFrame
+            Input dataset to process.
+        **kwargs : Any
+            Additional runtime parameters.
+        Returns
+        -------
+        pd.DataFrame
+            Processed dataset with new columns added.
+        """
+        # Validate required columns exist (optional - BaseBlock already does this)
+        for col in self.input_cols:
+            if col not in samples.columns:
+                raise ValueError(f"Required column '{col}' not found in dataset")
+        # Create a copy to avoid modifying the input
+        result = samples.copy()
+        # Process each row (example: transform input column to output column)
+        processed_data = []
+        for idx, row in result.iterrows():
+            # Your custom processing logic here
+            input_value = row[self.input_cols[0]]
+            processed_value = f"Processed: {input_value}"
+            processed_data.append(processed_value)
+        # Add the processed data as a new column
+        result[self.output_cols[0]] = processed_data
+        return result
 ```
 ### Block Configuration

{sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/blocks/llm-blocks.md RENAMED

@@ -222,11 +222,207 @@ dataset = Dataset.from_dict({
 ## 🏗️ PromptBuilderBlock
-Constructs prompts from templates and data with validation and formatting support.
+Constructs prompts from templates and data with validation and formatting support. Uses Jinja2 templating to dynamically render messages from dataset columns into structured chat format or plain text.
 ### Basic Template Usage
-#TODO: Add prompt builder block example
+Create a YAML configuration file defining your prompt template:
+```yaml
+# qa_prompt.yaml
+- role: system
+  content: "You are an expert {{domain}} assistant with deep knowledge in the field."
+- role: user
+  content: |
+    Please answer the following question based on the context provided.
+    Context: {{context}}
+    Question: {{question}}
+    Provide a clear and accurate answer.
+```
+Use the template with PromptBuilderBlock:
+```python
+from sdg_hub.core.blocks import PromptBuilderBlock
+import pandas as pd
+# Create the prompt builder block
+prompt_builder = PromptBuilderBlock(
+    block_name="qa_prompter",
+    input_cols=["domain", "context", "question"],
+    output_cols="messages",
+    prompt_config_path="qa_prompt.yaml",
+    format_as_messages=True  # Output as chat messages (default)
+)
+# Create dataset with your data
+dataset = pd.DataFrame([
+    {
+        "domain": "physics",
+        "context": "Newton's laws describe the relationship between forces and motion.",
+        "question": "What is Newton's first law?"
+    },
+    {
+        "domain": "biology",
+        "context": "DNA contains the genetic instructions for living organisms.",
+        "question": "What is the role of DNA?"
+    }
+])
+# Generate formatted prompts
+result = prompt_builder.generate(dataset)
+# Result contains messages in OpenAI chat format
+print(result["messages"][0])
+# [
+#   {"role": "system", "content": "You are an expert physics assistant..."},
+#   {"role": "user", "content": "Please answer the following question..."}
+# ]
+```
+### Column Mapping with Dictionary
+Map dataset column names to different template variable names:
+```python
+# When dataset columns don't match template variable names
+prompt_builder = PromptBuilderBlock(
+    block_name="mapped_prompter",
+    input_cols={
+        "article_text": "context",      # Maps article_text column to {{context}}
+        "user_query": "question",        # Maps user_query column to {{question}}
+        "subject": "domain"              # Maps subject column to {{domain}}
+    },
+    output_cols="messages",
+    prompt_config_path="qa_prompt.yaml"
+)
+dataset = pd.DataFrame([{
+    "article_text": "Einstein's theory of relativity...",
+    "user_query": "What is time dilation?",
+    "subject": "physics"
+}])
+result = prompt_builder.generate(dataset)
+```
+### Plain Text Format
+Generate formatted text instead of structured messages:
+```python
+# evaluation_prompt.yaml
+# - role: system
+#   content: "You are an evaluator assessing response quality."
+# - role: user
+#   content: |
+#     Document: {{document}}
+#     Response: {{response}}
+#
+#     Is the response faithful to the document? Answer YES or NO.
+prompt_builder = PromptBuilderBlock(
+    block_name="eval_prompter",
+    input_cols=["document", "response"],
+    output_cols="formatted_prompt",
+    prompt_config_path="evaluation_prompt.yaml",
+    format_as_messages=False  # Output as plain text
+)
+dataset = pd.DataFrame([{
+    "document": "The capital of France is Paris.",
+    "response": "Paris is the capital of France."
+}])
+result = prompt_builder.generate(dataset)
+print(result["formatted_prompt"][0])
+# system: You are an evaluator assessing response quality.
+#
+# user: Document: The capital of France is Paris.
+# Response: Paris is the capital of France.
+#
+# Is the response faithful to the document? Answer YES or NO.
+```
+### Practical Example: Question Generation Pipeline
+Complete example showing PromptBuilderBlock with LLMChatBlock:
+```python
+from sdg_hub.core.blocks import PromptBuilderBlock, LLMChatBlock
+import pandas as pd
+# Step 1: Create template for question generation
+# question_gen_prompt.yaml:
+# - role: system
+#   content: "You are a question generation assistant."
+# - role: user
+#   content: |
+#     Generate 3 questions based on this text:
+#     {{text}}
+#
+#     Format: Return questions separated by newlines.
+# Step 2: Configure prompt builder
+prompt_builder = PromptBuilderBlock(
+    block_name="question_prompter",
+    input_cols="text",
+    output_cols="messages",
+    prompt_config_path="question_gen_prompt.yaml"
+)
+# Step 3: Configure LLM chat block
+chat_block = LLMChatBlock(
+    block_name="question_generator",
+    model="openai/gpt-4o",
+    api_key="your-api-key",
+    input_cols="messages",
+    output_cols="llm_response",
+    temperature=0.7
+)
+# Step 4: Process dataset
+dataset = pd.DataFrame([{
+    "text": "Machine learning is a subset of AI that enables systems to learn from data."
+}])
+# Execute pipeline
+result = prompt_builder.generate(dataset)
+result = chat_block.generate(result)
+print(result["llm_response"][0])
+# Generated questions based on the text
+```
+### Configuration Reference
+**Required Parameters:**
+- `block_name` - Unique identifier for the block
+- `input_cols` - Column specification (str, list, or dict for mapping)
+- `output_cols` - Single output column name (must be exactly one)
+- `prompt_config_path` - Path to YAML template file
+**Optional Parameters:**
+- `format_as_messages` - Output format (default: `True`)
+  - `True`: List of dicts with 'role' and 'content' keys
+  - `False`: Concatenated string with role prefixes
+**Template Requirements:**
+- Must be a YAML list of message objects
+- Each message requires 'role' and 'content' fields
+- Valid roles: 'system', 'user', 'assistant', 'tool'
+- Must contain at least one 'user' message
+- Final message must have role='user' for chat completion
+- Content supports Jinja2 templating syntax
+**Template Variable Resolution:**
+- Variables in `{{...}}` are replaced with dataset column values
+- Use `input_cols` dict to map column names to template variables
+- Missing variables are logged as warnings
 ## 🔍 TextParserBlock
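
Note on the "Template Requirements" list added to docs/blocks/llm-blocks.md in the hunk above: the rules it states (a list of messages, each with 'role' and 'content', roles limited to four values, at least one 'user' message, final message from 'user') can be checked with a few lines of plain Python. The sketch below is only an illustration of those documented rules. `validate_prompt_template` and `VALID_ROLES` are hypothetical names, not part of sdg_hub, and the template is assumed to be already loaded from YAML into Python lists and dicts.

```python
from typing import Any

# Roles listed as valid in the documented template requirements.
VALID_ROLES = {"system", "user", "assistant", "tool"}


def validate_prompt_template(messages: Any) -> list[str]:
    """Hypothetical helper: return a list of problems (empty if none)."""
    if not isinstance(messages, list) or not messages:
        return ["template must be a non-empty list of message objects"]

    problems: list[str] = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not a mapping")
            continue
        # Each message requires both 'role' and 'content'.
        for field in ("role", "content"):
            if field not in msg:
                problems.append(f"message {i}: missing '{field}'")
        role = msg.get("role")
        if role is not None and role not in VALID_ROLES:
            problems.append(f"message {i}: invalid role {role!r}")

    roles = [m.get("role") for m in messages if isinstance(m, dict)]
    if "user" not in roles:
        problems.append("template needs at least one 'user' message")

    last = messages[-1]
    if not isinstance(last, dict) or last.get("role") != "user":
        problems.append("final message must have role 'user'")
    return problems


# A template shaped like the qa_prompt.yaml example in the diff.
template = [
    {"role": "system", "content": "You are an expert {{domain}} assistant."},
    {"role": "user", "content": "Context: {{context}}\nQuestion: {{question}}"},
]
print(validate_prompt_template(template))
```

Returning a list of problems rather than raising on the first one is a design choice of this sketch; how the library itself reports violations is not shown in the diff.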

{sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/blocks/overview.md RENAMED

@@ -141,7 +141,7 @@ result = block.generate(dataset)  # ❌ Error!
 Ready to dive deeper? Explore specific block categories:
-- **[LLM Blocks](llm-blocks.md)** - AI-powered language model operations
-- **[Transform Blocks](transform-blocks.md)** - Data manipulation and reshaping
-- **[Filtering Blocks](filtering-blocks.md)** - Quality control and validation
-- **[Custom Blocks](custom-blocks.md)** - Build your own processing blocks
+- **[LLM Blocks](blocks/llm-blocks.md)** - AI-powered language model operations
+- **[Transform Blocks](blocks/transform-blocks.md)** - Data manipulation and reshaping
+- **[Filtering Blocks](blocks/filtering-blocks.md)** - Quality control and validation
+- **[Custom Blocks](blocks/custom-blocks.md)** - Build your own processing blocks

{sdg_hub-0.6.0 → sdg_hub-0.7.0}/docs/concepts.md RENAMED

@@ -26,16 +26,12 @@ SDG Hub organizes blocks into logical categories:
 | **Filtering** | Quality control | Value-based filtering, threshold checks |
 | **Evaluation** | Quality assessment | Faithfulness scoring, relevancy evaluation |
-### Block Example
-#TODO: Add block example
+For detailed block examples and usage patterns, see [Block System Overview](blocks/overview.md).
 ## 🌊 Flows: Orchestrating Pipelines
 **Flows** are YAML-defined pipelines that orchestrate multiple blocks into complete data processing workflows.
-### Flow Structure
-#TODO: Add flow structure
 ### Flow Execution Model
 Flows execute blocks sequentially:
@@ -58,6 +54,8 @@ Each block:
 - **🛡️ Validation** - Built-in checks for configuration and data compatibility
 - **📊 Monitoring** - Execution tracking and performance metrics
+For detailed flow structure and YAML configuration examples, see [Flow System Overview](flows/overview.md).
 ## 🔍 Auto-Discovery System
 SDG Hub automatically discovers and registers components with zero configuration.
@@ -187,6 +185,6 @@ print(f"Output columns: {result['final_dataset']['columns']}")
 Now that you understand the core concepts:
 1. **[Explore Block Types](blocks/overview.md)** - Learn about specific block categories
-2. **[Master Flow Configuration](flows/yaml-configuration.md)** - Deep dive into YAML structure
+2. **[Understand Flow System](flows/overview.md)** - Chain blocks into complete flows
 3. **[Build Custom Components](blocks/custom-blocks.md)** - Create your own blocks
-4. **[Advanced Patterns](flows/custom-flows.md)** - Build sophisticated pipelines
+4. **[Advanced Patterns](flows/custom-flows.md)** - Build sophisticated flows
