PyPI - sdg-hub - Versions diffs - 0.1.4__tar.gz → 0.2.0__tar.gz - Mend

sdg-hub 0.1.4__tar.gz → 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (364)

{sdg_hub-0.1.4 → sdg_hub-0.2.0}/.github/workflows/lint.yml RENAMED

@@ -5,7 +5,7 @@ name: Lint, Format, and MyPy
 on:
   push:
     branches:
-      - "main-disabled"
+      - "main"
     paths:
       - '**.py'
       - 'pyproject.toml'
@@ -15,7 +15,7 @@ on:
       - '.github/**'
   pull_request:
     branches:
-      - "main-disabled"
+      - "main"
     paths:
       - '**.py'
       - 'pyproject.toml'
@@ -57,13 +57,15 @@ jobs:
         run: |
           tox -e ruff -- check
-      - name: Run linting
-        if: ${{ !cancelled() && (steps.deps.outcome == 'success') }}
-        run: |
-          echo "::add-matcher::.github/workflows/matchers/pylint.json"
-          tox -e lint
+      # Pylint disabled for now - may re-enable as non-blocking check in future
+      # - name: Run linting
+      #   if: ${{ !cancelled() && (steps.deps.outcome == 'success') }}
+      #   run: |
+      #     echo "::add-matcher::.github/workflows/matchers/pylint.json"
+      #     tox -e lint
-      - name: Run mypy type checks
-        if: ${{ !cancelled() && (steps.deps.outcome == 'success') }}
-        run: |
-          tox -e mypy
+      # MyPy type checking disabled for now - may re-enable as non-blocking check in future
+      # - name: Run mypy type checks
+      #   if: ${{ !cancelled() && (steps.deps.outcome == 'success') }}
+      #   run: |
+      #     tox -e mypy

{sdg_hub-0.1.4 → sdg_hub-0.2.0}/.github/workflows/pypi.yaml RENAMED

@@ -72,7 +72,7 @@ jobs:
                   egress-policy: audit # TODO: change to 'egress-policy: block' after couple of runs
             - name: "Download build artifacts"
-              uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
+              uses: actions/download-artifact@634f93cb2916e3fdff6788551b99b062d0335ce0 # v5.0.0
               with:
                   name: Packages
                   path: dist
@@ -104,7 +104,7 @@ jobs:
                   egress-policy: audit # TODO: change to 'egress-policy: block' after couple of runs
             - name: "Download build artifacts"
-              uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
+              uses: actions/download-artifact@634f93cb2916e3fdff6788551b99b062d0335ce0 # v5.0.0
               with:
                   name: Packages
                   path: dist

sdg_hub-0.2.0/CLAUDE.md ADDED

@@ -0,0 +1,171 @@
+# CLAUDE.md
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+## Project Overview
+**Requirements:** Python 3.9+
+SDG Hub is a Python framework for synthetic data generation using composable blocks and flows. Transform datasets through **building-block composition** - mix and match LLM-powered and traditional processing blocks like Lego pieces to create sophisticated data generation workflows.
+**Core Concepts:**
+- **Blocks** are composable units that transform datasets - think data processing Lego pieces
+- **Flows** orchestrate multiple blocks into complete pipelines defined in YAML
+- Simple concept: `dataset → Block₁ → Block₂ → Block₃ → enriched_dataset`
+## Development Commands
+**Use `uv` for all Python commands and package management.**
+### Setup and Installation
+```bash
+# Install core dependencies
+uv pip install .
+# Install with development dependencies
+uv pip install .[dev]
+# Alternative: uv sync --extra dev
+# Install with optional vLLM support
+uv pip install .[vllm]
+# Alternative: uv sync --extra vllm
+# Install with examples dependencies
+uv pip install .[examples]
+# Alternative: uv sync --extra examples
+```
+### Testing
+```bash
+# Run all tests
+tox -e py3-unit
+# Run tests with coverage
+tox -e py3-unitcov
+# Run specific test file
+pytest tests/test_specific_file.py
+# Run tests matching pattern
+pytest -k "test_pattern"
+```
+### Linting and Formatting
+```bash
+# Run full verification (lint, mypy, ruff)
+make verify
+# Individual lint commands
+tox -e lint        # Full pylint check
+tox -e fastlint    # Fast pylint (without 3rd party)
+tox -e ruff        # Ruff formatting and fixes
+tox -e mypy        # Type checking
+# Format code with ruff (arguments after `--` are passed through tox)
+tox -e ruff -- fix
+# Check code formatting
+tox -e ruff -- check
+```
+### Other Make targets
+```bash
+make actionlint    # Lint GitHub Actions
+make md-lint       # Lint markdown files
+```
+## Core Architecture
+### Block System
+The framework is built around a modular block system with **composability at its core** - mix and match blocks to build simple transformations or complex multi-stage pipelines:
+- **BaseBlock** (`src/sdg_hub/core/blocks/base.py`): Abstract base class for all processing blocks with Pydantic validation
+- **BlockRegistry** (`src/sdg_hub/core/blocks/registry.py`): Auto-discovery system for organizing blocks with zero setup
+- Blocks are organized in categories:
+  - `llm/`: LLM-powered blocks (chat, prompt building, text parsing) with async execution
+  - `transform/`: Data transformation blocks (column operations, text manipulation)
+  - `filtering/`: Data filtering blocks with quality thresholds
+  - `evaluation/`: Quality evaluation blocks (faithfulness, relevancy assessment)
+  - `deprecated_blocks/`: Legacy blocks maintained for backward compatibility
+**Key Benefits**: Type-safe composition, automatic validation, rich logging, and high-performance async processing.
+### Flow System
+Flows orchestrate multiple blocks into data processing pipelines:
+- **Flow** (`src/sdg_hub/core/flow/base.py`): Main flow execution class with Pydantic validation
+- **FlowRegistry** (`src/sdg_hub/core/flow/registry.py`): Registry for flow discovery
+- **FlowMetadata** (`src/sdg_hub/core/flow/metadata.py`): Metadata and parameter definitions
+- **FlowValidator** (`src/sdg_hub/core/flow/validation.py`): YAML structure validation
+- **FlowMigration** (`src/sdg_hub/core/flow/migration.py`): Backward compatibility for old flow formats
+### Flow Configuration
+Flows are defined in YAML files with this structure:
+```yaml
+metadata:
+  name: "flow_name"
+  version: "1.0.0"
+  author: "Author Name"
+  description: "Flow description"
+parameters:
+  param_name:
+    type: "string"
+    default: "default_value"
+    description: "Parameter description"
+blocks:
+  - block_type: "BlockTypeName"
+    block_config:
+      block_name: "unique_block_name"
+      # block-specific configuration
+```
+### Built-in Flow Discovery
+The framework includes auto-discovery for flows in `src/sdg_hub/flows/`. Example flow structure:
+```
+flows/qa_generation/document_grounded_qa/multi_summary_qa/instructlab/
+├── flow.yaml                    # Main flow definition
+├── atomic_facts.yaml           # Sub-flow configurations
+├── detailed_summary.yaml
+└── generate_questions_responses.yaml
+```
+## Key Patterns
+### Block Development
+When creating new blocks:
+1. Inherit from `BaseBlock` and implement the `generate()` method
+2. Use Pydantic field validation for configuration
+3. Follow the standardized column handling patterns (`input_cols`, `output_cols`)
+4. Register blocks in appropriate category directories
+5. Include proper error handling and logging
+### Dataset Processing
+All blocks operate on HuggingFace `datasets.Dataset` objects:
+- Input validation ensures required columns exist
+- Output validation prevents column collisions
+- Rich logging provides processing summaries
+- Empty dataset handling with appropriate errors
+### Backward Compatibility
+The framework maintains compatibility with legacy formats:
+- Deprecated blocks are preserved in `deprecated_blocks/`
+- Flow migration automatically converts old YAML formats
+- Legacy LLMBlocks receive special handling during execution
+## Testing Guidelines
+- Tests are organized by block category under `tests/blocks/`
+- Use `pytest` fixtures for common test data
+- Test configuration files are in `tests/blocks/testdata/`
+- Follow the existing pattern of testing both success and error cases
+- Mock LLM clients when testing LLM-powered blocks
+## Important Notes
+- Always use `uv` for Python package management
+- The framework uses Pydantic extensively for validation and configuration
+- LLM clients are managed through the `client_manager.py` system
+- Path resolution is handled centrally in `utils/path_resolution.py`
+- Error handling follows custom exception patterns in `utils/error_handling.py`

sdg_hub-0.2.0/CONTRIBUTING.md ADDED

@@ -0,0 +1,251 @@
+# Contributing to SDG Hub
+Welcome to SDG Hub development! This guide covers everything you need to know about contributing blocks, flows, and other improvements to the SDG Hub ecosystem.
+For detailed documentation including examples and advanced patterns, see our comprehensive [Development Guide](docs/development.md).
+## 🚀 Quick Start
+### Development Setup
+1. **Clone the Repository**
+```bash
+git clone https://github.com/Red-Hat-AI-Innovation-Team/sdg_hub.git
+cd sdg_hub
+```
+2. **Install Development Dependencies**
+```bash
+# Using uv (recommended)
+uv sync --extra dev
+# Or using pip
+pip install .[dev]
+```
+## 🛠️ Development Tools
+### Linting and Code Quality
+**Primary linting tools** (required for all contributions):
+```bash
+tox -e lint        # Full pylint check
+tox -e fastlint    # Quick pylint check
+tox -e mypy        # Type checking
+# Ruff (code formatting and linting)
+tox -e ruff                 # Format and fix issues (development mode)
+tox -e ruff -- check        # Check only, no fixes (CI mode)
+./scripts/ruff.sh           # Direct script - format and fix
+./scripts/ruff.sh check     # Direct script - check only
+./scripts/ruff.sh --help    # Pass custom arguments to ruff
+```
+**Optional development tools** (require additional dependencies):
+```bash
+make actionlint          # Lint GitHub Actions (requires: actionlint, shellcheck)
+make md-lint            # Lint markdown files (requires: podman/docker)
+make verify             # Run extended checks: pylint, mypy, ruff (may differ from CI)
+```
+### Testing
+SDG Hub uses [tox](https://tox.wiki/) for test automation and [pytest](https://docs.pytest.org/) as a test framework:
+```bash
+# Run all tests
+tox -e py3-unit
+# Run with coverage
+tox -e py3-unitcov
+# Run specific tests
+pytest tests/test_specific_file.py
+pytest -k "test_pattern"
+```
+## 🧱 Contributing Blocks
+Blocks are the core processing units of SDG Hub. To contribute a new block:
+1. **Choose the appropriate category**: `llm`, `transform`, `filtering`, or `evaluation`
+2. **Implement your block** following the [Custom Blocks Guide](docs/blocks/custom-blocks.md)
+3. **Add comprehensive tests** in `tests/blocks/[category]/`
+4. **Update documentation** in the relevant block category page
+### Example Block Structure
+```python
+from typing import Any
+from datasets import Dataset
+from sdg_hub.core.blocks.base import BaseBlock
+from sdg_hub.core.blocks.registry import BlockRegistry
+@BlockRegistry.register("MyNewBlock", "category", "Description")
+class MyNewBlock(BaseBlock):
+    """Comprehensive docstring with examples."""
+    def generate(self, samples: Dataset, **kwargs: Any) -> Dataset:
+        # Your implementation here
+        pass
+```
+## 🌊 Contributing Flows
+Flows orchestrate multiple blocks into complete pipelines. To contribute a new flow:
+1. **Design your flow** with clear use case and objectives
+2. **Create flow directory structure** under `src/sdg_hub/flows/[category]/`
+3. **Implement the flow** with comprehensive YAML configuration
+4. **Add tests** and documentation
+### Flow Directory Structure
+```
+src/sdg_hub/flows/[category]/[use_case]/[variant]/
+├── flow.yaml              # Main flow definition
+├── prompt_template_1.yaml # Supporting templates
+└── README.md             # Flow documentation
+```
+## 📋 Contribution Checklist
+### For New Blocks
+- [ ] Block placed in correct category directory
+- [ ] Inherits from `BaseBlock` and implements `generate()`
+- [ ] Registered with `@BlockRegistry.register()`
+- [ ] Comprehensive docstring with examples
+- [ ] Proper Pydantic field validation
+- [ ] Comprehensive test suite
+- [ ] Documentation updated
+- [ ] All linting checks pass
+- [ ] All tests pass
+### For New Flows
+- [ ] Flow directory structure follows conventions
+- [ ] Complete metadata in `flow.yaml`
+- [ ] Required input columns documented
+- [ ] Supporting templates included
+- [ ] Flow-specific README created
+- [ ] Integration tests written
+- [ ] Documentation updated
+## 🔄 Development Workflow
+### Git Workflow
+**Branch Naming:**
+- `feature/block-name-implementation` - New blocks
+- `feature/flow-name-implementation` - New flows
+- `fix/issue-description` - Bug fixes
+- `docs/section-updates` - Documentation updates
+**Commit Messages:**
+Follow conventional commits:
+```
+feat(blocks): add TextSummarizerBlock for document summarization
+fix(flows): correct parameter validation in QA generation flow
+docs(blocks): update LLM block examples with new model config
+```
+**Pull Request Process:**
+1. Create feature branch from `main`
+2. Implement changes with tests and documentation
+3. Run full verification: `make verify && tox -e py3-unit`
+4. Create PR with clear description
+5. Address review feedback
+6. Squash and merge when approved
+## 🤝 Community Guidelines
+- Be respectful and inclusive
+- Provide constructive feedback
+- Help newcomers get started
+- Follow the project's coding standards
+- Report issues responsibly
+## 📚 Documentation
+For comprehensive guides and examples:
+- **[Development Guide](docs/development.md)** - Complete development documentation
+- **[Custom Blocks](docs/blocks/custom-blocks.md)** - Building custom blocks
+- **[Flow Configuration](docs/flows/yaml-configuration.md)** - YAML configuration guide
+- **[Block System Overview](docs/blocks/overview.md)** - Understanding the block architecture
+- **[Flow System Overview](docs/flows/overview.md)** - Understanding flow orchestration
+## 🚀 Getting Help
+- **GitHub Issues** - Report bugs, request features
+- **GitHub Discussions** - Ask questions, share ideas
+- **Documentation** - Check existing docs first
+- **Code Examples** - Look at existing implementations
+You can run all tests by simply running the `tox -e py3-unit` command.
+## Documentation Guidelines
+### NumPy-Style Docstrings
+If you choose to add docstrings to your functions, we recommend following the NumPy docstring format for consistency with the scientific Python ecosystem.
+#### Basic Structure
+```python
+def example_function(param1, param2=None):
+    """Brief description of the function.
+    Longer description providing more context about what the function does,
+    its purpose, and any important behavioral notes.
+    Parameters
+    ----------
+    param1 : str
+        Description of the first parameter
+    param2 : int, optional
+        Description of the second parameter (default: None)
+    Returns
+    -------
+    bool
+        Description of what the function returns
+    Raises
+    ------
+    ValueError
+        When invalid input is provided
+    Examples
+    --------
+    >>> result = example_function("hello", 42)
+    >>> print(result)
+    True
+    """
+```
+#### Key Guidelines
+- **Summary**: Start with a concise one-line description
+- **Parameters**: Document all function parameters with types and descriptions
+- **Returns**: Describe return values with types and meaning
+- **Types**: Use standard Python types (`str`, `int`, `list`, `dict`, etc.)
+- **Optional parameters**: Mark default parameters as "optional"
+- **Examples**: Include simple usage examples when helpful
+#### When to Add Docstrings
+Docstrings are **optional** but recommended for:
+- Public API functions and classes
+- Complex functions with multiple parameters
+- Functions that might be confusing to other developers
+- Core framework components
+#### When to Skip Docstrings
+You may skip docstrings for:
+- Simple utility functions with obvious behavior
+- Private/internal functions (starting with `_`)
+- Functions with self-explanatory names and simple parameters
+**Remember**: Quality over quantity. A well-written docstring is better than a verbose one, and no docstring is better than a poor one.
+Thank you for contributing to SDG Hub! 🎉
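
Reviewer sketch (not part of the released file): the contribution checklist above asks for blocks to be registered with `@BlockRegistry.register(name, category, description)`. For readers unfamiliar with decorator-based registries, the following shows the general pattern under stated assumptions. `SketchRegistry`, its `get` and `by_category` helpers, and `UppercaseBlock` are hypothetical and written only for illustration; the real `BlockRegistry` in `src/sdg_hub/core/blocks/registry.py` may store different metadata and expose a different lookup API.

```python
from typing import Callable, Dict, List


class SketchRegistry:
    """Hypothetical registry; illustrates the pattern, not sdg_hub's BlockRegistry."""

    _blocks: Dict[str, dict] = {}

    @classmethod
    def register(cls, name: str, category: str, description: str = "") -> Callable[[type], type]:
        def decorator(block_cls: type) -> type:
            if name in cls._blocks:
                raise ValueError(f"block '{name}' is already registered")
            cls._blocks[name] = {
                "cls": block_cls,
                "category": category,
                "description": description,
            }
            return block_cls  # the class itself is returned unchanged

        return decorator

    @classmethod
    def get(cls, name: str) -> type:
        return cls._blocks[name]["cls"]

    @classmethod
    def by_category(cls, category: str) -> List[str]:
        return sorted(n for n, meta in cls._blocks.items() if meta["category"] == category)


@SketchRegistry.register("UppercaseBlock", "transform", "Uppercases the text column")
class UppercaseBlock:
    def generate(self, samples: List[dict]) -> List[dict]:
        # Plain dict rows stand in for a datasets.Dataset in this sketch.
        return [{**row, "text": str(row["text"]).upper()} for row in samples]


block_cls = SketchRegistry.get("UppercaseBlock")
print(block_cls().generate([{"text": "hello"}]))
print(SketchRegistry.by_category("transform"))
```

Because registration happens as a side effect of the class definition, importing the module that defines a block is enough to make it discoverable by name. That is one plausible reading of the "zero setup" auto-discovery the documentation mentions.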
