synkro 0.4.30__tar.gz → 0.4.53__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (106) hide show
  1. {synkro-0.4.30 → synkro-0.4.53}/PKG-INFO +298 -9
  2. synkro-0.4.53/README.md +551 -0
  3. {synkro-0.4.30 → synkro-0.4.53}/examples/advanced_usage.py +5 -0
  4. {synkro-0.4.30 → synkro-0.4.53}/examples/anthropic_basic.py +6 -1
  5. synkro-0.4.53/examples/coverage_tracking.py +58 -0
  6. synkro-0.4.53/examples/eval_scenarios.py +138 -0
  7. synkro-0.4.53/examples/evaluation_test.py +37 -0
  8. {synkro-0.4.30 → synkro-0.4.53}/examples/finetune_llama.py +9 -4
  9. {synkro-0.4.30 → synkro-0.4.53}/examples/multi_file_policy.py +6 -1
  10. {synkro-0.4.30 → synkro-0.4.53}/examples/openai_basic.py +5 -0
  11. synkro-0.4.53/examples/pdf_examples/dmv_handbook.py +87 -0
  12. {synkro-0.4.30 → synkro-0.4.53}/examples/quickstart.py +12 -10
  13. {synkro-0.4.30 → synkro-0.4.53}/examples/tool_calling.py +5 -0
  14. {synkro-0.4.30 → synkro-0.4.53}/pyproject.toml +4 -5
  15. {synkro-0.4.30 → synkro-0.4.53}/synkro/__init__.py +157 -4
  16. {synkro-0.4.30 → synkro-0.4.53}/synkro/advanced.py +2 -2
  17. {synkro-0.4.30 → synkro-0.4.53}/synkro/cli.py +2 -2
  18. {synkro-0.4.30 → synkro-0.4.53}/synkro/core/checkpoint.py +1 -1
  19. {synkro-0.4.30 → synkro-0.4.53}/synkro/core/dataset.py +61 -28
  20. {synkro-0.4.30 → synkro-0.4.53}/synkro/core/policy.py +10 -10
  21. synkro-0.4.53/synkro/coverage/__init__.py +23 -0
  22. synkro-0.4.53/synkro/coverage/calculator.py +363 -0
  23. synkro-0.4.53/synkro/coverage/improver.py +321 -0
  24. synkro-0.4.53/synkro/coverage/scenario_tagger.py +186 -0
  25. synkro-0.4.53/synkro/coverage/taxonomy_extractor.py +161 -0
  26. {synkro-0.4.30 → synkro-0.4.53}/synkro/factory.py +50 -2
  27. synkro-0.4.53/synkro/formatters/__init__.py +17 -0
  28. {synkro-0.4.30 → synkro-0.4.53}/synkro/formatters/chatml.py +10 -3
  29. synkro-0.4.53/synkro/formatters/langfuse.py +109 -0
  30. synkro-0.4.53/synkro/formatters/langsmith.py +109 -0
  31. synkro-0.4.30/synkro/formatters/sft.py → synkro-0.4.53/synkro/formatters/messages.py +16 -11
  32. synkro-0.4.53/synkro/formatters/qa.py +123 -0
  33. {synkro-0.4.30 → synkro-0.4.53}/synkro/formatters/tool_call.py +14 -7
  34. {synkro-0.4.30 → synkro-0.4.53}/synkro/generation/generator.py +86 -4
  35. {synkro-0.4.30 → synkro-0.4.53}/synkro/generation/golden_responses.py +25 -0
  36. {synkro-0.4.30 → synkro-0.4.53}/synkro/generation/golden_scenarios.py +168 -49
  37. {synkro-0.4.30 → synkro-0.4.53}/synkro/generation/golden_tool_responses.py +29 -0
  38. {synkro-0.4.30 → synkro-0.4.53}/synkro/interactive/intent_classifier.py +3 -0
  39. {synkro-0.4.30 → synkro-0.4.53}/synkro/interactive/rich_ui.py +109 -13
  40. {synkro-0.4.30 → synkro-0.4.53}/synkro/llm/client.py +37 -1
  41. {synkro-0.4.30 → synkro-0.4.53}/synkro/modes/config.py +2 -1
  42. {synkro-0.4.30 → synkro-0.4.53}/synkro/modes/conversation.py +9 -0
  43. {synkro-0.4.30 → synkro-0.4.53}/synkro/pipeline/runner.py +417 -9
  44. {synkro-0.4.30 → synkro-0.4.53}/synkro/pipelines.py +11 -1
  45. synkro-0.4.53/synkro/prompts/coverage_templates.py +294 -0
  46. {synkro-0.4.30 → synkro-0.4.53}/synkro/prompts/golden_templates.py +77 -0
  47. {synkro-0.4.30 → synkro-0.4.53}/synkro/prompts/interactive_templates.py +20 -2
  48. synkro-0.4.53/synkro/reporting.py +803 -0
  49. {synkro-0.4.30 → synkro-0.4.53}/synkro/schemas.py +83 -2
  50. {synkro-0.4.30 → synkro-0.4.53}/synkro/types/__init__.py +2 -0
  51. {synkro-0.4.30 → synkro-0.4.53}/synkro/types/core.py +33 -0
  52. synkro-0.4.53/synkro/types/coverage.py +399 -0
  53. {synkro-0.4.30 → synkro-0.4.53}/synkro/types/dataset_type.py +5 -1
  54. {synkro-0.4.30 → synkro-0.4.53}/synkro/types/logic_map.py +9 -1
  55. {synkro-0.4.30 → synkro-0.4.53}/tests/test_imports.py +3 -3
  56. synkro-0.4.30/README.md +0 -261
  57. synkro-0.4.30/synkro/formatters/__init__.py +0 -12
  58. synkro-0.4.30/synkro/reporting.py +0 -403
  59. {synkro-0.4.30 → synkro-0.4.53}/.gitignore +0 -0
  60. {synkro-0.4.30 → synkro-0.4.53}/LICENSE +0 -0
  61. {synkro-0.4.30 → synkro-0.4.53}/examples/policies/hr_policy.md +0 -0
  62. {synkro-0.4.30 → synkro-0.4.53}/examples/policies/security_policy.txt +0 -0
  63. {synkro-0.4.30 → synkro-0.4.53}/examples/test_fixes.py +0 -0
  64. {synkro-0.4.30 → synkro-0.4.53}/synkro/core/__init__.py +0 -0
  65. {synkro-0.4.30 → synkro-0.4.53}/synkro/errors.py +0 -0
  66. {synkro-0.4.30 → synkro-0.4.53}/synkro/examples/__init__.py +0 -0
  67. {synkro-0.4.30 → synkro-0.4.53}/synkro/generation/__init__.py +0 -0
  68. {synkro-0.4.30 → synkro-0.4.53}/synkro/generation/follow_ups.py +0 -0
  69. {synkro-0.4.30 → synkro-0.4.53}/synkro/generation/logic_extractor.py +0 -0
  70. {synkro-0.4.30 → synkro-0.4.53}/synkro/generation/multiturn_responses.py +0 -0
  71. {synkro-0.4.30 → synkro-0.4.53}/synkro/generation/planner.py +0 -0
  72. {synkro-0.4.30 → synkro-0.4.53}/synkro/generation/responses.py +0 -0
  73. {synkro-0.4.30 → synkro-0.4.53}/synkro/generation/scenarios.py +0 -0
  74. {synkro-0.4.30 → synkro-0.4.53}/synkro/generation/tool_responses.py +0 -0
  75. {synkro-0.4.30 → synkro-0.4.53}/synkro/generation/tool_simulator.py +0 -0
  76. {synkro-0.4.30 → synkro-0.4.53}/synkro/interactive/__init__.py +0 -0
  77. {synkro-0.4.30 → synkro-0.4.53}/synkro/interactive/hitl_session.py +0 -0
  78. {synkro-0.4.30 → synkro-0.4.53}/synkro/interactive/logic_map_editor.py +0 -0
  79. {synkro-0.4.30 → synkro-0.4.53}/synkro/interactive/scenario_editor.py +0 -0
  80. {synkro-0.4.30 → synkro-0.4.53}/synkro/llm/__init__.py +0 -0
  81. {synkro-0.4.30 → synkro-0.4.53}/synkro/llm/rate_limits.py +0 -0
  82. {synkro-0.4.30 → synkro-0.4.53}/synkro/models/__init__.py +0 -0
  83. {synkro-0.4.30 → synkro-0.4.53}/synkro/models/anthropic.py +0 -0
  84. {synkro-0.4.30 → synkro-0.4.53}/synkro/models/google.py +0 -0
  85. {synkro-0.4.30 → synkro-0.4.53}/synkro/models/local.py +0 -0
  86. {synkro-0.4.30 → synkro-0.4.53}/synkro/models/openai.py +0 -0
  87. {synkro-0.4.30 → synkro-0.4.53}/synkro/modes/__init__.py +0 -0
  88. {synkro-0.4.30 → synkro-0.4.53}/synkro/modes/tool_call.py +0 -0
  89. {synkro-0.4.30 → synkro-0.4.53}/synkro/parsers.py +0 -0
  90. {synkro-0.4.30 → synkro-0.4.53}/synkro/pipeline/__init__.py +0 -0
  91. {synkro-0.4.30 → synkro-0.4.53}/synkro/pipeline/phases.py +0 -0
  92. {synkro-0.4.30 → synkro-0.4.53}/synkro/prompts/__init__.py +0 -0
  93. {synkro-0.4.30 → synkro-0.4.53}/synkro/prompts/base.py +0 -0
  94. {synkro-0.4.30 → synkro-0.4.53}/synkro/prompts/multiturn_templates.py +0 -0
  95. {synkro-0.4.30 → synkro-0.4.53}/synkro/prompts/templates.py +0 -0
  96. {synkro-0.4.30 → synkro-0.4.53}/synkro/prompts/tool_templates.py +0 -0
  97. {synkro-0.4.30 → synkro-0.4.53}/synkro/quality/__init__.py +0 -0
  98. {synkro-0.4.30 → synkro-0.4.53}/synkro/quality/golden_refiner.py +0 -0
  99. {synkro-0.4.30 → synkro-0.4.53}/synkro/quality/grader.py +0 -0
  100. {synkro-0.4.30 → synkro-0.4.53}/synkro/quality/multiturn_grader.py +0 -0
  101. {synkro-0.4.30 → synkro-0.4.53}/synkro/quality/refiner.py +0 -0
  102. {synkro-0.4.30 → synkro-0.4.53}/synkro/quality/tool_grader.py +0 -0
  103. {synkro-0.4.30 → synkro-0.4.53}/synkro/quality/tool_refiner.py +0 -0
  104. {synkro-0.4.30 → synkro-0.4.53}/synkro/quality/verifier.py +0 -0
  105. {synkro-0.4.30 → synkro-0.4.53}/synkro/types/tool.py +0 -0
  106. {synkro-0.4.30 → synkro-0.4.53}/tests/__init__.py +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: synkro
3
- Version: 0.4.30
3
+ Version: 0.4.53
4
4
  Summary: Generate training datasets from any document
5
5
  Author: Murtaza Meerza
6
6
  License-Expression: MIT
@@ -20,9 +20,8 @@ Requires-Dist: html2text>=2020.1
20
20
  Requires-Dist: httpx>=0.25
21
21
  Requires-Dist: litellm>=1.40
22
22
  Requires-Dist: mammoth>=1.6
23
- Requires-Dist: marker-pdf>=0.2
24
23
  Requires-Dist: pydantic>=2.0
25
- Requires-Dist: python-dotenv>=1.0
24
+ Requires-Dist: pymupdf>=1.24
26
25
  Requires-Dist: rich>=13.0
27
26
  Requires-Dist: typer>=0.9
28
27
  Provides-Extra: dev
@@ -33,16 +32,19 @@ Description-Content-Type: text/markdown
33
32
 
34
33
  # Synkro
35
34
 
36
- Turn policies, handbooks, and documentation into high-quality training data for fine-tuning LLMs.
35
+ A library for turning unstructured policies, handbooks, and documentation into high-quality conversation, tool-calling, or evaluation data for LLMs.
37
36
 
38
37
  ## Features
39
38
 
40
39
  - **Quality Evaluation** - Each response is graded and automatically refined if it fails
41
- - **Multiple Formats** - Conversation (multi-turn), Instruction (single-turn), and Tool Calling
40
+ - **Multiple Formats** - Conversation (multi-turn), Instruction (single-turn), Evaluation (Q&A), and Tool Calling
41
+ - **Eval Platform Support** - Export to LangSmith, Langfuse, or generic Q&A format
42
42
  - **Tool Call Training** - Generate OpenAI function calling format for teaching models to use custom tools
43
- - **Top LLM Providers** - OpenAI, Anthropic, and Google
43
+ - **Coverage Tracking** - Track scenario diversity like code coverage, identify gaps, and improve coverage with natural language commands
44
+ - **Top LLM Providers** - OpenAI, Anthropic, Google, and local models (Ollama, vLLM)
44
45
  - **File Support** - PDF, DOCX, TXT, Markdown, URLs
45
46
  - **CLI Included** - Generate datasets from the command line
47
+ - **Cost Tracking** - See total cost and LLM call breakdown after each generation
46
48
 
47
49
  ## Installation
48
50
 
@@ -95,9 +97,10 @@ dataset = pipeline.generate(policy)
95
97
 
96
98
  | Type | Turns | Output Formats | Best For |
97
99
  |------|-------|----------------|----------|
98
- | **CONVERSATION** | Multi | messages | Fine-tuning chat models |
99
- | **INSTRUCTION** | 1 | messages | Instruction-following models |
100
- | **TOOL_CALL** | Multi | OpenAI function calling, ChatML | Teaching tool use |
100
+ | **CONVERSATION** | Multi | messages, chatml | Fine-tuning chat models |
101
+ | **INSTRUCTION** | 1 | messages, chatml | Instruction-following models |
102
+ | **EVALUATION** | 1 | qa, langsmith, langfuse | LLM evaluation & benchmarks |
103
+ | **TOOL_CALL** | Multi | tool_call, chatml | Teaching tool use |
101
104
 
102
105
  ### Conversation (Default)
103
106
 
@@ -133,6 +136,50 @@ Output (single-turn):
133
136
  ]}
134
137
  ```
135
138
 
139
+ ### Evaluation
140
+
141
+ Generate Q&A datasets for LLM evaluation with ground truth:
142
+
143
+ ```python
144
+ pipeline = create_pipeline(dataset_type=DatasetType.EVALUATION)
145
+ dataset = pipeline.generate(policy, traces=50)
146
+
147
+ # Save in different formats
148
+ dataset.save("eval.jsonl", format="qa") # Generic Q&A
149
+ dataset.save("eval.jsonl", format="langsmith") # LangSmith format
150
+ dataset.save("eval.jsonl", format="langfuse") # Langfuse format
151
+ ```
152
+
153
+ Output (`format="qa"`):
154
+ ```json
155
+ {
156
+ "question": "Can I submit a $200 expense without a receipt?",
157
+ "answer": "All expenses require receipts per policy...",
158
+ "expected_outcome": "Deny - missing receipt violates R003",
159
+ "ground_truth_rules": ["R003", "R005"],
160
+ "difficulty": "negative",
161
+ "category": "Receipt Requirements"
162
+ }
163
+ ```
164
+
165
+ Output (`format="langsmith"`):
166
+ ```json
167
+ {
168
+ "inputs": {"question": "...", "context": "..."},
169
+ "outputs": {"answer": "..."},
170
+ "metadata": {"expected_outcome": "...", "ground_truth_rules": [...]}
171
+ }
172
+ ```
173
+
174
+ Output (`format="langfuse"`):
175
+ ```json
176
+ {
177
+ "input": {"question": "...", "context": "..."},
178
+ "expectedOutput": {"answer": "...", "expected_outcome": "..."},
179
+ "metadata": {"ground_truth_rules": [...], "difficulty": "..."}
180
+ }
181
+ ```
182
+
136
183
  ### Tool Calling
137
184
 
138
185
  Generate training data for teaching models when and how to use your custom tools:
@@ -218,6 +265,188 @@ high_quality = dataset.filter(passed=True)
218
265
  high_quality.save("training.jsonl")
219
266
  ```
220
267
 
268
+ ## Eval API
269
+
270
+ Generate test scenarios and grade your own model's responses against policy compliance.
271
+
272
+ ```python
273
+ import synkro
274
+
275
+ # Generate scenarios with ground truth (no synthetic responses)
276
+ result = synkro.generate_scenarios(
277
+ policy="Expenses over $50 require manager approval...",
278
+ count=100,
279
+ )
280
+
281
+ # Each scenario has ground truth labels
282
+ for scenario in result.scenarios:
283
+ print(scenario.user_message) # "Can I expense a $200 dinner?"
284
+ print(scenario.expected_outcome) # "Requires manager approval per R001"
285
+ print(scenario.target_rule_ids) # ["R001", "R003"]
286
+ print(scenario.scenario_type) # "positive" | "negative" | "edge_case"
287
+
288
+ # Grade YOUR model's responses
289
+ for scenario in result.scenarios:
290
+ response = my_model(scenario.user_message) # Your model
291
+ grade = synkro.grade(response, scenario, policy)
292
+
293
+ if not grade.passed:
294
+ print(f"Failed: {grade.feedback}")
295
+ ```
296
+
297
+ ### When to Use
298
+
299
+ | Use Case | API |
300
+ |----------|-----|
301
+ | Generate training data | `synkro.generate()` |
302
+ | Generate eval scenarios | `synkro.generate_scenarios()` |
303
+ | Grade external model | `synkro.grade()` |
304
+
305
+ ### Scenario Types
306
+
307
+ Scenarios are generated with balanced coverage:
308
+
309
+ | Type | % | Description |
310
+ |------|---|-------------|
311
+ | `positive` | 35% | Happy path - user meets all criteria |
312
+ | `negative` | 30% | Violations - user fails one criterion |
313
+ | `edge_case` | 25% | Boundary conditions at exact limits |
314
+ | `irrelevant` | 10% | Outside policy scope |
315
+
316
+ ### EvalScenario Fields
317
+
318
+ ```python
319
+ scenario.user_message # The test input
320
+ scenario.expected_outcome # Ground truth behavior
321
+ scenario.target_rule_ids # Rules being tested
322
+ scenario.scenario_type # positive/negative/edge_case/irrelevant
323
+ scenario.category # Policy category
324
+ scenario.context # Additional context
325
+ ```
326
+
327
+ ### Temperature
328
+
329
+ Use `temperature` to control output diversity:
330
+
331
+ ```python
332
+ # High temp for diverse scenario coverage
333
+ result = synkro.generate_scenarios(policy, temperature=0.8)
334
+
335
+ # Low temp for deterministic training data
336
+ dataset = synkro.generate(policy, temperature=0.2)
337
+ ```
338
+
339
+ ## Coverage Tracking
340
+
341
+ Track how well your generated scenarios cover different aspects of your policy, similar to code coverage for tests.
342
+
343
+ ```python
344
+ import synkro
345
+
346
+ # Generate with logic map access
347
+ result = synkro.generate(policy, traces=50, return_logic_map=True)
348
+
349
+ # View coverage report
350
+ synkro.coverage_report(result)
351
+ ```
352
+
353
+ Output:
354
+ ```
355
+ Coverage Report
356
+ ========================================
357
+ Overall: 68.8%
358
+ Sub-categories: 2 covered, 1 partial, 1 uncovered
359
+ Total scenarios: 20
360
+
361
+ Gaps (2):
362
+ - Receipt requirements [HIGH] (0% coverage, 0 scenarios)
363
+ - Travel booking rules [MEDIUM] (partial: 40% coverage)
364
+
365
+ Suggestions:
366
+ 1. Add 3+ scenarios for 'Receipt requirements' testing R008, R009
367
+ 2. Add edge_case scenarios for 'Travel booking rules'
368
+ ```
369
+
370
+ ### Coverage Report Formats
371
+
372
+ ```python
373
+ # Print to console (default)
374
+ synkro.coverage_report(result)
375
+
376
+ # Get as dictionary for programmatic use
377
+ report = synkro.coverage_report(result, format="dict")
378
+ print(f"Coverage: {report['overall_coverage_percent']}%")
379
+ print(f"Gaps: {len(report['gaps'])}")
380
+
381
+ # Get as JSON string
382
+ json_str = synkro.coverage_report(result, format="json")
383
+
384
+ # Get raw CoverageReport object
385
+ report = synkro.coverage_report(result, format="report")
386
+ for gap in report.gaps:
387
+ print(f"Gap: {gap}")
388
+ ```
389
+
390
+ ### Interactive Coverage Commands
391
+
392
+ In interactive mode, use natural language to view and improve coverage:
393
+
394
+ | Command | Action |
395
+ |---------|--------|
396
+ | `"show coverage"` | Display coverage summary |
397
+ | `"show coverage gaps"` | Show uncovered sub-categories |
398
+ | `"show heatmap"` | Visual coverage by category |
399
+ | `"increase coverage for refunds by 20%"` | Add scenarios for a sub-category |
400
+ | `"get amount thresholds to 80%"` | Target specific coverage percentage |
401
+ | `"add more negative scenarios for time eligibility"` | Add specific scenario types |
402
+
403
+ ### Coverage Metrics
404
+
405
+ Each sub-category is tracked with:
406
+
407
+ | Metric | Description |
408
+ |--------|-------------|
409
+ | `coverage_percent` | % of expected coverage achieved |
410
+ | `coverage_status` | `covered` (≥80%), `partial` (30–79%), `uncovered` (<30%) |
411
+ | `scenario_count` | Number of scenarios testing this sub-category |
412
+ | `type_distribution` | Breakdown by positive/negative/edge_case |
413
+
414
+ ## Cost & Performance
415
+
416
+ Approximate costs using Gemini 2.5 Flash (multi-turn conversations):
417
+
418
+ | Traces | LLM Calls | Time | Cost |
419
+ |--------|-----------|------|------|
420
+ | 100 | ~335 | ~13 min | ~$3 |
421
+ | 500 | ~1,675 | ~1 hour | ~$14 |
422
+ | 1000 | ~3,350 | ~2 hours | ~$28 |
423
+
424
+ *Based on ~3.3 LLM calls per trace (generation + grading) with max_iterations=3. Actual costs vary by policy complexity and turn count.*
425
+
426
+ ## Local LLMs
427
+
428
+ Run with Ollama, vLLM, or any OpenAI-compatible endpoint:
429
+
430
+ ```python
431
+ from synkro import create_pipeline
432
+ from synkro.models import Local
433
+
434
+ # Ollama
435
+ pipeline = create_pipeline(model=Local.OLLAMA("llama3.2"))
436
+
437
+ # vLLM
438
+ pipeline = create_pipeline(model=Local.VLLM("mistral-7b"))
439
+
440
+ # Custom endpoint
441
+ pipeline = create_pipeline(model=Local.CUSTOM("my-model", endpoint="http://localhost:8080"))
442
+ ```
443
+
444
+ **CLI:**
445
+ ```bash
446
+ synkro generate policy.pdf --provider ollama --model llama3.2
447
+ synkro generate policy.pdf --provider vllm --endpoint http://localhost:8000
448
+ ```
449
+
221
450
  ## CLI
222
451
 
223
452
  ```bash
@@ -232,12 +461,18 @@ synkro generate https://example.com/policy -o training.jsonl
232
461
 
233
462
  # Skip interactive mode
234
463
  synkro generate policy.pdf --no-interactive
464
+
465
+ # Quick demo with built-in policy
466
+ synkro demo
235
467
  ```
236
468
 
237
469
  **Options:**
238
470
  - `--traces, -n` - Number of traces (default: 20)
239
471
  - `--output, -o` - Output file path
240
472
  - `--model, -m` - Model for generation
473
+ - `--format, -f` - Output format: `messages`, `qa`, `langsmith`, `langfuse`, `tool_call`, `chatml`
474
+ - `--provider, -p` - LLM provider for local models (`ollama`, `vllm`)
475
+ - `--endpoint, -e` - Custom API endpoint URL
241
476
  - `--interactive/-i, --no-interactive/-I` - Review/edit extracted rules before generation (default: on)
242
477
 
243
478
  ## Interactive Mode
@@ -278,6 +513,60 @@ You can adjust both **conversation turns** and **rules** using natural language:
278
513
 
279
514
  Commands: `done`, `undo`, `reset`, `show R001`, `help`
280
515
 
516
+ ## Advanced Features
517
+
518
+ ### Checkpointing
519
+
520
+ Resume interrupted generations:
521
+
522
+ ```python
523
+ pipeline = create_pipeline(checkpoint_dir="./checkpoints")
524
+ dataset = pipeline.generate(policy, traces=100) # Resumes from checkpoint
525
+ ```
526
+
527
+ ### Dataset Operations
528
+
529
+ ```python
530
+ # Filter by quality
531
+ high_quality = dataset.filter(passed=True)
532
+
533
+ # Remove duplicates
534
+ unique = dataset.dedupe(threshold=0.85)
535
+
536
+ # Check pass rate
537
+ print(f"Pass rate: {dataset.passing_rate:.1%}")
538
+ ```
539
+
540
+ ### Folder Loading
541
+
542
+ Generate from multiple documents at once:
543
+
544
+ ```python
545
+ from synkro.core.policy import Policy
546
+
547
+ policy = Policy.from_file("policies/") # Loads all PDF, DOCX, TXT, MD files
548
+ dataset = pipeline.generate(policy, traces=100)
549
+ ```
550
+
551
+ ### Thinking Mode
552
+
553
+ Generate training data with explicit reasoning in `<think>` tags, compatible with Qwen3 and DeepSeek-R1:
554
+
555
+ ```python
556
+ pipeline = create_pipeline(thinking=True)
557
+ dataset = pipeline.generate(policy, traces=50)
558
+ ```
559
+
560
+ Output:
561
+ ```json
562
+ {"messages": [
563
+ {"role": "user", "content": "Can I expense a $350 team dinner?"},
564
+ {"role": "assistant", "content": "<think>\nLet me check the expense policy...\n- Rule: Expenses over $50 require manager approval\n- $350 exceeds the $50 threshold\n- Manager approval is required\n</think>\n\nFor a $350 team dinner, you'll need manager approval since it exceeds the $50 threshold. Please submit your expense report with the receipt and request approval from your manager."}
565
+ ]}
566
+ ```
567
+
568
+ Works with all dataset types (`CONVERSATION`, `INSTRUCTION`, `TOOL_CALL`).
569
+
281
570
  ## Logic Map Inspection
282
571
 
283
572
  Access the extracted rules programmatically: