PyPI - freesolo - Versions diffs - 0.2.3__tar.gz → 0.2.4__tar.gz - Mend

freesolo 0.2.3.tar.gz → 0.2.4.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (125)

freesolo-0.2.4/.github/workflows/publish-packages.yml ADDED

@@ -0,0 +1,96 @@
+name: Publish packages
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - "pyproject.toml"
+      - "uv.lock"
+      - "pypi/**"
+      - "examples/**"
+      - ".github/workflows/publish-packages.yml"
+  workflow_dispatch:
+concurrency:
+  group: publish-packages-${{ github.ref }}
+  cancel-in-progress: false
+jobs:
+  publish-pypi:
+    name: Publish PyPI package
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+    env:
+      UV_PUBLISH_TOKEN: ${{ secrets.PYPI_API_TOKEN }}
+    steps:
+      - uses: actions/checkout@v6
+      - uses: actions/setup-python@v6
+        with:
+          python-version: "3.12"
+      - name: Read package metadata
+        id: metadata
+        run: |
+          python - <<'PY' >> "$GITHUB_OUTPUT"
+          import tomllib
+          with open("pyproject.toml", "rb") as f:
+              project = tomllib.load(f)["project"]
+          print(f"name={project['name']}")
+          print(f"version={project['version']}")
+          PY
+      - name: Check PyPI for existing version
+        id: pypi
+        env:
+          PACKAGE_NAME: ${{ steps.metadata.outputs.name }}
+          PACKAGE_VERSION: ${{ steps.metadata.outputs.version }}
+        run: |
+          python - <<'PY' >> "$GITHUB_OUTPUT"
+          import os
+          import urllib.error
+          import urllib.request
+          name = os.environ["PACKAGE_NAME"]
+          version = os.environ["PACKAGE_VERSION"]
+          url = f"https://pypi.org/pypi/{name}/{version}/json"
+          try:
+              with urllib.request.urlopen(url, timeout=30) as response:
+                  exists = response.status == 200
+          except urllib.error.HTTPError as error:
+              if error.code != 404:
+                  raise
+              exists = False
+          print(f"exists={'true' if exists else 'false'}")
+          PY
+      - name: Skip existing PyPI version
+        if: steps.pypi.outputs.exists == 'true'
+        run: echo "${{ steps.metadata.outputs.name }} ${{ steps.metadata.outputs.version }} is already on PyPI."
+      - name: Install uv
+        if: steps.pypi.outputs.exists != 'true'
+        run: python -m pip install --upgrade uv
+      - name: Build distributions
+        if: steps.pypi.outputs.exists != 'true'
+        run: |
+          rm -rf dist
+          uv build
+      - name: Publish to PyPI
+        if: steps.pypi.outputs.exists != 'true' && env.UV_PUBLISH_TOKEN != ''
+        run: uv publish
+      - name: Skip publish without PyPI token
+        if: steps.pypi.outputs.exists != 'true' && env.UV_PUBLISH_TOKEN == ''
+        run: |
+          echo "PYPI_API_TOKEN is not configured; built distributions but skipped upload."
+          echo "Add a PYPI_API_TOKEN repository secret to publish this package."
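
Note on the "Check PyPI for existing version" step above: the same lookup can be sketched as a standalone function, which makes the 404 handling easier to read and to exercise without network access. The function name `version_exists` and the `opener` parameter are illustrative assumptions, not part of the workflow; the URL pattern, the 30-second timeout, and the rule that only a 404 means "not published" are taken from the step itself.

```python
import urllib.error
import urllib.request
from typing import Callable


def version_exists(
    name: str,
    version: str,
    opener: Callable = urllib.request.urlopen,  # assumption: injectable for offline runs
) -> bool:
    """Return True if PyPI's JSON endpoint reports this exact release."""
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    try:
        with opener(url, timeout=30) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code != 404:
            # Anything other than "not found" is re-raised, as in the workflow,
            # so an outage is not mistaken for an unpublished version.
            raise
        return False
```

In the workflow, the result is written to `$GITHUB_OUTPUT` as `exists=true` or `exists=false`, and the later install, build, and publish steps are gated on that value.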

freesolo-0.2.4/.github/workflows/python-checks.yml ADDED

@@ -0,0 +1,41 @@
+name: Python checks
+on:
+  pull_request:
+  push:
+    branches:
+      - main
+  workflow_dispatch:
+permissions:
+  contents: read
+jobs:
+  checks:
+    name: Ruff and tests
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+      - uses: actions/setup-python@v6
+        with:
+          python-version: "3.12"
+      - name: Install uv
+        run: python3 -m pip install --upgrade uv
+      - name: Install dependencies
+        run: uv sync --locked --extra dev
+      - name: Python compile check
+        run: python3 -m py_compile $(find pypi tests -name '*.py' -print)
+      - name: Ruff check
+        run: uv run --extra dev python -m ruff check .
+      - name: Ruff format check
+        run: uv run --extra dev python -m ruff format --check .
+      - name: Tests
+        run: uv run --extra dev python -m pytest tests

freesolo-0.2.4/.github/workflows/sync-package-function-usage.yml ADDED

@@ -0,0 +1,38 @@
+name: Sync package function usage
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - "function_usage_registry.json"
+      - "scripts/sync_package_function_usage.py"
+      - ".github/workflows/sync-package-function-usage.yml"
+  workflow_dispatch:
+permissions:
+  contents: read
+jobs:
+  sync:
+    name: Sync usage registry
+    runs-on: ubuntu-latest
+    if: ${{ github.ref == 'refs/heads/main' }}
+    env:
+      SUPABASE_URL: ${{ secrets.SUPABASE_URL }}
+      SUPABASE_SERVICE_ROLE_KEY: ${{ secrets.SUPABASE_SERVICE_ROLE_KEY }}
+    steps:
+      - uses: actions/checkout@v6
+      - uses: actions/setup-python@v6
+        with:
+          python-version: "3.12"
+      - name: Sync package function rows
+        if: env.SUPABASE_URL != '' && env.SUPABASE_SERVICE_ROLE_KEY != ''
+        run: python scripts/sync_package_function_usage.py --remove-stale
+      - name: Skip without Supabase secrets
+        if: env.SUPABASE_URL == '' || env.SUPABASE_SERVICE_ROLE_KEY == ''
+        run: echo "SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY secrets are required to sync usage."

{freesolo-0.2.3 → freesolo-0.2.4}/PKG-INFO RENAMED

@@ -1,19 +1,25 @@
 Metadata-Version: 2.4
 Name: freesolo
-Version: 0.2.3
+Version: 0.2.4
 Summary: Tracing, evaluation, and training utilities for LLM applications.
-Requires-Python: >=3.10
+Requires-Python: >=3.11
+Requires-Dist: gepa>=0.1.1
 Requires-Dist: httpx>=0.27.0
+Requires-Dist: jsonschema>=4.0.0
+Requires-Dist: numpy>=1.26.0
+Requires-Dist: opentelemetry-api>=1.28.0
+Requires-Dist: opentelemetry-exporter-otlp-proto-http>=1.28.0
+Requires-Dist: opentelemetry-sdk>=1.28.0
+Requires-Dist: pymongo>=4.0.0
+Requires-Dist: python-dotenv>=1.0.0
+Requires-Dist: tinker-cookbook>=0.3.0
+Requires-Dist: tinker>=0.19.0
 Requires-Dist: wandb>=0.17.0
 Provides-Extra: dev
 Requires-Dist: pytest>=8.0.0; extra == 'dev'
 Requires-Dist: ruff>=0.11.0; extra == 'dev'
 Provides-Extra: examples
-Requires-Dist: anthropic>=0.40.0; extra == 'examples'
-Requires-Dist: google-genai>=1.0.0; extra == 'examples'
 Requires-Dist: openai>=1.0.0; extra == 'examples'
-Provides-Extra: gepa
-Requires-Dist: gepa>=0.1.1; extra == 'gepa'
 Description-Content-Type: text/markdown
 # freesolo
@@ -24,36 +30,15 @@ It is built for the lowest-friction integration possible:
 1. Install the package
 2. Set `FREESOLO_API_KEY`
-3. Wrap your OpenAI, Anthropic, Gemini, or OpenAI-compatible client
+3. Configure the tracer
 4. Run traces and evaluations from the package APIs
-## Current provider support
-`freesolo` currently supports automatic client instrumentation for:
-- OpenAI
-- Anthropic
-- Gemini
-- OpenAI-compatible clients via `wrap(...)` / `wrap_provider(...)`
 ## Install
-Install the package plus the provider client you use:
-```bash
-pip install freesolo openai
-```
-or
-```bash
-pip install freesolo anthropic
-```
-or
+Install the package:
 ```bash
-pip install freesolo google-genai
+pip install freesolo
 ```
 ## Environment
@@ -68,107 +53,77 @@ export FREESOLO_API_KEY=fslo_...
 ## Quickstart
 ```python
-from openai import OpenAI
-from freesolo import wrap
-client = wrap(OpenAI())
-result = client.responses.create(
-    model="gpt-4.1-mini",
-    instructions="Reply in plain text.",
-    input=[
-        {
-            "role": "user",
-            "content": [{"type": "input_text", "text": "How do I reset my password?"}],
-        }
-    ],
-)
-print(result.output_text or "")
+from freesolo.tracing import configure_tracer, get_tracer
+configure_tracer(service_name="my-llm-app")
+tracer = get_tracer()
+with tracer.start_as_current_span(
+    "model.call",
+    attributes={
+        "gen_ai.system": "openai",
+        "gen_ai.request.model": "gpt-5.5",
+        "freesolo.input": {"prompt": "How do I reset my password?"},
+    },
+) as span:
+    result = "Reset it from account settings."
+    span.set_attribute("freesolo.output", result)
 ```
-## OpenRouter Quickstart
+## Runnable Examples
-```python
-from openai import OpenAI
-from freesolo import wrap
+Copy-pasteable examples live in [`examples/`](examples/):
-client = wrap(
-    OpenAI(
-        base_url="https://openrouter.ai/api/v1",
-        api_key="YOUR_OPENROUTER_API_KEY",
-    )
-)
+- `tracing_manual_span.py`: configure OpenTelemetry and send one application span.
+- `evaluation_custom_scorer.py`: run custom binary and numeric eval scorers.
+- `evaluation_from_files.py`: run evals from a concrete dataset and environment.
+- `environment.py`: example environment used by evals, training, and GEPA.
+- `support_dataset.py`: example dataset paths and loaders used by evals, SFT, GRPO, and GEPA.
+- `gepa_prompt_example.py`: run the Freesolo GEPA adapter over the example dataset.
+- `training_sft_grpo.py`: start SFT or GRPO training runs from package APIs.
-response = client.chat.completions.create(
-    model="openai/gpt-4.1-mini",
-    messages=[
-        {"role": "system", "content": "Reply in plain text."},
-        {"role": "user", "content": "Write a one-sentence launch blurb."},
-    ],
-    max_tokens=120,
-)
-print(response.choices[0].message.content or "")
-```
-## Gemini Quickstart
-```python
-from google import genai
-from freesolo import instrument_gemini
+From a repo checkout:
-client = instrument_gemini(genai.Client())
-response = client.models.generate_content(
-    model="gemini-2.5-flash",
-    contents="Write a one-sentence release note for traced Gemini support.",
-)
-print(response.text or "")
+```bash
+cd freesolo-sdk
+export PYTHONPATH="$PWD/pypi"
+uv run python examples/evaluation_custom_scorer.py --local
 ```
-## Group Multiple Model Calls
-For agentic or long-horizon tasks, strongly prefer wrapping the whole task in `start_trace(...)` so all of the model calls land in one trace.
-For a single one-off OpenAI, Anthropic, or Gemini request, you can skip it.
-```python
-from anthropic import Anthropic
-from freesolo import instrument_anthropic, start_trace
-client = instrument_anthropic(Anthropic())
-with start_trace("support-agent-run"):
-    first = client.messages.create(
-        model="claude-sonnet-4-20250514",
-        max_tokens=64,
-        messages=[{"role": "user", "content": "Say hello"}],
-    )
-    second = client.messages.create(
-        model="claude-sonnet-4-20250514",
-        max_tokens=64,
-        messages=[{"role": "user", "content": "Say goodbye"}],
-    )
-```
+## Public API
+The root `freesolo` module intentionally exports no functions. Import from the
+subpackages below; lower-level modules may be importable, but they are
+implementation helpers unless they appear here or in an example.
+| Import | Use case |
+| --- | --- |
+| `freesolo.tracing.configure_tracer`, `get_tracer`, `force_flush`, `shutdown` | Send OpenTelemetry traces from an application to Freesolo. |
+| `freesolo.evaluation.EvaluationClient` | Run custom-scorer evals or environment evals and upload results to Freesolo. |
+| `freesolo.evaluation.run_local_evaluation` | Run custom scorers locally without uploading results. |
+| `freesolo.evaluation.CustomScorer`, `BinaryResponse`, `NumericResponse` | Define local scorer logic for eval rows. |
+| `freesolo.evaluation.HostedJudgeClient` and hosted scorer classes | Use hosted LLM-as-judge scorers with OpenRouter-compatible credentials. |
+| `freesolo.datasets.TaskExample`, `Dataset`, `load_dataset` | Load task examples and construct labeled conversations for evals or training. |
+| `freesolo.environments.Environment`, `RewardResult`, `RewardMetric`, `GrpoConfig`, `EnvironmentGeneration` | Define task behavior once for evals, GEPA, SFT, and GRPO. |
+| `freesolo.training.SftConfig`, `TrainGrpoOptions`, `train_sft`, `train_grpo` | Start SFT or GRPO training from package APIs. |
+| `freesolo.gepa.GEPASetup`, `GEPAConfig`, `DefaultReflectionAgent`, `attach_gepa`, `optimize_gepa` | Optimize prompts through the GEPA adapter using the same environment and dataset abstractions. |
+| `freesolo.contracts.load_contract_text`, `extract_contract_spec`, `load_contract_spec`, `build_oracle_messages` | Read contract markdown and build oracle prompt messages. |
+| `freesolo.utils.oracle.generate_ground_truth_records` | Generate ground-truth JSONL records from source examples using a contract, environment, and oracle model. |
+| `freesolo.utils.upload.upload_tinker_checkpoint_to_huggingface` | Upload a Tinker checkpoint to a private Hugging Face model repo. |
 ## What Gets Stored
-- Trace title if you explicitly pass it to `start_trace("...")`
-- Trace metadata if you explicitly pass it to `start_trace(..., metadata=...)`
-- Input payloads with `system_prompt`, `user_prompt`, and `images`
-- Output payloads as plain text
-- Token usage when available
-- Image inputs with inline previews for the trace UI
+- Native OTLP traces and spans
+- Resource attributes like `service.name`
+- Span names, timings, parent span ids, status, and errors
+- Common model attributes such as `gen_ai.system`, `gen_ai.request.model`, and token counts
+- Optional `freesolo.input` and `freesolo.output` span attributes
 ## Notes
-- You do not need `@trace()` for ordinary LLM tracing.
-- A single instrumented OpenAI, Anthropic, or Gemini request creates a trace automatically.
-- For OpenAI-compatible providers like OpenRouter, prefer `wrap(...)` instead of provider-specific helpers.
-- For agentic or long-horizon workflows, strongly recommend `start_trace("descriptive-title")` so planning, retries, and follow-up calls stay grouped.
-- Delivery is best-effort by default. Trace ingestion failures do not break your app.
+- Tracing uses native OpenTelemetry protobuf export to `/api/traces/ingest`.
+- Configure third-party OpenTelemetry instrumentors against the provider returned by `configure_tracer(...)`.
+- Delivery is handled by the OpenTelemetry span processor you configure.
 ## Evaluations
@@ -216,16 +171,15 @@ results = client.run(
 print(results[0].success)
 ```
-## Tinker Deployment
+## Tinker Hugging Face Upload
-`freesolo.utils.deployment` is a thin proxy for the Modal deployment server. It posts
-a Tinker checkpoint URL to the pinned Modal `/deployments` endpoint and returns
-the server JSON response.
+`freesolo.utils.upload` posts a Tinker checkpoint URL to the Freesolo upload
+service and returns the Hugging Face upload response.
 ```python
-from freesolo.utils.deployment import deploy_tinker_checkpoint
+from freesolo.utils.upload import upload_tinker_checkpoint_to_huggingface
-result = deploy_tinker_checkpoint(
+result = upload_tinker_checkpoint_to_huggingface(
     "tinker://<run_id>/sampler_weights/final",
     base_model="Qwen/Qwen3.5-35B-A3B",
 )
@@ -235,34 +189,36 @@ print(result["repoId"])
 ### Environment-driven evaluations
-For training contracts, you can use the same `Environment` adapter for evals,
-SFT, and GRPO. `run_environment` loads examples, builds prompt messages, calls
-your model callback, scores the response through the environment, and uploads
-the same `scorers_data` shape used by the eval DB.
+For training contracts, `Environment` describes task behavior for evals and
+GRPO/RL: prompt construction, response normalization, and reward scoring.
+Dataset loading and labeled conversation construction live in `freesolo.datasets`.
+`run_environment` loads task examples, calls your model callback, scores the
+response through the environment, and uploads the same `scorers_data` shape used
+by the eval DB.
 ```python
 from typing import Any
 from openai import OpenAI
+from freesolo.datasets import TaskExample
 from freesolo.environments import (
     Environment,
     EnvironmentGeneration,
     RewardMetric,
     RewardResult,
-    TaskExample,
 )
 from freesolo.evaluation import EvaluationClient
-class ContractEnvironment(Environment):
+class PromptEnvironment(Environment):
     def build_prompt_messages(
         self,
         example: TaskExample,
-        contract_text: str,
+        prompt_text: str,
     ):
         return [
-            {"role": "system", "content": contract_text},
+            {"role": "system", "content": prompt_text},
             {"role": "user", "content": example.task},
         ]
@@ -359,7 +315,6 @@ from typing import Any
 from openai import OpenAI
-from freesolo import instrument_openai
 from freesolo.evaluation import CustomScorer, EvaluationClient, NumericResponse
@@ -403,7 +358,7 @@ class CorrectnessJudge(CustomScorer[NumericResponse]):
         )
-judge_client = instrument_openai(OpenAI())
+judge_client = OpenAI()
 results = EvaluationClient().run(
     name="support-agent-correctness",
@@ -434,11 +389,4 @@ judge = HostedJudgeClient(api_key="YOUR_OPENROUTER_API_KEY")
 scorer = ReferenceCorrectnessScorer(client=judge)
 ```
-Tracing is available through namespaced helpers:
-```python
-from freesolo.tracing import start_trace
-with start_trace("support-agent-run"):
-    ...
-```
+Tracing is available through the OpenTelemetry helpers in `freesolo.tracing`.
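
Note on the PKG-INFO metadata hunk above: the `Requires-Dist` changes can be summarized programmatically. This is a minimal sketch, assuming the metadata is read as RFC 822-style headers with the standard library's `email` parser. The helper name `requires_dist` is hypothetical, and `OLD` and `NEW` are short excerpts of the two header blocks, not the full metadata.

```python
from email.parser import HeaderParser


def requires_dist(pkg_info: str) -> set[str]:
    """Collect the Requires-Dist entries from core-metadata text."""
    headers = HeaderParser().parsestr(pkg_info)
    return set(headers.get_all("Requires-Dist", []))


# Excerpts only: a few representative lines from each version's PKG-INFO.
OLD = """\
Metadata-Version: 2.4
Name: freesolo
Version: 0.2.3
Requires-Dist: httpx>=0.27.0
Requires-Dist: wandb>=0.17.0
Requires-Dist: anthropic>=0.40.0; extra == 'examples'
Requires-Dist: gepa>=0.1.1; extra == 'gepa'
"""

NEW = """\
Metadata-Version: 2.4
Name: freesolo
Version: 0.2.4
Requires-Dist: gepa>=0.1.1
Requires-Dist: httpx>=0.27.0
Requires-Dist: jsonschema>=4.0.0
Requires-Dist: wandb>=0.17.0
"""

added = sorted(requires_dist(NEW) - requires_dist(OLD))
removed = sorted(requires_dist(OLD) - requires_dist(NEW))
print("added:", added)
print("removed:", removed)
```

On these excerpts, `gepa` shows up as both removed (as an extra) and added (as an unconditional requirement), which matches the hunk: the `gepa` extra is dropped and the package becomes a core dependency.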
