PyPI - veep - Versions diffs - 0.4.2__tar.gz - Mend

veep 0.4.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (22)

veep-0.4.2/.github/workflows/publish.yml +31 -0
veep-0.4.2/.gitignore +13 -0
veep-0.4.2/API_QUESTIONS.md +34 -0
veep-0.4.2/CLAUDE.md +129 -0
veep-0.4.2/LICENSE +21 -0
veep-0.4.2/PKG-INFO +313 -0
veep-0.4.2/README.md +279 -0
veep-0.4.2/pyproject.toml +62 -0
veep-0.4.2/src/veep/__init__.py +112 -0
veep-0.4.2/src/veep/auth.py +211 -0
veep-0.4.2/src/veep/client.py +456 -0
veep-0.4.2/src/veep/collections.py +210 -0
veep-0.4.2/src/veep/exceptions.py +140 -0
veep-0.4.2/src/veep/models.py +153 -0
veep-0.4.2/src/veep/schema.py +177 -0
veep-0.4.2/src/veep/vectors.py +892 -0
veep-0.4.2/tests/__init__.py +0 -0
veep-0.4.2/tests/test_client.py +166 -0
veep-0.4.2/tests/test_collections.py +273 -0
veep-0.4.2/tests/test_models.py +72 -0
veep-0.4.2/tests/test_schema.py +114 -0
veep-0.4.2/tests/test_vectors.py +800 -0

veep-0.4.2/.github/workflows/publish.yml ADDED

@@ -0,0 +1,31 @@
+name: Publish to PyPI
+on:
+  push:
+    tags:
+      - "v*"
+permissions:
+  contents: read
+jobs:
+  publish:
+    runs-on: ubuntu-latest
+    environment: pypi
+    permissions:
+      id-token: write
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Install build tools
+        run: pip install build
+      - name: Build package
+        run: python -m build
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
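The workflow fires on any pushed tag that matches `v*`. GitHub documents `*` in filter patterns as matching zero or more characters except `/`, so the pattern is broader than version tags alone. The following minimal sketch mirrors that rule for this one pattern with a regular expression. The helper name `tag_triggers_publish` is hypothetical and is not part of the workflow.

```python
import re

# GitHub Actions filter patterns: "*" matches zero or more characters
# but never "/". This regex mirrors only the single pattern "v*".
_PATTERN = re.compile(r"v[^/]*")


def tag_triggers_publish(tag: str) -> bool:
    """Return True if pushing `tag` would match the workflow's "v*" filter."""
    return _PATTERN.fullmatch(tag) is not None


for tag in ["v0.4.2", "v1", "vnext", "0.4.2", "release/v1"]:
    print(f"{tag!r}: {tag_triggers_publish(tag)}")
```

Note that `vnext` also matches. A stricter filter such as `v[0-9]*` would restrict publishing to tags with a digit right after the `v`.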

veep-0.4.2/.gitignore ADDED

@@ -0,0 +1,13 @@
+__pycache__/
+*.pyc
+*.pyo
+*.egg-info/
+dist/
+build/
+.eggs/
+*.egg
+.pytest_cache/
+.venv/
+venv/
+.env
+.pypirc

veep-0.4.2/API_QUESTIONS.md ADDED

@@ -0,0 +1,34 @@
+# API Questions & Gaps
+Issues discovered while building the veep Python SDK.
+## Resolved
+### 1. Individual vector fetch — RESOLVED
+`GET /api/v1/collections/:collection/vectors/:key` implemented across the full stack:
+worker key scan, coordinator fan-out/aggregation, consumer-site proxy, SDK `vectors.fetch()`.
+Metadata included by default via artifact server lookup.
+### 3. Multipart vs raw binary upload — RESOLVED
+Fixed in SDK v0.2.0 to use proper multipart/form-data.
+### 4. Schema auto-detection timing — RESOLVED
+`collections.create()` now accepts `id_field` and `vector_field`. When provided, schema is
+pre-confirmed at creation time — no polling, no manual confirmation step. Auto-detection
+only triggers when fields are omitted or don't match the uploaded data.
+## Open
+### 2. Individual vector delete
+There is no endpoint to delete individual vectors by key. Deletion operates at the file level
+only (`DELETE /api/v1/collections/:collection/files/:filename`). If a user wants to remove one
+vector from a 10,000-vector file, they must delete the entire file and re-upload without it.
+**Impact**: The SDK exposes `vectors.delete(collection, filename)` which deletes files, not
+vectors. This is semantically confusing for users who think in terms of vectors, not files.
+This is part of the larger vector-addressable abstraction work (server-v28q).
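Until a per-vector delete endpoint exists, the workaround is to rewrite the file locally without the unwanted vector and then re-upload it. The following minimal sketch handles a JSONL file. It assumes one JSON object per line, with the key stored under an `id` field. Both the helper `drop_vector` and that field name are hypothetical.

```python
import json
from pathlib import Path


def drop_vector(src: Path, dst: Path, key: str, id_field: str = "id") -> int:
    """Copy a JSONL file from src to dst, skipping rows whose id equals key.

    Returns the number of rows removed.
    """
    removed = 0
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            if not line.strip():
                continue  # ignore blank lines
            row = json.loads(line)
            if row.get(id_field) == key:
                removed += 1
                continue
            fout.write(json.dumps(row) + "\n")
    return removed
```

The rewritten file can then be uploaded under the original filename with `vp.vectors.replace(...)`. This assumes, as the endpoint shape `/files/{file}` suggests, that the filename identifies the file server-side.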
+### 5. Collection describe response shape
+`GET /api/v1/collections/:collection` proxies raw coordinator response without normalizing
+field names. The SDK does fragile multi-field parsing (`collection_name` vs `name` vs
+`collection`). Should normalize on the consumer-site side (server-rf2u).
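Until the response is normalized server-side, a client has to accept whichever name key the coordinator returns. The following minimal sketch shows that fallback. It checks the three field names listed above in an assumed order of preference. The helper name is hypothetical, and this is not the SDK's actual parsing code.

```python
from __future__ import annotations

from typing import Any, Optional

# Field names seen in raw coordinator responses; the order of preference
# here is an assumption.
_NAME_KEYS = ("collection_name", "name", "collection")


def collection_name(payload: dict[str, Any]) -> Optional[str]:
    """Return the first non-empty string found under a known name key."""
    for key in _NAME_KEYS:
        value = payload.get(key)
        if isinstance(value, str) and value:
            return value
    return None
```

Any new field name the backend introduces means another entry in `_NAME_KEYS`. Normalizing once on the consumer-site side avoids that maintenance.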

veep-0.4.2/CLAUDE.md ADDED

@@ -0,0 +1,129 @@
+# veep — Python SDK for Vector Panda
+## Project Overview
+Python client library for the Vector Panda vector search API. Published to PyPI as `veep`.
+This lives under `/home/mike/server/veep/` in the monorepo.
+## Architecture
+- **Thin HTTP wrapper** — all state lives server-side. The SDK makes REST calls and returns typed results.
+- **Single host** — all traffic routes through consumer-site (.120) at `/api/v1/*` endpoints.
+- **Resource-based API** — `client.collections.*`, `client.vectors.*`, `client.schema.*` sub-resources.
+## Public API
+```python
+from veep import VP
+# Authentication (device flow — opens browser for OAuth)
+vp = VP.login()              # interactive, saves to ~/.veep/credentials.json
+vp = VP.from_creds()         # reuse saved credentials
+vp = VP(api_key="...")       # explicit key
+vp.save()                    # persist for later
+# Collections
+vp.collections.create("name", tier="hot")
+vp.collections.get("name")
+vp.collections.list()
+vp.collections.delete("name")
+vp.collections.status("name")
+# Vectors
+vp.vectors.upsert("collection", "file.parquet")
+vp.vectors.replace("collection", "file.parquet")
+vp.vectors.query("collection", vector=[...])
+vp.vectors.query_batch([...])
+vp.vectors.delete("collection", "filename")
+vp.vectors.list_files("collection")
+# Schema
+vp.schema.get("collection")
+vp.schema.confirm("collection", id_field="id", vector_field="emb")
+# Health
+vp.ping()
+```
+## API Endpoints Used
+All traffic goes through consumer-site (.120):
+| SDK Method | HTTP | Endpoint |
+|------------|------|----------|
+| `collections.create()` | POST | `/api/v1/collections` |
+| `collections.list()` | GET | `/api/v1/collections` |
+| `collections.get()` | GET | `/api/v1/collections/{name}` |
+| `collections.status()` | GET | `/api/v1/collections/{name}/status` |
+| `collections.delete()` | DELETE | `/api/v1/collections/{name}` |
+| `vectors.upsert()` | POST | `/api/v1/collections/{col}/files/{file}` |
+| `vectors.replace()` | PUT | `/api/v1/collections/{col}/files/{file}` |
+| `vectors.query()` | POST | `/api/v1/query` |
+| `vectors.query_batch()` | POST | `/api/v1/query/batch` |
+| `vectors.delete()` | DELETE | `/api/v1/collections/{col}/files/{file}` |
+| `vectors.list_files()` | GET | `/api/v1/collections/{col}/files` |
+| `schema.get()` | GET | `/api/v1/collections/{col}/schema` |
+| `schema.confirm()` | POST | `/api/v1/collections/{col}/schema/confirm` |
+| `ping()` | GET | `/api/v1/health` |
+| `VP.login()` | POST | `/api/v1/auth/device` + `/api/v1/auth/device/token` |
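Several routes embed the collection and file names as path segments. Names that contain spaces or slashes must therefore be percent-encoded. The following minimal sketch builds those URLs with the standard library. The helper `file_url` and the example host are hypothetical, and only the route shape comes from the table.

```python
from urllib.parse import quote


def file_url(host: str, collection: str, filename: str) -> str:
    """Build /api/v1/collections/{col}/files/{file}, encoding each segment."""
    col = quote(collection, safe="")  # safe="" also encodes "/"
    name = quote(filename, safe="")
    return f"{host.rstrip('/')}/api/v1/collections/{col}/files/{name}"


print(file_url("https://api.example.com", "products", "my embeddings.parquet"))
```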
+Auth: Bearer token in Authorization header for all endpoints (except health and device auth).
+Credentials persist to `~/.veep/credentials.json` (chmod 600).
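The following minimal sketch covers the two auth details above: persisting credentials with owner-only permissions (chmod 600) and sending the key as a Bearer token. The JSON layout (`{"api_key": ...}`) and the helper names are assumptions. They do not describe the SDK's actual file format.

```python
import json
import os
from pathlib import Path


def save_credentials(path: Path, api_key: str) -> None:
    """Write credentials readable and writable by the owner only (0o600)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # A newly created file gets 0o600 (minus umask) from the start,
    # so it is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump({"api_key": api_key}, f)
    os.chmod(path, 0o600)  # tighten permissions if the file already existed


def auth_headers(api_key: str) -> dict:
    """Headers for authenticated endpoints (all except health and device auth)."""
    return {"Authorization": f"Bearer {api_key}"}
```

In practice the path would be `Path.home() / ".veep" / "credentials.json"`.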
+## Package Structure
+```
+veep/
+├── src/veep/
+│   ├── __init__.py       # Public API exports
+│   ├── client.py         # VP class, HTTP engine, login()/from_creds()
+│   ├── auth.py           # Device auth flow, credential persistence
+│   ├── collections.py    # Collections sub-resource
+│   ├── vectors.py        # Vectors sub-resource
+│   ├── schema.py         # Schema sub-resource
+│   ├── models.py         # Result, Collection, FileInfo, etc.
+│   └── exceptions.py     # Full exception hierarchy
+├── tests/
+│   ├── test_client.py
+│   ├── test_collections.py
+│   ├── test_vectors.py
+│   ├── test_schema.py
+│   └── test_models.py
+├── .github/workflows/publish.yml
+├── pyproject.toml
+├── README.md
+├── API_QUESTIONS.md
+└── CLAUDE.md
+```
+## Coding Rules
+- **Python 3.9+** minimum.
+- **Minimal dependencies**: `requests` for HTTP. No pyarrow dependency (removed — files uploaded as-is).
+- **Type hints** on all public methods. Use `from __future__ import annotations`.
+- **Docstrings** on all public classes and methods (Google style).
+- **Exception hierarchy**: `VeepError` base, with specific subclasses for every failure mode.
+- **Verbose mode**: `verbose=True` logs in plain English via Python logging.
+## Testing
+```bash
+pip install -e ".[dev]"
+pytest
+```
+## Build & Publish
+GitHub Actions publishes to PyPI on version tag push (`v*`).
+Manual:
+```bash
+pip install build twine
+python -m build
+twine upload dist/*
+```
+## Commit Rules
+- Prefix: `veep:` for commits touching this directory.

veep-0.4.2/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Vector Panda
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

veep-0.4.2/PKG-INFO ADDED

@@ -0,0 +1,313 @@
+Metadata-Version: 2.4
+Name: veep
+Version: 0.4.2
+Summary: Python SDK for Vector Panda vector search
+Project-URL: Homepage, https://vectorpanda.com
+Project-URL: Documentation, https://github.com/vectorpanda/veep
+Project-URL: Repository, https://github.com/vectorpanda/veep
+Project-URL: Issues, https://github.com/vectorpanda/veep/issues
+Author-email: Vector Panda <hello@vectorpanda.com>
+License-Expression: MIT
+License-File: LICENSE
+Keywords: ai,embeddings,search,similarity,vector
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Database
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Requires-Python: >=3.9
+Requires-Dist: requests>=2.28
+Provides-Extra: dev
+Requires-Dist: pytest>=7.0; extra == 'dev'
+Requires-Dist: responses>=0.23; extra == 'dev'
+Requires-Dist: ruff>=0.6; extra == 'dev'
+Provides-Extra: pandas
+Requires-Dist: pandas>=1.5; extra == 'pandas'
+Requires-Dist: pyarrow>=10.0; extra == 'pandas'
+Description-Content-Type: text/markdown
+# veep — Python SDK for Vector Panda
+Search your vectors in five minutes.
+## Install
+```bash
+pip install veep
+```
+Requires Python 3.9+. The only mandatory dependency is `requests`. NumPy, pandas, and PyArrow are optional. Install them only if you use the modes that need them; `pip install "veep[pandas]"` pulls in pandas and PyArrow.
+## Quickstart
+Five steps. Copy-paste the whole block and it runs end-to-end. The example generates test data with NumPy, so run `pip install numpy` first.
+```python
+import numpy as np
+from veep import VP
+# 1. Sign in. Opens your browser for Google or GitHub OAuth.
+#    Credentials save to ~/.veep/credentials.json so future runs skip this step.
+vp = VP.login()
+# 2. Create a collection.
+vp.collections.create("quickstart", tier="hot")
+# 3. Upload 100 random 64-dim vectors.
+rng = np.random.default_rng(42)
+vectors = [
+    {"id": f"item_{i}", "vector": rng.standard_normal(64).tolist()}
+    for i in range(100)
+]
+vp.vectors.upsert("quickstart", vectors=vectors)
+# 4. Query the 5 nearest vectors to a random target.
+query_vec = rng.standard_normal(64).tolist()
+results = vp.vectors.query("quickstart", vector=query_vec, top_k=5)
+# 5. Print results.
+for r in results:
+    print(f"{r.key}  score={r.score:.4f}")
+```
+That's it. You're searching vectors. `upsert` blocks until the collection is queryable, so step 4 always sees the data from step 3 — no manual polling.
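The quickstart uses NumPy only to generate test data. If NumPy isn't installed, the standard library can build an equivalent payload. The values differ, but the shape is the same: 100 items, 64 dimensions, and `id`/`vector` keys.

```python
import random

rng = random.Random(42)  # seeded so reruns produce the same vectors
vectors = [
    {"id": f"item_{i}", "vector": [rng.gauss(0.0, 1.0) for _ in range(64)]}
    for i in range(100)
]
query_vec = [rng.gauss(0.0, 1.0) for _ in range(64)]

# Then, exactly as in the quickstart:
# vp.vectors.upsert("quickstart", vectors=vectors)
# results = vp.vectors.query("quickstart", vector=query_vec, top_k=5)
```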
+## Upload Modes
+Pick whichever shape your data already has. All four take the same `upsert()` call:
+```python
+# Inline list of dicts (the quickstart shape — good for small batches)
+vp.vectors.upsert("col", vectors=[
+    {"id": "abc", "vector": [0.1, 0.2, ...], "metadata": {"color": "red"}},
+])
+# Parquet / CSV / JSONL file on disk
+vp.vectors.upsert("col", "embeddings.parquet")
+# pandas DataFrame with id + vector + optional metadata columns
+import pandas as pd
+df = pd.DataFrame({"id": ids, "vector": list(embeddings), "category": tags})
+vp.vectors.upsert("col", dataframe=df)
+# pyarrow Table — useful when you've already loaded with pyarrow
+import pyarrow.parquet as pq
+tbl = pq.read_table("embeddings.parquet")
+vp.vectors.upsert("col", table=tbl)
+```
+The file path is the bulk-upload path: it streams your file through a chunked protocol with no client-side RAM ceiling. The other three are convenience wrappers that serialize to Arrow and use the same backend.
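A chunked protocol means the client holds only one chunk in memory at a time, whatever the file size. The following minimal sketch shows that idea with a generator. The 8 MiB default and the helper name are illustrative assumptions, because veep's actual chunk size and wire protocol aren't documented here.

```python
from pathlib import Path
from typing import Iterator


def iter_chunks(path: Path, chunk_size: int = 8 * 1024 * 1024) -> Iterator[bytes]:
    """Yield a file's contents in chunks of at most chunk_size bytes."""
    with path.open("rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Memory use is bounded by `chunk_size` rather than by the file. Even a very large Parquet file read this way needs only one chunk resident at a time.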
+## Authentication
+Four ways to connect. Pick whichever fits your workflow:
+```python
+# Option 1: Interactive login (recommended for first use)
+# Opens your browser for Google or GitHub sign-in.
+# Saves credentials to ~/.veep/credentials.json automatically.
+vp = VP.login()
+# Option 2: Reuse saved credentials (no browser, no key needed)
+# Works after a previous login() call.
+vp = VP.from_creds()
+# Option 3: Explicit API key (CI/CD, headless environments)
+vp = VP(api_key="sk_live_...")
+# Option 4: Environment variable
+# export VEEP_API_KEY=sk_live_...
+vp = VP()
+```
+`login()` uses the same device authorization pattern as `gh auth login` — it works in terminals, Jupyter notebooks, and remote SSH sessions. The verification URL is clickable in notebooks.
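Device authorization, the pattern standardized in RFC 8628, works in three steps. The client requests a code, shows the user a verification URL, and then polls a token endpoint until the user approves. veep uses `/api/v1/auth/device` and `/api/v1/auth/device/token` for this. The following sketch shows only the generic polling loop. It abstracts the HTTP call as a `poll` callable and assumes RFC 8628's `authorization_pending`/`slow_down` error values, because veep's actual response fields aren't documented here.

```python
import time
from typing import Callable, Optional


def poll_for_token(
    poll: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until the user approves; return the access token, or None on timeout."""
    waited = 0.0
    while waited < timeout:
        resp = poll()
        if "access_token" in resp:
            return resp["access_token"]
        error = resp.get("error")
        if error == "slow_down":
            interval += 5.0  # RFC 8628: increase the polling interval by 5 seconds
        elif error != "authorization_pending":
            raise RuntimeError(f"device authorization failed: {error}")
        sleep(interval)
        waited += interval
    return None
```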
+```python
+# Full options
+vp = VP(
+    api_key="your_key",        # or set VEEP_API_KEY env var
+    host="https://...",         # optional, defaults to Vector Panda cloud
+    timeout=120,                # request timeout in seconds
+    verbose=True,               # log what the client is doing in plain English
+)
+# Save credentials for later
+vp.save()                       # writes to ~/.veep/credentials.json
+```
+## Collections
+```python
+# Create a collection (with schema for instant processing)
+col = vp.collections.create(
+    "products",
+    tier="hot",
+    id_field="product_id",
+    vector_field="embedding",
+)
+# Or create without schema (auto-detected from first upload)
+col = vp.collections.create("products", tier="hot")
+# List all collections
+for col in vp.collections.list():
+    count = col.vector_count if col.vector_count is not None else "—"
+    size = f"{col.storage_gb:.1f} GB" if col.storage_gb is not None else "—"
+    print(f"{col.name}: {count} vectors, {size}")
+# Get details about one collection
+col = vp.collections.get("products")
+print(col.dimension, col.status)
+# Check processing status
+status = vp.collections.status("products")  # "ready", "processing", "unknown", "error"
+# Delete a collection (permanent)
+vp.collections.delete("products")
+```
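Because `status()` returns one of four strings, waiting for a collection to finish processing is a small polling loop. The following minimal sketch takes the status call as a callable so that it stays testable. The helper name, interval, and timeout are hypothetical choices, not SDK defaults.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports "ready"; raise on "error" or timeout."""
    waited = 0.0
    while True:
        status = get_status()
        if status == "ready":
            return status
        if status == "error":
            raise RuntimeError("collection processing failed")
        # "processing" and "unknown" are treated as still in progress (an assumption).
        if waited >= timeout:
            raise TimeoutError(f"still {status!r} after {timeout} s")
        sleep(interval)
        waited += interval
```

Usage: `wait_until_ready(lambda: vp.collections.status("products"))`.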
+## Querying
+```python
+results = vp.vectors.query(
+    "products",
+    vector=[0.1, 0.2, ...],          # your query vector
+    top_k=10,                          # max results (default: 10)
+    min_score=0.7,                     # only return results with score >= this (cosine 0-1)
+    metric="cosine",                   # "cosine", "euclidean", "dot_product"
+    with_metadata=True,                # return metadata fields
+)
+for r in results:
+    print(f"{r.key}: {r.score:.4f} — {r.metadata}")
+# Batch queries (up to 100 at once)
+batch = vp.vectors.query_batch([
+    {"collection": "products", "vector": query_vec_1, "top_k": 5},
+    {"collection": "products", "vector": query_vec_2, "top_k": 5},
+])
+for query_results in batch:
+    print(f"Got {len(query_results)} results")
+# Fetch a single vector by key (the key from a query result)
+result = vp.vectors.fetch("products", "12345")
+if result.found:
+    print(f"Vector: {result.vector[:5]}...")
+    print(f"Metadata: {result.metadata}")
+```
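`query_batch()` accepts up to 100 queries per call, so larger workloads have to be split. The following minimal sketch chunks the request list and concatenates the results in order. `run_batch` stands in for `vp.vectors.query_batch`. The sketch assumes, as the example above suggests, that the call returns one result list per query in input order.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
MAX_BATCH = 100  # per-call limit stated in this README


def query_all(
    queries: Sequence[dict],
    run_batch: Callable[[List[dict]], List[T]],
    batch_size: int = MAX_BATCH,
) -> List[T]:
    """Run queries in batches of at most batch_size, preserving input order."""
    results: List[T] = []
    for start in range(0, len(queries), batch_size):
        results.extend(run_batch(list(queries[start:start + batch_size])))
    return results
```

Usage: `all_results = query_all(query_list, vp.vectors.query_batch)`.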
+## File Management
+```python
+# Replace an existing file (idempotent: same content = no-op)
+result = vp.vectors.replace("products", "product_embeddings.parquet")
+# List uploaded files
+for f in vp.vectors.list_files("products"):
+    print(f"{f.name}: {f.size} bytes, modified {f.modified}")
+# Delete an uploaded file
+vp.vectors.delete("products", "old_embeddings.parquet")
+```
+## Schema
+After uploading files, Vector Panda auto-detects which columns hold your vector keys and embeddings. You can inspect and confirm the schema:
+```python
+schema = vp.schema.get("products")
+print(schema.state)         # "analyzing" or "confirmed"
+print(schema.vector_field)  # e.g., "embedding"
+print(schema.id_field)      # e.g., "product_id"
+# Confirm or override the detected schema
+vp.schema.confirm("products", id_field="product_id", vector_field="embedding")
+```
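Vector Panda's detection logic isn't documented, but a simple heuristic over one sample row can illustrate the idea. It treats a list of numbers as the vector field and a field named `id` or ending in `_id` as the id field. This is a hypothetical sketch for choosing the values to pass to `schema.confirm()`. It is not the service's algorithm.

```python
from __future__ import annotations

from typing import Any, Optional, Tuple


def guess_schema(row: dict[str, Any]) -> Tuple[Optional[str], Optional[str]]:
    """Return (id_field, vector_field) guessed from one sample row."""
    vector_field = next(
        (
            k
            for k, v in row.items()
            if isinstance(v, list)
            and v
            and all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in v)
        ),
        None,
    )
    id_field = next((k for k in row if k == "id" or k.endswith("_id")), None)
    return id_field, vector_field


row = {"product_id": "p-1", "embedding": [0.1, 0.2, 0.3], "color": "red"}
print(guess_schema(row))  # ('product_id', 'embedding')
```

The guessed pair could then be passed on with `vp.schema.confirm("products", id_field=..., vector_field=...)`.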
+## Index Parameters
+For advanced use, pass index-specific parameters to queries:
+```python
+results = vp.vectors.query(
+    "products",
+    vector=query_vec,
+    use_index="pca",
+    index_params={"pca": {"reduced_dimensions": 64, "candidate_multiplier": 10}},
+)
+```
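The parameter names suggest a two-stage search. Candidates are first scored cheaply in a reduced space, the top `top_k * candidate_multiplier` are kept, and that shortlist is then re-ranked with the full vectors. That reading is an assumption, because the PCA index internals aren't documented here. The following minimal sketch shows the pattern. Truncating each vector to its first `reduced_dimensions` coordinates stands in for a real PCA projection.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_search(
    query: Sequence[float],
    data: Dict[str, Sequence[float]],
    top_k: int = 10,
    reduced_dimensions: int = 64,
    candidate_multiplier: int = 10,
) -> List[Tuple[str, float]]:
    # Stage 1: coarse scores in the reduced space (truncation stands in for PCA).
    q_small = query[:reduced_dimensions]
    coarse = sorted(
        data, key=lambda k: cosine(q_small, data[k][:reduced_dimensions]), reverse=True
    )
    candidates = coarse[: top_k * candidate_multiplier]
    # Stage 2: exact re-ranking of the shortlist with full-dimensional vectors.
    exact = [(k, cosine(query, data[k])) for k in candidates]
    exact.sort(key=lambda kv: kv[1], reverse=True)
    return exact[:top_k]
```

With the default `top_k=10` and the example's `candidate_multiplier` of 10, stage 2 would re-rank 100 candidates.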
+## Health Check
+```python
+if vp.ping():
+    print("Vector Panda is up")
+```
+## Verbose Mode
+Turn on `verbose=True` to see what the client is doing:
+```python
+vp = VP(api_key="...", verbose=True)
+vp.collections.list()
+# veep: Connected to https://api.vectorpanda.com
+# veep: Listing collections...
+# veep: Found 3 collection(s).
+```
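Verbose output goes through Python's `logging` module, so it can be routed like any other log, for example to a file. The following minimal sketch attaches a handler that reproduces the `veep: ...` line format above. The logger name `"veep"` is an assumption based on the convention of naming a library's logger after its package. If `verbose=True` already installs its own console handler, each line may appear twice.

```python
import logging
from typing import TextIO


def capture_veep_logs(stream: TextIO, level: int = logging.INFO) -> logging.Handler:
    """Attach a handler that writes 'veep: <message>' lines to stream."""
    logger = logging.getLogger("veep")  # assumed logger name
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("veep: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler
```

Usage: `capture_veep_logs(open("veep.log", "a"))`.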
+## Error Handling
+Every error tells you what happened and what to do about it:
+```python
+from veep import VP
+from veep.exceptions import (
+    CollectionNotFoundError,
+    CollectionAlreadyExistsError,
+    CollectionNotReadyError,
+    AuthError,
+    ValidationError,
+)
+vp = VP.from_creds()  # or VP(api_key="...")
+try:
+    vp.collections.get("nonexistent")
+except CollectionNotFoundError as e:
+    print(e)
+    # Collection 'nonexistent' not found.
+    # Use vp.collections.list() to see available collections.
+```
+| Exception | When |
+|-----------|------|
+| `AuthError` | Invalid or missing API key |
+| `ValidationError` | Bad parameter (name, vector, etc.) |
+| `CollectionNotFoundError` | Collection doesn't exist |
+| `CollectionAlreadyExistsError` | Collection already exists |
+| `CollectionNotReadyError` | Collection is still ingesting; retry shortly |
+| `UploadError` | File not found or unreadable |
+| `FileAlreadyExistsError` | File exists (use `replace`) |
+| `QueryError` | Query service unavailable |
+| `TimeoutError` | Request timed out |
+| `ServerError` | Unexpected server error |
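`CollectionNotReadyError` is documented as transient ("retry shortly"), which makes it a natural target for a bounded retry with exponential backoff. The following minimal sketch defines a stand-in exception class so that it runs without the SDK. In real code, replace that class with `from veep.exceptions import CollectionNotReadyError`. The attempt count and delays are arbitrary choices.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CollectionNotReadyError(Exception):
    """Stand-in for veep.exceptions.CollectionNotReadyError."""


def retry_not_ready(
    fn: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff while the collection is ingesting."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        try:
            return fn()
        except CollectionNotReadyError:
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ... with the defaults
    return fn()  # final attempt: let any error propagate
```

Usage: `results = retry_not_ready(lambda: vp.vectors.query("products", vector=query_vec))`.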
+## Configuration
+| Environment Variable | Description |
+|---------------------|-------------|
+| `VEEP_API_KEY` | Default API key |
+| `VEEP_HOST` | Default API host |
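The following minimal sketch resolves the client configuration from explicit arguments and the two environment variables above. Two details are assumptions. Explicit arguments take precedence over the environment. The default host comes from the verbose-mode output earlier in this README, not from the SDK source.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_HOST = "https://api.vectorpanda.com"  # as shown in the verbose-mode example


def resolve_config(
    api_key: Optional[str] = None,
    host: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> Tuple[str, str]:
    """Return (api_key, host); explicit arguments win over VEEP_* variables."""
    key = api_key or environ.get("VEEP_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key=... or set VEEP_API_KEY")
    return key, host or environ.get("VEEP_HOST") or DEFAULT_HOST
```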
+## Beta Status
+Vector Panda is in private beta. `pip install veep` works for everyone, but creating an account currently requires an invite — request one at [vectorpanda.com](https://vectorpanda.com). If you already have an API key, everything in this README is live.
+## License
+MIT