veep 0.4.2__tar.gz

veep-0.4.2/.github/workflows/publish.yml ADDED
@@ -0,0 +1,31 @@
1
+ name: Publish to PyPI
2
+
3
+ on:
4
+ push:
5
+ tags:
6
+ - "v*"
7
+
8
+ permissions:
9
+ contents: read
10
+
11
+ jobs:
12
+ publish:
13
+ runs-on: ubuntu-latest
14
+ environment: pypi
15
+ permissions:
16
+ id-token: write
17
+ steps:
18
+ - uses: actions/checkout@v4
19
+
20
+ - uses: actions/setup-python@v5
21
+ with:
22
+ python-version: "3.12"
23
+
24
+ - name: Install build tools
25
+ run: pip install build
26
+
27
+ - name: Build package
28
+ run: python -m build
29
+
30
+ - name: Publish to PyPI
31
+ uses: pypa/gh-action-pypi-publish@release/v1
veep-0.4.2/.gitignore ADDED
@@ -0,0 +1,13 @@
1
+ __pycache__/
2
+ *.pyc
3
+ *.pyo
4
+ *.egg-info/
5
+ dist/
6
+ build/
7
+ .eggs/
8
+ *.egg
9
+ .pytest_cache/
10
+ .venv/
11
+ venv/
12
+ .env
13
+ .pypirc
veep-0.4.2/API_QUESTIONS.md ADDED
@@ -0,0 +1,34 @@
1
+ # API Questions & Gaps
2
+
3
+ Issues discovered while building the veep Python SDK.
4
+
5
+ ## Resolved
6
+
7
+ ### 1. Individual vector fetch — RESOLVED
8
+ `GET /api/v1/collections/:collection/vectors/:key` implemented across the full stack:
9
+ worker key scan, coordinator fan-out/aggregation, consumer-site proxy, SDK `vectors.fetch()`.
10
+ Metadata included by default via artifact server lookup.
11
+
12
+ ### 3. Multipart vs raw binary upload — RESOLVED
13
+ Fixed in SDK v0.2.0 to use proper multipart/form-data.
14
+
15
+ ### 4. Schema auto-detection timing — RESOLVED
16
+ `collections.create()` now accepts `id_field` and `vector_field`. When provided, schema is
17
+ pre-confirmed at creation time — no polling, no manual confirmation step. Auto-detection
18
+ only triggers when fields are omitted or don't match the uploaded data.
19
+
20
+ ## Open
21
+
22
+ ### 2. Individual vector delete
23
+ There is no endpoint to delete individual vectors by key. Deletion operates at the file level
24
+ only (`DELETE /api/v1/collections/:collection/files/:filename`). If a user wants to remove one
25
+ vector from a 10,000-vector file, they must delete the entire file and re-upload without it.
26
+
27
+ **Impact**: The SDK exposes `vectors.delete(collection, filename)` which deletes files, not
28
+ vectors. This is semantically confusing for users who think in terms of vectors, not files.
29
+ This is part of the larger vector-addressable abstraction work (server-v28q).
30
+
31
+ ### 5. Collection describe response shape
32
+ `GET /api/v1/collections/:collection` proxies the raw coordinator response without normalizing
33
+ field names, so the SDK falls back on fragile multi-field parsing (`collection_name` vs `name` vs
34
+ `collection`). Field names should be normalized on the consumer-site side (server-rf2u).
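The multi-field parsing described in open item 5 can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual code: the function names and the order in which the three keys are tried are assumptions; only the key names `collection_name`, `name`, and `collection` come from the note above.

```python
from __future__ import annotations

from typing import Any, Mapping, Optional

# Keys named in the note above; the preference order is an assumption.
NAME_KEYS = ("name", "collection_name", "collection")


def extract_collection_name(payload: Mapping[str, Any]) -> Optional[str]:
    """Return the first non-empty string found under any known name key."""
    for key in NAME_KEYS:
        value = payload.get(key)
        if isinstance(value, str) and value:
            return value
    return None


def normalize_describe(payload: Mapping[str, Any]) -> dict[str, Any]:
    """Copy the payload, collapsing the name variants into one ``name`` field."""
    name = extract_collection_name(payload)
    if name is None:
        raise ValueError("describe response carries no recognizable collection name")
    # Drop every variant, then store the resolved value under a single key.
    normalized = {k: v for k, v in payload.items() if k not in NAME_KEYS}
    normalized["name"] = name
    return normalized
```

If the consumer-site proxy applied a step like `normalize_describe` before responding, the SDK could read `name` directly and drop the fallback chain.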
veep-0.4.2/CLAUDE.md ADDED
@@ -0,0 +1,129 @@
1
+ # veep — Python SDK for Vector Panda
2
+
3
+ ## Project Overview
4
+
5
+ Python client library for the Vector Panda vector search API. Published to PyPI as `veep`.
6
+
7
+ This lives under `/home/mike/server/veep/` in the monorepo.
8
+
9
+ ## Architecture
10
+
11
+ - **Thin HTTP wrapper** — all state lives server-side. The SDK makes REST calls and returns typed results.
12
+ - **Single host** — all traffic routes through consumer-site (.120) at `/api/v1/*` endpoints.
13
+ - **Resource-based API** — `client.collections.*`, `client.vectors.*`, `client.schema.*` sub-resources.
14
+
15
+ ## Public API
16
+
17
+ ```python
18
+ from veep import VP
19
+
20
+ # Authentication (device flow — opens browser for OAuth)
21
+ vp = VP.login() # interactive, saves to ~/.veep/credentials.json
22
+ vp = VP.from_creds() # reuse saved credentials
23
+ vp = VP(api_key="...") # explicit key
24
+ vp.save() # persist for later
25
+
26
+ # Collections
27
+ vp.collections.create("name", tier="hot")
28
+ vp.collections.get("name")
29
+ vp.collections.list()
30
+ vp.collections.delete("name")
31
+ vp.collections.status("name")
32
+
33
+ # Vectors
34
+ vp.vectors.upsert("collection", "file.parquet")
35
+ vp.vectors.replace("collection", "file.parquet")
36
+ vp.vectors.query("collection", vector=[...])
37
+ vp.vectors.query_batch([...])
38
+ vp.vectors.delete("collection", "filename")
39
+ vp.vectors.list_files("collection")
40
+
41
+ # Schema
42
+ vp.schema.get("collection")
43
+ vp.schema.confirm("collection", id_field="id", vector_field="emb")
44
+
45
+ # Health
46
+ vp.ping()
47
+ ```
48
+
49
+ ## API Endpoints Used
50
+
51
+ All traffic goes through consumer-site (.120):
52
+
53
+ | SDK Method | HTTP | Endpoint |
54
+ |------------|------|----------|
55
+ | `collections.create()` | POST | `/api/v1/collections` |
56
+ | `collections.list()` | GET | `/api/v1/collections` |
57
+ | `collections.get()` | GET | `/api/v1/collections/{name}` |
58
+ | `collections.status()` | GET | `/api/v1/collections/{name}/status` |
59
+ | `collections.delete()` | DELETE | `/api/v1/collections/{name}` |
60
+ | `vectors.upsert()` | POST | `/api/v1/collections/{col}/files/{file}` |
61
+ | `vectors.replace()` | PUT | `/api/v1/collections/{col}/files/{file}` |
62
+ | `vectors.query()` | POST | `/api/v1/query` |
63
+ | `vectors.query_batch()` | POST | `/api/v1/query/batch` |
64
+ | `vectors.delete()` | DELETE | `/api/v1/collections/{col}/files/{file}` |
65
+ | `vectors.list_files()` | GET | `/api/v1/collections/{col}/files` |
66
+ | `schema.get()` | GET | `/api/v1/collections/{col}/schema` |
67
+ | `schema.confirm()` | POST | `/api/v1/collections/{col}/schema/confirm` |
68
+ | `ping()` | GET | `/api/v1/health` |
69
+ | `VP.login()` | POST | `/api/v1/auth/device` + `/api/v1/auth/device/token` |
70
+
71
+ Auth: Bearer token in Authorization header for all endpoints (except health and device auth).
72
+ Credentials persist to `~/.veep/credentials.json` (chmod 600).
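As a rough illustration of the two statements above (Bearer authentication and a credentials file restricted to mode 600), the following sketch uses only the standard library. The helper names and the `api_key` JSON field are assumptions made for this example; the real layout of `credentials.json` is not shown in this diff.

```python
from __future__ import annotations

import json
import os
from pathlib import Path


def auth_headers(api_key: str) -> dict[str, str]:
    """Build the Authorization header used for authenticated endpoints."""
    return {"Authorization": f"Bearer {api_key}"}


def save_credentials(api_key: str, path: Path) -> None:
    """Write the key as JSON, readable and writable by the owner only."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with 0o600 from the start so it is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"api_key": api_key}, fh)  # field name is hypothetical
    # Tighten permissions in case the file already existed with a wider mode.
    os.chmod(path, 0o600)


def load_credentials(path: Path) -> str:
    """Read the key back from the JSON file written by save_credentials."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)["api_key"]
```

A caller would typically pass `Path.home() / ".veep" / "credentials.json"` as the path.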
73
+
74
+ ## Package Structure
75
+
76
+ ```
77
+ veep/
78
+ ├── src/veep/
79
+ │ ├── __init__.py # Public API exports
80
+ │ ├── client.py # VP class, HTTP engine, login()/from_creds()
81
+ │ ├── auth.py # Device auth flow, credential persistence
82
+ │ ├── collections.py # Collections sub-resource
83
+ │ ├── vectors.py # Vectors sub-resource
84
+ │ ├── schema.py # Schema sub-resource
85
+ │ ├── models.py # Result, Collection, FileInfo, etc.
86
+ │ └── exceptions.py # Full exception hierarchy
87
+ ├── tests/
88
+ │ ├── test_client.py
89
+ │ ├── test_collections.py
90
+ │ ├── test_vectors.py
91
+ │ ├── test_schema.py
92
+ │ └── test_models.py
93
+ ├── .github/workflows/publish.yml
94
+ ├── pyproject.toml
95
+ ├── README.md
96
+ ├── API_QUESTIONS.md
97
+ └── CLAUDE.md
98
+ ```
99
+
100
+ ## Coding Rules
101
+
102
+ - **Python 3.9+** minimum.
103
+ - **Minimal dependencies**: `requests` for HTTP. No pyarrow dependency (removed — files uploaded as-is).
104
+ - **Type hints** on all public methods. Use `from __future__ import annotations`.
105
+ - **Docstrings** on all public classes and methods (Google style).
106
+ - **Exception hierarchy**: `VeepError` base, with specific subclasses for every failure mode.
107
+ - **Verbose mode**: `verbose=True` logs in plain English via Python logging.
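The exception-hierarchy rule above might look something like the sketch below. The class names are taken from the README's error table in this package; the status-code mapping and the `error_for_status` helper are hypothetical and only show how a single base class lets callers catch everything with `except VeepError`.

```python
from __future__ import annotations


class VeepError(Exception):
    """Base class for every error raised by the SDK."""


class AuthError(VeepError):
    """Invalid or missing API key."""


class ValidationError(VeepError):
    """A parameter failed validation."""


class CollectionNotFoundError(VeepError):
    """The named collection does not exist."""


class ServerError(VeepError):
    """Unexpected server-side failure."""


# Hypothetical mapping from HTTP status codes to exception classes.
_STATUS_TO_ERROR = {
    400: ValidationError,
    401: AuthError,
    403: AuthError,
    404: CollectionNotFoundError,
}


def error_for_status(status_code: int, message: str) -> VeepError:
    """Pick the most specific exception for a failed HTTP response."""
    if status_code >= 500:
        return ServerError(message)
    return _STATUS_TO_ERROR.get(status_code, VeepError)(message)
```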
108
+
109
+ ## Testing
110
+
111
+ ```bash
112
+ pip install -e ".[dev]"
113
+ pytest
114
+ ```
115
+
116
+ ## Build & Publish
117
+
118
+ GitHub Actions publishes to PyPI on version tag push (`v*`).
119
+
120
+ Manual:
121
+ ```bash
122
+ pip install build twine
123
+ python -m build
124
+ twine upload dist/*
125
+ ```
126
+
127
+ ## Commit Rules
128
+
129
+ - Prefix: `veep:` for commits touching this directory.
veep-0.4.2/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Vector Panda
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
veep-0.4.2/PKG-INFO ADDED
@@ -0,0 +1,313 @@
1
+ Metadata-Version: 2.4
2
+ Name: veep
3
+ Version: 0.4.2
4
+ Summary: Python SDK for Vector Panda vector search
5
+ Project-URL: Homepage, https://vectorpanda.com
6
+ Project-URL: Documentation, https://github.com/vectorpanda/veep
7
+ Project-URL: Repository, https://github.com/vectorpanda/veep
8
+ Project-URL: Issues, https://github.com/vectorpanda/veep/issues
9
+ Author-email: Vector Panda <hello@vectorpanda.com>
10
+ License-Expression: MIT
11
+ License-File: LICENSE
12
+ Keywords: ai,embeddings,search,similarity,vector
13
+ Classifier: Development Status :: 3 - Alpha
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: License :: OSI Approved :: MIT License
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Programming Language :: Python :: 3.9
18
+ Classifier: Programming Language :: Python :: 3.10
19
+ Classifier: Programming Language :: Python :: 3.11
20
+ Classifier: Programming Language :: Python :: 3.12
21
+ Classifier: Programming Language :: Python :: 3.13
22
+ Classifier: Topic :: Database
23
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
24
+ Requires-Python: >=3.9
25
+ Requires-Dist: requests>=2.28
26
+ Provides-Extra: dev
27
+ Requires-Dist: pytest>=7.0; extra == 'dev'
28
+ Requires-Dist: responses>=0.23; extra == 'dev'
29
+ Requires-Dist: ruff>=0.6; extra == 'dev'
30
+ Provides-Extra: pandas
31
+ Requires-Dist: pandas>=1.5; extra == 'pandas'
32
+ Requires-Dist: pyarrow>=10.0; extra == 'pandas'
33
+ Description-Content-Type: text/markdown
34
+
35
+ # veep — Python SDK for Vector Panda
36
+
37
+ Search your vectors in five minutes.
38
+
39
+ ## Install
40
+
41
+ ```bash
42
+ pip install veep
43
+ ```
44
+
45
+ Requires Python 3.9+. The only mandatory dependency is `requests`. NumPy, pandas, and PyArrow are optional — install them only if you use those upload modes.
46
+
47
+ ## Quickstart
48
+
49
+ Five steps. Copy-paste the whole block; it runs end-to-end once NumPy is installed (`pip install numpy`).
50
+
51
+ ```python
52
+ import numpy as np
53
+ from veep import VP
54
+
55
+ # 1. Sign in. Opens your browser for Google or GitHub OAuth.
56
+ # Credentials save to ~/.veep/credentials.json so future runs skip this step.
57
+ vp = VP.login()
58
+
59
+ # 2. Create a collection.
60
+ vp.collections.create("quickstart", tier="hot")
61
+
62
+ # 3. Upload 100 random 64-dim vectors.
63
+ rng = np.random.default_rng(42)
64
+ vectors = [
65
+ {"id": f"item_{i}", "vector": rng.standard_normal(64).tolist()}
66
+ for i in range(100)
67
+ ]
68
+ vp.vectors.upsert("quickstart", vectors=vectors)
69
+
70
+ # 4. Query the 5 nearest vectors to a random target.
71
+ query_vec = rng.standard_normal(64).tolist()
72
+ results = vp.vectors.query("quickstart", vector=query_vec, top_k=5)
73
+
74
+ # 5. Print results.
75
+ for r in results:
76
+ print(f"{r.key} score={r.score:.4f}")
77
+ ```
78
+
79
+ That's it. You're searching vectors. `upsert` blocks until the collection is queryable, so step 4 always sees the data from step 3 — no manual polling.
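The blocking behaviour of `upsert` described above is not shown in this diff. One plausible way to get the same effect by hand is a status-polling loop like the sketch below. The helper name, the timeout, and the polling interval are assumptions; the status strings are the ones the README lists for `collections.status()`.

```python
from __future__ import annotations

import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    timeout: float = 120.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll get_status() until it returns "ready", fails, or times out."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "ready":
            return
        if status == "error":
            raise RuntimeError("collection entered the error state")
        # "processing" and "unknown" are treated as "not ready yet".
        if clock() >= deadline:
            raise TimeoutError(f"collection still {status!r} after {timeout} s")
        sleep(interval)
```

With a client in hand this could be called as `wait_until_ready(lambda: vp.collections.status("quickstart"))`. The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.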
80
+
81
+ ## Upload Modes
82
+
83
+ Pick whichever shape your data already has. All four take the same `upsert()` call:
84
+
85
+ ```python
86
+ # Inline list of dicts (the quickstart shape — good for small batches)
87
+ vp.vectors.upsert("col", vectors=[
88
+ {"id": "abc", "vector": [0.1, 0.2, ...], "metadata": {"color": "red"}},
89
+ ])
90
+
91
+ # Parquet / CSV / JSONL file on disk
92
+ vp.vectors.upsert("col", "embeddings.parquet")
93
+
94
+ # pandas DataFrame with id + vector + optional metadata columns
95
+ import pandas as pd
96
+ df = pd.DataFrame({"id": ids, "vector": list(embeddings), "category": tags})
97
+ vp.vectors.upsert("col", dataframe=df)
98
+
99
+ # pyarrow Table — useful when you've already loaded with pyarrow
100
+ import pyarrow.parquet as pq
101
+ tbl = pq.read_table("embeddings.parquet")
102
+ vp.vectors.upsert("col", table=tbl)
103
+ ```
104
+
105
+ The file path is the bulk-upload path: it streams your file through a chunked protocol with no client-side RAM ceiling. The other three are convenience wrappers that serialize to Arrow and use the same backend.
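To illustrate why a chunked upload has no client-side RAM ceiling, here is a minimal sketch of reading a file in fixed-size pieces. The chunk size and the `iter_chunks` name are hypothetical, and the wire protocol itself is not described in this diff, so only the reading side is shown.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterator, Union

CHUNK_SIZE = 8 * 1024 * 1024  # hypothetical 8 MiB chunk size


def iter_chunks(
    path: Union[str, Path], chunk_size: int = CHUNK_SIZE
) -> Iterator[tuple[int, bytes]]:
    """Yield (index, bytes) pairs while holding one chunk in memory at a time."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    with open(path, "rb") as fh:
        index = 0
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            yield index, chunk
            index += 1
```

Because the generator never holds more than `chunk_size` bytes, peak memory stays constant regardless of file size. Each yielded chunk could then be sent as its own request.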
106
+
107
+ ## Authentication
108
+
109
+ Four ways to connect; pick whichever fits your workflow:
110
+
111
+ ```python
112
+ # Option 1: Interactive login (recommended for first use)
113
+ # Opens your browser for Google or GitHub sign-in.
114
+ # Saves credentials to ~/.veep/credentials.json automatically.
115
+ vp = VP.login()
116
+
117
+ # Option 2: Reuse saved credentials (no browser, no key needed)
118
+ # Works after a previous login() call.
119
+ vp = VP.from_creds()
120
+
121
+ # Option 3: Explicit API key (CI/CD, headless environments)
122
+ vp = VP(api_key="sk_live_...")
123
+
124
+ # Option 4: Environment variable
125
+ # export VEEP_API_KEY=sk_live_...
126
+ vp = VP()
127
+ ```
128
+
129
+ `login()` uses the same device authorization pattern as `gh auth login` — it works in terminals, Jupyter notebooks, and remote SSH sessions. The verification URL is clickable in notebooks.
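The device authorization pattern mentioned above generally works by showing the user a verification URL and then polling a token endpoint until sign-in completes. The sketch below shows only that polling loop, following the conventions of RFC 8628. The reply fields (`access_token`, `error`, `authorization_pending`, `slow_down`) are assumptions borrowed from that RFC; the actual replies from `/api/v1/auth/device/token` are not documented in this diff.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Mapping


def poll_for_token(
    request_token: Callable[[], Mapping[str, Any]],
    interval: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call request_token() repeatedly until it returns a token or a hard error."""
    for _ in range(max_attempts):
        reply = request_token()
        token = reply.get("access_token")
        if token:
            return token
        error = reply.get("error")
        if error == "slow_down":
            # RFC 8628: back off by five seconds when asked to slow down.
            interval += 5.0
        elif error != "authorization_pending":
            raise RuntimeError(f"device authorization failed: {error}")
        sleep(interval)
    raise TimeoutError("user did not complete sign-in in time")
```

In a real client, `request_token` would wrap a POST to the token endpoint. It is injected here so the loop can be tested without network access.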
130
+
131
+ ```python
132
+ # Full options
133
+ vp = VP(
134
+ api_key="your_key", # or set VEEP_API_KEY env var
135
+ host="https://...", # optional, defaults to Vector Panda cloud
136
+ timeout=120, # request timeout in seconds
137
+ verbose=True, # log what the client is doing in plain English
138
+ )
139
+
140
+ # Save credentials for later
141
+ vp.save() # writes to ~/.veep/credentials.json
142
+ ```
143
+
144
+ ## Collections
145
+
146
+ ```python
147
+ # Create a collection (with schema for instant processing)
148
+ col = vp.collections.create(
149
+ "products",
150
+ tier="hot",
151
+ id_field="product_id",
152
+ vector_field="embedding",
153
+ )
154
+
155
+ # Or create without schema (auto-detected from first upload)
156
+ col = vp.collections.create("products", tier="hot")
157
+
158
+ # List all collections
159
+ for col in vp.collections.list():
160
+ count = col.vector_count if col.vector_count is not None else "—"
161
+ size = f"{col.storage_gb:.1f} GB" if col.storage_gb is not None else "—"
162
+ print(f"{col.name}: {count} vectors, {size}")
163
+
164
+ # Get details about one collection
165
+ col = vp.collections.get("products")
166
+ print(col.dimension, col.status)
167
+
168
+ # Check processing status
169
+ status = vp.collections.status("products") # "ready", "processing", "unknown", "error"
170
+
171
+ # Delete a collection (permanent)
172
+ vp.collections.delete("products")
173
+ ```
174
+
175
+ ## Querying
176
+
177
+ ```python
178
+ results = vp.vectors.query(
179
+ "products",
180
+ vector=[0.1, 0.2, ...], # your query vector
181
+ top_k=10, # max results (default: 10)
182
+ min_score=0.7, # only return results with score >= this (cosine 0-1)
183
+ metric="cosine", # "cosine", "euclidean", "dot_product"
184
+ with_metadata=True, # return metadata fields
185
+ )
186
+
187
+ for r in results:
188
+ print(f"{r.key}: {r.score:.4f} — {r.metadata}")
189
+
190
+ # Batch queries (up to 100 at once)
191
+ batch = vp.vectors.query_batch([
192
+ {"collection": "products", "vector": query_vec_1, "top_k": 5},
193
+ {"collection": "products", "vector": query_vec_2, "top_k": 5},
194
+ ])
195
+ for query_results in batch:
196
+ print(f"Got {len(query_results)} results")
197
+
198
+ # Fetch a single vector by key (the key from a query result)
199
+ result = vp.vectors.fetch("products", "12345")
200
+ if result.found:
201
+ print(f"Vector: {result.vector[:5]}...")
202
+ print(f"Metadata: {result.metadata}")
203
+ ```
204
+
205
+ ## File Management
206
+
207
+ ```python
208
+ # Replace an existing file (idempotent: same content = no-op)
209
+ result = vp.vectors.replace("products", "product_embeddings.parquet")
210
+
211
+ # List uploaded files
212
+ for f in vp.vectors.list_files("products"):
213
+ print(f"{f.name}: {f.size} bytes, modified {f.modified}")
214
+
215
+ # Delete an uploaded file
216
+ vp.vectors.delete("products", "old_embeddings.parquet")
217
+ ```
218
+
219
+ ## Schema
220
+
221
+ After uploading files, Vector Panda auto-detects which columns hold your vector keys and embeddings. You can inspect and confirm the schema:
222
+
223
+ ```python
224
+ schema = vp.schema.get("products")
225
+ print(schema.state) # "analyzing" or "confirmed"
226
+ print(schema.vector_field) # e.g., "embedding"
227
+ print(schema.id_field) # e.g., "product_id"
228
+
229
+ # Confirm or override the detected schema
230
+ vp.schema.confirm("products", id_field="product_id", vector_field="embedding")
231
+ ```
232
+
233
+ ## Index Parameters
234
+
235
+ For advanced use, pass index-specific parameters to queries:
236
+
237
+ ```python
238
+ results = vp.vectors.query(
239
+ "products",
240
+ vector=query_vec,
241
+ use_index="pca",
242
+ index_params={"pca": {"reduced_dimensions": 64, "candidate_multiplier": 10}},
243
+ )
244
+ ```
245
+
246
+ ## Health Check
247
+
248
+ ```python
249
+ if vp.ping():
250
+ print("Vector Panda is up")
251
+ ```
252
+
253
+ ## Verbose Mode
254
+
255
+ Turn on `verbose=True` to see what the client is doing:
256
+
257
+ ```python
258
+ vp = VP(api_key="...", verbose=True)
259
+ vp.collections.list()
260
+ # veep: Connected to https://api.vectorpanda.com
261
+ # veep: Listing collections...
262
+ # veep: Found 3 collection(s).
263
+ ```
264
+
265
+ ## Error Handling
266
+
267
+ Every error tells you what happened and what to do about it:
268
+
269
+ ```python
270
+ from veep import VP
271
+ from veep.exceptions import (
272
+ CollectionNotFoundError,
273
+ CollectionAlreadyExistsError,
274
+ CollectionNotReadyError,
275
+ AuthError,
276
+ ValidationError,
277
+ )
278
+
279
+ try:
280
+ vp.collections.get("nonexistent")
281
+ except CollectionNotFoundError as e:
282
+ print(e)
283
+ # Collection 'nonexistent' not found.
284
+ # Use vp.collections.list() to see available collections.
285
+ ```
286
+
287
+ | Exception | When |
288
+ |-----------|------|
289
+ | `AuthError` | Invalid or missing API key |
290
+ | `ValidationError` | Bad parameter (name, vector, etc.) |
291
+ | `CollectionNotFoundError` | Collection doesn't exist |
292
+ | `CollectionAlreadyExistsError` | Collection already exists |
293
+ | `CollectionNotReadyError` | Collection is still ingesting; retry shortly |
294
+ | `UploadError` | File not found or unreadable |
295
+ | `FileAlreadyExistsError` | File exists (use `replace`) |
296
+ | `QueryError` | Query service unavailable |
297
+ | `TimeoutError` | Request timed out |
298
+ | `ServerError` | Unexpected server error |
299
+
300
+ ## Configuration
301
+
302
+ | Environment Variable | Description |
303
+ |---------------------|-------------|
304
+ | `VEEP_API_KEY` | Default API key |
305
+ | `VEEP_HOST` | Default API host |
306
+
307
+ ## Beta Status
308
+
309
+ Vector Panda is in private beta. `pip install veep` works for everyone, but creating an account currently requires an invite — request one at [vectorpanda.com](https://vectorpanda.com). If you already have an API key, everything in this README is live.
310
+
311
+ ## License
312
+
313
+ MIT