lsmvec-client 0.1.0__tar.gz

@@ -0,0 +1,147 @@
1
+ Metadata-Version: 2.4
2
+ Name: lsmvec-client
3
+ Version: 0.1.0
4
+ Summary: Python client for the LSM-Vec vector database HTTP API
5
+ Author: LSM-Vec
6
+ License: Apache-2.0
7
+ Requires-Python: >=3.8
8
+ Description-Content-Type: text/markdown
9
+ Provides-Extra: numpy
10
+ Requires-Dist: numpy>=1.20; extra == "numpy"
11
+ Provides-Extra: dev
12
+ Requires-Dist: pytest>=7; extra == "dev"
13
+
14
+ # lsmvec-client — Python client for LSM-Vec
15
+
16
+ A thin, dependency-free Python client for the LSM-Vec vector database
17
+ HTTP API. Uses only the Python standard library; `numpy` is optional
18
+ (a convenience for `bulk_build`).
19
+
20
+ ## Install
21
+
22
+ ```bash
23
+ pip install lsmvec-client # core, zero dependencies
24
+ pip install "lsmvec-client[numpy]"    # + numpy for bulk_build convenience (quoted so shells like zsh do not glob the brackets)
25
+ ```
26
+
27
+ Or run straight from the repo without installing:
28
+
29
+ ```python
30
+ import sys; sys.path.insert(0, "sdk/python")
31
+ from lsmvec_client import Client
32
+ ```
33
+
34
+ ## Quickstart
35
+
36
+ ```python
37
+ from lsmvec_client import Client
38
+
39
+ client = Client(
40
+ api_key="sk-live-...", # sent as Bearer token
41
+ base_url="https://api.lsmvec.com", # or http://localhost:8000 for local
42
+ )
43
+
44
+ # Insert with optional metadata
45
+ client.insert(1, [0.10, 0.20, 0.30, ...], metadata={"title": "intro"})
46
+
47
+ # Search
48
+ hits = client.search([0.10, 0.20, 0.30, ...], k=10)
49
+ for h in hits:
50
+ print(h.id, h.distance)
51
+
52
+ # Filtered search (metadata predicate, same syntax as the HTTP API)
53
+ hits = client.search(
54
+ [0.10, 0.20, ...], k=10,
55
+ filter={"$and": [{"category": {"$eq": "docs"}}]},
56
+ )
57
+ ```
58
+
59
+ ## Bulk build (initial load)
60
+
61
+ The fastest way to populate a **new, empty** database. Builds the
62
+ whole index in memory (RNN-Descent) and writes it in one pass —
63
+ 2-3× faster than per-vector inserts, with higher recall. Initial-load
64
+ only; the DB must be empty.
65
+
66
+ ```python
67
+ import numpy as np
68
+ from lsmvec_client import Client
69
+
70
+ client = Client(base_url="http://localhost:8000")
71
+
72
+ vectors = np.random.rand(100_000, 128).astype(np.float32)
73
+ report = client.bulk_build(vectors, threads=4)
74
+ print(report) # {'n': 100000, 'elapsed_ms': ..., 'vectors_per_sec': ..., 'threads': 4}
75
+ ```
76
+
77
+ `bulk_build` also accepts a plain list of equal-length float lists
78
+ (no numpy required):
79
+
80
+ ```python
81
+ rows = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], ...]
82
+ client.bulk_build(rows)
83
+ ```
84
+
85
+ For incremental updates on an already-built index, use `insert()` /
86
+ `upsert()` instead — `bulk_build` rejects a non-empty DB.
87
+
88
+ ## API
89
+
90
+ | Method | HTTP | Notes |
91
+ |---|---|---|
92
+ | `insert(id, vector, metadata=None)` | `POST /v1/vectors` | metadata is any JSON object |
93
+ | `upsert(id, vector)` | `PUT /v1/vectors/:id` | insert-or-replace vector |
94
+ | `get(id) -> dict` | `GET /v1/vectors/:id` | `{"id", "vector"}` |
95
+ | `delete(id)` | `DELETE /v1/vectors/:id` | |
96
+ | `get_payload(id) -> dict` | `GET /v1/vectors/:id/payload` | |
97
+ | `set_payload(id, payload)` | `PUT /v1/vectors/:id/payload` | replace |
98
+ | `merge_payload(id, partial)` | `PATCH /v1/vectors/:id/payload` | RFC 7396 merge |
99
+ | `search(vector, k=10, ef_search=None, filter=None) -> [SearchResult]` | `POST /v1/search` | |
100
+ | `bulk_build(vectors, dim=None, threads=0) -> dict` | `POST /v1/build/bulk` | empty DB only |
101
+ | `stats() -> dict` | `GET /v1/stats` | tombstone / bloom counters |
102
+ | `health() -> bool` | `GET /health` | |
103
+ | `ready() -> bool` | `GET /ready` | DB open + responsive |
104
+
105
+ `search` returns a list of `SearchResult(id: int, distance: float)`.
106
+
107
+ ## Errors
108
+
109
+ HTTP status codes map to typed exceptions (all subclass `LSMVecError`):
110
+
111
+ | Status | Exception |
112
+ |---|---|
113
+ | 400 | `InvalidArgument` |
114
+ | 401 | `Unauthorized` |
115
+ | 404 | `NotFound` |
116
+ | 413 | `PayloadTooLarge` |
117
+ | 429 | `RateLimited` |
118
+ | 5xx | `ServerError` |
+ | other status / network failure | `LSMVecError` (base class) |
119
+
120
+ ```python
121
+ from lsmvec_client import NotFound
122
+
123
+ try:
124
+ client.get(999999)
125
+ except NotFound:
126
+ print("no such id")
127
+ ```
128
+
129
+ ## Notes
130
+
131
+ - Vectors are stored with 8-bit scalar quantization (SQ8). `get()`
132
+ returns the dequantized vector, which differs from the input by
133
+ up to ~`range/255` per element. Distances and recall are computed
134
+ on the quantized form.
135
+ - `id` is a 64-bit unsigned integer.
136
+ - The client is synchronous and connection-per-request (stdlib
137
+ `urllib`). For high-throughput batch ingestion, prefer
138
+ `bulk_build` over a loop of `insert`.
139
+
140
+ ## Testing
141
+
142
+ Against a running server:
143
+
144
+ ```bash
145
+ LSMVEC_TEST_URL=http://localhost:8000 LSMVEC_TEST_DIM=8 \
146
+ python3 sdk/python/tests/test_client.py
147
+ ```
@@ -0,0 +1,134 @@
1
+ # lsmvec-client — Python client for LSM-Vec
2
+
3
+ A thin, dependency-free Python client for the LSM-Vec vector database
4
+ HTTP API. Uses only the Python standard library; `numpy` is optional
5
+ (a convenience for `bulk_build`).
6
+
7
+ ## Install
8
+
9
+ ```bash
10
+ pip install lsmvec-client # core, zero dependencies
11
+ pip install "lsmvec-client[numpy]"    # + numpy for bulk_build convenience (quoted so shells like zsh do not glob the brackets)
12
+ ```
13
+
14
+ Or run straight from the repo without installing:
15
+
16
+ ```python
17
+ import sys; sys.path.insert(0, "sdk/python")
18
+ from lsmvec_client import Client
19
+ ```
20
+
21
+ ## Quickstart
22
+
23
+ ```python
24
+ from lsmvec_client import Client
25
+
26
+ client = Client(
27
+ api_key="sk-live-...", # sent as Bearer token
28
+ base_url="https://api.lsmvec.com", # or http://localhost:8000 for local
29
+ )
30
+
31
+ # Insert with optional metadata
32
+ client.insert(1, [0.10, 0.20, 0.30, ...], metadata={"title": "intro"})
33
+
34
+ # Search
35
+ hits = client.search([0.10, 0.20, 0.30, ...], k=10)
36
+ for h in hits:
37
+ print(h.id, h.distance)
38
+
39
+ # Filtered search (metadata predicate, same syntax as the HTTP API)
40
+ hits = client.search(
41
+ [0.10, 0.20, ...], k=10,
42
+ filter={"$and": [{"category": {"$eq": "docs"}}]},
43
+ )
44
+ ```
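
The `api_key` is sent as an `Authorization: Bearer` header on every request. As a rough illustration of what a call like `insert` puts on the wire, based on the client code in this package, the sketch below builds (but does not send) the equivalent `urllib` request. `build_insert_request` is a hypothetical helper, not part of the client, and the key is a placeholder.

```python
import json
import urllib.request


def build_insert_request(base_url, api_key, id, vector, metadata=None):
    """Build the POST /v1/vectors request without sending it."""
    body = {"id": int(id), "vector": list(vector)}
    if metadata is not None:
        body["metadata"] = metadata
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only attach the header when a key is configured, as the client does.
        headers["Authorization"] = "Bearer " + api_key
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/vectors",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers=headers,
    )


req = build_insert_request("http://localhost:8000/", "sk-test", 1, [0.1, 0.2])
```

Inspecting `req.full_url`, `req.get_method()`, and `req.get_header("Authorization")` is a quick way to check what would be sent before pointing the client at a real server.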
45
+
46
+ ## Bulk build (initial load)
47
+
48
+ The fastest way to populate a **new, empty** database. Builds the
49
+ whole index in memory (RNN-Descent) and writes it in one pass —
50
+ 2-3× faster than per-vector inserts, with higher recall. Initial-load
51
+ only; the DB must be empty.
52
+
53
+ ```python
54
+ import numpy as np
55
+ from lsmvec_client import Client
56
+
57
+ client = Client(base_url="http://localhost:8000")
58
+
59
+ vectors = np.random.rand(100_000, 128).astype(np.float32)
60
+ report = client.bulk_build(vectors, threads=4)
61
+ print(report) # {'n': 100000, 'elapsed_ms': ..., 'vectors_per_sec': ..., 'threads': 4}
62
+ ```
63
+
64
+ `bulk_build` also accepts a plain list of equal-length float lists
65
+ (no numpy required):
66
+
67
+ ```python
68
+ rows = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], ...]
69
+ client.bulk_build(rows)
70
+ ```
71
+
72
+ For incremental updates on an already-built index, use `insert()` /
73
+ `upsert()` instead — `bulk_build` rejects a non-empty DB.
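
Judging by the client code in this package, `bulk_build` sends the vectors as one flat little-endian float32 blob, with the row count and dimension carried in the `X-LSMVec-N` and `X-LSMVec-Dim` headers. The sketch below reproduces that packing for a list of rows and unpacks the first row again. `pack_rows` is a hypothetical helper name used for illustration, not part of the client.

```python
import struct


def pack_rows(rows):
    """Flatten equal-length rows into little-endian float32 bytes plus headers."""
    n, d = len(rows), len(rows[0])
    if any(len(row) != d for row in rows):
        raise ValueError("all rows must have the same length")
    flat = [x for row in rows for x in row]
    blob = struct.pack("<%df" % (n * d), *flat)
    headers = {"X-LSMVec-N": str(n), "X-LSMVec-Dim": str(d)}
    return blob, headers


blob, headers = pack_rows([[0.5, 1.0, 1.5, 2.0], [2.5, 3.0, 3.5, 4.0]])
first_row = struct.unpack("<4f", blob[:16])  # 4 floats * 4 bytes each
```

Each value occupies 4 bytes, so the body size is `n * dim * 4`; that is worth estimating up front, since an oversized body maps to `PayloadTooLarge` (413).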
74
+
75
+ ## API
76
+
77
+ | Method | HTTP | Notes |
78
+ |---|---|---|
79
+ | `insert(id, vector, metadata=None)` | `POST /v1/vectors` | metadata is any JSON object |
80
+ | `upsert(id, vector)` | `PUT /v1/vectors/:id` | insert-or-replace vector |
81
+ | `get(id) -> dict` | `GET /v1/vectors/:id` | `{"id", "vector"}` |
82
+ | `delete(id)` | `DELETE /v1/vectors/:id` | |
83
+ | `get_payload(id) -> dict` | `GET /v1/vectors/:id/payload` | |
84
+ | `set_payload(id, payload)` | `PUT /v1/vectors/:id/payload` | replace |
85
+ | `merge_payload(id, partial)` | `PATCH /v1/vectors/:id/payload` | RFC 7396 merge |
86
+ | `search(vector, k=10, ef_search=None, filter=None) -> [SearchResult]` | `POST /v1/search` | |
87
+ | `bulk_build(vectors, dim=None, threads=0) -> dict` | `POST /v1/build/bulk` | empty DB only |
88
+ | `stats() -> dict` | `GET /v1/stats` | tombstone / bloom counters |
89
+ | `health() -> bool` | `GET /health` | |
90
+ | `ready() -> bool` | `GET /ready` | DB open + responsive |
91
+
92
+ `search` returns a list of `SearchResult(id: int, distance: float)`.
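
`set_payload` replaces the whole payload, while `merge_payload` follows RFC 7396 (JSON Merge Patch): keys in the patch overwrite, nested objects merge recursively, and `null` deletes a key. The function below is a local sketch of those rules for intuition only. The merge itself happens on the server, and `merge_patch` is a hypothetical name.

```python
def merge_patch(target, patch):
    """Apply an RFC 7396 merge patch to target and return the result."""
    if not isinstance(patch, dict):
        return patch  # a non-object patch replaces the target outright
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


payload = {"name": "one", "tags": {"lang": "en", "draft": True}}
merged = merge_patch(payload, {"score": 42, "tags": {"draft": None}})
```

Here `merged` keeps `name`, gains `score`, and loses only `tags.draft`, which is the behaviour to expect from `client.merge_payload(id, {"score": 42, "tags": {"draft": None}})` if the server implements the RFC as stated.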
93
+
94
+ ## Errors
95
+
96
+ HTTP status codes map to typed exceptions (all subclass `LSMVecError`):
97
+
98
+ | Status | Exception |
99
+ |---|---|
100
+ | 400 | `InvalidArgument` |
101
+ | 401 | `Unauthorized` |
102
+ | 404 | `NotFound` |
103
+ | 413 | `PayloadTooLarge` |
104
+ | 429 | `RateLimited` |
105
+ | 5xx | `ServerError` |
+ | other status / network failure | `LSMVecError` (base class) |
106
+
107
+ ```python
108
+ from lsmvec_client import NotFound
109
+
110
+ try:
111
+ client.get(999999)
112
+ except NotFound:
113
+ print("no such id")
114
+ ```
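
Because `RateLimited` signals that the caller should back off and retry, a caller-side wrapper might look like the sketch below. It is not part of the package: `with_backoff`, its parameters, and the stand-in `RateLimited` class are hypothetical, defined here so the example runs without a server. In real use you would import `RateLimited` from `lsmvec_client` instead.

```python
import time


class RateLimited(Exception):
    """Stand-in for lsmvec_client.RateLimited so the sketch runs offline."""


def with_backoff(call, retries=3, base_delay=0.5, sleep=time.sleep):
    """Run call(); on RateLimited, wait base_delay * 2**attempt and retry.

    Re-raises RateLimited once `retries` retries have been used up.
    """
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))


# Usage with a fake call that is rate-limited twice, then succeeds.
attempts = []
delays = []


def flaky_search():
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimited("429")
    return ["hit"]


result = with_backoff(flaky_search, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the wrapper testable; with a real client the call would be something like `with_backoff(lambda: client.search(query, k=10))`.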
115
+
116
+ ## Notes
117
+
118
+ - Vectors are stored with 8-bit scalar quantization (SQ8). `get()`
119
+ returns the dequantized vector, which differs from the input by
120
+ up to ~`range/255` per element. Distances and recall are computed
121
+ on the quantized form.
122
+ - `id` is a 64-bit unsigned integer.
123
+ - The client is synchronous and connection-per-request (stdlib
124
+ `urllib`). For high-throughput batch ingestion, prefer
125
+ `bulk_build` over a loop of `insert`.
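
To illustrate the `range/255` figure, the sketch below applies min-max 8-bit scalar quantization to one vector and measures the round-trip error. The server's actual SQ8 scheme (per-vector or per-dimension ranges, rounding mode) is not described here, so `quantize` and `dequantize` are hypothetical helpers meant for intuition only.

```python
def quantize(vec):
    """Map each float to an integer code in 0..255 over the vector's own range."""
    lo, hi = min(vec), max(vec)
    step = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = [round((x - lo) / step) for x in vec]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Reconstruct approximate floats from the 8-bit codes."""
    return [lo + c * step for c in codes]


vec = [0.10, 0.37, 0.52, 0.90]
codes, lo, step = quantize(vec)
restored = dequantize(codes, lo, step)
max_err = max(abs(a - b) for a, b in zip(vec, restored))
```

In this sketch `step` is exactly `range/255`, and round-to-nearest keeps `max_err` well inside it, which is why comparisons against `get()` results need a tolerance rather than exact equality.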
126
+
127
+ ## Testing
128
+
129
+ Against a running server:
130
+
131
+ ```bash
132
+ LSMVEC_TEST_URL=http://localhost:8000 LSMVEC_TEST_DIM=8 \
133
+ python3 sdk/python/tests/test_client.py
134
+ ```
@@ -0,0 +1,34 @@
1
+ """LSM-Vec Python client.
2
+
3
+ A thin, dependency-free client for the LSM-Vec HTTP server.
4
+
5
+ from lsmvec_client import Client
6
+ c = Client(api_key="sk-live-...", base_url="https://api.lsmvec.com")
7
+ c.insert(1, [0.1, 0.2, ...])
8
+ hits = c.search([0.1, 0.2, ...], k=10)
9
+ """
10
+
11
+ from .client import Client, SearchResult
12
+ from .errors import (
13
+ InvalidArgument,
14
+ LSMVecError,
15
+ NotFound,
16
+ PayloadTooLarge,
17
+ RateLimited,
18
+ ServerError,
19
+ Unauthorized,
20
+ )
21
+
22
+ __version__ = "0.1.0"
23
+
24
+ __all__ = [
25
+ "Client",
26
+ "SearchResult",
27
+ "LSMVecError",
28
+ "InvalidArgument",
29
+ "Unauthorized",
30
+ "NotFound",
31
+ "PayloadTooLarge",
32
+ "RateLimited",
33
+ "ServerError",
34
+ ]
@@ -0,0 +1,246 @@
1
+ """LSM-Vec HTTP client.
2
+
3
+ Thin wrapper over the LSM-Vec REST API (see docs/TRIAL_LAUNCH_PLAN.md
4
+ §5.3). Uses only the Python standard library — no third-party
5
+ dependencies required. numpy is optional and only used as a
6
+ convenience for `bulk_build` (lists work too).
7
+
8
+ Example:
9
+ from lsmvec_client import Client
10
+ c = Client(api_key="sk-live-...", base_url="http://localhost:8000")
11
+ c.insert(1, [0.1, 0.2, ...], metadata={"title": "doc"})
12
+ hits = c.search([0.1, 0.2, ...], k=10)
13
+ for h in hits:
14
+ print(h.id, h.distance)
15
+ """
16
+
17
+ import json
18
+ import struct
19
+ import urllib.error
20
+ import urllib.request
21
+ from dataclasses import dataclass
22
+ from typing import Any, Dict, List, Optional, Sequence, Union
23
+
24
+ from .errors import LSMVecError, from_status
25
+
26
+ __all__ = ["Client", "SearchResult"]
27
+
28
+ Vector = Union[Sequence[float], "numpy.ndarray"] # noqa: F821
29
+
30
+
31
+ @dataclass
32
+ class SearchResult:
33
+ id: int
34
+ distance: float
35
+
36
+
37
+ class Client:
38
+ """Client for a single LSM-Vec HTTP server (one trial user/container)."""
39
+
40
+ def __init__(
41
+ self,
42
+ api_key: str = "",
43
+ base_url: str = "http://localhost:8000",
44
+ timeout: float = 30.0,
45
+ ):
46
+ self.api_key = api_key
47
+ self.base_url = base_url.rstrip("/")
48
+ self.timeout = timeout
49
+
50
+ # ---- internal request helper ----
51
+
52
+ def _request(
53
+ self,
54
+ method: str,
55
+ path: str,
56
+ *,
57
+ json_body: Optional[Any] = None,
58
+ raw_body: Optional[bytes] = None,
59
+ extra_headers: Optional[Dict[str, str]] = None,
60
+ parse_json: bool = True,
61
+ ):
62
+ url = self.base_url + path
63
+ headers = {}
64
+ if self.api_key:
65
+ headers["Authorization"] = "Bearer " + self.api_key
66
+
67
+ data = None
68
+ if raw_body is not None:
69
+ data = raw_body
70
+ headers["Content-Type"] = "application/octet-stream"
71
+ elif json_body is not None:
72
+ data = json.dumps(json_body).encode("utf-8")
73
+ headers["Content-Type"] = "application/json"
74
+ if extra_headers:
75
+ headers.update(extra_headers)
76
+
77
+ req = urllib.request.Request(url, data=data, method=method, headers=headers)
78
+ try:
79
+ with urllib.request.urlopen(req, timeout=self.timeout) as resp:
80
+ body = resp.read()
81
+ if not parse_json or not body:
82
+ return resp.status, None
83
+ return resp.status, json.loads(body.decode("utf-8"))
84
+ except urllib.error.HTTPError as e:
85
+ body = e.read()
86
+ code = None
87
+ message = e.reason or "http error"
88
+ try:
89
+ parsed = json.loads(body.decode("utf-8"))
90
+ code = parsed.get("code")
91
+ message = parsed.get("error", message)
92
+ except Exception:
93
+ if body:
94
+ message = body.decode("utf-8", "replace")
95
+ raise from_status(e.code, code, message) from None
96
+ except urllib.error.URLError as e:
97
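+             raise LSMVecError("connection failed: " + str(e.reason)) from None
+         except OSError as e:  # e.g. a socket timeout while reading the response body
+             raise LSMVecError("connection failed: " + str(e)) from None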
98
+
99
+ @staticmethod
100
+ def _to_list(vector: Vector) -> List[float]:
101
+ # numpy array → list, list stays list.
102
+ if hasattr(vector, "tolist"):
103
+ return vector.tolist()
104
+ return list(vector)
105
+
106
+ # ---- vectors ----
107
+
108
+ def insert(
109
+ self,
110
+ id: int,
111
+ vector: Vector,
112
+ metadata: Optional[Dict[str, Any]] = None,
113
+ ) -> None:
114
+ body: Dict[str, Any] = {"id": int(id), "vector": self._to_list(vector)}
115
+ if metadata is not None:
116
+ body["metadata"] = metadata
117
+ self._request("POST", "/v1/vectors", json_body=body, parse_json=False)
118
+
119
+ def upsert(self, id: int, vector: Vector) -> None:
120
+ self._request(
121
+ "PUT",
122
+ "/v1/vectors/%d" % int(id),
123
+ json_body={"vector": self._to_list(vector)},
124
+ parse_json=False,
125
+ )
126
+
127
+ def get(self, id: int) -> Dict[str, Any]:
128
+ _, body = self._request("GET", "/v1/vectors/%d" % int(id))
129
+ return body
130
+
131
+ def delete(self, id: int) -> None:
132
+ self._request("DELETE", "/v1/vectors/%d" % int(id), parse_json=False)
133
+
134
+ # ---- payload ----
135
+
136
+ def get_payload(self, id: int) -> Dict[str, Any]:
137
+ _, body = self._request("GET", "/v1/vectors/%d/payload" % int(id))
138
+ return body
139
+
140
+ def set_payload(self, id: int, payload: Dict[str, Any]) -> None:
141
+ self._request(
142
+ "PUT",
143
+ "/v1/vectors/%d/payload" % int(id),
144
+ json_body=payload,
145
+ parse_json=False,
146
+ )
147
+
148
+ def merge_payload(self, id: int, partial: Dict[str, Any]) -> None:
149
+ """RFC 7396 merge-patch: keys in `partial` overwrite; null deletes."""
150
+ self._request(
151
+ "PATCH",
152
+ "/v1/vectors/%d/payload" % int(id),
153
+ json_body=partial,
154
+ parse_json=False,
155
+ )
156
+
157
+ # ---- search ----
158
+
159
+ def search(
160
+ self,
161
+ vector: Vector,
162
+ k: int = 10,
163
+ ef_search: Optional[int] = None,
164
+ filter: Optional[Dict[str, Any]] = None,
165
+ ) -> List[SearchResult]:
166
+ body: Dict[str, Any] = {"vector": self._to_list(vector), "k": int(k)}
167
+ if ef_search is not None:
168
+ body["ef_search"] = int(ef_search)
169
+ if filter is not None:
170
+ body["filter"] = filter
171
+ _, parsed = self._request("POST", "/v1/search", json_body=body)
172
+ results = parsed.get("results", []) if parsed else []
173
+ return [SearchResult(id=r["id"], distance=r["distance"]) for r in results]
174
+
175
+ # ---- bulk build (initial-load only) ----
176
+
177
+ def bulk_build(self, vectors, dim: Optional[int] = None, threads: int = 0) -> Dict[str, Any]:
178
+ """Build the entire index from a batch of vectors in one call.
179
+
180
+ `vectors` may be a 2-D numpy array (n, dim) or a list of
181
+ equal-length float lists. The DB must be empty — bulk build is
182
+ initial-load only.
183
+
184
+ Returns the server's timing report dict
185
+ {n, elapsed_ms, vectors_per_sec, threads}.
186
+ """
187
+ # Normalize to a flat float32 byte blob + (n, dim).
188
+ if hasattr(vectors, "shape"): # numpy array
189
+ import numpy as np # local import; numpy optional otherwise
190
+
191
+ arr = np.ascontiguousarray(vectors, dtype=np.float32)
192
+ if arr.ndim != 2:
193
+ raise ValueError("vectors must be a 2-D array (n, dim)")
194
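+             n, d = arr.shape
+             if n == 0:  # mirror the empty-input check in the list path below
+                 raise ValueError("vectors is empty")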
195
+ if dim is not None and dim != d:
196
+ raise ValueError("dim=%d does not match array dim=%d" % (dim, d))
197
+ blob = arr.tobytes()
198
+ else: # list of lists
199
+ rows = list(vectors)
200
+ n = len(rows)
201
+ if n == 0:
202
+ raise ValueError("vectors is empty")
203
+ d = len(rows[0])
204
+ if dim is not None and dim != d:
205
+ raise ValueError("dim=%d does not match row length=%d" % (dim, d))
206
+ flat = []
207
+ for row in rows:
208
+ if len(row) != d:
209
+ raise ValueError("all rows must have the same length")
210
+ flat.extend(row)
211
+ blob = struct.pack("<%df" % (n * d), *flat)
212
+
213
+ headers = {
214
+ "X-LSMVec-N": str(n),
215
+ "X-LSMVec-Dim": str(d),
216
+ }
217
+ if threads > 0:
218
+ headers["X-LSMVec-Threads"] = str(threads)
219
+
220
+ _, parsed = self._request(
221
+ "POST",
222
+ "/v1/build/bulk",
223
+ raw_body=blob,
224
+ extra_headers=headers,
225
+ )
226
+ return parsed
227
+
228
+ # ---- diagnostics ----
229
+
230
+ def stats(self) -> Dict[str, Any]:
231
+ _, body = self._request("GET", "/v1/stats")
232
+ return body
233
+
234
+ def health(self) -> bool:
235
+ try:
236
+ status, _ = self._request("GET", "/health", parse_json=False)
237
+ return status == 200
238
+ except LSMVecError:
239
+ return False
240
+
241
+ def ready(self) -> bool:
242
+ try:
243
+ status, _ = self._request("GET", "/ready", parse_json=False)
244
+ return status == 200
245
+ except LSMVecError:
246
+ return False
@@ -0,0 +1,63 @@
1
+ """Exception hierarchy for the LSM-Vec client.
2
+
3
+ HTTP status codes map to typed exceptions so callers can catch the
4
+ specific failure they care about:
5
+
6
+ 400 -> InvalidArgument
7
+ 401 -> Unauthorized
8
+ 404 -> NotFound
9
+ 413 -> PayloadTooLarge
10
+ 429 -> RateLimited
11
+ 5xx -> ServerError
12
+ other / network -> LSMVecError (base)
13
+ """
14
+
15
+
16
+ class LSMVecError(Exception):
17
+ """Base class for all client errors."""
18
+
19
+ def __init__(self, message, *, status=None, code=None):
20
+ super().__init__(message)
21
+ self.status = status
22
+ self.code = code
23
+
24
+
25
+ class InvalidArgument(LSMVecError):
26
+ """400 — malformed request (bad vector, wrong dim, bad JSON)."""
27
+
28
+
29
+ class Unauthorized(LSMVecError):
30
+ """401 — missing or invalid API key."""
31
+
32
+
33
+ class NotFound(LSMVecError):
34
+ """404 — id does not exist / index empty."""
35
+
36
+
37
+ class PayloadTooLarge(LSMVecError):
38
+ """413 — request body exceeds the server's limit."""
39
+
40
+
41
+ class RateLimited(LSMVecError):
42
+ """429 — too many requests; back off and retry."""
43
+
44
+
45
+ class ServerError(LSMVecError):
46
+ """5xx — server-side failure."""
47
+
48
+
49
+ def from_status(status, code, message):
50
+ """Build the right exception subtype from an HTTP status."""
51
+ if status == 400:
52
+ return InvalidArgument(message, status=status, code=code)
53
+ if status == 401:
54
+ return Unauthorized(message, status=status, code=code)
55
+ if status == 404:
56
+ return NotFound(message, status=status, code=code)
57
+ if status == 413:
58
+ return PayloadTooLarge(message, status=status, code=code)
59
+ if status == 429:
60
+ return RateLimited(message, status=status, code=code)
61
+ if 500 <= status < 600:
62
+ return ServerError(message, status=status, code=code)
63
+ return LSMVecError(message, status=status, code=code)
@@ -0,0 +1,147 @@
1
+ Metadata-Version: 2.4
2
+ Name: lsmvec-client
3
+ Version: 0.1.0
4
+ Summary: Python client for the LSM-Vec vector database HTTP API
5
+ Author: LSM-Vec
6
+ License: Apache-2.0
7
+ Requires-Python: >=3.8
8
+ Description-Content-Type: text/markdown
9
+ Provides-Extra: numpy
10
+ Requires-Dist: numpy>=1.20; extra == "numpy"
11
+ Provides-Extra: dev
12
+ Requires-Dist: pytest>=7; extra == "dev"
13
+
14
+ # lsmvec-client — Python client for LSM-Vec
15
+
16
+ A thin, dependency-free Python client for the LSM-Vec vector database
17
+ HTTP API. Uses only the Python standard library; `numpy` is optional
18
+ (a convenience for `bulk_build`).
19
+
20
+ ## Install
21
+
22
+ ```bash
23
+ pip install lsmvec-client # core, zero dependencies
24
+ pip install "lsmvec-client[numpy]"    # + numpy for bulk_build convenience (quoted so shells like zsh do not glob the brackets)
25
+ ```
26
+
27
+ Or run straight from the repo without installing:
28
+
29
+ ```python
30
+ import sys; sys.path.insert(0, "sdk/python")
31
+ from lsmvec_client import Client
32
+ ```
33
+
34
+ ## Quickstart
35
+
36
+ ```python
37
+ from lsmvec_client import Client
38
+
39
+ client = Client(
40
+ api_key="sk-live-...", # sent as Bearer token
41
+ base_url="https://api.lsmvec.com", # or http://localhost:8000 for local
42
+ )
43
+
44
+ # Insert with optional metadata
45
+ client.insert(1, [0.10, 0.20, 0.30, ...], metadata={"title": "intro"})
46
+
47
+ # Search
48
+ hits = client.search([0.10, 0.20, 0.30, ...], k=10)
49
+ for h in hits:
50
+ print(h.id, h.distance)
51
+
52
+ # Filtered search (metadata predicate, same syntax as the HTTP API)
53
+ hits = client.search(
54
+ [0.10, 0.20, ...], k=10,
55
+ filter={"$and": [{"category": {"$eq": "docs"}}]},
56
+ )
57
+ ```
58
+
59
+ ## Bulk build (initial load)
60
+
61
+ The fastest way to populate a **new, empty** database. Builds the
62
+ whole index in memory (RNN-Descent) and writes it in one pass —
63
+ 2-3× faster than per-vector inserts, with higher recall. Initial-load
64
+ only; the DB must be empty.
65
+
66
+ ```python
67
+ import numpy as np
68
+ from lsmvec_client import Client
69
+
70
+ client = Client(base_url="http://localhost:8000")
71
+
72
+ vectors = np.random.rand(100_000, 128).astype(np.float32)
73
+ report = client.bulk_build(vectors, threads=4)
74
+ print(report) # {'n': 100000, 'elapsed_ms': ..., 'vectors_per_sec': ..., 'threads': 4}
75
+ ```
76
+
77
+ `bulk_build` also accepts a plain list of equal-length float lists
78
+ (no numpy required):
79
+
80
+ ```python
81
+ rows = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], ...]
82
+ client.bulk_build(rows)
83
+ ```
84
+
85
+ For incremental updates on an already-built index, use `insert()` /
86
+ `upsert()` instead — `bulk_build` rejects a non-empty DB.
87
+
88
+ ## API
89
+
90
+ | Method | HTTP | Notes |
91
+ |---|---|---|
92
+ | `insert(id, vector, metadata=None)` | `POST /v1/vectors` | metadata is any JSON object |
93
+ | `upsert(id, vector)` | `PUT /v1/vectors/:id` | insert-or-replace vector |
94
+ | `get(id) -> dict` | `GET /v1/vectors/:id` | `{"id", "vector"}` |
95
+ | `delete(id)` | `DELETE /v1/vectors/:id` | |
96
+ | `get_payload(id) -> dict` | `GET /v1/vectors/:id/payload` | |
97
+ | `set_payload(id, payload)` | `PUT /v1/vectors/:id/payload` | replace |
98
+ | `merge_payload(id, partial)` | `PATCH /v1/vectors/:id/payload` | RFC 7396 merge |
99
+ | `search(vector, k=10, ef_search=None, filter=None) -> [SearchResult]` | `POST /v1/search` | |
100
+ | `bulk_build(vectors, dim=None, threads=0) -> dict` | `POST /v1/build/bulk` | empty DB only |
101
+ | `stats() -> dict` | `GET /v1/stats` | tombstone / bloom counters |
102
+ | `health() -> bool` | `GET /health` | |
103
+ | `ready() -> bool` | `GET /ready` | DB open + responsive |
104
+
105
+ `search` returns a list of `SearchResult(id: int, distance: float)`.
106
+
107
+ ## Errors
108
+
109
+ HTTP status codes map to typed exceptions (all subclass `LSMVecError`):
110
+
111
+ | Status | Exception |
112
+ |---|---|
113
+ | 400 | `InvalidArgument` |
114
+ | 401 | `Unauthorized` |
115
+ | 404 | `NotFound` |
116
+ | 413 | `PayloadTooLarge` |
117
+ | 429 | `RateLimited` |
118
+ | 5xx | `ServerError` |
+ | other status / network failure | `LSMVecError` (base class) |
119
+
120
+ ```python
121
+ from lsmvec_client import NotFound
122
+
123
+ try:
124
+ client.get(999999)
125
+ except NotFound:
126
+ print("no such id")
127
+ ```
128
+
129
+ ## Notes
130
+
131
+ - Vectors are stored with 8-bit scalar quantization (SQ8). `get()`
132
+ returns the dequantized vector, which differs from the input by
133
+ up to ~`range/255` per element. Distances and recall are computed
134
+ on the quantized form.
135
+ - `id` is a 64-bit unsigned integer.
136
+ - The client is synchronous and connection-per-request (stdlib
137
+ `urllib`). For high-throughput batch ingestion, prefer
138
+ `bulk_build` over a loop of `insert`.
139
+
140
+ ## Testing
141
+
142
+ Against a running server:
143
+
144
+ ```bash
145
+ LSMVEC_TEST_URL=http://localhost:8000 LSMVEC_TEST_DIM=8 \
146
+ python3 sdk/python/tests/test_client.py
147
+ ```
@@ -0,0 +1,11 @@
1
+ README.md
2
+ pyproject.toml
3
+ lsmvec_client/__init__.py
4
+ lsmvec_client/client.py
5
+ lsmvec_client/errors.py
6
+ lsmvec_client.egg-info/PKG-INFO
7
+ lsmvec_client.egg-info/SOURCES.txt
8
+ lsmvec_client.egg-info/dependency_links.txt
9
+ lsmvec_client.egg-info/requires.txt
10
+ lsmvec_client.egg-info/top_level.txt
11
+ tests/test_client.py
@@ -0,0 +1,6 @@
1
+
2
+ [dev]
3
+ pytest>=7
4
+
5
+ [numpy]
6
+ numpy>=1.20
@@ -0,0 +1 @@
1
+ lsmvec_client
@@ -0,0 +1,24 @@
1
+ [build-system]
2
+ requires = ["setuptools>=61"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "lsmvec-client"
7
+ version = "0.1.0"
8
+ description = "Python client for the LSM-Vec vector database HTTP API"
9
+ readme = "README.md"
10
+ requires-python = ">=3.8"
11
+ license = {text = "Apache-2.0"}
12
+ authors = [{name = "LSM-Vec"}]
13
+ # No required runtime dependencies — the client uses only the Python
14
+ # standard library (urllib, json, struct). numpy is optional and only
15
+ # used for the bulk_build convenience path.
16
+ dependencies = []
17
+
18
+ [project.optional-dependencies]
19
+ numpy = ["numpy>=1.20"]
20
+ dev = ["pytest>=7"]
21
+
22
+ [tool.setuptools.packages.find]
23
+ where = ["."]
24
+ include = ["lsmvec_client*"]
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,90 @@
1
+ """Integration test for the LSM-Vec Python client.
2
+
3
+ Requires a running lsm_vec_http server. Point it via env:
4
+
5
+ LSMVEC_TEST_URL=http://localhost:8000 python3 tests/test_client.py
6
+
7
+ The server must be started fresh (empty DB) with the matching dim
8
+ (default 8 below). This is a plain-assert script (no pytest needed)
9
+ so it runs in any environment.
10
+ """
11
+
12
+ import os
13
+ import sys
14
+
15
+ # Allow running from the sdk/python dir without install.
16
+ sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
17
+
18
+ from lsmvec_client import Client, NotFound, InvalidArgument # noqa: E402
19
+
20
+ URL = os.environ.get("LSMVEC_TEST_URL", "http://localhost:8000")
21
+ DIM = int(os.environ.get("LSMVEC_TEST_DIM", "8"))
22
+
23
+
24
+ def vec(seed):
25
+ return [float((seed + i) % 10) * 0.1 for i in range(DIM)]
26
+
27
+
28
+ def main():
29
+ c = Client(api_key="test-key", base_url=URL)
30
+
31
+ assert c.health(), "server health check failed"
32
+ print("[ok] health")
33
+
34
+ # Insert + get round-trip.
35
+ c.insert(1, vec(1), metadata={"name": "one"})
36
+ c.insert(2, vec(2))
37
+ got = c.get(1)
38
+ assert got["id"] == 1, got
39
+ assert len(got["vector"]) == DIM, got
40
+ print("[ok] insert + get")
41
+
42
+ # Payload.
43
+ pl = c.get_payload(1)
44
+ assert pl.get("name") == "one", pl
45
+ c.merge_payload(1, {"score": 42})
46
+ pl = c.get_payload(1)
47
+ assert pl.get("name") == "one" and pl.get("score") == 42, pl
48
+ print("[ok] payload get + merge")
49
+
50
+ # Search.
51
+ hits = c.search(vec(1), k=2)
52
+ assert len(hits) >= 1, hits
53
+ assert hits[0].id == 1, "nearest to vec(1) should be id 1, got %r" % hits
54
+ print("[ok] search (top hit id=%d dist=%.4f)" % (hits[0].id, hits[0].distance))
55
+
56
+ # Upsert then re-get. Stored vectors are SQ8-quantized (8-bit), so
57
+ # the round-trip is lossy — tolerance must accommodate ~range/255
58
+ # (~0.01 for these small magnitudes), not exact equality.
59
+ c.upsert(1, vec(5))
60
+ got = c.get(1)
61
+ assert abs(got["vector"][0] - vec(5)[0]) < 0.02, got
62
+ print("[ok] upsert")
63
+
64
+ # Delete + NotFound.
65
+ c.delete(2)
66
+ try:
67
+ c.get(2)
68
+ assert False, "expected NotFound after delete"
69
+ except NotFound:
70
+ pass
71
+ print("[ok] delete + NotFound")
72
+
73
+ # Wrong-dim insert → InvalidArgument.
74
+ try:
75
+ c.insert(99, [0.1, 0.2]) # wrong dim
76
+ assert False, "expected InvalidArgument for wrong dim"
77
+ except InvalidArgument:
78
+ pass
79
+ print("[ok] wrong-dim rejected")
80
+
81
+ # Stats.
82
+ s = c.stats()
83
+ assert "total_inserts_ever" in s, s
84
+ print("[ok] stats (total_inserts_ever=%s)" % s["total_inserts_ever"])
85
+
86
+ print("\nALL PASSED")
87
+
88
+
89
+ if __name__ == "__main__":
90
+ main()