@pyxmate/memory 0.7.0 → 0.9.1
package/package.json
CHANGED
@@ -91,9 +91,9 @@ curl -s -X POST {{ENDPOINT}}/api/memory/ingest/file \
 -F "description=What this file contains or shows"
 ```
 
-Supported formats: txt, md, csv, tsv, log, pdf, docx, xlsx, json, jsonl, html, htm, png, jpg, jpeg, webp, gif, bmp, tiff, svg (
+Supported formats: txt, md, csv, tsv, log, pdf, docx, xlsx, pptx, json, jsonl, html, htm, png, jpg, jpeg, webp, gif, bmp, tiff, svg (100MB limit).
 
-Ingestion streams from disk — parsers read row-by-row (csv/xlsx), line-by-line (txt/jsonl),
+Ingestion streams from disk — parsers read row-by-row (csv/xlsx), line-by-line (txt/jsonl), page-by-page (pdf), or slide-by-slide (pptx) so peak memory is bounded regardless of file size. The only exception is `.docx`, which is hard-capped at 10MB because the underlying library (mammoth) loads the whole document into memory; above 10MB the server returns an error asking you to pre-extract the text upstream and re-upload as `.txt` or `.md`.
 
 **Images require a `description`** — this is how the content gets embedded and becomes searchable. Without it, the image is stored but not findable. Use your vision capabilities to generate the description when the user doesn't provide one.
 
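The size limits in the hunk above (100MB for any upload, 10MB for `.docx`) could be checked on the client before a request is sent. The following is a minimal sketch, not part of the package: `checkUploadSize`, `MAX_UPLOAD_BYTES`, and `MAX_DOCX_BYTES` are hypothetical names, and it assumes "MB" means 1024 × 1024 bytes, which the source does not state.

```typescript
// Hypothetical client-side pre-check; the server remains the authority on limits.
// Assumption: 1MB = 1024 * 1024 bytes (the documentation does not say which unit it uses).
const MAX_UPLOAD_BYTES = 100 * 1024 * 1024; // general upload limit from the docs
const MAX_DOCX_BYTES = 10 * 1024 * 1024; // hard cap for .docx from the docs

type UploadCheck = { ok: true } | { ok: false; reason: string };

function checkUploadSize(filename: string, sizeBytes: number): UploadCheck {
  // Take the extension from the last dot, lowercased, so "Report.DOCX" matches ".docx".
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot).toLowerCase() : "";

  // Check the stricter .docx cap first so its more specific message wins.
  if (ext === ".docx" && sizeBytes > MAX_DOCX_BYTES) {
    return {
      ok: false,
      reason: "docx files over 10MB are rejected; extract the text and upload it as .txt or .md",
    };
  }
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: "file exceeds the 100MB upload limit" };
  }
  return { ok: true };
}
```

A caller might run this against a file's name and size before building the multipart request and show `reason` to the user when `ok` is false. The check only saves a round trip, since the server still enforces its own limits.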
@@ -25,7 +25,7 @@ const client = new MemoryClient('http://localhost:7822', process.env.MEMORY_API_
 | GET | `/health` | Public health check (status only — no internals exposed) |
 | GET | `/admin/health` | Admin health check (version, uptime, embedding provider, memory stats) |
 | POST | `/api/memory/ingest` | Store a memory (JSON: `{ content, type, metadata, agentId?, sessionId?, targets?, entities?, relationships?, importance?, source?, eventTime?, id?, parentId?, ingestTime? }`) |
-| POST | `/api/memory/ingest/file` | Upload file (multipart,
+| POST | `/api/memory/ingest/file` | Upload file (multipart, 100MB limit). For PDFs with images, returns an `enrichment` block with HMAC token and image metadata for two-phase enrichment. |
 | GET | `/api/memory/files/download/:filename` | Download uploaded file binary by filename. Returns the original file with proper Content-Type. Images served inline, documents as attachment. |
 | GET | `/api/memory/files/:fileId/images/:imageId?token=...` | Fetch an extracted PDF image (binary). Requires HMAC token from ingest response. |
 | POST | `/api/memory/files/:fileId/enrich` | Submit image descriptions + entities after LLM vision processing. Requires `X-Enrichment-Token` header. |
@@ -66,12 +66,12 @@ const client = new MemoryClient('http://localhost:7822', process.env.MEMORY_API_
 
 | Field | Type | Required | Description |
 |-------|------|----------|-------------|
-| `file` | File | Yes | The file to ingest (max
+| `file` | File | Yes | The file to ingest (max 100MB) |
 | `description` | string | No | Agent-provided description (e.g., from LLM vision). Used instead of parser output for images. |
 
-**Supported formats**: `.txt`, `.md`, `.csv`, `.tsv`, `.log`, `.pdf`, `.docx`, `.xlsx`, `.json`, `.jsonl`, `.html`, `.htm`, `.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`, `.bmp`, `.tiff`, `.svg`
+**Supported formats**: `.txt`, `.md`, `.csv`, `.tsv`, `.log`, `.pdf`, `.docx`, `.xlsx`, `.pptx`, `.json`, `.jsonl`, `.html`, `.htm`, `.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`, `.bmp`, `.tiff`, `.svg`
 
-**Memory behavior**: All text parsers (csv, tsv, txt, log, pdf, json, jsonl, html, xlsx) stream from disk during ingestion — peak server memory is bounded to a few MB regardless of file size, up to the
+**Memory behavior**: All text parsers (csv, tsv, txt, log, pdf, json, jsonl, html, xlsx, pptx) stream from disk during ingestion — peak server memory is bounded to a few MB regardless of file size, up to the 100MB file limit. `.docx` is the exception: mammoth (the parser) has no streaming API, so `.docx` is hard-capped at 10MB on the server. Above 10MB the server returns a `MemoryError` asking you to pre-extract the text upstream and re-upload as `.txt` or `.md`.
 
 **What happens on upload**:
 1. Original file is saved to `{DATA_DIR}/files/{filename}` (persistent across restarts)
@@ -84,6 +84,39 @@ const client = new MemoryClient('http://localhost:7822', process.env.MEMORY_API_
 
 **PDF enrichment**: When a PDF contains images (≥50x50px), the response includes an `enrichment` block. The SDK's `ingestFile()` with `EnrichmentCallbacks` handles the full flow automatically — see [sdk-guide.md](sdk-guide.md#two-phase-pdf-enrichment).
 
+### Streaming Progress (NDJSON)
+
+For real-time ingestion progress, request NDJSON streaming by setting the `Accept` header or a query parameter:
+
+```bash
+# Via Accept header
+curl -X POST {{ENDPOINT}}/api/memory/ingest/file \
+-H "Authorization: Bearer {{API_KEY}}" \
+-H "Accept: application/x-ndjson" \
+-F "file=@large-report.pdf"
+
+# Via query parameter
+curl -X POST "{{ENDPOINT}}/api/memory/ingest/file?stream=true" \
+-H "Authorization: Bearer {{API_KEY}}" \
+-F "file=@large-report.pdf"
+```
+
+The response is streamed as newline-delimited JSON (`Content-Type: application/x-ndjson`). Each line is a JSON object:
+
+```
+{"type":"progress","stage":"parsing","filename":"large-report.pdf","message":"Extracting text..."}
+{"type":"progress","stage":"storing","filename":"large-report.pdf","chunksStored":10,"totalCharacters":4800,"message":"Storing chunk 10..."}
+{"type":"progress","stage":"storing","filename":"large-report.pdf","chunksStored":20,"totalCharacters":9600,"message":"Storing chunk 20..."}
+{"type":"progress","stage":"complete","filename":"large-report.pdf","chunksStored":24,"totalCharacters":11520,"message":"Ingestion complete"}
+{"type":"result","filename":"large-report.pdf","fileType":".pdf","chunks":24,"entryIds":[...],"totalCharacters":11520}
+```
+
+**Progress stages**: `parsing` (text extraction), `storing` (chunk storage — emitted every batch), `enrichment` (PDF image extraction), `complete` (done).
+
+The last line always has `"type":"result"` and contains the full `IngestionResult` (same shape as the non-streaming JSON response). On error, the last line has `"type":"error"` with an `error` message string.
+
+Without the `Accept: application/x-ndjson` header (or `?stream=true`), the endpoint behaves exactly as before — returning a single JSON response after ingestion completes.
+
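The NDJSON stream described in the added section can be consumed by buffering text until a newline arrives and parsing each complete line as JSON. The sketch below shows one way to do that, under stated assumptions: `NdjsonLineBuffer` and `IngestEvent` are hypothetical names that the package does not export, and the event fields are inferred only from the sample output in the diff.

```typescript
// Loose event shape inferred from the sample lines; real events may carry other fields.
interface IngestEvent {
  type: string; // "progress", "result", or "error" in the documented output
  [field: string]: unknown;
}

// Hypothetical helper: accumulates decoded text and returns one event per complete line.
class NdjsonLineBuffer {
  private pending = "";

  // Feed a decoded text chunk; returns the events for every line completed so far.
  push(chunk: string): IngestEvent[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The final element is an incomplete line (or "" if the chunk ended on a newline).
    this.pending = lines.pop() || "";
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as IngestEvent);
  }

  // Call once the stream ends, in case the last line had no trailing newline.
  flush(): IngestEvent[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest === "" ? [] : [JSON.parse(rest) as IngestEvent];
  }
}
```

A caller would typically decode each network chunk with `TextDecoder` using `{ stream: true }` (so multi-byte characters split across chunks survive), pass the text to `push()`, report `progress` events as they arrive, and stop at the first `result` or `error` event, which the documentation says is always the last line.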
 ### Example: ingest a document
 
 ```bash