@pyxmate/memory 0.6.2 → 0.7.0
package/package.json
CHANGED
@@ -91,7 +91,9 @@ curl -s -X POST {{ENDPOINT}}/api/memory/ingest/file \
 -F "description=What this file contains or shows"
 ```
 
-Supported formats: txt, md, csv, pdf, docx, json, jsonl, html, png, jpg, jpeg, webp, gif, bmp, tiff, svg (50MB limit).
+Supported formats: txt, md, csv, tsv, log, pdf, docx, xlsx, json, jsonl, html, htm, png, jpg, jpeg, webp, gif, bmp, tiff, svg (50MB limit).
+
+Ingestion streams from disk — parsers read row-by-row (csv/xlsx), line-by-line (txt/jsonl), or page-by-page (pdf) so peak memory is bounded regardless of file size. The only exception is `.docx`, which is hard-capped at 10MB because the underlying library (mammoth) loads the whole document into memory; above 10MB the server returns an error asking you to pre-extract the text upstream and re-upload as `.txt` or `.md`.
 
 **Images require a `description`** — this is how the content gets embedded and becomes searchable. Without it, the image is stored but not findable. Use your vision capabilities to generate the description when the user doesn't provide one.
 
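The limits added in this hunk (the extended format list, the 50MB ceiling, and the 10MB cap on `.docx`) can be checked on the client before uploading. The following is a minimal sketch, not part of `@pyxmate/memory`: `checkUpload` and `UploadCheck` are hypothetical names, and it assumes "MB" means 1024 × 1024 bytes, which the diff does not state.

```typescript
// Client-side pre-check mirroring the limits described in the 0.7.0 text.
// Assumption: "MB" is 1024 * 1024 bytes; the server may count differently.
const MB = 1024 * 1024;
const MAX_FILE_BYTES = 50 * MB;
const MAX_DOCX_BYTES = 10 * MB;

// Extensions copied from the 0.7.0 "Supported formats" line.
const SUPPORTED_EXTENSIONS = new Set([
  "txt", "md", "csv", "tsv", "log", "pdf", "docx", "xlsx", "json", "jsonl",
  "html", "htm", "png", "jpg", "jpeg", "webp", "gif", "bmp", "tiff", "svg",
]);

type UploadCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical helper: returns why an upload would likely be rejected.
function checkUpload(filename: string, sizeBytes: number): UploadCheck {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `unsupported format: ${ext || "(none)"}` };
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    return { ok: false, reason: "file exceeds the 50MB limit" };
  }
  if (ext === "docx" && sizeBytes > MAX_DOCX_BYTES) {
    return {
      ok: false,
      reason: "docx exceeds 10MB; extract the text and upload as .txt or .md",
    };
  }
  return { ok: true };
}
```

Running the check locally avoids sending a large `.docx` only to have the server reject it; the server remains the authority on the actual limits.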
@@ -69,14 +69,16 @@ const client = new MemoryClient('http://localhost:7822', process.env.MEMORY_API_
 | `file` | File | Yes | The file to ingest (max 50MB) |
 | `description` | string | No | Agent-provided description (e.g., from LLM vision). Used instead of parser output for images. |
 
-**Supported formats**: `.txt`, `.md`, `.csv`, `.pdf`, `.docx`, `.json`, `.jsonl`, `.html`, `.htm`, `.
+**Supported formats**: `.txt`, `.md`, `.csv`, `.tsv`, `.log`, `.pdf`, `.docx`, `.xlsx`, `.json`, `.jsonl`, `.html`, `.htm`, `.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`, `.bmp`, `.tiff`, `.svg`
+
+**Memory behavior**: All text parsers (csv, tsv, txt, log, pdf, json, jsonl, html, xlsx) stream from disk during ingestion — peak server memory is bounded to a few MB regardless of file size, up to the 50MB file limit. `.docx` is the exception: mammoth (the parser) has no streaming API, so `.docx` is hard-capped at 10MB on the server. Above 10MB the server returns a `MemoryError` asking you to pre-extract the text upstream and re-upload as `.txt` or `.md`.
 
 **What happens on upload**:
 1. Original file is saved to `{DATA_DIR}/files/{filename}` (persistent across restarts)
 2. Text is extracted (documents) or description is used (images)
 3. Content is chunked and stored in SQLite + vector for semantic search
 4. Source-aware dedup: re-uploading the same filename replaces the previous version
-5. **PDFs with images**: Images are extracted via
+5. **PDFs with images**: Images are extracted via poppler (`pdfimages` for embedded objects, `pdftoppm` for page renders on scanned PDFs), saved to a temp directory, and an `enrichment` block is returned with HMAC-signed tokens for two-phase enrichment. `pdf-parse` is used only as a dev fallback for small files (<5MB) when poppler isn't installed.
 
 **Image ingestion**: Images cannot have text extracted. Pass a `description` field with a natural-language description (e.g., from an LLM with vision). Without a description, images get a minimal placeholder (`[Image] filename (size KB)`).
 
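The `file` and `description` fields in this hunk map onto a multipart form. As a rough illustration, the sketch below builds that form with the standard `FormData` and `Blob` globals (browsers, Node 18+) and refuses to build an image upload without a description, since the text says such images end up with only a placeholder. `buildIngestForm` is a hypothetical helper, not an export of `@pyxmate/memory`, and treating a missing description as an error is a choice of this sketch rather than documented server behavior.

```typescript
// Image extensions taken from the "Supported formats" line in this hunk.
const IMAGE_EXTENSIONS = new Set([
  "png", "jpg", "jpeg", "webp", "gif", "bmp", "tiff", "svg",
]);

// Hypothetical helper: assembles the multipart body for
// POST {ENDPOINT}/api/memory/ingest/file using the documented field names.
function buildIngestForm(
  file: Blob,
  filename: string,
  description?: string,
): FormData {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  // Stricter than the server: an image without a description would be stored
  // but not findable, so this sketch fails early instead.
  if (IMAGE_EXTENSIONS.has(ext) && !description) {
    throw new Error(`"${filename}" is an image; pass a description`);
  }
  const form = new FormData();
  form.append("file", file, filename);
  if (description) {
    form.append("description", description);
  }
  return form;
}
```

The resulting form can be passed as the `body` of a `fetch` call to the ingest endpoint; authentication headers are left out here because the diff does not show them.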
@@ -240,8 +242,8 @@ curl -s -X POST {{ENDPOINT}}/api/memory/ingest \
 {"name":"JavaScript","type":"TOOL"}
 ],
 "relationships":[
-{"
-{"
+{"source":"Alice","target":"TypeScript","type":"USES"},
+{"source":"TypeScript","target":"JavaScript","type":"RELATED_TO"}
 ]
 }'
 ```
@@ -249,7 +251,7 @@ curl -s -X POST {{ENDPOINT}}/api/memory/ingest \
 **When to extract**: Always extract when content mentions specific people, tools, technologies, organizations, locations, or events by name. Skip only for abstract observations with no named subjects (e.g., "prefer tabs over spaces").
 
 **Entity fields**: `name` (required), `type` (required), `metadata` (optional properties object)
-**Relationship fields**: `
+**Relationship fields**: `source` (source entity name), `target` (target entity name), `type` (required). Legacy `{from, to}` aliases are also accepted by the pyx-cloud hosted wrapper at `memory.api.pyxmate.com` for backward compatibility, but `{source, target}` is canonical.
 
 ---
 
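Since the new text names `{source, target}` as canonical and `{from, to}` as a legacy alias accepted only by the hosted wrapper, a caller holding older payloads may prefer to convert them before sending. A minimal sketch of that conversion follows; the type names and `toCanonical` are illustrative and not part of the package.

```typescript
// Canonical shape, per the 0.7.0 "Relationship fields" line.
type Relationship = { source: string; target: string; type: string };
// Legacy alias shape ({from, to}), per the same line.
type LegacyRelationship = { from: string; to: string; type: string };

// Hypothetical helper: maps either shape onto the canonical one.
function toCanonical(rel: Relationship | LegacyRelationship): Relationship {
  if ("source" in rel) {
    return { source: rel.source, target: rel.target, type: rel.type };
  }
  return { source: rel.from, target: rel.to, type: rel.type };
}

// Example: a mixed list normalised before it goes into the ingest payload.
const relationships = [
  { from: "Alice", to: "TypeScript", type: "USES" },
  { source: "TypeScript", target: "JavaScript", type: "RELATED_TO" },
].map(toCanonical);
```

Converting on the client keeps payloads valid against a self-hosted server as well, where the diff gives no indication that the legacy aliases are accepted.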