@adverant/nexus-memory-skill 2.1.0 → 2.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/SKILL.md +139 -0
- package/hooks/upload-document.sh +377 -125
- package/package.json +1 -1
package/SKILL.md
CHANGED
@@ -418,6 +418,145 @@ curl -X POST "https://api.adverant.ai/fileprocess/api/process" \

 ---

+## Upload Document Hook (Full Knowledge Extraction)
+
+The `upload-document.sh` hook provides a streamlined way to upload files with **full auto-discovery and knowledge extraction** enabled by default.
+
+### Auto-Discovery Features (All Enabled by Default)
+
+When you upload a document, the system automatically:
+
+| Feature | Description | Accuracy |
+|---------|-------------|----------|
+| **Smart File Detection** | Magic byte detection for accurate MIME type | 100% |
+| **Intelligent Routing** | Auto-routes to MageAgent, VideoAgent, or CyberAgent | Automatic |
+| **3-Tier OCR Cascade** | Tesseract → GPT-4o Vision → Claude Opus | Auto-escalates |
+| **Layout Analysis** | Document structure preservation | 99.2% |
+| **Table Extraction** | Tables converted to structured data | 97.9% |
+| **Document DNA** | Triple-layer storage (semantic + structural + original) | Full fidelity |
+| **Entity Extraction** | People, places, organizations → Neo4j | Automatic |
+| **Vector Embeddings** | VoyageAI embeddings → Qdrant | Automatic |
+
+### Basic Usage
+
+```bash
+# Upload a single file
+~/.claude/hooks/upload-document.sh ./document.pdf
+
+# Upload and wait for processing to complete
+~/.claude/hooks/upload-document.sh ./book.pdf --wait
+
+# Upload with custom tags for easier recall
+~/.claude/hooks/upload-document.sh ./research.pdf --wait --tags=research,ai,papers
+```
+
+### Batch Upload (Multiple Files)
+
+```bash
+# Upload 3 books at once
+~/.claude/hooks/upload-document.sh book1.pdf book2.pdf book3.pdf --batch --wait
+
+# Upload entire directory of PDFs
+~/.claude/hooks/upload-document.sh ./docs/*.pdf --batch --wait --tags=documentation
+```
+
+### Processing Options
+
+| Flag | Description |
+|------|-------------|
+| `--wait` | Wait for processing to complete and show results |
+| `--batch` | Enable batch mode for multiple files |
+| `--tags=a,b,c` | Add custom tags for easier recall |
+| `--no-entities` | Skip entity extraction (faster, less rich) |
+| `--prefer-speed` | Use faster OCR (may reduce accuracy) |
+| `--poll-interval=N` | Poll interval in seconds (default: 5) |
+
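For illustration, the documented flags compose into a single invocation; the file and tag names below are placeholders, not values from the package:

```bash
# Hypothetical combined batch run using only the flags documented above:
# two files, custom tags, entity extraction skipped, faster OCR, 10-second polling.
~/.claude/hooks/upload-document.sh report-q1.pdf report-q2.pdf \
  --batch --wait --tags=reports,finance --no-entities --prefer-speed --poll-interval=10
```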
+### Supported File Types
+
+The upload hook supports **ALL file types** through intelligent routing:
+
+- **Documents**: PDF, DOCX, DOC, TXT, MD, HTML, CSV, XML, JSON
+- **Images**: JPEG, PNG, GIF, TIFF, WebP (with OCR)
+- **Videos**: MP4, MOV, AVI, MKV, WebM, FLV
+- **Archives**: ZIP, RAR, 7z, TAR, TAR.GZ, TAR.BZ2
+- **Geospatial**: GeoJSON, Shapefile, GeoTIFF, KML
+- **Point Cloud**: LAS, LAZ, PLY, PCD, E57
+- **Code**: Any programming language
+- **Any binary format** (routed to CyberAgent if suspicious)
+
+### Example Output (--wait mode)
+
+```
+[upload-document] Uploading: research-paper.pdf (12MB)
+Job ID: abc123-def456-ghi789
+
+╔══════════════════════════════════════════════════════════════╗
+║ PROCESSING COMPLETE: research-paper.pdf
+╚══════════════════════════════════════════════════════════════╝
+
+📄 Document Type: academic_paper
+📑 Pages: 42
+📝 Words: 15,230
+
+🔍 Auto-Discovery Results:
+  • OCR Tier Used: tesseract (text-based PDF)
+  • Tables Found: 8
+  • Entities: 127
+  • GraphRAG: true
+
+🏷️ Extracted Entities:
+  • Dr. Emily Chen (person)
+  • Stanford University (organization)
+  • NeurIPS 2024 (event)
+  • Transformer Architecture (concept)
+  ... and 123 more
+
+💡 To recall this content:
+  echo '{"query": "<your search>"}' | recall-memory.sh
+```
+
+### Recalling Uploaded Content
+
+After upload, documents are immediately searchable:
+
+```bash
+# Search by content
+echo '{"query": "transformer architecture research"}' | ~/.claude/hooks/recall-memory.sh
+
+# Search by entity
+echo '{"query": "papers by Dr. Emily Chen"}' | ~/.claude/hooks/recall-memory.sh
+
+# Search by tag
+echo '{"query": "research papers tagged ai"}' | ~/.claude/hooks/recall-memory.sh
+```
+
+### Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `NEXUS_API_KEY` | (required) | API key for authentication |
+| `NEXUS_API_URL` | `https://api.adverant.ai` | API endpoint |
+| `NEXUS_COMPANY_ID` | `adverant` | Company identifier |
+| `NEXUS_APP_ID` | `claude-code` | Application identifier |
+| `NEXUS_VERBOSE` | `0` | Set to `1` for debug output |
+
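For reference, a minimal shell setup before invoking the hook might look like the sketch below; the key value is a placeholder, and the other exports simply restate the documented defaults:

```bash
# Placeholder key; the remaining exports restate the defaults from the table above.
export NEXUS_API_KEY="YOUR_API_KEY"            # required, no default
export NEXUS_API_URL="https://api.adverant.ai"
export NEXUS_COMPANY_ID="adverant"
export NEXUS_APP_ID="claude-code"

~/.claude/hooks/upload-document.sh ./document.pdf --wait
```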
+### Troubleshooting
+
+**File too large:**
+Maximum file size is 5GB. For larger files, consider splitting or using the direct API.
+
+**Processing takes too long:**
+- Large PDFs with many images trigger OCR cascade
+- Use `--prefer-speed` for faster (but less accurate) processing
+- Increase `--poll-interval` for large batches
+
+**Entities not extracted:**
+- Ensure you didn't use `--no-entities`
+- Check NEXUS_VERBOSE=1 for detailed logs
+- Some file types don't support entity extraction
+
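When an upload misbehaves, one low-effort check (assuming the hook honors `NEXUS_VERBOSE` as documented above) is to re-run the single file with debug output enabled; the file name here is a placeholder:

```bash
# Re-run one problematic file with verbose logging enabled.
NEXUS_VERBOSE=1 ~/.claude/hooks/upload-document.sh ./scanned-report.pdf --wait
```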
+---
+
 ## Store Memory

 ```bash
package/hooks/upload-document.sh
CHANGED
@@ -1,7 +1,19 @@
 #!/bin/bash
 #
-# Nexus Memory - Upload Document Hook
-# Uploads documents to FileProcessAgent for intelligent processing
+# Nexus Memory - Upload Document Hook (v2.2.0)
+# Uploads documents to FileProcessAgent for intelligent processing with
+# FULL KNOWLEDGE EXTRACTION enabled by default.
+#
+# Auto-Discovery Features (enabled automatically):
+# - Smart file type detection via magic bytes
+# - Intelligent routing: MageAgent (docs), VideoAgent (video), CyberAgent (binaries)
+# - 3-tier OCR cascade: Tesseract → GPT-4o Vision → Claude Opus (auto-escalates)
+# - Layout analysis: 99.2% accuracy (Dockling-level)
+# - Table extraction: 97.9% accuracy
+# - Document DNA: Triple-layer storage (semantic + structural + original)
+# - Entity extraction → Neo4j knowledge graph
+# - Vector embeddings → Qdrant for semantic search
+# - Content findable via recall-memory.sh
 #
 # Supports ALL file types including:
 # - Documents: PDF, DOCX, TXT, MD, HTML, etc.
@@ -10,15 +22,21 @@
 # - Archives: ZIP, RAR, 7z, TAR, GZIP
 # - Geospatial: GeoJSON, Shapefile, GeoTIFF, KML (via intelligent routing)
 # - Point Cloud: LAS, LAZ, PLY, PCD, E57 (via intelligent routing)
+# - Code repositories: Automatically detected and processed
 # - Any other binary format (routed to appropriate processor)
 #
 # Usage:
-#   upload-document.sh <file_path> [
+#   upload-document.sh <file_path> [options]
+#   upload-document.sh <file1> <file2> ... --batch [options]
 #
 # Arguments:
-#   file_path
-#   --wait
-#   --poll-interval=N
+#   file_path            Path to the file(s) to upload (required)
+#   --wait               Wait for processing to complete and return results
+#   --poll-interval=N    Poll interval in seconds (default: 5)
+#   --batch              Process multiple files (list files before this flag)
+#   --tags=a,b,c         Add custom tags for recall (comma-separated)
+#   --no-entities        Skip entity extraction to knowledge graph
+#   --prefer-speed       Use faster OCR (may reduce accuracy for scanned docs)
 #
 # Environment Variables:
 #   NEXUS_API_KEY - API key for authentication (REQUIRED)
@@ -29,9 +47,10 @@
 #
 # Examples:
 #   upload-document.sh ./document.pdf
-#   upload-document.sh ./
+#   upload-document.sh ./book.pdf --wait
+#   upload-document.sh ./data.csv --wait --tags=dataset,sales
+#   upload-document.sh book1.pdf book2.pdf book3.pdf --batch --wait
 #   upload-document.sh ./video.mp4 --wait --poll-interval=10
-#   upload-document.sh ./pointcloud.las --wait
 #

 set -o pipefail
@@ -63,12 +82,27 @@ log_info() {
 }

 print_usage() {
-    echo "Usage: upload-document.sh <file_path> [
+    echo "Usage: upload-document.sh <file_path> [options]"
+    echo "       upload-document.sh <file1> <file2> ... --batch [options]"
     echo ""
     echo "Arguments:"
-    echo "  file_path            Path to the file to upload (required)"
+    echo "  file_path            Path to the file(s) to upload (required)"
     echo "  --wait               Wait for processing to complete"
     echo "  --poll-interval=N    Poll interval in seconds (default: 5)"
+    echo "  --batch              Process multiple files (list files before this flag)"
+    echo "  --tags=a,b,c         Add custom tags for recall (comma-separated)"
+    echo "  --no-entities        Skip entity extraction to knowledge graph"
+    echo "  --prefer-speed       Use faster OCR (may reduce accuracy)"
+    echo ""
+    echo "Auto-Discovery Features (enabled by default):"
+    echo "  • Smart file type detection via magic bytes"
+    echo "  • Intelligent routing: MageAgent, VideoAgent, CyberAgent"
+    echo "  • 3-tier OCR cascade (auto-escalates for quality)"
+    echo "  • Layout analysis (99.2% accuracy)"
+    echo "  • Table extraction (97.9% accuracy)"
+    echo "  • Entity extraction → Knowledge graph"
+    echo "  • Vector embeddings → Semantic search"
+    echo "  • Content findable via recall-memory.sh"
     echo ""
     echo "Supported file types:"
     echo "  Documents: PDF, DOCX, DOC, TXT, MD, HTML, CSV, XML, JSON"
@@ -77,13 +111,16 @@ print_usage() {
     echo "  Archives: ZIP, RAR, 7z, TAR, TAR.GZ, TAR.BZ2"
     echo "  Geospatial: GeoJSON, Shapefile, GeoTIFF, KML"
     echo "  Point Cloud: LAS, LAZ, PLY, PCD, E57"
+    echo "  Code: Any programming language"
     echo "  Any other binary format"
     echo ""
     echo "Maximum file size: 5GB"
     echo ""
     echo "Examples:"
     echo "  upload-document.sh ./document.pdf"
-    echo "  upload-document.sh ./
+    echo "  upload-document.sh ./book.pdf --wait"
+    echo "  upload-document.sh ./data.csv --wait --tags=dataset,sales"
+    echo "  upload-document.sh book1.pdf book2.pdf book3.pdf --batch --wait"
     echo "  upload-document.sh ./video.mp4 --wait --poll-interval=10"
 }

@@ -109,9 +146,13 @@ if ! command -v jq &> /dev/null; then
 fi

 # Parse arguments
+FILES=()
 WAIT_FOR_COMPLETION=0
 POLL_INTERVAL=5
+BATCH_MODE=0
+CUSTOM_TAGS=""
+EXTRACT_ENTITIES=1
+PREFER_SPEED=0

 while [[ $# -gt 0 ]]; do
     case $1 in
@@ -123,6 +164,22 @@ while [[ $# -gt 0 ]]; do
             POLL_INTERVAL="${1#*=}"
             shift
             ;;
+        --batch)
+            BATCH_MODE=1
+            shift
+            ;;
+        --tags=*)
+            CUSTOM_TAGS="${1#*=}"
+            shift
+            ;;
+        --no-entities)
+            EXTRACT_ENTITIES=0
+            shift
+            ;;
+        --prefer-speed)
+            PREFER_SPEED=1
+            shift
+            ;;
         --help|-h)
             print_usage
             exit 0
@@ -133,153 +190,348 @@ while [[ $# -gt 0 ]]; do
             exit 1
             ;;
         *)
-            else
-                log_error "Unexpected argument: $1"
-                print_usage
-                exit 1
-            fi
+            # Collect file paths
+            FILES+=("$1")
             shift
             ;;
     esac
 done

-# Validate
-if [[ -
-    log_error "
+# Validate files
+if [[ ${#FILES[@]} -eq 0 ]]; then
+    log_error "At least one file path is required"
     print_usage
     exit 1
 fi

+# If not batch mode but multiple files provided, error
+if [[ "$BATCH_MODE" == "0" ]] && [[ ${#FILES[@]} -gt 1 ]]; then
+    log_error "Multiple files require --batch flag"
+    log_error "Usage: upload-document.sh file1.pdf file2.pdf --batch"
     exit 1
 fi

-#
+# Validate all files exist
+for file in "${FILES[@]}"; do
+    if [[ ! -f "$file" ]]; then
+        log_error "File not found: $file"
+        exit 1
+    fi
+done

-#
-    exit 1
-fi
+# Build processing metadata JSON with hints for aggressive extraction
+build_metadata() {
+    local file_name="$1"
+    local tags_json="[]"

-#
+    # Convert comma-separated tags to JSON array
+    if [[ -n "$CUSTOM_TAGS" ]]; then
+        tags_json=$(echo "$CUSTOM_TAGS" | tr ',' '\n' | jq -R . | jq -s .)
+    fi

-#
-fi
+    # Determine OCR preference
+    local prefer_accuracy="true"
+    if [[ "$PREFER_SPEED" == "1" ]]; then
+        prefer_accuracy="false"
+    fi

-#
+    # Determine entity extraction
+    local extract_entities="true"
+    if [[ "$EXTRACT_ENTITIES" == "0" ]]; then
+        extract_entities="false"
+    fi

+    cat <<EOF
+{
+  "source": "nexus-memory-skill",
+  "version": "2.2.0",
+  "preferAccuracy": ${prefer_accuracy},
+  "forceEntityExtraction": ${extract_entities},
+  "storeInKnowledgeGraph": ${extract_entities},
+  "enableDocumentDNA": true,
+  "tags": ${tags_json},
+  "uploadedBy": "${USER:-unknown}",
+  "uploadedAt": "$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
+}
+EOF
+}

-#
+# Function to upload a single file
+upload_file() {
+    local FILE_PATH="$1"
+    local FILE_NAME=$(basename "$FILE_PATH")
+    local FILE_SIZE=$(wc -c < "$FILE_PATH" | tr -d ' ')
+    local FILE_SIZE_MB=$((FILE_SIZE / 1024 / 1024))
+
+    # Check file size (max 5GB)
+    local MAX_SIZE=$((5 * 1024 * 1024 * 1024))
+    if [[ "$FILE_SIZE" -gt "$MAX_SIZE" ]]; then
+        log_error "File too large: $FILE_NAME (${FILE_SIZE_MB}MB). Maximum size is 5GB."
+        return 1
+    fi

-    log "
+    log "File: $FILE_PATH"
+    log "Size: $FILE_SIZE bytes (${FILE_SIZE_MB}MB)"

-#
-    if [[ "$
+    # Display upload info
+    if [[ "$FILE_SIZE_MB" -gt 100 ]]; then
+        log_info "Uploading large file: $FILE_NAME (${FILE_SIZE_MB}MB) - this may take a while..."
+    else
+        log_info "Uploading: $FILE_NAME (${FILE_SIZE_MB}MB)"
+    fi
+
+    # Build metadata with processing hints
+    local METADATA=$(build_metadata "$FILE_NAME")
+    log "Metadata: $METADATA"
+
+    # Upload file via multipart form with metadata hints
+    log "Uploading to $FILEPROCESS_URL"
+
+    local RESPONSE=$(curl -s -w "\n%{http_code}" -X POST "$FILEPROCESS_URL" \
+        -H "Authorization: Bearer $NEXUS_API_KEY" \
+        -H "X-Company-ID: $COMPANY_ID" \
+        -H "X-App-ID: $APP_ID" \
+        -H "X-User-ID: ${USER:-unknown}" \
+        -F "file=@${FILE_PATH}" \
+        -F "userId=${USER:-unknown}" \
+        -F "metadata=${METADATA}" \
+        --max-time 600 2>&1)
+
+    # Parse response
+    local HTTP_CODE=$(echo "$RESPONSE" | tail -n1)
+    local BODY=$(echo "$RESPONSE" | sed '$d')
+
+    log "Response code: $HTTP_CODE"
+
+    # Check for upload errors (200, 201, 202 are all success codes)
+    if [[ "$HTTP_CODE" != "200" ]] && [[ "$HTTP_CODE" != "201" ]] && [[ "$HTTP_CODE" != "202" ]]; then
+        log_error "Failed to upload document (HTTP $HTTP_CODE)"
+        if [[ -n "$BODY" ]]; then
+            echo "$BODY" | jq . 2>/dev/null || echo "$BODY"
+        fi
+        return 1
+    fi
+
+    # Parse job ID from response
+    local JOB_ID=$(echo "$BODY" | jq -r '.jobId // empty')
+
+    if [[ -z "$JOB_ID" ]]; then
+        log_error "No job ID returned from upload"
         echo "$BODY" | jq . 2>/dev/null || echo "$BODY"
+        return 1
     fi
-    exit 1
-fi

-    exit 1
-fi
+    log_info "Document queued for processing"
+    echo "Job ID: $JOB_ID"

+    # Return job ID for tracking
+    echo "$JOB_ID"
+}

-# If not waiting, exit here
-if [[ "$WAIT_FOR_COMPLETION" == "0" ]]; then
+# Function to display detailed results
+display_results() {
+    local STATUS_RESPONSE="$1"
+    local FILE_NAME="$2"
+
     echo ""
-    echo "
-
-# Wait for processing to complete
-log_info "Waiting for processing to complete (polling every ${POLL_INTERVAL}s)..."
+    echo "╔══════════════════════════════════════════════════════════════╗"
+    echo "║ PROCESSING COMPLETE: $FILE_NAME"
+    echo "╚══════════════════════════════════════════════════════════════╝"
+    echo ""

+    # Extract key metrics from response
+    local ENTITY_COUNT=$(echo "$STATUS_RESPONSE" | jq -r '.result.entities // .entities // [] | length' 2>/dev/null)
+    local TABLE_COUNT=$(echo "$STATUS_RESPONSE" | jq -r '.result.tables // .tables // [] | length' 2>/dev/null)
+    local OCR_TIER=$(echo "$STATUS_RESPONSE" | jq -r '.result.ocrTier // .ocrTier // "auto"' 2>/dev/null)
+    local PAGE_COUNT=$(echo "$STATUS_RESPONSE" | jq -r '.result.pageCount // .pageCount // "N/A"' 2>/dev/null)
+    local WORD_COUNT=$(echo "$STATUS_RESPONSE" | jq -r '.result.wordCount // .wordCount // "N/A"' 2>/dev/null)
+    local DOC_TYPE=$(echo "$STATUS_RESPONSE" | jq -r '.result.documentType // .documentType // "unknown"' 2>/dev/null)
+    local GRAPHRAG_STORED=$(echo "$STATUS_RESPONSE" | jq -r '.result.storedInGraphRAG // .storedInGraphRAG // false' 2>/dev/null)
+
+    echo "📄 Document Type: $DOC_TYPE"
+    echo "📑 Pages: $PAGE_COUNT"
+    echo "📝 Words: $WORD_COUNT"
+    echo ""
+    echo "🔍 Auto-Discovery Results:"
+    echo "  • OCR Tier Used: $OCR_TIER"
+    echo "  • Tables Found: $TABLE_COUNT"
+    echo "  • Entities: $ENTITY_COUNT"
+    echo "  • GraphRAG: $GRAPHRAG_STORED"
+    echo ""

+    # Show extracted entities if any
+    if [[ "$ENTITY_COUNT" != "0" ]] && [[ "$ENTITY_COUNT" != "null" ]]; then
+        echo "🏷️ Extracted Entities:"
+        echo "$STATUS_RESPONSE" | jq -r '.result.entities // .entities // [] | .[:10][] | " • \(.name // .text) (\(.type // "entity"))"' 2>/dev/null
+        if [[ "$ENTITY_COUNT" -gt 10 ]]; then
+            echo "  ... and $((ENTITY_COUNT - 10)) more"
+        fi
+        echo ""
+    fi

-#
-        -H "X-App-ID: $APP_ID" \
-        -H "X-User-ID: ${USER:-unknown}" \
-        --max-time 30 2>/dev/null)
+    # Show recall command
+    echo "💡 To recall this content:"
+    echo "  echo '{\"query\": \"<your search>\"}' | recall-memory.sh"
+    echo ""

+    # Show full JSON if verbose
+    if [[ "$VERBOSE" == "1" ]]; then
+        echo "=== FULL RESPONSE ==="
+        echo "$STATUS_RESPONSE" | jq .
     fi
+}

+# Function to wait for job completion
+wait_for_job() {
+    local JOB_ID="$1"
+    local FILE_NAME="$2"
+
+    log_info "Waiting for processing to complete (polling every ${POLL_INTERVAL}s)..."
+
+    local MAX_WAIT=3600  # 1 hour max wait
+    local WAITED=0
+
+    while [[ "$WAITED" -lt "$MAX_WAIT" ]]; do
+        sleep "$POLL_INTERVAL"
+        WAITED=$((WAITED + POLL_INTERVAL))
+
+        # Check job status
+        local STATUS_RESPONSE=$(curl -s "$JOBS_URL/$JOB_ID" \
+            -H "Authorization: Bearer $NEXUS_API_KEY" \
+            -H "X-Company-ID: $COMPANY_ID" \
+            -H "X-App-ID: $APP_ID" \
+            -H "X-User-ID: ${USER:-unknown}" \
+            --max-time 30 2>/dev/null)
+
+        if [[ -z "$STATUS_RESPONSE" ]]; then
+            log "Waiting... (${WAITED}s elapsed)"
+            continue
+        fi
+
+        local JOB_STATE=$(echo "$STATUS_RESPONSE" | jq -r '.state // .status // empty')
+
+        case "$JOB_STATE" in
+            "completed"|"finished"|"success")
+                display_results "$STATUS_RESPONSE" "$FILE_NAME"
+                return 0
+                ;;
+            "failed"|"error")
+                log_error "Processing failed for: $FILE_NAME"
+                echo ""
+                echo "=== ERROR DETAILS ==="
+                echo "$STATUS_RESPONSE" | jq .
+                return 1
+                ;;
+            "waiting"|"active"|"processing"|"pending")
+                local PROGRESS=$(echo "$STATUS_RESPONSE" | jq -r '.progress // empty')
+                local STAGE=$(echo "$STATUS_RESPONSE" | jq -r '.stage // empty')
+                if [[ -n "$PROGRESS" ]] && [[ -n "$STAGE" ]]; then
+                    log "[$FILE_NAME] ${STAGE}: ${PROGRESS}% (${WAITED}s elapsed)"
+                elif [[ -n "$PROGRESS" ]]; then
+                    log "[$FILE_NAME] Processing... ${PROGRESS}% (${WAITED}s elapsed)"
+                else
+                    log "[$FILE_NAME] Processing... (${WAITED}s elapsed)"
+                fi
+                ;;
+            *)
+                log "[$FILE_NAME] Status: $JOB_STATE (${WAITED}s elapsed)"
+                ;;
+        esac
+    done
+
+    log_error "Timeout waiting for processing (${MAX_WAIT}s)"
+    echo "Job may still be processing. Check status manually:"
+    echo "curl -s \"$JOBS_URL/$JOB_ID\" | jq ."
+    return 1
+}

+# ============================================================================
+# MAIN EXECUTION
+# ============================================================================
+
+# Track all job IDs for batch mode
+JOB_IDS=()
+FILE_NAMES=()
+FAILED_UPLOADS=0
+
+# Display batch info
+if [[ "$BATCH_MODE" == "1" ]]; then
+    log_info "Batch mode: uploading ${#FILES[@]} files"
+    echo ""
+fi
+
+# Upload all files
+for file in "${FILES[@]}"; do
+    FILE_NAME=$(basename "$file")
+    FILE_NAMES+=("$FILE_NAME")
+
+    # Upload file and capture job ID (last line of output)
+    UPLOAD_OUTPUT=$(upload_file "$file" 2>&1)
+    UPLOAD_EXIT_CODE=$?
+
+    if [[ $UPLOAD_EXIT_CODE -eq 0 ]]; then
+        # Extract job ID from output (last non-empty line that looks like a job ID)
+        JOB_ID=$(echo "$UPLOAD_OUTPUT" | grep -E '^[a-f0-9-]+$' | tail -1)
+        if [[ -n "$JOB_ID" ]]; then
+            JOB_IDS+=("$JOB_ID")
+        else
+            # Try to extract from "Job ID: xxx" format
+            JOB_ID=$(echo "$UPLOAD_OUTPUT" | grep "Job ID:" | sed 's/Job ID: //' | tr -d ' ')
+            if [[ -n "$JOB_ID" ]]; then
+                JOB_IDS+=("$JOB_ID")
             fi
+        fi
+        echo "$UPLOAD_OUTPUT"
+    else
+        log_error "Failed to upload: $FILE_NAME"
+        echo "$UPLOAD_OUTPUT"
+        FAILED_UPLOADS=$((FAILED_UPLOADS + 1))
+    fi
+
+    # Add spacing between files in batch mode
+    if [[ "$BATCH_MODE" == "1" ]]; then
+        echo ""
+    fi
 done

-    echo "
+# Summary for batch mode
+if [[ "$BATCH_MODE" == "1" ]]; then
+    echo "╔══════════════════════════════════════════════════════════════╗"
+    echo "║ UPLOAD SUMMARY ║"
+    echo "╚══════════════════════════════════════════════════════════════╝"
+    echo "Total files: ${#FILES[@]}"
+    echo "Uploaded: ${#JOB_IDS[@]}"
+    echo "Failed: $FAILED_UPLOADS"
+    echo ""
+fi
+
+# If not waiting, exit here
+if [[ "$WAIT_FOR_COMPLETION" == "0" ]]; then
+    if [[ ${#JOB_IDS[@]} -gt 0 ]]; then
+        echo "To check status:"
+        for i in "${!JOB_IDS[@]}"; do
+            echo "  curl -s \"$JOBS_URL/${JOB_IDS[$i]}\" | jq . # ${FILE_NAMES[$i]}"
+        done
+    fi
+    exit $FAILED_UPLOADS
+fi
+
+# Wait for all jobs to complete
+FAILED_JOBS=0
+for i in "${!JOB_IDS[@]}"; do
+    JOB_ID="${JOB_IDS[$i]}"
+    FILE_NAME="${FILE_NAMES[$i]}"
+
+    if ! wait_for_job "$JOB_ID" "$FILE_NAME"; then
+        FAILED_JOBS=$((FAILED_JOBS + 1))
+    fi
+done
+
+# Final exit code
+TOTAL_FAILURES=$((FAILED_UPLOADS + FAILED_JOBS))
+if [[ "$TOTAL_FAILURES" -gt 0 ]]; then
+    log_error "$TOTAL_FAILURES file(s) failed to process"
+    exit 1
+fi
+
+exit 0
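As a side note on the `build_metadata` helper added above: it turns the comma-separated `--tags` value into a JSON array with `jq`. Run in isolation (jq required), that transformation looks roughly like this; the tag values are placeholders:

```bash
# Standalone sketch of the tag conversion used by build_metadata above.
CUSTOM_TAGS="research,ai,papers"
tags_json=$(echo "$CUSTOM_TAGS" | tr ',' '\n' | jq -R . | jq -s .)
echo "$tags_json"   # a JSON array: ["research", "ai", "papers"] (pretty-printed)
```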
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@adverant/nexus-memory-skill",
-  "version": "2.1.0",
+  "version": "2.2.0",
   "description": "Claude Code skill for persistent memory via Nexus GraphRAG - store and recall memories across all sessions and projects",
   "main": "SKILL.md",
   "type": "module",