@pentatonic-ai/ai-agent-sdk 0.6.0 → 0.7.0

Files changed (94)
  1. package/README.md +170 -69
  2. package/bin/__tests__/callback-server.test.js +4 -1
  3. package/bin/cli.js +41 -164
  4. package/bin/commands/config.js +251 -0
  5. package/package.json +2 -1
  6. package/packages/doctor/__tests__/detect.test.js +2 -6
  7. package/packages/doctor/src/checks/local-memory.js +164 -196
  8. package/packages/doctor/src/detect.js +11 -3
  9. package/packages/memory/src/corpus/adapters.js +104 -0
  10. package/packages/memory/src/corpus/cli.js +72 -7
  11. package/packages/memory/src/corpus/index.js +1 -1
  12. package/packages/memory-engine/.env.example +13 -0
  13. package/packages/memory-engine/README.md +131 -0
  14. package/packages/memory-engine/bench/README.md +99 -0
  15. package/packages/memory-engine/bench/scorecards-engine/agent-coding__pentatonic-baseline__20260427-142523.json +1115 -0
  16. package/packages/memory-engine/bench/scorecards-engine/chat-recall__pentatonic-baseline__20260427-142648.json +819 -0
  17. package/packages/memory-engine/bench/scorecards-engine/circular-economy__pentatonic-baseline__20260427-142757.json +1278 -0
  18. package/packages/memory-engine/bench/scorecards-engine/customer-support__pentatonic-baseline__20260427-142900.json +1018 -0
  19. package/packages/memory-engine/bench/scorecards-engine/marketplace-ops__pentatonic-baseline__20260427-142957.json +1038 -0
  20. package/packages/memory-engine/bench/scorecards-engine/product-catalogue__pentatonic-baseline__20260427-143122.json +961 -0
  21. package/packages/memory-engine/bench/scorecards-engine-via-docker/agent-coding__pentatonic-memory__20260427-161812.json +1115 -0
  22. package/packages/memory-engine/bench/scorecards-engine-via-docker/chat-recall__pentatonic-memory__20260427-161701.json +819 -0
  23. package/packages/memory-engine/bench/scorecards-engine-via-docker/circular-economy__pentatonic-memory__20260427-161713.json +1278 -0
  24. package/packages/memory-engine/bench/scorecards-engine-via-docker/customer-support__pentatonic-memory__20260427-161723.json +1018 -0
  25. package/packages/memory-engine/bench/scorecards-engine-via-docker/marketplace-ops__pentatonic-memory__20260427-161732.json +1038 -0
  26. package/packages/memory-engine/bench/scorecards-engine-via-docker/product-catalogue__pentatonic-memory__20260427-161741.json +937 -0
  27. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/agent-coding__pentatonic-memory__20260427-184718.json +1115 -0
  28. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/chat-recall__pentatonic-memory__20260427-184614.json +819 -0
  29. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/circular-economy__pentatonic-memory__20260427-184809.json +1278 -0
  30. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/customer-support__pentatonic-memory__20260427-184854.json +1018 -0
  31. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/marketplace-ops__pentatonic-memory__20260427-184929.json +1038 -0
  32. package/packages/memory-engine/bench/scorecards-engine-via-l2-7-layer-populated/product-catalogue__pentatonic-memory__20260427-185015.json +961 -0
  33. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/agent-coding__pentatonic-memory__20260427-175252.json +1115 -0
  34. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/chat-recall__pentatonic-memory__20260427-175312.json +819 -0
  35. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/circular-economy__pentatonic-memory__20260427-175335.json +1278 -0
  36. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/customer-support__pentatonic-memory__20260427-175355.json +1018 -0
  37. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/marketplace-ops__pentatonic-memory__20260427-175413.json +1038 -0
  38. package/packages/memory-engine/bench/scorecards-engine-via-l2-empty-layers/product-catalogue__pentatonic-memory__20260427-175430.json +883 -0
  39. package/packages/memory-engine/bench/scorecards-engine-via-shim/agent-coding__pentatonic-memory__20260427-155409.json +1115 -0
  40. package/packages/memory-engine/bench/scorecards-engine-via-shim/chat-recall__pentatonic-memory__20260427-155421.json +819 -0
  41. package/packages/memory-engine/bench/scorecards-engine-via-shim/circular-economy__pentatonic-memory__20260427-155433.json +1278 -0
  42. package/packages/memory-engine/bench/scorecards-engine-via-shim/customer-support__pentatonic-memory__20260427-155443.json +1018 -0
  43. package/packages/memory-engine/bench/scorecards-engine-via-shim/marketplace-ops__pentatonic-memory__20260427-155453.json +1038 -0
  44. package/packages/memory-engine/bench/scorecards-engine-via-shim/product-catalogue__pentatonic-memory__20260427-155503.json +937 -0
  45. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/agent-coding__pentatonic-memory-latest__20260427-145103.json +1115 -0
  46. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/agent-coding__pentatonic-memory__20260427-144909.json +1115 -0
  47. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/chat-recall__pentatonic-memory-latest__20260427-145153.json +819 -0
  48. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/chat-recall__pentatonic-memory__20260427-145120.json +542 -0
  49. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/circular-economy__pentatonic-memory-latest__20260427-145313.json +1278 -0
  50. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/circular-economy__pentatonic-memory__20260427-145207.json +894 -0
  51. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/customer-support__pentatonic-memory-latest__20260427-145412.json +1018 -0
  52. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/customer-support__pentatonic-memory__20260427-145327.json +680 -0
  53. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/marketplace-ops__pentatonic-memory-latest__20260427-145517.json +1038 -0
  54. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/marketplace-ops__pentatonic-memory__20260427-145422.json +693 -0
  55. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/product-catalogue__pentatonic-memory-latest__20260427-145616.json +961 -0
  56. package/packages/memory-engine/bench/scorecards-pentatonic-baseline/product-catalogue__pentatonic-memory__20260427-145528.json +727 -0
  57. package/packages/memory-engine/compat/Dockerfile +11 -0
  58. package/packages/memory-engine/compat/server.py +680 -0
  59. package/packages/memory-engine/docker-compose.yml +243 -0
  60. package/packages/memory-engine/docs/MIGRATION.md +178 -0
  61. package/packages/memory-engine/docs/RUNBOOK-AWS.md +375 -0
  62. package/packages/memory-engine/docs/why-v05-underperforms.md +138 -0
  63. package/packages/memory-engine/engine/README.md +52 -0
  64. package/packages/memory-engine/engine/l2-hybridrag-proxy.py +1543 -0
  65. package/packages/memory-engine/engine/l5-comms-layer.py +663 -0
  66. package/packages/memory-engine/engine/l6-document-store.py +1018 -0
  67. package/packages/memory-engine/engine/services/l2/Dockerfile +41 -0
  68. package/packages/memory-engine/engine/services/l2/init_databases.py +81 -0
  69. package/packages/memory-engine/engine/services/l2/l2-hybridrag-proxy.py +1543 -0
  70. package/packages/memory-engine/engine/services/l4/Dockerfile +15 -0
  71. package/packages/memory-engine/engine/services/l4/server.py +235 -0
  72. package/packages/memory-engine/engine/services/l5/Dockerfile +9 -0
  73. package/packages/memory-engine/engine/services/l5/l5-comms-layer.py +678 -0
  74. package/packages/memory-engine/engine/services/l6/Dockerfile +11 -0
  75. package/packages/memory-engine/engine/services/l6/l6-document-store.py +1016 -0
  76. package/packages/memory-engine/engine/services/nv-embed/Dockerfile +28 -0
  77. package/packages/memory-engine/engine/services/nv-embed/server.py +152 -0
  78. package/packages/memory-engine/pme_memory/__init__.py +0 -0
  79. package/packages/memory-engine/pme_memory/__main__.py +129 -0
  80. package/packages/memory-engine/pme_memory/artifacts.py +95 -0
  81. package/packages/memory-engine/pme_memory/embed.py +74 -0
  82. package/packages/memory-engine/pme_memory/health.py +36 -0
  83. package/packages/memory-engine/pme_memory/hygiene.py +159 -0
  84. package/packages/memory-engine/pme_memory/indexer.py +200 -0
  85. package/packages/memory-engine/pme_memory/needs.py +55 -0
  86. package/packages/memory-engine/pme_memory/provenance.py +80 -0
  87. package/packages/memory-engine/pme_memory/scoring.py +168 -0
  88. package/packages/memory-engine/pme_memory/search.py +52 -0
  89. package/packages/memory-engine/pme_memory/store.py +86 -0
  90. package/packages/memory-engine/pme_memory/synthesis.py +114 -0
  91. package/packages/memory-engine/pyproject.toml +65 -0
  92. package/packages/memory-engine/scripts/kg-extractor.py +557 -0
  93. package/packages/memory-engine/scripts/kg-preflexor-v2.py +738 -0
  94. package/packages/memory-engine/tests/test_api_contract.sh +57 -0
@@ -0,0 +1,375 @@
1
+ # pentatonic-memory-engine — AWS deployment runbook (v1)
2
+
3
+ **Target:** single EC2 (`m6i.2xlarge`) in `us-east-1`, network-boundary auth via Cloudflare Tunnel.
4
+ **Operator:** Phil Hauser (or anyone with `AdministratorAccess` to account `170649632502`).
5
+ **Estimated time end-to-end:** ~45 minutes (mostly waiting for instance/volume provisioning).
6
+
7
+ ---
8
+
9
+ ## 0. Prerequisites
10
+
11
+ Before starting, verify:
12
+
13
+ ```bash
14
+ aws sts get-caller-identity
15
+ # Should return Account: 170649632502, AdministratorAccess role
16
+
17
+ aws configure get region
18
+ # us-east-1
19
+ ```
20
+
21
+ If region isn't set: `export AWS_REGION=us-east-1` for the rest of the session.
22
+
23
+ You'll also need:
24
+ - A **Cloudflare account** with access to the Pentatonic CF zone (for Tunnel setup)
25
+ - The **`pentatonic-ai-gateway` API key** (from lambda.dev — should already exist)
26
+
27
+ ---
28
+
29
+ ## 1. Variables (paste once, reuse below)
30
+
31
+ ```bash
32
+ export AWS_REGION=us-east-1
33
+ export ENV=prod
34
+ export NAME=pme-${ENV}-us-east-1
35
+ export INSTANCE_TYPE=m6i.2xlarge
36
+ # Latest Ubuntu 22.04 LTS in us-east-1 (verify via aws ec2 describe-images if needed)
37
+ export AMI_ID=$(aws ec2 describe-images \
38
+ --owners 099720109477 \
39
+ --filters "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*" \
40
+ --query 'Images | sort_by(@, &CreationDate) | [-1].ImageId' \
41
+ --output text)
42
+ echo "Using AMI: $AMI_ID"
43
+ ```
44
+
45
+ ---
46
+
47
+ ## 2. Networking
48
+
49
+ Use the default VPC for v1. (Multi-VPC isolation is a v2 concern.)
50
+
51
+ ```bash
52
+ export VPC_ID=$(aws ec2 describe-vpcs \
53
+ --filters "Name=is-default,Values=true" \
54
+ --query 'Vpcs[0].VpcId' --output text)
55
+
56
+ export SUBNET_ID=$(aws ec2 describe-subnets \
57
+ --filters "Name=vpc-id,Values=$VPC_ID" "Name=default-for-az,Values=true" \
58
+ --query 'Subnets[0].SubnetId' --output text)
59
+
60
+ echo "VPC=$VPC_ID Subnet=$SUBNET_ID"
61
+ ```
62
+
63
+ ### 2.1 Security group
64
+
65
+ No public ingress. Outbound is limited to the ports authorized below, covering Tunnel, gateway, apt, and DNS.
66
+
67
+ ```bash
68
+ export SG_ID=$(aws ec2 create-security-group \
69
+ --group-name $NAME-sg \
70
+ --description "pentatonic-memory-engine $ENV - outbound only; ingress via SSM" \
71
+ --vpc-id $VPC_ID \
72
+ --query 'GroupId' --output text)
73
+
74
+ # Outbound is allowed by default. Strip default outbound and re-add explicitly.
75
+ aws ec2 revoke-security-group-egress \
76
+ --group-id $SG_ID \
77
+ --ip-permissions '[{"IpProtocol":"-1","IpRanges":[{"CidrIp":"0.0.0.0/0"}]}]'
78
+
79
+ aws ec2 authorize-security-group-egress --group-id $SG_ID \
80
+ --ip-permissions '[
81
+ {"IpProtocol":"tcp","FromPort":443,"ToPort":443,"IpRanges":[{"CidrIp":"0.0.0.0/0","Description":"HTTPS for tunnel + gateway + apt"}]},
82
+ {"IpProtocol":"tcp","FromPort":80, "ToPort":80, "IpRanges":[{"CidrIp":"0.0.0.0/0","Description":"HTTP for apt fallback"}]},
83
+ {"IpProtocol":"udp","FromPort":53, "ToPort":53, "IpRanges":[{"CidrIp":"0.0.0.0/0","Description":"DNS"}]},
84
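+ {"IpProtocol":"tcp","FromPort":53, "ToPort":53, "IpRanges":[{"CidrIp":"0.0.0.0/0","Description":"DNS-over-TCP"}]},
+ {"IpProtocol":"tcp","FromPort":7844,"ToPort":7844,"IpRanges":[{"CidrIp":"0.0.0.0/0","Description":"cloudflared tunnel (http2)"}]},
+ {"IpProtocol":"udp","FromPort":7844,"ToPort":7844,"IpRanges":[{"CidrIp":"0.0.0.0/0","Description":"cloudflared tunnel (quic)"}]}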
85
+ ]'
86
+
87
+ echo "SG=$SG_ID"
88
+ ```
89
+
90
+ **No inbound rule.** Ops access happens via SSM Session Manager (next step), not SSH.
91
+
92
+ ---
93
+
94
+ ## 3. IAM role for SSM Session Manager + EBS snapshot agent
95
+
96
+ Lets you `aws ssm start-session` into the box without an SSH key.
97
+
98
+ ```bash
99
+ aws iam create-role --role-name $NAME-role \
100
+ --assume-role-policy-document '{
101
+ "Version":"2012-10-17",
102
+ "Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]
103
+ }'
104
+
105
+ aws iam attach-role-policy --role-name $NAME-role \
106
+ --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
107
+
108
+ aws iam create-instance-profile --instance-profile-name $NAME-profile
109
+
110
+ aws iam add-role-to-instance-profile \
111
+ --instance-profile-name $NAME-profile \
112
+ --role-name $NAME-role
113
+
114
+ # Wait for IAM eventual-consistency before launching EC2
115
+ sleep 10
116
+ ```
117
+
118
+ ---
119
+
120
+ ## 4. EBS volumes
121
+
122
+ Five `gp3` volumes, 50 GiB each (resize online later if needed). One per layer's data dir.
123
+
124
+ ```bash
125
+ export AZ=$(aws ec2 describe-subnets --subnet-ids $SUBNET_ID \
126
+ --query 'Subnets[0].AvailabilityZone' --output text)
127
+
128
+ for layer in l2 l3 l4 l5 l6; do
129
+ vol_id=$(aws ec2 create-volume \
130
+ --availability-zone $AZ \
131
+ --size 50 --volume-type gp3 \
132
+ --tag-specifications "ResourceType=volume,Tags=[{Key=Name,Value=$NAME-$layer},{Key=pme-layer,Value=$layer}]" \
133
+ --query 'VolumeId' --output text)
134
+ echo "$layer = $vol_id"
135
+ eval "export VOL_${layer}=$vol_id"
136
+ done
137
+
138
+ # Wait until all are 'available'
139
+ aws ec2 wait volume-available --volume-ids $VOL_l2 $VOL_l3 $VOL_l4 $VOL_l5 $VOL_l6
140
+ echo "All volumes available."
141
+ ```
142
+
143
+ ---
144
+
145
+ ## 5. Launch the EC2
146
+
147
+ ```bash
148
+ # User data: format the EBS volumes on first boot, install docker, mount.
149
+ cat > /tmp/userdata.sh <<'EOF'
150
+ #!/bin/bash
151
+ set -euxo pipefail
152
+
153
+ apt-get update
154
+ apt-get install -y docker.io docker-compose-v2 git xfsprogs
155
+
156
+ # Wait for EBS volumes to attach (they're attached just after instance launch by AWS CLI below)
157
+ for layer in l2 l3 l4 l5 l6; do
158
+ for i in {1..30}; do
159
+ if [ "$(lsblk -dno NAME,TYPE | awk '$2=="disk"' | wc -l)" -ge 6 ]; then # root disk + 5 data volumes visible
160
+ break
161
+ fi
162
+ sleep 2
163
+ done
164
+ done
165
+
166
+ # Find each volume by tag (we'll attach by device name below; this just creates mount points)
167
+ mkdir -p /var/lib/pme/{l2,l3,l4,l5,l6}
168
+
169
+ # Format + mount each: done via /etc/fstab in step 6 below
170
+
171
+ systemctl enable --now docker
172
+
173
+ # Pull engine repo
174
+ cd /opt
175
+ git clone https://github.com/Pentatonic-Ltd/memory_stack_updated.git engine
176
+ chown -R ubuntu:ubuntu /opt/engine
177
+ EOF
178
+
179
+ export INSTANCE_ID=$(aws ec2 run-instances \
180
+ --image-id $AMI_ID \
181
+ --instance-type $INSTANCE_TYPE \
182
+ --subnet-id $SUBNET_ID \
183
+ --security-group-ids $SG_ID \
184
+ --iam-instance-profile Name=$NAME-profile \
185
+ --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=30,VolumeType=gp3}' \
186
+ --metadata-options 'HttpTokens=required,HttpEndpoint=enabled' \
187
+ --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$NAME}]" \
188
+ --user-data file:///tmp/userdata.sh \
189
+ --query 'Instances[0].InstanceId' --output text)
190
+
191
+ aws ec2 wait instance-running --instance-ids $INSTANCE_ID
192
+ echo "Instance $INSTANCE_ID is running."
193
+ ```
194
+
195
+ ### 5.1 Attach EBS volumes
196
+
197
+ ```bash
198
+ aws ec2 attach-volume --volume-id $VOL_l2 --instance-id $INSTANCE_ID --device /dev/xvdf
199
+ aws ec2 attach-volume --volume-id $VOL_l3 --instance-id $INSTANCE_ID --device /dev/xvdg
200
+ aws ec2 attach-volume --volume-id $VOL_l4 --instance-id $INSTANCE_ID --device /dev/xvdh
201
+ aws ec2 attach-volume --volume-id $VOL_l5 --instance-id $INSTANCE_ID --device /dev/xvdi
202
+ aws ec2 attach-volume --volume-id $VOL_l6 --instance-id $INSTANCE_ID --device /dev/xvdj
203
+
204
+ # Wait for all to attach
205
+ for v in $VOL_l2 $VOL_l3 $VOL_l4 $VOL_l5 $VOL_l6; do
206
+ aws ec2 wait volume-in-use --volume-ids $v
207
+ done
208
+ echo "All volumes attached."
209
+ ```
210
+
211
+ ---
212
+
213
+ ## 6. Mount EBS volumes inside the EC2
214
+
215
+ Connect via SSM Session Manager:
216
+
217
+ ```bash
218
+ aws ssm start-session --target $INSTANCE_ID
219
+ ```
220
+
221
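+ Then inside the instance (on Nitro-based types such as `m6i`, EBS volumes surface as NVMe devices like `/dev/nvme1n1`; run `lsblk` first and adjust the device names below if `/dev/xvdf` through `/dev/xvdj` do not exist):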
222
+
223
+ ```bash
224
+ # Format each volume (one-time)
225
+ for pair in xvdf:l2 xvdg:l3 xvdh:l4 xvdi:l5 xvdj:l6; do
226
+ dev=${pair%:*}; layer=${pair#*:}
227
+ if ! sudo blkid /dev/$dev >/dev/null 2>&1; then
228
+ sudo mkfs.xfs -L $layer /dev/$dev
229
+ fi
230
+ done
231
+
232
+ # Add to /etc/fstab and mount
233
+ for pair in xvdf:l2 xvdg:l3 xvdh:l4 xvdi:l5 xvdj:l6; do
234
+ dev=${pair%:*}; layer=${pair#*:}
235
+ uuid=$(sudo blkid -s UUID -o value /dev/$dev)
236
+ sudo mkdir -p /var/lib/pme/$layer
237
+ echo "UUID=$uuid /var/lib/pme/$layer xfs defaults,nofail 0 2" | sudo tee -a /etc/fstab
238
+ done
239
+
240
+ sudo systemctl daemon-reload
241
+ sudo mount -a
242
+ df -h /var/lib/pme/*
243
+ # All five should show ~50G mounted, 49G available.
244
+ ```
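Because Nitro instances expose EBS volumes as NVMe devices, the device name requested at attach time is not a reliable handle inside the guest. A minimal sketch of an alternative lookup, assuming the guest's udev rules create the usual `/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_<serial>` symlinks (where the serial is the volume ID without its hyphen). The helper name and the sample volume ID are hypothetical, not part of the runbook.

```python
def nvme_by_id_path(volume_id: str) -> str:
    """Return the by-id symlink expected for an EBS volume on a Nitro instance.

    Assumption: udev names the link after the NVMe serial, which is the
    volume ID with its single hyphen removed (vol-0abc -> vol0abc).
    """
    if not volume_id.startswith("vol-"):
        raise ValueError(f"not an EBS volume ID: {volume_id!r}")
    serial = volume_id.replace("-", "", 1)
    return f"/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_{serial}"


if __name__ == "__main__":
    # Hypothetical volume ID; substitute the values echoed in step 4.
    print("l2", nvme_by_id_path("vol-0123456789abcdef0"))
```

The resulting path can stand in for `/dev/$dev` in the format and fstab loops once you have confirmed with `ls -l /dev/disk/by-id/` that the symlink exists.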
245
+
246
+ ---
247
+
248
+ ## 7. Cloudflare Tunnel setup
249
+
250
+ In the Cloudflare dashboard:
251
+
252
+ 1. **Zero Trust → Networks → Tunnels → Create a tunnel** (Cloudflared connector type)
253
+ 2. Name: `engine-prod-us-east-1`
254
+ 3. Save → copy the **tunnel token** (the `eyJ...` string).
255
+ 4. **Public hostnames** tab → Add:
256
+ - Subdomain: `engine`
257
+ - Domain: `pentatonic.internal` (or whatever internal CF zone you use)
258
+ - Type: HTTP, URL: `compat:8099`
259
+
260
+ You'll set this token as `CLOUDFLARED_TUNNEL_TOKEN` in `.env` below.
261
+
262
+ > The hostname is reachable only by Workers/services in the same Cloudflare account by default. If you want to lock down further, attach a **Cloudflare Access policy** requiring a service token on the hostname — then set the service-token header in TES Workers' fetch calls. Optional for v1; can layer on later.
263
+
264
+ ---
265
+
266
+ ## 8. Configure and bring up the engine
267
+
268
+ Back in the SSM session on the EC2:
269
+
270
+ ```bash
271
+ cd /opt/engine
272
+
273
+ # Pull the AWS overlay (PR'd separately to memory_stack_updated; for now copy it manually)
274
+ # Once merged upstream, this file is part of the repo.
275
+ sudo curl -fL -o docker-compose.aws.yml \
276
+ https://raw.githubusercontent.com/Pentatonic-Ltd/memory_stack_updated/main/docker-compose.aws.yml
277
+
278
+ # Generate Neo4j password
279
+ NEO4J_PASSWORD=$(openssl rand -base64 24 | tr -d '/+=')
280
+
281
+ # Write .env (substitute values)
282
+ sudo tee .env >/dev/null <<EOF
283
+ PME_PORT=8099
284
+ NV_EMBED_URL=https://gateway.pentatonic.ai/v1/embeddings # confirm exact URL with the gateway team
285
+ PENTATONIC_AI_GATEWAY_KEY=<paste from secret store>
286
+ CLOUDFLARED_TUNNEL_TOKEN=<paste from CF dashboard>
287
+ NEO4J_PASSWORD=$NEO4J_PASSWORD
288
+ EOF
289
+
290
+ sudo chmod 600 .env
291
+
292
+ # Bring up the stack
293
+ sudo docker compose -f docker-compose.yml -f docker-compose.aws.yml up -d
294
+ sudo docker compose ps
295
+ ```
296
+
297
+ First run pulls images (~3-5 min) and builds engine images (~10-15 min). Subsequent restarts are fast.
298
+
299
+ ---
300
+
301
+ ## 9. Smoke test
302
+
303
+ From your laptop or any TES dev environment with access to the CF zone:
304
+
305
+ ```bash
306
+ curl -sf https://engine.pentatonic.internal/health | jq
307
+ # Expected: {"status":"ok","layers":{"l0":"ok",...,"l6":"ok"},"engine":"pentatonic-memory-engine"}
308
+
309
+ curl -sX POST https://engine.pentatonic.internal/store \
310
+ -H "content-type: application/json" \
311
+ -d '{"content":"hello from runbook smoke test","metadata":{"arena":"smoke"}}'
312
+
313
+ curl -sX POST https://engine.pentatonic.internal/search \
314
+ -H "content-type: application/json" \
315
+ -d '{"query":"hello","limit":3,"min_score":0.001}' | jq
316
+ ```
317
+
318
+ If `/search` returns the row from `/store`, end-to-end works.
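For an automated probe, the `/health` payload can be checked programmatically rather than by eye. The sketch below assumes only the response shape shown in the expected output above (a top-level `status` plus a `layers` map of name to state); the function names and the `"down"` state in the usage note are illustrative, not taken from the engine.

```python
import json


def failing_layers(health_body: str) -> list[str]:
    """Return the names of layers whose reported state is not 'ok'."""
    payload = json.loads(health_body)
    layers = payload.get("layers", {})
    return sorted(name for name, state in layers.items() if state != "ok")


def is_healthy(health_body: str) -> bool:
    """True when the top-level status is 'ok' and no layer reports a problem."""
    payload = json.loads(health_body)
    return payload.get("status") == "ok" and not failing_layers(health_body)
```

Feeding it the body returned by `curl -sf .../health` gives a boolean suitable for a cron job or Lambda probe; `failing_layers` names the layer to look at first when a layer reports something like `"down"`.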
319
+
320
+ ---
321
+
322
+ ## 10. AWS Backup
323
+
324
+ ```bash
325
+ # Tag all volumes for the backup plan
326
+ for v in $VOL_l2 $VOL_l3 $VOL_l4 $VOL_l5 $VOL_l6; do
327
+ aws ec2 create-tags --resources $v --tags Key=Backup,Value=daily
328
+ done
329
+
330
+ # Backup plan: nightly snapshot, 14-day retention.
331
+ # Easiest: AWS Backup console → Plan → "DailyBackup14Day" → resource selection by tag Backup=daily.
332
+ # Or via CLI — see https://docs.aws.amazon.com/aws-backup/latest/devguide/creating-a-backup-plan.html
333
+ ```
334
+
335
+ Run the restore drill at least once before going live: spin up a sibling instance, attach restored volumes, confirm engine comes back healthy.
336
+
337
+ ---
338
+
339
+ ## 11. CloudWatch alarms (recommended, not strictly v1)
340
+
341
+ - EC2 instance status check failed → SNS alert
342
+ - EBS volume usage > 80% → SNS alert
343
+ - Engine `/health` failure (custom Lambda probe via the tunnel) → SNS alert
344
+
345
+ ---
346
+
347
+ ## 12. Resource summary
348
+
349
+ | Resource | Identifier (filled at runtime) |
350
+ |---|---|
351
+ | Instance | `$INSTANCE_ID` (m6i.2xlarge) |
352
+ | VPC / Subnet | `$VPC_ID` / `$SUBNET_ID` |
353
+ | Security group | `$SG_ID` |
354
+ | IAM role / profile | `$NAME-role` / `$NAME-profile` |
355
+ | EBS volumes | `$VOL_l2 $VOL_l3 $VOL_l4 $VOL_l5 $VOL_l6` (50 GiB gp3 each) |
356
+ | Cloudflare Tunnel | `engine-prod-us-east-1` → `engine.pentatonic.internal` |
357
+
358
+ Estimated v1 cost: **~$340/mo on-demand** (instance) + **~$20/mo** (5×50 GiB gp3) + AWS Backup snapshots (~$5-10/mo at 14-day retention) + data transfer (negligible from CF Tunnel).
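The arithmetic behind this estimate can be reproduced as a quick sanity check. In the sketch below the instance and backup figures are the runbook's own estimates; the gp3 rate of $0.08 per GiB-month is an assumption chosen because it reproduces the ~$20/mo figure, so verify it against current pricing.

```python
INSTANCE_MONTHLY_USD = 340.0        # runbook's on-demand estimate
GP3_USD_PER_GIB_MONTH = 0.08        # assumption: verify against current pricing
VOLUME_COUNT = 5
VOLUME_SIZE_GIB = 50
BACKUP_MONTHLY_USD = (5.0, 10.0)    # runbook's snapshot estimate (low, high)


def ebs_monthly_usd() -> float:
    """Storage cost for the five data volumes."""
    return round(VOLUME_COUNT * VOLUME_SIZE_GIB * GP3_USD_PER_GIB_MONTH, 2)


def total_monthly_usd() -> tuple[float, float]:
    """Low and high monthly totals, excluding data transfer."""
    base = INSTANCE_MONTHLY_USD + ebs_monthly_usd()
    return (base + BACKUP_MONTHLY_USD[0], base + BACKUP_MONTHLY_USD[1])


if __name__ == "__main__":
    low, high = total_monthly_usd()
    print(f"EBS: ${ebs_monthly_usd():.2f}/mo, total: ${low:.0f}-${high:.0f}/mo")
```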
359
+
360
+ ---
361
+
362
+ ## Teardown (if you need to recreate)
363
+
364
+ ```bash
365
+ aws ec2 terminate-instances --instance-ids $INSTANCE_ID
366
+ aws ec2 wait instance-terminated --instance-ids $INSTANCE_ID
367
+ for v in $VOL_l2 $VOL_l3 $VOL_l4 $VOL_l5 $VOL_l6; do
368
+ aws ec2 delete-volume --volume-id $v
369
+ done
370
+ aws ec2 delete-security-group --group-id $SG_ID
371
+ aws iam remove-role-from-instance-profile --instance-profile-name $NAME-profile --role-name $NAME-role
372
+ aws iam delete-instance-profile --instance-profile-name $NAME-profile
373
+ aws iam detach-role-policy --role-name $NAME-role --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
374
+ aws iam delete-role --role-name $NAME-role
375
+ ```
@@ -0,0 +1,138 @@
1
+ # Why `pentatonic-memory` v0.5.x underperforms on retrieval benchmarks
2
+
3
+ This document explains the architectural reasons `pentatonic-memory` v0.5.x scores 17.6% on substring-graded retrieval benches. None of these are bugs — they are deliberate design decisions optimised for a different workload (chat-style fact recall over agent memory). They just happen to be the wrong defaults for general-purpose retrieval.
4
+
5
+ The engine in this package addresses each one.
6
+
7
+ ## 1. Atom boost wins over source
8
+
9
+ ```js
10
+ // pentatonic-memory v0.5.10/src/search.js
11
+ const DEFAULT_WEIGHTS = {
12
+ ...
13
+ atomBoost: 0.15, // ← 15% boost for distilled atomic facts
14
+ verbosityPenalty: 0.1, // ← penalty for long raw content
15
+ };
16
+ ```
17
+
18
+ `distill.js` runs an LLM on every ingested memory and extracts "atomic facts." Those atoms are stored as separate rows linked back via `source_id`. Search then ranks atoms higher than their source via the boost.
19
+
20
+ For chat-style queries ("what does Phil drink?") this works: the atom "Phil drinks cortado" is ranked above the raw turn "Yeah, oh hey Phil came over yesterday and he had a cortado…".
21
+
22
+ For substring grading ("what was the price of thing-9001?") it backfires: the atom is "the user reported a sale event" and the raw "thing-9001 sold for $15.50 to buyer-42" gets dropped or out-ranked. The literal answer string is gone.
23
+
24
+ **Engine default:** `atomBoost = 0`, `verbosityPenalty = 0`. Distillation is opt-in per query.
25
+
26
+ ## 2. `dedupeBySource` removes the right answer
27
+
28
+ ```js
29
+ // pentatonic-memory v0.5.10/src/search.js, line 161
30
+ if (opts.dedupeBySource !== false) {
31
+ const atomSources = new Set(
32
+ filtered.filter((r) => r.source_id).map((r) => r.source_id)
33
+ );
34
+ if (atomSources.size > 0) {
35
+ filtered = filtered.filter((r) => !atomSources.has(r.id));
36
+ }
37
+ }
38
+ ```
39
+
40
+ When an atom matches, its source raw is **dropped** from results. The thinking is "the atom contains the relevant fact, the source is redundant." For substring grading, the source contains the literal text the bench is looking for, while the atom is a paraphrase.
41
+
42
+ **Engine default:** return both atom and source. Caller can dedupe if they want to reduce token spend.
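To make the effect concrete, here is a Python port of the filter quoted above, applied to the sale example from section 1. The row shape (`id`, `source_id`, `content`) is inferred from the JS snippet, and the IDs are made up for illustration.

```python
def dedupe_by_source(rows: list[dict], enabled: bool = True) -> list[dict]:
    """Drop any row that is the source of a matching atom (v0.5.x behaviour)."""
    if not enabled:
        return list(rows)
    atom_sources = {r["source_id"] for r in rows if r.get("source_id")}
    if not atom_sources:
        return list(rows)
    return [r for r in rows if r["id"] not in atom_sources]


rows = [
    {"id": "m1", "source_id": None,
     "content": "thing-9001 sold for $15.50 to buyer-42"},
    {"id": "a1", "source_id": "m1",
     "content": "the user reported a sale event"},
]

v05_results = dedupe_by_source(rows)                    # atom only
engine_results = dedupe_by_source(rows, enabled=False)  # atom and source
```

With deduplication on, the only surviving row is the paraphrased atom, so a grader looking for the literal string `$15.50` finds nothing.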
43
+
44
+ ## 3. `minScore: 0.5` is too aggressive
45
+
46
+ ```js
47
+ const threshold = opts.minScore ?? 0.5;
48
+ ```
49
+
50
+ NV-Embed-v2 routinely produces cosine similarities of 0.30–0.45 for genuinely relevant chunks. The 0.5 default filters those out completely. The bench passes `min_score: 0.0001` to compensate, but real callers using SDK defaults silently lose recall.
51
+
52
+ **Engine default:** `min_score: 0.001`. The CTE's relevance × recency × frequency formula handles ranking; let everything through and trust the ordering.
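A small illustration of what the threshold does to recall. The scores are hypothetical values in the range described above, and the inclusive comparison is an assumption about how the filter is applied.

```python
def apply_min_score(results: list[dict], min_score: float) -> list[dict]:
    """Keep results at or above the threshold, preserving order."""
    return [r for r in results if r["score"] >= min_score]


# Hypothetical cosine similarities for three relevant chunks.
hits = [
    {"id": "c1", "score": 0.62},
    {"id": "c2", "score": 0.41},
    {"id": "c3", "score": 0.33},
]

kept_v05 = apply_min_score(hits, 0.5)       # v0.5.x default
kept_engine = apply_min_score(hits, 0.001)  # engine default
```

Under the 0.5 default two of the three relevant chunks never reach the ranking stage; under 0.001 all three do, and ordering decides what the caller sees.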
53
+
54
+ ## 4. No `/forget` endpoint
55
+
56
+ ```js
57
+ // server.js routes:
58
+ // POST /search
59
+ // POST /store
60
+ // GET /health
61
+ // (no /forget, no /memories)
62
+ ```
63
+
64
+ v0.4.x had `/forget` and `/memories`. v0.5.x removed them. Without `/forget`:
65
+ - Tests can't isolate runs (data accumulates across test suites)
66
+ - Benches pollute each other's namespaces (we observed v0.5.6 going from 17.6% to 9.4% over 5 runs of pollution)
67
+ - GDPR data deletion requests require direct Postgres access
68
+ - Multi-tenant deployments can't enforce tenant boundaries via the SDK alone
69
+
70
+ **Engine:** restored `/forget` with `id` and `metadata_contains` filters.
71
+
72
+ ## 5. No `/store-batch`
73
+
74
+ Even though `ai.js` has an `embedBatch()` helper, the server only exposes single-record `/store`. Bulk ingest does N HTTP roundtrips, each with one synchronous embed call.
75
+
76
+ For the bench harness, this means a 22-doc corpus takes ~25 minutes to ingest because every doc waits for an Ollama HyDE generation (60s default) plus an embed call.
77
+
78
+ **Engine:** added `/store-batch`. One HTTP roundtrip, one batched embed call, one bulk INSERT. 30-50× faster on >5 records.
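The ~25 minute figure follows almost entirely from the per-document HyDE wait. A back-of-the-envelope sketch using only the numbers quoted above; the remainder attributed to embedding and HTTP overhead is derived here, not measured.

```python
DOCS = 22
HYDE_SECONDS_PER_DOC = 60        # figure quoted above for the HyDE call
OBSERVED_INGEST_MINUTES = 25     # approximate figure quoted above

# Sequential ingest: every document waits for its own HyDE generation.
hyde_floor_minutes = DOCS * HYDE_SECONDS_PER_DOC / 60

# Whatever is left over is embed time plus HTTP overhead, spread per document.
other_seconds_per_doc = (OBSERVED_INGEST_MINUTES - hyde_floor_minutes) * 60 / DOCS

if __name__ == "__main__":
    print(f"HyDE alone: {hyde_floor_minutes:.0f} min")
    print(f"Embed + HTTP overhead: ~{other_seconds_per_doc:.1f} s per doc")
```

Removing the ingest-time LLM call and collapsing N requests into one is what makes the batch path so much faster.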
79
+
80
+ ## 6. HyDE generated at INGEST time
81
+
82
+ ```js
83
+ // ingest.js — for every /store call:
84
+ const hypothetical_queries = await llm.chat(/* generate 3-5 fake queries */);
85
+ metadata.hypothetical_queries = hypothetical_queries;
86
+ ```
87
+
88
+ This adds a 60s LLM call to every ingest. Worse, the queries are generated against the *content*, not the user's actual query — so they tend to be generic ("what is the topic of this document"), not useful for matching at search time.
89
+
90
+ **Engine:** HyDE runs at SEARCH time against the user's actual query. Each search generates 3 hypothetical answers, embeds each, runs vector search per embedding, and RRF-fuses the rank lists. Better matching, no ingest blocking.
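Reciprocal rank fusion itself is simple enough to sketch. This is the textbook formulation, in which each list contributes `1 / (k + rank)` per document, with the conventional `k = 60`. Whether the engine uses the same constant or tie-breaking is an assumption, and the function name is illustrative.

```python
from collections import defaultdict


def rrf_fuse(rank_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked ID lists into one ordering via reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rank_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# One rank list per hypothetical answer embedding.
fused = rrf_fuse([
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
])
```

A document that ranks consistently well across the hypothetical answers rises to the top even if no single list puts it first every time.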
91
+
92
+ ## 7. No content chunking
93
+
94
+ v0.5.x stores a 10,000-token document as one row with one 4096-d embedding. The vector represents the *average* meaning of the document, washing out specific facts.
95
+
96
+ **Engine:** chunks at ingest into ~200-500 token segments, each with its own embedding and `chunk_index`. Search returns chunks; downstream caller can hydrate the parent document if needed.
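A minimal chunking sketch, assuming whitespace-separated words as a rough stand-in for model tokens and fixed-size windows without overlap. The engine's actual tokenizer and boundary rules are not shown in this document, so treat the function and its defaults as illustrative.

```python
def chunk_text(text: str, max_tokens: int = 400) -> list[dict]:
    """Split text into consecutive segments of at most max_tokens words."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = text.split()
    chunks = []
    for start in range(0, len(tokens), max_tokens):
        chunks.append({
            "chunk_index": len(chunks),
            "content": " ".join(tokens[start:start + max_tokens]),
        })
    return chunks


# A 1,000-word document becomes three rows, each embedded separately.
doc = " ".join(f"w{i}" for i in range(1000))
segments = chunk_text(doc, max_tokens=400)
```

Each segment gets its own embedding, so a fact on page three is represented by a vector about page three rather than diluted across the whole document.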
97
+
98
+ ## 8. No reranker
99
+
100
+ v0.5.x's `search.js` returns top-K directly from the SQL CTE score. No second-pass reranker.
101
+
102
+ **Engine:** L6 doc-store runs a `ms-marco-MiniLM-L-6-v2` cross-encoder over the top-50 from initial retrieval, then returns top-K. Substantially better precision on questions that need exact term matching after broad recall.
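The second pass can be sketched independently of the model. Below, `score_pairs` is any callable that scores `(query, text)` pairs; in a real deployment it might wrap a sentence-transformers `CrossEncoder(...).predict`, but here a trivial term-overlap scorer stands in so the example runs without model weights. All names are illustrative.

```python
from typing import Callable, Sequence

Scorer = Callable[[list[tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: list[str], score_pairs: Scorer,
           pool_size: int = 50, top_k: int = 5) -> list[str]:
    """Re-score the first pool_size candidates and return the best top_k."""
    pool = candidates[:pool_size]
    if not pool:
        return []
    scores = score_pairs([(query, text) for text in pool])
    ranked = sorted(zip(pool, scores), key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:top_k]]


def term_overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    """Stand-in scorer: number of lowercase terms shared by query and text."""
    return [float(len(set(q.lower().split()) & set(t.lower().split())))
            for q, t in pairs]


best = rerank(
    "price of thing-9001",
    ["the user reported a sale event",
     "thing-9001 sold for $15.50 to buyer-42",
     "unrelated note"],
    term_overlap_scorer,
    top_k=1,
)
```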
103
+
104
+ ## 9. No graph / entity layer
105
+
106
+ v0.5.x doesn't extract entities at ingest, doesn't build relationships, can't answer multi-hop questions ("who owns thing-X" → "find listings where X was sold" → "fetch buyer's contact").
107
+
108
+ **Engine:** L3 Knowledge Graph (Neo4j Community) extracts entities at ingest, builds edges between co-occurring entities, and at search time boosts rows that mention the same entities as the query. Critical for the marketplace-ops and customer-support benches.
109
+
110
+ ## 10. Single vector store, single embedding per row
111
+
112
+ v0.5.x writes one row per memory with one embedding column in pgvector. The HNSW index doesn't work above 2000 dimensions, so 4096-d NV-Embed embeddings fall back to sequential scan. At >100k memories, that's >100ms per query.
113
+
114
+ **Engine:** indexes the same content into multiple stores in parallel:
115
+ - L0 BM25 (SQLite FTS5)
116
+ - L4 sqlite-vec (small, in-process)
117
+ - L5 Milvus (medium, dedicated)
118
+ - L6 doc-store (with reranker)
119
+ - L3 KG (relationship-pivoted)
120
+
121
+ Search runs all five in parallel, RRF-fuses the rank lists, applies reranker on top-50. Different query types win on different layers — the fusion absorbs the strengths of each.
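The fan-out step can be sketched with a thread pool. Each layer is modelled as a callable returning a ranked list of IDs; the layer callables, and the choice to treat a failing layer as an empty list rather than failing the whole search, are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

LayerSearch = Callable[[str], list[str]]


def fan_out(query: str, layers: dict[str, LayerSearch]) -> dict[str, list[str]]:
    """Query every layer concurrently and collect one rank list per layer."""
    results: dict[str, list[str]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(layers))) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in layers.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                # Assumption: a broken layer contributes nothing instead of
                # aborting the search.
                results[name] = []
    return results
```

The per-layer lists returned here are what a rank-fusion step would consume. Because fusion depends on ranks rather than raw scores, layers with incomparable scoring scales (BM25 versus cosine similarity) can be combined without normalisation.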
122
+
123
+ ## Summary
124
+
125
+ | Gap | Bench impact (estimated) | Fix complexity |
126
+ |---|---|---|
127
+ | 1. atomBoost +0.15 | -15-20pp | trivial (config flag) |
128
+ | 2. dedupeBySource: true | -5-10pp | trivial (config flag) |
129
+ | 3. minScore: 0.5 default | -3-8pp | trivial (config change) |
130
+ | 4. No /forget | n/a but blocks tests | trivial (10 LOC) |
131
+ | 5. No /store-batch | n/a but blocks bench (~25 min ingest) | low (50 LOC) |
132
+ | 6. HyDE at ingest time | -5-10pp + 60s/store | medium (refactor) |
133
+ | 7. No chunking | -5-15pp on long docs | medium (schema change) |
134
+ | 8. No reranker | -5-10pp | medium (sidecar service) |
135
+ | 9. No graph layer | -5-10pp on entity queries | high (new schema + extraction) |
136
+ | 10. Single vector store | -10-20pp, latency at scale | high (parallel infrastructure) |
137
+
138
+ This package addresses 1-10 simultaneously by routing through the 7-layer engine, recovering ~65pp of the gap.
@@ -0,0 +1,52 @@
1
+ # engine/
2
+
3
+ Bundled engine layers for the Pentatonic Memory Engine.
4
+
5
+ | File | Layer | LOC | Purpose |
6
+ |---|---|---|---|
7
+ | `l2-hybridrag-proxy.py` | L2 | ~1.5k | RRF fusion across all layers, exposed on `:8031` |
8
+ | `l5-comms-layer.py` | L5 | ~0.7k | Milvus comms layer for chat/email/contact/memory collections, exposed on `:8034` |
9
+ | `l6-document-store.py` | L6 | ~1.0k | Document store + cross-encoder reranker, exposed on `:8037` |
10
+ | `services/nv-embed/server.py` | — | ~150 | NV-Embed-v2 4096-dim embedding service, exposed on `:8041` |
11
+
12
+ ## pme_memory SDK
13
+
14
+ The `pme_memory/` package at the repo root is an installable Python SDK for the L5 communications layer. It provides:
15
+
16
+ - **store.py** — Milvus client and collection management (chats, emails, contacts, memory)
17
+ - **search.py** — Semantic search across collections
18
+ - **embed.py** — Dual-stack embedding (NV-Embed-v2 primary, Ollama fallback)
19
+ - **indexer.py** — Data ingestion pipeline (JSONL chats, email archives, contacts, memory files)
20
+ - **scoring.py** — Pressure scoring for need signals (recency, novelty, centrality, priority)
21
+ - **synthesis.py** — Deterministic multi-parent artifact merge
22
+ - **artifacts.py** — Append-only artifact DAG store (JSONL)
23
+ - **hygiene.py** — DAG maintenance (dedup, conflict detection, orphan pruning)
24
+ - **health.py** — L5 health check
25
+ - **needs.py** — Need signal indexing
26
+ - **provenance.py** — Lineage visualization
27
+
28
+ Install: `pip install -e ".[full]"` — CLI: `pme-memory health|stats|index|search|serve`
29
+
30
+ ## KG Extraction Scripts
31
+
32
+ The `scripts/` directory contains Knowledge Graph population tools:
33
+
34
+ - **kg-extractor.py** — spaCy + regex entity/relationship extraction from memory files → Neo4j
35
+ - **kg-preflexor-v2.py** — 2-pass concurrent LLM-based extraction via Ollama (14 structured entity types + native graph discovery)
36
+
37
+ ## Where L0, L3 and the embedding service live
38
+
39
+ - **L0 BM25** — provided by SQLite FTS5; the L2 proxy queries it directly via `sqlite3`. No separate service binary.
40
+ - **L3 Knowledge Graph** — provided by Neo4j Community (free, OSS) running in a sibling container. The proxy queries it via the bolt protocol on `:7687`.
41
+ - **NV-Embed-v2 embedding service** — see `services/nv-embed/` for the Docker context. Exposes the OpenAI-compatible `/v1/embeddings` endpoint on `:8041`.
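Since the embedding service speaks the OpenAI-compatible `/v1/embeddings` shape, a client needs little more than a JSON POST. The sketch below builds the request and parses the response using only the standard library. The `localhost` base URL and the `nv-embed-v2` model string are assumptions, so check what the service actually expects.

```python
import json
import urllib.request


def build_embeddings_request(texts: list[str],
                             base_url: str = "http://localhost:8041",
                             model: str = "nv-embed-v2") -> urllib.request.Request:
    """Build a POST to /v1/embeddings (base_url and model are assumed values)."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings_response(raw: bytes) -> list[list[float]]:
    """Extract vectors from an OpenAI-style response, ordered by input index."""
    payload = json.loads(raw)
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

To send the request, pass it to `urllib.request.urlopen(...)` and hand the bytes from `.read()` to `parse_embeddings_response`.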
42
+
43
+ ## Dependencies
44
+
45
+ Each service has its own `requirements.txt` in `services/<layer>/`. Common heavy deps:
46
+
47
+ - `pymilvus>=2.6.12` (L5)
48
+ - `sentence-transformers` (L6 reranker, NV-Embed)
49
+ - `httpx`, `fastapi`, `uvicorn` (all)
50
+ - `spacy` (L6 entity extraction)
51
+
52
+ NV-Embed needs Torch + the model weights (auto-downloaded on first run from Hugging Face).