@rubytech/create-maxy 1.0.740 → 1.0.742
- package/package.json +1 -1
- package/payload/platform/lib/brand-templating/dist/index.d.ts +18 -0
- package/payload/platform/lib/brand-templating/dist/index.d.ts.map +1 -0
- package/payload/platform/lib/brand-templating/dist/index.js +69 -0
- package/payload/platform/lib/brand-templating/dist/index.js.map +1 -0
- package/payload/platform/lib/brand-templating/src/index.ts +76 -0
- package/payload/platform/lib/brand-templating/tsconfig.json +8 -0
- package/payload/platform/lib/graph-write/dist/index.d.ts.map +1 -1
- package/payload/platform/lib/graph-write/dist/index.js +23 -1
- package/payload/platform/lib/graph-write/dist/index.js.map +1 -1
- package/payload/platform/lib/graph-write/src/index.ts +27 -4
- package/payload/platform/neo4j/schema.cypher +5 -2
- package/payload/platform/package.json +2 -2
- package/payload/platform/plugins/admin/mcp/dist/index.js +6 -1
- package/payload/platform/plugins/admin/mcp/dist/index.js.map +1 -1
- package/payload/platform/plugins/admin/skills/onboarding/SKILL.md +7 -7
- package/payload/platform/plugins/admin/skills/plugin-management/SKILL.md +1 -1
- package/payload/platform/plugins/anthropic/skills/get-api-key/SKILL.md +2 -2
- package/payload/platform/plugins/cloudflare/skills/setup-tunnel/SKILL.md +1 -1
- package/payload/platform/plugins/docs/references/access-control.md +10 -10
- package/payload/platform/plugins/docs/references/contacts-guide.md +11 -11
- package/payload/platform/plugins/docs/references/deployment.md +13 -13
- package/payload/platform/plugins/docs/references/getting-started.md +19 -19
- package/payload/platform/plugins/docs/references/internals.md +4 -4
- package/payload/platform/plugins/docs/references/memory-guide.md +21 -21
- package/payload/platform/plugins/docs/references/migration-guide.md +5 -5
- package/payload/platform/plugins/docs/references/platform.md +9 -9
- package/payload/platform/plugins/docs/references/plugins-guide.md +20 -12
- package/payload/platform/plugins/docs/references/projects-guide.md +10 -10
- package/payload/platform/plugins/docs/references/settings.md +13 -13
- package/payload/platform/plugins/docs/references/telegram-guide.md +14 -14
- package/payload/platform/plugins/docs/references/troubleshooting.md +23 -23
- package/payload/platform/plugins/linkedin-import/skills/linkedin-import/SKILL.md +6 -6
- package/payload/platform/plugins/linkedin-import/skills/linkedin-import/references/profile.md +2 -2
- package/payload/platform/plugins/memory/mcp/dist/lib/__tests__/llm-classifier.test.js +44 -5
- package/payload/platform/plugins/memory/mcp/dist/lib/__tests__/llm-classifier.test.js.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/lib/document-hierarchy.d.ts.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/lib/document-hierarchy.js +26 -5
- package/payload/platform/plugins/memory/mcp/dist/lib/document-hierarchy.js.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/lib/llm-classifier.d.ts.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/lib/llm-classifier.js +6 -1
- package/payload/platform/plugins/memory/mcp/dist/lib/llm-classifier.js.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/tools/memory-ingest.d.ts.map +1 -1
- package/payload/platform/plugins/memory/mcp/dist/tools/memory-ingest.js +4 -1
- package/payload/platform/plugins/memory/mcp/dist/tools/memory-ingest.js.map +1 -1
- package/payload/platform/plugins/memory/references/schema-base.md +7 -1
- package/payload/platform/plugins/memory/skills/document-ingest/SKILL.md +77 -19
- package/payload/platform/plugins/whatsapp/skills/connect-whatsapp/SKILL.md +2 -2
- package/payload/platform/plugins/workflows/mcp/test-workflows.sh +5 -1
- package/payload/platform/scripts/dedupe-userprofile-ghosts.sh +388 -0
- package/payload/platform/scripts/embed-backfill.sh +8 -1
- package/payload/platform/scripts/migrate-import.sh +42 -1
- package/payload/platform/scripts/seed-neo4j.sh +1 -0
- package/payload/platform/templates/specialists/agents/database-operator.md +27 -5
- package/payload/server/chunk-PQ6LDXZ4.js +2997 -0
- package/payload/server/chunk-W6ZUNLLS.js +9446 -0
- package/payload/server/client-pool-DQBHSKAF.js +28 -0
- package/payload/server/maxy-edge.js +2 -2
- package/payload/server/server.js +41 -3
package/payload/platform/scripts/dedupe-userprofile-ghosts.sh +388 -0
@@ -0,0 +1,388 @@
+#!/usr/bin/env bash
+# ============================================================
+# dedupe-userprofile-ghosts.sh — Task 792
+#
+# Purpose
+# -------
+# Remove ghost AdminUser + UserProfile rows whose userId does not appear
+# in users.json (carryover from the userId-minting bug fixed by Task 791),
+# and dedupe (accountId, userId) UserProfile collisions (would-be schema
+# violations once the new constraint is applied).
+#
+# Per loser:
+#   1. Reparent every outbound edge to the winner — except edges whose
+#      target is itself a loser, edges that would create a duplicate of
+#      an edge the winner already has, and self-loops.
+#   2. Reparent every inbound edge to the winner under the same rules.
+#   3. DETACH DELETE the loser node.
+#
+# All work is operator-invoked over SSH. The script is idempotent:
+# a second run logs `[dedupe-userprofile] nothing-to-dedupe brand=<n>`
+# and exits 0 with no writes.
+#
+# Usage
+# -----
+#   bash dedupe-userprofile-ghosts.sh [--brand=NAME] [--dry-run]
+#
+#   --brand=NAME   Brand name (e.g. "maxy", "realagent"). Auto-detected if
+#                  exactly one ~/.<brand>/.neo4j-password exists.
+#   --dry-run      Print plan without writes.
+#
+# Reads:
+#   ~/.<brand>/.neo4j-password   Neo4j password
+#   ~/.<brand>/.env              NEO4J_URI (override with env var)
+#   ~/.<brand>/users.json        live userIds
+#
+# Edge-reparent caveat
+# --------------------
+# Reparented edges keep their original properties. Properties on the
+# *other* node (e.g. a Preference's own `userId`) are not rewritten —
+# if a Preference's userId no longer matches its parent UserProfile's
+# userId after dedupe, it surfaces in subsequent `[graph-health]` ticks
+# and is the responsibility of a follow-up task. This script's contract
+# is "no UserProfile/AdminUser ghost nodes remain", not "every property
+# referencing a ghost userId is rewritten".
+# ============================================================
+
+set -euo pipefail
+
+BRAND=""
+DRY_RUN=0
+
+while [ $# -gt 0 ]; do
+  case "$1" in
+    --brand=*) BRAND="${1#*=}"; shift ;;
+    --brand) BRAND="${2:-}"; shift 2 ;;
+    --dry-run) DRY_RUN=1; shift ;;
+    -h|--help)
+      sed -n '2,/^# =\{20,\}/p' "$0" | sed 's/^# \{0,1\}//'
+      exit 0
+      ;;
+    *) echo "Unknown arg: $1" >&2; exit 2 ;;
+  esac
+done
+
+# --- Brand detection ------------------------------------------------
+if [ -z "$BRAND" ]; then
+  candidates=()
+  for d in "$HOME"/.maxy "$HOME"/.realagent; do
+    if [ -f "$d/.neo4j-password" ]; then
+      candidates+=("$(basename "$d" | sed 's/^\.//')")
+    fi
+  done
+  if [ "${#candidates[@]}" -eq 1 ]; then
+    BRAND="${candidates[0]}"
+  elif [ "${#candidates[@]}" -gt 1 ]; then
+    echo "Error: multiple brand dirs (${candidates[*]}); pass --brand=NAME" >&2
+    exit 1
+  else
+    echo "Error: no brand dir with .neo4j-password under ~/.maxy or ~/.realagent" >&2
+    exit 1
+  fi
+fi
+
+BRAND_DIR="$HOME/.$BRAND"
+[ -d "$BRAND_DIR" ] || { echo "Error: $BRAND_DIR not found" >&2; exit 1; }
+
+PASSWORD_FILE="$BRAND_DIR/.neo4j-password"
+ENV_FILE="$BRAND_DIR/.env"
+USERS_FILE="$BRAND_DIR/users.json"
+
+[ -f "$PASSWORD_FILE" ] || { echo "Error: $PASSWORD_FILE missing" >&2; exit 1; }
+[ -f "$USERS_FILE" ] || { echo "Error: $USERS_FILE missing" >&2; exit 1; }
+
+NEO4J_PASSWORD="$(cat "$PASSWORD_FILE")"
+NEO4J_USER="${NEO4J_USER:-neo4j}"
+NEO4J_URI="${NEO4J_URI:-}"
+if [ -z "$NEO4J_URI" ] && [ -f "$ENV_FILE" ]; then
+  # shellcheck disable=SC2002
+  NEO4J_URI="$(awk -F'=' '/^NEO4J_URI=/ {sub(/^NEO4J_URI=/,""); print; exit}' "$ENV_FILE")"
+fi
+[ -n "$NEO4J_URI" ] || { echo "Error: NEO4J_URI not in env or $ENV_FILE" >&2; exit 1; }
+
+if ! command -v cypher-shell >/dev/null 2>&1; then
+  echo "Error: cypher-shell not found in PATH" >&2
+  exit 1
+fi
+
+# --- Live userIds (from users.json) ---------------------------------
+# Build a Cypher list literal: ['uid1', 'uid2', ...]
+LIVE_USER_IDS_CYPHER="$(python3 -c '
+import json, sys
+with open(sys.argv[1]) as f:
+    ids = [u["userId"] for u in json.load(f) if u.get("userId")]
+print("[" + ",".join("'\''" + i.replace("'\''","''") + "'\''" for i in ids) + "]")
+' "$USERS_FILE")"
+
+if [ "$LIVE_USER_IDS_CYPHER" = "[]" ]; then
+  echo "Error: $USERS_FILE has no userId entries" >&2
+  exit 1
+fi
+
+# --- Cypher-shell helpers -------------------------------------------
+cs() {
+  cypher-shell -u "$NEO4J_USER" -p "$NEO4J_PASSWORD" -a "$NEO4J_URI" "$@"
+}
+
+now_ms() { python3 -c 'import time; print(int(time.time()*1000))'; }
+
+START_MS=$(now_ms)
+REPARENTED_TOTAL=0
+DELETED_TOTAL=0
+
+# --- Per-loser reparent + delete ------------------------------------
+# Args: $1=label (UserProfile|AdminUser) $2=loserEid $3=winnerEid $4=allLosersJson
+# (allLosersJson is a Cypher list literal of all loser eids — used to skip
+# edges between two ghosts that will both be deleted.)
+dedupe_one() {
+  local label="$1" loser_eid="$2" winner_eid="$3" all_losers_cypher="$4"
+
+  echo "[dedupe-userprofile] delete elementId=$loser_eid label=$label" >&2
+
+  if [ "$DRY_RUN" = "1" ]; then
+    DELETED_TOTAL=$((DELETED_TOTAL + 1))
+    return 0
+  fi
+
+  # Single transaction: reparent outbound, reparent inbound, delete loser.
+  # Skips:
+  #   - edges to/from other losers (they die together)
+  #   - edges that would duplicate an edge the winner already has
+  #   - self-loops on winner
+  local out
+  out="$(cs --format plain <<CYPHER
+CYPHER 5
+MATCH (loser) WHERE elementId(loser) = '$loser_eid'
+MATCH (winner) WHERE elementId(winner) = '$winner_eid'
+CALL {
+  WITH loser, winner
+  MATCH (loser)-[r]->(o)
+  WHERE elementId(o) <> elementId(winner)
+    AND NOT elementId(o) IN $all_losers_cypher
+    AND NOT EXISTS { MATCH (winner)-[r2]->(o) WHERE type(r2) = type(r) }
+  WITH winner, o, type(r) AS t, properties(r) AS p, r
+  CREATE (winner)-[nr:\$(t)]->(o)
+  SET nr = p
+  RETURN count(*) AS outMoved
+}
+CALL {
+  WITH loser, winner
+  MATCH (i)-[r]->(loser)
+  WHERE elementId(i) <> elementId(winner)
+    AND NOT elementId(i) IN $all_losers_cypher
+    AND NOT EXISTS { MATCH (i)-[r2]->(winner) WHERE type(r2) = type(r) }
+  WITH winner, i, type(r) AS t, properties(r) AS p, r
+  CREATE (i)-[nr:\$(t)]->(winner)
+  SET nr = p
+  RETURN count(*) AS inMoved
+}
+WITH loser, outMoved, inMoved
+DETACH DELETE loser
+RETURN outMoved + inMoved AS reparented;
+CYPHER
+)"
+
+  # Parse result (last numeric token)
+  local moved
+  moved="$(printf '%s\n' "$out" | awk '/^[0-9]+$/ {n=$0} END {print (n+0)}')"
+  REPARENTED_TOTAL=$((REPARENTED_TOTAL + moved))
+  DELETED_TOTAL=$((DELETED_TOTAL + 1))
+  if [ "$moved" -gt 0 ]; then
+    echo "[dedupe-userprofile] reparent edges=$moved from=$loser_eid to=$winner_eid" >&2
+  fi
+}
+
+# --- Plan both phases up front --------------------------------------
+# So that `start` / `nothing-to-dedupe` / `DRY-RUN` lines fire BEFORE
+# any bucket/delete events. Pre-count ghosts per phase = total rows in
+# the plan minus distinct accountIds (one keeper per account; solo-tier).
+
+plan_admin="$(cs --format plain <<CYPHER
+MATCH (au:AdminUser)-[:ADMIN_OF]->(b:LocalBusiness)
+WITH b.accountId AS accountId, au
+WITH accountId, collect({eid: elementId(au), userId: coalesce(au.userId, '')}) AS rows, count(au) AS n
+WHERE n > 1
+UNWIND rows AS row
+RETURN accountId + '|' + row.eid + '|' + row.userId + '|' + toString(row.userId IN $LIVE_USER_IDS_CYPHER) AS line
+ORDER BY accountId, line;
+CYPHER
+)"
+
+plan_up="$(cs --format plain <<CYPHER
+MATCH (up:UserProfile)
+WITH up.accountId AS accountId, up
+WITH accountId, collect({eid: elementId(up), userId: coalesce(up.userId, ''), updatedAt: coalesce(up.updatedAt, '')}) AS rows, count(up) AS n
+WHERE n > 1
+UNWIND rows AS row
+RETURN accountId + '|' + row.eid + '|' + row.userId + '|' + toString(row.userId IN $LIVE_USER_IDS_CYPHER) + '|' + row.updatedAt AS line
+ORDER BY accountId, line;
+CYPHER
+)"
+
+admin_lines="$(printf '%s\n' "$plan_admin" | tail -n +2 | sed -e 's/^"//' -e 's/"$//' | grep -v '^$' || true)"
+up_lines="$(printf '%s\n' "$plan_up" | tail -n +2 | sed -e 's/^"//' -e 's/"$//' | grep -v '^$' || true)"
+
+count_ghosts() {
+  # ghosts = total_rows - distinct_accountIds (first |-field). 0 on empty.
+  local lines="$1"
+  [ -z "$lines" ] && { echo 0; return; }
+  local total distinct
+  total=$(printf '%s\n' "$lines" | wc -l | awk '{print $1}')
+  distinct=$(printf '%s\n' "$lines" | awk -F'|' '{print $1}' | sort -u | wc -l | awk '{print $1}')
+  echo $((total - distinct))
+}
+admin_pre_ghosts=$(count_ghosts "$admin_lines")
+up_pre_ghosts=$(count_ghosts "$up_lines")
+total_pre_ghosts=$((admin_pre_ghosts + up_pre_ghosts))
+
+if [ "$total_pre_ghosts" -eq 0 ]; then
+  echo "[dedupe-userprofile] nothing-to-dedupe brand=$BRAND" >&2
+  exit 0
+fi
+
+if [ "$DRY_RUN" = "1" ]; then
+  echo "[dedupe-userprofile] DRY-RUN brand=$BRAND admin-ghosts=$admin_pre_ghosts userprofile-ghosts=$up_pre_ghosts (no writes)" >&2
+fi
+
+# Pre-count distinct accountIds with ghosts (across both phases) so the
+# `start` line carries the same total as the success criterion query.
+distinct_admin_accts=$([ -n "$admin_lines" ] && printf '%s\n' "$admin_lines" | awk -F'|' '{print $1}' | sort -u | wc -l | awk '{print $1}' || echo 0)
+distinct_up_accts=$([ -n "$up_lines" ] && printf '%s\n' "$up_lines" | awk -F'|' '{print $1}' | sort -u | wc -l | awk '{print $1}' || echo 0)
+distinct_total_accts=$((distinct_admin_accts + distinct_up_accts))
+
+if [ "$DRY_RUN" != "1" ]; then
+  echo "[dedupe-userprofile] start brand=$BRAND accounts-with-ghosts=$distinct_total_accts total-ghosts=$total_pre_ghosts" >&2
+fi
+
+# --- Phase A: AdminUser cleanup -------------------------------------
+admin_accounts_with_ghosts=0
+admin_total_ghosts=0
+
+if [ -n "$admin_lines" ]; then
+  # Group by accountId
+  current_account=""
+  winner_eid=""
+  losers_for_account=()
+
+  process_account() {
+    local acct="$1"
+    [ -z "$acct" ] && return 0
+    if [ -z "$winner_eid" ]; then
+      echo "[dedupe-userprofile] WARN no live AdminUser for account=${acct:0:8} — skipping" >&2
+      return 0
+    fi
+    if [ "${#losers_for_account[@]}" -eq 0 ]; then
+      return 0
+    fi
+    admin_accounts_with_ghosts=$((admin_accounts_with_ghosts + 1))
+    admin_total_ghosts=$((admin_total_ghosts + ${#losers_for_account[@]}))
+
+    # Build Cypher list literal of all loser eids (for the inter-loser skip)
+    local all_losers="["
+    local first=1
+    for e in "${losers_for_account[@]}"; do
+      [ $first -eq 0 ] && all_losers="$all_losers,"
+      all_losers="$all_losers'$e'"
+      first=0
+    done
+    all_losers="$all_losers]"
+
+    echo "[dedupe-userprofile] bucket label=AdminUser account=${acct:0:8} winner=$winner_eid losers=${#losers_for_account[@]}" >&2
+    for loser in "${losers_for_account[@]}"; do
+      dedupe_one "AdminUser" "$loser" "$winner_eid" "$all_losers"
+    done
+  }
+
+  # iterate
+  while IFS='|' read -r acct eid uid is_live; do
+    [ -z "$acct" ] && continue
+    if [ "$acct" != "$current_account" ]; then
+      process_account "$current_account"
+      current_account="$acct"
+      winner_eid=""
+      losers_for_account=()
+    fi
+    if [ "$is_live" = "true" ] && [ -z "$winner_eid" ]; then
+      winner_eid="$eid"
+    else
+      losers_for_account+=("$eid")
+    fi
+  done <<< "$admin_lines"
+  process_account "$current_account"
+fi
+
+# --- Phase B: UserProfile cleanup -----------------------------------
+up_accounts_with_ghosts=0
+up_total_ghosts=0
+
+if [ -n "$up_lines" ]; then
+  current_account=""
+  winner_eid=""
+  winner_updated=""
+  losers_for_account=()
+
+  process_up_account() {
+    local acct="$1"
+    [ -z "$acct" ] && return 0
+    if [ -z "$winner_eid" ]; then
+      echo "[dedupe-userprofile] WARN no live UserProfile for account=${acct:0:8} — skipping" >&2
+      return 0
+    fi
+    if [ "${#losers_for_account[@]}" -eq 0 ]; then
+      return 0
+    fi
+    up_accounts_with_ghosts=$((up_accounts_with_ghosts + 1))
+    up_total_ghosts=$((up_total_ghosts + ${#losers_for_account[@]}))
+
+    local all_losers="["
+    local first=1
+    for e in "${losers_for_account[@]}"; do
+      [ $first -eq 0 ] && all_losers="$all_losers,"
+      all_losers="$all_losers'$e'"
+      first=0
+    done
+    all_losers="$all_losers]"
+
+    echo "[dedupe-userprofile] bucket label=UserProfile account=${acct:0:8} winner=$winner_eid losers=${#losers_for_account[@]}" >&2
+    for loser in "${losers_for_account[@]}"; do
+      dedupe_one "UserProfile" "$loser" "$winner_eid" "$all_losers"
+    done
+  }
+
+  while IFS='|' read -r acct eid uid is_live updated; do
+    [ -z "$acct" ] && continue
+    if [ "$acct" != "$current_account" ]; then
+      process_up_account "$current_account"
+      current_account="$acct"
+      winner_eid=""
+      winner_updated=""
+      losers_for_account=()
+    fi
+    # Winner = first live row with most-recent updatedAt; ties broken by first-seen.
+    if [ "$is_live" = "true" ]; then
+      if [ -z "$winner_eid" ] || [[ "$updated" > "$winner_updated" ]]; then
+        # demote previous winner (if any) to loser
+        if [ -n "$winner_eid" ]; then
+          losers_for_account+=("$winner_eid")
+        fi
+        winner_eid="$eid"
+        winner_updated="$updated"
+      else
+        losers_for_account+=("$eid")
+      fi
+    else
+      losers_for_account+=("$eid")
+    fi
+  done <<< "$up_lines"
+  process_up_account "$current_account"
+fi
+
+# --- Summary --------------------------------------------------------
+END_MS=$(now_ms)
+DURATION_MS=$((END_MS - START_MS))
+
+if [ "$DRY_RUN" = "1" ]; then
+  exit 0
+fi
+
+echo "[dedupe-userprofile] done brand=$BRAND reparented=$REPARENTED_TOTAL deleted=$DELETED_TOTAL duration-ms=$DURATION_MS" >&2
package/payload/platform/scripts/embed-backfill.sh +8 -1
@@ -42,7 +42,14 @@ set -euo pipefail
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 PROJECT_DIR="$(dirname "$SCRIPT_DIR")"
 
-NEO4J_URI
+# NEO4J_URI is hard-required (Task 788). The previous default
+# `bolt://localhost:7687` would silently route the backfill to the wrong Neo4j
+# on any brand-dedicated install, masking the actual configuration error.
+if [ -z "${NEO4J_URI:-}" ]; then
+  echo "Error: NEO4J_URI required (no default — see Task 788)" >&2
+  echo " Set NEO4J_URI=bolt://localhost:<brand.neo4jPort> before running." >&2
+  exit 1
+fi
 NEO4J_USER="${NEO4J_USER:-neo4j}"
 OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
 EMBED_MODEL="${EMBED_MODEL:-nomic-embed-text}"
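Note on the hard-required NEO4J_URI: the same fail-fast check could be factored into a reusable guard. The following is a minimal sketch under that assumption. `require_value` is a hypothetical helper and does not appear in the package. It returns non-zero instead of exiting, so the caller decides how to abort.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the package): refuse to continue when a
# required setting is empty, rather than falling back to a default.
# $1 = variable name (for the message), $2 = its current value, $3 = optional hint.
require_value() {
  local name="$1" value="$2" hint="${3:-}"
  if [ -z "$value" ]; then
    echo "Error: $name required (no default)" >&2
    if [ -n "$hint" ]; then echo "  $hint" >&2; fi
    return 1
  fi
  return 0
}
```

Usage: `require_value NEO4J_URI "${NEO4J_URI:-}" "Set NEO4J_URI before running." || exit 1`. Passing the value with `${VAR:-}` keeps the call safe under `set -u`.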
package/payload/platform/scripts/migrate-import.sh +42 -1
@@ -116,7 +116,14 @@ echo "[import] Account dir: $ACCOUNT_DIR"
 # ------------------------------------------------------------------
 # Neo4j connection
 # ------------------------------------------------------------------
-NEO4J_URI
+# NEO4J_URI is hard-required (Task 788). The previous default
+# `bolt://localhost:7687` would silently route the import to the wrong Neo4j on
+# any brand-dedicated install, masking the actual configuration error.
+if [ -z "${NEO4J_URI:-}" ]; then
+  echo "[import] ERROR: NEO4J_URI required (no default — see Task 788)" >&2
+  echo "[import] ERROR: Set NEO4J_URI=bolt://localhost:<brand.neo4jPort> before running." >&2
+  exit 1
+fi
 NEO4J_USER="${NEO4J_USER:-neo4j}"
 
 NEO4J_PASSWORD_FILE="$INSTALL_DIR/platform/config/.neo4j-password"
@@ -364,6 +371,40 @@ if [ -f "$PINS_FILE" ]; then
     # The platform stores PINs as SHA-256 hashes, not plain text
     python3 -c "import hashlib; print(hashlib.sha256('$MASTER_PIN'.encode()).hexdigest())" > "$PIN_DIR/.admin-pin"
     echo "[import] masterPin hash written to $PIN_DIR/.admin-pin"
+
+    # Auth gate is users.json-authoritative (Task 791). seed-neo4j.sh creates
+    # users.json from .admin-pin, but only at install time — and install runs
+    # before this script writes .admin-pin. Self-contained migration writes
+    # users.json directly, mirroring seed-neo4j.sh's migration branch.
+    USERS_FILE="$INSTALL_DIR/platform/config/users.json"
+    if [ ! -f "$USERS_FILE" ]; then
+      USER_ID="$(cat /proc/sys/kernel/random/uuid 2>/dev/null || python3 -c 'import uuid; print(uuid.uuid4())')"
+      PIN_HASH="$(cat "$PIN_DIR/.admin-pin")"
+      cat > "$USERS_FILE" << USERS_EOF
+[{"userId":"$USER_ID","name":"Owner","pin":"$PIN_HASH"}]
+USERS_EOF
+      echo "[import] users.json created (userId=${USER_ID:0:8})"
+
+      if [ -f "$ACCOUNT_DIR/account.json" ]; then
+        python3 -c "
+import json
+with open('$ACCOUNT_DIR/account.json', 'r') as f:
+    config = json.load(f)
+config.setdefault('admins', [])
+if not any(a.get('userId') == '$USER_ID' for a in config['admins']):
+    config['admins'].append({'userId': '$USER_ID', 'role': 'owner'})
+with open('$ACCOUNT_DIR/account.json', 'w') as f:
+    json.dump(config, f, indent=2)
+    f.write('\n')
+"
+        echo "[import] account.json admins updated (userId=${USER_ID:0:8} role=owner)"
+      else
+        echo "[import] ERROR: account.json not found at $ACCOUNT_DIR/account.json" >&2
+        exit 1
+      fi
+    else
+      echo "[import] users.json exists — skipping creation, preserving existing userId"
+    fi
   else
     echo "[import] WARN: brand.json not found at $BRAND_JSON — cannot determine config directory for PIN"
     echo "[import] WARN: write the PIN manually: echo -n '$MASTER_PIN' > ~/.<brand>/.admin-pin"
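Note on the users.json step in migrate-import.sh: the key property is that an existing users.json is never overwritten, so a re-run preserves the userId already in use. A minimal sketch of that create-once behaviour follows. `ensure_users_file` is a hypothetical function, not the package's code, and it assumes the userId is a UUID and the pin is a hex digest, so neither needs JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the package): write users.json only when it
# does not exist yet. $1 = target path, $2 = userId, $3 = PIN hash.
ensure_users_file() {
  local users_file="$1" user_id="$2" pin_hash="$3"
  if [ -f "$users_file" ]; then
    echo "users.json exists, preserving existing userId"
    return 0
  fi
  # Assumes user_id and pin_hash contain no characters that need JSON escaping.
  printf '[{"userId":"%s","name":"Owner","pin":"%s"}]\n' "$user_id" "$pin_hash" > "$users_file"
  # Log only the first 8 characters of the userId, as the script above does.
  echo "users.json created (userId=$(printf '%.8s' "$user_id"))"
}
```

Calling the function a second time with a different userId leaves the file untouched, which is the behaviour the "skipping creation" branch above reports.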
package/payload/platform/scripts/seed-neo4j.sh +1 -0
@@ -428,6 +428,7 @@ echo "==> Connecting to Neo4j at $NEO4J_URI as $NEO4J_USER"
 echo "==> Migrating schema: dropping renamed/obsolete constraints + indexes..."
 "$CYPHER_SHELL" -u "$NEO4J_USER" -p "$NEO4J_PASSWORD" -a "$NEO4J_URI" << 'MIGRATE_EOF'
 DROP CONSTRAINT user_profile_account_unique IF EXISTS;
+DROP CONSTRAINT user_profile_account_user_unique IF EXISTS;
 DROP INDEX preference_category IF EXISTS;
 DROP INDEX knowledge_fulltext IF EXISTS;
 MIGRATE_EOF
package/payload/platform/templates/specialists/agents/database-operator.md +27 -5
@@ -41,9 +41,9 @@ Return to the admin agent:
 
 Do not return raw CSV rows, raw Cypher bodies, or raw tool-result dumps. Compression is the output discipline.
 
-###
+### Four-step operator narrative for document ingestion (Task 740, extended Task 790)
 
-When the dispatch is a document ingestion (Branch A, the `document-ingest` skill), the operator sees
+When the dispatch is a document ingestion (Branch A, the `document-ingest` skill), the operator sees up to four messages — one at each phase. You emit steps 2, 3, and 4 directly into chat at the moment each phase completes; admin emits step 1 before dispatching to you.
 
 **Step 2 (after `memory-classify` returns ok).** Emit one chat message: `Classified <filename> into <N> sections, covering: <topicKeyword1>, <topicKeyword2>, …`. Use the `documentKeywords` from the classifier output. Do not paraphrase or reorder.
 
@@ -53,11 +53,33 @@ When the dispatch is a document ingestion (Branch A, the `document-ingest` skill
 
 Use the actual numbers from the tool result, not approximations. Don't omit orphan candidates — they're the operator's primary debugging surface.
 
+**Step 4 (after `wire-brief-entities` step completes — Task 790).** When the dispatch brief named entities the document should connect to (Persons, Organizations, Services, Tasks, Events, KnowledgeDocuments, BrandingData), execute the brief-driven entity-wiring discipline (see "Brief-driven entity wiring" below) and emit one chat message:
+
+> Wired `<N>` brief entities: `<K>` Persons via `<edge>`, `<M>` Organizations via `<edge>`, `<T>` Tasks via `REFERENCES`. `<P>` entities not found in graph: `<comma-separated names>`.
+
+Drop the "not found" clause when every brief entity resolved. Suppress the chat message entirely when the brief named no entities (single-author personal CV, generic FAQ, etc.) — the `[document-ingest] wire-brief-entities resolved=0 orphans=0 edges=0` log line still fires for grep regression coverage. The four-step narrative reverts to three visible chat messages in that case.
+
 **Failure replacements.**
-- Classifier failure (step 2): replace step 2 with `Classifier unavailable — <cause>: <reason>. <filename> not ingested. <remediation>.` Do not call `memory-ingest`. Do not emit step 3.
-- Write failure (step 3): replace step 3 with `<n> sections classified but write failed at <stage> — <reason>. <filename> not in graph.`
+- Classifier failure (step 2): replace step 2 with `Classifier unavailable — <cause>: <reason>. <filename> not ingested. <remediation>.` Do not call `memory-ingest`. Do not emit step 3 or step 4.
+- Write failure (step 3): replace step 3 with `<n> sections classified but write failed at <stage> — <reason>. <filename> not in graph.` Do not emit step 4.
+- Brief-wiring failure (step 4): if `memory-search` or `memory-write` fails mid-loop, emit `<K> brief entities wired before failure: <list>; <P> not yet attempted: <list>; failed at <entity name> — <reason>.` Do not silently swallow.
+
+This is the operator's narrative — it must be truthful, specific, and complete. Never paraphrase the tool's structured output into a vague "ingested OK" — the verification cypher will catch the mismatch (`[memory-ingest] sections=… typed=… edges=… orphans=…` and `[document-ingest] wire-brief-entities …` log lines must agree with the chat numbers).
+
+### Brief-driven entity wiring (Task 790)
+
+When the admin agent dispatches you with a document and the brief names "key entities to connect" (Persons, Organizations, Services, Tasks, Events, KnowledgeDocuments, BrandingData), those connections are deliverables. The brief is the operator's intent translated into structured input — landing the document as an island anchored to one node while the named Persons/Organizations/Tasks stay disconnected silently degrades the graph into KnowledgeDocuments unreachable from the entities they describe.
+
+**Discipline.** After `memory-ingest` returns the new `documentNodeId`, iterate every entity the brief named. For each:
+
+1. Resolve the entity via `memory-search` (preferred — fuzzy name matching) or single-shot Cypher via `mcp__graph__maxy-graph-read_neo4j_cypher` (`MATCH (n:<Label> { <identifying-prop>: $value, accountId: $accountId }) RETURN elementId(n)`).
+2. Pick the natural KD-level edge by entity kind and document shape (full table in [`document-ingest` SKILL.md](../../../plugins/memory/skills/document-ingest/SKILL.md)): meeting/call → `PARTICIPANT`; email → `FROM`/`TO`/`CC`; voice-note → `SPEAKER`; contract → `PARTY`; Task/Event/Service/KnowledgeDocument/BrandingData → `REFERENCES`; everything else → `MENTIONS`.
+3. `memory-write` the edge from the new KnowledgeDocument to the resolved entity. Include `sourceDocumentId=<attachmentId>` and `createdByAgent='document-ingest'` in the edge properties — without these stamps, brief-wired edges leak on re-ingest because `memory-ingest`'s cleanup matches by `sourceDocumentId`.
+4. If the entity does not resolve, append it to the orphan list. Do NOT create a placeholder Person/Organization — that path is reserved for the classifier's `documentEdges` (which create-MERGE on identifying properties).
+
+Skip entities the classifier already wired via `documentEdges` (common for emails and contracts where the document body itself names the parties). The classifier output's `edgeBreakdown` enumerates these — compare against your brief list before each `memory-write` to avoid duplicate edges.
 
-
+The brief is the contract; the wiring outcome is in the four-step narrative's step 4. Returning *"meeting notes processed as a KnowledgeDocument anchored to <X>"* without listing wired/unresolved brief entities is a regression of the failure mode that produced this discipline (Task 790 incident: Real Agent meeting ingested with anchor only, three named Persons + four named Tasks left disconnected, operator surfaced the gap manually).
 
 ---
 
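Note on step 2 of the brief-driven wiring discipline: the edge choice depends on both the entity kind and the document shape. One way to read the rule is as a two-stage lookup, sketched below. This is an interpretation, not the package's implementation. `edge_for` is a hypothetical function, and the precedence (entity kind first, then document shape) and the lowercase role argument for emails are assumptions. The authoritative table is the one in the `document-ingest` SKILL.md.

```shell
#!/usr/bin/env bash
# Hypothetical lookup (not in the package).
# $1 = entity kind, $2 = document shape, $3 = email role (from|to|cc), email only.
edge_for() {
  local kind="$1" shape="$2" role="${3:-}"
  # Assumption: these entity kinds use REFERENCES whatever the document shape.
  case "$kind" in
    Task|Event|Service|KnowledgeDocument|BrandingData)
      echo "REFERENCES"
      return 0
      ;;
  esac
  # Remaining kinds (Person, Organization, ...) are decided by document shape.
  case "$shape" in
    meeting|call) echo "PARTICIPANT" ;;
    voice-note)   echo "SPEAKER" ;;
    contract)     echo "PARTY" ;;
    email)
      case "$role" in
        from) echo "FROM" ;;
        to)   echo "TO" ;;
        cc)   echo "CC" ;;
        *)    echo "edge_for: email needs a role (from|to|cc)" >&2; return 1 ;;
      esac
      ;;
    *) echo "MENTIONS" ;;
  esac
}
```

Usage: `edge_for Person meeting` prints `PARTICIPANT`, while `edge_for Task meeting` prints `REFERENCES`, matching the step 4 message template in which Tasks are always reported via `REFERENCES`.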