claude-memory-layer 1.0.0

Files changed (127)
  1. package/.claude-plugin/commands/memory-forget.md +42 -0
  2. package/.claude-plugin/commands/memory-history.md +34 -0
  3. package/.claude-plugin/commands/memory-import.md +56 -0
  4. package/.claude-plugin/commands/memory-list.md +37 -0
  5. package/.claude-plugin/commands/memory-search.md +36 -0
  6. package/.claude-plugin/commands/memory-stats.md +34 -0
  7. package/.claude-plugin/hooks.json +59 -0
  8. package/.claude-plugin/plugin.json +24 -0
  9. package/.history/package_20260201112328.json +45 -0
  10. package/.history/package_20260201113602.json +45 -0
  11. package/.history/package_20260201113713.json +45 -0
  12. package/.history/package_20260201114110.json +45 -0
  13. package/Memo.txt +558 -0
  14. package/README.md +520 -0
  15. package/context.md +636 -0
  16. package/dist/.claude-plugin/commands/memory-forget.md +42 -0
  17. package/dist/.claude-plugin/commands/memory-history.md +34 -0
  18. package/dist/.claude-plugin/commands/memory-import.md +56 -0
  19. package/dist/.claude-plugin/commands/memory-list.md +37 -0
  20. package/dist/.claude-plugin/commands/memory-search.md +36 -0
  21. package/dist/.claude-plugin/commands/memory-stats.md +34 -0
  22. package/dist/.claude-plugin/hooks.json +59 -0
  23. package/dist/.claude-plugin/plugin.json +24 -0
  24. package/dist/cli/index.js +3539 -0
  25. package/dist/cli/index.js.map +7 -0
  26. package/dist/core/index.js +4408 -0
  27. package/dist/core/index.js.map +7 -0
  28. package/dist/hooks/session-end.js +2971 -0
  29. package/dist/hooks/session-end.js.map +7 -0
  30. package/dist/hooks/session-start.js +2969 -0
  31. package/dist/hooks/session-start.js.map +7 -0
  32. package/dist/hooks/stop.js +3123 -0
  33. package/dist/hooks/stop.js.map +7 -0
  34. package/dist/hooks/user-prompt-submit.js +2960 -0
  35. package/dist/hooks/user-prompt-submit.js.map +7 -0
  36. package/dist/services/memory-service.js +2931 -0
  37. package/dist/services/memory-service.js.map +7 -0
  38. package/package.json +45 -0
  39. package/plan.md +1642 -0
  40. package/scripts/build.ts +102 -0
  41. package/spec.md +624 -0
  42. package/specs/citations-system/context.md +243 -0
  43. package/specs/citations-system/plan.md +495 -0
  44. package/specs/citations-system/spec.md +371 -0
  45. package/specs/endless-mode/context.md +305 -0
  46. package/specs/endless-mode/plan.md +620 -0
  47. package/specs/endless-mode/spec.md +455 -0
  48. package/specs/entity-edge-model/context.md +401 -0
  49. package/specs/entity-edge-model/plan.md +459 -0
  50. package/specs/entity-edge-model/spec.md +391 -0
  51. package/specs/evidence-aligner-v2/context.md +401 -0
  52. package/specs/evidence-aligner-v2/plan.md +303 -0
  53. package/specs/evidence-aligner-v2/spec.md +312 -0
  54. package/specs/mcp-desktop-integration/context.md +278 -0
  55. package/specs/mcp-desktop-integration/plan.md +550 -0
  56. package/specs/mcp-desktop-integration/spec.md +494 -0
  57. package/specs/post-tool-use-hook/context.md +319 -0
  58. package/specs/post-tool-use-hook/plan.md +469 -0
  59. package/specs/post-tool-use-hook/spec.md +364 -0
  60. package/specs/private-tags/context.md +288 -0
  61. package/specs/private-tags/plan.md +412 -0
  62. package/specs/private-tags/spec.md +345 -0
  63. package/specs/progressive-disclosure/context.md +346 -0
  64. package/specs/progressive-disclosure/plan.md +663 -0
  65. package/specs/progressive-disclosure/spec.md +415 -0
  66. package/specs/task-entity-system/context.md +297 -0
  67. package/specs/task-entity-system/plan.md +301 -0
  68. package/specs/task-entity-system/spec.md +314 -0
  69. package/specs/vector-outbox-v2/context.md +470 -0
  70. package/specs/vector-outbox-v2/plan.md +562 -0
  71. package/specs/vector-outbox-v2/spec.md +466 -0
  72. package/specs/web-viewer-ui/context.md +384 -0
  73. package/specs/web-viewer-ui/plan.md +797 -0
  74. package/specs/web-viewer-ui/spec.md +516 -0
  75. package/src/cli/index.ts +570 -0
  76. package/src/core/canonical-key.ts +186 -0
  77. package/src/core/citation-generator.ts +63 -0
  78. package/src/core/consolidated-store.ts +279 -0
  79. package/src/core/consolidation-worker.ts +384 -0
  80. package/src/core/context-formatter.ts +276 -0
  81. package/src/core/continuity-manager.ts +336 -0
  82. package/src/core/edge-repo.ts +324 -0
  83. package/src/core/embedder.ts +124 -0
  84. package/src/core/entity-repo.ts +342 -0
  85. package/src/core/event-store.ts +672 -0
  86. package/src/core/evidence-aligner.ts +635 -0
  87. package/src/core/graduation.ts +365 -0
  88. package/src/core/index.ts +32 -0
  89. package/src/core/matcher.ts +210 -0
  90. package/src/core/metadata-extractor.ts +203 -0
  91. package/src/core/privacy/filter.ts +179 -0
  92. package/src/core/privacy/index.ts +20 -0
  93. package/src/core/privacy/tag-parser.ts +145 -0
  94. package/src/core/progressive-retriever.ts +415 -0
  95. package/src/core/retriever.ts +235 -0
  96. package/src/core/task/blocker-resolver.ts +325 -0
  97. package/src/core/task/index.ts +9 -0
  98. package/src/core/task/task-matcher.ts +238 -0
  99. package/src/core/task/task-projector.ts +345 -0
  100. package/src/core/task/task-resolver.ts +414 -0
  101. package/src/core/types.ts +841 -0
  102. package/src/core/vector-outbox.ts +295 -0
  103. package/src/core/vector-store.ts +182 -0
  104. package/src/core/vector-worker.ts +488 -0
  105. package/src/core/working-set-store.ts +244 -0
  106. package/src/hooks/post-tool-use.ts +127 -0
  107. package/src/hooks/session-end.ts +78 -0
  108. package/src/hooks/session-start.ts +57 -0
  109. package/src/hooks/stop.ts +78 -0
  110. package/src/hooks/user-prompt-submit.ts +54 -0
  111. package/src/mcp/handlers.ts +212 -0
  112. package/src/mcp/index.ts +47 -0
  113. package/src/mcp/tools.ts +78 -0
  114. package/src/server/api/citations.ts +101 -0
  115. package/src/server/api/events.ts +101 -0
  116. package/src/server/api/index.ts +18 -0
  117. package/src/server/api/search.ts +98 -0
  118. package/src/server/api/sessions.ts +111 -0
  119. package/src/server/api/stats.ts +97 -0
  120. package/src/server/index.ts +91 -0
  121. package/src/services/memory-service.ts +626 -0
  122. package/src/services/session-history-importer.ts +367 -0
  123. package/tests/canonical-key.test.ts +101 -0
  124. package/tests/evidence-aligner.test.ts +152 -0
  125. package/tests/matcher.test.ts +112 -0
  126. package/tsconfig.json +24 -0
  127. package/vitest.config.ts +15 -0
package/Memo.txt ADDED
@@ -0,0 +1,558 @@
1
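+ # AxiomMind Memory Graduation Pipeline: Claude Code Implementation Instructions
2
+
3
+ > Purpose: build a structure in which **chat session logs can be promoted L0 (EventStore) → L1 (JSON) → L2 (Idris Candidate) → L3/L4**, and in particular manage **Tasks as "entity + events" rather than as "entries"** to resolve duplication and broken continuity.
4
+ > In addition, **the pipeline, not the LLM, finalizes each EvidenceSpan (quote→span alignment)**, which minimizes hallucination and evidence mismatch.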
5
+
6
+ ---
7
+
8
+ ## 0) Assumptions about the repo and its current state
9
+
10
+ Assume the repo already contains the following (or a similar implementation).
11
+
12
+ - `memory_pipeline/`
13
+ - `extractor.py` : extracts a session into JSON with an LLM
14
+ - `idris_generator.py` : JSON → `.idr`
15
+ - `validator.py` : `idris2 --check`
16
+ - `indexer.py` : stores to DuckDB + LanceDB
17
+ - `search.py` : semantic/keyword search
18
+ - `orchestrator.py` : runs the whole pipeline
19
+ - `cli.py`
20
+
21
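+ Problems with the current design and improvement goals (summary):
22
+
23
+ - DuckDB must use JSON, not JSONB
24
+ - Having the LLM emit offsets (spanStart/spanEnd) is error-prone → **take only a quote and let the aligner determine the span**
25
+ - Storing Task/Decision only as session entries accumulates duplicates and broken continuity → **manage Tasks as entity + events**
26
+ - DuckDB and LanceDB must stay consistent → **outbox + single writer**
27
+ - Append-only is hard to enforce with triggers → **enforce at the API level + idempotency via dedupe_key**
28
+ - Build metadata (prompt, embedding, and schema versions) must be recorded
29
+ - Record conflicts, decisions, promotions, and metrics so they are observable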
30
+
31
+ ---
32
+
33
+ ## 1) Implementation scope (phases)
34
+
35
+ ### Phase P0 (required: quality/consistency)
36
+ 1. **EventStore(SoT) + dedupe_key(idempotency) + projector offset**
37
+ 2. **Introduce EvidenceAligner (quote→span)**
38
+ 3. **Separate entries / entities / edges**
39
+ 4. **Task Entity system** (TaskResolver + BlockerResolver + TaskProjector)
40
+ 5. **Vector Outbox + single writer + reconcile**
41
+ 6. Fix the DuckDB schema (remove JSONB) and add a minimal query API/CLI
42
+
43
+ ### Phase P1 (operational stability)
44
+ 7. build_runs (build spec) + build_id propagation
45
+ 8. conflict ledger + contested policy (penalty in search/promotion)
46
+ 9. decision ledger (only evidence at Verified or above is allowed) + causality check
47
+ 10. observability (pipeline metrics)
48
+
49
+ ### Phase P2 (advanced search/promotion)
50
+ 11. Hybrid retrieval (FTS + vector) + re-ranking rules + stage weights
51
+ 12. Gold-set-based search evaluation (regression)
52
+ 13. Half-life demotion (separate policy versions)
53
+
54
+ > In this round, implement **all of P0 plus skeletons for P1's build_runs/decision/conflict/metrics**;
55
+ > for P2, creating only the code placeholders (interfaces) is enough.
56
+
57
+ ---
58
+
59
+ ## 2) Core principles (mandatory)
60
+
61
+ 1. **The SoT (source of truth) is events (DuckDB)**
62
+    - Derived tables (entries/entities/edges/vector, etc.) must be **rebuildable** at any time
63
+ 2. **append-only**
64
+    - Do not provide UPDATE/DELETE methods for events (block them at the library/service layer)
65
+ 3. **idempotent**
66
+    - Reprocessing, rebuilding, or restarting must not produce duplicate events, duplicate edges, or duplicate vectors
67
+    - Every "meaningful action" is locked by a `dedupe_key`
68
+ 4. **The pipeline finalizes EvidenceSpan**
69
+    - Never ask the LLM for spanStart/spanEnd
70
+    - The LLM provides only a quote → the aligner locates it in the source text and computes the span
71
+ 5. **A Task is an entity**
72
+    - Task state (status/priority/blockers) is computed as a fold over events
73
+    - Do not create a new Task entry per session; find the existing task entity and update it
74
+ 6. **Vector store consistency**
75
+    - Write to DuckDB first → outbox → a single writer upserts into LanceDB → update the status in DuckDB
76
+ 7. **DuckDB JSON**
77
+    - Do not use JSONB; use JSON or VARCHAR (JSON string)
78
+
79
+ ---
80
+
81
+ ## 3) Plan for adding and modifying files/modules
82
+
83
+ ### 3.1 New file structure (recommended)
84
+
85
+ memory_pipeline/
86
+ db/
87
+ migrations.py
88
+ schema.sql
89
+ storage/
90
+ event_store.py
91
+ entity_repo.py
92
+ outbox.py
93
+ evidence/
94
+ aligner.py
95
+ task/
96
+ canonical.py
97
+ matcher.py
98
+ blocker_resolver.py
99
+ resolver.py
100
+ projector.py
101
+ ledgers/
102
+ decision_ledger.py
103
+ conflict_ledger.py
104
+ promotion_ledger.py
105
+ observability/
106
+ metrics.py
107
+ orchestrator.py (modify)
108
+ extractor.py (modify)
109
+ idris_generator.py (modify)
110
+ indexer.py (modify or reduce its role)
111
+ search.py (modify)
112
+ cli.py (modify)
113
+ tests/
114
+ test_task_transitions.py
115
+ test_evidence_alignment.py
116
+ test_idempotency.py
117
+ test_outbox_reconcile.py
118
+
119
+ > Do not rewrite the existing files wholesale; migrate gradually while **separating responsibilities**.
120
+
121
+ ---
122
+
123
+ ## 4) DB schema (DDL): DuckDB
124
+
125
+ ### 4.1 events (SoT) + event_dedup + projection_offsets
126
+
127
+ ```sql
128
+ CREATE TABLE IF NOT EXISTS events (
129
+ event_id VARCHAR PRIMARY KEY,
130
+ ts TIMESTAMP NOT NULL,
131
+ event_type VARCHAR NOT NULL,
132
+ actor VARCHAR NOT NULL,
133
+ session_id VARCHAR,
134
+ payload_json JSON NOT NULL,
135
+ payload_redacted JSON,
136
+ sensitivity VARCHAR DEFAULT 'normal',
137
+ checksum VARCHAR NOT NULL,
138
+ build_id VARCHAR,
139
+ created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
140
+ );
141
+ CREATE INDEX IF NOT EXISTS idx_events_ts ON events(ts);
142
+ CREATE INDEX IF NOT EXISTS idx_events_type ON events(event_type);
143
+ CREATE INDEX IF NOT EXISTS idx_events_session ON events(session_id);
144
+
145
+ CREATE TABLE IF NOT EXISTS event_dedup (
146
+ dedupe_key VARCHAR PRIMARY KEY,
147
+ event_id VARCHAR NOT NULL,
148
+ ts TIMESTAMP NOT NULL,
149
+ event_type VARCHAR NOT NULL
150
+ );
151
+
152
+ CREATE TABLE IF NOT EXISTS projection_offsets (
153
+ projector_name VARCHAR PRIMARY KEY,
154
+ last_ts TIMESTAMP,
155
+ last_event_id VARCHAR
156
+ );
157
+ ```
158
+ ### 4.2 entries (immutable memory units)
159
+ ```sql
160
+ CREATE TABLE IF NOT EXISTS entries (
161
+ entry_id VARCHAR PRIMARY KEY,
162
+ created_ts TIMESTAMP NOT NULL,
163
+ entry_type VARCHAR NOT NULL, -- fact|decision|insight|task_note|reference...
164
+ title VARCHAR NOT NULL,
165
+ content_json JSON NOT NULL,
166
+ stage VARCHAR NOT NULL, -- raw|working|candidate|verified|certified
167
+ status VARCHAR DEFAULT 'active',-- active|contested|deprecated|superseded
168
+ superseded_by VARCHAR,
169
+ build_id VARCHAR,
170
+ evidence_json JSON, -- aligned spans
171
+ canonical_key VARCHAR,
172
+ created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
173
+ );
174
+ CREATE INDEX IF NOT EXISTS idx_entries_type ON entries(entry_type);
175
+ CREATE INDEX IF NOT EXISTS idx_entries_stage ON entries(stage);
176
+ ```
177
+ ### 4.3 entities (task/condition/artifact) + aliases + edges
178
+ ```sql
179
+ CREATE TABLE IF NOT EXISTS entities (
180
+ entity_id VARCHAR PRIMARY KEY,
181
+ entity_type VARCHAR NOT NULL, -- task|condition|artifact
182
+ canonical_key VARCHAR NOT NULL,
183
+ title VARCHAR NOT NULL,
184
+ stage VARCHAR NOT NULL,
185
+ status VARCHAR NOT NULL, -- active|contested|deprecated|superseded
186
+ current_json JSON NOT NULL,
187
+ title_norm VARCHAR,
188
+ search_text VARCHAR,
189
+ created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
190
+ updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
191
+ );
192
+ CREATE INDEX IF NOT EXISTS idx_entities_type_key ON entities(entity_type, canonical_key);
193
+ CREATE INDEX IF NOT EXISTS idx_entities_status ON entities(status);
194
+
195
+ CREATE TABLE IF NOT EXISTS entity_aliases (
196
+ entity_type VARCHAR NOT NULL,
197
+ canonical_key VARCHAR NOT NULL,
198
+ entity_id VARCHAR NOT NULL,
199
+ is_primary BOOLEAN DEFAULT FALSE,
200
+ created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
201
+ PRIMARY KEY(entity_type, canonical_key)
202
+ );
203
+
204
+ CREATE TABLE IF NOT EXISTS edges (
205
+ edge_id VARCHAR PRIMARY KEY,
206
+ src_type VARCHAR NOT NULL, -- entry|entity
207
+ src_id VARCHAR NOT NULL,
208
+ rel_type VARCHAR NOT NULL, -- evidence_of|blocked_by|blocked_by_suggested|resolves_to|...
209
+ dst_type VARCHAR NOT NULL, -- entry|entity
210
+ dst_id VARCHAR NOT NULL,
211
+ meta_json JSON,
212
+ created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
213
+ );
214
+ CREATE INDEX IF NOT EXISTS idx_edges_src_rel ON edges(src_id, rel_type);
215
+ CREATE INDEX IF NOT EXISTS idx_edges_dst_rel ON edges(dst_id, rel_type);
216
+ ```
217
+ ### 4.4 vector_outbox
218
+ ```sql
219
+ CREATE TABLE IF NOT EXISTS vector_outbox (
220
+ job_id VARCHAR PRIMARY KEY,
221
+ item_kind VARCHAR NOT NULL, -- entry|task_title
222
+ item_id VARCHAR NOT NULL,
223
+ embedding_version VARCHAR NOT NULL,
224
+ status VARCHAR NOT NULL, -- pending|done|failed
225
+ error VARCHAR,
226
+ created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
227
+ updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
228
+ UNIQUE(item_kind, item_id, embedding_version)
229
+ );
230
+ CREATE INDEX IF NOT EXISTS idx_outbox_status ON vector_outbox(status);
231
+ ```
232
+ ### 4.5 build_runs / ledgers (skeleton)
233
+ ```sql
234
+ CREATE TABLE IF NOT EXISTS build_runs (
235
+ build_id VARCHAR PRIMARY KEY,
236
+ started_at TIMESTAMP NOT NULL,
237
+ finished_at TIMESTAMP,
238
+ extractor_model VARCHAR NOT NULL,
239
+ extractor_prompt_hash VARCHAR NOT NULL,
240
+ embedder_model VARCHAR NOT NULL,
241
+ embedding_version VARCHAR NOT NULL,
242
+ idris_version VARCHAR NOT NULL,
243
+ schema_version VARCHAR NOT NULL,
244
+ status VARCHAR NOT NULL, -- running|success|failed
245
+ error VARCHAR
246
+ );
247
+
248
+ CREATE TABLE IF NOT EXISTS decisions (
249
+ decision_id VARCHAR PRIMARY KEY,
250
+ ts TIMESTAMP NOT NULL,
251
+ actor VARCHAR NOT NULL,
252
+ action VARCHAR NOT NULL,
253
+ outcome_json JSON,
254
+ confidence DOUBLE,
255
+ build_id VARCHAR,
256
+ created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
257
+ );
258
+
259
+ CREATE TABLE IF NOT EXISTS decision_evidence (
260
+ id VARCHAR PRIMARY KEY,
261
+ decision_id VARCHAR NOT NULL,
262
+ evidence_kind VARCHAR NOT NULL, -- entry|entity
263
+ evidence_id VARCHAR NOT NULL,
264
+ evidence_stage VARCHAR NOT NULL, -- verified|certified only
265
+ role VARCHAR NOT NULL,
266
+ CHECK (evidence_stage IN ('verified', 'certified'))
267
+ );
268
+
269
+ CREATE TABLE IF NOT EXISTS conflicts (
270
+ conflict_id VARCHAR PRIMARY KEY,
271
+ older_id VARCHAR NOT NULL,
272
+ newer_id VARCHAR NOT NULL,
273
+ conflict_type VARCHAR NOT NULL,
274
+ reason VARCHAR,
275
+ detected_at TIMESTAMP NOT NULL,
276
+ resolved_at TIMESTAMP,
277
+ resolution VARCHAR,
278
+ resolved_by VARCHAR
279
+ );
280
+
281
+ CREATE TABLE IF NOT EXISTS pipeline_metrics (
282
+ id VARCHAR PRIMARY KEY,
283
+ ts TIMESTAMP NOT NULL,
284
+ stage VARCHAR NOT NULL,
285
+ latency_ms DOUBLE NOT NULL,
286
+ success BOOLEAN NOT NULL,
287
+ error VARCHAR,
288
+ session_id VARCHAR
289
+ );
290
+ ```
291
+
292
+
293
+
294
+ ## 5) EventStore implementation instructions
295
+
296
+ ### 5.1 EventStore.append_dedup()
297
+ - Input: event_type, ts, actor, session_id, payload, build_id, dedupe_key
298
+ - Behavior:
299
+   1. Try to INSERT the dedupe_key into event_dedup
300
+      - On failure (the key already exists), return None and do not write to events
301
+   2. INSERT into events
302
+      - checksum: sha256 of the canonical JSON dump of payload_json
303
+
304
+ ### 5.2 EventStore.fetch_since(last_ts, last_event_id, event_types)
305
+ - Incremental read for projectors: (ts > last_ts) OR (ts==last_ts AND event_id > last_event_id)
306
+
307
+ ### 5.3 EventStore.replay(event_types=None, since=None)
308
+ - For full rebuilds
309
+
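The `append_dedup` and `fetch_since` behavior described in section 5 can be sketched as follows. This is a minimal illustration, not the package's implementation: it uses Python's standard `sqlite3` module as a stand-in for DuckDB so that it runs without extra dependencies, stores timestamps as ISO-8601 strings, generates `event_id` with `uuid4`, and keeps only a subset of the `events` columns. All of those choices are assumptions.

```python
import hashlib
import json
import sqlite3
import uuid


def canonical_checksum(payload: dict) -> str:
    """sha256 over a canonical JSON dump (sorted keys, compact separators)."""
    dumped = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(dumped.encode("utf-8")).hexdigest()


class EventStore:
    """Append-only store: it deliberately exposes no update or delete method."""

    def __init__(self, con: sqlite3.Connection):
        self.con = con
        con.executescript(
            """
            CREATE TABLE IF NOT EXISTS events (
              event_id TEXT PRIMARY KEY, ts TEXT NOT NULL, event_type TEXT NOT NULL,
              actor TEXT NOT NULL, session_id TEXT, payload_json TEXT NOT NULL,
              checksum TEXT NOT NULL, build_id TEXT);
            CREATE TABLE IF NOT EXISTS event_dedup (
              dedupe_key TEXT PRIMARY KEY, event_id TEXT NOT NULL,
              ts TEXT NOT NULL, event_type TEXT NOT NULL);
            """
        )

    def append_dedup(self, event_type, ts, actor, session_id, payload, build_id, dedupe_key):
        event_id = uuid.uuid4().hex  # assumption: the memo does not fix an ID scheme
        try:
            with self.con:  # one transaction: both rows are written, or neither
                self.con.execute(
                    "INSERT INTO event_dedup VALUES (?, ?, ?, ?)",
                    (dedupe_key, event_id, ts, event_type),
                )
                self.con.execute(
                    "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                    (event_id, ts, event_type, actor, session_id,
                     json.dumps(payload, ensure_ascii=False),
                     canonical_checksum(payload), build_id),
                )
        except sqlite3.IntegrityError:
            return None  # dedupe_key already present: nothing is written to events
        return event_id

    def fetch_since(self, last_ts, last_event_id, event_types):
        """Keyset read: (ts > last_ts) OR (ts == last_ts AND event_id > last_event_id)."""
        marks = ",".join("?" for _ in event_types)
        rows = self.con.execute(
            "SELECT event_id, ts, event_type FROM events "
            "WHERE (ts > ? OR (ts = ? AND event_id > ?)) "
            f"AND event_type IN ({marks}) ORDER BY ts, event_id",
            (last_ts, last_ts, last_event_id, *event_types),
        )
        return rows.fetchall()
```

Ordering by `(ts, event_id)` and resuming from the last pair lets a projector pick up where it stopped even when several events share a timestamp.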
310
+
311
+
312
+ ## 6) Evidence alignment implementation instructions (quote→span)
313
+
314
+ ### 6.1 Extractor output schema change (important)
315
+
316
+ Modify the Extractor prompt and parser:
317
+ - Do not require spanStart/spanEnd in evidenceSpans
318
+ - Instead, require:
319
+   - messageIndex
320
+   - quote (short; 30 to 200 characters recommended)
321
+ - Each entry requires an entryId (string)
322
+
323
+ Example:
324
+
325
+ {
326
+ "entries":[
327
+ {
328
+ "entryId":"ent_...",
329
+ "type":"fact",
330
+ "title":"Remove DuckDB JSONB",
331
+ "evidence":[{"messageIndex":3,"quote":"content JSONB → JSON"}]
332
+ }
333
+ ]
334
+ }
335
+
336
+ ### 6.2 EvidenceAligner.align(session_messages, extracted_json)
337
+ - Input:
338
+   - session_messages: list[str] (array of the original messages)
339
+   - extracted_json
340
+ - Output:
341
+   - evidenceAligned: true/false added to extracted_json
342
+   - Each evidence item is finalized with:
343
+     - event_id (session_ingested event_id)
344
+     - spanStart/spanEnd
345
+     - quote_hash
346
+     - confidence
347
+ - Alignment algorithm:
348
+   1. exact substring match
349
+   2. (optional) fuzzy match with a score of at least 0.85 after normalization (collapsing whitespace/newlines)
350
+   3. On failure, mark evidenceAligned=false and bar the entry from promotion to Verified
351
+ - Record the result as an evidence_aligned event
352
+
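A minimal sketch of the quote→span alignment above, under stated assumptions: spans are Python character offsets into the message, step 2 is simplified to a whitespace-tolerant match (a true fuzzy matcher with the 0.85 threshold is left out), and the `0.9` confidence for that fallback is an illustrative value, not one taken from the memo. The function names `align_quote` and `align_entry` are hypothetical.

```python
import hashlib
import re
from typing import Optional


def align_quote(message: str, quote: str) -> Optional[dict]:
    """Return span info for `quote` inside `message`, or None if it cannot be aligned."""
    if not quote.strip():
        return None
    start = message.find(quote)  # step 1: exact substring match
    confidence = 1.0
    if start >= 0:
        end = start + len(quote)
    else:
        # step 2 (simplified): tolerate differences in whitespace/newlines only
        pattern = r"\s+".join(re.escape(tok) for tok in quote.split())
        m = re.search(pattern, message)
        if m is None:
            return None  # step 3: the caller marks evidenceAligned=false
        start, end = m.span()
        confidence = 0.9  # illustrative value, an assumption
    return {
        "spanStart": start,
        "spanEnd": end,
        "quote_hash": hashlib.sha256(message[start:end].encode("utf-8")).hexdigest(),
        "confidence": confidence,
    }


def align_entry(session_messages: list, entry: dict) -> dict:
    """Attach spans to one extracted entry; aligned only if every quote aligns."""
    aligned = []
    for ev in entry.get("evidence", []):
        idx = ev["messageIndex"]
        in_range = 0 <= idx < len(session_messages)
        hit = align_quote(session_messages[idx], ev["quote"]) if in_range else None
        if hit is None:
            return {**entry, "evidenceAligned": False}
        aligned.append({**ev, **hit})
    return {**entry, "evidence": aligned, "evidenceAligned": bool(aligned)}
```

Because the span is computed from the original message, `message[spanStart:spanEnd]` always reproduces text that really exists in the session, whatever the LLM wrote in `quote`.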
353
+
354
+
355
+ ## 7) Task Entity system implementation instructions
356
+
357
+ ### 7.1 Canonical key functions
358
+ - task:{project}:{normalize(title)}
359
+ - cond:{project}:{normalize(text)}
360
+ - artifact:
361
+   - URL: art:url:{sha1(url)}
362
+   - JIRA key: art:jira:{key}
363
+   - GH issue: art:gh_issue:{repo}:{num}, etc.
364
+
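The key formats above could be produced along these lines. The memo does not define `normalize`, so the rule used here (NFKC, lowercase, punctuation dropped, whitespace collapsed to `-`) is an assumption, as are the regular expressions used to recognize JIRA keys and `owner/repo#num` issue references.

```python
import hashlib
import re
import unicodedata
from typing import Optional

# Hypothetical recognizers; the memo only names the key formats.
_JIRA = re.compile(r"^[A-Z][A-Z0-9]+-\d+$")
_GH_ISSUE = re.compile(r"^([\w.-]+/[\w.-]+)#(\d+)$")


def normalize(text: str) -> str:
    """Assumed normalization: NFKC, lowercase, drop punctuation, join words with '-'."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return "-".join(text.split())


def task_key(project: str, title: str) -> str:
    return f"task:{project}:{normalize(title)}"


def condition_key(project: str, text: str) -> str:
    return f"cond:{project}:{normalize(text)}"


def artifact_key(ref: str) -> Optional[str]:
    """Return an art:* key for a strong ID/URL pattern, or None for free text."""
    ref = ref.strip()
    if re.match(r"^https?://", ref):
        return "art:url:" + hashlib.sha1(ref.encode("utf-8")).hexdigest()
    if _JIRA.match(ref):
        return f"art:jira:{ref}"
    m = _GH_ISSUE.match(ref)
    if m:
        return f"art:gh_issue:{m.group(1)}:{m.group(2)}"
    return None
```

Two titles that differ only in case, spacing, or punctuation map to the same key, which is what lets a later session find the existing task instead of creating a duplicate.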
365
+ ### 7.2 Task event types (SoT)
366
+ - task_created
367
+ - task_status_changed
368
+ - task_priority_changed
369
+ - task_blockers_set (mode=replace|suggest, blockers: entity refs)
370
+ - task_transition_rejected (for debugging/review)
371
+ - condition_declared
372
+ - artifact_declared
373
+ - condition_resolved_to
374
+
375
+ ### 7.3 BlockerResolver
376
+ - Input: blockedBy texts + project + source_entry_id
377
+ - Output: BlockerRef(kind, entity_id, raw_text, confidence, candidates?)
378
+
379
+ Rules:
380
+ 1. Strong ID/URL/key pattern → get-or-create as an artifact
381
+ 2. Explicit task_id → link to the task (fall back to a condition if it does not exist)
382
+ 3. Task title matching allows strict matches only: on failure, get-or-create a condition and store the candidates
383
+ 4. Never create stub Tasks (this avoids duplicate hell)
384
+
385
+ Creating a condition/artifact is idempotent via a declared event + dedupe_key.
386
+
387
+ ### 7.4 TaskMatcher (advanced)
388
+ - exact: entity_aliases(canonical_key)
389
+ - fts: based on entities.search_text
390
+ - vector: LanceDB task_title_vectors_{embedding_version}
391
+
392
+ Score:
393
+ - base(method) * stage weight * entity status weight * task status weight * recency
394
+
395
+ Strict confirmation:
396
+ - top1 >= 0.92 & (top1-top2)>=0.03 & active & not cancelled
397
+
398
+ suggest_candidates() returns the top-k (stored on the condition entity)
399
+
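The strict-confirmation rule can be written as a small predicate. In this sketch the `Candidate` fields are hypothetical, `score` is assumed to already include the method, stage, status, and recency weights, and a lone candidate is compared against a runner-up score of `0.0` (the memo does not say how to handle that case).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    entity_id: str
    score: float        # assumed to already include all weights from the score formula
    entity_status: str  # e.g. "active", "contested"
    task_status: str    # e.g. "pending", "cancelled"


def strict_match(cands, min_score=0.92, min_gap=0.03):
    """Return the top candidate only if it clears every strict condition, else None."""
    ranked = sorted(cands, key=lambda c: c.score, reverse=True)
    if not ranked:
        return None
    top1 = ranked[0]
    top2_score = ranked[1].score if len(ranked) > 1 else 0.0  # assumption for a lone candidate
    ok = (
        top1.score >= min_score
        and (top1.score - top2_score) >= min_gap
        and top1.entity_status == "active"
        and top1.task_status != "cancelled"
    )
    return top1 if ok else None


def suggest_candidates(cands, k=3):
    """Top-k candidates to store on the condition entity when no strict match exists."""
    return sorted(cands, key=lambda c: c.score, reverse=True)[:k]
```

The gap condition is what rejects ambiguous cases: two near-identical titles both scoring above 0.92 yield no strict match, so the resolver falls back to a condition with suggestions.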
400
+ ### 7.5 TaskResolver
401
+
402
+ Processing task entries extracted from a session:
403
+ 1. Look up the existing task by task_canonical_key
404
+ 2. If none exists, emit a task_created event (a new task may not start with initial_status=done → correct it to in_progress)
405
+ 3. Emit events when priority/status needs to change (validate the transition)
406
+ 4. If there are blockers, normalize them to a list of entity_ids with BlockerResolver
407
+ 5. Use mode=replace if evidenceAligned, otherwise mode=suggest
408
+ 6. Emit a task_blockers_set event (idempotent via dedupe_key)
409
+ 7. The projector creates the entry→task evidence edge
410
+
411
+ Note:
412
+ - If a task is blocked but blockedBy is empty, automatically insert an (unknown blocker) condition with meta.auto_placeholder=true
413
+
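The status handling in steps 2 and 3 and the placeholder rule might look like the sketch below. The memo only states that a new task may not start as `done` and that `pending→done` is forbidden; the rest of the `ALLOWED` table, the status names other than those that appear in the memo, and the event payload fields are assumptions made for illustration.

```python
# Assumed transition table; only "pending -> done is forbidden" comes from the memo.
ALLOWED = {
    "pending": {"in_progress", "blocked", "cancelled"},
    "in_progress": {"pending", "blocked", "done", "cancelled"},
    "blocked": {"pending", "in_progress", "cancelled"},
    "done": set(),
    "cancelled": set(),
}


def initial_status(extracted_status: str) -> str:
    """A newly created task may not start as done; correct it to in_progress."""
    return "in_progress" if extracted_status == "done" else extracted_status


def plan_status_event(task_id: str, current: str, requested: str):
    """Return the event to emit for a requested status change, or None if unchanged."""
    if requested == current:
        return None
    allowed = requested in ALLOWED.get(current, set())
    return {
        "event_type": "task_status_changed" if allowed else "task_transition_rejected",
        "task_id": task_id,
        "from": current,
        "to": requested,
    }


def blockers_for(status: str, blocker_ids: list) -> list:
    """Insert an '(unknown blocker)' placeholder when a blocked task has no blockers."""
    if status == "blocked" and not blocker_ids:
        return [{"entity_id": None, "title": "(unknown blocker)",
                 "meta": {"auto_placeholder": True}}]
    return [{"entity_id": b, "meta": {}} for b in blocker_ids]
```

Emitting `task_transition_rejected` rather than silently dropping the request keeps rejected transitions visible for review.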
414
+ ### 7.6 TaskProjector (projection)
415
+ - Read events incrementally and update entities/edges
416
+ - mode=replace:
417
+   - Delete the existing blocked_by edges, then insert the new edges
418
+   - Update entities.current_json.blockers (cache)
419
+ - mode=suggest:
420
+   - Insert (or replace) only blocked_by_suggested edges
421
+   - Accumulate entities.current_json.blocker_suggestions (optional)
422
+ - status_changed to done:
423
+   - Remove blocked_by edges and clear the blockers cache (invariant)
424
+
425
+ Updating the projection offset is mandatory.
426
+
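The projection rules amount to a fold over a task's events. The sketch below keeps the state in a plain dict rather than in the `entities`/`edges` tables, and the payload field names (`initial_status`, `to`, `mode`, `blockers`) are assumptions, since the memo does not fix the payload schema.

```python
def fold_task(events):
    """Fold task_* events (oldest first) into the cached task state."""
    state = {"status": None, "priority": None, "blockers": [], "blocker_suggestions": []}
    for ev in events:
        kind, payload = ev["event_type"], ev["payload"]
        if kind == "task_created":
            state["status"] = payload["initial_status"]
            state["priority"] = payload.get("priority")
        elif kind == "task_status_changed":
            state["status"] = payload["to"]
            if payload["to"] == "done":
                state["blockers"] = []  # invariant: a done task has no blockers
        elif kind == "task_priority_changed":
            state["priority"] = payload["to"]
        elif kind == "task_blockers_set":
            if payload["mode"] == "replace":
                state["blockers"] = list(payload["blockers"])  # old set is discarded
            else:  # mode == "suggest": accumulate without touching confirmed blockers
                for blocker in payload["blockers"]:
                    if blocker not in state["blocker_suggestions"]:
                        state["blocker_suggestions"].append(blocker)
    return state
```

Since the state depends only on the event sequence, replaying the same events always yields the same snapshot, which is what makes a full rebuild safe.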
427
+
428
+
429
+ ## 8) Vector Outbox + LanceDB writer implementation instructions
430
+
431
+ ### 8.1 Outbox enqueue
432
+ - When an entry is materialized: insert-or-ignore vector_outbox(item_kind='entry', item_id=entry_id, embedding_version=...)
433
+ - On task_created/task_title_changed: vector_outbox(item_kind='task_title', item_id=task_id, ...)
434
+
435
+ ### 8.2 Single writer worker
436
+ - Process vector_outbox rows where status='pending' in batches
437
+ - Generate the embedding, then upsert into LanceDB idempotently
438
+   - The same id must not accumulate duplicate rows (upsert if possible; otherwise apply delete+add as one consistent rule)
439
+ - Success: outbox status='done'
440
+ - Failure: status='failed', store the error
441
+ - reconcile():
442
+   - Reprocess pending jobs
443
+   - A retry policy for failed jobs (with an attempt limit) is optional
444
+
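A minimal sketch of the enqueue and single-writer steps, with several stand-ins that are assumptions: `sqlite3` replaces DuckDB, a plain dict keyed by `(item_kind, item_id, embedding_version)` replaces the LanceDB table, and the `embed` and `load_text` callables are hypothetical hooks supplied by the caller.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS vector_outbox (
  job_id TEXT PRIMARY KEY, item_kind TEXT NOT NULL, item_id TEXT NOT NULL,
  embedding_version TEXT NOT NULL, status TEXT NOT NULL, error TEXT,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP, updated_at TEXT DEFAULT CURRENT_TIMESTAMP,
  UNIQUE(item_kind, item_id, embedding_version));
"""


def enqueue(con, job_id, item_kind, item_id, embedding_version):
    """Insert a pending job; the UNIQUE constraint makes a repeated enqueue a no-op."""
    with con:
        con.execute(
            "INSERT OR IGNORE INTO vector_outbox "
            "(job_id, item_kind, item_id, embedding_version, status) "
            "VALUES (?, ?, ?, ?, 'pending')",
            (job_id, item_kind, item_id, embedding_version),
        )


def run_worker(con, vectors: dict, embed, load_text, batch_size=100):
    """Process one batch of pending jobs; returns how many jobs were attempted."""
    rows = con.execute(
        "SELECT job_id, item_kind, item_id, embedding_version FROM vector_outbox "
        "WHERE status = 'pending' ORDER BY created_at, job_id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for job_id, kind, item_id, version in rows:
        try:
            # Keyed assignment is the idempotent upsert: same key, one row.
            vectors[(kind, item_id, version)] = embed(load_text(kind, item_id))
            status, error = "done", None
        except Exception as exc:  # record the failure instead of stopping the batch
            status, error = "failed", str(exc)
        with con:
            con.execute(
                "UPDATE vector_outbox SET status = ?, error = ?, "
                "updated_at = CURRENT_TIMESTAMP WHERE job_id = ?",
                (status, error, job_id),
            )
    return len(rows)
```

Calling `run_worker` again is a simple form of `reconcile()` for pending jobs: anything still marked `pending` after a crash is picked up on the next pass, and the keyed write keeps the retry from duplicating vectors.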
445
+
446
+
447
+ ## 9) Search / query API implementation instructions
448
+
449
+ ### 9.1 Effective blockers view
450
+
451
+ Create v_task_blockers_effective:
452
+ - If the target of a blocked_by edge is a condition that has a resolves_to edge, expand the effective blocker to the resolved_to target and return it
453
+
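The expansion rule for effective blockers, sketched in Python over in-memory edges rather than as a DuckDB view. The edge tuple shape `(src_id, rel_type, dst_id)` and the `entity_types` lookup are assumptions, and only a single `resolves_to` hop is followed.

```python
def effective_blockers(task_id, edges, entity_types):
    """Expand blocked_by targets: a resolved condition is replaced by what it resolves to.

    edges: iterable of (src_id, rel_type, dst_id)
    entity_types: dict mapping entity_id -> "task" | "condition" | "artifact"
    """
    resolves = {}
    for src, rel, dst in edges:
        if rel == "resolves_to":
            resolves.setdefault(src, []).append(dst)

    result = []
    for src, rel, dst in edges:
        if src != task_id or rel != "blocked_by":
            continue
        if entity_types.get(dst) == "condition" and dst in resolves:
            result.extend(resolves[dst])  # one hop only; chains are not followed here
        else:
            result.append(dst)
    return result
```

An unresolved condition stays in the result as-is, so a blocker that nobody has yet tied to a concrete task or artifact is still reported.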
454
+ ### 9.2 Main query functions (DB query + Python wrapper)
455
+ - list_blocked_tasks() : blocked tasks grouped with their expanded blockers
456
+ - list_tasks_with_only_suggested_blockers()
457
+ - list_tasks_with_unknown_placeholder_blocker()
458
+ - list_resolved_conditions() : list of condition_resolved_to records
459
+ - get_task_detail(task_id) : snapshot + blocked_by + suggested + history of the related task_* events
460
+
461
+ Add the following commands to the CLI:
462
+ - cli blocked
463
+ - cli task show <task_id>
464
+ - cli tasks --status blocked|pending|...
465
+ - cli conditions resolved
466
+
467
+
468
+
469
+ ## 10) Orchestrator modification instructions (wiring the whole pipeline)
470
+
471
+ ### 10.1 process_session(session_log, date, session_id) order of operations
472
+ 1. Generate a build_id and record it in build_runs (status=running)
473
+ 2. Append a session_ingested event (payload: raw log + message array)
474
+ 3. Run the extractor → append a memory_extracted event
475
+ 4. Run EvidenceAligner → append an evidence_aligned event
476
+ 5. entry materialization: INSERT into entries (immutable)
477
+ 6. Idris generation/validation → record the result as an event (idris_checked) and reflect it in the stage/promotion candidates
478
+ 7. Run TaskResolver (over the extracted task entries) → append task_* events
479
+ 8. Run TaskProjector (incremental processing)
480
+ 9. Run the outbox worker (synchronous call or a separate process/thread)
481
+ 10. Update build_runs as finished (status=success|failed)
482
+ 11. Record pipeline_metrics (latency of each stage)
483
+
484
+ NOTE: when the projector and the worker run in a single process, guaranteed ordering (single writer) is mandatory.
485
+
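One way to get a per-stage `pipeline_metrics` row and the final build status out of the ordered steps above is a timing context manager. This is a sketch under assumptions: metric rows are collected in a list instead of being inserted into DuckDB, the helper names are hypothetical, and `run_stages` stands in for the body of `process_session` with each stage passed as a callable.

```python
import time
import uuid
from contextlib import contextmanager


@contextmanager
def timed_stage(metrics: list, stage: str, session_id: str):
    """Record one pipeline_metrics-style row per stage, whether it succeeds or fails."""
    start = time.perf_counter()
    row = {"id": uuid.uuid4().hex, "stage": stage, "session_id": session_id,
           "success": True, "error": None}
    try:
        yield
    except Exception as exc:
        row["success"], row["error"] = False, str(exc)
        raise
    finally:
        row["latency_ms"] = (time.perf_counter() - start) * 1000.0
        metrics.append(row)


def run_stages(stages, session_id):
    """Run (name, callable) stages strictly in order; stop at the first failure."""
    metrics, status = [], "success"
    for name, fn in stages:
        try:
            with timed_stage(metrics, name, session_id):
                fn()
        except Exception:
            status = "failed"  # would become build_runs.status
            break
    return status, metrics
```

Running the stages sequentially in one loop is also the simplest way to honor the single-writer ordering when everything lives in one process.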
486
+
487
+
488
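+ ## 11) Idris-related instructions (minimal changes)
489
+ - The Python implementation is the priority for now, so change Idris only minimally, based on the "Candidate/Verified wrapper":
490
+   - Introduce entryId/TaskId/EventId types (strings)
491
+   - Add an evidenceSpans field to Fact/Decision/Task (or to Reference only)
492
+   - Enforce Bool checks with So in the Verified wrapper (if possible)
493
+ - Use whether idris2 --check passes as a condition for Candidate→Verified promotion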
494
+
495
+
496
+
497
+ ## 12) Tests (required)
498
+
499
+ ### 12.1 Evidence alignment
500
+ - When the quote exists in the source text, the span is computed exactly
501
+ - When the quote is absent, evidenceAligned becomes false
502
+
503
+ ### 12.2 Task transition property test
504
+ - Forbidden transitions (pending→done) never happen automatically
505
+ - A blocked task never has empty blockers
506
+ - A done task has empty blockers
507
+
508
+ ### 12.3 Idempotency test
509
+ - Even when the same session is processed twice:
510
+   - event_dedup blocks the duplicates
511
+   - edges are not created twice
512
+   - the outbox creates no duplicate jobs
513
+
514
+ ### 12.4 Outbox reconcile test
515
+ - Retrying a pending job moves it to done (or leaves a failure record)
516
+
517
+
518
+
519
+ ## 13) Completion criteria (Definition of Done)
520
+ - JSONB is removed from the DuckDB schema, which is reproducible via migrations
521
+ - events + event_dedup + projection_offsets work
522
+ - The Extractor outputs quote-only evidence, and EvidenceAligner finalizes the spans
523
+ - entries/entities/edges are separated and basic queries work
524
+ - The Task entity system works:
525
+   - New task creation, status changes, blockers replace/suggest, placeholder handling
526
+   - Idempotent creation of conditions/artifacts via declared events
527
+   - The task projector updates entities/edges reliably
528
+ - vector_outbox + single writer + reconcile work (no duplicate vectors)
529
+ - Blocked tasks and task details can be queried from the CLI
530
+ - The core tests (alignment/transition/idempotency/outbox) pass
531
+ - pipeline_metrics are recorded (at least for the extraction/align/project/index stages)
532
+
533
+
534
+
535
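+ ## 14) Implementation notes (caveats)
536
+ - DuckDB JSON columns may not be loaded automatically as dicts in Python → apply json.loads where needed
537
+ - The LanceDB distance scale can vary by model/configuration, so isolate the similarity conversion in a function that can be recalibrated later
538
+ - FTS PRAGMAs/functions differ across DuckDB versions, so encapsulate FTS calls in one place (view/function)
539
+ - Do not rely on DB triggers for "append-only"; enforce it by not providing the methods in the API at all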
540
+
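The first two notes can each be isolated in a small helper. Both functions below are hypothetical: `load_json_column` accepts either an already-decoded value or a JSON string, and `distance_to_similarity` assumes that a cosine distance equals `1 - cosine similarity` and picks `1 / (1 + d)` as an arbitrary monotone mapping for L2 distances. Treat the mappings as placeholders to be recalibrated against the actual embedder and index settings.

```python
import json


def load_json_column(value):
    """Return a dict/list whether the driver handed back parsed JSON or a JSON string."""
    if value is None or isinstance(value, (dict, list)):
        return value
    return json.loads(value)


def distance_to_similarity(distance: float, metric: str = "l2") -> float:
    """Single place to convert a vector-store distance into a similarity score."""
    if metric == "cosine":
        return 1.0 - distance  # assumes distance = 1 - cosine similarity
    if metric == "l2":
        return 1.0 / (1.0 + distance)  # arbitrary monotone map into (0, 1]
    raise ValueError(f"unsupported metric: {metric}")
```

Keeping the conversion in one function means the strict-match thresholds in the matcher do not have to change when the distance metric or the embedding model does; only this mapping is retuned.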
541
+
542
+
543
+ ## 15) Implementation priority checklist (Claude Code work order)
544
+ 1. Apply migrations + schema
545
+ 2. EventStore + append_dedup + fetch_since + offsets
546
+ 3. Change the Extractor prompt/parser (quote-only)
547
+ 4. EvidenceAligner + evidence_aligned event
548
+ 5. entries materialize + entry→task evidence edge (follow-up)
549
+ 6. entities/edges + EntityRepo
550
+ 7. Task canonical + TaskMatcher (exact only at first, FTS/vector later)
551
+ 8. BlockerResolver(condition/artifact declared)
552
+ 9. TaskResolver (task_* events)
553
+ 10. TaskProjector(mode replace/suggest)
554
+ 11. vector_outbox + writer + reconcile
555
+ 12. Query API + CLI
556
+ 13. Tests, metrics, and build-run records
557
+
558
+