freshcontext-mcp 0.3.19 → 0.3.20
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/FRESHCONTEXT_SPEC.md +317 -0
- package/METHODOLOGY.md +381 -0
- package/README.md +11 -7
- package/dist/adapters/arxiv.js +2 -1
- package/dist/adapters/changelog.js +4 -2
- package/dist/adapters/finance.js +1 -1
- package/dist/adapters/gdelt.js +1 -1
- package/dist/adapters/gebiz.js +1 -1
- package/dist/adapters/reddit.js +11 -4
- package/dist/adapters/repoSearch.js +1 -1
- package/dist/adapters/secFilings.js +1 -1
- package/dist/core/envelope.js +9 -1
- package/dist/security.js +3 -1
- package/dist/server.js +2 -2
- package/dist/tools/evaluateContext.js +19 -0
- package/docs/CLIENT_SETUP.md +166 -0
- package/docs/CODEX_MCP_USAGE.md +2 -2
- package/docs/CORE_API.md +8 -6
- package/docs/FUTURE_LANES.md +42 -19
- package/docs/HA_PRI_V2_DESIGN.md +7 -1
- package/docs/HA_PRI_V2_PRODUCTION_ENFORCEMENT_PLAN.md +414 -0
- package/docs/RELEASE_INTEGRITY.md +1 -1
- package/docs/SIGNAL_CONTRACT.md +213 -17
- package/package-script-guard.mjs +75 -28
- package/package.json +6 -1
- package/server.json +2 -2
@@ -0,0 +1,414 @@

# Ha-Pri v2 Production Enforcement Plan

Status: design only
Phase: Pass 11-K
Runtime impact: none

Ha-Pri v2 production enforcement is a future rollout path. Current FreshContext releases include only the Core helper and deterministic golden vectors. Worker/D1 enforcement will exist only if a later implementation pass explicitly wires it in.

This plan describes how Ha-Pri v2 could move from a pure Core provenance helper to production Store/Worker verification without overclaiming current behavior.

## 1. Current State

Current FreshContext behavior:

- Ha-Pri v1 is the current Worker/feed audit stamp.
- Ha-Pri v1 is stored as `scrape_results.ha_pri_sig`.
- Ha-Pri v1 is returned in Worker feed `intelligence_stamps`.
- Ha-Pri v1 is a provenance stamp and audit reference, not a hard row-rejection mechanism.
- Ha-Pri v2 exists as a pure Core helper in `src/core/provenance.ts`.
- Ha-Pri v2 has deterministic golden vectors in `tests/fixtures/ha-pri-v2-golden-vectors.json`.
- Ha-Pri v2 is not production-enforced on Worker/D1 reads.
- Ha-Pri v2 does not currently reject rows.
- Ha-Pri v2 does not currently provide private-key origin authentication.

The current v2 helper provides deterministic canonicalization, SHA-256 hashing, signing payload construction, signature calculation, and one of three verification statuses:

```txt
valid
invalid
unknown
```

That is real Core behavior. It is not yet Worker/D1 enforcement.

## 2. Target Future State

Future production enforcement should make stored context rows independently reviewable after write.

Target behavior:

- Worker write path stores Ha-Pri v2 provenance material for new rows.
- Rows can later be verified as `valid`, `invalid`, or `unknown`.
- Debug/internal read paths can report verification status.
- Safe public read paths can expose limited verification status without leaking internals.
- Invalid rows are not silently treated as trusted.
- Old rows remain compatible through `unknown` and optional backfill behavior.
- Strict rejection remains optional and staged, not the first rollout.

The target is not "FreshContext proves truth." The target is "FreshContext can detect whether stored provenance material still matches the stored row material under the documented v2 contract."

## 3. Proposed D1 / Storage Fields

Essential fields:

```txt
ha_pri_sig_v2 TEXT
ha_pri_canonical_content_sha256 TEXT
ha_pri_semantic_fingerprint_sha256 TEXT
ha_pri_signing_payload_version TEXT
ha_pri_engine_version TEXT
```

Recommended operational fields:

```txt
ha_pri_verification_status TEXT
ha_pri_verified_at TEXT
ha_pri_backfill_status TEXT
```

Likely unnecessary as separate stored fields:

```txt
ha_pri_adapter
ha_pri_published_at
ha_pri_retrieved_at
```

Reason: adapter, published timestamp, and retrieved/scraped timestamp already exist or should exist as first-class row fields. Duplicating them inside Ha-Pri-specific columns risks drift. The signing payload should read those canonical row fields directly during verification.

Minimum practical schema:

```sql
ALTER TABLE scrape_results ADD COLUMN ha_pri_sig_v2 TEXT;
ALTER TABLE scrape_results ADD COLUMN ha_pri_canonical_content_sha256 TEXT;
ALTER TABLE scrape_results ADD COLUMN ha_pri_semantic_fingerprint_sha256 TEXT;
ALTER TABLE scrape_results ADD COLUMN ha_pri_signing_payload_version TEXT;
ALTER TABLE scrape_results ADD COLUMN ha_pri_engine_version TEXT;
ALTER TABLE scrape_results ADD COLUMN ha_pri_verification_status TEXT;
ALTER TABLE scrape_results ADD COLUMN ha_pri_verified_at TEXT;
ALTER TABLE scrape_results ADD COLUMN ha_pri_backfill_status TEXT;
```

Storage status values should be boring and explicit:

```txt
valid
invalid
unknown
not_checked
```

Backfill status values should avoid pretending old rows were originally v2-stamped:

```txt
none
backfilled
unknown_origin
failed
```

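Because D1 stores these values as plain `TEXT`, readers have to handle `NULL` on rows that predate the migration as well as any unexpected string. A minimal TypeScript sketch, assuming the value lists above; the helper names and the fallback choices (a `NULL` status reads as `not_checked`, an unrecognized status as `unknown`, an unrecognized backfill value as `unknown_origin`) are hypothetical, chosen so that no unexpected input is ever promoted to `valid` or `backfilled`.

```ts
// Hypothetical typed views of the proposed TEXT status columns.
const VERIFICATION_STATUSES = ["valid", "invalid", "unknown", "not_checked"] as const;
const BACKFILL_STATUSES = ["none", "backfilled", "unknown_origin", "failed"] as const;

type VerificationStatus = (typeof VERIFICATION_STATUSES)[number];
type BackfillStatus = (typeof BACKFILL_STATUSES)[number];

// Type guard: is `raw` one of the allowed literal values?
function isOneOf<T extends string>(values: readonly T[], raw: string): raw is T {
  return (values as readonly string[]).includes(raw);
}

// Columns added by ALTER TABLE ... ADD COLUMN are NULL on pre-existing rows.
function parseVerificationStatus(raw: string | null): VerificationStatus {
  if (raw === null) return "not_checked";
  // An unrecognized value is never promoted to "valid".
  return isOneOf(VERIFICATION_STATUSES, raw) ? raw : "unknown";
}

function parseBackfillStatus(raw: string | null): BackfillStatus {
  if (raw === null) return "none";
  // An unrecognized value makes no claim about where the provenance came from.
  return isOneOf(BACKFILL_STATUSES, raw) ? raw : "unknown_origin";
}
```

Keeping the allowed values in `as const` arrays means the TypeScript types and the runtime checks come from a single list, so they cannot drift apart.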
## 4. Write-Path Design

The future Worker write path should dual-stamp v1 and v2 for new rows.

Recommended write sequence:

1. Adapter returns raw candidate content.
2. Existing Worker scoring computes current DAR fields and Ha-Pri v1.
3. Write path prepares Ha-Pri v2 input:
   - `resultId`: row id that will be stored
   - `rawContent`: canonical row raw content
   - `semanticFingerprint`: semantic fingerprint material or stored fingerprint
   - `adapter`: adapter id
   - `publishedAt`: normalized source publication timestamp or `null`
   - `retrievedAt`: normalized scrape/retrieval timestamp
   - `engineVersion`: FreshContext engine/package version or explicit Worker engine version
4. If required v2 material is complete, calculate:
   - canonical content SHA-256
   - semantic fingerprint SHA-256
   - signing payload version
   - Ha-Pri v2 signature
5. Store v1 fields as today.
6. Store v2 fields alongside v1 fields.
7. If material is incomplete, store unknown-compatible metadata and do not pretend the row is valid.

Recommended incomplete-material behavior:

```txt
ha_pri_sig_v2 = null
ha_pri_verification_status = "unknown"
ha_pri_backfill_status = "none"
```

Canonical content should be produced by the same pure helper behavior used in the Core golden vectors. Do not invent a second, Worker-only canonicalization contract.

The semantic fingerprint should be produced before signing and should be stable across retries for the same underlying source item. If the fingerprint is missing, v2 signing should fall back to `unknown`, not a fake valid signature.

The engine version should be explicit. The safest initial choice is the package/server version used by the running Worker build.

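Steps 3, 4, and 7 of the write sequence above can be sketched in TypeScript. This is a hypothetical sketch, not the Worker implementation: the `HaPriV2Core` interface stands in for whatever `src/core/provenance.ts` actually exports, and it is injected so the sketch never defines its own canonicalization or payload format. Treating an empty string as missing material and storing `not_checked` for freshly stamped rows are assumptions.

```ts
// Hypothetical stand-in for the Core helper in src/core/provenance.ts.
// Canonicalization, hashing, and payload format stay behind this interface.
interface HaPriV2Core {
  payloadVersion: string;
  canonicalContentSha256(rawContent: string): string;
  semanticFingerprintSha256(fingerprint: string): string;
  sign(input: CompleteHaPriV2Input): string;
}

interface HaPriV2Input {
  resultId: string;
  rawContent: string | null;
  semanticFingerprint: string | null;
  adapter: string | null;
  publishedAt: string | null; // null is allowed: not every source has a publication date
  retrievedAt: string | null;
  engineVersion: string | null;
}

// Same fields, with the required v2 material known to be present.
type CompleteHaPriV2Input = HaPriV2Input & {
  rawContent: string;
  semanticFingerprint: string;
  adapter: string;
  retrievedAt: string;
  engineVersion: string;
};

interface HaPriV2Columns {
  ha_pri_sig_v2: string | null;
  ha_pri_canonical_content_sha256: string | null;
  ha_pri_semantic_fingerprint_sha256: string | null;
  ha_pri_signing_payload_version: string | null;
  ha_pri_engine_version: string | null;
  ha_pri_verification_status: "not_checked" | "unknown";
  ha_pri_backfill_status: "none";
}

// Assumption: empty strings count as missing material.
function isComplete(input: HaPriV2Input): input is CompleteHaPriV2Input {
  return Boolean(
    input.rawContent && input.semanticFingerprint && input.adapter && input.retrievedAt && input.engineVersion,
  );
}

function prepareHaPriV2Columns(input: HaPriV2Input, core: HaPriV2Core): HaPriV2Columns {
  if (!isComplete(input)) {
    // Step 7: incomplete material gets no signature and an explicit "unknown".
    return {
      ha_pri_sig_v2: null,
      ha_pri_canonical_content_sha256: null,
      ha_pri_semantic_fingerprint_sha256: null,
      ha_pri_signing_payload_version: null,
      ha_pri_engine_version: input.engineVersion,
      ha_pri_verification_status: "unknown",
      ha_pri_backfill_status: "none",
    };
  }
  // Step 4: every derived value comes from the shared Core helper.
  return {
    ha_pri_sig_v2: core.sign(input),
    ha_pri_canonical_content_sha256: core.canonicalContentSha256(input.rawContent),
    ha_pri_semantic_fingerprint_sha256: core.semanticFingerprintSha256(input.semanticFingerprint),
    ha_pri_signing_payload_version: core.payloadVersion,
    ha_pri_engine_version: input.engineVersion,
    // Assumption: valid/invalid is recorded only by a later verification pass.
    ha_pri_verification_status: "not_checked",
    ha_pri_backfill_status: "none",
  };
}
```

In a Worker, the returned object would be merged into the same insert that writes the v1 fields, so that v1 and v2 are stamped together as in step 6.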
## 5. Read / Debug Verification Design

Future read verification should be a pure recomputation:

```txt
verifyHaPriV2(row) -> valid | invalid | unknown
```

Suggested internal verification input:

```ts
// `row` is a scrape_results row read from D1.
// `ha_pri_engine_version` is the column proposed in section 3; the rest are existing row fields.
const input = {
  resultId: row.id,
  rawContent: row.raw_content,
  semanticFingerprint: row.semantic_fingerprint,
  adapter: row.adapter,
  publishedAt: row.published_at,
  retrievedAt: row.scraped_at,
  engineVersion: row.ha_pri_engine_version,
};
```

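A minimal sketch of the read-side recomputation, under stated assumptions: the signing function is the shared Core helper passed in as a parameter (so no second contract is invented), and the signature-format check assumes 64-character lowercase hex SHA-256 digests, which should be confirmed against the golden vectors. The status mapping follows the compatibility rules in section 6 and the threat model in section 8: a missing signature is `unknown`, a malformed or mismatched one is `invalid`, and a recomputation error degrades to `unknown` instead of crashing the read.

```ts
type HaPriV2Status = "valid" | "invalid" | "unknown";

// Existing row fields plus the proposed v2 columns.
interface StoredRow {
  id: string;
  raw_content: string | null;
  semantic_fingerprint: string | null;
  adapter: string | null;
  published_at: string | null;
  scraped_at: string | null;
  ha_pri_sig_v2: string | null;
  ha_pri_engine_version: string | null;
}

interface SigningInput {
  resultId: string;
  rawContent: string;
  semanticFingerprint: string;
  adapter: string;
  publishedAt: string | null;
  retrievedAt: string;
  engineVersion: string;
}

// Assumption: stored v2 signatures are 64-character lowercase hex digests.
const SIGNATURE_FORMAT = /^[0-9a-f]{64}$/;

// `sign` is the shared Core signing function, injected rather than reimplemented.
function verifyHaPriV2(row: StoredRow, sign: (input: SigningInput) => string): HaPriV2Status {
  const stored = row.ha_pri_sig_v2?.trim() ?? "";
  // No v2 signature (v1-only or pre-migration row): not evidence of tampering.
  if (stored === "") return "unknown";
  if (!SIGNATURE_FORMAT.test(stored)) return "invalid";

  const { raw_content, semantic_fingerprint, adapter, scraped_at, ha_pri_engine_version } = row;
  if (!raw_content || !semantic_fingerprint || !adapter || !scraped_at || !ha_pri_engine_version) {
    return "unknown"; // material needed to recompute is missing
  }

  try {
    const expected = sign({
      resultId: row.id,
      rawContent: raw_content,
      semanticFingerprint: semantic_fingerprint,
      adapter,
      publishedAt: row.published_at,
      retrievedAt: scraped_at,
      engineVersion: ha_pri_engine_version,
    });
    return expected === stored ? "valid" : "invalid";
  } catch {
    // A recomputation error degrades to "unknown" instead of crashing the read.
    return "unknown";
  }
}
```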
Debug output may include:

```json
{
  "ha_pri_v2": {
    "status": "valid",
    "checked_at": "2026-06-11T12:00:00.000Z",
    "payload_version": "FRESHCONTEXT_HA_PRI_V2",
    "canonical_content_sha256": "sha256...",
    "semantic_fingerprint_sha256": "sha256..."
  }
}
```

Safe public output should be smaller:

```json
{
  "provenance": {
    "ha_pri_v2_status": "valid"
  }
}
```

Do not expose signing payloads or debug hashes in broad public outputs unless there is a clear user need.

Suggested staged read behavior (these phases are numbered separately from the rollout phases in section 10):

- Phase 1: report-only verification in internal/debug output.
- Phase 2: warning in read/debug path when invalid.
- Phase 3: optional strict mode for private deployments.
- Phase 4: possible reject/block policy only after replay data and operational evidence.

An `invalid` status should not trigger automatic rejection in the first rollout. Otherwise a migration bug, canonicalization mismatch, or schema rollout issue could hide useful rows.

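The staged behavior and the public/debug split above can be expressed as two small pure functions. A sketch, assuming a hypothetical per-deployment `EnforcementMode` setting and a hypothetical `ha_pri_v2_invalid` warning code; neither exists in FreshContext today.

```ts
type HaPriV2Status = "valid" | "invalid" | "unknown";

// Hypothetical deployment setting mirroring the staged read behavior.
type EnforcementMode = "report_only" | "warn" | "strict";

interface ReadDecision {
  serve: boolean;
  warning: string | null;
}

function decideRead(status: HaPriV2Status, mode: EnforcementMode): ReadDecision {
  // Only "invalid" is ever acted on; "unknown" covers old and incomplete rows.
  // In report_only mode the status still appears in debug output, but reads are unaffected.
  if (status !== "invalid" || mode === "report_only") {
    return { serve: true, warning: null };
  }
  if (mode === "warn") {
    return { serve: true, warning: "ha_pri_v2_invalid" };
  }
  // "strict": opt-in for private deployments, after replay and operational evidence.
  return { serve: false, warning: "ha_pri_v2_invalid" };
}

interface DebugVerification {
  status: HaPriV2Status;
  checked_at: string;
  payload_version: string;
  canonical_content_sha256: string;
  semantic_fingerprint_sha256: string;
}

// Public responses carry only the status; hashes and payload details stay in debug output.
function toPublicProvenance(debug: DebugVerification): { provenance: { ha_pri_v2_status: HaPriV2Status } } {
  return { provenance: { ha_pri_v2_status: debug.status } };
}
```

Building the public shape from an explicit allowlist, rather than deleting fields from the debug object, means a debug field added later cannot leak into public output by accident.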
## 6. Compatibility And Backfill

Old rows must remain readable.

Compatibility rules:

- v1-only rows verify as `unknown` for v2.
- Rows with missing v2 fields verify as `unknown`.
- Rows with malformed v2 signatures verify as `invalid`.
- Rows with present but mismatched v2 signatures verify as `invalid`.
- Missing `ha_pri_sig_v2` is not the same as tampering.
- Existing `ha_pri_sig` remains readable for historical continuity.

Backfill rules:

- Backfilled provenance must be marked as `backfilled` or `unknown_origin`.
- Backfill must not imply the row was v2-stamped at original write time.
- Backfill should preserve original row timestamps.
- Backfill should record when verification/backfill happened.
- Backfill should be reversible or repeatable where practical.

Possible backfill process:

1. Select rows missing `ha_pri_sig_v2`.
2. Reconstruct v2 input from stored row fields.
3. If required material is complete, calculate v2 fields.
4. Store v2 fields with `ha_pri_backfill_status = "backfilled"`.
5. If required material is incomplete, store `ha_pri_verification_status = "unknown"` and `ha_pri_backfill_status = "unknown_origin"`.
6. Report counts for backfilled, unknown, invalid, and failed rows.

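The possible backfill process above can be sketched as a pure function over rows the caller has already selected (for example with `WHERE ha_pri_sig_v2 IS NULL`). The injected `sign` callback, storing `not_checked` for backfilled rows, and reusing `ha_pri_verified_at` as the backfill time are assumptions. Counts for `invalid` rows would come from running verification afterwards, so this sketch does not produce them. It never writes `published_at` or `scraped_at`, which keeps the original row timestamps intact.

```ts
type BackfillOutcome = "backfilled" | "unknown_origin" | "failed";

interface LegacyRow {
  id: string;
  raw_content: string | null;
  semantic_fingerprint: string | null;
  adapter: string | null;
  published_at: string | null;
  scraped_at: string | null;
}

interface BackfillUpdate {
  id: string;
  ha_pri_sig_v2: string | null;
  ha_pri_engine_version: string;
  ha_pri_verification_status: "unknown" | "not_checked";
  ha_pri_backfill_status: BackfillOutcome;
  // Assumption: ha_pri_verified_at records when the backfill ran.
  ha_pri_verified_at: string;
}

function backfillRows(
  rows: LegacyRow[],
  engineVersion: string,
  sign: (row: LegacyRow, engineVersion: string) => string, // hypothetical Core signer
  now: string,
): { updates: BackfillUpdate[]; counts: Record<BackfillOutcome, number> } {
  const counts: Record<BackfillOutcome, number> = { backfilled: 0, unknown_origin: 0, failed: 0 };
  const updates: BackfillUpdate[] = [];

  for (const row of rows) {
    const complete = Boolean(row.raw_content && row.semantic_fingerprint && row.adapter && row.scraped_at);
    let outcome: BackfillOutcome = complete ? "backfilled" : "unknown_origin";
    let sig: string | null = null;
    if (complete) {
      try {
        sig = sign(row, engineVersion);
      } catch {
        outcome = "failed"; // keep going; the failure is reported in counts
      }
    }
    counts[outcome] += 1;
    updates.push({
      id: row.id,
      ha_pri_sig_v2: sig,
      ha_pri_engine_version: engineVersion,
      ha_pri_verification_status: outcome === "backfilled" ? "not_checked" : "unknown",
      ha_pri_backfill_status: outcome,
      ha_pri_verified_at: now,
    });
  }
  return { updates, counts };
}
```

Because the function only returns planned updates, it can be dry-run and repeated safely before anything is written to D1.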
## 7. Security Boundary

Plain SHA-256 gives deterministic integrity and audit checks.

Plain SHA-256 does not prove private origin authentication. Anyone with all payload fields can recompute a plain SHA-256 signature.

Ha-Pri v2 helps detect:

- accidental row corruption
- changed content after write
- changed semantic fingerprint material
- changed adapter/timestamp/version fields included in the signing payload
- malformed stored signatures

Ha-Pri v2 does not solve:

- truth certification
- legal, medical, tax, employment, academic, or investment correctness
- private origin authentication without a secret or private key
- compromise of the write path before signing
- compromise of all row fields plus signature under plain SHA-256

Recommendation:

Do not add HMAC/private signing to the open package yet. Keep the open package deterministic and stateless.

Consider HMAC or private-key signing later for:

- hosted FreshContext endpoints
- private production deployments
- paid/tenant-specific infrastructure
- environments where the verifier must know the row was stamped by a trusted FreshContext deployment

If HMAC/private signing is added later, it requires:

- secret storage outside the repo
- key ids
- key rotation
- old-key verification policy
- signer/verifier boundary documentation
- tests proving secrets are never logged or returned

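To make the security boundary concrete: a plain SHA-256 value can be recomputed by anyone who holds the payload fields, whereas an HMAC tag can only be reproduced with the secret key. The sketch below uses Node's `node:crypto` module (`createHash`, `createHmac`, `timingSafeEqual`). A Cloudflare Worker would more likely use Web Crypto, and the pipe-joined payload string is a toy stand-in for the real v2 signing payload.

```ts
import { Buffer } from "node:buffer";
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

// Toy payload for illustration; the real v2 signing payload is built by the Core helper.
const payload = "FRESHCONTEXT_HA_PRI_V2|r1|arxiv|2026-06-09T12:00:00.000Z|content-sha|fingerprint-sha";

// Plain SHA-256: anyone who holds the payload fields can produce the same value.
function plainSha256(p: string): string {
  return createHash("sha256").update(p, "utf8").digest("hex");
}

// HMAC-SHA256: reproducing the tag requires the secret key.
function hmacSha256(p: string, key: Buffer): string {
  return createHmac("sha256", key).update(p, "utf8").digest("hex");
}

// Constant-time comparison of two hex tags.
function tagsMatch(expectedHex: string, actualHex: string): boolean {
  const a = Buffer.from(expectedHex, "hex");
  const b = Buffer.from(actualHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}

// An actor who rewrites row fields can recompute a matching plain hash...
const tampered = payload.replace("arxiv", "reddit");
console.log(tagsMatch(plainSha256(tampered), plainSha256(tampered))); // true

// ...but cannot produce a matching HMAC tag without the deployment key.
const deploymentKey = Buffer.from("loaded-from-deployment-config"); // demo only: never hard-code real keys
const attackerKey = Buffer.from("attacker-guess");
console.log(tagsMatch(hmacSha256(tampered, deploymentKey), hmacSha256(tampered, attackerKey))); // false
```

This is why HMAC only adds origin authentication when the key lives in deployment configuration the row writer and verifier control, which matches the requirements list above.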
## 8. Threat Model

Threats considered:

### Accidental row corruption

Ha-Pri v2 helps by recomputing the expected signature and surfacing `invalid`.

### Stale or partial provenance

Ha-Pri v2 helps by returning `unknown` when required material is missing. The system should not pretend such rows are valid.

### Tampered D1 rows

Ha-Pri v2 helps if an attacker changes stored content or bound fields without also updating all matching v2 fields.

Plain SHA-256 does not help if an attacker can rewrite all row fields and recompute the public signature.

### Malformed signatures

Malformed, blank, or nonmatching signatures should produce `invalid` or `unknown` according to current helper behavior. They should not crash reads.

### Recomputed public SHA-256 signatures

Because v2 currently uses plain SHA-256, a party with all fields can recompute a matching signature. HMAC/private signing is the later answer if origin authentication becomes necessary.

### Debug endpoint leakage

Debug routes should remain authenticated. Public outputs should avoid exposing full signing payloads or internal row material unless deliberately needed.

### Secret exposure if HMAC is added later

HMAC/private signing introduces secret-management risk. Secrets must live in deployment configuration, never in docs, fixtures, npm package output, or client-visible responses.

## 9. Tests Needed Before Implementation

Future implementation should add tests before production rollout:

- D1 migration tests if the migration harness supports them.
- Write-path stamping tests for new rows.
- Dual-stamp tests proving v1 remains unchanged.
- Read-path verification tests for `valid`, `invalid`, and `unknown`.
- Old-row tests proving missing v2 fields are `unknown`, not invalid.
- Tampered-row tests for changed content, semantic fingerprint, adapter, timestamps, and engine version.
- Malformed signature tests.
- Debug output safety tests.
- Public output minimization tests.
- Backfill tests with complete and incomplete rows.
- Worker dry-run validation.
- HMAC/private signing tests if that later lands.

Do not add fake production-enforcement tests before implementation exists.

## 10. Rollout Phases

### Phase 0: Core helper and golden vectors

Done.

Includes:

- pure Core Ha-Pri v2 helper
- deterministic golden vectors
- valid / invalid / unknown verification behavior

### Phase 1: Storage schema design

This document.

No migration yet.

### Phase 2: Write-path dual-stamp v1 + v2

Add D1 columns and write v2 fields for new rows only. Keep v1 intact.

### Phase 3: Report-only read/debug verification

Recompute v2 on read/debug paths and report status. Do not reject rows yet.

### Phase 4: Backfill tooling

Backfill historical rows only with explicit `backfilled` or `unknown_origin` markers.

### Phase 5: Optional strict mode

Private deployments may opt into warnings or blocking for invalid rows after enough evidence.

### Phase 6: Hosted/private HMAC signing

Add origin-authenticated signing only if hosted/private use cases require it.

## 11. Non-Goals

This pass does not implement:

- D1 migration
- Worker enforcement
- read-path rejection
- debug endpoint changes
- HMAC/private signing
- backfill script
- npm publish
- version bump
- Cloudflare deploy
- public security claim upgrade
- new MCP tools
- ranking/scoring changes
- Operator/retrieve behavior

## Release Gates For Future Implementation

Before any implementation pass can claim production enforcement:

- migrations must be reviewed and dry-run
- new rows must dual-stamp v1 and v2
- v1 compatibility tests must pass
- v2 read verification tests must pass
- old-row `unknown` behavior must be proven
- tampered-row `invalid` behavior must be proven
- debug output must avoid leaking secrets or excessive internals
- Worker dry-run must pass
- Trust gate must pass with effective fail 0
- public docs must say exactly what is implemented

## Product Interpretation

Ha-Pri v2 is a provenance hardening lane for stored FreshContext rows. It supports the larger product story only when it stays honest:

```txt
candidate context in
decision-ready context out
optional provenance verification for stored rows
```

It is not a truth engine and not a substitute for authentication, authorization, or hosted tenant isolation.

```diff
@@ -22,7 +22,7 @@ This document describes release hardening practices for future FreshContext pack
 - Confirm fresh consumer `npm audit --omit=dev` is clean.
 - Run a stale-claim scan across public docs and package-facing files.
 - Run a secret scan before sharing archives, diligence folders, or package artifacts.
-- Keep operational demo runbooks, buyer scripts, outreach plans, diligence checklists, and
+- Keep operational demo runbooks, buyer scripts, outreach plans, diligence checklists, and private commercial materials outside the public npm package.

 ## Package Exclusion Checks

```

package/docs/SIGNAL_CONTRACT.md
CHANGED

````diff
@@ -1,8 +1,30 @@
-# FreshContext Signal Contract v1
-
-FreshContext Signal Contract v1
-
-
+# FreshContext Signal Contract v1
+
+FreshContext Signal Contract v1 is the current FreshContext input standard: the stable shape for candidate context that should be judged before it reaches a model.
+
+In plain terms:
+
+```text
+candidate context -> Signal Contract v1 -> FreshContext judgment -> decision-ready context
+```
+
+The Signal Contract is live product architecture. It is used by Core normalization, `evaluate_context`, bring-your-own-context demos, reference adapter signal paths, and future batch validation.
+
+Do not rename this with future-signal terminology. Future context signals, control signals, provenance confidence signals, and richer decision metadata are optional future layers on top of this stable input shape. They are not replacements for Signal Contract v1 and are not required fields today.
+
+It is an additive Core API. It does not change MCP tool schemas, Worker runtime behavior, D1 schema, Store scoring, feeds, or deployment behavior.
+
+## Batch Validation Harness
+
+Source checkouts include a small validation harness for testing larger Signal Contract v1 batches:
+
+```bash
+npm run batch:validate -- examples/batches/signal-contract-v1.academic.json
+```
+
+The harness reads caller-provided JSON, evaluates it through Core, summarizes status/date/decision counts, and prints a structured evidence block. It does not add required fields to the Signal Contract and does not fetch, crawl, scan folders, or call reference adapters.
+
+Replay output includes concise decision explanations and small reason-code lists for top results and human-review mismatches. These are reporting aids for auditability: they surface whether relevance, freshness, date confidence, status, utility, score normalization, or Source Profile behavior influenced a treatment label. They do not change Core scoring, ranking, normalization, or decision thresholds, and they do not certify truth.

 ## Contract Version

````

````diff
@@ -16,9 +38,9 @@ Every normalized signal includes:
 contract_version: "freshcontext.signal.v1"
 ```

-## Input Shape
-
-`FreshContextSignalInput` accepts the common fields used by adapters, agents, ranking, and future Store wiring:
+## Input Shape
+
+`FreshContextSignalInput` accepts the common fields used by adapters, agents, ranking, `evaluate_context`, and future Store wiring:

 ```ts
 interface FreshContextSignalInput {
````

````diff
@@ -38,7 +60,21 @@ interface FreshContextSignalInput {
 }
 ```

-`published_at` is the canonical signal timestamp. `content_date` is accepted as an adapter/envelope compatibility alias.
+`published_at` is the canonical signal timestamp. `content_date` is accepted as an adapter/envelope compatibility alias.
+
+Minimal caller-provided input usually looks like:
+
+```json
+{
+  "title": "Example source",
+  "content": "Candidate context text...",
+  "source": "https://example.com/source",
+  "source_type": "official_docs",
+  "published_at": "2026-06-01T00:00:00.000Z",
+  "retrieved_at": "2026-06-09T00:00:00.000Z",
+  "semantic_score": 0.92
+}
+```

 ## Normalized Output

````

````diff
@@ -62,16 +98,166 @@ interface FreshContextSignal {
 }
 ```

-## Normalization Rules
-
-- Missing or invalid `published_at` / `content_date` becomes `published_at: null`.
+## Normalization Rules
+
+- Missing or invalid `published_at` / `content_date` becomes `published_at: null`.
 - `content_date` maps to `published_at` when `published_at` is absent.
 - Meaningfully future-dated timestamps are cleared and receive `date_confidence: "unknown"`.
 - Small clock skew is tolerated by the same Core freshness policy used by envelope scoring.
 - Failed, empty, timeout, blocked, or error-looking content becomes `status: "failed"`.
 - Missing, invalid, negative, or oversized `semantic_score` is clamped into `0..1`.
-- `metadata` is shallow-copied so normalization does not mutate caller-owned objects.
-- `reasons` records meaningful normalization changes.
+- `metadata` is shallow-copied so normalization does not mutate caller-owned objects.
+- `reasons` records meaningful normalization changes.
+
+## Examples
+
+These examples are intentionally small. They show the current contract shape, not future optional metadata.
+
+### Valid Candidate Signals
+
+Academic research:
+
+```json
+{
+  "title": "A fresh retrieval-augmented generation benchmark",
+  "content": "The paper reports a 2026 benchmark for retrieval-augmented generation systems.",
+  "source": "https://arxiv.org/abs/2606.00001",
+  "source_type": "arxiv",
+  "published_at": "2026-06-01T09:00:00.000Z",
+  "retrieved_at": "2026-06-09T12:00:00.000Z",
+  "semantic_score": 0.94,
+  "metadata": {
+    "profile": "academic_research"
+  }
+}
+```
+
+Official docs:
+
+```json
+{
+  "title": "API changelog",
+  "content": "The official changelog documents the current API behavior.",
+  "source": "https://docs.example.com/changelog",
+  "source_type": "official_docs",
+  "published_at": "2026-06-08T10:00:00.000Z",
+  "retrieved_at": "2026-06-09T12:00:00.000Z",
+  "semantic_score": 0.88
+}
+```
+
+Jobs/opportunities:
+
+```json
+{
+  "title": "AI tools engineer",
+  "content": "A current remote role for an AI tools engineer.",
+  "source": "https://jobs.example.com/ai-tools-engineer",
+  "source_type": "jobs",
+  "published_at": "2026-06-07T08:00:00.000Z",
+  "retrieved_at": "2026-06-09T12:00:00.000Z",
+  "semantic_score": 0.86
+}
+```
+
+Market/finance:
+
+```json
+{
+  "title": "Company quarterly update",
+  "content": "The company reported current quarter revenue and guidance.",
+  "source": "https://investors.example.com/q2-update",
+  "source_type": "finance",
+  "published_at": "2026-06-09T07:00:00.000Z",
+  "retrieved_at": "2026-06-09T12:00:00.000Z",
+  "semantic_score": 0.83
+}
+```
+
+Social pulse:
+
+```json
+{
+  "title": "Developer discussion",
+  "content": "Developers are discussing setup friction and recent adoption.",
+  "source": "https://news.ycombinator.com/item?id=123456",
+  "source_type": "hackernews",
+  "published_at": "2026-06-09T11:00:00.000Z",
+  "retrieved_at": "2026-06-09T12:00:00.000Z",
+  "semantic_score": 0.71
+}
+```
+
+### Invalid Or Risky Candidate Signals
+
+Missing date:
+
+```json
+{
+  "title": "Relevant source with no date",
+  "content": "Useful candidate context, but no publication timestamp is available.",
+  "source": "https://example.com/no-date",
+  "source_type": "official_docs",
+  "retrieved_at": "2026-06-09T12:00:00.000Z",
+  "semantic_score": 0.78
+}
+```
+
+Invalid timestamp:
+
+```json
+{
+  "title": "Invalid date source",
+  "content": "Candidate context with malformed date metadata.",
+  "source": "https://example.com/bad-date",
+  "source_type": "official_docs",
+  "published_at": "not-a-date",
+  "retrieved_at": "2026-06-09T12:00:00.000Z",
+  "semantic_score": 0.78
+}
+```
+
+Meaningfully future-dated timestamp:
+
+```json
+{
+  "title": "Future-dated source",
+  "content": "Candidate context whose publication timestamp is after retrieval time.",
+  "source": "https://example.com/future-date",
+  "source_type": "official_docs",
+  "published_at": "2026-06-09T12:06:00.000Z",
+  "retrieved_at": "2026-06-09T12:00:00.000Z",
+  "semantic_score": 0.78
+}
+```
+
+Failed/error-looking content:
+
+```json
+{
+  "title": "Blocked source",
+  "content": "[Error] upstream timeout",
+  "source": "https://example.com/blocked",
+  "source_type": "official_docs",
+  "published_at": "2026-06-09T10:00:00.000Z",
+  "retrieved_at": "2026-06-09T12:00:00.000Z",
+  "semantic_score": 0.91
+}
+```
+
+Out-of-range semantic score:
+
+```json
+{
+  "title": "Overscored source",
+  "content": "Candidate context with an out-of-range relevance score.",
+  "source": "https://example.com/overscored",
+  "source_type": "official_docs",
+  "published_at": "2026-06-09T10:00:00.000Z",
+  "retrieved_at": "2026-06-09T12:00:00.000Z",
+  "semantic_score": 1.7
+}
+```

 ## Relationship to Existing Core Types

````

````diff
@@ -84,6 +270,16 @@ The signal contract does not replace existing Core types:

 The contract gives these surfaces a shared signal vocabulary without requiring Store, Worker, or MCP schema changes.

-## Boundary
-
-Signal Contract v1 does not determine truth, certify data, or provide legal, medical, tax, or financial advice. It provides normalized context metadata for freshness, provenance, relevance, and workflow review.
+## Boundary
+
+Signal Contract v1 does not determine truth, certify data, or provide legal, medical, tax, or financial advice. It provides normalized context metadata for freshness, provenance, relevance, and workflow review.
+
+## Future Metadata Boundary
+
+Future context signals, control signals, ingestion quality signals, structure preservation signals, and provenance confidence signals may later improve decisions such as `cite_as_primary`, `needs_refresh`, `needs_verification`, or `exclude`.
+
+Those are roadmap metadata layers. They should remain optional until tests prove they improve decisions. The public input contract should stay boring and stable:
+
+```text
+title + content + source + source_type + published_at + retrieved_at + semantic_score
+```
````
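The Normalization Rules in the SIGNAL_CONTRACT.md diff above can be sketched as a single TypeScript function. This is not the Core implementation. The 5-minute skew tolerance, the failure pattern, the default score of 0 for a missing score, the `"known"`/`"ok"` placeholder values, and the reason codes are all assumptions chosen for illustration; the real Core freshness policy and normalizer define the actual values. Pass-through fields such as `title` and `source` are omitted for brevity.

```ts
// Hypothetical sketch of the Signal Contract v1 normalization rules.
const CLOCK_SKEW_TOLERANCE_MS = 5 * 60 * 1000; // assumption, not the Core policy value
const FAILURE_PATTERN = /^\s*(\[error\]|error:|timeout|blocked)/i; // assumption

interface SignalInputSketch {
  content?: string;
  published_at?: string | null;
  content_date?: string | null;
  retrieved_at?: string;
  semantic_score?: unknown;
  metadata?: Record<string, unknown>;
}

interface NormalizedSignalSketch {
  published_at: string | null;
  date_confidence: "known" | "unknown"; // "known" is a placeholder value
  status: "ok" | "failed"; // "ok" is a placeholder value
  semantic_score: number;
  metadata: Record<string, unknown>;
  reasons: string[];
}

function parseTimestamp(value: string | null | undefined): number | null {
  if (!value) return null;
  const ms = Date.parse(value);
  return Number.isNaN(ms) ? null : ms;
}

function normalizeSignalSketch(input: SignalInputSketch, now: number = Date.now()): NormalizedSignalSketch {
  const reasons: string[] = [];

  // content_date is a compatibility alias, used only when published_at is absent.
  const rawDate = input.published_at ?? input.content_date ?? null;
  if (input.published_at == null && input.content_date != null) reasons.push("content_date_mapped");

  let publishedMs = parseTimestamp(rawDate);
  if (rawDate && publishedMs === null) reasons.push("published_at_invalid");

  // Future dates are judged against retrieval time when available (assumption).
  const referenceMs = parseTimestamp(input.retrieved_at) ?? now;
  if (publishedMs !== null && publishedMs - referenceMs > CLOCK_SKEW_TOLERANCE_MS) {
    publishedMs = null;
    reasons.push("published_at_future_cleared");
  }

  const content = input.content ?? "";
  const failed = content.trim() === "" || FAILURE_PATTERN.test(content);
  if (failed) reasons.push("content_failed");

  // Missing or non-numeric scores become 0; numeric scores are clamped into 0..1.
  const rawScore = typeof input.semantic_score === "number" ? input.semantic_score : NaN;
  const score = Number.isFinite(rawScore) ? Math.min(1, Math.max(0, rawScore)) : 0;
  if (score !== rawScore) reasons.push("semantic_score_clamped");

  return {
    published_at: publishedMs === null ? null : new Date(publishedMs).toISOString(),
    date_confidence: publishedMs === null ? "unknown" : "known",
    status: failed ? "failed" : "ok",
    semantic_score: score,
    metadata: { ...(input.metadata ?? {}) }, // shallow copy: the caller's object is not mutated
    reasons,
  };
}
```

Applied to the "Meaningfully future-dated timestamp" example (published six minutes after retrieval), this sketch clears `published_at` and marks the date `unknown`. A one-minute offset would be kept as tolerated skew.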