freshcontext-mcp 0.3.17 → 0.3.18
- package/LICENSE +21 -0
- package/NOTICE.md +17 -0
- package/README.md +395 -296
- package/SECURITY.md +34 -0
- package/TRADEMARKS.md +9 -0
- package/dist/adapters/arxiv.js +92 -48
- package/dist/adapters/hackernews.js +16 -16
- package/dist/adapters/registry.js +232 -0
- package/dist/core/decay.js +61 -0
- package/dist/core/decision.js +176 -0
- package/dist/core/envelope.js +59 -0
- package/dist/core/explain.js +28 -0
- package/dist/core/guards.js +17 -0
- package/dist/core/index.js +11 -0
- package/dist/core/pipeline.js +101 -0
- package/dist/core/provenance.js +73 -0
- package/dist/core/rank.js +84 -0
- package/dist/core/signal.js +101 -0
- package/dist/core/sourceProfiles.js +126 -0
- package/dist/core/types.js +1 -0
- package/dist/core/utility.js +90 -0
- package/dist/rest/handler.js +126 -0
- package/dist/server.js +1 -1
- package/dist/tools/freshnessStamp.js +1 -137
- package/dist/types.js +0 -1
- package/docs/API_DESIGN.md +434 -0
- package/docs/CODEX_MCP_USAGE.md +116 -0
- package/docs/CORE_API.md +224 -0
- package/docs/DEPENDENCY_DILIGENCE.md +63 -0
- package/docs/HA_PRI_V2_DESIGN.md +279 -0
- package/docs/OPERATIONAL_DEMO_RUNBOOK.md +458 -0
- package/docs/RELEASE_INTEGRITY.md +53 -0
- package/docs/RELEASE_NOTES.md +38 -0
- package/docs/SIGNAL_CONTRACT.md +89 -0
- package/docs/SOURCE_PROFILES.md +427 -0
- package/freshcontext.schema.json +103 -103
- package/package-script-guard.mjs +140 -0
- package/package.json +92 -59
- package/server.json +27 -28
- package/dist/apify.js +0 -133
package/docs/CORE_API.md
ADDED
|
@@ -0,0 +1,224 @@
|
|
|
1
|
+
# FreshContext Core API
|
|
2
|
+
|
|
3
|
+
FreshContext Core is the reusable engine layer in the current integrated MCP/Core package. It owns signal normalization, envelope creation, freshness scoring, failure honesty, rank/explain primitives, the context-utility primitive, and pure provenance helpers.
|
|
4
|
+
|
|
5
|
+
MCP, Worker HTTP, future REST, and future CLI/SDK surfaces should use Core as the contract center instead of redefining freshness or envelope behavior per host.
|
|
6
|
+
|
|
7
|
+
## Stable Public Core API
|
|
8
|
+
|
|
9
|
+
Import stable Core functions from:
|
|
10
|
+
|
|
11
|
+
```ts
|
|
12
|
+
import {
|
|
13
|
+
calculateFreshnessScore,
|
|
14
|
+
formatForLLM,
|
|
15
|
+
looksLikeFailedAdapterContent,
|
|
16
|
+
scoreLabel,
|
|
17
|
+
stampFreshness,
|
|
18
|
+
toStructuredJSON,
|
|
19
|
+
} from "./src/core/index.js";
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
### Envelope
|
|
23
|
+
|
|
24
|
+
- `stampFreshness(result, options, adapter)` creates a `FreshContext` object from adapter output.
|
|
25
|
+
- `formatForLLM(ctx, options?)` renders the text envelope and trailing structured JSON block.
|
|
26
|
+
- `toStructuredJSON(ctx)` returns the machine-readable FreshContext JSON shape.
|
|
27
|
+
|
|
28
|
+
### Scoring
|
|
29
|
+
|
|
30
|
+
- `calculateFreshnessScore(content_date, retrieved_at, adapter)` returns a freshness score from `0..100`, or `null` when the content date cannot be trusted.
|
|
31
|
+
- `scoreLabel(score)` maps a numeric freshness score to a human-readable label.
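A minimal sketch of how a `0..100` score with a `null` escape hatch could be shaped, assuming exponential decay over the content's age in days. The names `sketchFreshnessScore` and `sketchScoreLabel`, the decay rates, and the label thresholds are all illustrative assumptions; they are not the package's `LAMBDA` table or its real `scoreLabel` cutoffs.

```typescript
// Hypothetical per-adapter decay rates (per day). NOT the package's LAMBDA values.
const SKETCH_LAMBDA: Record<string, number> = { news: 0.1, papers: 0.005 };
const SKETCH_DEFAULT_LAMBDA = 0.02;
const MS_PER_DAY = 86400000;

function sketchFreshnessScore(
  contentDate: string | null,
  retrievedAt: string,
  adapter: string,
): number | null {
  if (!contentDate) return null; // no date: cannot be trusted
  const published = Date.parse(contentDate);
  const retrieved = Date.parse(retrievedAt);
  if (Number.isNaN(published) || Number.isNaN(retrieved)) return null;
  if (published > retrieved) return null; // dated after retrieval: not trusted
  const ageDays = (retrieved - published) / MS_PER_DAY;
  const lambda = SKETCH_LAMBDA[adapter] ?? SKETCH_DEFAULT_LAMBDA;
  return Math.round(100 * Math.exp(-lambda * ageDays));
}

function sketchScoreLabel(score: number | null): string {
  if (score === null) return "unknown";
  if (score >= 80) return "fresh"; // thresholds are assumed
  if (score >= 40) return "aging";
  return "stale";
}
```

The point of the sketch is the return contract: a bounded number when the date is usable, and `null` rather than a guessed score when it is not.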
|
|
32
|
+
|
|
33
|
+
### Guards
|
|
34
|
+
|
|
35
|
+
- `looksLikeFailedAdapterContent(raw)` detects empty, security, timeout, and error-like adapter output so failed content is not stamped as fresh high-confidence context.
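The guard idea can be sketched as a small predicate over raw adapter output. The function name `sketchLooksLikeFailure` and every pattern below are assumptions chosen for illustration; the real heuristics of `looksLikeFailedAdapterContent` are not shown in this document.

```typescript
// Hypothetical failure patterns: timeouts, security blocks, and error-prefixed output.
const SKETCH_FAILURE_PATTERNS: RegExp[] = [
  /\btimed? ?out\b/i,
  /\b(access denied|forbidden|captcha)\b/i,
  /^\s*\[?error\b/i,
];

function sketchLooksLikeFailure(raw: string | null | undefined): boolean {
  // Empty or whitespace-only output counts as a failure.
  if (raw == null || raw.trim().length === 0) return true;
  return SKETCH_FAILURE_PATTERNS.some((pattern) => pattern.test(raw));
}
```

Anchoring the error pattern to the start of the output is a deliberate choice in this sketch: legitimate content that merely mentions the word "error" should not be flagged.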
|
|
36
|
+
|
|
37
|
+
## Core Evaluation Pipeline
|
|
38
|
+
|
|
39
|
+
The Core evaluation pipeline is the pure orchestration layer over the existing primitives.
|
|
40
|
+
|
|
41
|
+
Public exports:
|
|
42
|
+
|
|
43
|
+
- `evaluateSignal(input, options?)`
|
|
44
|
+
- `evaluateSignals(inputs, options?)`
|
|
45
|
+
- `CoreSignalEvaluationOptions`
|
|
46
|
+
- `CoreSignalEvaluationResult`
|
|
47
|
+
- `CoreSignalEnvelopeResult`
|
|
48
|
+
- `CoreSignalProvenanceOptions`
|
|
49
|
+
|
|
50
|
+
`evaluateSignal` normalizes a signal, applies timestamp/failure guards, computes `freshness_score`, computes context-conditioned utility, ranks/explains the signal, optionally creates a FreshContext envelope, and optionally prepares Ha-Pri v2 provenance material.
|
|
51
|
+
|
|
52
|
+
It does not fetch, cache, write D1, inspect Worker bindings, know MCP tool schemas, deploy, or publish. Hosts decide whether to store, cache, transmit, or expose the returned result.
|
|
53
|
+
|
|
54
|
+
`evaluateSignals` evaluates each input and returns evaluations sorted by the existing `rankSignal` final score, preserving input order when scores tie.
|
|
55
|
+
|
|
56
|
+
Context utility is returned as sidecar output in the current pipeline; it does not replace or modify the default `rankSignal` / `evaluateSignals` ordering. A future pass may add an explicit utility-weighted ranking mode.
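The ordering rule described above can be sketched as a sort with an explicit index tie-break. The `SketchEvaluation` shape, the function name, and the descending direction (highest `final_score` first) are assumptions; only the tie-break on input order and the sidecar role of utility come from the text.

```typescript
interface SketchEvaluation {
  id: string;
  final_score: number; // 0..1, produced by ranking
  utility?: number;    // sidecar value; deliberately ignored for ordering
}

function sketchOrderEvaluations(evaluations: SketchEvaluation[]): SketchEvaluation[] {
  return evaluations
    .map((evaluation, index) => ({ evaluation, index }))
    // Higher final_score first (assumed); equal scores keep their input order.
    .sort((a, b) =>
      b.evaluation.final_score - a.evaluation.final_score || a.index - b.index)
    .map(({ evaluation }) => evaluation);
}
```

Carrying the original index through the sort makes the tie-break explicit instead of relying on the engine's sort stability, and the input array is left unmodified.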
|
|
57
|
+
|
|
58
|
+
Local demo:
|
|
59
|
+
|
|
60
|
+
```bash
|
|
61
|
+
npm run demo:evaluate:file -- examples/sources.academic.example.json
|
|
62
|
+
npm run demo:evaluate:file -- examples/sources.jobs.example.json
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
The demo reads caller-provided JSON with `profile`, `intent`, and `signals`, then returns decision-first output. It does not fetch URLs, crawl, read folders, deploy REST, or implement Operator mode.
|
|
66
|
+
|
|
67
|
+
### Stable Types
|
|
68
|
+
|
|
69
|
+
- `FreshContext`
|
|
70
|
+
- `AdapterResult`
|
|
71
|
+
- `ExtractOptions`
|
|
72
|
+
- `EnvelopeFormatOptions`
|
|
73
|
+
- `SignalConfidence`
|
|
74
|
+
|
|
75
|
+
These types describe the stable envelope and adapter result contract.
|
|
76
|
+
|
|
77
|
+
## Signal Contract v1
|
|
78
|
+
|
|
79
|
+
Signal Contract v1 is the additive Core shape for a retrieved signal before it is ranked, wrapped, stored, or passed to an agent workflow.
|
|
80
|
+
|
|
81
|
+
Public exports:
|
|
82
|
+
|
|
83
|
+
- `SIGNAL_CONTRACT_VERSION`
|
|
84
|
+
- `normalizeSignal(input, options?)`
|
|
85
|
+
- `FreshContextSignalInput`
|
|
86
|
+
- `FreshContextSignal`
|
|
87
|
+
- `SignalDateConfidence`
|
|
88
|
+
- `SignalContractVersion`
|
|
89
|
+
- `SignalNormalizeOptions`
|
|
90
|
+
|
|
91
|
+
`published_at` is the canonical signal timestamp. `content_date` is accepted as an adapter/envelope compatibility alias. Normalization clears invalid or meaningfully future-dated timestamps, marks failed/error-looking content as `status: "failed"`, clamps `semantic_score` into `0..1`, and records normalization reasons.
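A minimal sketch of these normalization steps, under stated assumptions: the type and function names, the one-day tolerance for "meaningfully future-dated", the `"ok"` status value, the reason strings, and the empty-content failure check are all hypothetical stand-ins for whatever `normalizeSignal` actually does.

```typescript
interface SketchSignalInput {
  content: string;
  published_at?: string | null;
  content_date?: string | null; // compatibility alias for published_at
  semantic_score?: number;
}

interface SketchNormalizedSignal {
  published_at: string | null;
  semantic_score: number;
  status: "ok" | "failed";
  reasons: string[];
}

const SKETCH_FUTURE_TOLERANCE_MS = 24 * 60 * 60 * 1000; // assumed clock-skew allowance

function sketchNormalizeSignal(
  input: SketchSignalInput,
  now: Date = new Date(),
): SketchNormalizedSignal {
  const reasons: string[] = [];
  let publishedAt = input.published_at ?? input.content_date ?? null;
  if (publishedAt !== null) {
    const ts = Date.parse(publishedAt);
    if (Number.isNaN(ts)) {
      publishedAt = null;
      reasons.push("invalid_timestamp");
    } else if (ts - now.getTime() > SKETCH_FUTURE_TOLERANCE_MS) {
      publishedAt = null;
      reasons.push("future_timestamp");
    }
  }
  const rawScore = input.semantic_score ?? 0;
  const semanticScore = Math.min(1, Math.max(0, rawScore)); // clamp into 0..1
  if (semanticScore !== rawScore) reasons.push("semantic_score_clamped");
  const failed = input.content.trim().length === 0; // stand-in failure check
  if (failed) reasons.push("failed_content");
  return {
    published_at: publishedAt,
    semantic_score: semanticScore,
    status: failed ? "failed" : "ok",
    reasons,
  };
}
```

Recording a reason for every correction keeps the normalization auditable: a caller can see why a timestamp was cleared rather than only that it is missing.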
|
|
92
|
+
|
|
93
|
+
See [Signal Contract v1](./SIGNAL_CONTRACT.md).
|
|
94
|
+
|
|
95
|
+
## Source Profiles
|
|
96
|
+
|
|
97
|
+
Source Profiles are early public Core metadata for describing how classes of information age, fail, rank, and explain.
|
|
98
|
+
|
|
99
|
+
Public exports:
|
|
100
|
+
|
|
101
|
+
- `BUILT_IN_SOURCE_PROFILES`
|
|
102
|
+
- `getSourceProfile(profileId)`
|
|
103
|
+
- `listSourceProfiles()`
|
|
104
|
+
- `SourceProfile`
|
|
105
|
+
- `SourceProfileId`
|
|
106
|
+
- `SourceAuthorityHint`
|
|
107
|
+
- `SourceDatePolicy`
|
|
108
|
+
- `SourceFailurePolicy`
|
|
109
|
+
- `SourceSurface`
|
|
110
|
+
|
|
111
|
+
They reframe the 21 MCP tools as reference adapters and source-profile examples instead of the product identity. They do not implement `retrieve(...)`, Operator mode, adapter selection, crawling, local file search, or any host/runtime behavior.
|
|
112
|
+
|
|
113
|
+
## Decision Helper
|
|
114
|
+
|
|
115
|
+
The decision helper translates a Core evaluation result into user-facing action meaning.
|
|
116
|
+
|
|
117
|
+
Public exports:
|
|
118
|
+
|
|
119
|
+
- `interpretEvaluation(evaluation, options?)`
|
|
120
|
+
- `interpretEvaluations(evaluations, options?)`
|
|
121
|
+
- `ContextDecision`
|
|
122
|
+
- `IntentProfileId`
|
|
123
|
+
- `ContextDecisionOptions`
|
|
124
|
+
- `ContextDecisionResult`
|
|
125
|
+
|
|
126
|
+
Supported decisions:
|
|
127
|
+
|
|
128
|
+
- `use_first`
|
|
129
|
+
- `cite_as_primary`
|
|
130
|
+
- `cite_as_supporting`
|
|
131
|
+
- `use_as_background`
|
|
132
|
+
- `needs_verification`
|
|
133
|
+
- `needs_refresh`
|
|
134
|
+
- `watch_only`
|
|
135
|
+
- `exclude`
|
|
136
|
+
|
|
137
|
+
Supported intent profiles:
|
|
138
|
+
|
|
139
|
+
- `citation_check`
|
|
140
|
+
- `student_research`
|
|
141
|
+
- `developer_adoption`
|
|
142
|
+
- `job_search`
|
|
143
|
+
- `market_watch`
|
|
144
|
+
- `business_due_diligence`
|
|
145
|
+
- `medical_literature_triage`
|
|
146
|
+
|
|
147
|
+
The helper consumes existing `CoreSignalEvaluationResult` fields plus optional Source Profile metadata. It does not change `evaluateSignal`, `evaluateSignals`, `rankSignal`, ranking order, freshness math, utility math, envelopes, provenance, or host behavior.
|
|
148
|
+
|
|
149
|
+
FreshContext decisions judge citation readiness, context usefulness, freshness, traceability, and uncertainty. They do not certify truth or provide legal, medical, tax, employment, academic, or investment advice.
|
|
150
|
+
|
|
151
|
+
Demo output will be updated separately so presentation stays separate from Core decision logic.
|
|
152
|
+
|
|
153
|
+
## Public Ranking Primitives
|
|
154
|
+
|
|
155
|
+
The ranking primitives are public, but consumers should treat their score scales carefully:
|
|
156
|
+
|
|
157
|
+
- `rankSignal(signal, options?)`
|
|
158
|
+
- `rankSignals(signals, options?)`
|
|
159
|
+
- `explainSignal(rankedSignalLike)`
|
|
160
|
+
- `FreshSignal`
|
|
161
|
+
- `RankedSignal`
|
|
162
|
+
- `RankOptions`
|
|
163
|
+
|
|
164
|
+
Score scales:
|
|
165
|
+
|
|
166
|
+
- `semantic_score`: normalized `0..1`
|
|
167
|
+
- `final_score`: normalized `0..1`
|
|
168
|
+
- `freshness_score`: FreshContext freshness score `0..100`, or `null`
|
|
169
|
+
|
|
170
|
+
Ranking combines semantic relevance and freshness into a deterministic order. It does not own retrieval, embedding, vector search, storage, or host-specific scoring policy.
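Because `freshness_score` is on a `0..100` scale while `semantic_score` and `final_score` are on `0..1`, any combination has to rescale first. The sketch below shows one way that could look, assuming a simple weighted blend; the weight, the treatment of `null` freshness as zero, and all names are illustrative assumptions rather than the actual `rankSignal` formula.

```typescript
interface SketchRankInput {
  semantic_score: number;         // 0..1
  freshness_score: number | null; // 0..100, or null when the date is untrusted
}

function sketchFinalScore(signal: SketchRankInput, freshnessWeight = 0.3): number {
  const clamp01 = (value: number): number => Math.min(1, Math.max(0, value));
  const semantic = clamp01(signal.semantic_score);
  // Rescale freshness from 0..100 to 0..1; unknown freshness contributes nothing (assumed).
  const freshness =
    signal.freshness_score === null ? 0 : clamp01(signal.freshness_score / 100);
  return (1 - freshnessWeight) * semantic + freshnessWeight * freshness;
}
```

With both inputs clamped to `0..1` and a weight between 0 and 1, the result stays inside `0..1`, which matches the documented `final_score` scale.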
|
|
171
|
+
|
|
172
|
+
## Experimental Utility Primitive
|
|
173
|
+
|
|
174
|
+
The context-conditioned utility primitive is pure and tested, but it is not production-wired into MCP ranking, Worker feeds, Store scoring, or runtime behavior.
|
|
175
|
+
|
|
176
|
+
Experimental exports:
|
|
177
|
+
|
|
178
|
+
- `calculateContextUtility`
|
|
179
|
+
- `ContextUtilityStatus`
|
|
180
|
+
- `ContextUtilityInput`
|
|
181
|
+
- `ContextUtilityResult`
|
|
182
|
+
|
|
183
|
+
These exports are pure Core math. Their only integration today is inside `evaluateSignal`, which returns the utility result as sidecar output; they remain unwired from MCP ranking, Worker feeds, Store scoring, and runtime behavior.
|
|
184
|
+
|
|
185
|
+
## Provenance Helpers
|
|
186
|
+
|
|
187
|
+
Ha-Pri v2 is available as pure Core helper functionality:
|
|
188
|
+
|
|
189
|
+
- `canonicalizeHaPriContent`
|
|
190
|
+
- `sha256Hex`
|
|
191
|
+
- `calculateHaPriV2`
|
|
192
|
+
- `verifyHaPriV2`
|
|
193
|
+
- `HaPriV2Input`
|
|
194
|
+
- `HaPriV2Result`
|
|
195
|
+
- `HaPriV2VerificationResult`
|
|
196
|
+
|
|
197
|
+
`evaluateSignal` can optionally prepare Ha-Pri v2 material when `includeProvenance` is set and required input material is present. Core does not persist provenance, add D1 columns, verify rows on read, reject rows, or replace Worker Ha-Pri v1 behavior.
|
|
198
|
+
|
|
199
|
+
## Internal, Policy, and Compatibility Exports
|
|
200
|
+
|
|
201
|
+
- `clampScore` is an internal ranking helper. It is currently exported for tests and utility use, but it should not be presented as a primary buyer-facing API.
|
|
202
|
+
- `LAMBDA` is the current policy constant table used by freshness scoring. It documents the reference decay policy, but it is not a buyer-facing tuning API.
|
|
203
|
+
|
|
204
|
+
Compatibility lanes should remain:
|
|
205
|
+
|
|
206
|
+
- `src/types.ts` re-exports legacy adapter types from Core.
|
|
207
|
+
- `src/tools/freshnessStamp.ts` re-exports envelope helpers for older MCP/npm import paths.
|
|
208
|
+
|
|
209
|
+
These lanes protect existing imports while Core becomes the center. Do not remove them until downstream imports have been migrated intentionally.
|
|
210
|
+
|
|
211
|
+
## What Core Does Not Own
|
|
212
|
+
|
|
213
|
+
Core does not own:
|
|
214
|
+
|
|
215
|
+
- MCP transport
|
|
216
|
+
- Cloudflare runtime behavior
|
|
217
|
+
- KV cache policy
|
|
218
|
+
- Cache metadata injection
|
|
219
|
+
- D1, feed, or cron behavior
|
|
220
|
+
- Store/feed scoring and provenance persistence
|
|
221
|
+
- Hosted dashboard, API, deployment, or runtime concerns
|
|
222
|
+
|
|
223
|
+
Hosts may wrap Core outputs with their own transport, cache, session, rate-limit, or persistence metadata, but they should not fork the Core envelope and freshness contract without an explicit compatibility reason.
|
|
224
|
+
|
package/docs/DEPENDENCY_DILIGENCE.md
ADDED
|
@@ -0,0 +1,63 @@
|
|
|
1
|
+
# FreshContext Dependency Diligence Notes
|
|
2
|
+
|
|
3
|
+
This document records dependency and license diligence notes from the Trust L4/L5 cleanup. It is not legal advice and does not replace professional review before external evaluation, distribution, or formal diligence.
|
|
4
|
+
|
|
5
|
+
## Current Audit Status
|
|
6
|
+
|
|
7
|
+
As of Pass 8-AB:
|
|
8
|
+
|
|
9
|
+
- `npm audit --omit=dev`: clean.
|
|
10
|
+
- `npm audit`: clean.
|
|
11
|
+
- The published MCP npm package excludes the Apify Actor entrypoint and does not install Apify/Crawlee in normal consumer installs.
|
|
12
|
+
- The previous moderate `qs` and `ws` advisories were resolved with narrow transitive overrides in the source checkout.
|
|
13
|
+
- No package version change was made.
|
|
14
|
+
|
|
15
|
+
## Resolved Advisories
|
|
16
|
+
|
|
17
|
+
`qs`
|
|
18
|
+
|
|
19
|
+
- Previous severity: moderate.
|
|
20
|
+
- Path: `@modelcontextprotocol/sdk -> express/body-parser -> qs`.
|
|
21
|
+
- Resolution: pinned through npm `overrides` to `qs@6.15.2`.
|
|
22
|
+
|
|
23
|
+
`ws`
|
|
24
|
+
|
|
25
|
+
- Previous severity: moderate.
|
|
26
|
+
- Historical source-checkout path: `apify -> ws`.
|
|
27
|
+
- Resolution: pinned through npm `overrides` to `ws@8.20.1` for source-checkout Apify Actor workflows.
|
|
28
|
+
|
|
29
|
+
`file-type`
|
|
30
|
+
|
|
31
|
+
- Previous severity: moderate in fresh consumer installs.
|
|
32
|
+
- Historical consumer path: `apify -> @crawlee/utils -> file-type`.
|
|
33
|
+
- Resolution: Apify/Crawlee were removed from the normal published MCP package dependency surface. Apify remains a source-checkout / separate-actor concern.
|
|
34
|
+
|
|
35
|
+
## License Inventory Notes
|
|
36
|
+
|
|
37
|
+
The Trust L4 license inventory was broadly permissive, including MIT, Apache-2.0, BSD variants, ISC, 0BSD, BlueOak-1.0.0, and similar permissive variants.
|
|
38
|
+
|
|
39
|
+
No GPL, AGPL, LGPL, MPL, EPL, CDDL, or similar copyleft licenses were reported in the Trust L4 scan.
|
|
40
|
+
|
|
41
|
+
`map-stream@0.1.0`
|
|
42
|
+
|
|
43
|
+
- Scanner result: `UNKNOWN`.
|
|
44
|
+
- Path observed during L4: transitive through source-checkout `apify` / Crawlee-related dependencies.
|
|
45
|
+
- Diligence note: package metadata appears incomplete, but the installed package includes an MIT-style license file.
|
|
46
|
+
- Action: keep as a source-checkout / actor-packaging diligence note and recheck before external diligence or Apify Actor distribution.
|
|
47
|
+
|
|
48
|
+
`caniuse-lite`
|
|
49
|
+
|
|
50
|
+
- Scanner result: `CC-BY-4.0`.
|
|
51
|
+
- Diligence note: preserve and review attribution requirements before external diligence, bundled distribution, or a formal review package.
|
|
52
|
+
|
|
53
|
+
## Before External Diligence or Distribution
|
|
54
|
+
|
|
55
|
+
- Rerun `npm audit --omit=dev`.
|
|
56
|
+
- Rerun `npm audit`.
|
|
57
|
+
- Rerun dependency license inventory.
|
|
58
|
+
- Review scanner-unknown packages.
|
|
59
|
+
- Review `caniuse-lite` attribution requirements.
|
|
60
|
+
- Generate an SBOM if requested by an evaluator, reviewer, or downstream distributor.
|
|
61
|
+
- Review dependency and license posture with qualified counsel when external distribution or formal diligence requires it.
|
|
62
|
+
|
|
63
|
+
Do not treat this document as a legal conclusion.
|
package/docs/HA_PRI_V2_DESIGN.md
ADDED
|
@@ -0,0 +1,279 @@
|
|
|
1
|
+
# Ha-Pri v2 Design
|
|
2
|
+
|
|
3
|
+
Status: design + pure Core helper
|
|
4
|
+
Phase: Math Spine Phase 3-A / 3-B
|
|
5
|
+
Runtime impact: none
|
|
6
|
+
|
|
7
|
+
## Purpose
|
|
8
|
+
|
|
9
|
+
Ha-Pri v2 is an additive provenance-hardening model for FreshContext Store/Ledger rows.
|
|
10
|
+
|
|
11
|
+
The goal is to keep Ha-Pri v1 readable while designing a stronger future signature that binds a row to canonical content, semantic identity, source metadata, timestamps, and engine version.
|
|
12
|
+
|
|
13
|
+
Phase 3-B adds pure Core helper functions and deterministic tests for the v2 model. Phase 3-C adds `examples/ha-pri-v2-example.ts`, a deterministic developer fixture showing `calculateHaPriV2` and `verifyHaPriV2` returning valid, invalid, and unknown verification states. Production Store wiring remains future work. This document does not change the D1 schema, change Worker write paths, migrate old rows, add HMAC secrets, or alter production scoring.
|
|
14
|
+
|
|
15
|
+
## Current Ha-Pri v1 Audit
|
|
16
|
+
|
|
17
|
+
Ha-Pri v1 is implemented today as a provenance stamp and audit reference, not yet hard tamper enforcement.
|
|
18
|
+
|
|
19
|
+
### Where v1 Lives
|
|
20
|
+
|
|
21
|
+
Current implementation points:
|
|
22
|
+
|
|
23
|
+
- `worker/src/intelligence.ts`
|
|
24
|
+
- `PROVENANCE_SALT = "FRESHCONTEXT_DAR_V1"`
|
|
25
|
+
- `generateAuditSig(resultId, contentHash)`
|
|
26
|
+
- `scoreSignal(...)` computes `ha_pri_sig`
|
|
27
|
+
- `worker/src/worker.ts`
|
|
28
|
+
- migration adds `ha_pri_sig TEXT`
|
|
29
|
+
- cron write path stores `ha_pri_sig` in `scrape_results`
|
|
30
|
+
- `/v1/intel/feed/:profile_id` returns `ha_pri_sig` in `intelligence_stamps`
|
|
31
|
+
- `tests/mathSpine.test.ts`
|
|
32
|
+
- checks that `generateAuditSig` matches the documented v1 formula
|
|
33
|
+
|
|
34
|
+
### v1 Formula
|
|
35
|
+
|
|
36
|
+
```text
|
|
37
|
+
ha_pri_sig = SHA-256(
|
|
38
|
+
result_id + ":" +
|
|
39
|
+
content_hash + ":" +
|
|
40
|
+
"FRESHCONTEXT_DAR_V1"
|
|
41
|
+
)
|
|
42
|
+
```
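The v1 formula can be reproduced in a few lines. This sketch assumes Node's `node:crypto` module and a lowercase hex digest; the Worker implementation may use a different crypto API or output encoding, and the function name `sketchAuditSigV1` is hypothetical.

```typescript
import { createHash } from "node:crypto";

const SKETCH_PROVENANCE_SALT = "FRESHCONTEXT_DAR_V1";

// SHA-256 over result_id + ":" + content_hash + ":" + salt, hex-encoded (encoding assumed).
function sketchAuditSigV1(resultId: string, contentHash: string): string {
  return createHash("sha256")
    .update(`${resultId}:${contentHash}:${SKETCH_PROVENANCE_SALT}`, "utf8")
    .digest("hex");
}
```

Every input here is either stored with the row or a public constant, so anyone holding the row fields can recompute the value.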
|
|
43
|
+
|
|
44
|
+
### What v1 Binds
|
|
45
|
+
|
|
46
|
+
Ha-Pri v1 binds:
|
|
47
|
+
|
|
48
|
+
- the generated `result_id`
|
|
49
|
+
- the current `content_hash` argument passed into `scoreSignal`
|
|
50
|
+
- the engine/version salt string `FRESHCONTEXT_DAR_V1`
|
|
51
|
+
|
|
52
|
+
In the current Worker cron path, `content_hash` is the value named `result_hash`, produced by `simpleHash(raw)`.
|
|
53
|
+
|
|
54
|
+
### Current Hash Input
|
|
55
|
+
|
|
56
|
+
The current `result_hash` is a small rolling hash:
|
|
57
|
+
|
|
58
|
+
```ts
|
|
59
|
+
let h = 0;
|
|
60
|
+
for (let i = 0; i < str.length; i++) h = Math.imul(31, h) + str.charCodeAt(i) | 0;
|
|
61
|
+
return Math.abs(h).toString(36);
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
This is useful for cheap change detection, but it is not a cryptographic content digest.
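To make the limitation concrete, the snippet can be wrapped in a function and fed two short strings that collide. The wrapper name `simpleHashSketch` and its signature are assumptions; the body is the loop shown above with the operator precedence made explicit.

```typescript
function simpleHashSketch(str: string): string {
  let h = 0;
  // Same arithmetic as the snippet above: multiply by 31, add the char code, wrap to int32.
  for (let i = 0; i < str.length; i++) h = (Math.imul(31, h) + str.charCodeAt(i)) | 0;
  return Math.abs(h).toString(36);
}

// "Aa" and "BB" both accumulate to 2112 (65*31 + 97 and 66*31 + 66).
const sketchCollision = simpleHashSketch("Aa") === simpleHashSketch("BB");
console.log(sketchCollision);
```

Two different inputs producing the same `result_hash` means a v1 signature built on that hash cannot distinguish them either.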
|
|
65
|
+
|
|
66
|
+
### Storage and Output
|
|
67
|
+
|
|
68
|
+
Ha-Pri v1 is stored in D1:
|
|
69
|
+
|
|
70
|
+
- `scrape_results.ha_pri_sig`
|
|
71
|
+
|
|
72
|
+
It is returned through the live intelligence feed:
|
|
73
|
+
|
|
74
|
+
- `signals[].intelligence_stamps.ha_pri_sig`
|
|
75
|
+
|
|
76
|
+
### Verification Status
|
|
77
|
+
|
|
78
|
+
Current v1 behavior:
|
|
79
|
+
|
|
80
|
+
- generated on write: yes
|
|
81
|
+
- stored in D1: yes
|
|
82
|
+
- returned in feed/API output: yes
|
|
83
|
+
- recomputed on read: no
|
|
84
|
+
- used to reject tampered rows: no
|
|
85
|
+
- tied to canonical raw content SHA-256: no
|
|
86
|
+
|
|
87
|
+
So Ha-Pri v1 works as a provenance stamp and audit reference. It does not yet work as hard tamper enforcement.
|
|
88
|
+
|
|
89
|
+
## Weaknesses in v1
|
|
90
|
+
|
|
91
|
+
1. The signature uses a weak content-hash input.
|
|
92
|
+
|
|
93
|
+
`ha_pri_sig` is SHA-256, but it currently binds to the rolling `result_hash`, not to canonical raw content bytes. The v1 signature inherits the collision risk and ambiguity of the weaker input.
|
|
94
|
+
|
|
95
|
+
2. No read-time verification exists.
|
|
96
|
+
|
|
97
|
+
Feed and debug reads return the stored signature, but they do not recompute it and compare stored vs recomputed values.
|
|
98
|
+
|
|
99
|
+
3. No canonicalization contract exists for signed content.
|
|
100
|
+
|
|
101
|
+
The current signature signs a hash value, not a documented canonical representation of the row content.
|
|
102
|
+
|
|
103
|
+
4. v1 does not bind all fields needed for provenance.
|
|
104
|
+
|
|
105
|
+
It does not directly bind adapter, published timestamp, scraped timestamp, semantic fingerprint, or a schema marker beyond the fixed salt.
|
|
106
|
+
|
|
107
|
+
5. v1 is not authentication.
|
|
108
|
+
|
|
109
|
+
The salt is public. Anyone with row fields can compute the v1 signature. That is acceptable for a provenance reference, but it should not be presented as proof of origin from a private signing authority.
|
|
110
|
+
|
|
111
|
+
## Ha-Pri v2 Design Goals
|
|
112
|
+
|
|
113
|
+
Ha-Pri v2 should be:
|
|
114
|
+
|
|
115
|
+
- additive, not a breaking migration
|
|
116
|
+
- deterministic
|
|
117
|
+
- recomputable
|
|
118
|
+
- explicit about canonicalization
|
|
119
|
+
- stronger than v1 for content integrity
|
|
120
|
+
- safe to run without secrets
|
|
121
|
+
- compatible with future HMAC signing, without requiring it now
|
|
122
|
+
- clear about verification status: valid, invalid, or unknown
|
|
123
|
+
|
|
124
|
+
## Proposed v2 Fields
|
|
125
|
+
|
|
126
|
+
Future Store/Ledger rows may add:
|
|
127
|
+
|
|
128
|
+
```text
|
|
129
|
+
canonical_content_sha256 TEXT
|
|
130
|
+
semantic_fingerprint_sha256 TEXT
|
|
131
|
+
ha_pri_sig_v2 TEXT
|
|
132
|
+
ha_pri_v2_status TEXT
|
|
133
|
+
ha_pri_v2_checked_at TEXT
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
These are design-level names only. No schema change is made in this phase.
|
|
137
|
+
|
|
138
|
+
### canonical_content_sha256
|
|
139
|
+
|
|
140
|
+
`canonical_content_sha256` is:
|
|
141
|
+
|
|
142
|
+
```text
|
|
143
|
+
SHA-256(canonical raw content)
|
|
144
|
+
```
|
|
145
|
+
|
|
146
|
+
It binds the actual content after deterministic normalization.
|
|
147
|
+
|
|
148
|
+
### semantic_fingerprint_sha256
|
|
149
|
+
|
|
150
|
+
`semantic_fingerprint_sha256` is:
|
|
151
|
+
|
|
152
|
+
```text
|
|
153
|
+
SHA-256(normalized title + canonical URL + publication date)
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
It is a full SHA-256 version of the current shorter semantic fingerprint idea.
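A minimal sketch of the semantic fingerprint, assuming Node's `node:crypto`, a hex digest, and a newline joiner between the three fields. The title normalization (trim, collapse whitespace, lowercase), the `"null"` stand-in for a missing date, and all names are illustrative assumptions; the document does not specify them.

```typescript
import { createHash } from "node:crypto";

const sha256HexSketch = (text: string): string =>
  createHash("sha256").update(text, "utf8").digest("hex");

function sketchSemanticFingerprint(
  title: string,
  canonicalUrl: string,
  publishedAt: string | null,
): string {
  // Assumed normalization: trim, collapse runs of whitespace, lowercase.
  const normalizedTitle = title.trim().replace(/\s+/g, " ").toLowerCase();
  // Assumed joiner: one field per line, with "null" for a missing date.
  return sha256HexSketch([normalizedTitle, canonicalUrl, publishedAt ?? "null"].join("\n"));
}
```

Using an explicit separator avoids the ambiguity of plain concatenation, where different field splits could produce the same input string.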
|
|
157
|
+
|
|
158
|
+
## Canonicalization Rules
|
|
159
|
+
|
|
160
|
+
All canonicalization should be deterministic.
|
|
161
|
+
|
|
162
|
+
Recommended rules:
|
|
163
|
+
|
|
164
|
+
1. Use UTF-8.
|
|
165
|
+
2. Normalize line endings to `\n`.
|
|
166
|
+
3. Trim trailing whitespace on each line.
|
|
167
|
+
4. Preserve meaningful internal whitespace.
|
|
168
|
+
5. Normalize null or missing optional fields to the literal string `"null"`.
|
|
169
|
+
6. Use stable field order.
|
|
170
|
+
7. Use ISO-8601 timestamps where available.
|
|
171
|
+
8. Do not include fields whose values change during read-time verification unless they are explicitly part of the signed record.
|
|
172
|
+
9. Version the canonicalization contract.
|
|
173
|
+
|
|
174
|
+
For future implementation, canonicalization should live in a pure helper with deterministic fixtures.
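Rules 2 through 5 can be sketched as a small pure function. This is an illustration under assumptions, not the package's `canonicalizeHaPriContent`: the function name, the version string, and the choice to trim only spaces and tabs are hypothetical.

```typescript
// Rule 9: version the contract. The value is a placeholder.
const SKETCH_CANONICALIZATION_VERSION = "sketch-v1";

function sketchCanonicalizeContent(raw: string | null | undefined): string {
  if (raw == null) return "null"; // rule 5: explicit sentinel for missing values
  return raw
    .replace(/\r\n?/g, "\n") // rule 2: CRLF and lone CR become LF
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, "")) // rule 3: trailing whitespace only
    .join("\n"); // rule 4: internal whitespace is untouched
}
```

Rule 1 is applied at hashing time, when the canonical string is encoded as UTF-8 bytes before being digested.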
|
|
175
|
+
|
|
176
|
+
## Proposed v2 Formula
|
|
177
|
+
|
|
178
|
+
```text
|
|
179
|
+
ha_pri_sig_v2 = SHA-256(signingPayload)
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
Where `signingPayload` is exactly:
|
|
183
|
+
|
|
184
|
+
```text
|
|
185
|
+
FRESHCONTEXT_HA_PRI_V2
|
|
186
|
+
result_id=<resultId>
|
|
187
|
+
canonical_content_sha256=<canonicalContentSha256>
|
|
188
|
+
semantic_fingerprint_sha256=<semanticFingerprintSha256>
|
|
189
|
+
adapter=<adapter>
|
|
190
|
+
published_at=<publishedAt-or-null>
|
|
191
|
+
retrieved_at=<retrievedAt-or-null>
|
|
192
|
+
engine_version=<engineVersion>
|
|
193
|
+
```
|
|
194
|
+
|
|
195
|
+
### Field Meaning
|
|
196
|
+
|
|
197
|
+
- `FRESHCONTEXT_HA_PRI_V2`: schema/version string
|
|
198
|
+
- `result_id`: stable row identifier
|
|
199
|
+
- `canonical_content_sha256`: cryptographic digest of canonical raw content
|
|
200
|
+
- `semantic_fingerprint_sha256`: cryptographic digest of semantic identity fields
|
|
201
|
+
- `adapter`: source adapter name
|
|
202
|
+
- `published_at`: source/content publication timestamp, or explicit null sentinel
|
|
203
|
+
- `retrieved_at`: retrieval or collection timestamp, or explicit null sentinel
|
|
204
|
+
- `engine_version`: scoring/signature engine version
|
|
205
|
+
|
|
206
|
+
Store/Ledger systems may map `scraped_at` to the v2 `retrieved_at` signing field.
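The payload layout above can be sketched as a fixed-order list of lines that is then hashed. The assumptions here are Node's `node:crypto`, a hex digest, `\n` as the line separator with no trailing newline, and the literal `null` as the sentinel; the type and function names are hypothetical and do not describe `calculateHaPriV2`.

```typescript
import { createHash } from "node:crypto";

interface SketchHaPriV2Input {
  resultId: string;
  canonicalContentSha256: string;
  semanticFingerprintSha256: string;
  adapter: string;
  publishedAt: string | null;
  retrievedAt: string | null;
  engineVersion: string;
}

function sketchSigningPayload(input: SketchHaPriV2Input): string {
  // Field order is fixed; reordering would change the signature.
  return [
    "FRESHCONTEXT_HA_PRI_V2",
    `result_id=${input.resultId}`,
    `canonical_content_sha256=${input.canonicalContentSha256}`,
    `semantic_fingerprint_sha256=${input.semanticFingerprintSha256}`,
    `adapter=${input.adapter}`,
    `published_at=${input.publishedAt ?? "null"}`,
    `retrieved_at=${input.retrievedAt ?? "null"}`,
    `engine_version=${input.engineVersion}`,
  ].join("\n");
}

function sketchHaPriSigV2(input: SketchHaPriV2Input): string {
  return createHash("sha256").update(sketchSigningPayload(input), "utf8").digest("hex");
}
```

Prefixing each value with its field name keeps the payload unambiguous even when a value is empty or equal to the sentinel.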
|
|
207
|
+
|
|
208
|
+
## Verification Model
|
|
209
|
+
|
|
210
|
+
Future read or audit verification should:
|
|
211
|
+
|
|
212
|
+
1. Load the stored row.
|
|
213
|
+
2. Recompute canonical raw content from stored content fields.
|
|
214
|
+
3. Recompute `canonical_content_sha256`.
|
|
215
|
+
4. Recompute semantic identity fields.
|
|
216
|
+
5. Recompute `semantic_fingerprint_sha256`.
|
|
217
|
+
6. Recompute `ha_pri_sig_v2` from the canonical field sequence.
|
|
218
|
+
7. Compare stored vs recomputed values.
|
|
219
|
+
8. Mark the result:
|
|
220
|
+
- `valid`
|
|
221
|
+
- `invalid`
|
|
222
|
+
- `unknown`
|
|
223
|
+
9. Surface verification status to internal/debug paths first.
|
|
224
|
+
10. Avoid silently trusting unverifiable rows.
|
|
225
|
+
|
|
226
|
+
Verification must not mutate old rows during read unless a dedicated migration explicitly allows it.
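The three-state outcome can be sketched as a pure comparison that never writes to the row. The row shape is a simplification: `signingPayload` stands in for the payload that steps 2 through 6 would rebuild from stored fields, and every name here is hypothetical rather than the signature of `verifyHaPriV2`.

```typescript
import { createHash } from "node:crypto";

type SketchVerificationStatus = "valid" | "invalid" | "unknown";

interface SketchStoredRow {
  signingPayload: string | null; // hypothetical: payload rebuilt from stored fields
  ha_pri_sig_v2: string | null;  // stored signature; null on rows written before v2
}

function sketchVerifyRow(row: Readonly<SketchStoredRow>): SketchVerificationStatus {
  // Rows missing v2 material cannot be checked; report unknown, not invalid.
  if (row.ha_pri_sig_v2 === null || row.signingPayload === null) return "unknown";
  const recomputed = createHash("sha256").update(row.signingPayload, "utf8").digest("hex");
  return recomputed === row.ha_pri_sig_v2 ? "valid" : "invalid";
}
```

Returning a status instead of throwing lets callers decide how to surface `invalid` and `unknown` rows, which fits the guidance to expose verification on internal or debug paths first.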
|
|
227
|
+
|
|
228
|
+
## Backward Compatibility
|
|
229
|
+
|
|
230
|
+
Ha-Pri v2 should not remove or reinterpret v1.
|
|
231
|
+
|
|
232
|
+
Rules:
|
|
233
|
+
|
|
234
|
+
- Keep `ha_pri_sig` readable.
|
|
235
|
+
- Add `ha_pri_sig_v2` separately.
|
|
236
|
+
- Treat old rows without v2 fields as `unknown`, not invalid.
|
|
237
|
+
- Do not reject old rows solely because they lack v2.
|
|
238
|
+
- Preserve v1 formula tests.
|
|
239
|
+
- Add v2 fixtures before any production write path changes.
|
|
240
|
+
|
|
241
|
+
## Future HMAC Boundary
|
|
242
|
+
|
|
243
|
+
HMAC-SHA256 may be useful later if FreshContext needs origin authentication rather than only tamper evidence.
|
|
244
|
+
|
|
245
|
+
That would require:
|
|
246
|
+
|
|
247
|
+
- a private deployment key
|
|
248
|
+
- secret rotation
|
|
249
|
+
- key identifiers
|
|
250
|
+
- verification policy for old key versions
|
|
251
|
+
- clear trust-boundary documentation
|
|
252
|
+
|
|
253
|
+
This phase does not add HMAC, secrets, or key management.
|
|
254
|
+
|
|
255
|
+
## Suggested Future Patch Sequence
|
|
256
|
+
|
|
257
|
+
1. Add pure canonicalization helpers and deterministic tests.
|
|
258
|
+
2. Add pure v2 signature helper and fixtures.
|
|
259
|
+
3. Add optional verification helper that returns `valid`, `invalid`, or `unknown`.
|
|
260
|
+
4. Add D1 columns in a separate schema phase.
|
|
261
|
+
5. Write v2 fields for new rows only.
|
|
262
|
+
6. Expose verification status on debug/internal endpoints first.
|
|
263
|
+
7. Decide later whether public feed output should include v2 status.
|
|
264
|
+
|
|
265
|
+
## Non-Goals
|
|
266
|
+
|
|
267
|
+
This design does not:
|
|
268
|
+
|
|
269
|
+
- change runtime behavior
|
|
270
|
+
- change scoring
|
|
271
|
+
- change MCP tool schemas
|
|
272
|
+
- change D1 schema
|
|
273
|
+
- change Worker write paths
|
|
274
|
+
- migrate old rows
|
|
275
|
+
- add HMAC
|
|
276
|
+
- add secrets
|
|
277
|
+
- reject rows in production
|
|
278
|
+
- publish npm
|
|
279
|
+
- deploy the Worker
|