freshcontext-mcp 0.3.17 → 0.3.19

Files changed (42)
  1. package/LICENSE +21 -0
  2. package/NOTICE.md +17 -0
  3. package/README.md +459 -296
  4. package/SECURITY.md +34 -0
  5. package/TRADEMARKS.md +9 -0
  6. package/dist/adapters/arxiv.js +92 -48
  7. package/dist/adapters/hackernews.js +16 -16
  8. package/dist/adapters/registry.js +232 -0
  9. package/dist/core/decay.js +61 -0
  10. package/dist/core/decision.js +176 -0
  11. package/dist/core/envelope.js +59 -0
  12. package/dist/core/explain.js +28 -0
  13. package/dist/core/guards.js +17 -0
  14. package/dist/core/index.js +11 -0
  15. package/dist/core/pipeline.js +101 -0
  16. package/dist/core/provenance.js +73 -0
  17. package/dist/core/rank.js +84 -0
  18. package/dist/core/signal.js +101 -0
  19. package/dist/core/sourceProfiles.js +126 -0
  20. package/dist/core/types.js +1 -0
  21. package/dist/core/utility.js +90 -0
  22. package/dist/rest/handler.js +126 -0
  23. package/dist/server.js +40 -2
  24. package/dist/tools/evaluateContext.js +127 -0
  25. package/dist/tools/freshnessStamp.js +1 -137
  26. package/dist/types.js +0 -1
  27. package/docs/API_DESIGN.md +434 -0
  28. package/docs/CODEX_MCP_USAGE.md +116 -0
  29. package/docs/CORE_API.md +226 -0
  30. package/docs/CORE_MCP_BOUNDARY.md +106 -0
  31. package/docs/DEPENDENCY_DILIGENCE.md +63 -0
  32. package/docs/FUTURE_LANES.md +173 -0
  33. package/docs/HA_PRI_V2_DESIGN.md +279 -0
  34. package/docs/RELEASE_INTEGRITY.md +55 -0
  35. package/docs/RELEASE_NOTES.md +55 -0
  36. package/docs/SIGNAL_CONTRACT.md +89 -0
  37. package/docs/SOURCE_PROFILES.md +427 -0
  38. package/freshcontext.schema.json +103 -103
  39. package/package-script-guard.mjs +141 -0
  40. package/package.json +94 -59
  41. package/server.json +27 -28
  42. package/dist/apify.js +0 -133
@@ -0,0 +1,279 @@
1
+ # Ha-Pri v2 Design
2
+
3
+ Status: design + pure Core helper
4
+ Phase: Math Spine Phase 3-A / 3-B
5
+ Runtime impact: none
6
+
7
+ ## Purpose
8
+
9
+ Ha-Pri v2 is an additive provenance-hardening model for FreshContext Store/Ledger rows.
10
+
11
+ The goal is to keep Ha-Pri v1 readable while designing a stronger future signature that binds a row to canonical content, semantic identity, source metadata, timestamps, and engine version.
12
+
13
+ Phase 3-B adds pure Core helper functions and deterministic tests for the v2 model. Phase 3-C adds `examples/ha-pri-v2-example.ts`, a deterministic developer fixture showing `calculateHaPriV2` and `verifyHaPriV2` returning valid, invalid, and unknown verification states. Production Store wiring remains future work. This document does not change the D1 schema, change Worker write paths, migrate old rows, add HMAC secrets, or alter production scoring.
14
+
15
+ ## Current Ha-Pri v1 Audit
16
+
17
+ Ha-Pri v1 is implemented today as a provenance stamp and audit reference, not yet hard tamper enforcement.
18
+
19
+ ### Where v1 Lives
20
+
21
+ Current implementation points:
22
+
23
+ - `worker/src/intelligence.ts`
24
+ - `PROVENANCE_SALT = "FRESHCONTEXT_DAR_V1"`
25
+ - `generateAuditSig(resultId, contentHash)`
26
+ - `scoreSignal(...)` computes `ha_pri_sig`
27
+ - `worker/src/worker.ts`
28
+ - migration adds `ha_pri_sig TEXT`
29
+ - cron write path stores `ha_pri_sig` in `scrape_results`
30
+ - `/v1/intel/feed/:profile_id` returns `ha_pri_sig` in `intelligence_stamps`
31
+ - `tests/mathSpine.test.ts`
32
+ - checks that `generateAuditSig` matches the documented v1 formula
33
+
34
+ ### v1 Formula
35
+
36
+ ```text
37
+ ha_pri_sig = SHA-256(
38
+ result_id + ":" +
39
+ content_hash + ":" +
40
+ "FRESHCONTEXT_DAR_V1"
41
+ )
42
+ ```
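The v1 formula above can be sketched in TypeScript as follows. This is a minimal illustration, not the code in `worker/src/intelligence.ts`: the function name `sketchAuditSigV1`, the use of Node's `node:crypto`, and the lowercase hex output encoding are assumptions, since the diff does not show the body of `generateAuditSig`.

```typescript
import { createHash } from "node:crypto";

// Salt string taken from the document.
const PROVENANCE_SALT = "FRESHCONTEXT_DAR_V1";

// Hypothetical helper: joins the three inputs with ":" and hashes them with SHA-256.
// Hex output is an assumption; the real encoding is not shown in the source.
function sketchAuditSigV1(resultId: string, contentHash: string): string {
  return createHash("sha256")
    .update(`${resultId}:${contentHash}:${PROVENANCE_SALT}`, "utf8")
    .digest("hex");
}

// Example with made-up inputs: the same inputs always give the same signature.
const exampleSigV1 = sketchAuditSigV1("result-123", "1mo");
console.log(exampleSigV1.length); // 64 hex characters
```

Because the salt is a public constant, anyone holding `result_id` and `content_hash` can reproduce this value, which is why the document treats v1 as a reference stamp rather than authentication.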
43
+
44
+ ### What v1 Binds
45
+
46
+ Ha-Pri v1 binds:
47
+
48
+ - the generated `result_id`
49
+ - the current `content_hash` argument passed into `scoreSignal`
50
+ - the engine/version salt string `FRESHCONTEXT_DAR_V1`
51
+
52
+ In the current Worker cron path, `content_hash` is the value named `result_hash`, produced by `simpleHash(raw)`.
53
+
54
+ ### Current Hash Input
55
+
56
+ The current `result_hash` is a small rolling hash:
57
+
58
+ ```ts
59
+ let h = 0;
60
+ for (let i = 0; i < str.length; i++) h = Math.imul(31, h) + str.charCodeAt(i) | 0;
61
+ return Math.abs(h).toString(36);
62
+ ```
63
+
64
+ This is useful for cheap change detection, but it is not a cryptographic content digest.
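To make the collision risk concrete, the fragment above can be wrapped in a function and run on two short strings. The wrapper signature is an assumption (the document only names `simpleHash(raw)`), and parentheses are added around the sum for readability without changing behavior.

```typescript
// Assumed wrapper around the rolling-hash fragment quoted above.
function simpleHash(str: string): string {
  let h = 0;
  for (let i = 0; i < str.length; i++) {
    // Multiply by 31, add the UTF-16 code unit, truncate to a signed 32-bit integer.
    h = (Math.imul(31, h) + str.charCodeAt(i)) | 0;
  }
  return Math.abs(h).toString(36);
}

// "Aa" and "BB" both reduce to 2112 under a 31-multiplier rolling hash.
const collisionA = simpleHash("Aa");
const collisionB = simpleHash("BB");
console.log(collisionA === collisionB); // true
```

Two different inputs producing the same `result_hash` also produce the same v1 signature for a given `result_id`, which is the inherited weakness the next section describes.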
65
+
66
+ ### Storage and Output
67
+
68
+ Ha-Pri v1 is stored in D1:
69
+
70
+ - `scrape_results.ha_pri_sig`
71
+
72
+ It is returned through the live intelligence feed:
73
+
74
+ - `signals[].intelligence_stamps.ha_pri_sig`
75
+
76
+ ### Verification Status
77
+
78
+ Current v1 behavior:
79
+
80
+ - generated on write: yes
81
+ - stored in D1: yes
82
+ - returned in feed/API output: yes
83
+ - recomputed on read: no
84
+ - used to reject tampered rows: no
85
+ - tied to canonical raw content SHA-256: no
86
+
87
+ So Ha-Pri v1 works as a provenance stamp and audit reference. It does not yet work as hard tamper enforcement.
88
+
89
+ ## Weaknesses in v1
90
+
91
+ 1. The signature uses a weak content-hash input.
92
+
93
+ `ha_pri_sig` is SHA-256, but it currently binds to the rolling `result_hash`, not to canonical raw content bytes. The v1 signature inherits the collision risk and ambiguity of the weaker input.
94
+
95
+ 2. No read-time verification exists.
96
+
97
+ Feed and debug reads return the stored signature, but they do not recompute it and compare stored vs recomputed values.
98
+
99
+ 3. No canonicalization contract exists for signed content.
100
+
101
+ The current signature signs a hash value, not a documented canonical representation of the row content.
102
+
103
+ 4. v1 does not bind all fields needed for provenance.
104
+
105
+ It does not directly bind adapter, published timestamp, scraped timestamp, semantic fingerprint, or a schema marker beyond the fixed salt.
106
+
107
+ 5. v1 is not authentication.
108
+
109
+ The salt is public. Anyone with row fields can compute the v1 signature. That is acceptable for a provenance reference, but it should not be presented as proof of origin from a private signing authority.
110
+
111
+ ## Ha-Pri v2 Design Goals
112
+
113
+ Ha-Pri v2 should be:
114
+
115
+ - additive, not a breaking migration
116
+ - deterministic
117
+ - recomputable
118
+ - explicit about canonicalization
119
+ - stronger than v1 for content integrity
120
+ - safe to run without secrets
121
+ - compatible with future HMAC signing, without requiring it now
122
+ - clear about verification status: valid, invalid, or unknown
123
+
124
+ ## Proposed v2 Fields
125
+
126
+ Future Store/Ledger rows may add:
127
+
128
+ ```text
129
+ canonical_content_sha256 TEXT
130
+ semantic_fingerprint_sha256 TEXT
131
+ ha_pri_sig_v2 TEXT
132
+ ha_pri_v2_status TEXT
133
+ ha_pri_v2_checked_at TEXT
134
+ ```
135
+
136
+ These are design-level names only. No schema change is made in this phase.
137
+
138
+ ### canonical_content_sha256
139
+
140
+ `canonical_content_sha256` is:
141
+
142
+ ```text
143
+ SHA-256(canonical raw content)
144
+ ```
145
+
146
+ It binds the actual content after deterministic normalization.
147
+
148
+ ### semantic_fingerprint_sha256
149
+
150
+ `semantic_fingerprint_sha256` is:
151
+
152
+ ```text
153
+ SHA-256(normalized title + canonical URL + publication date)
154
+ ```
155
+
156
+ It is a full SHA-256 version of the current shorter semantic fingerprint idea.
157
+
158
+ ## Canonicalization Rules
159
+
160
+ All canonicalization should be deterministic.
161
+
162
+ Recommended rules:
163
+
164
+ 1. Use UTF-8.
165
+ 2. Normalize line endings to `\n`.
166
+ 3. Trim trailing whitespace on each line.
167
+ 4. Preserve meaningful internal whitespace.
168
+ 5. Normalize null or missing optional fields to the literal string `"null"`.
169
+ 6. Use stable field order.
170
+ 7. Use ISO-8601 timestamps where available.
171
+ 8. Do not include fields whose values change during read-time verification unless they are explicitly part of the signed record.
172
+ 9. Version the canonicalization contract.
173
+
174
+ For future implementation, canonicalization should live in a pure helper with deterministic fixtures.
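A pure helper covering rules 2, 3, 4, and 5 might look like the sketch below. The names `canonicalizeContent` and `canonicalField` are hypothetical, and treating a lone `\r` as a line ending is an assumption. UTF-8 encoding (rule 1) would be applied when the canonical string is hashed.

```typescript
// Hypothetical helper: normalizes line endings and trims trailing whitespace per line.
// Internal whitespace inside a line is left untouched.
function canonicalizeContent(raw: string): string {
  return raw
    .replace(/\r\n?/g, "\n") // CRLF and lone CR become LF (assumption for lone CR)
    .split("\n")
    .map((line) => line.trimEnd())
    .join("\n");
}

// Hypothetical helper: null or missing optional fields become the literal string "null".
function canonicalField(value: string | null | undefined): string {
  return value === null || value === undefined ? "null" : value;
}

console.log(JSON.stringify(canonicalizeContent("a  \r\nb\t\rc  d"))); // "a\nb\nc  d"
```

Keeping these helpers free of I/O and clocks is what makes deterministic fixtures possible: the same input string must always produce the same canonical string.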
175
+
176
+ ## Proposed v2 Formula
177
+
178
+ ```text
179
+ ha_pri_sig_v2 = SHA-256(signingPayload)
180
+ ```
181
+
182
+ Where `signingPayload` is exactly:
183
+
184
+ ```text
185
+ FRESHCONTEXT_HA_PRI_V2
186
+ result_id=<resultId>
187
+ canonical_content_sha256=<canonicalContentSha256>
188
+ semantic_fingerprint_sha256=<semanticFingerprintSha256>
189
+ adapter=<adapter>
190
+ published_at=<publishedAt-or-null>
191
+ retrieved_at=<retrievedAt-or-null>
192
+ engine_version=<engineVersion>
193
+ ```
194
+
195
+ ### Field Meaning
196
+
197
+ - `FRESHCONTEXT_HA_PRI_V2`: schema/version string
198
+ - `result_id`: stable row identifier
199
+ - `canonical_content_sha256`: cryptographic digest of canonical raw content
200
+ - `semantic_fingerprint_sha256`: cryptographic digest of semantic identity fields
201
+ - `adapter`: source adapter name
202
+ - `published_at`: source/content publication timestamp, or explicit null sentinel
203
+ - `retrieved_at`: retrieval or collection timestamp, or explicit null sentinel
204
+ - `engine_version`: scoring/signature engine version
205
+
206
+ Store/Ledger systems may map `scraped_at` to the v2 `retrieved_at` signing field.
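The signing payload and v2 formula above can be sketched as follows. This is an illustration under stated assumptions, not the shipped `calculateHaPriV2`: the `HaPriV2Input` shape, the function names, the newline separator with no trailing newline, and the hex output are all inferred from the layout of the text block rather than confirmed by the source.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape mirroring the placeholders in the payload template.
interface HaPriV2Input {
  resultId: string;
  canonicalContentSha256: string;
  semanticFingerprintSha256: string;
  adapter: string;
  publishedAt: string | null;
  retrievedAt: string | null;
  engineVersion: string;
}

// Builds the payload in the fixed field order given in the document.
function buildSigningPayload(input: HaPriV2Input): string {
  return [
    "FRESHCONTEXT_HA_PRI_V2",
    `result_id=${input.resultId}`,
    `canonical_content_sha256=${input.canonicalContentSha256}`,
    `semantic_fingerprint_sha256=${input.semanticFingerprintSha256}`,
    `adapter=${input.adapter}`,
    `published_at=${input.publishedAt ?? "null"}`,
    `retrieved_at=${input.retrievedAt ?? "null"}`,
    `engine_version=${input.engineVersion}`,
  ].join("\n"); // assumption: LF-separated, no trailing newline
}

// SHA-256 over the UTF-8 bytes of the payload, hex-encoded (encoding assumed).
function sketchHaPriSigV2(input: HaPriV2Input): string {
  return createHash("sha256").update(buildSigningPayload(input), "utf8").digest("hex");
}
```

A fixed field order and an explicit `null` sentinel matter here: without them, two implementations could serialize the same row differently and disagree on the signature.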
207
+
208
+ ## Verification Model
209
+
210
+ Future read or audit verification should:
211
+
212
+ 1. Load the stored row.
213
+ 2. Recompute canonical raw content from stored content fields.
214
+ 3. Recompute `canonical_content_sha256`.
215
+ 4. Recompute semantic identity fields.
216
+ 5. Recompute `semantic_fingerprint_sha256`.
217
+ 6. Recompute `ha_pri_sig_v2` from the canonical field sequence.
218
+ 7. Compare stored vs recomputed values.
219
+ 8. Mark the result:
220
+ - `valid`
221
+ - `invalid`
222
+ - `unknown`
223
+ 9. Surface verification status to internal/debug paths first.
224
+ 10. Avoid silently trusting unverifiable rows.
225
+
226
+ Verification must not mutate old rows during read unless a dedicated migration explicitly allows it.
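The comparison in steps 7 and 8 reduces to a small pure function. The sketch below uses a hypothetical name, `verificationStatus`, and assumes the caller has already recomputed the signature, passing `null` when the inputs needed to recompute it are unavailable.

```typescript
type VerificationStatus = "valid" | "invalid" | "unknown";

// Hypothetical helper: compares a stored signature with a recomputed one.
// Rows without a stored v2 signature, or rows that cannot be recomputed, are "unknown".
function verificationStatus(
  storedSig: string | null | undefined,
  recomputedSig: string | null,
): VerificationStatus {
  if (!storedSig || recomputedSig === null) return "unknown";
  return storedSig === recomputedSig ? "valid" : "invalid";
}

console.log(verificationStatus(undefined, "abc")); // "unknown"
console.log(verificationStatus("abc", "abc")); // "valid"
console.log(verificationStatus("abc", "abd")); // "invalid"
```

The function only reads its arguments and returns a status, so it cannot mutate a row during a read.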
227
+
228
+ ## Backward Compatibility
229
+
230
+ Ha-Pri v2 should not remove or reinterpret v1.
231
+
232
+ Rules:
233
+
234
+ - Keep `ha_pri_sig` readable.
235
+ - Add `ha_pri_sig_v2` separately.
236
+ - Treat old rows without v2 fields as `unknown`, not invalid.
237
+ - Do not reject old rows solely because they lack v2.
238
+ - Preserve v1 formula tests.
239
+ - Add v2 fixtures before any production write path changes.
240
+
241
+ ## Future HMAC Boundary
242
+
243
+ HMAC-SHA256 may be useful later if FreshContext needs origin authentication rather than only tamper evidence.
244
+
245
+ That would require:
246
+
247
+ - a private deployment key
248
+ - secret rotation
249
+ - key identifiers
250
+ - verification policy for old key versions
251
+ - clear trust-boundary documentation
252
+
253
+ This phase does not add HMAC, secrets, or key management.
254
+
255
+ ## Suggested Future Patch Sequence
256
+
257
+ 1. Add pure canonicalization helpers and deterministic tests.
258
+ 2. Add pure v2 signature helper and fixtures.
259
+ 3. Add optional verification helper that returns `valid`, `invalid`, or `unknown`.
260
+ 4. Add D1 columns in a separate schema phase.
261
+ 5. Write v2 fields for new rows only.
262
+ 6. Expose verification status on debug/internal endpoints first.
263
+ 7. Decide later whether public feed output should include v2 status.
264
+
265
+ ## Non-Goals
266
+
267
+ This design does not:
268
+
269
+ - change runtime behavior
270
+ - change scoring
271
+ - change MCP tool schemas
272
+ - change D1 schema
273
+ - change Worker write paths
274
+ - migrate old rows
275
+ - add HMAC
276
+ - add secrets
277
+ - reject rows in production
278
+ - publish npm
279
+ - deploy the Worker
@@ -0,0 +1,55 @@
1
+ # FreshContext Release Integrity Notes
2
+
3
+ This document describes release hardening practices for future FreshContext package and archive releases. It is a plan and checklist, not an implemented signing or SBOM system.
4
+
5
+ ## Before Publishing or Sharing a Package
6
+
7
+ - Start from a clean working tree.
8
+ - Confirm the package version intentionally matches the release plan.
9
+ - Run `npm run build`.
10
+ - Run `npm test`.
11
+ - Run `npm run smoke:stdio`.
12
+ - Run `npm run example:ha-pri-v2`.
13
+ - Run `cd worker && npx tsc --noEmit`.
14
+ - Run `npm audit --omit=dev`.
15
+ - Run `npm audit`.
16
+ - Run `npm pack --dry-run --json`.
17
+ - Smoke-test the packed tarball in a temporary install:
18
+ - Confirm `npm start` works from the installed package.
19
+ - Confirm the `freshcontext-mcp` binary works from the installed package.
20
+ - Confirm repo-only scripts print a source-checkout notice instead of failing when examples, tests, or scripts are intentionally excluded.
21
+ - Confirm `dist/server.js` is present and `dist/apify.js` is absent from the MCP npm package.
22
+ - Confirm fresh consumer `npm audit --omit=dev` is clean.
23
+ - Run a stale-claim scan across public docs and package-facing files.
24
+ - Run a secret scan before sharing archives, diligence folders, or package artifacts.
25
+ - Keep operational demo runbooks, buyer scripts, outreach plans, diligence checklists, and negotiation materials outside the public npm package.
26
+
27
+ ## Package Exclusion Checks
28
+
29
+ Confirm release artifacts do not include:
30
+
31
+ - Local environment files.
32
+ - Tokens or local credential files.
33
+ - MCP registry local credential files.
34
+ - Cloudflare local state.
35
+ - Local database snapshots or SQL dumps.
36
+ - Private sale, buyer, target, outreach, or diligence documents.
37
+ - Private data-room folders.
38
+ - Operational demo runbooks intended for buyer calls or internal screen-share rehearsal.
39
+ - Local logs.
40
+ - Old package tarballs.
41
+
42
+ ## Release Notes and Integrity Artifacts
43
+
44
+ - Current prepared release notes: [`RELEASE_NOTES.md`](./RELEASE_NOTES.md).
45
+
46
+ Future release hardening may include:
47
+
48
+ - GitHub release notes for tagged releases.
49
+ - Signed git tags, if signing is configured.
50
+ - `SHA256SUMS` files for release artifacts.
51
+ - SBOM generation for buyer or enterprise diligence.
52
+ - npm provenance and signature review.
53
+ - A documented token-rotation checklist for any maintainer or ownership transfer.
54
+
55
+ Do not publish from a dirty working tree. Do not publish from an environment that exposes secrets in logs or command output.
@@ -0,0 +1,55 @@
1
+ # FreshContext Release Notes
2
+
3
+ ## 0.3.19
4
+
5
+ FreshContext 0.3.19 syncs the public MCP package with the new generic `evaluate_context` interface.
6
+
7
+ ### Context Evaluation Front Door
8
+
9
+ - Adds `evaluate_context` as the primary MCP tool for caller-provided candidate context.
10
+ - Returns decision-first output for agents and users: decision, meaning, action, warnings, source, freshness, rank score, utility, confidence, and explanation.
11
+ - Keeps the boundary explicit: `evaluate_context` does not fetch, crawl, scrape, browse, read folders, or call adapters.
12
+ - Updates stdio smoke and Trust Scanner claim checks to expect 22 MCP tools: `evaluate_context` plus 21 read-only reference adapters.
13
+
14
+ ### Public Framing
15
+
16
+ - Reframes the MCP package around FreshContext as context integrity infrastructure, not a 21-tool toolbox.
17
+ - Keeps the 21 adapter tools as read-only reference adapters and proof surfaces.
18
+ - Updates package-facing docs/spec language to point first at `candidate context -> FreshContext Core -> decision-ready context`.
19
+
20
+ ## 0.3.18
21
+
22
+ FreshContext 0.3.18 made the MCP/Core package easier to install, validate, and explain without changing the deployed Worker runtime.
23
+
24
+ ### Core and Context Evaluation
25
+
26
+ - Added the Core signal evaluation pipeline for normalized, freshness-ranked context results.
27
+ - Added Source Profiles as Core metadata for source-aware policy vocabulary.
28
+ - Added the Context Decision Helper so evaluated context can be interpreted as use, cite, verify, refresh, watch, background, or exclude.
29
+ - Preserved the ranking boundary: `ranked.final_score` controls default ordering, while `utility.score` remains sidecar intelligence.
30
+
31
+ ### Bring Your Own Context
32
+
33
+ - Added decision-first local demos for user-provided JSON source lists.
34
+ - Added academic citation and jobs/opportunity sample inputs.
35
+ - Added `npm run demo:evaluate:file` for local source-list evaluation from a cloned source checkout.
36
+
37
+ ### Adapter Path
38
+
39
+ - Added adapter registry metadata for the 21 existing MCP tools.
40
+ - Added additive arXiv signal extraction without changing the existing MCP `extract_arxiv` behavior.
41
+ - Added an arXiv signal-to-decision proof using a static fixture, Core evaluation, and decision output.
42
+
43
+ ### Package and Release Hygiene
44
+
45
+ - Hardened the npm package file allowlist.
46
+ - Added script guards so repo-only scripts show a source-checkout notice in packed installs.
47
+ - Isolated Apify/Crawlee from the normal MCP npm runtime package while preserving source-checkout Apify Actor support.
48
+ - Confirmed fresh consumer installs no longer install Apify/Crawlee/file-type through the default MCP package path.
49
+
50
+ ### Boundaries
51
+
52
+ - No Worker deploy is part of the npm package release.
53
+ - No hosted dashboard, billing system, Operator mode, browser crawling, or local file scanning is included.
54
+ - No Worker, REST handler, MCP tool schema, or existing adapter behavior changes are included in the package release itself.
55
+ - Future work is tracked in [`FUTURE_LANES.md`](./FUTURE_LANES.md).
@@ -0,0 +1,89 @@
1
+ # FreshContext Signal Contract v1
2
+
3
+ FreshContext Signal Contract v1 defines the Core shape for a retrieved signal before it is ranked, wrapped, stored, or passed to an agent workflow.
4
+
5
+ It is an additive Core API. It does not change MCP tool schemas, Worker runtime behavior, D1 schema, Store scoring, feeds, or deployment behavior.
6
+
7
+ ## Contract Version
8
+
9
+ ```ts
10
+ type SignalContractVersion = "freshcontext.signal.v1";
11
+ ```
12
+
13
+ Every normalized signal includes:
14
+
15
+ ```ts
16
+ contract_version: "freshcontext.signal.v1"
17
+ ```
18
+
19
+ ## Input Shape
20
+
21
+ `FreshContextSignalInput` accepts the common fields used by adapters, agents, ranking, and future Store wiring:
22
+
23
+ ```ts
24
+ interface FreshContextSignalInput {
25
+ id?: string;
26
+ source: string;
27
+ source_type?: string;
28
+ title?: string;
29
+ content?: string;
30
+ published_at?: string | null;
31
+ content_date?: string | null;
32
+ retrieved_at?: string | null;
33
+ semantic_score?: number;
34
+ date_confidence?: "high" | "medium" | "low" | "unknown";
35
+ freshness_confidence?: "high" | "medium" | "low";
36
+ status?: "success" | "partial" | "stale" | "failed" | "unknown";
37
+ metadata?: Record<string, unknown>;
38
+ }
39
+ ```
40
+
41
+ `published_at` is the canonical signal timestamp. `content_date` is accepted as an adapter/envelope compatibility alias.
42
+
43
+ ## Normalized Output
44
+
45
+ `normalizeSignal(input, options?)` returns a `FreshContextSignal`:
46
+
47
+ ```ts
48
+ interface FreshContextSignal {
49
+ contract_version: "freshcontext.signal.v1";
50
+ id?: string;
51
+ source: string;
52
+ source_type: string;
53
+ title?: string;
54
+ content?: string;
55
+ published_at: string | null;
56
+ retrieved_at: string;
57
+ semantic_score: number;
58
+ date_confidence: "high" | "medium" | "low" | "unknown";
59
+ status: "success" | "partial" | "stale" | "failed" | "unknown";
60
+ metadata: Record<string, unknown>;
61
+ reasons: string[];
62
+ }
63
+ ```
64
+
65
+ ## Normalization Rules
66
+
67
+ - Missing or invalid `published_at` / `content_date` becomes `published_at: null`.
68
+ - `content_date` maps to `published_at` when `published_at` is absent.
69
+ - Meaningfully future-dated timestamps are cleared and receive `date_confidence: "unknown"`.
70
+ - Small clock skew is tolerated by the same Core freshness policy used by envelope scoring.
71
+ - Failed, empty, timeout, blocked, or error-looking content becomes `status: "failed"`.
72
+ - Missing, invalid, negative, or oversized `semantic_score` is clamped into `0..1`.
73
+ - `metadata` is shallow-copied so normalization does not mutate caller-owned objects.
74
+ - `reasons` records meaningful normalization changes.
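Two of the rules above, score clamping and the `content_date` alias, can be sketched as small pure functions. These are not the shipped `normalizeSignal` internals: the helper names are hypothetical, and mapping a missing or non-finite score to `0` is an assumption, since the document only says the value ends up in `0..1`.

```typescript
// Hypothetical helper: forces semantic_score into the 0..1 range.
// Assumption: missing, non-numeric, NaN, or infinite scores fall back to 0.
function clampSemanticScore(score: unknown): number {
  if (typeof score !== "number" || !Number.isFinite(score)) return 0;
  return Math.min(1, Math.max(0, score));
}

// Hypothetical helper: prefers published_at, falls back to content_date,
// and returns null when the chosen value is missing or not a parseable date.
function resolvePublishedAt(input: {
  published_at?: string | null;
  content_date?: string | null;
}): string | null {
  const candidate = input.published_at ?? input.content_date ?? null;
  if (candidate === null || Number.isNaN(Date.parse(candidate))) return null;
  return candidate;
}

console.log(clampSemanticScore(1.7)); // 1
console.log(resolvePublishedAt({ content_date: "not a date" })); // null
```

The future-date and clock-skew rules are left out because the tolerance used by the Core freshness policy is not stated in this document.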
75
+
76
+ ## Relationship to Existing Core Types
77
+
78
+ The signal contract does not replace existing Core types:
79
+
80
+ - `AdapterResult` remains the adapter-to-envelope input shape.
81
+ - `FreshContext` remains the envelope output shape.
82
+ - `FreshSignal` and `RankedSignal` remain the ranking input/output shapes.
83
+ - `ContextUtilityInput` remains the pure context-conditioned utility primitive.
84
+
85
+ The contract gives these surfaces a shared signal vocabulary without requiring Store, Worker, or MCP schema changes.
86
+
87
+ ## Boundary
88
+
89
+ Signal Contract v1 does not determine truth, certify data, or provide legal, medical, tax, or financial advice. It provides normalized context metadata for freshness, provenance, relevance, and workflow review.