npm - freshcontext-mcp - Versions diffs - 0.3.17 → 0.3.19 - Mend

freshcontext-mcp 0.3.17 → 0.3.19

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (42)

package/LICENSE +21 -0
package/NOTICE.md +17 -0
package/README.md +459 -296
package/SECURITY.md +34 -0
package/TRADEMARKS.md +9 -0
package/dist/adapters/arxiv.js +92 -48
package/dist/adapters/hackernews.js +16 -16
package/dist/adapters/registry.js +232 -0
package/dist/core/decay.js +61 -0
package/dist/core/decision.js +176 -0
package/dist/core/envelope.js +59 -0
package/dist/core/explain.js +28 -0
package/dist/core/guards.js +17 -0
package/dist/core/index.js +11 -0
package/dist/core/pipeline.js +101 -0
package/dist/core/provenance.js +73 -0
package/dist/core/rank.js +84 -0
package/dist/core/signal.js +101 -0
package/dist/core/sourceProfiles.js +126 -0
package/dist/core/types.js +1 -0
package/dist/core/utility.js +90 -0
package/dist/rest/handler.js +126 -0
package/dist/server.js +40 -2
package/dist/tools/evaluateContext.js +127 -0
package/dist/tools/freshnessStamp.js +1 -137
package/dist/types.js +0 -1
package/docs/API_DESIGN.md +434 -0
package/docs/CODEX_MCP_USAGE.md +116 -0
package/docs/CORE_API.md +226 -0
package/docs/CORE_MCP_BOUNDARY.md +106 -0
package/docs/DEPENDENCY_DILIGENCE.md +63 -0
package/docs/FUTURE_LANES.md +173 -0
package/docs/HA_PRI_V2_DESIGN.md +279 -0
package/docs/RELEASE_INTEGRITY.md +55 -0
package/docs/RELEASE_NOTES.md +55 -0
package/docs/SIGNAL_CONTRACT.md +89 -0
package/docs/SOURCE_PROFILES.md +427 -0
package/freshcontext.schema.json +103 -103
package/package-script-guard.mjs +141 -0
package/package.json +94 -59
package/server.json +27 -28
package/dist/apify.js +0 -133

package/docs/HA_PRI_V2_DESIGN.md ADDED

@@ -0,0 +1,279 @@
+# Ha-Pri v2 Design
+Status: design + pure Core helper
+Phase: Math Spine Phase 3-A / 3-B
+Runtime impact: none
+## Purpose
+Ha-Pri v2 is an additive provenance-hardening model for FreshContext Store/Ledger rows.
+The goal is to keep Ha-Pri v1 readable while designing a stronger future signature that binds a row to canonical content, semantic identity, source metadata, timestamps, and engine version.
+Phase 3-B adds pure Core helper functions and deterministic tests for the v2 model. Phase 3-C adds `examples/ha-pri-v2-example.ts`, a deterministic developer fixture showing `calculateHaPriV2` and `verifyHaPriV2` returning valid, invalid, and unknown verification states. Production Store wiring remains future work. This document does not change the D1 schema, change Worker write paths, migrate old rows, add HMAC secrets, or alter production scoring.
+## Current Ha-Pri v1 Audit
+Ha-Pri v1 is implemented today as a provenance stamp and audit reference, not yet hard tamper enforcement.
+### Where v1 Lives
+Current implementation points:
+- `worker/src/intelligence.ts`
+  - `PROVENANCE_SALT = "FRESHCONTEXT_DAR_V1"`
+  - `generateAuditSig(resultId, contentHash)`
+  - `scoreSignal(...)` computes `ha_pri_sig`
+- `worker/src/worker.ts`
+  - migration adds `ha_pri_sig TEXT`
+  - cron write path stores `ha_pri_sig` in `scrape_results`
+  - `/v1/intel/feed/:profile_id` returns `ha_pri_sig` in `intelligence_stamps`
+- `tests/mathSpine.test.ts`
+  - checks that `generateAuditSig` matches the documented v1 formula
+### v1 Formula
+```text
+ha_pri_sig = SHA-256(
+  result_id + ":" +
+  content_hash + ":" +
+  "FRESHCONTEXT_DAR_V1"
+)
+```
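The v1 formula above can be reproduced in a few lines. This is a minimal sketch using Node's built-in `node:crypto` module. The function name `sketchAuditSigV1` is hypothetical, and the hex output encoding is an assumption, since the document does not state how `generateAuditSig` encodes the digest.

```ts
import { createHash } from "node:crypto";

// Salt string taken from the v1 formula above.
const PROVENANCE_SALT = "FRESHCONTEXT_DAR_V1";

// Hypothetical stand-in for generateAuditSig: joins the three parts with ":"
// and returns the SHA-256 digest. Hex encoding is an assumption.
function sketchAuditSigV1(resultId: string, contentHash: string): string {
  return createHash("sha256")
    .update(`${resultId}:${contentHash}:${PROVENANCE_SALT}`, "utf8")
    .digest("hex");
}

// Illustrative inputs only; real result_id and content_hash values will differ.
const sig = sketchAuditSigV1("result-123", "1mo");
console.log(sig.length); // 64 hex characters
```

Because the salt is a public constant, anyone holding the same two inputs gets the same signature, which is why v1 is a reference stamp rather than authentication.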
+### What v1 Binds
+Ha-Pri v1 binds:
+- the generated `result_id`
+- the current `content_hash` argument passed into `scoreSignal`
+- the engine/version salt string `FRESHCONTEXT_DAR_V1`
+In the current Worker cron path, `content_hash` is the value named `result_hash`, produced by `simpleHash(raw)`.
+### Current Hash Input
+The current `result_hash` is a small rolling hash:
+```ts
+let h = 0;
+for (let i = 0; i < str.length; i++) h = Math.imul(31, h) + str.charCodeAt(i) | 0;
+return Math.abs(h).toString(36);
+```
+This is useful for cheap change detection, but it is not a cryptographic content digest.
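To make the weakness concrete, the sketch below wraps the snippet above in a callable function and shows two short inputs that collide. The name `simpleHash` follows the helper mentioned earlier; the exact signature is an assumption.

```ts
// Wraps the rolling hash shown above so it can be called directly.
// The signature is an assumption; the loop body matches the snippet above.
function simpleHash(str: string): string {
  let h = 0;
  for (let i = 0; i < str.length; i++) {
    h = (Math.imul(31, h) + str.charCodeAt(i)) | 0; // 32-bit wraparound
  }
  return Math.abs(h).toString(36);
}

// "Aa" and "BB" are different inputs with the same 32-bit value:
// 65 * 31 + 97 = 66 * 31 + 66 = 2112.
console.log(simpleHash("Aa") === simpleHash("BB")); // true
```

A collision this easy to construct means any signature built on `result_hash` cannot distinguish the two inputs, whatever hash is applied afterwards.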
+### Storage and Output
+Ha-Pri v1 is stored in D1:
+- `scrape_results.ha_pri_sig`
+It is returned through the live intelligence feed:
+- `signals[].intelligence_stamps.ha_pri_sig`
+### Verification Status
+Current v1 behavior:
+- generated on write: yes
+- stored in D1: yes
+- returned in feed/API output: yes
+- recomputed on read: no
+- used to reject tampered rows: no
+- tied to canonical raw content SHA-256: no
+So Ha-Pri v1 works as a provenance stamp and audit reference. It does not yet work as hard tamper enforcement.
+## Weaknesses in v1
+1. The signature uses a weak content-hash input.
+   `ha_pri_sig` is SHA-256, but it currently binds to the rolling `result_hash`, not to canonical raw content bytes. The v1 signature inherits the collision risk and ambiguity of the weaker input.
+2. No read-time verification exists.
+   Feed and debug reads return the stored signature, but they do not recompute it and compare stored vs recomputed values.
+3. No canonicalization contract exists for signed content.
+   The current signature signs a hash value, not a documented canonical representation of the row content.
+4. v1 does not bind all fields needed for provenance.
+   It does not directly bind adapter, published timestamp, scraped timestamp, semantic fingerprint, or a schema marker beyond the fixed salt.
+5. v1 is not authentication.
+   The salt is public. Anyone with row fields can compute the v1 signature. That is acceptable for a provenance reference, but it should not be presented as proof of origin from a private signing authority.
+## Ha-Pri v2 Design Goals
+Ha-Pri v2 should be:
+- additive, not a breaking migration
+- deterministic
+- recomputable
+- explicit about canonicalization
+- stronger than v1 for content integrity
+- safe to run without secrets
+- compatible with future HMAC signing, without requiring it now
+- clear about verification status: valid, invalid, or unknown
+## Proposed v2 Fields
+Future Store/Ledger rows may add:
+```text
+canonical_content_sha256 TEXT
+semantic_fingerprint_sha256 TEXT
+ha_pri_sig_v2 TEXT
+ha_pri_v2_status TEXT
+ha_pri_v2_checked_at TEXT
+```
+These are design-level names only. No schema change is made in this phase.
+### canonical_content_sha256
+`canonical_content_sha256` is:
+```text
+SHA-256(canonical raw content)
+```
+It binds the actual content after deterministic normalization.
+### semantic_fingerprint_sha256
+`semantic_fingerprint_sha256` is:
+```text
+SHA-256(normalized title + canonical URL + publication date)
+```
+It is a full SHA-256 version of the current shorter semantic fingerprint idea.
+## Canonicalization Rules
+All canonicalization should be deterministic.
+Recommended rules:
+1. Use UTF-8.
+2. Normalize line endings to `\n`.
+3. Trim trailing whitespace on each line.
+4. Preserve meaningful internal whitespace.
+5. Normalize null or missing optional fields to the literal string `"null"`.
+6. Use stable field order.
+7. Use ISO-8601 timestamps where available.
+8. Do not include fields whose values change during read-time verification unless they are explicitly part of the signed record.
+9. Version the canonicalization contract.
+For future implementation, canonicalization should live in a pure helper with deterministic fixtures.
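A pure helper covering rules 2 to 5 might look like the sketch below. The function names are hypothetical, and treating "trailing whitespace" as spaces and tabs only is an assumption; the design does not define the character set.

```ts
// Sentinel for null or missing optional fields (rule 5).
const NULL_SENTINEL = "null";

// Hypothetical helper applying rules 2-4 to raw text content.
function canonicalizeText(raw: string): string {
  return raw
    .replace(/\r\n?/g, "\n") // rule 2: CRLF and lone CR become LF
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, "")) // rule 3: trim trailing spaces/tabs
    .join("\n"); // rule 4: internal whitespace is left untouched
}

// Hypothetical helper applying rule 5 to an optional field.
function canonicalizeOptional(value: string | null | undefined): string {
  return value == null ? NULL_SENTINEL : value;
}

console.log(JSON.stringify(canonicalizeText("a  b \r\nc\t\r\n"))); // "a  b\nc\n"
console.log(canonicalizeOptional(undefined)); // null
```

Both helpers are pure string functions, so they can be pinned with fixed input/output fixtures before any Store wiring exists.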
+## Proposed v2 Formula
+```text
+ha_pri_sig_v2 = SHA-256(signingPayload)
+```
+Where `signingPayload` is exactly:
+```text
+FRESHCONTEXT_HA_PRI_V2
+result_id=<resultId>
+canonical_content_sha256=<canonicalContentSha256>
+semantic_fingerprint_sha256=<semanticFingerprintSha256>
+adapter=<adapter>
+published_at=<publishedAt-or-null>
+retrieved_at=<retrievedAt-or-null>
+engine_version=<engineVersion>
+```
+### Field Meaning
+- `FRESHCONTEXT_HA_PRI_V2`: schema/version string
+- `result_id`: stable row identifier
+- `canonical_content_sha256`: cryptographic digest of canonical raw content
+- `semantic_fingerprint_sha256`: cryptographic digest of semantic identity fields
+- `adapter`: source adapter name
+- `published_at`: source/content publication timestamp, or explicit null sentinel
+- `retrieved_at`: retrieval or collection timestamp, or explicit null sentinel
+- `engine_version`: scoring/signature engine version
+Store/Ledger systems may map `scraped_at` to the v2 `retrieved_at` signing field.
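The payload layout above can be assembled as in the following sketch. It is not the package's `calculateHaPriV2`; the names `HaPriV2Fields`, `buildSigningPayload`, and `sketchHaPriSigV2` are hypothetical, and three details are assumptions: lines are joined with `\n`, there is no trailing newline, and the digest is hex-encoded.

```ts
import { createHash } from "node:crypto";

// Hypothetical input shape mirroring the placeholders in the payload above.
interface HaPriV2Fields {
  resultId: string;
  canonicalContentSha256: string;
  semanticFingerprintSha256: string;
  adapter: string;
  publishedAt: string | null;
  retrievedAt: string | null;
  engineVersion: string;
}

// Builds the payload in the fixed field order from the design.
// Assumption: "\n" separators and no trailing newline.
function buildSigningPayload(f: HaPriV2Fields): string {
  return [
    "FRESHCONTEXT_HA_PRI_V2",
    `result_id=${f.resultId}`,
    `canonical_content_sha256=${f.canonicalContentSha256}`,
    `semantic_fingerprint_sha256=${f.semanticFingerprintSha256}`,
    `adapter=${f.adapter}`,
    `published_at=${f.publishedAt ?? "null"}`, // explicit null sentinel
    `retrieved_at=${f.retrievedAt ?? "null"}`,
    `engine_version=${f.engineVersion}`,
  ].join("\n");
}

// SHA-256 over the payload; hex encoding is an assumption.
function sketchHaPriSigV2(f: HaPriV2Fields): string {
  return createHash("sha256").update(buildSigningPayload(f), "utf8").digest("hex");
}

// Illustrative values only.
const example: HaPriV2Fields = {
  resultId: "result-123",
  canonicalContentSha256: "a".repeat(64),
  semanticFingerprintSha256: "b".repeat(64),
  adapter: "arxiv",
  publishedAt: null,
  retrievedAt: "2025-01-01T00:00:00Z",
  engineVersion: "example-engine",
};
console.log(buildSigningPayload(example).split("\n").length); // 8
```

Writing each field as `name=value` on its own line keeps the payload unambiguous: changing any single bound field changes the digest.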
+## Verification Model
+Future read or audit verification should:
+1. Load the stored row.
+2. Recompute canonical raw content from stored content fields.
+3. Recompute `canonical_content_sha256`.
+4. Recompute semantic identity fields.
+5. Recompute `semantic_fingerprint_sha256`.
+6. Recompute `ha_pri_sig_v2` from the canonical field sequence.
+7. Compare stored vs recomputed values.
+8. Mark the result:
+   - `valid`
+   - `invalid`
+   - `unknown`
+9. Surface verification status to internal/debug paths first.
+10. Avoid silently trusting unverifiable rows.
+Verification must not mutate old rows during read unless a dedicated migration explicitly allows it.
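The three-state outcome can be sketched as below. This is not the package's `verifyHaPriV2`; `sketchVerify` and `StoredRow` are hypothetical names, the `recompute` callback stands in for steps 2 to 6, and returning `unknown` when recomputation is impossible is an assumption consistent with the "avoid silently trusting unverifiable rows" rule.

```ts
type HaPriV2Status = "valid" | "invalid" | "unknown";

// Hypothetical minimal row shape; real rows carry more fields.
interface StoredRow {
  ha_pri_sig_v2?: string | null;
}

// Compares the stored signature against a recomputed one without mutating the row.
function sketchVerify<R extends StoredRow>(
  row: R,
  recompute: (row: R) => string | null, // stands in for steps 2-6
): HaPriV2Status {
  if (!row.ha_pri_sig_v2) return "unknown"; // old row without v2 fields
  const expected = recompute(row);
  if (expected === null) return "unknown"; // assumption: cannot recompute
  return expected === row.ha_pri_sig_v2 ? "valid" : "invalid";
}

console.log(sketchVerify({ ha_pri_sig_v2: "abc" }, () => "abc")); // valid
console.log(sketchVerify({ ha_pri_sig_v2: "abc" }, () => "xyz")); // invalid
console.log(sketchVerify({}, () => "abc")); // unknown
```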
+## Backward Compatibility
+Ha-Pri v2 should not remove or reinterpret v1.
+Rules:
+- Keep `ha_pri_sig` readable.
+- Add `ha_pri_sig_v2` separately.
+- Treat old rows without v2 fields as `unknown`, not invalid.
+- Do not reject old rows solely because they lack v2.
+- Preserve v1 formula tests.
+- Add v2 fixtures before any production write path changes.
+## Future HMAC Boundary
+HMAC-SHA256 may be useful later if FreshContext needs origin authentication rather than only tamper evidence.
+That would require:
+- a private deployment key
+- secret rotation
+- key identifiers
+- verification policy for old key versions
+- clear trust-boundary documentation
+This phase does not add HMAC, secrets, or key management.
+## Suggested Future Patch Sequence
+1. Add pure canonicalization helpers and deterministic tests.
+2. Add pure v2 signature helper and fixtures.
+3. Add optional verification helper that returns `valid`, `invalid`, or `unknown`.
+4. Add D1 columns in a separate schema phase.
+5. Write v2 fields for new rows only.
+6. Expose verification status on debug/internal endpoints first.
+7. Decide later whether public feed output should include v2 status.
+## Non-Goals
+This design does not:
+- change runtime behavior
+- change scoring
+- change MCP tool schemas
+- change D1 schema
+- change Worker write paths
+- migrate old rows
+- add HMAC
+- add secrets
+- reject rows in production
+- publish npm
+- deploy the Worker

package/docs/RELEASE_INTEGRITY.md ADDED

@@ -0,0 +1,55 @@
+# FreshContext Release Integrity Notes
+This document describes release hardening practices for future FreshContext package and archive releases. It is a plan and checklist, not an implemented signing or SBOM system.
+## Before Publishing or Sharing a Package
+- Start from a clean working tree.
+- Confirm the package version intentionally matches the release plan.
+- Run `npm run build`.
+- Run `npm test`.
+- Run `npm run smoke:stdio`.
+- Run `npm run example:ha-pri-v2`.
+- Run `cd worker && npx tsc --noEmit`.
+- Run `npm audit --omit=dev`.
+- Run `npm audit`.
+- Run `npm pack --dry-run --json`.
+- Smoke-test the packed tarball in a temporary install:
+  - Confirm `npm start` works from the installed package.
+  - Confirm the `freshcontext-mcp` binary works from the installed package.
+  - Confirm repo-only scripts print a source-checkout notice instead of failing when examples, tests, or scripts are intentionally excluded.
+  - Confirm `dist/server.js` is present and `dist/apify.js` is absent from the MCP npm package.
+  - Confirm fresh consumer `npm audit --omit=dev` is clean.
+- Run a stale-claim scan across public docs and package-facing files.
+- Run a secret scan before sharing archives, diligence folders, or package artifacts.
+- Keep operational demo runbooks, buyer scripts, outreach plans, diligence checklists, and negotiation materials outside the public npm package.
+## Package Exclusion Checks
+Confirm release artifacts do not include:
+- Local environment files.
+- Tokens or local credential files.
+- MCP registry local credential files.
+- Cloudflare local state.
+- Local database snapshots or SQL dumps.
+- Private sale, buyer, target, outreach, or diligence documents.
+- Private data-room folders.
+- Operational demo runbooks intended for buyer calls or internal screen-share rehearsal.
+- Local logs.
+- Old package tarballs.
+## Release Notes and Integrity Artifacts
+- Current prepared release notes: [`RELEASE_NOTES.md`](./RELEASE_NOTES.md).
+Future release hardening may include:
+- GitHub release notes for tagged releases.
+- Signed git tags, if signing is configured.
+- `SHA256SUMS` files for release artifacts.
+- SBOM generation for buyer or enterprise diligence.
+- npm provenance and signature review.
+- A documented token-rotation checklist for any maintainer or ownership transfer.
+Do not publish from a dirty working tree. Do not publish from an environment that exposes secrets in logs or command output.

package/docs/RELEASE_NOTES.md ADDED

@@ -0,0 +1,55 @@
+# FreshContext Release Notes
+## 0.3.19
+FreshContext 0.3.19 syncs the public MCP package with the new generic `evaluate_context` interface.
+### Context Evaluation Front Door
+- Adds `evaluate_context` as the primary MCP tool for caller-provided candidate context.
+- Returns decision-first output for agents and users: decision, meaning, action, warnings, source, freshness, rank score, utility, confidence, and explanation.
+- Keeps the boundary explicit: `evaluate_context` does not fetch, crawl, scrape, browse, read folders, or call adapters.
+- Updates stdio smoke and Trust Scanner claim checks to expect 22 MCP tools: `evaluate_context` plus 21 read-only reference adapters.
+### Public Framing
+- Reframes the MCP package around FreshContext as context integrity infrastructure, not a 21-tool toolbox.
+- Keeps the 21 adapter tools as read-only reference adapters and proof surfaces.
+- Updates package-facing docs/spec language to point first at `candidate context -> FreshContext Core -> decision-ready context`.
+## 0.3.18
+FreshContext 0.3.18 made the MCP/Core package easier to install, validate, and explain without changing the deployed Worker runtime.
+### Core and Context Evaluation
+- Added the Core signal evaluation pipeline for normalized, freshness-ranked context results.
+- Added Source Profiles as Core metadata for source-aware policy vocabulary.
+- Added the Context Decision Helper so evaluated context can be interpreted as use, cite, verify, refresh, watch, background, or exclude.
+- Preserved the ranking boundary: `ranked.final_score` controls default ordering, while `utility.score` remains sidecar intelligence.
+### Bring Your Own Context
+- Added decision-first local demos for user-provided JSON source lists.
+- Added academic citation and jobs/opportunity sample inputs.
+- Added `npm run demo:evaluate:file` for local source-list evaluation from a cloned source checkout.
+### Adapter Path
+- Added adapter registry metadata for the 21 existing MCP tools.
+- Added additive arXiv signal extraction without changing the existing MCP `extract_arxiv` behavior.
+- Added an arXiv signal-to-decision proof using a static fixture, Core evaluation, and decision output.
+### Package and Release Hygiene
+- Hardened the npm package file allowlist.
+- Added script guards so repo-only scripts show a source-checkout notice in packed installs.
+- Isolated Apify/Crawlee from the normal MCP npm runtime package while preserving source-checkout Apify Actor support.
+- Confirmed fresh consumer installs no longer install Apify/Crawlee/file-type through the default MCP package path.
+### Boundaries
+- No Worker deploy is part of the npm package release.
+- No hosted dashboard, billing system, Operator mode, browser crawling, or local file scanning is included.
+- No Worker, REST handler, MCP tool schema, or existing adapter behavior changes are included in the package release itself.
+- Future work is tracked in [`FUTURE_LANES.md`](./FUTURE_LANES.md).

package/docs/SIGNAL_CONTRACT.md ADDED

@@ -0,0 +1,89 @@
+# FreshContext Signal Contract v1
+FreshContext Signal Contract v1 defines the Core shape for a retrieved signal before it is ranked, wrapped, stored, or passed to an agent workflow.
+It is an additive Core API. It does not change MCP tool schemas, Worker runtime behavior, D1 schema, Store scoring, feeds, or deployment behavior.
+## Contract Version
+```ts
+type SignalContractVersion = "freshcontext.signal.v1";
+```
+Every normalized signal includes:
+```ts
+contract_version: "freshcontext.signal.v1"
+```
+## Input Shape
+`FreshContextSignalInput` accepts the common fields used by adapters, agents, ranking, and future Store wiring:
+```ts
+interface FreshContextSignalInput {
+  id?: string;
+  source: string;
+  source_type?: string;
+  title?: string;
+  content?: string;
+  published_at?: string | null;
+  content_date?: string | null;
+  retrieved_at?: string | null;
+  semantic_score?: number;
+  date_confidence?: "high" | "medium" | "low" | "unknown";
+  freshness_confidence?: "high" | "medium" | "low";
+  status?: "success" | "partial" | "stale" | "failed" | "unknown";
+  metadata?: Record<string, unknown>;
+}
+```
+`published_at` is the canonical signal timestamp. `content_date` is accepted as an adapter/envelope compatibility alias.
+## Normalized Output
+`normalizeSignal(input, options?)` returns a `FreshContextSignal`:
+```ts
+interface FreshContextSignal {
+  contract_version: "freshcontext.signal.v1";
+  id?: string;
+  source: string;
+  source_type: string;
+  title?: string;
+  content?: string;
+  published_at: string | null;
+  retrieved_at: string;
+  semantic_score: number;
+  date_confidence: "high" | "medium" | "low" | "unknown";
+  status: "success" | "partial" | "stale" | "failed" | "unknown";
+  metadata: Record<string, unknown>;
+  reasons: string[];
+}
+```
+## Normalization Rules
+- Missing or invalid `published_at` / `content_date` becomes `published_at: null`.
+- `content_date` maps to `published_at` when `published_at` is absent.
+- Meaningfully future-dated timestamps are cleared and receive `date_confidence: "unknown"`.
+- Small clock skew is tolerated by the same Core freshness policy used by envelope scoring.
+- Failed, empty, timeout, blocked, or error-looking content becomes `status: "failed"`.
+- Missing, invalid, negative, or oversized `semantic_score` is normalized into the range `0..1`.
+- `metadata` is shallow-copied so normalization does not mutate caller-owned objects.
+- `reasons` records meaningful normalization changes.
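A few of the rules above can be illustrated with the partial sketch below. It is not the package's `normalizeSignal`: the name `sketchNormalize`, the trimmed-down types, and the reason strings are hypothetical, and falling back to `0` for a missing or non-finite `semantic_score` is an assumption. Future-date handling, clock-skew tolerance, and status detection are omitted because the contract does not give their thresholds.

```ts
// Trimmed-down hypothetical shapes covering only the fields used here.
interface SketchInput {
  published_at?: string | null;
  content_date?: string | null;
  semantic_score?: number;
  metadata?: Record<string, unknown>;
}

interface SketchOutput {
  published_at: string | null;
  semantic_score: number;
  metadata: Record<string, unknown>;
  reasons: string[];
}

function sketchNormalize(input: SketchInput): SketchOutput {
  const reasons: string[] = [];

  // content_date is used only when published_at is absent.
  let published = input.published_at ?? null;
  if (published === null && input.content_date) {
    published = input.content_date;
    reasons.push("content_date mapped to published_at");
  }
  // Unparseable timestamps become null.
  if (published !== null && Number.isNaN(Date.parse(published))) {
    published = null;
    reasons.push("invalid timestamp cleared");
  }

  // Clamp into 0..1; the fallback of 0 for missing values is an assumption.
  const raw = input.semantic_score;
  const score =
    typeof raw === "number" && Number.isFinite(raw) ? Math.min(1, Math.max(0, raw)) : 0;
  if (score !== raw) reasons.push("semantic_score adjusted");

  // Shallow copy so the caller's metadata object is never mutated.
  return { published_at: published, semantic_score: score, metadata: { ...(input.metadata ?? {}) }, reasons };
}

const out = sketchNormalize({ content_date: "2024-01-02", semantic_score: 1.7 });
console.log(out.published_at, out.semantic_score); // 2024-01-02 1
```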
+## Relationship to Existing Core Types
+The signal contract does not replace existing Core types:
+- `AdapterResult` remains the adapter-to-envelope input shape.
+- `FreshContext` remains the envelope output shape.
+- `FreshSignal` and `RankedSignal` remain the ranking input/output shapes.
+- `ContextUtilityInput` remains the pure context-conditioned utility primitive.
+The contract gives these surfaces a shared signal vocabulary without requiring Store, Worker, or MCP schema changes.
+## Boundary
+Signal Contract v1 does not determine truth, certify data, or provide legal, medical, tax, or financial advice. It provides normalized context metadata for freshness, provenance, relevance, and workflow review.