npm - @datafog/fogclaw - Versions diffs - 0.1.0 - Mend

@datafog/fogclaw 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (97)

package/.github/workflows/harness-docs.yml +30 -0
package/AGENTS.md +28 -0
package/LICENSE +21 -0
package/README.md +208 -0
package/dist/config.d.ts +4 -0
package/dist/config.d.ts.map +1 -0
package/dist/config.js +30 -0
package/dist/config.js.map +1 -0
package/dist/engines/gliner.d.ts +14 -0
package/dist/engines/gliner.d.ts.map +1 -0
package/dist/engines/gliner.js +75 -0
package/dist/engines/gliner.js.map +1 -0
package/dist/engines/regex.d.ts +5 -0
package/dist/engines/regex.d.ts.map +1 -0
package/dist/engines/regex.js +54 -0
package/dist/engines/regex.js.map +1 -0
package/dist/index.d.ts +19 -0
package/dist/index.d.ts.map +1 -0
package/dist/index.js +157 -0
package/dist/index.js.map +1 -0
package/dist/redactor.d.ts +3 -0
package/dist/redactor.d.ts.map +1 -0
package/dist/redactor.js +37 -0
package/dist/redactor.js.map +1 -0
package/dist/scanner.d.ts +11 -0
package/dist/scanner.d.ts.map +1 -0
package/dist/scanner.js +77 -0
package/dist/scanner.js.map +1 -0
package/dist/types.d.ts +31 -0
package/dist/types.d.ts.map +1 -0
package/dist/types.js +18 -0
package/dist/types.js.map +1 -0
package/docs/DATA.md +28 -0
package/docs/DESIGN.md +17 -0
package/docs/DOMAIN_DOCS.md +30 -0
package/docs/FRONTEND.md +24 -0
package/docs/OBSERVABILITY.md +25 -0
package/docs/PLANS.md +171 -0
package/docs/PRODUCT_SENSE.md +20 -0
package/docs/RELIABILITY.md +60 -0
package/docs/SECURITY.md +50 -0
package/docs/design-docs/core-beliefs.md +17 -0
package/docs/design-docs/index.md +8 -0
package/docs/generated/README.md +36 -0
package/docs/generated/memory.md +1 -0
package/docs/plans/2026-02-16-fogclaw-design.md +172 -0
package/docs/plans/2026-02-16-fogclaw-implementation.md +1606 -0
package/docs/plans/README.md +15 -0
package/docs/plans/active/2026-02-16-feat-openclaw-official-submission-plan.md +386 -0
package/docs/plans/active/2026-02-17-feat-release-fogclaw-via-datafog-package-plan.md +318 -0
package/docs/plans/active/2026-02-17-feat-submit-fogclaw-to-openclaw-plan.md +244 -0
package/docs/plans/tech-debt-tracker.md +42 -0
package/docs/plugins/fogclaw.md +95 -0
package/docs/runbooks/address-review-findings.md +30 -0
package/docs/runbooks/ci-failures.md +46 -0
package/docs/runbooks/code-review.md +34 -0
package/docs/runbooks/merge-change.md +28 -0
package/docs/runbooks/pull-request.md +45 -0
package/docs/runbooks/record-evidence.md +43 -0
package/docs/runbooks/reproduce-bug.md +42 -0
package/docs/runbooks/respond-to-feedback.md +42 -0
package/docs/runbooks/review-findings.md +31 -0
package/docs/runbooks/submit-openclaw-plugin.md +68 -0
package/docs/runbooks/update-agents-md.md +59 -0
package/docs/runbooks/update-domain-docs.md +42 -0
package/docs/runbooks/validate-current-state.md +41 -0
package/docs/runbooks/verify-release.md +69 -0
package/docs/specs/2026-02-16-feat-openclaw-official-submission-spec.md +115 -0
package/docs/specs/2026-02-17-feat-submit-fogclaw-to-openclaw.md +125 -0
package/docs/specs/README.md +5 -0
package/docs/specs/index.md +8 -0
package/docs/spikes/README.md +8 -0
package/fogclaw.config.example.json +15 -0
package/openclaw.plugin.json +45 -0
package/package.json +37 -0
package/scripts/ci/he-docs-config.json +123 -0
package/scripts/ci/he-docs-drift.sh +112 -0
package/scripts/ci/he-docs-lint.sh +234 -0
package/scripts/ci/he-plans-lint.sh +354 -0
package/scripts/ci/he-runbooks-lint.sh +445 -0
package/scripts/ci/he-specs-lint.sh +258 -0
package/scripts/ci/he-spikes-lint.sh +249 -0
package/scripts/runbooks/select-runbooks.sh +154 -0
package/src/config.ts +46 -0
package/src/engines/gliner.ts +88 -0
package/src/engines/regex.ts +71 -0
package/src/index.ts +223 -0
package/src/redactor.ts +51 -0
package/src/scanner.ts +90 -0
package/src/types.ts +52 -0
package/tests/config.test.ts +104 -0
package/tests/gliner.test.ts +184 -0
package/tests/plugin-smoke.test.ts +114 -0
package/tests/redactor.test.ts +320 -0
package/tests/regex.test.ts +345 -0
package/tests/scanner.test.ts +199 -0
package/tsconfig.json +20 -0

package/docs/PLANS.md ADDED

@@ -0,0 +1,171 @@
+# Agent Plans
+This document describes the requirements for a plan ("Plan"), a design document that a coding agent can follow to deliver a working feature or system change. Treat the reader as a complete beginner to this repository: they have only the current working tree and the single Plan file you provide. There is no memory of prior plans and no external context.
+## How to use Plans and PLANS.md
+When authoring an executable specification (Plan), follow PLANS.md _to the letter_. If it is not in your context, refresh your memory by reading the entire PLANS.md file. Be thorough in reading (and re-reading) source material to produce an accurate specification. When creating a spec, start from the skeleton and flesh it out as you do your research.
+When implementing an executable specification (Plan), do not prompt the user for "next steps"; simply proceed to the next milestone. Keep all sections up to date, and add or split entries in the `Progress` list at every stopping point to affirmatively state the progress made and the next steps. Resolve ambiguities autonomously, and commit frequently.
+When discussing an executable specification (Plan), record decisions in a log in the spec for posterity; it should be unambiguously clear why any change to the specification was made. Plans are living documents, and it should always be possible to restart from _only_ the Plan and no other work.
+When researching a design with challenging requirements or significant unknowns, use milestones to implement proof of concepts, "toy implementations", etc., that allow validating whether the user's proposal is feasible. Read the source code of libraries by finding or acquiring them, research deeply, and include prototypes to guide a fuller implementation.
+## Requirements
+NON-NEGOTIABLE REQUIREMENTS:
+* Every Plan must be fully self-contained. Self-contained means that in its current form it contains all knowledge and instructions needed for a novice to succeed.
+* Every Plan is a living document. Contributors are required to revise it as progress is made, as discoveries occur, and as design decisions are finalized. Each revision must remain fully self-contained.
+* Every Plan must enable a complete novice to implement the feature end-to-end without prior knowledge of this repo.
+* Every Plan must produce a demonstrably working behavior, not merely code changes to "meet a definition".
+* Every Plan must define every term of art in plain language, or not use it.
+Purpose and intent come first. Begin by explaining, in a few sentences, why the work matters from a user's perspective: what someone can do after this change that they could not do before, and how to see it working. Then guide the reader through the exact steps to achieve that outcome, including what to edit, what to run, and what they should observe.
+The agent executing your plan can list files, read files, search, run the project, and run tests. It does not know any prior context and cannot infer what you meant from earlier milestones. Repeat any assumption you rely on. Do not point to external blogs or docs; if knowledge is required, embed it in the plan itself in your own words. If a Plan builds upon a prior Plan and that file is checked in, incorporate it by reference. If it is not, you must include all relevant context from that plan.
+## Formatting
+Format and envelope are simple and strict. Each Plan must be one single fenced code block labeled as `md` that begins and ends with triple backticks. Do not nest additional triple-backtick code fences inside; when you need to show commands, transcripts, diffs, or code, present them as indented blocks within that single fence. Use indentation for clarity rather than code fences inside a Plan to avoid prematurely closing the Plan's code fence. Use two newlines after every heading, use # and ## and so on, and correct syntax for ordered and unordered lists.
+When writing a Plan to a Markdown (.md) file where the content of the file *is only* the single Plan, you should omit the triple backticks.
+Write in plain prose. Prefer sentences over lists. Avoid checklists, tables, and long enumerations unless brevity would obscure meaning. Checklists are permitted only in the `Progress` section, where they are mandatory. Narrative sections must remain prose-first.
+## Guidelines
+Self-containment and plain language are paramount. If you introduce a phrase that is not ordinary English ("daemon", "middleware", "RPC gateway", "filter graph"), define it immediately and remind the reader how it manifests in this repository (for example, by naming the files or commands where it appears). Do not say "as defined previously" or "according to the architecture doc." Include the needed explanation here, even if you repeat yourself.
+Avoid common failure modes. Do not rely on undefined jargon. Do not describe "the letter of a feature" so narrowly that the resulting code compiles but does nothing meaningful. Do not outsource key decisions to the reader. When ambiguity exists, resolve it in the plan itself and explain why you chose that path. Err on the side of over-explaining user-visible effects and under-specifying incidental implementation details.
+Anchor the plan with observable outcomes. State what the user can do after implementation, the commands to run, and the outputs they should see. Acceptance should be phrased as behavior a human can verify ("after starting the server, navigating to [http://localhost:8080/health](http://localhost:8080/health) returns HTTP 200 with body OK") rather than internal attributes ("added a HealthCheck struct"). If a change is internal, explain how its impact can still be demonstrated (for example, by running tests that fail before and pass after, and by showing a scenario that uses the new behavior).
+Specify repository context explicitly. Name files with full repository-relative paths, name functions and modules precisely, and describe where new files should be created. If touching multiple areas, include a short orientation paragraph that explains how those parts fit together so a novice can navigate confidently. When running commands, show the working directory and exact command line. When outcomes depend on environment, state the assumptions and provide alternatives when reasonable.
+Be idempotent and safe. Write the steps so they can be run multiple times without causing damage or drift. If a step can fail halfway, include how to retry or adapt. If a migration or destructive operation is necessary, spell out backups or safe fallbacks. Prefer additive, testable changes that can be validated as you go.
+Validation is not optional. Include instructions to run tests, to start the system if applicable, and to observe it doing something useful. Describe comprehensive testing for any new features or capabilities. Include expected outputs and error messages so a novice can tell success from failure. Where possible, show how to prove that the change is effective beyond compilation (for example, through a small end-to-end scenario, a CLI invocation, or an HTTP request/response transcript). State the exact test commands appropriate to the project’s toolchain and how to interpret their results.
+Capture evidence. When your steps produce terminal output, short diffs, or logs, include them inside the single fenced block as indented examples. Keep them concise and focused on what proves success. If you need to include a patch, prefer file-scoped diffs or small excerpts that a reader can recreate by following your instructions rather than pasting large blobs.
+## Milestones
+Milestones are narrative, not bureaucracy. If you break the work into milestones, introduce each with a brief paragraph that describes the scope, what will exist at the end of the milestone that did not exist before, the commands to run, and the acceptance you expect to observe. Keep it readable as a story: goal, work, result, proof. Progress and milestones are distinct: milestones tell the story, progress tracks granular work. Both must exist. Never abbreviate a milestone merely for the sake of brevity, and do not leave out details that could be crucial to a future implementation.
+Each milestone must be independently verifiable and incrementally implement the overall goal of the plan.
+## Living plans and design decisions
+* Plans are living documents. As you make key design decisions, update the plan to record both the decision and the thinking behind it. Record all decisions in the `Decision Log` section.
+* Plans must contain and maintain a `Progress` section, a `Surprises & Discoveries` section, a `Decision Log`, and an `Outcomes & Retrospective` section. These are not optional.
+* When you discover optimizer behavior, performance tradeoffs, unexpected bugs, or inverse/unapply semantics that shaped your approach, capture those observations in the `Surprises & Discoveries` section with short evidence snippets (test output is ideal).
+* If you change course mid-implementation, document why in the `Decision Log` and reflect the implications in `Progress`. Plans are guides for the next contributor as much as checklists for you.
+* At completion of a major task or the full plan, write an `Outcomes & Retrospective` entry summarizing what was achieved, what remains, and lessons learned.
+* Plans must include explicit workflow handoff sections so later phases have a stable contract:
+  * `## Pull Request` (populated by PR-opening workflow)
+  * `## Review Findings` (populated by `he-review`)
+  * `## Verify/Release Decision` (populated by `he-verify-release`)
+## Prototyping milestones and parallel implementations
+It is acceptable, and often encouraged, to include explicit prototyping milestones when they de-risk a larger change. Examples: adding a low-level operator to a dependency to validate feasibility, or exploring two composition orders while measuring optimizer effects. Keep prototypes additive and testable. Clearly label the scope as “prototyping”; describe how to run and observe results; and state the criteria for promoting or discarding the prototype.
+Prefer additive code changes followed by subtractions that keep tests passing. Parallel implementations (e.g., keeping an adapter alongside an older path during migration) are fine when they reduce risk or enable tests to continue passing during a large migration. Describe how to validate both paths and how to retire one safely with tests. When working with multiple new libraries or feature areas, consider creating spikes that evaluate the feasibility of these features _independently_ of one another, proving that the external library performs as expected and implements the features we need in isolation.
+## Skeleton of a Good Plan
+    # <Short, action-oriented description>
+    This Plan is a living document. The sections `Progress`, `Surprises & Discoveries`, `Decision Log`, and `Outcomes & Retrospective` must be kept up to date as work proceeds.
+    If the PLANS.md file is checked into the repo, reference the path to that file here from the repository root and note that this document must be maintained in accordance with PLANS.md.
+    ## Purpose / Big Picture
+    Explain in a few sentences what someone gains after this change and how they can see it working. State the user-visible behavior you will enable.
+    ## Progress
+    Use a list with checkboxes to summarize granular steps. Every stopping point must be documented here, even if it requires splitting a partially completed task into two (“done” vs. “remaining”). This section must always reflect the actual current state of the work.
+    - [x] (2025-10-01 13:00Z) Example completed step.
+    - [ ] Example incomplete step.
+    - [ ] Example partially completed step (completed: X; remaining: Y).
+    Use timestamps to measure rates of progress.
+    ## Surprises & Discoveries
+    Document unexpected behaviors, bugs, optimizations, or insights discovered during implementation. Provide concise evidence.
+    - Observation: …
+      Evidence: …
+    ## Decision Log
+    Record every decision made while working on the plan in the format:
+    - Decision: …
+      Rationale: …
+      Date/Author: …
+    ## Outcomes & Retrospective
+    Summarize outcomes, gaps, and lessons learned at major milestones or at completion. Compare the result against the original purpose.
+    ## Context and Orientation
+    Describe the current state relevant to this task as if the reader knows nothing. Name the key files and modules by full path. Define any non-obvious term you will use. Do not refer to prior plans.
+    ## Plan of Work
+    Describe, in prose, the sequence of edits and additions. For each edit, name the file and location (function, module) and what to insert or change. Keep it concrete and minimal.
+    ## Concrete Steps
+    State the exact commands to run and where to run them (working directory). When a command generates output, show a short expected transcript so the reader can compare. This section must be updated as work proceeds.
+    ## Validation and Acceptance
+    Describe how to start or exercise the system and what to observe. Phrase acceptance as behavior, with specific inputs and outputs. If tests are involved, say "run <project’s test command> and expect <N> passed; the new test <name> fails before the change and passes after".
+    ## Idempotence and Recovery
+    If steps can be repeated safely, say so. If a step is risky, provide a safe retry or rollback path. Keep the environment clean after completion.
+    ## Artifacts and Notes
+    Include the most important transcripts, diffs, or snippets as indented examples. Keep them concise and focused on what proves success.
+    ## Interfaces and Dependencies
+    Be prescriptive. Name the libraries, modules, and services to use and why. Specify the types, traits/interfaces, and function signatures that must exist at the end of the milestone. Prefer stable names and paths such as `crate::module::function` or `package.submodule.Interface`. E.g.:
+    In crates/foo/planner.rs, define:
+        pub trait Planner {
+            fn plan(&self, observed: &Observed) -> Vec<Action>;
+        }
+    ## Pull Request
+    This section is the stable handoff contract to the PR/CI phase. Record:
+    - pr: <url>
+    - branch:
+    - commit:
+    - ci: <checks link or summary>
+    ## Review Findings
+    Populated by review workflow (e.g. `he-review`). Consolidate findings here with priorities and locations.
+    ## Verify/Release Decision
+    Populated by verify/release workflow (e.g. `he-verify-release`). Record GO/NO-GO plus evidence and rollback.
+If you follow the guidance above, a single, stateless agent -- or a human novice -- can read your Plan from top to bottom and produce a working, observable result. That is the bar: SELF-CONTAINED, SELF-SUFFICIENT, NOVICE-GUIDING, OUTCOME-FOCUSED.
+When you revise a plan, you must ensure your changes are comprehensively reflected across all sections, including the living document sections, and you must write a note at the bottom of the plan describing the change and the reason why. Plans must describe not just the what but the why for almost everything.

package/docs/PRODUCT_SENSE.md ADDED

@@ -0,0 +1,20 @@
+---
+title: "Product Sense"
+use_when: "Capturing target users, success outcomes, decision heuristics, and quality criteria for this repo."
+---
+## Target Users
+- Name the primary user and the primary job-to-be-done; list any secondary users explicitly.
+- Call out non-users (who this is not for) to reduce scope creep.
+## Key Outcomes
+- Define 1-3 outcomes that matter and how you will measure them (even if qualitative).
+- Prefer metrics tied to user time, reliability, and task completion.
+## Decision Heuristics
+- Prefer shipping a smaller, complete slice over a broad, partial feature.
+- Optimize for reducing user effort and reducing operational burden.
+## Quality Criteria
+- Clear error messages and recovery paths; no silent failures.
+- Sensible defaults and empty states; predictable navigation.

package/docs/RELIABILITY.md ADDED

@@ -0,0 +1,60 @@
+---
+title: "Reliability"
+use_when: "Capturing reliability goals, failure modes, and operational guardrails for this repo."
+---
+## Reliability Goals
+This plugin should behave predictably for two success paths: (1) full detection flow with GLiNER available and (2) degraded mode with regex-only detection when GLiNER cannot initialize.
+Primary targets:
+- Plugin registration should succeed on a clean build and load via OpenClaw extension loading.
+- Guardrail hook should never throw uncaught errors to the request path.
+- Tool responses should be deterministic for the same input and schema.
+- Degraded flow should preserve baseline privacy protection via regex and continue processing traffic.
+## Failure Modes
+1. **GLiNER model fails to initialize or load**
+   - Signal: startup warning in build/test output or runtime log.
+   - Effect: semantic entities from GLiNER are unavailable.
+   - Mitigation: continue with regex-only scan (already implemented with warning path).
+2. **Runtime GLiNER inference errors**
+   - Signal: the scanner logs the error and returns regex-only results.
+   - Effect: temporary loss of semantic detections only.
+   - Mitigation: keep regex detections and return a best-effort result.
+3. **Wrong plugin package metadata during review/installation**
+   - Signal: mismatch across `package.json`, lockfile, and docs.
+   - Effect: reviewer confusion or install friction.
+   - Mitigation: document and verify package identity consistently for submission target.
+4. **Model download or environment path issues**
+   - Signal: GLiNER init failures in constrained environments.
+   - Effect: reduced detection coverage.
+   - Mitigation: allow fallback to deterministic regex tests and avoid hard dependency in test harness.
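The degraded path described in failure modes 1 and 2 can be sketched as follows. This is a minimal illustration, not the package's actual scanner: `scanWithFallback`, the `Detector` type, and the `warn` callback are hypothetical names, and the `Entity` shape is copied from the design document elsewhere in this diff.

```typescript
// Hypothetical shapes for illustration; the real code in src/scanner.ts may differ.
interface Entity {
  text: string;
  label: string;
  start: number;
  end: number;
  confidence: number;
  source: "regex" | "gliner";
}

type Detector = (text: string) => Promise<Entity[]>;

// Run the regex pass first, then attempt the semantic pass. If the GLiNER
// detector throws, report it through `warn` and return the regex results.
async function scanWithFallback(
  text: string,
  regexScan: Detector,
  glinerScan: Detector,
  warn: (message: string) => void,
): Promise<Entity[]> {
  const regexEntities = await regexScan(text);
  try {
    const semanticEntities = await glinerScan(text);
    return regexEntities.concat(semanticEntities);
  } catch (err) {
    // Log only the error message, never the scanned text or entity values.
    const reason = err instanceof Error ? err.message : String(err);
    warn(`GLiNER unavailable, continuing in regex-only mode: ${reason}`);
    return regexEntities;
  }
}
```

Because the error is caught inside the scan call, the request path sees a normal result with reduced coverage rather than an exception.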
+## Monitoring
+No centralized telemetry is included in this repository. Reliability should be observed by:
+- test pass/fail trends,
+- OpenClaw plugin load or registration diagnostics,
+- explicit warning signals when GLiNER is unavailable.
+## Operational Guardrails
+- Keep `enabled: false` as a quick mitigation switch if plugin behavior needs immediate suppression.
+- Validate with reviewer smoke commands before any package-level change.
+- Prefer small, atomic commits for submission-related metadata and test updates.
+- For release risk, run build + unit tests before pushing, and re-run on merge.
+## Rollback Path
+If submission metadata changes cause regressions, revert to the last green commit from this initiative branch, then revalidate:
+- `npm test`
+- `npm run build`
+- plugin contract smoke test for hook/tool behavior.

package/docs/SECURITY.md ADDED

@@ -0,0 +1,50 @@
+---
+title: "Security"
+use_when: "Capturing security expectations for this repo: threat model, auth/authorization, data sensitivity, compliance, and required controls."
+---
+## Threat Model
+FogClaw processes user-provided prompt text and returns extracted entities with character offsets. The main risk is accidental exposure of sensitive information in logs, crash output, or plugin responses.
+Assume untrusted text arrives from user messages.
+For this repo, a concrete threat is that PII appearing in plain text is reflected back incorrectly (for example, redacted text that still leaks original spans) if redaction logic or logging is wrong.
+## Auth Model
+This package does not implement authentication itself. It is a plugin loaded by OpenClaw, and security is enforced by OpenClaw’s plugin installation and runtime controls.
+Within this repo, this means:
+- No custom auth credentials are accepted.
+- No authorization checks are added in the plugin itself.
+- Sensitive behavior is controlled by config and explicit runtime policy (`redact`, `warn`, `block`).
+## Data Sensitivity
+Treat plugin input as sensitive by default.
+- PII and custom entities are parsed from incoming messages.
+- Entity text is held in memory during scan and redaction only.
+- Do not write raw messages, detected entity values, or redaction mappings to disk.
+- Prefer hash or token strategies when persistence is required by caller policy.
+## Compliance
+This plugin is scoped as an on-device plugin. It does not claim HIPAA, GDPR, or SOC 2 compliance by itself.
+At minimum, callers should classify whether their usage has regulated data and enforce policy at the platform level (retention, purge, and audit).
+## Controls
+- Keep `api.logger` and local logs free of entity values.
+- Never store scan results in global caches.
+- Preserve plugin fallback behavior: if GLiNER initialization fails, continue in regex-only mode and do not fail the entire request path.
+- Enforce the plugin-level switch (`enabled`) to allow safe disablement without process restart if needed.
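One way to keep logs free of entity values is to log only label counts. The sketch below assumes the `Entity` shape from the design document; `summarizeForLog` is a hypothetical helper, not part of the package or of `api.logger`.

```typescript
// Entity shape as given in the design document elsewhere in this diff.
interface Entity {
  text: string;
  label: string;
  start: number;
  end: number;
  confidence: number;
  source: "regex" | "gliner";
}

// Reduce scan results to per-label counts so the log line never carries
// the matched text or its offsets.
function summarizeForLog(entities: Entity[]): string {
  const counts = new Map<string, number>();
  for (const entity of entities) {
    counts.set(entity.label, (counts.get(entity.label) ?? 0) + 1);
  }
  const parts = Array.from(counts.entries())
    .sort((a, b) => a[0].localeCompare(b[0]))
    .map((pair) => `${pair[0]}=${pair[1]}`);
  return `fogclaw: ${entities.length} entities (${parts.join(", ")})`;
}
```

A caller would pass the returned string to its logger instead of serializing the `Entity[]` array itself.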
+## OpenClaw-Specific Security Notes
+- Use clean plugin configuration.
+- Restrict plugin installation and publishing channels to trusted owners.
+- In reviews, verify `openclaw.plugin.json` and package metadata before listing; mismatched package identity can create install ambiguity.

package/docs/design-docs/core-beliefs.md ADDED

@@ -0,0 +1,17 @@
+# Core Beliefs
+Document the product and engineering beliefs that guide roadmap, architecture, and delivery decisions.
+<!-- seed: Populated from bootstrap Q8. If AGENTS.md has golden principles, those are suggested as starting points. -->
+## Belief 1
+- Statement:
+- Why it matters:
+- Tradeoffs:
+## Belief 2
+- Statement:
+- Why it matters:
+- Tradeoffs:

package/docs/design-docs/index.md ADDED

@@ -0,0 +1,8 @@
+# Design Docs Index
+Design rationale and deep dives live here.
+## Documents
+- `core-beliefs.md`

package/docs/generated/README.md ADDED

@@ -0,0 +1,36 @@
+# Generated Context
+This directory holds generated reference context that agents create to reason about the codebase. Files here are auto-generated snapshots — not hand-authored documentation.
+## When to Create
+Create a generated context file when a skill (`he-implement`, `he-review`, `he-doc-gardening`) discovers relevant project infrastructure during its workflow. Discovery signals and corresponding context files:
+| Discovery Signal | Context to Create | Example Filename |
+|---|---|---|
+| Database migrations or schema files exist | Schema snapshot | `db-schema.md` |
+| Route definitions or API framework detected | API endpoint index | `api-schema.md` |
+| UI component hierarchy (React, Vue, etc.) | Component tree map | `component-tree.md` |
+| Complex module dependency structure | Dependency graph | `dependency-graph.md` |
+This is not exhaustive — create whatever context helps agents reason about the project. The key rule: only create files for infrastructure that actually exists.
+## Format Contract
+Every generated file must include:
+```
+- last_updated: YYYY-MM-DD HH:MM
+```
+The `he-docs-lint` CI gate checks this timestamp on all files in this directory (except README.md and memory.md).
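The timestamp contract can be illustrated with a small check. This is only a sketch of what such a validation could look like; the actual gate is a shell script (`scripts/ci/he-docs-lint.sh`) whose contents are not shown here, and `hasValidLastUpdated` is a hypothetical name.

```typescript
// Matches a whole line such as "- last_updated: 2026-02-16 09:30".
const LAST_UPDATED = /^- last_updated: (\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2})$/m;

// Returns true when the Markdown text contains a well-formed last_updated line
// whose month, day, hour, and minute fall in plausible ranges.
function hasValidLastUpdated(markdown: string): boolean {
  const match = LAST_UPDATED.exec(markdown);
  if (match === null) return false;
  const month = Number(match[2]);
  const day = Number(match[3]);
  const hour = Number(match[4]);
  const minute = Number(match[5]);
  return month >= 1 && month <= 12 && day >= 1 && day <= 31 && hour <= 23 && minute <= 59;
}
```

The check is deliberately loose about calendar validity (it accepts day 31 in any month); it only confirms the `YYYY-MM-DD HH:MM` shape.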
+## Rules
+- **Do not** create files for infrastructure the project does not have.
+- **Do not** manually edit generated files — regenerate them from source.
+- **Do** regenerate when the underlying source changes (migrations added, routes modified, etc.).
+## memory.md
+`memory.md` is a separate concept: it is a scratchpad for observations and patterns discovered during work, processed by `he-learn`. It is not auto-generated context and is not subject to the `last_updated` requirement.

package/docs/generated/memory.md ADDED

@@ -0,0 +1 @@
+# Memory

package/docs/plans/2026-02-16-fogclaw-design.md ADDED

@@ -0,0 +1,172 @@
+# FogClaw Design Document
+**Date:** 2026-02-16
+**Repo:** `datafog/fogclaw` (public, MIT license)
+**Status:** Approved
+## Overview
+FogClaw is an OpenClaw plugin that brings DataFog's PII detection and redaction capabilities into the OpenClaw AI agent ecosystem. It acts as both a passive guardrail on message flow and an on-demand tool the agent can invoke explicitly. It uses a dual-engine approach: ported DataFog regex patterns for structured PII and GLiNER (via ONNX) for zero-shot NER on custom entities.
+## Decisions
+| Decision | Choice |
+|---|---|
+| Use case | Both guardrail + on-demand tool |
+| Language | Pure TypeScript (ONNX for GLiNER) |
+| Regex layer | Port DataFog regex patterns |
+| PII action | Configurable per-entity-type (default: redact) |
+| Custom terms | Config file (`fogclaw.config.json`) |
+| Default model | `onnx-community/gliner_large-v2.1` |
+| Architecture | Dual-layer (regex + GLiNER) |
+## Project Structure
+```
+fogclaw/
+├── openclaw.plugin.json          # OpenClaw plugin manifest
+├── package.json
+├── tsconfig.json
+├── fogclaw.config.example.json   # Example user config
+├── src/
+│   ├── index.ts                  # Plugin entry: register hook + tool
+│   ├── engines/
+│   │   ├── regex.ts              # Ported DataFog regex patterns
+│   │   └── gliner.ts             # GLiNER ONNX inference wrapper
+│   ├── scanner.ts                # Orchestrator: regex → GLiNER pipeline
+│   ├── redactor.ts               # Redaction strategies (token, mask, hash)
+│   ├── config.ts                 # Config loading & validation
+│   └── types.ts                  # Shared TypeScript types
+├── models/                       # Auto-downloaded ONNX model cache
+├── tests/
+│   ├── regex.test.ts
+│   ├── gliner.test.ts
+│   ├── scanner.test.ts
+│   └── redactor.test.ts
+└── README.md
+```
+## Detection Pipeline
+```
+Input text
+    │
+    ▼
+┌─────────────┐
+│  Regex Pass  │  ← emails, SSNs, phones, credit cards, IPs, dates, zips
+│  (~20µs/kB)  │     confidence: 1.0
+└─────┬───────┘
+      │
+      ▼
+┌─────────────┐
+│ GLiNER Pass  │  ← persons, orgs, locations + custom entities from config
+│  (ONNX)      │     confidence: 0.0-1.0
+└─────┬───────┘
+      │
+      ▼
+┌─────────────┐
+│   Merge &    │  ← Deduplicate overlapping spans, prefer higher confidence
+│  Normalize   │     Canonical type mapping (same as DataFog)
+└─────┬───────┘
+      │
+      ▼
+  Entity[] — unified results
+```
+### Entity Type
+```typescript
+interface Entity {
+  text: string;        // "john@example.com"
+  label: string;       // "EMAIL"
+  start: number;       // character offset
+  end: number;
+  confidence: number;  // 1.0 for regex, 0.0-1.0 for GLiNER
+  source: "regex" | "gliner";
+}
+```
+### Span Conflict Resolution
+When regex and GLiNER detect overlapping spans, prefer regex (confidence 1.0) for structured types, GLiNER for semantic types. Partially overlapping spans are resolved by higher confidence.
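The resolution rule can be sketched as a greedy pass over entities ranked by confidence. This is an assumption about how the merge step might work, not the package's confirmed algorithm: `resolveSpans` and `overlaps` are hypothetical names, and the tie-break on span length is an illustrative choice. Because regex matches carry confidence 1.0, they win any overlap with a GLiNER span under this ranking.

```typescript
interface Entity {
  text: string;
  label: string;
  start: number;
  end: number;
  confidence: number;
  source: "regex" | "gliner";
}

// Two half-open spans [start, end) overlap when each starts before the other ends.
const overlaps = (a: Entity, b: Entity): boolean => a.start < b.end && b.start < a.end;

// Keep the highest-confidence entity in every group of overlapping spans.
function resolveSpans(entities: Entity[]): Entity[] {
  const ranked = entities.slice().sort(
    (a, b) =>
      // Higher confidence first; on ties, prefer the longer span (assumption).
      b.confidence - a.confidence || (b.end - b.start) - (a.end - a.start),
  );
  const kept: Entity[] = [];
  for (const candidate of ranked) {
    if (!kept.some((k) => overlaps(k, candidate))) kept.push(candidate);
  }
  // Return results in document order.
  return kept.sort((a, b) => a.start - b.start);
}
```

For the input "Email john@example.com from Berlin", a regex `EMAIL` span over the whole address would displace a lower-confidence GLiNER `person` span covering only "john", while a non-overlapping `location` span is kept.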
+### GLiNER Labels
+Built-in: `["person", "organization", "location", "address", "date of birth", "medical record number", "account number", "passport number"]`
+Plus `custom_entities` from user config.
+## OpenClaw Integration
+### Hook (Guardrail)
+Registers a `before_agent_start` hook to intercept incoming messages. Per-entity-type actions:
+- **redact**: Replace with tokens like `[EMAIL]` (default)
+- **block**: Stop message, notify user
+- **warn**: Notify but allow message through
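How the three actions combine for a single message could look like the sketch below. It assumes that any `block` action stops the whole message and that labels without an explicit entry fall back to a default action; the design document does not spell out either rule, and `decide`, `GuardrailPolicy`, and `Decision` are hypothetical names.

```typescript
type Action = "redact" | "block" | "warn";

interface GuardrailPolicy {
  defaultAction: Action; // plays the role of a global default such as `guardrail_mode`
  entityActions: { [label: string]: Action | undefined };
}

interface Decision {
  blocked: boolean;
  warnLabels: string[];
  redactLabels: string[];
}

// Map the labels detected in one message to a message-level decision.
function decide(detectedLabels: string[], policy: GuardrailPolicy): Decision {
  const decision: Decision = { blocked: false, warnLabels: [], redactLabels: [] };
  for (const label of detectedLabels) {
    const action = policy.entityActions[label] ?? policy.defaultAction;
    if (action === "block") {
      decision.blocked = true; // assumption: one blocking entity stops the message
    } else if (action === "warn") {
      if (decision.warnLabels.indexOf(label) === -1) decision.warnLabels.push(label);
    } else if (decision.redactLabels.indexOf(label) === -1) {
      decision.redactLabels.push(label);
    }
  }
  return decision;
}
```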
+### Tools
+Two tools registered for on-demand use by the agent:
+1. **fogclaw_scan** — Detect entities in text, return structured results
+2. **fogclaw_redact** — Detect and redact entities, return sanitized text
+Both accept optional `custom_labels` parameter for ad-hoc zero-shot entity detection.
+### Redaction Strategies
+- **token**: `"Contact john@example.com"` → `"Contact [EMAIL]"`
+- **mask**: `"Contact john@example.com"` → `"Contact ****************"`
+- **hash**: `"Contact john@example.com"` → `"Contact [EMAIL_a1b2c3d4e5f6]"`
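The three strategies can be sketched as one replacement function applied span by span. Several details are assumptions rather than facts from the package: the `hash` strategy here uses SHA-256 truncated to 12 hex characters (the example above shows a 12-character suffix but does not name the algorithm), spans are assumed not to overlap, and `redact`, `replacementFor`, and `Span` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

type RedactStrategy = "token" | "mask" | "hash";

interface Span {
  label: string;
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
}

// Build the replacement text for one detected value.
function replacementFor(original: string, label: string, strategy: RedactStrategy): string {
  switch (strategy) {
    case "token":
      return `[${label}]`;
    case "mask":
      return "*".repeat(original.length);
    case "hash": {
      // Assumption: SHA-256, first 12 hex characters.
      const digest = createHash("sha256").update(original).digest("hex").slice(0, 12);
      return `[${label}_${digest}]`;
    }
  }
}

// Replace spans from last to first so earlier offsets remain valid.
function redact(text: string, spans: Span[], strategy: RedactStrategy): string {
  const ordered = spans.slice().sort((a, b) => b.start - a.start);
  let out = text;
  for (const span of ordered) {
    const original = text.slice(span.start, span.end);
    out = out.slice(0, span.start) + replacementFor(original, span.label, strategy) + out.slice(span.end);
  }
  return out;
}
```

Note that an unsalted hash of a low-entropy value (a phone number, for instance) can be recovered by brute force, so `hash` gives stable correlation tokens rather than strong anonymization.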
+## Configuration
+```json
+{
+  "enabled": true,
+  "guardrail_mode": "redact",
+  "redactStrategy": "token",
+  "model": "onnx-community/gliner_large-v2.1",
+  "confidence_threshold": 0.5,
+  "custom_entities": ["project codename", "internal tool name", "competitor name"],
+  "entityActions": {
+    "SSN": "block",
+    "CREDIT_CARD": "block",
+    "EMAIL": "redact",
+    "PHONE": "redact",
+    "PERSON": "warn"
+  }
+}
+```
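A validation pass over a config object of this shape might look like the following sketch. It checks only the keys shown in the example, treats every key as optional, and returns a list of error messages; all of that is an assumption, as the real `src/config.ts` is not shown in this part of the diff and `validateConfig` is a hypothetical name.

```typescript
const ACTIONS = ["redact", "block", "warn"];
const STRATEGIES = ["token", "mask", "hash"];

// Returns a list of human-readable problems; an empty list means the config passed.
function validateConfig(raw: { [key: string]: unknown }): string[] {
  const errors: string[] = [];

  const enabled = raw["enabled"];
  if (enabled !== undefined && typeof enabled !== "boolean") {
    errors.push("enabled must be a boolean");
  }
  const mode = raw["guardrail_mode"];
  if (mode !== undefined && (typeof mode !== "string" || ACTIONS.indexOf(mode) === -1)) {
    errors.push(`guardrail_mode must be one of ${ACTIONS.join(", ")}`);
  }
  const strategy = raw["redactStrategy"];
  if (strategy !== undefined && (typeof strategy !== "string" || STRATEGIES.indexOf(strategy) === -1)) {
    errors.push(`redactStrategy must be one of ${STRATEGIES.join(", ")}`);
  }
  const threshold = raw["confidence_threshold"];
  if (threshold !== undefined && (typeof threshold !== "number" || !(threshold >= 0 && threshold <= 1))) {
    errors.push("confidence_threshold must be a number between 0 and 1");
  }
  const custom = raw["custom_entities"];
  if (custom !== undefined && !(Array.isArray(custom) && custom.every((c) => typeof c === "string"))) {
    errors.push("custom_entities must be an array of strings");
  }
  const actions = raw["entityActions"];
  if (actions !== undefined) {
    if (typeof actions !== "object" || actions === null || Array.isArray(actions)) {
      errors.push("entityActions must be an object");
    } else {
      const table = actions as { [label: string]: unknown };
      for (const label of Object.keys(table)) {
        const action = table[label];
        if (typeof action !== "string" || ACTIONS.indexOf(action) === -1) {
          errors.push(`entityActions.${label} must be one of ${ACTIONS.join(", ")}`);
        }
      }
    }
  }
  return errors;
}
```

Collecting all problems instead of throwing on the first one lets a plugin report every misconfigured key in a single warning.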
+## Dependencies
+```json
+{
+  "dependencies": {
+    "gliner": "^0.x.x",
+    "onnxruntime-node": "^1.x"
+  },
+  "devDependencies": {
+    "vitest": "^2.x",
+    "typescript": "^5.x"
+  }
+}
+```
+## Technical Considerations
+**Model Loading:** Downloaded once from HuggingFace, cached in `~/.openclaw/extensions/fogclaw/models/`. Singleton pattern — stays loaded after first inference.
+**Error Handling:** GLiNER failure → fall back to regex-only with warning. Network failure during download → clear error with manual download instructions.
+**Performance:** Regex <1ms, GLiNER ~50-200ms per message. Well under 1s total — acceptable for messaging bots.
+## Not In v1 (YAGNI)
+- No outbound message scanning
+- No persistent audit log
+- No web UI for config
+- No GLiNER2 support (add later when npm ecosystem catches up)
+- No runtime entity label management (config file only)