visus-mcp 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/settings.local.json +36 -0
- package/CLAUDE.md +324 -0
- package/README.md +290 -0
- package/SECURITY.md +360 -0
- package/STATUS.md +482 -0
- package/TROUBLESHOOT-BUILD-20260319-1450.md +546 -0
- package/TROUBLESHOOT-FETCH-20260320-1150.md +168 -0
- package/TROUBLESHOOT-SSL-20260320-1138.md +171 -0
- package/TROUBLESHOOT-STRUCTURED-20260320-1200.md +246 -0
- package/TROUBLESHOOT-TEST-20260320-0942.md +281 -0
- package/VISUS-CLAUDE-CODE-PROMPT.md +324 -0
- package/VISUS-PROJECT-PLAN.md +198 -0
- package/dist/browser/__mocks__/playwright-renderer.d.ts +25 -0
- package/dist/browser/__mocks__/playwright-renderer.d.ts.map +1 -0
- package/dist/browser/__mocks__/playwright-renderer.js +119 -0
- package/dist/browser/__mocks__/playwright-renderer.js.map +1 -0
- package/dist/browser/playwright-renderer.d.ts +36 -0
- package/dist/browser/playwright-renderer.d.ts.map +1 -0
- package/dist/browser/playwright-renderer.js +115 -0
- package/dist/browser/playwright-renderer.js.map +1 -0
- package/dist/index.d.ts +14 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +129 -0
- package/dist/index.js.map +1 -0
- package/dist/sanitizer/index.d.ts +55 -0
- package/dist/sanitizer/index.d.ts.map +1 -0
- package/dist/sanitizer/index.js +89 -0
- package/dist/sanitizer/index.js.map +1 -0
- package/dist/sanitizer/injection-detector.d.ts +34 -0
- package/dist/sanitizer/injection-detector.d.ts.map +1 -0
- package/dist/sanitizer/injection-detector.js +89 -0
- package/dist/sanitizer/injection-detector.js.map +1 -0
- package/dist/sanitizer/patterns.d.ts +30 -0
- package/dist/sanitizer/patterns.d.ts.map +1 -0
- package/dist/sanitizer/patterns.js +372 -0
- package/dist/sanitizer/patterns.js.map +1 -0
- package/dist/sanitizer/pii-redactor.d.ts +29 -0
- package/dist/sanitizer/pii-redactor.d.ts.map +1 -0
- package/dist/sanitizer/pii-redactor.js +189 -0
- package/dist/sanitizer/pii-redactor.js.map +1 -0
- package/dist/tools/fetch-structured.d.ts +46 -0
- package/dist/tools/fetch-structured.d.ts.map +1 -0
- package/dist/tools/fetch-structured.js +186 -0
- package/dist/tools/fetch-structured.js.map +1 -0
- package/dist/tools/fetch.d.ts +44 -0
- package/dist/tools/fetch.d.ts.map +1 -0
- package/dist/tools/fetch.js +97 -0
- package/dist/tools/fetch.js.map +1 -0
- package/dist/types.d.ts +93 -0
- package/dist/types.d.ts.map +1 -0
- package/dist/types.js +16 -0
- package/dist/types.js.map +1 -0
- package/jest.config.js +30 -0
- package/jest.setup.js +9 -0
- package/package.json +52 -0
- package/src/browser/__mocks__/playwright-renderer.ts +140 -0
- package/src/browser/playwright-renderer.ts +142 -0
- package/src/index.ts +169 -0
- package/src/sanitizer/index.ts +127 -0
- package/src/sanitizer/injection-detector.ts +121 -0
- package/src/sanitizer/patterns.ts +424 -0
- package/src/sanitizer/pii-redactor.ts +226 -0
- package/src/tools/fetch-structured.ts +218 -0
- package/src/tools/fetch.ts +108 -0
- package/src/types.ts +101 -0
- package/test-output.txt +4 -0
- package/tests/fetch-tool.test.ts +329 -0
- package/tests/injection-corpus.ts +338 -0
- package/tests/sanitizer.test.ts +306 -0
- package/tsconfig.json +25 -0
|
@@ -0,0 +1,36 @@
|
|
|
1
|
+
{
|
|
2
|
+
"permissions": {
|
|
3
|
+
"allow": [
|
|
4
|
+
"Bash(cat:*)",
|
|
5
|
+
"Bash(npm install)",
|
|
6
|
+
"Bash(npm test:*)",
|
|
7
|
+
"Bash(tee:*)",
|
|
8
|
+
"Bash(npm run build:*)",
|
|
9
|
+
"Bash(npm run lint:*)",
|
|
10
|
+
"Bash(node --version:*)",
|
|
11
|
+
"Bash(npm --version)",
|
|
12
|
+
"Bash(npx tsc:*)",
|
|
13
|
+
"Bash(NODE_OPTIONS=\"--experimental-vm-modules\" npm test:*)",
|
|
14
|
+
"Bash(pkill -f \"npm\\|node\\|jest\\|tsc\")",
|
|
15
|
+
"Bash(npx playwright install:*)",
|
|
16
|
+
"Bash(killall:*)",
|
|
17
|
+
"Bash(PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=0 npx playwright install:*)",
|
|
18
|
+
"Bash(sw_vers:*)",
|
|
19
|
+
"Bash(PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1 npm run build:*)",
|
|
20
|
+
"Bash(npm uninstall:*)",
|
|
21
|
+
"Bash(pkill -9 -f \"npm|node|tsc|jest\")",
|
|
22
|
+
"Bash(node -e:*)",
|
|
23
|
+
"Bash(timeout 30 npm test -- --verbose --no-coverage)",
|
|
24
|
+
"Bash(echo:*)",
|
|
25
|
+
"Bash(npm install:*)",
|
|
26
|
+
"Bash(npx jest:*)",
|
|
27
|
+
"Bash(pkill -9 -f \"jest|npm\")",
|
|
28
|
+
"Bash(pkill:*)",
|
|
29
|
+
"Bash(find:*)",
|
|
30
|
+
"Bash(defaults read:*)",
|
|
31
|
+
"Bash(rsync:*)"
|
|
32
|
+
],
|
|
33
|
+
"deny": [],
|
|
34
|
+
"ask": []
|
|
35
|
+
}
|
|
36
|
+
}
|
package/CLAUDE.md
ADDED
|
@@ -0,0 +1,324 @@
|
|
|
1
|
+
# CLAUDE.md
|
|
2
|
+
|
|
3
|
+
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
|
4
|
+
|
|
5
|
+
## Project Overview
|
|
6
|
+
|
|
7
|
+
**Visus** (`visus-mcp`) is an MCP tool that provides Claude with secure, sanitized access to web pages. Unlike other MCP browser tools (Firecrawl, Playwright MCP, ScrapeGraphAI), Visus runs ALL fetched content through an injection sanitization pipeline before the LLM reads it.
|
|
8
|
+
|
|
9
|
+
Core differentiator: *"What the web shows you, Lateos reads safely."*
|
|
10
|
+
|
|
11
|
+
This is part of the Lateos platform — a security-by-design AI agent framework deployed on AWS serverless (Lambda, Step Functions, API Gateway, Cognito, Bedrock with Guardrails, DynamoDB with KMS encryption).
|
|
12
|
+
|
|
13
|
+
## Architecture
|
|
14
|
+
|
|
15
|
+
The system follows this flow:
|
|
16
|
+
```
|
|
17
|
+
User provides URL → Visus MCP Tool → Browser rendering (Playwright) →
|
|
18
|
+
Raw HTML extraction → Injection Sanitizer (43 patterns) → PII Redactor →
|
|
19
|
+
Clean content → Claude via MCP
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
### Two MCP Tools
|
|
23
|
+
|
|
24
|
+
1. **`visus_fetch(url, options?)`** - Returns sanitized markdown/text from a URL
|
|
25
|
+
2. **`visus_fetch_structured(url, schema)`** - Extracts structured data with sanitization
|
|
26
|
+
|
|
27
|
+
Both tools MUST always pass content through the sanitizer — this cannot be bypassed.
|
|
28
|
+
|
|
29
|
+
## Key Components
|
|
30
|
+
|
|
31
|
+
### Sanitizer (The Core Product)
|
|
32
|
+
Location: `src/sanitizer/`
|
|
33
|
+
|
|
34
|
+
The sanitizer is the product's primary moat. It must detect and neutralize 43 injection pattern categories:
|
|
35
|
+
|
|
36
|
+
1. Direct instruction injection ("Ignore previous instructions")
|
|
37
|
+
2. Role hijacking ("You are now", "Act as")
|
|
38
|
+
3. System prompt extraction ("Repeat your instructions")
|
|
39
|
+
4. Privilege escalation ("Admin mode", "Developer override")
|
|
40
|
+
5. Context poisoning ("The user said", "You already agreed")
|
|
41
|
+
6. Data exfiltration ("Send this to", "Email the following")
|
|
42
|
+
7. Encoding obfuscation (Base64, Unicode lookalikes, hex-encoded)
|
|
43
|
+
8. Whitespace hiding (zero-width chars, invisible Unicode)
|
|
44
|
+
9. HTML/script injection (`<script>`, `<iframe>`, `onclick`)
|
|
45
|
+
10. Markdown injection (malicious link syntax, image payloads)
|
|
46
|
+
11. URL fragment attacks (instructions after `#`)
|
|
47
|
+
12. Social engineering (urgency language)
|
|
48
|
+
... (43 total)
|
|
49
|
+
|
|
50
|
+
**Sanitizer behavior:**
|
|
51
|
+
- Detect → log pattern name to `sanitization.patterns_detected`
|
|
52
|
+
- Neutralize → strip, replace with `[REDACTED: pattern_name]`, or escape
|
|
53
|
+
- Never block entire page — degrade gracefully
|
|
54
|
+
- PII redaction: email, phone, SSN, credit card, IP addresses
|
|
55
|
+
- PII format: `[REDACTED:EMAIL]`, `[REDACTED:PHONE]`, etc.
|
|
56
|
+
|
|
57
|
+
### Browser Rendering
|
|
58
|
+
Location: `src/browser/playwright-renderer.ts`
|
|
59
|
+
|
|
60
|
+
Uses Playwright headless Chromium to fetch pages. Phase 1 uses headless only; Phase 2 adds user-session relay for login-gated pages.
|
|
61
|
+
|
|
62
|
+
## Development Commands
|
|
63
|
+
|
|
64
|
+
Since this is a new project, these commands will be added to `package.json`:
|
|
65
|
+
|
|
66
|
+
```bash
|
|
67
|
+
npm run build # Compile TypeScript to /dist
|
|
68
|
+
npm test # Run Jest test suite (must have 0 failures)
|
|
69
|
+
npm run lint # TypeScript strict mode checks
|
|
70
|
+
npm publish --dry-run # Validate package before publishing
|
|
71
|
+
npx visus-mcp # Start MCP server
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
## Coding Standards (Lateos Conventions)
|
|
75
|
+
|
|
76
|
+
- **TypeScript strict mode** - No `any` types allowed
|
|
77
|
+
- **Error handling** - Never throw raw errors; return typed Result objects
|
|
78
|
+
- **Logging** - Structured JSON to stderr (NOT stdout — MCP protocol uses stdout)
|
|
79
|
+
- **Documentation** - All public functions must have JSDoc comments
|
|
80
|
+
- **Tests** - Jest, located in `/tests`, minimum 80% coverage
|
|
81
|
+
- **Security** - No secrets in code; read from environment variables
|
|
82
|
+
- **Build output** - `tsc` compiles to `/dist`
|
|
83
|
+
|
|
84
|
+
## Test Requirements
|
|
85
|
+
|
|
86
|
+
All tests must pass before Phase 1 is complete.
|
|
87
|
+
|
|
88
|
+
### `tests/sanitizer.test.ts`
|
|
89
|
+
- Each of 43 pattern categories with at least one positive test case
|
|
90
|
+
- PII detection: email, phone, SSN, credit card
|
|
91
|
+
- Clean content passes through unmodified (no false positives)
|
|
92
|
+
- `content_modified: false` when no patterns detected
|
|
93
|
+
- `content_modified: true` and `patterns_detected` populated when injection found
|
|
94
|
+
|
|
95
|
+
### `tests/fetch-tool.test.ts`
|
|
96
|
+
- `visus_fetch` returns expected output shape
|
|
97
|
+
- `visus_fetch_structured` extracts fields correctly
|
|
98
|
+
- Timeout handling
|
|
99
|
+
- Invalid URL handling
|
|
100
|
+
- Sanitizer is always called (cannot be bypassed)
|
|
101
|
+
|
|
102
|
+
### `tests/injection-corpus.ts`
|
|
103
|
+
- 43 injection payloads (one per pattern category)
|
|
104
|
+
- 10 clean pages/content samples (should produce no detections)
|
|
105
|
+
|
|
106
|
+
## Standard Troubleshooting Protocol
|
|
107
|
+
|
|
108
|
+
Whenever you encounter an error, blocked deployment, or multi-step recovery task, you MUST generate a structured troubleshooting log automatically. This is not optional — it applies to every build, fix, and diagnostic task in this project.
|
|
109
|
+
|
|
110
|
+
### Log File Naming
|
|
111
|
+
|
|
112
|
+
```
|
|
113
|
+
TROUBLESHOOT-<CONTEXT>-$(date +%Y%m%d-%H%M).md
|
|
114
|
+
```
|
|
115
|
+
|
|
116
|
+
Save to project root. Examples:
|
|
117
|
+
- `TROUBLESHOOT-CDK-20260314-0621.md`
|
|
118
|
+
- `TROUBLESHOOT-LAMBDA-20260314-0900.md`
|
|
119
|
+
- `TROUBLESHOOT-SLIM-20260314-0629.md`
|
|
120
|
+
|
|
121
|
+
### Entry Format (append after EVERY action)
|
|
122
|
+
|
|
123
|
+
```markdown
|
|
124
|
+
## [HH:MM:SS] Step N - <short title>
|
|
125
|
+
|
|
126
|
+
**Goal:** What this step is trying to accomplish
|
|
127
|
+
**Reasoning:** Why this approach was chosen over alternatives
|
|
128
|
+
**Action:** Exact command or operation performed
|
|
129
|
+
**Result:** Raw output, error messages, success confirmation
|
|
130
|
+
**Status:** ✅ Success / ❌ Failed / ⚠️ Partial
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
### Rules
|
|
134
|
+
|
|
135
|
+
1. **Log BEFORE executing, not after** — write Goal and Reasoning first
|
|
136
|
+
2. **Never skip a step** even if obvious or trivial
|
|
137
|
+
3. **On failure:** log the full error, state your revised reasoning, attempt one alternative, log that too
|
|
138
|
+
4. **Do not summarize or clean up errors** — paste raw output verbatim
|
|
139
|
+
5. **End every log with a SUMMARY section:** root cause, resolution, lessons learned, and open issues
|
|
140
|
+
|
|
141
|
+
### Purpose
|
|
142
|
+
|
|
143
|
+
These logs are tool-use execution traces for future agent training. The **Reasoning** field is the highest-value signal — always explain **WHY**, not just **WHAT**.
|
|
144
|
+
|
|
145
|
+
**Example log structure:**
|
|
146
|
+
|
|
147
|
+
```markdown
|
|
148
|
+
# Lateos MCP Handler - Emergency Recovery Log
|
|
149
|
+
|
|
150
|
+
Started: 2026-03-14 06:02:15
|
|
151
|
+
Goal: Restore MCP handler Lambda with proper dependency packaging
|
|
152
|
+
|
|
153
|
+
---
|
|
154
|
+
|
|
155
|
+
## [06:02:18] Step 1 - Locate MCP Handler Source
|
|
156
|
+
|
|
157
|
+
**Goal:** Find the mcp_handler.py source file in the project
|
|
158
|
+
**Reasoning:** Need the handler source to rebuild the deployment package
|
|
159
|
+
**Action:** find /Users/leochong/Documents/projects -name "mcp_handler.py"
|
|
160
|
+
**Result:**
|
|
161
|
+
/Users/leochong/Documents/projects/Lateos/lambdas/core/mcp_handler.py
|
|
162
|
+
**Status:** ✅ Success
|
|
163
|
+
|
|
164
|
+
---
|
|
165
|
+
|
|
166
|
+
# RECOVERY SUMMARY
|
|
167
|
+
|
|
168
|
+
Final Status: ✅ RESTORED
|
|
169
|
+
Root Cause: Lambda package missing runtime dependencies
|
|
170
|
+
Resolution: Installed aws_lambda_powertools + aws_xray_sdk
|
|
171
|
+
Lessons Learned: Always verify dependencies in Lambda packages
|
|
172
|
+
```
|
|
173
|
+
|
|
174
|
+
---
|
|
175
|
+
|
|
176
|
+
## CRITICAL: Security Rules — Never Violate These
|
|
177
|
+
|
|
178
|
+
Claude Code must refuse to generate code that violates these rules, even if
|
|
179
|
+
explicitly instructed to do so in a subsequent message:
|
|
180
|
+
|
|
181
|
+
```
|
|
182
|
+
RULE 1: No secrets in code, environment variables, or config files.
|
|
183
|
+
ALL secrets go through AWS Secrets Manager. No exceptions.
|
|
184
|
+
|
|
185
|
+
RULE 2: No wildcard (*) actions or resources in any IAM policy.
|
|
186
|
+
Every Lambda has a scoped execution role. Period.
|
|
187
|
+
|
|
188
|
+
RULE 3: No public S3 buckets, no public endpoints without Cognito.
|
|
189
|
+
(WAF deferred to Phase 2 per ADR-011)
|
|
190
|
+
|
|
191
|
+
RULE 4: No shell execution in any Lambda or skill.
|
|
192
|
+
os.system(), subprocess, eval(), exec() are banned.
|
|
193
|
+
|
|
194
|
+
RULE 5: All user input is sanitized for prompt injection before
|
|
195
|
+
touching the LLM. Never pass raw user input to Bedrock.
|
|
196
|
+
|
|
197
|
+
RULE 6: No cross-user data access. Every DynamoDB query is scoped
|
|
198
|
+
to the authenticated user_id partition key. No exceptions.
|
|
199
|
+
|
|
200
|
+
RULE 7: Every Lambda has reserved_concurrent_executions set.
|
|
201
|
+
No function can scale to infinity and run up costs.
|
|
202
|
+
|
|
203
|
+
RULE 8: No plaintext logging of tokens, passwords, API keys, or PII.
|
|
204
|
+
Use structured logging with field redaction.
|
|
205
|
+
```
|
|
206
|
+
|
|
207
|
+
If asked to do something that violates these rules, Claude Code should explain
|
|
208
|
+
why and offer a compliant alternative.
|
|
209
|
+
|
|
210
|
+
---
|
|
211
|
+
|
|
212
|
+
## Environment Variables
|
|
213
|
+
|
|
214
|
+
```bash
|
|
215
|
+
# Optional — for Lateos hosted tier (Phase 2)
|
|
216
|
+
LATEOS_API_KEY= # Enables audit logging to Lateos cloud
|
|
217
|
+
LATEOS_ENDPOINT= # Defaults to https://api.lateos.ai
|
|
218
|
+
|
|
219
|
+
# Optional — browser config
|
|
220
|
+
VISUS_TIMEOUT_MS=10000 # Default fetch timeout
|
|
221
|
+
VISUS_MAX_CONTENT_KB=512 # Max content size before truncation
|
|
222
|
+
```
|
|
223
|
+
|
|
224
|
+
No API key required for open-source tier. `npx visus-mcp` works out of the box.
|
|
225
|
+
|
|
226
|
+
## Project Structure
|
|
227
|
+
|
|
228
|
+
```
|
|
229
|
+
lateos-visus/
|
|
230
|
+
├── src/
|
|
231
|
+
│ ├── index.ts # MCP server entry, tool registration
|
|
232
|
+
│ ├── tools/
|
|
233
|
+
│ │ ├── fetch.ts # visus_fetch(url, options?)
|
|
234
|
+
│ │ └── fetch-structured.ts # visus_fetch_structured(url, schema)
|
|
235
|
+
│ ├── sanitizer/
|
|
236
|
+
│ │ ├── index.ts # Sanitizer orchestrator
|
|
237
|
+
│ │ ├── injection-detector.ts # Pattern matching engine
|
|
238
|
+
│ │ ├── pii-redactor.ts # PII detection and redaction
|
|
239
|
+
│ │ └── patterns.ts # 43 injection pattern definitions
|
|
240
|
+
│ ├── browser/
|
|
241
|
+
│ │ └── playwright-renderer.ts # Headless Chromium page fetcher
|
|
242
|
+
│ └── types.ts # Shared TypeScript interfaces
|
|
243
|
+
└── tests/
|
|
244
|
+
├── sanitizer.test.ts
|
|
245
|
+
├── fetch-tool.test.ts
|
|
246
|
+
└── injection-corpus.ts # Test payload library
|
|
247
|
+
```
|
|
248
|
+
|
|
249
|
+
## Phase 1 Definition of Done
|
|
250
|
+
|
|
251
|
+
- [ ] `npx visus-mcp` starts an MCP server with both tools registered
|
|
252
|
+
- [ ] `visus_fetch("https://example.com")` returns sanitized markdown
|
|
253
|
+
- [ ] All 43 pattern categories have test cases that pass
|
|
254
|
+
- [ ] No false positives on 10 clean content samples
|
|
255
|
+
- [ ] README leads with security narrative
|
|
256
|
+
- [ ] SECURITY.md documents the threat model
|
|
257
|
+
- [ ] `npm test` passes with 0 failures
|
|
258
|
+
- [ ] `npm run build` produces clean `/dist`
|
|
259
|
+
- [ ] `npm publish --dry-run` succeeds
|
|
260
|
+
|
|
261
|
+
## What NOT to Build in Phase 1
|
|
262
|
+
|
|
263
|
+
- No AWS Lambda deployment (Phase 2)
|
|
264
|
+
- No DynamoDB audit logging (Phase 2)
|
|
265
|
+
- No Cognito auth (Phase 2)
|
|
266
|
+
- No user-session relay / Chrome extension (Phase 3)
|
|
267
|
+
- No Lateos dashboard integration (Phase 2)
|
|
268
|
+
- No paid tier gating (Phase 2)
|
|
269
|
+
|
|
270
|
+
Keep Phase 1 lean: a working, publishable open-source MCP tool with security-first documentation.
|
|
271
|
+
|
|
272
|
+
## Implementation Order
|
|
273
|
+
|
|
274
|
+
Start with the sanitizer — it is the product:
|
|
275
|
+
|
|
276
|
+
1. Define all 43 patterns in `src/sanitizer/patterns.ts`
|
|
277
|
+
2. Build the sanitizer engine against those patterns
|
|
278
|
+
3. Build the Playwright renderer
|
|
279
|
+
4. Wire into MCP tools
|
|
280
|
+
5. Write tests (sanitizer tests FIRST)
|
|
281
|
+
6. Write README and SECURITY.md last
|
|
282
|
+
|
|
283
|
+
Do not proceed past the sanitizer until the pattern library and basic detection logic are complete and unit-tested.
|
|
284
|
+
|
|
285
|
+
## Tool Output Schemas
|
|
286
|
+
|
|
287
|
+
### `visus_fetch` Output
|
|
288
|
+
```typescript
|
|
289
|
+
{
|
|
290
|
+
url: string,
|
|
291
|
+
content: string, // Sanitized content
|
|
292
|
+
sanitization: {
|
|
293
|
+
patterns_detected: string[], // Names of injection patterns found
|
|
294
|
+
pii_types_redacted: string[], // e.g. ["email", "phone", "ssn"]
|
|
295
|
+
content_modified: boolean
|
|
296
|
+
},
|
|
297
|
+
metadata: {
|
|
298
|
+
title: string,
|
|
299
|
+
fetched_at: string, // ISO timestamp
|
|
300
|
+
content_length_original: number,
|
|
301
|
+
content_length_sanitized: number
|
|
302
|
+
}
|
|
303
|
+
}
|
|
304
|
+
```
|
|
305
|
+
|
|
306
|
+
### `visus_fetch_structured` Output
|
|
307
|
+
```typescript
|
|
308
|
+
{
|
|
309
|
+
url: string,
|
|
310
|
+
data: Record<string, string | null>, // Extracted fields, sanitized
|
|
311
|
+
sanitization: { /* same as above */ },
|
|
312
|
+
metadata: { /* same as above */ }
|
|
313
|
+
}
|
|
314
|
+
```
|
|
315
|
+
|
|
316
|
+
## Security-First Documentation
|
|
317
|
+
|
|
318
|
+
Both README.md and SECURITY.md must lead with the security narrative, not features:
|
|
319
|
+
- The problem with other tools (raw content passed to LLM)
|
|
320
|
+
- How Visus works (fetch → sanitize → return)
|
|
321
|
+
- 43 pattern categories with examples
|
|
322
|
+
- PII redaction types and format
|
|
323
|
+
- Honest limitations (novel obfuscation, AI-generated benign-looking instructions)
|
|
324
|
+
- Vulnerability reporting: security@lateos.ai or GitHub Security tab
|
package/README.md
ADDED
|
@@ -0,0 +1,290 @@
|
|
|
1
|
+
# Visus — Secure Web Access for Claude
|
|
2
|
+
|
|
3
|
+
> **Every MCP browser tool passes raw web content to your LLM. Visus doesn't.**
|
|
4
|
+
|
|
5
|
+
Visus is an MCP (Model Context Protocol) tool that provides Claude with secure, sanitized access to web pages. Built by [Lateos](https://lateos.ai), Visus runs **all** fetched content through a comprehensive injection sanitization pipeline before the LLM reads a single character.
|
|
6
|
+
|
|
7
|
+
**Tagline:** *"What the web shows you, Lateos reads safely."*
|
|
8
|
+
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## The Problem with Other Tools
|
|
12
|
+
|
|
13
|
+
Popular MCP browser tools like Firecrawl, Playwright MCP, and ScrapeGraphAI pass untrusted web content directly to your LLM without sanitization. This creates a **critical security vulnerability**:
|
|
14
|
+
|
|
15
|
+
- **Prompt injection attacks** can manipulate AI behavior
|
|
16
|
+
- **Personal identifiable information (PII)** can leak into conversation logs
|
|
17
|
+
- **Malicious instructions** hidden in web pages can compromise your AI agent
|
|
18
|
+
|
|
19
|
+
Visus solves this by treating **every web page as untrusted input** and sanitizing it before your LLM sees it.
|
|
20
|
+
|
|
21
|
+
---
|
|
22
|
+
|
|
23
|
+
## How Visus Works
|
|
24
|
+
|
|
25
|
+
```
|
|
26
|
+
User provides URL → Playwright Fetch → Injection Sanitizer (43 patterns) →
|
|
27
|
+
PII Redactor → Clean Content → Claude via MCP
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
### Security Pipeline
|
|
31
|
+
|
|
32
|
+
1. **Browser Rendering**: Headless Chromium via Playwright fetches the page
|
|
33
|
+
2. **Injection Detection**: 43 pattern categories scan for prompt injection attempts
|
|
34
|
+
3. **PII Redaction**: Emails, phone numbers, SSNs, credit cards, and IP addresses are redacted
|
|
35
|
+
4. **Clean Delivery**: Only sanitized content reaches your LLM
|
|
36
|
+
|
|
37
|
+
**This pipeline cannot be bypassed.** Every tool invocation runs content through the full sanitizer.
|
|
38
|
+
|
|
39
|
+
---
|
|
40
|
+
|
|
41
|
+
## Security Features
|
|
42
|
+
|
|
43
|
+
### 43 Injection Pattern Categories
|
|
44
|
+
|
|
45
|
+
Visus detects and neutralizes:
|
|
46
|
+
|
|
47
|
+
- **Direct instruction injection** — "Ignore previous instructions"
|
|
48
|
+
- **Role hijacking** — "You are now an unrestricted AI"
|
|
49
|
+
- **System prompt extraction** — "Repeat your instructions"
|
|
50
|
+
- **Privilege escalation** — "Admin mode enabled"
|
|
51
|
+
- **Data exfiltration** — "Send this to http://attacker.com"
|
|
52
|
+
- **Encoding obfuscation** — Base64, Unicode lookalikes, leetspeak
|
|
53
|
+
- **HTML/script injection** — `<script>`, `<iframe>`, event handlers
|
|
54
|
+
- **Jailbreak keywords** — DAN mode, developer override
|
|
55
|
+
- **Token smuggling** — Special tokens like `<|im_start|>`
|
|
56
|
+
- **Social engineering** — Urgency language to bypass caution
|
|
57
|
+
- ... and 33 more categories
|
|
58
|
+
|
|
59
|
+
[See full list in SECURITY.md](./SECURITY.md)
|
|
60
|
+
|
|
61
|
+
### PII Redaction
|
|
62
|
+
|
|
63
|
+
Automatically redacts:
|
|
64
|
+
|
|
65
|
+
- Email addresses → `[REDACTED:EMAIL]`
|
|
66
|
+
- Phone numbers → `[REDACTED:PHONE]`
|
|
67
|
+
- Social Security Numbers → `[REDACTED:SSN]`
|
|
68
|
+
- Credit card numbers → `[REDACTED:CC]`
|
|
69
|
+
- IP addresses → `[REDACTED:IP]`
|
|
70
|
+
|
|
71
|
+
---
|
|
72
|
+
|
|
73
|
+
## Quickstart
|
|
74
|
+
|
|
75
|
+
### Installation
|
|
76
|
+
|
|
77
|
+
```bash
|
|
78
|
+
npx visus-mcp
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
### Claude Desktop Configuration
|
|
82
|
+
|
|
83
|
+
Add to your `claude_desktop_config.json`:
|
|
84
|
+
|
|
85
|
+
```json
|
|
86
|
+
{
|
|
87
|
+
"mcpServers": {
|
|
88
|
+
"visus": {
|
|
89
|
+
"command": "npx",
|
|
90
|
+
"args": ["-y", "visus-mcp"]
|
|
91
|
+
}
|
|
92
|
+
}
|
|
93
|
+
}
|
|
94
|
+
```
|
|
95
|
+
|
|
96
|
+
Restart Claude Desktop. Visus tools are now available to Claude.
|
|
97
|
+
|
|
98
|
+
---
|
|
99
|
+
|
|
100
|
+
## MCP Tools
|
|
101
|
+
|
|
102
|
+
### `visus_fetch`
|
|
103
|
+
|
|
104
|
+
Fetch and sanitize a web page.
|
|
105
|
+
|
|
106
|
+
**Input:**
|
|
107
|
+
```json
|
|
108
|
+
{
|
|
109
|
+
"url": "https://example.com",
|
|
110
|
+
"format": "markdown", // or "text"
|
|
111
|
+
"timeout_ms": 10000 // optional
|
|
112
|
+
}
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
**Output:**
|
|
116
|
+
```json
|
|
117
|
+
{
|
|
118
|
+
"url": "https://example.com",
|
|
119
|
+
"content": "# Page Title\n\nSanitized page content...",
|
|
120
|
+
"sanitization": {
|
|
121
|
+
"patterns_detected": ["direct_instruction_injection"],
|
|
122
|
+
"pii_types_redacted": ["email", "phone"],
|
|
123
|
+
"content_modified": true
|
|
124
|
+
},
|
|
125
|
+
"metadata": {
|
|
126
|
+
"title": "Example Domain",
|
|
127
|
+
"fetched_at": "2024-01-15T10:30:00.000Z",
|
|
128
|
+
"content_length_original": 5000,
|
|
129
|
+
"content_length_sanitized": 4800
|
|
130
|
+
}
|
|
131
|
+
}
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
### `visus_fetch_structured`
|
|
135
|
+
|
|
136
|
+
Extract structured data from a web page according to a schema.
|
|
137
|
+
|
|
138
|
+
**Input:**
|
|
139
|
+
```json
|
|
140
|
+
{
|
|
141
|
+
"url": "https://shop.example.com/product",
|
|
142
|
+
"schema": {
|
|
143
|
+
"title": "product name",
|
|
144
|
+
"price": "product price",
|
|
145
|
+
"description": "product description"
|
|
146
|
+
},
|
|
147
|
+
"timeout_ms": 10000 // optional
|
|
148
|
+
}
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
**Output:**
|
|
152
|
+
```json
|
|
153
|
+
{
|
|
154
|
+
"url": "https://shop.example.com/product",
|
|
155
|
+
"data": {
|
|
156
|
+
"title": "Awesome Product",
|
|
157
|
+
"price": "$99.99",
|
|
158
|
+
"description": "A great product for your needs"
|
|
159
|
+
},
|
|
160
|
+
"sanitization": {
|
|
161
|
+
"patterns_detected": [],
|
|
162
|
+
"pii_types_redacted": [],
|
|
163
|
+
"content_modified": false
|
|
164
|
+
},
|
|
165
|
+
"metadata": {
|
|
166
|
+
"title": "Product Page",
|
|
167
|
+
"fetched_at": "2024-01-15T10:30:00.000Z",
|
|
168
|
+
"content_length_original": 8000,
|
|
169
|
+
"content_length_sanitized": 8000
|
|
170
|
+
}
|
|
171
|
+
}
|
|
172
|
+
```
|
|
173
|
+
|
|
174
|
+
All extracted fields are individually sanitized.
|
|
175
|
+
|
|
176
|
+
---
|
|
177
|
+
|
|
178
|
+
## Environment Variables
|
|
179
|
+
|
|
180
|
+
```bash
|
|
181
|
+
# Optional — for Lateos hosted tier features (Phase 2)
|
|
182
|
+
LATEOS_API_KEY=your-api-key # Enables audit logging to Lateos cloud
|
|
183
|
+
LATEOS_ENDPOINT=https://api.lateos.ai
|
|
184
|
+
|
|
185
|
+
# Optional — browser config
|
|
186
|
+
VISUS_TIMEOUT_MS=10000 # Default fetch timeout (milliseconds)
|
|
187
|
+
VISUS_MAX_CONTENT_KB=512 # Max content size before truncation (kilobytes)
|
|
188
|
+
```
|
|
189
|
+
|
|
190
|
+
**No API key required for open-source tier.** `npx visus-mcp` works out of the box.
|
|
191
|
+
|
|
192
|
+
---
|
|
193
|
+
|
|
194
|
+
## Lateos Platform
|
|
195
|
+
|
|
196
|
+
Visus is part of the **Lateos** platform — a security-by-design AI agent framework:
|
|
197
|
+
|
|
198
|
+
- **AWS Serverless**: Lambda, Step Functions, API Gateway, Cognito
|
|
199
|
+
- **Security**: Bedrock Guardrails, KMS encryption, Secrets Manager
|
|
200
|
+
- **Validated Patterns**: 43 injection patterns, 73/73 passing tests
|
|
201
|
+
- **CISSP/CEH-Informed**: Designed by security professionals
|
|
202
|
+
|
|
203
|
+
Learn more: [lateos.ai](https://lateos.ai) (Phase 2)
|
|
204
|
+
|
|
205
|
+
---
|
|
206
|
+
|
|
207
|
+
## Development
|
|
208
|
+
|
|
209
|
+
```bash
|
|
210
|
+
# Clone repo
|
|
211
|
+
git clone https://github.com/visus-mcp/visus-mcp.git
|
|
212
|
+
cd visus-mcp
|
|
213
|
+
|
|
214
|
+
# Install dependencies
|
|
215
|
+
npm install
|
|
216
|
+
|
|
217
|
+
# Build
|
|
218
|
+
npm run build
|
|
219
|
+
|
|
220
|
+
# Run tests
|
|
221
|
+
npm test
|
|
222
|
+
|
|
223
|
+
# Start MCP server
|
|
224
|
+
npm start
|
|
225
|
+
```
|
|
226
|
+
|
|
227
|
+
---
|
|
228
|
+
|
|
229
|
+
## Project Status
|
|
230
|
+
|
|
231
|
+
**Phase 1** (Current): Open-source MCP tool with local sanitization
|
|
232
|
+
|
|
233
|
+
**Phase 2** (Planned):
|
|
234
|
+
- Lateos cloud integration for audit logging
|
|
235
|
+
- User session relay for authenticated pages
|
|
236
|
+
- Hosted tier with SLA guarantees
|
|
237
|
+
|
|
238
|
+
**Phase 3** (Future):
|
|
239
|
+
- Chrome extension for session relay
|
|
240
|
+
- Real-time threat dashboard
|
|
241
|
+
- Custom pattern libraries
|
|
242
|
+
|
|
243
|
+
---
|
|
244
|
+
|
|
245
|
+
## Security
|
|
246
|
+
|
|
247
|
+
For detailed threat model, pattern examples, and vulnerability reporting:
|
|
248
|
+
|
|
249
|
+
**[→ Read SECURITY.md](./SECURITY.md)**
|
|
250
|
+
|
|
251
|
+
Report vulnerabilities: **security@lateos.ai** or [GitHub Security](https://github.com/visus-mcp/visus-mcp/security)
|
|
252
|
+
|
|
253
|
+
---
|
|
254
|
+
|
|
255
|
+
## License
|
|
256
|
+
|
|
257
|
+
MIT License
|
|
258
|
+
|
|
259
|
+
Copyright (c) 2024 Lateos (Leo Chongolnee)
|
|
260
|
+
|
|
261
|
+
---
|
|
262
|
+
|
|
263
|
+
## Credits
|
|
264
|
+
|
|
265
|
+
Built by [Leo Chongolnee](https://github.com/leochong) (@leochong) as part of the Lateos platform.
|
|
266
|
+
|
|
267
|
+
Inspired by the MCP ecosystem and informed by CISSP/CEH security principles.
|
|
268
|
+
|
|
269
|
+
---
|
|
270
|
+
|
|
271
|
+
## FAQ
|
|
272
|
+
|
|
273
|
+
**Q: Does Visus slow down web fetching?**
|
|
274
|
+
A: Minimal overhead. Sanitization adds ~50-200ms per page.
|
|
275
|
+
|
|
276
|
+
**Q: Can attackers bypass the sanitizer?**
|
|
277
|
+
A: Novel obfuscation techniques or AI-generated benign-looking instructions may evade detection. See [SECURITY.md](./SECURITY.md) for honest limitations.
|
|
278
|
+
|
|
279
|
+
**Q: Does Visus work with authenticated pages?**
|
|
280
|
+
A: Phase 1 uses headless-only rendering. Phase 2 will add user session relay via Chrome extension.
|
|
281
|
+
|
|
282
|
+
**Q: How does Visus compare to Firecrawl?**
|
|
283
|
+
A: Firecrawl is excellent for web scraping but doesn't sanitize for prompt injection. Visus focuses on **security-first** content delivery.
|
|
284
|
+
|
|
285
|
+
**Q: Is Visus free?**
|
|
286
|
+
A: Yes! Open-source tier is free forever. Phase 2 will introduce a hosted tier with SLA guarantees for enterprise use.
|
|
287
|
+
|
|
288
|
+
---
|
|
289
|
+
|
|
290
|
+
**Built with by Lateos**
|