visus-mcp 0.6.2 → 0.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (177)
  1. package/.claude/settings.local.json +6 -1
  2. package/.env.status +7 -0
  3. package/CHANGELOG.md +65 -0
  4. package/CLAUDE.md +3 -0
  5. package/README.md +15 -7
  6. package/SECURITY.md +2 -0
  7. package/STATUS.md +203 -9
  8. package/dist/content-handlers/index.d.ts +36 -0
  9. package/dist/content-handlers/index.d.ts.map +1 -0
  10. package/dist/content-handlers/index.js +59 -0
  11. package/dist/content-handlers/index.js.map +1 -0
  12. package/dist/content-handlers/json-handler.d.ts +28 -0
  13. package/dist/content-handlers/json-handler.d.ts.map +1 -0
  14. package/dist/content-handlers/json-handler.js +116 -0
  15. package/dist/content-handlers/json-handler.js.map +1 -0
  16. package/dist/content-handlers/pdf-handler.d.ts +29 -0
  17. package/dist/content-handlers/pdf-handler.d.ts.map +1 -0
  18. package/dist/content-handlers/pdf-handler.js +77 -0
  19. package/dist/content-handlers/pdf-handler.js.map +1 -0
  20. package/dist/content-handlers/svg-handler.d.ts +35 -0
  21. package/dist/content-handlers/svg-handler.d.ts.map +1 -0
  22. package/dist/content-handlers/svg-handler.js +206 -0
  23. package/dist/content-handlers/svg-handler.js.map +1 -0
  24. package/dist/content-handlers/types.d.ts +42 -0
  25. package/dist/content-handlers/types.d.ts.map +1 -0
  26. package/dist/content-handlers/types.js +7 -0
  27. package/dist/content-handlers/types.js.map +1 -0
  28. package/dist/tools/fetch.d.ts.map +1 -1
  29. package/dist/tools/fetch.js +62 -4
  30. package/dist/tools/fetch.js.map +1 -1
  31. package/package.json +2 -1
  32. package/server.json +2 -2
  33. package/src/content-handlers/index.ts +72 -0
  34. package/src/content-handlers/json-handler.ts +137 -0
  35. package/src/content-handlers/pdf-handler.ts +91 -0
  36. package/src/content-handlers/svg-handler.ts +243 -0
  37. package/src/content-handlers/types.ts +44 -0
  38. package/src/tools/fetch.ts +69 -4
  39. package/.github/ISSUE_TEMPLATE/bug_report.md +0 -47
  40. package/.github/ISSUE_TEMPLATE/false_positive.md +0 -43
  41. package/.github/ISSUE_TEMPLATE/new_pattern.md +0 -49
  42. package/.github/ISSUE_TEMPLATE/security_report.md +0 -31
  43. package/.github/PULL_REQUEST_TEMPLATE.md +0 -39
  44. package/.mcpregistry_github_token +0 -1
  45. package/.mcpregistry_registry_token +0 -1
  46. package/CONTRIBUTING.md +0 -329
  47. package/LINKEDIN-STRATEGY.md +0 -367
  48. package/ROADMAP.md +0 -221
  49. package/SECURITY-AUDIT-v1.md +0 -277
  50. package/SUBMISSION.md +0 -66
  51. package/TROUBLESHOOT-AUTH-20260322-2019.md +0 -291
  52. package/TROUBLESHOOT-BUILD-20260319-1450.md +0 -546
  53. package/TROUBLESHOOT-COGNITO-AUTH-20260324-2029.md +0 -415
  54. package/TROUBLESHOOT-COGNITO-JWT-20260324.md +0 -592
  55. package/TROUBLESHOOT-FETCH-20260320-1150.md +0 -168
  56. package/TROUBLESHOOT-JEST-20260323-1357.md +0 -139
  57. package/TROUBLESHOOT-LAMBDA-20260322-1945.md +0 -183
  58. package/TROUBLESHOOT-PLAYWRIGHT-20260321-1549.md +0 -217
  59. package/TROUBLESHOOT-SSL-20260320-1138.md +0 -171
  60. package/TROUBLESHOOT-STRUCTURED-20260320-1200.md +0 -246
  61. package/TROUBLESHOOT-TEST-20260320-0942.md +0 -281
  62. package/VISUS-CLAUDE-CODE-PROMPT.md +0 -324
  63. package/VISUS-PROJECT-PLAN.md +0 -205
  64. package/cdk.json +0 -73
  65. package/infrastructure/app.ts +0 -39
  66. package/infrastructure/stack.ts +0 -298
  67. package/jest.config.js +0 -33
  68. package/jest.setup.js +0 -9
  69. package/lambda-deploy/index.js +0 -81512
  70. package/lambda-deploy/index.js.map +0 -7
  71. package/lambda-package/browser/__mocks__/playwright-renderer.d.ts +0 -25
  72. package/lambda-package/browser/__mocks__/playwright-renderer.d.ts.map +0 -1
  73. package/lambda-package/browser/__mocks__/playwright-renderer.js +0 -119
  74. package/lambda-package/browser/__mocks__/playwright-renderer.js.map +0 -1
  75. package/lambda-package/browser/playwright-renderer.d.ts +0 -40
  76. package/lambda-package/browser/playwright-renderer.d.ts.map +0 -1
  77. package/lambda-package/browser/playwright-renderer.js +0 -214
  78. package/lambda-package/browser/playwright-renderer.js.map +0 -1
  79. package/lambda-package/browser/reader.d.ts +0 -31
  80. package/lambda-package/browser/reader.d.ts.map +0 -1
  81. package/lambda-package/browser/reader.js +0 -98
  82. package/lambda-package/browser/reader.js.map +0 -1
  83. package/lambda-package/index.d.ts +0 -18
  84. package/lambda-package/index.d.ts.map +0 -1
  85. package/lambda-package/index.js +0 -238
  86. package/lambda-package/index.js.map +0 -1
  87. package/lambda-package/lambda-handler.d.ts +0 -28
  88. package/lambda-package/lambda-handler.d.ts.map +0 -1
  89. package/lambda-package/lambda-handler.js +0 -257
  90. package/lambda-package/lambda-handler.js.map +0 -1
  91. package/lambda-package/package-lock.json +0 -7435
  92. package/lambda-package/package.json +0 -74
  93. package/lambda-package/runtime.d.ts +0 -50
  94. package/lambda-package/runtime.d.ts.map +0 -1
  95. package/lambda-package/runtime.js +0 -86
  96. package/lambda-package/runtime.js.map +0 -1
  97. package/lambda-package/sanitizer/elicit-runner.d.ts +0 -48
  98. package/lambda-package/sanitizer/elicit-runner.d.ts.map +0 -1
  99. package/lambda-package/sanitizer/elicit-runner.js +0 -100
  100. package/lambda-package/sanitizer/elicit-runner.js.map +0 -1
  101. package/lambda-package/sanitizer/framework-mapper.d.ts +0 -24
  102. package/lambda-package/sanitizer/framework-mapper.d.ts.map +0 -1
  103. package/lambda-package/sanitizer/framework-mapper.js +0 -342
  104. package/lambda-package/sanitizer/framework-mapper.js.map +0 -1
  105. package/lambda-package/sanitizer/hitl-gate.d.ts +0 -69
  106. package/lambda-package/sanitizer/hitl-gate.d.ts.map +0 -1
  107. package/lambda-package/sanitizer/hitl-gate.js +0 -101
  108. package/lambda-package/sanitizer/hitl-gate.js.map +0 -1
  109. package/lambda-package/sanitizer/index.d.ts +0 -63
  110. package/lambda-package/sanitizer/index.d.ts.map +0 -1
  111. package/lambda-package/sanitizer/index.js +0 -105
  112. package/lambda-package/sanitizer/index.js.map +0 -1
  113. package/lambda-package/sanitizer/injection-detector.d.ts +0 -34
  114. package/lambda-package/sanitizer/injection-detector.d.ts.map +0 -1
  115. package/lambda-package/sanitizer/injection-detector.js +0 -89
  116. package/lambda-package/sanitizer/injection-detector.js.map +0 -1
  117. package/lambda-package/sanitizer/patterns.d.ts +0 -30
  118. package/lambda-package/sanitizer/patterns.d.ts.map +0 -1
  119. package/lambda-package/sanitizer/patterns.js +0 -372
  120. package/lambda-package/sanitizer/patterns.js.map +0 -1
  121. package/lambda-package/sanitizer/pii-allowlist.d.ts +0 -49
  122. package/lambda-package/sanitizer/pii-allowlist.d.ts.map +0 -1
  123. package/lambda-package/sanitizer/pii-allowlist.js +0 -231
  124. package/lambda-package/sanitizer/pii-allowlist.js.map +0 -1
  125. package/lambda-package/sanitizer/pii-redactor.d.ts +0 -41
  126. package/lambda-package/sanitizer/pii-redactor.d.ts.map +0 -1
  127. package/lambda-package/sanitizer/pii-redactor.js +0 -213
  128. package/lambda-package/sanitizer/pii-redactor.js.map +0 -1
  129. package/lambda-package/sanitizer/severity-classifier.d.ts +0 -33
  130. package/lambda-package/sanitizer/severity-classifier.d.ts.map +0 -1
  131. package/lambda-package/sanitizer/severity-classifier.js +0 -113
  132. package/lambda-package/sanitizer/severity-classifier.js.map +0 -1
  133. package/lambda-package/sanitizer/threat-reporter.d.ts +0 -66
  134. package/lambda-package/sanitizer/threat-reporter.d.ts.map +0 -1
  135. package/lambda-package/sanitizer/threat-reporter.js +0 -163
  136. package/lambda-package/sanitizer/threat-reporter.js.map +0 -1
  137. package/lambda-package/tools/fetch-structured.d.ts +0 -51
  138. package/lambda-package/tools/fetch-structured.d.ts.map +0 -1
  139. package/lambda-package/tools/fetch-structured.js +0 -237
  140. package/lambda-package/tools/fetch-structured.js.map +0 -1
  141. package/lambda-package/tools/fetch.d.ts +0 -49
  142. package/lambda-package/tools/fetch.d.ts.map +0 -1
  143. package/lambda-package/tools/fetch.js +0 -131
  144. package/lambda-package/tools/fetch.js.map +0 -1
  145. package/lambda-package/tools/read.d.ts +0 -51
  146. package/lambda-package/tools/read.d.ts.map +0 -1
  147. package/lambda-package/tools/read.js +0 -127
  148. package/lambda-package/tools/read.js.map +0 -1
  149. package/lambda-package/tools/search.d.ts +0 -45
  150. package/lambda-package/tools/search.d.ts.map +0 -1
  151. package/lambda-package/tools/search.js +0 -220
  152. package/lambda-package/tools/search.js.map +0 -1
  153. package/lambda-package/types.d.ts +0 -167
  154. package/lambda-package/types.d.ts.map +0 -1
  155. package/lambda-package/types.js +0 -16
  156. package/lambda-package/types.js.map +0 -1
  157. package/lambda-package/utils/format-converter.d.ts +0 -39
  158. package/lambda-package/utils/format-converter.d.ts.map +0 -1
  159. package/lambda-package/utils/format-converter.js +0 -191
  160. package/lambda-package/utils/format-converter.js.map +0 -1
  161. package/lambda-package/utils/truncate.d.ts +0 -26
  162. package/lambda-package/utils/truncate.d.ts.map +0 -1
  163. package/lambda-package/utils/truncate.js +0 -54
  164. package/lambda-package/utils/truncate.js.map +0 -1
  165. package/lambda.zip +0 -0
  166. package/test-output.txt +0 -4
  167. package/tests/auth-smoke.test.ts +0 -480
  168. package/tests/elicit-runner.test.ts +0 -232
  169. package/tests/fetch-tool.test.ts +0 -922
  170. package/tests/hitl-gate.test.ts +0 -267
  171. package/tests/injection-corpus.ts +0 -338
  172. package/tests/pii-allowlist.test.ts +0 -282
  173. package/tests/reader.test.ts +0 -353
  174. package/tests/sanitizer.test.ts +0 -358
  175. package/tests/search.test.ts +0 -456
  176. package/tests/threat-reporter.test.ts +0 -334
  177. package/tsconfig.cdk.json +0 -35
@@ -1,171 +0,0 @@
- # Visus MCP SSL Certificate Verification - Troubleshooting Log
-
- Started: 2026-03-20 11:38
- Goal: Fix SSL certificate verification for fetch() operations without disabling security
-
- ---
-
- ## [11:38:00] Step 1 - Identify Root Cause of SSL Failure
-
- **Goal:** Determine why SSL certificate verification is failing with both native fetch and undici
- **Reasoning:** Need to understand if this is a system-level certificate issue or Node.js configuration problem
- **Action:** Test SSL with multiple tools to isolate the issue
- **Result:**
- ```bash
- curl --version: Uses SecureTransport (macOS native) and LibreSSL - works fine
- openssl version: OpenSSL 3.6.1 - works fine
- openssl s_client -connect example.com:443: Verify return code: 0 (ok)
- node -e "fetch('https://example.com')": ERROR: fetch failed
- ```
- **Status:** ✅ Success - System SSL works, Node.js SSL fails
-
- ---
-
- ## [11:39:00] Step 2 - Check Node.js Certificate Configuration
-
- **Goal:** Determine if Node.js can access system certificates
- **Reasoning:** nvm-installed Node.js on macOS often can't find system certificate store
- **Action:** Check NODE_EXTRA_CA_CERTS and system certificate count
- **Result:**
- ```bash
- NODE_EXTRA_CA_CERTS: not set
- System root certificates: 156 certificates in /System/Library/Keychains/SystemRootCertificates.keychain
- Node.js location: /Users/leochong/.nvm/versions/node/v22.20.0/bin/node
- ```
- **Root Cause Found:** Node.js installed via nvm cannot access macOS system certificate store
- **Status:** ✅ Success - Issue identified
-
- ---
-
- ## [11:40:00] Step 3 - Export macOS Root Certificates
-
- **Goal:** Create a PEM file containing all macOS system root certificates
- **Reasoning:** Node.js can use a custom CA bundle via NODE_EXTRA_CA_CERTS environment variable
- **Action:** security find-certificate -a -p /System/Library/Keychains/SystemRootCertificates.keychain > system-ca-bundle.pem
- **Result:**
- ```
- Created: system-ca-bundle.pem (3,879 lines, 156 certificates)
- Verified: 156 certificates exported successfully
- ```
- **Status:** ✅ Success - CA bundle created
-
- ---
-
- ## [11:41:00] Step 4 - Test Node.js with CA Bundle
-
- **Goal:** Verify that Node.js can use the exported certificate bundle
- **Reasoning:** Need to confirm the fix works before updating configuration
- **Action:** NODE_EXTRA_CA_CERTS="$(pwd)/system-ca-bundle.pem" node -e "fetch('https://example.com')"
- **Result:**
- ```
- SUCCESS: Fetched 528 bytes
- ```
- **Status:** ✅ Success - SSL verification working with CA bundle
-
- ---
-
- ## [11:42:00] Step 5 - Update Claude Desktop Configuration
-
- **Goal:** Configure visus-mcp to use the CA bundle for all requests
- **Reasoning:** Replace insecure NODE_TLS_REJECT_UNAUTHORIZED=0 with proper certificate validation
- **Action:** Update claude_desktop_config.json
- **Result:**
- ```json
- {
- "env": {
- "NODE_EXTRA_CA_CERTS": "/Users/leochong/Projects/visus-mcp/system-ca-bundle.pem"
- }
- }
- ```
- **Status:** ✅ Success - Configuration updated with proper SSL verification
-
- ---
-
- ## [11:43:00] Step 6 - Add CA Bundle to .gitignore
-
- **Goal:** Prevent system-specific certificate bundle from being committed
- **Reasoning:** CA bundle is system-specific and should be regenerated per-machine
- **Action:** echo "system-ca-bundle.pem" >> .gitignore
- **Result:** Added to .gitignore
- **Status:** ✅ Success
-
- ---
-
- # RESOLUTION SUMMARY
-
- **Final Status:** ✅ RESOLVED
-
- ## Root Cause
-
- nvm-installed Node.js on macOS cannot access the system certificate store located in `/System/Library/Keychains/SystemRootCertificates.keychain`. This caused all HTTPS requests via native fetch() and undici to fail with "fetch failed" or "unable to get local issuer certificate" errors.
-
- ## Resolution
-
- 1. **Exported macOS system root certificates** to a PEM file:
- ```bash
- security find-certificate -a -p /System/Library/Keychains/SystemRootCertificates.keychain > system-ca-bundle.pem
- ```
-
- 2. **Configured Node.js to use the CA bundle** via `NODE_EXTRA_CA_CERTS` environment variable in Claude Desktop config:
- ```json
- "env": {
- "NODE_EXTRA_CA_CERTS": "/Users/leochong/Projects/visus-mcp/system-ca-bundle.pem"
- }
- ```
-
- 3. **Added system-ca-bundle.pem to .gitignore** to prevent committing system-specific files
-
- ## Verification
-
- ✅ SSL certificate verification: ENABLED
- ✅ HTTPS requests: WORKING
- ✅ Security: MAINTAINED (no certificate validation bypass)
- ✅ Test: `fetch('https://example.com')` returns 528 bytes successfully
-
- ## Alternative Solutions Considered
-
- ❌ **NODE_TLS_REJECT_UNAUTHORIZED=0**: Rejected - disables all certificate validation (security risk)
- ❌ **Using HTTP instead of HTTPS**: Rejected - defeats the security purpose of Visus
- ✅ **NODE_EXTRA_CA_CERTS with system certificates**: Selected - maintains security while fixing the issue
-
- ## Setup Instructions for Other Developers
-
- On macOS with nvm-installed Node.js:
-
- ```bash
- # 1. Export macOS system certificates
- security find-certificate -a -p /System/Library/Keychains/SystemRootCertificates.keychain > system-ca-bundle.pem
-
- # 2. Add to Claude Desktop config
- {
- "env": {
- "NODE_EXTRA_CA_CERTS": "/path/to/visus-mcp/system-ca-bundle.pem"
- }
- }
-
- # 3. Add to .gitignore
- echo "system-ca-bundle.pem" >> .gitignore
- ```
-
- ## Lessons Learned
-
- 1. **nvm + macOS + SSL = certificate issues** - Always check certificate access when using nvm
- 2. **Never disable SSL verification** - Even for "quick testing", find the proper fix
- 3. **System certificates are accessible** - macOS provides all root certificates via security command
- 4. **NODE_EXTRA_CA_CERTS is the proper solution** - Documented Node.js feature for custom CA bundles
- 5. **Test with undici AND native fetch** - Both can have different certificate handling behaviors
-
- ## Files Modified
-
- - `.gitignore` - Added system-ca-bundle.pem
- - `claude_desktop_config.json` - Changed NODE_TLS_REJECT_UNAUTHORIZED=0 to NODE_EXTRA_CA_CERTS
-
- ## Files Created
-
- - `system-ca-bundle.pem` - macOS system root certificates (156 certs, not committed to git)
-
- ---
-
- **Resolution Completed:** 2026-03-20 11:43
- **Total Time:** 5 minutes
- **Final Verdict:** ✅ SSL certificate verification working properly with full security maintained
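The SSL log depends on two facts: the exported bundle actually contains the expected certificates (156 in Step 3), and Node.js trusts those certificates in addition to its own roots. The TypeScript sketch below illustrates both checks. It assumes Node.js and its `node:tls` module. `splitPemCertificates` and `buildTrustedCas` are hypothetical helpers and are not part of visus-mcp. The fix recorded in the log is the `NODE_EXTRA_CA_CERTS` environment variable, which requires no code.

```typescript
import { rootCertificates } from "node:tls";

// Matches one PEM-encoded certificate block, the format written by
// `security find-certificate -a -p` in Step 3 of the log.
const PEM_CERT_RE = /-----BEGIN CERTIFICATE-----[\s\S]+?-----END CERTIFICATE-----/g;

// Hypothetical helper: split a PEM bundle (e.g. system-ca-bundle.pem) into
// individual certificates, so the count can be checked against expectations.
function splitPemCertificates(pem: string): string[] {
  return pem.match(PEM_CERT_RE) ?? [];
}

// Hypothetical helper: Node's bundled roots plus the extra certificates.
// Passing `ca` to node:https / node:tls REPLACES the default roots, so this
// sketch merges them, similar to how NODE_EXTRA_CA_CERTS extends the store.
// Duplicates are removed by exact string comparison only.
function buildTrustedCas(extraPem: string): string[] {
  const merged = new Set<string>(rootCertificates);
  for (const cert of splitPemCertificates(extraPem)) {
    merged.add(cert);
  }
  return Array.from(merged);
}
```

With `node:https` you could pass the result as `new https.Agent({ ca: buildTrustedCas(readFileSync("system-ca-bundle.pem", "utf8")) })`. Node's global `fetch` (undici) does not accept an `https.Agent`, which is one reason an environment variable suits the setup in this log.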
@@ -1,246 +0,0 @@
- # Visus MCP visus_fetch_structured Null Extraction Bug - Troubleshooting Log
-
- Started: 2026-03-20 12:00
- Goal: Fix null extraction issue in visus_fetch_structured tool
-
- ---
-
- ## [12:00:00] Step 1 - Document Symptom
-
- **Goal:** Capture observed behavior from smoke test
- **Reasoning:** Need baseline before debugging
- **Action:** Review Test 4 results from Claude Desktop
- **Result:**
- ```json
- {
- "data": {
- "page_title": null,
- "main_paragraph": null,
- "link_url": null
- },
- "sanitization": {
- "patterns_detected": [],
- "content_modified": false
- },
- "metadata": {
- "content_length_original": 139,
- "content_length_sanitized": 0
- }
- }
- ```
- **Observations:**
- - All schema fields returned null
- - content_length_original: 139 bytes (vs 528 for visus_fetch on same URL)
- - content_length_sanitized: 0
- - metadata.title: "Example Domain" (proves page was fetched)
- **Status:** ✅ Success - Symptom documented
-
- ---
-
- ## [12:01:00] Step 2 - Analyze Implementation
-
- **Goal:** Understand data flow from fetch to extraction
- **Reasoning:** Need to trace where content gets lost
- **Action:** Read src/tools/fetch-structured.ts
- **Result:**
-
- **Key code sections:**
- ```typescript
- // Line 90: Uses text format
- format: 'text'
-
- // Line 97-98: Extracts text field
- const { title, text } = renderResult.value;
- const rawContent = text || ''; // rawContent = 139 bytes of plain text
-
- // Line 101: Calls extractor
- const extractedData = extractStructuredData(rawContent, schema);
- ```
-
- **Status:** ✅ Success - Data flow mapped
-
- ---
-
- ## [12:02:00] Step 3 - Check Text Extraction Process
-
- **Goal:** Determine what "text" format produces
- **Reasoning:** Need to understand why content is only 139 bytes vs 528 bytes
- **Action:** Review playwright-renderer.ts extractText() function
- **Result:**
- ```typescript
- function extractText(html: string): string {
- return html
- .replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, '') // Remove scripts
- .replace(/<style\b[^<]*(?:(?!<\/style>)<[^<]*)*<\/style>/gi, '') // Remove styles
- .replace(/<[^>]+>/g, '') // Remove all HTML tags
- .replace(/\s+/g, ' ') // Collapse whitespace
- .trim();
- }
- ```
-
- **For example.com HTML (528 bytes):**
- ```html
- <!doctype html><html><head><title>Example Domain</title>...
- <h1>Example Domain</h1>
- <p>This domain is for use in documentation...</p>
- <a href="https://iana.org/domains/example">Learn more</a>
- ```
-
- **After extractText() (139 bytes):**
- ```
- Example Domain This domain is for use in documentation examples without needing permission. Avoid use in operations. Learn more
- ```
-
- **Key finding:** ALL HTML structure removed, including:
- - `<h1>` tags (needed to identify headings)
- - `<p>` tags (needed to identify paragraphs)
- - `<a href>` attributes (needed to extract link URLs)
-
- **Status:** ✅ Success - Text stripping identified
-
- ---
-
- ## [12:03:00] Step 4 - Analyze extractStructuredData() Function
-
- **Goal:** Understand extraction logic and why it fails
- **Reasoning:** Need to see what patterns the extractor looks for
- **Action:** Review fetch-structured.ts lines 22-64
- **Result:**
-
- **Extraction logic:**
- 1. Regex search for `(fieldName|description)\s*[:=]?\s*([^\n]+)`
- 2. Falls back to line-by-line search for field name
- 3. Looks for key-value pairs split by `:` or `=`
-
- **Example patterns it WOULD match:**
- ```
- page_title: Example Domain
- main_paragraph = This domain is for use...
- link_url: https://example.com
- ```
-
- **What it receives from example.com:**
- ```
- Example Domain This domain is for use in documentation examples without needing permission. Avoid use in operations. Learn more
- ```
-
- **Result:** No key-value structure → no regex matches → all fields return null
-
- **Status:** ✅ Success - Extraction failure explained
-
- ---
-
- ## [12:04:00] Step 5 - Identify Root Causes
-
- **Goal:** Summarize why extraction fails
- **Reasoning:** Need to document fundamental issues before proposing fix
- **Action:** Analyze Phase 1 design constraints
- **Result:**
-
- **Root Cause 1: Format Mismatch**
- - Structured extraction uses `format: 'text'`
- - Text format strips ALL HTML tags and attributes
- - Cannot extract link URLs (href stripped)
- - Cannot identify structure (h1, p tags stripped)
-
- **Root Cause 2: Naive Extraction Algorithm**
- - extractStructuredData() only looks for key-value patterns
- - Cannot understand semantic meaning ("main heading", "first paragraph")
- - Cannot parse HTML structure
- - Works for: JSON-like text, YAML, INI files
- - Fails for: Web pages, prose content, any unstructured text
-
- **Root Cause 3: Phase 1 Known Limitation**
- Per fetch-structured.ts:18-20:
- ```
- * Phase 1: Basic pattern matching
- * Phase 2+: LLM-powered extraction with Bedrock
- ```
-
- **Status:** ✅ Success - Root causes documented
-
- ---
-
- ## [12:05:00] Step 6 - Evaluate Fix Options
-
- **Goal:** Determine best approach for Phase 1
- **Reasoning:** Need to balance functionality vs scope creep
- **Action:** Consider alternatives
-
- **Option A: Do Nothing**
- - Mark as known Phase 1 limitation
- - Document in STATUS.md
- - Wait for Phase 2 LLM-powered extraction
- - ❌ Leaves tool completely non-functional
-
- **Option B: Add HTML Parser**
- - Use cheerio or jsdom
- - Parse HTML structure
- - Extract headings, paragraphs, links properly
- - ✅ Would work for basic HTML extraction
- - ⚠️ Adds dependency, increases scope
-
- **Option C: Hybrid Approach**
- - Keep current text-based extraction for key-value content
- - Add basic HTML parsing for common patterns (h1, p, a[href])
- - Fall back to simple heuristics (first line = title, etc.)
- - ✅ Improves functionality without full rewrite
- - ⚠️ Still limited compared to LLM extraction
-
- **Option D: Add Note to Tool Description**
- - Keep current implementation
- - Update tool description to clarify limitations
- - Add example of what kind of data it works with
- - ✅ Honest about capabilities
- - ❌ Doesn't fix the issue
-
- **Recommendation:** Option B (Add HTML Parser)
- - Cheerio is lightweight (~500KB)
- - Industry standard for HTML parsing
- - Enables proper semantic extraction
- - Still simpler than full Playwright + LLM
-
- **Status:** ✅ Success - Fix options evaluated
-
- ---
-
- ## [12:06:00] Step 7 - Implement Fix with cheerio
-
- **Goal:** Add HTML parsing capability to structured extraction
- **Reasoning:** Option B provides best balance of functionality and complexity
- **Action:** Install cheerio and update extractStructuredData()
- **Result:** (to be implemented)
- **Status:** ⏸️ Pending decision
-
- ---
-
- # ROOT CAUSE SUMMARY
-
- **Issue:** visus_fetch_structured returns null for all schema fields
-
- **Root Causes:**
- 1. **Text extraction strips HTML structure** - format='text' removes all tags/attributes needed for semantic extraction
- 2. **Naive pattern matching** - extractStructuredData() only finds key-value pairs, cannot understand "extract the main heading"
- 3. **Phase 1 design limitation** - Documented as needing LLM-powered extraction in Phase 2
-
- **Impact:**
- - Tool is non-functional for extracting data from typical web pages
- - Only works for structured text formats (JSON-like, key-value)
- - Cannot extract link URLs, headings, or semantic content
-
- **Recommendation:**
- Add cheerio HTML parser to enable basic semantic extraction:
- - Parse HTML structure
- - Extract headings (<h1>, <h2>)
- - Extract paragraphs (<p>)
- - Extract links (<a href>)
- - Apply sanitization to extracted values
- - Maintain security-first design
-
- **Alternative:**
- Document as Phase 1 limitation and wait for Phase 2 LLM extraction
-
- ---
-
- **Status:** 🔍 Analysis complete, awaiting fix decision
- **Total Time:** 6 minutes
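The structured-extraction log traces the null fields to two stages. First, `extractText()` flattens the HTML. Second, key-value matching then finds nothing in the flattened text. The TypeScript sketch below reproduces both stages and contrasts them with tag-aware extraction:

- `extractText` is copied from Step 3.
- `matchKeyValue` is a hypothetical simplification of the Step 4 regex. It is not the project's `extractStructuredData()`.
- `extractFromHtml` is a hypothetical, dependency-free stand-in for the cheerio approach in Option B. It runs regexes over raw HTML, which is fragile and only adequate for a page as simple as example.com.

```typescript
// Copied from the extractText() quoted in Step 3 of the log.
function extractText(html: string): string {
  return html
    .replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, "") // Remove scripts
    .replace(/<style\b[^<]*(?:(?!<\/style>)<[^<]*)*<\/style>/gi, "") // Remove styles
    .replace(/<[^>]+>/g, "") // Remove all HTML tags
    .replace(/\s+/g, " ") // Collapse whitespace
    .trim();
}

// Hypothetical simplification of the Step 4 pattern `field\s*[:=]?\s*([^\n]+)`:
// it only succeeds when the field name literally appears in the text.
function matchKeyValue(text: string, field: string): string | null {
  const escaped = field.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const m = new RegExp(`${escaped}\\s*[:=]?\\s*([^\\n]+)`).exec(text);
  return m ? m[1].trim() : null;
}

interface PageFields {
  page_title: string | null;
  main_paragraph: string | null;
  link_url: string | null;
}

// Hypothetical stand-in for Option B: read fields from the tags before they are
// stripped. A real HTML parser (the log recommends cheerio) is far more robust.
function extractFromHtml(html: string): PageFields {
  const innerText = (re: RegExp): string | null => {
    const m = re.exec(html);
    return m ? extractText(m[1]) || null : null;
  };
  const href = /<a\b[^>]*\bhref\s*=\s*"([^"]*)"/i.exec(html);
  return {
    page_title: innerText(/<h1\b[^>]*>([\s\S]*?)<\/h1>/i),
    main_paragraph: innerText(/<p\b[^>]*>([\s\S]*?)<\/p>/i),
    link_url: href ? href[1] : null,
  };
}

// Abbreviated example.com markup, based on the snippet quoted in Step 3.
const sample =
  "<!doctype html><html><head><title>Example Domain</title></head><body>" +
  "<h1>Example Domain</h1>" +
  "<p>This domain is for use in documentation examples without needing permission. Avoid use in operations.</p>" +
  '<a href="https://iana.org/domains/example">Learn more</a></body></html>';

const flat = extractText(sample);
console.log(matchKeyValue(flat, "page_title")); // null: the flattened text has no "page_title" key
console.log(extractFromHtml(sample));
```

The comparison mirrors Root Causes 1 and 2. After flattening, the link URL is gone entirely and no field name appears, so key-value matching cannot succeed. Reading the `<h1>`, `<p>`, and `<a href>` elements before stripping recovers all three fields.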
@@ -1,281 +0,0 @@
1
- # Visus MCP Test Timeout - Troubleshooting Log
2
-
3
- Started: 2026-03-20 09:42
4
- Goal: Resolve Jest test timeout to validate 43 injection pattern categories
5
-
6
- ---
7
-
8
- ## [09:42:00] Step 1 - Examine Jest Configuration
9
-
10
- **Goal:** Review jest.config.js to identify timeout settings and test environment
11
- **Reasoning:** Test timeouts are often caused by incorrect Jest configuration (missing timeout values, wrong test environment, or missing setup files)
12
- **Action:** cat jest.config.js
13
- **Result:**
14
- ```javascript
15
- {
16
- preset: 'ts-jest/presets/default-esm',
17
- testEnvironment: 'node',
18
- testTimeout: 15000,
19
- forceExit: true,
20
- detectOpenHandles: false
21
- }
22
- ```
23
- **Status:** ✅ Success - Config looks reasonable, has timeout and forceExit
24
-
25
- ---
26
-
27
- ## [09:42:30] Step 2 - Check if Playwright is Actually Installed
28
-
29
- **Goal:** Verify Playwright and its browsers are installed in node_modules
30
- **Reasoning:** Playwright tests hang if browsers aren't installed. The test might be trying to download browsers on first run, which could timeout
31
- **Action:** ls node_modules | grep playwright && ls node_modules | grep turndown
32
- **Result:**
33
- ```
34
- NOT FOUND - playwright not installed
35
- NOT FOUND - turndown not installed
36
- ```
37
- However, checked src/browser/playwright-renderer.ts and it's using native fetch(), not Playwright (Phase 2 feature).
38
- Tests also mock the renderer, so Playwright shouldn't be the issue.
39
- **Status:** ✅ Success - Playwright not needed for Phase 1
40
-
41
- ---
42
-
43
- ## [09:43:00] Step 3 - Run Tests with Verbose Output
44
-
45
- **Goal:** Execute jest and capture the exact point where it hangs
46
- **Reasoning:** Need to see which test file or specific test case is causing the timeout
47
- **Action:** npm test -- tests/sanitizer.test.ts --verbose
48
- **Result:**
49
- ```
50
- sh: jest: command not found
51
- ```
52
- **Status:** ❌ Failed - Jest not found in PATH
53
-
54
- ---
55
-
56
- ## [09:43:30] Step 4 - Check for Multiple Package Lock Files
57
-
58
- **Goal:** Investigate why jest is not in PATH despite node_modules existing
59
- **Reasoning:** Found jest in "node_modules/.bin 2" instead of "node_modules/.bin", suggests corrupted install
60
- **Action:** ls -la | grep package-lock
61
- **Result:**
62
- ```
63
- package-lock 2.json (173,875 bytes)
64
- package-lock.json (173,502 bytes)
65
- ```
66
- **Root Cause Found:** TWO package-lock.json files have created duplicate/corrupted node_modules structure
67
- **Status:** ✅ Success - Issue identified
68
-
69
- ---
70
-
71
- ## [09:44:00] Step 5 - Clean and Reinstall Dependencies
72
-
73
- **Goal:** Remove corrupted node_modules and reinstall from single package-lock.json
74
- **Reasoning:** Multiple package-lock files create conflicting dependency trees; clean slate needed
75
- **Action:** rm -rf node_modules "package-lock 2.json" && npm install
76
- **Result:** Install started but took excessive time, killed and restarted
77
- **Status:** ⚠️ Partial - Found additional issue: package.json missing runtime dependencies
78
-
79
- ---
80
-
81
- ## [09:45:00] Step 6 - Analyze package.json Dependencies
82
-
83
- **Goal:** Verify all required dependencies are listed in package.json
84
- **Reasoning:** npm install hanging might indicate dependency resolution issues
85
- **Action:** cat package.json | grep -A 10 dependencies
86
- **Result:**
87
- ```json
88
- "dependencies": {
89
- "@modelcontextprotocol/sdk": "^1.0.4"
90
- },
91
- "devDependencies": {
92
- "@types/jest": "^29.5.14",
93
- "@types/node": "^20.17.6",
94
- "jest": "^29.7.0",
95
- "ts-jest": "^29.2.5",
96
- "typescript": "^5.7.2"
97
- }
98
- ```
99
- **Finding:** According to STATUS.md, Playwright and Turndown should be in dependencies, but they're missing. However, src/browser/playwright-renderer.ts uses native fetch() for Phase 1, so this might be intentional.
100
- **Status:** ✅ Success - Dependencies appear minimal but correct for Phase 1
101
-
102
- ---
103
-
104
- ## [09:47:00] Step 7 - Retry npm install
105
-
106
- **Goal:** Complete clean installation of dependencies
107
- **Reasoning:** Previous install was interrupted; need clean node_modules
108
- **Action:** rm -rf node_modules && npm install
109
- **Result:**
110
- ```
111
- Successfully installed all dependencies
112
- Jest binary: node_modules/.bin/jest (verified)
113
- Packages: 231+ installed from cache
114
- ```
115
- **Status:** ✅ Success - Dependencies installed correctly
116
-
117
- ---
118
-
119
- ## [09:52:00] Step 8 - Test Jest Execution
120
-
121
- **Goal:** Determine why jest hangs when executed
122
- **Reasoning:** Dependencies are installed but jest --version and npm test both hang indefinitely
123
- **Action:** npm test (attempt 1, timeout after 30s)
124
- **Result:** Jest process hangs with NO output, even during initialization
125
- **Status:** ❌ Failed - Jest hangs at startup
126
-
127
- ---
128
-
129
- ## [09:53:00] Step 9 - Analyze Jest Hang Root Cause
130
-
131
- **Goal:** Identify why jest hangs before even starting tests
132
- **Reasoning:** Hang occurs before test execution, suggests config or module resolution issue
133
- **Hypothesis 1:** ESM module configuration issue (ts-jest with useESM: true)
134
- **Hypothesis 2:** TypeScript compilation during test runtime causing circular dependency
135
- **Hypothesis 3:** Missing or corrupted tsconfig.json
136
- **Action:** cat tsconfig.json && npx tsc tests/sanitizer.test.ts --noEmit
137
- **Result:**
138
- ```
139
- tsconfig excludes "tests" directory (not an issue - jest handles this)
140
- tsc command hangs indefinitely (same as jest!)
141
- ```
142
- **New Discovery:** Both TypeScript compiler AND jest hang - suggests source code issue
143
- **Status:** ⚠️ Partial - TypeScript compilation itself is broken
144
-
145
- ---
146
-
- ## [09:54:00] Step 10 - Check for Duplicate Source Files
-
- **Goal:** Find source files that might be causing the compilation issues
- **Reasoning:** If tsc hangs, circular imports or duplicate files might be confusing module resolution
- **Action:** `ls -la src/sanitizer/patterns*`
- **Result:**
- ```
- patterns 2.ts (13,867 bytes) - DUPLICATE FILE
- patterns.ts (13,589 bytes) - CORRECT FILE
- ```
- Also found earlier:
- ```
- dist/browser 2/ - DUPLICATE DIRECTORY
- node_modules/.bin 2 - DUPLICATE DIRECTORY (before clean install)
- ```
- **Root Cause Identified:** Multiple duplicate files throughout the project are corrupting TypeScript compilation and module resolution
- **Status:** ✅ Success - Found root cause!
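The Step 10 check compared a `" 2"` copy against the file it shadows. A minimal TypeScript sketch of that comparison, assuming you already have a list of paths with byte sizes (for example, copied from `ls -la` output). `SizedPath`, `originalPathFor`, and `findDuplicatePairs` are hypothetical names, not part of visus-mcp, and the suffix pattern only handles a single extension such as `.ts` or `.json`.

```typescript
// Hypothetical helper (not part of visus-mcp): pair each "<name> 2<.ext>" copy
// with the original it shadows and report the size difference.
interface SizedPath {
  path: string; // e.g. "src/sanitizer/patterns.ts"
  size: number; // size in bytes
}

interface DuplicatePair {
  duplicate: SizedPath;
  original: SizedPath | undefined; // undefined when no original is in the list
  sizeDelta: number | undefined; // duplicate.size - original.size
}

// A trailing " 2" before an optional single extension: "patterns 2.ts", "browser 2".
const DUPLICATE_SUFFIX = /^(.*) 2(\.[^./]+)?$/;

// Returns the path the copy presumably duplicates, or undefined if it is not a " 2" copy.
function originalPathFor(path: string): string | undefined {
  const match = DUPLICATE_SUFFIX.exec(path);
  if (match === null) return undefined;
  return `${match[1]}${match[2] ?? ""}`;
}

function findDuplicatePairs(entries: SizedPath[]): DuplicatePair[] {
  // Index entries by path so originals can be looked up directly.
  const byPath = new Map<string, SizedPath>();
  for (const entry of entries) byPath.set(entry.path, entry);

  const pairs: DuplicatePair[] = [];
  for (const entry of entries) {
    const originalPath = originalPathFor(entry.path);
    if (originalPath === undefined) continue; // not a " 2" copy
    const original = byPath.get(originalPath);
    pairs.push({
      duplicate: entry,
      original,
      sizeDelta: original ? entry.size - original.size : undefined,
    });
  }
  return pairs;
}

// Sizes taken from the Step 10 listing.
const pairs = findDuplicatePairs([
  { path: "src/sanitizer/patterns 2.ts", size: 13867 },
  { path: "src/sanitizer/patterns.ts", size: 13589 },
]);
for (const p of pairs) {
  console.log(`${p.duplicate.path} -> ${p.original?.path ?? "(none)"}, delta ${p.sizeDelta ?? "n/a"} bytes`);
}
// prints: src/sanitizer/patterns 2.ts -> src/sanitizer/patterns.ts, delta 278 bytes
```

A non-zero delta only shows that the copies diverged; it does not say which one is correct, so the copy still needs a manual diff before it is discarded.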
-
- ---
-
- ## [09:55:00] Step 11 - Remove All Duplicate Files
-
- **Goal:** Delete all files and directories with a " 2" suffix, which appeared to be corrupting the build
- **Reasoning:** The duplicates were suspected of making TypeScript and Jest hang during module resolution
- **Action:** `find . -name "* 2.*" -o -name "* 2" | grep -v node_modules`, then delete each match
- **Result:**
- ```
- Found and removed:
- - ./dist/browser 2
- - ./package-lock 2.json
- - ./src/sanitizer/patterns 2.ts
- ```
- **Status:** ✅ Success - All duplicate files removed
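The Step 11 pipeline can be mirrored in TypeScript to preview what it selects before anything is deleted. This sketch assumes the same semantics as the shell command: `find -name` matches only the final path component, and `grep -v node_modules` drops any path containing that substring. `selectDuplicates` is a hypothetical helper, and the sample paths are illustrative, based on Steps 10 and 11.

```typescript
// Hypothetical TypeScript mirror of the Step 11 shell pipeline:
//   find . -name "* 2.*" -o -name "* 2" | grep -v node_modules
// It only selects paths; deleting them is left as a separate, reviewed step.

// The two `find -name` globs, translated to anchored regular expressions.
const NAME_PATTERNS: RegExp[] = [
  /^.* 2\..*$/, // "* 2.*"  e.g. "patterns 2.ts", "package-lock 2.json"
  /^.* 2$/, // "* 2"    e.g. "browser 2"
];

// `find -name` tests only the last path component.
function basename(path: string): string {
  const parts = path.split("/");
  return parts[parts.length - 1] ?? path;
}

function selectDuplicates(paths: string[]): string[] {
  return paths.filter((p) => {
    const nameMatches = NAME_PATTERNS.some((re) => re.test(basename(p)));
    // `grep -v node_modules` drops any line containing the substring.
    return nameMatches && !p.includes("node_modules");
  });
}

const selected = selectDuplicates([
  "./dist/browser 2",
  "./dist/browser",
  "./package-lock 2.json",
  "./src/sanitizer/patterns 2.ts",
  "./src/sanitizer/patterns.ts",
  "./node_modules/.bin 2",
]);
console.log(selected.join("\n"));
// prints the three paths removed in Step 11; the node_modules entry is filtered out
```

The `"* 2.*"` glob is broader than "a copy of another file": it also matches a name such as `v 2.0 notes.txt`, which is why selecting first and deleting only after review is the safer design.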
-
- ---
-
- ## [09:56:00] Step 12 - Test After Removing Duplicates
-
- **Goal:** Verify that tests run once the duplicate files are removed
- **Reasoning:** If the duplicates were corrupting module resolution, removing them should fix the hang
- **Action:** `rm tsconfig.tsbuildinfo && npm test`
- **Result:** Test and build commands STILL hang, even after the duplicates were removed
- **Status:** ❌ Failed - A deeper issue exists
-
- ---
-
- ## [09:57:00] Step 13 - Isolate TypeScript Compilation Issue
-
- **Goal:** Determine whether the issue lies with the TypeScript compiler itself
- **Reasoning:** Both `tsc` and `jest` (which compiles through ts-jest) hang, suggesting that TypeScript compilation is broken
- **Action:** `npx tsc src/types.ts --outDir dist` (single-file compilation)
- **Result:** Even compiling a single simple file hangs indefinitely
- **Status:** ❌ Failed - TypeScript compiler is completely broken
-
- ---
-
- # RECOVERY SUMMARY
-
- **Final Status:** ⚠️ PARTIALLY RESOLVED
-
- ## Root Causes Identified
-
- 1. **Primary Issue:** Multiple duplicate files corrupting project structure
- - `package-lock 2.json` vs `package-lock.json`
- - `src/sanitizer/patterns 2.ts` vs `patterns.ts`
- - `dist/browser 2/` vs `dist/browser/`
- - `node_modules/.bin 2` (from multiple npm install attempts)
-
- 2. **Secondary Issue:** TypeScript compiler hangs on ALL compilation attempts
- - `tsc` hangs even on single-file compilation
- - `jest` (via ts-jest) hangs during test initialization
- - Issue persists even after removing all duplicate files
-
220
- ## Actions Taken
221
-
222
- ✅ Removed duplicate package-lock.json file
223
- ✅ Cleaned and reinstalled node_modules (231 packages)
224
- ✅ Verified jest binary installation
225
- ✅ Removed all duplicate source files (patterns 2.ts, browser 2/)
226
- ✅ Cleared TypeScript build cache (tsconfig.tsbuildinfo)
227
- ❌ Unable to compile TypeScript
228
- ❌ Unable to run tests
229
-
230
- ## Current Hypothesis
231
-
232
- The TypeScript compiler hang suggests one of the following:
233
-
234
- **Hypothesis A:** Circular dependency in source code
235
- - TypeScript enters infinite loop trying to resolve module imports
236
- - Need to analyze import graph for cycles
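Analyzing the import graph for cycles amounts to a depth-first search that watches for back edges, which is the kind of check madge automates. A minimal sketch, assuming the graph has already been extracted as a map from each module to the modules it imports; `ImportGraph`, `findCycle`, and the module names are hypothetical.

```typescript
// Hypothetical import graph: each module maps to the modules it imports.
type ImportGraph = Record<string, string[]>;

// white = not visited yet, gray = on the current DFS path, black = fully explored.
type Color = "white" | "gray" | "black";

// Returns one import cycle (first module repeated at the end), or null if there is none.
function findCycle(graph: ImportGraph): string[] | null {
  const color = new Map<string, Color>();
  const path: string[] = []; // modules on the current DFS path

  function visit(node: string): string[] | null {
    color.set(node, "gray");
    path.push(node);
    for (const dep of graph[node] ?? []) {
      const state = color.get(dep) ?? "white";
      if (state === "gray") {
        // Back edge: dep is already on the path, so path[dep..] + dep is a cycle.
        return [...path.slice(path.indexOf(dep)), dep];
      }
      if (state === "white") {
        const cycle = visit(dep);
        if (cycle !== null) return cycle;
      }
    }
    path.pop();
    color.set(node, "black");
    return null;
  }

  for (const node of Object.keys(graph)) {
    if ((color.get(node) ?? "white") === "white") {
      const cycle = visit(node);
      if (cycle !== null) return cycle;
    }
  }
  return null;
}

// Usage with made-up module names:
const graph: ImportGraph = {
  "src/index.ts": ["src/a.ts"],
  "src/a.ts": ["src/b.ts"],
  "src/b.ts": ["src/c.ts"],
  "src/c.ts": ["src/a.ts"],
};
console.log(findCycle(graph)?.join(" -> "));
// prints: src/a.ts -> src/b.ts -> src/c.ts -> src/a.ts
```

Import cycles are legal in TypeScript and tsc generally compiles them without hanging, so a cycle found this way is a lead to investigate rather than proof of what caused the hang.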
-
- **Hypothesis B:** Corrupted TypeScript installation
- - `npm install` may have left a corrupted TypeScript package in node_modules
- - Possible fix: `rm -rf node_modules package-lock.json && npm install`
-
- **Hypothesis C:** System-level issue
- - File system corruption
- - A macOS-specific TypeScript bug involving spaces in file names
-
246
- ## Recommended Next Steps
247
-
248
- 1. **Immediate:** Reinstall TypeScript and ts-jest
249
- ```bash
250
- npm uninstall typescript ts-jest
251
- npm install typescript@latest ts-jest@latest --save-dev
252
- ```
253
-
254
- 2. **If that fails:** Analyze source code for circular imports
255
- ```bash
256
- npx madge --circular --extensions ts src/
257
- ```
258
-
259
- 3. **If that fails:** Test on different machine/environment to rule out system issues
260
-
261
- 4. **Nuclear option:** Rewrite TypeScript source with known-good configuration from scratch
262
-
263
- ## Lessons Learned
264
-
265
- 1. **Duplicate files are catastrophic** - File system allowing spaces in names created "file 2.ext" duplicates
266
- 2. **npm install problems cascade** - Multiple package-lock files create corrupted node_modules
267
- 3. **TypeScript hangs are hard to debug** - No error output, just infinite loop
268
- 4. **Test early, test often** - Project had never successfully run tests before this session
269
-
270
- ## Open Issues
271
-
272
- - TypeScript compilation completely broken
273
- - Tests cannot run until TypeScript compiles
274
- - Phase 1 Definition of Done blocked: cannot validate 43 injection patterns
275
-
- ---
-
- **End of troubleshooting log - 2026-03-20 09:57**
- **Time elapsed:** ~15 minutes
- **Issues resolved:** 1/2 (duplicate files removed, TypeScript still broken)
- **Recommended action:** Try a fresh TypeScript install or a circular-dependency analysis