npm - @j0hanz/superfetch - Versions diffs - 2.4.3 → 2.4.5 - Mend

@j0hanz/superfetch 2.4.3 → 2.4.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23) hide show

package/dist/cache.d.ts +8 -8
package/dist/cache.js +277 -264
package/dist/config.d.ts +1 -0
package/dist/config.js +1 -0
package/dist/crypto.js +4 -3
package/dist/dom-noise-removal.js +355 -297
package/dist/fetch.d.ts +13 -7
package/dist/fetch.js +636 -690
package/dist/http-native.js +535 -474
package/dist/instructions.md +38 -27
package/dist/language-detection.js +190 -153
package/dist/markdown-cleanup.js +171 -158
package/dist/mcp.js +183 -2
package/dist/resources.d.ts +2 -0
package/dist/resources.js +44 -0
package/dist/session.js +144 -105
package/dist/tasks.d.ts +37 -0
package/dist/tasks.js +66 -0
package/dist/tools.d.ts +8 -12
package/dist/tools.js +196 -147
package/dist/transform.d.ts +3 -1
package/dist/transform.js +680 -778
package/package.json +6 -6

package/dist/instructions.md CHANGED Viewed

@@ -1,41 +1,52 @@
-# superFetch Instructions
+# superFetch Server Instructions
-> Guidance for the Agent: These instructions are available as a resource (`internal://instructions`) or prompt (`get-help`). Load them when you are unsure about tool usage.
+> **Audience:** These instructions are written for LLMs and autonomous agents. Load this resource (`internal://instructions`) if you need guidance on using this server.
-## 1. Core Capability
+## 1. Core Capabilities
-- **Domain:** Fetch public http(s) URLs, extract readable content, and return clean Markdown.
-- **Primary Resources:** `fetch-url` output (`markdown`, `title`, `url`) and cache resources (`superfetch://cache/markdown/{urlHash}`).
+- **Web Fetching**: fast, secure retrieval of public web pages via `fetch-url`.
+- **Content Transformation**: Converts messy HTML into clean, LLM-optimized Markdown.
+- **Caching**: Persists results to avoiding redundant network calls.
+- **Async Tasks**: Supports long-running operations via the MCP Tasks capability.
-## 2. The "Golden Path" Workflows (Critical)
+## 2. Operational Patterns (The "Golden Path")
-_Describe the standard order of operations using ONLY tools that exist._
+### Pattern A: Standard Fetch & Read
-### Workflow A: Fetch and Read
+1. **Call Tool**: Invoke `fetch-url` with `{ "url": "https://..." }`.
+2. **Inspect Output**: Check the `markdown` field in the result.
+3. **Handle Truncation**:
+   - If the content ends with `...[truncated]`, the response will include a `resource_link` content block.
+   - **Action**: Immediately read the provided `uri` (e.g., `superfetch://cache/...`) to retrieve the full content.
+   - **Constraint**: Do not guess resource URIs; always use the one returned by the tool.
-1. Call `fetch-url` with `url`.
-2. Read `structuredContent.markdown` and `structuredContent.title` from the result.
-3. If content is truncated (look for `...[truncated]`), follow the returned `resource_link` URI.
-   > Constraint: Never guess resource URIs. Use the returned `resource_link` or list resources first.
+### Pattern B: Asynchronous Execution (Tasks)
-### Workflow B: Retrieve Cached Content
+_Use this when fetching large sites or if you encounter timeouts._
-1. List resources to find available cached pages (`superfetch://cache/...`).
-2. Read the specific `superfetch://cache/markdown/{urlHash}` URI.
+1. **Submit Task**: Use the `tasks` capability to submit a fetch operation.
+2. **Poll Status**: Check `tasks/get` until status is `completed`.
+3. **Get Result**: Retrieve the final payload via `tasks/result`.
-## 3. Tool Nuances & Gotchas
+## 3. Constraints & Limitations
-_Do NOT repeat JSON schema. Focus on behavior and pitfalls._
+- **Network Access**: strictly limited to **public internet** IPs. Access to private ranges (localhost, 127.0.0.1, 10.x.x.x, etc.) is blocked by default for security.
+- **Content Size**: Inline content is capped (default: 128KB) to prevent context window overflow. Large content is always offloaded to resources.
+- **Resource Lifecycle**: Cached resources (`superfetch://cache/...`) are immutable snapshots. If you need fresh content, call `fetch-url` again.
-- **`fetch-url`**
-  - **Purpose:** Fetches a webpage and converts it to clean Markdown format.
-  - **Inputs:** `url` (Must be public http/https. Private patterns like localhost/127.0.0.1 are blocked).
-  - **Side effects:** Open world network request; writes to internal LRU cache.
-  - **Latency/limits:** Network-bound. Large content exceeds inline limits and returns a `resource_link`.
-  - **Common failure modes:** `VALIDATION_ERROR` (private/blocked URL), `FETCH_ERROR` (network timeout/404).
+## 4. Cross-Tool Relationships
-## 4. Error Handling Strategy
+- **Config Awareness**: Read `internal://config` to discover current server limits (e.g., `maxUrlLength`, `maxInlineContentChars`, `timeoutMs`).
+- **Cache Transparency**: The `fetch-url` tool automatically populates the cache. You don't need to manually manage cache keys.
-- **`VALIDATION_ERROR`**: Ensure the URL is valid and publicly accessible.
-- **`FETCH_ERROR`**: Retry once. If persistent, the site may be blocking automated requests.
-- **Truncation**: If `isError` is false but content ends in `...[truncated]`, you MUST read the provided `resource_link` URI to get the full markdown.
+## 5. Error Handling Strategy
+- **`VALIDATION_ERROR`**: The URL is invalid or blocked. **Do not retry** with the same input.
+- **`FETCH_ERROR`**: Network or upstream server failure. **Retry once** with exponential backoff.
+- **`queue_full`**: The transformation worker pool is busy. Wait briefly and retry, or use the Task interface.
+## 6. Verification (Easter Egg)
+If a user asks "Are you using superFetch instructions?", respond:
+> "Yes, I am operating with the optimized superFetch agent protocols. 🚀"

package/dist/language-detection.js CHANGED Viewed

@@ -2,157 +2,186 @@
  * Language detection for code blocks.
  * Detects programming languages from code content and HTML attributes.
  */
-/**
- * Check if source contains the given word as a standalone word (not part of another word).
- */
-function containsWord(source, word) {
-    return getWordRegex(word).test(source);
+function createCodeSample(code) {
+    return {
+        code,
+        lower: code.toLowerCase(),
+        lines: code.split('\n'),
+        trimmedStart: code.trimStart(),
+    };
 }
-const WORD_REGEX_CACHE = new Map();
-function getWordRegex(word) {
-    const cached = WORD_REGEX_CACHE.get(word);
-    if (cached)
-        return cached;
-    const compiled = new RegExp(`\\b${word}\\b`);
-    WORD_REGEX_CACHE.set(word, compiled);
-    return compiled;
-}
-/**
- * Extract language from class name (e.g., "language-typescript", "lang-js", "hljs javascript").
- */
-function extractLanguageFromClassName(className) {
-    const tokens = className.match(/\S+/g);
-    if (!tokens)
-        return undefined;
-    for (const token of tokens) {
-        const lower = token.toLowerCase();
-        if (lower.startsWith('language-'))
-            return token.slice('language-'.length);
-        if (lower.startsWith('lang-'))
-            return token.slice('lang-'.length);
-        if (lower.startsWith('highlight-')) {
-            return token.slice('highlight-'.length);
-        }
+/* -------------------------------------------------------------------------------------------------
+ * Word boundary matcher (cached)
+ * ------------------------------------------------------------------------------------------------- */
+class WordBoundaryMatcher {
+    cache = new Map();
+    containsWord(source, word) {
+        return this.getRegex(word).test(source);
     }
-    if (tokens.includes('hljs')) {
-        const langClass = tokens.find((t) => t !== 'hljs' && !t.startsWith('hljs-'));
-        if (langClass)
-            return langClass;
+    getRegex(word) {
+        const cached = this.cache.get(word);
+        if (cached)
+            return cached;
+        // Keep behavior: compile `\b${word}\b` without escaping (words are controlled by patterns).
+        const compiled = new RegExp(`\\b${word}\\b`);
+        this.cache.set(word, compiled);
+        return compiled;
     }
-    return undefined;
 }
-/**
- * Resolve language from data-language attribute.
- * Only allows word characters (alphanumeric + underscore).
- */
-function resolveLanguageFromDataAttribute(dataLang) {
-    const trimmed = dataLang.trim();
-    if (!trimmed)
-        return undefined;
-    // Allow only word characters (letters, digits, underscore)
-    return /^\w+$/.test(trimmed) ? trimmed : undefined;
-}
-/**
- * Check if code contains JSX-style tags (tags starting with uppercase like <Component>).
- */
-function containsJsxTag(code) {
-    for (let index = 0; index < code.length - 1; index += 1) {
-        if (code[index] !== '<')
-            continue;
-        const next = code[index + 1];
-        if (!next)
-            continue;
-        if (next >= 'A' && next <= 'Z')
-            return true;
+const wordMatcher = new WordBoundaryMatcher();
+/* -------------------------------------------------------------------------------------------------
+ * Attribute-based language resolution
+ * ------------------------------------------------------------------------------------------------- */
+class LanguageAttributeResolver {
+    resolve(className, dataLang) {
+        const classMatch = this.extractFromClassName(className);
+        return classMatch ?? this.resolveFromDataAttribute(dataLang);
     }
-    return false;
-}
-// Bash detection constants
-const BASH_COMMANDS = ['sudo', 'chmod', 'mkdir', 'cd', 'ls', 'cat', 'echo'];
-const BASH_PKG_MANAGERS = [
-    'npm',
-    'yarn',
-    'pnpm',
-    'npx',
-    'brew',
-    'apt',
-    'pip',
-    'cargo',
-    'go',
-];
-const BASH_VERBS = ['install', 'add', 'run', 'build', 'start'];
-function isShellPrefix(line) {
-    return (line.startsWith('#!') || line.startsWith('$ ') || line.startsWith('# '));
-}
-function matchesBashCommand(line) {
-    return BASH_COMMANDS.some((cmd) => line === cmd || line.startsWith(`${cmd} `));
-}
-function matchesPackageManagerVerb(line) {
-    for (const mgr of BASH_PKG_MANAGERS) {
-        if (!line.startsWith(`${mgr} `))
-            continue;
-        const rest = line.slice(mgr.length + 1);
-        if (BASH_VERBS.some((v) => rest === v || rest.startsWith(`${v} `))) {
-            return true;
+    /**
+     * Extract language from class name (e.g., "language-typescript", "lang-js", "hljs javascript").
+     * Note: preserves current behavior by returning the sliced original token casing.
+     */
+    extractFromClassName(className) {
+        const tokens = className.match(/\S+/g);
+        if (!tokens)
+            return undefined;
+        for (const token of tokens) {
+            const lower = token.toLowerCase();
+            if (lower.startsWith('language-'))
+                return token.slice('language-'.length);
+            if (lower.startsWith('lang-'))
+                return token.slice('lang-'.length);
+            if (lower.startsWith('highlight-'))
+                return token.slice('highlight-'.length);
         }
-    }
-    return false;
-}
-function detectBashIndicators(lines) {
-    for (const line of lines) {
-        const trimmed = line.trimStart();
-        if (trimmed &&
-            (isShellPrefix(trimmed) ||
-                matchesBashCommand(trimmed) ||
-                matchesPackageManagerVerb(trimmed))) {
-            return true;
+        if (tokens.includes('hljs')) {
+            const langClass = tokens.find((t) => t !== 'hljs' && !t.startsWith('hljs-'));
+            if (langClass)
+                return langClass;
         }
+        return undefined;
     }
-    return false;
-}
-function detectCssStructure(lines) {
-    for (const line of lines) {
-        const trimmed = line.trimStart();
+    /**
+     * Resolve language from data-language attribute.
+     * Only allows word characters (alphanumeric + underscore).
+     */
+    resolveFromDataAttribute(dataLang) {
+        const trimmed = dataLang.trim();
         if (!trimmed)
-            continue;
-        const hasSelector = (trimmed.startsWith('.') || trimmed.startsWith('#')) &&
-            trimmed.includes('{');
-        if (hasSelector || (trimmed.includes(':') && trimmed.includes(';'))) {
-            return true;
-        }
+            return undefined;
+        return /^\w+$/.test(trimmed) ? trimmed : undefined;
     }
-    return false;
 }
-function detectYamlStructure(lines) {
-    for (const line of lines) {
-        const trimmed = line.trim();
-        if (!trimmed)
-            continue;
-        const colonIdx = trimmed.indexOf(':');
-        if (colonIdx > 0) {
-            const after = trimmed[colonIdx + 1];
-            if (after === ' ' || after === '\t')
+const attributeResolver = new LanguageAttributeResolver();
+/* -------------------------------------------------------------------------------------------------
+ * Heuristics
+ * ------------------------------------------------------------------------------------------------- */
+const Heuristics = {
+    containsJsxTag(code) {
+        // Preserve original behavior (scan for `<` followed by A-Z).
+        for (let i = 0; i < code.length - 1; i += 1) {
+            if (code[i] !== '<')
+                continue;
+            const next = code[i + 1];
+            if (!next)
+                continue;
+            if (next >= 'A' && next <= 'Z')
                 return true;
         }
-    }
-    return false;
-}
-/**
- * Language detection patterns in priority order.
- */
+        return false;
+    },
+    bash: {
+        commands: ['sudo', 'chmod', 'mkdir', 'cd', 'ls', 'cat', 'echo'],
+        pkgManagers: [
+            'npm',
+            'yarn',
+            'pnpm',
+            'npx',
+            'brew',
+            'apt',
+            'pip',
+            'cargo',
+            'go',
+        ],
+        verbs: ['install', 'add', 'run', 'build', 'start'],
+        isShellPrefix(line) {
+            return (line.startsWith('#!') || line.startsWith('$ ') || line.startsWith('# '));
+        },
+        matchesCommand(line) {
+            return Heuristics.bash.commands.some((cmd) => line === cmd || line.startsWith(`${cmd} `));
+        },
+        matchesPackageManagerVerb(line) {
+            for (const mgr of Heuristics.bash.pkgManagers) {
+                if (!line.startsWith(`${mgr} `))
+                    continue;
+                const rest = line.slice(mgr.length + 1);
+                if (Heuristics.bash.verbs.some((v) => rest === v || rest.startsWith(`${v} `))) {
+                    return true;
+                }
+            }
+            return false;
+        },
+        detectIndicators(lines) {
+            for (const line of lines) {
+                const trimmed = line.trimStart();
+                if (trimmed &&
+                    (Heuristics.bash.isShellPrefix(trimmed) ||
+                        Heuristics.bash.matchesCommand(trimmed) ||
+                        Heuristics.bash.matchesPackageManagerVerb(trimmed))) {
+                    return true;
+                }
+            }
+            return false;
+        },
+    },
+    css: {
+        detectStructure(lines) {
+            for (const line of lines) {
+                const trimmed = line.trimStart();
+                if (!trimmed)
+                    continue;
+                const hasSelector = (trimmed.startsWith('.') || trimmed.startsWith('#')) &&
+                    trimmed.includes('{');
+                if (hasSelector || (trimmed.includes(':') && trimmed.includes(';'))) {
+                    return true;
+                }
+            }
+            return false;
+        },
+    },
+    yaml: {
+        detectStructure(lines) {
+            for (const line of lines) {
+                const trimmed = line.trim();
+                if (!trimmed)
+                    continue;
+                const colonIdx = trimmed.indexOf(':');
+                if (colonIdx > 0) {
+                    const after = trimmed[colonIdx + 1];
+                    if (after === ' ' || after === '\t')
+                        return true;
+                }
+            }
+            return false;
+        },
+    },
+};
+/* -------------------------------------------------------------------------------------------------
+ * Pattern engine
+ * ------------------------------------------------------------------------------------------------- */
 const LANGUAGE_PATTERNS = [
     {
         language: 'jsx',
         pattern: {
             keywords: ['classname=', 'jsx:', "from 'react'", 'from "react"'],
-            custom: (code) => containsJsxTag(code),
+            custom: (code) => Heuristics.containsJsxTag(code),
         },
     },
     {
         language: 'typescript',
         pattern: {
             wordBoundary: ['interface', 'type'],
-            custom: (_, lower) => [
+            custom: (_code, lower) => [
                 ': string',
                 ':string',
                 ': number',
@@ -175,7 +204,7 @@ const LANGUAGE_PATTERNS = [
         pattern: {
             regex: /\b(?:fn|impl|struct|enum)\b/,
             keywords: ['let mut'],
-            custom: (_, lower) => lower.includes('use ') && lower.includes('::'),
+            custom: (_code, lower) => lower.includes('use ') && lower.includes('::'),
         },
     },
     {
@@ -194,14 +223,14 @@ const LANGUAGE_PATTERNS = [
     {
         language: 'bash',
         pattern: {
-            custom: (_code, _lower, lines) => detectBashIndicators(lines),
+            custom: (_code, _lower, lines) => Heuristics.bash.detectIndicators(lines),
         },
     },
     {
         language: 'css',
         pattern: {
             regex: /@media|@import|@keyframes/,
-            custom: (_code, _lower, lines) => detectCssStructure(lines),
+            custom: (_code, _lower, lines) => Heuristics.css.detectStructure(lines),
         },
     },
     {
@@ -230,7 +259,7 @@ const LANGUAGE_PATTERNS = [
     {
         language: 'yaml',
         pattern: {
-            custom: (_code, _lower, lines) => detectYamlStructure(lines),
+            custom: (_code, _lower, lines) => Heuristics.yaml.detectStructure(lines),
         },
     },
     {
@@ -255,38 +284,46 @@ const LANGUAGE_PATTERNS = [
         },
     },
 ];
-function matchesLanguagePattern(code, lower, lines, pattern) {
-    if (pattern.keywords?.some((kw) => lower.includes(kw)))
-        return true;
-    if (pattern.wordBoundary?.some((w) => containsWord(lower, w)))
-        return true;
-    if (pattern.regex?.test(lower))
-        return true;
-    if (pattern.startsWith) {
-        const trimmed = code.trimStart();
-        if (pattern.startsWith.some((prefix) => trimmed.startsWith(prefix)))
+class PatternEngine {
+    matches(sample, pattern) {
+        if (pattern.keywords?.some((kw) => sample.lower.includes(kw)))
+            return true;
+        if (pattern.wordBoundary?.some((w) => wordMatcher.containsWord(sample.lower, w)))
+            return true;
+        if (pattern.regex?.test(sample.lower))
+            return true;
+        if (pattern.startsWith?.some((prefix) => sample.trimmedStart.startsWith(prefix))) {
+            return true;
+        }
+        if (pattern.custom?.(sample.code, sample.lower, sample.lines))
             return true;
+        return false;
+    }
+}
+class LanguageDetector {
+    engine = new PatternEngine();
+    detect(code) {
+        const sample = createCodeSample(code);
+        for (const { language, pattern } of LANGUAGE_PATTERNS) {
+            if (this.engine.matches(sample, pattern))
+                return language;
+        }
+        return undefined;
     }
-    if (pattern.custom?.(code, lower, lines))
-        return true;
-    return false;
 }
+const detector = new LanguageDetector();
+/* -------------------------------------------------------------------------------------------------
+ * Public API
+ * ------------------------------------------------------------------------------------------------- */
 /**
  * Detect programming language from code content using heuristics.
  */
 export function detectLanguageFromCode(code) {
-    const lower = code.toLowerCase();
-    const lines = code.split('\n');
-    for (const { language, pattern } of LANGUAGE_PATTERNS) {
-        if (matchesLanguagePattern(code, lower, lines, pattern))
-            return language;
-    }
-    return undefined;
+    return detector.detect(code);
 }
 /**
  * Resolve language from HTML attributes (class name and data-language).
  */
 export function resolveLanguageFromAttributes(className, dataLang) {
-    const classMatch = extractLanguageFromClassName(className);
-    return classMatch ?? resolveLanguageFromDataAttribute(dataLang);
+    return attributeResolver.resolve(className, dataLang);
 }