npm - @antodevs/groundtruth - Versions diffs - 0.1.4 → 0.2.1 - Mend

@antodevs/groundtruth 0.1.4 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17) hide show

package/README.md CHANGED Viewed

@@ -1,5 +1,3 @@
-![GroundTruth Banner](assets/banner.png)
 # GroundTruth
 > Zero-configuration context injection layer for LLM-based coding agents.
@@ -43,6 +41,13 @@ Current-generation AI coding assistants (Claude Code, Antigravity, Cursor) suffe
 **GroundTruth** acts as a transparent middleware layer that resolves this by dynamically injecting real-time, stack-specific documentation directly into the agent's context window prior to inference.
+### The v0.2.0 Engine: Jina Reader & Source Registry
+GroundTruth v0.2.0 introduces a massive upgrade to content quality:
+- **Jina Reader API Integration**: Parses dynamic, JavaScript-rendered SPAs (like Vercel AI SDK, Next.js, and Svelte docs) into clean, LLM-optimized Markdown.
+- **Smart Source Registry**: Automatically bypasses search engines for the top 20+ frameworks (React, Svelte, Vue, Astro, etc.) and fetches their official documentation directly.
+- **Readability Fallback**: Ensures reliable extraction even if the primary engine fails.
 ---
 ## Architecture & Operational Mechanics
@@ -56,22 +61,18 @@ In this mode, GroundTruth provisions a local HTTP proxy that intercepts outbound
 ```mermaid
 sequenceDiagram
     participant Agent as Claude Code
-    participant Proxy as GroundTruth Proxy
-    participant Search as DuckDuckGo
+    participant Proxy as GroundTruth
+    participant Jina as Jina Reader API
     participant API as Anthropic API
-    Agent->>Proxy: Send Prompt (POST /v1/messages)
-    Proxy->>Search: Extract stack/query & scrape live docs
-    Search-->>Proxy: Return real-time web context
+    Agent->>Proxy: Send Prompt
+    Proxy->>Jina: Fetch docs (Direct Registry / DDG)
+    Jina-->>Proxy: Return clean Markdown
     Note over Proxy: Injects live context<br/>into System Prompt
     Proxy->>API: Forward mutated request
-    API-->>Agent: Return completion with fresh knowledge
+    API-->>Agent: Return response
 ```
-- **Query Extraction**: Parses the user prompt to identify context dependencies.
-- **Data Hydration**: Orchestrates an automated DuckDuckGo search to fetch the most recent documentation. It relies on a deterministic `LRUCache`, TCP keep-alive Pool configurations, and a 429-aware `CircuitBreaker` pattern to safeguard network operations safely.
-- **Payload Mutation**: Mutates the outgoing system prompt to inject the scraped live context before forwarding the request to the Anthropic completion endpoint. (It includes type-guard structures making it safe from undocumented Gemini system changes).
 ### 2. File Watcher Mode (Designed for `antigravity` / `gemini`)
 For agents that support side-channel context ingestion via dotfiles (like Antigravity Rules), GroundTruth runs as a background daemon.
@@ -79,42 +80,33 @@ For agents that support side-channel context ingestion via dotfiles (like Antigr
 ```mermaid
 flowchart TD
     pkg([package.json]) -->|Parse Dependencies| GT{GroundTruth Watcher}
-    GT -->|Search Stack Queries| DDG[(DuckDuckGo)]
-    DDG -->|Return Live Docs| GT
-    GT -->|Sync periodically| File[~/.gemini/GEMINI.md]
-    File -->|Auto-loaded| Agent(Antigravity Agent)
-    Agent -->|Execute| Prompt[Prompt with Fresh Context]
-    classDef core fill:#3B82F6,stroke:#fff,stroke-width:2px,color:#fff;
-    class GT,Agent core;
+    GT -->|Smart Routing| Map{Registry?}
+    Map -->|Yes| Jina[Jina Reader API]
+    Map -->|No| DDG[DuckDuckGo Search] --> Jina
+    Jina -->|Clean Markdown| Gen[Write to ~/.gemini/GEMINI.md]
+    Gen --> Agent(Coding Assistant)
 ```
-- **Stack Introspection**: Analyzes the local `package.json` to infer the project's dependency graph.
-- **Intelligent Chunking**: Groups the filtered dependencies in configurable size batches (default 3) and uniquely hashes them to avoid redundant context-fetching loops unless changes are detected.
-- **Automated Polling**: Periodically fetches updated documentation for the detected stack chunks in parallel.
-- **State Persistence**: Hashes are serialized persistently avoiding redundant DuckDuckGo scraping operations across application crashes.
-- **Block-Based Synchronization**: Writes the parsed context discretely into hash-oriented blocks inside `~/.gemini/GEMINI.md`. Native POSIX bindings and intra-device temporary files are leveraged ensuring `Atomic Writes` without EXDEV link errors. Stale contexts are efficiently garbage-collected via regex matching over tracked batch hashes.
 ---
-### Usage with Claude Code
-```bash
-# Initialize GroundTruth in proxy mode (auto-exports ANTHROPIC_BASE_URL)
-npx @antodevs/groundtruth --claude-code
-# Execute your agent in a separate TTY
-claude
-```
-> **Note:** The daemon automatically mutates your shell environment (`~/.zshrc`, `~/.bashrc`, `~/.bash_profile`, `~/.config/fish/config.fish`) to route traffic through the localhost proxy.
+## Configuration (`.groundtruth.json`)
-### Usage with Antigravity / Gemini
-```bash
-cd /workspace/your-project
+You can globally or locally configure GroundTruth by creating a `.groundtruth.json` file in your directory:
-# Initialize the daemon in file watcher mode
-npx @antodevs/groundtruth --antigravity
+```json
+{
+  "maxTokens": 4000,
+  "quality": "high",
+  "verbose": true,
+  "sources": [
+    { "url": "https://svelte.dev/docs/kit/introduction", "label": "SvelteKit Docs" }
+  ]
+}
 ```
-> **Note:** GroundTruth will continuously poll and sync documentation based on your `package.json` manifest.
+- **`maxTokens`**: The maximum length of characters injected for a single page.
+- **`quality`**: `low`, `medium`, or `high`. Controls how many search results to retrieve and the timeout budget.
+- **`sources`**: Useful for custom, internal, or highly specific documentation that GroundTruth should always inject.
 ---
@@ -124,24 +116,27 @@ npx @antodevs/groundtruth --antigravity
 |------|------|-------------|
 | `--claude-code` | Proxy | Initializes HTTP interceptor for Anthropic API payloads. |
 | `--antigravity` | Rules | Initializes background daemon for dotfile synchronization. |
-| `--use-package-json` | Both | Enforces AST/manifest parsing of `package.json` for query generation. |
+| `--uninstall` | Cleanup | Removes `ANTHROPIC_BASE_URL` from all shell config files. |
 | `--port <n>` | Proxy | Overrides default proxy listener port (Default: `8080`). |
+| `--quality <level>`| Both | `low`, `medium`, or `high` quality preset (Default: `medium`). |
+| `--max-tokens <n>` | Both | Modifies the character limit per injected context block (Default: `4000`). |
 | `--interval <n>` | Rules | Overrides the polling interval for documentation refresh in minutes (Default: `5`). |
-| `--batch-size <n>` | Rules | Changes the amount of dependencies per query chunk for block fetching (Default: `3`, Min: `2`, Max: `5`). |
+| `--batch-size <n>` | Rules | Changes the amount of dependencies per query chunk for block fetching. |
+| `--verbose` | Both | Enables verbose logging output. |
 ---
 ## Benchmark & Comparison
-GroundTruth is heavily optimized for zero-configuration deployments and minimal token overhead compared to existing MCP (Model Context Protocol) solutions.
+GroundTruth is optimized for zero-configuration deployments and minimal token overhead compared to existing MCP solutions.
-| Feature | GroundTruth | Brave MCP | Playwright MCP | Firecrawl |
-|---------|-------------|-----------|----------------|-----------|
-| **Authentication** | None Required | API Key | None Required | API Key |
-| **Token Overhead** | ~500 tokens | ~800 tokens | ~13,000 tokens | ~800 tokens |
-| **Antigravity Support** | Native | Unsupported | Unsupported | Unsupported |
-| **Runtime Footprint** | < 1MB | < 1MB | ~200MB | < 1MB |
-| **Shell Auto-config** | Automated | Manual | Manual | Manual |
+| Feature | GroundTruth | Jina Reader (Direct) | Crawl4AI / Playwright | Firecrawl |
+|---------|-------------|----------------------|-----------------------|-----------|
+| **Setup Required** | None (1 command) | Scripting needed | High (Docker/Deps) | High (API Key) |
+| **JS Rendering** | ✅ Yes (via Jina) | ✅ Yes | ✅ Yes | ✅ Yes |
+| **Agent Injection** | ✅ Auto (Proxy/File) | ❌ Manual integration | ❌ Manual integration | ❌ Manual integration |
+| **Cost** | Free | Rate limits apply | Free | Paid |
+| **Runtime Footprint** | < 1MB | N/A | ~200MB | N/A |
 ---

package/index.js CHANGED Viewed

@@ -4,14 +4,18 @@
  * @description Entry point runtime groundtruth delegazione CLI o proxy flow logic.
  */
 import { chalk, label } from './src/logger.js';
-import { usePackageJson, antigravityMode, claudeCodeMode, port, intervalMinutes, batchSize, version } from './src/cli.js';
+import { usePackageJson, antigravityMode, claudeCodeMode, uninstallMode, port, intervalMinutes, batchSize, version } from './src/cli.js';
 import { createServer } from './src/proxy.js';
-import { autoSetEnv } from './src/env.js';
+import { autoSetEnv, removeEnv } from './src/env.js';
 import { startWatcher } from './src/watcher.js';
 // ─── Dispatcher start app logic ──────────────────────
-if (antigravityMode) {
+if (uninstallMode) {
+  console.log(`\n  ${chalk.white.bold('GroundTruth')}  ${chalk.gray(`v${version}`)}  ${chalk.gray('[uninstall]')}\n`);
+  await removeEnv();
+  process.exit(0);
+} else if (antigravityMode) {
   startWatcher({ intervalMinutes, usePackageJson, batchSize });
 } else if (claudeCodeMode) {
   const server = await createServer(usePackageJson);

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@antodevs/groundtruth",
-  "version": "0.1.4",
+  "version": "0.2.1",
   "description": "Lightweight Node.js proxy to intercept API requests from coding agents and inject fresh web context",
   "type": "module",
   "license": "MIT",

package/specification.yaml CHANGED Viewed

@@ -1,5 +1,5 @@
 name: GroundTruth
-version: 0.1.0
+version: 0.1.4
 description: |
   GroundTruth is a zero-configuration, transparent middleware context injection layer.
   It is designed to bridge the deterministic knowledge cutoff gap of LLM-based coding agents

package/src/circuit-breaker.js CHANGED Viewed

@@ -35,7 +35,7 @@ export class CircuitBreaker {
             this.onSuccess();
             return result;
         } catch (err) {
-            this.onFailure();
+            this.onFailure(err);
             throw err;
         }
     }
@@ -53,9 +53,15 @@ export class CircuitBreaker {
         }
     }
-    onFailure() {
-        this.failures++;
+    onFailure(err) {
+        // 429 rate limit apre il circuito immediatamente
+        if (err?.message?.includes('429')) {
+            this.failures = this.failureThreshold;
+        } else {
+            this.failures++;
+        }
         this.lastFailureTime = Date.now();
+        this.halfOpenSuccesses = 0; // reset per evitare accumulo tra cicli HALF_OPEN
         if (this.failures >= this.failureThreshold) {
             this.state = 'OPEN';
         }

package/src/cli.js CHANGED Viewed

@@ -4,6 +4,7 @@
  */
 import { chalk } from './logger.js';
 import { createRequire } from 'module';
+import { loadConfig, resolveQuality } from './config.js';
 const { version } = createRequire(import.meta.url)('../package.json');
@@ -13,21 +14,29 @@ const args = process.argv.slice(2);
 const usePackageJson = args.includes('--use-package-json');
 const antigravityMode = args.includes('--antigravity');
 const claudeCodeMode = args.includes('--claude-code');
+const uninstallMode = args.includes('--uninstall');
 // Stop immediato se nessun mode definito
-if (!antigravityMode && !claudeCodeMode) {
+if (!antigravityMode && !claudeCodeMode && !uninstallMode) {
     console.log();
     console.log(`  ${chalk.white.bold('GroundTruth')}  ${chalk.gray(`v${version}`)}`);
     console.log();
     console.log(`  Usage:`);
     console.log(`    groundtruth --claude-code       proxy mode (Claude Code)`);
     console.log(`    groundtruth --antigravity       rules mode (Antigravity/Gemini)`);
+    console.log(`    groundtruth --uninstall         remove shell env config`);
     console.log();
     console.log(`  Options:`);
     console.log(`    --use-package-json   use package.json as search query`);
     console.log(`    --port <n>           custom port, default 8080 (claude-code only)`);
     console.log(`    --interval <n>       refresh in minutes, default 5 (antigravity only)`);
     console.log(`    --batch-size <n>     deps per search batch (default: 3)`);
+    console.log(`    --max-tokens <n>     max tokens per context block (default: 4000)`);
+    console.log(`    --quality <level>    low | medium | high (default: medium)`);
+    console.log(`    --verbose            enable detailed extraction logging`);
+    console.log();
+    console.log(`  Config:`);
+    console.log(`    Place a .groundtruth.json in your project root for persistent settings.`);
     console.log();
     console.log(`  Docs:`);
     console.log(`    Claude Code   →  export ANTHROPIC_BASE_URL=http://localhost:8080`);
@@ -38,13 +47,13 @@ if (!antigravityMode && !claudeCodeMode) {
 // ─── Default params override ─────────────────────────
-let port = 8080; // Default Anthropic proxy
+let port = 8080;
 const portArgIndex = args.indexOf('--port');
 if (portArgIndex !== -1 && args[portArgIndex + 1]) {
     port = parseInt(args[portArgIndex + 1], 10);
 }
-let intervalMinutes = 5; // Default context refresh
+let intervalMinutes = 5;
 const intervalArgIndex = args.indexOf('--interval');
 if (intervalArgIndex !== -1 && args[intervalArgIndex + 1]) {
     intervalMinutes = parseInt(args[intervalArgIndex + 1], 10) || 5;
@@ -55,4 +64,32 @@ const batchSize = batchSizeIndex !== -1
     ? Math.max(2, Math.min(parseInt(args[batchSizeIndex + 1]) || 3, 5))
     : 3;
-export { args, usePackageJson, antigravityMode, claudeCodeMode, port, intervalMinutes, batchSize, version };
+// ─── New v1.2 flags ──────────────────────────────────
+const maxTokensIndex = args.indexOf('--max-tokens');
+const cliMaxTokens = maxTokensIndex !== -1
+    ? Math.max(500, Math.min(parseInt(args[maxTokensIndex + 1]) || 4000, 8000))
+    : null;
+const qualityIndex = args.indexOf('--quality');
+const cliQuality = qualityIndex !== -1 && ['low', 'medium', 'high'].includes(args[qualityIndex + 1])
+    ? args[qualityIndex + 1]
+    : null;
+const cliVerbose = args.includes('--verbose');
+// ─── Merge CLI + .groundtruth.json ───────────────────
+const fileConfig = await loadConfig();
+const maxTokens = cliMaxTokens ?? fileConfig.maxTokens;
+const quality = cliQuality ?? fileConfig.quality;
+const verbose = cliVerbose || fileConfig.verbose;
+const qualitySettings = resolveQuality(quality);
+const customSources = fileConfig.sources;
+export {
+    args, usePackageJson, antigravityMode, claudeCodeMode, uninstallMode,
+    port, intervalMinutes, batchSize, version,
+    maxTokens, quality, qualitySettings, verbose, customSources
+};

package/src/config.js ADDED Viewed

@@ -0,0 +1,67 @@
+/**
+ * @module config
+ * @description Carica configurazione opzionale da .groundtruth.json nella cwd.
+ */
+import { readFile } from 'fs/promises';
+import { existsSync } from 'fs';
+import path from 'path';
+// ─── Quality Presets ─────────────────────────────────
+const QUALITY_PRESETS = {
+    low: { ddgResults: 1, charsPerPage: 2000, jinaTimeout: 5000 },
+    medium: { ddgResults: 3, charsPerPage: 4000, jinaTimeout: 8000 },
+    high: { ddgResults: 5, charsPerPage: 8000, jinaTimeout: 12000 },
+};
+/**
+ * @description Risolve preset quality da stringa a parametri operativi.
+ * @param   {string} level - "low" | "medium" | "high"
+ * @returns {Object} { ddgResults, charsPerPage, jinaTimeout }
+ */
+export function resolveQuality(level) {
+    return QUALITY_PRESETS[level] || QUALITY_PRESETS.medium;
+}
+// ─── Config Defaults ─────────────────────────────────
+const DEFAULTS = {
+    maxTokens: 4000,
+    quality: 'medium',
+    verbose: false,
+    sources: [],
+};
+/**
+ * @description Carica .groundtruth.json dalla cwd, merge con defaults.
+ * @returns {Promise<Object>} Configurazione finale mergiata
+ */
+export async function loadConfig() {
+    const configPath = path.resolve(process.cwd(), '.groundtruth.json');
+    if (!existsSync(configPath)) return { ...DEFAULTS };
+    try {
+        const raw = await readFile(configPath, 'utf8');
+        const parsed = JSON.parse(raw);
+        return {
+            maxTokens: clamp(parsed.maxTokens ?? DEFAULTS.maxTokens, 500, 8000),
+            quality: ['low', 'medium', 'high'].includes(parsed.quality) ? parsed.quality : DEFAULTS.quality,
+            verbose: typeof parsed.verbose === 'boolean' ? parsed.verbose : DEFAULTS.verbose,
+            sources: Array.isArray(parsed.sources) ? parsed.sources.filter(s => s && s.url) : DEFAULTS.sources,
+        };
+    } catch {
+        return { ...DEFAULTS };
+    }
+}
+/**
+ * @description Clamp numerico con min/max bounds.
+ */
+function clamp(val, min, max) {
+    const n = parseInt(val, 10);
+    if (isNaN(n)) return min;
+    return Math.max(min, Math.min(n, max));
+}
+export { QUALITY_PRESETS };

package/src/env.js CHANGED Viewed

@@ -118,3 +118,43 @@ export async function autoSetEnv(p) {
         log(LOG_WARN, chalk.yellow, chalk.white('env setup error') + `  →  ${chalk.yellow(err.message)}`);
     }
 }
+/**
+ * @description Rimuove ANTHROPIC_BASE_URL da tutti i file di configurazione shell.
+ * @returns {Promise<void>}
+ */
+export async function removeEnv() {
+    const homeDir = os.homedir();
+    const targets = [
+        { file: path.join(homeDir, '.zshrc'), pattern: /^export ANTHROPIC_BASE_URL=.*\n?/gm },
+        { file: path.join(homeDir, '.bashrc'), pattern: /^export ANTHROPIC_BASE_URL=.*\n?/gm },
+        { file: path.join(homeDir, '.bash_profile'), pattern: /^export ANTHROPIC_BASE_URL=.*\n?/gm },
+        { file: path.join(homeDir, '.profile'), pattern: /^export ANTHROPIC_BASE_URL=.*\n?/gm },
+        { file: path.join(homeDir, '.config', 'fish', 'config.fish'), pattern: /^set -gx ANTHROPIC_BASE_URL .*\n?/gm },
+    ];
+    let cleaned = 0;
+    for (const t of targets) {
+        if (!existsSync(t.file)) continue;
+        try {
+            const content = await fs.readFile(t.file, 'utf8');
+            const result = content.replace(t.pattern, '').replace(/\n{3,}/g, '\n\n');
+            if (result !== content) {
+                await atomicWrite(t.file, result);
+                const rel = t.file.replace(homeDir, '~');
+                log(LOG_OK, chalk.green, chalk.white('removed ANTHROPIC_BASE_URL from') + ' ' + chalk.white(rel));
+                cleaned++;
+            }
+        } catch (e) {
+            log(LOG_WARN, chalk.yellow, chalk.white(`cannot clean ${path.basename(t.file)}`) + `  →  ${chalk.yellow(e.message)}`);
+        }
+    }
+    if (cleaned === 0) {
+        log(LOG_WARN, chalk.yellow, chalk.white('nothing to clean') + `  →  ${chalk.yellow('no ANTHROPIC_BASE_URL found in shell configs')}`);
+    } else {
+        log(LOG_OK, chalk.green, chalk.white(`cleaned ${cleaned} file(s)`));
+    }
+    delete process.env.ANTHROPIC_BASE_URL;
+}

package/src/inject.js CHANGED Viewed

@@ -33,6 +33,7 @@ export async function injectBlock(filePath, content, blockId) {
     if (startIndex !== -1 && endIndex !== -1 && endIndex > startIndex) {
         fileContent = fileContent.slice(0, startIndex) + block + fileContent.slice(endIndex + endTag.length);
     } else {
+        fileContent = fileContent.trimEnd() + '\n\n' + block + '\n';
     }
     await atomicWrite(filePath, fileContent);

package/src/packages.js CHANGED Viewed

@@ -5,6 +5,7 @@
 import fs from 'fs/promises';
 import path from 'path';
 import { createHash } from 'crypto';
+import { chalk, log, LOG_WARN } from './logger.js';
 // ─── Logica Dipendenze ───────────────────────────────
@@ -41,7 +42,8 @@ export async function readPackageDeps() {
         selected = selected.concat(filterAndFormat(pkg.devDependencies));
         return selected.length > 0 ? selected : null;
-    } catch (_) {
+    } catch (err) {
+        log(LOG_WARN, chalk.yellow, chalk.white('package.json parse error') + `  →  ${chalk.yellow(err.message)}`);
         return null;
     }
 }

package/src/proxy.js CHANGED Viewed

@@ -8,6 +8,8 @@ import { webSearch } from './search.js';
 import { readPackageDeps, buildQuery } from './packages.js';
 import { chalk, log, LOG_WARN, LOG_BOLT } from './logger.js';
 import { httpsAgent } from './http-agent.js';
+import { sanitizeWebContent } from './sanitize.js';
+import { maxTokens, qualitySettings, verbose } from './cli.js';
 // ─── HTTP Node server daemon ─────────────────────────
@@ -93,14 +95,19 @@ export async function createServer(usePackageJson) {
             try {
                 if (!query || query.trim() === String(new Date().getFullYear())) throw new Error('Empty query');
                 // parallel load in proxy app process to boost response load
-                const { results, pageText } = await webSearch(query, true);
+                const { results, pageText } = await webSearch(query, true, {
+                    ddgResults: qualitySettings.ddgResults,
+                    maxLen: qualitySettings.charsPerPage,
+                    jinaTimeout: qualitySettings.jinaTimeout,
+                    verbose,
+                });
                 resultsCount = results.length;
                 contextBlock = `\n\n--- WEB CONTEXT (live, ${new Date().toISOString()}) ---\n`;
                 results.forEach((r, i) => {
-                    contextBlock += `${i + 1}. ${r.title}: ${r.snippet} (${r.url})\n`;
+                    contextBlock += `${i + 1}. ${r.title}: ${sanitizeWebContent(r.snippet, 500)} (${r.url})\n`;
                 });
-                if (pageText) contextBlock += `\nFULL TEXT:\n${pageText}\n`;
+                if (pageText) contextBlock += `\nFULL TEXT:\n${sanitizeWebContent(pageText, maxTokens)}\n`;
                 contextBlock += `--- END WEB CONTEXT ---\n`;
                 didInject = true;
             } catch (_) {

package/src/registry.js ADDED Viewed

@@ -0,0 +1,62 @@
+/**
+ * @module registry
+ * @description Mappa hardcodata dipendenza → URL docs ufficiale per bypass DDG su framework noti.
+ */
+// ─── Docs URL Registry ──────────────────────────────
+const DOCS_REGISTRY = {
+    'svelte': 'https://svelte.dev/docs/svelte/overview',
+    'sveltekit': 'https://svelte.dev/docs/kit/introduction',
+    'react': 'https://react.dev/reference/react',
+    'react-dom': 'https://react.dev/reference/react-dom',
+    'next': 'https://nextjs.org/docs',
+    'nextjs': 'https://nextjs.org/docs',
+    'vue': 'https://vuejs.org/api/',
+    'nuxt': 'https://nuxt.com/docs/api',
+    'angular': 'https://angular.dev/overview',
+    'astro': 'https://docs.astro.build/en/reference/configuration-reference/',
+    'tailwindcss': 'https://tailwindcss.com/docs',
+    'typescript': 'https://www.typescriptlang.org/docs/',
+    'express': 'https://expressjs.com/en/5x/api.html',
+    'fastify': 'https://fastify.dev/docs/latest/',
+    'hono': 'https://hono.dev/docs/',
+    'solid-js': 'https://docs.solidjs.com/',
+    'qwik': 'https://qwik.dev/docs/',
+    'remix': 'https://remix.run/docs/en/main',
+    'prisma': 'https://www.prisma.io/docs',
+    'drizzle-orm': 'https://orm.drizzle.team/docs/overview',
+    'three': 'https://threejs.org/docs/',
+    'zod': 'https://zod.dev/',
+    'trpc': 'https://trpc.io/docs',
+    'tanstack-query': 'https://tanstack.com/query/latest/docs/overview',
+};
+/**
+ * @description Normalizza nome dipendenza e cerca URL docs nel registry.
+ * @param   {string} depName - Nome dipendenza da package.json (es. "svelte 5.51" o "@sveltejs/kit")
+ * @returns {string|null} URL docs diretto o null se non trovato
+ */
+export function lookupRegistryUrl(depName) {
+    // Prende solo il nome senza versione ("svelte 5.51" → "svelte")
+    const name = depName.split(' ')[0].toLowerCase();
+    // Match diretto
+    if (DOCS_REGISTRY[name]) return DOCS_REGISTRY[name];
+    // Strip @scope/ prefix ("@sveltejs/kit" → "kit", ma usiamo mapping speciali)
+    if (name === '@sveltejs/kit') return DOCS_REGISTRY['sveltekit'];
+    if (name === 'next' || name === '@next/core') return DOCS_REGISTRY['next'];
+    // Generic scope strip
+    const stripped = name.startsWith('@') ? name.split('/')[1] : name;
+    if (DOCS_REGISTRY[stripped]) return DOCS_REGISTRY[stripped];
+    // Strip -js suffix ("solid-js" → "solid")
+    const noJs = stripped.replace(/-js$/, '');
+    if (noJs !== stripped && DOCS_REGISTRY[noJs]) return DOCS_REGISTRY[noJs];
+    return null;
+}
+export { DOCS_REGISTRY };

package/src/sanitize.js ADDED Viewed

@@ -0,0 +1,35 @@
+/**
+ * @module sanitize
+ * @description Sanitizzazione contenuto web contro prompt injection attacks.
+ */
+// Pattern noti di prompt injection che devono essere filtrati
+const DANGEROUS_PATTERNS = [
+    /ignore\s+(all\s+)?previous\s+instructions?/gi,
+    /disregard\s+(all\s+)?previous/gi,
+    /you\s+are\s+now\s+/gi,
+    /forget\s+(all\s+)?(your\s+)?instructions?/gi,
+    /new\s+instructions?\s*:/gi,
+    /system\s*prompt\s*:/gi,
+    /\[INST\]/gi,
+    /<\|im_start\|>/gi,
+    /<\|im_end\|>/gi,
+    /```system/gi,
+    /ASSISTANT:\s/gi,
+    /HUMAN:\s/gi,
+];
+/**
+ * @description Filtra pattern pericolosi di prompt injection dal testo web scrappato.
+ * @param   {string} text - Testo raw proveniente da web scraping
+ * @param   {number} maxLen - Lunghezza massima output (default 8000)
+ * @returns {string} Testo sanitizzato
+ */
+export function sanitizeWebContent(text, maxLen = 8000) {
+    if (!text || typeof text !== 'string') return '';
+    let cleaned = text;
+    for (const p of DANGEROUS_PATTERNS) {
+        cleaned = cleaned.replace(p, '[FILTERED]');
+    }
+    return cleaned.slice(0, maxLen);
+}

package/src/search.js CHANGED Viewed

@@ -1,6 +1,6 @@
 /**
  * @module search
- * @description Logica di scraping web su DuckDuckGo tramite cheerio e linkedom.
+ * @description Logica di scraping web: Jina Reader → fallback Readability, registry bypass, DDG search.
  */
 import fetch from 'node-fetch';
 import * as cheerio from 'cheerio';
@@ -9,26 +9,114 @@ import { DOMParser } from 'linkedom';
 import { searchCache } from './cache.js';
 import { CircuitBreaker } from './circuit-breaker.js';
 import { httpAgent, httpsAgent } from './http-agent.js';
+import { sanitizeWebContent } from './sanitize.js';
+import { lookupRegistryUrl } from './registry.js';
 // ─── Config & Cache ──────────────────────────────────
-// Evitiamo IP bans ruotando UA comuni in Chrome desktop
 const USER_AGENTS = [
     'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
     'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
     'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
 ];
-/**
- * @description Seleziona uno User-Agent rnd dall'array disponibile
- * @returns {string} Stringa di uno User Agent
- */
 function getRandomUA() {
     return USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];
 }
 const ddgCircuit = new CircuitBreaker({ failureThreshold: 3, resetTimeout: 30000 });
+// ─── Jina Reader + Readability Fallback ──────────────
+/**
+ * @description Fetch contenuto pagina: prima Jina Reader (JS rendering + markdown), poi fallback Readability.
+ * @param   {string} url         - URL della pagina
+ * @param   {string} userAgent   - UA per il fallback fetch
+ * @param   {Object} opts        - { jinaTimeout, maxLen, verbose }
+ * @returns {Promise<string>} Contenuto markdown/text estratto
+ */
+export async function fetchPageContent(url, userAgent, opts = {}) {
+    const { jinaTimeout = 8000, maxLen = 4000, verbose = false } = opts;
+    // ── Try Jina Reader API first ──
+    try {
+        const jinaRes = await fetch(`https://r.jina.ai/${url}`, {
+            signal: AbortSignal.timeout(jinaTimeout),
+            headers: { 'Accept': 'text/markdown', 'X-No-Cache': 'true' }
+        });
+        if (jinaRes.ok) {
+            const text = await jinaRes.text();
+            if (text && text.length > 200) {
+                if (verbose) console.log(`    [jina] ✓ ${url} → ${text.length} chars`);
+                return sanitizeWebContent(text.replace(/\s+/g, ' '), maxLen);
+            }
+        }
+    } catch (_) {
+        if (verbose) console.log(`    [jina] ✗ ${url} → fallback readability`);
+    }
+    // ── Fallback: fetch + Readability ──
+    try {
+        const pageRes = await fetch(url, {
+            signal: AbortSignal.timeout(5000),
+            headers: { 'User-Agent': userAgent },
+            agent: url.startsWith('https:') ? httpsAgent : httpAgent
+        });
+        if (pageRes.ok) {
+            const document = new DOMParser().parseFromString(await pageRes.text(), 'text/html');
+            let text = '';
+            try {
+                const article = new Readability(document).parse();
+                text = article?.textContent || '';
+            } catch (_) {
+                text = document.body?.textContent || '';
+            }
+            if (text) {
+                if (verbose) console.log(`    [readability] ✓ ${url} → ${text.length} chars`);
+                return sanitizeWebContent(text.replace(/\s+/g, ' '), maxLen);
+            }
+        }
+    } catch (_) { }
+    return '';
+}
+// ─── Registry Direct Fetch ───────────────────────────
+/**
+ * @description Fetch diretto dalle docs ufficiali per dipendenze nel registry.
+ * @param   {Array}  deps - Array di dipendenze ("svelte 5.51", "sveltekit 2.50")
+ * @param   {Object} opts - { jinaTimeout, maxLen, verbose }
+ * @returns {Promise<Object>} { registryText, coveredDeps }
+ */
+export async function registryFetch(deps, opts = {}) {
+    const { verbose = false } = opts;
+    const userAgent = getRandomUA();
+    let registryText = '';
+    const coveredDeps = new Set();
+    for (const dep of deps) {
+        const docUrl = lookupRegistryUrl(dep);
+        if (!docUrl) continue;
+        const depName = dep.split(' ')[0];
+        try {
+            const text = await fetchPageContent(docUrl, userAgent, opts);
+            if (text && text.length > 100) {
+                registryText += `\n### ${depName} (official docs)\n${text}\n`;
+                coveredDeps.add(dep);
+                if (verbose) console.log(`    [registry] ✓ ${depName} → ${docUrl}`);
+            }
+        } catch (_) {
+            if (verbose) console.log(`    [registry] ✗ ${depName} → fetch failed`);
+        }
+    }
+    return { registryText, coveredDeps };
+}
+// ─── DDG Search ──────────────────────────────────────
 /**
  * @description Decodifica link mascherati DuckDuckGo recuperando `uddg` querystring.
  * @param   {string} href - Url incapsulato proveniente da nodeDDG
@@ -46,13 +134,12 @@ export function resolveDDGUrl(href) {
 /**
  * @description Esegue chiamata http reale su node DDG.
- * @param   {string} query - Ricerca DDG formattata
+ * @param   {string} query       - Ricerca DDG formattata
+ * @param   {number} resultsLimit - Max risultati da ritornare
  * @returns {Promise<Object>} { results, userAgent }
- * @throws  {Error} Fallimento http DDG request
  */
-async function doSearch(query) {
+async function doSearch(query, resultsLimit = 3) {
     const userAgent = getRandomUA();
-    // Fetch DDG raw HTML search endpoint ignoring CSS/JS payloads
     const searchRes = await fetch(
         `https://html.duckduckgo.com/html/?q=${encodeURIComponent(query)}`,
         { signal: AbortSignal.timeout(5000), headers: { 'User-Agent': userAgent }, agent: httpsAgent }
@@ -70,21 +157,24 @@ async function doSearch(query) {
     });
     const seen = new Set();
-    results = results.filter(r => r.url && !seen.has(r.url) && seen.add(r.url)).slice(0, 3);
+    results = results.filter(r => r.url && !seen.has(r.url) && seen.add(r.url)).slice(0, resultsLimit);
     if (results.length === 0) throw new Error('No DDG results');
     return { results, userAgent };
 }
+// ─── Main Web Search ─────────────────────────────────
 /**
  * @description Punto d'accesso caching+retry orchestrator web.
- * @param   {string}  query    - Input utente di ricerca convertibile web
- * @param   {boolean} parallel - Promise.all fast per multiple page scraping
+ * @param   {string}  query        - Input utente di ricerca convertibile web
+ * @param   {boolean} parallel     - Promise.all fast per multiple page scraping
+ * @param   {Object}  opts         - { ddgResults, maxLen, jinaTimeout, verbose }
  * @returns {Promise<Object>} Oggetto risultati + pageText formattato str
  */
-export async function webSearch(query, parallel = false) {
-    const now = Date.now();
-    // In cache mode skip costose chiamate network
+export async function webSearch(query, parallel = false, opts = {}) {
+    const { ddgResults = 3, maxLen = 4000, jinaTimeout = 8000, verbose = false } = opts;
     const cached = searchCache.get(query);
     if (cached) {
         return { results: cached.results, pageText: cached.pageText };
@@ -92,62 +182,22 @@ export async function webSearch(query, parallel = false) {
     let results, userAgent;
     try {
-        const res = await ddgCircuit.execute(() => doSearch(query));
+        const res = await ddgCircuit.execute(() => doSearch(query, ddgResults));
         results = res.results;
         userAgent = res.userAgent;
     } catch (err) {
         throw err;
     }
+    const fetchOpts = { jinaTimeout, maxLen, verbose };
     let pageText = '';
-    // Se claude-code usa parallel mode; altrimenti solo primo link (antigravity)
     if (parallel) {
-        const pages = await Promise.all(results.map(async (r) => {
-            try {
-                const pageRes = await fetch(r.url, {
-                    signal: AbortSignal.timeout(5000),
-                    headers: { 'User-Agent': userAgent },
-                    agent: r.url.startsWith('https:') ? httpsAgent : httpAgent
-                });
-                if (pageRes.ok) {
-                    const document = new DOMParser().parseFromString(await pageRes.text(), 'text/html');
-                    let text = '';
-                    try {
-                        const article = new Readability(document).parse();
-                        text = article?.textContent || '';
-                    } catch (_) {
-                        text = document.body?.textContent || '';
-                    }
-                    if (text) return text.replace(/\s+/g, ' ').slice(0, 4000);
-                }
-            } catch (_) { // fail silenzioso parallelo tollerato per timeout link third-party
-            }
-            return '';
-        }));
+        const pages = await Promise.all(results.map(r => fetchPageContent(r.url, userAgent, fetchOpts)));
         pageText = pages.filter(Boolean).join('\n\n');
     } else {
-        try {
-            if (results[0]) {
-                const pageRes = await fetch(results[0].url, {
-                    signal: AbortSignal.timeout(5000), // node-fetch hang timeout catch
-                    headers: { 'User-Agent': userAgent },
-                    agent: results[0].url.startsWith('https:') ? httpsAgent : httpAgent
-                });
-                if (pageRes.ok) {
-                    const document = new DOMParser().parseFromString(await pageRes.text(), 'text/html');
-                    let text = '';
-                    try {
-                        const article = new Readability(document).parse();
-                        text = article?.textContent || '';
-                    } catch (_) {
-                        text = document.body?.textContent || '';
-                    }
-                    if (text) {
-                        pageText = text.replace(/\s+/g, ' ').slice(0, 4000);
-                    }
-                }
-            }
-        } catch (_) { // bypass errore url target: fallback al contesto vuoto
+        if (results[0]) {
+            pageText = await fetchPageContent(results[0].url, userAgent, fetchOpts);
         }
     }

package/src/state.js CHANGED Viewed

@@ -2,7 +2,8 @@
  * @module state
  * @description Persiste la memoria di antigravity prev-hash per fault tolleranza riavvii.
  */
-import { readFile, writeFile, mkdir } from 'fs/promises';
+import { readFile, mkdir } from 'fs/promises';
+import { atomicWrite } from './utils/atomic-write.js';
 import { existsSync } from 'fs';
 import path from 'path';
 import os from 'os';
@@ -33,5 +34,5 @@ export async function loadBatchState() {
 export async function saveBatchState(map) {
     await mkdir(STATE_DIR, { recursive: true });
     const obj = Object.fromEntries(map);
-    await writeFile(STATE_FILE, JSON.stringify(obj, null, 2), 'utf8');
+    await atomicWrite(STATE_FILE, JSON.stringify(obj, null, 2), { backup: false });
 }

package/src/watcher.js CHANGED Viewed

@@ -1,14 +1,15 @@
 /**
  * @module watcher
- * @description Timer poll di Antigravity update locale skill inject doc rules, ora con caching a batch blocchi separati.
+ * @description Timer poll di Antigravity update locale skill inject doc rules, con registry bypass e quality settings.
  */
 import os from 'os';
 import path from 'path';
-import { webSearch } from './search.js';
+import { webSearch, registryFetch, fetchPageContent } from './search.js';
 import { readPackageDeps, buildQuery, groupIntoBatches, batchHash } from './packages.js';
+import { sanitizeWebContent } from './sanitize.js';
 import { updateGeminiFiles, removeStaleBlocks } from './inject.js';
 import { chalk, label, log, LOG_WARN, LOG_REFRESH } from './logger.js';
-import { version } from './cli.js';
+import { version, maxTokens, quality, qualitySettings, verbose, customSources } from './cli.js';
 import { loadBatchState, saveBatchState } from './state.js';
 import { httpsAgent } from './http-agent.js';
@@ -29,20 +30,33 @@ export function startWatcher({ intervalMinutes, usePackageJson, batchSize }) {
     console.log(label('◆', 'workspace', skillFilePretty));
     console.log(label('◆', 'interval', `every ${intervalMinutes} min`));
     console.log(label('◆', 'batch_size', `chunk limit ${batchSize}`));
-    console.log(label('◆', 'context', 'DuckDuckGo → live'));
+    console.log(label('◆', 'engine', 'Jina Reader → Readability fallback'));
+    console.log(label('◆', 'quality', `${quality} (${qualitySettings.ddgResults} results, ${qualitySettings.charsPerPage} chars)`));
+    console.log(label('◆', 'max_tokens', `${maxTokens}`));
+    if (customSources.length > 0) {
+        console.log(label('◆', 'sources', `${customSources.length} custom URL(s)`));
+    }
+    if (verbose) console.log(label('◆', 'verbose', 'enabled'));
     console.log();
     console.log(`  ${chalk.cyan('✻')} Running. Antigravity will load context automatically.`);
     console.log();
     let previousBatchHashes = new Map();
+    const searchOpts = {
+        ddgResults: qualitySettings.ddgResults,
+        maxLen: qualitySettings.charsPerPage,
+        jinaTimeout: qualitySettings.jinaTimeout,
+        verbose,
+    };
     async function updateSkill() {
         if (previousBatchHashes.size === 0) {
             previousBatchHashes = await loadBatchState();
         }
-        const deps = await readPackageDeps(); // tutte le deps
+        const deps = await readPackageDeps();
         if (!deps || deps.length === 0) {
-            return; // fall back to something default or just skip
+            return;
         }
         const batches = groupIntoBatches(deps, batchSize);
@@ -65,11 +79,30 @@ export function startWatcher({ intervalMinutes, usePackageJson, batchSize }) {
                     return;
                 }
-                const query = buildQuery(batch);
                 try {
-                    const { results, pageText } = await webSearch(query, false);
+                    // ── Registry fetch per dipendenze note ──
+                    const { registryText, coveredDeps } = await registryFetch(batch, searchOpts);
+                    // ── DDG search per dipendenze non coperte dal registry ──
+                    const uncoveredBatch = batch.filter(d => !coveredDeps.has(d));
+                    let ddgText = '';
+                    let results = [];
+                    if (uncoveredBatch.length > 0) {
+                        const query = buildQuery(uncoveredBatch);
+                        try {
+                            const res = await webSearch(query, false, searchOpts);
+                            results = res.results;
+                            ddgText = res.pageText;
+                        } catch (_) {
+                            if (verbose) log(LOG_WARN, chalk.yellow, `DDG search failed for: ${uncoveredBatch.join(', ')}`);
+                        }
+                    }
+                    const combinedText = registryText + (ddgText || '');
                     const badSignals = ['403', 'captcha', 'blocked', 'access denied', 'forbidden'];
-                    const isBad = !pageText || pageText.length < 200 || badSignals.some(s => pageText.toLowerCase().includes(s));
+                    const isBad = !combinedText || combinedText.length < 200 || badSignals.some(s => combinedText.toLowerCase().includes(s));
                     if (isBad && previousBatchHashes.has(blockId)) {
                         log(LOG_WARN, chalk.yellow, `low quality result for block ${blockId} → keeping previous context`);
                         failedCount++;
@@ -81,19 +114,22 @@ export function startWatcher({ intervalMinutes, usePackageJson, batchSize }) {
                     const batchTitle = batch.map(b => b.split(' ')[0]).join(', ');
                     let globalMd = `## Live Context — ${batchTitle} (${nowStr})\n`;
-                    globalMd += `**Query:** ${query}\n\n`;
-                    if (results.length > 0) {
+                    if (registryText) {
+                        globalMd += sanitizeWebContent(registryText, 500) + '\n';
+                    } else if (results.length > 0) {
                         globalMd += `### ${results[0].title}\n`;
-                        globalMd += `${results[0].snippet.slice(0, 300)} — ${results[0].url}\n`;
+                        globalMd += `${sanitizeWebContent(results[0].snippet, 300)} — ${results[0].url}\n`;
                     }
                     let md = `## Live Context — ${batchTitle} (${nowStr})\n`;
-                    md += `**Query:** ${query}\n\n`;
+                    if (registryText) {
+                        md += sanitizeWebContent(registryText, maxTokens) + '\n\n';
+                    }
                     for (const r of results) {
-                        md += `### ${r.title}\n${r.snippet} — ${r.url}\n\n`;
+                        md += `### ${r.title}\n${sanitizeWebContent(r.snippet, 500)} — ${r.url}\n\n`;
                     }
-                    if (pageText) {
-                        md += `FULL TEXT: ${pageText}\n`;
+                    if (ddgText) {
+                        md += `FULL TEXT: ${sanitizeWebContent(ddgText, maxTokens)}\n`;
                     }
                     await updateGeminiFiles([{
@@ -104,7 +140,11 @@ export function startWatcher({ intervalMinutes, usePackageJson, batchSize }) {
                     previousBatchHashes.set(blockId, currentHash);
                     updatedCount++;
-                    log(LOG_REFRESH, chalk.cyan, `block ${blockId} updated → ${batch.join(', ')}`);
+                    const sources = [];
+                    if (coveredDeps.size > 0) sources.push(`registry:${coveredDeps.size}`);
+                    if (results.length > 0) sources.push(`ddg:${results.length}`);
+                    log(LOG_REFRESH, chalk.cyan, `block ${blockId} updated → ${batch.join(', ')} [${sources.join(', ')}]`);
                 } catch (e) {
                     failedCount++;
                     log(LOG_WARN, chalk.yellow, `block ${blockId} fetch failed → keeping previous`);
@@ -118,6 +158,38 @@ export function startWatcher({ intervalMinutes, usePackageJson, batchSize }) {
         }
         await Promise.all(executing);
+        // ── Custom sources from .groundtruth.json ──
+        if (customSources.length > 0) {
+            for (const src of customSources) {
+                const blockId = 'src_' + Buffer.from(src.url).toString('base64url').slice(0, 8);
+                activeBlockIds.add(blockId);
+                if (previousBatchHashes.has(blockId)) {
+                    skippedCount++;
+                    continue;
+                }
+                try {
+                    const text = await fetchPageContent(src.url, '', searchOpts);
+                    if (text && text.length > 100) {
+                        const srcLabel = src.label || new URL(src.url).hostname;
+                        const md = `## Custom Source — ${srcLabel}\n${sanitizeWebContent(text, maxTokens)}\n`;
+                        await updateGeminiFiles([{
+                            blockId,
+                            globalContent: `## ${srcLabel}\n${sanitizeWebContent(text, 500)}\n`,
+                            workspaceContent: md
+                        }]);
+                        previousBatchHashes.set(blockId, blockId);
+                        updatedCount++;
+                        log(LOG_REFRESH, chalk.cyan, `custom source updated → ${srcLabel}`);
+                    }
+                } catch (_) {
+                    failedCount++;
+                }
+            }
+        }
         await removeStaleBlocks(globalPath, activeBlockIds);
         await removeStaleBlocks(workspacePath, activeBlockIds);
@@ -128,18 +200,16 @@ export function startWatcher({ intervalMinutes, usePackageJson, batchSize }) {
     let cycleCount = 0;
-    // Periodical state persistence on process exit to avoid total crash data loss
     process.on('SIGINT', async () => {
         await saveBatchState(previousBatchHashes);
         process.exit(0);
     });
-    // Lancio a startup immediato
     updateSkill();
     setInterval(() => {
         cycleCount++;
         if (cycleCount % 10 === 0) {
-            httpsAgent.destroy(); // Forza chiusura idle connections
+            httpsAgent.destroy();
         }
         updateSkill();
     }, intervalMinutes * 60 * 1000);

package/assets/banner.png DELETED Viewed

Binary file