@botlearn/google-search 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 BotLearn
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,35 @@
+ # @botlearn/google-search
+
+ > Advanced Google search query construction, result filtering, and relevance ranking for OpenClaw Agent
+
+ ## Installation
+
+ ```bash
+ # via npm
+ npm install @botlearn/google-search
+
+ # via clawhub
+ clawhub install @botlearn/google-search
+ ```
+
+ ## Category
+
+ Information Retrieval
+
+ ## Dependencies
+
+ None
+
+ ## Files
+
+ | File | Description |
+ |------|-------------|
+ | `manifest.json` | Skill metadata and configuration |
+ | `skill.md` | Role definition and activation rules |
+ | `knowledge/` | Domain knowledge documents |
+ | `strategies/` | Behavioral strategy definitions |
+ | `tests/` | Smoke and benchmark tests |
+
+ ## License
+
+ MIT
package/knowledge/anti-patterns.md ADDED
@@ -0,0 +1,62 @@
+ ---
+ domain: google-search
+ topic: anti-patterns
+ priority: medium
+ ttl: 30d
+ ---
+
+ # Google Search — Anti-Patterns
+
+ ## Query Construction Anti-Patterns
+
+ ### 1. Overly Long Queries
+ - **Problem**: Queries with 10+ terms dilute relevance; Google ignores excess terms
+ - **Fix**: Focus on 3-7 high-signal keywords, use operators to add precision without verbosity
+
+ ### 2. Natural Language Queries
+ - **Problem**: Searching "What is the best way to implement authentication in a React application?" treats every word equally
+ - **Fix**: Extract key terms: `React authentication implementation best practices`
+
+ ### 3. Missing Context Terms
+ - **Problem**: Searching `merge` without context returns results about git, mail merge, corporate mergers, etc.
+ - **Fix**: Add domain context: `git merge conflict resolution` or `pandas merge dataframe`
+
+ ### 4. Ignoring Operator Case Sensitivity
+ - **Problem**: `or` is treated as a regular word; only `OR` works as a Boolean operator
+ - **Fix**: Always use uppercase `OR`, and remember `-` must touch the excluded term (no space)
+
+ ### 5. Single-Query Dependency
+ - **Problem**: Relying on one query for complex, multi-faceted topics
+ - **Fix**: Decompose into 2-4 targeted sub-queries, merge results
+
+ ## Result Evaluation Anti-Patterns
+
+ ### 6. First-Result Bias
+ - **Problem**: Treating the first search result as the authoritative answer
+ - **Fix**: Examine at least 3-5 results; first result may be SEO-optimized, not most accurate
+
+ ### 7. Ignoring Source Verification
+ - **Problem**: Accepting information without checking the source's authority or recency
+ - **Fix**: Always check: Who published this? When? Are claims cited? Is the domain reputable?
+
+ ### 8. Single-Source Dependency
+ - **Problem**: Using only one source to answer a question
+ - **Fix**: Cross-reference key facts across 2-3 independent sources; flag single-source claims
+
+ ### 9. Ignoring Date Context
+ - **Problem**: Returning outdated information for rapidly evolving topics (frameworks, APIs, regulations)
+ - **Fix**: Use `after:` date filters; always note the publication date in results; flag if content may be outdated
+
+ ### 10. Content Farm Inclusion
+ - **Problem**: Including results from low-quality aggregator sites that scrape and rewrite content
+ - **Fix**: Exclude known content farms with `-site:`; prefer domains with original analysis or primary data
+
+ ## Output Anti-Patterns
+
+ ### 11. Raw URL Dumping
+ - **Problem**: Returning a list of URLs without context, relevance scores, or summaries
+ - **Fix**: Each result should include: title, source, date, relevance note, and a 1-2 sentence summary
+
+ ### 12. No Deduplication
+ - **Problem**: Returning the same information from multiple syndicated sources
+ - **Fix**: Deduplicate at content level, keep the primary/authoritative source
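The URL-level part of that deduplication rule can be sketched in a few lines. This is a minimal illustration, not code shipped in this package; the normalization rules (strip scheme, `www.`, trailing slash) and the `results` dict shape are assumptions:

```python
from urllib.parse import urlparse

def normalize_url(url):
    # Strip scheme, "www.", and trailing slash so that syndicated
    # copies of the same page compare equal at the URL level.
    p = urlparse(url)
    host = p.netloc.lower().removeprefix("www.")
    path = p.path.rstrip("/")
    return f"{host}{path}"

def dedupe(results):
    # results: list of dicts with a "url" key (assumed shape).
    # Keeps the first occurrence, which is the primary source if the
    # list is already ordered by source authority.
    seen = set()
    kept = []
    for r in results:
        key = normalize_url(r["url"])
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept
```

Content-level deduplication (the same article rehosted under different URLs) needs text similarity on the snippets and is deliberately out of scope here.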
package/knowledge/best-practices.md ADDED
@@ -0,0 +1,70 @@
+ ---
+ domain: google-search
+ topic: query-construction-and-quality
+ priority: high
+ ttl: 30d
+ ---
+
+ # Google Search — Best Practices
+
+ ## Query Construction Patterns
+
+ ### 1. Intent Classification First
+ Before constructing a query, classify the search intent:
+ - **Navigational** — User wants a specific site → use `site:` or direct URL terms
+ - **Informational** — User wants to learn → use descriptive terms + authoritative source filters
+ - **Transactional** — User wants to do something → include action verbs and tool names
+ - **Investigative** — User wants to compare/analyze → use comparison terms + multiple sources
+
+ ### 2. Keyword Selection
+ - Use **nouns and noun phrases** as primary search terms
+ - Prefer **specific technical terms** over generic descriptions
+ - Include **version numbers** for software-related queries (e.g., "React 18", "Python 3.12")
+ - Use the **terminology of the target domain** (e.g., "myocardial infarction" not "heart attack" for medical research)
+
+ ### 3. Query Decomposition for Complex Topics
+ When a topic is broad or multi-faceted:
+ 1. Break into 2-4 focused sub-queries
+ 2. Each sub-query targets one aspect
+ 3. Merge and deduplicate results
+ 4. Cross-reference findings across sub-queries
+
+ Example: "What are the environmental and economic impacts of electric vehicles?"
+ - Sub-query 1: `electric vehicles environmental impact lifecycle emissions`
+ - Sub-query 2: `electric vehicles economic analysis cost ownership`
+ - Sub-query 3: `EV vs ICE environmental comparison study`
+
+ ### 4. Iterative Refinement
+ - Start broad, then narrow based on initial results
+ - Add exclusions (`-term`) to filter noise discovered in initial results
+ - Switch to `site:` filters when you identify authoritative domains
+
+ ## Result Quality Assessment
+
+ ### Source Credibility Tiers
+
+ | Tier | Source Type | Trust Level | Examples |
+ |------|-----------|-------------|---------|
+ | T1 | Primary / Official | Highest | Government data, academic journals, official docs |
+ | T2 | Established Media | High | Reuters, AP, major newspapers, peer-reviewed blogs |
+ | T3 | Expert Community | Medium-High | Stack Overflow (high-rep), GitHub (popular repos), industry blogs |
+ | T4 | General Web | Medium | Wikipedia, Medium, personal blogs with citations |
+ | T5 | User-Generated | Low | Forums, social media, anonymous posts |
+
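One way to make the tier table operational is a small domain-to-tier lookup. A sketch only: the example domain lists are illustrative assumptions, not an official mapping shipped with this skill:

```python
# Illustrative mapping from the credibility tier table; extend per domain.
TIER_DOMAINS = {
    "T1": {".gov", "docs.python.org", "arxiv.org"},
    "T2": {"reuters.com", "apnews.com"},
    "T3": {"stackoverflow.com", "github.com"},
    "T4": {"wikipedia.org", "medium.com"},
}

def credibility_tier(domain):
    # Returns the first matching tier; unmatched domains fall through
    # to T5 (user-generated / unverified).
    domain = domain.lower()
    for tier, patterns in TIER_DOMAINS.items():
        for pat in patterns:
            suffix = pat if pat.startswith(".") else "." + pat
            # Match the host exactly, or any subdomain / TLD suffix.
            if domain == pat.lstrip(".") or domain.endswith(suffix):
                return tier
    return "T5"
```

A real implementation would also consider per-page signals (authorship, citations, date), since domain alone is a coarse proxy.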
+ ### Freshness Assessment
+ - Check publication date — prefer recent for technology, less critical for fundamentals
+ - Verify the content hasn't been superseded by newer information
+ - For software: match the version discussed to the user's version
+
+ ### Deduplication Strategy
+ 1. **URL-level** — Same URL from different queries → keep one
+ 2. **Content-level** — Same article syndicated across sites → keep the primary source
+ 3. **Fact-level** — Multiple sources stating the same fact → consolidate, cite best source
+
+ ## Result Ranking Criteria
+
+ Rank results by weighted combination:
+ 1. **Relevance** (40%) — How directly does it answer the query?
+ 2. **Source Authority** (25%) — Credibility tier of the source
+ 3. **Freshness** (20%) — How recent is the information?
+ 4. **Depth** (15%) — How comprehensive is the coverage?
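The weighted combination above is straightforward to compute. A minimal sketch, assuming each dimension has already been scored on a 0-100 scale (that scale and the dict shape are assumptions for illustration):

```python
# Weights from the ranking criteria above.
WEIGHTS = {"relevance": 0.40, "authority": 0.25, "freshness": 0.20, "depth": 0.15}

def weighted_score(result):
    # result: dict with a 0-100 score per dimension.
    return sum(WEIGHTS[dim] * result[dim] for dim in WEIGHTS)

def rank(results, top_n=10):
    # Sort by weighted score, descending, and keep the top N.
    return sorted(results, key=weighted_score, reverse=True)[:top_n]
```

Note how the weights encode the trade-off: a T1 source (high authority) can still lose to a fresher, more directly relevant T3 source.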
package/knowledge/domain.md ADDED
@@ -0,0 +1,90 @@
+ ---
+ domain: google-search
+ topic: search-operators-and-syntax
+ priority: high
+ ttl: 30d
+ ---
+
+ # Google Search — Operator Syntax & Query Construction
+
+ ## Core Search Operators
+
+ ### Exact Match
+ - `"machine learning"` — Match the exact phrase
+ - Use for: names, specific phrases, error messages, quotes
+
+ ### Boolean Operators
+ - `term1 OR term2` — Match either term (OR must be uppercase)
+ - `term1 | term2` — Alternative OR syntax
+ - `-term` — Exclude term from results
+ - `term1 term2` — Implicit AND (both terms required)
+
+ ### Site & Domain Filters
+ - `site:github.com` — Restrict to a specific domain
+ - `site:.edu` — Restrict to a TLD (educational institutions)
+ - `site:.gov` — Government sources only
+ - `-site:pinterest.com` — Exclude a domain
+
+ ### File Type Filters
+ - `filetype:pdf` — PDF documents only
+ - `filetype:csv` — CSV data files
+ - `filetype:pptx` — PowerPoint presentations
+ - Useful types: pdf, doc, docx, xls, xlsx, csv, ppt, pptx, txt
+
+ ### URL & Title Filters
+ - `intitle:"annual report"` — Term must appear in page title
+ - `allintitle:react hooks tutorial` — All terms in title
+ - `inurl:api` — Term must appear in URL
+ - `allinurl:docs api reference` — All terms in URL
+
+ ### Date & Range
+ - `after:2024-01-01` — Results published after date
+ - `before:2024-12-31` — Results published before date
+ - `2023..2024` — Numeric range (also works for prices, years)
+
+ ### Wildcard & Proximity
+ - `"machine * learning"` — Wildcard for unknown words
+ - `AROUND(3)` — Proximity search: terms within N words of each other
+ - Example: `"climate change" AROUND(5) "economic impact"`
+
+ ### Special Operators
+ - `cache:url` — Google's cached version of a page
+ - `related:nytimes.com` — Sites similar to a domain
+ - `define:term` — Dictionary definition
+ - `info:url` — Information about a URL
+ - Note: `cache:` and `info:` have been retired by Google, and `related:` is unreliable; do not depend on them
+
+ ## Operator Combinations
+
+ ### Academic Research
+ ```
+ "topic name" site:arxiv.org OR site:scholar.google.com filetype:pdf after:2023-01-01
+ ```
+
+ ### Technical Documentation
+ ```
+ "function name" site:docs.python.org OR site:developer.mozilla.org
+ ```
+
+ ### News with Source Quality
+ ```
+ "event name" site:reuters.com OR site:apnews.com OR site:bbc.com after:2024-06-01
+ ```
+
+ ### Code Examples
+ ```
+ "error message" site:stackoverflow.com OR site:github.com -"closed as duplicate"
+ ```
+
+ ### Competitive Analysis
+ ```
+ "company name" (review OR comparison OR alternative) -site:company.com after:2024-01-01
+ ```
+
+ ## Query Length Guidelines
+
+ | Query Type | Optimal Length | Example |
+ |-----------|---------------|---------|
+ | Simple fact | 2-4 terms | `python list comprehension` |
+ | Specific answer | 4-7 terms | `"react useEffect" cleanup function example` |
+ | Research | 5-10 terms + operators | `"transformer architecture" attention mechanism site:arxiv.org filetype:pdf after:2023` |
+ | Troubleshooting | Error message + context | `"TypeError: Cannot read property" react useState` |
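These operators compose mechanically, so a query string can be assembled from parts. A minimal sketch; the helper and its parameters are illustrative assumptions, not part of this package:

```python
def build_query(terms, phrases=(), sites=(), exclude_sites=(),
                filetype=None, after=None):
    # Assemble a Google query string from keywords and operators.
    parts = list(terms)
    parts += [f'"{p}"' for p in phrases]            # exact-match phrases
    if sites:
        # Multiple site: filters must be OR-ed; OR must be uppercase.
        parts.append("(" + " OR ".join(f"site:{s}" for s in sites) + ")")
    parts += [f"-site:{s}" for s in exclude_sites]  # domain exclusions
    if filetype:
        parts.append(f"filetype:{filetype}")
    if after:
        parts.append(f"after:{after}")              # YYYY-MM-DD
    return " ".join(parts)
```

For example, `build_query(["attention mechanism"], phrases=["transformer architecture"], sites=["arxiv.org"], filetype="pdf", after="2023-01-01")` reproduces the research pattern from the guidelines table.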
package/manifest.json ADDED
@@ -0,0 +1,26 @@
+ {
+ "name": "@botlearn/google-search",
+ "version": "0.1.0",
+ "description": "Advanced Google search query construction, result filtering, and relevance ranking for OpenClaw Agent",
+ "category": "information-retrieval",
+ "author": "BotLearn",
+ "benchmarkDimension": "information-retrieval",
+ "expectedImprovement": 30,
+ "dependencies": {},
+ "compatibility": {
+ "openclaw": ">=0.5.0"
+ },
+ "files": {
+ "skill": "skill.md",
+ "knowledge": [
+ "knowledge/domain.md",
+ "knowledge/best-practices.md",
+ "knowledge/anti-patterns.md"
+ ],
+ "strategies": [
+ "strategies/main.md"
+ ],
+ "smokeTest": "tests/smoke.json",
+ "benchmark": "tests/benchmark.json"
+ }
+ }
package/package.json ADDED
@@ -0,0 +1,35 @@
+ {
+ "name": "@botlearn/google-search",
+ "version": "0.1.0",
+ "description": "Advanced Google search query construction, result filtering, and relevance ranking for OpenClaw Agent",
+ "type": "module",
+ "main": "manifest.json",
+ "files": [
+ "manifest.json",
+ "skill.md",
+ "knowledge/",
+ "strategies/",
+ "tests/",
+ "README.md"
+ ],
+ "keywords": [
+ "botlearn",
+ "openclaw",
+ "skill",
+ "information-retrieval"
+ ],
+ "author": "BotLearn",
+ "license": "MIT",
+ "repository": {
+ "type": "git",
+ "url": "https://github.com/readai-team/botlearn-awesome-skills.git",
+ "directory": "packages/skills/google-search"
+ },
+ "homepage": "https://github.com/readai-team/botlearn-awesome-skills/tree/main/packages/skills/google-search",
+ "bugs": {
+ "url": "https://github.com/readai-team/botlearn-awesome-skills/issues"
+ },
+ "publishConfig": {
+ "access": "public"
+ }
+ }
package/skill.md ADDED
@@ -0,0 +1,42 @@
+ ---
+ name: google-search
+ role: Search Query Specialist
+ version: 1.0.0
+ triggers:
+ - "search for"
+ - "find information"
+ - "look up"
+ - "google"
+ - "search the web"
+ - "find sources"
+ ---
+
+ # Role
+
+ You are a Search Query Specialist. When activated, you construct precise, high-relevance search queries using advanced operators and multi-source strategies, then filter and rank results to surface the most valuable information.
+
+ # Capabilities
+
+ 1. Construct advanced search queries using Boolean operators, site-specific filters, date ranges, filetype filters, and exclusion keywords
+ 2. Decompose ambiguous or complex queries into targeted sub-queries for parallel execution
+ 3. Rank results by relevance, remove low-quality entries, and deduplicate across sources
+ 4. Assess source credibility using domain authority, publication date, and content signals
+ 5. Merge results from multiple sub-queries into a coherent, prioritized result set
+
+ # Constraints
+
+ 1. Never return results without verifying source credibility — always assess domain authority
+ 2. Never rely on a single search query for complex topics — decompose into sub-queries
+ 3. Never present duplicate content from different sources as separate results
+ 4. Always prefer primary sources over aggregators or content farms
+ 5. Always include date context when results may be time-sensitive
+
+ # Activation
+
+ WHEN the user requests a web search or information retrieval:
+ 1. Analyze the search intent and identify key entities, constraints, and scope
+ 2. Construct optimized queries following strategies/main.md
+ 3. Apply knowledge/domain.md for operator syntax
+ 4. Filter and rank results using knowledge/best-practices.md
+ 5. Verify against knowledge/anti-patterns.md to avoid common mistakes
+ 6. Output ranked results with source credibility annotations
package/strategies/main.md ADDED
@@ -0,0 +1,70 @@
+ ---
+ strategy: google-search
+ version: 1.0.0
+ steps: 6
+ ---
+
+ # Google Search Strategy
+
+ ## Step 1: Intent Analysis
+ - Parse the user's request to identify: **topic**, **scope**, **constraints**, **desired output format**
+ - Classify search intent: navigational / informational / transactional / investigative
+ - Identify time sensitivity — does the user need current or historical information?
+ - IF the query is ambiguous THEN ask one clarifying question before proceeding
+ - Extract key entities: names, technologies, versions, dates, locations
+
+ ## Step 2: Query Construction
+ - SELECT query strategy based on complexity:
+ - Simple fact → Single targeted query with 3-5 keywords
+ - Specific answer → Keyword query + site/filetype operators
+ - Multi-faceted research → Decompose into 2-4 sub-queries
+ - Troubleshooting → Error message (exact match) + context terms
+ - APPLY operators from knowledge/domain.md:
+ - Use `"exact phrases"` for specific terms, names, error messages
+ - Use `site:` to target authoritative domains for the topic
+ - Use `after:` for time-sensitive queries
+ - Use `-site:` to exclude known low-quality sources
+ - Use `filetype:` when the user needs specific document types
+ - VERIFY query length is 3-10 terms (excluding operators)
+
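The length check at the end of Step 2 can be made concrete by counting only bare keywords, not operator clauses. A rough sketch; the operator-prefix list is an assumption, and quoted multi-word phrases are counted per word here for simplicity:

```python
# Operator prefixes from knowledge/domain.md (illustrative subset).
OPERATOR_PREFIXES = ("site:", "filetype:", "after:", "before:",
                     "intitle:", "allintitle:", "inurl:", "allinurl:")

def keyword_count(query):
    # Count plain terms, ignoring operator clauses and Boolean OR.
    count = 0
    for token in query.split():
        if token in ("OR", "|"):
            continue
        if token.startswith(OPERATOR_PREFIXES):
            continue
        if token.startswith("-"):
            continue  # exclusions (-term, -site:) are operators, not keywords
        count += 1
    return count

def length_ok(query):
    # The 3-10 term window from the VERIFY rule above.
    return 3 <= keyword_count(query) <= 10
```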
+ ## Step 3: Multi-Source Execution
+ - Execute primary query
+ - IF topic is multi-faceted THEN execute sub-queries in parallel
+ - For each query, collect top 10 raw results with: URL, title, snippet, date, domain
+ - IF initial results are poor quality THEN refine query:
+ - Add exclusion operators for noise sources
+ - Narrow with additional `site:` filters
+ - Try alternative terminology
+
+ ## Step 4: Deduplication & Filtering
+ - Remove exact URL duplicates across queries
+ - Detect content-level duplicates (same article on different domains) → keep primary source
+ - Filter out results matching anti-patterns from knowledge/anti-patterns.md:
+ - Content farms and aggregator sites
+ - Outdated content (for time-sensitive topics)
+ - Results with no clear authorship or date
+ - Verify remaining results against source credibility tiers from knowledge/best-practices.md
+
+ ## Step 5: Relevance Ranking
+ - Score each result on 4 dimensions (from knowledge/best-practices.md):
+ - **Relevance** (40%) — How directly does it answer the query?
+ - **Source Authority** (25%) — Credibility tier (T1-T5)
+ - **Freshness** (20%) — Publication recency relative to topic
+ - **Depth** (15%) — Comprehensiveness of coverage
+ - Sort by weighted score, descending
+ - Select top 5-10 results for output
+
+ ## Step 6: Output & Verification
+ - Present results in structured format:
+ - **Rank** — Position in relevance order
+ - **Title** — Page title
+ - **Source** — Domain + credibility tier
+ - **Date** — Publication date
+ - **Summary** — 1-2 sentence description of what the page contains
+ - **Relevance** — Why this result is useful for the query
+ - IF multiple sub-queries were used THEN provide a synthesis section connecting findings
+ - SELF-CHECK:
+ - Are results from diverse, credible sources? (not all from one domain)
+ - Is the most relevant result ranked first?
+ - Are all results genuinely addressing the user's intent?
+ - IF any check fails THEN loop back to Step 3 with refined queries
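The first SELF-CHECK (source diversity) is mechanical enough to sketch. The `results` shape and the one-half threshold are assumptions for illustration:

```python
from collections import Counter

def diverse_enough(results, max_share=0.5):
    # Fail the check when any single domain contributes more than
    # max_share of the result set (trivially passes with < 2 results).
    if len(results) < 2:
        return True
    counts = Counter(r["domain"] for r in results)
    return counts.most_common(1)[0][1] / len(results) <= max_share
```

The other two checks (top result relevance, intent match) are judgment calls for the agent rather than pure code, which is why the strategy phrases them as questions.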
package/tests/benchmark.json ADDED
@@ -0,0 +1,476 @@
+ {
+ "version": "0.0.1",
+ "dimension": "information-retrieval",
+ "tasks": [
+ {
+ "id": "bench-easy-01",
+ "difficulty": "easy",
+ "description": "Simple factual search with clear answer",
+ "input": "Find the official documentation page for Python's asyncio library, specifically the section on Tasks and Coroutines.",
+ "rubric": [
+ {
+ "criterion": "Relevance",
+ "weight": 0.4,
+ "scoring": {
+ "5": "Returns the exact official Python docs page for asyncio tasks and coroutines",
+ "3": "Returns Python docs but not the specific section",
+ "1": "Returns third-party tutorials instead of official docs",
+ "0": "Irrelevant results"
+ }
+ },
+ {
+ "criterion": "Query Quality",
+ "weight": 0.3,
+ "scoring": {
+ "5": "Uses site:docs.python.org with targeted terms",
+ "3": "Reasonable query but without site filter",
+ "1": "Overly broad query",
+ "0": "No query optimization"
+ }
+ },
+ {
+ "criterion": "Output Quality",
+ "weight": 0.3,
+ "scoring": {
+ "5": "Clear result with URL, description, and relevance note",
+ "3": "URL with basic description",
+ "1": "URL only",
+ "0": "No usable output"
+ }
+ }
+ ],
+ "expectedScoreWithout": 40,
+ "expectedScoreWith": 80
+ },
+ {
+ "id": "bench-easy-02",
+ "difficulty": "easy",
+ "description": "Find a specific error message solution",
+ "input": "Search for solutions to this error: 'TypeError: Cannot read properties of undefined (reading map)' in a React component that uses useState.",
+ "rubric": [
+ {
+ "criterion": "Relevance",
+ "weight": 0.4,
+ "scoring": {
+ "5": "Results directly address this specific TypeError in React with useState context; includes root cause and fix",
+ "3": "Results address the TypeError but not specifically in React/useState context",
+ "1": "Generic JavaScript error results",
+ "0": "Irrelevant results"
+ }
+ },
+ {
+ "criterion": "Query Quality",
+ "weight": 0.3,
+ "scoring": {
+ "5": "Uses exact error message in quotes plus React/useState context, targets Stack Overflow or GitHub",
+ "3": "Includes error message but missing context terms",
+ "1": "Paraphrases error instead of exact match",
+ "0": "No query optimization"
+ }
+ },
+ {
+ "criterion": "Output Quality",
+ "weight": 0.3,
+ "scoring": {
+ "5": "Ranked results with source quality indicators; top result explains root cause",
+ "3": "Listed results with basic descriptions",
+ "1": "Unstructured URL list",
+ "0": "No usable output"
+ }
+ }
+ ],
+ "expectedScoreWithout": 35,
+ "expectedScoreWith": 75
+ },
+ {
+ "id": "bench-easy-03",
+ "difficulty": "easy",
+ "description": "Find official statistics from a government source",
+ "input": "Find the latest US Bureau of Labor Statistics data on unemployment rates by industry sector.",
+ "rubric": [
+ {
+ "criterion": "Relevance",
+ "weight": 0.4,
+ "scoring": {
+ "5": "Returns BLS.gov page with unemployment data broken down by industry",
+ "3": "Returns BLS data but not sector-specific breakdown",
+ "1": "Returns news articles about unemployment instead of primary data",
+ "0": "Irrelevant results"
+ }
+ },
+ {
+ "criterion": "Query Quality",
+ "weight": 0.3,
+ "scoring": {
+ "5": "Uses site:bls.gov with specific terms for industry sector unemployment",
+ "3": "Targets government sites but query could be more specific",
+ "1": "Generic unemployment search",
+ "0": "No query optimization"
+ }
+ },
+ {
+ "criterion": "Source Authority",
+ "weight": 0.3,
+ "scoring": {
+ "5": "Primary result is from bls.gov (T1 source); clearly identified as official data",
+ "3": "Includes BLS data but also mixes in news articles",
+ "1": "Mostly secondary sources reporting on BLS data",
+ "0": "No authoritative sources"
+ }
+ }
+ ],
+ "expectedScoreWithout": 35,
+ "expectedScoreWith": 80
+ },
+ {
+ "id": "bench-med-01",
+ "difficulty": "medium",
+ "description": "Multi-aspect research query requiring decomposition",
+ "input": "Research the trade-offs between microservices and monolithic architecture for a startup with a team of 5 developers building a B2B SaaS product. I need perspectives on development speed, operational complexity, and scalability.",
+ "rubric": [
+ {
+ "criterion": "Query Decomposition",
+ "weight": 0.3,
+ "scoring": {
+ "5": "Decomposes into 3+ sub-queries targeting different aspects (dev speed, ops complexity, scalability) with startup/small-team context",
+ "3": "Uses 2 sub-queries but misses some aspects",
+ "1": "Single broad query covering all aspects",
+ "0": "No decomposition attempted"
+ }
+ },
+ {
+ "criterion": "Result Relevance",
+ "weight": 0.3,
+ "scoring": {
+ "5": "Results address all 3 aspects with startup-specific context; includes case studies or data-driven analysis",
+ "3": "Covers most aspects but lacks startup-specific perspective",
+ "1": "Generic microservices vs monolith content without addressing specific concerns",
+ "0": "Irrelevant results"
+ }
+ },
+ {
+ "criterion": "Source Diversity",
+ "weight": 0.2,
+ "scoring": {
+ "5": "Mix of engineering blogs, case studies, technical publications, and expert opinions from different companies",
+ "3": "2-3 source types represented",
+ "1": "All results from similar sources",
+ "0": "Single source or low-quality sources"
+ }
+ },
+ {
+ "criterion": "Synthesis",
+ "weight": 0.2,
+ "scoring": {
+ "5": "Results are organized by aspect with a summary connecting findings across sub-queries",
+ "3": "Results grouped but no cross-query synthesis",
+ "1": "Flat list of results",
+ "0": "No organization"
+ }
+ }
+ ],
+ "expectedScoreWithout": 30,
+ "expectedScoreWith": 70
+ },
+ {
+ "id": "bench-med-02",
+ "difficulty": "medium",
+ "description": "Time-sensitive search with source quality filtering",
+ "input": "Find the most recent security advisories and CVEs related to Node.js published in 2024 or later. Focus on critical and high severity vulnerabilities. Exclude general security blogs and focus on official sources.",
+ "rubric": [
+ {
+ "criterion": "Query Precision",
+ "weight": 0.25,
+ "scoring": {
+ "5": "Uses date filters (after:2024), targets official sources (site:nodejs.org, site:nvd.nist.gov, site:cve.org), excludes blogs",
+ "3": "Uses some filters but doesn't fully restrict to official sources",
+ "1": "Basic search without date or source filters",
+ "0": "No query optimization"
+ }
+ },
+ {
+ "criterion": "Result Accuracy",
+ "weight": 0.35,
+ "scoring": {
+ "5": "Returns actual CVEs with correct IDs, severity ratings, affected versions; all from 2024+",
+ "3": "Returns relevant security content but some items are older or not official CVEs",
+ "1": "Mix of relevant and outdated security information",
+ "0": "Incorrect or irrelevant results"
+ }
+ },
+ {
+ "criterion": "Source Authority",
+ "weight": 0.25,
+ "scoring": {
+ "5": "All results from T1-T2 sources (NVD, Node.js official, security advisories)",
+ "3": "Mostly authoritative with some secondary sources",
+ "1": "Relies on secondary reporting",
+ "0": "Unverified sources"
+ }
+ },
+ {
+ "criterion": "Output Structure",
+ "weight": 0.15,
+ "scoring": {
+ "5": "Each CVE listed with: ID, severity, affected versions, date, source link",
+ "3": "CVEs listed but missing some metadata",
+ "1": "Unstructured list",
+ "0": "No organization"
+ }
+ }
+ ],
+ "expectedScoreWithout": 30,
+ "expectedScoreWith": 70
+ },
+ {
+ "id": "bench-med-03",
+ "difficulty": "medium",
+ "description": "Comparative search requiring cross-source validation",
+ "input": "Compare the pricing, features, and developer experience of Supabase vs Firebase vs PlanetScale for a new project. I need current pricing pages, feature comparison articles, and real developer reviews.",
+ "rubric": [
+ {
+ "criterion": "Coverage",
+ "weight": 0.3,
+ "scoring": {
+ "5": "Returns results covering all 3 axes (pricing, features, DX) for all 3 services; includes official pricing pages",
+ "3": "Covers 2 of 3 axes or misses one service",
+ "1": "Only covers one aspect or one service",
+ "0": "Irrelevant results"
+ }
+ },
+ {
+ "criterion": "Query Strategy",
+ "weight": 0.25,
+ "scoring": {
+ "5": "Decomposes into targeted sub-queries: official pricing pages (site:), comparison articles, developer reviews (site:reddit.com OR site:news.ycombinator.com)",
+ "3": "Uses 2 sub-queries but misses some source types",
+ "1": "Single generic comparison query",
+ "0": "No strategy"
+ }
+ },
+ {
+ "criterion": "Source Mix",
+ "weight": 0.25,
+ "scoring": {
+ "5": "Includes official pages, independent comparison articles, and authentic developer reviews from different platforms",
+ "3": "2 of 3 source types present",
+ "1": "Only one source type (e.g., all blog posts)",
+ "0": "Low-quality or biased sources"
+ }
+ },
+ {
+ "criterion": "Freshness",
+ "weight": 0.2,
+ "scoring": {
+ "5": "All results from 2024+; pricing data is current; notes any rapid changes",
+ "3": "Most results are recent but some may be outdated",
+ "1": "Mix of current and outdated information",
+ "0": "Mostly outdated"
+ }
+ }
+ ],
+ "expectedScoreWithout": 30,
+ "expectedScoreWith": 70
+ },
+ {
+ "id": "bench-med-04",
+ "difficulty": "medium",
+ "description": "Niche topic search requiring domain expertise in query construction",
+ "input": "Find research papers and technical reports on using retrieval-augmented generation (RAG) to reduce hallucination in large language models. Focus on evaluation methodologies and quantitative results published in 2023 or later.",
+ "rubric": [
+ {
+ "criterion": "Query Precision",
+ "weight": 0.25,
+ "scoring": {
+ "5": "Uses domain-specific terminology (RAG, hallucination, LLM); targets academic sources (arxiv, semantic scholar); uses date filters",
+ "3": "Good terminology but doesn't target academic sources specifically",
+ "1": "Uses general terms instead of domain-specific ones",
+ "0": "No domain awareness in query"
+ }
+ },
+ {
+ "criterion": "Result Relevance",
+ "weight": 0.35,
+ "scoring": {
+ "5": "Returns papers specifically about RAG for hallucination reduction with evaluation metrics and quantitative results",
+ "3": "Returns relevant RAG papers but not focused on hallucination evaluation",
+ "1": "General LLM or RAG papers without hallucination focus",
+ "0": "Irrelevant results"
+ }
+ },
+ {
+ "criterion": "Source Quality",
+ "weight": 0.2,
+ "scoring": {
+ "5": "All results are peer-reviewed papers or preprints from reputable venues; citation counts noted",
+ "3": "Mostly academic sources with some blog posts",
+ "1": "Primarily non-academic sources",
+ "0": "No academic sources"
+ }
+ },
+ {
+ "criterion": "Output Metadata",
+ "weight": 0.2,
+ "scoring": {
+ "5": "Each paper includes: title, authors, venue/date, key findings, evaluation methodology used",
+ "3": "Papers listed with title and summary but missing some metadata",
+ "1": "Titles and URLs only",
+ "0": "No metadata"
+ }
+ }
+ ],
+ "expectedScoreWithout": 25,
+ "expectedScoreWith": 65
+ },
+ {
+ "id": "bench-hard-01",
+ "difficulty": "hard",
+ "description": "Adversarial search with significant noise and SEO spam",
+ "input": "Find genuine, unbiased reviews and benchmarks of the top 5 VPN services. Exclude affiliate marketing content, sponsored reviews, and VPN company blogs. I need independent security audits, speed test data from reputable testers, and privacy policy analyses.",
+ "rubric": [
+ {
+ "criterion": "Noise Filtering",
+ "weight": 0.3,
+ "scoring": {
+ "5": "Successfully excludes affiliate sites, sponsored content, and VPN company marketing; explains filtering strategy",
+ "3": "Filters some noise but includes 1-2 affiliate or sponsored results",
+ "1": "Minimal filtering; includes obvious affiliate content",
+ "0": "No noise filtering; results dominated by affiliate content"
+ }
+ },
+ {
+ "criterion": "Source Independence",
+ "weight": 0.3,
+ "scoring": {
+ "5": "Returns independent security audits (e.g., from security researchers), academic analyses, and consumer reports; no financial ties to VPN companies",
+ "3": "Mostly independent but 1-2 sources have potential conflicts of interest",
+ "1": "Sources have unclear independence",
+ "0": "All sources are financially connected to VPN companies"
+ }
+ },
+ {
+ "criterion": "Data Quality",
+ "weight": 0.25,
+ "scoring": {
+ "5": "Includes actual speed test data, security audit reports, and privacy policy analyses with specific findings",
+ "3": "Includes some quantitative data but lacks depth",
+ "1": "Mostly subjective opinions without data",
+ "0": "No quantitative data"
+ }
+ },
+ {
+ "criterion": "Query Strategy",
+ "weight": 0.15,
+ "scoring": {
+ "5": "Uses aggressive exclusion operators (-affiliate, -sponsored, -site:vpncompany.com); targets specific source types (security research, consumer reports)",
+ "3": "Some exclusions but not comprehensive",
+ "1": "Basic search without exclusion strategy",
+ "0": "No strategy for avoiding biased content"
+ }
+ }
+ ],
+ "expectedScoreWithout": 20,
+ "expectedScoreWith": 60
+ },
+ {
+ "id": "bench-hard-02",
+ "difficulty": "hard",
+ "description": "Cross-domain research requiring synthesis from diverse sources",
+ "input": "Research the intersection of climate change policy, agricultural technology, and food security in Sub-Saharan Africa. I need: (1) recent policy frameworks from international organizations, (2) agritech innovations being deployed in the region, and (3) quantitative data on food security trends. Provide a synthesis connecting these three areas.",
380
+ "rubric": [
381
+ {
382
+ "criterion": "Query Decomposition",
383
+ "weight": 0.25,
384
+ "scoring": {
385
+ "5": "Creates 3+ targeted sub-queries for each domain (policy, agritech, food security data) with regional focus; uses appropriate source filters for each",
386
+ "3": "2 sub-queries but misses one domain",
387
+ "1": "Single broad query attempting to cover all domains",
388
+ "0": "No decomposition"
389
+ }
390
+ },
391
+ {
392
+ "criterion": "Coverage Breadth",
393
+ "weight": 0.25,
394
+ "scoring": {
395
+ "5": "Returns results covering all 3 domains with Sub-Saharan Africa focus; includes international org reports, tech publications, and data sources",
396
+ "3": "Covers 2 of 3 domains adequately",
397
+ "1": "Only covers one domain or lacks regional specificity",
398
+ "0": "Irrelevant results"
399
+ }
400
+ },
401
+ {
402
+ "criterion": "Source Authority",
403
+ "weight": 0.25,
404
+ "scoring": {
405
+ "5": "Includes T1 sources: UN/FAO/World Bank reports, peer-reviewed research, official government data",
406
+ "3": "Mix of authoritative and secondary sources",
407
+ "1": "Primarily news articles or opinion pieces",
408
+ "0": "Unreliable sources"
409
+ }
410
+ },
411
+ {
412
+ "criterion": "Cross-Domain Synthesis",
413
+ "weight": 0.25,
414
+ "scoring": {
415
+ "5": "Provides a coherent synthesis connecting policy, technology, and data; identifies gaps and opportunities at intersections",
416
+ "3": "Results are organized by domain but synthesis is superficial",
417
+ "1": "No attempt to connect findings across domains",
418
+ "0": "Disorganized output"
419
+ }
420
+ }
421
+ ],
422
+ "expectedScoreWithout": 20,
423
+ "expectedScoreWith": 60
424
+ },
425
+ {
426
+ "id": "bench-hard-03",
427
+ "difficulty": "hard",
428
+ "description": "Ambiguous query requiring clarification and multi-interpretation search",
429
+ "input": "Find information about Mercury's transit.",
430
+ "rubric": [
431
+ {
432
+ "criterion": "Ambiguity Recognition",
433
+ "weight": 0.3,
434
+ "scoring": {
435
+ "5": "Recognizes multiple interpretations (astronomical transit of Mercury across the Sun, Mercury as a car brand, Mercury in astrology, mercury in chemistry) and either asks for clarification or searches for the most likely interpretations",
436
+ "3": "Recognizes 2 interpretations and addresses them",
437
+ "1": "Assumes one interpretation without acknowledging ambiguity",
438
+ "0": "No awareness of ambiguity"
439
+ }
440
+ },
441
+ {
442
+ "criterion": "Query Strategy",
443
+ "weight": 0.25,
444
+ "scoring": {
445
+ "5": "Constructs disambiguation queries; uses site: and context terms to separate interpretations; provides results for top 2-3 interpretations",
446
+ "3": "Searches for primary interpretation with some disambiguation",
447
+ "1": "Single query without disambiguation",
448
+ "0": "No strategy"
449
+ }
450
+ },
451
+ {
452
+ "criterion": "Result Quality",
453
+ "weight": 0.25,
454
+ "scoring": {
455
+ "5": "Results for each interpretation are from authoritative sources (NASA for astronomy, relevant domain sources for others)",
456
+ "3": "Reasonable results but from mixed-quality sources",
457
+ "1": "Low-quality or tangential results",
458
+ "0": "Irrelevant results"
459
+ }
460
+ },
461
+ {
462
+ "criterion": "User Guidance",
463
+ "weight": 0.2,
464
+ "scoring": {
465
+ "5": "Clearly labels results by interpretation; suggests how user can narrow the search; provides reasoning for ranking",
466
+ "3": "Some labeling but insufficient guidance",
467
+ "1": "Results presented without interpretation labels",
468
+ "0": "Confusing output mixing interpretations"
469
+ }
470
+ }
471
+ ],
472
+ "expectedScoreWithout": 25,
473
+ "expectedScoreWith": 65
474
+ }
475
+ ]
476
+ }
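The hard benchmarks above reward aggressive exclusion operators (`-affiliate`, `-sponsored`, `-site:vpncompany.com`) and date filters (`after:`). A minimal sketch of composing such a query string — the `build_query` helper and its parameters are hypothetical illustrations, not part of this package's API:

```python
def build_query(terms, exclude_terms=(), exclude_sites=(), after=None):
    """Compose a Google-style query string with exclusion operators.

    `terms` are required keywords; each entry in `exclude_terms`
    becomes `-term`, each entry in `exclude_sites` becomes
    `-site:domain`, and `after` appends a date filter.
    """
    parts = list(terms)
    parts += [f"-{t}" for t in exclude_terms]
    parts += [f"-site:{s}" for s in exclude_sites]
    if after:
        parts.append(f"after:{after}")
    return " ".join(parts)

# Mirrors the filtering strategy scored in bench-hard-01
query = build_query(
    ["VPN", "independent", "security audit"],
    exclude_terms=["affiliate", "sponsored"],
    exclude_sites=["vpncompany.com"],
    after="2023-01-01",
)
```

A grader checking the "Query Strategy" criterion could then look for these operators in the agent's emitted queries.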
@@ -0,0 +1,54 @@
1
+ {
2
+ "version": "0.0.1",
3
+ "timeout": 60,
4
+ "tasks": [
5
+ {
6
+ "id": "smoke-01",
7
+ "description": "Search for recent best practices on a technical topic with source quality filtering",
8
+ "input": "Find the most authoritative and recent resources on implementing OAuth 2.0 with PKCE flow in a single-page application. I need official documentation, security considerations, and practical implementation guides. Filter out outdated content (before 2023).",
9
+ "rubric": [
10
+ {
11
+ "criterion": "Query Construction",
12
+ "weight": 0.25,
13
+ "scoring": {
14
+ "5": "Uses advanced operators (site:, after:, exact phrases), decomposes into sub-queries for different aspects (docs, security, implementation)",
15
+ "3": "Uses some operators but relies on a single broad query",
16
+ "1": "Basic keyword search with no operators",
17
+ "0": "Searches the raw user input as-is"
18
+ }
19
+ },
20
+ {
21
+ "criterion": "Result Relevance",
22
+ "weight": 0.3,
23
+ "scoring": {
24
+ "5": "All results directly address OAuth 2.0 PKCE in SPAs; includes official RFC/docs, security analysis, and code examples",
25
+ "3": "Most results are relevant but some are tangential (e.g., general OAuth without PKCE, server-side flow)",
26
+ "1": "Mix of relevant and irrelevant results; generic authentication articles",
27
+ "0": "Results do not address the specific query"
28
+ }
29
+ },
30
+ {
31
+ "criterion": "Source Quality & Diversity",
32
+ "weight": 0.25,
33
+ "scoring": {
34
+ "5": "Results from 3+ credibility tiers (e.g., RFC/official docs, security blogs, developer guides); no content farms; sources clearly attributed",
35
+ "3": "Results from 2 source types; mostly credible but some questionable sources included",
36
+ "1": "Results from a single source type or includes low-quality sources",
37
+ "0": "Sources are unreliable or unverified"
38
+ }
39
+ },
40
+ {
41
+ "criterion": "Output Structure",
42
+ "weight": 0.2,
43
+ "scoring": {
44
+ "5": "Each result has: title, source, date, credibility assessment, and relevance summary; results are ranked; synthesis provided",
45
+ "3": "Results have titles and URLs but missing some metadata; basic ranking",
46
+ "1": "Unstructured list of URLs or titles",
47
+ "0": "Raw output with no organization"
48
+ }
49
+ }
50
+ ],
51
+ "passThreshold": 60
52
+ }
53
+ ]
54
+ }
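Each rubric pairs per-criterion weights (summing to 1.0) with 0-5 scoring anchors, while `passThreshold` and the `expectedScore*` fields are on a 0-100 scale, suggesting the weighted 0-5 average is scaled by 20. A minimal scoring sketch under that assumption — the scaling convention is inferred from the field ranges, not documented by the package:

```python
def rubric_score(rubric, criterion_scores):
    """Weighted rubric score on a 0-100 scale.

    `criterion_scores` maps criterion name -> awarded score (0-5).
    Assumes the criterion weights sum to 1.0, as in the files above;
    the x20 scaling from a 0-5 average to 0-100 is an assumption.
    """
    weighted = sum(
        item["weight"] * criterion_scores[item["criterion"]]
        for item in rubric
    )
    return weighted * 20

# Weights from the smoke-01 task above; awarded scores are illustrative.
rubric = [
    {"criterion": "Query Construction", "weight": 0.25},
    {"criterion": "Result Relevance", "weight": 0.3},
    {"criterion": "Source Quality & Diversity", "weight": 0.25},
    {"criterion": "Output Structure", "weight": 0.2},
]
score = rubric_score(rubric, {
    "Query Construction": 5,
    "Result Relevance": 3,
    "Source Quality & Diversity": 3,
    "Output Structure": 5,
})
passed = score >= 60  # smoke-01's passThreshold
```

Under this convention, the gap between `expectedScoreWithout` and `expectedScoreWith` (e.g., 20 vs. 60 on the hard tasks) measures the lift the skill is expected to provide.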