PyPI - unique-web-search - Versions diffs - 1.7.4__tar.gz - Mend

unique-web-search 1.7.4__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (57)

unique_web_search-1.7.4/PKG-INFO ADDED

@@ -0,0 +1,147 @@
+Metadata-Version: 2.3
+Name: unique-web-search
+Version: 1.7.4
+Summary:
+Author: Andreas Hauri, Gustav Hartz, Rami Azouz
+Author-email: Andreas Hauri <andreas@unique.ch>, Gustav Hartz <gustav.hartz.ext@unique.ch>, Rami Azouz <rami.ext@unique.ch>
+License: Proprietary
+Requires-Dist: typing-extensions>=4.14.1,<5
+Requires-Dist: pydantic>=2.12.3,<3
+Requires-Dist: pydantic-settings>=2.10.1,<3
+Requires-Dist: timeout-decorator>=0.5.0,<1
+Requires-Dist: markdownify>=0.14.1,<1
+Requires-Dist: fake-useragent>=2.2.0,<3
+Requires-Dist: crawl4ai>=0.6.3,<1
+Requires-Dist: firecrawl>=3.3.2,<4
+Requires-Dist: tavily-python>=0.7.11,<1
+Requires-Dist: unidecode>=1.4.0,<2
+Requires-Dist: azure-ai-projects>=1.0.0,<2
+Requires-Dist: azure-identity>=1.25.0,<2
+Requires-Dist: unique-toolkit>=1.38.3,<2
+Requires-Dist: azure-core>=1.36.0,<2
+Requires-Dist: google-cloud-aiplatform>=1.128.0,<2
+Requires-Dist: google-auth>=2.43.0,<3
+Requires-Dist: google-generativeai>=0.8.5,<0.9
+Requires-Dist: langchain-text-splitters>=1.0.0,<2
+Requires-Dist: httpx>=0.28.1
+Requires-Dist: tiktoken>=0.12.0
+Requires-Dist: openai>=1.109.1
+Requires-Dist: certifi>=2025.11.12
+Requires-Python: >=3.12
+Description-Content-Type: text/markdown
+# Unique Web Search
+A powerful, configurable web search tool for retrieving and processing the latest information from the internet. This package provides intelligent search capabilities with support for multiple search engines, web crawlers, and content processing strategies.
+## Architecture
+The following diagram illustrates the complete architecture and workflow of the unique_web_search package:
+![Web Search Tool Architecture](docs/images/architecture-diagram.svg)
+## Key Features
+- **Dual Execution Modes**:
+  - **V1 (Traditional)**: Query refinement with single or multiple search strategies
+  - **V2 (Step-based Planning)**: Advanced research planning with parallel execution
+- **Multiple Search Engines**:
+  - Google Search
+  - Bing Search
+  - Brave Search
+  - Jina Search
+  - Tavily Search
+  - Firecrawl Search
+  - VertexAI (Gemini with Grounding)
+  - Custom API (integrate any compatible web search API)
+- **Multiple Web Crawlers**:
+  - Basic HTTP Crawler
+  - Crawl4AI
+  - Jina Reader
+  - Tavily Crawler
+  - Firecrawl Crawler
+- **Intelligent Content Processing**:
+  - LLM-based summarization
+  - Token-based truncation
+  - Relevancy scoring and sorting
+  - Content chunking and optimization
+- **Query Refinement**:
+  - **BASIC Mode**: Single optimized search query
+  - **ADVANCED Mode**: Multiple targeted search queries for complex research
+- **Performance Optimized**:
+  - Parallel execution of search and crawl operations
+  - Token limit management
+  - Configurable timeouts and error handling
+## Detailed Subsystem Docs
+For deeper dives into each subsystem, see the dedicated READMEs:
+- [Search Engines](./unique_web_search/services/search_engine/README.md) &mdash; full catalogue of supported engines, configuration, and usage examples.
+- [Crawlers](./unique_web_search/services/crawlers/README.md) &mdash; comparison of crawling strategies (Basic, Crawl4AI, Tavily, Firecrawl, Jina) with setup guides.
+- [Executors](./unique_web_search/services/executors/README.md) &mdash; orchestration layer (V1 & V2) covering query refinement, planning, logging, and best practices.
+## Configuration
+The tool uses environment variables and configuration files to manage API keys and settings. Key configuration areas include:
+- Search engine selection and API keys
+- Crawler selection and configuration
+- Content processing strategies (SUMMARIZE, TRUNCATE, NONE)
+- Token limits and relevancy thresholds
+- Proxy configuration
+- Debug and monitoring options
+## Dependency management (uv.lock + min/latest testing)
+This package is a **library** and uses `uv` for dependency management.
+We additionally run the tests with minimal dependency versions to ensure that the listed ranges are valid. NOTE: We use lowest-direct, not lowest.
+Lowest attempts to use the lowest possible dependency versions _transitively_, which causes issues if a dependency has incorrect metadata. Example:
+- google-cloud-aiplatform says it works with shapely<3.0.0.
+- The lowest resolver therefore assumes shapely 1.0, which needs Python 2 -> breaks
+Therefore we use lowest-direct, which only sets our direct dependencies to their lowest versions. However, this only correctly verifies our minimum
+dependencies if our code lists all the required dependencies and never imports a transitive dependency. We therefore use deptry to ensure we
+don't use transitive dependencies and that we have no unused dependencies.
+### Test locally
+- **Latest deps and deptry**:
+```bash
+cd tool_packages/unique_web_search
+uv sync
+uv run pytest
+uv run deptry
+```
+- **Min deps**:
+```bash
+cd tool_packages/unique_web_search
+uv venv
+# Install runtime deps at minimum versions
+uv pip install -e . --resolution=lowest-direct
+# Install dev deps from [dependency-groups] (we only care about runtime dep minimums)
+uv export --only-group dev --no-hashes | uv pip install -r -
+# Use --no-sync to prevent uv from "fixing" the versions
+uv run --no-sync pytest
+```
+## Workflow
+1. **Input**: User query or structured search plan
+2. **Configuration**: Load settings and initialize services
+3. **Execution**:
+   - V1: Query refinement → Search → Crawl → Process
+   - V2: Execute planned steps in parallel → Process
+4. **Content Processing**: Clean, summarize/truncate, and chunk content
+5. **Optimization**: Reduce to token limits and sort by relevance
+6. **Output**: Return structured content chunks optimized for LLM consumption
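
Workflow steps 4 and 5 in the package description above (truncate, reduce to token limits, sort by relevance) can be illustrated with a minimal sketch. This is not the package's implementation: the `Chunk` class, `count_tokens`, and `fit_to_budget` are hypothetical names, and whitespace splitting stands in for a real tokenizer such as tiktoken, which the package lists as a dependency.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a processed content chunk."""

    url: str
    text: str
    relevance: float  # higher means more relevant


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    # A real tokenizer would produce different counts.
    return len(text.split())


def fit_to_budget(chunks: list[Chunk], max_tokens: int) -> list[Chunk]:
    """Sort by relevance (descending) and keep chunks while the budget allows."""
    kept: list[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.relevance, reverse=True):
        cost = count_tokens(chunk.text)
        if used + cost > max_tokens:
            continue  # skip chunks that do not fit; a smaller one may still fit
        kept.append(chunk)
        used += cost
    return kept


chunks = [
    Chunk("https://example.com/a", "alpha beta gamma", relevance=0.4),
    Chunk("https://example.com/b", "one two three four five", relevance=0.9),
    Chunk("https://example.com/c", "short note", relevance=0.7),
]
selected = fit_to_budget(chunks, max_tokens=7)
print([c.url for c in selected])
```

The greedy loop favours the most relevant chunks first. Whether the real tool skips oversized chunks or truncates them instead is not stated in the description.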

unique_web_search-1.7.4/README.md ADDED

@@ -0,0 +1,115 @@
+# Unique Web Search
+A powerful, configurable web search tool for retrieving and processing the latest information from the internet. This package provides intelligent search capabilities with support for multiple search engines, web crawlers, and content processing strategies.
+## Architecture
+The following diagram illustrates the complete architecture and workflow of the unique_web_search package:
+![Web Search Tool Architecture](docs/images/architecture-diagram.svg)
+## Key Features
+- **Dual Execution Modes**:
+  - **V1 (Traditional)**: Query refinement with single or multiple search strategies
+  - **V2 (Step-based Planning)**: Advanced research planning with parallel execution
+- **Multiple Search Engines**:
+  - Google Search
+  - Bing Search
+  - Brave Search
+  - Jina Search
+  - Tavily Search
+  - Firecrawl Search
+  - VertexAI (Gemini with Grounding)
+  - Custom API (integrate any compatible web search API)
+- **Multiple Web Crawlers**:
+  - Basic HTTP Crawler
+  - Crawl4AI
+  - Jina Reader
+  - Tavily Crawler
+  - Firecrawl Crawler
+- **Intelligent Content Processing**:
+  - LLM-based summarization
+  - Token-based truncation
+  - Relevancy scoring and sorting
+  - Content chunking and optimization
+- **Query Refinement**:
+  - **BASIC Mode**: Single optimized search query
+  - **ADVANCED Mode**: Multiple targeted search queries for complex research
+- **Performance Optimized**:
+  - Parallel execution of search and crawl operations
+  - Token limit management
+  - Configurable timeouts and error handling
+## Detailed Subsystem Docs
+For deeper dives into each subsystem, see the dedicated READMEs:
+- [Search Engines](./unique_web_search/services/search_engine/README.md) &mdash; full catalogue of supported engines, configuration, and usage examples.
+- [Crawlers](./unique_web_search/services/crawlers/README.md) &mdash; comparison of crawling strategies (Basic, Crawl4AI, Tavily, Firecrawl, Jina) with setup guides.
+- [Executors](./unique_web_search/services/executors/README.md) &mdash; orchestration layer (V1 & V2) covering query refinement, planning, logging, and best practices.
+## Configuration
+The tool uses environment variables and configuration files to manage API keys and settings. Key configuration areas include:
+- Search engine selection and API keys
+- Crawler selection and configuration
+- Content processing strategies (SUMMARIZE, TRUNCATE, NONE)
+- Token limits and relevancy thresholds
+- Proxy configuration
+- Debug and monitoring options
+## Dependency management (uv.lock + min/latest testing)
+This package is a **library** and uses `uv` for dependency management.
+We additionally run the tests with minimal dependency versions to ensure that the listed ranges are valid. NOTE: We use lowest-direct, not lowest.
+Lowest attempts to use the lowest possible dependency versions _transitively_, which causes issues if a dependency has incorrect metadata. Example:
+- google-cloud-aiplatform says it works with shapely<3.0.0.
+- The lowest resolver therefore assumes shapely 1.0, which needs Python 2 -> breaks
+Therefore we use lowest-direct, which only sets our direct dependencies to their lowest versions. However, this only correctly verifies our minimum
+dependencies if our code lists all the required dependencies and never imports a transitive dependency. We therefore use deptry to ensure we
+don't use transitive dependencies and that we have no unused dependencies.
+### Test locally
+- **Latest deps and deptry**:
+```bash
+cd tool_packages/unique_web_search
+uv sync
+uv run pytest
+uv run deptry
+```
+- **Min deps**:
+```bash
+cd tool_packages/unique_web_search
+uv venv
+# Install runtime deps at minimum versions
+uv pip install -e . --resolution=lowest-direct
+# Install dev deps from [dependency-groups] (we only care about runtime dep minimums)
+uv export --only-group dev --no-hashes | uv pip install -r -
+# Use --no-sync to prevent uv from "fixing" the versions
+uv run --no-sync pytest
+```
+## Workflow
+1. **Input**: User query or structured search plan
+2. **Configuration**: Load settings and initialize services
+3. **Execution**:
+   - V1: Query refinement → Search → Crawl → Process
+   - V2: Execute planned steps in parallel → Process
+4. **Content Processing**: Clean, summarize/truncate, and chunk content
+5. **Optimization**: Reduce to token limits and sort by relevance
+6. **Output**: Return structured content chunks optimized for LLM consumption
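
The README's dependency-management note says that lowest-direct sets only the direct dependencies to their lowest versions. As a rough illustration of what that means for the declared ranges, the sketch below turns a `name>=X,<Y` requirement into the pin `name==X`. The helper `lowest_direct_pin` is hypothetical and not part of uv. The real resolver picks the lowest compatible release that actually exists, which may differ from the declared lower bound.

```python
import re

# A few direct requirements copied from the package metadata above.
REQUIREMENTS = [
    "pydantic>=2.12.3,<3",
    "google-generativeai>=0.8.5,<0.9",
    "httpx>=0.28.1",
]

# Project name, then a '>=' lower bound that ends at a comma or whitespace.
_LOWER_BOUND = re.compile(r"^([A-Za-z0-9._-]+?)\s*>=\s*([^,\s]+)")


def lowest_direct_pin(requirement: str) -> str:
    """Turn 'name>=X,<Y' into 'name==X' (hypothetical helper, not part of uv)."""
    match = _LOWER_BOUND.match(requirement)
    if match is None:
        raise ValueError(f"no '>=' lower bound in {requirement!r}")
    name, minimum = match.groups()
    return f"{name}=={minimum}"


pins = [lowest_direct_pin(r) for r in REQUIREMENTS]
print(pins)
```

Transitive dependencies are left out on purpose: under lowest-direct they resolve as usual, which is why the shapely problem described in the README does not arise.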

unique_web_search-1.7.4/pyproject.toml ADDED

@@ -0,0 +1,59 @@
+[project]
+name = "unique_web_search"
+version = "1.7.4"
+description = ""
+readme = "README.md"
+requires-python = ">=3.12"
+authors = [
+  { name = "Andreas Hauri", email = "andreas@unique.ch" },
+  { name = "Gustav Hartz", email = "gustav.hartz.ext@unique.ch" },
+  { name = "Rami Azouz", email = "rami.ext@unique.ch" },
+]
+license = { text = "Proprietary" }
+dependencies = [
+  "typing-extensions>=4.14.1,<5",
+  "pydantic>=2.12.3,<3",
+  "pydantic-settings>=2.10.1,<3",
+  "timeout-decorator>=0.5.0,<1",
+  "markdownify>=0.14.1,<1",
+  "fake-useragent>=2.2.0,<3",
+  "crawl4ai>=0.6.3,<1",
+  "firecrawl>=3.3.2,<4",
+  "tavily-python>=0.7.11,<1",
+  "unidecode>=1.4.0,<2",
+  "azure-ai-projects>=1.0.0,<2",
+  "azure-identity>=1.25.0,<2",
+  "unique-toolkit>=1.38.3,<2",
+  "azure-core>=1.36.0,<2",
+  "google-cloud-aiplatform>=1.128.0,<2",
+  "google-auth>=2.43.0,<3",
+  "google-generativeai>=0.8.5,<0.9",
+  "langchain-text-splitters>=1.0.0,<2",
+  "httpx>=0.28.1",
+  "tiktoken>=0.12.0",
+  "openai>=1.109.1",
+  "certifi>=2025.11.12",
+]
+[build-system]
+requires = ["uv_build>=0.8.14,<0.9.0"]
+build-backend = "uv_build"
+[dependency-groups]
+dev = [
+  "deptry>=0.24.0",
+  "pytest>=8.4.1,<9",
+  "pytest-asyncio>=1.2.0,<2",
+  "pytest-mock>=3.14.0,<4",
+  "ruff>=0.12.10,<0.13",
+]
+[tool.deptry]
+known_first_party = ["unique_web_search"]
+[tool.ruff]
+target-version = "py312"
+[tool.ruff.lint]
+extend-select = ["I"]

unique_web_search-1.7.4/src/unique_web_search/__init__.py ADDED

File without changes

unique_web_search-1.7.4/src/unique_web_search/client_settings.py ADDED

@@ -0,0 +1,202 @@
+import logging
+from pydantic import BaseModel
+from unique_web_search.settings import env_settings
+_LOGGER = logging.getLogger(__name__)
+class GoogleSearchSettings(BaseModel):
+    api_key: str | None = None
+    search_engine_id: str | None = None
+    api_endpoint: str | None = None
+    @property
+    def is_configured(self) -> bool:
+        return all(
+            value is not None
+            for value in (self.api_key, self.search_engine_id, self.api_endpoint)
+        )
+    @classmethod
+    def from_env_settings(cls):
+        missing_settings = []
+        if env_settings.google_search_api_key is None:
+            missing_settings.append("API Key")
+        if env_settings.google_search_engine_id is None:
+            missing_settings.append("Engine ID")
+        if env_settings.google_search_api_endpoint is None:
+            missing_settings.append("API Endpoint")
+        if missing_settings:
+            _LOGGER.warning(
+                f"Google Search API missing required settings: {', '.join(missing_settings)}"
+            )
+        else:
+            _LOGGER.info("Google Search API is properly configured")
+        return cls(
+            api_key=env_settings.google_search_api_key,
+            search_engine_id=env_settings.google_search_engine_id,
+            api_endpoint=env_settings.google_search_api_endpoint,
+        )
+_google_search_settings: GoogleSearchSettings | None = None
+def get_google_search_settings() -> GoogleSearchSettings:
+    global _google_search_settings
+    if _google_search_settings is None:
+        _google_search_settings = GoogleSearchSettings.from_env_settings()
+    return _google_search_settings
+class FirecrawlSearchSettings(BaseModel):
+    api_key: str | None = None
+    @property
+    def is_configured(self) -> bool:
+        return self.api_key is not None
+    @classmethod
+    def from_env_settings(cls):
+        missing_settings = []
+        if env_settings.firecrawl_api_key is None:
+            missing_settings.append("API Key")
+        if missing_settings:
+            _LOGGER.warning(
+                f"Firecrawl Search API missing required settings: {', '.join(missing_settings)}"
+            )
+        else:
+            _LOGGER.info("Firecrawl Search API is properly configured")
+        return cls(
+            api_key=env_settings.firecrawl_api_key,
+        )
+_firecrawl_search_settings: FirecrawlSearchSettings | None = None
+def get_firecrawl_search_settings() -> FirecrawlSearchSettings:
+    global _firecrawl_search_settings
+    if _firecrawl_search_settings is None:
+        _firecrawl_search_settings = FirecrawlSearchSettings.from_env_settings()
+    return _firecrawl_search_settings
+class JinaSearchSettings(BaseModel):
+    api_key: str | None = None
+    search_api_endpoint: str = env_settings.jina_search_api_endpoint
+    reader_api_endpoint: str = env_settings.jina_reader_api_endpoint
+    @property
+    def is_configured(self) -> bool:
+        return self.api_key is not None
+    @classmethod
+    def from_env_settings(cls):
+        missing_settings = []
+        if env_settings.jina_api_key is None:
+            missing_settings.append("API Key")
+        if missing_settings:
+            _LOGGER.warning(
+                f"Jina Search API missing required settings: {', '.join(missing_settings)}"
+            )
+        else:
+            _LOGGER.info("Jina Search API is properly configured")
+        return cls(
+            api_key=env_settings.jina_api_key,
+        )
+_jina_search_settings: JinaSearchSettings | None = None
+def get_jina_search_settings() -> JinaSearchSettings:
+    global _jina_search_settings
+    if _jina_search_settings is None:
+        _jina_search_settings = JinaSearchSettings.from_env_settings()
+    return _jina_search_settings
+class TavilySearchSettings(BaseModel):
+    api_key: str | None = None
+    @property
+    def is_configured(self) -> bool:
+        return self.api_key is not None
+    @classmethod
+    def from_env_settings(cls):
+        missing_settings = []
+        if env_settings.tavily_api_key is None:
+            missing_settings.append("API Key")
+        if missing_settings:
+            _LOGGER.warning(
+                f"Tavily Search API missing required settings: {', '.join(missing_settings)}"
+            )
+        else:
+            _LOGGER.info("Tavily Search API is properly configured")
+        return cls(api_key=env_settings.tavily_api_key)
+_tavily_search_settings: TavilySearchSettings | None = None
+def get_tavily_search_settings() -> TavilySearchSettings:
+    global _tavily_search_settings
+    if _tavily_search_settings is None:
+        _tavily_search_settings = TavilySearchSettings.from_env_settings()
+    return _tavily_search_settings
+class BraveSearchSettings(BaseModel):
+    api_key: str | None = None
+    api_endpoint: str | None = None
+    @property
+    def is_configured(self) -> bool:
+        return self.api_key is not None
+    @classmethod
+    def from_env_settings(cls):
+        missing_settings = []
+        if env_settings.brave_search_api_key is None:
+            missing_settings.append("API Key")
+        if env_settings.brave_search_api_endpoint is None:
+            missing_settings.append("API Endpoint")
+        if missing_settings:
+            _LOGGER.warning(
+                f"Brave Search API missing required settings: {', '.join(missing_settings)}"
+            )
+        else:
+            _LOGGER.info("Brave Search API is properly configured")
+        return cls(
+            api_key=env_settings.brave_search_api_key,
+            api_endpoint=env_settings.brave_search_api_endpoint,
+        )
+_brave_search_settings: BraveSearchSettings | None = None
+def get_brave_search_settings() -> BraveSearchSettings:
+    global _brave_search_settings
+    if _brave_search_settings is None:
+        _brave_search_settings = BraveSearchSettings.from_env_settings()
+    return _brave_search_settings