ddgo-search 1.0.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- ddgo_search-1.0.0/.codex/agents/ddgo-search.toml +169 -0
- ddgo_search-1.0.0/.github/workflows/release.yml +29 -0
- ddgo_search-1.0.0/.github/workflows/tests.yml +32 -0
- ddgo_search-1.0.0/.gitignore +50 -0
- ddgo_search-1.0.0/.python-version +1 -0
- ddgo_search-1.0.0/AGENTS.md +94 -0
- ddgo_search-1.0.0/PKG-INFO +177 -0
- ddgo_search-1.0.0/README.md +164 -0
- ddgo_search-1.0.0/pyproject.toml +28 -0
- ddgo_search-1.0.0/skills/ddgo-search-skill/SKILL.md +128 -0
- ddgo_search-1.0.0/src/ddgo_search/__init__.py +1 -0
- ddgo_search-1.0.0/src/ddgo_search/cli.py +533 -0
- ddgo_search-1.0.0/src/ddgo_search/utils.py +436 -0
- ddgo_search-1.0.0/tests/test_cli.py +332 -0
- ddgo_search-1.0.0/tests/test_integration.py +119 -0
- ddgo_search-1.0.0/uv.lock +668 -0
@@ -0,0 +1,169 @@
+name = "ddgo-search"
+description = "Web search and page fetching agent using the DuckDuckGo Search CLI wrapper."
+model = "gemini-3.5-flash:latest"
+sandbox_mode = "danger-full-access"
+
+developer_instructions = """
+You are a web content analysis agent for Crush. Your task is to analyze web content, search results, or web pages to extract the information requested by the user.
+
+<rules>
+1. Be concise and direct in your responses
+2. Focus only on the information requested in the user's prompt
+3. If the content is provided in a file path, use the grep and view tools to efficiently search through it
+4. When relevant, quote specific sections from the content to support your answer
+5. If the requested information is not found, clearly state that
+6. Any file paths you use MUST be absolute
+7. **IMPORTANT**: If you need information from a linked page or search result, run the `ddgo-search fetch` command via the Bash tool to retrieve the content
+8. **IMPORTANT**: If you need to search for more information, run the `ddgo-search text` command via the Bash tool
+9. After fetching a link, analyze the content yourself to extract what's needed
+10. Don't hesitate to follow multiple links or perform multiple searches if necessary to get complete information
+11. **CRITICAL**: At the end of your response, include a "Sources" section listing ALL URLs that were useful in answering the question
+</rules>
+
+<search_strategy>
+When searching for information:
+
+1. **Break down complex questions** - If the user's question has multiple parts, search for each part separately
+2. **Use specific, targeted queries** - Prefer multiple small searches over one broad search
+   - Bad: "Python 3.12 new features performance improvements async changes"
+   - Good: First "Python 3.12 new features", then "Python 3.12 performance improvements", then "Python 3.12 async changes"
+3. **Iterate and refine** - If initial results aren't helpful, try different search terms or more specific queries
+4. **Search for different aspects** - For comprehensive answers, search for different angles of the topic
+5. **Follow up on promising results** - When you find a good source, fetch it and look for links to related information
+
+Example workflow for "What are the pros and cons of using Rust vs Go for web services?":
+- Search 1: "Rust web services advantages"
+- Search 2: "Go web services advantages"
+- Search 3: "Rust vs Go performance comparison"
+- Search 4: "Rust vs Go developer experience"
+- Then fetch the most relevant results from each search
+</search_strategy>
+
+<response_format>
+Your response should be structured as follows:
+
+[Your answer to the user's question]
+
+## Sources
+- [URL 1 that was useful]
+- [URL 2 that was useful]
+- [URL 3 that was useful]
+...
+
+Only include URLs that actually contributed information to your answer. Include the main URL or search results that were helpful. Add any additional URLs you fetched that provided relevant information.
+</response_format>
+
+<env>
+Working directory: {{.WorkingDir}}
+Platform: {{.Platform}}
+Today's date: {{.Date}}
+</env>
+
+<bash_tool>
+You have access to the Bash tool to run the `ddgo-search` CLI wrapper:
+
+### Web Search (`uv run ddgo-search text`)
+Run web search queries to retrieve results:
+`uv run ddgo-search text "<query>" [OPTIONS]`
+- Provide a query string.
+- Use `--max-results <INTEGER>` to specify the maximum results (default: 10).
+- Use `-f table` or `-f plain` for token-efficient terminal rendering.
+- Keep queries short and specific (3-6 words). Prefer multiple focused searches over a single broad search.
+
+### Page Fetching (`uv run ddgo-search fetch`)
+Directly fetch a URL and convert its HTML content to Markdown:
+`uv run ddgo-search fetch "<URL>" [OPTIONS]`
+- Provide the URL to fetch.
+- Use `-f markdown` (default) for clean markdown content.
+- Use `-s <INTEGER>` to adjust maximum content size in bytes (default: 102400 / 100KB) to prevent token waste.
+- This is highly recommended for scraping specific pages to bypass extraction server limits.
+
+### Page Extraction (`uv run ddgo-search extract`)
+Extract main content using DuckDuckGo's internal extraction backend:
+`uv run ddgo-search extract "<URL>" [OPTIONS]`
+- Extracts content using DuckDuckGo's extractor.
+</bash_tool>
+"""
+
+[instructions]
+text = """
+You are a web content analysis agent for Crush. Your task is to analyze web content, search results, or web pages to extract the information requested by the user.
+
+<rules>
+1. Be concise and direct in your responses
+2. Focus only on the information requested in the user's prompt
+3. If the content is provided in a file path, use the grep and view tools to efficiently search through it
+4. When relevant, quote specific sections from the content to support your answer
+5. If the requested information is not found, clearly state that
+6. Any file paths you use MUST be absolute
+7. **IMPORTANT**: If you need information from a linked page or search result, run the `ddgo-search fetch` command via the Bash tool to retrieve the content
+8. **IMPORTANT**: If you need to search for more information, run the `ddgo-search text` command via the Bash tool
+9. After fetching a link, analyze the content yourself to extract what's needed
+10. Don't hesitate to follow multiple links or perform multiple searches if necessary to get complete information
+11. **CRITICAL**: At the end of your response, include a "Sources" section listing ALL URLs that were useful in answering the question
+</rules>
+
+<search_strategy>
+When searching for information:
+
+1. **Break down complex questions** - If the user's question has multiple parts, search for each part separately
+2. **Use specific, targeted queries** - Prefer multiple small searches over one broad search
+   - Bad: "Python 3.12 new features performance improvements async changes"
+   - Good: First "Python 3.12 new features", then "Python 3.12 performance improvements", then "Python 3.12 async changes"
+3. **Iterate and refine** - If initial results aren't helpful, try different search terms or more specific queries
+4. **Search for different aspects** - For comprehensive answers, search for different angles of the topic
+5. **Follow up on promising results** - When you find a good source, fetch it and look for links to related information
+
+Example workflow for "What are the pros and cons of using Rust vs Go for web services?":
+- Search 1: "Rust web services advantages"
+- Search 2: "Go web services advantages"
+- Search 3: "Rust vs Go performance comparison"
+- Search 4: "Rust vs Go developer experience"
+- Then fetch the most relevant results from each search
+</search_strategy>
+
+<response_format>
+Your response should be structured as follows:
+
+[Your answer to the user's question]
+
+## Sources
+- [URL 1 that was useful]
+- [URL 2 that was useful]
+- [URL 3 that was useful]
+...
+
+Only include URLs that actually contributed information to your answer. Include the main URL or search results that were helpful. Add any additional URLs you fetched that provided relevant information.
+</response_format>
+
+<env>
+Working directory: {{.WorkingDir}}
+Platform: {{.Platform}}
+Today's date: {{.Date}}
+</env>
+
+<bash_tool>
+You have access to the Bash tool to run the `ddgo-search` CLI wrapper:
+
+### Web Search (`uv run ddgo-search text`)
+Run web search queries to retrieve results:
+`uv run ddgo-search text "<query>" [OPTIONS]`
+- Provide a query string.
+- Use `--max-results <INTEGER>` to specify the maximum results (default: 10).
+- Use `-f table` or `-f plain` for token-efficient terminal rendering.
+- Keep queries short and specific (3-6 words). Prefer multiple focused searches over a single broad search.
+
+### Page Fetching (`uv run ddgo-search fetch`)
+Directly fetch a URL and convert its HTML content to Markdown:
+`uv run ddgo-search fetch "<URL>" [OPTIONS]`
+- Provide the URL to fetch.
+- Use `-f markdown` (default) for clean markdown content.
+- Use `-s <INTEGER>` to adjust maximum content size in bytes (default: 102400 / 100KB) to prevent token waste.
+- This is highly recommended for scraping specific pages to bypass extraction server limits.
+
+### Page Extraction (`uv run ddgo-search extract`)
+Extract main content using DuckDuckGo's internal extraction backend:
+`uv run ddgo-search extract "<URL>" [OPTIONS]`
+- Extracts content using DuckDuckGo's extractor.
+</bash_tool>
+"""
@@ -0,0 +1,29 @@
+name: Publish to PyPI
+
+on:
+  push:
+    tags:
+      - "v*"
+
+jobs:
+  publish:
+    name: Build and publish to PyPI
+    runs-on: ubuntu-latest
+    environment: pypi
+    permissions:
+      id-token: write # Required for Trusted Publishing (OIDC)
+      contents: read
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+
+      - name: Build packages
+        run: uv build
+
+      - name: Publish to PyPI
+        run: uv publish
@@ -0,0 +1,32 @@
+name: Tests
+
+on:
+  push:
+    branches: [ "main" ]
+  pull_request:
+    branches: [ "main" ]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.11", "3.12", "3.13", "3.14"]
+      fail-fast: false
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up uv and Python ${{ matrix.python-version }}
+        uses: astral-sh/setup-uv@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          enable-cache: true
+          cache-python: true
+
+      - name: Install dependencies
+        run: uv sync --locked --all-extras --dev
+
+      - name: Run tests
+        run: uv run pytest
@@ -0,0 +1,50 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# Pytest / linting / type checking cache
+.pytest_cache/
+.ruff_cache/
+.mypy_cache/
+.coverage
+htmlcov/
+
+# Virtual environments
+.venv/
+venv/
+ENV/
+env/
+
+# IDEs and editors
+.idea/
+.vscode/
+*.swp
+*.swo
+
+# OS files
+.DS_Store
+Thumbs.db
@@ -0,0 +1 @@
+3.11
@@ -0,0 +1,94 @@
+# Agent Guide - ddgo-search
+
+This guide provides the essential technical context, architecture, commands, patterns, and non-obvious details/gotchas required for an AI agent to work efficiently in this repository.
+
+---
+
+## 🛠️ Essential Commands
+
+The project uses [uv](https://github.com/astral-sh/uv) as its package and environment manager.
+
+- **Run unit tests**:
+  ```bash
+  uv run pytest
+  ```
+- **Execute CLI directly in development**:
+  ```bash
+  uv run ddgo-search [COMMAND] [ARGS]...
+  ```
+  Example:
+  ```bash
+  uv run ddgo-search text "artificial intelligence" --format plain
+  ```
+
+---
+
+## 📂 Code Organization & Structure
+
+The repository has a clean, standard Python layout:
+
+```text
+├── .python-version         # Target Python version (>=3.11)
+├── pyproject.toml          # Build metadata, CLI entry points, and dependencies
+├── uv.lock                 # Lockfile for the uv environment manager
+├── src/
+│   └── ddgo_search/
+│       ├── __init__.py     # Package metadata (defines __version__)
+│       ├── cli.py          # Typer CLI application structure, parameters, commands
+│       └── utils.py        # Resiliency, rate limiting, and output formatting helpers
+└── tests/
+    └── test_cli.py         # Pytest unit tests utilizing extensive mocking
+```
+
+---
+
+## 🏗️ Architecture & Data Flow
+
+The project is a resilient CLI wrapper around the Python `ddgs` (DuckDuckGo Search) library.
+
+### Control Flow
+1. **Invocation**: The user invokes `ddgo-search`.
+2. **Context Setup**: `main_callback()` is triggered, parsing global options (`--proxy`, `--timeout`, `--verify`, `--max-retries`). It builds a `Config` object and attaches it to the Typer context (`ctx.obj`).
+3. **Command Routing**: Typer routes execution to a specific subcommand handler (`text`, `images`, `videos`, `news`, `books`, `extract`, `fetch`).
+4. **Execution & Resiliency**: The subcommand invokes `execute_with_retry()`, supplying a search function matching the category.
+5. **Rate Limiting**: Within `execute_with_retry()`, `ensure_rate_limit()` is called. It checks the global `ddgo_search_rate.json` file to ensure a randomized gap of **1.5 to 3.0 seconds** has elapsed since the last request across any process.
+6. **Query & Proxy Rotation**: The wrapper tries to execute the query. If a proxy was provided (single, comma-separated, or file path), the query uses it. On failure, it performs exponential backoff with jitter and rotates to the next proxy.
+7. **Formatting**: If successful, results are passed to `display_results()` which maps formatting functions (`json`, `csv`, `plain`, `table`) according to the active query category.
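The retry, backoff, and proxy-rotation behaviour described in steps 4 to 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of `utils.py`: the function name `execute_with_retry_sketch`, its parameters, the one-second base delay, and the jitter range are all hypothetical, and the real implementation may catch narrower exception types.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def execute_with_retry_sketch(
    search: Callable[[Optional[str]], T],
    proxies: Sequence[str] = (),
    max_retries: int = 3,
    base_delay: float = 1.0,  # assumed value, not taken from the package
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call search(proxy), retrying with backoff and rotating proxies on failure."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        # Rotate sequentially: attempt 0 uses the first proxy, attempt 1 the next.
        proxy = proxies[attempt % len(proxies)] if proxies else None
        try:
            return search(proxy)
        except Exception as exc:  # a real tool would likely catch narrower errors
            last_error = exc
            if attempt == max_retries:
                break
            # Exponential backoff (1x, 2x, 4x, ...) plus up to one second of jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0.0, 1.0))
    raise RuntimeError(f"all {max_retries + 1} attempts failed") from last_error
```

Passing `sleep` as a parameter keeps the sketch testable without real delays, which mirrors the way the test suite is described as patching the rate limiter.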
+
+---
+
+## 🎨 Design & Style Patterns
+
+- **CLI Framework**: [Typer](https://typer.tiangolo.com/) is used for defining commands, arguments, option groups, and help text. Subcommands match `ddgs` methods directly.
+- **Terminal output**: [Rich](https://github.com/Textualize/rich) is used for rendering plain and markdown outputs.
+- **Output Formats**:
+  - `table`: Custom space-efficient ASCII table. To minimize token waste, a custom ASCII generator `format_simple_table()` is used instead of heavy tables.
+  - `json`: Standard JSON-serialized dump.
+  - `csv`: Standard CSV format.
+  - `plain`: Clean colored output optimized for terminal readability.
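To make the `table` format concrete, here is a rough sketch of a space-padded ASCII table renderer. It is not the package's `format_simple_table()`; the name `simple_ascii_table`, its signature, and the exact divider characters are assumptions chosen for illustration.

```python
from typing import Sequence


def simple_ascii_table(headers: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Render rows as a space-padded table with a single dashed divider."""
    table = [list(headers)] + [[str(cell) for cell in row] for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(line[i]) for line in table) for i in range(len(headers))]

    def render(cells: Sequence[str]) -> str:
        # Pad every cell to its column width; drop padding at the end of the line.
        return " | ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    divider = "-+-".join("-" * w for w in widths)
    return "\n".join([render(table[0]), divider] + [render(r) for r in table[1:]])
```

Using one divider row and plain ASCII separators keeps the output small compared with box-drawing grids, which is the token-saving idea the bullet describes.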
+
+---
+
+## 🧪 Testing Approach
+
+- **Resiliency Mocking**: Since live queries to DuckDuckGo are prone to rate-limiting and external network failures, **all CLI tests must mock the `ddgs.DDGS` class**.
+- **Instant Test Runs**: The `tests/test_cli.py` module defines an autouse fixture `mock_rate_limit` which patches `ddgo_search.utils.ensure_rate_limit` to instantly bypass rate limits, speeding up test suite execution.
+- **CliRunner**: Use Typer's `CliRunner` for invoking and verifying standard outputs and exit codes.
+
+---
+
+## ⚠️ Non-Obvious Gotchas & Quirks
+
+1. **Typo in the `ddgs` library (Video Resolution)**:
+   - The third-party `ddgs` library expects `"standart"` as the value for standard resolution instead of `"standard"`.
+   - **Do not "fix" this spelling** in `cli.py` or `VideoResolution` enum. It is defined as `STANDARD = "standart"` to correctly interface with the underlying library.
+2. **Rate Limiting Persistence**:
+   - Rate limiting relies on a file named `ddgo_search_rate.json` written to the system's temporary directory (`tempfile.gettempdir()`). If writing/reading to/from this file fails, the application fails silently to avoid blocking execution.
+3. **Proxy Input Formats**:
+   - The `--proxy` option accepts a single proxy URL, a comma-separated list of proxy URLs, or a **file path** containing proxy URLs (one per line). The parser `parse_proxies()` detects local files dynamically.
+4. **Markdown Cleaning**:
+   - Web extraction results can contain bloated markup. The `clean_markdown` utility collapses three or more consecutive blank lines into exactly two and strips trailing whitespace from all lines.
+5. **Direct Fetch (`fetch`) vs DDG Extract (`extract`)**:
+   - The `extract` command uses DuckDuckGo's internal extraction backend via `ddgs.extract`.
+   - The `fetch` command mimics Charmbracelet Crush's direct fetch tool by directly requesting the target URL via `httpx` and parsing it locally with `BeautifulSoup` and `markdownify` into `text`, `markdown`, or `html`. It also respects a `--max-size` limit (default 100KB) and truncates any content exceeding it.
+
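The Markdown cleaning rule in gotcha 4 is easy to misread, so a small sketch may help. It assumes "blank line" includes whitespace-only lines and uses the hypothetical name `clean_markdown_sketch`; the package's own `clean_markdown` may differ in detail.

```python
import re


def clean_markdown_sketch(text: str) -> str:
    """Strip trailing whitespace and cap runs of blank lines at two."""
    # Strip trailing whitespace first so whitespace-only lines count as blank.
    stripped = "\n".join(line.rstrip() for line in text.split("\n"))
    # Three or more blank lines means four or more consecutive newlines;
    # exactly two blank lines is three newlines.
    return re.sub(r"\n{4,}", "\n\n\n", stripped)
```

Stripping before collapsing matters: if the order were reversed, lines containing only spaces would break up a run of blank lines and survive the collapse.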
@@ -0,0 +1,177 @@
+Metadata-Version: 2.4
+Name: ddgo-search
+Version: 1.0.0
+Summary: Add your description here
+Requires-Python: >=3.11
+Requires-Dist: beautifulsoup4>=4.14.3
+Requires-Dist: ddgs>=9.14.4
+Requires-Dist: httpx>=0.28.1
+Requires-Dist: markdownify>=1.2.2
+Requires-Dist: rich>=15.0.0
+Requires-Dist: typer>=0.26.6
+Description-Content-Type: text/markdown
+
+# ddgo-search
+
+A highly resilient, token-efficient, and feature-rich Command Line Interface (CLI) wrapper around the DuckDuckGo Search (`ddgs`) Python library. It features built-in proxy rotation, rate-limiting, custom token-saving ASCII rendering, webpage extraction, and direct content fetching.
+
+---
+
+## ✨ Features
+
+- **🌐 Comprehensive Query Support**: Subcommands for `text`, `images`, `videos`, `news`, `books`, and web page extraction/fetching.
+- **🔄 Resilient Proxy Rotation**: Accepts single proxy URLs, comma-separated lists, or files containing lists of proxies. Automatically rotates proxy servers sequentially on failure.
+- **⏱️ Process-Safe Rate Limiting**: Randomised delays (between 1.5s and 3.0s) are tracked globally using a system-level temporary file to ensure safe execution even when multiple commands are run concurrently.
+- **⚡ Direct Web Fetching (`fetch`)**: Inspired by Charmbracelet's `crush` tool. Directly fetches and converts webpages using `httpx`, `BeautifulSoup`, and `markdownify` into beautiful plain text, markdown, or HTML, with auto-truncation limits (e.g., 100KB) to preserve context windows.
+- **📊 Token-Efficient ASCII Layouts**: Formats console lists using space-padded ASCII dividers rather than heavy Unicode grids to save context window tokens when using LLM agents.
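The file-based rate limiting feature can be sketched roughly as below. Everything here is an assumption made for illustration: the function name `ensure_rate_limit_sketch`, the state file name, and the `last_request` key are hypothetical, and this sketch takes no file lock, so it only approximates cross-process coordination.

```python
import json
import random
import tempfile
import time
from pathlib import Path
from typing import Callable

# Assumed location; the real tool may name or place its state file differently.
RATE_FILE = Path(tempfile.gettempdir()) / "rate_limit_sketch.json"


def ensure_rate_limit_sketch(
    path: Path = RATE_FILE,
    min_gap: float = 1.5,
    max_gap: float = 3.0,
    now: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Wait until a random gap has passed since the last recorded request.

    Returns the number of seconds slept (0.0 if no wait was needed).
    """
    gap = random.uniform(min_gap, max_gap)
    waited = 0.0
    try:
        last = float(json.loads(path.read_text())["last_request"])
        remaining = last + gap - now()
        if remaining > 0:
            sleep(remaining)
            waited = remaining
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing or unreadable state: do not block the request
    try:
        path.write_text(json.dumps({"last_request": now()}))
    except OSError:
        pass  # failing to record the timestamp is not fatal either
    return waited
```

Because the timestamp lives in the system temporary directory, separate invocations of the command see each other's last request time without any long-running daemon.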
+
+---
+
+## 🚀 Installation
+
+Install and run using [uv](https://github.com/astral-sh/uv):
+
+```bash
+# Clone the repository
+git clone <repository-url>
+cd ddgo-search
+
+# Install dependencies and sync virtual environment
+uv sync
+```
+
+---
+
+## 📖 CLI Usage
+
+Invoke `ddgo-search` directly using `uv`:
+
+```bash
+uv run ddgo-search [GLOBAL-OPTIONS] COMMAND [ARGS]...
+```
+
+### Global Options
+
+These options must be passed *before* any subcommand:
+
+- `-p, --proxy TEXT`: Proxy URL, comma-separated list of proxy URLs, or file path containing proxies (one per line). Falls back to the `DDGS_PROXY` environment variable.
+- `-t, --timeout INTEGER`: Request timeout in seconds (default: `10`).
+- `--verify / --no-verify`: Enable/disable SSL certificate verification (default: `--verify`).
+- `-r, --max-retries INTEGER`: Maximum retries upon server failures or timeouts (default: `3`).
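The three input forms accepted by `--proxy`, plus the `DDGS_PROXY` fallback, could be normalised along these lines. This is a sketch only: `parse_proxies_sketch` is a hypothetical name, and treating any existing file path as a proxy list is an assumption about how the detection works.

```python
import os
from typing import List, Optional


def parse_proxies_sketch(value: Optional[str]) -> List[str]:
    """Expand a --proxy style value into a list of proxy URLs."""
    if not value:
        # Fall back to the environment variable named in the option help.
        value = os.environ.get("DDGS_PROXY", "")
    value = value.strip()
    if not value:
        return []
    if os.path.isfile(value):
        # File input: one proxy per line; blank lines are skipped.
        with open(value, encoding="utf-8") as handle:
            return [line.strip() for line in handle if line.strip()]
    # A single URL is just a one-element comma-separated list.
    return [part.strip() for part in value.split(",") if part.strip()]
```

Returning a plain list in every case lets the retry logic rotate through proxies without caring which input form the user chose.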
+
+---
+
+### Commands
+
+#### 1. Text Search (`text`)
+Search the web for text results with custom formatting.
+```bash
+uv run ddgo-search text "artificial intelligence" --format plain
+uv run ddgo-search text "python programming" --format table --max-results 5
+```
+
+#### 2. Image Search (`images`)
+Query and filter DuckDuckGo images.
+```bash
+uv run ddgo-search images "cute kittens" --size Large --color Monochrome
+uv run ddgo-search images "space nebula" --layout Wide --format json
+```
+
+#### 3. Video Search (`videos`)
+Search for videos with specific duration, resolution, or license filters.
+```bash
+uv run ddgo-search videos "golang tutorial" --resolution high --duration short
+```
+*Note: The third-party library standard resolution uses the spelling `"standart"`. The CLI enum automatically handles this mapping.*
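One way such a mapping could look is sketched below. Only the `"standart"` value comes from the note above and `high` from the example command; the lookup helper `to_library_value` and the idea of resolving by member name are assumptions, and the actual enum in `cli.py` may expose its choices differently.

```python
from enum import Enum


class VideoResolution(str, Enum):
    """CLI-facing names mapped to the values the search library expects."""

    HIGH = "high"
    # Intentional misspelling: this is the value the upstream library accepts.
    STANDARD = "standart"


def to_library_value(name: str) -> str:
    """Translate a user-supplied resolution name into the library's value."""
    return VideoResolution[name.upper()].value
```

Keeping the misspelling inside the enum value means the rest of the code can refer to `VideoResolution.STANDARD` and never spell the library's string by hand.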
+
+#### 4. News Search (`news`)
+Query recent news.
+```bash
+uv run ddgo-search news "climate change" --timelimit w --format csv
+```
+
+#### 5. Book Search (`books`)
+Search DuckDuckGo books.
+```bash
+uv run ddgo-search books "machine learning" --max-results 10
+```
+
+#### 6. Web Page Extract (`extract`)
+Fetch and extract webpage content using DuckDuckGo's internal extraction backend.
+```bash
+uv run ddgo-search extract "https://example.com" --format text_markdown
+```
+
+#### 7. Direct Web Page Fetch (`fetch`)
+Directly fetch a URL via `httpx` and convert its HTML content locally to Markdown, clean text (excluding scripts, styles, headers, footers), or HTML. Includes auto-truncation.
+```bash
+# Direct fetch and convert to Markdown
+uv run ddgo-search fetch "https://example.com" --format markdown
+
+# Direct fetch and extract readable plain text
+uv run ddgo-search fetch "https://example.com" --format text
+
+# Direct fetch and write to file
+uv run ddgo-search fetch "https://example.com" --format markdown --output doc.md
+
+# Set custom truncation limit (e.g. 5KB)
+uv run ddgo-search fetch "https://example.com" --max-size 5120
+```
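The auto-truncation step can be illustrated with a short sketch. It assumes the limit is measured in UTF-8 bytes, as the `--max-size` examples suggest; the helper name `truncate_to_bytes` is hypothetical and the package may truncate differently, for example by appending a notice.

```python
def truncate_to_bytes(text: str, max_size: int = 102400) -> str:
    """Cut text so its UTF-8 encoding is at most max_size bytes."""
    data = text.encode("utf-8")
    if len(data) <= max_size:
        return text
    # errors="ignore" drops a multi-byte character split by the cut.
    return data[:max_size].decode("utf-8", errors="ignore")
```

Cutting on bytes rather than characters gives a predictable upper bound on payload size, at the cost of occasionally losing one trailing character.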
+
+---
+
+## 🤖 Codex Subagent & Skill Integration
+
+You can integrate `ddgo-search` as a custom subagent in Codex to handle all web search and page fetching tasks.
+
+### 1. Global Installation (Recommended)
+
+To allow Codex to automatically call the `ddgo-search` subagent across all your projects:
+
+1. **Install the CLI globally** so it is available from any workspace directory:
+   ```bash
+   pipx install .
+   # Or install into your global python environment
+   pip install .
+   ```
+2. **Install the Skill globally** by copying the skill directory to your user-level Codex skills folder:
+   ```bash
+   mkdir -p ~/.codex/skills/
+   cp -r skills/ddgo-search-skill ~/.codex/skills/
+   ```
+3. **Install the Subagent Configuration globally**:
+   ```bash
+   mkdir -p ~/.codex/agents/
+   cp .codex/agents/ddgo-search.toml ~/.codex/agents/
+   ```
+
+### 2. Project-level Installation
+
+If you only want this subagent available inside this project directory:
+
+1. Ensure the project is trusted in your `~/.codex/config.toml`:
+   ```toml
+   [projects."/absolute/path/to/ddgo-search"]
+   trust_level = "trusted"
+   ```
+2. The local configurations are already set up under:
+   - Skill: [SKILL.md](file:///Users/2342184/programs/ddgs-search/skills/ddgo-search-skill/SKILL.md)
+   - Subagent Config: [.codex/agents/ddgo-search.toml](file:///Users/2342184/programs/ddgs-search/.codex/agents/ddgo-search.toml)
+
+### 3. Usage
+
+Once the skill and subagent are installed globally, Codex can delegate searches automatically when prompted. You can trigger it explicitly by prompting:
+
+> "Use the `ddgo-search` subagent to help me find information about..."
+
+---
+
+## 🧪 Development & Testing
+
+Run the comprehensive unit test suite:
+
+```bash
+uv run pytest
+```
+
+Our tests mock the `ddgs.DDGS` library as well as network activities to ensure the test suite runs instantly, robustly, and offline.