doc-fetch-cli 1.1.1 → 1.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CONTRIBUTING.md DELETED
@@ -1,274 +0,0 @@
- # Contributing to DocFetch
-
- Thank you for your interest in contributing to DocFetch! 🎉 This guide will help you get started.
-
- ## 📋 Quick Overview
-
- 1. **Create an issue first** - Always start by opening an issue to discuss your change
- 2. **Wait for feedback** - Maintainers will respond and provide guidance
- 3. **Fork and develop** - Once approved, fork the repo and make your changes
- 4. **Submit a PR** - Open a pull request referencing the issue
- 5. **Review process** - Maintainers will review and provide feedback
- 6. **Merge** - Once approved, your contribution will be merged!
-
- ---
-
- ## 🐛 Before You Start: Create an Issue
-
- **⚠️ IMPORTANT: Always create an issue before submitting a PR!**
-
- This helps us:
- - Avoid duplicate work
- - Discuss the best approach
- - Ensure your contribution aligns with project goals
- - Get early feedback from maintainers
-
- ### Types of Issues
-
- #### 🐞 Bug Reports
- Include:
- - Clear description of the bug
- - Steps to reproduce
- - Expected vs actual behavior
- - Environment details (OS, Go version, DocFetch version)
- - Sample command that triggers the bug
- - Error messages or logs
-
- #### ✨ Feature Requests
- Include:
- - Clear description of the feature
- - Use case / problem it solves
- - Example usage
- - Any relevant links or references
-
- #### 📝 Documentation Improvements
- Include:
- - What needs improvement
- - Why it's needed
- - Suggested changes
-
- #### ⚔ Performance Improvements
- Include:
- - Current performance metrics
- - Proposed improvements
- - Benchmark results (if available)
-
- ---
-
- ## 🚀 Development Setup
-
- ### Prerequisites
-
- - Go 1.21 or later
- - Git
- - Make (optional, for running tests)
-
- ### Fork and Clone
-
- ```bash
- # Fork the repository on GitHub, then:
- git clone https://github.com/YOUR_USERNAME/doc-fetch.git
- cd doc-fetch
-
- # Add upstream remote
- git remote add upstream https://github.com/AlphaTechini/doc-fetch.git
- ```
-
- ### Build from Source
-
- ```bash
- # Build the binary
- go build -o doc-fetch ./cmd/docfetch
-
- # Test it works
- ./doc-fetch --help
- ```
-
- ### Run Tests
-
- ```bash
- # Run all tests
- go test ./...
-
- # Run tests with coverage
- go test -cover ./...
-
- # Run specific package tests
- go test ./pkg/fetcher/...
- ```
-
- ---
-
- ## 💻 Making Changes
-
- ### Branch Naming
-
- Use descriptive branch names:
- - `fix/content-extraction-bug`
- - `feat/add-pdf-support`
- - `docs/update-readme-examples`
- - `perf/improve-concurrent-fetching`
-
- ### Code Style
-
- Follow Go best practices:
- - Run `go fmt` before committing
- - Run `go vet` to catch issues
- - Write clear, concise comments
- - Keep functions small and focused
- - Use meaningful variable names
-
- ### Testing Requirements
-
- - Add tests for new features
- - Ensure existing tests pass
- - Include edge cases
- - Test with real documentation sites
-
- Example test:
- ```go
- func TestContentExtraction(t *testing.T) {
-     doc := createTestDocument()
-     content := cleanContent(doc)
-
-     if len(content) == 0 {
-         t.Error("Expected content to be extracted")
-     }
-
-     if !strings.Contains(content, "expected text") {
-         t.Error("Expected content to contain specific text")
-     }
- }
- ```
-
- ---
-
- ## 📤 Submitting a Pull Request
-
- ### PR Checklist
-
- Before submitting your PR, ensure:
-
- - [ ] You created an issue first and referenced it in the PR
- - [ ] Your code follows Go style guidelines
- - [ ] All tests pass (`go test ./...`)
- - [ ] You've added tests for new functionality
- - [ ] You've updated documentation if needed
- - [ ] Your commit messages are clear and descriptive
- - [ ] You've rebased on the latest main branch
-
- ### PR Template
-
- When creating your PR, include:
-
- ```markdown
- ## Description
- Brief description of changes
-
- ## Related Issue
- Fixes #123 (or "Related to #123")
-
- ## Type of Change
- - [ ] Bug fix
- - [ ] New feature
- - [ ] Breaking change
- - [ ] Documentation update
- - [ ] Performance improvement
- - [ ] Refactoring
-
- ## Testing
- Describe how you tested this:
- - [ ] Unit tests added/updated
- - [ ] Manual testing with real docs
- - [ ] Tested on: [list platforms]
-
- ## Example Usage
- Show example command and output if applicable
-
- ## Checklist
- - [ ] Code follows project guidelines
- - [ ] Self-review completed
- - [ ] Comments added where needed
- - [ ] Tests pass locally
- ```
-
- ---
-
- ## 🔍 Review Process
-
- 1. **Automated Checks**: CI runs tests and linting
- 2. **Maintainer Review**: At least one maintainer reviews
- 3. **Feedback**: You may be asked to make changes
- 4. **Approval**: Once approved, PR is merged
- 5. **Release**: Changes included in next release
-
- Typical timeline: 3-7 days for review
-
- ---
-
- ## 📖 Contribution Ideas
-
- Looking for ways to contribute? Here are some ideas:
-
- ### Easy Wins
- - Fix typos in documentation
- - Add more examples to README
- - Improve error messages
- - Add unit tests for existing code
-
- ### Intermediate
- - Add support for new documentation site formats
- - Improve content extraction selectors
- - Add progress indicators
- - Enhance LLM.txt generation
-
- ### Advanced
- - Add PDF export support
- - Implement incremental updates
- - Add authentication support for private docs
- - Create plugin system for custom extractors
-
- ---
-
- ## 🤝 Community Guidelines
-
- ### Be Respectful
- - Treat everyone with respect
- - Welcome newcomers
- - Provide constructive feedback
- - Assume good intentions
-
- ### Communication
- - Use clear, concise language
- - Explain your reasoning
- - Ask questions if unsure
- - Respond to feedback promptly
-
- ### Collaboration
- - Work with maintainers, not against them
- - Be open to suggestions
- - Help other contributors
- - Share knowledge
-
- ---
-
- ## 📜 License
-
- By contributing to DocFetch, you agree that your contributions will be licensed under the MIT License.
-
- ---
-
- ## ❓ Questions?
-
- - **General questions**: Open a discussion on GitHub
- - **Bug reports**: Create an issue
- - **Feature requests**: Create an issue
- - **Quick questions**: Check existing issues/discussions first
-
- ---
-
- ## 🙏 Thank You!
-
- Your contributions make DocFetch better for everyone. Whether it's a typo fix, a new feature, or better documentation - we appreciate your time and effort!
-
- Happy coding! 🚀
package/SECURITY.md DELETED
@@ -1,84 +0,0 @@
- # Security Policy
-
- ## Security Features
-
- DocFetch includes several built-in security protections:
-
- ### ✅ Path Traversal Protection
- - Output files can only be written within the current working directory
- - Relative paths (`../`) are blocked
- - Absolute paths outside the current directory are rejected
-
- ### ✅ SSRF (Server-Side Request Forgery) Protection
- - Only HTTP/HTTPS URLs are allowed
- - Private IP addresses (192.168.x.x, 10.x.x.x, etc.) are blocked
- - Localhost and loopback addresses are blocked
- - Internal network access is prevented
-
- ### ✅ Rate Limiting
- - Maximum 10 requests per second to avoid overwhelming servers
- - Respectful crawling behavior
-
- ### ✅ Input Validation
- - URL validation and sanitization
- - Output path validation
- - Parameter bounds checking (max depth: 10, max workers: 20)
-
- ### ✅ Content Safety
- - HTML content is cleaned of scripts and dangerous elements
- - XSS patterns are filtered out
- - Only safe markdown is generated
-
- ## Safe Usage Guidelines
-
- ### Command Line Usage
- ```bash
- # ✅ SAFE - relative path in current directory
- doc-fetch --url https://example.com --output docs.md
-
- # ✅ SAFE - subdirectory in current directory
- doc-fetch --url https://example.com --output ./docs/site.md
-
- # ❌ BLOCKED - path traversal attempt
- doc-fetch --url https://example.com --output ../../etc/passwd
-
- # ❌ BLOCKED - absolute path outside current directory
- doc-fetch --url https://example.com --output /tmp/malicious.md
- ```
-
- ### URL Restrictions
- ```bash
- # ✅ SAFE - public HTTPS site
- doc-fetch --url https://golang.org/doc/ --output docs.md
-
- # ❌ BLOCKED - private IP address
- doc-fetch --url http://192.168.1.1/admin --output docs.md
-
- # ❌ BLOCKED - localhost
- doc-fetch --url http://localhost:8080/api --output docs.md
-
- # ❌ BLOCKED - non-HTTP protocol
- doc-fetch --url file:///etc/passwd --output docs.md
- ```
-
- ## Reporting Security Issues
-
- If you discover a security vulnerability in DocFetch, please:
-
- 1. **Do not disclose publicly** until it's been addressed
- 2. Contact the maintainer directly at [your email]
- 3. Provide detailed reproduction steps
- 4. Allow reasonable time for patch development
-
- ## Security Updates
-
- Security patches will be released as soon as possible after vulnerability confirmation. Users are encouraged to keep DocFetch updated to the latest version.
-
- ## Dependencies Security
-
- DocFetch uses the following dependencies with known security track records:
- - `github.com/PuerkitoBio/goquery` - HTML parsing
- - `github.com/yuin/goldmark` - Markdown processing
- - Standard Go libraries (`net/http`, `sync`, etc.)
-
- All dependencies are regularly audited and kept up-to-date.
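The path-traversal and SSRF rules listed in the deleted SECURITY.md can be illustrated with a minimal Go sketch. The real checks live in `pkg/fetcher`, which is not part of this diff, so `validateOutputPath`, `validateURL` and `isBlockedIP` are hypothetical names. The working directory is passed in as a parameter instead of being read from `os.Getwd`, which keeps the check deterministic, and the URL check only inspects the scheme, the `localhost` name and IP literals; it does not resolve DNS.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
	"path/filepath"
	"strings"
)

// validateOutputPath (hypothetical) returns an error unless target resolves to
// a location inside baseDir. Relative targets are joined to baseDir; absolute
// targets are checked as-is. Symlinks are not resolved.
func validateOutputPath(baseDir, target string) error {
	abs := target
	if !filepath.IsAbs(abs) {
		abs = filepath.Join(baseDir, abs) // Join also cleans "." and ".." segments
	}
	rel, err := filepath.Rel(filepath.Clean(baseDir), filepath.Clean(abs))
	if err != nil {
		return err
	}
	// A relative path that starts with ".." means target lies outside baseDir.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return fmt.Errorf("output path %q escapes %q", target, baseDir)
	}
	return nil
}

// isBlockedIP reports whether ip is loopback, private (RFC 1918 / RFC 4193),
// link-local or unspecified.
func isBlockedIP(ip net.IP) bool {
	return ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() ||
		ip.IsLinkLocalMulticast() || ip.IsUnspecified()
}

// validateURL (hypothetical) accepts only http/https URLs whose host is not
// "localhost" and, when the host is an IP literal, not a blocked address.
// Hostnames are NOT resolved, so this is a first-pass filter only.
func validateURL(raw string) error {
	u, err := url.Parse(raw) // Parse lowercases the scheme
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("scheme %q is not allowed", u.Scheme)
	}
	host := strings.ToLower(u.Hostname())
	if host == "" {
		return errors.New("URL has no host")
	}
	if host == "localhost" || strings.HasSuffix(host, ".localhost") {
		return fmt.Errorf("host %q is blocked", host)
	}
	if ip := net.ParseIP(host); ip != nil && isBlockedIP(ip) {
		return fmt.Errorf("address %s is blocked", ip)
	}
	return nil
}

func main() {
	base := "/home/user/project" // assumed working directory, for illustration
	for _, p := range []string{"docs.md", "./docs/site.md", "../../etc/passwd", "/tmp/malicious.md"} {
		fmt.Printf("output %-20s -> %v\n", p, validateOutputPath(base, p))
	}
	for _, raw := range []string{
		"https://golang.org/doc/",
		"http://192.168.1.1/admin",
		"http://localhost:8080/api",
		"file:///etc/passwd",
	} {
		fmt.Printf("url %-28s -> %v\n", raw, validateURL(raw))
	}
}
```

Because only IP literals are inspected, a hostname that resolves to a private address would pass this filter. A stricter variant would resolve the host and test every returned address, or enforce the rule at connection time through `net.Dialer.Control` so that redirects and DNS rebinding are covered as well; symlinks inside the working directory would additionally need `filepath.EvalSymlinks`.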
@@ -1,55 +0,0 @@
- package main
-
- import (
-     "flag"
-     "log"
-     "strings"
-
-     "github.com/AlphaTechini/doc-fetch/pkg/fetcher"
- )
-
- func main() {
-     url := flag.String("url", "", "Base URL to fetch documentation from")
-     output := flag.String("output", "docs.md", "Output file path")
-     depth := flag.Int("depth", 2, "Maximum crawl depth")
-     concurrent := flag.Int("concurrent", 3, "Concurrent fetchers")
-     userAgent := flag.String("user-agent", "DocFetch/1.0", "Custom user agent")
-     llmTxt := flag.Bool("llm-txt", false, "Generate llm.txt index file")
-
-     flag.Parse()
-
-     if *url == "" {
-         log.Fatal("Error: URL is required\nUsage: doc-fetch --url <base-url> --output <file-path>")
-     }
-
-     // Validate configuration for security
-     config := fetcher.Config{
-         BaseURL:        *url,
-         OutputPath:     *output,
-         MaxDepth:       *depth,
-         Workers:        *concurrent,
-         UserAgent:      *userAgent,
-         GenerateLLMTxt: *llmTxt,
-     }
-
-     if err := fetcher.ValidateConfig(&config); err != nil {
-         log.Fatalf("Configuration error: %v", err)
-     }
-
-     // Use optimized high-performance fetcher
-     err := fetcher.RunOptimized(config)
-     if err != nil {
-         log.Fatalf("Failed to fetch documentation: %v", err)
-     }
-
-     log.Printf("Documentation successfully saved to %s", *output)
-     if *llmTxt {
-         llmTxtPath := *output
-         if strings.HasSuffix(*output, ".md") {
-             llmTxtPath = strings.TrimSuffix(*output, ".md") + ".llm.txt"
-         } else {
-             llmTxtPath = *output + ".llm.txt"
-         }
-         log.Printf("LLM.txt index generated: %s", llmTxtPath)
-     }
- }
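`main` hands range checks to `fetcher.ValidateConfig` and then derives the llm.txt path itself. A minimal sketch of both pieces follows. `checkBounds` and `llmTxtPath` are hypothetical helpers; the upper limits (depth 10, workers 20) come from the deleted SECURITY.md, while the lower limits are assumptions. The path helper keeps the same rule as `main` but drops the initial `llmTxtPath := *output` assignment, which both branches overwrite anyway. `strings.CutSuffix` needs Go 1.20, which the Go 1.21 prerequisite in the deleted CONTRIBUTING.md already covers.

```go
package main

import (
	"fmt"
	"strings"
)

// Upper limits as listed in SECURITY.md; the lower limits are assumptions.
const (
	maxDepth   = 10
	maxWorkers = 20
)

// checkBounds is a hypothetical stand-in for the range checks that
// fetcher.ValidateConfig is documented to perform; its real body is not in this diff.
func checkBounds(depth, workers int) error {
	if depth < 0 || depth > maxDepth {
		return fmt.Errorf("depth %d is outside [0, %d]", depth, maxDepth)
	}
	if workers < 1 || workers > maxWorkers {
		return fmt.Errorf("workers %d is outside [1, %d]", workers, maxWorkers)
	}
	return nil
}

// llmTxtPath applies main's naming rule: a trailing ".md" becomes ".llm.txt",
// and any other output path gets ".llm.txt" appended.
func llmTxtPath(output string) string {
	if stem, ok := strings.CutSuffix(output, ".md"); ok {
		return stem + ".llm.txt"
	}
	return output + ".llm.txt"
}

func main() {
	fmt.Println(checkBounds(2, 3))               // main.go defaults: --depth 2, --concurrent 3
	fmt.Println(checkBounds(11, 3))              // depth above the documented maximum
	fmt.Println(llmTxtPath("docs.md"))           // docs.llm.txt
	fmt.Println(llmTxtPath("out/site.markdown")) // out/site.markdown.llm.txt
}
```

Keeping the naming rule in one helper lets the fetcher (which writes the file) and `main` (which logs its path) agree on the name without duplicating the suffix logic.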
Binary file
Binary file
Binary file
@@ -1,6 +0,0 @@
- """
- DocFetch - Dynamic documentation fetching CLI for AI/LLM consumption.
-
- This package provides a Python wrapper around the Go-based DocFetch binary,
- enabling easy installation and usage via pip.
- """
@@ -1,7 +0,0 @@
- """
- Module entry point for doc-fetch.
- """
- from .cli import main
-
- if __name__ == "__main__":
-     main()
package/doc_fetch/cli.py DELETED
@@ -1,113 +0,0 @@
- #!/usr/bin/env python3
- """
- DocFetch CLI wrapper for Python.
-
- This module provides a Python interface to the Go-based DocFetch binary.
- It handles downloading the appropriate binary for your platform and
- executing it with the provided arguments.
- """
-
- import os
- import sys
- import subprocess
- import platform
- from pathlib import Path
-
- # Get the directory where this script is located
- SCRIPT_DIR = Path(__file__).parent
- BIN_DIR = SCRIPT_DIR / "bin"
- BINARY_NAME = None
-
-
- def get_binary_name():
-     """Get the appropriate binary name for the current platform."""
-     system = platform.system().lower()
-     machine = platform.machine().lower()
-
-     # Map machine architectures
-     arch_map = {
-         'x86_64': 'amd64',
-         'amd64': 'amd64',
-         'arm64': 'arm64',
-         'aarch64': 'arm64'
-     }
-
-     arch = arch_map.get(machine, 'amd64')
-
-     if system == 'windows':
-         return f'doc-fetch_windows_{arch}.exe'
-     elif system == 'darwin':
-         return f'doc-fetch_darwin_{arch}'
-     else:  # linux and others
-         return f'doc-fetch_linux_{arch}'
-
-
- def download_binary():
-     """Download the appropriate binary from GitHub releases."""
-     import urllib.request
-     import ssl
-
-     binary_name = get_binary_name()
-     binary_path = BIN_DIR / binary_name
-
-     # Create bin directory if it doesn't exist
-     BIN_DIR.mkdir(exist_ok=True)
-
-     # URL for the binary
-     url = f"https://github.com/AlphaTechini/doc-fetch/releases/download/v1.0.0/{binary_name}"
-
-     print(f"📥 Downloading doc-fetch binary for {platform.system()} {platform.machine()}...")
-     print(f" URL: {url}")
-
-     try:
-         # Create SSL context to handle certificates
-         ssl_context = ssl.create_default_context()
-
-         # Download the binary
-         with urllib.request.urlopen(url, context=ssl_context) as response:
-             with open(binary_path, 'wb') as f:
-                 f.write(response.read())
-
-         # Make executable on Unix-like systems
-         if platform.system() != 'Windows':
-             os.chmod(binary_path, 0o755)
-
-         print("✅ Binary downloaded successfully!")
-         return binary_path
-
-     except Exception as e:
-         print(f"❌ Failed to download binary: {e}")
-         print("💡 Please ensure you have internet access and can reach GitHub.")
-         sys.exit(1)
-
-
- def main():
-     """Main entry point for the doc-fetch CLI."""
-     global BINARY_NAME
-
-     # Get binary path
-     binary_name = get_binary_name()
-     binary_path = BIN_DIR / binary_name
-
-     # Download binary if it doesn't exist
-     if not binary_path.exists():
-         binary_path = download_binary()
-
-     # Execute the binary with all arguments
-     try:
-         result = subprocess.run([str(binary_path)] + sys.argv[1:], check=False)
-         sys.exit(result.returncode)
-     except FileNotFoundError:
-         print("❌ doc-fetch binary not found!")
-         print("💡 This shouldn't happen. Please reinstall the package.")
-         sys.exit(1)
-     except KeyboardInterrupt:
-         print("\n⚠️ Interrupted by user")
-         sys.exit(130)
-     except Exception as e:
-         print(f"❌ Failed to execute doc-fetch: {e}")
-         sys.exit(1)
-
-
- if __name__ == "__main__":
-     main()
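The deleted wrapper derives a release asset name of the form `doc-fetch_<os>_<arch>` (plus `.exe` on Windows) and downloads it from a URL pinned to `v1.0.0`, even though this package is at 1.1.x. Go's `runtime.GOOS` and `runtime.GOARCH` already use the same `windows`/`darwin`/`linux` and `amd64`/`arm64` tokens, so a release or install helper could keep the naming in one place. The sketch below is hypothetical: `assetName` and `releaseURL` are illustrative names, and this diff does not confirm that a `v1.1.2` tag with these assets exists. Unlike `cli.py`, it returns an error for an unknown platform instead of silently falling back to `linux` or `amd64`.

```go
package main

import (
	"fmt"
	"runtime"
)

// releaseBase is the download prefix used by the deleted cli.py.
const releaseBase = "https://github.com/AlphaTechini/doc-fetch/releases/download"

// assetName (hypothetical) reproduces cli.py's naming scheme
// doc-fetch_<os>_<arch>[.exe], but rejects platforms that cli.py only
// guessed at (it maps unknown CPUs to amd64 and unknown systems to linux).
func assetName(goos, goarch string) (string, error) {
	if goarch != "amd64" && goarch != "arm64" {
		return "", fmt.Errorf("unsupported architecture %q", goarch)
	}
	switch goos {
	case "windows":
		return "doc-fetch_windows_" + goarch + ".exe", nil
	case "darwin", "linux":
		return "doc-fetch_" + goos + "_" + goarch, nil
	default:
		return "", fmt.Errorf("unsupported operating system %q", goos)
	}
}

// releaseURL (hypothetical) takes the version as a parameter instead of
// hard-coding v1.0.0 as cli.py does.
func releaseURL(version, asset string) string {
	return releaseBase + "/v" + version + "/" + asset
}

func main() {
	name, err := assetName(runtime.GOOS, runtime.GOARCH)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(releaseURL("1.1.2", name)) // version shown for illustration only
}
```

Failing loudly on an unsupported machine gives the user a clear message up front, rather than a downloaded `amd64` or `linux` binary that cannot execute there.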