llmsbrieftxt 1.3.1.tar.gz → 1.11.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release. This version of llmsbrieftxt might be problematic.

Files changed (49)
  1. llmsbrieftxt-1.11.0/.github/copilot-instructions.md +115 -0
  2. llmsbrieftxt-1.11.0/.github/workflows/claude-cli-qa.yml +296 -0
  3. llmsbrieftxt-1.11.0/.github/workflows/claude-doc-review.yml +296 -0
  4. llmsbrieftxt-1.11.0/.github/workflows/claude.yml +54 -0
  5. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/CLAUDE.md +69 -13
  6. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/PKG-INFO +119 -13
  7. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/README.md +118 -12
  8. llmsbrieftxt-1.11.0/docs/USER_JOURNEYS.md +700 -0
  9. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/llmsbrieftxt/cli.py +126 -15
  10. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/llmsbrieftxt/constants.py +26 -0
  11. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/llmsbrieftxt/crawler.py +8 -0
  12. llmsbrieftxt-1.11.0/llmsbrieftxt/main.py +424 -0
  13. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/llmsbrieftxt/summarizer.py +42 -18
  14. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/pyproject.toml +1 -1
  15. llmsbrieftxt-1.11.0/tests/unit/test_cli.py +418 -0
  16. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/tests/unit/test_robustness.py +16 -17
  17. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/tests/unit/test_summarizer.py +40 -28
  18. llmsbrieftxt-1.3.1/.github/PULL_REQUEST_TEMPLATE.md +0 -44
  19. llmsbrieftxt-1.3.1/.github/pull_request_template.md +0 -63
  20. llmsbrieftxt-1.3.1/llmsbrieftxt/main.py +0 -142
  21. llmsbrieftxt-1.3.1/tests/unit/test_cli.py +0 -156
  22. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/.github/ISSUE_TEMPLATE/bug_report.yml +0 -0
  23. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/.github/ISSUE_TEMPLATE/config.yml +0 -0
  24. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/.github/ISSUE_TEMPLATE/feature_request.yml +0 -0
  25. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/.github/ISSUE_TEMPLATE/question.yml +0 -0
  26. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/.github/workflows/ci.yml +0 -0
  27. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/.github/workflows/pr-title-check.yml +0 -0
  28. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/.github/workflows/release.yml +0 -0
  29. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/.gitignore +0 -0
  30. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/CONTRIBUTING.md +0 -0
  31. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/LICENSE +0 -0
  32. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/PRODUCTION_CLEANUP_PLAN.md +0 -0
  33. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/llmsbrieftxt/__init__.py +0 -0
  34. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/llmsbrieftxt/doc_loader.py +0 -0
  35. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/llmsbrieftxt/extractor.py +0 -0
  36. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/llmsbrieftxt/schema.py +0 -0
  37. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/llmsbrieftxt/url_filters.py +0 -0
  38. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/llmsbrieftxt/url_utils.py +0 -0
  39. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/pytest.ini +0 -0
  40. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/scripts/bump_version.py +0 -0
  41. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/tests/__init__.py +0 -0
  42. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/tests/conftest.py +0 -0
  43. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/tests/fixtures/__init__.py +0 -0
  44. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/tests/integration/__init__.py +0 -0
  45. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/tests/integration/test_doc_loader_integration.py +0 -0
  46. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/tests/unit/__init__.py +0 -0
  47. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/tests/unit/test_doc_loader.py +0 -0
  48. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/tests/unit/test_extractor.py +0 -0
  49. {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.11.0}/uv.lock +0 -0
llmsbrieftxt-1.11.0/.github/copilot-instructions.md
@@ -0,0 +1,115 @@
+ # GitHub Copilot Instructions for llmsbrieftxt
+
+ ## Project Overview
+
+ This is `llmsbrieftxt`, a Python package that generates llms-brief.txt files by crawling documentation websites and using OpenAI to create structured descriptions. The CLI command is `llmtxt` (not `llmsbrieftxt`).
+
+ ## Architecture and Code Patterns
+
+ ### Async-First Design
+ All main functions use async/await patterns. Use `asyncio.gather()` for concurrent operations and semaphore control for rate limiting. The processing pipeline flows: URL Discovery → Content Extraction → LLM Summarization → File Generation.
+
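The concurrency pattern described above can be shown with a minimal sketch. This is not the package's actual code; the function names and the limit of 10 are assumptions taken from the defaults listed later in these instructions:

```python
import asyncio

# Minimal sketch of gather + semaphore concurrency; names are hypothetical.
MAX_CONCURRENT_REQUESTS = 10

async def summarize_page(semaphore: asyncio.Semaphore, url: str) -> str:
    async with semaphore:  # cap in-flight LLM calls to avoid rate limits
        await asyncio.sleep(0.1)  # stand-in for the real LLM request
        return f"summary of {url}"

async def summarize_all(urls: list[str]) -> list[str]:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
    return await asyncio.gather(*(summarize_page(semaphore, u) for u in urls))

print(asyncio.run(summarize_all(["https://example.com/a", "https://example.com/b"])))
```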
+ ### Module Organization
+ - **cli.py**: Simple CLI with positional URL argument (no subcommands)
+ - **main.py**: Orchestrates the async generation pipeline
+ - **crawler.py**: RobustDocCrawler for breadth-first URL discovery
+ - **doc_loader.py**: DocLoader wraps crawler with document loading
+ - **extractor.py**: HTML to markdown via trafilatura
+ - **summarizer.py**: OpenAI integration with retry logic (tenacity)
+ - **url_utils.py**: URLNormalizer for deduplication
+ - **url_filters.py**: Filter non-documentation URLs
+ - **schema.py**: Pydantic models (PageSummary)
+ - **constants.py**: Configuration constants
+
+ ### Type Safety
+ Use Pydantic models for all structured data. The OpenAI integration uses structured output with the PageSummary model.
+
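As a hedged illustration of that pattern, a PageSummary model might look like the following; the actual fields of `llmsbrieftxt/schema.py` are not visible in this diff, so the field set here is an assumption:

```python
from pydantic import BaseModel

# Hypothetical field set; the real PageSummary may differ. This only
# illustrates the structured-data pattern described above (Pydantic v2 API).
class PageSummary(BaseModel):
    url: str
    title: str
    description: str

page = PageSummary(
    url="https://example.com/docs/intro",
    title="Introduction",
    description="High-level overview of the library.",
)
print(page.model_dump_json())  # serializes to JSON, e.g. for caching
```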
+ ### Error Handling
+ Failed URL loads should be logged but not stop processing. LLM failures use exponential backoff retries via tenacity. Never let one failure break the entire pipeline.
+
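A minimal sketch of the retry pattern, assuming tenacity's standard decorator API (the exact attempt counts and wait bounds used by summarizer.py are not shown in this diff):

```python
import random
from tenacity import retry, stop_after_attempt, wait_exponential

# Exponential backoff with a capped wait; the parameters are illustrative.
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=30))
def call_openai(prompt: str) -> str:
    if random.random() < 0.5:  # stand-in for a transient API error
        raise RuntimeError("transient API error")
    return f"summary for: {prompt}"

# Retries up to 3 times, then re-raises wrapped in tenacity.RetryError.
print(call_openai("Describe this page"))
```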
+ ## Development Practices
+
+ ### Testing Requirements
+ Write tests before implementing features. Use pytest with these markers:
+ - `@pytest.mark.unit` for fast, isolated tests
+ - `@pytest.mark.requires_openai` for tests needing OPENAI_API_KEY
+ - `@pytest.mark.slow` for tests making external API calls
+
+ Tests go in:
+ - `tests/unit/` for fast tests with no external dependencies
+ - `tests/integration/` for tests requiring OPENAI_API_KEY
+
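A sketch of how those markers might be applied in practice; the test names are hypothetical, and marker registration is assumed to live in pytest.ini:

```python
import os
import pytest

@pytest.mark.unit
def test_normalize_strips_trailing_slash():
    # Fast, isolated test with no external dependencies.
    assert "https://example.com/page/".rstrip("/") == "https://example.com/page"

@pytest.mark.requires_openai
@pytest.mark.slow
def test_summarize_live_page():
    # Exercises the real API; skipped when no key is configured.
    if not os.environ.get("OPENAI_API_KEY"):
        pytest.skip("OPENAI_API_KEY not set")
```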
+ ### Code Quality Tools
+ Before committing, always run:
+ 1. Format: `uv run ruff format llmsbrieftxt/ tests/`
+ 2. Lint: `uv run ruff check llmsbrieftxt/ tests/`
+ 3. Type check: `uv run pyright llmsbrieftxt/`
+ 4. Tests: `uv run pytest tests/unit/`
+
+ ### Package Management
+ Use `uv` for all package operations:
+ - Install: `uv sync --group dev`
+ - Add dependency: `uv add package-name`
+ - Build: `uv build`
+
+ ## Design Philosophy
+
+ ### Unix Philosophy
+ This project follows "do one thing and do it well":
+ - Generate llms-brief.txt files only (no built-in search/list features)
+ - Compose with standard Unix tools (rg, grep, ls)
+ - Simple CLI: URL is a positional argument, no subcommands
+ - Plain text output for scriptability
+
+ ### Simplicity Over Features
+ Avoid adding functionality that duplicates mature Unix tools. Every line of code must serve the core mission of generating llms-brief.txt files.
+
+ ## Configuration Defaults
+
+ - **Crawl Depth**: 3 levels (hardcoded in crawler.py)
+ - **Output**: `~/.claude/docs/<domain>.txt` (override with `--output`)
+ - **Cache**: `.llmsbrieftxt_cache/` for intermediate results
+ - **OpenAI Model**: `gpt-5-mini` (override with `--model`)
+ - **Concurrency**: 10 concurrent LLM requests (prevents rate limiting)
+
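These defaults suggest a constants module along the following lines; every name in this sketch is an assumption for illustration, not the verified contents of constants.py:

```python
# Hypothetical mirror of the defaults listed above.
DEFAULT_MODEL = "gpt-5-mini"       # override with --model
MAX_CRAWL_DEPTH = 3                # hardcoded in crawler.py
MAX_CONCURRENT_LLM_REQUESTS = 10   # prevents rate limiting
CACHE_DIR = ".llmsbrieftxt_cache"  # intermediate results
```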
+ ## Commit Convention
+
+ Use conventional commits for automated versioning:
+ - `fix:` → patch bump (1.0.0 → 1.0.1)
+ - `feat:` → minor bump (1.0.0 → 1.1.0)
+ - `BREAKING CHANGE` or `feat!:`/`fix!:` → major bump (1.0.0 → 2.0.0)
+
+ Examples:
+ ```bash
+ git commit -m "fix: handle empty sitemap gracefully"
+ git commit -m "feat: add --depth option for custom crawl depth"
+ git commit -m "feat!: change default output location"
+ ```
+
+ ## Non-Obvious Behaviors
+
+ 1. URL Discovery discovers ALL pages up to depth 3, not just direct links
+ 2. URLs like `/page`, `/page/`, and `/page#section` are deduplicated as the same URL
+ 3. Summaries are automatically cached in `.llmsbrieftxt_cache/summaries.json`
+ 4. Content extraction uses trafilatura to preserve HTML structure in markdown
+ 5. File I/O is synchronous (uses standard `Path.write_text()` for simplicity)
+
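Behavior 2 (URL deduplication) can be approximated with the standard library; the package's URLNormalizer is not shown in this diff, so this sketch captures only the stated rule and the real implementation may do more:

```python
from urllib.parse import urlsplit, urlunsplit

# Drop fragments and trailing slashes so the three spellings below
# collapse to a single canonical URL.
def normalize(url: str) -> str:
    scheme, netloc, path, query, _fragment = urlsplit(url)
    return urlunsplit((scheme, netloc, path.rstrip("/") or "/", query, ""))

assert (
    normalize("https://example.com/page")
    == normalize("https://example.com/page/")
    == normalize("https://example.com/page#section")
)
```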
+ ## Known Limitations
+
+ 1. Only supports OpenAI API (no other LLM providers)
+ 2. Crawl depth is hardcoded to 3 in crawler.py
+ 3. No CLI flag to force resume from cache (though cache exists)
+ 4. No progress persistence if interrupted
+ 5. Prompts and parsing assume English documentation
+
+ ## Code Review Checklist
+
+ When reviewing code changes:
+ - Ensure async patterns are used correctly (no blocking I/O in async functions)
+ - Verify all functions have type hints
+ - Check that tests are included for new functionality
+ - Confirm error handling doesn't break the pipeline
+ - Validate that conventional commit format is used
+ - Ensure code follows Unix philosophy (simplicity, composability)
+ - Check that ruff and pyright pass without errors
+ - **IMPORTANT**: Always include specific file names and line numbers when providing review feedback (e.g., "main.py:165" or "line 182 in cli.py")
llmsbrieftxt-1.11.0/.github/workflows/claude-cli-qa.yml
@@ -0,0 +1,296 @@
+ name: Claude CLI QA Tests
+
+ on:
+   pull_request:
+     types: [opened, synchronize, reopened, ready_for_review]
+     branches: [main]
+     paths:
+       - 'llmsbrieftxt/**'
+       - 'tests/**'
+       - 'pyproject.toml'
+       - 'uv.lock'
+       - '.github/workflows/claude-cli-qa.yml'
+   workflow_dispatch:
+     inputs:
+       branch:
+         description: 'Branch to test (leave empty for current branch)'
+         required: false
+         type: string
+       test_scope:
+         description: 'Test scope (quick, standard, thorough)'
+         required: false
+         default: 'standard'
+         type: choice
+         options:
+           - quick
+           - standard
+           - thorough
+       single_test_case:
+         description: 'Run single test case (e.g., TS-001). Leave empty to run full test scope.'
+         required: false
+         type: choice
+         options:
+           - ''
+           - TS-001
+           - TS-002
+           - TS-003
+           - TS-004A
+           - TS-004B
+           - TS-005
+           - TS-006
+           - TS-007
+           - TS-008
+           - TS-009
+           - TS-010
+           - TS-011
+           - TS-012
+
+ jobs:
+   cli-qa-tests:
+     name: Agent-Based CLI QA
+     runs-on: ubuntu-latest
+     timeout-minutes: 20
+     permissions:
+       contents: read
+       pull-requests: write
+       issues: read
+       id-token: write
+     env:
+       OLLAMA_MODEL: gemma3:270m
+
+     steps:
+       - name: Checkout repository
+         uses: actions/checkout@v4
+         with:
+           ref: ${{ github.event.inputs.branch || github.head_ref || github.ref }}
+           fetch-depth: 1
+
+       - name: Setup Python
+         uses: actions/setup-python@v5
+         with:
+           python-version: '3.11'
+
+       - name: Install uv
+         run: |
+           curl -LsSf https://astral.sh/uv/install.sh | sh
+           echo "$HOME/.cargo/bin" >> $GITHUB_PATH
+
+       - name: Install package with dependencies
+         run: |
+           uv sync --all-groups
+
+       - name: Install Ollama
+         run: |
+           echo "Installing Ollama for local LLM testing..."
+           curl -fsSL https://ollama.com/install.sh | sh
+
+           # Verify installation
+           ollama --version
+
+       - name: Start Ollama service
+         run: |
+           echo "Starting Ollama service..."
+           ollama serve > ollama.log 2>&1 &
+           echo $! > ollama.pid
+
+           # Wait for Ollama to be ready
+           echo "Waiting for Ollama to be ready..."
+           timeout 60 bash -c 'until curl -s http://localhost:11434/api/tags > /dev/null; do sleep 2; done'
+
+           echo "✓ Ollama service is running"
+
+       - name: Pull Ollama model
+         run: |
+           echo "Pulling ${{ env.OLLAMA_MODEL }} model for testing..."
+           ollama pull ${{ env.OLLAMA_MODEL }}
+
+           echo "✓ Model pulled successfully"
+
+       - name: Verify Ollama setup
+         run: |
+           echo "Verifying Ollama models..."
+           ollama list
+
+           echo "✓ Model ready"
+
+       - name: Verify CLI installation
+         run: |
+           # Ensure CLI is accessible
+           uv run llmtxt --help
+
+           echo "✓ CLI installed and accessible"
+
+       - name: Determine test scope
+         id: test-scope
+         run: |
+           SCOPE="${{ github.event.inputs.test_scope || 'standard' }}"
+           echo "scope=$SCOPE" >> $GITHUB_OUTPUT
+
+           case "$SCOPE" in
+             quick)
+               echo "max_urls=2" >> $GITHUB_OUTPUT
+               echo "depth=1" >> $GITHUB_OUTPUT
+               ;;
+             thorough)
+               echo "max_urls=5" >> $GITHUB_OUTPUT
+               echo "depth=1" >> $GITHUB_OUTPUT
+               ;;
+             *)
+               echo "max_urls=3" >> $GITHUB_OUTPUT
+               echo "depth=1" >> $GITHUB_OUTPUT
+               ;;
+           esac
+
+       - name: Determine test argument
+         id: test-arg
+         run: |
+           # If single test case is specified, use that; otherwise use test scope
+           if [ -n "${{ github.event.inputs.single_test_case }}" ]; then
+             echo "arg=${{ github.event.inputs.single_test_case }}" >> $GITHUB_OUTPUT
+           else
+             # Default to 'standard' for PR events, or use the specified scope
+             SCOPE="${{ github.event.inputs.test_scope || 'standard' }}"
+             echo "arg=$SCOPE" >> $GITHUB_OUTPUT
+           fi
+
+       - name: Run Claude CLI QA Agent
+         id: claude-cli-qa
+         uses: anthropics/claude-code-action@v1
+         with:
+           claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
+
+           # Allow Claude bot to trigger this workflow
+           allowed_bots: '*'
+
+           # Use Haiku model for faster/cheaper testing
+           claude_args: |
+             --model claude-haiku-4-5-20251001
+             --allowedTools "Bash,Read,Write,Glob,Grep,TodoWrite,Task,SlashCommand"
+
+           # Enable progress tracking for PR events only
+           track_progress: ${{ github.event_name == 'pull_request' }}
+
+           # Show full output in workflow logs for debugging
+           show_full_output: true
+
+           prompt: |
+             Run the QA test suite using the slash command: /qa-test ${{ steps.test-arg.outputs.arg }}
+
+       - name: Validate test execution
+         id: validate-tests
+         if: always()
+         run: |
+           echo "=== Validating CLI QA Test Results ==="
+
+           # Check if CLI is accessible via uv run
+           if ! uv run llmtxt --help &> /dev/null; then
+             echo "❌ ERROR: llmtxt CLI not accessible via 'uv run'"
+             exit 1
+           fi
+
+           echo "✅ CLI validation passed (accessible via 'uv run llmtxt')"
+
+       - name: Evaluate test results
+         id: evaluate-results
+         if: always()
+         run: |
+           echo "=== Evaluating Test Results ==="
+
+           # Check Claude agent execution status
+           if [ "${{ steps.claude-cli-qa.outcome }}" = "failure" ]; then
+             echo "❌ Claude QA agent failed to complete"
+             echo "status=failure" >> $GITHUB_OUTPUT
+             exit 1
+           fi
+
+           # Parse output for test failures (these patterns match the prompt output)
+           CLAUDE_OUTPUT="${{ steps.claude-cli-qa.outputs.text }}"
+
+           # Check for critical failures in output
+           if echo "$CLAUDE_OUTPUT" | grep -q "NO-GO\|❌ FAIL\|CRITICAL"; then
+             echo "❌ Test failures detected in QA results"
+             echo "status=failure" >> $GITHUB_OUTPUT
+             exit 1
+           fi
+
+           # Check for test summary with failures
+           if echo "$CLAUDE_OUTPUT" | grep -q "FAIL.*\|Failed:.*[1-9]"; then
+             echo "⚠️ Some tests failed but not critical - checking severity"
+             if echo "$CLAUDE_OUTPUT" | grep -q "Exit Code Bug\|returns exit code 0\|CRITICAL"; then
+               echo "❌ Critical test failures detected"
+               echo "status=failure" >> $GITHUB_OUTPUT
+               exit 1
+             fi
+           fi
+
+           echo "✅ All critical tests passed"
+           echo "status=success" >> $GITHUB_OUTPUT
+
+       - name: Generate test summary
+         if: always()
+         env:
+           CLAUDE_STATUS: ${{ steps.claude-cli-qa.outcome }}
+           EVALUATION_STATUS: ${{ steps.evaluate-results.outputs.status }}
+           GITHUB_SERVER_URL: ${{ github.server_url }}
+           GITHUB_REPOSITORY: ${{ github.repository }}
+           GITHUB_RUN_ID: ${{ github.run_id }}
+         run: |
+           cat <<'EOF' >> $GITHUB_STEP_SUMMARY
+           ## llmtxt CLI QA Test Summary
+
+           **Test Mode**: ${{ github.event.inputs.single_test_case != '' && format('🔍 Single Test (Troubleshooting) - {0}', github.event.inputs.single_test_case) || format('📦 Full Test Suite - {0}', steps.test-arg.outputs.arg) }}
+           **Status**: ${{ steps.evaluate-results.outputs.status == 'success' && '✅ All Tests Passed' || '❌ Tests Failed' }}
+
+           ### Environment
+           - Python: 3.11
+           - Package Manager: uv
+           - LLM Provider: Ollama (${{ env.OLLAMA_MODEL }})
+           - Max URLs: ${{ steps.test-scope.outputs.max_urls }}
+           - Crawl Depth: ${{ steps.test-scope.outputs.depth }}
+
+           ### Test Categories
+           - Basic Functionality Tests
+           - CLI Flags and Options Tests
+           - Cache Behavior Tests
+           - Error Handling Tests
+           - Output Format Validation
+
+           ### Results
+           ${{ github.event.inputs.single_test_case != '' && format('The CLI QA agent executed test case **{0}** for troubleshooting.', github.event.inputs.single_test_case) || 'The CLI QA agent executed comprehensive tests using the `cli-qa-tester` agent.' }}
+           Detailed test report and findings are available in the workflow logs.
+
+           **View full logs**: [${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})
+
+           ---
+
+           🤖 **Agent-Based Testing**: This workflow uses Claude Code agents to perform intelligent,
+           adaptive CLI testing without traditional E2E test frameworks.
+           EOF
+
+       - name: Upload test artifacts
+         if: always()
+         uses: actions/upload-artifact@v4
+         with:
+           name: cli-qa-artifacts
+           path: |
+             ~/.claude/docs/
+             .llmsbrieftxt_cache/
+             ollama.log
+             **/test-*.txt
+             **/test-*.log
+           if-no-files-found: warn
+           retention-days: 7
+
+       - name: Fail workflow if tests failed
+         if: always() && steps.evaluate-results.outputs.status == 'failure'
+         run: |
+           echo "❌ CLI QA tests have critical failures - blocking PR"
+           exit 1
+
+       - name: Cleanup
+         if: always()
+         run: |
+           if [ -f ollama.pid ]; then
+             kill $(cat ollama.pid) || true
+           fi
+           echo "✓ Cleanup complete"