llmsbrieftxt 1.3.1__tar.gz → 1.4.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of llmsbrieftxt might be problematic.
- llmsbrieftxt-1.4.0/.github/copilot-instructions.md +115 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/CLAUDE.md +27 -11
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/PKG-INFO +44 -12
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/README.md +43 -11
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/llmsbrieftxt/cli.py +112 -15
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/llmsbrieftxt/constants.py +26 -0
- llmsbrieftxt-1.4.0/llmsbrieftxt/main.py +227 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/pyproject.toml +1 -1
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/tests/unit/test_cli.py +128 -1
- llmsbrieftxt-1.3.1/llmsbrieftxt/main.py +0 -142
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/.github/ISSUE_TEMPLATE/bug_report.yml +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/.github/ISSUE_TEMPLATE/config.yml +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/.github/ISSUE_TEMPLATE/feature_request.yml +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/.github/ISSUE_TEMPLATE/question.yml +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/.github/PULL_REQUEST_TEMPLATE.md +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/.github/pull_request_template.md +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/.github/workflows/ci.yml +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/.github/workflows/pr-title-check.yml +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/.github/workflows/release.yml +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/.gitignore +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/CONTRIBUTING.md +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/LICENSE +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/PRODUCTION_CLEANUP_PLAN.md +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/llmsbrieftxt/__init__.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/llmsbrieftxt/crawler.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/llmsbrieftxt/doc_loader.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/llmsbrieftxt/extractor.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/llmsbrieftxt/schema.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/llmsbrieftxt/summarizer.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/llmsbrieftxt/url_filters.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/llmsbrieftxt/url_utils.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/pytest.ini +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/scripts/bump_version.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/tests/__init__.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/tests/conftest.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/tests/fixtures/__init__.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/tests/integration/__init__.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/tests/integration/test_doc_loader_integration.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/tests/unit/__init__.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/tests/unit/test_doc_loader.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/tests/unit/test_extractor.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/tests/unit/test_robustness.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/tests/unit/test_summarizer.py +0 -0
- {llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/uv.lock +0 -0
llmsbrieftxt-1.4.0/.github/copilot-instructions.md (new file)

@@ -0,0 +1,115 @@
+# GitHub Copilot Instructions for llmsbrieftxt
+
+## Project Overview
+
+This is `llmsbrieftxt`, a Python package that generates llms-brief.txt files by crawling documentation websites and using OpenAI to create structured descriptions. The CLI command is `llmtxt` (not `llmsbrieftxt`).
+
+## Architecture and Code Patterns
+
+### Async-First Design
+All main functions use async/await patterns. Use `asyncio.gather()` for concurrent operations and semaphore control for rate limiting. The processing pipeline flows: URL Discovery → Content Extraction → LLM Summarization → File Generation.
+
+### Module Organization
+- **cli.py**: Simple CLI with positional URL argument (no subcommands)
+- **main.py**: Orchestrates the async generation pipeline
+- **crawler.py**: RobustDocCrawler for breadth-first URL discovery
+- **doc_loader.py**: DocLoader wraps crawler with document loading
+- **extractor.py**: HTML to markdown via trafilatura
+- **summarizer.py**: OpenAI integration with retry logic (tenacity)
+- **url_utils.py**: URLNormalizer for deduplication
+- **url_filters.py**: Filter non-documentation URLs
+- **schema.py**: Pydantic models (PageSummary)
+- **constants.py**: Configuration constants
+
+### Type Safety
+Use Pydantic models for all structured data. The OpenAI integration uses structured output with the PageSummary model.
+
+### Error Handling
+Failed URL loads should be logged but not stop processing. LLM failures use exponential backoff retries via tenacity. Never let one failure break the entire pipeline.
+
+## Development Practices
+
+### Testing Requirements
+Write tests before implementing features. Use pytest with these markers:
+- `@pytest.mark.unit` for fast, isolated tests
+- `@pytest.mark.requires_openai` for tests needing OPENAI_API_KEY
+- `@pytest.mark.slow` for tests making external API calls
+
+Tests go in:
+- `tests/unit/` for fast tests with no external dependencies
+- `tests/integration/` for tests requiring OPENAI_API_KEY
+
+### Code Quality Tools
+Before committing, always run:
+1. Format: `uv run ruff format llmsbrieftxt/ tests/`
+2. Lint: `uv run ruff check llmsbrieftxt/ tests/`
+3. Type check: `uv run pyright llmsbrieftxt/`
+4. Tests: `uv run pytest tests/unit/`
+
+### Package Management
+Use `uv` for all package operations:
+- Install: `uv sync --group dev`
+- Add dependency: `uv add package-name`
+- Build: `uv build`
+
+## Design Philosophy
+
+### Unix Philosophy
+This project follows "do one thing and do it well":
+- Generate llms-brief.txt files only (no built-in search/list features)
+- Compose with standard Unix tools (rg, grep, ls)
+- Simple CLI: URL is a positional argument, no subcommands
+- Plain text output for scriptability
+
+### Simplicity Over Features
+Avoid adding functionality that duplicates mature Unix tools. Every line of code must serve the core mission of generating llms-brief.txt files.
+
+## Configuration Defaults
+
+- **Crawl Depth**: 3 levels (hardcoded in crawler.py)
+- **Output**: `~/.claude/docs/<domain>.txt` (override with `--output`)
+- **Cache**: `.llmsbrieftxt_cache/` for intermediate results
+- **OpenAI Model**: `gpt-5-mini` (override with `--model`)
+- **Concurrency**: 10 concurrent LLM requests (prevents rate limiting)
+
+## Commit Convention
+
+Use conventional commits for automated versioning:
+- `fix:` → patch bump (1.0.0 → 1.0.1)
+- `feat:` → minor bump (1.0.0 → 1.1.0)
+- `BREAKING CHANGE` or `feat!:`/`fix!:` → major bump (1.0.0 → 2.0.0)
+
+Examples:
+```bash
+git commit -m "fix: handle empty sitemap gracefully"
+git commit -m "feat: add --depth option for custom crawl depth"
+git commit -m "feat!: change default output location"
+```
+
+## Non-Obvious Behaviors
+
+1. URL Discovery discovers ALL pages up to depth 3, not just direct links
+2. URLs like `/page`, `/page/`, and `/page#section` are deduplicated as the same URL
+3. Summaries are automatically cached in `.llmsbrieftxt_cache/summaries.json`
+4. Content extraction uses trafilatura to preserve HTML structure in markdown
+5. File I/O is synchronous (uses standard `Path.write_text()` for simplicity)
+
+## Known Limitations
+
+1. Only supports OpenAI API (no other LLM providers)
+2. Crawl depth is hardcoded to 3 in crawler.py
+3. No CLI flag to force resume from cache (though cache exists)
+4. No progress persistence if interrupted
+5. Prompts and parsing assume English documentation
+
+## Code Review Checklist
+
+When reviewing code changes:
+- Ensure async patterns are used correctly (no blocking I/O in async functions)
+- Verify all functions have type hints
+- Check that tests are included for new functionality
+- Confirm error handling doesn't break the pipeline
+- Validate that conventional commit format is used
+- Ensure code follows Unix philosophy (simplicity, composability)
+- Check that ruff and pyright pass without errors
+- **IMPORTANT**: Always include specific file names and line numbers when providing review feedback (e.g., "main.py:165" or "line 182 in cli.py")
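The "Async-First Design" section above describes bounding `asyncio.gather()` with a semaphore for rate limiting. A minimal, self-contained sketch of that pattern is shown below; `summarize_page` and the URLs are placeholders for illustration, not the package's actual Summarizer API.

```python
import asyncio


async def summarize_page(url: str) -> str:
    """Stand-in for a real LLM call (hypothetical helper)."""
    await asyncio.sleep(0.1)  # simulate network/LLM latency
    return f"summary of {url}"


async def summarize_all(urls: list[str], max_concurrent: int = 10) -> list[str]:
    # At most `max_concurrent` coroutines run the guarded body at once,
    # which keeps the request rate under provider limits.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await summarize_page(url)

    return await asyncio.gather(*(bounded(u) for u in urls))


if __name__ == "__main__":
    pages = [f"https://example.com/page{i}" for i in range(25)]
    print(asyncio.run(summarize_all(pages))[:2])
```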
{llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/CLAUDE.md

@@ -95,8 +95,12 @@ llmtxt https://docs.python.org/3/
 
 # With options
 llmtxt https://example.com --model gpt-4o
-llmtxt https://example.com --show-urls
+llmtxt https://example.com --show-urls  # Preview URLs with cost estimate
 llmtxt https://example.com --max-urls 50
+llmtxt https://example.com --depth 2  # Control crawl depth (default: 3)
+llmtxt https://example.com --use-cache-only  # No API calls, cache only
+llmtxt https://example.com --force-refresh  # Ignore cache, regenerate all
+llmtxt https://example.com --cache-dir /tmp/cache  # Custom cache location
 llmtxt https://example.com --output custom-path.txt
 ```
 
@@ -114,9 +118,15 @@ ls -lh ~/.claude/docs/
 
 ### Default Behavior
 These are the production defaults:
-- **Crawl Depth**: 3 levels from starting URL (
+- **Crawl Depth**: 3 levels from starting URL (configurable with `--depth`)
 - **Output Location**: `~/.claude/docs/<domain>.txt` (can override with `--output`)
-- **Cache Directory**: `.llmsbrieftxt_cache/` for intermediate results
+- **Cache Directory**: `.llmsbrieftxt_cache/` for intermediate results (can override with `--cache-dir`)
+
+### New Features (as of latest update)
+- **Cost Estimation**: `--show-urls` now displays estimated API cost before processing
+- **Cache Control**: `--use-cache-only` and `--force-refresh` flags for cache management
+- **Failed URL Tracking**: Failed URLs are written to `failed_urls.txt` next to output file
+- **Depth Configuration**: Crawl depth is now configurable via `--depth` flag
 
 ### Default Model
 - **OpenAI Model**: `gpt-5-mini` (defined in constants.py, can override with `--model`)
@@ -169,12 +179,16 @@ Test markers:
 
 ## Non-Obvious Behaviors
 
-1. **URL Discovery**: Discovers ALL pages up to depth 3, not just pages linked from your starting URL
+1. **URL Discovery**: Discovers ALL pages up to configured depth (default 3), not just pages linked from your starting URL
 2. **Duplicate Handling**: `/page`, `/page/`, and `/page#section` are treated as the same URL
 3. **Concurrency Limit**: Default 10 concurrent LLM requests prevents rate limiting
 4. **Automatic Caching**: Summaries cached in `.llmsbrieftxt_cache/summaries.json` and reused automatically
 5. **Content Extraction**: Uses `trafilatura` for HTML→markdown, preserving structure
 6. **Sync File I/O**: Uses standard `Path.write_text()` instead of async file I/O (simpler, sufficient)
+7. **Cost Estimation**: `--show-urls` shows both discovered URLs count AND estimated API cost
+8. **Cache-First**: When using cache, shows "Cached: X | New: Y" breakdown before processing
+9. **Failed URL Reporting**: Failed URLs saved to `failed_urls.txt` in same directory as output
+10. **Environment Variables**: `--output` and `--cache-dir` support `$HOME` and other env var expansion
 
 ## Using llms-brief.txt Files
 
@@ -240,9 +254,12 @@ grep -rn "hooks" ~/.claude/docs/
 
 ### Debugging Issues
 1. Check logs - logger is configured in most modules
-2. Use `--show-urls` to preview URL discovery
-3. Check cache: `.llmsbrieftxt_cache/summaries.json`
-4.
+2. Use `--show-urls` to preview URL discovery and cost estimate
+3. Check cache: `.llmsbrieftxt_cache/summaries.json` (or custom `--cache-dir`)
+4. Check failed URLs: `failed_urls.txt` in output directory
+5. Test with limited scope: `--max-urls 10 --depth 1` for quick testing
+6. Use `--use-cache-only` to test output generation without API calls
+7. Run with verbose pytest: `uv run pytest -vv -s`
 
 ### Modifying URL Discovery Logic
 - Edit `crawler.py` for crawling behavior
@@ -308,10 +325,9 @@ uv build
 ## Known Limitations
 
 1. **OpenAI Only**: Currently only supports OpenAI API (no other LLM providers)
-2. **
-3. **
-4. **No
-5. **English-Centric**: Prompts and parsing assume English documentation
+2. **No Progress Persistence**: If interrupted, must restart (though cache helps and is used automatically on restart)
+3. **English-Centric**: Prompts and parsing assume English documentation
+4. **No Incremental Timestamp Checking**: Force refresh or cache-only mode, but no "only update changed pages" mode
 
 ## Migration from v0.x
 
{llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: llmsbrieftxt
-Version: 1.3.1
+Version: 1.4.0
 Summary: Generate llms-brief.txt files from documentation websites using AI
 Project-URL: Homepage, https://github.com/stevennevins/llmsbrief
 Project-URL: Repository, https://github.com/stevennevins/llmsbrief
@@ -98,8 +98,12 @@ Output is automatically saved to `~/.claude/docs/<domain>.txt` (e.g., `docs.pyth
 - `--output PATH` - Custom output path (default: `~/.claude/docs/<domain>.txt`)
 - `--model MODEL` - OpenAI model to use (default: `gpt-5-mini`)
 - `--max-concurrent-summaries N` - Concurrent LLM requests (default: 10)
-- `--show-urls` - Preview discovered URLs
+- `--show-urls` - Preview discovered URLs with cost estimate (no API calls)
 - `--max-urls N` - Limit number of URLs to process
+- `--depth N` - Maximum crawl depth (default: 3)
+- `--cache-dir PATH` - Cache directory path (default: `.llmsbrieftxt_cache`)
+- `--use-cache-only` - Use only cached summaries, skip API calls for new pages
+- `--force-refresh` - Ignore cache and regenerate all summaries
 
 ### Examples
 
@@ -110,12 +114,24 @@ llmtxt https://docs.python.org/3/
 # Use a different model
 llmtxt https://react.dev --model gpt-4o
 
-# Preview URLs before processing (no API calls)
+# Preview URLs with cost estimate before processing (no API calls)
 llmtxt https://react.dev --show-urls
 
 # Limit scope for testing
 llmtxt https://docs.python.org --max-urls 50
 
+# Custom crawl depth (explore deeper or shallower)
+llmtxt https://example.com --depth 2
+
+# Use only cached summaries (no API calls)
+llmtxt https://docs.python.org/3/ --use-cache-only
+
+# Force refresh all summaries (ignore cache)
+llmtxt https://docs.python.org/3/ --force-refresh
+
+# Custom cache directory
+llmtxt https://example.com --cache-dir /tmp/my-cache
+
 # Custom output location
 llmtxt https://react.dev --output ./my-docs/react.txt
 
@@ -245,11 +261,11 @@ uv run mypy llmsbrieftxt/
 
 ### Default Settings
 
-- **Crawl Depth**: 3 levels (
-- **Output Location**: `~/.claude/docs/<domain>.txt`
-- **Cache Directory**: `.llmsbrieftxt_cache/`
-- **OpenAI Model**: `gpt-5-mini`
-- **Concurrent Requests**: 10
+- **Crawl Depth**: 3 levels (configurable via `--depth`)
+- **Output Location**: `~/.claude/docs/<domain>.txt` (configurable via `--output`)
+- **Cache Directory**: `.llmsbrieftxt_cache/` (configurable via `--cache-dir`)
+- **OpenAI Model**: `gpt-5-mini` (configurable via `--model`)
+- **Concurrent Requests**: 10 (configurable via `--max-concurrent-summaries`)
 
 ### Environment Variables
 
@@ -259,10 +275,26 @@ uv run mypy llmsbrieftxt/
 
 ### Managing API Costs
 
-- Use `--show-urls`
-- Use `--max-urls` to limit processing during testing
-- Summaries are cached automatically - rerunning is cheap
--
+- **Preview with cost estimate**: Use `--show-urls` to see discovered URLs and estimated API cost before processing
+- **Limit scope**: Use `--max-urls` to limit processing during testing
+- **Automatic caching**: Summaries are cached automatically - rerunning is cheap
+- **Cache-only mode**: Use `--use-cache-only` to generate output from cache without API calls
+- **Force refresh**: Use `--force-refresh` when you need to regenerate all summaries
+- **Cost-effective model**: Default model `gpt-5-mini` is cost-effective for most documentation
+
+### Controlling Crawl Depth
+
+- **Default depth (3)**: Good for most documentation sites (100-300 pages)
+- **Shallow crawl (1-2)**: Use for large sites or to focus on main pages only
+- **Deep crawl (4-5)**: Use for small sites or comprehensive coverage
+- Example: `llmtxt https://example.com --depth 2 --show-urls` to preview scope
+
+### Cache Management
+
+- **Default location**: `.llmsbrieftxt_cache/` in current directory
+- **Custom location**: Use `--cache-dir` for shared caches or different organization
+- **Cache benefits**: Speeds up reruns, reduces API costs, enables incremental updates
+- **Failed URLs tracking**: Failed URLs are written to `failed_urls.txt` next to output file
 
 ### Organizing Documentation
 
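The cache described in this section is a single JSON file mapping each crawled page URL to its cached summary (see the rewritten `llmsbrieftxt/main.py` later in this diff). A small inspection sketch, assuming the default cache location and an existing cache:

```python
import json
from pathlib import Path

# Peek at the summary cache that llmtxt maintains between runs.
cache_file = Path(".llmsbrieftxt_cache") / "summaries.json"

if cache_file.exists():
    summaries: dict[str, str] = json.loads(cache_file.read_text(encoding="utf-8"))
    print(f"{len(summaries)} cached summaries")
    for url in sorted(summaries)[:5]:  # show a handful of cached page URLs
        print(" -", url)
else:
    print("No cache yet - run llmtxt against a site first")
```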
{llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/README.md

@@ -65,8 +65,12 @@ Output is automatically saved to `~/.claude/docs/<domain>.txt` (e.g., `docs.pyth
 - `--output PATH` - Custom output path (default: `~/.claude/docs/<domain>.txt`)
 - `--model MODEL` - OpenAI model to use (default: `gpt-5-mini`)
 - `--max-concurrent-summaries N` - Concurrent LLM requests (default: 10)
-- `--show-urls` - Preview discovered URLs
+- `--show-urls` - Preview discovered URLs with cost estimate (no API calls)
 - `--max-urls N` - Limit number of URLs to process
+- `--depth N` - Maximum crawl depth (default: 3)
+- `--cache-dir PATH` - Cache directory path (default: `.llmsbrieftxt_cache`)
+- `--use-cache-only` - Use only cached summaries, skip API calls for new pages
+- `--force-refresh` - Ignore cache and regenerate all summaries
 
 ### Examples
 
@@ -77,12 +81,24 @@ llmtxt https://docs.python.org/3/
 # Use a different model
 llmtxt https://react.dev --model gpt-4o
 
-# Preview URLs before processing (no API calls)
+# Preview URLs with cost estimate before processing (no API calls)
 llmtxt https://react.dev --show-urls
 
 # Limit scope for testing
 llmtxt https://docs.python.org --max-urls 50
 
+# Custom crawl depth (explore deeper or shallower)
+llmtxt https://example.com --depth 2
+
+# Use only cached summaries (no API calls)
+llmtxt https://docs.python.org/3/ --use-cache-only
+
+# Force refresh all summaries (ignore cache)
+llmtxt https://docs.python.org/3/ --force-refresh
+
+# Custom cache directory
+llmtxt https://example.com --cache-dir /tmp/my-cache
+
 # Custom output location
 llmtxt https://react.dev --output ./my-docs/react.txt
 
@@ -212,11 +228,11 @@ uv run mypy llmsbrieftxt/
 
 ### Default Settings
 
-- **Crawl Depth**: 3 levels (
-- **Output Location**: `~/.claude/docs/<domain>.txt`
-- **Cache Directory**: `.llmsbrieftxt_cache/`
-- **OpenAI Model**: `gpt-5-mini`
-- **Concurrent Requests**: 10
+- **Crawl Depth**: 3 levels (configurable via `--depth`)
+- **Output Location**: `~/.claude/docs/<domain>.txt` (configurable via `--output`)
+- **Cache Directory**: `.llmsbrieftxt_cache/` (configurable via `--cache-dir`)
+- **OpenAI Model**: `gpt-5-mini` (configurable via `--model`)
+- **Concurrent Requests**: 10 (configurable via `--max-concurrent-summaries`)
 
 ### Environment Variables
 
@@ -226,10 +242,26 @@ uv run mypy llmsbrieftxt/
 
 ### Managing API Costs
 
-- Use `--show-urls`
-- Use `--max-urls` to limit processing during testing
-- Summaries are cached automatically - rerunning is cheap
--
+- **Preview with cost estimate**: Use `--show-urls` to see discovered URLs and estimated API cost before processing
+- **Limit scope**: Use `--max-urls` to limit processing during testing
+- **Automatic caching**: Summaries are cached automatically - rerunning is cheap
+- **Cache-only mode**: Use `--use-cache-only` to generate output from cache without API calls
+- **Force refresh**: Use `--force-refresh` when you need to regenerate all summaries
+- **Cost-effective model**: Default model `gpt-5-mini` is cost-effective for most documentation
+
+### Controlling Crawl Depth
+
+- **Default depth (3)**: Good for most documentation sites (100-300 pages)
+- **Shallow crawl (1-2)**: Use for large sites or to focus on main pages only
+- **Deep crawl (4-5)**: Use for small sites or comprehensive coverage
+- Example: `llmtxt https://example.com --depth 2 --show-urls` to preview scope
+
+### Cache Management
+
+- **Default location**: `.llmsbrieftxt_cache/` in current directory
+- **Custom location**: Use `--cache-dir` for shared caches or different organization
+- **Cache benefits**: Speeds up reruns, reduces API costs, enables incremental updates
+- **Failed URLs tracking**: Failed URLs are written to `failed_urls.txt` next to output file
 
 ### Organizing Documentation
 
{llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/llmsbrieftxt/cli.py

@@ -8,9 +8,14 @@ from pathlib import Path
 from urllib.parse import urlparse
 
 from llmsbrieftxt.constants import (
+    DEFAULT_CACHE_DIR,
     DEFAULT_CONCURRENT_SUMMARIES,
+    DEFAULT_CRAWL_DEPTH,
     DEFAULT_OPENAI_MODEL,
     DOCS_DIR,
+    ESTIMATED_TOKENS_PER_PAGE_INPUT,
+    ESTIMATED_TOKENS_PER_PAGE_OUTPUT,
+    OPENAI_PRICING,
 )
 from llmsbrieftxt.main import generate_llms_txt
 
@@ -48,13 +53,39 @@ def parse_args(test_args: list[str] | None = None) -> argparse.Namespace:
     parser.add_argument(
         "--show-urls",
         action="store_true",
-        help="Preview discovered URLs
+        help="Preview discovered URLs with cost estimate (no processing or API calls)",
     )
 
     parser.add_argument(
         "--max-urls", type=int, help="Maximum number of URLs to discover and process"
     )
 
+    parser.add_argument(
+        "--depth",
+        type=int,
+        default=DEFAULT_CRAWL_DEPTH,
+        help=f"Maximum crawl depth (default: {DEFAULT_CRAWL_DEPTH})",
+    )
+
+    parser.add_argument(
+        "--cache-dir",
+        type=str,
+        default=DEFAULT_CACHE_DIR,
+        help=f"Cache directory path (default: {DEFAULT_CACHE_DIR})",
+    )
+
+    parser.add_argument(
+        "--use-cache-only",
+        action="store_true",
+        help="Use only cached summaries, skip API calls for new pages",
+    )
+
+    parser.add_argument(
+        "--force-refresh",
+        action="store_true",
+        help="Ignore cache and regenerate all summaries",
+    )
+
     return parser.parse_args(test_args)
 
 
@@ -72,6 +103,39 @@ def check_openai_api_key() -> bool:
     return bool(os.environ.get("OPENAI_API_KEY"))
 
 
+def estimate_cost(num_pages: int, model: str) -> str:
+    """
+    Estimate the API cost for processing a given number of pages.
+
+    Args:
+        num_pages: Number of pages to process
+        model: OpenAI model name
+
+    Returns:
+        Formatted cost estimate string
+    """
+    if model not in OPENAI_PRICING:
+        return "Cost estimation not available for this model"
+
+    input_price, output_price = OPENAI_PRICING[model]
+
+    # Calculate total tokens
+    total_input_tokens = num_pages * ESTIMATED_TOKENS_PER_PAGE_INPUT
+    total_output_tokens = num_pages * ESTIMATED_TOKENS_PER_PAGE_OUTPUT
+
+    # Calculate cost (prices are per 1M tokens)
+    input_cost = (total_input_tokens / 1_000_000) * input_price
+    output_cost = (total_output_tokens / 1_000_000) * output_price
+    total_cost = input_cost + output_cost
+
+    if total_cost < 0.01:
+        return f"~${total_cost:.4f}"
+    elif total_cost < 1.00:
+        return f"~${total_cost:.3f}"
+    else:
+        return f"~${total_cost:.2f}"
+
+
 def get_output_path(url: str, custom_output: str | None = None) -> Path:
     """
     Get the output file path for a given URL.
@@ -84,7 +148,9 @@ def get_output_path(url: str, custom_output: str | None = None) -> Path:
         Path object for the output file
     """
     if custom_output:
-
+        # Expand environment variables and user home directory
+        expanded = os.path.expandvars(custom_output)
+        return Path(expanded).expanduser()
 
     # Extract domain from URL
     parsed = urlparse(url)
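The expansion order introduced here (`os.path.expandvars` first, then `Path.expanduser`) can be checked in isolation with a short standalone sketch; the printed results naturally depend on the local environment:

```python
import os
from pathlib import Path


def expand(path: str) -> Path:
    # Same two steps as the change above: $VARS first, then ~.
    return Path(os.path.expandvars(path)).expanduser()


print(expand("~/docs/output.txt"))      # home-relative path becomes absolute
print(expand("$HOME/docs/output.txt"))  # env var is substituted before expanduser
print(expand("$UNSET_VAR/output.txt"))  # unknown variables are left untouched by expandvars
```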
@@ -116,8 +182,25 @@ def main() -> None:
         print("Example: https://docs.python.org/3/", file=sys.stderr)
         sys.exit(1)
 
-    #
-    if
+    # Validate depth parameter
+    if args.depth < 1:
+        print("Error: --depth must be at least 1", file=sys.stderr)
+        sys.exit(1)
+
+    # Check for conflicting cache flags
+    if args.use_cache_only and args.force_refresh:
+        print(
+            "Error: Cannot use --use-cache-only and --force-refresh together",
+            file=sys.stderr,
+        )
+        sys.exit(1)
+
+    # Check for API key (unless just showing URLs or using cache only)
+    if (
+        not args.show_urls
+        and not args.use_cache_only
+        and not check_openai_api_key()
+    ):
         print("Error: OPENAI_API_KEY not found", file=sys.stderr)
         print("Please set your OpenAI API key:", file=sys.stderr)
         print(" export OPENAI_API_KEY='sk-your-api-key-here'", file=sys.stderr)
@@ -131,24 +214,24 @@ def main() -> None:
         # Determine output path
         output_path = get_output_path(args.url, args.output)
 
+        # Expand cache directory path
+        cache_dir = Path(os.path.expandvars(args.cache_dir)).expanduser()
+
         # Print configuration
         print(f"Processing URL: {args.url}")
-
+        if not args.show_urls:
+            print(f"Using model: {args.model}")
+            print(f"Crawl depth: {args.depth}")
         print(f"Output: {output_path}")
         if args.max_urls:
             print(f"Max URLs: {args.max_urls}")
-
-
-
-        print("")
-        print(
-            "Note: This will discover and process all documentation pages (depth=3)"
-        )
-        print("Tip: Use --show-urls first to preview scope, or --max-urls to limit")
-        print("")
+        if args.use_cache_only:
+            print("Mode: Cache-only (no API calls)")
+        elif args.force_refresh:
+            print("Mode: Force refresh (ignoring cache)")
 
         # Run generation
-        asyncio.run(
+        result = asyncio.run(
            generate_llms_txt(
                 url=args.url,
                 llm_name=args.model,
@@ -156,9 +239,23 @@ def main() -> None:
                 output_path=str(output_path),
                 show_urls=args.show_urls,
                 max_urls=args.max_urls,
+                max_depth=args.depth,
+                cache_dir=str(cache_dir),
+                use_cache_only=args.use_cache_only,
+                force_refresh=args.force_refresh,
             )
         )
 
+        # Show cost estimate and failed URLs if available
+        if args.show_urls and result:
+            num_urls_value = result.get("num_urls", 0)
+            # Type guard to ensure we have an int
+            if isinstance(num_urls_value, int):
+                print(
+                    f"\nEstimated cost for {num_urls_value} pages: {estimate_cost(num_urls_value, args.model)}"
+                )
+                print("Note: Actual cost may vary based on page content size and caching")
+
     except KeyboardInterrupt:
         print("\nOperation cancelled by user.", file=sys.stderr)
         sys.exit(1)
{llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/llmsbrieftxt/constants.py

@@ -9,6 +9,32 @@ DEFAULT_OPENAI_MODEL = "gpt-5-mini"
 # Docs Directory
 DOCS_DIR = "~/.claude/docs"  # Will be expanded to full path at runtime
 
+# Default Cache Directory
+DEFAULT_CACHE_DIR = ".llmsbrieftxt_cache"
+
+# Default Crawl Depth
+DEFAULT_CRAWL_DEPTH = 3
+
+# OpenAI Pricing (per 1M tokens) - prices subject to change
+# Format: {model: (input_price, output_price)}
+# Note: Verify current pricing at https://openai.com/api/pricing/
+OPENAI_PRICING = {
+    "gpt-5-mini": (0.15, 0.60),  # $0.15 input, $0.60 output per 1M tokens
+    "gpt-4o-mini": (0.15, 0.60),
+    "gpt-4o": (2.50, 10.00),
+    "gpt-4-turbo": (10.00, 30.00),
+    "gpt-4": (30.00, 60.00),
+}
+
+# Estimated tokens per page for cost calculation
+# These estimates are based on typical documentation page sizes:
+# - Input: ~2000-4000 words per doc page → ~3000 tokens (conservative estimate)
+# - Output: ~300 tokens for structured PageSummary with all fields
+# Accuracy: Estimates typically within ±30% of actual cost
+# Pages with code examples or very long content may exceed these estimates
+ESTIMATED_TOKENS_PER_PAGE_INPUT = 3000
+ESTIMATED_TOKENS_PER_PAGE_OUTPUT = 400
+
 
 # Prompt Templates
 DEFAULT_SUMMARY_PROMPT = """You are a specialized content analyzer creating structured summaries for llms-brief.txt files. Your role is to help LLMs understand web content by providing comprehensive yet concise summaries.
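As a sanity check on the numbers above: for a 100-page crawl with the default `gpt-5-mini` model, the estimate produced by `estimate_cost` in `cli.py` (earlier in this diff) works out as follows.

```python
# Worked example using the constants above (prices are per 1M tokens).
pages = 100
input_tokens = pages * 3000   # ESTIMATED_TOKENS_PER_PAGE_INPUT
output_tokens = pages * 400   # ESTIMATED_TOKENS_PER_PAGE_OUTPUT

input_cost = input_tokens / 1_000_000 * 0.15    # gpt-5-mini input price
output_cost = output_tokens / 1_000_000 * 0.60  # gpt-5-mini output price
total = input_cost + output_cost                # 0.045 + 0.024 = 0.069

print(f"~${total:.3f}")  # "~$0.069" (three decimals because 0.01 <= total < 1.00)
```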
llmsbrieftxt-1.4.0/llmsbrieftxt/main.py (new file)

@@ -0,0 +1,227 @@
+"""Main generation pipeline for llmsbrieftxt."""
+
+import json
+import re
+from pathlib import Path
+
+from llmsbrieftxt.doc_loader import DocLoader
+from llmsbrieftxt.extractor import default_extractor
+from llmsbrieftxt.summarizer import Summarizer
+
+
+def extract_url_from_summary(summary: str) -> str | None:
+    """
+    Extract URL from a summary in the format: Title: [title](URL).
+
+    Args:
+        summary: Formatted summary string
+
+    Returns:
+        Extracted URL or None if not found
+    """
+    # Match markdown link format: [text](url)
+    match = re.search(r"\[([^\]]+)\]\(([^)]+)\)", summary)
+    if match:
+        return match.group(2)
+    return None
+
+
+def ensure_directory_exists(file_path: str) -> None:
+    """Ensure the parent directory of the given file path exists.
+
+    Args:
+        file_path: Path to the file whose parent directory should be created
+
+    Raises:
+        RuntimeError: If directory creation fails due to permissions or other issues
+    """
+    dir_path = Path(file_path).parent
+    if dir_path == Path("."):
+        return  # Current directory, no need to create
+
+    try:
+        dir_path.mkdir(parents=True, exist_ok=True)
+        if not dir_path.exists():
+            print(f"Created directory: {dir_path}")
+    except OSError as e:
+        raise RuntimeError(f"Failed to create directory {dir_path}: {e}") from e
+
+
+async def generate_llms_txt(
+    url: str,
+    llm_name: str = "o4-mini",
+    max_concurrent_summaries: int = 10,
+    output_path: str = "llms.txt",
+    show_urls: bool = False,
+    max_urls: int | None = None,
+    max_depth: int = 3,
+    cache_dir: str = ".llmsbrieftxt_cache",
+    use_cache_only: bool = False,
+    force_refresh: bool = False,
+) -> dict[str, int | list[str]] | None:
+    """
+    Generate llms-brief.txt file from a documentation website.
+
+    Args:
+        url: URL of the documentation site to crawl
+        llm_name: OpenAI model to use for summarization
+        max_concurrent_summaries: Maximum concurrent LLM requests
+        output_path: Path to write the output file
+        show_urls: If True, only show discovered URLs without processing
+        max_urls: Maximum number of URLs to discover/process
+        max_depth: Maximum crawl depth for URL discovery
+        cache_dir: Directory to store cached summaries
+        use_cache_only: If True, only use cached summaries (no API calls)
+        force_refresh: If True, ignore cache and regenerate all summaries
+
+    Returns:
+        Dictionary with metadata (for show_urls mode) or None
+    """
+    urls_processed = 0
+    summaries_generated = 0
+    failed_urls: set[str] = set()  # Use set to avoid duplicates
+
+    # Set up cache directory
+    cache_path = Path(cache_dir)
+    cache_path.mkdir(parents=True, exist_ok=True)
+    cache_file = cache_path / "summaries.json"
+
+    # Load existing summaries from cache if available (unless force refresh)
+    existing_summaries: dict[str, str] = {}
+    if cache_file.exists() and not force_refresh:
+        try:
+            with open(cache_file) as f:
+                existing_summaries = json.load(f)
+            print(f"Found {len(existing_summaries)} cached summaries")
+        except Exception as e:
+            print(f"Warning: Could not load cache: {str(e)}")
+    elif force_refresh and cache_file.exists():
+        print("Force refresh enabled - ignoring existing cache")
+
+    extractor = default_extractor
+    output_file = output_path
+
+    # If show_urls is True, just show discovered URLs and exit
+    if show_urls:
+        print("Discovering documentation URLs...")
+        doc_loader = DocLoader(max_urls=max_urls, max_depth=max_depth)
+        _, discovered_urls = await doc_loader.load_docs(
+            url, extractor=extractor, show_urls=True
+        )
+        print("\nDiscovered URLs:")
+        for discovered_url in discovered_urls:
+            print(f"  - {discovered_url}")
+        print(f"\nTotal: {len(discovered_urls)} unique URLs")
+
+        # Calculate how many would be cached vs new
+        num_cached = sum(1 for u in discovered_urls if u in existing_summaries)
+        num_new = len(discovered_urls) - num_cached
+        if existing_summaries:
+            print(f"Cached: {num_cached} | New: {num_new}")
+
+        return {"num_urls": len(discovered_urls), "failed_urls": []}
+
+    # Load and process documents
+    doc_loader = DocLoader(max_urls=max_urls, max_depth=max_depth)
+    docs, discovered_urls = await doc_loader.load_docs(url, extractor=extractor)
+    urls_processed = len(docs)
+
+    # Track which URLs failed to load
+    loaded_urls = {doc.metadata.get("source") for doc in docs}
+    failed_urls.update(u for u in discovered_urls if u not in loaded_urls)
+
+    # Handle cache-only mode
+    if use_cache_only:
+        print("\nCache-only mode: Using only cached summaries")
+        summaries: list[str] = []
+        for doc in docs:
+            doc_url = doc.metadata.get("source", "")
+            if doc_url in existing_summaries:
+                summaries.append(existing_summaries[doc_url])
+            else:
+                print(f"  Warning: No cache for {doc_url}")
+                failed_urls.add(doc_url)
+        summaries_generated = len(summaries)
+    else:
+        # Initialize summarizer
+        print(f"\nGenerating summaries with {llm_name}...")
+        summarizer = Summarizer(
+            llm_name=llm_name,
+            max_concurrent=max_concurrent_summaries,
+        )
+
+        summaries: list[str] = []
+        try:
+            summaries = await summarizer.summarize_all(
+                docs, existing_summaries=existing_summaries, cache_file=cache_file
+            )
+            summaries_generated = len(summaries)
+
+            # Track URLs that failed summarization by extracting URLs from summaries
+            summarized_urls: set[str] = set()
+            for summary in summaries:
+                if summary:
+                    extracted_url: str | None = extract_url_from_summary(summary)
+                    if extracted_url:
+                        summarized_urls.add(extracted_url)
+
+            # Add docs that weren't successfully summarized to failed_urls
+            for doc in docs:
+                doc_url = doc.metadata.get("source", "")
+                if doc_url and doc_url not in summarized_urls:
+                    failed_urls.add(doc_url)
+        except KeyboardInterrupt:
+            print("Process interrupted by user. Saving partial results...")
+            if cache_file.exists():
+                try:
+                    with open(cache_file) as f:
+                        partial_summaries = json.load(f)
+                    summaries = list(partial_summaries.values())
+                    summaries_generated = len(summaries)
+                    print(f"Recovered {len(summaries)} summaries from cache")
+                except Exception:
+                    # Silently ignore cache read errors during interrupt recovery
+                    # If we can't recover from cache, we'll continue with empty results
+                    pass
+        except Exception as e:
+            print(f"Summarization process error: {str(e)}")
+            if cache_file.exists():
+                try:
+                    with open(cache_file) as f:
+                        partial_summaries = json.load(f)
+                    summaries = list(partial_summaries.values())
+                    summaries_generated = len(summaries)
+                    print(
+                        f"Recovered {len(summaries)} partial summaries from cache"
+                    )
+                except Exception:
+                    # If cache recovery fails during error handling, continue with empty results
+                    summaries = []
+        finally:
+            # Write results to file
+            if summaries:
+                ensure_directory_exists(output_file)
+                output_content = "".join(summaries)
+                Path(output_file).write_text(output_content, encoding="utf-8")
+            else:
+                ensure_directory_exists(output_file)
+                Path(output_file).write_text("", encoding="utf-8")
+
+            # Print summary
+            print(f"\n{'=' * 50}")
+            print(f"Processed: {summaries_generated}/{urls_processed} pages")
+            if urls_processed > 0:
+                success_rate = summaries_generated / urls_processed * 100
+                print(f"Success rate: {success_rate:.1f}%")
+            print(f"Output: {output_file}")
+
+            # Report failed URLs
+            if failed_urls:
+                print(f"Failed URLs: {len(failed_urls)}")
+                failed_file = Path(output_file).parent / "failed_urls.txt"
+                # Sort URLs for consistent output
+                failed_file.write_text("\n".join(sorted(failed_urls)), encoding="utf-8")
+                print(f"Failed URLs written to: {failed_file}")
+            print(f"{'=' * 50}")
+
+    return None
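The `extract_url_from_summary` helper above pulls the first markdown-style link out of a summary string. A quick standalone check of the same regex (the sample summary text is made up):

```python
import re

# Same pattern as extract_url_from_summary in the new main.py.
pattern = r"\[([^\]]+)\]\(([^)]+)\)"

summary = "Title: [Quickstart](https://docs.example.com/quickstart) - install and first run"
match = re.search(pattern, summary)
if match:
    print(match.group(1))  # Quickstart
    print(match.group(2))  # https://docs.example.com/quickstart
```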
{llmsbrieftxt-1.3.1 → llmsbrieftxt-1.4.0}/tests/unit/test_cli.py

@@ -1,8 +1,10 @@
 """Tests for CLI argument parsing."""
 
+import os
+
 import pytest
 
-from llmsbrieftxt.cli import parse_args, validate_url
+from llmsbrieftxt.cli import estimate_cost, get_output_path, parse_args, validate_url
 
 
 class TestCLIArgumentParsing:
@@ -110,6 +112,61 @@ class TestCLIArgumentParsing:
         args = parse_args(["https://example.com", "--max-urls", "50"])
         assert args.max_urls == 50
 
+    @pytest.mark.unit
+    def test_depth_flag_default(self):
+        """Test default depth value."""
+        args = parse_args(["https://example.com"])
+        assert args.depth == 3
+
+    @pytest.mark.unit
+    def test_depth_flag_custom(self):
+        """Test custom depth value."""
+        args = parse_args(["https://example.com", "--depth", "5"])
+        assert args.depth == 5
+
+    @pytest.mark.unit
+    def test_cache_dir_flag_default(self):
+        """Test default cache directory."""
+        args = parse_args(["https://example.com"])
+        assert args.cache_dir == ".llmsbrieftxt_cache"
+
+    @pytest.mark.unit
+    def test_cache_dir_flag_custom(self):
+        """Test custom cache directory."""
+        args = parse_args(["https://example.com", "--cache-dir", "/tmp/mycache"])
+        assert args.cache_dir == "/tmp/mycache"
+
+    @pytest.mark.unit
+    def test_use_cache_only_flag(self):
+        """Test --use-cache-only flag."""
+        args = parse_args(["https://example.com", "--use-cache-only"])
+        assert args.use_cache_only is True
+
+    @pytest.mark.unit
+    def test_force_refresh_flag(self):
+        """Test --force-refresh flag."""
+        args = parse_args(["https://example.com", "--force-refresh"])
+        assert args.force_refresh is True
+
+    @pytest.mark.unit
+    def test_all_new_arguments_together(self):
+        """Test all new arguments combined."""
+        args = parse_args(
+            [
+                "https://example.com",
+                "--depth",
+                "2",
+                "--cache-dir",
+                "custom_cache",
+                "--max-urls",
+                "100",
+            ]
+        )
+        assert args.url == "https://example.com"
+        assert args.depth == 2
+        assert args.cache_dir == "custom_cache"
+        assert args.max_urls == 100
+
     @pytest.mark.unit
     def test_no_url_exits(self):
         """Test that providing no URL exits with error."""
@@ -154,3 +211,73 @@ class TestURLValidation:
     def test_invalid_url_malformed(self):
         """Test invalid malformed URL."""
         assert validate_url("not-a-url") is False
+
+
+class TestCostEstimation:
+    """Tests for cost estimation."""
+
+    @pytest.mark.unit
+    def test_cost_estimate_small_job(self):
+        """Test cost estimation for small number of pages."""
+        cost = estimate_cost(10, "gpt-5-mini")
+        assert cost.startswith("~$")
+        assert "$0." in cost
+
+    @pytest.mark.unit
+    def test_cost_estimate_medium_job(self):
+        """Test cost estimation for medium number of pages."""
+        cost = estimate_cost(100, "gpt-4o-mini")
+        assert cost.startswith("~$")
+        assert "$" in cost
+
+    @pytest.mark.unit
+    def test_cost_estimate_large_job(self):
+        """Test cost estimation for large number of pages."""
+        cost = estimate_cost(500, "gpt-4o")
+        assert cost.startswith("~$")
+        assert float(cost.replace("~$", "")) > 1.0
+
+    @pytest.mark.unit
+    def test_cost_estimate_unknown_model(self):
+        """Test cost estimation for unknown model."""
+        cost = estimate_cost(100, "unknown-model")
+        assert "not available" in cost
+
+    @pytest.mark.unit
+    def test_cost_estimate_zero_pages(self):
+        """Test cost estimation for zero pages."""
+        cost = estimate_cost(0, "gpt-5-mini")
+        assert cost == "~$0.0000"
+
+
+class TestOutputPathExpansion:
+    """Tests for output path with environment variable expansion."""
+
+    @pytest.mark.unit
+    def test_output_path_with_tilde_expansion(self):
+        """Test output path with ~ expands to home directory."""
+        path = get_output_path("https://example.com", "~/docs/output.txt")
+        assert "~" not in str(path)
+        assert str(path).startswith(os.path.expanduser("~"))
+
+    @pytest.mark.unit
+    def test_output_path_with_env_var(self, monkeypatch):
+        """Test output path with $VAR environment variable."""
+        monkeypatch.setenv("MYDIR", "/tmp/testdir")
+        path = get_output_path("https://example.com", "$MYDIR/output.txt")
+        assert str(path) == "/tmp/testdir/output.txt"
+
+    @pytest.mark.unit
+    def test_output_path_with_env_var_braces(self, monkeypatch):
+        """Test output path with ${VAR} environment variable."""
+        monkeypatch.setenv("TESTDIR", "/tmp/test")
+        path = get_output_path("https://example.com", "${TESTDIR}/docs/output.txt")
+        assert str(path) == "/tmp/test/docs/output.txt"
+
+    @pytest.mark.unit
+    def test_output_path_default_no_expansion(self):
+        """Test default output path (no custom path) works correctly."""
+        path = get_output_path("https://docs.example.com")
+        # Should contain .claude/docs in path
+        assert ".claude/docs" in str(path)
+        assert str(path).endswith("docs.example.com.txt")
llmsbrieftxt-1.3.1/llmsbrieftxt/main.py (removed)

@@ -1,142 +0,0 @@
-"""Main generation pipeline for llmsbrieftxt."""
-
-import json
-from pathlib import Path
-
-from llmsbrieftxt.doc_loader import DocLoader
-from llmsbrieftxt.extractor import default_extractor
-from llmsbrieftxt.summarizer import Summarizer
-
-
-def ensure_directory_exists(file_path: str) -> None:
-    """Ensure the parent directory of the given file path exists.
-
-    Args:
-        file_path: Path to the file whose parent directory should be created
-
-    Raises:
-        RuntimeError: If directory creation fails due to permissions or other issues
-    """
-    dir_path = Path(file_path).parent
-    if dir_path == Path("."):
-        return  # Current directory, no need to create
-
-    try:
-        dir_path.mkdir(parents=True, exist_ok=True)
-        if not dir_path.exists():
-            print(f"Created directory: {dir_path}")
-    except OSError as e:
-        raise RuntimeError(f"Failed to create directory {dir_path}: {e}") from e
-
-
-async def generate_llms_txt(
-    url: str,
-    llm_name: str = "o4-mini",
-    max_concurrent_summaries: int = 10,
-    output_path: str = "llms.txt",
-    show_urls: bool = False,
-    max_urls: int | None = None,
-) -> None:
-    """
-    Generate llms-brief.txt file from a documentation website.
-
-    Args:
-        url: URL of the documentation site to crawl
-        llm_name: OpenAI model to use for summarization
-        max_concurrent_summaries: Maximum concurrent LLM requests
-        output_path: Path to write the output file
-        show_urls: If True, only show discovered URLs without processing
-        max_urls: Maximum number of URLs to discover/process
-    """
-    urls_processed = 0
-    summaries_generated = 0
-
-    # Set up cache directory
-    cache_dir = Path(".llmsbrieftxt_cache")
-    cache_dir.mkdir(exist_ok=True)
-    cache_file = cache_dir / "summaries.json"
-
-    # Load existing summaries from cache if available
-    existing_summaries: dict[str, str] = {}
-    if cache_file.exists():
-        try:
-            with open(cache_file) as f:
-                existing_summaries = json.load(f)
-            print(f"Using {len(existing_summaries)} cached summaries")
-        except Exception as e:
-            print(f"Warning: Could not load cache: {str(e)}")
-
-    extractor = default_extractor
-    output_file = output_path
-
-    # If show_urls is True, just show discovered URLs and exit
-    if show_urls:
-        print("Discovering documentation URLs...")
-        doc_loader = DocLoader(max_urls=max_urls)
-        _, discovered_urls = await doc_loader.load_docs(
-            url, extractor=extractor, show_urls=True
-        )
-        print("\nDiscovered URLs:")
-        for discovered_url in discovered_urls:
-            print(f"  - {discovered_url}")
-        print(f"\nTotal: {len(discovered_urls)} unique URLs")
-        return
-
-    # Load and process documents
-    doc_loader = DocLoader(max_urls=max_urls)
-    docs, discovered_urls = await doc_loader.load_docs(url, extractor=extractor)
-    urls_processed = len(docs)
-
-    # Initialize summarizer
-    print(f"\nGenerating summaries with {llm_name}...")
-    summarizer = Summarizer(
-        llm_name=llm_name,
-        max_concurrent=max_concurrent_summaries,
-    )
-
-    summaries = []
-    try:
-        summaries = await summarizer.summarize_all(
-            docs, existing_summaries=existing_summaries, cache_file=cache_file
-        )
-        summaries_generated = len(summaries)
-    except KeyboardInterrupt:
-        print("Process interrupted by user. Saving partial results...")
-        if cache_file.exists():
-            try:
-                with open(cache_file) as f:
-                    partial_summaries = json.load(f)
-                summaries = list(partial_summaries.values())
-                summaries_generated = len(summaries)
-                print(f"Recovered {len(summaries)} summaries from cache")
-            except Exception:
-                pass
-    except Exception as e:
-        print(f"Summarization process error: {str(e)}")
-        if cache_file.exists():
-            try:
-                with open(cache_file) as f:
-                    partial_summaries = json.load(f)
-                summaries = list(partial_summaries.values())
-                summaries_generated = len(summaries)
-                print(f"Recovered {len(summaries)} partial summaries from cache")
-            except Exception:
-                summaries = []
-    finally:
-        # Write results to file
-        if summaries:
-            ensure_directory_exists(output_file)
-            output_content = "".join(summaries)
-            Path(output_file).write_text(output_content, encoding="utf-8")
-        else:
-            ensure_directory_exists(output_file)
-            Path(output_file).write_text("", encoding="utf-8")
-
-        # Print summary
-        print(f"\n{'=' * 50}")
-        print(f"Processed: {summaries_generated}/{urls_processed} pages")
-        if urls_processed > 0:
-            success_rate = summaries_generated / urls_processed * 100
-            print(f"Success rate: {success_rate:.1f}%")
-        print(f"Output: {output_file}")
-        print(f"{'=' * 50}")

The remaining 34 files listed above are unchanged between 1.3.1 and 1.4.0.