PyPI - codedoc-ai - Versions diffs - 0.8.0__tar.gz → 0.9.2__tar.gz - Mend

codedoc-ai 0.8.0__tar.gz → 0.9.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (96)

{codedoc_ai-0.8.0 → codedoc_ai-0.9.2}/CHANGELOG.md RENAMED

@@ -1,5 +1,276 @@
 # Changelog
+## 0.9.2 - 2026-06-12
+### Safe planning and CI ergonomics
+- Added a filesystem-read-only, provider-free `--dry-run` driven by the same
+  immutable routing plan as real execution.
+- Added `--max-files`, repeatable `--force-files`, and `--allow-partial`, with
+  matching config and environment-variable support.
+- Added stable CLI exit codes for success, file/output failures, setup errors,
+  and interrupts.
+- Added approximate planned and actual LLM call/token reporting. Dry-run totals
+  are explicitly lower bounds and no monetary accuracy is claimed.
+- Centralized per-file truncation so all three agents receive the same bounded
+  source string and only one warning is emitted.
+- Added read-only ownership inspection and moved the paid-file cap ahead of
+  filesystem mutation, writer initialization, and provider creation.
+- Added a packaged, manual-only GitHub Actions workflow with a dry-run, paid
+  cap, least-privilege permissions, and artifact upload.
+- Kept `--safe-mode` accepted but hidden for backward compatibility.
+- Added focused 0.9.2 regression coverage and synchronized release identity.
+## 0.9.1 - 2026-06-08
+### Bug-fix stabilization patch (first PyPI release)
+Corrective-only patch. No new features or output-shape changes.
+- **A1 — entry-reachability is no longer silent.** When an entry is given,
+  files not reachable from it were dropped without notice. `_select_files` now
+  logs a clear WARNING listing the excluded files, records `stats["entry_excluded"]`,
+  and the CLI prints an excluded-files line. (The structural selection fix is
+  tracked for a later minor; this patch only removes the silent failure.)
+- **A2 — a wrong `--entry` no longer silently documents the whole repo.** An
+  explicitly specified entry that cannot be resolved, is not in the scanned set,
+  resolves outside the project root, or is given when **no** supported files are
+  scanned, now raises `ConfigError` instead of falling back to all files or
+  exiting successfully. Auto-detection with no entry still documents everything.
+- **A3 — parser false imports fixed.** The Go parser no longer treats arbitrary
+  string literals (e.g. `fmt.Println("hi")`) as imports — only string-literal
+  paths in `import "..."` statements and `import ( ... )` blocks are read,
+  comments are ignored, and raw-string (backtick) paths are supported.
+  Interpreted literals use Go's byte-accurate escape semantics, including
+  multi-byte UTF-8 `\xNN` / octal sequences and Unicode escapes. The HTML parser
+  no longer treats CSS `<link href>` as a code import (kept `<script src>` and
+  JS imports).
+- **A4 — no stale/empty record substituted for a real one.** In the parallel
+  batch, a rate-limited file was treated as "already recorded" using state that
+  also included records **preloaded** from a prior run, so a *changed* file could
+  be restored from stale documentation instead of retried. `SafeWriter` now
+  tracks records written *this run* (`recorded_this_run()`); a changed,
+  rate-limited file is retried, and a file genuinely recorded this run recovers
+  its real record via `get_record()` (never an empty `{}`).
+- **A5 — honest interrupt message.** Removed dead code; the Ctrl-C message is now
+  conditional ("…if the run reached file processing") so it never falsely claims
+  progress was saved when interrupted before any file was processed.
+- **A6 — scanner is re-entrant.** The directory walker no longer stores state on
+  the function object; state lives on a per-scan `_Walker` instance.
+- **Version identity.** `pyproject.toml`, `codedoc.__version__`, the CLI
+  `--version`, and the README all report `0.9.1`, and the automated test
+  (`test_version_identity_consistent`) enforces agreement across **all four**,
+  including the README "Current release" line.
+- **Reliable tests.** `tests/conftest.py` redirects the temp root into the repo
+  (`.pyt_tmp`) so a locked system temp dir does not make the suite unrunnable.
+  (This addresses the observed locked-system-temp failure; it is not a guarantee
+  for every environment.)
+## 0.9.0 - 2026-06-04
+### Output preflight safety, clean INFO logs, extension list fix, configurable content truncation
+---
+#### G0 — Output Preflight Safety
+Foreign output targets now fail immediately with a `ConfigError` before the
+scanner runs, the provider initialises, or any LLM API call is made. Previously
+a foreign file at the target path would only be detected inside
+`write_project_outputs`, after all tokens had already been spent.
+- **`codedoc/core/output.py`**: Added `preflight_output_targets()` which calls
+  `_check_file_ownership()` for all final public targets (JSON, MD, both) and a
+  new `_check_md_live_backup_ownership()` for the MD live-backup JSON sibling.
+- **`codedoc/pipeline.py`**: Calls `preflight_output_targets()` immediately after
+  output spec resolution, before `scan_files()` and `create_provider()`.
+- **`codedoc/core/loader.py`**: `_resolve_output_spec()` now only emits the
+  format-conflict warning when `--format` was explicitly passed by the user (not
+  when the default `"json"` value from DEFAULTS triggers a mismatch).
+#### G1 — Clean Log Output
+Third-party HTTP libraries (`httpx`, `httpcore`, `openai`, `anthropic`,
+`google.auth`) are now silenced at WARNING level by default. At `--verbose` /
+DEBUG the HTTP diagnostics are restored. Per-agent progress lines appear at INFO
+so users can see what codedoc is doing at each step.
+- **`codedoc/utils/logger.py`**: `_NOISY_LOGGERS` constant defines the list;
+  `_configure()` sets those loggers to WARNING; `set_level()` lowers them to
+  DEBUG when the root logger is set to DEBUG.
+- **`codedoc/agents/orchestrator.py`**: Added timing via `time.monotonic()` and
+  INFO/WARNING log lines after each agent: `[FILE] path | structure ok  0.8s`,
+  `[FILE] path | dependencies ok  0.9s`, `[FILE] path | documentation ok  1.2s`.
+  Fallbacks emit WARNING with `"fallback"` in the message.
+#### G5 — Extension List Consistency
+`_candidate_variants()` in `graph.py` used a hardcoded 9-extension list that
+was out of sync with `_KNOWN_EXTENSIONS` and `DEFAULTS["extension_language_map"]`.
+Import resolution for Go, Kotlin, Swift, Rust, Ruby, and C-family files silently
+produced no candidates.
+- **`codedoc/core/graph.py`**: `_KNOWN_EXTENSIONS` expanded to all 19 extensions
+  in `DEFAULTS["extension_language_map"]`. `_candidate_variants()` now uses
+  `sorted(_KNOWN_EXTENSIONS)` instead of a separate hardcoded list. A comment
+  notes the sync requirement with `loader.py`.
+#### G6 — Configurable Content Truncation
+Files above 12,000 characters were silently truncated with a DEBUG-only log.
+Users saw degraded documentation for large files with no indication why.
+- **`codedoc/core/loader.py`**: `max_content_chars` added to `DEFAULTS` (12000)
+  and `_ENV_KEY_MAP` (`CODEDOC_MAX_CONTENT_CHARS`). Validation requires a positive
+  integer ≥ 1000.
+- **`codedoc/agents/base_agent.py`**: Removed module-level `_MAX_CONTENT_CHARS`
+  constant. `BaseAgent.__init__` now accepts `max_content_chars: int = 12000`.
+  `_truncate()` uses `self._max_content_chars` and logs at INFO with the file
+  path and original / truncated character counts.
+- **`codedoc/agents/orchestrator.py`**: `Orchestrator.__init__` accepts
+  `max_content_chars: int = 12000` and forwards it to each agent.
+- **`codedoc/pipeline.py`**: Passes `config.get("max_content_chars", 12000)` to
+  the `Orchestrator` constructor.
+- All three agent subclasses pass `file_path` to `_truncate()` for accurate logs.
+---
+## 0.8.1 - 2026-06-02
+### Lossless Markdown, placeholder sanitization, configurable defaults, provider-aware rate-limit backoff
+---
+#### Workstream A — Lossless Markdown View
+Markdown output now embeds the complete public JSON view as a hidden base64
+comment so `json_from_markdown()` (and incremental re-runs that read a `.md`
+file) recover the full dependency catalog, per-file hashes, and all dependency
+metadata without any information loss.
+- **`codedoc/core/project_view.py`**:
+  - `markdown_from_view()` writes a `<!-- codedoc-ai-view-base64 ... -->` block
+    immediately after the legacy `<!-- codedoc-ai: ... -->` metadata comment.
+    The block is standard base64-encoded UTF-8 JSON, which avoids comment-safety
+    issues with raw `--` or `-->` sequences in generated text.
+  - `markdown_to_view()` now tries the embedded view first (fast, lossless path);
+    falls back to the existing visible Markdown parser for pre-0.8.1 files.
+  - New public helper `read_embedded_view(markdown)` decodes and validates the
+    embedded block; returns `None` on any failure so callers fall back safely.
+  - `read_codedoc_meta()` no longer raises `ConfigError` when `entry_file` is
+    `null`; a valid CodeDoc file with no entry point is now correctly identified
+    as owned rather than foreign.
+- **`codedoc/pipeline.py`**:
+  - `_load_existing_file_docs_from_md()` preserves file hashes from the embedded
+    view when the lightweight metadata comment has no hash for a path.
+  - `_resolve_entry_and_docs()` no longer raises unconditionally when no existing
+    output is found; first runs without `--entry` now reach `detect_entry_file()`
+    for auto-detection instead of failing immediately.
+#### Workstream B — Placeholder Usage Example Sanitization
+LLM-generated usage examples that contain placeholder package names (e.g.
+`import 'package:your_package/...'`) are now removed before any output is
+written or cached.
+- **`codedoc/core/project_view.py`**: `_clean_file()` calls the new
+  `_sanitize_usage_example()` helper, which checks against `_PLACEHOLDER_PATTERN`
+  (a compiled `re.IGNORECASE` regex with word-boundary guards).  Covered
+  placeholders: `your_package_name`, `your_package`, `your_project`, `your_app`,
+  `example_package`, `my_package`, and Dart-style `package:example/`.
+  Sanitization is idempotent and applies to both freshly generated records and
+  cached/reused records loaded from prior output files.
+#### Workstream C — Configurable Hardcoded Defaults
+All previously hardcoded scanner and provider defaults are now driven by a
+single source of truth in `DEFAULTS` (`loader.py`) and support `_add` / `_remove`
+override keys.
+- **`codedoc/core/loader.py`**:
+  - `DEFAULTS` gains eleven new keys: `skip_dirs_add`, `skip_dirs_remove`,
+    `extension_language_map` (full 18-entry map), `extension_language_map_add`,
+    `extension_language_map_remove`, `auto_entry_candidates`,
+    `auto_entry_candidates_add`, `auto_entry_candidates_remove`,
+    `provider_prefixes`, `provider_prefixes_add`, `provider_prefixes_remove`.
+  - Three resolver helpers implement the resolution order (replace → `_add` →
+    `_remove`): `_resolve_list_override`, `_resolve_dict_override`,
+    `_resolve_nested_list_dict_override`.
+  - `_apply_config_overrides()` is called after all config sources are merged;
+    it resolves all four configurable keys and derives `supported_extensions`
+    from the resolved `extension_language_map`.
+  - Backward-compat bridge: if `supported_extensions` was explicitly set to a
+    value different from the defaults, it is used as a filter on
+    `extension_language_map` so old configs continue to restrict scanning as
+    intended.
+- **`codedoc/core/scanner.py`**:
+  - Hardcoded `SKIP_DIRS` and `EXTENSION_LANGUAGE_MAP` removed.
+  - `scan_files()` receives `extension_language_map` (primary) instead of
+    `supported_extensions`.  A positional-list guard handles legacy callers
+    that pass a list as the second argument.
+  - `detect_entry_file()` receives the resolved `auto_entry_candidates` list;
+    falls back to a module-level default for direct callers.
+- **`codedoc/pipeline.py`**: passes `extension_language_map` and
+  `auto_entry_candidates` to the scanner; always appends the output directory
+  name to the scan skip list (even when the user removed it via
+  `--remove-skip-dir`) to prevent codedoc from documenting its own output.
+- **`codedoc/cli/cli.py`**: three new flags: `--skip-dirs DIR [...]`,
+  `--add-skip-dir DIR` (repeatable), `--remove-skip-dir DIR` (repeatable).
+- **`codedoc/llm/factory.py`**: `create_provider()`, `_make_api()`,
+  `_resolve_api_provider()`, and `_provider_api_key()` all accept and use
+  `provider_prefixes` from config; module-level tuples kept as fallbacks.
+#### Workstream D — Provider-Aware Rate-Limit Backoff
+Parallel ladder step-downs now sleep between rungs using provider-aware
+exponential backoff, with optional `Retry-After` hint parsing.
+- **`codedoc/llm/rate_limit_profile.py`** *(new)*:
+  - `RateLimitProfile` dataclass — `provider`, `signals`, `min_backoff_s`,
+    `backoff_scale`.
+  - `PROVIDER_PROFILES` — preconfigured profiles for `openai`, `anthropic`,
+    `gemini`, and `default`.
+  - `get_rate_limit_profile(provider_name, config)` — returns the resolved
+    profile with `rate_limit_backoff_s`, `rate_limit_backoff_scale`,
+    `rate_limit_signals_add`, and `rate_limit_signals_remove` applied without
+    mutating module defaults.
+- **`codedoc/pipeline.py`**:
+  - `_is_rate_limit_error(exc, profile=None)` — when a `profile` is supplied,
+    checks only `profile.signals`; falls back to `_RATE_LIMIT_SIGNALS` for
+    backward compatibility with callers without a profile.
+  - `_detect_limit_type(error_msg)` — classifies errors as `"tpm"`, `"rpm"`,
+    `"quota"`, `"overloaded"`, or `None`.
+  - `_process_descriptor_batch()` return type changed:
+    `retry_rate_limited` is now `list[tuple[dict, Exception]]` so the causing
+    exception is preserved for `Retry-After` parsing and error sampling.
+  - `_process_agent_files()`: fetches the provider profile, passes it to
+    `_process_descriptor_batch()`, and sleeps between rungs using:
+    - `min(Retry-After, retry_after_cap_s)` when a hint is present and
+      `respect_retry_after = True`,
+    - `min(min_backoff_s × backoff_scale ^ rung, retry_after_cap_s)` otherwise,
+    - no sleep when `rate_limit_backoff_s = 0`.
+  - Rate-limit warning dicts now include: `retry_after_s`, `sleep_s`,
+    `error_sample`, `limit_type`, `event_number`, `rung_index`.
+- **`codedoc/core/loader.py`**: four new `DEFAULTS` keys:
+  `rate_limit_backoff_s`, `rate_limit_backoff_scale`, `rate_limit_signals_add`,
+  `rate_limit_signals_remove`.
+- **`codedoc/cli/cli.py`**: compact rate-limit summary line printed only when
+  step-down events occurred; shows event count, providers, and total sleep time.
+#### Version
+- `codedoc/__init__.py`, `pyproject.toml`, `cli.py`: `0.8.0` → `0.8.1`.
+#### Validation
+- Added regression coverage for lossless Markdown regeneration, placeholder
+  sanitization, configurable defaults, provider-aware rate-limit backoff, and
+  rate-limit edge cases.
+- Full test suite passes.
+- Built sdist/wheel and verified release metadata with `twine check`.
+---
 ## 0.8.0 - 2026-05-31
 ### Always-on live JSON crash backup, parallel crash-safety, rate-limit adaptive parallelism, error.log overhaul

{codedoc_ai-0.8.0 → codedoc_ai-0.9.2}/MANIFEST.in RENAMED

@@ -6,6 +6,7 @@ include CONTRIBUTING.md
 include SECURITY.md
 include CODE_OF_CONDUCT.md
 include .env.example
+recursive-include codedoc/templates *.yml
 recursive-include tests *.py
 recursive-include tests/fixtures *
 recursive-include .github *.md

{codedoc_ai-0.8.0 → codedoc_ai-0.9.2}/PKG-INFO RENAMED

@@ -1,14 +1,20 @@
 Metadata-Version: 2.4
 Name: codedoc-ai
-Version: 0.8.0
+Version: 0.9.2
 Summary: Generate structured, incremental documentation for any codebase using OpenAI, Anthropic, or Gemini
 Author: Atharv Mannur
 License-Expression: MIT
 Project-URL: Homepage, https://github.com/atharvm416/codedoc-ai
+Project-URL: PyPI, https://pypi.org/project/codedoc-ai/
+Project-URL: Documentation, https://github.com/atharvm416/codedoc-ai#readme
+Project-URL: Source, https://github.com/atharvm416/codedoc-ai
 Project-URL: Issues, https://github.com/atharvm416/codedoc-ai/issues
-Keywords: documentation,ai,llm,codebase,agents,codegen
+Project-URL: Changelog, https://github.com/atharvm416/codedoc-ai/blob/main/CHANGELOG.md
+Keywords: ai,anthropic,cli,code-analysis,codebase,developer-tools,documentation,gemini,llm,openai,python
 Classifier: Development Status :: 3 - Alpha
+Classifier: Environment :: Console
 Classifier: Intended Audience :: Developers
+Classifier: Operating System :: OS Independent
 Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.9
 Classifier: Programming Language :: Python :: 3.10
@@ -16,6 +22,7 @@ Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Topic :: Software Development :: Documentation
 Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Classifier: Topic :: Utilities
 Requires-Python: >=3.9
 Description-Content-Type: text/markdown
 License-File: LICENSE
@@ -40,7 +47,7 @@ Dynamic: license-file
 The tool scans source files, resolves project-local imports into a dependency graph, sends only files that need analysis to an LLM, and writes one combined, structured documentation artifact designed for both humans and AI. By default that artifact is JSON.
-Current release: `0.8.0`.
+Current release: `0.9.2`.
 ## What It Does
@@ -61,6 +68,9 @@ Current release: `0.8.0`.
 - Survives interruptions: writes a live JSON backup before any AI work starts, then updates it after every completed file. A Ctrl-C or crash always leaves a readable partial output file — no results are lost, and re-running the same command resumes automatically from where it stopped.
 - Adaptive rate-limit parallelism: when a provider signals 429 / rate-limit, file concurrency is stepped down (`5 → 2 → 1`) and a provider-specific warning is printed to the terminal. No manual intervention needed.
 - Refuses to overwrite any file it did not create (ownership guard), protecting your data from accidental output collisions.
+- Provides a filesystem-read-only `--dry-run` with approximate lower-bound call and token estimates.
+- Supports a pre-call `--max-files` cap and repeatable `--force-files` reprocessing.
+- Reports stable CI-oriented exit codes and optional `--allow-partial` behavior.
 - Writes a clean, structured public project view to `codedoc/codedoc.json` by default, or Markdown when requested.
 - Public output includes project overview, file tree, folder map, dependency graph, dependency catalog, and flattened file summaries.
 - Converts public JSON to Markdown without another AI call.
@@ -91,6 +101,11 @@ codedoc run
 | Live JSON backup | always on (0.8.0 default) |
 | Rate-limit adaptive | `true` |
 | Max file size | `500 KB` |
+| Max content chars | `12000` |
+| Dry run | `false` |
+| Maximum paid files | `0` (unlimited) |
+| Forced files | `[]` |
+| Allow partial output | `false` |
 Because the default provider uses the OpenAI API, a user must supply an API key unless they select a different provider.
@@ -205,7 +220,10 @@ Common commands:
 | `codedoc run --provider gemini --model gemini-2.5-flash` | Use Google Gemini. |
 | `codedoc run --provider anthropic --model claude-haiku-4-5-20251001` | Use Anthropic Claude. |
 | `codedoc run --ignore /myenv --ignore generated` | Ignore project paths. |
-| `codedoc run --safe-mode` | Deprecated (live backup is always on since 0.8.0). |
+| `codedoc run --dry-run --max-files 25` | Inspect the plan without writes, provider creation, or API calls. |
+| `codedoc run --max-files 25` | Stop before mutation or API calls if more than 25 files need LLM work. |
+| `codedoc run --force-files src/a.py --force-files src/b.py` | Explicitly reprocess selected files. |
+| `codedoc run --allow-partial` | Exit 0 for completed partial runs, with a prominent warning. |
 | `codedoc run --max-parallel-files 3` | Limit concurrent file processing. |
 | `codedoc .` | Legacy shorthand for documenting the current directory. |
 | `codedoc --version` | Print the installed version. |
@@ -340,7 +358,47 @@ Create `codedoc.config.json` in the project being documented:
   "parallel_ladder": null,
   "respect_retry_after": true,
   "retry_after_cap_s": 30,
+  "rate_limit_backoff_s": null,
+  "rate_limit_backoff_scale": null,
+  "rate_limit_signals_add": [],
+  "rate_limit_signals_remove": [],
   "skip_dirs": ["myenv", ".venv", "venv", "env", "node_modules", "__pycache__", "codedoc"],
+  "skip_dirs_add": [],
+  "skip_dirs_remove": [],
+  "max_content_chars": 12000,
+  "extension_language_map": {
+    ".py": "python",
+    ".ts": "typescript",
+    ".tsx": "tsx",
+    ".js": "javascript",
+    ".jsx": "jsx",
+    ".dart": "dart",
+    ".java": "java",
+    ".cs": "csharp",
+    ".html": "html",
+    ".htm": "html",
+    ".kt": "kotlin",
+    ".swift": "swift",
+    ".go": "go",
+    ".rb": "ruby",
+    ".rs": "rust",
+    ".cpp": "cpp",
+    ".c": "c",
+    ".h": "c",
+    ".hpp": "cpp"
+  },
+  "extension_language_map_add": {},
+  "extension_language_map_remove": [],
+  "auto_entry_candidates": ["index.html", "main.tsx", "main.ts", "main.js", "main.py", "main.dart", "Main.java", "Program.cs"],
+  "auto_entry_candidates_add": [],
+  "auto_entry_candidates_remove": [],
+  "provider_prefixes": {
+    "anthropic": ["claude"],
+    "gemini": ["gemini"],
+    "openai": ["gpt-", "o1", "o3", "text-"]
+  },
+  "provider_prefixes_add": {},
+  "provider_prefixes_remove": {},
   "ignore_paths": ["/myenv", "services/generated"]
 }
 ```
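Reviewer note (not part of the diff): the 0.8.1 changelog entry states that the `_add` / `_remove` keys in the config above resolve in the order replace, then `_add`, then `_remove`. The following is a minimal sketch of that order for a list-valued key such as `skip_dirs`. It is not the package's code: `resolve_list_override` is a hypothetical name (the changelog names a real helper `_resolve_list_override`, whose signature is not shown in this diff), and the duplicate handling is an assumption.

```python
from typing import Optional, Sequence


def resolve_list_override(
    defaults: Sequence[str],
    replace: Optional[Sequence[str]] = None,
    add: Sequence[str] = (),
    remove: Sequence[str] = (),
) -> list:
    """Hypothetical resolver: replace, then add, then remove."""
    # Step 1 (replace): an explicit list supersedes the defaults entirely.
    resolved = list(defaults if replace is None else replace)
    # Step 2 (_add): append new entries, skipping duplicates (assumed behaviour).
    for item in add:
        if item not in resolved:
            resolved.append(item)
    # Step 3 (_remove): removal runs last, so it wins over the earlier steps.
    blocked = set(remove)
    return [item for item in resolved if item not in blocked]


skip_dirs = resolve_list_override(
    defaults=["node_modules", "__pycache__", "codedoc"],
    add=["dist"],
    remove=["codedoc"],
)
print(skip_dirs)
```

Under this ordering, an entry listed in both the `_add` and `_remove` keys ends up excluded, because removal is applied last.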
@@ -369,6 +427,30 @@ Parallelism settings:
 | `file_retry_attempts` | Number of sequential retries for a failed file. Default: `1`. |
 | `max_consecutive_failures` | Stops the run after repeated failures so provider/API problems are visible quickly. Default: `5`. |
+Configurable defaults added in 0.8.1:
+| Setting | Purpose |
+| --- | --- |
+| `skip_dirs`, `skip_dirs_add`, `skip_dirs_remove` | Replace, extend, or reduce directory names skipped anywhere in the tree. Use `--remove-skip-dir codedoc` to document this package source while codedoc still skips its output directory. |
+| `extension_language_map`, `extension_language_map_add`, `extension_language_map_remove` | Control which extensions are scanned and what language label each gets. Any extension in the resolved map is supported. |
+| `auto_entry_candidates`, `auto_entry_candidates_add`, `auto_entry_candidates_remove` | Control first-run entry auto-detection when `--entry` is omitted. |
+| `provider_prefixes`, `provider_prefixes_add`, `provider_prefixes_remove` | Control model-name based provider auto-detection and matching API-key lookup. |
+Configurable settings added in 0.9.0:
+| Setting | Default | Purpose |
+| --- | --- | --- |
+| `max_content_chars` | `12000` | Maximum characters sent to the LLM per file. Long files are truncated once, one WARNING reports the path and counts, and the marker stays inside the ceiling. Must be at least `1000`. |
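Reviewer note (not part of the diff): the row above says long files are truncated once and that the marker stays inside the ceiling. A minimal sketch of that property follows, assuming a character-based cut. The marker text and the name `truncate_source` are hypothetical; the diff does not show the real marker or the real function body.

```python
from typing import Tuple

# Hypothetical marker text; the real marker is not shown in this diff.
TRUNCATION_MARKER = "\n... [truncated]"


def truncate_source(source: str, max_content_chars: int = 12000) -> Tuple[str, bool]:
    """Return (bounded_source, was_truncated) with the marker counted in the limit."""
    if len(source) <= max_content_chars:
        return source, False
    # Reserve room for the marker so the result never exceeds the ceiling.
    keep = max_content_chars - len(TRUNCATION_MARKER)
    return source[:keep] + TRUNCATION_MARKER, True


bounded, was_truncated = truncate_source("x" * 13000)
print(len(bounded), was_truncated)
```

Truncating once and handing the same bounded string to every agent is what would let a single warning cover all three agents, as the 0.9.2 changelog entry describes.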
+Planning and CI settings added in 0.9.2:
+| Setting | Default | Purpose |
+| --- | --- | --- |
+| `dry_run` | `false` | Compute the real routing plan without filesystem mutation or provider/API interaction. |
+| `max_files` | `0` | Maximum files allowed to make LLM calls after reuse and resume decisions. `0` is unlimited. |
+| `force_files` | `[]` | Selected project paths to reprocess explicitly before dependency propagation. |
+| `allow_partial` | `false` | Exit 0 only for completed runs that produced partial output after file failures. |
 ## Environment Variables
 Secrets should live in environment variables or a local `.env` file that is ignored by Git. Use [.env.example](.env.example) as the template.
@@ -393,6 +475,11 @@ Supported variables:
 | `CODEDOC_MAX_CONSECUTIVE_FAILURES` | Consecutive failure threshold before stopping. |
 | `LOG_LEVEL` | `INFO`, `DEBUG`, etc. |
 | `CODEDOC_IGNORE_PATHS` | Semicolon-separated ignore paths. |
+| `CODEDOC_MAX_CONTENT_CHARS` | Maximum characters of file content sent to the LLM. Equivalent to `max_content_chars` in config. |
+| `CODEDOC_DRY_RUN` | Boolean planning-only mode. |
+| `CODEDOC_MAX_FILES` | Non-negative paid-file cap; `0` is unlimited. |
+| `CODEDOC_FORCE_FILES` | Semicolon-separated forced project paths. |
+| `CODEDOC_ALLOW_PARTIAL` | Boolean partial-output exit-code override. |
 Example `.env` for OpenAI:
@@ -582,14 +669,13 @@ On each run, `codedoc` follows this process:
 3. Scan supported files while respecting `skip_dirs` and `ignore_paths`.
 4. Build a dependency graph from parsed imports.
 5. Select files reachable from the entry point.
-6. Compute each selected file's SHA-256 hash.
-7. Skip files whose path and hash already match the existing output.
-8. Reuse existing documentation if another file has the same content hash.
-9. If `propagate_changes` is true, reprocess files that depend on changed files.
-10. Send only remaining files to the selected LLM, up to `max_parallel_files` at a time.
-11. Retry failed parallel files sequentially so errors are easier to diagnose.
-12. Stop early if repeated failures suggest the API or provider is unavailable.
-13. Rebuild the selected output file from processed records, embedding metadata for the next run.
+6. Normalize forced paths and add valid forced files before dependency propagation.
+7. Compute one immutable plan covering changed, unchanged, reused, resumed, and paid-agent files.
+8. In `--dry-run`, return that plan and approximate lower-bound usage without writing or creating a provider.
+9. In a real run, enforce ownership and `max_files` before creating directories, writers, logs, or providers.
+10. Materialize identical-content and checkpoint reuse exactly as planned.
+11. Send only paid-agent files to the LLM, retry failures, and write final output.
+12. Report actual call attempts and approximate input/output token totals.
 This means repeated runs should only send new or changed code to the LLM. Unchanged code and exact duplicate content are reused.
@@ -634,7 +720,7 @@ with the final clean output.
 and now has no effect — live backup is always on. Passing it prints a deprecation
 notice. It will be removed in a future release.
-### Adaptive rate-limit parallelism (0.8.0)
+### Adaptive rate-limit parallelism (0.8.1)
 When a provider signals 429 / rate-limit / quota-exceeded, codedoc automatically
 steps down file-level concurrency instead of hammering the API:
@@ -656,9 +742,34 @@ Customize it in config:
 }
 ```
-Provider-specific rate-limit signals are recognised for OpenAI (`429`,
-`rate_limit_exceeded`, `tpm`), Anthropic (`529`, `overloaded`), and Gemini
-(`RESOURCE_EXHAUSTED`, `quota`). Non-rate-limit errors never trigger a step-down.
+Provider-specific rate-limit signals are recognised for OpenAI (`429`, `rate limit`,
+`rate_limit`, `too many requests`, `tokens per min`, `tpm`, `quota`), Anthropic
+(`529`, `overloaded`, `rate_limit`, `429`), and Gemini (`resource_exhausted`,
+`quota`, `429`, `503`). Non-rate-limit errors never trigger a step-down.
+In 0.8.1, codedoc sleeps between parallel step-down rungs using provider-aware
+backoff. You can tune this in config:
+```json
+{
+  "rate_limit_backoff_s": null,
+  "rate_limit_backoff_scale": null,
+  "rate_limit_signals_add": ["capacity exceeded", "throttled"],
+  "rate_limit_signals_remove": ["503"]
+}
+```
+Set `rate_limit_backoff_s` to `0` to disable computed inter-rung backoff.
+`Retry-After` hints are still honored when `respect_retry_after` is true.
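Reviewer note (not part of the diff): the 0.8.1 changelog entry gives the inter-rung sleep as `min(Retry-After, retry_after_cap_s)` when a hint is honoured, and `min(min_backoff_s × backoff_scale ^ rung, retry_after_cap_s)` otherwise, with no computed sleep when the base is `0`. The sketch below expresses that rule under stated assumptions: `inter_rung_sleep` is a hypothetical name, the base `2.0` and scale `3.0` are illustrative values rather than any provider's defaults, and the precedence of the hint over a zero base is inferred from the two sentences above.

```python
from typing import Optional


def inter_rung_sleep(
    rung: int,
    min_backoff_s: float,
    backoff_scale: float,
    retry_after_cap_s: float = 30.0,
    retry_after_s: Optional[float] = None,
    respect_retry_after: bool = True,
) -> float:
    """Hypothetical helper: seconds to sleep before the next ladder rung."""
    # An honoured Retry-After hint takes precedence, but is capped.
    if retry_after_s is not None and respect_retry_after:
        return min(retry_after_s, retry_after_cap_s)
    # A zero base disables the computed backoff.
    if min_backoff_s == 0:
        return 0.0
    # Otherwise: exponential backoff in the rung index, capped.
    return min(min_backoff_s * backoff_scale ** rung, retry_after_cap_s)


for rung in range(4):
    print(rung, inter_rung_sleep(rung, min_backoff_s=2.0, backoff_scale=3.0))
```

The default cap of `30` matches the `retry_after_cap_s` value in the sample config; with the illustrative values, the fourth rung is the first one to hit that cap.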
+### Lossless Markdown regeneration (0.8.1)
+Markdown output remains human-readable, but codedoc now embeds a hidden
+base64-encoded public JSON view in a `<!-- codedoc-ai-view-base64 ... -->`
+comment. This lets later Markdown-to-JSON conversion and incremental re-runs
+recover dependency catalogs, per-file dependency metadata, links, and hashes
+without another LLM call. Legacy Markdown without the embedded view still uses
+the best-effort visible Markdown parser.
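Reviewer note (not part of the diff): a minimal sketch of the embedding idea described above, assuming the payload is standard base64 of UTF-8 JSON placed directly after the `codedoc-ai-view-base64` marker. The exact comment layout, and the names `embed_view` and `extract_view`, are assumptions for illustration; the package's own helpers (`markdown_from_view`, `read_embedded_view`) are not reproduced here.

```python
import base64
import json
import re
from typing import Optional

# Assumed layout: marker, whitespace, base64 payload, optional whitespace, close.
_VIEW_RE = re.compile(r"<!--\s*codedoc-ai-view-base64\s+([A-Za-z0-9+/=]+)\s*-->")


def embed_view(view: dict) -> str:
    """Encode a JSON-serialisable view as a hidden Markdown comment."""
    payload = base64.b64encode(json.dumps(view).encode("utf-8")).decode("ascii")
    return f"<!-- codedoc-ai-view-base64 {payload} -->"


def extract_view(markdown: str) -> Optional[dict]:
    """Return the embedded view, or None on any failure so callers can fall back."""
    match = _VIEW_RE.search(markdown)
    if match is None:
        return None
    try:
        raw = base64.b64decode(match.group(1), validate=True)
        view = json.loads(raw.decode("utf-8"))
    except ValueError:  # covers base64, UTF-8, and JSON decoding errors
        return None
    return view if isinstance(view, dict) else None


view = {"files": {"src/a.py": {"hash": "abc", "summary": "text with --> inside"}}}
document = "# Docs\n" + embed_view(view) + "\nVisible summary\n"
print(extract_view(document) == view)
```

The standard base64 alphabet contains no hyphen, so generated text containing `--` or `-->` cannot terminate the comment early, which is the comment-safety point the changelog makes.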
 ### Issue log (`error.log`)
@@ -676,11 +787,64 @@ Only hard file failures are surfaced there.
 ### Ownership guard
-Before writing, `codedoc` checks that any existing file at the target path was
-produced by codedoc (a `_codedoc` metadata block in JSON, or a `<!-- codedoc-ai: -->`
-comment in Markdown). If the file is foreign, malformed, or empty, the run stops
-with a clear `ConfigError` instead of overwriting it. Choose a different
-`--output` directory or remove the conflicting file to proceed.
+`codedoc` checks that any existing file at the target path was produced by
+codedoc (a `_codedoc` metadata block in JSON, or a `<!-- codedoc-ai: -->` comment
+in Markdown). If the file is foreign, malformed, or empty, the run stops with a
+clear `ConfigError`. Choose a different `--output` directory or remove the
+conflicting file to proceed.
+**Preflight (0.9.0).** The ownership check now runs *before* any filesystem
+changes, directory creation, scanning, or LLM calls. A foreign target that would
+block the final write is caught immediately — no tokens are spent and no output
+directory is created.
+## Planning, Cost Guardrails, and CI
+Use `codedoc run --dry-run --max-files 25` to inspect a run safely. Dry-run
+uses the same routing plan as real execution. It may read source, existing
+outputs, live backups, and legacy checkpoints, but it does not create an output
+directory, write `error.log`, initialize `SafeWriter`, create a provider, or
+call an API. It works without an API key.
+Token figures use a simple character heuristic. Dry-run input totals are
+explicitly lower bounds because the documentation prompt includes earlier
+agent responses that do not exist during planning. No monetary estimate is
+provided.
+`--max-files N` counts only files that would actually make LLM calls after
+unchanged skipping, identical-content reuse, and eligible checkpoint reuse. A
+real run exceeding the cap exits `2` before persistent mutation or provider
+creation. Dry-run still exits `0` and reports that the equivalent real run
+would fail.
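Reviewer note (not part of the diff): the cap rule above can be sketched as a check over a precomputed plan. This is an illustration under assumptions, not the package's implementation: `PlannedFile`, `check_paid_cap`, and the route labels are hypothetical names, and only the exit codes `0` and `2` come from the text.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PlannedFile:
    path: str
    route: str  # hypothetical labels: "unchanged", "reused", "resumed", "paid"


def check_paid_cap(plan: Sequence[PlannedFile], max_files: int, dry_run: bool) -> Tuple[int, str]:
    """Return (exit_code, message) for the paid-file cap; 0 means unlimited."""
    # Only files that would actually make LLM calls count toward the cap.
    paid = sum(1 for entry in plan if entry.route == "paid")
    if max_files <= 0 or paid <= max_files:
        return 0, f"{paid} file(s) would make LLM calls"
    if dry_run:
        # Dry-run still succeeds, but reports what a real run would do.
        return 0, f"{paid} paid file(s) exceed the cap of {max_files}; a real run would exit 2"
    return 2, f"{paid} paid file(s) exceed the cap of {max_files}"


plan = [
    PlannedFile("src/a.py", "paid"),
    PlannedFile("src/b.py", "unchanged"),
    PlannedFile("src/c.py", "reused"),
    PlannedFile("src/d.py", "paid"),
]
print(check_paid_cap(plan, max_files=1, dry_run=True))
print(check_paid_cap(plan, max_files=1, dry_run=False))
```

Running the check on the plan before any directory, writer, or provider is created is what keeps a cap failure free of side effects.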
+Force selected files with repeatable options:
+```bash
+codedoc run --force-files src/a.py --force-files src/b.py
+```
+Explicitly forced files bypass unchanged, identical-content, and checkpoint
+reuse. They are added before normal dependency propagation; propagated
+dependents retain normal reuse behavior.
+CLI exit codes:
+| Code | Meaning |
+| --- | --- |
+| `0` | Success, dry-run success, or explicitly allowed partial output. |
+| `1` | File-processing failure, output/write failure, or unexpected fatal error. |
+| `2` | Invalid input/config/path, ownership conflict, cap exceeded, or provider initialization failure. |
+| `130` | Keyboard interrupt. |
+`--allow-partial` changes only completed runs with file-level failures. Setup,
+ownership, cap, provider initialization, write, and unexpected fatal errors
+remain nonzero.
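Reviewer note (not part of the diff): a small sketch of the exit-code table and the `--allow-partial` rule, useful when reasoning about CI behaviour. The numeric codes come from the table above; the `ExitCode` enum, the `exit_code_for` function, and the outcome labels are hypothetical.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    FAILURE = 1
    SETUP_ERROR = 2
    INTERRUPTED = 130


# Hypothetical outcome labels grouped the way the table groups them.
_SETUP_OUTCOMES = {
    "config_error",
    "ownership_conflict",
    "cap_exceeded",
    "provider_init_failure",
}


def exit_code_for(outcome: str, allow_partial: bool = False) -> ExitCode:
    """Map a run outcome to a CLI exit code."""
    if outcome in ("ok", "dry_run"):
        return ExitCode.SUCCESS
    if outcome == "partial":
        # Only completed runs with file-level failures are affected by the flag.
        return ExitCode.SUCCESS if allow_partial else ExitCode.FAILURE
    if outcome in _SETUP_OUTCOMES:
        return ExitCode.SETUP_ERROR
    if outcome == "interrupt":
        return ExitCode.INTERRUPTED
    # Write failures and unexpected fatal errors stay at 1 regardless of the flag.
    return ExitCode.FAILURE


print(int(exit_code_for("partial", allow_partial=True)))
print(int(exit_code_for("cap_exceeded", allow_partial=True)))
```

A CI job can then treat `2` as a configuration problem to fix before retrying and `1` as a run that started but did not finish cleanly.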
+A packaged manual-only GitHub Actions example is installed at
+`codedoc/templates/github-actions-codedoc.yml`. It performs a dry-run before
+the paid run, applies the same cap to both, uploads documentation as an
+artifact, uses `contents: read`, and never commits or pushes. Selected source
+is sent to an external provider and API usage may cost money.
 ### More detail
@@ -757,7 +921,10 @@ CLI flags map directly to config keys:
 | `--output` | `output_dir` |
 | `--format` | `output_format` |
 | `--ignore` | `ignore_paths` |
-| `--safe-mode` | `safe_mode: True` |
+| `--dry-run` | `dry_run: True` |
+| `--max-files` | `max_files` |
+| `--force-files` | `force_files` |
+| `--allow-partial` | `allow_partial: True` |
 | `--no-parallel` | `parallel_agents: False` |
 | `--max-parallel-files` | `max_parallel_files` |
 | `--verbose` | `log_level: "DEBUG"` |
