codedoc-ai 0.7.1__tar.gz → 0.8.0__tar.gz

Files changed (83)
  1. codedoc_ai-0.8.0/CHANGELOG.md +550 -0
  2. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/MANIFEST.in +1 -0
  3. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/PKG-INFO +127 -12
  4. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/README.md +754 -639
  5. codedoc_ai-0.8.0/RUN_FLOW.md +642 -0
  6. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/__init__.py +18 -18
  7. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/cli/cli.py +70 -19
  8. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/core/__init__.py +10 -0
  9. codedoc_ai-0.8.0/codedoc/core/checkpoint.py +210 -0
  10. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/core/loader.py +337 -282
  11. codedoc_ai-0.8.0/codedoc/core/output.py +159 -0
  12. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/core/project_view.py +23 -0
  13. codedoc_ai-0.8.0/codedoc/core/safe_writer.py +364 -0
  14. codedoc_ai-0.8.0/codedoc/pipeline.py +1263 -0
  15. codedoc_ai-0.8.0/codedoc/utils/errors.py +186 -0
  16. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/utils/logger.py +47 -47
  17. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc_ai.egg-info/PKG-INFO +127 -12
  18. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc_ai.egg-info/SOURCES.txt +4 -0
  19. codedoc_ai-0.8.0/codedoc_ai.egg-info/requires.txt +14 -0
  20. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/pyproject.toml +60 -60
  21. codedoc_ai-0.8.0/tests/test_080_features.py +1184 -0
  22. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/test_scenarios.py +930 -798
  23. codedoc_ai-0.7.1/CHANGELOG.md +0 -223
  24. codedoc_ai-0.7.1/codedoc/core/output.py +0 -103
  25. codedoc_ai-0.7.1/codedoc/pipeline.py +0 -768
  26. codedoc_ai-0.7.1/codedoc/utils/errors.py +0 -112
  27. codedoc_ai-0.7.1/codedoc_ai.egg-info/requires.txt +0 -14
  28. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/.env.example +0 -0
  29. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/.github/ISSUE_TEMPLATE/bug_report.md +0 -0
  30. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/.github/ISSUE_TEMPLATE/feature_request.md +0 -0
  31. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/.github/PULL_REQUEST_TEMPLATE.md +0 -0
  32. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/CODE_OF_CONDUCT.md +0 -0
  33. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/CONTRIBUTING.md +0 -0
  34. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/LICENSE +0 -0
  35. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/SECURITY.md +0 -0
  36. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/__main__.py +0 -0
  37. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/agents/__init__.py +0 -0
  38. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/agents/base_agent.py +0 -0
  39. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/agents/dependency_agent.py +0 -0
  40. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/agents/documentation_agent.py +0 -0
  41. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/agents/orchestrator.py +0 -0
  42. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/agents/structure_agent.py +0 -0
  43. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/bootstrap.py +0 -0
  44. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/cli/__init__.py +0 -0
  45. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/core/db.py +0 -0
  46. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/core/graph.py +0 -0
  47. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/core/queue.py +0 -0
  48. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/core/scanner.py +0 -0
  49. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/llm/__init__.py +0 -0
  50. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/llm/api_provider.py +0 -0
  51. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/llm/base.py +0 -0
  52. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/llm/factory.py +0 -0
  53. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/llm/local_provider.py +0 -0
  54. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/parser/__init__.py +0 -0
  55. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/parser/factory.py +0 -0
  56. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/parser/generic_parser.py +0 -0
  57. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/parser/python_parser.py +0 -0
  58. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/parser/react_parser.py +0 -0
  59. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc/utils/__init__.py +0 -0
  60. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc_ai.egg-info/dependency_links.txt +0 -0
  61. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc_ai.egg-info/entry_points.txt +0 -0
  62. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/codedoc_ai.egg-info/top_level.txt +0 -0
  63. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/setup.cfg +0 -0
  64. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/__init__.py +0 -0
  65. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/fixtures/flutter_app/app.dart +0 -0
  66. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/fixtures/flutter_app/main.dart +0 -0
  67. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/fixtures/java_app/Main.java +0 -0
  68. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/fixtures/java_app/Service.java +0 -0
  69. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/fixtures/python_app/main.py +0 -0
  70. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/fixtures/python_app/models.py +0 -0
  71. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/fixtures/python_app/utils.py +0 -0
  72. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/fixtures/react_app/App.tsx +0 -0
  73. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/fixtures/react_app/index.html +0 -0
  74. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/fixtures/react_app/main.tsx +0 -0
  75. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/fixtures/react_app/router.tsx +0 -0
  76. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/fixtures/react_sample.tsx +0 -0
  77. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/test_agents.py +0 -0
  78. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/test_graph.py +0 -0
  79. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/test_llm_mock.py +0 -0
  80. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/test_parser.py +0 -0
  81. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/test_pipeline.py +0 -0
  82. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/test_queue.py +0 -0
  83. {codedoc_ai-0.7.1 → codedoc_ai-0.8.0}/tests/test_scanner.py +0 -0
@@ -0,0 +1,550 @@
1
+ # Changelog
2
+
3
+ ## 0.8.0 - 2026-05-31
4
+
5
+ ### Always-on live JSON crash backup, parallel crash-safety, rate-limit adaptive parallelism, error.log overhaul
6
+
7
+ 0.8.0 closes the known crash-safety and output-safety gaps end to end.
8
+
9
+ ---
10
+
11
+ #### Work Item 1 — Always-on live JSON backup (replaces hidden checkpoint)
12
+
13
+ Every run now writes a visible live JSON backup that is updated after each completed file.
14
+ `--safe-mode` is deprecated and kept only for backwards compatibility — it now prints a
15
+ deprecation notice and has no additional effect.
16
+
17
+ - **`codedoc/core/safe_writer.py`** (overhauled): `SafeWriter` is now the default recorder.
18
+ Constructor now accepts a pre-computed `backup_path: Path` directly. The live backup
19
+ always starts with a `_crash_safety` banner as the first JSON key so interrupted files are
20
+ immediately recognisable as crash-recovery backups. Three new methods:
21
+ `initialize_empty()` — writes the banner before any AI call;
22
+ `set_queue_order()` — controls the `files` array order (topological / queue order, not
23
+ alphabetical); `has_record()` — deduplication check for retry logic.
24
+ `delete()` removes the live backup for MD-only runs after a clean Markdown conversion.
25
+ If deletion fails (Windows file-lock) a warning is logged and the path is reported so the
26
+ user knows the leftover file is safe to remove manually.
27
+
28
+ - **`codedoc/pipeline.py`** — `_resolve_live_backup_path()` helper centralises all backup
29
+ path logic, including the named-MD sibling case (`--output docs/report.md` → live backup
30
+ at `docs/report.json`). `SafeWriter` is always created regardless of `--safe-mode`.
31
+ `initialize_empty()` is called before `create_provider()` so the backup exists even if
32
+ provider initialisation fails. The topological order is passed to `set_queue_order()`.
33
+ Old `.codedoc_progress.json` checkpoints are migrated on the first run that finds no live
34
+ backup and deleted afterwards. New stats keys returned:
35
+ `live_backup_path` (absolute path to live backup), `error_log` (absolute path, set when
36
+ any issue is recorded), `issues_recorded` (total count), `rate_limit_warnings` (list of
37
+ step-down events).
38
+
39
+ - **`codedoc/core/output.py`**: removed the intermediate `.codedoc_build.json` write for
40
+ `--format md` runs. Markdown is written directly from the in-memory view; crash safety
41
+ is provided by the live JSON backup. `BUILD_FILENAME` is kept only for reading/migrating
42
+ stale 0.7.x build files.
43
+
44
+ - **`codedoc/core/loader.py`**: updated `_load_existing_file_docs()` to accept
45
+ `live_backup_path` so the named-MD sibling (`report.json`) is probed before the default
46
+ `json_filename`.
47
+
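The live-backup behaviour described in Work Item 1 can be sketched roughly as follows. This is a minimal illustration, not the package's `SafeWriter`: the class name `LiveBackupSketch`, the banner wording, and the exact shape of each `files` entry are assumptions; only the method names `initialize_empty()`, `set_queue_order()`, and `has_record()` and the `_crash_safety` key come from the changelog. Atomic writes are left out here to keep the sketch short.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class LiveBackupSketch:
    """Hypothetical stand-in for SafeWriter; not the package's real class."""

    def __init__(self, backup_path: Path) -> None:
        self.backup_path = backup_path
        self._records: dict[str, dict] = {}
        self._queue_order: list[str] = []

    def set_queue_order(self, order: list[str]) -> None:
        # Remember the processing order so `files` is not sorted alphabetically.
        self._queue_order = list(order)

    def has_record(self, rel_path: str) -> bool:
        # Deduplication check that retry logic can consult.
        return rel_path in self._records

    def initialize_empty(self) -> None:
        # Write the banner and an empty `files` array before any AI call.
        self._flush()

    def record(self, rel_path: str, doc: dict) -> None:
        self._records[rel_path] = doc
        self._flush()

    def _flush(self) -> None:
        rank = {path: i for i, path in enumerate(self._queue_order)}
        # Paths missing from the queue order sort last, then by name.
        ordered = sorted(self._records, key=lambda p: (rank.get(p, len(rank)), p))
        payload = {
            # Inserted first so it is the first key in the JSON text (assumed wording).
            "_crash_safety": "Partial output from an unfinished run.",
            "files": [{"path": p, **self._records[p]} for p in ordered],
        }
        self.backup_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
```

Python dictionaries keep insertion order and `json.dumps` preserves it, which is what keeps the banner at the top of the file in this sketch.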
48
+ #### Work Item 2 — Parallel crash-safety: record in worker thread
49
+
50
+ Previously a Ctrl-C or crash during parallel processing could discard a completed file's
51
+ result because `recorder.record()` was called in the main `as_completed` loop.
52
+
53
+ - **`codedoc/pipeline.py`** — `_process_and_record()` wrapper calls `recorder.record()`
54
+ inside the worker thread before returning, so a crash between worker completion and main
55
+ collection never loses a result. The main loop no longer calls `recorder.record()` in the
56
+ parallel path. `has_record()` is checked before adding a descriptor to the retry list so
57
+ a file that already recorded before batch cancellation is not submitted twice.
58
+
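A rough sketch of the idea in Work Item 2, assuming a thread pool and an in-memory recorder. `MemoryRecorder`, `document_file`, `run_parallel`, and `files_to_retry` are hypothetical names used for illustration; only the pattern (record inside the worker, check `has_record()` before retrying) is taken from the changelog.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


class MemoryRecorder:
    """Hypothetical thread-safe recorder standing in for the real one."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.records: dict[str, dict] = {}

    def record(self, rel_path: str, doc: dict) -> None:
        with self._lock:
            self.records[rel_path] = doc

    def has_record(self, rel_path: str) -> bool:
        with self._lock:
            return rel_path in self.records


def document_file(rel_path: str) -> dict:
    # Placeholder for the LLM call; returns a fake documentation record.
    return {"summary": f"docs for {rel_path}"}


def process_and_record(rel_path: str, recorder: MemoryRecorder) -> str:
    doc = document_file(rel_path)
    # Persist inside the worker, before returning, so an interrupt between
    # worker completion and main-loop collection cannot drop this result.
    recorder.record(rel_path, doc)
    return rel_path


def run_parallel(paths: list[str], recorder: MemoryRecorder, max_workers: int = 4) -> list[str]:
    done = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_and_record, p, recorder) for p in paths]
        for future in as_completed(futures):
            # The main loop only collects; it no longer records.
            done.append(future.result())
    return done


def files_to_retry(candidates: list[str], recorder: MemoryRecorder) -> list[str]:
    # Skip anything that already recorded before a batch was cancelled.
    return [p for p in candidates if not recorder.has_record(p)]
```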
59
+ #### Work Item 3 — Adaptive parallelism on rate limits
60
+
61
+ When a provider signals 429 / rate-limit / too-many-requests, file concurrency is stepped
62
+ down through a ladder instead of hammering the API at the original concurrency.
63
+
64
+ - **`codedoc/pipeline.py`**:
65
+ - `_is_rate_limit_error()` — walks the full `__cause__`/`__context__` chain; covers
66
+ OpenAI (`429`, `rate_limit_exceeded`, `tpm`), Anthropic (`529`, `overloaded`), and
67
+ Gemini (`RESOURCE_EXHAUSTED`, `quota`).
68
+ - `_build_default_ladder()` — generates the step-down ladder for any
69
+ `max_parallel_files` value (e.g. `5 → [5, 2, 1]`, `10 → [10, 5, 1]`).
70
+ - `_process_descriptor_batch()` — processes one ladder level and classifies results as
71
+ succeeded / retry-rate-limited / failed-non-rate-limit.
72
+ - `_process_agent_files()` — iterates the ladder, collects step-down events into
73
+ `stats["rate_limit_warnings"]`, prints a provider-specific WARNING to stdout on each
74
+ step-down with the provider name and original `max_parallel_files` value.
75
+ - `_parse_retry_after()` — extracts `Retry-After` sleep delays from error messages;
76
+ applied in sequential mode too when `respect_retry_after = True`.
77
+ - **`codedoc/core/loader.py`**: added `rate_limit_adaptive`, `parallel_ladder`,
78
+ `respect_retry_after`, `retry_after_cap_s` to `DEFAULTS`; full `parallel_ladder`
79
+ validation in `_validate()` (strictly decreasing, clamped to `max_parallel_files`,
80
+ trailing `1` appended if missing).
81
+
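The detection and `Retry-After` handling listed above might look something like the sketch below. It is an approximation under stated assumptions: the marker tuple is a subset of the signals named in the changelog, matching is a plain substring test on the exception text, and the regular expression and the `cap_s` default are guesses rather than the package's actual values.

```python
from __future__ import annotations

import re

# Subset of the signals named in the changelog; substring matching is a simplification.
RATE_LIMIT_MARKERS = (
    "429", "rate_limit_exceeded", "rate limit", "too many requests",
    "529", "overloaded", "resource_exhausted", "quota",
)


def iter_exception_chain(exc: BaseException):
    """Yield exc and everything reachable via __cause__ / __context__."""
    seen = set()
    stack = [exc]
    while stack:
        current = stack.pop()
        if current is None or id(current) in seen:
            continue  # guard against cycles and missing links
        seen.add(id(current))
        yield current
        stack.append(current.__cause__)
        stack.append(current.__context__)


def is_rate_limit_error(exc: BaseException) -> bool:
    for link in iter_exception_chain(exc):
        text = str(link).lower()
        if any(marker in text for marker in RATE_LIMIT_MARKERS):
            return True
    return False


def parse_retry_after(message: str, cap_s: float = 60.0) -> float | None:
    """Extract a Retry-After delay in seconds from an error message, capped at cap_s."""
    match = re.search(r"retry[-_ ]after\D{0,5}(\d+(?:\.\d+)?)", message, re.IGNORECASE)
    if not match:
        return None
    return min(float(match.group(1)), cap_s)
```

Walking both `__cause__` and `__context__` matters because provider SDK errors are often wrapped by a more generic exception before they reach the pipeline.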
82
+ #### Work Item 4 — `error.log` discoverability and `ErrorReporter` severity
83
+
84
+ - **`codedoc/utils/errors.py`**: `ErrorReporter.record()` gains a `level` parameter
85
+ (`"error"` / `"warning"`). `has_errors()` and `error_count()` count only error-level
86
+ entries. `has_issues()` and `issue_count()` count all entries. `summary()` returns `""`
87
+ for warning-only runs so recovered rate-limits never appear in the final `codedoc.json`
88
+ `errors` field or the Markdown `## Errors` section. Log header changed from `error(s)` to
89
+ `issue(s)`.
90
+ - **`codedoc/pipeline.py`**: `ErrorReporter` is now initialised with
91
+ `output_dir / "error.log"` instead of `root / "error.log"`. `stats["error_log"]` and
92
+ `stats["issues_recorded"]` are set on every return path (not only when `failed > 0`).
93
+ Rate-limit health-check notes are recorded as `level="warning"` so they appear in
94
+ `error.log` for diagnostics but do not surface in the final output.
95
+ - **`codedoc/cli/cli.py`**: the error log path is always printed when
96
+ `stats["issues_recorded"] > 0`; message distinguishes "file(s) failed" from "issue(s)
97
+ recorded (all recovered)". Rate-limit step-down warnings are printed to stdout.
98
+ `--safe-mode` help updated to `[DEPRECATED]`.
99
+
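The severity split in Work Item 4 can be illustrated with a small sketch. `ErrorReporterSketch` is a hypothetical class; the method names and the two levels follow the changelog, while the entry layout and the summary format are assumptions.

```python
from __future__ import annotations


class ErrorReporterSketch:
    """Hypothetical reporter that separates errors from recovered warnings."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, str, str]] = []

    def record(self, source: str, message: str, level: str = "error") -> None:
        if level not in ("error", "warning"):
            raise ValueError(f"unknown level: {level!r}")
        self._entries.append((level, source, message))

    def error_count(self) -> int:
        # Only error-level entries count as failures.
        return sum(1 for level, _, _ in self._entries if level == "error")

    def issue_count(self) -> int:
        # Every entry, warnings included, counts as an issue.
        return len(self._entries)

    def has_errors(self) -> bool:
        return self.error_count() > 0

    def has_issues(self) -> bool:
        return self.issue_count() > 0

    def summary(self) -> str:
        errors = [(s, m) for level, s, m in self._entries if level == "error"]
        if not errors:
            return ""  # warning-only runs contribute nothing to the final output
        return "\n".join(f"{source}: {message}" for source, message in errors)
```

With this split, a run that only hit recovered rate limits reports issues (so the log path can be printed) while its summary stays empty.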
100
+ #### Version
101
+
102
+ - `codedoc/__init__.py`, `pyproject.toml`, `cli.py`: `0.7.2` → `0.8.0`.
103
+
104
+ #### Tests
105
+
106
+ - `tests/test_scenarios.py`: updated 3 `SafeWriter` constructor calls to new `backup_path`
107
+ signature.
108
+ - `tests/test_080_features.py` *(new, 38 tests)*: covers live backup creation, banner
109
+ presence, queue order, parallel crash-safety, ownership guard, resume, hash-change
110
+ reprocess, checkpoint migration, rate-limit ladder, signal detector (OpenAI/Anthropic/
111
+ Gemini/false-positives/cause-chain), provider notifications, error.log location and stats,
112
+ deprecation notice, `--format both` behaviour, stats keys, ladder validation,
113
+ no-files early return, and warning exclusion from final output.
114
+
115
+ **All 163 tests pass** (125 existing + 38 new).
116
+
117
+ ---
118
+
119
+ **Behaviour on interrupt and resume (0.8.0 default — always-on live backup):**
120
+ 1. User runs `codedoc run --entry src/main.py` on a 100-file project.
121
+ 2. Before the first LLM call, `codedoc/codedoc.json` is created with a `_crash_safety`
122
+ banner and an empty `files` array.
123
+ 3. After every completed file, `codedoc/codedoc.json` is updated atomically (`.tmp` rename).
124
+ 4. Run is interrupted (Ctrl-C, crash) after 60 files. `codedoc/codedoc.json` contains 60
125
+ complete file records in topological order, clearly marked with `_crash_safety` as
126
+ partial output.
127
+ 5. User re-runs; `codedoc.json` is read (including in-progress entries), 60 unchanged files
128
+ are skipped, only the remaining 40 are sent to the LLM.
129
+ 6. On clean completion, `write_project_outputs` overwrites `codedoc.json` with a final
130
+ clean output (no `_crash_safety`, no `status = "in_progress"`).
131
+
132
+ **MD-only and named-MD runs:**
133
+ - `--format md`: live backup is `codedoc/codedoc.json`; removed automatically on clean
134
+ Markdown write. On interrupt, the JSON sibling remains as the resume source.
135
+ - `--output docs/report.md`: live backup is `docs/report.json` (sibling derived from the
136
+ Markdown stem); removed on clean success.
137
+
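The path rules above can be sketched with `pathlib`. The function name and signature here are hypothetical and the real `_resolve_live_backup_path()` may take different inputs; the two mappings shown (`docs/report.md` to `docs/report.json`, and an output directory to `codedoc.json` inside it) are the ones the changelog describes.

```python
from pathlib import Path


def resolve_live_backup_path(output: Path, json_filename: str = "codedoc.json") -> Path:
    """Return where the live JSON backup would sit for a given --output value."""
    suffix = output.suffix.lower()
    if suffix == ".md":
        # Named Markdown file: the backup is the same-stem JSON sibling.
        return output.with_suffix(".json")
    if suffix == ".json":
        # Named JSON file: the backup is the output file itself (assumption).
        return output
    # Otherwise treat the value as an output directory.
    return output / json_filename
```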
138
+ **Rate-limit step-down example:**
139
+ ```
140
+ [OpenAI] Rate limit detected - your configured max_parallel_files (5) has been
141
+ reduced to 2. Retrying 4 remaining file(s) at lower concurrency.
142
+ ```
143
+
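The step-down from 5 to 2 in the message above is consistent with a ladder of the form `[n, n // 2, 1]`, which also reproduces both examples given for `_build_default_ladder()`. That formula is a guess that fits the two documented cases, not the confirmed implementation. The validation sketch likewise assumes that a non-decreasing ladder is rejected with an error; the changelog does not say whether the real `_validate()` raises or repairs.

```python
from __future__ import annotations


def build_default_ladder(max_parallel_files: int) -> list[int]:
    """Guess at the default ladder: full, half, then sequential."""
    ladder: list[int] = []
    for step in (max_parallel_files, max_parallel_files // 2, 1):
        step = max(1, step)
        # Keep the ladder strictly decreasing by dropping repeats.
        if not ladder or step < ladder[-1]:
            ladder.append(step)
    return ladder


def normalize_ladder(ladder: list[int], max_parallel_files: int) -> list[int]:
    """Clamp to max_parallel_files, require strict decrease, end with 1."""
    cleaned: list[int] = []
    for raw in ladder:
        step = max(1, min(int(raw), max_parallel_files))
        if cleaned and step >= cleaned[-1]:
            raise ValueError("parallel_ladder must be strictly decreasing")
        cleaned.append(step)
    if not cleaned or cleaned[-1] != 1:
        cleaned.append(1)  # always finish at sequential processing
    return cleaned
```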
144
+ ---
145
+
146
+ ## 0.7.2 - 2026-05-30
147
+
148
+ ### Added: incremental progress checkpoint + `--safe-mode` live output + MD intermediate + ownership guard
149
+
150
+ This release fully solves the data-loss-on-interrupt problem for every output format and run
151
+ mode. It also adds the first line of defence against codedoc accidentally overwriting files
152
+ it did not create.
153
+
154
+ ---
155
+
156
+ #### Checkpoint (always-on, default behaviour)
157
+
158
+ Reverses the 0.6.4 decision ("no per-file checkpoint writes during a run") by introducing a
159
+ lightweight, thread-safe checkpoint file that persists each file result to disk the moment it
160
+ completes, for all output formats (JSON, MD, and both).
161
+
162
+ - `codedoc/core/checkpoint.py` *(new)*: `Checkpoint` class — writes `.codedoc_progress.json`
163
+ to the output directory after every file. Writes are atomic: content is serialised to a
164
+ `.tmp` sibling first, then renamed into place so a crash mid-write never leaves a corrupt
165
+ file. Thread-safe via a per-instance lock; safe to call from parallel worker threads.
166
+ - `codedoc/core/__init__.py`: exported `Checkpoint` in `__all__` and the lazy `__getattr__`
167
+ dispatcher, consistent with all other public core exports.
168
+
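The write pattern described for `Checkpoint` (serialise to a `.tmp` sibling, rename into place, guard with a per-instance lock) can be sketched as below. `AtomicJsonFile` is a hypothetical class for illustration; `os.replace` is used as the rename step because it overwrites an existing target on both POSIX and Windows.

```python
from __future__ import annotations

import json
import os
import tempfile
import threading
from pathlib import Path


class AtomicJsonFile:
    """Hypothetical checkpoint-style file with atomic, thread-safe writes."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._lock = threading.Lock()
        self._data: dict[str, dict] = {}

    def record(self, key: str, value: dict) -> None:
        with self._lock:  # one writer at a time, safe from worker threads
            self._data[key] = value
            tmp_path = self.path.with_name(self.path.name + ".tmp")
            # Write the full content to the sibling first ...
            tmp_path.write_text(json.dumps(self._data, indent=2), encoding="utf-8")
            # ... then swap it in, so readers never see a half-written file.
            os.replace(tmp_path, self.path)
```

A crash during `write_text` leaves only the `.tmp` sibling damaged; the previous checkpoint stays intact.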
169
+ #### `--safe-mode` (opt-in, visible partial output)
170
+
171
+ Adds a `--safe-mode` CLI flag and matching `safe_mode` config key / `CODEDOC_SAFE_MODE`
172
+ environment variable. When active, `Checkpoint` is replaced by `SafeWriter`, which writes
173
+ directly to the real output file after every completed file — so the output always contains
174
+ whatever has been documented so far, even if the run is interrupted.
175
+
176
+ - `codedoc/core/safe_writer.py` *(new)*: `SafeWriter` class — same thread-safe, atomic-write
177
+ design as `Checkpoint`, but the target is the real output file rather than a hidden
178
+ intermediate. The partial JSON embeds `_codedoc.status = "in_progress"` so subsequent runs
179
+ can distinguish it from a completed output and resume correctly.
180
+ - **JSON / both format**: target is `codedoc.json`. The final `write_project_outputs` call
181
+ overwrites it with the complete, polished output — no separate cleanup required.
182
+ - **MD-only format**: target is `.codedoc_build.json` (internal build file, see below).
183
+ After a successful MD write, `SafeWriter.delete()` removes it. On failure it is
184
+ preserved so the user still has partial output and a re-run resumes automatically.
185
+ - `codedoc/core/project_view.py`: added public `clean_file_record()` wrapper around the
186
+ internal `_clean_file()` so `SafeWriter` can produce structurally identical file entries to
187
+ what `build_project_view` would produce.
188
+ - `codedoc/core/__init__.py`: exported `SafeWriter`.
189
+ - `codedoc/core/loader.py`: added `"safe_mode": False` to `DEFAULTS`, `"CODEDOC_SAFE_MODE"`
190
+ to `_ENV_KEY_MAP`, and bool-coercion in `_validate()` (env vars arrive as strings).
191
+ - `codedoc/pipeline.py`:
192
+ - `run_pipeline`: creates either `SafeWriter` or `Checkpoint` depending on `safe_mode`;
193
+ both are referred to via the `recorder` variable. Calls `recorder.record()` /
194
+ `recorder.delete()` uniformly — the recorder type determines the behaviour.
195
+ - `_process_agent_files` / `_process_files_sequentially`: parameter renamed
196
+ `checkpoint` → `recorder`; type annotation updated to `Checkpoint | SafeWriter`.
197
+ - `_resolve_entry_and_docs`: always probes the JSON candidate and build file before MD,
198
+ regardless of the current `--format` setting, enabling cross-format and build-file resume.
199
+ - `codedoc/cli/cli.py`: added `--safe-mode` flag; `KeyboardInterrupt` message updated;
200
+ `Files resumed` summary line added.
201
+
202
+ #### MD-only runs now always produce a JSON intermediate before converting
203
+
204
+ Previously a `--format md` run held all results in RAM and wrote one file at the end — a
205
+ crash before that point lost everything. Now `write_project_outputs` for MD format writes
206
+ the full result to `.codedoc_build.json` **before** starting the Markdown conversion.
207
+
208
+ - On successful MD write → `.codedoc_build.json` is deleted automatically.
209
+ - On failure (exception, crash during conversion) → `.codedoc_build.json` is preserved;
210
+ codedoc logs its location. Re-running the same command loads it via the incremental hash
211
+ check and re-attempts the conversion without any LLM calls.
212
+
213
+ `--format both` is unaffected: the JSON output itself serves as the durable intermediate.
214
+
215
+ #### Internal build file (`.codedoc_build.json`)
216
+
217
+ `BUILD_FILENAME = ".codedoc_build.json"` (exported from `codedoc.core.output`) names the
218
+ internal intermediate file used by both `write_project_outputs` (MD-only runs) and
219
+ `SafeWriter` (safe-mode MD runs). The dot-prefix marks it as a system-managed file — not a
220
+ final output, not user-editable.
221
+
222
+ - `codedoc/pipeline.py` — `_load_existing_file_docs`: loads from both `codedoc.json`
223
+ (baseline) and `.codedoc_build.json` (newer-run overlay) and **merges** them. Build-file
224
+ records take priority per-file so that LLM work completed in an interrupted newer run is
225
+ never discarded just because an older `codedoc.json` already exists.
226
+ - `codedoc/pipeline.py` — `_resolve_entry_and_docs`: adds `.codedoc_build.json` to the
227
+ candidate list so the entry file is recoverable from a partial build file.
228
+
229
+ #### Ownership guard before writing output files
230
+
231
+ `write_project_outputs` and `SafeWriter` now verify that any existing file at the target path
232
+ was produced by codedoc before allowing an overwrite. If the file does **not** carry a
233
+ `_codedoc` metadata block (JSON) or `<!-- codedoc-ai: -->` comment (Markdown), a
234
+ `ConfigError` is raised — codedoc refuses to overwrite data it did not create.
235
+
236
+ - `codedoc/core/output.py`: `_check_file_ownership(path)` — raises `ConfigError` for
237
+ non-codedoc files; passes silently for new files or files codedoc owns. The check now
238
+ covers `json_path`, `md_path`, **and** `build_path` (`.codedoc_build.json`).
239
+ - `codedoc/core/safe_writer.py`: `load()` now raises `ConfigError` at startup when the
240
+ target file exists but has no `_codedoc` block, preventing SafeWriter from ever flushing
241
+ over a foreign file during the run.
242
+ - `codedoc/cli/cli.py`: `ConfigError` is surfaced with an `"Error: ..."` prefix (matching
243
+ `FileNotFoundError`) rather than `"Fatal error: ..."`, giving the user a clean actionable
244
+ message without a traceback.
245
+
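A minimal sketch of the ownership check, assuming the markers named above (`_codedoc` in JSON, a `<!-- codedoc-ai:` comment in Markdown). The `ConfigError` class defined here is a local stand-in for the package's own exception, and treating unparseable JSON as foreign follows the behaviour the changelog attributes to `_check_file_ownership`.

```python
import json
import tempfile
from pathlib import Path


class ConfigError(Exception):
    """Local stand-in for the package's ConfigError."""


MD_MARKER = "<!-- codedoc-ai:"


def check_file_ownership(path: Path) -> None:
    """Raise ConfigError if path exists and does not look codedoc-generated."""
    if not path.exists():
        return  # new files are always fine
    text = path.read_text(encoding="utf-8", errors="replace")
    if path.suffix.lower() == ".md":
        owned = MD_MARKER in text
    else:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            data = None  # malformed JSON is treated as foreign
        owned = isinstance(data, dict) and "_codedoc" in data
    if not owned:
        raise ConfigError(f"Refusing to overwrite {path}: not created by codedoc")
```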
246
+ #### Fixed: modified files are re-documented when resuming from a checkpoint
247
+
248
+ When a run is interrupted and a file is edited before the user re-runs, the checkpoint entry
249
+ for that file is discarded and the file is re-documented rather than silently restoring stale
250
+ documentation.
251
+
252
+ - `codedoc/core/checkpoint.py`: `record()` now accepts an optional `file_hash` parameter.
253
+ When provided, the hash is stored inside the checkpoint entry under the reserved key
254
+ `"_checkpoint_hash"`.
255
+ - `codedoc/core/safe_writer.py`: `record()` updated with the same optional `file_hash`
256
+ parameter for interface consistency.
257
+ - `codedoc/pipeline.py`:
258
+ - Added `_safe_file_hash()` helper.
259
+ - Both `_process_agent_files` (parallel path) and `_process_files_sequentially` compute
260
+ and forward the file hash to `recorder.record()`.
261
+ - The routing loop uses three explicit branches:
262
+ 1. **No hash stored** (`stored_hash == ""`): checkpoint was written by code older than
263
+ 0.7.2 and cannot be verified — reprocess to avoid silently restoring potentially
264
+ stale documentation.
265
+ 2. **Hash mismatch** (`content_hash != stored_hash`): file was modified after it was
266
+ checkpointed — discard entry, reprocess.
267
+ 3. **Hash matches**: checkpoint entry is current — restore it and skip the LLM.
268
+ - The `"_checkpoint_hash"` key is stripped before the entry is stored in
269
+ `new_results`, so it never surfaces in the final output.
270
+
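The three routing branches above can be sketched as a single function. `route_checkpoint_entry` and its return shape are hypothetical; the reserved key `"_checkpoint_hash"` and the three outcomes follow the changelog. SHA-256 is assumed as the hash function.

```python
from __future__ import annotations

import hashlib


def route_checkpoint_entry(entry: dict, content_hash: str) -> tuple[str, dict | None]:
    """Decide whether a checkpoint entry can be restored for the current file."""
    stored_hash = entry.get("_checkpoint_hash", "")
    if stored_hash == "":
        # 1. No hash stored: the entry cannot be verified, so reprocess.
        return "reprocess", None
    if content_hash != stored_hash:
        # 2. Hash mismatch: the file changed after it was checkpointed.
        return "reprocess", None
    # 3. Hash matches: restore, stripping the reserved key from the result.
    restored = {k: v for k, v in entry.items() if k != "_checkpoint_hash"}
    return "restore", restored
```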
271
+ #### Fixed: hardening of the recovery / ownership work (review follow-ups)
272
+
273
+ Follow-up fixes to the recovery and ownership features above, found while
274
+ reviewing the release.
275
+
276
+ - `codedoc/core/safe_writer.py` — `SafeWriter.load()`:
277
+ - **No longer erases prior work on a safe-mode interrupt.** When a *completed*
278
+ `codedoc.json` already exists, its records are now pre-loaded into memory, so
279
+ the first per-file flush preserves them. Previously the first flush wrote
280
+ only the files processed in the current run, erasing previously completed
281
+ records if the run was then interrupted — making `--safe-mode` worse than the
282
+ default checkpoint. Records are now pre-loaded for both `in_progress`
283
+ intermediates and completed outputs.
284
+ - **Refuses to overwrite malformed / unreadable target files.** `load()` now
285
+ raises `ConfigError` when the target file cannot be parsed as JSON or is not a
286
+ JSON object with a `_codedoc` block, instead of logging a warning and starting
287
+ fresh (which would overwrite the foreign file on the first flush). This brings
288
+ `SafeWriter` in line with `_check_file_ownership` in `output.py`, which already
289
+ treated malformed files as foreign.
290
+ - The stale module docstring describing `codedoc.json` as the MD-only
291
+ intermediate was corrected to `.codedoc_build.json`.
292
+ - `codedoc/pipeline.py` — `_load_existing_file_docs()`: the `.codedoc_build.json`
293
+ overlay is now **freshness-gated**. A build file is only overlaid onto
294
+ `codedoc.json` when it is at least as new (by modification time). A build file
295
+ left behind by an earlier crashed MD run, after a later `--format json` run
296
+ rewrote `codedoc.json`, is now detected as stale, skipped, and removed — so older
297
+ build-file records can no longer silently replace newer JSON documentation (the
298
+ inverse of the merge case the overlay was added for).
299
+ - `codedoc/__init__.py`: `__version__` corrected from `0.7.0` to `0.7.2` to match
300
+ the CLI `--version` output and `pyproject.toml`.
301
+ - `OPENAI_RUN_FLOW.md` → `RUN_FLOW.md`: the run-flow / scenario reference was
302
+ renamed and generalised from OpenAI-only to cover all three providers (OpenAI,
303
+ Anthropic, Gemini) — correcting the API-key resolution and JSON-mode sections —
304
+ and four scenarios were added: newer vs. stale build-file overlay, safe-mode
305
+ resume with a completed output present, and malformed/foreign target files.
306
+ - `README.md`: documented the checkpoint recovery, `--safe-mode`, the
307
+ `.codedoc_build.json` intermediate, the ownership guard, and the
308
+ `CODEDOC_SAFE_MODE` environment variable; bumped the documented release to
309
+ `0.7.2`.
310
+
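The freshness gate described above for `_load_existing_file_docs()` can be sketched as a modification-time comparison followed by a per-file merge. Both function names are hypothetical, and the real code also removes a stale build file, which this sketch leaves out.

```python
import os
import tempfile
from pathlib import Path


def build_file_is_fresh(json_path: Path, build_path: Path) -> bool:
    """True when the build file may be overlaid onto the JSON baseline."""
    if not build_path.exists():
        return False
    if not json_path.exists():
        return True  # nothing newer to protect
    # Overlay only when the build file is at least as new as codedoc.json.
    return build_path.stat().st_mtime >= json_path.stat().st_mtime


def merge_records(baseline: dict, overlay: dict, overlay_is_fresh: bool) -> dict:
    """Per-file merge in which fresh build-file records take priority."""
    merged = dict(baseline)
    if overlay_is_fresh:
        merged.update(overlay)
    return merged
```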
311
+ ---
312
+
313
+ **Behaviour on interrupt and resume (default — Checkpoint):**
314
+ 1. User runs `codedoc run --entry src/main.py` on a 100-file project.
315
+ 2. Run is interrupted (Ctrl-C, crash) after 60 files complete.
316
+ 3. `.codedoc_progress.json` in the output directory holds all 60 results.
317
+ 4. User re-runs the same command; 60 files are restored from the checkpoint (hash-verified),
318
+ only the remaining 40 are sent to the LLM.
319
+ 5. On clean completion the checkpoint file is deleted automatically.
320
+
321
+ **Behaviour on interrupt and resume (`--safe-mode`):**
322
+ 1. User runs `codedoc run --safe-mode --entry src/main.py` on a 100-file project.
323
+ 2. After every file, the output file is updated with the results so far.
324
+ 3. Run is interrupted after 60 files; the output contains 60 complete file records.
325
+ 4. User re-runs; the existing hash-based incremental logic detects all 60 files as unchanged
326
+ and skips them automatically — only the remaining 40 are sent to the LLM.
327
+ 5. On clean completion `write_project_outputs` overwrites the output with the final polished
328
+ result (and `SafeWriter.delete()` removes the intermediate for MD-only runs).
329
+
330
+ ## 0.7.1 - 2026-05-25
331
+
332
+ ### Fixed: provider-specific default models not applied when `--model` is omitted (GitHub Issue #2)
333
+
334
+ - `codedoc/core/loader.py`: changed `DEFAULTS["model_name"]` from `"gpt-4o-mini"` to `""`.
335
+ - Previously, the global default `"gpt-4o-mini"` was a truthy string that short-circuited the `or` fallbacks in the provider factory for every provider. Running `--provider gemini` without `--model` would silently send requests to Gemini using the OpenAI model name `gpt-4o-mini`, causing a 404 from the Gemini API. The same bug applied to `--provider anthropic` without `--model`, which would have called Anthropic with `gpt-4o-mini` and failed.
336
+ - With an empty string default, the factory's per-provider fallbacks now activate correctly:
337
+ - Gemini with no model → `gemini-2.5-flash`
338
+ - Anthropic with no model → `claude-haiku-4-5-20251001`
339
+ - OpenAI / auto with no model → `gpt-4o-mini` (unchanged)
340
+ - Behaviour when `--model` is explicitly passed is unchanged.
341
+
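The short-circuit described in this fix is easy to reproduce in isolation. The sketch below assumes the factory resolves the model with an `or` fallback, as the entry states; `resolve_model` and the dictionary name are hypothetical, and the model names are the ones listed above.

```python
# Per-provider fallbacks as listed in the changelog entry.
PROVIDER_DEFAULT_MODELS = {
    "gemini": "gemini-2.5-flash",
    "anthropic": "claude-haiku-4-5-20251001",
    "openai": "gpt-4o-mini",
    "auto": "gpt-4o-mini",
}


def resolve_model(provider: str, configured_model: str) -> str:
    # A non-empty configured value wins; an empty string falls through.
    return configured_model or PROVIDER_DEFAULT_MODELS[provider]


# Old global default: truthy, so the Gemini fallback never activates.
old_result = resolve_model("gemini", "gpt-4o-mini")
# New global default: empty string, so the per-provider fallback applies.
new_result = resolve_model("gemini", "")
```

Here `old_result` is the OpenAI model name sent to the wrong provider, while `new_result` is the Gemini default.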
342
+ ## 0.7.0 - 2026-05-24
343
+
344
+ ### MD-only incremental now works (Issue 1)
345
+ - `_build_meta_comment` now embeds a `file_hashes` dict inside the `<!-- codedoc-ai: ... -->` metadata comment written at the top of every `codedoc.md`. Each entry maps a relative file path to its SHA-256 hash.
346
+ - `_load_existing_file_docs` now falls back to the MD file when no JSON exists. It reads hashes from the metadata comment and file records from the parsed MD content. Users who only ever run `--format md` no longer pay full LLM cost on every run.
347
+ - MD files generated before 0.7.0 have no `file_hashes`; the first 0.7.0 run re-processes everything once, then subsequent runs are incremental.
348
+ - Zero extra files: MD-only output remains a single file.
349
+
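One way to picture the `file_hashes` metadata is a JSON payload inside the HTML comment. The changelog does not show the comment's actual format, so the JSON encoding, the regular expression, and both function names below are assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import json
import re

# Assumed layout: <!-- codedoc-ai: {json} -->
META_RE = re.compile(r"<!--\s*codedoc-ai:\s*(\{.*?\})\s*-->", re.DOTALL)


def build_meta_comment(file_hashes: dict[str, str]) -> str:
    payload = json.dumps({"file_hashes": file_hashes}, sort_keys=True)
    return f"<!-- codedoc-ai: {payload} -->"


def read_file_hashes(markdown: str) -> dict[str, str]:
    """Return the embedded hashes, or {} for files without usable metadata."""
    match = META_RE.search(markdown)
    if not match:
        return {}
    try:
        meta = json.loads(match.group(1))
    except json.JSONDecodeError:
        return {}
    hashes = meta.get("file_hashes", {})
    return hashes if isinstance(hashes, dict) else {}
```

An empty result corresponds to the pre-0.7.0 case noted above: with no stored hashes, every file is reprocessed once.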
350
+ ### Cross-format resume (Issue 2)
351
+ - `_resolve_entry_and_docs` now checks for a same-stem `.md` sibling when a `.json` candidate does not exist (e.g. `--output codedoc/claude.json` after a previous run wrote `codedoc/claude.md`).
352
+ - `_load_existing_file_docs` checks the same-stem MD sibling before falling back to the configured MD filename.
353
+
354
+ ### Warning when entry file not in scanned set (Issue 3)
355
+ - `_select_files` now logs a `WARNING` when the entry file exists on disk but is absent from the scanner's file map (unsupported extension, too large, in a skip directory).
356
+
357
+ ### Removed dead `write_outputs` function (Issue 4)
358
+ - `codedoc/core/output.py`: removed the never-called `write_outputs()` backward-compat wrapper that still referenced removed fields (`id`, `format`, `last_processed`, `git_commit`, `author`). Unused `datetime`/`timezone` imports also removed.
359
+
360
+ ### `--format both` with a named file is now a hard error (Issue 5)
361
+ - `_resolve_output_spec` raises `ConfigError` when `output_format` is `"both"` and a named file path is given. Previously this silently downgraded to a single format. The error message directs developers to use a directory path instead.
362
+
363
+ ### Tests
364
+ - Added 5 regression tests covering all fixes above.
365
+
366
+ ## 0.6.4 - 2026-05-24
367
+
368
+ - Removed `codedoc_db.json` entirely — the public `codedoc.json` output already stores `hash` per file, which is sufficient for incremental processing.
369
+ - Hash-based incremental check now compares `compute_file_hash(path)` against `existing_docs[rel].get("hash")` from the public JSON, replacing the DB lookup.
370
+ - Added `_deps` field per file in the public JSON: stores the raw `dependencies_analysis` dict so the dependency catalog can be fully rebuilt from unchanged files on the next incremental run without an LLM call. Not rendered in Markdown output.
371
+ - `_public_record_to_doc` now reads `_deps` back and sets it as `dependencies_analysis`; falls back to `links.external_dependencies` for old-format JSON files.
372
+ - No per-file checkpoint writes during a run — crash recovery now means re-running the affected files.
373
+ - Legacy cleanup: if `codedoc_db.json` exists in the output directory at run time, it is deleted and a log message is emitted.
374
+ - `codedoc/core/db.py` stripped to just the `compute_file_hash` utility; `CodeDocDB` class removed.
375
+
376
+ ## 0.6.3 - 2026-05-24
377
+
378
+ - Trimmed `codedoc_db.json` to the minimum needed for incremental runs:
379
+ - Removed `history` array entirely — every field it contained (`file_path`, `processed_at`, `hash`, `author`) was already present in the `files` section, making it pure duplication. It was also never read anywhere in the pipeline.
380
+ - Removed `author` and `git_commit` fields from per-file DB entries — no longer stored in any output since 0.6.2, so they served no purpose in the cache.
381
+ - Removed git subprocess calls (`git rev-parse`, `git config user.name`) from the DB write path — nothing reads their output anymore, so there is no reason to shell out on every file write.
382
+ - Each DB entry now contains only: `hash`, `last_processed`, and (when present) `dependencies_analysis`.
383
+ - Existing `codedoc_db.json` files with the old format are migrated transparently on the next run (history is silently dropped).
384
+
385
+ ## 0.6.2 - 2026-05-23
386
+
387
+ - Cleaned public output for better AI scannability (schema version 1.4):
388
+ - Removed `id` field per file (always identical to `hash` — pure duplication).
389
+ - Removed `last_processed` field per file (internal processing timestamp, not documentation content).
390
+ - Removed `state` field per file (always `"checked"` in public output — carries no signal).
391
+ - Removed `format` field per file (file extension is already in `path`; `language` covers the language name).
392
+ - Result: each file record is smaller and contains only documentation-relevant content.
393
+ - Markdown output no longer renders `**ID:**` or `**Format:**` header lines per file.
394
+
395
+ ## 0.6.1 - 2026-05-23
396
+
397
+ - Improved run logging:
398
+ - Replaced animated file progress bars with stable log lines.
399
+ - Logs now show provider/model, configured file concurrency, file start events, completion percentage, and remaining file count.
400
+ - Format switches now log when an unselected public output file is removed.
401
+ - Parallel file processing is now visible in log output.
402
+ - Internal agent processing events demoted to debug level to reduce noise.
403
+
404
+ ## 0.6.0 - 2026-05-23
405
+
406
+ - Added metadata-backed reruns:
407
+ - JSON output now includes a top-level `_codedoc` metadata block.
408
+ - Markdown output now includes a hidden `codedoc-ai` metadata comment.
409
+ - Stored metadata includes the entry file, schema version, and generation time.
410
+ - Subsequent runs can recover the entry file from a previously generated `.json` or `.md` documentation file.
411
+ - Changed first-run/resume behavior:
412
+ - First runs require an explicit entry file when no valid previous CodeDoc output is available.
413
+ - If no output path is provided, CodeDoc checks the default `codedoc/` folder for previous docs.
414
+ - Invalid or metadata-free documentation files now fail clearly instead of being treated as valid resume sources.
415
+ - Changed default generated output location from `docs_output/` to `codedoc/`.
416
+ - Kept JSON as the default public output format.
417
+ - Added support for output file paths:
418
+ - `--output docs/report.json` writes a named JSON file.
419
+ - `--output docs/report.md` writes a named Markdown file.
420
+ - File extension now determines the selected output format for explicit file paths.
421
+ - Unsupported output file extensions now raise a configuration error.
422
+ - Moved the incremental cache into the selected output directory:
423
+ - `codedoc_db.json` is now stored beside generated docs.
424
+ - Existing root-level `codedoc_db.json` files are migrated into the output directory when possible.
425
+ - Improved output cleanup:
426
+ - Default managed files (`codedoc.json`, `codedoc.md`) are removed when switching formats.
427
+ - Legacy per-file outputs such as `main.py.json` and `main.py.md` are cleaned up.
428
+ - Custom-named output files are preserved across runs.
429
+ - Simplified provider mode support for this release:
430
+ - Active providers are OpenAI/OpenAI-compatible, Anthropic, and Gemini.
431
+ - Local provider code remains in the package but is not exposed through the CLI/factory in 0.6.0.
432
+ - Removed `--llm` / `LLM_MODE` from the documented public workflow.
433
+ - Improved provider implementations:
434
+ - Reused Anthropic clients instead of creating a client per request.
435
+ - Added native JSON-mode handling for OpenAI and Gemini where available.
436
+ - Improved Gemini system-instruction handling.
437
+ - Updated CLI help, README, and version metadata for the 0.6.0 workflow.
438
+ - Added regression coverage for:
439
+ - Missing entry plus missing docs raising a clear configuration error.
440
+ - Resuming from existing JSON metadata.
441
+ - Custom output filename behavior.
442
+ - JSON remaining the default format.
443
+ - Cache/output cleanup and metadata preservation.
444
+
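As a rough illustration of the metadata-backed rerun behavior in 0.6.0, the sketch below recovers an entry file from a previously generated JSON document. Only the top-level `_codedoc` key comes from the changelog; the `entry_file` field name, the function `recover_entry_file`, and the local `ConfigError` class are hypothetical.

```python
import json


class ConfigError(Exception):
    """Stand-in for the package's configuration error type (assumed)."""


def recover_entry_file(json_text):
    """Hypothetical helper: read the entry file from a `_codedoc` metadata block."""
    data = json.loads(json_text)
    meta = data.get("_codedoc") if isinstance(data, dict) else None
    # Assumption: the entry file is stored under the key "entry_file".
    if not isinstance(meta, dict) or not meta.get("entry_file"):
        # Metadata-free documents fail clearly instead of acting as resume sources.
        raise ConfigError("documentation file has no usable `_codedoc` metadata")
    return meta["entry_file"]
```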
445
+ ## 0.5.2 - 2026-05-13
446
+
447
+ - Fixed cache structure duplication issues in generated documentation output.
448
+ - Improved dependency/import resolution to prevent incorrect file mappings and false dependency relationships.
449
+ - Cleaned and normalized public dependency output generation.
450
+ - Reduced noisy dependency cycles in generated Markdown and JSON outputs.
451
+ - Added regression coverage for cache structure and dependency resolution behavior.
452
+
453
+ ## 0.5.1 - 2026-05-13
454
+
455
+ - Cleaned generated cache and public JSON by pruning empty arrays, empty objects, nulls, and duplicate nested fields.
456
+ - Removed the top-level cache `version` field from newly written `codedoc_db.json`.
457
+ - Improved Markdown-to-JSON conversion so it no longer recreates empty default sections.
458
+ - Tightened agent prompts to avoid placeholder package names and empty output fields.
459
+
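The pruning described for 0.5.1 can be sketched as a small recursive helper. This is an illustrative assumption, not the package's code: the name `prune_empty` is hypothetical, and the sketch covers only nulls, empty arrays, and empty objects, not the duplicate-nested-field cleanup.

```python
def prune_empty(value):
    """Recursively drop None, empty lists, and empty dicts from JSON-like data."""
    empties = (None, [], {})
    if isinstance(value, dict):
        cleaned = {key: prune_empty(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if item not in empties}
    if isinstance(value, list):
        cleaned = [prune_empty(item) for item in value]
        return [item for item in cleaned if item not in empties]
    # Scalars such as 0, False, and "" are kept; only None/[]/{} count as empty here.
    return value
```

Children are pruned before the emptiness check, so a container that holds only empty values is removed as well.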
460
+ ## 0.5.0 - 2026-05-13
461
+
462
+ - Promoted `codedoc-ai` to the 0.5.0 feature line.
463
+ - Added bounded file-level parallelism:
464
+ - Processes up to 5 files at a time by default.
465
+ - Adds `--max-parallel-files N` for CLI control.
466
+ - Adds `max_parallel_files`, `file_retry_attempts`, and `max_consecutive_failures` config options.
467
+ - Added sequential retry fallback for files that fail during parallel execution.
468
+ - Added provider/API health diagnostics when repeated file processing failures suggest bad credentials, rate limits, model errors, network issues, or provider downtime.
469
+ - Kept cache writes ordered and centralized so `codedoc_db.json` remains structured even when files are processed concurrently.
470
+ - Added AI-friendly dependency cataloging:
471
+ - File-level dependencies remain on each file.
472
+ - AI can suggest `catalog_updates` internally.
473
+ - Public output receives a merged `dependency_catalog`.
474
+ - Repeated dependency explanations are deduplicated across JSON and Markdown.
475
+ - Added deterministic JSON/Markdown conversion helpers so public JSON can become Markdown without another AI call, and generated Markdown can be parsed back into the public JSON shape.
476
+ - Clarified DependencyAgent output so generic import notes stay out of repeated file records unless they are file-specific.
477
+ - Added Google Gemini support through the official `google-genai` SDK.
478
+ - Added `llm_provider` config and `--provider auto|openai|anthropic|gemini` CLI selection.
479
+ - Expanded README with Codex/AI-agent analysis covering token savings, hallucination reduction, complex edit safety, and recommended workflows.
480
+ - Added tests for:
481
+ - File-level parallel processing.
482
+ - Retry behavior.
483
+ - Dependency catalog output.
484
+ - JSON/Markdown conversion.
485
+ - Format switching from cache.
486
+
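The bounded parallelism and sequential retry fallback introduced in 0.5.0 could be approximated as follows. The parameter names `max_parallel_files` and `file_retry_attempts` and the default of 5 come from the changelog; the function `process_files`, its return shape, and the use of a thread pool are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_files(paths, process_one, max_parallel_files=5, file_retry_attempts=1):
    """Hypothetical sketch: run process_one over paths with bounded concurrency.

    Returns (results, still_failed), where results maps path -> output.
    """
    results, failed = {}, []
    # Phase 1: at most `max_parallel_files` files are in flight at once.
    with ThreadPoolExecutor(max_workers=max_parallel_files) as pool:
        futures = {pool.submit(process_one, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception:
                failed.append(path)
    # Phase 2: retry the failures one at a time, outside the pool.
    still_failed = []
    for path in sorted(failed):
        for _ in range(file_retry_attempts):
            try:
                results[path] = process_one(path)
                break
            except Exception:
                continue
        else:
            still_failed.append(path)
    return results, still_failed
```

Retrying sequentially keeps a transient failure, such as a rate limit, from being repeated under the same concurrent load that may have caused it.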
487
+ ## 0.1.4 - 2026-05-02
488
+
489
+ - Redesigned **public output structure** for cleaner, AI-friendly documentation.
490
+ - Separated **internal cache (`codedoc_db.json`)** from **public output (`codedoc.json` / `codedoc.md`)**.
491
+ - Added **project-level overview** including entry file, file count, languages, and folder summary.
492
+ - Added **project tree visualization** in both JSON and Markdown outputs.
493
+ - Added **folder-based grouping** with summarized purpose and file listings.
494
+ - Introduced **dependency graph** with internal file relationships and external dependencies.
495
+ - Flattened file structure in public output:
496
+ - Removed nested and duplicated `result` / `documentation` blocks.
497
+ - Consolidated descriptions, roles, functions, classes, and exports into a single clean structure.
498
+ - Added **file-level linking metadata**:
499
+ - `internal_dependencies`
500
+ - `external_dependencies`
501
+ - `imported_by`
502
+ - Removed **author and git metadata** from public output by default.
503
+ - Improved **Markdown output (`--format md`)**:
504
+ - Added Project Overview, Tree, Folder Map, Dependency Map, and structured file summaries.
505
+ - Ensured **format-specific output behavior**:
506
+ - `--format md` → only `codedoc.md`
507
+ - `--format json` → only `codedoc.json`
508
+ - `--format both` → both files
509
+ - Added **clear CLI and pipeline logging**:
510
+ - Displays selected output format
511
+ - Displays exact output file path
512
+ - Added **BOM-safe file reading (`utf-8-sig`)** across Python, JS/TS, and generic parsers.
513
+ - Ensured **language-agnostic processing** (no Python-only assumptions).
514
+ - Added tests for:
515
+ - New public output structure
516
+ - Markdown generation
517
+ - Dependency graph presence
518
+ - Cross-language compatibility (including TS/TSX)
519
+ - Cleaned up public output by removing:
520
+ - Cache history
521
+ - Raw agent responses
522
+ - Redundant description fields
523
+
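The BOM-safe reading noted for 0.1.4 relies on Python's standard `utf-8-sig` codec, which strips a leading UTF-8 byte order mark when one is present and otherwise behaves like plain UTF-8. A minimal sketch, with the helper name `read_source` being hypothetical:

```python
from pathlib import Path


def read_source(path):
    """Read a source file as text, dropping a leading UTF-8 BOM if present."""
    # With encoding="utf-8", a BOM would survive as a leading "\ufeff" character.
    return Path(path).read_text(encoding="utf-8-sig")
```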
524
+ ## 0.1.3 - 2026-05-02
525
+
526
+ - Changed generated docs to one combined JSON file by default.
527
+ - Added `--format json|md|both` output selection.
528
+ - Added smart content-hash reuse for unchanged and duplicate files.
529
+ - Added cache-based output regeneration when selected docs are missing.
530
+ - Redesigned public output with project overview, tree, folder map, dependency graph, and flattened file summaries.
531
+ - Removed local author metadata and raw agent result duplication from public output.
532
+ - Expanded public README with provider setup, defaults, config, output, and cache behavior.
533
+
534
+ ## 0.1.1 - 2026-05-01
535
+
536
+ - Added safer default scanning for virtual environments such as `myenv`.
537
+ - Added configurable `skip_dirs`.
538
+ - Added strict project-relative ignore paths through CLI, config, environment, and Python API.
539
+ - Added `--ignore PATH` CLI option.
540
+ - Added scanner tests for virtual environment and strict path ignores.
541
+ - Fixed misleading API key warning when CLI overrides select local LLM mode.
542
+
543
+ ## 0.1.0 - 2026-05-01
544
+
545
+ - Initial alpha release.
546
+ - Added entry-file dependency traversal.
547
+ - Added local and API LLM provider support.
548
+ - Added per-file Markdown and JSON output.
549
+ - Added `_index.json`, `_summary.md`, and incremental `codedoc_db.json` memory.
550
+ - Added CLI and Python API entry points.
@@ -1,4 +1,5 @@
1
1
  include README.md
2
+ include RUN_FLOW.md
2
3
  include LICENSE
3
4
  include CHANGELOG.md
4
5
  include CONTRIBUTING.md