autodocgenerator 0.5.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,890 @@
1
+ Metadata-Version: 2.4
2
+ Name: autodocgenerator
3
+ Version: 0.5.1
4
+ Summary: This Project helps you to create docs for your projects
5
+ License: MIT
6
+ Author: dima-on
7
+ Author-email: sinica911@gmail.com
8
+ Requires-Python: >=3.11,<4.0
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Programming Language :: Python :: 3
11
+ Classifier: Programming Language :: Python :: 3.11
12
+ Classifier: Programming Language :: Python :: 3.12
13
+ Classifier: Programming Language :: Python :: 3.13
14
+ Classifier: Programming Language :: Python :: 3.14
15
+ Requires-Dist: CacheControl (==0.14.4)
16
+ Requires-Dist: Pygments (==2.19.2)
17
+ Requires-Dist: RapidFuzz (==3.14.3)
18
+ Requires-Dist: annotated-types (==0.7.0)
19
+ Requires-Dist: anyio (==4.12.1)
20
+ Requires-Dist: certifi (==2026.1.4)
21
+ Requires-Dist: charset-normalizer (==3.4.4)
22
+ Requires-Dist: cleo (==2.1.0)
23
+ Requires-Dist: colorama (==0.4.6)
24
+ Requires-Dist: crashtest (==0.4.1)
25
+ Requires-Dist: distlib (==0.4.0)
26
+ Requires-Dist: distro (==1.9.0)
27
+ Requires-Dist: dulwich (==0.25.2)
28
+ Requires-Dist: fastjsonschema (==2.21.2)
29
+ Requires-Dist: filelock (==3.20.3)
30
+ Requires-Dist: findpython (==0.7.1)
31
+ Requires-Dist: google-auth (==2.47.0)
32
+ Requires-Dist: google-genai (==1.56.0)
33
+ Requires-Dist: groq (==1.0.0)
34
+ Requires-Dist: h11 (==0.16.0)
35
+ Requires-Dist: httpcore (==1.0.9)
36
+ Requires-Dist: httpx (==0.28.1)
37
+ Requires-Dist: idna (==3.11)
38
+ Requires-Dist: installer (==0.7.0)
39
+ Requires-Dist: jaraco.classes (==3.4.0)
40
+ Requires-Dist: jaraco.context (==6.1.0)
41
+ Requires-Dist: jaraco.functools (==4.4.0)
42
+ Requires-Dist: jiter (==0.12.0)
43
+ Requires-Dist: keyring (==25.7.0)
44
+ Requires-Dist: markdown-it-py (==4.0.0)
45
+ Requires-Dist: mdurl (==0.1.2)
46
+ Requires-Dist: more-itertools (==10.8.0)
47
+ Requires-Dist: msgpack (==1.1.2)
48
+ Requires-Dist: openai (==2.14.0)
49
+ Requires-Dist: packaging (==25.0)
50
+ Requires-Dist: pbs-installer (==2026.1.14)
51
+ Requires-Dist: pkginfo (==1.12.1.2)
52
+ Requires-Dist: platformdirs (==4.5.1)
53
+ Requires-Dist: pyasn1 (==0.6.1)
54
+ Requires-Dist: pyasn1_modules (==0.4.2)
55
+ Requires-Dist: pydantic (==2.12.5)
56
+ Requires-Dist: pydantic_core (==2.41.5)
57
+ Requires-Dist: pyproject_hooks (==1.2.0)
58
+ Requires-Dist: python-dotenv (==1.2.1)
59
+ Requires-Dist: pywin32-ctypes (==0.2.3)
60
+ Requires-Dist: requests (==2.32.5)
61
+ Requires-Dist: requests-toolbelt (==1.0.0)
62
+ Requires-Dist: rich (==14.2.0)
63
+ Requires-Dist: rich_progress (==0.4.0)
64
+ Requires-Dist: rsa (==4.9.1)
65
+ Requires-Dist: shellingham (==1.5.4)
66
+ Requires-Dist: sniffio (==1.3.1)
67
+ Requires-Dist: tenacity (==9.1.2)
68
+ Requires-Dist: tomlkit (==0.14.0)
69
+ Requires-Dist: tqdm (==4.67.1)
70
+ Requires-Dist: trove-classifiers (==2026.1.14.14)
71
+ Requires-Dist: typing-inspection (==0.4.2)
72
+ Requires-Dist: typing_extensions (==4.15.0)
73
+ Requires-Dist: urllib3 (==2.6.2)
74
+ Requires-Dist: virtualenv (==20.36.1)
75
+ Requires-Dist: websockets (==15.0.1)
76
+ Requires-Dist: zstandard (==0.25.0)
77
+ Description-Content-Type: text/markdown
78
+
79
+ ## Executive Navigation Tree
80
+ * 📂 **Core Engine**
81
+ * [Core Logic](#logic-flow)
82
+ * [Manager Class](#manager-class)
83
+ * [Base Module](#basemodule)
84
+ * [Progress Base Module](#progress_base_module)
85
+ * [Base Progress Class](#baseprogress_class)
86
+ * [Lib Progress Class](#libprogress_class)
87
+ * ⚙️ **Integration and Utilities**
88
+ * [Integration](#integration)
89
+ * [Usage Example](#usage-example)
90
+ * [Usage Notes](#usage-notes)
91
+ * [Postprocess](#postprocess)
92
+ * [Splitter](#spliter)
93
+ * 📄 **Documentation and Settings**
94
+ * [Documentation](#documentation)
95
+ * [Settings](#settings)
96
+ * [Introduction](#get_links_intro)
97
+ * [Introduction to Global Data](#get_introdaction)
98
+
99
+ **Auto Doc Generator** – *Project‑Wide Overview*
100
+ *(Generated for the project “Auto Doc Generator”)*
101
+
102
+ ---
103
+
104
+ ## 1. Project Title
105
+ **Auto Doc Generator**
106
+
107
+ ---
108
+
109
+ ## 2. Project Goal
110
+ The purpose of **Auto Doc Generator** is to **automatically produce high‑quality documentation for any software project**.
111
+ Developers no longer need to write lengthy READMEs, API references, or architecture overviews by hand; the tool extracts the source code, feeds it to a large language model (LLM), and assembles the model’s responses into a coherent, ready‑to‑publish document.
112
+
113
+ Key problems it solves:
114
+
115
+ | Problem | How Auto Doc Generator solves it |
116
+ |---------|----------------------------------|
117
+ | **Time‑consuming manual writing** | Generates the whole documentation in a few minutes. |
118
+ | **Inconsistent style & missing sections** | Centralised prompt templates enforce a uniform tone and guarantee the presence of intro, links, and section headings. |
119
+ | **Keeping docs in sync with code** | The pre‑processor walks the repository, captures every file (except ignored ones), and feeds the latest source to the LLM each run. |
120
+ | **Scalability for large codebases** | A “compression” pipeline groups file fragments, repeatedly summarises them with the LLM, and reduces the whole repository to a single markdown string. |
121
+
122
+ ---
123
+
124
+ ## 3. Core Logic & Principles
125
+
126
+ ### 3.1 High‑level data flow
127
+
128
+ ```
129
+ Repository → CodeMix (tree + raw files) → Split into per‑file blocks
130
+ → Compressor (iterative LLM summarisation) → Single markdown document
131
+ → Post‑processor (heading extraction, intro generation) → Final output
132
+ ```
133
+
134
+ ### 3.2 Main layers
135
+
136
+ | Layer | Responsibility | Principal modules |
137
+ |-------|----------------|-------------------|
138
+ | **Configuration** | Stores static prompt fragments, environment variables, model identifiers. | `engine/config/config.py` |
139
+ | **Model Layer** | Wraps the LLM (Groq) – provides synchronous (`GPTModel`) and asynchronous (`AsyncGPTModel`) interfaces. | `engine/models/model.py`, `engine/models/gpt_model.py` |
140
+ | **History** | Keeps the conversation context (system prompt + previous Q/A) that is sent to the LLM. | `History` class in `engine/models/model.py` |
141
+ | **Factory / Modules** | Orchestrates several LLM‑generated fragments (intro links, intro paragraph, etc.) into a full documentation string. | `factory/base_factory.py`, `factory/modules/intro.py` |
142
+ | **Pre‑processor** | Walks the project directory, writes a single “code‑mix” file that contains a file‑tree header and the raw source of each file. | `preprocessor/code_mix.py` |
143
+ | **Compressor** | Repeatedly groups a configurable number of text blocks (`compress_power`), asks the LLM to summarise them, and replaces the group with the summary until only one block remains. | `preprocessor/compressor.py` |
144
+ | **Post‑processor** | Parses the final markdown, extracts headings, optionally asks the LLM for section introductions, and builds a table of contents. | `preprocessor/postprocess.py` |
145
+ | **UI / Progress** | Optional visual feedback (plain console or Rich‑based progress bar). | `ui/progress_base.py` |
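The heading‑extraction and table‑of‑contents step of the post‑processor can be pictured with a minimal sketch. This is an illustration only; the function names `extract_headings` and `build_toc` are assumptions, not the actual API of `preprocessor/postprocess.py`.

```python
import re

def extract_headings(markdown: str) -> list[tuple[int, str]]:
    """Return (level, title) pairs for every ATX heading in the document."""
    return [(len(m.group(1)), m.group(2).strip())
            for m in re.finditer(r"^(#{1,6})\s+(.+)$", markdown, re.MULTILINE)]

def build_toc(markdown: str) -> str:
    """Render a nested markdown table of contents from the headings."""
    lines = []
    for level, title in extract_headings(markdown):
        # GitHub-style anchor: lowercase, strip punctuation, spaces -> hyphens
        anchor = re.sub(r"[^a-z0-9 -]", "", title.lower()).replace(" ", "-")
        lines.append("  " * (level - 1) + f"* [{title}](#{anchor})")
    return "\n".join(lines)
```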
146
+
147
+ ### 3.3 Core algorithms
148
+
149
+ * **Repository dump (`CodeMix`)** – Recursively walks the directory, respects an `ignore_list`, and writes each file wrapped in `<file path="…">` markers. This deterministic format makes later splitting trivial.
150
+ * **Iterative compression** – The compressor works like a *divide‑and‑conquer* summariser:
151
+ 1. Split the list of file blocks into chunks of size `compress_power`.
152
+ 2. Send each chunk to the LLM with a system prompt that explains the “compress‑to‑one” task.
153
+ 3. Replace the chunk with the LLM’s answer.
154
+ 4. Repeat until the list length is 1.
155
+ This approach keeps token usage within model limits while still producing a global view of the whole codebase.
156
+ * **History handling** – Every call to `get_answer()` appends the user message and the model’s reply to the `History` object, guaranteeing context continuity for multi‑turn interactions (e.g., when the factory asks for intro links then for the intro paragraph).
157
+ * **Factory pattern** – `DocFactory` receives an ordered collection of *module* objects (`IntroLinks`, `IntroText`, …). Each module implements a `run(info: dict) -> str` method that internally calls the model. The factory concatenates the returned strings, producing the final documentation.
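The iterative compression loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: `summarize` merely stands in for the real LLM call, and the exact chunking details of `preprocessor/compressor.py` may differ.

```python
def summarize(chunk: str) -> str:
    """Placeholder for the LLM call that compresses a chunk into a summary."""
    return chunk[:200]  # a real implementation would query the model here

def compress_to_one(blocks: list[str], compress_power: int = 4) -> str:
    """Repeatedly merge groups of `compress_power` blocks until one remains."""
    while len(blocks) > 1:
        blocks = [
            summarize("\n".join(blocks[i:i + compress_power]))
            for i in range(0, len(blocks), compress_power)
        ]
    return blocks[0]
```

Because each pass shrinks the list by roughly a factor of `compress_power`, the number of LLM calls stays logarithmic in the number of file blocks.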
158
+
159
+ ---
160
+
161
+ ## 4. Key Features
162
+
163
+ - **Full‑project code ingestion** – Automatic tree generation and source extraction for every non‑ignored file.
164
+ - **Sync & async LLM wrappers** – Choose `GPTModel` for simple scripts or `AsyncGPTModel` for high‑throughput pipelines.
165
+ - **Prompt‑driven, configurable documentation style** – All system prompts live in `engine/config/config.py`; swapping a constant changes the tone for every run.
166
+ - **Iterative compression** – Handles arbitrarily large repositories while staying inside model token limits.
167
+ - **Modular documentation factory** – Plug‑in new modules (e.g., “API reference”, “Installation guide”) without touching the core pipeline.
168
+ - **Progress feedback** – Optional Rich‑based progress bar or a no‑op fallback.
169
+ - **Environment‑first design** – `.env` file automatically loaded; API keys never hard‑coded.
170
+ - **Extensible settings object** – `ProjectSettings` lets you add arbitrary metadata (target audience, tech stack, etc.) that the LLM can use when drafting the docs.
171
+
172
+ ---
173
+
174
+ ## 5. How to Run
175
+
176
+ Below is a **step‑by‑step guide** that works on any platform with Python 3.11+.
177
+
178
+ ### 5.1 Prerequisites
179
+
180
+ 1. **Python** (≥ 3.11) installed and available on `PATH`.
181
+ 2. **Git** (optional, only if you clone the repo).
182
+ 3. **Groq API key** – sign up at https://groq.com and obtain a key.
183
+
184
+ ### 5.2 Installation
185
+
186
+ ```bash
187
+ # 1️⃣ Clone the repository
188
+ git clone https://github.com/your-org/auto-doc-generator.git
189
+ cd auto-doc-generator
190
+
191
+ # 2️⃣ Create a virtual environment (recommended)
192
+ python -m venv .venv
193
+ source .venv/bin/activate # on Windows: .venv\Scripts\activate
194
+
195
+ # 3️⃣ Install required packages
196
+ pip install -r requirements.txt
197
+ ```
198
+
199
+ `requirements.txt` typically contains:
200
+
201
+ ```
202
+ python-dotenv
203
+ groq
204
+ rich # optional, for the fancy progress bar
205
+ ```
206
+
207
+ ### 5.3 Configure environment variables
208
+
209
+ Create a `.env` file in the project root:
210
+
211
+ ```
212
+ API_KEY=YOUR_GROQ_API_KEY
213
+ ```
214
+
215
+ If you prefer not to use a `.env` file, you can pass the key directly when constructing the model (see the usage example).
216
+
217
+ ### 5.4 Run a **synchronous** documentation generation
218
+
219
+ ```bash
220
+ python examples/sync_demo.py
221
+ ```
222
+
223
+ `sync_demo.py` contains the exact snippet from the documentation (see the “Usage Example” section). It:
224
+
225
+ 1. Builds a `History` with the system prompt.
226
+ 2. Instantiates `GPTModel`.
227
+ 3. Sends a single question and prints the answer.
228
+ 4. Uses `DocFactory` with `IntroLinks` and `IntroText` to produce a short markdown page.
229
+
230
+ ### 5.5 Run an **asynchronous** documentation generation
231
+
232
+ ```bash
233
+ python examples/async_demo.py
234
+ ```
235
+
236
+ The script mirrors the synchronous version but uses `AsyncGPTModel` and `await`‑s the call.
237
+
238
+ ### 5.6 Full‑pipeline (code‑mix → compression → post‑process)
239
+
240
+ If you want to generate documentation for an entire repository:
241
+
242
+ ```bash
243
+ python -m preprocessor.pipeline \
244
+ --repo-path /path/to/your/project \
245
+ --output documentation.md \
246
+ --compress-power 4 # number of blocks merged per LLM call
247
+ ```
248
+
249
+ The `pipeline` module (provided in `preprocessor/__main__.py`) orchestrates:
250
+
251
+ 1. `CodeMix` → `codemix.txt`
252
+ 2. Split into per‑file blocks
253
+ 3. `compress_to_one` (sync by default; add `--async` for async)
254
+ 4. Optional post‑processing (headings, TOC)
255
+ 5. Write the final markdown to the path you supplied.
256
+
257
+ ### 5.7 Verify the result
258
+
259
+ Open the generated file (`documentation.md` or `documentation.txt`) in any markdown viewer or IDE. You should see a table of contents, introductory paragraph, and concise summaries of each major component of the source code.
260
+
261
+ ---
262
+
263
+ ## 6. Dependencies
264
+
265
+ | Category | Package | Minimum version | Purpose |
266
+ |----------|---------|----------------|---------|
267
+ | **Core** | `python-dotenv` | 1.0.0 | Loads `.env` files automatically. |
268
+ | | `groq` | 0.5.0 | Official client for the Groq LLM API. |
269
+ | **Optional UI** | `rich` | 13.0.0 | Fancy console progress bars (`LibProgress`). |
270
+ | **Testing (if you run the test suite)** | `pytest` | 7.0.0 | Unit‑test runner. |
271
+ | **Type checking** | `mypy` | 1.0.0 | Static type analysis (dev dependency). |
272
+ | **Formatting** | `black` | 23.0.0 | Code formatter (dev dependency). |
273
+
274
+ All runtime dependencies are listed in `requirements.txt`; dev‑only packages are in `requirements-dev.txt`.
275
+
276
+ ---
277
+
278
+ ### Quick Recap
279
+
280
+ 1. **Install** → create a virtual environment → `pip install -r requirements.txt`.
281
+ 2. **Set** `API_KEY` in `.env` (or pass it manually).
282
+ 3. **Run** either the synchronous demo, the asynchronous demo, or the full pipeline command.
283
+ 4. **Read** the generated markdown – you now have up‑to‑date documentation for your project, generated automatically by an LLM.
284
+
285
+ Feel free to extend the factory with new modules, tweak the prompts in `engine/config/config.py`, or swap the Groq model identifier (`MODELS_NAME`) for a different LLM that better fits your budget or latency requirements. Happy documenting!
286
+
287
+
288
+
289
+ ## <a name="overview"></a> Overview
290
+ The provided code snippet is part of a larger system responsible for generating documentation. This section focuses on the `engine/models` module, specifically the `gpt_model.py` and `model.py` files.
291
+
292
+ ## <a name="responsibility"></a> Responsibility
293
+ The `engine/models` module is responsible for handling communication with the LLM (Large Language Model) using the Groq API. The `GPTModel` and `AsyncGPTModel` classes encapsulate the logic for interacting with the LLM, including sending requests and processing responses.
294
+
295
+ ## <a name="interaction"></a> Interaction with Other Components
296
+ The `engine/models` module interacts with other components of the system as follows:
297
+
298
+ * **Config**: The `config.py` file provides configuration settings, such as API keys and model names, which are used by the `GPTModel` and `AsyncGPTModel` classes.
299
+ * **Factory**: The `factory` module is responsible for combining LLM-generated fragments into a full documentation string. The `GPTModel` and `AsyncGPTModel` classes provide the necessary functionality for the factory to generate documentation.
300
+ * **History**: The `History` class, defined in `model.py`, stores the conversation context that is sent to the LLM. This context is used to generate answers to user queries.
301
+
302
+ ## <a name="key-functions"></a> Key Functions and Classes
303
+ The key functions and classes in the `engine/models` module are:
304
+
305
+ * **`GPTModel`**: A synchronous class that interacts with the LLM using the Groq API.
306
+ * **`AsyncGPTModel`**: An asynchronous class that interacts with the LLM using the Groq API.
307
+ * **`Model`**: A parent class that provides a basic implementation for interacting with the LLM.
308
+ * **`AsyncModel`**: A parent class that provides a basic asynchronous implementation for interacting with the LLM.
309
+ * **`History`**: A class that stores the conversation context sent to the LLM.
310
+
311
+ ## <a name="logic-flow"></a> Logic Flow
312
+ The logic flow of the `engine/models` module is as follows:
313
+
314
+ 1. **Initialization**: The `GPTModel` or `AsyncGPTModel` class is initialized with an API key and a `History` object.
315
+ 2. **Generating Answers**: The `generate_answer` method is called with a user query and optional history. The method sends a request to the LLM and processes the response to generate an answer.
316
+ 3. **Error Handling**: If an error occurs during the request, the method will retry with a different model until a successful response is received.
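The fallback behaviour in step 3 can be pictured as a simple loop over the configured model names. This is an illustrative sketch, not the package's exact implementation; `ask` stands in for the actual Groq API call.

```python
def generate_with_fallback(ask, model_names: list[str], query: str) -> str:
    """Try each model in order; return the first successful answer."""
    last_error: Exception | None = None
    for name in model_names:
        try:
            return ask(name, query)   # `ask` wraps the real API request
        except Exception as exc:      # a real implementation would narrow this
            last_error = exc
    raise RuntimeError(f"All models failed: {last_error}")
```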
317
+
318
+ ## <a name="assumptions"></a> Assumptions and Inputs
319
+ The `engine/models` module assumes that:
320
+
321
+ * **API Key**: A valid API key is provided for authentication with the Groq API.
322
+ * **Model Names**: A list of valid model names is provided in the configuration settings.
323
+ * **User Query**: A user query is provided as input to the `generate_answer` method.
324
+ * **History**: A `History` object is provided to store the conversation context.
325
+
326
+ The `engine/models` module produces the following outputs:
327
+
328
+ * **Answer**: A generated answer to the user query.
329
+ * **Error**: An error message if the request to the LLM fails.
330
+
331
+ ## <a name="side-effects"></a> Side Effects
332
+ The `engine/models` module has the following side effects:
333
+
334
+ * **Conversation Context**: The conversation context is updated with the user query and the generated answer.
335
+ * **API Requests**: The module sends requests to the Groq API to generate answers.
336
+
337
+ By following the provided documentation and code structure, developers can effectively utilize the `engine/models` module to generate high-quality documentation using the LLM.
338
+
339
+ **Factory Core – Documentation**
340
+
341
+ <a name="overview"></a>
342
+ ## Overview
343
+ The *factory* package builds the final documentation page by chaining **modules** that each generate a fragment of markdown/HTML.
344
+ `DocFactory` receives any number of objects that inherit from `BaseModule`.
345
+ During `generate_doc(info)` every module is called with the same `info` dictionary, each result is concatenated, and the combined string is returned to the caller (e.g., the CLI or the `Manager` shown in the usage example).
346
+
347
+ ---
348
+
349
+ <a name="basemodule"></a>
350
+ ## `BaseModule` (abstract)
351
+
352
+ ```python
353
+ class BaseModule(ABC):
+     def __init__(self):
+         pass
+
+     @abstractmethod
+     def generate(self, info: dict):
+         ...
360
+ ```
361
+
362
+ * **Responsibility** – Define the contract for a documentation fragment generator.
363
+ * **Key method** – `generate(info) → str` must return a string that will be inserted into the final document.
364
+ * **Assumptions** – Implementations may read any key from `info`; they must never mutate the dictionary.
365
+ * **Side‑effects** – None (pure function).
366
+
367
+ All concrete modules (e.g., `IntroLinks`, `IntroText`) inherit from this class.
368
+
369
+ ---
370
+
371
+ <a name="docfactory"></a>
372
+ ## `DocFactory`
373
+
374
+ ```python
375
+ class DocFactory:
+     def __init__(self, *modules):
+         self.modules: list[BaseModule] = list(modules)
+
+     def generate_doc(self, info: dict) -> str:
+         output = ""
+         for module in self.modules:
+             module_result = module.generate(info)
+             output += module_result + "\n\n"
+         return output
385
+ ```
386
+
387
+ * **Responsibility** – Orchestrate the ordered execution of modules and concatenate their outputs.
388
+ * **Interaction** –
389
+ * Receives pre‑instantiated module objects (any subclass of `BaseModule`).
390
+ * Calls each module’s `generate` method, passing the *same* `info` payload.
391
+ * **Inputs** – `info: dict` containing the data required by the modules (e.g., `full_data`, `global_data`, `language`).
392
+ * **Outputs** – A single markdown/HTML string where each fragment is separated by a blank line.
393
+ * **Side‑effects** – None; the method is pure apart from the module implementations.
394
+
395
+ > **Note** – The `if __name__ == "__main__":` block demonstrates a naïve call with abstract classes; in production you would pass concrete module instances.
396
+
397
+ ---
398
+
399
+ <a name="intro-modules"></a>
400
+ ## Intro Modules (`factory.modules.intro`)
401
+
402
+ ```python
403
+ from ..base_factory import BaseModule
+ from preprocessor.postprocess import (
+     get_all_html_links,
+     get_links_intro,
+     get_introdaction,
+ )
+
+ class IntroLinks(BaseModule):
+     def generate(self, info: dict):
+         links = get_all_html_links(info.get("full_data"))
+         intro_links = get_links_intro(links, info.get("language"))
+         return intro_links
+
+ class IntroText(BaseModule):
+     def generate(self, info: dict):
+         intro = get_introdaction(info.get("global_data"), info.get("language"))
+         return intro
420
+ ```
421
+
422
+ ### IntroLinks
423
+ * **Purpose** – Extract every `<a href=…>` tag from the raw HTML (`full_data`) and transform the list into a language‑specific introductory list.
424
+ * **Dependencies** – `preprocessor.postprocess.get_all_html_links` and `get_links_intro`.
425
+ * **Inputs** – `info["full_data"]` (HTML string), `info["language"]` (e.g., `"en"`).
426
+ * **Output** – Formatted markdown list of links.
427
+
428
+ ### IntroText
429
+ * **Purpose** – Produce a short paragraph that introduces the whole project using the high‑level description (`global_data`).
430
+ * **Dependency** – `preprocessor.postprocess.get_introdaction`.
431
+ * **Inputs** – `info["global_data"]` (project summary), `info["language"]`.
432
+ * **Output** – A single paragraph of introductory text.
433
+
434
+ Both modules are pure and rely exclusively on the `info` dict; they do not modify external state.
435
+
436
+ ---
437
+
438
+ <a name="integration"></a>
439
+ ## Integration with the Rest of the System
440
+ 1. **Pre‑processing** – `preprocessor` components generate the `full_data` and `global_data` fields that the intro modules consume.
441
+ 2. **Factory construction** – In user code (see the global usage example) a `DocFactory` is instantiated with the desired modules, e.g.:
442
+
443
+ ```python
444
+ factory = DocFactory(IntroLinks(), IntroText())
445
+ doc = factory.generate_doc(info)
446
+ ```
447
+
448
+ 3. **Output** – The resulting string can be written to a markdown file, displayed in the UI, or further post‑processed.
449
+
450
+ ---
451
+
452
+ <a name="usage-example"></a>
453
+ ## Quick Usage Example
454
+
455
+ ```python
456
+ from factory.base_factory import DocFactory
457
+ from factory.modules.intro import IntroLinks, IntroText
458
+
459
+ info = {
+     "full_data": "<html>…</html>",          # raw HTML of the project page
+     "global_data": "Auto Doc Generator …",  # short project description
+     "language": "en",
+ }
464
+
465
+ factory = DocFactory(IntroLinks(), IntroText())
466
+ documentation = factory.generate_doc(info)
467
+ print(documentation)
468
+ ```
469
+
470
+ The example produces an introductory links block followed by a concise project paragraph, each separated by a blank line.
471
+
472
+ ---
473
+
474
+ **Key Take‑aways**
475
+
476
+ * `BaseModule` enforces a simple *generate‑only* contract.
477
+ * `DocFactory` is the orchestrator – order of modules matters.
478
+ * Intro modules are thin adapters around post‑processing utilities, keeping the factory layer agnostic of HTML parsing details.
479
+
480
+ This design makes it trivial to add new sections (e.g., `APIReference`, `Changelog`) – simply implement a new `BaseModule` subclass and include it in the factory’s constructor.
481
+
482
+ <a name="manager-class"></a>
483
+ ## Manager Class
484
+ The `Manager` class is responsible for orchestrating the documentation generation process. It takes in several parameters during initialization:
485
+ * `project_directory`: The path to the project directory.
486
+ * `project_settings`: An instance of `ProjectSettings` containing project metadata.
487
+ * `ignore_files`: A list of file patterns to ignore during the documentation generation process.
488
+ * `language`: The language of the project (defaults to "en").
489
+ * `progress_bar`: An instance of `BaseProgress` for displaying progress (defaults to `BaseProgress`).
490
+
491
+ ### Methods
492
+ The `Manager` class has several methods that perform the following tasks:
493
+ * `read_file_by_file_key`: Reads a file from the cache directory based on a file key.
494
+ * `get_file_path`: Returns the file path for a given file key.
495
+ * `generate_code_file`: Generates a code mix file by walking the repository and concatenating file contents.
496
+ * `generate_global_info_file`: Generates a global info file by compressing the code mix file using an LLM.
497
+ * `generete_doc_parts`: Generates documentation parts by splitting the code mix file and using an LLM to generate text.
498
+ * `factory_generate_doc_intro`: Generates a documentation intro using a `DocFactory` instance.
499
+
500
+ ### Usage Example
501
+ The `Manager` class is used in the `if __name__ == "__main__":` block to generate documentation for a project. The example demonstrates how to create a `Manager` instance, generate a code mix file, global info file, documentation parts, and finally, a documentation intro using a `DocFactory` instance.
502
+
503
+ ```python
504
+ with Progress(
+     SpinnerColumn(),
+     TextColumn("[progress.description]{task.description}"),
+     BarColumn(),
+     TaskProgressColumn(),
+ ) as progress:
+     project_settings = ProjectSettings("Auto Doc Generator")
+     project_settings.add_info(
+         "global idea",
+         """This project was created to help developers make documentation for their projects""",
+     )
+     manager = Manager(
+         r"C:\Users\huina\Python Projects\Impotant projects\AutoDocGenerateGimini",
+         project_settings,
+         ignore_list,
+         progress_bar=LibProgress(progress),
+         language="en",
+     )
+
+     manager.generate_code_file()
+     manager.generate_global_info_file(use_async=True, max_symbols=5000)
+     manager.generete_doc_parts(use_async=True, max_symbols=4000)
+     manager.factory_generate_doc_intro(
+         DocFactory(
+             IntroLinks(),
+             IntroText(),
+         )
+     )
526
+ ```
527
+
528
+ ### Key Points
529
+ * The `Manager` class is designed to be flexible and reusable for different projects.
530
+ * The `generate_code_file`, `generate_global_info_file`, and `generete_doc_parts` methods can be used asynchronously by passing `use_async=True`.
531
+ * The `factory_generate_doc_intro` method uses a `DocFactory` instance to generate a documentation intro.
532
+ * The `Manager` class uses a `BaseProgress` instance to display progress during the documentation generation process.
533
+
534
+ <a name="code_mix"></a>
535
+ ## `preprocessor/code_mix.py` – Repository‑Mixer Component
536
+
537
+ **Purpose in the Auto Doc Generator**
538
+ `CodeMix` is the first step of the documentation pipeline. It walks a project's source tree and writes a **human‑readable directory listing** followed by the raw contents of every non‑ignored file into a single text blob. This blob (`codemix.txt`) is later consumed by the **compressor** (`preprocessor/compressor.py`), which splits it on the `<file path="…">` markers and feeds the fragments to the LLM for progressive summarisation.
539
+
540
+ ### Core Class: `CodeMix`
541
+
542
+ | Method | Responsibility & Key Behaviour |
+ |--------|--------------------------------|
544
+ | `__init__(root_dir=".", ignore_patterns=None)` | Initialise the mixer. <br> * `root_dir` → absolute `Path` of the repository root. <br> * `ignore_patterns` → list of glob patterns (e.g., `*.pyc`, `venv`) that define files/folders to skip. |
545
+ | `should_ignore(path: str) -> bool` | Decide whether a given `Path` should be excluded. <br> * Computes the path relative to `root_dir`. <br> * Checks the relative string, its basename, and every path component against all glob patterns using `fnmatch`. |
546
+ | `build_repo_content(output_file="repomix-output.txt")` | Generate the mixed repository file. <br> * Writes a **tree view** (`Repository Structure:`) with indentation reflecting directory depth. <br> * Inserts a separator line (`====================`). <br> * For each file that passes `should_ignore`, writes a marker `<file path="relative/path">` followed by the file's text (UTF‑8, errors ignored). <br> * On read errors, logs a line `Error reading <path>: <exception>` instead of aborting. |
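The matching logic of `should_ignore` can be approximated with the standard library's `fnmatch`, consistent with the behaviour described above (a sketch; the real method lives on the `CodeMix` class and its signature may differ):

```python
import fnmatch
from pathlib import Path

def should_ignore(path: str, root_dir: str, patterns: list[str]) -> bool:
    """True if the relative path, its basename, or any component matches a glob."""
    rel = Path(path).relative_to(root_dir)
    candidates = [str(rel), rel.name, *rel.parts]
    return any(fnmatch.fnmatch(c, pat) for pat in patterns for c in candidates)
```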
547
+
548
+ ### Interaction with Other Modules
549
+
550
+ 1. **Input** – The component receives the absolute path to the project (`root_dir`) and a list of ignore patterns (`ignore_list` defined at the bottom of the file).
551
+ 2. **Output** – A plain‑text file (by default `repomix-output.txt`, commonly renamed to `codemix.txt`). Its format is:
552
+
553
+ ```
554
+ Repository Structure:
555
+ src/
+     main.py
+     utils/
+         helpers.py
559
+ ====================
560
+
561
+ <file path="src/main.py">
562
+ <file contents …>
563
+
564
+ <file path="src/utils/helpers.py">
565
+ <file contents …>
566
+ ```
567
+ 3. **Downstream consumption** – `preprocessor/compressor.compress_to_one` reads this file, splits on `<file path="` to obtain a list of *per‑file blocks*, and then iteratively asks the LLM to compress them. The `ProjectSettings` object supplies the system prompt that guides the LLM, while the `History` object tracks the conversation.
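Splitting the dump back into per‑file blocks is a matter of cutting on the marker. A sketch of the consumer side (the function name `split_codemix` is illustrative):

```python
def split_codemix(text: str) -> list[str]:
    """Return one '<file path="...">' block per source file, dropping the tree header."""
    parts = text.split('<file path="')
    # parts[0] is the tree view and separator; re-attach the marker to each block
    return ['<file path="' + p for p in parts[1:]]
```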
568
+
569
+ ### Assumptions & Side Effects
570
+
571
+ * **Assumptions** – The repository fits in memory when split into fragments; all source files are UTF‑8‑compatible (binary files are ignored via patterns).
572
+ * **Side effects** – Writes (or overwrites) `output_file`. May produce additional lines for files that raise exceptions during reading (e.g., permission errors).
573
+
574
+ ### Typical Usage
575
+
576
+ ```python
577
+ from preprocessor.code_mix import CodeMix, ignore_list
578
+
579
+ mixer = CodeMix(root_dir="path/to/project", ignore_patterns=ignore_list)
580
+ mixer.build_repo_content("codemix.txt") # creates the mixed dump
581
+ print("Repository dump ready for compression.")
582
+ ```
583
+
584
+ The generated `codemix.txt` becomes the **single source of truth** for the rest of the Auto Doc Generator, enabling the system to turn an entire codebase into concise, LLM‑crafted documentation.
585
+
586
+ <a name="compressor-overview"></a>
587
+ ## 📦 compressor – Core Compression Engine
588
+
589
+ The **compressor** module implements the *iterative reduction* stage of the Auto Doc Generator pipeline.
590
+ After `preprocessor.code_mix` has emitted a list of per‑file text blocks, this module repeatedly sends groups of those blocks to the LLM (via `GPTModel` / `AsyncGPTModel`) and merges the returned summaries until a single, project‑wide documentation string remains.
591
+
592
+ It is the bridge between raw source‑code blobs and the final markdown/HTML that downstream *post‑process* modules consume.
593
+
594
+ ---
595
+
596
+ <a name="compress"></a>
597
+ ### `compress(data: str, project_settings: ProjectSettings, model: Model, compress_power) -> str`
598
+
599
+ * **Responsibility** – Build a three‑message prompt (system + system + user) and ask the model to *compress* the supplied `data`.
600
+ * **Inputs**
601
+ * `data` – raw text of a single file (or a concatenated chunk).
602
+ * `project_settings` – provides `prompt` (project‑specific system prompt).
603
+ * `model` – an instantiated `GPTModel` (sync) or `AsyncGPTModel` (async) that implements `get_answer_without_history`.
604
+ * `compress_power` – integer controlling the “detail level” that is baked into the system prompt via `get_BASE_COMPRESS_TEXT`.
605
+ * **Outputs** – The model’s answer string, i.e. a concise summary of `data`.
606
+ * **Side‑effects** – None (the model call is stateless; no history is updated).
607
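A minimal runnable sketch of this flow; the stub model, stub settings, and the wording of `get_BASE_COMPRESS_TEXT` are illustrative stand-ins, not the package's actual implementations:

```python
def get_BASE_COMPRESS_TEXT(compress_power: int) -> str:
    # Stand-in for the real prompt builder; the actual wording lives in the package.
    return f"Compress the following text. Detail level: {compress_power}."

class StubModel:
    """Stand-in for GPTModel; echoes the user message so the flow is runnable."""
    def get_answer_without_history(self, messages: list[dict]) -> str:
        return "summary of: " + messages[-1]["content"][:30]

class StubSettings:
    prompt = "Project Name: Demo"

def compress(data: str, project_settings, model, compress_power: int) -> str:
    # Three-message prompt: detail-level system prompt, project system prompt, user data.
    messages = [
        {"role": "system", "content": get_BASE_COMPRESS_TEXT(compress_power)},
        {"role": "system", "content": project_settings.prompt},
        {"role": "user", "content": data},
    ]
    return model.get_answer_without_history(messages)
```

Because `get_answer_without_history` receives the full message list on every call, no conversation state accumulates between calls.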
+
608
+ ---
609
+
610
+ <a name="compress_and_compare"></a>
611
+ ### `compress_and_compare(data: list, project_settings: ProjectSettings, compress_power: int = 4, progress_bar: BaseProgress = BaseProgress()) -> list`
612
+
613
+ * **Responsibility** – Synchronously compress a *list* of file blocks, grouping every `compress_power` consecutive elements and concatenating their compressed results into the buckets of a new list (`compress_and_compare_data`).
614
+ * **Workflow**
615
+ 1. Initialise a result list sized `ceil(len(data)/compress_power)`.
616
+ 2. Create a sub‑task on the supplied `progress_bar`.
617
+ 3. Reuse a single `GPTModel` instance for all calls (reduces API overhead).
618
+ 4. For each element `el` in `data`, compute its chunk index `i // compress_power` and append `compress(el, …) + "\n"` to the appropriate bucket.
619
+ 5. Update the progress bar after each compression.
620
+ 6. Remove the sub‑task and return the bucket list.
621
+
622
+ * **Assumptions** – `compress_power` ≥ 2; `data` contains non‑empty strings.
623
+
624
+ ---
625
+
626
+ <a name="async_compress"></a>
627
+ ### `async_compress(data: str, project_settings: ProjectSettings, model: AsyncModel, compress_power, semaphore, progress_bar: BaseProgress) -> str`
628
+
629
+ * **Responsibility** – Async counterpart of `compress`.
630
+ * **Key Details**
631
+ * The coroutine acquires the supplied `semaphore` (default limit 4) to bound concurrent LLM calls.
632
+ * Builds the same three‑message prompt and awaits `model.get_answer_without_history`.
633
+ * Updates the progress bar once the LLM response arrives.
634
+
635
+ ---
636
+
637
+ <a name="async_compress_and_compare"></a>
638
+ ### `async_compress_and_compare(data: list, project_settings: ProjectSettings, compress_power: int = 4, progress_bar: BaseProgress = BaseProgress()) -> list`
639
+
640
+ * **Responsibility** – Parallel‑execute `async_compress` for every element of `data`.
641
+ * **Logic Flow**
642
+ 1. Create a semaphore (max 4 concurrent requests) and a single `AsyncGPTModel`.
643
+ 2. Queue a coroutine for each element (`tasks`).
644
+ 3. `await asyncio.gather(*tasks)` → `compressed_elements`.
645
+ 4. Re‑assemble the elements into chunks of size `compress_power`, joining them with newline characters to mimic the synchronous bucket layout.
646
+ 5. Return the list of combined strings.
647
+
648
+ * **Side‑effects** – Progress bar sub‑task is created/removed; LLM calls are performed concurrently.
649
+
650
+ ---
651
+
652
+ <a name="compress_to_one"></a>
653
+ ### `compress_to_one(data: list, project_settings: ProjectSettings, compress_power: int = 4, use_async: bool = False, progress_bar: BaseProgress = BaseProgress()) -> str`
654
+
655
+ * **Responsibility** – Orchestrate the *iterative* compression loop until only one document remains.
656
+ * **Algorithm**
657
+ ```text
+ count_of_iter = 0
+ while len(data) > 1:
+     if len(data) < compress_power + 1:
+         new_compress_power = 2  # fall-back for small tails
+     else:
+         new_compress_power = compress_power
+ 
+     if use_async:
+         data = async_compress_and_compare(..., new_compress_power)
+     else:
+         data = compress_and_compare(..., new_compress_power)
+ 
+     count_of_iter += 1
+ return data[0]
+ ```
672
+ * **Inputs** – Same as the helper functions; `use_async` toggles the sync vs async pipeline.
673
+ * **Outputs** – A single string containing the fully compressed project documentation.
674
+ * **Side‑effects** – Progress bar updates; multiple LLM calls (sync or async) are issued; internal counters (`count_of_iter`) are for debugging/metrics only.
675
+
676
+ ---
677
+
678
+ <a name="integration"></a>
679
+ ## 🔗 Interaction with the Rest of the System
680
+
681
+ | Component | How it uses *compressor* |
682
+ |-----------|--------------------------|
683
+ | **preprocessor.code_mix** | Generates the initial `list[str]` (raw file blocks) that is fed into `compress_to_one`. |
684
+ | **preprocessor.settings** | Supplies `ProjectSettings.prompt`, which is merged into every LLM request. |
685
+ | **engine.models.gpt_model** | Provides `GPTModel` / `AsyncGPTModel` with the `get_answer_without_history` method used throughout. |
686
+ | **ui.progress_base** | Optional visual feedback; the compressor creates and updates sub‑tasks but works without it (defaults to a no‑op implementation). |
687
+ | **postprocess** | Receives the final single string from `compress_to_one` for heading extraction, intro generation, etc. |
688
+
689
+ ---
690
+
691
+ <a name="usage-notes"></a>
692
+ ## 🚀 Typical Usage Pattern
693
+
694
+ ```python
695
+ from preprocessor.compressor import compress_to_one
696
+ from preprocessor.settings import ProjectSettings
697
+ from ui.progress_base import BaseProgress
698
+
699
+ # `file_blocks` is the list produced by CodeMix (raw per‑file text)
700
+ project_settings = ProjectSettings(project_name="MyApp", info={...})
701
+
702
+ final_doc = compress_to_one(
+     data=file_blocks,
+     project_settings=project_settings,
+     compress_power=4,  # tune for token budget
+     use_async=True,    # leverage async for speed
+     progress_bar=BaseProgress()
+ )
709
+ ```
710
+
711
+ The function will automatically shrink the list, respect the token limits (via `get_BASE_COMPRESS_TEXT`), and return the ready‑to‑post‑process documentation.
712
+
713
+ ---
714
+
715
+ *All symbols and behaviours described are aligned with the global architecture of the **Auto Doc Generator** project.*
716
+
717
+ ## <a name="postprocess"></a> Post-processing Module
718
+ The post-processing module is responsible for generating markdown anchors, extracting topics and links, and creating introductions for the documentation.
719
+
720
+ ### Functions
721
+
722
+ * **`generate_markdown_anchor(header: str) -> str`**: Builds a markdown anchor from a header: lowercases it, replaces spaces with hyphens, and strips any non-alphanumeric characters.
+ * **`get_all_topics(data: str) -> list[str]`**: Finds every "\n## " heading in the data string and returns the topic names together with their corresponding markdown anchors.
+ * **`get_all_html_links(data: str) -> list[str]`**: Finds every occurrence of "<a name=" in the data string and returns the list of link names.
+ * **`get_links_intro(links: list[str], language: str = "en")`**: Generates an introduction for a list of links using a GPT model; the prompt combines the language and the links.
+ * **`get_introdaction(global_data: str, language: str = "en") -> str`**: Generates an introduction for a given global data string using a GPT model; the prompt combines the language and the global data.
727
+
728
+ ### Example Usage
729
+
730
+ ```python
731
+ topics, links = get_all_topics(data)
732
+ print(topics) # Output: ["Topic 1", "Topic 2", ...]
733
+ print(links) # Output: ["#topic-1", "#topic-2", ...]
734
+
735
+ html_links = get_all_html_links(data)
736
+ print(html_links) # Output: ["link-1", "link-2", ...]
737
+
738
+ intro = get_links_intro(links)
739
+ print(intro) # Output: "Introduction to links..."
740
+
741
+ intro = get_introdaction(global_data)
742
+ print(intro) # Output: "Introduction to global data..."
743
+ ```
744
+
745
+ ## <a name="settings"></a> Project Settings
746
+ The project settings class is responsible for storing project metadata and generating a prompt for the GPT model.
747
+
748
+ ### Class
749
+
750
+ * **`ProjectSettings`**: This class has the following properties and methods:
751
+ * **`__init__(project_name: str)`**: Initializes the project settings with a project name.
752
+ * **`add_info(key, value)`**: Adds a key-value pair to the project info dictionary.
753
+ * **`prompt`**: A property that returns the prompt for the GPT model.
754
+
755
+ ### Example Usage
756
+
757
+ ```python
758
+ project_settings = ProjectSettings("My Project")
759
+ project_settings.add_info("author", "John Doe")
760
+ print(project_settings.prompt) # Output: "Project Name: My Project\nauthor: John Doe\n"
761
+ ```
762
+
763
+ ## <a name="spliter"></a> Data Splitter
764
+ The data splitter module is responsible for splitting a large data string into smaller chunks.
765
+
766
+ ### Functions
767
+
768
+ * **`split_data(data: str, max_symbols: int) -> list[str]`**: This function splits a data string into chunks of a maximum size.
769
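A character-count sketch of `split_data`; the real implementation may prefer splitting on line boundaries, but the size bound is the documented contract:

```python
def split_data(data: str, max_symbols: int) -> list[str]:
    # Slice the string into consecutive chunks of at most max_symbols characters.
    return [data[i:i + max_symbols] for i in range(0, len(data), max_symbols)]
```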
+
770
+ ### Example Usage
771
+
772
+ ```python
773
+ chunks = split_data(data, 1000)
774
+ print(chunks) # Output: ["chunk-1", "chunk-2", ...]
775
+ ```
776
+
777
+ <a name="documentation"></a>
778
+ ## Documentation
779
+
780
+ This module generates documentation for a codebase piece by piece: it splits the combined code mix into size-bounded chunks, asks an LLM to document each chunk, and merges the per-chunk results into a single document.
781
+
782
+ ### Component Overview
783
+
784
+ The module consists of several key components:
785
+
786
+ * `split_data`: A function responsible for splitting the input code mix into smaller, manageable parts based on a maximum symbol limit.
787
+ * `write_docs_by_parts` and `async_write_docs_by_parts`: Functions that generate documentation for each part of the split code mix using a model (either synchronous or asynchronous).
788
+ * `gen_doc_parts` and `async_gen_doc_parts`: Functions that orchestrate the generation of documentation for the entire code mix by splitting the data, generating documentation for each part, and combining the results.
789
+
790
+
798
+ ### Logic Flow
799
+
800
+ The logic flow of the system can be summarized as follows:
801
+
802
+ 1. The input code mix is split into smaller parts using the `split_data` function.
803
+ 2. For each part, the `write_docs_by_parts` or `async_write_docs_by_parts` function is called to generate documentation using a model.
804
+ 3. The generated documentation for each part is combined to produce the final documentation for the entire code mix.
805
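The three steps can be sketched with stand-ins for `split_data` and the per-part model call (a real `write_docs_by_parts` call would send each part together with `global_info` to the model):

```python
def gen_doc_parts(full_code_mix: str, global_info: str, max_symbols: int) -> str:
    # Step 1: split the code mix into size-bounded parts (stand-in for split_data).
    parts = [full_code_mix[i:i + max_symbols]
             for i in range(0, len(full_code_mix), max_symbols)]
    # Step 2: document each part; the model call is replaced by a placeholder string.
    docs = [f"doc for {len(part)} chars" for part in parts]
    # Step 3: combine the per-part documentation.
    return "\n".join(docs)
```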
+
806
+ ### Important Assumptions and Inputs
807
+
808
+ * The input code mix is expected to be a string containing the codebase to be documented.
809
+ * The maximum symbol limit is used to determine the size of each part of the split code mix.
810
+ * The model used for generating documentation is assumed to be an LLM that accepts a prompt built from the input code mix and returns prose documentation.
811
+
812
+ ### Outputs and Side Effects
813
+
814
+ * The final output of the system is a string containing the generated documentation for the entire code mix.
815
+ * The system may have side effects, such as creating temporary files or updating progress bars, depending on the implementation of the `progress_bar` component.
816
+
817
+ ### Example Usage
818
+
819
+ ```python
820
+ # Example usage of the gen_doc_parts function
821
+ full_code_mix = "Example code mix"
822
+ global_info = "Example global information"
823
+ max_symbols = 1000
824
+ language = "en"
825
+ progress_bar = BaseProgress()
826
+
827
+ result = gen_doc_parts(full_code_mix, global_info, max_symbols, language, progress_bar)
828
+ print(result)
829
+ ```
830
+
831
+ ```python
832
+ # Example usage of the async_gen_doc_parts function
833
+ import asyncio
834
+
835
+ full_code_mix = "Example code mix"
836
+ global_info = "Example global information"
837
+ max_symbols = 1000
838
+ language = "en"
839
+ progress_bar = BaseProgress()
840
+
841
+ async def main():
+     result = await async_gen_doc_parts(full_code_mix, global_info, max_symbols, language, progress_bar)
+     print(result)
844
+
845
+ asyncio.run(main())
846
+ ```
847
+
848
+ ## <a name="progress_base_module"></a> Progress Base Module
849
+ The `progress_base` module provides a foundation for creating progress bars in the application. It defines two classes: `BaseProgress` and `LibProgress`.
850
+
851
+ ### <a name="baseprogress_class"></a> BaseProgress Class
852
+ The `BaseProgress` class serves as a base class for progress bar implementations. It defines the following methods:
853
+ * `__init__`: Initializes the progress bar.
854
+ * `create_new_subtask`: Creates a new subtask in the progress bar. This method should be implemented by subclasses.
855
+ * `update_task`: Updates the progress bar. This method should be implemented by subclasses.
856
+ * `remove_subtask`: Removes a subtask from the progress bar. This method should be implemented by subclasses.
857
+
858
+ ### <a name="libprogress_class"></a> LibProgress Class
859
+ The `LibProgress` class is a concrete implementation of the `BaseProgress` class. It uses the `rich.progress` library to create a progress bar. The class has the following attributes:
860
+ * `progress`: An instance of `rich.progress.Progress`.
861
+ * `_base_task`: The main task in the progress bar.
862
+ * `_cur_sub_task`: The current subtask in the progress bar.
863
+
864
+ The `LibProgress` class implements the following methods:
865
+ * `__init__`: Initializes the progress bar with a main task and an optional total number of tasks.
866
+ * `create_new_subtask`: Creates a new subtask in the progress bar with a given name and total length.
867
+ * `update_task`: Updates the progress bar by advancing the current subtask or the main task if no subtask is active.
868
+ * `remove_subtask`: Removes the current subtask from the progress bar.
869
+
870
+ ### Example Usage
871
+ ```python
872
+ from ui.progress_base import LibProgress
873
+ from rich.progress import Progress
874
+
875
+ # Create a progress bar
876
+ progress = Progress()
877
+ lib_progress = LibProgress(progress, total=10)
878
+
879
+ # Create a new subtask
880
+ lib_progress.create_new_subtask("Subtask 1", 5)
881
+
882
+ # Update the progress bar
883
+ lib_progress.update_task()
884
+
885
+ # Remove the subtask
886
+ lib_progress.remove_subtask()
887
+ ```
888
+ This code creates a progress bar with a main task and a subtask, updates the progress bar, and then removes the subtask.
889
+
890
+