codebase-extractor 1.1.0__tar.gz → 1.2.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {codebase_extractor-1.1.0/src/codebase_extractor.egg-info → codebase_extractor-1.2.0}/PKG-INFO +40 -36
- {codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/README.md +38 -34
- {codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/pyproject.toml +3 -3
- codebase_extractor-1.2.0/src/codebase_extractor/__init__.py +1 -0
- {codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/src/codebase_extractor/cli.py +3 -2
- {codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/src/codebase_extractor/config.py +35 -3
- {codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/src/codebase_extractor/file_handler.py +61 -21
- {codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/src/codebase_extractor/main_logic.py +36 -13
- {codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/src/codebase_extractor/ui.py +12 -26
- {codebase_extractor-1.1.0 → codebase_extractor-1.2.0/src/codebase_extractor.egg-info}/PKG-INFO +40 -36
- codebase_extractor-1.2.0/src/codebase_extractor.egg-info/entry_points.txt +2 -0
- codebase_extractor-1.1.0/src/codebase_extractor/__init__.py +0 -1
- codebase_extractor-1.1.0/src/codebase_extractor.egg-info/entry_points.txt +0 -2
- {codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/LICENCE +0 -0
- {codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/setup.cfg +0 -0
- {codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/src/codebase_extractor.egg-info/SOURCES.txt +0 -0
- {codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/src/codebase_extractor.egg-info/dependency_links.txt +0 -0
- {codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/src/codebase_extractor.egg-info/requires.txt +0 -0
- {codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/src/codebase_extractor.egg-info/top_level.txt +0 -0
{codebase_extractor-1.1.0/src/codebase_extractor.egg-info → codebase_extractor-1.2.0}/PKG-INFO
RENAMED
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codebase-extractor
-Version: 1.
+Version: 1.2.0
 Summary: A CLI tool to extract project source code into structured Markdown files for LLM & AI context.
 Author: Lukasz Lekowski
 Project-URL: Homepage, https://github.com/lukaszlekowski/codebase-extractor
@@ -10,7 +10,7 @@ Classifier: License :: OSI Approved :: MIT License
 Classifier: Operating System :: OS Independent
 Classifier: Topic :: Software Development :: Documentation
 Classifier: Topic :: Utilities
-Requires-Python: >=3.
+Requires-Python: >=3.14
 Description-Content-Type: text/markdown
 License-File: LICENCE
 Requires-Dist: questionary
@@ -75,12 +75,12 @@ The tool is highly configurable, allowing you to select specific folders, exclud
 ## ✨ Key Features
 
 - **Interactive & User-Friendly:** A guided, multi-step CLI experience that makes selecting options simple and clear.
+- **Quick Start by Default:** The tool starts without delay. Detailed instructions are available via an `--instructions` flag when you need a reminder.
 - **Smart Filtering:** Automatically excludes common dependency folders, build artifacts, version control directories, and IDE configuration files. The exact filters are configurable.
 - **Flexible Selection Modes:** Choose to extract the entire project with one command, or dive into a specific selection mode.
-- **🌳
+- **🌳 Visual Tree Selection:** Interactively browse and select specific sub-folders from a clear, pipe-based tree structure.
 - **🔢 Configurable Scan Depth:** You decide how many levels deep the script should look for folders when building the selection tree.
-- **YAML Metadata:** Each generated Markdown file is prepended with a YAML front matter block containing useful metadata like a unique run ID, timestamp,
-- **🚀 Quick Start Mode:** Use the `--no-instructions` flag to skip the detailed intro guide on subsequent runs.
+- **Rich YAML Metadata:** Each generated Markdown file is prepended with a YAML front matter block containing useful metadata like a unique run ID, timestamp, file count, character count, and word count.
 - **Safe & Robust:** Features graceful exit handling (`Ctrl+C`) and provides clear feedback during the extraction process.
 
 ---
@@ -165,15 +165,15 @@ pipx install codebase-extractor
 Once installed, you can run the tool from any terminal window. Navigate to your project's root directory and run the command:
 
 ```bash
-
+codebase-extractor
 ```
 
-The script will
+The script will launch immediately and guide you through the extraction process.
 
-For
+For a detailed guide on how the script works, you can use the `--instructions` flag:
 
 ```bash
-
+codebase-extractor --instructions
 ```
 
 ### The Process
@@ -193,25 +193,25 @@ The tool will guide you through a series of prompts:
 
 ### Output Details
 
-All output files are saved in a `CODEBASE_EXTRACTS` directory within your project folder. Each generated Markdown file includes a YAML metadata header with a unique reference ID, timestamp, and
+All output files are saved in a `CODEBASE_EXTRACTS` directory within your project folder. Each generated Markdown file includes a YAML metadata header with a unique reference ID, timestamp, file count, character count, and word count for easy tracking and parsing.
 
 ### ⚡ CLI Command Reference
 
 For non-interactive use and automation, you can control the script entirely with these arguments.
 
-| Argument
-|
-|
-| `--root <path>`
-| `--output-dir <name>`
-| `--dry-run`
-| `-v`, `--verbose`
-| `--log-file <path>`
-| `--exclude-large-files`
-| `--mode <mode>`
-| `--depth <number>`
-| `--select-folders <list>`
-| `--select-root`
+| Argument | Description | Default Value |
+| :------------------------ | :--------------------------------------------------------------------------- | :-------------------------- |
+| `--instructions` | Show the detailed instruction guide on startup. | `False` |
+| `--root <path>` | The root directory of the project to extract. | The current directory |
+| `--output-dir <name>` | Custom name for the output directory. | `CODEBASE_EXTRACTS` |
+| `--dry-run` | Simulate the extraction process without writing any files. | `False` |
+| `-v`, `--verbose` | Enable verbose logging for debugging. | `False` |
+| `--log-file <path>` | Path to save the log file. | `None` |
+| `--exclude-large-files` | Non-interactive: Exclude files larger than 1MB. | `False` |
+| `--mode <mode>` | Non-interactive: Set the extraction mode. Choices: `everything`, `specific`. | `None` (Interactive prompt) |
+| `--depth <number>` | Non-interactive: Set the folder scan depth for 'specific' mode. | `3` |
+| `--select-folders <list>` | Non-interactive: A space-separated list of folders/sub-folders to extract. | `[]` |
+| `--select-root` | Non-interactive: Include files from the root directory in the extraction. | `False` |
 
 ---
 
@@ -224,7 +224,7 @@ Here are a few practical examples of how to use the tool from your command line.
 A common command for quick, automated runs.
 
 ```bash
-
+codebase-extractor --mode everything
 ```
 
 - #### Extract specific sub-folders non-interactively
@@ -232,7 +232,7 @@ Here are a few practical examples of how to use the tool from your command line.
 This command extracts only the `src/components` and `src/hooks` directories, plus any files in the root.
 
 ```bash
-
+codebase-extractor --mode specific --select-folders src/components src/hooks --select-root
 ```
 
 - #### Perform a safe dry run
@@ -240,13 +240,13 @@ Here are a few practical examples of how to use the tool from your command line.
 This will simulate a full extraction and print what it _would_ have done, without creating any files.
 
 ```bash
-
+codebase-extractor --dry-run --mode everything
 ```
 
 - #### Run on a different project and save to a custom folder
 This targets a completely different directory and specifies a custom output folder name.
 ```bash
-
+codebase-extractor --root /path/to/another/project --output-dir MyProject_Extraction
 ```
 
 ---
@@ -256,10 +256,11 @@ Here are a few practical examples of how to use the tool from your command line.
 The tool uses a set of rules to determine which files and folders to include in the extraction. Here are the default settings found in the `config.py` file.
 
 <details>
-
+<summary><strong>Click to view Excluded Directories</strong></summary>
 
 - `node_modules`, `vendor`, `__pycache__`, `dist`, `build`, `target`, `.next`
 - `.git`, `.svn`, `.hg`, `.vscode`, `.idea`, `venv`, `.venv`
+- `.dart_tool`, `.gradle`, `Pods`, `DerivedData`
 
 </details>
 
@@ -273,17 +274,20 @@ The tool uses a set of rules to determine which files and folders to include in
 <details>
 <summary><strong>Click to view Allowed Filenames & Extensions</strong></summary>
 
-
+The script will process any file with one of the following extensions. It also explicitly allows common configuration files that may not have an extension.
 
 **Allowed Filenames:**
-- `dockerfile`, `.gitignore`, `.htaccess`, `makefile`
+- `dockerfile`, `.gitignore`, `.htaccess`, `makefile`, `.dockerignore`, `.env.example`
+- `podfile`, `gemfile`, `jenkinsfile`, `gradlew`
 
 **Allowed Extensions:**
-- `.php`, `.html`, `.css`, `.js`, `.jsx`, `.ts`, `.tsx`, `.vue`, `.svelte`
-- `.py`, `.rb`, `.java`, `.c`, `.cpp`, `.cs`, `.go`, `.rs`
-- `.json`, `.xml`, `.yaml`, `.yml`, `.toml`, `.ini`, `.conf`
-- `.md`, `.txt`, `.rst`, `.twig`, `.blade`, `.handlebars`, `.mustache`, `.ejs`
-- `.sql`, `.graphql`, `.gql`, `.tf`
+- **Web & General:** `.php`, `.html`, `.css`, `.js`, `.jsx`, `.ts`, `.tsx`, `.vue`, `.svelte`
+- **Backend & Systems:** `.py`, `.rb`, `.java`, `.c`, `.cpp`, `.cs`, `.go`, `.rs`
+- **Config & Data:** `.json`, `.xml`, `.yaml`, `.yml`, `.toml`, `.ini`, `.conf`
+- **Docs & Templates:** `.md`, `.txt`, `.rst`, `.twig`, `.blade`, `.handlebars`, `.mustache`, `.ejs`
+- **Database & IaC:** `.sql`, `.graphql`, `.gql`, `.tf`
+- **Mobile (Flutter, Android, iOS):** `.dart`, `.arb`, `.gradle`, `.properties`, `.plist`, `.xcconfig`
+- **Scripts:** `.sh`, `.bat`
 
 </details>
 
@@ -291,7 +295,7 @@ The tool uses a set of rules to determine which files and folders to include in
 
 ## 🤔 Troubleshooting
 
-- **Problem:** After installation, I run `
+- **Problem:** After installation, I run `codebase-extractor` and my terminal says `command not found`.
 - **Solution:** This is usually a `PATH` issue. It means your system's shell doesn't know where to find the installed script. The `pip install --user` command sometimes requires you to add a local scripts directory to your `PATH`. Please refer to your operating system's documentation for instructions on how to modify your `PATH` environment variable.
 
 - **Problem:** The tool ran, but a specific folder or file I expected to see is missing from the output.
{codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/README.md
RENAMED
@@ -55,12 +55,12 @@ The tool is highly configurable, allowing you to select specific folders, exclud
 ## ✨ Key Features
 
 - **Interactive & User-Friendly:** A guided, multi-step CLI experience that makes selecting options simple and clear.
+- **Quick Start by Default:** The tool starts without delay. Detailed instructions are available via an `--instructions` flag when you need a reminder.
 - **Smart Filtering:** Automatically excludes common dependency folders, build artifacts, version control directories, and IDE configuration files. The exact filters are configurable.
 - **Flexible Selection Modes:** Choose to extract the entire project with one command, or dive into a specific selection mode.
-- **🌳
+- **🌳 Visual Tree Selection:** Interactively browse and select specific sub-folders from a clear, pipe-based tree structure.
 - **🔢 Configurable Scan Depth:** You decide how many levels deep the script should look for folders when building the selection tree.
-- **YAML Metadata:** Each generated Markdown file is prepended with a YAML front matter block containing useful metadata like a unique run ID, timestamp,
-- **🚀 Quick Start Mode:** Use the `--no-instructions` flag to skip the detailed intro guide on subsequent runs.
+- **Rich YAML Metadata:** Each generated Markdown file is prepended with a YAML front matter block containing useful metadata like a unique run ID, timestamp, file count, character count, and word count.
 - **Safe & Robust:** Features graceful exit handling (`Ctrl+C`) and provides clear feedback during the extraction process.
 
 ---
@@ -145,15 +145,15 @@ pipx install codebase-extractor
 Once installed, you can run the tool from any terminal window. Navigate to your project's root directory and run the command:
 
 ```bash
-
+codebase-extractor
 ```
 
-The script will
+The script will launch immediately and guide you through the extraction process.
 
-For
+For a detailed guide on how the script works, you can use the `--instructions` flag:
 
 ```bash
-
+codebase-extractor --instructions
 ```
 
 ### The Process
@@ -173,25 +173,25 @@ The tool will guide you through a series of prompts:
 
 ### Output Details
 
-All output files are saved in a `CODEBASE_EXTRACTS` directory within your project folder. Each generated Markdown file includes a YAML metadata header with a unique reference ID, timestamp, and
+All output files are saved in a `CODEBASE_EXTRACTS` directory within your project folder. Each generated Markdown file includes a YAML metadata header with a unique reference ID, timestamp, file count, character count, and word count for easy tracking and parsing.
 
 ### ⚡ CLI Command Reference
 
 For non-interactive use and automation, you can control the script entirely with these arguments.
 
-| Argument
-|
-|
-| `--root <path>`
-| `--output-dir <name>`
-| `--dry-run`
-| `-v`, `--verbose`
-| `--log-file <path>`
-| `--exclude-large-files`
-| `--mode <mode>`
-| `--depth <number>`
-| `--select-folders <list>`
-| `--select-root`
+| Argument | Description | Default Value |
+| :------------------------ | :--------------------------------------------------------------------------- | :-------------------------- |
+| `--instructions` | Show the detailed instruction guide on startup. | `False` |
+| `--root <path>` | The root directory of the project to extract. | The current directory |
+| `--output-dir <name>` | Custom name for the output directory. | `CODEBASE_EXTRACTS` |
+| `--dry-run` | Simulate the extraction process without writing any files. | `False` |
+| `-v`, `--verbose` | Enable verbose logging for debugging. | `False` |
+| `--log-file <path>` | Path to save the log file. | `None` |
+| `--exclude-large-files` | Non-interactive: Exclude files larger than 1MB. | `False` |
+| `--mode <mode>` | Non-interactive: Set the extraction mode. Choices: `everything`, `specific`. | `None` (Interactive prompt) |
+| `--depth <number>` | Non-interactive: Set the folder scan depth for 'specific' mode. | `3` |
+| `--select-folders <list>` | Non-interactive: A space-separated list of folders/sub-folders to extract. | `[]` |
+| `--select-root` | Non-interactive: Include files from the root directory in the extraction. | `False` |
 
 ---
 
@@ -204,7 +204,7 @@ Here are a few practical examples of how to use the tool from your command line.
 A common command for quick, automated runs.
 
 ```bash
-
+codebase-extractor --mode everything
 ```
 
 - #### Extract specific sub-folders non-interactively
@@ -212,7 +212,7 @@ Here are a few practical examples of how to use the tool from your command line.
 This command extracts only the `src/components` and `src/hooks` directories, plus any files in the root.
 
 ```bash
-
+codebase-extractor --mode specific --select-folders src/components src/hooks --select-root
 ```
 
 - #### Perform a safe dry run
@@ -220,13 +220,13 @@ Here are a few practical examples of how to use the tool from your command line.
 This will simulate a full extraction and print what it _would_ have done, without creating any files.
 
 ```bash
-
+codebase-extractor --dry-run --mode everything
 ```
 
 - #### Run on a different project and save to a custom folder
 This targets a completely different directory and specifies a custom output folder name.
 ```bash
-
+codebase-extractor --root /path/to/another/project --output-dir MyProject_Extraction
 ```
 
 ---
@@ -236,10 +236,11 @@ Here are a few practical examples of how to use the tool from your command line.
 The tool uses a set of rules to determine which files and folders to include in the extraction. Here are the default settings found in the `config.py` file.
 
 <details>
-
+<summary><strong>Click to view Excluded Directories</strong></summary>
 
 - `node_modules`, `vendor`, `__pycache__`, `dist`, `build`, `target`, `.next`
 - `.git`, `.svn`, `.hg`, `.vscode`, `.idea`, `venv`, `.venv`
+- `.dart_tool`, `.gradle`, `Pods`, `DerivedData`
 
 </details>
 
@@ -253,17 +254,20 @@ The tool uses a set of rules to determine which files and folders to include in
 <details>
 <summary><strong>Click to view Allowed Filenames & Extensions</strong></summary>
 
-
+The script will process any file with one of the following extensions. It also explicitly allows common configuration files that may not have an extension.
 
 **Allowed Filenames:**
-- `dockerfile`, `.gitignore`, `.htaccess`, `makefile`
+- `dockerfile`, `.gitignore`, `.htaccess`, `makefile`, `.dockerignore`, `.env.example`
+- `podfile`, `gemfile`, `jenkinsfile`, `gradlew`
 
 **Allowed Extensions:**
-- `.php`, `.html`, `.css`, `.js`, `.jsx`, `.ts`, `.tsx`, `.vue`, `.svelte`
-- `.py`, `.rb`, `.java`, `.c`, `.cpp`, `.cs`, `.go`, `.rs`
-- `.json`, `.xml`, `.yaml`, `.yml`, `.toml`, `.ini`, `.conf`
-- `.md`, `.txt`, `.rst`, `.twig`, `.blade`, `.handlebars`, `.mustache`, `.ejs`
-- `.sql`, `.graphql`, `.gql`, `.tf`
+- **Web & General:** `.php`, `.html`, `.css`, `.js`, `.jsx`, `.ts`, `.tsx`, `.vue`, `.svelte`
+- **Backend & Systems:** `.py`, `.rb`, `.java`, `.c`, `.cpp`, `.cs`, `.go`, `.rs`
+- **Config & Data:** `.json`, `.xml`, `.yaml`, `.yml`, `.toml`, `.ini`, `.conf`
+- **Docs & Templates:** `.md`, `.txt`, `.rst`, `.twig`, `.blade`, `.handlebars`, `.mustache`, `.ejs`
+- **Database & IaC:** `.sql`, `.graphql`, `.gql`, `.tf`
+- **Mobile (Flutter, Android, iOS):** `.dart`, `.arb`, `.gradle`, `.properties`, `.plist`, `.xcconfig`
+- **Scripts:** `.sh`, `.bat`
 
 </details>
 
@@ -271,7 +275,7 @@ The tool uses a set of rules to determine which files and folders to include in
 
 ## 🤔 Troubleshooting
 
-- **Problem:** After installation, I run `
+- **Problem:** After installation, I run `codebase-extractor` and my terminal says `command not found`.
 - **Solution:** This is usually a `PATH` issue. It means your system's shell doesn't know where to find the installed script. The `pip install --user` command sometimes requires you to add a local scripts directory to your `PATH`. Please refer to your operating system's documentation for instructions on how to modify your `PATH` environment variable.
 
 - **Problem:** The tool ran, but a specific folder or file I expected to see is missing from the output.
{codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/pyproject.toml
RENAMED
@@ -4,14 +4,14 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "codebase-extractor"
-version = "1.
+version = "1.2.0"
 authors = [
   { name="Lukasz Lekowski" },
 ]
 description = "A CLI tool to extract project source code into structured Markdown files for LLM & AI context."
 readme = "README.md"
 license = { file="LICENSE" }
-requires-python = ">=3.
+requires-python = ">=3.14"
 classifiers = [
     "Programming Language :: Python :: 3",
     "License :: OSI Approved :: MIT License",
@@ -27,7 +27,7 @@ dependencies = [
 
 # This creates the `code-extractor` command in the user's terminal
 [project.scripts]
-
+codebase-extractor = "codebase_extractor.main_logic:main"
 
 [project.urls]
 "Homepage" = "https://github.com/lukaszlekowski/codebase-extractor"
codebase_extractor-1.2.0/src/codebase_extractor/__init__.py
@@ -0,0 +1 @@
+__version__ = "1.2.0"
{codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/src/codebase_extractor/cli.py
RENAMED
@@ -10,9 +10,10 @@ def parse_arguments():
 
     # General Flags
     parser.add_argument(
-        '
+        '--instructions',
         action='store_true',
-
+        default=False, # ADDED: This ensures the attribute always exists.
+        help="Show the detailed instruction guide on startup."
     )
     parser.add_argument(
         '--root',
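The hunk above replaces the first general flag with an opt-in `--instructions` switch. A minimal sketch of how such a flag behaves with `argparse` follows. `build_parser` is a hypothetical helper rather than the package's `parse_arguments`; only the flag name, action, default, and help text are taken from the diff. Note that `action='store_true'` already implies a default of `False`, so the explicit `default=False` is redundant but harmless.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the package's parser; only --instructions mirrors the diff."""
    parser = argparse.ArgumentParser(prog="codebase-extractor")
    parser.add_argument(
        "--instructions",
        action="store_true",
        default=False,  # redundant with store_true, kept to match the diff
        help="Show the detailed instruction guide on startup.",
    )
    return parser


# Without the flag the attribute exists and is False; with it, the value is True.
quiet_args = build_parser().parse_args([])
guided_args = build_parser().parse_args(["--instructions"])
print(quiet_args.instructions, guided_args.instructions)
```

Because the attribute is always present on the parsed namespace, calling code can test `args.instructions` directly without a `getattr` fallback.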
{codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/src/codebase_extractor/config.py
RENAMED
@@ -10,30 +10,62 @@ OUTPUT_DIR_NAME = "CODEBASE_EXTRACTS"
 
 # --- FILE/FOLDER LISTS ---
 EXCLUDED_DIRS = {
+    # Standard exclusions
     "node_modules", "vendor", "__pycache__", "dist", "build", "target", ".next",
-    ".git", ".svn", ".hg", ".vscode", ".idea", "venv", ".venv",
+    ".git", ".svn", ".hg", ".vscode", ".idea", "venv", ".venv", ".dart_tool",
+    # Flutter & Mobile specific exclusions
+    ".dart_tool", # Critical: Contains noisy build config
+    ".gradle", # Internal Gradle cache
+    "Pods", # iOS external dependencies
+    "DerivedData", # iOS build artifacts
 }
 EXCLUDED_FILENAMES = {
-    "package-lock.json", "yarn.lock", "composer.lock", ".env"
+    "package-lock.json", "yarn.lock", "composer.lock", ".env", "Podfile.lock",
 }
 ALLOWED_FILENAMES = {
-
+    # General
+    "dockerfile", ".gitignore", ".htaccess", "makefile", ".dockerignore", ".env.example",
+    # Mobile
+    "podfile", "gemfile", "jenkinsfile", "gradlew",
 }
 ALLOWED_EXTENSIONS = {
+    # Web & General
     ".php", ".html", ".css", ".js", ".jsx", ".ts", ".tsx", ".vue", ".svelte",
     ".py", ".rb", ".java", ".c", ".cpp", ".cs", ".go", ".rs", ".json", ".xml",
     ".yaml", ".yml", ".toml", ".ini", ".conf", ".md", ".txt", ".rst", ".twig",
     ".blade", ".handlebars", ".mustache", ".ejs", ".sql", ".graphql", ".gql", ".tf",
+
+    # Flutter / Dart
+    ".dart", ".arb",
+
+    # Android
+    ".gradle", ".properties",
+
+    # iOS
+    ".plist", ".xcconfig",
+
+    # Scripts
+    ".sh", ".bat",
 }
 
 # --- MAPPINGS & CONSTANTS ---
 EXTENSION_LANG_MAP = {
+    # Web & General
     ".js": "javascript", ".ts": "typescript", ".tsx": "tsx", ".py": "python",
     ".html": "html", ".css": "css", ".json": "json", ".md": "markdown", ".txt": "",
     ".sh": "bash", ".yml": "yaml", ".yaml": "yaml", ".php": "php", ".rb": "ruby",
     ".java": "java", ".c": "c", ".cpp": "cpp", ".cs": "csharp", ".go": "go",
     ".rs": "rust", ".vue": "vue", ".svelte": "svelte", ".sql": "sql",
     ".graphql": "graphql", ".gql": "graphql",
+
+    # Mobile Specific
+    ".dart": "dart",
+    ".gradle": "groovy",
+    ".plist": "xml",
+    ".xcconfig": "properties",
+    ".properties": "properties",
+    ".arb": "json",
+    ".bat": "batch",
 }
 MAX_FILE_SIZE_MB = 1
 FILE_COUNT_WARNING_THRESHOLD = 1000
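How the package consumes `EXTENSION_LANG_MAP` is not shown in this diff. One plausible use, sketched below under that assumption, is to look up the lowercased file suffix and fall back to an empty fence language for extensions that are allowed but unmapped (for example `.toml`). `fence_language` is a hypothetical helper, and the mapping here is only an excerpt of the table above.

```python
from pathlib import Path

# Excerpt of the mapping shown in the diff; the real table is longer.
EXTENSION_LANG_MAP = {
    ".py": "python",
    ".dart": "dart",
    ".gradle": "groovy",
    ".plist": "xml",
    ".arb": "json",
    ".bat": "batch",
}


def fence_language(path: Path) -> str:
    """Hypothetical helper: pick a code-fence language, or '' if the suffix is unmapped."""
    return EXTENSION_LANG_MAP.get(path.suffix.lower(), "")


for name in ("lib/main.dart", "android/build.gradle", "ios/Runner/Info.PLIST", "pyproject.toml"):
    print(f"{name} -> {fence_language(Path(name))!r}")
```

Lowercasing the suffix before the lookup matches the case-insensitive extension check that this release adds to `is_allowed_file`.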
{codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/src/codebase_extractor/file_handler.py
RENAMED
@@ -10,29 +10,46 @@ import questionary
 
 
 def get_folder_choices(root_path: Path, max_depth: int) -> list:
-    """Recursively finds folders up to a max depth and prepares them for questionary."""
+    """Recursively finds folders up to a max depth and prepares them for questionary with a visual tree."""
     choices = []
-
-    def scanner(current_path: Path, depth: int):
+
+    def scanner(current_path: Path, prefix: str, depth: int):
+        """A recursive helper to build the folder tree."""
+        # Stop scanning if the maximum depth is reached
         if depth > max_depth:
             return
 
-        relative_path = current_path.relative_to(root_path)
-        prefix = " " * (depth - 1)
-        display_name = f"{prefix}{current_path.name}"
-        choices.append(questionary.Choice(title=display_name, value=relative_path))
-
         try:
-
-
-
+            # Get a sorted list of valid subdirectories
+            subdirs = sorted([
+                p for p in current_path.iterdir()
+                if p.is_dir() and p.name not in config.EXCLUDED_DIRS
+            ])
+
+            # Iterate through the subdirectories to build the tree display
+            for i, subdir in enumerate(subdirs):
+                is_last = (i == len(subdirs) - 1)
+
+                # Use '└─' for the last item and '├─' for others
+                connector = "└─ " if is_last else "├─ "
+                display_name = f"{prefix}{connector}{subdir.name}"
+
+                relative_path = subdir.relative_to(root_path)
+                choices.append(questionary.Choice(title=display_name, value=relative_path))
+
+                # Prepare the prefix for the next level of recursion
+                # Use a blank prefix for children of the last item, and a pipe for others
+                child_prefix = prefix + (" " if is_last else "│ ")
+                scanner(subdir, child_prefix, depth + 1)
+
         except PermissionError:
+            # Silently ignore directories that the user doesn't have permission to read
             pass
 
-
-
-            scanner(folder, 1)
+    # Start the recursive scan from the project's root directory
+    scanner(root_path, prefix="", depth=1)
 
+    # Add the special option to select files in the root folder itself
     root_option_name = f"root [{root_path.name}] (files in root folder only, excl. sub-folders)"
     choices.insert(0, questionary.Choice(title=root_option_name, value="ROOT_SENTINEL"))
 
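The prefix bookkeeping in `scanner` is easier to follow on an in-memory tree. The sketch below applies the same connector and child-prefix rule to a nested dict instead of the filesystem. `render_tree` and the sample layout are hypothetical, and the three-character prefix widths are an assumption, since the exact spacing inside the package's string literals cannot be recovered from this diff.

```python
def render_tree(tree: dict, prefix: str = "", depth: int = 1, max_depth: int = 3) -> list[str]:
    """Hypothetical helper: return display lines for a nested dict of folder names."""
    lines: list[str] = []
    if depth > max_depth:
        return lines
    names = sorted(tree)
    for i, name in enumerate(names):
        is_last = i == len(names) - 1
        connector = "└─ " if is_last else "├─ "
        lines.append(f"{prefix}{connector}{name}")
        # Children of the last entry get blank padding; all others keep a pipe.
        child_prefix = prefix + ("   " if is_last else "│  ")
        lines.extend(render_tree(tree[name], child_prefix, depth + 1, max_depth))
    return lines


sample = {"src": {"components": {}, "hooks": {}}, "tests": {}}
print("\n".join(render_tree(sample)))
```

Passing the prefix down the recursion, instead of deriving it from the depth as version 1.1.0 did, is what lets each row know whether its ancestors were the last entry at their level.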
@@ -49,15 +66,20 @@ def is_allowed_file(path: Path, exclude_large: bool) -> bool:
         return False
     if path.name.lower() in config.EXCLUDED_FILENAMES:
         return False
-    if path.suffix not in config.ALLOWED_EXTENSIONS:
+    if path.suffix.lower() not in config.ALLOWED_EXTENSIONS:
         return False
     if exclude_large and path.stat().st_size > config.MAX_FILE_SIZE_MB * 1024 * 1024:
         return False
     return True
 
 
-def extract_code_from_folder(folder: Path, exclude_large: bool) ->
-    """
+def extract_code_from_folder(folder: Path, exclude_large: bool) -> tuple[str, int, int, int]:
+    """
+    Extracts code from a given folder, respecting EXCLUDED_DIRS at all depths.
+
+    Returns:
+        A tuple containing the content string, file count, char count, and word count.
+    """
     content = f"# Folder: {folder.relative_to(Path.cwd())}\n\n"
     extracted_files = 0
     dirs_to_visit = [folder]
@@ -80,11 +102,21 @@ def extract_code_from_folder(folder: Path, exclude_large: bool) -> (str, int):
                 content += f"\n\n"
     if extracted_files > config.FILE_COUNT_WARNING_THRESHOLD:
         logging.warning(colored(f"> Caution: Large file count in '{folder.name}' ({extracted_files} files).", "yellow"))
-
+
+    # ADDED: Calculate character and word counts
+    char_count = len(content)
+    word_count = len(content.split())
+
+    return content, extracted_files, char_count, word_count
 
 
-def extract_code_from_root(root_path: Path, exclude_large: bool) ->
-    """
+def extract_code_from_root(root_path: Path, exclude_large: bool) -> tuple[str, int, int, int]:
+    """
+    Extracts code only from files present in the root directory.
+
+    Returns:
+        A tuple containing the content string, file count, char count, and word count.
+    """
     content = f"# Root Files: {root_path.name}\n\n"
     extracted_files = 0
     for filepath in sorted(root_path.iterdir()):
@@ -97,7 +129,12 @@ def extract_code_from_root(root_path: Path, exclude_large: bool) -> (str, int):
|
|
|
97
129
|
extracted_files += 1
|
|
98
130
|
if extracted_files > config.FILE_COUNT_WARNING_THRESHOLD:
|
|
99
131
|
logging.warning(colored(f"> Caution: Large file count in root ({extracted_files} files).", "yellow"))
|
|
100
|
-
|
|
132
|
+
|
|
133
|
+
# ADDED: Calculate character and word counts
|
|
134
|
+
char_count = len(content)
|
|
135
|
+
word_count = len(content.split())
|
|
136
|
+
|
|
137
|
+
return content, extracted_files, char_count, word_count
|
|
101
138
|
|
|
102
139
|
|
|
103
140
|
def write_to_markdown_file(content: str, metadata: dict, root_path: Path, output_dir_name: str):
|
|
@@ -116,12 +153,15 @@ def write_to_markdown_file(content: str, metadata: dict, root_path: Path, output
|
|
|
116
153
|
filename = f"{file_base_name}_{timestamp}.md"
|
|
117
154
|
full_filepath = output_dir / filename
|
|
118
155
|
|
|
156
|
+
# CHANGED: Added char_count and word_count to the YAML header
|
|
119
157
|
yaml_header = f"""---
|
|
120
158
|
extraction_details:
|
|
121
159
|
reference: {metadata['run_ref']}
|
|
122
160
|
timestamp_utc: "{metadata['run_timestamp']}"
|
|
123
161
|
source_folder: "{metadata['folder_name']}"
|
|
124
162
|
file_count: {metadata['file_count']}
|
|
163
|
+
char_count: {metadata['char_count']}
|
|
164
|
+
word_count: {metadata['word_count']}
|
|
125
165
|
tool_details:
|
|
126
166
|
name: "Codebase Extractor"
|
|
127
167
|
version: "{__version__}"
|
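Note on the new metrics in `file_handler.py`: both counts are taken over the full `content` string, so the `# Folder: ...` heading and any separators are included in the totals, and `word_count` is a plain whitespace split. A minimal sketch of that calculation and of how the two values could be dropped into a front-matter block. The function names `summarize` and `render_header` are hypothetical, and the two-space YAML indentation is an assumption, since the diff view does not preserve leading whitespace:

```python
def summarize(content: str) -> tuple[int, int]:
    """Return (char_count, word_count) the same way the diff computes them."""
    char_count = len(content)           # every character, including newlines
    word_count = len(content.split())   # whitespace-separated tokens
    return char_count, word_count


def render_header(metadata: dict) -> str:
    """Hypothetical, trimmed-down version of the YAML header in the diff."""
    return (
        "---\n"
        "extraction_details:\n"
        f"  file_count: {metadata['file_count']}\n"
        f"  char_count: {metadata['char_count']}\n"
        f"  word_count: {metadata['word_count']}\n"
        "---\n"
    )


content = "# Folder: src\n\nprint('hi')\n"
chars, words = summarize(content)
print(render_header({"file_count": 1, "char_count": chars, "word_count": words}))
```

Because `split()` treats any run of whitespace as one separator, the word count is only a rough size indicator for source code, not a token count.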
|
@@ -35,7 +35,7 @@ class NumberValidator(Validator):
|
|
|
35
35
|
message="Please enter a valid number.",
|
|
36
36
|
cursor_position=len(document.text))
|
|
37
37
|
|
|
38
|
-
def setup_logging(verbose: bool, log_file: str = None):
|
|
38
|
+
def setup_logging(verbose: bool, log_file: Optional[str] = None):
|
|
39
39
|
"""Configures the logging system."""
|
|
40
40
|
log_level = logging.DEBUG if verbose else logging.INFO
|
|
41
41
|
log_format = logging.Formatter('%(message)s')
|
|
@@ -79,14 +79,14 @@ def main():
|
|
|
79
79
|
# --- Startup Sequence ---
|
|
80
80
|
if not is_fully_automated:
|
|
81
81
|
ui.clear_screen()
|
|
82
|
-
|
|
83
|
-
|
|
82
|
+
# CHANGED: Pass the new 'instructions' flag to the banner function
|
|
83
|
+
ui.print_banner(show_instructions=args.instructions)
|
|
84
|
+
# CHANGED: Logic is now inverted to show instructions only when the flag is present
|
|
85
|
+
if args.instructions:
|
|
84
86
|
ui.show_instructions(output_dir_name)
|
|
85
|
-
else:
|
|
86
|
-
input(colored("\nPress Enter to begin...", "green"))
|
|
87
|
-
ui.clear_screen()
|
|
88
87
|
else:
|
|
89
|
-
|
|
88
|
+
# NOTE: For automated runs, the banner is always minimal. This is correct.
|
|
89
|
+
ui.print_banner(show_instructions=False)
|
|
90
90
|
|
|
91
91
|
# --- Collect Settings (Interactively or from Args) ---
|
|
92
92
|
select_style = Style([('qmark', 'fg:#FFA500'), ('pointer', 'fg:#FFA500'), ('highlighted', 'fg:black bg:#FFA500'), ('selected', 'fg:black bg:#FFA500')])
|
|
@@ -94,7 +94,7 @@ def main():
|
|
|
94
94
|
exclude_large = args.exclude_large_files
|
|
95
95
|
if not is_fully_automated:
|
|
96
96
|
logging.info("=== Extraction Settings ===")
|
|
97
|
-
exclude_large_choice = questionary.select("[1/2] -- Exclude files larger than 1MB?", choices=["
|
|
97
|
+
exclude_large_choice = questionary.select("[1/2] -- Exclude files larger than 1MB?", choices=["no", "yes"], style=select_style, instruction=" ").ask()
|
|
98
98
|
if exclude_large_choice is None: raise KeyboardInterrupt
|
|
99
99
|
exclude_large = exclude_large_choice == "yes"
|
|
100
100
|
print()
|
|
@@ -148,13 +148,23 @@ def main():
|
|
|
148
148
|
for folder_path in sorted(list(folders_to_process)):
|
|
149
149
|
with Halo(text=f"Extracting {folder_path.relative_to(root_path)}...", spinner="dots"):
|
|
150
150
|
time.sleep(0.1)
|
|
151
|
-
|
|
151
|
+
# CHANGED: Unpack the new char_count and word_count values
|
|
152
|
+
folder_md, folder_count, char_count, word_count = file_handler.extract_code_from_folder(folder_path, exclude_large)
|
|
152
153
|
|
|
153
154
|
if folder_count > 0:
|
|
154
|
-
|
|
155
|
+
# CHANGED: Add new metrics to the metadata dictionary
|
|
156
|
+
metadata = {
|
|
157
|
+
"run_ref": run_ref,
|
|
158
|
+
"run_timestamp": run_timestamp,
|
|
159
|
+
"folder_name": str(folder_path.relative_to(root_path)),
|
|
160
|
+
"file_count": folder_count,
|
|
161
|
+
"char_count": char_count,
|
|
162
|
+
"word_count": word_count
|
|
163
|
+
}
|
|
155
164
|
if not args.dry_run:
|
|
156
165
|
file_handler.write_to_markdown_file(folder_md, metadata, root_path, output_dir_name)
|
|
157
166
|
logging.info(f"✅ Extracted {folder_count} file(s) from: {folder_path.relative_to(root_path)}")
|
|
167
|
+
logging.info(f"📜 {char_count:,} character(s), {word_count:,} word(s)")
|
|
158
168
|
if args.dry_run: logging.info(colored(" (Dry Run: No file written)", "yellow"))
|
|
159
169
|
total_files_extracted += folder_count
|
|
160
170
|
else:
|
|
@@ -165,14 +175,24 @@ def main():
|
|
|
165
175
|
root_display_name = f"root [{root_path.name}] (files in root folder only, excl. sub-folders)"
|
|
166
176
|
with Halo(text=f"Extracting {root_display_name}...", spinner="dots"):
|
|
167
177
|
time.sleep(0.1)
|
|
168
|
-
|
|
178
|
+
# CHANGED: Unpack the new char_count and word_count values
|
|
179
|
+
root_md, root_count, char_count, word_count = file_handler.extract_code_from_root(root_path, exclude_large)
|
|
169
180
|
|
|
170
181
|
if root_count > 0:
|
|
171
|
-
|
|
182
|
+
# CHANGED: Add new metrics to the metadata dictionary
|
|
183
|
+
metadata = {
|
|
184
|
+
"run_ref": run_ref,
|
|
185
|
+
"run_timestamp": run_timestamp,
|
|
186
|
+
"folder_name": root_display_name,
|
|
187
|
+
"file_count": root_count,
|
|
188
|
+
"char_count": char_count,
|
|
189
|
+
"word_count": word_count
|
|
190
|
+
}
|
|
172
191
|
if not args.dry_run:
|
|
173
192
|
file_handler.write_to_markdown_file(root_md, metadata, root_path, output_dir_name)
|
|
174
193
|
total_files_extracted += root_count
|
|
175
194
|
logging.info(f"✅ Extracted {root_count} file(s) from the root directory")
|
|
195
|
+
logging.info(f"📜 {char_count:,} character(s), {word_count:,} word(s)")
|
|
176
196
|
if args.dry_run: logging.info(colored(" (Dry Run: No file written)", "yellow"))
|
|
177
197
|
else:
|
|
178
198
|
logging.warning("‼️ No extractable files in the root directory")
|
|
@@ -196,4 +216,7 @@ def main():
|
|
|
196
216
|
logging.error(colored(f"\n[!] An unexpected error occurred: {e}", "red"))
|
|
197
217
|
import traceback
|
|
198
218
|
traceback.print_exc()
|
|
199
|
-
sys.exit(1)
|
|
219
|
+
sys.exit(1)
|
|
220
|
+
|
|
221
|
+
if __name__ == "__main__":
|
|
222
|
+
main()
|
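Note on the `main_logic.py` changes: the startup guide is now opt-in (`args.instructions`), and the new log line uses the `:,` format specifier for thousands separators. The diff does not show `cli.py`, so the following is only a sketch that assumes the flag is a standard `argparse` `store_true` option; `build_parser` and `format_summary` are hypothetical names:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser with only the flag discussed here."""
    parser = argparse.ArgumentParser(prog="codebase-extractor")
    parser.add_argument(
        "--instructions",
        action="store_true",  # absent -> False, present -> True
        help="Show the detailed instruction guide on startup.",
    )
    return parser


def format_summary(char_count: int, word_count: int) -> str:
    """Mirror the log message format from the diff (without the emoji)."""
    return f"{char_count:,} character(s), {word_count:,} word(s)"


args = build_parser().parse_args(["--instructions"])
if args.instructions:
    print("would show the instruction guide here")
print(format_summary(1234567, 89012))
```

With `store_true`, the default run skips the guide and users who want it pass the flag explicitly, which matches the inverted logic in the hunk above.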
|
@@ -4,8 +4,6 @@ from . import config
|
|
|
4
4
|
from . import __version__
|
|
5
5
|
from termcolor import colored
|
|
6
6
|
|
|
7
|
-
# ... (LOGO_LARGE and LOGO_SMALL strings remain the same) ...
|
|
8
|
-
|
|
9
7
|
LOGO_LARGE = """
|
|
10
8
|
██████╗ ██████╗ ██████╗ ███████╗██████╗ █████╗ ███████╗███████╗ ███████╗██╗ ██╗████████╗██████╗ █████╗ ██████╗████████╗ ██████╗ ██████╗
|
|
11
9
|
██╔════╝██╔═══██╗██╔══██╗██╔════╝██╔══██╗██╔══██╗██╔════╝██╔════╝ ██╔════╝╚██╗██╔╝╚══██╔══╝██╔══██╗██╔══██╗██╔════╝╚══██╔══╝██╔═══██╗██╔══██╗
|
|
@@ -16,26 +14,16 @@ LOGO_LARGE = """
|
|
|
16
14
|
"""
|
|
17
15
|
|
|
18
16
|
LOGO_SMALL = """
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
|
|
22
|
-
██║ ██║ ██║██║ ██║██╔══╝ ██╔══██╗██╔══██║╚════██║██╔══╝
|
|
23
|
-
╚██████╗╚██████╔╝██████╔╝███████╗██████╔╝██║ ██║███████║███████╗
|
|
24
|
-
╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝╚═════╝ ╚═╝ ╚═╝╚══════╝╚══════╝
|
|
25
|
-
|
|
26
|
-
███████╗██╗ ██╗████████╗██████╗ █████╗ ██████╗████████╗ ██████╗ ██████╗
|
|
27
|
-
██╔════╝╚██╗██╔╝╚══██╔══╝██╔══██╗██╔══██╗██╔════╝╚══██╔══╝██╔═══██╗██╔══██╗
|
|
28
|
-
█████╗ ╚███╔╝ ██║ ██████╔╝███████║██║ ██║ ██║ ██║██████╔╝
|
|
29
|
-
██╔══╝ ██╔██╗ ██║ ██╔══██╗██╔══██║██║ ██║ ██║ ██║██╔══██╗
|
|
30
|
-
███████╗██╔╝ ██╗ ██║ ██║ ██║██║ ██║╚██████╗ ██║ ╚██████╔╝██║ ██║
|
|
31
|
-
╚══════╝╚═╝ ╚═╝ ╚═╝ ╚═╝ ╚═╝╚═╝ ╚═╝ ╚═════╝ ╚═╝ ╚═════╝ ╚═╝ ╚═╝
|
|
17
|
+
░█▀▀░█▀█░█▀▄░█▀▀░█▀▄░█▀█░█▀▀░█▀▀░░░█▀▀░█░█░▀█▀░█▀▄░█▀█░█▀▀░▀█▀░█▀█░█▀▄
|
|
18
|
+
░█░░░█░█░█░█░█▀▀░█▀▄░█▀█░▀▀█░█▀▀░░░█▀▀░▄▀▄░░█░░█▀▄░█▀█░█░░░░█░░█░█░█▀▄
|
|
19
|
+
░▀▀▀░▀▀▀░▀▀░░▀▀▀░▀▀░░▀░▀░▀▀▀░▀▀▀░░░▀▀▀░▀░▀░░▀░░▀░▀░▀░▀░▀▀▀░░▀░░▀▀▀░▀░▀
|
|
32
20
|
"""
|
|
33
21
|
|
|
34
22
|
def clear_screen():
|
|
35
23
|
"""Clears the terminal screen."""
|
|
36
24
|
os.system('cls' if os.name == 'nt' else 'clear')
|
|
37
25
|
|
|
38
|
-
def print_banner(
|
|
26
|
+
def print_banner(show_instructions: bool = False):
|
|
39
27
|
"""Prints a banner that adjusts to the terminal width."""
|
|
40
28
|
try:
|
|
41
29
|
width = shutil.get_terminal_size((80, 20)).columns
|
|
@@ -48,11 +36,10 @@ def print_banner(no_instructions: bool = False):
|
|
|
48
36
|
print(LOGO_SMALL)
|
|
49
37
|
|
|
50
38
|
# Use the imported __version__ variable instead of config.SCRIPT_VERSION
|
|
51
|
-
print(colored(f" Welcome to
|
|
52
|
-
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
print("It's ideal for providing context to AI models, archiving projects, or generating documentation.")
|
|
39
|
+
print(colored(f" Welcome to Codebase Extractor v{__version__} by Lukasz Lekowski ".center(width, "="), "white", "on_magenta"))
|
|
40
|
+
print("\nThis tool consolidates your project's codebase into structured Markdown files.")
|
|
41
|
+
print("It's ideal for providing context to AI models, archiving projects, or generating documentation.\n")
|
|
42
|
+
|
|
56
43
|
|
|
57
44
|
def show_instructions(output_dir_name: str):
|
|
58
45
|
"""Clears screen and shows detailed instructions, pausing for user input."""
|
|
@@ -80,12 +67,13 @@ def show_instructions(output_dir_name: str):
|
|
|
80
67
|
print(" - Selection Tree: You'll see a tree-like list of your project's folders. The script handles parent/child selections intelligently:")
|
|
81
68
|
print(" - If you select a parent folder, all of its sub-folders are automatically included. You don't need to check them individually.")
|
|
82
69
|
print(" - To get a file for *only* a sub-folder, select the sub-folder but *not* its parent.")
|
|
83
|
-
print(" - The 'root [...]' option specifically extracts *only* the files in your project's main directory.\n")
|
|
70
|
+
print(" - The 'root [...]' option specifically extracts *only* the files (not files in sub-folders) in your project's main directory.\n")
|
|
84
71
|
|
|
85
72
|
print(colored("--- Output Details ---", "yellow"))
|
|
86
73
|
print(f"All extracted content is saved into the '{output_dir_name}' directory. Each Markdown file generated will contain a YAML metadata header at the top with a unique reference ID, a timestamp, and more.\n")
|
|
87
74
|
|
|
88
|
-
|
|
75
|
+
# CHANGED: Updated the tip to reflect the new '--instructions' flag
|
|
76
|
+
tip = "TIP: To see this guide again, run the script with the --instructions flag."
|
|
89
77
|
print(colored(tip, "black", "on_yellow"))
|
|
90
78
|
|
|
91
79
|
input(colored("\nReady? Press Enter to begin...", "green"))
|
|
@@ -105,6 +93,4 @@ def print_footer():
|
|
|
105
93
|
print("💡 Love this tool? Found a bug? Share your feedback on GitHub:")
|
|
106
94
|
print(config.GITHUB_URL + "\n")
|
|
107
95
|
print("🤝 Connect with the author on LinkedIn:")
|
|
108
|
-
print(config.LINKEDIN_URL + "\n")
|
|
109
|
-
print("☕ Enjoying this tool? You can support its development with a coffee!")
|
|
110
|
-
print("https://www.buymeacoffee.com/lukaszlekowski\n")
|
|
96
|
+
print(config.LINKEDIN_URL + "\n")
|
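Note on the banner line in `ui.py`: the welcome text is padded to the terminal width with `str.center(width, "=")`, using `shutil.get_terminal_size((80, 20))` as a fallback-aware width source. A minimal sketch of that padding behavior; `banner_line` is a hypothetical helper and the version string is only an example value:

```python
import shutil


def banner_line(title: str, width: int) -> str:
    """Pad ' title ' with '=' on both sides up to the given width."""
    return f" {title} ".center(width, "=")


# Falls back to 80 columns when the terminal size cannot be determined.
width = shutil.get_terminal_size((80, 20)).columns
print(banner_line("Welcome to Codebase Extractor v1.2.0", width))
```

If the title is already wider than the terminal, `center` returns it unchanged rather than truncating, so a very narrow terminal will wrap the banner instead of clipping it.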
{codebase_extractor-1.1.0 → codebase_extractor-1.2.0/src/codebase_extractor.egg-info}/PKG-INFO
RENAMED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: codebase-extractor
|
|
3
|
-
Version: 1.
|
|
3
|
+
Version: 1.2.0
|
|
4
4
|
Summary: A CLI tool to extract project source code into structured Markdown files for LLM & AI context.
|
|
5
5
|
Author: Lukasz Lekowski
|
|
6
6
|
Project-URL: Homepage, https://github.com/lukaszlekowski/codebase-extractor
|
|
@@ -10,7 +10,7 @@ Classifier: License :: OSI Approved :: MIT License
|
|
|
10
10
|
Classifier: Operating System :: OS Independent
|
|
11
11
|
Classifier: Topic :: Software Development :: Documentation
|
|
12
12
|
Classifier: Topic :: Utilities
|
|
13
|
-
Requires-Python: >=3.
|
|
13
|
+
Requires-Python: >=3.14
|
|
14
14
|
Description-Content-Type: text/markdown
|
|
15
15
|
License-File: LICENCE
|
|
16
16
|
Requires-Dist: questionary
|
|
@@ -75,12 +75,12 @@ The tool is highly configurable, allowing you to select specific folders, exclud
|
|
|
75
75
|
## ✨ Key Features
|
|
76
76
|
|
|
77
77
|
- **Interactive & User-Friendly:** A guided, multi-step CLI experience that makes selecting options simple and clear.
|
|
78
|
+
- **Quick Start by Default:** The tool starts without delay. Detailed instructions are available via an `--instructions` flag when you need a reminder.
|
|
78
79
|
- **Smart Filtering:** Automatically excludes common dependency folders, build artifacts, version control directories, and IDE configuration files. The exact filters are configurable.
|
|
79
80
|
- **Flexible Selection Modes:** Choose to extract the entire project with one command, or dive into a specific selection mode.
|
|
80
|
-
- **🌳
|
|
81
|
+
- **🌳 Visual Tree Selection:** Interactively browse and select specific sub-folders from a clear, pipe-based tree structure.
|
|
81
82
|
- **🔢 Configurable Scan Depth:** You decide how many levels deep the script should look for folders when building the selection tree.
|
|
82
|
-
- **YAML Metadata:** Each generated Markdown file is prepended with a YAML front matter block containing useful metadata like a unique run ID, timestamp,
|
|
83
|
-
- **🚀 Quick Start Mode:** Use the `--no-instructions` flag to skip the detailed intro guide on subsequent runs.
|
|
83
|
+
- **Rich YAML Metadata:** Each generated Markdown file is prepended with a YAML front matter block containing useful metadata like a unique run ID, timestamp, file count, character count, and word count.
|
|
84
84
|
- **Safe & Robust:** Features graceful exit handling (`Ctrl+C`) and provides clear feedback during the extraction process.
|
|
85
85
|
|
|
86
86
|
---
|
|
@@ -165,15 +165,15 @@ pipx install codebase-extractor
|
|
|
165
165
|
Once installed, you can run the tool from any terminal window. Navigate to your project's root directory and run the command:
|
|
166
166
|
|
|
167
167
|
```bash
|
|
168
|
-
|
|
168
|
+
codebase-extractor
|
|
169
169
|
```
|
|
170
170
|
|
|
171
|
-
The script will
|
|
171
|
+
The script will launch immediately and guide you through the extraction process.
|
|
172
172
|
|
|
173
|
-
For
|
|
173
|
+
For a detailed guide on how the script works, you can use the `--instructions` flag:
|
|
174
174
|
|
|
175
175
|
```bash
|
|
176
|
-
|
|
176
|
+
codebase-extractor --instructions
|
|
177
177
|
```
|
|
178
178
|
|
|
179
179
|
### The Process
|
|
@@ -193,25 +193,25 @@ The tool will guide you through a series of prompts:
|
|
|
193
193
|
|
|
194
194
|
### Output Details
|
|
195
195
|
|
|
196
|
-
All output files are saved in a `CODEBASE_EXTRACTS` directory within your project folder. Each generated Markdown file includes a YAML metadata header with a unique reference ID, timestamp, and
|
|
196
|
+
All output files are saved in a `CODEBASE_EXTRACTS` directory within your project folder. Each generated Markdown file includes a YAML metadata header with a unique reference ID, timestamp, file count, character count, and word count for easy tracking and parsing.
|
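Note on "easy tracking and parsing": the header fields can be read back without a YAML library if only the flat `key: value` pairs are needed. This is a rough sketch, not part of the package. It assumes the header is delimited by `---` lines at the top of the file and that the values of interest sit on single lines; `read_front_matter` is a hypothetical name, and a real YAML parser would be the safer choice for anything beyond a quick check:

```python
def read_front_matter(text: str) -> dict[str, str]:
    """Collect 'key: value' pairs between the opening and closing '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # end of the header block
            break
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():  # skip section names such as 'extraction_details:'
            fields[key.strip()] = value.strip().strip('"')
    return fields


sample = """---
extraction_details:
  timestamp_utc: "2025-01-01T00:00:00"
  file_count: 12
  char_count: 3456
  word_count: 789
---
# Folder: src
"""
print(read_front_matter(sample))
```

All values come back as strings and nesting is flattened, so callers would need to convert the counts with `int()` themselves.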
|
197
197
|
|
|
198
198
|
### ⚡ CLI Command Reference
|
|
199
199
|
|
|
200
200
|
For non-interactive use and automation, you can control the script entirely with these arguments.
|
|
201
201
|
|
|
202
|
-
| Argument
|
|
203
|
-
|
|
|
204
|
-
|
|
|
205
|
-
| `--root <path>`
|
|
206
|
-
| `--output-dir <name>`
|
|
207
|
-
| `--dry-run`
|
|
208
|
-
| `-v`, `--verbose`
|
|
209
|
-
| `--log-file <path>`
|
|
210
|
-
| `--exclude-large-files`
|
|
211
|
-
| `--mode <mode>`
|
|
212
|
-
| `--depth <number>`
|
|
213
|
-
| `--select-folders <list>`
|
|
214
|
-
| `--select-root`
|
|
202
|
+
| Argument | Description | Default Value |
|
|
203
|
+
| :------------------------ | :--------------------------------------------------------------------------- | :-------------------------- |
|
|
204
|
+
| `--instructions` | Show the detailed instruction guide on startup. | `False` |
|
|
205
|
+
| `--root <path>` | The root directory of the project to extract. | The current directory |
|
|
206
|
+
| `--output-dir <name>` | Custom name for the output directory. | `CODEBASE_EXTRACTS` |
|
|
207
|
+
| `--dry-run` | Simulate the extraction process without writing any files. | `False` |
|
|
208
|
+
| `-v`, `--verbose` | Enable verbose logging for debugging. | `False` |
|
|
209
|
+
| `--log-file <path>` | Path to save the log file. | `None` |
|
|
210
|
+
| `--exclude-large-files` | Non-interactive: Exclude files larger than 1MB. | `False` |
|
|
211
|
+
| `--mode <mode>` | Non-interactive: Set the extraction mode. Choices: `everything`, `specific`. | `None` (Interactive prompt) |
|
|
212
|
+
| `--depth <number>` | Non-interactive: Set the folder scan depth for 'specific' mode. | `3` |
|
|
213
|
+
| `--select-folders <list>` | Non-interactive: A space-separated list of folders/sub-folders to extract. | `[]` |
|
|
214
|
+
| `--select-root` | Non-interactive: Include files from the root directory in the extraction. | `False` |
|
|
215
215
|
|
|
216
216
|
---
|
|
217
217
|
|
|
@@ -224,7 +224,7 @@ Here are a few practical examples of how to use the tool from your command line.
|
|
|
224
224
|
A common command for quick, automated runs.
|
|
225
225
|
|
|
226
226
|
```bash
|
|
227
|
-
|
|
227
|
+
codebase-extractor --mode everything
|
|
228
228
|
```
|
|
229
229
|
|
|
230
230
|
- #### Extract specific sub-folders non-interactively
|
|
@@ -232,7 +232,7 @@ Here are a few practical examples of how to use the tool from your command line.
|
|
|
232
232
|
This command extracts only the `src/components` and `src/hooks` directories, plus any files in the root.
|
|
233
233
|
|
|
234
234
|
```bash
|
|
235
|
-
|
|
235
|
+
codebase-extractor --mode specific --select-folders src/components src/hooks --select-root
|
|
236
236
|
```
|
|
237
237
|
|
|
238
238
|
- #### Perform a safe dry run
|
|
@@ -240,13 +240,13 @@ Here are a few practical examples of how to use the tool from your command line.
|
|
|
240
240
|
This will simulate a full extraction and print what it _would_ have done, without creating any files.
|
|
241
241
|
|
|
242
242
|
```bash
|
|
243
|
-
|
|
243
|
+
codebase-extractor --dry-run --mode everything
|
|
244
244
|
```
|
|
245
245
|
|
|
246
246
|
- #### Run on a different project and save to a custom folder
|
|
247
247
|
This targets a completely different directory and specifies a custom output folder name.
|
|
248
248
|
```bash
|
|
249
|
-
|
|
249
|
+
codebase-extractor --root /path/to/another/project --output-dir MyProject_Extraction
|
|
250
250
|
```
|
|
251
251
|
|
|
252
252
|
---
|
|
@@ -256,10 +256,11 @@ Here are a few practical examples of how to use the tool from your command line.
|
|
|
256
256
|
The tool uses a set of rules to determine which files and folders to include in the extraction. Here are the default settings found in the `config.py` file.
|
|
257
257
|
|
|
258
258
|
<details>
|
|
259
|
-
|
|
259
|
+
<summary><strong>Click to view Excluded Directories</strong></summary>
|
|
260
260
|
|
|
261
261
|
- `node_modules`, `vendor`, `__pycache__`, `dist`, `build`, `target`, `.next`
|
|
262
262
|
- `.git`, `.svn`, `.hg`, `.vscode`, `.idea`, `venv`, `.venv`
|
|
263
|
+
- `.dart_tool`, `.gradle`, `Pods`, `DerivedData`
|
|
263
264
|
|
|
264
265
|
</details>
|
|
265
266
|
|
|
@@ -273,17 +274,20 @@ The tool uses a set of rules to determine which files and folders to include in
|
|
|
273
274
|
<details>
|
|
274
275
|
<summary><strong>Click to view Allowed Filenames & Extensions</strong></summary>
|
|
275
276
|
|
|
276
|
-
|
|
277
|
+
The script will process any file with one of the following extensions. It also explicitly allows common configuration files that may not have an extension.
|
|
277
278
|
|
|
278
279
|
**Allowed Filenames:**
|
|
279
|
-
- `dockerfile`, `.gitignore`, `.htaccess`, `makefile`
|
|
280
|
+
- `dockerfile`, `.gitignore`, `.htaccess`, `makefile`, `.dockerignore`, `.env.example`
|
|
281
|
+
- `podfile`, `gemfile`, `jenkinsfile`, `gradlew`
|
|
280
282
|
|
|
281
283
|
**Allowed Extensions:**
|
|
282
|
-
- `.php`, `.html`, `.css`, `.js`, `.jsx`, `.ts`, `.tsx`, `.vue`, `.svelte`
|
|
283
|
-
- `.py`, `.rb`, `.java`, `.c`, `.cpp`, `.cs`, `.go`, `.rs`
|
|
284
|
-
- `.json`, `.xml`, `.yaml`, `.yml`, `.toml`, `.ini`, `.conf`
|
|
285
|
-
- `.md`, `.txt`, `.rst`, `.twig`, `.blade`, `.handlebars`, `.mustache`, `.ejs`
|
|
286
|
-
- `.sql`, `.graphql`, `.gql`, `.tf`
|
|
284
|
+
- **Web & General:** `.php`, `.html`, `.css`, `.js`, `.jsx`, `.ts`, `.tsx`, `.vue`, `.svelte`
|
|
285
|
+
- **Backend & Systems:** `.py`, `.rb`, `.java`, `.c`, `.cpp`, `.cs`, `.go`, `.rs`
|
|
286
|
+
- **Config & Data:** `.json`, `.xml`, `.yaml`, `.yml`, `.toml`, `.ini`, `.conf`
|
|
287
|
+
- **Docs & Templates:** `.md`, `.txt`, `.rst`, `.twig`, `.blade`, `.handlebars`, `.mustache`, `.ejs`
|
|
288
|
+
- **Database & IaC:** `.sql`, `.graphql`, `.gql`, `.tf`
|
|
289
|
+
- **Mobile (Flutter, Android, iOS):** `.dart`, `.arb`, `.gradle`, `.properties`, `.plist`, `.xcconfig`
|
|
290
|
+
- **Scripts:** `.sh`, `.bat`
|
|
287
291
|
|
|
288
292
|
</details>
|
|
289
293
|
|
|
@@ -291,7 +295,7 @@ The tool uses a set of rules to determine which files and folders to include in
|
|
|
291
295
|
|
|
292
296
|
## 🤔 Troubleshooting
|
|
293
297
|
|
|
294
|
-
- **Problem:** After installation, I run `
|
|
298
|
+
- **Problem:** After installation, I run `codebase-extractor` and my terminal says `command not found`.
|
|
295
299
|
- **Solution:** This is usually a `PATH` issue. It means your system's shell doesn't know where to find the installed script. The `pip install --user` command sometimes requires you to add a local scripts directory to your `PATH`. Please refer to your operating system's documentation for instructions on how to modify your `PATH` environment variable.
|
|
296
300
|
|
|
297
301
|
- **Problem:** The tool ran, but a specific folder or file I expected to see is missing from the output.
|
|
@@ -1 +0,0 @@
|
|
|
1
|
-
__version__ = "1.1.0"
|
|
File without changes
|
|
File without changes
|
{codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/src/codebase_extractor.egg-info/SOURCES.txt
RENAMED
|
File without changes
|
|
File without changes
|
{codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/src/codebase_extractor.egg-info/requires.txt
RENAMED
|
File without changes
|
{codebase_extractor-1.1.0 → codebase_extractor-1.2.0}/src/codebase_extractor.egg-info/top_level.txt
RENAMED
|
File without changes
|