codebase-extractor 1.0.1__tar.gz → 1.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- codebase_extractor-1.1.0/PKG-INFO +311 -0
- codebase_extractor-1.1.0/README.md +291 -0
- {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/pyproject.toml +1 -1
- codebase_extractor-1.1.0/src/codebase_extractor/__init__.py +1 -0
- codebase_extractor-1.1.0/src/codebase_extractor/cli.py +76 -0
- {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/src/codebase_extractor/config.py +1 -2
- {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/src/codebase_extractor/file_handler.py +10 -6
- codebase_extractor-1.1.0/src/codebase_extractor/main_logic.py +199 -0
- {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/src/codebase_extractor/ui.py +31 -5
- codebase_extractor-1.1.0/src/codebase_extractor.egg-info/PKG-INFO +311 -0
- {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/src/codebase_extractor.egg-info/SOURCES.txt +1 -0
- codebase_extractor-1.0.1/PKG-INFO +0 -167
- codebase_extractor-1.0.1/README.md +0 -147
- codebase_extractor-1.0.1/src/codebase_extractor/__init__.py +0 -1
- codebase_extractor-1.0.1/src/codebase_extractor/main_logic.py +0 -193
- codebase_extractor-1.0.1/src/codebase_extractor.egg-info/PKG-INFO +0 -167
- {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/LICENCE +0 -0
- {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/setup.cfg +0 -0
- {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/src/codebase_extractor.egg-info/dependency_links.txt +0 -0
- {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/src/codebase_extractor.egg-info/entry_points.txt +0 -0
- {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/src/codebase_extractor.egg-info/requires.txt +0 -0
- {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/src/codebase_extractor.egg-info/top_level.txt +0 -0

@@ -0,0 +1,311 @@

Metadata-Version: 2.4
Name: codebase-extractor
Version: 1.1.0
Summary: A CLI tool to extract project source code into structured Markdown files for LLM & AI context.
Author: Lukasz Lekowski
Project-URL: Homepage, https://github.com/lukaszlekowski/codebase-extractor
Project-URL: Bug Tracker, https://github.com/lukaszlekowski/codebase-extractor/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Documentation
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENCE
Requires-Dist: questionary
Requires-Dist: halo
Requires-Dist: termcolor
Dynamic: license-file
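
PKG-INFO uses the email-header format described in the Python core metadata specification, so you can read the header block above with the standard library's `email` package. This is a minimal sketch. `parse_pkg_info` is a hypothetical helper, not part of codebase-extractor, and the sample text is a trimmed copy of the headers shown above:

```python
from email.parser import HeaderParser

# Trimmed copy of the PKG-INFO header block shown above.
PKG_INFO = """\
Metadata-Version: 2.4
Name: codebase-extractor
Version: 1.1.0
Requires-Python: >=3.9
License-File: LICENCE
Requires-Dist: questionary
Requires-Dist: halo
Requires-Dist: termcolor
"""


def parse_pkg_info(text: str) -> dict:
    """Hypothetical helper: pull a few core-metadata fields out of PKG-INFO text."""
    # HeaderParser only parses the header block; anything after the first
    # blank line (the long description) is left as the raw payload.
    msg = HeaderParser().parsestr(text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_python": msg["Requires-Python"],
        # Multiple-use fields such as Requires-Dist repeat, so collect them all.
        "requires_dist": msg.get_all("Requires-Dist", []),
    }


info = parse_pkg_info(PKG_INFO)
print(info["requires_dist"])  # ['questionary', 'halo', 'termcolor']
```

Fields such as `Requires-Dist` and `Classifier` can appear more than once. For that reason the sketch uses `get_all()` rather than plain indexing, which returns only one value.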

# Codebase Extractor

<p align="center">
<strong>A user-friendly CLI tool to extract project source code into structured Markdown files.</strong>
</p>

<p align="center">
<a href="https://pypi.org/project/codebase-extractor/"><img src="https://badge.fury.io/py/codebase-extractor.svg" alt="PyPI version"></a>
<img src="https://img.shields.io/badge/python-3.9%2B-blue.svg" alt="Python Version">
<img src="https://img.shields.io/badge/License-MIT%20(Modified)-yellow.svg" alt="License: MIT (Modified)">
</p>

<p align="center">
💡 <b>Love this tool?</b> Found a bug or have an idea? Share it on <a href="https://github.com/lukaszlekowski/codebase-extractor">GitHub</a>! <br>
🤝 <b>Connect with me</b> on <a href="https://www.linkedin.com/in/lukasz-lekowski">LinkedIn</a>. <br>
☕ <b>Enjoying it?</b> Support development with a <a href="https://www.buymeacoffee.com/lukaszlekowski">coffee</a>!
</p>

---

## Table of Contents

- [Codebase Extractor](#codebase-extractor)
  - [Table of Contents](#table-of-contents)
  - [🚀 Overview](#-overview)
  - [✨ Key Features](#-key-features)
  - [🖼️ Gallery](#️-gallery)
  - [⚙️ Installation](#️-installation)
    - [Step 1: Ensure Python is Installed](#step-1-ensure-python-is-installed)
    - [Step 2: Install the Package](#step-2-install-the-package)
      - [▶️ For macOS \& Linux Users](#️-for-macos--linux-users)
      - [▶️ For Windows Users](#️-for-windows-users)
      - [💡 Pro Tip: Using pipx](#-pro-tip-using-pipx)
  - [▶️ Usage](#️-usage)
    - [Basic Usage](#basic-usage)
    - [The Process](#the-process)
    - [Specific Selection (if chosen):](#specific-selection-if-chosen)
    - [Output Details](#output-details)
    - [⚡ CLI Command Reference](#-cli-command-reference)
  - [Practical Examples](#practical-examples)
  - [🔬 Filtering Logic](#-filtering-logic)
  - [🤔 Troubleshooting](#-troubleshooting)
  - [📜 License](#-license)

---

## 🚀 Overview

Codebase Extractor is a command-line interface (CLI) tool designed to scan a project directory and consolidate all relevant source code into neatly organized Markdown files. It's perfect for creating a complete project snapshot for analysis, documentation, or providing context to Large Language Models (LLMs) like GPT-4, Gemini, or Claude.

The tool is highly configurable, allowing you to select specific folders, exclude large files, and intelligently ignore common directories like `node_modules` and `.git`.

---

## ✨ Key Features

- **Interactive & User-Friendly:** A guided, multi-step CLI experience that makes selecting options simple and clear.
- **Smart Filtering:** Automatically excludes common dependency folders, build artifacts, version control directories, and IDE configuration files. The filter lists live in `config.py` and can be adjusted there.
- **Flexible Selection Modes:** Choose to extract the entire project with one command, or dive into a specific selection mode.
- **🌳 Nested Folder Selection:** Interactively browse and select specific sub-folders from a tree-like view.
- **🔢 Configurable Scan Depth:** You decide how many levels deep the script should look for folders when building the selection tree.
- **YAML Metadata:** Each generated Markdown file is prepended with a YAML front matter block containing useful metadata like a unique run ID, timestamp, and file count for easy tracking and parsing.
- **🚀 Quick Start Mode:** Use the `--no-instructions` flag to skip the detailed intro guide on subsequent runs.
- **Safe & Robust:** Features graceful exit handling (`Ctrl+C`) and provides clear feedback during the extraction process.

---

## 🖼️ Gallery

<details>
<summary>Show Screenshots</summary>
<img src="images/welcome.png" width="330">
<img src="images/instructions.png" width="330">
<img src="images/file_tree.png" width="330">
<img src="images/extraction.png" width="330">

</details>

---

## ⚙️ Installation

This guide will walk you through installing the Codebase Extractor.

### Step 1: Ensure Python is Installed

Make sure you have Python 3.9 or newer installed. You can check your version by opening your terminal and running the command below (on Windows, use `python --version`):

```bash
python3 --version
```
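
You can also run the same check from inside Python. The minimal sketch below compares `sys.version_info` against the `>=3.9` floor declared in the package's `Requires-Python` metadata. The `meets_minimum` helper is illustrative only and is not part of the tool:

```python
import sys

# Minimum version taken from the package's "Requires-Python: >=3.9" metadata.
MINIMUM = (3, 9)


def meets_minimum(version_info=None, minimum=MINIMUM) -> bool:
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Tuples compare element by element, so (3, 12) >= (3, 9) is True.
    return tuple(version_info[:2]) >= minimum


print(f"Python {sys.version.split()[0]}:", "OK" if meets_minimum() else "needs 3.9 or newer")
```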

### Step 2: Install the Package

The recommended way to install is directly from PyPI using pip, which is included with most Python installations.

#### ▶️ For macOS & Linux Users

Open your terminal and run the following command:

```bash
pip3 install codebase-extractor
```

> **Note on `pip` vs `pip3`**: On most systems, use `pip3` to make sure you are invoking the Python 3 version of pip rather than one tied to an older, system-installed Python 2. Inside an activated virtual environment, plain `pip` is usually sufficient because it is linked to that environment's Python.

If you get a "permission denied" error, your system may require you to install the package for your user account only:

```bash
pip3 install --user codebase-extractor
```

In this case, you may need to add the user scripts directory to your `PATH`. If that directory is not already on your `PATH`, pip prints a warning that names it.

#### ▶️ For Windows Users

Open Command Prompt or PowerShell and run the following command:

```bash
pip install codebase-extractor
```

> **Note on `pip`**: The standard Python installer for Windows typically configures the `pip` and `python` commands correctly, so you usually do not need to use `pip3` or `python3`.

If the `pip` command is not found, run pip through the Python executable instead:

```bash
python -m pip install codebase-extractor
```

#### 💡 Pro Tip: Using pipx

For an isolated installation, we recommend using `pipx`. It installs the tool into its own virtual environment, so the tool's dependencies cannot conflict with other Python projects on your system.

```bash
pipx install codebase-extractor
```

---

## ▶️ Usage

### Basic Usage

Once installed, you can run the tool from any terminal window. Navigate to your project's root directory and run:

```bash
code-extractor
```

The script then guides you through the extraction process.

On repeat runs, you can skip the detailed introductory guide with the `--no-instructions` (or `-ni`) flag:

```bash
code-extractor --no-instructions
```

### The Process

The tool guides you through two prompts:

- **Initial Setup [1/2]**: A yes/no question asking whether to skip files larger than 1MB.
- **Extraction Mode [2/2]**: Choose between extracting the entire project (`Everything`) or only selected folders (`Specific`).
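
The size cut-off in the first prompt amounts to a per-file check like the sketch below. The sketch assumes that "1MB" means 1 MiB (1,048,576 bytes) and that the check uses the size reported by the filesystem. The README does not say which convention codebase-extractor actually uses, and `should_skip` is a hypothetical name:

```python
import os
import tempfile

# Assumption: "1MB" is read as 1 MiB; the real tool may use 1_000_000 instead.
SIZE_LIMIT_BYTES = 1024 * 1024


def should_skip(path: str, exclude_large_files: bool, limit: int = SIZE_LIMIT_BYTES) -> bool:
    """Return True if `path` should be left out because it exceeds `limit` bytes."""
    if not exclude_large_files:
        return False
    # os.path.getsize reports the file size in bytes.
    return os.path.getsize(path) > limit


# Usage: create a 2 MiB scratch file and check it against both answers to the prompt.
with tempfile.TemporaryDirectory() as tmp:
    big = os.path.join(tmp, "big.bin")
    with open(big, "wb") as fh:
        fh.write(b"\0" * (2 * SIZE_LIMIT_BYTES))
    print(should_skip(big, exclude_large_files=True))   # True
    print(should_skip(big, exclude_large_files=False))  # False
```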

### Specific Selection (if chosen):

- **Scan Depth**: You'll be asked how many sub-folder levels to scan for the selection list (defaults to 3).
- **Folder Tree**: You'll see a checklist of available folders and sub-folders to extract. Selections are resolved as follows:
  - Selecting a parent folder automatically includes all its sub-folders, so you don’t need to select them individually.
  - To extract only a sub-folder’s contents, select the sub-folder but not its parent.
  - The special `root [...]` option extracts only the files in your project's main directory, ignoring all sub-folders.
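
You can think of the parent/child rule above as pruning any selected folder whose ancestor is also selected, so that no directory is extracted twice. This sketch illustrates the documented behaviour and is not the tool's own code. `collapse_selection` is a hypothetical helper, and paths are assumed to be POSIX-style and relative to the project root:

```python
from pathlib import PurePosixPath


def collapse_selection(selected: list[str]) -> list[str]:
    """Drop selected folders that are already covered by a selected ancestor."""
    unique = {PurePosixPath(p) for p in selected}
    kept: set[PurePosixPath] = set()
    # Visit shallow paths first so every ancestor is considered before its descendants.
    for path in sorted(unique, key=lambda p: len(p.parts)):
        if not any(parent in kept for parent in path.parents):
            kept.add(path)
    return sorted(str(p) for p in kept)


print(collapse_selection(["src/hooks", "src", "docs/api", "src/components"]))
# ['docs/api', 'src']
```

Selecting only `src/components` and `src/hooks` leaves both in place, which matches the "select the sub-folder but not its parent" case.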

### Output Details

By default, all output files are saved in a `CODEBASE_EXTRACTS` directory within your project folder. Use `--output-dir` to choose a different name. Each generated Markdown file starts with a YAML metadata header that contains a unique reference ID, a timestamp, and the file count, for easy tracking and parsing.
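
You can produce a front-matter block of this kind with the standard library alone. The field names below (`reference_id`, `generated_at`, `file_count`) and the UUID-based ID are placeholders, because the README does not document the exact keys or ID format that the tool writes:

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def front_matter(file_count: int, reference_id: Optional[str] = None,
                 now: Optional[datetime] = None) -> str:
    """Build a YAML front-matter block to prepend to a generated Markdown file.

    Hypothetical field names; the real tool's keys and ID format may differ.
    """
    reference_id = reference_id or str(uuid.uuid4())
    now = now or datetime.now(timezone.utc)
    lines = [
        "---",
        f"reference_id: {reference_id}",
        # Quote the timestamp so YAML parsers keep it as a plain string.
        f'generated_at: "{now.isoformat(timespec="seconds")}"',
        f"file_count: {file_count}",
        "---",
        "",  # trailing newline separates the metadata from the Markdown body
    ]
    return "\n".join(lines)


print(front_matter(3, reference_id="a1b2c3d4",
                   now=datetime(2024, 1, 1, tzinfo=timezone.utc)))
```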

### ⚡ CLI Command Reference

For non-interactive use and automation, you can control the script entirely with these arguments.

| Argument | Description | Default Value |
| :------------------------- | :--------------------------------------------------------------------------- | :-------------------------- |
| `-ni`, `--no-instructions` | Run the script without printing the detailed instruction banner. | `False` |
| `--root <path>` | The root directory of the project to extract. | The current directory |
| `--output-dir <name>` | Custom name for the output directory. | `CODEBASE_EXTRACTS` |
| `--dry-run` | Simulate the extraction process without writing any files. | `False` |
| `-v`, `--verbose` | Enable verbose logging for debugging. | `False` |
| `--log-file <path>` | Path to save the log file. | `None` |
| `--exclude-large-files` | Non-interactive: Exclude files larger than 1MB. | `False` |
| `--mode <mode>` | Non-interactive: Set the extraction mode. Choices: `everything`, `specific`. | `None` (Interactive prompt) |
| `--depth <number>` | Non-interactive: Set the folder scan depth for 'specific' mode. | `3` |
| `--select-folders <list>` | Non-interactive: A space-separated list of folders/sub-folders to extract. | `[]` |
| `--select-root` | Non-interactive: Include files from the root directory in the extraction. | `False` |
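
To make the table concrete, the sketch below shows how you could declare an `argparse` parser that accepts these flags. It is a hypothetical reconstruction from the table above, not the contents of the tool's `cli.py`. In particular, `nargs="*"` for `--select-folders` and `type=int` for `--depth` are assumptions:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented code-extractor flags."""
    p = argparse.ArgumentParser(prog="code-extractor")
    # argparse accepts multi-character single-dash options such as -ni.
    p.add_argument("-ni", "--no-instructions", action="store_true")
    p.add_argument("--root", default=".")
    p.add_argument("--output-dir", default="CODEBASE_EXTRACTS")
    p.add_argument("--dry-run", action="store_true")
    p.add_argument("-v", "--verbose", action="store_true")
    p.add_argument("--log-file", default=None)
    p.add_argument("--exclude-large-files", action="store_true")
    # None means "ask interactively", matching the table's default.
    p.add_argument("--mode", choices=["everything", "specific"], default=None)
    p.add_argument("--depth", type=int, default=3)
    # nargs="*" collects the space-separated folder list into a Python list.
    p.add_argument("--select-folders", nargs="*", default=[])
    p.add_argument("--select-root", action="store_true")
    return p


args = build_parser().parse_args(
    ["-ni", "--mode", "specific",
     "--select-folders", "src/components", "src/hooks", "--select-root"]
)
print(args.no_instructions, args.mode, args.select_folders, args.select_root)
# True specific ['src/components', 'src/hooks'] True
```

A parser declared this way rejects the double-dash form `--ni` as an unrecognized argument. That form is neither a registered option nor a prefix of `--no-instructions`, so use the single-dash `-ni` in scripts.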

---

## Practical Examples

Here are a few practical examples of how to use the tool from your command line.

- #### Extract an entire project, skipping the instructions

A common command for quick, automated runs.

```bash
code-extractor --no-instructions --mode everything
```

- #### Extract specific sub-folders non-interactively

This command extracts only the `src/components` and `src/hooks` directories, plus any files in the root.

```bash
code-extractor -ni --mode specific --select-folders src/components src/hooks --select-root
```

- #### Perform a safe dry run

This simulates a full extraction and prints what it _would_ have done, without creating any files.

```bash
code-extractor --dry-run --mode everything
```

- #### Run on a different project and save to a custom folder

This targets a different directory and sets a custom output folder name. Because `--mode` is omitted, you are still prompted to choose the extraction mode.

```bash
code-extractor --root /path/to/another/project --output-dir MyProject_Extraction
```
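
To script several runs, you can assemble the documented flags into an argument list and pass it to `subprocess.run`. This sketch assumes that `code-extractor` is on your `PATH` and that `-ni` together with `--mode everything` avoids every prompt. The README does not say whether the large-file question is still asked when you omit `--exclude-large-files`. The helpers `build_command` and `extract_all` and the output-folder naming are hypothetical:

```python
import shutil
import subprocess
from pathlib import Path


def build_command(root: Path, output_dir: str, dry_run: bool = False) -> list[str]:
    """Assemble a code-extractor invocation from the documented flags."""
    cmd = [
        "code-extractor", "-ni", "--mode", "everything",
        "--root", str(root), "--output-dir", output_dir,
    ]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def extract_all(projects: list[Path], dry_run: bool = True) -> None:
    """Run one extraction per project, stopping at the first failure."""
    if shutil.which("code-extractor") is None:
        raise FileNotFoundError("code-extractor is not on PATH")
    for project in projects:
        # Hypothetical naming scheme for the per-project output folder.
        out = f"{project.name}_Extraction"
        # check=True raises CalledProcessError if the tool exits with a non-zero status.
        subprocess.run(build_command(project, out, dry_run), check=True)


print(build_command(Path("/path/to/another/project"), "MyProject_Extraction"))
```

`extract_all` defaults to `dry_run=True`, so a first call only simulates each extraction.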

---

## 🔬 Filtering Logic

The tool uses a set of rules to determine which files and folders to include in the extraction. Here are the default settings found in the `config.py` file.

<details>
<summary><strong>Click to view Excluded Directories</strong></summary>

- `node_modules`, `vendor`, `__pycache__`, `dist`, `build`, `target`, `.next`
- `.git`, `.svn`, `.hg`, `.vscode`, `.idea`, `venv`, `.venv`

</details>

<details>
<summary><strong>Click to view Excluded Filenames</strong></summary>

- `package-lock.json`, `yarn.lock`, `composer.lock`, `.env`

</details>

<details>
<summary><strong>Click to view Allowed Filenames & Extensions</strong></summary>

The script will process any file with one of the following extensions. It also explicitly allows common configuration files that may not have an extension.

**Allowed Filenames:**
- `dockerfile`, `.gitignore`, `.htaccess`, `makefile`

**Allowed Extensions:**
- `.php`, `.html`, `.css`, `.js`, `.jsx`, `.ts`, `.tsx`, `.vue`, `.svelte`
- `.py`, `.rb`, `.java`, `.c`, `.cpp`, `.cs`, `.go`, `.rs`
- `.json`, `.xml`, `.yaml`, `.yml`, `.toml`, `.ini`, `.conf`
- `.md`, `.txt`, `.rst`, `.twig`, `.blade`, `.handlebars`, `.mustache`, `.ejs`
- `.sql`, `.graphql`, `.gql`, `.tf`

</details>
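
You can approximate the rules above with `os.walk`, pruning excluded directories in place so the walk never descends into them. This sketch copies the default lists from this section. It assumes that filename matching is case-insensitive. The lowercase `dockerfile` and `makefile` entries suggest this, but the README does not confirm it. `iter_included_files` is a hypothetical name, not the tool's API:

```python
import os
import tempfile

# Default lists copied from the "Filtering Logic" section above.
EXCLUDED_DIRS = {
    "node_modules", "vendor", "__pycache__", "dist", "build", "target", ".next",
    ".git", ".svn", ".hg", ".vscode", ".idea", "venv", ".venv",
}
EXCLUDED_FILES = {"package-lock.json", "yarn.lock", "composer.lock", ".env"}
ALLOWED_FILES = {"dockerfile", ".gitignore", ".htaccess", "makefile"}
ALLOWED_EXTS = {
    ".php", ".html", ".css", ".js", ".jsx", ".ts", ".tsx", ".vue", ".svelte",
    ".py", ".rb", ".java", ".c", ".cpp", ".cs", ".go", ".rs",
    ".json", ".xml", ".yaml", ".yml", ".toml", ".ini", ".conf",
    ".md", ".txt", ".rst", ".twig", ".blade", ".handlebars", ".mustache", ".ejs",
    ".sql", ".graphql", ".gql", ".tf",
}


def is_included(filename: str) -> bool:
    """Apply the filename rules: exclusions first, then allowed names and extensions."""
    name = filename.lower()  # assumption: matching ignores case
    if name in EXCLUDED_FILES:
        return False
    return name in ALLOWED_FILES or os.path.splitext(name)[1] in ALLOWED_EXTS


def iter_included_files(root: str):
    """Yield paths, relative to `root`, of files that pass the filters."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Rewriting dirnames in place stops os.walk from descending into excluded dirs.
        dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDED_DIRS)
        for filename in sorted(filenames):
            if is_included(filename):
                yield os.path.relpath(os.path.join(dirpath, filename), root)


with tempfile.TemporaryDirectory() as tmp:
    for rel in ["app.py", "package-lock.json", "Dockerfile", "logo.png",
                "node_modules/lib.js", "src/index.ts"]:
        path = os.path.join(tmp, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    print(list(iter_included_files(tmp)))  # ['Dockerfile', 'app.py', 'src/index.ts'] on POSIX
```

The exclusion check runs first. This is why `package-lock.json` is dropped even though `.json` is an allowed extension.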

---

## 🤔 Troubleshooting

- **Problem:** After installation, I run `code-extractor` and my terminal says `command not found`.
- **Solution:** This is usually a `PATH` issue: your shell doesn't know where to find the installed script. Installing with `pip install --user` places scripts in a per-user directory that may not be on your `PATH`, in which case you need to add that directory yourself. Please refer to your operating system's documentation for instructions on how to modify your `PATH` environment variable.

- **Problem:** The tool ran, but a specific folder or file I expected to see is missing from the output.
- **Solution:** The file or folder was likely excluded by the tool's filtering rules. Please review the **[Filtering Logic](#-filtering-logic)** section above to check whether its name is on an exclusion list or its extension is missing from the allowed list. If you chose to skip large files (or passed `--exclude-large-files`), files larger than 1MB are also left out.
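
For the `command not found` case, the standard library's `sysconfig` module shows where `pip install --user` places scripts and whether that directory is on `PATH`. This minimal sketch assumes the default per-user install scheme (`posix_user` or `nt_user`). Distribution-patched Pythons, or installs made inside a virtual environment, may use a different location:

```python
import os
import shutil
import sysconfig


def user_scripts_dir() -> str:
    """Directory where `pip install --user` puts console scripts (default scheme)."""
    # os.name is "posix" on macOS/Linux and "nt" on Windows.
    return sysconfig.get_path("scripts", f"{os.name}_user")


def on_path(directory: str) -> bool:
    """True if `directory` appears in the PATH environment variable."""
    target = os.path.normcase(os.path.normpath(directory))
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return any(os.path.normcase(os.path.normpath(e)) == target for e in entries if e)


if __name__ == "__main__":
    scripts = user_scripts_dir()
    print("User scripts directory:", scripts)
    print("On PATH:", on_path(scripts))
    # shutil.which returns None if the shell-visible PATH cannot resolve the command.
    print("code-extractor found at:", shutil.which("code-extractor"))
```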

---

## 📜 License

This project is licensed under a modified MIT License. Please see the [LICENCE](LICENCE) file for the full text.

The standard MIT License has been amended with a single, important attribution requirement:

If you use, copy, or modify any part of this software, you must include a clear and visible attribution to the original author and project in your derivative work.

This attribution must include:

- A link back to this original GitHub repository: [https://github.com/lukaszlekowski/codebase-extractor](https://github.com/lukaszlekowski/codebase-extractor)
- A link to the author's LinkedIn profile: [https://www.linkedin.com/in/lukasz-lekowski](https://www.linkedin.com/in/lukasz-lekowski)

@@ -0,0 +1 @@

__version__ = "1.1.0"