codebase-extractor 1.0.1__tar.gz → 1.1.0__tar.gz

Files changed (22)
  1. codebase_extractor-1.1.0/PKG-INFO +311 -0
  2. codebase_extractor-1.1.0/README.md +291 -0
  3. {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/pyproject.toml +1 -1
  4. codebase_extractor-1.1.0/src/codebase_extractor/__init__.py +1 -0
  5. codebase_extractor-1.1.0/src/codebase_extractor/cli.py +76 -0
  6. {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/src/codebase_extractor/config.py +1 -2
  7. {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/src/codebase_extractor/file_handler.py +10 -6
  8. codebase_extractor-1.1.0/src/codebase_extractor/main_logic.py +199 -0
  9. {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/src/codebase_extractor/ui.py +31 -5
  10. codebase_extractor-1.1.0/src/codebase_extractor.egg-info/PKG-INFO +311 -0
  11. {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/src/codebase_extractor.egg-info/SOURCES.txt +1 -0
  12. codebase_extractor-1.0.1/PKG-INFO +0 -167
  13. codebase_extractor-1.0.1/README.md +0 -147
  14. codebase_extractor-1.0.1/src/codebase_extractor/__init__.py +0 -1
  15. codebase_extractor-1.0.1/src/codebase_extractor/main_logic.py +0 -193
  16. codebase_extractor-1.0.1/src/codebase_extractor.egg-info/PKG-INFO +0 -167
  17. {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/LICENCE +0 -0
  18. {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/setup.cfg +0 -0
  19. {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/src/codebase_extractor.egg-info/dependency_links.txt +0 -0
  20. {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/src/codebase_extractor.egg-info/entry_points.txt +0 -0
  21. {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/src/codebase_extractor.egg-info/requires.txt +0 -0
  22. {codebase_extractor-1.0.1 → codebase_extractor-1.1.0}/src/codebase_extractor.egg-info/top_level.txt +0 -0
@@ -0,0 +1,311 @@
+ Metadata-Version: 2.4
+ Name: codebase-extractor
+ Version: 1.1.0
+ Summary: A CLI tool to extract project source code into structured Markdown files for LLM & AI context.
+ Author: Lukasz Lekowski
+ Project-URL: Homepage, https://github.com/lukaszlekowski/codebase-extractor
+ Project-URL: Bug Tracker, https://github.com/lukaszlekowski/codebase-extractor/issues
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Topic :: Software Development :: Documentation
+ Classifier: Topic :: Utilities
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ License-File: LICENCE
+ Requires-Dist: questionary
+ Requires-Dist: halo
+ Requires-Dist: termcolor
+ Dynamic: license-file
20
+
21
+ # Codebase Extractor
22
+
23
+ <p align="center">
24
+ <strong>A user-friendly CLI tool to extract project source code into structured Markdown files.</strong>
25
+ </p>
26
+
27
+ <p align="center">
28
+ <a href="https://pypi.org/project/codebase-extractor/"><img src="https://badge.fury.io/py/codebase-extractor.svg" alt="PyPI version"></a>
29
+ <img src="https://img.shields.io/badge/python-3.9%2B-blue.svg" alt="Python Version">
30
+ <img src="https://img.shields.io/badge/License-MIT%20(Modified)-yellow.svg" alt="License: MIT (Modified)">
31
+ </p>
32
+
33
+ <p align="center">
34
+ 💡 <b>Love this tool?</b> Found a bug or have an idea? Share it on <a href="https://github.com/lukaszlekowski/codebase-extractor">GitHub</a>! <br>
35
+ 🤝 <b>Connect with me</b> on <a href="https://www.linkedin.com/in/lukasz-lekowski">LinkedIn</a>. <br>
36
+ ☕ <b>Enjoying it?</b> Support development with a <a href="https://www.buymeacoffee.com/lukaszlekowski">coffee</a>!
37
+ </p>
38
+
39
+ ---
40
+
41
+ ## Table of Contents
42
+
43
+ - [Codebase Extractor](#codebase-extractor)
44
+ - [Table of Contents](#table-of-contents)
45
+ - [🚀 Overview](#-overview)
46
+ - [✨ Key Features](#-key-features)
47
+ - [🖼️ Gallery](#️-gallery)
48
+ - [⚙️ Installation](#️-installation)
49
+ - [Step 1: Ensure Python is Installed](#step-1-ensure-python-is-installed)
50
+ - [Step 2: Install the Package](#step-2-install-the-package)
51
+ - [▶️ For macOS \& Linux Users](#️-for-macos--linux-users)
52
+ - [▶️ For Windows Users](#️-for-windows-users)
53
+ - [💡 Pro Tip: Using pipx](#-pro-tip-using-pipx)
54
+ - [▶️ Usage](#️-usage)
55
+ - [Basic Usage](#basic-usage)
56
+ - [The Process](#the-process)
57
+ - [Specific Selection (if chosen):](#specific-selection-if-chosen)
58
+ - [Output Details](#output-details)
59
+ - [⚡ CLI Command Reference](#-cli-command-reference)
60
+ - [Pracical Examples](#pracical-examples)
61
+ - [🔬 Filtering Logic](#-filtering-logic)
62
+ - [🤔 Troubleshooting](#-troubleshooting)
63
+ - [📜 License](#-license)
64
+
65
+ ---
66
+
67
+ ## 🚀 Overview
68
+
69
+ Codebase Extractor is a command-line interface (CLI) tool designed to scan a project directory and consolidate all relevant source code into neatly organized Markdown files. It's perfect for creating a complete project snapshot for analysis, documentation, or providing context to Large Language Models (LLMs) like GPT-4, Gemini, or Claude.
70
+
71
+ The tool is highly configurable, allowing you to select specific folders, exclude large files, and intelligently ignore common directories like `node_modules` and `.git`.
72
+
73
+ ---
74
+
75
+ ## ✨ Key Features
76
+
77
+ - **Interactive & User-Friendly:** A guided, multi-step CLI experience that makes selecting options simple and clear.
78
+ - **Smart Filtering:** Automatically excludes common dependency folders, build artifacts, version control directories, and IDE configuration files. The exact filters are configurable.
79
+ - **Flexible Selection Modes:** Choose to extract the entire project with one command, or dive into a specific selection mode.
80
+ - **🌳 Nested Folder Selection:** Interactively browse and select specific sub-folders from a tree-like view.
81
+ - **🔢 Configurable Scan Depth:** You decide how many levels deep the script should look for folders when building the selection tree.
82
+ - **YAML Metadata:** Each generated Markdown file is prepended with a YAML front matter block containing useful metadata like a unique run ID, timestamp, and file count for easy tracking and parsing.
83
+ - **🚀 Quick Start Mode:** Use the `--no-instructions` flag to skip the detailed intro guide on subsequent runs.
84
+ - **Safe & Robust:** Features graceful exit handling (`Ctrl+C`) and provides clear feedback during the extraction process.
85
+
86
+ ---
87
+
88
+ ## 🖼️ Gallery
89
+
90
+ <details>
91
+ <summary>Show Screenshots</summary>
92
+ <img src="images/welcome.png" width="330">
93
+ <img src="images/instructions.png" width="330">
94
+ <img src="images/file_tree.png" width="330">
95
+ <img src="images/extraction.png" width="330">
96
+
97
+ </details>
98
+
99
+ ---
100
+
101
+ ## ⚙️ Installation
102
+
103
+ This guide will walk you through installing the Codebase Extractor.
104
+
105
+ ### Step 1: Ensure Python is Installed
106
+
107
+ Make sure you have Python 3.9 or newer installed. You can check your version by opening your terminal and running:
108
+
109
+ ```bash
110
+ python3 --version
111
+ ```
112
+
113
+ ### Step 2: Install the Package
114
+
115
+ The recommended way to install is directly from PyPI using pip, which comes with Python.
116
+
117
+ #### ▶️ For macOS & Linux Users
118
+
119
+ Open your terminal and run the following command:
120
+
121
+ ```bash
122
+ pip3 install codebase-extractor
123
+ ```
124
+
125
+ > **Note on `pip` vs `pip3`**: On most modern systems, you should use pip3 to ensure you are using a Python 3 version of pip. This avoids conflicts with older, system-installed Python 2. If you are using a virtual environment, pip is often sufficient as it will be linked to the environment's Python version.
126
+
127
+ If you encounter a permission denied error, your system may require you to install it for your user account only:
128
+
129
+ ```bash
130
+ pip3 install --user codebase-extractor
131
+ ```
132
+
133
+ In this case, you may need to add the user script directory to your PATH. The installer will provide the necessary command if this is required.
134
+
135
+ #### ▶️ For Windows Users
136
+
137
+ Open Command Prompt or PowerShell and run the following command:
138
+
139
+ ```bash
140
+ pip install codebase-extractor
141
+ ```
142
+
143
+ > **Note on `pip`**: The standard Python installer for Windows typically configures the `pip` and `python` commands correctly, so you usually do not need to use `pip3` or `python3`.
144
+
145
+ If the pip command is not found, you can try using the Python executable directly:
146
+
147
+ ```bash
148
+ python -m pip install codebase-extractor
149
+ ```
150
+
151
+ #### 💡 Pro Tip: Using pipx
152
+
153
+ For a more advanced, isolated installation, we recommend using pipx. This ensures the tool's dependencies do not conflict with other Python projects on your system.
154
+
155
+ ```bash
156
+ pipx install codebase-extractor
157
+ ```
158
+
159
+ ---
160
+
161
+ ## ▶️ Usage
162
+
163
+ ### Basic Usage
164
+
165
+ Once installed, you can run the tool from any terminal window. Navigate to your project's root directory and run the command:
166
+
167
+ ```bash
168
+ code-extractor
169
+ ```
170
+
171
+ The script will then guide you through the extraction process.
172
+
173
+ For repeat usage, you can skip the detailed introductory guide by using the `--no-instructions` or `-ni` flag:
174
+
175
+ ```bash
176
+ code-extractor --no-instructions
177
+ ```
178
+
179
+ ### The Process
180
+
181
+ The tool will guide you through a series of prompts:
182
+
183
+ - **Initial Setup [1/2]**: A yes/no question to skip files larger than 1MB.
184
+ - **Extraction Mode [2/2]**: Choose whether to extract the entire project (`Everything`) or select (`Specific`) folders.
185
+
186
+ ### Specific Selection (if chosen):
187
+
188
+ - **Scan Depth**: You'll be asked how many sub-folder levels to scan for the selection list (defaults to 3).
189
+ - **Folder Tree**: You'll see a checklist of available folders and sub-folders to extract. The script handles selections intelligently:
190
+ - Selecting a parent folder automatically includes all its sub-folders, so you don’t need to select them individually.
191
+ - To extract only a sub-folder’s contents, select the sub-folder but not its parent.
192
+ - The special `root [...]` option extracts only the files in your project's main directory, ignoring all sub-folders.
193
+
194
+ ### Output Details
195
+
196
+ All output files are saved in a `CODEBASE_EXTRACTS` directory within your project folder. Each generated Markdown file includes a YAML metadata header with a unique reference ID, timestamp, and file count for easy tracking and parsing.
197
+
198
+ ### ⚡ CLI Command Reference
199
+
200
+ For non-interactive use and automation, you can control the script entirely with these arguments.
201
+
+ | Argument | Description | Default Value |
+ | :------------------------- | :--------------------------------------------------------------------------- | :-------------------------- |
+ | `-ni`, `--no-instructions` | Run the script without printing the detailed instruction banner. | `False` |
+ | `--root <path>` | The root directory of the project to extract. | The current directory |
+ | `--output-dir <name>` | Custom name for the output directory. | `CODEBASE_EXTRACTS` |
+ | `--dry-run` | Simulate the extraction process without writing any files. | `False` |
+ | `-v`, `--verbose` | Enable verbose logging for debugging. | `False` |
+ | `--log-file <path>` | Path to save the log file. | `None` |
+ | `--exclude-large-files` | Non-interactive: Exclude files larger than 1MB. | `False` |
+ | `--mode <mode>` | Non-interactive: Set the extraction mode. Choices: `everything`, `specific`. | `None` (Interactive prompt) |
+ | `--depth <number>` | Non-interactive: Set the folder scan depth for 'specific' mode. | `3` |
+ | `--select-folders <list>` | Non-interactive: A space-separated list of folders/sub-folders to extract. | `[]` |
+ | `--select-root` | Non-interactive: Include files from the root directory in the extraction. | `False` |
215
+
216
+ ---
217
+
218
+ ## Pracical Examples
219
+
220
+ Here are a few practical examples of how to use the tool from your command line.
221
+
222
+ - #### Extract an entire project, skipping the instructions
223
+
224
+ A common command for quick, automated runs.
225
+
226
+ ```bash
227
+ code-extractor --no-instructions --mode everything
228
+ ```
229
+
230
+ - #### Extract specific sub-folders non-interactively
231
+
232
+ This command extracts only the `src/components` and `src/hooks` directories, plus any files in the root.
233
+
234
+ ```bash
235
+ code-extractor -ni --mode specific --select-folders src/components src/hooks --select-root
236
+ ```
237
+
238
+ - #### Perform a safe dry run
239
+
240
+ This will simulate a full extraction and print what it _would_ have done, without creating any files.
241
+
242
+ ```bash
243
+ code-extractor --dry-run --mode everything
244
+ ```
245
+
246
+ - #### Run on a different project and save to a custom folder
247
+ This targets a completely different directory and specifies a custom output folder name.
248
+ ```bash
249
+ code-extractor --root /path/to/another/project --output-dir MyProject_Extraction
250
+ ```
251
+
252
+ ---
253
+
254
+ ## 🔬 Filtering Logic
255
+
256
+ The tool uses a set of rules to determine which files and folders to include in the extraction. Here are the default settings found in the `config.py` file.
257
+
258
+ <details>
259
+ <summary><strong>Click to view Excluded Directories</strong></summary>
260
+
261
+ - `node_modules`, `vendor`, `__pycache__`, `dist`, `build`, `target`, `.next`
262
+ - `.git`, `.svn`, `.hg`, `.vscode`, `.idea`, `venv`, `.venv`
263
+
264
+ </details>
265
+
266
+ <details>
267
+ <summary><strong>Click to view Excluded Filenames</strong></summary>
268
+
269
+ - `package-lock.json`, `yarn.lock`, `composer.lock`, `.env`
270
+
271
+ </details>
272
+
273
+ <details>
274
+ <summary><strong>Click to view Allowed Filenames & Extensions</strong></summary>
275
+
276
+ The script will process any file with one of the following extensions. It also explicitly allows common configuration files that may not have an extension.
277
+
278
+ **Allowed Filenames:**
279
+ - `dockerfile`, `.gitignore`, `.htaccess`, `makefile`
280
+
281
+ **Allowed Extensions:**
282
+ - `.php`, `.html`, `.css`, `.js`, `.jsx`, `.ts`, `.tsx`, `.vue`, `.svelte`
283
+ - `.py`, `.rb`, `.java`, `.c`, `.cpp`, `.cs`, `.go`, `.rs`
284
+ - `.json`, `.xml`, `.yaml`, `.yml`, `.toml`, `.ini`, `.conf`
285
+ - `.md`, `.txt`, `.rst`, `.twig`, `.blade`, `.handlebars`, `.mustache`, `.ejs`
286
+ - `.sql`, `.graphql`, `.gql`, `.tf`
287
+
288
+ </details>
289
+
290
+ ---
291
+
292
+ ## 🤔 Troubleshooting
293
+
294
+ - **Problem:** After installation, I run `code-extractor` and my terminal says `command not found`.
295
+ - **Solution:** This is usually a `PATH` issue. It means your system's shell doesn't know where to find the installed script. The `pip install --user` command sometimes requires you to add a local scripts directory to your `PATH`. Please refer to your operating system's documentation for instructions on how to modify your `PATH` environment variable.
296
+
297
+ - **Problem:** The tool ran, but a specific folder or file I expected to see is missing from the output.
298
+ - **Solution:** The file or folder was likely excluded by the tool's filtering rules. Please review the **[Filtering Logic](#-filtering-logic)** section above to see if its name or extension is on one of the exclusion lists.
299
+
300
+ ## 📜 License
301
+
302
+ This project is licensed under a modified MIT License. Please see the [LICENCE](LICENCE) file for the full text.
303
+
304
+ The standard MIT License has been amended with a single, important attribution requirement:
305
+
306
+ If you use, copy, or modify any part of this software, you must include a clear and visible attribution to the original author and project in your derivative work.
307
+
308
+ This attribution must include:
309
+
310
+ - A link back to this original GitHub repository: [https://github.com/lukaszlekowski/codebase-extractor](https://github.com/lukaszlekowski/codebase-extractor)
311
+ - A link to the author's LinkedIn profile: [https://www.linkedin.com/in/lukasz-lekowski](https://www.linkedin.com/in/lukasz-lekowski)
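The default rules listed under "Filtering Logic" in the file above can be read as a simple predicate over relative paths. The following is a minimal sketch in Python, not the package's actual `config.py` or `file_handler.py`, which are not shown in this hunk. The constant names, the `should_extract` helper, the exclusion-before-allow ordering, and the case-insensitive matching are all assumptions made for illustration; only the listed values come from the README.

```python
from pathlib import PurePosixPath

# Values copied from the README's "Filtering Logic" section.
# The names of these constants are hypothetical.
EXCLUDED_DIRS = {
    "node_modules", "vendor", "__pycache__", "dist", "build", "target",
    ".next", ".git", ".svn", ".hg", ".vscode", ".idea", "venv", ".venv",
}
EXCLUDED_FILENAMES = {"package-lock.json", "yarn.lock", "composer.lock", ".env"}
ALLOWED_FILENAMES = {"dockerfile", ".gitignore", ".htaccess", "makefile"}
ALLOWED_EXTENSIONS = {
    ".php", ".html", ".css", ".js", ".jsx", ".ts", ".tsx", ".vue", ".svelte",
    ".py", ".rb", ".java", ".c", ".cpp", ".cs", ".go", ".rs",
    ".json", ".xml", ".yaml", ".yml", ".toml", ".ini", ".conf",
    ".md", ".txt", ".rst", ".twig", ".blade", ".handlebars", ".mustache",
    ".ejs", ".sql", ".graphql", ".gql", ".tf",
}


def should_extract(relative_path: str) -> bool:
    """Return True if a file would pass the README's default filters.

    Assumption: exclusions are checked before the allow lists, and
    filenames and extensions are compared case-insensitively.
    """
    path = PurePosixPath(relative_path)
    # Any excluded directory on the way to the file rules it out.
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return False
    name = path.name.lower()
    if name in EXCLUDED_FILENAMES:
        return False
    if name in ALLOWED_FILENAMES:
        return True
    return path.suffix.lower() in ALLOWED_EXTENSIONS


if __name__ == "__main__":
    for candidate in ("src/app.py", "node_modules/lib/index.js", "Dockerfile"):
        print(candidate, should_extract(candidate))
```

Under these assumptions, `package-lock.json` is rejected even though `.json` is an allowed extension, because the filename exclusion is checked first.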
@@ -0,0 +1,291 @@
1
+ # Codebase Extractor
2
+
3
+ <p align="center">
4
+ <strong>A user-friendly CLI tool to extract project source code into structured Markdown files.</strong>
5
+ </p>
6
+
7
+ <p align="center">
8
+ <a href="https://pypi.org/project/codebase-extractor/"><img src="https://badge.fury.io/py/codebase-extractor.svg" alt="PyPI version"></a>
9
+ <img src="https://img.shields.io/badge/python-3.9%2B-blue.svg" alt="Python Version">
10
+ <img src="https://img.shields.io/badge/License-MIT%20(Modified)-yellow.svg" alt="License: MIT (Modified)">
11
+ </p>
12
+
13
+ <p align="center">
14
+ 💡 <b>Love this tool?</b> Found a bug or have an idea? Share it on <a href="https://github.com/lukaszlekowski/codebase-extractor">GitHub</a>! <br>
15
+ 🤝 <b>Connect with me</b> on <a href="https://www.linkedin.com/in/lukasz-lekowski">LinkedIn</a>. <br>
16
+ ☕ <b>Enjoying it?</b> Support development with a <a href="https://www.buymeacoffee.com/lukaszlekowski">coffee</a>!
17
+ </p>
18
+
19
+ ---
20
+
21
+ ## Table of Contents
22
+
23
+ - [Codebase Extractor](#codebase-extractor)
24
+ - [Table of Contents](#table-of-contents)
25
+ - [🚀 Overview](#-overview)
26
+ - [✨ Key Features](#-key-features)
27
+ - [🖼️ Gallery](#️-gallery)
28
+ - [⚙️ Installation](#️-installation)
29
+ - [Step 1: Ensure Python is Installed](#step-1-ensure-python-is-installed)
30
+ - [Step 2: Install the Package](#step-2-install-the-package)
31
+ - [▶️ For macOS \& Linux Users](#️-for-macos--linux-users)
32
+ - [▶️ For Windows Users](#️-for-windows-users)
33
+ - [💡 Pro Tip: Using pipx](#-pro-tip-using-pipx)
34
+ - [▶️ Usage](#️-usage)
35
+ - [Basic Usage](#basic-usage)
36
+ - [The Process](#the-process)
37
+ - [Specific Selection (if chosen):](#specific-selection-if-chosen)
38
+ - [Output Details](#output-details)
39
+ - [⚡ CLI Command Reference](#-cli-command-reference)
40
+ - [Pracical Examples](#pracical-examples)
41
+ - [🔬 Filtering Logic](#-filtering-logic)
42
+ - [🤔 Troubleshooting](#-troubleshooting)
43
+ - [📜 License](#-license)
44
+
45
+ ---
46
+
47
+ ## 🚀 Overview
48
+
49
+ Codebase Extractor is a command-line interface (CLI) tool designed to scan a project directory and consolidate all relevant source code into neatly organized Markdown files. It's perfect for creating a complete project snapshot for analysis, documentation, or providing context to Large Language Models (LLMs) like GPT-4, Gemini, or Claude.
50
+
51
+ The tool is highly configurable, allowing you to select specific folders, exclude large files, and intelligently ignore common directories like `node_modules` and `.git`.
52
+
53
+ ---
54
+
55
+ ## ✨ Key Features
56
+
57
+ - **Interactive & User-Friendly:** A guided, multi-step CLI experience that makes selecting options simple and clear.
58
+ - **Smart Filtering:** Automatically excludes common dependency folders, build artifacts, version control directories, and IDE configuration files. The exact filters are configurable.
59
+ - **Flexible Selection Modes:** Choose to extract the entire project with one command, or dive into a specific selection mode.
60
+ - **🌳 Nested Folder Selection:** Interactively browse and select specific sub-folders from a tree-like view.
61
+ - **🔢 Configurable Scan Depth:** You decide how many levels deep the script should look for folders when building the selection tree.
62
+ - **YAML Metadata:** Each generated Markdown file is prepended with a YAML front matter block containing useful metadata like a unique run ID, timestamp, and file count for easy tracking and parsing.
63
+ - **🚀 Quick Start Mode:** Use the `--no-instructions` flag to skip the detailed intro guide on subsequent runs.
64
+ - **Safe & Robust:** Features graceful exit handling (`Ctrl+C`) and provides clear feedback during the extraction process.
65
+
66
+ ---
67
+
68
+ ## 🖼️ Gallery
69
+
70
+ <details>
71
+ <summary>Show Screenshots</summary>
72
+ <img src="images/welcome.png" width="330">
73
+ <img src="images/instructions.png" width="330">
74
+ <img src="images/file_tree.png" width="330">
75
+ <img src="images/extraction.png" width="330">
76
+
77
+ </details>
78
+
79
+ ---
80
+
81
+ ## ⚙️ Installation
82
+
83
+ This guide will walk you through installing the Codebase Extractor.
84
+
85
+ ### Step 1: Ensure Python is Installed
86
+
87
+ Make sure you have Python 3.9 or newer installed. You can check your version by opening your terminal and running:
88
+
89
+ ```bash
90
+ python3 --version
91
+ ```
92
+
93
+ ### Step 2: Install the Package
94
+
95
+ The recommended way to install is directly from PyPI using pip, which comes with Python.
96
+
97
+ #### ▶️ For macOS & Linux Users
98
+
99
+ Open your terminal and run the following command:
100
+
101
+ ```bash
102
+ pip3 install codebase-extractor
103
+ ```
104
+
105
+ > **Note on `pip` vs `pip3`**: On most modern systems, you should use pip3 to ensure you are using a Python 3 version of pip. This avoids conflicts with older, system-installed Python 2. If you are using a virtual environment, pip is often sufficient as it will be linked to the environment's Python version.
106
+
107
+ If you encounter a permission denied error, your system may require you to install it for your user account only:
108
+
109
+ ```bash
110
+ pip3 install --user codebase-extractor
111
+ ```
112
+
113
+ In this case, you may need to add the user script directory to your PATH. The installer will provide the necessary command if this is required.
114
+
115
+ #### ▶️ For Windows Users
116
+
117
+ Open Command Prompt or PowerShell and run the following command:
118
+
119
+ ```bash
120
+ pip install codebase-extractor
121
+ ```
122
+
123
+ > **Note on `pip`**: The standard Python installer for Windows typically configures the `pip` and `python` commands correctly, so you usually do not need to use `pip3` or `python3`.
124
+
125
+ If the pip command is not found, you can try using the Python executable directly:
126
+
127
+ ```bash
128
+ python -m pip install codebase-extractor
129
+ ```
130
+
131
+ #### 💡 Pro Tip: Using pipx
132
+
133
+ For a more advanced, isolated installation, we recommend using pipx. This ensures the tool's dependencies do not conflict with other Python projects on your system.
134
+
135
+ ```bash
136
+ pipx install codebase-extractor
137
+ ```
138
+
139
+ ---
140
+
141
+ ## ▶️ Usage
142
+
143
+ ### Basic Usage
144
+
145
+ Once installed, you can run the tool from any terminal window. Navigate to your project's root directory and run the command:
146
+
147
+ ```bash
148
+ code-extractor
149
+ ```
150
+
151
+ The script will then guide you through the extraction process.
152
+
153
+ For repeat usage, you can skip the detailed introductory guide by using the `--no-instructions` or `-ni` flag:
154
+
155
+ ```bash
156
+ code-extractor --no-instructions
157
+ ```
158
+
159
+ ### The Process
160
+
161
+ The tool will guide you through a series of prompts:
162
+
163
+ - **Initial Setup [1/2]**: A yes/no question to skip files larger than 1MB.
164
+ - **Extraction Mode [2/2]**: Choose whether to extract the entire project (`Everything`) or select (`Specific`) folders.
165
+
166
+ ### Specific Selection (if chosen):
167
+
168
+ - **Scan Depth**: You'll be asked how many sub-folder levels to scan for the selection list (defaults to 3).
169
+ - **Folder Tree**: You'll see a checklist of available folders and sub-folders to extract. The script handles selections intelligently:
170
+ - Selecting a parent folder automatically includes all its sub-folders, so you don’t need to select them individually.
171
+ - To extract only a sub-folder’s contents, select the sub-folder but not its parent.
172
+ - The special `root [...]` option extracts only the files in your project's main directory, ignoring all sub-folders.
173
+
174
+ ### Output Details
175
+
176
+ All output files are saved in a `CODEBASE_EXTRACTS` directory within your project folder. Each generated Markdown file includes a YAML metadata header with a unique reference ID, timestamp, and file count for easy tracking and parsing.
177
+
178
+ ### ⚡ CLI Command Reference
179
+
180
+ For non-interactive use and automation, you can control the script entirely with these arguments.
181
+
+ | Argument | Description | Default Value |
+ | :------------------------- | :--------------------------------------------------------------------------- | :-------------------------- |
+ | `-ni`, `--no-instructions` | Run the script without printing the detailed instruction banner. | `False` |
+ | `--root <path>` | The root directory of the project to extract. | The current directory |
+ | `--output-dir <name>` | Custom name for the output directory. | `CODEBASE_EXTRACTS` |
+ | `--dry-run` | Simulate the extraction process without writing any files. | `False` |
+ | `-v`, `--verbose` | Enable verbose logging for debugging. | `False` |
+ | `--log-file <path>` | Path to save the log file. | `None` |
+ | `--exclude-large-files` | Non-interactive: Exclude files larger than 1MB. | `False` |
+ | `--mode <mode>` | Non-interactive: Set the extraction mode. Choices: `everything`, `specific`. | `None` (Interactive prompt) |
+ | `--depth <number>` | Non-interactive: Set the folder scan depth for 'specific' mode. | `3` |
+ | `--select-folders <list>` | Non-interactive: A space-separated list of folders/sub-folders to extract. | `[]` |
+ | `--select-root` | Non-interactive: Include files from the root directory in the extraction. | `False` |
195
+
196
+ ---
197
+
198
+ ## Pracical Examples
199
+
200
+ Here are a few practical examples of how to use the tool from your command line.
201
+
202
+ - #### Extract an entire project, skipping the instructions
203
+
204
+ A common command for quick, automated runs.
205
+
206
+ ```bash
207
+ code-extractor --no-instructions --mode everything
208
+ ```
209
+
210
+ - #### Extract specific sub-folders non-interactively
211
+
212
+ This command extracts only the `src/components` and `src/hooks` directories, plus any files in the root.
213
+
214
+ ```bash
215
+ code-extractor -ni --mode specific --select-folders src/components src/hooks --select-root
216
+ ```
217
+
218
+ - #### Perform a safe dry run
219
+
220
+ This will simulate a full extraction and print what it _would_ have done, without creating any files.
221
+
222
+ ```bash
223
+ code-extractor --dry-run --mode everything
224
+ ```
225
+
226
+ - #### Run on a different project and save to a custom folder
227
+ This targets a completely different directory and specifies a custom output folder name.
228
+ ```bash
229
+ code-extractor --root /path/to/another/project --output-dir MyProject_Extraction
230
+ ```
231
+
232
+ ---
233
+
234
+ ## 🔬 Filtering Logic
235
+
236
+ The tool uses a set of rules to determine which files and folders to include in the extraction. Here are the default settings found in the `config.py` file.
237
+
238
+ <details>
239
+ <summary><strong>Click to view Excluded Directories</strong></summary>
240
+
241
+ - `node_modules`, `vendor`, `__pycache__`, `dist`, `build`, `target`, `.next`
242
+ - `.git`, `.svn`, `.hg`, `.vscode`, `.idea`, `venv`, `.venv`
243
+
244
+ </details>
245
+
246
+ <details>
247
+ <summary><strong>Click to view Excluded Filenames</strong></summary>
248
+
249
+ - `package-lock.json`, `yarn.lock`, `composer.lock`, `.env`
250
+
251
+ </details>
252
+
253
+ <details>
254
+ <summary><strong>Click to view Allowed Filenames & Extensions</strong></summary>
255
+
256
+ The script will process any file with one of the following extensions. It also explicitly allows common configuration files that may not have an extension.
257
+
258
+ **Allowed Filenames:**
259
+ - `dockerfile`, `.gitignore`, `.htaccess`, `makefile`
260
+
261
+ **Allowed Extensions:**
262
+ - `.php`, `.html`, `.css`, `.js`, `.jsx`, `.ts`, `.tsx`, `.vue`, `.svelte`
263
+ - `.py`, `.rb`, `.java`, `.c`, `.cpp`, `.cs`, `.go`, `.rs`
264
+ - `.json`, `.xml`, `.yaml`, `.yml`, `.toml`, `.ini`, `.conf`
265
+ - `.md`, `.txt`, `.rst`, `.twig`, `.blade`, `.handlebars`, `.mustache`, `.ejs`
266
+ - `.sql`, `.graphql`, `.gql`, `.tf`
267
+
268
+ </details>
269
+
270
+ ---
271
+
272
+ ## 🤔 Troubleshooting
273
+
274
+ - **Problem:** After installation, I run `code-extractor` and my terminal says `command not found`.
275
+ - **Solution:** This is usually a `PATH` issue. It means your system's shell doesn't know where to find the installed script. The `pip install --user` command sometimes requires you to add a local scripts directory to your `PATH`. Please refer to your operating system's documentation for instructions on how to modify your `PATH` environment variable.
276
+
277
+ - **Problem:** The tool ran, but a specific folder or file I expected to see is missing from the output.
278
+ - **Solution:** The file or folder was likely excluded by the tool's filtering rules. Please review the **[Filtering Logic](#-filtering-logic)** section above to see if its name or extension is on one of the exclusion lists.
279
+
280
+ ## 📜 License
281
+
282
+ This project is licensed under a modified MIT License. Please see the [LICENCE](LICENCE) file for the full text.
283
+
284
+ The standard MIT License has been amended with a single, important attribution requirement:
285
+
286
+ If you use, copy, or modify any part of this software, you must include a clear and visible attribution to the original author and project in your derivative work.
287
+
288
+ This attribution must include:
289
+
290
+ - A link back to this original GitHub repository: [https://github.com/lukaszlekowski/codebase-extractor](https://github.com/lukaszlekowski/codebase-extractor)
291
+ - A link to the author's LinkedIn profile: [https://www.linkedin.com/in/lukasz-lekowski](https://www.linkedin.com/in/lukasz-lekowski)
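The "CLI Command Reference" table in the README above maps naturally onto Python's standard `argparse` module. The sketch below is one way such a parser could be declared, assuming the flags and defaults exactly as documented in the table. It is not the contents of the new `cli.py` (that file is not shown in this part of the diff), and the `build_parser` function name is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the README's CLI reference table.

    Hypothetical helper: flag names and defaults follow the table,
    everything else is an assumption.
    """
    parser = argparse.ArgumentParser(prog="code-extractor")
    parser.add_argument("-ni", "--no-instructions", action="store_true")
    parser.add_argument("--root", default=".")  # "The current directory"
    parser.add_argument("--output-dir", default="CODEBASE_EXTRACTS")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("--log-file", default=None)
    parser.add_argument("--exclude-large-files", action="store_true")
    # None means "fall back to the interactive prompt".
    parser.add_argument("--mode", choices=["everything", "specific"], default=None)
    parser.add_argument("--depth", type=int, default=3)
    # nargs="*" accepts the space-separated folder list from the table.
    parser.add_argument("--select-folders", nargs="*", default=[])
    parser.add_argument("--select-root", action="store_true")
    return parser


# Parse the README's "specific sub-folders" example without touching sys.argv.
args = build_parser().parse_args(
    ["-ni", "--mode", "specific",
     "--select-folders", "src/components", "src/hooks",
     "--select-root"]
)
print(args.mode, args.select_folders, args.select_root)
```

Because `--select-folders` consumes every following value until the next flag, `--select-root` has to come after the folder list (or before `--select-folders`) for the folders to be parsed as intended.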
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

  [project]
  name = "codebase-extractor"
- version = "1.0.1"
+ version = "1.1.0"
  authors = [
  { name="Lukasz Lekowski" },
  ]
@@ -0,0 +1 @@
+ __version__ = "1.1.0"
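The version string changes from `1.0.1` to `1.1.0` in both `pyproject.toml` and `__init__.py`. If the project follows semantic versioning (an assumption, since the diff does not say so), this is a minor release. A small sketch of how the two strings could be compared, with hypothetical helper names `parse_version` and `bump_kind`:

```python
def parse_version(version):
    """Split a plain 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(old, new):
    """Classify the change between two versions as major, minor, patch or none."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        return "none"
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    return "patch"


print(bump_kind("1.0.1", "1.1.0"))  # prints "minor"
```

This only handles plain three-part versions; pre-release or local version suffixes would need a fuller parser.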