codebase-extractor 1.0.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- codebase_extractor-1.0.0/LICENCE +11 -0
- codebase_extractor-1.0.0/PKG-INFO +143 -0
- codebase_extractor-1.0.0/README.md +123 -0
- codebase_extractor-1.0.0/pyproject.toml +34 -0
- codebase_extractor-1.0.0/setup.cfg +4 -0
- codebase_extractor-1.0.0/src/codebase_extractor/__init__.py +1 -0
- codebase_extractor-1.0.0/src/codebase_extractor/config.py +41 -0
- codebase_extractor-1.0.0/src/codebase_extractor/file_handler.py +133 -0
- codebase_extractor-1.0.0/src/codebase_extractor/main_logic.py +197 -0
- codebase_extractor-1.0.0/src/codebase_extractor/ui.py +84 -0
- codebase_extractor-1.0.0/src/codebase_extractor.egg-info/PKG-INFO +143 -0
- codebase_extractor-1.0.0/src/codebase_extractor.egg-info/SOURCES.txt +14 -0
- codebase_extractor-1.0.0/src/codebase_extractor.egg-info/dependency_links.txt +1 -0
- codebase_extractor-1.0.0/src/codebase_extractor.egg-info/entry_points.txt +2 -0
- codebase_extractor-1.0.0/src/codebase_extractor.egg-info/requires.txt +3 -0
- codebase_extractor-1.0.0/src/codebase_extractor.egg-info/top_level.txt +1 -0

codebase_extractor-1.0.0/LICENCE
@@ -0,0 +1,11 @@
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+If you use, copy, or modify any part of this software, you must include a clear and visible attribution to the original author and project in your derivative work. This attribution must include:
+
+A link back to the original GitHub repository: https://github.com/lukaszlekowski/codebase-extractor
+
+A link to the author's LinkedIn profile: https://www.linkedin.com/in/lukasz-lekowski
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

codebase_extractor-1.0.0/PKG-INFO
@@ -0,0 +1,143 @@
+Metadata-Version: 2.4
+Name: codebase-extractor
+Version: 1.0.0
+Summary: A CLI tool to extract project source code into structured Markdown files for LLM & AI context.
+Author: Lukasz Lekowski
+Project-URL: Homepage, https://github.com/lukaszlekowski/codebase-extractor
+Project-URL: Bug Tracker, https://github.com/lukaszlekowski/codebase-extractor/issues
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Topic :: Software Development :: Documentation
+Classifier: Topic :: Utilities
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENCE
+Requires-Dist: questionary
+Requires-Dist: halo
+Requires-Dist: termcolor
+Dynamic: license-file
+
+# Codebase Extractor
+
+<p align="center">
+<strong>A user-friendly CLI tool to extract project source code into structured Markdown files.</strong>
+</p>
+
+<p align="center">
+<img src="https://img.shields.io/badge/License-MIT%20(Modified)-yellow.svg" alt="License: MIT (Modified)">
+<img src="https://img.shields.io/badge/python-3.9%2B-blue.svg" alt="Python Version">
+<img src="https://img.shields.io/github/stars/lukaszlekowski/codebase-extractor?style=social" alt="GitHub Stars">
+</p>
+<p align="center">
+💡 <b>Love this tool?</b> Found a bug or have an idea? Share it on <a href="https://github.com/lukaszlekowski/codebase-extractor">GitHub</a>! <br>
+<b>Connect with me</b> on <a href="https://www.linkedin.com/in/lukasz-lekowski">LinkedIn</a>. <br>
+☕ <b>Enjoying it?</b> Support development with a <a href="https://www.buymeacoffee.com/lukaszlekowski">coffee</a>!
+</p>
+
+</p>
+
+---
+
+## Overview
+
+Codebase Extractor is a command-line interface (CLI) tool designed to scan a project directory and consolidate all relevant source code into neatly organized Markdown files. It's perfect for creating a complete project snapshot for analysis, documentation, or providing context to Large Language Models (LLMs) like GPT-4, Gemini, or Claude.
+
+The tool is highly configurable, allowing you to select specific folders, exclude large files, and intelligently ignore common directories like `node_modules` and `.git`.
+
+---
+
+## ✨ Key Features
+
+- **Interactive & User-Friendly:** A guided, multi-step CLI experience that makes selecting options simple and clear.
+- **Smart Filtering:** Automatically excludes common dependency folders, build artifacts, version control directories, and IDE configuration files.
+- **Flexible Selection Modes:** Choose to extract the entire project with one command, or dive into a specific selection mode.
+- **🌳 Nested Folder Selection:** Interactively browse and select specific sub-folders from a tree-like view.
+- **🔢 Configurable Scan Depth:** You decide how many levels deep the script should look for folders when building the selection tree.
+- **YAML Metadata:** Each generated Markdown file is prepended with a YAML front matter block containing useful metadata like a unique run ID, timestamp, and file count for easy tracking and parsing.
+- **Quick Start Mode:** Use the `--no-instructions` flag to skip the detailed intro guide on subsequent runs.
+- **Safe & Robust:** Features graceful exit handling (`Ctrl+C`) and provides clear feedback during the extraction process.
+
+---
+
+## 📦 Installation
+
+To get started with Codebase Extractor, follow these steps.
+
+1. **Clone the Repository**
+
+```bash
+git clone [https://github.com/lukaszlekowski/codebase-extractor.git](https://github.com/lukaszlekowski/codebase-extractor.git)
+cd codebase-extractor
+```
+
+2. **Set Up a Virtual Environment** (Recommended)
+This keeps the dependencies for this project isolated from your system's Python installation.
+
+```bash
+# For macOS/Linux
+python3 -m venv venv
+source venv/bin/activate
+
+# For Windows
+python -m venv venv
+venv\Scripts\activate
+```
+
+3. **Install Dependencies**
+The required Python packages are listed in `requirements.txt`.
+```bash
+pip install -r requirements.txt
+```
+
+---
+
+## ▶️ Usage
+
+### Basic Usage
+
+Once installed, simply run the script from the root of the project you wish to extract:
+
+```bash
+python3 codebase_extractor.py
+```
+
+# Quick Start
+
+For repeat usage, you can skip the detailed introductory guide by using the `--no-instructions` or `-ni` flag:
+
+```bash
+python3 codebase_extractor.py --no-instructions
+```
+
+## The Process
+
+The tool will guide you through a series of prompts:
+
+- **Initial Setup [1/2]**: A yes/no question to skip files larger than 1MB.
+- **Extraction Mode [2/2]**: Choose whether to extract the entire project (`Everything`) or select specific folders.
+
+### Specific Selection (if chosen):
+
+- **Scan Depth**: You'll be asked how many sub-folder levels to scan for the selection list (defaults to 3).
+- **Folder Tree**: You'll see a checklist of available folders and sub-folders to extract. The script handles selections intelligently:
+- Selecting a parent folder automatically includes all its sub-folders, so you don’t need to select them individually.
+- To extract only a sub-folder’s contents, select the sub-folder but not its parent.
+- The special `root [...]` option extracts only the files in your project's main directory, ignoring all sub-folders.
+
+## Output Details
+
+All output files are saved in a `CODEBASE_EXTRACTS` directory within your project folder. Each generated Markdown file includes a YAML metadata header with a unique reference ID, timestamp, and file count for easy tracking and parsing.
+
+## License
+
+This project is licensed under a modified MIT License. Please see the [LICENSE](LICENSE) file for the full text.
+
+The standard MIT License has been amended with a single, important attribution requirement:
+
+If you use, copy, or modify any part of this software, you must include a clear and visible attribution to the original author and project in your derivative work.
+
+This attribution must include:
+
+- A link back to this original GitHub repository: [https://github.com/lukaszlekowski/codebase-extractor](https://github.com/lukaszlekowski/codebase-extractor)
+- A link to the author's LinkedIn profile: [https://www.linkedin.com/in/lukasz-lekowski](https://www.linkedin.com/in/lukasz-lekowski)
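
The README's Output Details section says every generated Markdown file starts with a YAML metadata header. Judging from the f-string in `write_to_markdown_file` (in `file_handler.py`), that header holds two sections, `extraction_details` and `tool_details`. Each section contains flat `key: value` lines. The README also promises "easy tracking and parsing", so here is a minimal standard-library sketch of reading such a header back. `read_front_matter` and the sample header are hypothetical. The parser handles only that two-level shape; for general YAML, a dedicated library would be the usual choice.

```python
from typing import Dict, Optional


def read_front_matter(markdown_text: str) -> Dict[str, Dict[str, str]]:
    """Parse the two-level front matter block at the top of an extract file.

    Assumes the shape written by write_to_markdown_file: unindented section
    keys, each followed by indented `key: value` lines. Values come back as
    strings with surrounding double quotes removed. Not a general YAML parser.
    """
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("no front matter found")

    result: Dict[str, Dict[str, str]] = {}
    section: Optional[str] = None
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the header
            return result
        if not line.strip():
            continue
        key, _, value = line.strip().partition(":")  # split on the first colon only
        if not line.startswith(" "):  # an unindented key opens a new section
            section = key
            result[section] = {}
        elif section is not None:
            result[section][key] = value.strip().strip('"')
    raise ValueError("front matter is not closed")


# A made-up header in the documented shape.
sample = """---
extraction_details:
  reference: 1b4e28ba-2fa1-11d2-883f-0016d3cca427
  timestamp_utc: "2024-05-01T12:00:00+00:00"
  source_folder: "src"
  file_count: 3
tool_details:
  name: "Codebase Extractor"
  version: "1.0.0"
  source: "https://github.com/lukaszlekowski/codebase-extractor"
---

# Folder: src
"""

meta = read_front_matter(sample)
print(meta["extraction_details"]["file_count"])  # 3
```

Every value comes back as a string, so `file_count` needs an `int(...)` conversion before any arithmetic.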

codebase_extractor-1.0.0/pyproject.toml
@@ -0,0 +1,34 @@
+[build-system]
+requires = ["setuptools"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "codebase-extractor"
+version = "1.0.0"
+authors = [
+    { name="Lukasz Lekowski" },
+]
+description = "A CLI tool to extract project source code into structured Markdown files for LLM & AI context."
+readme = "README.md"
+license = { file="LICENSE" }
+requires-python = ">=3.9"
+classifiers = [
+    "Programming Language :: Python :: 3",
+    "License :: OSI Approved :: MIT License",
+    "Operating System :: OS Independent",
+    "Topic :: Software Development :: Documentation",
+    "Topic :: Utilities",
+]
+dependencies = [
+    "questionary",
+    "halo",
+    "termcolor",
+]
+
+# This creates the `code-extractor` command in the user's terminal
+[project.scripts]
+code-extractor = "codebase_extractor.main_logic:main"
+
+[project.urls]
+"Homepage" = "https://github.com/lukaszlekowski/codebase-extractor"
+"Bug Tracker" = "https://github.com/lukaszlekowski/codebase-extractor/issues"
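
The `[project.scripts]` table maps the `code-extractor` command to `codebase_extractor.main_logic:main`. This is an object reference in `module:attribute` form. The README's usage examples invoke `python3 codebase_extractor.py` rather than this installed command. The console-script wrapper that pip generates imports the module part and looks up the attribute part. The sketch below shows only that resolution step, using `importlib`. `resolve_entry_point` is a hypothetical helper, and the demo resolves a standard-library function so it runs without the package installed.

```python
import importlib
from typing import Any


def resolve_entry_point(reference: str) -> Any:
    """Resolve an object reference such as 'pkg.module:attr.sub' to the object."""
    module_name, sep, attr_path = reference.partition(":")
    obj: Any = importlib.import_module(module_name.strip())
    if sep:  # the ':attr' part is optional in an object reference
        for attr in attr_path.strip().split("."):
            obj = getattr(obj, attr)
    return obj


# With codebase-extractor installed, this would return the CLI's main function:
# main = resolve_entry_point("codebase_extractor.main_logic:main")

dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))  # {"ok": true}
```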

codebase_extractor-1.0.0/src/codebase_extractor/__init__.py
@@ -0,0 +1 @@
+__version__ = "1.0.0"

codebase_extractor-1.0.0/src/codebase_extractor/config.py
@@ -0,0 +1,41 @@
+from pathlib import Path
+import sys
+
+# --- CONFIGURATION ---
+SCRIPT_VERSION = "1.0.0"
+GITHUB_URL = "https://github.com/lukaszlekowski/codebase-extractor"
+LINKEDIN_URL = "https://www.linkedin.com/in/lukasz-lekowski"
+SCRIPT_FILENAME = Path(sys.argv[0]).name
+OUTPUT_DIR_NAME = "CODEBASE_EXTRACTS"
+
+# --- FILE/FOLDER LISTS ---
+EXCLUDED_DIRS = {
+    "node_modules", "vendor", "__pycache__", "dist", "build", "target", ".next",
+    ".git", ".svn", ".hg", ".vscode", ".idea", "venv", ".venv",
+    OUTPUT_DIR_NAME,
+}
+EXCLUDED_FILENAMES = {
+    "package-lock.json", "yarn.lock", "composer.lock", ".env"
+}
+ALLOWED_FILENAMES = {
+    "dockerfile", ".gitignore", ".htaccess", "makefile"
+}
+ALLOWED_EXTENSIONS = {
+    ".php", ".html", ".css", ".js", ".jsx", ".ts", ".tsx", ".vue", ".svelte",
+    ".py", ".rb", ".java", ".c", ".cpp", ".cs", ".go", ".rs", ".json", ".xml",
+    ".yaml", ".yml", ".toml", ".ini", ".conf", ".md", ".txt", ".rst", ".twig",
+    ".blade", ".handlebars", ".mustache", ".ejs", ".sql", ".graphql", ".gql", ".tf",
+}
+
+# --- MAPPINGS & CONSTANTS ---
+EXTENSION_LANG_MAP = {
+    ".js": "javascript", ".ts": "typescript", ".tsx": "tsx", ".py": "python",
+    ".html": "html", ".css": "css", ".json": "json", ".md": "markdown", ".txt": "",
+    ".sh": "bash", ".yml": "yaml", ".yaml": "yaml", ".php": "php", ".rb": "ruby",
+    ".java": "java", ".c": "c", ".cpp": "cpp", ".cs": "csharp", ".go": "go",
+    ".rs": "rust", ".vue": "vue", ".svelte": "svelte", ".sql": "sql",
+    ".graphql": "graphql", ".gql": "graphql",
+}
+MAX_FILE_SIZE_MB = 1
+FILE_COUNT_WARNING_THRESHOLD = 1000
+LOGO_BREAKPOINT = 144
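
`config.py` keeps two related tables apart. `ALLOWED_EXTENSIONS` decides which files `is_allowed_file` accepts. `EXTENSION_LANG_MAP` only picks the code-fence language, and `file_handler.py` falls back to an empty tag through `.get(ext, "")`. A quick set comparison shows where the two drift apart; the sketch below does this with the 1.0.0 values copied in. `check_config` is a hypothetical helper, not part of the package.

```python
from typing import Dict, Set, Tuple

# Values copied from codebase_extractor/config.py in the 1.0.0 release.
ALLOWED_EXTENSIONS = {
    ".php", ".html", ".css", ".js", ".jsx", ".ts", ".tsx", ".vue", ".svelte",
    ".py", ".rb", ".java", ".c", ".cpp", ".cs", ".go", ".rs", ".json", ".xml",
    ".yaml", ".yml", ".toml", ".ini", ".conf", ".md", ".txt", ".rst", ".twig",
    ".blade", ".handlebars", ".mustache", ".ejs", ".sql", ".graphql", ".gql", ".tf",
}
EXTENSION_LANG_MAP = {
    ".js": "javascript", ".ts": "typescript", ".tsx": "tsx", ".py": "python",
    ".html": "html", ".css": "css", ".json": "json", ".md": "markdown", ".txt": "",
    ".sh": "bash", ".yml": "yaml", ".yaml": "yaml", ".php": "php", ".rb": "ruby",
    ".java": "java", ".c": "c", ".cpp": "cpp", ".cs": "csharp", ".go": "go",
    ".rs": "rust", ".vue": "vue", ".svelte": "svelte", ".sql": "sql",
    ".graphql": "graphql", ".gql": "graphql",
}


def check_config(allowed: Set[str], lang_map: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Return (mapped but never extracted, extracted without a language tag)."""
    mapped = set(lang_map)
    return mapped - allowed, allowed - mapped


unused, untagged = check_config(ALLOWED_EXTENSIONS, EXTENSION_LANG_MAP)
print(sorted(unused))  # ['.sh']
print(sorted(untagged))
```

With these values, the check reports `.sh` as mapped to `bash` but never extracted, since it is not in `ALLOWED_EXTENSIONS`. It also reports twelve allowed extensions, including `.jsx`, `.toml` and `.tf`, that are written out with an untagged fence.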

codebase_extractor-1.0.0/src/codebase_extractor/file_handler.py
@@ -0,0 +1,133 @@
+import os
+import datetime
+import uuid
+from pathlib import Path
+from termcolor import colored
+from . import config
+import questionary
+
+def get_folder_choices(root_path: Path, max_depth: int) -> list:
+    """Recursively finds folders up to a max depth and prepares them for questionary."""
+    choices = []
+
+    def scanner(current_path: Path, depth: int):
+        if depth > max_depth:
+            return
+
+        relative_path = current_path.relative_to(root_path)
+        prefix = " " * (depth - 1)
+        display_name = f"{prefix}{current_path.name}"
+        choices.append(questionary.Choice(title=display_name, value=relative_path))
+
+        try:
+            subdirs = sorted([p for p in current_path.iterdir() if p.is_dir() and p.name not in config.EXCLUDED_DIRS])
+            for subdir in subdirs:
+                scanner(subdir, depth + 1)
+        except PermissionError:
+            pass
+
+    top_level_folders = sorted([p for p in root_path.iterdir() if p.is_dir() and p.name not in config.EXCLUDED_DIRS])
+    for folder in top_level_folders:
+        scanner(folder, 1)
+
+    root_option_name = f"root [{root_path.name}] (files in root folder only, excl. sub-folders)"
+    choices.insert(0, questionary.Choice(title=root_option_name, value="ROOT_SENTINEL"))
+
+    return choices
+
+
+def is_allowed_file(path: Path, exclude_large: bool) -> bool:
+    """Checks if a file should be included based on its name, extension, and size."""
+    if path.name == config.SCRIPT_FILENAME:
+        return False
+    if path.name.lower() in config.ALLOWED_FILENAMES:
+        return True
+    if not path.is_file():
+        return False
+    if path.name.lower() in config.EXCLUDED_FILENAMES:
+        return False
+    if path.suffix not in config.ALLOWED_EXTENSIONS:
+        return False
+    if exclude_large and path.stat().st_size > config.MAX_FILE_SIZE_MB * 1024 * 1024:
+        return False
+    return True
+
+
+def extract_code_from_folder(folder: Path, exclude_large: bool) -> (str, int):
+    """Extracts code from a given folder, respecting EXCLUDED_DIRS at all depths."""
+    content = f"# Folder: {folder.relative_to(Path.cwd())}\n\n"
+    extracted_files = 0
+    dirs_to_visit = [folder]
+    while dirs_to_visit:
+        current_dir = dirs_to_visit.pop(0)
+        for item in sorted(current_dir.iterdir()):
+            if item.is_dir() and item.name not in config.EXCLUDED_DIRS:
+                dirs_to_visit.append(item)
+            elif item.is_file() and is_allowed_file(item, exclude_large):
+                try:
+                    rel_path = item.relative_to(Path.cwd())
+                    ext = item.suffix
+                    lang = config.EXTENSION_LANG_MAP.get(ext, "")
+                    content += f"## {rel_path}\n\n```{lang}\n"
+                    content += item.read_text(errors="ignore")
+                    content += "\n```\n\n"
+                    extracted_files += 1
+                except Exception as e:
+                    content += f"\n\n"
+    if extracted_files > config.FILE_COUNT_WARNING_THRESHOLD:
+        print(colored(f"> Caution: Large file count in '{folder.name}' ({extracted_files} files).", "yellow"))
+    return content, extracted_files
+
+
+def extract_code_from_root(root_path: Path, exclude_large: bool) -> (str, int):
+    """Extracts code only from files present in the root directory."""
+    content = f"# Root Files: {root_path.name}\n\n"
+    extracted_files = 0
+    for filepath in sorted(root_path.iterdir()):
+        if filepath.is_file() and is_allowed_file(filepath, exclude_large):
+            ext = filepath.suffix
+            lang = config.EXTENSION_LANG_MAP.get(ext, "")
+            content += f"## {filepath.name}\n\n```{lang}\n"
+            content += filepath.read_text(errors="ignore")
+            content += "\n```\n\n"
+            extracted_files += 1
+    if extracted_files > config.FILE_COUNT_WARNING_THRESHOLD:
+        print(colored(f"> Caution: Large file count in root ({extracted_files} files).", "yellow"))
+    return content, extracted_files
+
+
+def write_to_markdown_file(content: str, metadata: dict, root_path: Path):
+    """Constructs a YAML header and writes the full content to a timestamped Markdown file."""
+    output_dir = Path(config.OUTPUT_DIR_NAME)
+    output_dir.mkdir(exist_ok=True)
+
+    timestamp = datetime.datetime.fromisoformat(metadata['run_timestamp']).strftime("%Y%m%d_%H%M%S")
+    output_name = str(metadata['folder_name'])
+
+    if output_name.startswith(f"root [{root_path.name}]"):
+        file_base_name = f"root_{root_path.name}"
+    else:
+        file_base_name = str(output_name).replace(os.sep, "_")
+
+    filename = f"{file_base_name}_{timestamp}.md"
+    full_filepath = output_dir / filename
+
+    yaml_header = f"""---
+extraction_details:
+  reference: {metadata['run_ref']}
+  timestamp_utc: "{metadata['run_timestamp']}"
+  source_folder: "{metadata['folder_name']}"
+  file_count: {metadata['file_count']}
+tool_details:
+  name: "Codebase Extractor"
+  version: "{config.SCRIPT_VERSION}"
+  source: "{config.GITHUB_URL}"
+---
+
+"""
+    full_content = yaml_header + content
+    with open(full_filepath, "w", encoding="utf-8") as f:
+        f.write(full_content)
+
+    print(f"\n💾 Saved to {colored(str(full_filepath), 'cyan')}")
+    return str(full_filepath)
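
`extract_code_from_folder` walks a folder breadth-first. It keeps a list of directories still to visit and skips any directory whose name is in `EXCLUDED_DIRS` before descending into it. Each accepted file is appended as a `## path` heading followed by a fenced block. The self-contained sketch below follows the same traversal under simplifying assumptions: small stand-in sets replace the ones in `config.py`, paths are relative to the scanned folder instead of `Path.cwd()`, and there is no size limit or script-name check. It also swaps the original `list.pop(0)`, which costs O(n) per call, for `collections.deque.popleft()`, which is O(1). `walk_and_render` and the stand-in sets are hypothetical.

```python
import tempfile
from collections import deque
from pathlib import Path
from typing import Tuple

FENCE = "`" * 3  # built at runtime so this example nests cleanly inside Markdown

# Reduced stand-ins for the sets in config.py (assumption, not the full lists).
EXCLUDED_DIRS = {"node_modules", ".git", "__pycache__", "venv"}
ALLOWED_EXTENSIONS = {".py", ".md", ".json"}
LANG_MAP = {".py": "python", ".md": "markdown", ".json": "json"}


def walk_and_render(folder: Path) -> Tuple[str, int]:
    """Breadth-first walk of `folder`, rendering allowed files as Markdown sections."""
    content = f"# Folder: {folder.name}\n\n"
    count = 0
    queue = deque([folder])
    while queue:
        current = queue.popleft()  # O(1), unlike list.pop(0)
        for item in sorted(current.iterdir()):
            if item.is_dir():
                if item.name not in EXCLUDED_DIRS:  # prune before descending
                    queue.append(item)
            elif item.is_file() and item.suffix in ALLOWED_EXTENSIONS:
                lang = LANG_MAP.get(item.suffix, "")
                rel = item.relative_to(folder).as_posix()
                text = item.read_text(encoding="utf-8", errors="ignore")
                content += f"## {rel}\n\n{FENCE}{lang}\n{text}\n{FENCE}\n\n"
                count += 1
    return content, count


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "pkg" / "mod.py").write_text("x = 1\n", encoding="utf-8")
    (root / "node_modules").mkdir()
    (root / "node_modules" / "dep.json").write_text("{}", encoding="utf-8")
    (root / "README.md").write_text("# hi\n", encoding="utf-8")
    md, n = walk_and_render(root)

print(n)  # 2: README.md and pkg/mod.py; node_modules is never entered
```

Because all files at one level are rendered before any deeper level, top-level files such as `README.md` always come before files in sub-folders in the output.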

codebase_extractor-1.0.0/src/codebase_extractor/main_logic.py
@@ -0,0 +1,197 @@
+import os
+import sys
+import time
+import datetime
+import uuid
+import shutil
+import argparse
+from pathlib import Path
+from typing import List, Optional
+
+import questionary
+from halo import Halo
+from termcolor import colored
+from prompt_toolkit.styles import Style
+from questionary import Validator, ValidationError
+
+# Import from our new modules
+from . import config
+from . import ui
+from . import file_handler
+
+class NumberValidator(Validator):
+    """Validates that the input is a positive integer."""
+    def validate(self, document):
+        try:
+            value = int(document.text)
+            if value <= 0:
+                raise ValidationError(
+                    message="Please enter a positive number.",
+                    cursor_position=len(document.text))
+        except ValueError:
+            raise ValidationError(
+                message="Please enter a valid number.",
+                cursor_position=len(document.text))
+
+def main():
+    """Main function to run the CLI application."""
+    parser = argparse.ArgumentParser(add_help=False)
+    parser.add_argument(
+        '-ni', '--no-instructions',
+        action='store_true'
+    )
+    args, _ = parser.parse_known_args()
+
+    ui.clear_screen()
+    ui.print_banner()
+
+    if not args.no_instructions:
+        ui.show_instructions()
+    else:
+        input(colored("\nPress Enter to begin...", "green"))
+    ui.clear_screen()
+
+    root_path = Path.cwd()
+
+    select_style = Style([('qmark', 'fg:#FFA500'), ('pointer', 'fg:#FFA500'), ('highlighted', 'fg:black bg:#FFA500'), ('selected', 'fg:black bg:#FFA500')])
+    checkbox_style = Style([('qmark', 'fg:#FFA500'), ('pointer', 'fg:#FFA500'), ('highlighted', 'fg:#FFA500'), ('selected', 'fg:#FFA500'), ('checkbox-selected', 'fg:#FFA500')])
+
+    exit_message = colored("\nExtraction aborted by user. Closing Code Extractor. Goodbye.", "red")
+
+    print("=== Extraction Settings ===")
+    exclude_large = questionary.select("[1/2] -- Exclude files larger than 1MB?", choices=["yes", "no"], style=select_style, instruction=" ").ask()
+    if exclude_large is None:
+        print(exit_message)
+        return
+    exclude_large = exclude_large == "yes"
+    print()
+
+    folders_to_process = set()
+    process_root_files = False
+
+    run_ref = str(uuid.uuid4())
+    run_timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
+
+    selection_mode = questionary.select("[2/2] -- What do you want to extract?", choices=["Everything (all folders and root files)", "Specific folders/root from a list"], style=select_style, instruction=" ").ask()
+    if selection_mode is None:
+        print(exit_message)
+        return
+
+    if selection_mode == "Everything (all folders and root files)":
+        folders_to_process.update([p for p in root_path.iterdir() if p.is_dir() and p.name not in config.EXCLUDED_DIRS])
+        process_root_files = True
+    else:
+        depth_str = questionary.text(
+            "-- How many levels deep should we scan for folders?",
+            default="3",
+            validate=NumberValidator,
+            style=select_style
+        ).ask()
+        if depth_str is None:
+            print(exit_message)
+            return
+        scan_depth = int(depth_str)
+
+        folder_choices = file_handler.get_folder_choices(root_path, max_depth=scan_depth)
+        selected_options = None
+        confirm_exit = False
+
+        checkbox_instruction = "(Arrows to move, Space to select, A to toggle, I to invert)"
+
+        while not selected_options:
+            selection = questionary.checkbox(
+                "-- Select folders/sub-folders to extract (must select at least one):",
+                choices=folder_choices,
+                style=checkbox_style,
+                instruction=checkbox_instruction
+            ).ask()
+
+            if selection is None:
+                if confirm_exit:
+                    print(exit_message)
+                    return
+                else:
+                    confirm_exit = True
+                    print(colored("\n[!] Press Ctrl+C again to exit.", "yellow"))
+                    continue
+
+            confirm_exit = False
+
+            if not selection:
+                print(colored("[!] Error: You must make a selection.", "red"))
+                continue
+
+            selected_options = selection
+            break
+
+        if "ROOT_SENTINEL" in selected_options:
+            process_root_files = True
+            selected_options.remove("ROOT_SENTINEL")
+
+        selected_paths = [root_path / p for p in selected_options]
+        sorted_paths = sorted(selected_paths, key=lambda p: len(p.parts))
+
+        final_paths = set()
+        for path in sorted_paths:
+            if not any(path.is_relative_to(parent) for parent in final_paths):
+                final_paths.add(path)
+
+        folders_to_process.update(final_paths)
+
+    print()
+    total_files_extracted = 0
+
+    for folder_path in sorted(list(folders_to_process)):
+        with Halo(text=f"Extracting {folder_path.relative_to(root_path)}...", spinner="dots"):
+            time.sleep(0.1)
+            folder_md, folder_count = file_handler.extract_code_from_folder(folder_path, exclude_large)
+        if folder_count > 0:
+            metadata = {"run_ref": run_ref, "run_timestamp": run_timestamp, "folder_name": str(folder_path.relative_to(root_path)), "file_count": folder_count}
+            file_handler.write_to_markdown_file(folder_md, metadata, root_path)
+            total_files_extracted += folder_count
+            print(f"✅ Extracted {folder_count} file(s) from: {folder_path.relative_to(root_path)}")
+        else:
+            print(f"[!] No extractable files in: {folder_path.relative_to(root_path)}")
+        print("")
+
+    if process_root_files:
+        root_display_name = f"root [{root_path.name}] (files in root folder only, excl. sub-folders)"
+        with Halo(text=f"Extracting {root_display_name}...", spinner="dots"):
+            time.sleep(0.1)
+            root_md, root_count = file_handler.extract_code_from_root(root_path, exclude_large)
+        if root_count > 0:
+            metadata = {"run_ref": run_ref, "run_timestamp": run_timestamp, "folder_name": root_display_name, "file_count": root_count}
+            file_handler.write_to_markdown_file(root_md, metadata, root_path)
+            total_files_extracted += root_count
+            print(f"✅ Extracted {root_count} file(s) from the root directory")
+        else:
+            print("[!] No extractable files in the root directory")
|
|
169
|
+
print("")
|
|
170
|
+
|
|
171
|
+
try:
|
|
172
|
+
width = shutil.get_terminal_size((80, 20)).columns
|
|
173
|
+
except OSError:
|
|
174
|
+
width = 80
|
|
175
|
+
|
|
176
|
+
if total_files_extracted > 0:
|
|
177
|
+
output_dir_path = Path(config.OUTPUT_DIR_NAME).resolve()
|
|
178
|
+
print(colored(f"Success! A total of {total_files_extracted} file(s) have been extracted.", "grey", "on_green"))
|
|
179
|
+
print(f"Files saved in: {colored(str(output_dir_path), 'cyan')}")
|
|
180
|
+
else:
|
|
181
|
+
print(colored("Extraction complete, but no files matched the criteria.", "yellow"))
|
|
182
|
+
|
|
183
|
+
print("\n")
|
|
184
|
+
print("-" * width)
|
|
185
|
+
print("π‘ Love this tool? Found a bug? Share your feedback on GitHub:")
|
|
186
|
+
print(config.GITHUB_URL + "\n")
|
|
187
|
+
print("π€ Connect with the author on LinkedIn:")
|
|
188
|
+
print(config.LINKEDIN_URL + "\n")
|
|
189
|
+
print("β Enjoying this tool? You can support its development with a coffee!")
|
|
190
|
+
print("https://www.buymeacoffee.com/lukaszlekowski\n")
|
|
191
|
+
|
|
192
|
+
|
|
193
|
+
if __name__ == "__main__":
|
|
194
|
+
try:
|
|
195
|
+
main()
|
|
196
|
+
except KeyboardInterrupt:
|
|
197
|
+
print(colored("\nExtraction aborted by user. Closing Code Extractor. Goodbye.", "red"))
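The parent/child rule in `main_logic.py` works because the selected paths are first sorted by `len(p.parts)`. That ordering guarantees a parent is accepted before any of its descendants is tested with `Path.is_relative_to()`. The sketch below pulls that step out into a pure function so it can be tried without the interactive prompts. The name `collapse_nested` is hypothetical and not part of the package. It returns a list rather than a set only so the output order is predictable.

```python
from pathlib import PurePosixPath


def collapse_nested(paths):
    """Keep only the outermost paths, dropping any path inside another selected one.

    Hypothetical helper mirroring the loop in main_logic.py: shorter paths are
    visited first, so every parent is kept before its children are checked.
    """
    kept = []
    for path in sorted(paths, key=lambda p: len(p.parts)):
        # is_relative_to() compares whole path components (Python 3.9+)
        if not any(path.is_relative_to(parent) for parent in kept):
            kept.append(path)
    return kept


selected = [
    PurePosixPath("src/app/utils"),
    PurePosixPath("src"),
    PurePosixPath("docs/api"),
    PurePosixPath("srcfoo"),
]
print(collapse_nested(selected))
# [PurePosixPath('src'), PurePosixPath('srcfoo'), PurePosixPath('docs/api')]
```

Because the comparison is component-wise, `srcfoo` is not treated as a child of `src`, which a plain string-prefix check would get wrong. `is_relative_to()` first appeared in Python 3.9, which matches the package's `Requires-Python: >=3.9`.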
@@ -0,0 +1,84 @@
+import os
+import shutil
+from . import config
+from termcolor import colored
+
+LOGO_LARGE = """
+ ██████╗ ██████╗ ██████╗ ███████╗██████╗  █████╗ ███████╗███████╗   ███████╗██╗  ██╗████████╗██████╗  █████╗  ██████╗████████╗ ██████╗ ██████╗
+██╔════╝██╔═══██╗██╔══██╗██╔════╝██╔══██╗██╔══██╗██╔════╝██╔════╝   ██╔════╝╚██╗██╔╝╚══██╔══╝██╔══██╗██╔══██╗██╔════╝╚══██╔══╝██╔═══██╗██╔══██╗
+██║     ██║   ██║██║  ██║█████╗  ██████╔╝███████║███████╗█████╗     █████╗   ╚███╔╝    ██║   ██████╔╝███████║██║        ██║   ██║   ██║██████╔╝
+██║     ██║   ██║██║  ██║██╔══╝  ██╔══██╗██╔══██║╚════██║██╔══╝     ██╔══╝   ██╔██╗    ██║   ██╔══██╗██╔══██║██║        ██║   ██║   ██║██╔══██╗
+╚██████╗╚██████╔╝██████╔╝███████╗██████╔╝██║  ██║███████║███████╗   ███████╗██╔╝ ██╗   ██║   ██║  ██║██║  ██║╚██████╗   ██║   ╚██████╔╝██║  ██║
+ ╚═════╝  ╚═════╝ ╚═════╝ ╚══════╝╚═════╝ ╚═╝  ╚═╝╚══════╝╚══════╝   ╚══════╝╚═╝  ╚═╝   ╚═╝   ╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝   ╚═╝    ╚═════╝ ╚═╝  ╚═╝
+"""
+
+LOGO_SMALL = """
+ ██████╗ ██████╗ ██████╗ ███████╗██████╗  █████╗ ███████╗███████╗
+██╔════╝██╔═══██╗██╔══██╗██╔════╝██╔══██╗██╔══██╗██╔════╝██╔════╝
+██║     ██║   ██║██║  ██║█████╗  ██████╔╝███████║███████╗█████╗
+██║     ██║   ██║██║  ██║██╔══╝  ██╔══██╗██╔══██║╚════██║██╔══╝
+╚██████╗╚██████╔╝██████╔╝███████╗██████╔╝██║  ██║███████║███████╗
+ ╚═════╝  ╚═════╝ ╚═════╝ ╚══════╝╚═════╝ ╚═╝  ╚═╝╚══════╝╚══════╝
+
+███████╗██╗  ██╗████████╗██████╗  █████╗  ██████╗████████╗ ██████╗ ██████╗
+██╔════╝╚██╗██╔╝╚══██╔══╝██╔══██╗██╔══██╗██╔════╝╚══██╔══╝██╔═══██╗██╔══██╗
+█████╗   ╚███╔╝    ██║   ██████╔╝███████║██║        ██║   ██║   ██║██████╔╝
+██╔══╝   ██╔██╗    ██║   ██╔══██╗██╔══██║██║        ██║   ██║   ██║██╔══██╗
+███████╗██╔╝ ██╗   ██║   ██║  ██║██║  ██║╚██████╗   ██║   ╚██████╔╝██║  ██║
+╚══════╝╚═╝  ╚═╝   ╚═╝   ╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝   ╚═╝    ╚═════╝ ╚═╝  ╚═╝
+"""
+
+def clear_screen():
+    """Clears the terminal screen."""
+    os.system('cls' if os.name == 'nt' else 'clear')
+
+def print_banner():
+    """Prints a banner that adjusts to the terminal width."""
+    try:
+        width = shutil.get_terminal_size((80, 20)).columns
+    except OSError:
+        width = 80
+
+    if width > config.LOGO_BREAKPOINT:
+        print(LOGO_LARGE)
+    else:
+        print(LOGO_SMALL)
+
+    print(colored(f" Welcome to Code Extractor v{config.SCRIPT_VERSION} by Lukasz Lekowski ".center(width, "="), "white", "on_magenta"))
+
+def show_instructions():
+    """Clears screen and shows detailed instructions, pausing for user input."""
+    try:
+        width = shutil.get_terminal_size((80, 20)).columns
+    except OSError:
+        width = 80
+
+    input(colored("\nPress Enter to view detailed instructions...", "dark_grey"))
+    clear_screen()
+    print(colored("--- How It Works ---", "yellow"))
+    print("The script will guide you through a series of steps:\n")
+
+    print(colored("Step 1: General Settings", "cyan"))
+    print("You will first be asked about basic settings, such as whether to exclude files larger than 1MB to keep the output clean.\n")
+
+    print(colored("Step 2: Extraction Mode", "cyan"))
+    print("You have two main modes to choose from:")
+    print(" - 'Everything': This automatically finds and processes every valid folder and all root files. You will get one Markdown file for each top-level folder, plus one for the root files.")
+    print(" - 'Specific folders/root': This lets you hand-pick exactly what you want to extract.\n")
+
+    print(colored("Step 3: Detailed Selection (If you chose 'Specific')", "cyan"))
+    print("If you choose to be specific, you'll be presented with more options:")
+    print(" - Scan Depth: First, you'll decide how many sub-folder levels to scan and display.")
+    print(" - Selection Tree: You'll see a tree-like list of your project's folders. The script handles parent/child selections intelligently:")
+    print(" - If you select a parent folder, all of its sub-folders are automatically included. You don't need to check them individually.")
+    print(" - To get a file for *only* a sub-folder, select the sub-folder but *not* its parent.")
+    print(" - The 'root [...]' option specifically extracts *only* the files in your project's main directory.\n")
+
+    print(colored("--- Output Details ---", "yellow"))
+    print(f"All extracted content is saved into the '{config.OUTPUT_DIR_NAME}' directory. Each Markdown file generated will contain a YAML metadata header at the top with a unique reference ID, a timestamp, and more.\n")
+
+    tip = "TIP: Run this script with the --no-instructions or -ni flag to skip this guide next time."
+    print(colored(tip, "black", "on_yellow"))
+
+    input(colored("\nReady? Press Enter to begin...", "green"))
+    clear_screen()
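`print_banner()` chooses between the two logos by comparing the terminal width with `config.LOGO_BREAKPOINT`. `show_instructions()` repeats the same width lookup. Below is a minimal sketch of that decision, split into small testable functions. The function names are hypothetical. The breakpoint is passed in as a parameter because its configured value is not shown in this diff, and the `150` used in the demo is an arbitrary example.

```python
import shutil


def terminal_width(default=80):
    """Current terminal width in columns, or `default` if it cannot be determined."""
    try:
        return shutil.get_terminal_size((default, 20)).columns
    except OSError:  # mirrors ui.py; get_terminal_size() normally handles this itself
        return default


def pick_logo(width, breakpoint_cols, large, small):
    """Return the large logo only when the terminal is strictly wider than breakpoint_cols."""
    return large if width > breakpoint_cols else small


def welcome_line(version, width):
    """Pad the welcome text with '=' on both sides so it fills `width` columns."""
    return f" Welcome to Code Extractor v{version} ".center(width, "=")


if __name__ == "__main__":
    cols = terminal_width()
    print(pick_logo(cols, 150, "<large logo>", "<small logo>"))  # 150 is an example value
    print(welcome_line("1.0.0", cols))
```

In current CPython, `shutil.get_terminal_size()` already returns the fallback size when the terminal cannot be queried, for example when output is piped. The `except OSError` branch in `ui.py` is therefore a defensive extra rather than the main fallback path.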
@@ -0,0 +1,143 @@
+Metadata-Version: 2.4
+Name: codebase-extractor
+Version: 1.0.0
+Summary: A CLI tool to extract project source code into structured Markdown files for LLM & AI context.
+Author: Lukasz Lekowski
+Project-URL: Homepage, https://github.com/lukaszlekowski/codebase-extractor
+Project-URL: Bug Tracker, https://github.com/lukaszlekowski/codebase-extractor/issues
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Topic :: Software Development :: Documentation
+Classifier: Topic :: Utilities
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENCE
+Requires-Dist: questionary
+Requires-Dist: halo
+Requires-Dist: termcolor
+Dynamic: license-file
+
+# Codebase Extractor
+
+<p align="center">
+<strong>A user-friendly CLI tool to extract project source code into structured Markdown files.</strong>
+</p>
+
+<p align="center">
+<img src="https://img.shields.io/badge/License-MIT%20(Modified)-yellow.svg" alt="License: MIT (Modified)">
+<img src="https://img.shields.io/badge/python-3.9%2B-blue.svg" alt="Python Version">
+<img src="https://img.shields.io/github/stars/lukaszlekowski/codebase-extractor?style=social" alt="GitHub Stars">
+</p>
+<p align="center">
+💡 <b>Love this tool?</b> Found a bug or have an idea? Share it on <a href="https://github.com/lukaszlekowski/codebase-extractor">GitHub</a>! <br>
+🤝 <b>Connect with me</b> on <a href="https://www.linkedin.com/in/lukasz-lekowski">LinkedIn</a>. <br>
+☕ <b>Enjoying it?</b> Support development with a <a href="https://www.buymeacoffee.com/lukaszlekowski">coffee</a>!
+</p>
+
+</p>
+
+---
+
+## Overview
+
+Codebase Extractor is a command-line interface (CLI) tool designed to scan a project directory and consolidate all relevant source code into neatly organized Markdown files. It's perfect for creating a complete project snapshot for analysis, documentation, or providing context to Large Language Models (LLMs) like GPT-4, Gemini, or Claude.
+
+The tool is highly configurable, allowing you to select specific folders, exclude large files, and intelligently ignore common directories like `node_modules` and `.git`.
+
+---
+
+## ✨ Key Features
+
+- **Interactive & User-Friendly:** A guided, multi-step CLI experience that makes selecting options simple and clear.
+- **Smart Filtering:** Automatically excludes common dependency folders, build artifacts, version control directories, and IDE configuration files.
+- **Flexible Selection Modes:** Choose to extract the entire project with one command, or dive into a specific selection mode.
+- **🌳 Nested Folder Selection:** Interactively browse and select specific sub-folders from a tree-like view.
+- **🔢 Configurable Scan Depth:** You decide how many levels deep the script should look for folders when building the selection tree.
+- **YAML Metadata:** Each generated Markdown file is prepended with a YAML front matter block containing useful metadata like a unique run ID, timestamp, and file count for easy tracking and parsing.
+- **Quick Start Mode:** Use the `--no-instructions` flag to skip the detailed intro guide on subsequent runs.
+- **Safe & Robust:** Features graceful exit handling (`Ctrl+C`) and provides clear feedback during the extraction process.
+
+---
+
+## 📦 Installation
+
+To get started with Codebase Extractor, follow these steps.
+
+1. **Clone the Repository**
+
+   ```bash
+   git clone [https://github.com/lukaszlekowski/codebase-extractor.git](https://github.com/lukaszlekowski/codebase-extractor.git)
+   cd codebase-extractor
+   ```
+
+2. **Set Up a Virtual Environment** (Recommended)
+   This keeps the dependencies for this project isolated from your system's Python installation.
+
+   ```bash
+   # For macOS/Linux
+   python3 -m venv venv
+   source venv/bin/activate
+
+   # For Windows
+   python -m venv venv
+   venv\Scripts\activate
+   ```
+
+3. **Install Dependencies**
+   The required Python packages are listed in `requirements.txt`.
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+---
+
+## ▶️ Usage
+
+### Basic Usage
+
+Once installed, simply run the script from the root of the project you wish to extract:
+
+```bash
+python3 codebase_extractor.py
+```
+
+# Quick Start
+
+For repeat usage, you can skip the detailed introductory guide by using the `--no-instructions` or `-ni` flag:
+
+```bash
+python3 codebase_extractor.py --no-instructions
+```
+
+## The Process
+
+The tool will guide you through a series of prompts:
+
+- **Initial Setup [1/2]**: A yes/no question to skip files larger than 1MB.
+- **Extraction Mode [2/2]**: Choose whether to extract the entire project (`Everything`) or select specific folders.
+
+### Specific Selection (if chosen):
+
+- **Scan Depth**: You'll be asked how many sub-folder levels to scan for the selection list (defaults to 3).
+- **Folder Tree**: You'll see a checklist of available folders and sub-folders to extract. The script handles selections intelligently:
+  - Selecting a parent folder automatically includes all its sub-folders, so you don’t need to select them individually.
+  - To extract only a sub-folder’s contents, select the sub-folder but not its parent.
+  - The special `root [...]` option extracts only the files in your project's main directory, ignoring all sub-folders.
+
+## Output Details
+
+All output files are saved in a `CODEBASE_EXTRACTS` directory within your project folder. Each generated Markdown file includes a YAML metadata header with a unique reference ID, timestamp, and file count for easy tracking and parsing.
+
+## License
+
+This project is licensed under a modified MIT License. Please see the [LICENSE](LICENSE) file for the full text.
+
+The standard MIT License has been amended with a single, important attribution requirement:
+
+If you use, copy, or modify any part of this software, you must include a clear and visible attribution to the original author and project in your derivative work.
+
+This attribution must include:
+
+- A link back to this original GitHub repository: [https://github.com/lukaszlekowski/codebase-extractor](https://github.com/lukaszlekowski/codebase-extractor)
+- A link to the author's LinkedIn profile: [https://www.linkedin.com/in/lukasz-lekowski](https://www.linkedin.com/in/lukasz-lekowski)
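The README says every generated Markdown file starts with a YAML front matter block. `main_logic.py` passes `write_to_markdown_file()` a dictionary with the keys `run_ref`, `run_timestamp`, `folder_name` and `file_count`. How `file_handler.py` actually serialises that dictionary is not visible here, so the following is only a sketch of one dependency-free approach. Strings are written with `json.dumps`. YAML 1.2 aims to be a superset of JSON, so the result is a double-quoted scalar, and values containing characters such as `: ` or `#` survive intact. The helper name `front_matter` and the sample values are made up.

```python
import json


def front_matter(metadata):
    """Render a flat dict as a YAML front matter block delimited by '---' lines."""
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, str):
            # A JSON string literal doubles as a YAML double-quoted scalar.
            value = json.dumps(value)
        elif isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


# Sample values only; the keys match the metadata dict built in main_logic.py.
print(front_matter({
    "run_ref": "a1b2c3",
    "run_timestamp": "2024-01-01 12:00:00",
    "folder_name": "root [demo] (files in root folder only, excl. sub-folders)",
    "file_count": 3,
}))
```

Integers such as `file_count` fall through to the f-string unchanged. Booleans are handled explicitly because Python's `True` would otherwise be written with a capital letter.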
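The README and the tip in `show_instructions()` both mention a `--no-instructions` / `-ni` switch. The code that parses it is not part of this excerpt. Assuming a standard-library parser, a minimal `argparse` sketch that accepts both spellings could look like the following. `parse_args` is a hypothetical name. `argparse` accepts a multi-character single-dash option such as `-ni`, and it derives the attribute name `no_instructions` from the long option.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line flags; argv=None means sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Extract project source code into Markdown files."
    )
    # '-ni' is a multi-character single-dash option, which argparse supports.
    parser.add_argument(
        "-ni",
        "--no-instructions",
        action="store_true",
        help="skip the detailed introductory guide",
    )
    return parser.parse_args(argv)
```

With either spelling, `parse_args(...).no_instructions` is `True`. Without the flag it stays `False`, so a caller could skip `ui.show_instructions()` only when the flag is present.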
@@ -0,0 +1,14 @@
+LICENCE
+README.md
+pyproject.toml
+src/codebase_extractor/__init__.py
+src/codebase_extractor/config.py
+src/codebase_extractor/file_handler.py
+src/codebase_extractor/main_logic.py
+src/codebase_extractor/ui.py
+src/codebase_extractor.egg-info/PKG-INFO
+src/codebase_extractor.egg-info/SOURCES.txt
+src/codebase_extractor.egg-info/dependency_links.txt
+src/codebase_extractor.egg-info/entry_points.txt
+src/codebase_extractor.egg-info/requires.txt
+src/codebase_extractor.egg-info/top_level.txt
@@ -0,0 +1 @@
+
@@ -0,0 +1 @@
+codebase_extractor