codebase-extractor 1.0.0__tar.gz

@@ -0,0 +1,11 @@
1
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
2
+
3
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
4
+
5
+ If you use, copy, or modify any part of this software, you must include a clear and visible attribution to the original author and project in your derivative work. This attribution must include:
6
+
7
+ A link back to the original GitHub repository: https://github.com/lukaszlekowski/codebase-extractor
8
+
9
+ A link to the author's LinkedIn profile: https://www.linkedin.com/in/lukasz-lekowski
10
+
11
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,143 @@
1
+ Metadata-Version: 2.4
2
+ Name: codebase-extractor
3
+ Version: 1.0.0
4
+ Summary: A CLI tool to extract project source code into structured Markdown files for LLM & AI context.
5
+ Author: Lukasz Lekowski
6
+ Project-URL: Homepage, https://github.com/lukaszlekowski/codebase-extractor
7
+ Project-URL: Bug Tracker, https://github.com/lukaszlekowski/codebase-extractor/issues
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Operating System :: OS Independent
11
+ Classifier: Topic :: Software Development :: Documentation
12
+ Classifier: Topic :: Utilities
13
+ Requires-Python: >=3.9
14
+ Description-Content-Type: text/markdown
15
+ License-File: LICENCE
16
+ Requires-Dist: questionary
17
+ Requires-Dist: halo
18
+ Requires-Dist: termcolor
19
+ Dynamic: license-file
20
+
21
+ # Codebase Extractor
22
+
23
+ <p align="center">
24
+ <strong>A user-friendly CLI tool to extract project source code into structured Markdown files.</strong>
25
+ </p>
26
+
27
+ <p align="center">
28
+ <img src="https://img.shields.io/badge/License-MIT%20(Modified)-yellow.svg" alt="License: MIT (Modified)">
29
+ <img src="https://img.shields.io/badge/python-3.9%2B-blue.svg" alt="Python Version">
30
+ <img src="https://img.shields.io/github/stars/lukaszlekowski/codebase-extractor?style=social" alt="GitHub Stars">
31
+ </p>
32
+ <p align="center">
33
+ 💡 <b>Love this tool?</b> Found a bug or have an idea? Share it on <a href="https://github.com/lukaszlekowski/codebase-extractor">GitHub</a>! <br>
34
+ 🤝 <b>Connect with me</b> on <a href="https://www.linkedin.com/in/lukasz-lekowski">LinkedIn</a>. <br>
35
+ ☕ <b>Enjoying it?</b> Support development with a <a href="https://www.buymeacoffee.com/lukaszlekowski">coffee</a>!
36
+ </p>
37
+
38
+ </p>
39
+
40
+ ---
41
+
42
+ ## 🚀 Overview
43
+
44
+ Codebase Extractor is a command-line interface (CLI) tool designed to scan a project directory and consolidate all relevant source code into neatly organized Markdown files. It's perfect for creating a complete project snapshot for analysis, documentation, or providing context to Large Language Models (LLMs) like GPT-4, Gemini, or Claude.
45
+
46
+ The tool is highly configurable, allowing you to select specific folders, exclude large files, and intelligently ignore common directories like `node_modules` and `.git`.
47
+
48
+ ---
49
+
50
+ ## ✨ Key Features
51
+
52
+ - **Interactive & User-Friendly:** A guided, multi-step CLI experience that makes selecting options simple and clear.
53
+ - **Smart Filtering:** Automatically excludes common dependency folders, build artifacts, version control directories, and IDE configuration files.
54
+ - **Flexible Selection Modes:** Choose to extract the entire project with one command, or dive into a specific selection mode.
55
+ - **🌳 Nested Folder Selection:** Interactively browse and select specific sub-folders from a tree-like view.
56
+ - **🔢 Configurable Scan Depth:** You decide how many levels deep the script should look for folders when building the selection tree.
57
+ - **YAML Metadata:** Each generated Markdown file is prepended with a YAML front matter block containing useful metadata like a unique run ID, timestamp, and file count for easy tracking and parsing.
58
+ - **🚀 Quick Start Mode:** Use the `--no-instructions` flag to skip the detailed intro guide on subsequent runs.
59
+ - **Safe & Robust:** Features graceful exit handling (`Ctrl+C`) and provides clear feedback during the extraction process.
60
+
61
+ ---
62
+
63
+ ## 📦 Installation
64
+
65
+ To get started with Codebase Extractor, follow these steps.
66
+
67
+ 1. **Clone the Repository**
68
+
69
+ ```bash
70
+ git clone https://github.com/lukaszlekowski/codebase-extractor.git
71
+ cd codebase-extractor
72
+ ```
73
+
74
+ 2. **Set Up a Virtual Environment** (Recommended)
75
+ This keeps the dependencies for this project isolated from your system's Python installation.
76
+
77
+ ```bash
78
+ # For macOS/Linux
79
+ python3 -m venv venv
80
+ source venv/bin/activate
81
+
82
+ # For Windows
83
+ python -m venv venv
84
+ venv\Scripts\activate
85
+ ```
86
+
87
+ 3. **Install Dependencies**
88
+ The required Python packages are listed in `requirements.txt`.
89
+ ```bash
90
+ pip install -r requirements.txt
91
+ ```
92
+
93
+ ---
94
+
95
+ ## ▶️ Usage
96
+
97
+ ### Basic Usage
98
+
99
+ Once installed, simply run the script from the root of the project you wish to extract:
100
+
101
+ ```bash
102
+ python3 codebase_extractor.py
103
+ ```
104
+
105
+ ### Quick Start
106
+
107
+ For repeat usage, you can skip the detailed introductory guide by using the `--no-instructions` or `-ni` flag:
108
+
109
+ ```bash
110
+ python3 codebase_extractor.py --no-instructions
111
+ ```
112
+
113
+ ## The Process
114
+
115
+ The tool will guide you through a series of prompts:
116
+
117
+ - **Initial Setup [1/2]**: A yes/no question to skip files larger than 1MB.
118
+ - **Extraction Mode [2/2]**: Choose whether to extract the entire project (`Everything`) or select specific folders.
119
+
120
+ ### Specific Selection (if chosen):
121
+
122
+ - **Scan Depth**: You'll be asked how many sub-folder levels to scan for the selection list (defaults to 3).
123
+ - **Folder Tree**: You'll see a checklist of available folders and sub-folders to extract. The script handles selections intelligently:
124
+ - Selecting a parent folder automatically includes all its sub-folders, so you don't need to select them individually.
125
+ - To extract only a sub-folder's contents, select the sub-folder but not its parent.
126
+ - The special `root [...]` option extracts only the files in your project's main directory, ignoring all sub-folders.
127
+
128
+ ## Output Details
129
+
130
+ All output files are saved in a `CODEBASE_EXTRACTS` directory within your project folder. Each generated Markdown file includes a YAML metadata header with a unique reference ID, timestamp, and file count for easy tracking and parsing.
131
+
132
+ ## 📜 License
133
+
134
+ This project is licensed under a modified MIT License. Please see the [LICENSE](LICENSE) file for the full text.
135
+
136
+ The standard MIT License has been amended with a single, important attribution requirement:
137
+
138
+ If you use, copy, or modify any part of this software, you must include a clear and visible attribution to the original author and project in your derivative work.
139
+
140
+ This attribution must include:
141
+
142
+ - A link back to this original GitHub repository: [https://github.com/lukaszlekowski/codebase-extractor](https://github.com/lukaszlekowski/codebase-extractor)
143
+ - A link to the author's LinkedIn profile: [https://www.linkedin.com/in/lukasz-lekowski](https://www.linkedin.com/in/lukasz-lekowski)
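Reviewer note: the "Folder Tree" rules in the README above (a selected parent already covers its sub-folders) can be sketched with `pathlib` alone. This is a minimal illustration, assuming selections arrive as relative paths; `collapse_selection` is a hypothetical helper name and the sample folders are invented.

```python
from pathlib import PurePosixPath


def collapse_selection(selected):
    """Drop any path that already sits under another selected path."""
    kept = []
    # Shorter paths first, so parents are seen before their children.
    for path in sorted(selected, key=lambda p: len(p.parts)):
        if not any(path.is_relative_to(parent) for parent in kept):
            kept.append(path)
    return kept


# Hypothetical selection: "src/api" is redundant because "src" is selected too.
picked = [PurePosixPath("src/api"), PurePosixPath("src"), PurePosixPath("docs/guides")]
print(collapse_selection(picked))
```

Sorting by depth first is what lets a single pass work: by the time a sub-folder is examined, any selected ancestor is already in `kept`. `is_relative_to` needs Python 3.9 or newer, which matches the `Requires-Python` field in this metadata.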
@@ -0,0 +1,123 @@
1
+ # Codebase Extractor
2
+
3
+ <p align="center">
4
+ <strong>A user-friendly CLI tool to extract project source code into structured Markdown files.</strong>
5
+ </p>
6
+
7
+ <p align="center">
8
+ <img src="https://img.shields.io/badge/License-MIT%20(Modified)-yellow.svg" alt="License: MIT (Modified)">
9
+ <img src="https://img.shields.io/badge/python-3.9%2B-blue.svg" alt="Python Version">
10
+ <img src="https://img.shields.io/github/stars/lukaszlekowski/codebase-extractor?style=social" alt="GitHub Stars">
11
+ </p>
12
+ <p align="center">
13
+ 💡 <b>Love this tool?</b> Found a bug or have an idea? Share it on <a href="https://github.com/lukaszlekowski/codebase-extractor">GitHub</a>! <br>
14
+ 🤝 <b>Connect with me</b> on <a href="https://www.linkedin.com/in/lukasz-lekowski">LinkedIn</a>. <br>
15
+ ☕ <b>Enjoying it?</b> Support development with a <a href="https://www.buymeacoffee.com/lukaszlekowski">coffee</a>!
16
+ </p>
17
+
18
+ </p>
19
+
20
+ ---
21
+
22
+ ## 🚀 Overview
23
+
24
+ Codebase Extractor is a command-line interface (CLI) tool designed to scan a project directory and consolidate all relevant source code into neatly organized Markdown files. It's perfect for creating a complete project snapshot for analysis, documentation, or providing context to Large Language Models (LLMs) like GPT-4, Gemini, or Claude.
25
+
26
+ The tool is highly configurable, allowing you to select specific folders, exclude large files, and intelligently ignore common directories like `node_modules` and `.git`.
27
+
28
+ ---
29
+
30
+ ## ✨ Key Features
31
+
32
+ - **Interactive & User-Friendly:** A guided, multi-step CLI experience that makes selecting options simple and clear.
33
+ - **Smart Filtering:** Automatically excludes common dependency folders, build artifacts, version control directories, and IDE configuration files.
34
+ - **Flexible Selection Modes:** Choose to extract the entire project with one command, or dive into a specific selection mode.
35
+ - **🌳 Nested Folder Selection:** Interactively browse and select specific sub-folders from a tree-like view.
36
+ - **🔢 Configurable Scan Depth:** You decide how many levels deep the script should look for folders when building the selection tree.
37
+ - **YAML Metadata:** Each generated Markdown file is prepended with a YAML front matter block containing useful metadata like a unique run ID, timestamp, and file count for easy tracking and parsing.
38
+ - **🚀 Quick Start Mode:** Use the `--no-instructions` flag to skip the detailed intro guide on subsequent runs.
39
+ - **Safe & Robust:** Features graceful exit handling (`Ctrl+C`) and provides clear feedback during the extraction process.
40
+
41
+ ---
42
+
43
+ ## 📦 Installation
44
+
45
+ To get started with Codebase Extractor, follow these steps.
46
+
47
+ 1. **Clone the Repository**
48
+
49
+ ```bash
50
+ git clone https://github.com/lukaszlekowski/codebase-extractor.git
51
+ cd codebase-extractor
52
+ ```
53
+
54
+ 2. **Set Up a Virtual Environment** (Recommended)
55
+ This keeps the dependencies for this project isolated from your system's Python installation.
56
+
57
+ ```bash
58
+ # For macOS/Linux
59
+ python3 -m venv venv
60
+ source venv/bin/activate
61
+
62
+ # For Windows
63
+ python -m venv venv
64
+ venv\Scripts\activate
65
+ ```
66
+
67
+ 3. **Install Dependencies**
68
+ The required Python packages are listed in `requirements.txt`.
69
+ ```bash
70
+ pip install -r requirements.txt
71
+ ```
72
+
73
+ ---
74
+
75
+ ## ▶️ Usage
76
+
77
+ ### Basic Usage
78
+
79
+ Once installed, simply run the script from the root of the project you wish to extract:
80
+
81
+ ```bash
82
+ python3 codebase_extractor.py
83
+ ```
84
+
85
+ ### Quick Start
86
+
87
+ For repeat usage, you can skip the detailed introductory guide by using the `--no-instructions` or `-ni` flag:
88
+
89
+ ```bash
90
+ python3 codebase_extractor.py --no-instructions
91
+ ```
92
+
93
+ ## The Process
94
+
95
+ The tool will guide you through a series of prompts:
96
+
97
+ - **Initial Setup [1/2]**: A yes/no question to skip files larger than 1MB.
98
+ - **Extraction Mode [2/2]**: Choose whether to extract the entire project (`Everything`) or select specific folders.
99
+
100
+ ### Specific Selection (if chosen):
101
+
102
+ - **Scan Depth**: You'll be asked how many sub-folder levels to scan for the selection list (defaults to 3).
103
+ - **Folder Tree**: You'll see a checklist of available folders and sub-folders to extract. The script handles selections intelligently:
104
+ - Selecting a parent folder automatically includes all its sub-folders, so you don't need to select them individually.
105
+ - To extract only a sub-folder's contents, select the sub-folder but not its parent.
106
+ - The special `root [...]` option extracts only the files in your project's main directory, ignoring all sub-folders.
107
+
108
+ ## Output Details
109
+
110
+ All output files are saved in a `CODEBASE_EXTRACTS` directory within your project folder. Each generated Markdown file includes a YAML metadata header with a unique reference ID, timestamp, and file count for easy tracking and parsing.
111
+
112
+ ## 📜 License
113
+
114
+ This project is licensed under a modified MIT License. Please see the [LICENSE](LICENSE) file for the full text.
115
+
116
+ The standard MIT License has been amended with a single, important attribution requirement:
117
+
118
+ If you use, copy, or modify any part of this software, you must include a clear and visible attribution to the original author and project in your derivative work.
119
+
120
+ This attribution must include:
121
+
122
+ - A link back to this original GitHub repository: [https://github.com/lukaszlekowski/codebase-extractor](https://github.com/lukaszlekowski/codebase-extractor)
123
+ - A link to the author's LinkedIn profile: [https://www.linkedin.com/in/lukasz-lekowski](https://www.linkedin.com/in/lukasz-lekowski)
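Reviewer note: the "Output Details" section of the README above says each generated file starts with a YAML metadata header. A rough sketch of reading that header back with the standard library only follows. The key names mirror the header template in `file_handler.py`; the nesting, the sample values, and the helper name `read_front_matter` are assumptions made for illustration.

```python
def read_front_matter(markdown_text):
    """Return the key/value pairs found between the leading '---' lines."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.strip().partition(":")
        # Keep only "key: value" rows; section rows such as "tool_details:" carry no value.
        if sep and value.strip():
            fields[key.strip()] = value.strip().strip('"')
    return fields


# Invented sample in the shape the tool is described as producing.
sample = """---
extraction_details:
  reference: 1234
  timestamp_utc: "2025-01-01T00:00:00+00:00"
  file_count: 7
---

# Folder: src
"""
print(read_front_matter(sample))
```

This flattens the nested sections and returns every value as a string, which is enough for quick filtering by run reference. A real YAML parser would be the more robust choice for anything beyond that.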
@@ -0,0 +1,34 @@
1
+ [build-system]
2
+ requires = ["setuptools"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "codebase-extractor"
7
+ version = "1.0.0"
8
+ authors = [
9
+ { name="Lukasz Lekowski" },
10
+ ]
11
+ description = "A CLI tool to extract project source code into structured Markdown files for LLM & AI context."
12
+ readme = "README.md"
13
+ license = { file="LICENSE" }
14
+ requires-python = ">=3.9"
15
+ classifiers = [
16
+ "Programming Language :: Python :: 3",
17
+ "License :: OSI Approved :: MIT License",
18
+ "Operating System :: OS Independent",
19
+ "Topic :: Software Development :: Documentation",
20
+ "Topic :: Utilities",
21
+ ]
22
+ dependencies = [
23
+ "questionary",
24
+ "halo",
25
+ "termcolor",
26
+ ]
27
+
28
+ # This creates the `code-extractor` command in the user's terminal
29
+ [project.scripts]
30
+ code-extractor = "codebase_extractor.main_logic:main"
31
+
32
+ [project.urls]
33
+ "Homepage" = "https://github.com/lukaszlekowski/codebase-extractor"
34
+ "Bug Tracker" = "https://github.com/lukaszlekowski/codebase-extractor/issues"
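Reviewer note: the `[project.scripts]` table above maps the `code-extractor` command to a `module:function` reference. The sketch below shows roughly how such a reference resolves to a callable. It is not the installer's actual code, `resolve_entry_point` is a hypothetical name, and a standard-library target is used so the example runs without this package installed.

```python
import importlib


def resolve_entry_point(reference):
    """Turn a 'package.module:callable' string into the callable itself."""
    module_name, _, attr_name = reference.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# With the package installed, the reference from pyproject.toml would be
# "codebase_extractor.main_logic:main"; a stdlib target keeps this runnable.
join = resolve_entry_point("posixpath:join")
print(join("CODEBASE_EXTRACTS", "src.md"))
```

The practical consequence for this release is that `main` must be importable from `codebase_extractor.main_logic` without side effects at import time, since the generated console script imports the module before calling the function.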
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1 @@
1
+ __version__ = "1.0.0"
@@ -0,0 +1,41 @@
1
+ from pathlib import Path
2
+ import sys
3
+
4
+ # --- CONFIGURATION ---
5
+ SCRIPT_VERSION = "1.0.0"
6
+ GITHUB_URL = "https://github.com/lukaszlekowski/codebase-extractor"
7
+ LINKEDIN_URL = "https://www.linkedin.com/in/lukasz-lekowski"
8
+ SCRIPT_FILENAME = Path(sys.argv[0]).name
9
+ OUTPUT_DIR_NAME = "CODEBASE_EXTRACTS"
10
+
11
+ # --- FILE/FOLDER LISTS ---
12
+ EXCLUDED_DIRS = {
13
+ "node_modules", "vendor", "__pycache__", "dist", "build", "target", ".next",
14
+ ".git", ".svn", ".hg", ".vscode", ".idea", "venv", ".venv",
15
+ OUTPUT_DIR_NAME,
16
+ }
17
+ EXCLUDED_FILENAMES = {
18
+ "package-lock.json", "yarn.lock", "composer.lock", ".env"
19
+ }
20
+ ALLOWED_FILENAMES = {
21
+ "dockerfile", ".gitignore", ".htaccess", "makefile"
22
+ }
23
+ ALLOWED_EXTENSIONS = {
24
+ ".php", ".html", ".css", ".js", ".jsx", ".ts", ".tsx", ".vue", ".svelte",
25
+ ".py", ".rb", ".java", ".c", ".cpp", ".cs", ".go", ".rs", ".json", ".xml",
26
+ ".yaml", ".yml", ".toml", ".ini", ".conf", ".md", ".txt", ".rst", ".twig",
27
+ ".blade", ".handlebars", ".mustache", ".ejs", ".sql", ".graphql", ".gql", ".tf",
28
+ }
29
+
30
+ # --- MAPPINGS & CONSTANTS ---
31
+ EXTENSION_LANG_MAP = {
32
+ ".js": "javascript", ".ts": "typescript", ".tsx": "tsx", ".py": "python",
33
+ ".html": "html", ".css": "css", ".json": "json", ".md": "markdown", ".txt": "",
34
+ ".sh": "bash", ".yml": "yaml", ".yaml": "yaml", ".php": "php", ".rb": "ruby",
35
+ ".java": "java", ".c": "c", ".cpp": "cpp", ".cs": "csharp", ".go": "go",
36
+ ".rs": "rust", ".vue": "vue", ".svelte": "svelte", ".sql": "sql",
37
+ ".graphql": "graphql", ".gql": "graphql",
38
+ }
39
+ MAX_FILE_SIZE_MB = 1
40
+ FILE_COUNT_WARNING_THRESHOLD = 1000
41
+ LOGO_BREAKPOINT = 144
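Reviewer note: the sets in `config.py` above drive a name-based filter in which the allow-list wins, then the deny-list, then the extension check. A minimal sketch of that ordering follows, using trimmed-down copies of the sets; `passes_name_filter` is a hypothetical helper and the size check is left out.

```python
from pathlib import PurePath

# Trimmed-down copies of the sets above, for illustration only.
EXCLUDED_FILENAMES = {"package-lock.json", ".env"}
ALLOWED_FILENAMES = {"dockerfile", "makefile"}
ALLOWED_EXTENSIONS = {".py", ".md", ".toml"}


def passes_name_filter(path):
    """Apply the name-based rules: allow-list, then deny-list, then extension."""
    name = path.name.lower()
    if name in ALLOWED_FILENAMES:
        return True
    if name in EXCLUDED_FILENAMES:
        return False
    return path.suffix in ALLOWED_EXTENSIONS


for candidate in ("Dockerfile", ".env", "src/app.py", "logo.png"):
    print(candidate, passes_name_filter(PurePath(candidate)))
```

Extension-less files such as `Dockerfile` only get through because of the allow-list. Dotfiles such as `.env` have an empty `suffix` in `pathlib`, so they would fail the extension check even without the deny-list entry.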
@@ -0,0 +1,133 @@
+ import os
+ import datetime
+ from pathlib import Path
+ from termcolor import colored
+ from . import config
+ import questionary
+
+ def get_folder_choices(root_path: Path, max_depth: int) -> list:
+     """Recursively finds folders up to a max depth and prepares them for questionary."""
+     choices = []
+
+     def scanner(current_path: Path, depth: int):
+         if depth > max_depth:
+             return
+
+         relative_path = current_path.relative_to(root_path)
+         prefix = " " * (depth - 1)
+         display_name = f"{prefix}{current_path.name}"
+         choices.append(questionary.Choice(title=display_name, value=relative_path))
+
+         try:
+             subdirs = sorted([p for p in current_path.iterdir() if p.is_dir() and p.name not in config.EXCLUDED_DIRS])
+             for subdir in subdirs:
+                 scanner(subdir, depth + 1)
+         except PermissionError:
+             pass
+
+     top_level_folders = sorted([p for p in root_path.iterdir() if p.is_dir() and p.name not in config.EXCLUDED_DIRS])
+     for folder in top_level_folders:
+         scanner(folder, 1)
+
+     root_option_name = f"root [{root_path.name}] (files in root folder only, excl. sub-folders)"
+     choices.insert(0, questionary.Choice(title=root_option_name, value="ROOT_SENTINEL"))
+
+     return choices
+
+
+ def is_allowed_file(path: Path, exclude_large: bool) -> bool:
+     """Checks if a file should be included based on its name, extension, and size."""
+     if path.name == config.SCRIPT_FILENAME:
+         return False
+     if path.name.lower() in config.ALLOWED_FILENAMES:
+         return True
+     if not path.is_file():
+         return False
+     if path.name.lower() in config.EXCLUDED_FILENAMES:
+         return False
+     if path.suffix not in config.ALLOWED_EXTENSIONS:
+         return False
+     if exclude_large and path.stat().st_size > config.MAX_FILE_SIZE_MB * 1024 * 1024:
+         return False
+     return True
+
+
+ def extract_code_from_folder(folder: Path, exclude_large: bool) -> tuple[str, int]:
+     """Extracts code from a given folder, respecting EXCLUDED_DIRS at all depths."""
+     content = f"# Folder: {folder.relative_to(Path.cwd())}\n\n"
+     extracted_files = 0
+     dirs_to_visit = [folder]
+     while dirs_to_visit:
+         current_dir = dirs_to_visit.pop(0)  # breadth-first: sub-folders are queued for later
+         for item in sorted(current_dir.iterdir()):
+             if item.is_dir() and item.name not in config.EXCLUDED_DIRS:
+                 dirs_to_visit.append(item)
+             elif item.is_file() and is_allowed_file(item, exclude_large):
+                 try:
+                     rel_path = item.relative_to(Path.cwd())
+                     ext = item.suffix
+                     lang = config.EXTENSION_LANG_MAP.get(ext, "")
+                     text = item.read_text(errors="ignore")  # read first so a failed read cannot leave an open fence
+                     content += f"## {rel_path}\n\n```{lang}\n"
+                     content += text
+                     content += "\n```\n\n"
+                     extracted_files += 1
+                 except Exception:  # unreadable files are skipped
+                     content += "\n\n"
+     if extracted_files > config.FILE_COUNT_WARNING_THRESHOLD:
+         print(colored(f"> Caution: Large file count in '{folder.name}' ({extracted_files} files).", "yellow"))
+     return content, extracted_files
+
+
+ def extract_code_from_root(root_path: Path, exclude_large: bool) -> tuple[str, int]:
+     """Extracts code only from files present in the root directory."""
+     content = f"# Root Files: {root_path.name}\n\n"
+     extracted_files = 0
+     for filepath in sorted(root_path.iterdir()):
+         if filepath.is_file() and is_allowed_file(filepath, exclude_large):
+             ext = filepath.suffix
+             lang = config.EXTENSION_LANG_MAP.get(ext, "")
+             content += f"## {filepath.name}\n\n```{lang}\n"
+             content += filepath.read_text(errors="ignore")
+             content += "\n```\n\n"
+             extracted_files += 1
+     if extracted_files > config.FILE_COUNT_WARNING_THRESHOLD:
+         print(colored(f"> Caution: Large file count in root ({extracted_files} files).", "yellow"))
+     return content, extracted_files
+
+
+ def write_to_markdown_file(content: str, metadata: dict, root_path: Path):
+     """Constructs a YAML header and writes the full content to a timestamped Markdown file."""
+     output_dir = Path(config.OUTPUT_DIR_NAME)
+     output_dir.mkdir(exist_ok=True)
+
+     timestamp = datetime.datetime.fromisoformat(metadata['run_timestamp']).strftime("%Y%m%d_%H%M%S")
+     output_name = str(metadata['folder_name'])
+
+     if output_name.startswith(f"root [{root_path.name}]"):
+         file_base_name = f"root_{root_path.name}"
+     else:
+         file_base_name = str(output_name).replace(os.sep, "_")
+
+     filename = f"{file_base_name}_{timestamp}.md"
+     full_filepath = output_dir / filename
+
+     yaml_header = f"""---
+ extraction_details:
+   reference: {metadata['run_ref']}
+   timestamp_utc: "{metadata['run_timestamp']}"
+   source_folder: "{metadata['folder_name']}"
+   file_count: {metadata['file_count']}
+ tool_details:
+   name: "Codebase Extractor"
+   version: "{config.SCRIPT_VERSION}"
+   source: "{config.GITHUB_URL}"
+ ---
+
+ """
+     full_content = yaml_header + content
+     with open(full_filepath, "w", encoding="utf-8") as f:
+         f.write(full_content)
+
+     print(f"\n💾 Saved to {colored(str(full_filepath), 'cyan')}")
+     return str(full_filepath)
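Reviewer note: the depth-limited scan in `get_folder_choices` above is easier to follow without the `questionary` dependency. The sketch below reproduces only the traversal, assuming a small illustrative skip set; `list_folders` and the temporary directory layout are hypothetical.

```python
import tempfile
from pathlib import Path

SKIP = {".git", "node_modules"}  # illustrative subset of the excluded names


def list_folders(root, max_depth):
    """Return (depth, relative path) pairs for folders up to max_depth levels down."""
    found = []

    def walk(current, depth):
        if depth > max_depth:
            return
        found.append((depth, current.relative_to(root)))
        for child in sorted(p for p in current.iterdir() if p.is_dir() and p.name not in SKIP):
            walk(child, depth + 1)

    for top in sorted(p for p in root.iterdir() if p.is_dir() and p.name not in SKIP):
        walk(top, 1)
    return found


# Throwaway tree: src/api/v1 plus an excluded node_modules folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src" / "api" / "v1").mkdir(parents=True)
    (root / "node_modules" / "pkg").mkdir(parents=True)
    for depth, rel in list_folders(root, max_depth=2):
        print("  " * (depth - 1) + rel.name)
```

With `max_depth=2` the `v1` folder is not listed, and `node_modules` is never entered. Depth counting starts at 1 for top-level folders, which matches the `scanner(folder, 1)` call in the diff.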
@@ -0,0 +1,197 @@
+ import os
+ import sys
+ import time
+ import datetime
+ import uuid
+ import shutil
+ import argparse
+ from pathlib import Path
+ from typing import List, Optional
+
+ import questionary
+ from halo import Halo
+ from termcolor import colored
+ from prompt_toolkit.styles import Style
+ from questionary import Validator, ValidationError
+
+ # Import from our new modules
+ from . import config
+ from . import ui
+ from . import file_handler
+
+ class NumberValidator(Validator):
+     """Validates that the input is a positive integer."""
+     def validate(self, document):
+         try:
+             value = int(document.text)
+             if value <= 0:
+                 raise ValidationError(
+                     message="Please enter a positive number.",
+                     cursor_position=len(document.text))
+         except ValueError:
+             raise ValidationError(
+                 message="Please enter a valid number.",
+                 cursor_position=len(document.text))
+
+ def main():
+     """Main function to run the CLI application."""
+     parser = argparse.ArgumentParser(add_help=False)
+     parser.add_argument(
+         '-ni', '--no-instructions',
+         action='store_true'
+     )
+     args, _ = parser.parse_known_args()
+
+     ui.clear_screen()
+     ui.print_banner()
+
+     if not args.no_instructions:
+         ui.show_instructions()
+     else:
+         input(colored("\nPress Enter to begin...", "green"))
+     ui.clear_screen()
+
+     root_path = Path.cwd()
+
+     select_style = Style([('qmark', 'fg:#FFA500'), ('pointer', 'fg:#FFA500'), ('highlighted', 'fg:black bg:#FFA500'), ('selected', 'fg:black bg:#FFA500')])
+     checkbox_style = Style([('qmark', 'fg:#FFA500'), ('pointer', 'fg:#FFA500'), ('highlighted', 'fg:#FFA500'), ('selected', 'fg:#FFA500'), ('checkbox-selected', 'fg:#FFA500')])
+
+     exit_message = colored("\nExtraction aborted by user. Closing Code Extractor. Goodbye.", "red")
+
+     print("=== Extraction Settings ===")
+     exclude_large = questionary.select("[1/2] -- Exclude files larger than 1MB?", choices=["yes", "no"], style=select_style, instruction=" ").ask()
+     if exclude_large is None:
+         print(exit_message)
+         return
+     exclude_large = exclude_large == "yes"
+     print()
+
+     folders_to_process = set()
+     process_root_files = False
+
+     run_ref = str(uuid.uuid4())
+     run_timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
+
+     selection_mode = questionary.select("[2/2] -- What do you want to extract?", choices=["Everything (all folders and root files)", "Specific folders/root from a list"], style=select_style, instruction=" ").ask()
+     if selection_mode is None:
+         print(exit_message)
+         return
+
+     if selection_mode == "Everything (all folders and root files)":
+         folders_to_process.update([p for p in root_path.iterdir() if p.is_dir() and p.name not in config.EXCLUDED_DIRS])
+         process_root_files = True
+     else:
+         depth_str = questionary.text(
+             "-- How many levels deep should we scan for folders?",
+             default="3",
+             validate=NumberValidator,
+             style=select_style
+         ).ask()
+         if depth_str is None:
+             print(exit_message)
+             return
+         scan_depth = int(depth_str)
+
+         folder_choices = file_handler.get_folder_choices(root_path, max_depth=scan_depth)
+         selected_options = None
+         confirm_exit = False
+
+         checkbox_instruction = "(Arrows to move, Space to select, A to toggle, I to invert)"
+
+         while not selected_options:
+             selection = questionary.checkbox(
+                 "-- Select folders/sub-folders to extract (must select at least one):",
+                 choices=folder_choices,
+                 style=checkbox_style,
+                 instruction=checkbox_instruction
+             ).ask()
+
+             if selection is None:
+                 if confirm_exit:
+                     print(exit_message)
+                     return
+                 else:
+                     confirm_exit = True
+                     print(colored("\n[!] Press Ctrl+C again to exit.", "yellow"))
+                     continue
+
+             confirm_exit = False
+
+             if not selection:
+                 print(colored("[!] Error: You must make a selection.", "red"))
+                 continue
+
+             selected_options = selection
+             break
+
+         if "ROOT_SENTINEL" in selected_options:
+             process_root_files = True
+             selected_options.remove("ROOT_SENTINEL")
+
+         selected_paths = [root_path / p for p in selected_options]
+         sorted_paths = sorted(selected_paths, key=lambda p: len(p.parts))
+
+         final_paths = set()
+         for path in sorted_paths:
+             if not any(path.is_relative_to(parent) for parent in final_paths):
+                 final_paths.add(path)
+
+         folders_to_process.update(final_paths)
+
+     print()
+     total_files_extracted = 0
+
+     for folder_path in sorted(list(folders_to_process)):
+         with Halo(text=f"Extracting {folder_path.relative_to(root_path)}...", spinner="dots"):
+             time.sleep(0.1)
+             folder_md, folder_count = file_handler.extract_code_from_folder(folder_path, exclude_large)
+         if folder_count > 0:
+             metadata = {"run_ref": run_ref, "run_timestamp": run_timestamp, "folder_name": str(folder_path.relative_to(root_path)), "file_count": folder_count}
+             file_handler.write_to_markdown_file(folder_md, metadata, root_path)
+             total_files_extracted += folder_count
+             print(f"✅ Extracted {folder_count} file(s) from: {folder_path.relative_to(root_path)}")
+         else:
+             print(f"[!] No extractable files in: {folder_path.relative_to(root_path)}")
+         print("")
+
+     if process_root_files:
+         root_display_name = f"root [{root_path.name}] (files in root folder only, excl. sub-folders)"
+         with Halo(text=f"Extracting {root_display_name}...", spinner="dots"):
+             time.sleep(0.1)
+             root_md, root_count = file_handler.extract_code_from_root(root_path, exclude_large)
+         if root_count > 0:
+             metadata = {"run_ref": run_ref, "run_timestamp": run_timestamp, "folder_name": root_display_name, "file_count": root_count}
+             file_handler.write_to_markdown_file(root_md, metadata, root_path)
+             total_files_extracted += root_count
+             print(f"✅ Extracted {root_count} file(s) from the root directory")
+         else:
+             print("[!] No extractable files in the root directory")
+         print("")
+
+     try:
+         width = shutil.get_terminal_size((80, 20)).columns
+     except OSError:
+         width = 80
+
+     if total_files_extracted > 0:
+         output_dir_path = Path(config.OUTPUT_DIR_NAME).resolve()
+         print(colored(f"Success! A total of {total_files_extracted} file(s) have been extracted.", "grey", "on_green"))
+         print(f"Files saved in: {colored(str(output_dir_path), 'cyan')}")
+     else:
+         print(colored("Extraction complete, but no files matched the criteria.", "yellow"))
+
+     print("\n")
+     print("-" * width)
+     print("💡 Love this tool? Found a bug? Share your feedback on GitHub:")
+     print(config.GITHUB_URL + "\n")
+     print("🤝 Connect with the author on LinkedIn:")
+     print(config.LINKEDIN_URL + "\n")
+     print("☕ Enjoying this tool? You can support its development with a coffee!")
+     print("https://www.buymeacoffee.com/lukaszlekowski\n")
+
+
+ if __name__ == "__main__":
+     try:
+         main()
+     except KeyboardInterrupt:
+         print(colored("\nExtraction aborted by user. Closing Code Extractor. Goodbye.", "red"))
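Reviewer note: `main()` above builds its parser with `add_help=False` and calls `parse_known_args`, so unrecognised arguments are ignored rather than treated as errors. The sketch below isolates that behaviour. `parse_flags` is a hypothetical wrapper that takes the argument list explicitly so it can be exercised without touching `sys.argv`.

```python
import argparse


def parse_flags(argv):
    """Parse the one optional flag and tolerate anything else on the command line."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-ni", "--no-instructions", action="store_true")
    # parse_known_args returns (namespace, leftover_args) instead of exiting on unknown input.
    args, unknown = parser.parse_known_args(argv)
    return args.no_instructions, unknown


print(parse_flags(["-ni"]))
print(parse_flags(["--verbose"]))
```

One side effect of this design is that a mistyped flag is silently dropped and the instructions are shown as usual. Because `add_help=False` is set, `-h` is also not handled by the parser and ends up in the leftover list.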
@@ -0,0 +1,84 @@
1
+ import os
2
+ import shutil
3
+ from . import config
4
+ from termcolor import colored
5
+
6
+ LOGO_LARGE = """
7
+  ██████╗ ██████╗ ██████╗ ███████╗██████╗  █████╗ ███████╗███████╗    ███████╗██╗  ██╗████████╗██████╗  █████╗  ██████╗████████╗ ██████╗ ██████╗
8
+ ██╔════╝██╔═══██╗██╔══██╗██╔════╝██╔══██╗██╔══██╗██╔════╝██╔════╝    ██╔════╝╚██╗██╔╝╚══██╔══╝██╔══██╗██╔══██╗██╔════╝╚══██╔══╝██╔═══██╗██╔══██╗
9
+ ██║     ██║   ██║██║  ██║█████╗  ██████╔╝███████║███████╗█████╗      █████╗   ╚███╔╝    ██║   ██████╔╝███████║██║        ██║   ██║   ██║██████╔╝
10
+ ██║     ██║   ██║██║  ██║██╔══╝  ██╔══██╗██╔══██║╚════██║██╔══╝      ██╔══╝   ██╔██╗    ██║   ██╔══██╗██╔══██║██║        ██║   ██║   ██║██╔══██╗
11
+ ╚██████╗╚██████╔╝██████╔╝███████╗██████╔╝██║  ██║███████║███████╗    ███████╗██╔╝ ██╗   ██║   ██║  ██║██║  ██║╚██████╗   ██║   ╚██████╔╝██║  ██║
12
+  ╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝╚═════╝ ╚═╝  ╚═╝╚══════╝╚══════╝    ╚══════╝╚═╝  ╚═╝   ╚═╝   ╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝   ╚═╝    ╚═════╝ ╚═╝  ╚═╝
13
+ """
14
+
15
+ LOGO_SMALL = """
16
+  ██████╗ ██████╗ ██████╗ ███████╗██████╗  █████╗ ███████╗███████╗
17
+ ██╔════╝██╔═══██╗██╔══██╗██╔════╝██╔══██╗██╔══██╗██╔════╝██╔════╝
18
+ ██║     ██║   ██║██║  ██║█████╗  ██████╔╝███████║███████╗█████╗
19
+ ██║     ██║   ██║██║  ██║██╔══╝  ██╔══██╗██╔══██║╚════██║██╔══╝
20
+ ╚██████╗╚██████╔╝██████╔╝███████╗██████╔╝██║  ██║███████║███████╗
21
+  ╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝╚═════╝ ╚═╝  ╚═╝╚══════╝╚══════╝
22
+
23
+ ███████╗██╗  ██╗████████╗██████╗  █████╗  ██████╗████████╗ ██████╗ ██████╗
24
+ ██╔════╝╚██╗██╔╝╚══██╔══╝██╔══██╗██╔══██╗██╔════╝╚══██╔══╝██╔═══██╗██╔══██╗
25
+ █████╗   ╚███╔╝    ██║   ██████╔╝███████║██║        ██║   ██║   ██║██████╔╝
26
+ ██╔══╝   ██╔██╗    ██║   ██╔══██╗██╔══██║██║        ██║   ██║   ██║██╔══██╗
27
+ ███████╗██╔╝ ██╗   ██║   ██║  ██║██║  ██║╚██████╗   ██║   ╚██████╔╝██║  ██║
28
+ ╚══════╝╚═╝  ╚═╝   ╚═╝   ╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝   ╚═╝    ╚═════╝ ╚═╝  ╚═╝
29
+ """
30
+
31
+ def clear_screen():
32
+ """Clears the terminal screen."""
33
+ os.system('cls' if os.name == 'nt' else 'clear')
34
+
35
+ def print_banner():
36
+ """Prints a banner that adjusts to the terminal width."""
37
+ try:
38
+ width = shutil.get_terminal_size((80, 20)).columns
39
+ except OSError:
40
+ width = 80
41
+
42
+ if width > config.LOGO_BREAKPOINT:
43
+ print(LOGO_LARGE)
44
+ else:
45
+ print(LOGO_SMALL)
46
+
47
+ print(colored(f" Welcome to Code Extractor v{config.SCRIPT_VERSION} by Lukasz Lekowski ".center(width, "="), "white", "on_magenta"))
48
+
49
+ def show_instructions():
50
+ """Clears screen and shows detailed instructions, pausing for user input."""
51
+ try:
52
+ width = shutil.get_terminal_size((80, 20)).columns
53
+ except OSError:
54
+ width = 80
55
+
56
+ input(colored("\nPress Enter to view detailed instructions...", "dark_grey"))
57
+ clear_screen()
58
+ print(colored("--- How It Works ---", "yellow"))
59
+ print("The script will guide you through a series of steps:\n")
60
+
61
+ print(colored("Step 1: General Settings", "cyan"))
62
+ print("You will first be asked about basic settings, such as whether to exclude files larger than 1MB to keep the output clean.\n")
63
+
64
+ print(colored("Step 2: Extraction Mode", "cyan"))
65
+ print("You have two main modes to choose from:")
66
+ print(" - 'Everything': This automatically finds and processes every valid folder and all root files. You will get one Markdown file for each top-level folder, plus one for the root files.")
67
+ print(" - 'Specific folders/root': This lets you hand-pick exactly what you want to extract.\n")
68
+
69
+ print(colored("Step 3: Detailed Selection (If you chose 'Specific')", "cyan"))
70
+ print("If you choose to be specific, you'll be presented with more options:")
71
+ print(" - Scan Depth: First, you'll decide how many sub-folder levels to scan and display.")
72
+ print(" - Selection Tree: You'll see a tree-like list of your project's folders. The script handles parent/child selections intelligently:")
73
+ print(" - If you select a parent folder, all of its sub-folders are automatically included. You don't need to check them individually.")
74
+ print(" - To get a file for *only* a sub-folder, select the sub-folder but *not* its parent.")
75
+ print(" - The 'root [...]' option specifically extracts *only* the files in your project's main directory.\n")
76
+
77
+ print(colored("--- Output Details ---", "yellow"))
78
+ print(f"All extracted content is saved into the '{config.OUTPUT_DIR_NAME}' directory. Each Markdown file generated will contain a YAML metadata header at the top with a unique reference ID, a timestamp, and more.\n")
79
+
80
+ tip = "TIP: Run this script with the --no-instructions or -ni flag to skip this guide next time."
81
+ print(colored(tip, "black", "on_yellow"))
82
+
83
+ input(colored("\nReady? Press Enter to begin...", "green"))
84
+ clear_screen()
@@ -0,0 +1,143 @@
1
+ Metadata-Version: 2.4
2
+ Name: codebase-extractor
3
+ Version: 1.0.0
4
+ Summary: A CLI tool to extract project source code into structured Markdown files for LLM & AI context.
5
+ Author: Lukasz Lekowski
6
+ Project-URL: Homepage, https://github.com/lukaszlekowski/codebase-extractor
7
+ Project-URL: Bug Tracker, https://github.com/lukaszlekowski/codebase-extractor/issues
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Operating System :: OS Independent
11
+ Classifier: Topic :: Software Development :: Documentation
12
+ Classifier: Topic :: Utilities
13
+ Requires-Python: >=3.9
14
+ Description-Content-Type: text/markdown
15
+ License-File: LICENCE
16
+ Requires-Dist: questionary
17
+ Requires-Dist: halo
18
+ Requires-Dist: termcolor
19
+ Dynamic: license-file
20
+
21
+ # Codebase Extractor
22
+
23
+ <p align="center">
24
+ <strong>A user-friendly CLI tool to extract project source code into structured Markdown files.</strong>
25
+ </p>
26
+
27
+ <p align="center">
28
+ <img src="https://img.shields.io/badge/License-MIT%20(Modified)-yellow.svg" alt="License: MIT (Modified)">
29
+ <img src="https://img.shields.io/badge/python-3.9%2B-blue.svg" alt="Python Version">
30
+ <img src="https://img.shields.io/github/stars/lukaszlekowski/codebase-extractor?style=social" alt="GitHub Stars">
31
+ </p>
32
+ <p align="center">
33
+ 💡 <b>Love this tool?</b> Found a bug or have an idea? Share it on <a href="https://github.com/lukaszlekowski/codebase-extractor">GitHub</a>! <br>
34
+ 🤝 <b>Connect with me</b> on <a href="https://www.linkedin.com/in/lukasz-lekowski">LinkedIn</a>. <br>
35
+ ☕ <b>Enjoying it?</b> Support development with a <a href="https://www.buymeacoffee.com/lukaszlekowski">coffee</a>!
36
+ </p>
37
+
38
+
39
+
40
+ ---
41
+
42
+ ## 🚀 Overview
43
+
44
+ Codebase Extractor is a command-line interface (CLI) tool designed to scan a project directory and consolidate all relevant source code into neatly organized Markdown files. It's perfect for creating a complete project snapshot for analysis, documentation, or providing context to Large Language Models (LLMs) like GPT-4, Gemini, or Claude.
45
+
46
+ The tool is highly configurable, allowing you to select specific folders, exclude large files, and intelligently ignore common directories like `node_modules` and `.git`.
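The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: `EXCLUDED_DIRS`, `MAX_FILE_SIZE`, and `iter_source_files` are hypothetical names, and the real exclusion list may differ.

```python
import os
from pathlib import Path

# Hypothetical defaults for illustration; the tool's real list may differ.
EXCLUDED_DIRS = {"node_modules", ".git", "__pycache__", "venv"}
MAX_FILE_SIZE = 1024 * 1024  # 1 MB, matching the size limit the README mentions


def iter_source_files(root, exclude_large=True):
    """Yield files under root, skipping excluded directories and, optionally, large files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDED_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if exclude_large and path.stat().st_size > MAX_FILE_SIZE:
                continue
            yield path
```

Pruning `dirnames` in place avoids walking large dependency trees at all, rather than discarding their files one by one afterwards.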
47
+
48
+ ---
49
+
50
+ ## ✨ Key Features
51
+
52
+ - **Interactive & User-Friendly:** A guided, multi-step CLI experience that makes selecting options simple and clear.
53
+ - **Smart Filtering:** Automatically excludes common dependency folders, build artifacts, version control directories, and IDE configuration files.
54
+ - **Flexible Selection Modes:** Choose to extract the entire project with one command, or dive into a specific selection mode.
55
+ - **🌳 Nested Folder Selection:** Interactively browse and select specific sub-folders from a tree-like view.
56
+ - **πŸ”’ Configurable Scan Depth:** You decide how many levels deep the script should look for folders when building the selection tree.
57
+ - **YAML Metadata:** Each generated Markdown file is prepended with a YAML front matter block containing useful metadata like a unique run ID, timestamp, and file count for easy tracking and parsing.
58
+ - **🚀 Quick Start Mode:** Use the `--no-instructions` flag to skip the detailed intro guide on subsequent runs.
59
+ - **Safe & Robust:** Features graceful exit handling (`Ctrl+C`) and provides clear feedback during the extraction process.
60
+
61
+ ---
62
+
63
+ ## 📦 Installation
64
+
65
+ To get started with Codebase Extractor, follow these steps.
66
+
67
+ 1. **Clone the Repository**
68
+
69
+ ```bash
70
+ git clone https://github.com/lukaszlekowski/codebase-extractor.git
71
+ cd codebase-extractor
72
+ ```
73
+
74
+ 2. **Set Up a Virtual Environment** (Recommended)
75
+ This keeps the dependencies for this project isolated from your system's Python installation.
76
+
77
+ ```bash
78
+ # For macOS/Linux
79
+ python3 -m venv venv
80
+ source venv/bin/activate
81
+
82
+ # For Windows
83
+ python -m venv venv
84
+ venv\Scripts\activate
85
+ ```
86
+
87
+ 3. **Install Dependencies**
88
+ The required Python packages are listed in `requirements.txt`.
89
+ ```bash
90
+ pip install -r requirements.txt
91
+ ```
92
+
93
+ ---
94
+
95
+ ## ▶️ Usage
96
+
97
+ ### Basic Usage
98
+
99
+ Once installed, simply run the script from the root of the project you wish to extract:
100
+
101
+ ```bash
102
+ python3 codebase_extractor.py
103
+ ```
104
+
105
+ ### Quick Start
106
+
107
+ For repeat usage, you can skip the detailed introductory guide by using the `--no-instructions` or `-ni` flag:
108
+
109
+ ```bash
110
+ python3 codebase_extractor.py --no-instructions
111
+ ```
112
+
113
+ ### The Process
114
+
115
+ The tool will guide you through a series of prompts:
116
+
117
+ - **Initial Setup [1/2]**: A yes/no question to skip files larger than 1MB.
118
+ - **Extraction Mode [2/2]**: Choose whether to extract the entire project (`Everything`) or select specific folders.
119
+
120
+ ### Specific Selection (if chosen):
121
+
122
+ - **Scan Depth**: You'll be asked how many sub-folder levels to scan for the selection list (defaults to 3).
123
+ - **Folder Tree**: You'll see a checklist of available folders and sub-folders to extract. The script handles selections intelligently:
124
+ - Selecting a parent folder automatically includes all its sub-folders, so you don't need to select them individually.
125
+ - To extract only a sub-folder's contents, select the sub-folder but not its parent.
126
+ - The special `root [...]` option extracts only the files in your project's main directory, ignoring all sub-folders.
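One way to picture the parent/child rule is as a pruning step over the selected paths. The sketch below assumes selections are POSIX-style relative paths. `prune_nested` is a hypothetical helper, not a function from the package.

```python
from pathlib import PurePosixPath


def prune_nested(selected):
    """Drop any folder whose ancestor is also selected, since the parent already covers it."""
    chosen = {PurePosixPath(p) for p in selected}
    kept = [p for p in chosen if not any(parent in chosen for parent in p.parents)]
    return sorted(str(p) for p in kept)


# "src/utils" is dropped because "src" is selected; "docs/api" stays because "docs" is not.
print(prune_nested(["src", "src/utils", "docs/api"]))  # prints ['docs/api', 'src']
```

Under this rule, selecting only `docs/api` yields output for that sub-folder alone, which matches the behaviour described in the list above.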
127
+
128
+ ## Output Details
129
+
130
+ All output files are saved in a `CODEBASE_EXTRACTS` directory within your project folder. Each generated Markdown file includes a YAML metadata header with a unique reference ID, timestamp, and file count for easy tracking and parsing.
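For a rough idea of what such a header could look like, here is a minimal sketch. The field names (`reference_id`, `timestamp`, `source_folder`, `file_count`) and the `build_front_matter` helper are assumptions for illustration. The actual keys written by the tool may differ.

```python
import uuid
from datetime import datetime, timezone


def build_front_matter(folder_name, file_count):
    """Return a YAML front matter block; the field names here are illustrative only."""
    lines = [
        "---",
        f"reference_id: {uuid.uuid4()}",
        # Quoted so YAML parsers treat the values as plain strings.
        f'timestamp: "{datetime.now(timezone.utc).isoformat()}"',
        f'source_folder: "{folder_name}"',
        f"file_count: {file_count}",
        "---",
    ]
    return "\n".join(lines) + "\n"


print(build_front_matter("src", 12))
```

Because the block is delimited by `---` lines at the very top of the file, downstream scripts can split it off and parse it separately from the Markdown body.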
131
+
132
+ ## 📜 License
133
+
134
+ This project is licensed under a modified MIT License. Please see the [LICENCE](LICENCE) file for the full text.
135
+
136
+ The standard MIT License has been amended with a single, important attribution requirement:
137
+
138
+ If you use, copy, or modify any part of this software, you must include a clear and visible attribution to the original author and project in your derivative work.
139
+
140
+ This attribution must include:
141
+
142
+ - A link back to this original GitHub repository: [https://github.com/lukaszlekowski/codebase-extractor](https://github.com/lukaszlekowski/codebase-extractor)
143
+ - A link to the author's LinkedIn profile: [https://www.linkedin.com/in/lukasz-lekowski](https://www.linkedin.com/in/lukasz-lekowski)
@@ -0,0 +1,14 @@
1
+ LICENCE
2
+ README.md
3
+ pyproject.toml
4
+ src/codebase_extractor/__init__.py
5
+ src/codebase_extractor/config.py
6
+ src/codebase_extractor/file_handler.py
7
+ src/codebase_extractor/main_logic.py
8
+ src/codebase_extractor/ui.py
9
+ src/codebase_extractor.egg-info/PKG-INFO
10
+ src/codebase_extractor.egg-info/SOURCES.txt
11
+ src/codebase_extractor.egg-info/dependency_links.txt
12
+ src/codebase_extractor.egg-info/entry_points.txt
13
+ src/codebase_extractor.egg-info/requires.txt
14
+ src/codebase_extractor.egg-info/top_level.txt
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ code-extractor = codebase_extractor.main_logic:main
@@ -0,0 +1,3 @@
1
+ questionary
2
+ halo
3
+ termcolor
@@ -0,0 +1 @@
1
+ codebase_extractor