dirshot 0.2.0__tar.gz → 0.3.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {dirshot-0.2.0 → dirshot-0.3.0}/PKG-INFO +6 -3
- {dirshot-0.2.0 → dirshot-0.3.0}/README.md +6 -3
- {dirshot-0.2.0 → dirshot-0.3.0}/pyproject.toml +1 -1
- {dirshot-0.2.0 → dirshot-0.3.0}/src/dirshot/dirshot.py +180 -118
- dirshot-0.3.0/src/dirshot/reconstruct.py +110 -0
- {dirshot-0.2.0 → dirshot-0.3.0}/src/dirshot.egg-info/PKG-INFO +6 -3
- {dirshot-0.2.0 → dirshot-0.3.0}/src/dirshot.egg-info/SOURCES.txt +1 -0
- {dirshot-0.2.0 → dirshot-0.3.0}/setup.cfg +0 -0
- {dirshot-0.2.0 → dirshot-0.3.0}/src/dirshot/__init__.py +0 -0
- {dirshot-0.2.0 → dirshot-0.3.0}/src/dirshot.egg-info/dependency_links.txt +0 -0
- {dirshot-0.2.0 → dirshot-0.3.0}/src/dirshot.egg-info/requires.txt +0 -0
- {dirshot-0.2.0 → dirshot-0.3.0}/src/dirshot.egg-info/top_level.txt +0 -0
- {dirshot-0.2.0 → dirshot-0.3.0}/tests/test_dirshot.py +0 -0
--- dirshot-0.2.0/PKG-INFO
+++ dirshot-0.3.0/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: dirshot
-Version: 0.2.0
+Version: 0.3.0
 Summary: A flexible, high-performance utility for creating project snapshots and searching files with a rich terminal UI.
 Author-email: init-helpful <init.helpful@gmail.com>
 Project-URL: Homepage, https://github.com/init-helpful/dirshot
@@ -168,10 +168,11 @@ The `generate_snapshot()` function accepts the following parameters:
 | `root_directory` | `str` | `"."` | The starting directory for the scan. |
 | `output_file_name` | `str` | `"project_snapshot.txt"` | The name of the file to save the results to. |
 | `search_keywords` | `Optional[List[str]]` | `None` | If provided, switches to **Search Mode**. Otherwise, runs in **Snapshot Mode**. |
+| `files` | `Optional[List[str]]` | `None` | A list of specific filenames to include. If provided, checks this list first before extensions. |
 | `language_presets` | `Optional[List[LanguagePreset]]` | `None` | A list of `LanguagePreset` enums for common file types (e.g., `LanguagePreset.PYTHON`). |
 | `ignore_presets` | `Optional[List[IgnorePreset]]` | `None` | A list of `IgnorePreset` enums for common ignore patterns (e.g., `IgnorePreset.NODE_JS`). |
 | `file_extensions` | `Optional[List[str]]` | `None` | A manual list of file extensions to include (e.g., `[".py", ".md"]`). |
-| `ignore_if_in_path` | `Optional[List[str]]` | `None` | A
+| `ignore_if_in_path` | `Optional[List[str]]` | `None` | A list of directory or file substring names to exclude (e.g., `["temp"]` excludes `src/temp/file.py`). |
 | `ignore_extensions` | `Optional[List[str]]` | `None` | A manual list of file extensions to explicitly ignore (e.g., `[".log", ".tmp"]`). |
 | `search_file_contents` | `bool` | `True` | In Search Mode, search for keywords within file contents. |
 | `generate_tree` | `bool` | `True` | Include a file tree of the matched files at the top of the output. |
@@ -180,6 +181,9 @@ The `generate_snapshot()` function accepts the following parameters:
 | `exclude_whitespace_in_token_count` | `bool` | `False` | If `True`, removes whitespace before counting tokens for a more compact count. |
 | `max_workers` | `Optional[int]` | `CPU count + 4` | The maximum number of worker threads for concurrent processing. |
 | `read_binary_files` | `bool` | `False` | If `True`, the content search will attempt to read and search through binary files. |
+| `only_show_tree` | `bool` | `False` | If `True`, the output file will contain only the file tree (and stats), omitting file content. |
+| `case_sensitive_filter` | `bool` | `False` | If `True`, file filtering (extensions, ignore paths) is case-sensitive. |
+| `case_sensitive_search` | `bool` | `False` | If `True`, keyword searching is case-sensitive. |
 
 ## 🤝 Contributing
 
@@ -191,4 +195,3 @@ Contributions are welcome! Please feel free to submit a pull request or open an
 4. Commit your changes (`git commit -m 'Add some feature'`).
 5. Push to the branch (`git push origin feature/your-feature-name`).
 6. Open a pull request.
-
--- dirshot-0.2.0/README.md
+++ dirshot-0.3.0/README.md
@@ -151,10 +151,11 @@ The `generate_snapshot()` function accepts the following parameters:
 | `root_directory` | `str` | `"."` | The starting directory for the scan. |
 | `output_file_name` | `str` | `"project_snapshot.txt"` | The name of the file to save the results to. |
 | `search_keywords` | `Optional[List[str]]` | `None` | If provided, switches to **Search Mode**. Otherwise, runs in **Snapshot Mode**. |
+| `files` | `Optional[List[str]]` | `None` | A list of specific filenames to include. If provided, checks this list first before extensions. |
 | `language_presets` | `Optional[List[LanguagePreset]]` | `None` | A list of `LanguagePreset` enums for common file types (e.g., `LanguagePreset.PYTHON`). |
 | `ignore_presets` | `Optional[List[IgnorePreset]]` | `None` | A list of `IgnorePreset` enums for common ignore patterns (e.g., `IgnorePreset.NODE_JS`). |
 | `file_extensions` | `Optional[List[str]]` | `None` | A manual list of file extensions to include (e.g., `[".py", ".md"]`). |
-| `ignore_if_in_path` | `Optional[List[str]]` | `None` | A
+| `ignore_if_in_path` | `Optional[List[str]]` | `None` | A list of directory or file substring names to exclude (e.g., `["temp"]` excludes `src/temp/file.py`). |
 | `ignore_extensions` | `Optional[List[str]]` | `None` | A manual list of file extensions to explicitly ignore (e.g., `[".log", ".tmp"]`). |
 | `search_file_contents` | `bool` | `True` | In Search Mode, search for keywords within file contents. |
 | `generate_tree` | `bool` | `True` | Include a file tree of the matched files at the top of the output. |
@@ -163,6 +164,9 @@ The `generate_snapshot()` function accepts the following parameters:
 | `exclude_whitespace_in_token_count` | `bool` | `False` | If `True`, removes whitespace before counting tokens for a more compact count. |
 | `max_workers` | `Optional[int]` | `CPU count + 4` | The maximum number of worker threads for concurrent processing. |
 | `read_binary_files` | `bool` | `False` | If `True`, the content search will attempt to read and search through binary files. |
+| `only_show_tree` | `bool` | `False` | If `True`, the output file will contain only the file tree (and stats), omitting file content. |
+| `case_sensitive_filter` | `bool` | `False` | If `True`, file filtering (extensions, ignore paths) is case-sensitive. |
+| `case_sensitive_search` | `bool` | `False` | If `True`, keyword searching is case-sensitive. |
 
 ## 🤝 Contributing
 
@@ -173,5 +177,4 @@ Contributions are welcome! Please feel free to submit a pull request or open an
 3. Make your changes.
 4. Commit your changes (`git commit -m 'Add some feature'`).
 5. Push to the branch (`git push origin feature/your-feature-name`).
-6. Open a pull request.
-
+6. Open a pull request.
--- dirshot-0.2.0/src/dirshot/dirshot.py
+++ dirshot-0.3.0/src/dirshot/dirshot.py
@@ -11,6 +11,12 @@ from concurrent.futures import ThreadPoolExecutor, as_completed
 from io import StringIO
 from contextlib import contextmanager
 
+
+def strip_markup(text: str) -> str:
+    """Removes rich-style markup tags from a string (e.g., [bold red]Error[/])"""
+    return re.sub(r"\[/?[^\]]+\]", "", str(text))
+
+
 # --- Dependency & Console Management ---
 try:
     from rich.console import Console
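The new `strip_markup` helper is a single regex substitution, so its behavior is easy to check in isolation. A minimal sketch, copying the regex from the hunk above:

```python
import re

def strip_markup(text: str) -> str:
    """Remove rich-style [tag]...[/] markup, keeping only the inner text."""
    return re.sub(r"\[/?[^\]]+\]", "", str(text))

print(strip_markup("[bold red]Error[/]"))            # Error
print(strip_markup("Scanning [cyan]src[/cyan]..."))  # Scanning src...
```

The pattern matches both opening tags like `[bold red]` and closing tags like `[/]` or `[/cyan]`, which is why the fallback (non-rich) console paths in this release pass every user-visible string through it.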
@@ -38,7 +44,11 @@ except ImportError:
 
         def add_task(self, description, total=None, **kwargs):
             task_id = self.task_count
-            self.tasks[task_id] = {
+            self.tasks[task_id] = {
+                "d": strip_markup(description),
+                "t": total,
+                "c": 0,
+            }
             self.task_count += 1
             return task_id
 
@@ -49,12 +59,20 @@ except ImportError:
                 return
             task = self.tasks[task_id]
             if description:
-                task["d"] = description
+                task["d"] = strip_markup(description)
             task["c"] = completed if completed is not None else task["c"] + advance
-
-
-
-
+
+            # Simple progress string
+            count_str = f"{task['c']}"
+            if task["t"]:
+                percent = (task["c"] / task["t"]) * 100
+                count_str += f"/{task['t']} ({percent:.0f}%)"
+
+            line = f"-> {task['d']}: {count_str}"
+
+            # Pad with spaces to clear previous longer lines
+            padding = max(0, len(self.active_line) - len(line))
+            sys.stdout.write("\r" + line + " " * padding)
             sys.stdout.flush()
             self.active_line = line
 
@@ -78,7 +96,8 @@ class ConsoleManager:
         if self.console:
             self.console.log(message, style=style)
         else:
-
+            clean_msg = strip_markup(message)
+            print(f"[{time.strftime('%H:%M:%S')}] {clean_msg}")
 
     def print_table(self, title: str, columns: List[str], rows: List[List[str]]):
         """Prints a formatted table to the console."""
@@ -95,11 +114,36 @@ class ConsoleManager:
                 table.add_row(*row)
             self.console.print(table)
         else:
-
-            print("
-
-
-
+            # Fallback ASCII table
+            print(f"\n{title}")
+
+            # Clean data and calculate widths
+            clean_cols = [strip_markup(c) for c in columns]
+            clean_rows = [[strip_markup(c) for c in r] for r in rows]
+
+            col_widths = [len(c) for c in clean_cols]
+            for row in clean_rows:
+                for i, cell in enumerate(row):
+                    if i < len(col_widths):
+                        col_widths[i] = max(col_widths[i], len(cell))
+
+            def print_sep(char="-", cross="+"):
+                print(cross + cross.join(char * (w + 2) for w in col_widths) + cross)
+
+            print_sep()
+            # Header
+            header_str = " | ".join(
+                f" {c:<{w}} " for c, w in zip(clean_cols, col_widths)
+            )
+            print(f"| {header_str} |")
+            print_sep("=")
+
+            # Rows
+            for row in clean_rows:
+                row_str = " | ".join(f" {c:<{w}} " for c, w in zip(row, col_widths))
+                print(f"| {row_str} |")
+
+            print_sep()
 
 
 # --- Configuration Constants ---
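The new fallback table is plain string formatting: pad each column to its widest cell, then draw `+`/`-` separators. A self-contained sketch of the same width calculation (function name and sample data are invented for illustration, and the joining differs slightly from the package's exact output):

```python
def ascii_table(title, columns, rows):
    """Pad each column to the width of its widest cell, then draw separators."""
    widths = [len(c) for c in columns]
    for row in rows:
        for i, cell in enumerate(row):
            if i < len(widths):
                widths[i] = max(widths[i], len(cell))

    def sep(char="-"):
        # +------+------+ style rule sized to the padded columns
        return "+" + "+".join(char * (w + 2) for w in widths) + "+"

    lines = [title, sep()]
    lines.append("|" + "|".join(f" {c:<{w}} " for c, w in zip(columns, widths)) + "|")
    lines.append(sep("="))
    for row in rows:
        lines.append("|" + "|".join(f" {c:<{w}} " for c, w in zip(row, widths)) + "|")
    lines.append(sep())
    return "\n".join(lines)

print(ascii_table("Config", ["Parameter", "Value"], [["Mode", "Snapshot"]]))
```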
@@ -416,6 +460,8 @@ class FilterCriteria:
     file_extensions: Set[str] = field(default_factory=set)
     ignore_if_in_path: Set[str] = field(default_factory=set)
     ignore_extensions: Set[str] = field(default_factory=set)
+    specific_files: Set[str] = field(default_factory=set)
+    case_sensitive: bool = False
 
     @classmethod
     def normalize_inputs(
@@ -425,33 +471,48 @@ class FilterCriteria:
         ignore_extensions: Optional[List[str]] = None,
         lang_presets: Optional[List[LanguagePreset]] = None,
         ignore_presets: Optional[List[IgnorePreset]] = None,
+        files: Optional[List[str]] = None,
+        case_sensitive: bool = False,
     ) -> "FilterCriteria":
         """
         Consolidates various filter inputs into a single FilterCriteria object.
 
         Args:
             file_types (list, optional): A list of file extensions to include.
-            ignore_if_in_path (list, optional): A list of directory/file names to ignore.
+            ignore_if_in_path (list, optional): A list of directory/file substring names to ignore.
             ignore_extensions (list, optional): A list of file extensions to ignore.
             lang_presets (list, optional): A list of LanguagePreset enums.
             ignore_presets (list, optional): A list of IgnorePreset enums.
+            files (list, optional): A list of specific filenames to include.
+            case_sensitive (bool): If True, filters are case sensitive.
 
         Returns:
             FilterCriteria: An object containing the combined sets of filters.
         """
-
-
-
+
+        def clean(s):
+            s = s.strip()
+            return s if case_sensitive else s.lower()
+
+        all_exts = {clean(ft) for ft in file_types or []}
+        all_ignore_paths = {clean(ip) for ip in ignore_if_in_path or []}
+        all_ignore_exts = {clean(ie) for ie in ignore_extensions or []}
+        all_specific_files = {clean(f) for f in files or []}
 
         for p in lang_presets or []:
-
+            for item in p.value:
+                all_exts.add(clean(item))
+
         for p in ignore_presets or []:
-
+            for item in p.value:
+                all_ignore_paths.add(clean(item))
 
         return cls(
             file_extensions=all_exts,
             ignore_if_in_path=all_ignore_paths,
             ignore_extensions=all_ignore_exts,
+            specific_files=all_specific_files,
+            case_sensitive=case_sensitive,
         )
 
 
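The core of the new normalization is the inner `clean` helper: every filter input is stripped and, unless `case_sensitive` is set, lowercased, then deduplicated into sets. A standalone sketch of that idea (function name and inputs are invented; the real method also folds in presets and specific files):

```python
def normalize_filters(file_types=None, ignore_if_in_path=None, case_sensitive=False):
    """Sketch of FilterCriteria.normalize_inputs: strip, optionally lowercase, dedupe."""
    def clean(s):
        s = s.strip()
        return s if case_sensitive else s.lower()

    exts = {clean(ft) for ft in file_types or []}
    ignores = {clean(ip) for ip in ignore_if_in_path or []}
    return exts, ignores

exts, ignores = normalize_filters([".PY ", ".py", ".Md"], ["Temp", "node_modules"])
print(sorted(exts))     # ['.md', '.py']
print(sorted(ignores))  # ['node_modules', 'temp']
```

Doing the normalization once up front is what lets the discovery and search code below compare with plain `in` checks instead of re-lowercasing on every file.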
@@ -477,11 +538,26 @@ def _discover_files(
         nonlocal dirs_scanned
         try:
             for entry in os.scandir(current_path):
-
-
+                # Path relative to the project root, used for substring check in path
+                # We use string representation for the check
+                rel_path = Path(entry.path).relative_to(root_dir)
+                rel_path_str = str(rel_path)
+                entry_name = entry.name
+
+                # Normalize for case check
+                if not criteria.case_sensitive:
+                    rel_path_str = rel_path_str.lower()
+                    entry_name = entry_name.lower()
+
+                # Ignore Logic: Substring matching in the path
+                # If any ignore string is a substring of the relative path, skip it.
+                if any(
+                    ignored in rel_path_str for ignored in criteria.ignore_if_in_path
+                ):
                     continue
+
                 if entry.is_dir():
-                    recursive_scan(
+                    recursive_scan(Path(entry.path))
                     dirs_scanned += 1
                     if progress:
                         progress.update(
@@ -490,17 +566,36 @@ def _discover_files(
                             description=f"Discovering files in [cyan]{entry.name}[/cyan]",
                         )
                 elif entry.is_file():
-
+                    # Specific File Inclusion
+                    if (
+                        criteria.specific_files
+                        and entry_name not in criteria.specific_files
+                    ):
+                        continue
+
+                    # Extension filtering
+                    file_ext = Path(entry.path).suffix
+                    if not criteria.case_sensitive:
+                        file_ext = file_ext.lower()
+
                     if (
                         criteria.ignore_extensions
                         and file_ext in criteria.ignore_extensions
                     ):
                         continue
+
+                    # Inclusion Logic
+                    # Include if no inclusion filters are set OR ext is allowed OR file is specifically allowed
                     if (
                         not criteria.file_extensions
                         or file_ext in criteria.file_extensions
+                        or (
+                            criteria.specific_files
+                            and entry_name in criteria.specific_files
+                        )
                     ):
-                        candidate_files.append(
+                        candidate_files.append(Path(entry.path))
+
         except (PermissionError, FileNotFoundError):
             pass
 
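The discovery hunk above applies three rules per entry: substring-based path ignores first, then the specific-file list, then the extension filter. A self-contained sketch of that decision order on path strings (the function name and sample paths are invented; the real code walks `os.scandir` entries rather than strings):

```python
import os

def should_include(rel_path, ignore_if_in_path, file_extensions, specific_files,
                   case_sensitive=False):
    """Sketch of the 0.3.0 discovery filter: substring ignore, then inclusion."""
    name = os.path.basename(rel_path)
    ext = os.path.splitext(name)[1]
    if not case_sensitive:
        rel_path, name, ext = rel_path.lower(), name.lower(), ext.lower()
    # Ignore wins: any ignore string appearing anywhere in the relative path excludes it.
    if any(ignored in rel_path for ignored in ignore_if_in_path):
        return False
    # Specific-file list, when given, restricts matches to those names.
    if specific_files and name not in specific_files:
        return False
    # Include if no extension filter is set, or the extension (or exact name) is allowed.
    return not file_extensions or ext in file_extensions or name in specific_files

print(should_include("src/temp/file.py", {"temp"}, {".py"}, set()))  # False
print(should_include("src/Main.PY", set(), {".py"}, set()))          # True
```

Note that because the ignore check is a plain substring match on the whole relative path, an ignore entry like `"temp"` also excludes files such as `templates.py`; exact-segment matching would need a stricter comparison.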
@@ -515,29 +610,24 @@ def process_file_for_search(
     full_path: bool,
     activity: Dict,
     read_binary_files: bool,
+    case_sensitive: bool,
 ) -> Optional[Path]:
     """
     Processes a single file to see if it matches the search criteria.
 
     A match can occur if a keyword is found in the filename or, if enabled,
     within the file's content.
-
-    Args:
-        file_path (Path): The absolute path to the file to process.
-        keywords (List[str]): A list of keywords to search for.
-        search_content (bool): If True, search the content of the file.
-        full_path (bool): If True, compare keywords against the full file path.
-        activity (Dict): A dictionary to track thread activity.
-        read_binary_files (bool): If True, attempt to read and search binary files.
-
-    Returns:
-        Optional[Path]: The path to the file if it's a match, otherwise None.
     """
     thread_id = threading.get_ident()
     activity[thread_id] = file_path.name
     try:
         compare_target = str(file_path) if full_path else file_path.name
-
+
+        if not case_sensitive:
+            compare_target = compare_target.lower()
+            # Keywords should already be normalized by the caller if not case_sensitive
+
         if any(key in compare_target for key in keywords):
             return file_path
 
         if search_content and (
@@ -546,7 +636,9 @@ def process_file_for_search(
         try:
             with file_path.open("r", encoding="utf-8", errors="ignore") as f:
                 for line in f:
-                    if
+                    if not case_sensitive:
+                        line = line.lower()
+                    if any(key in line for key in keywords):
                         return file_path
         except OSError:
             pass
@@ -564,24 +656,17 @@ def _process_files_concurrently(
     progress: Any,
     task_id: Any,
     read_binary_files: bool,
+    case_sensitive: bool,
 ) -> Set[Path]:
     """
     Uses a thread pool to process a list of files for search matches concurrently.
-
-    Args:
-        files (List[Path]): The list of candidate files to search through.
-        keywords (List[str]): The keywords to search for.
-        search_content (bool): Whether to search inside file contents.
-        full_path (bool): Whether to compare keywords against the full path.
-        max_workers (Optional[int]): The maximum number of threads to use.
-        progress (Any): The progress bar object.
-        task_id (Any): The ID of the processing task on the progress bar.
-        read_binary_files (bool): If True, search the content of binary files.
-
-    Returns:
-        Set[Path]: A set of absolute paths for all files that matched.
     """
     matched_files, thread_activity = set(), {}
+
+    # Normalize keywords once if case insensitive
+    if not case_sensitive:
+        keywords = [k.lower() for k in keywords]
+
     with ThreadPoolExecutor(
         max_workers=max_workers or (os.cpu_count() or 1) + 4,
         thread_name_prefix="scanner",
@@ -595,6 +680,7 @@ def _process_files_concurrently(
                 full_path,
                 thread_activity,
                 read_binary_files,
+                case_sensitive,
             ): f
             for f in files
         }
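The pattern in the two hunks above is: lowercase the keywords once up front, then fan the per-file check out over a thread pool. A runnable miniature of that flow (the helper, filenames, and contents are invented for the demo; the real code also tracks per-thread activity and binary handling):

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import tempfile

def file_matches(path, keywords):
    """Sketch of process_file_for_search: match on filename first, then on content."""
    if any(k in path.name.lower() for k in keywords):
        return path
    try:
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
    except OSError:
        return None
    return path if any(k in text for k in keywords) else None

tmp = Path(tempfile.mkdtemp())
(tmp / "a.py").write_text("def snapshot(): pass")
(tmp / "b.py").write_text("print('hello')")

keywords = ["snapshot"]  # pre-lowered once, as _process_files_concurrently now does
with ThreadPoolExecutor(max_workers=4) as pool:
    matches = [p for p in pool.map(lambda f: file_matches(f, keywords), tmp.iterdir()) if p]
print(sorted(p.name for p in matches))  # ['a.py']
```

Normalizing the keyword list once, instead of inside every worker, avoids repeating the `lower()` call for each of potentially thousands of files.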
@@ -632,17 +718,7 @@ def _process_files_concurrently(
 def _generate_tree_with_stats(
     root_dir: Path, file_paths: List[Path], show_stats: bool
 ) -> List[str]:
-    """
-    Generates a directory tree structure from a list of file paths.
-
-    Args:
-        root_dir (Path): The root directory of the project, used as the tree's base.
-        file_paths (List[Path]): A list of file paths to include in the tree.
-        show_stats (bool): If True, include file and directory counts in the tree.
-
-    Returns:
-        List[str]: A list of strings, where each string is a line in the tree.
-    """
+    """Generates a directory tree structure from a list of file paths."""
     tree_dict: Dict[str, Any] = {}
     for path in file_paths:
         level = tree_dict
@@ -694,23 +770,9 @@ def _collate_content_to_file(
     exclude_whitespace: bool,
     progress: Any,
     task_id: Any,
+    only_show_tree: bool,
 ) -> Tuple[float, int]:
-    """
-    Collates the file tree and file contents into a single output file.
-
-    Args:
-        output_path (Path): The path to the final output file.
-        tree_lines (List): The generated file tree lines.
-        files (List[FileToProcess]): The files whose content needs to be collated.
-        show_tree_stats (bool): Whether to include the stats key in the header.
-        show_token_count (bool): Whether to calculate and include the token count.
-        exclude_whitespace (bool): If True, exclude whitespace from token counting.
-        progress (Any): The progress bar object.
-        task_id (Any): The ID of the collation task on the progress bar.
-
-    Returns:
-        Tuple[float, int]: A tuple containing the total bytes written and the token count.
-    """
+    """Collates the file tree and file contents into a single output file."""
     output_path.parent.mkdir(parents=True, exist_ok=True)
     buffer, total_bytes, token_count = StringIO(), 0, 0
 
@@ -724,9 +786,14 @@ def _collate_content_to_file(
     if RICH_AVAILABLE:
         content = "\n".join(Text.from_markup(line).plain for line in tree_lines)
     else:
-        content = "\n".join(tree_lines)
+        content = "\n".join(strip_markup(line) for line in tree_lines)
     buffer.write(content + "\n\n")
 
+    if only_show_tree:
+        with output_path.open("w", encoding=DEFAULT_ENCODING) as outfile:
+            outfile.write(buffer.getvalue())
+        return total_bytes, token_count
+
     for file_info in files:
         if progress:
             progress.update(
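`_generate_tree_with_stats` starts by folding path components into a nested dict, which is then rendered as the tree. That first step is simple enough to show on its own (a sketch; the real function works on `Path` objects relative to `root_dir` and also renders branch characters and stats):

```python
def build_tree(paths):
    """Sketch of the tree builder's first step: fold path parts into a nested dict."""
    tree = {}
    for path in paths:
        level = tree
        for part in path.split("/"):
            # setdefault walks down, creating intermediate directory nodes as needed
            level = level.setdefault(part, {})
    return tree

tree = build_tree(["src/dirshot/dirshot.py", "src/dirshot/reconstruct.py", "README.md"])
print(sorted(tree["src"]["dirshot"]))  # ['dirshot.py', 'reconstruct.py']
```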
@@ -778,48 +845,13 @@ def generate_snapshot(
     show_token_count: bool = False,
     exclude_whitespace_in_token_count: bool = False,
     read_binary_files: bool = False,
+    files: Optional[List[str]] = None,
+    only_show_tree: bool = False,
+    case_sensitive_filter: bool = False,
+    case_sensitive_search: bool = False,
 ) -> None:
     """
     Orchestrates the entire process of scanning, filtering, and collating project files.
-
-    This function serves as the main entry point for the utility. It can be used
-    to create a full "snapshot" of a project's source code or to search for
-    specific keywords within file names and/or contents. It is highly configurable
-    through presets and manual overrides.
-
-    Args:
-        root_directory (str): The starting directory for the scan. Defaults to ".".
-        output_file_name (str): The name of the file to save the results to.
-            Defaults to "project_snapshot.txt".
-        search_keywords (List[str], optional): A list of keywords to search for. If
-            None or empty, the function runs in "snapshot" mode, including all
-            files that match the other criteria. Defaults to None.
-        file_extensions (List[str], optional): A list of specific file
-            extensions to include (e.g., [".py", ".md"]). Defaults to None.
-        ignore_if_in_path (List[str], optional): A list of directory or file
-            names to exclude from the scan. Defaults to None.
-        ignore_extensions (List[str], optional): A list of file extensions to
-            explicitly ignore (e.g., [".log", ".tmp"]). Defaults to None.
-        language_presets (List[LanguagePreset], optional): A list of LanguagePreset
-            enums for common file types (e.g., [LanguagePreset.PYTHON]). Defaults to None.
-        ignore_presets (List[IgnorePreset], optional): A list of IgnorePreset enums
-            for common ignore patterns (e.g., [IgnorePreset.PYTHON]). Defaults to None.
-        search_file_contents (bool): If True, search for keywords within file
-            contents. Defaults to True.
-        full_path_compare (bool): If True, search for keywords in the full file path,
-            not just the filename. Defaults to True.
-        max_workers (Optional[int]): The maximum number of worker threads for
-            concurrent processing. Defaults to CPU count + 4.
-        generate_tree (bool): If True, a file tree of the matched files will be
-            included at the top of the output file. Defaults to True.
-        show_tree_stats (bool): If True, display file and directory counts in the
-            generated tree. Defaults to False.
-        show_token_count (bool): If True, display an approximated token count in the
-            summary and output file. Defaults to False.
-        exclude_whitespace_in_token_count (bool): If True, whitespace is removed
-            before counting tokens, giving a more compact count. Defaults to False.
-        read_binary_files (bool): If True, the content search will attempt to read
-            and search through binary files. Defaults to False.
     """
     console, start_time = ConsoleManager(), time.perf_counter()
     root_dir = Path(root_directory or ".").resolve()
@@ -827,19 +859,31 @@ def generate_snapshot(
         console.log(f"Error: Root directory '{root_dir}' not found.", style="bold red")
         return
 
-
+    # Normalize keywords for display/logic
+    keywords = [k.strip() for k in search_keywords or [] if k.strip()]
+    if not case_sensitive_search:
+        # We don't lower here for the variable passed to functions,
+        # but for consistent display in the table we might want to.
+        # However, logic downstream handles lowering if case_sensitive_search is False.
+        pass
+
     snapshot_mode = not keywords
+
+    # Normalize filtering criteria
     criteria = FilterCriteria.normalize_inputs(
         file_types=file_extensions,
         ignore_if_in_path=ignore_if_in_path,
         ignore_extensions=ignore_extensions,
         lang_presets=language_presets,
         ignore_presets=ignore_presets,
+        files=files,
+        case_sensitive=case_sensitive_filter,
     )
 
     config_rows = [
         ["Root Directory", str(root_dir)],
         ["File Types", ", ".join(criteria.file_extensions) or "All"],
+        ["Specific Files", ", ".join(criteria.specific_files) or "None"],
         ["Ignore Paths", ", ".join(criteria.ignore_if_in_path) or "None"],
         ["Ignore Extensions", ", ".join(criteria.ignore_extensions) or "None"],
         ["Generate Tree", "[green]Yes[/green]" if generate_tree else "[red]No[/red]"],
@@ -868,6 +912,12 @@ def generate_snapshot(
 
     if snapshot_mode:
         config_rows.insert(1, ["Mode", "[bold blue]Snapshot[/bold blue]"])
+        config_rows.append(
+            [
+                "Case Sensitive Filter",
+                "[green]Yes[/green]" if case_sensitive_filter else "[red]No[/red]",
+            ]
+        )
     else:
         config_rows.insert(1, ["Mode", "[bold yellow]Search[/bold yellow]"])
         config_rows.insert(
@@ -885,6 +935,16 @@ def generate_snapshot(
                 "[green]Yes[/green]" if read_binary_files else "[red]No[/red]",
             ]
         )
+        config_rows.append(
+            [
+                "Case Sensitive Search",
+                "[green]Yes[/green]" if case_sensitive_search else "[red]No[/red]",
+            ]
+        )
+
+    if only_show_tree:
+        config_rows.append(["Output Content", "[yellow]Tree Only[/yellow]"])
+
     console.print_table(
         "Project Scan Configuration", ["Parameter", "Value"], config_rows
     )
@@ -948,6 +1008,7 @@ def generate_snapshot(
             progress,
             process_task,
             read_binary_files,
+            case_sensitive_search,
         )
 
         output_path, total_bytes, token_count = None, 0, 0
@@ -986,6 +1047,7 @@ def generate_snapshot(
                 exclude_whitespace_in_token_count,
                 progress,
                 collate_task,
+                only_show_tree,
             )
 
         end_time = time.perf_counter()
@@ -1019,4 +1081,4 @@ if __name__ == "__main__":
         show_tree_stats=True,
         show_token_count=True,
         exclude_whitespace_in_token_count=True,
-    )
+    )
dirshot-0.3.0/src/dirshot/reconstruct.py (new file)
@@ -0,0 +1,110 @@
+import os
+import re
+
+# --- Configuration ---
+# You can edit these variables to match your needs.
+
+# 1. The name of the file containing the project structure and content.
+INPUT_FILENAME = 'repo.txt'
+
+# 2. The name of the directory where the project will be created.
+OUTPUT_DIRECTORY = 'studio'
+# --- End of Configuration ---
+
+
+def reconstruct_and_populate_project(file_path, root_dir):
+    """
+    Parses a formatted text file to reconstruct a project's directory
+    structure and correctly populates all files with their content.
+
+    Args:
+        file_path (str): The path to the input text file (e.g., 'repo.txt').
+        root_dir (str): The name of the root directory for the reconstructed project.
+    """
+    print(f"Starting project reconstruction from '{file_path}'...")
+    print(f"Output will be saved in the '{root_dir}' directory.")
+
+    try:
+        with open(file_path, 'r', encoding='utf-8') as f:
+            content = f.read()
+    except FileNotFoundError:
+        print(f"\nERROR: The input file '{file_path}' was not found.")
+        print("Please make sure the script is in the same directory as the input file.")
+        return
+    except Exception as e:
+        print(f"\nAn error occurred while reading the file: {e}")
+        return
+
+    # A line of 80 hyphens is the separator. We split the entire document by it.
+    separator = '--------------------------------------------------------------------------------'
+    # The split operation will result in a list where file paths and contents alternate.
+    sections = content.split(separator)
+
+    # The very first section is the visual tree, which we don't need.
+    # We start processing from the first "FILE:" header.
+    # We skip any empty sections that might result from splitting.
+    file_chunks = [s.strip() for s in sections if s.strip()]
+
+    # Create a dictionary to hold {'filepath': 'content'}
+    file_data = {}
+
+    # The new logic iterates through the chunks. When it finds a file header,
+    # it assumes the *next* chunk is the content for that file.
+    i = 0
+    while i < len(file_chunks):
+        chunk = file_chunks[i]
+        if chunk.startswith('FILE:'):
+            # This chunk is a file header. Extract the path.
+            # It might have other text like the tree, so we find the 'FILE:' line specifically.
+            path_line = [line for line in chunk.splitlines() if line.startswith('FILE:')][0]
+            relative_path = path_line[5:].strip()
+
+            # The very next chunk in the list is the content for this file.
+            if i + 1 < len(file_chunks):
+                content = file_chunks[i + 1]
+                file_data[relative_path] = content
+                # We've processed the header and the content, so we can skip the next item.
+                i += 2
+            else:
+                # Found a file header without any content after it (end of file).
+                file_data[relative_path] = ''  # Create an empty file
+                i += 1
+        else:
+            # This chunk is not a file header, so we skip it (e.g., the initial tree view).
+            i += 1
+
+    if not file_data:
+        print("\nERROR: Could not find any valid 'FILE:' sections. Nothing to create.")
+        return
+
+    # Create the main output directory if it doesn't already exist.
+    if not os.path.exists(root_dir):
+        print(f"\nCreating root directory: '{root_dir}'")
+        os.makedirs(root_dir)
+    else:
+        print(f"\nOutput directory '{root_dir}' already exists. Files may be overwritten.")
+
+    # Now, create the directories and write the populated files.
+    for relative_path, file_content in file_data.items():
+        full_path = os.path.join(root_dir, relative_path)
+        parent_dir = os.path.dirname(full_path)
+
+        # Ensure the directory for the file exists (e.g., 'src/components/ui/').
+        if parent_dir:
+            os.makedirs(parent_dir, exist_ok=True)
+
+        # Write the captured content into the file.
+        try:
+            with open(full_path, 'w', encoding='utf-8') as f:
+                f.write(file_content)
+            print(f"  - Created and populated: {full_path}")
+        except Exception as e:
+            print(f"  - FAILED to create file {full_path}: {e}")
+
+    print(f"\nProject reconstruction complete!")
+    print(f"Check the '{root_dir}' directory to see your populated project.")
+
+
+# --- Script Execution ---
+if __name__ == '__main__':
+    reconstruct_and_populate_project(file_path=INPUT_FILENAME, root_dir=OUTPUT_DIRECTORY)
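The new reconstruct.py parses a dirshot snapshot by splitting the text on an 80-hyphen separator, then pairing each chunk that starts with `FILE:` with the chunk immediately after it as that file's content. A minimal, self-contained sketch of that pairing logic follows; the sample snapshot string and the `src/app.py` path are hypothetical, not taken from real dirshot output.

```python
# Hypothetical snapshot in the format reconstruct.py expects: a tree view,
# then alternating "FILE:" header chunks and content chunks, all separated
# by a line of 80 hyphens.
SEP = '-' * 80

snapshot = SEP.join([
    "project/\n  src/\n    app.py",  # leading tree view (skipped by the parser)
    "FILE: src/app.py",              # header chunk
    "print('hello')\n",              # content chunk
])

def parse(text):
    # Split on the separator and drop empty chunks, as reconstruct.py does.
    chunks = [s.strip() for s in text.split(SEP) if s.strip()]
    files, i = {}, 0
    while i < len(chunks):
        if chunks[i].startswith('FILE:'):
            path = chunks[i][5:].strip()
            # The next chunk (if any) is this file's content.
            files[path] = chunks[i + 1] if i + 1 < len(chunks) else ''
            i += 2
        else:
            i += 1  # e.g., the initial tree view
    return files

print(parse(snapshot))  # → {'src/app.py': "print('hello')"}
```

Note that `strip()` on each chunk means leading and trailing blank lines inside file contents are not preserved, which matches the behavior of the shipped script.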
|
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: dirshot
-Version: 0.2.0
+Version: 0.3.0
 Summary: A flexible, high-performance utility for creating project snapshots and searching files with a rich terminal UI.
 Author-email: init-helpful <init.helpful@gmail.com>
 Project-URL: Homepage, https://github.com/init-helpful/dirshot
@@ -168,10 +168,11 @@ The `generate_snapshot()` function accepts the following parameters:
 | `root_directory` | `str` | `"."` | The starting directory for the scan. |
 | `output_file_name` | `str` | `"project_snapshot.txt"` | The name of the file to save the results to. |
 | `search_keywords` | `Optional[List[str]]` | `None` | If provided, switches to **Search Mode**. Otherwise, runs in **Snapshot Mode**. |
+| `files` | `Optional[List[str]]` | `None` | A list of specific filenames to include. If provided, checks this list first before extensions. |
 | `language_presets` | `Optional[List[LanguagePreset]]` | `None` | A list of `LanguagePreset` enums for common file types (e.g., `LanguagePreset.PYTHON`). |
 | `ignore_presets` | `Optional[List[IgnorePreset]]` | `None` | A list of `IgnorePreset` enums for common ignore patterns (e.g., `IgnorePreset.NODE_JS`). |
 | `file_extensions` | `Optional[List[str]]` | `None` | A manual list of file extensions to include (e.g., `[".py", ".md"]`). |
-| `ignore_if_in_path` | `Optional[List[str]]` | `None` | A
+| `ignore_if_in_path` | `Optional[List[str]]` | `None` | A list of directory or file substring names to exclude (e.g., `["temp"]` excludes `src/temp/file.py`). |
 | `ignore_extensions` | `Optional[List[str]]` | `None` | A manual list of file extensions to explicitly ignore (e.g., `[".log", ".tmp"]`). |
 | `search_file_contents` | `bool` | `True` | In Search Mode, search for keywords within file contents. |
 | `generate_tree` | `bool` | `True` | Include a file tree of the matched files at the top of the output. |
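The table rows above describe a filtering order: an explicit `files` list is checked before extension matching, and `ignore_if_in_path` excludes any path containing one of its substrings. A hedged, standalone sketch of that precedence, using the documented `src/temp/file.py` example; this illustrates the described behavior and is not dirshot's actual implementation.

```python
import os

def should_include(path, files=None, file_extensions=None, ignore_if_in_path=None):
    # Exclusion by path substring wins outright, per the `ignore_if_in_path` row.
    if ignore_if_in_path and any(part in path for part in ignore_if_in_path):
        return False
    name = os.path.basename(path)
    # Explicit filenames are consulted before extensions, per the `files` row.
    if files and name in files:
        return True
    # Fall back to extension matching, per the `file_extensions` row.
    if file_extensions and os.path.splitext(name)[1] in file_extensions:
        return True
    return False

print(should_include("src/temp/file.py", file_extensions=[".py"],
                     ignore_if_in_path=["temp"]))  # → False
print(should_include("src/main.py", file_extensions=[".py"]))  # → True
```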
@@ -180,6 +181,9 @@ The `generate_snapshot()` function accepts the following parameters:
 | `exclude_whitespace_in_token_count` | `bool` | `False` | If `True`, removes whitespace before counting tokens for a more compact count. |
 | `max_workers` | `Optional[int]` | `CPU count + 4` | The maximum number of worker threads for concurrent processing. |
 | `read_binary_files` | `bool` | `False` | If `True`, the content search will attempt to read and search through binary files. |
+| `only_show_tree` | `bool` | `False` | If `True`, the output file will contain only the file tree (and stats), omitting file content. |
+| `case_sensitive_filter` | `bool` | `False` | If `True`, file filtering (extensions, ignore paths) is case-sensitive. |
+| `case_sensitive_search` | `bool` | `False` | If `True`, keyword searching is case-sensitive. |
 
 ## 🤝 Contributing
 
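The new `case_sensitive_search` flag documented above defaults to case-insensitive matching. A minimal sketch of what that toggle means for keyword matching; this is an illustration of the documented semantics, not dirshot's implementation.

```python
def matches(text, keywords, case_sensitive_search=False):
    # By default, fold both the text and the keywords to lowercase
    # so "TODO" and "todo" match; with the flag set, compare exactly.
    if not case_sensitive_search:
        text = text.lower()
        keywords = [k.lower() for k in keywords]
    return any(k in text for k in keywords)

print(matches("TODO: refactor", ["todo"]))                              # → True
print(matches("TODO: refactor", ["todo"], case_sensitive_search=True))  # → False
```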
@@ -191,4 +195,3 @@ Contributions are welcome! Please feel free to submit a pull request or open an
 4. Commit your changes (`git commit -m 'Add some feature'`).
 5. Push to the branch (`git push origin feature/your-feature-name`).
 6. Open a pull request.
-
File without changes (6 files)