gitingest 0.3.0 → 0.3.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +57 -29
- data/index.html +363 -0
- data/lib/gitingest/generator.rb +176 -38
- data/lib/gitingest/version.rb +1 -1
- metadata +2 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 14bcb35132327c7725e69a895d56b4e88fb21db78fa5473c9afb4d08b879b7ee
+  data.tar.gz: f3a5e06bec7566a268342678887bac14989d69d26f10cbb590e1f594c6779b89
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1aed7d97acae8b6a1c2b15757cdc9852d802c99b83aa73caf114dfc904d0b05c44b70c853cd453fc12868310c09b15bf6bf71dcac203c01b5dfde241dc7ab0f5
+  data.tar.gz: 07145ca986675723ad0371c873c176384631da660bed227b31ef4cf2680d72cc98479b4da8fb990fedb3dce0eab74ca5114c1222eea0ca1890a85d73726a8d4b
data/CHANGELOG.md
CHANGED
@@ -1,39 +1,67 @@
 # Changelog
 
-
+## [0.3.1] - 2025-03-03
+
+### Added
+- Introduced configurable threading options:
+  - `:threads` to specify the number of threads (default: auto-detected).
+  - `:thread_timeout` to define thread pool shutdown timeout (default: 60 seconds).
+- Implemented thread-local buffers to reduce mutex contention during file processing.
+- Added exponential backoff with jitter for rate-limited API requests.
+- Improved progress indicator with a visual progress bar and estimated time remaining.
+
+### Changed
+- Increased `BUFFER_SIZE` from 100 to 250 to reduce I/O operations.
+- Optimized file exclusion check using a combined regex for faster matching.
+- Improved thread pool efficiency by prioritizing smaller files first.
+- Enhanced error handling with detailed logging and thread-safe error collection.
+
+### Fixed
+- Ensured thread pool shutdown respects the configured timeout.
+- Resolved potential race conditions in file content retrieval.
+
+---
 
 ## [0.3.0] - 2025-03-02
-
-
-- Added
--
--
--
--
--
+
+### Added
+- Added `faraday-retry` gem dependency for better API rate limit handling.
+- Implemented thread-safe buffer management with mutex locks.
+- Introduced `ProgressIndicator` class for enhanced CLI progress reporting, including percentages.
+- Improved memory efficiency with a configurable buffer size.
+- Enhanced code organization by introducing dedicated methods for file content formatting.
+- Added comprehensive method documentation and parameter descriptions.
+- Optimized thread pool size calculation for improved performance.
+- Improved error handling in concurrent operations.
+
+---
 
 ## [0.2.0] - 2025-03-02
-
-
--
-- Added
--
--
--
--
--
+
+### Added
+- Introduced support for quiet and verbose modes in the command-line interface.
+- Added the ability to specify a custom output file for the prompt.
+- Implemented enhanced error handling with logging support.
+- Introduced logging functionality with customizable loggers.
+- Added rate limit handling with retries for file fetching.
+- Implemented repository branch support.
+- Enabled exclusion of specific file patterns via command-line arguments.
+- Enforced a 1000-file limit to prevent memory overload.
+- Updated version to `0.2.0`.
+
+---
 
 ## [0.1.0] - 2025-03-02
 
 ### Added
-- Initial release of Gitingest
-- Core functionality to fetch and process GitHub repository files
-- Command-line interface for easy interaction
-- Smart file filtering with default exclusions for common non-code files
-- Concurrent processing for improved performance
-- Custom exclude patterns support
-- GitHub authentication via access tokens
-- Automatic rate limit handling with retry mechanism
-- Repository prompt generation with file separation markers
-- Support for custom branch selection
-- Custom output file naming options
+- Initial release of Gitingest.
+- Core functionality to fetch and process GitHub repository files.
+- Command-line interface for easy interaction.
+- Smart file filtering with default exclusions for common non-code files.
+- Concurrent processing for improved performance.
+- Custom exclude patterns support.
+- GitHub authentication via access tokens.
+- Automatic rate limit handling with a retry mechanism.
+- Repository prompt generation with file separation markers.
+- Support for custom branch selection.
+- Custom output file naming options.
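The "exponential backoff with jitter" entry in the 0.3.1 changelog can be illustrated in isolation. The sketch below is not the gem's code: `backoff_delay`, `with_backoff`, `RateLimited`, and the injectable `sleeper` are hypothetical names introduced here. Only the delay formula (the base delay raised to the attempt number, scaled by a random factor between 0.8 and 1.2) is taken from the `generator.rb` change in this diff.

```ruby
# Hypothetical stand-in for a rate-limit error such as Octokit::TooManyRequests.
class RateLimited < StandardError; end

# Mirrors the formula in the diff:
#   delay = base_delay**(4 - retries) * (0.8 + 0.4 * rand)
# With max_retries = 3, the exponent is 1, 2, 3 on successive retries.
def backoff_delay(retries_left, base_delay: 2, max_retries: 3, rng: Random.new)
  attempt = max_retries + 1 - retries_left
  jitter  = 0.8 + 0.4 * rng.rand # spreads retries over roughly +/-20 percent
  (base_delay**attempt) * jitter
end

# Runs the block, retrying with a growing, jittered delay when it is rate limited.
# `sleeper` is injectable so the waiting can be observed without actually sleeping.
def with_backoff(retries: 3, base_delay: 2, sleeper: ->(seconds) { sleep(seconds) })
  yield
rescue RateLimited
  raise unless retries.positive?

  sleeper.call(backoff_delay(retries, base_delay: base_delay))
  retries -= 1
  retry
end
```

With the defaults, the three retries wait roughly 2, 4, and 8 seconds, each varied by up to 20 percent so that concurrent workers are less likely to retry at the same instant.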
data/index.html
ADDED
@@ -0,0 +1,363 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>Gitingest - GitHub Repository Fetcher and Prompt Generator</title>
+  <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/github-markdown-css/5.2.0/github-markdown.min.css">
+  <style>
+    :root {
+      --bg-color: #0d1117;
+      --text-color: #c9d1d9;
+      --link-color: #58a6ff;
+      --header-color: #f0f6fc;
+      --border-color: #30363d;
+      --code-bg: #161b22;
+      --code-block-bg: #0d1117;
+      --accent-color: #238636;
+      --accent-hover: #2ea043;
+    }
+
+    body {
+      font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Helvetica, Arial, sans-serif;
+      line-height: 1.6;
+      color: var(--text-color);
+      background-color: var(--bg-color);
+      max-width: 900px;
+      margin: 0 auto;
+      padding: 20px;
+    }
+
+    .container {
+      border: 1px solid var(--border-color);
+      border-radius: 6px;
+      padding: 30px;
+      margin-bottom: 20px;
+      background-color: #0d1117;
+    }
+
+    .header {
+      display: flex;
+      align-items: center;
+      margin-bottom: 30px;
+    }
+
+    .logo {
+      width: 60px;
+      height: 60px;
+      margin-right: 15px;
+      background-color: var(--accent-color);
+      border-radius: 50%;
+      display: flex;
+      align-items: center;
+      justify-content: center;
+      color: white;
+      font-size: 24px;
+      font-weight: bold;
+    }
+
+    h1, h2, h3 {
+      color: var(--header-color);
+      border-bottom: 1px solid var(--border-color);
+      padding-bottom: 10px;
+      margin-top: 24px;
+      margin-bottom: 16px;
+    }
+
+    h1 {
+      font-size: 2em;
+      margin-bottom: 0.5em;
+      border-bottom: none;
+      padding-bottom: 0;
+    }
+
+    .header h1 {
+      margin: 0;
+      line-height: 1.3;
+    }
+
+    a {
+      color: var(--link-color);
+      text-decoration: none;
+    }
+
+    a:hover {
+      text-decoration: underline;
+    }
+
+    code {
+      font-family: "SFMono-Regular", Consolas, "Liberation Mono", Menlo, monospace;
+      background-color: var(--code-bg);
+      border-radius: 3px;
+      padding: 2px 4px;
+      font-size: 0.9em;
+    }
+
+    pre {
+      background-color: var(--code-block-bg);
+      border-radius: 6px;
+      padding: 16px;
+      overflow: auto;
+      border: 1px solid var(--border-color);
+      margin: 16px 0;
+    }
+
+    pre code {
+      background-color: transparent;
+      padding: 0;
+      border-radius: 0;
+      white-space: pre;
+    }
+
+    ul, ol {
+      padding-left: 2em;
+    }
+
+    .button {
+      display: inline-block;
+      background-color: var(--accent-color);
+      color: white;
+      padding: 8px 16px;
+      border-radius: 6px;
+      font-weight: 600;
+      margin: 8px 0;
+    }
+
+    .button:hover {
+      background-color: var(--accent-hover);
+      text-decoration: none;
+    }
+
+    .version-badge {
+      display: inline-block;
+      background-color: #238636;
+      color: white;
+      border-radius: 20px;
+      padding: 4px 10px;
+      font-size: 12px;
+      font-weight: bold;
+      margin-left: 10px;
+    }
+
+    footer {
+      margin-top: 40px;
+      text-align: center;
+      color: #8b949e;
+      font-size: 0.9em;
+      border-top: 1px solid var(--border-color);
+      padding-top: 20px;
+    }
+
+    .changelog {
+      margin-top: 30px;
+    }
+
+    .changelog-item {
+      margin-bottom: 24px;
+    }
+
+    .changelog-version {
+      font-weight: bold;
+      color: var(--header-color);
+    }
+
+    .changelog-date {
+      color: #8b949e;
+      font-size: 0.9em;
+    }
+
+    .changelog-list {
+      margin-top: 10px;
+    }
+  </style>
+</head>
+<body>
+  <div class="container">
+    <div class="header">
+      <div class="logo">G</div>
+      <div>
+        <h1>Gitingest <span class="version-badge">v0.3.0</span></h1>
+        <p>A Ruby gem that fetches files from a GitHub repository and generates a consolidated text prompt for LLMs</p>
+      </div>
+    </div>
+
+    <a href="https://github.com/davidesantangelo/gitingest" class="button">View on GitHub</a>
+    <a href="https://rubygems.org/gems/gitingest" class="button">View on RubyGems</a>
+
+    <h2>Installation</h2>
+
+    <h3>From RubyGems</h3>
+    <pre><code>gem install gitingest</code></pre>
+
+    <h3>From Source</h3>
+    <pre><code>git clone https://github.com/davidesantangelo/gitingest.git
+cd gitingest
+bundle install
+bundle exec rake install</code></pre>
+
+    <h2>Usage</h2>
+
+    <h3>Command Line</h3>
+    <pre><code># Basic usage (public repository)
+gitingest --repository user/repo
+
+# With GitHub token for private repositories
+gitingest --repository user/repo --token YOUR_GITHUB_TOKEN
+
+# Specify a custom output file
+gitingest --repository user/repo --output my_prompt.txt
+
+# Specify a different branch
+gitingest --repository user/repo --branch develop
+
+# Exclude additional patterns
+gitingest --repository user/repo --exclude "*.md,docs/"
+
+# Quiet mode
+gitingest --repository user/repo --quiet
+
+# Verbose mode
+gitingest --repository user/repo --verbose</code></pre>
+
+    <h4>Available Options</h4>
+    <ul>
+      <li><code>-r, --repository REPO</code>: GitHub repository (username/repo) [Required]</li>
+      <li><code>-t, --token TOKEN</code>: GitHub personal access token [Optional but recommended]</li>
+      <li><code>-o, --output FILE</code>: Output file for the prompt [Default: reponame_prompt.txt]</li>
+      <li><code>-e, --exclude PATTERN</code>: File patterns to exclude (comma separated)</li>
+      <li><code>-b, --branch BRANCH</code>: Repository branch [Default: main]</li>
+      <li><code>-h, --help</code>: Show help message</li>
+    </ul>
+
+    <h3>As a Library</h3>
+    <pre><code>require "gitingest"
+
+# Basic usage
+generator = Gitingest::Generator.new(
+  repository: "user/repo",
+  token: "YOUR_GITHUB_TOKEN" # optional
+)
+generator.run
+
+# With custom options
+generator = Gitingest::Generator.new(
+  repository: "user/repo",
+  token: "YOUR_GITHUB_TOKEN",
+  output_file: "my_prompt.txt",
+  branch: "develop",
+  exclude: ["*.md", "docs/"],
+  quiet: true # or verbose: true
+)
+generator.run
+
+# With custom logger
+custom_logger = Logger.new("gitingest.log")
+generator = Gitingest::Generator.new(
+  repository: "user/repo",
+  logger: custom_logger
+)
+generator.run</code></pre>
+
+    <h2>Features</h2>
+    <ul>
+      <li>Fetches all files from a GitHub repository based on the given branch</li>
+      <li>Automatically excludes common binary files and system files by default</li>
+      <li>Allows custom exclusion patterns for specific file extensions or directories</li>
+      <li>Uses concurrent processing for faster downloads</li>
+      <li>Handles GitHub API rate limiting with automatic retry</li>
+      <li>Generates a clean, formatted output file with file paths and content</li>
+    </ul>
+
+    <h2>Default Exclusion Patterns</h2>
+    <p>By default, the generator excludes files and directories commonly ignored in repositories, such as:</p>
+    <ul>
+      <li>Version control files (<code>.git/</code>, <code>.svn/</code>)</li>
+      <li>System files (<code>.DS_Store</code>, <code>Thumbs.db</code>)</li>
+      <li>Log files (<code>*.log</code>, <code>*.bak</code>)</li>
+      <li>Images and media files (<code>*.png</code>, <code>*.jpg</code>, <code>*.mp3</code>)</li>
+      <li>Archives (<code>*.zip</code>, <code>*.tar.gz</code>)</li>
+      <li>Dependency directories (<code>node_modules/</code>, <code>vendor/</code>)</li>
+      <li>Compiled and binary files (<code>*.pyc</code>, <code>*.class</code>, <code>*.exe</code>)</li>
+    </ul>
+
+    <h2>Limitations</h2>
+    <ul>
+      <li>To prevent memory overload, only the first 1000 files will be processed</li>
+      <li>API requests are subject to GitHub limits (60 requests/hour without token, 5000 requests/hour with token)</li>
+      <li>Private repositories require a GitHub personal access token</li>
+    </ul>
+
+    <div class="changelog">
+      <h2>Changelog</h2>
+
+      <div class="changelog-item">
+        <div>
+          <span class="changelog-version">v0.3.0</span>
+          <span class="changelog-date">- March 2, 2025</span>
+        </div>
+        <ul class="changelog-list">
+          <li>Added <code>faraday-retry</code> gem dependency for better API rate limit handling</li>
+          <li>Implemented thread-safe buffer management with mutex locks</li>
+          <li>Added new <code>ProgressIndicator</code> class for better CLI progress reporting (showing percentages)</li>
+          <li>Improved memory efficiency with configurable buffer size</li>
+          <li>Enhanced code organization with dedicated methods for file content formatting</li>
+          <li>Added comprehensive method documentation and parameter descriptions</li>
+          <li>Optimized thread pool size calculation for better performance</li>
+          <li>Improved error handling in concurrent operations</li>
+        </ul>
+      </div>
+
+      <div class="changelog-item">
+        <div>
+          <span class="changelog-version">v0.2.0</span>
+          <span class="changelog-date">- March 2, 2025</span>
+        </div>
+        <ul class="changelog-list">
+          <li>Added support for quiet and verbose modes in the command-line interface</li>
+          <li>Added the ability to specify a custom output file for the prompt</li>
+          <li>Enhanced error handling with logging support</li>
+          <li>Added logging functionality with custom loggers</li>
+          <li>Introduced rate limit handling with retries for file fetching</li>
+          <li>Added repository branch support</li>
+          <li>Exclude specific file patterns via command-line arguments</li>
+          <li>Enforced a 1000 file limit to prevent memory overload</li>
+        </ul>
+      </div>
+
+      <div class="changelog-item">
+        <div>
+          <span class="changelog-version">v0.1.0</span>
+          <span class="changelog-date">- March 2, 2025</span>
+        </div>
+        <ul class="changelog-list">
+          <li>Initial release of Gitingest</li>
+          <li>Core functionality to fetch and process GitHub repository files</li>
+          <li>Command-line interface for easy interaction</li>
+          <li>Smart file filtering with default exclusions for common non-code files</li>
+          <li>Concurrent processing for improved performance</li>
+          <li>Custom exclude patterns support</li>
+          <li>GitHub authentication via access tokens</li>
+          <li>Automatic rate limit handling with retry mechanism</li>
+          <li>Repository prompt generation with file separation markers</li>
+          <li>Support for custom branch selection</li>
+          <li>Custom output file naming options</li>
+        </ul>
+      </div>
+    </div>
+
+    <h2>Contributing</h2>
+    <p>Bug reports and pull requests are welcome on GitHub at <a href="https://github.com/davidesantangelo/gitingest">https://github.com/davidesantangelo/gitingest</a>.</p>
+
+    <h2>Acknowledgements</h2>
+    <p>Inspired by <a href="https://github.com/cyclotruc/gitingest"><code>cyclotruc/gitingest</code></a>.</p>
+
+    <h2>License</h2>
+    <p>The gem is available as open source under the terms of the <a href="https://opensource.org/licenses/MIT">MIT License</a>.</p>
+  </div>
+
+  <footer>
+    <p>© 2025 David Santangelo</p>
+    <p>Last updated: March 2, 2025</p>
+  </footer>
+</body>
+</html>
data/lib/gitingest/generator.rb
CHANGED
@@ -65,9 +65,21 @@ module Gitingest
       "\.swiftpm/", "\.build/"
     ].freeze
 
+    # Optimization: pattern for dot files/directories
+    DOT_FILE_PATTERN = %r{(?-mix:(^\.|/\.))}
+
     # Maximum number of files to process to prevent memory overload
     MAX_FILES = 1000
-
+
+    # Optimization: increased buffer size to reduce I/O operations
+    BUFFER_SIZE = 250
+
+    # Optimization: thread-local buffer threshold
+    LOCAL_BUFFER_THRESHOLD = 50
+
+    # Add configurable threading options
+    DEFAULT_THREAD_COUNT = [Concurrent.processor_count, 8].min
+    DEFAULT_THREAD_TIMEOUT = 60 # seconds
 
     attr_reader :options, :client, :repo_files, :excluded_patterns, :logger
 
@@ -82,6 +94,8 @@ module Gitingest
     # @option options [Boolean] :quiet Reduce logging to errors only
     # @option options [Boolean] :verbose Increase logging verbosity
     # @option options [Logger] :logger Custom logger instance
+    # @option options [Integer] :threads Number of threads to use (default: auto-detected)
+    # @option options [Integer] :thread_timeout Seconds to wait for thread pool shutdown (default: 60)
     def initialize(options = {})
       @options = options
       @repo_files = []
@@ -121,6 +135,8 @@ module Gitingest
       @options[:output_file] ||= "#{@options[:repository].split("/").last}_prompt.txt"
       @options[:branch] ||= "main"
       @options[:exclude] ||= []
+      @options[:threads] ||= DEFAULT_THREAD_COUNT
+      @options[:thread_timeout] ||= DEFAULT_THREAD_TIMEOUT
       @excluded_patterns = DEFAULT_EXCLUDES + @options[:exclude]
     end
 
@@ -136,9 +152,10 @@ module Gitingest
       end
     end
 
-    #
+    # Optimization: Create a combined regex for faster exclusion checking
     def compile_excluded_patterns
-
+      patterns = @excluded_patterns.map { |pattern| "(#{pattern})" }
+      @combined_exclude_regex = Regexp.new("#{DOT_FILE_PATTERN.source}|#{patterns.join("|")}")
     end
 
     # Fetch repository contents and apply exclusion filters
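The hunk above replaces a per-pattern scan with a single alternation. As a minimal sketch of the same idea, the snippet below builds the combined pattern with `Regexp.union` rather than the string interpolation used in the diff; that substitution, the shortened `PATTERNS` list, and the `excluded?` helper are assumptions made for illustration and are not the gem's code.

```ruby
# Hypothetical, shortened pattern list; the gem's DEFAULT_EXCLUDES is much longer.
PATTERNS = ['\.png$', '\.log$', 'node_modules/'].freeze

# Same shape as DOT_FILE_PATTERN in the diff: a leading dot, or a dot after a slash.
DOT_FILE = %r{(^\.|/\.)}

# One alternation, compiled once, instead of PATTERNS.any? { |p| path.match?(p) }.
COMBINED = Regexp.union(DOT_FILE, *PATTERNS.map { |pattern| Regexp.new(pattern) })

def excluded?(path)
  path.match?(COMBINED)
end
```

Compiling once keeps the per-file check to a single `match?` call; note that any path containing a dot-prefixed segment is excluded regardless of the pattern list.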
@@ -180,49 +197,93 @@ module Gitingest
       end
     end
 
-    #
+    # Optimization: Optimized file exclusion check with combined regex
     def excluded_file?(path)
-
-
-      @excluded_patterns.any? { |pattern| path.match?(pattern) }
+      path.match?(@combined_exclude_regex)
     end
 
-    # Generate the consolidated prompt file
+    # Generate the consolidated prompt file with optimized threading
     def generate_prompt
       @logger.info "Generating prompt..."
+      @logger.debug "Using thread pool with #{@options[:threads]} threads"
+
       buffer = []
       progress = ProgressIndicator.new(@repo_files.size, @logger)
 
-      #
-
+      # Optimization: thread-local buffers to reduce mutex contention
+      thread_buffers = {}
+      mutex = Mutex.new
+      errors = []
+
+      # Dynamic thread pool based on configuration
+      pool = Concurrent::FixedThreadPool.new(@options[:threads])
+
+      # Group files by priority (smaller files first for better parallelism)
+      prioritized_files = prioritize_files(@repo_files)
 
       File.open(@options[:output_file], "w") do |file|
-
+        prioritized_files.each_with_index do |repo_file, index|
           pool.post do
-
-
-
-
-
-
-
+            # Optimization: Use thread-local buffers
+            thread_id = Thread.current.object_id
+            thread_buffers[thread_id] ||= []
+            local_buffer = thread_buffers[thread_id]
+
+            begin
+              content = fetch_file_content_with_retry(repo_file.path)
+              result = format_file_content(repo_file.path, content)
+              local_buffer << result
+
+              # Optimization: Only acquire mutex when local buffer reaches threshold
+              if local_buffer.size >= LOCAL_BUFFER_THRESHOLD
+                mutex.synchronize do
+                  buffer.concat(local_buffer)
+                  write_buffer(file, buffer) if buffer.size >= BUFFER_SIZE
+                  local_buffer.clear
+                end
+              end
+
+              progress.update(index + 1)
+            rescue Octokit::Error => e
+              mutex.synchronize do
+                errors << "Error fetching #{repo_file.path}: #{e.message}"
+                @logger.error "Error fetching #{repo_file.path}: #{e.message}"
+              end
+            rescue StandardError => e
+              mutex.synchronize do
+                errors << "Unexpected error processing #{repo_file.path}: #{e.message}"
+                @logger.error "Unexpected error processing #{repo_file.path}: #{e.message}"
+              end
             end
-
-            progress.update(index + 1)
-          rescue Octokit::Error => e
-            @logger.error "Error fetching #{repo_file.path}: #{e.message}"
           end
         end
 
-
-
+        begin
+          pool.shutdown
+          wait_success = pool.wait_for_termination(@options[:thread_timeout])
 
-
-
+          unless wait_success
+            @logger.warn "Thread pool did not shut down within #{@options[:thread_timeout]} seconds, forcing termination"
+            pool.kill
+          end
+        rescue StandardError => e
+          @logger.error "Error during thread pool shutdown: #{e.message}"
+        end
+
+        # Process remaining files in thread-local buffers
+        mutex.synchronize do
+          thread_buffers.each_value do |local_buffer|
+            buffer.concat(local_buffer) unless local_buffer.empty?
+          end
           write_buffer(file, buffer) unless buffer.empty?
         end
       end
 
+      if errors.any?
+        @logger.warn "Completed with #{errors.size} errors"
+        @logger.debug "First few errors: #{errors.first(3).join(", ")}" if @logger.debug?
+      end
+
       @logger.info "Prompt generated and saved to #{@options[:output_file]}"
     end
 
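The thread-local buffering in this hunk can be tried without the `concurrent-ruby` dependency. The following is a stdlib-only sketch under stated assumptions: `process_all` and `FLUSH_THRESHOLD` are hypothetical names, plain `Thread` and `Queue` stand in for `Concurrent::FixedThreadPool`, and each worker keeps its buffer in a local variable rather than in a shared hash keyed by thread id as the diff does.

```ruby
# Hypothetical threshold; the diff uses LOCAL_BUFFER_THRESHOLD = 50.
FLUSH_THRESHOLD = 3

# Applies the block to every item on a small pool of threads and returns the
# collected results. Items must not be nil, because nil marks a drained queue.
def process_all(items, thread_count: 4)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # pop returns nil once the queue is closed and empty

  shared = []
  mutex  = Mutex.new

  workers = Array.new(thread_count) do
    Thread.new do
      local = [] # touched by this thread only, so appending needs no lock
      while (item = queue.pop)
        local << yield(item)
        next if local.size < FLUSH_THRESHOLD

        # Take the lock only when a whole batch is ready to hand over.
        mutex.synchronize { shared.concat(local) }
        local.clear
      end
      # Flush whatever is left after the queue is drained.
      mutex.synchronize { shared.concat(local) } unless local.empty?
    end
  end

  workers.each(&:join)
  shared
end
```

Results arrive in completion order rather than input order, so callers that need a stable order would have to sort or index them afterwards.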
@@ -237,45 +298,122 @@ module Gitingest
       TEXT
     end
 
-    # Fetch file content with
-    def fetch_file_content_with_retry(path, retries = 3)
+    # Optimization: Fetch file content with exponential backoff for rate limiting
+    def fetch_file_content_with_retry(path, retries = 3, base_delay = 2)
       content = @client.contents(@options[:repository], path: path, ref: @options[:branch])
       Base64.decode64(content.content)
     rescue Octokit::TooManyRequests
       raise unless retries.positive?
 
-
-
-
-
+      # Optimization: Exponential backoff with jitter for better rate limit handling
+      delay = base_delay**(4 - retries) * (0.8 + 0.4 * rand)
+      @logger.warn "Rate limit exceeded, waiting #{delay.round(1)} seconds..."
+      sleep(delay)
+      fetch_file_content_with_retry(path, retries - 1, base_delay)
     end
 
     # Write buffer contents to file and clear buffer
     def write_buffer(file, buffer)
+      return if buffer.empty?
+
       file.puts(buffer.join)
       buffer.clear
     end
 
-    #
-    def
-
+    # Sort files by estimated processing priority
+    def prioritize_files(files)
+      # Sort files by estimated size (based on extension)
+      # This helps with better thread distribution - process small files first
+      files.sort_by do |file|
+        path = file.path.downcase
+        if path.end_with?(".md", ".txt", ".json", ".yaml", ".yml")
+          0 # Process documentation and config files first (usually small)
+        elsif path.end_with?(".rb", ".py", ".js", ".ts", ".go", ".java", ".c", ".cpp", ".h")
+          1 # Then process code files (medium size)
+        else
+          2 # Other files last
+        end
+      end
     end
   end
 
-  # Helper class for showing progress in CLI
+  # Helper class for showing progress in CLI with visual bar
   class ProgressIndicator
+    BAR_WIDTH = 30 # Width of the progress bar
+
     def initialize(total, logger)
       @total = total
       @logger = logger
       @last_percent = 0
+      @start_time = Time.now
+      @last_update_time = Time.now
+      @update_interval = 0.5 # Limit updates to twice per second
     end
 
+    # Update progress with visual bar
     def update(current)
+      # Avoid updating too frequently
+      now = Time.now
+      return if now - @last_update_time < @update_interval && current != @total
+
+      @last_update_time = now
       percent = (current.to_f / @total * 100).round
-      return unless percent > @last_percent && ((percent % 5).zero? || current == @total)
 
-
+      # Only update at meaningful increments or completion
+      return unless percent > @last_percent || current == @total
+
+      elapsed = now - @start_time
+
+      # Generate progress bar
+      progress_chars = (BAR_WIDTH * (current.to_f / @total)).round
+      bar = "[#{"|" * progress_chars}#{" " * (BAR_WIDTH - progress_chars)}]"
+
+      # Calculate ETA
+      eta_string = ""
+      if current > 1 && percent < 100
+        remaining = (elapsed / current) * (@total - current)
+        eta_string = " ETA: #{format_time(remaining)}"
+      end
+
+      # Calculate rate (files per second)
+      rate = begin
+        current / elapsed
+      rescue StandardError
+        0
+      end
+      rate_string = " (#{rate.round(1)} files/sec)"
+
+      # Clear line and print progress bar
+      print "\r\e[K" # Clear the line
+      print "#{bar} #{percent}% | #{current}/#{@total} files#{rate_string}#{eta_string}"
+      print "\n" if current == @total # Add newline when complete
+
+      # Also log to logger at less frequent intervals
+      if (percent % 10).zero? && percent != @last_percent || current == @total
+        @logger.info "Processing: #{percent}% complete (#{current}/#{@total} files)#{eta_string}"
+      end
+
       @last_percent = percent
     end
+
+    private
+
+    # Format seconds into a human-readable time string
+    def format_time(seconds)
+      return "< 1s" if seconds < 1
+
+      case seconds
+      when 0...60
+        "#{seconds.round}s"
+      when 60...3600
+        minutes = (seconds / 60).floor
+        secs = (seconds % 60).round
+        "#{minutes}m #{secs}s"
+      else
+        hours = (seconds / 3600).floor
+        minutes = ((seconds % 3600) / 60).floor
+        "#{hours}h #{minutes}m"
+      end
+    end
   end
 end
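The progress bar and ETA arithmetic added to `ProgressIndicator` can be checked in isolation. This is a minimal sketch, assuming a hypothetical pure function `progress_line` that takes the elapsed time as an argument instead of reading the clock; `format_time` follows the version in the diff, and the rate suffix is left out for brevity.

```ruby
# Same branches as format_time in the diff.
def format_time(seconds)
  return "< 1s" if seconds < 1

  case seconds
  when 0...60
    "#{seconds.round}s"
  when 60...3600
    "#{(seconds / 60).floor}m #{(seconds % 60).round}s"
  else
    "#{(seconds / 3600).floor}h #{((seconds % 3600) / 60).floor}m"
  end
end

# Hypothetical helper: builds one progress line without printing it.
def progress_line(current, total, elapsed, width: 30)
  fraction = current.to_f / total
  filled   = (width * fraction).round
  bar      = "[#{'|' * filled}#{' ' * (width - filled)}]"
  percent  = (fraction * 100).round

  line = "#{bar} #{percent}% | #{current}/#{total} files"
  if current > 1 && percent < 100
    # Average time per finished file, multiplied by the files still to go.
    remaining = (elapsed / current) * (total - current)
    line += " ETA: #{format_time(remaining)}"
  end
  line
end
```

For example, 50 of 100 files after 10 seconds gives an average of 0.2 seconds per file and an ETA of `10s`. Because the seconds value is rounded after the range check, an input such as 59.6 is reported as `60s`.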
data/lib/gitingest/version.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: gitingest
 version: !ruby/object:Gem::Version
-  version: 0.3.
+  version: 0.3.1
 platform: ruby
 authors:
 - Davide Santangelo
@@ -131,6 +131,7 @@ files:
 - bin/console
 - bin/gitingest
 - bin/setup
+- index.html
 - lib/gitingest.rb
 - lib/gitingest/generator.rb
 - lib/gitingest/version.rb