gitingest 0.2.0 → 0.3.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +59 -21
- data/index.html +363 -0
- data/lib/gitingest/generator.rb +235 -44
- data/lib/gitingest/version.rb +1 -1
- metadata +16 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 14bcb35132327c7725e69a895d56b4e88fb21db78fa5473c9afb4d08b879b7ee
+  data.tar.gz: f3a5e06bec7566a268342678887bac14989d69d26f10cbb590e1f594c6779b89
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1aed7d97acae8b6a1c2b15757cdc9852d802c99b83aa73caf114dfc904d0b05c44b70c853cd453fc12868310c09b15bf6bf71dcac203c01b5dfde241dc7ab0f5
+  data.tar.gz: 07145ca986675723ad0371c873c176384631da660bed227b31ef4cf2680d72cc98479b4da8fb990fedb3dce0eab74ca5114c1222eea0ca1890a85d73726a8d4b
data/CHANGELOG.md
CHANGED
@@ -1,29 +1,67 @@
 # Changelog

-
+## [0.3.1] - 2025-03-03
+
+### Added
+- Introduced configurable threading options:
+  - `:threads` to specify the number of threads (default: auto-detected).
+  - `:thread_timeout` to define thread pool shutdown timeout (default: 60 seconds).
+- Implemented thread-local buffers to reduce mutex contention during file processing.
+- Added exponential backoff with jitter for rate-limited API requests.
+- Improved progress indicator with a visual progress bar and estimated time remaining.
+
+### Changed
+- Increased `BUFFER_SIZE` from 100 to 250 to reduce I/O operations.
+- Optimized file exclusion check using a combined regex for faster matching.
+- Improved thread pool efficiency by prioritizing smaller files first.
+- Enhanced error handling with detailed logging and thread-safe error collection.
+
+### Fixed
+- Ensured thread pool shutdown respects the configured timeout.
+- Resolved potential race conditions in file content retrieval.
+
+---
+
+## [0.3.0] - 2025-03-02
+
+### Added
+- Added `faraday-retry` gem dependency for better API rate limit handling.
+- Implemented thread-safe buffer management with mutex locks.
+- Introduced `ProgressIndicator` class for enhanced CLI progress reporting, including percentages.
+- Improved memory efficiency with a configurable buffer size.
+- Enhanced code organization by introducing dedicated methods for file content formatting.
+- Added comprehensive method documentation and parameter descriptions.
+- Optimized thread pool size calculation for improved performance.
+- Improved error handling in concurrent operations.
+
+---

 ## [0.2.0] - 2025-03-02
-
-
--
-- Added
--
--
--
--
--
+
+### Added
+- Introduced support for quiet and verbose modes in the command-line interface.
+- Added the ability to specify a custom output file for the prompt.
+- Implemented enhanced error handling with logging support.
+- Introduced logging functionality with customizable loggers.
+- Added rate limit handling with retries for file fetching.
+- Implemented repository branch support.
+- Enabled exclusion of specific file patterns via command-line arguments.
+- Enforced a 1000-file limit to prevent memory overload.
+- Updated version to `0.2.0`.
+
+---

 ## [0.1.0] - 2025-03-02

 ### Added
-- Initial release of Gitingest
-- Core functionality to fetch and process GitHub repository files
-- Command-line interface for easy interaction
-- Smart file filtering with default exclusions for common non-code files
-- Concurrent processing for improved performance
-- Custom exclude patterns support
-- GitHub authentication via access tokens
-- Automatic rate limit handling with retry mechanism
-- Repository prompt generation with file separation markers
-- Support for custom branch selection
-- Custom output file naming options
+- Initial release of Gitingest.
+- Core functionality to fetch and process GitHub repository files.
+- Command-line interface for easy interaction.
+- Smart file filtering with default exclusions for common non-code files.
+- Concurrent processing for improved performance.
+- Custom exclude patterns support.
+- GitHub authentication via access tokens.
+- Automatic rate limit handling with a retry mechanism.
+- Repository prompt generation with file separation markers.
+- Support for custom branch selection.
+- Custom output file naming options.
|
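The 0.3.1 entry above mentions exponential backoff with jitter for rate-limited API requests. A minimal Ruby sketch of such a delay schedule follows, assuming the same shape as the formula that appears in `generator.rb` (the base delay raised to the attempt number, scaled by a random factor between 0.8 and 1.2). `backoff_delay` and its `jitter` parameter are hypothetical names introduced here so the random part can be pinned for inspection; they are not part of the gem.

```ruby
# Hypothetical helper, not part of gitingest: computes the wait before a retry.
# attempt is 1 for the first retry, 2 for the second, and so on.
# jitter is a number in [0, 1); by default a fresh random value per call.
def backoff_delay(attempt, base_delay: 2, jitter: rand)
  base_delay**attempt * (0.8 + 0.4 * jitter)
end

# Print the delay bounds for three retries with base_delay = 2.
(1..3).each do |attempt|
  low = backoff_delay(attempt, jitter: 0.0)
  high = backoff_delay(attempt, jitter: 1.0) # upper bound, never actually reached
  puts "retry #{attempt}: between #{low.round(1)}s and #{high.round(1)}s"
end
```

The jitter spreads retries from concurrent threads over a window instead of letting them all hit the API again at the same instant, which is presumably why the gem combines it with the exponential growth.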
data/index.html
ADDED
@@ -0,0 +1,363 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Gitingest - GitHub Repository Fetcher and Prompt Generator</title>
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/github-markdown-css/5.2.0/github-markdown.min.css">
+<style>
+:root {
+--bg-color: #0d1117;
+--text-color: #c9d1d9;
+--link-color: #58a6ff;
+--header-color: #f0f6fc;
+--border-color: #30363d;
+--code-bg: #161b22;
+--code-block-bg: #0d1117;
+--accent-color: #238636;
+--accent-hover: #2ea043;
+}
+
+body {
+font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Helvetica, Arial, sans-serif;
+line-height: 1.6;
+color: var(--text-color);
+background-color: var(--bg-color);
+max-width: 900px;
+margin: 0 auto;
+padding: 20px;
+}
+
+.container {
+border: 1px solid var(--border-color);
+border-radius: 6px;
+padding: 30px;
+margin-bottom: 20px;
+background-color: #0d1117;
+}
+
+.header {
+display: flex;
+align-items: center;
+margin-bottom: 30px;
+}
+
+.logo {
+width: 60px;
+height: 60px;
+margin-right: 15px;
+background-color: var(--accent-color);
+border-radius: 50%;
+display: flex;
+align-items: center;
+justify-content: center;
+color: white;
+font-size: 24px;
+font-weight: bold;
+}
+
+h1, h2, h3 {
+color: var(--header-color);
+border-bottom: 1px solid var(--border-color);
+padding-bottom: 10px;
+margin-top: 24px;
+margin-bottom: 16px;
+}
+
+h1 {
+font-size: 2em;
+margin-bottom: 0.5em;
+border-bottom: none;
+padding-bottom: 0;
+}
+
+.header h1 {
+margin: 0;
+line-height: 1.3;
+}
+
+a {
+color: var(--link-color);
+text-decoration: none;
+}
+
+a:hover {
+text-decoration: underline;
+}
+
+code {
+font-family: "SFMono-Regular", Consolas, "Liberation Mono", Menlo, monospace;
+background-color: var(--code-bg);
+border-radius: 3px;
+padding: 2px 4px;
+font-size: 0.9em;
+}
+
+pre {
+background-color: var(--code-block-bg);
+border-radius: 6px;
+padding: 16px;
+overflow: auto;
+border: 1px solid var(--border-color);
+margin: 16px 0;
+}
+
+pre code {
+background-color: transparent;
+padding: 0;
+border-radius: 0;
+white-space: pre;
+}
+
+ul, ol {
+padding-left: 2em;
+}
+
+.button {
+display: inline-block;
+background-color: var(--accent-color);
+color: white;
+padding: 8px 16px;
+border-radius: 6px;
+font-weight: 600;
+margin: 8px 0;
+}
+
+.button:hover {
+background-color: var(--accent-hover);
+text-decoration: none;
+}
+
+.version-badge {
+display: inline-block;
+background-color: #238636;
+color: white;
+border-radius: 20px;
+padding: 4px 10px;
+font-size: 12px;
+font-weight: bold;
+margin-left: 10px;
+}
+
+footer {
+margin-top: 40px;
+text-align: center;
+color: #8b949e;
+font-size: 0.9em;
+border-top: 1px solid var(--border-color);
+padding-top: 20px;
+}
+
+.changelog {
+margin-top: 30px;
+}
+
+.changelog-item {
+margin-bottom: 24px;
+}
+
+.changelog-version {
+font-weight: bold;
+color: var(--header-color);
+}
+
+.changelog-date {
+color: #8b949e;
+font-size: 0.9em;
+}
+
+.changelog-list {
+margin-top: 10px;
+}
+</style>
+</head>
+<body>
+<div class="container">
+<div class="header">
+<div class="logo">G</div>
+<div>
+<h1>Gitingest <span class="version-badge">v0.3.0</span></h1>
+<p>A Ruby gem that fetches files from a GitHub repository and generates a consolidated text prompt for LLMs</p>
+</div>
+</div>
+
+<a href="https://github.com/davidesantangelo/gitingest" class="button">View on GitHub</a>
+<a href="https://rubygems.org/gems/gitingest" class="button">View on RubyGems</a>
+
+<h2>Installation</h2>
+
+<h3>From RubyGems</h3>
+<pre><code>gem install gitingest</code></pre>
+
+<h3>From Source</h3>
+<pre><code>git clone https://github.com/davidesantangelo/gitingest.git
+cd gitingest
+bundle install
+bundle exec rake install</code></pre>
+
+<h2>Usage</h2>
+
+<h3>Command Line</h3>
+<pre><code># Basic usage (public repository)
+gitingest --repository user/repo
+
+# With GitHub token for private repositories
+gitingest --repository user/repo --token YOUR_GITHUB_TOKEN
+
+# Specify a custom output file
+gitingest --repository user/repo --output my_prompt.txt
+
+# Specify a different branch
+gitingest --repository user/repo --branch develop
+
+# Exclude additional patterns
+gitingest --repository user/repo --exclude "*.md,docs/"
+
+# Quiet mode
+gitingest --repository user/repo --quiet
+
+# Verbose mode
+gitingest --repository user/repo --verbose</code></pre>
+
+<h4>Available Options</h4>
+<ul>
+<li><code>-r, --repository REPO</code>: GitHub repository (username/repo) [Required]</li>
+<li><code>-t, --token TOKEN</code>: GitHub personal access token [Optional but recommended]</li>
+<li><code>-o, --output FILE</code>: Output file for the prompt [Default: reponame_prompt.txt]</li>
+<li><code>-e, --exclude PATTERN</code>: File patterns to exclude (comma separated)</li>
+<li><code>-b, --branch BRANCH</code>: Repository branch [Default: main]</li>
+<li><code>-h, --help</code>: Show help message</li>
+</ul>
+
+<h3>As a Library</h3>
+<pre><code>require "gitingest"
+
+# Basic usage
+generator = Gitingest::Generator.new(
+repository: "user/repo",
+token: "YOUR_GITHUB_TOKEN" # optional
+)
+generator.run
+
+# With custom options
+generator = Gitingest::Generator.new(
+repository: "user/repo",
+token: "YOUR_GITHUB_TOKEN",
+output_file: "my_prompt.txt",
+branch: "develop",
+exclude: ["*.md", "docs/"],
+quiet: true # or verbose: true
+)
+generator.run
+
+# With custom logger
+custom_logger = Logger.new("gitingest.log")
+generator = Gitingest::Generator.new(
+repository: "user/repo",
+logger: custom_logger
+)
+generator.run</code></pre>
+
+<h2>Features</h2>
+<ul>
+<li>Fetches all files from a GitHub repository based on the given branch</li>
+<li>Automatically excludes common binary files and system files by default</li>
+<li>Allows custom exclusion patterns for specific file extensions or directories</li>
+<li>Uses concurrent processing for faster downloads</li>
+<li>Handles GitHub API rate limiting with automatic retry</li>
+<li>Generates a clean, formatted output file with file paths and content</li>
+</ul>
+
+<h2>Default Exclusion Patterns</h2>
+<p>By default, the generator excludes files and directories commonly ignored in repositories, such as:</p>
+<ul>
+<li>Version control files (<code>.git/</code>, <code>.svn/</code>)</li>
+<li>System files (<code>.DS_Store</code>, <code>Thumbs.db</code>)</li>
+<li>Log files (<code>*.log</code>, <code>*.bak</code>)</li>
+<li>Images and media files (<code>*.png</code>, <code>*.jpg</code>, <code>*.mp3</code>)</li>
+<li>Archives (<code>*.zip</code>, <code>*.tar.gz</code>)</li>
+<li>Dependency directories (<code>node_modules/</code>, <code>vendor/</code>)</li>
+<li>Compiled and binary files (<code>*.pyc</code>, <code>*.class</code>, <code>*.exe</code>)</li>
+</ul>
+
+<h2>Limitations</h2>
+<ul>
+<li>To prevent memory overload, only the first 1000 files will be processed</li>
+<li>API requests are subject to GitHub limits (60 requests/hour without token, 5000 requests/hour with token)</li>
+<li>Private repositories require a GitHub personal access token</li>
+</ul>
+
+<div class="changelog">
+<h2>Changelog</h2>
+
+<div class="changelog-item">
+<div>
+<span class="changelog-version">v0.3.0</span>
+<span class="changelog-date">- March 2, 2025</span>
+</div>
+<ul class="changelog-list">
+<li>Added <code>faraday-retry</code> gem dependency for better API rate limit handling</li>
+<li>Implemented thread-safe buffer management with mutex locks</li>
+<li>Added new <code>ProgressIndicator</code> class for better CLI progress reporting (showing percentages)</li>
+<li>Improved memory efficiency with configurable buffer size</li>
+<li>Enhanced code organization with dedicated methods for file content formatting</li>
+<li>Added comprehensive method documentation and parameter descriptions</li>
+<li>Optimized thread pool size calculation for better performance</li>
+<li>Improved error handling in concurrent operations</li>
+</ul>
+</div>
+
+<div class="changelog-item">
+<div>
+<span class="changelog-version">v0.2.0</span>
+<span class="changelog-date">- March 2, 2025</span>
+</div>
+<ul class="changelog-list">
+<li>Added support for quiet and verbose modes in the command-line interface</li>
+<li>Added the ability to specify a custom output file for the prompt</li>
+<li>Enhanced error handling with logging support</li>
+<li>Added logging functionality with custom loggers</li>
+<li>Introduced rate limit handling with retries for file fetching</li>
+<li>Added repository branch support</li>
+<li>Exclude specific file patterns via command-line arguments</li>
+<li>Enforced a 1000 file limit to prevent memory overload</li>
+</ul>
+</div>
+
+<div class="changelog-item">
+<div>
+<span class="changelog-version">v0.1.0</span>
+<span class="changelog-date">- March 2, 2025</span>
+</div>
+<ul class="changelog-list">
+<li>Initial release of Gitingest</li>
+<li>Core functionality to fetch and process GitHub repository files</li>
+<li>Command-line interface for easy interaction</li>
+<li>Smart file filtering with default exclusions for common non-code files</li>
+<li>Concurrent processing for improved performance</li>
+<li>Custom exclude patterns support</li>
+<li>GitHub authentication via access tokens</li>
+<li>Automatic rate limit handling with retry mechanism</li>
+<li>Repository prompt generation with file separation markers</li>
+<li>Support for custom branch selection</li>
+<li>Custom output file naming options</li>
+</ul>
+</div>
+</div>
+
+<h2>Contributing</h2>
+<p>Bug reports and pull requests are welcome on GitHub at <a href="https://github.com/davidesantangelo/gitingest">https://github.com/davidesantangelo/gitingest</a>.</p>
+
+<h2>Acknowledgements</h2>
+<p>Inspired by <a href="https://github.com/cyclotruc/gitingest"><code>cyclotruc/gitingest</code></a>.</p>
+
+<h2>License</h2>
+<p>The gem is available as open source under the terms of the <a href="https://opensource.org/licenses/MIT">MIT License</a>.</p>
+</div>
+
+<footer>
+<p>© 2025 David Santangelo</p>
+<p>Last updated: March 2, 2025</p>
+</footer>
+</body>
+</html>
|
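The default exclusion patterns listed in `index.html` are applied in `generator.rb` through one combined regular expression. A minimal sketch of that idea follows, assuming a small illustrative pattern list rather than the gem's full `DEFAULT_EXCLUDES`. `SAMPLE_EXCLUDES`, `COMBINED_EXCLUDE` and `excluded?` are hypothetical names used only in this example.

```ruby
# Illustrative subset of patterns; the gem's real DEFAULT_EXCLUDES list is longer.
SAMPLE_EXCLUDES = ["\\.png$", "\\.log$", "node_modules/", "vendor/"].freeze

# Matches any path segment that starts with a dot (".git/", "src/.env").
DOT_FILE_PATTERN = %r{(^\.|/\.)}

# Join every pattern into one alternation so each path is scanned by one regex
# instead of being tested against every pattern in turn.
COMBINED_EXCLUDE = Regexp.new(
  ([DOT_FILE_PATTERN.source] + SAMPLE_EXCLUDES.map { |p| "(#{p})" }).join("|")
)

def excluded?(path)
  path.match?(COMBINED_EXCLUDE)
end

%w[
  lib/gitingest/generator.rb
  assets/logo.png
  .github/workflows/ci.yml
  node_modules/x/index.js
].each do |path|
  puts "#{path}: #{excluded?(path) ? "excluded" : "kept"}"
end
```

One consequence of folding the dot-file pattern into the alternation is that every dot file or dot directory is excluded, whatever the user-supplied patterns are.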
data/lib/gitingest/generator.rb
CHANGED
@@ -65,11 +65,37 @@ module Gitingest
       "\.swiftpm/", "\.build/"
     ].freeze

+    # Optimization: pattern for dot files/directories
+    DOT_FILE_PATTERN = %r{(?-mix:(^\.|/\.))}
+
     # Maximum number of files to process to prevent memory overload
     MAX_FILES = 1000

+    # Optimization: increased buffer size to reduce I/O operations
+    BUFFER_SIZE = 250
+
+    # Optimization: thread-local buffer threshold
+    LOCAL_BUFFER_THRESHOLD = 50
+
+    # Add configurable threading options
+    DEFAULT_THREAD_COUNT = [Concurrent.processor_count, 8].min
+    DEFAULT_THREAD_TIMEOUT = 60 # seconds
+
     attr_reader :options, :client, :repo_files, :excluded_patterns, :logger

+    # Initialize a new Generator with the given options
+    #
+    # @param options [Hash] Configuration options
+    # @option options [String] :repository GitHub repository in format "username/repo"
+    # @option options [String] :token GitHub personal access token
+    # @option options [String] :branch Repository branch (default: "main")
+    # @option options [String] :output_file Output file path
+    # @option options [Array<String>] :exclude Additional patterns to exclude
+    # @option options [Boolean] :quiet Reduce logging to errors only
+    # @option options [Boolean] :verbose Increase logging verbosity
+    # @option options [Logger] :logger Custom logger instance
+    # @option options [Integer] :threads Number of threads to use (default: auto-detected)
+    # @option options [Integer] :thread_timeout Seconds to wait for thread pool shutdown (default: 60)
     def initialize(options = {})
       @options = options
       @repo_files = []
@@ -80,6 +106,15 @@ module Gitingest
       compile_excluded_patterns
     end

+    # Main execution method
+    def run
+      fetch_repository_contents
+      generate_prompt
+    end
+
+    private
+
+    # Set up logging based on verbosity options
     def setup_logger
       @logger = @options[:logger] || Logger.new($stdout)
       @logger.level = if @options[:quiet]
@@ -89,21 +124,23 @@ module Gitingest
                       else
                         Logger::INFO
                       end
-      #
+      # Simplify logger format for command line usage
       @logger.formatter = proc { |severity, _, _, msg| "#{severity == "INFO" ? "" : "[#{severity}] "}#{msg}\n" }
     end

-
+    # Validate and set default options
     def validate_options
       raise ArgumentError, "Repository is required" unless @options[:repository]

       @options[:output_file] ||= "#{@options[:repository].split("/").last}_prompt.txt"
       @options[:branch] ||= "main"
       @options[:exclude] ||= []
+      @options[:threads] ||= DEFAULT_THREAD_COUNT
+      @options[:thread_timeout] ||= DEFAULT_THREAD_TIMEOUT
       @excluded_patterns = DEFAULT_EXCLUDES + @options[:exclude]
     end

-
+    # Configure the GitHub API client
     def configure_client
       @client = @options[:token] ? Octokit::Client.new(access_token: @options[:token]) : Octokit::Client.new

@@ -115,17 +152,17 @@ module Gitingest
       end
     end

+    # Optimization: Create a combined regex for faster exclusion checking
     def compile_excluded_patterns
-
+      patterns = @excluded_patterns.map { |pattern| "(#{pattern})" }
+      @combined_exclude_regex = Regexp.new("#{DOT_FILE_PATTERN.source}|#{patterns.join("|")}")
     end

-
+    # Fetch repository contents and apply exclusion filters
     def fetch_repository_contents
       @logger.info "Fetching repository: #{@options[:repository]} (branch: #{@options[:branch]})"
       begin
-        # First validate authentication and repository access
         validate_repository_access
-
         repo_tree = @client.tree(@options[:repository], @options[:branch], recursive: true)
         @repo_files = repo_tree.tree.select { |item| item.type == "blob" && !excluded_file?(item.path) }

@@ -143,8 +180,8 @@ module Gitingest
       end
     end

+    # Validate repository and branch access
     def validate_repository_access
-      # Check if we can access the repository
       begin
         @client.repository(@options[:repository])
       rescue Octokit::Unauthorized
@@ -153,7 +190,6 @@ module Gitingest
         raise "Repository '#{@options[:repository]}' not found or is private. Check the repository name or provide a valid token."
       end

-      # Check if the branch exists
       begin
         @client.branch(@options[:repository], @options[:branch])
       rescue Octokit::NotFound
@@ -161,68 +197,223 @@ module Gitingest
       end
     end

+    # Optimization: Optimized file exclusion check with combined regex
     def excluded_file?(path)
-
-
-      @excluded_patterns.any? { |pattern| path.match?(pattern) }
+      path.match?(@combined_exclude_regex)
     end

-
+    # Generate the consolidated prompt file with optimized threading
     def generate_prompt
       @logger.info "Generating prompt..."
-
+      @logger.debug "Using thread pool with #{@options[:threads]} threads"
+
       buffer = []
-
+      progress = ProgressIndicator.new(@repo_files.size, @logger)
+
+      # Optimization: thread-local buffers to reduce mutex contention
+      thread_buffers = {}
+      mutex = Mutex.new
+      errors = []
+
+      # Dynamic thread pool based on configuration
+      pool = Concurrent::FixedThreadPool.new(@options[:threads])

-      #
-
+      # Group files by priority (smaller files first for better parallelism)
+      prioritized_files = prioritize_files(@repo_files)

       File.open(@options[:output_file], "w") do |file|
-
+        prioritized_files.each_with_index do |repo_file, index|
           pool.post do
-
-
-
-
-
-
-
-
-
-
-
-
-
+            # Optimization: Use thread-local buffers
+            thread_id = Thread.current.object_id
+            thread_buffers[thread_id] ||= []
+            local_buffer = thread_buffers[thread_id]
+
+            begin
+              content = fetch_file_content_with_retry(repo_file.path)
+              result = format_file_content(repo_file.path, content)
+              local_buffer << result
+
+              # Optimization: Only acquire mutex when local buffer reaches threshold
+              if local_buffer.size >= LOCAL_BUFFER_THRESHOLD
+                mutex.synchronize do
+                  buffer.concat(local_buffer)
+                  write_buffer(file, buffer) if buffer.size >= BUFFER_SIZE
+                  local_buffer.clear
+                end
+              end
+
+              progress.update(index + 1)
+            rescue Octokit::Error => e
+              mutex.synchronize do
+                errors << "Error fetching #{repo_file.path}: #{e.message}"
+                @logger.error "Error fetching #{repo_file.path}: #{e.message}"
+              end
+            rescue StandardError => e
+              mutex.synchronize do
+                errors << "Unexpected error processing #{repo_file.path}: #{e.message}"
+                @logger.error "Unexpected error processing #{repo_file.path}: #{e.message}"
+              end
+            end
           end
         end
-
-
-
+
+        begin
+          pool.shutdown
+          wait_success = pool.wait_for_termination(@options[:thread_timeout])
+
+          unless wait_success
+            @logger.warn "Thread pool did not shut down within #{@options[:thread_timeout]} seconds, forcing termination"
+            pool.kill
+          end
+        rescue StandardError => e
+          @logger.error "Error during thread pool shutdown: #{e.message}"
+        end
+
+        # Process remaining files in thread-local buffers
+        mutex.synchronize do
+          thread_buffers.each_value do |local_buffer|
+            buffer.concat(local_buffer) unless local_buffer.empty?
+          end
+          write_buffer(file, buffer) unless buffer.empty?
+        end
       end
-
+
+      if errors.any?
+        @logger.warn "Completed with #{errors.size} errors"
+        @logger.debug "First few errors: #{errors.first(3).join(", ")}" if @logger.debug?
+      end
+
+      @logger.info "Prompt generated and saved to #{@options[:output_file]}"
     end

-
+    # Format a file's content for the prompt
+    def format_file_content(path, content)
+      <<~TEXT
+        ================================================================
+        File: #{path}
+        ================================================================
+        #{content}
+
+      TEXT
+    end
+
+    # Optimization: Fetch file content with exponential backoff for rate limiting
+    def fetch_file_content_with_retry(path, retries = 3, base_delay = 2)
       content = @client.contents(@options[:repository], path: path, ref: @options[:branch])
       Base64.decode64(content.content)
     rescue Octokit::TooManyRequests
       raise unless retries.positive?

-
-
-
-
+      # Optimization: Exponential backoff with jitter for better rate limit handling
+      delay = base_delay**(4 - retries) * (0.8 + 0.4 * rand)
+      @logger.warn "Rate limit exceeded, waiting #{delay.round(1)} seconds..."
+      sleep(delay)
+      fetch_file_content_with_retry(path, retries - 1, base_delay)
     end

+    # Write buffer contents to file and clear buffer
     def write_buffer(file, buffer)
+      return if buffer.empty?
+
       file.puts(buffer.join)
       buffer.clear
     end

-
-    def
-
-
+    # Sort files by estimated processing priority
+    def prioritize_files(files)
+      # Sort files by estimated size (based on extension)
+      # This helps with better thread distribution - process small files first
+      files.sort_by do |file|
+        path = file.path.downcase
+        if path.end_with?(".md", ".txt", ".json", ".yaml", ".yml")
+          0 # Process documentation and config files first (usually small)
+        elsif path.end_with?(".rb", ".py", ".js", ".ts", ".go", ".java", ".c", ".cpp", ".h")
+          1 # Then process code files (medium size)
+        else
+          2 # Other files last
+        end
+      end
+    end
+  end
+
+  # Helper class for showing progress in CLI with visual bar
+  class ProgressIndicator
+    BAR_WIDTH = 30 # Width of the progress bar
+
+    def initialize(total, logger)
+      @total = total
+      @logger = logger
+      @last_percent = 0
+      @start_time = Time.now
+      @last_update_time = Time.now
+      @update_interval = 0.5 # Limit updates to twice per second
+    end
+
+    # Update progress with visual bar
+    def update(current)
+      # Avoid updating too frequently
+      now = Time.now
+      return if now - @last_update_time < @update_interval && current != @total
+
+      @last_update_time = now
+      percent = (current.to_f / @total * 100).round
+
+      # Only update at meaningful increments or completion
+      return unless percent > @last_percent || current == @total
+
+      elapsed = now - @start_time
+
+      # Generate progress bar
+      progress_chars = (BAR_WIDTH * (current.to_f / @total)).round
+      bar = "[#{"|" * progress_chars}#{" " * (BAR_WIDTH - progress_chars)}]"
+
+      # Calculate ETA
+      eta_string = ""
+      if current > 1 && percent < 100
+        remaining = (elapsed / current) * (@total - current)
+        eta_string = " ETA: #{format_time(remaining)}"
+      end
+
+      # Calculate rate (files per second)
+      rate = begin
+        current / elapsed
+      rescue StandardError
+        0
+      end
+      rate_string = " (#{rate.round(1)} files/sec)"
+
+      # Clear line and print progress bar
+      print "\r\e[K" # Clear the line
+      print "#{bar} #{percent}% | #{current}/#{@total} files#{rate_string}#{eta_string}"
+      print "\n" if current == @total # Add newline when complete
+
+      # Also log to logger at less frequent intervals
+      if (percent % 10).zero? && percent != @last_percent || current == @total
+        @logger.info "Processing: #{percent}% complete (#{current}/#{@total} files)#{eta_string}"
+      end
+
+      @last_percent = percent
+    end
+
+    private
+
+    # Format seconds into a human-readable time string
+    def format_time(seconds)
+      return "< 1s" if seconds < 1
+
+      case seconds
+      when 0...60
+        "#{seconds.round}s"
+      when 60...3600
+        minutes = (seconds / 60).floor
+        secs = (seconds % 60).round
+        "#{minutes}m #{secs}s"
+      else
+        hours = (seconds / 3600).floor
+        minutes = ((seconds % 3600) / 60).floor
+        "#{hours}h #{minutes}m"
+      end
     end
   end
 end
|
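The thread-local buffering added to `generate_prompt` can be hard to follow inside a diff. A minimal standalone sketch of the same idea follows, using only Ruby's standard `Thread`, `Queue` and `Mutex` rather than the `Concurrent::FixedThreadPool` the gem relies on. `collect_with_local_buffers`, its parameters, and the block that stands in for fetching and formatting a file are all hypothetical.

```ruby
# Hypothetical sketch, not the gem's code: each worker appends results to its
# own array and only takes the shared mutex when that array reaches threshold.
def collect_with_local_buffers(items, workers: 4, threshold: 50)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # lets pop return nil once the queue is drained

  shared = []
  mutex = Mutex.new

  threads = Array.new(workers) do
    Thread.new do
      local = [] # private to this thread, so appending needs no lock
      while (item = queue.pop)
        local << yield(item)
        next if local.size < threshold

        mutex.synchronize { shared.concat(local) }
        local.clear
      end
      # Flush whatever is left once the queue is empty.
      mutex.synchronize { shared.concat(local) } unless local.empty?
    end
  end
  threads.each(&:join)
  shared
end

results = collect_with_local_buffers((1..500).to_a, threshold: 50) { |n| n * 2 }
puts results.size
```

The order of entries in `shared` depends on thread scheduling, so a caller that needs deterministic output would have to sort or index the results afterwards.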
data/lib/gitingest/version.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: gitingest
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.3.1
 platform: ruby
 authors:
 - Davide Santangelo
@@ -24,6 +24,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '1.1'
+- !ruby/object:Gem::Dependency
+  name: faraday-retry
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.0'
 - !ruby/object:Gem::Dependency
   name: octokit
   requirement: !ruby/object:Gem::Requirement
@@ -117,6 +131,7 @@ files:
 - bin/console
 - bin/gitingest
 - bin/setup
+- index.html
 - lib/gitingest.rb
 - lib/gitingest/generator.rb
 - lib/gitingest/version.rb