gitingest 0.3.0 → 0.3.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: fd7a1e5d5ced0b5449fa30671b0d9a536685d37c3d0d34d33437f652df24c199
-   data.tar.gz: c49a7c6489f7074e3870a05b5d1e47b0ea3e6b7a6eadce405db3c300d8165434
+   metadata.gz: 14bcb35132327c7725e69a895d56b4e88fb21db78fa5473c9afb4d08b879b7ee
+   data.tar.gz: f3a5e06bec7566a268342678887bac14989d69d26f10cbb590e1f594c6779b89
  SHA512:
-   metadata.gz: 64b73ea01bc836a500c82a260c41be6f87e6f0c72bf868bee059407eea45466ad4892d905b7b69f5c6151e40e92d2553c8b8f678bfe1155fef968486648ae871
-   data.tar.gz: 7c18261e6fdb279916d2f8bff4557de76eb6ec0c039861d646f5f56dadacbef536b176d2a48fb25d142b70e30e868fb431f9cde6aad69262ee6f8dc37232ea0a
+   metadata.gz: 1aed7d97acae8b6a1c2b15757cdc9852d802c99b83aa73caf114dfc904d0b05c44b70c853cd453fc12868310c09b15bf6bf71dcac203c01b5dfde241dc7ab0f5
+   data.tar.gz: 07145ca986675723ad0371c873c176384631da660bed227b31ef4cf2680d72cc98479b4da8fb990fedb3dce0eab74ca5114c1222eea0ca1890a85d73726a8d4b
data/CHANGELOG.md CHANGED
@@ -1,39 +1,67 @@
  # Changelog
 
- All notable changes to this project will be documented in this file.
+ ## [0.3.1] - 2025-03-03
+
+ ### Added
+ - Introduced configurable threading options:
+   - `:threads` to specify the number of threads (default: auto-detected).
+   - `:thread_timeout` to define thread pool shutdown timeout (default: 60 seconds).
+ - Implemented thread-local buffers to reduce mutex contention during file processing.
+ - Added exponential backoff with jitter for rate-limited API requests.
+ - Improved progress indicator with a visual progress bar and estimated time remaining.
+
+ ### Changed
+ - Increased `BUFFER_SIZE` from 100 to 250 to reduce I/O operations.
+ - Optimized file exclusion check using a combined regex for faster matching.
+ - Improved thread pool efficiency by prioritizing smaller files first.
+ - Enhanced error handling with detailed logging and thread-safe error collection.
+
+ ### Fixed
+ - Ensured thread pool shutdown respects the configured timeout.
+ - Resolved potential race conditions in file content retrieval.
+
+ ---
 
  ## [0.3.0] - 2025-03-02
- - Added `faraday-retry` gem dependency for better API rate limit handling
- - Implemented thread-safe buffer management with mutex locks
- - Added new `ProgressIndicator` class for better CLI progress reporting (showing percentages)
- - Improved memory efficiency with configurable buffer size
- - Enhanced code organization with dedicated methods for file content formatting
- - Added comprehensive method documentation and parameter descriptions
- - Optimized thread pool size calculation for better performance
- - Improved error handling in concurrent operations
+
+ ### Added
+ - Added `faraday-retry` gem dependency for better API rate limit handling.
+ - Implemented thread-safe buffer management with mutex locks.
+ - Introduced `ProgressIndicator` class for enhanced CLI progress reporting, including percentages.
+ - Improved memory efficiency with a configurable buffer size.
+ - Enhanced code organization by introducing dedicated methods for file content formatting.
+ - Added comprehensive method documentation and parameter descriptions.
+ - Optimized thread pool size calculation for improved performance.
+ - Improved error handling in concurrent operations.
+
+ ---
 
  ## [0.2.0] - 2025-03-02
- - Added support for quiet and verbose modes in the command-line interface
- - Added the ability to specify a custom output file for the prompt
- - Enhanced error handling with logging support
- - Added logging functionality with custom loggers
- - Introduced rate limit handling with retries for file fetching
- - Added repository branch support
- - Exclude specific file patterns via command-line arguments
- - Enforced a 1000 file limit to prevent memory overload
- - Updated version to 0.2.0
+
+ ### Added
+ - Introduced support for quiet and verbose modes in the command-line interface.
+ - Added the ability to specify a custom output file for the prompt.
+ - Implemented enhanced error handling with logging support.
+ - Introduced logging functionality with customizable loggers.
+ - Added rate limit handling with retries for file fetching.
+ - Implemented repository branch support.
+ - Enabled exclusion of specific file patterns via command-line arguments.
+ - Enforced a 1000-file limit to prevent memory overload.
+ - Updated version to `0.2.0`.
+
+ ---
 
  ## [0.1.0] - 2025-03-02
 
  ### Added
- - Initial release of Gitingest
- - Core functionality to fetch and process GitHub repository files
- - Command-line interface for easy interaction
- - Smart file filtering with default exclusions for common non-code files
- - Concurrent processing for improved performance
- - Custom exclude patterns support
- - GitHub authentication via access tokens
- - Automatic rate limit handling with retry mechanism
- - Repository prompt generation with file separation markers
- - Support for custom branch selection
- - Custom output file naming options
+ - Initial release of Gitingest.
+ - Core functionality to fetch and process GitHub repository files.
+ - Command-line interface for easy interaction.
+ - Smart file filtering with default exclusions for common non-code files.
+ - Concurrent processing for improved performance.
+ - Custom exclude patterns support.
+ - GitHub authentication via access tokens.
+ - Automatic rate limit handling with a retry mechanism.
+ - Repository prompt generation with file separation markers.
+ - Support for custom branch selection.
+ - Custom output file naming options.
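Reviewer note: the 0.3.1 entry about "prioritizing smaller files first" is implemented in this release as an extension-based heuristic, not as a size measurement. A minimal sketch of that idea follows, operating on plain path strings. The names `priority_for` and `prioritize` are hypothetical, and the index tie-breaker is an addition (Ruby's `sort_by` is not guaranteed to be stable), so this is an illustration rather than the gem's code.

```ruby
# Extension groups mirror the heuristic in this release; they are a rough
# proxy for file size, not a measurement.
SMALL_EXTENSIONS = %w[.md .txt .json .yaml .yml].freeze
CODE_EXTENSIONS  = %w[.rb .py .js .ts .go .java .c .cpp .h].freeze

# Hypothetical helper: a lower number means "process earlier".
def priority_for(path)
  name = path.downcase
  return 0 if name.end_with?(*SMALL_EXTENSIONS)
  return 1 if name.end_with?(*CODE_EXTENSIONS)

  2
end

# Sort by priority, using the original index as a tie-breaker so files in the
# same group keep their relative order.
def prioritize(paths)
  paths.each_with_index.sort_by { |path, index| [priority_for(path), index] }.map(&:first)
end

paths = ["lib/app.rb", "Makefile", "README.md", "src/main.go", "config.yml"]
puts prioritize(paths).inspect
```

Because the grouping only looks at the extension, a large JSON fixture would still be scheduled ahead of a tiny Ruby file; sorting on a reported size field, where one is available, would be a stricter alternative.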
data/index.html ADDED
@@ -0,0 +1,363 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+ <meta charset="UTF-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>Gitingest - GitHub Repository Fetcher and Prompt Generator</title>
+ <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/github-markdown-css/5.2.0/github-markdown.min.css">
+ <style>
+ :root {
+ --bg-color: #0d1117;
+ --text-color: #c9d1d9;
+ --link-color: #58a6ff;
+ --header-color: #f0f6fc;
+ --border-color: #30363d;
+ --code-bg: #161b22;
+ --code-block-bg: #0d1117;
+ --accent-color: #238636;
+ --accent-hover: #2ea043;
+ }
+
+ body {
+ font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Helvetica, Arial, sans-serif;
+ line-height: 1.6;
+ color: var(--text-color);
+ background-color: var(--bg-color);
+ max-width: 900px;
+ margin: 0 auto;
+ padding: 20px;
+ }
+
+ .container {
+ border: 1px solid var(--border-color);
+ border-radius: 6px;
+ padding: 30px;
+ margin-bottom: 20px;
+ background-color: #0d1117;
+ }
+
+ .header {
+ display: flex;
+ align-items: center;
+ margin-bottom: 30px;
+ }
+
+ .logo {
+ width: 60px;
+ height: 60px;
+ margin-right: 15px;
+ background-color: var(--accent-color);
+ border-radius: 50%;
+ display: flex;
+ align-items: center;
+ justify-content: center;
+ color: white;
+ font-size: 24px;
+ font-weight: bold;
+ }
+
+ h1, h2, h3 {
+ color: var(--header-color);
+ border-bottom: 1px solid var(--border-color);
+ padding-bottom: 10px;
+ margin-top: 24px;
+ margin-bottom: 16px;
+ }
+
+ h1 {
+ font-size: 2em;
+ margin-bottom: 0.5em;
+ border-bottom: none;
+ padding-bottom: 0;
+ }
+
+ .header h1 {
+ margin: 0;
+ line-height: 1.3;
+ }
+
+ a {
+ color: var(--link-color);
+ text-decoration: none;
+ }
+
+ a:hover {
+ text-decoration: underline;
+ }
+
+ code {
+ font-family: "SFMono-Regular", Consolas, "Liberation Mono", Menlo, monospace;
+ background-color: var(--code-bg);
+ border-radius: 3px;
+ padding: 2px 4px;
+ font-size: 0.9em;
+ }
+
+ pre {
+ background-color: var(--code-block-bg);
+ border-radius: 6px;
+ padding: 16px;
+ overflow: auto;
+ border: 1px solid var(--border-color);
+ margin: 16px 0;
+ }
+
+ pre code {
+ background-color: transparent;
+ padding: 0;
+ border-radius: 0;
+ white-space: pre;
+ }
+
+ ul, ol {
+ padding-left: 2em;
+ }
+
+ .button {
+ display: inline-block;
+ background-color: var(--accent-color);
+ color: white;
+ padding: 8px 16px;
+ border-radius: 6px;
+ font-weight: 600;
+ margin: 8px 0;
+ }
+
+ .button:hover {
+ background-color: var(--accent-hover);
+ text-decoration: none;
+ }
+
+ .version-badge {
+ display: inline-block;
+ background-color: #238636;
+ color: white;
+ border-radius: 20px;
+ padding: 4px 10px;
+ font-size: 12px;
+ font-weight: bold;
+ margin-left: 10px;
+ }
+
+ footer {
+ margin-top: 40px;
+ text-align: center;
+ color: #8b949e;
+ font-size: 0.9em;
+ border-top: 1px solid var(--border-color);
+ padding-top: 20px;
+ }
+
+ .changelog {
+ margin-top: 30px;
+ }
+
+ .changelog-item {
+ margin-bottom: 24px;
+ }
+
+ .changelog-version {
+ font-weight: bold;
+ color: var(--header-color);
+ }
+
+ .changelog-date {
+ color: #8b949e;
+ font-size: 0.9em;
+ }
+
+ .changelog-list {
+ margin-top: 10px;
+ }
+ </style>
+ </head>
+ <body>
+ <div class="container">
+ <div class="header">
+ <div class="logo">G</div>
+ <div>
+ <h1>Gitingest <span class="version-badge">v0.3.0</span></h1>
+ <p>A Ruby gem that fetches files from a GitHub repository and generates a consolidated text prompt for LLMs</p>
+ </div>
+ </div>
+
+ <a href="https://github.com/davidesantangelo/gitingest" class="button">View on GitHub</a>
+ <a href="https://rubygems.org/gems/gitingest" class="button">View on RubyGems</a>
+
+ <h2>Installation</h2>
+
+ <h3>From RubyGems</h3>
+ <pre><code>gem install gitingest</code></pre>
+
+ <h3>From Source</h3>
+ <pre><code>git clone https://github.com/davidesantangelo/gitingest.git
+ cd gitingest
+ bundle install
+ bundle exec rake install</code></pre>
+
+ <h2>Usage</h2>
+
+ <h3>Command Line</h3>
+ <pre><code># Basic usage (public repository)
+ gitingest --repository user/repo
+
+ # With GitHub token for private repositories
+ gitingest --repository user/repo --token YOUR_GITHUB_TOKEN
+
+ # Specify a custom output file
+ gitingest --repository user/repo --output my_prompt.txt
+
+ # Specify a different branch
+ gitingest --repository user/repo --branch develop
+
+ # Exclude additional patterns
+ gitingest --repository user/repo --exclude "*.md,docs/"
+
+ # Quiet mode
+ gitingest --repository user/repo --quiet
+
+ # Verbose mode
+ gitingest --repository user/repo --verbose</code></pre>
+
+ <h4>Available Options</h4>
+ <ul>
+ <li><code>-r, --repository REPO</code>: GitHub repository (username/repo) [Required]</li>
+ <li><code>-t, --token TOKEN</code>: GitHub personal access token [Optional but recommended]</li>
+ <li><code>-o, --output FILE</code>: Output file for the prompt [Default: reponame_prompt.txt]</li>
+ <li><code>-e, --exclude PATTERN</code>: File patterns to exclude (comma separated)</li>
+ <li><code>-b, --branch BRANCH</code>: Repository branch [Default: main]</li>
+ <li><code>-h, --help</code>: Show help message</li>
+ </ul>
+
+ <h3>As a Library</h3>
+ <pre><code>require "gitingest"
+
+ # Basic usage
+ generator = Gitingest::Generator.new(
+ repository: "user/repo",
+ token: "YOUR_GITHUB_TOKEN" # optional
+ )
+ generator.run
+
+ # With custom options
+ generator = Gitingest::Generator.new(
+ repository: "user/repo",
+ token: "YOUR_GITHUB_TOKEN",
+ output_file: "my_prompt.txt",
+ branch: "develop",
+ exclude: ["*.md", "docs/"],
+ quiet: true # or verbose: true
+ )
+ generator.run
+
+ # With custom logger
+ custom_logger = Logger.new("gitingest.log")
+ generator = Gitingest::Generator.new(
+ repository: "user/repo",
+ logger: custom_logger
+ )
+ generator.run</code></pre>
+
+ <h2>Features</h2>
+ <ul>
+ <li>Fetches all files from a GitHub repository based on the given branch</li>
+ <li>Automatically excludes common binary files and system files by default</li>
+ <li>Allows custom exclusion patterns for specific file extensions or directories</li>
+ <li>Uses concurrent processing for faster downloads</li>
+ <li>Handles GitHub API rate limiting with automatic retry</li>
+ <li>Generates a clean, formatted output file with file paths and content</li>
+ </ul>
+
+ <h2>Default Exclusion Patterns</h2>
+ <p>By default, the generator excludes files and directories commonly ignored in repositories, such as:</p>
+ <ul>
+ <li>Version control files (<code>.git/</code>, <code>.svn/</code>)</li>
+ <li>System files (<code>.DS_Store</code>, <code>Thumbs.db</code>)</li>
+ <li>Log files (<code>*.log</code>, <code>*.bak</code>)</li>
+ <li>Images and media files (<code>*.png</code>, <code>*.jpg</code>, <code>*.mp3</code>)</li>
+ <li>Archives (<code>*.zip</code>, <code>*.tar.gz</code>)</li>
+ <li>Dependency directories (<code>node_modules/</code>, <code>vendor/</code>)</li>
+ <li>Compiled and binary files (<code>*.pyc</code>, <code>*.class</code>, <code>*.exe</code>)</li>
+ </ul>
+
+ <h2>Limitations</h2>
+ <ul>
+ <li>To prevent memory overload, only the first 1000 files will be processed</li>
+ <li>API requests are subject to GitHub limits (60 requests/hour without token, 5000 requests/hour with token)</li>
+ <li>Private repositories require a GitHub personal access token</li>
+ </ul>
+
+ <div class="changelog">
+ <h2>Changelog</h2>
+
+ <div class="changelog-item">
+ <div>
+ <span class="changelog-version">v0.3.0</span>
+ <span class="changelog-date">- March 2, 2025</span>
+ </div>
+ <ul class="changelog-list">
+ <li>Added <code>faraday-retry</code> gem dependency for better API rate limit handling</li>
+ <li>Implemented thread-safe buffer management with mutex locks</li>
+ <li>Added new <code>ProgressIndicator</code> class for better CLI progress reporting (showing percentages)</li>
+ <li>Improved memory efficiency with configurable buffer size</li>
+ <li>Enhanced code organization with dedicated methods for file content formatting</li>
+ <li>Added comprehensive method documentation and parameter descriptions</li>
+ <li>Optimized thread pool size calculation for better performance</li>
+ <li>Improved error handling in concurrent operations</li>
+ </ul>
+ </div>
+
+ <div class="changelog-item">
+ <div>
+ <span class="changelog-version">v0.2.0</span>
+ <span class="changelog-date">- March 2, 2025</span>
+ </div>
+ <ul class="changelog-list">
+ <li>Added support for quiet and verbose modes in the command-line interface</li>
+ <li>Added the ability to specify a custom output file for the prompt</li>
+ <li>Enhanced error handling with logging support</li>
+ <li>Added logging functionality with custom loggers</li>
+ <li>Introduced rate limit handling with retries for file fetching</li>
+ <li>Added repository branch support</li>
+ <li>Exclude specific file patterns via command-line arguments</li>
+ <li>Enforced a 1000 file limit to prevent memory overload</li>
+ </ul>
+ </div>
+
+ <div class="changelog-item">
+ <div>
+ <span class="changelog-version">v0.1.0</span>
+ <span class="changelog-date">- March 2, 2025</span>
+ </div>
+ <ul class="changelog-list">
+ <li>Initial release of Gitingest</li>
+ <li>Core functionality to fetch and process GitHub repository files</li>
+ <li>Command-line interface for easy interaction</li>
+ <li>Smart file filtering with default exclusions for common non-code files</li>
+ <li>Concurrent processing for improved performance</li>
+ <li>Custom exclude patterns support</li>
+ <li>GitHub authentication via access tokens</li>
+ <li>Automatic rate limit handling with retry mechanism</li>
+ <li>Repository prompt generation with file separation markers</li>
+ <li>Support for custom branch selection</li>
+ <li>Custom output file naming options</li>
+ </ul>
+ </div>
+ </div>
+
+ <h2>Contributing</h2>
+ <p>Bug reports and pull requests are welcome on GitHub at <a href="https://github.com/davidesantangelo/gitingest">https://github.com/davidesantangelo/gitingest</a>.</p>
+
+ <h2>Acknowledgements</h2>
+ <p>Inspired by <a href="https://github.com/cyclotruc/gitingest"><code>cyclotruc/gitingest</code></a>.</p>
+
+ <h2>License</h2>
+ <p>The gem is available as open source under the terms of the <a href="https://opensource.org/licenses/MIT">MIT License</a>.</p>
+ </div>
+
+ <footer>
+ <p>© 2025 David Santangelo</p>
+ <p>Last updated: March 2, 2025</p>
+ </footer>
+ </body>
+ </html>
data/lib/gitingest/generator.rb CHANGED
@@ -65,9 +65,21 @@ module Gitingest
        "\.swiftpm/", "\.build/"
      ].freeze
 
+     # Optimization: pattern for dot files/directories
+     DOT_FILE_PATTERN = %r{(?-mix:(^\.|/\.))}
+
      # Maximum number of files to process to prevent memory overload
      MAX_FILES = 1000
-     BUFFER_SIZE = 100 # Write every 100 files to reduce I/O operations
+
+     # Optimization: increased buffer size to reduce I/O operations
+     BUFFER_SIZE = 250
+
+     # Optimization: thread-local buffer threshold
+     LOCAL_BUFFER_THRESHOLD = 50
+
+     # Add configurable threading options
+     DEFAULT_THREAD_COUNT = [Concurrent.processor_count, 8].min
+     DEFAULT_THREAD_TIMEOUT = 60 # seconds
 
      attr_reader :options, :client, :repo_files, :excluded_patterns, :logger
 
@@ -82,6 +94,8 @@ module Gitingest
      # @option options [Boolean] :quiet Reduce logging to errors only
      # @option options [Boolean] :verbose Increase logging verbosity
      # @option options [Logger] :logger Custom logger instance
+     # @option options [Integer] :threads Number of threads to use (default: auto-detected)
+     # @option options [Integer] :thread_timeout Seconds to wait for thread pool shutdown (default: 60)
      def initialize(options = {})
        @options = options
        @repo_files = []
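Reviewer note: the `:threads` option documented in this hunk sizes the worker pool, and this release pairs the pool with per-thread buffers that are merged into a shared buffer only when a threshold is reached. Below is a rough sketch of that buffering pattern using only the standard library (`Thread`, `Queue`, `Mutex`) rather than the `Concurrent::FixedThreadPool` the gem uses. `run_workers`, `process`, and both thresholds are made up for illustration, and each worker drains its own leftovers so that no other thread touches a private buffer.

```ruby
require "stringio"

LOCAL_THRESHOLD = 3  # flush a worker's private buffer after this many items
SHARED_THRESHOLD = 6 # write the shared buffer out after this many items

# Hypothetical stand-in for fetching and formatting one file.
def process(item)
  "item-#{item}\n"
end

def run_workers(items, thread_count:, output:)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # pop returns nil once the closed queue is drained

  shared = []
  mutex = Mutex.new

  # Move a worker's results into the shared buffer under the lock, and write
  # the shared buffer out once it is large enough.
  flush = lambda do |local|
    mutex.synchronize do
      shared.concat(local)
      local.clear
      if shared.size >= SHARED_THRESHOLD
        output.write(shared.join)
        shared.clear
      end
    end
  end

  workers = Array.new(thread_count) do
    Thread.new do
      local = [] # private to this thread, so appends need no lock
      while (item = queue.pop)
        local << process(item)
        flush.call(local) if local.size >= LOCAL_THRESHOLD
      end
      flush.call(local) unless local.empty?
    end
  end
  workers.each(&:join)

  output.write(shared.join) unless shared.empty?
  output
end

out = run_workers((1..20).to_a, thread_count: 4, output: StringIO.new)
puts out.string.lines.size
```

The lock is taken once per `LOCAL_THRESHOLD` items instead of once per item, which is the contention saving the changelog describes. Output order depends on thread scheduling, so consumers should not rely on it.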
@@ -121,6 +135,8 @@ module Gitingest
        @options[:output_file] ||= "#{@options[:repository].split("/").last}_prompt.txt"
        @options[:branch] ||= "main"
        @options[:exclude] ||= []
+       @options[:threads] ||= DEFAULT_THREAD_COUNT
+       @options[:thread_timeout] ||= DEFAULT_THREAD_TIMEOUT
        @excluded_patterns = DEFAULT_EXCLUDES + @options[:exclude]
      end
 
@@ -136,9 +152,10 @@ module Gitingest
        end
      end
 
-     # Convert exclusion patterns to regular expressions
+     # Optimization: Create a combined regex for faster exclusion checking
      def compile_excluded_patterns
-       @excluded_patterns = @excluded_patterns.map { |pattern| Regexp.new(pattern) }
+       patterns = @excluded_patterns.map { |pattern| "(#{pattern})" }
+       @combined_exclude_regex = Regexp.new("#{DOT_FILE_PATTERN.source}|#{patterns.join("|")}")
      end
 
      # Fetch repository contents and apply exclusion filters
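Reviewer note: this hunk replaces a list of per-pattern regexes with a single alternation that also covers dot files. The sketch below shows the same idea with `Regexp.union`, which is an assumption on my part (the gem joins the pattern source strings with `|` instead). `SAMPLE_EXCLUDES`, `build_exclude_regex`, and `excluded?` are illustrative names, and the pattern list is a small made-up subset.

```ruby
# Illustrative subset of exclusion patterns, written as regex source strings.
SAMPLE_EXCLUDES = ['\.png$', '\.log$', 'node_modules/', 'vendor/'].freeze

# Matches a leading dot or a dot right after a slash, i.e. dot files and
# dot directories anywhere in the path.
DOT_FILE = %r{(^\.|/\.)}

# Build one regex from many. Regexp.union joins the alternatives with "|"
# and preserves each pattern's own flags.
def build_exclude_regex(patterns)
  Regexp.union(DOT_FILE, *patterns.map { |source| Regexp.new(source) })
end

EXCLUDE_REGEX = build_exclude_regex(SAMPLE_EXCLUDES)

def excluded?(path)
  path.match?(EXCLUDE_REGEX)
end

%w[lib/app.rb .github/workflows/ci.yml assets/logo.png src/.hidden/x.rb].each do |path|
  puts "#{path}: #{excluded?(path) ? 'excluded' : 'kept'}"
end
```

A single `match?` call lets the regex engine scan each path once instead of once per pattern. Either way of combining treats user-supplied `--exclude` values as regex source, so a value such as `*.md` is not a valid pattern on its own.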
@@ -180,49 +197,93 @@ module Gitingest
        end
      end
 
-     # Check if a file should be excluded based on its path
+     # Optimization: Optimized file exclusion check with combined regex
      def excluded_file?(path)
-       return true if path.start_with?(".") || path.split("/").any? { |part| part.start_with?(".") }
-
-       @excluded_patterns.any? { |pattern| path.match?(pattern) }
+       path.match?(@combined_exclude_regex)
      end
 
-     # Generate the consolidated prompt file
+     # Generate the consolidated prompt file with optimized threading
      def generate_prompt
        @logger.info "Generating prompt..."
+       @logger.debug "Using thread pool with #{@options[:threads]} threads"
+
        buffer = []
        progress = ProgressIndicator.new(@repo_files.size, @logger)
 
-       # Dynamic thread pool based on core count
-       pool = Concurrent::FixedThreadPool.new([Concurrent.processor_count, 5].min)
+       # Optimization: thread-local buffers to reduce mutex contention
+       thread_buffers = {}
+       mutex = Mutex.new
+       errors = []
+
+       # Dynamic thread pool based on configuration
+       pool = Concurrent::FixedThreadPool.new(@options[:threads])
+
+       # Group files by priority (smaller files first for better parallelism)
+       prioritized_files = prioritize_files(@repo_files)
 
        File.open(@options[:output_file], "w") do |file|
-         @repo_files.each_with_index do |repo_file, index|
+         prioritized_files.each_with_index do |repo_file, index|
            pool.post do
-             content = fetch_file_content_with_retry(repo_file.path)
-             result = format_file_content(repo_file.path, content)
-
-             # Thread-safe buffer management
-             buffer_mutex.synchronize do
-               buffer << result
-               write_buffer(file, buffer) if buffer.size >= BUFFER_SIZE
+             # Optimization: Use thread-local buffers
+             thread_id = Thread.current.object_id
+             thread_buffers[thread_id] ||= []
+             local_buffer = thread_buffers[thread_id]
+
+             begin
+               content = fetch_file_content_with_retry(repo_file.path)
+               result = format_file_content(repo_file.path, content)
+               local_buffer << result
+
+               # Optimization: Only acquire mutex when local buffer reaches threshold
+               if local_buffer.size >= LOCAL_BUFFER_THRESHOLD
+                 mutex.synchronize do
+                   buffer.concat(local_buffer)
+                   write_buffer(file, buffer) if buffer.size >= BUFFER_SIZE
+                   local_buffer.clear
+                 end
+               end
+
+               progress.update(index + 1)
+             rescue Octokit::Error => e
+               mutex.synchronize do
+                 errors << "Error fetching #{repo_file.path}: #{e.message}"
+                 @logger.error "Error fetching #{repo_file.path}: #{e.message}"
+               end
+             rescue StandardError => e
+               mutex.synchronize do
+                 errors << "Unexpected error processing #{repo_file.path}: #{e.message}"
+                 @logger.error "Unexpected error processing #{repo_file.path}: #{e.message}"
+               end
              end
-
-             progress.update(index + 1)
-           rescue Octokit::Error => e
-             @logger.error "Error fetching #{repo_file.path}: #{e.message}"
            end
          end
 
-         pool.shutdown
-         pool.wait_for_termination
+         begin
+           pool.shutdown
+           wait_success = pool.wait_for_termination(@options[:thread_timeout])
 
-         # Write any remaining files in buffer
-         buffer_mutex.synchronize do
+           unless wait_success
+             @logger.warn "Thread pool did not shut down within #{@options[:thread_timeout]} seconds, forcing termination"
+             pool.kill
+           end
+         rescue StandardError => e
+           @logger.error "Error during thread pool shutdown: #{e.message}"
+         end
+
+         # Process remaining files in thread-local buffers
+         mutex.synchronize do
+           thread_buffers.each_value do |local_buffer|
+             buffer.concat(local_buffer) unless local_buffer.empty?
+           end
            write_buffer(file, buffer) unless buffer.empty?
          end
        end
 
+       if errors.any?
+         @logger.warn "Completed with #{errors.size} errors"
+         @logger.debug "First few errors: #{errors.first(3).join(", ")}" if @logger.debug?
+       end
+
        @logger.info "Prompt generated and saved to #{@options[:output_file]}"
      end
 
@@ -237,45 +298,122 @@ module Gitingest
        TEXT
      end
 
-     # Fetch file content with retry logic for rate limiting
-     def fetch_file_content_with_retry(path, retries = 3)
+     # Optimization: Fetch file content with exponential backoff for rate limiting
+     def fetch_file_content_with_retry(path, retries = 3, base_delay = 2)
        content = @client.contents(@options[:repository], path: path, ref: @options[:branch])
        Base64.decode64(content.content)
      rescue Octokit::TooManyRequests
        raise unless retries.positive?
 
-       sleep_time = 60 / retries
-       @logger.warn "Rate limit exceeded, waiting #{sleep_time} seconds..."
-       sleep(sleep_time)
-       fetch_file_content_with_retry(path, retries - 1)
+       # Optimization: Exponential backoff with jitter for better rate limit handling
+       delay = base_delay**(4 - retries) * (0.8 + 0.4 * rand)
+       @logger.warn "Rate limit exceeded, waiting #{delay.round(1)} seconds..."
+       sleep(delay)
+       fetch_file_content_with_retry(path, retries - 1, base_delay)
      end
 
      # Write buffer contents to file and clear buffer
      def write_buffer(file, buffer)
+       return if buffer.empty?
+
        file.puts(buffer.join)
        buffer.clear
      end
 
-     # Thread-safe mutex for buffer operations
-     def buffer_mutex
-       @buffer_mutex ||= Mutex.new
+     # Sort files by estimated processing priority
+     def prioritize_files(files)
+       # Sort files by estimated size (based on extension)
+       # This helps with better thread distribution - process small files first
+       files.sort_by do |file|
+         path = file.path.downcase
+         if path.end_with?(".md", ".txt", ".json", ".yaml", ".yml")
+           0 # Process documentation and config files first (usually small)
+         elsif path.end_with?(".rb", ".py", ".js", ".ts", ".go", ".java", ".c", ".cpp", ".h")
+           1 # Then process code files (medium size)
+         else
+           2 # Other files last
+         end
+       end
      end
    end
 
-   # Helper class for showing progress in CLI
+   # Helper class for showing progress in CLI with visual bar
    class ProgressIndicator
+     BAR_WIDTH = 30 # Width of the progress bar
+
      def initialize(total, logger)
        @total = total
        @logger = logger
        @last_percent = 0
+       @start_time = Time.now
+       @last_update_time = Time.now
+       @update_interval = 0.5 # Limit updates to twice per second
      end
 
+     # Update progress with visual bar
      def update(current)
+       # Avoid updating too frequently
+       now = Time.now
+       return if now - @last_update_time < @update_interval && current != @total
+
+       @last_update_time = now
        percent = (current.to_f / @total * 100).round
-       return unless percent > @last_percent && ((percent % 5).zero? || current == @total)
 
-       @logger.info "Processing: #{percent}% complete (#{current}/#{@total} files)"
+       # Only update at meaningful increments or completion
+       return unless percent > @last_percent || current == @total
+
+       elapsed = now - @start_time
+
+       # Generate progress bar
+       progress_chars = (BAR_WIDTH * (current.to_f / @total)).round
+       bar = "[#{"|" * progress_chars}#{" " * (BAR_WIDTH - progress_chars)}]"
+
+       # Calculate ETA
+       eta_string = ""
+       if current > 1 && percent < 100
+         remaining = (elapsed / current) * (@total - current)
+         eta_string = " ETA: #{format_time(remaining)}"
+       end
+
+       # Calculate rate (files per second)
+       rate = begin
+         current / elapsed
+       rescue StandardError
+         0
+       end
+       rate_string = " (#{rate.round(1)} files/sec)"
+
+       # Clear line and print progress bar
+       print "\r\e[K" # Clear the line
+       print "#{bar} #{percent}% | #{current}/#{@total} files#{rate_string}#{eta_string}"
+       print "\n" if current == @total # Add newline when complete
+
+       # Also log to logger at less frequent intervals
+       if (percent % 10).zero? && percent != @last_percent || current == @total
+         @logger.info "Processing: #{percent}% complete (#{current}/#{@total} files)#{eta_string}"
+       end
+
        @last_percent = percent
      end
+
+     private
+
+     # Format seconds into a human-readable time string
+     def format_time(seconds)
+       return "< 1s" if seconds < 1
+
+       case seconds
+       when 0...60
+         "#{seconds.round}s"
+       when 60...3600
+         minutes = (seconds / 60).floor
+         secs = (seconds % 60).round
+         "#{minutes}m #{secs}s"
+       else
+         hours = (seconds / 3600).floor
+         minutes = ((seconds % 3600) / 60).floor
+         "#{hours}h #{minutes}m"
+       end
+     end
    end
  end
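Reviewer note: the retry change in this hunk replaces the fixed `60 / retries` wait with `base_delay**(4 - retries)` scaled by a random factor of roughly 0.8 to 1.2. The sketch below isolates that schedule so it can be inspected without calling the GitHub API. `backoff_delay` and `with_backoff` are hypothetical helpers, and the injectable `rng` and `sleeper` parameters are additions for testability, not part of the gem.

```ruby
# Delay before the next attempt: base_delay ** attempt, scaled by a random
# jitter factor of roughly 0.8 to 1.2 so concurrent workers do not all retry
# at the same instant.
def backoff_delay(retries_left, max_retries: 3, base_delay: 2, rng: Random.new)
  attempt = max_retries + 1 - retries_left # 1 for the first retry, then 2, 3, ...
  jitter = 0.8 + 0.4 * rng.rand
  (base_delay**attempt) * jitter
end

# Hypothetical retry wrapper: re-runs the block when `error_class` is raised,
# sleeping with exponential backoff between attempts, and re-raises once the
# retries are used up.
def with_backoff(error_class, retries: 3, sleeper: ->(seconds) { sleep(seconds) })
  remaining = retries
  begin
    yield
  rescue error_class
    raise unless remaining.positive?

    sleeper.call(backoff_delay(remaining, max_retries: retries))
    remaining -= 1
    retry
  end
end

3.downto(1) do |left|
  puts "retries left #{left}: ~#{backoff_delay(left).round(1)}s"
end
```

With the defaults, the three waits centre on 2, 4, and 8 seconds, about 14 seconds in total. That is much shorter than the previous 20 + 30 + 60 second schedule, so a long rate-limit window may still exhaust the retries.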
data/lib/gitingest/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module Gitingest
-   VERSION = "0.3.0"
+   VERSION = "0.3.1"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: gitingest
  version: !ruby/object:Gem::Version
-   version: 0.3.0
+   version: 0.3.1
  platform: ruby
  authors:
  - Davide Santangelo
@@ -131,6 +131,7 @@ files:
  - bin/console
  - bin/gitingest
  - bin/setup
+ - index.html
  - lib/gitingest.rb
  - lib/gitingest/generator.rb
  - lib/gitingest/version.rb