gitingest 0.2.0 → 0.3.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 65bac565322a3c782e5a3da91648fbe1aef905af80344b7ab81d8223e7a39c08
4
- data.tar.gz: ae9c73bc2df91ba61b10a01e36a15c2de7238104627c296408cfcc273851e885
3
+ metadata.gz: 14bcb35132327c7725e69a895d56b4e88fb21db78fa5473c9afb4d08b879b7ee
4
+ data.tar.gz: f3a5e06bec7566a268342678887bac14989d69d26f10cbb590e1f594c6779b89
5
5
  SHA512:
6
- metadata.gz: ba151945b20e7c5b4ce51af89a724f8673ec4b8928700b8922eecc8aeb7089fbe86e172d41289d308caff4ce1e88603be87416b1f743537c6b3f4e64d21515de
7
- data.tar.gz: ce1aaff5e55cca369d8eeb6e3fd68a92e63b5258f35b5edd8698b505c5c6d84363189eedeb483c5553d62ce2b9011ecf9b119f222dbc51e91a28b5f7e5e369e1
6
+ metadata.gz: 1aed7d97acae8b6a1c2b15757cdc9852d802c99b83aa73caf114dfc904d0b05c44b70c853cd453fc12868310c09b15bf6bf71dcac203c01b5dfde241dc7ab0f5
7
+ data.tar.gz: 07145ca986675723ad0371c873c176384631da660bed227b31ef4cf2680d72cc98479b4da8fb990fedb3dce0eab74ca5114c1222eea0ca1890a85d73726a8d4b
data/CHANGELOG.md CHANGED
@@ -1,29 +1,67 @@
1
1
  # Changelog
2
2
 
3
- All notable changes to this project will be documented in this file.
3
+ ## [0.3.1] - 2025-03-03
4
+
5
+ ### Added
6
+ - Introduced configurable threading options:
7
+ - `:threads` to specify the number of threads (default: auto-detected).
8
+ - `:thread_timeout` to define thread pool shutdown timeout (default: 60 seconds).
9
+ - Implemented thread-local buffers to reduce mutex contention during file processing.
10
+ - Added exponential backoff with jitter for rate-limited API requests.
11
+ - Improved progress indicator with a visual progress bar and estimated time remaining.
12
+
13
+ ### Changed
14
+ - Increased `BUFFER_SIZE` from 100 to 250 to reduce I/O operations.
15
+ - Optimized file exclusion check using a combined regex for faster matching.
16
+ - Improved thread pool efficiency by prioritizing smaller files first.
17
+ - Enhanced error handling with detailed logging and thread-safe error collection.
18
+
19
+ ### Fixed
20
+ - Ensured thread pool shutdown respects the configured timeout.
21
+ - Resolved potential race conditions in file content retrieval.
22
+
23
+ ---
24
+
25
+ ## [0.3.0] - 2025-03-02
26
+
27
+ ### Added
28
+ - Added `faraday-retry` gem dependency for better API rate limit handling.
29
+ - Implemented thread-safe buffer management with mutex locks.
30
+ - Introduced `ProgressIndicator` class for enhanced CLI progress reporting, including percentages.
31
+ - Improved memory efficiency with a configurable buffer size.
32
+ - Enhanced code organization by introducing dedicated methods for file content formatting.
33
+ - Added comprehensive method documentation and parameter descriptions.
34
+ - Optimized thread pool size calculation for improved performance.
35
+ - Improved error handling in concurrent operations.
36
+
37
+ ---
4
38
 
5
39
  ## [0.2.0] - 2025-03-02
6
- - Added support for quiet and verbose modes in the command-line interface
7
- - Added the ability to specify a custom output file for the prompt
8
- - Enhanced error handling with logging support
9
- - Added logging functionality with custom loggers
10
- - Introduced rate limit handling with retries for file fetching
11
- - Added repository branch support
12
- - Exclude specific file patterns via command-line arguments
13
- - Enforced a 1000 file limit to prevent memory overload
14
- - Updated version to 0.2.0
40
+
41
+ ### Added
42
+ - Introduced support for quiet and verbose modes in the command-line interface.
43
+ - Added the ability to specify a custom output file for the prompt.
44
+ - Implemented enhanced error handling with logging support.
45
+ - Introduced logging functionality with customizable loggers.
46
+ - Added rate limit handling with retries for file fetching.
47
+ - Implemented repository branch support.
48
+ - Enabled exclusion of specific file patterns via command-line arguments.
49
+ - Enforced a 1000-file limit to prevent memory overload.
50
+ - Updated version to `0.2.0`.
51
+
52
+ ---
15
53
 
16
54
  ## [0.1.0] - 2025-03-02
17
55
 
18
56
  ### Added
19
- - Initial release of Gitingest
20
- - Core functionality to fetch and process GitHub repository files
21
- - Command-line interface for easy interaction
22
- - Smart file filtering with default exclusions for common non-code files
23
- - Concurrent processing for improved performance
24
- - Custom exclude patterns support
25
- - GitHub authentication via access tokens
26
- - Automatic rate limit handling with retry mechanism
27
- - Repository prompt generation with file separation markers
28
- - Support for custom branch selection
29
- - Custom output file naming options
57
+ - Initial release of Gitingest.
58
+ - Core functionality to fetch and process GitHub repository files.
59
+ - Command-line interface for easy interaction.
60
+ - Smart file filtering with default exclusions for common non-code files.
61
+ - Concurrent processing for improved performance.
62
+ - Custom exclude patterns support.
63
+ - GitHub authentication via access tokens.
64
+ - Automatic rate limit handling with a retry mechanism.
65
+ - Repository prompt generation with file separation markers.
66
+ - Support for custom branch selection.
67
+ - Custom output file naming options.
data/index.html ADDED
@@ -0,0 +1,363 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>Gitingest - GitHub Repository Fetcher and Prompt Generator</title>
7
+ <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/github-markdown-css/5.2.0/github-markdown.min.css">
8
+ <style>
9
+ :root {
10
+ --bg-color: #0d1117;
11
+ --text-color: #c9d1d9;
12
+ --link-color: #58a6ff;
13
+ --header-color: #f0f6fc;
14
+ --border-color: #30363d;
15
+ --code-bg: #161b22;
16
+ --code-block-bg: #0d1117;
17
+ --accent-color: #238636;
18
+ --accent-hover: #2ea043;
19
+ }
20
+
21
+ body {
22
+ font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Helvetica, Arial, sans-serif;
23
+ line-height: 1.6;
24
+ color: var(--text-color);
25
+ background-color: var(--bg-color);
26
+ max-width: 900px;
27
+ margin: 0 auto;
28
+ padding: 20px;
29
+ }
30
+
31
+ .container {
32
+ border: 1px solid var(--border-color);
33
+ border-radius: 6px;
34
+ padding: 30px;
35
+ margin-bottom: 20px;
36
+ background-color: #0d1117;
37
+ }
38
+
39
+ .header {
40
+ display: flex;
41
+ align-items: center;
42
+ margin-bottom: 30px;
43
+ }
44
+
45
+ .logo {
46
+ width: 60px;
47
+ height: 60px;
48
+ margin-right: 15px;
49
+ background-color: var(--accent-color);
50
+ border-radius: 50%;
51
+ display: flex;
52
+ align-items: center;
53
+ justify-content: center;
54
+ color: white;
55
+ font-size: 24px;
56
+ font-weight: bold;
57
+ }
58
+
59
+ h1, h2, h3 {
60
+ color: var(--header-color);
61
+ border-bottom: 1px solid var(--border-color);
62
+ padding-bottom: 10px;
63
+ margin-top: 24px;
64
+ margin-bottom: 16px;
65
+ }
66
+
67
+ h1 {
68
+ font-size: 2em;
69
+ margin-bottom: 0.5em;
70
+ border-bottom: none;
71
+ padding-bottom: 0;
72
+ }
73
+
74
+ .header h1 {
75
+ margin: 0;
76
+ line-height: 1.3;
77
+ }
78
+
79
+ a {
80
+ color: var(--link-color);
81
+ text-decoration: none;
82
+ }
83
+
84
+ a:hover {
85
+ text-decoration: underline;
86
+ }
87
+
88
+ code {
89
+ font-family: "SFMono-Regular", Consolas, "Liberation Mono", Menlo, monospace;
90
+ background-color: var(--code-bg);
91
+ border-radius: 3px;
92
+ padding: 2px 4px;
93
+ font-size: 0.9em;
94
+ }
95
+
96
+ pre {
97
+ background-color: var(--code-block-bg);
98
+ border-radius: 6px;
99
+ padding: 16px;
100
+ overflow: auto;
101
+ border: 1px solid var(--border-color);
102
+ margin: 16px 0;
103
+ }
104
+
105
+ pre code {
106
+ background-color: transparent;
107
+ padding: 0;
108
+ border-radius: 0;
109
+ white-space: pre;
110
+ }
111
+
112
+ ul, ol {
113
+ padding-left: 2em;
114
+ }
115
+
116
+ .button {
117
+ display: inline-block;
118
+ background-color: var(--accent-color);
119
+ color: white;
120
+ padding: 8px 16px;
121
+ border-radius: 6px;
122
+ font-weight: 600;
123
+ margin: 8px 0;
124
+ }
125
+
126
+ .button:hover {
127
+ background-color: var(--accent-hover);
128
+ text-decoration: none;
129
+ }
130
+
131
+ .version-badge {
132
+ display: inline-block;
133
+ background-color: #238636;
134
+ color: white;
135
+ border-radius: 20px;
136
+ padding: 4px 10px;
137
+ font-size: 12px;
138
+ font-weight: bold;
139
+ margin-left: 10px;
140
+ }
141
+
142
+ footer {
143
+ margin-top: 40px;
144
+ text-align: center;
145
+ color: #8b949e;
146
+ font-size: 0.9em;
147
+ border-top: 1px solid var(--border-color);
148
+ padding-top: 20px;
149
+ }
150
+
151
+ .changelog {
152
+ margin-top: 30px;
153
+ }
154
+
155
+ .changelog-item {
156
+ margin-bottom: 24px;
157
+ }
158
+
159
+ .changelog-version {
160
+ font-weight: bold;
161
+ color: var(--header-color);
162
+ }
163
+
164
+ .changelog-date {
165
+ color: #8b949e;
166
+ font-size: 0.9em;
167
+ }
168
+
169
+ .changelog-list {
170
+ margin-top: 10px;
171
+ }
172
+ </style>
173
+ </head>
174
+ <body>
175
+ <div class="container">
176
+ <div class="header">
177
+ <div class="logo">G</div>
178
+ <div>
179
+ <h1>Gitingest <span class="version-badge">v0.3.0</span></h1>
180
+ <p>A Ruby gem that fetches files from a GitHub repository and generates a consolidated text prompt for LLMs</p>
181
+ </div>
182
+ </div>
183
+
184
+ <a href="https://github.com/davidesantangelo/gitingest" class="button">View on GitHub</a>
185
+ <a href="https://rubygems.org/gems/gitingest" class="button">View on RubyGems</a>
186
+
187
+ <h2>Installation</h2>
188
+
189
+ <h3>From RubyGems</h3>
190
+ <pre><code>gem install gitingest</code></pre>
191
+
192
+ <h3>From Source</h3>
193
+ <pre><code>git clone https://github.com/davidesantangelo/gitingest.git
194
+ cd gitingest
195
+ bundle install
196
+ bundle exec rake install</code></pre>
197
+
198
+ <h2>Usage</h2>
199
+
200
+ <h3>Command Line</h3>
201
+ <pre><code># Basic usage (public repository)
202
+ gitingest --repository user/repo
203
+
204
+ # With GitHub token for private repositories
205
+ gitingest --repository user/repo --token YOUR_GITHUB_TOKEN
206
+
207
+ # Specify a custom output file
208
+ gitingest --repository user/repo --output my_prompt.txt
209
+
210
+ # Specify a different branch
211
+ gitingest --repository user/repo --branch develop
212
+
213
+ # Exclude additional patterns
214
+ gitingest --repository user/repo --exclude "*.md,docs/"
215
+
216
+ # Quiet mode
217
+ gitingest --repository user/repo --quiet
218
+
219
+ # Verbose mode
220
+ gitingest --repository user/repo --verbose</code></pre>
221
+
222
+ <h4>Available Options</h4>
223
+ <ul>
224
+ <li><code>-r, --repository REPO</code>: GitHub repository (username/repo) [Required]</li>
225
+ <li><code>-t, --token TOKEN</code>: GitHub personal access token [Optional but recommended]</li>
226
+ <li><code>-o, --output FILE</code>: Output file for the prompt [Default: reponame_prompt.txt]</li>
227
+ <li><code>-e, --exclude PATTERN</code>: File patterns to exclude (comma separated)</li>
228
+ <li><code>-b, --branch BRANCH</code>: Repository branch [Default: main]</li>
229
+ <li><code>-h, --help</code>: Show help message</li>
230
+ </ul>
231
+
232
+ <h3>As a Library</h3>
233
+ <pre><code>require "gitingest"
234
+
235
+ # Basic usage
236
+ generator = Gitingest::Generator.new(
237
+ repository: "user/repo",
238
+ token: "YOUR_GITHUB_TOKEN" # optional
239
+ )
240
+ generator.run
241
+
242
+ # With custom options
243
+ generator = Gitingest::Generator.new(
244
+ repository: "user/repo",
245
+ token: "YOUR_GITHUB_TOKEN",
246
+ output_file: "my_prompt.txt",
247
+ branch: "develop",
248
+ exclude: ["*.md", "docs/"],
249
+ quiet: true # or verbose: true
250
+ )
251
+ generator.run
252
+
253
+ # With custom logger
254
+ custom_logger = Logger.new("gitingest.log")
255
+ generator = Gitingest::Generator.new(
256
+ repository: "user/repo",
257
+ logger: custom_logger
258
+ )
259
+ generator.run</code></pre>
260
+
261
+ <h2>Features</h2>
262
+ <ul>
263
+ <li>Fetches all files from a GitHub repository based on the given branch</li>
264
+ <li>Automatically excludes common binary files and system files by default</li>
265
+ <li>Allows custom exclusion patterns for specific file extensions or directories</li>
266
+ <li>Uses concurrent processing for faster downloads</li>
267
+ <li>Handles GitHub API rate limiting with automatic retry</li>
268
+ <li>Generates a clean, formatted output file with file paths and content</li>
269
+ </ul>
270
+
271
+ <h2>Default Exclusion Patterns</h2>
272
+ <p>By default, the generator excludes files and directories commonly ignored in repositories, such as:</p>
273
+ <ul>
274
+ <li>Version control files (<code>.git/</code>, <code>.svn/</code>)</li>
275
+ <li>System files (<code>.DS_Store</code>, <code>Thumbs.db</code>)</li>
276
+ <li>Log files (<code>*.log</code>, <code>*.bak</code>)</li>
277
+ <li>Images and media files (<code>*.png</code>, <code>*.jpg</code>, <code>*.mp3</code>)</li>
278
+ <li>Archives (<code>*.zip</code>, <code>*.tar.gz</code>)</li>
279
+ <li>Dependency directories (<code>node_modules/</code>, <code>vendor/</code>)</li>
280
+ <li>Compiled and binary files (<code>*.pyc</code>, <code>*.class</code>, <code>*.exe</code>)</li>
281
+ </ul>
282
+
283
+ <h2>Limitations</h2>
284
+ <ul>
285
+ <li>To prevent memory overload, only the first 1000 files will be processed</li>
286
+ <li>API requests are subject to GitHub limits (60 requests/hour without token, 5000 requests/hour with token)</li>
287
+ <li>Private repositories require a GitHub personal access token</li>
288
+ </ul>
289
+
290
+ <div class="changelog">
291
+ <h2>Changelog</h2>
292
+
293
+ <div class="changelog-item">
294
+ <div>
295
+ <span class="changelog-version">v0.3.0</span>
296
+ <span class="changelog-date">- March 2, 2025</span>
297
+ </div>
298
+ <ul class="changelog-list">
299
+ <li>Added <code>faraday-retry</code> gem dependency for better API rate limit handling</li>
300
+ <li>Implemented thread-safe buffer management with mutex locks</li>
301
+ <li>Added new <code>ProgressIndicator</code> class for better CLI progress reporting (showing percentages)</li>
302
+ <li>Improved memory efficiency with configurable buffer size</li>
303
+ <li>Enhanced code organization with dedicated methods for file content formatting</li>
304
+ <li>Added comprehensive method documentation and parameter descriptions</li>
305
+ <li>Optimized thread pool size calculation for better performance</li>
306
+ <li>Improved error handling in concurrent operations</li>
307
+ </ul>
308
+ </div>
309
+
310
+ <div class="changelog-item">
311
+ <div>
312
+ <span class="changelog-version">v0.2.0</span>
313
+ <span class="changelog-date">- March 2, 2025</span>
314
+ </div>
315
+ <ul class="changelog-list">
316
+ <li>Added support for quiet and verbose modes in the command-line interface</li>
317
+ <li>Added the ability to specify a custom output file for the prompt</li>
318
+ <li>Enhanced error handling with logging support</li>
319
+ <li>Added logging functionality with custom loggers</li>
320
+ <li>Introduced rate limit handling with retries for file fetching</li>
321
+ <li>Added repository branch support</li>
322
+ <li>Exclude specific file patterns via command-line arguments</li>
323
+ <li>Enforced a 1000 file limit to prevent memory overload</li>
324
+ </ul>
325
+ </div>
326
+
327
+ <div class="changelog-item">
328
+ <div>
329
+ <span class="changelog-version">v0.1.0</span>
330
+ <span class="changelog-date">- March 2, 2025</span>
331
+ </div>
332
+ <ul class="changelog-list">
333
+ <li>Initial release of Gitingest</li>
334
+ <li>Core functionality to fetch and process GitHub repository files</li>
335
+ <li>Command-line interface for easy interaction</li>
336
+ <li>Smart file filtering with default exclusions for common non-code files</li>
337
+ <li>Concurrent processing for improved performance</li>
338
+ <li>Custom exclude patterns support</li>
339
+ <li>GitHub authentication via access tokens</li>
340
+ <li>Automatic rate limit handling with retry mechanism</li>
341
+ <li>Repository prompt generation with file separation markers</li>
342
+ <li>Support for custom branch selection</li>
343
+ <li>Custom output file naming options</li>
344
+ </ul>
345
+ </div>
346
+ </div>
347
+
348
+ <h2>Contributing</h2>
349
+ <p>Bug reports and pull requests are welcome on GitHub at <a href="https://github.com/davidesantangelo/gitingest">https://github.com/davidesantangelo/gitingest</a>.</p>
350
+
351
+ <h2>Acknowledgements</h2>
352
+ <p>Inspired by <a href="https://github.com/cyclotruc/gitingest"><code>cyclotruc/gitingest</code></a>.</p>
353
+
354
+ <h2>License</h2>
355
+ <p>The gem is available as open source under the terms of the <a href="https://opensource.org/licenses/MIT">MIT License</a>.</p>
356
+ </div>
357
+
358
+ <footer>
359
+ <p>© 2025 David Santangelo</p>
360
+ <p>Last updated: March 2, 2025</p>
361
+ </footer>
362
+ </body>
363
+ </html>
data/lib/gitingest/generator.rb CHANGED
@@ -65,11 +65,37 @@ module Gitingest
65
65
  "\.swiftpm/", "\.build/"
66
66
  ].freeze
67
67
 
68
+ # Optimization: pattern for dot files/directories
69
+ DOT_FILE_PATTERN = %r{(?-mix:(^\.|/\.))}
70
+
68
71
  # Maximum number of files to process to prevent memory overload
69
72
  MAX_FILES = 1000
70
73
 
74
+ # Optimization: increased buffer size to reduce I/O operations
75
+ BUFFER_SIZE = 250
76
+
77
+ # Optimization: thread-local buffer threshold
78
+ LOCAL_BUFFER_THRESHOLD = 50
79
+
80
+ # Add configurable threading options
81
+ DEFAULT_THREAD_COUNT = [Concurrent.processor_count, 8].min
82
+ DEFAULT_THREAD_TIMEOUT = 60 # seconds
83
+
71
84
  attr_reader :options, :client, :repo_files, :excluded_patterns, :logger
72
85
 
86
+ # Initialize a new Generator with the given options
87
+ #
88
+ # @param options [Hash] Configuration options
89
+ # @option options [String] :repository GitHub repository in format "username/repo"
90
+ # @option options [String] :token GitHub personal access token
91
+ # @option options [String] :branch Repository branch (default: "main")
92
+ # @option options [String] :output_file Output file path
93
+ # @option options [Array<String>] :exclude Additional patterns to exclude
94
+ # @option options [Boolean] :quiet Reduce logging to errors only
95
+ # @option options [Boolean] :verbose Increase logging verbosity
96
+ # @option options [Logger] :logger Custom logger instance
97
+ # @option options [Integer] :threads Number of threads to use (default: auto-detected)
98
+ # @option options [Integer] :thread_timeout Seconds to wait for thread pool shutdown (default: 60)
73
99
  def initialize(options = {})
74
100
  @options = options
75
101
  @repo_files = []
@@ -80,6 +106,15 @@ module Gitingest
80
106
  compile_excluded_patterns
81
107
  end
82
108
 
109
+ # Main execution method
110
+ def run
111
+ fetch_repository_contents
112
+ generate_prompt
113
+ end
114
+
115
+ private
116
+
117
+ # Set up logging based on verbosity options
83
118
  def setup_logger
84
119
  @logger = @options[:logger] || Logger.new($stdout)
85
120
  @logger.level = if @options[:quiet]
@@ -89,21 +124,23 @@ module Gitingest
89
124
  else
90
125
  Logger::INFO
91
126
  end
92
- # Semplifica il formato del logger per la riga di comando
127
+ # Simplify logger format for command line usage
93
128
  @logger.formatter = proc { |severity, _, _, msg| "#{severity == "INFO" ? "" : "[#{severity}] "}#{msg}\n" }
94
129
  end
95
130
 
96
- ### Option Validation
131
+ # Validate and set default options
97
132
  def validate_options
98
133
  raise ArgumentError, "Repository is required" unless @options[:repository]
99
134
 
100
135
  @options[:output_file] ||= "#{@options[:repository].split("/").last}_prompt.txt"
101
136
  @options[:branch] ||= "main"
102
137
  @options[:exclude] ||= []
138
+ @options[:threads] ||= DEFAULT_THREAD_COUNT
139
+ @options[:thread_timeout] ||= DEFAULT_THREAD_TIMEOUT
103
140
  @excluded_patterns = DEFAULT_EXCLUDES + @options[:exclude]
104
141
  end
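The two new options are filled in with `||=`, so a caller only overrides what it passes explicitly, and a `nil` value falls back to the default as well. The default thread count is now capped at 8, whereas the removed code used `[Concurrent.processor_count, 5].max`. A minimal sketch of that defaulting follows; `with_thread_defaults` is a hypothetical helper, and `Etc.nprocessors` stands in for `Concurrent.processor_count` so that the snippet needs no gems.

```ruby
require "etc"

# Assumption: Etc.nprocessors replaces Concurrent.processor_count here,
# purely to keep the sketch free of gem dependencies.
DEFAULT_THREAD_COUNT = [Etc.nprocessors, 8].min
DEFAULT_THREAD_TIMEOUT = 60 # seconds

# Hypothetical helper mirroring the `||=` defaults in validate_options.
# It works on a copy so the caller's hash is left untouched.
def with_thread_defaults(options)
  resolved = options.dup
  resolved[:threads] ||= DEFAULT_THREAD_COUNT
  resolved[:thread_timeout] ||= DEFAULT_THREAD_TIMEOUT
  resolved
end

puts with_thread_defaults({ threads: 2 }).inspect
```

Because `||=` treats `nil` and `false` alike, passing `threads: nil` behaves the same as omitting the key.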
105
142
 
106
- ### Client Configuration
143
+ # Configure the GitHub API client
107
144
  def configure_client
108
145
  @client = @options[:token] ? Octokit::Client.new(access_token: @options[:token]) : Octokit::Client.new
109
146
 
@@ -115,17 +152,17 @@ module Gitingest
115
152
  end
116
153
  end
117
154
 
155
+ # Optimization: Create a combined regex for faster exclusion checking
118
156
  def compile_excluded_patterns
119
- @excluded_patterns = @excluded_patterns.map { |pattern| Regexp.new(pattern) }
157
+ patterns = @excluded_patterns.map { |pattern| "(#{pattern})" }
158
+ @combined_exclude_regex = Regexp.new("#{DOT_FILE_PATTERN.source}|#{patterns.join("|")}")
120
159
  end
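The change above replaces a list of compiled patterns with one alternation, so each path is matched once instead of once per pattern. The sketch below shows the same idea in isolation. `SAMPLE_EXCLUDES` is a shortened, hypothetical pattern list (the gem's `DEFAULT_EXCLUDES` is longer), and `build_exclude_regex` and `excluded?` are illustrative names.

```ruby
# Matches a leading dot or a dot directly after a slash (dot files/directories).
DOT_FILE_PATTERN = %r{(^\.|/\.)}

# Hypothetical, shortened pattern list used only for this illustration.
SAMPLE_EXCLUDES = ['\.log$', 'node_modules/', '\.png$'].freeze

# Wrap each pattern in a group and join everything into a single alternation.
def build_exclude_regex(patterns)
  grouped = patterns.map { |pattern| "(#{pattern})" }
  Regexp.new("#{DOT_FILE_PATTERN.source}|#{grouped.join("|")}")
end

EXCLUDE_REGEX = build_exclude_regex(SAMPLE_EXCLUDES)

# A path is excluded when any branch of the alternation matches.
def excluded?(path)
  path.match?(EXCLUDE_REGEX)
end

%w[.github/ci.yml src/.cache/x.rb log/app.log lib/app.rb].each do |path|
  puts "#{path}: #{excluded?(path)}"
end
```

Grouping each pattern in parentheses keeps an alternation inside one user-supplied pattern from leaking into its neighbours.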
121
160
 
122
- ### Fetch Repository Contents
161
+ # Fetch repository contents and apply exclusion filters
123
162
  def fetch_repository_contents
124
163
  @logger.info "Fetching repository: #{@options[:repository]} (branch: #{@options[:branch]})"
125
164
  begin
126
- # First validate authentication and repository access
127
165
  validate_repository_access
128
-
129
166
  repo_tree = @client.tree(@options[:repository], @options[:branch], recursive: true)
130
167
  @repo_files = repo_tree.tree.select { |item| item.type == "blob" && !excluded_file?(item.path) }
131
168
 
@@ -143,8 +180,8 @@ module Gitingest
143
180
  end
144
181
  end
145
182
 
183
+ # Validate repository and branch access
146
184
  def validate_repository_access
147
- # Check if we can access the repository
148
185
  begin
149
186
  @client.repository(@options[:repository])
150
187
  rescue Octokit::Unauthorized
@@ -153,7 +190,6 @@ module Gitingest
153
190
  raise "Repository '#{@options[:repository]}' not found or is private. Check the repository name or provide a valid token."
154
191
  end
155
192
 
156
- # Check if the branch exists
157
193
  begin
158
194
  @client.branch(@options[:repository], @options[:branch])
159
195
  rescue Octokit::NotFound
@@ -161,68 +197,223 @@ module Gitingest
161
197
  end
162
198
  end
163
199
 
200
+ # Optimization: Optimized file exclusion check with combined regex
164
201
  def excluded_file?(path)
165
- return true if path.start_with?(".") || path.split("/").any? { |part| part.start_with?(".") }
166
-
167
- @excluded_patterns.any? { |pattern| path.match?(pattern) }
202
+ path.match?(@combined_exclude_regex)
168
203
  end
169
204
 
170
- ### Generate Prompt
205
+ # Generate the consolidated prompt file with optimized threading
171
206
  def generate_prompt
172
207
  @logger.info "Generating prompt..."
173
- Concurrent::Array.new(@repo_files)
208
+ @logger.debug "Using thread pool with #{@options[:threads]} threads"
209
+
174
210
  buffer = []
175
- buffer_size = 100 # Write every 100 files to reduce I/O
211
+ progress = ProgressIndicator.new(@repo_files.size, @logger)
212
+
213
+ # Optimization: thread-local buffers to reduce mutex contention
214
+ thread_buffers = {}
215
+ mutex = Mutex.new
216
+ errors = []
217
+
218
+ # Dynamic thread pool based on configuration
219
+ pool = Concurrent::FixedThreadPool.new(@options[:threads])
176
220
 
177
- # Dynamic thread pool based on core count
178
- pool = Concurrent::FixedThreadPool.new([Concurrent.processor_count, 5].max)
221
+ # Group files by priority (smaller files first for better parallelism)
222
+ prioritized_files = prioritize_files(@repo_files)
179
223
 
180
224
  File.open(@options[:output_file], "w") do |file|
181
- @repo_files.each_with_index do |repo_file, index|
225
+ prioritized_files.each_with_index do |repo_file, index|
182
226
  pool.post do
183
- content = fetch_file_content_with_retry(repo_file.path)
184
- result = <<~TEXT
185
- ================================================================
186
- File: #{repo_file.path}
187
- ================================================================
188
- #{content}
189
-
190
- TEXT
191
- buffer << result
192
- write_buffer(file, buffer) if buffer.size >= buffer_size
193
- print "\rProcessing: #{index + 1}/#{@repo_files.size} files"
194
- rescue Octokit::Error => e
195
- @logger.error "Error fetching #{repo_file.path}: #{e.message}"
227
+ # Optimization: Use thread-local buffers
228
+ thread_id = Thread.current.object_id
229
+ thread_buffers[thread_id] ||= []
230
+ local_buffer = thread_buffers[thread_id]
231
+
232
+ begin
233
+ content = fetch_file_content_with_retry(repo_file.path)
234
+ result = format_file_content(repo_file.path, content)
235
+ local_buffer << result
236
+
237
+ # Optimization: Only acquire mutex when local buffer reaches threshold
238
+ if local_buffer.size >= LOCAL_BUFFER_THRESHOLD
239
+ mutex.synchronize do
240
+ buffer.concat(local_buffer)
241
+ write_buffer(file, buffer) if buffer.size >= BUFFER_SIZE
242
+ local_buffer.clear
243
+ end
244
+ end
245
+
246
+ progress.update(index + 1)
247
+ rescue Octokit::Error => e
248
+ mutex.synchronize do
249
+ errors << "Error fetching #{repo_file.path}: #{e.message}"
250
+ @logger.error "Error fetching #{repo_file.path}: #{e.message}"
251
+ end
252
+ rescue StandardError => e
253
+ mutex.synchronize do
254
+ errors << "Unexpected error processing #{repo_file.path}: #{e.message}"
255
+ @logger.error "Unexpected error processing #{repo_file.path}: #{e.message}"
256
+ end
257
+ end
196
258
  end
197
259
  end
198
- pool.shutdown
199
- pool.wait_for_termination
200
- write_buffer(file, buffer) unless buffer.empty?
260
+
261
+ begin
262
+ pool.shutdown
263
+ wait_success = pool.wait_for_termination(@options[:thread_timeout])
264
+
265
+ unless wait_success
266
+ @logger.warn "Thread pool did not shut down within #{@options[:thread_timeout]} seconds, forcing termination"
267
+ pool.kill
268
+ end
269
+ rescue StandardError => e
270
+ @logger.error "Error during thread pool shutdown: #{e.message}"
271
+ end
272
+
273
+ # Process remaining files in thread-local buffers
274
+ mutex.synchronize do
275
+ thread_buffers.each_value do |local_buffer|
276
+ buffer.concat(local_buffer) unless local_buffer.empty?
277
+ end
278
+ write_buffer(file, buffer) unless buffer.empty?
279
+ end
201
280
  end
202
- @logger.info "\nPrompt generated and saved to #{@options[:output_file]}"
281
+
282
+ if errors.any?
283
+ @logger.warn "Completed with #{errors.size} errors"
284
+ @logger.debug "First few errors: #{errors.first(3).join(", ")}" if @logger.debug?
285
+ end
286
+
287
+ @logger.info "Prompt generated and saved to #{@options[:output_file]}"
203
288
  end
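The rewritten `generate_prompt` has each worker append to its own buffer and take the mutex only when that buffer reaches `LOCAL_BUFFER_THRESHOLD`, followed by a final flush of whatever remains. The sketch below illustrates the pattern with plain `Thread` and `Queue` from the standard library rather than `Concurrent::FixedThreadPool`. It is a variation, not the gem's code: the local buffer is an ordinary local variable instead of an entry in a shared hash, and `collect_in_batches` and `LOCAL_THRESHOLD` are hypothetical names.

```ruby
LOCAL_THRESHOLD = 3 # small illustrative value; the gem's constant is 50

# Process items on several threads, merging results into a shared array
# in batches so the mutex is taken once per batch rather than once per item.
def collect_in_batches(items, workers: 4)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # pop returns nil once the closed queue is drained

  shared = []
  mutex = Mutex.new

  threads = Array.new(workers) do
    Thread.new do
      local = [] # visible only to this thread, so no locking is needed
      while (item = queue.pop)
        local << "processed #{item}"
        next if local.size < LOCAL_THRESHOLD

        mutex.synchronize { shared.concat(local) }
        local.clear
      end
      # Final flush of the partial batch left in this worker.
      mutex.synchronize { shared.concat(local) } unless local.empty?
    end
  end

  threads.each(&:join)
  shared
end

puts collect_in_batches((1..10).to_a).size
```

The order of the merged results depends on thread scheduling, so a caller that needs a stable order would have to sort or index the results afterwards.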
204
289
 
205
- def fetch_file_content_with_retry(path, retries = 3)
290
+ # Format a file's content for the prompt
291
+ def format_file_content(path, content)
292
+ <<~TEXT
293
+ ================================================================
294
+ File: #{path}
295
+ ================================================================
296
+ #{content}
297
+
298
+ TEXT
299
+ end
300
+
301
+ # Optimization: Fetch file content with exponential backoff for rate limiting
302
+ def fetch_file_content_with_retry(path, retries = 3, base_delay = 2)
206
303
  content = @client.contents(@options[:repository], path: path, ref: @options[:branch])
207
304
  Base64.decode64(content.content)
208
305
  rescue Octokit::TooManyRequests
209
306
  raise unless retries.positive?
210
307
 
211
- sleep_time = 60 / retries
212
- @logger.warn "Rate limit exceeded, waiting #{sleep_time} seconds..."
213
- sleep(sleep_time)
214
- fetch_file_content_with_retry(path, retries - 1)
308
+ # Optimization: Exponential backoff with jitter for better rate limit handling
309
+ delay = base_delay**(4 - retries) * (0.8 + 0.4 * rand)
310
+ @logger.warn "Rate limit exceeded, waiting #{delay.round(1)} seconds..."
311
+ sleep(delay)
312
+ fetch_file_content_with_retry(path, retries - 1, base_delay)
215
313
  end
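With the default `retries = 3` and `base_delay = 2`, the new formula gives nominal waits of 2, 4, and 8 seconds, each scaled by a random factor between 0.8 and 1.2. The removed code waited `60 / retries`, that is 20, 30, and 60 seconds. The sketch below isolates the delay calculation; `backoff_bounds` and `backoff_delay` are hypothetical helper names, and the injectable `random:` argument is an addition for illustration.

```ruby
# Range of possible delays (in seconds) for a given number of retries left.
def backoff_bounds(retries_left, base_delay = 2)
  nominal = base_delay**(4 - retries_left)
  [nominal * 0.8, nominal * 1.2]
end

# Same formula as the diff: exponential growth with +/-20% jitter.
def backoff_delay(retries_left, base_delay = 2, random: Random.new)
  base_delay**(4 - retries_left) * (0.8 + 0.4 * random.rand)
end

[3, 2, 1].each do |retries_left|
  low, high = backoff_bounds(retries_left)
  puts "retries left #{retries_left}: between #{low.round(1)}s and #{high.round(1)}s"
end
```

The jitter spreads out retries from threads that hit the rate limit at the same moment, so they are less likely to retry in lockstep.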
216
314
 
315
+ # Write buffer contents to file and clear buffer
217
316
  def write_buffer(file, buffer)
317
+ return if buffer.empty?
318
+
218
319
  file.puts(buffer.join)
219
320
  buffer.clear
220
321
  end
221
322
 
222
- ### Main Execution
223
- def run
224
- fetch_repository_contents
225
- generate_prompt
323
+ # Sort files by estimated processing priority
324
+ def prioritize_files(files)
325
+ # Sort files by estimated size (based on extension)
326
+ # This helps with better thread distribution - process small files first
327
+ files.sort_by do |file|
328
+ path = file.path.downcase
329
+ if path.end_with?(".md", ".txt", ".json", ".yaml", ".yml")
330
+ 0 # Process documentation and config files first (usually small)
331
+ elsif path.end_with?(".rb", ".py", ".js", ".ts", ".go", ".java", ".c", ".cpp", ".h")
332
+ 1 # Then process code files (medium size)
333
+ else
334
+ 2 # Other files last
335
+ end
336
+ end
337
+ end
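`prioritize_files` uses the file extension as a rough proxy for size. Ruby's `sort_by` is not guaranteed to be stable, so files within the same priority bucket may come out in a different relative order than the tree listing. The sketch below is a variation that adds the original index as a tie-breaker; `RepoFile`, `priority_for`, and `prioritize` are hypothetical names, with `RepoFile` standing in for the tree entries returned by the API client.

```ruby
# Hypothetical stand-in for a repository tree entry; only #path is used.
RepoFile = Struct.new(:path)

DOC_EXTENSIONS = %w[.md .txt .json .yaml .yml].freeze
CODE_EXTENSIONS = %w[.rb .py .js .ts .go .java .c .cpp .h].freeze

# 0 = documentation/config, 1 = source code, 2 = everything else.
def priority_for(path)
  path = path.downcase
  return 0 if path.end_with?(*DOC_EXTENSIONS)
  return 1 if path.end_with?(*CODE_EXTENSIONS)

  2
end

# Sort by priority, using the original position to keep ties in input order.
def prioritize(files)
  files.each_with_index
       .sort_by { |file, index| [priority_for(file.path), index] }
       .map(&:first)
end

files = %w[lib/a.rb README.md bin/tool config.yml].map { |path| RepoFile.new(path) }
puts prioritize(files).map(&:path).inspect
```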
338
+ end
339
+
340
+ # Helper class for showing progress in CLI with visual bar
341
+ class ProgressIndicator
342
+ BAR_WIDTH = 30 # Width of the progress bar
343
+
344
+ def initialize(total, logger)
345
+ @total = total
346
+ @logger = logger
347
+ @last_percent = 0
348
+ @start_time = Time.now
349
+ @last_update_time = Time.now
350
+ @update_interval = 0.5 # Limit updates to twice per second
351
+ end
352
+
353
+ # Update progress with visual bar
354
+ def update(current)
355
+ # Avoid updating too frequently
356
+ now = Time.now
357
+ return if now - @last_update_time < @update_interval && current != @total
358
+
359
+ @last_update_time = now
360
+ percent = (current.to_f / @total * 100).round
361
+
362
+ # Only update at meaningful increments or completion
363
+ return unless percent > @last_percent || current == @total
364
+
365
+ elapsed = now - @start_time
366
+
367
+ # Generate progress bar
368
+ progress_chars = (BAR_WIDTH * (current.to_f / @total)).round
369
+ bar = "[#{"|" * progress_chars}#{" " * (BAR_WIDTH - progress_chars)}]"
370
+
371
+ # Calculate ETA
372
+ eta_string = ""
373
+ if current > 1 && percent < 100
374
+ remaining = (elapsed / current) * (@total - current)
375
+ eta_string = " ETA: #{format_time(remaining)}"
376
+ end
377
+
378
+ # Calculate rate (files per second)
379
+ rate = begin
380
+ current / elapsed
381
+ rescue StandardError
382
+ 0
383
+ end
384
+ rate_string = " (#{rate.round(1)} files/sec)"
385
+
386
+ # Clear line and print progress bar
387
+ print "\r\e[K" # Clear the line
388
+ print "#{bar} #{percent}% | #{current}/#{@total} files#{rate_string}#{eta_string}"
389
+ print "\n" if current == @total # Add newline when complete
390
+
391
+ # Also log to logger at less frequent intervals
392
+ if (percent % 10).zero? && percent != @last_percent || current == @total
393
+ @logger.info "Processing: #{percent}% complete (#{current}/#{@total} files)#{eta_string}"
394
+ end
395
+
396
+ @last_percent = percent
397
+ end
398
+
399
+ private
400
+
401
+ # Format seconds into a human-readable time string
402
+ def format_time(seconds)
403
+ return "< 1s" if seconds < 1
404
+
405
+ case seconds
406
+ when 0...60
407
+ "#{seconds.round}s"
408
+ when 60...3600
409
+ minutes = (seconds / 60).floor
410
+ secs = (seconds % 60).round
411
+ "#{minutes}m #{secs}s"
412
+ else
413
+ hours = (seconds / 3600).floor
414
+ minutes = ((seconds % 3600) / 60).floor
415
+ "#{hours}h #{minutes}m"
416
+ end
226
417
  end
227
418
  end
228
419
  end
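The ETA shown by `ProgressIndicator` is a linear extrapolation: average time per processed file multiplied by the number of files left. The sketch below reproduces that arithmetic on its own. `estimate_remaining` and `format_duration` are hypothetical names, and `format_duration` deliberately differs from the gem's `format_time` by rounding to whole seconds before choosing a unit, so a value such as 59.6 seconds is shown as `1m 0s` rather than `60s`.

```ruby
# Linear ETA: average seconds per finished item times items remaining.
def estimate_remaining(elapsed, current, total)
  (elapsed / current) * (total - current)
end

# Human-readable duration; rounds first so the seconds field never shows 60.
def format_duration(seconds)
  return "< 1s" if seconds < 1

  total = seconds.round
  if total < 60
    "#{total}s"
  elsif total < 3600
    "#{total / 60}m #{total % 60}s"
  else
    "#{total / 3600}h #{(total % 3600) / 60}m"
  end
end

# 100 of 400 files done after 30 seconds.
remaining = estimate_remaining(30.0, 100, 400)
puts "ETA: #{format_duration(remaining)}"
```

Because `progress.update(index + 1)` in the diff receives the submission index rather than a count of completed files, the figures it displays are only approximate when jobs finish out of order.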
data/lib/gitingest/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Gitingest
4
- VERSION = "0.2.0"
4
+ VERSION = "0.3.1"
5
5
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: gitingest
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.3.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Davide Santangelo
@@ -24,6 +24,20 @@ dependencies:
24
24
  - - "~>"
25
25
  - !ruby/object:Gem::Version
26
26
  version: '1.1'
27
+ - !ruby/object:Gem::Dependency
28
+ name: faraday-retry
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '2.0'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '2.0'
27
41
  - !ruby/object:Gem::Dependency
28
42
  name: octokit
29
43
  requirement: !ruby/object:Gem::Requirement
@@ -117,6 +131,7 @@ files:
117
131
  - bin/console
118
132
  - bin/gitingest
119
133
  - bin/setup
134
+ - index.html
120
135
  - lib/gitingest.rb
121
136
  - lib/gitingest/generator.rb
122
137
  - lib/gitingest/version.rb