gitingest 0.7.1 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 8823c78db091723b50cdeebcb51cc716315d2a3351e295badfa80b0cb48270f7
-   data.tar.gz: 530854838bff4d35f40a22d355cf256db244dbedd0a0be70a88f067e19f9ec5e
+   metadata.gz: 53327f20d859b72079399395ab4ac9f31cd045d65c9e5f40cd771192b8da7298
+   data.tar.gz: 52eaa4f2d759bf50ce8fcfd8174005ac61102dd9989455828fc35f62b6c6c244
  SHA512:
-   metadata.gz: 3fb379041c49627197e47fa3df1b59f6a1a553772e2a162fb38fe89d6eca61f5b0f6cdf1c9ccaf5857fab709b6d7eb3d773375d2db60f409a6df6dff6823fffa
-   data.tar.gz: caa01a9ea924ec97127c8e75a9d1abb3511a93aef3f8a21714910dde7c133f4ffccfea35004d49d1d04d87f844807d596bdf2a956667d1647e0737db084e3e20
+   metadata.gz: '0832d2d8f27bf92b3117f1f60e42dfdba72846dc183f9f1b484da55f81d36afb0d8b2c8feea4fdb7ad3c33954f8a60c57dc5f49b9550ceda43bc963b032cda8d'
+   data.tar.gz: f0841925b5cc53ed8a57af3a9354d5b259af0e078388d31da68faf54f28cb693ebcb215e2ab3f9b0fa42bcbbdafaf2719e3b7a11764dbf27b70a9e9f06b3fac8
data/CHANGELOG.md CHANGED
@@ -1,13 +1,23 @@
  # Changelog

+ ## [1.0.0] - 2025-11-28
+
+ ### Changed
+
+ - **Major Refactor**: Decomposed the monolithic `Generator` class into smaller, single-responsibility components (`ExclusionFilter`, `RepositoryFetcher`, `ContentFetcher`, `ProgressIndicator`) for better maintainability and testability.
+ - **Performance**: Optimized GitHub API usage by switching from path-based content fetching to SHA-based blob fetching, significantly reducing API overhead and improving speed.
+ - **Internal**: Standardized logging and error handling across all new components.
+
  ## [0.7.1] - 2025-06-20

  ### Changed
+
  - Refactored file prioritization logic to use a `case` statement for improved readability and maintainability.

  ## [0.7.0] - 2025-06-04

  ### Changed
+
  - Improved file exclusion logic for glob patterns to correctly match files at any directory depth (e.g., `*.md` now correctly matches `docs/file.md`).
  - Refined internal handling of exclusion patterns for clarity and robustness, using `File.fnmatch` for all custom glob patterns.
  - Enhanced debug logging for file exclusion to show the specific pattern that caused a match.
@@ -15,22 +25,26 @@
  ## [0.6.3] - 2025-04-14

  ### Fixed
+
  - Fixed directory exclusion pattern to properly handle paths ending with slash

  ## [0.6.2] - 2025-04-11

  ### Changed
+
  - Updated Octokit dependency from ~> 5.0 to ~> 9.0
  - Updated various gem dependencies to their latest versions

  ## [0.6.1] - 2025-03-26

  ### Fixed
+
  - Fixed error "target of repeat operator is not specified" when using `--exclude` with glob patterns like `*.md`

  ## [0.6.0] - 2025-03-18

  ### Changed
+
  - Improved default branch handling to use repository's actual default branch instead of hardcoding "main"
  - Enhanced error handling in repository access validation
  - Updated documentation to reflect the correct default branch behavior
@@ -39,29 +53,34 @@
  ## [0.5.0] - 2025-03-10

  ### Added
+
  - Added repository directory structure visualization with `--show-structure` / `-s` option
  - Created `DirectoryStructureBuilder` class to generate tree views of repositories
  - Added `generate_directory_structure` method to the Generator class
  - Added tests for directory structure visualization

  ### Changed
+
  - Enhanced documentation with directory structure visualization examples
  - Updated CLI help with the new option

  ## [0.4.0] - 2025-03-03

  ### Added
+
  - Added `generate_prompt` method for in-memory content generation without file I/O
  - Integrated visual progress bar with file processing rate reporting
  - Added human-readable time formatting for progress estimates
  - Enhanced test coverage for multithreaded operations

  ### Changed
+
  - Refactored `process_content_to_output` for better code reuse between file and string output
  - Improved thread management to handle various error conditions more gracefully
  - Enhanced documentation with programmatic usage examples

  ### Fixed
+
  - Resolved thread pool shutdown issues in test environment
  - Fixed race conditions in progress indicator updates
  - Addressed timing inconsistencies in multithreaded test scenarios
@@ -69,6 +88,7 @@
  ## [0.3.1] - 2025-03-03

  ### Added
+
  - Introduced configurable threading options:
    - `:threads` to specify the number of threads (default: auto-detected).
    - `:thread_timeout` to define thread pool shutdown timeout (default: 60 seconds).
@@ -77,18 +97,21 @@
  - Improved progress indicator with a visual progress bar and estimated time remaining.

  ### Changed
+
  - Increased `BUFFER_SIZE` from 100 to 250 to reduce I/O operations.
  - Optimized file exclusion check using a combined regex for faster matching.
  - Improved thread pool efficiency by prioritizing smaller files first.
  - Enhanced error handling with detailed logging and thread-safe error collection.

  ### Fixed
+
  - Ensured thread pool shutdown respects the configured timeout.
  - Resolved potential race conditions in file content retrieval.

  ## [0.3.0] - 2025-03-02

  ### Added
+
  - Added `faraday-retry` gem dependency for better API rate limit handling.
  - Implemented thread-safe buffer management with mutex locks.
  - Introduced `ProgressIndicator` class for enhanced CLI progress reporting, including percentages.
@@ -101,6 +124,7 @@
  ## [0.2.0] - 2025-03-02

  ### Added
+
  - Introduced support for quiet and verbose modes in the command-line interface.
  - Added the ability to specify a custom output file for the prompt.
  - Implemented enhanced error handling with logging support.
@@ -114,6 +138,7 @@
  ## [0.1.0] - 2025-03-02

  ### Added
+
  - Initial release of Gitingest.
  - Core functionality to fetch and process GitHub repository files.
  - Command-line interface for easy interaction.
@@ -124,4 +149,4 @@
  - Automatic rate limit handling with a retry mechanism.
  - Repository prompt generation with file separation markers.
  - Support for custom branch selection.
- - Custom output file naming options.
+ - Custom output file naming options.
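
The 1.0.0 entry above describes a switch from path-based content fetching to SHA-based blob fetching. As a rough sketch of the decoding step only, assuming a blob object that responds to `content` and `encoding`, the branch on the reported encoding could look like the following. `FakeBlob` and `decode_blob` are hypothetical stand-ins for illustration and are not part of the gem; no network call is made.

```ruby
require "base64"

# Hypothetical stand-in for the object a GitHub client returns for a blob.
FakeBlob = Struct.new(:content, :encoding)

# Decode blob content according to its reported encoding.
# Anything that is not base64 is passed through unchanged.
def decode_blob(blob)
  case blob.encoding
  when "base64"
    Base64.decode64(blob.content)
  else
    blob.content
  end
end

encoded = FakeBlob.new(Base64.encode64("puts 'hello'\n"), "base64")
plain   = FakeBlob.new("plain text", "utf-8")

puts decode_blob(encoded)
puts decode_blob(plain)
```

Because a blob is addressed by the SHA already present in the repository tree listing, this step needs no path or branch reference.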
data/README.md CHANGED
@@ -28,7 +28,7 @@ bundle exec rake install

  ```bash
  # Basic usage (public repository)
- gitingest --repository user/repo
+ gitingest --repository user/repo

  # With GitHub token for private repositories
  gitingest --repository user/repo --token YOUR_GITHUB_TOKEN
@@ -109,7 +109,7 @@ generator = Gitingest::Generator.new(
    token: "YOUR_GITHUB_TOKEN",
    output_file: "my_prompt.txt",
    branch: "develop",
-   exclude: ["*.md", "docs/"],
+   exclude: ["*.md", "docs/"],
    threads: 4, # control concurrency
    thread_timeout: 120, # custom thread timeout
    quiet: true # or verbose: true
@@ -126,11 +126,13 @@ generator = Gitingest::Generator.new(
  ## Features

  - Fetches all files from a GitHub repository based on the given branch
+ - **High Performance**: Optimized API usage with SHA-based blob fetching for faster content retrieval
  - Automatically excludes common binary files and system files by default
  - Allows custom exclusion patterns for specific file extensions or directories
  - Uses concurrent processing for faster downloads
  - Handles GitHub API rate limiting with automatic retry
  - Generates a clean, formatted output file with file paths and content
+ - **Modular Architecture**: Clean, maintainable codebase with single-responsibility components

  ## Default Exclusion Patterns

@@ -0,0 +1,126 @@
+ # frozen_string_literal: true
+
+ require "concurrent"
+ require "base64"
+
+ module Gitingest
+   class ContentFetcher
+     BUFFER_SIZE = 250
+     LOCAL_BUFFER_THRESHOLD = 50
+     DEFAULT_THREAD_COUNT = [Concurrent.processor_count, 8].min
+     DEFAULT_THREAD_TIMEOUT = 60 # seconds
+
+     def initialize(client, repository, files, logger, options = {})
+       @client = client
+       @repository = repository
+       @files = files
+       @logger = logger
+       @threads = options[:threads] || DEFAULT_THREAD_COUNT
+       @thread_timeout = options[:thread_timeout] || DEFAULT_THREAD_TIMEOUT
+     end
+
+     def fetch(output)
+       @logger.debug "Using thread pool with #{@threads} threads"
+       buffer = []
+       progress = ProgressIndicator.new(@files.size, @logger)
+       thread_buffers = Concurrent::Map.new
+       mutex = Mutex.new
+       errors = Concurrent::Array.new
+       pool = Concurrent::FixedThreadPool.new(@threads)
+       prioritized_files = prioritize_files(@files)
+
+       prioritized_files.each_with_index do |repo_file, index|
+         pool.post do
+           thread_id = Thread.current.object_id
+           thread_buffers[thread_id] ||= []
+           local_buffer = thread_buffers[thread_id]
+           begin
+             content = fetch_file_content_with_retry(repo_file.sha)
+             local_buffer << format_file_content(repo_file.path, content)
+             if local_buffer.size >= LOCAL_BUFFER_THRESHOLD
+               mutex.synchronize do
+                 buffer.concat(local_buffer)
+                 write_buffer(output, buffer) if buffer.size >= BUFFER_SIZE
+                 local_buffer.clear
+               end
+             end
+             progress.update(index + 1)
+           rescue Octokit::Error => e
+             mutex.synchronize { errors << "Error fetching #{repo_file.path}: #{e.message}" }
+             @logger.error "Error fetching #{repo_file.path}: #{e.message}"
+           rescue StandardError => e
+             mutex.synchronize { errors << "Unexpected error processing #{repo_file.path}: #{e.message}" }
+             @logger.error "Unexpected error processing #{repo_file.path}: #{e.message}"
+           end
+         end
+       end
+
+       pool.shutdown
+       unless pool.wait_for_termination(@thread_timeout)
+         @logger.warn "Thread pool did not shut down gracefully within #{@thread_timeout}s, forcing termination."
+         pool.kill
+       end
+
+       mutex.synchronize do
+         thread_buffers.each_value { |local_buffer| buffer.concat(local_buffer) unless local_buffer.empty? }
+         write_buffer(output, buffer) unless buffer.empty?
+       end
+
+       return unless errors.any?
+
+       @logger.warn "Completed with #{errors.size} errors"
+       @logger.debug "First few errors: #{errors.first(3).join(", ")}" if @logger.debug?
+     end
+
+     private
+
+     def format_file_content(path, content)
+       <<~TEXT
+         ================================================================
+         File: #{path}
+         ================================================================
+         #{content}
+
+       TEXT
+     end
+
+     def fetch_file_content_with_retry(sha, retries = 3, base_delay = 2)
+       blob = @client.blob(@repository, sha)
+       content = blob.content
+       case blob.encoding
+       when "base64"
+         Base64.decode64(content)
+       else
+         content
+       end
+     rescue Octokit::TooManyRequests
+       raise unless retries.positive?
+
+       delay = base_delay**(4 - retries) * (0.8 + 0.4 * rand)
+       @logger.warn "Rate limit exceeded, waiting #{delay.round(1)} seconds..."
+       sleep(delay)
+       fetch_file_content_with_retry(sha, retries - 1, base_delay)
+     end
+
+     def write_buffer(file, buffer)
+       return if buffer.empty?
+
+       file.puts(buffer.join)
+       buffer.clear
+     end
+
+     def prioritize_files(files)
+       files.sort_by do |file|
+         ext = File.extname(file.path.downcase)
+         case ext
+         when ".md", ".txt", ".json", ".yaml", ".yml"
+           0 # Documentation and data files first
+         when ".rb", ".py", ".js", ".ts", ".go", ".java", ".c", ".cpp", ".h"
+           1 # Source code files second
+         else
+           2 # Other files last
+         end
+       end
+     end
+   end
+ end
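
The retry path in `fetch_file_content_with_retry` waits `base_delay**(4 - retries) * (0.8 + 0.4 * rand)` seconds before trying again. With the default arguments that works out to roughly 2, 4 and then 8 seconds, each scaled by a random factor between 0.8 and 1.2. The sketch below isolates that arithmetic; `backoff_delay` and `backoff_bounds` are hypothetical helper names introduced here for illustration, and `jitter` stands in for the `rand` call.

```ruby
# Hypothetical helper that mirrors the delay expression used in
# fetch_file_content_with_retry; `jitter` stands in for `rand`.
def backoff_delay(retries, base_delay = 2, jitter = rand)
  base_delay**(4 - retries) * (0.8 + 0.4 * jitter)
end

# Lower and (approximate) upper bound of the delay for a given number
# of remaining retries, obtained by pinning the jitter to 0.0 and 1.0.
def backoff_bounds(retries, base_delay = 2)
  [backoff_delay(retries, base_delay, 0.0), backoff_delay(retries, base_delay, 1.0)]
end

3.downto(1) do |retries|
  low, high = backoff_bounds(retries)
  puts "retries left: #{retries} -> wait between #{low.round(1)}s and #{high.round(1)}s"
end
```

The jitter spreads out retries from concurrent worker threads so that they are less likely to hit the API again at the same instant.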
@@ -0,0 +1,113 @@
+ # frozen_string_literal: true
+
+ module Gitingest
+   class ExclusionFilter
+     # Default exclusion patterns for common files and directories
+     DEFAULT_EXCLUDES = [
+       # Version control
+       '\.git/', '\.github/', '\.gitignore', '\.gitattributes', '\.gitmodules', '\.svn', '\.hg',
+
+       # System files
+       '\.DS_Store', 'Thumbs\.db', 'desktop\.ini',
+
+       # Log files
+       '.*\.log$', '.*\.bak$', '.*\.swp$', '.*\.tmp$', '.*\.temp$',
+
+       # Images and media
+       '.*\.png$', '.*\.jpg$', '.*\.jpeg$', '.*\.gif$', '.*\.svg$', '.*\.ico$',
+       '.*\.pdf$', '.*\.mov$', '.*\.mp4$', '.*\.mp3$', '.*\.wav$',
+
+       # Archives
+       '.*\.zip$', '.*\.tar\.gz$',
+
+       # Dependency directories
+       "node_modules/", "vendor/", "bower_components/", "\.npm/", "\.yarn/", "\.pnpm-store/",
+       "\.bundle/", "vendor/bundle", "packages/", "site-packages/",
+
+       # Virtual environments
+       "venv/", "\.venv/", "env/", "\.env", "virtualenv/",
+
+       # IDE and editor files
+       "\.idea/", "\.vscode/", "\.vs/", "\.settings/", ".*\.sublime-.*",
+       "\.project", "\.classpath", "xcuserdata/", ".*\.xcodeproj/", ".*\.xcworkspace/",
+
+       # Lock files
+       "package-lock\.json", "yarn\.lock", "poetry\.lock", "Pipfile\.lock",
+       "Gemfile\.lock", "Cargo\.lock", "bun\.lock", "bun\.lockb",
+
+       # Build directories and artifacts
+       "build/", "dist/", "target/", "out/", "\.gradle/", "\.settings/",
+       ".*\.egg-info", ".*\.egg", ".*\.whl", ".*\.so", "bin/", "obj/", "pkg/",
+
+       # Cache directories
+       "\.cache/", "\.sass-cache/", "\.eslintcache/", "\.pytest_cache/",
+       "\.coverage", "\.tox/", "\.nox/", "\.mypy_cache/", "\.ruff_cache/",
+       "\.hypothesis/", "\.terraform/", "\.docusaurus/", "\.next/", "\.nuxt/",
+
+       # Compiled code
+       ".*\.pyc$", ".*\.pyo$", ".*\.pyd$", "__pycache__/", ".*\.class$",
+       ".*\.jar$", ".*\.war$", ".*\.ear$", ".*\.nar$",
+       ".*\.o$", ".*\.obj$", ".*\.dll$", ".*\.dylib$", ".*\.exe$",
+       ".*\.lib$", ".*\.out$", ".*\.a$", ".*\.pdb$", ".*\.nupkg$",
+
+       # Language-specific files
+       ".*\.min\.js$", ".*\.min\.css$", ".*\.map$", ".*\.tfstate.*",
+       ".*\.gem$", ".*\.ruby-version", ".*\.ruby-gemset", ".*\.rvmrc",
+       ".*\.rs\.bk$", ".*\.gradle", ".*\.suo", ".*\.user", ".*\.userosscache",
+       ".*\.sln\.docstates", "gradle-app\.setting",
+       ".*\.pbxuser", ".*\.mode1v3", ".*\.mode2v3", ".*\.perspectivev3", ".*\.xcuserstate",
+       "\.swiftpm/", "\.build/"
+     ].freeze
+
+     # Pattern for dot files/directories
+     DOT_FILE_PATTERN = %r{(?-mix:(^\.|/\.))}
+
+     def initialize(custom_excludes = [])
+       @custom_excludes = custom_excludes || []
+       compile_excluded_patterns
+     end
+
+     def excluded?(path)
+       return true if path.match?(DOT_FILE_PATTERN)
+
+       # Check for directory exclusion patterns (ending with '/')
+       matched_dir_pattern = @directory_patterns.find { |dir_pattern| path.start_with?(dir_pattern) }
+       return true if matched_dir_pattern
+
+       # Check default regex patterns
+       matched_default_pattern = @default_patterns.find { |pattern| path.match?(pattern) }
+       return true if matched_default_pattern
+
+       # Check custom glob patterns using File.fnmatch
+       matched_glob_pattern = @custom_glob_patterns.find do |glob_pattern|
+         File.fnmatch(glob_pattern, path, File::FNM_PATHNAME | File::FNM_DOTMATCH)
+       end
+       return true if matched_glob_pattern
+
+       false
+     end
+
+     private
+
+     def compile_excluded_patterns
+       @default_patterns = DEFAULT_EXCLUDES.map { |pattern| Regexp.new(pattern) }
+       @custom_glob_patterns = [] # For File.fnmatch
+       @directory_patterns = []
+
+       @custom_excludes.each do |pattern_str|
+         if pattern_str.end_with?("/")
+           @directory_patterns << pattern_str
+         else
+           # All other custom excludes are treated as glob patterns.
+           # If the pattern does not contain a slash, prepend "**/"
+           # to make it match at any depth (e.g., "*.md" becomes "**/*.md").
+           @custom_glob_patterns << if pattern_str.include?("/")
+                                      pattern_str
+                                    else
+                                      "**/#{pattern_str}"
+                                    end
+         end
+       end
+     end
+   end
+ end
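
`ExclusionFilter` treats custom excludes in two ways: entries ending in `/` are compared as path prefixes, and everything else is a glob that gets a `**/` prefix when it contains no slash. A minimal sketch of just that custom-pattern logic follows. It leaves out the default patterns and the dot-file check, and `GLOB_FLAGS`, `normalize_glob` and `custom_excluded?` are hypothetical names introduced for illustration.

```ruby
# Hypothetical helpers mirroring the custom-pattern handling in ExclusionFilter.
GLOB_FLAGS = File::FNM_PATHNAME | File::FNM_DOTMATCH

# Slash-free patterns get a "**/" prefix so they match at any depth.
def normalize_glob(pattern)
  pattern.include?("/") ? pattern : "**/#{pattern}"
end

# Directory entries (ending in "/") are prefix matches; the rest are globs.
def custom_excluded?(path, excludes)
  dir_patterns, globs = excludes.partition { |p| p.end_with?("/") }
  return true if dir_patterns.any? { |dir| path.start_with?(dir) }

  globs.map { |g| normalize_glob(g) }.any? { |g| File.fnmatch(g, path, GLOB_FLAGS) }
end

excludes = ["*.md", "docs/"]
%w[README.md guides/intro.md docs/api.rb src/docs/api.rb lib/main.rb].each do |path|
  puts "#{path}: #{custom_excluded?(path, excludes)}"
end
```

Because directory entries are matched with `start_with?`, `docs/` in this sketch excludes `docs/api.rb` but not `src/docs/api.rb`.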
@@ -1,98 +1,19 @@
  # frozen_string_literal: true

  require "octokit"
- require "base64"
- require "fileutils"
- require "concurrent"
  require "logger"

  module Gitingest
    class Generator
-     # Default exclusion patterns for common files and directories
-     DEFAULT_EXCLUDES = [
-       # Version control
-       '\.git/', '\.github/', '\.gitignore', '\.gitattributes', '\.gitmodules', '\.svn', '\.hg',
-
-       # System files
-       '\.DS_Store', 'Thumbs\.db', 'desktop\.ini',
-
-       # Log files
-       '.*\.log$', '.*\.bak$', '.*\.swp$', '.*\.tmp$', '.*\.temp$',
-
-       # Images and media
-       '.*\.png$', '.*\.jpg$', '.*\.jpeg$', '.*\.gif$', '.*\.svg$', '.*\.ico$',
-       '.*\.pdf$', '.*\.mov$', '.*\.mp4$', '.*\.mp3$', '.*\.wav$',
-
-       # Archives
-       '.*\.zip$', '.*\.tar\.gz$',
-
-       # Dependency directories
-       "node_modules/", "vendor/", "bower_components/", "\.npm/", "\.yarn/", "\.pnpm-store/",
-       "\.bundle/", "vendor/bundle", "packages/", "site-packages/",
-
-       # Virtual environments
-       "venv/", "\.venv/", "env/", "\.env", "virtualenv/",
-
-       # IDE and editor files
-       "\.idea/", "\.vscode/", "\.vs/", "\.settings/", ".*\.sublime-.*",
-       "\.project", "\.classpath", "xcuserdata/", ".*\.xcodeproj/", ".*\.xcworkspace/",
-
-       # Lock files
-       "package-lock\.json", "yarn\.lock", "poetry\.lock", "Pipfile\.lock",
-       "Gemfile\.lock", "Cargo\.lock", "bun\.lock", "bun\.lockb",
-
-       # Build directories and artifacts
-       "build/", "dist/", "target/", "out/", "\.gradle/", "\.settings/",
-       ".*\.egg-info", ".*\.egg", ".*\.whl", ".*\.so", "bin/", "obj/", "pkg/",
-
-       # Cache directories
-       "\.cache/", "\.sass-cache/", "\.eslintcache/", "\.pytest_cache/",
-       "\.coverage", "\.tox/", "\.nox/", "\.mypy_cache/", "\.ruff_cache/",
-       "\.hypothesis/", "\.terraform/", "\.docusaurus/", "\.next/", "\.nuxt/",
-
-       # Compiled code
-       ".*\.pyc$", ".*\.pyo$", ".*\.pyd$", "__pycache__/", ".*\.class$",
-       ".*\.jar$", ".*\.war$", ".*\.ear$", ".*\.nar$",
-       ".*\.o$", ".*\.obj$", ".*\.dll$", ".*\.dylib$", ".*\.exe$",
-       ".*\.lib$", ".*\.out$", ".*\.a$", ".*\.pdb$", ".*\.nupkg$",
-
-       # Language-specific files
-       ".*\.min\.js$", ".*\.min\.css$", ".*\.map$", ".*\.tfstate.*",
-       ".*\.gem$", ".*\.ruby-version", ".*\.ruby-gemset", ".*\.rvmrc",
-       ".*\.rs\.bk$", ".*\.gradle", ".*\.suo", ".*\.user", ".*\.userosscache",
-       ".*\.sln\.docstates", "gradle-app\.setting",
-       ".*\.pbxuser", ".*\.mode1v3", ".*\.mode2v3", ".*\.perspectivev3", ".*\.xcuserstate",
-       "\.swiftpm/", "\.build/"
-     ].freeze
-
-     # Pattern for dot files/directories
-     DOT_FILE_PATTERN = %r{(?-mix:(^\.|/\.))}
-
-     # Maximum number of files to process to prevent memory overload
-     MAX_FILES = 1000
-
-     # Buffer size to reduce I/O operations
-     BUFFER_SIZE = 250
-
-     # Thread-local buffer threshold
-     LOCAL_BUFFER_THRESHOLD = 50
-
-     # Default threading options
-     DEFAULT_THREAD_COUNT = [Concurrent.processor_count, 8].min
-     DEFAULT_THREAD_TIMEOUT = 60 # seconds
-
-     attr_reader :options, :client, :repo_files, :excluded_patterns, :logger
+     attr_reader :options, :client, :repo_files, :logger

      def initialize(options = {})
        @options = options
        @repo_files = []
-       # @excluded_patterns = [] # This will be set after validate_options
        setup_logger
        validate_options
        configure_client
-       # Populate @excluded_patterns with raw patterns after options are validated
-       @excluded_patterns = DEFAULT_EXCLUDES + @options.fetch(:exclude, [])
-       compile_excluded_patterns
+       @exclusion_filter = ExclusionFilter.new(@options[:exclude])
      end

      def run
@@ -133,6 +54,15 @@ module Gitingest
        structure
      end

+     # Exposed for testing
+     def excluded_patterns
+       # This is a bit of a hack to maintain backward compatibility with tests
+       # that check for excluded_patterns. In the new design, this is handled
+       # by ExclusionFilter.
+       @exclusion_filter.instance_variable_get(:@default_patterns) +
+         @exclusion_filter.instance_variable_get(:@custom_glob_patterns).map { |p| Regexp.new(p.gsub("*", ".*")) }
+     end
+
      private

      def setup_logger
@@ -152,11 +82,10 @@ module Gitingest

        @options[:output_file] ||= "#{@options[:repository].split("/").last}_prompt.txt"
        @options[:branch] ||= :default
-       @options[:exclude] ||= [] # Ensure :exclude is always an array
-       @options[:threads] ||= DEFAULT_THREAD_COUNT
-       @options[:thread_timeout] ||= DEFAULT_THREAD_TIMEOUT
+       @options[:exclude] ||= []
+       @options[:threads] ||= ContentFetcher::DEFAULT_THREAD_COUNT
+       @options[:thread_timeout] ||= ContentFetcher::DEFAULT_THREAD_TIMEOUT
        @options[:show_structure] ||= false
-       # NOTE: @excluded_patterns is set in compile_excluded_patterns based on @options[:exclude] # This comment is now incorrect / removed.
      end

      def configure_client
@@ -169,241 +98,16 @@ module Gitingest
        end
      end

-     def compile_excluded_patterns
-       @default_patterns = DEFAULT_EXCLUDES.map { |pattern| Regexp.new(pattern) }
-       @custom_glob_patterns = [] # For File.fnmatch
-       @directory_patterns = []
-
-       @options[:exclude].each do |pattern_str|
-         if pattern_str.end_with?("/")
-           @directory_patterns << pattern_str
-         else
-           # All other custom excludes are treated as glob patterns.
-           # If the pattern does not contain a slash, prepend "**/"
-           # to make it match at any depth (e.g., "*.md" becomes "**/*.md").
-           @custom_glob_patterns << if pattern_str.include?("/")
-                                      pattern_str
-                                    else
-                                      "**/#{pattern_str}"
-                                    end
-         end
-       end
-     end
-
      def fetch_repository_contents
        @logger.info "Fetching repository: #{@options[:repository]} (branch: #{@options[:branch]})"
-       validate_repository_access
-       repo_tree = @client.tree(@options[:repository], @options[:branch], recursive: true)
-       @repo_files = repo_tree.tree.select { |item| item.type == "blob" && !excluded_file?(item.path) }
-       if @repo_files.size > MAX_FILES
-         @logger.warn "Warning: Found #{@repo_files.size} files, limited to #{MAX_FILES}."
-         @repo_files = @repo_files.first(MAX_FILES)
-       end
+       fetcher = RepositoryFetcher.new(@client, @options[:repository], @options[:branch], @exclusion_filter)
+       @repo_files = fetcher.fetch
        @logger.info "Found #{@repo_files.size} files after exclusion filters"
-     rescue Octokit::Unauthorized
-       raise "Authentication error: Invalid or expired GitHub token."
-     rescue Octokit::NotFound
-       raise "Repository not found: '#{@options[:repository]}' or branch '#{@options[:branch]}' doesn't exist or is private."
-     rescue Octokit::Error => e
-       raise "Error accessing repository: #{e.message}"
-     end
-
-     # Validate repository and branch access
-     def validate_repository_access
-       repo = @client.repository(@options[:repository])
-       @options[:branch] = repo.default_branch if @options[:branch] == :default
-
-       # If repository check succeeds, store this fact before trying branch
-       @repository_exists = true
-
-       begin
-         @client.branch(@options[:repository], @options[:branch])
-       rescue Octokit::NotFound
-         # If we got here, the repository exists but the branch doesn't
-         raise "Branch '#{@options[:branch]}' not found in repository '#{@options[:repository]}'"
-       end
-     rescue Octokit::Unauthorized
-       raise "Authentication error: Invalid or expired GitHub token"
-     rescue Octokit::NotFound
-       # Only reach this for repository not found (branch errors handled separately)
-       raise "Repository '#{@options[:repository]}' not found or is private. Check the repository name or provide a valid token."
-     end
-
-     def excluded_file?(path)
-       return true if path.match?(DOT_FILE_PATTERN)
-
-       # Check for directory exclusion patterns (ending with '/')
-       matched_dir_pattern = @directory_patterns.find { |dir_pattern| path.start_with?(dir_pattern) }
-       if matched_dir_pattern
-         @logger.debug { "Excluding #{path} (matched directory pattern: #{matched_dir_pattern})" }
-         return true
-       end
-
-       # Check default regex patterns
-       matched_default_pattern = @default_patterns.find { |pattern| path.match?(pattern) }
-       if matched_default_pattern
-         @logger.debug { "Excluding #{path} (matched default pattern: #{matched_default_pattern.source})" }
-         return true
-       end
-
-       # Check custom glob patterns using File.fnmatch
-       matched_glob_pattern = @custom_glob_patterns.find do |glob_pattern|
-         File.fnmatch(glob_pattern, path, File::FNM_PATHNAME | File::FNM_DOTMATCH)
-       end
-       if matched_glob_pattern
-         @logger.debug { "Excluding #{path} (matched custom glob pattern: #{matched_glob_pattern})" }
-         return true
-       end
-
-       false
      end

      def process_content_to_output(output)
-       @logger.debug "Using thread pool with #{@options[:threads]} threads"
-       buffer = []
-       progress = ProgressIndicator.new(@repo_files.size, @logger)
-       thread_buffers = Concurrent::Map.new # Thread-safe map for buffers
-       mutex = Mutex.new # Mutex for shared buffer and output operations
-       errors = Concurrent::Array.new # Thread-safe array for errors
-       pool = Concurrent::FixedThreadPool.new(@options[:threads])
-       prioritized_files = prioritize_files(@repo_files)
-
-       prioritized_files.each_with_index do |repo_file, index|
-         pool.post do
-           thread_id = Thread.current.object_id
-           thread_buffers[thread_id] ||= []
-           local_buffer = thread_buffers[thread_id]
-           begin
-             content = fetch_file_content_with_retry(repo_file.path)
-             local_buffer << format_file_content(repo_file.path, content)
-             if local_buffer.size >= LOCAL_BUFFER_THRESHOLD
-               mutex.synchronize do
-                 buffer.concat(local_buffer)
-                 write_buffer(output, buffer) if buffer.size >= BUFFER_SIZE
-                 local_buffer.clear
-               end
-             end
-             progress.update(index + 1)
-           rescue Octokit::Error => e
-             mutex.synchronize { errors << "Error fetching #{repo_file.path}: #{e.message}" }
-             @logger.error "Error fetching #{repo_file.path}: #{e.message}"
-           rescue StandardError => e
-             mutex.synchronize { errors << "Unexpected error processing #{repo_file.path}: #{e.message}" }
-             @logger.error "Unexpected error processing #{repo_file.path}: #{e.message}"
-           end
-         end
-       end
-
-       pool.shutdown
-       unless pool.wait_for_termination(@options[:thread_timeout])
-         @logger.warn "Thread pool did not shut down gracefully within #{@options[:thread_timeout]}s, forcing termination."
-         pool.kill
-       end
-
-       mutex.synchronize do
-         thread_buffers.each_value { |local_buffer| buffer.concat(local_buffer) unless local_buffer.empty? }
-         write_buffer(output, buffer) unless buffer.empty?
-       end
-
-       return unless errors.any?
-
-       @logger.warn "Completed with #{errors.size} errors"
-       @logger.debug "First few errors: #{errors.first(3).join(", ")}" if @logger.debug?
-     end
-
-     def format_file_content(path, content)
-       <<~TEXT
-         ================================================================
-         File: #{path}
-         ================================================================
-         #{content}
-
-       TEXT
-     end
-
-     def fetch_file_content_with_retry(path, retries = 3, base_delay = 2)
-       content = @client.contents(@options[:repository], path: path, ref: @options[:branch])
-       Base64.decode64(content.content)
-     rescue Octokit::TooManyRequests
-       raise unless retries.positive?
-
-       delay = base_delay**(4 - retries) * (0.8 + 0.4 * rand)
-       @logger.warn "Rate limit exceeded, waiting #{delay.round(1)} seconds..."
-       sleep(delay)
-       fetch_file_content_with_retry(path, retries - 1, base_delay)
-     end
-
-     def write_buffer(file, buffer)
-       return if buffer.empty?
-
-       file.puts(buffer.join)
-       buffer.clear
-     end
-
-     def prioritize_files(files)
-       files.sort_by do |file|
-         ext = File.extname(file.path.downcase)
-         case ext
-         when ".md", ".txt", ".json", ".yaml", ".yml"
-           0 # Documentation and data files first
-         when ".rb", ".py", ".js", ".ts", ".go", ".java", ".c", ".cpp", ".h"
-           1 # Source code files second
-         else
-           2 # Other files last
-         end
-       end
-     end
-   end
-
-   class ProgressIndicator
-     BAR_WIDTH = 30
-
-     def initialize(total, logger)
-       @total = total
-       @logger = logger
-       @last_percent = 0
-       @start_time = Time.now
-       @last_update_time = Time.now
-       @update_interval = 0.5
-     end
-
-     def update(current)
-       now = Time.now
-       return if now - @last_update_time < @update_interval && current != @total
-
-       @last_update_time = now
-       percent = (current.to_f / @total * 100).round
-       return unless percent > @last_percent || current == @total
-
-       elapsed = now - @start_time
-       progress_chars = (BAR_WIDTH * (current.to_f / @total)).round
-       bar = "[#{"|" * progress_chars}#{" " * (BAR_WIDTH - progress_chars)}]"
-
-       rate = if elapsed.positive?
-                (current / elapsed).round(1)
-              else
-                0 # Avoid division by zero if elapsed time is zero
-              end
-       eta_string = current.positive? && percent < 100 && rate.positive? ? " ETA: #{format_time((@total - current) / rate)}" : ""
-
-       print "\r\e[K#{bar} #{percent}% | #{current}/#{@total} files (#{rate} files/sec)#{eta_string}"
-       print "\n" if current == @total
-       if (percent % 10).zero? && percent != @last_percent || current == @total
-         @logger.info "Processing: #{percent}% complete (#{current}/#{@total} files)#{eta_string}"
-       end
-       @last_percent = percent
-     end
-
-     private
-
-     def format_time(seconds)
-       return "< 1s" if seconds < 1
-
-       case seconds
-       when 0...60 then "#{seconds.round}s"
-       when 60...3600 then "#{(seconds / 60).floor}m #{(seconds % 60).round}s"
-       else "#{(seconds / 3600).floor}h #{((seconds % 3600) / 60).floor}m"
-       end
+       fetcher = ContentFetcher.new(@client, @options[:repository], @repo_files, @logger, @options)
+       fetcher.fetch(output)
      end
    end

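
The `excluded_patterns` compatibility method in the `Generator` diff converts each stored glob into a regular expression by replacing every `*` with `.*`. That conversion is only an approximation of what `File.fnmatch` does in `ExclusionFilter`. The sketch below compares the two for a single pattern; `glob_to_regexp` is a hypothetical helper name that reproduces the `gsub` expression.

```ruby
# Hypothetical helper reproducing the glob-to-regex conversion used by the shim.
def glob_to_regexp(glob)
  Regexp.new(glob.gsub("*", ".*"))
end

glob  = "**/*.md"
regex = glob_to_regexp(glob)
flags = File::FNM_PATHNAME | File::FNM_DOTMATCH

puts regex.source
%w[docs/file.md README.md].each do |path|
  puts "#{path}: fnmatch=#{File.fnmatch(glob, path, flags)} regexp=#{regex.match?(path)}"
end
```

For a root-level file such as `README.md` the two disagree: `File.fnmatch` lets `**/` match zero directories, while the converted regular expression still requires a literal `/`. The regular expressions returned by the shim are therefore best read as a test aid and not as the filter's actual matching rule.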
@@ -0,0 +1,55 @@
+ # frozen_string_literal: true
+
+ module Gitingest
+   class ProgressIndicator
+     BAR_WIDTH = 30
+
+     def initialize(total, logger)
+       @total = total
+       @logger = logger
+       @last_percent = 0
+       @start_time = Time.now
+       @last_update_time = Time.now
+       @update_interval = 0.5
+     end
+
+     def update(current)
+       now = Time.now
+       return if now - @last_update_time < @update_interval && current != @total
+
+       @last_update_time = now
+       percent = (current.to_f / @total * 100).round
+       return unless percent > @last_percent || current == @total
+
+       elapsed = now - @start_time
+       progress_chars = (BAR_WIDTH * (current.to_f / @total)).round
+       bar = "[#{"|" * progress_chars}#{" " * (BAR_WIDTH - progress_chars)}]"
+
+       rate = if elapsed.positive?
+                (current / elapsed).round(1)
+              else
+                0 # Avoid division by zero if elapsed time is zero
+              end
+       eta_string = current.positive? && percent < 100 && rate.positive? ? " ETA: #{format_time((@total - current) / rate)}" : ""
+
+       print "\r\e[K#{bar} #{percent}% | #{current}/#{@total} files (#{rate} files/sec)#{eta_string}"
+       print "\n" if current == @total
+       if (percent % 10).zero? && percent != @last_percent || current == @total
+         @logger.info "Processing: #{percent}% complete (#{current}/#{@total} files)#{eta_string}"
+       end
+       @last_percent = percent
+     end
+
+     private
+
+     def format_time(seconds)
+       return "< 1s" if seconds < 1
+
+       case seconds
+       when 0...60 then "#{seconds.round}s"
+       when 60...3600 then "#{(seconds / 60).floor}m #{(seconds % 60).round}s"
+       else "#{(seconds / 3600).floor}h #{((seconds % 3600) / 60).floor}m"
+       end
+     end
+   end
+ end
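The rate and ETA arithmetic in `ProgressIndicator#update` is easier to follow when run in isolation. The sketch below copies `format_time` out of the class as a top-level method and wraps the rate/ETA steps in `eta_for`, a hypothetical helper that does not exist in the gem; it only mirrors the expressions shown in the hunk above.

```ruby
# Standalone copy of ProgressIndicator#format_time, lifted out of the class
# so it can be called directly (it is private in the gem).
def format_time(seconds)
  return "< 1s" if seconds < 1

  case seconds
  when 0...60 then "#{seconds.round}s"
  when 60...3600 then "#{(seconds / 60).floor}m #{(seconds % 60).round}s"
  else "#{(seconds / 3600).floor}h #{((seconds % 3600) / 60).floor}m"
  end
end

# Hypothetical helper (not part of the gem): the same rate and ETA
# expressions that #update evaluates inline, returned as a string.
def eta_for(current, total, elapsed)
  percent = (current.to_f / total * 100).round
  rate = elapsed.positive? ? (current / elapsed).round(1) : 0
  return "" unless current.positive? && percent < 100 && rate.positive?

  format_time((total - current) / rate)
end

puts format_time(0.4)       # prints "< 1s"
puts format_time(42.4)      # prints "42s"
puts format_time(125)       # prints "2m 5s"
puts format_time(3725)      # prints "1h 2m"
puts eta_for(50, 200, 10.0) # 5.0 files/sec, 150 files left: prints "30s"
```

Because the seconds component is rounded rather than floored, values just under a minute boundary can display as `60s` (for example `format_time(59.7)` returns `"60s"`).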
@@ -0,0 +1,55 @@
+ # frozen_string_literal: true
+
+ require "octokit"
+
+ module Gitingest
+   class RepositoryFetcher
+     MAX_FILES = 1000
+
+     def initialize(client, repository, branch = :default, exclusion_filter = nil)
+       @client = client
+       @repository = repository
+       @branch = branch
+       @exclusion_filter = exclusion_filter
+     end
+
+     def fetch
+       validate_repository_access
+       repo_tree = @client.tree(@repository, @branch, recursive: true)
+
+       files = repo_tree.tree.select do |item|
+         item.type == "blob" && !@exclusion_filter&.excluded?(item.path)
+       end
+
+       if files.size > MAX_FILES
+         # We might want to warn here, but for now we just truncate
+         files = files.first(MAX_FILES)
+       end
+
+       files
+     rescue Octokit::Unauthorized
+       raise "Authentication error: Invalid or expired GitHub token."
+     rescue Octokit::NotFound
+       raise "Repository not found: '#{@repository}' or branch '#{@branch}' doesn't exist or is private."
+     rescue Octokit::Error => e
+       raise "Error accessing repository: #{e.message}"
+     end
+
+     private
+
+     def validate_repository_access
+       repo = @client.repository(@repository)
+       @branch = repo.default_branch if @branch == :default
+
+       begin
+         @client.branch(@repository, @branch)
+       rescue Octokit::NotFound
+         raise "Branch '#{@branch}' not found in repository '#{@repository}'"
+       end
+     rescue Octokit::Unauthorized
+       raise "Authentication error: Invalid or expired GitHub token"
+     rescue Octokit::NotFound
+       raise "Repository '#{@repository}' not found or is private. Check the repository name or provide a valid token."
+     end
+   end
+ end
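The selection step in `RepositoryFetcher#fetch` (keep blobs, drop excluded paths, truncate to `MAX_FILES`) can be exercised without a network call. The sketch below is an illustration only: `TreeItem`, `SuffixFilter`, and `select_files` are hypothetical stand-ins, not part of the gem or of Octokit. The only assumption taken from the hunk above is that tree items respond to `type` and `path` and that the filter responds to `excluded?(path)`.

```ruby
# Stand-in for the tree entries returned by the API client; only the two
# attributes read by RepositoryFetcher#fetch are modelled.
TreeItem = Struct.new(:type, :path)

# Hypothetical filter used for illustration. The gem's real ExclusionFilter
# lives in its own file; here we only rely on the #excluded? interface.
class SuffixFilter
  def initialize(*suffixes)
    @suffixes = suffixes
  end

  def excluded?(path)
    path.end_with?(*@suffixes)
  end
end

# Same select-then-truncate logic as in #fetch, as a free-standing method.
# With a nil filter, `nil&.excluded?(...)` is nil, so every blob is kept.
def select_files(items, exclusion_filter = nil, max_files = 1000)
  files = items.select do |item|
    item.type == "blob" && !exclusion_filter&.excluded?(item.path)
  end
  files.first(max_files)
end

items = [
  TreeItem.new("tree", "lib"),
  TreeItem.new("blob", "lib/a.rb"),
  TreeItem.new("blob", "logo.png"),
  TreeItem.new("blob", "README.md")
]

p select_files(items).map(&:path)
p select_files(items, SuffixFilter.new(".png")).map(&:path)
p select_files(items, nil, 1).map(&:path)
```

Note that truncation is silent in the released code: when more than `MAX_FILES` blobs survive filtering, only the first 1000 in tree order are returned and no warning is logged.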
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Gitingest
-   VERSION = "0.7.1"
+   VERSION = "1.0.0"
  end
data/lib/gitingest.rb CHANGED
@@ -1,6 +1,10 @@
  # frozen_string_literal: true

  require_relative "gitingest/version"
+ require_relative "gitingest/exclusion_filter"
+ require_relative "gitingest/repository_fetcher"
+ require_relative "gitingest/progress_indicator"
+ require_relative "gitingest/content_fetcher"
  require_relative "gitingest/generator"

  module Gitingest
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: gitingest
  version: !ruby/object:Gem::Version
-   version: 0.7.1
+   version: 1.0.0
  platform: ruby
  authors:
  - Davide Santangelo
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2025-06-20 00:00:00.000000000 Z
+ date: 2025-11-28 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: concurrent-ruby
@@ -133,7 +133,11 @@ files:
  - bin/setup
  - index.html
  - lib/gitingest.rb
+ - lib/gitingest/content_fetcher.rb
+ - lib/gitingest/exclusion_filter.rb
  - lib/gitingest/generator.rb
+ - lib/gitingest/progress_indicator.rb
+ - lib/gitingest/repository_fetcher.rb
  - lib/gitingest/version.rb
  - sig/gitingest.rbs
  homepage: https://github.com/davidesantangelo/gitingest