gitingest 0.7.1 → 1.0.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +26 -1
- data/README.md +4 -2
- data/lib/gitingest/content_fetcher.rb +126 -0
- data/lib/gitingest/exclusion_filter.rb +113 -0
- data/lib/gitingest/generator.rb +18 -314
- data/lib/gitingest/progress_indicator.rb +55 -0
- data/lib/gitingest/repository_fetcher.rb +55 -0
- data/lib/gitingest/version.rb +1 -1
- data/lib/gitingest.rb +4 -0
- metadata +6 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 53327f20d859b72079399395ab4ac9f31cd045d65c9e5f40cd771192b8da7298
+  data.tar.gz: 52eaa4f2d759bf50ce8fcfd8174005ac61102dd9989455828fc35f62b6c6c244
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: '0832d2d8f27bf92b3117f1f60e42dfdba72846dc183f9f1b484da55f81d36afb0d8b2c8feea4fdb7ad3c33954f8a60c57dc5f49b9550ceda43bc963b032cda8d'
+  data.tar.gz: f0841925b5cc53ed8a57af3a9354d5b259af0e078388d31da68faf54f28cb693ebcb215e2ab3f9b0fa42bcbbdafaf2719e3b7a11764dbf27b70a9e9f06b3fac8
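The digests in checksums.yaml cover the `metadata.gz` and `data.tar.gz` members of the `.gem` archive. As a minimal sketch of how one of them could be checked locally, assuming the `.gem` file has already been unpacked with `tar -xf` into the current directory (the helper name `sha256_matches?` is hypothetical, not part of gitingest or RubyGems):

```ruby
require "digest"

# Hypothetical helper: true when the file's SHA-256 hex digest equals the
# expected value copied from checksums.yaml.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Assumed layout: data.tar.gz was extracted into the current directory.
expected = "52eaa4f2d759bf50ce8fcfd8174005ac61102dd9989455828fc35f62b6c6c244"
puts sha256_matches?("data.tar.gz", expected) if File.exist?("data.tar.gz")
```

The same pattern applies to the SHA512 entries by swapping in `Digest::SHA512`.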
data/CHANGELOG.md
CHANGED
@@ -1,13 +1,23 @@
 # Changelog
 
+## [1.0.0] - 2025-11-28
+
+### Changed
+
+- **Major Refactor**: Decomposed the monolithic `Generator` class into smaller, single-responsibility components (`ExclusionFilter`, `RepositoryFetcher`, `ContentFetcher`, `ProgressIndicator`) for better maintainability and testability.
+- **Performance**: Optimized GitHub API usage by switching from path-based content fetching to SHA-based blob fetching, significantly reducing API overhead and improving speed.
+- **Internal**: Standardized logging and error handling across all new components.
+
 ## [0.7.1] - 2025-06-20
 
 ### Changed
+
 - Refactored file prioritization logic to use a `case` statement for improved readability and maintainability.
 
 ## [0.7.0] - 2025-06-04
 
 ### Changed
+
 - Improved file exclusion logic for glob patterns to correctly match files at any directory depth (e.g., `*.md` now correctly matches `docs/file.md`).
 - Refined internal handling of exclusion patterns for clarity and robustness, using `File.fnmatch` for all custom glob patterns.
 - Enhanced debug logging for file exclusion to show the specific pattern that caused a match.
@@ -15,22 +25,26 @@
 ## [0.6.3] - 2025-04-14
 
 ### Fixed
+
 - Fixed directory exclusion pattern to properly handle paths ending with slash
 
 ## [0.6.2] - 2025-04-11
 
 ### Changed
+
 - Updated Octokit dependency from ~> 5.0 to ~> 9.0
 - Updated various gem dependencies to their latest versions
 
 ## [0.6.1] - 2025-03-26
 
 ### Fixed
+
 - Fixed error "target of repeat operator is not specified" when using `--exclude` with glob patterns like `*.md`
 
 ## [0.6.0] - 2025-03-18
 
 ### Changed
+
 - Improved default branch handling to use repository's actual default branch instead of hardcoding "main"
 - Enhanced error handling in repository access validation
 - Updated documentation to reflect the correct default branch behavior
@@ -39,29 +53,34 @@
 ## [0.5.0] - 2025-03-10
 
 ### Added
+
 - Added repository directory structure visualization with `--show-structure` / `-s` option
 - Created `DirectoryStructureBuilder` class to generate tree views of repositories
 - Added `generate_directory_structure` method to the Generator class
 - Added tests for directory structure visualization
 
 ### Changed
+
 - Enhanced documentation with directory structure visualization examples
 - Updated CLI help with the new option
 
 ## [0.4.0] - 2025-03-03
 
 ### Added
+
 - Added `generate_prompt` method for in-memory content generation without file I/O
 - Integrated visual progress bar with file processing rate reporting
 - Added human-readable time formatting for progress estimates
 - Enhanced test coverage for multithreaded operations
 
 ### Changed
+
 - Refactored `process_content_to_output` for better code reuse between file and string output
 - Improved thread management to handle various error conditions more gracefully
 - Enhanced documentation with programmatic usage examples
 
 ### Fixed
+
 - Resolved thread pool shutdown issues in test environment
 - Fixed race conditions in progress indicator updates
 - Addressed timing inconsistencies in multithreaded test scenarios
@@ -69,6 +88,7 @@
 ## [0.3.1] - 2025-03-03
 
 ### Added
+
 - Introduced configurable threading options:
   - `:threads` to specify the number of threads (default: auto-detected).
   - `:thread_timeout` to define thread pool shutdown timeout (default: 60 seconds).
@@ -77,18 +97,21 @@
 - Improved progress indicator with a visual progress bar and estimated time remaining.
 
 ### Changed
+
 - Increased `BUFFER_SIZE` from 100 to 250 to reduce I/O operations.
 - Optimized file exclusion check using a combined regex for faster matching.
 - Improved thread pool efficiency by prioritizing smaller files first.
 - Enhanced error handling with detailed logging and thread-safe error collection.
 
 ### Fixed
+
 - Ensured thread pool shutdown respects the configured timeout.
 - Resolved potential race conditions in file content retrieval.
 
 ## [0.3.0] - 2025-03-02
 
 ### Added
+
 - Added `faraday-retry` gem dependency for better API rate limit handling.
 - Implemented thread-safe buffer management with mutex locks.
 - Introduced `ProgressIndicator` class for enhanced CLI progress reporting, including percentages.
@@ -101,6 +124,7 @@
 ## [0.2.0] - 2025-03-02
 
 ### Added
+
 - Introduced support for quiet and verbose modes in the command-line interface.
 - Added the ability to specify a custom output file for the prompt.
 - Implemented enhanced error handling with logging support.
@@ -114,6 +138,7 @@
 ## [0.1.0] - 2025-03-02
 
 ### Added
+
 - Initial release of Gitingest.
 - Core functionality to fetch and process GitHub repository files.
 - Command-line interface for easy interaction.
@@ -124,4 +149,4 @@
 - Automatic rate limit handling with a retry mechanism.
 - Repository prompt generation with file separation markers.
 - Support for custom branch selection.
-- Custom output file naming options.
+- Custom output file naming options.
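The 1.0.0 "Performance" entry refers to fetching file contents as git blobs addressed by SHA. A blob response carries a `content` field plus an `encoding` field, and the new code only Base64-decodes when the encoding says so. The following is a rough, network-free sketch of that decoding step; `FakeBlob` is a hypothetical stand-in for the API response object, not an Octokit class:

```ruby
require "base64"

# Hypothetical stand-in for a blob response; only the two fields used here.
FakeBlob = Struct.new(:content, :encoding)

# Decode the payload only when the response declares base64 encoding;
# any other encoding is passed through unchanged.
def decode_blob(blob)
  case blob.encoding
  when "base64" then Base64.decode64(blob.content)
  else blob.content
  end
end

encoded = FakeBlob.new(Base64.strict_encode64("puts 1\n"), "base64")
plain   = FakeBlob.new("puts 2\n", "utf-8")
puts decode_blob(encoded)
puts decode_blob(plain)
```

Because the tree listing already supplies each file's SHA, this route needs no per-file path and ref lookup, which is presumably where the reduced overhead mentioned in the changelog comes from.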
data/README.md
CHANGED
@@ -28,7 +28,7 @@ bundle exec rake install
 
 ```bash
 # Basic usage (public repository)
-gitingest --repository user/repo
+gitingest --repository user/repo
 
 # With GitHub token for private repositories
 gitingest --repository user/repo --token YOUR_GITHUB_TOKEN
@@ -109,7 +109,7 @@ generator = Gitingest::Generator.new(
   token: "YOUR_GITHUB_TOKEN",
   output_file: "my_prompt.txt",
   branch: "develop",
-  exclude: ["*.md", "docs/"],
+  exclude: ["*.md", "docs/"],
   threads: 4, # control concurrency
   thread_timeout: 120, # custom thread timeout
   quiet: true # or verbose: true
@@ -126,11 +126,13 @@ generator = Gitingest::Generator.new(
 ## Features
 
 - Fetches all files from a GitHub repository based on the given branch
+- **High Performance**: Optimized API usage with SHA-based blob fetching for faster content retrieval
 - Automatically excludes common binary files and system files by default
 - Allows custom exclusion patterns for specific file extensions or directories
 - Uses concurrent processing for faster downloads
 - Handles GitHub API rate limiting with automatic retry
 - Generates a clean, formatted output file with file paths and content
+- **Modular Architecture**: Clean, maintainable codebase with single-responsibility components
 
 ## Default Exclusion Patterns
 
data/lib/gitingest/content_fetcher.rb
ADDED
@@ -0,0 +1,126 @@
+# frozen_string_literal: true
+
+require "concurrent"
+require "base64"
+
+module Gitingest
+  class ContentFetcher
+    BUFFER_SIZE = 250
+    LOCAL_BUFFER_THRESHOLD = 50
+    DEFAULT_THREAD_COUNT = [Concurrent.processor_count, 8].min
+    DEFAULT_THREAD_TIMEOUT = 60 # seconds
+
+    def initialize(client, repository, files, logger, options = {})
+      @client = client
+      @repository = repository
+      @files = files
+      @logger = logger
+      @threads = options[:threads] || DEFAULT_THREAD_COUNT
+      @thread_timeout = options[:thread_timeout] || DEFAULT_THREAD_TIMEOUT
+    end
+
+    def fetch(output)
+      @logger.debug "Using thread pool with #{@threads} threads"
+      buffer = []
+      progress = ProgressIndicator.new(@files.size, @logger)
+      thread_buffers = Concurrent::Map.new
+      mutex = Mutex.new
+      errors = Concurrent::Array.new
+      pool = Concurrent::FixedThreadPool.new(@threads)
+      prioritized_files = prioritize_files(@files)
+
+      prioritized_files.each_with_index do |repo_file, index|
+        pool.post do
+          thread_id = Thread.current.object_id
+          thread_buffers[thread_id] ||= []
+          local_buffer = thread_buffers[thread_id]
+          begin
+            content = fetch_file_content_with_retry(repo_file.sha)
+            local_buffer << format_file_content(repo_file.path, content)
+            if local_buffer.size >= LOCAL_BUFFER_THRESHOLD
+              mutex.synchronize do
+                buffer.concat(local_buffer)
+                write_buffer(output, buffer) if buffer.size >= BUFFER_SIZE
+                local_buffer.clear
+              end
+            end
+            progress.update(index + 1)
+          rescue Octokit::Error => e
+            mutex.synchronize { errors << "Error fetching #{repo_file.path}: #{e.message}" }
+            @logger.error "Error fetching #{repo_file.path}: #{e.message}"
+          rescue StandardError => e
+            mutex.synchronize { errors << "Unexpected error processing #{repo_file.path}: #{e.message}" }
+            @logger.error "Unexpected error processing #{repo_file.path}: #{e.message}"
+          end
+        end
+      end
+
+      pool.shutdown
+      unless pool.wait_for_termination(@thread_timeout)
+        @logger.warn "Thread pool did not shut down gracefully within #{@thread_timeout}s, forcing termination."
+        pool.kill
+      end
+
+      mutex.synchronize do
+        thread_buffers.each_value { |local_buffer| buffer.concat(local_buffer) unless local_buffer.empty? }
+        write_buffer(output, buffer) unless buffer.empty?
+      end
+
+      return unless errors.any?
+
+      @logger.warn "Completed with #{errors.size} errors"
+      @logger.debug "First few errors: #{errors.first(3).join(", ")}" if @logger.debug?
+    end
+
+    private
+
+    def format_file_content(path, content)
+      <<~TEXT
+        ================================================================
+        File: #{path}
+        ================================================================
+        #{content}
+
+      TEXT
+    end
+
+    def fetch_file_content_with_retry(sha, retries = 3, base_delay = 2)
+      blob = @client.blob(@repository, sha)
+      content = blob.content
+      case blob.encoding
+      when "base64"
+        Base64.decode64(content)
+      else
+        content
+      end
+    rescue Octokit::TooManyRequests
+      raise unless retries.positive?
+
+      delay = base_delay**(4 - retries) * (0.8 + 0.4 * rand)
+      @logger.warn "Rate limit exceeded, waiting #{delay.round(1)} seconds..."
+      sleep(delay)
+      fetch_file_content_with_retry(sha, retries - 1, base_delay)
+    end
+
+    def write_buffer(file, buffer)
+      return if buffer.empty?
+
+      file.puts(buffer.join)
+      buffer.clear
+    end
+
+    def prioritize_files(files)
+      files.sort_by do |file|
+        ext = File.extname(file.path.downcase)
+        case ext
+        when ".md", ".txt", ".json", ".yaml", ".yml"
+          0 # Documentation and data files first
+        when ".rb", ".py", ".js", ".ts", ".go", ".java", ".c", ".cpp", ".h"
+          1 # Source code files second
+        else
+          2 # Other files last
+        end
+      end
+    end
+  end
+end
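The retry delay in `fetch_file_content_with_retry` is `base_delay**(4 - retries)` scaled by a random factor between 0.8 and 1.2, so with the defaults the wait roughly doubles on each retry. The sketch below only reproduces that arithmetic to show the resulting ranges; `backoff_bounds` is a hypothetical helper written for illustration and does not exist in the gem:

```ruby
# Hypothetical helper: lower and upper bound of the jittered delay for a
# given number of retries left, using the formula from the diff:
#   base_delay ** (4 - retries) * (0.8 + 0.4 * rand)
# rand is in [0, 1), so the upper bound itself is never reached.
def backoff_bounds(retries, base_delay = 2)
  nominal = base_delay**(4 - retries)
  [nominal * 0.8, nominal * 1.2]
end

[3, 2, 1].each do |retries|
  low, high = backoff_bounds(retries)
  puts "retries left=#{retries}: #{low.round(1)}s up to #{high.round(1)}s"
end
```

With the default arguments this gives nominal waits of 2, 4, and 8 seconds before the error is re-raised on the fourth rate-limited attempt.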
data/lib/gitingest/exclusion_filter.rb
ADDED
@@ -0,0 +1,113 @@
+# frozen_string_literal: true
+
+module Gitingest
+  class ExclusionFilter
+    # Default exclusion patterns for common files and directories
+    DEFAULT_EXCLUDES = [
+      # Version control
+      '\.git/', '\.github/', '\.gitignore', '\.gitattributes', '\.gitmodules', '\.svn', '\.hg',
+
+      # System files
+      '\.DS_Store', 'Thumbs\.db', 'desktop\.ini',
+
+      # Log files
+      '.*\.log$', '.*\.bak$', '.*\.swp$', '.*\.tmp$', '.*\.temp$',
+
+      # Images and media
+      '.*\.png$', '.*\.jpg$', '.*\.jpeg$', '.*\.gif$', '.*\.svg$', '.*\.ico$',
+      '.*\.pdf$', '.*\.mov$', '.*\.mp4$', '.*\.mp3$', '.*\.wav$',
+
+      # Archives
+      '.*\.zip$', '.*\.tar\.gz$',
+
+      # Dependency directories
+      "node_modules/", "vendor/", "bower_components/", "\.npm/", "\.yarn/", "\.pnpm-store/",
+      "\.bundle/", "vendor/bundle", "packages/", "site-packages/",
+
+      # Virtual environments
+      "venv/", "\.venv/", "env/", "\.env", "virtualenv/",
+
+      # IDE and editor files
+      "\.idea/", "\.vscode/", "\.vs/", "\.settings/", ".*\.sublime-.*",
+      "\.project", "\.classpath", "xcuserdata/", ".*\.xcodeproj/", ".*\.xcworkspace/",
+
+      # Lock files
+      "package-lock\.json", "yarn\.lock", "poetry\.lock", "Pipfile\.lock",
+      "Gemfile\.lock", "Cargo\.lock", "bun\.lock", "bun\.lockb",
+
+      # Build directories and artifacts
+      "build/", "dist/", "target/", "out/", "\.gradle/", "\.settings/",
+      ".*\.egg-info", ".*\.egg", ".*\.whl", ".*\.so", "bin/", "obj/", "pkg/",
+
+      # Cache directories
+      "\.cache/", "\.sass-cache/", "\.eslintcache/", "\.pytest_cache/",
+      "\.coverage", "\.tox/", "\.nox/", "\.mypy_cache/", "\.ruff_cache/",
+      "\.hypothesis/", "\.terraform/", "\.docusaurus/", "\.next/", "\.nuxt/",
+
+      # Compiled code
+      ".*\.pyc$", ".*\.pyo$", ".*\.pyd$", "__pycache__/", ".*\.class$",
+      ".*\.jar$", ".*\.war$", ".*\.ear$", ".*\.nar$",
+      ".*\.o$", ".*\.obj$", ".*\.dll$", ".*\.dylib$", ".*\.exe$",
+      ".*\.lib$", ".*\.out$", ".*\.a$", ".*\.pdb$", ".*\.nupkg$",
+
+      # Language-specific files
+      ".*\.min\.js$", ".*\.min\.css$", ".*\.map$", ".*\.tfstate.*",
+      ".*\.gem$", ".*\.ruby-version", ".*\.ruby-gemset", ".*\.rvmrc",
+      ".*\.rs\.bk$", ".*\.gradle", ".*\.suo", ".*\.user", ".*\.userosscache",
+      ".*\.sln\.docstates", "gradle-app\.setting",
+      ".*\.pbxuser", ".*\.mode1v3", ".*\.mode2v3", ".*\.perspectivev3", ".*\.xcuserstate",
+      "\.swiftpm/", "\.build/"
+    ].freeze
+
+    # Pattern for dot files/directories
+    DOT_FILE_PATTERN = %r{(?-mix:(^\.|/\.))}
+
+    def initialize(custom_excludes = [])
+      @custom_excludes = custom_excludes || []
+      compile_excluded_patterns
+    end
+
+    def excluded?(path)
+      return true if path.match?(DOT_FILE_PATTERN)
+
+      # Check for directory exclusion patterns (ending with '/')
+      matched_dir_pattern = @directory_patterns.find { |dir_pattern| path.start_with?(dir_pattern) }
+      return true if matched_dir_pattern
+
+      # Check default regex patterns
+      matched_default_pattern = @default_patterns.find { |pattern| path.match?(pattern) }
+      return true if matched_default_pattern
+
+      # Check custom glob patterns using File.fnmatch
+      matched_glob_pattern = @custom_glob_patterns.find do |glob_pattern|
+        File.fnmatch(glob_pattern, path, File::FNM_PATHNAME | File::FNM_DOTMATCH)
+      end
+      return true if matched_glob_pattern
+
+      false
+    end
+
+    private
+
+    def compile_excluded_patterns
+      @default_patterns = DEFAULT_EXCLUDES.map { |pattern| Regexp.new(pattern) }
+      @custom_glob_patterns = [] # For File.fnmatch
+      @directory_patterns = []
+
+      @custom_excludes.each do |pattern_str|
+        if pattern_str.end_with?("/")
+          @directory_patterns << pattern_str
+        else
+          # All other custom excludes are treated as glob patterns.
+          # If the pattern does not contain a slash, prepend "**/"
+          # to make it match at any depth (e.g., "*.md" becomes "**/*.md").
+          @custom_glob_patterns << if pattern_str.include?("/")
+                                     pattern_str
+                                   else
+                                     "**/#{pattern_str}"
+                                   end
+        end
+      end
+    end
+  end
+end
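To make the custom-pattern rules in `ExclusionFilter` easier to follow, here is a small standalone sketch of just that part: patterns ending in `/` are treated as path prefixes, and everything else goes through `File.fnmatch`, with `**/` prepended when the pattern has no slash. It deliberately leaves out the default regex list and the dot-file check; `custom_excluded?` and `FNMATCH_FLAGS` are names invented for this illustration.

```ruby
# Same flag combination the diff passes to File.fnmatch.
FNMATCH_FLAGS = File::FNM_PATHNAME | File::FNM_DOTMATCH

# Hypothetical helper mirroring only the custom-exclude handling.
def custom_excluded?(path, patterns)
  patterns.any? do |pattern|
    if pattern.end_with?("/")
      # Directory patterns are a plain prefix test, so they only match
      # directories at the repository root.
      path.start_with?(pattern)
    else
      # Slash-free globs get "**/" so they match at any depth.
      glob = pattern.include?("/") ? pattern : "**/#{pattern}"
      File.fnmatch(glob, path, FNMATCH_FLAGS)
    end
  end
end

puts custom_excluded?("docs/file.md", ["*.md"])         # glob at depth
puts custom_excluded?("docs/guide.txt", ["docs/"])      # root directory prefix
puts custom_excluded?("src/docs/guide.txt", ["docs/"])  # nested directory
```

One consequence visible in the sketch is that a directory pattern such as `docs/` excludes `docs/guide.txt` but not `src/docs/guide.txt`, since the check is `start_with?` rather than a glob.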
data/lib/gitingest/generator.rb
CHANGED
|
@@ -1,98 +1,19 @@
|
|
|
1
1
|
# frozen_string_literal: true
|
|
2
2
|
|
|
3
3
|
require "octokit"
|
|
4
|
-
require "base64"
|
|
5
|
-
require "fileutils"
|
|
6
|
-
require "concurrent"
|
|
7
4
|
require "logger"
|
|
8
5
|
|
|
9
6
|
module Gitingest
|
|
10
7
|
class Generator
|
|
11
|
-
|
|
12
|
-
DEFAULT_EXCLUDES = [
|
|
13
|
-
# Version control
|
|
14
|
-
'\.git/', '\.github/', '\.gitignore', '\.gitattributes', '\.gitmodules', '\.svn', '\.hg',
|
|
15
|
-
|
|
16
|
-
# System files
|
|
17
|
-
'\.DS_Store', 'Thumbs\.db', 'desktop\.ini',
|
|
18
|
-
|
|
19
|
-
# Log files
|
|
20
|
-
'.*\.log$', '.*\.bak$', '.*\.swp$', '.*\.tmp$', '.*\.temp$',
|
|
21
|
-
|
|
22
|
-
# Images and media
|
|
23
|
-
'.*\.png$', '.*\.jpg$', '.*\.jpeg$', '.*\.gif$', '.*\.svg$', '.*\.ico$',
|
|
24
|
-
'.*\.pdf$', '.*\.mov$', '.*\.mp4$', '.*\.mp3$', '.*\.wav$',
|
|
25
|
-
|
|
26
|
-
# Archives
|
|
27
|
-
'.*\.zip$', '.*\.tar\.gz$',
|
|
28
|
-
|
|
29
|
-
# Dependency directories
|
|
30
|
-
"node_modules/", "vendor/", "bower_components/", "\.npm/", "\.yarn/", "\.pnpm-store/",
|
|
31
|
-
"\.bundle/", "vendor/bundle", "packages/", "site-packages/",
|
|
32
|
-
|
|
33
|
-
# Virtual environments
|
|
34
|
-
"venv/", "\.venv/", "env/", "\.env", "virtualenv/",
|
|
35
|
-
|
|
36
|
-
# IDE and editor files
|
|
37
|
-
"\.idea/", "\.vscode/", "\.vs/", "\.settings/", ".*\.sublime-.*",
|
|
38
|
-
"\.project", "\.classpath", "xcuserdata/", ".*\.xcodeproj/", ".*\.xcworkspace/",
|
|
39
|
-
|
|
40
|
-
# Lock files
|
|
41
|
-
"package-lock\.json", "yarn\.lock", "poetry\.lock", "Pipfile\.lock",
|
|
42
|
-
"Gemfile\.lock", "Cargo\.lock", "bun\.lock", "bun\.lockb",
|
|
43
|
-
|
|
44
|
-
# Build directories and artifacts
|
|
45
|
-
"build/", "dist/", "target/", "out/", "\.gradle/", "\.settings/",
|
|
46
|
-
".*\.egg-info", ".*\.egg", ".*\.whl", ".*\.so", "bin/", "obj/", "pkg/",
|
|
47
|
-
|
|
48
|
-
# Cache directories
|
|
49
|
-
"\.cache/", "\.sass-cache/", "\.eslintcache/", "\.pytest_cache/",
|
|
50
|
-
"\.coverage", "\.tox/", "\.nox/", "\.mypy_cache/", "\.ruff_cache/",
|
|
51
|
-
"\.hypothesis/", "\.terraform/", "\.docusaurus/", "\.next/", "\.nuxt/",
|
|
52
|
-
|
|
53
|
-
# Compiled code
|
|
54
|
-
".*\.pyc$", ".*\.pyo$", ".*\.pyd$", "__pycache__/", ".*\.class$",
|
|
55
|
-
".*\.jar$", ".*\.war$", ".*\.ear$", ".*\.nar$",
|
|
56
|
-
".*\.o$", ".*\.obj$", ".*\.dll$", ".*\.dylib$", ".*\.exe$",
|
|
57
|
-
".*\.lib$", ".*\.out$", ".*\.a$", ".*\.pdb$", ".*\.nupkg$",
|
|
58
|
-
|
|
59
|
-
# Language-specific files
|
|
60
|
-
".*\.min\.js$", ".*\.min\.css$", ".*\.map$", ".*\.tfstate.*",
|
|
61
|
-
".*\.gem$", ".*\.ruby-version", ".*\.ruby-gemset", ".*\.rvmrc",
|
|
62
|
-
".*\.rs\.bk$", ".*\.gradle", ".*\.suo", ".*\.user", ".*\.userosscache",
|
|
63
|
-
".*\.sln\.docstates", "gradle-app\.setting",
|
|
64
|
-
".*\.pbxuser", ".*\.mode1v3", ".*\.mode2v3", ".*\.perspectivev3", ".*\.xcuserstate",
|
|
65
|
-
"\.swiftpm/", "\.build/"
|
|
66
|
-
].freeze
|
|
67
|
-
|
|
68
|
-
# Pattern for dot files/directories
|
|
69
|
-
DOT_FILE_PATTERN = %r{(?-mix:(^\.|/\.))}
|
|
70
|
-
|
|
71
|
-
# Maximum number of files to process to prevent memory overload
|
|
72
|
-
MAX_FILES = 1000
|
|
73
|
-
|
|
74
|
-
# Buffer size to reduce I/O operations
|
|
75
|
-
BUFFER_SIZE = 250
|
|
76
|
-
|
|
77
|
-
# Thread-local buffer threshold
|
|
78
|
-
LOCAL_BUFFER_THRESHOLD = 50
|
|
79
|
-
|
|
80
|
-
# Default threading options
|
|
81
|
-
DEFAULT_THREAD_COUNT = [Concurrent.processor_count, 8].min
|
|
82
|
-
DEFAULT_THREAD_TIMEOUT = 60 # seconds
|
|
83
|
-
|
|
84
|
-
attr_reader :options, :client, :repo_files, :excluded_patterns, :logger
|
|
8
|
+
attr_reader :options, :client, :repo_files, :logger
|
|
85
9
|
|
|
86
10
|
def initialize(options = {})
|
|
87
11
|
@options = options
|
|
88
12
|
@repo_files = []
|
|
89
|
-
# @excluded_patterns = [] # This will be set after validate_options
|
|
90
13
|
setup_logger
|
|
91
14
|
validate_options
|
|
92
15
|
configure_client
|
|
93
|
-
|
|
94
|
-
@excluded_patterns = DEFAULT_EXCLUDES + @options.fetch(:exclude, [])
|
|
95
|
-
compile_excluded_patterns
|
|
16
|
+
@exclusion_filter = ExclusionFilter.new(@options[:exclude])
|
|
96
17
|
end
|
|
97
18
|
|
|
98
19
|
def run
|
|
@@ -133,6 +54,15 @@ module Gitingest
|
|
|
133
54
|
structure
|
|
134
55
|
end
|
|
135
56
|
|
|
57
|
+
# Exposed for testing
|
|
58
|
+
def excluded_patterns
|
|
59
|
+
# This is a bit of a hack to maintain backward compatibility with tests
|
|
60
|
+
# that check for excluded_patterns. In the new design, this is handled
|
|
61
|
+
# by ExclusionFilter.
|
|
62
|
+
@exclusion_filter.instance_variable_get(:@default_patterns) +
|
|
63
|
+
@exclusion_filter.instance_variable_get(:@custom_glob_patterns).map { |p| Regexp.new(p.gsub("*", ".*")) }
|
|
64
|
+
end
|
|
65
|
+
|
|
136
66
|
private
|
|
137
67
|
|
|
138
68
|
def setup_logger
|
|
@@ -152,11 +82,10 @@ module Gitingest
|
|
|
152
82
|
|
|
153
83
|
@options[:output_file] ||= "#{@options[:repository].split("/").last}_prompt.txt"
|
|
154
84
|
@options[:branch] ||= :default
|
|
155
|
-
@options[:exclude] ||= []
|
|
156
|
-
@options[:threads] ||= DEFAULT_THREAD_COUNT
|
|
157
|
-
@options[:thread_timeout] ||= DEFAULT_THREAD_TIMEOUT
|
|
85
|
+
@options[:exclude] ||= []
|
|
86
|
+
@options[:threads] ||= ContentFetcher::DEFAULT_THREAD_COUNT
|
|
87
|
+
@options[:thread_timeout] ||= ContentFetcher::DEFAULT_THREAD_TIMEOUT
|
|
158
88
|
@options[:show_structure] ||= false
|
|
159
|
-
# NOTE: @excluded_patterns is set in compile_excluded_patterns based on @options[:exclude] # This comment is now incorrect / removed.
|
|
160
89
|
end
|
|
161
90
|
|
|
162
91
|
def configure_client
|
|
@@ -169,241 +98,16 @@ module Gitingest
|
|
|
169
98
|
end
|
|
170
99
|
end
|
|
171
100
|
|
|
172
|
-
def compile_excluded_patterns
|
|
173
|
-
@default_patterns = DEFAULT_EXCLUDES.map { |pattern| Regexp.new(pattern) }
|
|
174
|
-
@custom_glob_patterns = [] # For File.fnmatch
|
|
175
|
-
@directory_patterns = []
|
|
176
|
-
|
|
177
|
-
@options[:exclude].each do |pattern_str|
|
|
178
|
-
if pattern_str.end_with?("/")
|
|
179
|
-
@directory_patterns << pattern_str
|
|
180
|
-
else
|
|
181
|
-
# All other custom excludes are treated as glob patterns.
|
|
182
|
-
# If the pattern does not contain a slash, prepend "**/"
|
|
183
|
-
# to make it match at any depth (e.g., "*.md" becomes "**/*.md").
|
|
184
|
-
@custom_glob_patterns << if pattern_str.include?("/")
|
|
185
|
-
pattern_str
|
|
186
|
-
else
|
|
187
|
-
"**/#{pattern_str}"
|
|
188
|
-
end
|
|
189
|
-
end
|
|
190
|
-
end
|
|
191
|
-
end
|
|
192
|
-
|
|
193
101
|
def fetch_repository_contents
|
|
194
102
|
@logger.info "Fetching repository: #{@options[:repository]} (branch: #{@options[:branch]})"
|
|
195
|
-
|
|
196
|
-
|
|
197
|
-
@repo_files = repo_tree.tree.select { |item| item.type == "blob" && !excluded_file?(item.path) }
|
|
198
|
-
if @repo_files.size > MAX_FILES
|
|
199
|
-
@logger.warn "Warning: Found #{@repo_files.size} files, limited to #{MAX_FILES}."
|
|
200
|
-
@repo_files = @repo_files.first(MAX_FILES)
|
|
201
|
-
end
|
|
103
|
+
fetcher = RepositoryFetcher.new(@client, @options[:repository], @options[:branch], @exclusion_filter)
|
|
104
|
+
@repo_files = fetcher.fetch
|
|
202
105
|
@logger.info "Found #{@repo_files.size} files after exclusion filters"
|
|
203
|
-
rescue Octokit::Unauthorized
|
|
204
|
-
raise "Authentication error: Invalid or expired GitHub token."
|
|
205
|
-
rescue Octokit::NotFound
|
|
206
|
-
raise "Repository not found: '#{@options[:repository]}' or branch '#{@options[:branch]}' doesn't exist or is private."
|
|
207
|
-
rescue Octokit::Error => e
|
|
208
|
-
raise "Error accessing repository: #{e.message}"
|
|
209
|
-
end
|
|
210
|
-
|
|
211
|
-
# Validate repository and branch access
|
|
212
|
-
def validate_repository_access
|
|
213
|
-
repo = @client.repository(@options[:repository])
|
|
214
|
-
@options[:branch] = repo.default_branch if @options[:branch] == :default
|
|
215
|
-
|
|
216
|
-
# If repository check succeeds, store this fact before trying branch
|
|
217
|
-
@repository_exists = true
|
|
218
|
-
|
|
219
|
-
begin
|
|
220
|
-
@client.branch(@options[:repository], @options[:branch])
|
|
221
|
-
rescue Octokit::NotFound
|
|
222
|
-
# If we got here, the repository exists but the branch doesn't
|
|
223
|
-
raise "Branch '#{@options[:branch]}' not found in repository '#{@options[:repository]}'"
|
|
224
|
-
end
|
|
225
|
-
rescue Octokit::Unauthorized
|
|
226
|
-
raise "Authentication error: Invalid or expired GitHub token"
|
|
227
|
-
rescue Octokit::NotFound
|
|
228
|
-
# Only reach this for repository not found (branch errors handled separately)
|
|
229
|
-
raise "Repository '#{@options[:repository]}' not found or is private. Check the repository name or provide a valid token."
|
|
230
|
-
end
|
|
231
|
-
|
|
232
|
-
def excluded_file?(path)
|
|
233
|
-
return true if path.match?(DOT_FILE_PATTERN)
|
|
234
|
-
|
|
235
|
-
# Check for directory exclusion patterns (ending with '/')
|
|
236
|
-
matched_dir_pattern = @directory_patterns.find { |dir_pattern| path.start_with?(dir_pattern) }
|
|
237
|
-
if matched_dir_pattern
|
|
238
|
-
@logger.debug { "Excluding #{path} (matched directory pattern: #{matched_dir_pattern})" }
|
|
239
|
-
return true
|
|
240
|
-
end
|
|
241
|
-
|
|
242
|
-
# Check default regex patterns
|
|
243
|
-
matched_default_pattern = @default_patterns.find { |pattern| path.match?(pattern) }
|
|
244
|
-
if matched_default_pattern
|
|
245
|
-
@logger.debug { "Excluding #{path} (matched default pattern: #{matched_default_pattern.source})" }
|
|
246
|
-
return true
|
|
247
|
-
end
|
|
248
|
-
|
|
249
|
-
# Check custom glob patterns using File.fnmatch
|
|
250
|
-
matched_glob_pattern = @custom_glob_patterns.find do |glob_pattern|
|
|
251
|
-
File.fnmatch(glob_pattern, path, File::FNM_PATHNAME | File::FNM_DOTMATCH)
|
|
252
|
-
end
|
|
253
|
-
if matched_glob_pattern
|
|
254
|
-
@logger.debug { "Excluding #{path} (matched custom glob pattern: #{matched_glob_pattern})" }
|
|
255
|
-
return true
|
|
256
|
-
end
|
|
257
|
-
|
|
258
|
-
false
|
|
259
106
|
end
|
|
260
107
|
|
|
261
108
|
def process_content_to_output(output)
|
|
262
|
-
|
|
263
|
-
|
|
264
|
-
progress = ProgressIndicator.new(@repo_files.size, @logger)
|
|
265
|
-
thread_buffers = Concurrent::Map.new # Thread-safe map for buffers
|
|
266
|
-
mutex = Mutex.new # Mutex for shared buffer and output operations
|
|
267
|
-
errors = Concurrent::Array.new # Thread-safe array for errors
|
|
268
|
-
pool = Concurrent::FixedThreadPool.new(@options[:threads])
|
|
269
|
-
prioritized_files = prioritize_files(@repo_files)
|
|
270
|
-
|
|
271
|
-
prioritized_files.each_with_index do |repo_file, index|
|
|
272
|
-
pool.post do
|
|
273
|
-
thread_id = Thread.current.object_id
|
|
274
|
-
thread_buffers[thread_id] ||= []
|
|
275
|
-
local_buffer = thread_buffers[thread_id]
|
|
276
|
-
begin
|
|
277
|
-
content = fetch_file_content_with_retry(repo_file.path)
|
|
278
|
-
local_buffer << format_file_content(repo_file.path, content)
|
|
279
|
-
if local_buffer.size >= LOCAL_BUFFER_THRESHOLD
|
|
280
|
-
mutex.synchronize do
|
|
281
|
-
buffer.concat(local_buffer)
|
|
282
|
-
write_buffer(output, buffer) if buffer.size >= BUFFER_SIZE
|
|
283
|
-
local_buffer.clear
|
|
284
|
-
end
|
|
285
|
-
end
|
|
286
|
-
progress.update(index + 1)
|
|
287
|
-
rescue Octokit::Error => e
|
|
288
|
-
mutex.synchronize { errors << "Error fetching #{repo_file.path}: #{e.message}" }
|
|
289
|
-
@logger.error "Error fetching #{repo_file.path}: #{e.message}"
|
|
290
|
-
rescue StandardError => e
|
|
291
|
-
mutex.synchronize { errors << "Unexpected error processing #{repo_file.path}: #{e.message}" }
|
|
292
|
-
@logger.error "Unexpected error processing #{repo_file.path}: #{e.message}"
|
|
293
|
-
end
|
|
294
|
-
end
|
|
295
|
-
end
|
|
296
|
-
|
|
297
|
-
pool.shutdown
|
|
298
|
-
unless pool.wait_for_termination(@options[:thread_timeout])
|
|
299
|
-
@logger.warn "Thread pool did not shut down gracefully within #{@options[:thread_timeout]}s, forcing termination."
|
|
300
|
-
pool.kill
|
|
301
|
-
end
|
|
302
|
-
|
|
303
|
-
mutex.synchronize do
|
|
304
|
-
thread_buffers.each_value { |local_buffer| buffer.concat(local_buffer) unless local_buffer.empty? }
|
|
305
|
-
write_buffer(output, buffer) unless buffer.empty?
|
|
306
|
-
end
|
|
307
|
-
|
|
308
|
-
return unless errors.any?
|
|
309
|
-
|
|
310
|
-
@logger.warn "Completed with #{errors.size} errors"
|
|
311
|
-
@logger.debug "First few errors: #{errors.first(3).join(", ")}" if @logger.debug?
|
|
312
|
-
end
-
-    def format_file_content(path, content)
-      <<~TEXT
-        ================================================================
-        File: #{path}
-        ================================================================
-        #{content}
-
-      TEXT
-    end
-
-    def fetch_file_content_with_retry(path, retries = 3, base_delay = 2)
-      content = @client.contents(@options[:repository], path: path, ref: @options[:branch])
-      Base64.decode64(content.content)
-    rescue Octokit::TooManyRequests
-      raise unless retries.positive?
-
-      delay = base_delay**(4 - retries) * (0.8 + 0.4 * rand)
-      @logger.warn "Rate limit exceeded, waiting #{delay.round(1)} seconds..."
-      sleep(delay)
-      fetch_file_content_with_retry(path, retries - 1, base_delay)
-    end
-
-    def write_buffer(file, buffer)
-      return if buffer.empty?
-
-      file.puts(buffer.join)
-      buffer.clear
-    end
-
-    def prioritize_files(files)
-      files.sort_by do |file|
-        ext = File.extname(file.path.downcase)
-        case ext
-        when ".md", ".txt", ".json", ".yaml", ".yml"
-          0 # Documentation and data files first
-        when ".rb", ".py", ".js", ".ts", ".go", ".java", ".c", ".cpp", ".h"
-          1 # Source code files second
-        else
-          2 # Other files last
-        end
-      end
-    end
-  end
-
-  class ProgressIndicator
-    BAR_WIDTH = 30
-
-    def initialize(total, logger)
-      @total = total
-      @logger = logger
-      @last_percent = 0
-      @start_time = Time.now
-      @last_update_time = Time.now
-      @update_interval = 0.5
-    end
-
-    def update(current)
-      now = Time.now
-      return if now - @last_update_time < @update_interval && current != @total
-
-      @last_update_time = now
-      percent = (current.to_f / @total * 100).round
-      return unless percent > @last_percent || current == @total
-
-      elapsed = now - @start_time
-      progress_chars = (BAR_WIDTH * (current.to_f / @total)).round
-      bar = "[#{"|" * progress_chars}#{" " * (BAR_WIDTH - progress_chars)}]"
-
-      rate = if elapsed.positive?
-               (current / elapsed).round(1)
-             else
-               0 # Avoid division by zero if elapsed time is zero
-             end
-      eta_string = current.positive? && percent < 100 && rate.positive? ? " ETA: #{format_time((@total - current) / rate)}" : ""
-
-      print "\r\e[K#{bar} #{percent}% | #{current}/#{@total} files (#{rate} files/sec)#{eta_string}"
-      print "\n" if current == @total
-      if (percent % 10).zero? && percent != @last_percent || current == @total
-        @logger.info "Processing: #{percent}% complete (#{current}/#{@total} files)#{eta_string}"
-      end
-      @last_percent = percent
-    end
-
-    private
-
-    def format_time(seconds)
-      return "< 1s" if seconds < 1
-
-      case seconds
-      when 0...60 then "#{seconds.round}s"
-      when 60...3600 then "#{(seconds / 60).floor}m #{(seconds % 60).round}s"
-      else "#{(seconds / 3600).floor}h #{((seconds % 3600) / 60).floor}m"
-      end
+      fetcher = ContentFetcher.new(@client, @options[:repository], @repo_files, @logger, @options)
+      fetcher.fetch(output)
     end
   end

data/lib/gitingest/progress_indicator.rb
ADDED
@@ -0,0 +1,55 @@
+# frozen_string_literal: true
+
+module Gitingest
+  class ProgressIndicator
+    BAR_WIDTH = 30
+
+    def initialize(total, logger)
+      @total = total
+      @logger = logger
+      @last_percent = 0
+      @start_time = Time.now
+      @last_update_time = Time.now
+      @update_interval = 0.5
+    end
+
+    def update(current)
+      now = Time.now
+      return if now - @last_update_time < @update_interval && current != @total
+
+      @last_update_time = now
+      percent = (current.to_f / @total * 100).round
+      return unless percent > @last_percent || current == @total
+
+      elapsed = now - @start_time
+      progress_chars = (BAR_WIDTH * (current.to_f / @total)).round
+      bar = "[#{"|" * progress_chars}#{" " * (BAR_WIDTH - progress_chars)}]"
+
+      rate = if elapsed.positive?
+               (current / elapsed).round(1)
+             else
+               0 # Avoid division by zero if elapsed time is zero
+             end
+      eta_string = current.positive? && percent < 100 && rate.positive? ? " ETA: #{format_time((@total - current) / rate)}" : ""
+
+      print "\r\e[K#{bar} #{percent}% | #{current}/#{@total} files (#{rate} files/sec)#{eta_string}"
+      print "\n" if current == @total
+      if (percent % 10).zero? && percent != @last_percent || current == @total
+        @logger.info "Processing: #{percent}% complete (#{current}/#{@total} files)#{eta_string}"
+      end
+      @last_percent = percent
+    end
+
+    private
+
+    def format_time(seconds)
+      return "< 1s" if seconds < 1
+
+      case seconds
+      when 0...60 then "#{seconds.round}s"
+      when 60...3600 then "#{(seconds / 60).floor}m #{(seconds % 60).round}s"
+      else "#{(seconds / 3600).floor}h #{((seconds % 3600) / 60).floor}m"
+      end
+    end
+  end
+end
data/lib/gitingest/repository_fetcher.rb
ADDED
@@ -0,0 +1,55 @@
+# frozen_string_literal: true
+
+require "octokit"
+
+module Gitingest
+  class RepositoryFetcher
+    MAX_FILES = 1000
+
+    def initialize(client, repository, branch = :default, exclusion_filter = nil)
+      @client = client
+      @repository = repository
+      @branch = branch
+      @exclusion_filter = exclusion_filter
+    end
+
+    def fetch
+      validate_repository_access
+      repo_tree = @client.tree(@repository, @branch, recursive: true)
+
+      files = repo_tree.tree.select do |item|
+        item.type == "blob" && !@exclusion_filter&.excluded?(item.path)
+      end
+
+      if files.size > MAX_FILES
+        # We might want to warn here, but for now we just truncate
+        files = files.first(MAX_FILES)
+      end
+
+      files
+    rescue Octokit::Unauthorized
+      raise "Authentication error: Invalid or expired GitHub token."
+    rescue Octokit::NotFound
+      raise "Repository not found: '#{@repository}' or branch '#{@branch}' doesn't exist or is private."
+    rescue Octokit::Error => e
+      raise "Error accessing repository: #{e.message}"
+    end
+
+    private
+
+    def validate_repository_access
+      repo = @client.repository(@repository)
+      @branch = repo.default_branch if @branch == :default
+
+      begin
+        @client.branch(@repository, @branch)
+      rescue Octokit::NotFound
+        raise "Branch '#{@branch}' not found in repository '#{@repository}'"
+      end
+    rescue Octokit::Unauthorized
+      raise "Authentication error: Invalid or expired GitHub token"
+    rescue Octokit::NotFound
+      raise "Repository '#{@repository}' not found or is private. Check the repository name or provide a valid token."
+    end
+  end
+end
data/lib/gitingest/version.rb
CHANGED
data/lib/gitingest.rb
CHANGED
@@ -1,6 +1,10 @@
 # frozen_string_literal: true

 require_relative "gitingest/version"
+require_relative "gitingest/exclusion_filter"
+require_relative "gitingest/repository_fetcher"
+require_relative "gitingest/progress_indicator"
+require_relative "gitingest/content_fetcher"
 require_relative "gitingest/generator"

 module Gitingest
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: gitingest
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 1.0.0
 platform: ruby
 authors:
 - Davide Santangelo
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2025-
+date: 2025-11-28 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: concurrent-ruby
@@ -133,7 +133,11 @@ files:
 - bin/setup
 - index.html
 - lib/gitingest.rb
+- lib/gitingest/content_fetcher.rb
+- lib/gitingest/exclusion_filter.rb
 - lib/gitingest/generator.rb
+- lib/gitingest/progress_indicator.rb
+- lib/gitingest/repository_fetcher.rb
 - lib/gitingest/version.rb
 - sig/gitingest.rbs
 homepage: https://github.com/davidesantangelo/gitingest