star-dlp 0.1.0 → 0.1.1
- checksums.yaml +4 -4
- data/Gemfile.lock +1 -1
- data/README.md +42 -2
- data/README_zh.md +42 -2
- data/lib/star/dlp/cli.rb +40 -1
- data/lib/star/dlp/downloader.rb +353 -91
- data/lib/star/dlp/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 25b1201d34bb3a3e4d219f9faadde2b8d34aba5747c1ceb926482699107b44e2
+  data.tar.gz: f431fbe097ef52988772208ac1021d4e1dbd11f59fe16c4b8ab512e84ad19906
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a21e8d101a58153efd911c4974e25683291d9b1ae52dc1f519b8b01f93dbb679c23f0d442eaad8dbf5c1b941e9db6dfa9abea42bc917c1f2a6d65984590dd195
+  data.tar.gz: 6783258316126bcdd95f5ebedd283d9ffa2ce38c6f24acfdd124e7e4ff584549cf1dc1ed9caa0d3118a1bd210ba2f5de64b51ac456777d6225e61d979480a4fc
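The digests listed for `checksums.yaml` cover the `metadata.gz` and `data.tar.gz` members of the gem archive. As a rough sketch of how such entries could be checked, the snippet below compares SHA256 and SHA512 digests against a `checksums.yaml` document. The helper name `checksums_match?` and the in-memory sample data are assumptions made for illustration; they are not part of star-dlp or RubyGems.

```ruby
require "digest"
require "yaml"

# Hypothetical helper: true when the SHA256 and SHA512 digests of every blob
# match the corresponding entries of a checksums.yaml document.
def checksums_match?(checksums_yaml, blobs)
  expected = YAML.safe_load(checksums_yaml)
  blobs.all? do |name, bytes|
    expected.dig("SHA256", name) == Digest::SHA256.hexdigest(bytes) &&
      expected.dig("SHA512", name) == Digest::SHA512.hexdigest(bytes)
  end
end

# In-memory stand-ins for the two archive members (illustrative data only).
blobs = { "metadata.gz" => "meta", "data.tar.gz" => "data" }
yaml = {
  "SHA256" => blobs.transform_values { |b| Digest::SHA256.hexdigest(b) },
  "SHA512" => blobs.transform_values { |b| Digest::SHA512.hexdigest(b) }
}.to_yaml

puts checksums_match?(yaml, blobs)
```

In practice the blobs would be read from an unpacked `.gem` file rather than built in memory.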
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -48,6 +48,44 @@ $ star-dlp download your_github_username
 
 This will download all your starred repositories and save them as JSON and Markdown files. If you've previously downloaded some repositories, it will only download newly starred repositories.
 
+Available options:
+- `--token`: GitHub API token
+- `--output_dir`: Output directory
+- `--json_dir`: JSON files directory
+- `--markdown_dir`: Markdown files directory
+- `--threads`: Number of download threads (default: 16)
+- `--skip_readme`: Skip downloading README files
+- `--retry_count`: Number of retry attempts for failed downloads (default: 5)
+- `--retry_delay`: Delay in seconds between retry attempts (default: 1)
+
+Example with options:
+
+```bash
+$ star-dlp download your_github_username --threads=8 --skip_readme --retry_count=3
+```
+
+### Downloading READMEs
+
+If you've already downloaded your starred repositories but want to download or update their README files separately:
+
+```bash
+$ star-dlp download_readme
+```
+
+This command will scan your JSON files directory, extract repository information, and download README files for repositories that don't already have them.
+
+Available options:
+- `--threads`: Number of download threads (default: 16)
+- `--retry_count`: Number of retry attempts for failed downloads (default: 5)
+- `--retry_delay`: Delay in seconds between retry attempts (default: 1)
+- `--force`: Force download even if README was already downloaded
+
+Example with options:
+
+```bash
+$ star-dlp download_readme --threads=8 --force
+```
+
 ### View Version
 
 ```bash
@@ -60,8 +98,10 @@ Star-DLP saves files in the following locations:
 
 - Configuration file: `~/.star-dlp/config.json`
 - Starred repositories: `~/.star-dlp/stars/`
-- JSON files: `~/.star-dlp/stars/json
-- Markdown files: `~/.star-dlp/stars/markdown
+- JSON files: `~/.star-dlp/stars/json/YYYY/MM/YYYYMMDD.owner.repo.json`
+- Markdown files: `~/.star-dlp/stars/markdown/YYYY/MM/YYYYMMDD.owner.repo.md`
+- Last downloaded repository: `~/.star-dlp/stars/last_downloaded_repo.txt`
+- Downloaded READMEs list: `~/.star-dlp/stars/downloaded_readmes.txt`
 
 ## Development
 
data/README_zh.md
CHANGED
@@ -48,6 +48,44 @@ $ star-dlp download your_github_username
 
 This will download all your starred repositories and save them as JSON and Markdown files. If you have previously downloaded some repositories, it will only download newly starred ones.
 
+Available options:
+- `--token`: GitHub API token
+- `--output_dir`: Output directory
+- `--json_dir`: JSON files directory
+- `--markdown_dir`: Markdown files directory
+- `--threads`: Number of download threads (default: 16)
+- `--skip_readme`: Skip downloading README files
+- `--retry_count`: Number of retries when a download fails (default: 5)
+- `--retry_delay`: Delay in seconds between retries (default: 1)
+
+Example with options:
+
+```bash
+$ star-dlp download your_github_username --threads=8 --skip_readme --retry_count=3
+```
+
+### Downloading README Files
+
+If you have already downloaded your starred repositories but want to download or update their README files separately:
+
+```bash
+$ star-dlp download_readme
+```
+
+This command scans your JSON files directory, extracts repository information, and downloads README files for repositories whose README has not been downloaded yet.
+
+Available options:
+- `--threads`: Number of download threads (default: 16)
+- `--retry_count`: Number of retries when a download fails (default: 5)
+- `--retry_delay`: Delay in seconds between retries (default: 1)
+- `--force`: Force download even if the README has already been downloaded
+
+Example with options:
+
+```bash
+$ star-dlp download_readme --threads=8 --force
+```
+
 ### View Version
 
 ```bash
@@ -60,8 +98,10 @@ Star-DLP saves files in the following locations:
 
 - Configuration file: `~/.star-dlp/config.json`
 - Starred repositories: `~/.star-dlp/stars/`
-- JSON files: `~/.star-dlp/stars/json
-- Markdown files: `~/.star-dlp/stars/markdown
+- JSON files: `~/.star-dlp/stars/json/YYYY/MM/YYYYMMDD.owner.repo.json`
+- Markdown files: `~/.star-dlp/stars/markdown/YYYY/MM/YYYYMMDD.owner.repo.md`
+- Last downloaded repository: `~/.star-dlp/stars/last_downloaded_repo.txt`
+- List of downloaded READMEs: `~/.star-dlp/stars/downloaded_readmes.txt`
 
 ## Development
 
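The README changes above document a date-based layout, `YYYY/MM/YYYYMMDD.owner.repo.json` (and `.md`). A minimal sketch of how such a relative path could be derived from a repository name and a `starred_at` timestamp follows. The helper `star_path` is hypothetical and only mirrors the documented naming; the gem's own path helpers appear in the `downloader.rb` diff.

```ruby
require "time"

# Hypothetical helper: builds the relative path for a starred repository in
# the YYYY/MM/YYYYMMDD.owner.repo.<ext> layout.
def star_path(repo_full_name, starred_at, ext)
  date = Time.parse(starred_at).utc
  filename = "#{date.strftime('%Y%m%d')}.#{repo_full_name.tr('/', '.')}.#{ext}"
  File.join(date.strftime("%Y"), date.strftime("%m"), filename)
end

puts star_path("rails/rails", "2024-03-05T12:00:00Z", "json")
# prints 2024/03/20240305.rails.rails.json
```

The sketch normalizes the timestamp to UTC so the result does not depend on the local time zone; that is a choice of this example, not something the diff states.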
data/lib/star/dlp/cli.rb
CHANGED
@@ -1,6 +1,9 @@
 # frozen_string_literal: true
 
 require "thor"
+require "fileutils"
+require "json"
+require "time"
 require_relative "config"
 require_relative "downloader"
 
@@ -12,6 +15,10 @@ module Star
       option :output_dir, type: :string, desc: "Output directory for stars"
       option :json_dir, type: :string, desc: "Directory for JSON files"
       option :markdown_dir, type: :string, desc: "Directory for Markdown files"
+      option :threads, type: :numeric, default: 16, desc: "Number of download threads"
+      option :skip_readme, type: :boolean, default: false, desc: "Skip downloading README files"
+      option :retry_count, type: :numeric, default: 5, desc: "Number of retry attempts for failed downloads"
+      option :retry_delay, type: :numeric, default: 1, desc: "Delay in seconds between retry attempts"
       def download(username)
         config = Config.load
 
@@ -24,10 +31,42 @@ module Star
         # Save config for future use
         config.save
 
-        downloader = Downloader.new(
+        downloader = Downloader.new(
+          config,
+          username,
+          thread_count: options[:threads],
+          skip_readme: options[:skip_readme],
+          retry_count: options[:retry_count],
+          retry_delay: options[:retry_delay]
+        )
         downloader.download
       end
 
+      desc "download_readme", "Download READMEs for all repositories from JSON files"
+      option :threads, type: :numeric, default: 16, desc: "Number of download threads"
+      option :retry_count, type: :numeric, default: 5, desc: "Number of retry attempts for failed downloads"
+      option :retry_delay, type: :numeric, default: 1, desc: "Delay in seconds between retry attempts"
+      option :force, type: :boolean, default: false, desc: "Force download even if README was already downloaded"
+      def download_readme
+        config = Config.load
+
+        # Create a downloader instance
+        downloader = Downloader.new(
+          config,
+          "readme_downloader", # Placeholder username
+          thread_count: options[:threads],
+          retry_count: options[:retry_count],
+          retry_delay: options[:retry_delay]
+        )
+
+        # Call the download_readmes method in the Downloader class
+        result = downloader.download_readmes(force: options[:force])
+
+        puts "README download completed!"
+        puts "Successfully downloaded: #{result[:success]}"
+        puts "Failed or not found: #{result[:failed]}"
+      end
+
       desc "config", "Configure star-dlp"
       option :token, type: :string, desc: "GitHub API token"
       option :output_dir, type: :string, desc: "Output directory for stars"
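The new `download_readme` command pairs a `--force` flag with a tracking file of repositories whose README was already fetched. The sketch below shows one way that interaction could work: without `force`, names found in the tracking file are skipped, and with `force` everything is processed again. The helper `repos_needing_readme` is hypothetical. The example requires `set` explicitly, which Ruby versions before 3.2 need before `Set` can be used.

```ruby
require "set"
require "tmpdir"

# Hypothetical helper: given repository names and the path of a tracking file
# (one "owner/repo" per line), return the repositories that still need a README.
def repos_needing_readme(all_repos, tracking_file, force: false)
  return all_repos if force || !File.exist?(tracking_file)

  done = Set.new(File.readlines(tracking_file, chomp: true))
  all_repos.reject { |repo| done.include?(repo) }
end

Dir.mktmpdir do |dir|
  tracking = File.join(dir, "downloaded_readmes.txt")
  File.write(tracking, "octocat/hello-world\n")
  repos = ["octocat/hello-world", "rails/rails"]

  p repos_needing_readme(repos, tracking)              # skips the tracked repo
  p repos_needing_readme(repos, tracking, force: true) # processes everything
end
```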
data/lib/star/dlp/downloader.rb
CHANGED
@@ -5,6 +5,7 @@ require "json"
 require "fileutils"
 require "time"
 require "base64"
+require "thread"
 
 module Star
   module Dlp
@@ -12,10 +13,18 @@ module Star
       attr_reader :config, :github, :username
 
       LAST_REPO_FILE = "last_downloaded_repo.txt"
+      DOWNLOADED_READMES_FILE = "downloaded_readmes.txt"
+      DEFAULT_THREAD_COUNT = 16
+      DEFAULT_RETRY_COUNT = 5
+      DEFAULT_RETRY_DELAY = 1 # seconds
 
-      def initialize(config, username)
+      def initialize(config, username, thread_count: DEFAULT_THREAD_COUNT, skip_readme: false, retry_count: DEFAULT_RETRY_COUNT, retry_delay: DEFAULT_RETRY_DELAY)
         @config = config
         @username = username
+        @thread_count = thread_count
+        @skip_readme = skip_readme
+        @retry_count = retry_count
+        @retry_delay = retry_delay
 
         # Initialize GitHub API client with the special Accept header for starred_at field
         options = {
@@ -98,14 +107,19 @@ module Star
 
         puts "Found #{new_stars.size} new starred repositories to download"
 
-        # Save new stars
+        # Save new stars using multiple threads
         if new_stars.any?
-          puts "Downloading new repositories:"
-
-
-
-
-
+          puts "Downloading new repositories using #{@thread_count} threads:"
+
+          # Process stars with multithreading
+          process_items_with_threads(
+            new_stars,
+            ->(star) { get_repo_full_name(star) },
+            ->(star) {
+              save_star_as_json(star)
+              save_star_as_markdown(star)
+            }
+          )
 
           puts "Download completed successfully!"
         else
@@ -119,8 +133,320 @@ module Star
         end
       end
 
+      # Download READMEs for all repositories from JSON files
+      def download_readmes(force: false)
+        puts "Downloading READMEs for repositories from JSON files"
+
+        # File to track repositories with downloaded READMEs
+        downloaded_readmes_file = File.join(config.output_dir, DOWNLOADED_READMES_FILE)
+
+        # Load list of repositories with already downloaded READMEs
+        downloaded_repos = Set.new
+        if File.exist?(downloaded_readmes_file) && !force
+          File.readlines(downloaded_readmes_file).each do |line|
+            downloaded_repos.add(line.strip)
+          end
+          puts "Found #{downloaded_repos.size} repositories with already downloaded READMEs"
+        end
+
+        # Find all JSON files in the json directory
+        json_files = Dir.glob(File.join(config.json_dir, "**", "*.json"))
+        puts "Found #{json_files.size} JSON files"
+
+        # Extract repository names from JSON files
+        repos_to_process = []
+        repo_dates = {} # Store starred_at dates for repositories
+
+        json_files.each do |json_file|
+          begin
+            data = JSON.parse(File.read(json_file))
+
+            # Extract repository full name from JSON data
+            repo_full_name = nil
+            starred_at = nil
+
+            if data.is_a?(Hash) && data["repo"] && data["repo"]["full_name"]
+              repo_full_name = data["repo"]["full_name"]
+              starred_at = data["starred_at"] if data.key?("starred_at")
+            elsif data.is_a?(Hash) && data["full_name"]
+              repo_full_name = data["full_name"]
+              starred_at = data["starred_at"] if data.key?("starred_at")
+            elsif File.basename(json_file) =~ /(\d{8})\.(.+)\.json$/
+              # Try to extract from filename (format: YYYYMMDD.owner.repo.json)
+              date_str = $1
+              parts = $2.split('.')
+              if parts.size >= 2
+                repo_full_name = "#{parts[0]}/#{parts[1]}"
+                # Convert YYYYMMDD to ISO date format
+                if date_str =~ /^(\d{4})(\d{2})(\d{2})$/
+                  starred_at = "#{$1}-#{$2}-#{$3}T00:00:00Z"
+                end
+              end
+            end
+
+            # Skip if we couldn't determine the repository name or if README was already downloaded
+            next if repo_full_name.nil?
+            next if downloaded_repos.include?(repo_full_name) && !force
+
+            repos_to_process << repo_full_name
+            # Store the starred_at date if available
+            repo_dates[repo_full_name] = starred_at if starred_at
+          rescue JSON::ParserError => e
+            puts "Error parsing JSON file #{json_file}: #{e.message}"
+          end
+        end
+
+        puts "Found #{repos_to_process.size} repositories that need README downloads"
+
+        # Create a mutex for thread-safe file writing
+        mutex = Mutex.new
+        success_count = 0
+        failed_count = 0
+
+        # Process repositories with multithreading
+        result = process_items_with_threads(
+          repos_to_process,
+          ->(repo) { repo }, # Item name is the repo name itself
+          ->(repo_full_name) {
+            # Try to download README
+            readme_content = fetch_readme(repo_full_name)
+
+            if readme_content
+              # Get starred_at date if available, or use current date as fallback
+              date = nil
+              if repo_dates.key?(repo_full_name) && repo_dates[repo_full_name]
+                begin
+                  date = Time.parse(repo_dates[repo_full_name])
+                rescue
+                  date = Time.now
+                end
+              else
+                date = Time.now
+              end
+
+              # Create markdown file path
+              md_filepath = get_markdown_filepath(repo_full_name, date)
+
+              mutex.synchronize do
+                # Check if file exists
+                if File.exist?(md_filepath)
+                  # Append README content to existing file
+                  File.open(md_filepath, 'a') do |file|
+                    file.puts "\n\n## README\n\n#{readme_content}\n"
+                  end
+                else
+                  # Create new file with repository information and README
+                  content = <<~MARKDOWN
+                    # #{repo_full_name}
+
+                    - **Downloaded at**: #{Time.now.iso8601}
+                    - **Starred at**: #{date.iso8601}
+
+                    [View on GitHub](https://github.com/#{repo_full_name})
+
+                    ## README
+
+                    #{readme_content}
+                  MARKDOWN
+
+                  File.write(md_filepath, content)
+                end
+
+                # Add to downloaded repositories list
+                File.open(downloaded_readmes_file, 'a') do |file|
+                  file.puts repo_full_name
+                end
+
+                success_count += 1
+              end
+
+              true
+            else
+              mutex.synchronize do
+                puts "No README found for #{repo_full_name}"
+                failed_count += 1
+              end
+              true # Mark as success even if README not found to avoid retries
+            end
+          }
+        )
+
+        puts "README download completed!"
+        puts "Successfully downloaded: #{success_count}"
+        puts "Failed or not found: #{failed_count}"
+
+        return {
+          total: repos_to_process.size,
+          success: success_count,
+          failed: failed_count
+        }
+      end
+
+      # Fetch README.md content from GitHub
+      def fetch_readme(repo_full_name)
+        begin
+          # Get README content using GitHub API
+          response = github.repos.contents.get(
+            user: repo_full_name.split('/').first,
+            repo: repo_full_name.split('/').last,
+            path: 'README.md'
+          )
+
+          # Decode content from Base64
+          if response.content && response.encoding == 'base64'
+            return Base64.decode64(response.content).force_encoding('UTF-8')
+          end
+        rescue Github::Error::NotFound
+          # Try README.markdown if README.md not found
+          begin
+            response = github.repos.contents.get(
+              user: repo_full_name.split('/').first,
+              repo: repo_full_name.split('/').last,
+              path: 'README.markdown'
+            )
+
+            if response.content && response.encoding == 'base64'
+              return Base64.decode64(response.content).force_encoding('UTF-8')
+            end
+          rescue Github::Error::NotFound
+            # Try readme.md (lowercase) if previous attempts failed
+            begin
+              response = github.repos.contents.get(
+                user: repo_full_name.split('/').first,
+                repo: repo_full_name.split('/').last,
+                path: 'readme.md'
+              )
+
+              if response.content && response.encoding == 'base64'
+                return Base64.decode64(response.content).force_encoding('UTF-8')
+              end
+            rescue Github::Error::NotFound
+              # README not found
+              return nil
+            rescue => e
+              puts "Error fetching lowercase readme.md for #{repo_full_name}: #{e.message}"
+              raise e
+            end
+          rescue => e
+            puts "Error fetching README.markdown for #{repo_full_name}: #{e.message}"
+            raise e
+          end
+        rescue => e
+          puts "Error fetching README.md for #{repo_full_name}: #{e.message}"
+          raise e
+        end
+
+        nil
+      end
+
       private
 
+      # Process a list of items using multiple threads
+      # items: Array of items to process
+      # name_proc: Proc to get item name for logging
+      # process_proc: Proc to process each item
+      def process_items_with_threads(items, name_proc, process_proc)
+        return if items.empty?
+
+        # Create a thread-safe queue for the items
+        queue = Queue.new
+        items.each { |item| queue << item }
+
+        # Create a mutex for thread-safe output
+        mutex = Mutex.new
+
+        # Create a progress counter
+        total = items.size
+        completed = 0
+
+        # Create and start the worker threads
+        threads = Array.new(@thread_count) do
+          Thread.new do
+            until queue.empty?
+              # Try to get an item from the queue (non-blocking)
+              item = queue.pop(true) rescue nil
+              break unless item
+
+              # Get the item name for logging
+              item_name = name_proc.call(item)
+
+              # Process the item with retry mechanism
+              success = false
+              retry_count = 0
+
+              until success || retry_count >= @retry_count
+                begin
+                  # Process the item
+                  process_proc.call(item)
+                  success = true
+                rescue => e
+                  retry_count += 1
+
+                  # Log the error and retry information
+                  mutex.synchronize do
+                    puts " Error processing #{item_name}: #{e.message}"
+                    if retry_count < @retry_count
+                      puts " Retrying in #{@retry_delay} seconds (attempt #{retry_count + 1}/#{@retry_count})..."
+                    else
+                      puts " Failed to process after #{@retry_count} attempts."
+                    end
+                  end
+
+                  # Wait before retrying
+                  sleep(@retry_delay)
+                end
+              end
+
+              # Update progress
+              mutex.synchronize do
+                completed += 1
+                puts " [#{completed}/#{total}] Processed: #{item_name} (#{(completed.to_f / total * 100).round(1)}%)"
+              end
+            end
+          end
+        end
+
+        # Wait for all threads to complete
+        threads.each(&:join)
+
+        return {
+          total: total,
+          completed: completed
+        }
+      end
+
+      # Get the markdown file path for a repository
+      def get_markdown_filepath(repo_full_name, date = Time.now)
+        # Create directory structure based on date: markdown/YYYY/MM/
+        year_dir = date.strftime("%Y")
+        month_dir = date.strftime("%m")
+        target_dir = File.join(config.markdown_dir, year_dir, month_dir)
+        FileUtils.mkdir_p(target_dir) unless Dir.exist?(target_dir)
+
+        # Format filename: YYYYMMDD.repo_owner.repo_name.md
+        date_str = date.strftime("%Y%m%d")
+        repo_name = repo_full_name.gsub('/', '.')
+        filename = "#{date_str}.#{repo_name}.md"
+
+        File.join(target_dir, filename)
+      end
+
+      # Get the JSON file path for a repository
+      def get_json_filepath(repo_full_name, date = Time.now)
+        # Create directory structure based on date: json/YYYY/MM/
+        year_dir = date.strftime("%Y")
+        month_dir = date.strftime("%m")
+        target_dir = File.join(config.json_dir, year_dir, month_dir)
+        FileUtils.mkdir_p(target_dir) unless Dir.exist?(target_dir)
+
+        # Format filename: YYYYMMDD.repo_owner.repo_name.json
+        date_str = date.strftime("%Y%m%d")
+        repo_name = repo_full_name.gsub('/', '.')
+        filename = "#{date_str}.#{repo_name}.json"
+
+        File.join(target_dir, filename)
+      end
+
       def get_last_repo_name
         last_repo_file = File.join(config.output_dir, LAST_REPO_FILE)
         return nil unless File.exist?(last_repo_file)
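The `process_items_with_threads` method added in this hunk drains a shared queue from several worker threads and retries failing items. The sketch below shows the same general technique in a self-contained form. It is not the gem's implementation: it swaps the non-blocking `pop(true)` for `Queue#close` plus a blocking `pop`, and it reports failed items separately instead of only counting completions. The name `process_in_threads` and its parameters are assumptions made for this example.

```ruby
# Hypothetical helper: run the block for every item on `thread_count` threads,
# attempting each item up to `retry_count` times.
# Returns [succeeded_items, failed_items].
def process_in_threads(items, thread_count: 4, retry_count: 3, retry_delay: 0, &work)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # a blocking pop now returns nil once the queue is drained

  mutex = Mutex.new
  succeeded = []
  failed = []

  workers = Array.new(thread_count) do
    Thread.new do
      # Note: nil or false items would also end this loop.
      while (item = queue.pop)
        attempts = 0
        begin
          attempts += 1
          work.call(item)
          mutex.synchronize { succeeded << item }
        rescue StandardError
          if attempts < retry_count
            sleep(retry_delay)
            retry # re-runs the begin block for the same item
          end
          mutex.synchronize { failed << item }
        end
      end
    end
  end

  workers.each(&:join)
  [succeeded, failed]
end

# Usage: "b" fails once and then succeeds, "c" fails on every attempt.
calls = Hash.new(0)
lock = Mutex.new
ok, bad = process_in_threads(%w[a b c], thread_count: 2) do |item|
  n = lock.synchronize { calls[item] += 1 }
  raise "transient failure" if item == "b" && n < 2
  raise "permanent failure" if item == "c"
end
p ok.sort
p bad
```

Closing the queue up front avoids the gap between checking `queue.empty?` and popping, at the cost of requiring all items to be enqueued before the workers start.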
@@ -133,25 +459,19 @@ module Star
         File.write(last_repo_file, repo_name)
       end
 
-
       def save_star_as_json(star)
         star_data = star.to_hash
 
         # Get starred_at date or use current date as fallback
         starred_at = star.respond_to?(:starred_at) ? Time.parse(star.starred_at) : Time.now
 
-        #
-
-        month_dir = starred_at.strftime("%m")
-        target_dir = File.join(config.json_dir, year_dir, month_dir)
-        FileUtils.mkdir_p(target_dir) unless Dir.exist?(target_dir)
+        # Get the repository name
+        repo_full_name = get_repo_full_name(star)
 
-        #
-
-        repo_name = get_repo_full_name(star).gsub('/', '.')
-        filename = "#{date_str}.#{repo_name}.json"
+        # Get the JSON file path
+        filepath = get_json_filepath(repo_full_name, starred_at)
 
-
+        # Write the JSON file
         File.write(filepath, JSON.pretty_generate(star_data))
       end
 
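Because JSON files are named `YYYYMMDD.owner.repo.json`, `download_readmes` can fall back to the filename when the JSON body lacks a repository name. The sketch below illustrates that reverse mapping under one assumption that differs from the diff: everything after the owner segment is treated as the repository name, so names containing dots (for example `socketio/socket.io`) survive the round trip. The diff's version keeps only the first two dot-separated parts. `parse_star_filename` and `FILENAME_PATTERN` are hypothetical names.

```ruby
# Hypothetical pattern: date, owner (no dots), repository name (dots allowed).
FILENAME_PATTERN = /\A(\d{4})(\d{2})(\d{2})\.([^.]+)\.(.+)\.json\z/

# Hypothetical helper: split "YYYYMMDD.owner.repo.json" into a repository
# name and an ISO 8601 date, or return nil when the name does not match.
def parse_star_filename(basename)
  match = FILENAME_PATTERN.match(basename)
  return nil unless match

  year, month, day, owner, repo = match.captures
  { full_name: "#{owner}/#{repo}", starred_at: "#{year}-#{month}-#{day}T00:00:00Z" }
end

p parse_star_filename("20240305.socketio.socket.io.json")
p parse_star_filename("notes.json")
```

This relies on owner names not containing dots, which should be checked against GitHub's naming rules before using the idea in practice.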
@@ -159,19 +479,14 @@ module Star
         # Get starred_at date or use current date as fallback
         starred_at = star.respond_to?(:starred_at) ? Time.parse(star.starred_at) : Time.now
 
-        #
-        year_dir = starred_at.strftime("%Y")
-        month_dir = starred_at.strftime("%m")
-        target_dir = File.join(config.markdown_dir, year_dir, month_dir)
-        FileUtils.mkdir_p(target_dir) unless Dir.exist?(target_dir)
-
-        # Format filename: YYYYMMDD.username.repo_name.md
-        date_str = starred_at.strftime("%Y%m%d")
+        # Get the repository name
         repo_full_name = get_repo_full_name(star)
-        repo_name = repo_full_name.gsub('/', '.')
-        filename = "#{date_str}.#{repo_name}.md"
 
-
+        # Get the markdown file path
+        filepath = get_markdown_filepath(repo_full_name, starred_at)
+
+        # Skip if file already exists
+        return if File.exist?(filepath)
 
         # Include starred_at in the markdown
         starred_at_str = star.respond_to?(:starred_at) ? star.starred_at : "N/A"
@@ -196,10 +511,14 @@ module Star
           #{(get_topics(star) || []).map { |topic| "- #{topic}" }.join("\n")}
         MARKDOWN
 
-        # Try to fetch README.md content
-
-
-
+        # Try to fetch README.md content if not skipped
+        unless @skip_readme
+          readme_content = fetch_readme(repo_full_name)
+          if readme_content
+            content += "\n\n## README\n\n#{readme_content}\n"
+          else
+            content += "\n\n## Description\n\n#{get_description(star)}\n"
+          end
         else
           content += "\n\n## Description\n\n#{get_description(star)}\n"
         end
@@ -297,63 +616,6 @@ module Star
           []
         end
       end
-
-      # Fetch README.md content from GitHub
-      def fetch_readme(repo_full_name)
-        begin
-          # Get README content using GitHub API
-          response = github.repos.contents.get(
-            user: repo_full_name.split('/').first,
-            repo: repo_full_name.split('/').last,
-            path: 'README.md'
-          )
-
-          # Decode content from Base64
-          if response.content && response.encoding == 'base64'
-            return Base64.decode64(response.content).force_encoding('UTF-8')
-          end
-        rescue Github::Error::NotFound
-          # Try README.markdown if README.md not found
-          begin
-            response = github.repos.contents.get(
-              user: repo_full_name.split('/').first,
-              repo: repo_full_name.split('/').last,
-              path: 'README.markdown'
-            )
-
-            if response.content && response.encoding == 'base64'
-              return Base64.decode64(response.content).force_encoding('UTF-8')
-            end
-          rescue Github::Error::NotFound
-            # Try readme.md (lowercase) if previous attempts failed
-            begin
-              response = github.repos.contents.get(
-                user: repo_full_name.split('/').first,
-                repo: repo_full_name.split('/').last,
-                path: 'readme.md'
-              )
-
-              if response.content && response.encoding == 'base64'
-                return Base64.decode64(response.content).force_encoding('UTF-8')
-              end
-            rescue Github::Error::NotFound
-              # README not found
-              return nil
-            rescue => e
-              puts "Error fetching lowercase readme.md for #{repo_full_name}: #{e.message}"
-              return nil
-            end
-          rescue => e
-            puts "Error fetching README.markdown for #{repo_full_name}: #{e.message}"
-            return nil
-          end
-        rescue => e
-          puts "Error fetching README.md for #{repo_full_name}: #{e.message}"
-          return nil
-        end
-
-        nil
-      end
     end
   end
 end
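Both the removed and the added `fetch_readme` try `README.md`, `README.markdown`, and `readme.md` in turn through nested `rescue` blocks. In 0.1.1 other errors are re-raised rather than swallowed, which lets the retry loop see them. As a sketch of the same fallback order written as a flat loop, the example below uses a stand-in `NotFound` error and a caller-supplied block in place of the GitHub API call. Both are assumptions of this example, and the Base64 decoding step is omitted.

```ruby
# Candidate file names, in the order the diff tries them.
README_CANDIDATES = ["README.md", "README.markdown", "readme.md"].freeze

# Stand-in for Github::Error::NotFound (assumption for this sketch).
class NotFound < StandardError; end

# Hypothetical helper: returns the content of the first candidate the block
# can fetch, or nil when every candidate raises NotFound. Any other error
# propagates to the caller, so a retry wrapper could handle it.
def first_readme(candidates = README_CANDIDATES, &fetch)
  candidates.each do |path|
    begin
      return fetch.call(path)
    rescue NotFound
      next
    end
  end
  nil
end

# Usage with an in-memory "repository" that only has a lowercase readme.
files = { "readme.md" => "# hello" }
content = first_readme { |path| files.fetch(path) { raise NotFound, path } }
puts content
```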
data/lib/star/dlp/version.rb
CHANGED