star-dlp 0.1.0 → 0.1.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/Gemfile.lock +1 -1
- data/README.md +42 -2
- data/README_zh.md +42 -2
- data/lib/star/dlp/cli.rb +40 -1
- data/lib/star/dlp/downloader.rb +486 -91
- data/lib/star/dlp/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 49aecf46afd8779a951f317d8412ae41d157bcb50d6df163db0eec69f556881b
+  data.tar.gz: 4f3b4809beb3fddc5508f2f2e55cd62012829ff6f6b142b28034eee4e2aaedc0
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 824c986da6d7c0e30f058bec67b254d289024da4f7effdfbf1975af7d2a5414671e551d561df110331d5cdcff2a3c8f3427028600ade35d86fe3958264df370a
+  data.tar.gz: 18278facb4fd629b173af6f78f9a3c0b10c8ec2cd035590031f4a7fbe30d9a0d016d39fc2c6c2cfc3d285a5b52ac2cd41f6ea250a65a5e0e2589e3b54e213c1a
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -48,6 +48,44 @@ $ star-dlp download your_github_username

 This will download all your starred repositories and save them as JSON and Markdown files. If you've previously downloaded some repositories, it will only download newly starred repositories.

+Available options:
+- `--token`: GitHub API token
+- `--output_dir`: Output directory
+- `--json_dir`: JSON files directory
+- `--markdown_dir`: Markdown files directory
+- `--threads`: Number of download threads (default: 16)
+- `--skip_readme`: Skip downloading README files
+- `--retry_count`: Number of retry attempts for failed downloads (default: 5)
+- `--retry_delay`: Delay in seconds between retry attempts (default: 1)
+
+Example with options:
+
+```bash
+$ star-dlp download your_github_username --threads=8 --skip_readme --retry_count=3
+```
+
+### Downloading READMEs
+
+If you've already downloaded your starred repositories but want to download or update their README files separately:
+
+```bash
+$ star-dlp download_readme
+```
+
+This command will scan your JSON files directory, extract repository information, and download README files for repositories that don't already have them.
+
+Available options:
+- `--threads`: Number of download threads (default: 16)
+- `--retry_count`: Number of retry attempts for failed downloads (default: 5)
+- `--retry_delay`: Delay in seconds between retry attempts (default: 1)
+- `--force`: Force download even if README was already downloaded
+
+Example with options:
+
+```bash
+$ star-dlp download_readme --threads=8 --force
+```
+
 ### View Version

 ```bash
@@ -60,8 +98,10 @@ Star-DLP saves files in the following locations:

 - Configuration file: `~/.star-dlp/config.json`
 - Starred repositories: `~/.star-dlp/stars/`
-- JSON files: `~/.star-dlp/stars/json
-- Markdown files: `~/.star-dlp/stars/markdown
+- JSON files: `~/.star-dlp/stars/json/YYYY/MM/YYYYMMDD.owner.repo.json`
+- Markdown files: `~/.star-dlp/stars/markdown/YYYY/MM/YYYYMMDD.owner.repo.md`
+- Last downloaded repository: `~/.star-dlp/stars/last_downloaded_repo.txt`
+- Downloaded READMEs list: `~/.star-dlp/stars/downloaded_readmes.txt`

 ## Development

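The new options complement each other: a bulk `download` can skip READMEs to finish quickly, and `download_readme` can backfill them later from the saved JSON files. An illustrative two-pass workflow using only the flags documented in the README diff above:

```bash
# First pass: fetch all starred repos quickly, without READMEs
$ star-dlp download your_github_username --threads=8 --skip_readme

# Second pass: backfill READMEs from the saved JSON files, retrying failures
$ star-dlp download_readme --threads=8 --retry_count=3
```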
data/README_zh.md
CHANGED
@@ -48,6 +48,44 @@ $ star-dlp download your_github_username

 This will download all of your starred repositories and save them as JSON and Markdown files. If you have previously downloaded some repositories, only newly starred repositories will be downloaded.

+Available options:
+- `--token`: GitHub API token
+- `--output_dir`: Output directory
+- `--json_dir`: JSON files directory
+- `--markdown_dir`: Markdown files directory
+- `--threads`: Number of download threads (default: 16)
+- `--skip_readme`: Skip downloading README files
+- `--retry_count`: Number of retries for failed downloads (default: 5)
+- `--retry_delay`: Delay in seconds between retries (default: 1)
+
+Example with options:
+
+```bash
+$ star-dlp download your_github_username --threads=8 --skip_readme --retry_count=3
+```
+
+### Downloading README files
+
+If you have already downloaded your starred repositories but want to download or update their README files separately:
+
+```bash
+$ star-dlp download_readme
+```
+
+This command scans your JSON files directory, extracts repository information, and downloads README files for repositories whose READMEs have not yet been downloaded.
+
+Available options:
+- `--threads`: Number of download threads (default: 16)
+- `--retry_count`: Number of retries for failed downloads (default: 5)
+- `--retry_delay`: Delay in seconds between retries (default: 1)
+- `--force`: Force download even if the README has already been downloaded
+
+Example with options:
+
+```bash
+$ star-dlp download_readme --threads=8 --force
+```
+
 ### View Version

 ```bash
@@ -60,8 +98,10 @@ Star-DLP saves files in the following locations:

 - Configuration file: `~/.star-dlp/config.json`
 - Starred repositories: `~/.star-dlp/stars/`
-- JSON files: `~/.star-dlp/stars/json
-- Markdown files: `~/.star-dlp/stars/markdown
+- JSON files: `~/.star-dlp/stars/json/YYYY/MM/YYYYMMDD.owner.repo.json`
+- Markdown files: `~/.star-dlp/stars/markdown/YYYY/MM/YYYYMMDD.owner.repo.md`
+- Last downloaded repository: `~/.star-dlp/stars/last_downloaded_repo.txt`
+- Downloaded READMEs list: `~/.star-dlp/stars/downloaded_readmes.txt`

 ## Development

data/lib/star/dlp/cli.rb
CHANGED
@@ -1,6 +1,9 @@
 # frozen_string_literal: true

 require "thor"
+require "fileutils"
+require "json"
+require "time"
 require_relative "config"
 require_relative "downloader"

@@ -12,6 +15,10 @@ module Star
       option :output_dir, type: :string, desc: "Output directory for stars"
       option :json_dir, type: :string, desc: "Directory for JSON files"
       option :markdown_dir, type: :string, desc: "Directory for Markdown files"
+      option :threads, type: :numeric, default: 16, desc: "Number of download threads"
+      option :skip_readme, type: :boolean, default: false, desc: "Skip downloading README files"
+      option :retry_count, type: :numeric, default: 5, desc: "Number of retry attempts for failed downloads"
+      option :retry_delay, type: :numeric, default: 1, desc: "Delay in seconds between retry attempts"
       def download(username)
         config = Config.load

@@ -24,10 +31,42 @@ module Star
         # Save config for future use
         config.save

-        downloader = Downloader.new(
+        downloader = Downloader.new(
+          config,
+          username,
+          thread_count: options[:threads],
+          skip_readme: options[:skip_readme],
+          retry_count: options[:retry_count],
+          retry_delay: options[:retry_delay]
+        )
         downloader.download
       end

+      desc "download_readme", "Download READMEs for all repositories from JSON files"
+      option :threads, type: :numeric, default: 16, desc: "Number of download threads"
+      option :retry_count, type: :numeric, default: 5, desc: "Number of retry attempts for failed downloads"
+      option :retry_delay, type: :numeric, default: 1, desc: "Delay in seconds between retry attempts"
+      option :force, type: :boolean, default: false, desc: "Force download even if README was already downloaded"
+      def download_readme
+        config = Config.load
+
+        # Create a downloader instance
+        downloader = Downloader.new(
+          config,
+          "readme_downloader", # Placeholder username
+          thread_count: options[:threads],
+          retry_count: options[:retry_count],
+          retry_delay: options[:retry_delay]
+        )
+
+        # Call the download_readmes method in the Downloader class
+        result = downloader.download_readmes(force: options[:force])
+
+        puts "README download completed!"
+        puts "Successfully downloaded: #{result[:success]}"
+        puts "Failed or not found: #{result[:failed]}"
+      end
+
       desc "config", "Configure star-dlp"
       option :token, type: :string, desc: "GitHub API token"
       option :output_dir, type: :string, desc: "Output directory for stars"
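Since `download_readme` is registered as a Thor task with its flags declared via `option`, the standard Thor help output should list them. A quick way to check (illustrative, assuming the `star-dlp` executable shown in the README):

```bash
$ star-dlp help download_readme   # should list --threads, --retry_count, --retry_delay and --force
```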
data/lib/star/dlp/downloader.rb
CHANGED
@@ -2,9 +2,12 @@

 require "github_api"
 require "json"
+require "tempfile"
 require "fileutils"
 require "time"
 require "base64"
+require "thread"
+require "open3"

 module Star
   module Dlp
@@ -12,10 +15,45 @@ module Star
       attr_reader :config, :github, :username

       LAST_REPO_FILE = "last_downloaded_repo.txt"
+      DOWNLOADED_READMES_FILE = "downloaded_readmes.txt"
+      DEFAULT_THREAD_COUNT = 16
+      DEFAULT_RETRY_COUNT = 5
+      DEFAULT_RETRY_DELAY = 1 # seconds

-
+      # Supported README formats in order of preference
+      README_FORMATS = [
+        "README.md",
+        "README.markdown",
+        "readme.md",
+        "README.org",
+        "README.rst",
+        "README.txt",
+        "README.rdoc",
+        "README.adoc",
+        "README",
+        "readme.org",
+        "readme.rst",
+        "readme.txt",
+        "readme.rdoc",
+        "readme.adoc",
+        "readme"
+      ]
+
+      # Formats that need conversion to markdown
+      FORMATS_NEEDING_CONVERSION = {
+        ".org" => "org",
+        ".rst" => "rst",
+        ".txt" => "txt",
+        "" => "txt" # For files without extension
+      }
+
+      def initialize(config, username, thread_count: DEFAULT_THREAD_COUNT, skip_readme: false, retry_count: DEFAULT_RETRY_COUNT, retry_delay: DEFAULT_RETRY_DELAY)
         @config = config
         @username = username
+        @thread_count = thread_count
+        @skip_readme = skip_readme
+        @retry_count = retry_count
+        @retry_delay = retry_delay

         # Initialize GitHub API client with the special Accept header for starred_at field
         options = {
@@ -98,14 +136,19 @@

         puts "Found #{new_stars.size} new starred repositories to download"

-        # Save new stars
+        # Save new stars using multiple threads
         if new_stars.any?
-          puts "Downloading new repositories:"
-
-
-
-
-
+          puts "Downloading new repositories using #{@thread_count} threads:"
+
+          # Process stars with multithreading
+          process_items_with_threads(
+            new_stars,
+            ->(star) { get_repo_full_name(star) },
+            ->(star) {
+              save_star_as_json(star)
+              save_star_as_markdown(star)
+            }
+          )

           puts "Download completed successfully!"
         else
@@ -119,8 +162,421 @@
         end
       end

+      # Download READMEs for all repositories from JSON files
+      def download_readmes(force: false)
+        puts "Downloading READMEs for repositories from JSON files"
+
+        # File to track repositories with downloaded READMEs
+        downloaded_readmes_file = File.join(config.output_dir, DOWNLOADED_READMES_FILE)
+
+        # Load list of repositories with already downloaded READMEs
+        downloaded_repos = Set.new
+        if File.exist?(downloaded_readmes_file) && !force
+          File.readlines(downloaded_readmes_file).each do |line|
+            downloaded_repos.add(line.strip)
+          end
+          puts "Found #{downloaded_repos.size} repositories with already downloaded READMEs"
+        end
+
+        # Find all JSON files in the json directory
+        json_files = Dir.glob(File.join(config.json_dir, "**", "*.json"))
+        puts "Found #{json_files.size} JSON files"
+
+        # Extract repository names from JSON files
+        repos_to_process = []
+        repo_dates = {} # Store starred_at dates for repositories
+
+        json_files.each do |json_file|
+          begin
+            data = JSON.parse(File.read(json_file))
+
+            # Extract repository full name from JSON data
+            repo_full_name = nil
+            starred_at = nil
+
+            if data.is_a?(Hash) && data["repo"] && data["repo"]["full_name"]
+              repo_full_name = data["repo"]["full_name"]
+              starred_at = data["starred_at"] if data.key?("starred_at")
+            elsif data.is_a?(Hash) && data["full_name"]
+              repo_full_name = data["full_name"]
+              starred_at = data["starred_at"] if data.key?("starred_at")
+            elsif File.basename(json_file) =~ /(\d{8})\.(.+)\.json$/
+              # Try to extract from filename (format: YYYYMMDD.owner.repo.json)
+              date_str = $1
+              parts = $2.split('.')
+              if parts.size >= 2
+                repo_full_name = "#{parts[0]}/#{parts[1]}"
+                # Convert YYYYMMDD to ISO date format
+                if date_str =~ /^(\d{4})(\d{2})(\d{2})$/
+                  starred_at = "#{$1}-#{$2}-#{$3}T00:00:00Z"
+                end
+              end
+            end
+
+            # Skip if we couldn't determine the repository name or if README was already downloaded
+            next if repo_full_name.nil?
+            next if downloaded_repos.include?(repo_full_name) && !force
+
+            repos_to_process << repo_full_name
+            # Store the starred_at date if available
+            repo_dates[repo_full_name] = starred_at if starred_at
+          rescue JSON::ParserError => e
+            puts "Error parsing JSON file #{json_file}: #{e.message}"
+          end
+        end
+
+        puts "Found #{repos_to_process.size} repositories that need README downloads"
+
+        # Create a mutex for thread-safe file writing
+        mutex = Mutex.new
+        success_count = 0
+        failed_count = 0
+
+        # Process repositories with multithreading
+        result = process_items_with_threads(
+          repos_to_process,
+          ->(repo) { repo }, # Item name is the repo name itself
+          ->(repo_full_name) {
+            # Try to download README
+            readme_result = fetch_readme(repo_full_name)
+
+            if readme_result && readme_result[:content]
+              # Get starred_at date if available, or use current date as fallback
+              date = nil
+              if repo_dates.key?(repo_full_name) && repo_dates[repo_full_name]
+                begin
+                  date = Time.parse(repo_dates[repo_full_name])
+                rescue
+                  date = Time.now
+                end
+              else
+                date = Time.now
+              end
+
+              # Create markdown file path
+              md_filepath = get_markdown_filepath(repo_full_name, date)
+
+              mutex.synchronize do
+                # Check if file exists
+                if File.exist?(md_filepath)
+                  # Append README content to existing file
+                  File.open(md_filepath, 'a') do |file|
+                    file.puts "\n\n## README"
+                    file.puts "\n*Format: #{readme_result[:format]}*\n" if readme_result[:format] != "markdown"
+                    file.puts "\n#{readme_result[:content]}\n"
+                  end
+                else
+                  # Create new file with repository information and README
+                  content = <<~MARKDOWN
+                    # #{repo_full_name}
+
+                    - **Downloaded at**: #{Time.now.iso8601}
+                    - **Starred at**: #{date.iso8601}
+
+                    [View on GitHub](https://github.com/#{repo_full_name})
+
+                    ## README
+                  MARKDOWN
+
+                  # Add format note if not markdown
+                  content += "\n*Format: #{readme_result[:format]}*\n" if readme_result[:format] != "markdown"
+
+                  # Add README content
+                  content += "\n#{readme_result[:content]}\n"
+
+                  File.write(md_filepath, content)
+                end
+
+                # Add to downloaded repositories list
+                File.open(downloaded_readmes_file, 'a') do |file|
+                  file.puts repo_full_name
+                end
+
+                success_count += 1
+              end
+
+              true
+            else
+              mutex.synchronize do
+                puts "No README found for #{repo_full_name}"
+                failed_count += 1
+              end
+              true # Mark as success even if README not found to avoid retries
+            end
+          }
+        )
+
+        puts "README download completed!"
+        puts "Successfully downloaded: #{success_count}"
+        puts "Failed or not found: #{failed_count}"
+
+        return {
+          total: repos_to_process.size,
+          success: success_count,
+          failed: failed_count
+        }
+      end
+
+      # Fetch README content from GitHub
+      # Returns a hash with :content and :format keys, or nil if not found
+      def fetch_readme(repo_full_name)
+        # Try each README format in order
+        README_FORMATS.each do |readme_path|
+          begin
+            # Get README content using GitHub API
+            response = github.repos.contents.get(
+              user: repo_full_name.split('/').first,
+              repo: repo_full_name.split('/').last,
+              path: readme_path
+            )
+
+            # Decode content from Base64
+            if response.content && response.encoding == 'base64'
+              content = Base64.decode64(response.content).force_encoding('UTF-8')
+
+              # Get file extension
+              ext = File.extname(readme_path).downcase
+
+              # Check if we need to convert the content
+              if FORMATS_NEEDING_CONVERSION.key?(ext)
+                format = FORMATS_NEEDING_CONVERSION[ext]
+                puts "Converting #{readme_path} from #{format} to markdown for #{repo_full_name}"
+
+                # Create a temporary file with the content
+                temp_file = Tempfile.new(['readme', ".#{format}"])
+                begin
+                  temp_file.write(content)
+                  temp_file.close
+
+                  # Use pandoc to convert to markdown
+                  markdown_content, status = convert_to_markdown(temp_file.path, format)
+
+                  if status.success?
+                    return { content: markdown_content, format: format }
+                  else
+                    puts "Pandoc conversion failed for #{repo_full_name}, using original content"
+                    return { content: content, format: format }
+                  end
+                ensure
+                  temp_file.unlink
+                end
+              else
+                # Already markdown, no conversion needed
+                return { content: content, format: "markdown" }
+              end
+            end
+          rescue Github::Error::NotFound
+            # Try next format
+            next
+          rescue => e
+            puts "Error fetching #{readme_path} for #{repo_full_name}: #{e.message}"
+            next
+          end
+        end
+
+        # No README found in predefined formats, check for any readme-like file in the root directory
+        begin
+          # Get repository contents
+          contents = github.repos.contents.get(
+            user: repo_full_name.split('/').first,
+            repo: repo_full_name.split('/').last,
+            path: "" # Root directory
+          )
+
+          # Look for any file with name matching /readme/i
+          readme_file = contents.find { |item| item.type == "file" && item.name =~ /readme/i }
+
+          if readme_file
+            puts "Found alternative README file: #{readme_file.name} for #{repo_full_name}"
+
+            # Get README content
+            readme_content = github.repos.contents.get(
+              user: repo_full_name.split('/').first,
+              repo: repo_full_name.split('/').last,
+              path: readme_file.name
+            )
+
+            # Decode content from Base64
+            if readme_content.content && readme_content.encoding == 'base64'
+              content = Base64.decode64(readme_content.content).force_encoding('UTF-8')
+
+              # Get file extension
+              ext = File.extname(readme_file.name).downcase
+
+              # Check if we need to convert the content
+              if FORMATS_NEEDING_CONVERSION.key?(ext)
+                format = FORMATS_NEEDING_CONVERSION[ext]
+                puts "Converting #{readme_file.name} from #{format} to markdown for #{repo_full_name}"
+
+                # Create a temporary file with the content
+                temp_file = Tempfile.new(['readme', ".#{format}"])
+                begin
+                  temp_file.write(content)
+                  temp_file.close
+
+                  # Use pandoc to convert to markdown
+                  markdown_content, status = convert_to_markdown(temp_file.path, format)
+
+                  if status.success?
+                    return { content: markdown_content, format: format }
+                  else
+                    puts "Pandoc conversion failed for #{repo_full_name}, using original content"
+                    return { content: content, format: format }
+                  end
+                ensure
+                  temp_file.unlink
+                end
+              else
+                # Determine format based on extension or default to txt
+                format = ext.empty? ? "txt" : ext[1..]
+                # Use markdown format if extension suggests it's already markdown
+                format = "markdown" if [".md", ".markdown"].include?(ext)
+
+                return { content: content, format: format }
+              end
+            end
+          end
+        rescue => e
+          puts "Error checking root directory for README-like files for #{repo_full_name}: #{e.message}"
+        end
+
+        # No README found in any format
+        nil
+      end
+
+      # Convert content from a given format to markdown using pandoc
+      def convert_to_markdown(file_path, format)
+        begin
+          # Check if pandoc is installed
+          version_output, status = Open3.capture2e("pandoc --version")
+          unless status.success?
+            puts "Warning: pandoc is not installed or not in PATH. Cannot convert non-markdown formats."
+            return [File.read(file_path), status]
+          end
+
+          # Use pandoc to convert to markdown
+          output, status = Open3.capture2e("pandoc", "-f", format, "-t", "markdown", file_path)
+
+          if status.success?
+            return [output, status]
+          else
+            puts "Pandoc conversion failed: #{output}"
+            return [File.read(file_path), status]
+          end
+        rescue => e
+          puts "Error during conversion: #{e.message}"
+          return [File.read(file_path), OpenStruct.new(success?: false)]
+        end
+      end
+
       private

+      # Process a list of items using multiple threads
+      # items: Array of items to process
+      # name_proc: Proc to get item name for logging
+      # process_proc: Proc to process each item
+      def process_items_with_threads(items, name_proc, process_proc)
+        return if items.empty?
+
+        # Create a thread-safe queue for the items
+        queue = Queue.new
+        items.each { |item| queue << item }
+
+        # Create a mutex for thread-safe output
+        mutex = Mutex.new
+
+        # Create a progress counter
+        total = items.size
+        completed = 0
+
+        # Create and start the worker threads
+        threads = Array.new(@thread_count) do
+          Thread.new do
+            until queue.empty?
+              # Try to get an item from the queue (non-blocking)
+              item = queue.pop(true) rescue nil
+              break unless item
+
+              # Get the item name for logging
+              item_name = name_proc.call(item)
+
+              # Process the item with retry mechanism
+              success = false
+              retry_count = 0
+
+              until success || retry_count >= @retry_count
+                begin
+                  # Process the item
+                  process_proc.call(item)
+                  success = true
+                rescue => e
+                  retry_count += 1
+
+                  # Log the error and retry information
+                  mutex.synchronize do
+                    puts "  Error processing #{item_name}: #{e.message}"
+                    if retry_count < @retry_count
+                      puts "  Retrying in #{@retry_delay} seconds (attempt #{retry_count + 1}/#{@retry_count})..."
+                    else
+                      puts "  Failed to process after #{@retry_count} attempts."
+                    end
+                  end
+
+                  # Wait before retrying
+                  sleep(@retry_delay)
+                end
+              end
+
+              # Update progress
+              mutex.synchronize do
+                completed += 1
+                puts "  [#{completed}/#{total}] Processed: #{item_name} (#{(completed.to_f / total * 100).round(1)}%)"
+              end
+            end
+          end
+        end
+
+        # Wait for all threads to complete
+        threads.each(&:join)
+
+        return {
+          total: total,
+          completed: completed
+        }
+      end
+
+      # Get the markdown file path for a repository
+      def get_markdown_filepath(repo_full_name, date = Time.now)
+        # Create directory structure based on date: markdown/YYYY/MM/
+        year_dir = date.strftime("%Y")
+        month_dir = date.strftime("%m")
+        target_dir = File.join(config.markdown_dir, year_dir, month_dir)
+        FileUtils.mkdir_p(target_dir) unless Dir.exist?(target_dir)
+
+        # Format filename: YYYYMMDD.repo_owner.repo_name.md
+        date_str = date.strftime("%Y%m%d")
+        repo_name = repo_full_name.gsub('/', '.')
+        filename = "#{date_str}.#{repo_name}.md"
+
+        File.join(target_dir, filename)
+      end
+
+      # Get the JSON file path for a repository
+      def get_json_filepath(repo_full_name, date = Time.now)
+        # Create directory structure based on date: json/YYYY/MM/
+        year_dir = date.strftime("%Y")
+        month_dir = date.strftime("%m")
+        target_dir = File.join(config.json_dir, year_dir, month_dir)
+        FileUtils.mkdir_p(target_dir) unless Dir.exist?(target_dir)
+
+        # Format filename: YYYYMMDD.repo_owner.repo_name.json
+        date_str = date.strftime("%Y%m%d")
+        repo_name = repo_full_name.gsub('/', '.')
+        filename = "#{date_str}.#{repo_name}.json"
+
+        File.join(target_dir, filename)
+      end
+
       def get_last_repo_name
         last_repo_file = File.join(config.output_dir, LAST_REPO_FILE)
         return nil unless File.exist?(last_repo_file)
@@ -133,25 +589,19 @@
         File.write(last_repo_file, repo_name)
       end

-
       def save_star_as_json(star)
         star_data = star.to_hash

         # Get starred_at date or use current date as fallback
         starred_at = star.respond_to?(:starred_at) ? Time.parse(star.starred_at) : Time.now

-        #
-
-        month_dir = starred_at.strftime("%m")
-        target_dir = File.join(config.json_dir, year_dir, month_dir)
-        FileUtils.mkdir_p(target_dir) unless Dir.exist?(target_dir)
+        # Get the repository name
+        repo_full_name = get_repo_full_name(star)

-        #
-
-        repo_name = get_repo_full_name(star).gsub('/', '.')
-        filename = "#{date_str}.#{repo_name}.json"
+        # Get the JSON file path
+        filepath = get_json_filepath(repo_full_name, starred_at)

-
+        # Write the JSON file
         File.write(filepath, JSON.pretty_generate(star_data))
       end

@@ -159,19 +609,14 @@
         # Get starred_at date or use current date as fallback
         starred_at = star.respond_to?(:starred_at) ? Time.parse(star.starred_at) : Time.now

-        #
-        year_dir = starred_at.strftime("%Y")
-        month_dir = starred_at.strftime("%m")
-        target_dir = File.join(config.markdown_dir, year_dir, month_dir)
-        FileUtils.mkdir_p(target_dir) unless Dir.exist?(target_dir)
-
-        # Format filename: YYYYMMDD.username.repo_name.md
-        date_str = starred_at.strftime("%Y%m%d")
+        # Get the repository name
         repo_full_name = get_repo_full_name(star)
-        repo_name = repo_full_name.gsub('/', '.')
-        filename = "#{date_str}.#{repo_name}.md"

-
+        # Get the markdown file path
+        filepath = get_markdown_filepath(repo_full_name, starred_at)
+
+        # Skip if file already exists
+        return if File.exist?(filepath)

         # Include starred_at in the markdown
         starred_at_str = star.respond_to?(:starred_at) ? star.starred_at : "N/A"
@@ -196,10 +641,17 @@
           #{(get_topics(star) || []).map { |topic| "- #{topic}" }.join("\n")}
         MARKDOWN

-        # Try to fetch README.md content
-
-
-
+        # Try to fetch README.md content if not skipped
+        unless @skip_readme
+          readme_result = fetch_readme(repo_full_name)
+          if readme_result && readme_result[:content]
+            content += "\n\n## README"
+            # Add format note if not markdown
+            content += "\n*Format: #{readme_result[:format]}*\n" if readme_result[:format] != "markdown"
+            content += "\n#{readme_result[:content]}\n"
+          else
+            content += "\n\n## Description\n\n#{get_description(star)}\n"
+          end
         else
           content += "\n\n## Description\n\n#{get_description(star)}\n"
         end
@@ -297,63 +749,6 @@
           []
         end
       end
-
-      # Fetch README.md content from GitHub
-      def fetch_readme(repo_full_name)
-        begin
-          # Get README content using GitHub API
-          response = github.repos.contents.get(
-            user: repo_full_name.split('/').first,
-            repo: repo_full_name.split('/').last,
-            path: 'README.md'
-          )
-
-          # Decode content from Base64
-          if response.content && response.encoding == 'base64'
-            return Base64.decode64(response.content).force_encoding('UTF-8')
-          end
-        rescue Github::Error::NotFound
-          # Try README.markdown if README.md not found
-          begin
-            response = github.repos.contents.get(
-              user: repo_full_name.split('/').first,
-              repo: repo_full_name.split('/').last,
-              path: 'README.markdown'
-            )
-
-            if response.content && response.encoding == 'base64'
-              return Base64.decode64(response.content).force_encoding('UTF-8')
-            end
-          rescue Github::Error::NotFound
-            # Try readme.md (lowercase) if previous attempts failed
-            begin
-              response = github.repos.contents.get(
-                user: repo_full_name.split('/').first,
-                repo: repo_full_name.split('/').last,
-                path: 'readme.md'
-              )
-
-              if response.content && response.encoding == 'base64'
-                return Base64.decode64(response.content).force_encoding('UTF-8')
-              end
-            rescue Github::Error::NotFound
-              # README not found
-              return nil
-            rescue => e
-              puts "Error fetching lowercase readme.md for #{repo_full_name}: #{e.message}"
-              return nil
-            end
-          rescue => e
-            puts "Error fetching README.markdown for #{repo_full_name}: #{e.message}"
-            return nil
-          end
-        rescue => e
-          puts "Error fetching README.md for #{repo_full_name}: #{e.message}"
-          return nil
-        end
-
-        nil
-      end
     end
   end
 end
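One practical note on the new `convert_to_markdown` helper: it shells out to pandoc via `Open3.capture2e`, so `.org`, `.rst`, `.txt`, and extension-less READMEs are converted only when pandoc is on the PATH; otherwise the original text is kept and tagged with a format note. A quick pre-flight check (illustrative):

```bash
$ pandoc --version   # if this fails, star-dlp falls back to the raw README text
```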
data/lib/star/dlp/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: star-dlp
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.1.2
 platform: ruby
 authors:
 - Liu Xiang
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2025-03-
+date: 2025-03-18 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: github_api