youtube-transcript-rb 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: a3a1c99bcacf517440c8be67c4e72f29406c2a7cee87cb844317a5693d5f1aea
+   data.tar.gz: d19bc462f35d6d50dd13c452b3be468c5efb57bf5cdb894c22ad99be409485da
+ SHA512:
+   metadata.gz: b87fab280855a4f3f3b22786789085349492d42dd8fff3b14284b762c656757c8bd07fcfbc084ed551bd489c60ab175e4e57216edcd93c27345cfbeac53507f8
+   data.tar.gz: bba30d381a9a685e8c6f1bfeae9fdb40a81a758142a3c0c1659cc1de182266bcb6d85cff575ffade9559c945c9ae23e4c829e10821955605974ee12d499e61e3
data/.rspec ADDED
@@ -0,0 +1 @@
+ --require spec_helper
@@ -0,0 +1 @@
+ /cache
@@ -0,0 +1,35 @@
+ # Code Style and Conventions
+
+ ## Ruby Style
+ - All Ruby files start with `# frozen_string_literal: true`
+ - Uses standard Ruby 3.2+ features
+ - Module nesting: `Youtube::Transcript::Rb`
+
+ ## Naming Conventions
+ - Classes: PascalCase (e.g., `YouTubeTranscriptApi`, `FetchedTranscript`)
+ - Methods: snake_case (e.g., `find_transcript`, `find_generated_transcript`)
+ - Constants: SCREAMING_SNAKE_CASE
+ - Files: snake_case matching class names
+
+ ## Documentation
+ - Use YARD-style documentation comments
+ - `@param` for parameters
+ - `@return` for return values
+ - `@raise` for exceptions
+
+ ## Testing
+ - RSpec for tests
+ - WebMock for HTTP stubbing
+ - Test files in `spec/` directory
+ - Naming: `*_spec.rb`
+
+ ## Error Handling
+ - Custom exception classes inheriting from base `Error` class
+ - Exception hierarchy matching Python library structure:
+ - `PoTokenRequired`
+ - `TranscriptsDisabled`
+ - `NoTranscriptFound`
+ - `NoTranscriptAvailable`
+ - `VideoUnavailable`
+ - `TranslationLanguageNotAvailable`
+ - `TooManyRequests`
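The conventions in the hunk above (frozen string literals, `Youtube::Transcript::Rb` nesting, YARD comments, and custom exceptions under a base `Error` class) could fit together roughly as follows. This is a minimal sketch, not the gem's code: only the class and module names come from the notes, while the `video_id` attribute, the message text, and the `describe_failure` helper are assumptions made for illustration.

```ruby
# frozen_string_literal: true

# Hypothetical sketch: only the class and module names come from the notes above.
module Youtube
  module Transcript
    module Rb
      # Base class for all errors raised by the gem.
      class Error < StandardError; end

      # Raised when transcripts are disabled for a video.
      class TranscriptsDisabled < Error
        # @return [String] the video ID (assumed attribute)
        attr_reader :video_id

        # @param video_id [String] the YouTube video ID
        def initialize(video_id)
          @video_id = video_id
          super("Transcripts are disabled for video #{video_id}")
        end
      end

      class NoTranscriptFound < Error; end
      class VideoUnavailable < Error; end
    end
  end
end

# Illustrative helper: rescuing the base class catches every subclass.
#
# @return [String] "ok", or a short description of the error that was raised
def describe_failure
  yield
  "ok"
rescue Youtube::Transcript::Rb::Error => e
  "#{e.class.name.split("::").last}: #{e.message}"
end
```

Because every exception shares one base class, callers can rescue `Youtube::Transcript::Rb::Error` once instead of listing each subclass.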
@@ -0,0 +1,40 @@
+ # YouTube Transcript Ruby - Project Overview
+
+ ## Purpose
+ This gem is a Ruby port of the Python library [youtube-transcript-api](https://github.com/jdepoix/youtube-transcript-api).
+ It retrieves transcripts/subtitles from YouTube videos without requiring an API key or headless browser.
+
+ ## Python Library Features to Port
+ Based on the original Python library:
+ - Fetch transcripts for YouTube videos
+ - Support for automatically generated and manually created captions
+ - Language preference selection
+ - Translation support for translatable transcripts
+ - Multiple output formatters (JSON, WebVTT, SRT, plain text, PrettyPrint)
+ - HTML formatting preservation option
+ - Proxy support (Webshare, Generic HTTP/HTTPS/SOCKS)
+ - Error handling for various YouTube states
+
+ ## Current Implementation Status
+ The gem has a basic skeleton but most implementation files are missing:
+ - ✅ `lib/youtube/transcript/rb.rb` - Main entry point with convenience methods
+ - ✅ `lib/youtube/transcript/rb/version.rb` - Version file (0.1.0)
+ - ❌ `lib/youtube/transcript/rb/errors.rb` - Not created yet
+ - ❌ `lib/youtube/transcript/rb/transcript.rb` - Not created yet
+ - ❌ `lib/youtube/transcript/rb/transcript_list.rb` - Not created yet
+ - ❌ `lib/youtube/transcript/rb/transcript_list_fetcher.rb` - Not created yet
+ - ❌ `lib/youtube/transcript/rb/api.rb` - Not created yet
+ - ❌ `lib/youtube/transcript/rb/formatters.rb` - Not created yet
+
+ ## Key Classes Expected
+ - `YouTubeTranscriptApi` - Main API class with `fetch` and `list` methods
+ - `FetchedTranscript` - Represents fetched transcript data with snippets
+ - `TranscriptSnippet` / `FetchedTranscriptSnippet` - Individual transcript segments
+ - `TranscriptList` - List of available transcripts for a video
+ - `Transcript` - Metadata about a transcript (language, is_generated, etc.)
+ - Formatters: `TextFormatter`, `JSONFormatter`, `WebVTTFormatter`, `SRTFormatter`, `PrettyPrintFormatter`
+
+ ## Module Namespace
+ ```ruby
+ Youtube::Transcript::Rb
+ ```
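To make the relationship between transcript snippets and the planned formatters more concrete, here is a rough sketch of how an SRT formatter might turn snippets into caption blocks. Everything in it is hypothetical: the `Snippet` struct, its `text`/`start`/`duration` fields, the `SketchSRTFormatter` class, and the `format_transcript` method name are assumptions for illustration, since the overview above notes that `formatters.rb` has not been created yet.

```ruby
# frozen_string_literal: true

# Hypothetical stand-in for a transcript segment; the field names are assumed.
Snippet = Struct.new(:text, :start, :duration, keyword_init: true)

# Minimal SRT formatter sketch (not the gem's actual implementation).
class SketchSRTFormatter
  # @param snippets [Array<Snippet>] segments with start and duration in seconds
  # @return [String] SRT-formatted captions
  def format_transcript(snippets)
    snippets.each_with_index.map do |snippet, index|
      finish = snippet.start + snippet.duration
      "#{index + 1}\n#{timestamp(snippet.start)} --> #{timestamp(finish)}\n#{snippet.text}"
    end.join("\n\n")
  end

  private

  # Converts seconds into the HH:MM:SS,mmm form that SRT uses.
  #
  # @param seconds [Numeric]
  # @return [String]
  def timestamp(seconds)
    total_ms = (seconds * 1000).round
    hours, rest = total_ms.divmod(3_600_000)
    minutes, rest = rest.divmod(60_000)
    secs, millis = rest.divmod(1000)
    format("%02d:%02d:%02d,%03d", hours, minutes, secs, millis)
  end
end
```

Keeping the formatter separate from the snippet data means the other output formats (JSON, WebVTT, plain text) could share the same input without changing the fetch logic.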
@@ -0,0 +1,50 @@
+ # Suggested Commands
+
+ ## Setup
+ ```bash
+ # Install dependencies
+ bin/setup
+ # or
+ bundle install
+ ```
+
+ ## Testing
+ ```bash
+ # Run all tests
+ bundle exec rspec
+
+ # Run specific test file
+ bundle exec rspec spec/youtube_transcript_api_spec.rb
+
+ # Run with verbose output
+ bundle exec rspec --format documentation
+ ```
+
+ ## Interactive Console
+ ```bash
+ bin/console
+ ```
+
+ ## Build & Install Gem
+ ```bash
+ # Build the gem
+ bundle exec rake build
+
+ # Install locally
+ bundle exec rake install
+ ```
+
+ ## Git Commands (macOS/Darwin)
+ ```bash
+ git status
+ git add .
+ git commit -m "message"
+ git push
+ ```
+
+ ## Utility Commands
+ ```bash
+ ls -la # List files
+ find . -name "*.rb" # Find Ruby files
+ grep -r "pattern" . # Search in files
+ ```
@@ -0,0 +1,25 @@
+ # Task Completion Checklist
+
+ When completing a task on this project, ensure:
+
+ ## Before Committing
+ 1. Run tests: `bundle exec rspec`
+ 2. Ensure all tests pass
+ 3. Check for syntax errors by loading the gem: `bundle exec ruby -e "require 'youtube/transcript/rb'"`
+
+ ## Code Quality
+ - Add `# frozen_string_literal: true` to all new Ruby files
+ - Follow existing naming conventions
+ - Add YARD documentation for public methods
+ - Handle errors appropriately with custom exception classes
+
+ ## Testing
+ - Add/update specs for new functionality
+ - Use WebMock to stub HTTP requests
+ - Test both success and error cases
+
+ ## Python Library Alignment
+ When porting features, refer to the original Python library:
+ https://github.com/jdepoix/youtube-transcript-api
+
+ Ensure API compatibility where possible for a similar developer experience.
@@ -0,0 +1,20 @@
+ # Tech Stack
+
+ ## Language
+ - Ruby (>= 3.2.0)
+
+ ## Dependencies (Runtime)
+ - `faraday` (~> 2.0) - HTTP client library
+ - `faraday-follow_redirects` (~> 0.3) - Redirect handling for Faraday
+ - `nokogiri` (~> 1.15) - XML/HTML parsing
+
+ ## Dependencies (Development/Test)
+ - `rake` (~> 13.0) - Task runner
+ - `rspec` (~> 3.0) - Testing framework
+ - `webmock` (~> 3.0) - HTTP request stubbing for tests
+ - `irb` - Interactive Ruby console
+
+ ## Gem Structure
+ - Uses bundler gem conventions
+ - Gem name: `youtube-transcript-rb`
+ - Gemspec: `youtube-transcript-rb.gemspec`
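The version constraints listed in the hunk above would typically be declared in the gemspec. The fragment below is a sketch of how those declarations might look, not the contents of the actual `youtube-transcript-rb.gemspec` (which is not shown in this part of the diff): the gem name, version, Ruby requirement, and dependency constraints come from the notes, while the `SKETCH_SPEC` constant, the summary, and the author are placeholders.

```ruby
# frozen_string_literal: true

require "rubygems"

# Hypothetical gemspec fragment. Only the name, version, Ruby requirement, and
# dependency constraints come from the notes above; the rest is placeholder.
SKETCH_SPEC = Gem::Specification.new do |spec|
  spec.name = "youtube-transcript-rb"
  spec.version = "0.1.0"
  spec.summary = "placeholder summary"
  spec.authors = ["placeholder"]
  spec.required_ruby_version = ">= 3.2.0"

  # Runtime dependencies
  spec.add_dependency "faraday", "~> 2.0"
  spec.add_dependency "faraday-follow_redirects", "~> 0.3"
  spec.add_dependency "nokogiri", "~> 1.15"

  # Development and test dependencies
  spec.add_development_dependency "rake", "~> 13.0"
  spec.add_development_dependency "rspec", "~> 3.0"
  spec.add_development_dependency "webmock", "~> 3.0"
  spec.add_development_dependency "irb"
end
```

The `~>` operator is RubyGems' pessimistic constraint: `~> 2.0` accepts any 2.x release but rejects 3.0, and `~> 1.15` accepts 1.15 and later 1.x releases but rejects 2.0.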
@@ -0,0 +1,84 @@
+ # list of languages for which language servers are started; choose from:
+ # al bash clojure cpp csharp csharp_omnisharp
+ # dart elixir elm erlang fortran go
+ # haskell java julia kotlin lua markdown
+ # nix perl php python python_jedi r
+ # rego ruby ruby_solargraph rust scala swift
+ # terraform typescript typescript_vts yaml zig
+ # Note:
+ # - For C, use cpp
+ # - For JavaScript, use typescript
+ # Special requirements:
+ # - csharp: Requires the presence of a .sln file in the project folder.
+ # When using multiple languages, the first language server that supports a given file will be used for that file.
+ # The first language is the default language and the respective language server will be used as a fallback.
+ # Note that when using the JetBrains backend, language servers are not used and this list is correspondingly ignored.
+ languages:
+ - ruby
+
+ # the encoding used by text files in the project
+ # For a list of possible encodings, see https://docs.python.org/3.11/library/codecs.html#standard-encodings
+ encoding: "utf-8"
+
+ # whether to use the project's gitignore file to ignore files
+ # Added on 2025-04-07
+ ignore_all_files_in_gitignore: true
+
+ # list of additional paths to ignore
+ # same syntax as gitignore, so you can use * and **
+ # Was previously called `ignored_dirs`, please update your config if you are using that.
+ # Added (renamed) on 2025-04-07
+ ignored_paths: []
+
+ # whether the project is in read-only mode
+ # If set to true, all editing tools will be disabled and attempts to use them will result in an error
+ # Added on 2025-04-18
+ read_only: false
+
+ # list of tool names to exclude. We recommend not excluding any tools, see the readme for more details.
+ # Below is the complete list of tools for convenience.
+ # To make sure you have the latest list of tools, and to view their descriptions,
+ # execute `uv run scripts/print_tool_overview.py`.
+ #
+ # * `activate_project`: Activates a project by name.
+ # * `check_onboarding_performed`: Checks whether project onboarding was already performed.
+ # * `create_text_file`: Creates/overwrites a file in the project directory.
+ # * `delete_lines`: Deletes a range of lines within a file.
+ # * `delete_memory`: Deletes a memory from Serena's project-specific memory store.
+ # * `execute_shell_command`: Executes a shell command.
+ # * `find_referencing_code_snippets`: Finds code snippets in which the symbol at the given location is referenced.
+ # * `find_referencing_symbols`: Finds symbols that reference the symbol at the given location (optionally filtered by type).
+ # * `find_symbol`: Performs a global (or local) search for symbols with/containing a given name/substring (optionally filtered by type).
+ # * `get_current_config`: Prints the current configuration of the agent, including the active and available projects, tools, contexts, and modes.
+ # * `get_symbols_overview`: Gets an overview of the top-level symbols defined in a given file.
+ # * `initial_instructions`: Gets the initial instructions for the current project.
+ # Should only be used in settings where the system prompt cannot be set,
+ # e.g. in clients you have no control over, like Claude Desktop.
+ # * `insert_after_symbol`: Inserts content after the end of the definition of a given symbol.
+ # * `insert_at_line`: Inserts content at a given line in a file.
+ # * `insert_before_symbol`: Inserts content before the beginning of the definition of a given symbol.
+ # * `list_dir`: Lists files and directories in the given directory (optionally with recursion).
+ # * `list_memories`: Lists memories in Serena's project-specific memory store.
+ # * `onboarding`: Performs onboarding (identifying the project structure and essential tasks, e.g. for testing or building).
+ # * `prepare_for_new_conversation`: Provides instructions for preparing for a new conversation (in order to continue with the necessary context).
+ # * `read_file`: Reads a file within the project directory.
+ # * `read_memory`: Reads the memory with the given name from Serena's project-specific memory store.
+ # * `remove_project`: Removes a project from the Serena configuration.
+ # * `replace_lines`: Replaces a range of lines within a file with new content.
+ # * `replace_symbol_body`: Replaces the full definition of a symbol.
+ # * `restart_language_server`: Restarts the language server, may be necessary when edits not through Serena happen.
+ # * `search_for_pattern`: Performs a search for a pattern in the project.
+ # * `summarize_changes`: Provides instructions for summarizing the changes made to the codebase.
+ # * `switch_modes`: Activates modes by providing a list of their names
+ # * `think_about_collected_information`: Thinking tool for pondering the completeness of collected information.
+ # * `think_about_task_adherence`: Thinking tool for determining whether the agent is still on track with the current task.
+ # * `think_about_whether_you_are_done`: Thinking tool for determining whether the task is truly completed.
+ # * `write_memory`: Writes a named memory (for future reference) to Serena's project-specific memory store.
+ excluded_tools: []
+
+ # initial prompt for the project. It will always be given to the LLM upon activating the project
+ # (contrary to the memories, which are loaded on demand).
+ initial_prompt: ""
+
+ project_name: "youtube-transcript-rb"
+ included_optional_tools: []
data/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 jeff.dean
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.