youtube-transcript-rb 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.rspec +1 -0
- data/.serena/.gitignore +1 -0
- data/.serena/memories/code_style_and_conventions.md +35 -0
- data/.serena/memories/project_overview.md +40 -0
- data/.serena/memories/suggested_commands.md +50 -0
- data/.serena/memories/task_completion_checklist.md +25 -0
- data/.serena/memories/tech_stack.md +20 -0
- data/.serena/project.yml +84 -0
- data/LICENSE +21 -0
- data/PLAN.md +422 -0
- data/README.md +496 -0
- data/Rakefile +4 -0
- data/lib/youtube/transcript/rb/api.rb +150 -0
- data/lib/youtube/transcript/rb/errors.rb +217 -0
- data/lib/youtube/transcript/rb/formatters.rb +269 -0
- data/lib/youtube/transcript/rb/settings.rb +28 -0
- data/lib/youtube/transcript/rb/transcript.rb +239 -0
- data/lib/youtube/transcript/rb/transcript_list.rb +170 -0
- data/lib/youtube/transcript/rb/transcript_list_fetcher.rb +225 -0
- data/lib/youtube/transcript/rb/transcript_parser.rb +83 -0
- data/lib/youtube/transcript/rb/version.rb +9 -0
- data/lib/youtube/transcript/rb.rb +37 -0
- data/sig/youtube/transcript/rb.rbs +8 -0
- data/spec/api_spec.rb +397 -0
- data/spec/errors_spec.rb +240 -0
- data/spec/formatters_spec.rb +436 -0
- data/spec/integration_spec.rb +363 -0
- data/spec/settings_spec.rb +67 -0
- data/spec/spec_helper.rb +109 -0
- data/spec/transcript_list_fetcher_spec.rb +520 -0
- data/spec/transcript_list_spec.rb +380 -0
- data/spec/transcript_parser_spec.rb +355 -0
- data/spec/transcript_spec.rb +435 -0
- metadata +118 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: a3a1c99bcacf517440c8be67c4e72f29406c2a7cee87cb844317a5693d5f1aea
+  data.tar.gz: d19bc462f35d6d50dd13c452b3be468c5efb57bf5cdb894c22ad99be409485da
+SHA512:
+  metadata.gz: b87fab280855a4f3f3b22786789085349492d42dd8fff3b14284b762c656757c8bd07fcfbc084ed551bd489c60ab175e4e57216edcd93c27345cfbeac53507f8
+  data.tar.gz: bba30d381a9a685e8c6f1bfeae9fdb40a81a758142a3c0c1659cc1de182266bcb6d85cff575ffade9559c945c9ae23e4c829e10821955605974ee12d499e61e3
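The checksums file records SHA256 and SHA512 digests for the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). As a minimal sketch of how such digests could be recomputed and compared using Ruby's standard `digest` library: the helper names `digests_for` and `checksums_match?` and the sample bytes are assumptions made for illustration, and this is not RubyGems' own verification code.

```ruby
require "digest"

# Hypothetical helper: compute the two digests that checksums.yaml records
# for one archive member (for example metadata.gz or data.tar.gz).
def digests_for(bytes)
  {
    "SHA256" => Digest::SHA256.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
end

# Hypothetical helper: compare freshly computed digests with recorded ones.
def checksums_match?(bytes, recorded)
  digests_for(bytes) == recorded
end

sample = "example archive bytes" # stand-in data, not the real archive
recorded = digests_for(sample)

puts checksums_match?(sample, recorded)       # unchanged bytes
puts checksums_match?(sample + "x", recorded) # altered bytes
```

In practice the `bytes` argument would come from reading the archive member in binary mode, and `recorded` from the parsed checksums file.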
data/.rspec
ADDED
@@ -0,0 +1 @@
+--require spec_helper
data/.serena/.gitignore
ADDED
@@ -0,0 +1 @@
+/cache
data/.serena/memories/code_style_and_conventions.md
ADDED
@@ -0,0 +1,35 @@
+# Code Style and Conventions
+
+## Ruby Style
+- All Ruby files start with `# frozen_string_literal: true`
+- Uses standard Ruby 3.2+ features
+- Module nesting: `Youtube::Transcript::Rb`
+
+## Naming Conventions
+- Classes: PascalCase (e.g., `YouTubeTranscriptApi`, `FetchedTranscript`)
+- Methods: snake_case (e.g., `find_transcript`, `find_generated_transcript`)
+- Constants: SCREAMING_SNAKE_CASE
+- Files: snake_case matching class names
+
+## Documentation
+- Use YARD-style documentation comments
+- `@param` for parameters
+- `@return` for return values
+- `@raise` for exceptions
+
+## Testing
+- RSpec for tests
+- WebMock for HTTP stubbing
+- Test files in `spec/` directory
+- Naming: `*_spec.rb`
+
+## Error Handling
+- Custom exception classes inheriting from base `Error` class
+- Exception hierarchy matching Python library structure:
+- `PoTokenRequired`
+- `TranscriptsDisabled`
+- `NoTranscriptFound`
+- `NoTranscriptAvailable`
+- `VideoUnavailable`
+- `TranslationLanguageNotAvailable`
+- `TooManyRequests`
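The conventions listed in this file (the frozen-string magic comment, `Youtube::Transcript::Rb` nesting, YARD tags, and custom exceptions under a base `Error`) can be pictured together in one short sketch. The error messages, the `check_video` method, and its `disabled:` flag are hypothetical and exist only for illustration; the gem's actual `errors.rb` may be organised differently.

```ruby
# frozen_string_literal: true

module Youtube
  module Transcript
    module Rb
      # Base class for every error in this sketch.
      class Error < StandardError; end

      # Raised when a video has transcripts turned off.
      class TranscriptsDisabled < Error
        # @param video_id [String] the video that was requested
        def initialize(video_id)
          super("Transcripts are disabled for video #{video_id}")
        end
      end

      # Hypothetical method, present only to show the YARD tags in use.
      #
      # @param video_id [String] the video to check
      # @param disabled [Boolean] stand-in for a real availability lookup
      # @return [String] a confirmation message
      # @raise [TranscriptsDisabled] if +disabled+ is true
      def self.check_video(video_id, disabled:)
        raise TranscriptsDisabled, video_id if disabled

        "Transcripts are available for #{video_id}"
      end
    end
  end
end

begin
  Youtube::Transcript::Rb.check_video("abc123", disabled: true)
rescue Youtube::Transcript::Rb::Error => e
  # Rescuing the base class catches every subclass in the hierarchy.
  puts "#{e.class.name}: #{e.message}"
end
```

Rescuing the base `Error` is the main benefit of a shared hierarchy: callers can handle all library failures in one place or pick out a specific subclass.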
data/.serena/memories/project_overview.md
ADDED
@@ -0,0 +1,40 @@
+# YouTube Transcript Ruby - Project Overview
+
+## Purpose
+This gem is a Ruby port of the Python library [youtube-transcript-api](https://github.com/jdepoix/youtube-transcript-api).
+It retrieves transcripts/subtitles from YouTube videos without requiring an API key or headless browser.
+
+## Python Library Features to Port
+Based on the original Python library:
+- Fetch transcripts for YouTube videos
+- Support for automatically generated and manually created captions
+- Language preference selection
+- Translation support for translatable transcripts
+- Multiple output formatters (JSON, WebVTT, SRT, plain text, PrettyPrint)
+- HTML formatting preservation option
+- Proxy support (Webshare, Generic HTTP/HTTPS/SOCKS)
+- Error handling for various YouTube states
+
+## Current Implementation Status
+The gem has a basic skeleton but most implementation files are missing:
+- ✅ `lib/youtube/transcript/rb.rb` - Main entry point with convenience methods
+- ✅ `lib/youtube/transcript/rb/version.rb` - Version file (0.1.0)
+- ❌ `lib/youtube/transcript/rb/errors.rb` - Not created yet
+- ❌ `lib/youtube/transcript/rb/transcript.rb` - Not created yet
+- ❌ `lib/youtube/transcript/rb/transcript_list.rb` - Not created yet
+- ❌ `lib/youtube/transcript/rb/transcript_list_fetcher.rb` - Not created yet
+- ❌ `lib/youtube/transcript/rb/api.rb` - Not created yet
+- ❌ `lib/youtube/transcript/rb/formatters.rb` - Not created yet
+
+## Key Classes Expected
+- `YouTubeTranscriptApi` - Main API class with `fetch` and `list` methods
+- `FetchedTranscript` - Represents fetched transcript data with snippets
+- `TranscriptSnippet` / `FetchedTranscriptSnippet` - Individual transcript segments
+- `TranscriptList` - List of available transcripts for a video
+- `Transcript` - Metadata about a transcript (language, is_generated, etc.)
+- Formatters: `TextFormatter`, `JSONFormatter`, `WebVTTFormatter`, `SRTFormatter`, `PrettyPrintFormatter`
+
+## Module Namespace
+```ruby
+Youtube::Transcript::Rb
+```
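The overview names transcript snippets and an SRT formatter among the expected classes. The following is a simplified sketch of how a snippet and SRT output might relate, assuming each snippet carries `text`, `start`, and `duration` fields; the `Snippet` struct and the `srt_timestamp` and `to_srt` helpers are hypothetical names and are not taken from the gem's `formatters.rb`.

```ruby
# Hypothetical stand-in for a transcript snippet; the field names are assumptions.
Snippet = Struct.new(:text, :start, :duration, keyword_init: true)

# Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm).
def srt_timestamp(seconds)
  total_ms = (seconds * 1000).round
  hours, rest = total_ms.divmod(3_600_000)
  minutes, rest = rest.divmod(60_000)
  secs, ms = rest.divmod(1000)
  format("%02d:%02d:%02d,%03d", hours, minutes, secs, ms)
end

# Render snippets as numbered SRT cues separated by blank lines.
# Each cue ends at start + duration; overlap handling is left out of this sketch.
def to_srt(snippets)
  cues = snippets.each_with_index.map do |snippet, index|
    finish = snippet.start + snippet.duration
    "#{index + 1}\n#{srt_timestamp(snippet.start)} --> #{srt_timestamp(finish)}\n#{snippet.text}"
  end
  cues.join("\n\n")
end

snippets = [
  Snippet.new(text: "Hello", start: 0.0, duration: 1.5),
  Snippet.new(text: "World", start: 1.5, duration: 2.0)
]
puts to_srt(snippets)
```

Keeping the timestamp logic in its own helper makes it easy to reuse for a WebVTT-style formatter, which differs mainly in its separator and header.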
data/.serena/memories/suggested_commands.md
ADDED
@@ -0,0 +1,50 @@
+# Suggested Commands
+
+## Setup
+```bash
+# Install dependencies
+bin/setup
+# or
+bundle install
+```
+
+## Testing
+```bash
+# Run all tests
+bundle exec rspec
+
+# Run specific test file
+bundle exec rspec spec/youtube_transcript_api_spec.rb
+
+# Run with verbose output
+bundle exec rspec --format documentation
+```
+
+## Interactive Console
+```bash
+bin/console
+```
+
+## Build & Install Gem
+```bash
+# Build the gem
+bundle exec rake build
+
+# Install locally
+bundle exec rake install
+```
+
+## Git Commands (macOS/Darwin)
+```bash
+git status
+git add .
+git commit -m "message"
+git push
+```
+
+## Utility Commands
+```bash
+ls -la # List files
+find . -name "*.rb" # Find Ruby files
+grep -r "pattern" . # Search in files
+```
data/.serena/memories/task_completion_checklist.md
ADDED
@@ -0,0 +1,25 @@
+# Task Completion Checklist
+
+When completing a task on this project, ensure:
+
+## Before Committing
+1. Run tests: `bundle exec rspec`
+2. Ensure all tests pass
+3. Check for syntax errors by loading the gem: `bundle exec ruby -e "require 'youtube/transcript/rb'"`
+
+## Code Quality
+- Add `# frozen_string_literal: true` to all new Ruby files
+- Follow existing naming conventions
+- Add YARD documentation for public methods
+- Handle errors appropriately with custom exception classes
+
+## Testing
+- Add/update specs for new functionality
+- Use WebMock to stub HTTP requests
+- Test both success and error cases
+
+## Python Library Alignment
+When porting features, refer to the original Python library:
+https://github.com/jdepoix/youtube-transcript-api
+
+Ensure API compatibility where possible for a similar developer experience.
data/.serena/memories/tech_stack.md
ADDED
@@ -0,0 +1,20 @@
+# Tech Stack
+
+## Language
+- Ruby (>= 3.2.0)
+
+## Dependencies (Runtime)
+- `faraday` (~> 2.0) - HTTP client library
+- `faraday-follow_redirects` (~> 0.3) - Redirect handling for Faraday
+- `nokogiri` (~> 1.15) - XML/HTML parsing
+
+## Dependencies (Development/Test)
+- `rake` (~> 13.0) - Task runner
+- `rspec` (~> 3.0) - Testing framework
+- `webmock` (~> 3.0) - HTTP request stubbing for tests
+- `irb` - Interactive Ruby console
+
+## Gem Structure
+- Uses bundler gem conventions
+- Gem name: `youtube-transcript-rb`
+- Gemspec: `youtube-transcript-rb.gemspec`
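The `~>` operator in the dependency list is RubyGems' pessimistic version constraint: it allows the last stated segment to increase but pins everything before it. A small sketch using `Gem::Requirement` from the RubyGems standard library shows what the runtime constraints accept; the candidate version numbers are arbitrary values chosen to show both outcomes, not versions the gem was tested against.

```ruby
require "rubygems"

# Constraints copied from the runtime dependency list.
constraints = {
  "faraday" => "~> 2.0",
  "faraday-follow_redirects" => "~> 0.3",
  "nokogiri" => "~> 1.15"
}

# Hypothetical candidate versions, picked only to show a pass and a fail.
candidates = {
  "faraday" => %w[2.9.0 3.0.0],
  "faraday-follow_redirects" => %w[0.3.0 1.0.0],
  "nokogiri" => %w[1.14.5 1.16.2]
}

# @param constraint [String] a RubyGems requirement string
# @param version [String] a candidate version
# @return [Boolean] whether the version satisfies the constraint
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

constraints.each do |name, constraint|
  candidates.fetch(name).each do |version|
    puts "#{name} #{version} against #{constraint}: #{allowed?(constraint, version)}"
  end
end
```

Read this way, `~> 2.0` means at least 2.0 and below 3.0, `~> 0.3` means at least 0.3 and below 1.0, and `~> 1.15` means at least 1.15 and below 2.0.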
data/.serena/project.yml
ADDED
@@ -0,0 +1,84 @@
+# list of languages for which language servers are started; choose from:
+# al bash clojure cpp csharp csharp_omnisharp
+# dart elixir elm erlang fortran go
+# haskell java julia kotlin lua markdown
+# nix perl php python python_jedi r
+# rego ruby ruby_solargraph rust scala swift
+# terraform typescript typescript_vts yaml zig
+# Note:
+# - For C, use cpp
+# - For JavaScript, use typescript
+# Special requirements:
+# - csharp: Requires the presence of a .sln file in the project folder.
+# When using multiple languages, the first language server that supports a given file will be used for that file.
+# The first language is the default language and the respective language server will be used as a fallback.
+# Note that when using the JetBrains backend, language servers are not used and this list is correspondingly ignored.
+languages:
+- ruby
+
+# the encoding used by text files in the project
+# For a list of possible encodings, see https://docs.python.org/3.11/library/codecs.html#standard-encodings
+encoding: "utf-8"
+
+# whether to use the project's gitignore file to ignore files
+# Added on 2025-04-07
+ignore_all_files_in_gitignore: true
+
+# list of additional paths to ignore
+# same syntax as gitignore, so you can use * and **
+# Was previously called `ignored_dirs`, please update your config if you are using that.
+# Added (renamed) on 2025-04-07
+ignored_paths: []
+
+# whether the project is in read-only mode
+# If set to true, all editing tools will be disabled and attempts to use them will result in an error
+# Added on 2025-04-18
+read_only: false
+
+# list of tool names to exclude. We recommend not excluding any tools, see the readme for more details.
+# Below is the complete list of tools for convenience.
+# To make sure you have the latest list of tools, and to view their descriptions,
+# execute `uv run scripts/print_tool_overview.py`.
+#
+# * `activate_project`: Activates a project by name.
+# * `check_onboarding_performed`: Checks whether project onboarding was already performed.
+# * `create_text_file`: Creates/overwrites a file in the project directory.
+# * `delete_lines`: Deletes a range of lines within a file.
+# * `delete_memory`: Deletes a memory from Serena's project-specific memory store.
+# * `execute_shell_command`: Executes a shell command.
+# * `find_referencing_code_snippets`: Finds code snippets in which the symbol at the given location is referenced.
+# * `find_referencing_symbols`: Finds symbols that reference the symbol at the given location (optionally filtered by type).
+# * `find_symbol`: Performs a global (or local) search for symbols with/containing a given name/substring (optionally filtered by type).
+# * `get_current_config`: Prints the current configuration of the agent, including the active and available projects, tools, contexts, and modes.
+# * `get_symbols_overview`: Gets an overview of the top-level symbols defined in a given file.
+# * `initial_instructions`: Gets the initial instructions for the current project.
+# Should only be used in settings where the system prompt cannot be set,
+# e.g. in clients you have no control over, like Claude Desktop.
+# * `insert_after_symbol`: Inserts content after the end of the definition of a given symbol.
+# * `insert_at_line`: Inserts content at a given line in a file.
+# * `insert_before_symbol`: Inserts content before the beginning of the definition of a given symbol.
+# * `list_dir`: Lists files and directories in the given directory (optionally with recursion).
+# * `list_memories`: Lists memories in Serena's project-specific memory store.
+# * `onboarding`: Performs onboarding (identifying the project structure and essential tasks, e.g. for testing or building).
+# * `prepare_for_new_conversation`: Provides instructions for preparing for a new conversation (in order to continue with the necessary context).
+# * `read_file`: Reads a file within the project directory.
+# * `read_memory`: Reads the memory with the given name from Serena's project-specific memory store.
+# * `remove_project`: Removes a project from the Serena configuration.
+# * `replace_lines`: Replaces a range of lines within a file with new content.
+# * `replace_symbol_body`: Replaces the full definition of a symbol.
+# * `restart_language_server`: Restarts the language server, may be necessary when edits not through Serena happen.
+# * `search_for_pattern`: Performs a search for a pattern in the project.
+# * `summarize_changes`: Provides instructions for summarizing the changes made to the codebase.
+# * `switch_modes`: Activates modes by providing a list of their names
+# * `think_about_collected_information`: Thinking tool for pondering the completeness of collected information.
+# * `think_about_task_adherence`: Thinking tool for determining whether the agent is still on track with the current task.
+# * `think_about_whether_you_are_done`: Thinking tool for determining whether the task is truly completed.
+# * `write_memory`: Writes a named memory (for future reference) to Serena's project-specific memory store.
+excluded_tools: []
+
+# initial prompt for the project. It will always be given to the LLM upon activating the project
+# (contrary to the memories, which are loaded on demand).
+initial_prompt: ""
+
+project_name: "youtube-transcript-rb"
+included_optional_tools: []
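Most of this file is commentary; only nine keys carry values. As a rough sketch, the effective settings can be seen by parsing just those keys with Ruby's standard `yaml` library. The inline `config_text` string is a hand-copied excerpt of the non-comment lines, and reading it this way is an illustration only, not how Serena itself loads the file.

```ruby
require "yaml"

# Non-comment lines of project.yml, copied into a heredoc for illustration.
config_text = <<~YAML
  languages:
  - ruby
  encoding: "utf-8"
  ignore_all_files_in_gitignore: true
  ignored_paths: []
  read_only: false
  excluded_tools: []
  initial_prompt: ""
  project_name: "youtube-transcript-rb"
  included_optional_tools: []
YAML

# safe_load is enough here because the file uses only plain scalars, lists, and maps.
config = YAML.safe_load(config_text)

puts "project:   #{config['project_name']}"
puts "languages: #{config['languages'].join(', ')}"
puts "read only: #{config['read_only']}"
```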
data/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 jeff.dean
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.