youtube-transcript-rb 0.1.0

data/README.md ADDED
@@ -0,0 +1,496 @@
1
+ <h1 align="center">
2
+ ✨ YouTube Transcript API (Ruby) ✨
3
+ </h1>
4
+
5
+ <p align="center">
6
+ <a href="http://opensource.org/licenses/MIT">
7
+ <img src="http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat" alt="MIT license">
8
+ </a>
9
+ <a href="https://rubygems.org/gems/youtube-transcript-rb">
10
+ <img src="https://img.shields.io/gem/v/youtube-transcript-rb.svg" alt="Gem Version">
11
+ </a>
12
+ <a href="https://rubygems.org/gems/youtube-transcript-rb">
13
+ <img src="https://img.shields.io/badge/ruby-%3E%3D%203.2.0-ruby.svg" alt="Ruby Version">
14
+ </a>
15
+ </p>
16
+
17
+ <p align="center">
18
+ <b>This is a Ruby gem which allows you to retrieve the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles, and does not require a headless browser, unlike Selenium-based solutions!</b>
19
+ </p>
20
+
21
+ <p align="center">
22
+ This is a Ruby port of the Python <a href="https://github.com/jdepoix/youtube-transcript-api">youtube-transcript-api</a> by jdepoix.
23
+ </p>
24
+
25
+ ## Install
26
+
27
+ Add this line to your application's Gemfile:
28
+
29
+ ```ruby
30
+ gem 'youtube-transcript-rb'
31
+ ```
32
+
33
+ And then execute:
34
+
35
+ ```
36
+ bundle install
37
+ ```
38
+
39
+ Or install it yourself as:
40
+
41
+ ```
42
+ gem install youtube-transcript-rb
43
+ ```
44
+
45
+ ## API
46
+
47
+ The easiest way to get a transcript for a given video is to execute:
48
+
49
+ ```ruby
50
+ require 'youtube/transcript/rb'
51
+
52
+ api = Youtube::Transcript::Rb::YouTubeTranscriptApi.new
53
+ api.fetch(video_id)
54
+ ```
55
+
56
+ > **Note:** By default, this will try to access the English transcript of the video. If your video has a different
57
+ > language, or you are interested in fetching a transcript in a different language, please read the section below.
58
+
59
+ > **Note:** Pass in the video ID, NOT the video URL. For a video with the URL `https://www.youtube.com/watch?v=12345`
60
+ > the ID is `12345`.
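
Since the API expects a bare ID, input that arrives as a full URL has to be reduced first. The following is a minimal sketch using only Ruby's standard library. `extract_video_id` is a hypothetical helper, not part of this gem, and it assumes only the `watch?v=` and `youtu.be/` URL shapes:

```ruby
require 'uri'

# Hypothetical helper (not part of the gem): pull the video ID out of a
# YouTube URL, or return the input unchanged if it is not a URL.
def extract_video_id(url_or_id)
  uri = URI.parse(url_or_id)
  return url_or_id unless uri.host

  # Short links look like https://youtu.be/<id>
  return uri.path.delete_prefix('/') if uri.host == 'youtu.be'

  # Watch links carry the ID in the `v` query parameter
  params = URI.decode_www_form(uri.query.to_s).to_h
  params['v']
rescue URI::InvalidURIError
  url_or_id
end

extract_video_id('https://www.youtube.com/watch?v=12345') # => "12345"
```

Other URL shapes (embeds, shorts, playlists) are not handled here and would need their own branches.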
61
+
62
+ This will return a `FetchedTranscript` object looking somewhat like this:
63
+
64
+ ```ruby
65
+ #<Youtube::Transcript::Rb::FetchedTranscript
66
+ @video_id="12345",
67
+ @language="English",
68
+ @language_code="en",
69
+ @is_generated=false,
70
+ @snippets=[
71
+ #<Youtube::Transcript::Rb::TranscriptSnippet @text="Hey there", @start=0.0, @duration=1.54>,
72
+ #<Youtube::Transcript::Rb::TranscriptSnippet @text="how are you", @start=1.54, @duration=4.16>,
73
+ # ...
74
+ ]
75
+ >
76
+ ```
77
+
78
+ This object implements `Enumerable`, so you can iterate over it:
79
+
80
+ ```ruby
81
+ api = Youtube::Transcript::Rb::YouTubeTranscriptApi.new
82
+ fetched_transcript = api.fetch(video_id)
83
+
84
+ # is iterable
85
+ fetched_transcript.each do |snippet|
86
+ puts snippet.text
87
+ end
88
+
89
+ # indexable
90
+ last_snippet = fetched_transcript[-1]
91
+
92
+ # provides a length
93
+ snippet_count = fetched_transcript.length
94
+ ```
95
+
96
+ If you prefer to handle the raw transcript data you can call `fetched_transcript.to_raw_data`, which will return
97
+ an array of hashes:
98
+
99
+ ```ruby
100
+ [
101
+ {
102
+ 'text' => 'Hey there',
103
+ 'start' => 0.0,
104
+ 'duration' => 1.54
105
+ },
106
+ {
107
+ 'text' => 'how are you',
108
+ 'start' => 1.54,
109
+ 'duration' => 4.16
110
+ },
111
+ # ...
112
+ ]
113
+ ```
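
Because the raw data is a plain array of hashes, ordinary `Enumerable` methods are enough for post-processing. A small sketch, using the sample values shown above rather than a real fetch:

```ruby
# Sample data in the shape returned by `to_raw_data` (values copied from the example above)
raw = [
  { 'text' => 'Hey there', 'start' => 0.0, 'duration' => 1.54 },
  { 'text' => 'how are you', 'start' => 1.54, 'duration' => 4.16 }
]

# Join all snippet texts into one string
full_text = raw.map { |snippet| snippet['text'] }.join(' ')

# Time (in seconds) at which the last snippet ends
ends_at = raw.map { |snippet| snippet['start'] + snippet['duration'] }.max

puts full_text
puts format('%.2f', ends_at)
```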
114
+
115
+ ### Convenience Methods
116
+
117
+ You can also use the convenience methods on the module directly:
118
+
119
+ ```ruby
120
+ require 'youtube/transcript/rb'
121
+
122
+ # Fetch a transcript
123
+ transcript = Youtube::Transcript::Rb.fetch(video_id)
124
+
125
+ # List available transcripts
126
+ transcript_list = Youtube::Transcript::Rb.list(video_id)
127
+ ```
128
+
129
+ ### Retrieve different languages
130
+
131
+ You can add the `languages` param if you want to make sure the transcripts are retrieved in your desired language
132
+ (it defaults to English).
133
+
134
+ ```ruby
135
+ Youtube::Transcript::Rb::YouTubeTranscriptApi.new.fetch(video_id, languages: ['de', 'en'])
136
+ ```
137
+
138
+ It's an array of language codes in descending priority. In this example it will first try to fetch the German
139
+ transcript (`'de'`) and fall back to the English transcript (`'en'`) if that fails. If you want to find out
140
+ which languages are available first, [have a look at `list`](#list-available-transcripts).
141
+
142
+ If you only want one language, you still need to format the `languages` argument as an array:
143
+
144
+ ```ruby
145
+ Youtube::Transcript::Rb::YouTubeTranscriptApi.new.fetch(video_id, languages: ['de'])
146
+ ```
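
The priority order can be pictured as "first preferred code that the video actually offers". The sketch below only illustrates that idea. `pick_language` and the `available` array are made up for the example, and the gem itself raises `NoTranscriptFound` rather than returning `nil` when nothing matches:

```ruby
# Illustrative only: `available` stands in for the language codes a video offers.
def pick_language(available, preferred)
  # `find` returns the first preferred code that is available, or nil
  preferred.find { |code| available.include?(code) }
end

available = %w[en fr]
pick_language(available, %w[de en]) # => "en"
pick_language(available, %w[de])    # => nil
```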
147
+
148
+ ### Preserve formatting
149
+
150
+ You can also add `preserve_formatting: true` if you'd like to keep HTML formatting elements such as `<i>` (italics)
151
+ and `<b>` (bold).
152
+
153
+ ```ruby
154
+ Youtube::Transcript::Rb::YouTubeTranscriptApi.new.fetch(video_id, languages: ['de', 'en'], preserve_formatting: true)
155
+ ```
156
+
157
+ ### List available transcripts
158
+
159
+ If you want to list all transcripts which are available for a given video you can call:
160
+
161
+ ```ruby
162
+ api = Youtube::Transcript::Rb::YouTubeTranscriptApi.new
163
+ transcript_list = api.list(video_id)
164
+ ```
165
+
166
+ This will return a `TranscriptList` object which is iterable and provides methods to filter the list of transcripts for
167
+ specific languages and types, like:
168
+
169
+ ```ruby
170
+ transcript = transcript_list.find_transcript(['de', 'en'])
171
+ ```
172
+
173
+ By default this module prefers manually created transcripts over automatically generated ones when a transcript in
174
+ the requested language is available in both forms. The `TranscriptList` allows you to bypass this
175
+ default behaviour by searching for specific transcript types:
176
+
177
+ ```ruby
178
+ # filter for manually created transcripts
179
+ transcript = transcript_list.find_manually_created_transcript(['de', 'en'])
180
+
181
+ # or automatically generated ones
182
+ transcript = transcript_list.find_generated_transcript(['de', 'en'])
183
+ ```
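
One plausible reading of the default preference is sketched below: walk the requested languages in order, and within a language take a manually created transcript before a generated one. This is an assumption for illustration, not the gem's actual code. `Track` and `find_track` are hypothetical names:

```ruby
# Hypothetical stand-in for a transcript's metadata
Track = Struct.new(:language_code, :is_generated)

# Illustrative sketch: languages in priority order, manual before generated
def find_track(tracks, language_codes)
  language_codes.each do |code|
    candidates = tracks.select { |track| track.language_code == code }
    next if candidates.empty?

    return candidates.find { |track| !track.is_generated } || candidates.first
  end
  nil
end

tracks = [Track.new('en', true), Track.new('en', false), Track.new('de', true)]
find_track(tracks, ['en']).is_generated # => false
```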
184
+
185
+ The methods `find_generated_transcript`, `find_manually_created_transcript`, `find_transcript` return `Transcript`
186
+ objects. They contain metadata regarding the transcript:
187
+
188
+ ```ruby
189
+ puts transcript.video_id
190
+ puts transcript.language
191
+ puts transcript.language_code
192
+ # whether it has been manually created or generated by YouTube
193
+ puts transcript.is_generated
194
+ # whether this transcript can be translated or not
195
+ puts transcript.translatable?
196
+ # a list of languages the transcript can be translated to
197
+ puts transcript.translation_languages
198
+ ```
199
+
200
+ and provide a `fetch` method, which allows you to fetch the actual transcript data:
201
+
202
+ ```ruby
203
+ transcript.fetch
204
+ ```
205
+
206
+ This returns a `FetchedTranscript` object, just like `YouTubeTranscriptApi.new.fetch` does.
207
+
208
+ ### Translate transcript
209
+
210
+ YouTube has a feature which allows you to automatically translate subtitles. This module also makes it possible to
211
+ access this feature. To do so `Transcript` objects provide a `translate` method, which returns a new translated
212
+ `Transcript` object:
213
+
214
+ ```ruby
215
+ transcript = transcript_list.find_transcript(['en'])
216
+ translated_transcript = transcript.translate('de')
217
+ puts translated_transcript.fetch
218
+ ```
219
+
220
+ ### By example
221
+
222
+ ```ruby
223
+ require 'youtube/transcript/rb'
224
+
225
+ api = Youtube::Transcript::Rb::YouTubeTranscriptApi.new
226
+
227
+ # retrieve the available transcripts
228
+ transcript_list = api.list('video_id')
229
+
230
+ # iterate over all available transcripts
231
+ transcript_list.each do |transcript|
232
+ # the Transcript object provides metadata properties
233
+ puts transcript.video_id
234
+ puts transcript.language
235
+ puts transcript.language_code
236
+ # whether it has been manually created or generated by YouTube
237
+ puts transcript.is_generated
238
+ # whether this transcript can be translated or not
239
+ puts transcript.translatable?
240
+ # a list of languages the transcript can be translated to
241
+ puts transcript.translation_languages
242
+
243
+ # fetch the actual transcript data
244
+ puts transcript.fetch
245
+
246
+ # translating the transcript will return another transcript object
247
+ puts transcript.translate('en').fetch if transcript.translatable?
248
+ end
249
+
250
+ # you can also directly filter for the language you are looking for, using the transcript list
251
+ transcript = transcript_list.find_transcript(['de', 'en'])
252
+
253
+ # or just filter for manually created transcripts
254
+ transcript = transcript_list.find_manually_created_transcript(['de', 'en'])
255
+
256
+ # or automatically generated ones
257
+ transcript = transcript_list.find_generated_transcript(['de', 'en'])
258
+ ```
259
+
260
+ ### Fetch multiple videos
261
+
262
+ You can fetch transcripts for multiple videos at once:
263
+
264
+ ```ruby
265
+ api = Youtube::Transcript::Rb::YouTubeTranscriptApi.new
266
+
267
+ # Fetch multiple videos
268
+ transcripts = api.fetch_all(['video1', 'video2', 'video3'])
269
+ transcripts.each do |video_id, transcript|
270
+ puts "#{video_id}: #{transcript.length} snippets"
271
+ end
272
+
273
+ # With error handling - continue even if some videos fail
274
+ api.fetch_all(['video1', 'video2'], continue_on_error: true) do |video_id, result|
275
+ if result.is_a?(StandardError)
276
+ puts "Error for #{video_id}: #{result.message}"
277
+ else
278
+ puts "Got #{result.length} snippets for #{video_id}"
279
+ end
280
+ end
281
+ ```
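
With `continue_on_error: true`, the returned hash only holds the videos that succeeded, so failures have to be captured in the block if you need them later. The sketch below shows that pattern against a stand-in class. `FakeApi` is hypothetical and simply mimics the block behaviour described above without network access:

```ruby
# Stand-in for illustration only: mimics the block behaviour of
# `fetch_all(..., continue_on_error: true)` without any network access.
class FakeApi
  def fetch_all(video_ids, continue_on_error: false)
    results = {}
    video_ids.each do |video_id|
      raise StandardError, "no transcript for #{video_id}" if video_id.start_with?('bad')

      results[video_id] = ["snippet for #{video_id}"]
      yield(video_id, results[video_id]) if block_given?
    rescue StandardError => e
      raise unless continue_on_error

      yield(video_id, e) if block_given?
    end
    results
  end
end

# Collect failures separately, because the return value omits them
failures = {}
transcripts = FakeApi.new.fetch_all(%w[video1 bad2 video3], continue_on_error: true) do |video_id, result|
  failures[video_id] = result if result.is_a?(StandardError)
end

puts "fetched: #{transcripts.keys.join(', ')}"
puts "failed:  #{failures.keys.join(', ')}"
```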
282
+
283
+ ## Using Formatters
284
+
285
+ Formatters are meant to be an additional layer of processing of the transcript you pass them. The goal is to convert a
286
+ `FetchedTranscript` object into a consistent string of a given "format", such as basic text (`.txt`) or formats
287
+ that have a defined specification, such as JSON (`.json`), WebVTT (`.vtt`), and SRT (`.srt`).
288
+
289
+ The `Formatters` module provides a few basic formatters:
290
+
291
+ - `JSONFormatter`
292
+ - `PrettyPrintFormatter`
293
+ - `TextFormatter`
294
+ - `WebVTTFormatter`
295
+ - `SRTFormatter`
296
+
297
+ Here is how to import from the `Formatters` module:
298
+
299
+ ```ruby
300
+ require 'youtube/transcript/rb'
301
+
302
+ # Some provided formatter classes, each outputs a different string format.
303
+ Youtube::Transcript::Rb::Formatters::JSONFormatter
304
+ Youtube::Transcript::Rb::Formatters::TextFormatter
305
+ Youtube::Transcript::Rb::Formatters::PrettyPrintFormatter
306
+ Youtube::Transcript::Rb::Formatters::WebVTTFormatter
307
+ Youtube::Transcript::Rb::Formatters::SRTFormatter
308
+ ```
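
Subtitle formats such as SRT mostly differ from plain text in how they encode timing. As a rough illustration of what a subtitle formatter has to do with each snippet's `start` value, here is a sketch of an SRT-style timestamp (`HH:MM:SS,mmm`). `srt_timestamp` is a hypothetical helper, not the gem's implementation:

```ruby
# Hypothetical helper: format a number of seconds as an SRT timestamp (HH:MM:SS,mmm).
def srt_timestamp(seconds)
  total_ms = (seconds * 1000).round
  hours, rest = total_ms.divmod(3_600_000)
  minutes, rest = rest.divmod(60_000)
  secs, millis = rest.divmod(1000)
  format('%02d:%02d:%02d,%03d', hours, minutes, secs, millis)
end

srt_timestamp(1.54)   # => "00:00:01,540"
srt_timestamp(3661.5) # => "01:01:01,500"
```

WebVTT timestamps use the same layout with a `.` instead of the `,` before the milliseconds.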
309
+
310
+ ### Formatter Example
311
+
312
+ Let's say we wanted to retrieve a transcript and store it to a JSON file. That would look something like this:
313
+
314
+ ```ruby
315
+ require 'youtube/transcript/rb'
316
+
317
+ api = Youtube::Transcript::Rb::YouTubeTranscriptApi.new
318
+ transcript = api.fetch(video_id)
319
+
320
+ formatter = Youtube::Transcript::Rb::Formatters::JSONFormatter.new
321
+
322
+ # .format_transcript(transcript) turns the transcript into a JSON string.
323
+ json_formatted = formatter.format_transcript(transcript)
324
+
325
+ # Now we can write it out to a file.
326
+ File.write('your_filename.json', json_formatted)
327
+
328
+ # You should now have a new JSON file that you can easily read back into Ruby.
329
+ ```
330
+
331
+ **Passing extra keyword arguments**
332
+
333
+ Since `JSONFormatter` leverages `JSON.generate` you can also forward keyword arguments into
334
+ `.format_transcript(transcript)` such as making your file output prettier:
335
+
336
+ ```ruby
337
+ json_formatted = Youtube::Transcript::Rb::Formatters::JSONFormatter.new.format_transcript(
338
+ transcript,
339
+ indent: ' ',
340
+ space: ' '
341
+ )
342
+ ```
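
To see what these options do independently of the formatter, you can call `JSON.generate` directly. In the sketch below the data is sample data, and the `object_nl` / `array_nl` options are additional standard `JSON.generate` options that the README example does not use. As far as I can tell, `indent` and `space` alone do not insert line breaks; the two `*_nl` options do:

```ruby
require 'json'

# Sample data in the shape of a raw transcript
data = [{ 'text' => 'Hey there', 'start' => 0.0, 'duration' => 1.54 }]

# Compact output, everything on one line
compact = JSON.generate(data)

# Indented output with explicit line breaks after objects and arrays
pretty = JSON.generate(data, indent: '  ', space: ' ', object_nl: "\n", array_nl: "\n")

puts compact
puts pretty
```

Whether `JSONFormatter` forwards `object_nl` and `array_nl` unchanged is an assumption based on the statement above that keyword arguments are passed through to `JSON.generate`.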
343
+
344
+ ### Using FormatterLoader
345
+
346
+ You can also use the `FormatterLoader` to dynamically load formatters by name:
347
+
348
+ ```ruby
349
+ require 'youtube/transcript/rb'
350
+
351
+ loader = Youtube::Transcript::Rb::Formatters::FormatterLoader.new
352
+
353
+ # Load by type name: "json", "pretty", "text", "webvtt", "srt"
354
+ formatter = loader.load("json")
355
+ output = formatter.format_transcript(transcript)
356
+
357
+ formatter = loader.load("srt")
358
+ File.write('transcript.srt', formatter.format_transcript(transcript))
359
+ ```
360
+
361
+ ### Custom Formatter Example
362
+
363
+ You can implement your own formatter class. Just inherit from the `Formatter` base class and ensure you implement the
364
+ `format_transcript` and `format_transcripts` methods which should ultimately return a string:
365
+
366
+ ```ruby
367
+ class MyCustomFormatter < Youtube::Transcript::Rb::Formatters::Formatter
368
+ def format_transcript(transcript, **options)
369
+ # Do your custom work in here, but return a string.
370
+ 'your processed output data as a string.'
371
+ end
372
+
373
+ def format_transcripts(transcripts, **options)
374
+ # Do your custom work in here to format an array of transcripts, but return a string.
375
+ 'your processed output data as a string.'
376
+ end
377
+ end
378
+ ```
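
As a more concrete sketch, here is a tab-separated formatter. To keep the example runnable without the gem it does not inherit from `Formatter` and uses a stand-in `Snippet` struct. `TsvFormatter` and `Snippet` are hypothetical names, and the only assumption about the transcript is that it is enumerable and yields objects with `text`, `start` and `duration`:

```ruby
# Stand-in for the gem's snippet objects (illustration only)
Snippet = Struct.new(:text, :start, :duration)

# In real use this class would inherit from
# Youtube::Transcript::Rb::Formatters::Formatter instead of standing alone.
class TsvFormatter
  HEADER = "start\tduration\ttext"

  def format_transcript(transcript, **_options)
    rows = transcript.map do |snippet|
      # Replace tabs and newlines in the text so each snippet stays on one row
      [snippet.start, snippet.duration, snippet.text.tr("\t\n", '  ')].join("\t")
    end
    ([HEADER] + rows).join("\n")
  end

  def format_transcripts(transcripts, **options)
    # Separate the individual transcripts with a blank line
    transcripts.map { |transcript| format_transcript(transcript, **options) }.join("\n\n")
  end
end

transcript = [Snippet.new('Hey there', 0.0, 1.54), Snippet.new('how are you', 1.54, 4.16)]
puts TsvFormatter.new.format_transcript(transcript)
```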
379
+
380
+ ## Error Handling
381
+
382
+ The library provides a comprehensive set of exceptions for different error scenarios:
383
+
384
+ ```ruby
385
+ require 'youtube/transcript/rb'
386
+
387
+ begin
388
+ transcript = Youtube::Transcript::Rb.fetch(video_id)
389
+ rescue Youtube::Transcript::Rb::TranscriptsDisabled => e
390
+ puts "Subtitles are disabled for this video"
391
+ rescue Youtube::Transcript::Rb::NoTranscriptFound => e
392
+ puts "No transcript found for the requested languages"
393
+ puts e.requested_language_codes
394
+ rescue Youtube::Transcript::Rb::NoTranscriptAvailable => e
395
+ puts "No transcripts are available for this video"
396
+ rescue Youtube::Transcript::Rb::VideoUnavailable => e
397
+ puts "The video is no longer available"
398
+ rescue Youtube::Transcript::Rb::TooManyRequests => e
399
+ puts "Rate limited by YouTube"
400
+ rescue Youtube::Transcript::Rb::RequestBlocked => e
401
+ puts "Request blocked by YouTube"
402
+ rescue Youtube::Transcript::Rb::IpBlocked => e
403
+ puts "Your IP has been blocked by YouTube"
404
+ rescue Youtube::Transcript::Rb::PoTokenRequired => e
405
+ puts "PO token required - this is a YouTube limitation"
406
+ rescue Youtube::Transcript::Rb::CouldNotRetrieveTranscript => e
407
+ puts "Could not retrieve transcript: #{e.message}"
408
+ end
409
+ ```
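
The order of the `rescue` clauses matters: Ruby uses the first clause whose class matches, so a base class such as `CouldNotRetrieveTranscript` should come after the more specific errors, as it does above. The sketch below demonstrates the rule with stand-in classes. `BaseError` and `SpecificError` are hypothetical and are not the gem's exception classes:

```ruby
# Stand-in exception hierarchy (illustration only)
class BaseError < StandardError; end
class SpecificError < BaseError; end

# Runs the block and reports which rescue clause handled the error
def classify
  yield
  :ok
rescue SpecificError
  :specific
rescue BaseError
  :base
end

classify { raise SpecificError } # => :specific
classify { raise BaseError }     # => :base
```

If the two clauses were swapped, `rescue BaseError` would also catch `SpecificError` and the specific branch would never run.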
410
+
411
+ ### Available Exceptions
412
+
413
+ | Exception | Description |
414
+ |-----------|-------------|
415
+ | `Error` | Base error class |
416
+ | `CouldNotRetrieveTranscript` | Base class for transcript retrieval errors |
417
+ | `YouTubeDataUnparsable` | YouTube data cannot be parsed |
418
+ | `YouTubeRequestFailed` | HTTP request to YouTube failed |
419
+ | `VideoUnplayable` | Video cannot be played |
420
+ | `VideoUnavailable` | Video is no longer available |
421
+ | `InvalidVideoId` | Invalid video ID provided |
422
+ | `RequestBlocked` | YouTube is blocking requests |
423
+ | `IpBlocked` | IP has been blocked by YouTube |
424
+ | `TooManyRequests` | Rate limited (HTTP 429) |
425
+ | `TranscriptsDisabled` | Subtitles are disabled for the video |
426
+ | `AgeRestricted` | Video is age-restricted |
427
+ | `NotTranslatable` | Transcript cannot be translated |
428
+ | `TranslationLanguageNotAvailable` | Requested translation language not available |
429
+ | `FailedToCreateConsentCookie` | Failed to create consent cookie |
430
+ | `NoTranscriptFound` | No transcript found for requested languages |
431
+ | `NoTranscriptAvailable` | No transcripts available for the video |
432
+ | `PoTokenRequired` | PO token required to fetch transcript |
433
+
434
+ ## Working around IP bans (`RequestBlocked` or `IpBlocked` exception)
435
+
436
+ Unfortunately, YouTube has started blocking most IPs that are known to belong to cloud providers (like AWS, Google Cloud
437
+ Platform, Azure, etc.), which means you will most likely run into `RequestBlocked` or `IpBlocked` exceptions when
438
+ deploying your code to any cloud solution. The same can happen to the IP of your self-hosted solution if you are making
439
+ too many requests. You can work around these IP bans using proxies.
440
+
441
+ > **Note:** Proxy support is planned for a future release.
442
+
443
+ ## Overwriting request defaults
444
+
445
+ When initializing a `YouTubeTranscriptApi` object, it will create a Faraday HTTP client which will be used for all
446
+ HTTP(S) requests. However, you can optionally pass a custom Faraday connection into its constructor:
447
+
448
+ ```ruby
449
+ require 'faraday'
450
+
451
+ http_client = Faraday.new do |conn|
452
+ conn.options.timeout = 60
453
+ conn.headers['Accept-Encoding'] = 'gzip, deflate'
454
+ conn.ssl.verify = true
455
+ conn.ssl.ca_file = '/path/to/certfile'
456
+ conn.adapter Faraday.default_adapter
457
+ end
458
+
459
+ api = Youtube::Transcript::Rb::YouTubeTranscriptApi.new(http_client: http_client)
460
+ api.fetch(video_id)
461
+
462
+ # Share same connection between two instances
463
+ api_2 = Youtube::Transcript::Rb::YouTubeTranscriptApi.new(http_client: http_client)
464
+ api_2.fetch(video_id)
465
+ ```
466
+
467
+ ## Warning
468
+
469
+ This code uses an undocumented part of the YouTube API, which is called by the YouTube web-client. So there is no
470
+ guarantee that it won't stop working tomorrow if they change how things work. I will, however, do my best to get things
471
+ working again as soon as possible if that happens. So if it stops working, let me know!
472
+
473
+ ## Development
474
+
475
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `bundle exec rspec` to run the tests.
476
+ You can also run `bin/console` for an interactive prompt that will allow you to experiment.
477
+
478
+ To install this gem onto your local machine, run `bundle exec rake install`.
479
+
480
+ ### Running Tests
481
+
482
+ ```
483
+ bundle exec rspec
484
+ ```
485
+
486
+ ## Contributing
487
+
488
+ Bug reports and pull requests are welcome on GitHub at https://github.com/stadia/youtube-transcript-rb.
489
+
490
+ ## License
491
+
492
+ This project is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
493
+
494
+ ## Credits
495
+
496
+ This is a Ruby port of [youtube-transcript-api](https://github.com/jdepoix/youtube-transcript-api) by jdepoix.
data/Rakefile ADDED
@@ -0,0 +1,4 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ task default: %i[]
@@ -0,0 +1,150 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "faraday"
4
+ require "faraday/follow_redirects"
5
+
6
+ module Youtube
7
+ module Transcript
8
+ module Rb
9
+ # Main entry point for fetching YouTube transcripts.
10
+ # This class provides a simple API for retrieving transcripts from YouTube videos.
11
+ #
12
+ # @example Basic usage
13
+ # api = Youtube::Transcript::Rb::YouTubeTranscriptApi.new
14
+ # transcript = api.fetch("dQw4w9WgXcQ")
15
+ # transcript.each { |snippet| puts snippet.text }
16
+ #
17
+ # @example With language preference
18
+ # api = Youtube::Transcript::Rb::YouTubeTranscriptApi.new
19
+ # transcript = api.fetch("dQw4w9WgXcQ", languages: ["es", "en"])
20
+ #
21
+ # @example Listing available transcripts
22
+ # api = Youtube::Transcript::Rb::YouTubeTranscriptApi.new
23
+ # transcript_list = api.list("dQw4w9WgXcQ")
24
+ # transcript_list.each { |t| puts t }
25
+ #
26
+ class YouTubeTranscriptApi
27
+ # Default timeout for HTTP requests in seconds
28
+ DEFAULT_TIMEOUT = 30
29
+
30
+ # @param http_client [Faraday::Connection, nil] Custom HTTP client (optional)
31
+ # @param proxy_config [Object, nil] Proxy configuration (optional)
32
+ def initialize(http_client: nil, proxy_config: nil)
33
+ @http_client = http_client || build_default_http_client
34
+ @proxy_config = proxy_config
35
+ @fetcher = TranscriptListFetcher.new(
36
+ http_client: @http_client,
37
+ proxy_config: @proxy_config
38
+ )
39
+ end
40
+
41
+ # Fetch a transcript for a video.
42
+ # This is a convenience method that combines `list`, `find_transcript` and `Transcript#fetch`.
43
+ #
44
+ # @param video_id [String] The YouTube video ID
45
+ # @param languages [Array<String>] Language codes in order of preference (default: ["en"])
46
+ # @param preserve_formatting [Boolean] Whether to preserve HTML formatting (default: false)
47
+ # @return [FetchedTranscript] The fetched transcript
48
+ # @raise [NoTranscriptFound] If no transcript matches the requested languages
49
+ # @raise [TranscriptsDisabled] If transcripts are disabled for the video
50
+ # @raise [VideoUnavailable] If the video is not available
51
+ #
52
+ # @example
53
+ # api = YouTubeTranscriptApi.new
54
+ # transcript = api.fetch("dQw4w9WgXcQ", languages: ["en", "es"])
55
+ # puts transcript.first.text
56
+ #
57
+ def fetch(video_id, languages: ["en"], preserve_formatting: false)
58
+ list(video_id)
59
+ .find_transcript(languages)
60
+ .fetch(preserve_formatting: preserve_formatting)
61
+ end
62
+
63
+ # List all available transcripts for a video.
64
+ #
65
+ # @param video_id [String] The YouTube video ID
66
+ # @return [TranscriptList] A list of available transcripts
67
+ # @raise [TranscriptsDisabled] If transcripts are disabled for the video
68
+ # @raise [VideoUnavailable] If the video is not available
69
+ #
70
+ # @example
71
+ # api = YouTubeTranscriptApi.new
72
+ # transcript_list = api.list("dQw4w9WgXcQ")
73
+ #
74
+ # # Find a specific transcript
75
+ # transcript = transcript_list.find_transcript(["en"])
76
+ #
77
+ # # Or iterate over all available transcripts
78
+ # transcript_list.each do |transcript|
79
+ # puts "#{transcript.language_code}: #{transcript.language}"
80
+ # end
81
+ #
82
+ def list(video_id)
83
+ @fetcher.fetch(video_id)
84
+ end
85
+
86
+ # Fetch transcripts for multiple videos.
87
+ #
88
+ # @param video_ids [Array<String>] Array of YouTube video IDs
89
+ # @param languages [Array<String>] Language codes in order of preference (default: ["en"])
90
+ # @param preserve_formatting [Boolean] Whether to preserve HTML formatting (default: false)
91
+ # @param continue_on_error [Boolean] Whether to continue if a video fails (default: false)
92
+ # @yield [video_id, result] Block called for each video with either transcript or error
93
+ # @yieldparam video_id [String] The video ID being processed
94
+ # @yieldparam result [FetchedTranscript, StandardError] The transcript or error
95
+ # @return [Hash<String, FetchedTranscript>] Hash mapping video IDs to successfully fetched transcripts (failed videos are omitted)
96
+ # @raise [CouldNotRetrieveTranscript] If any video fails and continue_on_error is false
97
+ #
98
+ # @example Fetch multiple videos
99
+ # api = YouTubeTranscriptApi.new
100
+ # transcripts = api.fetch_all(["video1", "video2", "video3"])
101
+ # transcripts.each { |id, t| puts "#{id}: #{t.length} snippets" }
102
+ #
103
+ # @example With error handling
104
+ # api = YouTubeTranscriptApi.new
105
+ # api.fetch_all(["video1", "video2"], continue_on_error: true) do |video_id, result|
106
+ # if result.is_a?(StandardError)
107
+ # puts "Error for #{video_id}: #{result.message}"
108
+ # else
109
+ # puts "Got #{result.length} snippets for #{video_id}"
110
+ # end
111
+ # end
112
+ #
113
+ def fetch_all(video_ids, languages: ["en"], preserve_formatting: false, continue_on_error: false)
114
+ results = {}
115
+
116
+ video_ids.each do |video_id|
117
+ begin
118
+ transcript = fetch(video_id, languages: languages, preserve_formatting: preserve_formatting)
119
+ results[video_id] = transcript
120
+ yield(video_id, transcript) if block_given?
121
+ rescue CouldNotRetrieveTranscript => e
122
+ if continue_on_error
123
+ yield(video_id, e) if block_given?
124
+ else
125
+ raise
126
+ end
127
+ end
128
+ end
129
+
130
+ results
131
+ end
132
+
133
+ private
134
+
135
+ # Build the default Faraday HTTP client
136
+ #
137
+ # @return [Faraday::Connection] The configured HTTP client
138
+ def build_default_http_client
139
+ Faraday.new do |conn|
140
+ conn.options.timeout = DEFAULT_TIMEOUT
141
+ conn.options.open_timeout = DEFAULT_TIMEOUT
142
+ conn.request :url_encoded
143
+ conn.response :follow_redirects
144
+ conn.adapter Faraday.default_adapter
145
+ end
146
+ end
147
+ end
148
+ end
149
+ end
150
+ end