youtube-data 0.1.0 → 0.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 9d7d85c28c1258272db9500f0aaf57b66cd831b8d8134b411a0e25ced0cd4912
4
- data.tar.gz: 2de11bacc9a8ebd4b075d23c6da2b6b2f2e0f8898e8ee1f70ae2ebac30216a21
3
+ metadata.gz: 06f7a815e38fca918a2b3544866ac7e605e2b16e891b787c7a0dca89411c67ba
4
+ data.tar.gz: 244076bb34145255f00431839ea90971b15dc08f598bdf7fdf3b351314a5e620
5
5
  SHA512:
6
- metadata.gz: b85bf4e5f90c9bf857f2853fb971c8d61b53f9fff978d318c02f70f4458bbf1bda169954c2de6f22d2a6aa42f6c25d114cba51b760b9fd472c7dee24aa5cd96a
7
- data.tar.gz: 9d2fb4892bda2a428c0da64fc27def5cf243b39fba5fec8245f178da8937c05a16f2afadcbce621ea72c271c748ca3bc604847e8b799063b02567b1748a47e7a
6
+ metadata.gz: d2b611795e7ef374a4f4731322dbb9dd786be6ada843c1e52ac2e2a9c4c89f36e7ddd512f3c98db5d0601ce101fd8a42039952d73e8aaf482c596926782ce27b
7
+ data.tar.gz: cf11496789ec3530bcf36ae1affcda019ec53a7f888a5ef66187aa5d9cdaf763d49bec8b250000ca9e34dd609369bf63bab1e659bdbdb16353cc89e4f282b3a0
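The new digests above can be checked against a downloaded copy of the gem. What follows is a minimal sketch, assuming the `.gem` archive (a tar file that contains `metadata.gz` and `data.tar.gz`) has already been unpacked into the working directory. `checksum_matches?` is a hypothetical helper name, not part of RubyGems or youtube-data.

```ruby
require "digest"

# Hypothetical helper: returns true when the file at `path` hashes to
# `expected_hex` under the given digest class (SHA256 by default).
def checksum_matches?(path, expected_hex, algorithm: Digest::SHA256)
  algorithm.file(path).hexdigest == expected_hex.downcase
end

# Example call (assumes data.tar.gz from the 0.1.1 archive is present):
#   checksum_matches?(
#     "data.tar.gz",
#     "244076bb34145255f00431839ea90971b15dc08f598bdf7fdf3b351314a5e620"
#   )
```

Passing `algorithm: Digest::SHA512` checks the longer digests in the same way.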
data/Gemfile CHANGED
@@ -1,8 +1,8 @@
1
- # frozen_string_literal: true
2
-
3
- source "https://rubygems.org"
4
-
5
- # Specify your gem's dependencies in youtube.gemspec
6
- gemspec
7
-
8
- gem "rake", "~> 13.0"
1
+ # frozen_string_literal: true
2
+
3
+ source "https://rubygems.org"
4
+
5
+ # Specify your gem's dependencies in youtube.gemspec
6
+ gemspec
7
+
8
+ gem "rake", "~> 13.0"
data/LICENSE.txt CHANGED
@@ -1,21 +1,21 @@
1
- The MIT License (MIT)
2
-
3
- Copyright (c) 2024 https://github.com/boddz
4
-
5
- Permission is hereby granted, free of charge, to any person obtaining a copy
6
- of this software and associated documentation files (the "Software"), to deal
7
- in the Software without restriction, including without limitation the rights
8
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
- copies of the Software, and to permit persons to whom the Software is
10
- furnished to do so, subject to the following conditions:
11
-
12
- The above copyright notice and this permission notice shall be included in
13
- all copies or substantial portions of the Software.
14
-
15
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
- THE SOFTWARE.
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2024 https://github.com/boddz
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md CHANGED
@@ -1,61 +1,65 @@
1
- # Youtube
2
-
3
- A ruby gem for extracting youtube video data.
4
-
5
- Currently a work in progress. This is just a mirror at the moment so support will be limited because of this for now,
6
- and I am still new to Ruby, so some things will be changed a lot in the future most likely.
7
-
8
-
9
- ## Installation
10
-
11
- Written and tested with `ruby 3.0.2` on `Ubuntu 22.04.3 LTS`.
12
-
13
- Install required gems:
14
-
15
- ```bash
16
- bundle install
17
- ```
18
-
19
- Build and install gem file locally:
20
-
21
- ```bash
22
- gem build && gem install youtube-data-[version].gem
23
- ```
24
-
25
- Or install from gem server:
26
-
27
- ```bash
28
- gem install youtube-data
29
- ```
30
-
31
-
32
- ## Usage
33
-
34
- TODO: Write usage instructions here...
35
-
36
-
37
- ## Development
38
-
39
- After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive
40
- prompt that will allow you to experiment.
41
-
42
- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the
43
- version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version,
44
- push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
45
-
46
- Running tests:
47
-
48
- ```bash
49
- cd tests # These commands must be run in ./tests for relative path reasons.
50
- ruby test_all.rb # To run all tests, or you can just call an individual test instead.
51
- ```
52
-
53
-
54
- ## Contributing
55
-
56
- Bug reports and pull requests are welcome on [GitHub](https://github.com/boddz/youtube).
57
-
58
-
59
- ## License
60
-
61
- The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
1
+ # Youtube Data Extractor
2
+
3
+ A Ruby gem for extracting youtube video data.
4
+
5
+ This does not use the API that Google provides; the data is gathered by scraping raw HTML. No reason
6
+ really, I just find doing so more fun.
7
+
8
+ This gem is still in its early stages of development, so expect things to change. I will try not to change too much
9
+ in terms of interface methods/classes, so that your existing scripts do not break.
10
+
11
+
12
+ ## Installation
13
+
14
+ Written and tested with `ruby 3.0.2` on `Ubuntu 22.04.3 LTS`.
15
+
16
+ Install required gems:
17
+
18
+ ```bash
19
+ bundle install
20
+ ```
21
+
22
+ Build and install gem file locally:
23
+
24
+ ```bash
25
+ gem build && gem install youtube-data-[version].gem
26
+ ```
27
+
28
+ Or install from gem server:
29
+
30
+ ```bash
31
+ gem install youtube-data
32
+ ```
33
+
34
+
35
+ ## Usage
36
+
37
+ For examples of how to use this gem, see the `examples` sub-directory.
38
+
39
+
40
+ ## Development
41
+
42
+ After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive
43
+ prompt that will allow you to experiment.
44
+
45
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the
46
+ version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version,
47
+ push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
48
+
49
+ Running tests:
50
+
51
+ ```bash
52
+ cd tests # These commands must be run in ./tests for relative path reasons.
53
+ ./data/get_data # Fetch data used in test. Used for mocked session objects to avoid sending requests when testing.
54
+ ruby test_all.rb # To run all tests, or you can just call an individual test instead.
55
+ ```
56
+
57
+
58
+ ## Contributing
59
+
60
+ Bug reports and pull requests are welcome on [GitHub](https://github.com/boddz/youtube).
61
+
62
+
63
+ ## License
64
+
65
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
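The README's test instructions mention mocked session objects that avoid sending requests. The gem's own mock class is not shown in this diff, so the following is only a sketch of what such an object could look like. It assumes the extractor needs nothing more than `get(path)` returning an object that responds to `body`, which is all the extractor code in this diff calls on its session. `FakeSession` and `FakeResponse` are hypothetical names.

```ruby
# Hypothetical stand-in for a Net::HTTP response: only `body` is provided.
FakeResponse = Struct.new(:body)

# Hypothetical stand-in for the session returned by Net::HTTP.start.
# It serves canned pages from a hash and records which paths were requested.
class FakeSession
  attr_reader :requested_paths

  def initialize(pages)
    @pages = pages
    @requested_paths = []
  end

  # Mirrors the one call the extractor makes on its session.
  # Raises KeyError for a path with no canned page.
  def get(path)
    @requested_paths << path
    FakeResponse.new(@pages.fetch(path))
  end
end

# Usage with the gem (not run here; the fixture path is an assumption):
#   html = File.read("data/watch.html")
#   session = FakeSession.new("/watch?v=k598b_JcQOU" => html)
#   Youtube::DataExtractor.new("k598b_JcQOU", mock_session: session)
```

Note that the initializer in this diff calls `Net::HTTP.start` before it substitutes the mock, so a connection may still be opened even when `:mock_session` is passed.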
data/Rakefile CHANGED
@@ -1,4 +1,4 @@
1
- # frozen_string_literal: true
2
-
3
- require "bundler/gem_tasks"
4
- task default: %i[]
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ task default: %i[]
data/TODO.md CHANGED
@@ -1,12 +1,15 @@
1
- # A List of To-dos
2
-
3
- ## Little Things
4
-
5
- - Good tests (with mocks for all requests & sessions).
6
- - Some basic examples.
7
-
8
-
9
- ## Bigger Things
10
-
11
- - Docs at some point.
12
- - Youtube video source URL sig decipher sub module.
1
+ # A List of To-dos
2
+
3
+ ## Little Things
4
+
5
+ - ~~Good tests (with mocks for all requests & sessions).~~
6
+ - Some basic examples.
7
+
8
+
9
+ ## Bigger Things
10
+
11
+ - Docs at some point.
12
+
13
+
14
+ ## Okay, like things that are way bigger to deal with than I thought
15
+ - Youtube video source URL sig decipher sub module.
@@ -0,0 +1,12 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require 'youtube-data'
5
+
6
+ extractor = Youtube::DataExtractor.new('k598b_JcQOU') # Extract, cache, and then parse data.
7
+ video = Youtube::Video.new(extractor) # Interface for the video using extractor data.
8
+
9
+ # Video title, uploader, views and the description.
10
+ puts video.title
11
+ puts "Uploaded by: #{video.uploader} | View count: #{video.views}"
12
+ puts video.description
@@ -0,0 +1,12 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require 'youtube-data'
5
+
6
+ extractor = Youtube::DataExtractor.new("FtutLA63Cp8") # Extract, cache, and parse data.
7
+ thumbnail = Youtube::Thumbnail.new(extractor) # Create a new interface for thumbnails.
8
+
9
+ # Download methods for thumbnails send a separate request because thumbnails are stored on another server.
10
+ thumbnail.download_default
11
+
12
+ puts "Downloaded default thumbnail: `#{thumbnail.default_filename}`!" # Default is the first item in thumbnails array.
@@ -1,138 +1,133 @@
1
- # frozen_string_literal: true
2
-
3
- module Youtube
4
-
5
- class InitExtractorError < RuntimeError
6
- end
7
-
8
- class InvalidVideoIDError < StandardError
9
- end
10
-
11
- class InvalidPathError < StandardError
12
- end
13
-
14
- # For extracting the data needed from a video in a native format.
15
- #
16
- # @param video_id [String] The video ID in which to scrape data from.
17
- #
18
- # Options
19
- # =======
20
- #
21
- # :mock_session => [MockSession] The mock session to use when testing.
22
- #
23
- class DataExtractor
24
-
25
- HOMEPAGE = URI('https://www.youtube.com')
26
-
27
- def initialize(video_id, opts = {})
28
- @video_id = video_id
29
- @video_path = "/watch?v=#{@video_id}"
30
-
31
- # The session to use for the extractor allows information to persist during requests/session.
32
- @session = Net::HTTP.start(HOMEPAGE.hostname, {'use_ssl': true})
33
- if opts.include?(:mock_session)
34
- @session = opts[:mock_session]
35
- end
36
-
37
- # Cache variables for raw data, saving overhead and reducing requests from the client session.
38
- # Required requests.
39
- @video_html = get_raw_html
40
- # Required non-requests.
41
- @video_json_raw = find_raw_json_in_html(@video_html)
42
- @video_player_path = find_player_base_js_path_in_html(@video_html)
43
- end
44
-
45
- # Full url to the video's html page returned/ yielded as a `URI::HTTPS` instance.
46
- def video_uri
47
- uri = URI.join(HOMEPAGE, video_path)
48
- return uri unless block_given?
49
- yield uri
50
- end
51
-
52
- # Full url to the `base.js` video player script returned/ yielded as a `URI::HTTPS` instance.
53
- def player_uri
54
- uri = URI.join(HOMEPAGE, @video_player_path)
55
- return uri unless block_given?
56
- yield uri
57
- end
58
-
59
- # Send a simple get request using the extractor session. This should be how the module sends all further requests
60
- # to the `youtube.com` hostname outside of this class also.
61
- #
62
- # @param path [String] Any valid path on the server, prefixed with `/`.
63
- # @return [Net::HTTPResponse] The untouched response sent back from the request.
64
- # @yield [Net::HTTPResponse] Same as return but yields to block if present.
65
- #
66
- def get_raw(path)
67
- res = get_request_path(path)
68
- return res unless block_given?
69
- yield res
70
- end
71
-
72
- def video_raw_html
73
- return @video_html unless block_given?
74
- yield @video_html
75
- end
76
-
77
- # This is the json data for the video that has not yet been altered in terms of its underlying structure, and
78
- # which is in its purest form, but has been processed in a way that makes it easy to work with either through a
79
- # file or a hash that is returned to the caller.
80
- #
81
- # @param dump_file [String] File path to write to (don't open if not provided).
82
- # @param opt [Hash, String] `:pretty` if a dump file is specified, when set to true then format pretty else raw.
83
- #
84
- # @return [Hash] A hash representing the untouched json data parsed from the raw json html data of a video.
85
- # @yield [Hash] Same as return but yields to block if provided.
86
- #
87
- def video_json_untouched(dump_file = nil, opt = {:pretty => true})
88
- parsed_json = JSON.parse(@video_json_raw)
89
-
90
- if dump_file.nil? == false
91
- File.open("#{dump_file}", 'w') do |json_file|
92
- if opt.include?(:pretty) and opt[:pretty] == true
93
- json_file.write(JSON.pretty_generate(parsed_json))
94
- else
95
- JSON.dump(parsed_json, json_file)
96
- end
97
- end
98
- end
99
-
100
- return parsed_json unless block_given?
101
- yield parsed_json
102
- end
103
-
104
- # Path from the homepage to the video's page.
105
- private def video_path
106
- if @video_id.length != 11 # 11 is the fixed length of a video ID on youtube.
107
- raise InvalidVideoIDError, "The video id `#{@video_id}' is not valid (expected 11 characters)"
108
- end
109
- return @video_path
110
- end
111
-
112
- # Sends a GET request to a specified path on the server using the request handler's session.
113
- private def get_request_path(path)
114
- if path.class != "".class
115
- raise InvalidPathError, 'Path on server must be type `String\''
116
- end
117
- if path.empty? == false and path[0] != "/"
118
- raise InvalidPathError, 'Path must be prefixed with `/\''
119
- end
120
- return @session.get(path)
121
- end
122
-
123
- private def get_raw_html
124
- return get_request_path(video_path).body
125
- end
126
-
127
- private def find_raw_json_in_html(html)
128
- var = html[/ytInitialPlayerResponse.*=.*\{.*\};/] # Regex match containing var.
129
- return var[/\{.*\}/] # From matched var, extract the valid js object.
130
- end
131
-
132
- private def find_player_base_js_path_in_html(html)
133
- return html[/([A-Za-z0-9]+(\/[A-Za-z0-9]+)+)_[A-Za-z0-9]+\.[A-Za-z0-9]+\/[A-Za-z0-9]+_[A-Za-z0-9]+\/base\.js/]
134
- end
135
-
136
- end
137
-
138
- end
1
+ # frozen_string_literal: true
2
+
3
+ module Youtube
4
+
5
+ # For extracting the data needed from a video in a native format.
6
+ #
7
+ # @param video_id [String] The video ID in which to scrape data from.
8
+ #
9
+ # Options
10
+ # =======
11
+ #
12
+ # :mock_session => [MockSession] The mock session to use when testing.
13
+ #
14
+ class DataExtractor
15
+
16
+ HOMEPAGE = URI('https://www.youtube.com')
17
+
18
+ def initialize(video_id, opts = {})
19
+ @video_id = video_id
20
+ @video_path = "/watch?v=#{@video_id}"
21
+
22
+ # The session to use for the extractor allows information to persist during requests/session.
23
+ @session = Net::HTTP.start(HOMEPAGE.hostname, {'use_ssl': true})
24
+ if opts.include?(:mock_session)
25
+ @session = opts[:mock_session]
26
+ end
27
+
28
+ # Cache variables for raw data, saving overhead and reducing requests from the client session.
29
+ # Required requests.
30
+ @video_html = get_raw_html
31
+ # Required non-requests.
32
+ @video_json_raw = find_raw_json_in_html(@video_html)
33
+ @video_player_path = find_player_base_js_path_in_html(@video_html)
34
+ end
35
+
36
+ def inspect
37
+ return itself
38
+ end
39
+
40
+ # Full url to the video's html page returned/ yielded as a `URI::HTTPS` instance.
41
+ def video_uri
42
+ uri = URI.join(HOMEPAGE, video_path)
43
+ return uri unless block_given?
44
+ yield uri
45
+ end
46
+
47
+ # Full url to the `base.js` video player script returned/ yielded as a `URI::HTTPS` instance.
48
+ def player_uri
49
+ uri = URI.join(HOMEPAGE, @video_player_path)
50
+ return uri unless block_given?
51
+ yield uri
52
+ end
53
+
54
+ # Send a simple get request using the extractor session. This should be how the module sends all further requests
55
+ # to the `youtube.com` hostname outside of this class also.
56
+ #
57
+ # @param path [String] Any valid path on the server, prefixed with `/`.
58
+ # @return [Net::HTTPResponse] The untouched response sent back from the request.
59
+ # @yield [Net::HTTPResponse] Same as return but yields to block if present.
60
+ #
61
+ def get_raw(path)
62
+ res = get_request_path(path)
63
+ return res unless block_given?
64
+ yield res
65
+ end
66
+
67
+ def video_raw_html
68
+ return @video_html unless block_given?
69
+ yield @video_html
70
+ end
71
+
72
+ # This is the json data for the video that has not yet been altered in terms of its underlying structure, and
73
+ # which is in its purest form, but has been processed in a way that makes it easy to work with either through a
74
+ # file or a hash that is returned to the caller.
75
+ #
76
+ # @param dump_file [String] File path to write to (don't open if not provided).
77
+ # @param opt [Hash, String] `:pretty` if a dump file is specified, when set to true then format pretty else raw.
78
+ #
79
+ # @return [Hash] A hash representing the untouched json data parsed from the raw json html data of a video.
80
+ # @yield [Hash] Same as return but yields to block if provided.
81
+ #
82
+ def video_json_untouched(dump_file = nil, opt = {:pretty => true})
83
+ parsed_json = JSON.parse(@video_json_raw)
84
+
85
+ if dump_file.nil? == false
86
+ File.open("#{dump_file}", 'w') do |json_file|
87
+ if opt.include?(:pretty) and opt[:pretty] == true
88
+ json_file.write(JSON.pretty_generate(parsed_json))
89
+ else
90
+ JSON.dump(parsed_json, json_file)
91
+ end
92
+ end
93
+ end
94
+
95
+ return parsed_json unless block_given?
96
+ yield parsed_json
97
+ end
98
+
99
+ # Path from the homepage to the video's page.
100
+ private def video_path
101
+ if @video_id.length != 11 # 11 is the fixed length of a video ID on youtube.
102
+ raise InvalidVideoIDError, "The video id `#{@video_id}' is not valid (expected 11 characters)"
103
+ end
104
+ return @video_path
105
+ end
106
+
107
+ # Sends a GET request to a specified path on the server using the request handler's session.
108
+ private def get_request_path(path)
109
+ if path.class != "".class
110
+ raise InvalidPathError, 'Path on server must be type `String\''
111
+ end
112
+ if path.empty? == false and path[0] != "/"
113
+ raise InvalidPathError, 'Path must be prefixed with `/\''
114
+ end
115
+ return @session.get(path)
116
+ end
117
+
118
+ private def get_raw_html
119
+ return get_request_path(video_path).body
120
+ end
121
+
122
+ private def find_raw_json_in_html(html)
123
+ var = html[/ytInitialPlayerResponse.*=.*\{.*\};/] # Regex match containing var.
124
+ return var[/\{.*\}/] # From matched var, extract the valid js object.
125
+ end
126
+
127
+ private def find_player_base_js_path_in_html(html)
128
+ return html[/([A-Za-z0-9]+(\/[A-Za-z0-9]+)+)_[A-Za-z0-9]+\.[A-Za-z0-9]+\/[A-Za-z0-9]+_[A-Za-z0-9]+\/base\.js/]
129
+ end
130
+
131
+ end
132
+
133
+ end
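The JSON extraction step in the extractor can be tried in isolation. The sketch below reuses the same two regular expressions as `find_raw_json_in_html`, but it adds a guard so that a page without the variable yields `nil` rather than a `NoMethodError` on `nil`. The method name `extract_player_response` and the sample HTML are assumptions made for illustration. They are not part of the gem.

```ruby
require "json"

# Hypothetical guarded variant of the extraction step shown in the diff.
# Returns the parsed Hash, or nil when the variable is not found.
def extract_player_response(html)
  # First match the whole assignment, then pull out the object literal.
  assignment = html[/ytInitialPlayerResponse.*=.*\{.*\};/]
  return nil if assignment.nil?

  object_literal = assignment[/\{.*\}/]
  object_literal && JSON.parse(object_literal)
end

# Made-up sample page; real watch pages are far larger.
SAMPLE_HTML =
  '<script>var ytInitialPlayerResponse = {"videoDetails":{"title":"Demo"}};</script>'

data = extract_player_response(SAMPLE_HTML)
puts data["videoDetails"]["title"]
```

Because `.*` is greedy, a line holding several `};` sequences would be captured through the last one, and the result may then fail to parse as JSON. A stricter pattern or a real JavaScript-aware parser would be needed to handle that case.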