movieDB 0.3.4 → 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.DS_Store +0 -0
- data/.coveralls.yml +2 -0
- data/.gitignore +3 -1
- data/.rspec +2 -0
- data/.travis.yml +5 -0
- data/Gemfile +8 -3
- data/README.md +250 -103
- data/Rakefile +3 -0
- data/lib/movieDB.rb +22 -141
- data/lib/movieDB/base.rb +3 -10
- data/lib/movieDB/data_analysis/statistics.rb +85 -0
- data/lib/movieDB/data_store.rb +83 -0
- data/lib/movieDB/relation/query_methods.rb +139 -0
- data/lib/movieDB/secret.rb +2 -6
- data/lib/movieDB/support/reporting.rb +19 -0
- data/lib/movieDB/version.rb +1 -1
- data/movieDB.gemspec +6 -6
- data/spec/movieDB/data_analysis/statistics_spec.rb +105 -0
- data/spec/movieDB/data_store_spec.rb +31 -0
- data/spec/movieDB/relation/query_methods_spec.rb +71 -0
- data/spec/movieDB/support/reporting_spec.rb +12 -0
- data/spec/movieDB_spec.rb +24 -0
- data/spec/spec_helper.rb +29 -0
- metadata +33 -23
- data/lib/movieDB/data_analysis.rb +0 -263
- data/lib/movieDB/data_export.rb +0 -96
- data/lib/movieDB/data_process.rb +0 -26
- data/lib/movieDB/movie_error.rb +0 -20
- data/lib/movieDB/status_checker.rb +0 -48
- data/test/unit/test_movie_db.rb +0 -97
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA1:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 18eb3efc3ad17c7cb5b6134a951402964af0b5da
|
|
4
|
+
data.tar.gz: 7eeb628cab2588293f92aa458eeabb903cbfb6ec
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 09c390931150038ca9b18d52ffdd084a805b344c4272876c9ff6880ee2c4ae3da5623be445cc9b3ad6e7f192e60594fb62a682ebad4168f68d08f8e07b3629b5
|
|
7
|
+
data.tar.gz: 4b559dcc3e841b450187b87f1060039033109bdce2cd4791ae5c9628bea929db875acff15748e807f9e6444b2eeddcd03ff37b0b2b62d4b324bffdf30468d068
|
data/.DS_Store
ADDED
|
Binary file
|
data/.coveralls.yml
ADDED
data/.gitignore
CHANGED
data/.rspec
ADDED
data/.travis.yml
CHANGED
data/Gemfile
CHANGED
|
@@ -2,7 +2,12 @@ source 'https://rubygems.org'
|
|
|
2
2
|
|
|
3
3
|
gemspec
|
|
4
4
|
|
|
5
|
-
|
|
6
|
-
|
|
7
|
-
|
|
5
|
+
gem 'coveralls', require: false
|
|
6
|
+
gem 'rspec'
|
|
7
|
+
|
|
8
8
|
|
|
9
|
+
# Code coverage analysis tool for Ruby.
|
|
10
|
+
group :test do
|
|
11
|
+
gem 'simplecov', :require => false
|
|
12
|
+
gem 'pullreview-coverage', require: false
|
|
13
|
+
end
|
data/README.md
CHANGED
|
@@ -1,164 +1,308 @@
|
|
|
1
|
-
|
|
1
|
+
# MovieDB
|
|
2
2
|
|
|
3
|
-
MovieDB is a ruby wrapper for
|
|
4
|
-
The objective and usage of this tool is to
|
|
5
|
-
|
|
6
|
-
The fetched data is stored in memory using Redis and has an expiration time of 1800 seconds for all cached objects.
|
|
3
|
+
MovieDB is a multi-threaded Ruby wrapper for performing advanced statistical computation and high-level data analysis on movie or TV data from IMDb.
|
|
4
|
+
The objective of this tool is to help producers, directors, and writers make logical business decisions that will generate a profitable ROI.
|
|
7
5
|
|
|
6
|
+
## Badges
|
|
7
|
+
- [](https://www.pullreview.com/github/keeperofthenecklace/movieDB/reviews/master)
|
|
8
|
+
- [](https://gemnasium.com/keeperofthenecklace/movieDB)
|
|
8
9
|
- [](https://gitter.im/keeperofthenecklace/movieDB?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
|
|
9
|
-
- [](https://coveralls.io/
|
|
10
|
+
- [](https://coveralls.io/github/keeperofthenecklace/movieDB?branch=master)
|
|
10
11
|
- [](https://codeclimate.com/github/keeperofthenecklace/movieDB)
|
|
11
12
|
- [](http://badge.fury.io/rb/movieDB)
|
|
12
13
|
- [](http://travis-ci.org/keeperofthenecklace/movieDB)
|
|
13
14
|
|
|
14
|
-
##
|
|
15
|
+
## Technology
|
|
16
|
+
* SciRuby is used for all statistical and scientific computations.
|
|
17
|
+
* Redis is used to store all data.
|
|
18
|
+
* IMDb and TMDb are the sources for all film / TV data.
|
|
19
|
+
* BoxOfficeMojo is where we will be scraping future film / TV data.
|
|
20
|
+
* Celluloid is used to build the fault-tolerant concurrent programs. Note, if you are using MRI or YARV,
|
|
21
|
+
threads will not run Ruby code in parallel, since these interpreters have a Global Interpreter Lock (GIL).
|
|
22
|
+
Fortunately, you can use JRuby or Rubinius, since they don’t have a GIL and support real parallel threading.
|
|
23
|
+
|
|
24
|
+
## Requirements
|
|
25
|
+
ruby-2.2.2 or higher
|
|
26
|
+
|
|
27
|
+
jruby-9.0.0.0
|
|
28
|
+
|
|
29
|
+
## Category
|
|
30
|
+
movieDB is broken down into 3 components, namely:
|
|
31
|
+
|
|
32
|
+
* Statistics
|
|
33
|
+
* Visualizations (Work in progress)
|
|
34
|
+
* DataMining (Work in progress)
|
|
15
35
|
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
36
|
+
# Statistics
|
|
37
|
+
|
|
38
|
+
Simple statistical analysis on numeric data.
|
|
39
|
+
The corresponding computation is performed on
|
|
40
|
+
both numeric and string vectors within the
|
|
41
|
+
collected data.
|
|
20
42
|
|
|
21
43
|
## Installation
|
|
22
44
|
|
|
23
|
-
|
|
45
|
+
Redis Installation
|
|
24
46
|
|
|
25
|
-
|
|
47
|
+
This tutorial doesn't cover redis installation.
|
|
48
|
+
You will find that information at: http://redis.io/topics/quickstart
|
|
26
49
|
|
|
27
|
-
|
|
50
|
+
movieDB is available through [Rubygems](http://rubygems.org/gems/movieDB) and can be installed via Gemfile.
|
|
28
51
|
|
|
29
|
-
|
|
52
|
+
``` ruby
|
|
53
|
+
gem 'movieDB'
|
|
54
|
+
```
|
|
30
55
|
|
|
31
56
|
And then execute:
|
|
32
57
|
|
|
33
|
-
|
|
58
|
+
``` bash
|
|
59
|
+
$ bundle install
|
|
60
|
+
```
|
|
34
61
|
|
|
35
62
|
Or install it yourself as:
|
|
36
63
|
|
|
37
|
-
|
|
64
|
+
``` bash
|
|
65
|
+
gem install movieDB
|
|
66
|
+
```
|
|
67
|
+
|
|
68
|
+
## Console - loading the libraries
|
|
38
69
|
|
|
39
|
-
|
|
70
|
+
``` bash
|
|
71
|
+
$ irb
|
|
72
|
+
```
|
|
40
73
|
|
|
41
|
-
|
|
74
|
+
## Require the gem
|
|
42
75
|
|
|
43
|
-
|
|
76
|
+
```ruby
|
|
77
|
+
require 'movieDB'
|
|
78
|
+
```
|
|
44
79
|
|
|
45
|
-
##
|
|
80
|
+
## Initialize MovieDB (multi-thread setup)
|
|
46
81
|
|
|
47
|
-
|
|
82
|
+
``` ruby
|
|
83
|
+
m = MovieDB::Movie.pool(size: 2)
|
|
84
|
+
```
|
|
85
|
+
## Step Process
|
|
48
86
|
|
|
49
|
-
|
|
87
|
+
Fetching and analysing movie / TV data using movieDB is a simple 2-step process.
|
|
50
88
|
|
|
51
|
-
|
|
89
|
+
First, fetch the data from IMDb.
|
|
52
90
|
|
|
53
|
-
|
|
91
|
+
Next, run your choice of statistic.
|
|
54
92
|
|
|
55
|
-
|
|
93
|
+
That's it! It is that simple.
|
|
56
94
|
|
|
57
|
-
|
|
95
|
+
## Part 1 - Fetch Data from IMDb
|
|
58
96
|
|
|
59
|
-
|
|
97
|
+
There are 2 ways to find IMDb ids.
|
|
60
98
|
|
|
61
|
-
|
|
99
|
+
* Finding specific IMDb ids
|
|
62
100
|
|
|
63
|
-
|
|
101
|
+
* Finding random IMDb ids.
|
|
64
102
|
|
|
65
|
-
|
|
103
|
+
### Fetching specific IMDb ids
|
|
66
104
|
|
|
67
|
-
|
|
105
|
+
To find the IMDb id for a specific movie, go to:
|
|
68
106
|
|
|
69
|
-
|
|
107
|
+
```bash
|
|
108
|
+
http://www.imdb.com
|
|
109
|
+
```
|
|
110
|
+
Search for your movie of choice. Once you do, IMDb redirects you to the movie's page.
|
|
70
111
|
|
|
71
|
-
|
|
112
|
+
The URL of the page you are redirected to includes the IMDb id.
|
|
72
113
|
|
|
73
|
-
|
|
114
|
+
``` ruby
|
|
115
|
+
http://www.imdb.com/title/tt0369610/
|
|
116
|
+
```
|
|
117
|
+
0369610 is the IMDb id.
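
As a minimal sketch of this step, the id can be pulled out of a title URL with a regular expression. The helper name `imdb_id_from_url` is hypothetical and is not part of movieDB; it only assumes the `/title/tt<digits>/` URL shape shown above.

``` ruby
# Hypothetical helper (not part of movieDB): extract the numeric IMDb id
# from a title URL such as http://www.imdb.com/title/tt0369610/
def imdb_id_from_url(url)
  match = url.match(%r{/title/tt(\d+)})
  # Return the captured digits as a string, or nil when the URL has no id.
  match && match[1]
end

puts imdb_id_from_url('http://www.imdb.com/title/tt0369610/')
```

The id is kept as a string so that leading zeros survive.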
|
|
74
118
|
|
|
75
|
-
|
|
119
|
+
### Fetching random IMDb ids (multi-thread setup)
|
|
120
|
+
You can also fetch random IMDb ids.
|
|
76
121
|
|
|
77
|
-
|
|
122
|
+
``` ruby
|
|
123
|
+
r = Random.new
|
|
78
124
|
|
|
79
|
-
|
|
125
|
+
39.times do |i|
|
|
126
|
+
m.async.fetch(sprintf('%07d', r.rand(300000)))
|
|
127
|
+
sleep(4)
|
|
128
|
+
end
|
|
80
129
|
|
|
81
|
-
|
|
130
|
+
sleep(10)
|
|
131
|
+
```
|
|
132
|
+
Note: IMDb has a rate limit of 40 requests every 10 seconds; the limit is applied per IP address, not per API key.
|
|
133
|
+
If you exceed the limit, you will receive a 429 HTTP status with a 'Retry-After' header.
|
|
134
|
+
As soon as your cool-down period expires, you are free to continue making requests.
|
|
82
135
|
|
|
83
|
-
|
|
136
|
+
Also, movieDB will raise a NameError if the randomly generated IMDb id is invalid.
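
The fixed `sleep(4)` in the loop above is one way to stay under the limit. As an alternative, here is a minimal sketch of a sliding-window throttle in plain Ruby. The `Throttle` class, its parameters, and the injectable `clock` and `sleeper` are all hypothetical and not part of movieDB; the limit of 40 requests per 10 seconds is taken from the note above.

``` ruby
# Hypothetical helper (not part of movieDB): allow at most `limit` calls
# per `window` seconds, sleeping when the window is full.
class Throttle
  def initialize(limit:, window:, clock: nil, sleeper: nil)
    @limit = limit
    @window = window
    # The clock and sleeper are injectable so the class can be tested
    # without real waiting.
    @clock = clock || -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
    @sleeper = sleeper || ->(seconds) { sleep(seconds) }
    @stamps = []
  end

  # Blocks until a request slot is free, then records the request.
  def wait
    loop do
      now = @clock.call
      # Forget requests that have left the window.
      @stamps.reject! { |stamp| now - stamp >= @window }
      break if @stamps.size < @limit

      # Sleep until the oldest recorded request leaves the window.
      @sleeper.call(@window - (now - @stamps.first))
    end
    @stamps << @clock.call
  end
end

throttle = Throttle.new(limit: 40, window: 10)
3.times { throttle.wait } # returns immediately while under the limit
```

Calling `throttle.wait` before each fetch would pace the requests only when the window is actually full, rather than pausing after every call.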
|
|
84
137
|
|
|
85
|
-
|
|
138
|
+
### Get Movie Data
|
|
86
139
|
|
|
87
|
-
|
|
140
|
+
``` ruby
|
|
141
|
+
m.async.fetch("0369610", "3079380", "0478970")
|
|
142
|
+
```
|
|
143
|
+
Calling m.async instructs Celluloid to invoke the given method asynchronously.
|
|
144
|
+
This means that rather than waiting for the responses from IMDb and TMDb, the caller sends a
|
|
145
|
+
message to the concurrent object that you'd like the given method invoked, and then the caller proceeds without waiting for a response.
|
|
146
|
+
The concurrent object receiving the message will then process the method call in the background.
|
|
88
147
|
|
|
89
|
-
|
|
148
|
+
Asynchronous calls never raise an exception in the caller, even if one occurs while the receiver is processing the call.
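
To illustrate the fire-and-forget behaviour described above, here is a minimal sketch that uses only Ruby's standard `Thread` and `Queue`. It is an analogy, not Celluloid's implementation: the `BackgroundWorker` class and its methods are hypothetical.

``` ruby
require 'thread'

# Hypothetical stand-in for a concurrent object (not Celluloid itself):
# jobs run on one background thread, and failures never reach the caller.
class BackgroundWorker
  def initialize
    @jobs = Queue.new
    @errors = Queue.new
    @thread = Thread.new do
      # A nil job is the signal to stop.
      while (job = @jobs.pop)
        begin
          job.call
        rescue StandardError => e
          @errors << e # recorded here instead of being raised in the caller
        end
      end
    end
  end

  # Enqueue the block and return immediately, without waiting for a result.
  def async(&job)
    @jobs << job
    nil
  end

  # Finish the queued jobs and stop the background thread.
  def shutdown
    @jobs << nil
    @thread.join
  end

  # Errors collected from failed jobs so far.
  def errors
    list = []
    list << @errors.pop until @errors.empty?
    list
  end
end

worker = BackgroundWorker.new
worker.async { raise 'boom' } # the caller carries on regardless
worker.shutdown
puts worker.errors.map(&:message).inspect
```

The caller only learns about the failure if it inspects the collected errors afterwards, which is the trade-off of asynchronous calls.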
|
|
90
149
|
|
|
91
|
-
|
|
150
|
+
### Redis - caching objects
|
|
92
151
|
|
|
93
|
-
|
|
94
|
-
> Lack_Of_Fit_Sum_Of_Squares
|
|
95
|
-
> Least_Squares_Support_Vector_Machine
|
|
96
|
-
> Mean_Squared_Error
|
|
97
|
-
> Moving_Least_Sqares
|
|
98
|
-
> Non_Linear_Iterative_Partial_Least_Squares
|
|
99
|
-
> Non_Linear_Least_Squares
|
|
100
|
-
> Ordinary_Least_Squares
|
|
101
|
-
> Partial_Least_Squares_Regression
|
|
102
|
-
> Partition_Of_Sums_Of_Squares
|
|
103
|
-
> Proofs_Involving_Ordinary_Least_Squares
|
|
104
|
-
> Residual_Sum_Of_Squares
|
|
105
|
-
> Total_Least_Squares
|
|
106
|
-
> Total_Sum_Of_Squares
|
|
152
|
+
By default, any movie fetched from IMDb is stored in Redis with an expiration time of 1800 seconds (30 minutes).
|
|
107
153
|
|
|
108
|
-
|
|
154
|
+
But you can change this expiration time.
|
|
109
155
|
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
|
|
114
|
-
> Multivariate_Kernel_Density_Estimation
|
|
115
|
-
> Variable_Kernel_Density_Estimation
|
|
156
|
+
```ruby
|
|
157
|
+
m.async.fetch("0369610", "3079380", expire: 86400)
|
|
158
|
+
```
|
|
159
|
+
Here, the expiration time is set to 86400 seconds, which is equivalent to 24 hours.
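
Redis handles expiry on the server, so movieDB itself does not need logic like the following. The sketch below is only an in-memory illustration of what an expiration time means for a cached object; the `ExpiringStore` class and its methods are hypothetical, and the 1800-second default mirrors the value stated above.

``` ruby
# Hypothetical in-memory stand-in for the Redis cache (not part of movieDB).
class ExpiringStore
  DEFAULT_TTL = 1800 # seconds, the default expiration described above

  def initialize(clock: nil)
    # The clock is injectable so expiry can be tested without waiting.
    @clock = clock || -> { Time.now.to_f }
    @entries = {}
  end

  # Store a value together with the time at which it stops being valid.
  def write(key, value, expire: DEFAULT_TTL)
    @entries[key] = [value, @clock.call + expire]
    value
  end

  # Return the value, or nil once the entry is missing or has expired.
  def read(key)
    value, deadline = @entries[key]
    return nil if deadline.nil?

    if @clock.call >= deadline
      @entries.delete(key)
      return nil
    end
    value
  end

  # Remaining lifetime in whole seconds, or nil if missing or expired.
  def ttl(key)
    _, deadline = @entries[key]
    return nil if deadline.nil?

    remaining = deadline - @clock.call
    remaining > 0 ? remaining.ceil : nil
  end
end

store = ExpiringStore.new
store.write('0369610', { 'title' => 'example' }, expire: 86_400)
puts store.ttl('0369610')
```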
|
|
116
160
|
|
|
117
|
-
|
|
161
|
+
## Part 2 - Run the statistic
|
|
118
162
|
|
|
119
|
-
|
|
120
|
-
> Table_Diagonalization
|
|
121
|
-
> Configural_Frequency_Analysis
|
|
122
|
-
> Median_Polish
|
|
123
|
-
> Stem_And_Leaf_Display
|
|
163
|
+
Below, we've collected 3 specific IMDb ids to analyze.
|
|
124
164
|
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
> Dimension_Reduction
|
|
129
|
-
> Applied_DataMining
|
|
165
|
+
* Ant Man - 0478970
|
|
166
|
+
* Jurassic World - 0369610
|
|
167
|
+
* Spy - 3079380
|
|
130
168
|
|
|
131
|
-
|
|
132
|
-
|
|
169
|
+
Finding the Mean value.
|
|
170
|
+
```ruby
|
|
171
|
+
m.mean
|
|
172
|
+
```
|
|
173
|
+
Below is the result generated.
|
|
133
174
|
|
|
134
|
-
|
|
135
|
-
|
|
136
|
-
|
|
137
|
-
|
|
138
|
-
|
|
139
|
-
|
|
175
|
+
```ruby
|
|
176
|
+
mean
|
|
177
|
+
ant-man 576.8444444444444
|
|
178
|
+
jurassic_world 512.5111111111111
|
|
179
|
+
spy 369.73333333333335
|
|
180
|
+
```
|
|
181
|
+
Below are more statistical methods you can invoke on your objects.
|
|
140
182
|
|
|
141
|
-
|
|
142
|
-
> Statistical_Outliers
|
|
143
|
-
> Regression_And_Curve_Fitting_Software
|
|
144
|
-
> Regression_Diagnostics
|
|
145
|
-
> Regression_Variable_Selection
|
|
146
|
-
> Regression_With_Time_Series_Structure
|
|
147
|
-
> Robust_Regression
|
|
148
|
-
> Choice_Modeling
|
|
183
|
+
Feel free to try them out.
|
|
149
184
|
|
|
150
|
-
|
|
151
|
-
|
|
185
|
+
* std
|
|
186
|
+
* sum
|
|
187
|
+
* count
|
|
188
|
+
* max
|
|
189
|
+
* min
|
|
190
|
+
* product
|
|
191
|
+
* standardize
|
|
192
|
+
* describe
|
|
193
|
+
* covariance
|
|
194
|
+
* correlation
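
movieDB delegates these computations to SciRuby. Purely to show what `mean`, `std` and `standardize` compute, here is a plain-Ruby sketch; the function names are hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption. That assumption is at least consistent with the standardized values of roughly 1.5 and -0.5 printed in the Filters section of this README.

``` ruby
# Plain-Ruby sketch (not movieDB's or SciRuby's implementation).

# Arithmetic mean of a list of numbers.
def mean(values)
  values.reduce(0.0, :+) / values.size
end

# Sample standard deviation (divides by n - 1; this choice is an assumption).
def sample_std(values)
  m = mean(values)
  sum_sq = values.reduce(0.0) { |acc, v| acc + (v - m)**2 }
  Math.sqrt(sum_sq / (values.size - 1))
end

# Z-scores: each value's distance from the mean, in standard deviations.
def standardize(values)
  m = mean(values)
  s = sample_std(values)
  values.map { |v| (v - m) / s }
end

p mean([100, 0, 0, 0])        # => 25.0
p standardize([100, 0, 0, 0]) # => [1.5, -0.5, -0.5, -0.5]
```

With one large value and three values near zero, the z-scores come out at about 1.5 and -0.5, which is the pattern visible in the standardize output.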
|
|
152
195
|
|
|
153
|
-
|
|
154
|
-
> Variance_Based_Sensitivity_Analysis
|
|
155
|
-
> Elementary_Effects_Method
|
|
156
|
-
> Experimental_Uncertainty_Analysis
|
|
157
|
-
> Fourier_Amplitude_Sensitivity_Testing
|
|
158
|
-
> Hyperparameter
|
|
196
|
+
### Layout and Template
|
|
159
197
|
|
|
160
|
-
|
|
161
|
-
|
|
198
|
+
movieDB allows you to view all your data fields in a worksheet style layout.
|
|
199
|
+
|
|
200
|
+
``` ruby
|
|
201
|
+
m.worksheet
|
|
202
|
+
```
|
|
203
|
+
A total of 45 fields are printed out, but we've truncated the result for ease of reading.
|
|
204
|
+
|
|
205
|
+
|
|
206
|
+
``` ruby
|
|
207
|
+
ant-man jurassic_w spy
|
|
208
|
+
production 177 128 40
|
|
209
|
+
belongs_to 0 151 0
|
|
210
|
+
plot_synop 9083 0 9629
|
|
211
|
+
company 14 18 21
|
|
212
|
+
title 7 14 3
|
|
213
|
+
filming_lo 267 1037 530
|
|
214
|
+
cast_chara 4094 5894 1001
|
|
215
|
+
trailer_ur 0 46 45
|
|
216
|
+
cast_membe 2833 3452 939
|
|
217
|
+
votes 5 6 5
|
|
218
|
+
adult 5 5 5
|
|
219
|
+
also_known 928 1601 1195
|
|
220
|
+
director 15 19 13
|
|
221
|
+
plot_summa 373 298 311
|
|
222
|
+
countries 7 16 7
|
|
223
|
+
... ... ... ...
|
|
224
|
+
```
|
|
225
|
+
|
|
226
|
+
## Filters
|
|
227
|
+
|
|
228
|
+
When performing statistics on an object, movieDB by default processes all fields.
|
|
229
|
+
|
|
230
|
+
Alternatively, you can choose which fields are processed by using one of the following 2 filters.
|
|
231
|
+
|
|
232
|
+
* only
|
|
233
|
+
* except
|
|
234
|
+
|
|
235
|
+
'only' analyzes the fields you provide.
|
|
236
|
+
|
|
237
|
+
'except' is the inverse of 'only'. It analyzes all the fields you did not provide.
|
|
238
|
+
|
|
239
|
+
``` ruby
|
|
240
|
+
m.standardize only: [:budget, :revenue, :length, :vote_average]
|
|
241
|
+
|
|
242
|
+
```
|
|
243
|
+
Processes only budget, revenue, length and vote_average values.
|
|
244
|
+
``` ruby
|
|
245
|
+
ant-man jurassic_w spy
|
|
246
|
+
budget 1.49999999 -0.3616594 1.49999999
|
|
247
|
+
revenue -0.5000006 1.49304559 -0.5000013
|
|
248
|
+
length -0.4999988 -0.5656929 -0.4999976
|
|
249
|
+
vote_avera -0.5000005 -0.5656931 -0.5000010
|
|
250
|
+
```
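
The selection logic behind `only` and `except` can be sketched in plain Ruby as follows. The helper `filter_fields` and the sample record are hypothetical and are not movieDB's implementation; the field values are made up for illustration.

``` ruby
# Hypothetical helper (not part of movieDB): choose which fields to analyze.
def filter_fields(record, only: nil, except: nil)
  if only
    # Keep just the listed fields.
    record.select { |field, _| only.include?(field) }
  elsif except
    # Keep everything that is not listed.
    record.reject { |field, _| except.include?(field) }
  else
    # No filter given: process all fields, as movieDB does by default.
    record
  end
end

# Illustrative values only.
record = { budget: 10, revenue: 20, length: 30, title: 40 }

p filter_fields(record, only: [:budget, :revenue])
p filter_fields(record, except: [:title])
```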
|
|
251
|
+
# Commands
|
|
252
|
+
|
|
253
|
+
movieDB comes with commands to help you query or manipulate objects stored in Redis.
|
|
254
|
+
|
|
255
|
+
* HGETALL key
|
|
256
|
+
Get all the fields and values in a hash of the movie
|
|
257
|
+
|
|
258
|
+
``` ruby
|
|
259
|
+
m.hgetall(["0369610"])
|
|
260
|
+
# => {"production_companies"=>"[{\"name\"=>\"Universal Studios\", \"id\"=>13},...}
|
|
261
|
+
```
|
|
262
|
+
* HKEYS key
|
|
263
|
+
Get all the fields in a hash of the movie
|
|
264
|
+
|
|
265
|
+
``` ruby
|
|
266
|
+
m.hkeys
|
|
267
|
+
# => ["production_companies", "belongs_to_collection", "plot_synopsis", "company", "title",...]
|
|
268
|
+
```
|
|
269
|
+
|
|
270
|
+
* HVALS key
|
|
271
|
+
Get all the values in a hash of the movie
|
|
272
|
+
|
|
273
|
+
``` ruby
|
|
274
|
+
m.hvals
|
|
275
|
+
# => ["[{\"name\"=>\"Universal Studios\", \"id\"=>13}, {\"name\"=>\"Amblin Entertainment\",...]
|
|
276
|
+
```
|
|
277
|
+
|
|
278
|
+
* ALL_IDS key
|
|
279
|
+
Get the ids of all movies
|
|
280
|
+
|
|
281
|
+
``` ruby
|
|
282
|
+
m.all_ids
|
|
283
|
+
# => ["0369610", "3079380"...]
|
|
284
|
+
```
|
|
285
|
+
|
|
286
|
+
* TTL key
|
|
287
|
+
Gets the remaining time to live of a movie.
|
|
288
|
+
|
|
289
|
+
``` ruby
|
|
290
|
+
m.ttl("0369610")
|
|
291
|
+
# => 120
|
|
292
|
+
```
|
|
293
|
+
|
|
294
|
+
* DELETE_ALL key
|
|
295
|
+
Deletes all movie objects stored in Redis.
|
|
296
|
+
|
|
297
|
+
``` ruby
|
|
298
|
+
m.delete_all
|
|
299
|
+
# => []
|
|
300
|
+
```
|
|
301
|
+
# Visualizations
|
|
302
|
+
(Work in progress)
|
|
303
|
+
|
|
304
|
+
# Data mining
|
|
305
|
+
(Work in progress)
|
|
162
306
|
|
|
163
307
|
## Contact me
|
|
164
308
|
|
|
@@ -166,4 +310,7 @@ If you'd like to collaborate, please feel free to fork source code on github.
|
|
|
166
310
|
|
|
167
311
|
You can also contact me at albertmck@gmail.com
|
|
168
312
|
|
|
169
|
-
|
|
313
|
+
## Disclaimer
|
|
314
|
+
This software is provided “as is” and without any express or implied warranties, including, without limitation, the implied warranties of merchantability and fitness for a particular purpose.
|
|
315
|
+
|
|
316
|
+
###### Copyright (c) 2013 - 2015 Albert McKeever, released under MIT license
|