movieDB 0.3.4 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: abd7277f09e27f87c05a46760b7604142a13c4f4
- data.tar.gz: 7b7982a640eff36476dcc23dfeb737e87f3ee911
+ metadata.gz: 18eb3efc3ad17c7cb5b6134a951402964af0b5da
+ data.tar.gz: 7eeb628cab2588293f92aa458eeabb903cbfb6ec
  SHA512:
- metadata.gz: 950a7d07348be94e30fb40e74048de44c6548f1794bf0c65ac140a6f2845733a76533c0d49e7ea5dec9f5b8397e76082f8b93580959c08ae0cf3d2e8589cca5c
- data.tar.gz: e8293834822371dec1e2579eeff6e4f6aa9baf035c46ca8a0324d7458ca029c6c875afec8577cbb329448022207bc0c2d20ed0fec243b2c1eb11dee1b3f81863
+ metadata.gz: 09c390931150038ca9b18d52ffdd084a805b344c4272876c9ff6880ee2c4ae3da5623be445cc9b3ad6e7f192e60594fb62a682ebad4168f68d08f8e07b3629b5
+ data.tar.gz: 4b559dcc3e841b450187b87f1060039033109bdce2cd4791ae5c9628bea929db875acff15748e807f9e6444b2eeddcd03ff37b0b2b62d4b324bffdf30468d068
Binary file
@@ -0,0 +1,2 @@
+ service_name: travis-ci
+ repo_token: Uytw7gmmEhTTwfj9STT7rokiH7D0su0GD
data/.gitignore CHANGED
@@ -16,6 +16,8 @@ test/tmp
  test/version_tmp
  tmp
  reports
- .coveralls.yml
  readme_first.txt
  test/.DS_Store
+ coverage
+ *.log
+ .idea/
data/.rspec ADDED
@@ -0,0 +1,2 @@
+ --format documentation
+ --color
@@ -3,6 +3,11 @@ rvm:
  - 2.2.2
  env:
  - TEST_SUITE = units
+ global:
+ - "CI=true"
+ - "TRAVIS=true"
+ after_success:
+ - coveralls --verbose
  before_script:
  - rm -rf reports/*.8
  bundler_args: --without development production --quiet
data/Gemfile CHANGED
@@ -2,7 +2,12 @@ source 'https://rubygems.org'
 
  gemspec
 
- group :documentation do
- gem 'coveralls', :require => false
- end
+ gem 'coveralls', require: false
+ gem 'rspec'
+
 
+ # Code coverage analysis tool for Ruby.
+ group :test do
+ gem 'simplecov', :require => false
+ gem 'pullreview-coverage', require: false
+ end
data/README.md CHANGED
@@ -1,164 +1,308 @@
1
- ## MovieDB
1
+ # MovieDB
2
2
 
3
- MovieDB is a ruby wrapper for fetching raw Movie or TV Data from IMDb and performing a variety of statistical analysis and computation.
4
- The objective and usage of this tool is to help media producers make high level structured decisions based on realistic analysis of actual data.
5
-
6
- The fetched data is stored in memory using Redis and has an expiration time of 1800 seconds for all cached objects.
3
+ MovieDB is a multi-threaded Ruby wrapper for performing advanced statistical computation and high-level data analysis on movie or TV data from IMDb.
+ Its objective is to help producers, directors, and writers make logical business decisions aimed at a profitable return on investment (ROI).
7
5
 
6
+ ## Badges
7
+ - [![PullReview stats](https://www.pullreview.com/github/keeperofthenecklace/movieDB/badges/master.svg?)](https://www.pullreview.com/github/keeperofthenecklace/movieDB/reviews/master)
8
+ - [![Dependency Status](https://gemnasium.com/keeperofthenecklace/movieDB.svg)](https://gemnasium.com/keeperofthenecklace/movieDB)
8
9
  - [![Join the chat at https://gitter.im/keeperofthenecklace/movieDB](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/keeperofthenecklace/movieDB?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
9
- - [![Coverage Status](https://coveralls.io/repos/keeperofthenecklace/movieDB/badge.svg)](https://coveralls.io/r/keeperofthenecklace/movieDB)
10
+ - [![Coverage Status](https://coveralls.io/repos/keeperofthenecklace/movieDB/badge.svg?branch=master&service=github)](https://coveralls.io/github/keeperofthenecklace/movieDB?branch=master)
10
11
  - [![Code Climate](https://codeclimate.com/github/keeperofthenecklace/movieDB.png)](https://codeclimate.com/github/keeperofthenecklace/movieDB)
11
12
  - [![Gem Version](https://badge.fury.io/rb/movieDB.png)](http://badge.fury.io/rb/movieDB)
12
13
  - [![Build Status](https://secure.travis-ci.org/keeperofthenecklace/movieDB.png?branch=master)](http://travis-ci.org/keeperofthenecklace/movieDB)
13
14
 
14
- ## Basic functions and Data Analysis:
15
+ ## Technology
16
+ * SciRuby is used for all statistical and scientific computations.
17
+ * Redis is used to store all data.
18
+ * IMDb and TMDb are the sources for all film / TV data.
+ * BoxOfficeMojo is where we will be scraping future film / TV data.
+ * Celluloid is used to build the fault-tolerant concurrent programs. Note: on MRI (YARV), threads cannot run Ruby code in parallel,
+ because the interpreter has a Global Interpreter Lock (GIL).
+ Fortunately, you can use JRuby or Rubinius, which don’t have a GIL and support real parallel threading.
23
+
24
+ ## Requirements
25
+ ruby-2.2.2 or higher
26
+
27
+ jruby-9.0.0.0
28
+
29
+ ## Category
30
+ movieDB is broken down into 3 components namely:
31
+
32
+ * Statistics
33
+ * Visualizations (Work in progress)
34
+ * DataMining (Work in progress)
15
35
 
16
- * Data Analysis
17
- * Exploratory Data Analysis
18
- * Confirmatory Data Analysis
19
- * More to come...
36
+ # Statistics
37
+
38
+ Simple statistical analysis on numeric data.
39
+ The corresponding computation is performed on
40
+ both numeric and string vectors within the
41
+ collected data.
20
42
 
21
43
  ## Installation
22
44
 
23
- Please make sure you have redis installed.
45
+ Redis Installation
24
46
 
25
- This tutorial doesn't cover redis installation.
47
+ This tutorial doesn't cover redis installation.
48
+ You will find that information at: http://redis.io/topics/quickstart
26
49
 
27
- Add this line to your application's Gemfile:
50
+ movieDB is available through [Rubygems](http://rubygems.org/gems/movieDB) and can be installed via Gemfile.
28
51
 
29
- gem 'movieDB'
52
+ ``` ruby
53
+ gem 'movieDB'
54
+ ```
30
55
 
31
56
  And then execute:
32
57
 
33
- $> bundle install
58
+ ``` bash
59
+ $ bundle install
60
+ ```
34
61
 
35
62
  Or install it yourself as:
36
63
 
37
- $> gem install movieDB
64
+ ``` bash
+ $ gem install movieDB
66
+ ```
67
+
68
+ ## Console - loading the libraries
38
69
 
39
- ## Require - loading the libraries
70
+ ``` bash
71
+ $ irb
72
+ ```
40
73
 
41
- $> irb
74
+ ## Require the gem
42
75
 
43
- $> require 'movieDB'
76
+ ```ruby
77
+ require 'movieDB'
78
+ ```
44
79
 
45
- ## Usage - Fetch Raw Movie Data From IMDb
80
+ ## Initialize MovieDB (multi-thread setup)
46
81
 
47
- $> imdb_ids = ["0369610", "3079380"]
82
+ ``` ruby
83
+ m = MovieDB::Movie.pool(size: 2)
84
+ ```
85
+ ## Step Process
48
86
 
49
- $> MovieDB::Movie.find_imdb_id(imdb_ids)
87
+ Fetching and analysing movie / TV data using movieDB is a simple 2 step process.
50
88
 
51
- /* YOU CAN ADD AS MANY IMDB IDs AS YOU LIKE. BUT DO NOT EXCEED THE MAXIMUM REQUEST RATE. */
89
+ First, fetch the data from IMDb.
52
90
 
53
- ### IMDb Data
91
+ Next, run your choice of statistic.
54
92
 
55
- When IMDb data is fetched, two things happen.
93
+ That's it! It is that simple.
56
94
 
57
- First, a reports folder is created in the movieDB gem folder.
95
+ ## Part 1 - Fetch Data from IMDb
58
96
 
59
- Next, the fetched data is written to an xls format and stored in the reports directory.
97
+ There are 2 ways to find IMDb ids.
60
98
 
61
- From your terminal you can locate movieDB gem directory like this:
99
+ * Finding specific IMDb ids
62
100
 
63
- $ gem content movieDB
101
+ * Finding random IMDb ids.
64
102
 
65
- If you use our above IMDb id, you should find the following xls file.
103
+ ### Fetching specific IMDb ids
66
104
 
67
- Feel free to open it.
105
+ To find the IMDb id for a specific movie, go to:
68
106
 
69
- $ open ../reports/imdb_JurassicWorld_Spy_.xls
107
+ ```bash
108
+ http://www.imdb.com
109
+ ```
110
+ Search for your movie of choice. Once you do, IMDb redirects you to the movie's page.
70
111
 
71
- ## Usage - Analyzing Data From IMDb.
112
+ The URL of that page includes the IMDb id.
72
113
 
73
- $ irb
114
+ ``` ruby
115
+ http://www.imdb.com/title/tt0369610/
116
+ ```
117
+ 0369610 is the IMDb id.
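
The manual lookup above can also be scripted. The following is a minimal sketch, not part of movieDB: `imdb_id_from_url` is a hypothetical helper name, and it assumes the id is the run of digits that follows `/title/tt` in the URL.

```ruby
# Hypothetical helper (not a movieDB method): extract the numeric IMDb id
# from a title URL such as http://www.imdb.com/title/tt0369610/
def imdb_id_from_url(url)
  match = url.match(%r{/title/tt(\d+)})
  match && match[1] # nil when the URL contains no title id
end

puts imdb_id_from_url('http://www.imdb.com/title/tt0369610/') # prints 0369610
```

The id is returned as a string so that leading zeros are preserved.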
74
118
 
75
- > require 'MovieDB/data_analysis'
119
+ ### Fetching random IMDb ids (multi-thread setup)
120
+ You can also fetch IMDb ids at random.
76
121
 
77
- > require 'MovieDB/data_process'
122
+ ``` ruby
123
+ r = Random.new
78
124
 
79
- > MovieDB::DataProcess.send(:basic_statistic, 'imdb_JurassicWorld_Spy_.xls')
125
+ 39.times do
+   # Zero-pad the random number to the 7-digit IMDb id format.
+   m.async.fetch(format('%07d', r.rand(300000)))
+   sleep(4)
+ end
80
129
 
81
- A statistical computation is performed and the results is written to movieDB gem reports folder.
130
+ sleep(10)
131
+ ```
132
+ Note: IMDb has a rate limit of 40 requests every 10 seconds. Requests are limited by IP address, not API key.
+ If you exceed the limit, you will receive a 429 HTTP status with a 'Retry-After' header.
+ As soon as your cool-down period expires, you are free to continue making requests.
82
135
 
83
- Feel free to open it.
136
+ Also, movieDB will throw a NameError if the randomly generated IMDb id is invalid.
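
To make the rate-limit arithmetic concrete, here is a small sketch under stated assumptions. The helper names `min_delay_between_requests` and `random_imdb_id` are illustrative and not part of movieDB; the limit of 40 requests per 10 seconds and the upper bound of 300000 are taken from the text above.

```ruby
REQUESTS_PER_WINDOW = 40   # limit quoted in the README
WINDOW_SECONDS      = 10.0

# Smallest evenly spaced delay that stays within the quoted limit.
def min_delay_between_requests(requests = REQUESTS_PER_WINDOW, window = WINDOW_SECONDS)
  window / requests
end

# Zero-padded 7-digit id string, matching the format used in the README loop.
def random_imdb_id(rng, upper_bound = 300_000)
  format('%07d', rng.rand(upper_bound))
end

rng = Random.new
puts random_imdb_id(rng)
puts min_delay_between_requests # prints 0.25
```

The 4-second sleep in the README loop is far more conservative than the 0.25-second minimum this calculation gives.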
84
137
 
85
- $ open ../reports/Statistic_imdb_JurassicWorld_Spy.xls
138
+ ### Get Movie Data
86
139
 
87
- ## What's Next
140
+ ``` ruby
141
+ m.async.fetch("0369610", "3079380", "0478970")
142
+ ```
143
+ Calling m.async tells Celluloid to invoke the given method asynchronously.
+ Rather than waiting for the IMDb and TMDb queries to respond, the caller sends a message to the concurrent object
+ asking it to invoke the method, and then proceeds without waiting for a response.
+ The concurrent object receiving the message will then process the method call in the background.
88
147
 
89
- ##### More statistical computations coming soon:
148
+ Asynchronous calls never raise an exception in the caller, even if an exception occurs while the receiver is processing the call.
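
The fire-and-forget behaviour described above can be illustrated without Celluloid. This sketch uses only Ruby's built-in `Thread` and `Queue`, so it is an analogy rather than movieDB's actual mechanism; `slow_fetch` and `fetch_all_async` are hypothetical names, and the id `'bad'` stands in for an invalid id.

```ruby
# Stand-in for a slow network call; 'bad' simulates an invalid id.
def slow_fetch(id)
  raise ArgumentError, "invalid id #{id}" if id == 'bad'
  sleep(0.05)
  "data for #{id}"
end

# Start one background thread per id and return the threads immediately.
# Each thread pushes either its result or the exception it hit onto the queue,
# so a failure never propagates into the caller.
def fetch_all_async(ids, results)
  ids.map do |id|
    Thread.new do
      begin
        results << [id, slow_fetch(id)]
      rescue StandardError => e
        results << [id, e]
      end
    end
  end
end

results = Queue.new
threads = fetch_all_async(%w[0369610 bad], results)
puts 'caller continues without waiting'

threads.each(&:join) # wait only when the results are actually needed
until results.empty?
  id, outcome = results.pop
  puts "#{id}: #{outcome.inspect}"
end
```

Because errors stay inside the worker, the caller has to inspect the collected outcomes if it wants to know whether a fetch failed.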
90
149
 
91
- `:GaussNewtonAlgorithm`
150
+ ### Redis - caching objects
92
151
 
93
- > Iteratively_Reweighted_Least_Squares
94
- > Lack_Of_Fit_Sum_Of_Squares
95
- > Least_Squares_Support_Vector_Machine
96
- > Mean_Squared_Error
97
- > Moving_Least_Sqares
98
- > Non_Linear_Iterative_Partial_Least_Squares
99
- > Non_Linear_Least_Squares
100
- > Ordinary_Least_Squares
101
- > Partial_Least_Squares_Regression
102
- > Partition_Of_Sums_Of_Squares
103
- > Proofs_Involving_Ordinary_Least_Squares
104
- > Residual_Sum_Of_Squares
105
- > Total_Least_Squares
106
- > Total_Sum_Of_Squares
152
+ By default, any movie fetched from IMDb is stored in redis and has an expiration time of 1800 seconds (30 minutes).
107
153
 
108
- `:EstimationOfDensity`
154
+ But you can change this expiration time.
109
155
 
110
- > Cluster_Weighted_Modeling
111
- > Density_Estimation
112
- > Discretization_Of_Continuous_Features
113
- > Mean_Integrated_Squared_Error
114
- > Multivariate_Kernel_Density_Estimation
115
- > Variable_Kernel_Density_Estimation
156
+ ```ruby
157
+ m.async.fetch("0369610", "3079380", expire: 86400)
158
+ ```
159
+ Here, the expiration time is set to 86400 seconds, which is equivalent to 24 hours.
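
The expiry behaviour can be pictured with a small in-memory stand-in. This is only a sketch of time-to-live semantics, not how movieDB or Redis implements them: `TtlCache` and its methods are hypothetical, and the clock is injectable so the example runs without waiting.

```ruby
# Minimal in-memory cache with per-entry expiry (illustrative only).
class TtlCache
  DEFAULT_TTL = 1800 # seconds, the default quoted in the README

  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @entries = {}
  end

  def write(key, value, expire: DEFAULT_TTL)
    @entries[key] = [value, @clock.call + expire]
  end

  # Returns the value, or nil once the entry has expired or never existed.
  def read(key)
    value, expires_at = @entries[key]
    return nil if expires_at.nil?
    if @clock.call >= expires_at
      @entries.delete(key)
      return nil
    end
    value
  end

  # Remaining lifetime in whole seconds, or nil for a missing/expired key.
  def ttl(key)
    _, expires_at = @entries[key]
    return nil if expires_at.nil? || @clock.call >= expires_at
    (expires_at - @clock.call).ceil
  end
end

now = 0.0
cache = TtlCache.new(clock: -> { now })
cache.write('0369610', { 'title' => 'example' }, expire: 24 * 60 * 60)
puts cache.ttl('0369610')          # prints 86400
now = 86_400.0
puts cache.read('0369610').inspect # prints nil
```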
116
160
 
117
- `:ExploratoryDataAnalysis`
161
+ ## Part 2 - Run the statistic
118
162
 
119
- > Data_Reduction
120
- > Table_Diagonalization
121
- > Configural_Frequency_Analysis
122
- > Median_Polish
123
- > Stem_And_Leaf_Display
163
+ Below, we've collected 3 specific IMDb ids to analyze.
124
164
 
125
- > Data_Mining
126
- > Applied_DataMining
127
- > Cluster_Analysis
128
- > Dimension_Reduction
129
- > Applied_DataMining
165
+ * Ant-Man - 0478970
+ * Jurassic World - 0369610
+ * Spy - 3079380
130
168
 
131
- > RegressionAnalysis
132
- > Choice_Modelling
169
+ Finding the Mean value.
170
+ ```ruby
171
+ m.mean
172
+ ```
173
+ Below is the result generated.
133
174
 
134
- > Generalized_Linear_Model
135
- > Binomial_Regression
136
- > Generalized_Additive_Model
137
- > Linear_Probability_Model
138
- > Poisson_Regression
139
- > Zero_Inflated_Model
175
+ ```ruby
176
+ mean
177
+ ant-man 576.8444444444444
178
+ jurassic_world 512.5111111111111
179
+ spy 369.73333333333335
180
+ ```
181
+ Below are more statistical methods you can invoke on your objects.
140
182
 
141
- > Nonparametric_Regression
142
- > Statistical_Outliers
143
- > Regression_And_Curve_Fitting_Software
144
- > Regression_Diagnostics
145
- > Regression_Variable_Selection
146
- > Regression_With_Time_Series_Structure
147
- > Robust_Regression
148
- > Choice_Modeling
183
+ Feel free to try them out.
149
184
 
150
- > Resampling
151
- > Bootstrapping_Population
185
+ * std
186
+ * sum
187
+ * count
188
+ * max
189
+ * min
190
+ * product
191
+ * standardize
192
+ * describe
193
+ * covariance
194
+ * correlation
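
For readers unfamiliar with these statistics, here is a plain-Ruby sketch of three of them. It does not use SciRuby and is not movieDB's implementation; the function names mirror the list above for readability, and the use of the sample standard deviation (dividing by n - 1) is an assumption.

```ruby
# Arithmetic mean of a list of numbers.
def mean(values)
  values.inject(:+).to_f / values.size
end

# Sample standard deviation (divides by n - 1); an assumption for this sketch.
def sample_std(values)
  m = mean(values)
  Math.sqrt(values.map { |v| (v - m)**2 }.inject(:+) / (values.size - 1))
end

# Z-scores: how many standard deviations each value lies from the mean.
def standardize(values)
  m = mean(values)
  sd = sample_std(values)
  values.map { |v| (v - m) / sd }
end

p mean([100, 0, 0, 0])        # prints 25.0
p sample_std([100, 0, 0, 0])  # prints 50.0
p standardize([100, 0, 0, 0]) # prints [1.5, -0.5, -0.5, -0.5]
```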
152
195
 
153
- > Sensitivity_Analysis
154
- > Variance_Based_Sensitivity_Analysis
155
- > Elementary_Effects_Method
156
- > Experimental_Uncertainty_Analysis
157
- > Fourier_Amplitude_Sensitivity_Testing
158
- > Hyperparameter
196
+ ### Layout and Template
159
197
 
160
- > Time_Series_Analysis
161
- > Frequency_Deviation
198
+ movieDB allows you to view all your data fields in a worksheet style layout.
199
+
200
+ ``` ruby
201
+ m.worksheet
202
+ ```
203
+ A total of 45 fields are printed out; the result below is truncated for ease of reading.
204
+
205
+
206
+ ``` ruby
207
+ ant-man jurassic_w spy
208
+ production 177 128 40
209
+ belongs_to 0 151 0
210
+ plot_synop 9083 0 9629
211
+ company 14 18 21
212
+ title 7 14 3
213
+ filming_lo 267 1037 530
214
+ cast_chara 4094 5894 1001
215
+ trailer_ur 0 46 45
216
+ cast_membe 2833 3452 939
217
+ votes 5 6 5
218
+ adult 5 5 5
219
+ also_known 928 1601 1195
220
+ director 15 19 13
221
+ plot_summa 373 298 311
222
+ countries 7 16 7
223
+ ... ... ... ...
224
+ ```
225
+
226
+ ## Filters
227
+
228
+ When performing statistics on an object, movieDB by default processes all fields.
229
+
230
+ Alternatively, you can restrict which fields are processed with the following 2 filters.
231
+
232
+ * only
233
+ * except
234
+
235
+ 'only' analyzes the fields you provide.
236
+
237
+ 'except' is the inverse of 'only'. It analyzes all the fields you did not provide.
238
+
239
+ ``` ruby
240
+ m.standardize only: [:budget, :revenue, :length, :vote_average]
241
+
242
+ ```
243
+ Processes only budget, revenue, length and vote_average values.
244
+ ``` ruby
245
+ ant-man jurassic_w spy
246
+ budget 1.49999999 -0.3616594 1.49999999
247
+ revenue -0.5000006 1.49304559 -0.5000013
248
+ length -0.4999988 -0.5656929 -0.4999976
249
+ vote_avera -0.5000005 -0.5656931 -0.5000010
250
+ ```
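
The two filters behave like selecting or rejecting keys of a hash. The sketch below shows that idea in plain Ruby; `filter_fields` is a hypothetical helper, not a movieDB method, and rejecting the combination of both filters is a design choice made only for this example.

```ruby
# Keep only the listed fields, or everything except the listed fields.
def filter_fields(record, only: nil, except: nil)
  raise ArgumentError, 'pass either only: or except:, not both' if only && except
  return record.select { |field, _| only.include?(field) } if only
  return record.reject { |field, _| except.include?(field) } if except
  record # no filter given: process all fields, as in the default behaviour
end

record = { budget: 150, revenue: 1670, length: 124, title: 'example' }
p filter_fields(record, only: [:budget, :revenue])
p filter_fields(record, except: [:title])
```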
251
+ # Commands
252
+
253
+ movieDB comes with commands to help you query or manipulate stored objects in redis.
254
+
255
+ * HGETALL key
256
+ Get all the fields and values in a hash of the movie
257
+
258
+ ``` ruby
259
+ m.hgetall(["0369610"])
260
+ # => {"production_companies"=>"[{\"name\"=>\"Universal Studios\", \"id\"=>13},...}
261
+ ```
262
+ * HKEYS key
263
+ Get all the fields in a hash of the movie
264
+
265
+ ``` ruby
266
+ m.hkeys
267
+ # => ["production_companies", "belongs_to_collection", "plot_synopsis", "company", "title",...]
268
+ ```
269
+
270
+ * HVALS key
271
+ Get all the values in a hash of the movie
272
+
273
+ ``` ruby
274
+ m.hvals
275
+ # => ["[{\"name\"=>\"Universal Studios\", \"id\"=>13}, {\"name\"=>\"Amblin Entertainment\",...]
276
+ ```
277
+
278
+ * ALL_IDS key
279
+ Get the ids of all stored movies
280
+
281
+ ``` ruby
282
+ m.all_ids
283
+ # => ["0369610", "3079380"...]
284
+ ```
285
+
286
+ * TTL key
287
+ Gets the remaining time to live of a movie.
288
+
289
+ ``` ruby
290
+ m.ttl("0369610")
291
+ # => 120
292
+ ```
293
+
294
+ * DELETE_ALL key
295
+ Deletes all movie objects stored in redis.
296
+
297
+ ``` ruby
298
+ m.delete_all
299
+ # => []
300
+ ```
301
+ # Visualizations
302
+ (Work in progress)
303
+
304
+ # Data mining
305
+ (Work in progress)
162
306
 
163
307
  ## Contact me
164
308
 
@@ -166,4 +310,7 @@ If you'd like to collaborate, please feel free to fork source code on github.
166
310
 
167
311
  You can also contact me at albertmck@gmail.com
168
312
 
169
- ###### Copyright (c) 2013 Albert McKeever, released under MIT license
313
+ ## Disclaimer
314
+ This software is provided “as is” and without any express or implied warranties, including, without limitation, the implied warranties of merchantability and fitness for a particular purpose.
315
+
316
+ ###### Copyright (c) 2013 - 2015 Albert McKeever, released under MIT license