so2db 0.1.0
- data/Gemfile +8 -0
- data/Gemfile.lock +37 -0
- data/MIT-LICENSE +8 -0
- data/README.md +101 -0
- data/Rakefile +8 -0
- data/bin/so2pg +5 -0
- data/lib/so2db/extensions.rb +42 -0
- data/lib/so2db/formatter.rb +113 -0
- data/lib/so2db/migrations.rb +308 -0
- data/lib/so2db/models.rb +106 -0
- data/lib/so2db.rb +126 -0
- data/lib/so2pg.rb +179 -0
- data/test/test_formatter.rb +120 -0
- data/test/test_models.rb +20 -0
- data/test/test_so2db.rb +17 -0
- data/test/test_so2pg.rb +120 -0
- metadata +163 -0
data/Gemfile
ADDED
data/Gemfile.lock
ADDED
@@ -0,0 +1,37 @@
+GEM
+  remote: http://rubygems.org/
+  specs:
+    activemodel (3.0.0)
+      activesupport (= 3.0.0)
+      builder (~> 2.1.2)
+      i18n (~> 0.4.1)
+    activerecord (3.0.0)
+      activemodel (= 3.0.0)
+      activesupport (= 3.0.0)
+      arel (~> 1.0.0)
+      tzinfo (~> 0.3.23)
+    activesupport (3.0.0)
+    arel (1.0.1)
+      activesupport (~> 3.0.0)
+    builder (2.1.2)
+    foreigner (1.2.0)
+      activerecord (>= 3.0.0)
+    i18n (0.4.2)
+    metaclass (0.0.1)
+    mocha (0.12.2)
+      metaclass (~> 0.0.1)
+    nokogiri (1.5.5)
+    pg (0.14.0)
+    rake (0.9.2.2)
+    tzinfo (0.3.33)
+
+PLATFORMS
+  ruby
+
+DEPENDENCIES
+  activerecord
+  foreigner
+  mocha
+  nokogiri
+  pg
+  rake
data/MIT-LICENSE
ADDED
@@ -0,0 +1,8 @@
+Copyright (c) 2012 Chad Taylor
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
data/README.md
ADDED
@@ -0,0 +1,101 @@
+# SO2DB
+SO2DB provides an API for importing the Stack Overflow/Stack Exchange data dumps into a database. It also provides a PostgreSQL import utility (so2pg) out of the box.
+
+
+# Using the PostgreSQL Import Utility
+1. Download a [Stack Exchange Data Dump](http://www.clearbits.net/creators/146-stack-exchange-data-dump) and extract the data set that you want to import (e.g., math.stackexchange.com.7z).
+
+2. Strip invalid XML-encoded strings from the extracted XML files. This is a [known issue](http://blog.stackoverflow.com/2009/06/stack-overflow-creative-commons-data-dump/#comment-24223). I have a [Gist](https://gist.github.com/3270224) you can use that backs up and cleans all the XML files in a directory.
+
+3. Ensure that you have the PostgreSQL client (`psql`) installed. `psql` is used to bulk copy data into your database.
+
+4. Create a PostgreSQL database for the data.
+
+5. Install SO2DB with the command `gem install so2db`. You may have to use `sudo` depending on your setup.
+
+6. Call so2pg to import the data with the command `so2pg -O -R -d db_name -D /path/to/data/dir`. The data directory is the directory containing the clean XML files. The -O and -R flags indicate that you wish to include optional tables and table relationships, respectively. You can see all the options by running `so2pg --help` or in the list below.
+
+7. Wait impatiently! Depending on your hardware and the data set you choose to import, `so2pg` may take a while to run. With all that free time on your hands, why don't you help me figure out how to make so2db *faster?!*
+
+
+## so2pg options
+
+-H --host HOST The database host
+
+-d --database DBNAME The name of the database (REQUIRED)
+
+-D --directory DIRECTORY The data directory path (REQUIRED)
+
+-u --user USER The user name
+
+-p --password PASSWORD The user's password
+
+-P --port PORT_NUMBER The port number
+
+-O --include-optionals Includes optional tables
+
+-R --include-relationships Includes table relationships
+
+-h --help Show this help screen
+
+## Learning PostgreSQL?
+Be sure to check out the Tekpub [Hello PostgreSQL](http://tekpub.com/productions/pg) series. It presents a lot of useful information quickly and in an accessible manner. I built this gem simply so I could practice along with the videos. Highly recommended!
+
+## Hardware and Performance
+We have run so2pg several times on two machines with slightly different hardware configurations. The CPUs and RAM were comparable (2.66 vs. 2.4 GHz, 4GB RAM), but one had an SSD while the other had a ...slow... HD. When importing the Apr 2012 Stack Overflow dump on each machine, import times were about 43% shorter on the SSD (2 hrs vs. 3.5 hrs).
+
+Your import may take quite some time to complete, and the performance is very dependent on your hard drive speed.
+
+## Other Tips
+If you are running `so2pg` on OS X, you may want to run the `purge` command in a terminal periodically. This will free up "Inactive Memory" and reduce paging to disk. This should help performance, though I haven't measured it.
+
+
+# Creating a Custom Importer
+Before you create your own custom importer, you should check to see if someone is already working on one with the same purpose. Otherwise, let us know what you're working on and get started!
+
+SO2DB depends on ActiveRecord and [Foreigner](https://github.com/matthuhiggins/foreigner) to build tables and relationships, and so SO2DB is limited to the databases supported by both projects. (At the time of writing, Foreigner only supports PostgreSQL, MySQL, and SQLite. ActiveRecord supports these and more.)
+
+Creating a custom importer requires you to provide two classes: an ActiveRecord monkey patch and the importer implementation.
+
+First, you need to create a [monkey patch](http://stackoverflow.com/questions/394144/what-does-monkey-patching-exactly-mean-in-ruby) that adds a uuid method to the associated ActiveRecord connection adapter. The uuid method defines the database type associated with a universally unique identifier (e.g., 'uuid' in PostgreSQL, 'CHAR(36)' in MySQL).
+
+Next, you must create a subclass of SO2DB::Importer that contains a method with the following definition:
+
+> import_stream(formatter)
+
+The formatter will be a SO2DB::Formatter, which generates formatted output from a Stack Overflow XML data file when you call its `format` method. The `format` method accepts a stream and pumps formatted data to that stream. This approach lets you handle the data as you see fit: pass it over STDIN to another command (this is what so2pg does) or send it to a file that you then pass to a database system. If you need the database parameters provided to SO2DB, they are available through the importer's conn_opts property.
+
+The data coming from the formatter will use a delimiter specified in the Importer implementation. A vertical tab, '\v' (0xB), is used by default. This can be changed by setting the format_delimiter property in the importer.
+
+The formatter also offers a couple of convenience routines, `file_name` and `value_str`. `file_name` simply provides the name of the file to be formatted. `value_str` provides a partial SQL string based on the data to be formatted. It provides the table name and the ordered value names associated with the formatted data. For example, assume you are provided a formatter for badges.xml. The convenience routines produce the following:
+
+```ruby
+puts formatter.file_name
+# => badges.xml
+
+puts formatter.value_str
+# => badges(date,id,name,user_id)
+```
+
+Note that the field names are alphabetized. The field values from the formatter will have the same ordering. This is to ensure consistent and predictable field ordering, even if fields are not provided in the XML file. If a field is not provided in the XML file, an empty string is inserted into the formatted data.
+
+I encourage you to check out the implementation of so2pg if you are interested in developing your own importer; there are additional notes throughout the source to make your development experience a bit smoother.
+
+When you are finished, consider creating a pull request. I would love to include your labor of love in a future release!
+
+# License
+SO2DB is released under the [MIT License](http://opensource.org/licenses/MIT).
+
+
+# Tested Platforms
+This project has been tested under OS X (Snow Leopard) and Ubuntu 12.04 using Ruby versions 1.9.2 and 1.9.3. If anyone wants to test on other platforms, I would appreciate it!
+
+
+# Supported Data Dumps
+SO2DB has been tested against the following data dumps:
+
+Sept 2011
+Apr 2012
+
+# Special Thanks
+Thanks to [@nclaburn](https://github.com/nclaburn) for being a great mentor and helping me get this project released!
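The README's importer contract (an `import_stream(formatter)` method that asks the formatter to pump delimited rows into a stream) can be sketched roughly as follows. This is only an illustration under stated assumptions: `StubFormatter` and `BufferImporter` are hypothetical names that do not exist in the gem, the stub stands in for `SO2DB::Formatter` so the snippet runs without so2db installed, and a real importer would subclass `SO2DB::Importer` instead.

```ruby
require 'stringio'

# Hypothetical stand-in for SO2DB::Formatter. It mimics the three methods the
# README describes: file_name, value_str and format(stream).
class StubFormatter
  def initialize(rows, delimiter = "\v")
    @rows = rows
    @delimiter = delimiter
  end

  def file_name
    'badges.xml'
  end

  def value_str
    'badges(date,id,name,user_id)'
  end

  # Writes one delimited line per row to the given stream.
  def format(outstream)
    @rows.each { |row| outstream.puts row.join(@delimiter) }
  end
end

# Hypothetical importer. In a real project this would subclass
# SO2DB::Importer; here it only collects the formatted data in memory.
class BufferImporter
  attr_reader :buffers

  def initialize
    @buffers = {}
  end

  # Called once per data file with a formatter for that file.
  def import_stream(formatter)
    io = StringIO.new
    formatter.format(io)
    @buffers[formatter.value_str] = io.string
  end
end

importer = BufferImporter.new
importer.import_stream(StubFormatter.new([['2012-04-01', '1', 'Teacher', '42']]))
```

Swapping the `StringIO` for a pipe to a database client or for a temporary file is the kind of choice the README leaves to each importer.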
data/Rakefile
ADDED
data/bin/so2pg
ADDED
data/lib/so2db/extensions.rb
ADDED
@@ -0,0 +1,42 @@
+#--
+# Copyright (c) 2012 Chad Taylor
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+
+require 'active_record'
+
+module ActiveRecord
+  module ConnectionAdapters
+
+    class TableDefinition
+      def uuid(*args)
+        opts = args.extract_options!
+        column_names = args
+        type, default_opts = @base.uuid
+
+        # prefer the values provided by the user...
+        options = (default_opts || {}).merge(opts)
+
+        column_names.each { |name| column(name, type, options) }
+      end
+    end
+
+  end
+end
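The way `TableDefinition#uuid` consults the adapter can be illustrated without ActiveRecord. The sketch below is an assumption-laden stand-in: `StubAdapter` and `StubTableDefinition` are hypothetical classes, the `{ :null => true }` default is invented for the demonstration, and the option extraction is a plain-Ruby substitute for ActiveSupport's `extract_options!`. It shows the adapter returning a `[type, default_options]` pair and the caller's options taking precedence.

```ruby
# Hypothetical adapter patched with a `uuid` method, as the README asks
# custom importers to provide. Returns the column type plus default options.
class StubAdapter
  def uuid
    ['uuid', { :null => true }]
  end
end

# Hypothetical, heavily simplified table definition that records columns
# instead of building SQL.
class StubTableDefinition
  attr_reader :columns

  def initialize(base)
    @base = base
    @columns = []
  end

  def column(name, type, options = {})
    @columns << [name, type, options]
  end

  def uuid(*names)
    # Plain-Ruby stand-in for ActiveSupport's Array#extract_options!
    opts = names.last.is_a?(Hash) ? names.pop : {}
    type, default_opts = @base.uuid

    # User-supplied options win over the adapter defaults.
    options = (default_opts || {}).merge(opts)
    names.each { |name| column(name, type, options) }
  end
end

table = StubTableDefinition.new(StubAdapter.new)
table.uuid(:revision_guid, :null => false)
```

After the call, `table.columns` holds a single `:revision_guid` column of type `'uuid'` with `:null => false`.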
data/lib/so2db/formatter.rb
ADDED
@@ -0,0 +1,113 @@
+#--
+# Copyright (c) 2012 Chad Taylor
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+
+require 'active_support/inflector'
+require 'nokogiri'
+require 'cgi'
+
+module SO2DB
+
+  # Formats data from one stream into another stream.
+  class Formatter
+
+    # Infrastructure. Do not call this from your code.
+    def initialize(path = '', delimiter = 11.chr.to_s)
+      @delimiter = delimiter
+      @path = path
+      @name = "row"
+    end
+
+    # Formats a file and prints the formatted output to the outstream.
+    #
+    # The output is performed via a 'puts' method.
+    #
+    # Example:
+    #   >> f = get_formatter     # assumed to be provided to you
+    #   >> cmd = get_cmd         # some shell command that accepts piped input
+    #   >> IO.popen(cmd, 'r+') do |s|
+    #   >>   f.format(s)
+    #   >>   s.close_write
+    #   >> end
+    def format(outstream)
+      file = File.basename(@path, '.*')
+      req_attrs = Models::Lookup::get_required_attrs(file)
+
+      File.open(@path) { |f| format_from_stream(f, req_attrs, outstream) }
+    end
+
+    def file_name
+      File.basename(@path)
+    end
+
+    # Returns a string containing the data type and individual fields provided
+    # through formatting. The fields are sorted alphabetically.
+    #
+    # This method is useful for building SQL statements for the formatted data.
+    #
+    # Example:
+    #   >> f = get_formatter_for_badges   # formatter should be provided to you
+    #   >> puts f.value_str
+    #   => badges(date,id,name,user_id)
+    def value_str
+      file = File.basename(@path, '.*')
+      o = Models::Lookup::find_class(file)
+
+      table = o.table_name
+      values = o.exported_fields.sort.join(",")
+
+      "#{table}(#{values})"
+    end
+
+    private
+    def format_from_stream(instream, required_attrs, outstream)
+      reader = Nokogiri::XML::Reader(instream)
+      required_attrs.sort!
+
+      reader.each do |node|
+        outstream.puts format_node(node, required_attrs) if element_start? node
+      end
+    end
+
+    def element_start?(node)
+      node.name == @name &&
+        node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
+    end
+
+    def format_node(node, attrs)
+      arr = attrs.map do |a|
+        str = node.attribute(a)
+        str ? scrub(str) : ''
+      end
+
+      arr.join(@delimiter)
+    end
+
+    def scrub(str)
+      s = CGI::escapeHTML(str)
+      s.gsub!(/\n|\r/, '')
+
+      s
+    end
+
+  end
+end
+
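The row-formatting step in `format_node` and `scrub` can be tried in isolation. The following is a minimal sketch, not the gem's code: `format_row` is a hypothetical helper, the attribute hash stands in for a Nokogiri reader node, and the CamelCase attribute names are assumed for illustration. It shows the three behaviours the formatter relies on: attributes are emitted in sorted order, missing attributes become empty strings, and values are HTML-escaped with line breaks removed.

```ruby
require 'cgi'

# Vertical tab (0xB), the formatter's default delimiter.
DELIMITER = 11.chr

# Escapes HTML-sensitive characters and strips line breaks so that each
# record stays on a single output line.
def scrub(str)
  CGI.escapeHTML(str).gsub(/\n|\r/, '')
end

# Hypothetical helper: `attributes` is a plain Hash standing in for an XML
# <row> element, `required_attrs` the attribute names expected for the table.
def format_row(attributes, required_attrs, delimiter = DELIMITER)
  required_attrs.sort.map do |name|
    value = attributes[name]
    value ? scrub(value) : ''
  end.join(delimiter)
end

row = format_row(
  { 'Id' => '1', 'Name' => "Nice\nAnswer", 'UserId' => '42' },
  %w[UserId Id Date Name]
)
escaped = scrub('a < b')
```

Because `Date` is absent from the hash, the row starts with an empty field, which keeps the column positions aligned with the sorted field list.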
data/lib/so2db/migrations.rb
ADDED
@@ -0,0 +1,308 @@
+#--
+# Copyright (c) 2012 Chad Taylor
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in
+# all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+
+require 'active_record'
+require 'so2db/extensions'
+require 'foreigner'
+
+
+module SO2DB
+
+  module FKHelper
+
+    def add_fk(from_table, to_table, options={})
+      begin
+        AddForeignKeyMigration.new(from_table, to_table, options).up
+      rescue
+        s = "Error creating foreign key from #{from_table} to #{to_table}"
+        s << " on column #{options[:column]}" if options.has_key? :column
+        puts s
+      end
+    end
+
+  end
+
+  class AddForeignKeyMigration < ActiveRecord::Migration
+
+    def initialize(from_table, to_table, options)
+      @from_table = from_table
+      @to_table = to_table
+      @options = options
+    end
+
+    def up
+      add_foreign_key(@from_table, @to_table, @options)
+    end
+
+  end
+
+  class CreateBasicTables < ActiveRecord::Migration
+
+    def up
+      create_table :badges do |t|
+        t.integer :user_id
+        t.string :name, :limit => 50
+        t.timestamp :date
+      end
+
+      create_table :comments do |t|
+        t.integer :post_id
+        t.integer :score
+        t.text :text
+        t.timestamp :creation_date
+        t.string :user_display_name, :limit => 30
+        t.integer :user_id
+      end
+
+      create_table :posts do |t|
+        t.integer :post_type_id
+        t.integer :parent_id
+        t.integer :accepted_answer_id
+        t.timestamp :creation_date
+        t.integer :score
+        t.integer :view_count
+        t.text :body
+        t.integer :owner_user_id
+        t.string :owner_display_name, :limit => 40
+        t.integer :last_editor_user_id
+        t.string :last_editor_display_name, :limit => 40
+        t.timestamp :last_edit_date
+        t.timestamp :last_activity_date
+        t.timestamp :community_owned_date
+        t.timestamp :closed_date
+        t.string :title, :limit => 250
+        t.string :tags, :limit => 150
+        t.integer :answer_count
+        t.integer :comment_count
+        t.integer :favorite_count
+      end
+
+      create_table :post_history do |t|
+        t.integer :post_history_type_id
+        t.integer :post_id
+        t.uuid :revision_guid
+        t.timestamp :creation_date
+        t.integer :user_id
+        t.string :user_display_name, :limit => 40
+        t.string :comment, :limit => 600
+        t.text :text
+        t.integer :close_reason_id
+      end
+
+      create_table :users do |t|
+        t.integer :reputation
+        t.timestamp :creation_date
+        t.string :display_name, :limit => 40
+        t.string :email_hash, :limit => 32
+        t.timestamp :last_access_date
+        t.string :website_url, :limit => 300
+        t.string :location, :limit => 200
+        t.integer :age
+        t.text :about_me
+        t.integer :views
+        t.integer :up_votes
+        t.integer :down_votes
+      end
+
+      create_table :votes do |t|
+        t.primary_key :id
+        t.integer :post_id
+        t.integer :vote_type_id
+        t.timestamp :creation_date
+        t.integer :user_id
+        t.integer :bounty_amount
+      end
+
+    end
+
+    def down
+      [:votes, :badges, :comments, :post_history, :posts, :users].each do |t|
+        drop_table t
+      end
+    end
+
+  end
+
+  class CreateRelationships
+    include FKHelper
+
+    def up
+      add_fk(:badges, :users)
+      add_fk(:comments, :posts)
+      add_fk(:comments, :users)
+      add_fk(:posts, :posts, column: 'parent_id')
+
+      # The following relationship is currently suspect, see
+      # http://meta.stackoverflow.com/questions/131975/what-are-the-posttypeids-in-the-2011-12-data-dump
+      add_fk(:posts, :posts, column: 'accepted_answer_id')
+
+      add_fk(:posts, :users, column: 'owner_user_id')
+      add_fk(:posts, :users, column: 'last_editor_user_id')
+      add_fk(:post_history, :posts)
+      add_fk(:post_history, :users)
+
+      # In my experience, the following is also suspect
+      add_fk(:votes, :posts)
+
+      add_fk(:votes, :users)
+    end
+
+  end
+
+  class CreateOptionals < ActiveRecord::Migration
+
+    def up
+      create_post_types
+      create_post_history_types
+      create_close_reasons
+      create_vote_types
+    end
+
+    def create_post_types
+      create_table :post_types do |t|
+        t.primary_key :id
+        t.string :type_name, :limit => 24
+      end
+
+      { 1 => "Question",
+        2 => "Answer",
+        3 => "Wiki",
+        4 => "TagWikiExcerpt",
+        5 => "TagWiki",
+        6 => "ModeratorNomination",
+        7 => "WikiPlaceholder",
+        8 => "PrivilegeWiki"
+      }.each do |k,v|
+        p = Models::PostType.new
+        p.id = k
+        p.type_name = v
+        p.save
+      end
+    end
+
+    def create_post_history_types
+      create_table :post_history_types do |t|
+        t.primary_key :id
+        t.string :name, :limit => 50
+      end
+
+      { 1 => "Initial Title",
+        2 => "Initial Body",
+        3 => "Initial Tags",
+        4 => "Edit Title",
+        5 => "Edit Body",
+        6 => "Edit Tags",
+        7 => "Rollback Title",
+        8 => "Rollback Body",
+        9 => "Rollback Tags",
+        10 => "Post Closed",
+        11 => "Post Reopened",
+        12 => "Post Deleted",
+        13 => "Post Undeleted",
+        14 => "Post Locked",
+        15 => "Post Unlocked",
+        16 => "Community Owned",
+        17 => "Post Migrated",
+        18 => "Question Merged",
+        19 => "Question Protected",
+        20 => "Question Unprotected",
+        22 => "Question Unmerged",
+        24 => "Suggested Edit Applied",
+        25 => "Post Tweeted",
+        31 => "Discussion moved to chat",
+        33 => "Post Notice Added",
+        34 => "Post Notice Removed",
+        35 => "Post Migrated Away",
+        36 => "Post Migrated Here",
+        37 => "Post Merge Source",
+        38 => "Post Merge Destination"
+      }.each do |k,v|
+        p = Models::PostHistoryType.new
+        p.id = k
+        p.name = v
+        p.save
+      end
+    end
+
+    def create_close_reasons
+      create_table :close_reasons do |t|
+        t.primary_key :id
+        t.string :name, :limit => 50
+      end
+
+      { 1 => "Exact duplicate",
+        2 => "off-topic",
+        3 => "subjective",
+        4 => "not a real question",
+        7 => "too localized",
+        10 => "General reference",
+        20 => "Noise or pointless"
+      }.each do |k,v|
+        c = Models::CloseReason.new
+        c.id = k
+        c.name = v
+        c.save
+      end
+    end
+
+    def create_vote_types
+      create_table :vote_types do |t|
+        t.primary_key :id
+        t.string :name, :limit => 50
+      end
+
+      { 1 => "AcceptedByOriginator",
+        2 => "UpMod",
+        3 => "DownMod",
+        4 => "Offensive",
+        5 => "Favorite",
+        6 => "Close",
+        7 => "Reopen",
+        8 => "BountyStart",
+        9 => "BountyClose",
+        10 => "Deletion",
+        11 => "Undeletion",
+        12 => "Spam",
+        15 => "ModeratorReview",
+        16 => "ApproveEditSuggestion"
+      }.each do |k,v|
+        vt = Models::VoteType.new
+        vt.id = k
+        vt.name = v
+        vt.save
+      end
+    end
+
+  end
+
+  class CreateOptionalRelationships < ActiveRecord::Migration
+    include FKHelper
+
+    def up
+      add_fk(:posts, :post_types)
+      add_fk(:post_history, :post_history_types)
+      add_fk(:post_history, :close_reasons)
+      add_fk(:votes, :vote_types)
+    end
+  end
+
+end
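Most `add_fk` calls above pass no `:column` option, so the referencing column has to be inferred. The sketch below illustrates one plausible rule, assuming Foreigner's default of "singularized target table name plus `_id`". That default is an assumption here and is not confirmed by this diff. `fk_column` is a hypothetical helper, and `chomp('s')` is a deliberately naive replacement for ActiveSupport's `singularize` that happens to suffice for the table names in this file.

```ruby
# Hypothetical helper: returns the column a foreign key would be placed on.
# An explicit :column option wins; otherwise the name is derived from the
# target table (naive singularization: drop one trailing "s").
def fk_column(to_table, options = {})
  options[:column] || "#{to_table.to_s.chomp('s')}_id"
end

# A few of the relationships declared in the migrations above.
relationships = [
  [:badges, :users, {}],
  [:posts, :posts, { column: 'parent_id' }],
  [:post_history, :post_history_types, {}],
  [:votes, :vote_types, {}]
]

resolved = relationships.map do |from, to, opts|
  "#{from}.#{fk_column(to, opts)} -> #{to}"
end
```

Under that assumption, each derived name (`user_id`, `post_history_type_id`, `vote_type_id`) matches a column created by `CreateBasicTables`, which is why only the self-referencing and user-alias keys on `posts` need an explicit `:column`.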