shiba 0.5.0 → 0.6.0
- checksums.yaml +4 -4
- data/Gemfile.lock +1 -1
- data/README.md +113 -63
- data/bin/review +25 -135
- data/lib/shiba.rb +4 -0
- data/lib/shiba/activerecord_integration.rb +21 -19
- data/lib/shiba/analyzer.rb +1 -1
- data/lib/shiba/configure.rb +12 -6
- data/lib/shiba/connection/mysql.rb +72 -3
- data/lib/shiba/connection/postgres.rb +4 -1
- data/lib/shiba/console.rb +165 -0
- data/lib/shiba/explain.rb +22 -7
- data/lib/shiba/fuzzer.rb +5 -0
- data/lib/shiba/index_stats.rb +20 -1
- data/lib/shiba/output/tags.yaml +1 -1
- data/lib/shiba/parsers/mysql_select_fields.rb +64 -0
- data/lib/shiba/parsers/shiba_string_scanner.rb +27 -0
- data/lib/shiba/review/cli.rb +196 -0
- data/lib/shiba/review/comment_renderer.rb +19 -2
- data/lib/shiba/review/diff.rb +227 -0
- data/lib/shiba/review/explain_diff.rb +117 -0
- data/lib/shiba/reviewer.rb +20 -19
- data/lib/shiba/table_stats.rb +4 -0
- data/lib/shiba/version.rb +1 -1
- data/web/results.html.erb +15 -1
- metadata +10 -5
- data/lib/shiba/checker.rb +0 -165
- data/lib/shiba/diff.rb +0 -129
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f700a4b773d7a1b3c474ac6437c13ea77743dd6a9b95a8bde1b703d9914f6c20
+  data.tar.gz: 208f0d44f7ba356ad15340cb9ef48bb4ccfaba7050151cd12bcca26cd08b6bab
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a6d464627f14abd14753473965b4935582095dd5ba4b9187760710bffbe75a40edb9392899608f1c3d86d7911b2c0b73eafa8e01adafcf6370b044b191ed3e8e
+  data.tar.gz: 2417ce5680cf5468fe72a4750e30c964a184c7f943c9ce1e37efe738bf812197a1a9b07511e1a2541ede2c0834aa6d0896515add9937b132ba8af9e2b49a6188
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -2,12 +2,9 @@
 
 # Shiba
 
-Shiba is a tool that
+Shiba is a tool (currently in alpha) that automatically reviews SQL queries before they cause problems in production. It uses production statistics for realistic query analysis. It catches missing indexes, overly broad indexes, and queries that return too much data.
 
-
-* Poorly performing indexes
-
-By default, it will pretty much only detect queries that miss indexes. As it's fed more information, it warns about advanced problems, such as queries that use indexes but are still very expensive. To help find such queries, Shiba monitors test runs for ActiveRecord queries. A warning and report are then generated
+![screenshot](https://shiba-sql.com/wp-content/uploads/2019/03/shiba-screenshot-1024x581.png)
 
 ## Installation
 
@@ -27,8 +24,7 @@ require 'shiba/setup'
 
 ## Usage
 
-
-To verify shiba is actually running, you can run your tests with SHIBA_DEBUG=true.
+To get started, try out shiba locally. To verify shiba is actually running, you can run your tests with SHIBA_DEBUG=true.
 
 ```ruby
 # Install
@@ -43,55 +39,15 @@ SHIBA_DEBUG=true ruby test/controllers/users_controller_test.rb
 # Report available at /tmp/shiba-explain.log-1550099512
 ```
 
-
-
-
-
+## Next steps
+* [Integrate with Github pull requests](#automatic-pull-request-reviews)
+* [Add production stats for realistic analysis](#going-beyond-table-scans)
+* [Preview queries from the developer console](#analyze-queries-from-the-developer-console)
+* [Read more about typical query problems](#typical-query-problems)
 
-## Typical query problems
-
-Here are some typical query problems Shiba can detect. We'll assume the following schema:
-
-```ruby
-create_table :users do |t|
-  t.string :name
-  t.string :email
-  # add an organization_id column with an index
-  t.references :organization, index: true
 
-  t.timestamps
-end
-```
 
-
-
-The most simple case to detect are queries that don't utilize indexes. While it isn't a problem to scan small tables, often tables will grow large enough where this can become a serious issue.
-
-```ruby
-user = User.where(email: 'squirrel@example.com').limit(1)
-```
-
-Without an index, the database will read every row in the table until it finds one with an email address that matches. By adding an index, the database can perform a quick lookup for the record.
-
-#### Non selective indexes
-
-Another common case is queries that use an index, and work fine in the average case, but the distribution is non normal. These issues can be hard to track down and often impact large customers.
-
-```ruby
-users = User.where(organization_id: 1)
-users.size
-# => 75
-
-users = User.where(organization_id: 42)
-users.size
-# => 52,000
-```
-
-Normally a query like this would only become a problem as the app grows in popularity. Fixes include adding `limit` or `find_each`.
-
-With more data, Shiba can help detect this issue when it appears in a pull request.
-
-## Going beyond table scans
+### Going beyond table scans
 
 Without more information, Shiba acts as a simple missed index detector. To catch other problems that can bring down production (or at least cause some performance issues), Shiba requires general statistics about production data, such as the number of rows in a table and how unique columns are.
 
@@ -135,11 +91,18 @@ users:
 unique: false
 ```
 
-
+### Automatic pull request reviews
+
+Shiba can automatically comment on Github pull requests when code changes appear to introduce a query issue. To do this, it will need the Github API token of a user that has access to the repo. Shiba's comments will appear to come from that user, so you'll likely want to setup a bot account on Github with repo access for this. The token can be generated on Github at https://github.com/settings/tokens.
+
+Once the token is ready, you can integrate Shiba on your CI server by following these steps:
+* [Travis CI](#travis-integration)
+* [CircleCI](#circleci-integration)
+* [Customized CI](#custom-ci-integration)
 
-
+#### Travis Integration
 
-
+On Travis, add this to the after_script setting:
 
 ```yml
 # .travis.yml
@@ -147,12 +110,99 @@ after_script:
 - bundle exec shiba review --submit
 ```
 
-
+Add the Github API token you've generated as an environment variable named `GITHUB_TOKEN` at https://travis-ci.com/{organization}/{repo}/settings.
 
-
+#### CircleCI Integration
 
-
-
-
-
-
+To integrate with CircleCI, add this after the the test run step in `.circleci/config.yml`.
+
+```yml
+# .circleci/config.yml
+- run:
+    name: Review SQL queries
+    command: bundle exec shiba review --submit
+```
+
+An environment variable named `GITHUB_TOKEN` will need to be configured on CircleCI under *Project settings > Environment Variables*
+
+#### Custom CI Integration
+
+To run on other servers, two steps are required:
+1. Ensure an environment variable named `CI` is set when the tests and shiba script are run.
+2. Run the `shiba review` command after tests are run, supplying the required arguments to `--submit, --token, --branch, and --pull-request`. For example:
+
+```bash
+CI=true
+export CI
+rake test
+bundle exec shiba review --submit --token $MY_GITHUB_TOKEN --branch $(git rev-parse HEAD) --pull-request $MY_PR_NUMBER
+```
+
+The `--submit` option tells Shiba to comment on the relevant PR when an issue is found.
+
+
+### Analyze queries from the developer console
+
+For quick analysis, queries can be analyzed from the Rails console.
+```ruby
+# rails console
+[1] pry(main)> require 'shiba/console'
+=> true
+[2] pry(main)> shiba User.where(email: "squirrel@example.com")
+
+Severity: high
+----------------------------
+Fuzzed Data: Table sizes estimated as follows -- 100000: users
+Table Scan: The database reads 100% (100000) of the of the rows in **users**, skipping any indexes.
+Results: The database returns 100000 row(s) to the client.
+Estimated query time: 3.02s
+
+=> #<Shiba::Console::ExplainRecord:0x00007ffc154e6128>: 'SELECT `users`.* FROM `users` WHERE `users`.`email` = 'squirrel@example.com''. Call the 'help' method on this object for more info.
+[3] pry(main)>
+```
+
+Raw query strings are also supported, e.g. `shiba "select * from users where users.email = 'squirrel@example.com'"`
+
+
+### Typical query problems
+
+Here are some typical query problems Shiba can detect. We'll assume the following schema:
+
+```ruby
+create_table :users do |t|
+  t.string :name
+  t.string :email
+  # add an organization_id column with an index
+  t.references :organization, index: true
+
+  t.timestamps
+end
+```
+
+#### Full table scans
+
+The most simple case to detect are queries that don't utilize indexes. While it isn't a problem to scan small tables, often tables will grow large enough where this can become a serious issue.
+
+```ruby
+user = User.where(email: 'squirrel@example.com').limit(1)
+```
+
+Without an index, the database will read every row in the table until it finds one with an email address that matches. By adding an index, the database can perform a quick lookup for the record.
+
+#### Non selective indexes
+
+Another common case is queries that use an index, and work fine in the average case, but the distribution is non normal. These issues can be hard to track down and often impact large customers.
+
+```ruby
+users = User.where(organization_id: 1)
+users.size
+# => 75
+
+users = User.where(organization_id: 42)
+users.size
+# => 52,000
+```
+
+Normally a query like this would only become a problem as the app grows in popularity. Fixes include adding `limit` or `find_each`.
+
+With more data, Shiba can help detect this issue when it appears in a pull request.
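The "Non selective indexes" passage in the README describes an index that looks fine for a typical key but matches far more rows for one outlier. A minimal sketch of that skew in plain Ruby follows; the hash of row counts and the `median` helper are assumptions made up for illustration (only the 75 and 52,000 figures come from the README), and nothing here is Shiba's own analysis code.

```ruby
# Made-up row counts keyed by organization_id. The 75 and 52,000 figures echo
# the README example; the other two entries are invented for illustration.
rows_per_org = { 1 => 75, 7 => 120, 9 => 40, 42 => 52_000 }

# Hypothetical helper: median of a list of numbers.
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Rows matched for a "typical" key versus the single worst key.
typical = median(rows_per_org.values)
worst_org, worst_rows = rows_per_org.max_by { |_org, rows| rows }

puts "typical rows per organization_id: #{typical}"
puts "worst case: organization_id #{worst_org} matches #{worst_rows} rows"
puts "skew factor: #{(worst_rows / typical).round}x"
```

A query that is cheap for the typical key can still be expensive for the outlier, which is why the README suggests `limit` or `find_each` as fixes.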
data/bin/review
CHANGED
@@ -2,166 +2,56 @@
 
 $LOAD_PATH << File.expand_path("../lib", File.dirname(__FILE__))
 require 'optionparser'
-require 'shiba/reviewer'
-require 'shiba/checker'
 require 'shiba/configure'
+require 'shiba/reviewer'
+require 'shiba/review/explain_diff'
+require 'shiba/review/cli'
+require 'json'
 
-
-
-  opts.banner = "Review changes for query problems. Optionally submit the comments to a Github pull request."
-
-  opts.separator "Required:"
-
-  opts.on("-f","--file FILE", "The explain output log to compare with. Automatically configured when $CI environment variable is set") do |f|
-    options["file"] = f
-  end
-
-  opts.separator ""
-  opts.separator "Git diff options:"
-
-  opts.on("-b", "--branch GIT_BRANCH", "Compare to changes between origin/HEAD and BRANCH. Attempts to read from CI environment when not set.") do |b|
-    options["branch"] = b
-  end
-
-  opts.on("--staged", "Only check files that are staged for commit") do
-    options["staged"] = true
-  end
-
-  opts.on("--unstaged", "Only check files that are not staged for commit") do
-    options["unstaged"] = true
-  end
-
-  opts.separator ""
-  opts.separator "Github options:"
-
-  opts.on("--submit", "Submit comments to Github") do
-    options["submit"] = true
-  end
-
-  opts.on("-p", "--pull-request PR_ID", "The ID of the pull request to comment on. Attempts to read from CI environment when not set.") do |p|
-    options["pull_request"] = p
-  end
+cli = Shiba::Review::CLI.new
+cli.report_options("diff", "branch", "pull_request")
 
-
-
-
+if !cli.valid?
+  $stderr.puts cli.failure
+  exit 1
+end
 
-
-  opts.separator "Common options:"
+explain_diff = Shiba::Review::ExplainDiff.new(cli.options["file"], cli.options)
 
-
-
-  end
+problems = if explain_diff.diff_requested_by_user?
+  result = explain_diff.result
 
-
-    puts
-    exit
+  if result.message
+    $stderr.puts result.message
   end
 
-
-    require 'shiba/version'
-    puts Shiba::VERSION
+  if result.status == :pass
     exit
   end
-end
-parser.parse!
-
-# This is a noop since it's the default behavior. Ignore.
-if options["staged"] && options["unstaged"]
-  options.delete("staged")
-  options.delete("unstaged")
-end
-
-
-log = options["file"]
-
-if log.nil? && Shiba::Configure.ci?
-  log = options["file"] = File.join(Shiba.path, 'ci.json')
-  $stderr.puts "CI detected, setting file to #{log}" if options["verbose"]
-end
-
-if log.nil?
-  $stderr.puts "Provide an explain log, or run 'shiba explain' to generate one."
-  $stderr.puts ""
-  $stderr.puts parser
-  exit 1
-end
-
-if !File.exist?(log)
-  $stderr.puts "File not found: '#{log}'"
-  exit 1
-end
-
-pr_sha = ENV['TRAVIS_PULL_REQUEST_SHA'] || ENV['CIRCLE_BRANCH']
-
-if options["branch"] == nil && pr_sha && !pr_sha.empty?
-  options["branch"] = pr_sha
-end
-
-if options["token"] == nil
-  options["token"] = ENV['GITHUB_TOKEN']
-end
-
-# https://circleci.com/docs/2.0/env-vars/
-# This may be wrong for circle ci
-pr_id = ENV['TRAVIS_PULL_REQUEST'] || ENV['CIRCLE_PR_NUMBER']
-
-if options["pull_request"] == nil && pr_id && !pr_id.empty?
-  options["pull_request"] = pr_id
-end
 
-
-
-
-
+  explain_diff.problems
+else
+  explains = File.open(cli.options["file"]).each_line.map { |json| JSON.parse(json) }
+  bad = explains.select { |explain| explain["severity"] && explain["severity"] != 'none' }
+  bad.map { |explain| [ "#{explain["sql"]}:-2", explain ] }
 end
 
 repo_cmd = "git config --get remote.origin.url"
 repo_url = `#{repo_cmd}`.chomp
 
-if options["verbose"]
+if cli.options["verbose"]
   $stderr.puts "#{repo_cmd}\t#{repo_url}"
 end
 
-def require_option(parser, name)
-  $stderr.puts "Required: #{name}"
-  $stderr.puts ""
-  $stderr.puts parser
-  exit 1
-end
-
 if repo_url.empty?
   $stderr.puts "'#{Dir.pwd}' does not appear to be a git repo"
   exit 1
 end
 
-if options["submit"]
-  if (options["branch"].nil? || options["branch"].empty?) && (ENV['DIFF'].nil? || ENV['DIFF'].empty?)
-    require_option(parser, "branch")
-  end
-  require_option(parser, "token") if options["token"].nil?
-  require_option(parser, "pull_request") if options["pull_request"].nil?
-end
-
-if ENV['DIFF']
-  options['diff'] = ENV['DIFF']
-end
-
-# Check to see if the log overlaps with the git diff
-result = Shiba::Checker.new(options).run(log)
-
-if result.message
-  $stderr.puts result.message
-end
-
-if result.status == :pass
-  exit
-end
-
 # Generate comments for the problem queries
-reviewer = Shiba::Reviewer.new(repo_url,
+reviewer = Shiba::Reviewer.new(repo_url, problems, cli.options)
 
-if !options["submit"] || options["verbose"]
+if !cli.options["submit"] || cli.options["verbose"]
   reviewer.comments.each do |c|
     puts "#{c[:path]}:#{c[:line]} (#{c[:position]})"
     puts c[:body]
@@ -169,7 +59,7 @@ if !options["submit"] || options["verbose"]
   end
 end
 
-if options["submit"]
+if cli.options["submit"]
   if reviewer.repo_host.empty? || reviewer.repo_path.empty?
     $stderr.puts "Invalid repo url '#{repo_url}' from git config --get remote.origin.url"
     exit 1
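The new `else` branch in `data/bin/review` reads the explain log as one JSON document per line and keeps only entries whose severity is set and not `'none'`. The sketch below reproduces that filtering step in isolation. The `problem_explains` helper name and the sample log are assumptions made up for illustration; an in-memory `StringIO` stands in for the log file so the snippet runs without Shiba.

```ruby
require 'json'
require 'stringio'

# Hypothetical helper (not part of shiba) mirroring the non-diff branch of
# bin/review: parse each line as JSON, drop entries without a real severity,
# and pair each remaining entry with a "<sql>:-2" key as the script does.
def problem_explains(io)
  explains = io.each_line.map { |line| JSON.parse(line) }
  bad = explains.select { |explain| explain["severity"] && explain["severity"] != 'none' }
  bad.map { |explain| ["#{explain["sql"]}:-2", explain] }
end

# Made-up sample log: one flagged query, one clean query, one without severity.
log = StringIO.new(<<~LOG)
  {"sql":"SELECT * FROM users","severity":"high"}
  {"sql":"SELECT 1","severity":"none"}
  {"sql":"SELECT 2"}
LOG

problems = problem_explains(log)
problems.each { |key, explain| puts "#{key} => #{explain["severity"]}" }
```

Passing the IO object in, rather than a path, keeps the sketch testable; the script itself opens `cli.options["file"]` directly.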
data/lib/shiba.rb
CHANGED
@@ -8,13 +8,17 @@ require "byebug" if ENV['SHIBA_DEBUG']
 module Shiba
   class Error < StandardError; end
   class ConfigError < StandardError; end
+  TEMPLATE_FILE = File.join(File.dirname(__dir__), 'lib/shiba/output/tags.yaml')
 
   def self.configure(options)
+    return false if @connection_hash
+
     configure_mysql_defaults(options)
 
     @connection_hash = options.select { |k, v| [ 'default_file', 'default_group', 'server', 'username', 'database', 'host', 'password', 'port'].include?(k) }
     @main_config = Configure.read_config_file(options['config'], "config/shiba.yml")
     @index_config = Configure.read_config_file(options['index'], "config/shiba_index.yml")
+    true
   end
 
   def self.configure_mysql_defaults(options)
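The change to `Shiba.configure` adds an early `return false if @connection_hash` and a trailing `true`, so a second call appears to leave the first configuration in place and reports that through its return value. A minimal sketch of that guard pattern follows. `ConfigureOnce`, its `ALLOWED_KEYS` list, and the `connection_hash` reader are hypothetical stand-ins; the real method also sets MySQL defaults and reads the config files, which is omitted here.

```ruby
# Hypothetical stand-in module illustrating the configure-once guard.
# It is not Shiba itself.
module ConfigureOnce
  ALLOWED_KEYS = %w[server username database host password port].freeze

  def self.configure(options)
    # Already configured: keep the first settings and signal that nothing changed.
    return false if @connection_hash

    @connection_hash = options.select { |key, _value| ALLOWED_KEYS.include?(key) }
    true
  end

  def self.connection_hash
    @connection_hash
  end
end

first  = ConfigureOnce.configure('host' => 'db1', 'verbose' => true)
second = ConfigureOnce.configure('host' => 'db2')

puts "first call configured: #{first}"
puts "second call configured: #{second}"
puts "connection settings: #{ConfigureOnce.connection_hash.inspect}"
```

Returning a boolean lets a caller tell whether its options were applied or silently ignored.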