shiba 0.5.0 → 0.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 4c19ddf8bb56062725650ff02bede1ee4863ec0de0c6ab56ca420496eb35cc5b
-   data.tar.gz: 8ddfbc998e013cae6ecb5fc0b3ad05e7e76c2f086317cca14699099ffbdb7064
+   metadata.gz: f700a4b773d7a1b3c474ac6437c13ea77743dd6a9b95a8bde1b703d9914f6c20
+   data.tar.gz: 208f0d44f7ba356ad15340cb9ef48bb4ccfaba7050151cd12bcca26cd08b6bab
  SHA512:
-   metadata.gz: e73819cb77cdc7efdaf521d76a64a2dc0ad3e27d97260cc45c2e4c59ac021fac6d12ad501fac2c299455963e6cd6c7650366e91ec6f067c873a1ff97e8d397c4
-   data.tar.gz: 302a916584abf9fd32c210498639ab63739d953b569d5b57313bcde0baa563d5fbf66bba1dcff28f16282c3fb8f9e18b56d6187d689f77a378bafc13323a7f49
+   metadata.gz: a6d464627f14abd14753473965b4935582095dd5ba4b9187760710bffbe75a40edb9392899608f1c3d86d7911b2c0b73eafa8e01adafcf6370b044b191ed3e8e
+   data.tar.gz: 2417ce5680cf5468fe72a4750e30c964a184c7f943c9ce1e37efe738bf812197a1a9b07511e1a2541ede2c0834aa6d0896515add9937b132ba8af9e2b49a6188
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
-     shiba (0.5.0)
+     shiba (0.6.0)
        activesupport
        mysql2
        pg
data/README.md CHANGED
@@ -2,12 +2,9 @@

  # Shiba

- Shiba is a tool that helps catch poorly performing queries before they cause problems in production, including:
+ Shiba is a tool (currently in alpha) that automatically reviews SQL queries before they cause problems in production. It uses production statistics for realistic query analysis. It catches missing indexes, overly broad indexes, and queries that return too much data.

- * Full table scans
- * Poorly performing indexes
-
- By default, it will pretty much only detect queries that miss indexes. As it's fed more information, it warns about advanced problems, such as queries that use indexes but are still very expensive. To help find such queries, Shiba monitors test runs for ActiveRecord queries. A warning and report are then generated
+ ![screenshot](https://shiba-sql.com/wp-content/uploads/2019/03/shiba-screenshot-1024x581.png)

  ## Installation

@@ -27,8 +24,7 @@ require 'shiba/setup'

  ## Usage

- A report will only be generated when problem queries are detected.
- To verify shiba is actually running, you can run your tests with SHIBA_DEBUG=true.
+ To get started, try out shiba locally. To verify shiba is actually running, you can run your tests with SHIBA_DEBUG=true.

  ```ruby
  # Install
@@ -43,55 +39,15 @@ SHIBA_DEBUG=true ruby test/controllers/users_controller_test.rb
  # Report available at /tmp/shiba-explain.log-1550099512
  ```

- ### Screenshot
- `open /tmp/shiba-explain.log-1550099512`
- ![screenshot](/data/screenshot.png?raw=true)
-
+ ## Next steps
+ * [Integrate with Github pull requests](#automatic-pull-request-reviews)
+ * [Add production stats for realistic analysis](#going-beyond-table-scans)
+ * [Preview queries from the developer console](#analyze-queries-from-the-developer-console)
+ * [Read more about typical query problems](#typical-query-problems)

- ## Typical query problems
-
- Here are some typical query problems Shiba can detect. We'll assume the following schema:
-
- ```ruby
- create_table :users do |t|
-   t.string :name
-   t.string :email
-   # add an organization_id column with an index
-   t.references :organization, index: true

-   t.timestamps
- end
- ```

- #### Full table scans
-
- The most simple case to detect are queries that don't utilize indexes. While it isn't a problem to scan small tables, often tables will grow large enough where this can become a serious issue.
-
- ```ruby
- user = User.where(email: 'squirrel@example.com').limit(1)
- ```
-
- Without an index, the database will read every row in the table until it finds one with an email address that matches. By adding an index, the database can perform a quick lookup for the record.
-
- #### Non selective indexes
-
- Another common case is queries that use an index, and work fine in the average case, but the distribution is non normal. These issues can be hard to track down and often impact large customers.
-
- ```ruby
- users = User.where(organization_id: 1)
- users.size
- # => 75
-
- users = User.where(organization_id: 42)
- users.size
- # => 52,000
- ```
-
- Normally a query like this would only become a problem as the app grows in popularity. Fixes include adding `limit` or `find_each`.
-
- With more data, Shiba can help detect this issue when it appears in a pull request.
-
- ## Going beyond table scans
+ ### Going beyond table scans

  Without more information, Shiba acts as a simple missed index detector. To catch other problems that can bring down production (or at least cause some performance issues), Shiba requires general statistics about production data, such as the number of rows in a table and how unique columns are.

@@ -135,11 +91,18 @@ users:
  unique: false
  ```

- ## Automatic pull request reviews
+ ### Automatic pull request reviews
+
+ Shiba can automatically comment on Github pull requests when code changes appear to introduce a query issue. To do this, it will need the Github API token of a user that has access to the repo. Shiba's comments will appear to come from that user, so you'll likely want to setup a bot account on Github with repo access for this. The token can be generated on Github at https://github.com/settings/tokens.
+
+ Once the token is ready, you can integrate Shiba on your CI server by following these steps:
+ * [Travis CI](#travis-integration)
+ * [CircleCI](#circleci-integration)
+ * [Customized CI](#custom-ci-integration)

- Shiba can automatically comment on Github pull requests when code changes appear to introduce a query issue. The comments are similar to those in the query report dashboard. This guide will walk through setup on Travis CI, but other CI services should work in a similar fashion.
+ #### Travis Integration

- Once Shiba is installed, the `shiba review` command needs to be run after the tests are finished. On Travis, this goes in an after_script setting:
+ On Travis, add this to the after_script setting:

  ```yml
  # .travis.yml
@@ -147,12 +110,99 @@ after_script:
  - bundle exec shiba review --submit
  ```

- The `--submit` option tells Shiba to comment on the relevant PR when an issue is found. To do this, it will need the Github API token of a user that has access to the repo. Shiba's comments will appear to come from that user, so you'll likely want to setup a bot account on Github with repo access for this.
+ Add the Github API token you've generated as an environment variable named `GITHUB_TOKEN` at https://travis-ci.com/{organization}/{repo}/settings.

- By default, the review script looks for an environment variable named GITHUB_TOKEN that can be specified at https://travis-ci.com/{organization}/{repo}/settings. The token can be generated on Github at https://github.com/settings/tokens. If you have another environment variable name for your Github token, it can be manually configured using the `--token` flag.
+ #### CircleCI Integration

- ```yml
- # .travis.yml
- after_script:
- - bundle exec shiba review --token $MY_GITHUB_API_TOKEN --submit
- ```
+ To integrate with CircleCI, add this after the the test run step in `.circleci/config.yml`.
+
+ ```yml
+ # .circleci/config.yml
+ - run:
+     name: Review SQL queries
+     command: bundle exec shiba review --submit
+ ```
+
+ An environment variable named `GITHUB_TOKEN` will need to be configured on CircleCI under *Project settings > Environment Variables*
+
+ #### Custom CI Integration
+
+ To run on other servers, two steps are required:
+ 1. Ensure an environment variable named `CI` is set when the tests and shiba script are run.
+ 2. Run the `shiba review` command after tests are run, supplying the required arguments to `--submit, --token, --branch, and --pull-request`. For example:
+
+ ```bash
+ CI=true
+ export CI
+ rake test
+ bundle exec shiba review --submit --token $MY_GITHUB_TOKEN --branch $(git rev-parse HEAD) --pull-request $MY_PR_NUMBER
+ ```
+
+ The `--submit` option tells Shiba to comment on the relevant PR when an issue is found.
+
+
+ ### Analyze queries from the developer console
+
+ For quick analysis, queries can be analyzed from the Rails console.
+ ```ruby
+ # rails console
+ [1] pry(main)> require 'shiba/console'
+ => true
+ [2] pry(main)> shiba User.where(email: "squirrel@example.com")
+
+ Severity: high
+ ----------------------------
+ Fuzzed Data: Table sizes estimated as follows -- 100000: users
+ Table Scan: The database reads 100% (100000) of the of the rows in **users**, skipping any indexes.
+ Results: The database returns 100000 row(s) to the client.
+ Estimated query time: 3.02s
+
+ => #<Shiba::Console::ExplainRecord:0x00007ffc154e6128>: 'SELECT `users`.* FROM `users` WHERE `users`.`email` = 'squirrel@example.com''. Call the 'help' method on this object for more info.
+ [3] pry(main)>
+ ```
+
+ Raw query strings are also supported, e.g. `shiba "select * from users where users.email = 'squirrel@example.com'"`
+
+
+ ### Typical query problems
+
+ Here are some typical query problems Shiba can detect. We'll assume the following schema:
+
+ ```ruby
+ create_table :users do |t|
+   t.string :name
+   t.string :email
+   # add an organization_id column with an index
+   t.references :organization, index: true
+
+   t.timestamps
+ end
+ ```
+
+ #### Full table scans
+
+ The most simple case to detect are queries that don't utilize indexes. While it isn't a problem to scan small tables, often tables will grow large enough where this can become a serious issue.
+
+ ```ruby
+ user = User.where(email: 'squirrel@example.com').limit(1)
+ ```
+
+ Without an index, the database will read every row in the table until it finds one with an email address that matches. By adding an index, the database can perform a quick lookup for the record.
+
+ #### Non selective indexes
+
+ Another common case is queries that use an index, and work fine in the average case, but the distribution is non normal. These issues can be hard to track down and often impact large customers.
+
+ ```ruby
+ users = User.where(organization_id: 1)
+ users.size
+ # => 75
+
+ users = User.where(organization_id: 42)
+ users.size
+ # => 52,000
+ ```
+
+ Normally a query like this would only become a problem as the app grows in popularity. Fixes include adding `limit` or `find_each`.
+
+ With more data, Shiba can help detect this issue when it appears in a pull request.
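The "Non selective indexes" case in the README can be illustrated without a database. The following is a minimal sketch, not part of Shiba: the method name `index_skew` and the row counts are hypothetical, chosen only to mirror the README's example of one organization with 75 users and another with 52,000. It compares the typical (median) number of rows behind one index value with the worst case.

```ruby
# Hypothetical helper, not part of Shiba: summarizes how unevenly rows are
# distributed across the values of an indexed column.
def index_skew(rows_per_value)
  counts = rows_per_value.values.sort
  mid = counts.size / 2
  # Median of the per-value row counts (average of the two middle values
  # when the number of values is even).
  median = counts.size.odd? ? counts[mid].to_f : (counts[mid - 1] + counts[mid]) / 2.0
  worst_value, worst_count = rows_per_value.max_by { |_value, count| count }

  { median: median, worst_value: worst_value, worst_count: worst_count,
    ratio: worst_count / median }
end

# Invented row counts keyed by organization_id.
rows_per_org = { 1 => 75, 2 => 90, 3 => 120, 42 => 52_000 }

skew = index_skew(rows_per_org)
puts "median rows per organization: #{skew[:median]}"
puts "largest organization: #{skew[:worst_value]} with #{skew[:worst_count]} rows"
```

A large ratio between the worst case and the median suggests that a query using this index is cheap for most organizations and expensive for a few, which is the situation the README describes.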
data/bin/review CHANGED
@@ -2,166 +2,56 @@

  $LOAD_PATH << File.expand_path("../lib", File.dirname(__FILE__))
  require 'optionparser'
- require 'shiba/reviewer'
- require 'shiba/checker'
  require 'shiba/configure'
+ require 'shiba/reviewer'
+ require 'shiba/review/explain_diff'
+ require 'shiba/review/cli'
+ require 'json'

- options = {}
- parser = OptionParser.new do |opts|
-   opts.banner = "Review changes for query problems. Optionally submit the comments to a Github pull request."
-
-   opts.separator "Required:"
-
-   opts.on("-f","--file FILE", "The explain output log to compare with. Automatically configured when $CI environment variable is set") do |f|
-     options["file"] = f
-   end
-
-   opts.separator ""
-   opts.separator "Git diff options:"
-
-   opts.on("-b", "--branch GIT_BRANCH", "Compare to changes between origin/HEAD and BRANCH. Attempts to read from CI environment when not set.") do |b|
-     options["branch"] = b
-   end
-
-   opts.on("--staged", "Only check files that are staged for commit") do
-     options["staged"] = true
-   end
-
-   opts.on("--unstaged", "Only check files that are not staged for commit") do
-     options["unstaged"] = true
-   end
-
-   opts.separator ""
-   opts.separator "Github options:"
-
-   opts.on("--submit", "Submit comments to Github") do
-     options["submit"] = true
-   end
-
-   opts.on("-p", "--pull-request PR_ID", "The ID of the pull request to comment on. Attempts to read from CI environment when not set.") do |p|
-     options["pull_request"] = p
-   end
+ cli = Shiba::Review::CLI.new
+ cli.report_options("diff", "branch", "pull_request")

-   opts.on("-t", "--token TOKEN", "The Github API token to use for commenting. Defaults to $GITHUB_TOKEN.") do |t|
-     options["token"] = t
-   end
+ if !cli.valid?
+   $stderr.puts cli.failure
+   exit 1
+ end

-   opts.separator ""
-   opts.separator "Common options:"
+ explain_diff = Shiba::Review::ExplainDiff.new(cli.options["file"], cli.options)

-   opts.on("--verbose", "Verbose/debug mode") do
-     options["verbose"] = true
-   end
+ problems = if explain_diff.diff_requested_by_user?
+   result = explain_diff.result

-   opts.on_tail("-h", "--help", "Show this message") do
-     puts opts
-     exit
+   if result.message
+     $stderr.puts result.message
    end

-   opts.on_tail("--version", "Show version") do
-     require 'shiba/version'
-     puts Shiba::VERSION
+   if result.status == :pass
      exit
    end
- end
- parser.parse!
-
- # This is a noop since it's the default behavior. Ignore.
- if options["staged"] && options["unstaged"]
-   options.delete("staged")
-   options.delete("unstaged")
- end
-
-
- log = options["file"]
-
- if log.nil? && Shiba::Configure.ci?
-   log = options["file"] = File.join(Shiba.path, 'ci.json')
-   $stderr.puts "CI detected, setting file to #{log}" if options["verbose"]
- end
-
- if log.nil?
-   $stderr.puts "Provide an explain log, or run 'shiba explain' to generate one."
-   $stderr.puts ""
-   $stderr.puts parser
-   exit 1
- end
-
- if !File.exist?(log)
-   $stderr.puts "File not found: '#{log}'"
-   exit 1
- end
-
- pr_sha = ENV['TRAVIS_PULL_REQUEST_SHA'] || ENV['CIRCLE_BRANCH']
-
- if options["branch"] == nil && pr_sha && !pr_sha.empty?
-   options["branch"] = pr_sha
- end
-
- if options["token"] == nil
-   options["token"] = ENV['GITHUB_TOKEN']
- end
-
- # https://circleci.com/docs/2.0/env-vars/
- # This may be wrong for circle ci
- pr_id = ENV['TRAVIS_PULL_REQUEST'] || ENV['CIRCLE_PR_NUMBER']
-
- if options["pull_request"] == nil && pr_id && !pr_id.empty?
-   options["pull_request"] = pr_id
- end

- if options["verbose"]
-   $stderr.puts "DIFF: #{ENV['DIFF']}" if ENV['DIFF']
-   $stderr.puts "branch: #{options["branch"].inspect}" if options["branch"]
-   $stderr.puts "pull_request: #{options["pull_request"]}" if options["pull_request"]
+   explain_diff.problems
+ else
+   explains = File.open(cli.options["file"]).each_line.map { |json| JSON.parse(json) }
+   bad = explains.select { |explain| explain["severity"] && explain["severity"] != 'none' }
+   bad.map { |explain| [ "#{explain["sql"]}:-2", explain ] }
  end

  repo_cmd = "git config --get remote.origin.url"
  repo_url = `#{repo_cmd}`.chomp

- if options["verbose"]
+ if cli.options["verbose"]
    $stderr.puts "#{repo_cmd}\t#{repo_url}"
  end

- def require_option(parser, name)
-   $stderr.puts "Required: #{name}"
-   $stderr.puts ""
-   $stderr.puts parser
-   exit 1
- end
-
  if repo_url.empty?
    $stderr.puts "'#{Dir.pwd}' does not appear to be a git repo"
    exit 1
  end

- if options["submit"]
-   if (options["branch"].nil? || options["branch"].empty?) && (ENV['DIFF'].nil? || ENV['DIFF'].empty?)
-     require_option(parser, "branch")
-   end
-   require_option(parser, "token") if options["token"].nil?
-   require_option(parser, "pull_request") if options["pull_request"].nil?
- end
-
- if ENV['DIFF']
-   options['diff'] = ENV['DIFF']
- end
-
- # Check to see if the log overlaps with the git diff
- result = Shiba::Checker.new(options).run(log)
-
- if result.message
-   $stderr.puts result.message
- end
-
- if result.status == :pass
-   exit
- end
-
  # Generate comments for the problem queries
- reviewer = Shiba::Reviewer.new(repo_url, result.problems, options)
+ reviewer = Shiba::Reviewer.new(repo_url, problems, cli.options)

- if !options["submit"] || options["verbose"]
+ if !cli.options["submit"] || cli.options["verbose"]
    reviewer.comments.each do |c|
      puts "#{c[:path]}:#{c[:line]} (#{c[:position]})"
      puts c[:body]
@@ -169,7 +59,7 @@ if !options["submit"] || options["verbose"]
    end
  end

- if options["submit"]
+ if cli.options["submit"]
    if reviewer.repo_host.empty? || reviewer.repo_path.empty?
      $stderr.puts "Invalid repo url '#{repo_url}' from git config --get remote.origin.url"
      exit 1
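The new `else` branch in `bin/review` reads the explain log as one JSON document per line and keeps only entries whose severity is set and not `'none'`. The sketch below reproduces that filtering in isolation so it can be run without Shiba. It assumes the same `"sql"` and `"severity"` keys the diff reads; the method name `problem_queries` and the log contents are invented for illustration, and a `StringIO` stands in for the file. The `:-2` suffix is copied from the diff as-is, since its meaning is not stated there.

```ruby
require 'json'
require 'stringio'

# Hypothetical explain log with invented queries, one JSON document per line.
log = StringIO.new(<<~LOG)
  {"sql": "SELECT * FROM users WHERE email = 'a@example.com'", "severity": "high"}
  {"sql": "SELECT * FROM users WHERE id = 1", "severity": "none"}
  {"sql": "SELECT 1"}
LOG

# Mirrors the else branch in bin/review: parse each line, drop entries with
# no severity or a severity of 'none', and key the rest by "<sql>:-2".
def problem_queries(io)
  explains = io.each_line.map { |json| JSON.parse(json) }
  bad = explains.select { |explain| explain["severity"] && explain["severity"] != 'none' }
  bad.map { |explain| ["#{explain["sql"]}:-2", explain] }
end

problems = problem_queries(log)
problems.each { |key, explain| puts "#{explain["severity"]}\t#{key}" }
```

Unlike the script in the diff, which calls `File.open` without a block, a file-based version of this sketch could use `File.foreach(path)` so the handle is closed once reading finishes.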
data/lib/shiba.rb CHANGED
@@ -8,13 +8,17 @@ require "byebug" if ENV['SHIBA_DEBUG']
  module Shiba
    class Error < StandardError; end
    class ConfigError < StandardError; end
+   TEMPLATE_FILE = File.join(File.dirname(__dir__), 'lib/shiba/output/tags.yaml')

    def self.configure(options)
+     return false if @connection_hash
+
      configure_mysql_defaults(options)

      @connection_hash = options.select { |k, v| [ 'default_file', 'default_group', 'server', 'username', 'database', 'host', 'password', 'port'].include?(k) }
      @main_config = Configure.read_config_file(options['config'], "config/shiba.yml")
      @index_config = Configure.read_config_file(options['index'], "config/shiba_index.yml")
+     true
    end

    def self.configure_mysql_defaults(options)
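The change to `Shiba.configure` above adds a guard so that a second call returns `false` instead of overwriting the stored connection settings, while the first call returns `true`. A minimal sketch of that configure-once pattern follows. `ExampleConfig`, its reduced key list, and the option values are hypothetical stand-ins, not Shiba's own module or configuration.

```ruby
# Hypothetical module illustrating the configure-once guard from the diff.
module ExampleConfig
  ALLOWED_KEYS = %w[host username database].freeze

  def self.configure(options)
    # A previous call already stored settings: report that nothing changed.
    return false if @connection_hash

    @connection_hash = options.select { |k, _v| ALLOWED_KEYS.include?(k) }
    true
  end

  def self.connection_hash
    @connection_hash
  end
end

first  = ExampleConfig.configure('host' => 'db1', 'verbose' => true)
second = ExampleConfig.configure('host' => 'db2')

puts "first call configured: #{first}"
puts "second call configured: #{second}"
puts "stored settings: #{ExampleConfig.connection_hash.inspect}"
```

Returning a boolean lets callers tell whether their options took effect, which matters when several entry points (for example a test helper and a console helper) may each try to configure the library.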