pgdexter 0.1.1 → 0.1.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: a9dd13cbe8dc39e26b90bdc13abd16201ccec455
- data.tar.gz: bc4693ab0595c9dc1e38b720c294b8e600d597c5
+ metadata.gz: b472c8289c8c878c0ea790a3ec5fdf43e22bfaa1
+ data.tar.gz: 55f96eb8bd0a8739c79fb7c749a8aee68b2a119f
  SHA512:
- metadata.gz: 5f96e60ac1660786dd0f059ec0e88a4f017c056f73eb0be07bb6d0d9ad55332a3c36c72cd9fb7f8b3a5ed370254b361f29979baa031850d93800ce8ad74775d2
- data.tar.gz: f6b5a5a8487ea44eb38a4aca3ec59aaaea6c8cdd2c3e40267203fa29b4999d03a507fbe8bbe7e4207a5110c7bcc1e649dba624b4eb11e859be1f432e705095a8
+ metadata.gz: 873d3d0f9471f90c9b90721274623c6c3dcfbcfe88989143a6b66a5ce1345a2187bab75ef4f32f52efd9f34a1e1806524696854348e592d6eb31bec25ca193c8
+ data.tar.gz: a545b2b1a9d2f6650ec04c1769b243863c412fbff3b1e0575ab4cc6c98648119e487a2bc75f0afeeb704895184580ada13169a05edb3e00bc0caca37759183bc
data/CHANGELOG.md ADDED
@@ -0,0 +1,14 @@
+ ## 0.1.2
+
+ - Added `--exclude` option
+ - Added `--log-sql` option
+
+ ## 0.1.1
+
+ - Added `--interval` option
+ - Added `--min-time` option
+ - Added `--log-level` option
+
+ ## 0.1.0
+
+ - Launched
data/README.md CHANGED
@@ -1,6 +1,8 @@
  # Dexter

- An automatic indexer for Postgres
+ The automatic indexer for Postgres
+
+ [Read about how it works](https://medium.com/@ankane/introducing-dexter-the-automatic-indexer-for-postgres-5f8fa8b28f27)

  ## Installation

@@ -22,7 +24,7 @@ Enable logging for slow queries.
  log_min_duration_statement = 10 # ms
  ```

- And install with:
+ And install the command line tool with:

  ```sh
  gem install pgdexter
@@ -50,12 +52,31 @@ This finds slow queries and generates output like:
  2017-06-25T17:53:22+00:00 Processing 12 new query fingerprints
  ```

- To be safe, Dexter will not create indexes unless you pass the `--create` flag.
+ To be safe, Dexter will not create indexes unless you pass the `--create` flag. In this case, you’ll see:
+
+ ```log
+ 2017-06-25T17:52:22+00:00 Index found: ratings (user_id)
+ 2017-06-25T17:52:22+00:00 Creating index: CREATE INDEX CONCURRENTLY ON ratings (user_id)
+ 2017-06-25T17:52:37+00:00 Index created: 15243 ms
+ ```

  ## Options

- - `--interval` - time to wait between processing queries
- - `--min-time` - only consider queries that have consumed a certain amount of DB time (in minutes)
+ Name | Description | Default
+ --- | --- | ---
+ exclude | prevent specific tables from being indexed | None
+ interval | time to wait between processing queries, in seconds | 60
+ log-level | `debug` gives additional info for suggested indexes<br />`debug2` gives additional info for all processed queries | info
+ log-sql | log SQL statements executed | false
+ min-time | only process queries consuming a min amount of DB time, in minutes | 0
+
+ ## Future Work
+
+ [Here are some ideas](https://github.com/ankane/dexter/issues/1)
+
+ ## Thanks
+
+ This software wouldn’t be possible without [HypoPG](https://github.com/dalibo/hypopg), which allows you to create hypothetical indexes, and [pg_query](https://github.com/lfittl/pg_query), which allows you to parse and fingerprint queries. A big thanks to Dalibo and Lukas Fittl respectively.

  ## Contributing

data/lib/dexter.rb CHANGED
@@ -5,49 +5,10 @@ require "pg_query"
  require "time"
  require "set"
  require "thread"
+ require "dexter/logging"
+ require "dexter/client"
+ require "dexter/collector"
  require "dexter/indexer"
  require "dexter/log_parser"
-
- module Dexter
- class Client
- attr_reader :arguments, :options
-
- def initialize(args)
- @arguments, @options = parse_args(args)
- end
-
- def perform
- abort "Missing database url" if arguments.empty?
- abort "Too many arguments" if arguments.size > 2
-
- # get queries
- queries = []
- if options[:s]
- queries << options[:s]
- Indexer.new(self).process_queries(queries)
- end
- if arguments[1]
- begin
- LogParser.new(arguments[1], self).perform
- rescue Errno::ENOENT
- abort "Log file not found"
- end
- end
- if !options[:s] && !arguments[1]
- LogParser.new(STDIN, self).perform
- end
- end
-
- def parse_args(args)
- opts = Slop.parse(args) do |o|
- o.boolean "--create", default: false
- o.string "-s"
- o.float "--min-time", default: 0
- o.integer "--interval", default: 60
- end
- [opts.arguments, opts.to_hash]
- rescue Slop::Error => e
- abort e.message
- end
- end
- end
+ require "dexter/processor"
+ require "dexter/query"
data/lib/dexter/client.rb ADDED
@@ -0,0 +1,67 @@
+ module Dexter
+ class Client
+ attr_reader :arguments, :options
+
+ def initialize(args)
+ @arguments, @options = parse_args(args)
+ end
+
+ def perform
+ STDOUT.sync = true
+ STDERR.sync = true
+
+ if options[:statement]
+ fingerprint = PgQuery.fingerprint(options[:statement]) rescue "unknown"
+ query = Query.new(options[:statement], fingerprint)
+ Indexer.new(arguments[0], options).process_queries([query])
+ elsif arguments[1]
+ Processor.new(arguments[0], arguments[1], options).perform
+ else
+ Processor.new(arguments[0], STDIN, options).perform
+ end
+ end
+
+ def parse_args(args)
+ opts = Slop.parse(args) do |o|
+ o.banner = %(Usage:
+ dexter <database-url> [options]
+
+ Options:)
+ o.boolean "--create", "create indexes", default: false
+ o.array "--exclude", "prevent specific tables from being indexed"
+ o.integer "--interval", "time to wait between processing queries, in seconds", default: 60
+ o.float "--min-time", "only process queries that have consumed a certain amount of DB time, in minutes", default: 0
+ o.string "--log-level", "log level", default: "info"
+ o.boolean "--log-sql", "log sql", default: false
+ o.string "-s", "--statement", "process a single statement"
+ o.on "-v", "--version", "print the version" do
+ log Dexter::VERSION
+ exit
+ end
+ o.on "-h", "--help", "prints help" do
+ log o
+ exit
+ end
+ end
+
+ arguments = opts.arguments
+
+ if arguments.empty?
+ log opts
+ exit
+ end
+
+ abort "Too many arguments" if arguments.size > 2
+
+ abort "Unknown log level" unless ["info", "debug", "debug2"].include?(opts.to_hash[:log_level].to_s.downcase)
+
+ [arguments, opts.to_hash]
+ rescue Slop::Error => e
+ abort e.message
+ end
+
+ def log(message)
+ $stderr.puts message
+ end
+ end
+ end
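For comparison, the flag handling added in `parse_args` above can be sketched with Ruby's stdlib `OptionParser` instead of the Slop gem Dexter actually uses. The flag names and defaults mirror the diff; the rest is illustrative only.

```ruby
# Hypothetical sketch: Dexter uses Slop (see the diff above); this
# stdlib OptionParser version mirrors the same flags and defaults.
require "optparse"

def parse_args(args)
  options = {create: false, exclude: [], interval: 60, min_time: 0.0, log_level: "info", log_sql: false}
  parser = OptionParser.new do |o|
    o.banner = "Usage: dexter <database-url> [options]"
    o.on("--create", "create indexes") { options[:create] = true }
    o.on("--exclude TABLES", Array, "prevent specific tables from being indexed") { |v| options[:exclude] = v }
    o.on("--interval SECONDS", Integer, "time to wait between processing queries") { |v| options[:interval] = v }
    o.on("--min-time MINUTES", Float, "only process queries above this amount of DB time") { |v| options[:min_time] = v }
    o.on("--log-level LEVEL", "log level") { |v| options[:log_level] = v }
    o.on("--log-sql", "log sql") { options[:log_sql] = true }
  end
  arguments = parser.parse(args) # returns the non-option arguments
  abort "Too many arguments" if arguments.size > 2
  abort "Unknown log level" unless %w[info debug debug2].include?(options[:log_level].downcase)
  [arguments, options]
end
```

Like the diff's version, it returns the positional arguments (database URL and optional log file) separately from the parsed options.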
data/lib/dexter/collector.rb ADDED
@@ -0,0 +1,47 @@
+ module Dexter
+ class Collector
+ def initialize(options = {})
+ @top_queries = {}
+ @new_queries = Set.new
+ @mutex = Mutex.new
+ @min_time = options[:min_time] * 60000 # convert minutes to ms
+ end
+
+ def add(query, duration)
+ fingerprint =
+ begin
+ PgQuery.fingerprint(query)
+ rescue PgQuery::ParseError
+ # do nothing
+ end
+
+ return unless fingerprint
+
+ @top_queries[fingerprint] ||= {calls: 0, total_time: 0}
+ @top_queries[fingerprint][:calls] += 1
+ @top_queries[fingerprint][:total_time] += duration
+ @top_queries[fingerprint][:query] = query
+ @mutex.synchronize do
+ @new_queries << fingerprint
+ end
+ end
+
+ def fetch_queries
+ new_queries = nil
+
+ @mutex.synchronize do
+ new_queries = @new_queries.dup
+ @new_queries.clear
+ end
+
+ queries = []
+ @top_queries.each do |k, v|
+ if new_queries.include?(k) && v[:total_time] > @min_time
+ queries << Query.new(v[:query], k)
+ end
+ end
+
+ queries.sort_by(&:fingerprint)
+ end
+ end
+ end
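The Collector above groups queries by fingerprint, accumulates total duration, and drains only the fingerprints seen since the last fetch. A dependency-free sketch of that logic, with a naive regex normalizer standing in for `PgQuery.fingerprint` (everything else follows the diff):

```ruby
# Simplified sketch of the Collector: the downcase/digit-masking
# normalizer is a stand-in for PgQuery.fingerprint, which Dexter
# actually uses; min_time is in milliseconds here for brevity.
require "set"

class MiniCollector
  def initialize(min_time_ms: 0)
    @top_queries = {}
    @new_queries = Set.new
    @min_time = min_time_ms
  end

  def add(query, duration_ms)
    fingerprint = query.downcase.gsub(/\d+/, "?") # stand-in for PgQuery.fingerprint
    stats = (@top_queries[fingerprint] ||= {calls: 0, total_time: 0.0, query: query})
    stats[:calls] += 1
    stats[:total_time] += duration_ms
    @new_queries << fingerprint
  end

  # return queries over the time threshold, then clear the "new" set
  def fetch_queries
    fresh = @new_queries.dup
    @new_queries.clear
    @top_queries.select { |k, v| fresh.include?(k) && v[:total_time] > @min_time }
                .map { |_, v| v[:query] }
  end
end
```

Two queries that differ only in literals share a fingerprint, so their durations accumulate toward the threshold, and a second `fetch_queries` returns nothing until new log entries arrive.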
data/lib/dexter/indexer.rb CHANGED
@@ -1,36 +1,100 @@
  module Dexter
  class Indexer
- attr_reader :client
+ include Logging

- def initialize(client)
- @client = client
+ def initialize(database_url, options)
+ @database_url = database_url
+ @create = options[:create]
+ @log_level = options[:log_level]
+ @exclude_tables = options[:exclude]
+ @log_sql = options[:log_sql]

- select_all("SET client_min_messages = warning")
- select_all("CREATE EXTENSION IF NOT EXISTS hypopg")
+ create_extension
  end

  def process_queries(queries)
- # narrow down queries and tables
- tables, queries = narrow_queries(queries)
- return [] if tables.empty?
+ # reset hypothetical indexes
+ reset_hypothetical_indexes
+
+ # filter queries from other databases and system tables
+ tables = possible_tables(queries)
+ queries.each do |query|
+ query.missing_tables = !query.tables.all? { |t| tables.include?(t) }
+ end
+
+ # exclude user specified tables
+ # TODO exclude write-heavy tables
+ @exclude_tables.each do |table|
+ tables.delete(table)
+ end
+
+ # analyze tables if needed
+ analyze_tables(tables) if tables.any?
+
+ # get initial costs for queries
+ calculate_initial_cost(queries.reject(&:missing_tables))
+
+ # create hypothetical indexes
+ candidates = tables.any? ? create_hypothetical_indexes(tables) : {}
+
+ # get new costs and see if new indexes were used
+ new_indexes = determine_indexes(queries, candidates)
+
+ # display and create new indexes
+ show_and_create_indexes(new_indexes)
+ end
+
+ private

- # get ready for hypothetical indexes
+ def create_extension
+ select_all("SET client_min_messages = warning")
+ select_all("CREATE EXTENSION IF NOT EXISTS hypopg")
+ end
+
+ def reset_hypothetical_indexes
  select_all("SELECT hypopg_reset()")
+ end

- # ensure tables have recently been analyzed
- analyze_tables(tables)
+ def analyze_tables(tables)
+ tables = tables.to_a.sort
+
+ analyze_stats = select_all <<-SQL
+ SELECT
+ schemaname AS schema,
+ relname AS table,
+ last_analyze,
+ last_autoanalyze
+ FROM
+ pg_stat_user_tables
+ WHERE
+ relname IN (#{tables.map { |t| quote(t) }.join(", ")})
+ SQL
+
+ last_analyzed = {}
+ analyze_stats.each do |stats|
+ last_analyzed[stats["table"]] = Time.parse(stats["last_analyze"]) if stats["last_analyze"]
+ end

- # get initial plans
- initial_plans = {}
+ tables.each do |table|
+ if !last_analyzed[table] || last_analyzed[table] < Time.now - 3600
+ statement = "ANALYZE #{quote_ident(table)}"
+ log "Running analyze: #{statement}"
+ select_all(statement)
+ end
+ end
+ end
+
+ def calculate_initial_cost(queries)
  queries.each do |query|
  begin
- initial_plans[query] = plan(query)
+ query.initial_cost = plan(query.statement)["Total Cost"]
  rescue PG::Error
  # do nothing
  end
  end
- queries.select! { |q| initial_plans[q] }
+ end

+ def create_hypothetical_indexes(tables)
  # get existing indexes
  index_set = Set.new
  indexes(tables).each do |index|
@@ -45,60 +109,96 @@ module Dexter
  candidates[col] = select_all("SELECT * FROM hypopg_create_index('CREATE INDEX ON #{col[:table]} (#{[col[:column]].join(", ")})');").first["indexname"]
  end
  end
+ candidates
+ end

- queries_by_index = {}
+ def determine_indexes(queries, candidates)
+ new_indexes = {}

- new_indexes = []
  queries.each do |query|
- starting_cost = initial_plans[query]["Total Cost"]
- plan2 = plan(query)
- cost2 = plan2["Total Cost"]
- best_indexes = []
-
- candidates.each do |col, index_name|
- if plan2.inspect.include?(index_name) && cost2 < starting_cost * 0.5
- best_indexes << {
- table: col[:table],
- columns: [col[:column]]
- }
- (queries_by_index[best_indexes.last] ||= []) << {
- starting_cost: starting_cost,
- final_cost: cost2,
- query: query
- }
+ if query.initial_cost
+ new_plan = plan(query.statement)
+ query.new_cost = new_plan["Total Cost"]
+ cost_savings = query.new_cost < query.initial_cost * 0.5
+
+ query_indexes = []
+ candidates.each do |col, index_name|
+ if new_plan.inspect.include?(index_name)
+ index = {
+ table: col[:table],
+ columns: [col[:column]]
+ }
+ query_indexes << index
+
+ if cost_savings
+ new_indexes[index] ||= index.dup
+ (new_indexes[index][:queries] ||= []) << query
+ end
+ end
  end
  end

- new_indexes.concat(best_indexes)
+ if @log_level == "debug2"
+ log "Processed #{query.fingerprint}"
+ if query.initial_cost
+ log "Cost: #{query.initial_cost} -> #{query.new_cost}"
+
+ if query_indexes.any?
+ log "Indexes: #{query_indexes.map { |i| "#{i[:table]} (#{i[:columns].join(", ")})" }.join(", ")}"
+ log "Need 50% cost savings to suggest index" unless cost_savings
+ else
+ log "Indexes: None"
+ end
+ elsif query.fingerprint == "unknown"
+ log "Could not parse query"
+ elsif query.tables.empty?
+ log "No tables"
+ elsif query.missing_tables
+ log "Tables not present in current database"
+ else
+ log "Could not run explain"
+ end
+
+ puts
+ puts query.statement
+ puts
+ end
  end

- new_indexes = new_indexes.uniq.sort_by(&:to_a)
+ new_indexes.values.sort_by(&:to_a)
+ end

- # create indexes
+ def show_and_create_indexes(new_indexes)
  if new_indexes.any?
  new_indexes.each do |index|
- index[:queries] = queries_by_index[index]
-
  log "Index found: #{index[:table]} (#{index[:columns].join(", ")})"
- # log "CREATE INDEX CONCURRENTLY ON #{index[:table]} (#{index[:columns].join(", ")});"
- # index[:queries].sort_by { |q| fingerprints[q[:query]] }.each do |query|
- # log "Query #{fingerprints[query[:query]]} (Cost: #{query[:starting_cost]} -> #{query[:final_cost]})"
- # puts
- # puts query[:query]
- # puts
- # end
+
+ if @log_level.start_with?("debug")
+ index[:queries].sort_by(&:fingerprint).each do |query|
+ log "Query #{query.fingerprint} (Cost: #{query.initial_cost} -> #{query.new_cost})"
+ puts
+ puts query.statement
+ puts
+ end
+ end
  end

- new_indexes.each do |index|
- statement = "CREATE INDEX CONCURRENTLY ON #{index[:table]} (#{index[:columns].join(", ")})"
- # puts "#{statement};"
- if client.options[:create]
+ if @create
+ # TODO use advisory locks
+ # 1. create lock
+ # 2. refresh existing index list
+ # 3. create indexes that still don't exist
+ # 4. release lock
+ new_indexes.each do |index|
+ statement = "CREATE INDEX CONCURRENTLY ON #{index[:table]} (#{index[:columns].join(", ")})"
  log "Creating index: #{statement}"
  started_at = Time.now
  select_all(statement)
  log "Index created: #{((Time.now - started_at) * 1000).to_i} ms"
  end
  end
+ else
+ log "No indexes found"
  end

  new_indexes
@@ -106,7 +206,7 @@ module Dexter

  def conn
  @conn ||= begin
- uri = URI.parse(client.arguments[0])
+ uri = URI.parse(@database_url)
  config = {
  host: uri.host,
  port: uri.port,
@@ -122,14 +222,24 @@ module Dexter
  end

  def select_all(query)
- conn.exec(query).to_a
+ # use exec_params instead of exec for security
+ #
+ # Unlike PQexec, PQexecParams allows at most one SQL command in the given string.
+ # (There can be semicolons in it, but not more than one nonempty command.)
+ # This is a limitation of the underlying protocol, but has some usefulness
+ # as an extra defense against SQL-injection attacks.
+ # https://www.postgresql.org/docs/current/static/libpq-exec.html
+ query = squish(query)
+ log "SQL: #{query}" if @log_sql
+ conn.exec_params(query, []).to_a
  end

  def plan(query)
- JSON.parse(select_all("EXPLAIN (FORMAT JSON) #{query}").first["QUERY PLAN"]).first["Plan"]
+ # strip semi-colons as another measure of defense
+ JSON.parse(select_all("EXPLAIN (FORMAT JSON) #{query.gsub(";", "")}").first["QUERY PLAN"]).first["Plan"]
  end

- def narrow_queries(queries)
+ def database_tables
  result = select_all <<-SQL
  SELECT
  table_name
@@ -139,11 +249,11 @@ module Dexter
  table_catalog = current_database() AND
  table_schema NOT IN ('pg_catalog', 'information_schema')
  SQL
- possible_tables = Set.new(result.map { |r| r["table_name"] })
-
- tables = queries.flat_map { |q| PgQuery.parse(q).tables }.uniq.select { |t| possible_tables.include?(t) }
+ result.map { |r| r["table_name"] }
+ end

- [tables, queries.select { |q| PgQuery.parse(q).tables.all? { |t| possible_tables.include?(t) } }]
+ def possible_tables(queries)
+ Set.new(queries.flat_map(&:tables).uniq & database_tables)
  end

  def columns(tables)
@@ -168,13 +278,7 @@ module Dexter
  t.relname AS table,
  ix.relname AS name,
  regexp_replace(pg_get_indexdef(i.indexrelid), '^[^\\(]*\\((.*)\\)$', '\\1') AS columns,
- regexp_replace(pg_get_indexdef(i.indexrelid), '.* USING ([^ ]*) \\(.*', '\\1') AS using,
- indisunique AS unique,
- indisprimary AS primary,
- indisvalid AS valid,
- indexprs::text,
- indpred::text,
- pg_get_indexdef(i.indexrelid) AS definition
+ regexp_replace(pg_get_indexdef(i.indexrelid), '.* USING ([^ ]*) \\(.*', '\\1') AS using
  FROM
  pg_index i
  INNER JOIN
@@ -203,30 +307,8 @@ module Dexter
  end
  end

- def analyze_tables(tables)
- analyze_stats = select_all <<-SQL
- SELECT
- schemaname AS schema,
- relname AS table,
- last_analyze,
- last_autoanalyze
- FROM
- pg_stat_user_tables
- WHERE
- relname IN (#{tables.map { |t| quote(t) }.join(", ")})
- SQL
-
- last_analyzed = {}
- analyze_stats.each do |stats|
- last_analyzed[stats["table"]] = Time.parse(stats["last_analyze"]) if stats["last_analyze"]
- end
-
- tables.each do |table|
- if !last_analyzed[table] || last_analyzed[table] < Time.now - 3600
- log "Analyzing #{table}"
- select_all("ANALYZE #{table}")
- end
- end
+ def quote_ident(value)
+ conn.quote_ident(value)
  end

  def quote(value)
@@ -237,13 +319,14 @@ module Dexter
  end
  end

- # activerecord
+ # from activerecord
  def quote_string(s)
  s.gsub(/\\/, '\&\&').gsub(/'/, "''")
  end

- def log(message)
- puts "#{Time.now.iso8601} #{message}"
+ # from activesupport
+ def squish(str)
+ str.to_s.gsub(/\A[[:space:]]+/, "").gsub(/[[:space:]]+\z/, "").gsub(/[[:space:]]+/, " ")
  end
  end
  end
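The `squish` helper added above (borrowed from Active Support) collapses the indexer's multi-line heredoc SQL into a single line before it is logged with `--log-sql` and sent via `exec_params`. Copied verbatim from the diff to show its behavior:

```ruby
# squish, exactly as added in the diff above (from activesupport):
# trims leading/trailing whitespace and collapses internal runs
# (spaces, tabs, newlines) to a single space.
def squish(str)
  str.to_s.gsub(/\A[[:space:]]+/, "").gsub(/[[:space:]]+\z/, "").gsub(/[[:space:]]+/, " ")
end

sql = <<-SQL
  SELECT
    table_name
  FROM
    information_schema.tables
SQL

puts squish(sql) # one-line SQL, as it appears in --log-sql output
```

One-lining the statement also keeps the `exec_params` single-command restriction easy to reason about: the whole query travels as one compact string.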
data/lib/dexter/log_parser.rb CHANGED
@@ -1,32 +1,11 @@
  module Dexter
  class LogParser
  REGEX = /duration: (\d+\.\d+) ms (statement|execute <unnamed>): (.+)/
+ LINE_SEPERATOR = ": ".freeze

- def initialize(logfile, client)
+ def initialize(logfile, collector)
  @logfile = logfile
- @min_time = client.options[:min_time] * 60000 # convert minutes to ms
- @top_queries = {}
- @indexer = Indexer.new(client)
- @new_queries = Set.new
- @new_queries_mutex = Mutex.new
- @process_queries_mutex = Mutex.new
- @last_checked_at = {}
-
- log "Started"
-
- if @logfile == STDIN
- Thread.abort_on_exception = true
-
- @timer_thread = Thread.new do
- sleep(3) # starting sleep
- loop do
- @process_queries_mutex.synchronize do
- process_queries
- end
- sleep(client.options[:interval])
- end
- end
- end
+ @collector = collector
  end

  def perform
@@ -35,10 +14,9 @@ module Dexter

  each_line do |line|
  if active_line
- if line.include?(": ")
+ if line.include?(LINE_SEPERATOR)
  process_entry(active_line, duration)
  active_line = nil
- duration = nil
  else
  active_line << line
  end
@@ -47,15 +25,9 @@ module Dexter
  if !active_line && m = REGEX.match(line.chomp)
  duration = m[1].to_f
  active_line = m[3]
- else
- # skip
  end
  end
  process_entry(active_line, duration) if active_line
-
- @process_queries_mutex.synchronize do
- process_queries
- end
  end

  private
@@ -66,59 +38,18 @@ module Dexter
  yield line
  end
  else
- File.foreach(@logfile) do |line|
- yield line
- end
- end
- end
-
- def process_entry(query, duration)
- return unless query =~ /SELECT/i
- fingerprint =
  begin
- PgQuery.fingerprint(query)
- rescue PgQuery::ParseError
- # do nothing
- end
- return unless fingerprint
-
- @top_queries[fingerprint] ||= {calls: 0, total_time: 0}
- @top_queries[fingerprint][:calls] += 1
- @top_queries[fingerprint][:total_time] += duration
- @top_queries[fingerprint][:query] = query
- @new_queries_mutex.synchronize do
- @new_queries << fingerprint
- end
- end
-
- def process_queries
- new_queries = nil
-
- @new_queries_mutex.synchronize do
- new_queries = @new_queries.dup
- @new_queries.clear
- end
-
- now = Time.now
- min_checked_at = now - 3600 # don't recheck for an hour
- queries = []
- fingerprints = {}
- @top_queries.each do |k, v|
- if new_queries.include?(k) && v[:total_time] > @min_time && (!@last_checked_at[k] || @last_checked_at[k] < min_checked_at)
- fingerprints[v[:query]] = k
- queries << v[:query]
- @last_checked_at[k] = now
+ File.foreach(@logfile) do |line|
+ yield line
+ end
+ end
+ rescue Errno::ENOENT
+ abort "Log file not found"
  end
  end
-
- log "Processing #{queries.size} new query fingerprints"
- if queries.any?
- @indexer.process_queries(queries)
- end
  end

- def log(message)
- puts "#{Time.now.iso8601} #{message}"
+ def process_entry(query, duration)
+ @collector.add(query, duration) if query =~ /SELECT/i
  end
  end
  end
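The `REGEX` above pulls the duration and statement out of log lines produced by `log_min_duration_statement`. A sketch applying it to a made-up log line (spacing simplified to a single space between `ms` and `statement` so it matches the regex as written):

```ruby
# The REGEX from the diff above, applied to a hypothetical log line.
REGEX = /duration: (\d+\.\d+) ms (statement|execute <unnamed>): (.+)/

line = 'LOG:  duration: 74.339 ms statement: SELECT * FROM ratings WHERE user_id = 1'
m = REGEX.match(line.chomp)

duration = m[1].to_f  # milliseconds, as a Float
statement = m[3]      # the SQL that LogParser hands to the Collector
```

Continuation lines (a multi-line statement) contain no `": "` separator, which is exactly the `LINE_SEPERATOR` check `perform` uses to decide whether to append to `active_line` or flush the entry.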
data/lib/dexter/logging.rb ADDED
@@ -0,0 +1,7 @@
+ module Dexter
+ module Logging
+ def log(message)
+ puts "#{Time.now.iso8601} #{message}"
+ end
+ end
+ end
data/lib/dexter/processor.rb ADDED
@@ -0,0 +1,61 @@
+ module Dexter
+ class Processor
+ include Logging
+
+ def initialize(database_url, logfile, options)
+ log "Started"
+
+ @logfile = logfile
+
+ @collector = Collector.new(min_time: options[:min_time])
+ @log_parser = LogParser.new(logfile, @collector)
+ @indexer = Indexer.new(database_url, options)
+
+ @starting_interval = 3
+ @interval = options[:interval]
+
+ @mutex = Mutex.new
+ @last_checked_at = {}
+ end
+
+ def perform
+ if @logfile == STDIN
+ Thread.abort_on_exception = true
+ Thread.new do
+ sleep(@starting_interval)
+ loop do
+ process_queries
+ sleep(@interval)
+ end
+ end
+ end
+
+ @log_parser.perform
+
+ process_queries
+ end
+
+ private
+
+ def process_queries
+ @mutex.synchronize do
+ process_queries_without_lock
+ end
+ end
+
+ def process_queries_without_lock
+ now = Time.now
+ min_checked_at = now - 3600 # don't recheck for an hour
+ queries = []
+ @collector.fetch_queries.each do |query|
+ if !@last_checked_at[query.fingerprint] || @last_checked_at[query.fingerprint] < min_checked_at
+ queries << query
+ @last_checked_at[query.fingerprint] = now
+ end
+ end
+
+ log "Processing #{queries.size} new query fingerprints"
+ @indexer.process_queries(queries) if queries.any?
+ end
+ end
+ end
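The hourly re-check throttle in `process_queries_without_lock` above can be isolated as a small sketch (names here are illustrative, not from the gem):

```ruby
# Sketch of the throttle: a fingerprint is only re-processed if it
# hasn't been checked within the last hour (3600 seconds).
last_checked_at = {}

due_for_check = lambda do |fingerprint, now|
  min_checked_at = now - 3600 # don't recheck for an hour
  if !last_checked_at[fingerprint] || last_checked_at[fingerprint] < min_checked_at
    last_checked_at[fingerprint] = now
    true
  else
    false
  end
end

t0 = Time.now
due_for_check.call("abc123", t0)      # first sighting: due for processing
due_for_check.call("abc123", t0 + 60) # a minute later: skipped
```

Since the check also records the timestamp, a fingerprint that keeps appearing in the log is forwarded to the indexer at most once per hour.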
data/lib/dexter/query.rb ADDED
@@ -0,0 +1,15 @@
+ module Dexter
+ class Query
+ attr_reader :statement, :fingerprint
+ attr_accessor :initial_cost, :new_cost, :missing_tables
+
+ def initialize(statement, fingerprint)
+ @statement = statement
+ @fingerprint = fingerprint
+ end
+
+ def tables
+ @tables ||= PgQuery.parse(statement).tables rescue []
+ end
+ end
+ end
data/lib/dexter/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Dexter
- VERSION = "0.1.1"
+ VERSION = "0.1.2"
  end
data/pgdexter.gemspec CHANGED
@@ -1,4 +1,5 @@
  # coding: utf-8
+
  lib = File.expand_path("../lib", __FILE__)
  $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
  require "dexter/version"
@@ -9,7 +10,7 @@ Gem::Specification.new do |spec|
  spec.authors = ["Andrew Kane"]
  spec.email = ["andrew@chartkick.com"]

- spec.summary = "An automatic indexer for Postgres"
+ spec.summary = "The automatic indexer for Postgres"
  spec.homepage = "https://github.com/ankane/dexter"

  spec.files = `git ls-files -z`.split("\x0").reject do |f|
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: pgdexter
  version: !ruby/object:Gem::Version
- version: 0.1.1
+ version: 0.1.2
  platform: ruby
  authors:
  - Andrew Kane
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2017-06-25 00:00:00.000000000 Z
+ date: 2017-06-27 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: slop
@@ -89,14 +89,20 @@ extensions: []
  extra_rdoc_files: []
  files:
  - ".gitignore"
+ - CHANGELOG.md
  - Gemfile
  - LICENSE.txt
  - README.md
  - Rakefile
  - exe/dexter
  - lib/dexter.rb
+ - lib/dexter/client.rb
+ - lib/dexter/collector.rb
  - lib/dexter/indexer.rb
  - lib/dexter/log_parser.rb
+ - lib/dexter/logging.rb
+ - lib/dexter/processor.rb
+ - lib/dexter/query.rb
  - lib/dexter/version.rb
  - pgdexter.gemspec
  homepage: https://github.com/ankane/dexter
@@ -121,5 +127,5 @@ rubyforge_project:
  rubygems_version: 2.6.11
  signing_key:
  specification_version: 4
- summary: An automatic indexer for Postgres
+ summary: The automatic indexer for Postgres
  test_files: []