git_sme 0.1.0

@@ -0,0 +1,78 @@
+ # Git SME
+
+ Git SME analyzes your git repository to identify subject matter experts for any file or directory
+ you would like to know more about. It does this by examining every commit made to every file in
+ the repository over time and working out who made the most changes to each file and
+ directory.
+
+ Commits are weighted so that recent commits count for more than older ones, which should limit
+ how much a contributor who has long since moved on can dominate these reports.
+
+ ## Installation
+
+ Install the gem for command-line usage under the appropriate version of Ruby:
+
+     $ gem install git_sme
+
+ This installs the `git-sme` command, which should then be available from anywhere as long as your
+ `PATH` is set up appropriately.
+
+ ## Usage
+
+ Basic usage of `git-sme` is as follows:
+
+     git-sme </path/to/repository> [flags]
+
+ This reports an error if the path is not a git repository. You don't need to point to the `.git`
+ folder in your checked-out code: `git-sme` knows to look for that folder as a child of the folder
+ you **do** provide.
+
+ `git-sme` outputs a list of paths (files and directories), each followed by the users it considers
+ subject matter experts on that path. Users are listed in decreasing order of
+ expertise:
+
+     $ bundle exec git-sme ~/rails/dinghy --file ~/rails/dinghy/cli/dinghy/preferences.rb ~/rails/dinghy/cli/dinghy/machine.rb
+     Repository: /Users/sjaveed/rails/dinghy
+     Analyzed: 317 (0/s) 100.00% Time: 00:00:00 |==============================================================================|
+
+     /: brianp, ryan, brian, adrian.falleiro, sally, dev, markse, fgrehm, kallin.nagelberg, matt
+     /cli: brianp, ryan, markse, sally, adrian.falleiro, fgrehm, matt, kallin.nagelberg, brian, dev
+     /cli/dinghy: brianp, markse, ryan, sally, fgrehm, matt, adrian.falleiro, brian, aisipos, paul.moelders
+     /cli/dinghy/machine.rb: brianp, markse, sally, brian, fgrehm, ryan, robertc
+     /cli/dinghy/preferences.rb: brianp
+     /cli/dinghy/machine: brianp, ryan, fgrehm
+     /dinghy: brianp
+
+ Based on this analysis of a checked-out copy of dinghy, brianp is the subject matter expert for the
+ files I'm interested in, but I'll probably find useful information from ryan and markse as well,
+ especially since ryan has touched enough files in the `/cli/dinghy/machine` directory to rank
+ second there.
+
+ ### Flags
+
+ Flag | Description
+ -----|------------
+ `--branch <branch>` | The branch of the given repository to analyze. Defaults to `master`.
+ `--user <username1 [username2 ...]>` | An optional list of users to restrict the analysis to. This lets you see, e.g., which of those users might know most about a file given their history of working with it over time.
+ `--file </path/to/file [/path/to/other/file ...]>` | An optional list of files or directories to analyze. Defaults to `/`. The results also include every directory between a requested path and the root of the repository. All file paths are relative to the repository root.
+ `--cache` | Enabled by default. Caches every commit the tool loads from a repository, so that after e.g. a `git pull` on a large repository only the new commits have to be loaded from git, while previously seen commits load much faster from the cache.
+ `--no-cache` | Disables caching. You'll probably never need this.
+ `--results <count>` | The number of subject matter experts to show for each path. Defaults to 10.
+
+ ## Development
+
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+
+ ## Contributing
+
+ Bug reports and pull requests are welcome on GitHub at https://github.com/sjaveed/git_sme. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
+
+ ## License
+
+ The gem is available as open source under the terms of the [GPLv3 License](https://www.gnu.org/licenses/gpl-3.0.en.html).
+
+ ## Code of Conduct
+
+ Everyone interacting in the GitSme project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/sjaveed/git_sme/blob/master/CODE_OF_CONDUCT.md).
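The recency weighting mentioned in the README comes from `weighted_value` in `CommitAnalyzer`, which multiplies each file's change count by the commit's age in seconds raised to the power of minus one third. The following is a minimal standalone sketch under that reading, with `age_seconds` standing in for `now - commit[:timestamp]`; it is an illustration, not the gem's own code. The exponent has to be written as a float, because Ruby floors integer division and `-1/3` evaluates to `-1`.

```ruby
# Sketch only: a standalone restatement of the recency weighting, assuming the
# intended attenuation is age_seconds ** (-1/3).
def weighted_value(changes, age_seconds)
  # A non-positive age (e.g. from clock skew) keeps the full weight.
  attenuation = age_seconds > 0 ? age_seconds ** (-1.0 / 3) : 1.0
  changes * attenuation
end

# Integer division floors in Ruby, so -1/3 is -1 rather than -0.333...
puts(-1 / 3)                                 # prints -1
puts weighted_value(10, 1_000).round(6)      # prints 1.0 (1000 ** (-1/3) is 0.1)
puts weighted_value(10, 1_000_000).round(6)  # prints 0.1 (an older commit weighs less)
```

Under this curve a commit 1,000 times older than another carries one tenth of its weight, a much gentler decay than an exponent of `-1`, which would leave it one thousandth.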
@@ -0,0 +1,6 @@
+ require "bundler/gem_tasks"
+ require "rspec/core/rake_task"
+
+ RSpec::Core::RakeTask.new(:spec)
+
+ task :default => :spec
@@ -0,0 +1,14 @@
+ #!/usr/bin/env ruby
+
+ require "bundler/setup"
+ require "git_sme"
+
+ # You can add fixtures and/or initialization code here to make experimenting
+ # with your gem easier. You can also use a different console, if you like.
+
+ # (If you use this, don't forget to add pry to your Gemfile!)
+ # require "pry"
+ # Pry.start
+
+ require "irb"
+ IRB.start(__FILE__)
@@ -0,0 +1,8 @@
+ #!/usr/bin/env bash
+ set -euo pipefail
+ IFS=$'\n\t'
+ set -vx
+
+ bundle install
+
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,6 @@
+ #!/usr/bin/env ruby
+
+ require 'git_sme/cli'
+
+ # Prepend analyze so we don't have to keep typing that on the commandline
+ GitSme::CLI.start(ARGV.dup.unshift('analyze'))
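The executable above prepends `analyze` so that `git-sme <repository> [flags]` is dispatched to Thor's `analyze` command without typing its name. A small illustration, using a hypothetical `argv` array in place of `ARGV`; calling `dup` first means only the copy is modified and the process-wide `ARGV` is left as it was:

```ruby
# Hypothetical arguments standing in for ARGV, as in: git-sme ~/rails/dinghy --results 5
argv = ['~/rails/dinghy', '--results', '5']

# Copy the array, then prepend the Thor command name to the copy only.
thor_args = argv.dup.unshift('analyze')

p thor_args # prints ["analyze", "~/rails/dinghy", "--results", "5"]
p argv      # prints ["~/rails/dinghy", "--results", "5"]
```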
@@ -0,0 +1,31 @@
+ # coding: utf-8
+ lib = File.expand_path("../lib", __FILE__)
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+ require "git_sme/version"
+
+ Gem::Specification.new do |spec|
+   spec.name = "git_sme"
+   spec.version = GitSme::VERSION
+   spec.authors = ["Shahbaz Javeed"]
+   spec.email = ["sjaveed@gmail.com"]
+
+   spec.summary = %q{Identify subject matter experts by analyzing your git repository}
+   spec.description = %q{Analyze your git repository and determine subject matter experts by identifying everyone who has touched a file with preference given to recent touches}
+   spec.homepage = "https://github.com/sjaveed/git_sme"
+   spec.license = "MIT"
+
+   spec.files = `git ls-files -z`.split("\x0").reject do |f|
+     f.match(%r{^(test|spec|features)/})
+   end
+   spec.bindir = "exe"
+   spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+   spec.require_paths = ["lib"]
+
+   spec.add_development_dependency "bundler", "~> 1.15"
+   spec.add_development_dependency "rake", "~> 10.0"
+   spec.add_development_dependency "rspec", "~> 3.2"
+
+   spec.add_dependency 'ruby-progressbar'
+   spec.add_dependency 'rugged'
+   spec.add_dependency 'thor'
+ end
@@ -0,0 +1,9 @@
+ require 'git_sme/version'
+ require 'git_sme/preferences'
+ require 'git_sme/commit_loader'
+ require 'git_sme/commit_analyzer'
+ require 'git_sme/analysis_presenter'
+
+ module GitSme
+   # Your code goes here...
+ end
@@ -0,0 +1,75 @@
+ module GitSme
+   class AnalysisPresenter
+     attr_reader :valid, :error_message
+
+     alias_method :valid?, :valid
+
+     def initialize(commit_analyzer, users = [], files = [])
+       @commit_analyzer = commit_analyzer
+       @users = users
+       @files = files
+       # With neither users nor files requested, report on the repository root.
+       @files = ['/'] unless @users.any? || @files.any?
+
+       @valid = @commit_analyzer.valid?
+       @error_message = @commit_analyzer.error_message
+     end
+
+     def get_relevant_analyses(results_to_show = 10)
+       @commit_analyzer.analyze unless @commit_analyzer.analyzed?
+
+       by_user = @commit_analyzer.analysis[:by_user]
+       by_file = @commit_analyzer.analysis[:by_file]
+       users_to_match = @users.any? ? get_matching_keys(by_user.keys, @users) : []
+       files_to_match = @files.any? ? get_matching_keys(by_file.keys, @files) : []
+       presentable_data = []
+
+       if users_to_match.any? && files_to_match.any?
+         # Each matched user, limited to the matched paths.
+         users_to_match.each do |user|
+           user_data = by_user[user].select { |path, _| files_to_match.include?(path) }
+           presentable_data << presentable_file_or_user({ user => user_data }, user, results_to_show: results_to_show)
+         end
+
+         # Each matched path, limited to the matched users.
+         files_to_match.each do |file|
+           user_data = by_file[file].select { |user, _| users_to_match.include?(user) }
+           presentable_data << presentable_file_or_user({ file => user_data }, file, results_to_show: results_to_show)
+         end
+       elsif users_to_match.any?
+         users_to_match.each do |user|
+           presentable_data << presentable_file_or_user(by_user, user, results_to_show: results_to_show)
+         end
+       elsif files_to_match.any?
+         files_to_match.each do |path|
+           presentable_data << presentable_file_or_user(by_file, path, results_to_show: results_to_show)
+         end
+       end
+
+       presentable_data.compact
+     end
+
+     private
+
+     # Returns { key => top-ranked entries }, or nil when there is nothing to show.
+     def presentable_file_or_user(data, key, results_to_show: 10)
+       stats = data[key]
+       info_to_show = sort_keys_by_value(stats).first(results_to_show)
+       return if info_to_show.empty?
+
+       { key => info_to_show }
+     end
+
+     # Keys ordered from highest to lowest score.
+     def sort_keys_by_value(data)
+       data.keys.sort_by { |k| data[k] }.reverse
+     end
+
+     # Keys matched by any of the given matchers, using String#match?.
+     def get_matching_keys(all_keys, keys_to_match)
+       all_keys.select do |key|
+         keys_to_match.any? { |matcher| matcher.match?(key) }
+       end
+     end
+   end
+ end
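`get_matching_keys` calls `matcher.match?(key)` with the requested path as the receiver, and `String#match?` converts a string argument into a regular expression. An analyzed path therefore matches whenever it is found inside the requested path, which appears to be why the README example lists `/`, `/cli`, `/cli/dinghy/machine` and `/dinghy` alongside the two files asked for. The sketch below restates that matching and the top-N ranking outside the class; the method names `matching_keys` and `top_keys` and the sample scores are hypothetical.

```ruby
# Sketch only: hypothetical standalone versions of the presenter's helpers.
def matching_keys(all_keys, requested)
  # Each analyzed key is used as a pattern and searched for inside the
  # requested path, so every key found within a request is selected.
  all_keys.select { |key| requested.any? { |req| req.match?(key) } }
end

def top_keys(scores, limit)
  # Highest score first, then keep only the requested number of entries.
  scores.keys.sort_by { |k| scores[k] }.reverse.first(limit)
end

analyzed = ['/', '/cli', '/cli/dinghy', '/cli/dinghy/machine',
            '/cli/dinghy/machine.rb', '/dinghy', '/lib']
p matching_keys(analyzed, ['/cli/dinghy/machine.rb'])
# prints ["/", "/cli", "/cli/dinghy", "/cli/dinghy/machine", "/cli/dinghy/machine.rb", "/dinghy"]

p top_keys({ 'brianp' => 4.2, 'ryan' => 1.5, 'markse' => 2.0 }, 2)
# prints ["brianp", "markse"]
```

One side effect of matching this way is that a request written without a leading slash, such as `cli/x.rb`, still matches `/` but not `/cli`.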
@@ -0,0 +1,51 @@
+ require 'fileutils'
+ require 'yaml'
+
+ require_relative 'preferences'
+
+ module GitSme
+   class Cache
+     def initialize(name, enabled: true, directory: 'cache', file_prefix: '', file_suffix: '')
+       raise "Invalid cache name: [#{name}]" if name.nil? || name.strip.empty?
+
+       # Keep only letters and hyphens so the name is safe to use in a file name.
+       @name = name.gsub(/[^a-zA-Z-]/, '')
+       @enabled = enabled
+       @cache_directory = File.join(PREFERENCES_HOME, directory)
+       @file_prefix = file_prefix
+       @file_suffix = file_suffix
+
+       FileUtils.mkdir_p(@cache_directory) unless File.exist?(@cache_directory)
+     end
+
+     def load
+       return [] unless @enabled && File.exist?(cache_filename)
+
+       YAML.load(File.read(cache_filename))
+     end
+
+     def save(data)
+       return unless @enabled
+
+       File.write(cache_filename, YAML.dump(data))
+     end
+
+     private
+
+     def prefix
+       return '' if @file_prefix =~ /^\s*$/
+
+       "#{@file_prefix}-"
+     end
+
+     def suffix
+       return '' if @file_suffix =~ /^\s*$/
+
+       "-#{@file_suffix}"
+     end
+
+     def cache_filename
+       File.join(@cache_directory, "#{prefix}#{@name}#{suffix}.yml")
+     end
+   end
+ end
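`Cache` keeps only the letters and hyphens of its name and wraps the result with the optional prefix and suffix, so the analyzer's cache for the README's repository at `/Users/sjaveed/rails/dinghy` on `master` would be named `Userssjaveedrailsdinghy-master-analysis.yml`. The sketch below reproduces that naming plus a YAML round trip in a temporary directory instead of under `PREFERENCES_HOME`, whose value is defined outside this diff. `cache_filename` here is a hypothetical standalone function, and loading with `YAML.safe_load` while permitting `Symbol` is an assumption chosen for the sketch, not what the class itself does.

```ruby
require 'tmpdir'
require 'yaml'

# Sketch only: a hypothetical standalone version of Cache#cache_filename.
# Only letters and hyphens of the name survive; blank prefixes/suffixes add nothing.
def cache_filename(directory, name, prefix: '', suffix: '')
  clean_name = name.gsub(/[^a-zA-Z-]/, '')
  pre = prefix.strip.empty? ? '' : "#{prefix}-"
  post = suffix.strip.empty? ? '' : "-#{suffix}"
  File.join(directory, "#{pre}#{clean_name}#{post}.yml")
end

Dir.mktmpdir do |dir|
  path = cache_filename(dir, '/Users/sjaveed/rails/dinghy', suffix: 'master-analysis')
  puts File.basename(path) # prints Userssjaveedrailsdinghy-master-analysis.yml

  # Round-trip an analysis-shaped hash; Symbol is permitted so :by_user survives.
  data = { by_user: { 'brianp' => { '/' => 1.5 } } }
  File.write(path, YAML.dump(data))
  puts YAML.safe_load(File.read(path), permitted_classes: [Symbol]) == data # prints true
end
```

Because digits are discarded along with slashes, two repositories whose paths differ only in digits would end up sharing a cache file.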
@@ -0,0 +1,60 @@
+ require 'thor'
+ require 'ruby-progressbar'
+ require 'git_sme'
+
+ module GitSme
+   class CLI < Thor
+     desc 'analyze <repository> [--branch <branch>] [--user <username>] [--file </path/to/file>] [--cache | --no-cache] [--results <count>]',
+       'Analyze the repository and determine the subject matter experts for the given files, limiting them to the users provided if needed'
+
+     method_option :branch, type: :string, default: 'master'
+     method_option :user, type: :array, default: []
+     method_option :file, type: :array, default: ['/']
+     method_option :cache, type: :boolean, default: true
+     method_option :results, type: :numeric, default: 10
+
+     def analyze(repository)
+       loader = GitSme::CommitLoader.new(repository, branch: options[:branch], enable_cache: options[:cache])
+       unless loader.valid?
+         puts "Error: #{loader.error_message}"
+         return
+       end
+
+       puts "Repository: #{loader.repo.path.gsub('/.git/', '')}"
+
+       loader_progress = ProgressBar.create(starting_at: 0, format: 'Loaded: %c (%R/s) %P%% %f |%B|')
+       loader.load do |new_commit_count, processed_commit_count, all_commit_count|
+         loader_progress.total = all_commit_count
+         loader_progress.increment
+       end
+
+       analyzer = GitSme::CommitAnalyzer.new(loader, enable_cache: false)
+       unless analyzer.valid?
+         puts "Error: #{analyzer.error_message}"
+         return
+       end
+
+       analyzer_progress = ProgressBar.create(starting_at: 0, total: loader.commits.size, format: 'Analyzed: %c (%R/s) %P%% %f |%B|')
+       analyzer.analyze do |commit_count, total_commits|
+         analyzer_progress.increment
+       end
+
+       presenter = AnalysisPresenter.new(analyzer, options[:user], options[:file])
+       analyses = presenter.get_relevant_analyses(options[:results].to_i)
+
+       puts
+
+       if !analyses.empty?
+         analyses.each do |result|
+           result.each do |path, users|
+             puts "#{path}: #{users.join(', ')}"
+           end
+         end
+       else
+         puts 'No data found!'
+       end
+     end
+
+     default_task :analyze
+   end
+ end
@@ -0,0 +1,119 @@
+ require_relative 'cache'
+
+ module GitSme
+   class CommitAnalyzer
+     attr_reader :valid, :error_message, :analysis, :analyzed
+
+     alias_method :valid?, :valid
+     alias_method :analyzed?, :analyzed
+
+     def initialize(commit_loader, enable_cache: true)
+       @enable_cache = enable_cache
+       @commit_loader = commit_loader
+       @analyzed = false
+       @valid = @commit_loader.valid?
+       @error_message = @commit_loader.error_message
+
+       @analysis = {}
+       @cache = GitSme::Cache.new(@commit_loader.repo.path.gsub('/.git/', ''),
+         enabled: @enable_cache, file_suffix: "#{@commit_loader.branch}-analysis"
+       )
+     end
+
+     # Yields (commit_index, total_commits) after each analyzed commit when a block is given.
+     def analyze(force: false, &progress)
+       return unless valid?
+       return if analyzed? && !force
+
+       @commit_loader.load
+       @analysis = @cache.load
+
+       if !@commit_loader.new_commits? || !@analysis.any?
+         # Analyze every loaded commit from scratch.
+         @analysis = analyze_new_commits(@commit_loader.commits, &progress)
+       elsif @commit_loader.new_commits?
+         # Analyze only the new commits and add their scores to the cached analysis.
+         new_analysis = analyze_new_commits(@commit_loader.new_commits, &progress)
+
+         summed_merge(@analysis[:by_user], new_analysis[:by_user])
+         summed_merge(@analysis[:by_file], new_analysis[:by_file])
+       end
+
+       @cache.save(@analysis)
+       @analyzed = true
+     end
+
+     private
+
+     def analyze_new_commits(commits_to_process)
+       user_stats = {}
+       file_stats = {}
+       now = Time.now.to_i
+       commit_count = commits_to_process.size
+
+       commits_to_process.each_with_index do |commit, current_commit_idx|
+         author = commit[:author]
+         time_delta = now - commit[:timestamp]
+
+         commit[:file_changes].each do |filename, change_details|
+           # The weight depends on the change count and commit age, not on the path.
+           change_value = weighted_value(change_details[:changes], time_delta)
+
+           # Credit the change to the file and to every directory above it.
+           all_affected_paths(filename).each do |path|
+             user_stats[author] ||= {}
+             user_stats[author][path] ||= 0
+             file_stats[path] ||= {}
+             file_stats[path][author] ||= 0
+
+             user_stats[author][path] += change_value
+             file_stats[path][author] += change_value
+           end
+         end
+
+         yield(current_commit_idx, commit_count) if block_given?
+       end
+
+       {
+         by_user: user_stats,
+         by_file: file_stats
+       }
+     end
+
+     # Adds the new per-key scores onto the cached ones, in place.
+     def summed_merge(cached_data, new_data)
+       return if new_data.nil? || cached_data.nil?
+
+       new_data.each do |key, value_hash|
+         if cached_data.key?(key)
+           value_hash.each do |value_key, value|
+             if cached_data[key].key?(value_key)
+               cached_data[key][value_key] += value
+             else
+               cached_data[key][value_key] = value
+             end
+           end
+         else
+           cached_data[key] = value_hash
+         end
+       end
+     end
+
+     # "cli/dinghy/machine.rb" => ["/", "/cli", "/cli/dinghy", "/cli/dinghy/machine.rb"]
+     def all_affected_paths(filename)
+       paths = filename.split('/').each_with_object([]) do |path_part, path_list|
+         path_list << [path_list[-1], path_part].join('/')
+       end
+
+       ['/'] + paths
+     end
+
+     # Scales a change count by (commit age in seconds) ** (-1/3), so older commits count for less.
+     def weighted_value(value, time_delta)
+       # Written as -1.0 / 3 because the integer expression -1/3 evaluates to -1 in Ruby.
+       value_attenuation = time_delta > 0 ? time_delta ** (-1.0 / 3) : 1
+
+       (value * value_attenuation).to_f
+     end
+   end
+ end
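Two pieces of `CommitAnalyzer` carry most of its bookkeeping: `all_affected_paths` credits each change to the file and to every directory above it, and `summed_merge` adds scores from newly analyzed commits onto the cached totals. The sketch below restates both outside the class under the hypothetical names `affected_paths` and `merge_scores`. Unlike `summed_merge`, which updates the cached hash in place, `merge_scores` builds a new hash with the block form of `Hash#merge` and leaves both inputs unchanged.

```ruby
# Sketch only: hypothetical standalone restatements of the analyzer helpers.

# Every file change also counts toward each directory above the file.
def affected_paths(filename)
  parts = filename.split('/')
  ['/'] + parts.each_index.map { |i| '/' + parts[0..i].join('/') }
end

# Adds fresh per-key scores onto cached ones; the block runs only for keys
# present in both hashes, at both the outer and the inner level.
def merge_scores(cached, fresh)
  cached.merge(fresh) do |_key, old_scores, new_scores|
    old_scores.merge(new_scores) { |_path, a, b| a + b }
  end
end

p affected_paths('cli/dinghy/machine.rb')
# prints ["/", "/cli", "/cli/dinghy", "/cli/dinghy/machine.rb"]

cached = { 'brianp' => { '/' => 2.0, '/cli' => 1.0 } }
fresh  = { 'brianp' => { '/' => 0.5 }, 'ryan' => { '/' => 0.25 } }
merged = merge_scores(cached, fresh)
# In merged, brianp's "/" score becomes 2.5, "/cli" stays 1.0, and ryan is added;
# cached itself is not modified.
p merged
```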