git_sme 0.1.0
- checksums.yaml +7 -0
- data/.gitignore +54 -0
- data/.rspec +2 -0
- data/.travis.yml +5 -0
- data/CODE_OF_CONDUCT.md +74 -0
- data/Gemfile +6 -0
- data/Gemfile.lock +41 -0
- data/LICENSE +674 -0
- data/README.md +78 -0
- data/Rakefile +6 -0
- data/bin/console +14 -0
- data/bin/setup +8 -0
- data/exe/git-sme +6 -0
- data/git_sme.gemspec +31 -0
- data/lib/git_sme.rb +9 -0
- data/lib/git_sme/analysis_presenter.rb +71 -0
- data/lib/git_sme/cache.rb +51 -0
- data/lib/git_sme/cli.rb +60 -0
- data/lib/git_sme/commit_analyzer.rb +125 -0
- data/lib/git_sme/commit_loader.rb +149 -0
- data/lib/git_sme/preferences.rb +3 -0
- data/lib/git_sme/version.rb +3 -0
- metadata +151 -0
data/README.md
ADDED
@@ -0,0 +1,78 @@
|
|
1
|
+
# Git SME
|
2
|
+
|
3
|
+
Git SME allows you to analyze your git repository and identify subject matter experts for any file
|
4
|
+
or directory that you would like to know more about. It does this by analyzing all commits made to
|
5
|
+
all files in your git repository over time and finding out who made the most changes to each file
|
6
|
+
and directory.
|
7
|
+
|
8
|
+
Commits are weighted so that recent commits count for more than older ones, which should mitigate
|
9
|
+
the effect a legacy coder would have on these reports.
|
10
|
+
|
11
|
+
## Installation
|
12
|
+
|
13
|
+
Install the gem for command-line use under the appropriate version of Ruby:
|
14
|
+
|
15
|
+
$ gem install git_sme
|
16
|
+
|
17
|
+
This will install the `git-sme` command, which should now be available from everywhere, assuming your
|
18
|
+
PATH is set up appropriately.
|
19
|
+
|
20
|
+
## Usage
|
21
|
+
|
22
|
+
Basic usage of git-sme is as follows:
|
23
|
+
|
24
|
+
git-sme </path/to/repository> [flags]
|
25
|
+
|
26
|
+
This will throw an error if the path is not a git repository. As a rule, you don't have to point to
|
27
|
+
the `.git` folder in your checked out code because `git-sme` will know to look for that folder as a
|
28
|
+
child of the folder you **do** provide it.
|
29
|
+
|
30
|
+
`git-sme` will output a list of paths (files and directories) and the list of users who it thinks
|
31
|
+
are the subject matter experts on each of those paths. Users are listed in decreasing order of
|
32
|
+
expertise:
|
33
|
+
|
34
|
+
$ bundle exec git-sme ~/rails/dinghy --file ~/rails/dinghy/cli/dinghy/preferences.rb ~/rails/dinghy/cli/dinghy/machine.rb
|
35
|
+
Repository: /Users/sjaveed/rails/dinghy
|
36
|
+
Analyzed: 317 (0/s) 100.00% Time: 00:00:00 |==============================================================================|
|
37
|
+
|
38
|
+
/: brianp, ryan, brian, adrian.falleiro, sally, dev, markse, fgrehm, kallin.nagelberg, matt
|
39
|
+
/cli: brianp, ryan, markse, sally, adrian.falleiro, fgrehm, matt, kallin.nagelberg, brian, dev
|
40
|
+
/cli/dinghy: brianp, markse, ryan, sally, fgrehm, matt, adrian.falleiro, brian, aisipos, paul.moelders
|
41
|
+
/cli/dinghy/machine.rb: brianp, markse, sally, brian, fgrehm, ryan, robertc
|
42
|
+
/cli/dinghy/preferences.rb: brianp
|
43
|
+
/cli/dinghy/machine: brianp, ryan, fgrehm
|
44
|
+
/dinghy: brianp
|
45
|
+
|
46
|
+
Based on analysis of a checked out copy of dinghy, I can see that, for the files I'm interested in,
|
47
|
+
brianp would be a subject matter expert but I'll probably find some useful information from ryan and
|
48
|
+
markse as well since it looks like ryan has touched enough files in the `/cli/dinghy/machine`
|
49
|
+
directory.
|
50
|
+
|
51
|
+
### Flags
|
52
|
+
|
53
|
+
Flag | Description
|
54
|
+
-----|------------
|
55
|
+
`--branch <branch>` | The branch you want to analyze on the given repository. Defaults to 'master'.
|
56
|
+
`--user <username1 [username2 ...]>` | An optional list of users to whom you'd like to restrict the analysis. This allows you to see e.g. who might know more about a file given their history of working with it over time.
|
57
|
+
`--file </path/to/file [/path/to/other/file ...]>` | An optional list of files/directories for which you'd like analysis. Defaults to /. The analysis will also include all directories between a subdirectory and the root of the repository. All file paths are relative to the repository root.
|
58
|
+
`--cache` | Enabled by default. Caches every commit the tool loads for a git repository, so that after e.g. a `git pull` on a large repository only the additional commits have to be loaded from the repository, while previously seen commits are read much more quickly from the cache.
|
59
|
+
`--no-cache` | Specify this if you do *not* want caching. You'll probably never need to use this.
|
60
|
+
`--results <count>` | The number of subject matter experts you'd like to see for each path. Defaults to 10.
|
61
|
+
|
62
|
+
## Development
|
63
|
+
|
64
|
+
After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
|
65
|
+
|
66
|
+
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
|
67
|
+
|
68
|
+
## Contributing
|
69
|
+
|
70
|
+
Bug reports and pull requests are welcome on GitHub at https://github.com/sjaveed/git_sme. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
|
71
|
+
|
72
|
+
## License
|
73
|
+
|
74
|
+
The gem is available as open source under the terms of the [GPLv3 License](https://www.gnu.org/licenses/gpl-3.0.en.html).
|
75
|
+
|
76
|
+
## Code of Conduct
|
77
|
+
|
78
|
+
Everyone interacting in the GitSme project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/sjaveed/git_sme/blob/master/CODE_OF_CONDUCT.md).
|
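The `--file` description in the README above says the analysis also covers every directory between a path and the repository root. A minimal sketch of that expansion, assuming repository-relative paths; `ancestor_paths` is a hypothetical helper name used only for illustration and is not part of the gem's interface:

```ruby
# Hypothetical helper: list the root plus every directory prefix of a
# repository-relative path, ending with the path itself.
def ancestor_paths(relative_path)
  parts = relative_path.split('/').reject(&:empty?)
  prefixes = parts.each_index.map { |i| '/' + parts[0..i].join('/') }
  ['/'] + prefixes
end

# Prints /, /cli, /cli/dinghy and /cli/dinghy/machine.rb, one per line.
puts ancestor_paths('cli/dinghy/machine.rb')
```

Each of these paths would then receive credit for a change to the file, which is why a single file filter produces several lines of output.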
data/Rakefile
ADDED
data/bin/console
ADDED
@@ -0,0 +1,14 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
|
3
|
+
require "bundler/setup"
|
4
|
+
require "git_sme"
|
5
|
+
|
6
|
+
# You can add fixtures and/or initialization code here to make experimenting
|
7
|
+
# with your gem easier. You can also use a different console, if you like.
|
8
|
+
|
9
|
+
# (If you use this, don't forget to add pry to your Gemfile!)
|
10
|
+
# require "pry"
|
11
|
+
# Pry.start
|
12
|
+
|
13
|
+
require "irb"
|
14
|
+
IRB.start(__FILE__)
|
data/bin/setup
ADDED
data/exe/git-sme
ADDED
data/git_sme.gemspec
ADDED
@@ -0,0 +1,31 @@
|
|
1
|
+
# coding: utf-8
|
2
|
+
lib = File.expand_path("../lib", __FILE__)
|
3
|
+
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
|
4
|
+
require "git_sme/version"
|
5
|
+
|
6
|
+
Gem::Specification.new do |spec|
|
7
|
+
spec.name = "git_sme"
|
8
|
+
spec.version = GitSme::VERSION
|
9
|
+
spec.authors = ["Shahbaz Javeed"]
|
10
|
+
spec.email = ["sjaveed@gmail.com"]
|
11
|
+
|
12
|
+
spec.summary = %q{Identify subject matter experts by analyzing your git repository}
|
13
|
+
spec.description = %q{Analyze your git repository and determine subject matter experts by identifying everyone who has touched a file with preference given to recent touches}
|
14
|
+
spec.homepage = "https://github.com/sjaveed/git_sme"
|
15
|
+
spec.license = "GPL-3.0"
|
16
|
+
|
17
|
+
spec.files = `git ls-files -z`.split("\x0").reject do |f|
|
18
|
+
f.match(%r{^(test|spec|features)/})
|
19
|
+
end
|
20
|
+
spec.bindir = "exe"
|
21
|
+
spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
|
22
|
+
spec.require_paths = ["lib"]
|
23
|
+
|
24
|
+
spec.add_development_dependency "bundler", "~> 1.15"
|
25
|
+
spec.add_development_dependency "rake", "~> 10.0"
|
26
|
+
spec.add_development_dependency "rspec", "~> 3.2"
|
27
|
+
|
28
|
+
spec.add_dependency 'ruby-progressbar'
|
29
|
+
spec.add_dependency 'rugged'
|
30
|
+
spec.add_dependency 'thor'
|
31
|
+
end
|
data/lib/git_sme.rb
ADDED
data/lib/git_sme/analysis_presenter.rb
ADDED
@@ -0,0 +1,71 @@
|
|
1
|
+
module GitSme
|
2
|
+
class AnalysisPresenter
|
3
|
+
attr_reader :valid, :error_message
|
4
|
+
|
5
|
+
alias_method :valid?, :valid
|
6
|
+
|
7
|
+
def initialize(commit_analyzer, users = [], files = [])
|
8
|
+
@commit_analyzer = commit_analyzer
|
9
|
+
@users = users
|
10
|
+
@files = files
|
11
|
+
@files = ['/'] unless @users.any? || @files.any?
|
12
|
+
|
13
|
+
@valid = @commit_analyzer.valid?
|
14
|
+
@error_message = @commit_analyzer.error_message
|
15
|
+
end
|
16
|
+
|
17
|
+
def get_relevant_analyses(results_to_show = 10)
|
18
|
+
@commit_analyzer.analyze unless @commit_analyzer.analyzed?
|
19
|
+
|
20
|
+
users_to_match = @users.any? ? get_matching_keys(@commit_analyzer.analysis[:by_user].keys, @users) : []
|
21
|
+
files_to_match = @files.any? ? get_matching_keys(@commit_analyzer.analysis[:by_file].keys, @files) : []
|
22
|
+
presentable_data = []
|
23
|
+
|
24
|
+
if users_to_match.any? && files_to_match.any?
|
25
|
+
users_to_match.each do |user|
|
26
|
+
user_data = @commit_analyzer.analysis[:by_user][user].select { |k, v| files_to_match.include?(k) }
|
27
|
+
presentable_data << presentable_file_or_user({ user => user_data }, user, results_to_show: results_to_show)
|
28
|
+
end
|
29
|
+
|
30
|
+
puts
|
31
|
+
|
32
|
+
files_to_match.each do |file|
|
33
|
+
user_data = @commit_analyzer.analysis[:by_file][file].select { |k, v| users_to_match.include?(k) }
|
34
|
+
presentable_data << presentable_file_or_user({ file => user_data }, file)
|
35
|
+
end
|
36
|
+
elsif users_to_match.any?
|
37
|
+
get_matching_keys(@commit_analyzer.analysis[:by_user].keys, users_to_match).each do |user|
|
38
|
+
presentable_data << presentable_file_or_user(@commit_analyzer.analysis[:by_user], user)
|
39
|
+
end
|
40
|
+
elsif files_to_match.any?
|
41
|
+
get_matching_keys(@commit_analyzer.analysis[:by_file].keys, files_to_match).each do |path|
|
42
|
+
presentable_data << presentable_file_or_user(@commit_analyzer.analysis[:by_file], path)
|
43
|
+
end
|
44
|
+
end
|
45
|
+
|
46
|
+
presentable_data.compact
|
47
|
+
end
|
48
|
+
|
49
|
+
private
|
50
|
+
|
51
|
+
def presentable_file_or_user(data, key, results_to_show: 10)
|
52
|
+
stats = data[key]
|
53
|
+
info_to_show = sort_keys_by_value(stats).first(results_to_show)
|
54
|
+
return if info_to_show.empty?
|
55
|
+
|
56
|
+
{
|
57
|
+
key => info_to_show
|
58
|
+
}
|
59
|
+
end
|
60
|
+
|
61
|
+
def sort_keys_by_value(data)
|
62
|
+
data.keys.sort_by { |k| data[k] }.reverse
|
63
|
+
end
|
64
|
+
|
65
|
+
def get_matching_keys(all_keys, keys_to_match)
|
66
|
+
all_keys.select do |key|
|
67
|
+
keys_to_match.any? { |matcher| matcher.match?(key) }
|
68
|
+
end
|
69
|
+
end
|
70
|
+
end
|
71
|
+
end
|
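The presenter above ranks the keys of a score hash in descending order of score and keeps the first `results_to_show` entries. A small standalone sketch of that ranking, assuming a plain hash of name to score; `top_experts` is a hypothetical name and the scores are made up for illustration:

```ruby
# Hypothetical helper: return the `limit` highest-scoring keys, best first.
def top_experts(scores, limit = 10)
  scores.sort_by { |_name, score| -score }.first(limit).map(&:first)
end

# Illustrative scores only, not real analysis output.
scores = { 'alice' => 4.2, 'bob' => 9.1, 'carol' => 0.7 }
puts top_experts(scores, 2).join(', ')
```

Negating the score inside `sort_by` gives a descending order directly, which avoids the extra `reverse` pass.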
data/lib/git_sme/cache.rb
ADDED
@@ -0,0 +1,51 @@
|
|
1
|
+
require 'fileutils'
require 'yaml' # YAML.load and YAML.dump are used below
|
2
|
+
|
3
|
+
require_relative 'preferences'
|
4
|
+
|
5
|
+
module GitSme
|
6
|
+
class Cache
|
7
|
+
def initialize(name, enabled: true, directory: 'cache', file_prefix: '', file_suffix: '')
|
8
|
+
raise "Invalid cache name: [#{name}]" if name.nil? || name =~ /^\s+$/
|
9
|
+
|
10
|
+
@name = name.gsub(/[^a-zA-Z-]/, '').strip
|
11
|
+
@enabled = enabled
|
12
|
+
@cache_directory = File.join(PREFERENCES_HOME, directory)
|
13
|
+
@file_prefix = file_prefix
|
14
|
+
@file_suffix = file_suffix
|
15
|
+
|
16
|
+
FileUtils.mkdir_p(@cache_directory) unless File.exist?(@cache_directory)
|
17
|
+
end
|
18
|
+
|
19
|
+
def load
|
20
|
+
return [] unless @enabled && File.exist?(cache_filename)
|
21
|
+
|
22
|
+
YAML.load(File.read(cache_filename))
|
23
|
+
end
|
24
|
+
|
25
|
+
def save(data)
|
26
|
+
return unless @enabled
|
27
|
+
|
28
|
+
File.open(cache_filename, 'w') { |f| f.write(YAML.dump(data)) }
|
29
|
+
end
|
30
|
+
|
31
|
+
private
|
32
|
+
|
33
|
+
def prefix
|
34
|
+
return '' if @file_prefix =~ /^\s*$/
|
35
|
+
|
36
|
+
"#{@file_prefix}-"
|
37
|
+
end
|
38
|
+
|
39
|
+
def suffix
|
40
|
+
return '' if @file_suffix =~ /^\s*$/
|
41
|
+
|
42
|
+
"-#{@file_suffix}"
|
43
|
+
end
|
44
|
+
|
45
|
+
def cache_filename
|
46
|
+
filename = @name
|
47
|
+
|
48
|
+
File.join(@cache_directory, "#{prefix}#{filename}#{suffix}.yml")
|
49
|
+
end
|
50
|
+
end
|
51
|
+
end
|
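The cache class above amounts to a YAML round trip: dump the data to a file on save, and read it back on load if the file exists. A minimal sketch of that idea, under a few assumptions that differ from the gem: string keys rather than symbols, a temporary directory instead of the preferences directory, and an empty hash for a missing file. `save_cache` and `load_cache` are hypothetical names:

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical stand-in for the cache: write a hash to a YAML file.
def save_cache(path, data)
  File.write(path, YAML.dump(data))
end

# Read the hash back; a missing file yields an empty hash.
def load_cache(path)
  return {} unless File.exist?(path)

  YAML.safe_load(File.read(path))
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'example-master-analysis.yml')
  save_cache(path, { 'by_file' => { '/' => { 'alice' => 1.5 } } })
  puts load_cache(path).inspect
end
```

`YAML.safe_load` is used here because the sketch stores only strings and numbers; data with symbol keys would need those classes to be permitted explicitly.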
data/lib/git_sme/cli.rb
ADDED
@@ -0,0 +1,60 @@
|
|
1
|
+
require 'thor'
|
2
|
+
require 'ruby-progressbar'
|
3
|
+
require 'git_sme'
|
4
|
+
|
5
|
+
module GitSme
|
6
|
+
class CLI < Thor
|
7
|
+
desc 'analyze <repository> [--branch <branch>] [--user <username>] [--file </path/to/file>] [--cache | --no-cache] [--results <count>]',
|
8
|
+
'Analyze the repository and determine the subject matter experts for the given files, limiting them to the users provided if needed'
|
9
|
+
|
10
|
+
method_option :branch, type: :string, default: 'master'
|
11
|
+
method_option :user, type: :array, default: []
|
12
|
+
method_option :file, type: :array, default: ['/']
|
13
|
+
method_option :cache, type: :boolean, default: true
|
14
|
+
method_option :results, type: :numeric, default: 10
|
15
|
+
|
16
|
+
def analyze(repository)
|
17
|
+
loader = GitSme::CommitLoader.new(repository, branch: options[:branch], enable_cache: options[:cache])
|
18
|
+
unless loader.valid?
|
19
|
+
puts "Error: #{loader.error_message}"
|
20
|
+
return
|
21
|
+
end
|
22
|
+
|
23
|
+
puts "Repository: #{loader.repo.path.gsub('/.git/', '')}"
|
24
|
+
|
25
|
+
loader_progress = ProgressBar.create(starting_at: 0, format: 'Loaded: %c (%R/s) %P%% %f |%B|')
|
26
|
+
loader.load do |new_commit_count, processed_commit_count, all_commit_count|
|
27
|
+
loader_progress.total = all_commit_count
|
28
|
+
loader_progress.increment
|
29
|
+
end
|
30
|
+
|
31
|
+
analyzer = GitSme::CommitAnalyzer.new(loader, enable_cache: false)
|
32
|
+
unless analyzer.valid?
|
33
|
+
puts "Error: #{analyzer.error_message}"
|
34
|
+
return
|
35
|
+
end
|
36
|
+
|
37
|
+
analyzer_progress = ProgressBar.create(starting_at: 0, total: loader.commits.size, format: 'Analyzed: %c (%R/s) %P%% %f |%B|')
|
38
|
+
analyzer.analyze do |commit_count, total_commits|
|
39
|
+
analyzer_progress.increment
|
40
|
+
end
|
41
|
+
|
42
|
+
presenter = AnalysisPresenter.new(analyzer, options[:user], options[:file])
|
43
|
+
analyses = presenter.get_relevant_analyses(options[:results].to_i)
|
44
|
+
|
45
|
+
puts
|
46
|
+
|
47
|
+
if !analyses.empty?
|
48
|
+
analyses.each do |result|
|
49
|
+
result.each do |path, users|
|
50
|
+
puts "#{path}: #{users.join(', ')}"
|
51
|
+
end
|
52
|
+
end
|
53
|
+
else
|
54
|
+
puts 'No data found!'
|
55
|
+
end
|
56
|
+
end
|
57
|
+
|
58
|
+
default_task :analyze
|
59
|
+
end
|
60
|
+
end
|
data/lib/git_sme/commit_analyzer.rb
ADDED
@@ -0,0 +1,125 @@
|
|
1
|
+
module GitSme
|
2
|
+
class CommitAnalyzer
|
3
|
+
attr_reader :valid, :error_message, :analysis, :analyzed
|
4
|
+
|
5
|
+
alias_method :valid?, :valid
|
6
|
+
alias_method :analyzed?, :analyzed
|
7
|
+
|
8
|
+
def initialize(commit_loader, enable_cache: true)
|
9
|
+
@enable_cache = enable_cache
|
10
|
+
@commit_loader = commit_loader
|
11
|
+
@analyzed = false
|
12
|
+
@valid = @commit_loader.valid?
|
13
|
+
@error_message = @commit_loader.error_message
|
14
|
+
|
15
|
+
@analysis = {}
|
16
|
+
@cache = GitSme::Cache.new(@commit_loader.repo.path.gsub('/.git/', ''),
|
17
|
+
enabled: @enable_cache, file_suffix: "#{@commit_loader.branch}-analysis"
|
18
|
+
)
|
19
|
+
end
|
20
|
+
|
21
|
+
def analyze(force: false)
|
22
|
+
return unless valid?
|
23
|
+
return if analyzed? && !force
|
24
|
+
|
25
|
+
@commit_loader.load
|
26
|
+
@analysis = @cache.load
|
27
|
+
new_analysis = []
|
28
|
+
|
29
|
+
if !@commit_loader.new_commits? || !@analysis.any?
|
30
|
+
if block_given?
|
31
|
+
@analysis = analyze_new_commits(@commit_loader.commits) do |commit_count, total_commits|
|
32
|
+
yield(commit_count, total_commits)
|
33
|
+
end
|
34
|
+
else
|
35
|
+
@analysis = analyze_new_commits(@commit_loader.commits)
|
36
|
+
end
|
37
|
+
elsif @commit_loader.new_commits?
|
38
|
+
new_analysis = if block_given?
|
39
|
+
analyze_new_commits(@commit_loader.new_commits) do |commit_count, total_commits|
|
40
|
+
yield(commit_count, total_commits)
|
41
|
+
end
|
42
|
+
else
|
43
|
+
analyze_new_commits(@commit_loader.new_commits)
|
44
|
+
end
|
45
|
+
|
46
|
+
summed_merge(@analysis[:by_user], new_analysis[:by_user])
|
47
|
+
summed_merge(@analysis[:by_file], new_analysis[:by_file])
|
48
|
+
end
|
49
|
+
|
50
|
+
@cache.save(@analysis)
|
51
|
+
@analyzed = true
|
52
|
+
end
|
53
|
+
|
54
|
+
private
|
55
|
+
|
56
|
+
def analyze_new_commits(commits_to_process)
|
57
|
+
user_stats = {}
|
58
|
+
file_stats = {}
|
59
|
+
now = Time.now.to_i
|
60
|
+
commit_count = commits_to_process.size
|
61
|
+
|
62
|
+
commits_to_process.each_with_index do |commit, current_commit_idx|
|
63
|
+
author = commit[:author]
|
64
|
+
time_delta = now - commit[:timestamp]
|
65
|
+
|
66
|
+
commit[:file_changes].each do |filename, change_details|
|
67
|
+
all_affected_paths(filename).each do |path|
|
68
|
+
change_value = weighted_value(change_details[:changes], time_delta)
|
69
|
+
|
70
|
+
user_stats[author] = {} unless user_stats.key?(author)
|
71
|
+
user_stats[author][path] = 0 unless user_stats[author].key?(path)
|
72
|
+
|
73
|
+
file_stats[path] = {} unless file_stats.key?(path)
|
74
|
+
file_stats[path][author] = 0 unless file_stats[path].key?(author)
|
75
|
+
|
76
|
+
user_stats[author][path] += change_value
|
77
|
+
file_stats[path][author] += change_value
|
78
|
+
end
|
79
|
+
end
|
80
|
+
|
81
|
+
if block_given?
|
82
|
+
yield(current_commit_idx, commit_count)
|
83
|
+
end
|
84
|
+
end
|
85
|
+
|
86
|
+
{
|
87
|
+
by_user: user_stats,
|
88
|
+
by_file: file_stats
|
89
|
+
}
|
90
|
+
end
|
91
|
+
|
92
|
+
def summed_merge(cached_data, new_data)
|
93
|
+
return if new_data.nil? || cached_data.nil?
|
94
|
+
|
95
|
+
new_data.each do |key, value_hash|
|
96
|
+
if cached_data.key?(key)
|
97
|
+
value_hash.each do |value_key, value|
|
98
|
+
if cached_data[key].key?(value_key)
|
99
|
+
cached_data[key][value_key] += value
|
100
|
+
else
|
101
|
+
cached_data[key][value_key] = value
|
102
|
+
end
|
103
|
+
end
|
104
|
+
else
|
105
|
+
cached_data[key] = value_hash
|
106
|
+
end
|
107
|
+
end
|
108
|
+
end
|
109
|
+
|
110
|
+
def all_affected_paths(filename)
|
111
|
+
['/'] + filename.split('/').each_with_object([]) do |path_part, path_list|
|
112
|
+
path_list << [path_list[-1], path_part].join('/')
|
113
|
+
end
|
114
|
+
end
|
115
|
+
|
116
|
+
def weighted_value(value, time_delta)
|
117
|
+
# value_attenuator = 1.0
|
118
|
+
# value_attenuation = value_attenuator * time_delta / time_delta
|
119
|
+
value_attenuation = time_delta > 0 ? time_delta ** (-1.0 / 3) : 1
|
120
|
+
|
121
|
+
(value * value_attenuation).to_f
|
122
|
+
end
|
123
|
+
|
124
|
+
end
|
125
|
+
end
|
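The `weighted_value` method above scales a change count by the age of the commit so that older work counts for less. A sketch of an inverse cube root decay, assuming that is the intended curve; `recency_weight` is a hypothetical name. Note that an integer exponent written as `-1/3` evaluates to `-1` in Ruby, so a float is used here:

```ruby
# Hypothetical helper: scale a change count by age^(-1/3), so that older
# commits count for less. Ages of zero or less are left unweighted.
def recency_weight(changes, age_in_seconds)
  return changes.to_f unless age_in_seconds.positive?

  changes * age_in_seconds**(-1.0 / 3)
end

day = 24 * 60 * 60
puts recency_weight(10, day)        # a day-old commit
puts recency_weight(10, 365 * day)  # a year-old commit, weighted lower
puts(-1 / 3)                        # integer division: prints -1
```

With this curve a commit eight times older counts for half as much, so long-standing contributors fade gradually rather than dropping out after a fixed cutoff.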