git_sme 0.1.0

@@ -0,0 +1,78 @@
+ # Git SME
+
+ Git SME analyzes your git repository and identifies subject matter experts for any file
+ or directory that you would like to know more about. It does this by analyzing all commits made to
+ all files in your git repository over time and finding out who made the most changes to each file
+ and directory.
+
+ Commits are weighted so that recent commits count for more than older ones, which should mitigate
+ the effect a legacy coder would have on these reports.
+
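The weighting idea can be sketched as follows. This is a minimal illustration, not the gem's implementation: `decayed_weight` is a hypothetical helper, and the inverse cube-root decay on commit age is an assumption chosen for the example.

```ruby
# Hypothetical helper (not part of git_sme): scale a change count by commit age.
# Assumption: the weight decays with the inverse cube root of the age in seconds.
def decayed_weight(changes, age_in_seconds)
  return changes.to_f if age_in_seconds <= 0

  # Use a Float exponent; the integer expression -1/3 evaluates to -1 in Ruby.
  changes * age_in_seconds**(-1.0 / 3)
end

day = 86_400
recent = decayed_weight(10, day)        # 10 changed lines, one day old
old    = decayed_weight(10, 365 * day)  # 10 changed lines, one year old

puts "recent: #{recent.round(4)}, old: #{old.round(4)}"
puts "recent commit outweighs old one: #{recent > old}"
```

With a decay like this, the same number of changed lines contributes less the older the commit is, so a long-departed author gradually drops down the list.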
+ ## Installation
+
+ Install the gem for command-line usage in the appropriate version of Ruby:
+
+     $ gem install git_sme
+
+ This installs the `git-sme` command, which should now be available from anywhere, assuming your
+ PATH is set up appropriately.
+
+ ## Usage
+
+ Basic usage of git-sme is as follows:
+
+     git-sme </path/to/repository> [flags]
+
+ This will report an error if the path is not a git repository. As a rule, you don't have to point to
+ the `.git` folder in your checked out code because `git-sme` will know to look for that folder as a
+ child of the folder you **do** provide it.
+
+ `git-sme` will output a list of paths (files and directories) and the list of users it considers
+ to be the subject matter experts on each of those paths. Users are listed in decreasing order of
+ expertise:
+
+     $ bundle exec git-sme ~/rails/dinghy --file ~/rails/dinghy/cli/dinghy/preferences.rb ~/rails/dinghy/cli/dinghy/machine.rb
+     Repository: /Users/sjaveed/rails/dinghy
+     Analyzed: 317 (0/s) 100.00% Time: 00:00:00 |==============================================================================|
+
+     /: brianp, ryan, brian, adrian.falleiro, sally, dev, markse, fgrehm, kallin.nagelberg, matt
+     /cli: brianp, ryan, markse, sally, adrian.falleiro, fgrehm, matt, kallin.nagelberg, brian, dev
+     /cli/dinghy: brianp, markse, ryan, sally, fgrehm, matt, adrian.falleiro, brian, aisipos, paul.moelders
+     /cli/dinghy/machine.rb: brianp, markse, sally, brian, fgrehm, ryan, robertc
+     /cli/dinghy/preferences.rb: brianp
+     /cli/dinghy/machine: brianp, ryan, fgrehm
+     /dinghy: brianp
+
+ Based on analysis of a checked out copy of dinghy, I can see that, for the files I'm interested in,
+ brianp would be a subject matter expert but I'll probably find some useful information from ryan and
+ markse as well since it looks like ryan has touched enough files in the `/cli/dinghy/machine`
+ directory.
+
+ ### Flags
+
+ Flag | Description
+ -----|------------
+ `--branch <branch>` | The branch you want to analyze on the given repository. Defaults to `master`.
+ `--user <username1 [username2 ...]>` | An optional list of users to whom you'd like to restrict the analysis. This allows you to see e.g. who might know more about a file given their history of working with it over time.
+ `--file </path/to/file [/path/to/other/file ...]>` | An optional list of files/directories for which you'd like analysis. Defaults to `/`. The analysis will also include all directories between a subdirectory and the root of the repository. All file paths are relative to the repository root.
+ `--cache` | Enabled by default. Caches all commits that the tool loads for a git repository. This allows you to e.g. `git pull` on a large repository and only incur the cost of loading the additional commits from the repository, while previously seen commits are loaded much more quickly from the cache.
+ `--no-cache` | Specify this if you do *not* want caching. You'll probably never need to use this.
+ `--results <count>` | The number of subject matter experts you'd like to see for each path. Defaults to 10.
+
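The `--file` description says the analysis also covers every directory between a path and the repository root. A minimal sketch of that expansion, assuming repository-relative paths separated by `/`; `ancestor_paths` is a hypothetical helper written for illustration, not a function exported by the gem:

```ruby
# Hypothetical helper: list the root, each intermediate directory, and the
# file itself for a repository-relative path.
def ancestor_paths(filename)
  parts = filename.split('/').reject(&:empty?)
  paths = ['/']

  parts.each_index do |idx|
    # Join the first idx + 1 components back into an absolute-style path
    paths << '/' + parts[0..idx].join('/')
  end

  paths
end

p ancestor_paths('cli/dinghy/machine.rb')
# → ["/", "/cli", "/cli/dinghy", "/cli/dinghy/machine.rb"]
```

Crediting a change to every one of these paths is what lets a single commit to `machine.rb` also count toward expertise on `/cli/dinghy`, `/cli`, and `/`.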
+ ## Development
+
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+
+ ## Contributing
+
+ Bug reports and pull requests are welcome on GitHub at https://github.com/sjaveed/git_sme. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
+
+ ## License
+
+ The gem is available as open source under the terms of the [GPLv3 License](https://www.gnu.org/licenses/gpl-3.0.en.html).
+
+ ## Code of Conduct
+
+ Everyone interacting in the GitSme project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/sjaveed/git_sme/blob/master/CODE_OF_CONDUCT.md).
@@ -0,0 +1,6 @@
+ require "bundler/gem_tasks"
+ require "rspec/core/rake_task"
+
+ RSpec::Core::RakeTask.new(:spec)
+
+ task :default => :spec
@@ -0,0 +1,14 @@
+ #!/usr/bin/env ruby
+
+ require "bundler/setup"
+ require "git_sme"
+
+ # You can add fixtures and/or initialization code here to make experimenting
+ # with your gem easier. You can also use a different console, if you like.
+
+ # (If you use this, don't forget to add pry to your Gemfile!)
+ # require "pry"
+ # Pry.start
+
+ require "irb"
+ IRB.start(__FILE__)
@@ -0,0 +1,8 @@
+ #!/usr/bin/env bash
+ set -euo pipefail
+ IFS=$'\n\t'
+ set -vx
+
+ bundle install
+
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,6 @@
+ #!/usr/bin/env ruby
+
+ require 'git_sme/cli'
+
+ # Prepend analyze so we don't have to keep typing that on the command line
+ GitSme::CLI.start(ARGV.dup.unshift('analyze'))
@@ -0,0 +1,31 @@
+ # coding: utf-8
+ lib = File.expand_path("../lib", __FILE__)
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+ require "git_sme/version"
+
+ Gem::Specification.new do |spec|
+   spec.name = "git_sme"
+   spec.version = GitSme::VERSION
+   spec.authors = ["Shahbaz Javeed"]
+   spec.email = ["sjaveed@gmail.com"]
+
+   spec.summary = %q{Identify subject matter experts by analyzing your git repository}
+   spec.description = %q{Analyze your git repository and determine subject matter experts by identifying everyone who has touched a file with preference given to recent touches}
+   spec.homepage = "https://github.com/sjaveed/git_sme"
+   spec.license = "MIT"
+
+   spec.files = `git ls-files -z`.split("\x0").reject do |f|
+     f.match(%r{^(test|spec|features)/})
+   end
+   spec.bindir = "exe"
+   spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+   spec.require_paths = ["lib"]
+
+   spec.add_development_dependency "bundler", "~> 1.15"
+   spec.add_development_dependency "rake", "~> 10.0"
+   spec.add_development_dependency "rspec", "~> 3.2"
+
+   spec.add_dependency 'ruby-progressbar'
+   spec.add_dependency 'rugged'
+   spec.add_dependency 'thor'
+ end
@@ -0,0 +1,9 @@
+ require 'git_sme/version'
+ require 'git_sme/preferences'
+ require 'git_sme/commit_loader'
+ require 'git_sme/commit_analyzer'
+ require 'git_sme/analysis_presenter'
+
+ module GitSme
+   # Your code goes here...
+ end
@@ -0,0 +1,71 @@
+ module GitSme
+   class AnalysisPresenter
+     attr_reader :valid, :error_message
+
+     alias_method :valid?, :valid
+
+     def initialize(commit_analyzer, users = [], files = [])
+       @commit_analyzer = commit_analyzer
+       @users = users
+       @files = files
+       # Fall back to the repository root when neither filter was given
+       @files = ['/'] unless @users.any? || @files.any?
+
+       @valid = @commit_analyzer.valid?
+       @error_message = @commit_analyzer.error_message
+     end
+
+     def get_relevant_analyses(results_to_show = 10)
+       @commit_analyzer.analyze unless @commit_analyzer.analyzed?
+
+       users_to_match = @users.any? ? get_matching_keys(@commit_analyzer.analysis[:by_user].keys, @users) : []
+       files_to_match = @files.any? ? get_matching_keys(@commit_analyzer.analysis[:by_file].keys, @files) : []
+       presentable_data = []
+
+       # Every branch passes results_to_show so the requested limit is always honored
+       if users_to_match.any? && files_to_match.any?
+         users_to_match.each do |user|
+           user_data = @commit_analyzer.analysis[:by_user][user].select { |k, _v| files_to_match.include?(k) }
+           presentable_data << presentable_file_or_user({ user => user_data }, user, results_to_show: results_to_show)
+         end
+
+         puts
+
+         files_to_match.each do |file|
+           user_data = @commit_analyzer.analysis[:by_file][file].select { |k, _v| users_to_match.include?(k) }
+           presentable_data << presentable_file_or_user({ file => user_data }, file, results_to_show: results_to_show)
+         end
+       elsif users_to_match.any?
+         get_matching_keys(@commit_analyzer.analysis[:by_user].keys, users_to_match).each do |user|
+           presentable_data << presentable_file_or_user(@commit_analyzer.analysis[:by_user], user, results_to_show: results_to_show)
+         end
+       elsif files_to_match.any?
+         get_matching_keys(@commit_analyzer.analysis[:by_file].keys, files_to_match).each do |path|
+           presentable_data << presentable_file_or_user(@commit_analyzer.analysis[:by_file], path, results_to_show: results_to_show)
+         end
+       end
+
+       presentable_data.compact
+     end
+
+     private
+
+     # Returns { key => [names sorted by descending score] }, or nil when empty
+     def presentable_file_or_user(data, key, results_to_show: 10)
+       stats = data[key]
+       info_to_show = sort_keys_by_value(stats).first(results_to_show)
+       return if info_to_show.empty?
+
+       {
+         key => info_to_show
+       }
+     end
+
+     def sort_keys_by_value(data)
+       data.keys.sort_by { |k| data[k] }.reverse
+     end
+
+     # Each known key is used as the pattern and tested against the requested values
+     def get_matching_keys(all_keys, keys_to_match)
+       all_keys.select do |key|
+         keys_to_match.any? { |matcher| matcher.match?(key) }
+       end
+     end
+   end
+ end
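The presenter's two core steps, selecting the paths that match a request and ranking authors by score, can be sketched in isolation. The data below is made up, and `matching_keys` and `top_names` are hypothetical stand-ins rather than the class's private methods. The sketch relies on `String#match?` (Ruby 2.4+) converting a String argument into a Regexp.

```ruby
# Hypothetical data: path => { author => weighted score }
by_file = {
  '/'                      => { 'brianp' => 40.0, 'ryan' => 12.5, 'markse' => 6.0 },
  '/cli'                   => { 'brianp' => 30.0, 'ryan' => 10.0, 'markse' => 6.0 },
  '/cli/dinghy/machine.rb' => { 'brianp' => 9.0, 'markse' => 4.0, 'ryan' => 1.5 },
  '/exe'                   => { 'ryan' => 2.0 }
}

# Each known path is treated as a pattern and tested against the requested
# path, so ancestors such as '/' and '/cli' match a request for a nested file.
def matching_keys(all_keys, requested)
  all_keys.select { |key| requested.any? { |wanted| wanted.match?(key) } }
end

# Highest score first, limited to the requested number of names.
def top_names(scores, limit)
  scores.sort_by { |_name, score| -score }.first(limit).map(&:first)
end

matching_keys(by_file.keys, ['/cli/dinghy/machine.rb']).each do |path|
  puts "#{path}: #{top_names(by_file[path], 2).join(', ')}"
end
```

Because the keys act as regular expressions, a key also matches when it appears anywhere inside the requested path, and characters such as `.` behave as wildcards. That would explain unrelated-looking entries such as `/dinghy` in the README output.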
@@ -0,0 +1,51 @@
+ require 'fileutils'
+ require 'yaml'
+
+ require_relative 'preferences'
+
+ module GitSme
+   class Cache
+     def initialize(name, enabled: true, directory: 'cache', file_prefix: '', file_suffix: '')
+       # Reject nil, empty, and whitespace-only names
+       raise "Invalid cache name: [#{name}]" if name.nil? || name.strip.empty?
+
+       # Keep only letters and hyphens so the name is safe to use in a filename
+       @name = name.gsub(/[^a-zA-Z-]/, '').strip
+       @enabled = enabled
+       @cache_directory = File.join(PREFERENCES_HOME, directory)
+       @file_prefix = file_prefix
+       @file_suffix = file_suffix
+
+       FileUtils.mkdir_p(@cache_directory) unless File.exist?(@cache_directory)
+     end
+
+     def load
+       return [] unless @enabled && File.exist?(cache_filename)
+
+       YAML.load(File.read(cache_filename))
+     end
+
+     def save(data)
+       return unless @enabled
+
+       File.open(cache_filename, 'w') { |f| f.write(YAML.dump(data)) }
+     end
+
+     private
+
+     def prefix
+       return '' if @file_prefix =~ /^\s*$/
+
+       "#{@file_prefix}-"
+     end
+
+     def suffix
+       return '' if @file_suffix =~ /^\s*$/
+
+       "-#{@file_suffix}"
+     end
+
+     def cache_filename
+       filename = @name
+
+       File.join(@cache_directory, "#{prefix}#{filename}#{suffix}.yml")
+     end
+   end
+ end
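The cache is a YAML file that is written after a run and read back on the next one. A minimal round-trip sketch follows. `save_cache` and `load_cache` are hypothetical helpers, not the gem's `Cache` class; the file name and record contents are made up; and the sketch uses string keys with `YAML.safe_load`, which is a choice made for this example rather than what the class above does.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helpers illustrating a YAML-backed cache file.
def save_cache(path, data)
  File.write(path, YAML.dump(data))
end

def load_cache(path)
  # A missing file is treated as an empty cache
  return [] unless File.exist?(path)

  YAML.safe_load(File.read(path))
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'example-master.yml')

  p load_cache(path)  # nothing cached yet
  save_cache(path, [{ 'sha' => 'abc123', 'author' => 'brianp' }])
  p load_cache(path)  # the record written above
end
```

One design point worth noting: on newer Ruby versions, where `YAML.load` behaves like a safe load, data containing Symbol keys may need those classes explicitly permitted, which is why this sketch sticks to strings.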
@@ -0,0 +1,60 @@
+ require 'thor'
+ require 'ruby-progressbar'
+ require 'git_sme'
+
+ module GitSme
+   class CLI < Thor
+     desc 'analyze <repository> [--branch <branch>] [--user <username>] [--file </path/to/file>] [--cache | --no-cache] [--results <count>]',
+       'Analyze the repository and determine the subject matter experts for the given files, limiting them to the users provided if needed'
+
+     method_option :branch, type: :string, default: 'master'
+     method_option :user, type: :array, default: []
+     method_option :file, type: :array, default: ['/']
+     method_option :cache, type: :boolean, default: true
+     method_option :results, type: :numeric, default: 10
+
+     def analyze(repository)
+       loader = GitSme::CommitLoader.new(repository, branch: options[:branch], enable_cache: options[:cache])
+       unless loader.valid?
+         puts "Error: #{loader.error_message}"
+         return
+       end
+
+       puts "Repository: #{loader.repo.path.gsub('/.git/', '')}"
+
+       loader_progress = ProgressBar.create(starting_at: 0, format: 'Loaded: %c (%R/s) %P%% %f |%B|')
+       loader.load do |new_commit_count, processed_commit_count, all_commit_count|
+         loader_progress.total = all_commit_count
+         loader_progress.increment
+       end
+
+       analyzer = GitSme::CommitAnalyzer.new(loader, enable_cache: false)
+       unless analyzer.valid?
+         puts "Error: #{analyzer.error_message}"
+         return
+       end
+
+       analyzer_progress = ProgressBar.create(starting_at: 0, total: loader.commits.size, format: 'Analyzed: %c (%R/s) %P%% %f |%B|')
+       analyzer.analyze do |commit_count, total_commits|
+         analyzer_progress.increment
+       end
+
+       presenter = AnalysisPresenter.new(analyzer, options[:user], options[:file])
+       analyses = presenter.get_relevant_analyses(options[:results].to_i)
+
+       puts
+
+       if !analyses.empty?
+         analyses.each do |result|
+           result.each do |path, users|
+             puts "#{path}: #{users.join(', ')}"
+           end
+         end
+       else
+         puts 'No data found!'
+       end
+     end
+
+     default_task :analyze
+   end
+ end
@@ -0,0 +1,125 @@
+ module GitSme
+   class CommitAnalyzer
+     attr_reader :valid, :error_message, :analysis, :analyzed
+
+     alias_method :valid?, :valid
+     alias_method :analyzed?, :analyzed
+
+     def initialize(commit_loader, enable_cache: true)
+       # Honor the caller's choice instead of always enabling the cache
+       @enable_cache = enable_cache
+       @commit_loader = commit_loader
+       @analyzed = false
+       @valid = @commit_loader.valid?
+       @error_message = @commit_loader.error_message
+
+       @analysis = {}
+       @cache = GitSme::Cache.new(@commit_loader.repo.path.gsub('/.git/', ''),
+         enabled: @enable_cache, file_suffix: "#{@commit_loader.branch}-analysis"
+       )
+     end
+
+     def analyze(force: false)
+       return unless valid?
+       return if analyzed? && !force
+
+       @commit_loader.load
+       @analysis = @cache.load
+       new_analysis = []
+
+       if !@commit_loader.new_commits? || !@analysis.any?
+         if block_given?
+           @analysis = analyze_new_commits(@commit_loader.commits) do |commit_count, total_commits|
+             yield(commit_count, total_commits)
+           end
+         else
+           @analysis = analyze_new_commits(@commit_loader.commits)
+         end
+       elsif @commit_loader.new_commits?
+         new_analysis = if block_given?
+           analyze_new_commits(@commit_loader.new_commits) do |commit_count, total_commits|
+             yield(commit_count, total_commits)
+           end
+         else
+           analyze_new_commits(@commit_loader.new_commits)
+         end
+
+         summed_merge(@analysis[:by_user], new_analysis[:by_user])
+         summed_merge(@analysis[:by_file], new_analysis[:by_file])
+       end
+
+       @cache.save(@analysis)
+       @analyzed = true
+     end
+
+     private
+
+     # Builds two mirrored indexes of weighted change totals:
+     # author => { path => score } and path => { author => score }
+     def analyze_new_commits(commits_to_process)
+       user_stats = {}
+       file_stats = {}
+       now = Time.now.to_i
+       commit_count = commits_to_process.size
+
+       commits_to_process.each_with_index do |commit, current_commit_idx|
+         author = commit[:author]
+         time_delta = now - commit[:timestamp]
+
+         commit[:file_changes].each do |filename, change_details|
+           all_affected_paths(filename).each do |path|
+             change_value = weighted_value(change_details[:changes], time_delta)
+
+             user_stats[author] = {} unless user_stats.key?(author)
+             user_stats[author][path] = 0 unless user_stats[author].key?(path)
+
+             file_stats[path] = {} unless file_stats.key?(path)
+             file_stats[path][author] = 0 unless file_stats[path].key?(author)
+
+             user_stats[author][path] += change_value
+             file_stats[path][author] += change_value
+           end
+         end
+
+         if block_given?
+           yield(current_commit_idx, commit_count)
+         end
+       end
+
+       {
+         by_user: user_stats,
+         by_file: file_stats
+       }
+     end
+
+     # Adds the scores in new_data into cached_data in place
+     def summed_merge(cached_data, new_data)
+       return if new_data.nil? || cached_data.nil?
+
+       new_data.each do |key, value_hash|
+         if cached_data.key?(key)
+           value_hash.each do |value_key, value|
+             if cached_data[key].key?(value_key)
+               cached_data[key][value_key] += value
+             else
+               cached_data[key][value_key] = value
+             end
+           end
+         else
+           cached_data[key] = value_hash
+         end
+       end
+     end
+
+     # Returns the root, every parent directory, and the file itself
+     def all_affected_paths(filename)
+       ['/'] + filename.split('/').each_with_object([]) do |path_part, path_list|
+         path_list << [path_list[-1], path_part].join('/')
+       end
+     end
+
+     def weighted_value(value, time_delta)
+       # value_attenuator = 1.0
+       # value_attenuation = value_attenuator * time_delta / time_delta
+       # The exponent must be a Float: the integer expression -1/3 evaluates to -1,
+       # which would attenuate by 1/time_delta rather than by the inverse cube root.
+       value_attenuation = time_delta > 0 ? time_delta ** (-1.0 / 3) : 1
+
+       (value * value_attenuation).to_f
+     end
+   end
+ end
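The incremental path above folds scores for new commits into previously stored totals. The same nested merge can be written with `Hash#merge` and a conflict block. This is a sketch with made-up data: `summed` is a hypothetical, non-mutating variant, whereas the class's `summed_merge` updates the cached hash in place.

```ruby
# Hypothetical non-mutating variant of a two-level summed merge.
# Outer keys are paths (or authors); inner hashes map names to scores.
def summed(cached, fresh)
  cached.merge(fresh) do |_key, old_scores, new_scores|
    # Both sides know this key: add scores name by name
    old_scores.merge(new_scores) { |_name, a, b| a + b }
  end
end

cached = { '/cli' => { 'brianp' => 2.0 } }
fresh  = { '/cli' => { 'brianp' => 1.0, 'ryan' => 0.5 }, '/exe' => { 'ryan' => 1.0 } }

p summed(cached, fresh)
# → {"/cli"=>{"brianp"=>3.0, "ryan"=>0.5}, "/exe"=>{"ryan"=>1.0}}
```

Keys present on only one side are carried over unchanged, which matches the behavior of the explicit loops in the class.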