licensee 5.0.0 → 6.0.0b1
- checksums.yaml +4 -4
- data/README.md +15 -50
- data/bin/licensee +7 -8
- data/lib/licensee.rb +9 -33
- data/lib/licensee/content_helper.rb +7 -8
- data/lib/licensee/license.rb +5 -28
- data/lib/licensee/matchers/copyright_matcher.rb +17 -16
- data/lib/licensee/matchers/dice_matcher.rb +65 -0
- data/lib/licensee/matchers/exact_matcher.rb +12 -6
- data/lib/licensee/matchers/gemspec_matcher.rb +11 -11
- data/lib/licensee/matchers/npm_bower_matcher.rb +10 -10
- data/lib/licensee/matchers/package_matcher.rb +11 -10
- data/lib/licensee/project.rb +96 -30
- data/lib/licensee/project_file.rb +57 -77
- data/lib/licensee/version.rb +1 -1
- data/licensee.gemspec +26 -0
- data/test/fixtures/npm.git/HEAD +1 -0
- data/test/fixtures/npm.git/config +4 -0
- data/test/fixtures/npm.git/objects/info/packs +2 -0
- data/test/fixtures/npm.git/objects/pack/pack-03c0879445cabcc37f91d97c7955465adef26f4a.idx +0 -0
- data/test/fixtures/npm.git/objects/pack/pack-03c0879445cabcc37f91d97c7955465adef26f4a.pack +0 -0
- data/test/fixtures/npm.git/packed-refs +2 -0
- data/test/functions.rb +4 -15
- data/test/test_licensee.rb +1 -13
- data/test/test_licensee_copyright_matcher.rb +19 -28
- data/test/test_licensee_dice_matcher.rb +21 -0
- data/test/test_licensee_exact_matcher.rb +4 -6
- data/test/test_licensee_gemspec_matcher.rb +3 -11
- data/test/test_licensee_license.rb +2 -12
- data/test/test_licensee_npm_bower_matcher.rb +10 -16
- data/test/test_licensee_project.rb +24 -35
- data/test/test_licensee_project_file.rb +5 -10
- data/vendor/choosealicense.com/_licenses/afl-3.0.txt +69 -0
- data/vendor/choosealicense.com/_licenses/isc.txt +2 -2
- metadata +14 -26
- data/lib/licensee/filesystem_repository.rb +0 -38
- data/lib/licensee/matcher.rb +0 -32
- data/lib/licensee/matchers/git_matcher.rb +0 -27
- data/lib/licensee/matchers/levenshtein_matcher.rb +0 -75
- data/test/test_licensee_content_helper.rb +0 -40
- data/test/test_licensee_git_matcher.rb +0 -19
- data/test/test_licensee_levenshtein_matcher.rb +0 -34
- data/test/test_licensee_matcher.rb +0 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a9a12c248b281de58f46188030a603b373bfb502
+  data.tar.gz: 384ffeaeef392f999d1b0c6f20efd6651abcf34d
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f90e04e86ea6a97828ac39c52ffeb3661082c5df4a4b1c3944a09555d1e39827c052870ea63541eef675639170ff18f3bc661a2b1e84fe0e8451a3fe299aa04c
+  data.tar.gz: 3c2a06a4e5f9c27fa26df6f8fbb0ff62ce730d0af1dcbaed1774c1b38990158a07813306892f47a9386a0661dae48be27a3bbd31d5c33b197f88cd73a8bcc23b
data/README.md
CHANGED
@@ -1,33 +1,22 @@
 # Licensee
-
-*A Ruby Gem to detect under what license a project is distributed.*
+_A Ruby Gem to detect under what license a project is distributed._
 
 [![Build Status](https://travis-ci.org/benbalter/licensee.svg?branch=master)](https://travis-ci.org/benbalter/licensee) [![Gem Version](https://badge.fury.io/rb/licensee.svg)](http://badge.fury.io/rb/licensee)
 
 ## The problem
-
-
-
-* You've got a project with a license file, but which license is it? Has it been modified?
+- You've got an open source project. How do you know what you can and can't do with the software?
+- You've got a bunch of open source projects, how do you know what their licenses are?
+- You've got a project with a license file, but which license is it? Has it been modified?
 
 ## The solution
-
 Licensee automates the process of reading `LICENSE` files and compares their contents to known licenses using a several strategies (which we call "Matchers"). It attempts to determine a project's license in the following order:
+- If the license file has an explicit copyright notice, and nothing more (e.g., `Copyright (c) 2015 Ben Balter`), we'll assume the author intends to retain all rights, and thus the project isn't licensed.
+- If the license is an exact match to a known license. If we strip away whitespace and copyright notice, we might get lucky, and direct string comparison in Ruby is cheap.
+- If we still can't match the license, we use a fancy math thing called the [Sørensen–Dice coefficient](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient), which is really good at calculating the similarity between two strings. By calculating the percent changed from the known license to the license file, you can tell, e.g., that a given license is 90% similar to the MIT license, that 10% likely representing the copyright line being properly adapted to the project.
 
-
-
-2. If the license is an exact match to a known license. Licenses like GPL don't have a copyright notice that needs to be changed in the license itself, so if we strip away whitespace, we might get lucky, and direct string comparison in Ruby is cheap.
-
-3. If 90% of the lines match a known license. We use Git's internal change calculation method. To calculate diffs, Git hashes each line of both files, and compares the hashes to tell the percent changed. This method is fast, but is done on a line-by-line basis, so if the license is wrapped differently, or has extra words inserted, it's not going to match the license.
-
-4. If we still can't match the license, we use a fancy math thing called the [Levenshtein distance algorithm](https://en.wikipedia.org/wiki/Levenshtein_distance), which while very slow, is really good at calculating the similarity between two strings. By calculating the percent changed from the known license to the license file, you can tell, e.g., that a given license is 90% similar to the MIT license, that 10% likely representing the copyright line being properly adapted to the project.
-
-Licensee will even diff the distributed license with the original, so you can see exactly what, if anything's been changed.
-
-*Special thanks to [@vmg](https://github.com/vmg) for his Git prowess.*
+_Special thanks to [@vmg](https://github.com/vmg) for his Git and algorithmic prowess._
 
 ## Installation
-
 `gem install licensee` or add `gem 'licensee'` to your project's `Gemfile`.
 
 ## Usage
@@ -52,20 +41,7 @@ license.meta["permitted"]
 => ["commercial-use","modifications","distribution","sublicense","private-use"]
 ```
 
-## Diffing
-
-You can also generate a diff of the known license to the distributed license.
-
-```ruby
-puts Licensee.diff "/path/to/a/project"
--Copyright (c) [year] [fullname]
-+Copyright (c) 2014 Ben Balter
-```
-
-For a full list of diff options (HTML output, color output, etc.) see [Diffy](https://github.com/samg/diffy).
-
 ## Command line usage
-
 1. `cd` into a project directory
 2. execute the `licensee` command
 
@@ -78,54 +54,43 @@ Matcher: Licensee::GitMatcher
 ```
 
 ## What it looks at
-
-
-* Crowdsourced license content and metadata from [`choosealicense.com`](http://choosealicense.com)
+- `LICENSE`, `LICENSE.txt`, `COPYING`, etc. files in the root of the project, comparing the body to known licenses
+- Crowdsourced license content and metadata from [`choosealicense.com`](http://choosealicense.com)
 
 ## What it doesn't look at
-
-
-
-
-* Every single possible license (just the most popular ones)
-* Compliance (e.g., whitelisting certain licenses)
+- Dependency licensing
+- References to licenses in `README`, `README.md`, etc.
+- Every single possible license (just the most popular ones)
+- Compliance (e.g., whitelisting certain licenses)
 
 If you're looking for dependency license checking and compliance, take a look at [LicenseFinder](https://github.com/pivotal/LicenseFinder).
 
 ## Huh? Why don't you look at X?
-
 Because reasons.
 
 ### Why not just look at the "license" field of [insert package manager here]?
-
 Because it's not legally binding. A license is a legal contract. You give up certain rights (e.g., the right to sue the author) in exchange for the right to use the software.
 
-Most popular licenses today
+Most popular licenses today _require_ that the license itself be distributed along side the software. Simply putting the letters "MIT" or "GPL" in a configuration file doesn't really meet that requirement.
 
 Not to mention, it doesn't tell you much about your rights as a user. Is it GPLv2? GPLv2 or later? Those files are designed to be read by computers (who can't enter into contracts), not humans (who can). It's great metadata, but that's about it.
 
 ### What about looking to see if the author said something in the readme?
-
 You could make an argument that, when linked or sufficiently identified, the terms of the license are incorporated by reference, or at least that the author's intent is there. There's a handful of reasons why this isn't ideal. For one, if you're using the MIT or BSD (ISC) license, along with a few others, there's templematic language, like the copyright notice, which would go unfilled.
 
 ### What about checking every single file for a copyright header?
-
 Because that's silly in the context of how software is developed today. You wouldn't put a copyright notice on each page of a book. Besides, it's a lot of work, as there's no standardized, cross-platform way to describe a project's license within a comment.
 
 Checking the actual text into version control is definitive, so that's what this project looks at.
 
 ## Bootstrapping a local development environment
-
 `script/bootstrap`
 
 ## Running tests
-
 `script/cibuild`
 
 ## Updating the licenses
-
 License data is pulled from `choosealicense.com`. To update the license data, simple run `script/vendor-licenses`.
 
 ## Roadmap
-
 See [proposed enhancements](https://github.com/benbalter/licensee/labels/enhancement).
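Note on the README change: the revised "The solution" list describes trying strategies in a fixed order and stopping at the first one that produces a result. The sketch below shows only that control flow. The lambdas, the `toy-license` key, and the `detect` helper are hypothetical stand-ins for illustration and are not Licensee's API.

```ruby
# Hypothetical matchers, each a callable returning a license key or nil.
# They stand in, in order, for the copyright, exact, and similarity steps.
MATCHERS = [
  # A lone copyright line and nothing else is treated as "no license".
  ->(text) { 'no-license' if text.strip =~ /\ACopyright (\(c\) )?\d{4}.*\z/i },
  # Toy exact comparison against a single known body.
  ->(text) { 'toy-license' if text.strip == 'Permission is hereby granted' },
  # Placeholder for a similarity-based strategy; always declines here.
  ->(_text) { nil }
].freeze

# Return the result of the first matcher that yields a non-nil value.
def detect(text, matchers = MATCHERS)
  matchers.each do |matcher|
    result = matcher.call(text)
    return result if result
  end
  nil
end
```

Ordering the cheap, high-certainty checks first means the more expensive similarity step only runs when the earlier ones decline.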
data/bin/licensee
CHANGED
@@ -3,19 +3,18 @@ require_relative "../lib/licensee"
 
 path = ARGV[0] || Dir.pwd
 
-Licensee.
-
-license = project.matched_file
+project = Licensee::GitProject.new(path, detect_packages: true)
+file = project.matched_file
 
 if project.license_file
-  puts "License file: #{project.license_file.
+  puts "License file: #{project.license_file.filename}"
   puts "Attribution: #{project.license_file.attribution}" if project.license_file.attribution
 end
 
-if
-  puts "License: #{license
-  puts "Confidence: #{
-  puts "Method: #{
+if file
+  puts "License: #{file.license ? file.license.meta['title'] : 'no license'}"
+  puts "Confidence: #{file.confidence}%"
+  puts "Method: #{file.matcher.class}"
 else
   puts "Unknown"
 end
data/lib/licensee.rb
CHANGED
@@ -1,25 +1,17 @@
-require 'uri'
-require 'yaml'
-require 'rugged'
-require 'levenshtein'
-
 require_relative "licensee/version"
 require_relative "licensee/content_helper"
 require_relative "licensee/license"
 require_relative "licensee/project"
 require_relative "licensee/project_file"
-
-require_relative "licensee/matcher"
+
 require_relative "licensee/matchers/exact_matcher"
 require_relative "licensee/matchers/copyright_matcher"
-require_relative "licensee/matchers/
-require_relative "licensee/matchers/levenshtein_matcher"
+require_relative "licensee/matchers/dice_matcher"
 require_relative "licensee/matchers/package_matcher"
 require_relative "licensee/matchers/gemspec_matcher"
 require_relative "licensee/matchers/npm_bower_matcher"
 
 class Licensee
-
   # Over which percent is a match considered a match by default
   CONFIDENCE_THRESHOLD = 90
 
@@ -27,40 +19,24 @@ class Licensee
   DOMAIN = "http://choosealicense.com"
 
   class << self
-
-    attr_writer :confidence_threshold, :package_manager_files
+    attr_writer :confidence_threshold
 
     # Returns an array of Licensee::License instances
     def licenses(options={})
       Licensee::License.all(options)
     end
 
-    # Returns the license for a given
+    # Returns the license for a given path
     def license(path)
-
-
-
-
-
-    def matchers
-      matchers = [
-        Licensee::CopyrightMatcher,
-        Licensee::ExactMatcher,
-        Licensee::GitMatcher,
-        Licensee::LevenshteinMatcher,
-        Licensee::GemspecMatcher,
-        Licensee::NpmBowerMatcher
-      ]
-      matchers.reject! { |m| m.package_manager? } unless package_manager_files?
-      matchers
+      begin
+        Licensee::GitProject.new(path).license
+      rescue Licensee::GitProject::InvalidRepository
+        Licensee::FSProject.new(path).license
+      end
     end
 
     def confidence_threshold
       @confidence_threshold ||= CONFIDENCE_THRESHOLD
     end
-
-    def package_manager_files?
-      @package_manager_files || false
-    end
   end
 end
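Note on the new `Licensee.license`: it tries to open the path as a Git repository first and falls back to plain filesystem access when that raises `InvalidRepository`. The sketch below reproduces only that begin/rescue shape. `GitProjectStub`, `FSProjectStub`, the `.git` directory check, and the returned symbols are assumptions made for illustration; the real `Licensee::GitProject` and `Licensee::FSProject` live in `project.rb`, which is not shown in this part of the diff.

```ruby
# Hypothetical stand-ins for Licensee::GitProject and Licensee::FSProject.
class InvalidRepository < ArgumentError; end

class GitProjectStub
  def initialize(path)
    # Assumption for this sketch only: a path counts as a repository
    # when it contains a .git directory.
    raise InvalidRepository, path unless File.directory?(File.join(path, '.git'))
    @path = path
  end

  def license
    :from_git
  end
end

class FSProjectStub
  def initialize(path)
    @path = path
  end

  def license
    :from_filesystem
  end
end

# Same shape as the new Licensee.license: try Git first, then fall back.
def license_for(path)
  GitProjectStub.new(path).license
rescue InvalidRepository
  FSProjectStub.new(path).license
end
```

Rescuing one specific exception class keeps unrelated errors (for example a missing path) visible instead of silently routing them to the fallback.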
data/lib/licensee/content_helper.rb
CHANGED
@@ -1,14 +1,13 @@
+require 'set'
+
 class Licensee
   module ContentHelper
-    def
+    def create_word_set(content)
       return unless content
-      content = content.
-      content
-      content
-      content
-      content = content.gsub(/[[:space:]]+/, ' ')
-      content = content.gsub("\u0000", '') # Remove null byte which breaks Levenshtein
-      content.squeeze(' ').strip
+      content = content.dup
+      content.downcase!
+      content.gsub!(/^#{Matchers::Copyright::REGEX}$/i, '')
+      content.scan(/[\w']+/).to_set
     end
   end
 end
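Note on `create_word_set`: normalization now reduces a license body to a set of lowercase words with copyright lines removed, instead of a cleaned-up string. The standalone sketch below follows the added lines closely so the effect can be tried outside the gem. `COPYRIGHT_REGEX` is a simplified stand-in for `Matchers::Copyright::REGEX` (it writes the copyright sign as `\u00A9` and omits the raw-byte alternative), so treat it as an assumption, not the gem's constant.

```ruby
require 'set'

# Simplified stand-in for Matchers::Copyright::REGEX (assumption: the
# raw-byte alternative from the original is left out for portability).
COPYRIGHT_REGEX = /\s*Copyright (\u00A9|\(c\))? ?(\d{4}|\[year\])(.*)?\s*/i

# Lowercase the text, drop copyright lines, and collect the distinct words.
def create_word_set(content)
  return unless content
  content = content.dup            # avoid mutating the caller's string
  content.downcase!
  content.gsub!(/^#{COPYRIGHT_REGEX}$/i, '')
  content.scan(/[\w']+/).to_set
end

words = create_word_set("Copyright (c) 2015 Ben Balter\n\nPermission is hereby granted")
puts words.to_a.sort.join(' ')
```

Because the result is a set, line wrapping, letter case, punctuation, and repeated words no longer affect comparisons between a project's file and a known license.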
data/lib/licensee/license.rb
CHANGED
@@ -1,3 +1,6 @@
+require 'uri'
+require 'yaml'
+
 class Licensee
   class InvalidLicense < ArgumentError; end
   class License
@@ -39,10 +42,6 @@ class Licensee
 
     HIDDEN_LICENSES = %w[other no-license]
 
-    # Licenses that technically contain the license name or nickname
-    # But we are so short that GitMatcher may not catch if rewrapped
-    BODY_INCLUDES_WHITELIST = %w[mit]
-
     include Licensee::ContentHelper
 
     def initialize(key)
@@ -107,17 +106,8 @@ class Licensee
     alias_method :to_s, :body
     alias_method :text, :body
 
-
-
-      @body_normalized ||= normalize_content(body)
-    end
-
-    # Git-computed hash signature for the license file
-    def hashsig
-      @hashsig ||= Rugged::Blob::HashSignature.new(
-        body, Rugged::Blob::HashSignature::WHITESPACE_SMART) unless body.nil?
-    rescue Rugged::InvalidError
-      nil
+    def wordset
+      @wordset ||= create_word_set(body)
     end
 
     def inspect
@@ -132,20 +122,7 @@ class Licensee
       other != nil && key == other.key
     end
 
-    def body_includes_name?
-      return false if BODY_INCLUDES_WHITELIST.include?(key)
-      return @body_includes_name if defined? @body_includes_name
-      @body_includes_name = body_normalized.include?(name_without_version.downcase)
-    end
-
-    def body_includes_nickname?
-      return false if BODY_INCLUDES_WHITELIST.include?(key)
-      return @body_includes_nickname if defined? @body_includes_nickname
-      @body_includes_nickname = !!(nickname && body_normalized.include?(nickname.downcase))
-    end
-
     private
-
     def parts
       @parts ||= content.match(/\A(---\n.*\n---\n+)?(.*)/m).to_a if content
     end
data/lib/licensee/matchers/copyright_matcher.rb
CHANGED
@@ -1,24 +1,25 @@
 # encoding=utf-8
 class Licensee
-
+  module Matchers
+    class Copyright
+      REGEX = /\s*Copyright (©|\(c\)|\xC2\xA9)? ?(\d{4}|\[year\])(.*)?\s*/i
 
-
+      def initialize(file)
+        @file = file
+      end
 
-
-
-
-
-
-
-
-
-      100
-    end
-
-    private
+      def match
+        # Note: must use content, and not content_normalized here
+        if @file.content.strip =~ /\A#{REGEX}\z/i
+          Licensee::License.find("no-license")
+        end
+      rescue
+        nil
+      end
 
-
-
+      def confidence
+        100
+      end
     end
   end
 end
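Note on `Matchers::Copyright#match`: wrapping the regex in `\A` and `\z` after `strip` means only a file consisting of nothing but a copyright notice matches. A file that continues with license text does not. The sketch below demonstrates that anchoring. `COPYRIGHT_REGEX` is a simplified stand-in for the `REGEX` constant (the copyright sign is written as `\u00A9` and the raw-byte alternative is dropped), and `copyright_only?` is a hypothetical helper name, not a method of the gem.

```ruby
# Simplified stand-in for Matchers::Copyright::REGEX (assumption).
COPYRIGHT_REGEX = /\s*Copyright (\u00A9|\(c\))? ?(\d{4}|\[year\])(.*)?\s*/i

# True when the whole (stripped) content is a single copyright notice.
def copyright_only?(content)
  # \A and \z anchor to the entire string; `.` does not cross newlines,
  # so any following license text prevents a match.
  !!(content.strip =~ /\A#{COPYRIGHT_REGEX}\z/i)
end

puts copyright_only?("Copyright (c) 2015 Ben Balter\n")
puts copyright_only?("Copyright (c) 2015 Ben Balter\n\nPermission is hereby granted")
```

In the gem, a match of this kind resolves to the `no-license` entry with a fixed confidence of 100, per the added lines above.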
data/lib/licensee/matchers/dice_matcher.rb
ADDED
@@ -0,0 +1,65 @@
+class Licensee
+  module Matchers
+    class Dice
+      def initialize(file)
+        @file = file
+      end
+
+      # Return the first potential license that is more similar
+      # than the confidence threshold
+      def match
+        return @match if defined? @match
+        matches = potential_licenses.map do |license|
+          if (sim = similarity(license)) >= Licensee.confidence_threshold
+            [license, sim]
+          end
+        end
+        matches.compact!
+        @match = if matches.empty?
+          nil
+        else
+          matches.max_by { |l, sim| sim }.first
+        end
+      end
+
+      # Sort all licenses, in decending order, by difference in
+      # length to the file
+      # Difference in lengths cannot exceed the file's length *
+      # the confidence threshold / 100
+      def potential_licenses
+        @potential_licenses ||= begin
+          licenses = Licensee.licenses(:hidden => true)
+          licenses = licenses.select do |license|
+            license.wordset && length_delta(license) <= max_delta
+          end
+          licenses.sort_by { |l| length_delta(l) }
+        end
+      end
+
+      # Calculate the difference between the file length and a given
+      # license's length
+      def length_delta(license)
+        (@file.wordset.size - license.wordset.size).abs
+      end
+
+      # Maximum possible difference between file length and license length
+      # for a license to be a potential license to be matched
+      def max_delta
+        @max_delta ||= (@file.wordset.size * (Licensee.confidence_threshold/100.0))
+      end
+
+      # Confidence that the matched license is a match
+      def confidence
+        @confidence ||= match ? similarity(match) : 0
+      end
+
+      private
+      # Calculate percent changed between file and potential license
+      def similarity(license)
+        overlap = (@file.wordset & license.wordset).size
+        total = @file.wordset.size + license.wordset.size
+        100.0 * (overlap * 2.0 / total)
+      end
+    end
+  end
+end
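Note on `Matchers::Dice`: `similarity` is the Sørensen–Dice coefficient over word sets, 2·|A ∩ B| / (|A| + |B|), scaled to a percentage, and `match` keeps the highest-scoring candidate at or above the threshold. The standalone sketch below works through that arithmetic on toy data. The method names, the `THRESHOLD` constant, the empty-set guard, and the sample word lists are assumptions for illustration and are not part of the gem.

```ruby
require 'set'

THRESHOLD = 90 # mirrors the CONFIDENCE_THRESHOLD default shown in this diff

# Sørensen–Dice coefficient of two word sets, scaled to 0..100.
def dice_similarity(a, b)
  total = a.size + b.size
  return 0.0 if total.zero? # guard added for this sketch only
  overlap = (a & b).size
  100.0 * (overlap * 2.0 / total)
end

# Return the key of the most similar candidate at or above the threshold,
# or nil when no candidate qualifies. `candidates` maps key => word set.
def best_match(file_words, candidates, threshold = THRESHOLD)
  scored = candidates.map { |key, words| [key, dice_similarity(file_words, words)] }
  scored.select! { |_key, sim| sim >= threshold }
  best = scored.max_by { |_key, sim| sim }
  best && best.first
end

known = %w[permission is hereby granted free of charge to any person].to_set
file  = %w[permission is hereby granted free of charge to any human].to_set
puts dice_similarity(file, known) # 9 shared words out of 10 + 10
```

With nine of ten words shared on each side the score is 2·9 / 20 = 0.9, i.e. exactly at the default threshold, which is the "90% similar" situation the README describes.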
data/lib/licensee/matchers/exact_matcher.rb
CHANGED
@@ -1,11 +1,17 @@
 class Licensee
-
-
-
-
+  module Matchers
+    class Exact
+      def initialize(file)
+        @file = file
+      end
+
+      def match
+        Licensee.licenses(:hidden => true).find { |l| l.wordset == @file.wordset }
+      end
 
-
-
+      def confidence
+        100
+      end
     end
   end
 end
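Note on `Matchers::Exact`: "exact" now means equality of word sets, so two texts compare equal when they use the same vocabulary, regardless of wrapping, case, punctuation, or word order. The sketch below illustrates that behaviour. `word_set` is a simplified version of `create_word_set` without copyright stripping, and `KNOWN`, `toy-license`, and `exact_match` are hypothetical names introduced for the example.

```ruby
require 'set'

# Simplified normalization: lowercase words only (no copyright stripping).
def word_set(text)
  text.downcase.scan(/[\w']+/).to_set
end

# Hypothetical catalogue of known license bodies keyed by identifier.
KNOWN = {
  'toy-license' => word_set('Permission is hereby granted, free of charge')
}.freeze

# Return the key of the first known license whose word set is identical.
def exact_match(text, known = KNOWN)
  pair = known.find { |_key, words| words == word_set(text) }
  pair && pair.first
end

puts exact_match("PERMISSION is hereby\n  granted,\tfree of charge").inspect
puts exact_match('Permission is hereby denied').inspect
```

Since a set discards order and repetition, a reordered text with the same words also matches, which is worth keeping in mind when reading the fixed confidence of 100 in the hunk above.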