text_profile_signature 0.0.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: b40d0d688520628dfb1b3ae43e936d4c454753b9
4
+ data.tar.gz: 14e32db1f6b8e993a4f59740f8a28aeeeace3234
5
+ SHA512:
6
+ metadata.gz: c3593a2de6b6162c85c28cb55b952ab84bd81991e10ea07634de9cecc6fbef2eb2364b1b80b948824882f3acd1ce9481f6b70bd62ca2313a1ae2ec8a5af21014
7
+ data.tar.gz: 5c8a4e9ec4a11208a4809cef022d22e6e6519513b7dcafad18f61695c405fd9fba7828521e7ff8d3f72122e7820cd1d514f2d22c0a5b6202be24d47f9393b9f4
data/.gitignore ADDED
@@ -0,0 +1,2 @@
1
+ *.gem
2
+ Gemfile.lock
data/Gemfile ADDED
@@ -0,0 +1,3 @@
1
+ source "https://rubygems.org"
2
+
3
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,15 @@
1
+ Text Profile Signature calculates a fuzzy hash of textual fields for Deduplication.
2
+ Copyright (C) 2016 Hamed Ramezanian Nik
3
+
4
+ This program is free software: you can redistribute it and/or modify
5
+ it under the terms of the GNU Lesser General Public License as published by
6
+ the Free Software Foundation, either version 3 of the License, or
7
+ (at your option) any later version.
8
+
9
+ This program is distributed in the hope that it will be useful,
10
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
11
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
12
+ GNU Lesser General Public License for more details.
13
+
14
+ You should have received a copy of the GNU Lesser General Public License
15
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
data/README.md ADDED
@@ -0,0 +1,55 @@
1
+ # Text Profile Signature
2
+ [![Gem Version](https://badge.fury.io/rb/text_profile_signature.svg)](https://badge.fury.io/rb/text_profile_signature)
3
+
4
+ Text Profile Signature calculates a fuzzy hash of textual fields for deduplication. It is a Ruby port of Apache Solr's [TextProfileSignature](https://wiki.apache.org/solr/TextProfileSignature), which is written in Java.
5
+
6
+ ## Installation
7
+
8
+ ### RubyGems
9
+
10
+ Add this to the Gemfile:
11
+
12
+ gem 'text_profile_signature'
13
+
14
+ or install it directly:
15
+
16
+ gem install text_profile_signature
17
+
18
+ ### Install from Git
19
+
20
+ Add the following to the Gemfile:
21
+
22
+ gem 'text_profile_signature', :git => 'https://github.com/iCEAGE/text_profile_signature.git'
23
+
24
+
25
+ ## Getting Started
26
+
27
+ Please follow the [installation](#installation) procedure and then run the following code:
28
+
29
+ ```ruby
30
+ # Load the gem
31
+ require 'text_profile_signature'
32
+
33
+ opts = {
34
+ :min_token_length => 2, # Defaults to 2; only tokens longer than this are counted
35
+ :quant_rate => 0.01 # Defaults to 0.01
36
+ }
37
+
38
+ text_profile_signature_instance = TextProfileSignature.new(opts)
39
+
40
+ text = <<-STR
41
+ Liberty, in philosophy, involves free will as contrasted with determinism.[1] In politics, liberty consists of the social and political freedoms enjoyed by all citizens.[2] In theology, liberty is freedom from the bondage of sin.[3] Generally, liberty seems to be distinct from freedom in that freedom concerns itself primarily, if not exclusively, with the ability to do as one wills and what one has the power to do; whereas liberty also takes into account the rights of all involved. As such, liberty can be thought of as freedom limited by rights, and therefore cannot be abused.
42
+ STR
43
+
44
+ sign = text_profile_signature_instance.generate_sign(text)
45
+
46
+ puts sign
47
+
48
+ ```
49
+
50
+ ## Documentation for options
51
+
52
+ | Name | Type | Description | Default value |
53
+ |:----------------:|:-----:|:----------------------------------------------------------------------------------:|---------------|
54
+ | min_token_length | int | Tokens must be longer than this many characters to be counted | 2 |
55
+ | quant_rate | float | Multiplied by the maximum token frequency to get the step (QUANT) that token counts are rounded down to | 0.01 |
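To make the `quant_rate` row concrete, here is a small sketch of how the quantization step appears to be derived in `lib/text_profile_signature.rb`. The helper name `quant_for` is hypothetical and is not part of the gem's API; it only restates the rounding and clamping rule from `generate_sign`.

```ruby
# Hypothetical helper (not part of the gem): derives the quantization step
# from the highest token count, mirroring the rule in generate_sign.
def quant_for(max_freq, quant_rate = 0.01)
  quant = (max_freq * quant_rate).round
  # A step below 2 is clamped: 2 if any token repeats, otherwise 1.
  quant = (max_freq > 1 ? 2 : 1) if quant < 2
  quant
end

puts quant_for(1)     # 1: no token repeats
puts quant_for(40)    # 2: 40 * 0.01 rounds to 0, so the step is clamped
puts quant_for(300)   # 3
puts quant_for(1000)  # 10
```

With the default rate, the step only exceeds 2 once the most frequent token appears roughly 250 times. For shorter texts in which some token repeats, counts are rounded down to even numbers and tokens seen only once are dropped.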
data/Rakefile ADDED
@@ -0,0 +1,8 @@
1
+ require 'rake/testtask'
2
+
3
+ Rake::TestTask.new do |t|
4
+ t.test_files = FileList['test/lib/*_test.rb']
5
+ t.verbose = true
6
+ end
7
+
8
+ task :default => :test
data/lib/text_profile_signature.rb ADDED
@@ -0,0 +1,106 @@
1
+ =begin
2
+ Text Profile Signature calculates a fuzzy hash of textual fields for Deduplication.
3
+ Copyright (C) 2016 Hamed Ramezanian Nik
4
+
5
+ This program is free software: you can redistribute it and/or modify
6
+ it under the terms of the GNU Lesser General Public License as published by
7
+ the Free Software Foundation, either version 3 of the License, or
8
+ (at your option) any later version.
9
+
10
+ This program is distributed in the hope that it will be useful,
11
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
12
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
13
+ GNU Lesser General Public License for more details.
14
+
15
+ You should have received a copy of the GNU Lesser General Public License
16
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
17
+ =end
18
+
19
+ require 'digest'
20
+ require 'unicode'
21
+
22
+ class TextProfileSignature
23
+ VERSION = "0.0.1" unless defined? TextProfileSignature::VERSION
24
+
25
+ def initialize(options={})
26
+ options[:min_token_length] ||= 2
27
+ options[:quant_rate] ||= 0.01
28
+
29
+ @options = options
30
+ end
31
+
32
+ def generate_sign(text)
33
+ # remove all characters except letters and digits,
34
+ # and bring all characters to lower case
35
+ # split the text into tokens (all consecutive non-whitespace characters)
36
+ # discard tokens equal to or shorter than min_token_length (default 2 characters)
37
+ current_token = String.new
38
+ max_freq = 0
39
+ tokens = {}
40
+ text.each_char do |character|
41
+ if character =~ /[[:alnum:]]/
42
+ current_token << Unicode::downcase(character)
43
+ else
44
+ if current_token.length > 0
45
+ if current_token.length > @options[:min_token_length]
46
+ # Add it
47
+ tok = tokens[current_token]
48
+ unless tok
49
+ tok = {count: 0, term: current_token}
50
+ tokens[current_token] = tok
51
+ end
52
+ tok[:count] += 1
53
+ max_freq = tok[:count] if tok[:count] > max_freq
54
+ end
55
+ current_token = String.new
56
+ end
57
+ end
58
+ end
59
+
60
+ # Check the last token
61
+ if current_token.length > @options[:min_token_length]
62
+ # Add it
63
+ tok = tokens[current_token]
64
+ unless tok
65
+ tok = {count: 0, term: current_token}
66
+ tokens[current_token] = tok
67
+ end
68
+ tok[:count] += 1
69
+ max_freq = tok[:count] if tok[:count] > max_freq
70
+ end
71
+
72
+ # calculate the QUANT value
73
+ quant = (max_freq * @options[:quant_rate]).round
74
+
75
+ if quant < 2
76
+ if max_freq > 1
77
+ quant = 2
78
+ else
79
+ quant = 1
80
+ end
81
+ end
82
+
83
+ # round down the counts of tokens to the nearest multiple of QUANT
84
+ # tokens whose frequency after quantization falls below QUANT are discarded
85
+ quantized_tokens = tokens.values.inject([]) do |memo, item|
86
+ # round down to the nearest QUANT
87
+ item[:count] = (item[:count] / quant) * quant
88
+
89
+ # discard the frequencies below the QUANT
90
+ memo.push(item) if item[:count] >= quant
91
+
92
+ memo
93
+ end
94
+
95
+ # sort the list of tokens by decreasing frequency
96
+ profile = quantized_tokens.sort {|x, y| y[:count] <=> x[:count]}
97
+
98
+ # create a list of tokens and their quantized frequency,
99
+ # separated by spaces, in the order of decreasing frequency
100
+ quantized_frequency_str = profile.map do |a|
101
+ "#{a[:term]} #{a[:count]}"
102
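+ end.join('\n') # NOTE: single quotes make this a literal backslash + "n", not a newline; changing it changes the resulting signatures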
103
+
104
+ Digest::MD5.hexdigest(quantized_frequency_str)
105
+ end
106
+ end
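The steps of `generate_sign` can be condensed into a sketch that stops at the quantized profile, before hashing. It is an illustration under assumptions rather than the gem's implementation: it uses core Ruby's `String#downcase` instead of the `unicode` gem (so case-mapping edge cases may differ), it breaks ties between equal counts by first appearance, and `sketch_profile` is a hypothetical name.

```ruby
# Hypothetical helper (not part of the gem): returns [term, quantized_count]
# pairs in decreasing order of count, ties broken by first appearance.
def sketch_profile(text, min_token_length: 2, quant_rate: 0.01)
  counts = Hash.new(0)
  # Tokens are runs of letters/digits; keep only those longer than the minimum.
  text.downcase.scan(/[[:alnum:]]+/) do |token|
    counts[token] += 1 if token.length > min_token_length
  end

  max_freq = counts.values.max || 0
  quant = (max_freq * quant_rate).round
  quant = (max_freq > 1 ? 2 : 1) if quant < 2

  counts
    .map { |term, count| [term, (count / quant) * quant] } # round down to a multiple of quant
    .reject { |_, count| count < quant }                   # drop tokens that fall below quant
    .each_with_index
    .sort_by { |(_, count), index| [-count, index] }
    .map(&:first)
end

a = "Liberty and freedom, liberty and rights, liberty for all."
b = "LIBERTY and freedom; liberty and rights... liberty for everyone!"
p sketch_profile(a)                       # [["liberty", 2], ["and", 2]]
p sketch_profile(a) == sketch_profile(b)  # true
```

The two inputs differ in case, punctuation, and one rarely used word, yet they yield the same profile, which is why hashing the profile produces the same signature for near-duplicates.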
data/test/fixtures/liberty_article_from_de_wikipedia.yaml ADDED
@@ -0,0 +1,7 @@
1
+ :signature: 6a357987bbb275181328aec84aab0e00
2
+ :article: |
3
+ Freiheit (lateinisch libertas) wird in der Regel verstanden als die Möglichkeit, ohne Zwang zwischen unterschiedlichen Möglichkeiten auswählen und entscheiden zu können. Der Begriff benennt in Philosophie, Theologie und Recht der Moderne allgemein einen Zustand der Autonomie eines Subjekts.
4
+
5
+ Ebenfalls von rechtlicher, politischer und philosophischer Bedeutung ist die Unterscheidung zwischen positiver und negativer Freiheit, die sich nur zum Teil mit der Unterscheidung von inneren und äußern Beschränkungen der Handlungsfreiheit deckt. Sie ist vor allem sozialphilosophisch aufgeladen.[5] Die Unterscheidung findet sich schon bei Aristoteles, sie ist aber über die Tradition von Thomas Hobbes und Immanuel Kant zentrales Element des Liberalismus auch im 20. Jahrhundert geworden, dessen Hauptanliegen politische Selbstbestimmung, Schutz des Individuums und Freiheit des Wirtschaftshandelns (als Voraussetzung eines allgemeinen Wohlstandszuwachses und einer daraus resultierenden erweiterten Handlungsfähigkeit) sind. Negative Freiheit (Freiheit von etwas) bezeichnet einen Zustand, in dem keine von der Regierung, der Gesellschaft oder anderen Menschen ausgehenden Zwänge ein Verhalten erschweren oder verhindern;[6][7] Positive Freiheit (Freiheit zu etwas) bezeichnet die Möglichkeit der Selbstverwirklichung, insbesondere der demokratischen Selbstregierung einer Gemeinschaft.[8] Einige Sozialwissenstheoretiker, wie Ralf Dahrendorf lehnen diese Begriffe von Freiheit ab und vertreten stattdessen das Konzept einer einzigen sozialen Freiheit. Diese wird definiert als Abwesenheit externer sozialer Beschränkungen und dem Vorhandensein zumindest eines notwendigen Minimums an sozialen Handlungsressourcen.[9]
6
+
7
+ Im Allgemeinen wird auch bürgerlich-rechtlich die positive Freiheit von der negativen unterschieden. Die positive Freiheit (nicht zu verwechseln mit dem Positivismus) meint die Freiheit zu etwas, bspw. das Recht des Bürgers auf Bewegungsfreiheit oder Meinungsfreiheit. Negative Freiheit hingegen bezeichnet die Freiheit von etwas, bspw. von staatlicher Intervention im persönlichem oder künstlerischem Bereich.[10]
data/test/fixtures/liberty_article_from_en_wikipedia.yaml ADDED
@@ -0,0 +1,15 @@
1
+ :signature: f18022e6b522fa7249ab73498728c275
2
+ :article: |
3
+ Liberty, in philosophy, involves free will as contrasted with determinism.[1] In politics, liberty consists of the social and political freedoms enjoyed by all citizens.[2] In theology, liberty is freedom from the bondage of sin.[3] Generally, liberty seems to be distinct from freedom in that freedom concerns itself primarily, if not exclusively, with the ability to do as one wills and what one has the power to do; whereas liberty also takes into account the rights of all involved. As such, liberty can be thought of as freedom limited by rights, and therefore cannot be abused.
4
+
5
+
6
+
7
+ Philosophers from earliest times have considered the question of liberty. Roman Emperor Marcus Aurelius (121–180 AD) wrote of "a polity in which there is the same law for all, a polity administered with regard to equal rights and equal freedom of speech, and the idea of a kingly government which respects most of all the freedom of the governed."[4] According to Thomas Hobbes, "a free man is he that in those things which by his strength and wit he is able to do is not hindered to do what he hath the will to do" (Leviathan, Part 2, Ch. XXI).
8
+
9
+ John Locke (1632–1704) rejected that definition of liberty. While not specifically mentioning Hobbes, he attacks Sir Robert Filmer who had the same definition. According to Locke:
10
+
11
+ "In the state of nature, liberty consists of being free from any superior power on Earth. People are not under the will or lawmaking authority of others but have only the law of nature for their rule. In political society, liberty consists of being under no other lawmaking power except that established by consent in the commonwealth. People are free from the dominion of any will or legal restraint apart from that enacted by their own constituted lawmaking power according to the trust put in it. Thus, freedom is not as Sir Robert Filmer defines it: 'A liberty for everyone to do what he likes, to live as he pleases, and not to be tied by any laws.' Freedom is constrained by laws in both the state of nature and political society. Freedom of nature is to be under no other restraint but the law of nature. Freedom of people under government is to be under no restraint apart from standing rules to live by that are common to everyone in the society and made by the lawmaking power established in it. Persons have a right or liberty to (1) follow their own will in all things that the law has not prohibited and (2) not be subject to the inconstant, uncertain, unknown, and arbitrary wills of others."[5]
12
+
13
+ John Stuart Mill.
14
+
15
+ John Stuart Mill (1806–1873), in his work, On Liberty, was the first to recognize the difference between liberty as the freedom to act and liberty as the absence of coercion.[6] In his book, Two Concepts of Liberty, Isaiah Berlin formally framed the differences between these two perspectives as the distinction between two opposite concepts of liberty: positive liberty and negative liberty. The latter designates a negative condition in which an individual is protected from tyranny and the arbitrary exercise of authority, while the former refers to the liberty that comes from self-mastery, the freedom from inner compulsions such as weakness and fear.
data/test/fixtures/liberty_article_from_es_wikipedia.yaml ADDED
@@ -0,0 +1,19 @@
1
+ :signature: 8cc2cf890f4632b55899a3184b7b4a96
2
+ :article: |
3
+ La libertad (del latín: libertas, -ātis)1 es la capacidad de la conciencia para pensar y obrar según la propia voluntad de la persona pero en sujeción a un orden o regulación más elevados.
4
+
5
+ Según las acepciones 1, 2, 3 y 4 de este término en el diccionario de la RAE,2 el estado de libertad define la situación, circunstancias o condiciones de quien no es esclavo, ni sujeto, ni impuesto al deseo de otros de forma coercitiva. En otras palabras, aquello que permite a alguien decidir si quiere hacer algo o no, lo hace libre, pero también responsable de sus actos en la medida en que comprenda las consecuencias de ellos. Pues la libertad implica una clara opción por el bien y el mal, solo desde esta opción se estaría actuando desde la concepción de la Teleología[cita requerida].
6
+
7
+ La quinta acepción del término2 define la libertad como la "facultad que se disfruta en las naciones bien gobernadas de hacer y decir cuanto no se oponga a las leyes ni a las buenas costumbres". Con base a ello, la protección de la libertad interpersonal, es objeto de una investigación social y política.
8
+
9
+ El fundamento metafísico de la libertad interior es una cuestión psicológica y filosófica. Ambas formas de la libertad se unen en cada individuo como lo interno y lo externo de una malla de valores, juntos en una dinámica de compromiso.
10
+
11
+ En castellano la palabra libertad proviene del latín libertas, -ātis, de igual significado. La palabra inglesa para libertad, freedom, proviene de una raíz indoeuropea que significa "amar"; la palabra de la misma lengua para decir miedo, afraid, viene de la misma raíz, usado como contraposición a libertad mediante el prefijo a por influencia del latín vulgar.
12
+
13
+ La libertad como desaparición de opresión significa no querer subyugar ni ser subyugado, e implica el fin de un estado de servidumbre. El logro de esta forma de la libertad depende de una combinación de la resistencia del individuo (o grupo) y su entorno.
14
+
15
+ Las leyes artificiales limitan esta forma de libertad, por ejemplo, nadie es libre de no ser representado por políticos dentro de una nación (aunque podamos o no ser libres para intentarlo).
16
+
17
+ Las leyes naturales, como las leyes físicas, o la ley de la gravedad, son también un fundamento importante para la libertad de todos los seres vivos existentes en el universo.
18
+
19
+ Todos los actos presuponen a la libertad para poder ser moralmente imputables (libre albedrío). La libertad se sitúa en la interioridad de la persona y siguiendo esa línea de pensamiento afirma Ricardo Yepes Stork: "Es una de las notas definitorias de la persona. Permite al hombre alcanzar su máxima grandeza pero también su mayor degradación. Es quizás su don más valioso porque empapa y define todo su actuar. El hombre es libre desde lo más profundo de su ser. Por eso los hombres modernos han identificado el ejercicio de la libertad con la realización de la persona: se trata de un derecho y de un ideal al que no podemos ni queremos renunciar. No se concibe que se pueda ser verdaderamente humano sin ser libre de verdad."[cita requerida]
data/test/fixtures/liberty_article_from_fr_wikipedia.yaml ADDED
@@ -0,0 +1,16 @@
1
+ :signature: 728ce35eacc5e3ac743d6931f422caf0
2
+ :article: |
3
+ De façon générale, la liberté est un concept qui désigne la possibilité d'action ou de mouvement.
4
+ En mécanique par exemple, on parle de degrés de liberté pour comptabiliser les mouvements possibles d'une pièce.
5
+
6
+ Pour le sens commun, la liberté s'oppose à la notion d'enfermement ou de séquestration. Une personne qui vient de sortir de prison est dite libre. Le sens originel du mot liberté est d'ailleurs assez proche : l'homme libre est celui qui n'appartient pas à autrui, qui n'a pas le statut d'esclave.
7
+
8
+ En philosophie, en sociologie, en droit et en politique, la liberté est une notion majeure : elle marque l'aptitude des individus à exercer leur volonté avec — selon l'orientation politique des discours tenus — la mise en avant de nuances dont aucune n'épuise le sens intégral :
9
+
10
+ formulation négative : où l'on pointe l'absence de soumission, de servitude, de contrainte, qu'elles soient exercées par d'autres individus (comme pour l'esclavage), ou par la société (c'est-à-dire par la Loi).
11
+
12
+ formulation positive : où l'on affirme l'autonomie et la spontanéité du sujet rationnel ; les comportements humains volontaires se fondent sur la liberté et sont qualifiés de libres.
13
+
14
+ formulation relative : différents adages font ressortir l'équilibre à trouver dans une alternative, visant notamment à rendre la liberté compatible avec des principes de philosophie politique tels que l'égalité et la justice. Ainsi : La « liberté consiste à pouvoir faire tout ce qui ne nuit pas à autrui » (art. 4 de la Déclaration des droits de l'homme), ce qui implique la possibilité de « faire tout ce qui n'est point interdit, comme ne pas faire ce qui n'est point obligatoire » (art. 5), la « liberté de dire ou de faire ce qui n'est pas contraire à l'ordre public ou à la morale publique » (droit administratif) ou encore « La liberté des uns s'arrête là où commence celle des autres » (peut-être inspiré par John Stuart Mill)1. Dans une telle formulation, la liberté est étroitement liée au concept de droit, allant jusqu'à confondre les deux notions.
15
+
16
+ Dans la mesure où ces deux perspectives se recoupent de diverses manières, leur chevauchement peut provoquer des erreurs d'interprétation dans les analyses et la confusion dans les débats. Il faut donc prendre soin de distinguer les différents sens de ce mot.
data/test/fixtures/liberty_article_from_it_wikipedia.yaml ADDED
@@ -0,0 +1,13 @@
1
+ :signature: 4c8484da0faa43e345d956e13a4a61f4
2
+ :article: |
3
+ Per libertà s'intende la condizione per cui un individuo può decidere di pensare, esprimersi ed agire senza costrizioni, ricorrendo alla volontà di ideare e mettere in atto un'azione, mediante una libera scelta dei fini e degli strumenti che ritiene utili a realizzarla.
4
+
5
+ Secondo una concezione non solo kantiana, la libertà è una condizione formale della scelta che, quando si tramuterà in atto, in azione concreta, risentirà necessariamente dei condizionamenti che le vengono dal mondo reale, sottoposto alle leggi fisiche necessitanti, o da situazioni determinanti di altra natura.
6
+
7
+ Riguardo all'ambito in cui si opera la libera scelta si parla di libertà morale, giuridica, economica, politica, di pensiero, libertà metafisica, religiosa ecc.
8
+
9
+ «L'essenza della libertà è sempre consistita nella capacità di scegliere come si vuole scegliere e perché così si vuole, senza costrizioni o intimidazioni, senza che un sistema immenso ci inghiotta; e nel diritto di resistere, di essere impopolare, di schierarti per le tue convinzioni per il solo fatto che sono tue. La vera libertà è questa, e senza di essa non c'è mai libertà, di nessun genere, e nemmeno l'illusione di averla»[1]
10
+
11
+ La mitologia romana, che pure aveva tratto da quella greca molte divinità e miti, ne possedeva alcuni che appartenevano solo ai loro riti come quello della dea Libertà che rappresentava simbolicamente la libertà personale di ognuno e, nel seguito della loro storia civile, il diritto riservato a coloro che godevano della cittadinanza romana.
12
+
13
+ A questa divinità i Romani avevano innalzato due templi, uno nel Foro e l'altro nell'Aventino. La dea veniva raffigurata come una donna, con ai piedi un gatto, recante in una mano uno scettro e nell'altra mano un berretto frigio.[2][3]
data/test/fixtures/liberty_article_from_pt_wikipedia.yaml ADDED
@@ -0,0 +1,21 @@
1
+ :signature: f18022e6b522fa7249ab73498728c275
2
+ :article: |
3
+ Liberdade, em filosofia, pode ser compreendida sob uma perspectiva que denota a ausência de submissão e de servidão. Ou sob outra perspectiva que é a autonomia e a espontaneidade de um sujeito racional.
4
+
5
+ Para o filósofo René Descartes (1596-1650), age com mais liberdade quem melhor compreende as alternativas que precedem a escolha. Dessa premissa, decorre o silogismo lógico de que, quanto mais evidente a veracidade de uma alternativa, maiores as chances de ela ser escolhida pelo agente. Nesse sentido, a inexistência de acesso à informação afigura-se óbice à identificação da alternativa com maior grau de veracidade.
6
+
7
+ Para Baruch Espinoza (1632-1677), a liberdade possui um elemento de identificação com a natureza do "ser". Nesse sentido, ser livre significa agir de acordo com sua natureza.
8
+
9
+ É mediante a liberdade que o Homem se exprime como tal e em sua totalidade. Esta é também, enquanto meta dos seus esforços, a sua própria realização.
10
+
11
+ Tendemos a associar a fruição da liberdade a uma determinação constante e inescapável. Contudo, os ditames de nossa vida estão sendo realizados a cada passo que damos: assim, a deliberação está também a cargo da vontade humana (na qual se inserem as leis físicas e químicas, biológicas e psicológicas).
12
+
13
+ Diretamente associada à ideia de liberdade, está a noção de responsabilidade, vez que o ato de ser livre implica assumir o conjunto dos nossos atos e saber responder por eles.
14
+
15
+ Para Arthur Schopenhauer (1788-1860), a ação humana não é absolutamente livre. Todo o agir humano, bem como todos os fenômenos da natureza, até mesmo suas leis, são níveis de objetivação da coisa-em-si kantiana que o filósofo identifica como sendo puramente Vontade.
16
+
17
+ Para Schopenhauer, o homem é capaz de acessar sua realidade por um duplo registro: o primeiro, o do fenômeno, onde todo o existente reduz-se, nesse nível, a mera representação.
18
+
19
+ No nível essencial, que não deixa-se apreender pela intuição intelectual, pela experiência dos sentidos, o mundo é apreendido imediatamente como vontade, Vontade de Vida. Nesse caso, a noção de vontade assume um aspecto amplo e aberto, transformando-se no princípio motor dos eventos que sucedem-se na dimensão fenomênica segundo a lei da causalidade.
20
+
21
+ O homem, objeto entre objetos, coisa entre coisas, não possui liberdade de ação porque não é livre para deliberar sobre sua vontade. O homem não escolhe o que deseja, o que quer. Logo, não é livre - é absolutamente determinado a agir segundo sua vontade particular, objetivação da vontade metafísica por trás de todos os eventos naturais. O que parece deliberação é uma ilusão ocasionada pela mera consciência sobre os próprios desejos.É poder viver sem ninguém mandar.
data/test/lib/text_profile_signature_unit_test.rb ADDED
@@ -0,0 +1,63 @@
1
+ =begin
2
+ Text Profile Signature calculates a fuzzy hash of textual fields for Deduplication.
3
+ Copyright (C) 2016 Hamed Ramezanian Nik
4
+
5
+ This program is free software: you can redistribute it and/or modify
6
+ it under the terms of the GNU Lesser General Public License as published by
7
+ the Free Software Foundation, either version 3 of the License, or
8
+ (at your option) any later version.
9
+
10
+ This program is distributed in the hope that it will be useful,
11
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
12
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
13
+ GNU Lesser General Public License for more details.
14
+
15
+ You should have received a copy of the GNU Lesser General Public License
16
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
17
+ =end
18
+
19
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..'))
20
+
21
+ require 'test_helper'
22
+
23
+ class TextProfileSignatureGeneratorTest < TextProfileSignatureTest
24
+ def setup
25
+ @text_profile_signature = TextProfileSignature.new
26
+ end
27
+
28
+ def test_en_lang
29
+ page = get_wikipedia_article("en")
30
+ sign = @text_profile_signature.generate_sign(page[:article])
31
+ assert_equal page[:signature], sign
32
+ end
33
+
34
+ def test_de_lang
35
+ page = get_wikipedia_article("de")
36
+ sign = @text_profile_signature.generate_sign(page[:article])
37
+ assert_equal page[:signature], sign
38
+ end
39
+
40
+ def test_es_lang
41
+ page = get_wikipedia_article("es")
42
+ sign = @text_profile_signature.generate_sign(page[:article])
43
+ assert_equal page[:signature], sign
44
+ end
45
+
46
+ def test_fr_lang
47
+ page = get_wikipedia_article("fr")
48
+ sign = @text_profile_signature.generate_sign(page[:article])
49
+ assert_equal page[:signature], sign
50
+ end
51
+
52
+ def test_it_lang
53
+ page = get_wikipedia_article("it")
54
+ sign = @text_profile_signature.generate_sign(page[:article])
55
+ assert_equal page[:signature], sign
56
+ end
57
+
58
+ def test_pt_lang
59
+ page = get_wikipedia_article("pt") # NOTE: the pt fixture's :signature equals the en fixture's and may need regenerating
60
+ sign = @text_profile_signature.generate_sign(page[:article])
61
+ assert_equal page[:signature], sign
62
+ end
63
+ end
data/test/test_helper.rb ADDED
@@ -0,0 +1,29 @@
1
+ =begin
2
+ Text Profile Signature calculates a fuzzy hash of textual fields for Deduplication.
3
+ Copyright (C) 2016 Hamed Ramezanian Nik
4
+
5
+ This program is free software: you can redistribute it and/or modify
6
+ it under the terms of the GNU Lesser General Public License as published by
7
+ the Free Software Foundation, either version 3 of the License, or
8
+ (at your option) any later version.
9
+
10
+ This program is distributed in the hope that it will be useful,
11
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
12
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
13
+ GNU Lesser General Public License for more details.
14
+
15
+ You should have received a copy of the GNU Lesser General Public License
16
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
17
+ =end
18
+
19
+ require File.expand_path('../../lib/text_profile_signature', __FILE__)
20
+
21
+ require 'minitest/autorun'
22
+ require 'yaml'
23
+
24
+ class TextProfileSignatureTest < Minitest::Test
25
+ def get_wikipedia_article(lang)
26
+ path = File.expand_path("../fixtures/liberty_article_from_#{lang}_wikipedia.yaml", __FILE__)
27
+ YAML.load_file(path)
28
+ end
29
+ end
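The fixtures use Symbol keys (`:signature:`, `:article:`). On Rubies where `YAML.load_file` loads safely by default (Psych 4, which to the best of my knowledge ships with Ruby 3.1 and later), Symbols are likely to be rejected. Below is a minimal sketch of a loader that permits them explicitly. It assumes Ruby 2.6 or newer for the `permitted_classes:` keyword, and `load_fixture` is a hypothetical name, not something the gem defines.

```ruby
require 'yaml'

# Hypothetical replacement for the YAML.load_file call in get_wikipedia_article.
# Symbol is permitted explicitly because the fixture keys are Symbols.
def load_fixture(path)
  YAML.safe_load(File.read(path), permitted_classes: [Symbol])
end
```

Permitting only `Symbol` keeps the loader restrictive while still returning a hash that can be read as `page[:article]` and `page[:signature]`.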
data/text_profile_signature.gemspec ADDED
@@ -0,0 +1,24 @@
1
+ # -*- encoding: utf-8 -*-
2
+ require File.expand_path('../lib/text_profile_signature', __FILE__)
3
+
4
+ Gem::Specification.new do |gem|
5
+ gem.name = "text_profile_signature"
6
+ gem.version = TextProfileSignature::VERSION
7
+ gem.platform = Gem::Platform::RUBY
8
+ gem.authors = ["Hamed Ramezanian Nik"]
9
+ gem.email = ["hamed.r.nik@gmail.com"]
10
+ gem.summary = "A fuzzy hash of text generator for Deduplication"
11
+ gem.description = "A fuzzy hash of text generator for Deduplication."
12
+ gem.homepage = "https://github.com/iCEAGE/text_profile_signature"
13
+ gem.license = "LGPL-3.0"
14
+
15
+ gem.files = `git ls-files | grep -Ev '^(myapp|examples)'`.split("\n")
16
+ gem.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
17
+ gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
18
+ gem.require_paths = ["lib"]
19
+
20
+ gem.add_runtime_dependency 'unicode', '~> 0.4.4.2'
21
+
22
+ gem.add_development_dependency 'rake', '~> 11.1', '>= 11.1.2'
23
+ gem.add_development_dependency 'minitest', '~> 5.9'
24
+ end
metadata ADDED
@@ -0,0 +1,115 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: text_profile_signature
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: ruby
6
+ authors:
7
+ - Hamed Ramezanian Nik
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2016-06-04 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: unicode
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: 0.4.4.2
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: 0.4.4.2
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '11.1'
34
+ - - ">="
35
+ - !ruby/object:Gem::Version
36
+ version: 11.1.2
37
+ type: :development
38
+ prerelease: false
39
+ version_requirements: !ruby/object:Gem::Requirement
40
+ requirements:
41
+ - - "~>"
42
+ - !ruby/object:Gem::Version
43
+ version: '11.1'
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: 11.1.2
47
+ - !ruby/object:Gem::Dependency
48
+ name: minitest
49
+ requirement: !ruby/object:Gem::Requirement
50
+ requirements:
51
+ - - "~>"
52
+ - !ruby/object:Gem::Version
53
+ version: '5.9'
54
+ type: :development
55
+ prerelease: false
56
+ version_requirements: !ruby/object:Gem::Requirement
57
+ requirements:
58
+ - - "~>"
59
+ - !ruby/object:Gem::Version
60
+ version: '5.9'
61
+ description: A fuzzy hash of text generator for Deduplication.
62
+ email:
63
+ - hamed.r.nik@gmail.com
64
+ executables: []
65
+ extensions: []
66
+ extra_rdoc_files: []
67
+ files:
68
+ - ".gitignore"
69
+ - Gemfile
70
+ - LICENSE
71
+ - README.md
72
+ - Rakefile
73
+ - lib/text_profile_signature.rb
74
+ - test/fixtures/liberty_article_from_de_wikipedia.yaml
75
+ - test/fixtures/liberty_article_from_en_wikipedia.yaml
76
+ - test/fixtures/liberty_article_from_es_wikipedia.yaml
77
+ - test/fixtures/liberty_article_from_fr_wikipedia.yaml
78
+ - test/fixtures/liberty_article_from_it_wikipedia.yaml
79
+ - test/fixtures/liberty_article_from_pt_wikipedia.yaml
80
+ - test/lib/text_profile_signature_unit_test.rb
81
+ - test/test_helper.rb
82
+ - text_profile_signature.gemspec
83
+ homepage: https://github.com/iCEAGE/text_profile_signature
84
+ licenses:
85
+ - LGPL-3.0
86
+ metadata: {}
87
+ post_install_message:
88
+ rdoc_options: []
89
+ require_paths:
90
+ - lib
91
+ required_ruby_version: !ruby/object:Gem::Requirement
92
+ requirements:
93
+ - - ">="
94
+ - !ruby/object:Gem::Version
95
+ version: '0'
96
+ required_rubygems_version: !ruby/object:Gem::Requirement
97
+ requirements:
98
+ - - ">="
99
+ - !ruby/object:Gem::Version
100
+ version: '0'
101
+ requirements: []
102
+ rubyforge_project:
103
+ rubygems_version: 2.6.4
104
+ signing_key:
105
+ specification_version: 4
106
+ summary: A fuzzy hash of text generator for Deduplication
107
+ test_files:
108
+ - test/fixtures/liberty_article_from_de_wikipedia.yaml
109
+ - test/fixtures/liberty_article_from_en_wikipedia.yaml
110
+ - test/fixtures/liberty_article_from_es_wikipedia.yaml
111
+ - test/fixtures/liberty_article_from_fr_wikipedia.yaml
112
+ - test/fixtures/liberty_article_from_it_wikipedia.yaml
113
+ - test/fixtures/liberty_article_from_pt_wikipedia.yaml
114
+ - test/lib/text_profile_signature_unit_test.rb
115
+ - test/test_helper.rb