rika-stevedore 1.1.4-java

Files changed (167)
  1. checksums.yaml +7 -0
  2. data/.gitignore +21 -0
  3. data/.rspec +2 -0
  4. data/.travis.yml +7 -0
  5. data/Gemfile +4 -0
  6. data/LICENSE.txt +22 -0
  7. data/README.md +92 -0
  8. data/Rakefile +11 -0
  9. data/lib/rika/version.rb +3 -0
  10. data/lib/rika.rb +129 -0
  11. data/pom.xml +20 -0
  12. data/rika-stevedore.gemspec +21 -0
  13. data/spec/fixtures/de.txt +1 -0
  14. data/spec/fixtures/document.doc +0 -0
  15. data/spec/fixtures/document.docx +0 -0
  16. data/spec/fixtures/document.pdf +0 -0
  17. data/spec/fixtures/en.txt +1 -0
  18. data/spec/fixtures/es.txt +1 -0
  19. data/spec/fixtures/fr.txt +1 -0
  20. data/spec/fixtures/image.jpg +0 -0
  21. data/spec/fixtures/lang_cant_be_determined.txt +1 -0
  22. data/spec/fixtures/over_100k_file.txt +1241 -0
  23. data/spec/fixtures/ru.txt +1 -0
  24. data/spec/fixtures/text_file.txt +1 -0
  25. data/spec/fixtures/text_file_without_extension +1 -0
  26. data/spec/fixtures/unknown.bin +0 -0
  27. data/spec/rika_spec.rb +203 -0
  28. data/spec/spec_helper.rb +14 -0
  29. data/target/dependency/aopalliance-1.0.jar +0 -0
  30. data/target/dependency/apache-mime4j-core-0.7.2.jar +0 -0
  31. data/target/dependency/apache-mime4j-dom-0.7.2.jar +0 -0
  32. data/target/dependency/asm-5.0.4.jar +0 -0
  33. data/target/dependency/bcmail-jdk15on-1.54.jar +0 -0
  34. data/target/dependency/bcpkix-jdk15on-1.54.jar +0 -0
  35. data/target/dependency/bcprov-jdk15on-1.54.jar +0 -0
  36. data/target/dependency/bndlib-1.43.0.jar +0 -0
  37. data/target/dependency/boilerpipe-1.1.0.jar +0 -0
  38. data/target/dependency/bzip2-0.9.1.jar +0 -0
  39. data/target/dependency/c3p0-0.9.1.1.jar +0 -0
  40. data/target/dependency/cdm-4.5.5.jar +0 -0
  41. data/target/dependency/cleartk-util-2.0.0.jar +0 -0
  42. data/target/dependency/commons-codec-1.10.jar +0 -0
  43. data/target/dependency/commons-collections4-4.1.jar +0 -0
  44. data/target/dependency/commons-compress-1.12.jar +0 -0
  45. data/target/dependency/commons-csv-1.0.jar +0 -0
  46. data/target/dependency/commons-exec-1.3.jar +0 -0
  47. data/target/dependency/commons-io-2.5.jar +0 -0
  48. data/target/dependency/commons-lang-2.6.jar +0 -0
  49. data/target/dependency/commons-logging-1.1.3.jar +0 -0
  50. data/target/dependency/commons-logging-api-1.1.jar +0 -0
  51. data/target/dependency/commons-vfs2-2.0.jar +0 -0
  52. data/target/dependency/ctakes-core-3.2.2.jar +0 -0
  53. data/target/dependency/ctakes-core-res-3.2.2.jar +0 -0
  54. data/target/dependency/ctakes-type-system-3.2.2.jar +0 -0
  55. data/target/dependency/ctakes-utils-3.2.2.jar +0 -0
  56. data/target/dependency/curvesapi-1.04.jar +0 -0
  57. data/target/dependency/cxf-core-3.0.3.jar +0 -0
  58. data/target/dependency/cxf-rt-frontend-jaxrs-3.0.3.jar +0 -0
  59. data/target/dependency/cxf-rt-rs-client-3.0.3.jar +0 -0
  60. data/target/dependency/cxf-rt-transports-http-3.0.3.jar +0 -0
  61. data/target/dependency/ehcache-core-2.6.2.jar +0 -0
  62. data/target/dependency/findstructapi-0.0.1.jar +0 -0
  63. data/target/dependency/fontbox-2.0.3.jar +0 -0
  64. data/target/dependency/geoapi-3.0.0.jar +0 -0
  65. data/target/dependency/grib-4.5.5.jar +0 -0
  66. data/target/dependency/gson-2.2.4.jar +0 -0
  67. data/target/dependency/guava-17.0.jar +0 -0
  68. data/target/dependency/hamcrest-core-1.3.jar +0 -0
  69. data/target/dependency/httpclient-4.5.2.jar +0 -0
  70. data/target/dependency/httpcore-4.4.4.jar +0 -0
  71. data/target/dependency/httpmime-4.5.2.jar +0 -0
  72. data/target/dependency/httpservices-4.5.5.jar +0 -0
  73. data/target/dependency/isoparser-1.1.18.jar +0 -0
  74. data/target/dependency/jVinci-2.4.0.jar +0 -0
  75. data/target/dependency/jackcess-2.1.4.jar +0 -0
  76. data/target/dependency/jackcess-encrypt-2.1.1.jar +0 -0
  77. data/target/dependency/jackson-core-2.8.1.jar +0 -0
  78. data/target/dependency/jai-imageio-core-1.3.1.jar +0 -0
  79. data/target/dependency/jakarta-regexp-1.4.jar +0 -0
  80. data/target/dependency/java-libpst-0.8.1.jar +0 -0
  81. data/target/dependency/javax.annotation-api-1.2.jar +0 -0
  82. data/target/dependency/javax.ws.rs-api-2.0.1.jar +0 -0
  83. data/target/dependency/jcip-annotations-1.0.jar +0 -0
  84. data/target/dependency/jcommander-1.35.jar +0 -0
  85. data/target/dependency/jdom-1.0.jar +0 -0
  86. data/target/dependency/jdom2-2.0.4.jar +0 -0
  87. data/target/dependency/jempbox-1.8.12.jar +0 -0
  88. data/target/dependency/jhighlight-1.0.2.jar +0 -0
  89. data/target/dependency/jj2000-5.2.jar +0 -0
  90. data/target/dependency/jmatio-1.2.jar +0 -0
  91. data/target/dependency/jna-4.1.0.jar +0 -0
  92. data/target/dependency/joda-time-2.2.jar +0 -0
  93. data/target/dependency/json-20140107.jar +0 -0
  94. data/target/dependency/json-simple-1.1.1.jar +0 -0
  95. data/target/dependency/jsoup-1.7.2.jar +0 -0
  96. data/target/dependency/jsr-275-0.9.3.jar +0 -0
  97. data/target/dependency/junit-4.11.jar +0 -0
  98. data/target/dependency/juniversalchardet-1.0.3.jar +0 -0
  99. data/target/dependency/junrar-0.7.jar +0 -0
  100. data/target/dependency/jwnl-1.3.3.jar +0 -0
  101. data/target/dependency/libsvm-3.1.jar +0 -0
  102. data/target/dependency/lucene-analyzers-common-4.0.0.jar +0 -0
  103. data/target/dependency/lucene-core-4.0.0.jar +0 -0
  104. data/target/dependency/lucene-queries-4.0.0.jar +0 -0
  105. data/target/dependency/lucene-queryparser-4.0.0.jar +0 -0
  106. data/target/dependency/lucene-sandbox-4.0.0.jar +0 -0
  107. data/target/dependency/maven-scm-api-1.4.jar +0 -0
  108. data/target/dependency/maven-scm-provider-svn-commons-1.4.jar +0 -0
  109. data/target/dependency/maven-scm-provider-svnexe-1.4.jar +0 -0
  110. data/target/dependency/metadata-extractor-2.8.1.jar +0 -0
  111. data/target/dependency/mockito-core-1.7.jar +0 -0
  112. data/target/dependency/netcdf4-4.5.5.jar +0 -0
  113. data/target/dependency/objenesis-1.0.jar +0 -0
  114. data/target/dependency/openaifsm-0.0.1.jar +0 -0
  115. data/target/dependency/opennlp-maxent-3.0.3.jar +0 -0
  116. data/target/dependency/opennlp-tools-1.5.3.jar +0 -0
  117. data/target/dependency/org.apache.felix.scr.annotations-1.6.0.jar +0 -0
  118. data/target/dependency/org.osgi.compendium-4.0.0.jar +0 -0
  119. data/target/dependency/org.osgi.core-4.0.0.jar +0 -0
  120. data/target/dependency/pdfbox-2.0.3.jar +0 -0
  121. data/target/dependency/pdfbox-debugger-2.0.3.jar +0 -0
  122. data/target/dependency/pdfbox-tools-2.0.3.jar +0 -0
  123. data/target/dependency/plexus-utils-1.5.6.jar +0 -0
  124. data/target/dependency/poi-3.15.jar +0 -0
  125. data/target/dependency/poi-ooxml-3.15.jar +0 -0
  126. data/target/dependency/poi-ooxml-schemas-3.15.jar +0 -0
  127. data/target/dependency/poi-scratchpad-3.15.jar +0 -0
  128. data/target/dependency/protobuf-java-2.5.0.jar +0 -0
  129. data/target/dependency/quartz-2.2.0.jar +0 -0
  130. data/target/dependency/regexp-1.3.jar +0 -0
  131. data/target/dependency/rome-1.5.1.jar +0 -0
  132. data/target/dependency/rome-utils-1.5.1.jar +0 -0
  133. data/target/dependency/sis-metadata-0.6.jar +0 -0
  134. data/target/dependency/sis-netcdf-0.6.jar +0 -0
  135. data/target/dependency/sis-referencing-0.6.jar +0 -0
  136. data/target/dependency/sis-storage-0.6.jar +0 -0
  137. data/target/dependency/sis-utility-0.6.jar +0 -0
  138. data/target/dependency/slf4j-api-1.7.12.jar +0 -0
  139. data/target/dependency/slf4j-log4j12-1.7.12.jar +0 -0
  140. data/target/dependency/spring-aop-3.1.2.RELEASE.jar +0 -0
  141. data/target/dependency/spring-asm-3.1.2.RELEASE.jar +0 -0
  142. data/target/dependency/spring-beans-3.1.2.RELEASE.jar +0 -0
  143. data/target/dependency/spring-context-3.1.2.RELEASE.jar +0 -0
  144. data/target/dependency/spring-core-3.1.2.RELEASE.jar +0 -0
  145. data/target/dependency/spring-expression-3.1.2.RELEASE.jar +0 -0
  146. data/target/dependency/sqlite-jdbc-3.8.11.2.jar +0 -0
  147. data/target/dependency/sqlwrapper-0.0.1.jar +0 -0
  148. data/target/dependency/stax2-api-3.1.4.jar +0 -0
  149. data/target/dependency/tagsoup-1.2.1.jar +0 -0
  150. data/target/dependency/tika-core-1.14.jar +0 -0
  151. data/target/dependency/tika-parsers-1.14.jar +0 -0
  152. data/target/dependency/udunits-4.5.5.jar +0 -0
  153. data/target/dependency/uimafit-core-2.1.0.jar +0 -0
  154. data/target/dependency/uimaj-adapter-vinci-2.4.0.jar +0 -0
  155. data/target/dependency/uimaj-core-2.4.0.jar +0 -0
  156. data/target/dependency/uimaj-cpe-2.4.0.jar +0 -0
  157. data/target/dependency/uimaj-document-annotation-2.4.0.jar +0 -0
  158. data/target/dependency/uimaj-examples-2.4.0.jar +0 -0
  159. data/target/dependency/uimaj-tools-2.6.0.jar +0 -0
  160. data/target/dependency/vorbis-java-core-0.8.jar +0 -0
  161. data/target/dependency/vorbis-java-tika-0.8.jar +0 -0
  162. data/target/dependency/woodstox-core-asl-4.4.1.jar +0 -0
  163. data/target/dependency/xmlbeans-2.6.0.jar +0 -0
  164. data/target/dependency/xmlschema-core-2.1.0.jar +0 -0
  165. data/target/dependency/xmpcore-5.1.2.jar +0 -0
  166. data/target/dependency/xz-1.5.jar +0 -0
  167. metadata +254 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: d5fd2d1d229514de3b11e87a02c06248fed2965a
4
+ data.tar.gz: 27a040889f25d447535a65af1240c9dc73b219b0
5
+ SHA512:
6
+ metadata.gz: 3972d118579b7444160479976ad9eaa10a4f8ec07621486c568c856966a694191e4f2812431a8501e3f99a0aea53b52219cc144a721408c8785fbe342c65bc52
7
+ data.tar.gz: f302bccd1252ca66fdb7343e6783900a2a7b3c94731d34c521dc0a660d9988071324615a221209770366d19900e3cfa1f9409dea77784a31c95e0453f7380dfc
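The digests in checksums.yaml can be checked against an unpacked `.gem` archive. The following is a minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been extracted into one directory; `verify_gem_checksums` is a hypothetical helper name, not part of RubyGems or of this gem.

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'yaml'

# Hypothetical helper (an assumption for illustration, not a RubyGems API).
# checksums_yaml: the text of checksums.yaml
# dir:            directory containing metadata.gz and data.tar.gz
# Returns a nested hash: algorithm => { file name => true/false }.
def verify_gem_checksums(checksums_yaml, dir)
  digests = { 'SHA1' => Digest::SHA1, 'SHA512' => Digest::SHA512 }
  expected = YAML.safe_load(checksums_yaml)

  expected.each_with_object({}) do |(algorithm, files), results|
    digest_class = digests.fetch(algorithm) # KeyError on an unknown algorithm
    results[algorithm] = files.each_with_object({}) do |(name, hex), per_file|
      actual = digest_class.file(File.join(dir, name)).hexdigest
      per_file[name] = (actual == hex.to_s)
    end
  end
end
```

A `false` entry means the file on disk does not match the digest recorded for it.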
data/.gitignore ADDED
@@ -0,0 +1,21 @@
1
+ *.gem
2
+ *.rbc
3
+ .bundle
4
+ .config
5
+ .yardoc
6
+ Gemfile.lock
7
+ InstalledFiles
8
+ _yardoc
9
+ coverage
10
+ doc/
11
+ lib/bundler/man
12
+ pkg
13
+ rdoc
14
+ spec/reports
15
+ test/tmp
16
+ test/version_tmp
17
+ tmp
18
+
19
+ .DS_Store
20
+ projectFilesBackup/
21
+ .idea/
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --color
2
+ --format progress
data/.travis.yml ADDED
@@ -0,0 +1,7 @@
1
+ language: ruby
2
+ rvm:
3
+ - jruby-19mode
4
+ - jruby-head
5
+ notifications:
6
+ recipients:
7
+ - ricny046@gmail.com
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in rika.gemspec
4
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2013 Richard Nyström
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,92 @@
1
+ # Rika
2
+
3
+ A JRuby wrapper for Apache Tika to extract text and metadata from various file formats.
4
+
5
+ More information about Apache Tika can be found here: http://tika.apache.org/
6
+
7
+ [![Code Climate](https://codeclimate.com/github/ricn/rika.png)](https://codeclimate.com/github/ricn/rika)
8
+ [![Build Status](https://travis-ci.org/ricn/rika.png?branch=master)](https://travis-ci.org/ricn/rika)
9
+
10
+ ## Jeremy's modifications
11
+ Basically, this just uses my own version of Tika, which has special email parsing fixes, adds an X-Attachments metadata key (listing attachment filenames from emails), and removes bouncycastle from tika-parsers' requirements because everything is awful.
12
+
13
+
14
+ ## Installation
15
+
16
+ Add this line to your application's Gemfile:
17
+
18
+ gem 'rika'
19
+
20
+ Remember that this gem only works on JRuby.
21
+
22
+ And then execute:
23
+
24
+ $ bundle
25
+
26
+ Or install it yourself as:
27
+
28
+ $ gem install rika
29
+
30
+ ## Usage
31
+
32
+ For a quick start with the simplest use cases, the following functions
33
+ each return what you need in a single call:
34
+
35
+ ```ruby
36
+ require 'rika'
37
+
38
+ content = Rika.parse_content('document.pdf') # string containing all content text
39
+ metadata = Rika.parse_metadata('document.pdf') # hash containing the document metadata
40
+ content, metadata = Rika.parse_content_and_metadata('document.pdf') # both of the above
41
+ ```
42
+
43
+ For other use cases and finer control, you can work directly with the Rika::Parser object:
44
+
45
+ ```ruby
46
+ require 'rika'
47
+
48
+ parser = Rika::Parser.new('document.pdf')
49
+
50
+ # Return the content of the document:
51
+ parser.content
52
+
53
+ # Return the media type for the document:
54
+ parser.media_type
55
+ => "application/pdf"
56
+
57
+ # Return the metadata field title if it exists:
58
+ parser.metadata["title"] if parser.metadata_exists?("title")
59
+
60
+ # Return all the available metadata keys that can be read from the document
61
+ parser.available_metadata
62
+
63
+ # Return only the first 10000 chars of the content:
64
+ parser = Rika::Parser.new('document.pdf', 10000)
65
+ parser.content # first 10000 chars returned
66
+
67
+ # Return content from URL
68
+ parser = Rika::Parser.new('http://riakhandbook.com/sample.pdf', 200)
69
+ parser.content
70
+
71
+ # Return the language for the content
72
+ parser = Rika::Parser.new('german document.pdf')
73
+ parser.language
74
+ => "de"
75
+
76
+ # Check whether the language identification is certain enough to be trusted
77
+ parser.language_is_reasonably_certain?
78
+
79
+ ```
80
+
81
+ ## Credits
82
+ The following people have contributed ideas, documentation, or code to Rika:
83
+ * Keith Bennett
84
+ * Richard Nyström
85
+
86
+ ## Contributing
87
+
88
+ 1. Fork it
89
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
90
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
91
+ 4. Push to the branch (`git push origin my-new-feature`)
92
+ 5. Create new Pull Request
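The README mentions an X-Attachments metadata key that lists attachment filenames from emails. The following is a minimal sketch of reading it from a metadata hash of the kind `Rika.parse_metadata` is documented to return. The helper name `attachment_filenames` is hypothetical, and the separator is an assumption: this diff does not show how the filenames are joined in the value, so it is left as a parameter.

```ruby
# Hypothetical helper, not part of the gem.
# metadata:  a Hash of metadata names to string values.
# separator: ASSUMPTION. The diff does not show how filenames are delimited
#            inside the X-Attachments value, so pass whatever the data uses.
def attachment_filenames(metadata, separator: '|')
  raw = metadata['X-Attachments']
  return [] if raw.nil? || raw.strip.empty?

  raw.split(separator).map(&:strip).reject(&:empty?)
end
```

Documents without attachments, or formats where the key is never set, yield an empty array rather than `nil`, which keeps calling code free of nil checks.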
data/Rakefile ADDED
@@ -0,0 +1,11 @@
1
+ require "bundler/gem_tasks"
2
+ require 'rspec/core/rake_task'
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
7
+
8
+ desc 'Download jars'
9
+ task :download_jars do
10
+ system "mvn dependency:copy-dependencies"
11
+ end
data/lib/rika/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module Rika
2
+ VERSION = "1.1.4"
3
+ end
data/lib/rika.rb ADDED
@@ -0,0 +1,129 @@
1
+ # encoding: utf-8
2
+
3
+ raise "You need to run JRuby to use Rika" unless RUBY_PLATFORM =~ /java/
4
+
5
+ require "rika/version"
6
+ require 'uri'
7
+ require 'net/http'
8
+ require 'java'
9
+
10
+ Dir[File.join(File.dirname(__FILE__), "../target/dependency/*.jar")].each do |jar|
11
+ require jar
12
+ end
13
+
14
+ $rika_tika ||= nil
15
+
16
+ # Heavily based on the Apache Tika API: http://tika.apache.org/1.5/api/org/apache/tika/Tika.html
17
+ module Rika
18
+ import org.apache.tika.metadata.Metadata
19
+ import org.apache.tika.Tika
20
+ import org.apache.tika.language.LanguageIdentifier
21
+ import org.apache.tika.detect.DefaultDetector
22
+ import java.io.FileInputStream
23
+ import java.net.URL
24
+
25
+ def self.parse_content_and_metadata(file_location, max_content_length = -1)
26
+ parser = Parser.new(file_location, max_content_length)
27
+ [parser.content, parser.metadata]
28
+ end
29
+
30
+ def self.parse_content(file_location, max_content_length = -1)
31
+ parser = Parser.new(file_location, max_content_length)
32
+ parser.content
33
+ end
34
+
35
+ def self.parse_metadata(file_location)
36
+ parser = Parser.new(file_location, 0)
37
+ parser.metadata
38
+ end
39
+
40
+ class Parser
41
+
42
+ def initialize(file_location, max_content_length = -1, detector = DefaultDetector.new)
43
+ @uri = file_location
44
+ $rika_tika = @tika = if $rika_tika.nil?
45
+ puts "creating a new Tika"
46
+ Tika.new(detector)
47
+ else
48
+ $rika_tika
49
+ end
50
+ @tika.set_max_string_length(max_content_length)
51
+ @metadata_java = Metadata.new
52
+ @metadata_ruby = nil
53
+ @input_type = get_input_type
54
+ end
55
+
56
+ def content
57
+ self.parse
58
+ @content
59
+ end
60
+
61
+ def metadata
62
+ unless @metadata_ruby
63
+ self.parse
64
+ @metadata_ruby = {}
65
+
66
+ @metadata_java.names.each do |name|
67
+ @metadata_ruby[name] = @metadata_java.get(name)
68
+ end
69
+ end
70
+ @metadata_ruby
71
+ end
72
+
73
+ def media_type
74
+ if file?
75
+ @media_type ||= @tika.detect(java.io.File.new(@uri))
76
+ else
77
+ @media_type ||= @tika.detect(input_stream)
78
+ end
79
+ end
80
+
81
+ def available_metadata
82
+ metadata.keys
83
+ end
84
+
85
+ def metadata_exists?(name)
86
+ metadata[name] != nil
87
+ end
88
+
89
+ def file?
90
+ @input_type == :file
91
+ end
92
+
93
+ def language
94
+ @lang ||= LanguageIdentifier.new(content)
95
+
96
+ @lang.language
97
+ end
98
+
99
+ def language_is_reasonably_certain?
100
+ @lang ||= LanguageIdentifier.new(content)
101
+
102
+ @lang.is_reasonably_certain
103
+ end
104
+
105
+ protected
106
+
107
+ def parse
108
+ @content ||= @tika.parse_to_string(input_stream, @metadata_java).to_s.strip
109
+ end
110
+
111
+ def get_input_type
112
+ if File.exist?(@uri) && !File.directory?(@uri)
113
+ :file
114
+ elsif URI(@uri).scheme.to_s.match(%r{\Ahttps?\z})
115
+ :http
116
+ else
117
+ raise IOError, "Input (#{@uri}) is neither file nor http."
118
+ end
119
+ end
120
+
121
+ def input_stream
122
+ if file?
123
+ FileInputStream.new(java.io.File.new(@uri))
124
+ else # :http
125
+ URL.new(@uri).open_stream
126
+ end
127
+ end
128
+ end
129
+ end
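The file-versus-URL decision made in `get_input_type` can be tried outside JRuby. The following is a standalone sketch, assuming plain Ruby with only the standard library; `classify_input` is a hypothetical name, not a method of the gem. Anchoring the scheme match and turning an unparseable URI into the same `IOError` are choices made for this sketch.

```ruby
require 'uri'

# Hypothetical standalone version of the input classification.
# Returns :file for an existing non-directory path, :http for an http(s) URL,
# and raises IOError for anything else.
def classify_input(location)
  return :file if File.exist?(location) && !File.directory?(location)

  scheme = begin
    URI(location).scheme.to_s
  rescue URI::InvalidURIError
    '' # e.g. a nonexistent path that contains spaces
  end

  # Anchored so that only exactly "http" or "https" qualifies.
  return :http if scheme.match?(/\Ahttps?\z/i)

  raise IOError, "Input (#{location}) is neither file nor http."
end
```

Checking the filesystem first means a local file always wins, even if its name happens to look like a URL.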
data/pom.xml ADDED
@@ -0,0 +1,20 @@
1
+ <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
3
+ <modelVersion>4.0.0</modelVersion>
4
+
5
+ <name>Rika</name>
6
+
7
+ <groupId>org.rika</groupId>
8
+ <artifactId>Rika</artifactId>
9
+ <version>1.0-SNAPSHOT</version>
10
+ <packaging>jar</packaging>
11
+
12
+ <dependencies>
13
+ <dependency>
14
+ <groupId>org.apache.tika</groupId>
15
+ <artifactId>tika-parsers</artifactId>
16
+ <version>1.9</version>
17
+ <scope>test</scope>
18
+ </dependency>
19
+ </dependencies>
20
+ </project>
data/rika-stevedore.gemspec ADDED
@@ -0,0 +1,21 @@
1
+ # -*- encoding: utf-8 -*-
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'rika/version'
5
+
6
+ Gem::Specification.new do |gem|
7
+ gem.name = "rika-stevedore"
8
+ gem.version = Rika::VERSION
9
+ gem.authors = ["Richard Nyström", "Jeremy B. Merrill"]
10
+ gem.email = ["jeremybmerrill@gmail.com"]
11
+ gem.description = %q{ A JRuby wrapper for Apache Tika to extract text and metadata from various file formats, slightly modified. }
12
+ gem.summary = %q{ A JRuby wrapper for Apache Tika to extract text and metadata from various file formats, slightly modified. }
13
+ gem.homepage = "https://github.com/jeremybmerrill/rika"
14
+ gem.files = `git ls-files`.split($/)
15
+ gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
16
+ gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
17
+ gem.require_paths = ["lib"]
18
+ gem.add_development_dependency "rspec", "2.14.1"
19
+ gem.add_development_dependency "rake", "10.3.1"
20
+ gem.platform = "java"
21
+ end
data/spec/fixtures/de.txt ADDED
@@ -0,0 +1 @@
1
+ Er hörte leise Schritte hinter sich. Das bedeutete nichts Gutes. Wer würde ihm schon folgen, spät in der Nacht und dazu noch in dieser engen Gasse mitten im übel beleumundeten Hafenviertel? Gerade jetzt, wo er das Ding seines Lebens gedreht hatte und mit der Beute verschwinden wollte! Hatte einer seiner zahllosen Kollegen dieselbe Idee gehabt, ihn beobachtet und abgewartet, um ihn nun um die Früchte seiner Arbeit zu erleichtern? Oder gehörten die Schritte hinter ihm zu einem der unzähligen Gesetzeshüter dieser Stadt, und die stählerne Acht um seine Handgelenke würde gleich zuschnappen? Er konnte die Aufforderung stehen zu bleiben schon hören. Gehetzt sah er sich um. Plötzlich erblickte er den schmalen Durchgang. Blitzartig drehte er sich nach rechts und verschwand zwischen den beiden Gebäuden. Beinahe wäre er dabei über den umgestürzten Mülleimer gefallen, der mitten im Weg lag. Er versuchte, sich in der Dunkelheit seinen Weg zu ertasten und erstarrte: Anscheinend gab es keinen anderen Ausweg aus diesem kleinen Hof als den Durchgang, durch den er gekommen war. Die Schritte wurden lauter und lauter, er sah eine dunkle Gestalt um die Ecke biegen. Fieberhaft irrten seine Augen durch die nächtliche Dunkelheit und suchten einen Ausweg. War jetzt wirklich alles vorbei,
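data/spec/fixtures/document.doc ADDED
Binary file
data/spec/fixtures/document.docx ADDED
Binary file
data/spec/fixtures/document.pdf ADDED
Binary file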
data/spec/fixtures/en.txt ADDED
@@ -0,0 +1 @@
1
+ Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts. Separated they live in Bookmarksgrove right at the coast of the Semantics, a large language ocean. A small river named Duden flows by their place and supplies it with the necessary regelialia. It is a paradisematic country, in which roasted parts of sentences fly into your mouth. Even the all-powerful Pointing has no control about the blind texts it is an almost unorthographic life One day however a small line of blind text by the name of Lorem Ipsum decided to leave for the far World of Grammar. The Big Oxmox advised her not to do so, because there were thousands of bad Commas, wild Question Marks and devious Semikoli, but the Little Blind Text didn’t listen. She packed her seven versalia, put her initial into the belt and made herself on the way. When she reached the first hills of the Italic Mountains, she had a last view back on the skyline of her hometown Bookmarksgrove, the headline of Alphabet Village and the subline of her own road, the Line Lane. Pityful a rethoric question ran over her cheek, then
data/spec/fixtures/es.txt ADDED
@@ -0,0 +1 @@
1
+ Una mañana, tras un sueño intranquilo, Gregorio Samsa se despertó convertido en un monstruoso insecto. Estaba echado de espaldas sobre un duro caparazón y, al alzar la cabeza, vio su vientre convexo y oscuro, surcado por curvadas callosidades, sobre el que casi no se aguantaba la colcha, que estaba a punto de escurrirse hasta el suelo. Numerosas patas, penosamente delgadas en comparación con el grosor normal de sus piernas, se agitaban sin concierto. - ¿Qué me ha ocurrido? No estaba soñando. Su habitación, una habitación normal, aunque muy pequeña, tenía el aspecto habitual. Sobre la mesa había desparramado un muestrario de paños - Samsa era viajante de comercio-, y de la pared colgaba una estampa recientemente recortada de una revista ilustrada y puesta en un marco dorado. La estampa mostraba a una mujer tocada con un gorro de pieles, envuelta en una estola también de pieles, y que, muy erguida, esgrimía un amplio manguito, asimismo de piel, que ocultaba todo su antebrazo. Gregorio miró hacia la ventana; estaba nublado, y sobre el cinc del alféizar repiqueteaban las gotas de lluvia, lo que le hizo sentir una gran melancolía. «Bueno -pensó-; ¿y si siguiese durmiendo un rato y me olvidase de
data/spec/fixtures/fr.txt ADDED
@@ -0,0 +1 @@
1
+ En se réveillant un matin après des rêves agités, Gregor Samsa se retrouva, dans son lit, métamorphosé en un monstrueux insecte. Il était sur le dos, un dos aussi dur qu’une carapace, et, en relevant un peu la tête, il vit, bombé, brun, cloisonné par des arceaux plus rigides, son abdomen sur le haut duquel la couverture, prête à glisser tout à fait, ne tenait plus qu’à peine. Ses nombreuses pattes, lamentablement grêles par comparaison avec la corpulence qu’il avait par ailleurs, grouillaient désespérément sous ses yeux.« Qu’est-ce qui m’est arrivé ? » pensa-t-il. Ce n’était pas un rêve. Sa chambre, une vraie chambre humaine, juste un peu trop petite, était là tranquille entre les quatre murs qu’il connaissait bien. Au-dessus de la table où était déballée une collection d’échantillons de tissus - Samsa était représentant de commerce - on voyait accrochée l’image qu’il avait récemment découpée dans un magazine et mise dans un joli cadre doré. Elle représentait une dame munie d’une toque et d’un boa tous les deux en fourrure et qui, assise bien droite, tendait vers le spectateur un lourd manchon de fourrure où tout son avant-bras avait disparu. Le regard de Gregor se tourna ensuite vers
data/spec/fixtures/image.jpg ADDED
Binary file
data/spec/fixtures/lang_cant_be_determined.txt ADDED
@@ -0,0 +1 @@
1
+ hej