proselytism 0.0.1 → 0.0.2

data/README.md CHANGED
@@ -1,13 +1,14 @@
1
1
  # Proselytism
2
2
 
3
- Document converter, text and image extractor using OpenOffice headless server, pdf_tools and net_pbm
3
+ Document converter, text and image extractor using OpenOffice headless server (JOD or PYOD converter), pdf_tools and net_pbm
4
+
5
+ Supported formats for document conversion: odt, doc, rtf, sxw, docx, txt, html, htm, wps, pdf
4
6
 
5
7
  ## Note
6
8
 
7
- This gem has been originally written for as a RoR 3.2 engine running on Ruby 1.8.7.
8
- It should be framework agnostic and has been tested on Ubuntu and MacOSX.
9
+ This gem was originally written as a RoR 3.2 engine running on Ruby 1.8.7.
9
10
 
10
- Due to its dependency to system_timer it doesn't work with ruby 1.9.x
11
+ It is framework agnostic and has been tested on Ubuntu and MacOSX.
11
12
 
12
13
  ## Installation
13
14
 
@@ -16,28 +17,45 @@ Install the required external librairies :
16
17
  # aptitude install netpbm
17
18
  # aptitude install xpdf
18
19
  # aptitude install libreoffice
19
-
20
+
20
21
  Add this line to your application's Gemfile:
21
22
 
22
- gem 'proselytism', :git => "git://github.com/itkin/proselytism.git"
23
+ gem 'proselytism'
24
+
25
+ Note: for Ruby 1.9, use the 1.9 branch:
26
+
27
+ gem 'proselytism', :git => "git://github.com/itkin/proselytism.git", :branch => "1.9"
23
28
 
24
29
  And then execute:
25
30
 
26
31
  $ bundle
27
32
 
28
- Generate the config file or / and an initializer
33
+ Configure the gem:
29
34
 
30
- $ rails g proselytism:config
31
- $ rails g proselytism:initializer
35
+ - With a YAML config file:
32
36
 
33
- As an engine, Proselytism automatically load and autoconfig with /config/proselytism.yml if it exists
34
- You can override these configurations params with an initializer. This is especially usefull when you want a custom log file
35
-
36
37
  ```ruby
37
- #/config/initializers/proselytism.rb
38
- Proselytism.config do |config|
39
- config.logger = ActiveSupport::BufferedLogger.new(File.join(Rails.root, 'log', 'proselytism.log'))
40
- end
38
+ $ rails g proselytism:config
39
+ ```
40
+
41
+ As an engine, Proselytism automatically loads /config/proselytism.yml (if the file exists) and sets its config params according to the current Rails env.
42
+
43
+ - With an initializer (optional for Rails apps):
44
+
45
+ You can override the configuration file params by adding a custom initializer to /config/initializers.
46
+ By default Proselytism logs to a separate log file (log/proselytism.log). To use the Rails logger instead:
47
+
48
+ ```ruby
49
+ #/config/initializers/proselytism.rb
50
+ Proselytism.config do |config|
51
+ config.logger = Rails.logger
52
+ end
53
+ ```
54
+
55
+ To generate a full config initializer:
56
+
57
+ ```ruby
58
+ $ rails g proselytism:initializer
41
59
  ```
42
60
 
43
61
  ## Usage
@@ -54,32 +72,38 @@ Proselytism.extract_images source_file_path do |image_files_paths|
54
72
  end
55
73
  ```
56
74
 
57
- Proselytism create its converted files in temporary folders.
58
- - If you pass a block to the method the folders are automatically deleted after the block is yield, so use or copy the file content within the block
59
- - If you don't pass a block, don't forget to safely remove the temp folder
75
+ Proselytism creates its converted files in temporary folders.
76
+ - If you pass a block to the methods above, the folders are automatically deleted once the block returns, so use or copy the file content within the block
77
+ - If you don't pass a block, the folder and its content remain on disk, so don't forget to remove them safely yourself
60
78
 
61
79
  ```ruby
62
80
  pdf_file_path = Proselytism.convert source_file_path, :to => :pdf
81
+ # ... use pdf_file_path here ...
63
82
  FileUtils.remove_entry_secure File.dirname(pdf_file_path)
64
83
  ```
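The temporary-folder contract above can be illustrated with Ruby's standard Dir.mktmpdir. This is a minimal sketch, not Proselytism's implementation: fake_convert is a hypothetical stand-in that writes a placeholder file instead of converting anything.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for Proselytism.convert: it only writes a placeholder
# file, but follows the same temp-folder contract described in the README.
def fake_convert(source_path, options = {})
  dir = Dir.mktmpdir("proselytism-sketch")
  name = File.basename(source_path, ".*") + ".#{options[:to]}"
  destination = File.join(dir, name)
  File.write(destination, "converted from #{source_path}")

  # Without a block the caller owns the folder and must delete it.
  return destination unless block_given?

  begin
    yield destination
  ensure
    # With a block the folder is deleted as soon as the block returns (or raises).
    FileUtils.remove_entry_secure(dir)
  end
end

# Block form: read or copy the file inside the block.
used_dir = nil
fake_convert("report.doc", :to => :pdf) do |pdf_path|
  used_dir = File.dirname(pdf_path)
  puts File.read(pdf_path)   # prints "converted from report.doc"
end
File.directory?(used_dir)    # => false

# No block: remove the folder yourself once you are done.
pdf_path = fake_convert("report.doc", :to => :pdf)
FileUtils.remove_entry_secure(File.dirname(pdf_path))
```

The ensure clause is what makes the block form safe: the folder disappears even if the code inside the block raises.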
65
84
 
66
- ## Add your own converter
85
+ ## Add your own converters
67
86
 
68
87
  Add your own converter by extending Proselytism::Converters::Base
69
- - Your converter will be automatically selected and used related to the form and to extensions list
88
+ - Your converter will be automatically selected and used according to the extensions passed to its from and to class methods
70
89
  - Add a perform method which
71
- - define a text command
72
- - call execute
73
- - return the converted file(s) path
90
+ - calls the execute method with your custom command
91
+ - returns the converted file(s) path(s)
92
+
93
+ Proselytism::Converters::Base takes care of:
94
+ - raising an error (if the command execution fails)
95
+ - logging the command output
74
96
 
75
97
  ```ruby
76
98
  class MyConverter < Proselytism::Converters::Base
99
+ class Error < Proselytism::Converters::Base::Error; end
100
+
77
101
  from :ext1, :ext2
78
102
  to :ext3, :ext4
79
103
 
80
104
  def perform(origin, options={})
81
105
  destination = destination_file_path(origin, options)
82
- command = "pdftotext #{origin} #{destination} 2>&1"
106
+ command = "mycommand #{origin} #{destination} 2>&1"
83
107
  execute command
84
108
  destination
85
109
  end
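Automatic converter selection can be pictured as a registry of subclasses scanned for a matching origin/destination pair. The sketch below is hypothetical: SketchConverterBase, its registry and its from/to DSL are illustrative assumptions, while the gem itself relies on class_attribute :from, :to, :subclasses and Proselytism.get_converter, whose matching rules are only partly visible in this diff.

```ruby
# Hypothetical, self-contained sketch of extension-based converter lookup.
# SketchConverterBase is NOT Proselytism::Converters::Base; a plain
# class-level registry stands in for its class_attribute :subclasses.
class SketchConverterBase
  @registry = []

  class << self
    attr_reader :registry

    # Every subclass registers itself on the base class when it is defined.
    def inherited(subclass)
      super
      SketchConverterBase.registry << subclass
    end

    # `from :doc, :odt` declares handled source extensions; with no
    # arguments it returns the declared list.
    def from(*exts)
      exts.empty? ? (@from || []) : (@from = exts.map(&:to_sym))
    end

    # Same DSL for destination extensions.
    def to(*exts)
      exts.empty? ? (@to || []) : (@to = exts.map(&:to_sym))
    end

    # First registered converter handling origin -> destination, or nil.
    def get_converter(origin, destination)
      registry.detect do |converter|
        converter.from.include?(origin.to_sym) &&
          converter.to.include?(destination.to_sym)
      end
    end
  end
end

class SketchPdfToText < SketchConverterBase
  from :pdf
  to :txt
end

class SketchOfficeConverter < SketchConverterBase
  from :doc, :odt, :rtf
  to :pdf, :txt
end

SketchConverterBase.get_converter(:pdf, :txt)   # => SketchPdfToText
SketchConverterBase.get_converter("doc", "pdf") # => SketchOfficeConverter
SketchConverterBase.get_converter(:png, :txt)   # => nil
```

Because detect returns the first match, in this sketch declaration order decides which converter wins when two of them handle the same pair.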
@@ -8,6 +8,7 @@ test:
8
8
  oo_conversion_max_tries: 2
9
9
  oo_conversion_max_time: 4 #seconds
10
10
 
11
+
11
12
  development:
12
13
  open_office_path: "/Applications/OpenOffice.org.app/Contents/MacOS/soffice"
13
14
  oo_server_bridge: "JOD"
@@ -25,6 +25,13 @@ Proselytism.config do |config|
25
25
  #Path where conversions are done (by default the system temp dir)
26
26
  #config.tmp_path = File.expand_path("../tmp", __FILE__)
27
27
 
28
- #Logger (otherwhise rails logger)
29
- #config.logger = ActiveSupport::BufferedLogger.new("your/log/path")
28
+ #Log level (by default the Rails env log level)
29
+ #config.log_level = Rails.logger.level
30
+
31
+ #Log path (by default log/proselytism.log)
32
+ #config.log_path = File.join(Rails.root, 'log', "proselytism.log")
33
+
34
+ #Logger instance
35
+ #config.logger = Proselytism::Logger.new Proselytism.config.log_path, Proselytism.config.log_level
36
+
30
37
  end
@@ -1,7 +1,7 @@
1
1
  require "active_support/core_ext"
2
2
 
3
3
  require "proselytism/version"
4
- require "proselytism/shared"
4
+ require "proselytism/logger"
5
5
  require "proselytism/proselytism"
6
6
  require "proselytism/converter"
7
7
 
@@ -6,14 +6,13 @@ module Proselytism
6
6
  module Converters
7
7
  class Base
8
8
  include ::Singleton
9
- include Proselytism::Shared
10
9
  class_attribute :from, :to, :subclasses
11
10
 
12
11
  class Error < Exception; end
13
12
 
14
- def config
15
- Proselytism.config
16
- end
13
+
14
+ delegate :config, :log, :to => Proselytism
15
+
17
16
 
18
17
  def destination_file_path(origin, options={})
19
18
  if options[:dest]
@@ -25,11 +24,11 @@ module Proselytism
25
24
 
26
25
  #call perform logging duration and potential errors
27
26
  def convert(file_path, options={})
28
- log :debug, "convert #{file_path} to :#{options[:to]}" do
27
+ log :debug, "#{self.class.name} converted #{file_path} to :#{options[:to]}" do
29
28
  begin
30
29
  perform(file_path, options)
31
30
  rescue Error => e
32
- log :error, e.message
31
+ log :error, "#{e.class.name} #{e.message}\n#{e.backtrace.join("\n")}"
33
32
  raise e
34
33
  end
35
34
  end
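The convert wrapper logs a failure and then re-raises it. A minimal sketch of that rescue, log, re-raise pattern with the standard Logger, assuming a hypothetical SketchConversionError class (the gem's own Error inherits from Exception rather than StandardError):

```ruby
require "logger"
require "stringio"

# Hypothetical error class standing in for Converters::Base::Error.
class SketchConversionError < StandardError; end

# Run the conversion; on failure log class, message and the full backtrace
# (joined line by line, since Exception#backtrace is an Array), then re-raise.
def convert_with_error_log(logger)
  yield
rescue SketchConversionError => e
  backtrace = (e.backtrace || []).join("\n")
  logger.error("#{e.class.name} #{e.message}\n#{backtrace}")
  raise
end

io = StringIO.new
logger = Logger.new(io)
begin
  convert_with_error_log(logger) { raise SketchConversionError, "command failed" }
rescue SketchConversionError
  # The caller still receives the exception after it has been logged.
end
io.string.include?("SketchConversionError command failed") # => true
```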
@@ -28,16 +28,21 @@ class Proselytism::Converters::OpenOffice < Proselytism::Converters::Base
28
28
  destination
29
29
  end
30
30
 
31
-
32
- # HACK to work around a strange OpenOffice behaviour: files are normally saved
33
- # in UTF-8, but sometimes, for some obscure reason, text files come out in ISO-8859-1,
34
- # so we add a check to re-convert them to the expected encoding
35
- def convert_txt_to_utf8(file_path)
36
- if `file #{file_path}` =~ /ISO/
37
- system("iconv --from-code ISO-8859-1 --to-code UTF-8 #{file_path} > tmp_iconv.txt && mv tmp_iconv.txt #{file_path}")
31
+ # For an unknown reason, OpenOffice sometimes converts to ISO-8859-1,
32
+ # so post-process the output to make sure it is UTF-8 when :to => :txt
33
+ def perform_with_ensure_utf8(origin, options={})
34
+ destination = perform_without_ensure_utf8(origin, options)
35
+ if options[:to].to_s == "txt" and `file #{destination}` =~ /ISO/
36
+ #lookup_on = Iconv.new('ASCII//TRANSLIT','UTF-8').iconv(str).upcase.strip.gsub(/'/, " ")
37
+ #log :warn, "***OOO has converted file in "
38
+ tmp_iconv_file = "#{destination}-tmp_iconv.txt"
39
+ execute("iconv --from-code ISO-8859-1 --to-code UTF-8 #{destination} > #{tmp_iconv_file} && mv #{tmp_iconv_file} #{destination}")
38
40
  end
41
+ destination
39
42
  end
40
43
 
44
+ alias_method_chain :perform, :ensure_utf8
45
+
41
46
  def server
42
47
  Server.instance
43
48
  end
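The perform_with_ensure_utf8 post-processing shells out to file and iconv. On Ruby 1.9 or later, a similar check can be sketched in-process with String#encode. This is an alternative technique, not what the gem does, and ensure_utf8! is a hypothetical helper that keeps the same assumption: any output that is not valid UTF-8 is ISO-8859-1.

```ruby
require "tempfile"

# Hypothetical in-process version of the ISO-8859-1 -> UTF-8 post-processing
# (String#encode instead of shelling out to `file` and `iconv`).
def ensure_utf8!(path)
  raw = File.binread(path)
  # Valid UTF-8 (including plain ASCII) is left untouched.
  return path if raw.dup.force_encoding("UTF-8").valid_encoding?

  # Every byte is a valid ISO-8859-1 character, so this transcoding cannot fail.
  File.binwrite(path, raw.force_encoding("ISO-8859-1").encode("UTF-8"))
  path
end

latin = Tempfile.new(["latin", ".txt"])
latin.binmode
latin.write("caf\xE9".b) # "café" saved as ISO-8859-1
latin.close

ensure_utf8!(latin.path)
File.read(latin.path, :encoding => "UTF-8") # => "café"
```

Like the original hack, this cannot tell ISO-8859-1 from other single-byte encodings; it only guarantees that the result is valid UTF-8.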
@@ -45,12 +50,9 @@ class Proselytism::Converters::OpenOffice < Proselytism::Converters::Base
45
50
 
46
51
  class Server
47
52
  include Singleton
48
- include Proselytism::Shared
49
53
  class Error < Proselytism::Converters::OpenOffice::Error; end
50
54
 
51
- def config
52
- Proselytism.config
53
- end
55
+ delegate :config, :log, :to => Proselytism
54
56
 
55
57
  # Run a block with a timeout and retry if the first execution fails
56
58
  def perform(&block)
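The comment on Server#perform describes running a block with a timeout and retrying when the first execution fails. The method body is not part of this diff, so the following is only a sketch of that pattern: with_timeout_and_retry is a hypothetical helper, and its two parameters stand in for oo_conversion_max_tries and oo_conversion_max_time.

```ruby
require "timeout"

# Hypothetical sketch of "run a block with a timeout and retry if the first
# execution fails". A real server wrapper would presumably also restart
# soffice before retrying; that part is not modelled here.
def with_timeout_and_retry(max_tries, max_time)
  tries = 0
  begin
    tries += 1
    Timeout.timeout(max_time) { yield tries }
  rescue Timeout::Error, StandardError
    retry if tries < max_tries
    raise # give up: re-raise the last failure
  end
end

result = with_timeout_and_retry(2, 1) do |try|
  sleep 2 if try == 1 # first attempt hangs and is interrupted after 1 s
  "converted on try #{try}"
end
result # => "converted on try 2"
```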
@@ -108,9 +110,9 @@ class Proselytism::Converters::OpenOffice < Proselytism::Converters::Base
108
110
  begin
109
111
  Timeout::timeout(3) do
110
112
  loop do
111
- system("killall -9 soffice && killall -9 soffice.bin > /dev/null 2>&1")
113
+ system("killall -9 soffice > /dev/null 2>&1")
114
+ system("killall -9 soffice.bin > /dev/null 2>&1")
112
115
  break unless running?
113
- sleep(0.2)
114
116
  end
115
117
  end
116
118
  rescue Timeout::Error
@@ -11,13 +11,14 @@ module Proselytism
11
11
  params[Rails.env].each do |k, v|
12
12
  config.send "#{k}=", v
13
13
  end
14
- Proselytism.config.logger = nil
15
14
  end
16
15
  end
17
16
  end
18
17
 
19
18
  ActiveSupport.on_load :after_initialize do |app|
20
- Proselytism.config.logger ||= Rails.logger
19
+ Proselytism.config.log_level ||= Rails.logger.level
20
+ Proselytism.config.log_path ||= File.join(Rails.root, 'log', "proselytism.log")
21
+ Proselytism.config.logger ||= Proselytism::Logger.new Proselytism.config.log_path, Proselytism.config.log_level
21
22
  end
22
23
 
23
24
  end
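The after_initialize hook only fills in logging settings that are still unset, which is what lets the YAML file or an initializer override them. A minimal sketch of that precedence, assuming a hypothetical SketchConfig struct and the standard Logger in place of Proselytism::Logger:

```ruby
require "logger"
require "stringio"

# Hypothetical stand-in for Proselytism.config.
SketchConfig = Struct.new(:log_level, :log_path, :logger)

# Mimics the after_initialize hook: each setting is filled in only when it is
# still nil, so values from proselytism.yml or an initializer win.
def apply_logging_defaults(config, app_root, app_log_level)
  config.log_level ||= app_log_level
  config.log_path  ||= File.join(app_root, "log", "proselytism.log")
  config.logger    ||= begin
    logger = Logger.new(config.log_path) # creates the file; log/ must already exist
    logger.level = config.log_level
    logger
  end
  config
end

custom = Logger.new(StringIO.new)
config = SketchConfig.new(nil, nil, custom) # logger set by an initializer
apply_logging_defaults(config, "/srv/app", Logger::INFO)
config.logger.equal?(custom) # => true, not overwritten
config.log_path              # => "/srv/app/log/proselytism.log"
config.log_level             # => 1 (Logger::INFO)
```

Because 0 (Logger::DEBUG) is truthy in Ruby, an explicitly configured debug level survives ||=; only nil or false is replaced.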
@@ -0,0 +1,31 @@
1
+ module Proselytism
2
+
3
+ class Logger < ActiveSupport::BufferedLogger
4
+ class Formatter
5
+ def call(severity, time, progname, msg)
6
+ formatted_time = time.strftime("%Y-%m-%d %H:%M:%S.") << (time.usec / 1000).to_s.rjust(3, "0")
7
+ "#{formatted_time} [#{severity}][pid:#{$$}] #{msg.strip}\n"
8
+ end
9
+ end
10
+
11
+ def initialize(log, level = DEBUG)
12
+ super(log, level)
13
+ @log.formatter = Formatter.new
14
+ end
15
+
16
+ end
17
+
18
+ def self.log(severity, message = nil, &block)
19
+ start_time = Time.now
20
+ result = block_given? ? yield : true
21
+ if config.logger
22
+ if block_given?
23
+ config.logger.send(severity, "#{message} in #{((Time.now - start_time)*1000).to_i} ms")
24
+ else
25
+ config.logger.send(severity, message.to_s.strip)
26
+ end
27
+ end
28
+ result
29
+ end
30
+
31
+ end
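Proselytism::Logger formats each line as a timestamp with milliseconds, severity and pid, and Proselytism.log times an optional block. A sketch of both on top of the standard library Logger, where SketchFormatter and log_timed are hypothetical names; log_timed runs the block whether or not a logger is configured.

```ruby
require "logger"
require "stringio"

# Hypothetical stdlib formatter producing
# "YYYY-mm-dd HH:MM:SS.mmm [SEVERITY][pid:N] message".
class SketchFormatter
  def call(severity, time, _progname, msg)
    millis = (time.usec / 1000).to_s.rjust(3, "0") # zero-pad: 5 ms -> "005"
    "#{time.strftime('%Y-%m-%d %H:%M:%S')}.#{millis} [#{severity}][pid:#{Process.pid}] #{msg.to_s.strip}\n"
  end
end

# With a block: run it, log "<message> in N ms", return the block's value.
# Without a block: log the message and return true.
def log_timed(logger, severity, message = nil)
  start = Time.now
  result = block_given? ? yield : true
  if logger
    text = block_given? ? "#{message} in #{((Time.now - start) * 1000).to_i} ms" : message.to_s.strip
    logger.send(severity, text)
  end
  result
end

io = StringIO.new
logger = Logger.new(io)
logger.formatter = SketchFormatter.new
value = log_timed(logger, :debug, "MyConverter converted a.doc to :txt") { 6 * 7 }
value # => 42
# io.string now holds one line: timestamp, [DEBUG][pid:...], message and elapsed ms
```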
@@ -1,7 +1,6 @@
1
1
  require "active_support/core_ext/module/attribute_accessors"
2
2
 
3
3
  module Proselytism
4
- extend Shared
5
4
  mattr_accessor :config
6
5
 
7
6
  def self.config(&block)
@@ -25,6 +24,9 @@ module Proselytism
25
24
  end
26
25
  end
27
26
 
27
+
28
+
29
+
28
30
  # Finds the relevant converter
29
31
  def self.get_converter(origin, destination)
30
32
  Converters::Base.subclasses.detect do |converter|
@@ -1,3 +1,3 @@
1
1
  module Proselytism
2
- VERSION = "0.0.1"
2
+ VERSION = "0.0.2"
3
3
  end
@@ -0,0 +1,63 @@
1
+ K�BIR BRAHIM
2
+
3
+ 1 rue Pierre Bonnard
4
+ 8 Pavillon Beethoven 62300 Lens
5
+ Né le : 29/08/1973 à Calais
6
+ Nationalité: Française
7
+ Tél : 03/62/90/95/31 Portable : 06/61/10/48/63
8
+ Email : brahim62100@hotmail.fr
9
+ Vie maritale 3 enfants
10
+
11
+
12
+
13
+ EXPERIENCES PROFESSIONNELLES
14
+
15
+
16
+ Novembre 2008-Avril 2009 : Atlantique Automatisme Incendie
17
+ Monteur tuyauterie (Manpower)
18
+
19
+ Juin 2008-Octobre 2008 : Société d?Etude Fabrication Montage
20
+ Monteur (Manpower)
21
+
22
+ Avril 2008-Mai 2008 : Ateliers Bois
23
+ Monteur Charpentes métalliques (Inter 5)
24
+
25
+ Décembre 2007-Février 2008 : Société d?Etude Fabrication Montage
26
+ Monteur (Manpower)
27
+
28
+ Aout 1998-Novembre 2007 : France Montage Fabrication
29
+ Monteur
30
+
31
+ Octobre 1996-Juillet 1998 : France Montage Fabrication
32
+ Monteur (Adia Interim)
33
+
34
+ Décembre 1995-Septembre 1996 : Service Militaire
35
+
36
+
37
+
38
+
39
+ FORMATIONS ET DIPLOMES
40
+
41
+
42
+ 2005 : Littoral Formation (Dunkerque)
43
+ CACES 3 Chariot Elévateur
44
+ CACES 9 Engins de Manutention
45
+ CACES 3B Catégorie de PEMP
46
+
47
+ 1995 : Lycée Pierre de Coubertin à Calais
48
+ BTS Transport et Logistique (niveau)
49
+
50
+ 1993 : Lycée Pierre de Coubertin à Calais
51
+ Bac G2 Techniques Quantitatives de Gestion
52
+
53
+ 1988 : Collège Lucien Vadez à Calais
54
+ BEPC
55
+
56
+
57
+
58
+ INFORMATIONS COMPLEMENTAIRES
59
+
60
+
61
+ Permis B + voiture personnel
62
+ Loisirs : Lecture,sport,sorties pédestres
63
+ Maitrise de l?outil informatique : Internet,Word,Excel
@@ -55,6 +55,16 @@ describe Proselytism::Converters::OpenOffice.instance do
55
55
  subject.perform fixture_path("002.doc"), :dir => tmp_dir, :to => :txt
56
56
  }.should change(self, :tmp_dir_file_count).by 1
57
57
  end
58
+ it "should ensure destination file encoding is utf8" do
59
+ FileUtils.cp(fixture_path("001-latin.txt"), tmp_path("001-latin.txt"))
60
+ subject.should_receive(:perform_without_ensure_utf8).
61
+ with(fixture_path("001.doc"), :dir => tmp_dir, :to => :txt).
62
+ and_return(tmp_path("001-latin.txt"))
63
+ # perform returns the converted file path (it does not yield to a block)
64
+ converted_file = subject.perform fixture_path("001.doc"), :dir => tmp_dir, :to => :txt
65
+ `file #{converted_file}`.should match 'UTF-8'
66
+ File.read(converted_file).should match('é')
67
+ end
58
68
 
59
69
  it "should not freeze" do
60
70
  3.times do |j|
@@ -15,7 +15,7 @@ Proselytism.config do |config|
15
15
  config.oo_conversion_max_time = 4 #seconds
16
16
 
17
17
  config.tmp_path = File.expand_path("../tmp", __FILE__)
18
- config.logger = ActiveSupport::BufferedLogger.new(File.expand_path("../tmp/log", __FILE__))
18
+ config.logger = Proselytism::Logger.new(File.expand_path("../tmp/log", __FILE__), 0)
19
19
  end
20
20
 
21
21
 
metadata CHANGED
@@ -1,13 +1,13 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: proselytism
3
3
  version: !ruby/object:Gem::Version
4
- hash: 29
4
+ hash: 27
5
5
  prerelease:
6
6
  segments:
7
7
  - 0
8
8
  - 0
9
- - 1
10
- version: 0.0.1
9
+ - 2
10
+ version: 0.0.2
11
11
  platform: ruby
12
12
  authors:
13
13
  - Itkin
@@ -15,7 +15,7 @@ autorequire:
15
15
  bindir: bin
16
16
  cert_chain: []
17
17
 
18
- date: 2013-03-06 00:00:00 Z
18
+ date: 2013-03-08 00:00:00 Z
19
19
  dependencies:
20
20
  - !ruby/object:Gem::Dependency
21
21
  name: activesupport
@@ -125,12 +125,13 @@ files:
125
125
  - lib/proselytism/converters/pdf_to_text.rb
126
126
  - lib/proselytism/converters/ppm_to_jpeg.rb
127
127
  - lib/proselytism/engine.rb
128
+ - lib/proselytism/logger.rb
128
129
  - lib/proselytism/proselytism.rb
129
- - lib/proselytism/shared.rb
130
130
  - lib/proselytism/version.rb
131
131
  - proselytism.gemspec
132
132
  - spec/.DS_Store
133
133
  - spec/base_converter_spec.rb
134
+ - spec/fixtures/001-latin.txt
134
135
  - spec/fixtures/001.doc
135
136
  - spec/fixtures/001.pdf
136
137
  - spec/fixtures/001.txt
@@ -142,7 +143,6 @@ files:
142
143
  - spec/pdf_images_spec.rb
143
144
  - spec/pdf_to_text_spec.rb
144
145
  - spec/proselytism_spec.rb
145
- - spec/shared_spec.rb
146
146
  - spec/spec_helper.rb
147
147
  homepage: https://github.com/itkin/proselytism.git
148
148
  licenses: []
@@ -180,6 +180,7 @@ summary: document converter and plain text extractor
180
180
  test_files:
181
181
  - spec/.DS_Store
182
182
  - spec/base_converter_spec.rb
183
+ - spec/fixtures/001-latin.txt
183
184
  - spec/fixtures/001.doc
184
185
  - spec/fixtures/001.pdf
185
186
  - spec/fixtures/001.txt
@@ -191,5 +192,4 @@ test_files:
191
192
  - spec/pdf_images_spec.rb
192
193
  - spec/pdf_to_text_spec.rb
193
194
  - spec/proselytism_spec.rb
194
- - spec/shared_spec.rb
195
195
  - spec/spec_helper.rb
@@ -1,22 +0,0 @@
1
- module Proselytism
2
- module Shared
3
-
4
- def log(severity, message = nil)
5
- config.logger.send(severity, message) if config.logger
6
- end
7
-
8
- def log_with_time(severity, message = nil, &block)
9
- start_time = Time.now
10
- delay = nil
11
- if block_given?
12
- result = yield
13
- delay = "(#{((Time.now - start_time)*1000).to_i} ms) "
14
- end
15
- message= "** Proselytism #{start_time.strftime("%Y-%m-%d %H:%M:%S")} #{delay}: " + message.to_s
16
- log_without_time(severity, message)
17
- block_given? ? result : true
18
- end
19
- alias_method_chain :log, :time
20
-
21
- end
22
- end
@@ -1,26 +0,0 @@
1
- require "spec_helper"
2
-
3
- describe Proselytism do
4
- context "log" do
5
- it "should log with class and time data" do
6
- subject.config.logger.should_receive(:debug).with do |message|
7
- message.should match /Proselytism/
8
- message.should match /io/
9
- message.should match Time.now.strftime("%Y-%m-%d")
10
- end
11
- subject.log(:debug, 'io').should be_true
12
- end
13
- it "should log delay when a block is passed" do
14
- subject.config.logger.should_receive(:debug).with do |message|
15
- message.should match /Proselytism/
16
- message.should match /io/
17
- message.should match /([\d:]+)/
18
- end
19
- subject.log :debug , 'io' do
20
- sleep(0.5)
21
- false
22
- end.should be_false
23
- end
24
-
25
- end
26
- end