proselytism 0.0.1 → 0.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/README.md CHANGED
@@ -1,13 +1,14 @@
1
1
  # Proselytism
2
2
 
3
- Document converter, text and image extractor using OpenOffice headless server, pdf_tools and net_pbm
3
+ Document converter, text and image extractor using OpenOffice headless server (JOD or PYOD converter), pdf_tools and net_pbm
4
+
5
+ Handled formats for document conversion : odt, doc, rtf, sxw, docx, txt, html, htm, wps, pdf
4
6
 
5
7
  ## Note
6
8
 
7
- This gem has been originally written for as a RoR 3.2 engine running on Ruby 1.8.7.
8
- It should be framework agnostic and has been tested on Ubuntu and MacOSX.
9
+ This gem has been originally written as a RoR 3.2 engine running on Ruby 1.8.7.
9
10
 
10
- Due to its dependency to system_timer it doesn't work with ruby 1.9.x
11
+ It is framework agnostic and has been tested on Ubuntu and MacOSX.
11
12
 
12
13
  ## Installation
13
14
 
@@ -16,28 +17,45 @@ Install the required external librairies :
16
17
  # aptitude install netpbm
17
18
  # aptitude install xpdf
18
19
  # aptitude install libreoffice
19
-
20
+
20
21
  Add this line to your application's Gemfile:
21
22
 
22
- gem 'proselytism', :git => "git://github.com/itkin/proselytism.git"
23
+ gem 'proselytism'
24
+
25
+ Note : for ruby 1.9 use the branch 1.9
26
+
27
+ gem 'proselytism', :git => "git://github.com/itkin/proselytism.git", :branch => "1.9"
23
28
 
24
29
  And then execute:
25
30
 
26
31
  $ bundle
27
32
 
28
- Generate the config file or / and an initializer
33
+ Configure the gem:
29
34
 
30
- $ rails g proselytism:config
31
- $ rails g proselytism:initializer
35
+ - With a YAML config file:
32
36
 
33
- As an engine, Proselytism automatically load and autoconfig with /config/proselytism.yml if it exists
34
- You can override these configurations params with an initializer. This is especially usefull when you want a custom log file
35
-
36
37
  ```ruby
37
- #/config/initializers/proselytism.rb
38
- Proselytism.config do |config|
39
- config.logger = ActiveSupport::BufferedLogger.new(File.join(Rails.root, 'log', 'proselytism.log'))
40
- end
38
+ $ rails g proselytism:config
39
+ ```
40
+
41
+ As an engine, Proselytism automatically load /config/proselytism.yml (if the file exists) and set its config params depending on the current rails env.
42
+
43
+ - With an initializer (optional for Rails App) :
44
+
45
+ You can override the configuration file params by adding a custom initializer to /config/initializers .
46
+ By default Proselytism will log in a separate log file, if you want to use the rails logger
47
+
48
+ ```ruby
49
+ #/config/initializers/proselytism.rb
50
+ Proselytism.config do |config|
51
+ config.logger = Rails.logger
52
+ end
53
+ ```
54
+
55
+ To generate a full config initializer:
56
+
57
+ ```ruby
58
+ $ rails g proselytism:initializer
41
59
  ```
42
60
 
43
61
  ## Usage
@@ -54,32 +72,38 @@ Proselytism.extract_images source_file_path do |image_files_paths|
54
72
  end
55
73
  ```
56
74
 
57
- Proselytism create its converted files in temporary folders.
58
- - If you pass a block to the method the folders are automatically deleted after the block is yield, so use or copy the file content within the block
59
- - If you don't pass a block, don't forget to safely remove the temp folder
75
+ Proselytism creates its converted files in temporary folders.
76
+ - If you pass a block to the method above the folders are automatically deleted after the block is yield, so use or copy the file content within the block
77
+ - If you don't pass a block, the mentioned folder and its content remains permanently, so don't forget to safely remove it yourself
60
78
 
61
79
  ```ruby
62
80
  pdf_file_path = Proselytism.convert source_file_path, :to => :pdf
81
+ #my code
63
82
  FileUtils.remove_entry_secure File.dirname(pdf_file_path)
64
83
  ```
65
84
 
66
- ## Add your own converter
85
+ ## Add your own converters
67
86
 
68
87
  Add your own converter by extending Proselytism::Converters::Base
69
- - Your converter will be automatically selected and used related to the form and to extensions list
88
+ - Your converter will be automatically selected and used related to the params given to the :from and :to methods
70
89
  - Add a perform method which
71
- - define a text command
72
- - call execute
73
- - return the converted file(s) path
90
+ - calls the execute method with your custom command
91
+ - returns the converted file(s) path(s)
92
+
93
+ Proselytism::Converters::Base takes care of
94
+ - raising error (if the command execution fail)
95
+ - logging the command output
74
96
 
75
97
  ```ruby
76
98
  class MyConverter < Proselytism::Converters::Base
99
+ class Error < parent::Base::Error; end
100
+
77
101
  form :ext1, :ext2
78
102
  to :ext3, :ext4
79
103
 
80
104
  def perform(origin, options={})
81
105
  destination = destination_file_path(origin, options)
82
- command = "pdftotext #{origin} #{destination} 2>&1"
106
+ command = "mycommand #{origin} #{destination} 2>&1"
83
107
  execute command
84
108
  destination
85
109
  end
@@ -8,6 +8,7 @@ test:
8
8
  oo_conversion_max_tries: 2
9
9
  oo_conversion_max_time: 4 #seconds
10
10
 
11
+
11
12
  development:
12
13
  open_office_path: "/Applications/OpenOffice.org.app/Contents/MacOS/soffice"
13
14
  oo_server_bridge: "JOD"
@@ -25,6 +25,13 @@ Proselytism.config do |config|
25
25
  #Path where conversion are done by default system temp dir
26
26
  #config.tmp_path = File.expand_path("../tmp", __FILE__)
27
27
 
28
- #Logger (otherwhise rails logger)
29
- #config.logger = ActiveSupport::BufferedLogger.new("your/log/path")
28
+ #Log level: By default env log level
29
+ #config.log_level = Rails.logger.level
30
+
31
+ #Log path :
32
+ #config.log_path = File.join(Rails.root, 'log', "proselytism.log")
33
+
34
+ #Logger instance
35
+ #config.logger = Proselytism::BufferedLogger.new Proselytism.config.log_path, Proselytism.config.log_level
36
+
30
37
  end
@@ -1,7 +1,7 @@
1
1
  require "active_support/core_ext"
2
2
 
3
3
  require "proselytism/version"
4
- require "proselytism/shared"
4
+ require "proselytism/logger"
5
5
  require "proselytism/proselytism"
6
6
  require "proselytism/converter"
7
7
 
@@ -6,14 +6,13 @@ module Proselytism
6
6
  module Converters
7
7
  class Base
8
8
  include ::Singleton
9
- include Proselytism::Shared
10
9
  class_attribute :from, :to, :subclasses
11
10
 
12
11
  class Error < Exception; end
13
12
 
14
- def config
15
- Proselytism.config
16
- end
13
+
14
+ delegate :config, :log, :to => Proselytism
15
+
17
16
 
18
17
  def destination_file_path(origin, options={})
19
18
  if options[:dest]
@@ -25,11 +24,11 @@ module Proselytism
25
24
 
26
25
  #call perform logging duration and potential errors
27
26
  def convert(file_path, options={})
28
- log :debug, "convert #{file_path} to :#{options[:to]}" do
27
+ log :debug, "#{self.class.name} converted #{file_path} to :#{options[:to]}" do
29
28
  begin
30
29
  perform(file_path, options)
31
30
  rescue Error => e
32
- log :error, e.message
31
+ log :error, "#{e.class.name} #{e.message}\n#{e.backtrace}\n"
33
32
  raise e
34
33
  end
35
34
  end
@@ -28,16 +28,21 @@ class Proselytism::Converters::OpenOffice < Proselytism::Converters::Base
28
28
  destination
29
29
  end
30
30
 
31
-
32
- # HACK pour contourner un comportement ?trange d'OpenOffice, normalement les enregistrements
33
- # se font en UTF-8, mais parfois pour une raison obscure les fichiers texte sont en ISO-8859-1
34
- # donc on rajoute un test pour re-convertir dans l'encodage qu'on attend
35
- def convert_txt_to_utf8(file_path)
36
- if `file #{file_path}` =~ /ISO/
37
- system("iconv --from-code ISO-8859-1 --to-code UTF-8 #{file_path} > tmp_iconv.txt && mv tmp_iconv.txt #{file_path}")
31
+ # For unknown reason sometimes OpenOffice converts in ISO-8859-1,
32
+ # post process to ensure a conversion in UTF-8 when :to => :txt
33
+ def perform_with_ensure_utf8(origin, options={})
34
+ destination = perform_without_ensure_utf8(origin, options)
35
+ if options[:to].to_s == "txt" and `file #{destination}` =~ /ISO/
36
+ #lookup_on = Iconv.new('ASCII//TRANSLIT','UTF-8').iconv(str).upcase.strip.gsub(/'/, " ")
37
+ #log :warn, "***OOO has converted file in "
38
+ tmp_iconv_file = "#{destination}-tmp_iconv.txt"
39
+ execute("iconv --from-code ISO-8859-1 --to-code UTF-8 #{destination} > #{tmp_iconv_file} && mv #{tmp_iconv_file} #{destination}")
38
40
  end
41
+ destination
39
42
  end
40
43
 
44
+ alias_method_chain :perform, :ensure_utf8
45
+
41
46
  def server
42
47
  Server.instance
43
48
  end
@@ -45,12 +50,9 @@ class Proselytism::Converters::OpenOffice < Proselytism::Converters::Base
45
50
 
46
51
  class Server
47
52
  include Singleton
48
- include Proselytism::Shared
49
53
  class Error < Proselytism::Converters::OpenOffice::Error; end
50
54
 
51
- def config
52
- Proselytism.config
53
- end
55
+ delegate :config, :log, :to => Proselytism
54
56
 
55
57
  # Run a block with a timeout and retry if the first execution fails
56
58
  def perform(&block)
@@ -108,9 +110,9 @@ class Proselytism::Converters::OpenOffice < Proselytism::Converters::Base
108
110
  begin
109
111
  Timeout::timeout(3) do
110
112
  loop do
111
- system("killall -9 soffice && killall -9 soffice.bin > /dev/null 2>&1")
113
+ system("killall -9 soffice > /dev/null 2>&1")
114
+ system("killall -9 soffice.bin > /dev/null 2>&1")
112
115
  break unless running?
113
- sleep(0.2)
114
116
  end
115
117
  end
116
118
  rescue Timeout::Error
@@ -11,13 +11,14 @@ module Proselytism
11
11
  params[Rails.env].each do |k, v|
12
12
  config.send "#{k}=", v
13
13
  end
14
- Proselytism.config.logger = nil
15
14
  end
16
15
  end
17
16
  end
18
17
 
19
18
  ActiveSupport.on_load :after_initialize do |app|
20
- Proselytism.config.logger ||= Rails.logger
19
+ Proselytism.config.log_level ||= Rails.logger.level
20
+ Proselytism.config.log_path ||= File.join(Rails.root, 'log', "proselytism.log")
21
+ Proselytism.config.logger ||= Proselytism::BufferedLogger.new Proselytism.config.log_path, Proselytism.config.log_level
21
22
  end
22
23
 
23
24
  end
@@ -0,0 +1,31 @@
1
+ module Proselytism
2
+
3
+ class Logger < ActiveSupport::BufferedLogger
4
+ class Formatter
5
+ def call(severity, time, progname, msg)
6
+ formatted_time = time.strftime("%Y-%m-%d %H:%M:%S.") << time.usec.to_s[0..2].rjust(3)
7
+ "#{formatted_time} [#{severity}][pid:#{$$}] #{msg.strip}\n"
8
+ end
9
+ end
10
+
11
+ def initialize(log, level = DEBUG)
12
+ super(log, level)
13
+ @log.formatter = Formatter.new
14
+ end
15
+
16
+ end
17
+
18
+ def self.log(severity, message = nil, &block)
19
+ if config.logger
20
+ start_time = Time.now
21
+ if block_given?
22
+ result = yield
23
+ config.logger.send(severity, "#{message} in #{((Time.now - start_time)*1000).to_i} ms")
24
+ else
25
+ config.logger.send(severity, message.strip)
26
+ end
27
+ block_given? ? result : true
28
+ end
29
+ end
30
+
31
+ end
@@ -1,7 +1,6 @@
1
1
  require "active_support/core_ext/module/attribute_accessors"
2
2
 
3
3
  module Proselytism
4
- extend Shared
5
4
  mattr_accessor :config
6
5
 
7
6
  def self.config(&block)
@@ -25,6 +24,9 @@ module Proselytism
25
24
  end
26
25
  end
27
26
 
27
+
28
+
29
+
28
30
  # Finds the relevant converter
29
31
  def self.get_converter(origin, destination)
30
32
  Converters::Base.subclasses.detect do |converter|
@@ -1,3 +1,3 @@
1
1
  module Proselytism
2
- VERSION = "0.0.1"
2
+ VERSION = "0.0.2"
3
3
  end
@@ -0,0 +1,63 @@
1
+ K�BIR BRAHIM
2
+
3
+ 1 rue Pierre Bonnard
4
+ 8 Pavillon Beethoven 62300 Lens
5
+ N� le�: 29/08/1973 � Calais
6
+ Nationalit�: Fran�aise
7
+ T�l�: 03/62/90/95/31 Portable�: 06/61/10/48/63
8
+ Email�: brahim62100@hotmail.fr
9
+ Vie maritale 3 enfants
10
+
11
+
12
+
13
+ EXPERIENCES PROFESSIONNELLES
14
+
15
+
16
+ Novembre 2008-Avril 2009� : Atlantique Automatisme Incendie
17
+ Monteur tuyauterie (Manpower)
18
+
19
+ Juin 2008-Octobre 2008 : Soci�t� d?Etude Fabrication Montage
20
+ Monteur (Manpower)
21
+
22
+ Avril 2008-Mai 2008 : Ateliers Bois
23
+ Monteur Charpentes m�talliques (Inter 5)
24
+
25
+ D�cembre 2007-F�vrier 2008� : Soci�t� d?Etude Fabrication Montage
26
+ Monteur (Manpower)
27
+
28
+ Aout 1998-Novembre 2007 : France Montage Fabrication
29
+ Monteur
30
+
31
+ Octobre 1996-Juillet 1998� : France Montage Fabrication
32
+ Monteur (Adia Interim)
33
+
34
+ D�cembre 1995-Septembre 1996�: Service Militaire
35
+
36
+
37
+
38
+
39
+ FORMATIONS ET DIPLOMES
40
+
41
+
42
+ 2005�: Littoral Formation (Dunkerque)
43
+ CACES 3 Chariot El�vateur
44
+ CACES 9 Engins de Manutention
45
+ CACES 3B Cat�gorie de PEMP
46
+
47
+ 1995�: Lyc�e Pierre de Coubertin � Calais
48
+ BTS Transport et Logistique (niveau)
49
+
50
+ 1993�: Lyc�e Pierre de Coubertin � Calais
51
+ Bac G2 Techniques Quantitatives de Gestion
52
+
53
+ 1988�: Coll�ge Lucien Vadez � Calais
54
+ BEPC
55
+
56
+
57
+
58
+ INFORMATIONS COMPLEMENTAIRES
59
+
60
+
61
+ Permis B + voiture personnel
62
+ Loisirs�: Lecture,sport,sorties p�destres
63
+ Maitrise de l?outil informatique�: Internet,Word,Excel
@@ -55,6 +55,16 @@ describe Proselytism::Converters::OpenOffice.instance do
55
55
  subject.perform fixture_path("002.doc"), :dir => tmp_dir, :to => :txt
56
56
  }.should change(self, :tmp_dir_file_count).by 1
57
57
  end
58
+ it "should ensure destination file encoding is utf8" do
59
+ FileUtils.cp(fixture_path("001-latin.txt"), tmp_path("001-latin.txt"))
60
+ subject.should_receive(:perform_without_ensure_utf8).
61
+ with(fixture_path("001.doc"), :dir => tmp_dir, :to => :txt).
62
+ and_return(tmp_path("001-latin.txt"))
63
+ subject.perform fixture_path("001.doc"), :dir => tmp_dir, :to => :txt do |converted_file|
64
+ `file #{converted_file}`.should match 'UTF-8'
65
+ File.read(converted_file).should match('é')
66
+ end
67
+ end
58
68
 
59
69
  it "should not freeze" do
60
70
  3.times do |j|
@@ -15,7 +15,7 @@ Proselytism.config do |config|
15
15
  config.oo_conversion_max_time = 4 #seconds
16
16
 
17
17
  config.tmp_path = File.expand_path("../tmp", __FILE__)
18
- config.logger = ActiveSupport::BufferedLogger.new(File.expand_path("../tmp/log", __FILE__))
18
+ config.logger = Proselytism::Logger.new(File.expand_path("../tmp/log", __FILE__), 0)
19
19
  end
20
20
 
21
21
 
metadata CHANGED
@@ -1,13 +1,13 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: proselytism
3
3
  version: !ruby/object:Gem::Version
4
- hash: 29
4
+ hash: 27
5
5
  prerelease:
6
6
  segments:
7
7
  - 0
8
8
  - 0
9
- - 1
10
- version: 0.0.1
9
+ - 2
10
+ version: 0.0.2
11
11
  platform: ruby
12
12
  authors:
13
13
  - Itkin
@@ -15,7 +15,7 @@ autorequire:
15
15
  bindir: bin
16
16
  cert_chain: []
17
17
 
18
- date: 2013-03-06 00:00:00 Z
18
+ date: 2013-03-08 00:00:00 Z
19
19
  dependencies:
20
20
  - !ruby/object:Gem::Dependency
21
21
  name: activesupport
@@ -125,12 +125,13 @@ files:
125
125
  - lib/proselytism/converters/pdf_to_text.rb
126
126
  - lib/proselytism/converters/ppm_to_jpeg.rb
127
127
  - lib/proselytism/engine.rb
128
+ - lib/proselytism/logger.rb
128
129
  - lib/proselytism/proselytism.rb
129
- - lib/proselytism/shared.rb
130
130
  - lib/proselytism/version.rb
131
131
  - proselytism.gemspec
132
132
  - spec/.DS_Store
133
133
  - spec/base_converter_spec.rb
134
+ - spec/fixtures/001-latin.txt
134
135
  - spec/fixtures/001.doc
135
136
  - spec/fixtures/001.pdf
136
137
  - spec/fixtures/001.txt
@@ -142,7 +143,6 @@ files:
142
143
  - spec/pdf_images_spec.rb
143
144
  - spec/pdf_to_text_spec.rb
144
145
  - spec/proselytism_spec.rb
145
- - spec/shared_spec.rb
146
146
  - spec/spec_helper.rb
147
147
  homepage: https://github.com/itkin/proselytism.git
148
148
  licenses: []
@@ -180,6 +180,7 @@ summary: document converter and plain text extractor
180
180
  test_files:
181
181
  - spec/.DS_Store
182
182
  - spec/base_converter_spec.rb
183
+ - spec/fixtures/001-latin.txt
183
184
  - spec/fixtures/001.doc
184
185
  - spec/fixtures/001.pdf
185
186
  - spec/fixtures/001.txt
@@ -191,5 +192,4 @@ test_files:
191
192
  - spec/pdf_images_spec.rb
192
193
  - spec/pdf_to_text_spec.rb
193
194
  - spec/proselytism_spec.rb
194
- - spec/shared_spec.rb
195
195
  - spec/spec_helper.rb
@@ -1,22 +0,0 @@
1
- module Proselytism
2
- module Shared
3
-
4
- def log(severity, message = nil)
5
- config.logger.send(severity, message) if config.logger
6
- end
7
-
8
- def log_with_time(severity, message = nil, &block)
9
- start_time = Time.now
10
- delay = nil
11
- if block_given?
12
- result = yield
13
- delay = "(#{((Time.now - start_time)*1000).to_i} ms) "
14
- end
15
- message= "** Proselytism #{start_time.strftime("%Y-%m-%d %H:%M:%S")} #{delay}: " + message.to_s
16
- log_without_time(severity, message)
17
- block_given? ? result : true
18
- end
19
- alias_method_chain :log, :time
20
-
21
- end
22
- end
@@ -1,26 +0,0 @@
1
- require "spec_helper"
2
-
3
- describe Proselytism do
4
- context "log" do
5
- it "should log with class and time data" do
6
- subject.config.logger.should_receive(:debug).with do |message|
7
- message.should match /Proselytism/
8
- message.should match /io/
9
- message.should match Time.now.strftime("%Y-%m-%d")
10
- end
11
- subject.log(:debug, 'io').should be_true
12
- end
13
- it "should log delay when a block is passed" do
14
- subject.config.logger.should_receive(:debug).with do |message|
15
- message.should match /Proselytism/
16
- message.should match /io/
17
- message.should match /([\d:]+)/
18
- end
19
- subject.log :debug , 'io' do
20
- sleep(0.5)
21
- false
22
- end.should be_false
23
- end
24
-
25
- end
26
- end