despeck 0.3.0 → 0.4.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: 39b755a474307d4ace2641ee3d6b7202eda4c8d5
4
- data.tar.gz: 5cf5487d284c2e113b7b46cb866dfeb318495a37
2
+ SHA256:
3
+ metadata.gz: 33f1c6c61bd3033e7bea34a95f83bedf3286e1ef9e0740417bee7d0c437a047c
4
+ data.tar.gz: 480a0a0f4a4b6a4b7df8c037d7199fcd0cc6f90dc4a20384650f882bb35a4909
5
5
  SHA512:
6
- metadata.gz: b1e9a14709e077d5607445f4c46cb66323e783ac80134f12e20cc4d2e67ea3bd8882009bba5bed20e23a843b90d789cc5f8fb5f51d60cc9d5fbb582b7ab57df2
7
- data.tar.gz: 33912c0cc1999c1a652be4404819d3ae09c89677ddbdc64cf77fb993746cbc0dc82490587b8ee48959b263d7e7019d30189b901aacbea7ade497f633f52b51e3
6
+ metadata.gz: c6b0b6f1e086ea4bdbc697ed0ed204688e4306239de9ad909d812cc89d32bbfeada413b01240a574cd53640d28e4805feb163b3d93b73a88cc15626922e2561e
7
+ data.tar.gz: 4bd2a99f218bb7ffa9bc15bbb2be2810d22923973659446e664d2260a6b0d91ae1aaceeb2d755c996834c482f19dfdf6180f152818624c19d8b760f0a3696ac8
@@ -0,0 +1,75 @@
1
+
2
+ name: ubuntu
3
+ on:
4
+ push:
5
+ branches: [ master ]
6
+ tags:
7
+ - '*'
8
+ pull_request:
9
+ paths-ignore:
10
+ - .github/workflows/macos.yml
11
+ - .github/workflows/windows.yml
12
+
13
+ jobs:
14
+ rspec:
15
+ name: Ubuntu RSpec [ruby-${{ matrix.ruby }}&libvips-${{ matrix.libvips_version }}]
16
+ runs-on: ubuntu-latest
17
+ continue-on-error: ${{ matrix.experimental }}
18
+ strategy:
19
+ fail-fast: false
20
+ matrix:
21
+ ruby: [ '2.6', '2.5', '2.4', '2.3' ]
22
+ libvips_version: ['8.6.5', '8.7.4']
23
+ experimental: [false]
24
+ include:
25
+ - ruby: '2.7'
26
+ libvips_version: ['8.6.5', '8.7.4']
27
+ experimental: true
28
+
29
+ steps:
30
+ - uses: actions/checkout@v2
31
+ - name: Use Ruby
32
+ uses: ruby/setup-ruby@v1
33
+ with:
34
+ ruby-version: ${{ matrix.ruby }}
35
+ - name: Setup Deps Package
36
+ run: |
37
+ sudo apt-get install -y libexpat1-dev gettext liblcms2-dev \
38
+ libmagickwand-dev libopenexr-dev libcfitsio-dev libgif-dev \
39
+ libgs-dev libgsf-1-dev libmatio-dev libopenslide-dev liborc-0.4-dev \
40
+ libpango1.0-dev libpoppler-glib-dev librsvg2-dev \
41
+ libwebp-dev libfftw3-dev libglib2.0-dev tesseract-ocr \
42
+ tesseract-ocr-chi-sim imagemagick \
43
+ libxslt-dev libxml2-dev
44
+ - name: Cache libvips
45
+ uses: actions/cache@v2
46
+ with:
47
+ path: ~/vips
48
+ key: ${{ runner.os }}-${{matrix.libvips_version}}-vips-${{ hashFiles('**/install-vips.sh') }}
49
+ - name: Setup Libvips
50
+ run: |
51
+ export LIBVIPS_VERSION=${{matrix.libvips_version}}
52
+ bash install-vips.sh --without-python
53
+ - name: Update gems
54
+ run: |
55
+ export NOKOGIRI_USE_SYSTEM_LIBRARIES=true
56
+ export PATH=$HOME/vips/bin:$PATH
57
+ export LD_LIBRARY_PATH=$HOME/vips/lib:$LD_LIBRARY_PATH
58
+ export PKG_CONFIG_PATH=$HOME/vips/lib/pkgconfig:$PKG_CONFIG_PATH
59
+ gem uninstall bundler
60
+ gem install bundler -v '~> 1.16'
61
+ bundle config path vendor/bundle
62
+ bundle install --jobs 4 --retry 3
63
+ cat Gemfile.lock
64
+ - name: Run specs
65
+ run: |
66
+ export PATH=$HOME/vips/bin:$PATH
67
+ export LD_LIBRARY_PATH=$HOME/vips/lib:$LD_LIBRARY_PATH
68
+ export PKG_CONFIG_PATH=$HOME/vips/lib/pkgconfig:$PKG_CONFIG_PATH
69
+ bundle exec rake
70
+ - name: Rubocop
71
+ run: |
72
+ export PATH=$HOME/vips/bin:$PATH
73
+ export LD_LIBRARY_PATH=$HOME/vips/lib:$LD_LIBRARY_PATH
74
+ export PKG_CONFIG_PATH=$HOME/vips/lib/pkgconfig:$PKG_CONFIG_PATH
75
+ bundle exec rubocop
@@ -0,0 +1,196 @@
1
+ image:https://github.com/despeck/despeck/workflows/ubuntu/badge.svg["Build status (Ubuntu)", link="https://github.com/despeck/despeck/actions?workflow=ubuntu"]
2
+ image:https://badge.fury.io/rb/despeck.svg["Gem Version", link="https://badge.fury.io/rb/despeck"]
3
+
4
+ = Despeck
5
+
6
+ Remove unwanted stamps or watermarks from scanned images
7
+
8
+ `despeck` is a Ruby gem that helps you remove unwanted stamps or watermarks from
9
+ scanned images/PDFs, primarily prior to OCR.
10
+
11
+ Its image processing operations are based on `libvips` via the
12
+ https://github.com/jcupitt/ruby-vips[ruby-vips] Ruby-bindings.
13
+
14
+ It can be used to:
15
+
16
+ * detect uniform watermarks from a series of images,
17
+ * output a watermark pattern file (image, mask) that describes a watermark pattern, and
18
+ * remove a specified watermark pattern from input images regardless of the
19
+ location of the watermark on these images.
20
+
21
+ Assumptions on input:
22
+
23
+ * The input may be a single image, or a PDF of multiple pages of images.
24
+ * In the case of multiple pages, not all pages may have the watermark.
25
+ * The input images are assumed to be purely monochrome text-based.
26
+ * The watermarks are colored. For example, if the watermark is a "`GREEN SQUARE PATTERN`", for all
27
+ the pages that contain this mark, `despeck` will attempt to detect this pattern
28
+ and remove them.
29
+
30
+ == Installation
31
+
32
+ === General
33
+
34
+ Install gem manually:
35
+
36
+ [source,sh]
37
+ ----
38
+ $ gem install despeck
39
+ ----
40
+
41
+ Or add it to your `Gemfile`:
42
+
43
+ [source,ruby]
44
+ ----
45
+ gem 'despeck'
46
+ ----
47
+
48
+ and then run `bundle install`
49
+
50
+ === OCR functions
51
+
52
+ To extract text via the `despeck ocr` command, you'll need to install:
53
+
54
+ * Tesseract (3.x)
55
+ * ImageMagick (6.x)
56
+ * Desired languages
57
+
58
+ ==== MacOS
59
+
60
+ To install Tesseract itself (with all languages pre-installed):
61
+
62
+ [source,sh]
63
+ ----
64
+ $ brew install tesseract --all-languages
65
+ ----
66
+
67
+ Or you can install Tesseract with some languages manually:
68
+
69
+ [source,sh]
70
+ ----
71
+ $ brew install tesseract
72
+ $ mkdir -p ~/Downloads/tessdata
73
+ $ cd ~/Downloads/tessdata
74
+ $ wget https://github.com/tesseract-ocr/tessdata/raw/3.04.00/chi_sim.traineddata
75
+ ----
76
+
77
+ To install ImageMagick:
78
+
79
+ [source,sh]
80
+ ----
81
+ $ brew install imagemagick@6
82
+ $ echo 'export PATH="/usr/local/opt/imagemagick@6/bin:$PATH"' >> ~/.bash_profile
83
+ $ export PKG_CONFIG_PATH=/usr/local/opt/imagemagick@6/lib/pkgconfig
84
+ ----
85
+
86
+ The full list of trained language data files can be found here (note that they differ between Tesseract versions):
87
+
88
+ https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-304305
89
+
90
+ ==== Ubuntu/Debian
91
+
92
+ [source,sh]
93
+ ----
94
+ $ apt-get install tesseract-ocr tesseract-ocr-chi-sim imagemagick
95
+ ----
96
+
97
+ ==== FAQ
98
+
99
+ > **I'm getting the following error:**
100
+ >
101
+ > 'convert': No such file or directory @ rb_sysopen - /var/folders/2t/xmdrn2sd2lv2w49dv0zw9_q00000gp/T/1521805124.661379908.txt (RTesseract::ConversionError)
102
+
103
+
104
+ *This error means you don't have the appropriate Tesseract language installed (or Tesseract is unable to find that language). See language installation instructions above.*
105
+
106
+
107
+
108
+ == Usage (Command Line)
109
+
110
+ Getting actual help:
111
+
112
+ [source,sh]
113
+ ----
114
+ # To show general help
115
+ despeck -h
116
+ despeck remove -h
117
+ despeck ocr -h
118
+ despeck despeck -h
119
+ ----
120
+
121
+ === All-in-one (aka Despeck)
122
+
123
+ If you need to remove a watermark and extract the text via OCR, you may want to use:
124
+
125
+ [source,sh]
126
+ ----
127
+ $ bundle exec despeck despeck -l chi_sim input.jpg
128
+ ----
129
+
130
+ This is equivalent to the following two commands:
131
+
132
+ [source,sh]
133
+ ----
134
+ $ bundle exec despeck remove input.jpg output.jpg
135
+ $ bundle exec despeck ocr -l chi_sim output.jpg
136
+ ----
137
+
138
+ === Remove watermark
139
+
140
+ To remove a watermark:
141
+
142
+ [source,sh]
143
+ ----
144
+ $ despeck remove /path/to/input.jpg /path/to/output.jpg
145
+ ----
146
+
147
+ With the command above, Despeck tries to detect the watermark colour and picks the filter settings it considers best for removing it. The detection may be wrong, so you can pass several parameters to guide it:
148
+
149
+ [source,sh]
150
+ ----
151
+ $ despeck remove --color 00FF00 --sensitivity 120 --black-const -60 --add-contrast /path/to/input.pdf /path/to/output.pdf
152
+ ----
153
+
154
+ A list of available options:
155
+
156
+ * `--color 00FF00` - tells Despeck that the watermark is approximately green.
157
+ * `--sensitivity 120` - increases sensitivity (use this if the watermark is still visible with the default of 100).
158
+ * `--black-const -60` - by default, Despeck tries to improve text quality by increasing black by -110. This may be too much for you, so you can reduce that number.
159
+ * `--add-contrast` - disabled by default, increases output image's contrast.
160
+ * `--accurate` - disabled by default. Applies filters to the area with watermark only, preserving the rest of the image untouched.
161
+ * `--debug` - shows debug information during command execution.
162
+
163
+ ==== "Accurate" option
164
+
165
+ By default, `despeck` applies colour filters to the entire image and tries to improve the quality of the image by increasing contrast and cleaning the image.
166
+
167
+ This may degrade the original image in some cases. The `--accurate` option makes `despeck` apply its filters only to the area where the watermark was found, leaving the rest of the image intact.
168
+
169
+ For example:
170
+
171
+ ===== Original image
172
+
173
+ image::readme_images/watermarked.jpg[Original image]
174
+
175
+ ===== Despecked with default options
176
+
177
+ image::readme_images/defaults.jpg[Despecked with defaults]
178
+
179
+ ===== Despecked with --accurate option
180
+
181
+ image::readme_images/accurate.jpg[Despecked with --accurate option]
182
+
183
+ == Usage
184
+
185
+ *(still under development)*
186
+
187
+ [source,ruby]
188
+ ----
189
+ wr = Despeck::WatermarkRemover.new(black_const: -90, resize: 0.01)
190
+ # => #<Despeck::WatermarkRemover:0x007f935b5a1a68 @add_contrast=true, @black_const=-110, @watermark_color=nil, @resize=0.1, @sensitivity=100>
191
+ image = Vips::Image.new_from_file("/path/to/image.jpg")
192
+ # => #<Image 4816x6900 uchar, 3 bands, srgb>
193
+ output_image = wr.remove_watermark(image)
194
+ # => #<Image 4816x6900 float, 3 bands, b-w>
195
+ output_image.write_to_file('/path/to/output.jpg')
196
+ ----
@@ -1,6 +1,5 @@
1
- # frozen_string_literal: true
2
-
3
1
  #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
4
3
 
5
4
  require 'bundler/setup'
6
5
  require 'despeck'
@@ -21,22 +21,20 @@ Gem::Specification.new do |spec|
21
21
  spec.files = `git ls-files -z`.split("\x0").reject do |f|
22
22
  f.match(%r{^(test|spec|features)/})
23
23
  end
24
- spec.bindir = 'bin'
25
- spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
24
+ spec.bindir = 'exe'
25
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
26
26
  spec.require_paths = ['lib']
27
-
28
27
  spec.required_ruby_version = '>= 2.3'
29
28
 
30
- spec.add_dependency 'clamp', '~> 1.2'
29
+ spec.add_dependency 'clamp', '~> 1.3'
31
30
  spec.add_dependency 'pdf-reader', '~> 2.1'
32
- spec.add_dependency 'prawn', '~> 2.2'
33
- spec.add_dependency 'rmagick', '~> 2'
34
- spec.add_dependency 'rtesseract', '~> 2.2'
31
+ spec.add_dependency 'prawn', '~> 2.3'
32
+ spec.add_dependency 'rmagick', '~> 4.0'
33
+ spec.add_dependency 'rtesseract', '~> 3.1'
35
34
  spec.add_dependency 'ruby-vips', '~> 2.0'
36
35
 
37
- spec.add_development_dependency 'bundler', '~> 1.16'
38
- spec.add_development_dependency 'pry'
39
- spec.add_development_dependency 'rake', '~> 10.0'
36
+ spec.add_development_dependency 'bundler'
37
+ spec.add_development_dependency 'rake', '~> 13.0'
40
38
  spec.add_development_dependency 'rspec', '~> 3.0'
41
- spec.add_development_dependency 'rubocop', '~> 0.52'
39
+ spec.add_development_dependency 'rubocop', '~> 0.90.0'
42
40
  end
@@ -1,8 +1,8 @@
1
- # frozen_string_literal: true
2
-
3
1
  #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
4
3
 
5
4
  require 'bundler/setup'
6
5
  require 'despeck'
7
-
6
+ require 'despeck/installation_post_message'
7
+ Despeck::InstallationPostMessage.build
8
8
  Despeck::CLI.run
@@ -1,7 +1,7 @@
1
1
  #!/bin/bash
2
2
 
3
3
  vips_site=https://github.com/jcupitt/libvips/releases/download
4
- version=$VIPS_VERSION_MAJOR.$VIPS_VERSION_MINOR.$VIPS_VERSION_MICRO
4
+ version=$LIBVIPS_VERSION
5
5
 
6
6
  set -e
7
7
 
@@ -9,18 +9,23 @@ set -e
9
9
  # we could check the configure params as well I guess
10
10
  if [ -d "$HOME/vips/bin" ]; then
11
11
  installed_version=$($HOME/vips/bin/vips --version)
12
- escaped_version="$VIPS_VERSION_MAJOR\.$VIPS_VERSION_MINOR\.$VIPS_VERSION_MICRO"
13
12
  echo "Need vips-$version"
14
13
  echo "Found $installed_version"
15
- if [[ "$installed_version" =~ ^vips-$escaped_version ]]; then
14
+ if [[ "$installed_version" =~ ^vips-$version ]]; then
16
15
  echo "Using cached directory"
17
16
  exit 0
18
17
  fi
19
18
  fi
20
19
 
21
20
  rm -rf $HOME/vips
22
- wget $vips_site/v$version/vips-$version.tar.gz
21
+ echo 'Downloading libvips source'
22
+ wget $vips_site/v$version/vips-$version.tar.gz >/dev/null 2>&1
23
+ echo 'Extracting'
23
24
  tar xf vips-$version.tar.gz
24
25
  cd vips-$version
25
- CXXFLAGS=-D_GLIBCXX_USE_CXX11_ABI=0 ./configure --prefix=$HOME/vips $*
26
- make && make install
26
+ echo 'Configuring'
27
+ CXXFLAGS=-D_GLIBCXX_USE_CXX11_ABI=0 ./configure --prefix=$HOME/vips $* >/dev/null 2>&1
28
+ echo 'Make'
29
+ make >/dev/null 2>&1
30
+ echo 'Install'
31
+ make install >/dev/null 2>&1
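In the cached-version test above, the version is interpolated into a regular expression as-is, so every `.` in it matches any character. The script itself is bash; the idea of escaping the version first is sketched here in Ruby, the gem's own language. The helper name `cached_vips_matches?` and the sample version strings are assumptions made for illustration, not part of despeck.

```ruby
# Hypothetical helper (not part of despeck): does the output of
# `vips --version` report exactly the wanted release?
def cached_vips_matches?(installed_version, wanted_version)
  # Regexp.escape turns "8.6.5" into "8\.6\.5", so each dot matches only a dot.
  # The trailing (?!\d) keeps "8.6.5" from also matching "8.6.50".
  pattern = /\Avips-#{Regexp.escape(wanted_version)}(?!\d)/
  pattern.match?(installed_version)
end

puts cached_vips_matches?('vips-8.6.5-Sat Sep 12 10:00:00 UTC 2020', '8.6.5')
puts cached_vips_matches?('vips-8x6x5', '8.6.5')
```

With an unescaped pattern, the second string would be accepted as a match for 8.6.5.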
@@ -0,0 +1,23 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Despeck
4
+ module Commands
5
+ # Subcommand that removes watermarks & returns OCR text
6
+ class DespeckAndOcr < Clamp::Command
7
+ parameter 'input_file', 'Input file - either PDF or image',
8
+ attribute_name: :input_file
9
+ option ['-l', '--lang'],
10
+ 'LANGUAGE',
11
+ 'One of supported Tesseract languages (`eng` by default)',
12
+ default: :eng
13
+
14
+ def execute
15
+ extension = File.extname(input_file)
16
+ temp_image = Tempfile.new(['despecked', extension])
17
+ `bundle exec despeck remove #{input_file} #{temp_image.path}`
18
+ input_image = temp_image.size.zero? ? input_file : temp_image.path
19
+ puts `bundle exec despeck ocr -l #{lang} #{input_image}`
20
+ end
21
+ end
22
+ end
23
+ end
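The `execute` method above interpolates file paths directly into backtick commands, so a path containing spaces or shell metacharacters would be split or interpreted by the shell. A minimal sketch of one way to quote the arguments, using `Shellwords` from the standard library. The helper name `despeck_command` is hypothetical and not part of despeck.

```ruby
require 'shellwords'

# Hypothetical helper (not part of despeck): build a shell-safe command line
# for one of the despeck subcommands.
def despeck_command(subcommand, *args)
  # Shellwords.join escapes every element, so spaces and characters such as
  # ";" reach the shell as literal text.
  Shellwords.join(['bundle', 'exec', 'despeck', subcommand, *args.map(&:to_s)])
end

puts despeck_command('remove', 'my scan.jpg', '/tmp/out.jpg')
puts despeck_command('ocr', '-l', :chi_sim, 'my scan.jpg')
```

The resulting string can be passed to backticks in place of the hand-built one. Calling the Ruby classes directly instead of spawning a second process would avoid quoting altogether, but that is a larger change than this sketch covers.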
@@ -42,11 +42,11 @@ module Despeck
42
42
  Despeck.apply_logger_level(debug?)
43
43
 
44
44
  if input_file.end_with?('.pdf')
45
- images =
46
- PdfTools.pdf_to_images(input_file).map do |image|
47
- remove_watermark_from_image(image, nil)
48
- end
49
- PdfTools.images_to_pdf(images, output_file)
45
+ origin_images = PdfTools.pdf_to_images(input_file)
46
+ images = origin_images.map do |image|
47
+ remove_watermark_from_image(image, nil)
48
+ end
49
+ PdfTools.images_to_pdf(images, output_file, origin_images)
50
50
  else
51
51
  remove_watermark_from_image(input_file, output_file)
52
52
  end
@@ -57,10 +57,10 @@ module Despeck
57
57
  def remove_watermark_from_image(input, output)
58
58
  wr =
59
59
  WatermarkRemover.new(
60
- add_contrast: add_contrast?,
61
- accurate: accurate?,
62
- black_const: black_const,
63
- sensitivity: sensitivity,
60
+ add_contrast: add_contrast?,
61
+ accurate: accurate?,
62
+ black_const: black_const,
63
+ sensitivity: sensitivity,
64
64
  watermark_color: color
65
65
  )
66
66
 
@@ -4,11 +4,11 @@ require 'clamp'
4
4
  require 'benchmark'
5
5
  require 'pdf-reader'
6
6
  require 'prawn'
7
- require 'pry'
8
7
  require 'vips'
9
8
  require 'rmagick'
10
9
  require 'rtesseract'
11
10
 
11
+ require_relative 'commands/despeck_and_ocr'
12
12
  require_relative 'commands/remove'
13
13
  require_relative 'commands/ocr'
14
14
 
@@ -8,6 +8,11 @@ module Despeck
8
8
  exit(0)
9
9
  end
10
10
 
11
+ subcommand(
12
+ 'despeck',
13
+ 'Extract text from the despecked image or pdf',
14
+ Despeck::Commands::DespeckAndOcr
15
+ )
11
16
  subcommand('remove', 'Remove watermark', Despeck::Commands::Remove)
12
17
  subcommand('ocr', 'Extract text from the image', Despeck::Commands::Ocr)
13
18
  end
@@ -0,0 +1,77 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Despeck
4
+ # Print notes from the installation
5
+ module InstallationPostMessage
6
+ module_function
7
+
8
+ def build
9
+ if %w[1 true TRUE].include?(
10
+ ENV.fetch('DESPECK_SKIP_INSTALL_NOTES', false)
11
+ )
12
+ return
13
+ end
14
+
15
+ require 'vips'
16
+ return print_notes unless vips_check_passed?
17
+ end
18
+
19
+ def vips_support_pdf?
20
+ begin
21
+ Vips::Image.pdfload
22
+ rescue Vips::Error => e
23
+ if e.message =~ /class "pdfload" not found/
24
+ notes << <<~DOC
25
+ - Libvips was installed without PDF support. Make sure you
26
+ have PDFium/poppler-glib installed before installing
27
+ despeck. For detailed installation instructions, see
28
+ https://libvips.github.io/libvips/install.html
29
+ DOC
30
+ return false
31
+ end
32
+ end
33
+ true
34
+ end
35
+
36
+ def vips_version_supported?
37
+ version_only = Vips.version_string.match(/(\d+\.\d+\.\d+)/)[0]
38
+ return true if Gem::Version.new(version_only) >= Gem::Version.new('8.6.5')
39
+
40
+ notes << <<~DOC
41
+ - Your libvips version should be at least 8.6.5.
42
+ Please rebuild/reinstall libvips at version >= 8.6.5.
43
+ DOC
44
+ false
45
+ end
46
+
47
+ def vips_check_passed?
48
+ passed = true
49
+ passed = false unless vips_version_supported?
50
+ passed = false unless vips_support_pdf?
51
+ passed
52
+ end
53
+
54
+ def notes
55
+ @notes ||= []
56
+ end
57
+
58
+ def print_notes
59
+ return if notes.empty?
60
+
61
+ puts <<~NOTES
62
+ #{hr '='}
63
+ Despeck Installation Notes :
64
+ #{hr '-'}
65
+ #{notes.uniq.join("\n")}
66
+ To skip these notes, run `export DESPECK_SKIP_INSTALL_NOTES=1`
67
+ #{hr '='}
68
+ NOTES
69
+ @error_message = []
70
+ false
71
+ end
72
+
73
+ def hr(line = '-')
74
+ (line * 50)
75
+ end
76
+ end
77
+ end
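Comparing dotted version numbers as plain strings goes wrong once a component reaches two digits: in Ruby, `'8.10.0' > '8.6.5'` is false. A minimal sketch of a minimum-version check along the lines of `vips_version_supported?`, using `Gem::Version` for the comparison. The helper name `libvips_supported?` and the constant are hypothetical, and the exact format of `Vips.version_string` is an assumption here.

```ruby
require 'rubygems' # Gem::Version ships with Ruby; usually already loaded

# Minimum release taken from the installation note in the hunk above.
MINIMUM_LIBVIPS = Gem::Version.new('8.6.5')

# Hypothetical helper (not part of despeck): accepts strings such as
# "8.10.0" or "vips-8.7.4-..." and compares the numeric part only.
def libvips_supported?(version_string)
  version_only = version_string[/\d+\.\d+\.\d+/]
  return false unless version_only

  # Gem::Version compares component by component, so 8.10.0 > 8.6.5,
  # and >= also accepts the minimum release itself.
  Gem::Version.new(version_only) >= MINIMUM_LIBVIPS
end

puts libvips_supported?('8.10.0')
puts libvips_supported?('8.5.7')
```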
@@ -3,14 +3,36 @@
3
3
  module Despeck
4
4
  # Extracts text of desired language from the image
5
5
  class Ocr
6
- attr_reader :lang, :image_path
6
+ attr_reader :lang, :source_path
7
7
 
8
- def initialize(image)
9
- @image_path = image
8
+ def initialize(path)
9
+ @source_path = path
10
10
  end
11
11
 
12
12
  def text(lang: :eng)
13
- RTesseract.new(image_path, lang: lang).to_s
13
+ if source_path.end_with?('.pdf')
14
+ res = ''
15
+ for_each_page_image do |path|
16
+ res += RTesseract.new(path, lang: lang).to_s
17
+ end
18
+ res
19
+ else
20
+ RTesseract.new(source_path, lang: lang).to_s
21
+ end
22
+ end
23
+
24
+ private
25
+
26
+ def for_each_page_image
27
+ paths = []
28
+ Despeck::PdfTools
29
+ .pdf_to_images(source_path).each do |pic|
30
+ tempfile = Tempfile.new(['despeck_page', '.jpg'])
31
+ pic.write_to_file(tempfile.path)
32
+ yield tempfile.path
33
+ end
34
+
35
+ paths
14
36
  end
15
37
  end
16
38
  end
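The per-page loop above writes every PDF page to its own temporary file before handing the path to the OCR step. A minimal sketch of the same pattern with the block form of `Tempfile.create`, which deletes each file as soon as its page has been processed. `FakePage`, `each_page_path` and `ocr_all` are hypothetical stand-ins so the example runs without libvips or Tesseract; a real caller would pass `Vips::Image` objects and run OCR where the sketch reads the file back.

```ruby
require 'tempfile'

# Hypothetical stand-in for the per-page loop: `pages` is any list of objects
# that respond to #write_to_file(path).
def each_page_path(pages)
  pages.each do |page|
    # The block form of Tempfile.create removes the file when the block exits.
    Tempfile.create(['despeck_page', '.jpg']) do |file|
      page.write_to_file(file.path)
      yield file.path
    end
  end
end

# Stub page so the sketch runs without libvips.
FakePage = Struct.new(:bytes) do
  def write_to_file(path)
    File.binwrite(path, bytes)
  end
end

# Collect one result per page and join them once at the end.
def ocr_all(pages)
  texts = []
  each_page_path(pages) { |path| texts << File.binread(path) } # real code would run OCR here
  texts.join("\n")
end

puts ocr_all([FakePage.new('page one'), FakePage.new('page two')])
```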
@@ -1,4 +1,5 @@
1
1
  # frozen_string_literal: true
2
+ require 'tempfile'
2
3
 
3
4
  module Despeck
4
5
  # Read/Write PDF files
@@ -15,16 +16,16 @@ module Despeck
15
16
  images
16
17
  end
17
18
 
18
- def images_to_pdf(images, pdf_path)
19
+ def images_to_pdf(images, pdf_path, origin_images = [])
19
20
  doc = nil
20
21
 
21
- for_each_image_file(images) do |path, page_size, pic_size, layout|
22
+ for_each_image_file(images,
23
+ origin_images) do |path, pg_size, pic_size, layout|
22
24
  if doc
23
- doc.start_new_page(size: page_size, layout: layout)
25
+ doc.start_new_page(size: pg_size, layout: layout)
24
26
  else
25
- doc = Prawn::Document.new(page_size: page_size, page_layout: layout)
27
+ doc = Prawn::Document.new(page_size: pg_size, page_layout: layout)
26
28
  end
27
-
28
29
  doc.image(path, position: :left, vposition: :top, fit: pic_size)
29
30
  end
30
31
 
@@ -43,10 +44,11 @@ module Despeck
43
44
 
44
45
  private
45
46
 
46
- def for_each_image_file(images)
47
- images.each do |pic|
47
+ def for_each_image_file(images, origin_images)
48
+ images.each_with_index do |picture, i|
48
49
  tempfile = Tempfile.new(['despeck', '.jpg'])
49
- pic.write_to_file(tempfile.path)
50
+ pic = picture || origin_images[i]
51
+ pic.write_to_file(tempfile.path)
50
52
 
51
53
  page_size = pdf_size(pic)
52
54
  layout = page_size.max == page_size.first ? :landscape : :portrait
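The `picture || origin_images[i]` fallback in this hunk keeps a page in the output PDF even when watermark removal yields nothing for it. A minimal sketch of that selection step in isolation, assuming that a failed page is represented by `nil`. The helper name `pages_with_fallback` is hypothetical, and symbols stand in for images.

```ruby
# Hypothetical helper (not part of despeck): keep each processed page, or the
# original page at the same index when processing produced nil.
def pages_with_fallback(processed, originals)
  processed.each_with_index.map { |picture, i| picture || originals[i] }
end

p pages_with_fallback([:clean_1, nil, :clean_3], [:orig_1, :orig_2, :orig_3])
```

Every later step (writing the temp file, measuring the page size) should then work on the selected page rather than on the possibly-nil processed one.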
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Despeck
4
- VERSION = '0.3.0'
4
+ VERSION = '0.4.2'
5
5
  end
@@ -50,9 +50,9 @@ module Despeck
50
50
 
51
51
  no_watermark = no_watermark.colourspace('b-w').bandjoin(mask.invert)
52
52
 
53
- output_image
54
- .bandjoin(mask)
55
- .composite(no_watermark, 'over')
53
+ output_image = output_image.colourspace('srgb') if output_image.bands < 3
54
+ output_image = output_image.bandjoin(mask) if output_image.bands == 3
55
+ output_image.composite(no_watermark, 'over')
56
56
  end
57
57
 
58
58
  def __remove_watermark__(image)
@@ -86,7 +86,7 @@ module Despeck
86
86
  def grayscale_algorithm(image, pr_color)
87
87
  rgb_params = greyscale_params(pr_color)
88
88
  rgb_params << 0 if image.bands == 4
89
- image.recomb(rgb_params)
89
+ image.recomb([rgb_params])
90
90
  end
91
91
 
92
92
  def greyscale_params(pr_color)
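The change from `recomb(rgb_params)` to `recomb([rgb_params])` passes the weights as a one-row matrix. Conceptually, a recombination multiplies each pixel's band vector by a matrix, and a single row produces a single output band. A plain-Ruby sketch of that arithmetic for one pixel follows; the weights are illustrative values, not the ones despeck computes in `greyscale_params`, and `recombine` is a hypothetical helper rather than a libvips call.

```ruby
# Hypothetical helper: apply a recombination matrix to one pixel.
# Each matrix row yields one output band as a weighted sum of the input bands.
def recombine(pixel, matrix)
  matrix.map do |row|
    row.zip(pixel).sum { |weight, value| weight * value }
  end
end

# One row of three weights turns an RGB pixel into a single grey value.
p recombine([100, 200, 40], [[0.5, 0.25, 0.25]])
```

A four-band image needs a fourth weight in the row, which is what the `rgb_params << 0` line in the hunk provides.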
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: despeck
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.4.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Ribose Inc.
8
8
  autorequire:
9
- bindir: bin
9
+ bindir: exe
10
10
  cert_chain: []
11
- date: 2018-03-22 00:00:00.000000000 Z
11
+ date: 2020-09-15 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: clamp
@@ -16,14 +16,14 @@ dependencies:
16
16
  requirements:
17
17
  - - "~>"
18
18
  - !ruby/object:Gem::Version
19
- version: '1.2'
19
+ version: '1.3'
20
20
  type: :runtime
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
24
  - - "~>"
25
25
  - !ruby/object:Gem::Version
26
- version: '1.2'
26
+ version: '1.3'
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: pdf-reader
29
29
  requirement: !ruby/object:Gem::Requirement
@@ -44,42 +44,42 @@ dependencies:
44
44
  requirements:
45
45
  - - "~>"
46
46
  - !ruby/object:Gem::Version
47
- version: '2.2'
47
+ version: '2.3'
48
48
  type: :runtime
49
49
  prerelease: false
50
50
  version_requirements: !ruby/object:Gem::Requirement
51
51
  requirements:
52
52
  - - "~>"
53
53
  - !ruby/object:Gem::Version
54
- version: '2.2'
54
+ version: '2.3'
55
55
  - !ruby/object:Gem::Dependency
56
56
  name: rmagick
57
57
  requirement: !ruby/object:Gem::Requirement
58
58
  requirements:
59
59
  - - "~>"
60
60
  - !ruby/object:Gem::Version
61
- version: '2'
61
+ version: '4.0'
62
62
  type: :runtime
63
63
  prerelease: false
64
64
  version_requirements: !ruby/object:Gem::Requirement
65
65
  requirements:
66
66
  - - "~>"
67
67
  - !ruby/object:Gem::Version
68
- version: '2'
68
+ version: '4.0'
69
69
  - !ruby/object:Gem::Dependency
70
70
  name: rtesseract
71
71
  requirement: !ruby/object:Gem::Requirement
72
72
  requirements:
73
73
  - - "~>"
74
74
  - !ruby/object:Gem::Version
75
- version: '2.2'
75
+ version: '3.1'
76
76
  type: :runtime
77
77
  prerelease: false
78
78
  version_requirements: !ruby/object:Gem::Requirement
79
79
  requirements:
80
80
  - - "~>"
81
81
  - !ruby/object:Gem::Version
82
- version: '2.2'
82
+ version: '3.1'
83
83
  - !ruby/object:Gem::Dependency
84
84
  name: ruby-vips
85
85
  requirement: !ruby/object:Gem::Requirement
@@ -96,20 +96,6 @@ dependencies:
96
96
  version: '2.0'
97
97
  - !ruby/object:Gem::Dependency
98
98
  name: bundler
99
- requirement: !ruby/object:Gem::Requirement
100
- requirements:
101
- - - "~>"
102
- - !ruby/object:Gem::Version
103
- version: '1.16'
104
- type: :development
105
- prerelease: false
106
- version_requirements: !ruby/object:Gem::Requirement
107
- requirements:
108
- - - "~>"
109
- - !ruby/object:Gem::Version
110
- version: '1.16'
111
- - !ruby/object:Gem::Dependency
112
- name: pry
113
99
  requirement: !ruby/object:Gem::Requirement
114
100
  requirements:
115
101
  - - ">="
@@ -128,14 +114,14 @@ dependencies:
128
114
  requirements:
129
115
  - - "~>"
130
116
  - !ruby/object:Gem::Version
131
- version: '10.0'
117
+ version: '13.0'
132
118
  type: :development
133
119
  prerelease: false
134
120
  version_requirements: !ruby/object:Gem::Requirement
135
121
  requirements:
136
122
  - - "~>"
137
123
  - !ruby/object:Gem::Version
138
- version: '10.0'
124
+ version: '13.0'
139
125
  - !ruby/object:Gem::Dependency
140
126
  name: rspec
141
127
  requirement: !ruby/object:Gem::Requirement
@@ -156,40 +142,37 @@ dependencies:
156
142
  requirements:
157
143
  - - "~>"
158
144
  - !ruby/object:Gem::Version
159
- version: '0.52'
145
+ version: 0.90.0
160
146
  type: :development
161
147
  prerelease: false
162
148
  version_requirements: !ruby/object:Gem::Requirement
163
149
  requirements:
164
150
  - - "~>"
165
151
  - !ruby/object:Gem::Version
166
- version: '0.52'
152
+ version: 0.90.0
167
153
  description: Removes stamps and watermarks from scanned images for OCR, 'removes specks'
168
154
  email:
169
155
  - open.source@ribose.com
170
156
  executables:
171
- - console
172
157
  - despeck
173
- - setup
174
158
  extensions: []
175
159
  extra_rdoc_files: []
176
160
  files:
161
+ - ".github/workflows/ubuntu.yml"
177
162
  - ".gitignore"
178
163
  - ".rspec"
179
164
  - ".rubocop.yml"
180
- - ".ruby-version"
181
- - ".travis.yml"
182
165
  - CODE_OF_CONDUCT.md
183
166
  - Gemfile
184
- - OCR.md
185
- - README.md
167
+ - README.adoc
186
168
  - ROADMAP.adoc
187
169
  - Rakefile
188
170
  - bin/console
189
- - bin/despeck
190
171
  - bin/setup
191
172
  - despeck.gemspec
173
+ - exe/despeck
192
174
  - install-vips.sh
175
+ - lib/commands/despeck_and_ocr.rb
193
176
  - lib/commands/ocr.rb
194
177
  - lib/commands/remove.rb
195
178
  - lib/despeck.rb
@@ -197,12 +180,16 @@ files:
197
180
  - lib/despeck/colour_checker.rb
198
181
  - lib/despeck/dominant_color.rb
199
182
  - lib/despeck/dominant_color_v2.rb
183
+ - lib/despeck/installation_post_message.rb
200
184
  - lib/despeck/logger.rb
201
185
  - lib/despeck/ocr.rb
202
186
  - lib/despeck/pdf_tools.rb
203
187
  - lib/despeck/version.rb
204
188
  - lib/despeck/watermark_mask.rb
205
189
  - lib/despeck/watermark_remover.rb
190
+ - readme_images/accurate.jpg
191
+ - readme_images/defaults.jpg
192
+ - readme_images/watermarked.jpg
206
193
  - sensitivities.txt
207
194
  homepage: https://github.com/riboseinc/despeck
208
195
  licenses:
@@ -223,8 +210,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
223
210
  - !ruby/object:Gem::Version
224
211
  version: '0'
225
212
  requirements: []
226
- rubyforge_project:
227
- rubygems_version: 2.5.2
213
+ rubygems_version: 3.0.3
228
214
  signing_key:
229
215
  specification_version: 4
230
216
  summary: Removes stamps and watermarks from scanned images for OCR, 'removes specks'
@@ -1 +0,0 @@
1
- 2.3.3
@@ -1,58 +0,0 @@
1
- sudo: false
2
-
3
- env:
4
- global:
5
- - NOKOGIRI_USE_SYSTEM_LIBRARIES=true
6
- - VIPS_VERSION_MAJOR=8
7
- - VIPS_VERSION_MINOR=5
8
- - VIPS_VERSION_MICRO=7
9
- - PATH=$HOME/vips/bin:$PATH
10
- - LD_LIBRARY_PATH=$HOME/vips/lib:$LD_LIBRARY_PATH
11
- - PKG_CONFIG_PATH=$HOME/vips/lib/pkgconfig:$PKG_CONFIG_PATH
12
-
13
- dist: trusty
14
-
15
- addons:
16
- apt:
17
- packages:
18
- - libexpat1-dev
19
- - gettext
20
- - liblcms2-dev
21
- - libmagickwand-dev
22
- - libopenexr-dev
23
- - libcfitsio3-dev
24
- - libgif-dev
25
- - libgs-dev
26
- - libgsf-1-dev
27
- - libmatio-dev
28
- - libopenslide-dev
29
- - liborc-0.4-dev
30
- - libpango1.0-dev
31
- - libpoppler-glib-dev
32
- - librsvg2-dev
33
- - libwebp-dev
34
- # missing on trusty, unfortunately
35
- # - libwebpmux2
36
- - libfftw3-dev
37
- - libglib2.0-dev
38
-
39
- cache:
40
- directories:
41
- - $HOME/vips
42
-
43
- language: ruby
44
- rvm:
45
- - 2.3
46
- - 2.4
47
- - 2.5
48
-
49
- script:
50
- - bundle exec rspec spec
51
- - bundle exec rubocop
52
-
53
- gemfile:
54
- - Gemfile
55
-
56
- before_install:
57
- - uname -a
58
- - bash install-vips.sh --without-python
data/OCR.md DELETED
@@ -1,36 +0,0 @@
1
- # OCR with Despeck
2
-
3
- To make OCR work, you need to install the following tools:
4
-
5
- * Tesseract (version 3.x)
6
- * ImageMagick (version 6.x)
7
-
8
- ## Installation
9
-
10
- ### MacOS
11
-
12
- To install tesseract itself:
13
-
14
- ```sh
15
- $ brew install tesseract --all-languages
16
- $ brew install imagemagick
17
- ```
18
-
19
- Or you can install tesseract with some languages manually:
20
-
21
- ```sh
22
- $ brew install tesseract wget imagemagick
23
- $ mkdir -p ~/Downloads/tessdata
24
- $ cd ~/Downloads/tessdata
25
- $ wget https://github.com/tesseract-ocr/tessdata/raw/3.04.00/chi_sim.traineddata
26
- ```
27
-
28
- The full list of languages trained data can be found here (note, they're different for different Tesseract versions):
29
-
30
- https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-304305
31
-
32
- ### Ubuntu/Debian
33
-
34
- ```sh
35
- $ apt-get install tesseract-ocr tesseract-ocr-chi-sim imagemagick
36
- ```
data/README.md DELETED
@@ -1,96 +0,0 @@
1
- [![Gem Version](https://badge.fury.io/rb/despeck.svg)](https://badge.fury.io/rb/despeck)
2
- [![Build Status](https://travis-ci.org/riboseinc/despeck.svg?branch=master)](https://travis-ci.org/riboseinc/despeck)
3
-
4
- # Despeck
5
-
6
- Remove unwanted stamps or watermarks from scanned images
7
-
8
- `despeck` is a Ruby gem that helps you remove unwanted stamps or watermarks from
9
- scanned images/PDFs, primarily prior to OCR.
10
-
11
- Its image processing operations are based on libvips via the
12
- https://github.com/jcupitt/ruby-vips[ruby-vips] Ruby-bindings.
13
-
14
- It can be used to:
15
-
16
- * detect uniform watermarks from a series of images,
17
- * output a watermark pattern file (image, mask) that describes a watermark pattern, and
18
- * remove a specified watermark pattern from input images regardless of the
19
- location of the watermark on these images.
20
-
21
- Assumptions on input:
22
-
23
- * The input may be a single image, or a PDF of multiple pages of images
24
- * In the case of multiple pages, not all pages may have the watermark
25
- * The input images are assumed to be purely monochrome text-based.
26
- * The watermarks are colored. For example, if the watermark is a GREEN SQUARE PATTERN, for all
27
- the pages that contain this mark, despeck will attempt to detect this pattern
28
- and remove them
29
-
30
- ## Installation
31
-
32
- Install gem manually
33
-
34
- ```
35
- $ gem install despeck
36
- ```
37
-
38
- Or add it to your `Gemfile`
39
-
40
- ```
41
- gem 'despeck'
42
- ```
43
-
44
- and then run `bundle install`
45
-
46
- ## OCR
47
-
48
- To be able to extract text via `despeck ocr` command, you'll need to install:
49
-
50
- * Tesseract (3.x)
51
- * ImageMagick (6.x)
52
- * Desired languages
53
-
54
- Installation instruction can be found here: [OCR tools installation guide](./OCR.md)
55
-
56
- ## Usage (Command Line)
57
-
58
- Getting actual help:
59
-
60
- ```sh
61
- # To show general help
62
- despeck -h
63
- despeck remove -h
64
- ```
65
-
66
- To remove watermark:
67
-
68
- ```sh
69
- $ despeck remove /path/to/input.jpg /path/to/output.jpg
70
- ```
71
-
72
- With the command above, Despeck will try to find the watermark colour, and apply best filter settings to remove the watermark. It may be wrong, so you can pass several parameters to help Despeck with that:
73
-
74
- ```sh
75
- $ despec remove --color 00FF00 --sensitivity 120 --black-const -60 --add-contrast /path/to/input.pdf /path/to/output.pdf
76
- ```
77
-
78
- * `--color 00FF00` - to say watermark is ~ green.
79
- * `--sensitivity 120` - increases sensitivity (if with default 100 watermark is still visible).
80
- * `--black-const -60` - by default, Despeck tries to improve text quality by increasing black by -110. This may be too much for you, so you can reduce that number.
81
- * `--add-contrast` - disabled by default, increases output image's contrast.
82
- * `--accurate` - disabled by default. Applies filters to the area with watermark only, preserving the rest of the image untouched.
83
-
84
- ## Usage
85
-
86
- *(still under development)*
87
-
88
- ```ruby
89
- wr = Despeck::WatermarkRemover.new(black_const: -90, resize: 0.01)
90
- # => #<Despeck::WatermarkRemover:0x007f935b5a1a68 @add_contrast=true, @black_const=-110, @watermark_color=nil, @resize=0.1, @sensitivity=100>
91
- image = Vips::Image.new_from_file("/path/to/image.jpg")
92
- # => #<Image 4816x6900 uchar, 3 bands, srgb>
93
- output_image = wr.remove_watermark(image)
94
- # => #<Image 4816x6900 float, 3 bands, b-w>
95
- output_image.write_to_file('/path/to/output.jpg')
96
- ```