textractor 0.0.2 → 0.0.3
- data/VERSION +1 -1
- data/lib/textractor/document.rb +15 -7
- data/spec/document_spec.rb +22 -7
- data/spec/fixtures/document.docx +0 -0
- data/textractor.gemspec +19 -3
- data/vendor/docx2txt/AUTHORS +1 -0
- data/vendor/docx2txt/BSDmakefile +14 -0
- data/vendor/docx2txt/COPYING +674 -0
- data/vendor/docx2txt/ChangeLog +67 -0
- data/vendor/docx2txt/INSTALL +100 -0
- data/vendor/docx2txt/Makefile +23 -0
- data/vendor/docx2txt/README +109 -0
- data/vendor/docx2txt/ToDo +16 -0
- data/vendor/docx2txt/VERSION +1 -0
- data/vendor/docx2txt/WInstall.bat +218 -0
- data/vendor/docx2txt/docx2txt.bat +206 -0
- data/vendor/docx2txt/docx2txt.config +51 -0
- data/vendor/docx2txt/docx2txt.pl +387 -0
- data/vendor/docx2txt/docx2txt.sh +118 -0
- data/vendor/docx2txt/resume.docx +0 -0
- metadata +20 -4
@@ -0,0 +1,67 @@
+v1.0 : 05/10/2009
+
+New features:
+- Input argument can also be a directory holding the unzipped content of .docx
+  file.
+- Windows wrapper script, and support for using CakeCmd command line unzipper.
+- Configuration file support for easy control over settings.
+- Windows installation script.
+
+Updates:
+- Hyperlink is not displayed if hyperlink and hyperlinked text are same, even
+  though user has enabled hyperlink display.
+- Improved handling of short line justification, capturing many cases that were
+  missed in earlier approach.
+- Path names containing spaces are now handled.
+
+Please refer to the updated documentation for more details.
+
+
+v0.4 : 06/09/2009
+
+New features: [suggestions from "Sergei Kulakov (sergei>AT<dewia>DOT<com)"].
+- user can control display of hyperlink along with linked text.
+- TOC related cleanup. TOC was not addressed so far.
+
+Updates:
+- many new character conversions (check the script code for details).
+- character conversion mappings are now organised in a tabular form.
+- currency characters are converted to respective full currency name.
+- code tweaks to speed up the conversion process.
+
+
+v0.3 : 23/09/2008
+
+New features:
+- center and right justification of text fitting in a line of (adjustable) 80
+  columns.
+- indicating hyperlinked text along with the hyperlink.
+- BSD makefile [Thanks to "Rene Maroufi" (info>AT<maroufi>DOT<net) for giving
+  guest access on an OpenBSD host for it].
+
+Please refer to the release documentation for details.
+- docx2txt.pl invocation has been changed a little,
+- user involvement during installation is reduced.
+- some suggestions on how Windows users can use this tool.
+
+
+v0.2 : 15/08/2008
+
+Docx text extraction can now be done in two ways (check version README for
+further details).
+- docx2txt.sh file.docx
+- docx2txt.pl infile.docx outfile.txt
+
+
+v0.1 : 10/08/2008
+
+Initial Sourceforge release with attempts to handle following features during
+text extraction.
+- horizontal ruler, line breaks, paragraphs separation, tabs
+- naive nested list formatting - assumed 8 level nesting, however if you want
+  to deal with further nesting, play comment-uncomment in perl script. :)
+- capitalisation of text blocks i.e. in document.xml text is stored either as
+  lowercase or in mixed case, but in corresponding text files generated by
+  MSOffice it comes as all caps.
+- character conversions (" ' < & > - ... etc.). Euro character is converted to
+  E, however you can change this behaviour by comment-uncomment in perl script.
@@ -0,0 +1,100 @@
+Non-Windows users, please adjust the following executable paths before proceeding
+with installation.
+
+- #! path for env in docx2txt.sh and docx2txt.pl
+- path for unzip in docx2txt.config
+
+You can skip installing docx2txt.sh and docx2txt.bat wrapper scripts (as
+applicable) during manual installation. These check for overwriting the output
+text file and have slightly restricted usage as compared to core docx2txt.pl
+script. [check README for details]
+
+However if you are using CakeCmd unzipper, docx2txt.bat can be quite handy as
+it internally manages unzipping the .docx files that do not have .zip extension.
+
+
+
+Installation on Linux, Cygwin, BSD and similar systems
+------------------------------------------------------
+
+Type "make" as root to install docx2txt files for all users in /usr/local/bin.
+If you want to install these in some other directory, you can do so via
+
+    make INSTALLDIR=/path/to/desired/directory
+
+BSD users can use either GNU make or BSD make.
+
+You will need make and install utilities installed on your system for
+installation via Makefile.
+
+In case you don't want to use the Makefile for installation, you can follow these
+steps for manual installation.
+
+1. Copy docx2txt.pl, docx2txt.sh and docx2txt.config to the desired directory.
+
+    cp docx2txt.pl docx2txt.sh docx2txt.config /path/to/desired/directory
+
+2. Change the permission of copied files to 755 for docx2txt.pl and docx2txt.sh,
+   and 644 for docx2txt.config .
+
+    chmod a+rX /path/to/desired/directory/docx2txt.*
+
+3. Add the concerned directory to your PATH, if not already in PATH.
+
+    PATH=$PATH:/path/to/desired/directory
+
+
+Installation on Windows
+-----------------------
+
+I. You can install minimal Cygwin packages from http://www.cygwin.com/ to have
+   working bash, cat, env, install, make, perl and unzip utilities and thus
+   create the required Cygwin environment for using this utility.
+
+II. If you do not want to install even minimal Cygwin, you can try the following
+    sequence for manual installation.
+
+a. Get the following files from /usr/bin/ of a Cygwin installation and place them in,
+   say C:\docx2txt .
+
+    cygwin1.dll
+    perl.exe
+    cygperl*.dll
+    unzip.exe
+    cygcrypt*.dll
+
+b. Copy docx2txt.pl, docx2txt.bat and docx2txt.config to C:\docx2txt .
+
+c. Change path for unzip in docx2txt.config to C:/docx2txt/unzip.exe and path
+   for perl in docx2txt.bat to C:\docx2txt\perl.exe .
+
+d. You can now use this tool from within C:\docx2txt as follows.
+
+    docx2txt.bat file.docx
+    docx2txt.bat path-to-directory\file.docx
+
+    perl docx2txt.pl file.docx
+    perl docx2txt.pl directory\file.docx -
+    perl docx2txt.pl directory/file.docx file.txt
+    perl docx2txt.pl C:/somedir/file.docx
+    perl docx2txt.pl C:\somedir\file.docx C:\otherdir\converted.txt
+
+III. You can also install this utility via WInstall.bat and follow the
+     instructions during installation. WInstall.bat can be invoked in two ways.
+
+    WInstall.bat installation-folder-name
+    WInstall.bat
+
+In the second case, the install script will ask the user for the installation folder name.
+
+It is advisable to have working installations of perl and at least one command
+line unzipper (Unzip/CakeCmd) before running this install script, so that it
+can automatically set the desired paths in installed files.
+
+You can use
+
+- Cygwin perl or Strawberry perl [http://strawberryperl.com/] or any other
+  Windows native perl implementation
+- Cygwin unzip or UnZip for Windows [http://gnuwin32.sourceforge.net/downlinks/unzip.php]
+- CakeCmd unzipper [http://www.quickzip.org/cakecmd.html]
+
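The manual installation steps in the INSTALL text above (copy the three files, set 755 on the two scripts and 644 on the config) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `manual_install` and its parameters are hypothetical and are not part of docx2txt, and the sketch does not touch PATH.

```python
import shutil
from pathlib import Path

# File names and modes are taken from the INSTALL text; everything else
# (function name, parameters, return value) is a hypothetical illustration.
SCRIPT_FILES = ("docx2txt.pl", "docx2txt.sh")
CONFIG_FILE = "docx2txt.config"


def manual_install(source_dir, install_dir):
    """Copy the docx2txt files and apply the modes named in INSTALL."""
    src = Path(source_dir)
    dest = Path(install_dir)
    dest.mkdir(parents=True, exist_ok=True)

    installed = []
    for name in (*SCRIPT_FILES, CONFIG_FILE):
        target = dest / name
        shutil.copyfile(src / name, target)
        # Scripts must be executable (755); the config only readable (644).
        target.chmod(0o755 if name in SCRIPT_FILES else 0o644)
        installed.append(target)
    return installed
```

Setting the modes explicitly, rather than relying on `chmod a+rX`, makes the result independent of the modes the source files happened to have.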
@@ -0,0 +1,23 @@
+#
+# Makefile for docx2txt
+#
+
+INSTALLDIR ?= /usr/local/bin
+
+INSTALL = $(shell which install 2>/dev/null)
+ifeq ($(INSTALL),)
+$(error "Need 'install' to install docx2txt")
+endif
+
+PERL = $(shell which perl 2>/dev/null)
+ifeq ($(PERL),)
+$(warning "*** Make sure 'perl' is installed and is in your PATH, before running the installed script. ***")
+endif
+
+Dx2TFILES = docx2txt.sh docx2txt.pl docx2txt.config
+
+install: $(Dx2TFILES)
+	[ -d $(INSTALLDIR) ] || mkdir -p $(INSTALLDIR)
+	$(INSTALL) -m 755 $^ $(INSTALLDIR)
+
+.PHONY: install
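The prerequisite checks at the top of the Makefile above (a hard error when `install` is missing, only a warning when `perl` is missing) can be expressed outside of make as well. The following Python sketch mirrors that logic; the function name and the injectable `which` parameter are assumptions made for illustration and testing, not anything shipped with docx2txt.

```python
import shutil
import warnings


def check_install_prerequisites(which=shutil.which):
    """Mirror the Makefile checks: 'install' is required, 'perl' is advised.

    `which` is injectable so the check can be exercised without touching
    the real PATH (a hypothetical convenience, not part of docx2txt).
    """
    install_path = which("install")
    if install_path is None:
        # Corresponds to $(error ...) in the Makefile: stop immediately.
        raise RuntimeError("Need 'install' to install docx2txt")

    if which("perl") is None:
        # Corresponds to $(warning ...): installation may proceed, but the
        # installed script will not run until perl is on PATH.
        warnings.warn(
            "Make sure 'perl' is installed and is in your PATH, "
            "before running the installed script."
        )
    return install_path
```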
@@ -0,0 +1,109 @@
+docx2txt (http://docx2txt.sourceforge.net/) is a simple tool to generate
+equivalent text files from Microsoft .docx documents, with an attempt towards
+preserving sufficient formatting and document information, and appropriate
+character conversions for a good text experience.
+
+You need at least perl installed on your system to use this tool.
+
+
+How to Use
+----------
+
+You can do the text conversion in different ways depending upon your usage
+environment.
+
+1. Using docx2txt.sh :
+
+    docx2txt.sh file.docx
+        OR
+    docx2txt.sh file
+
+   In both these cases output text will be saved in file.txt .
+
+2. Using docx2txt.pl :
+
+   a. docx2txt.pl infile.docx outfile.txt
+
+      Use - as the name of output text file, to send extracted text to the
+      stdout/terminal.
+
+   b. docx2txt.pl file.docx
+          OR
+      docx2txt.pl file
+
+      In both these cases output text will be saved in file.txt .
+
+3. Using docx2txt.bat :
+
+    docx2txt.bat file.docx
+        OR
+    docx2txt.bat file
+
+   In both these cases output text will be saved in file.txt .
+
+Input argument in all the above cases can also be a directory holding the
+unzipped content of a .docx file. This feature is particularly useful if you do
+not have a commandline unzipping tool like Unzip/CakeCmd installed on your
+system.
+
+
+Tune your Experience
+--------------------
+
+You can change these settings via docx2txt.config file located either in current
+directory or in same location as the docx2txt.pl script.
+
+- path to unzip program
+- newline in output text file (Unix/Dos way)
+- list level indentation amount
+- line width (used for short line justification)
+- showing of hyperlink along with linked text
+
+Settings take precedence in the order - docx2txt.config file in current folder,
+docx2txt.config file in same location as docx2txt.pl script, defaults hardcoded
+in docx2txt.pl script.
+
+You can also adjust list element indicator characters for different levels, in
+docx2txt.pl to suit your formatting taste. Currently 8 level list nesting is
+assumed, however if you want to deal with deeper nesting, you can adjust that
+as well in the perl script, by following the related comments there.
+
+
+Note for MC (Midnight Commander) fans
+-------------------------------------
+
+You can add the following binding in ~/.mc/bindings and view the text content of
+.docx file by hitting F3 key (assuming default key mappings) after moving the
+cursor over concerned filename in mc panel.
+
+# Microsoft .docx Document
+regex/\.(docx|DOCX|Docx)$
+	View=%view{ascii} docx2txt.pl %f -
+
+
+Request
+-------
+
+If you are using this work directly/indirectly for non-personal purpose(s),
+please inform the author about it along with relevant url(s), so that it can be
+mentioned on the project homepage.
+
+In case you come across some issue with it, or need a feature that can be
+handled in docx to text conversion, please feel free to communicate. An
+accompanying test .docx document depicting the issue/need and the corresponding
+text file generated by MSOffice with character substitution enabled (or as you
+would like the text file to be) will be helpful.
+
+You can track the project via http://sourceforge.net/projects/docx2txt and refer
+to project cvs if there have been changes since this release.
+
+
+Disclaimer
+----------
+
+This program includes no warranty whatsoever. It is provided "AS IS". For more
+information please read the COPYING document, which should be included with the
+package, and describes the GNU General Public License, which covers docx2txt.
+
+Sandeep Kumar ( shimple0 -AT- yahoo .DOT. com )
+
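The README above states that `file.docx` and `file` both produce `file.txt`, and that `-` as the output name sends the text to stdout. A minimal Python sketch of that naming rule follows. The function names are hypothetical, and two details are assumptions rather than documented behaviour: the `.docx` suffix is matched case-insensitively, and stdout is represented by `None`.

```python
from pathlib import Path


def default_output_path(input_arg):
    """Derive the default text file name described in the README.

    'file.docx' and 'file' both map to 'file.txt'. Case-insensitive
    matching of the suffix is an assumption of this sketch.
    """
    path = Path(input_arg)
    if path.suffix.lower() == ".docx":
        return path.with_suffix(".txt")
    return path.with_name(path.name + ".txt")


def resolve_output(input_arg, output_arg=None):
    """Return the output path, or None when text should go to stdout."""
    if output_arg == "-":
        return None  # README: '-' sends extracted text to stdout
    if output_arg:
        return Path(output_arg)
    return default_output_path(input_arg)
```

For example, `resolve_output("infile.docx", "outfile.txt")` keeps the explicit name, while `resolve_output("file")` falls back to `file.txt`.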
@@ -0,0 +1,16 @@
+1. Handle lists in better way. [partly worked on, target latest by v2.0]
+
+2. Heuristics based cleanup of damaged document content. [leaving for this
+   release - looking for more test samples, target v1.1]
+
+3. Extract images. Now there has been a user request as well. [target pre v2.0]
+4. Handle footnotes.
+5. Improve table and short line justification handling. Ideally table columns
+   in a single row should be separated by pipe. Short line justification needs
+   to be adjusted to situations when tab occurs in line. A quick look into these
+   issues suggests that logic/code will need to be reorganised to handle these.
+
+6. Create a simple manpage, hopefully after resolving footnote and list issues.
+7. Implement simple state-machine for speedup [partially worked towards it].
+8. XML parsing??? and making things more efficient. When it has matured enough,
+   maybe a C/C++ version should be looked into.
@@ -0,0 +1 @@
+1.0
@@ -0,0 +1,218 @@
+@echo off
+
+:: docx2txt, a command-line utility to convert Docx documents to text format.
+:: Copyright (C) 2008-now Sandeep Kumar
+::
+:: This program is free software; you can redistribute it and/or modify
+:: it under the terms of the GNU General Public License as published by
+:: the Free Software Foundation; either version 3 of the License, or
+:: (at your option) any later version.
+::
+:: This program is distributed in the hope that it will be useful,
+:: but WITHOUT ANY WARRANTY; without even the implied warranty of
+:: MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+:: GNU General Public License for more details.
+::
+:: You should have received a copy of the GNU General Public License
+:: along with this program; if not, write to the Free Software
+:: Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+::
+:: A simple commandline installer for docx2txt on Windows.
+::
+:: Author : Sandeep Kumar (shimple0 -AT- Yahoo .DOT. COM)
+::
+:: ChangeLog :
+::
+::    02/10/2009 - Initial version of command line installation script for
+::                 Windows users. Script will prompt user for perl, unzip and
+::                 cakecmd paths and will update these paths in the installed
+::                 files using perl, if perl path is valid. Else it will simply
+::                 copy the concerned files to the installation folder.
+::
+
+
+::
+:: Ensure that required command extensions are enabled.
+::
+
+setlocal enableextensions
+setlocal enabledelayedexpansion
+
+
+echo.
+echo Welcome to command line installer for docx2txt.
+echo.
+
+
+::
+:: Check if this install script is invoked correctly.
+::
+
+if not "%~2" == "" (
+    echo.
+    echo Usage : "%~0" [WhereToInstall]
+    echo.
+    echo WhereToInstall specifies a folder to install into.
+    echo.
+    echo If destination folder is not specified on command line,
+    echo then it will be asked for during the installation.
+    echo.
+    goto END
+)
+
+
+::
+:: Check if destination folder was specified on command line, else ask for it.
+::
+
+if "%~1" == "" (
+    echo.
+    echo Where should the docx2txt tool be installed? Specify the location
+    echo without surrounding quotes.
+    echo.
+    set /P destdir=Installation Folder :
+    echo.
+) else (
+    set destdir=%~1
+)
+
+if not exist "%destdir%" (
+    echo.
+    echo ** Folder "%destdir%" does not exist. It will be created now.
+    echo.
+    mkdir "%destdir%"
+)
+
+
+::
+:: Check if user specified destdir is a valid folder or not.
+::
+
+pushd "%destdir%" 2>nul
+if ERRORLEVEL 1 (
+    echo.
+    echo ** "%destdir%" does not specify a valid folder name.
+    echo ** Exiting installer.
+    echo.
+    goto END
+) else if ERRORLEVEL 0 (
+    popd
+)
+
+
+echo.
+echo Please specify fully qualified paths to utilities when requested.
+echo Perl.exe is required for docx2txt tool as well as for this installation.
+echo.
+
+set /A attempts=0
+
+:GET_PERL_PATH
+
+set /P PERL=Path to Perl.exe :
+call :CHECK_FILE_EXISTENCE "%PERL%" "perl"
+if ERRORLEVEL 7 (
+    set /A attempts=attempts+1
+    if !attempts! == 3 (
+        echo.
+        echo Continuing with simple installation ....
+        echo.
+        goto SIMPLE_INSTALL
+    ) else (
+        goto GET_PERL_PATH
+    )
+)
+
+
+echo.
+echo.
+echo If you do not have CakeCmd.exe installed, simply press Enter/Return key.
+echo.
+
+set /P CAKECMD=Path to CakeCmd.exe :
+
+
+echo.
+echo.
+echo In case you are using Cygwin Perl.exe, you need to specify Unzip.exe path
+echo using forward slashes i.e. like C:/path/to/unzip.exe .
+echo If you do not have Unzip.exe installed, simply press Enter/Return key.
+echo.
+
+set /P UNZIP=Path to Unzip.exe :
+
+echo.
+echo.
+echo Here is the information you have provided.
+echo.
+echo Installation folder = %destdir%
+echo Perl = %PERL%
+echo CakeCmd = %CAKECMD%
+echo Unzip = %UNZIP%
+echo.
+
+pause
+
+echo.
+echo Installing script files to "%destdir%" ....
+
+copy docx2txt.pl "%destdir%" > nul
+
+if not "%UNZIP%" == "" (
+    %PERL% -e "undef $/; $_ = <>; s/(unzip\s*=>)[^,]*,/$1 '$ARGV[0]',/; print;" docx2txt.config "%UNZIP%" > "%destdir%\docx2txt.config"
+)
+
+if "%CAKECMD%" == "" (
+    %PERL% -e "undef $/; $_ = <>; s/(set PERL=).*?(\r?\n)/$1$ARGV[0]$2/; print;" docx2txt.bat "%PERL%" > "%destdir%\docx2txt.bat"
+) else (
+    %PERL% -e "undef $/; $_ = <>; s/(set PERL=).*?(\r?\n)/$1$ARGV[0]$2/; s/:: (set CAKECMD=).*?(\r?\n)/$1$ARGV[1]$2/; print;" docx2txt.bat "%PERL%" "%CAKECMD%" > "%destdir%\docx2txt.bat"
+)
+
+goto END
+
+
+:SIMPLE_INSTALL
+
+echo Copying script files to "%destdir%" ....
+
+copy docx2txt.bat "%destdir%" > nul
+copy docx2txt.pl "%destdir%" > nul
+copy docx2txt.config "%destdir%" > nul
+
+echo.
+echo Please adjust perl, unzip and cakecmd paths (as needed) in
+echo "%destdir%\docx2txt.bat" and "%destdir%\docx2txt.config"
+echo.
+
+goto END
+
+::
+:: Check whether the argument executable exists.
+::
+
+:CHECK_FILE_EXISTENCE
+
+if not exist "%~1" (
+    echo.
+    echo ** Can not find executable "%~1".
+    echo.
+) else if /I "%~nx1" NEQ "%~2.exe" (
+    echo.
+    echo ** "%~1" does not seem to be an executable file.
+    echo.
+) else exit /B 0
+
+exit /B 7
+
+
+:END
+
+endlocal
+endlocal
+
+set PERL=
+set CAKECMD=
+set UNZIP=
+set FILES=
+set attempts=
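The installer above rewrites the `unzip => ...,` entry of docx2txt.config with a Perl one-liner (`s/(unzip\s*=>)[^,]*,/$1 '$ARGV[0]',/`). The Python sketch below reproduces that single substitution to make its effect easier to see. The function name is hypothetical, and the sample config text in the usage note is an assumed shape inferred from the regular expression, not the real file contents.

```python
import re

# Same pattern as the Perl one-liner in WInstall.bat: keep "unzip =>",
# replace everything up to and including the next comma.
UNZIP_ENTRY = re.compile(r"(unzip\s*=>)[^,]*,")


def set_unzip_path(config_text, unzip_path):
    """Return config_text with the first unzip entry pointing at unzip_path.

    A replacement function is used instead of a template string so that
    backslashes in Windows paths are inserted literally.
    """
    return UNZIP_ENTRY.sub(
        lambda match: f"{match.group(1)} '{unzip_path}',",
        config_text,
        count=1,  # the Perl substitution has no /g flag, so only one match
    )
```

With an assumed input of `"unzip => '/usr/bin/unzip',\n"`, calling `set_unzip_path(text, "C:/docx2txt/unzip.exe")` yields `"unzip => 'C:/docx2txt/unzip.exe',\n"`. Note that the pattern is not anchored, so it matches the first occurrence of `unzip` followed by `=>` anywhere in the text, exactly as the Perl version does.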