textractor 0.0.2 → 0.0.3
- data/VERSION +1 -1
- data/lib/textractor/document.rb +15 -7
- data/spec/document_spec.rb +22 -7
- data/spec/fixtures/document.docx +0 -0
- data/textractor.gemspec +19 -3
- data/vendor/docx2txt/AUTHORS +1 -0
- data/vendor/docx2txt/BSDmakefile +14 -0
- data/vendor/docx2txt/COPYING +674 -0
- data/vendor/docx2txt/ChangeLog +67 -0
- data/vendor/docx2txt/INSTALL +100 -0
- data/vendor/docx2txt/Makefile +23 -0
- data/vendor/docx2txt/README +109 -0
- data/vendor/docx2txt/ToDo +16 -0
- data/vendor/docx2txt/VERSION +1 -0
- data/vendor/docx2txt/WInstall.bat +218 -0
- data/vendor/docx2txt/docx2txt.bat +206 -0
- data/vendor/docx2txt/docx2txt.config +51 -0
- data/vendor/docx2txt/docx2txt.pl +387 -0
- data/vendor/docx2txt/docx2txt.sh +118 -0
- data/vendor/docx2txt/resume.docx +0 -0
- metadata +20 -4
@@ -0,0 +1,67 @@
+v1.0 : 05/10/2009
+
+New features:
+- Input argument can also be a directory holding the unzipped content of a .docx
+  file.
+- Windows wrapper script, and support for using the CakeCmd command line unzipper.
+- Configuration file support for easy control over settings.
+- Windows installation script.
+
+Updates:
+- Hyperlink is not displayed if the hyperlink and hyperlinked text are the same,
+  even though the user has enabled hyperlink display.
+- Improved handling of short line justification, capturing many cases that were
+  missed by the earlier approach.
+- Path names containing spaces are now handled.
+
+Please refer to the updated documentation for more details.
+
+
+v0.4 : 06/09/2009
+
+New features: [suggestions from "Sergei Kulakov (sergei>AT<dewia>DOT<com)"].
+- user can control display of hyperlink along with linked text.
+- TOC related cleanup. TOC was not addressed so far.
+
+Updates:
+- many new character conversions (check the script code for details).
+- character conversion mappings are now organised in a tabular form.
+- currency characters are converted to their respective full currency names.
+- code tweaks to speed up the conversion process.
+
+
+v0.3 : 23/09/2008
+
+New features:
+- center and right justification of text fitting in a line of (adjustable) 80
+  columns.
+- indicating hyperlinked text along with the hyperlink.
+- BSD makefile [Thanks to "Rene Maroufi" (info>AT<maroufi>DOT<net) for giving
+  guest access on an OpenBSD host for it].
+
+Please refer to the release documentation for details.
+- docx2txt.pl invocation has been changed a little.
+- user involvement during installation is reduced.
+- some suggestions on how Windows users can use this tool.
+
+
+v0.2 : 15/08/2008
+
+Docx text extraction can now be done in two ways (check the version README for
+further details).
+- docx2txt.sh file.docx
+- docx2txt.pl infile.docx outfile.txt
+
+
+v0.1 : 10/08/2008
+
+Initial Sourceforge release with attempts to handle the following features
+during text extraction.
+- horizontal ruler, line breaks, paragraph separation, tabs
+- naive nested list formatting - assumed 8 level nesting, however if you want
+  to deal with further nesting, play comment-uncomment in perl script. :)
+- capitalisation of text blocks i.e. in document.xml text is stored either as
+  lowercase or in mixed case, but in corresponding text files generated by
+  MSOffice it comes as all caps.
+- character conversions (" ' < & > - ... etc.). Euro character is converted to
+  E, however you can change this behaviour by comment-uncomment in perl script.
@@ -0,0 +1,100 @@
+Non-Windows users, please adjust the following executable paths before
+proceeding with installation.
+
+- #! path for env in docx2txt.sh and docx2txt.pl
+- path for unzip in docx2txt.config
+
+You can skip installing the docx2txt.sh and docx2txt.bat wrapper scripts (as
+applicable) during manual installation. These check for overwriting the output
+text file and have slightly restricted usage compared to the core docx2txt.pl
+script. [check README for details]
+
+However, if you are using the CakeCmd unzipper, docx2txt.bat can be quite handy
+as it internally manages unzipping .docx files that do not have a .zip extension.
+
+
+
+Installation on Linux, Cygwin, BSD and similar systems
+------------------------------------------------------
+
+Type "make" as root to install docx2txt files for all users in /usr/local/bin.
+If you want to install them in some other directory, you can do so via
+
+    make INSTALLDIR=/path/to/desired/directory
+
+BSD users can use either GNU make or BSD make.
+
+You will need the make and install utilities on your system for installation
+via Makefile.
+
+If you don't want to use the Makefile for installation, you can follow these
+steps for manual installation.
+
+1. Copy docx2txt.pl, docx2txt.sh and docx2txt.config to the desired directory.
+
+    cp docx2txt.pl docx2txt.sh docx2txt.config /path/to/desired/directory
+
+2. Change the permissions of the copied files to 755 for docx2txt.pl and
+   docx2txt.sh, and 644 for docx2txt.config .
+
+    (cd /path/to/desired/directory && chmod 755 docx2txt.pl docx2txt.sh && chmod 644 docx2txt.config)
+
+3. Add the concerned directory to your PATH, if not already in PATH.
+
+    PATH=$PATH:/path/to/desired/directory
+
+
+Installation on Windows
+-----------------------
+
+I. You can install minimal Cygwin packages from http://www.cygwin.com/ to have
+   working bash, cat, env, install, make, perl and unzip utilities and thus
+   create the required Cygwin environment for using this utility.
+
+II. If you do not want to install even minimal Cygwin, you can try the following
+    sequence for manual installation.
+
+    a. Get the following files from /usr/bin/ of a Cygwin installation and place
+       them in, say, C:\docx2txt .
+
+        cygwin1.dll
+        perl.exe
+        cygperl*.dll
+        unzip.exe
+        cygcrypt*.dll
+
+    b. Copy docx2txt.pl, docx2txt.bat and docx2txt.config to C:\docx2txt .
+
+    c. Change the path for unzip in docx2txt.config to C:/docx2txt/unzip.exe and
+       the path for perl in docx2txt.bat to C:\docx2txt\perl.exe .
+
+    d. You can now use this tool from within C:\docx2txt as follows.
+
+        docx2txt.bat file.docx
+        docx2txt.bat path-to-directory\file.docx
+
+        perl docx2txt.pl file.docx
+        perl docx2txt.pl directory\file.docx -
+        perl docx2txt.pl directory/file.docx file.txt
+        perl docx2txt.pl C:/somedir/file.docx
+        perl docx2txt.pl C:\somedir\file.docx C:\otherdir\converted.txt
+
+III. You can also install this utility via WInstall.bat and follow the
+     instructions during installation. WInstall.bat can be invoked in two ways.
+
+        WInstall.bat installation-folder-name
+        WInstall.bat
+
+     In the second case, the script asks for the installation folder name.
+
+     It is advisable to have working installations of perl and at least one
+     command line unzipper (Unzip/CakeCmd) before running this install script,
+     so that it can automatically set the desired paths in installed files.
+
+     You can use
+
+     - Cygwin perl or Strawberry perl [http://strawberryperl.com/] or any other
+       Windows native perl implementation
+     - Cygwin unzip or UnZip for Windows [http://gnuwin32.sourceforge.net/downlinks/unzip.php]
+     - CakeCmd unzipper [http://www.quickzip.org/cakecmd.html]
+
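For a caller that performs the manual installation from Ruby (textractor is a Ruby gem) rather than from the shell, the three manual steps in INSTALL can be sketched as follows. This is a minimal, hypothetical illustration, not part of docx2txt or textractor: the names DOCX2TXT_MODES, manual_install and dir_on_path? are invented here, it assumes a POSIX system where permission bits are honoured, and for step 3 it only reports whether the directory is already on PATH instead of editing any shell profile.

```ruby
require "fileutils"

# Hypothetical mapping of docx2txt files to the permission bits INSTALL asks for.
DOCX2TXT_MODES = {
  "docx2txt.pl" => 0o755,     # executable Perl script
  "docx2txt.sh" => 0o755,     # executable shell wrapper
  "docx2txt.config" => 0o644  # plain configuration file
}.freeze

# Steps 1 and 2: copy each file from src_dir into dest_dir, then set its mode.
def manual_install(src_dir, dest_dir)
  FileUtils.mkdir_p(dest_dir)
  DOCX2TXT_MODES.each do |name, mode|
    target = File.join(dest_dir, name)
    FileUtils.cp(File.join(src_dir, name), target)
    File.chmod(mode, target)
  end
end

# Step 3: true if dir already appears in a PATH-style string.
def dir_on_path?(dir, path = ENV.fetch("PATH", ""))
  wanted = File.expand_path(dir)
  path.split(File::PATH_SEPARATOR).any? { |entry| File.expand_path(entry) == wanted }
end
```

For example, manual_install(".", "/usr/local/bin") run from the unpacked docx2txt folder copies the three files and sets their modes, and dir_on_path?("/usr/local/bin") tells you whether step 3 is still needed.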
@@ -0,0 +1,23 @@
+#
+# Makefile for docx2txt
+#
+
+INSTALLDIR ?= /usr/local/bin
+
+INSTALL = $(shell which install 2>/dev/null)
+ifeq ($(INSTALL),)
+$(error "Need 'install' to install docx2txt")
+endif
+
+PERL = $(shell which perl 2>/dev/null)
+ifeq ($(PERL),)
+$(warning "*** Make sure 'perl' is installed and is in your PATH, before running the installed script. ***")
+endif
+
+Dx2TFILES = docx2txt.sh docx2txt.pl docx2txt.config
+
+install: $(Dx2TFILES)
+	[ -d $(INSTALLDIR) ] || mkdir -p $(INSTALLDIR)
+	$(INSTALL) -m 755 $^ $(INSTALLDIR)
+
+.PHONY: install
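The Makefile uses $(shell which ...) to stop early when install is missing and to only warn when perl is missing. A Ruby program that shells out to docx2txt.pl could apply the same policy up front. The sketch below is a simplified, hypothetical which helper (POSIX only: it ignores the Windows PATHEXT convention) plus a check mirroring the Makefile's error/warning split; it is not taken from textractor.

```ruby
# Hypothetical helper, simplified for POSIX systems: return the full path of
# the first executable file named `cmd` found in a PATH-style string, or nil.
def which(cmd, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    dir = "." if dir.empty? # an empty PATH entry means the current directory
    candidate = File.join(dir, cmd)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Same policy as the Makefile: a missing `install` is an error,
# a missing `perl` is only a warning.
def check_build_tools(path = ENV.fetch("PATH", ""))
  raise "Need 'install' to install docx2txt" unless which("install", path)
  warn "*** Make sure 'perl' is installed and is in your PATH ***" unless which("perl", path)
  true
end
```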
@@ -0,0 +1,109 @@
+docx2txt (http://docx2txt.sourceforge.net/) is a simple tool to generate
+equivalent text files from Microsoft .docx documents, with an attempt towards
+preserving sufficient formatting and document information, and appropriate
+character conversions for a good text experience.
+
+You need at least perl installed on your system to use this tool.
+
+
+How to Use
+----------
+
+You can do the text conversion in different ways depending upon your usage
+environment.
+
+1. Using docx2txt.sh :
+
+        docx2txt.sh file.docx
+    OR
+        docx2txt.sh file
+
+    In both these cases the output text will be saved in file.txt .
+
+2. Using docx2txt.pl :
+
+    a. docx2txt.pl infile.docx outfile.txt
+
+       Use - as the name of the output text file to send the extracted text to
+       stdout/terminal.
+
+    b. docx2txt.pl file.docx
+       OR
+       docx2txt.pl file
+
+       In both these cases the output text will be saved in file.txt .
+
+3. Using docx2txt.bat :
+
+        docx2txt.bat file.docx
+    OR
+        docx2txt.bat file
+
+    In both these cases the output text will be saved in file.txt .
+
+The input argument in all the above cases can also be a directory holding the
+unzipped content of a .docx file. This feature is particularly useful if you do
+not have a command line unzipping tool like Unzip/CakeCmd installed on your
+system.
+
+
+Tune your Experience
+--------------------
+
+You can change these settings via a docx2txt.config file located either in the
+current directory or in the same location as the docx2txt.pl script.
+
+- path to unzip program
+- newline in output text file (Unix/Dos way)
+- list level indentation amount
+- line width (used for short line justification)
+- showing of hyperlink along with linked text
+
+Settings take precedence in this order: docx2txt.config file in the current
+folder, docx2txt.config file in the same location as the docx2txt.pl script,
+defaults hardcoded in the docx2txt.pl script.
+
+You can also adjust list element indicator characters for different levels in
+docx2txt.pl to suit your formatting taste. Currently 8 level list nesting is
+assumed; however, if you want to deal with deeper nesting, you can adjust that
+as well in the perl script by following the related comments there.
+
+
+Note for MC (Midnight Commander) fans
+-------------------------------------
+
+You can add the following binding in ~/.mc/bindings and view the text content
+of a .docx file by hitting the F3 key (assuming default key mappings) after
+moving the cursor over the concerned filename in the mc panel.
+
+# Microsoft .docx Document
+regex/\.(docx|DOCX|Docx)$
+	View=%view{ascii} docx2txt.pl %f -
+
+
+Request
+-------
+
+If you are using this work directly/indirectly for non-personal purpose(s),
+please inform the author about it along with relevant url(s), so that it can be
+mentioned on the project homepage.
+
+In case you come across an issue with it, or need a feature that can be
+handled in docx to text conversion, please feel free to get in touch. An
+accompanying test .docx document depicting the issue/need and the corresponding
+text file generated by MSOffice with character substitution enabled (or as you
+would like the text file to be) will be helpful.
+
+You can track the project via http://sourceforge.net/projects/docx2txt and refer
+to the project CVS if there have been changes since this release.
+
+
+Disclaimer
+----------
+
+This program includes no warranty whatsoever. It is provided "AS IS". For more
+information please read the COPYING document, which should be included with the
+package, and describes the GNU General Public License, which covers docx2txt.
+
+Sandeep Kumar ( shimple0 -AT- yahoo .DOT. com )
+
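The output naming rules in the README's How to Use section fit in a few lines of Ruby. This is a hypothetical helper (output_target is not part of docx2txt or textractor) that encodes only what the README states: "-" means stdout, an explicit output name is used as given, and otherwise file.docx or file becomes file.txt. Matching the .docx suffix case-insensitively is an assumption, and directory inputs are left out of scope.

```ruby
# Hypothetical helper encoding the README's output-naming rules.
# Returns :stdout when the output name is "-", the explicit output name when
# one is given, and otherwise the input name with a trailing ".docx" (matched
# case-insensitively, an assumption) replaced by ".txt".
def output_target(input, output = nil)
  return :stdout if output == "-"
  return output unless output.nil?

  input.sub(/\.docx\z/i, "") + ".txt"
end
```

For instance, output_target("report.docx") and output_target("report") both give "report.txt", matching the README's "In both these cases" note.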
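The configuration precedence in the README's Tune your Experience section can be sketched the same way, with one stated assumption: the README does not say whether settings are merged key by key across both config files, so this sketch takes the simpler reading that the first docx2txt.config found is used and any setting it leaves out falls back to the hardcoded default. Settings are modelled as Ruby hashes with made-up keys; the real file is Perl evaluated by docx2txt.pl, and nothing here parses it. The helper names are hypothetical.

```ruby
# Hypothetical sketch of the README's config precedence, not docx2txt's code.
CONFIG_NAME = "docx2txt.config"

# Candidate locations in priority order: current directory, then script folder.
def config_candidates(cwd, script_dir)
  [File.join(cwd, CONFIG_NAME), File.join(script_dir, CONFIG_NAME)]
end

# Return the first candidate that exists on disk, or nil if none does.
def find_config(cwd, script_dir)
  config_candidates(cwd, script_dir).find { |f| File.file?(f) }
end

# Layer the chosen file's settings (a Hash, or nil) over the hardcoded defaults.
def effective_settings(defaults, file_settings)
  defaults.merge(file_settings || {})
end
```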
@@ -0,0 +1,16 @@
+1. Handle lists in a better way. [partly worked on, target latest by v2.0]
+
+2. Heuristics based cleanup of damaged document content. [leaving for this
+   release - looking for more test samples, target v1.1]
+
+3. Extract images. There has now been a user request as well. [target pre v2.0]
+4. Handle footnotes.
+5. Improve table and short line justification handling. Ideally table columns
+   in a single row should be separated by a pipe. Short line justification needs
+   to be adjusted to situations when a tab occurs in a line. A quick look into
+   these issues suggests the logic/code will need reorganising to handle them.
+
+6. Create a simple manpage, hopefully after resolving footnote and list issues.
+7. Implement a simple state machine for speedup [partially worked towards it].
+8. XML parsing??? and making things more efficient. When it has matured enough,
+   maybe a C/C++ version should be looked into.
@@ -0,0 +1 @@
+1.0
@@ -0,0 +1,218 @@
+@echo off
+
+:: docx2txt, a command-line utility to convert Docx documents to text format.
+:: Copyright (C) 2008-now Sandeep Kumar
+::
+:: This program is free software; you can redistribute it and/or modify
+:: it under the terms of the GNU General Public License as published by
+:: the Free Software Foundation; either version 3 of the License, or
+:: (at your option) any later version.
+::
+:: This program is distributed in the hope that it will be useful,
+:: but WITHOUT ANY WARRANTY; without even the implied warranty of
+:: MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+:: GNU General Public License for more details.
+::
+:: You should have received a copy of the GNU General Public License
+:: along with this program; if not, write to the Free Software
+:: Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+::
+:: A simple command line installer for docx2txt on Windows.
+::
+:: Author : Sandeep Kumar (shimple0 -AT- Yahoo .DOT. COM)
+::
+:: ChangeLog :
+::
+:: 02/10/2009 - Initial version of command line installation script for
+::              Windows users. Script will prompt user for perl, unzip and
+::              cakecmd paths and will update these paths in the installed
+::              files using perl, if perl path is valid. Else it will simply
+::              copy the concerned files to the installation folder.
+::
+
+
+::
+:: Ensure that required command extensions are enabled.
+::
+
+setlocal enableextensions
+setlocal enabledelayedexpansion
+
+
+echo.
+echo Welcome to the command line installer for docx2txt.
+echo.
+
+
+::
+:: Check if this install script is invoked correctly.
+::
+
+if not "%~2" == "" (
+    echo.
+    echo Usage : "%~0" [WhereToInstall]
+    echo.
+    echo WhereToInstall specifies a folder to install into.
+    echo.
+    echo If destination folder is not specified on command line,
+    echo then it will be asked for during the installation.
+    echo.
+    goto END
+)
+
+
+::
+:: Check if destination folder was specified on command line, else ask for it.
+::
+
+if "%~1" == "" (
+    echo.
+    echo Where should the docx2txt tool be installed? Specify the location
+    echo without surrounding quotes.
+    echo.
+    set /P destdir=Installation Folder :
+    echo.
+) else (
+    set destdir=%~1
+)
+
+if not exist "%destdir%" (
+    echo.
+    echo ** Folder "%destdir%" does not exist. It will be created now.
+    echo.
+    mkdir "%destdir%"
+)
+
+
+::
+:: Check if user specified destdir is a valid folder or not.
+::
+
+pushd "%destdir%" 2>nul
+if ERRORLEVEL 1 (
+    echo.
+    echo ** "%destdir%" does not specify a valid folder name.
+    echo ** Exiting installer.
+    echo.
+    goto END
+) else if ERRORLEVEL 0 (
+    popd
+)
+
+
+echo.
+echo Please specify fully qualified paths to utilities when requested.
+echo Perl.exe is required for docx2txt tool as well as for this installation.
+echo.
+
+set /A attempts=0
+
+:GET_PERL_PATH
+
+set /P PERL=Path to Perl.exe :
+call :CHECK_FILE_EXISTENCE "%PERL%" "perl"
+if ERRORLEVEL 7 (
+    set /A attempts=attempts+1
+    if !attempts! == 3 (
+        echo.
+        echo Continuing with simple installation ....
+        echo.
+        goto SIMPLE_INSTALL
+    ) else (
+        goto GET_PERL_PATH
+    )
+)
+
+
+echo.
+echo.
+echo If you do not have CakeCmd.exe installed, simply press Enter/Return key.
+echo.
+
+set /P CAKECMD=Path to CakeCmd.exe :
+
+
+echo.
+echo.
+echo In case you are using Cygwin Perl.exe, you need to specify Unzip.exe path
+echo using forward slashes i.e. like C:/path/to/unzip.exe .
+echo If you do not have Unzip.exe installed, simply press Enter/Return key.
+echo.
+
+set /P UNZIP=Path to Unzip.exe :
+
+echo.
+echo.
+echo Here is the information you have provided.
+echo.
+echo Installation folder = %destdir%
+echo Perl = %PERL%
+echo CakeCmd = %CAKECMD%
+echo Unzip = %UNZIP%
+echo.
+
+pause
+
+echo.
+echo Installing script files to "%destdir%" ....
+
+copy docx2txt.pl "%destdir%" > nul
+
+if not "%UNZIP%" == "" (
+    "%PERL%" -e "undef $/; $_ = <>; s/(unzip\s*=>)[^,]*,/$1 '$ARGV[0]',/; print;" docx2txt.config "%UNZIP%" > "%destdir%\docx2txt.config"
+) else copy docx2txt.config "%destdir%" > nul
+
+if "%CAKECMD%" == "" (
+    "%PERL%" -e "undef $/; $_ = <>; s/(set PERL=).*?(\r?\n)/$1$ARGV[0]$2/; print;" docx2txt.bat "%PERL%" > "%destdir%\docx2txt.bat"
+) else (
+    "%PERL%" -e "undef $/; $_ = <>; s/(set PERL=).*?(\r?\n)/$1$ARGV[0]$2/; s/:: (set CAKECMD=).*?(\r?\n)/$1$ARGV[1]$2/; print;" docx2txt.bat "%PERL%" "%CAKECMD%" > "%destdir%\docx2txt.bat"
+)
+
+goto END
+
+
+:SIMPLE_INSTALL
+
+echo Copying script files to "%destdir%" ....
+
+copy docx2txt.bat "%destdir%" > nul
+copy docx2txt.pl "%destdir%" > nul
+copy docx2txt.config "%destdir%" > nul
+
+echo.
+echo Please adjust perl, unzip and cakecmd paths (as needed) in
+echo "%destdir%\docx2txt.bat" and "%destdir%\docx2txt.config"
+echo.
+
+goto END
+
+::
+:: Check whether the executable passed as the argument exists.
+::
+
+:CHECK_FILE_EXISTENCE
+
+if not exist "%~1" (
+    echo.
+    echo ** Cannot find executable "%~1".
+    echo.
+) else if /I "%~nx1" NEQ "%~2.exe" (
+    echo.
+    echo ** "%~1" does not seem to be an executable file.
+    echo.
+) else exit /B 0
+
+exit /B 7
+
+
+:END
+
+endlocal
+endlocal
+
+set PERL=
+set CAKECMD=
+set UNZIP=
+set FILES=
+set attempts=
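WInstall.bat patches the installed files with three Perl one-liners. The sketch below restates those substitutions in Ruby using the same regular expressions, so the edits can be checked without running the installer. The method names are hypothetical and the snippet is an illustration, not part of docx2txt or textractor. Like Perl's s/// without /g, String#sub changes only the first match; the block form is used so that backslashes in Windows paths are inserted literally instead of being read as back-references.

```ruby
# docx2txt.config: `unzip => <anything up to the first comma>,` becomes
# `unzip => '<new path>',`.
def set_unzip_path(config_text, unzip_path)
  config_text.sub(/(unzip\s*=>)[^,]*,/) { "#{$1} '#{unzip_path}'," }
end

# docx2txt.bat: replace the value after `set PERL=` up to the end of that line,
# keeping the original line ending (LF or CRLF).
def set_perl_path(bat_text, perl_path)
  bat_text.sub(/(set PERL=).*?(\r?\n)/) { "#{$1}#{perl_path}#{$2}" }
end

# docx2txt.bat: uncomment `:: set CAKECMD=...` and give it the new path.
def enable_cakecmd(bat_text, cakecmd_path)
  bat_text.sub(/:: (set CAKECMD=).*?(\r?\n)/) { "#{$1}#{cakecmd_path}#{$2}" }
end
```

For example, set_unzip_path(File.read("docx2txt.config"), 'C:\docx2txt\unzip.exe') returns the patched configuration text, which could then be written to the installation folder.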