Mail Archives: djgpp/2016/03/02/17:34:23
This is a port of PCRE2 10.21 to MSDOS/DJGPP.
The PCRE2 library is a set of functions that implement regular expression
pattern matching using the same syntax and semantics as Perl 5. PCRE2 has
its own native API, as well as a set of wrapper functions that correspond
to the POSIX regular expression API. PCRE2 is a re-working of the original
PCRE library to provide an entirely new API. It is written in C, and there
are no C++ wrappers anymore. The original, very widely deployed PCRE library
is at version 8.38, and its API and feature set are stable; future releases
will be for bugfixes only. All new features will go into PCRE2, not into
the original PCRE 8.x series.
DJGPP specific changes.
=======================
To configure and compile this port you will need an OS with LFN support.
The products themselves will also work on systems that have only SFN support.
The usual files needed to configure the sources have been added. As usual,
they are stored together with the diffs file that documents my changes in the
/djgpp directory.
The port has been configured to support gzip and bzip2 compressed files. The
binaries will detect at runtime if LFN or SFN support is available. If there
is LFN support available then only the default extensions ".bz2" and ".gz"
will be used and no other ones will be honored. If only SFN support is
available, ".*bz" will also be accepted as a valid bzip2 extension and,
in the case of gzip compressed files, ".*gz" and ".**z" will also be accepted
as valid extensions, where "*" always stands for any valid character. If a
file with a valid bzip2 or gzip extension cannot be opened using the
appropriate compressor library functions, it will be treated as an
uncompressed file and opened as a plain file. Please note that all files are
opened in binary mode and that neither the original code nor this port offers
any way to change this behaviour.
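The extension rules above can be sketched in a few lines of C. This is only an illustration and not the code used in the port: the names file_kind, extension_of and classify_by_extension are hypothetical, the comparison is assumed to be case-insensitive, and bzip2 is assumed to be checked before gzip (an extension such as ".tbz" fits both ".*bz" and ".**z").

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; the port's real identifiers are not shown here. */
enum file_kind { KIND_PLAIN, KIND_GZIP, KIND_BZIP2 };

/* Return the text after the last dot of the base name, or NULL if none. */
static const char *extension_of(const char *name)
{
    const char *base = name;
    const char *p;
    const char *dot;

    for (p = name; *p != '\0'; p++)
        if (*p == '/' || *p == '\\' || *p == ':')
            base = p + 1;
    dot = strrchr(base, '.');
    return dot != NULL ? dot + 1 : NULL;
}

/* Case-insensitive string equality (assumption: case does not matter). */
static int equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Classify a file name; have_lfn is nonzero when LFN support is available. */
enum file_kind classify_by_extension(const char *name, int have_lfn)
{
    const char *ext = extension_of(name);

    if (ext == NULL)
        return KIND_PLAIN;
    if (equal_nocase(ext, "bz2"))          /* default bzip2 extension */
        return KIND_BZIP2;
    if (equal_nocase(ext, "gz"))           /* default gzip extension */
        return KIND_GZIP;
    if (have_lfn || strlen(ext) != 3)      /* LFN: only the defaults count */
        return KIND_PLAIN;
    if (equal_nocase(ext + 1, "bz"))       /* SFN only: ".*bz" */
        return KIND_BZIP2;
    if (equal_nocase(ext + 2, "z"))        /* SFN only: ".*gz" and ".**z" */
        return KIND_GZIP;
    return KIND_PLAIN;
}
```

With this sketch, "data.tgz" is classified as gzip only when have_lfn is zero. A file classified as compressed would still fall back to plain reading if the compressor library cannot open it, as described above.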
The library can be configured to accept different EOL characters. I have
chosen to configure this port using the option to enable any EOL characters.
This means that CR, LF and CRLF will be recognized as valid EOL sequences.
If you do not like this, you will have to reconfigure and recompile the port,
passing the option you prefer to config.bat. The following command line
options are available:
cr: enables CR as EOL
lf: enables LF as EOL
crlf: enables CRLF as EOL
any-crlf: enables CR, LF and CRLF as EOL
The following command line options are also available to disable any of the
EOL characters:
no-cr: disables CR as EOL
no-lf: disables LF as EOL
no-crlf: disables CRLF as EOL
no-any-crlf: disables CR, LF and CRLF as EOL
Enabling one of them disables all of the other ones. This concerns only the
library. The EOL character used by pcre2grep can always be controlled with
the -N command line option. Please note that this choice has the consequence
that if you have a string looking like this:
foo\r\nbar
the PCRE2 library and pcre2grep.exe will find 2 EOLs: one for \r and one for \n.
In other words, the two sequences "CRCRLF" and "LFCRLF" will always
produce two EOL matches. The "CRCRLF" sequence will produce one match for
the first CR and a second one for the following CRLF. The "LFCRLF" sequence
will produce a match for the first LF and a second match for the following
CRLF. Please note that this behaviour differs from DJGPP's port of grep.
It was not my intention to modify the PCRE2 code so drastically just to
emulate DJGPP's grep behaviour. pcre2grep.exe also offers color
support without having to install an ansi.sys driver. Please also note that
I have configured the port to support neither UTF-8 Unicode character
strings nor any UTF-8 EOL character sequence.
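The two-match results given above for the "CRCRLF" and "LFCRLF" sequences can be reproduced with a small scanner. This is a minimal sketch and not PCRE2's code: the function name count_newlines_anycrlf is hypothetical, and it assumes that a CR immediately followed by LF is consumed as one CRLF while any other CR or LF counts on its own.

```c
#include <stddef.h>

/* Count EOLs in buf[0..len) when CR, LF and CRLF are all accepted.
   Assumption: a CR directly followed by LF is taken as a single CRLF. */
size_t count_newlines_anycrlf(const char *buf, size_t len)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n')
                i++;                /* skip the LF half of a CRLF pair */
            count++;
        } else if (buf[i] == '\n') {
            count++;
        }
    }
    return count;
}
```

Called as count_newlines_anycrlf("\r\r\n", 3) or count_newlines_anycrlf("\n\r\n", 3), the function returns 2 in both cases: one match for the leading CR or LF and one for the CRLF that follows.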
There are more options like these:
pcre8 or no-pcre8, default pcre8. Enables 8 bit character set support
and disables 16 bit character set support.
pcre16 or no-pcre16, default no-pcre16. Disables 8 bit and 32 bit character
set support and enables 16 bit character set support. DJGPP does not
support 16 bit character sets AFAIK.
pcre32 or no-pcre32, default no-pcre32. Disables 8 bit and 16 bit character
set support and enables 32 bit character set support. DJGPP does not
support 32 bit character sets AFAIK.
jit or no-jit, default no-jit. The default disables Just-In-Time compiling
support.
grepjit or no-grepjit, default no-grepjit. The default disables Just-In-Time
support in pcre2grep.
Neither the Just-In-Time compiling support for the library nor the JIT
support for pcre2grep is supported by me. I have checked the code to
see if there are DOS specific issues to fix, but I did not find any.
To compile it you will need to install some pthread library together with
a socket library. I have never used those kinds of libraries on DOS with
DJGPP, so I am not able to support JIT. If you want to try, you are on your own.
The pcre2test.exe binary will not support the -S command line option, which
allows changing the program stack size. This is because DJGPP's setrlimit does
not support this feature. The port has been configured to use the readline
and history libraries. If you do not like this, you will have to reconfigure
the sources, passing the "no-rl" command line option to config.bat.
By default, readline is always used.
The port passes the test suite except for the last test. This test is
completely UNIX or LF centric and does not work well with any EOL
encoding other than the one used on POSIX systems. This is known to the author
and maintainer of the PCRE library, and I have no plans to write a CRLF specific
test case for the DJGPP port.
Certain man and html pages have been renamed to fit into the SFN limits.
The index.html has been adjusted accordingly.
As mentioned before, to configure and compile the package you will also have
to install the following packages:
ftp://ftp.delorie.com/pub/djgpp/current/v2gnu/rdln63b.zip
ftp://ftp.delorie.com/pub/djgpp/current/v2tk/zlib128br2.zip
ftp://ftp.delorie.com/pub/djgpp/current/v2tk/bz2-106ar2.zip
Of course, you can always download newer versions of these ports if available.
The source package is distributed preconfigured to be built in the /_build
directory located under the top srcdir.
The port has been configured and compiled on WinXP SP3 and Win98SE using
gcc530b and bnu226br2. There is no guarantee that this will be possible with
any other DOS-like OS. Due to the massive use of long file names, it will
not be possible to configure and compile without LFN support.
Please read the docs. There are no info formatted docs. All the extensive
documentation is HTML formatted and placed in /share/doc/pcre/html
All the changes done to the original distribution are documented in the
diffs file and located together with all the files needed to configure
the package (config.bat, config.sed, config.site, etc.) in the /djgpp
directory.
For further information about PCRE2 please read the man pages and NEWS file.
As the PCRE author noted, the original, very widely deployed PCRE library is at
version 8.37, and the API and feature set are stable; future releases will be for
bugfixes only. All new features will go into PCRE2, which is at version 10.20,
not into the original PCRE 8.x series.
The previous port of pcre (aka pcre837[b|s].zip) has been kept in the /current
directory because it is the last one of the series, just in case it is
preferred over the new API.
This is a verbatim extract of the NEWS file:
-------------------------------------------------------------------------------
Version 10.21 12-January-2016
-----------------------------
1. Many bugs have been fixed. A large number of them were provoked only by very
strange pattern input, and were discovered by fuzzers. Some others were
discovered by code auditing. See ChangeLog for details.
2. The Unicode tables have been updated to Unicode version 8.0.0.
3. For Perl compatibility in EBCDIC environments, ranges such as a-z in a
class, where both values are literal letters in the same case, omit the
non-letter EBCDIC code points within the range.
4. There have been a number of enhancements to the pcre2_substitute() function,
giving more flexibility to replacement facilities. It is now also possible to
cause the function to return the needed buffer size if the one given is too
small.
5. The PCRE2_ALT_VERBNAMES option causes the "name" parts of special verbs such
as (*THEN:name) to be processed for backslashes and to take note of
PCRE2_EXTENDED.
6. PCRE2_INFO_HASBACKSLASHC makes it possible for a client to find out if a
pattern uses \C, and --never-backslash-C makes it possible to compile a version
PCRE2 in which the use of \C is always forbidden.
7. A limit to the length of pattern that can be handled can now be set by
calling pcre2_set_max_pattern_length().
8. When matching an unanchored pattern, a match can be required to begin within
a given number of code units after the start of the subject by calling
pcre2_set_offset_limit().
9. The pcre2test program has been extended to test new facilities, and it can
now run the tests when LF on its own is not a valid newline sequence.
10. The RunTest script has also been updated to enable more tests to be run.
11. There have been some minor performance enhancements.
Version 10.20 30-June-2015
--------------------------
1. Callouts with string arguments and the pcre2_callout_enumerate() function
have been implemented.
2. The PCRE2_NEVER_BACKSLASH_C option, which locks out the use of \C, is added.
3. The PCRE2_ALT_CIRCUMFLEX option lets ^ match after a newline at the end of a
subject in multiline mode.
4. The way named subpatterns are handled has been refactored. The previous
approach had several bugs.
5. The handling of \c in EBCDIC environments has been changed to conform to the
perlebcdic document. This is an incompatible change.
6. Bugs have been mended, many of them discovered by fuzzers.
Version 10.10 06-March-2015
---------------------------
1. Serialization and de-serialization functions have been added to the API,
making it possible to save and restore sets of compiled patterns, though
restoration must be done in the same environment that was used for compilation.
2. The (*NO_JIT) feature has been added; this makes it possible for a pattern
creator to specify that JIT is not to be used.
3. A number of bugs have been fixed. In particular, bugs that caused building
on Windows using CMake to fail have been mended.
Version 10.00 05-January-2015
-----------------------------
Version 10.00 is the first release of PCRE2, a revised API for the PCRE
library. Changes prior to 10.00 are logged in the ChangeLog file for the old
API, up to item 20 for release 8.36. New programs are recommended to use the
new library. Programs that use the original (PCRE1) API will need changing
before linking with the new library.
-------------------------------------------------------------------------------
The port has been compiled using stock djdev205 and consists of the two packages
that can be downloaded from ftp.delorie.com and mirrors as (timestamp 2016-02-28):
PCRE2 10.21 binaries, headers, libs and man formatted documentation:
ftp://ftp.delorie.com/pub/djgpp/current/v2tk/pcr1021b.zip
PCRE2 10.21 source:
ftp://ftp.delorie.com/pub/djgpp/current/v2tk/pcr1021s.zip
Send PCRE2 specific bug reports to <pcre-dev AT exim DOT org>.
Send suggestions and bug reports concerning the DJGPP port to
comp.os.msdos.djgpp or <djgpp AT delorie DOT com>.
Enjoy.
Guerrero, Juan Manuel <juan DOT guerrero AT gmx DOT de>