Mail Archives: djgpp/2017/10/15/17:50:32

X-Authentication-Warning: delorie.com: mail set sender to djgpp-bounces using -f
Message-Id: <201710152150.v9FLo9I9027173@delorie.com>
Date: Sun, 15 Oct 2017 23:52:50 +0200
From: "Juan Manuel Guerrero (juan DOT guerrero AT gmx DOT de) [via djgpp-announce AT delorie DOT com]" <djgpp-announce AT delorie DOT com>
To: djgpp-announce AT delorie DOT com
Subject: ANNOUNCE: DJGPP port of PCRE2 10.30 uploaded.
Reply-To: djgpp AT delorie DOT com

This is a port of PCRE2 10.30 to MSDOS/DJGPP.


   The PCRE2 library is a set of functions that implement regular expression
   pattern matching using the same syntax and semantics as Perl 5.  PCRE2 has
   its own native API, as well as a set of wrapper functions that correspond
   to the POSIX regular expression API.  PCRE2 is a re-working of the original
   PCRE library to provide an entirely new API.  It is written in C, and there
   are no longer any C++ wrappers.  The original, very widely deployed PCRE
   library is at version 8.41; its API and feature set are stable, and future
   releases will be for bug fixes only.  All new features will go into PCRE2,
   not into the original PCRE 8.x series.


   DJGPP specific changes.
   =======================


   To configure and compile this port you will need an OS with long file name
   (LFN) support.  The resulting binaries will work even on systems that only
   support short file names (SFN).

   The usual files needed to configure the sources have been added.  As usual,
   they are stored in the /djgpp directory together with the diffs file that
   documents my changes.
   The port has been configured to support gzip and bzip2 compressed files.  The
   binaries detect at runtime whether LFN or only SFN support is available.  If
   LFN support is available, only the default extensions ".bz2" and ".gz" are
   recognized; no others are honored.  If only SFN support is available, ".*bz"
   is also accepted as a valid bzip2 extension, and ".*gz" and ".**z" are also
   accepted as valid gzip extensions, where "*" stands for any single valid
   character.  If a file with a valid bzip2 or gzip extension cannot be opened
   using the appropriate compressor library functions, it is treated as an
   uncompressed file and opened as a plain file.  Please note that all files
   are opened in binary mode, and neither the original code nor this port
   offers any way to change this behaviour.
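   The extension rules above can be summarized in a small C sketch.  The names
   classify_by_extension(), ext_matches() and enum compression are hypothetical
   and not taken from the port.  The case-insensitive comparison is an
   assumption based on DOS file names being case-insensitive, and the lfn flag
   is supplied by the caller (on DJGPP it could presumably come from
   _use_lfn()).  The sketch only classifies a name; the fallback to a plain
   file when the compressor library cannot open it is not shown.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical result codes; the port's real code may differ. */
enum compression { PLAIN_FILE, GZIP_FILE, BZIP2_FILE };

/* Compare an extension (without the dot) against a template in which
   '*' matches exactly one arbitrary character.  Letters are compared
   case-insensitively (assumption: DOS names are case-insensitive). */
static int ext_matches(const char *ext, const char *tmpl)
{
  if (strlen(ext) != strlen(tmpl))
    return 0;
  for (; *tmpl; ext++, tmpl++)
    if (*tmpl != '*' && tolower((unsigned char)*ext) != *tmpl)
      return 0;
  return 1;
}

/* Decide how a file should be opened based on its extension.
   lfn is nonzero when long file names are available. */
enum compression classify_by_extension(const char *filename, int lfn)
{
  const char *dot;

  assert(filename != NULL);
  dot = strrchr(filename, '.');
  if (dot == NULL || strpbrk(dot, "/\\") != NULL)
    return PLAIN_FILE;            /* no extension in the last path component */
  dot++;

  /* The default extensions are honored with or without LFN support. */
  if (ext_matches(dot, "bz2"))
    return BZIP2_FILE;
  if (ext_matches(dot, "gz"))
    return GZIP_FILE;

  /* SFN only: also accept the short forms named in the port notes. */
  if (!lfn)
  {
    if (ext_matches(dot, "*bz"))
      return BZIP2_FILE;
    if (ext_matches(dot, "*gz") || ext_matches(dot, "**z"))
      return GZIP_FILE;
  }
  return PLAIN_FILE;
}
```

   For example, "data.tgz" is classified as gzip when only SFN support is
   available, but as a plain file when LFN support is available.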

   The library can be configured to accept different EOL characters.  I have
   chosen to configure this port with the option that enables all of them, so
   CR, LF and CRLF are all recognized as valid EOL sequences.

   If you do not like this, you will have to reconfigure and recompile the
   port, passing the option you prefer to config.bat.  The following command
   line options are available:
     cr: enables CR as EOL
     lf: enables LF as EOL
     crlf: enables CRLF as EOL
     any-crlf: enables CR, LF and CRLF as EOL
   The following command line options are also available to disable
   individual EOL characters:
     no-cr: disables CR as EOL
     no-lf: disables LF as EOL
     no-crlf: disables CRLF as EOL
     no-any-crlf: disables CR, LF and CRLF as EOL
   Enabling one of them disables all of the others.  This concerns only the
   library; the EOL character used by pcre2grep can always be controlled with
   its -N command line option.  Please note that, as a consequence of this
   choice, if you have a string like this:
     foo\r\nbar
   the PCRE2 library and pcre2grep.exe will find 2 EOLs, one for \r and one
   for \n.  In other words, the two sequences "CRCRLF" and "LFCRLF" always
   produce two EOL matches: "CRCRLF" produces one match for the first CR and
   a second one for the following CRLF, and "LFCRLF" produces one match for
   the first LF and a second one for the following CRLF.  Please note that
   this behaviour differs from that of DJGPP's port of grep.  I did not want
   to modify the PCRE2 code so drastically just to emulate the behaviour of
   DJGPP's grep.  pcre2grep.exe also offers color support without requiring
   an ansi.sys driver.  Please also note that I have not configured the port
   to support either UTF-8 Unicode character strings or any UTF-8 EOL
   character sequence.
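   The "CRCRLF" and "LFCRLF" cases can be checked with a hypothetical helper,
   count_anycrlf_newlines(), that scans a buffer the way a CR/LF/CRLF newline
   convention does: a CR immediately followed by LF counts as one CRLF
   newline, and any other CR or LF counts on its own.  This is an
   illustration only, not PCRE2's internal code.

```c
#include <assert.h>
#include <stddef.h>

/* Count line endings in buf[0..len) under a CR/LF/CRLF convention:
   CR LF together is one newline; a lone CR or a lone LF is also one. */
size_t count_anycrlf_newlines(const char *buf, size_t len)
{
  size_t i = 0, count = 0;

  assert(buf != NULL || len == 0);
  while (i < len)
  {
    if (buf[i] == '\r')
    {
      count++;
      /* A CR immediately followed by LF forms a single CRLF newline. */
      i += (i + 1 < len && buf[i + 1] == '\n') ? 2 : 1;
    }
    else
    {
      if (buf[i] == '\n')
        count++;
      i++;
    }
  }
  return count;
}
```

   With this rule both "\r\r\n" (CRCRLF) and "\n\r\n" (LFCRLF) yield two
   newlines.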

   config.bat accepts further options of this kind:
     pcre8 or no-pcre8, default pcre8.  Enables 8-bit character set support
       and disables 16-bit character set support.
     pcre16 or no-pcre16, default no-pcre16.  Disables 8-bit and 32-bit
       character set support and enables 16-bit character set support.
       DJGPP does not support 16-bit character sets AFAIK.
     pcre32 or no-pcre32, default no-pcre32.  Disables 8-bit and 16-bit
       character set support and enables 32-bit character set support.
       DJGPP does not support 32-bit character sets AFAIK.
     jit or no-jit, default no-jit.  Enables or disables Just-In-Time
       compiling support in the library.
     grepjit or no-grepjit, default no-grepjit.  Enables or disables
       Just-In-Time support in pcre2grep.
   I do not support either the Just-In-Time compiling support in the library
   or the JIT support in pcre2grep.  I have checked the code for DOS-specific
   issues to fix, but did not find any.  To compile it you will need to
   install a pthread library together with a socket library.  I have never
   used such libraries on DOS with DJGPP, so I am not able to support JIT.
   If you want to try it, you are on your own.

   The pcre2test.exe binary does not support the -S command line option, which
   changes the size of the program stack, because DJGPP's setrlimit() does not
   support this feature.  The port has been configured to use the readline and
   history libraries.  If you do not want this, you will have to reconfigure
   the sources, passing the "no-rl" command line option to config.bat.  By
   default, readline is always used.

   The port passes the test suite except for the last test.  This test is
   completely UNIX (LF) centric and does not work well with any EOL encoding
   other than the one used on POSIX systems.  The author and maintainer of the
   PCRE library is aware of this, and I have no plans to write a CRLF specific
   test case for the DJGPP port.

   Certain man and HTML pages have been renamed to fit into the SFN limits.
   The index.html file has been adjusted accordingly.

   As mentioned above, to configure and compile the package you will also have
   to install the following packages:
     ftp://ftp.delorie.com/pub/djgpp/current/v2gnu/rdln70b.zip
     ftp://ftp.delorie.com/pub/djgpp/current/v2tk/zlb1211b.zip
     ftp://ftp.delorie.com/pub/djgpp/current/v2tk/bz2-106ar2.zip
   Of course, you can always download newer versions of these ports if available.

   The source package is distributed preconfigured to be built in the /_build
   directory located under the top-level source directory.

   The port has been configured and compiled on WinXP SP3 and Win98SE using
   gcc346b and bnu229b.  There is no guarantee that this will work on any other
   DOS-like OS.  Because long file names are used extensively, it is not
   possible to configure and compile the package without LFN support.

   Please read the docs.  There are no Info formatted docs.  All of the
   extensive documentation is HTML formatted and placed in /share/doc/pcre/html.

   All the changes made to the original distribution are documented in the
   diffs file, which is located in the /djgpp directory together with all the
   files needed to configure the package (config.bat, config.sed, config.site,
   etc.).

   For further information about PCRE2 please read the man pages and the NEWS file.

   As the PCRE author noted, the original, very widely deployed PCRE1 library
   is at version 8.41; its API and feature set are stable, and future releases
   will be for bug fixes only.  All new features will go into PCRE2, which is
   at version 10.30, not into the original PCRE 8.x series.

   The previous port of PCRE (pcre841[b|s].zip) has been kept in the /current
   directory, because it is the last one of the 8.x series and in case anyone
   prefers its API over the new one.



   This is a verbatim extract of the NEWS file:
-------------------------------------------------------------------------------
Version 10.30 14-August-2017
----------------------------

The full list of changes that includes bugfixes and tidies is, as always, in
ChangeLog. These are the most important new features:

1. The main interpreter, pcre2_match(), has been refactored into a new version
that does not use recursive function calls (and therefore the system stack) for
remembering backtracking positions. This makes --disable-stack-for-recursion a
NOOP. The new implementation allows backtracking into recursive group calls in
patterns, making it more compatible with Perl, and also fixes some other
previously hard-to-do issues. For patterns that have a lot of backtracking, the
heap is now used, and there is explicit limit on the amount, settable by
pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained,
but is renamed as "depth limit" (though the old names remain for
compatibility).

There is also a change in the way callouts from pcre2_match() are handled. The
offset_vector field in the callout block is no longer a pointer to the
actual ovector that was passed to the matching function in the match data
block. Instead it points to an internal ovector of a size large enough to hold
all possible captured substrings in the pattern.

2. The new option PCRE2_ENDANCHORED insists that a pattern match must end at
the end of the subject.

3. The new option PCRE2_EXTENDED_MORE implements Perl's /xx feature, and
pcre2test is upgraded to support it. Setting within the pattern by (?xx) is
also supported.

4. (?n) can be used to set PCRE2_NO_AUTO_CAPTURE, because Perl now has this.

5. Additional compile options in the compile context are now available, and the
first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and
PCRE2_EXTRA_BAD_ESCAPE_IS LITERAL.

6. The newline type PCRE2_NEWLINE_NUL is now available.

7. The match limit value now also applies to pcre2_dfa_match() as there are
patterns that can use up a lot of resources without necessarily recursing very
deeply.

8. The option REG_PEND (a GNU extension) is now available for the POSIX
wrapper. Also there is a new option PCRE2_LITERAL which is used to support
REG_NOSPEC.

9. PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD are implemented for the
benefit of pcre2grep, and pcre2grep's -F, -w, and -x options are re-implemented
using PCRE2_LITERAL, PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This
is tidier and also fixes some bugs.

10. The Unicode tables are upgraded from Unicode 8.0.0 to Unicode 10.0.0.

11. There are some experimental functions for converting foreign patterns
(globs and POSIX patterns) into PCRE2 patterns.


Version 10.23 14-February-2017
------------------------------

1. ChangeLog has the details of a lot of bug fixes and tidies.

2. There has been a major re-factoring of the pcre2_compile.c file. Most syntax
checking is now done in the pre-pass that identifies capturing groups. This has
reduced the amount of duplication and made the code tidier. While doing this,
some minor bugs and Perl incompatibilities were fixed (see ChangeLog for
details.)

3. Back references are now permitted in lookbehind assertions when there are
no duplicated group numbers (that is, (?| has not been used), and, if the
reference is by name, there is only one group of that name. The referenced
group must, of course be of fixed length.

4. \g{+<number>} (e.g. \g{+2} ) is now supported. It is a "forward back
reference" and can be useful in repetitions (compare \g{-<number>} ). Perl does
not recognize this syntax.

5. pcre2grep now automatically expands its buffer up to a maximum set by
--max-buffer-size.

6. The -t option (grand total) has been added to pcre2grep.

7. A new function called pcre2_code_copy_with_tables() exists to copy a
compiled pattern along with a private copy of the character tables that is
uses.

8. A user supplied a number of patches to upgrade pcre2grep under Windows and
tidy the code.

9. Several updates have been made to pcre2test and test scripts (see
ChangeLog).


Version 10.22 29-July-2016
--------------------------

1. ChangeLog has the details of a number of bug fixes.

2. The POSIX wrapper function regcomp() did not used to support back references
and subroutine calls if called with the REG_NOSUB option. It now does.

3. A new function, pcre2_code_copy(), is added, to make a copy of a compiled
pattern.

4. Support for string callouts is added to pcre2grep.

5. Added the PCRE2_NO_JIT option to pcre2_match().

6. The pcre2_get_error_message() function now returns with a negative error
code if the error number it is given is unknown.

7. Several updates have been made to pcre2test and test scripts (see
ChangeLog).


-------------------------------------------------------------------------------




   The port has been compiled using stock djdev205 and consists of two packages
   that can be downloaded from ftp.delorie.com and its mirrors (time stamp
   2017-10-12):

     PCRE2 10.30 binaries, headers, libs and man formatted documentation:
     ftp://ftp.delorie.com/pub/djgpp/current/v2tk/pcr1030b.zip

     PCRE2 10.30 source:
     ftp://ftp.delorie.com/pub/djgpp/current/v2tk/pcr1030s.zip



   Send PCRE2 specific bug reports to <pcre-dev AT exim DOT org>.
   Send suggestions and bug reports concerning the DJGPP port to
   comp.os.msdos.djgpp or <djgpp AT delorie DOT com>.

Enjoy.

       Guerrero, Juan Manuel <juan DOT guerrero AT gmx DOT de>

  Copyright © 2019   by DJ Delorie     Updated Jul 2019