Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Subject: RE: heap_chunk_in_mb default value (Was Re: perl - segfault on "free unused scalar")
Date: Wed, 27 Jul 2005 20:32:34 -0700
Message-ID: <23AA05B1B7171647BC38C5D761900EA40223C864@DF-SEADOG-MSG.exchange.corp.microsoft.com>
From: "Stephan Mueller" <smueller AT exchange DOT microsoft DOT com>
To: <cygwin AT cygwin DOT com>
Cc: "Krzysztof Duleba" <krzysan AT skrzynka DOT pl>

Igor Pechtchanski wrote:

" On Wed, 27 Jul 2005, Stephan Mueller wrote:
" > Igor Pechtchanski wrote:
" >
" > " (I wrote:)
" > " > End result is that the perl internal representation in the example
" > " > above probably only needs about 200MB of space, and not double that,
" > " > as suggested.
" > "
" > " Umm, that was unclear from the description on the perlunicode manpage.
" > " That, combined with Perl actually taking up 500M of memory with one
" > " string of 200,000,000 characters, led me to believe that Perl uses
" > " UCS-2 internally.
" > "
" > " Do you have another explanation for the doubled memory consumption?
" > " 	Igor
" >
" > The admittedly old perl pages (perl 5.6) I have handy right now include
" > the following near the top of the perlunicode page.  I strongly doubt
" > this has changed in 5.8.

Clarification: I meant "I strongly doubt the _implementation_ has changed
in 5.8".

" >   Byte and Character semantics
" >
" >     Beginning with version 5.6, Perl uses logically wide characters to
" >     represent strings internally. This internal representation of strings
" >     uses the UTF-8 encoding.
" 
" Yep, it has.  Here's all the Cygwin 5.8.7 manpage has to say on the
" matter:
" 
"    Byte and Character Semantics
" 
"    Beginning with version 5.6, Perl uses logically-wide characters to
"    represent strings internally.
" 
"    In future, Perl-level operations will be expected to work with
"    characters rather than bytes.

... but I guess I'm not surprised that the man page text has changed.  The
implementation details don't really belong at that point in the man page.

" There is also some text on encodings, but all it says is that an explicit
" "use utf8" pragma is needed to recognize byte strings as UTF-8.  It says
" nothing about the internal representation of strings.

The utf8 pragma, to my understanding, is mostly concerned with how to
interpret sequences of bytes in perl program source text, and is not
related to internal representation.

" > I've also found text suggesting the same in Chapter 15 of the Camel
" > book.
" >
" > Unfortunately, I don't have another explanation for the doubled memory
" > consumption.
" 
" It could be that the default encoding has changed, and could be forced
" back to utf8 by the "use utf8" pragma...

It's possible, but I think unlikely (given the aforementioned understanding
of the meaning of utf8).  I'll admit, though, that I'm still at a loss to
explain the memory doubling.
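
For what it's worth, the back-of-the-envelope arithmetic can be sketched
as follows.  This is Python rather than Perl, used purely as a calculator:
it counts the bytes an ASCII-only string occupies in UTF-8 and in a
2-byte-per-character encoding (UTF-16-LE standing in for UCS-2), then
scales up to 200,000,000 characters.  The helper name and the
1,000-character sample are my own assumptions for illustration, and the
numbers say nothing about what perl's allocator actually does.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies in `encoding`."""
    return len(text.encode(encoding))


# Assumption: the string is ASCII-only, so every character costs the same.
sample = "a" * 1_000
target_chars = 200_000_000
scale = target_chars // len(sample)

# Measure a small sample and scale up, to avoid allocating the real string.
utf8_bytes = encoded_size(sample, "utf-8") * scale
ucs2_bytes = encoded_size(sample, "utf-16-le") * scale  # UCS-2 stand-in

print(f"UTF-8: {utf8_bytes:,} bytes")
print(f"UCS-2: {ucs2_bytes:,} bytes")
```

Under these assumptions UTF-8 comes to about 200MB and UCS-2 to about
400MB, so neither figure by itself accounts for the 500M observed.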

" The Perl maintainer might be in
" a better position to comment on this.

This I think is more likely :-)

" FWIW, neither "use utf8" nor "use bytes" seems to change the memory
" consumption of that sample script.

Yeah, this fits with my understanding of these pragmas as being just about
interpretation of text in your perl source program.
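
As a rough illustration of what "interpretation of text" means here, the
sketch below decodes the same two bytes in two ways.  It is Python, not
Perl, and is only an analogy for the source-text question: the bytes
themselves never change, only how they are read.  The variable names are
made up for the example.

```python
# Two bytes: the UTF-8 encoding of U+00E9 (e with acute accent).
raw = b"\xc3\xa9"

# Read as UTF-8, the two bytes form a single character.
as_utf8 = raw.decode("utf-8")

# Read as Latin-1, each byte is its own character.
as_latin1 = raw.decode("latin-1")

print(len(raw), len(as_utf8), len(as_latin1))
```

The byte count stays at two either way; only the character count depends
on the interpretation.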

stephan();
