Subject: RE: heap_chunk_in_mb default value (Was Re: perl - segfault on "free unused scalar")
Date: Wed, 27 Jul 2005 20:32:34 -0700
Message-ID: <23AA05B1B7171647BC38C5D761900EA40223C864@DF-SEADOG-MSG.exchange.corp.microsoft.com>
From: "Stephan Mueller"
Cc: "Krzysztof Duleba"
Delivered-To: mailing list cygwin AT cygwin DOT com

Igor Pechtchanski wrote:
" On Wed, 27 Jul 2005, Stephan Mueller wrote:
"
" > Igor Pechtchanski wrote:
" >
" > " (I wrote:)
" > " > End result is that the perl internal representation in the example
" > " > above probably only needs about 200MB of space, and not double that,
" > " > as suggested.
" > "
" > " Umm, that was unclear from the description on the perlunicode manpage.
" > " That, combined with Perl actually taking up 500M of memory with one
" > " string of 200,000,000 characters, led me to believe that Perl uses
" > " UCS-2 internally.
" > "
" > " Do you have another explanation for the doubled memory consumption?
" > "     Igor
" >
" > The admittedly old perl pages (perl 5.6) I have handy right now include
" > the following near the top of the perlunicode page. I strongly doubt
" > this has changed in 5.8.

Clarification: I meant "I strongly doubt the _implementation_ has
changed in 5.8".

" >   Byte and Character semantics
" >
" >   Beginning with version 5.6, Perl uses logically wide characters to
" >   represent strings internally. This internal representation of strings
" >   uses the UTF-8 encoding.
"
" Yep, it has.
" Here's all the Cygwin 5.8.7 manpage has to say on the matter:
"
"   Byte and Character Semantics
"
"   Beginning with version 5.6, Perl uses logically-wide characters to
"   represent strings internally.
"
"   In future, Perl-level operations will be expected to work with
"   characters rather than bytes.

... but I guess I'm not surprised that the man page text has changed.
The implementation details don't really belong at that point in the man
page.

" There is also some text on encodings, but all it says is that an explicit
" "use utf8" pragma is needed to recognize byte strings as UTF-8. It says
" nothing about the internal representation of strings.

The utf8 pragma, to my understanding, is mostly concerned with how to
interpret sequences of bytes in perl program source text, and is not
related to internal representation.

" > I've also found text suggesting the same in Chapter 15 of the Camel
" > book.
" >
" > Unfortunately, I don't have another explanation for the doubled memory
" > consumption.
"
" It could be that the default encoding has changed, and could be forced
" back to utf8 by the "use utf8" pragma...

It's possible, but I think unlikely (given the aforementioned
understanding of the meaning of utf8). I'll admit though, that I'm
still at a loss to explain the memory doubling.

" The Perl maintainer might be in
" a better position to comment on this.

This I think is more likely :-)

" FWIW, neither "use utf8" nor "use bytes" seems to change the memory
" consumption of that sample script.

Yeah, this fits with my understanding of these pragmas as being just
about interpretation of text in your perl source program.

stephan();