Date: Wed, 27 Jul 2005 23:04:53 -0400 (EDT)
From: Igor Pechtchanski
Reply-To: cygwin AT cygwin DOT com
To: Stephan Mueller
cc: cygwin AT cygwin DOT com, Krzysztof Duleba
Subject: RE: heap_chunk_in_mb default value (Was Re: perl - segfault on "free unused scalar")
In-Reply-To: <23AA05B1B7171647BC38C5D761900EA40223C84E@DF-SEADOG-MSG.exchange.corp.microsoft.com>

On Wed, 27 Jul 2005, Stephan Mueller wrote:

> Igor Pechtchanski wrote:
>
> " (I wrote:)
> " > End result is that the perl internal representation in the example
> " > above probably only needs about 200MB of space, and not double that,
> " > as suggested.
> "
> " Umm, that was unclear from the description on the perlunicode manpage.
> " That, combined with Perl actually taking up 500M of memory with one
> " string of 200,000,000 characters, led me to believe that Perl uses
> " UCS-2 internally.
> "
> " Do you have another explanation for the doubled memory consumption?
> "     Igor
>
> The admittedly old perl pages (perl 5.6) I have handy right now include
> the following near the top of the perlunicode page.  I strongly doubt
> this has changed in 5.8.
>
>    Byte and Character semantics
>
>    Beginning with version 5.6, Perl uses logically wide characters to
>    represent strings internally.  This internal representation of strings
>    uses the UTF-8 encoding.

Yep, it has.  Here's all the Cygwin 5.8.7 manpage has to say on the
matter:

   Byte and Character Semantics

       Beginning with version 5.6, Perl uses logically-wide characters to
       represent strings internally.
       In future, Perl-level operations will be expected to work with
       characters rather than bytes.

There is also some text on encodings, but all it says is that an explicit
"use utf8" pragma is needed to recognize byte strings as UTF-8.  It says
nothing about the internal representation of strings.

> I've also found text suggesting the same in Chapter 15 of the Camel
> book.
>
> Unfortunately, I don't have another explanation for the doubled memory
> consumption.

It could be that the default encoding has changed, and could be forced
back to utf8 by the "use utf8" pragma...  The Perl maintainer might be in
a better position to comment on this.  FWIW, neither "use utf8" nor "use
bytes" seems to change the memory consumption of that sample script.
    Igor
-- 
                                http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_        pechtcha AT cs DOT nyu DOT edu
ZZZzz /,`.-'`'    -.  ;-;;,_    igor AT watson DOT ibm DOT com
     |,4-  ) )-,_. ,\ (  `'-'   Igor Pechtchanski, Ph.D.
    '---''(_/--'  `-'\_) fL     a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

If there's any real truth it's that the entire multidimensional infinity
of the Universe is almost certainly being run by a bunch of maniacs. /DA