Mail Archives: cygwin/2005/07/27/23:05:06

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Date: Wed, 27 Jul 2005 23:04:53 -0400 (EDT)
From: Igor Pechtchanski <pechtcha AT cs DOT nyu DOT edu>
Reply-To: cygwin AT cygwin DOT com
To: Stephan Mueller <smueller AT exchange DOT microsoft DOT com>
cc: cygwin AT cygwin DOT com, Krzysztof Duleba <krzysan AT skrzynka DOT pl>
Subject: RE: heap_chunk_in_mb default value (Was Re: perl - segfault on "free unused scalar")
In-Reply-To: <23AA05B1B7171647BC38C5D761900EA40223C84E@DF-SEADOG-MSG.exchange.corp.microsoft.com>
Message-ID: <Pine.GSO.4.61.0507272257260.27026@slinky.cs.nyu.edu>
References: <23AA05B1B7171647BC38C5D761900EA40223C84E AT DF-SEADOG-MSG DOT exchange DOT corp DOT microsoft DOT com>
MIME-Version: 1.0

On Wed, 27 Jul 2005, Stephan Mueller wrote:

> Igor Pechtchanski wrote:
>
> " (I wrote:)
> " > End result is that the perl internal representation in the example
> " > above probably only needs about 200MB of space, and not double that,
> " > as suggested.
> "
> " Umm, that was unclear from the description on the perlunicode manpage.
> " That, combined with Perl actually taking up 500M of memory with one
> " string of 200,000,000 characters, led me to believe that Perl uses
> " UCS-2 internally.
> "
> " Do you have another explanation for the doubled memory consumption?
> " 	Igor
>
> The admittedly old perl pages (perl 5.6) I have handy right now include
> the following near the top of the perlunicode page.  I strongly doubt
> this has changed in 5.8.
>
>   Byte and Character semantics
>
>     Beginning with version 5.6, Perl uses logically wide characters to
>     represent strings internally. This internal representation of strings
>     uses the UTF-8 encoding.

Yep, it has changed.  Here's all the perlunicode manpage in Cygwin's
Perl 5.8.7 has to say on the matter:

   Byte and Character Semantics

   Beginning with version 5.6, Perl uses logically-wide characters to
   represent strings internally.

   In future, Perl-level operations will be expected to work with
   characters rather than bytes.

There is also some text on encodings, but all it says is that an explicit
"use utf8" pragma is needed to recognize UTF-8 in the script source itself.
It says nothing about the internal representation of strings.

> I've also found text suggesting the same in Chapter 15 of the Camel
> book.
>
> Unfortunately, I don't have another explanation for the doubled memory
> consumption.

It could be that the default encoding has changed, and could be forced
back to utf8 by the "use utf8" pragma...  The Perl maintainer might be in
a better position to comment on this.

FWIW, neither "use utf8" nor "use bytes" seems to change the memory
consumption of that sample script.
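
For reference, here is the arithmetic behind the two guesses, as a rough
sketch in Python rather than Perl.  It assumes the 200,000,000-character
string is ASCII-only and uses the "utf-16-le" codec as a stand-in for UCS-2
(the two have the same width for ASCII).  The names N_CHARS, bytes_per_char
and size_mb are made up for illustration, and the numbers cover only the raw
character data, not anything else Perl may allocate.

```python
# Back-of-the-envelope payload sizes for a 200,000,000-character string.
# Assumption: the string contains only ASCII characters.
N_CHARS = 200_000_000


def bytes_per_char(encoding, sample="a" * 1000):
    """Measure bytes per character for an ASCII-only sample in `encoding`."""
    return len(sample.encode(encoding)) // len(sample)


def size_mb(n_chars, encoding):
    """Payload size in MB (10**6 bytes), ignoring any per-string overhead."""
    return n_chars * bytes_per_char(encoding) / 10**6


utf8_mb = size_mb(N_CHARS, "utf-8")
# Python has no "ucs-2" codec; utf-16-le is the same width for ASCII input.
ucs2_mb = size_mb(N_CHARS, "utf-16-le")

print(f"UTF-8: {utf8_mb:.0f} MB")
print(f"UCS-2: {ucs2_mb:.0f} MB")
```

Neither figure matches the 500M actually observed, so this only bounds the
size of the character data under each encoding; it does not settle which
one Perl uses.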
	Igor
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_		pechtcha AT cs DOT nyu DOT edu
ZZZzz /,`.-'`'    -.  ;-;;,_		igor AT watson DOT ibm DOT com
     |,4-  ) )-,_. ,\ (  `'-'		Igor Pechtchanski, Ph.D.
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

If there's any real truth it's that the entire multidimensional infinity
of the Universe is almost certainly being run by a bunch of maniacs. /DA

