Mail Archives: cygwin/2005/07/27/22:07:22

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Date: Wed, 27 Jul 2005 22:06:58 -0400 (EDT)
From: Igor Pechtchanski <pechtcha AT cs DOT nyu DOT edu>
Reply-To: cygwin AT cygwin DOT com
To: Stephan Mueller <smueller AT exchange DOT microsoft DOT com>
cc: cygwin AT cygwin DOT com, Krzysztof Duleba <krzysan AT skrzynka DOT pl>
Subject: RE: heap_chunk_in_mb default value (Was Re: perl - segfault on "free unused scalar")
In-Reply-To: <23AA05B1B7171647BC38C5D761900EA40223C7E9@DF-SEADOG-MSG.exchange.corp.microsoft.com>
Message-ID: <Pine.GSO.4.61.0507272202380.27026@slinky.cs.nyu.edu>
References: <23AA05B1B7171647BC38C5D761900EA40223C7E9 AT DF-SEADOG-MSG DOT exchange DOT corp DOT microsoft DOT com>
MIME-Version: 1.0

On Wed, 27 Jul 2005, Stephan Mueller wrote:

> "Igor Pechtchanski wrote:
> "
> " On Thu, 28 Jul 2005, Krzysztof Duleba wrote:
> " > > > I've simplified the test case. It seems that Cygwin perl can't
> " > > > handle too much memory. For instance:
> " > > >
> " > > > $ perl -e '$a="a"x(200 * 1024 * 1024); sleep 9'
> " > > >
> " > > > OK, this could have failed because $a might require 200 MB of
> " > > > contiguous space.
> " > >
> " > > Actually, $a requires *more* than 200MB of contiguous space.  Perl
> " > > characters are 2 bytes, so you're allocating at least 400MB of space!
> " >
> " > Right, UTF. I completely forgot about that.
> "
> " Unicode, actually.
>
> Unicode is a standard that defines 'code points' (numeric values) for a
> whole lot of different characters.  UTF-8 is a specific encoding of
> Unicode.  It has the nifty property that ASCII characters are encoded
> just as in ASCII -- one byte, with the high bit clear, and the low seven
> bits representing a character in the range 0..127.  Characters above the
> ASCII range require multiple bytes -- two, three, or four of them. The
> algorithm is quite clever; find it in The Unicode Standard or with a
> quick Google search.

I'm very familiar with the algorithm and the UTF-8 encoding, thanks.
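
For readers less familiar with the encoding Stephan describes, here is a small Python sketch that prints how many UTF-8 bytes a few sample code points take. Python is only an illustrative choice; nobody in the thread used it. The sketch relies only on the standard str.encode("utf-8"). The helper name utf8_len and the sample characters are hypothetical choices made for illustration.

```python
# Sketch: UTF-8 byte lengths for a few sample code points (illustrative only).
SAMPLES = ["a", "\u00e9", "\u20ac", "\U0001F600"]  # a, e-acute, euro sign, emoji


def utf8_len(ch: str) -> int:
    """Hypothetical helper: bytes needed to encode one character as UTF-8."""
    return len(ch.encode("utf-8"))


for ch in SAMPLES:
    encoded = ch.encode("utf-8")
    # ASCII characters encode to a single byte with the high bit clear.
    if ord(ch) < 0x80:
        assert encoded[0] & 0x80 == 0
    print(f"U+{ord(ch):04X} -> {utf8_len(ch)} byte(s): {encoded.hex(' ')}")

# Expected output:
# U+0061 -> 1 byte(s): 61
# U+00E9 -> 2 byte(s): c3 a9
# U+20AC -> 3 byte(s): e2 82 ac
# U+1F600 -> 4 byte(s): f0 9f 98 80
```

The single ASCII byte matches Stephan's description: its high bit is clear, and the low seven bits carry the character.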

> Another popular encoding is UCS-2, which is roughly "16-bit words each
> holding one Unicode character".
>
> The latter is frequently what people think of as "Unicode".

Yes, that's the one I meant.  Sorry for being imprecise.
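
The sketch below makes the UCS-2 comparison concrete. It hand-rolls a little-endian UCS-2 encoder because Python's standard library does not ship a codec under that name. The function ucs2_encode is a hypothetical helper, not part of any library. It assumes UCS-2 means "one 16-bit unit per character" and rejects code points that UCS-2 cannot hold: anything above U+FFFF, plus the surrogate range. For ASCII text, it produces twice as many bytes as UTF-8.

```python
def ucs2_encode(text: str) -> bytes:
    """Hypothetical helper: encode text as UCS-2, little-endian.

    Every character becomes exactly one 16-bit unit. Code points above
    U+FFFF and surrogate code points (U+D800..U+DFFF) cannot be stored
    as characters in UCS-2, so they raise ValueError.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"U+{cp:04X} is not representable in UCS-2")
        out += cp.to_bytes(2, "little")
    return bytes(out)


ascii_text = "aaaa"
mixed_text = "a\u00e9\u20ac"  # a, e-acute, euro sign

# ASCII: UTF-8 uses 1 byte per character, UCS-2 always uses 2.
print(len(ascii_text.encode("utf-8")), len(ucs2_encode(ascii_text)))  # 4 8
# Mixed: UTF-8 uses 1 + 2 + 3 bytes, UCS-2 uses 2 + 2 + 2.
print(len(mixed_text.encode("utf-8")), len(ucs2_encode(mixed_text)))  # 6 6

# For BMP text, UCS-2 matches Python's UTF-16-LE codec byte for byte.
assert ucs2_encode(mixed_text) == mixed_text.encode("utf-16-le")
```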

> The former (UTF-8) is what perl uses internally to encode characters.
>
> End result is that the perl internal representation in the example above
> probably only needs about 200MB of space, and not double that, as
> suggested.

Umm, that was unclear from the description on the perlunicode manpage.
That, combined with Perl actually taking up 500MB of memory with one string
of roughly 200 million characters, led me to believe that Perl uses UCS-2
internally.
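
Below is a rough worked calculation of the payload size for the original test case. It assumes one byte per character for the all-ASCII string under UTF-8 and two bytes per character under UCS-2. It counts only the character data. Perl's per-scalar overhead and any temporary copies made while building the string are deliberately left out, so this arithmetic does not by itself account for the ~500MB that was observed.

```python
# Worked arithmetic for the test case quoted earlier in the thread:
#   perl -e '$a="a"x(200 * 1024 * 1024); sleep 9'
# Assumption: only the raw character payload is counted (no Perl overhead).
MIB = 1024 * 1024
n_chars = 200 * MIB          # characters produced by the x operator

utf8_bytes = n_chars * 1     # "a" is ASCII: 1 byte per char in UTF-8
ucs2_bytes = n_chars * 2     # UCS-2 would use 2 bytes per char

print(n_chars)                                       # 209715200
print(utf8_bytes // MIB, "MiB if stored as UTF-8")   # 200 MiB ...
print(ucs2_bytes // MIB, "MiB if stored as UCS-2")   # 400 MiB ...
```

Neither figure matches 500MB exactly, which is why the question below remains open.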

Do you have another explanation for the doubled memory consumption?
	Igor
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_		pechtcha AT cs DOT nyu DOT edu
ZZZzz /,`.-'`'    -.  ;-;;,_		igor AT watson DOT ibm DOT com
     |,4-  ) )-,_. ,\ (  `'-'		Igor Pechtchanski, Ph.D.
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

If there's any real truth it's that the entire multidimensional infinity
of the Universe is almost certainly being run by a bunch of maniacs. /DA

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

  Copyright © 2019   by DJ Delorie     Updated Jul 2019