Mail Archives: cygwin/2005/07/27/20:11:47

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
MIME-Version: 1.0
Subject: RE: heap_chunk_in_mb default value (Was Re: perl - segfault on "free unused scalar")
Date: Wed, 27 Jul 2005 17:07:23 -0700
Message-ID: <23AA05B1B7171647BC38C5D761900EA40223C7E9@DF-SEADOG-MSG.exchange.corp.microsoft.com>
From: "Stephan Mueller" <smueller AT exchange DOT microsoft DOT com>
To: <cygwin AT cygwin DOT com>, "Krzysztof Duleba" <krzysan AT skrzynka DOT pl>
X-MIME-Autoconverted: from quoted-printable to 8bit by delorie.com id j6S0Bjs3032164

" Igor Pechtchanski wrote:
"
" On Thu, 28 Jul 2005, Krzysztof Duleba wrote:
" > > > I've simplified the test case. It seems that Cygwin perl can't
" > > > handle too much memory. For instance:
" > > >
" > > > $ perl -e '$a="a"x(200 * 1024 * 1024); sleep 9'
" > > >
" > > > OK, this could have failed because $a might require 200 MB of
" > > > continuous space.
" > >
" > > Actually, $a requires *more* than 200MB of continuous space.  Perl
" > > characters are 2 bytes, so you're allocating at least 400MB of
" > > space!
" >
" > Right, UTF. I completely forgot about that.
" 
" Unicode, actually.

Unicode is a standard that defines 'code points' (numeric values) for a
whole lot of different characters.  UTF-8 is a specific encoding of
Unicode.  It has the nifty property that ASCII characters are encoded
just as in ASCII -- one byte, with the high bit clear, and the low seven
bits representing a character in the range 0..127.  Characters above the
ASCII range require multiple bytes -- two, three, or four, depending on
the code point.  The algorithm is quite clever; you can find it in The
Unicode Standard or with a quick Google search.
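
To make the byte counts concrete, here is a small Python sketch (Python
rather than perl, purely for illustration; the helper name utf8_length
is made up for this example).  It only shows how many bytes UTF-8 needs
for a few sample characters and says nothing about perl's internals.

```python
# Illustrative sketch: how many bytes UTF-8 needs per character.
# utf8_length is a hypothetical helper, not part of any library.

def utf8_length(ch: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of ch."""
    return len(ch.encode("utf-8"))

# Escapes are used so the source stays pure ASCII:
# U+0061 'a', U+00E9 e-acute, U+20AC euro sign, U+1F600 an emoji.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]
lengths = [utf8_length(ch) for ch in samples]

# An ASCII character keeps its ASCII value, with the high bit clear.
ascii_byte = "a".encode("utf-8")[0]
high_bit_clear = (ascii_byte & 0x80) == 0

print(lengths, hex(ascii_byte), high_bit_clear)
```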

Another popular encoding is UCS-2, which is roughly "16-bit words each
holding one Unicode character".
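
As a rough illustration of the fixed-width layout, the Python sketch
below encodes a short ASCII string as 16-bit units.  Python has no codec
named UCS-2, so this assumes "utf-16-le" as a stand-in; the two agree
for characters in the Basic Multilingual Plane, which is all this
example uses.

```python
# Illustrative sketch: 16-bit code units, one per character.
# Assumption: "utf-16-le" stands in for UCS-2 (identical for BMP text).

text = "abc"
encoded = text.encode("utf-16-le")

# Each character occupies exactly two bytes, even plain ASCII.
bytes_per_char = len(encoded) // len(text)

print(encoded, bytes_per_char)
```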

UCS-2 is frequently what people think of as "Unicode".  UTF-8 is what
perl uses internally to encode characters.

The end result is that perl's internal representation of the string in
the example above probably needs only about 200MB of space, not double
that, as suggested.
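
The arithmetic behind the two estimates can be sketched in Python as
follows.  This is a back-of-the-envelope calculation only: it counts
payload bytes for 200 * 1024 * 1024 copies of "a" under each encoding
and ignores whatever bookkeeping overhead perl adds.  The names are made
up for this example, and "utf-16-le" is assumed as a stand-in for UCS-2.

```python
# Back-of-the-envelope payload size for "a" x (200 * 1024 * 1024).
# Nothing large is allocated; we scale the per-character size instead.

CHARS = 200 * 1024 * 1024   # string length from the perl one-liner
MB = 1024 * 1024

def bytes_per_char(ch: str, encoding: str) -> int:
    """Size in bytes of a single character under the given encoding."""
    return len(ch.encode(encoding))

utf8_mb = CHARS * bytes_per_char("a", "utf-8") // MB
ucs2_mb = CHARS * bytes_per_char("a", "utf-16-le") // MB  # UCS-2 stand-in

print(utf8_mb, ucs2_mb)
```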

stephan();


