Mail Archives: cygwin/2005/07/29/13:10:17

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm

List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>

List-Archive: <http://sourceware.org/ml/cygwin/>

List-Post: <mailto:cygwin AT cygwin DOT com>

List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>

Sender: cygwin-owner AT cygwin DOT com

Mail-Followup-To: cygwin AT cygwin DOT com

Delivered-To: mailing list cygwin AT cygwin DOT com

Date: Fri, 29 Jul 2005 10:09:59 -0700

From: Yitzchak Scott-Thoennes <sthoenna AT efn DOT org>

To: cygwin AT cygwin DOT com

Subject: Re: heap_chunk_in_mb default value (Was Re: perl - segfault on "free unused scalar")

Message-ID: <20050729170958.GA872@efn.org>

References: <23AA05B1B7171647BC38C5D761900EA40223C7E9 AT DF-SEADOG-MSG DOT exchange DOT corp DOT microsoft DOT com>

Mime-Version: 1.0

In-Reply-To: <23AA05B1B7171647BC38C5D761900EA40223C7E9@DF-SEADOG-MSG.exchange.corp.microsoft.com>

User-Agent: Mutt/1.4.2.1i

X-IsSubscribed: yes

On Wed, Jul 27, 2005 at 05:07:23PM -0700, Stephan Mueller wrote:
> "Igor Pechtchanski wrote:
> "
> " On Thu, 28 Jul 2005, Krzysztof Duleba wrote:
> " > > > I've simplified the test case. It seems that Cygwin perl can't
> " > > > handle too much memory. For instance:
> " > > >
> " > > > $ perl -e '$a="a"x(200 * 1024 * 1024); sleep 9'
> " > > >
> " > > > OK, this could have failed because $a might require 200 MB of
> " > > > continuous space.
> " > >
> " > > Actually, $a requires *more* than 200MB of continuous space.  Perl
> " > > characters are 2 bytes, so you're allocating at least 400MB of
> " > > space!
> " >
> " > Right, UTF. I completely forgot about that.
> " 
> " Unicode, actually.
> 
> Unicode is a standard that defines 'code points' (numeric values) for a
> whole lot of different characters.  UTF-8 is a specific encoding of
> Unicode.  It has the nifty property that ASCII characters are encoded
> just as in ASCII -- one byte, with the high bit clear, and the low seven
> bits representing a character in the range 0..127.  Characters above the
> ASCII range require multiple bytes -- sometimes two, sometimes more.
> The algorithm is quite clever; find it in The Unicode Standard or with a
> quick Google search.
> 
> Another popular encoding is UCS-2, which is roughly "16-bit words each
> holding one Unicode character".
> 
> The latter is frequently what people think of as "Unicode".  The former
> is what perl uses internally to encode characters.
> 
> End result is that the perl internal representation in the example above
> probably only needs about 200MB of space, and not double that, as
> suggested.

Correct; perl uses UTF-8 internally (actually, an extension of UTF-8
that allows code points up to 2**72-1).
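
To make the size difference concrete, here is a rough sketch in Python
(not perl) that counts bytes per character in UTF-8 versus a fixed
16-bit encoding. The helper name encoded_sizes and the sample
characters are my own picks for illustration, and "utf-16-le" is only
a stand-in for UCS-2, which holds for the Basic Multilingual Plane
characters used here.

```python
# Byte counts for a few characters in UTF-8 versus a fixed 16-bit encoding.
# Assumption: "utf-16-le" stands in for UCS-2; the two agree for these samples.
samples = ["a", "\u00e9", "\u20ac"]  # ASCII letter, e-acute, euro sign


def encoded_sizes(ch):
    """Return (utf8_bytes, ucs2_bytes) for a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in samples:
    utf8_len, ucs2_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UCS-2 {ucs2_len} byte(s)")

# The string from the test case: 200 * 1024 * 1024 copies of "a".
n_chars = 200 * 1024 * 1024
utf8_total = n_chars * encoded_sizes("a")[0]
ucs2_total = n_chars * encoded_sizes("a")[1]
print(utf8_total // (1024 * 1024), "MB as UTF-8")
print(ucs2_total // (1024 * 1024), "MB as UCS-2")
```

For an all-ASCII string the UTF-8 form is one byte per character, which
is where the 200MB figure comes from; the 400MB figure would only apply
to a two-byte-per-character representation.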

However, code like the quoted perl one-liner does end up using twice
the space: the string is allocated once to hold the result of the x
operation and again when it's copied to $a.
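
As back-of-the-envelope arithmetic, the doubling can be sketched like
this in Python. It is a simplified model of my own, not a measurement:
peak_bytes is a hypothetical helper, and it ignores perl's per-scalar
overhead and whatever the allocator does underneath.

```python
MB = 1024 * 1024


def peak_bytes(n_chars, bytes_per_char=1, copies=2):
    """Estimate peak string storage, in bytes.

    Assumptions: one byte per character (ASCII stored as UTF-8) and two
    live buffers at the peak, namely the temporary produced by the x
    operator and the copy held by the assigned variable.
    """
    return n_chars * bytes_per_char * copies


one_buffer = peak_bytes(200 * MB, copies=1)
both_buffers = peak_bytes(200 * MB)
print(one_buffer // MB, "MB for a single buffer")
print(both_buffers // MB, "MB with the temporary and the copy both live")
```

So the 400MB total still shows up, but it comes from holding two
one-byte-per-character buffers at once, not from two-byte characters.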

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
