Subject: RE: heap_chunk_in_mb default value (Was Re: perl - segfault on "free unused scalar")
Date: Wed, 27 Jul 2005 17:07:23 -0700
Message-ID: <23AA05B1B7171647BC38C5D761900EA40223C7E9@DF-SEADOG-MSG.exchange.corp.microsoft.com>
From: "Stephan Mueller"
To: , "Krzysztof Duleba"

Igor Pechtchanski wrote:
> On Thu, 28 Jul 2005, Krzysztof Duleba wrote:
> > > > I've simplified the test case. It seems that Cygwin perl can't
> > > > handle too much memory. For instance:
> > > >
> > > > $ perl -e '$a="a"x(200 * 1024 * 1024); sleep 9'
> > > >
> > > > OK, this could have failed because $a might require 200 MB of
> > > > continuous space.
> > >
> > > Actually, $a requires *more* than 200MB of continuous space. Perl
> > > characters are 2 bytes, so you're allocating at least 400MB of space!
> >
> > Right, UTF. I completely forgot about that.
>
> Unicode, actually.

Unicode is a standard that defines 'code points' (numeric values) for a whole lot of different characters. UTF-8 is one specific encoding of Unicode. It has the nifty property that ASCII characters are encoded just as in ASCII -- one byte, with the high bit clear, and the low seven bits representing a character in the range 0..127. Characters above the ASCII range require multiple bytes: two, three, or four, depending on the code point. The algorithm is quite clever; you can find it in The Unicode Standard or with a quick Google search. Another popular encoding is UCS-2, which is roughly "16-bit words each holding one Unicode character".
The latter (UCS-2) is frequently what people think of as "Unicode". The former (UTF-8) is what perl uses internally to encode characters. End result: every character in the example above is the ASCII character "a", which takes a single byte in UTF-8, so the perl internal representation probably needs only about 200MB of space, not double that as suggested.

stephan();
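To make the size argument concrete, here is a minimal Python sketch (Python is used only for illustration; this is not perl's implementation) of the UTF-8 algorithm described above, following the byte ranges in RFC 3629, with Python's built-in `str.encode("utf-8")` as a cross-check. The helper names `utf8_encode_code_point`, `utf8_size` and `ucs2_size` are hypothetical. It computes the size of the 200 * 1024 * 1024-character string arithmetically rather than allocating it, and it ignores any per-scalar overhead perl adds, so the numbers are raw encoding sizes only.

```python
# Illustrative sketch only: hand-rolled UTF-8 encoding (RFC 3629 byte ranges)
# and raw byte counts for UTF-8 versus UCS-2. This is not perl's implementation.


def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 bytes."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        # ASCII: one byte, high bit clear, identical to plain ASCII.
        return bytes([cp])
    if cp < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_size(text: str) -> int:
    """Number of bytes `text` occupies when encoded as UTF-8."""
    return sum(len(utf8_encode_code_point(ord(ch))) for ch in text)


def ucs2_size(text: str) -> int:
    """Number of bytes in UCS-2: two per character, BMP characters only."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent characters outside the BMP")
    return 2 * len(text)


if __name__ == "__main__":
    # Cross-check the hand-rolled encoder against Python's built-in codec.
    for ch in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
        encoded = utf8_encode_code_point(ord(ch))
        assert encoded == ch.encode("utf-8")
        print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} byte(s))")

    # The string from the thread: 200 * 1024 * 1024 copies of "a",
    # sized per character instead of allocating 200MB here.
    n = 200 * 1024 * 1024
    mb = 1024 * 1024
    print(n * utf8_size("a") // mb, "MB as UTF-8")  # 1 byte per ASCII char
    print(n * ucs2_size("a") // mb, "MB as UCS-2")  # 2 bytes per char
```

Running it prints 200 MB as UTF-8 and 400 MB as UCS-2, the arithmetic behind the two estimates in the thread; how much perl actually allocates depends on its internals, which this sketch does not model.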