Mail Archives: cygwin/2005/07/27/20:11:47
"Igor Pechtchanski wrote:
"
" On Thu, 28 Jul 2005, Krzysztof Duleba wrote:
" > > > I've simplified the test case. It seems that Cygwin perl can't
" > > > handle too much memory. For instance:
" > > >
" > > > $ perl -e '$a="a"x(200 * 1024 * 1024); sleep 9'
" > > >
" > > > OK, this could have failed because $a might require 200 MB of
" > > > continuous space.
" > >
" > > Actually, $a requires *more* than 200MB of continuous space. Perl
" > > characters are 2 bytes, so you're allocating at least 400MB of
" > > space!
" >
" > Right, UTF. I completely forgot about that.
"
" Unicode, actually.
Unicode is a standard that defines 'code points' (numeric values) for a
whole lot of different characters. UTF-8 is a specific encoding of
Unicode. It has the nifty property that ASCII characters are encoded
just as in ASCII -- one byte, with the high bit clear, and the low seven
bits representing a character in the range 0..127. Characters above the
ASCII range require multiple bytes -- sometimes two, sometimes more.
The algorithm is quite clever; find it in The Unicode Standard or with a
quick Google search.
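For the curious, here is a rough Python sketch of that algorithm for a
single code point. The function name utf8_encode is made up for
illustration, and this is only a minimal sketch of the byte layout, not
how perl itself implements it:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hypothetical helper)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value: %#x" % cp)
    if cp < 0x80:
        # ASCII: one byte, high bit clear, low seven bits hold the value.
        return bytes([cp])
    if cp < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# Compare the sketch against Python's built-in encoder.
for ch in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
    print("U+%04X -> %d byte(s)" % (ord(ch), len(utf8_encode(ord(ch)))))
```

ASCII code points come out as a single byte with the high bit clear;
everything else takes two to four bytes.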
Another popular encoding is UCS-2, which is roughly "16-bit words each
holding one Unicode character".
UCS-2 is frequently what people think of as "Unicode". UTF-8 is what
perl uses internally to encode characters.
End result is that the perl internal representation in the example above
probably only needs about 200MB of space (each "a" is ASCII, so it takes
one byte in UTF-8), and not double that, as suggested.
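The arithmetic behind those two figures can be checked with a few lines
of Python. This is only a sketch of the encoded sizes: it uses Python's
"utf-16-le" codec as a stand-in for UCS-2 (the two agree for characters
like "a"), the helper name encoded_size is made up, and it ignores any
per-string overhead or temporary copies the perl interpreter may make:

```python
MB = 1024 * 1024
n = 200 * MB  # same repeat count as the perl one-liner


def encoded_size(ch: str, count: int, encoding: str) -> int:
    """Bytes needed for `count` copies of `ch` in `encoding` (hypothetical helper)."""
    # Compute the size arithmetically rather than building a huge string.
    return len(ch.encode(encoding)) * count


utf8_bytes = encoded_size("a", n, "utf-8")      # one byte per "a"
ucs2_bytes = encoded_size("a", n, "utf-16-le")  # two bytes per "a"

print("UTF-8: %d MB" % (utf8_bytes // MB))
print("UCS-2: %d MB" % (ucs2_bytes // MB))
```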
stephan();
--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Problem reports: http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/