Mail Archives: cygwin/2005/07/27/23:05:06
On Wed, 27 Jul 2005, Stephan Mueller wrote:
> Igor Pechtchanski wrote:
>
> " (I wrote:)
> " > End result is that the perl internal representation in the example
> " > above probably only needs about 200MB of space, and not double that,
> " > as suggested.
> "
> " Umm, that was unclear from the description on the perlunicode manpage.
> " That, combined with Perl actually taking up 500M of memory with one
> " string of 200,000,000 characters, led me to believe that Perl uses
> " UCS-2 internally.
> "
> " Do you have another explanation for the doubled memory consumption?
> " Igor
>
> The admittedly old perl pages (perl 5.6) I have handy right now include
> the following near the top of the perlunicode page. I strongly doubt
> this has changed in 5.8.
>
> Byte and Character semantics
>
> Beginning with version 5.6, Perl uses logically wide characters to
> represent strings internally. This internal representation of strings
> uses the UTF-8 encoding.
Yep, it has changed.  Here's all the Cygwin 5.8.7 manpage has to say on
the matter:

    Byte and Character Semantics

    Beginning with version 5.6, Perl uses logically-wide characters to
    represent strings internally.

    In future, Perl-level operations will be expected to work with
    characters rather than bytes.
There is also some text on encodings, but all it says is that an explicit
"use utf8" pragma is needed to recognize UTF-8 in the Perl script itself
(string literals and the like).  It says nothing about the internal
representation of strings.
> I've also found text suggesting the same in Chapter 15 of the Camel
> book.
>
> Unfortunately, I don't have another explanation for the doubled memory
> consumption.
It could be that the default encoding has changed, and could be forced
back to utf8 by the "use utf8" pragma...  The Perl maintainer might be in
a better position to comment on this.

FWIW, neither "use utf8" nor "use bytes" seems to change the memory
consumption of that sample script.
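
For reference, the arithmetic behind the UCS-2 guess can be sketched as
below.  Python is used here purely as a calculator.  This is not the sample
Perl script from this thread, the ASCII-only stand-in string and all
variable names are assumptions, and it says nothing about what Perl
actually does internally.

```python
# Rough arithmetic only: encoded size of an ASCII-only string in two encodings.
# Assumption: the 200,000,000-character string contains only ASCII characters.
N_CHARS = 200_000_000      # string length discussed in the thread
SAMPLE = "a" * 1000        # small ASCII stand-in, so nothing large is allocated

# Bytes per character, measured on the stand-in string.
utf8_per_char = len(SAMPLE.encode("utf-8")) / len(SAMPLE)
ucs2_per_char = len(SAMPLE.encode("utf-16-le")) / len(SAMPLE)  # same as UCS-2 for ASCII

# Scale up to the full string length, in units of 10**6 bytes.
utf8_mb = N_CHARS * utf8_per_char / 1e6
ucs2_mb = N_CHARS * ucs2_per_char / 1e6

print(f"UTF-8: {utf8_mb:.0f} MB")
print(f"UCS-2: {ucs2_mb:.0f} MB")
```

Neither figure matches the observed 500M exactly, so the encoding alone may
not account for the difference.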
Igor
--
http://cs.nyu.edu/~pechtcha/
|\ _,,,---,,_ pechtcha AT cs DOT nyu DOT edu
ZZZzz /,`.-'`' -. ;-;;,_ igor AT watson DOT ibm DOT com
|,4- ) )-,_. ,\ ( `'-' Igor Pechtchanski, Ph.D.
'---''(_/--' `-'\_) fL a.k.a JaguaR-R-R-r-r-r-.-.-. Meow!
If there's any real truth it's that the entire multidimensional infinity
of the Universe is almost certainly being run by a bunch of maniacs. /DA