Mail Archives: cygwin/2009/11/06/10:25:15
On Nov 6 16:00, Thomas Wolff wrote:
> Corinna Vinschen wrote:
> >I created a simple testcase:
> >
> >==== SNIP ===
> >...
> >==== SNAP ====
> I extended your test program to demonstrate the inefficiency of the
> standard mbrtowc function. [...]
> >Under Cygwin (tcsh time output):
> >
> > $ setenv LANG en_US.UTF-8
> > $ time ./mb 1000000 1 0
> > with malloc: 1, with mbrtowc: 0
> > 0.328u 0.031s 0:00.34 102.9% 0+0k 0+0io 1834pf+0w
> > $ time ./mb 1000000 0 1
> > with malloc: 0, with mbrtowc: 1
> > 1.921u 0.092s 0:02.09 96.1% 0+0k 0+0io 1827pf+0w
> > $ time ./mb 1000000 1 1
> > with malloc: 1, with mbrtowc: 1
> > 2.062u 0.140s 0:02.15 102.3% 0+0k 0+0io 1839pf+0w
> >
> >Running on the same CPU under Linux:
> >
> > $ setenv LANG en_US.UTF-8
> > $ time ./mb 1000000 1 0
> > with malloc: 1, with mbrtowc: 0
> > 0.088u 0.004s 0:00.09 88.8% 0+0k 0+0io 0pf+0w
> > $ time ./mb 1000000 0 1
> > with malloc: 0, with mbrtowc: 1
> > 1.836u 0.000s 0:01.85 98.9% 0+0k 0+0io 0pf+0w
> > $ time ./mb 1000000 1 1
> > with malloc: 1, with mbrtowc: 1
> > 1.888u 0.000s 0:01.93 97.4% 0+0k 0+0io 0pf+0w
> >
> >So, while Linux is definitely faster, the numbers are still comparable
> >for 1 million iterations. That still doesn't explain why grep is
> >many times slower when using UTF-8 as charset.
> Results of mbrtowc vs. utftouni on Linux:
>
> thw[en_US.UTF-8]@scotty:~/tmp: locale charmap
> UTF-8
> thw[en_US.UTF-8]@scotty:~/tmp: time ./uu 1000000 0 1 0
> with malloc: 0, with mbrtowc: 1, with utftouni: 0
>
> real 0m2.897s
> user 0m2.836s
> sys 0m0.012s
> thw[en_US.UTF-8]@scotty:~/tmp: time ./uu 1000000 0 0 1
> with malloc: 0, with mbrtowc: 0, with utftouni: 1
>
> real 0m0.030s
> user 0m0.028s
> sys 0m0.000s
> thw[en_US.UTF-8]@scotty:~/tmp:
> [...]
> The conclusion is: as long as calling mbrtowc is this inefficient, a
> program that cares about performance should not use it.

That's sort of an unfair test. Your utftouni function doesn't handle
mbstate, errors, or surrogate pairs.
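
For reference, a minimal sketch of what a caller of mbrtowc has to deal
with, and what a fair replacement would need to cover as well: the
mbstate_t conversion state and the (size_t)-1 / (size_t)-2 error returns.
The helper name count_wchars is made up for illustration; it is not part
of either test program.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the wchar_t units mbrtowc produces for the
   first n bytes of s in the current locale's encoding.  Returns
   (size_t)-1 on an invalid or truncated sequence. */
size_t
count_wchars (const char *s, size_t n)
{
  mbstate_t st;
  size_t count = 0;

  memset (&st, 0, sizeof st);        /* start in the initial state */
  while (n > 0)
    {
      wchar_t wc;
      size_t len = mbrtowc (&wc, s, n, &st);

      if (len == (size_t) -1)        /* invalid sequence (EILSEQ) */
        return (size_t) -1;
      if (len == (size_t) -2)        /* incomplete sequence at end of input */
        return (size_t) -1;
      if (len == 0)                  /* decoded L'\0'; assume it took one byte,
                                        as in UTF-8 and the C locale */
        len = 1;
      s += len;
      n -= len;
      ++count;
    }
  return count;
}
```

Without a setlocale call this runs in the "C" locale, so plain ASCII
input is counted byte by byte; with LANG set to a UTF-8 locale and
setlocale (LC_ALL, "") it takes the multibyte path measured above.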

Having said that, I just experimented further with mbrtowc, and I was
able to speed up mbrtowc and wcrtomb calls on Cygwin by almost 50 per
cent, just by reducing the function call depth in newlib. The deep call
chain is the result of reentrancy and isolation efforts.

As for your implementation: if you could come up with a faster version
of newlib's __utf8_wctomb/__utf8_mbtowc, it would certainly be another
welcome performance boost.
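
To give an idea of the direction, here is a rough sketch of a stateless
UTF-8 decoder that still rejects malformed input. It is neither newlib's
__utf8_mbtowc nor your utftouni; the name utf8_decode and its signature
are invented for illustration. It keeps no mbstate (so it cannot resume
in the middle of a sequence) and returns a full code point rather than
UTF-16 surrogates, so it is only a starting point.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: decode one code point from s, with n bytes
   available.  Returns the number of bytes consumed (1..4), or 0 if the
   input is truncated or not well-formed UTF-8. */
size_t
utf8_decode (const unsigned char *s, size_t n, uint32_t *cp)
{
  uint32_t c, min;
  size_t len, i;

  if (n == 0)
    return 0;
  c = s[0];
  if (c < 0x80)                      /* ASCII fast path */
    {
      *cp = c;
      return 1;
    }
  if ((c & 0xe0) == 0xc0)      { len = 2; c &= 0x1f; min = 0x80; }
  else if ((c & 0xf0) == 0xe0) { len = 3; c &= 0x0f; min = 0x800; }
  else if ((c & 0xf8) == 0xf0) { len = 4; c &= 0x07; min = 0x10000; }
  else
    return 0;                        /* stray continuation byte or 0xf8..0xff */
  if (n < len)
    return 0;                        /* truncated sequence */
  for (i = 1; i < len; i++)
    {
      if ((s[i] & 0xc0) != 0x80)     /* not a continuation byte */
        return 0;
      c = (c << 6) | (s[i] & 0x3f);
    }
  /* Reject overlong forms, values above U+10FFFF, and encoded surrogates. */
  if (c < min || c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff))
    return 0;
  *cp = c;
  return len;
}
```

Whether something along these lines still beats the current code once
the state and surrogate handling are added back is exactly what would
need measuring.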
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat