Mail Archives: cygwin/2017/08/07/15:31:24

Subject: Re: Unicode width data inconsistent/outdated
To: cygwin AT cygwin DOT com
References: <f3c1b415-7a26-8bbe-a67f-5619d356f058 AT towo DOT net> <20170726080859 DOT GA24312 AT calimero DOT vinschen DOT de> <5d3cb047-49f8-26a6-d816-387a71486e99 AT cygwin DOT com> <20170726095016 DOT GA25666 AT calimero DOT vinschen DOT de> <289bd98b-e644-888d-07f8-8965b6538373 AT towo DOT net> <20170728195826 DOT GI24013 AT calimero DOT vinschen DOT de> <1244bd24-bb27-d185-1f24-61beae02c2cd AT towo DOT net> <20170804170156 DOT GL25551 AT calimero DOT vinschen DOT de> <30486790-c59d-9a78-6000-b3c20fb86d9d AT towo DOT net> <20170807092820 DOT GQ25551 AT calimero DOT vinschen DOT de> <401b6d26-35cb-3026-afde-6bd5d09b2d71 AT SystematicSw DOT ab DOT ca>
From: Thomas Wolff <towo AT towo DOT net>
Message-ID: <9f7a8d16-6ebc-52ff-15ae-b1a52d23986b@towo.net>
Date: Mon, 7 Aug 2017 21:30:32 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <401b6d26-35cb-3026-afde-6bd5d09b2d71@SystematicSw.ab.ca>

Hi Brian,

On 07.08.2017 at 21:07, Brian Inglis wrote:
> ...
> Implementation considerations for handling the Unicode tables described in
> 	http://www.unicode.org/versions/Unicode10.0.0/ch05.pdf
> and implemented in
> 	https://www.strchr.com/multi-stage_tables
>
> ICU icu4[cj] uses a folded trie of the properties, where the unique property
> combinations are indexed, strings of those indices are generated for fixed size
> groups of character codes, unique values of those strings are then indexed, and
> those indices assigned to each character code group. The result is a multi-level
> indexing operation that returns the required property combination for each
> character.
>
> https://slidegur.com/doc/4172411/folded-trie--efficient-data-structure-for-all-of-unicode
>
> The FOX Toolkit uses a similar approach, splitting the 21 bit character code
> into 7 bit groups, with two higher levels of 7 bit indices, and more tweaks to
> eliminate redundancy.
>
> ftp://ftp.fox-toolkit.org/pub/FOX_Unicode_Tables.pdf
>
Thanks for the interesting links, I'll check them out.
But such multi-level tables don't really help unless there is a defined 
procedure for updating them (one is available only for the lowest level, 
not for the code-embedded levels).
Also, as I've demonstrated, my more straightforward and more efficient 
approach will even use less total space than the multi-level approach if 
packed table entries are used.
Thomas
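
For readers unfamiliar with the multi-level tables discussed in the quoted
text, here is a minimal sketch of a three-stage lookup that splits a 21-bit
code point into three 7-bit groups, as described above for the FOX Toolkit.
It is not taken from FOX, ICU, or Cygwin. The table contents are toy data
(a single block, U+4E00..U+4E7F, carries the value 1), and all names
(toy_prop, top, mid, leaf) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy data only, NOT real Unicode properties: value 1 marks the single
   block U+4E00..U+4E7F, everything else is 0. All names are hypothetical. */

#define R8(v)   v, v, v, v, v, v, v, v
#define R32(v)  R8(v), R8(v), R8(v), R8(v)
#define R128(v) R32(v), R32(v), R32(v), R32(v)

/* Stage 3: leaf blocks of 128 property values, selected by bits 0..6.
   Identical blocks are stored only once. */
static const uint8_t leaf[2][128] = {
    { 0 },          /* leaf 0: all 128 entries are 0 */
    { R128(1) }     /* leaf 1: all 128 entries are 1 */
};

/* Stage 2: blocks of 128 leaf indices, selected by bits 7..13. */
static const uint8_t mid[2][128] = {
    { 0 },          /* mid 0: every entry points to leaf 0 */
    { [28] = 1 }    /* mid 1: only entry 28 points to leaf 1 */
};

/* Stage 1: index of the mid block, selected by bits 14..20. */
static const uint8_t top[128] = { [1] = 1 };

/* Look up the toy property of a 21-bit code point. */
static unsigned toy_prop(uint32_t cp)
{
    assert(cp < (UINT32_C(1) << 21));
    unsigned hi = cp >> 14;          /* top 7 bits */
    unsigned mi = (cp >> 7) & 0x7F;  /* middle 7 bits */
    unsigned lo = cp & 0x7F;         /* low 7 bits */
    return leaf[mid[top[hi]][mi]][lo];
}
```

With this toy data the three stages occupy 128 + 256 + 256 = 640 bytes,
where a flat one-byte-per-code-point table would need 2^21 bytes. The
upper two stages are exactly the index levels that have to be regenerated
whenever the underlying property data changes.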

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
