Mail Archives: djgpp/1997/12/08/09:33:15

Date: Mon, 8 Dec 1997 16:29:49 +0200 (IST)

From: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>

To: ron aaron <ron AT mossrrrbay DOT com>

cc: djgpp AT delorie DOT com

Subject: Re: Help with mkid/lid and Hebrew text

In-Reply-To: <66absv$fb@bgtnsc01.worldnet.att.net>

Message-ID: <Pine.SUN.3.91.971208162928.4200l-100000@is>

MIME-Version: 1.0

On Fri, 5 Dec 1997, ron aaron wrote:

> I have a Hebrew text corpus which I would like to index with mkid/lid.  I
> have been able to mkid ok, and lid ".*" dumps all the tokens, but I can't do
> 'lid (hebrew text)'.

If I'm not mistaken, the tokens you see dumped by `lid' do NOT include
Hebrew words.  Is that true?  If so, that's because ID-Utils do not
treat characters with codes between 128 and 192 (outside the 7-bit
ASCII range) as word characters.  It should be a simple matter to
change ID-Utils and recompile them so that they do support such
characters.  (I can provide the necessary details, if you want.)
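
To illustrate the idea only (this is not the ID-Utils source, which is written in C and organized differently), here is a minimal sketch of a scanner that classifies bytes through a 256-entry table. The names `make_word_table` and `tokenize`, the `high_range` parameter, and the sample bytes are all hypothetical; the range 128..192 is simply the one mentioned above.

```python
import string

def make_word_table(high_range=None):
    """Return a 256-entry list; table[b] is True if byte b is a word character.

    By default only ASCII letters, digits and '_' count as word characters.
    high_range is a hypothetical (low, high) pair of byte values, inclusive,
    that should also be accepted.
    """
    table = [False] * 256
    for ch in string.ascii_letters + string.digits + "_":
        table[ord(ch)] = True
    if high_range is not None:
        low, high = high_range
        for b in range(low, high + 1):
            table[b] = True
    return table

def tokenize(data, table):
    """Split a bytes object into maximal runs of word characters."""
    tokens = []
    current = bytearray()
    for b in data:  # iterating over bytes yields integers 0..255
        if table[b]:
            current.append(b)
        elif current:
            tokens.append(bytes(current))
            current.clear()
    if current:
        tokens.append(bytes(current))
    return tokens

# Arbitrary sample: two ASCII words around a run of three high-bit bytes.
sample = b"foo \x80\x81\x82 bar_1"

ascii_only = tokenize(sample, make_word_table())
extended = tokenize(sample, make_word_table((128, 192)))
```

With the default table the run of high-bit bytes is treated as a separator and never becomes a token; once the table is extended, the same run is returned as a token of its own. A change along these lines in the real scanner would presumably have the same effect on what ends up in the index, but the actual details depend on the ID-Utils code.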
