Mail Archives: djgpp/1997/12/08/09:33:15

Date: Mon, 8 Dec 1997 16:29:49 +0200 (IST)
From: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
To: ron aaron <ron AT mossrrrbay DOT com>
cc: djgpp AT delorie DOT com
Subject: Re: Help with mkid/lid and Hebrew text
In-Reply-To: <66absv$fb@bgtnsc01.worldnet.att.net>
Message-ID: <Pine.SUN.3.91.971208162928.4200l-100000@is>
MIME-Version: 1.0

On Fri, 5 Dec 1997, ron aaron wrote:

> I have a Hebrew text corpus which I would like to index with mkid/lid.  I
> have been able to mkid ok, and lid ".*" dumps all the tokens, but I can't do
> 'lid (hebrew text)'.

If I'm not mistaken, the tokens you see dumped by `lid' do NOT include
Hebrew words.  Is that true?  If so, that's because ID-Utils do not
treat characters with codes between 128 and 192 (8-bit codes, beyond
the 7-bit ASCII range) as word characters.  It should be a simple
matter to change ID-Utils and recompile them so that they do support
such characters.  (I can provide the necessary details, if you want.)
