Date: Mon, 8 Dec 1997 16:29:49 +0200 (IST)
From: Eli Zaretskii
To: ron aaron
cc: djgpp AT delorie DOT com
Subject: Re: Help with mkid/lid and Hebrew text
In-Reply-To: <66absv$fb@bgtnsc01.worldnet.att.net>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk

On Fri, 5 Dec 1997, ron aaron wrote:

> I have a Hebrew text corpus which I would like to index with mkid/lid. I
> have been able to mkid ok, and lid ".*" dumps all the tokens, but I can't do
> 'lid (hebrew text)'.

If I'm not mistaken, the tokens you see dumped by `lid' do NOT include
Hebrew words. Is that true? If so, that's because ID-Utils do not treat
characters with ASCII codes between 128 and 192 as word characters.

It should be a simple matter to change ID-Utils and recompile them so
that they do support such characters. (I can provide the necessary
details, if you want.)
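
To illustrate the effect, here is a minimal sketch in C. It is NOT the actual ID-Utils source: the names `is_word_char' and `count_tokens' and the `extend_high' switch are hypothetical, made up for this example. It assumes the Hebrew text uses a DOS code page such as CP862, where the Hebrew letters have codes from 128 upwards, and it only shows why a scanner that rejects those codes never produces a Hebrew token.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classifier, not taken from ID-Utils.
   With extend_high == 0 only ASCII letters, digits and '_' count as
   word characters; with extend_high != 0, codes 128..255 count too. */
int is_word_char(unsigned char c, int extend_high)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        (c >= '0' && c <= '9') || c == '_')
        return 1;
    return extend_high && c >= 128;
}

/* Count maximal runs of word characters in a NUL-terminated string. */
size_t count_tokens(const char *text, int extend_high)
{
    const unsigned char *p;
    size_t count = 0;
    int in_token = 0;

    assert(text != NULL);
    for (p = (const unsigned char *)text; *p != '\0'; p++) {
        int w = is_word_char(*p, extend_high);
        if (w && !in_token)
            count++;            /* a new token starts here */
        in_token = w;
    }
    return count;
}
```

With the string "abc \x80\x81\x82 def" (three bytes in the Hebrew range between two ASCII words), `count_tokens' finds two tokens when `extend_high' is 0 and three when it is nonzero. The real change in ID-Utils would have to be made wherever its scanner classifies characters, and may look quite different.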