Mail Archives: cygwin/2008/10/02/07:50:20

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Message-ID: <6910a60810020449yc8a5c3fxae4c278944ef3b32@mail.gmail.com>
Date: Thu, 2 Oct 2008 13:49:14 +0200
From: "Reini Urban" <rurban AT x-ray DOT at>
To: "Bernt Røskar Brenna" <bernt DOT brenna AT gmail DOT com>
Subject: Re: Missing file from cygwin's catdoc
Cc: "The Cygwin Mailing List" <cygwin AT cygwin DOT com>
In-Reply-To: <6910a60810011236k7c451cc3y97d2df61687bbd00@mail.gmail.com>
MIME-Version: 1.0
References: <f5ff2d960810010354m2ddd50b2hd7bbcf8eacb22dab AT mail DOT gmail DOT com> <6910a60810011236k7c451cc3y97d2df61687bbd00 AT mail DOT gmail DOT com>
X-Google-Sender-Auth: 7722b1fac4faaa73
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
X-MIME-Autoconverted: from quoted-printable to 8bit by delorie.com id m92BoIXA025086

2008/10/1 Reini Urban:
> 2008/10/1 Bernt Røskar Brenna:
>> I believe that the Cygwin package catdoc is missing the file /etc/catdocrc
>>
>> Without the settings in the file, running catdoc against Word
>> documents with Norwegian characters produces strange results.
>>
>> I have used the following /etc/catdocrc:
>> charset_path=/usr/share/catdoc
>> map_path=/usr/share/catdoc
>> source_charset=cp1252
>> target_charset=8859-1
>> unknown_char='?'
>
> Thanks for this info.
>
>> There is another quite strange matter:
>> 'catdoc test_catdoc.doc' and 'catdoc -d8859-1 test_catdoc.doc'
>> produces different results (with the config file above, that has
>> 8859-1 as default). How is that possible?
>
> Because the defaults are insane.
> in: cp1251 out: koi8-r
>
> ----- version 0.94.2-2 -----
> * Added --with-input=cp1252 --with-output=8859-1
>  Was cp1251 to koi8-r as default
> * Added /etc/catdocrc (thanks to Bernt Røskar Brenna)
>
>> $ catdoc test_catdoc.doc
>> aeoa
>>
>> $ catdoc -d8859-1 test_catdoc.doc
>> æøå

I was wrong before. The source charset is almost always Unicode,
and the target charset is wrongly detected on Cygwin as US-ASCII.
You get "aeoa" because "æøå" translated to US-ASCII is "aeoa".
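
To illustrate why three characters come out as four, here is a minimal
Python sketch of a per-character ASCII fallback. The replacement table
and the function name to_us_ascii are hypothetical, inferred only from
the output quoted above; this is not catdoc's actual code or table.

```python
# Hypothetical replacement table, inferred only from the output quoted in
# this thread ("æøå" -> "aeoa"). catdoc's real table is not shown here.
ASCII_FALLBACK = {"æ": "ae", "ø": "o", "å": "a"}


def to_us_ascii(text, unknown_char="?"):
    """Replace every non-ASCII character with an ASCII stand-in."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        else:
            # Use the table entry if there is one, else the placeholder.
            out.append(ASCII_FALLBACK.get(ch, unknown_char))
    return "".join(out)


print(to_us_ascii("æøå"))  # aeoa
```

With a US-ASCII target, "æ" expands to two letters while "ø" and "å"
lose their diacritics, which would explain the "aeoa" above.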

You can override this with
  use_locale=no
in the catdocrc.
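
As a sketch, a complete /etc/catdocrc combining the settings Bernt
posted with this override might look as follows. Whether the other
keys are still needed once use_locale=no is set is an assumption, not
something verified in this thread.

```
charset_path=/usr/share/catdoc
map_path=/usr/share/catdoc
source_charset=cp1252
target_charset=8859-1
unknown_char='?'
use_locale=no
```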

> I'm just having problems with the new cygport or autoreconfig,
> so it doesn't build yet.
> I hope I can fix it soon.

I have also found and fixed the build problem.
The new release should be out this evening.
-- 
Reini Urban
http://phpwiki.org/              http://murbreak.at/

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

