Mail Archives: cygwin/2011/03/20/15:14:21

X-Recipient: archive-cygwin AT delorie DOT com
X-SWARE-Spam-Status: No, hits=-1.0 required=5.0 tests=AWL,BAYES_40,DKIM_SIGNED,DKIM_VALID,RCVD_IN_DNSWL_LOW
X-Spam-Check-By: sourceware.org
Message-ID: <4D8651F2.3000200@cwilson.fastmail.fm>
Date: Sun, 20 Mar 2011 15:13:54 -0400
From: Charles Wilson <cygwin AT cwilson DOT fastmail DOT fm>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.23) Gecko/20090812 Thunderbird/2.0.0.23 Mnenhy/0.7.6.666
MIME-Version: 1.0
To: Cygwin Mailing List <cygwin AT cygwin DOT com>
Subject: cygwin + GetConsoleOutputCP
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

Question about porting the upstream "dos2unix" utilities.  These
implementations can convert text files from a limited set of INPUT
encodings (most of them DOS codepages):

=====================================================
CONVERSION MODES
       Conversion modes ascii, 7bit, and iso are
       similar to those of dos2unix/unix2dos under
       SunOS/Solaris.

       ascii
           In mode "ascii" only line breaks are
           converted. This is the default conversion
           mode.

           Although the name of this mode is ASCII,
           which is a 7 bit standard, the actual mode
           is 8 bit. Always use this mode when
           converting Unicode UTF-8 files.

       7bit
           In this mode all 8 bit non-ASCII characters
           (with values from 128 to 255) are converted
           to a 7 bit space.

       iso
           Characters are converted between a DOS
           character set (code page) and ISO character
           set ISO-8859-1 (Latin-1) on Unix. DOS
           characters without an ISO-8859-1 equivalent,
           for which conversion is not possible, are
           converted to a dot. The same applies to
           ISO-8859-1 characters without a DOS
           counterpart.

           When only option "-iso" is used dos2unix
           will try to determine the active code page.
           When this is not possible dos2unix will use
           default code page CP437, which is mainly
           used in the USA.  To force a specific code
           page use options "-437" (US), "-850"
           (Western European), "-860" (Portuguese),
           "-863" (French Canadian), or "-865"
           (Nordic).  Windows code page CP1252
           (Western European) is also supported with
           option "-1252". For other code pages use
           dos2unix in combination with iconv(1).
           Iconv can convert between a long list of
           character encodings.
=====================================================
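For illustration, the "7bit" and "iso" modes described in the quoted man page boil down to per-byte mappings. The C sketch below is not taken from the dos2unix sources and the function names are hypothetical. The CP437 table is deliberately truncated to bytes 0x80..0x87, so unlike the real tool it also turns convertible bytes such as 0x88 (e-circumflex) into dots.

```c
#include <stddef.h>

/* Hypothetical helper, not dos2unix source: "7bit" mode as the man page
 * describes it, replacing every byte in the range 128..255 with a space. */
unsigned char to_7bit(unsigned char c)
{
    return (c >= 128) ? (unsigned char)' ' : c;
}

/* Truncated CP437 -> ISO-8859-1 table covering only CP437 bytes 0x80..0x87:
 * C-cedilla, u-umlaut, e-acute, a-circumflex, a-umlaut, a-grave, a-ring,
 * c-cedilla. A real table would cover all of 0x80..0xFF. */
static const unsigned char cp437_to_latin1_80_87[8] = {
    0xC7, 0xFC, 0xE9, 0xE2, 0xE4, 0xE0, 0xE5, 0xE7
};

/* "iso" mode for CP437 input: ASCII passes through unchanged, bytes found
 * in the (truncated) table are mapped, and everything else is treated as
 * unconvertible and becomes '.'. */
unsigned char cp437_to_iso(unsigned char c)
{
    if (c < 128)
        return c;
    if (c <= 0x87)
        return cp437_to_latin1_80_87[c - 0x80];
    return (unsigned char)'.';
}

/* Apply a per-byte conversion function to a buffer in place. */
void convert_buffer(unsigned char *buf, size_t len,
                    unsigned char (*conv)(unsigned char))
{
    for (size_t i = 0; i < len; i++)
        buf[i] = conv(buf[i]);
}
```

A complete implementation would presumably need a full 128-entry table per supported code page, plus reverse tables for the unix2dos direction.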

So basically if you specify -iso (or --conv iso) without any of the
"input encoding specification" options like -437 etc., then dos2unix
will attempt to autodetect the *console* encoding.  If it succeeds, it
will "convert" character codes from that encoding to their equivalents
in ISO-8859-1 ("Latin 1"), replacing unconvertible codes with an ASCII
dot.
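The default-selection logic described above (an explicit code page option first, then console detection, then CP437) can be sketched as follows. The names are hypothetical and this is not the upstream code. Treating a detected value of 0 as "detection failed" is an assumption of this sketch, and since the quoted text does not say what happens when the console reports a page outside the supported list, that check is exposed separately rather than guessed at.

```c
#include <stddef.h>

/* Code pages the quoted man page lists as directly supported. */
static const unsigned short supported_cps[] = { 437, 850, 860, 863, 865, 1252 };

/* Return 1 if cp is one of the directly supported code pages, else 0. */
int is_supported_codepage(unsigned short cp)
{
    for (size_t i = 0; i < sizeof supported_cps / sizeof supported_cps[0]; i++)
        if (supported_cps[i] == cp)
            return 1;
    return 0;
}

/* Pick the input code page for "iso" mode.
 *   explicit_cp: code page from an option such as -850, or 0 if none given.
 *   detected_cp: result of the console query, or 0 if detection failed
 *                (the 0-means-failure convention is an assumption here).
 * An explicit option wins; otherwise the detected page is used; otherwise
 * fall back to CP437 as the man page describes. */
unsigned short choose_input_codepage(unsigned short explicit_cp,
                                     unsigned short detected_cp)
{
    if (explicit_cp != 0)
        return explicit_cp;
    if (detected_cp != 0)
        return detected_cp;
    return 437;
}
```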

Note that this autodetection, if it works, assumes that the console's
CP is the input file's CP.  Fair enough; it's an overridable default
anyway.  However, I wonder whether, in cygwin-1.7, we actually can or
should use the "console codepage" in ANY way.  Here's the code:

querycp.c:
#elif defined (WIN32) || defined(__CYGWIN__)

/* Erwin Waterlander */

#include <windows.h>
unsigned short query_con_codepage(void) {
   return((unsigned short)GetConsoleOutputCP());
}
#else

Or should we instead, on cygwin, use some other mechanism (locale
settings?) to determine the correct default "input" codepage?
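If the answer is "use the locale", one possible shape is to ask the C library for the charset of the user's LC_CTYPE locale through the POSIX nl_langinfo(CODESET) call instead of calling GetConsoleOutputCP(). This is a hypothetical sketch: it assumes Cygwin reports DOS code pages with charset names of the form "CP850" (e.g. for LANG=en_US.CP850), and it returns 0 for anything else, including UTF-8, so the caller would still need a fallback such as CP437.

```c
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: parse a charset name of the form "CPnnn"
 * (case-insensitive) into a numeric code page. Returns 0 for any other
 * name, e.g. "UTF-8" or "ISO-8859-1". */
unsigned short codepage_from_charset(const char *charset)
{
    if (charset == NULL)
        return 0;
    /* Short-circuit evaluation never reads past the terminating NUL. */
    if (tolower((unsigned char)charset[0]) != 'c' ||
        tolower((unsigned char)charset[1]) != 'p' ||
        !isdigit((unsigned char)charset[2]))
        return 0;
    char *end;
    unsigned long cp = strtoul(charset + 2, &end, 10);
    if (*end != '\0' || cp == 0 || cp > 65535)
        return 0;
    return (unsigned short)cp;
}

/* Hypothetical replacement for query_con_codepage(): derive the code page
 * from the user's locale (LC_ALL / LC_CTYPE / LANG) rather than from the
 * console. Note that setlocale() changes process-wide LC_CTYPE state. */
unsigned short locale_codepage(void)
{
    setlocale(LC_CTYPE, "");
    return codepage_from_charset(nl_langinfo(CODESET));
}
```

With a UTF-8 locale this yields 0, which fits the man page's advice to use "ascii" mode rather than "iso" mode for UTF-8 files.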

Comments?

--
Chuck




--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
