Mail Archives: cygwin/2021/06/25/05:01:30

X-Recipient: archive-cygwin AT delorie DOT com
X-Original-To: cygwin AT cygwin DOT com
Delivered-To: cygwin AT cygwin DOT com
To: "cygwin AT cygwin DOT com" <cygwin AT cygwin DOT com>
Subject: RE: getclip and putclip garble unicode characters
Thread-Topic: getclip and putclip garble unicode characters
Thread-Index: AddoNN8dUidQoEiqTx6xJKaDlsTnngAdSooAABRJ19A=
Date: Fri, 25 Jun 2021 09:00:09 +0000
Message-ID: <c7597576edad43a1ae1a8a37ab47bd62@severstal.com>
References: <af6d2888db8e4256b8c24064389aed9b AT severstal DOT com>
<1442655532 DOT 20210624093554 AT yandex DOT ru>
In-Reply-To: <1442655532.20210624093554@yandex.ru>
Accept-Language: en-GB, en-US
X-BeenThere: cygwin AT cygwin DOT com
X-Mailman-Version: 2.1.29
List-Id: General Cygwin discussions and problem reports <cygwin.cygwin.com>
List-Archive: <https://cygwin.com/pipermail/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-request AT cygwin DOT com?subject=help>
List-Subscribe: <https://cygwin.com/mailman/listinfo/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=subscribe>
From: Миронов Леонид Владимирович via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Миронов Леонид Владимирович <lv DOT mironov AT severstal DOT com>
Sender: "Cygwin" <cygwin-bounces+archive-cygwin=delorie DOT com AT cygwin DOT com>

As far as copying from Cygwin to Windows is concerned, the result is exactly the same in every Windows program I tried pasting into: Word, Outlook, Chrome, the console, you name it. Changing the Windows keyboard language has no effect either; Windows still stubbornly treats the clipboard contents as cp1252. (I don't quite see how switching the keyboard language is supposed to help anyway: data on the clipboard is not limited to a single one-byte codepage.)
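The Cygwin-to-Windows symptom can be reproduced without the clipboard at all. The following is a minimal Python sketch, assuming (as the observations above suggest) that putclip hands Windows the UTF-8 bytes of the text and Windows then reads those bytes in cp1252. The function names are hypothetical, chosen for illustration only.

```python
# Simulate the putclip -> Windows direction.
# Assumption: the clipboard receives the UTF-8 bytes of the text and
# Windows interprets those bytes in the cp1252 (Western European) codepage.

def windows_view_of_utf8(text: str) -> str:
    """Return what a reader decoding the UTF-8 bytes as cp1252 would display."""
    raw = text.encode("utf-8")
    # errors="replace": Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90
    # and 0x9D undefined, whereas Windows maps them to control characters.
    return raw.decode("cp1252", errors="replace")


def undo_cp1252_mojibake(garbled: str) -> str:
    """Reverse the garbling, valid only when no byte was lost to replacement."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    original = "Привет"
    garbled = windows_view_of_utf8(original)
    print(garbled)                        # ÐŸÑ€Ð¸Ð²ÐµÑ‚
    print(undo_cp1252_mojibake(garbled))  # Привет
```

Because every UTF-8 byte is kept (just displayed with the wrong codepage), nothing is lost in this direction, which is consistent with getclip returning the text exactly as it was put.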

At first I missed that when copying from Windows to Cygwin, getclip actually receives the data in cp1251 (the Windows ANSI codepage). Cyrillic characters can therefore at least be recovered with iconv, but non-Cyrillic, non-Latin characters (e.g. Greek) are replaced with question marks and lost, even though within Windows everything can be pasted back without issues, again regardless of the program and keyboard language.
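The Windows-to-Cygwin direction can be sketched the same way. Assuming the clipboard text is converted to the ANSI codepage cp1251 before getclip sees it, Cyrillic survives and can be converted back (in the shell, something like `getclip | iconv -f CP1251 -t UTF-8`), while characters cp1251 cannot represent, such as Greek, are already `?` before iconv runs. The helper names below are hypothetical.

```python
# Simulate the Windows -> getclip direction.
# Assumption: the clipboard text is converted to cp1251 (the ANSI codepage)
# and every character cp1251 cannot represent becomes '?'.

def getclip_bytes(text: str) -> bytes:
    """Bytes that a cp1251 conversion would hand to getclip."""
    return text.encode("cp1251", errors="replace")


def recover(raw: bytes) -> str:
    """Python equivalent of piping through: iconv -f CP1251 -t UTF-8"""
    return raw.decode("cp1251")


if __name__ == "__main__":
    pasted = "Привет, αβγ"
    raw = getclip_bytes(pasted)
    print(raw)           # b'\xcf\xf0\xe8\xe2\xe5\xf2, ???'
    print(recover(raw))  # Привет, ???
```

Unlike the other direction, the loss here happens during the conversion itself, so no decoding step on the Cygwin side can bring the Greek letters back.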

So, in a nutshell: when copy-pasting from Cygwin to Windows with putclip, the Unicode text is interpreted as cp1252, while when copy-pasting from Windows to Cygwin with getclip, it is converted to cp1251.

Sorry for top-posting.

-----Original Message-----
From: Andrey Repin <anrdaemon AT yandex DOT ru> 
Sent: Thursday, June 24, 2021 9:36 AM
To: Миронов Леонид Владимирович <lv DOT mironov AT severstal DOT com>; cygwin AT cygwin DOT com
Subject: Re: getclip and putclip garble unicode characters

Greetings, Миронов Леонид Владимирович!

> getclip and putclip from cygutils-extra garble Unicode characters:
> non-Latin characters copied to the clipboard in Windows are replaced with
> question marks when retrieved with getclip in Cygwin, and non-Latin
> characters copied to the clipboard using putclip are pasted in Windows
> looking like UTF-8 displayed in cp1252, but can be retrieved with
> getclip exactly as pasted. So it looks like the problem is not in the
> way the data is copied but in the way Cygwin and Windows communicate
> the text encoding to each other. LC_CTYPE=en_US.UTF-8; the Windows ANSI
> codepage is set to cp1251 (1251), not 1252.

This looks like you are using a program incapable of dealing with the Unicode clipboard. To get better results, switch your input language/keyboard to the matching language before copying text from the application, i.e. switch to Russian, copy the text, then check what getclip returns.
But then, why is LC_CTYPE en_US?


--
With best regards,
Andrey Repin
Thursday, June 24, 2021 9:33:54

Sorry for my terrible English...

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
