From: "Gary R. Van Sickle"
To: "'Cygwin'"
Subject: OT: RE: filesystem encoding
Date: Thu, 9 Sep 2004 22:08:46 -0500

> Hmm...interesting. Not entirely sure what the implications
> of what you are saying are (as I don't really understand codepages).
>
> Does a codepage represent a character with 16 bits? or 8?
> Could you recommend a book or a URL on the subject? Maybe I
> should look at this when I have more time (I'm in the middle
> of a move).

A "codepage" isn't a Unicode thing, it's a horrific hack that was and still is
used to allow a computer to "speak" almost all of the world's languages (the
ones that aren't made of thousands of pictographs anyway). A codepage is
essentially a mapping of 7- or 8-bit numbers to the glyphs of a particular
language. So for example, Russian might have a codepage that says the number
0x01 is the backwards-"R" letter, 0x02 is the "X" with a vertical line through
it, etc.

So a guy in Russia sets up his computer to use this codepage, and he gets his
Cyrillic characters popping up when he types, and everything is great, right?
Wrong:

- Ever get an "ASCII" text email or file that had some goofy graphic
  characters in it that clearly weren't what the other guy had typed? You're
  not using the same codepage as the guy who wrote the text. His codepage has
  a "starting quote" character at the same number where yours has a goofy
  graphics character.
- Some languages have more than one codepage. Russian IIRC has like five or six.
  The mappings may or may not be related to each other in any way. So even if
  you speak the same language as the guy you're sending a text file to, it may
  be completely unintelligible to him.
- And heaven help you if you're an American and need to look at a Russian text
  file. Which ASCII character is "backwards R" going to map to? Let me field
  that one: trick question, it'll map to some control character or something,
  and if you're lucky it'll be rendered by your text editor as "?" or
  something. If you're not....

Email tries to get around these problems by having a header telling you what
codepage the email was composed in, but if the mutt ML is any indication it
seems to be spottily implemented. With your garden variety text file, you're
just SOL.

Welcome to the 21st century, where computers can't even unambiguously represent
written text.

--
Gary R. Van Sickle
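
The codepage mismatch described above can be made concrete with a minimal Python sketch. It encodes the "backwards R" under several Cyrillic codepages, then shows how one byte gets read differently depending on which codepage the reader assumes. The list of codepages and the helper names (`encode_everywhere`, `decode_everywhere`, `decode_as_ascii`) are assumptions made for illustration, picked from codecs that ship with Python; they are not taken from the post.

```python
import unicodedata

# Illustrative sketch only: this codepage list is an assumption, chosen from
# codecs bundled with Python. It is not a list given in the original post.
RUSSIAN_CODEPAGES = ["cp1251", "koi8_r", "cp866", "iso8859_5", "mac_cyrillic"]


def encode_everywhere(text, codepages=RUSSIAN_CODEPAGES):
    """Encode the same text under each codepage; return {codepage: bytes}."""
    return {cp: text.encode(cp) for cp in codepages}


def decode_everywhere(raw, codepages=RUSSIAN_CODEPAGES):
    """Decode the same bytes under each codepage; return {codepage: str}."""
    # errors="replace" substitutes U+FFFD for bytes a codepage leaves undefined.
    return {cp: raw.decode(cp, errors="replace") for cp in codepages}


def decode_as_ascii(raw):
    """Return the ASCII decoding of raw, or None if the bytes are not ASCII."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    letter = "\u042f"  # CYRILLIC CAPITAL LETTER YA, the "backwards R"

    # Same letter, different numbers depending on the writer's codepage.
    for cp, raw in encode_everywhere(letter).items():
        print(f"{cp:13} stores it as 0x{raw.hex()}")

    # Same number, different letters depending on the reader's codepage.
    sent = letter.encode("cp1251")
    for cp, seen in decode_everywhere(sent).items():
        name = unicodedata.name(seen, "unnamed character")
        print(f"{cp:13} reads 0x{sent.hex()} as {name}")

    # Plain 7-bit ASCII has no mapping for this byte at all.
    print("ascii         reads it as", decode_as_ascii(sent))
```

The script prints Unicode character names rather than the characters themselves, so the output does not depend on the codepage of the terminal it runs in.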