From: Erik Bray
Date: Wed, 24 May 2017 13:42:46 +0200
Subject: Re: Python2 "narrow" build, Unicode issue in regex package
To: cygwin AT cygwin DOT com

On Wed, May 24, 2017 at 10:30 AM, Václav Haisman wrote:
> Hi.
>
> I have recently hit an issue ([1]) with Python 2.7 and the regex package
> for it on Cygwin. It appears that Cygwin's Python 2.7 is a so-called
> narrow build. This causes issues when working with Unicode code points
> outside the BMP, like the emoji code points in my issue.
>
> Is there a chance Cygwin's Python could be rebuilt as a wide build?
>
> [1] https://bitbucket.org/mrabarnett/mrab-regex/issues/241/issues-matching-unicode-code-ranges-with-p

I've been bitten by this before too, and I don't know whether there's a
specific policy by which Cygwin decided to use the narrow build. Narrow
builds are typical on Windows, though, because their strings translate
easily to native Windows wide-character strings, whereas a wide build
introduces significantly more overhead.

I know it's trite to answer "use a different tool", but if at all
possible you might consider switching to Python 3, which is the future.
Heck, it's really the present. Even most of the scientific Python
community has switched over to Python 3 (well, at least the development
community has; users are understandably a little slower). Many large
corporations, such as Instagram, have switched.
And Python 2 support is ending in 2020, so the sooner the better. I know
it's a hassle though.

Anyway, on current versions of Python 3 (I think 3.3 and above) there is
no longer a wide versus narrow distinction. Instead, each string is
stored in the smallest possible representation that fits the highest
code point in the string.

If you need a wide-character build on Cygwin you could also build it
yourself. Just make sure to get a few Cygwin patches from
https://github.com/cygwinports/python2

Best,
Erik
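
As a rough illustration of the narrow/wide distinction discussed above,
the sketch below checks which kind of interpreter it is running on and
shows how the length of a single emoji differs. The helper names
(is_narrow_build, utf16_units) are made up for this example;
sys.maxunicode is the standard attribute for telling the builds apart.

```python
import sys

# U+1F600 GRINNING FACE, a code point outside the BMP.
EMOJI = u"\U0001F600"


def is_narrow_build():
    """Return True on a narrow (UTF-16 based) Python 2 build.

    Narrow builds report sys.maxunicode == 0xFFFF; wide builds and
    Python 3.3+ report 0x10FFFF.
    """
    return sys.maxunicode == 0xFFFF


def utf16_units(text):
    """Count the UTF-16 code units needed for text.

    This is the length a narrow build would report for the same string,
    since it stores non-BMP code points as surrogate pairs.
    """
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    kind = "narrow" if is_narrow_build() else "wide or flexible"
    print("build: {0}".format(kind))
    print("len(EMOJI) here: {0}".format(len(EMOJI)))
    print("UTF-16 code units: {0}".format(utf16_units(EMOJI)))
```

On a narrow Python 2.7 build len(EMOJI) is 2 (a surrogate pair), which
is why a character range covering emoji can fail to match there. On a
wide build or on Python 3.3 and later it is 1.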