Date: Mon, 14 Oct 2024 14:29:58 +0900
To: cygwin AT cygwin DOT com
Subject: Re: cygwin 3.5.4-1: signal handling destroys 'long double' values
From: Takashi Yano via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Takashi Yano <takashi DOT yano AT nifty DOT ne DOT jp>

Hi Brian,

Thanks for the detailed explanation.

On Sun, 13 Oct 2024 16:19:31 -0600
Brian Inglis wrote:
> On 2024-10-13 14:06, Takashi Yano via Cygwin wrote:
> > Hi Brian
> >
> > On Sun, 13 Oct 2024 10:41:58 -0600
> > Brian Inglis wrote:
> >> On 2024-10-12 17:14, Takashi Yano via Cygwin wrote:
> >>> Hi Brian,
> >>>
> >>> On Tue, 8 Oct 2024 10:37:14 -0600
> >>> Brian Inglis wrote:
> >>>> On 2024-10-08 10:14, Brian Inglis via Cygwin wrote:
> >>>>> On 2024-10-08 05:20, Takashi Yano via Cygwin wrote:
> >>>>>> On Mon, 7 Oct 2024 15:11:52 +0200
> >>>>>> Christian Franke wrote:
> >>>>>>> $ gcc -o sigtest -O2 sigtest.c
> >>>>>>>
> >>>>>>> $ ./sigtest > out.txt
> >>>>>>> (press ^C 42x :-)
> >>>>>>>
> >>>>>>> $ sort out.txt | uniq -c
> >>>>>>>       3 x = 0x1.23456789p+0, y = -nan, d = -nan
> >>>>>>>       6 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = -nan
> >>>>>>>      33 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = 0x0p+0
> >>>>>>>
> >>>>>>> The problem also occurs if compiled without -O2, but less often. No
> >>>>>>> problem occurs if compiled with -DWORKS which suggests that only 'long
> >>>>>>> double' is affected.
> >>>>>>
> >>>>>> Thanks for the report. I looked into this problem and may have found the
> >>>>>> cause. It seems to be due to a bug in scripts/gendef. It generates the signal
> >>>>>> handler caller (sigfe.s), which stores/restores the registers.
> >>>>>>
> >>>>>> In sigdelayed, the control word is stored/restored by the fnstcw/fldcw
> >>>>>> instructions; however, the fninit instruction destroys some status registers
> >>>>>> in the FPU (x87).
> >>>>>>
> >>>>>> I think we should use fnstenv/fldenv rather than fnstcw/fldcw and fninit.
> >>>>>> However, I'm not familiar with x87 instructions, so I may be overlooking
> >>>>>> something.
> >>>>>>
> >>>>>> Could anyone who is an expert in x87 instructions and sigfe stuff give some
> >>>>>> comments?
> >>>>>
> >>>>> AIUI x87 FP handling is outdated and mainly unused on current systems, as
> >>>>> current systems do more and use more than the legacy x87 instructions and stack.
> >>>>>
> >>>>> See https://en.cppreference.com/w/c/numeric/fenv and related docs for more
> >>>>> modern approaches.
> >>>>>
> >>>>> You would have to look into the AMD/Intel/IEEE docs for lower level details.
> >>>>
> >>>> This is basically what ISTR:
> >>>>
> >>>> https://beta.boost.org/doc/libs/1_82_0/libs/context/doc/html/context/rationale/x86_and_floating_point_env.html
> >>>>
> >>>> where legacy x87 and MMX registers are not used or preserved on x86_64/amd64, as
> >>>> SSE... instructions and XMM registers are used.
> >>>
> >>> Thanks for the advice. I read through the web pages and related documents
> >>> and posted a patch which uses fxsave/fxrstor and xsave/xrstor to the
> >>> cygwin-patches AT cygwin DOT com mailing list.
> >>> https://cygwin.com/pipermail/cygwin-patches/2024q4/012804.html
> >>>
> >>> Is this as you intended?
> >>
> >> That seems to be the preferred approach now, as long as you can correctly
> >> determine adequate space for fxsave and xsave, given the varying feature sets,
> >> register counts, and register sizes of recent processors:
> >> sse/2/3/4.1/4.2/4a/5/ssse3 avx2/512 128/256/512 bits X/Y/ZMM registers.
> >
> > Thanks for checking.
> >
> > According to https://cdrdv2.intel.com/v1/dl/getContent/671110 ,
> > fxsave uses 512 bytes fixed length memory to save the current
> > state of the x87 FPU, MMX technology, XMM, and MXCSR registers.
> >
> > The patch allocates 0x238 bytes:
> > 0x200 (512 bytes): fxsave area
> > 0x008 (  8 bytes): for 16-byte alignment
> > 0x010 ( 16 bytes): work area
> > 0x020 ( 32 bytes): reserved for later processing
>
> That is just the FPU state, MMX state, and 16 16B XMM registers, etc.
> Please also note that 64 bit operands or REX prefix must be used with
> FXSAVE/FXRSTOR to save expanded state rather than legacy state.

Fixed.

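The 0x238-byte figure quoted above is simply the sum of the four regions listed. A minimal C sketch of that arithmetic follows, assuming nothing beyond the sizes in the message; the enum constants and the function name are hypothetical and are not taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Region sizes as listed in the message above. The names are
   illustrative only; they do not come from the actual patch. */
enum {
    FXSAVE_AREA   = 0x200, /* 512 bytes: fixed-size fxsave image     */
    ALIGN_PAD     = 0x008, /*   8 bytes: padding for 16-byte alignment */
    WORK_AREA     = 0x010, /*  16 bytes: work area                   */
    RESERVED_AREA = 0x020  /*  32 bytes: reserved for later processing */
};

/* Total stack space reserved on the fxsave path. */
size_t fxsave_frame_size(void)
{
    size_t total = FXSAVE_AREA + ALIGN_PAD + WORK_AREA + RESERVED_AREA;
    /* fxsave requires a 16-byte-aligned operand, so the image size
       itself must be a multiple of 16. */
    assert(FXSAVE_AREA % 16 == 0);
    return total;
}
```

Calling `fxsave_frame_size()` yields 0x238 (568), matching the total stated in the message.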
> > According to https://cdrdv2.intel.com/v1/dl/getContent/671436 ,
> > cpuid instruction with eax=0dh and ecx=00h returns the maximum
> > size required by xsave in ebx. So the patch allocates:
> > ebx + 0x048 bytes.
> > 0x018 ( 24 bytes): for 64-byte alignment
> > 0x010 ( 16 bytes): work area
> > 0x020 ( 32 bytes): reserved for later processing
>
> That is for features currently enabled in XCR0 user state, not all the values of
> all possible registers, for all possible features, in ecx, which are supported,
> may be enabled, and in use.
> You need 2KB to store 32 X/Y/ZMM 64B registers, and new real and virtual
> features may require more.

Do you mean we should use the ecx value rather than the ebx value returned
by cpuid (eax=0dh,ecx=0)? I did not understand the difference between the
values of ebx and ecx returned by cpuid.

Fixed.

> It may be conservative, but I would suggest allocating the space in ecx as
> documented, just in case of future changes, and that can be reduced to 512 if
> only fxsave is supported.
> I suggest you should check for fxsave in cpuid 1:0 edx:24, fall back to
> fnsave/frstor if not, and keep everything aligned to 64 bytes for safety.

According to my survey, all Intel and AMD CPUs (meaning all x86 CPUs)
have fxsave/fxrstor. So we do not need to check bit 24, do we?

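To compare the two values discussed above on a given machine, something like the following C sketch could be used. It assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid_count`; the helper name `xsave_area_sizes` is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header for the cpuid instruction */
#endif

/* Hypothetical helper: query cpuid leaf 0x0d, sub-leaf 0.
   On success returns 1 and stores
     *enabled_size   = ebx: bytes needed for the features enabled in XCR0
     *supported_size = ecx: bytes needed for all features the CPU supports
   Returns 0 when not built for x86 or when the leaf is unavailable.
   A real caller would first confirm XSAVE support via cpuid leaf 1
   (ecx bits 26 and 27). */
int xsave_area_sizes(uint32_t *enabled_size, uint32_t *supported_size)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(0x0d, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    *enabled_size = ebx;
    *supported_size = ecx;
    return 1;
#else
    (void)enabled_size;
    (void)supported_size;
    return 0;
#endif
}
```

Since the enabled features are a subset of the supported ones, ebx should never exceed ecx; the two are equal whenever the OS has enabled every state component the CPU offers, which may be why they often look identical.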
> For my AMD A10-9700 /proc/cpuinfo shows:
>
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush mmx *fxsr* sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
> rdtscp lm constant_tsc rep_good acc_power nopl tsc_reliable nonstop_tsc cpuid
> aperfmperf pni pclmuldq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes
> *xsave* avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
> misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm
> perfctr_core perfctr_nb bpext ptsc mwaitx cpb hw_pstate fsgsbase bmi1 avx2 smep
> bmi2 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
> decode_assists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov
>
> and /usr/bin/cpuid (package cpuid) shows (see my added !):
>
> ...
>   feature information (1/edx):
>     x87 FPU on chip                        = true
>     VME: virtual-8086 mode enhancement     = true
>     DE: debugging extensions               = true
>     PSE: page size extensions              = true
>     TSC: time stamp counter                = true
>     RDMSR and WRMSR support                = true
>     PAE: physical address extensions       = true
>     MCE: machine check exception           = true
>     CMPXCHG8B inst.                        = true
>     APIC on chip                           = true
>     SYSENTER and SYSEXIT                   = true
>     MTRR: memory type range registers      = true
>     PTE global bit                         = true
>     MCA: machine check architecture        = true
>     CMOV: conditional move/compare instr   = true
>     PAT: page attribute table              = true
>     PSE-36: page size extension            = true
>     PSN: processor serial number           = false
>     CLFLUSH instruction                    = true
>     DS: debug store                        = false
>     ACPI: thermal monitor and clock ctrl   = false
>     MMX Technology                         = true
> !   FXSAVE/FXRSTOR                         = true
>     SSE extensions                         = true
>     SSE2 extensions                        = true
>     SS: self snoop                         = false
>     hyper-threading / multi-core supported = true
>     TM: therm. monitor                     = false
>     IA64                                   = false
>     PBE: pending break event               = false
>   feature information (1/ecx):
>     PNI/SSE3: Prescott New Instructions    = true
>     PCLMULDQ instruction                   = true
>     DTES64: 64-bit debug store             = false
>     MONITOR/MWAIT                          = true
>     CPL-qualified debug store              = false
>     VMX: virtual machine extensions        = false
>     SMX: safer mode extensions             = false
>     Enhanced Intel SpeedStep Technology    = false
>     TM2: thermal monitor 2                 = false
>     SSSE3 extensions                       = true
>     context ID: adaptive or shared L1 data = false
>     SDBG: IA32_DEBUG_INTERFACE             = false
>     FMA instruction                        = true
>     CMPXCHG16B instruction                 = true
>     xTPR disable                           = false
>     PDCM: perfmon and debug                = false
>     PCID: process context identifiers      = false
>     DCA: direct cache access               = false
>     SSE4.1 extensions                      = true
>     SSE4.2 extensions                      = true
>     x2APIC: extended xAPIC support         = false
>     MOVBE instruction                      = true
>     POPCNT instruction                     = true
>     time stamp counter deadline            = false
>     AES instruction                        = true
>     XSAVE/XSTOR states                     = true
> !   OS-enabled XSAVE/XSTOR                 = true
>     AVX: advanced vector extensions        = true
>     F16C half-precision convert instruction = true
>     RDRAND instruction                     = true
>     hypervisor guest status                = false
> ...
>   XSAVE features (0xd/0):
>     XCR0 valid bit field mask              = 0x4000000000000007
>       x87 state                            = true
>       SSE state                            = true
>       AVX state                            = true
>       MPX BNDREGS                          = false
>       MPX BNDCSR                           = false
>       AVX-512 opmask                       = false
>       AVX-512 ZMM_Hi256                    = false
>       AVX-512 Hi16_ZMM                     = false
>       PKRU state                           = false
>       XTILECFG state                       = false
>       XTILEDATA state                      = false
>     bytes required by fields in XCR0       = 0x00000340 (832)

Is this ebx

> !   bytes required by XSAVE/XRSTOR area    = 0x000003c0 (960)

and is this ecx from cpuid (0d:0)?

I checked some of my environments, but ebx and ecx always had
the same value. So, I thought either could be used...

Please check the v2 patch.

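As a rough cross-check of the listing above, the XCR0 mask can be decoded bit by bit, and the 832-byte figure can be reproduced from the standard (non-compacted) XSAVE layout: a 512-byte legacy region, a 64-byte header, and 256 bytes for the upper halves of the 16 YMM registers. This is a sketch under those assumptions; the names are hypothetical and the bit positions are the ones documented for XCR0.

```c
#include <assert.h>
#include <stdint.h>

/* XCR0 state-component bit positions for the entries named in the
   listing above (hypothetical constant names). */
enum {
    XCR0_X87           = 0,
    XCR0_SSE           = 1,
    XCR0_AVX           = 2,
    XCR0_AVX512_OPMASK = 5
};

/* Returns 1 if the given bit is set in an XCR0-style mask, else 0. */
int xcr0_has(uint64_t mask, unsigned bit)
{
    return (int)((mask >> bit) & 1u);
}

/* Standard-format XSAVE area when only x87, SSE and AVX are enabled. */
enum {
    XSAVE_LEGACY_REGION = 512, /* x87/MMX/XMM/MXCSR, same image as fxsave */
    XSAVE_HEADER        = 64,  /* XSTATE_BV and related header fields      */
    XSAVE_AVX_STATE     = 256  /* upper 128 bits of 16 YMM registers       */
};

unsigned xsave_size_x87_sse_avx(void)
{
    return XSAVE_LEGACY_REGION + XSAVE_HEADER + XSAVE_AVX_STATE;
}
```

With the mask 0x4000000000000007 from the listing, bits 0 to 2 are set, and `xsave_size_x87_sse_avx()` gives 832, matching the first size line. The mask also has bit 62 set, a component the listing does not name; the extra 128 bytes in the 960-byte figure would be consistent with a supported but not enabled component, though that is an inference rather than something the output states.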
-- 
Takashi Yano <takashi DOT yano AT nifty DOT ne DOT jp>

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple