Date: Mon, 14 Oct 2024 14:29:58 +0900
From: Takashi Yano via Cygwin
To: cygwin AT cygwin DOT com
Subject: Re: cygwin 3.5.4-1: signal handling destroys 'long double' values
Message-Id: <20241014142958.ecf5faeb06a11a8c7a5301de@nifty.ne.jp>
In-Reply-To: <26b71767-a2a5-423a-96cd-8d01f9438527@SystematicSW.ab.ca>

Hi Brian,

Thanks for the detailed explanation.

On Sun, 13 Oct 2024 16:19:31 -0600
Brian Inglis wrote:
> On 2024-10-13 14:06, Takashi Yano via Cygwin wrote:
> > Hi Brian
> >
> > On Sun, 13 Oct 2024 10:41:58 -0600
> > Brian Inglis wrote:
> >> On 2024-10-12 17:14, Takashi Yano via Cygwin wrote:
> >>> Hi Brian,
> >>>
> >>> On Tue, 8 Oct 2024 10:37:14 -0600
> >>> Brian Inglis wrote:
> >>>> On 2024-10-08 10:14, Brian Inglis via Cygwin wrote:
> >>>>> On 2024-10-08 05:20, Takashi Yano via Cygwin wrote:
> >>>>>> On Mon, 7 Oct 2024 15:11:52 +0200
> >>>>>> Christian Franke wrote:
> >>>>>>> $ gcc -o sigtest -O2 sigtest.c
> >>>>>>>
> >>>>>>> $ ./sigtest > out.txt
> >>>>>>> (press ^C 42x :-)
> >>>>>>>
> >>>>>>> $ sort out.txt | uniq -c
> >>>>>>>         3 x = 0x1.23456789p+0, y = -nan, d = -nan
> >>>>>>>         6 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = -nan
> >>>>>>>        33 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = 0x0p+0
> >>>>>>>
> >>>>>>> The problem also occurs if compiled without -O2, but less often. No
> >>>>>>> problem occurs if compiled with -DWORKS which suggests that only 'long
> >>>>>>> double' is affected.
> >>>>>>
> >>>>>> Thanks for the report. I looked into this problem and may have found the
> >>>>>> cause. It seems to be due to a bug in scripts/gendef. It generates the signal
> >>>>>> handler caller (sigfe.s), which stores/restores the registers.
> >>>>>>
> >>>>>> In sigdelayed, the control word is stored/restored by the fnstcw/fldcw
> >>>>>> instructions; however, the fninit instruction destroys some status registers
> >>>>>> in the FPU (x87).
> >>>>>>
> >>>>>> I think we should use fnstenv/fldenv rather than fnstcw/fldcw and fninit.
> >>>>>> However, I'm not familiar with x87 instructions, so I may be overlooking
> >>>>>> something.
> >>>>>>
> >>>>>> Could anyone who is an expert in x87 instructions and the sigfe stuff give
> >>>>>> some comments?
> >>>>>
> >>>>> AIUI x87 FP handling is outdated and mainly unused on current systems, as
> >>>>> current systems do more and use more than the legacy x87 instructions and stack.
> >>>>>
> >>>>> See https://en.cppreference.com/w/c/numeric/fenv and related docs for more
> >>>>> modern approaches.
> >>>>>
> >>>>> You would have to look into the AMD/Intel/IEEE docs for lower level details.
> >>>>
> >>>> This is basically what ISTR:
> >>>>
> >>>> https://beta.boost.org/doc/libs/1_82_0/libs/context/doc/html/context/rationale/x86_and_floating_point_env.html
> >>>>
> >>>> where legacy x87 and MMX registers are not used or preserved on x86_64/amd64, as
> >>>> SSE... instructions and XMM registers are used.
> >>>
> >>> Thanks for the advice. I read through the web pages and related documents
> >>> and submitted a patch which uses fxsave/fxrstor and xsave/xrstor to the
> >>> cygwin-patches AT cygwin DOT com mailing list.
> >>> https://cygwin.com/pipermail/cygwin-patches/2024q4/012804.html
> >>>
> >>> Is this as you intended?
> >>
> >> That seems to be the preferred approach now, as long as you can correctly
> >> determine adequate space for fxsave and xsave, given the varying feature sets,
> >> register counts, and register sizes of recent processors:
> >> sse/2/3/4.1/4.2/4a/5/ssse3 avx2/512 128/256/512 bits X/Y/ZMM registers.
> >
> > Thanks for checking.
> >
> > According to https://cdrdv2.intel.com/v1/dl/getContent/671110 ,
> > fxsave uses 512 bytes fixed length memory to save the current
> > state of the x87 FPU, MMX technology, XMM, and MXCSR registers.
> >
> > The patch allocates 0x238 bytes:
> > 0x200 (512 bytes): fxsave area
> > 0x008 (  8 bytes): for 16-byte alignment
> > 0x010 ( 16 bytes): work area
> > 0x020 ( 32 bytes): reserved for later processing
>
> That is just the FPU state, MMX state, and 16 16B XMM registers, etc.
> Please also note that 64 bit operands or REX prefix must be used with
> FXSAVE/FXRSTOR to save expanded state rather than legacy state.

Fixed.

> > According to https://cdrdv2.intel.com/v1/dl/getContent/671436 ,
> > cpuid instruction with eax=0dh and ecx=00h returns the maximum
> > size required by xsave in ebx.
> > So the patch allocates:
> > ebx + 0x048 bytes.
> > 0x018 ( 24 bytes): for 64-byte alignment
> > 0x010 ( 16 bytes): work area
> > 0x020 ( 32 bytes): reserved for later processing
>
> That is for features currently enabled in XCR0 user state, not all the values of
> all possible registers, for all possible features, in ecx, which are supported,
> may be enabled, and in use.
> You need 2KB to store 32 X/Y/ZMM 64B registers, and new real and virtual
> features may require more.

Do you mean we should use the ecx value rather than the ebx value returned by
cpuid (eax=0dh,ecx=0)? I did not understand the difference between the values
of ebx and ecx returned by cpuid. Fixed.

> It may be conservative, but I would suggest allocating the space in ecx as
> documented, just in case of future changes, and that can be reduced to 512 if
> only fxsave is supported.
> I suggest you should check for fxsave in cpuid 1:0 edx:24, fall back to
> fnsave/frstor if not, and keep everything aligned to 64 bytes for safety.

According to my survey, all Intel and AMD CPUs (meaning all x86 CPUs) have
fxsave/fxrstor. So we do not need to check bit 24, do we?

> For my AMD A10-9700 /proc/cpuinfo shows:
>
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush mmx *fxsr* sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
> rdtscp lm constant_tsc rep_good acc_power nopl tsc_reliable nonstop_tsc cpuid
> aperfmperf pni pclmuldq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes
> *xsave* avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
> misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm
> perfctr_core perfctr_nb bpext ptsc mwaitx cpb hw_pstate fsgsbase bmi1 avx2 smep
> bmi2 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
> decode_assists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov
>
> and /usr/bin/cpuid (package cpuid) shows (see my added !):
>
> ...
> feature information (1/edx):
>   x87 FPU on chip = true
>   VME: virtual-8086 mode enhancement = true
>   DE: debugging extensions = true
>   PSE: page size extensions = true
>   TSC: time stamp counter = true
>   RDMSR and WRMSR support = true
>   PAE: physical address extensions = true
>   MCE: machine check exception = true
>   CMPXCHG8B inst. = true
>   APIC on chip = true
>   SYSENTER and SYSEXIT = true
>   MTRR: memory type range registers = true
>   PTE global bit = true
>   MCA: machine check architecture = true
>   CMOV: conditional move/compare instr = true
>   PAT: page attribute table = true
>   PSE-36: page size extension = true
>   PSN: processor serial number = false
>   CLFLUSH instruction = true
>   DS: debug store = false
>   ACPI: thermal monitor and clock ctrl = false
>   MMX Technology = true
> ! FXSAVE/FXRSTOR = true
>   SSE extensions = true
>   SSE2 extensions = true
>   SS: self snoop = false
>   hyper-threading / multi-core supported = true
>   TM: therm. monitor = false
>   IA64 = false
>   PBE: pending break event = false
> feature information (1/ecx):
>   PNI/SSE3: Prescott New Instructions = true
>   PCLMULDQ instruction = true
>   DTES64: 64-bit debug store = false
>   MONITOR/MWAIT = true
>   CPL-qualified debug store = false
>   VMX: virtual machine extensions = false
>   SMX: safer mode extensions = false
>   Enhanced Intel SpeedStep Technology = false
>   TM2: thermal monitor 2 = false
>   SSSE3 extensions = true
>   context ID: adaptive or shared L1 data = false
>   SDBG: IA32_DEBUG_INTERFACE = false
>   FMA instruction = true
>   CMPXCHG16B instruction = true
>   xTPR disable = false
>   PDCM: perfmon and debug = false
>   PCID: process context identifiers = false
>   DCA: direct cache access = false
>   SSE4.1 extensions = true
>   SSE4.2 extensions = true
>   x2APIC: extended xAPIC support = false
>   MOVBE instruction = true
>   POPCNT instruction = true
>   time stamp counter deadline = false
>   AES instruction = true
>   XSAVE/XSTOR states = true
> ! OS-enabled XSAVE/XSTOR = true
>   AVX: advanced vector extensions = true
>   F16C half-precision convert instruction = true
>   RDRAND instruction = true
>   hypervisor guest status = false
> ...
> XSAVE features (0xd/0):
>   XCR0 valid bit field mask = 0x4000000000000007
>   x87 state = true
>   SSE state = true
>   AVX state = true
>   MPX BNDREGS = false
>   MPX BNDCSR = false
>   AVX-512 opmask = false
>   AVX-512 ZMM_Hi256 = false
>   AVX-512 Hi16_ZMM = false
>   PKRU state = false
>   XTILECFG state = false
>   XTILEDATA state = false
>   bytes required by fields in XCR0 = 0x00000340 (832)

Is this ebx

> ! bytes required by XSAVE/XRSTOR area = 0x000003c0 (960)

and is this ecx from cpuid (0d:0)?

I had checked some of my environments, but ebx and ecx always had the same
value there. So I thought either could be used...

Please check the v2 patch.

-- 
Takashi Yano
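
The sizes discussed in this thread can be checked with a few lines of C. This is a minimal sketch, not code from the patch: every function and constant name below is hypothetical. It assumes the standard (non-compacted) XSAVE layout described in the Intel SDM, that is, a 512-byte legacy region, a 64-byte header, and a 256-byte AVX component. It models only XCR0 bits 0 to 2. The query helper relies on GCC/clang's <cpuid.h> and simply reports failure elsewhere. Per the SDM, cpuid (eax=0dh, ecx=0) returns in ebx the size for the features currently enabled in XCR0 and in ecx the size for all features the CPU supports, which appears to match the two "bytes required" lines in the cpuid output quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* Sizes in bytes. The first three follow the Intel SDM; the rest are the
   figures quoted in the thread. All names are illustrative only. */
enum {
    FXSAVE_AREA   = 0x200, /* legacy x87/MMX/SSE region, fixed 512 bytes */
    XSAVE_HEADER  = 0x040, /* header that follows the legacy region */
    AVX_STATE     = 0x100, /* upper halves of YMM0-15 (XCR0 bit 2) */
    WORK_AREA     = 0x010, /* "work area" from the thread */
    RESERVED_AREA = 0x020  /* "reserved for later processing" */
};

/* Stack frame size quoted for the fxsave path: 0x238 bytes. */
size_t fxsave_frame_size(void)
{
    return FXSAVE_AREA + 0x008 /* pad for 16-byte alignment */
         + WORK_AREA + RESERVED_AREA;
}

/* Stack frame size quoted for the xsave path: cpuid size + 0x048 bytes. */
size_t xsave_frame_size(uint32_t cpuid_size)
{
    return (size_t)cpuid_size + 0x018 /* pad for 64-byte alignment */
         + WORK_AREA + RESERVED_AREA;
}

/* Standard-format XSAVE area size for XCR0 bits 0-2 (x87, SSE, AVX).
   Components beyond AVX are deliberately not modeled in this sketch. */
size_t xsave_area_for_xcr0(uint64_t xcr0)
{
    size_t size = FXSAVE_AREA + XSAVE_HEADER; /* always present */
    if (xcr0 & 0x4)
        size += AVX_STATE; /* AVX component sits at offset 576 */
    return size;
}

/* Read cpuid leaf 0xd, sub-leaf 0. Returns 1 on success, 0 if the leaf
   (or <cpuid.h>) is unavailable. */
int query_xsave_sizes(uint32_t *enabled_size, uint32_t *max_size)
{
    assert(enabled_size != NULL && max_size != NULL);
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid_max(0, NULL) < 0xd)
        return 0;
    __cpuid_count(0xd, 0, eax, ebx, ecx, edx);
    (void)eax;
    (void)edx;
    *enabled_size = ebx; /* features currently enabled in XCR0 */
    *max_size = ecx;     /* all features the CPU supports */
    return ebx != 0;
#else
    return 0;
#endif
}
```

With x87, SSE, and AVX state enabled, as in the A10-9700 output above, xsave_area_for_xcr0(0x7) gives 832 (0x340), the same figure as the "bytes required by fields in XCR0" line. Calling query_xsave_sizes() on a given machine shows whether ebx and ecx actually differ there.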