Date: Mon, 14 Oct 2024 14:59:40 +0900
From: Takashi Yano via Cygwin
To: cygwin AT cygwin DOT com
Subject: Re: cygwin 3.5.4-1: signal handling destroys 'long double' values
Message-Id: <20241014145940.c8a84f4360cd3b58837493d1@nifty.ne.jp>
In-Reply-To: <20241014142958.ecf5faeb06a11a8c7a5301de@nifty.ne.jp>

On Mon, 14 Oct 2024 14:29:58 +0900
Takashi Yano wrote:
> Hi Brian,
>
> Thanks for the detailed explanation.
>
> On Sun, 13 Oct 2024 16:19:31 -0600
> Brian Inglis wrote:
> > On 2024-10-13 14:06, Takashi Yano via Cygwin wrote:
> > > Hi Brian
> > >
> > > On Sun, 13 Oct 2024 10:41:58 -0600
> > > Brian Inglis wrote:
> > >> On 2024-10-12 17:14, Takashi Yano via Cygwin wrote:
> > >>> Hi Brian,
> > >>>
> > >>> On Tue, 8 Oct 2024 10:37:14 -0600
> > >>> Brian Inglis wrote:
> > >>>> On 2024-10-08 10:14, Brian Inglis via Cygwin wrote:
> > >>>>> On 2024-10-08 05:20, Takashi Yano via Cygwin wrote:
> > >>>>>> On Mon, 7 Oct 2024 15:11:52 +0200
> > >>>>>> Christian Franke wrote:
> > >>>>>>> $ gcc -o sigtest -O2 sigtest.c
> > >>>>>>>
> > >>>>>>> $ ./sigtest > out.txt
> > >>>>>>> (press ^C 42x :-)
> > >>>>>>>
> > >>>>>>> $ sort out.txt | uniq -c
> > >>>>>>>         3 x = 0x1.23456789p+0, y = -nan, d = -nan
> > >>>>>>>         6 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = -nan
> > >>>>>>>        33 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = 0x0p+0
> > >>>>>>>
> > >>>>>>> The problem also occurs if compiled without -O2, but less often. No
> > >>>>>>> problem occurs if compiled with -DWORKS which suggests that only 'long
> > >>>>>>> double' is affected.
> > >>>>>>
> > >>>>>> Thanks for the report. I looked into this problem and may have found
> > >>>>>> the cause. It seems to be due to a bug in scripts/gendef, which
> > >>>>>> generates the signal handler caller (sigfe.s) that saves/restores
> > >>>>>> the registers.
> > >>>>>>
> > >>>>>> In sigdelayed, the control word is saved/restored by the fnstcw/fldcw
> > >>>>>> instructions; however, the fninit instruction destroys some status
> > >>>>>> registers in the FPU (x87).
> > >>>>>>
> > >>>>>> I think we should use fnstenv/fldenv rather than fnstcw/fldcw and fninit.
> > >>>>>> However, I'm not familiar with x87 instructions, so I may be overlooking
> > >>>>>> something.
> > >>>>>>
> > >>>>>> Could anyone who is an expert in x87 instructions and the sigfe code
> > >>>>>> comment?
> > >>>>>
> > >>>>> AIUI x87 FP handling is outdated and mainly unused on current systems, as
> > >>>>> current systems do more and use more than the legacy x87 instructions and stack.
> > >>>>>
> > >>>>> See https://en.cppreference.com/w/c/numeric/fenv and related docs for more
> > >>>>> modern approaches.
> > >>>>>
> > >>>>> You would have to look into the AMD/Intel/IEEE docs for lower level details.
> > >>>>
> > >>>> This is basically what ISTR:
> > >>>>
> > >>>> https://beta.boost.org/doc/libs/1_82_0/libs/context/doc/html/context/rationale/x86_and_floating_point_env.html
> > >>>>
> > >>>> where legacy x87 and MMX registers are not used or preserved on x86_64/amd64, as
> > >>>> SSE... instructions and XMM registers are used.
> > >>>
> > >>> Thanks for the advice. I read through the web pages and related documents
> > >>> and submitted a patch, which uses fxsave/fxrstor and xsave/xrstor, to the
> > >>> cygwin-patches AT cygwin DOT com mailing list.
> > >>> https://cygwin.com/pipermail/cygwin-patches/2024q4/012804.html
> > >>>
> > >>> Is this what you intended?
> > >>
> > >> That seems to be the preferred approach now, as long as you can correctly
> > >> determine adequate space for fxsave and xsave, given the varying feature sets,
> > >> register counts, and register sizes of recent processors:
> > >> sse/2/3/4.1/4.2/4a/5/ssse3 avx2/512 128/256/512 bits X/Y/ZMM registers.
> > >
> > > Thanks for checking.
> > >
> > > According to https://cdrdv2.intel.com/v1/dl/getContent/671110 ,
> > > fxsave uses a fixed 512-byte memory area to save the current
> > > state of the x87 FPU, MMX technology, XMM, and MXCSR registers.
> > >
> > > The patch allocates 0x238 bytes:
> > > 0x200 (512 bytes): fxsave area
> > > 0x008 (  8 bytes): for 16-byte alignment
> > > 0x010 ( 16 bytes): work area
> > > 0x020 ( 32 bytes): reserved for later processing
> >
> > That is just the FPU state, MMX state, and 16 16B XMM registers, etc.
> > Please also note that 64 bit operands or REX prefix must be used with
> > FXSAVE/FXRSTOR to save expanded state rather than legacy state.
>
> Fixed.
>
> > > According to https://cdrdv2.intel.com/v1/dl/getContent/671436 ,
> > > the cpuid instruction with eax=0dh and ecx=00h returns the maximum
> > > size required by xsave in ebx. So the patch allocates:
> > > ebx + 0x048 bytes.
> > > 0x018 ( 24 bytes): for 64-byte alignment
> > > 0x010 ( 16 bytes): work area
> > > 0x020 ( 32 bytes): reserved for later processing
> >
> > That is for features currently enabled in XCR0 user state, not all the values of
> > all possible registers, for all possible features, in ecx, which are supported,
> > may be enabled, and in use.
> > You need 2KB to store 32 X/Y/ZMM 64B registers, and new real and virtual
> > features may require more.
>
> Do you mean we should use the ecx value rather than the ebx value returned
> by cpuid (eax=0dh,ecx=0)? I did not understand the difference between the
> values of ebx and ecx returned by cpuid.
>
> Fixed.

On second thought, it is not necessary to use the ecx value, because
the patch uses the EDX:EAX value of cpuid(0d,0) for xsave, is it?
This means that only features enabled in XCR0 are saved.
The features not enabled in XCR0 cannot be used in user mode, so
we do not need to save their state.

-- 
Takashi Yano

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
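
For readers following the sizing discussion in this thread, here is a minimal C sketch of the arithmetic and of the cpuid query being debated. It is not the patch itself (the patch is assembly generated by scripts/gendef); the names fxsave_frame_size, xsave_frame_size, struct xsave_info and query_xsave_info are hypothetical, chosen only for illustration. It assumes GCC or Clang, whose <cpuid.h> provides __get_cpuid_count(). The register meanings follow the Intel manual for cpuid leaf 0dh, sub-leaf 0: ebx is the size needed for features currently enabled in XCR0, ecx is the size needed for all features the processor supports, and EDX:EAX is the mask of feature bits that XCR0 may enable.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper header: __get_cpuid_count() */
#endif

/* Sizes quoted in the thread; the enumerator names are hypothetical. */
enum {
    FXSAVE_AREA   = 0x200, /* fixed 512-byte fxsave image */
    FXSAVE_PAD    = 0x008, /* padding for 16-byte alignment */
    XSAVE_PAD     = 0x018, /* padding for 64-byte alignment */
    WORK_AREA     = 0x010, /* work area */
    RESERVED_AREA = 0x020  /* reserved for later processing */
};

/* fxsave path: 0x200 + 0x008 + 0x010 + 0x020 = 0x238 bytes. */
size_t fxsave_frame_size(void)
{
    return FXSAVE_AREA + FXSAVE_PAD + WORK_AREA + RESERVED_AREA;
}

/* xsave path: ebx + 0x018 + 0x010 + 0x020 = ebx + 0x048 bytes. */
size_t xsave_frame_size(uint32_t ebx)
{
    return (size_t)ebx + XSAVE_PAD + WORK_AREA + RESERVED_AREA;
}

/* Hypothetical container for the outputs of cpuid (eax=0dh, ecx=0). */
struct xsave_info {
    uint32_t enabled_size;   /* ebx: bytes for features enabled in XCR0 */
    uint32_t supported_size; /* ecx: bytes for all supported features */
    uint64_t supported_mask; /* EDX:EAX: feature bits XCR0 may enable */
};

/* Returns 1 and fills *out if leaf 0dh can be queried, 0 otherwise
   (including on non-x86 targets, where cpuid does not exist). */
int query_xsave_info(struct xsave_info *out)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(0x0d, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    out->enabled_size = ebx;
    out->supported_size = ecx;
    out->supported_mask = ((uint64_t)edx << 32) | eax;
    return 1;
#else
    (void)out;
    return 0;
#endif
}
```

Calling query_xsave_info() and passing enabled_size to xsave_frame_size() gives the allocation described above as "ebx + 0x048". Since xsave only writes the components that are both requested in its EDX:EAX mask and enabled in XCR0, sizing the buffer from ebx should be sufficient under that reading, which is the point made in the final paragraph of the message.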