Mail Archives: cygwin/2024/10/14/01:31:14

Date: Mon, 14 Oct 2024 14:29:58 +0900
To: cygwin AT cygwin DOT com
Subject: Re: cygwin 3.5.4-1: signal handling destroys 'long double' values
Message-Id: <20241014142958.ecf5faeb06a11a8c7a5301de@nifty.ne.jp>
In-Reply-To: <26b71767-a2a5-423a-96cd-8d01f9438527@SystematicSW.ab.ca>
References: <922a6d7e-3ee1-9bb7-dfd7-b94c53a7b9d4 AT t-online DOT de>
<20241008202057 DOT abd3dc5bb4df172c530e7655 AT nifty DOT ne DOT jp>
<79171662-eede-4b14-aaf4-ebd98e6d98de AT SystematicSW DOT ab DOT ca>
<99f51137-2889-4985-b4c6-a460e05befb8 AT SystematicSW DOT ab DOT ca>
<20241013081407 DOT f07402abe9f721924f461dcc AT nifty DOT ne DOT jp>
<51e4e5dd-57ef-4cbc-aff4-572eebb863e2 AT SystematicSW DOT ab DOT ca>
<20241014050649 DOT ddaa7e0d14365a86d8523f1d AT nifty DOT ne DOT jp>
<26b71767-a2a5-423a-96cd-8d01f9438527 AT SystematicSW DOT ab DOT ca>
From: Takashi Yano via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Takashi Yano <takashi DOT yano AT nifty DOT ne DOT jp>

Hi Brian,

Thanks for the detailed explanation.

On Sun, 13 Oct 2024 16:19:31 -0600
Brian Inglis wrote:
> On 2024-10-13 14:06, Takashi Yano via Cygwin wrote:
> > Hi Brian
> > 
> > On Sun, 13 Oct 2024 10:41:58 -0600
> > Brian Inglis wrote:
> >> On 2024-10-12 17:14, Takashi Yano via Cygwin wrote:
> >>> Hi Brian,
> >>>
> >>> On Tue, 8 Oct 2024 10:37:14 -0600
> >>> Brian Inglis wrote:
> >>>> On 2024-10-08 10:14, Brian Inglis via Cygwin wrote:
> >>>>> On 2024-10-08 05:20, Takashi Yano via Cygwin wrote:
> >>>>>> On Mon, 7 Oct 2024 15:11:52 +0200
> >>>>>> Christian Franke wrote:
> >>>>>>> $ gcc -o sigtest -O2 sigtest.c
> >>>>>>>
> >>>>>>> $ ./sigtest > out.txt
> >>>>>>> (press ^C 42x :-)
> >>>>>>>
> >>>>>>> $ sort out.txt | uniq -c
> >>>>>>>           3 x = 0x1.23456789p+0, y = -nan, d = -nan
> >>>>>>>           6 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = -nan
> >>>>>>>          33 x = 0x1.23456789p+0, y = 0x1.23456789p+0, d = 0x0p+0
> >>>>>>>
> >>>>>>> The problem also occurs if compiled without -O2, but less often. No
> >>>>>>> problem occurs if compiled with -DWORKS which suggests that only 'long
> >>>>>>> double' is affected.
> >>>>>>
> >>>>>> Thanks for the report. I looked into this problem and may have found the
> >>>>>> cause. It seems to be due to a bug in scripts/gendef, which generates the
> >>>>>> signal handler caller (sigfe.s) that stores/restores the registers.
> >>>>>>
> >>>>>> In sigdelayed, the control word is stored/restored by the fnstcw/fldcw
> >>>>>> instructions; however, the fninit instruction destroys some status registers
> >>>>>> in the FPU (x87).
> >>>>>>
> >>>>>> I think we should use fnstenv/fldenv rather than fnstcw/fldcw and fninit.
> >>>>>> However, I'm not familiar with x87 instructions, so I may have overlooked
> >>>>>> something.
> >>>>>>
> >>>>>> Could anyone with expertise in x87 instructions and the sigfe code give
> >>>>>> some comments?
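The difference between the two save/restore sequences can be seen in isolation. The sketch below (x86_64, GCC/Clang inline assembly; all function names are hypothetical and not taken from sigfe.s) leaves a long double in ST(0), runs each sequence around fninit, and then pops the value back out. With fnstcw/fninit/fldcw the pop hits an empty register and, with the invalid-operation exception masked, stores the x87 "real indefinite" NaN, which prints as -nan like the values in the report. With fnstenv/fldenv the tag word is restored, and since fninit does not clear the data registers themselves, the value survives. That only holds if nothing overwrites those registers in between, which real handler code may well do, so this illustrates the failure mode rather than a complete fix.

```c
/* Hypothetical sketch: x86_64 with GCC/Clang inline asm (AT&T syntax). */
#if !defined(__x86_64__)
#error "this x87 sketch assumes x86_64"
#endif
#include <math.h>

/* Problem sequence: only the control word is saved around fninit.
   fninit marks every x87 register empty, so the fstpt underflows and
   (invalid-operation masked) stores the negative QNaN "real indefinite". */
static long double roundtrip_fnstcw_fninit(long double in)
{
    long double out;
    unsigned short cw;
    __asm__ volatile(
        "fldt   %2\n\t"   /* push in: ST(0) = in                     */
        "fnstcw %1\n\t"   /* save the control word only              */
        "fninit\n\t"      /* reset FPU state: all tags become empty  */
        "fldcw  %1\n\t"   /* restore the control word                */
        "fstpt  %0\n\t"   /* pop empty ST(0): masked stack underflow */
        "fninit\n\t"      /* clear the sticky IE/SF flags again...   */
        "fldcw  %1\n\t"   /* ...and keep the saved control word      */
        : "=m"(out), "=m"(cw)
        : "m"(in));
    return out;
}

/* Alternative sequence: the whole 28-byte x87 environment (control,
   status incl. TOP, and tag words) is saved and restored. The data
   registers are left alone by fninit, so ST(0) is valid again. */
static long double roundtrip_fnstenv_fninit(long double in)
{
    long double out;
    struct { unsigned char bytes[28]; } env; /* 32-bit-format environment */
    __asm__ volatile(
        "fldt    %2\n\t"  /* push in                                  */
        "fnstenv %1\n\t"  /* save environment (also masks exceptions) */
        "fninit\n\t"
        "fldenv  %1\n\t"  /* restore control/status/tag words         */
        "fstpt   %0\n\t"  /* pop ST(0) back to memory                 */
        : "=m"(out), "=m"(env)
        : "m"(in));
    return out;
}

/* The x87 real indefinite is a NaN with the sign bit set ("-nan"). */
static int is_negative_nan(long double v)
{
    return isnan(v) && signbit(v);
}
```

For example, roundtrip_fnstcw_fninit(0x1.23456789p+0L) comes back as a negative NaN, while roundtrip_fnstenv_fninit returns its argument unchanged.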
> >>>>>
> >>>>> AIUI x87 FP handling is outdated and mainly unused on current systems, as
> >>>>> current systems do more and use more than the legacy x87 instructions and stack.
> >>>>>
> >>>>> See https://en.cppreference.com/w/c/numeric/fenv and related docs for more
> >>>>> modern approaches.
> >>>>>
> >>>>> You would have to look into the AMD/Intel/IEEE docs for lower level details.
> >>>>
> >>>> This is basically what ISTR:
> >>>>
> >>>> https://beta.boost.org/doc/libs/1_82_0/libs/context/doc/html/context/rationale/x86_and_floating_point_env.html
> >>>>
> >>>> where legacy x87 and MMX registers are not used or preserved on x86_64/amd64, as
> >>>> SSE... instructions and XMM registers are used.
> >>>
> >>> Thanks for the advice. I read through the web pages and related documents
> >>> and posted a patch that uses fxsave/fxrstor and xsave/xrstor to the
> >>> cygwin-patches AT cygwin DOT com mailing list.
> >>> https://cygwin.com/pipermail/cygwin-patches/2024q4/012804.html
> >>>
> >>> Is this as you intended?
> >>
> >> That seems to be the preferred approach now, as long as you can correctly
> >> determine adequate space for fxsave and xsave, given the varying feature sets,
> >> register counts, and register sizes of recent processors:
> >> sse/2/3/4.1/4.2/4a/5/ssse3 avx2/512 128/256/512 bits X/Y/ZMM registers.
> > 
> > Thanks for checking.
> > 
> > According to https://cdrdv2.intel.com/v1/dl/getContent/671110 ,
> > fxsave uses 512 bytes fixed length memory to save the current
> > state of the x87 FPU, MMX technology, XMM, and MXCSR registers.
> > 
> > The patch allocates 0x238 bytes:
> >   0x200 (512 bytes): fxsave area
> >   0x008 (  8 bytes): for 16-byte alignment
> >   0x010 ( 16 bytes): work area
> >   0x020 ( 32 bytes): reserved for later processing
> 
> That is just the FPU state, MMX state, and 16 16B XMM registers, etc.
> Please also note that 64 bit operands or REX prefix must be used with 
> FXSAVE/FXRSTOR to save expanded state rather than legacy state.

Fixed.
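The fixed-size fxsave budget quoted above can be checked mechanically. This sketch (x86_64, GCC/Clang; all names hypothetical) restates the 0x238-byte allocation as a compile-time sum and takes an FXSAVE64 snapshot into a 16-byte-aligned buffer, since FXSAVE faults on a misaligned operand. FXSAVE64 is the REX.W form, which stores the x87 instruction and data pointers as 64-bit values. In both forms the x87 control word is at offset 0 and MXCSR at offset 24, so the snapshot can be compared against fnstcw and stmxcsr.

```c
/* Hypothetical sketch: x86_64 with GCC/Clang inline asm. */
#if !defined(__x86_64__)
#error "this FXSAVE64 sketch assumes x86_64"
#endif
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Allocation budget as listed in the thread for the first patch. */
enum {
    FXSAVE_BYTES = 0x200, /* fixed FXSAVE area            */
    ALIGN_BYTES  = 0x008, /* slack for 16-byte alignment  */
    WORK_BYTES   = 0x010, /* work area                    */
    RESV_BYTES   = 0x020, /* reserved for later use       */
    SAVE_TOTAL   = FXSAVE_BYTES + ALIGN_BYTES + WORK_BYTES + RESV_BYTES
};
_Static_assert(SAVE_TOTAL == 0x238, "0x200 + 0x8 + 0x10 + 0x20 == 0x238");

/* FXSAVE requires a 16-byte-aligned 512-byte operand. */
typedef struct { alignas(16) unsigned char bytes[FXSAVE_BYTES]; } fxsave_area;
static fxsave_area fx;

static void fx_snapshot(void)
{
    __asm__ volatile("fxsave64 %0" : "=m"(fx));
}

static uint16_t fx_saved_fcw(void)
{
    uint16_t v;
    fx_snapshot();
    memcpy(&v, fx.bytes + 0, sizeof v);  /* FCW at offset 0 */
    return v;
}

static uint32_t fx_saved_mxcsr(void)
{
    uint32_t v;
    fx_snapshot();
    memcpy(&v, fx.bytes + 24, sizeof v); /* MXCSR at offset 24 */
    return v;
}

static uint16_t current_fcw(void)
{
    uint16_t cw;
    __asm__ volatile("fnstcw %0" : "=m"(cw));
    return cw;
}

static uint32_t current_mxcsr(void)
{
    uint32_t m;
    __asm__ volatile("stmxcsr %0" : "=m"(m));
    return m;
}
```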

> > According to https://cdrdv2.intel.com/v1/dl/getContent/671436 ,
> > the cpuid instruction with eax=0dh and ecx=00h returns the maximum
> > size required by xsave in ebx. So the patch allocates:
> > ebx + 0x048 bytes.
> >   0x018 ( 24 bytes): for 64-byte alignment
> >   0x010 ( 16 bytes): work area
> >   0x020 ( 32 bytes): reserved for later processing
> 
> That is for the features currently enabled in the XCR0 user state, not for all
> the registers of all the features reported in ecx, which are supported and may
> be enabled and in use.
> You need 2KB to store 32 X/Y/ZMM 64B registers, and new real and virtual 
> features may require more.

Do you mean we should use the ecx value rather than the ebx value returned by
cpuid (eax=0dh, ecx=0)? I did not understand the difference between the
ebx and ecx values returned by cpuid.

Fixed.
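To make the ebx/ecx distinction concrete: CPUID leaf 0Dh, sub-leaf 0, returns in ebx the XSAVE area size for the components currently enabled in XCR0, and in ecx the size needed if every component the processor supports were enabled, so ebx <= ecx always holds. The sketch below uses GCC's <cpuid.h> helpers; sigsave_area_size is a hypothetical name, and the +0x48 padding and 0x238 fallback are simply the figures quoted in this thread, not a description of what the v2 patch actually does.

```c
#include <cpuid.h>

/* Nonzero if CPUID.1:ECX.OSXSAVE (bit 27) is set, i.e. the OS enabled XSAVE. */
static int os_enabled_xsave(void)
{
    unsigned a, b, c, d;
    return __get_cpuid(1, &a, &b, &c, &d) && (c & bit_OSXSAVE);
}

/* CPUID.(EAX=0Dh,ECX=0): ebx = size for features enabled in XCR0,
   ecx = size for all supported features. Returns 0 if unavailable. */
static int xsave_sizes(unsigned *enabled, unsigned *supported)
{
    unsigned a, b, c, d;
    if (!os_enabled_xsave() || !__get_cpuid_count(0x0d, 0, &a, &b, &c, &d))
        return 0;
    *enabled = b;
    *supported = c;
    return 1;
}

/* Hypothetical sizing: ecx plus the 0x48 bytes of alignment, work and
   reserved space listed in the thread, else the 0x238-byte fxsave budget. */
static unsigned sigsave_area_size(void)
{
    unsigned enabled, supported;
    if (xsave_sizes(&enabled, &supported))
        return supported + 0x48;
    return 0x238;
}
```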

> It may be conservative, but I would suggest allocating the space in ecx as 
> documented, just in case of future changes, and that can be reduced to 512 if 
> only fxsave is supported.
> I suggest you should check for fxsave in cpuid 1:0 edx:24, fall back to 
> fnsave/frstor if not, and keep everything aligned to 64 bytes for safety.
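Alignment matters here because XSAVE/XRSTOR fault on a save area that is not 64-byte aligned (FXSAVE/FXRSTOR need 16). A common way to carve an aligned area out of a larger buffer is to round the address up to the next multiple of the alignment; align_up and worst_case_padding below are generic, hypothetical helpers, not code from sigfe.s.

```c
#include <assert.h>
#include <stdint.h>

/* Round p up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (p + align - 1) & ~(align - 1);
}

/* Worst-case padding needed to reach an `align` boundary from a base
   that is only guaranteed to be `base_align`-aligned (powers of two). */
static uintptr_t worst_case_padding(uintptr_t base_align, uintptr_t align)
{
    return align > base_align ? align - base_align : 0;
}
```

For example, a base that is only known to be 8-byte aligned may need up to 56 bytes of padding to reach a 64-byte boundary, and a 16-byte-aligned one up to 48.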

According to my survey, all Intel and AMD CPUs (i.e., all x86_64 CPUs)
have fxsave/fxrstor. So we do not need to check bit 24, do we?
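The check Brian suggests is a single CPUID bit: leaf 1 reports FXSAVE/FXRSTOR support in EDX bit 24, which GCC's <cpuid.h> names bit_FXSAVE. On x86_64 the bit is always set, because FXSR (like SSE2) is part of the baseline 64-bit instruction set, so a 64-bit-only build can reasonably skip the runtime test; older 32-bit x86 CPUs are where it can be clear. A sketch of the check, with a hypothetical helper name:

```c
#include <cpuid.h>

/* Nonzero if CPUID.1:EDX.FXSR (bit 24) reports FXSAVE/FXRSTOR support. */
static int cpu_has_fxsr(void)
{
    unsigned a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d))
        return 0;
    return (d & bit_FXSAVE) != 0;
}
```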

> For my AMD A10-9700 /proc/cpuinfo shows:
> 
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
> pat pse36 clflush mmx *fxsr* sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
> rdtscp lm constant_tsc rep_good acc_power nopl tsc_reliable nonstop_tsc cpuid 
> aperfmperf pni pclmuldq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes 
> *xsave* avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a 
> misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm 
> perfctr_core perfctr_nb bpext ptsc mwaitx cpb hw_pstate fsgsbase bmi1 avx2 smep 
> bmi2 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid 
> decode_assists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov
> 
> and /usr/bin/cpuid (package cpuid) shows (see my added !):
> 
> ...
>     feature information (1/edx):
>        x87 FPU on chip                        = true
>        VME: virtual-8086 mode enhancement     = true
>        DE: debugging extensions               = true
>        PSE: page size extensions              = true
>        TSC: time stamp counter                = true
>        RDMSR and WRMSR support                = true
>        PAE: physical address extensions       = true
>        MCE: machine check exception           = true
>        CMPXCHG8B inst.                        = true
>        APIC on chip                           = true
>        SYSENTER and SYSEXIT                   = true
>        MTRR: memory type range registers      = true
>        PTE global bit                         = true
>        MCA: machine check architecture        = true
>        CMOV: conditional move/compare instr   = true
>        PAT: page attribute table              = true
>        PSE-36: page size extension            = true
>        PSN: processor serial number           = false
>        CLFLUSH instruction                    = true
>        DS: debug store                        = false
>        ACPI: thermal monitor and clock ctrl   = false
>        MMX Technology                         = true
> !     FXSAVE/FXRSTOR                         = true
>        SSE extensions                         = true
>        SSE2 extensions                        = true
>        SS: self snoop                         = false
>        hyper-threading / multi-core supported = true
>        TM: therm. monitor                     = false
>        IA64                                   = false
>        PBE: pending break event               = false
>     feature information (1/ecx):
>        PNI/SSE3: Prescott New Instructions     = true
>        PCLMULDQ instruction                    = true
>        DTES64: 64-bit debug store              = false
>        MONITOR/MWAIT                           = true
>        CPL-qualified debug store               = false
>        VMX: virtual machine extensions         = false
>        SMX: safer mode extensions              = false
>        Enhanced Intel SpeedStep Technology     = false
>        TM2: thermal monitor 2                  = false
>        SSSE3 extensions                        = true
>        context ID: adaptive or shared L1 data  = false
>        SDBG: IA32_DEBUG_INTERFACE              = false
>        FMA instruction                         = true
>        CMPXCHG16B instruction                  = true
>        xTPR disable                            = false
>        PDCM: perfmon and debug                 = false
>        PCID: process context identifiers       = false
>        DCA: direct cache access                = false
>        SSE4.1 extensions                       = true
>        SSE4.2 extensions                       = true
>        x2APIC: extended xAPIC support          = false
>        MOVBE instruction                       = true
>        POPCNT instruction                      = true
>        time stamp counter deadline             = false
>        AES instruction                         = true
>        XSAVE/XSTOR states                      = true
> !     OS-enabled XSAVE/XSTOR                  = true
>        AVX: advanced vector extensions         = true
>        F16C half-precision convert instruction = true
>        RDRAND instruction                      = true
>        hypervisor guest status                 = false
> ...
>     XSAVE features (0xd/0):
>        XCR0 valid bit field mask               = 0x4000000000000007
>           x87 state                            = true
>           SSE state                            = true
>           AVX state                            = true
>           MPX BNDREGS                          = false
>           MPX BNDCSR                           = false
>           AVX-512 opmask                       = false
>           AVX-512 ZMM_Hi256                    = false
>           AVX-512 Hi16_ZMM                     = false
>           PKRU state                           = false
>           XTILECFG state                       = false
>           XTILEDATA state                      = false
>        bytes required by fields in XCR0        = 0x00000340 (832)
Is this ebx,

> !     bytes required by XSAVE/XRSTOR area     = 0x000003c0 (960)
and is this ecx from cpuid (0d:0)? I had checked some of my
environments, and ebx and ecx always had the same value, so
I thought either could be used...
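The two numbers in the dump can be reconstructed from the per-component layout. With XCR0 = x87 | SSE | AVX (bits 0 to 2), the standard-format area is the 512-byte legacy region, the 64-byte XSAVE header, and the 256-byte AVX (upper YMM) component: 512 + 64 + 256 = 832 = 0x340, which is ebx. The valid-bit mask 0x4000000000000007 also has bit 62 set, which AMD assigns to LWP (lightweight profiling) state; it is supported but not enabled in XCR0, and the extra 960 - 832 = 128 bytes in ecx are consistent with that. ebx and ecx therefore agree whenever the OS enables every supported user component, which would explain why they matched in the other environments checked. The sketch below (x86_64, GCC/Clang; names hypothetical) recomputes ebx from XCR0, read with xgetbv, and CPUID sub-leaves 2 to 62, which return each component's size in eax and offset in ebx.

```c
/* Hypothetical sketch: x86_64 with GCC/Clang. */
#include <cpuid.h>
#include <stdint.h>

/* OS-enabled XSAVE (CPUID.1:ECX.OSXSAVE) is also the precondition for xgetbv. */
static int xsave_usable(void)
{
    unsigned a, b, c, d;
    return __get_cpuid(1, &a, &b, &c, &d) && (c & bit_OSXSAVE);
}

/* XCR0: the user state components the OS has enabled. */
static uint64_t read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

/* Standard-format size: legacy area (512) + header (64), extended to the
   end of the highest enabled component i >= 2 (offset + size). */
static unsigned xsave_size_from_xcr0(void)
{
    uint64_t xcr0 = read_xcr0();
    unsigned end = 512 + 64;
    for (unsigned i = 2; i < 63; i++) {
        unsigned size, offset, ecx, edx;
        if (!(xcr0 & (UINT64_C(1) << i)))
            continue;
        if (__get_cpuid_count(0x0d, i, &size, &offset, &ecx, &edx)
            && offset + size > end)
            end = offset + size;
    }
    return end;
}

/* ebx of CPUID.(EAX=0Dh,ECX=0), for comparison. */
static unsigned xsave_size_cpuid_ebx(void)
{
    unsigned a, b, c, d;
    return __get_cpuid_count(0x0d, 0, &a, &b, &c, &d) ? b : 0;
}
```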

Please check v2 patch.

-- 
Takashi Yano <takashi DOT yano AT nifty DOT ne DOT jp>

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple

  Copyright © 2019   by DJ Delorie     Updated Jul 2019