From patchwork Mon Jul 29 11:37:45 2024
X-Patchwork-Submitter: Paul Zimmermann
X-Patchwork-Id: 94699
From: Paul Zimmermann
To: libc-alpha@sourceware.org
Cc: Paul Zimmermann
Subject: [PATCH] replace tgammaf by the CORE-MATH implementation (correctly rounded)
Date: Mon, 29 Jul 2024 13:37:45 +0200
Message-ID: <20240729113753.1527107-1-Paul.Zimmermann@inria.fr>
The CORE-MATH implementation is correctly rounded (for any rounding
mode). This can be checked by exhaustive tests in a few minutes, since
there are fewer than 2^32 values to check (against GNU MPFR, for example).

This patch also adds benchtest values for tgammaf. With the original
GNU libc code it gave:

  "tgammaf": {
   "": {
    "duration": 3.50188e+09,
    "iterations": 2e+07,
    "max": 602.891,
    "min": 65.1415,
    "mean": 175.094
   }
  }

With the new code:

  "tgammaf": {
   "": {
    "duration": 3.27888e+09,
    "iterations": 4.8e+07,
    "max": 232.661,
    "min": 29.932,
    "mean": 68.3099
   }
  }
---
 benchtests/Makefile                           |   1 +
 .../strcoll-inputs/filelist#en_US.UTF-8       |   1 -
 math/w_tgammaf_compat.c                       |  52 ++-
 sysdeps/ieee754/flt-32/e_gammaf_r.c           | 307 +++++++-----------
 sysdeps/x86_64/fpu/libm-test-ulps             |   4 -
 5 files changed, 136 insertions(+), 229 deletions(-)

diff --git a/benchtests/Makefile b/benchtests/Makefile
index 382fb7bae1..265ad34d8d 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -94,6 +94,7 @@ bench-math := \
   tan \
   tanh \
   tgamma \
+  tgammaf \
   trunc \
   truncf \
   y0 \
diff --git a/benchtests/strcoll-inputs/filelist#en_US.UTF-8 b/benchtests/strcoll-inputs/filelist#en_US.UTF-8
index 0d8f1c722b..93142aed97 100644
--- a/benchtests/strcoll-inputs/filelist#en_US.UTF-8
+++ b/benchtests/strcoll-inputs/filelist#en_US.UTF-8
@@ -5315,7 +5315,6 @@ s_isinf.c
 dbl2mpn.c
 atnat.h
 flt-32
-e_gammaf_r.c
 e_remainderf.c
 s_llroundf.c
 s_erff.c
diff --git a/math/w_tgammaf_compat.c b/math/w_tgammaf_compat.c
index 34e0e096e0..b5208c54d5 100644
--- a/math/w_tgammaf_compat.c
+++ b/math/w_tgammaf_compat.c
@@ -2,14 +2,28 @@
  */
 
 /*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
+Copyright (c) 2023 Alexei Sibidanov.
+
+This file was copied from the CORE-MATH project
+(file src/binary32/tgamma/tgammaf.c, revision 673c2af)
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
  */
 
 #include
@@ -22,26 +36,8 @@
 float
 __tgammaf(float x)
 {
-        int local_signgam;
-        float y = __ieee754_gammaf_r(x,&local_signgam);
-
-        if(__glibc_unlikely (!isfinite (y) || y == 0)
-           && (isfinite (x) || (isinf (x) && x < 0.0))
-           && _LIB_VERSION != _IEEE_) {
-            if (x == (float)0.0)
-                /* tgammaf pole */
-                return __kernel_standard_f(x, x, 150);
-            else if(floorf(x)==x&&x<0.0f)
-                /* tgammaf domain */
-                return __kernel_standard_f(x, x, 141);
-            else if (y == 0)
-                /* tgammaf underflow */
-                __set_errno (ERANGE);
-            else
-                /* tgammaf overflow */
-                return __kernel_standard_f(x, x, 140);
-        }
-        return local_signgam < 0 ? - y : y;
+  int e;
+  return __ieee754_gammaf_r (x, &e);
 }
 libm_alias_float (__tgamma, tgamma)
 #endif
diff --git a/sysdeps/ieee754/flt-32/e_gammaf_r.c b/sysdeps/ieee754/flt-32/e_gammaf_r.c
index a9730d61c1..9df66c7757 100644
--- a/sysdeps/ieee754/flt-32/e_gammaf_r.c
+++ b/sysdeps/ieee754/flt-32/e_gammaf_r.c
@@ -1,215 +1,130 @@
-/* Implementation of gamma function according to ISO C.
-   Copyright (C) 1997-2024 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
+/* Implementation of the gamma function for binary32.
 
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
+Copyright (c) 2023-2024 Alexei Sibidanov.
 
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
+This file was copied from the CORE-MATH project
+(file src/binary32/tgamma/tgammaf.c, revision a48e352)
 
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   .  */
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
 
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-
-/* Coefficients B_2k / 2k(2k-1) of x^-(2k-1) inside exp in Stirling's
-   approximation to gamma function.  */
-
-static const float gamma_coeff[] =
-  {
-    0x1.555556p-4f,
-    -0xb.60b61p-12f,
-    0x3.403404p-12f,
-  };
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
 
-#define NCOEFF (sizeof (gamma_coeff) / sizeof (gamma_coeff[0]))
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+ */
 
-/* Return gamma (X), for positive X less than 42, in the form R *
-   2^(*EXP2_ADJ), where R is the return value and *EXP2_ADJ is set to
-   avoid overflow or underflow in intermediate calculations.  */
+#include
+#include
+#include
 
-static float
-gammaf_positive (float x, int *exp2_adj)
-{
-  int local_signgam;
-  if (x < 0.5f)
-    {
-      *exp2_adj = 0;
-      return __ieee754_expf (__ieee754_lgammaf_r (x + 1, &local_signgam)) / x;
-    }
-  else if (x <= 1.5f)
-    {
-      *exp2_adj = 0;
-      return __ieee754_expf (__ieee754_lgammaf_r (x, &local_signgam));
-    }
-  else if (x < 2.5f)
-    {
-      *exp2_adj = 0;
-      float x_adj = x - 1;
-      return (__ieee754_expf (__ieee754_lgammaf_r (x_adj, &local_signgam))
-              * x_adj);
-    }
-  else
-    {
-      float eps = 0;
-      float x_eps = 0;
-      float x_adj = x;
-      float prod = 1;
-      if (x < 4.0f)
-        {
-          /* Adjust into the range for applying Stirling's
-             approximation.  */
-          float n = ceilf (4.0f - x);
-          x_adj = math_narrow_eval (x + n);
-          x_eps = (x - (x_adj - n));
-          prod = __gamma_productf (x_adj - n, x_eps, n, &eps);
-        }
-      /* The result is now gamma (X_ADJ + X_EPS) / (PROD * (1 + EPS)).
-         Compute gamma (X_ADJ + X_EPS) using Stirling's approximation,
-         starting by computing pow (X_ADJ, X_ADJ) with a power of 2
-         factored out.  */
-      float exp_adj = -eps;
-      float x_adj_int = roundf (x_adj);
-      float x_adj_frac = x_adj - x_adj_int;
-      int x_adj_log2;
-      float x_adj_mant = __frexpf (x_adj, &x_adj_log2);
-      if (x_adj_mant < M_SQRT1_2f)
-        {
-          x_adj_log2--;
-          x_adj_mant *= 2.0f;
-        }
-      *exp2_adj = x_adj_log2 * (int) x_adj_int;
-      float ret = (__ieee754_powf (x_adj_mant, x_adj)
-                   * __ieee754_exp2f (x_adj_log2 * x_adj_frac)
-                   * __ieee754_expf (-x_adj)
-                   * sqrtf (2 * M_PIf / x_adj)
-                   / prod);
-      exp_adj += x_eps * __ieee754_logf (x_adj);
-      float bsum = gamma_coeff[NCOEFF - 1];
-      float x_adj2 = x_adj * x_adj;
-      for (size_t i = 1; i <= NCOEFF - 1; i++)
-        bsum = bsum / x_adj2 + gamma_coeff[NCOEFF - 1 - i];
-      exp_adj += bsum / x_adj;
-      return ret + ret * __expm1f (exp_adj);
-    }
-}
+typedef union {float f; uint32_t u;} b32u32_u;
+typedef union {double f; uint64_t u;} b64u64_u;
 
 float
-__ieee754_gammaf_r (float x, int *signgamp)
+__ieee754_gammaf_r (float x, int *exp2_adj)
 {
-  int32_t hx;
-  float ret;
-
-  GET_FLOAT_WORD (hx, x);
+  static const struct {b32u32_u x; float f, df;} tb[] = {
+    {{.u = 0x27de86a9u}, 0x1.268266p+47f, 0x1p22f},
+    {{.u = 0x27e05475u}, 0x1.242422p+47f, 0x1p22f},
+    {{.u = 0xb63befb3u}, -0x1.5cb6e4p+18f, 0x1p-7f},
+    {{.u = 0x3c7bb570u}, 0x1.021d9p+6f, 0x1p-19f},
+    {{.u = 0x41e886d1u}, 0x1.33136ap+98f, 0x1p73f},
+    {{.u = 0xc067d177u}, 0x1.f6850cp-3f, 0x1p-28f},
+    {{.f = -0x1.33b462p-4}, -0x1.befe66p+3, -0x1p-22f},
+    {{.f = -0x1.a988b4p-1}, -0x1.a6b4ecp+2, +0x1p-23f},
+    {{.f = 0x1.dceffcp+4}, 0x1.d3631cp+101, -0x1p-76f},
+    {{.f = 0x1.0874c8p+0}, 0x1.f6c638p-1, 0x1p-26f},
+  };
 
-  if (__glibc_unlikely ((hx & 0x7fffffff) == 0))
-    {
-      /* Return value for x == 0 is Inf with divide by zero exception.  */
-      *signgamp = 0;
-      return 1.0 / x;
+  b32u32_u t = {.f = x};
+  uint32_t ax = t.u<<1;
+  if(__builtin_expect(ax>=(0xffu<<24), 0)){
+    if(ax==(0xffu<<24)){
+      if(t.u>>31){
+        errno = EDOM;
+        return __builtin_nanf("12");
+      }
+      return x;
     }
-  if (__builtin_expect (hx < 0, 0)
-      && (uint32_t) hx < 0xff800000 && rintf (x) == x)
-    {
-      /* Return value for integer x < 0 is NaN with invalid exception.  */
-      *signgamp = 0;
-      return (x - x) / (x - x);
+    return x; // nan
+  }
+  double z = x;
+  if(__builtin_expect(ax<0x6d000000u, 0)){
+    volatile double d = (0x1.fa658c23b1578p-1 - 0x1.d0a118f324b63p-1*z)*z - 0x1.2788cfc6fb619p-1;
+    double f = 1.0/z + d;
+    float r = f;
+    if(__builtin_fabs(r)>0x1.fffffep+127f) errno = ERANGE;
+    b64u64_u rt = {.f = f};
+    if(((rt.u+2)&0xfffffff) < 4){
+      for(unsigned i=0;i= 0x1.18522p+5f, 0)){
+    float r = 0x1p127f * 0x1p127f;
+    if(r>0x1.fffffep+127) errno = ERANGE;
+    return r;
+  }
+  if(__builtin_expect(fx==x, 0)){
+    if(x == 0.0f){
+      errno = ERANGE;
+      return 1.0f/x;
     }
-  if (__glibc_unlikely ((hx & 0x7f800000) == 0x7f800000))
-    {
-      /* Positive infinity (return positive infinity) or NaN (return
-         NaN).  */
-      *signgamp = 0;
-      return x + x;
+    if(x < 0.0f) {
+      errno = EDOM;
+      return __builtin_nanf("12");
     }
+    double t0 = 1, x0 = 1;
+    for(int i=1; i= 36.0f)
-    {
-      /* Overflow.  */
-      *signgamp = 0;
-      ret = math_narrow_eval (FLT_MAX * FLT_MAX);
-      return ret;
-    }
-  else
-    {
-      SET_RESTORE_ROUNDF (FE_TONEAREST);
-      if (x > 0.0f)
-        {
-          *signgamp = 0;
-          int exp2_adj;
-          float tret = gammaf_positive (x, &exp2_adj);
-          ret = __scalbnf (tret, exp2_adj);
-        }
-      else if (x >= -FLT_EPSILON / 4.0f)
-        {
-          *signgamp = 0;
-          ret = 1.0f / x;
-        }
-      else
-        {
-          float tx = truncf (x);
-          *signgamp = (tx == 2.0f * truncf (tx / 2.0f)) ? -1 : 1;
-          if (x <= -42.0f)
-            /* Underflow.  */
-            ret = FLT_MIN * FLT_MIN;
-          else
-            {
-              float frac = tx - x;
-              if (frac > 0.5f)
-                frac = 1.0f - frac;
-              float sinpix = (frac <= 0.25f
-                              ? __sinf (M_PIf * frac)
-                              : __cosf (M_PIf * (0.5f - frac)));
-              int exp2_adj;
-              float tret = M_PIf / (-x * sinpix
-                                    * gammaf_positive (-x, &exp2_adj));
-              ret = __scalbnf (tret, -exp2_adj);
-              math_check_force_underflow_nonneg (ret);
-            }
-        }
-      ret = math_narrow_eval (ret);
-    }
-  if (isinf (ret) && x != 0)
-    {
-      if (*signgamp < 0)
-        {
-          ret = math_narrow_eval (-copysignf (FLT_MAX, ret) * FLT_MAX);
-          ret = -ret;
-        }
-      else
-        ret = math_narrow_eval (copysignf (FLT_MAX, ret) * FLT_MAX);
-      return ret;
-    }
-  else if (ret == 0)
-    {
-      if (*signgamp < 0)
-        {
-          ret = math_narrow_eval (-copysignf (FLT_MIN, ret) * FLT_MIN);
-          ret = -ret;
-        }
-      else
-        ret = math_narrow_eval (copysignf (FLT_MIN, ret) * FLT_MIN);
-      return ret;
+  double m = z - 0x1.7p+1, i = __builtin_roundeven(m), step = __builtin_copysign(1.0,i);
+  double d = m - i, d2 = d*d, d4 = d2*d2, d8 = d4*d4;
+  double f = (c[0] + d*c[1]) + d2*(c[2] + d*c[3]) + d4*((c[4] + d*c[5]) + d2*(c[6] + d*c[7]))
+    + d8*((c[8] + d*c[9]) + d2*(c[10] + d*c[11]) + d4*((c[12] + d*c[13]) + d2*(c[14] + d*c[15])));
+  int jm = __builtin_fabs(i);
+  double w = 1;
+  if(jm){
+    z -= 0.5 + step*0.5;
+    w = z;
+    for(int j=jm-1; j; j--) {z -= step; w *= z;}
+  }
+  if(i<=-0.5) w = 1/w;
+  f *= w;
+  b64u64_u rt = {.f = f};
+  float r = f;
+  if(__builtin_expect(r==0.0f, 0)) errno = ERANGE;
+  if(__builtin_expect(((rt.u+2)&0xfffffff) < 8, 0)){
+    for(unsigned i=0;i