From patchwork Mon Jul 29 11:37:45 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul Zimmermann <Paul.Zimmermann@inria.fr>
X-Patchwork-Id: 94699
Return-Path: <libc-alpha-bounces~patchwork=sourceware.org@sourceware.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 55BA53858C31
	for <patchwork@sourceware.org>; Mon, 29 Jul 2024 11:38:26 +0000 (GMT)
X-Original-To: libc-alpha@sourceware.org
Delivered-To: libc-alpha@sourceware.org
Received: from mail3-relais-sop.national.inria.fr
 (mail3-relais-sop.national.inria.fr [192.134.164.104])
 by sourceware.org (Postfix) with ESMTPS id CE59E3858417
 for <libc-alpha@sourceware.org>; Mon, 29 Jul 2024 11:37:58 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CE59E3858417
Authentication-Results: sourceware.org;
 dmarc=fail (p=none dis=none) header.from=inria.fr
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loria.fr
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org CE59E3858417
Authentication-Results: server2.sourceware.org;
 arc=none smtp.remote-ip=192.134.164.104
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722253082; cv=none;
 b=uOPz0IweNqsEmD4cHKbVj6/jnqrkUGwoWbEbg8YqBcmibPYG9++petNjNQxzDG3sdpURZLm89dkhCU9uDCAzcEqwD9Ph240LTfuX1vnmPAkoxmC6rrGql8OZrXt1M33KmmvtDXx/xllSCPxsqYSSv5+NHW3csuJkvuNBbGLAAZA=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
 t=1722253082; c=relaxed/simple;
 bh=CSph29bdr5h3Krb1iVcl6hnHVTzqKszB64qieDplXfA=;
 h=From:To:Subject:Date:Message-ID:MIME-Version;
 b=vS9TlkpDeTwudmaCaSMvEPAAxydg+T9APMsi9ua2BDB9rNPID65oIWEkWJNuTYC3scAB0Cwoje7AAt0hAATy+prFQWHg00otU5rylaql5wtjmJOf7CHerZ5Qn+nyKr9aGp+9AIkK8pLg1jThC6Mk2t4Ktk57y7Lo74ODAxRrEPY=
ARC-Authentication-Results: i=1; server2.sourceware.org
Authentication-Results: mail3-relais-sop.national.inria.fr;
 dkim=none (message not signed) header.i=none;
 spf=SoftFail smtp.mailfrom=Paul.Zimmermann@loria.fr;
 spf=None smtp.helo=postmaster@coriandre.loria.fr
Received-SPF: SoftFail (mail3-relais-sop.national.inria.fr:
 domain of Paul.Zimmermann@loria.fr is inclined to not
 designate 152.81.9.227 as permitted sender)
 identity=mailfrom; client-ip=152.81.9.227;
 receiver=mail3-relais-sop.national.inria.fr;
 envelope-from="Paul.Zimmermann@loria.fr";
 x-sender="Paul.Zimmermann@loria.fr"; x-conformance=spf_only;
 x-record-type="v=spf1"; x-record-text="v=spf1
 ip4:128.93.142.0/24 ip4:192.134.164.0/24 ip4:128.93.162.160
 ip4:89.107.174.7 mx ~all"
Received-SPF: None (mail3-relais-sop.national.inria.fr: no sender
 authenticity information available from domain of
 postmaster@coriandre.loria.fr) identity=helo;
 client-ip=152.81.9.227;
 receiver=mail3-relais-sop.national.inria.fr;
 envelope-from="Paul.Zimmermann@loria.fr";
 x-sender="postmaster@coriandre.loria.fr"; x-conformance=spf_only
X-IronPort-AV: E=Sophos;i="6.09,245,1716242400"; d="scan'208";a="93156909"
Received: from coriandre.loria.fr ([152.81.9.227])
 by mail3-relais-sop.national.inria.fr with
 ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jul 2024 13:37:57 +0200
Received: from zimmerma by coriandre.loria.fr with local (Exim 4.98)
 (envelope-from <Paul.Zimmermann@loria.fr>)
 id 1sYOhR-00000006PGw-1ue2; Mon, 29 Jul 2024 13:37:57 +0200
From: Paul Zimmermann <Paul.Zimmermann@inria.fr>
To: libc-alpha@sourceware.org
Cc: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Subject: [PATCH] replace tgammaf by the CORE-MATH implementation (correctly
 rounded)
Date: Mon, 29 Jul 2024 13:37:45 +0200
Message-ID: <20240729113753.1527107-1-Paul.Zimmermann@inria.fr>
X-Mailer: git-send-email 2.43.0
MIME-Version: 1.0
Sender: Paul Zimmermann <Paul.Zimmermann@loria.fr>
X-Spam-Status: No, score=-9.2 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_ASCII_DIVIDERS, KAM_DMARC_STATUS,
 KAM_SHORT,
 RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
Errors-To: libc-alpha-bounces~patchwork=sourceware.org@sourceware.org

The CORE-MATH implementation is correctly rounded (for any rounding mode).
This can be checked by exhaustive tests in a few minutes since there are
less than 2^32 values to check (against GNU MPFR for example).

This patch also adds benchtest values for tgammaf. With the initial
GNU libc code it gave:

  "tgammaf": {
   "": {
    "duration": 3.50188e+09,
    "iterations": 2e+07,
    "max": 602.891,
    "min": 65.1415,
    "mean": 175.094
   }
  }

With the new code:

  "tgammaf": {
   "": {
    "duration": 3.27888e+09,
    "iterations": 4.8e+07,
    "max": 232.661,
    "min": 29.932,
    "mean": 68.3099
   }
  }
---
 benchtests/Makefile                           |   1 +
 .../strcoll-inputs/filelist#en_US.UTF-8       |   1 -
 math/w_tgammaf_compat.c                       |  52 ++-
 sysdeps/ieee754/flt-32/e_gammaf_r.c           | 307 +++++++-----------
 sysdeps/x86_64/fpu/libm-test-ulps             |   4 -
 5 files changed, 136 insertions(+), 229 deletions(-)

diff --git a/benchtests/Makefile b/benchtests/Makefile
index 382fb7bae1..265ad34d8d 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -94,6 +94,7 @@ bench-math := \
   tan \
   tanh \
   tgamma \
+  tgammaf \
   trunc \
   truncf \
   y0 \
diff --git a/benchtests/strcoll-inputs/filelist#en_US.UTF-8 b/benchtests/strcoll-inputs/filelist#en_US.UTF-8
index 0d8f1c722b..93142aed97 100644
--- a/benchtests/strcoll-inputs/filelist#en_US.UTF-8
+++ b/benchtests/strcoll-inputs/filelist#en_US.UTF-8
@@ -5315,7 +5315,6 @@ s_isinf.c
 dbl2mpn.c
 atnat.h
 flt-32
-e_gammaf_r.c
 e_remainderf.c
 s_llroundf.c
 s_erff.c
diff --git a/math/w_tgammaf_compat.c b/math/w_tgammaf_compat.c
index 34e0e096e0..b5208c54d5 100644
--- a/math/w_tgammaf_compat.c
+++ b/math/w_tgammaf_compat.c
@@ -2,14 +2,28 @@
  */
 
 /*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
+Copyright (c) 2023 Alexei Sibidanov.
+
+This file was copied from the CORE-MATH project
+(file src/binary32/tgamma/tgammaf.c, revision 673c2af)
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
  */
 
 #include <errno.h>
@@ -22,26 +36,8 @@
 float
 __tgammaf(float x)
 {
-	int local_signgam;
-	float y = __ieee754_gammaf_r(x,&local_signgam);
-
-	if(__glibc_unlikely (!isfinite (y) || y == 0)
-	   && (isfinite (x) || (isinf (x) && x < 0.0))
-	   && _LIB_VERSION != _IEEE_) {
-	  if (x == (float)0.0)
-	    /* tgammaf pole */
-	    return __kernel_standard_f(x, x, 150);
-	  else if(floorf(x)==x&&x<0.0f)
-	    /* tgammaf domain */
-	    return __kernel_standard_f(x, x, 141);
-	  else if (y == 0)
-	    /* tgammaf underflow */
-	    __set_errno (ERANGE);
-	  else
-	    /* tgammaf overflow */
-	    return __kernel_standard_f(x, x, 140);
-	}
-	return local_signgam < 0 ? - y : y;
+  int e;
+  return __ieee754_gammaf_r (x, &e);
 }
 libm_alias_float (__tgamma, tgamma)
 #endif
diff --git a/sysdeps/ieee754/flt-32/e_gammaf_r.c b/sysdeps/ieee754/flt-32/e_gammaf_r.c
index a9730d61c1..9df66c7757 100644
--- a/sysdeps/ieee754/flt-32/e_gammaf_r.c
+++ b/sysdeps/ieee754/flt-32/e_gammaf_r.c
@@ -1,215 +1,130 @@
-/* Implementation of gamma function according to ISO C.
-   Copyright (C) 1997-2024 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
+/* Implementation of the gamma function for binary32.
 
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
+Copyright (c) 2023-2024 Alexei Sibidanov.
 
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
+This file was copied from the CORE-MATH project
+(file src/binary32/tgamma/tgammaf.c, revision a48e352)
 
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
 
-#include <math.h>
-#include <math-narrow-eval.h>
-#include <math_private.h>
-#include <fenv_private.h>
-#include <math-underflow.h>
-#include <float.h>
-#include <libm-alias-finite.h>
-
-/* Coefficients B_2k / 2k(2k-1) of x^-(2k-1) inside exp in Stirling's
-   approximation to gamma function.  */
-
-static const float gamma_coeff[] =
-  {
-    0x1.555556p-4f,
-    -0xb.60b61p-12f,
-    0x3.403404p-12f,
-  };
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
 
-#define NCOEFF (sizeof (gamma_coeff) / sizeof (gamma_coeff[0]))
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+ */
 
-/* Return gamma (X), for positive X less than 42, in the form R *
-   2^(*EXP2_ADJ), where R is the return value and *EXP2_ADJ is set to
-   avoid overflow or underflow in intermediate calculations.  */
+#include <stdint.h>
+#include <errno.h>
+#include <libm-alias-finite.h>
 
-static float
-gammaf_positive (float x, int *exp2_adj)
-{
-  int local_signgam;
-  if (x < 0.5f)
-    {
-      *exp2_adj = 0;
-      return __ieee754_expf (__ieee754_lgammaf_r (x + 1, &local_signgam)) / x;
-    }
-  else if (x <= 1.5f)
-    {
-      *exp2_adj = 0;
-      return __ieee754_expf (__ieee754_lgammaf_r (x, &local_signgam));
-    }
-  else if (x < 2.5f)
-    {
-      *exp2_adj = 0;
-      float x_adj = x - 1;
-      return (__ieee754_expf (__ieee754_lgammaf_r (x_adj, &local_signgam))
-	      * x_adj);
-    }
-  else
-    {
-      float eps = 0;
-      float x_eps = 0;
-      float x_adj = x;
-      float prod = 1;
-      if (x < 4.0f)
-	{
-	  /* Adjust into the range for applying Stirling's
-	     approximation.  */
-	  float n = ceilf (4.0f - x);
-	  x_adj = math_narrow_eval (x + n);
-	  x_eps = (x - (x_adj - n));
-	  prod = __gamma_productf (x_adj - n, x_eps, n, &eps);
-	}
-      /* The result is now gamma (X_ADJ + X_EPS) / (PROD * (1 + EPS)).
-	 Compute gamma (X_ADJ + X_EPS) using Stirling's approximation,
-	 starting by computing pow (X_ADJ, X_ADJ) with a power of 2
-	 factored out.  */
-      float exp_adj = -eps;
-      float x_adj_int = roundf (x_adj);
-      float x_adj_frac = x_adj - x_adj_int;
-      int x_adj_log2;
-      float x_adj_mant = __frexpf (x_adj, &x_adj_log2);
-      if (x_adj_mant < M_SQRT1_2f)
-	{
-	  x_adj_log2--;
-	  x_adj_mant *= 2.0f;
-	}
-      *exp2_adj = x_adj_log2 * (int) x_adj_int;
-      float ret = (__ieee754_powf (x_adj_mant, x_adj)
-		   * __ieee754_exp2f (x_adj_log2 * x_adj_frac)
-		   * __ieee754_expf (-x_adj)
-		   * sqrtf (2 * M_PIf / x_adj)
-		   / prod);
-      exp_adj += x_eps * __ieee754_logf (x_adj);
-      float bsum = gamma_coeff[NCOEFF - 1];
-      float x_adj2 = x_adj * x_adj;
-      for (size_t i = 1; i <= NCOEFF - 1; i++)
-	bsum = bsum / x_adj2 + gamma_coeff[NCOEFF - 1 - i];
-      exp_adj += bsum / x_adj;
-      return ret + ret * __expm1f (exp_adj);
-    }
-}
+typedef union {float f; uint32_t u;} b32u32_u;
+typedef union {double f; uint64_t u;} b64u64_u;
 
 float
-__ieee754_gammaf_r (float x, int *signgamp)
+__ieee754_gammaf_r (float x, int *exp2_adj)
 {
-  int32_t hx;
-  float ret;
-
-  GET_FLOAT_WORD (hx, x);
+  static const struct {b32u32_u x; float f, df;} tb[] = {
+    {{.u = 0x27de86a9u}, 0x1.268266p+47f, 0x1p22f},
+    {{.u = 0x27e05475u}, 0x1.242422p+47f, 0x1p22f},
+    {{.u = 0xb63befb3u}, -0x1.5cb6e4p+18f, 0x1p-7f},
+    {{.u = 0x3c7bb570u}, 0x1.021d9p+6f, 0x1p-19f},
+    {{.u = 0x41e886d1u}, 0x1.33136ap+98f, 0x1p73f},
+    {{.u = 0xc067d177u}, 0x1.f6850cp-3f, 0x1p-28f},
+    {{.f = -0x1.33b462p-4}, -0x1.befe66p+3, -0x1p-22f},
+    {{.f = -0x1.a988b4p-1}, -0x1.a6b4ecp+2, +0x1p-23f},
+    {{.f = 0x1.dceffcp+4}, 0x1.d3631cp+101, -0x1p-76f},
+    {{.f = 0x1.0874c8p+0}, 0x1.f6c638p-1, 0x1p-26f},
+  };
 
-  if (__glibc_unlikely ((hx & 0x7fffffff) == 0))
-    {
-      /* Return value for x == 0 is Inf with divide by zero exception.  */
-      *signgamp = 0;
-      return 1.0 / x;
+  b32u32_u t = {.f = x};
+  uint32_t ax = t.u<<1;
+  if(__builtin_expect(ax>=(0xffu<<24), 0)){
+    if(ax==(0xffu<<24)){
+      if(t.u>>31){
+	errno = EDOM;
+	return __builtin_nanf("12");
+      }
+      return x;
     }
-  if (__builtin_expect (hx < 0, 0)
-      && (uint32_t) hx < 0xff800000 && rintf (x) == x)
-    {
-      /* Return value for integer x < 0 is NaN with invalid exception.  */
-      *signgamp = 0;
-      return (x - x) / (x - x);
+    return x; // nan
+  }
+  double z = x;
+  if(__builtin_expect(ax<0x6d000000u, 0)){
+    volatile double d = (0x1.fa658c23b1578p-1 - 0x1.d0a118f324b63p-1*z)*z - 0x1.2788cfc6fb619p-1;
+    double f = 1.0/z + d;
+    float r = f;
+    if(__builtin_fabs(r)>0x1.fffffep+127f) errno = ERANGE;
+    b64u64_u rt = {.f = f};
+    if(((rt.u+2)&0xfffffff) < 4){
+      for(unsigned i=0;i<sizeof(tb)/sizeof(tb[0]);i++)
+	if(t.u==tb[i].x.u) return tb[i].f + tb[i].df;
     }
-  if (__glibc_unlikely (hx == 0xff800000))
-    {
-      /* x == -Inf.  According to ISO this is NaN.  */
-      *signgamp = 0;
-      return x - x;
+    return r;
+  }
+  float fx = __builtin_floor(x);
+  int k = fx;
+  if(__builtin_expect(x >= 0x1.18522p+5f, 0)){
+    float r = 0x1p127f * 0x1p127f;
+    if(r>0x1.fffffep+127) errno = ERANGE;
+    return r;
+  }
+  if(__builtin_expect(fx==x, 0)){
+    if(x == 0.0f){
+      errno = ERANGE;
+      return 1.0f/x;
     }
-  if (__glibc_unlikely ((hx & 0x7f800000) == 0x7f800000))
-    {
-      /* Positive infinity (return positive infinity) or NaN (return
-	 NaN).  */
-      *signgamp = 0;
-      return x + x;
+    if(x < 0.0f) {
+      errno = EDOM;
+      return __builtin_nanf("12");
     }
+    double t0 = 1, x0 = 1;
+    for(int i=1; i<k; i++, x0 += 1.0) t0 *= x0;
+    return t0;
+  }
+  if(__builtin_expect(x<-47.0f, 0)){
+    static const float sgn[2] = {0x1p-127, -0x1p-127};
+    float r = 0x1p-127f * sgn[k&1];
+    if(r == 0.0f) errno = ERANGE;
+    return r;
+  }
+  static const double c[] =
+    {0x1.c9a76be577123p+0, 0x1.8f2754ddcf90dp+0, 0x1.0d1191949419bp+0, 0x1.e1f42cf0ae4a1p-2,
+     0x1.82b358a3ab638p-3, 0x1.e1f2b30cd907bp-5, 0x1.240f6d4071bd8p-6, 0x1.1522c9f3cd012p-8,
+     0x1.1fd0051a0525bp-10, 0x1.9808a8b96c37ep-13, 0x1.b3f78e01152b5p-15, 0x1.49c85a7e1fd04p-18,
+     0x1.471ca49184475p-19, -0x1.368f0b7ed9e36p-23, 0x1.882222f9049efp-23, -0x1.a69ed2042842cp-25};
 
-  if (x >= 36.0f)
-    {
-      /* Overflow.  */
-      *signgamp = 0;
-      ret = math_narrow_eval (FLT_MAX * FLT_MAX);
-      return ret;
-    }
-  else
-    {
-      SET_RESTORE_ROUNDF (FE_TONEAREST);
-      if (x > 0.0f)
-	{
-	  *signgamp = 0;
-	  int exp2_adj;
-	  float tret = gammaf_positive (x, &exp2_adj);
-	  ret = __scalbnf (tret, exp2_adj);
-	}
-      else if (x >= -FLT_EPSILON / 4.0f)
-	{
-	  *signgamp = 0;
-	  ret = 1.0f / x;
-	}
-      else
-	{
-	  float tx = truncf (x);
-	  *signgamp = (tx == 2.0f * truncf (tx / 2.0f)) ? -1 : 1;
-	  if (x <= -42.0f)
-	    /* Underflow.  */
-	    ret = FLT_MIN * FLT_MIN;
-	  else
-	    {
-	      float frac = tx - x;
-	      if (frac > 0.5f)
-		frac = 1.0f - frac;
-	      float sinpix = (frac <= 0.25f
-			      ? __sinf (M_PIf * frac)
-			      : __cosf (M_PIf * (0.5f - frac)));
-	      int exp2_adj;
-	      float tret = M_PIf / (-x * sinpix
-				    * gammaf_positive (-x, &exp2_adj));
-	      ret = __scalbnf (tret, -exp2_adj);
-	      math_check_force_underflow_nonneg (ret);
-	    }
-	}
-      ret = math_narrow_eval (ret);
-    }
-  if (isinf (ret) && x != 0)
-    {
-      if (*signgamp < 0)
-	{
-	  ret = math_narrow_eval (-copysignf (FLT_MAX, ret) * FLT_MAX);
-	  ret = -ret;
-	}
-      else
-	ret = math_narrow_eval (copysignf (FLT_MAX, ret) * FLT_MAX);
-      return ret;
-    }
-  else if (ret == 0)
-    {
-      if (*signgamp < 0)
-	{
-	  ret = math_narrow_eval (-copysignf (FLT_MIN, ret) * FLT_MIN);
-	  ret = -ret;
-	}
-      else
-	ret = math_narrow_eval (copysignf (FLT_MIN, ret) * FLT_MIN);
-      return ret;
+  double m = z - 0x1.7p+1, i = __builtin_roundeven(m), step = __builtin_copysign(1.0,i);
+  double d = m - i, d2 = d*d, d4 = d2*d2, d8 = d4*d4;
+  double f = (c[0] + d*c[1]) + d2*(c[2] + d*c[3]) + d4*((c[4] + d*c[5]) + d2*(c[6] + d*c[7]))
+    + d8*((c[8] + d*c[9]) + d2*(c[10] + d*c[11]) + d4*((c[12] + d*c[13]) + d2*(c[14] + d*c[15])));
+  int jm = __builtin_fabs(i);
+  double w = 1;
+  if(jm){
+    z -= 0.5 + step*0.5;
+    w = z;
+    for(int j=jm-1; j; j--) {z -= step; w *= z;}
+  }
+  if(i<=-0.5) w = 1/w;
+  f *= w;
+  b64u64_u rt = {.f = f};
+  float r = f;
+  if(__builtin_expect(r==0.0f, 0)) errno = ERANGE;
+  if(__builtin_expect(((rt.u+2)&0xfffffff) < 8, 0)){
+    for(unsigned i=0;i<sizeof(tb)/sizeof(tb[0]);i++) {
+      if(t.u==tb[i].x.u) return tb[i].f + tb[i].df;
     }
-  else
-    return ret;
+  }
+  return r;
 }
 libm_alias_finite (__ieee754_gammaf_r, __gammaf_r)
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 37d8998c71..29e3ca926a 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -2263,25 +2263,21 @@ double: 1
 
 Function: "tgamma":
 double: 9
-float: 8
 float128: 4
 ldouble: 5
 
 Function: "tgamma_downward":
 double: 9
-float: 7
 float128: 5
 ldouble: 6
 
 Function: "tgamma_towardzero":
 double: 9
-float: 7
 float128: 5
 ldouble: 6
 
 Function: "tgamma_upward":
 double: 9
-float: 8
 float128: 4
 ldouble: 5