From patchwork Wed Nov 27 07:45:54 2024
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 101951
From: "H.J. Lu"
Date: Wed, 27 Nov 2024 15:45:54 +0800
Subject: [PATCH] malloc: Optimize small memory zeroing for calloc
To: GNU C Library, "Guo, Wangyang", Sunil K Pandey, Noah Goldstein, Florian Weimer
List-Id: Libc-alpha mailing list

For memory size up to 9 * INTERNAL_SIZE_T bytes, calloc has special code
to clear the memory.  Add calloc-clear-memory.h to allow
architecture-specific optimization.  On x86-64, it uses up to 1 branch,
instead of 3, and up to 5 stores, instead of 9, by using overlapping
vector stores:

Test Platform: Xeon-8380
Bench Function: calloc
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.953
4 threads  | 0.952

OK for master?

Thanks.

From ffc1deb8150640ac377f97617570e8c4e92efe72 Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Tue, 26 Nov 2024 16:15:25 +0800
Subject: [PATCH] malloc: Optimize small memory zeroing for calloc

For memory size up to 9 * INTERNAL_SIZE_T bytes, calloc has special code
to clear the memory.  Add calloc-clear-memory.h to allow
architecture-specific optimization.
On x86-64, it uses up to 1 branch, instead of 3, and up to 5 stores,
instead of 9, by using overlapping vector stores:

Test Platform: Xeon-8380
Bench Function: calloc
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.953
4 threads  | 0.952

Signed-off-by: H.J. Lu
---
 malloc/malloc-internal.h                |  1 +
 malloc/malloc.c                         | 22 +------------
 sysdeps/generic/calloc-clear-memory.h   | 44 +++++++++++++++++++++++++
 sysdeps/x86_64/64/calloc-clear-memory.h | 40 ++++++++++++++++++++++
 4 files changed, 86 insertions(+), 21 deletions(-)
 create mode 100644 sysdeps/generic/calloc-clear-memory.h
 create mode 100644 sysdeps/x86_64/64/calloc-clear-memory.h

diff --git a/malloc/malloc-internal.h b/malloc/malloc-internal.h
index cba03433fe..3349e2d1fe 100644
--- a/malloc/malloc-internal.h
+++ b/malloc/malloc-internal.h
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include <calloc-clear-memory.h>
 
 /* Called in the parent process before a fork.  */
 void __malloc_fork_lock_parent (void) attribute_hidden;
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 32dbc272a8..00b01d282a 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3863,28 +3863,8 @@ __libc_calloc (size_t n, size_t elem_size)
 
   if (nclears > 9)
     return memset (d, 0, clearsize);
 
-  else
-    {
-      *(d + 0) = 0;
-      *(d + 1) = 0;
-      *(d + 2) = 0;
-      if (nclears > 4)
-        {
-          *(d + 3) = 0;
-          *(d + 4) = 0;
-          if (nclears > 6)
-            {
-              *(d + 5) = 0;
-              *(d + 6) = 0;
-              if (nclears > 8)
-                {
-                  *(d + 7) = 0;
-                  *(d + 8) = 0;
-                }
-            }
-        }
-    }
+  clear_small_memory (d, nclears);
 
   return mem;
 }
diff --git a/sysdeps/generic/calloc-clear-memory.h b/sysdeps/generic/calloc-clear-memory.h
new file mode 100644
index 0000000000..fd3c4abeea
--- /dev/null
+++ b/sysdeps/generic/calloc-clear-memory.h
@@ -0,0 +1,44 @@
+/* Clear a block of memory for calloc.  Generic version.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Clear a memory size up to 9 * INTERNAL_SIZE_T bytes.  We know that
+   contents have an odd number of INTERNAL_SIZE_T-sized words; minimally
+   3 words.  */
+
+static __always_inline void
+clear_small_memory (INTERNAL_SIZE_T *mem, unsigned long nclears)
+{
+  *(mem + 0) = 0;
+  *(mem + 1) = 0;
+  *(mem + 2) = 0;
+  if (nclears > 4)
+    {
+      *(mem + 3) = 0;
+      *(mem + 4) = 0;
+      if (nclears > 6)
+        {
+          *(mem + 5) = 0;
+          *(mem + 6) = 0;
+          if (nclears > 8)
+            {
+              *(mem + 7) = 0;
+              *(mem + 8) = 0;
+            }
+        }
+    }
+}
diff --git a/sysdeps/x86_64/64/calloc-clear-memory.h b/sysdeps/x86_64/64/calloc-clear-memory.h
new file mode 100644
index 0000000000..b794a57811
--- /dev/null
+++ b/sysdeps/x86_64/64/calloc-clear-memory.h
@@ -0,0 +1,40 @@
+/* Clear a block of memory for calloc.  X86-64/LP64 version.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include
+
+_Static_assert (sizeof (INTERNAL_SIZE_T) == sizeof (long),
+                "size of INTERNAL_SIZE_T == sizeof (long)");
+
+/* Clear a memory size up to 9 * INTERNAL_SIZE_T bytes.  We know that
+   contents have an odd number of INTERNAL_SIZE_T-sized words; minimally
+   3 words.  */
+
+static __always_inline void
+clear_small_memory (INTERNAL_SIZE_T *mem, unsigned long nclears)
+{
+  __m128i zero = _mm_setzero_si128 ();
+  *(mem + 0) = 0;
+  _mm_storeu_si128 ((__m128i_u *) (mem + 1), zero);
+  _mm_storeu_si128 ((__m128i_u *) (mem + nclears - 2), zero);
+  if (nclears > 6)
+    {
+      _mm_storeu_si128 ((__m128i_u *) (mem + 3), zero);
+      _mm_storeu_si128 ((__m128i_u *) (mem + nclears - 4), zero);
+    }
+}
-- 
2.47.0