From patchwork Mon Dec  2 12:50:42 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 102219
X-Patchwork-Delegate: Wilco.Dijkstra@arm.com
References: <20241130042111.663276-1-hjl.tools@gmail.com>
 <3faf1e2b-6fd9-4e49-ad3f-a0336ed92597@linaro.org>
 <87r06qcmt2.fsf@oldenburg3.str.redhat.com>
 <10c4538d-8046-4de7-a0a5-d16edb206120@linaro.org>
In-Reply-To: <10c4538d-8046-4de7-a0a5-d16edb206120@linaro.org>
From: "H.J. Lu"
Date: Mon, 2 Dec 2024 20:50:42 +0800
Subject: [PATCH v6] malloc: Optimize small memory clearing for calloc
To: Adhemerval Zanella Netto
Cc: Florian Weimer, GNU C Library, "Guo, Wangyang", Sunil K Pandey,
 Noah Goldstein, Wilco Dijkstra

On Mon, Dec 2, 2024 at 8:17 PM Adhemerval Zanella Netto wrote:
>
>
>
> On 02/12/24 09:05, Florian Weimer wrote:
> > * Adhemerval Zanella Netto:
> >
> >>> +static __always_inline void *
> >>> +clear_memory (void *mem, unsigned long clearsize)
> >>> +{
> >>> +  /* Unroll clear memory size up to 9 * INTERNAL_SIZE_T bytes.  We know
> >>> +     that contents have an odd number of INTERNAL_SIZE_T-sized words;
> >>> +     minimally 3 words.  */
> >>> +  INTERNAL_SIZE_T *d = (INTERNAL_SIZE_T *) mem;
> >>
> >> I think this strictly UB and it might generate some issues on architecture
> >> with strict alignment requirement (like sparc and some riscv chips).  I
> >> think we will need to use either some struct helper with __attribute__((packed))
> >> or memcpy to avoid it.
> >
> > I think everything is properly aligned?  The “overlapping” comment is a
> > bit misleading, it's about multiple stores to the same locations, not
> > partially overlapping stores.
>
> So maybe then define 'mem' as 'INTERNAL_SIZE_T *'?

Fixed in the v6 patch.  I also changed "overlapping" to "repeated".

Add calloc-clear-memory.h to clear memory size up to 36 bytes (72 if
8byte sizes) for calloc.  Use repeated stores with 1 branch, instead of
up to 3 branches.  On x86-64, it is faster than memset since calling
memset needs 1 indirect branch, 1 broadcast, and up to 4 branches.

OK for master?

Thanks.

From 3c1430ad80ad0efefc6ed3b4ea992a7ea21e921e Mon Sep 17 00:00:00 2001
From: "H.J. Lu"
Date: Tue, 26 Nov 2024 16:15:25 +0800
Subject: [PATCH v6] malloc: Optimize small memory clearing for calloc

Add calloc-clear-memory.h to clear memory size up to 36 bytes (72 if
8byte sizes) for calloc.  Use repeated stores with 1 branch, instead of
up to 3 branches.  On x86-64, it is faster than memset since calling
memset needs 1 indirect branch, 1 broadcast, and up to 4 branches.

Signed-off-by: H.J. Lu
---
 malloc/malloc-internal.h              |  1 +
 malloc/malloc.c                       | 36 +-------------------
 sysdeps/generic/calloc-clear-memory.h | 47 +++++++++++++++++++++++++++
 3 files changed, 49 insertions(+), 35 deletions(-)
 create mode 100644 sysdeps/generic/calloc-clear-memory.h

diff --git a/malloc/malloc-internal.h b/malloc/malloc-internal.h
index cba03433fe..3349e2d1fe 100644
--- a/malloc/malloc-internal.h
+++ b/malloc/malloc-internal.h
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 
 /* Called in the parent process before a fork.  */
 void __malloc_fork_lock_parent (void) attribute_hidden;
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 287fa0904d..ac3901bdd5 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3755,8 +3755,6 @@ __libc_calloc (size_t n, size_t elem_size)
   INTERNAL_SIZE_T sz, oldtopsize;
   void *mem;
   unsigned long clearsize;
-  unsigned long nclears;
-  INTERNAL_SIZE_T *d;
   ptrdiff_t bytes;
 
   if (__glibc_unlikely (__builtin_mul_overflow (n, elem_size, &bytes)))
@@ -3853,40 +3851,8 @@ __libc_calloc (size_t n, size_t elem_size)
     }
 #endif
 
-  /* Unroll clear of <= 36 bytes (72 if 8byte sizes).  We know that
-     contents have an odd number of INTERNAL_SIZE_T-sized words;
-     minimally 3.  */
-  d = (INTERNAL_SIZE_T *) mem;
   clearsize = csz - SIZE_SZ;
-  nclears = clearsize / sizeof (INTERNAL_SIZE_T);
-  assert (nclears >= 3);
-
-  if (nclears > 9)
-    return memset (d, 0, clearsize);
-
-  else
-    {
-      *(d + 0) = 0;
-      *(d + 1) = 0;
-      *(d + 2) = 0;
-      if (nclears > 4)
-        {
-          *(d + 3) = 0;
-          *(d + 4) = 0;
-          if (nclears > 6)
-            {
-              *(d + 5) = 0;
-              *(d + 6) = 0;
-              if (nclears > 8)
-                {
-                  *(d + 7) = 0;
-                  *(d + 8) = 0;
-                }
-            }
-        }
-    }
-
-  return mem;
+  return clear_memory ((INTERNAL_SIZE_T *) mem, clearsize);
 }
 
 #endif /* IS_IN (libc) */
diff --git a/sysdeps/generic/calloc-clear-memory.h b/sysdeps/generic/calloc-clear-memory.h
new file mode 100644
index 0000000000..e6f11d2e76
--- /dev/null
+++ b/sysdeps/generic/calloc-clear-memory.h
@@ -0,0 +1,47 @@
+/* Clear a block of memory for calloc.  Generic version.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+static __always_inline void *
+clear_memory (INTERNAL_SIZE_T *d, unsigned long clearsize)
+{
+  /* Unroll clear memory size up to 9 * INTERNAL_SIZE_T bytes.  We know
+     that contents have an odd number of INTERNAL_SIZE_T-sized words;
+     minimally 3 words.  */
+  unsigned long nclears = clearsize / sizeof (INTERNAL_SIZE_T);
+
+  if (nclears > 9)
+    return memset (d, 0, clearsize);
+
+  /* Use repeated stores with 1 branch, instead of up to 3.  */
+  *(d + 0) = 0;
+  *(d + 1) = 0;
+  *(d + 2) = 0;
+  *(d + nclears - 2) = 0;
+  *(d + nclears - 2 + 1) = 0;
+  if (nclears > 5)
+    {
+      *(d + 3) = 0;
+      *(d + 3 + 1) = 0;
+      *(d + nclears - 4) = 0;
+      *(d + nclears - 4 + 1) = 0;
+    }
+  else if (nclears < 3)
+    __builtin_unreachable ();
+
+  return d;
+}
-- 
2.47.1