From patchwork Wed Oct 30 19:40:01 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Law
X-Patchwork-Id: 99847
Date: Wed, 30 Oct 2024 13:40:01 -0600
From: Jeff Law
Subject: risc-v: Enable vectorized memset via ifunc
To: GNU C Library
List-Id: Libc-alpha mailing list

This patch adds the ability for glibc to select a vectorized memset
implementation for RISC-V using the ifunc/hwprobe mechanism.

The ifunc/hwprobe side is quite simple. We call hwprobe with
RISCV_HWPROBE_KEY_IMA_EXT_0 and check the returned value for
RISCV_HWPROBE_IMA_V to see if vector is enabled. If so, we use the
vector memset implementation; otherwise we fall back to the generic
memset implementation.

The guts of the memset itself are quite simple as well and represent a
generic vector implementation for RISC-V. In particular we use a
VLA-style loop where each iteration handles as much data as the CPU core
indicates it can reasonably handle. On something like the spacemit
design we can handle up to 256 bytes of data per loop iteration (256-bit
vector length * LMUL=8).

This has been tested with the glibc testsuite, on kernels with and
without hwprobe support. On kernels without hwprobe support we naturally
only use memset_generic.

The memset implementation is originally from Hau Hsu (SiFive), posted to
libc-alpha back in May 2023. Sergei from Rivos posted an alternate, more
complex implementation back in Feb 2023. I took the simpler
implementation largely because it included performance data.
Sergei's could well be better, but that likely depends on uarch details
such as branch predictor quality, since Sergei's has multiple conditional
branches to select between a few variants.

I've got several more of these routines queued up that I'll submit once
we're acked on memset. Any feedback on memset will be incorporated into
the other routines.

OK for the trunk?

Thanks,
Jeff

From fd42d5ac491308b2ffa955ea11b4843813387510 Mon Sep 17 00:00:00 2001
From: Hau Hsu
Date: Wed, 30 Oct 2024 11:10:25 -0600
Subject: [PATCH] risc-v: Enable vectorized memset via ifunc

This patch adds the ability for glibc to select a vectorized memset
implementation for RISC-V using the ifunc/hwprobe mechanism.

The ifunc/hwprobe side is quite simple. We call hwprobe with
RISCV_HWPROBE_KEY_IMA_EXT_0 and check the returned value for
RISCV_HWPROBE_IMA_V to see if vector is enabled. If so, we use the
vector memset implementation; otherwise we fall back to the generic
memset implementation.

The guts of the memset itself are quite simple as well and represent a
generic vector implementation for RISC-V. In particular we use a
VLA-style loop where each iteration handles as much data as the CPU core
indicates it can reasonably handle. On something like the spacemit
design we can handle up to 256 bytes of data per loop iteration (256-bit
vector length * LMUL=8).

This has been tested with the glibc testsuite, on kernels with and
without hwprobe support. On kernels without hwprobe support we naturally
only use memset_generic.

The memset implementation is originally from Hau Hsu (SiFive), posted to
libc-alpha back in May 2023. Sergei from Rivos posted an alternate, more
complex implementation back in Feb 2023. I took the simpler
implementation largely because it included performance data. Sergei's
could well be better, but that likely depends on uarch details such as
branch predictor quality, since Sergei's has multiple conditional
branches to select between a few variants.

I've got several more of these routines queued up that I'll submit once
we're acked on memset. Any feedback on memset will be incorporated into
the other routines.

Co-authored-by: Jerry Shih
Co-authored-by: Jeff Law
---
 sysdeps/riscv/multiarch/memset-generic.c      | 26 +++++++++
 sysdeps/riscv/multiarch/memset_vector.S       | 51 ++++++++++++++++
 .../unix/sysv/linux/riscv/multiarch/Makefile  |  3 +
 .../linux/riscv/multiarch/ifunc-impl-list.c   | 12 ++++
 .../unix/sysv/linux/riscv/multiarch/memset.c  | 58 +++++++++++++++++++
 5 files changed, 150 insertions(+)
 create mode 100644 sysdeps/riscv/multiarch/memset-generic.c
 create mode 100644 sysdeps/riscv/multiarch/memset_vector.S
 create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/memset.c

diff --git a/sysdeps/riscv/multiarch/memset-generic.c b/sysdeps/riscv/multiarch/memset-generic.c
new file mode 100644
index 0000000000..864e692077
--- /dev/null
+++ b/sysdeps/riscv/multiarch/memset-generic.c
@@ -0,0 +1,26 @@
+/* Re-include the default memset implementation.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+#include
+
+#if IS_IN(libc)
+# define MEMSET __memset_generic
+# undef libc_hidden_builtin_def
+# define libc_hidden_builtin_def(x)
+#endif
+#include
diff --git a/sysdeps/riscv/multiarch/memset_vector.S b/sysdeps/riscv/multiarch/memset_vector.S
new file mode 100644
index 0000000000..36455575a4
--- /dev/null
+++ b/sysdeps/riscv/multiarch/memset_vector.S
@@ -0,0 +1,51 @@
+/* memset for RISC-V, using vectors
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library. If not, see
+   . */
+
+#include
+#include
+
+#define dst a0
+#define value a1
+#define num a2
+
+#define ivl a3
+#define dst_ptr a5
+
+#define ELEM_LMUL_SETTING m8
+#define vdata v0
+
+        .attribute unaligned_access, 1
+        .option arch, +v
+
+ENTRY(__memset_vector)
+
+        mv dst_ptr, dst
+
+        vsetvli ivl, num, e8, ELEM_LMUL_SETTING, ta, ma
+        vmv.v.x vdata, value
+
+L(loop):
+        vse8.v vdata, (dst_ptr)
+        sub num, num, ivl
+        add dst_ptr, dst_ptr, ivl
+        vsetvli ivl, num, e8, ELEM_LMUL_SETTING, ta, ma
+        bnez num, L(loop)
+
+        ret
+
+END(__memset_vector)
diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile
index fcef5659d4..de8024b86d 100644
--- a/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile
+++ b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile
@@ -3,6 +3,9 @@ sysdep_routines += \
   memcpy \
   memcpy-generic \
   memcpy_noalignment \
+  memset \
+  memset-generic \
+  memset_vector \
   # sysdep_routines
 
 CFLAGS-memcpy_noalignment.c += -mno-strict-align
diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c
index 9f806d7a9e..8e8907b40e 100644
--- a/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c
+++ b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c
@@ -27,6 +27,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
   size_t i = max;
 
   bool fast_unaligned = false;
+  bool v_ext = false;
 
   struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_CPUPERF_0 };
   if (__riscv_hwprobe (&pair, 1, 0, NULL, 0) == 0
@@ -34,10 +35,21 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
           == RISCV_HWPROBE_MISALIGNED_FAST)
     fast_unaligned = true;
 
+  pair.key = RISCV_HWPROBE_KEY_IMA_EXT_0;
+  pair.value = 0;
+  if (__riscv_hwprobe (&pair, 1, 0, NULL, 0) == 0
+      && (pair.value & RISCV_HWPROBE_IMA_V) == RISCV_HWPROBE_IMA_V)
+    v_ext = true;
+
   IFUNC_IMPL (i, name, memcpy,
               IFUNC_IMPL_ADD (array, i, memcpy, fast_unaligned,
                               __memcpy_noalignment)
               IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic))
 
+  IFUNC_IMPL (i, name, memset,
+              IFUNC_IMPL_ADD (array, i, memset, v_ext,
+                              __memset_vector)
+              IFUNC_IMPL_ADD (array, i, memset, 1, __memset_generic))
+
   return 0;
 }
diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/memset.c b/sysdeps/unix/sysv/linux/riscv/multiarch/memset.c
new file mode 100644
index 0000000000..20dba8a702
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/multiarch/memset.c
@@ -0,0 +1,58 @@
+/* Multiple versions of memset.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017-2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   . */
+
+#if IS_IN (libc)
+/* Redefine memset so that the compiler won't complain about the type
+   mismatch with the IFUNC selector in strong_alias, below. */
+# undef memset
+# define memset __redirect_memset
+# include
+# include
+# include
+# include
+# include
+
+extern __typeof (__redirect_memset) __libc_memset;
+
+extern __typeof (__redirect_memset) __memset_generic attribute_hidden;
+extern __typeof (__redirect_memset) __memset_vector attribute_hidden;
+
+static inline __typeof (__redirect_memset) *
+select_memset_ifunc (uint64_t dl_hwcap, __riscv_hwprobe_t hwprobe_func)
+{
+  unsigned long long int v;
+
+  if (__riscv_hwprobe_one (hwprobe_func, RISCV_HWPROBE_KEY_IMA_EXT_0, &v) == 0
+      && (v & RISCV_HWPROBE_IMA_V) == RISCV_HWPROBE_IMA_V)
+    return __memset_vector;
+
+  return __memset_generic;
+}
+
+riscv_libc_ifunc (__libc_memset, select_memset_ifunc);
+
+# undef memset
+strong_alias (__libc_memset, memset);
+# ifdef SHARED
+__hidden_ver1 (memset, __GI_memset, __redirect_memset)
+  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memset);
+# endif
+#else
+# include
+#endif
-- 
2.45.2
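
As a reading aid only (not part of the patch), the selector test and the
VLA loop described in the commit message can be modelled in portable C.
This is a sketch under stated assumptions: every model_* name is
hypothetical, MODEL_VLMAX_BYTES assumes the 256-byte-per-iteration case
mentioned for the spacemit design, and MODEL_HAS_V merely stands in for
the RISCV_HWPROBE_IMA_V bit. Real code gets the per-iteration length
from vsetvli and the probe value from hwprobe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: a 256-bit vector unit at LMUL=8, so at most 256 bytes
   are handled per loop iteration.  */
#define MODEL_VLMAX_BYTES 256

/* Hypothetical stand-in for the "vector present" bit in the probed
   value; not the kernel's definition.  */
#define MODEL_HAS_V (1ull << 2)

typedef void *(*memset_fn) (void *, int, size_t);

/* Stand-in for vsetvli: how many bytes the next iteration may handle.  */
static size_t
model_vsetvli (size_t remaining)
{
  return remaining < MODEL_VLMAX_BYTES ? remaining : MODEL_VLMAX_BYTES;
}

/* Same control flow as L(loop): store, advance, re-query the length,
   and repeat while bytes remain.  A zero-length call runs the body
   once with ivl == 0 and stores nothing.  */
void *
model_memset_vector (void *dst, int value, size_t num)
{
  assert (dst != NULL || num == 0);

  unsigned char *dst_ptr = dst;
  size_t ivl = model_vsetvli (num);

  do
    {
      for (size_t i = 0; i < ivl; i++)  /* models vse8.v */
        dst_ptr[i] = (unsigned char) value;
      num -= ivl;
      dst_ptr += ivl;
      ivl = model_vsetvli (num);
    }
  while (num != 0);

  return dst;                           /* a0 is left untouched */
}

/* Fallback: defer to the C library's memset.  */
void *
model_memset_generic (void *dst, int value, size_t num)
{
  return memset (dst, value, num);
}

/* Mirrors the selector's test: choose the vector routine only when
   the probed value has the V bit set.  */
memset_fn
model_select_memset (unsigned long long probed)
{
  if ((probed & MODEL_HAS_V) == MODEL_HAS_V)
    return model_memset_vector;
  return model_memset_generic;
}
```

Calling model_select_memset (MODEL_HAS_V) yields the vector model, and
any value without that bit yields the generic one. A 999-byte fill then
takes four iterations under the assumed limit (256 + 256 + 256 + 231).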