From patchwork Fri Sep 20 12:44:37 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joe Ramsay
X-Patchwork-Id: 97757
From: Joe Ramsay
To:
CC: Joe Ramsay
Subject: [PATCH] aarch64: Simplify rounding-multiply pattern in several AdvSIMD routines
Date: Fri, 20 Sep 2024 13:44:37 +0100
Message-ID: <20240920124437.1908340-5-Joe.Ramsay@arm.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20240920124437.1908340-1-Joe.Ramsay@arm.com>
References: <20240920124437.1908340-1-Joe.Ramsay@arm.com>
X-BeenThere: libc-alpha@sourceware.org
Precedence: list
List-Id: Libc-alpha mailing list

This operation can be simplified to use a simpler multiply-round-convert
sequence, which uses fewer instructions and constants.
---
OK for master? If so, please commit for me as I don't have commit rights.
Thanks,
Joe

 sysdeps/aarch64/fpu/cos_advsimd.c  | 11 ++++-------
 sysdeps/aarch64/fpu/cosf_advsimd.c |  9 +++------
 sysdeps/aarch64/fpu/expf_advsimd.c | 10 ++++------
 sysdeps/aarch64/fpu/sin_advsimd.c  | 16 ++++++++--------
 sysdeps/aarch64/fpu/sinf_advsimd.c | 22 +++++++++++-----------
 5 files changed, 30 insertions(+), 38 deletions(-)

diff --git a/sysdeps/aarch64/fpu/cos_advsimd.c b/sysdeps/aarch64/fpu/cos_advsimd.c
index 3924c9ce44..11a89b1530 100644
--- a/sysdeps/aarch64/fpu/cos_advsimd.c
+++ b/sysdeps/aarch64/fpu/cos_advsimd.c
@@ -22,7 +22,7 @@
 static const struct data
 {
   float64x2_t poly[7];
-  float64x2_t range_val, shift, inv_pi, half_pi, pi_1, pi_2, pi_3;
+  float64x2_t range_val, inv_pi, pi_1, pi_2, pi_3;
 } data = {
   /* Worst-case error is 3.3 ulp in [-pi/2, pi/2].  */
   .poly = { V2 (-0x1.555555555547bp-3), V2 (0x1.1111111108a4dp-7),
@@ -30,11 +30,9 @@ static const struct data
             V2 (-0x1.ae633919987c6p-26), V2 (0x1.60e277ae07cecp-33),
             V2 (-0x1.9e9540300a1p-41) },
   .inv_pi = V2 (0x1.45f306dc9c883p-2),
-  .half_pi = V2 (0x1.921fb54442d18p+0),
   .pi_1 = V2 (0x1.921fb54442d18p+1),
   .pi_2 = V2 (0x1.1a62633145c06p-53),
   .pi_3 = V2 (0x1.c1cd129024e09p-106),
-  .shift = V2 (0x1.8p52),
   .range_val = V2 (0x1p23)
 };
 
@@ -68,10 +66,9 @@ float64x2_t VPCS_ATTR V_NAME_D1 (cos) (float64x2_t x)
 #endif
 
   /* n = rint((|x|+pi/2)/pi) - 0.5.  */
-  n = vfmaq_f64 (d->shift, d->inv_pi, vaddq_f64 (r, d->half_pi));
-  odd = vshlq_n_u64 (vreinterpretq_u64_f64 (n), 63);
-  n = vsubq_f64 (n, d->shift);
-  n = vsubq_f64 (n, v_f64 (0.5));
+  n = vrndaq_f64 (vfmaq_f64 (v_f64 (0.5), r, d->inv_pi));
+  odd = vshlq_n_u64 (vreinterpretq_u64_s64 (vcvtq_s64_f64 (n)), 63);
+  n = vsubq_f64 (n, v_f64 (0.5f));
 
   /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2).  */
   r = vfmsq_f64 (r, d->pi_1, n);
diff --git a/sysdeps/aarch64/fpu/cosf_advsimd.c b/sysdeps/aarch64/fpu/cosf_advsimd.c
index d0c285b03a..85a1b37373 100644
--- a/sysdeps/aarch64/fpu/cosf_advsimd.c
+++ b/sysdeps/aarch64/fpu/cosf_advsimd.c
@@ -22,7 +22,7 @@
 static const struct data
 {
   float32x4_t poly[4];
-  float32x4_t range_val, inv_pi, half_pi, shift, pi_1, pi_2, pi_3;
+  float32x4_t range_val, inv_pi, pi_1, pi_2, pi_3;
 } data = {
   /* 1.886 ulp error.  */
   .poly = { V4 (-0x1.555548p-3f), V4 (0x1.110df4p-7f), V4 (-0x1.9f42eap-13f),
@@ -33,8 +33,6 @@ static const struct data
   .pi_3 = V4 (-0x1.ee59dap-49f),
 
   .inv_pi = V4 (0x1.45f306p-2f),
-  .shift = V4 (0x1.8p+23f),
-  .half_pi = V4 (0x1.921fb6p0f),
   .range_val = V4 (0x1p20f)
 };
 
@@ -69,9 +67,8 @@ float32x4_t VPCS_ATTR NOINLINE V_NAME_F1 (cos) (float32x4_t x)
 #endif
 
   /* n = rint((|x|+pi/2)/pi) - 0.5.  */
-  n = vfmaq_f32 (d->shift, d->inv_pi, vaddq_f32 (r, d->half_pi));
-  odd = vshlq_n_u32 (vreinterpretq_u32_f32 (n), 31);
-  n = vsubq_f32 (n, d->shift);
+  n = vrndaq_f32 (vfmaq_f32 (v_f32 (0.5), r, d->inv_pi));
+  odd = vshlq_n_u32 (vreinterpretq_u32_s32 (vcvtq_s32_f32 (n)), 31);
   n = vsubq_f32 (n, v_f32 (0.5f));
 
   /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2).  */
diff --git a/sysdeps/aarch64/fpu/expf_advsimd.c b/sysdeps/aarch64/fpu/expf_advsimd.c
index 99d2e647aa..5c9cb72620 100644
--- a/sysdeps/aarch64/fpu/expf_advsimd.c
+++ b/sysdeps/aarch64/fpu/expf_advsimd.c
@@ -22,7 +22,7 @@
 static const struct data
 {
   float32x4_t poly[5];
-  float32x4_t shift, inv_ln2, ln2_hi, ln2_lo;
+  float32x4_t inv_ln2, ln2_hi, ln2_lo;
   uint32x4_t exponent_bias;
 #if !WANT_SIMD_EXCEPT
   float32x4_t special_bound, scale_thresh;
@@ -31,7 +31,6 @@ static const struct data
   /* maxerr: 1.45358 +0.5 ulp.  */
   .poly = { V4 (0x1.0e4020p-7f), V4 (0x1.573e2ep-5f), V4 (0x1.555e66p-3f),
             V4 (0x1.fffdb6p-2f), V4 (0x1.ffffecp-1f) },
-  .shift = V4 (0x1.8p23f),
   .inv_ln2 = V4 (0x1.715476p+0f),
   .ln2_hi = V4 (0x1.62e4p-1f),
   .ln2_lo = V4 (0x1.7f7d1cp-20f),
@@ -85,7 +84,7 @@ special_case (float32x4_t poly, float32x4_t n, uint32x4_t e, uint32x4_t cmp1,
 float32x4_t VPCS_ATTR NOINLINE V_NAME_F1 (exp) (float32x4_t x)
 {
   const struct data *d = ptr_barrier (&data);
-  float32x4_t n, r, r2, scale, p, q, poly, z;
+  float32x4_t n, r, r2, scale, p, q, poly;
   uint32x4_t cmp, e;
 
 #if WANT_SIMD_EXCEPT
@@ -104,11 +103,10 @@ float32x4_t VPCS_ATTR NOINLINE V_NAME_F1 (exp) (float32x4_t x)
 
   /* exp(x) = 2^n (1 + poly(r)), with 1 + poly(r) in [1/sqrt(2),sqrt(2)]
      x = ln2*n + r, with r in [-ln2/2, ln2/2].  */
-  z = vfmaq_f32 (d->shift, x, d->inv_ln2);
-  n = vsubq_f32 (z, d->shift);
+  n = vrndaq_f32 (vmulq_f32 (x, d->inv_ln2));
   r = vfmsq_f32 (x, n, d->ln2_hi);
   r = vfmsq_f32 (r, n, d->ln2_lo);
-  e = vshlq_n_u32 (vreinterpretq_u32_f32 (z), 23);
+  e = vshlq_n_u32 (vreinterpretq_u32_s32 (vcvtq_s32_f32 (n)), 23);
   scale = vreinterpretq_f32_u32 (vaddq_u32 (e, d->exponent_bias));
 
 #if !WANT_SIMD_EXCEPT
diff --git a/sysdeps/aarch64/fpu/sin_advsimd.c b/sysdeps/aarch64/fpu/sin_advsimd.c
index a0d9d3b819..718125cbad 100644
--- a/sysdeps/aarch64/fpu/sin_advsimd.c
+++ b/sysdeps/aarch64/fpu/sin_advsimd.c
@@ -22,7 +22,7 @@
 static const struct data
 {
   float64x2_t poly[7];
-  float64x2_t range_val, inv_pi, shift, pi_1, pi_2, pi_3;
+  float64x2_t range_val, inv_pi, pi_1, pi_2, pi_3;
 } data = {
   .poly = { V2 (-0x1.555555555547bp-3), V2 (0x1.1111111108a4dp-7),
             V2 (-0x1.a01a019936f27p-13), V2 (0x1.71de37a97d93ep-19),
@@ -34,12 +34,13 @@ static const struct data
   .pi_1 = V2 (0x1.921fb54442d18p+1),
   .pi_2 = V2 (0x1.1a62633145c06p-53),
   .pi_3 = V2 (0x1.c1cd129024e09p-106),
-  .shift = V2 (0x1.8p52),
 };
 
 #if WANT_SIMD_EXCEPT
-# define TinyBound v_u64 (0x3000000000000000) /* asuint64 (0x1p-255).  */
-# define Thresh v_u64 (0x1160000000000000)    /* RangeVal - TinyBound.  */
+/* asuint64(0x1p-253)), below which multiply by inv_pi underflows.  */
+# define TinyBound v_u64 (0x3020000000000000)
+/* RangeVal - TinyBound.  */
+# define Thresh v_u64 (0x1160000000000000)
 #endif
 
 #define C(i) d->poly[i]
@@ -72,16 +73,15 @@ float64x2_t VPCS_ATTR V_NAME_D1 (sin) (float64x2_t x)
      fenv).  These lanes will be fixed by special-case handler later.  */
   uint64x2_t ir = vreinterpretq_u64_f64 (vabsq_f64 (x));
   cmp = vcgeq_u64 (vsubq_u64 (ir, TinyBound), Thresh);
-  r = vbslq_f64 (cmp, vreinterpretq_f64_u64 (cmp), x);
+  r = vreinterpretq_f64_u64 (vbicq_u64 (vreinterpretq_u64_f64 (x), cmp));
 #else
   r = x;
   cmp = vcageq_f64 (x, d->range_val);
 #endif
 
   /* n = rint(|x|/pi).  */
-  n = vfmaq_f64 (d->shift, d->inv_pi, r);
-  odd = vshlq_n_u64 (vreinterpretq_u64_f64 (n), 63);
-  n = vsubq_f64 (n, d->shift);
+  n = vrndaq_f64 (vmulq_f64 (r, d->inv_pi));
+  odd = vshlq_n_u64 (vreinterpretq_u64_s64 (vcvtq_s64_f64 (n)), 63);
 
   /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2).  */
   r = vfmsq_f64 (r, d->pi_1, n);
diff --git a/sysdeps/aarch64/fpu/sinf_advsimd.c b/sysdeps/aarch64/fpu/sinf_advsimd.c
index 375dfc3331..6ee9a23d5b 100644
--- a/sysdeps/aarch64/fpu/sinf_advsimd.c
+++ b/sysdeps/aarch64/fpu/sinf_advsimd.c
@@ -22,7 +22,7 @@
 static const struct data
 {
   float32x4_t poly[4];
-  float32x4_t range_val, inv_pi, shift, pi_1, pi_2, pi_3;
+  float32x4_t range_val, inv_pi, pi_1, pi_2, pi_3;
 } data = {
   /* 1.886 ulp error.  */
   .poly = { V4 (-0x1.555548p-3f), V4 (0x1.110df4p-7f), V4 (-0x1.9f42eap-13f),
@@ -33,13 +33,14 @@ static const struct data
   .pi_3 = V4 (-0x1.ee59dap-49f),
 
   .inv_pi = V4 (0x1.45f306p-2f),
-  .shift = V4 (0x1.8p+23f),
   .range_val = V4 (0x1p20f)
 };
 
 #if WANT_SIMD_EXCEPT
-# define TinyBound v_u32 (0x21000000) /* asuint32(0x1p-61f).  */
-# define Thresh v_u32 (0x28800000)    /* RangeVal - TinyBound.  */
+/* asuint32(0x1p-59f), below which multiply by inv_pi underflows.  */
+# define TinyBound v_u32 (0x22000000)
+/* RangeVal - TinyBound.  */
+# define Thresh v_u32 (0x27800000)
 #endif
 
 #define C(i) d->poly[i]
@@ -64,23 +65,22 @@ float32x4_t VPCS_ATTR NOINLINE V_NAME_F1 (sin) (float32x4_t x)
   /* If fenv exceptions are to be triggered correctly, set any special lanes
      to 1 (which is neutral w.r.t. fenv).  These lanes will be fixed by
      special-case handler later.  */
-  r = vbslq_f32 (cmp, vreinterpretq_f32_u32 (cmp), x);
+  r = vreinterpretq_f32_u32 (vbicq_u32 (vreinterpretq_u32_f32 (x), cmp));
 #else
   r = x;
   cmp = vcageq_f32 (x, d->range_val);
 #endif
 
-  /* n = rint(|x|/pi) */
-  n = vfmaq_f32 (d->shift, d->inv_pi, r);
-  odd = vshlq_n_u32 (vreinterpretq_u32_f32 (n), 31);
-  n = vsubq_f32 (n, d->shift);
+  /* n = rint(|x|/pi).  */
+  n = vrndaq_f32 (vmulq_f32 (r, d->inv_pi));
+  odd = vshlq_n_u32 (vreinterpretq_u32_s32 (vcvtq_s32_f32 (n)), 31);
 
-  /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2) */
+  /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2).  */
   r = vfmsq_f32 (r, d->pi_1, n);
   r = vfmsq_f32 (r, d->pi_2, n);
   r = vfmsq_f32 (r, d->pi_3, n);
 
-  /* y = sin(r) */
+  /* y = sin(r).  */
   r2 = vmulq_f32 (r, r);
   y = vfmaq_f32 (C (2), C (3), r2);
   y = vfmaq_f32 (C (1), y, r2);
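
For readers comparing the two idioms, here is a scalar C model of what the hunks above change. It is only a sketch: the names asuint, round_with_shift, round_and_convert and idioms_agree are hypothetical and do not appear in glibc, the old form is written as a plain multiply and add where the patch context uses a fused vfmaq, and the ties-away rounding is emulated in double instead of using the vrndaq/vcvtq intrinsics.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: view the bits of a float as an unsigned integer.  */
static uint32_t
asuint (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Old idiom (scalar model).  Adding 0x1.8p23f moves the product into a
   range where floats are spaced 1 apart, so the sum is rounded to an
   integer and that integer sits in the low mantissa bits of Z.  */
static float
round_with_shift (float x, float scale, uint32_t *low_bits)
{
  const float shift = 0x1.8p23f;
  volatile float z = x * scale + shift; /* volatile: keep float precision.  */
  *low_bits = asuint (z);
  return z - shift;
}

/* New idiom (scalar model).  Multiply, round to nearest with ties away
   from zero, then convert to an integer explicitly.  The rounding is done
   in double so that adding 0.5 is exact; this is an assumption standing in
   for vrndaq_f32 followed by vcvtq_s32_f32, valid for |x * scale| < 2^31.  */
static float
round_and_convert (float x, float scale, uint32_t *low_bits)
{
  double v = (double) (x * scale);
  int32_t n = (int32_t) (v < 0 ? v - 0.5 : v + 0.5);
  *low_bits = (uint32_t) n;
  return (float) n;
}

/* Return 1 if both idioms give the same N and the same exponent bits
   (the integer shifted left by 23, as in the expf hunk).  */
static int
idioms_agree (float x, float scale)
{
  uint32_t a, b;
  float na = round_with_shift (x, scale, &a);
  float nb = round_and_convert (x, scale, &b);
  return na == nb && (a << 23) == (b << 23);
}
```

Away from a rounding tie, for example idioms_agree (1.0f, 0x1.715476p+0f), both forms give the same n and the same shifted bits in this model. On an exact tie such as idioms_agree (2.5f, 1.0f) they differ, because the add-and-subtract form rounds ties to even in the default rounding mode while the ties-away form rounds 2.5 up to 3.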