From patchwork Thu Jun 27 15:58:45 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Florian Weimer
X-Patchwork-Id: 92968
From: Florian Weimer
To: libc-alpha@sourceware.org
Subject: [PATCH] manual: Document a GNU extension for strncmp/wcsncmp
Date: Thu, 27 Jun 2024 17:58:45 +0200
Message-ID: <874j9eml6y.fsf@oldenburg.str.redhat.com>
User-Agent: Gnus/5.13 (Gnus v5.13)

At least strncmp is widely used for string prefix checking, so add
some language to make this valid.  Add tests to show that glibc
implements this extension.

This should probably go in after the strnlen/wcsnlen GNU extension.

Tested on aarch64-linux-gnu (Neoverse-V2), i686-linux-gnu (Zen 4),
powerpc64le-linux-gnu (POWER10), x86_64-linux-gnu (Zen 4).  On
s390x-linux-gnu (z16), the new wcsncmp test fails due to bug 31934.
(There could be further issues because the test crashes rather early.)

---
 manual/string.texi        |  36 ++++++++-
 string/Makefile           |   1 +
 string/test-Xncmp-gnu.c   | 183 ++++++++++++++++++++++++++++++++++++++++++++++
 string/test-strncmp-gnu.c |   4 +
 wcsmbs/Makefile           |   1 +
 wcsmbs/test-wcsncmp-gnu.c |   5 ++
 6 files changed, 226 insertions(+), 4 deletions(-)

base-commit: 21738846a19eb4a36981efd37d9ee7cb6d687494

diff --git a/manual/string.texi b/manual/string.texi
index 0b667bd3fb..ecd3c66d43 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -1234,6 +1234,12 @@ char} objects, then promoted to @code{int}).
 
 If the contents of the two blocks are equal, @code{memcmp} returns
 @code{0}.
+
+Note that @code{memcmp} requires objects of at least @var{size} bytes at
+@var{a1} and @var{a2}.  The implementation does not necessarily stop
+processing after the first byte difference.  Use @code{strcmp} to
+compare a string with a string literal, and use the GNU extension of
+@code{strncmp} to check if a string has a given prefix.
 @end deftypefun
 
 @deftypefun int wmemcmp (const wchar_t *@var{a1}, const wchar_t *@var{a2}, size_t @var{size})
@@ -1247,6 +1253,13 @@ smaller or larger than the corresponding wide character in @var{a2}.
 
 If the contents of the two blocks are equal, @code{wmemcmp} returns
 @code{0}.
+
+Note that @code{wmemcmp} requires that @var{size} wide characters are
+available starting at @var{a1} and @var{a2}.  The implementation does
+not necessarily stop processing after the first difference encountered.
+Use @code{wcscmp} to compare a wide string with a wide string literal,
+and use the GNU extension of @code{wcsncmp} to check if a string has a
+given prefix.
 @end deftypefun
 
 On arbitrary arrays, the @code{memcmp} function is mostly useful for
@@ -1367,15 +1380,30 @@ This function is the similar to @code{strcmp}, except that no more than
 @var{size} bytes are compared.  In other words, if the two
 strings are the same in their first @var{size} bytes, the
 return value is zero.
+
+As a GNU extension, the pointer arguments do not need to point to arrays
+of at least @var{size} elements in some cases.  For example, for
+null-terminated strings @var{s1} and @var{s2}, the expression
+@code{strncmp (@var{s1}, @var{s2}, strlen (@var{s2})) == 0} is true if
+and only if the string @var{s2} is a prefix of the string @var{s1}.
+More generally, in the GNU version, @code{strncmp (@var{s1}, @var{s2},
+@var{size})} is valid if both @code{strnlen (@var{s1}, @var{size})} and
+@code{strnlen (@var{s2}, @var{size})} are valid.  In the prefix checking
+idiom, note that this still requires that @var{s1} is a null-terminated
+string if there are fewer than @var{size} array elements starting at
+@var{s1}.
 @end deftypefun
 
 @deftypefun int wcsncmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{size})
 @standards{ISO, wchar.h}
 @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
-This function is similar to @code{wcscmp}, except that no more than
-@var{size} wide characters are compared.  In other words, if the two
-strings are the same in their first @var{size} wide characters, the
-return value is zero.
+This function is similar to @code{strncmp}, except that it operates
+on wide characters instead of bytes.  At most @var{size} wide characters
+are compared.
+
+As a GNU extension, @code{wcsncmp (@var{ws1}, @var{ws2}, @var{size})} is
+valid if both @code{wcsnlen (@var{ws1}, @var{size})} and @code{wcsnlen
+(@var{ws2}, @var{size})} are valid.
 @end deftypefun
 
 @deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n})
diff --git a/string/Makefile b/string/Makefile
index 8f31fa49e6..ad98d06391 100644
--- a/string/Makefile
+++ b/string/Makefile
@@ -181,6 +181,7 @@ tests := \
   test-strncasecmp \
   test-strncat \
   test-strncmp \
+  test-strncmp-gnu \
   test-strncpy \
   test-strndup \
   test-strnlen \
diff --git a/string/test-Xncmp-gnu.c b/string/test-Xncmp-gnu.c
new file mode 100644
index 0000000000..9dc1ecca3c
--- /dev/null
+++ b/string/test-Xncmp-gnu.c
@@ -0,0 +1,183 @@
+/* Test GNU extension for non-array inputs to string comparison functions.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+/* This skeleton file is included from string/test-strncmp-gnu.c and
+   wcsmbs/test-wcsncmp-gnu.c to test that reading of the arrays stops
+   at the first null character.
+
+   TEST_IDENTIFIER must be the test function identifier.  TEST_NAME is
+   the same as a string.
+
+   CHAR must be defined as the character type.  */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/* Much shorter than test-Xnlen-gnu.c because of deeply nested loops.  */
+enum { buffer_length = 80 };
+
+/* The test buffer layout follows what is described in test-Xnlen-gnu.c,
+   except that there are two buffers, left and right.  The variables
+   a_count, zero_count, start_offset are all duplicated.  */
+
+/* Return the maximum string length for a string that starts at
+   start_offset.  */
+static int
+string_length (int a_count, int start_offset)
+{
+  if (start_offset == buffer_length || start_offset >= a_count)
+    return 0;
+  else
+    return a_count - start_offset;
+}
+
+/* This is the valid maximum length argument computation for
+   strnlen/wcsnlen.  See test-Xnlen-gnu.c.  */
+static int
+maximum_length (int start_offset, int zero_count)
+{
+  if (start_offset == buffer_length)
+    return 0;
+  else if (zero_count > 0)
+    /* Effectively unbounded, but we need to stop fairly low,
+       otherwise testing takes too long.  */
+    return buffer_length + 32;
+  else
+    return buffer_length - start_offset;
+}
+
+typedef __typeof (TEST_IDENTIFIER) *proto_t;
+
+#define TEST_MAIN
+#include "test-string.h"
+
+IMPL (TEST_IDENTIFIER, 1)
+
+static int
+test_main (void)
+{
+  TEST_VERIFY_EXIT (sysconf (_SC_PAGESIZE) >= buffer_length);
+  test_init ();
+
+  struct support_next_to_fault left_ntf
+    = support_next_to_fault_allocate (buffer_length * sizeof (CHAR));
+  CHAR *left_buffer = (CHAR *) left_ntf.buffer;
+  struct support_next_to_fault right_ntf
+    = support_next_to_fault_allocate (buffer_length * sizeof (CHAR));
+  CHAR *right_buffer = (CHAR *) right_ntf.buffer;
+
+  FOR_EACH_IMPL (impl, 0)
+    {
+      printf ("info: testing %s\n", impl->name);
+      for (size_t i = 0; i < buffer_length; ++i)
+        left_buffer[i] = 'A';
+
+      for (int left_zero_count = 0; left_zero_count <= buffer_length;
+           ++left_zero_count)
+        {
+          if (left_zero_count > 0)
+            left_buffer[buffer_length - left_zero_count] = 0;
+          int left_a_count = buffer_length - left_zero_count;
+          for (size_t i = 0; i < buffer_length; ++i)
+            right_buffer[i] = 'A';
+          for (int right_zero_count = 0; right_zero_count <= buffer_length;
+               ++right_zero_count)
+            {
+              if (right_zero_count > 0)
+                right_buffer[buffer_length - right_zero_count] = 0;
+              int right_a_count = buffer_length - right_zero_count;
+              for (int left_start_offset = 0;
+                   left_start_offset <= buffer_length;
+                   ++left_start_offset)
+                {
+                  CHAR *left_start_pointer = left_buffer + left_start_offset;
+                  int left_maxlen
+                    = maximum_length (left_start_offset, left_zero_count);
+                  int left_length
+                    = string_length (left_a_count, left_start_offset);
+                  for (int right_start_offset = 0;
+                       right_start_offset <= buffer_length;
+                       ++right_start_offset)
+                    {
+                      CHAR *right_start_pointer
+                        = right_buffer + right_start_offset;
+                      int right_maxlen
+                        = maximum_length (right_start_offset, right_zero_count);
+                      int right_length
+                        = string_length (right_a_count, right_start_offset);
+
+                      /* Maximum length is modelled after strnlen/wcsnlen,
+                         and must be valid for both pointer arguments at
+                         the same time.  */
+                      int maxlen = MIN (left_maxlen, right_maxlen);
+
+                      for (int length_argument = 0; length_argument <= maxlen;
+                           ++length_argument)
+                        {
+                          if (test_verbose)
+                            {
+                              printf ("left: zero_count=%d"
+                                      " a_count=%d start_offset=%d\n",
+                                      left_zero_count, left_a_count,
+                                      left_start_offset);
+                              printf ("right: zero_count=%d"
+                                      " a_count=%d start_offset=%d\n",
+                                      right_zero_count, right_a_count,
+                                      right_start_offset);
+                              printf ("length argument: %d\n",
+                                      length_argument);
+                            }
+
+                          /* Effective lengths bounded by length argument.
+                             The effective length determines the
+                             outcome of the comparison.  */
+                          int left_effective
+                            = MIN (left_length, length_argument);
+                          int right_effective
+                            = MIN (right_length, length_argument);
+                          if (left_effective == right_effective)
+                            TEST_COMPARE (CALL (impl,
+                                                left_start_pointer,
+                                                right_start_pointer,
+                                                length_argument), 0);
+                          else if (left_effective < right_effective)
+                            TEST_COMPARE (CALL (impl,
+                                                left_start_pointer,
+                                                right_start_pointer,
+                                                length_argument) < 0, 1);
+                          else
+                            TEST_COMPARE (CALL (impl,
+                                                left_start_pointer,
+                                                right_start_pointer,
+                                                length_argument) > 0, 1);
+                        }
+                    }
+                }
+            }
+        }
+    }
+
+  return 0;
+}
+
+#include
diff --git a/string/test-strncmp-gnu.c b/string/test-strncmp-gnu.c
new file mode 100644
index 0000000000..0652145caa
--- /dev/null
+++ b/string/test-strncmp-gnu.c
@@ -0,0 +1,4 @@
+#define TEST_IDENTIFIER strncmp
+#define TEST_NAME "strncmp"
+typedef char CHAR;
+#include "test-Xncmp-gnu.c"
diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
index 1cddd8cc6d..884b9ce8b7 100644
--- a/wcsmbs/Makefile
+++ b/wcsmbs/Makefile
@@ -158,6 +158,7 @@ tests := \
   test-wcslen \
   test-wcsncat \
   test-wcsncmp \
+  test-wcsncmp-gnu \
   test-wcsncpy \
   test-wcsnlen \
   test-wcspbrk \
diff --git a/wcsmbs/test-wcsncmp-gnu.c b/wcsmbs/test-wcsncmp-gnu.c
new file mode 100644
index 0000000000..6d085d300b
--- /dev/null
+++ b/wcsmbs/test-wcsncmp-gnu.c
@@ -0,0 +1,5 @@
+#include
+#define TEST_IDENTIFIER wcsncmp
+#define TEST_NAME "wcsncmp"
+typedef wchar_t CHAR;
+#include "../string/test-Xncmp-gnu.c"
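
For illustration only, the prefix-checking idiom that the manual text
documents could be wrapped roughly as follows.  This is a minimal sketch
and not part of the patch: the helper names has_prefix and
has_wide_prefix are hypothetical, and both arguments are assumed to be
null-terminated strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of glibc or of this patch.  Return
   true if PREFIX is a prefix of S.  Both arguments are assumed to be
   null-terminated strings; S may be shorter than PREFIX.  */
bool
has_prefix (const char *s, const char *prefix)
{
  assert (s != NULL && prefix != NULL);
  /* The idiom from the manual text: compare at most strlen (prefix)
     bytes.  The comparison stops at the first null byte in S.  */
  return strncmp (s, prefix, strlen (prefix)) == 0;
}

/* Wide-character variant of the same hypothetical helper, using
   wcsncmp and wcslen.  */
bool
has_wide_prefix (const wchar_t *ws, const wchar_t *prefix)
{
  assert (ws != NULL && prefix != NULL);
  return wcsncmp (ws, prefix, wcslen (prefix)) == 0;
}
```

A call such as has_prefix ("f", "foo") passes a first argument with
fewer than strlen (prefix) array elements, which appears to be the
situation the documented extension is meant to address.  The empty
string is a prefix of every string, so has_prefix (s, "") is true.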