Date: Fri, 27 Feb 2026 18:30:30 -0800 (PST)
To: General Cygwin discussions and problem reports <cygwin@cygwin.com>
Cc: Brian Inglis <Brian.Inglis@SystematicSW.ab.ca>
Message-ID: <103536920.1558501.1772245830440@connect.xfinity.com>
In-Reply-To: <45c133f7-8285-4cb3-9701-2642cb76ab37@SystematicSW.ab.ca>
References: <547312365.1464244.1771958282029@connect.xfinity.com>
 <aZ7PrbisVR1R4A7v@dimstar.local.net>
 <1670201592.1489273.1772043520008@connect.xfinity.com>
 <e91d8b5b-2690-4271-aa74-e6226440e33d@SystematicSW.ab.ca>
 <1044918836.1507810.1772086967212@connect.xfinity.com>
 <1579472684.1508349.1772092747339@connect.xfinity.com>
 <aaABFf5iEowV1l7I@xps13>
 <1148572549.1808180.1772097444036@mail.yahoo.com>
 <1901597260.1508573.1772100378936@connect.xfinity.com>
 <0C965DD0-856E-41FF-B5A4-15E472292A32@unified-streaming.com>
 <483908609.1508714.1772103775739@connect.xfinity.com>
 <2346fd41-2500-0db6-5849-6788174b5a1d@cs.umass.edu>
 <1462848037.1521935.1772136952077@connect.xfinity.com>
 <399745a1-429a-ebb4-0f67-c32f6282caa6@cs.umass.edu>
 <1093316506.1533154.1772157883568@connect.xfinity.com>
 <3e0de899-a7dd-8fea-7743-10e6b05cc6b6@cs.umass.edu>
 <1990836634.1545853.1772216419837@connect.xfinity.com>
 <45c133f7-8285-4cb3-9701-2642cb76ab37@SystematicSW.ab.ca>
Subject: Re: Memmove causing program crashes, giving SIGTRAP in GDB(?)
From: KENNON J CONRAD via Cygwin <cygwin@cygwin.com>
Reply-To: KENNON J CONRAD <kennonconrad@comcast.net>

Hi Brian,

I just wanted to add that the stash-and-store idea you suggested, which memmove also uses, has a very nice impact
on the assembly code.

With the old code that does this for the last 0 to 7 words:
        while (candidate_ptr > score_ptr) {
          *candidate_ptr = *(candidate_ptr - 1);
          candidate_ptr--;
        }

the assembly code shows this from the point where the move starts:
.L24:
	movdqu	-16(%rax), %xmm1
	subq	$16, %rax
	movups	%xmm1, 2(%rax)
	cmpq	%rdx, %rax
	jnb	.L24
	movq	%r10, %rax
	subq	%r9, %rax
	subq	$16, %rax
	notq	%rax
	andq	$-16, %rax
	addq	%r10, %rax
	cmpq	%rax, %r9
	jnb	.L28
	movq	%rax, %rcx
	movq	%rax, %rdx
	movq	%r9, 48(%rsp)
	subq	%r9, %rcx
	subq	$1, %rcx
	shrq	%rcx
	leaq	2(%rcx,%rcx), %r8
	negq	%rcx
	subq	%r8, %rdx
	leaq	(%rax,%rcx,2), %rcx
	call	memmove
	movq	48(%rsp), %r9
	jmp	.L28

But with stash and store:
        *(uint64_t *)&candidates_index[new_score_rank + 1] = first_four;
        *(uint64_t *)&candidates_index[new_score_rank + 5] = next_four;

the assembly code from the point where the move starts is this:
.L24:
	movdqu	-16(%r9), %xmm1
	subq	$16, %r9
	movups	%xmm1, 2(%r9)
	cmpq	%rax, %r9
	jnb	.L24
	movups	%xmm0, 2(%rdi,%rdx)
	jmp	.L26

Stashing into xmm0 before the move costs a couple of extra assembly instructions, but this is a big reduction in
assembly code size for the backward memory move.  It is not as fast as memmove would be if the DF weren't getting
corrupted, but it is much better than the old code, and it completely avoids the risk of DF corruption during
rep movsq in memmove for backward move sizes >= 8!  I like it because there is no need to worry about whether
rep movsb or rep movsw could also be vulnerable to DF corruption.
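
For anyone following along, here is a minimal standalone sketch of the stash-and-store pattern. It is an illustration, not the actual program code. The function name shift_up_one and the parameters elems, capacity, pos and count are hypothetical. The sketch assumes 16-bit elements and assumes the buffer has at least pos + 9 slots, with anything above the new top element treated as scratch space that may be overwritten. It uses fixed-size memcpy calls instead of the uint64_t pointer casts shown above, since compilers typically lower those to plain register moves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: shift elems[pos .. count-1] up by one slot,
   opening a gap at elems[pos], without calling memmove.
   Assumptions (not taken from the original program):
   - capacity >= count + 1 and capacity >= pos + 9
   - slots above index count are scratch and may be clobbered */
static void shift_up_one(uint16_t *elems, size_t capacity,
                         size_t pos, size_t count)
{
    assert(capacity >= count + 1);
    assert(capacity >= pos + 9);

    /* Stash the lowest 8 words before anything is overwritten. */
    uint16_t stash[8];
    memcpy(stash, &elems[pos], sizeof stash);

    /* Move whole 8-word (16-byte) blocks, working from the top down.
       Each block is loaded completely before it is stored, so the
       one-slot overlap between source and destination is harmless. */
    size_t end = count;               /* one past the last unmoved word */
    while (end >= pos + 8) {
        uint16_t block[8];
        memcpy(block, &elems[end - 8], sizeof block);
        memcpy(&elems[end - 7], block, sizeof block);
        end -= 8;
    }

    /* The remaining 0 to 7 words are covered by storing the stash one
       slot higher. Where it overlaps words already moved by the loop,
       it rewrites the same values. */
    memcpy(&elems[pos + 1], stash, sizeof stash);
}
```

After the call, elems[pos] still holds its old value and is free to receive the new entry, which matches how the tail loop in the old code leaves things.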

Best Regards,

Kennon

> On 02/27/2026 11:49 AM PST Brian Inglis via Cygwin <cygwin@cygwin.com> wrote:
> 
>  
> Hi Kennon,
> 
> Some perf reports and analysis imply that backward moves (with overlap?) are no 
> faster than straight rep movsb on some CPUs, so it may be better to just 
> simplify to that, unless you want to stash the final element(s) to be moved out 
> of the way in register(s), and use multiple registers in unrolled wide moves for 
> the aligned portion?
>
