Date: Fri, 27 Feb 2026 18:30:30 -0800 (PST)
To: General Cygwin discussions and problem reports
Cc: Brian Inglis
Message-ID: <103536920.1558501.1772245830440@connect.xfinity.com>
In-Reply-To: <45c133f7-8285-4cb3-9701-2642cb76ab37@SystematicSW.ab.ca>
References: <547312365 DOT 1464244 DOT 1771958282029 AT connect DOT xfinity DOT com>
 <1670201592 DOT 1489273 DOT 1772043520008 AT connect DOT xfinity DOT com>
 <1044918836 DOT 1507810 DOT 1772086967212 AT connect DOT xfinity DOT com>
 <1579472684 DOT 1508349 DOT 1772092747339 AT connect DOT xfinity DOT com>
 <1148572549 DOT 1808180 DOT 1772097444036 AT mail DOT yahoo DOT com>
 <1901597260 DOT 1508573 DOT 1772100378936 AT connect DOT xfinity DOT com>
 <0C965DD0-856E-41FF-B5A4-15E472292A32 AT unified-streaming DOT com>
 <483908609 DOT 1508714 DOT 1772103775739 AT connect DOT xfinity DOT com>
 <2346fd41-2500-0db6-5849-6788174b5a1d AT cs DOT umass DOT edu>
 <1462848037 DOT 1521935 DOT 1772136952077 AT connect DOT xfinity DOT com>
 <399745a1-429a-ebb4-0f67-c32f6282caa6 AT cs DOT umass DOT edu>
 <1093316506 DOT 1533154 DOT 1772157883568 AT connect DOT xfinity DOT com>
 <3e0de899-a7dd-8fea-7743-10e6b05cc6b6 AT cs DOT umass DOT edu>
 <1990836634 DOT 1545853 DOT 1772216419837 AT connect DOT xfinity DOT com>
 <45c133f7-8285-4cb3-9701-2642cb76ab37 AT SystematicSW DOT ab DOT ca>
Subject: Re: Memmove causing program crashes, giving SIGTRAP in GDB(?)
From: KENNON J CONRAD via Cygwin
Reply-To: KENNON J CONRAD

Hi Brian,

I just wanted to add that the stash-and-store idea you suggested, which memmove also uses, has a very nice impact on the assembly code.
With the old code, which does this for the last 0 to 7 words:

    while (candidate_ptr > score_ptr) {
      *candidate_ptr = *(candidate_ptr - 1);
      candidate_ptr--;
    }

the assembly code from the point where the move starts is this:

    .L24:
        movdqu  -16(%rax), %xmm1
        subq    $16, %rax
        movups  %xmm1, 2(%rax)
        cmpq    %rdx, %rax
        jnb     .L24
        movq    %r10, %rax
        subq    %r9, %rax
        subq    $16, %rax
        notq    %rax
        andq    $-16, %rax
        addq    %r10, %rax
        cmpq    %rax, %r9
        jnb     .L28
        movq    %rax, %rcx
        movq    %rax, %rdx
        movq    %r9, 48(%rsp)
        subq    %r9, %rcx
        subq    $1, %rcx
        shrq    %rcx
        leaq    2(%rcx,%rcx), %r8
        negq    %rcx
        subq    %r8, %rdx
        leaq    (%rax,%rcx,2), %rcx
        call    memmove
        movq    48(%rsp), %r9
        jmp     .L28

But with stash and store:

    *(uint64_t *)&candidates_index[new_score_rank + 1] = first_four;
    *(uint64_t *)&candidates_index[new_score_rank + 5] = next_four;

the assembly code from the point where the move starts is this:

    .L24:
        movdqu  -16(%r9), %xmm1
        subq    $16, %r9
        movups  %xmm1, 2(%r9)
        cmpq    %rax, %r9
        jnb     .L24
        movups  %xmm0, 2(%rdi,%rdx)
        jmp     .L26

There are a couple of extra assembly instructions to stash into xmm0 before the move, but this is a big reduction in assembly code size for the backward memory move. It is not as fast as memmove would be if the DF weren't getting corrupted, but it is much better than the old code, and it completely avoids the risk of DF corruption during rep movsq in memmove for backward move sizes >= 8! I like it because there is no need to worry about whether rep movsb or rep movsw could also be vulnerable to DF corruption.

Best Regards,

Kennon

> On 02/27/2026 11:49 AM PST Brian Inglis via Cygwin wrote:
>
> Hi Kennon,
>
> Some perf reports and analysis imply that backward moves (with overlap?) are no
> faster than straight rep movsb on some CPUs, so it may be better to just
> simplify to that, unless you want to stash the final element(s) to be moved out
> of the way in register(s), and use multiple registers in unrolled wide moves for
> the aligned portion?
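
For anyone following the thread, the stash-and-store idea discussed above can be sketched in portable C roughly as follows. This is only an illustration under stated assumptions, not the actual program code: the function name shift_up_one and its parameters are hypothetical, the elements are assumed to be 16-bit words (suggested by the 2-byte offsets in the assembly), and memcpy is used in place of the uint64_t pointer casts shown in the message to sidestep alignment and aliasing questions. It also assumes at least 8 elements are being moved and that the array has room for one more element at the top.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: shift a[pos .. pos+count-1] up by one slot so it
 * occupies a[pos+1 .. pos+count], leaving a[pos] free for a new value.
 * Assumptions: count >= 8 and a[pos+count] is a valid, writable slot. */
static void shift_up_one(uint16_t *a, size_t pos, size_t count)
{
    uint64_t first_four, next_four;
    uint16_t *low = a + pos;          /* lowest source element */
    uint16_t *src = a + pos + count;  /* one past the highest source element */

    assert(count >= 8);

    /* Stash the lowest 8 elements before the loop can overwrite them. */
    memcpy(&first_four, low, sizeof first_four);
    memcpy(&next_four, low + 4, sizeof next_four);

    /* Move whole 8-element (16-byte) chunks, highest chunk first. */
    while ((size_t)(src - low) >= 8) {
        uint16_t chunk[8];
        src -= 8;
        memcpy(chunk, src, sizeof chunk);     /* load 16 bytes */
        memcpy(src + 1, chunk, sizeof chunk); /* store one element higher */
    }

    /* The 0 to 7 elements left at the bottom are covered by the stash.
     * Any overlap with chunks already moved rewrites identical values. */
    memcpy(low + 1, &first_four, sizeof first_four);
    memcpy(low + 5, &next_four, sizeof next_four);
}
```

The final two stores replace the element-by-element tail loop, so the compiler has no remainder loop left to turn into a memmove call. Whether a given compiler emits the same instructions as in the listings above is not guaranteed.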
--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple