| Message-ID: | <c013bd50-6cef-4d8f-ad9a-2421e417a6bb@SystematicSW.ab.ca> |
| Date: | Sun, 1 Mar 2026 00:40:16 -0700 |
| MIME-Version: | 1.0 |
| User-Agent: | Mozilla Thunderbird |
| Subject: | Re: Memmove causing program crashes, giving SIGTRAP in GDB(?) |
| To: | General Cygwin discussions and problem reports <cygwin AT cygwin DOT com> |
| Organization: | Systematic Software |
| In-Reply-To: | <103536920.1558501.1772245830440@connect.xfinity.com> |
| From: | Brian Inglis via Cygwin <cygwin AT cygwin DOT com> |
Very good, Kennon,

Neat and well researched, and surprisingly minimal!
Hopefully some of those approaches can eliminate all problems with CPU errata or
unfixed bugs, so you no longer hit any crashes, while managing high performance
on fast hardware.

And given the source is in C, it will continue working okay on older and newer
compilers, CPUs, and combos of those. Nowadays little really improves; they are
only moving the bottlenecks around, to where your code hopefully will no longer
notice the problems.

That's the issue I always had with "optimized" assembler: it's all well and good
with today's compiler and CPU, but give it a generation of each, and it's an
unpredictable pile of emoji, good only on old machines (like those I have) ;^>

We have to be able to run the same code on systems ranging from whatever today's
cheap mobile laptop celery-stick-in-the-muds are called, to GPU monster CPUs, to
the fractional or multiple package KCPU servers, with dozens to thousands of
threads on each, variable ISAs, uarchs, cache levels, sizes, and write policies.

That's actually an advantage for CISC ISAs, acting as an HLA, interpreted by the
instruction decoder into highly tuned RISC-like uops for dispatch into multiple
pipelined stages per thread, CPU, and/or package, to hopefully hide any
performance problems.
On 2026-02-27 19:30, KENNON J CONRAD via Cygwin wrote:
> I just wanted to add that the stash and store idea you suggest that is also
> used in memmove has a very nice impact on the assembly code.
>
> With the old code that does this for the last 0 to 7 words:
>   while (candidate_ptr > score_ptr) {
>     *candidate_ptr = *(candidate_ptr - 1);
>     candidate_ptr--;
>   }
>
> the assembly code shows this from the point where the move starts:
> .L24:
> movdqu -16(%rax), %xmm1
> subq $16, %rax
> movups %xmm1, 2(%rax)
> cmpq %rdx, %rax
> jnb .L24
> movq %r10, %rax
> subq %r9, %rax
> subq $16, %rax
> notq %rax
> andq $-16, %rax
> addq %r10, %rax
> cmpq %rax, %r9
> jnb .L28
> movq %rax, %rcx
> movq %rax, %rdx
> movq %r9, 48(%rsp)
> subq %r9, %rcx
> subq $1, %rcx
> shrq %rcx
> leaq 2(%rcx,%rcx), %r8
> negq %rcx
> subq %r8, %rdx
> leaq (%rax,%rcx,2), %rcx
> call memmove
> movq 48(%rsp), %r9
> jmp .L28
>
> But with stash and store:
>   *(uint64_t *)&candidates_index[new_score_rank + 1] = first_four;
>   *(uint64_t *)&candidates_index[new_score_rank + 5] = next_four;
>
> the assembly code from the point where the move starts is this:
> .L24:
> movdqu -16(%r9), %xmm1
> subq $16, %r9
> movups %xmm1, 2(%r9)
> cmpq %rax, %r9
> jnb .L24
> movups %xmm0, 2(%rdi,%rdx)
> jmp .L26
>
> There are a couple of extra assembly instructions to stash into xmm0 before
> the move, but this is a big reduction in assembly code size for the backward
> memory move. Not as fast as memmove if the DF wasn't getting corrupted, but
> much better than the old code plus it completely avoids the risk of DF
> corruption during rep movsq in memmove for backward move sizes >= 8! I like it
> because there is no need to worry about whether rep movsb or rep movsw could
> also be vulnerable to DF corruption.
>> On 02/27/2026 11:49 AM PST Brian Inglis via Cygwin wrote:
>> Some perf reports and analysis imply that backward moves (with overlap?) are no
>> faster than straight rep movsb on some CPUs, so it may be better to just
>> simplify to that, unless you want to stash the final element(s) to be moved out
>> of the way in register(s), and use multiple registers in unrolled wide moves for
>> the aligned portion?
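
For anyone following the thread, here is a rough, self-contained sketch of the
stash and store idea discussed above; it is not Kennon's actual code. It assumes
an array of 16-bit words being shifted up by one slot to make room for an
insertion, with room for one extra element at the end. The names shift_up_one,
CHUNK, stash, and tmp are made up for illustration, and memcpy stands in for the
uint64_t pointer casts so that unaligned access and aliasing stay well defined;
whether a given compiler turns these copies into single wide moves is not
guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CHUNK = 8 };   /* 8 x uint16_t = 16 bytes per bulk copy (illustrative) */

/* Hypothetical helper: shift a[pos .. n-1] up one slot to a[pos+1 .. n].
   The caller must guarantee room for n + 1 elements. */
static void shift_up_one(uint16_t *a, size_t pos, size_t n)
{
    assert(pos <= n);
    size_t end = n;                       /* one past the last unmoved element */

    if (n - pos < CHUNK) {                /* too short to stash a full chunk */
        for (; end > pos; end--)
            a[end] = a[end - 1];
        return;
    }

    uint16_t stash[CHUNK], tmp[CHUNK];
    memcpy(stash, &a[pos], sizeof stash); /* stash the lowest 8 elements first */

    /* Bulk backward move in whole chunks; each chunk is loaded in full before
       it is stored, because source and destination overlap. */
    while (end - pos >= CHUNK) {
        memcpy(tmp, &a[end - CHUNK], sizeof tmp);
        memcpy(&a[end - CHUNK + 1], tmp, sizeof tmp);
        end -= CHUNK;
    }

    /* One store of the stash finishes the job: it places the 0 to 7 leftover
       elements and rewrites already-moved slots with the same values. */
    memcpy(&a[pos + 1], stash, sizeof stash);
}
```

The point of the stash is that the variable-length tail (0 to 7 words) no longer
needs its own loop or a memmove call: a fixed-size store after the bulk loop
covers every case, at the cost of a fixed-size load before it.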
--
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retrancher  but when there is no more to cut
                                -- Antoine de Saint-Exupéry