Mail Archives: cygwin/2023/12/25/08:36:31
Date: Mon, 25 Dec 2023 22:35:59 +0900
To: cygwin AT cygwin DOT com
Subject: gcc code generator problem for avx2
Message-Id: <20231225223559.5c071dbd966250851e5c7748@nifty.ne.jp>
From: Takashi Yano via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Takashi Yano <takashi DOT yano AT nifty DOT ne DOT jp>
Sender: "Cygwin" <cygwin-bounces+archive-cygwin=delorie DOT com AT cygwin DOT com>

Hi,

I have run into a problem where AVX2 code crashes with a segfault.
This happens when either the cygwin gcc or the cygwin mingw gcc
compiler is used.

To reproduce the problem, compile the following test case with:
    gcc -O0 avx2test.c -mavx2 -o avx2test
or
    x86_64-w64-mingw32-gcc -O0 avx2test.c -mavx2 -o avx2test
then run avx2test.exe.

I looked into this problem a bit and noticed that vmovdqa/vmovaps,
which require a 32-byte-aligned memory operand for 256-bit accesses,
are used for misaligned operands. If I run
    gcc -S -O0 avx2test.c -mavx2
replace all vmovdqa/vmovaps with vmovdqu/vmovups in avx2test.s,
and then run
    gcc avx2test.s -o avx2test
the test case runs correctly.

This does not happen with gcc on Linux, mingw gcc on Linux, or MSYS2.
Only the cygwin gcc (v11.4, 12.3, 13.2) shows this problem.

Any ideas?

/* avx2test.c */
#include <stdio.h>
#include <inttypes.h>
#include <immintrin.h>
#include <stdint.h>

/* Divide eight packed int32 lanes: convert to float, divide, floor,
   and convert back to int32. */
__m256i __m256i_div_epi32(const __m256i *a, const __m256i *b)
{
    __m256 a1 = _mm256_cvtepi32_ps(*a);
    __m256 b1 = _mm256_cvtepi32_ps(*b);
    __m256 c = _mm256_div_ps(a1, b1);
    __m256 d = _mm256_floor_ps(c);
    __m256i e = _mm256_cvtps_epi32(d);
    return e;
}

/* Load sixteen int32 values with unaligned loads, pack them to int16
   with signed saturation, and restore the original lane order. */
__m256i load_32bit_to_16bit_w16_avx2(const int32_t *a)
{
    __m256i a_low = _mm256_lddqu_si256((const __m256i *) a);
    __m256i a_high = _mm256_lddqu_si256((const __m256i *) (a + 8));
    __m256i b = _mm256_packs_epi32(a_low, a_high);
    return _mm256_permute4x64_epi64(b, 0xD8);
}

int main(void)
{
    __attribute__ ((aligned (32))) int64_t a64[4] =
        {0x447b400045754, 0x447b4000447b4, 0x447b4000443cc, 0x44b9c000447b4};
    __attribute__ ((aligned (32))) int64_t b64[4] =
        {0x426800004268, 0x426800004268, 0x426800004268, 0x426800004268};
    __attribute__ ((aligned (32))) int32_t c32[16] =
        {0x7fe, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

    /* int64_t is long on Cygwin but long long on mingw, so print with
       PRIx64 rather than %lx. */
    __m256i d = __m256i_div_epi32((__m256i *)a64, (__m256i *)b64);
    int64_t *d64 = (int64_t *)&d;
    printf("%" PRIx64 ", %" PRIx64 ", %" PRIx64 ", %" PRIx64 "\n",
           d64[0], d64[1], d64[2], d64[3]);

    __m256i e = load_32bit_to_16bit_w16_avx2(c32);
    int64_t *e64 = (int64_t *)&e;
    printf("%" PRIx64 ", %" PRIx64 ", %" PRIx64 ", %" PRIx64 "\n",
           e64[0], e64[1], e64[2], e64[3]);
    return 0;
}
--
Takashi Yano <takashi DOT yano AT nifty DOT ne DOT jp>