From: | Dan Bonachea <dobonachea AT lbl DOT gov> |
Date: | Tue, 7 Mar 2017 16:19:35 -0500 |
Message-ID: | <CAJTO8-Z7Dn-cqt3zcgP26zbECtx2-TjKYQSEXLJbST-mDMaLow@mail.gmail.com> |
Subject: | pthread_create() slowdown with concurrent sched_yield() |
To: | cygwin AT cygwin DOT com |
Cc: | gasnet-devel AT lbl DOT gov, Dan Bonachea <dobonachea AT lbl DOT gov> |
I suspect I may have discovered a corner-case performance bug in Cygwin's pthread_create() implementation. The problem arises when a call to pthread_create() is made concurrently with multiple pthreads in the same process spinning on calls to sched_yield(). I've searched the Cygwin mailing list archives, user guide, FAQ, and Google and not found any mention of this particular misbehavior.

A minimal demo program is copied below and also available here:
https://upc-bugs.lbl.gov/bugzilla/attachment.cgi?id=549

The demo program is a narrowed-down version of test code used in the GASNet communication system (http://gasnet.lbl.gov). The test code calls pthread_create to spawn a user-controlled number of threads, which then execute 1000 "spin barriers" - implemented by spinning on in-memory flags and stalling with sched_yield(). The test can also optionally insert a pthread_barrier_wait() across all threads before the first spin barrier.

Here are some experimental results - these are full-process "real" wall-clock timings (fastest over 5 runs) collected using the bash 'time' shell built-in. The systems are otherwise idle. All code has been compiled with the default 64-bit /usr/bin/gcc (compile line appears as a comment in the test), but the results are similar with clang.

          8-core Win7-Cygwin/64 2.6.0        8-core Linux/64 3.13.0 (Ubuntu)
          i7-4800MQ @ 2.70GHz                Xeon E5420 @ 2.50GHz
          4 core x 2-way hyperthread         2 socket x 4 cores/socket

thread    create-vs-    create-vs-           create-vs-     create-vs-
count     spin/yield    pthread_barrier      spin/yield     pthread_barrier
------    ------------  ----------------     -------------  -----------------
   1       0m 0.000s    0m0.000s             0m0.001s       0m0.001s
   2       0m 0.000s    0m0.000s             0m0.002s       0m0.002s
   4       0m 0.000s    0m0.000s             0m0.002s       0m0.003s
   8       0m 0.000s    0m0.016s             0m0.003s       0m0.006s
  16       0m10.717s    0m0.000s             0m0.013s       0m0.012s
  32       2m23.988s    0m0.016s             0m0.018s       0m0.024s
  64      12m40.002s    0m0.016s             0m0.038s       0m0.046s
 128      >20m*         0m0.016s             0m0.063s       0m0.067s
 256      >20m*         0m0.047s             0m0.290s       0m0.631s

(*) = killed after >20m of wall time (>2.5 hours of cpu time)

When the number of pthreads starts to exceed the physical core count, Cygwin's pthread_create() starts taking exponentially longer to return when it is competing with concurrent calls to sched_yield(). During the long pauses, Windows Task Manager shows the process consuming 100% CPU on all cores and it becomes unresponsive to SIGINT.

The observed behavior seems to suggest that Cygwin's pthread creation operation (and/or the newly spawned thread) is not being scheduled, despite every OTHER application thread spamming calls to sched_yield(). If the other threads competing with pthread_create() are instead stalled in a pthread_barrier_wait(), the problem goes away entirely (i.e., by adding a semantically unnecessary pthread_barrier_wait(), the worst-case performance gets over 75,000x better: more than 1200 s versus 0.016 s at 128 threads). The test results demonstrate that the spin barriers themselves run quite fast, but pthread_create() runs very slowly when other unrelated threads are executing sched_yield().

Note that inserting pthread_barrier_wait() to stall every thread in the process during a pthread_create() is not always a practical solution in a real program, where the thread creation behavior may be less regular than shown in this example.

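One way to check that the time is going to pthread_create() itself, rather than to the spin barriers, would be to time each call individually while the already-created threads spin on sched_yield(). The following is only a sketch under stated assumptions: the names worst_create_latency, spin_until_released and release_flag are hypothetical and do not appear in the original test, the spin flag uses C11 atomics instead of the volatile flags in the demo program, and the code has not been run on Cygwin as part of this report.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Hypothetical flag: set to 1 once every thread has been created.
static atomic_int release_flag;

// Each spawned thread spins on sched_yield() until released, imitating
// threads that are already waiting inside a spin barrier.
static void *spin_until_released(void *arg) {
  (void)arg;
  while (!atomic_load(&release_flag)) sched_yield();
  return NULL;
}

// Monotonic wall-clock time in seconds.
static double now_seconds(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

// Creates nthreads spinning threads, prints the latency of each
// pthread_create() call, and returns the slowest one in seconds
// (or -1.0 if allocation or thread creation fails).
double worst_create_latency(int nthreads) {
  assert(nthreads > 0);
  pthread_t *th = malloc(sizeof(pthread_t) * (size_t)nthreads);
  if (!th) return -1.0;

  atomic_store(&release_flag, 0);
  double worst = 0.0;
  int created = 0;
  for (int i = 0; i < nthreads; i++) {
    double t0 = now_seconds();
    int rc = pthread_create(&th[i], NULL, spin_until_released, NULL);
    double dt = now_seconds() - t0;
    if (rc) break; // stop on the first failed creation
    created++;
    printf("pthread_create #%d: %.6f s\n", i, dt);
    if (dt > worst) worst = dt;
  }

  atomic_store(&release_flag, 1); // let the spinning threads exit
  for (int i = 0; i < created; i++) pthread_join(th[i], NULL);
  free(th);
  return created == nthreads ? worst : -1.0;
}
```

Calling worst_create_latency(16) from a small main() on an affected system should, if the analysis above is right, show the per-call latency growing sharply once the spinning threads outnumber the cores. On a system without the problem, each call would be expected to return in well under a millisecond.
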
Also shown are performance results for the same test on a Linux system with somewhat comparable hardware (the CPU running Linux is 5 years older on Intel's product calendar). The Linux system does NOT demonstrate the problem. Similar code has run on several other POSIX OSes (including OSX, FreeBSD, NetBSD, Solaris), in a wide variety of architectural configurations -- all without problems.

This pthread_create() performance problem has been reproduced with similar results on four different Windows machines (including laptops and servers), running all combinations of the following Cygwin configurations:

  Windows 7/64   Cygwin {32,64} {2.7,2.6,2.0}
  Windows 10/64  Cygwin 64 2.7

I realize this may represent a parallelism pattern that cannot be supported efficiently on Cygwin (and we've internally found an app-specific workaround not represented here), but I thought it responsible to report the performance issue anyhow.

Thanks for your consideration.
-Dan Bonachea

========================================================================================
// pthread-spawn.c test, by Dan Bonachea
// compile with a command like:
//   gcc -std=c99 -D_REENTRANT -D_GNU_SOURCE pthread-spawn.c -o pthread-spawn -lpthread
// usage:
//   pthread-spawn <initialbarrier> <numthreads> <numiters>

#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int numthreads=256;
int numiters=1000;
int initialbarrier = 0;
pthread_barrier_t pthbarrier;
volatile int *spinbarrier;

void *thread_start(void *p) {
  volatile int *myspin = p;

  if (initialbarrier) {
    int ret = pthread_barrier_wait(&pthbarrier);
    if (ret && ret != PTHREAD_BARRIER_SERIAL_THREAD) perror("pthread_barrier_wait");
  }

  if (myspin == &spinbarrier[numthreads-1]) { // last thread
    printf("Running %d spin barriers...\n",numiters);
  }

  for (int iter=1; iter <= numiters; iter++) { // execute numiters spin barriers
    if (myspin == spinbarrier) { // master thread
      for (int th = 1; th < numthreads; th++) { // wait for each slave
        while (spinbarrier[th] != iter) {
          if (sched_yield()) perror("sched_yield"); // yield
        }
      }
      *spinbarrier = iter; // broadcast
    } else { // slave threads
      *myspin = iter; // signal
      while (*spinbarrier != iter) { // wait for master broadcast
        if (sched_yield()) perror("sched_yield"); // yield
      }
    }
  }
  return 0;
}

int main(int argc, char **argv) {
  // parse args
  if (argc > 1) initialbarrier = atoi(argv[1]);
  if (argc > 2) numthreads = atoi(argv[2]);
  if (argc > 3) numiters = atoi(argv[3]);

  // init data structures
  pthread_t *th = malloc(sizeof(pthread_t)*numthreads);
  spinbarrier = calloc(sizeof(int),numthreads);
  if (pthread_barrier_init(&pthbarrier, NULL, numthreads)) perror("pthread_barrier_init");

  printf("Creating %d threads%s...\n",numthreads,
         (initialbarrier?", with initial pthread_barrier_wait":""));fflush(stdout);
  for (int i=0; i < numthreads; i++) {
    if (pthread_create(&th[i], NULL, thread_start, (void *)&spinbarrier[i]))
      perror("pthread_create");
  }
  printf("Creation complete!\n"); fflush(stdout);

  for (int i=0; i < numthreads; i++) {
    void *ret;
    if (pthread_join(th[i], &ret)) perror("pthread_join");
  }
  printf("Done!\n");fflush(stdout);
  return 0;
}