From: Kacper Michajlow
To: cygwin AT cygwin DOT com
Subject: Re: Cygwin multithreading performance
Date: Fri, 20 Nov 2015 15:25:03 +0100
In-Reply-To: <564E3017.90205@maxrnd.com>
References: <564E3017 DOT 90205 AT maxrnd DOT com>
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm

2015-11-19 21:24 GMT+01:00 Mark Geisert:
> Kacper Michajlow wrote:
>>
>> I recently noticed that Cygwin
>> multithreading is very inefficient. I
>> was repacking a few git repositories, and Cygwin's git spawns
>> threads, but they are so badly synchronized that there is no speed gain
>> over one thread and possibly a loss because of the overhead. On my
>> machine I got 7-10% CPU usage, while git built with mingw easily
>> uses 100%.
>>
>> You can find the code in question here
>> https://github.com/git/git/blob/master/builtin/pack-objects.c#L1967-L2094
>>
>> Do you have any suggestions? Is there any chance to get MT workloads
>> improved in Cygwin? These days it is a really big problem in my
>> opinion.
>
>
> Although there have been some issues with Cygwin pthreads reported and
> resolved, I can't recall complaints about their performance. You don't
> supply much specific info so I had to guess that you must be doing something
> like 'git gc' to provoke calls to the code you quote. Please give more info
> if I was mistaken.
>
> I did an strace of 'git gc' over a small source tree I have and found:
>
>> ~/src/cygwin-cygutils strace --mask=debug+syscall+thread -o git.strace git
>> gc
>> Counting objects: 1691, done.
>> Delta compression using up to 4 threads.
>> Compressing objects: 100% (398/398), done.
>> Writing objects: 100% (1691/1691), done.
>> Total 1691 (delta 1250), reused 1691 (delta 1250)
>>
>> ~/src/cygwin-cygutils grep "fork(" git.strace
>> 350 111164 [main] git 360 fork: 0 = fork()
>> 59 113379 [main] git 4980 fork: 360 = fork()
>> 496 242346 [main] git 4980 fork: 368 = fork()
>> 513 242585 [main] git 368 fork: 0 = fork()
>> 828 589040 [main] git 4980 fork: 4968 = fork()
>> 685 589341 [main] git 4968 fork: 0 = fork()
>> 591 126631 [main] git 4968 fork: 1784 = fork()
>> 483 126866 [main] git 1784 fork: 0 = fork()
>> 618 2320996 [main] git 4980 fork: 2912 = fork()
>> 558 2321259 [main] git 2912 fork: 0 = fork()
>> 555 3023781 [main] git 4980 fork: 1612 = fork()
>> 500 3024002 [main] git 1612 fork: 0 = fork()
>> 766 3112383 [main] git 4980 fork: 1756 = fork()
>> 681 3112655 [main] git 1756 fork: 0 = fork()
>
>
> There's your problem. Git is for some reason fork()ing to do its parallel
> operations. fork() is very complicated to emulate on Windows and Cygwin's
> fork() is already known to be slow compared to native OS implementations.
>
> Why is mingw faster? Inspection of run-command.c in the git source tree
> (BTW thanks for the github link) shows that start_command() has two code
> paths divided by "#ifndef GIT_WINDOWS_NATIVE". The Windows native path
> (e.g. mingw) doesn't fork() but instead spawns subprocesses. On Cygwin the
> fork() path is used. Git probably ought to use the spawn code path on
> Cygwin too.
>
> I don't know offhand if this is something Cygwin's git maintainer would want
> to tackle or if it should be handled upstream but I'd guess the latter.
> Hope this helps,
>
> ..mark
>
> --
> Problem reports: http://cygwin.com/problems.html
> FAQ: http://cygwin.com/faq/
> Documentation: http://cygwin.com/docs.html
> Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
>

Thanks for the reply, and sorry for not being specific enough before.

'git gc' is a driver that runs various git commands to clean up a
repository. However, I'm mostly concerned about the code I linked.
Instead of 'git gc', it is better to test 'git repack -a -f' directly,
preferably on a repository where it takes some time.
'git://sourceware.org/git/newlib-cygwin.git' is a good test case.
Although the performance hit is bigger with bigger repositories, this
one is a good example to see what's going on.

I'm well aware that forking on Windows is problematic, but I am
explicitly interested in the parallelized part of the execution. I don't
care about the forks: while they slow things down too, they are not used
in the compression process, which is parallelized over all CPU threads.
Each command is indeed forked, but I'm only interested in the
pack-objects part, hence the code I linked.

Here is my result on a minimized test.

$ strace --mask=debug+syscall+thread -o git.strace git repack -a -f
Counting objects: 156690, done.
Delta compression using up to 12 threads.
Compressing objects: 100% (154730/154730), done.
Writing objects: 100% (156690/156690), done.
Total 156690 (delta 123449), reused 33146 (delta 0)

$ grep "fork(" git.strace
559 53728 [main] git 24340 fork: 24368 = fork()
465 54022 [main] git 24368 fork: 0 = fork()

Only two forks were created, while only 25% CPU was used during
compression (on a big repo like the Linux kernel it doesn't exceed 8%).
With native git the same workload easily uses 95-100% CPU and is
therefore a lot faster.

I know I'm not that specific, but I don't know what more to say here. I
could try to produce a sample app to illustrate the issue, but I think
git is already a good example: pure C with pthreads. I already linked
the code in my first email. Tell me how I can help to diagnose it
further.

-Kacper
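
A sample app of the kind mentioned above could look roughly like the
following. This is only a minimal sketch and is not taken from git: the
names run_workers, worker, crunch, struct pool and WORK_PER_CHUNK are
hypothetical, and the pattern (threads claiming chunks of CPU-bound work
under a mutex) is an assumption about what a fair reproducer needs, not
a copy of the pack-objects logic.

```c
/* Hypothetical stand-alone reproducer; not taken from git's source. */
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define WORK_PER_CHUNK 200000u  /* arbitrary amount of CPU work per chunk */

struct pool {
    pthread_mutex_t lock;   /* protects next_chunk and checksum */
    unsigned next_chunk;    /* next unclaimed chunk index */
    unsigned nchunks;       /* total number of chunks */
    uint64_t checksum;      /* order-independent sum of chunk results */
};

/* CPU-bound busy work: an LCG-style mix seeded by the chunk index. */
static uint64_t crunch(unsigned chunk)
{
    uint64_t h = (uint64_t)chunk + 1u;
    for (unsigned i = 0; i < WORK_PER_CHUNK; i++)
        h = h * 6364136223846793005ULL + 1442695040888963407ULL;
    return h;
}

/* Each worker claims chunks until none are left. */
static void *worker(void *arg)
{
    struct pool *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        if (p->next_chunk >= p->nchunks) {
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        unsigned chunk = p->next_chunk++;
        pthread_mutex_unlock(&p->lock);

        uint64_t r = crunch(chunk);     /* computed without holding the lock */

        pthread_mutex_lock(&p->lock);
        p->checksum += r;               /* unsigned wraparound is well defined */
        pthread_mutex_unlock(&p->lock);
    }
}

/* Process nchunks chunks on nthreads threads and return the checksum. */
uint64_t run_workers(unsigned nthreads, unsigned nchunks)
{
    struct pool p = { .next_chunk = 0, .nchunks = nchunks, .checksum = 0 };
    pthread_t *tids = malloc(nthreads * sizeof *tids);
    unsigned started = 0;

    if (!tids)
        return 0;
    pthread_mutex_init(&p.lock, NULL);
    for (unsigned i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, worker, &p) == 0)
            started++;
    for (unsigned i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&p.lock);
    free(tids);
    return p.checksum;
}
```

Calling run_workers(1, 256) and run_workers(12, 256) from a small main()
built with 'gcc -O2 -pthread', and timing each call, should show whether
a plain pthreads workload scales under Cygwin. The checksum is the same
for any thread count, which makes it easy to confirm that every chunk
was processed.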