Date: Sun, 5 Jun 2005 23:46:52 -0400
From: Christopher Faylor
To: cygwin AT cygwin DOT com
Subject: Re: Performance problems
Message-ID: <20050606034652.GB9161@trixie.casa.cgf.cx>
Reply-To: cygwin AT cygwin DOT com
References: <4297A14B DOT 9070409 AT plausible DOT org>
  <20050528131501 DOT V53507 AT logout DOT sh DOT cvut DOT cz>
  <20050528160424 DOT GB12395 AT trixie DOT casa DOT cgf DOT cx>
  <429ED094 DOT 9080001 AT tlinx DOT org>
  <20050602172226 DOT GC6597 AT trixie DOT casa DOT cgf DOT cx>
  <42A2246D DOT 3090000 AT tlinx DOT org>
  <20050605005508 DOT GA2706 AT trixie DOT casa DOT cgf DOT cx>
  <42A3BC5C DOT 1090605 AT tlinx DOT org>
In-Reply-To: <42A3BC5C.1090605@tlinx.org>

On Sun, Jun 05, 2005 at 08:00:44PM -0700, Linda W wrote:
>Christopher Faylor wrote:
>>On Sat, Jun 04, 2005 at 03:00:13PM -0700, Linda W wrote:
>>>You are technically accurate, but the cygwin layer is a POSIX
>>>compliant-OS emulation layer by some definition, no?
>>
>>Yes, but that has nothing to do with caching.  Cygwin is just a DLL.  It
>>can't monitor all file transactions in the whole system.
>
>True, but cygwin doesn't need to monitor the entire OS -- neither
>does Windows.  Take a look at the open file descriptors held by
>the winlogon process sometime -- it holds open OS-specific
>directories and files.

I am talking about cache coherency.  The OS can maintain it because it
knows what files are being updated.  Cygwin can't.  If cygwin opens a
file and another unrelated process modifies it, cygwin's cached
information would be wrong.  This is a simple statement of fact.

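A minimal sketch of that coherency problem, in Python rather than anything from the Cygwin sources. `StatCache` and `demo` are hypothetical names made up for illustration. The cache remembers a file's size by path, and a write that bypasses the cache (standing in for an unrelated process) leaves the remembered value stale.

```python
import os
import tempfile


class StatCache:
    """Hypothetical per-process cache of file sizes, keyed by path."""

    def __init__(self):
        self._sizes = {}

    def size(self, path):
        # Only the first lookup reaches the OS; later lookups reuse the
        # remembered value, whatever has happened to the file since.
        if path not in self._sizes:
            self._sizes[path] = os.stat(path).st_size
        return self._sizes[path]


def demo():
    """Return (cached_size, actual_size) after a write the cache never sees."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"abc")
        os.close(fd)

        cache = StatCache()
        cache.size(path)  # populate the cache with the 3-byte size

        # Stand-in for an unrelated process appending to the same file.
        with open(path, "ab") as f:
            f.write(b"defgh")

        return cache.size(path), os.stat(path).st_size
    finally:
        os.remove(path)


if __name__ == "__main__":
    cached, actual = demo()
    print(f"cached size: {cached}, actual size: {actual}")
```

Nothing tells the cache that the file changed, so the two numbers disagree until the entry is dropped or expires. A kernel does not have this problem for its own caches because every write passes through it.
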
>Cygwin would only need to "cache" items (in the sense I would
>anticipate) while the DLL is loaded and only those file items
>that are being used by the current program.  For example a simple
>find command on /tmp "find /tmp" produces 17 lines:
>/tmp
>/tmp/d.txt
>/tmp/run-crons.ZE1996
>/tmp/run-crons.ZE1996/run-crons.1924
>/tmp/run-crons.ZE1996/run-crons.daily.1924
>/tmp/588-reg.reg
>/tmp/1892-reg.reg
>/tmp/VolumeC.txt
>/tmp/xyz.txt
>/tmp/wd.txt
>/tmp/d1.txt
>/tmp/xyz.txt.orig
>/tmp/AUTORUN.INF
>/tmp/WD_Data.ICO
>/tmp/WD_Install.exe
>/tmp/img1
>/tmp/1
>============
>In all there were 311 file operations to list these 17 files.
>They break down as follows:
>1-27 - finding program by bash
>28-48 - loading libraries
>49-75 - processing C:\, C:\home and C:\home\username
>76-243 - working on tmp
>244-311 - accessing home directory; search for psapi.dll & close of /tmp
>
>The ones working on tmp were broken down as follows:
>
>The first 27 were processing by bash to find "find.exe".  Ignore.
>Commands up to 28-48 were loading cygwin libraries by the find
>command; Ignore that.
>Commands 49-75 Involved file ops (Open, Query Info, Directory on the
>paths C:\, C:\home\ and C:\home\user).  Calls 76-243 seem to be working
>on /tmp, calls.
>The tmp calls (executing between time index 51.995 -
>51.005 (<1 clock tick)), show the following breakdown:
>
>   1 C:\home\law, QUERY INFORMATION
>   1 C:\tmp\d.txt, READ
>   2 C:\home\law, CLOSE
>   2 C:\home\law, OPEN
>   2 C:\tmp\d.txt, CLOSE
>   2 C:\tmp\d.txt, OPEN
>   3 C:\tmp\d.txt, QUERY INFORMATION
>   5 C:\tmp\run-crons.ZE1996\, CLOSE
>   5 C:\tmp\run-crons.ZE1996\, OPEN
>   6 C:\tmp\run-crons.ZE1996, QUERY INFORMATION
>   7 C:\, CLOSE
>   7 C:\, DIRECTORY
>   7 C:\, OPEN
>   8 C:\tmp\run-crons.ZE1996, CLOSE
>   8 C:\tmp\run-crons.ZE1996, OPEN
>  10 C:\tmp, QUERY INFORMATION
>  12 C:\tmp\, CLOSE
>  12 C:\tmp\, OPEN
>  13 C:\tmp, CLOSE
>  13 C:\tmp, OPEN
>  15 C:\tmp\run-crons.ZE1996\, DIRECTORY
>  28 C:\tmp\, DIRECTORY
>
>So if I was wanting to cache -- say limit caching to ~.1-1 seconds, it
>would appear, on the surface, to possibly reduce the 169 calls to
>maybe 22?

You really can't predict without looking at the code.  There is no way
of knowing what the above information represents as far as what cygwin
and find are doing.

>>You can't do that without taking the fact that the handle is open into
>>account when cygwin itself removes a file, opens a file, renames a file.
>>
>You can't?  It would seem the cygwin library itself could maintain
>its own list of open descriptors and close them when needed.  Doesn't
>cygwin use a shared-memory region for interprocess communication?
>Couldn't this same region be used for the File-handle/info cache so
>multiple cygwin processes would behave with each other?

I was filling in the details here just to show that the solution of
keeping files open has consequences.  Keeping the file open increases
the complexity of every function which manipulates a file rather than
the one or two functions which might be interested in the cached status
information.

>>And it could be pretty surprising to find that when process a does an
>>opendir/readdir, process b is now unable to delete a file.
>>
>I'm not 100% certain, but I believe having a file (or dir) open
>for read doesn't mean someone can't change the contents.  They
>just can't delete the dir or file that is still opened for reading.

Which is why I said "unable to delete a file".

>This is already a problem even w/o caching.  Cygwin can't delete
>various directories because they are kept open by the login shell.

Being unable to consistently delete a file because something has it open
is explainable.  What isn't explainable is "Why does my configure
script work sometimes but not others?"

When you talk about keeping caching information around, you stand the
chance of something like this not working:

  find . -name foo | xargs rm

because find may still have foo open when rm tries to remove it.  That
may not be a huge deal for a FS/OS which honors DELETE_ON_CLOSE and will
be able to delete "foo" regardless, but this introduces another
potential place for code complexity.

cgf

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/