Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Message-ID: <42A3BC5C.1090605@tlinx.org>
Date: Sun, 05 Jun 2005 20:00:44 -0700
From: Linda W
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: Performance problems
References: <4297A14B DOT 9070409 AT plausible DOT org>
 <20050528131501 DOT V53507 AT logout DOT sh DOT cvut DOT cz>
 <20050528160424 DOT GB12395 AT trixie DOT casa DOT cgf DOT cx>
 <429ED094 DOT 9080001 AT tlinx DOT org>
 <20050602172226 DOT GC6597 AT trixie DOT casa DOT cgf DOT cx>
 <42A2246D DOT 3090000 AT tlinx DOT org>
 <20050605005508 DOT GA2706 AT trixie DOT casa DOT cgf DOT cx>
In-Reply-To: <20050605005508.GA2706@trixie.casa.cgf.cx>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes

Christopher Faylor wrote:

>On Sat, Jun 04, 2005 at 03:00:13PM -0700, Linda W wrote:
>
>>You are technically accurate, but the cygwin layer is a
>>POSIX-compliant OS emulation layer by some definition, no?
>
>Yes, but that has nothing to do with caching.  Cygwin is just a DLL.  It
>can't monitor all file transactions in the whole system.

True, but cygwin doesn't need to monitor the entire OS -- neither does
Windows.  Take a look at the open file descriptors held by the winlogon
process sometime -- it holds open OS-specific directories and files.
Cygwin would only need to "cache" items (in the sense I would anticipate)
while the DLL is loaded, and only those file items that are being used by
the current program.
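A rough sketch of what such a per-process cache might look like, written in Python purely for illustration. Nothing here is Cygwin code: the class name `HandleStatCache`, its methods, the 0.1 second lifetime, and the `demo` helper are all hypothetical. It combines two ideas raised later in this message (a very short cache lifetime, and keeping the file open for as long as its info is cached), and it assumes a POSIX-style `os.open` that can open the path read-only.

```python
import os
import tempfile
import time


class HandleStatCache:
    """Hypothetical per-process cache: a stat result is reused for at most
    `ttl` seconds, and the file stays open while its entry is cached."""

    def __init__(self, ttl=0.1, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.entries = {}      # path -> (expiry time, open descriptor, stat result)
        self.system_calls = 0  # number of real open+fstat round trips

    def stat(self, path):
        now = self.clock()
        entry = self.entries.get(path)
        if entry is not None and now < entry[0]:
            return entry[2]        # still fresh: answer from the cache
        self.invalidate(path)      # expired or missing: drop any old handle
        fd = os.open(path, os.O_RDONLY)
        try:
            info = os.fstat(fd)
        except OSError:
            os.close(fd)
            raise
        self.system_calls += 1
        self.entries[path] = (now + self.ttl, fd, info)
        return info

    def invalidate(self, path):
        """Close the held descriptor, e.g. before this process itself
        renames or removes the path."""
        entry = self.entries.pop(path, None)
        if entry is not None:
            os.close(entry[1])

    def close(self):
        for path in list(self.entries):
            self.invalidate(path)


def demo():
    """Stat one temporary file three times using a fake clock and
    return how many real system round trips were needed."""
    fake_now = [0.0]
    cache = HandleStatCache(ttl=0.1, clock=lambda: fake_now[0])
    with tempfile.NamedTemporaryFile() as tmp:
        cache.stat(tmp.name)   # real lookup
        cache.stat(tmp.name)   # served from the cache
        fake_now[0] = 0.2      # pretend the entry has expired
        cache.stat(tmp.name)   # real lookup again
        cache.close()
    return cache.system_calls
```

The `invalidate` method is the hook that any remove/rename path inside the same process would have to call first; sharing the table between processes is not attempted here.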
For example, a simple find command on /tmp ("find /tmp") produces 17 lines:

/tmp
/tmp/d.txt
/tmp/run-crons.ZE1996
/tmp/run-crons.ZE1996/run-crons.1924
/tmp/run-crons.ZE1996/run-crons.daily.1924
/tmp/588-reg.reg
/tmp/1892-reg.reg
/tmp/VolumeC.txt
/tmp/xyz.txt
/tmp/wd.txt
/tmp/d1.txt
/tmp/xyz.txt.orig
/tmp/AUTORUN.INF
/tmp/WD_Data.ICO
/tmp/WD_Install.exe
/tmp/img1
/tmp/1
============

In all there were 311 file operations to list these 17 files.  They break
down as follows:

    1-27    - finding program by bash
    28-48   - loading libraries
    49-75   - processing C:\, C:\home and C:\home\username
    76-243  - working on tmp
    244-311 - accessing home directory; search for psapi.dll & close of /tmp

In more detail: the first 27 calls were bash searching for "find.exe";
ignore those.  Calls 28-48 were the find command loading cygwin libraries;
ignore those too.  Calls 49-75 involved file ops (Open, Query Info,
Directory) on the paths C:\, C:\home\ and C:\home\user.  Calls 76-243 seem
to be working on /tmp.

The tmp calls (executing between time index 51.995 - 51.005, <1 clock
tick) show the following breakdown:

     1 C:\home\law, QUERY INFORMATION
     1 C:\tmp\d.txt, READ
     2 C:\home\law, CLOSE
     2 C:\home\law, OPEN
     2 C:\tmp\d.txt, CLOSE
     2 C:\tmp\d.txt, OPEN
     3 C:\tmp\d.txt, QUERY INFORMATION
     5 C:\tmp\run-crons.ZE1996\, CLOSE
     5 C:\tmp\run-crons.ZE1996\, OPEN
     6 C:\tmp\run-crons.ZE1996, QUERY INFORMATION
     7 C:\, CLOSE
     7 C:\, DIRECTORY
     7 C:\, OPEN
     8 C:\tmp\run-crons.ZE1996, CLOSE
     8 C:\tmp\run-crons.ZE1996, OPEN
    10 C:\tmp, QUERY INFORMATION
    12 C:\tmp\, CLOSE
    12 C:\tmp\, OPEN
    13 C:\tmp, CLOSE
    13 C:\tmp, OPEN
    15 C:\tmp\run-crons.ZE1996\, DIRECTORY
    28 C:\tmp\, DIRECTORY

So if I wanted to cache -- say, limit caching to ~.1-1 seconds -- it would
appear, on the surface, to possibly reduce the 169 calls to maybe 22?
Lock open for read C:\, then C:\tmp, and C:\tmp\run-crons.ZE1996 while
processing those dirs.
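The arithmetic behind those two figures can be checked with a short Python sketch. The counts are copied from the breakdown above; reading "22" as "one call per distinct (path, operation) pair" is an assumption about where that number comes from, and the names `BREAKDOWN` and `summarize` are only illustrative.

```python
# Counts copied from the /tmp breakdown above, one "count path, operation" per line.
BREAKDOWN = r"""
 1 C:\home\law, QUERY INFORMATION
 1 C:\tmp\d.txt, READ
 2 C:\home\law, CLOSE
 2 C:\home\law, OPEN
 2 C:\tmp\d.txt, CLOSE
 2 C:\tmp\d.txt, OPEN
 3 C:\tmp\d.txt, QUERY INFORMATION
 5 C:\tmp\run-crons.ZE1996\, CLOSE
 5 C:\tmp\run-crons.ZE1996\, OPEN
 6 C:\tmp\run-crons.ZE1996, QUERY INFORMATION
 7 C:\, CLOSE
 7 C:\, DIRECTORY
 7 C:\, OPEN
 8 C:\tmp\run-crons.ZE1996, CLOSE
 8 C:\tmp\run-crons.ZE1996, OPEN
10 C:\tmp, QUERY INFORMATION
12 C:\tmp\, CLOSE
12 C:\tmp\, OPEN
13 C:\tmp, CLOSE
13 C:\tmp, OPEN
15 C:\tmp\run-crons.ZE1996\, DIRECTORY
28 C:\tmp\, DIRECTORY
"""


def summarize(text):
    """Return (total calls, distinct (path, op) pairs, calls saved)."""
    total = 0
    pairs = set()
    for line in text.strip().splitlines():
        count, rest = line.split(None, 1)     # "28", "C:\tmp\, DIRECTORY"
        path, op = rest.rsplit(", ", 1)       # split off the operation name
        total += int(count)
        pairs.add((path, op))
    return total, len(pairs), total - len(pairs)


total, distinct, saved = summarize(BREAKDOWN)
print(f"{total} calls, {distinct} distinct (path, operation) pairs")
print(f"best case saves {saved} calls ({saved / total:.0%}), "
      f"a {total / distinct:.1f}x reduction")
# prints:
# 169 calls, 22 distinct (path, operation) pairs
# best case saves 147 calls (87%), a 7.7x reduction
```

This is a best-case bound only: it assumes every repeated operation on the same path could be answered from the cache.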
With open file handles for read, you can't remove or rename them during
that .1-1 seconds.  That could eliminate 147 (87%) of the calls; in the
best case, that is close to an 8x reduction in calls (169 down to 22).

>>I wouldn't cache data without keeping the associated handles to the
>>corresponding file objects open.  As long as they are kept open,
>>Windows would disallow things like deleting the file and replacing
>>it with a directory.  That should control most race conditions
>>with some degree of relative safety.
>
>You can't do that without taking the fact that the handle is open into
>account when cygwin itself removes a file, opens a file, renames a file.

You can't?  It would seem the cygwin library itself could maintain its
own list of open descriptors and close them when needed.  Doesn't cygwin
use a shared-memory region for interprocess communication?  Couldn't this
same region be used for the file-handle/info cache, so that multiple
cygwin processes would behave with each other?

>And it could be pretty surprising to find that when process a does an
>opendir/readdir, process b is now unable to delete a file.

I'm not 100% certain, but I believe having a file (or dir) open for read
doesn't mean someone can't change the contents.  They just can't delete
the dir or file that is still opened for reading.  This is already a
problem even w/o caching.  Cygwin can't delete various directories
because they are kept open by the login shell: weird and strange dirs
like the MSN Gaming Zone one, which winlogon kept open even though it was
empty (deletable by forcing the winlogon handle to close).  So I don't
see that as a major loss of functionality, as the problem already exists.

>>>She thinks that the benefits would outweigh the tiny possibility of bad
>>>cache data resulting from something like performing an "ls" on a file
>>>and having, e.g., some other process sneak in, remove the file and
>>>introduce a directory, but still having "ls" report file data.
>>
>>Isn't this already a problem on networked shares?  I.e. doesn't
>>Windows cache file info from network shares for a few seconds (maybe
>>more if one has local-file caching turned on)?
>
>I don't know but, regardless, this would increase the possibility for
>surprise to include local disks too.  I'm not convinced that this is a
>good thing.  This would make the behavior that Gary R. Van Sickle
>recently reported as the result of using google search (I think it was
>google search), where files were kept open even though it seems like
>they should be closed, common with cygwin.

I couldn't find a reference to GRVS's report on files being kept open by
a search engine.  However, with the caching proposed for cygwin, those
"file open" opportunities are measured in fractions of a second.  Caching
for 10 milliseconds might have saved nearly 90% of the calls to Windows.
Windows is hardly a real-time OS where tolerances need millisecond
precision -- the clock defaults to about a 20Hz clock speed unless you've
tweaked it.

>>However, you spend time writing how no one _ever_ investigates
>>performance problems or suggests solutions.  That appears to be a
>>cynical view.  Then, when offered a clear example to the contrary, you
>>discard the effort as being "unoriginal" and already something that has
>>been (and is being) considered independently of their suggestion.
>>
>>That \could\ be perceived, by some, as "mean-spirited" or "spiteful".
>>I don't feel that this _encourages_ people to take the time to actually
>>"figure out" problems nor "figure out" improvements.  If they don't
>>know you, some people might take it personally. :-)  (Not that you
>>would be expected to care, publicly :-) ).
>
>You seem to be affronted by something that I said before you even
>responded.  I did not respond to your email with a "you didn't even look
>at the code" response.  I did not say "you are unoriginal".
>I merely represented our current thinking about the subject that you
>raised.
>
>I happen to know that Corinna isn't around so I wanted to make sure that
>she got the credit for having been thinking about this and even going so
>far as to start coding something, I believe.  We have been talking about
>caching for a long, long time.  I believe that there is even an "#if 0"
>or two in the cygwin code still which contains my aborted attempt to
>cache some path_conv lookups.

I wasn't _really_ affronted; that's why I both posed it as a question
about why you felt the need to talk about caching that was suggested
almost 2 years ago, and "pre-apologized" in case I misunderstood your
meaning.

Credit Smedit... the other side of blame and scapegoating.  Politics and
1-upmanship.  Can't we just have "consensus" and agree that something is
a good idea?  Nahhh.... not in today's IP (Intellectual Property)
climate.  Sigh.

Anyway, I hope you don't become increasingly against the idea now that
I've supported Corinna in her ideas...:-)  I.e. by supporting the idea, I
hope I'm not shooting myself in the foot....:-?

linda

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/