Date: Mon, 8 Jan 2024 15:53:02 +0100
From: Corinna Vinschen via Cygwin
To: cygwin AT cygwin DOT com
Subject: Re: rfe: CYGWIN fslinktypes option? Re: Catastrophic Cygwin find . -ls, grep performance on samba share compared to WSL&Linux

On Dec 24 01:47, Roland Mainz via Cygwin wrote:
> On Thu, Dec 21, 2023 at 9:32 PM Kaz Kylheku via Cygwin wrote:
> > On 2023-12-21 04:16, Martin Wege via Cygwin wrote:
> > > On Wed, Dec 20, 2023 at 6:21 PM Kaz Kylheku via Cygwin wrote:
> [snip]
> > > The root cause is IMO the extra Win32 syscalls (>=
> > > 3 per file lookup, compared to 1 on Linux) to lookup the *.lnk and
> > > *.exe.lnk files on filesystems which have native link support
> > > (NTFS, ReFS, SMBFS, NFS). On SMBFS and NFS it hurts the most,
> > > because access latency is the highest for networked filesystems.
> >
> > Could some intelligent caching be added there? (Discussion of
> > associated invalidation problem in 3... 2.... 1... )
>
> See below, basically a short-lived cache which is only valid for the
> lifetime of the one POSIX function call would be OK...
>
> > Can you discuss more details, so people don't have to dive into code
> > to understand it? If we are accessing some file "foo", the application
> > or user may actually be referring to a "foo.lnk" link. But in the
> > happy case that "foo" exists, why would we bother looking for "foo.lnk"?
> >
> > If "foo" does not exist, but "foo.lnk" does, that could probably be
> > cached, so that next time "foo" is accessed, we go straight for "foo.lnk",
> > and keep using that while it exists.
> >
> > If someone has both "foo" and "foo.lnk" in the same directory,
> > that's a bit of a degenerate case; how important is it to be "correct",
> > anyway.
>
> Question, mainly for Corinna:
> Could the code be modified to use one |NtQueryDirectoryFile()| call
> with a SINGLE pattern testing for { "foo", "foo.lnk", "foo.lnk.exe",
> ... } (instead of calling the kernel for each suffix independently)
> and cache that information for the lifetime of the matching POSIX
> function call ?

Yes and no. This could certainly be made to work, but it has a couple
of caveats which are not trivial, and there's *no* guarantee that you
will be able to get faster code by doing that. At all.

First of all, in contrast to calling NtOpenFile on the file,
NtQueryDirectoryFile always needs two calls, because you have to open
the directory first. If you then found the file, you have to open the
file to fetch information.
So you always have one more call than when opening the file immediately
and having immediate success. It's more or less equivalent if the file
is a *.exe file, and it's one less hit if it's a *.lnk file.

Which pattern would you like to use? Let's assume we carefully try to
get rid of .exe.lnk, we still have to check for "foo", "foo.exe" and
"foo.lnk". Even if we get rid of .lnk, we have two patterns which can
*not* be expressed in a single call to NtQueryDirectoryFile. We only
have Windows' most simple globbing, i.e., we have '*' and '?'. The
only pattern matching "foo" and "foo.exe" is "foo*". "foo.*" does not
hit on "foo".

So "foo*". As you know, the NtQueryDirectoryFile call can return a
buffer with multiple hits. But the buffer has a finite size, so if
somebody is looking for the file "a", we'd have to look for "a*", which
may have more hits than fit into the buffer. So the code has to be
prepared not only to scan a 64K buffer for (potentially) hundreds of
entries, but also to repeat the call to NtQueryDirectoryFile to load
more matching file entries.

Next problem, NFS. The current call just opening the file checks with
the necessary flags to access symlinks. Without these flags, NFS
symlinks are invisible or not handled as symlinks. So, right now, we
have a single call on NFS to open a file, if it exists without suffix.

If you use NtQueryDirectoryFile, you have another subtle problem. If
it happens to be an NFS dir, you have to use another
FILE_INFORMATION_CLASS, otherwise symlinks don't show up at all. This
information class isn't even sufficient for the most basic of
information we need in the symlink_info::check method. So you need to
open the file here, too, and extract the information.

There's probably more to it, but that's just what came to mind for a
start.

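To make the globbing point above concrete, here is a minimal sketch in
portable C. It is not Cygwin code and does not call the NT API: the
functions wild_match, is_candidate and scan_listing, the scan_result
struct, and the in-memory directory listing are all hypothetical names
invented for illustration, and the matcher is a simplified model of
'*' and '?' rather than the kernel's actual matching routine. It shows
that "foo*" covers both "foo" and "foo.exe" while "foo.*" misses "foo",
and that a "<base>*" query returns unrelated entries that have to be
filtered out afterwards.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Simplified wildcard match: '*' matches any run of characters
   (including none), '?' matches exactly one.  Case-insensitive.
   Illustration only; NOT the matcher the Windows kernel uses. */
int wild_match(const char *pat, const char *name)
{
    assert(pat != NULL && name != NULL);
    for (; *pat != '\0'; ++pat, ++name) {
        if (*pat == '*') {
            while (*pat == '*')          /* collapse consecutive stars */
                ++pat;
            if (*pat == '\0')
                return 1;                /* trailing star matches the rest */
            for (; *name != '\0'; ++name)
                if (wild_match(pat, name))
                    return 1;
            return 0;
        }
        if (*name == '\0')
            return 0;
        if (*pat != '?' &&
            tolower((unsigned char)*pat) != tolower((unsigned char)*name))
            return 0;
    }
    return *name == '\0';
}

/* Case-insensitive string equality. */
int eq_nocase(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; ++a, ++b)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    return *a == *b;
}

/* Is `name` exactly `base` plus one of "", ".exe", ".lnk"?
   (The suffix list is an assumption taken from the discussion.) */
int is_candidate(const char *base, const char *name)
{
    static const char *const suffixes[] = { "", ".exe", ".lnk" };
    size_t blen = strlen(base);
    if (strlen(name) < blen)
        return 0;
    for (size_t i = 0; i < blen; ++i)
        if (tolower((unsigned char)base[i]) != tolower((unsigned char)name[i]))
            return 0;
    for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; ++i)
        if (eq_nocase(name + blen, suffixes[i]))
            return 1;
    return 0;
}

struct scan_result {
    size_t hits;    /* entries returned by the "<base>*" pattern */
    size_t wanted;  /* entries among them we actually care about */
};

/* Model a single-pattern directory query over an in-memory listing. */
struct scan_result scan_listing(const char *const *entries, size_t n,
                                const char *base)
{
    struct scan_result r = { 0, 0 };
    char pattern[260];
    size_t blen = strlen(base);

    assert(blen + 2 <= sizeof pattern);
    memcpy(pattern, base, blen);
    pattern[blen] = '*';
    pattern[blen + 1] = '\0';

    for (size_t i = 0; i < n; ++i) {
        if (wild_match(pattern, entries[i])) {
            ++r.hits;
            if (is_candidate(base, entries[i]))
                ++r.wanted;
        }
    }
    return r;
}
```

For a listing of "a", "a.exe", "abc.txt", "apple", "b" and "A.LNK",
scan_listing with base "a" reports five hits of which three are wanted.
In a large directory the ratio gets much worse, which is the buffer
scanning and refill cost described above.
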
> The idea is to reduce the number of userland<--->kernel roundtrips
> from to <1>, and filesystem drivers could be optimized even
> further (for example if the network filesystem protocol supports file
> name globbing...)

I have a hard time seeing how you can really avoid a lot of calls. You
may find that you won't save a lot of them, and another lot of them
don't matter because the OS already cached the information.

Also, as exciting as it might be to do extensive caching (and, as I
wrote in a former reply today, we do some caching), keep in mind that
we are only a user-space DLL. The only caching of file information you
can rely upon is that of the kernel.

Corinna

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple