Mail Archives: cygwin/2024/01/08/09:53:27

Date: Mon, 8 Jan 2024 15:53:02 +0100
To: cygwin AT cygwin DOT com
Subject: Re: rfe: CYGWIN fslinktypes option? Re: Catastrophic Cygwin find .
-ls, grep performance on samba share compared to WSL&Linux
Message-ID: <ZZwMTtV5_kdgH0Yr@calimero.vinschen.de>
Mail-Followup-To: cygwin AT cygwin DOT com
References: <CAAvCNcBZGepZMP9Q0D5ua+6ACftDOQEriqnuCbwg6umBPUA72Q AT mail DOT gmail DOT com>
<CAAvCNcB0_0ZeujP23QZFZaDvVTh5rxbXJw4FP6uXNPErCgdZ2w AT mail DOT gmail DOT com>
<07c7379e983c9f436ebf86e3818ca843 AT kylheku DOT com>
<CANH4o6OjJJZQkbELt+H3WdAxQbLGZ1DL0ytevknRpbTO9sVUig AT mail DOT gmail DOT com>
<4723aab7e2b331cb81946eff0fb4e862 AT kylheku DOT com>
<CAKAoaQnQ2eL9JJfn=CeJ06WujqgLdLVeXS7ojf7GvvmkB-KYoA AT mail DOT gmail DOT com>
MIME-Version: 1.0
In-Reply-To: <CAKAoaQnQ2eL9JJfn=CeJ06WujqgLdLVeXS7ojf7GvvmkB-KYoA@mail.gmail.com>
From: Corinna Vinschen via Cygwin <cygwin AT cygwin DOT com>
Reply-To: cygwin AT cygwin DOT com
Cc: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>

On Dec 24 01:47, Roland Mainz via Cygwin wrote:
> On Thu, Dec 21, 2023 at 9:32 PM Kaz Kylheku via Cygwin
> <cygwin AT cygwin DOT com> wrote:
> > On 2023-12-21 04:16, Martin Wege via Cygwin wrote:
> > > On Wed, Dec 20, 2023 at 6:21 PM Kaz Kylheku via Cygwin
> > > <cygwin AT cygwin DOT com> wrote:
> [snip]
> > > The root cause is IMO the extra Win32 syscalls (>= 3 per file lookup,
> > > compared to 1 on Linux) to lookup the *.lnk and *.exe.lnk files on
> > > filesystems which have native link support (NTFS, ReFS, SMBFS, NFS).
> > > On SMBFS and NFS it hurts the most, because access latency is the
> > > highest for networked filesystems.
> >
> > Could some intelligent caching be added there? (Discussion of
> > associated invalidation problem in 3... 2.... 1... )
> 
> See below, basically a short-lived cache which is only valid for the
> lifetime of the one POSIX function call would be OK...
> 
> > Can you discuss more details, so people don't have to dive into code
> > to understand it? If we are accessing some file "foo", the application
> > or user may actually be referring to a "foo.lnk" link. But in the
> > happy case that "foo" exists, why would we bother looking for "foo.lnk"?
> >
> > If "foo" does not exist, but "foo.lnk" does, that could probably be
> > cached, so that next time "foo" is accessed, we go straight for "foo.lnk",
> > and keep using that while it exists.
> >
> > If someone has both "foo" and "foo.lnk" in the same directory,
> > that's a bit of a degenerate case; how important is it to be "correct",
> > anyway.
> 
> Question, mainly for Corinna:
> Could the code be modified to use one |NtQueryDirectoryFile()| call
> with a SINGLE pattern testing for { "foo", "foo.lnk", "foo.lnk.exe",
> ... } (instead of calling the kernel for each suffix independently)
> and cache that information for the lifetime of the matching POSIX
> function call ?

Yes and no.  This could certainly be made to work, but it has a couple
of caveats which are not trivial, and there's *no* guarantee that
you will be able to get faster code by doing that.  At all.

First of all, in contrast to calling NtOpenFile on the file,
NtQueryDirectoryFile always needs two calls, because you have to open
the directory first. If you then found the file, you have to open the
file to fetch information.  So you always have one more call than by
opening the file immediately and having immediate success.  It's more or
less equivalent if the file is a *.exe file, and it's one less hit if
it's a *.lnk file.
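
As a rough illustration of why the cost depends on which suffix finally
matches, here is a minimal Python sketch of per-suffix probing.  It only
models the idea: the helper name probe(), the suffix order ("", ".exe",
".lnk") and the use of a plain set as the "filesystem" are assumptions
made for this example, not Cygwin's actual code.  Each loop iteration
stands in for one attempt to open the file by name.

```python
def probe(existing, name, suffixes=("", ".exe", ".lnk")):
    """Try name + suffix in order; return (matched_name, calls).

    `existing` is a set of file names standing in for a directory.
    Every candidate tried counts as one simulated open call.
    The suffix order is an assumption made for illustration.
    """
    calls = 0
    for suffix in suffixes:
        calls += 1
        candidate = name + suffix
        if candidate in existing:
            return candidate, calls
    # Nothing matched: every suffix cost one call.
    return None, calls


if __name__ == "__main__":
    for directory in ({"foo"}, {"foo.exe"}, {"foo.lnk"}, set()):
        print(sorted(directory), probe(directory, "foo"))
```

Under these assumptions a plain "foo" is found on the first call, while
each later suffix costs one additional failed attempt first.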

Which pattern would you like to use? Let's assume we carefully try to
get rid of .exe.lnk, we still have to check for "foo", "foo.exe" and
"foo.lnk".  Even if we get rid of .lnk, we have two patterns which
can *not* be expressed in a single call to NtQueryDirectoryFile.
We only have Windows' most simple globbing, i.e., we have '*' and '?'.
The only pattern matching "foo" and "foo.exe" is "foo*".  "foo.*"
does not hit on "foo". So "foo*".  As you know, the NtQueryDirectoryFile
call can return a buffer with multiple hits.  But the buffer has a 
finite size, so if somebody is looking for the file "a", we'd have to
look for "a*", which may have more hits than fit into the buffer.
So the code has to be prepared not only to scan a 64K buffer for
(potentially) hundreds of entries, but also to repeat the call to
NtQueryDirectoryFile to load more matching file entries.
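
The over-matching and the repeated calls can be sketched in a few lines
of Python.  This is only a model under stated assumptions: Python's
fnmatchcase() stands in for the '*' and '?' matching (it is case
sensitive and also understands '[...]', which the real matching does
not), query_directory() and lookup() are hypothetical helpers, and
batch_size plays the role of the finite result buffer.

```python
from fnmatch import fnmatchcase


def query_directory(entries, pattern, batch_size, start=0):
    """Hypothetical stand-in for one directory query.

    Returns up to `batch_size` names matching `pattern`, plus the index
    to resume from (None once the whole directory has been scanned).
    """
    batch = []
    i = start
    while i < len(entries) and len(batch) < batch_size:
        if fnmatchcase(entries[i], pattern):
            batch.append(entries[i])
        i += 1
    resume = i if i < len(entries) else None
    return batch, resume


def lookup(entries, name, suffixes=("", ".exe", ".lnk"), batch_size=4):
    """Find name + suffix candidates using the single pattern name + '*'.

    Returns (found_names, number_of_simulated_queries).  The suffix
    list is an assumption made for illustration.
    """
    wanted = {name + s for s in suffixes}
    found, calls, resume = set(), 0, 0
    while resume is not None:
        batch, resume = query_directory(entries, name + "*", batch_size, resume)
        calls += 1
        # "a*" also returns "ab", "abc", ...; keep only real candidates.
        found.update(n for n in batch if n in wanted)
    return found, calls


if __name__ == "__main__":
    print(fnmatchcase("foo", "foo*"), fnmatchcase("foo", "foo.*"))
    listing = ["a", "a.exe", "ab", "abc", "abcd", "abcde", "b", "a.lnk"]
    print(lookup(listing, "a"))
```

In this model the pattern "a*" drags in every name starting with "a",
so the caller has to filter the results and keep querying until the
directory is exhausted, even though only three names were of interest.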

Next problem, NFS.  The current call just opening the file checks with
the necessary flags to access symlinks.  Without these flags, NFS
symlinks are invisible or not handled as symlinks.  So, right now, we
have a single call on NFS to open a file, if it exists without suffix.
If you use NtQueryDirectoryFile, you have another subtle problem.  If it
happens to be an NFS dir, you have to use another FILE_INFORMATION_CLASS,
otherwise symlinks don't show up at all.  This information class isn't
even sufficient for the most basic of information we need in the
symlink_info::check method. So you need to open the file here, too,
and extract the information.

There's probably more to it, but that's just what came to mind for
a start.

> The idea is to reduce the number of userland<--->kernel roundstrips
> from <n> to <1>, and filesystem drivers could be optimized even
> further (for example if the network filesystem protocol supports file
> name globbing...)

I have a hard time seeing that you can really avoid a lot of calls.
You may find that you won't save a lot of them, and another lot
of them don't matter because the OS already cached information.

Also, as exciting as it might be to do extensive caching (and, as I
wrote in a former reply today, we do some caching), keep in mind that we
are only a user-space DLL.  The only caching of file information you
can rely upon is that of the kernel.


Corinna

