Mail Archives: cygwin/2024/04/07/19:43:31

Message-ID: <1d5aea81-c7c3-4d41-a5f5-db97e25ee9c2@SystematicSW.ab.ca>
Date: Thu, 4 Apr 2024 12:10:44 -0600
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: Cygwin&Win32 file prefetch, block sizes?
To: cygwin AT cygwin DOT com
References: <CANH4o6Mxai4_C=d1P93Prrimb8_H=trTwm-Eg+WBwpomN3tNJw AT mail DOT gmail DOT com>
<ZgwFNde2z804koS_ AT calimero DOT vinschen DOT de>
<CANH4o6P8cts9TJgpdjR4mi+sj2YvuDa=d49XLcEVvYnRB81KRw AT mail DOT gmail DOT com>
<af940815-81af-4ca5-8198-072d09dad23e AT maxrnd DOT com>
<CALXu0UdVLmyTjF6v+1O2Zm_rGu2pEWZY3Uk=skXRbwDu=+JgdA AT mail DOT gmail DOT com>
Organization: Systematic Software
In-Reply-To: <CALXu0UdVLmyTjF6v+1O2Zm_rGu2pEWZY3Uk=skXRbwDu=+JgdA@mail.gmail.com>
From: Brian Inglis via Cygwin <cygwin AT cygwin DOT com>
Reply-To: cygwin AT cygwin DOT com
Cc: Brian Inglis <Brian DOT Inglis AT SystematicSW DOT ab DOT ca>

On 2024-04-03 00:44, Cedric Blancher via Cygwin wrote:
> On Wed, 3 Apr 2024 at 03:10, Mark Geisert via Cygwin wrote:
>> On 4/2/2024 3:35 PM, Martin Wege via Cygwin wrote:
>>> On Tue, Apr 2, 2024 at 3:17 PM Corinna Vinschen via Cygwin wrote:
>>>> On Apr  2 02:04, Martin Wege via Cygwin wrote:
>>>>> Is there any document which describes how Cygwin and Win32 file
>>>>> prefetch and readahead work, and which sizes are used (e.g. always
>>>>> read one full page even if only 16 bytes are requested?)?

>>>> I'm not aware of any docs, but again, keep in mind that Cygwin is a
>>>> userspace DLL. We basically do what Windows does for low-level file
>>>> access.

>>>>> Quick /usr/bin/stat /etc/profile returns "IO Block: 65536". Does that
>>>>> mean the file's block size is really 64k? Is this info per filesystem,
>>>>> or hardcoded in Cygwin?

>>>> Hardcoded in Cygwin since 2017, based on a discussion in terms of
>>>> file access performance, especially when using stdio.h functions:
>>>>
>>>>     https://cygwin.com/cgit/newlib-cygwin/commit/?id=7bef7db5ccd9c
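The "IO Block" figure that stat(1) prints is the st_blksize field of struct stat. A minimal sketch for reading it programmatically, assuming a POSIX-style Python (for example one running under Cygwin or Linux); the helper name preferred_io_block_size is hypothetical and the scratch file only stands in for /etc/profile:

```python
import os
import tempfile


def preferred_io_block_size(path):
    """Return st_blksize, the value stat(1) labels "IO Block"."""
    # st_blksize is only exposed on POSIX-style platforms.
    return os.stat(path).st_blksize


if __name__ == "__main__":
    # A scratch file is used so the sketch runs without assuming any path.
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"x" * 16)
        tmp.flush()
        print("IO Block:", preferred_io_block_size(tmp.name))
```

The number printed depends on the platform and filesystem; the 65536 quoted above is what the poster reported seeing under Cygwin.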

>>> OUCH.
>>>
>>> While I can understand the motivation, FAT32 on multi-GB-devices
>>> having 64k block size, and Win32 API on Win95/98/ME/Win7 being

Those 32-bit systems stopped being of interest long ago, and 32-bit Windows and
Windows 7 are no longer supported.

>>> optimized to that insane block size, it is absolutely WRONG with
>>> today's NTFS and even more so with ReFS. This only works if you stream
>>> files, but as soon as you are doing random read/writes the performance
>>> is terrible due to cache thrashing. That could explain the many
>>> complaints about Cygwin's IO performance.

Most Cygwin random read/writes are likely for directories.
Any random file I/O is down to the application's needs.

>> No comment.
>>
>>> So, what can be done? I'm not a benchmarking guru, so I'd like to
>>> propose to add a tunable called EXPERIMENTAL_PREFERRED_IO_BLKSIZE to
>>> the CYGWIN env variable (marked as "experimental"), so the
>>> benchmarking guys can do performance testing without recompiling
>>> everything, get perf results for Cygwin 3.6, and decide what to do for
>>> Cygwin 3.7.
>>
>> That kind of experiment is what folks who can build their own
>> cygwin1.dll might do.  I doubt we'd want to make a run-time global disk
>> I/O strategy changer available like this, even temporarily.
> 
> Realistically that would mean that Cygwin will forever be stuck with
> an insane IO block size.
> 
> Building Cygwin.dll requires specialised knowledge and TIME, and no
> manager will waste the time of a performance engineer to produce
> custom binaries.
> Cygwin 3.6 is right now in development, so it would be better to add
> such a knob, so performance engineers can just grab those binaries and
> do benchmarking with them.

What is the benefit to the majority of users of having volunteers do that,
rather than address Cygwin issues and keep up to date with Windows releases?
If they can pay for benchmarking and performance engineers, they can pay to make
their own changes and do their own builds.
So far no one has said what issues they have and why, or that they want to
benchmark Cygwin I/O and share their results with us.

> BTW: A block size of 64k is CLEARLY harming performance. Have a look
> at https://www.zabkat.com/blog/buffered-disk-access.htm the sweet spot
> is somewhere between 16k and 32k, for SMB even below that. 64k is
> clearly on the backside of the curve, and actively harming
> performance, except for "linear reads".

That article dates from 2013, a decade ago!
I have older papers recommending 4KB and 8KB blocks and pages, and other older
papers from that same period recommending 40KB or track-sized I/O.
Remember that Cygwin does its own directory reads, so 64KB is probably about
right for reading NTFS directory entries into dirent.
Unless someone has done benchmarking to prove that some other number would be
better in future, making it smaller probably does not make any sense.
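For anyone who does want numbers, a minimal benchmarking sketch follows. It assumes a POSIX-style Python with os.pread available; the function names, the 4 MiB scratch file, and the list of block sizes are illustrative choices, not anything from Cygwin itself. It times the read size an application passes to pread(), which is related to but not the same as the st_blksize hint, and a freshly written file will normally be served from cache, so a meaningful run needs a large, cold file on the filesystem of interest.

```python
import os
import random
import tempfile
import time


def time_random_reads(path, block_size, reads=200, seed=0):
    """Time `reads` reads of `block_size` bytes at pseudo-random offsets."""
    size = os.path.getsize(path)
    rng = random.Random(seed)  # fixed seed: same offsets for every run
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(reads):
            offset = rng.randrange(0, max(1, size - block_size))
            os.pread(fd, block_size, offset)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare_block_sizes(path, block_sizes=(4096, 16384, 32768, 65536)):
    """Return {block_size: elapsed_seconds} for each candidate size."""
    return {bs: time_random_reads(path, bs) for bs in block_sizes}


if __name__ == "__main__":
    # Scratch file for demonstration only; it is far too small and too
    # warm in cache to say anything about real disk behaviour.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4 * 1024 * 1024))
    try:
        for bs, secs in compare_block_sizes(tmp.name).items():
            print(f"{bs:>6} bytes: {secs * 1e3:.2f} ms")
    finally:
        os.unlink(tmp.name)
```

Results from a run like this are only comparable when the file, the filesystem, and the cache state are the same for every block size.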

>> What could make sense is enhancing Cygwin's posix_fadvise() to support
>> POSIX_FADV_RANDOM getting mapped to Windows' FILE_RANDOM_ACCESS flag.
>> Something like this is currently done for POSIX_FADV_SEQUENTIAL ->
>> FILE_SEQUENTIAL_ONLY.  These are per-filedescriptor adjustments and due
>> to Windows limitations would apply to a whole file rather than having
>> the POSIX behavior of being settable for a byte range within a file.
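From the application side, the per-descriptor hint described above would be requested as in the following sketch. It assumes a POSIX-style Python where os.posix_fadvise is available; the helper name open_for_random_access is hypothetical. Whether the hint changes anything depends on the platform, and per the quoted text Cygwin at the time of writing only acted on the sequential hint.

```python
import os
import tempfile


def open_for_random_access(path):
    """Open `path` read-only and advise the OS that access will be random."""
    fd = os.open(path, os.O_RDONLY)
    # offset=0 and length=0 apply the advice to the whole file, which
    # matches the whole-file limitation described for Windows.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
    return fd


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
    fd = open_for_random_access(tmp.name)
    try:
        print(os.pread(fd, 4, 3))
    finally:
        os.close(fd)
        os.unlink(tmp.name)
```

The advice is a hint only: the reads behave identically whether or not the platform honours it.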
> 
> Nope. Because we are talking about a sensible default for all
> applications, and a block size of 64k is HARMFUL, except on fat32
> where the filesystem block size is already 64k for multi gigabyte
> disks.

Who uses FAT32 for large drives, except maybe on flash, and not even then if
they're smart?
Even in the small, slow old days, the equivalent of readfile(2) and mmap(2)
were better choices.
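As an illustration of the mmap(2) route for random access, here is a minimal sketch using Python's mmap module; the helper name read_at_offsets is hypothetical. The file is mapped once and slices are taken at arbitrary offsets, leaving paging to the OS instead of issuing one read call per access. Note that an empty file cannot be mapped this way.

```python
import mmap
import os
import tempfile


def read_at_offsets(path, offsets, length):
    """Map `path` read-only and return `length` bytes at each offset."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return [m[o:o + length] for o in offsets]


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)))
    try:
        for chunk in read_at_offsets(tmp.name, [0, 100, 250], 4):
            print(chunk.hex())
    finally:
        os.unlink(tmp.name)
```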

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                 -- Antoine de Saint-Exupéry

