Mail Archives: cygwin/2022/10/11/03:54:15

From: "Matt D." <codespunk AT gmail DOT com>
Date: Tue, 11 Oct 2022 03:53:01 -0400
Message-ID: <CAC+X2=JFRkvyOZnO1FpqCPE+FnuA1Oxf-tauafJgUkz+o9mrYA@mail.gmail.com>
Subject: Cygwin triggers integrity scrubbing on ReFS filesystems, making
searching files impossible on large datasets
To: cygwin AT cygwin DOT com
List-Id: General Cygwin discussions and problem reports <cygwin.cygwin.com>

Today I formatted a drive as ReFS on a Storage Pool mirror with
integrity streams enabled, then copied data over from a backup. The
data includes several million files, which I search often with tools
like find and grep. After the copy finished, I tried a simple find:

time find . -iname file.png

The search took much longer than expected, and I gave up after waiting
more than 20 minutes. I confirmed that the same search over the same
data on an external USB3 drive formatted as NTFS completes in 1 to 1.5
minutes.
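
For anyone who wants to repeat this comparison, here is a minimal
sketch that runs the same name search against several roots and prints
the elapsed wall-clock time for each. The function name time_search and
the roots given on the command line are hypothetical placeholders, not
part of the setup described here.

```shell
#!/bin/bash
# Sketch only: time the same case-insensitive name search on several roots.
# time_search is a hypothetical helper; pass your own mount points as arguments.
time_search() {
    local root=$1 pattern=$2 start end matches
    start=$(date +%s)
    # $(( ... )) strips any padding some wc implementations add.
    matches=$(( $(find "$root" -iname "$pattern" 2>/dev/null | wc -l) ))
    end=$(date +%s)
    # Prints: root, elapsed whole seconds, number of matching paths.
    printf '%s\t%ss\t%s matches\n' "$root" "$((end - start))" "$matches"
}

# Does nothing when no roots are given.
for root in "$@"; do
    time_search "$root" 'file.png'
done
```

Run it with the ReFS volume and the NTFS volume as arguments. Results
are only comparable if both hold the same data and the caches are in a
similar state, for example right after a reboot.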

To verify that this is in fact an incompatibility with ReFS's
integrity streams, I formatted the same pool with this feature
disabled and copied the files back over. Without integrity streams,
the find operation took about 30 seconds. I confirmed this further by
formatting the pool as NTFS, with a similar result. I then formatted
the pool one last time with ReFS again with integrity streams enabled,
and the problem returned.

Although the behavior looks like a hang, the search is not actually
frozen, just very slow. It continues to respond to Ctrl-C and, if a
more permissive pattern is used, output appears during the search. I
believe the issue is in how Cygwin or find accesses these files as it
searches: each visit appears to trigger the integrity scrubber, which
makes the search unbearably slow. Windows search on the same disk does
not have this problem.

I haven't tried to do any performance comparison with grep, but I
would expect the experience to be similarly poor or worse. It's
interesting that the scrubber is triggered in this example with find,
as I'm only examining the name of files, and not trying to read their
contents.
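
One way to narrow this down might be to time a name-only listing of a
single large directory against a listing that also fetches metadata for
every entry. The sketch below assumes GNU ls as shipped with Cygwin;
the function names are hypothetical, and it does not show what Cygwin
itself does internally when it reads a directory.

```shell
#!/bin/bash
# Sketch only: compare a name-only listing with one that stats every entry.
# list_names and list_with_stat are hypothetical helper names.

# -f lists in directory order without sorting; without -l, ls only needs
# the entry names. The count includes "." and "..".
list_names() { ls -f -- "$1" | wc -l; }

# -l prints size, owner and times, so ls must stat every entry.
# The count includes ".", ".." and the leading "total" line.
list_with_stat() { ls -la -- "$1" | wc -l; }

# Only runs when a directory is given as the first argument.
if [ "$#" -gt 0 ]; then
    time list_names "$1"
    time list_with_stat "$1"
fi
```

If both passes are slow on the ReFS volume, the cost is probably already
paid while enumerating the directory. If only the second is slow, the
per-file metadata access is the more likely trigger.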

See here for more information on ReFS integrity streams:

https://learn.microsoft.com/en-us/windows-server/storage/refs/integrity-streams

To format a disk with this feature, PowerShell must be used, as it's
not enabled by default or accessible from the GUI:

Format-Volume -DriveLetter D -FileSystem REFS -SetIntegrityStreams $true

The hardware I used was two Crucial MX500 2TB SSDs, recently trimmed,
in a RAID1 mirror configuration in Storage Spaces on Windows 10
Professional for Workstations. The system was freshly installed and
fully updated. Cygwin was also a fresh download and fully updated. The
system is otherwise very fast, with a Ryzen 1800X and 64GB of memory.

At this point, I cannot use Cygwin on any disk formatted as ReFS with
integrity streams enabled for any workload that needs reasonable
performance and touches a large number of files.

