From: "Matt D." <codespunk@gmail.com>
Date: Tue, 11 Oct 2022 03:53:01 -0400
Message-ID: <CAC+X2=JFRkvyOZnO1FpqCPE+FnuA1Oxf-tauafJgUkz+o9mrYA@mail.gmail.com>
Subject: Cygwin triggers integrity scrubbing on ReFS filesystems, making
 searching files impossible on large datasets
To: cygwin@cygwin.com

I formatted a drive today, with ReFS on a Storage Pool mirror with
integrity streams enabled, before copying data over from a backup. The
data included several million files, which I search often with tools
like find and grep. After the copy was finished, I tried doing a
simple find:

time find . -iname file.png

I noticed that the search was taking much longer than expected, and I
gave up after waiting for over 20 minutes. I confirmed that the same
search over the same data on an external USB3 drive formatted as NTFS
completes in 1 to 1.5 minutes.
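
For anyone who wants to repeat the comparison, here is a rough sketch
that runs the same name-only search over several trees and prints the
elapsed seconds for each. The helper name time_find and the example
mount points are my own placeholders, not anything from Cygwin or
findutils; substitute the paths where your ReFS and NTFS volumes are
mounted.

```shell
#!/bin/sh
# Sketch: run the same name-only search on each directory tree given on
# the command line and report the wall-clock seconds taken.
# time_find is a hypothetical helper, not part of Cygwin or findutils.

time_find() (
    root=$1
    pattern=$2
    start=$(date +%s)
    # Discard matches and errors; only the elapsed time matters here.
    find "$root" -iname "$pattern" > /dev/null 2>&1
    end=$(date +%s)
    echo "$root: $((end - start))s"
)

# Usage (paths are assumptions): sh time_find.sh /cygdrive/d /cygdrive/e
for root in "$@"; do
    time_find "$root" file.png
done
```

Whole-second resolution is coarse, but it is enough to tell a search
that takes a minute from one that takes twenty.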

To verify that this is in fact an incompatibility with ReFS's
integrity streams, I formatted the same pool with this feature
disabled and copied the files back over. Without integrity streams,
the find operation took about 30 seconds. I confirmed this further by
formatting the pool as NTFS, with a similar result. I then formatted
the pool one last time with ReFS again with integrity streams enabled,
and the problem returned.

Although the behavior looks like a hang, find is not actually frozen.
It continues to respond to Ctrl-C and, if a more permissive pattern is
used, output can be seen during the search; it is just extremely slow.
I believe the issue has to do with how Cygwin or find accesses these
files as it searches, triggering the integrity scrubber on each visit.
Using Windows search on the same disk does not have this problem.

I haven't tried any performance comparison with grep, but I would
expect the experience to be similarly poor or worse. It's interesting
that the scrubber is triggered in this example with find, as I'm only
examining the names of files, and not trying to read their
contents.
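
One way to narrow this down might be to compare a walk that asks only
for entry names with one that forces a metadata lookup on every entry.
The sketch below is untested on ReFS and assumes GNU find (for
-printf); count_names and count_with_stat are hypothetical helper
names. If both runs are equally slow, per-file metadata access is
probably not the only trigger.

```shell
#!/bin/sh
# Sketch: compare a walk that only needs entry names with one that forces
# a metadata lookup on every entry. Requires GNU find for -printf.
# count_names and count_with_stat are hypothetical helper names.

# Prints one dot per entry, then counts the dots. No file attribute is
# requested, though find may still stat entries to decide where to descend.
count_names() (
    find "$1" -printf '.' 2>/dev/null | wc -c
)

# %s asks for each entry's size, which forces find to stat every entry.
count_with_stat() (
    find "$1" -printf '%s\n' 2>/dev/null | wc -l
)

# Only run the comparison when a directory is given on the command line.
if [ $# -gt 0 ]; then
    for fn in count_names count_with_stat; do
        start=$(date +%s)
        n=$("$fn" "$1")
        end=$(date +%s)
        echo "$fn: $n entries in $((end - start))s"
    done
fi
```

Both counts include the starting directory itself, so they should
match; only the timings are expected to differ.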

See here for more information on ReFS integrity streams:

https://learn.microsoft.com/en-us/windows-server/storage/refs/integrity-streams

To format a disk with this feature, PowerShell must be used, as it's
not enabled by default or accessible from the GUI:

Format-Volume -DriveLetter D -FileSystem REFS -SetIntegrityStreams $true

The hardware I used was two Crucial MX500 2TB SSDs, recently trimmed,
in a RAID1 mirror configuration in Storage Spaces on Windows 10
Professional for Workstations. My system was just formatted and fully
updated. Cygwin was also a fresh download and fully updated. The
system is otherwise very fast, with a Ryzen 1800X and 64GB of memory.

At this point, I am unable to use Cygwin on any disk formatted as ReFS
with integrity streams enabled for any workload that involves I/O on a
large number of files.

