Mail Archives: cygwin/2015/07/13/15:27:30

Subject: Re: Telnet / SSH connection timeout on LAN
From: Warren Young <wyml AT etr-usa DOT com>
Date: Mon, 13 Jul 2015 13:27:15 -0600

On Jul 13, 2015, at 10:34 AM, Andrey Repin <anrdaemon AT yandex DOT ru> wrote:
> 
> In my environment, a small touch to the original file causes changes throughout
> the entirety of its stored image. (Because the storage format is actually an
> archive, a small change here and there in the source file causes massive
> shifts in the resulting image.)

Unless those files are written using either whole-archive compression or whole-archive encryption, rsync should still be able to find substantial savings in the transfer with its rolling checksums.  rsync won’t be confused by simple changes like a new byte added to the middle of a file, shifting all subsequent bytes down by one.
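
To illustrate why an inserted byte doesn't defeat a rolling checksum, here is a minimal Python sketch.  It is a simplified weak checksum in the style of rsync's, not rsync's actual code: the block size, function names, and sample data are all made up for the example, a plain byte comparison stands in for the strong checksum, and unlike real rsync it doesn't skip ahead after a match.

```python
BLOCK = 16      # toy block size; assumption for the example only
MOD = 1 << 16

def weak_sum(data):
    """Adler-style weak checksum of one block, returned as (a, b)."""
    a = sum(data) % MOD
    b = sum((len(data) - i) * x for i, x in enumerate(data)) % MOD
    return a, b

def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b

def matched_blocks(old, new):
    """Return offsets of BLOCK-sized blocks of `old` found anywhere in `new`."""
    # Index the old file's fixed blocks by weak checksum.
    table = {}
    for off in range(0, len(old) - BLOCK + 1, BLOCK):
        block = old[off:off + BLOCK]
        table.setdefault(weak_sum(block), []).append((off, block))

    found = set()
    if len(new) < BLOCK:
        return found
    a, b = weak_sum(new[:BLOCK])
    pos = 0
    while True:
        for off, block in table.get((a, b), ()):
            # Byte comparison stands in for a strong checksum here.
            if new[pos:pos + BLOCK] == block:
                found.add(off)
        if pos + BLOCK >= len(new):
            break
        # Check every byte offset, not just block boundaries.
        a, b = roll(a, b, new[pos], new[pos + BLOCK], BLOCK)
        pos += 1
    return found

old = bytes(range(256))
new = old[:100] + b"\xff" + old[100:]   # one byte inserted mid-file
print(len(matched_blocks(old, new)), "of", len(old) // BLOCK, "blocks reused")
```

Only the block that straddles the insertion point fails to match; the blocks after it are found at their shifted offsets because the window is checked at every byte position.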

Some “archive” formats do use compression, but in a piecewise fashion, so that changing one byte of one piece of the archive may cause that entire chunk to change, but it might not affect any of the others.  An example of this is the Fossil database format.
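
A rough Python sketch of that piecewise behavior follows.  This is not Fossil's format or any real archive format; the chunk size, function names, and sample data are assumptions chosen only to show that a one-byte change stays confined to one compressed piece.

```python
import zlib

CHUNK = 4096  # hypothetical piece size

def compress_piecewise(data):
    """Compress each CHUNK-sized piece independently of the others."""
    return [zlib.compress(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]

def changed_pieces(before, after):
    """Indexes of compressed pieces that differ between two versions."""
    return [i for i, (x, y) in enumerate(zip(before, after)) if x != y]

original = b"".join(b"record %06d\n" % i for i in range(4000))
edited = bytearray(original)
edited[20000] ^= 0x01  # flip one bit somewhere in the middle

print(changed_pieces(compress_piecewise(original),
                     compress_piecewise(bytes(edited))))
```

The only piece reported is the one containing byte 20000, so a delta-transfer tool still has every other piece available to match against.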

You can figure out if your archive files work this way by adding -v to your rsync command.  It reports a ratio of the on-disk data size to the transfer size as “speedup is N”, where N > 1.0 means it is not re-sending the entire file.  The output of --stats gives similar info, more verbosely. 
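
As I understand it, the speedup figure is simply the total size of the files divided by the bytes that went over the wire in both directions.  A tiny Python sketch of that arithmetic, with a hypothetical function name and made-up numbers:

```python
def speedup(total_size, bytes_sent, bytes_received):
    """Ratio of on-disk data size to bytes actually transferred."""
    return total_size / (bytes_sent + bytes_received)

# Hypothetical run: 100 MB of files, 2 MB of traffic in total.
print(speedup(100_000_000, 1_500_000, 500_000))
```

A result near 1.0 would suggest the whole file is being re-sent; a large value means the delta algorithm is finding matches.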

The point I made in the original post, however, is that all this work to save network bandwidth comes at a disk I/O and CPU cost in the case of rsync, because it has nothing that sits around watching for filesystem change events.  (Its daemon mode only serves transfers; it doesn’t watch files.)  The larger the files are with respect to the change sizes, the greater the waste.

Always-running software like Dropbox avoids much of this cost because it can watch for those events, and thus only do work when the OS tells it that a particular file has changed.
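
For contrast, here is roughly what a tool without change notification has to do on every run, sketched in Python.  The function name, file names, and size-plus-mtime check are assumptions for illustration, not how rsync or Dropbox is implemented.  An event-driven tool would instead be handed just the changed path by the OS (inotify, FSEvents, ReadDirectoryChangesW), none of which this sketch uses.

```python
import os
import tempfile

def scan(root, previous):
    """One polling pass: stat every file under root and report changes."""
    current, changed = {}, []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            current[path] = (st.st_size, st.st_mtime_ns)
            if previous.get(path) != current[path]:
                changed.append(path)
    return current, changed

with tempfile.TemporaryDirectory() as root:
    for i in range(100):
        with open(os.path.join(root, f"file{i:03d}.dat"), "wb") as f:
            f.write(b"x" * 1000)
    snapshot, _ = scan(root, {})

    with open(os.path.join(root, "file042.dat"), "ab") as f:
        f.write(b"y")  # one small change to one file

    snapshot2, changed = scan(root, snapshot)
    print(len(snapshot2), "files examined,", len(changed), "changed")
```

Every file gets examined just to discover that one of them changed, and that is before any checksumming of the changed file's contents begins.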

I have also left out another disadvantage of rsync: it’s basically a one-way operation.  If you ever need two-way (or N-way) syncing, you’re better off moving to one of the many alternatives that know how to do this correctly.  Multilateral syncing is surprisingly hard to get right.
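
One reason it's hard: a two-way syncer can't decide anything from the two current copies alone.  It also needs to remember the state as of the last successful sync.  A minimal Python sketch of that per-file decision, using a hypothetical function and made-up content hashes (None meaning the file is absent); real tools also have to deal with renames, partial transfers, clock skew, and more than two peers:

```python
def decide(base, a, b):
    """Pick an action from the last-synced state and the two current states."""
    if a == b:
        return "in sync"
    if a == base:
        return "propagate B to A"   # only B changed since the last sync
    if b == base:
        return "propagate A to B"   # only A changed since the last sync
    return "conflict"               # both sides changed, differently

print(decide("h1", "h1", "h2"))   # B edited the file
print(decide("h1", None, "h1"))   # A deleted the file
print(decide("h1", "h2", "h3"))   # both edited it
```

Without the remembered base, the first and third cases look identical: two copies that differ.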

I don’t mean to advertise for Dropbox, just to give it as an example that everyone can relate to.

An alternative that’s open source, more secure, and definitely does pay attention to the OS’s filesystem event API is SpiderOak.  You can see from their GitHub contents that they’ve got OS-specific file change notifiers:

  https://github.com/SpiderOak

Now contrast Syncthing, which has many of the same virtues, but currently doesn’t have file change notification built in, which prompted a third party to write a helper for Syncthing to fill the gap:

  https://syncthing.net/
  https://github.com/syncthing/syncthing-inotify/

These tables may be helpful:

  https://en.wikipedia.org/wiki/Comparison_of_file_synchronization_software

