Mail Archives: cygwin/2021/05/03/12:33:08
X-Recipient: | archive-cygwin AT delorie DOT com
|
X-Original-To: | cygwin AT cygwin DOT com
|
Delivered-To: | cygwin AT cygwin DOT com
|
DMARC-Filter: | OpenDMARC Filter v1.3.2 sourceware.org 66DB1385701E
|
Authentication-Results: | sourceware.org; dmarc=none (p=none dis=none)
|
| header.from=SystematicSw.ab.ca
|
Authentication-Results: | sourceware.org;
|
| spf=none smtp.mailfrom=brian DOT inglis AT systematicsw DOT ab DOT ca
|
X-Authority-Analysis: | v=2.4 cv=H864f8Ui c=1 sm=1 tr=0 ts=609025bc
|
| a=T+ovY1NZ+FAi/xYICV7Bgg==:117 a=T+ovY1NZ+FAi/xYICV7Bgg==:17
|
| a=IkcTkHD0fZMA:10 a=ejknC5xS72zp2OFXFO8A:9 a=QEXdDO2ut3YA:10
|
To: | cygwin AT cygwin DOT com
|
References: | <0a6c01d74021$d98e7790$8cab66b0$@pdinc.us>
|
From: | Brian Inglis <Brian DOT Inglis AT SystematicSw DOT ab DOT ca>
|
Organization: | Systematic Software
|
Subject: | Re: sshd slow login and/or 100% cpu
|
Message-ID: | <35028e0f-88fd-f0e6-1284-9bde09ff5c8b@SystematicSw.ab.ca>
|
Date: | Mon, 3 May 2021 10:32:59 -0600
|
User-Agent: | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
|
| Thunderbird/78.10.0
|
MIME-Version: | 1.0
|
In-Reply-To: | <0a6c01d74021$d98e7790$8cab66b0$@pdinc.us>
|
X-CMAE-Envelope: | MS4xfD52V89j9l7ncc4jkIvPmo2Ht23Tb9sEyIQJ0zsYuDCyPa7b+4KfxDR/UY/ilMZQnsrtAqcs+ZISVEdH5gZoquaPPXoUwOYY4u4SPpvrL3mKLHGXaY/v
|
| 3D4qAZpMEo/M9/ev032FbEwzjAaZ1xyQSEv1Hq46A8JCYWIXyCHlijDhz4ma815Lh46cnmEjWj6zRp+RTcSNWCEMYOhJlUgaOH0=
|
X-Spam-Status: | No, score=-10.2 required=5.0 tests=BAYES_00, GIT_PATCH_0,
|
| KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, NICE_REPLY_A,
|
| RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3,
|
| RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE,
|
| TXREP autolearn=ham autolearn_force=no version=3.4.2
|
X-Spam-Checker-Version: | SpamAssassin 3.4.2 (2018-09-13) on
|
| server2.sourceware.org
|
X-BeenThere: | cygwin AT cygwin DOT com
|
X-Mailman-Version: | 2.1.29
|
List-Id: | General Cygwin discussions and problem reports <cygwin.cygwin.com>
|
List-Unsubscribe: | <https://cygwin.com/mailman/options/cygwin>,
|
| <mailto:cygwin-request AT cygwin DOT com?subject=unsubscribe>
|
List-Archive: | <https://cygwin.com/pipermail/cygwin/>
|
List-Post: | <mailto:cygwin AT cygwin DOT com>
|
List-Help: | <mailto:cygwin-request AT cygwin DOT com?subject=help>
|
List-Subscribe: | <https://cygwin.com/mailman/listinfo/cygwin>,
|
| <mailto:cygwin-request AT cygwin DOT com?subject=subscribe>
|
Reply-To: | cygwin AT cygwin DOT com
|
Errors-To: | cygwin-bounces AT cygwin DOT com
|
Sender: | "Cygwin" <cygwin-bounces AT cygwin DOT com>
|
On 2021-05-03 07:40, Jason Pyeron wrote:
> My teammates have been observing periodic slow login problems recently, most recent Cygwin update was for the Git CVE, but I do not think that is related.
> Guidance on troubleshooting and resolution most appreciated.
> My assumptions:
> BLODA (I cannot influence that) and bad network between AD and Server.
> I am willing to do whatever I am allowed to fix this.
> Here are my observations:
> 1. The users log in via PKI
> 2. The users' accounts are AD based accounts
> 3. stopping the sshd service does not kill all the sshd.exe processes
> 4. killing all the sshd.exe processes (after service stop) and starting the service returns performance to normal
> 5. the normal last for an indeterminate time between 1 and 24 hours typical. Once it goes slow, it does not recover on its own.
> 6. /etc/nsswitch.conf only contains
> passwd: files
> group: files
> 7. we are running the cygserver, /etc/cygserver.conf is empty
> 8. resolving group information takes 97 seconds (sometimes)
> XREDACTED_00012X AT XREDACTED_00003X ~
> $ id -G XREDACTED_00047X
> XREDACTED_00023X 545 555 2 11 15 XREDACTED_00045X XREDACTED_00028X 401408
> XREDACTED_00012X AT XREDACTED_00003X ~
> $ getent group XREDACTED_00023X 545 555 2 11 15 XREDACTED_00045X XREDACTED_00028X 401408
> Domain Users:S-1-5-21-XREDACTED_00044X-513:XREDACTED_00023X:
> Users:S-1-5-32-545:545:
> Remote Desktop Users:S-1-5-32-555:555:
> NETWORK:S-1-5-2:2:
> Authenticated Users:S-1-5-11:11:
> This Organization:S-1-5-15:15:
> Service asserted identity:S-1-18-2:XREDACTED_00045X:
> XREDACTED_00016X:XREDACTED_00014X:XREDACTED_00028X:
> Medium Mandatory Level:S-1-16-8192:401408:
> Running id a second time is quick, most of the time. A few hours later this morning in the same bash shell, it was slow again.
> 9. tracing through sshd it seems to be holding at 2 system calls in uidswap.c (initgroups, getgroups)
> diff --git a/openssh-8.5p1-1.x86_64/src/openssh-8.5p1/uidswap.c b/openssh-8.5p1-1.x86_64/src/openssh-8.5p1/uidswap.c
> index 40e1215..4538e63 100644
> --- a/openssh-8.5p1-1.x86_64/src/openssh-8.5p1/uidswap.c
> +++ b/openssh-8.5p1-1.x86_64/src/openssh-8.5p1/uidswap.c
> @@ -60,6 +60,8 @@ static int saved_egroupslen = -1, user_groupslen = -1;
> void
> temporarily_use_uid(struct passwd *pw)
> {
> + debug3_f("entering");
> +
> /* Save the current euid, and egroups. */
> #ifdef SAVED_IDS_WORK_WITH_SETEUID
> saved_euid = geteuid();
> @@ -83,7 +85,9 @@ temporarily_use_uid(struct passwd *pw)
> privileged = 1;
> temporarily_use_uid_effective = 1;
>
> + debug3_f("getgroups(0, NULL)");
> saved_egroupslen = getgroups(0, NULL);
> + debug3_f("getgroups(0, NULL)=%u", saved_egroupslen);
> if (saved_egroupslen == -1)
> fatal("getgroups: %.100s", strerror(errno));
> if (saved_egroupslen > 0) {
> @@ -97,42 +101,57 @@ temporarily_use_uid(struct passwd *pw)
> }
>
> /* set and save the user's groups */
> + debug3_f("if (user_groupslen == -1 || user_groups_uid != pw->pw_uid)");
> if (user_groupslen == -1 || user_groups_uid != pw->pw_uid) {
> + debug3_f("if (initgroups(\"%s\", %u) == -1) [SLOW NEXT LINE]", pw->pw_name, pw->pw_gid);
> if (initgroups(pw->pw_name, pw->pw_gid) == -1)
> fatal("initgroups: %s: %.100s", pw->pw_name,
> strerror(errno));
>
> + debug3_f("getgroups(0, NULL) [SLOW NEXT LINE]");
> user_groupslen = getgroups(0, NULL);
> + debug3_f("getgroups(0, NULL)=%u", user_groupslen);
> 10. I have not tried to find the 100% cpu cause, yet. When at 100% it may or may not be slow to log in.
> 11. redacted cygcheck output attached. I ran cygcheck -s -v -r > cygcheck-20210503-0759.out
Given AD accounts /etc/nsswitch.conf should be using the default "files db", and
you may wish to add local and primary (and/or alltrusted and/or other trusted
domain names) to db_enum: see /etc/nsswitch.conf comments.
It is useful before or after certain service startups to prefill caches before
starting real work, e.g. running getent on common hosts right after networking
startup preloads the DNS cache.
With cygserver using AD, right after that service startup, you could getent
group and passwd to prefill their caches.
To do that you can set non-MS services to manual start and start them in
/usr/local/sbin/ scripts run under dash from system startup scheduled tasks.
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]
--
Problem reports: https://cygwin.com/problems.html
FAQ: https://cygwin.com/faq/
Documentation: https://cygwin.com/docs.html
Unsubscribe info: https://cygwin.com/ml/#unsubscribe-simple
- Raw text -