Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Message-Id: <200507212301.j6LN11xv008921@tigris.pounder.sol.net>
To: cygwin AT cygwin DOT com
From: cygwin AT trodman DOT com (Tom Rodman)
Subject: messy death of ssh bash sesn, bash $$ and ppid in still /proc,but not in ps
Date: Thu, 21 Jul 2005 18:00:59 -0500
X-IsSubscribed: yes

Greetings:

Not expecting help; I want to share a problem I have seen repeatedly,
every week or so, on one host (Windows 2000 Server with the latest
service packs and fixes) running Cygwin 1.5.17.

I have not seen a pattern or a cause, but I seem to recall that the
shell that "goes south" often (always?) has several suspended jobs,
usually a mix of "vim" and "less". (Notice the child processes of
pid 6084 shown below.)

The failure happens while I am running fairly routine sysadmin commands
in a bash session on the remote host over ssh. A command succeeds and
the bash prompt returns; I then type a simple command ("ls" in the
example below), the cursor moves down to the first column of the next
line, and *nothing* further happens.

If I look for the bash process that was running the interactive shell
with Cygwin's 'ps', it is not there (pid 6084 in the example below).
Strangely, though, /proc still has both the bash session and its parent:
"cat /proc/6084/ppid" shows 1052, which is the parent sshd process
("/usr/sbin/sshd -D -R"; see the example below). Cygwin's ps shows
neither process 1052 nor process 6084. Both the sshd and its child bash
session have mysteriously vanished, but the child processes of the bash
session remain.

(I'm not sure how much I should trust
"procps -H -Ao pid,ppid,%cpu,user,bsdstart,args", but that's a side
issue. Search ahead for "defunct".)
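
The mismatch described above (a pid present under /proc but absent from
ps) can be checked mechanically. What follows is only a minimal sketch,
not something taken from the original session: the function name
orphaned_proc_pids is hypothetical, and it assumes that the list of pids
reported by ps arrives on standard input, one per line.

```shell
#!/bin/bash
# Hypothetical helper (name and interface are assumptions, not part of
# Cygwin): print every pid that exists as a numeric directory under a
# proc root but is missing from the pid list read on standard input.
orphaned_proc_pids() {
    local proc_root=${1:-/proc}
    local ps_pids entry pid

    # pids reported by ps, one per line; strip blanks and carriage returns
    ps_pids=$(tr -d ' \t\r')

    for entry in "$proc_root"/[0-9]*; do
        [ -d "$entry" ] || continue          # skip files and unmatched globs
        pid=${entry##*/}
        case "$pid" in *[!0-9]*) continue ;; esac   # purely numeric names only
        if ! printf '%s\n' "$ps_pids" | grep -qx "$pid"; then
            printf '%s\n' "$pid"
        fi
    done
}

# Possible usage (assumes the first all-numeric field of each ps line is the pid):
#   ps -e | awk 'NR>1 { for (i=1; i<=NF; i++) if ($i ~ /^[0-9]+$/) { print $i; break } }' \
#       | orphaned_proc_pids /proc
```

In the situation reported here one would expect such a check to list
6084 and 1052, but that is an expectation and has not been run on the
affected host.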

--
regards,
Tom

--v-v------------------C-U-T---H-E-R-E-------------------------v-v--
# --------------------------------------------------------------------
# this bash session shows my checks *after* bash session w/pid
# 6084 died. 6084's ppid was 1052
# --------------------------------------------------------------------

[16:36:03 Thu Jul 21 ~ ourhost scmcron -bash-2.05b]
$ tty
/dev/tty1
[16:36:05 Thu Jul 21 ~ ourhost scmcron -bash-2.05b]
$ which ps_
ps_ is aliased to `procps -H -Ao pid,ppid,%cpu,user,bsdstart,args'
[16:36:14 Thu Jul 21 ~ ourhost scmcron -bash-2.05b]
$ ps_
 PID  PPID %CPU USER     START COMMAND
3040  6084  0.0 scmcron  16:03 vim rc_startup
5240  6084  0.0 scmcron  15:33 less -I -j4 -x2 -S /var/log/rc_startup.log
4148  6084  0.0 scmcron  15:30 vim basename.shinc
3196  6084  0.0 scmcron  14:51 vim _logrotate
1624     1  0.0 SYSTEM   00:38 /usr/bin/cygrunsrv
1696  1624  0.0 SYSTEM   00:38 /usr/sbin/sshd -D
1052  1696  0.0 SYSTEM   09:43 /usr/sbin/sshd -D -R
6188  1696  0.0 SYSTEM   16:14 /usr/sbin/sshd -D -R
2768  6188  0.0 scmcron  16:14 -bash
3396  2768  0.0 scmcron  16:32 procps -H -Ao pid,ppid,%cpu,user,bsdstart,args
1028     1  0.0 SYSTEM   00:38 /usr/bin/cygrunsrv
1116  1028  0.0 SYSTEM   00:38 /usr/sbin/cron -D
[16:36:17 Thu Jul 21 ~ ourhost scmcron -bash-2.05b]
$ cat /proc/6084/ppid
1052
[16:36:43 Thu Jul 21 ~ ourhost scmcron -bash-2.05b]
$ which ps
ps is aliased to `ps -elW '
ps is /usr/bin/ps
[16:37:07 Thu Jul 21 ~ ourhost scmcron -bash-2.05b]
$ ps|egrep '\<1052|6084\>'
[16:37:34 Thu Jul 21 ~ ourhost scmcron -bash-2.05b]
$

--v-v------------------C-U-T---H-E-R-E-------------------------v-v--
# --------------------------------------------------------------------
# Here are the last two commands typed in bash session w/pid 6084
# The "ls" command never completed.
# Sorry for the prompt, PS1 below is "\t \d \jj \l 4880 \w\n> \h \u >"
# This session has tty of tty0, and 4 suspended jobs.
# --------------------------------------------------------------------

> 16:15:07 Thu Jul 21 4j tty0 6084 /drv/c/adm/config/rc
> ourhost scmcron > regtool -s set /HKEY_LOCAL_MACHINE/SYSTEM/CurrentControlSet/Services/rc_startup/Parameters/Environment/PATH_ADM 'c:'
> 16:16:01 Thu Jul 21 4j tty0 6084 /drv/c/adm/config/rc
> ourhost scmcron > ls

--v-v------------------C-U-T---H-E-R-E-------------------------v-v--
# --------------------------------------------------------------------
# change in "procps -H -Ao pid,ppid,%cpu,user,bsdstart,args" output
# cause again unknown
# (about 1 hour of time passed, and I looked at processes via a TS session and task manager)
# I waited ~10 min, and re-ran procps and the output returned to normal; ie
# all the defunct pids were replaced again w/the earlier integer values!
# --------------------------------------------------------------------

[17:30:49 Thu Jul 21 /proc/registry/HKEY_LOCAL_MACHINE/SYSTEM/CurrentControlSet/Services/rc_startup/Parameters/Environment c7mkes108 scmcron -bash-2.05b]
$ ps_
 PID  PPID %CPU USER     START COMMAND
3040  6084  0.0 scmcron  16:03
5240  6084  0.0 scmcron  15:33
4148  6084  0.0 scmcron  15:30
3196  6084  0.0 scmcron  14:51
1624     1  0.0 SYSTEM   00:38
1696  1624  0.0 SYSTEM   00:38
1052  1696  0.0 SYSTEM   09:43
6188  1696  0.0 SYSTEM   16:14
2768  6188  0.0 scmcron  16:14
2936  2768  0.7 scmcron  17:27 procps -H -Ao pid,ppid,%cpu,user,bsdstart,args
1028     1  0.0 SYSTEM   00:38
1116  1028  0.0 SYSTEM   00:38
[17:31:01 Thu Jul 21 /proc/registry/HKEY_LOCAL_MACHINE/SYSTEM/CurrentControlSet/Services/rc_startup/Parameters/Environment c7mkes108 scmcron -bash-2.05b]
$ cat /proc/6084/ppid
1052
[17:40:56 Thu Jul 21 /proc/registry/HKEY_LOCAL_MACHINE/SYSTEM/CurrentControlSet/Services/rc_startup/Parameters/Environment c7mkes108 scmcron -bash-2.05b]
$ ps_
 PID  PPID %CPU USER     START COMMAND
3040  6084  0.0 scmcron  16:03 vim rc_startup
5240  6084  0.0 scmcron  15:33 less -I -j4 -x2 -S /var/log/rc_startup.log
4148  6084  0.0 scmcron  15:30 vim basename.shinc
3196  6084  0.0 scmcron  14:51 vim _logrotate
1624     1  0.0 SYSTEM   00:38 /usr/bin/cygrunsrv
1696  1624  0.0 SYSTEM   00:38 /usr/sbin/sshd -D
6188  1696  0.0 SYSTEM   16:14 /usr/sbin/sshd -D -R
2768  6188  0.0 scmcron  16:14 -bash
5388  2768  0.0 scmcron  17:44 procps -H -Ao pid,ppid,%cpu,user,bsdstart,args
1028     1  0.0 SYSTEM   00:38 /usr/bin/cygrunsrv
1116  1028  0.0 SYSTEM   00:38 /usr/sbin/cron -D

--v-v------------------C-U-T---H-E-R-E-------------------------v-v--
# --------------------------------------------------------------------
# I finally killed the ssh client that had originated from a linux box..
# --------------------------------------------------------------------

> 16:15:07 Thu Jul 21 4j tty0 6084 /drv/c/adm/config/rc
> c7mkes108 scmcron > regtool -s set /HKEY_LOCAL_MACHINE/SYSTEM/CurrentControlSet/Services/rc_startup/Parameters/Environment/PATH_ADM 'c:'
> 16:16:01 Thu Jul 21 4j tty0 6084 /drv/c/adm/config/rc
> c7mkes108 scmcron > ls
Killed by signal 15.
[17:41:43 Thu Jul 21 0j 19 17149 ~]
[cmke6-75 rodmant]$ #now we're back on the linux host

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/