Getting random SIGHUPs - A-Shell Forum


Getting random SIGHUPs #11498 27 Nov 18 01:38 AM
Joined: Jan 2003 Posts: 133 Madics Systems Ltd D Dominic - Madics Systems Ltd OP Member
This is probably not an A-Shell issue, but I'm posting just in case anyone has seen the same thing before. This is happening on two remote sites of a customer but not on the local site. It started happening two days ago. Normal network diagnostics are not getting anywhere (they show no problems). I was wondering if there could be a false positive for the SIGHUP: a SIGHUP that is not a SIGHUP, or similar.

27-Nov-18 14:13:55 [p9175066-j0]<> jcbrebuild #0
27-Nov-18 14:16:03 [p9633834-j0]<> jcbrebuild #0
27-Nov-18 14:19:06 [tsk:10551428-j0]<> jcbrebuild #0
27-Nov-18 14:21:58 [tsk:7602254-j0]<> jcbrebuild #0
27-Nov-18 14:23:58 [tsk:7405796-j0]<> jcbrebuild #0
27-Nov-18 14:26:30 [p7798786-j3] SIGHUP trapped on: TSKAAC (yjane)
27-Nov-18 14:26:30 [p7798786-j3] (Waiting for kbd wait to generate e
27-Nov-18 14:26:30 [p7798786-j3] (Now in kbd wait; setting basic erro
27-Nov-18 14:26:31 [p7798786-j3] jcbrebuild #3
27-Nov-18 14:26:31 [p7798786-j3]
27-Nov-18 14:26:31 [TSKAAC,3,MADICX,y Was: 20P/21L, Is: 20P/21L (free passes: 0 tty, 1 ip)

Re: Getting random SIGHUPs #11499 27 Nov 18 03:43 AM
Joined: Sep 2002 Posts: 5,486 USA F Frank Member
Maybe Steve is creating some gremlins.

Re: Getting random SIGHUPs #11500 27 Nov 18 04:07 AM
Joined: Jun 2001 Posts: 11,925 Woodland Hills, CA J Jack McGregor Member
Hmmm... I'm not sure what to make of this, but here are a few thoughts...

1. I've never heard of a false positive SIGHUP coming from some OS-level glitch. But it is certainly possible that an A-Shell or OS-level process is explicitly sending SIGHUP signals via the UNIX kill command, so I guess a "rogue agent" is impossible to rule out. (Note that at one point in the distant past, KILL.LIT was even guilty of sending SIGHUP to terminate jobs; that was eventually changed, though, to sending SIGKILL instead. Also note that when A-Shell sends a signal to another job via MX_KILL, a message "MX_KILL signal ## sent to pid #####" will be written to the ashlog file.)

2. I have seen cases of cron scripts that crank up every so many minutes, ostensibly looking for CPU hogs or zombies, and end up sending out a barrage of SIGHUP or SIGKILL signals. The telltale sign of that is usually a bunch of SIGHUP messages in the log at the same time, repeating at a fixed interval.

3. With that in mind, it might be useful to see a somewhat larger excerpt of the log, both to check for such clusters and to get a sense of how well the jobs receiving the SIGHUPs are shutting themselves down. (I know that's not the issue here, but it often is, so it's a good opportunity to review.) Typically what I'll do is locate the SIGHUP message, then search from there for the pid (in the bracketed part of each trace prefix) to see if the job goes through the entire sequence, ending up with "After qpurge & qclose", and how long that takes.

4. It's hard to tell from the excerpt if the SIGHUP is occurring after the job had been running for a while, or if it was during the startup. (The "Was: 20P/21L, Is: 20P/21L" in the final trace suggests that there was no change in the physical/logical job count as a result of the SIGHUP, which might mean that the job never got into the job table to start with, or that the "jcb rebuild" (more like a "rescan") occurred before the job had exited.)

5. What version/platform is this?

6. I once wrote a utility to convert an ashlog.log to a kind of spreadsheet of sessions to allow for a kind of overview, including information on how many sessions terminate with errors or signals. But it's kind of a work in progress due to the constantly evolving information in the ashlog. (This is one of the ideas under the category of "system health reports" that we touched on at the Conference but didn't really resolve.) Usually the issue comes up in cases like this, where you are interested in a specific statistic: the number or frequency of SIGHUPs in this case. And that should be fairly easy to get by scanning your ashlog for indicators of session start, finish, and termination, such as:

Normal (but rather short) session...

Code
27-Nov-18 05:20:23 [p21317-23]<:(nil)> In: Nodes=11/31/55 [P], ip=192.168.20.205 d8:9e:f3:6:9a:43, (dave)
...
27-Nov-18 05:20:36 [p21317-23]<HOST:0x3da> Out: Nodes Remaining = 10P/30L, 15 reads, 1 writes, 140 kbd byte

SIGHUP with normal recovery/termination...

Code
27-Nov-18 04:55:38 [tsk:20349-18]<BXINA2:0x436a> SIGHUP trapped on: TSKAAR (steak)
...
27-Nov-18 04:55:43 [tsk:20349-18]<MASTMU:0x47c6> After qpurge & qclose

But if you are sufficiently motivated to want to tinker with a spreadsheet treatment of the ashlog in order to gather statistics, look for anomalies, etc., that might inspire me to dig out the routine to let you play with it. (Full disclosure/warning: it's the kind of thing that can easily suck up many hours of gathering, analyzing, refining, etc., which might be interesting but isn't necessarily that productive.)
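For anyone who wants to try the kind of scan described in this post without a spreadsheet, here is a rough Python sketch. It is not the utility mentioned above and not anything that ships with A-Shell. The trace prefix pattern is an assumption inferred only from the excerpts quoted in this thread, and the names `parse_line`, `sighup_report`, `clusters`, and `SAMPLE` are made up for illustration. It finds each "SIGHUP trapped on:" line, follows the same pid to "After qpurge & qclose" to time the shutdown, and counts SIGHUPs per minute to expose cron-style clusters.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed trace prefix, inferred from the excerpts in this thread, e.g.
#   27-Nov-18 04:55:38 [tsk:20349-18]<BXINA2:0x436a> message...
# Lines that do not fit this shape are simply skipped.
LINE_RE = re.compile(
    r"^(\d{2}-[A-Za-z]{3}-\d{2} \d{2}:\d{2}:\d{2}) "  # timestamp
    r"\[(?:p|tsk:)(\d+)-j?\d+\]"                      # [p<pid>-<job>] or [tsk:<pid>-<job>]
    r"(?:<[^>]*>)?\s*(.*)$"                           # optional <prog:loc>, then the message
)

def parse_line(line):
    """Return (timestamp, pid, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    stamp = datetime.strptime(m.group(1), "%d-%b-%y %H:%M:%S")
    return stamp, int(m.group(2)), m.group(3)

def sighup_report(lines):
    """List SIGHUP events with seconds until 'After qpurge & qclose' (None if never seen)."""
    events = []
    open_by_pid = {}  # pid -> SIGHUP event still waiting for its cleanup trace
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, pid, msg = parsed
        if msg.startswith("SIGHUP trapped on:"):
            event = {"pid": pid, "at": stamp, "cleanup_secs": None}
            events.append(event)
            open_by_pid[pid] = event
        elif msg.startswith("After qpurge & qclose") and pid in open_by_pid:
            event = open_by_pid.pop(pid)
            event["cleanup_secs"] = (stamp - event["at"]).total_seconds()
    return events

def clusters(events, min_size=2):
    """Count SIGHUPs per minute; keep only minutes with at least min_size of them."""
    per_minute = Counter(e["at"].replace(second=0) for e in events)
    return {minute: n for minute, n in per_minute.items() if n >= min_size}

# Sample lines copied from this thread.
SAMPLE = [
    "27-Nov-18 04:55:38 [tsk:20349-18]<BXINA2:0x436a> SIGHUP trapped on: TSKAAR (steak)",
    "27-Nov-18 04:55:43 [tsk:20349-18]<MASTMU:0x47c6> After qpurge & qclose",
    "27-Nov-18 14:26:30 [p7798786-j3] SIGHUP trapped on: TSKAAC (yjane)",
    "27-Nov-18 14:26:31 [p7798786-j3] jcbrebuild #3",
]

if __name__ == "__main__":
    for event in sighup_report(SAMPLE):
        print(event["at"], event["pid"], event["cleanup_secs"])
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `sighup_report(open("ashlog.log", errors="replace"))`. Events whose `cleanup_secs` stays `None` are the ones worth a closer look, and anything returned by `clusters` that repeats at a fixed interval points toward a scheduled script rather than the network.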

Re: Getting random SIGHUPs #11501 02 Dec 18 10:46 PM
Joined: Sep 2003 Posts: 4,178 Cambridge, England. Steve - Caliq Member
After Dominic had been running around in circles for a few days, the customer's IT person decided to upgrade the firmware on their router, and this then magically fixed it. They said "upgrade"; we do wonder if it was really an unspoken "downgrade".

Re: Getting random SIGHUPs [Re: Dominic - Madics Systems Ltd] #34180 28 Apr 21 08:23 AM
Joined: Nov 2006 Posts: 2,262 Northwest Arkansas S Stephen Funkhouser Member
Curious if the utility you mention here has gotten any unreported attention. Having users report disconnects, and then having to manually parse the ashlog to determine which disconnects should be treated as ordinary termination vs. abnormal, is quite difficult. Not to mention time-consuming.

Stephen Funkhouser
Diversified Data Solutions
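As a stopgap for the ordinary-vs-abnormal question, a small Python sketch along these lines may help. It relies only on the three markers quoted earlier in this thread ("In:", "Out:", and "SIGHUP trapped on:"), the prefix pattern is an assumption inferred from those excerpts, and `classify_sessions` is a made-up name. It keys sessions by pid, so a reused pid overwrites the earlier session, and whether a SIGHUP'd job also logs an "Out:" line is unknown, so a SIGHUP always wins.

```python
import re
from collections import Counter

# Assumed trace prefix, inferred from the excerpts in this thread:
#   27-Nov-18 05:20:23 [p21317-23]<:(nil)> In: Nodes=...
PREFIX_RE = re.compile(
    r"^\d{2}-[A-Za-z]{3}-\d{2} \d{2}:\d{2}:\d{2} "
    r"\[(?:p|tsk:)(\d+)-j?\d+\](?:<[^>]*>)?\s*(.*)$"
)

def classify_sessions(lines):
    """Map pid -> 'open', 'normal', or 'sighup' based on the markers seen."""
    outcome = {}
    for line in lines:
        m = PREFIX_RE.match(line.strip())
        if not m:
            continue  # not a recognizable trace line
        pid, msg = int(m.group(1)), m.group(2)
        if msg.startswith("In:"):
            outcome[pid] = "open"      # session started, no end seen yet
        elif msg.startswith("SIGHUP trapped on:"):
            outcome[pid] = "sighup"    # ended (or is ending) on a signal
        elif msg.startswith("Out:") and outcome.get(pid) != "sighup":
            outcome[pid] = "normal"    # ordinary logout
    return outcome

# Sample lines copied from this thread.
SAMPLE = [
    "27-Nov-18 05:20:23 [p21317-23]<:(nil)> In: Nodes=11/31/55 [P], ip=192.168.20.205 d8:9e:f3:6:9a:43, (dave)",
    "27-Nov-18 05:20:36 [p21317-23]<HOST:0x3da> Out: Nodes Remaining = 10P/30L, 15 reads, 1 writes, 140 kbd byte",
    "27-Nov-18 04:55:38 [tsk:20349-18]<BXINA2:0x436a> SIGHUP trapped on: TSKAAR (steak)",
]

if __name__ == "__main__":
    print(Counter(classify_sessions(SAMPLE).values()))
```

Sessions left as "open" at the end of the log are either still running or ended without any recognized marker, so they still need a manual look.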

Re: Getting random SIGHUPs [Re: Dominic - Madics Systems Ltd] #34181 28 Apr 21 10:21 AM
Joined: Jun 2001 Posts: 11,925 Woodland Hills, CA J Jack McGregor Member
No unreported attention so far, but I'll take this as an indication of interest and will put it on the to-do list to see whether it's practical to turn it into something usable outside of the lab.

Re: Getting random SIGHUPs [Re: Dominic - Madics Systems Ltd] #34182 28 Apr 21 11:04 AM
Joined: Nov 2006 Posts: 2,262 Northwest Arkansas S Stephen Funkhouser Member
Thanks

Stephen Funkhouser
Diversified Data Solutions

Moderated by Jack McGregor, Ty Griffin
