An AIX developer has identified an anomaly in the asynchronous server connection scheme illustrated by the SOSLIB sample program TCPPAR. After some investigation, I've decided to report the details here in case someone else runs into this.

First, some background. The TCPPAR sample program illustrates asynchronous server connections, as well as the use of a spawned child (TCPCH1) to process the connections. This is a fairly powerful technique for implementing high-activity servers (where multiple clients might be connecting at once, thus making the single-threaded server model inappropriate).

The concept is as follows:

1. The server opens a listening socket, using TCPOP_ACCEPT with the TCPXFLG_LISTEN flag, so that it returns immediately after the listening socket is created (rather than waiting for a client connection).

2. The server then loops, using TCPOP_CHECK with the TCPXFLG_ASYNC flag to check if any connections are ready to accept. (In between these checks, it could do some other housekeeping, although in the sample program, it just loops back to check again.)

3. When TCPOP_CHECK comes back with STATUS=1, the server uses TCPOP_ACCEPT with the TCPXFLG_ASYNC and TCPXFLG_KEEPLISTEN flags to accept the connection while keeping the listening socket open. (That lets other clients initiate a connection even before the server is ready to fully accept the next one.)

4. The server then spawns a child process, using xcall SUBMIT, passing it the connection socket (via a command line written into a CTL file).

5. As soon as SUBMIT returns, the server can close the connection socket (since the child now has its own copy of it). The server is then effectively done with that connection and is free to go back to step 2 to wait for another client connection. Meanwhile, the child can take as much time as it likes servicing the connection (for example, it might involve several packet exchanges), without interfering with the ability of other clients (and other spawned server-children) to operate.
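
For readers who don't have TCPPAR in front of them, here is a minimal sketch of steps 1 through 5 using plain Berkeley sockets in Python rather than the TCPX calls. It is an analogy only: open_listener, spawn_child and serve_once are made-up names, os.fork stands in for xcall SUBMIT (so the child inherits the socket instead of receiving it via a CTL file), and the child merely echoes one packet.

```python
import os
import select
import socket


def open_listener(host="127.0.0.1", port=0):
    """Step 1: create the listening socket and return immediately."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))  # port 0 lets the OS pick a free port
    srv.listen(5)
    return srv


def spawn_child(srv, conn):
    """Step 4: hand the connection to a child process (stand-in for SUBMIT)."""
    pid = os.fork()
    if pid == 0:  # child: it now holds its own copy of the connection socket
        try:
            srv.close()  # the child has no use for the listener
            data = conn.recv(1024)
            conn.sendall(b"echo:" + data)
            conn.close()
        finally:
            os._exit(0)  # never fall back into the parent's code
    return pid


def serve_once(srv, timeout):
    """Steps 2, 3 and 5: timed check, accept, spawn, close our copy.

    Returns the child's pid, or None if no client was waiting.
    """
    readable, _, _ = select.select([srv], [], [], timeout)  # step 2
    if not readable:
        return None
    conn, _addr = srv.accept()  # step 3: the listener stays open
    pid = spawn_child(srv, conn)
    conn.close()  # step 5: the parent is done with this connection
    return pid
```

A real server would call serve_once in a loop and eventually reap finished children with os.waitpid. Each child's exit is what causes the OS to send SIGCHLD to the parent.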

This technique seems to work just fine under Linux, but under AIX, we noticed that the server was getting a -1 ("not owner") status returned from the TCPOP_CHECK call (in step 2) shortly after spawning a child.

After some investigation, the explanation appears to be that the child termination signal (SIGCHLD), which the OS delivers to the parent when a spawned child terminates, was aborting the timed TCPOP_CHECK operation. This is actually not an unusual circumstance, and there is a bunch of code in A-Shell to handle various permutations of the problem with other signals and waiting API calls, particularly in the character input routine, but this particular signal seems to be eluding capture by the handler under AIX.
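
As an illustration of the general mechanism, and not of A-Shell's actual internals, the Python sketch below forks a child that exits while the parent sits in a timed select(). With a SIGCHLD handler installed, the underlying C-level select() call is interrupted and returns -1 with errno set to EINTR. Python 3.5 and later catches that, runs the handler, and resumes the wait for the remaining time, which is the kind of behavior one would want from a signal handler wrapped around a timed wait. The function name timed_wait_with_child and its parameters are hypothetical.

```python
import os
import select
import signal
import socket
import time


def timed_wait_with_child(timeout=0.5, child_lifetime=0.1):
    """Run a timed select() while a forked child exits part-way through."""
    seen = []  # signal numbers recorded by the handler
    previous = signal.signal(
        signal.SIGCHLD, lambda signum, frame: seen.append(signum)
    )
    r, w = socket.socketpair()  # nothing is ever written, so r never becomes readable
    try:
        pid = os.fork()
        if pid == 0:
            time.sleep(child_lifetime)
            os._exit(0)  # the child's exit sends SIGCHLD to the parent
        start = time.monotonic()
        readable, _, _ = select.select([r], [], [], timeout)
        elapsed = time.monotonic() - start
        os.waitpid(pid, 0)  # reap the child
        return seen, readable, elapsed
    finally:
        signal.signal(signal.SIGCHLD, previous)  # restore the old handler
        r.close()
        w.close()
```

If the signal is caught and the wait resumed, the caller sees an ordinary timeout (an empty readable list after roughly the full timeout) rather than an error status.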

Fortunately, the workaround is pretty simple: just ignore the -1 error and loop back to check again. To guard against the case where the error is not actually spurious, I put a counter in there so that if we get the -1 status more than 3 times in a row, we reset the server by closing the listening socket and reopening it. (The reset might cause a client connection attempt to fail if it happened to have been initiated but not fully accepted at that instant, but otherwise it should be perfectly safe.) Short of that, we effectively ignore the spurious -1 and keep looping, waiting for the next client connection.
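
The counter logic can be sketched as follows, in Python rather than A-Shell BASIC, with the TCPX calls abstracted into two hypothetical callbacks: check() returns the TCPOP_CHECK status, and reset() closes and reopens the listening socket. Status 1 (connection pending) and -1 (error) come from the description above; treating any other value as "nothing yet" is an assumption of this sketch, as are the names wait_for_client and MAX_SPURIOUS.

```python
MAX_SPURIOUS = 3  # more than this many -1 results in a row triggers a reset


def wait_for_client(check, reset, max_spurious=MAX_SPURIOUS):
    """Loop until check() reports a pending connection (status 1).

    Returns the number of times the listener had to be reset.
    """
    spurious = 0  # consecutive -1 results seen so far
    resets = 0
    while True:
        status = check()
        if status == 1:
            return resets  # a client is ready to be accepted
        if status == -1:
            spurious += 1
            if spurious > max_spurious:
                reset()  # close and reopen the listening socket
                resets += 1
                spurious = 0
        else:
            spurious = 0  # assumed "nothing pending" result; the streak is broken


# Usage with scripted statuses: four -1 results in a row force one reset.
statuses = iter([-1, 0, -1, -1, -1, -1, 0, 1])
reset_count = wait_for_client(lambda: next(statuses), lambda: None)
```

Resetting the counter on any non-error result is what makes the threshold apply to consecutive errors only, so an occasional -1 after each child exit never triggers a reset.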

I should note that the same issue could also happen with the TCPOP_READ call if a child terminated while it was waiting for data (that is, if the TIMER parameter was nonzero). But that situation doesn't occur in the TCPPAR scheme because the server never actually does a TCPOP_READ; it leaves that to the spawned child, which doesn't have the issue to worry about (unless it happens to spawn its own children).

Ultimately I should figure out how to get A-Shell to trap the problem, but that will obviously require an update to the executable, whereas the workaround requires only a minor application tweak.

Here's a temporary link to the updated TCPPAR.BP 1.0(103) which illustrates the workaround. (It will appear in the next SOSLIB update.)