Slow character echo under RHEL5
#27373
24 Sep 07 04:04 AM
|
Joined: Jun 2001
Posts: 11,925
Jack McGregor
OP
Member
|
I recently did a big system upgrade (50 user site) from a single core 800MHz Pentium OpenServer box to a Quad-Core Xeon RHEL 5 (kernel 2.6.18-8.1.10.el5 #1 SMP) system. As expected the new system is an order of magnitude faster than the old one. EXCEPT in one regard:
Character echoing over telnet or ssh is slower, frequently taking about 0.2 seconds. That isn't much, doesn't really affect typing throughput, and I probably never would have noticed it myself, but several fast typists have complained that it is throwing off their keypunching rhythm, since they have to pause an instant to visually confirm a numeric entry before hitting ENTER.
The effect does not seem specific to A-Shell, since the same thing happens at the bash shell prompt. It occurs with both telnet and ssh connections, and with a variety of PC clients (ATE, ZTERM, PuTTY, Microsoft telnet). The only case where it doesn't occur is when telnetting/ssh'ing from the console to the same server. (Unfortunately there is no other local Linux box to test with, so we can't easily determine whether this combination works because of the Linux telnet/ssh client, or because the packets never actually have to go out on any wire.)
After a week of back-and-forth with Red Hat tech support, the leading suspect is the Nagle delay (an optimization strategy for minimizing the number of small TCP packets that get sent), although Red Hat is of course suspicious that it is something wrong with the LAN. They also tell me that the Nagle delay can't be turned off for the telnet/ssh servers in RHEL5, although it isn't yet clear that it could ever have been turned off. (I certainly can't find any related configuration options for either of them.)
I am pretty sure I've not seen this kind of issue with older versions of RHEL or CentOS.
There are also some conflicting reports of the delay getting better or worse from one day to the next, even though my sense is that the system and NIC are surely not busy enough to really affect this kind of delay at the character level.
The one system variable I did experiment with was setting /proc/sys/net/ipv4/tcp_low_latency to 1, which is supposed to adjust the tradeoff between low latency and high throughput towards lower latency, lower throughput. And supposedly that did reduce the character delay somewhat, but not enough.
Anyway, the purpose of this report is:
A) it might warn someone else of such a possibility, thus preventing the over-selling of an update.
B) if anyone has any experience or suggestions in this area, they would certainly be appreciated.
C) it just goes to show you that there are many ways to measure performance, such that it's quite possible for a new system to be simultaneously many times faster and also many times slower, depending on your perspective.
|
|
|
Re: Slow character echo under RHEL5
#27374
24 Sep 07 05:14 AM
|
Joined: Sep 2003
Posts: 4,178
Steve - Caliq
Member
|
The good old Nagle delay. As I'm sure you remember, we have to change this on all our AIX installs that use ATE to turbocharge the type-ahead etc.; if we don't, it feels really sluggish.
Linux always looked far quicker/snappier, but we've only put in a few Linux boxes so far, so thanks for the heads up. Sure is useful to know.
|
|
|
Re: Slow character echo under RHEL5
#27375
24 Sep 07 05:55 AM
|
Joined: Jun 2001
Posts: 713
Steven Shatz
Member
|
The problem you describe sounds a lot like the one I encountered when upgrading a client to a faster Linux server (and simultaneously from RH8 to ES4). They experienced significant but intermittent telnet and ftp sluggishness until their hardware support company either increased or decreased their network switch's speed. I no longer recall the specific solution, but I'm looking into it. In the meantime, perhaps you can try something similar?
|
|
|
Re: Slow character echo under RHEL5
#27376
24 Sep 07 06:25 AM
|
Joined: Jun 2001
Posts: 713
Steven Shatz
Member
|
When I asked about the resolution of the aforementioned problem, I was told that the sluggishness was a result of write caching being disabled because the RAID controller's battery had never been installed! I hadn't heard this explanation before, so I suspect it may have been the fix for a different problem. I'll let you know if I find out more.
|
|
|
Re: Slow character echo under RHEL5
#27377
24 Sep 07 06:35 AM
|
Joined: Jun 2001
Posts: 11,925
Jack McGregor
OP
Member
|
Thanks for the feedback - both of you.
Too bad there isn't (or is there?) a config file somewhere to just turn off the Nagle delay. I'm not sure if the issue is new in RHEL5 or if I just never noticed it in RHEL4. I'm quite sure it wasn't this way in the old RH8.
As for the network switch, the LAN is big and complex (close to 200 devices on it). The server is plugged into a Gigabit switch, although most of the telnet users are plugged into a secondary switch beneath that one.
Still, if LAN topology or hardware were part of the issue, we'd see the same delay when connecting from the same PC clients to the old server, which is still plugged into the network at the same level/switch as the new server. But we don't see any delay there, so it seems to be isolated to the Red Hat server software.
Note that the sluggishness is only really apparent when you stop typing for an instant. For example, if you type the string "12345678" very quickly, you don't really notice any delay in the echoing of characters 1-7 (although admittedly it's hard to focus on while you're typing). The delay occurs on the last character, and that is perfectly consistent with the way the Nagle algorithm usually works.
That is, it tends to delay sending a packet to a remote host until it has gotten an ACK for the previous packet, or until the timer expires. (And 0.2 secs is a typical timer value.) So, the receipt of the "2" forces the server to release the packet containing the echo of the "1", and so on. But after receiving the "8", as there is no further activity coming from the remote host (PC telnet client), it figures why not wait to see if anything else happens.
So it doesn't seem like general sluggishness, except to a few fast keypunchers. Other than that, screen updates are lightning fast, reports start printing almost before you hit the ENTER key, etc. I literally had people telling me that some reports weren't working, because when they selected the option on the menu, it just went "right back to the menu, without doing the normal processing". (We used to get that a lot when switching from old AMOS systems, but in this case the previous server was fairly decent to begin with.)
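For reference, the Nagle delay is normally switched off per socket by the program that owns the socket, via the TCP_NODELAY option, rather than through a system-wide config file. A minimal Python sketch of what that looks like inside an application follows. It only illustrates the option; it is not a way to reconfigure the stock RHEL telnet or ssh daemons, and the function name is made up for the example.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with the Nagle algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 asks the stack to send small segments immediately
    # instead of holding them while earlier data is still unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_low_latency_socket()
# Read the option back to confirm the kernel accepted it.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
print("TCP_NODELAY enabled:", nodelay_enabled)
```

Because the option belongs to the socket, a daemon has to set it on each accepted connection itself; an administrator cannot flip it from outside unless the daemon exposes a setting for it.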
|
|
|
Re: Slow character echo under RHEL5
#27378
24 Sep 07 08:24 AM
|
Joined: Sep 2002
Posts: 5,486
Frank
Member
|
Is this one of those "be careful what you wish for?" times...
FWIW: We are still under 7.3 and RHEL 4.0... but I have noticed this effect when the actual console is under heavy utilization, e.g. serving up email apps or running other heavy CPU/disk activity... We try NOT to have the main console running anything but the login prompt.
Not having seen 5.0 yet, I imagine it has an upgraded GUI desktop... Have you tried exiting that, and everything else at the main console? Also, are you using heavy USB devices on the main system? If so, these also tend to be heavy CPU hogs... If the degradation is somewhat random, perhaps there is a clue there...
|
|
|
Re: Slow character echo under RHEL5
#27379
25 Sep 07 02:13 AM
|
Joined: Jun 2001
Posts: 11,925
Jack McGregor
OP
Member
|
They do have a fancy Gnome graphic desktop running on the console, but it would be rare for there to be any actual activity on it (since no one sits at the console). I could have them try to switch the graphic console into the background, although if it uses that much CPU without any actual user activity, it probably would continue to do the same in the background. I'm not sure I remember how to actually shut the GUI desktop off entirely. The only thing on USB is the APC battery unit. I'm not sure how much activity that generates, but I would hope not much.
|
|
|
Re: Slow character echo under RHEL5
#27380
25 Sep 07 03:09 AM
|
Joined: Sep 2002
Posts: 5,486
Frank
Member
|
Don't underestimate how much stuff is going on in the background there!
A check of top will give you a CPU% as well.
Under RHEL4, Alt F1 and Alt F7 toggle between GUI and text at the console, but you probably need to exit the GUI to be sure.
|
|
|
Re: Slow character echo under RHEL5
#27381
25 Sep 07 04:29 AM
|
Joined: Jun 2001
Posts: 11,925
Jack McGregor
OP
Member
|
I've never seen more than about 0.1% CPU usage in top. (It's a quad core 2GHz Xeon).
I've done the Alt-F1 / Alt-F7 toggle, but I don't think that really puts the GUI to sleep - it just puts it in the background. In any case, it's very hard to see how the GUI would be using much CPU time when there's nothing showing in top and nothing actually going on with the GUI.
The GUI console does use up a lot of memory (100MB+) but we have memory to burn (4GB).
Still, it has been reported that there are times when the delay seems to go away, which does suggest some kind of interaction with something else in the system or environment.
|
|
|
Re: Slow character echo under RHEL5
#27382
25 Sep 07 05:12 AM
|
Joined: Sep 2002
Posts: 5,486
Frank
Member
|
You could pull the UPS monitor off just for fun as well...
True, it's more disk I/O than CPU that kills the throughput... even to the keyboard... Is there some crazy journaling/mirroring going on that could be hitting the drive(s) extra hard? We have also seen the mail daemon grind our systems down when it is active.
|
|
|
Re: Slow character echo under RHEL5
#27383
25 Sep 07 06:14 AM
|
Joined: Feb 2002
Posts: 94
Tom Jeske
Member
|
FWIW, I had a similar problem a while back. I finally found the problem to be a poor-quality cable from the server to the switch. After the cable was replaced, all was well.
|
|
|
Re: Slow character echo under RHEL5
#27384
25 Sep 07 07:40 AM
|
Joined: Jun 2001
Posts: 11,925
Jack McGregor
OP
Member
|
We aren't running a mail server. I may consider removing the USB device for a while, and also changing the cable. It seems that if the cable were an issue, it would show up in some kind of statistics, although I haven't had a chance to study all these values from netstat to see if any suggest serious problems:
[jackmc@gopfab ~]$ uptime
15:32:19 up 10 days, 17:47, 26 users, load average: 0.13, 0.10, 0.09
[jackmc@gopfab ~]$ netstat -s
Ip:
3258181 total packets received
251 with invalid addresses
0 forwarded
0 incoming packets discarded
3257930 incoming packets delivered
2819252 requests sent out
Icmp:
6543 ICMP messages received
1457 input ICMP message failed.
ICMP input histogram:
destination unreachable: 6541
echo replies: 2
6545 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 6545
Tcp:
17464 active connections openings
11360 passive connection openings
2447 failed connection attempts
455 connection resets received
33 connections established
2922970 segments received
2766334 segments send out
10928 segments retransmited
0 bad segments received.
394 resets sent
Udp:
32013 packets received
3561 packets to unknown port received.
0 packet receive errors
35432 packets sent
TcpExt:
10 invalid SYN cookies received
13 resets received for embryonic SYN_RECV sockets
16 ICMP packets dropped because they were out-of-window
20459 TCP sockets finished time wait in fast timer
8 time wait sockets recycled by time stamp
40777 delayed acks sent
1 delayed acks further delayed because of locked socket
Quick ack mode was activated 358 times
12316 packets directly queued to recvmsg prequeue.
564 packets directly received from backlog
1393350 packets directly received from prequeue
450862 packets header predicted
5729 packets header predicted and directly queued to user
968029 acknowledgments not containing data received
122411 predicted acknowledgments
5 times recovered from packet loss due to SACK data
317 congestion windows recovered after partial ack
8 TCP data loss events
TCPLostRetransmit: 1
13 timeouts after SACK recovery
1 timeouts in loss state
11 fast retransmits
9 forward retransmits
4 retransmits in slow start
10803 other TCP timeouts
241 DSACKs sent for old packets
67 DSACKs received
139 connections reset due to unexpected data
106 connections reset due to early user close
15 connections aborted due to timeout
[jackmc@gopfab ~]$
Note: using watch -s 'netstat -s' for a minute or so, it appears that we are only getting about 5-10 total packets per second, which doesn't strike me as very heavy traffic.
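As a rough cross-check on the counters above, two ratios can be worked out directly from the posted figures: the long-run average packet rate (total packets received divided by the uptime of 10 days, 17:47) and the share of sent TCP segments that were retransmitted. The Python sketch below does that arithmetic on a three-line excerpt of the output; the helper name `counter` is made up for the example, and the labels are matched exactly as netstat spells them.

```python
import re

# Excerpt of the `netstat -s` output posted above (labels copied verbatim).
NETSTAT_SAMPLE = """\
Ip:
    3258181 total packets received
Tcp:
    2766334 segments send out
    10928 segments retransmited
"""


def counter(text, label):
    """Return the integer printed in front of `label` (hypothetical helper)."""
    pattern = r"^\s*(\d+)\s+" + re.escape(label) + r"\s*$"
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise KeyError(label)
    return int(match.group(1))


# "up 10 days, 17:47" from the uptime line, converted to seconds.
uptime_seconds = (10 * 24 + 17) * 3600 + 47 * 60

packets_per_second = counter(NETSTAT_SAMPLE, "total packets received") / uptime_seconds
retransmit_ratio = (counter(NETSTAT_SAMPLE, "segments retransmited")
                    / counter(NETSTAT_SAMPLE, "segments send out"))

print(f"{packets_per_second:.1f} packets/s received on average")
print(f"{retransmit_ratio:.2%} of sent TCP segments were retransmitted")
```

That works out to roughly 3.5 packets per second averaged over the whole uptime, and about 0.4% of sent segments retransmitted. The average rate is in the same ballpark as the 5-10 per second seen with watch during working hours; whether the retransmit figure points at cabling is a judgment call that these numbers alone do not settle.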
|
|
|
Re: Slow character echo under RHEL5
#27385
10 Oct 07 04:41 AM
|
Anonymous
Unregistered
|
Jack,
Have you tried running Ethereal or Wireshark, which will allow you to see the actual TCP-level packets in your telnet connection? Either of these free tools, available on Linux or Windows, will show the data bytes in each packet, along with a timestamp showing the time it takes for the character echo to be delivered.
The problem could also be related to the new network driver model (NAPI), which is supposed to run the network driver in an interrupt mode when traffic is light, but switch to a polled mode when traffic is heavy. If the driver is inappropriately switching to polling mode under a light load, there would be latency. This could account for a change in echo timing independent of the Nagle settings.
Here is more on NAPI: http://linux-net.osdl.org/index.php/NAPI
Here is a link to Wireshark: http://www.wireshark.org/
Take care,
|
|
|
Re: Slow character echo under RHEL5
#27386
10 Oct 07 05:00 AM
|
Joined: Sep 2002
Posts: 5,486
Frank
Member
|
|
|
|
Re: Slow character echo under RHEL5
#27387
10 Oct 07 05:42 AM
|
Joined: Jun 2001
Posts: 11,925
Jack McGregor
OP
Member
|
Bob,
Nice to hear from you and thanks for the excellent suggestions. I was gearing up to do an Ethereal test, but it's a bit awkward to do remotely. (I have only SSH access through their firewall, so I'd probably need to coordinate it with someone locally, unless I can run the capture without any GUI. In the past I've used Ethereal, but only from the Windows side. Or perhaps I should try setting up RHEL5 or CentOS 5 on an in-house server.) I suspect that running one of the packet captures from the client side over the Internet is going to obscure the timing issue.
In any case, your NAPI description rings a bell, because there have been times when, mysteriously, they've reported that the delay seemed to go away. So maybe those were times when the traffic was low enough for it to operate in a more efficient mode, although it isn't clear to me which mode has the lowest latency. It's also not clear to me that this system has enough overall activity for any of these NAPI enhancements to really be an issue. (3258181 total packets received in 10 days of uptime is not a particularly impressive number.) On the other hand, I do note that the LinuxNet article lists the following under disadvantages of NAPI: "In some cases, NAPI may introduce additional software IRQ latency."
I'll study up on it to see if I can discover anything.
Thanks again,
Jack
P.S. FYI, the site has the standard support agreement with Red Hat, and I've exchanged about a dozen messages with various people there on the case, at the rate of about one exchange per day or two, but so far no one has mentioned NAPI. They've mostly been looking at tcpdump and strace output (and apparently not getting anywhere). Also, we have changed cables and tried removing the UPS, but that had no effect.
|
|
|
Re: Slow character echo under RHEL5
#27388
10 Oct 07 08:07 AM
|
Anonymous
Unregistered
|
Ouch, Wireshark is hard to use remotely. The best place to test from would be a client connecting to the RHEL server. Establish a telnet connection, and set up tcpdump or Wireshark _on the client_ to filter to TCP only, using the client's IP address and port. That way, the test tool isn't disturbing the thing you are trying to measure.
The fact that a connection through the lo (127.0.0.*) interface doesn't have the delay does seem to put the issue on the network driver side.
I guess another thing I would try is a flood ping between the client and the server, done from the client side. See if that makes the delay better or worse. If it makes it better, then the extra traffic is helping the network driver do its job.
Bob
|
|
|
Re: Slow character echo under RHEL5
#27389
11 Oct 07 03:26 AM
|
Joined: Jun 2001
Posts: 11,925
Jack McGregor
OP
Member
|
I've decided the best way to proceed is to set up an in-house CentOS 5 box, which should have the same kernel behavior (although I may end up with different network drivers, since the hardware will be different). But if the problem occurs here, then chances are that any solutions will be transferable. (I started downloading the 6 CD images last night, but apparently Microsoft got wind of it and decided that a Windows update, complete with auto-reboot, might be the best way to keep me from getting too involved with another OS.) So that plan has been set back a few hours...
I kind of like the flood ping test idea, but that may be slightly overkill - I don't want to bring the network down! Oddly, it doesn't seem as if any of the standard ping utilities have something in between the one-per-second and flood-ping modes - I may need to write one that allows a variable rate. On the other hand, if that really were a solution, it would be a pretty crummy one! (And particularly hard to calibrate from the client side!)
But imagine how fun it would be to document the "feature" if added to ATE (e.g. "ate -f switch fills the gaps between the keys of your pathetically-slow typing, so that the server doesn't fall too deeply asleep and have to be bludgeoned awake for every key typed.")
Question: does a ping or telnet from the server back to itself at its real IP address (192.168.x.x), rather than 127.0.0.x, exercise the network driver in the same way that packets originating from another machine would? Or does the routing table effectively just route those packets to the lo interface, bypassing the network driver?
|
|
|
Re: Slow character echo under RHEL5
#27390
11 Oct 07 08:03 AM
|
Joined: Jun 2001
Posts: 11,925
Jack McGregor
OP
Member
|
(non)Progress update: CentOS 5 now installed locally on a very cheap Sempron box with only 128MB RAM. (Didn't want to use VMware for fear the virtualized network interface would act too differently from a real one.) Installed the identical telnet-server package (telnet-server-0.17-31.EL4.3). Kernel versions nearly the same:
2.6.18-8.1.10.el5 #1 SMP (RHEL 5)
2.6.18-8.el5 #1 SMP (CentOS 5)
Problem not reproducible here!
|
|
|
Re: Slow character echo under RHEL5
#27391
11 Oct 07 12:23 PM
|
Joined: Jun 2001
Posts: 11,925
Jack McGregor
OP
Member
|
Here's some progress: Surprising as it may seem, I could actually detect this problem, visually, over the Internet using ATE in SSH mode. I'd fire off a sequence of 8 characters as fast as I could type (the home row, ";laskdjf"). Even though there was a slight delay before the characters were echoed, once they started echoing (actually before I even finished typing), they would appear in a rapid burst, except for the last one, which would always appear about 0.2 seconds after the second-to-last character.
Looking at that in Ethereal, I could see that the last character would not be sent from the host until the previous character had been ACK'd, and that ACK packet (from the client to the server, acknowledging the previous packet) would arrive about 0.2 seconds after the previous character was received.
In doing some research on the Nagle delay, I saw that it was often bound up with something called "delayed ACKs", which is essentially the same concept but relates to an attempt to avoid sending empty ACK packets by delaying for a short time to see if the ACK can be piggy-backed on the next data packet. Various references were all rather vague on whether turning off the Nagle delay (i.e., setting the TCP_NODELAY flag) also disables delayed ACKs, but when I searched the Internet for "Windows delayed ack", I found the following Microsoft article addressing this very issue: http://support.microsoft.com/kb/328890
Evidently, setting TCP_NODELAY under Windows sockets does NOT disable the delayed ACKs (it didn't seem to have any effect on the data packet timing either, at least in my tests), but there is now apparently a Registry entry that you can manually insert in order to adjust the ACK delay. I did that (i.e. added a value TcpAckFrequency and set it to 1, for no delay), and believe it or not, the effect is actually noticeable on this particular server, even over the Internet. (I'm waiting to see if someone on-site is brave enough to try it.)
I'm not sure if this is the solution, or even a solution (it certainly isn't very elegant, since it can't be turned on and off dynamically by the application, and turning it on permanently may adversely affect other kinds of network performance), but it seemed worth documenting.
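Since the Registry change has to be repeated on each PC, one way to cut down the hand editing is to generate a .reg file per machine. The Python sketch below only builds the file text. It assumes the key location given in the Microsoft article linked above (a per-interface key under Tcpip\Parameters\Interfaces, named by the adapter's GUID) and should be checked against that article before use; the function name and the all-zero GUID are placeholders, and the 1-255 range check is a sanity check added for the example.

```python
# Key path as I understand it from KB 328890; treat it as an assumption to verify.
KEY_TEMPLATE = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"
                r"\Tcpip\Parameters\Interfaces\{guid}")


def tcp_ack_frequency_reg(interface_guid, frequency=1):
    """Build .reg file text that sets TcpAckFrequency for one network interface.

    interface_guid is the adapter's GUID including braces; frequency=1 means
    "acknowledge every segment immediately", the setting used in this thread.
    """
    if not 1 <= frequency <= 255:
        raise ValueError("frequency must be between 1 and 255")
    key = KEY_TEMPLATE.format(guid=interface_guid)
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"TcpAckFrequency"=dword:{frequency:08x}',
        "",
    ]
    return "\r\n".join(lines)


# Placeholder GUID: each PC has its own, listed under the Interfaces key.
reg_text = tcp_ack_frequency_reg("{00000000-0000-0000-0000-000000000000}")
print(reg_text)
```

The output still has to be saved and imported on each PC with that PC's real interface GUID, and the change may not take effect until the machine is restarted, so this reduces typing errors rather than removing the per-machine work.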
|
|
|
Re: Slow character echo under RHEL5
#27392
12 Oct 07 09:55 AM
|
Joined: Sep 2002
Posts: 5,486
Frank
Member
|
Thanks for keeping the community in the loop... At the very least I'm trying to understand what's going on in the thread... Just to reiterate, is this a RHEL 5 issue or a Windows TCP issue?
Kinda got lost there.
I certainly don't want to repeat all your hard work if we ever put out a RHEL5 server.
Thanks.
|
|
|
Re: Slow character echo under RHEL5
#27393
12 Oct 07 10:19 AM
|
Joined: Jun 2001
Posts: 713
Steven Shatz
Member
|
I'm also interested in this thread, both because of past performance problems that seem mighty similar in their frustrating aspects and because we do have a customer already running A-Shell under RH EL5. That customer however, has not reported any problems. But, they are a relatively light usage shop - with no more than 15-20 people logged into A-Shell at any time.
|
|
|
Re: Slow character echo under RHEL5
#27394
12 Oct 07 10:34 AM
|
Joined: Jun 2001
Posts: 11,925
Jack McGregor
OP
Member
|
Excellent question. Let's just say that it is a "tcp issue" but not specifically the fault of either Windows or RHEL5. The fact that I was able to ameliorate it by disabling the delayed ACK on the Windows side is nice to know, but I don't think we should conclude that it is a Windows problem or that we should disable it on every PC.
Although the NAPI driver mechanism seems like it may be associated with the issue, it may also just be one of those things that is dependent on so many factors (including peculiar habits of old-school keypunch operators) that there is not much point in worrying about it. (Certainly in the GUI environment, it's the last thing you should be worrying about.)
It is also probably unfair to associate it with RHEL5 per se, since if it is related to NAPI, it would probably occur in all the newer Linux distros.
If you do run into the issue someday, you can hopefully search the BBS and try the workaround described above.
|
|
|
Re: Slow character echo under RHEL5
#27395
12 Oct 07 10:40 AM
|
Joined: Jun 2001
Posts: 11,925
Jack McGregor
OP
Member
|
Steven - for what it's worth, I'm quite sure that this issue was not caused by TOO MUCH activity. (It might have been caused by TOO LITTLE activity, although I never managed to set up a proper test of that.) But if you want to pursue it, try the "home row rapid typing test" (e.g. "a;sdlkfj") that I described above to see if you can detect a noticeable delay between the echoing of the last two chars. If so, then maybe you could benefit from the TcpAckFrequency Windows Registry fix (although I shudder at the idea of trying to do that to dozens of PCs).
|
|
|
Re: Slow character echo under RHEL5
#27396
16 Oct 07 09:12 AM
|
Joined: Jun 2001
Posts: 11,925
Jack McGregor
OP
Member
|
Here's a follow-up on this case:
The customer finally installed the Windows Registry patch described above (to remove the delayed ACK) on several PC's where the operators were complaining, and they all seem to be satisfied now.
However, contradicting my last comment to Steven, they are reasonably certain that the problem was minimized when the activity on the system was the least (for example, on a weekend when few others were working).
|
|
|
Re: Slow character echo under RHEL5
#27397
17 Oct 07 01:23 AM
|
Joined: Sep 2002
Posts: 5,486
Frank
Member
|
Wouldn't it be interesting, in this case, to plant a "placebo" fix on one of the stations and see if the user notices any change? Sometimes the power of suggesting that it is better makes the user feel that it is...
|
|
|
Re: Slow character echo under RHEL5
#27398
17 Oct 07 03:12 AM
|
Joined: Jun 2001
Posts: 11,925
Jack McGregor
OP
Member
|
Good suggestion. If I was going to market this patch as a drug, some double-blind studies would surely be called for. But for what it's worth, the effect is measurable with a packet tracer (I used Ethereal), which I doubt is subject to the placebo effect.
Just to clarify why removing the delayed ACK on the Windows side improves apparent responsiveness on the Linux side, here's a simplified example of a user typing the string "1234" quickly. (I'm only showing the ACKs from the PC back to the server):
User (PC)            Server
---------            -------
1          ---->
2          ---->
           <----     1
3+ACK(1)   ---->
           <----     2
4+ACK(2)   ---->
           <----     3
ACK(3)     ---->
           <----     4
ACK(4)     ---->
The user manages to type "12" before the server can echo the "1". From that point on, each subsequent character typed triggers the echoing of the preceding character. But there are no apparent delays, because the Windows ACK delay is interrupted by either the next typed character or the server's echo of the previous one. (Note that the ACK messages are piggybacking on the data, which is exactly the goal of delaying ACKs in the first place, i.e. to avoid sending an empty ACK packet and then 1ms later sending a data packet when the data and the ACK could have shared a packet.)
But after the server echoes the "3", since the user has already typed and sent the "4", there is no further activity in either direction to spur things along. The PC's delayed ACK logic kicks in, causing it to wait the full 200ms before acknowledging the receipt of the echoed "3". Meanwhile, on the server side, the Nagle algorithm is causing it to hold off on sending the echoed "4", in order to see if any more data can be included in the same packet. When it finally gets the ACK of the echoed "3", that terminates the Nagle delay and it finally sends the "4".
If we could have turned off the Nagle delay on the server side, that would probably have also eliminated the issue, but since we couldn't figure out how to do that, eliminating the delayed ACK on the Windows box was a workaround.
As for why it seems to be more of an issue on this server than on a much slower SCO box, I'm still not sure. Obviously, server load could slow down the echoing, allowing the operator to get ahead of the server, but the new server should be experiencing much less load than the old one. Perhaps it is related to the NAPI optimizations, which are presumably trying to reduce the frequency at which the CPU is interrupted by network packets.
And maybe there is some kind of SMP scheduling/tuning issue, whereby strategies which maximize overall throughput and CPU utilization are working against the responsiveness to these tiny "nuisance" interrupts (like echoing a character). There may be some tunable parameters which relate to that, but I don't know what they are. Nor is it clear that it would be a good idea to tinker with them, since other than this character echoing issue (which now seems to be resolved by our workaround), the overall performance of the system is fantastic.
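The interaction described above can be reproduced with a toy event simulation. The Python sketch below is a simplified model, not a real TCP stack: keystrokes leave the client immediately, the server echoes each character, server-side Nagle holds an echo while an earlier echo is unacknowledged, and the client either piggybacks its ACK on the next keystroke or sends it when a delayed-ACK timer expires. All names and numbers (40 ms one-way latency, keys every 60 ms, 200 ms ACK delay) are assumptions picked so that the typist stays ahead of the echo, as in the diagram.

```python
import heapq
from itertools import count


def simulate_echo(key_times_ms, one_way_ms=40, ack_delay_ms=200, server_nagle=True):
    """Return the time (ms) at which each typed character's echo reaches the client."""
    events = []                 # min-heap of (time, tie_breaker, kind, payload)
    tie = count()

    def schedule(when, kind, payload=None):
        heapq.heappush(events, (when, next(tie), kind, payload))

    for index, typed_at in enumerate(key_times_ms):
        schedule(typed_at, "key_typed", index)

    echo_seen_at = [None] * len(key_times_ms)
    server_buffer = []          # echoes the server has not sent yet
    server_unacked = False      # True while an echo segment awaits its ACK
    ack_pending = False         # client owes the server an ACK
    ack_timer_id = 0            # lets a piggybacked ACK invalidate an old timer

    def server_try_send(now):
        nonlocal server_unacked
        # Nagle: hold small data while earlier data is still unacknowledged.
        if server_buffer and (not server_nagle or not server_unacked):
            schedule(now + one_way_ms, "client_rx", list(server_buffer))
            server_buffer.clear()
            server_unacked = True

    while events:
        now, _, kind, payload = heapq.heappop(events)
        if kind == "key_typed":
            # The keystroke leaves at once; a pending ACK rides along with it.
            schedule(now + one_way_ms, "server_rx_key", (payload, ack_pending))
            ack_pending = False
        elif kind == "server_rx_key":
            index, carries_ack = payload
            server_buffer.append(index)
            if carries_ack:
                server_unacked = False
            server_try_send(now)
        elif kind == "client_rx":
            for index in payload:
                echo_seen_at[index] = now
            if ack_delay_ms == 0:
                schedule(now + one_way_ms, "server_rx_ack")
            elif not ack_pending:
                ack_pending = True
                ack_timer_id += 1
                schedule(now + ack_delay_ms, "ack_timer", ack_timer_id)
        elif kind == "ack_timer":
            # Fire only if no keystroke has carried the ACK in the meantime.
            if ack_pending and payload == ack_timer_id:
                ack_pending = False
                schedule(now + one_way_ms, "server_rx_ack")
        elif kind == "server_rx_ack":
            server_unacked = False
            server_try_send(now)
    return echo_seen_at


keys = [0, 60, 120, 180]        # "1234" typed 60 ms apart
print("Nagle + delayed ACK:  ", simulate_echo(keys))
print("Nagle, immediate ACK: ", simulate_echo(keys, ack_delay_ms=0))
print("No Nagle on server:   ", simulate_echo(keys, server_nagle=False))
```

With these assumed numbers, the last character is echoed 300 ms after it is typed when both mechanisms are active, against 140 ms with immediate ACKs on the client and 80 ms (one round trip) with Nagle off on the server. In this model the extra 200 ms lands only on the final character, which matches the packet trace; the model says nothing about why the old server rarely let the typist get ahead of the echo.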
|
|
|
|
|